Lecture Notes in Computational Science and Engineering

Editors: M. Griebel, Bonn; D. E. Keyes, Norfolk; R. M. Nieminen, Espoo; D. Roose, Leuven; T. Schlick, New York
12
Ursula van Rienen
Numerical Methods in Computational Electrodynamics Linear Systems in Practical Applications
With 173 Figures, 65 in Colour
Springer
Ursula van Rienen
Fachbereich Elektrotechnik und Informationstechnik
Universität Rostock
18051 Rostock, Germany
e-mail:
[email protected]

Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Rienen, Ursula van:
Numerical methods in computational electrodynamics: linear systems in practical applications / Ursula van Rienen. - Berlin; Heidelberg; New York; Barcelona; Hong Kong; London; Milan; Paris; Singapore; Tokyo: Springer, 2001
(Lecture notes in computational science and engineering; 12)
ISBN 3-540-67629-5
Front cover: High-voltage engineering. Epoxy resin specimen with a layer of water drops on the surface. Shown is a vector representation of the electro-quasistatic field.
Mathematics Subject Classification (2000): 65C20, 65F05, 65F10, 65F50, 65N06, 65N12, 65N22, 65N25, 65N50, 65N55, 78-02, 78-04, 78-08, 78A25, 78A30, 78A35, 78A40, 78A45

ISSN 1439-7358
ISBN 3-540-67629-5 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH

© Springer-Verlag Berlin Heidelberg 2001
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: Friedhelm Steinen-Broo, Estudio Calamar, Spain
Cover production: design & production GmbH, Heidelberg
Typeset by the author using a Springer TEX macro package
Printed on acid-free paper
SPIN 10653083
To my children
Jan and Viola
Contents

Acknowledgements
Overview
Introduction

1. Classical Electrodynamics
   1.1 Maxwell's Equations
   1.2 Energy Flow and Processes of Thermal Conduction
       1.2.1 Energy and Power of Electromagnetic Fields
       1.2.2 Thermal Effects
   1.3 Classification of Electromagnetic Fields
       1.3.1 Stationary Fields
       1.3.2 Quasistatic Fields
       1.3.3 General Time-Dependent Fields and Electromagnetic Waves
       1.3.4 Overview and Solution Methods
   1.4 Analytical Solution Methods
       1.4.1 Potential Theory
       1.4.2 Decoupling by Differentiation
       1.4.3 Method of Separation
   1.5 Boundary Value Problems
       1.5.1 Boundary Value Problems of the Potential Theory
       1.5.2 Further Boundary Conditions
       1.5.3 Complete Systems of Orthogonal Functions
   1.6 Bibliographical Comments

2. Numerical Field Theory
   2.1 Mode Matching Technique
       2.1.1 Mathematical Treatment of the Field Problem
       2.1.2 Scattering Matrix Formulation
       2.1.3 Standing Waves and Traveling Waves
       2.1.4 Convergence and Error Investigations
   2.2 Finite Element Method
       2.2.1 General Outline of the Finite Element Approach
       2.2.2 Weighted Residual Method; Galerkin Approach
       2.2.3 Duality Methods
       2.2.4 Finite Element Discretizations of Maxwell's Equations
       2.2.5 Synthesis Between FEM with Whitney Forms and Finite Integration Technique
   2.3 Finite Integration Technique
       2.3.1 FIT Discretization of Maxwell's Equations
       2.3.2 Stationary Fields
       2.3.3 Quasistatic Fields
       2.3.4 General Time-Dependent Fields and Electromagnetic Waves
   2.4 Resulting Linear Systems
       2.4.1 Special Properties of Complex Matrices
       2.4.2 Mode Matching Technique
       2.4.3 Finite Integration Technique
   2.5 Bibliographical Comments

3. Numerical Treatment of Linear Systems
   3.1 Direct Solution Methods
       3.1.1 LU-Decomposition; Gaussian Elimination
   3.2 Classical Iteration Methods
       3.2.1 Practical Use of Iterative Methods: Stopping Criteria
       3.2.2 Gauss-Seidel and SOR
       3.2.3 SGS and SSOR Algorithms
       3.2.4 The Kaczmarz Algorithm
   3.3 Chebyshev Iteration
   3.4 Krylov Subspace Methods
       3.4.1 The CG Algorithm
       3.4.2 Algorithms of Lanczos Type
       3.4.3 Look-Ahead Lanczos Algorithm
       3.4.4 CG Variants for Non-Hermitian or Indefinite Systems
   3.5 Minimal Residual Algorithms and Hybrid Algorithms
       3.5.1 GMRES Algorithm (Generalized Minimal Residual)
       3.5.2 Hybrid Methods
       3.5.3 GCG-LS(s) Algorithm (Generalized Conjugate Gradient, Least Square)
       3.5.4 Overview of BiCG-like Solvers
   3.6 Multigrid Techniques
       3.6.1 Smoothing and Local Fourier Analysis
       3.6.2 The Two-Grid Method
       3.6.3 The Multigrid Technique
       3.6.4 Embedding of the Multigrid Method into a Problem Solving Environment
   3.7 Special MG-Algorithm for a Non-Hermitian Indefinite System
       3.7.1 Peculiarities of the Special Problem and Corresponding Measures
       3.7.2 The Multigrid Algorithm; Properties of the Linear System and its Solution
       3.7.3 Grid Transfers for Vector Fields
       3.7.4 The Relaxation
       3.7.5 The Choice of the Cycles in the FMG Approach
       3.7.6 The Solution Method on the Coarsest Grid
       3.7.7 Concluding Remarks on the Multigrid Algorithm and Possible Outlook
   3.8 Preconditioning
       3.8.1 Incomplete LU Decompositions
       3.8.2 Iteration Methods
       3.8.3 Polynomial Preconditioning
       3.8.4 Multigrid Methods
   3.9 Real-Valued Iteration Methods for Complex Systems
       3.9.1 Axelsson's Reduction of a Complex Linear System to Real Form
       3.9.2 Efficient Preconditioning of the C-to-R Method
       3.9.3 C-to-R Method and Electro-Quasistatics
   3.10 Convergence Studies for Selected Solution Methods
       3.10.1 Real Symmetric Positive Definite Matrices
       3.10.2 Complex Symmetric Positive Stable Matrices
       3.10.3 Complex Indefinite Matrices
   3.11 Bibliographical Comments

4. Applications from Electrical Engineering
   4.1 Electrostatics
       4.1.1 Plug
   4.2 Magnetostatics
       4.2.1 C-Magnet
       4.2.2 Current Sensor
       4.2.3 Velocity Sensor
       4.2.4 Nonlinear C-Magnet
   4.3 Stationary Currents; Coupled Problems
       4.3.1 Hall Element
       4.3.2 Semiconductor
       4.3.3 Circuit Breaker
   4.4 Stationary Heat Conduction; Coupled Problems
       4.4.1 Temperature Distribution on a Board
   4.5 Electro-Quasistatics
       4.5.1 High Voltage Insulators with Contaminations
       4.5.2 Surface Contaminations
       4.5.3 Fields on High Voltage Insulators
       4.5.4 Outlook
   4.6 Magneto-Quasistatics
       4.6.1 TEAM Benchmark Problem
   4.7 Time-Harmonic Problems
       4.7.1 3 dB Waveguide Coupler
       4.7.2 Microchip
   4.8 General Time-Dependent Problems
   4.9 Bibliographical Comments

5. Applications from Accelerator Physics
   5.1 Acceleration of Elementary Particles
   5.2 Linear Colliders
       5.2.1 Actual Linear Collider Studies
       5.2.2 Acceleration in Linear Colliders
       5.2.3 The S-Band Linear Collider Study
   5.3 Beam Dynamics in a Linear Collider
       5.3.1 Emittance
       5.3.2 Wake Fields and Wake Potential
       5.3.3 Single Bunch and Multibunch Instabilities
   5.4 Numerical Analysis of Higher Order Modes
       5.4.1 Computation of the First Dipole Band of the S-Band Structure with 30 Homogeneous Sections
       5.4.2 Developments That Followed the ORTHO Studies
       5.4.3 Geometry and Convergence Studies of Trapped Modes
       5.4.4 Comparison with the Coupled Oscillator Model COM
       5.4.5 Comparison with Measurements for the LINAC II Structure at DESY
   5.5 36-Cell Experiment on Higher Order Modes
       5.5.1 Design
       5.5.2 Numerical Results for the First Dipole Band
       5.5.3 Measurement Methods
       5.5.4 Bead Pull Measurements
       5.5.5 Comparison of Measurement and Simulation
       5.5.6 Measurement with Local Damping
       5.5.7 Comments and Outlook
       5.5.8 Suppression of Parasitic Modes
       5.5.9 Design of the Damped SBLC Structure
       5.5.10 Concluding Remarks about the Linear Collider Studies
   5.6 Coupled Temperature Problems
       5.6.1 Inductive Soldering of a Traveling Wave Tube
       5.6.2 Temperature Distribution in Accelerating Structures
       5.6.3 RF-Window
       5.6.4 Waveguide with a Load
   5.7 Bibliographical Comments

Summary
References
Symbols
Index
Acknowledgements
The author wishes to thank all who contributed to the successful completion of this book¹. The author is particularly grateful to:
- Prof. Dr.-Ing. Thomas Weiland for his encouragement to undertake this venture, for many fruitful discussions and ideas, and for the stimulating working environment and atmosphere.
- Prof. Dr. Peter Rentrop for his strong interest in the project and for taking care of the second referee's report on the postdoctoral thesis.
- Prof. Dr. Willi Törnig for his encouragement to start a postdoctoral thesis.
- My colleagues of the study group Theory of Electromagnetic Fields at the Darmstadt University of Technology for good cooperation. My thanks go especially to Michael Bartsch, Dr. Ulrich Becker, Dr. Markus Clemens, Dr. Micha Dehler, Dr. Peter Hahne, Dr. Bernd Krietenstein, Dr. Philipp Pinder, Oliver Podebrad, Dr. Brigitte Schillinger, Dr. Rolf Schuhmann, Dr. Klaus Steinigke, Dr. Bernhard Wagner, and Dr. Heike Wolter, who shared their ideas with the author in valuable discussions. Dr. Markus Clemens also allowed me to use some magneto-quasistatic examples from his research in this book.
- My colleague Dr. Alfons Langstrof, who deserves very special thanks, since he made it possible for me to carry out a substantial part of the research at home² by his help with the installation of a workstation and a PC.
- My students who worked on senior or diploma theses, thus making essential contributions to the success of my postdoctoral thesis: Ralf Ehmann, Michael Hilgner, Dr. Bernd Krietenstein, Jürgen Nahr, Dr. Philipp Pinder, Oliver Podebrad, Michael Sommer, Dr. Martin Witting. Working with them was a real pleasure and gave an important motivation to start a university career.
- Dr. Oliver Claus and Dr. Hans-Joachim Kloes from the Institute of High Voltage Engineering of the Darmstadt University of Technology for a fruitful cooperation in the research on contaminated high voltage insulators.
¹ The German version of this book was the Habilitationsschrift of the author (author's postdoctoral thesis required for qualification as a university lecturer).
² Being able to take care of the children.
- My colleagues from DESY³ in Hamburg for valuable discussions and for letting me use several drawings concerning the SBLC study: Dr. Michael Drevlak, Dr. Norbert Holtkamp, Dr. Martin Dohlus, and Dr. Rainer Wanzenberg.
- My colleagues Dr. Peter Hülsmann, Dr. Martin Kurz, and Wolfgang Müller from Frankfurt University for good cooperation in the course of the 36-cell experiment.
- Dr. Bernhard Steffen from the Forschungszentrum Jülich for important ideas and discussions on the development of the presented multigrid algorithm.
- Prof. Dr. Owe Axelsson and Dr. Maya Neytcheva for their interest in my problems and for a lot of information on the solution of complex linear systems. The author is indebted to them for the material on the algorithm which is called the 'C-to-R algorithm' in this book.
- The reviewers and the language editor Ms. Olga Holtz of this book as well as Prof. Tasche, Lübeck University, and Dr. Gisela Pöplau, Rostock University, for valuable comments.
- Dr. Dirk Hecht, Rostock University, for careful reading of the manuscript.
- Mrs. Brigitte Lalk from Rostock University for redrawing many figures in the book.
- The Deutsche Forschungsgemeinschaft (DFG) for a postdoctoral scholarship, which enabled me to have optimal working conditions. In this context, the author would also like to thank her student assistants Christian Wengerter and Michael Hilgner for their enthusiasm and their work.
- Finally, the author wishes to thank all who created a suitable work climate in her private life. In particular, thanks are due to my friend Claudia Dicken-Hahne, who often helped out and lovingly cared for my children.
- Very special thanks go to my children Jan and Viola for many weekends away from their mother and to my husband Gereon for his support in every respect. My parents deserve thanks for my education, which let me undertake unusual journeys through life.
³ Deutsches Elektronen-Synchrotron
Overview
[Figure: organization of the book. Chapter 1, "Classical Electrodynamics" (Maxwell's equations and their analytical solution), leads into Chapter 2, "Numerical Field Theory" (Maxwell's equations and their numerical solution), which treats one representative of the semi-analytical methods, the mode matching method, and two representatives of the discretization methods, the Finite Element Method and the Finite Integration Technique. These feed into Chapter 3, "Numerical Treatment of Linear Systems" (direct, stationary, and instationary methods), whose solvers are then applied in Chapter 4, "Applications from Electrical Engineering", and Chapter 5, "Applications from Accelerator Physics".]

Figure 0.1. Organization of the book.
Introduction
Linear systems are found in all kinds of scientific disciplines once a linear relation between a system of unknowns is formulated. In addition, nonlinear relations are often linearized in order to simplify their solution.

Classical electrodynamics deals with macroscopic electric and magnetic phenomena. These experimentally observed phenomena were formally described by James Clerk Maxwell (1831-1879) in Maxwell's equations. Time-varying electric fields cause magnetic fields and vice versa; therefore the general term electromagnetic fields is used. Maxwell's equations form the axiomatic basis of electrodynamics, analogously to Newton's axioms for mechanics. For very simple geometrical structures, Maxwell's equations can be solved analytically in closed form. (Analytical methods are reviewed in section 1, "Classical Electrodynamics".)

The goal of this book is to bring together demanding applications from numerical field theory and modern methods from numerical mathematics in order to make the solution of the problems as efficient as possible. Furthermore, practical properties of the numerical methods are studied for examples of practical relevance. For this purpose, various field problems are solved. Some of them are parts of bigger projects that require semi-analytical methods on the one hand and some discretization method on the other. When choosing an appropriate method for the solution of linear systems, one cannot directly transfer to practice the convergence properties that have been derived theoretically, since practical problems generally show a diversity of additional difficulties that are simply not typical for model problems. We will see examples of this. Moreover, complex, non-Hermitian systems, often indefinite or even nearly singular, are typical for field theory. Some recently developed methods for this kind of system have been studied from the point of view of their practical suitability for field theoretical problems.
Calculation of Electromagnetic Fields

A variety of methods has been developed in the past to compute electromagnetic fields in practical applications. Some representatives of these methods are described in section 2, "Numerical Field Theory": the mode matching method, the Finite Element Method, and the Finite Integration Technique are
treated in more detail. They are just specimens of larger classes of schemes. Essentially, we have to distinguish between semi-analytical methods, discretization methods, and lumped circuit models. The semi-analytical methods and the discretization methods start directly from Maxwell's equations.

Semi-analytical methods concentrate on the analytical level: they use a computer only to evaluate expressions and to solve the resulting linear algebraic problems. The best known semi-analytical methods are the mode matching method, which is described in subsection 2.1, the method of integral equations, and the method of moments. In the method of integral equations, the given boundary value problem is transformed into an integral equation with the aid of a suitable Green's function. In the method of moments, which includes the mode matching method as a special case, the solution function is represented by a linear combination of appropriately weighted basis functions. The treatment of complex geometrical structures is very difficult for these methods or only possible after geometric simplifications: in the method of integral equations, the Green's function has to satisfy the boundary conditions; in the mode matching method, it must be possible to decompose the domain into subdomains in which the problem can be solved analytically, thus allowing one to find the basis functions. Nevertheless, there are some applications for which the semi-analytical methods are the best suited solution methods. For example, an application from accelerator physics used the mode matching technique (see subsection 5.4). This method leads to full complex matrices of order one to two hundred (compare subsection 2.4), which can be solved very well by means of the direct solution methods described in subsection 3.1.

The second class of methods employs local difference equations obtained after suitable discretization.
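As an illustration of the preceding remark, full complex matrices of moderate order are handled effortlessly by a direct LU-based solver. The following NumPy sketch is not from the book; a random matrix merely stands in for an actual mode matching system:

```python
import numpy as np

# Illustrative only: a random dense complex system stands in for a
# mode-matching matrix of order ~200 (the real matrices arise from
# field matching at waveguide junctions, not from random data).
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# np.linalg.solve performs an LU decomposition with partial pivoting;
# for full matrices of this size the O(n^3) cost is negligible.
x = np.linalg.solve(A, b)

residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

For systems of this size the relative residual is at the level of machine precision, which is why direct methods are the natural choice here.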
The best known discretization methods are the Finite Element Method and the Finite Difference Method. The Finite Element Method is treated briefly in subsection 2.2. For numerical field computation, various Finite Element formulations are used. Some of these methods start from the Poisson, wave, or Helmholtz equation. Yet these second-order partial differential equations follow only if one imposes restrictions on the material parameters ε, μ, and κ. Therefore, numerical solution methods based on these partial differential equations can only be applied to domains composed of a few subdomains, each with linear, homogeneous, and isotropic material. The differential equations are then solved for a vector potential and/or a scalar potential or for the vector fields E or H. The domain decomposition, which is performed to allow the treatment of piecewise homogeneous material filling, requires a complicated coupling of the equations at all junctions. A very adequate Finite Element formulation for electromagnetic problems is given by the edge element technique.

A method that is particularly well suited for field theoretical problems is the Finite Integration Technique (FIT for short), which was developed by Weiland and is directly based on Maxwell's equations in integral form. The
Finite Integration Technique is described in subsection 2.3. It consistently transforms Maxwell's equations into a system of linear algebraic equations, the so-called Maxwell Grid Equations. Depending on the problem type, linear systems or eigenvalue problems have to be solved or a time-domain integration has to be performed. Subsection 2.4 describes the resulting linear systems. The matrices are sparse with a fixed band structure; their order is typically several hundred thousand or several million. Besides real positive semi-definite matrices, complex non-Hermitian, partly indefinite systems also have to be solved. In particular, complex symmetric matrices are typical for field theoretical problems. Generally, the chosen discretization directly implies special characteristics of the resulting matrices, such as their sparsity and their special filling pattern (e.g., band structure in FIT). Thus, looking at the difference equations, i.e., the connections among unknowns in the underlying grid, already gives an idea of the resulting matrix structure.
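To make the band structure concrete, the following sketch builds the standard 5-point Laplacian on a small Cartesian grid. It is a generic stand-in for the banded matrices that a Cartesian-grid scheme such as FIT produces for static problems, not code from the book:

```python
import numpy as np

def laplacian_2d(n):
    """Standard 5-point discretization of the negative Laplacian on an
    n-by-n interior grid of the unit square."""
    # 1D second-difference matrix: tridiagonal with 2 on the diagonal.
    T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)
    # Kronecker sum gives the 2D operator; lexicographic numbering of
    # the grid points yields the fixed band structure (bandwidth n).
    return np.kron(I, T) + np.kron(T, I)

K = laplacian_2d(4)   # 16x16 matrix for a tiny 4x4 grid
# K is symmetric, has 4 on the diagonal, and at most five nonzeros per
# row: the sparsity pattern mirrors the 5-point difference stencil.
```

In practice such matrices are of course stored in a sparse format rather than as dense arrays; the dense construction above only serves to display the pattern.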
Modeling of Numerical Field Problems

The application of the Finite Integration Technique to field theoretical problems is briefly described in subsection 2.3. In the course of this, the modeling of stationary current fields, stationary temperature problems, electro-quasistatics, and problems with time-harmonic excitation arises as a new area of application of the Finite Integration Technique. Electro-quasistatics is generally hardly known; its area of applicability and its modeling are therefore treated in some more detail. Also new is the discussed possibility to consistently solve coupled temperature problems by the Finite Integration Technique. Analogously to electrostatics, the stationary current and temperature problems lead to Poisson's equation and to real positive semi-definite matrices. A formulation with a complex scalar potential is used for electro-quasistatics; this yields Poisson's equation with complex quantities and complex symmetric matrices. Problems with time-harmonic excitation yield complex indefinite systems, which may even become singular, depending on the excitation frequency. After suitable modeling, the numerical problems from field theory thus often reduce to solving large sparse linear systems of equations
Ax = b. The type of the system matrix A varies from real, symmetric, positive definite to complex, non-Hermitian, indefinite, and nearly-singular (see subsection 2.4). The matrices are usually sparse, but some methods in numerical field theory give rise to full matrices. Complex matrices may be rewritten as twice as large real matrices in order to apply methods for real linear systems. However, this procedure cannot be recommended from the numerical point of
view, since the condition of the corresponding real matrix is much worse than that of the complex matrix. The complex non-Hermitian system matrices are particularly typical for electrical engineering, while they rarely occur in other fields.

Numerical Treatment of Large Linear Systems - Theory and Practice
The most important numerical methods for the solution of linear systems of equations are briefly introduced in section 3, "Numerical Treatment of Linear Systems". The focus of that section is on several more recent iterative methods. The direct methods such as Gaussian elimination are only briefly presented in subsection 3.1, since they belong to the "classics" that can be found in most elementary textbooks on numerical mathematics. They are appropriate for full matrices, like those occurring, for example, in the mode matching technique. Iterative methods are applied to large sparse matrices, like those typical for discretization methods.

The iterative methods may be divided⁴ into two groups: the classical iterative methods, which are described in subsection 3.2, and the Krylov subspace methods described in subsection 3.4. In addition, there also exist methods such as the multigrid methods, which are described in subsection 3.6. The Jacobi, Gauß-Seidel, and SOR methods are representatives of the classical iteration methods. They are easy to understand and implement. Significantly more efficient are the modern Krylov subspace methods and the multigrid methods, yet their theoretical analysis is significantly more difficult.

The Krylov subspace methods described in subsection 3.4 form a group of closely related methods. Historically, the Lanczos method published in 1950 and the conjugate gradient method (cg method for short) published by Hestenes and Stiefel in 1952 form the basis of the subsequent evolution of the Krylov subspace methods. With these algorithms, the solution of a linear system is achieved by minimization of a residual functional. The iterates are related to the initial residual by multiplication by a polynomial in the system matrix, i.e., the minimization takes place over special vector spaces, the so-called Krylov spaces. These give rise to a sequence of orthogonal or conjugate vectors.
Conjugacy means orthogonality with respect to an inner product with a weighting matrix. The important advantages of the Krylov subspace methods are their relatively high convergence rate, which may be increased even further by various preconditioning techniques; their independence of any parameters, which makes estimations such as those used for the relaxation parameter in the SOR method unnecessary; their acceptable storage requirements; their low computation
⁴ Yet another possible division, into stationary and non-stationary methods, is given in section 3.
times per iteration; and their good rounding error properties. Yet, for indefinite or non-symmetric matrices, these methods may become unstable. Therefore, generalized cg methods in many different versions have been developed since the end of the 1970s, applicable also to non-symmetric and/or indefinite systems. Often, only a clever combination of preconditioning and a generalized cg method yields the wanted robustness. Nevertheless, the Krylov subspace methods are still an active area of research. There is a large number of recent publications, which in particular deal with the application of these methods to non-Hermitian linear systems. Unfortunately, for non-Hermitian systems Ax = b with A ∈ C^{n×n}, A ≠ A^H, the robustness of the Krylov subspace methods is not sufficient to use them as black box solvers. There are enough examples of system matrices A for which the presented methods either do not reach the prescribed accuracy or do not converge at all. Some of these examples are of great importance in practice, so thorough numerical experiments are always needed in order to decide which solution method is suitable for a given problem.

Another important class is formed by the Minimal Residual methods described in subsection 3.5, which are more stable than the cg-like methods and in particular demonstrate absolutely monotone convergence. But to compete with the other methods, they usually require (even in versions with restarts) the storage of a large number of basis vectors. Taking into account the typical size of the applications, the use of the Minimal Residual methods often makes no sense because of bad storage efficiency. Hybrid methods combine the cg or BiCG method or the Look-Ahead Lanczos method with a Minimal Residual formulation, especially with the GMRES algorithm. This way, the advantages of the short recursions in the cg and Lanczos methods are combined with the stable and monotone behaviour of the Minimal Residual methods.
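For reference, the basic cg method mentioned above is only a few lines long. The following is a textbook sketch in the Hestenes-Stiefel form, without preconditioning; it is not the implementation used in the book:

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient method for a symmetric (or Hermitian)
    positive definite matrix A.  Minimal sketch, no preconditioning."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # first search direction
    rs = np.vdot(r, r)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / np.vdot(p, Ap)      # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        if np.sqrt(rs_new.real) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p        # make p conjugate to earlier directions
        rs = rs_new
    return x

# Tiny usage example on a 2x2 symmetric positive definite system.
A_spd = np.array([[4.0, 1.0], [1.0, 3.0]])
b_rhs = np.array([1.0, 2.0])
x_sol = cg(A_spd, b_rhs)
```

Only one matrix-vector product and a handful of vector operations are needed per step, and only a few vectors must be stored; this short recursion is exactly what the hybrid methods try to preserve.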
These methods are briefly treated in subsection 3.5. The resulting algorithms are particularly well suited for the solution of complex non-Hermitian systems of equations.

Originally, the multigrid methods, which are described in subsection 3.6, were developed as a construction principle for fast solvers of Poisson's equation. The evolution of the multigrid methods began in the early 1960s with publications of Fedorenko and Bakhvalov. Based on these publications, Brandt, who recognized the great efficiency of the multigrid methods, started his investigations in 1972. Independently of these studies, Hackbusch developed his multigrid algorithms, which he first published in 1976. Recently, the multigrid methods have acquired additional importance as so-called multilevel preconditioners, and a variety of different views and derivations has appeared. Recent noteworthy developments are Griebel's representation of the multigrid methods and multilevel preconditioners as classical iterative methods over generating systems, and the cascade methods of Deuflhard. The observation of the following typical properties of the classical iterative methods applied to linear systems as they arise as a result of the
discretization of partial differential equations governed the development of the multigrid methods: The classical iterative methods provide a very good smoothing of the error in only a few iteration steps. But with increasingly finer discretization (h → 0), their rate of convergence decreases, and their total error decreases only insignificantly after the smoothing of the high-frequency error components. The multigrid methods also reduce the low-frequency error components and have a rate of convergence which is nearly independent of the step size h of the discretization. In general, multigrid methods are especially well suited for linear boundary value problems of elliptic partial differential equations as well as for elliptic eigenvalue problems. The typical advantages of these methods, making them superior to the others, are their speed and robustness. Axelsson developed a special method, described in subsection 3.9, based on an equivalent real formulation for complex linear systems. This method avoids the well-known procedure of solving the twice as large real system instead of the complex system, for which the condition number of the real system equals the square of that of the original system. Axelsson's method is described in this book. An efficient preconditioning and an iterative method for the solution of the real subsystem are recommended. In this context, the Chebyshev method is used, which is an acceleration method for classical fixed point methods. Instead of using only the information of the last iteration step, a linear combination of the already computed approximate solutions is used. The coefficients are chosen so that faster convergence is reached than for the original sequence of approximate solutions. In this process, the minimax properties of the Chebyshev polynomials are used. For the iterative methods, some convergence studies from numerical experiments are presented in subsection 3.10.
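The interplay of smoothing and coarse-grid correction described above can be sketched on the standard model problem, the 1D Poisson equation. This is a generic two-grid illustration with damped Jacobi smoothing and an exact coarse solve; grid size, smoothing counts, and transfer operators are illustrative choices, not the algorithms of this book.

```python
import numpy as np

def jacobi_smooth(u, f, h, nu, omega=2/3):
    """Damped Jacobi smoothing for -u'' = f on a uniform 1D grid."""
    for _ in range(nu):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (-u[:-2] + 2 * u[1:-1] - u[2:]) / h**2
    return r

def two_grid(u, f, h, nu=3):
    """One two-grid cycle: pre-smooth, coarse-grid correction, post-smooth."""
    u = jacobi_smooth(u, f, h, nu)
    r = residual(u, f, h)
    rc = r[::2].copy()                     # restriction by injection
    n_c = len(rc)
    # solve the coarse problem exactly (small tridiagonal system, spacing 2h)
    Ac = (2 * np.eye(n_c - 2) - np.eye(n_c - 2, k=1) - np.eye(n_c - 2, k=-1)) / (2 * h)**2
    ec = np.zeros(n_c)
    ec[1:-1] = np.linalg.solve(Ac, rc[1:-1])
    # linear prolongation of the coarse-grid correction back to the fine grid
    e = np.interp(np.arange(len(u)) / 2.0, np.arange(n_c), ec)
    u += e
    return jacobi_smooth(u, f, h, nu)

n = 129                                   # 2^7 + 1 points, so the grid coarsens evenly
h = 1.0 / (n - 1)
x = np.linspace(0, 1, n)
f = np.pi**2 * np.sin(np.pi * x)          # -u'' = f with exact solution u = sin(pi x)
u = np.zeros(n)
for _ in range(15):
    u = two_grid(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * x)))
print(err)
```

A few cycles suffice to reduce the error to the level of the discretization error, and the number of cycles needed does not grow as h shrinks, which is exactly the h-independent convergence rate claimed above.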
In particular, results for real applications are presented in addition to some purely academic examples, which are still relatively closely related to the model problems usually used for theoretical convergence studies. The central question, to what extent the theoretical convergence results may be transferred to practical applications, is answered for several examples. Often, these practical problems are very large and have geometrical singularities. Not rarely, the methods show a different convergence behaviour for these practical applications than could be expected theoretically: For special problems, even hybrid methods, for example the BiCGSTAB method, demonstrate significantly worse convergence than expected; they even stagnate quite often. Another example of this fact is given by the multigrid methods: In theory, they provide an excellent method for the iterative solution of large linear systems. Yet, the theoretical convergence studies are almost all limited to Poisson's equation on the unit square or similar very simple domains. For the practical application to other partial differential equations and/or to domains with arbitrary convex edges, some problems may arise which require a creative solution before the favourable properties of the multigrid algorithm can be attained.
Sometimes, these are then only roughly comparable with theory, as is shown in subsection 3.10.
Practical Applications in Electrical Engineering and Accelerator Physics

The studies on the solution of linear systems in numerical field computation were carried out for examples of practical relevance, as described in section 4, "Applications from Electrical Engineering", and section 5, "Applications from Accelerator Physics". In the course of this, we went far beyond the mere solution of a linear system. In high-voltage engineering, the fields on contaminated high-voltage insulators were modelled. In accelerator physics, the physical problem of parasitic fields in accelerator structures was studied. Besides these two very comprehensive problems, a series of other applications was investigated. In particular, computations for coupled temperature problems were also presented.
Field Computation for Various Applications from Electrical Engineering

In section 1, a classification of electromagnetic field problems was given and taken up again in subsection 2.3. In section 4, several application problems from these problem classes are investigated: As an electrostatic example, the electric field in a plug is calculated with two different solvers. Results are shown in subsection 4.1. For both magnetostatic examples which were treated in subsection 3.10, the C-magnet and the current sensor, field representations are shown in subsection 4.2. For a velocity sensor, different solvers are compared, including also a black box multigrid solver. Results for a nonlinear calculation of another C-magnet are also described there, showing the necessity of fast solvers. As examples of stationary current and coupled problems, a simple Hall element, a semi-conducting cube, and a circuit breaker are presented. The temperature distribution on a board serves as an example of stationary temperature computation. Furthermore, four examples of coupled temperature computation are presented in subsection 5.6: inductive soldering and the temperature distributions in an rf-cavity, an rf-window, and a waveguide with load. Some test specimens of humid high-voltage insulators are considered as typical examples of electro-quasistatics (also see remarks below). As an example of magneto-quasistatics, simulation results for a TEAM benchmark problem are shown. Time-harmonic problems can be divided into problems with excitation and eigenvalue problems. Simulation results are shown for the two examples which were studied in subsection 3.10: the 3dB waveguide coupler and the microchip. Another application problem is given by inductive soldering, an
eddy current problem with coupled thermal computation which is presented in subsection 5.6. Eigenvalue problems are treated in section 5, especially in subsection 5.4. Some of the above-mentioned examples could also be solved as general time-dependent problems. Other examples of those are the rf-window and the waveguide with load shown in subsection 5.6.

Field Simulation for High Voltage Insulators under Environmental Damage

Field theoretical simulation intended to optimize the design of high voltage insulators should also include environmental influences which affect the insulator material. The modelling and implementation of electro-quasistatics with the Finite Integration Technique is described in subsection 2.3. It enables the simulation of humid or contaminated high voltage insulators, on which discharges may occur. Hitherto, mainly electrostatic simulation was used for this problem. Yet, it is easy to show that a significant difference exists between the results of the two formulations. Electro-quasistatics is the appropriate model, as is also supported by computations for the examples treated in subsection 4.5.

A Special Application: Modes in Accelerating Structures for Linear Colliders

In the future, such high energies (500 GeV to 1.5 TeV) will be needed in e+e- physics that storage rings will no longer be a realistic possibility because of their energy losses from synchrotron radiation. As a result, several linear collider projects are currently pursued worldwide. They are discussed in subsection 5.2. For the S-band 2 x 250 GeV linear collider project SBLC, some investigations were carried out. The SBLC project proposes 2452 so-called constant gradient structures to accelerate the elementary particles. These aperiodic traveling wave tubes shall have 180 cells and an accelerating gradient of 17 MV/m. A so-called bunch train of 125 bunches is proposed, with a distance of 16 ns from bunch to bunch.
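As a rough back-of-the-envelope consistency check of the SBLC numbers quoted above, the quoted structure count, cell count, and gradient can be combined. The 3 GHz S-band frequency and a 2π/3-mode cell length of one third of a wavelength are assumptions made here for illustration; they are not stated in the text.

```python
# Hypothetical consistency check of the SBLC parameters (illustrative only).
c = 2.99792458e8                       # speed of light in m/s
f = 3.0e9                              # assumed S-band frequency in Hz
wavelength = c / f                     # ~0.1 m
cell_length = wavelength / 3.0         # assumed 2*pi/3 mode: lambda/3 per cell
structure_length = 180 * cell_length   # 180 cells per structure
active_length = 2452 * structure_length
energy_gain_eV = active_length * 17.0e6   # 17 MV/m accelerating gradient
print(structure_length, active_length, energy_gain_eV / 1e9)
```

Under these assumptions one structure is about 6 m long, the total active length about 15 km, and the energy gain close to the quoted 250 GeV per beam.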
In order to reach as high a luminosity as possible, each bunch must be prevented from spreading. The beam dynamics is treated in subsection 5.3: Effects of scattered fields caused by parasitic modes are among the main reasons for the bunch spreading. Consequently, the suppression of these modes, the so-called Higher Order Modes (HOMs), is of fundamental importance for the actual collider design. The intensity of the interaction of the particles with the higher order modes varies greatly. It can be expressed by the so-called loss parameter. From experience and from some preliminary theoretical and numerical investigations, it could be assumed that the modes of the first dipole band would cause the worst blow-up effects. Computations for this dipole band were therefore of main interest.
The development of a program based on the mode matching technique is briefly described in subsection 2.1 and subsection 5.4. In particular, it can be used for the field computation of parasitic modes in long accelerating structures. Results from field computations for a typical accelerating structure are presented. The investigated structure showed many modes which strongly interact with the particle beam and which are trapped inside the structure. Such unwelcome modes have to be suppressed as much as possible in order to maintain the stability of the beam. The "36-cell experiment" described in subsections 5.5 - 5.5.4 was done with a relatively short structure for measurement studies of higher order modes and their damping.

Coupled Temperature Problems in Accelerators

The design of many technical components requires the investigation of their electro- or magneto-thermal behaviour. The Finite Integration Technique proved to be a consistent numerical method for the computation of electromagnetic fields. In order to allow the simulation of coupled thermal problems, FIT was applied to the calculation of stationary temperature problems, as is described in subsection 2.3. This procedure guarantees the consistency of coupled calculations. For this purpose, a material of prescribed temperature can serve as a heat source, or a heat source with given density or emission can be chosen. Thus, in particular, the heating resulting from wall losses of modes and the heating by Joule's energy caused by eddy currents can be evaluated completely inside a modular program package based on FIT. Several examples of practical applications from accelerator physics are presented in subsection 5.6.
1. Classical Electrodynamics
Classical electrodynamics treats macroscopic electrical and magnetic phenomena. These experimentally observed phenomena were formally described by James Clerk Maxwell (1831 - 1879) in Maxwell's equations [172]. Maxwell's equations build the axiomatic basis of electrodynamics, analogously to Newton's axioms for mechanics. This chapter discusses Maxwell's equations and means for their analytical solution. Rather than aspiring to a comprehensive treatise, it contains the discussion of only those topics which are important for the main themes of this book. Extensive treatment can be found in many textbooks, e.g., [142], [251], or [168].
1.1 Maxwell's Equations

Maxwell's equations are the fundamental equations of electrodynamics. Time-varying electric fields cause magnetic fields and vice versa. Therefore, one uses the term electromagnetic fields. Their macroscopic behaviour is described by Maxwell's equations.
Maxwell's Equations in Differential Form. Maxwell's equations [172] reflect the relations between the four characteristic quantities of electromagnetic fields. These quantities are:

E [V/m] electric field strength
H [A/m] magnetic field strength
D = ε₀E + P [C/m²] = [As/m²] electric flux
B = μ₀(H + M) [T] = [Vs/m²] magnetic flux
Often the electric flux is also referred to as the electric displacement and the magnetic flux as the magnetic induction. P is called the electric polarization density (shortly: polarization). It is connected to certain charge displacements in the molecules of the considered material and a resulting change of the field. The polarization describes the vector sum of all dipoles with respect to the volume in the presence of exterior fields (the macroscopically averaged electric dipole density of the material system). Strictly speaking, the electric quadrupole moment and higher order terms have to be added, but in most media they
are negligible [142]. M is called magnetization and describes the macroscopically averaged magnetic "dipole density" of the material system. In homogeneously polarized materials, P = ε₀χₑE, where χₑ is the dielectric susceptibility. For linear materials, D = εE with ε = ε₀(1 + χₑ). Here the dielectric constant of vacuum (also: influence constant) equals ε₀ = 8.854187 · 10⁻¹² As/Vm, and ε_r := 1 + χₑ is referred to as relative permittivity. In diamagnetic media, the relation M = χ_mH holds, with the magnetic susceptibility χ_m. In magnetically linear materials, we can write B = μH with μ = μ₀(1 + χ_m). Here μ₀ = 1.256 · 10⁻⁶ Vs/Am equals the permeability of vacuum (also: induction constant), and μ_r := 1 + χ_m is called relative permeability. In general, the permittivity ε = ε_rε₀ and the permeability μ = μ_rμ₀ are tensors depending on time, space, and field. They are scalars in linear isotropic media. In vacuum, they are constant (see above), satisfying the condition c = 1/√(ε₀μ₀), where c = 2.99792458 · 10⁸ m/s is the velocity of light. Another field quantity is the current density J with unit A/m²:

J = J_L + J_E + J_K = κE + J_E + δ grad ρ.

J_L = κE is the conduction current density arising in materials with electric conductivity κ (unit 1/Ωm) from the subsisting field strength. J_E represents an impressed current density which is independent of all field forces. The convection current density J_K = δ grad ρ is the density of a current of free electrical charges with the electric charge density ρ (unit As/m³). The proportionality constant δ is called the diffusion constant. Thus all quantities appearing in Maxwell's equations are introduced. Maxwell's equations for static media are then given by
curl E = -∂B/∂t    (1.1)
curl H = ∂D/∂t + J    (1.2)
div D = ρ    (1.3)
div B = 0.    (1.4)
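The relation c = 1/√(ε₀μ₀) between the vacuum constants introduced above can be checked numerically. This small sketch is purely illustrative and uses the rounded values quoted in the text.

```python
import math

# Numerical check of c = 1/sqrt(eps0 * mu0) with the constants given above.
eps0 = 8.854187e-12   # As/Vm, permittivity of vacuum
mu0 = 1.256e-6        # Vs/Am, permeability of vacuum (rounded value)
c = 1.0 / math.sqrt(eps0 * mu0)
print(c)              # close to 2.99792458e8 m/s
```

With the rounded value of μ₀, the result agrees with the speed of light to better than one percent.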
So, they are coupled partial differential equations of first order.

Maxwell's Equations in Integral Form. Equivalent to Maxwell's equations in the form of partial differential equations (1.1) - (1.4) is their representation in integral form. Let A be a 2-dimensional domain with boundary ∂A. Upon integrating (1.1) and (1.2) over A and using Stokes' theorem, we can rewrite the first two Maxwell's equations as
∮_∂A E · ds = -∫_A ∂B/∂t · dA    (1.5)
∮_∂A H · ds = ∫_A (∂D/∂t + J) · dA.    (1.6)
Now let V be a 3-dimensional domain enclosed by the surface ∂V. Integrating (1.3) and (1.4) over V and using Gauss' theorem, we rewrite the third and fourth Maxwell's equations as
∮_∂V D · dA = ∫_V ρ dV    (1.7)
∮_∂V B · dA = 0.    (1.8)
The continuity equation and the theorem on the conservation of charges,

div J + ∂ρ/∂t = 0,    ∮_∂V J · dA + d/dt ∫_V ρ dV = 0,    (1.9)
follow from Maxwell's equations. To get the continuity equation, apply div to both sides of (1.2), take into account that div curl H ≡ 0, interchange the order in the mixed derivative with respect to space and time, and use (1.3). The conservation law for charges follows from the continuity equation by integration and Gauss' theorem. This law states that the total charge in a volume decreases if and only if the corresponding current flows out at the same time. The solution of Maxwell's equations depends on the problem. The different permissible assumptions lead to a whole series of different types of differential equations and hence to different solution approaches, which will be discussed later.
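The identity div curl H ≡ 0 used in the derivation of the continuity equation can be verified symbolically. The following sketch, using sympy, is purely illustrative and not part of the text.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# an arbitrary smooth vector field H = (Hx, Hy, Hz)
Hx = sp.Function('Hx')(x, y, z)
Hy = sp.Function('Hy')(x, y, z)
Hz = sp.Function('Hz')(x, y, z)

# components of curl H
cx = sp.diff(Hz, y) - sp.diff(Hy, z)
cy = sp.diff(Hx, z) - sp.diff(Hz, x)
cz = sp.diff(Hy, x) - sp.diff(Hx, y)

# div curl H vanishes identically because mixed partials commute
div_curl = sp.diff(cx, x) + sp.diff(cy, y) + sp.diff(cz, z)
print(sp.simplify(div_curl))
```

All mixed second derivatives cancel pairwise, so the expression simplifies to zero for any smooth field.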
1.2 Energy Flow and Processes of Thermal Conduction

The energy conservation law states that energy may neither be generated nor annihilated. However, energy can be transformed. All forms of energy (heat, work, mechanical energy, electrical energy, chemical energy, sound, and light) are equivalent to each other. In our context, we are mainly interested in the transformation of electrical energy into heat.

1.2.1 Energy and Power of Electromagnetic Fields
Electromagnetic energy is transported during the propagation of electromagnetic fields. Its spatial distribution in static, isotropic, electrically and magnetically linear media is representable by the electric and magnetic energy densities

w_e = (1/2) E · D,    w_m = (1/2) H · B.    (1.10)

Electric energy and magnetic energy are coupled with each other and cause an energy flux. The density of this energy current is represented by the vector
S = E × H.

The vector S, measured in W/m², is referred to as the Poynting vector. Instead of the term 'energy flux', one often uses 'power density'. The conservation of energy is described by Poynting's theorem, which follows from Maxwell's equations:

div S = -E · J - E · ∂D/∂t - H · ∂B/∂t.

In case of isotropic, linear, homogeneous media, the last formula reduces, by (1.10), to

∂w_e/∂t + ∂w_m/∂t + div S = -(J_L + J_E) · E.

The rate of change of the electromagnetic energy plus the emerging power equals the opposite of the performed work. The work performed per unit of space and time, (J_L + J_E) · E, describes the conversion of electromagnetic energy into thermal energy and is called Joule's heat. Therefore, thermal effects are also of interest in connection with the conversion of electrical power into Joule's heat.
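For a plane wave in vacuum, the Poynting vector can be evaluated directly from S = E × H. The field amplitude and the field orientation in the following sketch are arbitrary illustrative choices.

```python
import numpy as np

# Poynting vector for a vacuum plane wave: E along y, H along z,
# so S = E x H points in x with |S| = E*H = E^2 / Z0.
Z0 = np.sqrt(1.256e-6 / 8.854187e-12)   # wave impedance of vacuum, ~377 Ohm
E_amp = 100.0                            # illustrative field amplitude in V/m
E = np.array([0.0, E_amp, 0.0])
H = np.array([0.0, 0.0, E_amp / Z0])
S = np.cross(E, H)
print(S)                                 # power density in W/m^2, along x
```

The power density of roughly 27 W/m² flows in the propagation direction, perpendicular to both fields.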
1.2.2 Thermal Effects

In general, there are three ways in which thermal energy can be transmitted from one thermodynamic system to another: thermal flow (also called thermal conduction), convection, and thermal radiation. In gases and liquids, even though thermal conduction takes place, convection plays the main role in the thermal transport. In the following, we are mainly interested in thermal conduction.
Thermal Conduction. Transport of thermal energy by thermal conduction happens through intermolecular momentum exchange. In isotropic materials, the thermal flux J_w (unit W/m²) is described by Fourier's law¹:

J_w = -κ_T grad T    (1.11)

Fourier's law states that the thermal current always flows in the direction of decreasing temperature, namely perpendicularly to the isothermal lines². The proportionality factor κ_T has the dimension W/(m·K) and is the thermal conductivity of the material (the notation λ is often used instead of κ_T). In general, the thermal conductivity depends on the temperature T and the pressure of the material. For good electric conductors, the thermal conductivity κ_T is proportional to the electric conductivity κ_E:

¹ This relation is only an approximation formula, but for some important materials such as metals it is sufficiently good.
² Isotherms are lines connecting neighbouring points of equal temperature (cf. contour lines).
κ_T / κ_E = (π²/3) (k²/e²) T.    (1.12)
This relation is called the Wiedemann-Franz law; here k = 1.38 · 10⁻²³ J/K denotes the Boltzmann constant and e = 1.60 · 10⁻¹⁹ As the charge of the electron. Now, consider a heat source inside an infinitesimally small volume element dV. The first law of thermodynamics states that the change of the inner energy u(T) equals the energy supplied by the internal heat source with density w, diminished by the thermal flow through the surface of the volume element:

(∂u/∂t) dV = w dV - ∮ J_w · dA.    (1.13)

By Taylor expansion of first order and use of Fourier's law, we get

∂u/∂t = w + div(κ_T grad T).    (1.14)
For a stationary heat flow, i.e., if constant sources have produced an equilibrium state, (1.14) reduces to

div(κ_T grad T) = -w.    (1.15)

The divergence of the temperature gradient weighted with the thermal conductivity is therefore proportional to the density of the heat source. Since the gradient of the temperature field is a priori curl-free (curl grad ≡ 0), the case of stationary temperature fields presents a problem which is formally identical to the problem of electrostatics, which is treated in the next subsection.
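In one dimension with constant conductivity, (1.15) reduces to κ_T T'' = -w and can be solved by finite differences. The conductivity, source density, and boundary conditions in the following sketch are illustrative choices; the problem has a simple closed-form solution for comparison.

```python
import numpy as np

# Finite-difference sketch of div(kT grad T) = -w in 1D:
# kT * T'' = -w on (0, 1) with T(0) = T(1) = 0 (illustrative values).
kT = 1.5      # thermal conductivity in W/(m K)
w = 100.0     # heat source density in W/m^3
n = 101
h = 1.0 / (n - 1)
x = np.linspace(0, 1, n)

# assemble -kT * T'' = w with homogeneous Dirichlet boundaries
A = kT * (2 * np.eye(n - 2) - np.eye(n - 2, k=1) - np.eye(n - 2, k=-1)) / h**2
T = np.zeros(n)
T[1:-1] = np.linalg.solve(A, np.full(n - 2, w))

# closed-form solution of this model problem: T(x) = w x (1 - x) / (2 kT)
T_exact = w * x * (1 - x) / (2 * kT)
print(np.max(np.abs(T - T_exact)))
```

Because the exact solution is quadratic, the central difference approximation reproduces it to rounding error; the same discrete structure reappears for the electrostatic potential problems mentioned above.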
1.3 Classification of Electromagnetic Fields The problems on electromagnetic fields can be divided into several classes: electrostatics, magnetostatics, stationary currents, quasistatics, and fast varying fields. An important special case of time varying fields is the case of harmonic time dependence. 1.3.1 Stationary Fields
Stationary fields are time-independent electromagnetic fields E, H, B, D and J. Their field quantities depend on space only. They are generated by static or uniformly moving charges. Magnetic and electric fields are decoupled in the stationary case. Because of the independence of time, we get two completely independent systems of differential equations for them.
16
1. Classical Electrodynamics
Electrostatics. Electrostatic fields only exist in non-conductive regions. Since J_E = 0, J_K = 0, and

κ = 0 ⇒ J_L = κE = 0,

electric and magnetic field are decoupled. Electrostatics considers the system

curl E = 0    (1.16)
div D = ρ.    (1.17)

Equation (1.16) expresses the fact that electrostatic fields are curl-free. Equation (1.17) gives the source density of the generating electric charges. In linear media, we have D = εE. In general, D = ε₀E + P. This material equation is also valid, e.g., for permanently polarized media. Many dielectrics are not isotropic, i.e., the polarization depends on how the direction of the applied electric field relates to the preferential direction of the dielectric. Then the permittivity ε is a symmetric tensor of second order and has nine components (only six of which are independent).
Magnetostatics. Magnetostatics treats time-independent magnetic fields:

curl H = J    (1.18)
div B = 0.    (1.19)

Equation (1.18) gives the curl density of the generating direct current or, equivalently, of the moving electrical charges. Equation (1.19) expresses the fact that there are no magnetic charges. Recall that B = μH and that μ is a symmetric tensor for linear but non-isotropic media. Non-linear materials, which satisfy the equation B = μ₀(H + M), are often considered in magnetostatics. For ferromagnetic media such as iron, the relation between M and H depends on the history of the medium and on its special kind. It has to be found by measurements. The relation between M and H (or between B and H) is displayed by the so-called hysteresis curves. Increasing H from its initial value to the desired value, we obtain the first curve of the hysteresis; decreasing H from the obtained value to the initial value leads to a different curve. Hence the hysteresis loop, whose area describes the losses for re-magnetization. Media with wide hysteresis loops display large losses for the re-magnetization and are called "hard materials". A "soft material" with a narrow hysteresis loop is appropriate for transformers.
Stationary Current Fields. Stationary current fields arise in conductors to which a direct voltage is applied. The system

curl E = 0    (1.20)
div J_L = 0    (1.21)
applies to the direct currents J_L = κE. For the stationary case, equation (1.21) follows directly from the charge conservation law (1.9) with J = J_L. Equations (1.20) and (1.21) demonstrate that the stationary current field is free from curls and sources.

1.3.2 Quasistatic Fields
One can divide time-dependent fields into "slowly" and "fast" varying fields. The quasistatic laws for "slowly" varying fields result from Maxwell's equations if one neglects the magnetic induction or the electric displacement current. Thus, the electromagnetic waves which result from the coupling of magnetic induction and electric displacement current are neglected in electro- and magneto-quasistatics. No derivatives with respect to time occur in the quasistatic equations. However, this does not mean that the sources and therefore the fields themselves are time-independent. On the contrary, the fields are determined by their sources at a given time independently of the state of the sources a moment earlier. In general, the sources are not known a priori but are determined by the fields themselves via the Lorentz force which acts on the particles present in the field. In this context, the time dependencies have to be taken into account again. The "secondary" fields, H in electro-quasistatics (shortly: EQS) and E in magneto-quasistatics (shortly: MQS), may be obtained from the time-dependent equations: the continuity equation in electro-quasistatics and the first Maxwell's equation (1.1) in magneto-quasistatics.

Electro-Quasistatics. Electro-quasistatics gives a reasonable approximation for low frequency fields which can be thought to be free of eddy currents (i.e., the magnetic induction may be neglected), while the effects of the displacement current play an important role. So, we assume

∂B/∂t = 0,    ∂D/∂t ≠ 0.

Then Maxwell's equations amount to

curl E = 0
curl H = ∂D/∂t + J
div D = ρ
div B = 0

with J = J_L + J_E, i.e., assuming no electric charges in motion. For harmonic time dependence, E(r, t) = E(r) cos(ωt + φ), we can use the ansatz

E(r, t) = Re(Ê(r) e^{iωt}),    H(r, t) = Re(Ĥ(r) e^{iωt})    (1.22)
with the complex amplitudes Ê(r) = E(r)e^{iφ} and Ĥ(r), where φ is the phase angle of the cosine function. Ê and Ĥ are often also called phasors. Differentiating and cancelling the factor e^{iωt}, we get

curl Ê = 0    (1.23)
curl Ĥ = iωD̂ + κÊ + Ĵ_E    (1.24)
div D̂ = ρ̂    (1.25)
div B̂ = 0.    (1.26)
Equations (1.23) and (1.25) are sufficient to determine Ê uniquely and are, therefore, the fundamental equations of electro-quasistatics.

Magneto-Quasistatics. For slowly varying fields in which the main role is played by the magnetic flux density, one can neglect the contribution of the displacement currents, i.e.,

max_{r∈ℝ³} |∂D/∂t| ≪ max_{r∈ℝ³} |J|,

where J = J_L + J_E = κE + J_E. The resulting system of equations³ is then
curl E = -∂B/∂t
curl H = J
div D = ρ
div B = 0.
For harmonic time dependence, the ansatz (1.22) may be used. Upon differentiating and cancelling e^{iωt}, Maxwell's equations yield

curl Ê = -iωB̂    (1.27)
curl Ĥ = Ĵ    (1.28)
div D̂ = ρ̂    (1.29)
div B̂ = 0.    (1.30)
(1.28) and (1.30) are sufficient to determine Ĥ uniquely and are, therefore, the fundamental equations of magneto-quasistatics. The continuity condition div Ĵ = 0 follows from (1.28).

Conditions for Quasistatic Fields. To determine whether the quasistatic approximation is a suitable model for a given problem⁴, we may study the

³ Magneto-quasistatics is often referred to as quasistatics in the literature.
⁴ The discretization of electro-quasistatics is described in section 2.3. Furthermore, electro-quasistatics is one main topic of subsections 3.10 and 4.5. Electro-quasistatics and the suitability of the quasistatic approximation for certain problems are mostly discussed very briefly in the literature; electro-quasistatics is often not remarked upon at all. The argumentation here follows [120], where electro- and magneto-quasistatics are treated in great detail.
error fields (satisfying the equations obtained by subtracting the simplified Maxwell's equations from the full ones). The error fields should be small compared to the quasistatic fields. In order to facilitate the discussion of the orders of magnitude, we assume that all space quantities do not change by more than a factor of two. In this case, we can speak of a typical length L of the setup. Here are two examples. For electro-quasistatics, consider a pair of ideally conducting spheres, with radii and distance between the spheres of order L, which are excited by a voltage source. For magneto-quasistatics, consider an ideally conducting loop with an alternating current applied to it. The radius of the loop and the diameter of its cross-section are of order L. If we think of a medium as a combination of an "ideal conductor" and an "ideal insulator", we may use the following rule of thumb to decide whether a problem is electro- or magneto-quasistatic. The frequency of the driving source should be decreased so that the fields become static. If, at this point, the magnetic field vanishes, the field is electro-quasistatic. If the electric field vanishes, the field is magneto-quasistatic. Since many metals are very good conductors and many gases, liquids, and solids are very good insulators, this rule of thumb is not too unrealistic. To estimate the magnitude of the fields, we let L be the typical length of the setup in question. Then the spatial derivatives in the curl and divergence operators can be replaced by 1/L, which implies

E = ρL/ε₀  for EQS    and    H = JL  for MQS,    (1.31)
where E, H, J stand for typical values of E, H, J. For a sine-like excitation, the characteristic time τ of the sine-like solution of the oscillation equation is given by the inverse of the frequency ω. In case of electro-quasistatics, a time-varying charge causes a current, which in turn induces an H-field. In case of magneto-quasistatics, the time-varying currents cause a time-varying H-field, which causes an E-field. Using (1.31), we obtain
H = ε₀EL/τ = ρL²/τ  for EQS    and    E = μ₀HL/τ = μ₀JL²/τ  for MQS.
To get an estimate for the error caused by neglecting the magnetic induction (or the displacement current), the corresponding estimates are substituted into the full Maxwell's equations. Then we get the following estimates for the error fields:

E_error = μ₀ρL³/τ²  for EQS    and    H_error = ε₀μ₀JL³/τ²  for MQS.
An application of (1.31) gives

E_error/E = μ₀ε₀L²/τ²  for EQS    and    H_error/H = ε₀μ₀L²/τ²  for MQS
as an estimate for the relative error caused by using the quasistatic equations instead of the full Maxwell's equations. Electro-quasistatics as well as magneto-quasistatics is based on the assumption of sufficiently slow time-variation (low frequencies) and sufficiently small dimensions, so that

L/c ≪ τ

(recall that c = 1/√(ε₀μ₀) is the speed of light). The quasistatic approximation is therefore valid if an electromagnetic wave can propagate over the characteristic length of the setup in a time which is small compared to the time τ.
One decides which of the two quasistatic approximations to use by comparing the given fields with the fields which would exist in the static case. In our example with the spheres, if the system is excited by a constant voltage source, the spheres are charged and the charge causes an electric field. But in this static borderline case there exist no currents and hence no magnetic field. Therefore, the static system is dominated by the electric field, so it is appropriate to assume the setup is electro-quasistatic even when the excitation varies with time. If a direct current is applied to the loop in the second example, the circulating current will cause a magnetic field, but there exist no charges and therefore no electric field. Thus, magneto-quasistatics is the appropriate approximation. In [120], a circular plate capacitor is given as an example of an electro-quasistatic setup. If this plate capacitor is excited with a frequency of 1 MHz, the quasistatic equations give a good approximation of the actual field as long as the radius of the plate is much smaller than 300 m. The same book contains an overview of other practical applications of quasistatics. Here is a list of some of them: The skin effect in transmission lines is magneto-quasistatic. The processes in transistors and in the picture tube of a television set, which converts signals into picture and sound, are electro-quasistatic. Electric currents in the nerve lines and other electric activities in the brain are electro-quasistatic. Electrical power supply systems give further examples of electro- and magneto-quasistatics. For instance, the generator fields in a power plant are magneto-quasistatic, while most electronics in the control room is electro-quasistatic. The high voltage power transmission system may be regarded as electro-quasistatic. The specification of the insulator function starts off with EQS approximations.
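The 300 m figure for the plate capacitor follows directly from the validity condition L/c ≪ τ with τ ~ 1/f. A short numerical sketch (an illustration, not taken from [120]):

```python
# Length scale for the 1 MHz plate-capacitor example: the quasistatic
# approximation needs L << c * tau with tau ~ 1/f.
c = 2.99792458e8   # speed of light in m/s
f = 1.0e6          # excitation frequency in Hz
L_max = c / f      # ~300 m, the length scale quoted above
print(L_max)

# relative error estimate mu0*eps0*L^2/tau^2 = (L*f/c)^2 for, say, L = 3 m
L = 3.0            # illustrative plate radius in m
rel_error = (L * f / c) ** 2
print(rel_error)   # ~1e-4, so the quasistatic model is a good approximation
```

For a plate radius of a few metres, the relative error of the quasistatic model is of order 10⁻⁴, far below the size of the field itself.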
However, after an electric breakdown, sufficiently large fault currents flow to make an MQS approximation more appropriate. Many aspects of an overland line are electro- or magneto-quasistatic. For lightning, however, these approximations are no longer applicable. We shall study humid high-voltage insulators in subsection 4.5 as an important example of applications of electro-quasistatics.
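The transit-time criterion above is easy to check numerically. The following sketch (the helper function, its name and the threshold ratio are our own illustrative choices, not from the book) compares the wave transit time over the characteristic length with the excitation period:

```python
# Validity check for the quasistatic approximation: the transit time l/c of a
# wave over the characteristic length l must be small compared to the time
# scale T = 1/f of the excitation.  The ratio 0.1 is an arbitrary threshold.
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def quasistatic_ok(length_m, frequency_hz, ratio=0.1):
    """True if the transit time l/c is small compared to the period T = 1/f."""
    transit_time = length_m / C0
    period = 1.0 / frequency_hz
    return transit_time <= ratio * period

# The plate-capacitor example from the text: at 1 MHz the wavelength c/f is
# about 300 m, so a plate radius far below that is unproblematic.
print(C0 / 1e6)                    # wavelength at 1 MHz in metres
print(quasistatic_ok(0.1, 1e6))    # 10 cm plate radius at 1 MHz
print(quasistatic_ok(300.0, 1e6))  # plate radius of the order of the wavelength
```
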
1.3.3 General Time-Dependent Fields and Electromagnetic Waves

In the case of rapidly varying fields, where neither the time variation of the magnetic induction nor that of the displacement current density may be neglected, the full set of Maxwell's equations has to be solved. Maxwell's equations state that the electric and magnetic fields are interrelated. The time-dependent change of the fields propagates with finite velocity through space. It is called an electromagnetic wave⁵. In engineering, electromagnetic waves are generated for the purpose of energy and signal transmission.

General Time-Dependent Fields. For fields with general time dependence, where no terms in Maxwell's equations may be neglected, it is convenient to rewrite Maxwell's equations as an initial-boundary value problem for a system of differential equations. For this purpose, the field quantities E and H first have to be normalized appropriately. We use the normalizations

    E = √Z₀ E',                                                  (1.32)
    H = √Y₀ H',                                                  (1.33)

with the square root of the wave impedance Z₀ = √(μ₀/ε₀), the admittance Y₀ = 1/Z₀, and ε = ε_r ε₀, μ = μ_r μ₀. This normalization results in E' and H' having the same dimension. Introducing an unknown function

    u(t) := (E', H')ᵀ

and an excitation function

    q(t) = (−(1/ε) ρv, 0)ᵀ

for the case of moving charges, we can rewrite the first two Maxwell's equations as u̇(t) = L u(t) + q(t), where

    L := (  −κ/ε                             (1/(ε_r √(ε₀μ₀))) curl  )
         (  −(1/(μ_r √(ε₀μ₀))) curl          0                       )     (1.34)

Adding the initial conditions

    u(t₀) = (E'₀, H'₀)ᵀ,

we get an initial-value problem. Its formal solution is given by

    u(t) = u(t₀) + ∫_{t₀}^{t} (L u(τ) + q(τ)) dτ.                (1.35)

⁵ The propagation speed in vacuum is just the velocity of light.
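The formal solution (1.35) immediately suggests numerical time-stepping: approximate the integral step by step. A minimal scalar sketch (the coefficients, step count and comparison are our illustrative choices, not from the book) applies forward-Euler quadrature to the same kind of linear problem and compares with the exact solution:

```python
import math

# Scalar analogue of (1.35): du/dt = L*u + q with constant L and q.
L, q = -2.0, 1.0
u0, t0, t1, n = 0.0, 0.0, 1.0, 100_000
dt = (t1 - t0) / n

u = u0
for _ in range(n):
    u += dt * (L * u + q)          # one forward-Euler quadrature step

# Exact solution of the linear ODE for comparison:
exact = (u0 + q / L) * math.exp(L * (t1 - t0)) - q / L
print(u, exact)                    # the two values closely agree
```
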
1. Classical Electrodynamics
Time-harmonic Oscillations and Waves. Electromagnetic waves with periodic time dependence are of special importance. For harmonic time dependence, we can use the ansatz (1.22). Upon differentiating and cancelling e^{iωt} in Maxwell's equations, we get

    curl E = −iωB
    curl H = iωD + J_L + J_E
    div D = ρ
    div B = 0.
1.3.4 Overview and Solution Methods
Stationary Fields
    Electrostatics:              curl E = 0           div D = ρ
    Magnetostatics:              curl H = J_E         div B = 0
    Stationary Currents:         curl E = 0           div J_L = 0

Quasistatic Fields
    Electro-Quasistatics:        curl E = 0           curl H = ∂D/∂t + J_L + J_E    div D = ρ    div B = 0
    Magneto-Quasistatics:        curl E = −∂B/∂t      curl H = J_L + J_E            div D = ρ    div B = 0

General Time-Dependent Fields and Electromagnetic Waves
    Time-Harmonic Oscillations:  curl E = −iωB        curl H = iωD + J_L + J_E      div D = ρ    div B = 0
    General Case:                curl E = −∂B/∂t      curl H = ∂D/∂t + J_L + J_E    div D = ρ    div B = 0

Table 1.1. Classification of electromagnetic fields
Table 1.1 gives a summary of the classification of electromagnetic fields. Typical solution methods (the choice depends on the type of the problem) are analytical solution, numerical methods with field-theoretical orientation, and lumped circuit formalisms. In the sequel, we discuss the usual analytical solution methods. Some special numerical methods of field computation are described in subsections 2.1 and 2.3. Often one has to resort to the application of appropriate lumped circuits.
1.4 Analytical Solution Methods

Now we discuss some analytical solution methods, namely the potential method (for electro- and magnetostatics, stationary current fields, electro-quasistatics, and the wave equation), the decoupling of Maxwell's equations by differentiation, and the separation method (for the Helmholtz equation and waves in circular waveguides). We assume that ε, μ and κ are constants.

1.4.1 Potential Theory

To solve Maxwell's equations, one frequently uses potentials. Potential theory is especially useful if the fields are static.

Electrostatics. Since the electrostatic field is free from eddy currents, we have curl E = 0, so the field can be described uniquely by a scalar potential function (potential, for short):

    E(r) = −grad φ(r) = −∇φ,

since curl grad φ ≡ 0. For a linear isotropic material, we have D = εE. Substituting this into the divergence equation, we get the Poisson equation (also called potential equation) for a homogeneous material (ε = const.)

    Δφ = −ρ/ε.

Recall that the Laplace operator Δ equals ∇². In charge-free space, ρ = 0, so the potential satisfies the Laplace equation

    Δφ = 0.
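As a numerical counterpart of these equations, a short finite-difference sketch (the 1D setting, grid size and uniform charge density are our illustrative choices, not from the book) solves the Poisson equation on the unit interval and compares with the closed-form solution:

```python
import numpy as np

# 1D finite-difference sketch of the Poisson equation  Δφ = -ρ/ε  on [0, 1]
# with φ(0) = φ(1) = 0, uniform charge density and homogeneous material.
n = 200                          # number of interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
eps, rho = 1.0, 1.0

# -φ'' = ρ/ε with the standard three-point stencil
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
phi = np.linalg.solve(A, rho / eps * np.ones(n))

# Closed-form solution for this right-hand side: φ(x) = ρ/(2ε) x(1-x)
exact = rho / (2.0 * eps) * x * (1.0 - x)
print(np.max(np.abs(phi - exact)))
```

Since the exact solution is a quadratic polynomial, the three-point stencil reproduces it up to rounding errors.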
The Laplace and Poisson equations are elliptic differential equations.

Magnetostatics. Since div B = 0, the vector B is source-free, hence it may be expressed as the curl of some vector potential A:

    B = curl A.

This vector potential is unique up to the gradient of some scalar φ. Choose φ so as to obtain the so-called Coulomb gauge

    div A = 0.                                                   (1.36)

This is very suitable for static problems. In a linear isotropic medium, we have

    B = μH.                                                      (1.37)
Substituting (1.36) and (1.37) in (1.18) yields

    curl((1/μ) curl A) = J_E.

For homogeneous media, this is equivalent to

    (1/μ)(grad div A − ΔA) = J_E,

which then leads (with the Coulomb gauge) to

    ΔA = −μ J_E.
Stationary Current Fields. As in electrostatics, the electric field can be uniquely described by a scalar potential φ because curl E = 0:

    E = −grad φ.

Since J_L = κE and κ = const, we obtain

    κ div E = −κ div grad φ = 0,

i.e., the Laplace equation

    Δφ = 0.
Electro-Quasistatics. Since the electro-quasistatic field is free from eddy currents, it is uniquely defined by some scalar potential function. In the time-harmonic case, curl E = 0 with the complex amplitude E. As in the real case, we choose a complex scalar potential
    C e = −db/dt
    C̃ D_μ⁻¹ b = j + D_κ e
    S̃ j = 0
    S b = 0.

Differentiating C̃ D_μ⁻¹ b = j + D_κ e with respect to time and substituting C e = −db/dt yields

    C̃ D_μ⁻¹ C e + D_κ de/dt = −dj/dt.                           (2.15)

Setting y(t) := e makes (2.15) into a first-order differential equation

    ẏ(t) = L y(t) + r(t),

with L = −D_κ⁻¹ C̃ D_μ⁻¹ C and r(t) = −D_κ⁻¹ dj/dt. Therefore, an initial value problem has to be solved in the general time-dependent case. This problem cannot be solved by explicit schemes for stability reasons, so it is necessary to use implicit or semi-implicit methods. The algorithm requires linear systems with the above matrix to be solved repeatedly. For harmonic time dependence, we have E(r, t) = Re(E(r) e^{iωt}). After differentiating with respect to time and cancelling exp(iωt), we obtain the discrete equation

    (C̃ D_μ⁻¹ C + iω D_κ) e = −iω j,

i.e., a boundary value problem.
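The need for implicit schemes mentioned above can be illustrated with a tiny stiff stand-in for the discrete system: backward Euler solves a linear system with the matrix (I − Δt·L) in every step. The matrix L, the excitation r and the step size below are our illustrative choices, not the actual FIT operators:

```python
import numpy as np

# Backward-Euler sketch for  y'(t) = L y(t) + r(t):  every step solves a
# linear system with the matrix (I - dt*L).  The stiff 2x2 matrix L and the
# constant excitation r are illustrative stand-ins only.
L = np.array([[-1000.0, 0.0],
              [0.0, -1.0]])        # widely spread eigenvalues -> stiff system
r = np.array([1000.0, 1.0])
y = np.zeros(2)
dt, steps = 0.01, 1000             # step size far too large for explicit Euler

M = np.eye(2) - dt * L             # constant matrix, could be factored once
for _ in range(steps):
    y = np.linalg.solve(M, y + dt * r)   # (I - dt L) y_new = y_old + dt r

print(y)                           # approaches the steady state -L^{-1} r = (1, 1)
```

An explicit Euler step with the same dt would diverge immediately for the fast eigenvalue; the implicit step remains stable at the cost of one linear solve per step.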
2. Numerical Field Theory
2.3.4 General Time-Dependent Fields and Electromagnetic Waves

General Time-Dependent Fields. For general time-dependent fields it is recommended to use the so-called mean-value state variables [76] for the formulation of the discrete initial value problem, rather than the integral state variables used before. Let e' be the vector of all mean values of the normalized electric field (1.32) along the grid lines, and h' the corresponding vector for the magnetic field, and set

    u(t) := (e', h')ᵀ.

The operator L of (1.34) has the following discrete analogue:

    L = (  −D_κ D_ε⁻¹                                   (1/√(ε₀μ₀)) D_{εr}⁻¹ D̃_A⁻¹ C̃ D̃_s  )
        (  −(1/√(ε₀μ₀)) D_{μr}⁻¹ D_A⁻¹ C D_s           0                                    )

The mean-value state variables and the integral state variables are connected via the operators D_s, D̃_s, D_A, and D̃_A for the line (surface) integrals on the primary (dual) grid; so, e.g., e = √Z₀ D_s e' on the primary grid.

The initial value problem can be solved by a one-step or multi-step algorithm. Detailed information about the standard methods for the numerical solution of ordinary differential equations can be found in textbooks; [76] is devoted specifically to the initial value problem for high-frequency fields described above. For stability reasons, implicit methods are chosen for slowly varying fields. Then it is necessary to solve a linear system with the matrix above in each step in order to carry out the numerical integration. Since this matrix is very ill-conditioned [207], this presents an extremely demanding problem. Recently, some progress has been achieved in solving this problem [57].

Harmonic Oscillations. Harmonic oscillations, such as eddy current problems, are problems in the frequency domain. Using the Finite Integration Technique for the solution of Maxwell's equations results in the following equivalence between continuous and discrete equations in the case of excited time-harmonic oscillations:

    curl E = −iωB        ⟺    C e = −iω b
    curl H = iωD + J     ⟺    C̃ h = iω d + j
    div D = ρ            ⟺    S̃ d = q
    div B = 0            ⟺    S b = 0.
The relevant continuous and discrete material equations are given by

    D = εE            ⟺    d = D_ε e
    B = μH            ⟺    b = D_μ h
    J = J_L + J_E     ⟺    j = D_κ e + j_E.

The fields are excited by an electric current j_E which flows in a closed current loop or a current-carrying wire. The excitation current can also be the Fourier transform of an elementary particle moving along some trajectory. Substituting the equations one into another yields the so-called discrete curl-curl equation or discrete Helmholtz equation

    (C̃ D_μ⁻¹ C − ω² D_{ε'}) e = −iω j_E.

The right-hand side −iω j_E represents the impressed current excitation; D_{ε'} combines the complex conductivity and permittivity.
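In the frequency domain the discrete problem is thus a single complex linear system. A toy sketch (the small matrices and the excitation are our illustrative stand-ins, not actual FIT matrices) solves a system of this shape:

```python
import numpy as np

# Frequency-domain sketch: a complex linear system of the form
# (K + i*omega*D) e = -i*omega*j.  K, D and j are small illustrative
# stand-ins for the curl-curl part, the conductivity term and the excitation.
omega = 2.0 * np.pi * 50.0
K = np.array([[2.0, -1.0], [-1.0, 2.0]])   # symmetric "curl-curl"-like part
D = np.diag([0.01, 0.02])                  # conductivity-like diagonal term
j = np.array([1.0, 0.0])

A = K + 1j * omega * D
e = np.linalg.solve(A, -1j * omega * j)

print(np.max(np.abs(A @ e + 1j * omega * j)))  # residual, near machine precision
```
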
A Special Cylindrically Symmetric Problem. For the cylindrically symmetric case without azimuthal dependence (monopole fields) and excitation on the axis, a special multigrid algorithm is presented in subsection 3.7. It has been developed in [277]. Again, normalized fields E' and H' are used. Then Maxwell's equations in integral form are

    ∮_{∂A} E'·ds = −(iω/c) ∫_A μ_r H'·dA
    ∮_{∂A} H'·ds = (iω/c) ∫_A ε_r E'·dA + I'
    ∮_{∂V} ε_r E'·dA = c q'
    ∮_{∂V} μ_r H'·dA = 0.

Here I' represents the beam current (cf. section 5) of frequency ω in some rf structure for the acceleration of elementary particles. This current I' is given by

    I'(ω, r, z) = { I_q √Z₀ e^{−ik(z−z₀)}    for r = 0,
                  { 0                         otherwise,

with the wave number k = ω/c. As the general solution of an inhomogeneous partial differential equation is the sum of a special solution of the inhomogeneous equation and the general solution of the homogeneous equation, the magnetic field H' is decomposed into an inhomogeneous part H* (the source field), which is caused by the current I', and the homogeneous part H⁰ (source-free field):
    H' = H⁰ + H*.
Then the solution for the inhomogeneous problem in free space is chosen to be the inhomogeneous part. The current I' induces the following azimuthal magnetic field:

    H*(ω, r, z) = (I_q √Z₀ / (2πr)) e^{−ik(z−z₀)}.

[Fig. 3.3: classification of non-stationary iterative methods. Arnoldi methods: upper Hessenberg matrix ⇒ long vector recursion; Arnoldi implementations: FOM and GMRES.]
of the residual r and the matrix A. The approximate solutions all lie in the span of {r₀, Ar₀, A²r₀, ..., Aⁿr₀}. This set forms a basis for the Krylov subspace, and thus x_k belongs to the Krylov subspace. In every iteration, two inner products and one matrix-vector product are calculated. The approximate solution is obtained through a three-term recurrence of the residual vector, which provides a better iterate in the conjugate direction of A. Other methods of this subgroup are the Lanczos algorithm for linear systems, SYMMLQ, the minimal residual method, and the conjugate residual method (CR). Non-stationary iterative methods for general matrices can be split again into two subgroups, as displayed in Fig. 3.3: Bi-Lanczos or cg-like methods and Arnoldi methods. The Bi-Lanczos methods are skew projection methods with a basis pair (two Krylov subspaces); they lead to short recursions as in the cg method. Arnoldi methods are based on orthogonal projections, which in the case of general matrices lead to long recursions. First, there are the Bi-Lanczos, BiCG, and SCBiCG methods for complex symmetric system matrices A; strongly related to the BiCG methods with internal LU decomposition are the generalized CGS methods, for which CGS and CGS2 serve as representatives. Secondly, there exist BiCG methods with internal least-squares fit; they belong to the QMR family: QMR and TFQMR. Bi-Lanczos type methods will be described in this section. The second subgroup of non-stationary methods are the Arnoldi methods, with the Arnoldi implementations FOM and GMRES and related methods such as GCR, GMOrthomin, and Axelsson's method. Some of those will be described in this section. Finally, there are hybrid non-stationary methods which combine the Bi-Lanczos scheme with the Arnoldi scheme. These are BiCGSTAB, BiCGSTAB2, and the more general BiCGSTAB(ℓ) method. They are described in this section, too.
3.1 Direct Solution Methods

The use of direct solution methods can be recommended for systems Ax = b with a dense and unstructured system matrix A. In this book, Gaussian elimination is used in the framework of the mode matching technique for determinant computation, matrix inversion, and the solution of linear systems. The basic idea of the direct solution of a linear system of equations is the transformation of the system into an equivalent system having triangular form. To achieve stability of the algorithms, it is necessary to apply some scaling (equilibration) and some pivoting.
3.1.1 LU Decomposition; Gaussian Elimination

Consider a linear system of equations Ax = b with a regular matrix A ∈ ℂⁿˣⁿ and x, b ∈ ℂⁿ. The so-called pivot elements a_kk play a central role in the LU decomposition (Gaussian elimination), since they are used as divisors. Their magnitude compared to that of the remaining matrix elements is of crucial importance for the stability of the algorithm. Yet, instability can be avoided by appropriate permutations of rows and columns during the elimination. The permutations can be expressed in matrix notation as follows:

Definition 3.1.1. Let s_i ∈ S_n, i = 1, ..., n, be permutations of (1, 2, ..., n) and e_i = (δ_{1i}, ..., δ_{ni})ᵀ, i = 1, ..., n, be unit vectors. Then a matrix P ∈ ℂⁿˣⁿ with P = [e_{s₁}, ..., e_{sₙ}] is called a permutation matrix.
In practice, partial pivoting is used: in each step, the row containing the element with the largest absolute value in the pivoting column is moved up front by a permutation. This pivoting strategy needs at most O(n²) additional operations. The algorithm for Gaussian elimination with partial pivoting goes as follows:
Algorithm 3.1.1.1 (LU Decomposition with Partial Pivoting)
Given a regular matrix A ∈ ℂⁿˣⁿ.
For k = 1, ..., n−1:
    Determine p ∈ {k, k+1, ..., n} with |a_pk^(k)| = max_{k ≤ i ≤ n} |a_ik^(k)|
    r(k) := p
    Exchange a_kj^(k) and a_pj^(k), j = k, ..., n
    For i = k+1, ..., n:
        l_ik := a_ik^(k) / a_kk^(k)
        a_ik^(k+1) := l_ik
        a_ij^(k+1) := a_ij^(k) − l_ik a_kj^(k)    for j = k+1, ..., n

The permutations P₁, ..., P_{n−1} are described by the integer vector (r(1), ..., r(n−1)); P_k denotes the exchange of the rows k and r(k). However, partial pivoting only makes sense if the matrix has been scaled before; otherwise it can lead to an inadequate choice of the pivot element. Row scaling means multiplying each row of the matrix A by an appropriate factor so that the ‖·‖_∞ norm of all rows of the new system matrix D⁻¹A finally equals a fixed constant².

Remark 3.1.1. Let P₁, ..., P_{n−1} be the permutations in algorithm 3.1.1.1. With P = P_{n−1} ··· P₁, we get PA = LU. Here L is a lower triangular matrix and U the upper triangular matrix resulting from the decomposition. The standard algorithm for the direct solution of a linear system is Gaussian elimination with row scaling and partial pivoting. We usually assume in the sequel that L is the product of the elementary lower triangular matrices and the permutation matrices, so that we can write A = LU.
² Row scaling lowers the probability that a very small number is added to a very large one during the elimination.
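A direct transcription of the elimination loop into code might look as follows (0-based indexing; the 3×3 test matrix is our illustrative choice). The multipliers l_ik are stored in place below the diagonal, and the pivot vector records the row exchanges:

```python
import numpy as np

# Sketch of LU decomposition with partial pivoting.  The multipliers l_ik
# overwrite the entries below the diagonal; r is the pivot vector.
def lu_partial_pivoting(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    r = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # largest pivot candidate in column k
        r[k] = p
        A[[k, p], :] = A[[p, k], :]           # exchange rows k and r(k)
        for i in range(k + 1, n):
            A[i, k] /= A[k, k]                # multiplier l_ik
            A[i, k + 1:] -= A[i, k] * A[k, k + 1:]
    return A, r

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
LU, r = lu_partial_pivoting(A)
L = np.tril(LU, -1) + np.eye(3)
U = np.triu(LU)

P = np.eye(3)                                  # rebuild P from the pivot vector
for k, p in enumerate(r):
    P[[k, p], :] = P[[p, k], :]
print(np.max(np.abs(P @ A - L @ U)))           # PA = LU up to rounding
```
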
Algorithm 3.1.1.2 (Gaussian Elimination)
- A = LU, Gaussian triangular decomposition (L lower, U = A^(n) = L⁻¹A upper triangular matrix)
- Ly = b, forward substitution
- Ux = y, backward substitution

Remark 3.1.2. The number of coefficients to be stored in Gaussian elimination equals n(n+1), i.e., exactly the number of input coefficients (matrix A and the vector b). Regarding the computational effort, counted in multiplications, the following holds for Gaussian elimination without pivoting and scaling:

    Σ_{k=1}^{n−1} k² ≈ n³/3    for the LU decomposition,
    Σ_{k=1}^{n−1} k ≈ n²/2     each, for the forward and backward substitution.

The computational effort O(n²) for the partial pivoting has to be added.

For the mode matching technique, Gaussian elimination is not only used for the solution of linear systems of equations but also for matrix inversion and determinant computation [246], [279]. The relation A⁻¹ = U⁻¹L⁻¹ is exploited for matrix inversion. The matrix L is not stored explicitly: only the multipliers l_ik below the main diagonal of A are stored. For the pivoting, the pivot vector (r(1), ..., r(n−1)) is constructed. The matrices L_k are Frobenius matrices and therefore have the property that the inverse L_k⁻¹ results from L_k by changing the sign of the elements l_ik. Thus we get L⁻¹ = L_{n−1}P_{n−1} ··· L₂P₂L₁P₁. The algorithm for matrix inversion consists of two steps:

Algorithm 3.1.1.3 (Matrix Inversion with Gaussian Elimination)
1. Replace U, with the help of a column-oriented algorithm, by U⁻¹.
2. Determine the unknown inverse A⁻¹ according to

    A⁻¹ = U⁻¹L_{n−1}P_{n−1} ··· L₂P₂L₁P₁.

After LU decomposition of A, the determinant computation with the help of Gaussian elimination reduces to evaluating the product of the diagonal elements of U, since the relation det(A) = det(L) det(U) gives

    det(A) = (−1)^p ∏_{k=1}^{n} u_kk,

where p stands for the number of permutations.
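The determinant formula can be sketched directly: eliminate with partial pivoting, count the row exchanges p, and multiply the diagonal of U. The test matrices below are our illustrative choices:

```python
import numpy as np

# det(A) = (-1)^p * u_11 * ... * u_nn, where p counts the row exchanges
# actually performed during elimination with partial pivoting.
def det_via_lu(A):
    U = A.astype(float).copy()
    n = U.shape[0]
    p = 0                                     # number of row exchanges
    for k in range(n - 1):
        piv = k + np.argmax(np.abs(U[k:, k]))
        if piv != k:
            U[[k, piv], :] = U[[piv, k], :]
            p += 1
        for i in range(k + 1, n):
            U[i, k:] -= U[i, k] / U[k, k] * U[k, k:]
    return (-1) ** p * np.prod(np.diag(U))

A = np.array([[2.0, 1.0], [4.0, 3.0]])
print(det_via_lu(A), np.linalg.det(A))        # both give 2 (up to rounding)
```
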
3. Numerical Treatment of Linear Systems
Remark 3.1.3. Basically, there are two reasons why direct methods are not well suited for the large sparse systems resulting from the discretization of two- and three-dimensional problems:
1. Direct methods which are based on an LU decomposition produce fill-in, with non-zero entries in the major part of the band width of the matrix A, and therefore cost a lot of storage space. The matrix then has to be stored peripherally for large problems, and the I/O costs, i.e., the transfer between main memory and disc, start to dominate.
2. A common serious drawback of the direct methods is that rounding errors and errors in the data tend to grow with the condition number, which for elliptic partial differential equations grows as O(h⁻³), where h stands for the step size of the discretization. Therefore, at least a double precision calculation is necessary, which has an even stronger impact on the storage requirements.
For full matrices, e.g., the coupling and scattering matrices of the mode matching technique (and matrices from many other applications), Gaussian elimination remains the method of choice. Many developments have been made to cure the weakness of the algorithm with respect to rounding errors. Examples are interval-analytical variants of Gaussian elimination or the use of special number representations³. The division-free Gaussian algorithm is of great importance for contemporary vector computers, since a division operation takes much more time on those computers than a multiplication [9]. Iterative methods do not have to fight the problem of fill-in, since appropriate preconditioning and accelerated methods make it possible to obtain algorithms of nearly optimal order of computational complexity. Compared to direct methods, the influence of rounding errors can be reduced by several orders of magnitude by thorough evaluation of the residuals inside the iterative algorithms [7].

Some modern developments (which cannot be treated here) are worth mentioning. For some special applications, e.g., two-dimensional problems with a system matrix A(p) that depends continuously on some parameter p, direct methods on bandwidth-limited RISC machines (PCs) can even handle badly conditioned cases. The so-called "uni-/multifrontal direct solvers" [254] use sophisticated algorithms to reorder the unknowns. Additionally, they use BLAS Level 2 and Level 3 routines, which reach nearly peak performance and are fitted to the cache size. For matrices A(p) which do not change their filling pattern with p, a subsequent factorization then only requires about 10% of the effort of the first LU decomposition. Nevertheless, the asymptotic complexity of the direct solvers becomes evident for really large problems: the complexity of the best direct solvers for general matrices reaches O(N^2.5) [56].

³ In [104], for example, integer systems of equations resulting from system analysis (studies concerning Petri nets) were solved "exactly". In this context, a division-free Gaussian algorithm with special number representation was developed and interval-analytical inclusion methods were applied.
3.2 Classical Iteration Methods

Iteration methods for the solution of linear systems start with an initial approximation to the solution vector, from which they build a sequence of approximate solutions, which (under certain conditions) converges to the exact solution of the system of equations. A special advantage with respect to sparse matrices lies in the simplicity of these methods: essentially, only matrix-vector multiplications and vector additions have to be carried out. It should be noted that the effort for a matrix-vector multiplication with a sparse matrix usually is of order O(n) rather than O(n²) as for full matrices. The reason is that the number of non-vanishing elements in each row is usually small and independent of the dimension n of the matrix. These methods can also offer an enormous saving of storage space compared with direct methods, since usually only the non-vanishing elements of the system matrix, the solution vector, and a few additional vectors have to be stored. Their disadvantage compared to direct methods is slow convergence or even divergence. In addition, they require an appropriate stopping criterion. As already noted, the iterative methods can be split into two groups, viz. stationary and non-stationary iteration methods. "Classical" iterative methods like Jacobi, Gauss-Seidel, and SOR are stationary. They are easy to understand and to implement. Much more efficient, however, are the multigrid methods, which internally use stationary methods like Gauss-Seidel, and the non-stationary methods like the Krylov subspace methods. Yet, their analysis is also much more difficult. But practical applications require fast (and robust) solvers. Thus, the engineers can hardly wait for a decent analysis but start using the methods (recall FEM, which was used by engineers for a long time before the mathematical theory was fully developed). Therefore, it is important to document systematic experimental convergence studies until theoretical convergence results are available. An attempt at such a documentation, based on typical examples, will be given in section 3.10. Since there exists a whole host of literature on this topic, of which [9] and [103] are recommended examples, the well-known classical methods (together with their symmetric variants) and the Kaczmarz algorithm are treated only briefly in what follows. There is an overview in [21], which is also available on the World Wide Web. The non-stationary methods, especially their modern variants, will be treated at greater length in section 3.4, since they are still relatively unknown outside the mathematical community.
3.2.1 Practical Use of Iterative Methods: Stopping Criteria

When using iterative methods in practice, we need to implement some stopping criterion based on the size of some available quantity. Available quantities in all iterative methods are the initial vector x₀ and the iterates x_k, x_{k+1}, besides the matrix A and the right-hand side b. The residual r_k is easily computed but needs one additional matrix-vector multiplication. Non-stationary methods like the Krylov subspace methods recursively calculate a "residual" which can be used for a stopping criterion, but the true residual should also be computed from time to time. Usual criteria are as follows:

- an absolute criterion depending on the residual:

    ‖r_k‖ = ‖b − A x_k‖ ≤ ε,

- an absolute criterion depending on the iteration error ‖x* − x_k‖, taking x_{k+1} as an approximation for the unknown exact solution x*:

    ‖x_{k+1} − x_k‖ ≤ ε,

- a relative criterion depending on the relative residual, i.e., on the ratio of the actual residual and the initial residual:

    ‖r_k‖ / ‖r₀‖ ≤ ε,

- a relative criterion depending on the iterates and the initial vector, again taking x_{k+1} as an approximation for the unknown exact solution x* and thus approximating the relative iteration error:

    ‖x_{k+1} − x_k‖ / ‖x_{k+1} − x₀‖ ≤ ε.

The initial vector x₀ is usually either set to zero or chosen to be a random vector. In the case x₀ = 0, we have r₀ = b in the relative residual. The reader who would like more details about stopping criteria and their relation to the iteration error itself is referred, e.g., to [9].
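A relative-residual stopping test in a simple (Richardson-type) iteration might be sketched as follows; the matrix, the damping parameter and the tolerance are our illustrative choices:

```python
import numpy as np

# Richardson-type iteration with the relative-residual stopping criterion
# ||r_k|| / ||r_0|| <= eps.  With x0 = 0 we have r0 = b, as noted in the text.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
r0_norm = np.linalg.norm(b)
eps, alpha = 1e-8, 0.2            # alpha < 2/lambda_max(A) ensures convergence

for k in range(10_000):
    r = b - A @ x                 # true residual of the current iterate
    if np.linalg.norm(r) <= eps * r0_norm:
        break
    x = x + alpha * r

print(k, np.linalg.norm(b - A @ x) / r0_norm)
```
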
3.2.2 Gauss-Seidel and SOR

Most classical methods are fixed point methods. The iteration function x_{k+1} = φ(x_k) is constructed in such a way that it possesses exactly one fixed point x*, which equals the sought-for exact solution of the linear system of equations Ax = b. The choice of an appropriate decomposition of the given matrix A is a crucial point in the construction of an effective classical iteration algorithm. The convergence speed of the fixed point iteration x_{k+1} = M x_k + c is mainly determined by the spectral radius ρ(M) of the iteration matrix. Yet, it can be improved for a given M by relaxation. The SOR algorithm (successive over-relaxation) may be called the classical iteration method for linear systems, besides the Jacobi algorithm. Its main advantage in comparison with other methods is the small storage requirement. This makes the algorithm suitable for the extremely large matrices arising in some applications.
Algorithm 3.2.2.1 (SOR Algorithm; Successive Over-Relaxation)
For i = 1, ..., n:

    x_i^(k+1) = (1 − ω) x_i^(k) + (ω/a_ii) (b_i − Σ_{j<i} a_ij x_j^(k+1) − Σ_{j>i} a_ij x_j^(k))

In matrix notation, the SOR algorithm can be written as

    M_ω^SOR x^(k+1) = (M_ω^SOR − A) x^(k) + b    with    M_ω^SOR = (1/ω)(D + ωL).

Here D stands for the diagonal matrix with the elements a_ii, i = 1, ..., n, and L for the strictly lower triangular matrix with the elements a_ij, i > j. Then the matrix (M_ω^SOR − A) is

    M_ω^SOR − A = ((1 − ω)/ω) D − U,

where U contains the elements of A lying above the main diagonal.

Remark 3.2.1. For ω = 1, we obtain the Gauss-Seidel algorithm, which is also referred to as a relaxation method. Strictly speaking, the term 'SOR' is correct only for values ω > 1, while for 0 < ω < 1 the notion of under-relaxation would be accurate.
Some results for the SOR algorithm will be given here without proofs. They can be found in textbooks like [9], [103], [114], [248]:

Theorem 3.2.1. The matrix A is assumed to be positive definite and decomposed as A = D + L + Lᵀ, where D is the diagonal of A and L a strictly lower triangular matrix. Furthermore, 0 < ω < 2.

    min_{P_k ∈ 𝒫_k, P_k(1) = 1}  max_{λ ∈ [a,b]} |P_k(λ)|

is the minimax problem to be solved. The minimax properties of the Chebyshev polynomials are well known from interpolation theory ([103], [74]), and the polynomials P_k turn out to be specially normalized Chebyshev polynomials

    P_k(λ) = C_k(t(λ)) / C_k(t(1))    with    t(λ) = (2λ − b − a)/(b − a).

Using the three-term recurrence

    C_k(t) = 2t C_{k−1}(t) − C_{k−2}(t),

as well as some transformations (for details see [103] or [74]), and observing the fact that P₁(λ) = ωλ + 1 − ω with the optimal relaxation parameter ω = 2/(2 − b − a) for the symmetrizable iteration method x_{k+1} = M x_k + c, one obtains the following relaxation⁵:

⁵ The choice of another norm presents another possibility. The conjugate gradient methods, which are treated in the next subsection, are based on this idea.
    y_k = ρ_k (y_{k−1} − y_{k−2} + ω(M y_{k−1} + c − y_{k−1})) + y_{k−2},

with

    ρ_k := 2 t̂ C_{k−1}(t̂) / C_k(t̂)    and    t̂ := (2 − b − a)/(b − a).

In particular, for a fixed point iteration of the form M = I − B⁻¹A, c = B⁻¹b, the relation

    y_k = ρ_k (y_{k−1} − y_{k−2} + ω B⁻¹(b − A y_{k−1})) + y_{k−2}

holds. With that, it is possible to write down the algorithm for the Chebyshev iteration, sometimes also called Chebyshev acceleration, for the fixed point iteration x_{k+1} = M x_k + c with M = I − B⁻¹A, c = B⁻¹b:

Algorithm 3.3.0.2 (Chebyshev Iteration)
    y₀ := x₀
    ω := 2/(2 − λ_max(M) − λ_min(M))
    t̂ := (2 − λ_max(M) − λ_min(M)) / (λ_max(M) − λ_min(M))
    C₀ := 1;  C₁ := t̂
    y₁ := ω(M y₀ + c) + (1 − ω) y₀
    For k = 2, 3, ..., k_max:
        C_k = 2 t̂ C_{k−1} − C_{k−2}
        ρ_k = 2 t̂ C_{k−1}(t̂) / C_k(t̂)
        Solve the linear system B z = b − A y_{k−1}
        y_k = ρ_k (y_{k−1} − y_{k−2} + ω z) + y_{k−2}
        If ‖y_k − y_{k−1}‖ ≤ tol · ‖y_k‖ : stop
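Algorithm 3.3.0.2 can be transcribed almost literally. In the sketch below B = I for simplicity, so the inner solve Bz = b − Ay_{k−1} is just the residual, and the spectral bounds of M = I − A are computed exactly for a small illustrative test matrix:

```python
import numpy as np

# Chebyshev iteration (algorithm 3.3.0.2) with B = I.
A = np.array([[1.0, 0.3], [0.3, 1.0]])      # spd, eigenvalues 0.7 and 1.3
b = np.array([1.0, 2.0])
eigs = np.linalg.eigvalsh(np.eye(2) - A)    # spectrum of M = I - A
lmin, lmax = eigs[0], eigs[-1]

omega = 2.0 / (2.0 - lmax - lmin)
t_hat = (2.0 - lmax - lmin) / (lmax - lmin)
C_prev, C_cur = 1.0, t_hat                  # C_0 and C_1

y_prev = np.zeros(2)                        # y_0 = x_0
y_cur = omega * (b - A @ y_prev) + y_prev   # y_1 = omega*(M y_0 + c) + (1-omega) y_0
for k in range(2, 50):
    C_next = 2.0 * t_hat * C_cur - C_prev   # Chebyshev three-term recurrence
    rho = 2.0 * t_hat * C_cur / C_next
    z = b - A @ y_cur                       # solve Bz = b - A y_{k-1} with B = I
    y_next = rho * (y_cur - y_prev + omega * z) + y_prev
    y_prev, y_cur = y_cur, y_next
    C_prev, C_cur = C_cur, C_next
    if np.linalg.norm(y_cur - y_prev) <= 1e-12 * np.linalg.norm(y_cur):
        break

print(np.max(np.abs(A @ y_cur - b)))        # residual of the accelerated iterate
```
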
The convergence speed of the Chebyshev acceleration for a symmetrizable fixed point method can be estimated by the following theorem:

Theorem 3.3.1. Let A be a symmetric positive definite matrix and M = M(A) the iteration matrix of a symmetrizable fixed point method x_{k+1} = M x_k + c for A and an arbitrary initial vector x₀ ∈ ℝⁿ. Then

    ‖y_k − x‖ ≤ (1 / |C_k(t̂)|) ‖x₀ − x‖

holds for the corresponding Chebyshev iteration y_k from algorithm 3.3.0.2.

For t̂ > 1, the estimate

    1 / |C_k(t̂)| ≤ 2 ( (√κ − 1)/(√κ + 1) )^k    with    κ := (t̂ + 1)/(t̂ − 1)

holds. Proof: see [74]. Altogether, the following estimate is valid for the iterates y_k:

    ‖y_k − x‖ ≤ 2 ( (√κ(A) − 1)/(√κ(A) + 1) )^k ‖x₀ − x‖.
Obviously, for the realization of the Chebyshev iteration it is very important to have good knowledge of the spectrum of A (see the literature cited in [103]). The method can also be applied to non-symmetric matrices; then ellipses enclosing the (generally non-real) spectrum are used [171]. In the Axelsson method [11], [10], which is described in subsection 3.9 and is applicable to complex systems, the Chebyshev iteration is used to solve a real subproblem.
3.4 Krylov Subspace Methods

Krylov subspace methods form a whole group of closely related methods. Linear systems are solved by minimization of a residual functional. In this process, the iterates are obtained from the initial residual by multiplication by a polynomial evaluated at the coefficient matrix, i.e., the minimization takes place over special vector spaces, the so-called Krylov subspaces. The algorithms build a sequence of orthogonal or conjugate vectors (conjugate means orthogonal with respect to an inner product with a weighting matrix, say ⟨x, y⟩ = xᵀAy with A symmetric positive definite; this scalar product defines the so-called energy norm). Historically, the Lanczos algorithm [164], published in 1950, and the conjugate gradient algorithm (cg algorithm) [127], published by Hestenes and Stiefel in 1952, have been the basis for the development of the Krylov subspace methods. Before describing these methods in detail, we will give a short overview of a few main characteristics. Here are the main reasons supporting the use of Krylov subspace methods:
- Because of the optimality property of the approximate solution over the relevant solution space, the convergence rate is relatively high.
- Using certain preconditioning techniques, it is possible to increase the convergence rate considerably.
- The algorithms are free of any parameters, i.e., the user does not need to make parameter estimates (compare the relaxation parameter of the SOR algorithm, which has a strong impact on the rate of convergence but is difficult to determine optimally).
- The short recurrence relations lead to acceptable storage requirements and acceptable computation times per iteration.
- The rounding error characteristics are also acceptable.
The main idea of all cg-like algorithms lies in solving an equivalent problem instead of the original linear system, namely the minimization of a functional (cf. (3.2) in the derivation of the Chebyshev iteration). Here we dispense with the historical development as an improvement of the gradient method. Yet, these methods can become unstable for indefinite or non-symmetric matrices. As a result, generalized cg methods in a great variety of versions have been developed since the end of the 1970s. These versions are also applicable to non-symmetric and/or indefinite systems. For this kind of problem, it became obvious that often only a clever combination of preconditioning and some generalized cg algorithm leads to the desired robustness. The Krylov subspace methods are still an active research area. The use of these methods for non-Hermitian linear systems is discussed in a large number of recent publications. Unfortunately, for non-Hermitian systems Ax = b with A ∈ ℂⁿˣⁿ, A ≠ Aᴴ, the robustness of Krylov subspace methods is not sufficient to use them as black-box solvers. There exist enough examples of system matrices A for which the described methods cannot reach the prescribed accuracy or for which they even fail to converge at all. Some of these examples have great practical importance, so it is always absolutely necessary to carry out careful numerical experiments in order to see which solver is best for a given application problem.
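For orientation before the derivation, here is a minimal sketch of the classical Hestenes/Stiefel cg algorithm for a symmetric positive definite matrix, with the short recurrences mentioned above; the test matrix is our illustrative choice:

```python
import numpy as np

# Minimal cg sketch for symmetric positive definite A, built from the short
# two-term recurrences for iterate, residual and search direction.
def cg(A, b, tol=1e-10, maxit=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rr / (p @ Ap)               # step length along direction p
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rr_new / rr) * p           # new A-conjugate search direction
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg(A, b)
print(np.max(np.abs(A @ x - b)))
```

In exact arithmetic the method terminates after at most n steps; in practice it is used as an iterative method with a residual-based stopping criterion, as above.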
3.4.1 The CG Algorithm

As already mentioned in the derivation of the Chebyshev iteration, the cg algorithm results instead of the Chebyshev iteration if another norm is chosen in the construction of the alternative to the minimization problem. This will be shown in the sequel, where the presentation follows [74] (the English speaking reader is again referred to [103] instead). Again, the approximation of the solution x of the linear problem Ax = b by vectors y_k from an affine subspace V_k is the starting point. Choosing the Euclidean norm ‖y‖_2 = √⟨y, y⟩ first led to a dead end, the only way out of which was passing to a solvable alternative problem. The underlying idea will still be followed, but now a scalar product fitting the problem shall be used. This is the scalar product ⟨x, y⟩_A = ⟨x, Ay⟩ = x^T A y weighted with a symmetric positive definite (shortly spd) matrix A. The energy norm is the corresponding norm.
Definition 3.4.1. For positive definite matrices A the so-called A-norm or energy norm is introduced:

‖x‖_A := √⟨x, x⟩_A = √(x^T A x).
Let V_k = x_0 + U_k ⊂ R^n again be a k-dimensional affine subspace and let U_k be the linear subspace parallel to V_k.
Definition 3.4.2. The solution x_k of the minimization problem

‖x − x_k‖_A = min_{y ∈ V_k} ‖x − y‖_A

is called the Ritz-Galerkin approximation of x in V_k. According to a theorem from linear least squares theory,

x − x_k ∈ U_k^⊥

holds with

U_k^⊥ = {y ∈ R^n | ⟨y, z⟩_A = 0 for all z ∈ U_k}

the A-orthogonal complement of U_k in R^n. Thus, the solution x_k ∈ V_k of the minimization problem is uniquely determined.
Definition 3.4.3. The function

P: R^n → V_k, y ↦ Py with ‖y − Py‖ = min_{z ∈ V_k} ‖y − z‖

is affine linear and is called the orthogonal projection of R^n onto the affine subspace V_k.

Consequently, x_k is the orthogonal projection of x onto V_k with respect to ⟨·,·⟩_A. With that, the minimization problem is equivalent to the following variational problem:

⟨x − x_k, u⟩_A = 0 for all u ∈ U_k. (3.3)

Instead of "orthogonal with respect to ⟨·,·⟩_A", the term "A-orthogonal" (also A-conjugate, for historical reasons) is often used. Equation (3.3) states that the error x − x_k must be A-orthogonal to U_k. With the residuals r_k := b − A x_k, the relation

⟨x − x_k, u⟩_A = ⟨A(x − x_k), u⟩ = ⟨r_k, u⟩ for all u ∈ U_k

holds. Consequently, the variational problem (3.3) is equivalent to the condition that the residuals r_k be orthogonal (with respect to the Euclidean scalar product) to U_k, i.e.,

⟨r_k, u⟩ = 0 for all u ∈ U_k.
Now let p_1, ..., p_k be an A-orthogonal basis of U_k, i.e.,

⟨p_i, p_j⟩_A = 0 for i ≠ j.

Then it follows for the A-orthogonal projection P_k: R^n → V_k that

x_k = P_k x = x_0 + Σ_{i=1}^{k} (⟨r_0, p_i⟩ / ⟨p_i, p_i⟩_A) p_i. (3.4)

In contrast to the derivation of the Chebyshev iteration, here there no longer exists any dependence of the right-hand side on the yet unknown solution x, i.e., the A-orthogonal projection x_k of x onto V_k can be calculated explicitly. Now (3.4) implies the recursions

x_k = x_{k−1} + α_k p_k,
r_k = r_{k−1} − α_k A p_k with α_k = ⟨r_0, p_k⟩ / ⟨p_k, p_k⟩_A

for the approximate solutions x_k and the residuals r_k, since r_k = b − A x_k = r_{k−1} − α_k A p_k.
Now only subspaces V_k ⊂ R^n for which an A-orthogonal basis p_1, ..., p_k can easily be computed are still missing for the construction of an approximation method. By the Cayley-Hamilton theorem (see, e.g., [48], [114]), there exists a polynomial p_{n−1} ∈ P_{n−1} such that

A^{−1} = p_{n−1}(A)

and therefore

x − x_0 = A^{−1} r_0 = p_{n−1}(A) r_0 ∈ span{r_0, A r_0, ..., A^{n−1} r_0}.

Choose V_k = x_0 + U_k with

U_k := span{r_0, A r_0, ..., A^{k−1} r_0} for k = 1, ..., n

for the approximation spaces. Then x ∈ V_n, i.e., the n-th approximation x_n is the solution itself. The subspaces U_k defined in this way are called Krylov subspaces [91]. Generally, the notation K_m is mostly used instead of the U_k used here:
Definition 3.4.4.

K_m = K_m(A, y) := span{y, A y, ..., A^{m−1} y} = {v ∈ R^n | v = Σ_{i=0}^{m−1} c_i A^i y}

with K_0 = {y} and m = 1, 2, 3, .... K_m is called the m-th Krylov subspace of R^n generated by A and y. Therefore, all cg-like methods are referred to as Krylov subspace methods. With y = r_0 = b − A x_0 the residual for the initial vector x_0, the relation

b − A x_m ⊥ K_m(A, r_0)

is called the Petrov-Galerkin condition. The following theorem justifies the construction of an A-orthogonal basis p_1, ..., p_k from the residuals:
Theorem 3.4.1. Let r_k ≠ 0. Then the residuals r_0, ..., r_k are pairwise orthogonal, i.e.,

⟨r_i, r_j⟩ = δ_ij ⟨r_i, r_i⟩ for i, j = 0, ..., k.

They span U_{k+1}, i.e.,

U_{k+1} = span{r_0, ..., r_k}.

Proof: complete induction over k (see, for instance, [74]).

Set p_1 := r_0 for r_0 ≠ 0. By the above theorem, for k ≥ 1 either r_k vanishes, i.e., the solution x = x_k has been found, or the vectors p_1, ..., p_k and r_k are linearly independent and span U_{k+1}, so that, with

p_{k+1} := r_k + β_{k+1} p_k,

an A-orthogonal basis of U_{k+1} is found. This gives rise to the cg algorithm:
Algorithm 3.4.1.1 (CG Algorithm; Conjugate Gradient)
Choose β_1 := 0, x_0 := 0 and thus r_0 := b, p_1 := r_0
For k = 1, ..., n:
  if r_{k−1} = 0 then
    set x = x_{k−1} and finish the computation.
  otherwise
    β_k = ⟨r_{k−1}, r_{k−1}⟩ / ⟨r_{k−2}, r_{k−2}⟩
    p_k = r_{k−1} + β_k p_{k−1}
    α_k = ⟨r_{k−1}, r_{k−1}⟩ / ⟨p_k, p_k⟩_A
    x_k = x_{k−1} + α_k p_k
    r_k = r_{k−1} − α_k A p_k
x = x_n
In each iteration step, only one matrix-vector multiplication, namely A p_k, is necessary. The cg algorithm has the following properties if the computation is free of rounding errors:
- r_n = 0, and x_n is the exact solution after at most n steps.
- The vectors p_k are pairwise A-orthogonal or conjugate, i.e.,
  ⟨p_k, p_i⟩_A = 0 whenever k ≠ i.
  They are called search directions.
- Each residual is orthogonal to all previous search directions of the functional F.
- All residuals are pairwise orthogonal.
- The recursively computed residual r_k is the same as the actual residual r_k = b − A x_k of the iterated approximate solution x_k.

Thus the cg algorithm combines properties of direct and iterative solution methods: It generates a sequence x_k which approximates the solution x in the prescribed manner, but, after at most n steps, it gives the exact solution if there are no rounding errors, i.e., in exact arithmetic. In practice, the recursively computed residuals r_k of the cg algorithm are not orthogonal because of rounding errors. Therefore, the exact solution cannot be obtained after n steps in practice. So, the iterative aspect of the method became more emphasized over the years. In 1959, Stiefel and his co-workers (Rutishauser, Engeli et al.) pointed out the attractiveness of the cg algorithm as a solution method for sparse systems. Since the publication of Reid in 1971, the iterative aspect has mainly been of interest. One advantage among others is the availability of the recursively computed residuals in each iteration step. The following rough error estimate holds for the asymptotic rate of convergence of the cg algorithm.

Theorem 3.4.2. Let A ∈ R^{n×n} be a symmetric positive definite matrix with the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n. Let x* be the exact solution of the linear system Ax = b. The following relation holds for the iterates x_k of the cg algorithm:

‖x_k − x*‖_A ≤ 2 ((√κ − 1) / (√κ + 1))^k ‖x_0 − x*‖_A with κ = κ(A) = λ_1 / λ_n.

3. Determine the approximate solution:
x_k = x_0 + Q_k y_k, where T_k y_k = β_1 e_1,

with the n × k matrix Q_k = [v_1, v_2, ..., v_k] and the tridiagonal matrix T_k as in (3.6). In the Hermitian as well as in the non-Hermitian case, the following formula can be given for the residual [226]:

r_k = b − A x_k = −β_{k+1} (e_k^T y_k) v_{k+1}
with the k-th unit vector e_k. Therefore, the residual norm in step 2 of algorithm 3.4.2.2 can be determined without much effort, without computing the approximate solution itself. In step 3 of algorithm 3.4.2.2, the linear system T_k y_k = β_1 e_1 with the tridiagonal matrix T_k has to be solved. This can be done, e.g., by the LU decomposition T_k = L_k U_k. Including the decomposition into the Lanczos step, one can determine the approximate solutions x_k consecutively. Saad [226] calls this procedure the direct Lanczos algorithm. If A is real symmetric and a different factorization is chosen, the SYMMLQ algorithm [193] follows. A variant for non-Hermitian systems which follows another algorithm for real non-symmetric matrices given in [103] can be found in [280]. For the non-Hermitian Lanczos algorithm, one needs to store only six vectors and a tridiagonal matrix.
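For the symmetric case, the Lanczos process that produces Q_k and T_k can be sketched as follows. This is a minimal NumPy illustration (real symmetric A, no reorthogonalization and no breakdown handling; the function name and the demonstration at the end are our own, assuming k ≥ 2):

```python
import numpy as np

def lanczos(A, v, k):
    """Symmetric Lanczos: orthonormal Q_k and tridiagonal T_k = Q_k^T A Q_k
    via the three-term recurrence (assumes no breakdown within k steps)."""
    n = len(v)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)                 # diagonal of T_k
    off = np.zeros(k - 1)               # off-diagonal of T_k
    q = v / np.linalg.norm(v)
    q_prev = np.zeros(n)
    b_prev = 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - b_prev * q_prev
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j + 1 < k:
            off[j] = np.linalg.norm(w)
            q_prev, q, b_prev = q, w / off[j], off[j]
    T = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)
    return Q, T

# Demonstration: k steps, then x_k = x_0 + Q_k y_k with T_k y_k = beta_1 e_1
rng = np.random.default_rng(0)
n, k = 20, 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)             # symmetric positive definite
b = rng.standard_normal(n)
Q, T = lanczos(A, b, k)                 # x_0 = 0, so r_0 = b
e1 = np.zeros(k)
e1[0] = np.linalg.norm(b)               # beta_1 = ||r_0||
xk = Q @ np.linalg.solve(T, e1)
assert np.linalg.norm(b - A @ xk) < 0.1 * np.linalg.norm(b)
```

In floating point arithmetic the columns of Q_k slowly lose orthogonality, which is one motivation for the look-ahead and reorthogonalization variants discussed next.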
3.4.3 Look-Ahead Lanczos Algorithm

While normalizing v_{j+1} and w_{j+1}, an exact breakdown of the algorithm may happen if ⟨v_{j+1}, w_{j+1}⟩ = 0. More serious, however, is the case of a near-breakdown. In this case, the Lanczos vectors are scaled by very small numbers, which, after a few steps, can spoil the accuracy of the computation. Yet, by some modifications of the algorithm, it is possible to continue in most cases. This procedure is often referred to as Look-Ahead Lanczos. The underlying idea of this algorithm is that the pair {v_{j+2}, w_{j+2}} can often be defined even if the pair {v_{j+1}, w_{j+1}} cannot. Then the algorithm can be continued starting with the pair {v_{j+2}, w_{j+2}}. If {v_{j+2}, w_{j+2}} cannot be defined either, the pair {v_{j+3}, w_{j+3}} etc. can be used. The following explanation of the mechanism underlying the Look-Ahead Lanczos algorithm follows Saad [226]. First, define a bilinear form on the subspace P_{k−1} by
⟨p, q⟩_P := ⟨p(A) v_1, q(A^H) w_1⟩. (3.10)

Unfortunately, it may happen that ⟨p, q⟩_P vanishes or takes on negative values, thus being an "indefinite inner product". Now, there exists a polynomial p_j of degree j and a scalar γ_j such that v_{j+1} = p_j(A) v_1 and w_{j+1} = γ_j p_j(A^H) w_1. The Lanczos algorithm attempts to construct a sequence of polynomials which are orthogonal with respect to the inner product (3.10). Because of

⟨v_{j+1}, w_{j+1}⟩ = γ_j ⟨p_j(A) v_1, p_j(A^H) w_1⟩ = γ_j ⟨p_j, p_j⟩_P,

an exact breakdown happens in step j if and only if the indefinite norm of the polynomial p_j vanishes in step j. The main idea of the Look-Ahead Lanczos algorithm is that the polynomial p_j is left out but p_{j+1} is calculated in any case, thus allowing to continue constructing the sequence. The following example will illustrate this idea. The polynomials q_j and q_{j+1} are orthogonal to the polynomials p_1, ..., p_{j−2}. If p_j = q_j is fixed and p_{j+1} is determined by the requirement that q_{j+1} be orthogonal to p_{j−1} and p_j, then the resulting polynomial is orthogonal to all polynomials of degree ≤ j. It is therefore possible to continue the algorithm in the same way starting from step j + 1. The disadvantage of Look-Ahead implementations is the immense additional complexity. One difficulty is the need to decide when the situation is a near-breakdown. Furthermore, the matrices T_k are no longer tridiagonal. In what follows, the QMR algorithm will serve as a representative of the many known variants.
3.4.4 Variants of the CG Algorithm for Linear Systems with Non-Hermitian or Indefinite System Matrix

The cg algorithm is a very efficient and simply implementable method for symmetric positive definite matrices. In practice, however, linear systems Ax = b with a complex non-Hermitian or indefinite system matrix A often arise. A number of variants of the cg algorithm have been developed for these linear systems. Some important ones will be described in the following. In the literature, many algorithms have only been introduced for real non-symmetric matrices, but they can easily be transferred to the complex space. With that, the inner product ⟨·,·⟩: C^n × C^n → C is then defined as ⟨x, y⟩ := y^H x = ȳ^T x, where ȳ is the complex conjugate of the vector y. The Krylov subspace methods are still an active research topic even today. Especially interesting is their application to non-Hermitian linear systems. Yet it has to be noted that the Krylov subspace methods for non-Hermitian systems Ax = b with A ∈ C^{n×n}, A ≠ A^H are not robust enough to use them as black box solvers. There are many examples of system matrices A for which the methods introduced below either do not reach the prescribed accuracy or even fail to converge at all. Some of these examples are important applications from practice. Therefore thorough numerical experiments are always absolutely necessary in order to decide if an algorithm is applicable to a given problem.
CGNR and CGNE Algorithm (CG Applied to the Normal Equations). The simplest algorithms which are suitable for non-Hermitian or indefinite systems Ax = b and related to the cg algorithm are the CGNR and CGNE algorithms, which were developed by Craig [69] for real matrices A. Here the more general complex form is given. These algorithms are based on the normal equations A^H A x = A^H b (CGNR) or A A^H y = b with x = A^H y (CGNE). First, they transform the system into a related positive definite system, then apply the cg algorithm. [89] studies this ansatz for real problems. The advantage of these algorithms is their easy implementation (cf. [226], [280]). Yet, passing to the normal equations squares the condition number, since κ(A^H A) = κ(A)^2. Consequently, only very slow convergence can be expected, and it is essential to precondition the system appropriately. Then these algorithms are usually very robust but also very slow. The latter is the reason why these methods have been a priori excluded from the studies discussed below.

BiCG Algorithm (Bi-Conjugate Gradient). The cg algorithm is not suitable for non-Hermitian systems, since the residual vectors can no longer be made orthogonal via a short recurrence. The GMRES algorithm avoids this difficulty by using longer recurrences, which imposes extra storage requirements. The BiCG algorithm, on the contrary, replaces the orthogonal sequence of residuals by two mutually orthogonal sequences, but then it cannot maintain the minimization property. Just as it is possible to derive the cg algorithm from the Hermitian Lanczos algorithm, it is possible to derive the
BiCG algorithm from the non-Hermitian Lanczos algorithm [226]. The algorithm was first presented in 1952 by Lanczos [165]; then, in 1975, Fletcher [96] published a cg-like version for real non-symmetric matrices. The iterates of the BiCG algorithm satisfy the Petrov-Galerkin condition

r_k = b − A x_k ⊥ W_k (3.11)

with the Krylov subspaces V_k = V_k(A, r_0), where r_0 = b − A x_0, and W_k = W_k(A^H, r̃_0), where r̃_0 is an additional non-trivial starting vector, the so-called pseudo-residual related to A^H. In 1981, Jacobs [144] introduced the method for complex non-symmetric matrices. Thus, it is a method of projection onto V_k orthogonally to W_k. The BiCG algorithm has been developed for general systems of complex linear equations. It is the basis for several modern algorithms, which will be introduced in the sequel.
Algorithm 3.4.4.1 (BiCG Algorithm)
Choose x_0, r_0 = b − A x_0 and r̃_0 such that ⟨r_0, r̃_0⟩ ≠ 0, set p_{−1} := p̃_{−1} := 0, ⟨r_{−1}, r̃_{−1}⟩ := 1
For k = 0, 1, ...:
  β_k = ⟨r_k, r̃_k⟩ / ⟨r_{k−1}, r̃_{k−1}⟩
  p_k = r_k + β_k p_{k−1}
  p̃_k = r̃_k + β_k p̃_{k−1}
  α_k = ⟨r_k, r̃_k⟩ / ⟨p_k, p̃_k⟩_A
  x_{k+1} = x_k + α_k p_k
  r_{k+1} = r_k − α_k A p_k
  r̃_{k+1} = r̃_k − α_k A^H p̃_k
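A direct transcription of Algorithm 3.4.4.1 into NumPy might look as follows (the function name, the stopping test, and the iteration cap are our own additions; the inner product ⟨x, y⟩ = y^H x is realized by np.vdot(y, x)):

```python
import numpy as np

def bicg(A, b, x0, rt0=None, tol=1e-10, max_iter=200):
    """BiCG sketch following Algorithm 3.4.4.1; rt0 plays the role of the
    auxiliary starting vector (pseudo-residual), defaulting to r0."""
    x = x0.astype(complex)
    r = b - A @ x
    rt = r.copy() if rt0 is None else rt0.astype(complex)
    p = np.zeros_like(r)
    pt = np.zeros_like(r)
    rho_prev = 1.0                      # <r_{-1}, r~_{-1}> := 1
    for _ in range(max_iter):
        rho = np.vdot(rt, r)            # <r_k, r~_k> = r~_k^H r_k
        beta = rho / rho_prev
        p = r + beta * p
        pt = rt + beta * pt
        Ap = A @ p
        alpha = rho / np.vdot(pt, Ap)   # <r_k, r~_k> / <p_k, p~_k>_A
        x += alpha * p
        r -= alpha * Ap
        rt -= alpha * (A.conj().T @ pt)
        rho_prev = rho
        if np.linalg.norm(r) <= tol:
            break
    return x
```

Note the two matrix-vector products per step, one with A and one with A^H, and the absence of any breakdown guard in this sketch.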
In practice, the frequently used expressions like δ_k := ⟨r_k, r̃_k⟩ and v_k := A p_k are stored in separate variables. Here this is deliberately not done, to keep the algorithm as transparent as possible. The pseudo-residuals r̃_k and the pseudo search directions p̃_k theoretically guarantee the termination of the process after finitely many steps. By construction, the pseudo-residuals r̃_k are orthogonal to the residuals r_k, and the pseudo search directions p̃_k are A-conjugate to the search directions p_k. In the real symmetric case, the BiCG algorithm corresponds to the cg algorithm, since in this case the pseudo search directions and pseudo-residuals coincide with the original search directions and residuals. Only a few theoretical results about the convergence of the BiCG algorithm are known until now. It can be shown that the residual norm is reduced in phases. During each phase, the number of iterations of the algorithm is more or less comparable with that of the GMRES algorithm [100]. In practice, this can often be observed, but irregular convergence can also occur and the algorithm may even break down. The breakdowns are caused by divisions by zero during the computation of α_k and β_k. In the first case, the QMR algorithm provides an appropriate Look-Ahead strategy; in the other case, curing the breakdown leads to very complicated algorithms (cf. [196]). A restart directly before
some (near-)breakdown and a switch to a more robust algorithm such as the GMRES algorithm are other possibilities to handle (near-)breakdowns. The residuals r_k may be expressed as a linear combination of the basis vectors of V_k(A, r_0), which means nothing else than r_k = φ_k(A) r_0 with a polynomial φ_k in A. According to Sonneveld [240], the calculation of the vectors p_k and r_k can be considered the construction of the so-called BiCG polynomials ψ_k, φ_k ∈ P_k of degree k (where P_k = {q | q(t) = Σ_{i=0}^{k} a_i t^i, q(0) = 1, a_i ∈ R}) that satisfy three-term recurrence relations for p_k and r_k:

r_k = φ_k^{BiCG}(A) r_0, p_k = ψ_k^{BiCG}(A) r_0.

For the derivation of further cg-like algorithms, the polynomial representation of the algorithms shows great advantages. The polynomial representation of BiCG corresponds to the polynomial representation of the cg algorithm, merely with another bilinear form. The bilinear form used in BiCG is given by

[φ, ψ] := r̃_0^H φ(A) ψ(A) r_0,

while for the cg algorithm, the positive semi-definite bilinear form

[φ, ψ] := r_0^H φ(A)^H ψ(A) r_0

is used. In general, i.e., for non-Hermitian indefinite A, the bilinear form used in BiCG is not positive semi-definite.
COCG Algorithm (Conjugate Orthogonal Conjugate Gradient). The COCG algorithm of van der Vorst and Melissen [276] is a symmetric variant of the complex BiCG algorithm. Let A ∈ C^{n×n}, A = A^T, b ∈ C^n, x_0 ∈ C^n. The choice r̃_0 = r̄_0, the complex conjugate of the initial residual r_0, simplifies the computation of the factors α_k and β_k of the BiCG algorithm in the complex symmetric case, since one matrix-vector multiplication with the matrix A^H is avoided.

Algorithm 3.4.4.2 (COCG Algorithm)
Choose x_0 ∈ C^n, set p_0 = r_0 = b − A x_0
For k = 0, 1, ...:
  β_k = ⟨r_k, r̄_k⟩ / ⟨r_{k−1}, r̄_{k−1}⟩ (with β_0 := 0)
  p_k = r_k + β_k p_{k−1}
  α_k = ⟨r_k, r̄_k⟩ / ⟨p_k, p̄_k⟩_A
  x_{k+1} = x_k + α_k p_k
  r_{k+1} = r_k − α_k A p_k

If the system matrix is complex symmetric, this algorithm may be assumed to be the best choice, since only one matrix-vector multiplication per iteration is necessary, while the BiCG algorithm requires two matrix-vector multiplications. With respect to storage requirements, the COCG algorithm is also favourable, with three vectors to store compared to five vectors in the BiCG algorithm. For the polynomial representation, the following relations hold for the residuals r_k and the search directions p_k of the COCG:
r_k = φ_k^{COCG}(A) r_0 and p_k = ψ_k^{COCG}(A) r_0 for k = 0, 1, ....

CGS Algorithm (Conjugate Gradient Squared). Sonneveld's CGS algorithm [240] replaces the BiCG residual polynomial by its square, r_k = (φ_k^{BiCG}(A))^2 r_0, so that the multiplication with A^H is no longer needed.

Algorithm 3.4.4.3 (CGS Algorithm)
Choose x_0, r_0 = b − A x_0 and r̃_0 such that ⟨r_0, r̃_0⟩ ≠ 0, set q_0 := p_{−1} := 0, β_0 := 0
For k = 0, 1, ...:
  u_k = r_k + β_k q_k
  p_k = u_k + β_k (q_k + β_k p_{k−1})
  α_k = ⟨r_k, r̃_0⟩ / ⟨p_k, r̃_0⟩_A
  q_{k+1} = u_k − α_k A p_k
  x_{k+1} = x_k + α_k (u_k + q_{k+1})
  r_{k+1} = r_k − α_k A (u_k + q_{k+1})
  β_{k+1} = ⟨r_{k+1}, r̃_0⟩ / ⟨r_k, r̃_0⟩
Compared with the BiCG algorithm, the CGS algorithm has the advantage that the pseudo-residuals r̃_k and the pseudo search directions p̃_k, as well as A^H, are not needed. It requires only two matrix-vector multiplications per iteration and the storage of seven vectors. Theoretically, the CGS algorithm converges if the BiCG algorithm converges. Moreover, for many applications, CGS is usually faster than BiCG. The squaring of the BiCG polynomials is a characteristic of the CGS algorithm; depending on the choice of the initial residual r̃_0, it either intensifies an existing contraction property of the polynomial or increases the residual norm. This causes the often observed irregular convergence behaviour, can lead to numerical cancellation, and thus strongly influences the stability of this algorithm. As a result of this observation, several new, stabilized algorithms have been developed.
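Assuming the standard Sonneveld formulation of the CGS recurrences, a NumPy transcription might look as follows (function name, stopping test, and real arithmetic for brevity are our own choices):

```python
import numpy as np

def cgs(A, b, x0, tol=1e-10, max_iter=200):
    """CGS sketch: the BiCG polynomial is applied twice, so no product
    with A^H is needed (two products with A per step instead)."""
    x = x0.astype(float).copy()
    r = b - A @ x
    rt = r.copy()                       # r~_0 with <r_0, r~_0> != 0
    p = np.zeros_like(r)
    q = np.zeros_like(r)
    rho_prev = 1.0
    for _ in range(max_iter):
        rho = rt @ r
        beta = rho / rho_prev           # first pass: p = q = 0, so u = r
        u = r + beta * q
        p = u + beta * (q + beta * p)
        Ap = A @ p
        alpha = rho / (rt @ Ap)
        q = u - alpha * Ap
        x += alpha * (u + q)
        r -= alpha * (A @ (u + q))
        rho_prev = rho
        if np.linalg.norm(r) <= tol:
            break
    return x
```

The irregular convergence mentioned above shows up as non-monotone jumps of ‖r_k‖ in such an implementation.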
CGS2 Algorithm. Fokkema, Sleijpen, and van der Vorst [238] developed generalized versions of the CGS algorithm. The next relative is the CGS2 algorithm. This algorithm chooses

r_k = φ̃_k(A) φ_k^{BiCG}(A) r_0

with a 'nearby BiCG polynomial' φ̃_k which is based on a vector s̃_0 instead of r̃_0. Because of the great similarity to CGS, the algorithm is not explicitly given here. It also requires only two matrix-vector multiplications but, compared to the CGS algorithm, has a higher storage requirement of ten vectors.
SCBiCG Algorithm. Recently, Clemens [60] combined several algorithms for complex symmetric systems into one class by introducing the more general formulation of the SCBiCG(T, n) algorithm. This class includes the COCG and BiCGCR algorithms for n = 0 and n = 1. It stands out due to the fact that it only requires one matrix-vector multiplication per iteration. Clemens also combines these algorithms with Minimal Residual smoothing, following Schönauer [233]. Let π ∈ P_n be a polynomial of degree n (P_n := {q | q(z) = Σ_{i=0}^{n} c_i z^i, z ∈ C, c_i ∈ R, c_n ≠ 0}). Denote the set of its coefficients c_i by T: T := {c_i}_{i=0,...,n}. For these algorithms, the pseudo-residual r̃_0 of the BiCG algorithm is chosen according to

r̃_0 := π(A) r_0.
This implies by induction that

r̃_k = π(A) r_k and p̃_k = π(A) p_k for k = 0, 1, ...

holds for the pseudo-residuals and pseudo search directions of BiCG. The set of coefficients T and the degree n of the polynomial π chosen for the construction of r̃_0 determine a special algorithm from this class. Auxiliary vectors v(i)_k := A^i v_k are defined in order to formulate the algorithm:
Algorithm 3.4.4.4 (SCBiCG(T, n) Algorithm)
Choose x_0, r(0)_0 = b − A x_0
For i = 0, 1, ..., n − 1: r(i + 1)_0 = A r(i)_0
For i = 0, 1, ..., n: p(i)_0 = r(i)_0
p(n + 1)_0 = A p(n)_0
For k = 0, 1, ...:
  x_{k+1} = x_k + α_k p(0)_k
  For i = 0, 1, ..., n: r(i)_{k+1} = r(i)_k − α_k p(i + 1)_k
  For i = 0, 1, ..., n: p(i)_{k+1} = r(i)_{k+1} + β_k p(i)_k
  p(n + 1)_{k+1} = A p(n)_{k+1}
with α_k and β_k computed from sums of the form Σ_l c_l ⟨r(i)_k, r(j)_k⟩, chosen in such a way that the biorthogonality conditions

⟨r̃_i, r_j⟩ = 0 for i ≠ j; i, j = 0, 1, ..., k,
⟨p̃_i, p_j⟩_A = 0 for i ≠ j; i, j = 0, 1, ..., k

are satisfied. The COCG algorithm results for n = 0, while, for n = 1 and c_0 = 0, one obtains an algorithm which coincides with the CR algorithm (Conjugate Residual) of Stiefel [247] in the real case; it will be called the BiCGCR algorithm.
3.5 Minimal Residual Algorithms and Hybrid Algorithms

The minimal residual algorithms are more stable than the cg-like algorithms. In particular, they display an absolutely monotone convergence behaviour. Nevertheless, some parallels to the cg algorithm exist. The GMRES algorithm of Saad and Schultz [229] is given below. It is a generalization of the MINRES algorithm of Paige and Saunders [193] to non-symmetric systems. Both generate a sequence of orthogonal vectors. However, while MINRES uses short recursions, it is necessary in GMRES to take into account all previously computed vectors of the orthogonal sequence. For this reason, restarted versions of this method are used in practice. Other related methods are ORTHODIR of Jea and Young [145], a method by Axelsson [6] that forms the basis of the Generalized Conjugate Gradient, Least Squares method of [8], which is described in subsection 3.5.3, the Generalized Conjugate Residual method GCR of Elman [89], [88], the GMERR algorithm of Weiss [312], and the recently published GMBACK algorithm of Kasenally [151]. However, in order to be able to compete with the cg-like methods, these methods require in general (even in the restarted versions) the storage of a large number of basis vectors. In combination with the typical size of the problems coming from applications, their use very often does not make sense. GMRES, for example, was compared, for ill-conditioned linear
systems, by Schmid, Paffrath, and Hoppe in [230] with BiCGSTAB and CGS, which proved to be much more efficient. The observed very slow convergence of GMRES and the wild oscillations of CGS can each be regarded as typical for the two classes.
3.5.1 GMRES Algorithm (Generalized Minimal Residual)

Since the GMRES algorithm itself cannot be recommended for practice because of its enormous storage requirements, and since it only occurs in the sequel in connection with hybrid methods such as the BiCGSTAB algorithm, a detailed derivation is not given here. It can be found in the original publication of Saad and Schultz [229] (also see [280]). In GMRES, as in the cg algorithm, a minimization problem is solved instead of the linear system itself. The minimization problem in GMRES is

‖b − A x_k‖_2 = min_{y ∈ x_0 + V_k(A, r_0)} ‖b − A y‖_2. (3.12)

GMRES is based on the conservation of orthogonality rather than on a three-term recurrence. Consequently, the orthonormal basis {v_1, ..., v_k} of the Krylov subspace V_k(A, r_0) has to be determined. To this end, the Arnoldi algorithm [3], a modified Gram-Schmidt orthonormalization method, is applied. If the vector y in (3.12) is expressed by its Krylov space representation y = x_0 + Q_k z, then (3.12) is equivalent to the minimization of

J(z) = ‖β v_1 − A Q_k z‖_2 with β = ‖r_0‖,

where Q_k = [v_1, ..., v_k] is the matrix with the columns v_1, ..., v_k. The Arnoldi process yields a k-dimensional upper Hessenberg matrix H_k with the elements h_{ij} = ⟨v_j, v_i⟩_A. Now let H̄_k be the Hessenberg matrix H_k extended by the row (0, ..., 0, h_{k+1,k}). Then the relation

A Q_k = Q_{k+1} H̄_k,

which is proven in [44] and is very important for the derivation of GMRES, holds. With that, using the orthonormality of Q_{k+1}, the functional J(z) can be expressed as

J(z) = ‖β e_1 − H̄_k z‖_2

with the unit vector e_1. For the solution of the minimization problem, H̄_k is then brought into upper triangular shape via Givens rotations: H̄_k = Ω_k R_k with a unitary rotation matrix Ω_k. Then J(z) can be transformed into

J(z) = ‖Ω_k^H β e_1 − R_k z‖_2.

With that, z_k = argmin_z J(z) can be obtained by backward substitution, since R_k is an upper triangular matrix.
The number of vectors to be stored for the construction of the orthonormal basis grows linearly with k, the number of floating point operations (flops) quadratically. Therefore, in practice a "dynamical" version of GMRES is used which restarts after l steps and therein uses x_l as initial approximation.
Algorithm 3.5.1.1 (GMRES(l) Algorithm)
Choose x_0, set r_0 = b − A x_0, v_1 = r_0/‖r_0‖, g_0 = (‖r_0‖_2, 0, ..., 0)^T
For j = 1, 2, ..., l: (Arnoldi)
  For i = 1, ..., j: h_{i,j} = ⟨v_j, v_i⟩_A
  ṽ_{j+1} = A v_j − Σ_{i=1}^{j} h_{i,j} v_i
  h_{j+1,j} = ‖ṽ_{j+1}‖
  v_{j+1} = ṽ_{j+1} / h_{j+1,j}
  Store H̄_j in factorized form H̄_j = Ω_j R_j
  g_j = Ω_j g_{j−1}
Compute z_l from R_l z_l = g_l
x_l = x_0 + Q_l z_l
If ‖r_l‖ > ε: restart with x_0 := x_l, v_1 := r_l/‖r_l‖
Convergence of GMRES(l) is guaranteed only for positive definite matrices. For indefinite systems, stagnation may happen, which, however, is seldom observed [230]. Implementations using the Gram-Schmidt orthogonalization are relatively inexpensive but may be numerically unstable. So, some implementations apply the Householder transformation for orthonormalization, which is twice as expensive. These implementations are known for their stability and their good vectorizability [230], [44].
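A compact NumPy sketch of restarted GMRES(l) follows. For brevity, the small least squares problem min ‖β e_1 − H̄ z‖_2 is solved with np.linalg.lstsq in one shot, instead of the incremental Givens rotations used in practical implementations; the function name and all tolerances are our own:

```python
import numpy as np

def gmres_restarted(A, b, x0, l=20, tol=1e-10, max_restarts=50):
    """GMRES(l) sketch: Arnoldi with modified Gram-Schmidt, least squares
    solve of the (l+1) x l Hessenberg problem, restart until converged."""
    x = x0.astype(float).copy()
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol:
            break
        Q = np.zeros((len(b), l + 1))
        H = np.zeros((l + 1, l))
        Q[:, 0] = r / beta
        m = l
        for j in range(l):
            w = A @ Q[:, j]
            for i in range(j + 1):               # modified Gram-Schmidt
                H[i, j] = Q[:, i] @ w
                w -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:              # "happy" breakdown
                m = j + 1
                break
            Q[:, j + 1] = w / H[j + 1, j]
        g = np.zeros(m + 1)
        g[0] = beta                              # beta * e_1
        z, *_ = np.linalg.lstsq(H[:m + 1, :m], g, rcond=None)
        x += Q[:, :m] @ z
    return x
```

The l + 1 stored basis vectors make the storage cost explicit: it grows with the restart length, which is exactly the drawback discussed above.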
3.5.2 Hybrid Methods

The hybrid methods described below combine the cg algorithm, the BiCG algorithm, or the Look-Ahead Lanczos algorithm with a minimal residual ansatz, particularly with the GMRES algorithm. This way the advantage of short recursions in the cg or Lanczos algorithm is combined with the stable and monotone convergence behaviour of the minimal residual algorithms. The resulting algorithms are very well suited for the solution of complex non-Hermitian linear systems.
BiCGSTAB Algorithm (Bi-Conjugate Gradient Stabilized). The BiCGSTAB algorithm [275] was developed by van der Vorst as a stabilized version of the BiCG algorithm; it is a modification of the CGS algorithm. Because of the squaring of the residual polynomial (φ_k^{BiCG}(A))^2 in the CGS algorithm, it may happen, in case of irregular convergence, that
rounding errors build up and finally lead to an overflow. To avoid this, the residuals r_k in BiCGSTAB are defined by

r_k = π_k(A) φ_k^{BiCG}(A) r_0

with a new polynomial π_k. This new polynomial π_k is defined recursively in each step. The goal is stabilization and smoothing of the algorithm. Therefore, π_k ∈ P_k is defined as

π_{k+1}(t) = (1 − ω_k t) π_k(t).

Thus, in each step, the preceding polynomial is multiplied by a polynomial of degree 1. The new parameter ω_k is chosen so that, by multiplication of the residual vector by (I − ω_k A), a steepest descent in the direction of the preceding residual is achieved. Thus the parameters ω_k are chosen so that the Euclidean norm of r_k is minimized. In other words, π_k is a product of k 1-step MR polynomials (Minimal Residual):

π_k(t) = (1 − ω_1 t)(1 − ω_2 t) ... (1 − ω_k t).

Introducing the vector s_k with

s_k := π_{k−1}(A) φ_k^{BiCG}(A) r_0,

the parameter ω_k is defined by

ω_k = ⟨s_k, s_k⟩_A / ⟨s_k, s_k⟩_{A^H A}.

In vector form, the algorithm is given by:

Algorithm 3.5.2.1 (BiCGSTAB Algorithm)
Choose x_0, r_0 = b − A x_0 and r̃_0 such that ⟨r_0, r̃_0⟩ ≠ 0, set p_0 := r_0
For k = 1, 2, ...:
  α_{k−1} = ⟨r_{k−1}, r̃_0⟩ / ⟨p_{k−1}, r̃_0⟩_A
  s_k = r_{k−1} − α_{k−1} A p_{k−1}
  ω_k = ⟨s_k, s_k⟩_A / ⟨s_k, s_k⟩_{A^H A}
  x_k = x_{k−1} + α_{k−1} p_{k−1} + ω_k s_k
  r_k = s_k − ω_k A s_k
  β_k = (⟨r_k, r̃_0⟩ / ⟨r_{k−1}, r̃_0⟩) (α_{k−1} / ω_k)
  p_k = r_k + β_k (p_{k−1} − ω_k A p_{k−1})
The BiCGSTAB algorithm combines the BiCG polynomial with a GMRES(1) minimization step. Therefore, the BiCGSTAB algorithm normally leads to smoother convergence curves than the CGS algorithm. However, in the BiCGSTAB algorithm, stagnation or even a breakdown can happen if ω_k nearly vanishes. The computational effort is two matrix-vector multiplications per iteration, and the storage requirement is seven vectors.
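Algorithm 3.5.2.1 can be sketched in NumPy as follows (real arithmetic for brevity; the function name, stopping test, and iteration cap are ours, and no guard against a nearly vanishing ω_k is included). The ω_k line implements the one-dimensional minimal residual step described above:

```python
import numpy as np

def bicgstab(A, b, x0, tol=1e-10, max_iter=200):
    """BiCGSTAB sketch: a BiCG step followed by a 1-step minimal
    residual (GMRES(1)) correction with parameter omega_k."""
    x = x0.astype(float).copy()
    r = b - A @ x
    rt = r.copy()                       # r~_0 with <r_0, r~_0> != 0
    p = r.copy()
    rho_prev = rt @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rho_prev / (rt @ Ap)
        s = r - alpha * Ap
        As = A @ s
        omega = (As @ s) / (As @ As)    # minimizes ||s - omega * A s||_2
        x += alpha * p + omega * s
        r = s - omega * As
        if np.linalg.norm(r) <= tol:
            break
        rho = rt @ r
        beta = (rho / rho_prev) * (alpha / omega)
        p = r + beta * (p - omega * Ap)
        rho_prev = rho
    return x
```

The two products A p and A s per step match the operation count stated above.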
BiCGstab2 and BiCGstab(l) Algorithm. In 1991, Gutknecht proposed the BiCGstab2 algorithm [110] in order to avoid the stagnation and breakdown caused by nearly vanishing ω_k. His proposal was to use an MR polynomial of second degree: in each even step, the first-degree MR polynomial of the preceding step is corrected. The MR polynomial of degree one, however, can already be nearly degenerate and thus can cause degeneration of the MR polynomial of second degree as well as large errors. Therefore, Sleijpen and Fokkema introduced a generalization of this method in 1993 that constructs an MR polynomial of degree l in each l-th step. This leads to the more efficient BiCGstab(l) algorithm [237] with

π_k = χ_m χ_{m−1} ... χ_0, where k = ml + l, χ_i ∈ P_l,

and χ_m minimizes the Euclidean norm of the resulting residual. This method can be regarded as a combination of the BiCG algorithm with GMRES(l). The iterates satisfy

x_k = x_{lm} ∈ x_0 + V_{2ml}(A, r_0).

Consequently, for l = 1, it coincides with the original BiCGSTAB algorithm. Certain near-breakdowns can be avoided using the BiCGstab(l) algorithm, but in general they cannot be avoided, since the leading coefficient of χ_m may become very small. In section 3.10, some studies regarding implementations with l = 1 and l = 2 are discussed. Even for l = 2, stagnation may happen if the GMRES(2) part stagnates. The BiCGstab(l) algorithm requires 2l matrix-vector multiplications in each iteration and correspondingly more vectors to be stored.
QMR and TFQMR Algorithm (Transpose-free Quasi-Minimal Residual). The QMR algorithm (Quasi-Minimal Residual) of Freund and Nachtigal [100] is based on the Look-Ahead Lanczos algorithm for non-Hermitian linear systems, which is combined with the minimal residual approach of the GMRES algorithm. The QMR algorithm can be applied even to singular square systems, as Freund and Hochbruck [98] showed. The Petrov-Galerkin condition is replaced in the QMR algorithm by a quasi-minimization of the residual norm. In contrast to the BiCG algorithm, breakdowns are essentially excluded in QMR by using a Look-Ahead strategy in the underlying Lanczos process. In the QMR algorithm, the vectors {v_j} (cf. algorithm 3.4.2.1) generated by the Look-Ahead Lanczos algorithm are used as a basis for the Krylov subspace V_k(A, r_0). Let Q_k be the matrix Q_k := [v_1, ..., v_k] built from the basis vectors. Then the k-th QMR iterate x_k is defined by

x_k = x_0 + Q_k z_k,

where z_k ∈ C^k is the unique solution of the least squares problem
‖β Y_{k+1} e_1 − Y_{k+1} T̄_k z_k‖_2 = min_{z ∈ C^k} ‖β Y_{k+1} e_1 − Y_{k+1} T̄_k z‖_2 (3.13)

(compare the minimization problem in the derivation of the GMRES algorithm). Therein β = ‖r_0‖, e_1 is the first unit vector from R^{k+1}, and the matrix Y_{k+1} := diag(ω_1, ω_2, ..., ω_{k+1}) is an arbitrary diagonal weighting matrix with ω_j > 0, j = 1, 2, ..., k + 1. The standard choice for the weights is ω_j = 1 for all j. The matrix T̄_k is a (k + 1) × k tridiagonal matrix from the Lanczos process which satisfies

A Q_k = Q_{k+1} T̄_k.

Therefore, the matrix Y_{k+1} T̄_k has full rank, which guarantees the existence of a unique solution of the problem (3.13). Then the following holds for the residual vector r_k := b − A x_k:

r_k = Q_{k+1} Y_{k+1}^{−1} (β Y_{k+1} e_1 − Y_{k+1} T̄_k z_k). (3.14)

Consequently, because of (3.13), the k-th QMR iterate x_k is characterized by the minimization of the second factor in (3.14). This is just the quasi-minimal residual property. For further details, the reader is referred to [100]. The QMR and TFQMR algorithms of Freund [97] are closely related to the CGS algorithm (compare [97], [330]). The convergence behaviour of the QMR algorithm is very similar to that of the CGS and CGS2 algorithms. However, the convergence curves are evidently much smoother. In the original QMR algorithm, the usual three-term recursions are used inside the Lanczos process. Yet, since it has been observed that vector iterations based on three-term recurrences are less robust in finite precision arithmetic than the mathematically equivalent two-term recurrences, an implementation with two-term recurrence relations was introduced by Freund and Nachtigal in [99]. There also exists a transpose-free version of the QMR algorithm, the so-called TFQMR algorithm. This algorithm is explicitly given here:
Algorithm 3.5.2.2 (TFQMR Algorithm)
Choose x_0; set w_1 = y_1 = r_0 = b − A x_0; choose r̃_0 such that <r_0, r̃_0> ≠ 0; v_0 = A y_1; p_0 = 0; ϑ_0 = 0.
For k = 1, 2, ...:
    α_{k−1} = <w_{2k−1}, r̃_0> / <v_{k−1}, r̃_0>
    y_{2k} = y_{2k−1} − α_{k−1} v_{k−1}
    For j = 2k − 1, ..., 2k:
        w_{j+1} = w_j − α_{k−1} A y_j
        ϑ_j = ‖w_{j+1}‖ √(1 + ϑ_{j−1}²) / ‖w_j‖
        p_j = y_j + ϑ_{j−1}² p_{j−1} / (1 + ϑ_{j−1}²)
        x_j = x_{j−1} + α_{k−1} p_j / (1 + ϑ_j²)
    β_k = <w_{2k+1}, r̃_0> / <w_{2k−1}, r̃_0>
    y_{2k+1} = w_{2k+1} + β_k y_{2k}
    v_k = A y_{2k+1} + β_k (A y_{2k} + β_k v_{k−1})
Again, to make the algorithm transparent, the usual auxiliary scalars, which of course should be used in an implementation, are not introduced.

3.5.3 GCG-LS(s) Algorithm (Generalized Conjugate Gradient, Least Squares)

The Generalized Conjugate Gradient, Least Squares algorithm of Axelsson [5], [6], [8] belongs to the group of generalized cg algorithms. A number of such methods have been invented in recent years. They are either of least squares type, as ORTHOMIN [291], the predecessor of the method [5], [6] treated here, and GMRES [229], or of Galerkin type, as in [313], [6]. In [228], the equivalences between the different methods are discussed in detail. All of these methods recursively construct a sequence of search directions {p_j} and approximate solutions x_k as linear combinations of the preceding search directions or as truncated linear expansions, either by minimization of a weighted squared norm of the residuals in the least squares sense or by demanding certain orthogonality relations. The computation of the search directions can happen in many different ways. In [5], [6], they are recursively determined as a linear combination of the last residuals and search directions. The GCG-LS(s) algorithm generalizes the method from [5], [6], allowing the search direction p_k to be a certain (s + 1)-term expansion and x_{k+1} a combination of p_k and p_{k−s−2}. Here p_{k−s−2} plays the role of a control term, and small values of this term indicate that the appropriate value might have been found for s, where s is the parameter of the "truncated version". For s = n − 1, the full version arises.

Algorithm 3.5.3.1 (GCG-LS Algorithm) k-th step; s_k = min{k, s}:
    α_k^{(k)} = − <r_k, p_k>_{A^H A} / <p_k, p_k>_{A^H A}
    x_{k+1} = x_k + α_k^{(k)} p_k
    r_{k+1} = r_k + α_k^{(k)} A p_k
    h_{k+1} = A^H A r_{k+1}
    For i = 0, 1, ..., s_k:
        β_{k−i}^{(k+1)} = <h_{k+1}, p_{k−i}> / <p_{k−i}, p_{k−i}>_{A^H A}
    p_{k+1} = −r_{k+1} + Σ_{i=0}^{s_k} β_{k−i}^{(k+1)} p_{k−i}
In each step, s_k + 3 inner products, two matrix-vector multiplications with A and one with A^H are necessary. The GCG-LS(s) algorithm converges monotonically without breakdowns. One can prove under certain assumptions that the algorithm terminates after finitely many steps. As usual, the convergence rate depends on the eigenvalue distribution. In [8], Axelsson gives estimates showing that the GCG-LS(s)
algorithm needs considerably fewer iterations than the CGNE algorithm in order to reach the same accuracy. It is worth noting that the Krylov sequence in the GCG-LS(s) algorithm is based on A and not on A^H A as in the CGNE algorithm. However, in the full GCG-LS algorithm, the costs per iteration grow linearly with the number k of steps. This is the reason why the truncated version of the algorithm was introduced.
3.5.4 Overview of BiCG-like Solvers

Table 3.1 compares the polynomial representation and numerical effort for some commonly used Krylov subspace and hybrid methods.

Solver       | BiCG polynomial                   | mat-vec mult. | vector storage
-------------+-----------------------------------+---------------+---------------
BiCG         | r_k = φ_k^{BiCG}(A) r_0           | 2             | 5
COCG         | r_k = φ_k^{BiCG}(A) r_0           | 1             | 3
CGS          | r_k = (φ_k^{BiCG})²(A) r_0        | 2             | 7
CGS2         | r_k = ψ_k(A) φ_k^{BiCG}(A) r_0    | 2             | 10
BiCGSTAB(l)  | r_k = π_k(A) φ_k^{BiCG}(A) r_0    | 2l            | 2l + 5
TFQMR        | QMR ansatz on CGS                 | 2             | 8

Table 3.1. Comparison of polynomial representation and numerical effort for some Krylov subspace and hybrid methods. The second column shows the polynomial representation of the k-th residual, the third column the number of matrix-vector multiplications per iteration, and the last column the necessary storage requirement in number of vectors.
3.6 Multigrid Techniques

The multigrid method, which in spite of its name is a construction principle for iterative solvers rather than a 'method', was originally developed for the fast solution of Poisson's equation. The historical development of multigrid techniques (MG for short) began in the early sixties with studies by Fedorenko and Bakhvalov: already in 1961, Fedorenko [92] described a two-grid algorithm and, in 1964 [93], the first multigrid algorithm for Poisson's equation on a square. In 1966, Bakhvalov [14] published a method for second order elliptic differential equations with variable coefficients. At that time one spoke of a method for a sequence of grids. On the basis of these papers, Brandt [40] began his studies in 1972. He discerned the great efficiency of the multigrid scheme. Independently of these papers and studies, Hackbusch [111] developed his multigrid algorithms, which he first published in 1976. Further important papers are listed, e.g., in [253] by Stüben and Trottenberg. In [269], Trottenberg gives a short overview of the basic ideas of multigrid techniques. Meanwhile, a more textbook-like treatise can be found, e.g., in
Briggs' multigrid tutorial [49], Hackbusch [112], [114], or Großmann/Roos [107]. In [320], Wittum gives a popular description of the method. In [42], Brandt gives some kind of a guide for the development of a multigrid algorithm. The multigrid techniques recently acquired additional importance as so-called multilevel preconditioners. Over the years, a variety of different derivations and views was found. Unfortunately, the multitude of notations going along with this does not lead to greater clarity. Among the noteworthy newer developments are Griebel's [106] representation of multigrid techniques and multilevel preconditioners as classical iterative solvers (Gauß-Seidel, Jacobi preconditioner) over generating systems which contain the node bases of several discretization levels, and Deuflhard's [73] cascade methods. The regularly published "mgnet-digest" [81] contains an overview of the wide range of existing literature. Only the essentials of multigrid techniques are given in this subsection before a special multigrid algorithm [277] is described in section 3.7.
Principle of Multigrid Techniques. A boundary value problem for a partial differential equation is usually discretized on some grid. The discretization leads to a large linear system Ax = b with simple structure: the system matrix A is sparse, the non-zero elements for many discretization methods are in very systematic order (e.g., they appear only on a few bands below and above the main diagonal) and, for homogeneous domains, all non-zero terms have the same order of magnitude. Iterative solution methods are best suited for such linear systems. The classical stationary iteration methods have three typical properties when they are applied to linear systems arising from the discretization of differential equations:

1. They give a very good smoothing of the error, i.e., the amplitude of high frequency error components in the Fourier expansion of the error is strongly diminished within a few iteration steps. But:
2. The rate of convergence worsens with refinement of the discretization, i.e., as h → 0.
3. The total error hardly diminishes after the smoothing of the high frequency terms.

The basic principle of multigrid now is the combined treatment of a sequence of discrete problems of increasing resolution - all approximating the same continuous problem. The combination is chosen in a way that each discrete problem mostly takes care of the highest frequency components, leaving lower frequencies to coarser resolutions. Properly implemented, this gives a rate of convergence independent of the step size h of the finest discretization.
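Property 1 can be observed directly in a small numerical experiment: the sketch below applies weighted Jacobi to the 1D Poisson matrix, acting on a smooth and on a highly oscillatory error mode. The grid size, damping parameter and mode numbers are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

n, h = 63, 1.0 / 64                 # interior points of a uniform grid on (0, 1)
xs = np.arange(1, n + 1) * h

def jacobi_sweeps(e, nsweeps, omega=2.0 / 3.0):
    """Weighted Jacobi applied to A e = 0 (A = 1D Laplacian): pure error damping."""
    for _ in range(nsweeps):
        Ae = 2.0 * e.copy()          # unscaled stencil (-1, 2, -1)
        Ae[1:] -= e[:-1]
        Ae[:-1] -= e[1:]
        e = e - 0.5 * omega * Ae     # e - omega * D^{-1} A e with D = (2/h^2) I
    return e

low = np.sin(np.pi * xs)            # smooth error mode
high = np.sin(48 * np.pi * xs)      # highly oscillatory error mode

damp_low = np.linalg.norm(jacobi_sweeps(low, 3)) / np.linalg.norm(low)
damp_high = np.linalg.norm(jacobi_sweeps(high, 3)) / np.linalg.norm(high)
print(damp_low)    # close to 1: the smooth mode is hardly damped
print(damp_high)   # far below 1: the oscillatory mode is strongly damped
```

Three sweeps suffice to reduce the oscillatory mode by orders of magnitude, while the smooth mode barely changes; this is exactly the combination of properties 1 and 3 above.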
Multigrid techniques are especially well suited for the treatment of linear boundary value problems for elliptic differential equations, for elliptic systems of partial differential equations, and for elliptic eigenvalue problems. In addition, they are also suited for non-linear boundary value problems. For other problems such as hyperbolic problems they can also be used but then they do not show their typical advantages (speed and robustness) compared with other methods. Let us emphasize again that - even though an explicit multigrid or multilevel method is used most often - the multigrid idea is a principle to construct iterative solvers for discrete partial differential equations rather than a solution method. This idea has been successfully applied to a wide range of problems, from elasticity to fluid dynamics and to all kinds of discretization methods. The original underlying principle is to exploit a separation of frequency scales implied by the spectral properties of the differential operator, which originally was an elliptic operator.
3.6.1 Smoothing and Local Fourier Analysis

Crucial for the quality of a multigrid algorithm is the choice of a suitable smoothing procedure for the given partial differential equation. One method, the local mode analysis, requires expanding the error of the approximate solution into its Fourier series. The smoothing is mainly a local process, since high frequencies have only a small area of influence. Therefore, the smoothing may be studied far away from the boundaries in the inner part of the grid. Such an analysis is also very helpful for the estimation of the rate of convergence. The error v_{n,m} = v(n·h_x, m·h_y) of an approximate solution on a two-dimensional grid is (locally) expanded in a Fourier series:

v_{n,m} = Σ_{k=−∞}^{∞} A_Θ e^{i(Θ_1 n + Θ_2 m)},   (3.15)

with Θ = (Θ_1, Θ_2). The single Fourier components of this error expansion may be studied separately. The convergence rate for the high-frequency components gives the smoothing rate, where a component is said to have high frequency if it is no longer visible, i.e., not representable, on the next coarser grid. For a grid with step size H, this is the case if the wave length of the wave is smaller than 2H. Thus, the following notation results in case of uniform coarsening:
Definition 3.6.1. Let G_h be a fine grid and let G_H be the next coarser one. The error components visible on G_h but not on G_H are the high frequency components. If G_h and G_H are regular grids with H = p·h, then π/p ≤ |Θ| ≤ π for these components.

Assume that the partial differential equation has been reduced by discretization to the linear system of equations
A x = b.   (3.16)

Then

M x^{(j+1)} + (A − M) x^{(j)} = b   (3.17)

gives a general description of the iteration. Now let v^{(j)}_{n,m} = x_{n,m} − x^{(j)}_{n,m} be the error before and v^{(j+1)}_{n,m} = x_{n,m} − x^{(j+1)}_{n,m} the error after one iteration step. For those, the ansatz

v^{(j)}_{n,m} = A^{(j)}_Θ e^{i(Θ_1 n + Θ_2 m)},   v^{(j+1)}_{n,m} = A^{(j+1)}_Θ e^{i(Θ_1 n + Θ_2 m)}

is used. The subtraction of (3.17) from (3.16) results in

M v^{(j+1)} + (A − M) v^{(j)} = 0.
Now, in order to obtain statements about convergence and the smoothing factor, it is necessary, at this point, to look at the studied problem and the chosen iterative method. In accordance with Brandt [41], the convergence factor and the smoothing rate are defined as follows:

Definition 3.6.2. Let Θ be a Fourier component of the error function. Then

μ(Θ) = A^{(j+1)}_Θ / A^{(j)}_Θ

is called the convergence factor of the Θ-component and

μ̄ := max { |μ(Θ)| : π/p ≤ |Θ| ≤ π }

is called the smoothing factor.
A smoothing factor equal to 0.5, e.g., means that three relaxation steps reduce the high frequency error components by nearly one order of magnitude (which is the case for Poisson's equation on a square using Gauss-Seidel and p = 2). Local mode analysis is straightforward only for regular grids with constant spacing; for unstructured grids it does not work at all.
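For weighted Jacobi (chosen here because its amplification factor has a simple closed form, unlike Gauss-Seidel), the smoothing factor for the 2D five-point Laplacian can be evaluated numerically from μ(Θ) = 1 − (ω/2)(2 − cos Θ_1 − cos Θ_2). The sampling resolution and the damping parameter ω = 4/5 are illustrative choices.

```python
import numpy as np

def mu(theta1, theta2, omega):
    """LFA amplification factor of weighted Jacobi for the 2D five-point Laplacian."""
    return 1.0 - 0.5 * omega * (2.0 - np.cos(theta1) - np.cos(theta2))

# sample the high-frequency region pi/2 <= max(|Theta1|, |Theta2|) <= pi
th = np.linspace(-np.pi, np.pi, 401)
T1, T2 = np.meshgrid(th, th)
high = np.maximum(np.abs(T1), np.abs(T2)) >= np.pi / 2

smoothing_factor = np.abs(mu(T1, T2, 4.0 / 5.0))[high].max()
print(smoothing_factor)    # 0.6 for omega = 4/5, the classical value
```

The maximum is attained both at Θ = (π/2, 0) and at Θ = (π, π), giving the well-known value μ̄ = 3/5 for this smoother.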
3.6.2 The Two-Grid Method

For the explanation of the principle behind multigrid techniques, it is sufficient to look at a two-grid method. This comprises all important components of a multigrid method and at the same time is still very easy to survey. In this introductory subsection, only regular two-dimensional grids will be treated. Let h be the step size of the fine grid and H the step size of the coarse grid; H = 2h is a usual choice. Before describing the two-grid method, we introduce some basic notation.
The Relaxation. According to their special task in connection with multigrid techniques, the classical iteration methods with good error smoothing, as, e.g., Gauss-Seidel, are generally referred to as smoothing procedures or as the relaxation. The choice of the relaxation depends on the problem. In general, a fixed number of relaxation steps is performed.

The Defect Equation. An essential point in multigrid techniques is the determination of some correction on the coarser grid. The following properties of the error are basic in this context:

1. The error of the iterated approximate solution of the linear system A_h x_h = b_h with exact solution x_h is itself the solution of another linear system with the same system matrix A_h: Let x̃_h be the approximate solution of A_h x_h = b_h iterated by the relaxation method. The error is given by v_h = x̃_h − x_h. Then A_h v_h = d_h holds with d_h = A_h x̃_h − b_h. Because of x_h = x̃_h − v_h, the solution v_h yields the searched correction for the approximate solution x̃_h.
2. The error v_h can be approximated well on a coarser grid since it is a smooth grid function: Let x̃_h^{(0)} be the initial approximation for x_h = A_h^{-1} b_h. The relaxation yields the approximation x̃_h. The error v_h = x̃_h − x_h is smoother than the error x̃_h^{(0)} − x_h and therefore can be represented and determined on a coarser grid without considerable distortion.

In this context, the following notation is used:

Definition 3.6.3. The equation A_h v_h = d_h is called the defect equation and the quantity d_h := A_h x̃_h − b_h is called the defect.

The Coarse Grid Correction. To determine the correction on a grid with step size H, all quantities of the defect equation have to be defined there. In general, the matrix A_H is constructed by applying the discretization method to the coarse grid, i.e., analogously to A_h. There exist other possibilities such as the algebraic AMG method described in subsection 3.6.3. The defect d_h and the correction v_H are transferred from one grid to another by linear mappings:

Definition 3.6.4. A linear mapping I_h^H : G_h → G_H from a grid with step size h to a coarser grid with step size H is referred to as the restriction. The restriction d_H = I_h^H d_h assigns a certain weighted average value obtained from neighbouring fine grid values to each coarse grid point.

Definition 3.6.5. A linear mapping I_H^h : G_H → G_h from a grid with step size H to a finer grid with step size h is referred to as the interpolation.
For every fine grid point, the interpolation I_H^h uses neighbouring coarse grid points. Usually - the error analysis will tell - a linear interpolation is sufficient. Interpolation and restriction are dual concepts, so there is some incentive in taking for I_h^H the generalized inverse of I_H^h. Usually, a somewhat simpler restriction serves as well. If a good interpolation is given, which may not be easy, there should be no problem in getting a matching restriction. Next, the defect equation on the coarse grid G_H can be set up and the correction can be determined.
Algorithm 3.6.2.1 (Coarse Grid Correction) One step of the coarse grid correction is composed of the following single steps:

1. Determine d_h = A_h x̃_h − b_h and transfer d_h onto a coarser grid → d_H
2. Solve A_H v_H = d_H
3. Transfer v_H onto the fine grid → v_h
4. Build x_h^{new} = x̃_h − v_h

On the coarsest grid used, the defect equation A_H v_H = d_H is usually solved directly. Thus, all elements of a multigrid algorithm are already introduced. A multigrid algorithm usually consists of a combination of two kinds of iteration methods:

I.  A classical iteration method with good error smoothing, as, e.g., the Gauss-Seidel algorithm.
II. An iteration method that reduces low frequency error components: one iteration step consists of applying a correction that has been computed on a coarser grid.
Even though the coarse grid correction, i.e., method II, is not convergent by itself, the combination of the smoothing iteration I with the coarse grid correction turns out to be a very fast converging method. These facts are discussed in detail in [112]. Before describing the general multigrid scheme, the two-grid method is formally given first:
Algorithm 3.6.2.2 (Two-Grid Method) One step of a two-grid method consists of:

h-grid:         relaxation (→ error smoothing)
                defect computation d_h = A_h x̃_h − b_h
restriction:    d_H = I_h^H d_h
H-grid:         solution of A_H v_H = d_H
interpolation:  v_h = I_H^h v_H
h-grid:         correction x_h^{new} = x̃_h − v_h

The two-grid method may also shortly be expressed by

x^{(j+1)} = R^ν x^{(j)} + I_H^h A_H^{-1} I_h^H (b_h − A_h R^ν x^{(j)}),
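To make the components concrete, here is a minimal, self-contained sketch of one two-grid step for the 1D Poisson model problem −u'' = f on (0,1) with homogeneous Dirichlet boundary conditions. The combination of weighted Jacobi as relaxation, full weighting as restriction, and linear interpolation is a standard illustrative choice, not the only possibility; all names and parameters are assumptions of this sketch.

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D Laplacian (-u_{i-1} + 2 u_i - u_{i+1}) / h^2, Dirichlet BCs."""
    Au = 2.0 * u.copy()
    Au[1:] -= u[:-1]
    Au[:-1] -= u[1:]
    return Au / h**2

def two_grid_step(b, x, nu=3):
    """One two-grid step; n = len(b) must be odd so that H = 2h works."""
    n = len(b)
    h = 1.0 / (n + 1)
    for _ in range(nu):                          # relaxation (error smoothing)
        x = x + (2.0 / 3.0) * (h * h / 2.0) * (b - apply_A(x, h))
    d = apply_A(x, h) - b                        # defect d_h = A_h x~_h - b_h
    dH = 0.25 * (d[0:-2:2] + 2.0 * d[1:-1:2] + d[2::2])   # full weighting
    m = len(dH)
    H = 2.0 * h
    AH = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / H**2
    vH = np.linalg.solve(AH, dH)                 # solve A_H v_H = d_H directly
    v = np.zeros(n)                              # linear interpolation to fine grid
    v[1::2] = vH
    vpad = np.concatenate(([0.0], vH, [0.0]))
    v[0::2] = 0.5 * (vpad[:-1] + vpad[1:])
    return x - v                                 # correction x_h - v_h

# five steps on -u'' = pi^2 sin(pi x): the residual drops by orders of magnitude
n = 31
h = 1.0 / (n + 1)
xs = np.arange(1, n + 1) * h
b = np.pi**2 * np.sin(np.pi * xs)
x = np.zeros(n)
for _ in range(5):
    x = two_grid_step(b, x)
```

Note that the convergence speed of this loop does not deteriorate when n is increased, in contrast to the plain Jacobi iteration.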
where R^ν stands for ν relaxation steps, i.e., R^ν x^{(j)} corresponds to x̃_h in the above description of the two-grid method. Usually, a fixed number of relaxation steps is carried out. Adaptive strategies perform smoothing steps until the smoothing rate, which is easily monitored, falls far below the smoothing factor.

3.6.3 The Multigrid Technique
While the two-grid method already shows the principle of the multigrid scheme, it is nevertheless different from the typical multigrid algorithms, for which the following holds in general:

- The grids are staggered more deeply than in the two-grid method; in general, at least three grids are used.
- The grids are swept through in special cycles, the most important types of which are the V- and W-cycles.
- The coarsest grid has only very few grid points, so that a direct solution method can be applied there.

Even for fine grids (h → 0), the use of several grids only leads to an effort proportional to the number of unknowns on the finest grid, because, on each of the finer grids, only a few iterations are necessary and, on the coarsest grid, only a problem with very few unknowns has to be solved directly or iteratively. In a multigrid algorithm with the grids G_1, ..., G_l, the correction on the grid G_{l−1}, l > 1, is only approximately determined by another coarse grid correction. Thus, these algorithms are built by recursive use of coarse grid corrections, where the number of recursions is determined by the number of grids.

Algorithm 3.6.3.1 (Multigrid Scheme) One step of a multigrid scheme on the grids G_1, ..., G_l, l > 1, consists of the following recursion:

grid G_l:       ν relaxations
                defect computation d_l = A_l x̃_l − b_l
restriction:    d_{l−1} = I_l^{l−1} d_l
grid G_{l−1}:   γ multigrid steps on grid G_{l−1}, i.e., for A_{l−1} v_{l−1} = d_{l−1}
interpolation:  v_l = I_{l−1}^l v_{l−1}
grid G_l:       correction x_l^{new} = x̃_l − v_l

The problem A_1 v_1 = d_1 on the coarsest grid is either solved by a direct method or by another iterative method.

The choice of the number γ of steps on each level j = 1, ..., l leads to different kinds of cyclic courses. A schematic representation of these cycles motivates the choice of the following notation:
128
3. Numerical Treatment of Linear Systems
Definition 3.6.6. A multigrid iteration with γ = 1 is called a V-cycle; one with γ = 2 is called a W-cycle. Fully adaptive cycling strategies have been devised, which are superior in complicated situations.
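A compact recursive sketch for the 1D Poisson model hierarchy shows how the parameter γ switches between V-cycle (γ = 1) and W-cycle (γ = 2). All concrete choices (smoother, transfer operators, coarsest-grid size, dense matrices for brevity) are illustrative assumptions, not the book's implementation.

```python
import numpy as np

def laplacian(n, h):
    """Dense 1D Laplacian with step h and Dirichlet boundary conditions."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def mg_cycle(A, b, x, gamma=1, nu=2, coarsest=7):
    """One multigrid cycle; gamma = 1 gives a V-cycle, gamma = 2 a W-cycle."""
    n = len(b)
    if n <= coarsest:                            # coarsest grid: direct solve
        return np.linalg.solve(A, b)
    h = 1.0 / (n + 1)
    for _ in range(nu):                          # pre-smoothing (weighted Jacobi)
        x = x + (2.0 / 3.0) * (h * h / 2.0) * (b - A @ x)
    d = A @ x - b                                # defect
    dH = 0.25 * (d[0:-2:2] + 2.0 * d[1:-1:2] + d[2::2])   # full weighting
    m = len(dH)
    AH = laplacian(m, 2.0 * h)
    vH = np.zeros(m)
    for _ in range(gamma):                       # gamma recursive coarse-grid steps
        vH = mg_cycle(AH, dH, vH, gamma, nu, coarsest)
    v = np.zeros(n)                              # linear interpolation
    v[1::2] = vH
    vpad = np.concatenate(([0.0], vH, [0.0]))
    v[0::2] = 0.5 * (vpad[:-1] + vpad[1:])
    return x - v                                 # coarse grid correction

n = 63
A = laplacian(n, 1.0 / (n + 1))
b = np.ones(n)
xV = np.zeros(n)
for _ in range(8):
    xV = mg_cycle(A, b, xV, gamma=1)             # eight V-cycles
xW = np.zeros(n)
for _ in range(8):
    xW = mg_cycle(A, b, xW, gamma=2)             # eight W-cycles
```

The grid sequence here is 63 → 31 → 15 → 7 interior points; the W-cycle visits each coarse level more often and is correspondingly more expensive per cycle.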
3.6.4 Embedding of the Multigrid Method into a Problem Solving Environment

The solution of an elliptic partial differential equation (PDE) usually is only part of a larger problem, which may contain many further parts. For the embedding of the solver there are two primary decisions:

- Can the multigrid be interwoven with the problem definition, creating a series of approximations of a continuous problem, or is it supposed to be a black box linear equation solver? Multigrid does not give a perfect black box linear equation solver. The most general attempt to build one was AMG (see below), and some 'grey box' attempts [245] have been developed, but all these fall far short of the speed of a genuine multigrid code. The definition of coarse grids as well as the interpolation have to be determined by the properties of the differential operator.
- Is there a series of similar problems to be solved (parameter studies ...) or a single one? Like all iterative solvers, the multigrid cycle given above needs a starting approximation. Unless this can be supplied by a previous problem or a parameter study, the natural way to get it is by solving the same problem with lower resolution. This leads directly to the FMG algorithm, where every grid has to provide only a slight increase in accuracy (maybe a factor of 4). Random starts are a waste of effort.

Thus, the proper way to implement a geometrical multigrid scheme proceeds as follows: the initial approximation of the solution x of the linear system Ax = b is generated using all coarser grids.
Algorithm 3.6.4.1 (FMG Approach) The FMG approach (Full MultiGrid) on the grid sequence G_1, ..., G_l is determined by:

grid G_1:                solution of A_1 x_1 = b_1
grid G_k, k = 2, ..., l: interpolation x̃_k = Ĩ_{k−1}^k x_{k−1}
                         i multigrid steps on grid G_k

x̃_k denotes the initial approximation on the grid G_k; x_k, k = 2, ..., l, denotes the approximation iterated with the multigrid scheme; x_l denotes the approximate solution on G_l. The interpolation Ĩ_{k−1}^k may be different from the interpolation I_{k−1}^k of the multigrid scheme.
The choice of the number i of iteration steps on each grid depends on the problem. Often, an interpolation Ĩ of higher order than I is chosen. The reason is that Ĩ interpolates the approximate solution of the linear system, which is not necessarily smooth, while I interpolates the smooth correction (= error of approximation). The term nested multigrid iteration is sometimes used in the literature for the FMG approach.

Figure 3.4. Formal course of the FMG approach in case of three grids with two V-cycles each. (Grid 1 is the coarsest.)
A further variant of the multigrid idea is the algebraic multigrid. These techniques use the multigrid idea without any geometrical background by assigning lower-dimensional systems to a given linear system, where these lower-dimensional systems reproduce certain strong couplings of the initial system. Apart from that, this approach is completely analogous to the multigrid techniques.
Algorithm 3.6.4.2 (AMG) In the AMG (Algebraic MultiGrid), problems of lower dimension are set up for the system Ax = b on several levels l, l−1, ..., 1:

A_{k−1} := I_k^{k−1} A_k I_{k−1}^k,   b_{k−1} = I_k^{k−1} b_k,   k = l, l−1, ..., 2.

The mappings I_k^{k−1} : R^{n_k} → R^{n_{k−1}} and I_{k−1}^k : R^{n_{k−1}} → R^{n_k} are referred to as the restriction and the interpolation. The restriction is defined by I_k^{k−1} := (I_{k−1}^k)^T. For the total of l levels, the method consists of the following recursion:

level l:        ν relaxations
                defect computation d_l = A_l x̃_l − b_l
restriction:    d_{l−1} = I_l^{l−1} d_l
level l−1:      γ steps of AMG on level l−1, i.e., for A_{l−1} v_{l−1} = d_{l−1}
interpolation:  v_l = I_{l−1}^l v_{l−1}
level l:        correction x_l^{new} = x̃_l − v_l
The problem A_1 v_1 = d_1 on the coarsest level is solved by a direct or another iterative method.
The analogy with multigrid techniques is obvious. The setup of the coarse grid matrices by means of interpolation and restriction according to A_{k−1} := I_k^{k−1} A_k I_{k−1}^k is also called the Galerkin approach [112]. Algebraic multigrid has mainly been developed in order to apply the multigrid idea also to so-called black box solvers. Such techniques are applicable to a whole class of algebraic problems such as, e.g., linear systems with symmetric positive definite matrix. The AMG algorithm frequently compares favourably with other black box linear equation solvers, but for some problems it failed miserably. The crucial part is the proper definition of the matrices I_k^{k−1}.
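The Galerkin construction can be checked on the 1D model problem: with linear interpolation P and full weighting R = ½ Pᵀ, the Galerkin product R A_h P reproduces, in this special case, exactly the three-point Laplacian of the coarse grid. The snippet below is an illustrative verification under these assumptions, not part of the original text.

```python
import numpy as np

def laplacian(n, h):
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def interpolation(m):
    """Linear interpolation from m coarse to n = 2m+1 fine interior points."""
    P = np.zeros((2 * m + 1, m))
    for j in range(m):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

m, h = 15, 1.0 / 32               # coarse size m; fine step h (n = 31 fine points)
P = interpolation(m)
R = 0.5 * P.T                     # full weighting = scaled transpose of interpolation
A = laplacian(2 * m + 1, h)
AH = R @ A @ P                    # Galerkin coarse grid operator

print(np.allclose(AH, AH.T))                    # True: symmetry is inherited
print(np.allclose(AH, laplacian(m, 2 * h)))     # True here: exactly the coarse Laplacian
```

Symmetry (and, likewise, positive definiteness) of A is inherited by construction, which is one reason the Galerkin approach is popular for the black box setting described above.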
Convergence Properties. As was noted at the beginning, the multigrid technique stands out due to the fact that its convergence is independent of the step size of the discretization. While the classical iteration methods x^{(j+1)} = M x^{(j)} + c typically show a contraction behaviour according to

‖x_h^{(j+1)} − x_h‖ ≤ ‖M_h‖ ‖x_h^{(j)} − x_h‖,   j = 1, 2, ...

with lim_{h→0} ‖M_h‖ = 1, it is possible to achieve the following estimate with some constant ξ ∈ (0,1) independent of h,

‖x_h^{(j+1)} − x_h‖ ≤ ξ ‖x_h^{(j)} − x_h‖,   j = 1, 2, ...
only if the algorithmic components of the multigrid method are suitably chosen and combined. Regarding a two-grid method on G_k, G_{k−1} for the solution of

A_k w_k = q_k,   q_k ∈ G_k,

the algorithm can be written as

w_k^{(j+1)} = R^{ν_2}(I − I_{k−1}^k A_{k−1}^{-1} I_k^{k−1} A_k) R^{ν_1} w_k^{(j)} + C_{k,k−1} q_k   (3.18)
            =: S_{k,k−1} w_k^{(j)} + C_{k,k−1} q_k.   (3.19)

Generalized, the (l+1)-grid operator can be obtained recursively as

S_{k,1} := R^{ν_2}(I − I_{k−1}^k (I − S_{k−1,1}) A_{k−1}^{-1} I_k^{k−1} A_k) R^{ν_1}.   (3.20)
Then the following theorem [107] can be shown:
Theorem 3.6.1. Let the two-grid operators S_{k,k−1} defined by (3.19) be bounded by

‖S_{k,k−1}‖ ≤ c_1,   k = 2, 3, ..., l

with some constant c_1 ∈ (0,1). Further, assume that some c_2 > 0 exists with

‖R^{ν_2} I_{k−1}^k‖ ‖A_{k−1}^{-1} I_k^{k−1} A_k R^{ν_1}‖ ≤ c_2,   k = 2, 3, ..., l.

Then there exists a positive integer σ such that, for γ ≥ σ, the multigrid operators described by (3.20) can be estimated by

‖S_{k,1}‖ ≤ c,   k = 2, 3, ..., l

with some constant c ∈ (0,1) independent of l.
Some Remarks on the Development of Multigrid Algorithms. It became obvious in this subsection that the following components have to be chosen appropriately during the development of a multigrid algorithm for a special problem:

- the special variant of the method, e.g., the FMG method,
- the relaxation method,
- the choice of grids and cycles,
- the restriction and interpolation,
- the solution method on the coarsest grid.

For simple problems, e.g., for Poisson's equation discretized on a rectangular domain with a structured grid, the choice of these components can be made with the help of the so-called model problem analysis or the (local) Fourier analysis (compare [112], [42]; also see subsection 3.6.1). This technique, often referred to as local mode analysis, is feasible only on structured grids. For more general problems, especially for problems on irregular domains, theoretical studies can hardly be carried out. Complex applications require a careful design. The sequence of grids, the smoother and the interpolation have to be adapted to the properties of the differential operator. Here, techniques like semi-coarsening or transforming smoothers may be necessary. Special care has to be taken if spectral properties of the problem, e.g., the number of negative eigenvalues, depend on the resolution. After the principal design has been decided upon, the tuning will usually require extensive experiments. The abstract convergence theory is of little help here, since it either treats only simple situations or is overly pessimistic. A frequent problem with real world applications is that the resolution of the coarse grid problems is far too low, so the asymptotic estimates are of little value.
3.7 A Special Multigrid Algorithm for the Solution of a Non-Hermitian Indefinite System Originally, the multigrid technique was developed as a principle for the construction of iterative solvers for discrete elliptic problems. Thus, optimal multigrid algorithms for Poisson's equation or the Navier-Stokes equation as well as for some other applications can easily be found in the relevant literature. Below we present an example of a problem illustrating the fact that, for some special practical applications, it can be very hard to find an optimal combination of multigrid components. The algorithm was designed in 1987 to solve some problems in the construction of an accelerator. It was subject to restrictions to be discussed in the text. They resulted in a less than optimal solution. Some of the issues involved - notably the treatment of indefinite problems - are better understood by now and would find a more
efficient treatment. Nevertheless, the example is rather typical of the accumulation of difficulties that may arise in real applications and thus result in a convergence speed far off optimality. A multigrid algorithm [277] that was developed primarily for a two-dimensional FIT grid in order to solve a linear system with complex symmetric indefinite matrix is introduced. This algorithm solves the high frequency linear system (2.16) from subsection 2.4, which can happen to be indefinite (for resonances and quasi-resonances). The complexity of the problem (difficult domains with arbitrary boundary curves) did not allow us to perform a theoretical convergence study. Therefore, the algorithm is to a large extent the result of experimental studies, which are summarized in the following. The description of the multigrid components which were finally used also reflects the imposed restrictions. In subsection 3.10, the convergence behaviour, which is mainly determined by the indefiniteness, is studied in more detail. The indefiniteness leads to problems in the course of the smoothing procedure and the approximation of the continuous problem [260]. In particular, numerical studies in [277] revealed that a grid-dependent correction of the discrete operator was necessary. Besides the indefiniteness, there are the high frequency and the near-singularity, which make the setup of an optimal multigrid algorithm more difficult.

3.7.1 Peculiarities of the Special Problem and Corresponding Measures

Discretization and Coarse Grid Generation. The calculations were embedded into a system of connected problems where easy exchange of data was essential, so the geometry description and the finest grid were given from outside by the URMEL/MAFIA code (Fig. 3.5). At that time (1987), a compatible grid of higher resolution would have required too much computing time for industrial use.
Of course, this given restriction collides with a guiding principle of multigrid construction: "Start out with a coarse grid and refine it." - yet this situation is quite common. For coarsening, first an existing AMG method was tested and found unsatisfactory, see below. To save development time, coarse grids were constructed by taking every other grid line of the primary grid and constructing the discrete equations by routines similar to those for the fine grid, with some variations at boundaries. An irregular coarsening adapted to the geometry would have given much better results. For the structure given in Fig. 3.5, the first coarsening would optimally contain horizontal lines 11 and 18, too, for a proper representation of the geometry of the system, and further coarsenings would be more irregular. Implementing such a coarsening based on the geometry description of the URMEL/MAFIA codes is quite a task. Even with a geometry adapted coarsening, it will in general not be possible to represent the geometry exactly on the coarse meshes, so that modifications of the coarse
3.7 Special MG-Algorithm for Non-Hermitian Indefinite System
grid equations as described below (Extension of FIT) will be needed. This way, all grids treat (nearly) the same physical problem, at least for k = 0.
Figure 3.5. FIT grids G_l and G_{l-3} for a cavity that was used in the PETRA storage ring at DESY in Hamburg (Deutsches Elektronen-Synchrotron) for the acceleration of elementary particles. The structure is cylindrically symmetric, so that, for reasons of symmetry, it is sufficient to discretize the upper half of the cross-section.
To facilitate explanations, only regular two-dimensional grids were treated in subsection 3.6. But a good discretization requires irregular grids. In [277], a Cartesian FIT grid is used that allows irregular step sizes. As was already noted, a uniform coarsening was chosen for which every other grid line of the grid G_l builds a grid line of G_{l-1} in the z- as well as in the r-direction:

Definition 3.7.1. Let P_{i,l} = (j-1)·J + k with j = 1, 2, ..., J, k = 1, 2, ..., K, J·K = N be the points of the grid G_l. Then, in the uniform coarsening of a FIT grid, the grid G_{l-1} is determined by the points P_{i,l-1} := (j-1)·J + k with j = 1, 3, ..., J-2, J and k = 1, 3, ..., K-2, K for odd J, K; with j = 1, 3, ..., J-1, J for even J and k = 1, 3, ..., K-1, K for even K.
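As an illustration, the index selection of Definition 3.7.1 can be sketched as follows (a minimal sketch; the function names are ours, not from [277]):

```python
def coarse_lines(n):
    """1-based indices of the grid lines kept by uniform coarsening:
    every other line starting from the first; for even n the last
    line is kept in addition (Definition 3.7.1)."""
    kept = list(range(1, n + 1, 2))
    if n % 2 == 0:
        kept.append(n)
    return kept

def coarse_points(J, K):
    """Point numbers p = (j - 1) * J + k of the coarse grid G_{l-1}
    within the numbering of the fine grid G_l."""
    return [(j - 1) * J + k for j in coarse_lines(J) for k in coarse_lines(K)]
```

For J = K = 5, for example, nine of the 25 fine grid points survive the first coarsening.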
For this coarsening, the following relation holds on all grid levels:

G_{l-1} ⊂ G_l.

For a regular Cartesian grid, this corresponds to the uniform coarsening with factor 2 (compare [112]). Besides the actual FIT grid G, the finite integration
3. Numerical Treatment of Linear Systems
Figure 3.6. FIT grids G_l and G_{l-1} as well as the dual grids G̃_l and G̃_{l-1}.
technique also needs a corresponding dual grid G̃. On the finest grid, the dual grid is fixed by the midpoints of the grid lines of G. On the coarser grids, the dual grid is chosen such that it can be composed of grid lines from the grid one level finer:

Definition 3.7.2. Let P_{i,l} = (j-1)·J + k with j = 1, 2, ..., J, k = 1, 2, ..., K, J·K = N be the points of grid G_l. The dual grid G̃_{l-1} for uniform coarsening is determined by the points P̃_{i,l-1} := (j-1)·J + k with j = 2, 4, ..., J-1 and k = 2, 4, ..., K-1 for odd J, K; or j = 2, 4, ..., J-2 under inclusion of the last grid line in r-direction of G_l for even J, and k = 2, 4, ..., K-2 under inclusion of the last grid line in z-direction of G_l for even K.
The following restriction concerning the grid was used during the implementation in order to reduce the programming effort considerably: the fine grid is chosen so that J and K are odd on all grids G_l. Since this is not a fundamental restriction, this assumption is made in the sequel. It is the only restriction made with respect to the general applicability to problem (2.16). At the same time, this choice guarantees that the step size has about the same order of magnitude over the whole grid on all levels. Consequently, large differences in the matrix elements, which could lead to instabilities, are avoided. The dual grids G̃_l give a sequence of so-called staggered grids with

G̃_{l-1} ⊄ G̃_l

(cf. [112]). Figure 3.5 shows an example of the coarsening of a Cartesian FIT grid. Figure 3.6 shows the corresponding dual grid.
At this point, it becomes evident that the coarser grids often no longer yield a good approximation of the studied geometry, i.e., the boundaries of the subdomains with different materials no longer (approximately) coincide with grid lines. An extension of FIT was developed to solve this problem.

An Extension of the Finite Integration Technique (FIT). Suppose we are given a FIT grid G that approximates the boundaries of some structure geometry by a polygon consisting of elementary lines. The elementary areas bounded in this way are then filled with material. These fillings can also be done diagonally (compare Fig. 3.7).
Figure 3.7. Filling types of the FIT grid in the program URMEL-I [277], [287], which uses the special multigrid algorithm.
On the coarser grids generated for the multigrid algorithm, the material boundaries no longer necessarily coincide with the elementary lines. Therefore, elementary areas exist on the coarser grids which are partially filled with different materials. At the same time, it is extraordinarily important not to change the physical problem, and hence the location of the material boundaries, on the coarser grids, because this would lead to serious convergence problems of the multigrid algorithm, as the solutions are very sensitive to this. To allow the use of the compatible coarsening nevertheless, an extension of the discretization method was developed in [277]. An obvious ansatz for the treatment of these partially filled areas is the addition of further filling types to those shown in Fig. 3.7. But even if many similar fillings are grouped together, this would require an enormous number of types even for relatively few grid levels. Additionally, the automatic assignment of the filling types to the coarse grid cells requires a great programming effort. The following more practical ansatz was carried out with the preliminary restriction to vacuum or perfect conductors as materials [277]. The state variables of FIT are defined by integrals along elementary lines and over elementary areas. These integrals are now modified according to the partial filling:

Definition 3.7.3. The state variables of a partially filled elementary area are determined only over the subarea filled with vacuum or over that part of the elementary lines which borders vacuum.

Figure 3.8 shows how different the partial filling of an elementary area on a coarse grid may look. This ansatz leads to additional approximations, but their influence on the total error and on the convergence is probably very
small⁶. Figure 3.8 displays the extension of FIT. The line integrals extend over the shortened distances Δ_1, ..., Δ_4, and the integration area is reduced to A_j := Δ_3Δ_2 + Δ_1Δ_4 − Δ_1Δ_2. A complete description of the grid quantities used on the coarser grids and of the modified FIT equations can be found in [277]. In [266], the application of this extension has been studied for general time-dependent problems on two-dimensional Cartesian grids.
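The reduced integration area can be evaluated directly; reading the formula as the area of an L-shaped vacuum region composed of two overlapping rectangles is our interpretation of Fig. 3.8:

```python
def reduced_area(d1, d2, d3, d4):
    """Vacuum subarea A_j = d3*d2 + d1*d4 - d1*d2 of a partially
    filled elementary area with shortened edge lengths d1, ..., d4;
    the d1*d2 term removes the doubly counted corner rectangle."""
    return d3 * d2 + d1 * d4 - d1 * d2
```

For a cell that is entirely vacuum (d1 = d3, d2 = d4), the formula reproduces the full cell area, as it should.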
Figure 3.8. Left: Part of a coarse grid with partially filled elementary areas. Right: Relevant grid quantities for the difference equation for H
0. Depending on k, the continuous and discrete problems may have a small to moderate number of negative eigenvalues (compare Fig. 3.49 in section 3.10). Indefinite problems need special attention in many components of the solver. Gauss-Seidel relaxation ceases to converge, but it is still a good fine grid smoother. Coarse grids may require a more robust smoother, such as the Kaczmarz method. Under the following conditions, the coarse grid correction can compensate for the error of the smooth components introduced by the use of Gauss-Seidel relaxation (cf. subsection 3.7.4): Let the error v_k on the fine grid G_k have the smooth eigenfunction e_k as its main part. Then the relation d_k = M_k v_k = λ_k e_k holds (compare (2.16)) for the defect d_k. The corresponding coarse grid equation is given by

M_{k-1} v_{k-1} = λ_k I_k^{k-1} e_k.

As e_k is smooth, we assume it to be close to an eigenvector of M_{k-1} with eigenvalue λ_{k-1}. Then

v_{k-1} = (λ_k / λ_{k-1}) I_k^{k-1} e_k

is an approximate solution of the coarse grid equation. Since I_{k-1}^k I_k^{k-1} e_k = e_k may be assumed for the smooth eigenfunction e_k, the following holds for the new error on grid G_k:

v_k^{new} = (1 − λ_k / λ_{k-1}) e_k.

Thus, good convergence can be reached for

|1 − λ_k / λ_{k-1}| ≪ 1.    (3.21)
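Condition (3.21) can be made concrete with a toy computation; the numbers below are illustrative only, not taken from the book:

```python
def cgc_error_factor(lam_fine, lam_coarse):
    """Factor |1 - lam_k/lam_{k-1}| by which one exact coarse grid
    correction multiplies the error of a smooth eigenmode, cf. (3.21)."""
    return abs(1.0 - lam_fine / lam_coarse)
```

When the fine and coarse grid eigenvalues agree well, the factor is small; a sign difference between them makes the factor larger than one, i.e., the correction amplifies the error.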
There may be a small number of eigensolutions for which the coarse and fine grid eigenvalues differ in sign. The coarse grid correction for these has the wrong sign, and multigrid does not converge. In our example, this could be avoided by correcting for the grid dependence of the numerical speed of light (see below). Otherwise, taking the coarse grid correction as a preconditioner of a Krylov subspace method for indefinite problems can take care of these eigensolutions at moderate cost [28].

Disadvantages of the AMG Algorithm. First, an algebraic multigrid algorithm had been designed that was modeled on an AMG algorithm for the solution of Maxwell's equations in three dimensions [245]. Eventually, this algorithm was not used, since
1.
the Galerkin approach to the coarse grid matrix of the AMG method leads, for many practical structures, to the situation that another physical problem is solved on the lower levels; 2. for the special AMG method mentioned above, only the piecewise constant interpolation and the so-called "trivial injection" can be used, for reasons which are explained in detail in [245]. Both would lead to convergence problems if the AMG algorithm were applied in an appropriate way to the linear system (2.16). The first point can easily be seen. As to the second point, there are some peculiarities of this system which are decisive for the convergence of such an AMG algorithm. These peculiarities are generally of basic importance for the setup of a suitable solution method for this system and are described next. Let x̃_k be an approximate solution of the system (2.16). Then the related defect equation for v_k = x_k − x̃_k is given by M_k v_k = d_k. Expand the defect d_k into a series of eigenvectors e_kj of M_k:

d_k = Σ_j β_kj e_kj.
Assume that the actual grid G_k is already one of the coarser grids. Then the low frequency eigensolutions are dominant in d_k, since the smoothing operations on the finer grids have already eliminated the high frequency eigensolutions, i.e., |β_kj| is small for all j with large λ_kj. Now, some

ṽ_k := I_{k-1}^k z_{k-1}

is used as a correction on x̃_k. This correction is given by

z_{k-1} = approximate solution of M_{k-1} z_{k-1} = I_k^{k-1} d_k.
In particular, let the piecewise constant interpolation be chosen as I_{k-1}^k and the corresponding restriction as I_k^{k-1}. Then high frequencies couple in the restriction of d_k: for d_{k-1}, the restriction generates a staircase approximation with steps twice as large as those for d_k on grid G_k. This effect corresponds to the addition of a considerable portion of high frequency components to the right hand side of the defect equation on the grid G_{k-1}.

Remark 3.7.3. With the Galerkin approach to the generation of the coarse grid matrices and the "trivial injection" as the restriction, the steps in the staircase approximation load each low frequency eigensolution with a large portion of high frequency components. Therefore, the coarse grid frequency follows by averaging the basic frequency and the intermixed higher frequencies, i.e., the coarse grid eigenvalues are strongly shifted. For that reason, the factors in z_{k-1}, and hence in ṽ_k, of an eigenfunction from d_k in the correction term will be different from those in v_k. It may even happen that, for some of the eigenfunctions, the sign with which the eigenfunction contributes to the correction changes.
For homogeneous problems (wave number k = 0), the relative change of the eigenvalues is smaller [245], [244].
Grid-Dependent Eigenvalue Shift. For the compatible coarsening, the eigenvalues are shifted less than for the Galerkin approach. Now the eigenvalues of the coarse grid matrices do not depend on the restriction. In the residual restriction, higher frequencies again couple, but they are not amplified to the same extent as for the Galerkin approach: on the one hand, they contribute a smaller portion because of the interpolation of higher order (compare subsection 3.7.3); on the other hand, the low frequency part of the correction has the right factor, since the low frequency parts are not changed too much in the coarse grid matrix. Thus, only high frequency components that can be eliminated by a few smoothing steps should appear. However, during the development of the multigrid algorithm, the suspicion arose that the eigenvalue
shift of M = A − k²I + ikD by the discretization cannot be neglected but should be determined and included in the multigrid algorithm (compare subsections 3.7.3 and 3.10). The following reflections give an estimate of the grid-dependent eigenvalue shift: in the discretization of a sine wave, some essential shifts happen. In fact, a sine wave is still an eigenfunction of the discrete Δ-operator, but the eigenvalue is no longer given as the square of the wave number k. In order to make the discrete operators (A − k_d²I) have about the same spectrum as the continuous operator (Δ − k²I) and, in particular, always have the same sign for all the eigenvalues on the different grids, it is necessary to choose k_d somewhat different from k. With the following ansatz, a hint for the calculation of k_d can be obtained: −k² is the eigenvalue of the one-dimensional wave
f(x) = sin kx;    f''(x) = −k² sin kx.

Discretization on a regular grid with step size h leads to:

f_h''(x) = (sin(k(x − h)) − 2 sin kx + sin(k(x + h)))/h²
         = (sin kx cos kh − cos kx sin kh − 2 sin kx + sin kx cos kh + cos kx sin kh)/h²
         = (2 sin kx cos kh − 2 sin kx)/h²
         = sin kx · 2(cos kh − 1)/h²

(≈ −k² sin kx for small h), i.e., k_d² should be set to about

k_d² = 2(1 − cos kh)/h²    (3.22)
in order to reach the right distribution of signs in the spectrum. The optimal value is still somewhat different, since the assumption of a linear wave is only an approximation. At the same time, it follows that kh < π/2 is a necessary condition for meaningful computations. The "shift" k² in the matrix M now gets the factor s = 2(1 − cos kh)/(h²·k²). Correspondingly, the factor s is applied to the right hand side b, which also contains k². In subsection 3.10, the results are shown for a sample computation with and without the shift factor. Only one grid was used for these computations, i.e., the linear systems were solved directly. The results show very clearly that the sought-after quasi-resonances appear on all grids at the same frequency if the shift factor is used, while otherwise they are shifted to lower frequencies for coarser discretizations.

Remark 3.7.4. The discretization of the Δ-operator leads to a shift of eigenvalues. For the indefinite problem Δ − k²I, this shift causes convergence problems. Using a shift factor for k² to achieve an equal eigenvalue distribution on all grids of the multigrid algorithm improves convergence remarkably.
The use of the shift factor diminishes |1 − λ_k/λ_{k-1}|, which leads to convergence improvement (compare (3.21)).
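A minimal sketch of the grid-dependent shift factor, with the sign convention chosen so that k_d² ≈ k² for small h (cf. (3.22)):

```python
import math

def kd_squared(k, h):
    """Discrete substitute k_d^2 = 2*(1 - cos(k*h))/h^2 for k^2, cf. (3.22):
    (minus) the eigenvalue of the 1-D second difference applied to sin(kx)."""
    return 2.0 * (1.0 - math.cos(k * h)) / h**2

def shift_factor(k, h):
    """Grid-dependent factor s = k_d^2 / k^2 applied to the k^2 shift in M."""
    return kd_squared(k, h) / k**2
```

On a fine grid, s is close to 1; on coarser grids, with kh approaching π/2, it deviates markedly, which is exactly the grid dependence the correction compensates.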
3.7.3 Grid Transfers for Vector Fields

The multigrid algorithm was implemented in order to compute electromagnetic monopole fields in cylindrically symmetric structures. In this case, the unknowns of the linear system to be solved by the multigrid algorithm are allocated at the points P̃_{i,l} of the dual FIT grid G̃_l. As was noted in subsection 3.7.1, the dual grids G̃_1, G̃_2, ..., G̃_l form a sequence of staggered grids. The transfer functions between the grids have been chosen accordingly. Details can be found in [277]. For the grid transfer, the piecewise bilinear interpolation and restriction were chosen.
The coefficients in the interpolation and restriction formulas depend on the step sizes h_z and h_r. The top of Figure 3.9 shows the allocation of known values on the dual grid G̃_{l-1} as well as the values which have to be interpolated on grid G̃_l. The allocation of known values on the dual grid G̃_l and the unknown values H_{l-1} on G̃_{l-1} is displayed at the bottom. Theoretically, instead of the transfer function of second order, a piecewise constant interpolation, which is of first order, could also be chosen. However, tests with such transfer functions of lower order showed that, as might be expected, the convergence of the multigrid algorithm then worsens more and more with increasing frequency ω.

Treatment of the Boundaries in the Interpolation. One problem for the grid transfer is the correct treatment of the boundaries. At the boundaries of the studied geometry, one or several points lie inside the metal or outside of the grid. The treatment of such boundary points is an important issue affecting the convergence of the algorithm. In the interpolation at a border with metal (which is assumed to be perfectly conducting), it is possible to work with "mirror fields" by using the boundary condition that the tangential electric field E_∥ vanishes there. Figure 3.10 shows this method, known from potential theory [142]. This leads to the boundary conditions E_z1 := 0, E_z2 := 0. In the monopole case (H_r = 0), the following difference equation for E_z can be formulated:

ik(r_2² − r_1²)E_z1 = 2r_2H_3 − 2r_1H_1  ⇒  √r_2 H_3 = √r_1 H_1,  i.e.,  H_3 = (√r_1/√r_2) H_1.
Analogously, H4 can be replaced by H2 in the interpolation formula if a perfectly conducting boundary crosses a dual grid cell parallel to the z-axis, as is shown in Fig. 3.10.
Figure 3.9. Top: Allocation of known values (•) and values to be interpolated (o) on the grids G_{l-1} and G_l. Bottom: Allocation of known values (•) and values to be determined by restriction (o) on the grids G_l and G_{l-1}.
Figure 3.10. Illustration of the method of mirror fields used for the interpolation at domain boundaries.
Remark 3.7.5 (Balancing Transformation of the Fields). The equation has been written in the form above, since the linear system (2.16) is transformed before its solution: the fields solving the system (2.16) vary strongly as the radius r increases. In order to keep the interpolation error as small as possible, this property can either be compensated for by a weighted interpolation or by a field transformation which leads to a smaller variation when r changes. Here the second method was chosen: the magnetic field H
> 0, a_ij ≤ 0 for i ≠ j, and A⁻¹ ≥ 0 (elementwise). M-matrices arise in the discretization of simple partial differential equations, e.g., in electrostatics.
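A quick numerical check of these sign conditions can be sketched as follows (our own illustration; the dense inverse is used only for small examples):

```python
import numpy as np

def is_m_matrix(A):
    """Check the sign conditions quoted above: positive diagonal,
    non-positive off-diagonal entries, and an elementwise
    non-negative inverse (with a small floating point tolerance)."""
    off = A - np.diag(np.diag(A))
    return (np.all(np.diag(A) > 0)
            and np.all(off <= 0)
            and np.all(np.linalg.inv(A) >= -1e-12))
```

The 1-D discrete Laplacian tridiag(−1, 2, −1), a typical electrostatics stencil, passes this check.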
3.8 Preconditioning
In most cases, increasing the number of levels used for the decomposition also improves the rate of convergence (cf. subsection 3.10). One disadvantage is that, for k > 0, it is hardly possible to know in advance how large the additional entry has to be. In general, ILU(k) yields an effective preconditioner. Two special modifications due to Wittum [319], known as the ILU_β(0)- and ILU_β(3)-factorizations, often prove to be very efficient preconditioners. Also, over large intervals, they are remarkably insensitive to the parameter β, so that the choice of a standard value such as β = −0.99 is possible. ILU_β(0) leads to the same fill-in pattern as that of A, while ILU_β(3) allows three additional side bands. On the other hand, ILU_β(3) generates a better approximation and thus, in general, better convergence. If storage presents no problem, ILU_β(3) should therefore be preferred (see also section 3.10).

Modified Incomplete LU Decomposition (MILU(k) and MIC(0)). To improve the ILU decomposition, Gustafsson [109] proposed a modification of the diagonal elements of L such that the row sum of the error matrix LU − A vanishes. Let Z_k be defined as above.

Algorithm 3.8.1.2 (MILU(k))
For i = 1, ..., n:
    l_ii = 0
    For j = 1, ..., n:
        s_ij = a_ij − Σ_{m=1}^{min(i,j)−1} l_im u_mj
        If (i,j) ∈ Z_k:
            For i < j: ũ_ij = s_ij
            For i = j: l_ii = l_ii + s_ii
            For i > j: l_ij = s_ij
        Otherwise: l_ii = l_ii + s_ij
    For j = i+1, ..., n:
        u_ij = ũ_ij / l_ii
For certain elliptic partial differential equations, this decomposition turned out to be very efficient [173]. In general, however, the MILU(k) method is often worse than ILU(k) and can even fail completely. Another modified method is MIC_ω(k) [7]. In section 3.10, some convergence studies on practical examples are given. E.g., this preconditioner improves the spectrum of the complex linear system of electro-quasistatics and leads to a remarkable reduction of the necessary number of iteration steps of the studied cg-like methods (see section 3.10).

Incomplete LU Decomposition with Thresholds for Additional Entries (ILUT(k)). Saad [225] uses another criterion for neglecting entries. He defines scalars n_i^L and n_i^U to be the numbers of non-vanishing elements in the lower, resp. strictly upper, part of the i-th row of A. Then only the n_i^L + k (resp. n_i^U + k) largest elements in L (resp. U) get an entry. The advantage of ILUT(k) compared with ILU(k) is the predictable storage requirement. The algorithm can be found in [225] and [173].

3.8.2 Iteration Methods
The SSOR method is a very often used preconditioner. The SSOR preconditioning also improves the spectrum of the linear system and in many cases leads to a remarkable reduction of the required number of iterations. The preconditioning matrix is then given by

M = ((1/ω₁)D − (1/ω₂)L) ((1/ω₁)D)⁻¹ ((1/ω₁)D − (1/ω₂)U)
where A = D − L − U, D being the diagonal, L the strictly lower triangular and U the strictly upper triangular part. For a symmetric system matrix A, L = Uᵀ holds. The advantage of this decomposition is that it is not necessary to compute M explicitly and to store it. For acceleration of the method, the parameters ω₁ and ω₂ have to be chosen appropriately. A typical choice is ω₁ = ω₂ = 1, since the sensitivity of the preconditioning process to these parameters is usually small. For this choice of the parameters, the method is called symmetric Gauss-Seidel preconditioning (SGS). The SSOR or SGS method is often used as a split preconditioner. Then the forward SOR algorithm is used from the left and the backward SOR algorithm from the right. For a "nearly symmetric" matrix, it is often possible to use the storage-efficient SSOR preconditioner by merely applying it to the symmetric part of the system matrix. This idea goes back to Concus, Golub, and Widlund in the middle of the 1970s and is sometimes called the CGW method (after the authors' initials). There are many variants and (unfortunately) also many different names for this technique. For example, this ansatz is also referred to as partial SSOR preconditioning. Such an ansatz is also studied by Yserentant in [326] for an indefinite symmetric matrix as it results from the discretization of elliptic equations of Helmholtz type. As a split preconditioner for the complete system, he uses a matrix B obtained from the symmetric positive definite part of an indefinite matrix A, and he gives estimates for the spectrum of these preconditioned systems. The consequences for the Krylov subspace method are also described in [326].

3.8.3 Polynomial Preconditioning
In polynomial preconditioning, the system s(A)Ax = s(A)b is solved instead of the system Ax = b, where s is a polynomial of usually low degree. This idea goes back to Rutishauser [220] and was taken up later
by several authors (see, for example, [222]). For instance, for a symmetric positive definite matrix, a least squares polynomial over the interval [0, λ_max] can be used, where λ_max is an estimate of the largest eigenvalue obtained via Gershgorin circles. Polynomial preconditioning cannot be recommended on scalar computers, since no advantages can be expected there; on such computers, it is customary to apply incomplete LU decompositions for preconditioning. Yet, on vector computers and especially on parallel machines, polynomial preconditioning is advantageous according to [222]. It will not be treated in more detail here, since parallel machines are not standard yet.

3.8.4 Multigrid Methods

In general, classical multigrid algorithms have optimal complexity only for regular problems (i.e., sufficiently regular solution and sufficiently regular grid)¹³. The method of hierarchical basis functions of Yserentant [325] is not of optimal order, especially for three-dimensional problems, where the condition number grows like O(h⁻¹), h being the step size. In [13], Axelsson and Vassilevski present a multilevel preconditioning (AMLI, for Algebraic MultiLevel Iterative method) on finite element triangulations which yields an algorithm of optimal order under weak assumptions on the FEM grid. Their preconditioning is based on a special incomplete decomposition of block matrices. The decomposition at one level is defined by recursion in terms of the decomposition at the previous level, and the resulting linear system on the coarsest grid is then solved directly. This kind of preconditioning can then also be used for a preconditioned iteration method such as a cg-like method. There are successful applications of this method to simple domains with one re-entrant corner.
This domain does not satisfy the elliptic (H²) regularity condition, so a classical V-cycle multigrid algorithm would not converge with optimal order, while the AMLI-preconditioned cg-method converges very fast, and also better than the ILU-preconditioned cg-method. Neytcheva [189] also uses the AMLI preconditioner for indefinite and nearly singular systems. In [190], Oosterlee and Washio look at three different multigrid algorithms as single solvers and as preconditioners for the BiCGSTAB and GMRES(20) methods, comparing them with the BiCGSTAB and GMRES(50) methods, each preconditioned by MILU. They apply these methods to singularly perturbed problems for which it is difficult to design optimal standard multigrid methods. In particular, a diffusion equation with strongly varying diffusion coefficients is one example for which the three MG algorithms do not converge satisfactorily and for which the Krylov acceleration "is really needed for convergence" [190]. In all examples studied in [190], the MG-preconditioned
¹³ Compare the special multigrid algorithm in subsection 3.7 for high-frequency solutions and irregular grids, which confirms this statement.
Krylov subspace methods are faster than the MG algorithms themselves and faster than the MILU-preconditioned Krylov subspace method.
3.9 Real-Valued Iteration Methods for Complex Systems

For many applications, complex linear systems

Ax = b with A ∈ ℂ^{n×n}, x, b ∈ ℂⁿ    (3.23)

are to be solved. The linear system can also be written as

(R + iS)(x + iy) = u + iv    (3.24)
with R, S ∈ ℝ^{n×n}, x, y, u, v ∈ ℝⁿ. In what follows, a (preconditioned) iterative method is introduced which was developed by Axelsson. The presentation here follows van den Meijdenberg [274] and Korotov [157] as well as a draft by Axelsson and Kutcherov [11], [10].

3.9.1 Axelsson's Reduction of a Complex Linear System to Real Form

This method uses a well-known procedure to solve a real system (3.25) whose dimension is twice as large as the dimension of (3.23) and whose condition number is the square of the condition number of the original problem. It is assumed that R and R + SR⁻¹S are non-singular. This holds, e.g., if R is symmetric and positive definite (usually abbreviated as spd) and S is symmetric¹⁴. Using the identity (I − iSR⁻¹)(R + iS) = R + SR⁻¹S, the relations

x = (R + SR⁻¹S)⁻¹(u + SR⁻¹v),    (3.26)
y = (R + SR⁻¹S)⁻¹(v − SR⁻¹u)    (3.27)

follow for the solution of (3.24). This is an equivalent real formulation of the complex linear system (3.23) or (3.24). However, this formulation is not very advantageous, since it requires solving two linear systems, one with R + SR⁻¹S and one with R. In addition, it necessitates extra matrix-vector multiplications and vector operations.
¹⁴ If R is allowed to be singular, then S and S + RS⁻¹R have to be non-singular. In this case, the system (3.24) multiplied by i is solved.
To reduce the effort, the real formulation (3.25) can be exploited for the complex system: as soon as x is known, y can be determined from Ry = v − Sx instead of (3.27)¹⁵. With that, instead of the complex system (3.24)

(R + iS)(x + iy) = u + iv,

we have the systems

(R + SR⁻¹S)x = u + SR⁻¹v,    (3.28)
Ry = v − Sx.    (3.29)
They are now solved consecutively in the real vector space. If R is ill-conditioned, some generalized form of the first system (3.28) may be better suited. This is the case if R + αS with some real parameter α is better conditioned than R. The derivation of the generalized method starts from system (3.24). Subtracting α times the first equation from the second yields the system
Rx − Sy = u    (3.30)
(R + αS)y − (αR − S)x = v − αu.    (3.31)

Elimination of y leads to

Rx − S(R + αS)⁻¹(αR − S)x = u + S(R + αS)⁻¹(v − αu).

Using

αR − S = α(R + αS) − (1 + α²)S

gives

G_α x = u + S(R + αS)⁻¹(v − αu)    (3.32)

where

G_α = R − αS + (1 + α²)S(R + αS)⁻¹S.    (3.33)

For α = 0, (3.28) follows again. The system of the form (3.32), (3.33) was first studied in [11], see also [10]. Below it will be shown that the iterative solution of (3.32) is essentially equivalent to the solution of (3.28) with (R + αS)R⁻¹(R + αS) as preconditioner. This method was already studied in [4]. A general iterative method can be written in the form

C(x^{l+1} − x^l) = −τ_l r^l,    l = 0, 1, ...

¹⁵ Thus the solution of one system with R + SR⁻¹S is saved.
where x⁰ is the initial approximation, C is a preconditioning matrix, and τ_l are acceleration parameters such as the parameters in the Chebyshev iteration. Consider C(α) = R + αS. Equation (3.32) implies the relation
r^l = (R − αS)x^l − u + S(R + αS)⁻¹[(1 + α²)Sx^l − v + αu].
Using the identity αR − S = α(R + αS) − (1 + α²)S to eliminate (1 + α²)S:
r^l = Rx^l − u − S(R + αS)⁻¹[(αR − S)x^l + v − αu].
Then r^l can be computed without direct inversion of (R + αS) in the following way:

Set z^l = (αR − S)x^l + v − αu    (3.34)
Solve (R + αS)y^l = z^l    (3.35)
Compute r^l = Rx^l − Sy^l − u    (3.36)
This way, y^l is already available when r^l is determined for the current x^l, so (3.29) no longer has to be solved. Furthermore, this procedure avoids the initial computation of S(R + αS)⁻¹(v − αu) on the right-hand side of (3.32)¹⁶. Finally, the method shall be written down completely. For convenience, it is called here the C-to-R method.
Algorithm 3.9.1.1 (C-to-R Method with Preconditioning; Axelsson)

Choose α. Set x⁰ := 0.
For l = 0, 1, ...:
    Set z^l := (αR − S)x^l + v − αu
    Solve (R + αS)y^l = z^l
    Compute r^l = Rx^l − Sy^l − u
    Choose τ_l
    Solve (R + αS)x^{l+1} = (R + αS)x^l − τ_l r^l

This is a general form of the C-to-R method where the parameters α and τ_l and the solution method for the two linear systems (R + αS)y^l = z^l and (R + αS)x^{l+1} = (R + αS)x^l − τ_l r^l still have to be chosen.
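A runnable sketch of Algorithm 3.9.1.1 with a fixed acceleration parameter τ instead of Chebyshev parameters (for α = 1 and S positive semi-definite the eigenvalues of C⁻¹G_α lie in [1/2, 1], so the constant τ = 4/3 already gives a contraction factor of 1/3); the dense solves and the test matrices are our own simplifications:

```python
import numpy as np

def c_to_r(R, S, u, v, alpha=1.0, tau=4.0 / 3.0, iters=60):
    """C-to-R method (Algorithm 3.9.1.1) for (R + iS)(x + iy) = u + iv
    with R spd and S symmetric positive semi-definite, using a fixed
    acceleration parameter tau."""
    x = np.zeros_like(u)
    C = R + alpha * S                       # preconditioner C(alpha)
    for _ in range(iters):
        z = (alpha * R - S) @ x + v - alpha * u
        y = np.linalg.solve(C, z)           # (R + alpha S) y = z
        r = R @ x - S @ y - u               # residual of (3.32)
        x = x - tau * np.linalg.solve(C, r)
    # recompute y once for the final x
    y = np.linalg.solve(C, (alpha * R - S) @ x + v - alpha * u)
    return x, y
```

On a random test problem with R spd and S positive semi-definite, the iterates converge to the real and imaginary parts of the complex solution.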
3.9.2 Efficient Preconditioning of the C-to-R Method

Let C(α) = R + αS be a preconditioner for G_α. To analyze the corresponding condition number, the generalized eigenvalue problem

μ(R + αS)x = G_α x

¹⁶ If the first system in (3.25) is multiplied by (−α) and added to the second, then, after simple reordering, one obtains (3.35), i.e., (R + αS)y^l = (αR − S)x^l + v − αu.
is studied. With H := R^{−1/2}SR^{−1/2}, this is equivalent to

μ(I + αH)y = [I − αH + (1 + α²)H(I + αH)⁻¹H] y,    (3.37)

where y = R^{1/2}x. Now let λ be an eigenvalue of the second generalized eigenvalue problem

λRx = Sx with x ≠ 0  ⟺  λy = Hy with y ≠ 0.

By (3.37), it follows that

μ = [1 − αλ + (1 + α²)λ²/(1 + αλ)] / (1 + αλ),

or, equivalently,

μ = (1 + λ²)/(1 + αλ)².    (3.38)

The parameter α should be chosen so that the spectral condition number μ_max/μ_min is minimized.
Theorem 3.9.1. Let $R$ be spd and $S$ be symmetric positive semi-definite. Then

$$\mu_{\min} = \begin{cases} \dfrac{1}{1+\alpha^2} & \text{for } 0 \le \alpha \le \bar\lambda,\\[4pt] \dfrac{1+\bar\lambda^2}{(1+\alpha\bar\lambda)^2} & \text{for } \bar\lambda < \alpha, \end{cases} \qquad \mu_{\max} = \begin{cases} \dfrac{1+\bar\lambda^2}{(1+\alpha\bar\lambda)^2} & \text{for } 0 \le \alpha \le \tilde\alpha,\\[4pt] 1 & \text{for } \tilde\alpha < \alpha, \end{cases}$$

holds for the extreme eigenvalues of $(R + \alpha S)^{-1} G_\alpha$, with $G_\alpha$ as in (3.33), where $\bar\lambda$ is the maximal eigenvalue of $R^{-1}S$ and

$$\tilde\alpha = \frac{\bar\lambda}{1 + \sqrt{1 + \bar\lambda^2}}.$$

The spectral condition number is minimized for $\alpha = \tilde\alpha$; then it takes on the value

$$\mu_{\max}/\mu_{\min} = \frac{2\sqrt{1 + \bar\lambda^2}}{1 + \sqrt{1 + \bar\lambda^2}}.$$

Proof: The limits for the extreme eigenvalues follow from (3.38) by elementary calculation for $0 \le \lambda \le \bar\lambda$. Similarly, it can be shown that $\mu_{\max}/\mu_{\min}$ is minimized by some $\alpha$ in the interval $\tilde\alpha \le \alpha \le \bar\lambda$ with $\mu_{\max} = 1$. Consequently, it is minimized for $\alpha = \arg\max_{\tilde\alpha \le \alpha} (1+\alpha^2)^{-1}$, i.e., for $\alpha = \tilde\alpha$.
3. Numerical Treatment of Linear Systems

Remark 3.9.1. For an arbitrary $\alpha$ satisfying $\tilde\alpha \le \alpha \le 1$, the condition number is bounded by 2, as was shown above. In practice $\bar\lambda$ is often large, viz. $\tilde\alpha = 1 - 1/\bar\lambda + O(1/\bar\lambda^2)$, $\bar\lambda \to \infty$ [11], [10]. For this reason, one chooses $\alpha = 1$ if $\bar\lambda$ is unknown. In this case, the smallest eigenvalue is given by $\frac{1}{2}$, the largest by 1, and the condition number takes on the value 2. As shown in [11], [10], the C-to-R method can be extended to a two-parametric method. However, the computational effort for one iteration step of the two-parametric C-to-R method equals the effort of two iteration steps of the one-parametric C-to-R method with the Chebyshev iteration. Moreover, because of the optimality property of the Chebyshev iteration, the one-parametric C-to-R method with the Chebyshev iteration converges at least as fast as the two-parametric C-to-R method. For this reason, the two-parametric C-to-R method presents no further advantage compared with the one-parametric C-to-R method with the Chebyshev or cg iteration.
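The statements of Theorem 3.9.1 and Remark 3.9.1 can be checked numerically from the eigenvalue map (3.38) alone. The sketch below samples $\mu(\lambda) = (1+\lambda^2)/(1+\alpha\lambda)^2$ on $[0, \bar\lambda]$ for an arbitrarily chosen $\bar\lambda = 4$ and compares the condition number at $\tilde\alpha$ with the closed-form optimum and with a coarse scan over other values of $\alpha$:

```python
import numpy as np

lam_bar = 4.0                                    # assumed largest eigenvalue of R^{-1}S
lam = np.linspace(0.0, lam_bar, 200001)

def mu(alpha):
    # eigenvalue map (3.38): mu = (1 + lambda^2) / (1 + alpha*lambda)^2
    return (1.0 + lam ** 2) / (1.0 + alpha * lam) ** 2

def cond(alpha):
    m = mu(alpha)
    return m.max() / m.min()

alpha_tilde = lam_bar / (1.0 + np.sqrt(1.0 + lam_bar ** 2))
kappa_opt = 2.0 * np.sqrt(1.0 + lam_bar ** 2) / (1.0 + np.sqrt(1.0 + lam_bar ** 2))

# the optimal alpha from Theorem 3.9.1 is at least as good as any scanned alpha
scan = min(cond(a) for a in np.linspace(0.05, 2.0, 400))
```

For $\alpha = 1$ the computed condition number is 2, as stated in the remark, independently of $\bar\lambda \ge 1$.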
3.9.3 C-to-R Method and Electro-Quasistatics

In electro-quasistatics, the complex linear system (2.14) results from discretization with the Finite Integration Technique. If $\underline{\Phi}_E$ is written as $\underline{\Phi}_E = \xi + i\eta$ and the right-hand side is written as $u + iv$, the correspondence to system (3.24) becomes obvious. The block matrices $A_\kappa = S D_\kappa S^T$ and $A_\varepsilon = S D_\varepsilon S^T$ are both symmetric and positive definite (spd). Thus, the C-to-R method can be applied to the system (2.14) of electro-quasistatics. In an unpublished diploma thesis [274], supervised by Axelsson, van den Meijdenberg applies the Finite Element method to the electro-quasistatic equations. She ends up with a complex linear system of the same block structure:
$$(A + i\omega B)(\xi + i\eta) = x + iy.$$

She uses the C-to-R method by Axelsson to solve this system in the two-dimensional case. The essential results of van den Meijdenberg can be summarized as follows:

- She uses the C-to-R method with the Chebyshev iteration. To determine $y^l$, the preconditioned cg algorithm is used as an inner iteration, with an incomplete LU decomposition as preconditioner. Van den Meijdenberg scales the ill-conditioned matrices $A$ and $B$ by a diagonal scaling matrix such that the diagonal entries of the scaled matrix are close to 1.
- Several two-dimensional problems with 700 to 2000 unknowns were studied. For the electro-quasistatic examples, the condition number of the matrix could be improved significantly by scaling.17 This reduces the cost of the outer iterations, which can then be performed with lower accuracy than is possible and meaningful for the inner cg iterations. In some examples, convergence to the correct solution could only be reached after scaling.
- The values of the conductivity of the sheet vary by powers of ten in the range between $8 \cdot 10^{-12}$ and $8 \cdot 10^{-7}$. From $8 \cdot 10^{-9}$ on, the achievable accuracy of the iterated solution decreased and, for $8 \cdot 10^{-7}$, no solution could be found any more. This reflects the fact that the diagonal elements of the system matrix must not be too large compared with the other elements if they are to be balanced by the scaling.
- The Gauss algorithm, which is still applicable for the dimensions of these examples, was three times faster than the C-to-R method for some of the examples. For the example with about 2000 unknowns, it became comparable to the C-to-R method.
- The CGS algorithm as well as the GCG-LS method converged for none of the electro-quasistatic examples, while the C-to-R method with scaling converged in all cases.

The C-to-R method was also implemented in the FIT system of electro-quasistatics [129]. The parameter studies of van den Meijdenberg have also been repeated with Krylov-subspace methods for some relatively simple examples. Some of the results are described in subsection 3.10. Altogether, a very similar behaviour was observed.
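The diagonal scaling used by van den Meijdenberg can be illustrated in a few lines. The sketch below uses an arbitrary synthetic spd matrix, not her FE matrices: a moderately conditioned matrix is wrapped in badly varying row and column scales, and the symmetric diagonal scaling $D_A^{-1/2} A D_A^{-1/2}$, which produces unit diagonal entries, removes the conditioning caused by the scales:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# moderately conditioned spd "core" matrix
Q = rng.standard_normal((n, n))
B = Q @ Q.T + n * np.eye(n)

# wrapped in wildly varying row/column scales (12 orders of magnitude)
d = 10.0 ** rng.uniform(-6.0, 6.0, n)
A = d[:, None] * B * d[None, :]      # A = D B D: spd, but extremely ill-conditioned

# symmetric diagonal scaling: As = D_A^{-1/2} A D_A^{-1/2} has unit diagonal
s = 1.0 / np.sqrt(np.diag(A))
As = s[:, None] * A * s[None, :]
```

For the scaled matrix the condition number drops back to roughly that of the core matrix, which is the effect observed in her electro-quasistatic examples.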
3.10 Convergence Studies for Selected Solution Methods

Some of the previously introduced direct, stationary, and non-stationary solution methods, as well as a special multigrid algorithm, have been implemented for several field-theoretical problem types. Some convergence studies from numerical experiments will be described here for the iterative methods. Besides purely academic examples, which are still relatively close to the usual model problems of theoretical convergence studies, some results are presented for realistic applications. One of the questions studied was to what extent theoretical results about convergence properties carry over to practical problems, which are often very large and possess geometrical singularities. All examples have been discretized with the Finite Integration Technique (FIT) [296], [305] in the program packages MAFIA [65] or URMEL-I [287]. Unless otherwise stipulated, the relative residual is taken as stopping criterion:

$$\|r_l\| / \|r_0\| \le \delta.$$

17 Van den Meijdenberg refers to this case as electrostatics, but in fact the equations are those of electro-quasistatics. The other problems, which she calls thermal-magnetic, are usually referred to as eddy current problems.
3.10.1 Real Symmetric Positive Definite Matrices

In [159], the implementation of the FIT equations for electro- and magnetostatics is described for three-dimensional grids; in [25], for two-dimensional Cartesian grids ((r, z) or (x, y); cylindrical problems or Cartesian problems which are invariant in one coordinate direction); and in [72], for three-dimensional circular cylindrical grids as well as for open boundary conditions. In [292], an improved algorithm is introduced for nonlinear magnetostatics, and its application to nonlinear electrostatic problems is described. The FIT equations have also been implemented for stationary current problems and stationary temperature problems (see also [23], [283], [288]). The system matrices of electro- and magnetostatics, stationary current problems, and stationary temperature problems are real symmetric and positive definite. Therefore, these linear systems can be solved with the Gauss-Seidel, the SOR, and the cg method. For the cg method, several different preconditioners can be applied. In extensive convergence studies, the standard IC preconditioning, the modification MICη by Gustafsson [109], and several iterative preconditioners have been compared (see also [202]). The results of these convergence studies are briefly summarized in the following.
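The effect of incomplete-factorization preconditioning on the cg method can be reproduced with standard sparse tools. The sketch below is not the MAFIA implementation: it uses SciPy's `spilu` (an incomplete LU, standing in here for the IC/MIC preconditioners discussed in the text) on a 2-D Poisson matrix, and counts the cg iterations with and without the preconditioner:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

m = 30                                   # grid points per direction (2-D for brevity)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsc()   # 2-D Poisson matrix
b = np.ones(m * m)

iters = {"cg": 0, "pcg": 0}

def counter(key):
    def cb(xk):
        iters[key] += 1
    return cb

x_plain, info0 = spla.cg(A, b, callback=counter("cg"))

ilu = spla.spilu(A, drop_tol=1e-3)       # incomplete LU as preconditioner
M = spla.LinearOperator(A.shape, ilu.solve)
x_prec, info1 = spla.cg(A, b, M=M, callback=counter("pcg"))
```

The preconditioned run needs far fewer iterations; the iteration counts themselves depend on the grid size and the drop tolerance, which are arbitrary choices here.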
Simple Model Problem. As a simple model problem, a unit cube with Dirichlet boundaries was chosen, driven by a conducting loop (cf. [202]). The example was discretized on an equidistant Cartesian grid with $m$ points in each coordinate direction (step size $h = 1/(m-1)$, total number of grid points $N = m^3$). The side length of the conducting loop is 0.1 m. The exciting direct current has strength 10 A. Figure 3.13 shows the geometry of the example.
Figure 3.13. Simple model problem for statics. - Unit cube driven by a current loop with side length 0.1 m and direct current 10 A.
In statics, the matrix $A$ has dimension $n = N$; the diagonal elements $\alpha_i$ each have the value $6/h^2$ and the entries in the side bands $\beta_i$, $\gamma_i$, and $\delta_i$ each have the value $-1/h^2$ (before implementing the boundary conditions). The matrix $A$ corresponds to the Finite Difference matrix for Poisson's equation on the same domain. The eigenvalues and eigenvectors of this matrix are well known: the eigenvectors $e_l$, whose components are products of sine functions, correspond to the eigenvalues

$$\lambda_l = \frac{4}{h^2}\left(\sin^2\frac{ih\pi}{2} + \sin^2\frac{jh\pi}{2} + \sin^2\frac{kh\pi}{2}\right), \quad 1 \le i, j, k \le m, \; l = 1, \dots, N.$$

The extreme eigenvalues correspond to $i = j = k = m$ and $i = j = k = 1$:

$$\lambda_{\max} = \frac{12}{h^2}\cos^2\frac{h\pi}{2} \quad\text{and}\quad \lambda_{\min} = \frac{12}{h^2}\sin^2\frac{h\pi}{2}.$$

As in the two-dimensional case [114], the condition number is equal to

$$\kappa(A) = \cot^2\frac{h\pi}{2}.$$

It grows quadratically as the step size $h$ decreases. The eigenvalues of the Jacobi matrix, which are decisive for the convergence of the Jacobi, Gauss-Seidel, and SOR methods (cf. Theorem 3.2.2), are

$$\lambda_l = \frac{1}{3}\left(\cos(ih\pi) + \cos(jh\pi) + \cos(kh\pi)\right), \quad 1 \le i, j, k \le m, \; l = 1, \dots, N.$$

The eigenvectors of the Jacobi matrix are the eigenvectors $e_l$, $l = 1, \dots, N$, of the matrix $A$. The spectral radius of the Jacobi matrix in the two- and three-dimensional case is given by

$$\rho(M_{\mathrm{Jac}}) = \cos(h\pi).$$

Thus, for the classical iteration methods in the three-dimensional case, the effort is at most of order $O(N^{1.66})$ for the Jacobi and Gauss-Seidel methods and of order $O(N^{1.33})$ for SOR with the optimal relaxation parameter $\omega$ given by (3.1) in Theorem 3.2.2. The estimate (3.5) from Lemma 3.4.1 of the effort for the error reduction in the cg method also holds for the preconditioned cg method. In this case, however, the matrix $A$ has to be replaced by the matrix $A'$ of the transformed system (see subsection 3.8). For left-sided preconditioning with the MICη, ILUω, or SSOR method, the following relation holds, which has been shown by Gustafsson [109] and Axelsson [7] for the two- and three-dimensional case:

$$\kappa(A') = O(h^{-1}). \qquad (3.39)$$
In the three-dimensional case, $h^{-1} \propto N^{1/3}$, and hence

$$k = O(N^{1/6})$$

follows from (3.5) for the number of necessary steps. For the dependence of the total number of operations on the step size $h$, or on the number of grid points $N$, this yields $O(N^{1.33})$ if no preconditioning or the Jacobi, SGS, IC(0), or IC(3) preconditioning is used, and $O(N^{1.17})$ if SSOR with optimal $\omega$, MICη(0), MICη(3), ILUω(0), or ILUω(3) is used as a preconditioner.
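The closed-form eigenvalues above can be checked directly on a small grid. This sketch builds the 3-D finite-difference matrix with Dirichlet conditions (diagonal $6/h^2$, side bands $-1/h^2$) and compares its extreme eigenvalues and the Jacobi spectral radius with the formulas. Note one convention change: the book counts all $m$ grid points per direction with $h = 1/(m-1)$, while the sketch uses the common interior-point convention ($m$ interior points, $h = 1/(m+1)$), which yields the same closed-form expressions:

```python
import numpy as np

m = 6                      # interior points per direction; h = 1/(m+1)
h = 1.0 / (m + 1)

T = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h ** 2
I = np.eye(m)
# 3-D Laplacian via Kronecker sums: diagonal 6/h^2, side bands -1/h^2
A = (np.kron(np.kron(T, I), I) + np.kron(np.kron(I, T), I)
     + np.kron(np.kron(I, I), T))

ev = np.linalg.eigvalsh(A)                       # ascending eigenvalues
lam_min = 12.0 / h ** 2 * np.sin(np.pi * h / 2) ** 2
lam_max = 12.0 / h ** 2 * np.cos(np.pi * h / 2) ** 2

# Jacobi iteration matrix M = I - D^{-1}A; its spectral radius should be cos(pi h)
D = np.diag(np.diag(A))
MJ = np.eye(m ** 3) - np.linalg.solve(D, A)
rho = np.max(np.abs(np.linalg.eigvals(MJ)))
```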
Figure 3.14. Number of iterations of different iterative methods for the simple model problem from statics, as a function of the number of unknowns. Graphs from the diploma thesis of Pinder [202].
Figure 3.14 shows comparisons of different iteration methods with and without preconditioning on five differently refined grids for the model problem. Table 3.2 gives the number of grid points used and the resulting values of the condition number of $A$, the spectral radius of the Jacobi matrix, and the optimal relaxation parameter $\omega_{\mathrm{opt}}$. In Fig. 3.14, the number of iterations is displayed on a doubly logarithmic scale as a function of the number of unknowns (= grid points). The iterations were stopped as soon as the actual residual $r_k^{\mathrm{true}}$ (which should not be confused with the recursively calculated residual $r_k$) of the cg method satisfied the condition
Table 3.2. Step size, number of grid points, condition of the system matrix, spectral radius of the Jacobi matrix, and optimal value of the SOR relaxation parameter for the five examples computed for the model problem.

I,J,K   h         N = dim(A)     κ(A)         ρ(M_Jac)   ω_opt
11      0.10000   1.33100·10³    3.9860·10¹   0.95106    1.5279
21      0.05000   9.26100·10³    1.6145·10²   0.98769    1.7295
31      0.03333   2.97910·10⁴    3.6409·10²   0.99452    1.8107
41      0.02500   6.89210·10⁴    6.4779·10²   0.99692    1.8545
53      0.01923   1.48877·10⁵    1.0952·10³   0.99818    1.8861
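The entries of Table 3.2 follow from the closed-form expressions given above; a quick cross-check of three of the rows, using $\kappa(A) = \cot^2(\pi h/2)$, $\rho(M_{\mathrm{Jac}}) = \cos(\pi h)$, and the optimal SOR parameter $\omega_{\mathrm{opt}} = 2/(1 + \sqrt{1 - \rho^2})$ from Theorem 3.2.2:

```python
import numpy as np

# reference values copied from Table 3.2: (m, kappa(A), rho(M_Jac), omega_opt)
rows = [(11, 3.9860e1, 0.95106, 1.5279),
        (21, 1.6145e2, 0.98769, 1.7295),
        (53, 1.0952e3, 0.99818, 1.8861)]

computed = []
for m, *_ in rows:
    h = 1.0 / (m - 1)
    kappa = 1.0 / np.tan(np.pi * h / 2) ** 2        # cot^2(pi h / 2)
    rho = np.cos(np.pi * h)                         # Jacobi spectral radius
    w_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho ** 2))   # optimal SOR parameter
    computed.append((m ** 3, kappa, rho, w_opt))    # N = m^3 grid points
```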
The curves show very well the expected linear dependence. The determined average gradients $m$ are listed in Table 3.3.

Table 3.3. Average gradients $m$ determined for the different preconditioned methods. Both cg without preconditioning and cg with Jacobi preconditioning have the same gradient, since those methods are identical for the studied example: the diagonal elements of $A$ all have the constant value $6/h^2$.

Gauss-Seidel   SOR, ω_opt   cg, cg-Jacobi   cg-SGS   ICCG(0)   ICCG(3)
0.59           0.34         0.34            0.30     0.30      0.30

The effort per iteration is proportional to $N$ for all methods, such that the total effort is proportional to $N^{1+m}$. The theoretical dependencies given above for the computational effort as functions of the grid resolution could thus be confirmed within the bounds of expected accuracy. According to (3.39), the cg algorithm preconditioned with MICη, ILUω, or SSOR should possess a remarkably better convergence speed than the other methods. This indeed becomes obvious in Fig. 3.15. Note that the SOR method with optimal $\omega$ is nearly as fast as the cg method with SGS preconditioning. The incomplete LU decomposition in its version with three additional diagonals only seems to be advantageous for the IC decomposition; in the other cases, it does not show essential advantages, because the lower number of iteration steps is more or less compensated by the greater computational effort per step. As expected, the preconditioned cg methods are without any doubt the optimal methods for the system matrix of static problems. All studied pre-
λ₁ = (κ₁, ωε₁,ᵣ), λ₂ = (κ₂, ωε₂,ᵣ). The parameters in the studied example were chosen as follows: half height h = 5 cm, material parameters (ε₁,ᵣ, κ₁) = (8.0, 2.0·10⁻⁷ S/m) and (ε₂,ᵣ, κ₂) = (4.0, 1.0·10⁻¹² S/m), voltage V = 1 V, and frequency f = 50 Hz. The dimensions are 4 cm × 10 cm × 1 cm. The system was discretized using a regular step size of 1 cm in all directions, i.e., with only 110 grid cells. Figure 3.32 shows a comparison between the electrostatic potential Φ_E and the real part of the electro-quasistatic potential Re(Φ_E), each computed with Finite Integration
Figure 3.31. Simple parallel plate capacitor of height 2h with two layers of different material.
Technique, as well as the real part of the analytically determined potential φ. The agreement between the numerical and the analytical results is excellent. This example also makes perfectly clear that the electrostatic model is not well suited for this kind of problem; only the electro-quasistatic model works well here.
Figure 3.32. Electrostatic potential Φ_E and real part of the electro-quasistatic potential Re(Φ_E), each computed using the Finite Integration Technique, as well as the real part of the analytically determined potential φ for the simple parallel plate capacitor. The numerical results agree so well with the analytic solution that no difference can be seen in the curves.
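The voltage division between the two layers reduces to a series connection of complex conductivities λᵢ = κᵢ + iωεᵢ, since the complex current density λᵢEᵢ is continuous across the interface. The following sketch assumes two layers of equal thickness and compares the electro-quasistatic division with the purely electrostatic (ε-only) one; it illustrates the principle, not the book's field solution:

```python
import numpy as np

eps0 = 8.854187817e-12
omega = 2 * np.pi * 50.0                 # f = 50 Hz
eps1, kap1 = 8.0 * eps0, 2.0e-7          # layer 1: (eps_r, kappa) = (8.0, 2e-7 S/m)
eps2, kap2 = 4.0 * eps0, 1.0e-12         # layer 2: (eps_r, kappa) = (4.0, 1e-12 S/m)
V = 1.0                                  # applied voltage

lam1 = kap1 + 1j * omega * eps1          # complex conductivities of the two layers
lam2 = kap2 + 1j * omega * eps2

# equal thicknesses: continuity of the complex current density lam_i * E_i
V1_eqs = V * lam2 / (lam1 + lam2)        # electro-quasistatic drop over layer 1
V1_es = V * eps2 / (eps1 + eps2)         # electrostatic drop over layer 1
```

With these material data, the conductive layer 1 carries almost no voltage in the electro-quasistatic model, while the electrostatic model predicts one third of the voltage across it, which is why the electrostatic model fails for this problem.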
Simple Plate Capacitor. Figure 3.33 shows a typical convergence history for another simple plate capacitor. Its side length is 8 cm and its height is 3 cm. The dielectric material is assumed to have relative permittivity ε_r = 3 and electric conductivity κ = 10⁻⁶ S/m. Furthermore, a voltage gradient of 15 kV/cm is assumed at a frequency f of 50 Hz. The complex linear system to be solved has dimension 45177.
Figure 3.33. Convergence history for the simple plate capacitor with 45177 grid points and MIC(0) left-handed preconditioning with ω = −0.5. The graph is courtesy of Clemens, TU Darmstadt [58].
In all cases, the MIC(0) preconditioning was used. When comparing the convergence curves, it has to be noted that only an upper bound for the true residual is given in case of the TFQMR method; in some cases this upper bound is much larger than the true residual. The CGS and CGS2 methods both produce very strong oscillations; the convergence of the CGS2 method is somewhat better than that of the CGS method. The BiCGSTAB and the BiCGstab2 method show a very similar convergence behaviour. Altogether, the convergence of the BiCGstab2 method is the fastest: it reaches a relative residual norm of 10⁻⁸ after approximately 100 matrix-vector multiplications. The CGS method needs about 25% more matrix-vector multiplications to reach the same residual. Some parameter studies on the dependence of convergence on the actual value of the electric conductivity κ were carried out for this example with the grid of 42772 grid points shown in Fig. 3.35. The values of the electric conductivity κ of the layer of water were varied by one order of magnitude at a time between 10⁻⁹ S/m and 10⁻⁶ S/m. Starting at about 10⁻⁶ S/m, the convergence deteriorated remarkably. This illustrates the fact that the diagonal elements of the problem matrix must not differ too much from one another if they are to be balanced by the scaling provided in the Jacobi-preconditioned BiCGCR method.

Contaminated Insulator. An important problem in high-voltage engineering are the phenomena caused by moisture or pollution layers on insulators. In [208], experimental studies are reported on the effects of surface contaminations with low conductivity on the aging process of cylindrical test insulators made from epoxy resin and loaded by alternating current. For numerical studies
Figure 3.34. Simple plate capacitor with layer of water. For symmetry reasons it is sufficient to discretize a quarter of the geometry.
Figure 3.35. Simple plate capacitor with layer of water. Grid in the (x, y)-plane.
one of these test specimens was chosen. The electrodes are each 6 mm thick and have radius 18 mm. The computed model is a 30 mm long solid piece of the originally 100 mm long hollow cylindrical test specimen [208] with radius 15 mm. The epoxy resin has relative permittivity ε_r = 4; the relative permittivity of the water droplets is ε_r = 81, and their electrical conductivity may be assumed to be κ = 10⁻⁶ S/m. The frequency f of the alternating current is 50 Hz, and a voltage gradient of 5 kV/cm is applied. The size and form of the water droplets vary. Neglecting possible deformations which may be caused by the electric field, a rounded form with typical diameter of 1-3 mm can be assumed. As clearly visible in the photograph of Fig. 4.32 in subsection 4.5, the droplets are quite close to each other but randomly distributed. First, a constant radius of 3 mm was assumed for the water droplets. Next, in Fig. 4.38 in subsection 4.5, a simulation model with many different droplets distributed over the whole surface is shown on the left. On the right, Fig. 4.38
shows a model where some of the water droplets have coalesced. In subsection 4.5, potentials and electric fields are displayed for all of those examples.
Figure 3.36. Convergence history of different Krylov subspace methods for the solution of the field problem in case of an insulator with seven distinct water droplets and a discretization with 308826 grid points, i.e., 926478 complex unknowns. Compared are the COCG and COCG-MRS (COCG with Minimal Residual Smoothing) with the preconditioned methods PCOCG, PBiCGCR, and PBiCGCR-MRS. The preconditioned methods only need about 3% of the computational effort of the COCG without preconditioning. The graph is courtesy of Clemens, TU Darmstadt [60].
The convergence history of different Krylov subspace methods is shown in Fig. 3.36 for the solution of the field problem in case of an insulator with seven distinct water droplets and a discretization with 308826 grid points (cf. Fig. 4.35). The complex linear system has dimension 926478. Compared are the COCG and COCG-MRS (COCG with Minimal Residual Smoothing) with the preconditioned methods PCOCG, PBiCGCR, and PBiCGCR-MRS. The preconditioned methods only need about 3% of the computational effort of the COCG without preconditioning. As preconditioner, an implicit complex-valued split Jacobi preconditioning is used. The effect of MRS is quite visible: the original COCG method shows the well-known wild oscillations, while COCG-MRS has a smooth convergence curve which always stays below the oscillating COCG curve. Figure 3.37 shows a zoom into the convergence curves of the preconditioned methods. Most of the time, the curves of PBiCGCR and PBiCGCR-MRS coincide; as expected, they differ wherever the PBiCGCR curve happens to show an increase in the residual norm. All three curves end with the same final result in this example. Yet, the PCOCG curve starts at approximately 100 matrix-vector multiplications with strong oscillations of the residual, which, however, are still relatively smooth compared with the wild oscillations of COCG (cf. Fig. 3.36). These computations were carried out on a SUN Microsystems workstation in double precision.

Figure 3.37. Convergence history of different preconditioned Krylov subspace methods for the solution of the field problem in case of an insulator with seven distinct water droplets and a discretization with 308826 grid points, i.e., 926478 complex unknowns; computation in double precision. The graph is courtesy of Clemens, TU Darmstadt [60].

Furthermore, some parameter studies with respect to the dependence of the convergence on the actual value of the electric conductivity κ have been carried out. The values of the electric conductivity κ of the water droplets or water layer were varied by one order of magnitude at a time between 10⁻⁹ and 10⁻⁶ S/m. This strongly influenced convergence, as can be seen in Fig. 3.38, reflecting the fact that the diagonal elements of the problem matrix may differ too much to be balanced by the preconditioning. Figure 3.38 shows the obtained convergence curves for different conductivities in case of the Jacobi-preconditioned BiCGCR method. In Fig. 3.39, not only the convergence curves for different conductivities but also the convergence curves for distinct and coalescing droplets are displayed, each for the Jacobi-preconditioned BiCGCR method.
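A minimal COCG can be written down in a few lines: it is ordinary cg with the unconjugated bilinear form xᵀy in place of the Hermitian inner product, which is what makes it applicable to complex symmetric matrices (A = Aᵀ but A ≠ Aᴴ). The sketch below runs it on a synthetic, diagonally dominant complex symmetric matrix, not on the FIT system:

```python
import numpy as np

def cocg(A, b, tol=1e-10, maxiter=500):
    """COCG for complex symmetric A: cg with the bilinear form x^T y."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rho = r @ r                          # unconjugated: sum(r_i * r_i)
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)           # also unconjugated
        x = x + alpha * p
        r = r - alpha * Ap
        rho, rho_old = r @ r, rho
        p = r + (rho / rho_old) * p
    return x

rng = np.random.default_rng(2)
n = 80
S1 = rng.standard_normal((n, n)); S1 = S1 + S1.T
S2 = rng.standard_normal((n, n)); S2 = S2 + S2.T
A = S1 + 1j * S2 + 5 * n * np.eye(n)     # complex symmetric, diagonally dominant
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = cocg(A, b)
```

For indefinite complex symmetric matrices COCG can break down or oscillate wildly, which is exactly the behaviour smoothed out by the MRS variant discussed above.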
3.10.3 Complex Indefinite Matrices In time-harmonic field problems, linear systems have complex indefinite matrices. Depending on the problem type, these matrices may be symmetric, quasi-symmetric, or non-symmetric. (A matrix A is called quasi-symmetric if it is similar to a symmetric matrix.) For the two-dimensional case with
3.10 Convergence Studies for Selected Solution Methods
10'
--- ._. c:ood. 10-11 ---- cood.l ...1 - - cood. 10-6
10"
~
..,~ 'f!
10~
10"
.~
liI
i!
183
"
IO~
",
'
......~-~~......---,.j:. ~ \
104
104
0
lOll
200
300
400
soo
oo:mber of itefIIioo 5tepo
Figure 3.38. Convergence curves for different conductivities. The Jacobipreconditioned BiCGCR method is used. Computation with single precision. 10'
Figure 3.39. Convergence curves for different conductivities in case of distinct and coalescing water droplets. The Jacobi-preconditioned BiCGCR method is used. Computation with single precision.
quasi-symmetric indefinite matrix, a special multigrid algorithm was developed [277]. In the three-dimensional case, several modern Krylov subspace methods have been implemented [116], [58], [60]. The results of the corresponding convergence studies are presented in the following.

Krylov Subspace Methods. In [116], the solution of the FIT equations on three-dimensional Cartesian grids is described for time-harmonic fields. For this problem type, the system matrix is complex symmetric. In [116], this system was solved with the preconditioned COCG method. If a waveguide boundary condition is used in order to simulate open boundaries [58], then the complex system matrix becomes non-symmetric and indefinite. Therefore, to solve these linear systems, the modern Krylov subspace methods which were also used in electro-quasistatics, as well as variants of the QMR method that do not assume symmetry, were implemented. For the time-harmonic problems, the convergence behaviour of these solution methods was studied in [58] and [60] for one simple example and two realistic applications.

(i) Simple Test Example: As a simple test problem, a rectangular domain with a wire inside was chosen [58]. The current in the wire induces electromagnetic fields in the box. One complete side of the domain was assumed to be a waveguide boundary. Figure 3.40 shows this simple example. The computational domain was discretized with a 4 × 3 × 4 grid. This gives a complex non-symmetric system with 144 unknowns. Figure 3.41 shows the eigenvalue distribution of the system matrices of this simple example for the cases with and without waveguide boundary condition.
Figure 3.40. 4 × 3 × 4 grid of the simple test problem. The picture is courtesy of Schuhmann, TU Darmstadt [58].
To make a fair judgment of the convergence studies, the following has to be noted: the number of grid points belonging to the waveguide boundary amounts to about 10% of the total number of grid points; this is unusually large compared to practical applications. Since the preconditioning is oriented towards practical problems, the partial SSOR preconditioning was chosen, which simply ignores the non-symmetric fraction of the matrix. As a consequence, the application of this preconditioner leads to deteriorated
Figure 3.41. Eigenvalue distributions of the system matrix of the simple test problem: x = ideal magnetic boundary, o = open waveguide boundary.
Figure 4.4. Convergence history of built-in solvers: SOR versus preconditioned cg.
Figure 4.5. Blown-up view of the first 200 iteration steps of the convergence history of built-in solvers: SOR versus preconditioned cg.
4. Applications from Electrical Engineering
Figure 4.6. Error development in the preconditioned cg algorithm. The error in the best iterative approximation is plotted after the 1st, 6th, 12th, 24th, and 50th step.
To get a better understanding of the local error distribution during the iteration process, the following study1 was carried out: first, the best approximation φ*_E was computed with the preconditioned cg algorithm. Let n_solv be the maximal number of iterations needed by the specific solver to reach the level of machine precision. Then, step by step, each iteration was stopped after only j steps, j = 1, 2, 3, ..., n_solv, with PCG and SOR, and the "error" was computed as the difference between that approximation φ_j^(PCG) or φ_j^(SOR) and φ*_E. Figures 4.6 and 4.7 show the error development in PCG

1 The plots are taken from mpeg films which can be requested from the author. They have been prepared with the help of M. Hilgner.
4.1 Electrostatics
Figure 4.7. Error development of the SOR algorithm. The error in the best iterative approximation is plotted after the 1st, 6th, 12th, 24th, 50th, and 150th step.
and SOR. Figure 4.8 displays the development of the solution, i.e., the scalar electrostatic potential φ_E, during the iterative solution process with the preconditioned cg method.
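The error study described above can be imitated on a toy problem: compute a reference solution, then record the error of each intermediate SOR iterate against it. The sketch below uses a 1-D Poisson matrix instead of the plug geometry, with the optimal SOR parameter derived from the Jacobi spectral radius:

```python
import numpy as np

n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson stand-in
b = np.ones(n)
x_star = np.linalg.solve(A, b)        # "best approximation" reference

rho = np.cos(np.pi / (n + 1))                  # Jacobi spectral radius for this A
omega = 2.0 / (1.0 + np.sqrt(1.0 - rho ** 2))  # optimal SOR relaxation parameter

x = np.zeros(n)
errors = []
for _ in range(300):
    for i in range(n):                 # one forward SOR sweep
        sigma = A[i] @ x - A[i, i] * x[i]
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    errors.append(np.linalg.norm(x - x_star))
```

Plotting `errors` over the sweep index reproduces, in miniature, the kind of error history shown in Figs. 4.6 and 4.7.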
Figure 4.8. Development of the iterative solution, the electrostatic potential, in case of PCG. Displayed are the approximations after the 1st, 6th, 12th, 24th, and 50th step. The corresponding electric field, i.e., the gradient of the final approximation of the electrostatic potential, is shown once more for comparison (bottom right).
Figure 4.9. Electrostatic potential on the surface of the plug and in its neighborhood.
4.2 Magnetostatics

In magnetostatics, the analytic and FIT equations to be solved are given by

$$\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl}\mathbf{A}\right) = \mathbf{J}_e, \qquad \tilde C D_{\mu^{-1}} C \mathbf{a} = \mathbf{j}_e.$$

As was described in detail in subsection 2.3, the magnetic field $\mathbf{H}$ is decomposed into its homogeneous and non-homogeneous parts:

$$\mathbf{H} := \mathbf{H}_h + \mathbf{H}_i.$$

This finally allows one to use a scalar potential $\varphi$ for the homogeneous part, $\mathbf{H}_h = -\operatorname{grad}\varphi$, which leads to

$$\operatorname{div}(\mu \operatorname{grad}\varphi) = \operatorname{div}(\mu \mathbf{H}_i).$$
Figure 4.13. Convergence history of built-in solvers: SOR versus preconditioned cg.
Figure 4.14. Blown-up view of the first 300 iteration steps of the convergence history of built-in solvers: SOR versus preconditioned cg.
The symmetry with respect to the planes φ = 0, φ = π makes it possible to discretize only the half shown in Fig. 3.27. The chosen grid has N = 27869 points. Figure 4.15 shows the magnetic flux density in the plane z = 0. Because of the high permeability of the material, the field is mainly located inside the sensor ring.
4.2.3 Velocity Sensor

A certain magnetic velocity sensor has been studied by Schillinger and Clemens in order to compare the preconditioned cg method with algebraic multigrid (AMG). The cg algorithm was used as implemented in the CAE tool MAFIA, with the two different preconditioners IC(0) and ILU(3). The black-box solver by Ruge and Stüben [219], [252] was used as multigrid solver.
Figure 4.15. Magnetic flux density in some z-plane of the annular current sensor.
Figure 4.16. Geometry and magnetic flux density of magnetic velocity sensor. The plot is courtesy of Schillinger, TU Darmstadt
The problem was discretized using about N ≈ 230000 unknowns. Figure 4.16 shows half of the system and the magnetostatic flux density. Figure 4.17 compares the convergence of the cg algorithm with the two different preconditioners, the AMG, and the AMG including the overhead of the grid setup on the coarser levels. AMG is about twice as fast as PCG, even if the overhead is taken into account. Thus, the velocity sensor is an excellent example of the optimal performance of a black-box algebraic multigrid solver for Poisson's equation.
216
4. Applications from Electrical Engineering
Figure 4.17. Convergence history of the magnetic field computation of the magnetic velocity sensor. Compared are the cg algorithm with IC(0) and ILU(3) preconditioning and the black box AMG (with and without overhead). The plot is courtesy of Schillinger, TU Darmstadt
4.2.4 Nonlinear C-magnet
In case of nonlinear permeabilities, the following Newton-Raphson-like procedure [191], [292] is carried out in the static S-module of the CAE package MAFIA: The calculation is carried out in a series of cycles C_1, ..., C_M, as is illustrated in Fig. 4.18. The first step in cycle C_i is to solve the linear problem using the permeability μ_{i-1}. This gives the magnetic field H^{(i)}. Looking up the corresponding value in the B-H-curve yields the next value for the permeability μ_i (see Fig. 4.19 for a qualitative plot of a B-H-curve). As soon as convergence is reached (see [292] for details), the final field H^{(M+1)} is evaluated via a linear static computation with permeability μ_M. A C-shaped magnet is chosen again as an example - now with a nonlinear material. The example was originally discretized by the team of Weiland, TU Darmstadt; more details and further examples can be found, e.g., in [292]. Figure 4.20 displays the permeability of the C-magnet. Figure 4.21 shows the magnetic vector potential after the last cycle of the nonlinear iteration. Both are shown in a two-dimensional cross-section. Figures 4.22 and 4.23 display the convergence curves for the SOR method and the preconditioned cg method. The preconditioned cg method only needs about 10% of the total number of iterations necessary for the SOR method. For nonlinear problems, the rate of convergence of the implemented solver becomes especially important.
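The cycle structure above can be sketched in code. The following is an illustrative toy (a lumped magnetic circuit with an air gap and an invented saturating B-H curve), not MAFIA's S-module; all function names and parameters are hypothetical:

```python
import numpy as np

def bh_mu(h):
    """Invented saturating B-H curve: mu decreases as |H| grows."""
    mu0, chi, h_sat = 4e-7 * np.pi, 5000.0, 1e3
    return mu0 * (1.0 + chi / (1.0 + abs(h) / h_sat))

def nonlinear_cycle(ni=200.0, l_fe=0.2, l_gap=1e-3, tol=1e-10, max_cycles=50):
    """Magnetic circuit with air gap: NI = H_fe*l_fe + (mu/mu0)*H_fe*l_gap."""
    mu0 = 4e-7 * np.pi
    mu = bh_mu(0.0)                      # mu^(0): initial guess
    for i in range(max_cycles):
        # 'linear solve' with the frozen permeability mu_(i-1):
        h_fe = ni / (l_fe + mu / mu0 * l_gap)
        mu_new = bh_mu(h_fe)             # look up the B-H curve -> mu_i
        if abs(mu_new - mu) < tol * mu:  # convergence of the mu-update
            return h_fe, mu_new, i + 1
        mu = mu_new
    return h_fe, mu, max_cycles

h, mu, cycles = nonlinear_cycle()
```

Each pass of the loop corresponds to one cycle C_i: a linear solve with the frozen permeability, followed by the μ-update from the B-H curve.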
Figure 4.18. Computational procedure for nonlinear problems. The arrows from left to right stand for the linear computation, those from right to left for the μ-update.
Figure 4.19. Qualitative plot of a B - H-curve.
4.3 Stationary Currents; Coupled Problems

As in electrostatics, the potential formulation may be used for stationary current fields, as was shown in section 1 and subsection 2.3. Then the continuous equation and its discrete FIT analogue are given as

div κ grad φ = 0.
Figure 4.23. Convergence history of the preconditioned cg algorithm in case of a nonlinear C-magnet.
In the following, some field representations are shown for two examples: First, a simple Hall element is treated; next, a semiconductor problem is considered as a sample coupled problem.

4.3.1 Hall Element

In this example, consider a simplified Hall element without an applied magnetic field. This is a stationary current problem. The conductivity of the material is assumed to be κ ≈ 4 S/m. The resulting real positive definite linear system is solved using the cg method. After 33 iterations, the accuracy cannot be improved further. The obtained solution satisfies the continuity equation div J = 0 with an accuracy of 0.13·10⁻⁴.
220
4. Applications from Electrical Engineering
Figure 4.24. Simple Hall element and vector representation of the electric field.
The scalar potential is given by the solution vector. The electric field can be directly determined without any loss in accuracy. Figure 4.24 shows the geometry of the problem and a representation of the electric field.

4.3.2 Semiconductor
Figure 4.25. Current flow field and resulting magnetic field in a semiconducting cube. The plot is courtesy of Bartsch, CST GmbH
This example shows the coupled calculation of stationary currents and the excited magnetostatic fields in a semiconducting cube. Two copper contacts are attached to one face of the cube. The contacts are connected to different potentials of ± 10 V. Figure 4.25 shows the current flow field and the resulting magnetic field for the investigated semiconducting cube (cf. [23], [24]). For the coupled calculation, the field J of the stationary current computation that is allocated on the FIT grid G has to be transferred to the dual FIT
grid G̃. The vector that represents J on G̃ satisfies the continuity equation and can be used as the excitation for the magnetostatic computation. This procedure is described in detail in [25], [23], [24]. In our example, the relative accuracy of the continuity equation div J = 0 is better than 10⁻⁶. The results of the coupled calculation are displayed in Fig. 4.25.

4.3.3 Circuit Breaker
This practical example (presented also in [24]) deals with a circuit breaker². Figure 4.26 displays its geometry. The upper left and right regions in Fig. 4.26 are the two fixed parts of the contact. They are connected by the movable contact bridge shown in the lower region.
Figure 4.26. Geometry of a circuit breaker. The plot is courtesy of Bartsch, CST GmbH
To simulate the current, assume there is a potential difference at the contacts. The resulting current flow J is displayed as a three-dimensional arrow plot in Fig. 4.27. The potential distribution φ_E on the surface of the structure is shown in Fig. 4.28. Figure 4.29 shows an arrow plot of the field strength H as a result of the coupled calculation of the magnetic field. An essential feature of the circuit breaker is that the electromagnetic force can be used to separate the bridge from the fixed contacts once the current exceeds a certain threshold value. Figure 4.30 shows the absolute value of the force on the bridge as a three-dimensional contour plot. Forces arise mainly in the surroundings of the contact area. The calculated forces are in good agreement with measurements.

² Thanks are due to P. Steinhauser from Rockwell Automation, Switzerland, for the model and measurement of the circuit breaker example.
Figure 4.27. Current flow field J. The plot is courtesy of Bartsch, CST GmbH
Figure 4.28. Electric potential distribution φ_E. The plot is courtesy of Bartsch, CST GmbH
Figure 4.29. Magnetic field strength H. The plot is courtesy of Bartsch, CST GmbH
Figure 4.30. Absolute value of the force on the contact bridge. The plot is courtesy of Bartsch, CST GmbH
4.4 Stationary Heat Conduction; Coupled Problems

As was already discussed in subsection 2.3, stationary temperature problems lead to Poisson's equation, just like the static problems treated above. The underlying analytic equation and the corresponding FIT equation are given as

div κ_T grad T = -w.
Here the temperature distribution on a board is chosen as an example. Further examples of temperature calculations are given in subsection 5.6 for several coupled problems arising in accelerator physics:

1. Inductive soldering, which needs the coupled computation of an eddy current problem, i.e., an excited time-harmonic problem, and a stationary temperature problem. Figures 5.46, 5.47, 5.48 and 5.49 display results for all stages of the coupled computation.
2. Temperature distribution in an rf cavity, which requires a coupled eigenmode computation, i.e., the solution of a time-harmonic problem, and a stationary temperature problem. Figures 5.50 - 5.53 show the solutions of the single problems in different representations.
3. An rf window, requiring a general time domain simulation coupled with the temperature simulation. Figures 5.54 - 5.56 display the results.
4. A waveguide with a load, which also requires a combined electromagnetic time domain and temperature simulation. The results are shown in Fig. 5.58 and 5.59.

4.4.1 Temperature Distribution on a Board
A board with several ICs was investigated with regard to the heating of neighbouring components by one heated IC. In the numerical calculation, the heated IC corresponds to a material with given temperature, thus serving as a heat source. The surrounding space is open, i.e., for the simulation it is assumed that the temperature vanishes at infinity. Correspondingly, all temperatures given in the figures are relative to the outer temperature of the board; they should not be understood as absolute temperatures. The problem domain has dimensions 9.4 cm x 6.4 cm x 6.0 cm. The heat source, which might be, for example, the CPU of a PC, is assumed to be 70 degrees Celsius (i.e., 70 K) warmer than the surrounding air. The other
Figure 4.31. Board with different ICs. The big IC is assumed to be the heat source with a temperature 70° Celsius above that of the surrounding air. Displayed are the board and isometric planes at 55° and 80° above the room temperature.
components have the following conductivities: the substrate and the mechanical connections κ = 15 S/m, the small ICs κ = 100 S/m, the sheets κ = 200 S/m, and the air κ = 0.588 S/m. The linear system was solved using the cg method with ILU(3) preconditioning. The implemented cg solver has automatic control over the iteration process: it ends the process as soon as the residuum stagnates. After 33 iterations, this criterion was satisfied and the value of the relative residuum was 0.2179827·10⁻⁶. The required cpu time was about 72 seconds on a SUN Sparc Server. Substituting the solution into the divergence equation yields 0.734857·10⁻⁴ as the relative accuracy of the solution. Figure 4.31 shows the board with two isometric planes which visualize temperatures 55° and 80° above the room temperature.
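The stagnation-controlled iteration described above can be sketched with a minimal preconditioned cg. This stand-in uses Jacobi instead of ILU(3) preconditioning and a 1D Poisson matrix instead of the board system, so it illustrates the stopping rule only, not the MAFIA solver:

```python
import numpy as np

def pcg_stagnation(A, b, M_inv, tol=1e-12, maxiter=1000):
    """Preconditioned cg that also stops when the relative residuum stagnates."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    prev_res = np.inf
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        res = np.linalg.norm(r) / np.linalg.norm(b)
        # terminate on tolerance or once the residuum stagnates
        if res < tol or abs(prev_res - res) < 1e-16:
            return x, k, res
        prev_res = res
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter, res

# 1D Poisson matrix as a stand-in for the discretized heat-conduction system
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)
x, its, res = pcg_stagnation(A, b, lambda r: r / d)   # Jacobi preconditioner
```

The stagnation test compares two successive relative residuals; in practice the threshold would be tuned to the solver's floating-point behaviour.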
4.5 Electro-Quasistatics

As was already discussed in subsection 2.3, electro-quasistatic problems lead to a complex Poisson equation

div (κ + iωε) grad φ = 0,

where the electric field E is represented as the gradient of a complex scalar potential: E = -grad φ.

4.5.1 High Voltage Insulators with Contaminations
An important branch of electrical energy engineering is high voltage engineering. Electric voltages of more than 1000 Volt (1 kV) are referred to as
high voltages. For these voltages, electric fields may cause strong discharging effects. High voltage is applied, e.g., in the long distance transmission of electric energy via overhead lines. The operating frequency is 50 Hz. The resulting electromagnetic fields are slowly varying fields which depend mainly on the displacement current. Thus, the equations of electro-quasistatics have to be solved. Important problems in high voltage engineering are phenomena caused by humidity or contamination of the insulator, in particular, material aging caused by breakdowns. Figure 4.32 shows a typical example of an insulator as well as a test specimen for experimental studies on aging processes. High voltages are applied at the insulators. Below a critical voltage U_k, the electrostatic field of the charge-free space prevails. The dielectric can be assumed to be a perfect insulator free of space charges. Above the critical voltage U_k, the insulating material loses its insulating properties and becomes the carrier of a discharge which makes a conducting connection along the insulator. The coalescence of single drops on humid insulators allows the discharge voltage to be reached. This is displayed in Fig. 4.33.
Figure 4.32. Insulator and epoxy resin specimen with a layer of drops as it develops on the surface after water sputtering.
So far, the computations of superposed electrostatic fields and electric fields of lossy dielectrics were not satisfactory, because a suitable model which also allowed detailed quantitative studies was missing: In the course of the
Figure 4.33. Discharge on an epoxy resin specimen with a layer of drops as it develops on the surface after water sputtering. The photo is courtesy of Weiland, TU Darmstadt, and the Philip Morris Company
studies, only one publication [274] could be found that discretizes the same differential equation - although with the Finite Element Method (FEM) and only for the two-dimensional case. If this problem type is simulated at all with some discretization method, then it is simulated as a static model and mostly only in the two-dimensional case (see, e.g., [241], where the Boundary Element Method (BEM) is used). The discretization of the electro-quasistatic equations with the Finite Integration Technique provides a suitable model for investigations of the superposed electrostatic fields and electric fields of lossy dielectrics.

4.5.2 Surface Contaminations
The electric field on a contaminated surface of a solid dielectric is different from that on a clean and dry surface. The electrostatic field on the surface of a dry and clean dielectric can be simulated by a series of n capacitors. A conductive contamination leads to an increase in surface conductivity and can influence the resulting electric field distribution. Then a series of n capacitors connected in parallel with n resistors gives a suitable equivalent circuit. Qualitatively, the question of surface contaminations has already been studied in extenso [156]. In [208], experimental studies are presented on the influence of weakly conductive contaminations on the surface aging of cylindrical epoxy-resin specimens under alternating-voltage load. The specimens differ, e.g., by the shape of their electrodes. The influence of the electrode shape on the field arising in their neighbourhood is dramatic. The phenomenon of the so-called "electrolytic partial discharge erosion" causes an increase of the total conductivity of the surface layer. In most cases, the local conductivity depends non-linearly on the local electric field strength. This dependence is again different for different types of contamination.
4.5.3 Fields on High Voltage Insulators
In [208], experimental studies are presented on the influence of weakly conductive contaminations on the surface aging of cylindrical epoxy-resin specimens under alternating-voltage load. The specimens differ, e.g., by the shape of their electrodes. The influence of the electrode shape on the field arising in their neighbourhood is very substantial: For a jutting-out disc electrode, the maximal electric field is about fifteen times higher than the homogeneous electric field; for the toroid electrodes, it is about six times higher; and for electrodes similar to the Rogowski profile, it only increases by about 10% [208].

Cylindrical Specimen with Toroid Electrodes. For the numerical studies, one of these specimens was chosen: The caps are each 6 mm thick and have radius 18 mm. The computed model is a 30 mm long, solid piece of the originally [208] 100 mm long hollow cylindrical specimen of radius 15 mm. The epoxy-resin has relative permittivity ε_r = 4, the relative permittivity of the water drops is ε_r = 81, and their electric conductivity may be assumed to be κ = 10⁻⁶ S/m. The frequency f of the alternating voltage is 50 Hz, and a voltage gradient of 5 kV/cm is applied. The size and form of the water droplets vary. Neglecting deformations caused by the electric field, we assume a round shape with a typical diameter of 1-3 mm. As is clear from the picture in Fig. 4.32, the drops are close together but randomly distributed. A technical drawing of the specimen and a picture of the experimental studies described in [208] is displayed in Fig. 4.34. The problem domain was discretized using a 57 x 57 x 73 grid, leading to a linear system with n = N = 237177 complex unknowns. First, a constant radius of 3 mm was assumed for the water droplets. Figure 4.35 shows (in the left part) a representation of the electric field when only seven droplets of radius 3 mm are assumed on the surface. On the right, a contour plot of the real part of the complex potential is displayed.
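The structure of the resulting complex-valued system can be illustrated in one dimension: the sketch below assembles and solves the complex Poisson equation for a layered medium, with layer data loosely modelled on the material values above. The function and its parameters are invented for illustration; this is not the 3D FIT model of the specimen:

```python
import numpy as np

eps0 = 8.854e-12
omega = 2 * np.pi * 50.0                         # 50 Hz operating frequency

def solve_layers(thick, eps_r, kappa, u0=1.0, n_per=40):
    """Complex potential across stacked 1D layers, phi(0)=u0, phi(L)=0."""
    sigma, h = [], []
    for t, er, k in zip(thick, eps_r, kappa):
        sigma += [k + 1j * omega * er * eps0] * n_per   # kappa + i*omega*eps
        h += [t / n_per] * n_per
    sigma, h = np.array(sigma), np.array(h)
    w = sigma / h                                # complex cell 'admittances'
    n = len(sigma) - 1                           # number of interior nodes
    A = np.zeros((n, n), dtype=complex)
    b = np.zeros(n, dtype=complex)
    for i in range(n):                           # assemble div((kappa+iwe) grad)
        A[i, i] = w[i] + w[i + 1]
        if i > 0:
            A[i, i - 1] = -w[i]
        if i < n - 1:
            A[i, i + 1] = -w[i + 1]
    b[0] = w[0] * u0
    phi = np.linalg.solve(A, b)
    full = np.concatenate(([u0], phi, [0.0]))
    j = -w * np.diff(full)                       # complex current, constant in 1D
    return phi, j

# epoxy layer (eps_r = 4, ideal insulator) in series with a thin water layer
# (eps_r = 81, kappa = 1e-6 S/m), as in the material data above
phi, j = solve_layers(thick=[1e-2, 1e-3], eps_r=[4.0, 81.0], kappa=[0.0, 1e-6])
```

Since the problem is one-dimensional, the complex current density must come out the same in every cell, which serves as a built-in correctness check.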
Figure 4.36 shows isometric lines of the real part of the electro-quasistatic potential when the water droplets are arranged in one row along the specimen. So far, simulations for high voltage insulators were in most cases based on electrostatic calculations. Since the electrostatic model does not include the current density and the displacement current, it obviously harbours a systematic error. Figure 4.37 shows a comparison of the longitudinal electric field along the specimen, i.e., from one electrode to the other. This path crosses some water drops, each of which causes a strong increase in the field. The electrostatic model, however, shows significant discrepancies in exactly these areas, which are the most interesting ones in this study. Consequently, the electro-quasistatic model is to be preferred for simulations of high voltage insulators. In Fig. 4.38, a simulation model is shown with many different drops distributed over the complete surface. Some of the water drops have coalesced. For symmetry reasons, only a quarter of the structure is discretized.
Figure 4.34. Hollow cylindrical epoxy resin specimen (1) with toroid electrodes (2), terminating caps (3), and annular seal (4). Also shown is the layer of droplets which develops on the surface of this specimen after water sputtering. The drawing is courtesy of Quint [208]
In each of Figs. 4.39 and 4.40, the absolute value of the electric field is displayed over some specific cross-section. The field increase between the drops and near the electrodes becomes very obvious. Only the field increase near the electrodes remains if the test specimen is completely dry (compare Fig. 4.40). The computed field profiles agree very well with the observations in the corresponding experiments [150]. Quantitative comparisons are hardly possible for such specimens for reasons related to the measurement techniques. As a result, simulation, manufacture, and measurement of special specimens are planned for a quantitative validation of the numerical method. An evaluating
comparison with other numerical methods is not possible for the time being since only electrostatic models are available (e.g., [241]).
4.5.4 Outlook

In the future, further numerical and experimental studies are planned. The influence of the electrode shape, which was already experimentally investigated in [208], shall be compared with numerical results. The relation between the applied field strength and the shape deformation of the droplets found in the experiments described in [55] shall be reproduced numerically by simulating the deformation caused by the electric field. In this process, the field and the resulting deforming forces are computed iteratively. Also, a typical insulator shape (as in Fig. 4.32) shall be simulated. Finally, a direct measurement of the electric field is needed for the quantitative validation of the simulation. For that, single water droplets are put by a pipette on a flat specimen, as is shown in Fig. 4.41, or on a specimen with electrodes inside, and the fields are measured, where the conductivity of the water will be varied. The enclosed electrodes shall avoid a possible field increase at the boundaries of the electrodes. Measurements of this kind shall facilitate precise comparisons between simulation and practice.
Figure 4.35. Vector representation of the electric field Re(E) and contours of the real part of the complex potential Re(φ) for a specimen with only a few water droplets.
Figure 4.36. Isometric lines of the real part of the electro-quasistatic potential Re(φ) in a cross-section with four water drops in one row.
Figure 4.37. Comparison of the results from electrostatic and electro-quasistatic computation. The dashed line, with the more distinct minima, shows the z-component of the electric field Re(E) as a function of z as obtained from the electro-quasistatic computation. The dotted line shows the z-component of the electric field E resulting from the electrostatic computation.
Figure 4.38. Specimen with many partly coalesced water droplets of varying size. Displayed are the studied geometry (left) and the equipotential lines of the real part of the complex potential Re(φ) (right).
Figure 4.39. Representation of the absolute value of the electric field over some specific (y,z)-cross-section (x = 0.7 cm) for a specimen with many separate drops of varying size.
Figure 4.40. Representation of the absolute value of the electric field over some specific (y,z)-cross-section (x = 0.7 cm) for a specimen without any water drops.
Figure 4.41. Electric field on some flat epoxy-resin specimen with electrodes on top. A row of three water droplets is put between the electrodes.
4.6 Magneto-Quasistatics

In subsection 2.3, we derived the FIT equation for magneto-quasistatics.
As is explained in more detail in [59] and [57], several aspects, like regularization, have to be taken into account in the setup of the numerical model, eventually leading either to wave equations for the electric grid voltage e and the magnetic vector potential a or to a system of differential-algebraic equations (DAE) of order 1 with a regular matrix stencil:

(C̃ D_{μ⁻¹} C + D_ε Sᵀ D_N S D_ε) e + D_κ (d/dt) e = -(d/dt) j,

(C̃ D_{μ⁻¹} C + D_ε Sᵀ D_N S D_ε) a + D_κ (d/dt) a = j,
where the term with the normalizing diagonal matrix D_N represents a local Coulomb gauging [116]. This system is solved by standard implicit one-step methods of integration with respect to time [117]. An example of such standard time-marching algorithms are θ-methods [331], which include the well-known implicit backward Euler (θ = 1), Galerkin (θ = 2/3), and Crank-Nicolson (θ = 1/2) methods. A good compromise between the numerical effort arising from the
necessarily iterative solution of two linear systems for each time step and the achievable L-stability is given by the stiffly accurate SDIRK2 scheme [1], which is of order 2 for DAEs of order 1. Also applicable are Gear's backward differentiation formulas (BDF) [117]. For the startup phase of the second order BDF2 method, the SDIRK2 method can be applied [56], [59]. After discretization with respect to time, the DAE system yields linear systems of the form

(D(Δt_{n+1}) + A) y_{n+1} = b,

where D(Δt_{n+1}) is a diagonal matrix with entries depending on the actual step size. The formulations were chosen to yield real-valued, symmetric, positive (semi-)definite linear algebraic systems, to which the classical preconditioned conjugate gradient algorithm is applicable. The best results with respect to the total computational time so far were achieved by an efficient implementation of the SSOR-CG method using an operator-type matrix-vector multiplication [59]. The cg method is especially well-suited for this type of semi-definite problems and also gives meaningful solutions to the strongly singular systems arising from discretizations with respect to time. In case of nonlinear magnetic material, the system to be solved is equivalent to a nonlinear problem
which can either be solved by methods of successive approximation or by Newton-Raphson schemes [158].

4.6.1 TEAM Benchmark Problem

This simple magneto-quasistatic example shows an eddy current simulation of the TEAM Workshop benchmark problem 7 [38], [119] at 50 Hz. The structure has been modelled using about 47000 mesh cells. The total simulation time of the frequency domain solver is less than 1 hour, whereas the implicit time domain calculation with a transient build-up phase over 1.5 periods requires about 2 (4) hours for 30 time steps with 30 (60) PCG solutions on a SUN ULTRA SPARC 1, depending on the chosen time integration scheme for the vector potential formulations. The deviation of the otherwise excellent results from the measured values is due to the staircase approximation of the coil curvature.
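The implicit one-step integration used above can be sketched with the θ-method applied to a small stand-in system D dy/dt + A y = f(t). The matrices and step size are invented for illustration, and a dense solve replaces the pcg solution of the per-step linear system:

```python
import numpy as np

def theta_step(D, A, y, f_n, f_np1, dt, theta):
    """One theta-method step: theta=1 backward Euler, 2/3 Galerkin, 1/2 Crank-Nicolson."""
    lhs = D / dt + theta * A                       # the (D(dt) + A)-type system matrix
    rhs = (D / dt - (1.0 - theta) * A) @ y + theta * f_np1 + (1.0 - theta) * f_n
    return np.linalg.solve(lhs, rhs)               # stands in for a pcg solve

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD 'stiffness' stand-in
D = np.eye(n)                                          # diagonal 'conductivity' matrix
f = np.zeros(n)                                        # no excitation: free decay
y0 = np.random.default_rng(0).standard_normal(n)
y = y0.copy()
for _ in range(100):
    y = theta_step(D, A, y, f, f, dt=10.0, theta=1.0)  # backward Euler steps
```

With zero forcing, the L-stable backward Euler choice damps the solution towards the steady state, mirroring the transient build-up behaviour described for the benchmark.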
4.7 Time-Harmonic Problems

The Helmholtz equation and its discrete analogue obtained using the Finite Integration Technique, the so-called discrete Curl-Curl Equation or discrete Helmholtz equation (cf. subsection 2.3), are given by
Figure 4.42. Geometry of TEAM benchmark problem No.7 and measurement path. The plot is courtesy of Clemens, TU Darmstadt
Figure 4.43. Field strength. The plot is courtesy of Clemens, TU Darmstadt
(curl (1/μ) curl - ω² ε') E = -iω J_E.

The right-hand sides -iω J_E and -iω ĵ_e represent the impressed current excitation; ε' combines the complex conductivity and permittivity. In case of resonant modes, the right-hand sides are equal to zero and an eigenvalue problem results. Some kind of gauging is applied with the aim of shifting the static solutions in order to facilitate the numerical solution of the eigenvalue problem, which then is given by:
(C̃ D_{μ⁻¹} C + Sᵀ D_N S) ê_k - ω_k² D_ε ê_k = 0.

Eigenvalue problems are not the main topic of this book, but the reader can find some examples for this type of problems in section 5 (Fig. 5.1), subsections 5.2 (Fig. 5.6, 5.7), 5.4 (Fig. 5.20, 5.28, 5.29), and 5.6 (Fig. 5.50). However, the computation of resonant fields by other methods may well lead to linear equation systems: the mode matching technique, described in
Figure 4.44. Comparison of magnetic field strength values for different simulation models and measurement. The plot is courtesy of Clemens, TU Darmstadt
subsection 2.1, is one example of this. With this method, the natural frequencies (eigenfrequencies) are found as zeros of some matrix determinant (see (2.1.3)), and the amplitudes of the wave excitation are obtained by solving the linear system
(cf. (2.1.3) and subsection 2.1). Some results of resonant field computations are given in subsection 2.1 (Fig. 2.5 - 2.7) and subsection 5.4 (Fig. 5.19, 5.20 and related figures). In the convergence studies in subsection 3.10, two time-harmonic applications with excitation were investigated. Both are problems which can also be solved in time domain and were first simulated there. Both examples are only briefly described here and some typical results are presented. For a more detailed description, see the literature referred to there. Another example of a time-harmonic application with excitation is the eddy current problem related to inductive soldering described in subsection 5.6 (Fig. 5.46).
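The algebraic structure of such driven time-harmonic computations can be illustrated with a 1D stand-in for the discrete curl-curl system, (K - ω²M) e = -iω j. The matrices, normalized units, and the point excitation below are invented for illustration:

```python
import numpy as np

n = 300
h = 1.0 / n
# 1D second-difference matrix as a stand-in for the curl-curl operator
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
M = np.eye(n)                                    # material (permittivity) matrix
omega = 40.0                                     # driving angular frequency
j = np.zeros(n)
j[n // 2] = 1.0                                  # impressed point current

# complex, symmetric but indefinite system (K - omega^2 M) e = -i*omega*j
S = (K - omega**2 * M).astype(complex)
e = np.linalg.solve(S, -1j * omega * j)
```

Unlike the positive definite static systems, the shifted matrix is indefinite once ω² exceeds the smallest eigenvalue of the operator, which is what makes the time-harmonic case harder for iterative solvers.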
4.7.1 3 dB Waveguide Coupler

The 3 dB waveguide coupler, or, to be more precise, the 3 dB rectangular waveguide directional coupler,³ was calculated in time domain in [76] and compared with measurements and mode matching calculations before it was simulated as a time-harmonic problem in [58]. For the latter, the coupler was discretized on a 53 x 2 x 128 grid (N = 13568 points). It consists of two rectangular waveguides whose wide sides are facing each other. The energy from one of the waveguides is to be evenly distributed by the coupler into the two waveguides. For this purpose, six connecting slits, differing in height and distance, lie between the waveguides over their full width. These were devised such that the optimal frequency response lies in the range 11-12.5 GHz. In this frequency range, the coupling is even better than 2.84 dB [76].
Figure 4.45. Electric field Re(E) in the 3 dB waveguide coupler. The walls of both waveguides (top and bottom in the picture) are shown transparent to allow a view of the field.
Figure 4.45 shows the electric field in the 3 dB waveguide coupler. In this picture, the geometry of the coupler also becomes clear. Figure 4.46 shows a comparison of the reflection coefficient S₁₁ with the time domain computation [76] for three different frequencies. The agreement between the two solution approaches is very good.

4.7.2 Microchip
In this example, a section of a microchip was considered in the frequency range of 10 to 40 GHz with the focus on a possible cross-talk between the

³ The coupler was manufactured and measured by the company MBB. The results of measurements and mode matching calculations were provided to Dohlus [76] in private communication.
Figure 4.46. Reflection coefficient S₁₁ of the 3 dB waveguide coupler. The results from the frequency domain calculation show a very good agreement with the time domain solution.
Figure 4.47. Cross-talk of the electric field Re(E), logarithmically scaled, at a frequency of 10 GHz.

bond wires. As is visible in Fig. 4.47, the discretized section consists of two microstrip ports and two thin bond wires which connect the microstrips with resistive blocks on the material. The resistive blocks have conductivity κ = 1.3·10⁴ S/m, and the substrate has relative permittivity ε_r = 9.0. The dimensions are about 700 μm x 300 μm. The discretization was done on a grid with 71 x 20 x 85 = 120700 points. The cross-talk from one wire to the other was determined for the frequencies 10 GHz and 40 GHz. The comparison of the reflection coefficients obtained in time and frequency domain shows an extremely good agreement, cf. Fig. 16 in [58]. Figure 4.47 shows the cross-talk of
the electric field at 10 GHz. The electric field Re(E) is displayed in vectorial, logarithmically scaled form.
4.8 General Time-Dependent Problems

General time-dependent problems lead to initial value problems which can be solved by explicit procedures based on discretization with respect to time. The analytic operator and the corresponding FIT operator are given by
L = (          0             (1/(ε_r ε_0)) curl
      -(1/(μ_r μ_0)) curl            0         )
Thus, no linear system has to be solved for this problem type, so it is not in the scope of this book. Yet, some examples of the solution of Maxwell's equations in time domain can be found in subsection 5.6: Results of the field computation for some rf window are displayed in Fig. 5.54 and Fig. 5.55. Figure 5.58 shows the field distribution in a waveguide with a load.
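Although out of scope here, the explicit leapfrog update such time domain solvers rely on can be sketched in one dimension (normalized units, PEC boundaries, an invented soft Gaussian source; not the 3D FIT implementation):

```python
import numpy as np

c, dz = 1.0, 1.0
dt = 0.99 * dz / c                       # Courant-stable time step
n = 200
ez = np.zeros(n)                         # E allocated at integer grid points
hy = np.zeros(n - 1)                     # H staggered at half points

for step in range(300):
    hy += dt / dz * (ez[1:] - ez[:-1])   # half-step update of H
    ez[1:-1] += dt / dz * (hy[1:] - hy[:-1])   # update of E, PEC at ends
    ez[n // 2] += np.exp(-((step - 30) / 10.0) ** 2)   # soft Gaussian source
```

Each time step needs only local differences and no linear solve, which is exactly why this problem class falls outside the scope of the book.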
4.9 Bibliographical Comments

In this section, several applications which require solutions of large linear systems have been presented. Most of the examples have already been studied from the point of view of convergence of the applicable algorithms in subsection 3.10, where some field plots and other relevant plots appeared. In general, the author is not aware of any textbook covering applications from electrical engineering from the point of view of the solution of the resulting linear systems (which was one of the main reasons to write this book). Consequently, mainly research papers, students' papers, and dissertations have been cited in this section. The most relevant literature concerning the above-mentioned examples has already been listed in the bibliographical comments of section 3 and will not be repeated here.

Electrostatics

The electrostatic example was originally described in [72], where it was solved by the SOR algorithm and an older version of the CAE package MAFIA [303].
Magnetostatics
The linear C-magnet and the sensor are just two typical and rather simple examples discretized by Weiland and his group; see section 3 for references. The nonlinear C-magnet was first described in [292]. A description of the Newton-Raphson-like procedure can be found, e.g., in [191].

Stationary Current Problems; Coupled Problems
The Hall element was discretized by the author. More details on the semiconducting cube can be found in [23] and [24]. [25] describes the computation of electromagnetic forces with FIT. The circuit breaker was modelled and measured by Rockwell Automation. The simulations for the circuit breaker were first presented in [24].

Stationary Temperature Problems; Coupled Problems
The discretization of the PC board was taken from a time domain simulation of Weiland and his group. The temperature simulation was done by the author. The temperature simulations in the coupled problems from subsection 5.6 were guided by the author.

Electro-Quasistatics
Electro-quasistatics has been one of the author's main research topics in recent years. It started out from some ideas and discussions with Weiland. Besides theoretical studies, the implementation in MAFIA was carried out and some examples, like the insulator, have been discretized. In the course of the studies, only one publication [274] could be found on a 2D FEM discretization of the same differential equation. A 2D static BEM model is described in [241]. Extensive qualitative experimental studies have been carried out under the guidance of König [156], [208], and [55]; see also [150].

Magneto-Quasistatics
In order to complete the spectrum of examples, some recent examples by Clemens have been presented. The examples and the theory underlying the FIT implementation are described in detail in [59] and [57]. Some of the ideas were first described in [116]. The low frequency Team Workshop benchmark problem 7 was first solved by FEM ([38], [119]), then by FIT ([59] and [57]).
Textbooks dealing with the implicit time-stepping algorithms are, e.g., [1], [117], and [331]. A German textbook describing the solution of nonlinear electromagnetic problems is [158].
Time-Harmonic Problems
Several examples for this problem type are described in subsections 5.2 and 5.6 as well as in subsections 2.1 and 5.4. Please refer to the bibliographical comments in the corresponding sections. The waveguide coupler was manufactured and measured by the company MBB. The results of the measurement and mode matching calculations were provided to Dohlus [76], who solved the problem in time domain and compared the results with measurements and mode matching calculations before it was simulated as a time-harmonic problem by Clemens et al. in [58]. The cross-talk study for the microstrip was first described in [58].

General Time-Dependent Problems
Some coupled problems involving time domain simulations are described in subsection 5.6; see the bibliographical comments there.
5. Applications from Accelerator Physics
High energy physics studies the elementary constituents of matter. Adequate instruments for studying elementary particles, comparable with microscopes, are accelerators and storage rings. In the construction of large accelerators, technical components are used which are at the edge of feasibility and require considerable funds. Therefore, precise statements about the capability of a new accelerator are a substantial part of the design. Based on the methods introduced above, computer codes were developed which serve for the optimization of certain technical components of an accelerator, especially of the accelerating sections. Besides their application in accelerator physics, these methods can be used practically everywhere in electrical engineering where microwave systems, e.g., waveguides or resonators, are used.
5.1 Acceleration of Elementary Particles

The simplest apparatus for the acceleration of a charged particle is the plate capacitor. In accelerator physics, the required devices must supply very high voltages (up to 10¹¹ V, i.e., 100 000 MV). Yet, such high voltages cannot be created between a pair of electrodes. In order to reach the desired high energies, high frequency alternating currents are used and the energy supply is carried out in several steps. The technique for the creation of those high frequency waves is the same as in radio or TV engineering. The transport of such high frequency waves with wavelength 0.1-1 m takes place in hollow metallic structures that are also of size 0.1-1 m.

Waveguides. A waveguide is characterized by the fact that it carries propagating electromagnetic waves. In general, the z-coordinate is assumed to be the direction of propagation. Then an ideal waveguide is given by a 3-dimensional domain which extends infinitely in the z-direction with perfectly conducting walls forming the transversal boundaries. Waveguides are filled with air or vacuum and often also with dielectrics or ferrites. A real waveguide is some kind of pipe with walls of finite conductivity. The rf waves are carried to the accelerating sections of the accelerator. These sections are driven so that the particles pass them at the same moment as the accelerating field of the wave.
Cavities. In storage rings, the accelerating structure is usually designed to make the electromagnetic fields stay inside by resonance. In its most essential electromagnetic properties, such an accelerating structure does not differ from a resonant cavity. An ideal resonant cavity is a finite volume where perfectly conducting walls form the boundaries. It may be filled with some material, e.g., some dielectrics or ferrite. A cavity made of a material with finite conductivity having one or several openings is still referred to as a cavity if it is possible to create resonant oscillations in the cavity by applying electromagnetic modes of appropriate frequency. For such frequencies, standing waves develop by reflection of the electromagnetic waves at the boundaries in their direction of propagation. The eigenfrequencies belonging to these resonant oscillations are called resonant frequencies. Then the system is in resonance. An ideal cavity has a δ-like resonant spectrum. In a cavity with finite conductivity, the resonances are no longer undamped. The damping leads to a finite resonance width. Therefore, resonances can occur in the neighbourhood of an eigenfrequency. The eigenfrequencies of the cavities lie in the rf range¹. The resonant oscillations are time-harmonic oscillations. These harmonic oscillations with finite energy in a given domain are often denoted as modes. The modes form a complete system of functions in L²(0, 2π), i.e., any oscillation can be described as a combination of all modes where amplitude and phase are suitably chosen. Therefore, the knowledge of the harmonic oscillations which solve Maxwell's equations is sufficient to describe all possible oscillation forms. A resonant mode is suitable for the "acceleration" of (or, more precisely, for the supply of kinetic energy to) the particles if it has a strong longitudinal electric field on the axis. Figure 5.1 shows the electric field of such a TM010 mode. In a storage ring (resp.
linear accelerator), resonant modes (resp. traveling waves) are used for the energy supply to the particles. Upon passing the electric fields transmitted in the cavities (resp. traveling wave tubes), the particles experience a force which increases their kinetic energy. With that, the laws of relativity theory hold for ultra-relativistic particles. Accordingly, the speed of light cannot be exceeded. Consequently, these particles do not get faster but heavier according to E = mc². In the sequel, we present computations for cavities, aperiodic waveguides, and rf windows. The mode matching technique and the Finite Integration Technique were used to compute traveling waves and standing waves (modes) in accelerating structures. While passing the accelerating structures, the particle bunches themselves cause electromagnetic fields, which are undesirable. Their interactions with the generating and all following particles have to be studied together with the possibilities to suppress those parasitic fields. The results presented for dipole modes in a typical accelerating structure for future linear colliders [267] elucidated one typical problem of these structures,
¹ f ≳ 300 MHz
Figure 5.1. Electric field of the fundamental mode of a cylindrically symmetric cavity.
the so-called trapped modes. These results initiated a series of theoretical and experimental studies. Some of the successive studies are also described here. Furthermore, calculations of temperature were carried out for the resulting wall losses. The simulation algorithm for the computation of temperature is a helpful tool for many problems related to the accelerator operation. This algorithm also proved itself useful for some problems on manufacturing techniques, in particular, for the soldering of accelerating structures.
5.2 Linear Colliders

In what follows, a special field of applications of accelerator physics is treated in more detail. This field is highly interesting and important, perhaps not from the numerical, but certainly from the physical point of view. In the future, e⁺e⁻ physics will be interested in such high energies (500 GeV up to 1.5 TeV) that storage rings will no longer be applicable because of their high energy losses by synchrotron radiation. This necessitates the construction of a linear collider, and studies toward this goal are already being carried out worldwide. The results for the S-band 2 × 250 GeV linear collider study SBLC [311] are discussed in detail. In this context, a program based on mode matching has been developed [279] and a series of numerical studies on the mode matching technique [183], [87], [204], [239] as well as field calculations for the accelerating traveling wave and parasitic modes in the accelerating structures [318] were carried out. Since the results of the field calculation gave strong indications of the necessity of radical changes in the
design, a short test structure was designed and an experimental measurement was decided upon [284], [160], [161], [163]. In this subsection on linear colliders and the accelerating structures of the future, we analyze linear accelerators and their influence on the beam dynamics. Those accelerators are referred to as linear colliders. The actual concepts of linear colliders can in principle be split into two groups: Most paradigms propose normal conductive structures for the acceleration of elementary particles. One paradigm suggests acceleration by superconductive structures. In the sequel, the normal conductive structures are treated in detail. All design studies for normal conductive linear colliders propose traveling wave tubes for the acceleration of the particles. In principle, periodic and aperiodic structures can be used. The advantages and disadvantages of constant impedance and constant gradient tubes will be considered. A special study, the S-band linear collider study, SBLC for short, will be the focus of our discussion. A special emphasis will be on the design of the S-band tube. The SBLC study proposes 2452 so-called constant gradient structures for acceleration. These aperiodic traveling wave tubes will have 180 cells and an accelerating gradient of 17 MV/m. A so-called bunch train of 125 packets of particles (bunches) with a distance of 16 ns between the packets is planned. In order to reach a luminosity² as large as possible, any dilution of the bunches has to be avoided. Effects of scattered fields (wake fields³), which are caused by parasitic resonant oscillations, are among the main factors that cause the dilution of the bunches. Consequently, the suppression of these modes, the so-called Higher Order Modes (HOMs), is of fundamental importance for the actual design of the collider. The interaction of the single HOMs with the bunch can be described by the so-called loss parameter, which will be treated later in detail.
The evidence collected so far and some theoretical and numerical preliminary investigations suggest that one can assume that the modes of the first dipole band would cause the most harmful diluting effects. Therefore, the main interest lay on computations for this dipole band. In what follows, mainly the dipole modes of the S-band tube, their influence on the beam dynamics, and possible measures for their suppression are discussed. With the mode matching method, a numerical analysis was carried out for the HOMs. The frequencies and loss parameters of the dipole modes

² The high energy experiments aim for a reaction between the elementary particles of the colliding bunches. Such a reaction will only happen for those elementary particles out of the two colliding bunches which are close enough; most of the particles will just pass on. The luminosity measures the probability of the event that some elementary particles will cause a reaction. A good luminosity requires a high number of particles in the colliding bunches, a high number of bunches, and small values for the transversal density distributions (i.e., "dense" bunches with a small "cross section").
³ The term for these remaining fields is chosen by the association with the wakes of a boat. Wake fields and beam loading are treated in more detail in subsection 5.3.2.
thus computed allow the subsequent computation of the wake potential. Thus, they are of basic relevance for beam dynamics simulations. Contrary to all preceding qualitative considerations, the detailed numerical analysis showed that a radically different design is needed for the HOM damping: Besides the few strongly interacting modes at the end of the tube with the input coupler, there is a multitude of other strongly interacting modes. Many of these modes are trapped completely in the inner part of the tube, i.e., they have no field at either end of the tube. The grave consequence of this is that one damper is far from sufficient to suppress all dipole modes which are dangerous for the particle beam. For experimental validation of the numerical results, a short test structure was designed and measured in the microwave laboratory of Frankfurt University. The measurement results confirmed the numerical predictions about the field distribution of the parasitic modes very well.
5.2.1 Actual Linear Collider Studies

Before describing the SBLC study in more detail, let us first make some introductory remarks about linear colliders. In high energy physics, a wide consensus exists that an electron-positron collider should be studied as the next project after the largest accelerator, the LHC (Large Hadron Collider) at CERN. This linear collider should have a center of mass energy of 500 GeV and an event rate, the so-called luminosity, of 10³³ cm⁻² s⁻¹. For such high energies, storage rings are no longer suitable, since their energy losses by synchrotron radiation are proportional to the fourth power of the energy. Starting at about 100 GeV, the compensation of this energy loss is no longer technically and financially affordable. In 1989, the first linear collider, the SLC, was put into operation. It uses the 3 km long linear accelerator already existing since the sixties. The goal energy is 50 GeV. In some respects, this machine differs from those studied here, since it carries electrons and positrons in one common accelerator. Nevertheless, it is of great importance for the studies in question, since the SLC presents a successful prototype of a normal conductive linear collider.
[Schematic labels: e⁻/e⁺ source, injector, pre-accelerator, bunch compressor, main linac, collimator, matching optics, wiggler, beam dump, interaction point (I.P.).]
Figure 5.2. Schematic layout of the S-band linear collider SBLC.
Figure 5.2 shows the schematic layout of the S-band linear collider. This layout is valid for most studies. The electrons as well as the positrons pass through a linear accelerator of their own. These accelerators are opposite to each other. The interaction region, i.e., the experimental zone, has a small angle of a few milliradians to the center line between the electron and positron linear accelerators. Before the main accelerator are, for each linear accelerator, a source for the elementary particles, a pre-accelerator, and a section for the bunch compression. The accelerating structures in the main accelerator play a central role in the design studies. They are the topic of the following subsections. Worldwide, there have been six different design projects for a future linear collider. They are carried out in part by international groups. These are the collider projects TESLA (coordinated by DESY, Hamburg, Germany), SBLC (also coordinated by DESY, Hamburg, Germany), JLC (coordinated by KEK, Tsukuba, Japan), NLC (coordinated by SLAC, Stanford, USA), VLEPP (coordinated by BINP, Novosibirsk, Russia), and CLIC (coordinated by CERN, Geneva, Switzerland). At the EPAC 1994 conference, a "Technical Review Committee" with representatives of all the projects was officially founded. Its report [267] of 1995 and the proceedings of the "Next Generation Linear Colliders" workshops [242], [153], [140] give a good overview of the different studies. The physics which is made possible by these linear colliders is described in [327]. Three of these design projects are very similar with respect to their working frequency, meanwhile also being in close co-operation, so that there are essentially four different concepts. Table 5.1 gives a short overview of the design projects.

Table 5.1. Design projects of future linear colliders
1. With superconductive accelerating structures (9-cell cavities):
   TESLA            L-band    1.3 GHz
2. With normal conductive accelerating structures (traveling wave tubes with 80-200 cells):
   SBLC             S-band    3 GHz
   JLC              X-band    11.4 GHz
   NLC              X-band    11.4 GHz
   VLEPP                      14 GHz
   CLIC (2 beams)             30 GHz
The TESLA study [314] conducted by DESY in international collaboration is the only linear collider project foreseeing the use of superconductive accelerating structures. These are 9-cell cavities each driven at a resonant frequency of 1.3 GHz.
Besides TESLA, all other collider studies mentioned above foresee normal conducting accelerating structures. These are each traveling wave structures with about 80 to 200 cells. The only project in the S-band regime is SBLC [310], [311] with a working frequency of 3 GHz. This study was conducted by DESY in collaboration with other institutions, e.g., the Darmstadt University of Technology and Frankfurt University. In the X-band regime, the Japanese study JLC [258], [257] and the American study NLC [194] of SLAC, each working at 11.424 GHz, and the Russian study VLEPP [16] with 14 GHz were initiated. Finally, the third conceptually different project is CLIC of CERN [232] with a frequency of 30 GHz. The latter differs essentially from the previous studies, since it proposes the so-called Two-Beam Accelerator (TBA). The concept of the TBA is based on a relativistic driving beam with high intensity but medium energy (6 GeV) running parallel to the main accelerator and depositing energy periodically in its 30 GHz traveling wave tubes. All the design projects face the practical problem of constructing accelerating structures under extremely narrow tolerances. Precisions of up to 1 µm have to be kept in the production. Potential scientific problems are dark currents and wake fields. The wake fields will be discussed in more detail below. TESLA proposes an accelerating gradient of 25 MV/m. This puts high demands on the material, the production technique, and the cleaning and handling process. Thermal breakdowns and field emission are the reasons why that gradient is at the edge of technical feasibility. In the first prototype, an accelerating gradient of 20 MV/m was reached in continuous wave operation; under pulsed operation, one of six cavities reached 25 MV/m [267]. The total length⁴ of the collider will be 29 km. This project has the highest efficiency at the expense of a very high technical effort.
The normal conductive DESY project SBLC mainly uses the same technology as the SLC, which has been working successfully since 1989. Yet, SBLC's accelerating structures are essentially longer and differ slightly in geometry. As in the SLC, the accelerating gradient of the SBLC structures equals 17 MV/m. The necessary length of the collider amounts to 33 km. Details on this project follow in the sequel. To reduce the length of the linear collider and thus reduce the costs, the Japanese project JLC, the Stanford project NLC, and the Russian project VLEPP chose frequencies in the X-band regime. Their accelerating gradients lie at 58 MV/m, 37.3 MV/m, and 91 MV/m respectively, while their total lengths amount to 10.4 km, 15.6 km, and 7.0 km respectively. The CERN project CLIC follows the TBA principle already mentioned above. Its accelerating gradient is 78.4 MV/m; its total length amounts to 8.8 km.
⁴ By the total length, the active length plus further beam guiding components, including the cryostats in TESLA, is meant. The numbers were taken from [267].
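The quoted gradients and total lengths can be checked for plausibility by a simple worked estimate: the active length per linac is roughly the beam energy (250 GeV per beam) divided by the accelerating gradient, and it must stay below the quoted total length. The gradients and lengths below are the values given in the text; the comparison itself is only a rough sketch.

```python
# Rough consistency check: active length of both linacs (2 * E / gradient)
# versus the quoted total lengths, which also include beam guiding
# components (and the cryostats for TESLA). Values as given in the text.
designs = {                 # name: (gradient in MV/m, quoted total length in km)
    "TESLA": (25.0, 29.0),
    "SBLC":  (17.0, 33.0),
    "JLC":   (58.0, 10.4),
    "NLC":   (37.3, 15.6),
    "VLEPP": (91.0, 7.0),
    "CLIC":  (78.4, 8.8),
}
E_beam = 250e3              # MV per beam (center of mass energy 500 GeV)
for name, (grad, total_km) in designs.items():
    active_km = 2 * E_beam / grad / 1e3   # both linacs, in km
    print(f"{name:6s} active ≈ {active_km:5.1f} km, quoted total {total_km} km")
```

For SBLC, for example, this gives about 29.4 km of active structure, consistent with the quoted total length of 33 km once the beam guiding components are added.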
With increasing frequency, the dimensions of the accelerating structures get smaller. The diameter of the beam opening of the CLIC structure, for example, is only 6 mm. The smaller sizes necessitate very narrow tolerances for the alignment of the structures. For CLIC, they are 10 µm; for the other projects, they are of order 100 µm. For many years, all projects were pursued in parallel. Since 1997, test facilities have been in operation for several of the projects. At this time, comparative evaluations have also started. Because of the enormous building costs of several billion dollars for each of the linear colliders, probably only one of the concepts will eventually be realized. With several thousand accelerating structures, which all have to be equipped with technical devices for the suppression of parasitic modes, detailed knowledge of these modes is a decisive financial factor. In the following, studies regarding the parasitic modes or Higher Order Modes (HOMs) of SBLC are presented.

5.2.2 Acceleration in Linear Colliders

The goal of the acceleration of particles is to continuously transmit energy to the beam over sections as long as possible. The necessary high energies are transferred via fast oscillating rf fields. In the GHz range, waveguide elements are used as cavities or to transmit electromagnetic waves. However, a plain cylindrical waveguide is not suited for the acceleration of elementary particles, since the phase velocity of electromagnetic waves is larger than the speed of light while the velocity of the particles is just below⁵ the speed of light. The particles would not only be accelerated but also decelerated by a wave with higher phase velocity. Yet, by insertion of irises in the cylindrical waveguide, the waves can be slowed down to the same phase velocity as the particles. Usually, a constant distance between the irises is chosen. Such structures are called "slow wave structures". Figure 5.3 shows such a slow wave structure (cf. Fig. 2.1).
The photograph shows the cross-section of a typical accelerating structure for a linear collider. In particular, some cups of the SBLC structure are shown. The effect of the irises can be elucidated by the graph in Fig. 5.4: The phase velocity in a waveguide is a function of frequency. The dispersion relation of the waveguide states that the frequency ω of the wave equals the product of the speed of light c = 2.997925·10⁸ m/s and the square root of the sum of the squared phase constant (also called wave number) β and the squared cut-off number k_c:

ω = c·√(β² + k_c²).

⁵ For the particle velocity v, the relation v = βc holds, with the speed of light c and β = √(1 − 1/γ²) with the Lorentz factor γ. For an electron with energy E = 100 GeV, this yields γ = 100 GeV / 511 keV ≈ 1.96·10⁵. Thus, the estimate β ≈ 1 − 1.3·10⁻¹¹ follows.
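Both estimates can be checked numerically. The energy and rest mass values are those given in the footnote; the cut-off number and phase constant in the second part are illustrative values chosen only to demonstrate that the phase velocity exceeds c.

```python
import math

# Check of the footnote estimate for a 100 GeV electron:
c = 2.997925e8                  # speed of light in m/s
E, m_e = 100e9, 511e3           # electron energy and rest energy in eV
gamma = E / m_e                 # Lorentz factor ≈ 1.96e5
beta = math.sqrt(1 - 1 / gamma**2)
print(1 - beta)                 # ≈ 1.3e-11, as stated in the footnote

# Phase velocity from the dispersion relation ω = c·sqrt(β_z² + k_c²)
# (k_c and β_z are illustrative values, not data from the text):
k_c = 50.0                      # cut-off wave number in 1/m
beta_z = 30.0                   # phase constant in 1/m
omega = c * math.sqrt(beta_z**2 + k_c**2)
v_p = omega / beta_z
print(v_p > c)                  # True: the phase velocity exceeds c
```

The first print confirms why ultra-relativistic particles can be treated as moving at practically the speed of light, while the second shows why a plain waveguide cannot stay synchronous with them.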
Figure 5.3. Iris-loaded waveguide. The photo is courtesy of DESY.
The cut-off number k_c of a waveguide separates the range of free wave propagation from that with damping. The phase constant β is defined via the wavelength λ_z of the mode as

β = 2π/λ_z.
The phase velocity v_p of a wave can be derived from the dispersion relation: the phase velocity is given as the ratio of the frequency ω and the wave number β:

v_p = ω/β.
Often, the dispersion relations are displayed in graphs of ω as a function of β. Then the phase velocity is given by the gradient of the line connecting a point on the dispersion curve with the origin. From the graph of the dispersion relation in a waveguide (cf. Fig. 5.4), it is obvious that the phase velocity in the plain waveguide is always higher than the speed of light. The group velocity v_g is given by the gradient of the dispersion curve:

v_g = dω/dβ.
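The two velocity definitions can be evaluated directly from the plain-waveguide dispersion relation. The sketch below uses illustrative parameter values and a numerical derivative for v_g; for the plain waveguide the product v_p·v_g equals c², which serves as a consistency check.

```python
import math

# Phase and group velocity from the plain-waveguide dispersion relation
# ω(β) = c·sqrt(β² + k_c²); k_c and β are illustrative values.
c, k_c = 2.997925e8, 50.0

def omega(beta):
    return c * math.sqrt(beta**2 + k_c**2)

beta = 30.0
v_p = omega(beta) / beta                            # chord from the origin
d = 1e-4
v_g = (omega(beta + d) - omega(beta - d)) / (2 * d) # slope of the curve
print(v_p > c, v_g < c)        # phase velocity above, group velocity below c
print(abs(v_p * v_g - c**2) / c**2)   # ≈ 0: v_p·v_g = c² for this waveguide
```

This makes the geometric reading of Fig. 5.4 concrete: the chord to the origin is always steeper than c, while the tangent of the curve stays below c.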
In lossless structures, the energy velocity and the group velocity are equal. The dispersion curve for a cylindrical waveguide loaded with irises separated by a constant distance d starts with large wavelengths, i.e., small wave numbers β, like the dispersion curve of the plain waveguide. Then (at β = π/d) it intersects the borderline where the phase velocity v_p is equal to the speed of light c. Afterwards, it quickly flattens for short wavelengths. The frequency range given by the extreme values of a dispersion curve is referred to as the pass band. A frequency range between two pass bands is called a stop band; there, no power can flow into the structure. Suitable accelerating structures for high energy electrons have large coupling holes in order to achieve a phase velocity v_p close to the speed of light
y(s) = a_y·√β_y(s) · cos(ψ_y(s) + φ_y)
with the phase advance

ψ_{x,y}(s) = ∫₀ˢ 1/β_{x,y}(s′) ds′
represents a transversal periodic oscillation (corresponding to the harmonic oscillator). The amplitude and phase advance of this motion are described by the beta function β(s), which is of central importance for beam optics. The direct determination of the beta function is very costly, so it is customary to use a matrix formalism. The divergences x′(s), y′(s) are determined for this purpose and, after some rearrangements and eliminations of trigonometric functions, the following equation of an ellipse in x follows:

a² = γ·x² + 2α·x·x′ + β·x′²,

with α = −β′/2 and γ = (1 + α²)/β. The equation for the other transversal coordinate y is completely analogous. Of course, the area F = πa² =: πε of the ellipse is a constant independent of the location. The factor ε := a²_{x,y} is referred to as the emittance.
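The statement that the ellipse area is πε for any choice of the parameters follows from the relation γ = (1 + α²)/β, which forces the determinant βγ − α² to equal 1. A small numerical check (the parameter values are arbitrary illustrative choices):

```python
import math

# Twiss-parameter check: with γ = (1 + α²)/β the quadratic form
# γx² + 2αxx' + βx'² = ε describes an ellipse of area πε, because the
# determinant βγ − α² of the form equals 1. Values below are arbitrary.
beta, alpha = 4.0, -0.5
gamma = (1 + alpha**2) / beta
det = beta * gamma - alpha**2
print(det)                        # → 1.0 for any admissible α, β

eps = 2.0                         # emittance
area = math.pi * eps / math.sqrt(det)   # area of the ellipse
print(abs(area - math.pi * eps) < 1e-12)
```

The general formula for the area of the region γx² + 2αxx′ + βx′² ≤ ε is πε/√(βγ − α²), so the unit determinant is exactly what makes the area independent of the location along the beam line.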
Figure 5.14. Emittance ellipse in phase space (equivalent area of the particle distribution: A = πε). The illustration is courtesy of Drevlak, DESY; now IPP Greifswald.
The emittance is derived from the equation of motion for a single particle. Then the motion of the bunch can be described by the phase ellipse or emittance ellipse: If the coordinates (x, x') of all particles with emittance € are displayed in the phase plane, this produces an ellipse in the phase space. Consequently, the emittance ellipse characterizes the beam at the position s. Given a magnet system and a beam that is given in the beginning by a cluster
of points in the phase plane (x, x′) centered around the reference point (0, 0), then an ellipse which exactly borders the cluster and thus represents the beam can be determined by a choice of β₀, α₀, and ε. But then, Liouville's Theorem on the invariance of the phase ellipse states that the particle density in phase space remains constant when the beam passes a magnet system, i.e., that the area of the phase ellipse is invariant: ∮ x′ dx = const. = επ. Setting up transport matrices for each accelerating section, each magnet, and each drift space, the particle trajectory can be tracked by multiplication of the transport matrices, starting off from the equation of the phase ellipse (cf. [218] and references therein). It is very important to note that Liouville's theorem is valid only for beams which are guided by external fields. Furthermore, the transformation properties are adjusted to the position in phase space. A beam with this position is said to be matched. For a mismatched position, the emittance increases. Besides the just explained emittance concept, there are a number of related terms. To keep a high luminosity in an accelerator operated with a bunch train - in the so-called multibunch operation - not only the emittance of the single bunch has to be preserved, but any cumulative beam instability along the bunch train has to be avoided. In this context, the specific terms of the single bunch emittance and multibunch emittance are introduced. In computer simulations for studies of the single and multibunch dynamics of the beam, an effective emittance of the bunch train is computed from the so-called centroids of the bunches (centers of the bunches) and the single bunch emittances. Prescribed tolerances are reflected in the effective emittance of the bunch train at the end of the linear collider. Main input parameters for the simulation are the loss parameters introduced below and the quality factors of higher order modes (long range wake fields).
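The transport-matrix formalism mentioned above can be sketched with two standard elements, a drift space and a thin-lens quadrupole; the element parameters are illustrative, not SBLC lattice data. Both matrices have unit determinant, so the phase-space area is preserved, in accordance with Liouville's theorem.

```python
import numpy as np

# Sketch of the transport-matrix formalism: a particle's (x, x') pair is
# propagated by multiplying 2x2 element matrices. Element parameters are
# illustrative values only.
def drift(L):
    """Field-free drift of length L."""
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_quad(f):
    """Thin-lens quadrupole with focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

M = drift(2.0) @ thin_quad(5.0) @ drift(3.0)   # one section of a beam line
print(np.linalg.det(M))          # → 1.0: area in phase space is preserved

z0 = np.array([1e-3, 2e-4])      # initial (x, x') of a single particle
z1 = M @ z0                      # coordinates after passing the section
```

Since every element matrix has determinant 1, so does any product of them, which is the matrix-level expression of the invariance of the phase ellipse area.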
This topic is treated in detail in [83]. Effects of wake fields and dispersion errors, which are caused by misalignment of accelerating structures and injection errors of the bunch, are the main reasons for emittance growth. Further reasons are ground motion, jitter, filamentation by phase mixing, and so on. In [210], these effects and suitable correction mechanisms are described.

5.3.2 Wake Fields and Wake Potential

In April 1966, cumulative beam instabilities were observed for the first time ever at the SLAC two-mile accelerator [184]. Transverse distortions of the beam were slightly amplified in each of the 960 constant gradient structures, thus leading to a large total amplification of six to seven orders of magnitude in the end. With 10 to 20 mA, these instabilities had low current thresholds. A series of observations, experiments, and calculations followed in the years 1966/67 [184], the results of which can be summarized as follows. 1. The lowest resonant frequency for which beam break-up was observed was
about 4.14 GHz. The field of the corresponding dipole mode was confined
to the first eight to ten cells - the rest of the structure was practically field free. The cell-to-cell phase advance in the first cells was close to π. 2. Only a few modes contributed to the instability. 3. For all contributing modes, the relevant electric field was confined to the first quarter of the structure. Thus, for the SBLC project with its relatively similar constant gradient structure, the following conclusion seemed irresistible: (i) Only a few interacting dipole modes were expected. (ii) For these modes, it was expected that their field and thus their energy would also be confined to the first cells of the structure. (iii) As a consequence of (ii), only one damper in one of the first cells would be sufficient for an adequate suppression of short range wake fields. The numerical simulations described in subsection 5.4 (resp. in [83]) were used to study issues (i) and (ii) (resp. (iii)). These studies made it obvious that the situation is fundamentally different for the SBLC structure because of the different degree of geometrical variation compared to the SLAC structure. Now, we introduce wake fields and related quantities before the above-mentioned studies are described in subsection 5.4. Some remarks concerning beam instabilities will also be made. There are a number of assumptions made for the exposition to follow: (i) The particles are assumed to be electrons. (ii) The particles move at the speed of light c, i.e., they are ultra-relativistic. (iii) The vacuum in the accelerator structure is assumed to be perfect. (iv) The walls of all components are assumed to be perfectly conducting. (v) The particle energy is assumed to be so high that Coulomb forces between the particles can be neglected. The following description is partly based on [309]. More detailed descriptions can also be found in [20] and [53].
Fundamental Principles of the Formation of Wake Fields. First, consider a simple point charge q moving in free space with velocity v = βc, β ≈ 1, i.e., close to the speed of light. It is convenient to choose a cylindrical coordinate system for the description. It is known from classical electrodynamics [142] that a highly relativistic point charge carries a field whose electric and magnetic field lines are almost totally confined to the transverse plane because of the Lorentz contraction. The comparison is often made to a thin plate perpendicular to the direction of propagation. The opening angle of the field lines in the longitudinal plane approximately equals 1/γ, with the Lorentz factor γ = 1/√(1 − β²). In the ultra-relativistic limiting case v → c ⇔ γ → ∞, the thickness of this plate tends to zero.

... βᵢ with corresponding eigenvectors. Details of the further steps are described in [80]. Another variable transformation and the matching of the dispersion curves can be noted as further key words.
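The opening angle 1/γ quoted above can be evaluated for a few electron energies; the beam energies below are assumed purely for illustration (the electron rest energy 0.511 MeV is the only fixed constant):

```python
import math

M_E_C2_GEV = 0.511e-3  # electron rest energy in GeV

def lorentz_gamma(energy_gev: float) -> float:
    """Lorentz factor of an ultra-relativistic electron of given total energy."""
    return energy_gev / M_E_C2_GEV

for energy in (1.0, 50.0, 250.0):  # illustrative beam energies in GeV
    gamma = lorentz_gamma(energy)
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    angle = 1.0 / gamma  # opening angle of the field lines, in radians
    print(f"E = {energy:6.1f} GeV: gamma = {gamma:9.1f}, "
          f"1 - beta = {1.0 - beta:.2e}, opening angle = {angle:.2e} rad")
```

Already at 1 GeV the opening angle is of order 10⁻³ rad, which justifies the "thin plate" picture.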
Results. In the following, some results of the calculations performed by Dohlus with the double-banded coupled oscillator model are described and compared with ORTHO results. For the COM generation, the following parameters are needed for each cell: the frequencies of the 0-, π/2- and π-mode of the first dipole band, those of the 0- and π-mode of the second dipole band, and finally the "mode-to-voltage" coupling coefficients (compare [80]). For this purpose, the periodic solutions of ten of the "original" SBLC cups (with roundings) with the group velocities v_g/c₀ = 0.013, 0.016, ..., 0.042 were computed by MAFIA for the TM0 2π/3-mode. The values for the other cells were interpolated or extrapolated based on these data. The quasi-constant gradient structure with 30 landings [279] was analyzed with the COM model in order to compare ORTHO and COM directly. Two differences remained in this comparison; their effects have to be assessed as follows: (i) ORTHO used a rectangular cross-section, while COM computed the rounded SBLC geometry. In geometrical studies [160] with URMEL-T for an 18-cell structure - in one case with the rounded SBLC shape, in the other with sharp edges¹⁵ - the higher loss parameters lay between ≈ 2000 V/(pC m²) and ≈ 6300 V/(pC m²) for the rounded shape and between ≈ 2000 V/(pC m²) and ≈ 8100 V/(pC m²) for the angular shape, which corresponds to an increase of the maximal values by ≈ 29% for the change from "round" to "angular" (cf. [160], Fig. 5.11). This means that an over-estimation of the loss parameter of up to 30% has to be expected for the simulation and measurement of the angular shape as a model for the original rounded structure. The comparative studies clearly indicated the advantage of rounded irises. (ii) An infinitely long tube is assumed at the input and output ends of the 180-cell structure in the simulation with ORTHO,¹⁶ while COM uses a perfect magnetic boundary condition as the longitudinal termination.
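The interpolation step described above - filling in the cell parameters between the ten cells actually computed with MAFIA - can be sketched as follows; the cell positions and the choice of linear interpolation are illustrative assumptions, not taken from [80]:

```python
import numpy as np

# Ten sample cells of the 180-cell structure for which MAFIA results are
# available; positions and linear interpolation are assumed for illustration.
sample_cells = np.linspace(1, 180, 10)
vg_over_c0 = np.linspace(0.013, 0.042, 10)  # group velocities at the samples

# Interpolate to obtain a parameter value for every cell of the structure
# (np.interp holds the boundary values constant outside the sample range).
all_cells = np.arange(1, 181)
vg_all = np.interp(all_cells, sample_cells, vg_over_c0)
```

The same scheme would be applied per quantity (band-edge frequencies, coupling coefficients) to generate the full COM input.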
Since the interesting modes are trapped inside the structure, it is reasonable to assume that the effect of the boundary conditions can be neglected for these modes. In any case, it should be noted that the boundary conditions in ORTHO as well as in COM differ from the actual conditions, where the input cell is connected in the radial direction with a waveguide and the last cell (output cell) is coated with an absorbing material. Altogether, 140 strongly interacting modes were found by COM, viz. the modes with k′ > 10³ V/(pC m²). Figure 5.23 shows both curves obtained by

¹⁵ Here all roundings were replaced by edges inside.
¹⁶ In discretization methods, this is often referred to as the open boundary condition or waveguide boundary condition. In these methods, a series expansion is done at the waveguide boundary.
Figure 5.33. Normalized longitudinal electric field |E_z|/‖E_z‖_∞ at radius r = 6 mm off the axis for the dipole modes no. 19 and no. 20 of the 36-cell structure.
Figure 5.34. Normalized longitudinal electric field |E_z|/‖E_z‖_∞ at the distance r = 6 mm from the axis for the dipole modes no. 21 and no. 22 of the 36-cell structure (MAFIA simulation vs. measurement).
The typical behaviour of trapped dipole modes in a constant gradient structure also becomes obvious in Figs. 5.33 and 5.34: with increasing frequency, more or less the same field pattern travels from the input end with the larger iris radii to the output end with the smallest iris radius. The modes are well apart, which guarantees a good mode separation. In Table 5.7, the computed resonant frequencies and normalized loss parameters of these modes are given as obtained by MAFIA. Their normalized loss parameters differ by at most 2.5 %. The normalized loss parameters of modes no. 19-21 even agree to ≈ 0.2 % (see also Fig. 5.27). Dipole mode no. 12 with the resonant frequency f₁₂ = 4.145 GHz lies below the so-called beam pipe mode no. 13 with the resonant frequency f₁₃ = 4.150295 GHz. This mode was found in simulation and measurement and is displayed in Fig. 5.35: it is the only mode having its significant electric field close to the beam pipe. Therefore, no interaction with the beam is possible, as is reflected in the small loss parameter of 866 V/(pC m²).

  Mode | Resonant frequency |  Resonant frequency | Normalized loss parameter
       | (MAFIA)            |  (measurement)      | (MAFIA)
  -----+--------------------+---------------------+--------------------------
   12  | 4.14504 GHz        |  -                  | 4204 V/(pC m²)
   15  | 4.17099 GHz        |  4.17477 GHz        | 4180 V/(pC m²)
   16  | 4.18557 GHz        |  4.18931 GHz        | 4207 V/(pC m²)
   19  | 4.23120 GHz        |  4.23057 GHz        | 4108 V/(pC m²)
   20  | 4.24409 GHz        |  4.24390 GHz        | 4102 V/(pC m²)
   21  | 4.25630 GHz        |  4.25631 GHz        | 4100 V/(pC m²)
   22  | 4.26796 GHz        |  4.26827 GHz        | 4000 V/(pC m²)

Table 5.7. Resonant frequencies and normalized loss parameters of some trapped dipole modes of the 36-cell structure as computed by MAFIA.
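The two spreads quoted in the text can be reproduced directly from the MAFIA loss parameters in the table above:

```python
# Normalized loss parameters (MAFIA) in V/(pC m^2), from the table above.
k_loss = {12: 4204, 15: 4180, 16: 4207, 19: 4108, 20: 4102, 21: 4100, 22: 4000}

vals = list(k_loss.values())
mean = sum(vals) / len(vals)
half_spread = (max(vals) - min(vals)) / 2.0 / mean
print(f"all listed modes: +/- {100 * half_spread:.1f} %")   # about 2.5 %

trapped = [k_loss[m] for m in (19, 20, 21)]
rel = (max(trapped) - min(trapped)) / (sum(trapped) / 3.0)
print(f"modes 19-21: {100 * rel:.2f} %")                    # about 0.2 %
```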
Some measurements and simulations were also carried out for the third dipole band. It has to be noted that the measurements are significantly more difficult at these higher frequencies than for the first dipole band. Also, the simulations with MAFIA, using the same parameters for the eigenvalue solver, gave much less accurate solutions than the simulations for the first dipole band. This is due to the fact that either twice as many modes have to be calculated or only a certain frequency window has to be computed. In the first case, 146 eigenvalues and eigenvectors have to be determined, with 36 eigenvalues in each cluster. This creates difficulties in the orthogonalization of the eigenvectors in the eigenvalue solver SAP [271]. In the second case, a special method, also implemented in SAP, has to be used to find single eigenvalues. Therefore, further studies are necessary in order to get good results. This is the reason why no comparison of measurements with the results of simulation is given here for the third dipole band. Nevertheless, a basic agreement between measured and simulated field distributions could already be observed.
Figure 5.35. Normalized longitudinal electric field |E_z|/‖E_z‖_∞ for dipole mode no. 13, the so-called beam pipe mode (MAFIA simulation vs. measurement).
5.5.6 Measurement with Local Damping

For the SBLC test facility, two HOM damper cells are proposed for each accelerating structure. The HOM damper cells have been designed by numerical simulation, and some have been measured in short constant impedance structures [78] (see also subsection 5.5.8). For the study of the behaviour of such a damper cell, first measurements were carried out with a damped cell in the 36-cell structure. Since the electrical properties of the clamped structure would change strongly if it were demounted and mounted again, which would be necessary for a direct comparison, a design was found [179] that allowed the damped and undamped cases to be compared without demounting. For this purpose, a sheet of graphite was put on a piece of paper which was then inserted in the 18th cell (compare Fig. 5.36). The thickness of the graphite sheet was 1.5 µm and the conductivity of the graphite was 1.25·10⁵ 1/(Ω m). It has to be stressed that this is only a rough model for test purposes, representing a cell with wall slots and attached waveguides (compare [78] and subsection 5.5.8). This procedure avoids the influence on the quality factor Q of modified electrical contacts, which would result from remounting the structure. The measurement setup remained unchanged. In this measurement, the focus was on the longitudinal electric field, since it indicates most clearly the effects of the damper cell. Therefore, a dielectric needle made of Al₂O₃ (ε_r = 9.2) was used. This bead was 6 mm long and had a diameter of 1 mm. It was also calibrated in a TM₀₁₀ pill-box. Its longitudinal form factor amounted to α_z = 9.34·10⁻²⁰ A s m²/V. It was positioned 7 mm away from the axis in this measurement. Some modes from the first pass band with significant field strength at the damper position, i.e., in cell no. 18, were
Figure 5.36. 36-cell structure with damping ring in cell 18. The drawing is courtesy of Müller and Hülsmann, Univ. Frankfurt.
selected for this measurement, since a relevant damping effect can only be expected for such modes. The cell-to-cell phase advance of the dipole modes varies between 0 and π in an aperiodic structure. The following modes were selected in order to study the influence of the damper position by measurement: for mode A, the π-mode-like end of the field pattern is located in the damper cell, for mode B it is the π/2 part, and for mode C it is the 0-mode-like end. Two different positions of the damping material were studied: first, the paper with the graphite sheet was pressed onto the inner wall of cell no. 18 in order to get a symmetric damper and to reach only a relatively low damping effect. The damping effect was increased in another measurement by moving the paper several millimetres inward, thus placing it in a location with higher field strength. Figure 5.37 shows the measured field distribution for all three cases: (a) no damper, (b) weak damping, and (c) strong damping. Unexpectedly, even for mode C, which has only the 0-mode-like field pattern in the damper cell, a good damping over the complete field pattern was found. Table 5.8 gives some data of the test results. The entries in Table 5.8 for the damping effect (damping factor) equal ratios of the measured quantities, which are proportional to the square of the electric field strength. Figure 5.38 illustrates that the damping effect is evenly distributed over all phases in the modes, both for weak and for strong damping. The computer simulation of the 36-cell structure with a very thin (about 1.5 µm) damping sheet in cell no. 18 is scarcely possible with discretization methods, since a sufficiently accurate discretization would imply enormous storage demands. Therefore, a perturbation approach using data of a field calculation with MAFIA, assuming only loss-free material, was used. This approach and the results obtained are described in [308].
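A minimal sketch of such a loss-perturbation estimate may make the idea concrete (this is not the actual method of [308]; the sheet data are taken from the text, while sheet area, tangential field, stored energy, and frequency are assumed values): the lossless field computed by MAFIA is evaluated on the sheet, the dissipated power follows from the sheet's conductivity and thickness, and the quality factor from Q = ωW/P.

```python
import math

# Graphite sheet data from the text; field quantities are assumed values.
sigma = 1.25e5   # S/m, conductivity of the graphite sheet
t = 1.5e-6       # m, thickness of the sheet
area = 1.0e-4    # m^2, sheet area (assumed)
E_tan = 1.0e3    # V/m, tangential electric field on the sheet (assumed)
W = 1.0e-6       # J, stored energy of the mode (assumed)
f = 4.2e9        # Hz, mode frequency (assumed, first dipole band)

# Time-averaged power dissipated in the thin conducting sheet.
P = 0.5 * sigma * E_tan**2 * t * area
# Quality factor due to the sheet alone.
Q = 2.0 * math.pi * f * W / P
print(f"P = {P:.3f} W, Q = {Q:.0f}")
```

The point of the perturbation approach is that only the lossless field on the sheet is needed, so no refined discretization of the micrometre-thin layer is required.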
Earlier measurements of a 12-cell constant impedance structure [78] showed a strong influence of the damping on the field pattern of the modes (the mode geometry, for short), so the same was in fact expected for the aperiodic 36-cell structure. However, neither the weak nor the strong damping significantly changes the mode geometry, as Figs. 5.37 and 5.38 make obvious. Moreover, the damping effect is evenly distributed over all cell-to-cell phase advances,
Figure 5.37. Measured longitudinal field distribution along the structure (position in m, cell no. 18 marked) for the three cases: without damper, damper on the surface, and damper moved into the cell.