EMS Tracts in Mathematics 14
EMS Tracts in Mathematics Editorial Board: Carlos E. Kenig (The University of Chicago, USA) Andrew Ranicki (The University of Edinburgh, Great Britain) Michael Röckner (Universität Bielefeld, Germany, and Purdue University, USA) Vladimir Turaev (Indiana University, Bloomington, USA) Alexander Varchenko (The University of North Carolina at Chapel Hill, USA) This series includes advanced texts and monographs covering all fields in pure and applied mathematics. Tracts will give a reliable introduction and reference to special fields of current research. The books in the series will in most cases be authored monographs, although edited volumes may be published if appropriate. They are addressed to graduate students seeking access to research topics as well as to the experts in the field working at the frontier of research. 1 2 3 4 5 6 7 8 9 10 11 12 13
Panagiota Daskalopoulos and Carlos E. Kenig, Degenerate Diffusions Karl H. Hofmann and Sidney A. Morris, The Lie Theory of Connected Pro-Lie Groups Ralf Meyer, Local and Analytic Cyclic Homology Gohar Harutyunyan and B.-Wolfgang Schulze, Elliptic Mixed, Transmission and Singular Crack Problems Gennadiy Feldman, Functional Equations and Characterization Problems on Locally Compact Abelian Groups , Erich Novak and Henryk Wozniakowski, Tractability of Multivariate Problems. Volume I: Linear Information Hans Triebel, Function Spaces and Wavelets on Domains Sergio Albeverio et al., The Statistical Mechanics of Quantum Lattice Systems Gebhard Böckle and Richard Pink, Cohomological Theory of Crystals over Function Fields Vladimir Turaev, Homotopy Quantum Field Theory Hans Triebel, Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration , Erich Novak and Henryk Wozniakowski, Tractability of Multivariate Problems. Volume II: Standard Information for Functionals Laurent Bessières et al., Geometrisation of 3-Manifolds
Steffen Börm
Efficient Numerical Methods for Non-local Operators 2-Matrix Compression, Algorithms and Analysis
Author: Prof. Dr. Steffen Börm Institut für Informatik Christian-Albrechts-Universität zu Kiel 24118 Kiel Germany E-mail:
[email protected] 2010 Mathematical Subject Classification: 65-02; 65F05, 65F30, 65N22, 65N38, 65R20 Key words: Hierarchical matrix, data-sparse approximation, boundary element method, preconditioner
ISBN 978-3-03719-091-3 The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography, and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch. This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained. © 2010 European Mathematical Society Contact address: European Mathematical Society Publishing House Seminar for Applied Mathematics ETH-Zentrum FLI C4 CH-8092 Zürich Switzerland Phone: +41 (0)44 632 34 36 Email: info @ems-ph.org Homepage: www.ems-ph.org Typeset using the author’s TEX files: I. Zimmermann, Freiburg Printing and binding: Druckhaus Thomas Müntzer GmbH, Bad Langensalza, Germany ∞ Printed on acid free paper 987654321
Foreword
Non-local operators appear naturally in the field of scientific computing: non-local forces govern the movement of objects in gravitational or electromagnetic fields, nonlocal density functions describe jump processes used, e.g., to investigate stock prices, and non-local kernel functions play an important role when studying population dynamics. Applying standard discretization schemes to non-local operators yields matrices that consist mostly of non-zero entries (“dense matrices”) and therefore cannot be treated efficiently by standard sparse matrix techniques. They can, however, be approximated by data-sparse representations that significantly reduce the storage requirements. Hierarchical matrices (H -matrices) [62] are one of these data-sparse representations: H -matrices not only approximate matrices arising in many important applications very well, they also offer a set of matrix arithmetic operations like evaluation, multiplication, factorization and inversion that can be used to construct efficient preconditioners or solve matrix equations. H 2 -matrices [70], [64] introduce an additional hierarchical structure to reduce the storage requirements and computational complexity of H -matrices. In this book, I focus on presenting an overview of theoretical results and practical algorithms for working with H 2 -matrices. I assume that the reader is familiar with basic techniques of numerical linear algebra, e.g., norm estimates, orthogonal transformations and factorizations. The error analysis of integral operators, particularly Section 4.7, requires some results from polynomial approximation theory, while the error analysis of differential operators, particularly Section 9.2, is aimed at readers familiar with standard finite element techniques and makes use of a number of fundamental properties of Sobolev spaces. Different audiences will probably read the book in different ways. I would like to offer the following suggestions: Chapters 1–3 provide the basic concepts and definitions used in this book and any reader should at least be familiar with the terms H -matrix, H 2 -matrix, cluster tree, block cluster tree, admissible and inadmissible blocks and cluster bases. After this introduction, different courses are possible: • If you are a student of numerical mathematics, you should read Sections 4.1–4.4 on integral operators, Sections 5.1–5.5 on orthogonalization and truncation, Sections 6.1–6.4 on matrix compression, and maybe Sections 7.1, 7.2, 7.6 and 7.7 to get acquainted with the concepts of matrix arithmetic operations. • If you are interested in using H 2 -matrices to treat integral equations, you should read Chapter 4 on basic approximation techniques and Chapters 5 and 6 in order to understand how the storage requirements can be reduced as far as possible. Remarks on practical applications can be found in Sections 10.1–10.4.
vi
Foreword
• If you are interested in applying H 2 -matrices to elliptic partial differential equations, you should consider Sections 5.1–5.5 on truncation, Chapter 6 on compression, Chapter 8 on adaptive matrix arithmetic operations and Sections 10.5 and 10.3. Convergence estimates can be found in Chapter 9. If you would like to try the algorithms presented in this book, you can get the HLib software package that I have used to provide the numerical experiments described in Chapters 4–10. Information on this package is available at http://www.hlib.org and it is provided free of charge for research purposes. This book would not exist without the help and support of Wolfgang Hackbusch, whom I wish to thank for many fruitful discussions and insights and for the chance to work at the Max Planck Institute for Mathematics in the Sciences in Leipzig. I am also indebted to my colleagues Stefan A. Sauter, Lars Grasedyck and J. Markus Melenk, who have helped me find answers to many questions arising during the course of this work. Last, but not least, I thank Maike Löhndorf, Kai Helms and Jelena Djoki´c for their help with proofreading the (already quite extensive) drafts of this book and Irene Zimmermann for preparing the final version for publication. Kiel, November 2010
Steffen Börm
Contents 1
2
3
Introduction 1.1 Origins of H 2 -matrix methods . . . . . . . . . . . . . 1.2 Which kinds of matrices can be compressed? . . . . . . 1.3 Which kinds of operations can be performed efficiently? 1.4 Which problems can be solved efficiently? . . . . . . . 1.5 Organization of the book . . . . . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
1 1 3 4 6 6
Model problem 2.1 One-dimensional integral operator 2.2 Low-rank approximation . . . . . 2.3 Error estimate . . . . . . . . . . . 2.4 Local approximation . . . . . . . . 2.5 Cluster tree and block cluster tree . 2.6 Hierarchical matrix . . . . . . . . 2.7 Matrix approximation error . . . . 2.8 H 2 -matrix . . . . . . . . . . . . . 2.9 Numerical experiment . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
9 9 10 11 15 16 20 22 23 26
Hierarchical matrices 3.1 Cluster tree . . . . . . . . . . . . . . . . . . . . . . 3.2 Block cluster tree . . . . . . . . . . . . . . . . . . 3.3 Construction of cluster trees and block cluster trees 3.4 Hierarchical matrices . . . . . . . . . . . . . . . . 3.5 Cluster bases . . . . . . . . . . . . . . . . . . . . . 3.6 H 2 -matrices . . . . . . . . . . . . . . . . . . . . . 3.7 Matrix-vector multiplication . . . . . . . . . . . . . 3.8 Complexity estimates for bounded rank distributions 3.9 Technical lemmas . . . . . . . . . . . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
28 29 34 37 47 53 56 59 63 70
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
74 76 76 79 87 103 114 125 149 155
4 Application to integral operators 4.1 Integral operators . . . . . . . . . 4.2 Low-rank approximation . . . . . 4.3 Approximation by Taylor expansion 4.4 Approximation by interpolation . . 4.5 Approximation of derivatives . . . 4.6 Matrix approximation . . . . . . . 4.7 Variable-order approximation . . . 4.8 Technical lemmas . . . . . . . . . 4.9 Numerical experiments . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
viii 5
6
Contents
Orthogonal cluster bases and matrix projections 5.1 Orthogonal cluster bases . . . . . . . . . . . . . . . . . . 5.2 Projections into H 2 -matrix spaces . . . . . . . . . . . . 5.3 Cluster operators . . . . . . . . . . . . . . . . . . . . . . 5.4 Orthogonalization . . . . . . . . . . . . . . . . . . . . . 5.5 Truncation . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 Computation of the Frobenius norm of the projection error 5.7 Numerical experiments . . . . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
163 164 166 175 180 185 200 202
Compression 6.1 Semi-uniform matrices . . . . . . . . . . . . . . . . . 6.2 Total cluster bases . . . . . . . . . . . . . . . . . . . 6.3 Approximation by semi-uniform matrices . . . . . . . 6.4 General compression algorithm . . . . . . . . . . . . 6.5 Compression of hierarchical matrices . . . . . . . . . 6.6 Recompression of H 2 -matrices . . . . . . . . . . . . 6.7 Unification and hierarchical compression . . . . . . . 6.8 Refined error control and variable-rank approximation 6.9 Numerical experiments . . . . . . . . . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
211 212 218 222 227 234 239 248 259 271
7 A priori matrix arithmetic 7.1 Matrix forward transformation . . . . 7.2 Matrix backward transformation . . . . 7.3 Matrix addition . . . . . . . . . . . . 7.4 Projected matrix-matrix addition . . . 7.5 Exact matrix-matrix addition . . . . . 7.6 Matrix multiplication . . . . . . . . . 7.7 Projected matrix-matrix multiplication 7.8 Exact matrix-matrix multiplication . . 7.9 Numerical experiments . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
280 281 287 292 293 297 301 302 319 328
8 A posteriori matrix arithmetic 8.1 Semi-uniform matrices . . . . . . . . 8.2 Intermediate representation . . . . . 8.3 Coarsening . . . . . . . . . . . . . . 8.4 Construction of adaptive cluster bases 8.5 Numerical experiments . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
332 334 336 347 359 360
9 Application to elliptic partial differential operators 9.1 Model problem . . . . . . . . . . . . . . . . . . 9.2 Approximation of the solution operator . . . . . 9.3 Approximation of matrix blocks . . . . . . . . . 9.4 Compression of the discrete solution operator . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
363 364 366 376 384
. . . . .
ix
Contents
10 Applications 10.1 Indirect boundary integral equation . . . . . . . . . . . . 10.2 Direct boundary integral equation . . . . . . . . . . . . 10.3 Preconditioners for integral equations . . . . . . . . . . 10.4 Application to realistic geometries . . . . . . . . . . . . 10.5 Solution operators of elliptic partial differential equations
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
387 388 395 401 411 413
Bibliography
421
Algorithm index
427
Subject index
429
Chapter 1
Introduction
The goal of this book is to describe a method for handling certain large dense matrices efficiently. The fundamental idea of the H 2 -matrix approach is to reduce the storage requirements by using an alternative multilevel representation of a dense matrix instead of the standard representation by a two-dimensional array.
1.1 Origins of H 2 -matrix methods The need for efficient algorithms for handling dense matrices arises from several fields of applied mathematics: in the simulation of many-particle systems governed by the laws of gravitation or electrostatics, a fast method for computing the forces acting on the individual particles is required, and these forces can be expressed by large dense matrices. Certain homogeneous partial differential equations can be reformulated as boundary integral equations, and compared to the standard approach, these formulations have the advantage that they reduce the spatial dimension, improve the convergence and can even simplify the handling of complicated geometries. The discretization of the boundary integral equations leads again to large dense matrices. A number of models used in the fields of population dynamics or machine learning also lead to integral equations that, after discretization, yield large dense matrices. Several approaches for handling these kinds of problems have been investigated: for special integral operators and special geometries, the corresponding dense matrices are of Toeplitz or circulant form, and the fast Fourier transform [37] can be used to compute the matrix-vector multiplication in O.n log n/ operations, where n is the matrix dimension. The restriction to special geometries limits the range of applications that can be treated by this approach. The panel clustering method [71], [72], [91], [45] follows a different approach to handle arbitrary geometries: the matrix is not represented exactly, but approximated by a data-sparse matrix, i.e., by a matrix that is still dense, but can be represented in a compact form. This approximation is derived by splitting the domain of integration into a partition of subdomains and replacing the kernel function by local separable approximations. The resulting algorithms have a complexity of O.nm˛ logˇ n/ for problem-dependent small exponents ˛; ˇ > 0 and a parameter m controlling the accuracy of the approximation. The well-known multipole method [58], [60] is closely related and takes advantage of the special properties of certain kernel functions to improve the efficiency. It has
2
1 Introduction
originally been introduced for the simulation of many-particle systems, but can also be applied to integral equations [88], [86], [85], [57]. “Multipole methods without multipoles” [2], [82], [108] replace the original multipole approximation by more general or computationally more efficient expansions while keeping the basic structure of the corresponding algorithms. Of particular interest is a fully adaptive approach [46] that constructs approximations based on singular value decompositions of polynomial interpolants and thus can automatically find efficient approximations for relatively general kernel functions. It should be noted that the concept of separable approximations used in both the panel clustering and the multipole method is already present in the Ewald summation technique [44] introduced far earlier to evaluate Newton potentials in crystallographical research efficiently. Wavelet techniques use a hierarchy of nested subspaces combined with a Galerkin method in order to approximate integral operators [94], [9], [41], [39], [73], [105], [102]. This approach reaches very good compression rates, but the construction of suitable subspaces on complicated geometries is significantly more complicated than for the techniques mentioned before. Hierarchical matrices [62], [68], [67], [49], [52], [63] and the closely related mosaic skeleton matrices [103] are the algebraic counterparts of panel-clustering and multipole methods: a partition of the matrix takes the place of the partition of the domains of integration, and low-rank submatrices take the place of local separable expansions. Due to their algebraic structure, hierarchical matrices can be applied not only to integral equations and particle systems, but also to more general problems, e.g., partial differential equations [6], [56], [55], [54], [76], [77], [78] or matrix equations from control theory [53], [51]. Efficient approximations of densely populated matrices related to integral equations can be constructed by interpolation [16] or more efficient cross approximation schemes [5], [4], [7], [17]. H 2 -matrices [70], [64] combine the advantages of hierarchical matrices, i.e., their flexibility and wide range of applications, with those of wavelet and fast multipole techniques, i.e., the high compression rates achieved by using a multilevel basis. The construction of this cluster basis for different applications is one of the key challenges in the area of H 2 -matrices: it has to be efficient, i.e., it has to consist of a small number of vectors, but it also has to be accurate, i.e., it has to be able to approximate the original matrix up to a given tolerance. In some situations, an H 2 -matrix approximation can reach the optimal order O.n/ of complexity while keeping the approximation error consistent with the requirements of the underlying discretization scheme [91], [23]. Obviously, we cannot hope to be able to approximate all dense matrices in this way: if a matrix contains only independent random values, the standard representation is already optimal and no compression scheme will be able to reduce the storage requirements. Therefore we have first to address the question “Which kinds of matrices can be compressed by H 2 -matrix methods?” It is not sufficient to know that a matrix can be compressed, we also have to be able to find the compressed representation and to use it in applications, e.g., to perform
1.2 Which kinds of matrices can be compressed?
3
matrix-vector multiplications or solve systems of linear equations. Of course, we do not want to convert the H 2 -matrices back to the less efficient standard format, therefore we have to consider the question “Which kinds of operations can be performed efficiently with compressed matrices?” Once these two theoretical questions have been answered, we can consider practical applications of the H 2 -matrix technique, i.e., try to answer the question “Which problems can be solved efficiently by H 2 -matrices?”
1.2 Which kinds of matrices can be compressed? There are two answers to this question: in the introductory Chapter 2, a very simple one-dimensional integral equation is discussed, and it is demonstrated that its discrete counterpart can be handled by H 2 -matrices: if we replace the kernel function by a separable approximation, the resulting matrix will be an H 2 -matrix and can be treated efficiently. Chapter 4 generalizes this result to the more general setting of integral operators with asymptotically smooth kernel functions. In Chapter 6, on the other hand, a relatively general characterization of H 2 -matrices is introduced. Using this characterization, we can determine whether arbitrary matrices can be approximated by H 2 -matrices. In this framework, the approximation of integral operators can be treated as a special case, but it is also possible to investigate more general applications, e.g., the approximation of solution operators of ordinary [59], [96] and elliptic partial differential equations by H 2 -matrices [6], [15]. The latter very important case is treated in Chapter 9.
Separable approximations Constructing an H 2 -matrix based on separable approximations has the advantage that the problem is split into two relatively independent parts: the first task is to approximate the kernel function in suitable subdomains by separable kernel functions. This task can be handled by Taylor expansions [72], [100] or interpolation [45], [65], [23] if the kernel function is locally analytic. Both of these approaches are discussed in Chapter 4. For special kernel functions, special approximations like the multipole expansion [58], [60] or its counterparts for the Helmholtz kernel [1], [3] can be used. The special techniques required by these methods are not covered here. Once a good separable approximation of the kernel function has been found, we face the second task: the construction of an H 2 -matrix. This is accomplished by splitting the integral operator into a sum of local operators on suitably defined subsets and then replacing the original kernel function by its separable approximations. Discretizing the resulting perturbed integral operator by a standard scheme (e.g., Galerkin methods, collocation or Nystrøm techniques) then yields an H 2 -matrix approximation of the original matrix.
4
1 Introduction
The challenge in this task is to ensure that the number of local operators is as small as possible: using one local operator for each matrix entry will not lead to a good compression ratio, therefore we are looking for methods that ensure that only a small number of local operators are required. The standard approach in this context is to use cluster trees, i.e., to split the domains defining the integral operator into a hierarchy of subdomains and use an efficient recursive scheme to find an almost optimal decomposition of the original integral operator into local operators which can be approximated. The efficiency of this technique depends on the properties of the discretization scheme. If the supports of the basis functions are local, i.e., if a neighborhood of the support of a basis function intersects only a small number of supports of other basis functions, it can be proven that the cluster trees will lead to efficient approximations of the matrix [52]. For complicated anisotropic meshes or higher-order basis functions, the situation becomes more complicated and special techniques have to be employed.
General characterization Basing the construction of an H 2 -matrix on the general theory presented in Chapter 6 has the advantage that it allows us to treat arbitrary dense matrices. Whether a matrix can be approximated by an H 2 -matrix or not can be decided by investigating the effective ranks of two families of submatrices, the total cluster bases. If all of these submatrices can be approximated using low ranks, the matrix itself can be approximated by an H 2 -matrix. Since this characterization relies only on low-rank approximations, but requires no additional properties, it can be applied in relatively general situations, e.g., to prove that solution operators of strongly elliptic partial differential operators with L1 coefficients can be approximated by H 2 -matrices. Chapter 9 gives the details of this result.
1.3 Which kinds of operations can be performed efficiently? In this book, we consider three types of operations: first the construction of an approximation of the system matrix, then arithmetic operations like matrix-vector and matrix-matrix multiplications, and finally more complicated operations like matrix factorizations or matrix inversion, which can be constructed based on the elementary arithmetic operations.
Construction An H 2 -matrix can be constructed in several ways: if it is the approximation of an explicitly given integral operator, we can proceed as described above and compute the
1.3 Which kinds of operations can be performed efficiently?
5
H 2 -matrix by discretizing a number of local separable approximations. For integral operators with locally smooth kernel functions, the implementation of this method is relatively straightforward and it performs well. This approach is described in Chapter 4. If we want to approximate a given matrix, we can use the compression algorithms introduced in Chapter 6. These algorithms have the advantage that they construct quasi-optimal approximations, i.e., they will find an approximation that is almost as good as the best possible H 2 -matrix approximation. This property is very useful, since it allows us to use H 2 -matrices as a “black box” method. It is even possible to combine both techniques: if we want to handle an integral operator, we can construct an initial approximation by using the general and simple interpolation scheme, and then improve this approximation by applying the appropriate compression algorithm. The experimental results in Chapter 6 indicate that this technique can reduce the storage requirements by large factors.
Arithmetic operations If we want to solve a system of linear equations with a system matrix in H 2 -representation, we at least have to be able to evaluate the product of the matrix with a vector. This and related operations, like the product with the transposed matrix or forward and backward substitution steps for solving triangular systems, can be accomplished in optimal complexity for H 2 -matrices: not more than two operations are required per unit of storage. Using Krylov subspace methods, it is even possible to construct solvers based exclusively on matrix-vector multiplications and a number of simple vector operations. This is the reason why most of today’s schemes for solving dense systems of equations (e.g., based on panel clustering [72], [91] or multipole expansions [58], [60]) provide only efficient algorithms for matrix-vector multiplications, but not for more complicated operations. Hierarchical matrices and H 2 -matrices, on the other hand, are purely algebraic objects, and since we have efficient compression algorithms at our disposal, we are able to approximate the results of complex operations like the matrix-matrix multiplication. In Chapters 7 and 8, two techniques for performing this fundamental computation are presented. The first one reaches the optimal order of complexity, but requires a priori knowledge of the structure of the result. The second one is slightly less efficient, but has the advantage that it is fully adaptive, i.e., that it is possible to guarantee a prescribed accuracy of the result.
Inversion and preconditioners Using the matrix-matrix multiplication algorithms, we can perform more complicated arithmetic operations like the inversion or the LU factorization. The derivation of the
6
1 Introduction
corresponding algorithms is straightforward: if we express the result in terms of block matrices, we see that it can be computed by a sequence of matrix-matrix multiplications. We replace each of these products by its H 2 -matrix approximation and combine all of the H 2 -submatrices to get an H 2 -matrix approximation of the result (cf. Section 6.7 and Chapter 10). If we perform all operations with high accuracy, the resulting inverse or factorization can be used as a direct solver for the original system, although it may require a large amount of storage. If we use only a low accuracy, we can still expect to get a good preconditioner which can be used in an efficient iterative or semi-iterative scheme, e.g., a conjugate gradient or GMRES method.
1.4 Which problems can be solved efficiently? In this book, we focus on dense matrices arising from the discretization of integral equations, especially those connected to solving homogeneous elliptic partial differential equations with the boundary integral method. For numerical experiments, these matrices offer the advantage that they are discretizations of a continuous problem, therefore we have a scale of discretizations of differing resolution at our disposal and can investigate the behavior of the methods for very large matrices and high condition numbers. The underlying continuous problem is relatively simple, so we can easily construct test cases and verify the correctness of an implementation. We also consider the construction of approximate inverses for the stiffness matrices arising from finite element discretizations of elliptic partial differential operators. In the paper [6], it has been proven that these inverses can be approximated by hierarchical matrices [62], [52], [63], but the proof is based on a global approximation argument that does not carry over directly to the case of H 2 -matrices. Chapter 9 uses the localized approach presented in [15] to construct low-rank approximations of the total cluster bases, and applying the general results of Chapter 6 and [13] yields the existence of H 2 -matrix approximations. H 2 -matrices have also been successfully applied to problems from the field of electromagnetism [24], heat radiation, and machine learning.
1.5 Organization of the book In the following, I try to give an overview of the current state of the field of H 2 -matrices. The presentation is organized in nine chapters covering basic definitions, algorithms with corresponding complexity analysis, approximation schemes with corresponding error analysis, and a number of numerical experiments.
1.5 Organization of the book
7
Chapter 2: Model problem This chapter introduces the basic concepts of H 2 matrices for a one-dimensional model problem. In this simple setting, the construction of an H 2 -matrix and the analysis of its complexity and approximation properties is fairly straightforward. Chapter 3: Hierarchical matrices This chapter considers the generalization of the definition of H 2 -matrices to the multi-dimensional setting. H 2 -matrices are defined based on a block cluster tree describing a partition of the matrix into a hierarchy of submatrices and cluster bases describing the form of these submatrices. If a number of relatively general conditions for the block cluster tree and the cluster bases are fulfilled, it is possible to derive optimal-order estimates for the storage requirements and the time needed to compute the matrix-vector multiplication Chapter 4: Integral operators A typical application of H 2 -matrices is the approximation of matrices resulting from the finite element (or boundary element) discretization of integral operators. This chapter describes simple approximation schemes based on Taylor expansion and constant-order interpolation, but also more advanced approaches based on variable-order interpolation. The error of the resulting H 2 -matrices is estimated by using error bounds for the local approximants of the kernel function. Chapter 5: Orthogonal cluster bases This chapter describes techniques for finding the optimal H 2 -matrix approximation of a given arbitrary matrix under the assumption that a suitable block cluster tree and good cluster bases are already known. If the cluster bases are orthogonal, the construction of the optimal approximation is straightforward, therefore this chapter contains two algorithms for converting arbitrary cluster bases into orthogonal cluster bases: the first algorithm yields an orthogonal cluster basis that is equivalent to the original one, the second algorithm constructs an approximation of lower complexity. Chapter 6: Compression This chapter introduces the total cluster bases that allow us to give an alternative characterization of H 2 -matrices and to develop algorithms for approximating arbitrary matrices. The analysis of these algorithms relies on the results of Chapter 5 in order to establish quasi-optimal error estimates. Chapter 7: A priori matrix arithmetic Once a matrix has been approximated by an H 2 -matrix, the question of solving corresponding systems of linear equations has to be answered. For dense matrices, the usual solution strategies require factorizations of the matrix or sometimes even its inverse. Since applying these techniques directly to H 2 -matrices would be very inefficient, this chapter introduces an alternative: factorization and inversion can be performed using matrix-matrix products, therefore finding an efficient algorithm for approximating these products is an important step towards solving linear systems. By using the orthogonal projections introduced in Chapter 5 and preparing suitable quantities in advance, the best approximation of a matrix-matrix product in a given H 2 -matrix space can be computed very efficiently.
8
1 Introduction
Chapter 8: A posteriori matrix arithmetic The algorithms introduced in Chapter 7 compute the best approximation of the matrix-matrix product in a given matrix space, but if this space is not chosen correctly, the resulting error can be quite large. This chapter describes an alternative algorithm that constructs an H 2 -matrix approximation of the matrix-matrix product and chooses the cluster bases in such a way that a given precision can be guaranteed. Chapter 9: Elliptic partial differential equations Based on the a posteriori arithmetic algorithms of Chapter 8, it is possible to compute approximate inverses of H 2 matrices, but it is not clear whether these inverses can be represented efficiently by an H 2 -matrix. This chapter proves that the inverse of the stiffness matrix of an elliptic partial differential equation can indeed be approximated well in the compressed format, and due to the best-approximation property of the compression algorithm, this means that the computation can be carried out efficiently. Chapter 10: Applications The final chapter considers a number of practical applications of H 2 -matrices. Most of the applications are related to boundary integral formulations for Laplace’s equation, but there are also some examples related to more general elliptic partial differential equations. In some chapters, I have collected technical lemmas in a separate section in the hope of focusing the attention on the important results, not on often rather technical proofs of auxiliary statements.
Chapter 2
Model problem
In this chapter, we introduce the basic concepts of hierarchical matrices and H 2 matrices. Since the underlying ideas are closely related to panel-clustering techniques [72] for integral equations, we use a simple integral operator as a model problem.
2.1 One-dimensional integral operator Let us consider the integral operator Z
1
G Œu.x/ ´
log jx yju.y/ dy 0
for functions u 2 L2 Œ0; 1. For n 2 N, we discretize it by Galerkin’s method using the n-dimensional space spanned by the basis .'i /niD1 of piecewise constant functions given by ´ 1 if x 2 Œ.i 1/=n; i=n; 'i .x/ D 0 otherwise; for all i 2 f1; : : : ; ng and x 2 Œ0; 1. This leads to a matrix G 2 Rnn with entries Z 1 Z 1 Gij ´ 'i .x/ g.x; y/'j .y/ dy dx 0 0 (2.1) Z i=n Z j=n D g.x; y/ dy dx; .i1/=n
.j 1/=n
where the kernel function is given by ´ log jx yj g.x; y/ ´ 0
if x ¤ y; otherwise:
Due to supp g D Œ0; 12 , all entries Gij of the matrix are non-zero, therefore even a simple task like computing the matrix-vector product in the standard way requires at least n2 operations. Finite element techniques tend to require a large number of degrees of freedom in order to reach a suitable precision, and if n is large, a quadratic complexity means that the algorithm will take very long to complete. Therefore we have to look for more efficient techniques for handling the matrix G.
10
2 Model problem
2.2 Low-rank approximation Since typically all entries of G will be non-zero, the standard sparse matrix representations used in the context of partial differential equations will not by efficient. Therefore we have to settle for an approximation which is not sparse, but only data-sparse, i.e., which requires significantly less storage than the original matrix. In order to derive a data-sparse approximation of G, we rely on an approximation of the kernel function g. By defining f W R>0 ! R;
z 7! log z;
8 ˆ y; if x < y; otherwise;
for all x; y 2 R. Since g is symmetric, we only consider the case x > y in the following. We approximate f by its m-th order Taylor expansion (see, e.g., Chapter XIII, §6 in [75]) around a center z0 2 R>0 , which is given by fQz0 ;m .z/ ´
m1 X
f .˛/ .z0 /
˛D0
.z z0 /˛ : ˛Š
We pick x0 ; y0 2 R with z0 D x0 y0 and notice that the Taylor expansion of f gives rise to an approximation of g, gQ z0 ;m .x; y/ ´ fQz0 ;m .x y/ D
m1 X ˛D0
D
m1 X
f .˛/ .x0 y0 /
˛D0
D
m1 X ˛D0
D
m1 X ˛D0
D
.x y z0 /˛ ˛Š
.x x0 C .y0 y//˛ ˛Š
˛ f .˛/ .x0 y0 / X ˛ .x x0 /˛ .y0 y/ ˛Š D0
f
.˛/
(2.2)
˛ X .x x0 /˛ .y y0 / .x0 y0 / .1/ .˛ /Š Š D0
m1 X m1 X D0
f .˛/ .z0 /
D0
.1/ f .C/ .x0 y0 /
.x x0 / .y y0 / : Š Š
The main property of gQ z0 ;m , as far as hierarchical matrices are concerned, is that the variables x and y are separated in each term of the sum. Expansions with this property,
2.3 Error estimate
11
no matter whether they are based on polynomials or more general functions, are called degenerate. A data-sparse approximation of G can be obtained by replacing the original kernel function g by gQ z0 ;m in (2.1): Z
i=n
Z
zij ´ G .i1/=n
D
.j 1/=n
m1 X m1 X D0
j=n
gQ z0 ;m .x; y/ dy dx
Z .1/ f .C/ .x0 y0 /
D0
Z j=n .x x0 / .y y0 / dx dy: Š Š .i1/=n .j 1/=n i=n
We introduce K ´ f0; : : : ; m 1g and ´ f1; : : : ; ng and matrices V; W 2 RK and S 2 RKK with Z i=n Z j=n .x x0 / .y y0 / Vi ´ dx; Wj ´ dy; Š Š .i1/=n .j 1/=n 8 .C/ (2.3) ˆ .x0 y0 / if x0 > y0 and C < m; 0 holds. The refined proof is based on the Cauchy representation of the remainder: Lemma 2.1 (Cauchy error representation). Let f 2 C 1 Œa; b and z0 2 Œa; b. We have Z 1 .z z0 /m .1 t /m1 f .m/ .z0 C t .z z0 // dt f .z/ fQz0 ;m .z/ D .m 1/Š 0 for all z 2 Œa; b and all m 2 N. Proof. See, e.g., [75], Chapter XIII, §6. Applying this error representation to the logarithmic function f and bounding the resulting terms carefully yields the following improved error estimate: Lemma 2.2 (Error of the Taylor expansion). Let a; b 2 R with 0 < a < b. Let z0 ´ .a C b/=2 and ´ 2a=.b a/. For all z 2 Œa; b and all m 2 N, we have m1 1 1 : C1 jf .z/ fQz0 ;m .z/j log C1 Proof. Let z 2 Œa; b and m 2 N. Due to Lemma 2.1, the Taylor expansion satisfies Z f .z/ fQz0 ;m .z/ D
1
.1 t /m1 f .m/ .z0 C t .z z0 // 0
.z z0 /m dt: .m 1/Š
(2.5)
2.3 Error estimate
For our special kernel function, we have ´ log.z/ ./ f .z/ D . 1/Š .1/ z
13
if D 0; otherwise;
and the remainder takes the form Z 1 jz z0 jm Q jf .z/ fz0 ;m .z/j .1 t /m1 .m 1/Š jz0 C t .z z0 /jm dt .m 1/Š 0 Z 1 .1 t /m1 jz z0 jm D dt jz0 C t .z z0 /jm 0 m Z 1 jz z0 j m1 .1 t / dt: jz0 j t jz z0 j 0 Due to jz0 j D a C
ba ba ba ba D C D . C 1/ ; 2 2 2 2
jz z0 j
ba ; 2
we find jz z0 j .b a/=2 1 D jz0 j t jz z0 j . C 1/.b a/=2 t .b a/=2 C1t and observe
Z
jf .z/ fQz0 ;m .z/j
1 0
Z
.1 t /m1 dt D . C 1 t /m
1
0
1 Cs
s Cs
m1 ds
by substituting s D 1 t . Since elementary computations yield s 1 Cs C1 we can conclude
jf .z/ fQz0 ;m .z/j D D
for all s 2 Œ0; 1;
1 C1 1 C1 1 C1
m1 Z m1
0
1
1 ds Cs
.log.1 C / log / m1
1C log
;
which is the desired result. The speed of convergence depends on the quantity , the ratio between a and the radius of the interval Œa; b. In order to ensure uniform convergence of gQ z0 ;m , we have to assume a uniform lower bound of jzj D jx yj.
14
2 Model problem
Corollary 2.3. Let 2 R>0 . Let t; s R be non-trivial intervals satisfying diam.t / C diam.s/ 2 dist.t; s/:
(2.6)
Let x0 be the midpoint of t and let y0 be the midpoint of s, and let z0 ´ x0 y0 . Then the estimate m1 jg.x; y/ gQ z0 ;m .x; y/j log. C 1/ C1 holds for all x 2 t , y 2 s, and m 2 N. Proof. Let x 2 t , y 2 s, and m 2 N. Since g and gQ z0 ;m are symmetric, we can restrict our attention to the case x > y without loss of generality. For a t ´ inf t , b t ´ sup t, as ´ inf s, bs ´ sup s we have diam.t / D b t a t ; diam.s/ D bs as ; dist.t; s/ D a t bs ; bt C at bs C as b t C a t bs a s bCa x0 D ; y0 D ; z0 D D 2 2 2 2 with a ´ a t bs D dist.t; s/ and b ´ b t as . We apply Lemma 2.2 to z D x y 2 Œa; b and get
1 jg.x; y/ gQ z0 ;m .x; y/j log 1 C
1 1C
m1 :
Now the admissibility condition (2.6) implies D
2a 2 dist.t; s/ 2 dist.t; s/ 1 D D ba diam.t / C diam.s/ 2 dist.t; s/
and we can conclude jg.x; y/ gQ z0 ;m .x; y/j log. C 1/
1 1 C 1=
m1 D log. C 1/
C1
m1
for all x 2 t and y 2 s. This means that the Taylor expansion gQ z0 ;m will converge exponentially in m, and that the speed of the convergence depends on the ratio of the distance and the diameter of the intervals. The assumption that the intervals containing x and y have positive distance is crucial: if x and y could come arbitrarily close, the resulting singularity in g could no longer be approximated by the polynomial gQ z0 ;m .
2.4 Local approximation
15
2.4 Local approximation Corollary 2.3 implies that we cannot use a global Taylor expansion to store the entire matrix G in factorized form: at least the diagonal entries Gi i correspond to integrals with singular integrands which cannot be approximated efficiently by our approach. Instead, we only use local Taylor expansions for subblocks of the matrix G. Let Ot ; sO , and let t; s Œ0; 1 be intervals satisfying Œ.i 1/=n; i=n t
and
Œ.j 1/=n; j=n s
(2.7)
for all i 2 tO and j 2 sO . Let x t 2 t be the midpoint of t , and let ys 2 s be the midpoint of s. We assume that t and s satisfy the admissibility condition (2.6), so Corollary 2.3 implies that the Taylor expansion of g at the point z0 ´ x t ys will converge exponentially for all x 2 t and y 2 s. Similar to the global case, we introduce the local approximation z t;s ´ V t S t;s Ws G with S t;s 2 RKK and V t ; Ws 2 RK given by ´R i=n .xx t / dx if i 2 tO; .i1/=n Š .V t /i ´ 0 otherwise; ´R j=n .yys / dy if j 2 sO ; .Ws /j ´ .j 1/=n Š 0 otherwise; 8 .C/ ˆ .x t ys / if x t > ys and C < m; 0; `D1 .k C `/ for all n; k 2 N0 : ´ (4.8) D kŠ k 1 otherwise Definition 4.5 (Asymptotically smooth kernels). Let g 2 Rd Rd ! R. Let Cas 2 R>0 , c0 2 R1 and 2 N. The function g is called .Cas ; ; c0 /-asymptotically smooth (cf. [28], [27]) if j@p g.x; y/j Cas
c0 kpk2 1 kx ykC 2
(4.9)
holds for all 2 N0 , x; y 2 Rd with x ¤ y and all directions p 2 Rd Rd . For D 0, the function g is called .Cas ; 0; c0 /-asymptotically smooth if j@p g.x; y/j Cas . 1/Š
c0 kpk2 kx yk2
(4.10)
holds for all 2 N, x; y 2 Rd with x ¤ y and all directions p 2 Rd Rd . In this context is called the order of the singularity of g, D 0 corresponds to logarithmic singularities. Example 4.6. The most important examples of asymptotically smooth kernel functions are ´ 1 1 if x ¤ y; 3 3 g3 W R R ! R; .x; y/ 7! 4 kxyk2 0 otherwise;
84
4 Application to integral operators
the fundamental solution of Poisson’s equation in three-dimensional space, and ´ g2 W R2 R2 ! R;
.x; y/ 7!
1 log kx yk2 2 0
if x ¤ y; otherwise;
its two-dimensional counterpart. According to [63], Appendix E, both functions are asymptotically smooth: for a given c0 > 1, [63], Satz E.2.1, yields a constant Cas 2 R>0 such that g3 is .Cas ; 1; c0 /asymptotically smooth and g2 is .Cas ; 0; c0 /-asymptotically smooth. In order to be able to formulate the approximation error estimate in the familiar terms of diameters and distances, we require the following result (which is obvious if considered geometrically, cf. Figure 4.1): xt
ys rt
dist rs
Figure 4.1. Distance of the centers x t and ys of two circles expressed by the distance of the circles and their radii.
Lemma 4.7 (Distance of centers). Let x t 2 K t and ys 2 Ks be the centers of the balls K t and Ks , respectively. Let r t 2 R0 and rs 2 R0 be the radii of K t and Ks , respectively. If dist.K t ; Ks / > 0 holds, we have kx t ys k2 dist.K t ; Ks / C r t C rs : Proof. We define the continuous functions x W Œ0; r t ! Rd ; y W Œ0; rs ! Rd ; h W Œ0; r t Œ0; rs ! R;
x t ys ; kx t ys k2 x t ys ˇ 7! ys C ˇ ; kx t ys k2 .˛; ˇ/ 7! kx t ys k2 ˛ ˇ; ˛ 7! x t ˛
85
4.3 Approximation by Taylor expansion
and observe x.˛/ 2 K t ;
y.ˇ/ 2 Ks
for all ˛ 2 Œ0; r t ; ˇ 2 Œ0; rs ;
which implies x t ys 0 < dist.K t ; Ks / kx.˛/ y.ˇ/k2 D .kx t ys k2 ˛ ˇ/ kx y k t
s 2 2
D jkx t ys k2 ˛ ˇj D jh.˛; ˇ/j for all ˛ 2 Œ0; r t and ˇ 2 Œ0; rs . Since h is continuous with h.0; 0/ D kx t ys k2 dist.K t ; Ks / > 0; we conclude h.˛; ˇ/ > 0 for all ˛ 2 Œ0; r t and all ˇ 2 Œ0; rs , i.e., dist.K t ; Ks / jh.r t ; rs /j D h.r t ; rs / D kx t ys k2 r t rs ; and this is the desired estimate. Now we can proceed to prove an estimate for the approximation error resulting from Taylor approximation: Theorem 4.8 (Approximation error). Let 2 R>0 . Let K t and Ks be d -dimensional balls satisfying the admissibility condition diam.K t / C diam.Ks / 2 dist.K t ; Ks /:
(4.11)
Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth. Let x t 2 K t and ys 2 Ks be the centers of K t and Ks , respectively. Then 8 m1 ˆ 0, we find Z 1 Z 1 ˛ ˛ 1 1 1 dt D ds D C1 C1 .˛ C ˇ/ ˇ 0 ..1 t/˛ C ˇ/ 0 .s˛ C ˇ/ 1 1 1 . C 1/ 1 1 D ˇ .ˇ C ˇ/ . C 1/ ˇ 1 ˇ and can conclude
8 m1 ˆ qopt , we can find a constant Cta 2 R>0 such that kg gQ t;s kK t Ks
Cta qm dist.K t ; Ks /
holds for all b D .t; s/ 2 LC J ; m 2 N:
4.4 Approximation by interpolation Using Taylor expansions to construct separable approximations of the kernel function has many advantages: significant portions of the resulting transfer and coupling matrices contain only zero entries, i.e., we can save storage by using efficient data formats, the evaluation of the monomials corresponding to the cluster bases is straightforward, and the error analysis is fairly simple.
88
4 Application to integral operators
Unfortunately, the approach via Taylor expansions has also two major disadvantages: the construction of the coupling matrices requires the efficient evaluation of derivatives of the kernel function g, e.g., by recursion formulas that have to be derived by hand, and the error estimates are not robust with respect to the parameter c0 appearing in the Definition 4.5 of asymptotic smoothness: if c0 grows, we have to adapt in order to guarantee exponential convergence. Both properties limit the applicability of Taylor-based approximations in general situations. We can overcome the disadvantages by using an alternative approximation scheme: instead of constructing a polynomial approximation of the kernel function g by Taylor expansion, we use Lagrangian interpolation.
One-dimensional interpolation Let us first consider the one-dimensional case. For each interpolation order m 2 N, we fix interpolation points . m; /m D1 in the interval Œ1; 1. We require that the points corresponding to one m 2 N are pairwise different, i.e., that ¤ ) m; ¤ m;
holds for all m 2 N and ; 2 f1; : : : ; mg:
(4.14)
The one-dimensional interpolation operator of order m 2 N is given by Im W C Œ1; 1 ! Pm ;
f 7!
m X
f . m; /Lm; ;
D1
where the Lagrange polynomials Lm; 2 Pm are given by Lm; .x/ ´
m Y
x m;
m; D1 m;
for all x 2 R; m 2 N; 2 f1; : : : ; mg:
¤
Since Lm; . m; / D ı holds for all m 2 N and ; 2 f1; : : : ; mg, we have Im Œf . m; / D f . m; /
for all f 2 C Œ1; 1; m 2 N and 2 f1; : : : ; mg:
Combining (4.14) with this equation and the identity theorem for polynomials yields Im Œp D p
for all m 2 N and p 2 Pm ;
(4.15)
i.e., the interpolation Im is a projection with range Pm . In order to define an interpolation operator for general non-empty intervals Œa; b, we consider the linear mapping ˆŒa;b W Œ1; 1 ! Œa; b;
t 7!
bCa ba C t; 2 2
4.4 Approximation by interpolation
89
W C Œa; b ! Pm by and define the transformed interpolation operator IŒa;b
m 1 IŒa;b
m Œf ´ .Im Œf B ˆŒa;b / B ˆŒa;b
for all m 2 N;
i.e., by mapping f into C Œ1; 1, applying the original interpolation operator, and is an affine mapping the resulting polynomial back to the interval Œa; b. Since ˆ1 Œa;b
mapping, the result will still be a polynomial in Pm . Let m 2 N. The definition of Im yields IŒa;b
m Œf D
m X
f .ˆŒa;b . m; //Lm; B ˆ1 Œa;b ;
D1
Œa;b m and defining the transformed interpolation points m; in the interval Œa; b by D1 bCa ba C
m; 2 2
m and the corresponding transformed Lagrange polynomials LŒa;b
m; D1 by Œa;b
m; ´ ˆŒa;b . m; / D
1 LŒa;b
m; ´ Lm; B ˆŒa;b ;
we get the more compact notation IŒa;b
m
m X Œa;b Œa;b
Lm; : D f m; D1
Since the equation Œa;b
1 LŒa;b
m; . m; / D Lm; B ˆŒa;b .ˆŒa;b . m; // D Lm; . m; / D ı
D
m Œa;b
Œa;b
Y
m; m; D1 ¤
Œa;b
Œa;b
m; m;
holds for all ; 2 f1; : : : ; mg, the identity theorem for polynomials yields the equation LŒa;b
m; .x/
D
m Y D1 ¤
Œa;b
x m; Œa;b
Œa;b
m; m;
for all x 2 R; m 2 N and 2 f1; : : : ; mg;
which we can use to evaluate the transformed Lagrange polynomials efficiently.
90
4 Application to integral operators
Separable approximation by multi-dimensional interpolation Since we intend to apply interpolation to construct a separable approximation of the kernel function g defined in a multi-dimensional domain, we require multi-dimensional interpolation operators. Let us consider an axis-parallel d -dimensional box Q D Œa1 ; b1 Œad ; bd with a1 < b1 , …, ad < bd . The order m of the one-dimensional interpolation scheme is replaced by an order vector m 2 Nd , and the corresponding tensor-product interpolation operator IQ m is defined by Œa1 ;b1
Œad ;bd
˝ ˝ Im : IQ m ´ Im1 d
We can observe that it takes the familiar form X Q IQ Œf D f . m; /LQ m m;
(4.16)
for all f 2 C.Q/;
0 0. Comparing the Theorems 4.8, 4.22 and 4.33 yields the error bounds kg gQ t;s k1;K t Ks
Cg q n dist.K t ; Ks /
for all b D .t; s/ 2 LC J ;
Cg q n dist.Q t ; Qs /
for all b D .t; s/ 2 LC J ;
Cg q n dist.Q t ; Qs / Cjj
for all b D .t; s/ 2 LC J ;
if Taylor expansions are used, kg gQ t;s k1;Q t Qs if interpolation is applied and kg gQ t;s k1;Q t Qs
119
4.6 Matrix approximation
if the -th partial derivative of an interpolant is considered. Here Cg 2 R>0 is a constant which does not depend on n and b, and q 2 Œ0; 1Œ is the rate of convergence (recall that the admissibility parameter has to be sufficiently small in the case of the Taylor expansion). In the case of the Taylor approximation, the corresponding admissibility condition (4.11) implies 1 .diam.K t / C diam.Ks // 2 dist.K t ; Ks / for all b D .t; s/ 2 LC J ;
diam.K t /1=2 diam.Ks /1=2
and the error estimate takes the form kg gQ t;s k1;K t Ks
Cg q n : diam.K t /=2 diam.Ks /=2
In order to derive a similar result for approximations constructed by interpolation, we have to replace the admissibility condition (4.37) by the slightly stronger condition maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs /
(4.49)
and observe that it yields diam.Q t /1=2 diam.Qs /1=2 maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs /
for all b D .t; s/ 2 LC J
(4.50)
if the block cluster tree is constructed based on the new admissibility condition. Then the error estimate implies kg gQ t;s k1;Q t Qs
Cg .2/ q n diam.Q t /=2 diam.Qs /=2
in the case of the interpolation and kg gQ t;s k1;Q t Qs
Cg .2/Cjj q n diam.Q t /.Cjj/=2 diam.Qs /.Cjj/=2
if -th partial derivatives of interpolants are used. Since all three cases can be handled in a similar fashion, we restrict our attention to approximations constructed by interpolation, i.e., we use kg gQ t;s k1;Q t Qs b ´
Cg; q n diam.Q t /=2 diam.Qs /=2
(4.51)
for all b D .t; s/ 2 LC J , where we let Cg; ´ Cg .2/ . Combining this estimate with Lemma 4.40 yields the following global error bound for the Frobenius norm:
120
4 Application to integral operators
Theorem 4.41 (Frobenius approximation error). Let C ; CJ 2 R0 be defined as in (4.48), and let q 2 Œ0; 1Œ be the rate of convergence introduced in (4.51). Let the families . i /i2 and . j /j 2J be Cov -overlapping. For all admissible leaves b D .t; s/ 2 LC Q t;s satisfies (4.51). Then we have J , we assume that the g z F kG Gk
Cov C CJ Cg; j j =2 minfdiam.Q t / diam.Qs / W b D .t; s/ 2 LC J g
qn:
(4.52)
Proof. Combine Lemma 4.40 with (4.51). Remark 4.42 (Asymptotic behaviour of the error). Let us assume that is a d dimensional subset or submanifold of Rd , and that the discretization is based on a quasiuniform hierarchy of grids with a decreasing sequence h0 ; h1 ; : : : of mesh parameters. The minimum in estimate (4.52) is attained for leaf clusters, and we can assume that the diameters of their bounding boxes are approximately proportional to the grid parameter. If we neglect the constants C and CJ corresponding to the scaling of the basis functions, the Frobenius approximation error on mesh level ` can be expected to behave like h qn. `
Spectral norm estimates z of the H 2 -matrix approxiNow let us focus on the spectral norm of the error G G mation. Definition 4.43 (Spectral norm). Let X 2 RJ . The spectral norm (or operator norm) of X is given by ² ³ kXuk2 kXk2 ´ sup W u 2 RJ n ¹0º : kuk2 Under the same conditions as in the case of the Frobenius norm we can also derive a blockwise error estimate for the spectral norm: Lemma 4.44 (Blockwise error). Let . b /b2LC
J
be a family satisfying (4.47). Let
C ; CJ 2 R>0 be defined as in (4.48). We have X 1=2 X 1=2 k t Gs V t Sb Ws k2 C CJ b j i j j j j i2tO
j 2Os
for all b D .t; s/ 2 LC J . Proof. Let E ´ t Gs V t Sb Ws . Let u 2 RJ . Due to 2 X X X X XX kEuk22 D .Eu/2i D Eij uj Eij2 uj2 i2
i2
j 2J
i2
j 2J
j 2J
121
4.6 Matrix approximation
D
XX
Eij2 kuk22 D kEk2F kuk22 ;
i2 j 2J
we can conclude by using the Frobenius norm estimate of Lemma 4.38. In order to find a global error bound, we have to combine these blockwise error bounds using the operator-norm counterpart of Lemma 4.39: Lemma 4.45 (Global spectral norm). Let X 2 RJ . We have
X
kX k2
k t Xs k22
1=2 :
bD.t;s/2LJ
Proof. Let v 2 R and u 2 RJ . We have ˇ ˇ X ˇ ˇ hv; t Xs ui2 ˇ jhv; Xui2 j D ˇ
X
bD.t;s/2LJ
bD.t;s/2LJ
X
k t Xs k2 k t vk2 ks uk2
bD.t;s/2LJ
jh t v; t Xs s ui2 j
X
k t Xs k22
1=2
bD.t;s/2LJ
X
k t vk22 ks uk22
1=2 :
bD.t;s/2LJ
Combining Lemma 3.14 and Corollary 3.9 yields that LJ corresponds to a disjoint partition of J, i.e., we have X k t vk22 ks uk22 D kvk22 kuk22 bD.t;s/2LJ
and conclude jhv; Xui2 j
X
k t Xs k22
1=2 kuk2 kvk2 :
bD.t;s/2LJ
Setting v ´ Xu and proceeding as in the proof of Lemma 4.44 proves our claim. We can combine the Lemmas 4.44 and 4.45 in order to prove the following bound for the H 2 -matrix approximation error: Lemma 4.46 (Spectral error bound). Let . b /b2LC
J
be a family of positive real
numbers satisfying (4.47). Let the families . i /i2 and . j /j 2J be Cov -overlapping, and let C ; CJ 2 R>0 be defined as in (4.48). We have z 2 Cov C CJ j j maxf b W b 2 LC g: kG Gk J
122
4 Application to integral operators
z and use Lemma 4.44 in order Proof. We apply Lemma 4.45 to the matrix X ´ G G to obtain X z 22 kG Gk k t Gs V t Sb Ws k22 bD.t;s/2LC J
X
C2 CJ2
b2
bD.t;s/2LC J
X
X
j i j
i2tO
j j j
j 2Os
X
X
C2 CJ2 maxf b2 W b 2 LC J g
bD.t;s/2LC J
i2tO
X
j i j
j j j :
j 2Os
As in the proof of Lemma 4.40, we combine Lemma 3.14 with Corollary 3.9 and find X X XX X j i j j j j D j i j j j j bD.t;s/2LC J
i2tO
i2 j 2J
j 2Os
D
X i2
X
j i j
2 j j j Cov j j2
j 2J
by using Lemma 4.37 in the last step. The estimate of Lemma 4.45 is quite straightforward, general and completely sufficient for many standard applications, but it is, in general, far from optimal: the Lemmas 4.40 and 4.46 even yield exactly the same upper bounds for Frobenius and spectral norm, although the spectral norm will indeed be significantly smaller than the Frobenius norm in practical applications. To illustrate this, let us consider the example of the identity matrix 0 1 1 B C nn :: I D@ A2R : : 1 It can bepconsidered as an n n block matrix, and Lemma 4.45 yields the estimate kI k2 n, which is obviously far from the optimal bound kI k2 D 1. Therefore we now consider an improved estimate (a generalization of [49], Satz 6.2) which takes the block structure of a matrix into account: Theorem 4.47 (Global spectral norm). Let X 2 RJ , and let TJ be Csp -sparse. pJ p Let p and pJ be the depths of T and TJ . Let . ;` /`D0 and . J;` /`D0 be families in R0 satisfying 1=2 1=2 J;level.s/ k t Xs k2 ;level.t/
for all b D .t; s/ 2 LJ :
(4.53)
123
4.6 Matrix approximation
Then we have kX k2 Csp
p X
;`
pJ 1=2 X
`D0
1=2 J;`
:
`D0
Proof. Let v 2 R and u 2 RJ . Due to the triangle inequality, estimate (4.53), and the Cauchy–Schwarz inequality, we have X jh t v; t Xs s ui2 j jhv;Xui2 j bD.t;s/2LJ
X
k t Xs k2 k t vk2 ks uk2
bD.t;s/2LJ
X
1=2 1=2 ;level.t/ k t vk2 J;level.s/ ks uk2
bD.t;s/2LJ
X
;level.t/ k t vk22
1=2
bD.t;s/2LJ
X
J;level.s/ ks uk22
1=2 : (4.54)
bD.t;s/2LJ
Let us take a look at the first factor. Using sparsity and Corollary 3.10, we find X X ;level.t/ k t vk22 Csp ;level.t/ k t vk22 t2T
bD.t;s/2LJ
X p
D Csp
D Csp
;`
X
k t vk22 Csp
`D0
t2T`
p X
;` kvk22 :
p X
;` kvk22
`D0
`D0
By the same reasoning, the second factor in (4.54) can be bounded by X
J;level.s/ ks uk22
Csp
bD.t;s/2LJ
pJ X
J;` kuk22 ;
`D0
combining both bounds yields jhv; Xui2 j Csp
p X `D0
;`
pJ 1=2 X
1=2 J;`
kvk2 kuk2 ;
`D0
and we can complete the proof by setting v D Xu. Let us now consider the application of these estimates to the kernel function. Combining Lemma 4.44 with (4.51) yields the following result:
124
4 Application to integral operators
Lemma 4.48 (Factorized error estimate). Let b D .t; s/ 2 LC J satisfy the strong admissibility condition (4.49). Let the families . i /i2 and . j /j 2J of supports be Cov -overlapping. Let the approximation gQ t;s satisfy the error estimate (4.51). Then we have 1=2 1=2 j t j j s j n k t Gs V t Sb Ws k2 Cov C CJ Cg; q : diam.Q t / diam.Qs / (4.55) Proof. Combining Lemma 4.44 with (4.51) yields k t Gs V t Sb Ws k2
X 1=2 X 1=2 C CJ Cg; q n j j j j : i j diam.Q t /=2 diam.Qs /=2 i2tO
Due to Lemma 4.37, we have X j i j Cov j t j; i2tO
X
j 2Os
j j j Cov j s j;
j 2Os
so combining both estimates concludes the proof. This lemma provides us with error bounds matching the requirements of Theorem 4.47 perfectly. Theorem 4.49 (Spectral approximation error). Let C ; CJ 2 R>0 be given as in (4.48). Let the families . i /i2 and . j /j 2J of supports be Cov -overlapping. Let the block cluster tree TJ be Csp -sparse and let all of its admissible leaves satisfy the strong admissibility condition (4.49). Let p and pJ be the depths of T and TJ , respectively. Let ² ³ j t j .`/ ;` ´ max W t 2 T for all ` 2 f0; : : : ; p g; diam.Q t / ² ³ j s j J;` ´ max W s 2 TJ.`/ for all ` 2 f0; : : : ; pJ g: diam.Qs / Then we have z 2 Csp Cg; Cov C CJ q n kG Gk
p X
;`
pJ 1=2 X
`D0
1=2 J;`
:
(4.56)
`D0
Proof. We combine Lemma 4.48 with Theorem 4.47. Remark 4.50 (Asymptotic behaviour of the error). As in Remark 4.42, let us assume that is a d -dimensional subset or submanifold and that the discretization is based on a quasi-uniform grid hierarchy with a decreasing sequence h0 ; h1 ; : : : of mesh
4.7 Variable-order approximation
125
parameters. We again neglect the scaling of the basis functions .'i /i2 and . j /j 2J captured by the constants C and CJ . For piecewise smooth geometries, we can expect the diameters diam. t / of cluster supports and diam.Q t / of the corresponding bounding boxes to be approximately proportional, and we can expect that j t j behaves like diam. t /d . Under these assumptions, we find ;` maxfdiam. t /d W t 2 T.`/ g J;` maxfdiam. s /
d
W s2
TJ.`/ g
for all ` 2 f0; : : : ; p g; for all ` 2 f0; : : : ; pJ g
and have to distinguish between three different cases: • If d > holds, i.e., if the singularity of the kernel function is weak, the sums appearing in the estimate (4.56) will be dominated by the large clusters. Since the large clusters will remain essentially unchanged when refining the grid, the sums can be bounded by a constant, and we can conclude that the spectral error will behave like q n on all levels of the grid. • If d D holds, all terms in the sums appearing in (4.56) can be individually bounded by a constant. Therefore we can expect that the error is approximately proportional to the depth of the cluster tree, i.e., that it will behave like j log h` jq n and grow very slowly when the grid is refined. • If d < holds, i.e., if the kernel function is strongly singular, the sums appearing in (4.56) will be dominated by the small clusters, and therefore the spectral error will behave like h`d q n . In all three cases, the spectral error estimate is better than the Frobenius error estimate given in Remark 4.42 as h qn. `
4.7 Variable-order approximation Until now, we have assumed that the order of the Taylor expansion or the interpolation scheme is constant for all clusters. We have seen in Lemma 4.48 that the size of the support of a cluster plays a major role in determining the spectral approximation error, and we can now take advantage of this observation in order to construct approximation schemes that lead to better error estimates than the ones provided by Theorem 4.49. The fundamental idea is to use different approximation orders for different cluster sizes. It was introduced in [70] and analyzed for Taylor expansions and weakly singular kernel functions in [90], [91]. A refined analysis of this approach was presented in [101], [100]. By using interpolation instead of Taylor expansions, the convergence results can be significantly improved [23].
126
4 Application to integral operators
The restriction to weakly singular kernel functions can sometimes be overcome by using suitable globally-defined antiderivatives of the kernel function [25], but since this approach does not fit the concept of H 2 -matrices introduced here and since its error analysis requires fairly advanced tools which cannot be introduced in the context of this book, we do not discuss it further.
Motivation We assume that the kernel function is only weakly singular, i.e., that the order of the singularity is smaller than the dimension d of the subset or submanifold . For this case, Remark 4.50 states that the spectral error will be dominated by the error arising in the large clusters, while the blockwise error in leaf clusters will behave like h`d , i.e., it will decrease as the meshwidth h` decreases. We would like to ensure the same favourable convergence behaviour for all clusters, z In order to do this, we reconsider Lemma 4.48: let and thus for the entire matrix G. C b D .t; s/ 2 LJ be an admissible block. If we can ensure kg gQ t;s k1;Q t Qs .
q nt diam.Q t /
1=2
q ns diam.Qs /
1=2
instead of (4.51), where .n t / t2T and .ns /s2TJ are suitably-chosen families of parameters, the estimate (4.55) takes the form nt ns q j t j 1=2 q j s j 1=2 k t Gs V t Sb Ws k2 . ; (4.57) diam.Q t / diam.Qs / and proceeding as in Theorem 4.49 yields z 2. kG Gk
p X
;`
pJ 1=2 X
`D0 p
1=2 J;`
`D0
p
J for the families . ;` /`D0 and . J;` /`D0 given by ² nt ³ q j t j .`/ ;` ´ max W t 2 T diam.Q t / ² ns ³ q j s j .`/ J;` ´ max W s 2 TJ diam.Qs /
for all ` 2 f0; : : : ; p g; for all ` 2 f0; : : : ; pJ g:
This error estimate differs significantly from the one provided by Theorem 4.49: instead of working with the same interpolation order for all clusters, we can use a different order for each individual cluster. This variable-order approach [91], [90] allows us to compensate the growth of the cluster supports j t j by increasing the order of interpolation.
4.7 Variable-order approximation
127
We assume that the measure of the support of a cluster can be bounded by the diameter of the corresponding bounding box, i.e., that there is a constant Ccu 2 R>0 satisfying j t j Ccu diam.Q t /d
for all t 2 T ;
(4.58a)
d
for all s 2 TJ :
(4.58b)
j s j Ccu diam.Qs /
The size of the constant Ccu is determined by the geometry: if is a subset of Rd , we always have Ccu 1. If is a submanifold, the size of Ccu depends on how “tightly folded” is, e.g., on the curvature of the manifold (cf. Figure 4.3). Qt
Qt
t
t
Ccu D
p 2
Ccu > 10
Figure 4.3. Influence of the geometry of on the constant Ccu .
We also have to assume that the diameters of bounding boxes of leaf clusters are approximately proportional to the mesh parameter h and that they do not grow too rapidly as we proceed from the leaves of the cluster trees towards the root, i.e., that there are constants Cgr 2 R>0 and 2 R1 satisfying diam.Q t / Cgr h p level.t/
for all t 2 T ;
(4.59a)
diam.Qs / Cgr h pJ level.s/
for all s 2 TJ :
(4.59b)
Based on the assumptions (4.58) and (4.59), we find ³ ² nt q j t j .`/ ;` D max W t 2 T diam.Q t / Ccu maxfq n t diam.Q t /d W t 2 T.`/ g Ccu Cgrd hd maxfq n t . d /p ` W t 2 T.`/ g: For arbitrary parameters ˛; ˇ 2 N0 , we can choose the interpolation orders high enough to ensure n t ˛ C ˇ.p level.t // for all t 2 T . Then the error estimates takes the form ;` Ccu Cgrd hd q ˛ .q ˇ d /p ` :
128
4 Application to integral operators
For any given 20; 1Œ, we can let ˇ and observe
log .d / log log q q ˇ d ;
which implies the bound ;` Ccu Cgrd hd q ˛ p ` : This estimate allows us to bound the sum over all levels ` by a geometric sum, and bounding the sum by its limit yields p X
;` Ccu Cgrd hd q ˛
`D0
Ccu Cgrd hd q ˛
p X `D0 1 X
p ` D Ccu Cgrd hd q ˛
p X
`
`D0
` D Ccu Cgrd hd q ˛
`D0
1 : 1
p
J Applying the same reasoning to the family . J;` /`D0 yields
z 2 . hd kG Gk
q˛ : 1
(4.60)
Since we have assumed < d , this estimate implies that the approximation error will decrease like hd if the grid is refined. The additional parameter ˛ can be used to ensure that the approximation error is sufficiently small. The major advantage of this variable-order approximation scheme is that the order of the interpolation is bounded in the leaf clusters and grows only slowly if we proceed towards the root of the cluster tree. This means that the leaf clusters, and these dominate in the complexity estimates, are handled very efficiently. A detailed complexity analysis shows that the resulting rank distribution is bounded, i.e., the computational and storage complexity is optimal.
Re-interpolation scheme We have seen that we have to ensure n t ˛ C ˇ.p level.t//
for all t 2 T
if we want to reach the desirable error estimate (4.60). If we restrict our analysis to the case of isotropic interpolation, this can be achieved by using the order vectors m t; ´ ˛ C ˇ.p level.t //
for all t 2 T ; 2 f1; : : : ; d g:
(4.61)
4.7 Variable-order approximation
129
Since larger clusters now use a higher approximation order than smaller clusters, we have to expect that I t 0 ŒL t; ¤ L t; will hold for certain clusters t 2 T with t 0 2 sons.t / and certain 2 K t : if the order in the cluster t 0 is lower than the order used in t , not all Lagrange polynomials used for t can be represented in the basis used for t 0 . According to (4.20a), this can lead to X Vt ¤ Vt 0 Et 0 ; t 0 2sons.t/
i.e., the cluster basis V D .V t / t2T will in general not be nested. Losing the nested structure of the cluster basis is not acceptable, since it would lead to a less efficient representation of the matrix approximation. As in [23], [22], we fix this by constructing a nested cluster basis based on the – no longer nested – cluster basis V : we define Vz D .Vzt / t2T by ´ if sons.t / D ;; Vt for all t 2 T : Vzt ´ P z 0 0 otherwise t 0 2sons.t/ V t E t This cluster basis is obviously nested and uses the same rank distribution K D .K t / t2T as V . The same reasoning applies to the cluster basis W D .Ws /s2TJ , and we introduce z D .W zs /s2T by a similarly modified nested clusterbasis W J ´ if sons.s/ D ;; s zs ´ W for all s 2 TJ : W P z otherwise s 0 2sons.s/ Ws 0 Fs 0 z is the same as for W . Instead of approximating an admisThe rank distribution for W z z sible block b D .t; s/ 2 LC J by V t Sb Ws , we now approximate it by V t Sb Ws . z require no Remark 4.51 (Implementation). The algorithms for constructing Vz and W modifications compared to the standard case, we only have to ensure that the correct interpolation orders and Lagrange polynomials are used. Lemma 4.52 (Complexity). According to Lemma 4.10, we have #K t D .˛ C ˇ.p level.t///d
for all t 2 T :
Let T be quasi-balanced, i.e., let there be constants Cba 2 N and 2 R>1 satisfying #ft 2 T W level.t / D `g Cba `p c
for all ` 2 N0 :
Then the rank distribution K D .K t / t2T is .Cbn ; ˛; ˇ; d; /-bounded with Cbn ´ Cba
:
1
(4.62)
130
4 Application to integral operators
Proof. Let ` 2 N and R` ´ ft 2 T W #K t > .˛ C ˇ.` 1//d g: For a cluster t 2 R` , we have .˛ C ˇ.` 1//d < #K t D .˛ C ˇ.p level.t ///d ; ˛ C ˇ.` 1/ < ˛ C ˇ.p level.t //; ` 1 < p level.t /; level.t / < p ` C 1; level.t / p `: Since T is quasi-balanced, the estimate (4.62) yields p `
R`
[
ft 2 T W level.t / D ng;
nD0 p `
#R`
X
p `
#ft 2 T W level.t / D ng
nD0
Cba np c
nD0
X p
D Cba c
X
n D Cba c `
p `
X
n < Cba c `
nD0
nD`
1 D Cbn ` c 1 1=
with Cbn D Cba =. 1/. In order to analyze the approximation error corresponding to the new cluster bases z , we have to relate them to suitable approximation schemes for the kernel Vz and W function. Let t 2 T and i 2 tO. If sons.t / D ; holds, we have Z .V t /i D L t; .x/'i .x/ dx for all 2 K t
by definition. Otherwise, we let t 0 2 sons.t / with i 2 tO0 . If sons.t 0 / D ; holds, we have Z .V t 0 /i D L t 0 ; .x/'i .x/ dx for all 2 K t 0 ;
and since the transfer matrix E t 0 is given by .E t 0 / D L t; .x t 0 ; /
for all 2 K t ; 2 K t 0 ;
the definition of Vzt implies .Vzt /i D Vzt 0 E t 0 D V t 0 E t 0 D
X 2K t 0
Z L t; .x t 0 ; /
L t 0 ; .x/'i .x/ dx
4.7 Variable-order approximation
131
Z I t 0 ŒL t; .x/'i .x/ dx
D
for all 2 K t ;
i.e., the Lagrange polynomial L t; corresponding to t is replaced by its lower-order interpolant I t 0 ŒL t; . If t 0 is not a leaf, we proceed by recursively applying interpolation operators until we reach a leaf cluster containing i . The resulting nested interpolation operator is defined as follows: Definition 4.53 (Re-interpolation). For all t 2 T and all r 2 sons .t /, we define the re-interpolation operator Ir;t by ´ Ir;t 0 I t if there is a t 0 2 sons.t / with r 2 sons .t 0 /; Ir;t ´ It otherwise, i.e., if t D r: Due to Vzt D V t for t 2 L , we can derive interpolation operators corresponding to the cluster basis Vz inductively starting from the leaves of T . We collect all leaf clusters influencing a cluster in the families .L t / t2T and .Ls /s2TJ of sets given by L t ´ fr 2 L W r 2 sons .t /g Ls ´ fr 2 LJ W r 2 sons .s/g
for all t 2 T ; for all s 2 TJ :
and can express the connection between re-interpolation operators and the new cluster basis Vz by a simple equation: Lemma 4.54 (Re-interpolation). Let t 2 T , and let r 2 L t . Then we have Z z Ir;t ŒL t; .x/'i .x/ dx for all i 2 r; O 2 Kt : .V t /i D
(4.63)
Proof. By induction on level.r/ level.t /. Let t 2 T and r 2 L t with level.r/ level.t/ D 0. We have t D r 2 L , i.e., Vzt D V t by definition. According to X X L t; .x t; /L t; D ı L t; D L t; ; I t ŒL t; D 2K t
we find
2K t
Z .Vzt /i D .V t /i D Z
L t; .x/'i .x/ dx I t ŒL t; .x/'i .x/ dx
D
for all i 2 tO; 2 K t :
Let now n 2 N0 be such that (4.63) holds for all t 2 T and r 2 L t with level.r/ level.t/ D n. Let t 2 T and r 2 L t with level.r/ level.t / D n C 1. Due to the
132
4 Application to integral operators
definition of sons .t /, we can find t 0 2 sons.t / with r 2 sons .t 0 /. The definition of Vzt yields X .Vzt 0 /i .E t 0 / .Vzt /i D .Vzt 0 E t 0 /i D 2K t 0
X
D
.Vzt 0 /i L t; .x t 0 ; /
for all i 2 r; O 2 Kt :
2K t 0
Since level.r/ level.t 0 / D level.r/ level.t / 1 D n holds, we can apply the induction assumption and get Z X z 0 .V t /i D L t; .x t ; / Ir;t 0 ŒL t 0 ; .x/'i .x/ dx 2K t 0
Z
Ir;t 0
D
h X
i L t; .x t 0 ; /L t 0 ; .x/'i .x/ dx
2K t 0
Z
Ir;t 0 ŒI t 0 ŒL t; .x/'i .x/ dx
D Z
Ir;t ŒL t; .x/'i .x/ dx
D
for all i 2 r; O 2 Kt ;
which concludes the induction. zs Using this lemma, we can find a representation for subblocks of the matrix Vzt Sb W used to approximate a matrix block: Lemma 4.55 (Re-interpolated kernel). Let b D .t; s/ 2 LC 2 L t and J . Let t s 2 Ls . We have Z Z zs /ij D 'i .x/ .I t ;t ˝ Is ;s /Œg.x; y/'j .y/ dy dx .Vzt Sb W
O
for all i 2 t , j 2 sO . Proof. Let i 2 tO and j 2 sO . Lemma 4.54 and the definition of Sb (cf. 4.20c) yield X X zs /ij D zs /j .Vzt /i .Sb / .W .Vzt Sb W 2K t 2Ls
D
X X Z 2K t 2Ls
Z
D
Z
'i .x/
Z
I t ;t ŒL t; .x/'i .x/
Is ;s ŒLs; .y/
j .y/g.x t; ; xs; / dy
.I t ;t ˝ Is ;s / h X X i g.x t; ; xs; /L t; ˝ Ls; .x; y/
2K t 2Ls
j .y/ dy
dx
dx
4.7 Variable-order approximation
Z
Z 'i .x/
D Z D
Z
.I t ;t ˝ Is ;s /ŒI t ˝ Is Œg.x; y/ .I t ;t ˝ Is ;s /Œg
'i .x/
133
j .y/ dy
j .y/ dy
dx
dx:
In the last step, we have used the fact that I t and Is are projections. Using this representation of the matrix approximation, we can generalize the error estimate of Lemma 4.38 to the case of variable-order approximation: Lemma 4.56 (Blockwise error). Let b D .t; s/ 2 LC J , and let b 2 R>0 satisfy for all t 2 L t ; s 2 Ls :
kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b Let C and CJ satisfy (4.48). Then we have
zs k2 k t Gs Vzt Sb W zs kF k t Gs Vzt Sb W X 1=2 X 1=2 C CJ b j i j j j j : i2tO
j 2Os
zs . For all u 2 RJ , we have Proof. Let X ´ t Gs Vzt Sb W 2 X X X XX kXuk22 D Xij uj Xij2 uj2 D kX k2F kuk22 i2
j 2J
i2
j 2J
j 2J
and can conclude kX k2 kX kF . Corollary 3.9 and Lemma 3.8 yield that ftO W t 2 L t g and fOs W s 2 Ls g are disjoint partitions of tO and sO , respectively, and we find X X X X XX kXk2F D k t Xs k2F D Xij2 ; t 2L t s 2Ls
t 2L t s 2Ls i2tO j 2Os
and we can apply Lemma 4.55 to each t 2 sons .t / \ L and s 2 sons .s/ \ LJ in order to find zs /ij j jXij j D jGij .Vzt Sb W ˇZ ˇ Z ˇ ˇ ˇ D ˇ 'i .x/ .g.x; y/ .I t ;t ˝ Is ;s /Œg.x; y// j .y/ dy dx ˇˇ Z Z b j'i .x/j dx j j .y/j dy b j i j1=2 k'i kL2 j j j1=2 k j kL2
b C CJ j i j
1=2
j j j1=2 :
Due to this estimate, we can conclude X X XX XX j i j j j j D b2 C2 CJ2 j i j j j j; kX k2F b2 C2 CJ2 t 2L t s 2Ls i2tO j 2Os
which is the desired result.
i2tO j 2Os
134
4 Application to integral operators
Figure 4.4. Re-interpolation of a Lagrange polynomial.
Assuming that the overlap of the supports is bounded, we can simplify this result: Corollary 4.57 (Blockwise spectral error). Let b D .t; s/ 2 LC J , and let b 2 R>0 satisfy kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b
for all t 2 L t ; s 2 Ls :
Let C and CJ satisfy (4.48), and let the families . i /i2 and . j /j 2J be Cov overlapping. Then we have zs k2 Cov C CJ b j t j1=2 j s j1=2 : k t Gs Vzt Sb W Proof. Combine Lemma 4.56 with Lemma 4.37.
Error analysis in the one-dimensional case Before we can analyze the properties of the re-interpolation operators I t ;t in d dimensional space, we have to investigate the one-dimensional case. Let ˛ 2 N and ˇ 2 N0 . Let p 2 N, and let .J` /p`D0 be a family of non-trivial intervals J` D Œa` ; b` satisfying J`C1 J`
for all ` 2 f0; : : : ; p 1g:
For each ` 2 f0; : : : ; pg, we introduce the interpolation operator J
` ´ I˛Cˇ.p`/ : IJ;˛;ˇ `
The one-dimensional re-interpolation operator IJ;˛;ˇ is given recursively by ` ;` ´ IJ;˛;ˇ if ` < ` ; IJ;˛;ˇ ` ;`C1 ` for all `; ` 2 f0; : : : ; pg with ` `: ´ IJ;˛;ˇ ` ;` otherwise IJ;˛;ˇ ` (4.64)
4.7 Variable-order approximation
135
We can see that this definition implies IJ;˛;ˇ D IJ;˛;ˇ IJ;˛;ˇ : : : IJ;˛;ˇ IJ;˛;ˇ ` ;` ` ` 1 `C1 `
for all `; ` 2 f0; : : : ; pg with ` `: (4.65)
In order to bound the interpolation error by a best-approximation estimate, we have to establish the stability of IJ;˛;ˇ . The simple approach ` ;` kIJ;˛;ˇ Œf k1;Œa` ;b` ƒ˛Cˇ.p` / kIJ;˛;ˇ Œf k1;Œa` ;b`
` ;` ` 1;` ƒ˛Cˇ.p` / kIJ;˛;ˇ Œf k1;Œa` 1 ;b` 1
` 1;` ::: ƒ˛Cˇ.p` / ƒ˛Cˇ.p` C1/ : : : ƒ˛Cˇ.p`/ kf k1;Œa` ;b`
suggested by (4.65) will not yield a sufficiently good bound, since the resulting “stability constant” will grow too rapidly if ` ` grows. An improved estimate can be derived if we require that the intervals shrink sufficiently fast: Definition 4.58 ( -regular intervals). Let 20; 1Œ. If jJ`C1 j jJ` j
holds for all ` 2 f0; : : : ; p 1g;
we call the family J D .J` //p`D0 -regular. In the -regular case, we can exploit the fact that all but the rightmost interpolation operators in (4.65) are applied to polynomials and not to general functions: the growth of polynomials can be controlled, and applying Lemma 4.14 yields the following improved stability estimate: Theorem 4.59 (Stability of re-interpolation). Let the family J D .J` /p`D0 be -regular. Let the interpolation scheme .Im /1 mD1 be .ƒ; /-stable. Then there is a constant Cre 2 R1 depending only on , ƒ, and ˇ which satisfies kIJ;˛;ˇ Œf k1;J` Cre ƒ˛Cˇ.p`/ kf k1;J` ` ;` for all `; ` 2 f0; : : : ; pg with ` ` and all f 2 C.J` /. Proof. Cf. Theorem 3.11 in [23]. Using this inequality, we can derive an error estimate for the re-interpolation operator:
136
4 Application to integral operators
Lemma 4.60 (Re-interpolation error). Let the family J D .J` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Let Cre 2 R1 be the constant introduced in Theorem 4.59. Let `; ` 2 f0; : : : ; pg with ` ` . We have
kf
IJ;˛;ˇ Œf ` ;`
k1;J` Cre ƒ
` X
.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ Œf k1;Jn (4.66) n
nD`
for all f 2 C.J` /. Proof. Let f 2 C.J` /. We define ´ I if n D ` C 1; Pn ´ ˛;ˇ I` ;n otherwise
for all n 2 f`; : : : ; ` C 1g:
, and Theorem 4.59 yields According to (4.64), we have Pn D PnC1 IJ;˛;ˇ n kPn Œgk1;J` Cre ƒ˛Cˇ.pn/ kgk1;Jn
for all g 2 C.Jn /; n 2 f`; : : : ; ` g:
We can use the .ƒ; /-stability of the interpolation scheme in order to prove ´ Cre ƒ.˛ C ˇ.p .n C 1// C 1/ if n < ` ; kPnC1 Œgk1;J` kgk1;Jn 1 otherwise Cre ƒ.˛ C ˇ.p n/ C 1/ kgk1;Jn for all g 2 C.Jn / and all n 2 f`; : : : ; ` g. Using a telescoping sum, we get kf
IJ;˛;ˇ Œf ` ;`
k1;J`
` X D PnC1 Œf Pn Œf
1;J`
nD`
` X D PnC1 f IJ;˛;ˇ Œf n
1;J`
nD` ` X Œf PnC1 f IJ;˛;ˇ n nD`
1;J`
Cre ƒ
` X
.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ Œf k1;Jn ; n
nD`
and this is the required estimate. We can use Theorem 4.16 in order to bound the terms in the sum (4.66): if we again assume jf
./
Cf .x/j f 1
for all x 2 Œa` ; b` ; 2 N0 ;
4.7 Variable-order approximation
137
we observe Œf k1;Jn 2eCf .ƒ˛Cˇ.pn/ C 1/.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ n 2f ˛ˇ.pn/ jJn j 1C % f jJn j for all n 2 f`; : : : ; ` g. Due to -regularity, we have jJn j n` jJ` j
for all n 2 f`; : : : ; ` g;
and Lemma 4.78 yields that there is a O 20; 1Œ satisfying ` 2f 2f 2f 1 % % % ` : jJn j jJ` j
jJ` j
O We can use the additional factor O ` in order to bound the sum (4.66): Theorem 4.61 (Re-interpolation error). Let the family .J` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cri 2 R1 , ˛0 2 N such that 1 2C kf IJ;˛;ˇ 1 C %.r0 /˛ˇ.p`/ Œf k C C .˛ C ˇ.p `/ C 1/ 1;J` f ri ` ;` r0 holds for all `; ` 2 f0; : : : ; pg with ` `, all ˛ ˛0 and all f 2 C 1 .J` / satisfying jf
./
Cf .x/j f 1
for all x 2 J` ; 2 N0
(4.67)
for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ` j r0 . Proof. Let Cre 2 R1 be the constant introduced in Theorem 4.59. According to Lemma 4.78, there is a constant O 20; 1Œ satisfying 1 r %.r/ for all r 2 Rr0 : %
O Let w 20; 1Œ. Due to O < 1, we can find ˛0 2 N such that
O ˛0 w%.r0 /ˇ holds. We define Cri ´ 2eƒ2 Cre =.1 w/. Let now `; ` 2 f0; : : : ; pg with ` `, ˛ 2 N˛0 , and let f 2 C 1 .J` / be a function satisfying (4.67) for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ` j r0 . According to Theorem 4.16, we have
138
4 Application to integral operators
kf IJ;˛;ˇ Œf k1;Jn n
2f ˛ˇ.pn/ jJn j % 2eCf .ƒ˛Cˇ.pn/ C 1/.˛ C ˇ.p n/ C 1/ 1 C 2f jJn j ˛ˇ.pn/ 2f 1 2eƒCf .˛ C ˇ.p n/ C 1/C 1 C % n` r0
jJ` j 2f ˛ˇ.pn/ 1 O .n`/.˛Cˇ.pn// C 2eƒCf .˛ C ˇ.p n/ C 1/ % 1C
r0 jJ` j 1 2eƒCf .˛ C ˇ.p `/ C 1/C 1 C
O ˛0 .n`/ %.r0 /˛ˇ.pn/ r0 1 C 1C w n` %.r0 /ˇ.n`/ %.r0 /˛ˇ.pn/ 2eƒCf .˛ C ˇ.p `/ C 1/ r0 1 w n` %.r0 /˛ˇ.p`/ D 2eƒCf .˛ C ˇ.p `/ C 1/C 1 C r0
for all n 2 f`; : : : ; ` g. We can combine this estimate with Lemma 4.60 in order to find Œf k1;J` kf IJ;˛;ˇ ` ;`
Cre ƒ
` X
.˛ C ˇ.p n// kf IJ;˛;ˇ Œf k1;Jn n
nD`
` X 1 ˛ˇ.p`/ 1C %.r0 / Cre 2eƒ Cf .˛ C ˇ.p `/ C 1/ w n` r0 nD` 1 Cre 2eƒ2 %.r0 /˛ˇ.p`/ Cf .˛ C ˇ.p `/ C 1/2C 1 C 1w r0 1 2C 1C %.r0 /˛ˇ.p`/ ; D Cf Cri .˛ C ˇ.p `/ C 1/ r0 2
2C
which is the desired result.
Error analysis in the multi-dimensional case Now we can turn our attention to the multi-dimensional case. Let p 2 N0 . Let .Q` /p`D0 be a family of non-trivial axis-parallel boxes Q` D Œa`;1 ; b`;1 Œa`;d ; b`;d satisfying Q`C1 Q`
for all ` 2 f0; : : : ; p 1g:
139
4.7 Variable-order approximation
We let m` ´ ˛ C ˇ.p `/ for all ` 2 f0; : : : ; pg and introduce a tensor-product interpolation operator ` IQ;˛;ˇ ´ IQ m` `
for all ` 2 f0; : : : ; pg:
(4.68)
The multi-dimensional re-interpolation operators are given by ´ IQ;˛;ˇ IQ;˛;ˇ if ` < ` ; Q;˛;ˇ ` ;`C1 ` I` ;` ´ for all `; ` 2 f0; : : : ; pg with ` `: otherwise IQ;˛;ˇ ` (4.69) For each 2 f1; : : : ; d g, the family Q D .Q` /p`D0 gives rise to a family J ´ .J;` /p`D0 of non-trivial intervals J;` ´ Œa`; ; b`; satisfying J;`C1 J;`
for all 2 f1; : : : ; d g; ` 2 f0; : : : ; p 1g;
and the equation Q` D J1;` Jd;` implies that IQ;˛;ˇ D IJ` 1 ;˛;ˇ ˝ ˝ I` d `
J ;˛;ˇ
holds for all ` 2 f0; : : : ; pg:
Using the definition of the re-interpolation operators yields ;˛;ˇ IQ;˛;ˇ D IJ`1;` ˝ ˝ I`d;` ` ;`
J ;˛;ˇ
for all `; ` 2 f0; : : : ; pg with ` ` ;
i.e., we have found a tensor-product representation for the multi-dimensional re-interpolation operator. We proceed as in Section 4.4: the operators IQ;˛;ˇ are characterized by ` ;` directional re-interpolation operators, which can be analyzed by using one-dimensional results. For all 2 f1; : : : ; d g and all `; ` 2 f0; : : : ; pg with ` ` , we let IQ;˛;ˇ ´ I ˝ ˝ I ˝I`J ;˛;ˇ ˝ I ˝ ˝ I ;` ` ;`; „ ƒ‚ … „ ƒ‚ … d times
1 times
and observe IQ;˛;ˇ D ` ;`
d Y
IQ;˛;ˇ ` ;`;
for all `; ` 2 f0; : : : ; pg with ` ` :
D1
The directional operators can again be expressed as polynomials in one variable: for
2 f1; : : : ; d g, `; ` 2 f0; : : : ; pg with ` ` , a function f 2 C.Q` / and all points x 2 Q` , we have IQ;˛;ˇ Œf .x/ D ` ;`;
m` X D1
J
J
f .x1 ; : : : ; x1 ; m;` ; xC1 ; : : : ; xd /I`J ;˛;ˇ ŒLm;` .x /: ` ; ` ; ;`
140
4 Application to integral operators
Now we can turn our attention to the problem of establishing stability and approximation error estimates for the directional operators. For the one-dimensional re-interpolation operators, the -regularity of the sequence of intervals is of central importance, therefore it is straightforward to introduce its multi-dimensional counterpart: Definition 4.62 ( -regular boxes). Let 20; 1Œ. The family Q D .Q` /p`D0 of boxes is called -regular if we have jJ;`C1 j jJ;` j
for all ` 2 f0; : : : ; p 1g; 2 f1; : : : ; d g;
i.e., if all the families J are -regular, where Q` D J1;` Jd;`
for all ` 2 f0; : : : ; pg:
If the bounding boxes are -regular, we can generalize the Lemmas 4.18 and 4.19: Lemma 4.63 (Stability of directional re-interpolation). Let Q D .Q` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Then there is a constant Cre 2 R1 depending only on , ƒ, and ˇ which satisfies Œf k1;Q` Cre ƒ˛Cˇ.p`/ kf k1;Q` kIQ;˛;ˇ ` ;`; for all 2 f1; : : : ; d g, f 2 C.Q` / and `; ` 2 f0; : : : ; pg with ` ` . Proof. Combine the technique used in the proof of Lemma 4.18 with the result of Theorem 4.59. Lemma 4.64 (Directional re-interpolation error). Let Q D .Q` /p`D0 be a family of
-regular boxes. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cri 2 R1 , ˛0 2 N such that 1 2C kf IQ;˛;ˇ Œf k C C .˛ C ˇ.p `/ C 1/ 1 C %.r0 /˛ˇ.p`/ 1;Q` f ri ` ;`; r0 holds for all 2 f1; : : : ; d g, all `; ` 2 f0; : : : ; pg with ` ` , all ˛ 2 N˛0 and all f 2 C 1 .Q` / satisfying k@ f k1;Q`
Cf f 1
for all 2 N0
for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ;` j r0 . Proof. Proceed as in the proof of Lemma 4.19 and use Theorem 4.61.
4.7 Variable-order approximation
141
Combining the Lemmas 4.63 and 4.64 yields the following multi-dimensional error estimate for the re-interpolation operator: Theorem 4.65 (Multi-dimensional re-interpolation error). Let Q D .Q` /p`D0 be regular. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cmr 2 R1 , ˛0 2 N such that kf IQ;˛;ˇ Œf k1;Q` ` ;`
1 Cf Cmr .˛ C ˇ.p `/ C 1/.d C1/C 1 C %.r0 /˛ˇ.p`/ r0
holds for all `; ` 2 f0; : : : ; pg with ` ` , all ˛ 2 N˛0 and all f 2 C 1 .Q` / satisfying k@ f k1;Q`
Cf f 1
for all 2 N0 ; 2 f1; : : : ; d g
and 2f =jJ;` j r0 for all 2 f1; : : : ; d g for constants Cf 2 R0 , f 2 R>0 and 2 N. Proof. As in the proof of Theorem 4.20, replacing the Lemmas 4.18 and 4.19 by the Lemmas 4.63 and 4.64, and using Cmr ´ Cri Cred 1 .
Application to the kernel function Before we can apply Theorem 4.65, we have to ensure that all relevant sequences of bounding boxes are -regular. Definition 4.66 ( -regular bounding boxes). Let 20; 1Œ. Let .Q t / t2T be a family of bounding boxes for a cluster tree T . It is called -regular if all boxes have to form Q t D J t;1 J t;d for suitable non-trivial intervals J t;1 ; : : : ; J t;d and if J t 0 ; J t; ;
jJ t 0 ; j jJ t; j
holds for all t 2 T ; t 0 2 sons.t /; 2 f1; : : : ; d g:
Let .Q t / t2T and .Qs /s2TJ be families of bounding boxes for the cluster trees T and TJ . In order to be able to apply Theorem 4.65, we have to find a sequence of bounding boxes for each pair t 2 T , t 2 L t of clusters.
142
4 Application to integral operators
Lemma 4.67 (Sequence of ancestors). For each t 2 T and each t 2 L t , there is a sequence .t` /p`D0 with p D level.t / satisfying t0 D root.T /;
tlevel.t/ D t;
tp D t
and t`C1 2 sons.t` /
for all ` 2 f0; : : : ; p 1g:
Proof. By induction on level.t / 2 N0 . If level.t / D 0 holds, t0 D t D root.T / is the solution. Now let n 2 N0 be such that we can find the desired sequence for all t 2 T , t 2 L t with level.t / D n. Let t 2 T and t 2 L t with level.t / D n C 1. Due to Definition 3.6, there is a cluster t C 2 T with t 2 sons.t C / and level.t C / D n. We can apply the induction assumption to get a sequence t0 ; : : : ; tn of clusters with t0 D root.T /, tn D t C and t`C1 2 sons.t` / for all ` 2 f0; : : : ; n1g. We let p ´ nC1 and tp ´ t . Due to definition, we have t 2 sons .tlevel.t/ / and t 2 sons .t /, i.e., ; ¤ tO tOlevel.t/ \ tO, and since the levels of tlevel.t/ and t are identical, Lemma 3.8 yields t D tlevel.t/ , thus concluding the induction. We can use this result to construct sequences of bounding boxes corresponding to the re-interpolation operators I t ;t and Is ;s and get the following error estimate: Theorem 4.68 (Variable-order interpolation). Let .Q t / t2T and .Qs /s2TJ be -regular families of bounding boxes. Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let the interpolation scheme be .ƒ; /-stable. Let 2 R>0 . Let ˇ 2 N0 . There are constants Cvo 2 R1 and ˛0 2 N such that kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs
Cvo .˛ C ˇ`C C 1/.2d C1/C 1 ˛ˇ ` % dist.Q t ; Qs / c0
with `C ´ maxfp level.t/; pJ level.s/g; ` ´ minfp level.t /; pJ level.s/g holds for all ˛ 2 N˛0 , all blocks b D .t; s/ 2 LC J satisfying the standard admissibility condition maxfjJ t; j; jJs; j W 2 f1; : : : ; d gg 2 dist.Q t ; Qs /
(4.70)
and all t 2 L t and s 2 Ls . Proof. Let ˛ 2 N˛0 , let b D .t; s/ 2 LC J satisfying (4.70), and let t 2 L t and s 2 Ls . In order to be able to apply Theorem 4.65, we have to find families of boxes Q D .Q` /p`D0 such that the corresponding re-interpolation operators IQ;˛;ˇ coincide ` ;` with I t ;t or Is ;s .
4.7 Variable-order approximation
143
Without loss of generality, we only consider I t ;t . Let ` t ´ level.t / and `t ´ `
t satisfying t` D t , level.t /. According to Lemma 4.67, there is a sequence .t` /`D0 t` D t and
t`C1 2 sons.t` /
for all ` 2 f0; : : : ; `t 1g:
`
t by Q` ´ Q t` . This family is -regular due to We define the family Q ´ .Q` /`D0 our assumptions, and comparing Definition 4.53 and (4.68), (4.69) yields t ;ˇ I`Q;˛ D I t ;t ;` t
t
with ˛ t ´ ˛ C ˇ.p `t / ˛:
The asymptotic smoothness of g implies that we can apply the re-interpolation error estimate provided by Theorem 4.65 in order to get kg .I t ;t ˝ I /Œgk1;Q t Qs
Cf Cmr .˛ t C ˇ.`t ` t //.d C1/C .1 C c0 / 1 ˛ t ˇ.` t ` t / % dist.Q t ; Qs / c0 Cf Cmr .˛ C ˇ.p ` t //.d C1/C .1 C c0 / 1 ˛ˇ.p ` t / % dist.Q t ; Qs / c0 ˛ˇ ` Cf Cmr .˛ C ˇ`C /.d C1/C .1 C c0 / 1 % : dist.Q t ; Qs / c0
Applying the same reasoning to Is ;s yields kg .I ˝ Is ;s /Œgk1;Q t Qs
Cf Cmr .˛ C ˇ`C /.d C1/C .1 C c0 / 1 ˛ˇ ` % : dist.Q t ; Qs / c0
Due to the stability estimate of Theorem 4.59, we have k.I t ;t ˝ I /Œf k1;Q t Qs Cred ƒd .˛ C ˇ`C /d kf kQ t Qs for all f 2 C.Q t Qs /, and we can conclude kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs kg .I t ;t ˝ I /Œgk1;Q t Qs C k.I t ;t ˝ I /Œg .I ˝ Is ;s /Œgk1;Q t Qs kg .I t ;t ˝ I /Œgk1;Q t Qs C Cred ƒd .˛ C ˇ`C /d kg .I ˝ Is ;s /Œgk1;Q t Qs Cf Cmr Cred ƒd .˛ C ˇ`C C 1/.2d C1/C .1 C c0 / 1 ˛ˇ ` % : dist.Q t ; Qs / c0
144
4 Application to integral operators
Defining the constant Cvo as Cvo ´ Cf Cmr Cred ƒd .1 C c0 / yields the desired estimate. In order to be able to apply Theorem 4.47, we have to separate the clusters t and s appearing in the estimate of Theorem 4.68. We can do this by assuming that the quantities p level.t / and pJ level.s/ describing the number of re-interpolation steps are close to each other: Definition 4.69 (Ccn -consistency). Let Ccn 2 N0 , and let TJ be an admissible block cluster tree. TJ is called Ccn -consistent if j.p level.t // .pJ level.s//j Ccn
for all b D .t; s/ 2 LC J :
For standard grid hierarchies, we can find a uniform bound for the difference between p and pJ for all meshes in the hierarchy, i.e., the left-hand term in Definition 4.69 can be bounded uniformly for all meshes. This implies that the corresponding block cluster trees will be Ccn -consistent for a constant Ccn which does not depend on the mesh, but only on the underlying geometry and the discretization scheme. Before we can apply Theorem 4.47 to bound the spectral error of the variable-order approximation, we have to establish the factorized error estimates of the form (4.53). Corollary 4.70 (Factorized estimate). Let .Q t / t2T and .Qs /s2TJ be -regular families of bounding boxes. Let TJ be Ccn -consistent. Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let the interpolation scheme be .ƒ; /stable. Let 2 R>0 . Let ˇ 2 N0 . There are constants Cin 2 R>0 and ˛0 2 N such that kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs !1=2 !1=2 Cin q ˛Cˇ.pJ level.s// Cin q ˛Cˇ.p level.t// diam.Q t / diam.Qs / holds with
²
c0 c0 ; q ´ min c0 C 1 2
(4.71)
³
for all ˛ 2 N˛0 , all blocks b D .t; s/ 2 LC J satisfying the admissibility condition maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs / and all t 2 L t and s 2 Ls .
(4.72)
4.7 Variable-order approximation
145
Proof. Let ˛ 2 N˛0 and let b D .t; s/ 2 LC J satisfy the admissibility condition (4.72). We have to find bounds for the three important factors appearing in the error estimate of Theorem 4.68. The admissibility condition (4.72) yields 1 .2/ dist.Q t ; Qs / maxfdiam.Q t / ; diam.Qs / g 1 1 .2/ : diam.Q t /=2 diam.Qs /=2
(4.73)
Using the consistency of the block cluster tree TJ , we find ˛ C ˇ`C C 1 ˛ C ˇ.p level.t // C 1 C Ccn ˇ .Ccn ˇ C 1/.˛ C ˇ.p level.t // C 1/; ˛ C ˇ`C C 1 ˛ C ˇ.pJ level.s// C 1 C Ccn ˇ .Ccn ˇ C 1/.˛ C ˇ.pJ level.s// C 1/; and can bound the stability term by .˛ C ˇ`C C 1/.2d C1/C .Ccn ˇ C 1/1=2 .˛ C ˇ.p level.t // C 1/1=2 .˛ C ˇ.pJ level.s// C 1/1=2 :
(4.74)
The convergence term can also be bounded by using the consistency. We get
1 % c0
%
1 c0
˛ˇ `
˛ˇ `
1 ˛ˇ.p level.t//CCcn ˇ % c0 1 Ccn ˇ 1 ˛ˇ.p level.t// D% % ; c0 c0 1 ˛ˇ.pJ level.s//CCcn ˇ % c0 1 Ccn ˇ 1 ˛ˇ.pJ level.s// D% % c0 c0
and conclude 1 ˛ˇ ` 1 Ccn ˇ=2 1 .˛Cˇ.p level.t///=2 % % % c0 c0 c0 .˛Cˇ.pJ level.s///=2 1 % : c0
(4.75)
Combining Theorem 4.68 with (4.73), (4.74) and (4.75) yields the factorized error
146
4 Application to integral operators
estimate kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs 1=2 0 C .˛ C ˇ.p level.t // C 1/.2d C1/C diam.Q t / %.1=.c0 //˛Cˇ.p level.t// 1=2 0 C .˛ C ˇ.pJ level.s// C 1/.2d C1/C ; diam.Qs / %.1=.c0 //˛Cˇ.pJ level.s//
(4.76)
where we abbreviate C 0 ´ Cvo .2/ .Ccn ˇ C 1/1=2 %
1 c0
Ccn ˇ=2 :
As in Remark 4.23, we can see that q>
1 %.1=.c0 //
holds, i.e., we have ´ q%.1=.c0 // > 1. We consider only the first factor on the right-hand side of estimate (4.76): introducing the polynomial S W R ! R;
x 7! C 0 .x C 1/.2d C1/C
yields C 0 .˛ C ˇ.p level.t // C 1/.2d C1/C S.˛ C ˇ.p level.t /// D ˛Cˇ.p level.t// %.1=.c0 // %.1=.c0 //˛Cˇ.p level.t// S.n/ S.n/ D D qn n n .=q/ for n ´ ˛ C ˇ.p level.t //. Since S is a polynomial and due to > 1, we can find a constant Cin 2 R>0 such that S.x/ Cin x
holds for all x 2 R1 ;
and we conclude C 0 .˛ C ˇ.p level.t // C 1/.2d C1/C Cin q n : %.1=.c0 //˛Cˇ.p level.t//
Global spectral error estimate Now we can assemble the spectral error estimate provided by Theorem 4.47, the blockwise estimate of Lemma 4.45 and the factorized error estimate of Corollary 4.70.
4.7 Variable-order approximation
147
We assume that the families .Q t / t2T and .Qs /s2TJ of bounding boxes are regular, that the block cluster tree TJ is Ccn -consistent and Csp -sparse, that the kernel function is .Cas ; ; c0 /-asymptotically smooth with > 0, that the interpolation scheme is .ƒ; /-stable, that the supports . i /i2 and . j /j 2J of the basis functions are Cov -overlapping and that the L2 -norms of the basis functions .'i /i2 and . j /j 2J are bounded by constants C ; CJ 2 R>0 defined as in (4.48). We fix an admissibility parameter 2 R>0 and assume that (4.58) holds, i.e., that the domain is not folded too tightly, and that (4.59) holds, i.e., that the bounding boxes do not grow too rapidly. Lemma 4.71 (Factorized error estimate). Let ˇ 2 N0 . With the constants Cin 2 R>0 , q 20; 1Œ and ˛0 2 N from Corollary 4.70, we have zs k2 k t Gs Vzt Sb W
q Cin C2 Cov j t j
˛Cˇ.p level.t//
!1=2
diam.Q t /
q Cin CJ2 Cov j s j
˛Cˇ.pJ level.s//
!1=2
diam.Qs /
for all ˛ 2 N˛0 and all blocks b D .t; s/ 2 LC J satisfying the admissibility condition (4.72). Proof. We can use Corollary 4.70 to bound the interpolation error by kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b ´
Cin q ˛Cˇ.p level.t// diam.Q t /
!1=2
Cin q ˛Cˇ.pJ level.s// diam.Qs /
!1=2
and since b does not depend on t or s , we can use Lemma 4.56 to conclude zs k2 C CJ b k t Gs Vzt Sb W
X i2tO
1=2 X
j i j
1=2 j j j
:
j 2Os
Since the supports are Cov -overlapping, this estimate can be simplified: zs k2 C CJ b j t j1=2 j s j1=2 : k t Gs Vzt Sb W Combining this estimate with the definition of b yields the desired bound for the blockwise error. Combining Theorem 4.47 with this estimate yields the following error bound for the global spectral error: Theorem 4.72 (Variable-order spectral error). Let ˇ 2 N0 , and let d . With the constants Cin 2 R>0 , q 20; 1Œ and ˛0 2 N from Corollary 4.70 and Ccu 2 R>0 and
148
4 Application to integral operators
d 2 N from (4.58), we have z 2 Csp Ccu Cgrd Cin Cov C CJ hd q ˛ kG Gk p X
ˇ d p `
.q
/
pJ 1=2 X
`D0
.q ˇ d /pJ `
1=2
`D0
for all ˛ 2 R˛0 . Proof. We combine Theorem 4.47 with Lemma 4.71. In the resulting estimate, we use (4.58) and (4.59) in order to find Ccu diam.Q t /d j t j D Ccu diam.Q t /d diam.Q t / diam.Q t / Ccu Cgrd hd .d /.p level.t//
for all t 2 T ;
j s j Ccu diam.Qs /d D Ccu diam.Qs /d diam.Qs / diam.Qs / Ccu Cgrd hd .d /.pJ level.s//
for all s 2 TJ
and notice that combining these factors with the ones from Lemma 4.71 yields the desired result. We can use the rank parameter ˇ to compensate the growth of the clusters and the rank parameter ˛ to reduce the global error to any prescribed accuracy: Corollary 4.73 (Variable-order approximation). Let d . Let C 2 R>0 . We can ensure z 2 C C CJ hd kG Gk by choosing the rank parameters ˛ 2 N and ˇ 2 N0 appropriately. Proof. Let ˇ 2 N0 with
q ˇ d < 1=2:
According to Theorem 4.72, we can choose ˛ 2 N˛0 with 2Csp Ccu Cgrd Cin Cov q ˛ C and observe p X `D0 pJ
X `D0
ˇ d p `
.q
/
.q ˇ d /pJ `
p p ` X 1 `D0 pJ
2
2;
X 1 pJ ` 2; 2 `D0
and applying Theorem 4.72 yields the desired bound.
4.8 Technical lemmas
149
4.8 Technical lemmas Lemma 4.74 (Cardinality of multi-index sets). We have nCd d #f 2 N0 W jj ng D d
(4.77)
for all n 2 N0 and all d 2 N. Proof. We first prove
n X j Cd nCd C1 D d d C1
(4.78)
j D0
for all d 2 N and all n 2 N0 by induction on n. Let d 2 N. For n D 0, the equation is trivial. Assume now that (4.78) holds for n 2 N0 . We conclude the induction by observing n nC1 X j C d X j Cd nCd C1 nCd C1 nCd C1 D C D C d d d d C1 d j D0
j D0
.n C d C 1/Š .n C d C 1/Š C nŠ.d C 1/Š .n C 1/Šd Š .n C d C 1/Š.n C 1/ .n C d C 1/Š.d C 1/ D C .n C 1/Š.d C 1/Š .n C 1/Š.d C 1/Š .n C d C 1/Š.n C d C 2/ nCd C2 D D : .n C 1/Š.d C 1/Š d C1
D
Now we prove (4.77) by induction on d 2 N. For d D 1, we have f 2 Nd0 W jj ng D f0; : : : ; ng; i.e.,
.n C 1/Š nCd #f 2 W jj ng D n C 1 D nŠ D : 1Š d Let us now assume that (4.77) holds for a given d 2 N and all n 2 N0 . Let n 2 N0 . We have n [ d C1 W jj ng D f.; i / W 2 Nd0 ; jj n i g; f 2 N0 Nd0
iD0
which implies #f 2 Nd0 C1 W jj ng D
n X
#f.; i / W 2 Nd0 ; jj n i g
iD0
D
n X iD0
f 2 Nd0 W jj n i g
150
4 Application to integral operators
X n n X j Cd nCd C1 ni Cd D ; D D d d C1 d j D0
iD0
where we have used (4.78) in the last step. Lemma 4.75 (Taylor approximation error). Let ! Rd be a star-shaped domain with center z0 2 Rd . Let m 2 N, and let f 2 C m .!/. For all z 2 !, the approximation error of the Taylor approximation fQz0 ;m .z/ D
X
@ f .z0 /
jj<m
.z z0 / Š
is given by X Z
f .z/ fQz0 ;m .z/ D m
jjDm
1
.1 t /m1 @ f .z0 C .z z0 /t / dt
0
.z z0 / : (4.79) Š
Proof. Let z 2 ! and p ´ z z0 . We define the function F W Œ0; 1 ! R;
t 7! f .z0 C pt /;
which satisfies F .1/ D f .z/, and we define its m-th order Taylor approximation FQm W Œ0; 1 ! R;
t 7!
m1 X i iD0
t .i/ F .0/; iŠ
for all m 2 N. A simple induction yields F .i/ .t/ D
X iŠ @ f .z0 C pt /p ; fQz0 ;m .z/ D FQm .1/ Š
for all i 2 f0; : : : ; mg;
jjDi
so we can apply Lemma 2.1 to get Z 1 1 .1 t /m1 F .m/ .t / dt .m 1/Š 0 X mŠ Z 1 1 D .1 t /m1 @ f .z0 C pt /p dt .m 1/Š Š 0 jjDm X Z 1 .z z0 / Dm ; .1 t /m1 @ f .z0 C pt / dt Š 0
F .1/ FQm .1/ D
jjDm
and this is the desired result.
4.8 Technical lemmas
151
Lemma 4.76 (Holomorphic extension). Let f 2 C 1 Œ1; 1, Cf 2 R>0 , f 2 R>0 and 2 N be such that jf ./ .x/j
Cf f 1
holds for all 2 N0 ; x 2 Œ1; 1:
(4.80)
Let r 2 Œ0; f Œ, and let Rr ´ fz 2 C W there is a z0 2 Œ1; 1 with jz z0 j rg: We have Œ1; 1 Rr , and there is a holomorphic extension fQ 2 C 1 .Rr / of f satisfying f Q jf .z/j Cf for all z 2 Rr : (4.81) f r Proof. For all 2 N, we define the function s W Œ0; 1Œ! R; and observe that s .`/
7!
` . 1 C `/Š D s C` s C` D 1 . 1/Š
1 ; .1 /
holds for all ` 2 N0 ; 2 N:
Computing the Taylor expansion of s at zero yields s ./ D
1 X 1 D0
for all 2 Œ0; 1Œ:
(4.82)
Let us now turn our attention to the function f . For all z0 2 Œ1; 1, we define the disc Dz0 ´ fz 2 C W jz z0 j rg centered at z0 with radius r. Let z0 2 Œ1; 1. For all z 2 Dz0 , we have jz z0 j r < f , and the upper bound (4.80) implies ˇ 1 1 ˇ ˇ X X ˇ ./ jz z0 j ˇf .z0 / .z z0 / ˇ Cf ˇ ˇ Š f 1 D0 D0 1 1 X X r Cf D Cf 1 1 f D0 D0 for ´ r=f 2 Œ0; 1Œ. Comparing this equation with (4.82) yields ˇ 1 ˇ ˇ X ˇ ./ ˇf .z0 / .z z0 / ˇ Cf s ./ < 1; ˇ ˇ Š D0
(4.83)
152
4 Application to integral operators
i.e., the Taylor series in the point z0 converges on Dz0 , therefore the function fQz0 W Dz0 ! C;
z 7!
1 X
f .i/ .z0 /
iD0
.z z0 /i ; iŠ
is holomorphic. Let z 2 Dz0 \ Œ1; 1 R. Combining (4.27) and the error equation (2.5) of the truncated Taylor expansion, we find ˇ ˇ m1 X ˇ .z z0 /i ˇˇ .i/ ˇf .z/ f .z0 / ˇ ˇ iŠ iD0 ˇZ 1 ˇ ˇ .z z0 /m ˇˇ m1 .m/ ˇ D ˇ .1 t / f .z0 C t .z z0 // dt ˇ .m 1/Š 0 Z 1 jz z0 jm dt .1 t /m1 jf .m/ .z0 C t .z z0 //j .m 1/Š 0 m Z 1 jz z0 j .1 t /m1 Cf . 1 C m/Š dt f .m 1/Š 0 . 1 C m/Š r m Cf mŠ f for all m 2 N. Due to r=f < 1, sending m to infinity yields f .z/ D fQz0 .z/ for all z 2 Dz0 \ Œ1; 1, i.e., fQz0 is an extension of f into Dz0 . Now we can define the extension fQ of f into Rr . For all z 2 Rr , we pick a point z0 2 Œ1; 1 with jz z0 j r and define fQ.z/ ´ fQz0 .z/. Since all holomorphic functions fQz0 are extensions of f , the identity theorem for holomorphic functions yields that fQ is a well-defined holomorphic extension of f in Rr . The inequality (4.83) implies (4.81). p Lemma 4.77 (Covering of E% ). Let r 2 R>0 and % ´ r C 1 C r 2 . Let Rr C be defined in Lemma 4.76 and let E% C be the regularity ellipse defined in Lemma 4.14. Then we have E% Rr . Proof. Let z 2 E% , and let x; y 2 R with z D x C iy. We have to find z0 2 Œ1; 1 with jz z0 j r. The definition (4.26) implies 2 2 2x 2y C 1; % C 1=% % 1=% and we can conclude x2
.% C 1=%/2 ; 4
y2
.% 1=%/2 : 4
4.8 Technical lemmas
Rr
153
E%
Figure 4.5. Covering E% by a family of discs.
If x 2 Œ1; 1 holds, we let z0 ´ x and observe p %2 1 .r C 1 C r 2 /2 1 % 1=% jz z0 j D jyj D D 2 2% 2% p p 2 2 2 2r C 2r 1 C r 2 r C 2r 1 C r 2 C 1 C r 1 D D 2% 2% p 2 2r.r C 1 C r / D r; D 2% i.e., z 2 Rr . Otherwise we have jxj > 1. Due to symmetry, we can restrict our attention to the case x > 1. We let z0 ´ 1. We have 1<x
%2 C 1 ; 2%
which implies 4x 2 %2 4%2
.%2 C 1/2 D .%2 C 1/2 ; 4%2
and we can conclude x 2 .%2 1/2 D x 2 .%4 2%2 C 1/ D x 2 .%4 C 2%2 C 1/ 4x 2 %2 D x 2 .%2 C 1/2 4x 2 %2 x 2 .%2 C 1/2 .%2 C 1/2 D .x 2 1/.%2 C 1/2 > .x 2 2x C 1/.%2 C 1/2 D .x 1/2 .%2 C 1/2 : This estimate is equivalent to 4x 2 4.x 1/2 > 2 ; 2 C 1/ .% 1/2
.%2
154
4 Application to integral operators
so we get 2 2 2x 2y 4x 2 %2 4y 2 %2 1 C D 2 C % C 1=% % 1=% .% C 1/2 .%2 1/2 2 2 2 2 2 4.x 1/ % 4y % 4.x 1/ 4y 2 .x 1/2 y2 > C D C D C ; .%2 1/2 .%2 1/2 .% 1=%/2 .% 1=%/2 r2 r2 i.e., .x 1/2 C y 2 < r 2 . This is equivalent to jz z0 j < r and implies z 2 Rr . Lemma 4.78 (Scaling of %). Let % W R0 ! R1 ;
r 7! r C
p
1 C r 2:
Let r0 2 R>0 . For each 2 .0; 1/, there is a O 2 .0; 1/ such that 1 r %.r/ holds for all r 2 Rr0 : %
O Proof. Let x 2 R>0 . We start by considering the function p f W R>1 ! R0 ; ˛ 7! 1 C x 2 ˛ 2 : Elementary computations yield ˛x 2 x2 f 0 .˛/ D p ; f 00 .˛/ D 0; p 1 C x2˛2 .1 C x 2 ˛ 2 / 1 C x 2 ˛ 2
for all ˛ 2 R>0 :
Applying the error equation (2.5) to the second-order Taylor expansion of f in the point 1 yields f .˛/ f .1/ f 0 .1/.˛ 1/ 0
for all ˛ 1;
and we conclude p p x2 1 C x 2 ˛ 2 D f .˛/ f .1/Cf 0 .1/.˛ 1/ D 1 C x 2 C p .˛ 1/: (4.84) 1 C x2 Now let r 2 Rr0 . Applying (4.84) to x ´ r and ˛ ´ 1= > 1 yields p p r2 1 C r 2 ˛ 2 r˛ C 1 C r 2 C p .˛ 1/ 1 C r2 p p r 1 C r2 C r2 2 .˛ 1/ Dr C 1Cr C p 1 C r2 p r D .r C 1 C r 2 / 1 C p .˛ 1/ 1 C r2
%.r= / D %.r˛/ D r˛ C
4.9 Numerical experiments
155
r .˛ 1/ : D %.r/ 1 C p 1 C r2 Due to r r0 0, we have r 2 r02 and r 2 .1 C r02 / D r 2 C r 2 r02 r02 C r 2 r02 D r02 .1 C r 2 /. This means r02 r2 ; 1 C r2 1 C r02 so we can conclude
i.e.,
p
r 1C
r2
r0 q ; 1 C r02
! 1 %.r= / %.r/ 1 C q : 1 C r2 r0
(4.85)
0
We let
q 1 C r02
0. Let the family of interpolation operators .Im /1 mD0 be .ƒ; /-stable. Let 2 .0; 1/ be such that diam.Q t 0 / diam.Q t /
holds for all t 2 T ; t 0 2 sons.t /:
(6.59)
As in Remark 4.23, we denote the convergence rate of the interpolation scheme by ³ ² c0 c0 ; 2 .0; 1/: q ´ min c0 C 1 2 There are constants Cin 2 R>0 and O 2 .0; 1/ which only depend on d , g, , , ƒ and such that Cin C kg gQ t k1;Q t Qs O ˛.level.t/level.t // q ˛Cˇ.p level.t// (6.60) dist.Q t ; Qs / holds for all t 2 T , t C 2 pred.t / and s 2 rowC .t C / satisfying the admissibility condition diam.Q t C / 2 dist.Q t C ; Qs / (which is implied by the standard admissibility condition (4.49)).
268
6 Compression
Proof. Let t 2 T . We start by observing that Cf ´
Cas ; dist.Q t ; Qs /
f ´
dist.Q t ; Qs / c0
fulfill the requirements of Theorem 4.20, which gives us n t 2f diam.Q t / d kg gQ t k1;Q t Qs 2edCf ƒm 1 C .n t C 1/ % : f diam.Q t / We apply (6.59) to bound the diameter of Q t by that of Q t C : n t n t 2f 2f % % : diam.Q t / level.t/level.t C / diam.Q t C / The admissibility condition yields 2f 2 dist.Q t ; Qs / 2 dist.Q t C ; Qs / 2 dist.Q t C ; Qs / 1 D D ; diam.Q t C / c0 diam.Q t C / c0 diam.Q t C / 2c0 dist.Q t C ; Qs / c0 c0 diam.Q t / c0 diam.Q t C / 2c0 dist.Q t C ; Qs / diam.Q t / D D 2c0 f dist.Q t C ; Qs / dist.Q t C ; Qs / dist.Q t C ; Qs / and we conclude
kg gQ t k1;Q t Qs
2edCf ƒdm .1
C 2c0 /.n t C 1/ %
n t
1
:
level.t/level.t C / c0
Now we can use Lemma 4.78 to find a O 2 .0; 1/ such that %.1=.c0 // %.1=.c0 //=O holds, and applying this estimate level.t / level.t C / times yields 1 n t d O n t .level.t/level.t C // kg gQ t k1;Q t Qs 2edCf ƒm .1 C 2c0 /.n t C 1/ % : c0 Remark 4.17 implies %.1=.c0 // > maxf1=.c0 / C 1; 2=.c0 /g D 1=q, so we have ´ q%.1=.c0 // 2 R>1 , and Lemma 3.50 implies that there is a constant Cin 2 R>0 such that 2edCas .1 C 2c0 /ƒd .x C 2/d .x C 1/ x Cin
holds for all x 2 N;
therefore we find kg gQ t k1;Q t Qs d
2edCf ƒ .n t C 2/
d
O n t .level.t/level.t C //
.1 C 2c0 /.n t C 1/
1 % c0
n t
2edCas .1 C 2c0 /ƒd C .n t C 2/d .n t C 1/ n t O n t .level.t/level.t // q n t dist.Q t ; Qs / Cin C O n t .level.t/level.t // q n t : dist.Q t ; Qs / D
Observing n t D ˛ C ˇ.p level.t // ˛ completes the proof.
269
6.8 Refined error control and variable-rank approximation
It is important to note that estimates similar to (6.60) also hold for Taylor or multipole expansion schemes. Based on this estimate, we now derive the required bound for the blockwise spectral error: Lemma 6.39. Let Cin 2 R>0 and O 2 .0; 1/ be such that (6.60) holds for all ˛; ˇ; t; t C 1=2 and s. Let O 2 R>0 , p 2 .0; Cbc / and 2 .0; 1/. We require the following assumptions: • The measure of the support of a cluster can be bounded by a power of its diameter, i.e., there are constants Ccu 2 R1 and d 2 N satisfying for all t 2 T :
j t j Ccu diam.Q t /d
(6.61a)
This assumption defines the “effective dimension” d of the subdomain or submanifold (cf. (4.58)). • The order of the singularity of g is not too high, i.e., we have d . • The growth of bounding boxes can be controlled, and the diameters of bounding boxes of leaf clusters can be bounded by a multiple of the grid parameter h, i.e., there are constants Cgr 2 R>0 and 2 .0; 1/ satisfying p level.t/ diam.Q t / Cgr h
for all t 2 T :
(6.61b)
This assumption is related to the quasi-uniformity of the underlying grid and the regularity of the clustering scheme (cf. (4.59)). • The levels of row and column clusters of admissible blocks are not too far apart, i.e., there is a constant Ccn 2 N0 satisfying j.p level.t // .pJ level.s//j Ccn
for all b D .t; s/ 2 LC J : (6.61c)
This assumption is related to the compatibility of the cluster tree T and TJ and to the “level-consistency” (cf. [18]) of the resulting block cluster tree TJ . We assume that the rank parameters are large enough, i.e., that ˛ ˛
log.p 1=2 / ; log O
ˇ
log. / ; log q
log.Cin Cov Ccu Cgr Csp1=2 .2/ .1 /1=2 . /Ccn =2 C CJ hd =O / log q
hold. This ensures
1 C .p 1=2 /level.t/level.t / .p level.t//=2 .pJ level.s//=2 Csp (6.62) C C 2 pred.t / and s 2 row .t /.
k t Gs A t B t;s k2 O for all t 2 T , t C
s
270
6 Compression
Proof. Let t 2 T , t C 2 pred.t / and s 2 rowC .t C /. We denote the levels of t , t C and s by l t ´ level.t /, l t C ´ level.t C / and ls ´ level.s/. Lemma 4.44 yields k t Gs A t B t;s k2 Cov C CJ j t j1=2 j s j1=2 kg gQ t k1;Q t Qs :
We apply (6.61a) and get k t Gs A t B t;s k2
Cov Ccu C CJ diam.Q t /d =2 diam.Qs /d =2 kg gQ t k1;Q t Qs : Now we use the interpolation error estimate (6.60) in order to find k t Gs A t B t;s k2
Cin Cov Ccu C CJ
diam.Q t /d =2 diam.Qs /d =2 O ˛.l t l C / ˛Cˇ.p l t / t q : dist.Q t ; Qs /
Since .t C ; s/ 2 LC J is admissible, we have 1 1 diam.Q t C / diam.Q t / ; dist.Q t ; Qs / dist.Q t C ; Qs / 2 2 1 dist.Q t ; Qs / dist.Q t C ; Qs / diam.Qs / ; 2 and these inequalities can be used to eliminate the denominator of our estimate and get k t Gs A t B t;s k2 Cin Cov Ccu C CJ .2/
diam.Q t /.d /=2 diam.Qs /.d /=2 O ˛.l t l t C / q ˛Cˇ.p l t / : Now we use (6.61b) in order to prove k t Gs A t B t;s k2 C1 C CJ hd .l t p /=2 .ls pJ /=2 O ˛.l t l t C / q ˛Cˇ.p l t / ;
where the constants are collected in C1 ´ Cin Cov Ccu Cgr .2/ for better readability. Due to our choice of ˛ and ˇ, we have q ˇ and O ˛ p 1=2 , and the consistency assumption (6.61c) yields O ˛.l t l t C / q ˇ.p l t / .p 1=2 /l t l t C . /p l t D .p 1=2 /l t l t C . /.l t l t C /=2 . /.p l t /=2 . /.p l t /=2 D .p 1=2 /l t l t C . /.p l t /=2 . /.p l t C /=2 .p 1=2 /l t l t C . /.p l t /=2 . /.pJ ls /=2 . /Ccn =2 ; therefore our error estimate now takes the form k t Gs A t B t;s k2
C1 C CJ . /Ccn =2 hd q ˛ .p 1=2 /l t l t C .p l t /=2 .pJ ls /=2 :
6.9 Numerical experiments
271
Observing that our choice of ˛ implies . /Ccn =2 q C1 C CJ hd
s
˛
1 O Csp
concludes the proof. It is important to note that the rank parameter ˇ depends only on the constants , and q, but not on O , while ˛ depends on the quotient O ; C CJ hd i.e., if O decays not faster than C CJ hd , both ˛ and ˇ can be chosen as constants independent of h, i.e., the corresponding rank distributions K and L will be bounded independently of the mesh parameter, while the discretization error converges like hd . This means that the approximation resulting from Algorithm 34 will have the same advantages as the variable-order interpolation scheme introduced in Section 4.7, while requiring weaker assumptions and no complicated analysis of re-interpolation operators: the existence of low-rank approximations A t B t for each cluster is sufficient, they do not have to be nested. Other panel clustering schemes, e.g., Taylor expansion or multipole expansion, can also be used to construct these low-rank approximations, and the quasi-optimality result of Lemma 6.36 implies that the adaptively constructed H 2 -matrix approximation will be at least as good as any of these.
6.9 Numerical experiments The theoretical estimates for the complexity of the compression algorithms require the rank distribution of the resulting cluster basis to be bounded. In the case of the recompression of an existing H 2 -matrix, this requirement is easily fulfilled, in the general case we have to rely on the quasi-optimality estimate of Lemma 6.23: if the total cluster bases can be approximated by bounded-rank matrices, then the rank distribution of the resulting cluster basis will also be bounded. In a first series of experiments, we consider the compression of dense matrices in standard array representation. The dense matrices S; D 2 R are given by Z Z 1 'i .x/ 'j .y/ log kx yk2 dy dx for all i; j 2 ; Sij ´ 2 Z Z 1 hx y; n.y/i Dij ´ 'i .x/ 'j .y/ dy dx for all i; j 2 ; 2 kx yk22
272
6 Compression
where is a polygonal curve in two-dimensional space and .'i /i2 is a family of piecewise constant basis functions in L2 . /. The matrices S and D correspond to the two-dimensional single and double layer potential operators. The entries of the matrices S and D are approximated by hierarchical quadrature [21]. In the first experiment, we apply the weighted compression Algorithm 34 to the matrix S, where is a regular polygonal approximation of the unit p circle C ´ fx 2 R2 W kxk2 D 1g. We use the weighting parameters p D 2=3 < 1=2 and D 5=6 < 1 and observe the results given in Table 6.1: the times for the construction of the original matrix (“Matrix”), of the adaptive cluster basis (“Basis”) and of the corresponding
Table 6.1. Variable-order recompression of the two-dimensional single layer potential operator on the circle with O D h2 , based on a dense matrix.
n Matrix Basis Proj Mem Mem=n kX XQ k2 256 0:1 < 0:1 < 0:1 109:2 0:43 1:04 0:6 0:1 < 0:1 223:0 0:44 4:15 512 1024 2:2 0:4 0:2 446:4 0:44 1:35 2048 8:9 1:8 0:7 896:6 0:44 3:76 4096 36:0 7:8 3:1 1794:3 0:44 9:37 8192 144:2 38:3 12:9 3590:4 0:44 2:57 16384 580:6 164:1 55:6 7176:0 0:44 6:38 32768 2318:5 688:3 236:3 14369:1 0:44 1:78 10000
1000
Basis O(n^2)
1000
Proj O(n^2)
100
100 10 10 1
1
0.1 1000
10000
100000
100000
0.1 1000
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
6.9 Numerical experiments
273
Table 6.2. Variable-order recompression of the two-dimensional single layer potential operator on the square with O D h2 , based on a dense matrix.
n Matrix Basis Proj Mem Mem=n kX XQ k2 0:57 2:14 256 0:1 < 0:1 < 0:1 146:1 512 0:6 0:1 < 0:1 283:7 0:55 2:45 1024 2:2 0:4 0:1 553:6 0:54 8:26 8:9 1:9 0:6 1094:7 0:53 3:26 2048 4096 36:0 7:6 2:4 2170:8 0:53 8:67 8192 144:2 37:5 10:1 4318:5 0:53 2:07 16384 576:7 154:1 43:4 8569:3 0:52 5:08 32768 2320:5 654:4 187:0 17040:3 0:52 1:38 10000
1000
Basis O(n^2)
1000
Proj O(n^2)
100
100 10 10 1
1
0.1 1000
10000
100000
100000
0.1 1000
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
H 2 -matrix representation (“Proj”) scale like n2 , which can be considered optimal for a method based on dense matrices and is in accordance with the Lemmas 6.20 and 5.8. The storage requirements scale like n and the approximation error scales like n2 , therefore we can conclude that the weighted compression algorithm indeed finds a variable-order approximation of the dense matrix. In the next experiment, we apply the algorithm to a regular triangulation of the unit square1 S ´ f1; 1g Œ1; 1 [ Œ1; 1 f1; 1g. The results given in Table 6.2 are 1
In practical applications, a graded triangulation would be more appropriate, but the discussion of the
274
6 Compression
Table 6.3. Variable-order recompression of the two-dimensional double layer potential operator on the square with O D h2 , based on a dense matrix.
n Matrix Row Column Proj Mem Mem=n kX XQ k2 256 0:1 < 0:1 < 0:1 < 0:1 153:8 0:60 2:44 512 0:2 0:1 0:1 < 0:1 302:9 0:59 4:35 1024 0:9 0:4 0:4 0:1 598:5 0:58 1:25 2048 3:7 1:9 1:9 0:6 1185:5 0:58 3:76 15:1 7:8 7:6 2:4 2346:5 0:57 6:57 4096 8192 60:4 37:1 31:0 10:1 4620:8 0:56 1:47 16384 243:3 154:0 123:9 43:2 9132:9 0:56 4:78 32768 971:6 601:8 514:0 175:6 18058:7 0:55 1:78 1000
1000
Row basis Column basis O(n^2)
100
100
10
10
1
1
0.1 1000
10000
100000
100000
0.1 1000
Proj O(n^2)
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
very similar to those observed for the unit sphere: again the time for the construction scales like n2 , while the storage requirements of the resulting H 2 -matrix scale like n and the error scales like n2 . The previous two cases are covered by the theory presented in Sections 4.7 and 6.8. The situation is more complicated for the double layer potential operator D on the unit square: a variable-order approximation exists [81], but its construction is technical details connected to this approach is beyond the scope of this book, cf. [66].
6.9 Numerical experiments
275
complicated. Fortunately, the existence of an approximation is already sufficient for Algorithm 34, and the numerical results in Table 6.3 show that it finds an efficient H 2 -matrix approximation. Let us now consider the three-dimensional case. Since the matrix dimensions tend to grow rapidly due to the higher spatial dimension, we can no longer base our experiments on dense matrices, but construct an initial H 2 -matrix and apply recompression to improve its efficiency. We construct this initial approximation by the interpolation approaches discussed in the Sections 4.4 and 4.5. We start with a simple example: the matrix V corresponding to the single layer operator (cf. Section 4.9) is approximated by interpolation with constant order m D 4
Table 6.4. Recompression of the single layer potential operator with O D 4h2 105 , based on an initial approximation by constant-order interpolation with m D 4.
n Build Mem Mem=n MVM kX XQ k2 512 1:9 1:8 3:6 < 0:01 6:26 2048 14:6 8:6 4:3 0:04 9:57 8192 70:4 36:8 4:6 0:23 2:57 32768 322:6 165:5 5:2 1:05 6:38 131072 1381:5 664:5 5:2 4:25 1:28 524288 5975:2 2662:6 5:2 17:72 2:59 10000
10000
Build O(n)
1000
1000
100
100
10
10
1 100
1000
10000
100
100000
1e+06
1 100
MVM O(n)
1e-06
1
1e-07
0.1
1e-08
1000
10000
100000
1000
10000
1e-05
10
0.01 100
Memory O(n)
1e+06
1e-09 100
100000
1e+06
Error O(h^2)
1000
10000
100000
1e+06
276
6 Compression
on a cluster tree with Clf D 16, then we use Algorithm 30 with the variable-order error control scheme (cf. Section 6.8) to construct a more efficient approximation. According to Section 4.6, it is reasonable to use an error tolerance O h2 , where h is again the minimal meshwidth of the underlying triangulation. The results are given in Table 6.4. We can see that the computation time, the storage requirements and the time for the matrix-vector multiplication are in O.n/, as predicted by our theory, and that the error decreases at a rate of approximately h2 . Compared to the uncompressed case (cf. Table 4.1), we can see that the storage requirements are reduced by a factor of more than six, while the approximation error and the computation time are only slightly increased.
Table 6.5. Recompression of the double layer potential operator with O D 4h2 104 , based on an initial approximation by constant-order interpolation with m D 4.
n Build Mem Mem=n MVM kX XQ k2 512 1:9 1:8 3:5 < 0:01 9:15 2048 14:2 7:6 3:8 0:03 9:26 8192 68:6 31:0 3:9 0:18 2:26 32768 312:8 143:3 4:5 0:84 6:57 131072 1360:5 585:2 4:6 3:45 1:77 524288 5937:8 2388:1 4:7 15:00 4:78 10000
10000
Build O(n)
1000
1000
100
100
10
10
1 100
1000
10000
100
100000
1e+06
1 100
Memory O(n)
1000
10000
0.0001
MVM O(n)
10
100000
1e+06
Error O(h^2)
1e-05
1 1e-06 0.1 1e-07
0.01
0.001 100
1000
10000
100000
1e+06
1e-08 100
1000
10000
100000
1e+06
6.9 Numerical experiments
277
Table 6.6. Variable-order recompression of the single layer potential operator with O D 2h3 102 , based on an initial approximation by variable-order interpolation with ˛ D 2 and ˇ D 1.
Build Mem Mem=n MVM kX XQ k2 n 1:0 1:7 3:5 < 0:01 2:94 512 2048 6:9 7:3 3:6 0:02 1:24 8192 43:1 30:7 3:8 0:18 6:06 32768 267:9 142:9 4:5 0:84 6:77 131072 1574:9 590:5 4:6 3:49 8:28 524288 8271:4 2449:4 4:8 15:60 9:89 2097152 38640:7 9921:5 4:8 65:74 1:29 100000
100000
Build O(n)
10000
10000
1000
1000
100
100
10
10
1 100
1000
10000
100000
100
1e+06
1e+07
1 100
Memory O(n)
1000
10000
0.001
MVM O(n)
100000
1e+06
1e+07
Error O(h^3)
0.0001
10
1e-05 1 1e-06 0.1 1e-07 0.01
0.001 100
1e-08
1000
10000
100000
1e+06
1e+07
1e-09 100
1000
10000
100000
1e+06
1e+07
In the next experiment, we construct an approximation of the matrix K corresponding to the double layer operator (cf. Section 4.9) by using the derivatives of local interpolants (cf. Section 4.5) and again use Algorithm 30 with variable-order error control and a tolerance of O h2 . The results given in Table 6.5 are comparable to the previous experiment: the computation times are almost identical, and the storage requirements for K are slightly lower than for V . We have seen in Chapter 4 that variable-order interpolation schemes yield a very good approximation quality compared to their time and storage requirements, and of course we would like to ensure that these properties are preserved by the recompression algorithm. Combining an initial variable-order approximation with the variable-order
278
6 Compression
Table 6.7. Variable-order recompression of the double layer potential operator with O D 2h3 101 , based on an initial approximation by constant-order interpolation with m D 6.
Build Mem Mem=n MVM kX XQ k2 n 8:0 1:7 3:4 < 0:01 3:33 512 2048 71:4 7:2 3:6 0:02 1:13 8192 521:4 28:9 3:6 0:17 3:25 32768 2373:9 134:5 4:2 0:79 7:66 131072 10102:2 567:8 4:4 3:43 3:57 524288 41980:5 2353:6 4:6 15:52 3:38 100000
10000
Build O(n)
10000
Memory O(n)
1000
1000 100 100 10
10
1 100
1000
10000
100
100000
1e+06
1 100
1000
10000
0.01
MVM O(n)
100000
1e+06
Error O(h^3)
0.001
10
0.0001 1 1e-05 0.1 1e-06 0.01
0.001 100
1e-07
1000
10000
100000
1e+06
1e-08 100
1000
10000
100000
1e+06
recompression algorithm yields the results reported in Table 6.6: compared to the uncompressed case (cf. Table 4.6), the storage requirements are reduced by a factor of four, the approximation error is even smaller than before, only the computation time is increased. This latter effect can be explained by a closer look at the proof of Theorem 6.27: we have to bound the product of a ninth-order polynomial and an exponential term, and the corresponding constant is so large that the asymptotic behaviour is not yet visible. As a final example, we consider the variable-order recompression of the matrix K corresponding to the double layer potential operator. It is possible to base the recompression on variable-order interpolation schemes [23] or on Taylor expansions [90], [100], but the implementation would be relatively complicated and not very
6.9 Numerical experiments
279
efficient. Instead, we simply use an initial approximation constructed by constant-order interpolation with m D 6 and compress it using the strategy described in Section 6.8. The experimental results given in Table 6.7 show that the compression strategy is successful: the error decays like h3 , the storage requirements increase only like n.
Chapter 7
A priori matrix arithmetic
The structure of H - and H 2 -matrices is purely algebraic, therefore it is straightforward to wonder whether it is possible to perform matrix arithmetic operations like addition, multiplication, inversion or factorizations efficiently in these data-sparse formats. Factorizations could be used to construct efficient solvers for systems of linear equations, the matrix inverse would be a useful tool for the approximation of matrix functions or the solution of matrix equations like Lyapunov’s or Riccati’s. Both the inversion and the basic Cholesky and LU -factorizations can be expressed by means of sums and products of matrices, therefore we focus on algorithms for performing these fundamental operations efficiently. We consider three ways of handling sums and products of two H 2 -matrices A and B with different structure: • We can prescribe an H 2 -matrix space and compute the best approximations of A C B and AB C C in this space. Using orthogonal cluster bases, we can use the results of Chapter 5 in order to handle these computations very efficiently. • The exact sum A C B and the exact product AB are elements of H 2 -matrix spaces with suitably chosen block cluster trees and cluster bases. • We can compute an auxiliary approximation of a sum or a product in the form of a hierarchical matrix, which can then be approximated by an H 2 -matrix with adaptively chosen cluster bases by applying Algorithm 33. The three methods have different advantages and disadvantages: the first approach computes the best approximation of the result in a prescribed H 2 -matrix space, i.e., in order to reach a given precision, the space has to be chosen correctly. Constructing a suitable matrix space can be quite complicated in practical applications, but once it is available, the algorithm reaches the optimal order of complexity. The second approach yields exact sums and products, but at the cost of a significant increase both in the number of nodes in the block cluster tree and in the rank. Therefore it is, at least at the moment, only interesting for theoretical investigations, but computationally too complex to be used in practical applications. The third approach combines both methods: it computes an intermediate approximation by a technique similar to the second approach, relying on simple blockwise low-rank approximations instead of an exact representation in an H 2 -matrix format. Since this approach is only loosely related to the other two, it is discussed in the separate Chapter 8.
7.1 Matrix forward transformation
281
The current chapter is organized as follows: • Section 7.1 introduces the matrix forward transformation algorithm, a variant of the forward transformation Algorithm 6 that applies to matrices and is of crucial importance for the algorithms introduced in the following Sections 7.4 and 7.7. • Section 7.2 is devoted to its counterpart, the matrix backward transformation algorithm that is also required in Sections 7.4 and 7.7. • Section 7.3 contains the definitions and notations required to investigate the matrix addition. • Section 7.4 considers an algorithm for computing the best approximation of the sum of two matrices in a given H 2 -matrix space (cf. [11], Section 3.3). • Section 7.5 considers the computation of the exact sum of two matrices in an extended H 2 -matrix space that will usually be too large for practical applications. • Section 7.6 contains the definitions and notations required to investigate the matrix multiplication. • Section 7.7 is devoted to an algorithm for computing the best approximation of the product of two matrices in a given H 2 -matrix space (cf. [11], Section 4). This algorithm and the proof of its optimal complexity are the main results of this chapter. • Section 7.8 considers the computation of the exact product of two matrices in an extended H 2 -matrix space that will usually be far too large for practical applications. • Section 7.9 presents numerical experiments that show that the complexity estimates for the algorithm introduced in Section 7.7 are of optimal order. Assumptions in this chapter: We assume that cluster trees T , TJ and TK for the finite index sets , J and K, respectively, are given. Let n ´ #, nJ ´ #J and nK ´ #K denote the number of indices in each of the sets, and let c ´ #T , cJ ´ #TJ and cK ´ #TK denote the number of clusters in each of the cluster trees.
7.1 Matrix forward transformation In this chapter, we frequently have to compute representations of blocks of an H 2 matrix in varying cluster bases. Differently from the simple techniques introduced in Section 5.3, which only handle transformations between admissible leaves of a block cluster tree, we require transformations between admissible leaves, inadmissible leaves and non-leaf blocks.
282
7 A priori matrix arithmetic
We now introduce algorithms for handling these transformations efficiently. Let A 2 H 2 .TA;J ; VA ; WA /, where VA D .VA;t / t2T and WA D .WA;s /s2TJ are nested cluster bases for T and TJ , respectively, and TA;J is an admissible block cluster tree. Let V D .V t / t2T and W D .Ws /s2TJ be nested cluster bases for T and TJ , respectively, and let K D .K t / t2T and L D .Ls /s2TJ be the rank distributions for V and W , respectively. We consider the computation of the matrix family SyA D .SyA;b /b2TA;J (cf. Figure 7.1) defined by SyA;b ´ V t AWs 2 RK t Ls
for all b 2 TA;J :
(7.1)
If V and W are orthogonal cluster bases, the matrix V t SyA;b Ws D V t V t AWs Ws is the best approximation of the block t As with respect to the Frobenius norm (cf. Lemma 5.5), but the matrices SyA D .SyA;b /b2TA;J are also useful in the nonorthogonal case (cf. equation (7.17) used for computing matrix-matrix products).
Figure 7.1. Blocks involved in the matrix forward transformation.
If b D .t; s/ 2 TA;J is an inadmissible leaf of TA;J , both t and s have to be leaves of T and TJ , respectively, and we can compute SyA;b directly by using its definition (7.1). If b is an admissible leaf of TA;J , we have t As D VA;t SA;b WA;s
and conclude Ws D PV;t SA;b PW;s ; SyA;b D V t AWs D V t VA;t SA;b WA;s
(7.2)
where the cluster operators PV D .PV;t / t2T and PW D .PW;s /s2TJ are defined by PV;t ´ V t VA;t ;
PW;s ´ WA;s Ws ;
for all t 2 T ; s 2 TJ :
We can see that PV and PW are cluster basis products of the cluster bases V , VA , W and WA , so we can use Algorithm 13 to compute these cluster operators efficiently.
7.1 Matrix forward transformation
283
This leaves us with the case that b is not a leaf, i.e., that sons.TA;J ; b/ ¤ ; holds. According to Definition 3.12, we have 8 ˆ if sons.t / D ;; sons.s/ ¤ ;; .˛ C ˇ.` 1//r for s 2 row .TA;J ; t /g; DC;` ´ ft 2 T W #KC;t > .˛ C ˇ.` 1//r g; O 1//r g DM;` ´ ft 2 T W #KM;t > .˛O C ˇ.` O for all ` 2 N. Let ` 2 N, and let t 2 DM;` . According to the definition of ˛O and ˇ, this implies X .Csp C 2/.˛ C ˇ`/r < #KM;t D #KC;t C #KA;t C #KB;s : (7.31) s2row .TA;J ;t/
Due to row .TA;J ; t / row.TA;J ; t /, we have # row .TA;J ; t / Csp and find that (7.31) can only hold if at least one of the following holds: Case 1: #KC;t > .˛ C ˇ.` 1//r . This implies t 2 DC;` . Case 2: #KA;t > .˛ C ˇ.` 1//r . This implies t 2 DA;` . Case 3: There exists a cluster s 2 row .TA;J ; t / with #KB;s > .˛ C ˇ.` 1//r . This implies t 2 DB;` . We conclude that DM;` DC;` [ DA;` [ DB;` (7.32) holds. Since KA and KC are .Cbn ; ˛; ˇ; r; /-bounded, we have #DC;` Cbn ` c ;
#DA;` Cbn ` c :
For the set DB;` , we have DB;` D ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 row .TA;J ; t /g ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 row.TA;J ; t /g
324
7 A priori matrix arithmetic
D ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 TJ with t 2 col.TA;J ; s/g [ D col.TA;J ; s/ s2TJ #KB;s >.˛Cˇ.`1//r
and can use the sparsity of TA;J to find #DB;` Csp #fs 2 TJ W #KB;s > .˛ C ˇ.` 1//r g Csp Cbn ` cJ Csp Cbn Ccr ` c : Combining the estimates for the cardinalities of DC;` , DA;` and DB;` with the inclusion (7.32) yields the required bound for #DM;` . Case 2: bA is admissible If bA is admissible, we have t As D VA;t SA;bA WA;s
and need to ensure that the cluster bases VM and WM are chosen in such a way that t As Br D VA;t SA;bA WA;s s Br
can be expressed. Case 1 already covers the situation that bB is admissible. If bB is not admissible, we have to ensure that the cluster basis WM is able to express r B s WA;s , i.e., that there is a matrix RW;r;s with r B s WA;s D WM;r RW;r;s ;
(7.33)
since then the product takes the form t As Br D VA;t SA;bA WA;s s Br D VA;t SA;bA RW;r;s WM;r ;
i.e., it can be expressed in VA and WM . Since bB is not admissible, we have bB 2 TB;JK , i.e., s 2 col.TB;JK ; r/, and since TB;JK is Csp -sparse, we only have to ensure (7.33) for up to Csp clusters s. We introduce the set C col .TB;JK ; r/ ´ fs 2 TJ W .s; r/ 2 TB;JK n LB;JK g
D col.TB;JK ; r/ n colC .TB;JK ; r/ and can now define a cluster basis satisfying our requirements.
7.8 Exact matrix-matrix multiplication
325
Definition 7.24 (Induced column basis). Let all index sets in the rank distributions LA D .LA;s /s2TJ , LB D .LB;r /r2TK and LC D .LC;r /r2TK be disjoint. We define the induced column cluster basis for the multiplication by WM ´ .WM;r /r2TK with WM;r ´ WC;r
WB;r
r B WA;s1
:::
r B WA;s
(7.34)
for all r 2 TK , where ´ # col .TB;JK ; r/ is the number of inadmissible blocks connected to r and fs1 ; : : : ; s g D col .TB;JK ; r/ are the corresponding row clusters. The rank distribution for WM is given by LM ´ .LM;r /r2TK with P LB;r [ P LA;s1 [ P [ P LA;s : LM;r ´ LC;r [ Lemma 7.25. The induced column cluster basis WM is nested. If the rank distributions LA , LB and LC are .Cbn ; ˛; ˇ; r; /-bounded and if cJ Ccr cK holds, the rank distribution LM of the induced column cluster basis O r; /-bounded with is .Cybn ; ˛; O ˇ; Cybn ´ Cbn .Ccr Csp C 2/;
˛O ´ ˛
p r
Csp C 2;
ˇO ´ ˇ
p r
Csp C 2:
Proof. Similar to the proofs of the Lemmas 7.22 and 7.23. Case 3: bA and bB are inadmissible We cannot handle the case that bA and bB are both inadmissible while bC is admissible, since it leaves us with no information whatsoever about the structure of the product t As Br . Therefore we have to ensure that we never encounter the situation that bC is an admissible leaf, but bA and bB are inadmissible. This can be done by choosing the block cluster tree TM;K appropriately: we construct TM;K based on TC;K and subdivide its leaves .t; r/ as long as there is a cluster s 2 TJ such that .t; s/ and .s; r/ are both inadmissible and at least one of them can be subdivided further. A block in TM;K is considered admissible if it can be expressed by VM and WM , i.e., if for each s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK at least one of the blocks is admissible. Definition 7.26 (Induced block cluster tree). Let TM;K be the minimal block cluster tree for T and TK satisfying the following conditions: • A block b D .t; r/ 2 TM;K is subdivided if it is subdivided in TC;K or if there exists a cluster s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK such that both are inadmissible and at least one of them is subdivided, i.e., the sons of b
326
7 A priori matrix arithmetic
in TM;K are given by 8 ˆ sonsC .t / sonsC .r/ ˆ ˆ ˆ ˆ < sons.TM;K ; b/ D ˆ ˆ ˆ ˆ ˆ :;
if b 2 C or there exists s 2 TJ with .t; s/ 2 A ; .s; r/ 2 B or .t; s/ 2 A ; .s; r/ 2 B ; otherwise: (7.35)
• A block b D .t; r/ 2 TM;K is admissible if and only if for all s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK at least one of these blocks is admissible, i.e., C b D .t; r/ 2 LM;K () .8s 2 TJ W .t; s/ 2 TA;J ^ .s; r/ 2 TB;JK C C ) .t; s/ 2 LA;J _ .s; r/ 2 LB;JK /: (7.36)
Then TM;K is called the induced block cluster tree for the multiplication. Lemma 7.27 (Admissibility). The induced block cluster tree TM;K is admissible. Proof. In order to prove that TM;K is admissible, we have to ensure that each inadmissible leaf b D .t; r/ 2 TM;K corresponds to leaf clusters t and r. Let b D .t; r/ 2 LM;K be an inadmissible leaf. According to (7.36), this means that there exists a cluster s 2 TJ satisfying .t; s/ 2 TA;J and .s; r/ 2 TB;JK with C C .t; s/ 62 LA;J and .s; r/ 62 LB;JK , i.e., .t; s/ 2 A and .s; r/ 2 B . If sons.t/ ¤ ; would hold, we would have .t; s/ 62 LA;J , and .t; s/ 2 A D C TA;J n LA;J would imply .t; s/ 2 A D TA;J n LA;J . By (7.35), we would get sons.TM;K ; b/ ¤ ;, which is impossible, since b is a leaf of TM;K . Therefore we can conclude sons.t / D ;. We can apply the same reasoning to r in order to prove sons.r/ D ;, which implies that TM;K is indeed an admissible block cluster tree. Lemma 7.28 (Sparsity). If TA;J , TB;JK and TC;K are Csp -sparse, the tree TM;K is Csp .Csp C 1/-sparse. Proof. Let t 2 T . We prove row.TM;K ; t / row.TC;K ; t / [ fr 2 T W there exists s 2 TJ with .t; s/ 2 TA;J ; .s; r/ 2 TB;JK g:
(7.37)
Let r 2 row.TM;K ; t /. If b ´ .t; r/ D root.TM;K /, we have t D root.T / and r D root.TK /, i.e., .t; r/ D root.TC;K / 2 TC;K .
7.8 Exact matrix-matrix multiplication
327
Otherwise, b has to have a father, i.e., a block b C D .t C ; r C / 2 TM;K with b 2 sons.TM;K ; b C /. If b C 2 C , we have b 2 sons.TM;K ; b C / D sons.TC;K ; b C / and therefore r 2 row.TC;K ; t /. Otherwise, we can find a cluster s C 2 TJ with bAC ´ .t C ; s C / 2 A TA;J and bBC ´ .s C ; r C / 2 B TB;JK or with bAC D .t C ; s C / 2 A TA;J and bBC D .s C ; r C / 2 B TB;JK . We let s 2 sonsC .s C / and bA ´ .t; s/ and bB ´ .s; r/. Due to t 2 sonsC .t C / and r 2 sonsC .r C /, we can apply Lemma 7.17 in order to find bA 2 TA;J and bB 2 TB;JK , which proves (7.37). This inclusion yields X # row.TM;K ; t / # row.TC;K ; t / C # row.TB;JK ; s/ s2row.TA;J ;t/
Csp C
Csp2
D Csp .Csp C 1/:
A similar argument can be applied to demonstrate # col.TM;K ; r/ Csp .Csp C 1/ and conclude this proof. Theorem 7.29 (Exact matrix multiplication). Let VM D .VM;t / t2T be the induced row cluster basis from Definition 7.21 and let WM D .WM;r /r2TK be the induced column cluster basis from Definition 7.24. Let TM;K be the induced block cluster tree from Definition 7.26. Then we have M D C C AB 2 H 2 .TM;K ; VM ; WM /. Proof. Due to the construction of TM;K , VM and WM , we can directly see that C 2 H 2 .TM;K ; VM ; WM / holds. Since H 2 .TM;K ; VM ; WM / is a matrix space (cf. Remark 3.37), it is sufficient to prove AB 2 H 2 .TM;K ; VM ; WM /. We prove the more general statement t As Br 2 H 2 .TM;K ; VM ; WM /
(7.38)
for all t 2 T , s 2 TJ and r 2 TK with .t; s/ 2 TA;J , .s; r/ 2 TB;JK and .t; r/ 2 TM;K . For t D root.T /, s D root.TJ / and r D root.TK /, this implies the desired property. C holds, i.e., that Case 1: Let us first consider the case that bA ´ .t; s/ 2 LA;J bA is an admissible leaf. We have seen in Case 1 that t As Br D VA;t .SA;bA RW;r;s /WM;r 2 H 2 .TM;K ; VM ; WM /;
holds, i.e., the product can be expressed by the induced row and column cluster bases. C Case 2: Let us now consider the case that bB ´ .s; r/ 2 LB;JK holds, i.e., that bB is an admissible leaf. According to Case 2, we have t As Br D VM;t .RV;t;s SB;bB /WB;r 2 H 2 .TM;K ; VM ; WM /;
328
7 A priori matrix arithmetic
i.e., the product can once more be expressed in the induced cluster bases. Induction: We prove (7.38) by an induction on the number of descendants of bA and bB , i.e., on # sons .bA / C # sons .bB / 2 N2 . Let us first consider bA D .t; s/ 2 TA;J and bB D .s; r/ 2 TB;JK with # sons .bA / C # sons .bB / D 2. This implies sons .bA / D fbA g and sons .bB / D fbB g, i.e., bA and bB are leaves. If at least one of them is admissible, we have already established that (7.38) holds. Otherwise, i.e., if bA and bB are inadmissible leaves, (7.36) implies that bC ´ .t; r/ is also an inadmissible leaf of TM;K , i.e., (7.38) holds trivially. Let now n 2 N2 , and assume that (7.38) holds for all t 2 T , s 2 TJ and r 2 TK with bA D .t; s/ 2 TA;J , bB D .s; r/ 2 TB;JK and bC D .t; r/ 2 TM;K satisfying # sons .bA / C # sons .bB / n. Let t 2 T , s 2 TJ and r 2 TK with bA D .t; s/ 2 TA;J , bB D .s; r/ 2 TB;JK and bC D .t; r/ 2 TM;K and # sons .bA / C # sons .bB / D n C 1. C C If bA or bB are admissible leaves, i.e., if bA 2 LA;J or bB 2 LB;JK holds, we can again use Case 1 or Case 2 to conclude that (7.38) holds. Otherwise, i.e., if bA and bB are inadmissible, they cannot both be leaves, since this would imply # sons .TA;J ; bA / C # sons .TB;JK ; bB / D 2 < n C 1. Therefore at least one of bA and bB has to be subdivided. Definition 7.26 yields sons.TM;K ; bC / D sonsC .t / sonsC .r/, and we observe X X X t 0 As 0 Br 0 : t As Br D t 0 2sonsC .t/ s 0 2sonsC .s/ r 0 2sonsC .r/
For all t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/, Lemma 7.17 yields .t 0 ; s 0 / 2 TA;J and .s 0 ; r 0 / 2 TB;JK . Since at least one of bA and bB is subdivided, we can apply the induction assumption to get t 0 As 0 Br 0 2 H 2 .TM;K ; VM ; WM / for all t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/. According to Remark 3.37, we can conclude X X X t As Br D t 0 As 0 Br 0 2 H 2 .TM;K ; VM ; WM /: t 0 2sonsC .t/ s 0 2sonsC .s/ r 0 2sonsC .r/
7.9 Numerical experiments On a smooth surface, the matrices V and K introduced in Section 4.9 correspond to discretizations of asymptotically smooth kernel functions. The products V V , VK, K V and KK of the matrices correspond, up to discretization errors, to the convolutions of these kernel functions, and we expect (cf. [63], Satz E.3.7) that these convolutions
7.9 Numerical experiments
329
will again be asymptotically smooth. Due to the results of Chapter 4, this means that they can be approximated by polynomials and that the approximation will converge exponentially if the order is increased. Up to errors introduced by the discretization process, the matrix products V V , K V , VK and KK can be expected to share the properties of their continuous counterparts, i.e., it should be possible to approximate the products in the H 2 -matrix space defined by polynomials of a certain order m, and the approximation should converge exponentially as the order increases. In the first example, we consider a sequence of polygonal approximations of the unit sphere S D fx 2 R3 W kxk D 1g consisting of n 2 f512; 2048; : : : ; 131072g plane triangles. On each of these grids, we construct orthogonal nested cluster bases by applying Algorithm 16 to the polynomial bases introduced in Sections 4.4 and 4.5. Together with a standard block cluster tree, these cluster bases define H 2 -matrix spaces, and we use Algorithm 45 to compute the best approximation of the products V V , K V , VK and KK in these spaces. The results are collected in Table 7.1 (which has been taken from [11]). For each grid, each product and each interpolation order, it contains the
Table 7.1. Multiplying double and single layer potential on the unit sphere.
n mD2 512 2:2 1:54 2048 13:0 2:64 8192 66:5 4:64 32768 283:9 5:44 131072 1196:8 5:64 512 2:3 2:63 KV 2048 13:8 5:43 8192 71:1 1:02 32768 304:3 1:62 131072 1257:4 2:32 VK 512 2:3 5:03 2048 14:0 2:12 8192 72:7 4:22 32768 313:1 7:02 131072 1323:6 1:11 KK 512 2:2 6:94 2048 12:9 2:93 8192 66:4 6:13 32768 283:9 1:12 131072 1169:1 1:62
Oper. VV
mD3 1:0 8:57 32:3 9:66 184:2 3:65 897:0 4:95 3625:6 4:75 1:0 4:95 34:4 1:74 196:7 8:54 959:9 2:33 3876:3 4:03 1:0 7:86 35:6 4:24 204:3 2:03 1003:3 4:13 4101:6 7:13 1:0 9:46 32:4 5:25 184:0 2:64 894:0 5:54 3602:0 9:64
mD4 0:4 conv. 40:8 4:67 355:2 1:66 1919:2 2:36 7918:5 3:66 0:4 conv. 41:7 5:96 374:8 5:35 1982:2 1:44 8361:8 2:84 0:4 conv. 41:7 1:95 395:3 1:34 2098:5 2:64 8744:8 5:14 0:4 conv. 40:6 4:86 354:4 1:85 1881:5 3:55 7839:9 6:85
330
7 A priori matrix arithmetic
time1 in seconds for computing the product by Algorithm 45 and the resulting approximation error in the relative operator norm kX ABk2 =kABk2 , which was estimated using a power iteration. We can see that increasing the interpolation order m will typically reduce the approximation error by a factor of 10, i.e., we observe the expected exponential convergence. Since we are working with interpolation in three space dimensions, the rank of the cluster bases is bounded by k D m3 , i.e., we expect a behaviour like O.nk 2 / D O.nm6 / in the computation time. Especially for higher interpolation orders and higher problem dimensions, this behaviour can indeed be observed. In Table 7.2, we investigate the behaviour of the polynomial approximation on the surface C D @Œ1; 13 of the cube. Since it is not a smooth manifold, we expect Table 7.2. Multiplying double and single layer potential on the unit cube.
n mD2 768 2:0 2:93 3072 10:9 3:63 12288 49:7 3:83 49152 208:6 4:03 196608 833:0 4:03 KV 768 2:2 6:32 3072 11:7 7:02 12288 53:4 8:52 49152 222:8 8:62 196608 869:8 8:22 VK 768 2:4 1:91 3072 11:8 2:71 12288 55:0 3:41 49152 232:3 3:71 196608 930:5 3:71 KK 768 2:0 2:72 3072 11:0 3:52 12288 49:7 4:72 49152 206:5 5:72 196608 804:8 6:32
Oper. VV
mD3 3:8 3:54 32:0 6:34 176:7 7:94 850:7 8:54 3692:2 8:64 3:9 5:53 34:9 1:42 200:4 1:92 978:0 2:12 4245:9 2:12 3:8 4:42 37:7 1:11 214:5 1:61 1059:7 2:01 4614:4 2:31 3:8 5:13 32:6 1:22 185:3 1:92 903:1 2:42 3945:2 2:72
mD4 1:2 conv. 133:7 2:04 455:6 2:84 1867:4 3:44 7542:0 3:64 1:2 conv. 134:6 3:23 470:0 7:23 1922:1 8:63 7862:8 9:13 1:2 conv. 134:4 3:92 484:0 8:32 2045:5 1:21 8454:1 1:41 1:2 conv. 132:6 5:13 454:3 9:43 1823:6 1:42 7374:3 1:82
that the smoothness of the result of the multiplication, and therefore the speed of convergence, will be reduced. This is indeed visible in the numerical experiments: we no longer observe a convergence like 10m , but only 2m for V V and K V and even 1
On a single 900 MHz UltraSPARC IIIcu processor of a SunFire 6800 computer.
7.9 Numerical experiments
331
slower convergence for VK and KK. The latter effect might be a consequence of the fact that the column cluster bases used for K involve the normal derivative of the kernel function, which reduces the order of smoothness even further.
Chapter 8
A posteriori matrix arithmetic
In the previous chapter, we have considered two techniques for performing arithmetic operations like matrix addition and multiplication with H 2 -matrices. The first approach computes the best approximation of sums and products in a prescribed H 2 -matrix space, and this can be accomplished in optimal complexity. Since the a priori construction of suitable cluster bases and block partitions can be very complicated in practical applications, the applicability of this technique is limited. The second approach overcomes these limitations by constructing H 2 -matrix spaces that can represent the exact sums and products without any approximation error. Unfortunately, the resulting matrix spaces lead to a very high computational complexity, and this renders them unattractive as building blocks for higher-level operations like the LU factorization or the matrix inversion. In short, the first approach reaches the optimal complexity, while the second approach reaches the optimal accuracy. In this chapter, we introduce a third approach that combines the advantages of both: it allows us to construct adaptive cluster bases a posteriori which ensure a prescribed accuracy and it does not lead to excessively high storage requirements. The new algorithm consists of three parts: first the exact matrix-matrix product is computed in an intermediate representation that is closely related to the left and right semi-uniform matrices introduced in Section 6.1. Since the block cluster tree for the intermediate representation would lead to prohibitively high storage requirements, we do not construct this representation explicitly, but instead approximate it on the fly by a suitable H -matrix format with a coarse block structure. In the last step, this H -matrix is converted into an H 2 -matrix by Algorithm 33. The resulting procedure for approximating the matrix-matrix product is sketched in Figure 8.1. In practical applications, it is advisable to combine the second and third step in order to avoid the necessity of storing the entire intermediate H -matrix (cf. Algorithm 33), but for our theoretical investigations, it is better to treat them separately. The accuracy of these two approximation steps can be controlled, therefore the product can be computed with any given precision. Since the product is computed block by block, the new procedure can be used as a part of inversion and factorization algorithms. The price for its flexibility is an increased algorithmic complexity: the new algorithm does not scale linearly with the number of degrees of freedom, but involves an additional logarithmic factor. The chapter is organized as follows: • Section 8.1 introduces the structure of semi-uniform matrices, a generalization of the left and right semi-uniform matrices defined in Section 6.1.
333 • Section 8.2 contains an algorithm for computing the matrix M0 ´ C C AB exactly and representing it as a semi-uniform matrix. • Section 8.3 introduces an algorithm for coarsening the block structure to compute z that approximates M0 . an H -matrix M • Section 8.4 is devoted to the construction of adaptive cluster bases for this H z into an H 2 -matrix approximation of C C AB. matrix and the conversion of M • Section 8.5 presents numerical experiments that demonstrate that the new multiplication algorithm ensures the desired accuracy and produces data-sparse approximations of the exact result. H 2 -matrices A, B, C
Input A, B, C
Exact computation 2 Semi-uniform matrix M0 2 Hsemi .TM;K ; VA ; WB /
M0 ´ C C AB
Coarsening z 2 H .TC;K ; kH / H -matrix M
z M0 M
Compression M C C AB
H 2 -matrix M with adaptive cluster bases
Figure 8.1. A posteriori multiplication.
Assumptions in this chapter: We assume that cluster trees T , TJ and TK for the finite index sets , J and K, respectively, are given. Let n ´ #, nJ ´ #J and nK ´ #K denote the number of indices in each of the sets, and let c ´ #T , cJ ´ #TJ and cK ´ #K denote the number of clusters in each of the cluster trees. Let TA;J , TB;JK and TC;K be admissible block cluster trees for T and TJ , TJ and TK , and T and TK , respectively. Let VA and VC be nested cluster bases for T with rank distributions KA and KC , let WA and VB be nested cluster bases for TJ with rank distributions LA and KB , and let WB and WC be nested cluster bases for TK with rank distributions LB and LC .
334
8 A posteriori matrix arithmetic
8.1 Semi-uniform matrices The new algorithm for approximating the matrix-matrix product can be split into two phases: first an exact representation of the product is computed, then this representation is converted into an H 2 -matrix with adaptively-chosen cluster bases. In Section 7.8, we have seen that the exact product is an H 2 -matrix for a modified block cluster tree TM;K and modified cluster bases VM and WM . In practice, these modified cluster bases VM and WM will have very high ranks, i.e., the representation of the exact product as an H 2 -matrix will require large amounts of storage (cf. Lemma 7.23 taking into account that Csp 100 holds in important applications). In order to reduce the storage requirements, we replace the H 2 -matrix space 2 H .TM;J ; VM ; WM / used in Section 7.8 by a new kind of matrix subspace: Definition 8.1 (Semi-uniform hierarchical matrices). Let X 2 RJ be matrix. Let TJ be an admissible block cluster tree, and let V D .V t / t2T and W D .Ws /s2TJ be nested cluster bases with rank distributions K D .K t / t2T and L D .Ls /s2TJ . X is a semi-uniform matrix for TJ , V and W if there are families A D .Ab /b2LC J
t s of matrices satisfying Ab 2 RL and Bb 2 RsJK for all O tO
and B D .Bb /b2LC
J
b D .t; s/ 2 LC J and X
XD
bD.t;s/2LC J
.Ab Ws C V t Bb / C
X
t Xs :
(8.1)
bD.t;s/2L J
The matrices A D .Ab /b2LC and B D .Bb /b2LC are called left and right coeffiJ J cient matrices. Obviously, a matrix X is semi-uniform if and only if it can be split into a sum of a left and a right semi-uniform matrix, i.e., the set of semi-uniform matrices is the sum of the spaces of left and right semi-uniform matrices. Remark 8.2 (Matrix subspaces). Let TJ , V and W be as in Definition 8.1. The set 2 .TJ ; V; W / ´ fX 2 RJ W X is a semi-uniform Hsemi matrix for TJ ; V and W g
is a subspace of RJ . 2 .TJ ; V; W / during Proof. Since we have to compute sums of matrices in Hsemi the course of the new multiplication algorithm, we prove the claim constructively instead of simply relying on the fact that H 2 .TJ ; ; W / and H 2 .TJ ; V; / are subspaces. 2 Let X; Y 2 Hsemi .TJ ; V; W /, and let ˛ 2 R. Let b D .t; s/ 2 LC J be an admissible leaf. Due to Definition 8.1 and Lemma 5.4, there are matrices AX;b ; AY;b 2
8.1 Semi-uniform matrices
335
t s and BX;b ; BY;b 2 RsJK satisfying RL O tO
C ˛.AY;b Ws C V t BY;b / t .X C ˛Y /s D t Xs C ˛ t Ys D AX;b Ws C V t BX;b
D .AX;b C ˛AY;b /Ws C V t .BX;b C ˛BY;b / : Applying this equation to all admissible leaves b 2 LC J of TJ yields the desired result and proves that we can compute the representation of linear combinations efficiently by forming linear combinations of the left and right coefficient matrices. We construct semi-uniform matrices by accumulating blockwise updates, and since these updates can appear for arbitrary blocks of the block cluster tree TJ , not only for leaf blocks, we have to ensure that the result still matches Definition 8.1. t s and B 2 RJK . Let TJ , Lemma 8.3 (Restriction). Let t 2 T , s 2 TJ , A 2 RL sO tO V and W be as in Definition 8.1. For all t 2 sons .t / and s 2 sons .s/, we have
t .AWs C V t B /s D . t AFs ;s /Ws C V t .s BE t ;t / : If .t; s/ 2 TJ holds, this equation implies 2 AWs C V t B 2 Hsemi .TJ ; V; W /:
Proof. Let t 2 sons .t / and s 2 sons .s/. Due to Lemma 6.13, we have t .AWs /s D t A.s Ws / D t A.Ws Fs ;s / D t AFs ;s Ws ; t .V t B /s D t V t B s D V t E t ;t B s D V t .s BE t ;t / ; and adding both equations proves our first claim. Now let .t; s/ 2 TJ . We have to prove that AWs C V t B is an element of the 2 .TJ ; V; W /. Let b ´ .t; s/ 2 TJ , and let Tb be the subtree of TJ space Hsemi with root b (cf. Definition 6.30). Due to Corollary 3.15, we find that Lb D fb D .t ; s / 2 LJ W b 2 sons .b/g describes a disjoint partition fbO D tO sO W b 2 Lb g O and this implies of b, AWs C V t B D
X
t .AWs C V t B /s
b D.t ;s /2Lb
D
X
. t AFs ;s /Ws C V t .s BE t ;t / :
b D.t ;s /2Lb
Since all blocks b appearing in this sum are leaves of Tb , each of the terms in the sum 2 is an element of Hsemi .TJ ; V; W / due to Definition 8.1. According to Remark 8.2, 2 Hsemi .TJ ; V; W / is a linear space, so the sum itself is also contained and we conclude 2 .TJ ; V; W /. AWs C V t B 2 Hsemi
336
8 A posteriori matrix arithmetic
8.2 Intermediate representation We now consider the computation of the matrix M0 ´ C C AB for A 2 RJ , B 2 RJK and C 2 RK . As in Section 7.6, we assume that A, B and C are H 2 -matrices given in the standard H 2 -matrix representation X X VA;t SA;b WA;s C t As ; AD bD.t;s/2LA;J
C bD.t;s/2LA;J
BD
X
VB;s SB;b WB;r C
X
s Br ;
bD.s;r/2LB;JK
C bD.s;r/2LB;JK
C D
X
VC;t SC;b WC;r C
X
t Cr
bD.t;r/2L C;K
bD.t;r/2LC C;K
for families SA D .SA;b /b2LC , SB D .SB;b /b2LC and SC D .SC;b /b2LC A;J B;JK C;K of coupling matrices. We denote the sets of subdivided blocks of TA;J , TB;JK and TC;K by J , JK and K and the sets of inadmissible blocks by J , JK and K (cf. Definition 7.16). The first step of the new multiplication algorithm consists of representing the exact result M0 ´ C C AB as a semi-uniform matrix. As in Section 7.7, we express the operation M0
C C AB
M0 M0
C; M0 C t1 As1 Br1 ;
as a sequence
:: : M0
M0 C tm Asm Brm
of updates involving subblocks ti Asi of A and si Bri of B. If we can find a good bound for the number of updates and perform each update efficiently, the resulting algorithm is also efficient. Let us consider one of the updates. We fix t 2 T , s 2 TJ and r 2 TK , and let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. Our goal is to perform the update t M0 r
t M0 r C t As Br
efficiently. As in Section 7.8, we now investigate three special cases.
(8.2)
8.2 Intermediate representation
337
Case 1: bB is admissible If bB is admissible, we have s Br D VB;s SB;bB WB;r ;
since B is an H 2 -matrix. This means that the product on the right-hand side of (8.2) has the form t As Br D t AVB;s SB;bB WB;r D . t AVB;s SB;bB /WB;r :
(8.3)
We recall Definition 6.1 and see that the structure of the product is that of a right semi-uniform matrix. We can split the computation of the product t AVB;s SB;bB into two parts: the challenging part is finding AybA ´ t AVB;s
(8.4)
efficiently, the remaining multiplication by SB;bB can be handled directly. Since the matrices AybA resemble the matrices computed by the matrix forward transformation Algorithm 36, we can adapt its recursive approach: we have X X t D t 0 ; VB;s D VB;s 0 EB;s 0 ;s : t 0 2sonsC .t/
s 0 2sonsC .s/
If sons.bA / ¤ ;, Definition 3.12 implies sons.bA / D sonsC .t / sonsC .s/ and we find X X AybA D t AVB;s D t 0 AVB;s 0 EB;s 0 ;s D
X
t 0 2sonsC .t/ s 0 2sonsC .s/
0 D.t 0 ;s 0 /2sons.b / bA A
AybA0 EB;s 0 ;s :
(8.5)
Otherwise, i.e., if bA is a leaf of TA;J , it can be either an inadmissible leaf, in which case we can compute AybA directly by using its definition (8.4), or it is an admissible leaf, and we get AybA D t AVB;s D t As VB;s D VA;t SA;bA WA;s VB;s :
The computation of this product can be split into three steps: the cluster basis product PAB D .PAB;s /s2TJ given by PAB;s ´ WA;s VB;s for all s 2 TJ can be computed efficiently by Algorithm 13, multiplying PAB;s by SA;bA is straightforward, so we only VB;s D require a method for multiplying the intermediate matrix Yyt ´ SA;bA WA;s SA;bA PAB;s with VA;t . We could use a block variant of the backward transformation
338
8 A posteriori matrix arithmetic
Algorithm 7 to compute VA;t Yyt , but for our application it is simpler and in the general case even more efficient to prepare and store the matrices .VA;t / t2T explicitly instead of using the transfer matrices. Algorithm 46 can be used to compute these matrices efficiently. Combining Algorithm 13 for the computation of cluster basis products with the block backward transformation and the recursion (8.5) yields Algorithm 47, the semiuniform matrix forward transformation. Algorithm 46. Expansion of a nested cluster basis. procedure ExpandBasis(t , var V ); if sons.t/ ¤ ; then t 0 2 RtK ; Vt O 0 for t 2 sons.t / do ExpandBasis(t 0 , V ); Vt Vt C Vt 0 Et 0 end for end if
Algorithm 47. Semi-uniform matrix forward transformation for the matrix A. y procedure SemiMatrixForward(b, VA , VB , PAB , SA , A, var A); .t; s/ b; if sons.b/ ¤ ; then Ayb 0; for b 0 2 sons.b/ do b0; .t 0 ; s 0 / y SemiMatrixForward(b 0 , VA , VB , PAB , SA , A, A); Ayb Ayb C Ayb 0 EB;s 0 ;s end for else if b is admissible then Yyt SA;b PAB;s ; VA;t Yyt Ayb else Ayb t AVB;s end if This procedure is a variant of the matrix forward transformation Algorithm 36 used in Chapter 7: instead of applying cluster basis matrices from the left and the right and efficiently computing a projection of a matrix block into a given H 2 -matrix space, we apply only one cluster basis matrix from the right and end up with a matrix in a semi-uniform matrix space.
8.2 Intermediate representation
339
Remark 8.4 (Block backward transformation). The computation of the product VA;t Yyt is closely related to the computation of the vector VA;t yO t performed by the backward transformation Algorithm 7 that is introduced in Section 3.7, and generalizing this procedure yields the block backward transformation Algorithm 48, the counterpart of the block forward transformation Algorithm 10 introduced in Section 5.2. Instead of using Algorithm 46 to compute all matrices VA;t and WB;r explicitly, we therefore can also apply the block backward transformation Algorithm 48 to Yyt and Yyr and reduce the amount of auxiliary storage. This will increase the number of operations, but should not hurt the asymptotic complexity. Algorithm 48. Block backward transformation. procedure BlockBackwardTransformation(t , V , var Y , Yy ); if sons.t/ D ; then Y Y C V t Yyt else for t 0 2 sons.t / do Yyt 0 E t 0 Yyt ; BlockBackwardTransformation(t 0 , V , Y , Yy ) end for end if
Case 2: bA is admissible If bA is admissible, the fact that A is an H 2 -matrix implies t As D VA;t SA;bA WA;s
and we conclude that the product on the right-hand side of (8.2) now satisfies Br D VA;t .r B WA;s SA;b / ; t As Br D VA;t SA;bA WA;s A
(8.6)
i.e., it is a block in a left semi-uniform matrix. As in Case 1, we can split the computation of the product r B WA;s SA;b A into two parts: first we compute BybB ´ r B WA;s
(8.7)
can be for all bB D .s; r/ 2 TB;JK , then the remaining multiplication with SA;b A handled directly. The structure of (8.7) closely resembles the structure of (8.4): if
340
8 A posteriori matrix arithmetic
sons.bB / ¤ ;, we can use the recursion X BybB D r 0 B WA;s 0 FA;s 0 ;s D b 0 D.s 0 ;r 0 /2sons.bB /
X
BybB FA;s 0 ;s ;
(8.8)
b 0 D.s 0 ;r 0 /2sons.bB /
otherwise, we can either rely on the definition (8.7) if bB is an inadmissible leaf or on V WA;s D WB;r SB;b P BybB D r B WA;s D WB;r SB;b B B;s B AB;s
if it is an admissible leaf. The resulting recursive procedure is given in Algorithm 49. Algorithm 49. Semi-uniform matrix forward transformation for the matrix B . y procedure SemiTransMatrixForward(b, WB , WA , PAB , SB , B, var B); .s; r/ b; if sons.b/ ¤ ; then Byb 0; for b 0 2 sons.b/ do .s 0 ; r 0 / b0; y SemiTransMatrixForward(b 0 , WB , WA , PAB , SB , B, B); y y y Bb B C Bb 0 FA;s 0 ;s end for else if b is admissible then Yyr SB;b PAB;s ; y y Bb WB;r Y t else Byb r B WA;s end if
Case 3: bA and bB are inadmissible As in Section 7.8, we do not have any information about the product of two inadmissible blocks and cannot hope to be able to express it in a simple way, e.g., as a semiuniform matrix. Fortunately, we can use the induced block cluster tree introduced in Definition 7.26 to ensure that this situation only appears if bC is also inadmissible. If bC is a leaf, we can handle it directly. Otherwise, we can proceed to the sons of bA , bB and bC by recursion. Lemma 8.5 (Exact matrix product). Let TM;K be the induced block cluster tree of 2 Definition 7.26. Then we have AB 2 Hsemi .TM;K ; VA ; WB /. Proof. Similar to the proof of Theorem 7.29: we prove the more general statement 2 .TM;K ; VA ; WB / t As Br 2 Hsemi
8.2 Intermediate representation
341
for all t 2 T , s 2 TJ and r 2 TK with .t; s/ 2 TA;J , .s; r/ 2 TB;JK and .t; r/ 2 TM;K by induction. C holds, our investigation of Case 1 yields If bA D .t; s/ 2 LA;J / 2 H 2 .TM;K ; VA ; / t As Br D VA;t .r BWA;s SA;b A 2 Hsemi .TM;K ; VA ; WB /: C If bB D .s; r/ 2 LB;JK holds, the results of Case 2 imply 2 H 2 .TM;K ; ; WB / t As Br D . t AVB;s SB;bB /WB;s 2 Hsemi .TM;K ; VA ; WB /:
We can continue as in the proof of Theorem 7.29. Our algorithm requires an explicit representation X .Xb WB;r C VA;t Yb / C MAB ´ AB D
X
t ABr
bD.t;r/2LM;K
C bD.t;r/2LM;K
(8.9) of the product AB as a semi-uniform matrix, i.e., we have to compute the left and right coefficient matrices .Xb /b2LC and .Yb /b2LC . M;K
M;K
We accomplish this task in two steps: first we compute an intermediate representation X X MAB D AB D .Xb WB;r C VA;t Yb / C t ABr bD.t;r/2TM;K
bD.t;r/2LM;K
of the product with coefficient matrices in all blocks, and then we apply a procedure similar to the matrix backward transformation Algorithm 38 in order to translate it into the desired form (8.9). This second step is part of the coarsening Algorithm 54. For the first part of the construction, the structure of the proof of Theorem 7.29 suggests a recursive approach: we start with t D root.T /, s D root.TJ / and r D root.TK /. If bA ´ .t; s/ or bB ´ .s; r/ are admissible leaves, we update XbC or YbC for bC ´ .t; r/ using the equations (8.3) and (8.6). Otherwise, all blocks are inadmissible, and we can either handle an inadmissible leaf directly or proceed by recursion. Since we are looking for the matrix M0 D C C AB D C C MAB , we also have to handle the matrix C . We solve this task by a very simple procedure: we construct a representation of C 2 H 2 .TC;K ; VC ; WC / in the matrix space H 2 .TM;K ; VC ; WC / corresponding to the induced block partition TM;K . Due to Definition 7.26, each block of TC;K is also present in TM;K , and we can use the matrix backward transformation Algorithm 38 to compute the representation of C in H 2 .TM;K ; VC ; WC / efficiently.
342
8 A posteriori matrix arithmetic
Algorithm 50 computes the exact product MAB D AB of two H 2 -matrices A 2 and B, represented as a semi-uniform matrix AB 2 Hsemi .TM;J ; VA ; WB /. Algo2 rithm 38 computes an H -matrix representation of C in H 2 .TM;K ; VC ; WC / 2 Hsemi .TM;K ; VC ; WC /. Each admissible block b D .t; r/ of the matrix M D C C AB is therefore represented in the form VC;t SyC;b WC;r C Xb WB;r C VA;t Yb , i.e., its rank is bounded by .#LC;r / C .#LB;r / C .#KA;t /. This fairly low rank is a major advantage of the new approach compared to the H 2 -matrix representation introduced in Section 7.8: the latter is based on induced cluster bases, and the rank of these cluster bases can become very large (depending on the size of the sparsity constant Csp , cf. Lemma 7.23). Algorithm 50. Semi-uniform representation of the matrix product. procedure MatrixProductSemi(t, s, r, A, B, var MAB , X , Y ); bA .t; s/; bB .s; r/; bC .t; r/; if bB is admissible then C XbC XbC C AybA SB;bB fbB 2 LB;JK g else if bA is admissible then C YbC YbC C BybB SA;b fbA 2 LA;J g A else if bA and bB are leaves then MAB C t As Br fbC 2 L MAB C;K g else for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do fbA 2 A or bB 2 B g MatrixProductSemi(t 0 , s 0 , r 0 , A, B, MAB , X , Y ) end for end if As already mentioned this advantage comes at a price: the storage requirements of semi-uniform matrices typically do not grow linearly with the matrix dimension, as for H -matrices the depth of the block cluster tree is also a factor in the complexity estimates.
Complexity estimates Let us now investigate the complexity of Algorithm 50. Since it is based on the matrices .Ayb /b2TA;J and .Byb /b2TB;JK prepared by Algorithm 47 and 49, we first have to analyze the complexity of these preliminary steps. Lemma 8.6. Let V be a nested cluster basis with rank distribution K, let .k t / t2T be defined as in (3.16), and let kO ´ maxfk t W t 2 T g:
343
8.2 Intermediate representation
The computation of all matrices .V t / t2T by Algorithm 46 requires not more than 2.p C 1/kO 2 n operations. Proof. Let t 2 T . If t is not a leaf, the algorithm computes the products of V t 0 2 K 0 RtO0 t and E t 0 2 RK t 0 K t for all t 0 2 sons.t /. This can be accomplished in X
2.#tO0 /.#K t 0 /.#K t /
X
2.#tO0 /k t2 D 2.#tO/k t2 2.#tO/kO 2
t 0 2sons.t/
t 0 2sons.t/
operations, since Definition 3.4 implies X #tO0 D # t 0 2sons.t/
[
tO0 D #tO:
t 0 2sons.t/
Adding these bounds for all t 2 T and applying Corollary 3.10 yields the bound X
2.#tO/kO 2 D 2kO 2
t2T
X t2T
2kO 2
p X
#tO D 2kO 2
p X
X
`D0
t2T level.t/D`
#tO
n D 2kO 2 .p C 1/n :
`D0
Once the cluster bases have been prepared, the complexity of the semi-uniform matrix forward transformation can be investigated. Lemma 8.7. Let KA , LA , KB and LB be the rank distributions of VA , WA , VB and WB . Let .kA;t / t2T , .lA;s /s2TJ , .kB;s /s2TJ and .lB;r /r2TK be defined as in (3.16) and (3.18), and let kO ´ maxfkA;t ; lA;s ; kB;s ; lB;r W t 2 T ; s 2 TJ ; r 2 TK g:
(8.10)
If TA;J is Csp -sparse, Algorithm 47 requires not more than 2Csp .kO 3 c C kO 2 .p C 1/n / operations. If TB;JK is Csp -sparse, Algorithm 49 requires not more than 2Csp .kO 3 cK C kO 2 .pK C 1/nK / O 0; 1/-regular, the number of operations is in operations. If T and TK are .Crc ; k; 2 2 O O O.k .p C 1/n / and O.k .pK C 1/nK /, respectively.
344
8 A posteriori matrix arithmetic
Proof. Since Algorithms 47 and 49 are very similar, we only consider the first one. Let b D .t; s/ 2 TA;J . If sons.b/ ¤ ;, the algorithm computes Ayb 0 EB;s 0 ;s for all sons b 0 of b and adds the result to Ayb . This can be accomplished in X 2.#tO0 /.#KB;s 0 /.#KB;s / b 0 D.t 0 ;s 0 /2sons.b/
D2
X
#tO0
t 0 2sonsC .t/
2 2.#tO/kB;s
X
#KB;s 0 .#KB;s /
s 0 2sonsC .s/
2.#tO/kO 2 :
Otherwise, i.e., if b is a leaf, it can be either inadmissible or admissible. If it is inadmissible, the clusters t and s are leaves of T and TJ , respectively, so we have #sO kB;s and find that we can compute the product in 2.#tO/.#sO /.#KB;s / 2.#tO/kB;s .#KB;s / 2.#tO/kO 2 operations. If we are dealing with an admissible leaf, the computation of Yyt can be performed in 2.#KA;t /.#LA;s /.#KB;s / 2kA;t lA;s kB;s 2kO 3 operations, and the multiplication by VA;t takes not more than 2.#tO/.#KA;t /.#KB;s / 2.#tO/kA;t kB;s 2.#tO/kO 2 operations. We can summarize that we need not more than 2.kO C #tO/kO 2 operations for the block b. Adding the bounds for all blocks yields X X X X kO 3 C 2 2.kO C #tO/kO 2 D 2 .#tO/kO 2 bD.t;s/2TA;J
b2TA;J
t2T s2row.t/
2k #TA;J C 2Csp kO 2 O3
X
#tO
t2T
D 2kO 3 #TA;J C 2Csp kO 2
2kO 3 #TA;J C 2Csp kO 2
p X
X
`D0
t2T level.t/D`
p X
n
`D0
D 2kO 3 #TA;J C 2Csp kO 2 .p C 1/n :
#tO
8.2 Intermediate representation
Using the estimate #TA;J D
X
345
# row.t / Csp #T D Csp c
t2T
O 0; 1/-regular, we can find a constant C we get the desired result, and if T is .Crc ; k; O independent of k and n satisfying c C n =kO and complete the proof. In order to analyze the complexity of Algorithm 50, we follow the approach used in Section 7.7: we observe that the triples .t; s; r/ for which the algorithm is called are organized in a tree structure. The corresponding call tree TJK is the minimal tree with root root.TJK / ´ .root.T /; root.TJ /; root.TK // and a father-son relation given by 8 C C C ˆ <sons .t / sons .s/ sons .r/ sons.t; s; r/ ´ ˆ : ;
if .t; s/ 2 A ; .s; r/ 2 B ; or .t; s/ 2 A ; .s; r/ 2 B ; otherwise; (8.11) for all triples .t; s; r/ 2 TJK it contains. Here A A TA;J and B B TB;JK denote the sets of subdivided and inadmissible blocks of A and B, respectively (cf. Definition 7.16 and (7.18)). As in Section 7.7, the complexity estimate depends on a sparsity estimate for the call tree. Lemma 8.8 (Sparsity of the call tree). Let TA;J and TB;JK be Csp -sparse admissible block cluster trees, and let C .t/ ´ f.s; r/ 2 TJ TK W .t; s; r/ 2 TJK g CJ .s/ ´ f.t; r/ 2 T TK W .t; s; r/ 2 TJK g CK .r/ ´ f.t; s/ 2 T TJ W .t; s; r/ 2 TJK g
for all t 2 T ; for all s 2 TJ ; for all r 2 TK :
Then we have #C .t /; #CJ .s/; #CK .r/ 3Csp2
for all t 2 T ; s 2 TJ ; r 2 TK :
Proof. As in the proof of Lemma 7.18: comparing (8.11) and (7.19) reveals that each node of the call tree of Algorithm 50 is also a node of the call tree of Algorithm 45 (with a minimal TC;K consisting only of the root). Since we are working with semi-uniform matrices instead of H 2 -matrices, the number of operations for a block b D .t; r/ 2 TM;K depends on the cardinalities of tO and r, O which means that the weak assumptions used in the previous chapters do not allow us to derive meaningful complexity estimates. Instead, we have to require that the strict conditions introduced in Section 3.8 are met, i.e., that the rank distribution is k-bounded and that the cluster trees are .Crc ; k/-bounded.
346
8 A posteriori matrix arithmetic
Lemma 8.9 (Complexity). Let KA , LA , KB and LB be the rank distributions of VA , WA , VB and WB . Let .kA;t / t2T , .lA;s /s2TJ , .kB;s /s2TJ and .lB;r /r2TK be defined as in (3.16) and (3.18), and let kO be defined as in (8.10). If TA;J and TB;JK are Csp -sparse, Algorithm 50 requires not more than 6Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations to compute the semi-uniform representation MAB D AB of the product of the H 2 -matrices A and B. Proof. Let .t; s; r/ 2 TJK , and let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. If bB is admissible, we have to compute the matrix AybA SB;bB and add it to XbC , and this requires not more than 2.#tO/.#KB;s /.#LB;r / 2.#tO/kO 2 operations. If bA is admissible, we have to compute the matrix BybB SA;b and add it to A YbC , and this requires not more than
2.#r/.#L O O kO 2 A;s /.#KA;t / 2.# r/ operations. If bA and bB are inadmissible leaves, the fact that TA;J and TB;JK are admissible block cluster trees implies that the clusters t , s and r are leaves of T , TJ and TK , and the update of t MAB r requires not more than 2.#tO/.#sO /.#r/ O .#tO/kO 2 operations, and we conclude that the number of operations for each triple .t; s; r/ 2 TJK is bounded by 2kO 2 .#tO C #r/: O Combining Corollary 3.10 and Lemma 8.8 we can find and estimate for the number of operations of the entire algorithm: we have X X X #tO D .#tO/#C .t / 3Csp2 #tO t2T
.t;s;r/2TJK
t2T
X
X
`D0
t2T level.t/D`
p
D X
#rO D
.t;s;r/2TJK
and this implies X .t;s;r/2TJK
3Csp2 X
#tO 3Csp2
p X
n D 3Csp2 .p C 1/n ;
`D0
2 .#r/#C O K .r/ 3Csp .pK C 1/nK ;
r2TK
2kO 2 .#tO C #r/ O 6Csp2 kO 2 ..p C 1/n C .pK C 1/nK /:
8.3 Coarsening
347
Remark 8.10 (Computation of M0 D C C AB). Algorithm 50 computes the representation of MAB D AB as a semiuniform matrix in the block cluster tree TzM;K defined by the call tree through .t; r/ 2 TzM;K () there is an s 2 TJ with .t; s; r/ 2 TJK : In general, TzM;K is only a subtree of the induced block cluster tree TM;K , but we can use Lemma 8.3 to compute a semi-uniform representation of MAB using TM;K . Similarly we can use the matrix backward transformation Algorithm 38 to compute an H 2 -matrix representation of C in the space H 2 .TM;K ; VC ; WC /. The result is a representation of the exact result C C AB D C C MAB in the matrix 2 space H 2 .TM;K ; VC ; WC / C Hsemi .TM;K ; VA ; WB /.
8.3 Coarsening In most practical applications, the induced block cluster tree TM;K used in the exact representation of M0 D C C AB provided by Lemma 8.5 will not coincide with the block cluster tree TC;K we have prescribed for the result of the multiplication: although Definition 7.26 ensures that each block of TC;K will also appear in TM;K , leaves of TC;K may correspond to subdivided blocks in TM;K , and admissible blocks in TC;K may be inadmissible in TM;K due to the modified admissibility condition used in the construction of TM;K . The second phase of the a posteriori matrix-matrix multiplication algorithm addresses this problem: for all admissible leaves b D .t; r/ 2 TC;K , we check whether they are also admissible leaves in TM;K . If they are not, we construct low-rank approximations of the corresponding submatrices by using the hierarchical approximation approach described in [52]. The result of this procedure is an approxiz of M0 D C C AB by a hierarchical matrix based on the prescribed block mation M cluster tree TC;K instead of the induced block cluster tree TM;K . In the third and final phase of the multiplication algorithm, we convert this intermediate approximation into the desired H 2 -matrix based on the block cluster tree TC;K . Since we aim to approximate the matrix M D C C AB by an H -matrix, we can compute all approximations block by block and do not have to take interactions within rows or columns into account. This computation can be carried out by a recursive procedure working from the leaves of TM;K upwards.
Inadmissible leaves of TM; K Let b 2 LM;K be an inadmissible leaf of TM;K . If b is also an inadmissible leaf of TC;K , we are done. Otherwise, we can use a singular value decomposition to turn the corresponding submatrix into a low-rank matrix:
348
8 A posteriori matrix arithmetic
Lemma 8.11 (Singular value decomposition). Let 0 and J 0 J. Let M 2 0 0 RJ 0 ;J 0 . Let p ´ rank.M / minf# ; #J g with p > 0. Let 1 p > 0 be the non-zero singular values of M . For each l 2 f1; : : : ; pg, we can find an index set and W 2 RJK and a diagonal K with nK D l, orthogonal matrices V 2 RK 0 J0 KK matrix S 2 R which satisfy ´ lC1 if l < p; (8.12) kM V SW k2 D 0 otherwise; ´ P
1=2 p i2 if l < p; iDlC1 kM V SW kF D (8.13) 0 otherwise; Proof. As in Lemma 5.19. For a given error tolerance b 2 R>0 , this result allows us to find an approximation with sufficiently small error and the lowest possible rank: for an inadmissible leaf K b D .t; r/ 2 TM;K , we can find an index set Kb , orthogonal matrices Vb 2 RtO b , KKb
Wb 2 RrO
and a diagonal matrix Sb 2 RKb Kb such that
k t Mr Vb Sb Wb k2 b
or
k t Mr Vb Sb Wb kF b
(8.14)
holds. Low-rank representations of this type are closely related to the ones used in the context of H -matrices: by setting Ab ´ Vb Sb and Bb ´ Wb , we immediately get the desired factorized representation of a low-rank approximation of the submatrix corresponding to the block b. Remark 8.12 (Complexity). According to Lemma 5.17, the Householder factorization in Algorithm 51 requires not more than Cqr mnq operations, and according to Remark 5.20 the singular value decomposition can be computed in not more than Csvd q 3 operations. If m > n, the computation of Vy can be carried out in 2mn2 D 2mnq y can be performed in 2m2 n D 2mnq operations, otherwise the computation of W operations. The entire Algorithm 51 therefore takes not more than Csvdf .# 0 /.#J 0 / minf# 0 ; #J 0 g operations with Csvdf D Cqr C Csvd C 2 to complete.
8.3 Coarsening
349
Algorithm 51. Compute a low-rank approximation V S W of a matrix X 2 RJ 0 ;J 0 . y procedure LowrankFactor(X , 0 , J 0 , , var V , S, W , K); 0 0 m # ; n #J ; q minfm; ng; Fix arbitrary isomorphisms m W f1; : : : ; mg ! 0 and n W f1; : : : ; ng ! J 0 ; Xy 0 2 Rmn for i 2 f1; : : : ; mg, j 2 f1; : : : ; ng do Xyij Xm .i/;n .j / end for; if m > n then yR y of Xy ; Compute a Householder factorization Xy D Q qq y y Y R2R else yR y of Xy ; Compute a Householder factorization Xy D Q qq y y Y R 2R end if y of Yy ; Compute a singular value decomposition Yy D Vy diag.1 ; : : : ; q /W if m > n then y Vy Vy Q else y W
yW y Q end if FindRank( , q, .i /qiD1 , l); Ky
m .f1; : : : ; lg/; y
fAlgorithm 17g y
K K V 0 2 R ; W 0 2 RJ ; J0 0 for i 2 f1; : : : ; mg, j 2 f1; : : : ; lg do Vm .i/;m .j / Vyij end for; for i 2 f1; : : : ; ng, j 2 f1; : : : ; lg do yij Wn .i/;m .j / W end for; for i 2 f1; : : : ; lg do Sm .i/;m .i/ i end for
S
y
y
0 2 RKK ;
Admissible leaves of TM; K C We can apply the same approach to admissible leaves: let b D .t; r/ 2 LM;K . Assuming that the index sets LC;r , LB;r and KA;t are pairwise disjoint, the fact that
350
8 A posteriori matrix arithmetic
M is semi-uniform implies C VA;t Yb t Mr D VC;t SyC;b WC;r C Xb WB;r 0 1
WC;r A y ; D VC;t SC;b Xb VA;t @WB;r Yb
(8.15)
i.e., the submatrix corresponding to b is already given in a factorized representation of rank not higher than .#LC;r / C .#LB;r / C .#KA;t /. If we want to keep the storage requirements low, we can try to reduce the rank by applying Lemma 8.11 to t Mr . Since the direct computation of the singular value decomposition of the submatrix corresponding to b could become computationally expensive if the index sets tO and rO become large, we have to exploit the factorized structure (8.15): we apply Lemma 5.16 to the left factor in (8.15) in order to find an index set Kyb , an orthogonal matrix yb K yb 2 RKyb .LC;r [LB;r [KA;t / with and a matrix R Qb 2 R tO
VC;t SyC;b
Xb
yb : VA;t D Qb R
Combining this factorization with (8.15) yields 0 1
WC;r A t Mr D VC;t SyC;b Xb VA;t @WB;r Yb
yb j y D Qb R Kb LC;r
0 1 WC;r yb j y @W A R Kb KA;t B;r Yb
yb j y R Kb LB;r
yb j y yb j y yb j y y D Qb R W CR W CR Y D Qb M b Kb LC;r C;r Kb LB;r B;r Kb KA;t b for the auxiliary matrix yb j y b ´ WC;r R M y K
b LC;r
yb j C WB;r R y K
b LB;r
yb j C Yb R y K
b KA;t
yb KK
2 RsO
:
y b has only #Kyb .#LC;r / C .#LB;r / C .#KA;t / rows, we can compute its Since M singular value decomposition efficiently, and this gives us an index set Kb , orthogonal y JK matrices Vyb 2 RKb Kb , Wb 2 RsO b , and a diagonal matrix Sb 2 RKb Kb such that y b Wb Sb Vy k2 b kM b
or
y b Wb Sb Vy kF b kM b
holds. Since Qb is an orthogonal matrix, the same holds for Vb ´ Qb Vyb , and we find y Vyb Sb W /; t Mr Vb Sb Wb D Qb .M b b so (8.14) is guaranteed (since Sb is diagonal, we have Sb D Sb ). The resulting procedure for finding low-rank approximations of leaves of TM;K is given in Algorithm 52.
8.3 Coarsening
351
Algorithm 52. Low-rank approximation of leaf blocks. procedure CoarsenLeaf(b, M , , var Vb , Sb , Wb ); .t; r/ b; if b is admissible in TM;K then VC;t SyC;b ; X1;b X2;b Xb ; VA;t ; X3;b .LC;r [LB;r [KA;t / ; Zb 0 2 RtO Zb jtOLC;r X1;b ; Zb jtOLB;r X2;b ; Zb jtOKA;t X3;b ; yb , Kyb ); fAlgorithm 15g Householder(Zb , tO, Qb , R y1;b y y y2;b y y y3;b y y R Rj ; R Rj ; R Rj ; Kb LC;r
y 1;b M y 2;b M
y ; WC;r R 1;b y ; WB;r R
y 3;b M
y ; Yb R 3;b
Kb LB;r
Kb KA;t
2;b
y 1;b C M y 2;b C M y 3;b 2 RJKyb ; yb M M sO y b , sO , Kyb , b , Wb , Sb , Vyb , Kb ); LowrankFactor(M Qb Vyb Vb else LowrankFactor( t Mr , tO, r, O b , Vb , Sb , Wb , Kb ) end if
fAlgorithm 51g
fAlgorithm 51g
Subdivided blocks in TM; K Now let us consider blocks b D .t; r/ 2 TM;K which are not leaves. We assume that we have already found low-rank approximations Vb 0 Sb 0 Wb0 for all sons b 0 2 sons.b/, and we have to find a way of combining these approximations into a low-rank approximation of the entire block. This means that we have to approximate the matrix X X X Vb 0 Sb 0 Wb0 D V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 : (8.16) b 0 2sons.b/
t 0 2sonsC .t/ r 0 2sonsC .r/
Let ´ # sonsC .t / and ft1 ; : : : ; t g ´ sonsC .t /, and let ´ # sonsC .r/ and fr1 ; : : : ; r g ´ sonsC .r/. The inner sum can be expressed in the form 1 W t0 ;r1
B C V t 0 ;r S t 0 ;r @ ::: A : W t0 ;r 0
X r 0 2sonsC .r/
V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 D V t 0 ;r1 S t 0 ;r1
:::
(8.17)
352
8 A posteriori matrix arithmetic
We once more use Lemma 5.16 in order to find an index set Kyt 0 , an orthogonal matrix y 0 K y t 0 2 RKyt 0 .K t 0 ;r1 [[K t 0 ;r / with Q t 0 2 R 0 t and a matrix R tO
V t 0 ;r1 S t 0 ;r1
yt 0 : V t 0 ;r S t 0 ;r D Q t 0 R
:::
We combine this equation with (8.17) and get 1 W t0 ;r1
B C V t 0 ;r S t 0 ;r @ ::: A W t0 ;r 0
X
V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 D V t 0 ;r1 S t 0 ;r1
:::
r 0 2sonsC .r/
yt 0 j y D Qt 0 R K t 0 K t 0 ;r
yt 0 j y R K
::: 1
D Qt 0
X
yt 0 j y R K
r 0 2sonsC .r/
t 0 K t 0 ;r 0
t 0 K t 0 ;r
1 W t0 ;r1 B :: C @ : A W t0 ;r 0
y t0 W t0 ;r 0 D Q t 0 M
for the auxiliary matrix y t0 ´ M
X r 0 2sonsC .r/
y t 0 j W t 0 ;r 0 R y K
t 0 K t 0 ;r 0
y 0 KK t
2 RrO
:
We assume that the index sets Kyt1 ; : : : ; Kyt are pairwise disjoint and combine the above equation with (8.16) in order to get 0 1 y M X
B :t1 C y Vb 0 Sb 0 Wb 0 D Q t1 : : : Q t @ :: A D Qb M b b 0 2sons.b/ y t M
for the auxiliary matrices Qb ´ Q t1
:::
yb K Q t 2 R ; tO
yb ´ M y t1 M
:::
y t 2 RKKyb M rO
P [ P Kyt . We have Q t 0 D t 0 Q t 0 for all t 0 2 sonsC .t /, and since with Kyb ´ Kyt1 [ the index sets corresponding to different sons of t are disjoint, we can conclude that the orthogonality of the matrices Q t 0 implies that also Qb is orthogonal. Therefore we can proceed as in the case of leaf blocks: we have to find a low-rank approximation of y . The resulting procedure for finding a low-rank approximation of a subdivided Qb M b matrix is given in Algorithm 53. Remark 8.13 (Intermediate approximation). In practice, it can be more efficient to replace the exact QR factorization of the matrices Z t 0 by truncations which reduce the y t 0 at an early stage. rank of the intermediate matrices Q t 0 R
8.3 Coarsening
Algorithm 53. Low-rank approximation of subdivided blocks. procedure CoarsenSubdivided(b, M , , var Vb , Sb , Wb ); .t; r/ b; Kyb ;; for t 0 2 sonsC .t / do K t 0 ; ;; for r 0 2 sonsC .r/ do b0 .t 0 ; r 0 /; K t 0 ; K t 0 ; [ Kb 0 end for; K 0 0 2 RtO t ; ; Zt 0 for r 0 2 sonsC .r/ do b0 .t 0 ; r 0 /; Z t 0 jKb0 Vb 0 Sb 0 end for; y t 0 , Kyt 0 ); Householder(Z t 0 , tO0 , Q t 0 , R y J K y t0 t0 ; M 02R 0 C for r 2 sons .r/ do b0 .t 0 ; r 0 /; y t0 y t 0 C Wb 0 R y t 0 j M M y
353
fAlgorithm 15g
K t 0 Kb 0
end for; Kyb Kyb [ Kyt 0 end for; y y K JK yb 0 2 RtO b ; M 0 2 RrO b ; Qb for t 0 2 sonsC .t / do y b jJK 0 y t0 Qb jK t 0 Qt 0 ; M M t end for; y b , r, LowrankFactor(M O Kyb , b , Wb , Sb , Vyb , Kb ); y Vb Qb Vb
fAlgorithm 51g
This approach also makes the implementation of the algorithm simpler: it is only necessary to implement the approximation of matrices of the form 0 1 X1
B :: C X D X1 : : : X ; X D @ : A X and handle the case of general block matrices by first compressing all rows, thus creating an intermediate block matrix with only one block column, and then compressing this intermediate matrix in order to get the final result.
354
8 A posteriori matrix arithmetic
Using the Algorithm 52 for leaf blocks and the Algorithm 53 for subdivided blocks recursively leads to the coarsening Algorithm 54 which approximates the semi-uniform 2 z 2 H .TC;K ; k/ given by matrix M 2 Hsemi .TM;K ; VA ; WB / by an H -matrix M z D M
X
X
Vb Sb Wb C
bD.t;r/2LC C;K
t Mr :
(8.18)
bD.t;r/2L C;K
Since we use the matrices Xb and Yb of the semi-uniform representation only in leaf blocks, we also have to embed a variant of the matrix backward transformation into Algorithm 54 (cf. Lemma 8.3). Algorithm 54. Coarsening of the block structure. procedure Coarsen(b, M , , var Vb , Sb , Wb ); .t; r/ b; if sons.TM;K ; b/ D ; then if b 62 TC;K or b 2 LC C;K then CoarsenLeaf(b, M , b , Vb , Sb , Wb ); fAlgorithm 52g end if else for b 0 D .t 0 ; r 0 / 2 sons.TM;K ; b/ do Xb 0 Xb 0 C t 0 Xb FB;r Yb 0 Yb 0 C r 0 Yb EA;t 0 ;r ; 0 ;t ; 0 Coarsen(b , M , , Vb 0 , Sb 0 , Wb 0 ) end for; if b 62 TC;K or b 2 LC C;K then CoarsenSubdivided(b, M , b , Vb , Sb , Wb ) fAlgorithm 53g end if end if
z be the matrix (8.18) constructed by Algorithm 54 Lemma 8.14 (Error estimate). Let M with the error tolerances . b /b2TM;K . We have z k2 kM M
X
b ;
b2TM;K
if the matrices are approximated with respect to the spectral norm, and z kF kM M
X b2TM;K
if the Frobenius norm is used instead.
b ;
355
8.3 Coarsening
Proof. We introduce the intermediate matrices
zb ´ M
8 ˆ t Mr 1, we find that b is not a leaf, so Algorithm 54 computes approximations for all blocks b 0 2 sons.b/. Due to sons .b 0 / sons .b/ n fbg, we have # sons .b 0 / n and can apply the induction assumption to all of these submatrices: X X z b0 M z b k2 D z b0 / C zb k t Mr M M . t 0 Mr 0 M b 0 D.t 0 ;r 0 /2sons.b/
z b 0 k2 C k t 0 Mr 0 M
X
b 0 D.t 0 ;r 0 /2sons.b/
X
b 0 2sons.b/
X
b
C
b 0 D.t 0 ;r 0 /2sons.b/ b 2sons .b 0 /
X
2
zb z b0 M M
b 0 2sons.b/
X
2
zb z b0 M M :
b 0 2sons.b/
2
z z 0 If b 62 TC;K or b 2 LC C;K , Mb is constructed from the submatrices Mb by Algorithm 53 and the norm on the right is bounded by b . Otherwise, no further approximation takes place and this norm is equal to zero. In both cases, we can conclude X X X z b k2 b C b D b k t Mr M b 0 2sons.b/ b 2sons .b 0 /
b 2sons .b/
and have completed the induction. For the Frobenius norm, we can apply exactly the same argument.
356
8 A posteriori matrix arithmetic
Complexity estimates We have seen that Algorithm 54 can provide us with a good H -matrix approximation z of the matrix M D C C AB. Now we have to investigate the number of operations M required to reach this goal. Lemma 8.15. Let .kC;t / t2T , .lC;r /r2TK , .kA;t / t2T and .lB;r /r2TK be defined as in (3.16) and (3.18). Let kO ´ maxfkC;t ; lC;r ; kA;t ; lB;r W t 2 T ; r 2 TK g: Computing low-rank approximations Vb Sb Wb for all leaves b 2 LM;K of TM;K by applying Algorithm 52 requires not more than a total of Clfc Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations for a constant Clfc 2 R>0 depending only on Cqr and Csvdf (cf. Lemma 5.17 and Remark 8.12). Proof. Let b D .t; r/ 2 LM;K be a leaf of TM;K . If b is inadmissible, Algorithm 52 uses the singular value decomposition to compute a low-rank approximation. Since b D .t; r/ is an inadmissible leaf, both t and r have to be leaves of T and O TK , respectively, and since the cluster trees are bounded, we have #rO kC;r k. According to Remark 8.12, the singular value decomposition can be found in not more than Csvdf .#tO/.#r/ O minf#tO; #rg O Csvdf .#tO/.#r/ O 2 Csvdf kO 2 #tO operations. If b is admissible, Algorithm 52 computes the matrix X1;b D VC;t SyC;b , and this can be done in 2.#tO/.#KC;t /.#LC;r / 2kO 2 #tO operations. Next, Algorithm 15 is used to compute the QR factorization of Zb . Since Zb has #tO rows and .#LC;r / C .#LB;r / C .#KA;t / 3kO columns, this computation requires not more than O 2 9Cqr kO 2 #tO Cqr .#tO/.3k/ operations. y 1;b is the product of WC;r 2 RKLC;r and R y 2 RLC;r Kyb , adding The matrix M 1;b rO O2 O y b takes not more than 2.#r/.#L O O it to M O C;r /.# Kb / 6k # rO operations since # Kb 3k. KL y B;r y 2;b is the product of WB;r 2 R y 2 RLB;r Kb and can The matrix M and R rO
2;b
O2 O y be added in 2.#r/.#L O B;r /.# Kb / 6k # rO operations. The matrix M3;b is the product
8.3 Coarsening KK
357
y
A;t y 2 RKA;t Kb and can be added in 2.#r/.#K O and R O of Yb 2 RrO A;t /.# Kb / 3;b 2 y b can be constructed in 6kO #rO operations. We conclude that M
18kO 2 #rO operations. y b is available, Algorithm 51 is applied, and according to Remark 8.12 it Once M requires not more than 2 O 2 D 9Csvdf kO 2 #rO O O k/ Csvdf .#r/.#K b / Csvdf .# r/.3
operations. The matrix Vb is the product of Qb and Vyb , and this product can be computed in O 2 D 18kO 2 #tO 2.#tO/.#Kyb /.#Kb / 2.#tO/.3k/ operations. Adding the bounds yields that the algorithm requires not more than .20 C 9Cqr /kO 2 #tO C .18 C 9Csvdf /kO 2 #rO operations for one block b D .t; r/ 2 TM;K . In order to get a bound for the entire algorithm, we add the bounds for all blocks. Combining the structure of the proof of Lemma 3.31 with the estimate of Lemma 8.8, we find X kO 2 .#tO C #r/ O 3Csp2 kO 2 ..p C 1/n C .pK C 1/nK / (8.20) C bD.t;r/2LM;K
and setting Clfc ´ 3 maxf20 C 9Cqr ; 18 C 9Csvdf g completes the proof. Lemma 8.16 (Complexity). Let .kC;t / t2T , .lC;r /r2TK , .kA;t / t2T and .lB;r /r2TK be defined as in (3.16) and (3.18). Let kO ´ maxfkC;t ; lC;r ; kA;t ; lB;r ; kb W t 2 T ; r 2 TK ; b 2 TM;K g; Cbc ´ maxf# sons.t /; # sons.r/ W t 2 T ; r 2 TK g: Computing an H -matrix approximation of M D C C AB with the block cluster tree TC;K using Algorithm 54 requires not more than 4 /Csp2 kO 2 ..p C 1/n C .pK C 1/nK / .Clfc C Csbc Cbc
operations for a constant Csbc 2 R>0 and the constant Clfc defined in Lemma 8.15. Both constants depend only on Cqr and Csvdf .
358
8 A posteriori matrix arithmetic
Proof. Due to Lemma 8.15, the compression of all leaf blocks of TM;K can be handled in Clfc Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations, so we only have to consider non-leaf blocks treated by Algorithm 53. Let b D .t; r/ 2 TM;K . If b 2 TC;K , no arithmetic operations are needed. Otherwise Algorithm 53 constructs the matrix Z t 0 using not more than X 2.#tO0 /.#K t 0 ;r 0 /2 2Cbc .#tO0 /kO 2 r 0 2sonsC .r/
operations. Due to [
#K t 0 ; D #
K t 0 ;r 0
r 0 2sonsC .r/
X
O #K t 0 ;r 0 Cbc k;
r 0 2sonsC .r/
the matrix Z t 0 has not more than Cbc kO columns, and Lemma 5.17 yields that Algorithm 15 requires not more than O 2 D Cqr C 2 .#tO0 /kO 2 Cqr .#tO0 /.Cbc k/ bc y t 0 . The matrix M y t 0 can be assembled in operations to compute Q t 0 and R X X O bc k/ O D 2Cbc .#r/ 2.#rO 0 /.#K t 0 ;r 0 /.#Kyt 0 / 2 .#rO 0 /k.C O kO 2 r 0 2sonsC .r/
r 0 2sonsC .r/
operations. This procedure is repeated for all t 0 2 sons.t /, and adding the estimates yields the bound X 2 2 2 .2Cbc CCqr Cbc /.#tO0 /kO 2 C2Cbc .#r/ O kO 2 .2Cbc CCqr Cbc /.#tO/kO 2 C2Cbc .#r/ O kO 2 t 0 2sons.t/
y b . Due to Remark 8.12, the matrices for the construction of the matrices Qb and M Wb , Sb and Vyb can be computed by Algorithm 51 in 2 O 2 4 Csvdf .#r/.# k/ D Csvdf Cbc O Kyb /2 Csvdf .#r/.C O bc .#r/ O kO 2
operations due to #Kyb D #
[ t 0 2sonsC .t/
Kyt 0 D
X
#Kyt 0
t 0 2sonsC .t/
X
2 O k: Cbc kO Cbc
t 0 2sonsC .t/
Finally the algorithm constructs Vb in not more than 2 O 2 O 4 2.#tO/.#Kyb /.#Kb / 2.#tO/.Cbc k/.Cbc k/ D 2Cbc .#tO/kO 2
8.4 Construction of adaptive cluster bases
359
operations. Adding the estimates gives us the bound 2 2 4 4 .2Cbc C Cqr Cbc /.#tO/kO 2 C 2Cbc .#r/ O C Csvdf Cbc .#r/ O kO 2 C 2Cbc .#tO/kO 2 2 4 2 4 maxf2Cbc C Cqr Cbc C 2Cbc ; 2Cbc C Csvdf Cbc g.#tO C #r/ O kO 2 maxf4 C Cqr ; 2 C Csvdf gC 4 .#tO C #r/ O kO 2 bc
for the number of operations used by Algorithm 53 applied to a block b D .t; r/ 2 TM;K . Using (8.20) yields the bound 4 2 O2 Csp k ..p C 1/n C .pK C 1/nK / Csbc Cbc
for the number of operations required by all calls to Algorithm 53, where Csbc ´ 3 maxf4 C Cqr ; 2 C Csvdf g: Adding this estimate to the one given by Lemma 8.15 completes the proof.
8.4 Construction of adaptive cluster bases In the first step of the adaptive multiplication algorithm, we have computed the exact product by Algorithm 50 using O.kO 2 ..p C 1/n C nJ C .pK C 1/nK // operations. z of In the second step, we have approximated the exact product by an H -matrix M 2 O O local rank k with the prescribed block cluster tree TC;K using O.k ..p C 1/n C .pK C 1/nK // operations. Now we have to approximate this intermediate H -matrix by an H 2 -matrix, i.e., we have to find suitable cluster bases and compute the H 2 -matrix best approximation of z in the corresponding H 2 -matrix space. M z can be apFortunately, we have already solved both problems: assuming that M 2 O proximated by an H -matrix with k-bounded cluster bases, Algorithm 27 computes adaptive cluster bases in O.kO 2 .pJ C 1/.n C nK // operations (cf. Lemma 6.25), z in the corresponding space and Algorithm 12 computes the best approximation of M 2 O in O.k .pJ C 1/.n C nK // operations (cf. Lemma 5.9). We can conclude that computing the adaptive H 2 -matrix approximation of the product M D C C AB can be computed in O.kO 2 .p C pJ C pK C 1/.n C nK / C kO 2 nJ / operations. Due to the structure of Algorithm 27, the computation of a cluster basis for a cluster t 2 T requires information on all admissible blocks b D .t C ; r/ 2 LC C;K connected C to ancestors t 2 pred.t /, i.e., we essentially have to store the H -matrix representation O J C 1/.n C nK // units of auxiliary storage are required. z explicitly, i.e., O.k.p of M In order to avoid the need for this large amount of temporary storage, the blockwise compression Algorithm 33 can be used. Combining this procedure with the coarsening
360
8 A posteriori matrix arithmetic
and multiplication routines allows us to avoid storing both the intermediate semiz , since we can convert each uniform matrix M0 and the H -matrix approximation M admissible block into an H 2 -matrix as soon as it becomes available. According to Theorem 6.32, this requires O.kO 2 .pJ C 1/.n C nK // operations, and the order of complexity is not changed if we use the blockwise compression Algorithm 33 instead of the direct Algorithm 27.
8.5 Numerical experiments We test the a posteriori matrix-matrix multiplication algorithm by applying it to compressed approximations of the single and double layer potential matrices V and K on the sphere and the cube. The results for the approximation of X ´ V 2 are given in Table 8.1: O n1=2 is the error tolerance for the compression algorithm, the column “A post.” gives the time in seconds for the matrix-matrix multiplication with adaptive cluster bases, the columns “Mem” and “Mem=n” the total storage requirements in MB and the requirements per degree of freedom in KB, and the column gives the relative spectral error kX V 2 k2 =kV 2 k2 of the product. Table 8.1. A posteriori and a priori multiplication for the single layer potential on the unit sphere (top half) and the unit cube (bottom half.)
n 512 2048 8192 32768 131072 768 3072 12288 49152 196608
O A post. A prio. Mem Mem=n 2:04 0:4 0:3 1:9 3:7 2:85 1:04 14:6 9:2 9:8 4:9 4:05 5:05 113:3 53:3 42:6 5:3 2:45 2:05 685:0 249:0 183:7 5:7 9:06 1:05 3626:3 1030:9 753:7 5:9 5:56 2:04 1:0 0:7 3:6 4:8 4:85 1:04 20:8 8:7 12:4 4:1 5:75 5:05 166:3 81:2 96:6 8:1 2:95 2:05 1041:7 356:5 415:4 8:7 9:96 1:05 5661:1 1316:3 1599:9 8:3 4:66
We can combine the new multiplication algorithm with the approach discussed in Chapter 7: according to Lemma 6.23, we can assume that the cluster bases chosen by the adaptive algorithm will be almost optimal, therefore computing the best approximation of the exact product in the corresponding H 2 -matrix space should yield good results. According to Theorem 7.19, we can expect the a priori multiplication Algorithm 45 to compute this best approximation in optimal complexity. The column “A prio.” in Table 8.1 gives the time in seconds required by Algorithm 45 to compute the best
8.5 Numerical experiments
361
approximation of the product with respect to the quasi-optimal cluster bases constructed by the adaptive multiplication algorithm. We can see that the algorithms work as expected: the time for the adaptive algorithm grows like O.n log.n//, the time for the a priori algorithm grows like O.n/. The storage requirements seem also to grow like O.n/. The measured error is always below the prescribed tolerance O , and this indicates that the error control works reliably. Now we perform the same experiment with the double layer potential matrix K instead of V . The results given in Table 8.2 show that the adaptive algorithm works as expected: the time required for the multiplication is almost linear, and the measured error is always bounded by the prescribed tolerance O . Using the cluster bases chosen by the adaptive algorithm, the a priori multiplication Algorithm 45 can compute approximations of the product that are far better than the ones given in Table 7.2 for the non-adaptive approach.
Table 8.2. A posteriori and a priori multiplication for the double layer potential on the unit sphere (top half) and the unit cube (bottom half).
n 512 2048 8192 32768 131072 768 3072 12288 49152 196608
O A post. A prio. Mem Mem=n 2:04 0:4 0:3 1:9 3:8 8:96 1:04 16:6 11:9 11:0 5:5 1:15 5:05 140:2 78:5 49:4 6:2 1:15 2:05 895:2 384:6 216:4 6:8 7:46 1:05 5245:4 1709:7 916:1 7:2 5:56 2:04 1:3 0:9 3:8 5:0 1:35 1:04 29:0 14:7 16:6 5:5 1:35 5:05 202:3 96:6 107:3 8:9 1:25 2:05 1184:7 383:8 445:0 9:3 8:96 1:05 5896:7 1268:8 1664:4 8:7 6:46
In a final experiment, we compare the adaptive multiplication algorithm of Chapter 8, the projected multiplication algorithm of Chapter 7 and the H -matrix multiplication algorithm [49], [52]. Table 8.3 lists computing times and accuracies for the adaptive algorithm (“Adaptive”), the a priori algorithm combined with the cluster bases provided by the adaptive algorithm (“A priori/new”) and the H -matrix algorithm. We can see that the H -matrix algorithm yields accuracies that are slightly better than those of the other algorithms, but also that it takes very long to complete. The adaptive algorithm finds H 2 -matrix approximations that are almost as good as the ones of the H -matrix algorithm, but it is faster and exhibits a better scaling behaviour. The a priori algorithm is easily the fastest, but it is also very inaccurate if the wrong kind of cluster basis is used. In combination with the cluster bases provided by the adaptive algorithm, the a priori algorithm is very fast and yields very good approximations.
362
8 A posteriori matrix arithmetic
Table 8.3. Adaptive and non-adaptive multiplication algorithms for the single and double layer potential on the cube.
n Adaptive A priori/new H -arithmetic 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 35:8 1:85 32:6 1:15 153:0 1:75 276:6 3:15 72:3 3:05 824:8 2:55 12288 49152 1343:7 4:15 240:5 4:05 6591:3 2:55 196608 6513:3 4:75 805:5 4:65 29741:6 2:55 KV 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 37:6 2:75 37:2 2:65 152:9 5:55 12288 270:3 3:55 87:6 3:45 755:5 5:45 49152 1267:0 4:35 291:6 4:25 5688:3 5:65 196608 5933:5 9:05 989:9 9:05 26404:6 5:55 VK 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 37:8 3:25 40:2 3:15 150:2 3:35 12288 267:1 3:35 94:4 3:35 737:4 3:75 49152 1219:5 4:55 310:4 4:45 5886:5 3:45 196608 5734:5 8:05 1067:0 7:85 27303:6 3:65 KK 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 39:8 6:86 44:0 6:66 151:4 5:36 12288 272:4 1:65 100:3 1:65 658:2 6:46 49152 1327:9 3:25 324:5 3:25 5066:8 1:45 196608 5791:6 4:55 1080:8 4:55 24862:1 1:55
Oper. VV
Chapter 9
Application to elliptic partial differential operators According to Theorem 6.21 and Corollary 6.22, we can approximate a matrix X 2 RJ by an efficient H 2 -matrix if the total cluster bases of X and X can be approximated by low rank. We have already seen (e.g., in Chapter 4 and Lemma 6.39) that these assumptions hold for integral operators. Now we turn our attention to elliptic partial differential operators. The discretization of a partial differential operator L by a standard finite element scheme always leads to a matrix L in which for all blocks b D .t; s/ satisfying even the fairly weak admissibility condition dist. t ; s / > 0 the equation t Xs D 0 (cf. Definition 3.20) holds, i.e., each admissible block can be “approximated” by rank zero without any loss. Therefore the matrix L is an H 2 -matrix with trivial cluster bases. The inverse matrix L1 is more interesting: in typical situations, it corresponds to the non-local inverse of the partial differential operator, and we can expect most of its entries to be non-zero. In order to be able to handle L1 efficiently, we need a datasparse representation, and our goal in this chapter is to prove that an H 2 -matrix can be used to approximate L1 up to a certain accuracy proportional to the discretization error. This limitation is due to the structure of the proof, it is not experienced in numerical experiments. The proof is based on an approximation result [6] for the solution operator L1 corresponding to the equation: if and are subdomains satisfying a suitable admissibility condition and if the support of a functional f is contained in , the restriction of L1 f to can be approximated efficiently in a low-dimensional space. The operator L1 and the matrix L1 are connected by the Galerkin projection: applying this projection and its adjoint from the left and right to L1 , respectively, directly yields L1 due to Galerkin orthogonality. Unfortunately, the Galerkin projection is typically a non-local mapping, therefore local approximation properties of L1 would be lost by this procedure. In order to fix this problem, we replace the Galerkin projection by a different mapping into the discrete space. In [6], the L2 -orthogonal projection is used, which is at least quasi-local (i.e., exhibits exponential decay as the distance to the support grows) and leads to an error estimate for the approximation of L1 by an H -matrix. Since the L2 -projection is only quasi-local, the construction of the blockwise error estimates needed by H 2 -matrix approximation theory is complicated. In [15] a different approach is presented: the Galerkin projection is replaced by a Clément-type interpolation operator, and we get a new matrix S approximating the
364
9 Application to elliptic partial differential operators
inverse L1 . Since the interpolation operators are local, they can be used to easily derive the blockwise error estimates we need. Since the operators are also L2 -stable, they provide approximations that are almost as good as those of the L2 -orthogonal projection. This chapter is organized as follows: • Section 9.1 introduces a model problem for an elliptic partial differential equations with non-smooth coefficients. • Section 9.2 describes a construction for a low-rank approximation of the solution operator of the partial differential equation. • Section 9.3 uses this result to find low-rank approximations of admissible submatrices of the discrete solution operator S . • Section 9.4 applies the error estimates of Theorem 6.16 and Corollary 6.17 to prove that the discrete solution operator S can be approximated by an efficient H 2 -matrix. • Numerical experiments are described together with the approximative inversion algorithm in Section 10.5. Assumptions in this chapter: We assume that a cluster tree T for the finite index is given. Let T be an admissible block cluster tree for T . Let n ´ # and c ´ #T denote the number of indices and clusters of and T . Let p be the depth of T .
9.1 Model problem We fix a domain Rd and a coefficient function C W ! Rd d satisfying C.x/ D C.x/ ;
.C.x// Œ˛; ˇ
for all x 2 :
We are interested in the partial differential operator Lu ´
d X
@i Cij @j u
i;j D1
mapping the Sobolev space H01 . / into H 1 . /. For f 2 H 1 . /, the partial differential equation Lu D f (9.1)
9.1 Model problem
is equivalent to the variational equation Z a.v; u/ ´ hrv.x/; C.x/ru.x/i2 dx D f .v/
for all v 2 H01 . /:
365
(9.2)
The bounds for the spectrum of C imply ˛kwk22 hC.x/w; wi2 D kC.x/1=2 wk22 ˇkwk22 ;
for all w 2 Rd :
Combining this inequality with the Cauchy–Schwarz inequality provides us with the upper bound hrv.x/; C.x/ru.x/i2 D hC.x/1=2 rv.x/; C.x/1=2 ru.x/i2 kC.x/1=2 rv.x/k2 kC.x/1=2 ru.x/k2 ˇkrv.x/k2 kru.x/k2 ; and the definition of the Sobolev space H 1 . / yields ja.v; u/j ˇkvkH 1 ./ kukH 1 ./
for all u; v 2 H 1 . /:
This means that the bilinear form a is bounded, i.e., continuous. Due to hru.x/; C.x/ru.x/i2 ˛kru.x/k22 ; Friedrichs’ inequality implies the existence of a constant C 2 R>0 depending only on the domain such that 2 2 a.u; u/ ˛krukL 2 ./ C ˛kukH 1 ./
for all u 2 H01 . /;
i.e., a is a coercive bilinear form, therefore (9.2) and the equivalent (9.1) possess unique solutions [34]. Usually, strongly elliptic partial differential equations of the type (9.1) are treated numerically by a finite element method: a mesh h for the domain is constructed, and basis functions .'i /i2 are used to define a finite-dimensional space Vn ´ spanf'i W i 2 g H01 . /; where is a finite index set and n ´ # is the dimension of the discrete space Vn . Using the standard Galerkin approach, an approximation un 2 Vn of u is represented in the form X xi 'i un D i2
for the solution vector x 2 R of the linear system Lx D b
(9.3)
366
9 Application to elliptic partial differential operators
given by the stiffness matrix L 2 R and the load vector b 2 R defined by Lij D a.'i ; 'j /;
bi D f .'i /
for all i; j 2 :
(9.4)
The system (9.3) can be solved by several techniques, e.g., by fast direct solvers [93], multigrid iterations [61] or H - and H 2 -matrix methods [62], [52], [18], [15]. H - and H 2 -matrix techniques offer the advantage that they can handle jumps and anisotropies in the coefficient matrix C better than multigrid techniques and that they are more efficient than direct solvers for large problems.
9.2 Approximation of the solution operator Let Rd be a convex set with \ ¤ ;. Let be a subset with dist.; / > 0.
Let 2 R>0 . We are looking for a low-dimensional space V H 1 . / such that for each right-hand side f 2 H 1 . / with supp f the corresponding solution u 2 H01 . / of the variational equation (9.2) can be approximated in V , i.e., such that there exists a v 2 V with ku vkH 1 . / kf kH 1 ./ : Since V is required to be independent of f , this property implies that the interaction between the domains and can be described by a low-rank operator. If the coefficient function C and the boundary of were sufficiently smooth, interior regularity estimates would yield an estimate of the form m c kuj kH m . / C mŠkf kH 1 ./ for all m 2 N0 dist.; / and we could simply approximate uj by a polynomial uQ of order m. In this setting, the space V would have a dimension . md and the approximation uQ would converge
9.2 Approximation of the solution operator
367
exponentially with respect to the order m if an admissibility condition of the type (4.11) or (4.37) holds. In the general case, we have to use a refined approach first presented in [6]: since u 2 H 1 . / holds, we can approximate the solution by a piecewise constant function, but the convergence rate will not be exponential. Projecting this function into a local space of L-harmonic functions (cf. Definition 9.1 below) yields an approximation v1 . We can apply a weak interior regularity argument to show that v1 j 1 is contained in H 1 .1 / for a subset 1 , therefore the error u1 ´ uj 1 v1 j 1 is also an Lharmonic function in H 1 .1 /, and the argument can be repeated until a sufficiently accurate approximation v ´ v1 C C vp has been found. The key element of the proof is the space of locally L-harmonic functions: Definition 9.1 (Locally L-harmonic functions). Let ! Rd be a domain (that may be unrelated to ). A function u 2 L2 .!/ is called locally L-harmonic on ! if for all !Q ! with dist.!; Q @!/ > 0 the following conditions hold: uj!Q 2 H 1 .!/; Q a.v; uj / D 0 uj!n D 0:
(9.5a) for all v 2
H01 . /
with supp v !; Q
(9.5b) (9.5c)
The space of all locally L-harmonic functions on ! is denoted by H .!/. For functions in H .!/, the following weak interior regularity estimate holds (cf. Lemma 2.4 in [6]): Lemma 9.2 (Cacciopoli inequality). Let u 2 H .!/, and let !Q ! be a domain with Q and dist.!; Q @!/ > 0. Then we have uj!Q 2 H 1 .!/ p creg krukL2 .!/ kukL2 .!/ ; creg ´ 4 ˇ=˛ 4: Q dist.!; Q @!/ Proof. Let ı ´ dist.!; Q @!/. Let 2 C 1 . [ !/ be a function satisfying 0 1; j!Q 1; krk2 2=ı; .x/ D 0 for all x 2 ! with dist.x; @!/ < ı=4: Such a function exists since the distance between the subdomain !Q and the boundary of ! is ı > 0. For the domain !O ´ fx 2 ! W dist.x; @!/ > ı=8g; we have dist.!; O @!/ ı=8 > 0, so (9.5a) implies uj!O 2 H 1 .!/, O and since is continuously differentiable and bounded, we have v ´ 2 u 2 H01 .!/ and can extend this function by zero to H01 . [ !/. Due to (9.5c), we get vj 2 H01 . / with supp v !O and uj 2 H01 . /, therefore we can use (9.5b) in order to prove Z 0 D a.vj ; uj / D hC.x/r.2 u/.x/; ru.x/i2 dx !\ O
368
9 Application to elliptic partial differential operators
Z D !\ O
hC.x/.2ur C 2 ru/.x/; rui2 dx:
Moving the second term of this sum to the left side of the equation yields Z Z 2 1=2 2 .x/ kC.x/ ru.x/k2 dx D .x/2 hC.x/ru.x/; ru.x/i2 dx !\ O !\ O Z D 2 .x/u.x/hC.x/r.x/; ru.x/ dx !\ O Z 2 .x/ju.x/j kC.x/1=2 r.x/k2 kC.x/1=2 ru.x/k2 dx !\ O Z 1=2 2ˇ .x/ju.x/j kr.x/k2 kC.x/1=2 ru.x/k2 dx !\ O
4 4
ˇ
1=2
Z
ı
!\ O
.x/ju.x/j kC.x/1=2 ru.x/k2 dx
ˇ 1=2 kL2 .!\/ kuj!\ O O ı
1=2
Z !\ O
.x/2 kC.x/1=2 ru.x/k22 dx
Dividing both sides by the rightmost factor (if it is not zero) yields Z
1=2 2
1=2
.x/ kC.x/ !\ O
ru.x/k22
dx
4
ˇ 1=2 kL2 .!\/ : kuj!\ O O ı
Due to j!Q 1 and (9.5c), we get Z kruj!Q kL2 .!/ kL2 .!\/ D Q Q D kruj!\ Q ˛ ˛
1=2
1=2
4 ı
ˇ ˛
Z
1=2 2
.x/ !\ Q
2
1=2
ru.x/k22
2
1=2
ru.x/k22
.x/ kC.x/ Z
kru.x/k22
!\ Q
dx
1=2 dx 1=2
.x/ kC.x/ !\ O 1=2
kuj!\ kL2 .!\/ O O
dx p 4 ˇ=˛ kukL2 .!/ : ı
This is the required estimate. As mentioned before, we use orthogonal projections to map functions from L2 .!/ into H .!/. The construction of these projections is straightforward if H .!/ is a complete set, i.e., closed in L2 .!/. Using Lemma 9.2, this property can be proven (the proof is a variant of that of Lemma 2.2 in [6] which requires only elementary tools): Lemma 9.3. The space H .!/ is a closed subspace of L2 .!/.
9.2 Approximation of the solution operator
369
Proof. Let .un /n2N be a Cauchy sequence in H .!/ with respect to the L2 .!/-norm. Since L2 .!/ is complete, we can find a function u 2 L2 .!/ with lim kun ukL2 .!/ D 0:
(9.6)
n!1
Let !Q ! be a domain with dist.!; Q @!/ > 0. Lemma 9.2 implies 1=2 2 2 kvj!Q kH 1 .!/ C krvj k Q kL2 .!/ ! Q 2 Q D kvj! Q Q L .!/ C kvkL2 .!/ with the constant
for all v 2 H .!/
16ˇ=˛ C ´ 1C dist.!; Q @!/2
(9.7)
1=2 ;
therefore we have kun j!Q um j!Q kH 1 .!/ Q C kun um kL2 .!/
for all n; m 2 N
and conclude that .un j!Q /n2N is a Cauchy sequence with respect to the H 1 .!/-norm. Q Since H 1 .!/ Q is complete, we can find a function u!Q 2 H 1 .!/ Q with lim kun j!Q u!Q kH 1 .!/ Q D 0:
(9.8)
n!1
Q (9.6) implies Since the restriction to !Q is a continuous mapping from L2 .!/ to L2 .!/, lim kun j!Q uj!Q kL2 .!/ Q D 0:
n!1
Combining this property with (9.8) and the estimate kuj!Q u!Q kL2 .!/ Q un j! Q C un j! Q u! Q kL2 .!/ Q D kuj! Q kuj!Q un j!Q kL2 .!/ Q u! Q kL2 .!/ Q C kun j! Q kuj!Q un j!Q kL2 .!/ Q u! Q kH 1 .!/ Q C kun j! Q
for all n 2 N
1 yields kuj!Q u!Q kL2 .!/ Q Q D u! Q 2 H .!/. Q D 0, i.e., uj! 1 Let now v 2 H0 . / with supp v !. Q We have just proven that uj!Q 2 H 1 .!/ Q is 1 the limit of un j!Q with respect to the H -norm, therefore (9.5b) and the continuity of a imply a.v; uj / D lim a.v; un j / D 0: n!1
The continuity of the restriction from L2 .!/ to L2 .! n / yields uj!n D lim un j!n D 0; n!1
and we can conclude that u 2 H .!/ holds, therefore H .!/ is closed.
370
9 Application to elliptic partial differential operators
We introduce the maximum-norm diameter diam1 .!/ ´ supfkx yk1 W x; y 2 !g D supfjxi yi j W x; y 2 !; i 2 f1; : : : ; d g and can now state the basic approximation result (the proof is a slight modification of the proof of Lemma 2.6 in [6]): Lemma 9.4 (Finite-dimensional approximation). Let ! Rd be a convex domain. Let ` 2 N. Let Z be a closed subspace of L2 .!/. There is a space V Z with dim.V / `d such that for all u 2 Z \ H 1 .!/ a function v 2 V can be found with p diam1 .!/ 2 d ku vkL2 .!/ capx krukL2 .!/ ; capx ´ : ` Proof. We let ı ´ diam1 .!/ and introduce ai ´ inffxi W x 2 !g
for all i 2 f1; : : : ; d g:
By definition, we have xi ai D jxi ai j ı
for all i 2 f1; : : : ; d g; x 2 !;
i.e., the d -dimensional hypercube Q ´ a C Œ0; ıd satisfies ! Q. We let SQ ´ f1; : : : ; `gd and define Q ´ a C
d O i 1 i ı; ı ` `
for all 2 SQ
iD1
and observe that .Q /2SQ is a family of `d disjoint open hypercubes satisfying p diam1 .Q / D ı=` and diam.Q / D d ı=` that covers Q up to a null set. For each 2 SQ , we let ! ´ ! \ Q . Defining S! ´ f 2 SQ W j! j > 0g; we have found that .! /2S! is a family of not more than `d convex sets that covers ! up to a null set. We construct an intermediate approximation by piecewise constant functions defined on .! /2S! , i.e., by functions in the space W ´ fw 2 L2 .!/ W wj! is constant almost everywhere for all 2 S! g L2 .!/: For a function u 2 L2 .!/, we let w ´
1 j! j
Z u.x/ dx !
for all 2 S!
9.2 Approximation of the solution operator
371
and introduce the piecewise constant approximation w 2 W by w.x/ ´ w
for all 2 S! ; x 2 ! :
Due to u 2 H 1 .!/, the Poincaré inequality yields Z diam.! /2 kru.x/k22 dx ju.x/ w j 2 ! ! Z d ı2 2 2 kru.x/k22 dx for all 2 S! ` !
Z
2
and summing over all 2 S! yields ku wkL2 .!/
p dı krukL2 .!/ : `
This is already the desired estimate, but w is not necessarily contained in the space Z.
!2;3
Figure 9.1. Domains for the piecewise constant approximation in Lemma 9.4.
Since Z is a closed subspace of L2 .!/, the orthogonal projection …Z W L2 .!/ ! Z defined by h…Z f; giL2 .!/ D hf; giL2 .!/
for all f 2 L2 .!/; g 2 Z
exists and satisfies k…Z kL2 .!/ L2 .!/ 1 and …Z g D g for all g 2 Z. We let V ´ …Z W and v ´ …Z w 2 V and conclude p dı ku vkL2 .!/ D k…Z u …Z wkL2 .!/ ku wkL2 .!/ krukL2 .!/ ; ` i.e., V is a subspace of Z with a dimension not larger than `d satisfying the desired estimate.
372
9 Application to elliptic partial differential operators
Combining the approximation result of Lemma 9.4 with the regularity result of Lemma 9.2 allows us to find finite-dimensional spaces approximating the solutions of the variational equation (9.2): Theorem 9.5 (Low-rank approximation). Let 2 R>0 and q 2 .0; 1/. There are constants Capx ; Cdim 2 R>0 such that for all convex open domains Rd and all p 2 N2 , we can find a space V H . / satisfying dim V Cdim p d C1 :
(9.9)
diam1 . / 2 dist.; /
(9.10)
For all domains with
and all right-hand sides f 2 H 1 . / with supp f , the corresponding solution u 2 H01 . / of the variational equation (9.2) can be approximated by a function v 2 V with kruj rvj kL2 . / Capx q p kf kH 1 ./ ;
(9.11) p
kuj vj kH 1 . / Capx .dist.; /=8 C 1/q kf kH 1 ./ :
(9.12)
Proof. Let Rd be a convex open domain, let be a domain satisfying (9.10), let ı ´ diam. /=.2/, and let p; ` 2 N. We introduce the domains ² ³ .p i /ı !i ´ x 2 Rd W dist.x; / < for all i 2 f0; : : : ; pg: p
!0 !p
By construction we have !p , !i !i1 for all i 2 f1; : : : ; pg and !0 \ D ;. In order to apply Lemma 9.2, we need an estimate for the distance between the boundaries of these subdomains. Let i 2 f1; : : : ; pg, x 2 !i and y 2 @!i1 . Due to dist.x; / < .p i /ı=p, we can find z 2 with kx zk2
.p i /ı p
9.2 Approximation of the solution operator
373
Due to dist.y; / D .p i C 1/ı=p and z 2 , we have ky zk2
.p i C 1/ı p
and conclude kx yk2 ky zk2 kz xk2
ı .p i C 1/ı .p i /ı D ; p p p
i.e., dist.!i ; @!i1 / ı=p. Let f 2 H 1 . / with supp f . Let u 2 H01 . / be the corresponding solution of the variational equation (9.2). We define u0 2 L2 .!0 / by letting u0 j!0 \ ´ uj!0 ;
u0 j!0 n ´ 0:
Q Since u equals zero on @ , we get u0 2 H 1 .!0 /. For all v 2 H01 . / with supp v !, we have supp v \ supp f D ; and therefore a.v; u0 j / D a.v; u/ D f .v/ D 0; so we can conclude u0 2 H .!0 /. We now construct ui 2 H .!i / and vi 2 Vi H .!i1 / for all i 2 f1; : : : ; pg such that ui D .ui1 vi /j!i and 2capx . C 1/ı krui1 kL2 .!i 1 / ; ` (9.13) p krui kL2 .!i / c krui1 kL2 .!i 1 / ` hold for the constant c ´ 2creg capx . C 1/. Let i 2 f1; : : : ; pg and assume that ui1 2 H .!i1 / is given. We apply Lemma 9.4 to find a space Vi H .!i1 / with dim Vi `d and a function vi 2 Vi with kui kL2 .!i /
diam1 .!i1 / krui1 kL2 .!i 1 / ` diam1 . / C 2ı capx krui1 kL2 .!i 1 / : ` By definition, we have diam1 . / 2ı, and the estimate becomes kui1 vi kL2 .!i 1 / capx
2. C 1/ı (9.14) krui1 kL2 .!i 1 / : ` According to Lemma 9.2, the restriction ui ´ .ui1 vi /j!i of the error ui1 vi is contained in H .!i / and the interior regularity estimate creg krui kL2 .!i / kui1 vi kL2 .!i 1 / dist.!i ; @!i1 / p 2. C 1/ı p creg capx krui1 kL2 .!i 1 / D c krui1 kL2 .!i 1 / : ı ` ` kui1 vi kL2 .!i 1 / capx
374
9 Application to elliptic partial differential operators
holds. Restricting the left side of (9.14) to the subdomain !i !i1 completes the induction and we have proven (9.13) for all i 2 f1; : : : ; pg. Iterating the second estimate of (9.13) yields p i krui kL2 .!i / c kru0 kL2 .!0 / for all i 2 f1; : : : ; pg: ` By construction, we have up j D up1 j vp j D up2 j vp1 j vp j D D u0 j v1 j vp j ; so v ´ v1 j C C vp j is an approximation of u0 j D uj . We define the space V ´ V1 j C C Vp j ; where the restriction of the space is interpreted as the space spanned by the restriction of its elements, and get v 2 V with dim V p`d . The estimates (9.13) imply 2capx . C 1/ı krup1 kL2 .!i 1 / ` 2capx . C 1/ı p p1 kru0 kL2 .!0 / c ` ` ı p p D kru0 kL2 .!0 / ; c creg p ` p p D krup j kL2 . / c kru0 kL2 .!0 / : `
kuj vkL2 . / D kup j kL2 . /
kr.uj v/kL2 . /
Combining both estimates allows us to bound the full H 1 -norm by 1=2 2 2 kuj vkH 1 . / D kuj vkL C kr.uj v/j k L2 . / 2 . / !1=2 p p ı2 c C 1 kru0 kL2 .!0 / 2 p2 creg ` 2 1=2 p p ı c C 1 kru0 kL2 .!0 / 2 2 4 2 ` p p .ı=8 C 1/ c kru0 kL2 .!0 / : ` Since u is the solution of (9.2), we have 2 kukH 1 ./
1 C ˛
a.u; u/ D
1 C ˛
f .u/
1 C ˛
kf kH 1 ./ kukH 1 ./ ;
and this implies kru0 kL2 .!0 / krukL2 ./ kukH 1 ./
1 kf kH 1 ./ : C ˛
9.2 Approximation of the solution operator
375
Combining this estimate with the error bounds for u v yields 1 p p kr.uj v/kL2 . / kf kH 1 ./ ; c C ˛ ` p p 1 kuj vkH 1 . / kf kH 1 ./ : .ı=8 C 1/ c C ˛ ` In order to get the estimates (9.9), (9.11) and (9.12), we have to choose ` appropriately. A simple approach is to let cp ; `´ q since this yields p p p pq c c D q; c qp ; cp ` ` and the dimension of V can be bounded by `
cp c 1 C1 pC p D q q 2
c 1 C p q 2
due to p 2, so setting Cdim ´
c 1 C q 2
d ;
Capx ´
1 C ˛
yields dim V p`d Cdim p d C1 and kr.uj v/kL2 . / Capx q p kf kH 1 ./ ; kuj vkH 1 . / Capx q p .ı=8 C 1/kf kH 1 ./ ; therefore the proof is complete. This result is closely related to Theorem 2.8 in [6], but it yields an H 1 -norm estimate for the solution of the variational equation (9.2) using the H 1 -norm of the right-hand side functional instead of an L2 -norm estimate of Green’s function. The main difference between both proofs is that the one given here exploits the fact that the original solution u already is L-harmonic in , therefore we can perform the approximation by Lemma 9.4 first, and follow it by the regularity estimate of Lemma 9.2 in order to get an H 1 -estimate for the error. The proof of [6], Theorem 2.8, on the other hand, deals with Green’s function, and this function is not globally in H 1 , therefore the first step has to be the regularity estimate and the resulting error bound is given only for the L2 -norm. Remark 9.6 (Direct L2 -norm estimate). Of course, we can use the same ordering of regularity estimates and approximation steps in Theorem 9.5 in order to get an estimate of the form p p kuj vkL2 . / c ku0 kL2 .!0 / Capx q p kf kH 1 ./ `
376
9 Application to elliptic partial differential operators
instead of (9.11). Since the space V constructed in this way would differ from the one used in Theorem 9.5, we cannot simply combine both estimates in order to get an estimate for the full H 1 -norm and have to rely on results of the type (9.12) instead. Remark 9.7 (Influence of ˛ and ˇ). The bounds ˛ and ˇ for the spectra of the coefficient matrices influence the constants Cdim and Capx . Due to p p 2 d . C 1/; c D 2creg capx . C 1/ D 8 ˇ=˛ we have p p 2 d 1 d Cdim D 16 ˇ=˛ . C 1/ C q 2 and can expect Cdim . .ˇ=˛/d=2 for ˇ ˛. The constant ˛ also appears in the definition of Capx , and we get Capx . 1=˛.
9.3 Approximation of matrix blocks The problem of proving that the H 2 -matrix arithmetic algorithms yield a sufficiently accurate approximation of the inverse can be reduced to an existence result: since the adaptive arithmetic operations (cf. Section 8 and [49]) have a best-approximation property, we only have to show that an approximation of L1 by an H 2 -matrix exists, because this already implies that the computed approximation Sz will be at least as good as this approximation. This proof of existence can be accomplished using our main result stated in Theorem 9.5: a block t L1 s describes the mapping from a right-hand side vector b with support in s to the restriction of the corresponding discrete solution to t . In order to apply our approximation result, we have to exploit the relationship between the inverse matrix L1 and the inverse operator L1 . In [6], this problem is solved by applying L2 -orthogonal projections. Since these projections are non-local operators, additional approximation steps are required, which increase the rank, lead to sub-optimal error estimates, and make the overall proof quite complicated. We follow the approach presented in [15]: instead of a non-local L2 -projection, a Clément-type interpolation operator [35] can be used to map continuous functions into the discrete space. These operators are “sufficiently local” to provide us with improved error estimates and guarantee that the rank of the approximation will not deteriorate. In order to keep the presentation simple, we assume that the finite element mesh is shape-regular in the sense of [38], Definition 2.2. It is not required to be quasi-uniform. Let us recall the basic definitions and properties of Clément interpolation operators: we fix a family .i /i2 of functionals mapping L2 . / into R such that supp i supp 'i
for all i 2 ;
(9.15)
9.3 Approximation of matrix blocks
377
holds, that the local projection property i .'j / D ıij
for all i; j 2
(9.16)
is satisfied and also that the local stability property ki .u/'i kL2 ./ Ccs kukL2 .supp 'i /
for all i 2 ; u 2 L2 . /
(9.17)
holds for a constant Ccs 2 R>0 depending only on the shape-regularity of the mesh. Constructions of this kind can be found in [95], [8]. The interpolation operator is defined by X I W L2 . / ! Vn ; u 7! i .u/'i : (9.18) i2
The local projection property (9.16) implies its global counterpart Ivn D vn
for all vn 2 Vn :
(9.19)
Since the matrices we are dealing with are given with respect to the space R , not Vn , we need a way of switching between both spaces. This is handled by the standard basis isomorphism X ˆ W R ! Vn H01 . /; x 7! xi 'i : i2
The interpolation operator I can be expressed by I D ˆƒ if we define ƒ W L2 . / ! R by .ƒv/i ´ i .v/
for all i 2 ; v 2 L2 . /:
In order to construct the approximation of L1 by using L1 , we turn a vector b 2 R into a functional, apply L1 , and approximate the result again in Vh . The first step can be accomplished by using the adjoint of ƒ: we define X ƒ W R ! .L2 . //0 ; b 7! bi i : i2
This operator turns each vector in R into a functional on L2 . /, and due to X X X hƒ b; vi D bi hi ; vi D bi i .v/ D bi .ƒv/i D hb; ƒvi2 ; i2
it is indeed the adjoint of ƒ.
i2
i2
378
9 Application to elliptic partial differential operators
If the vector b is given by (9.4) for a right-hand side functional f 2 H 1 . /, the projection property (9.16) implies X .ƒ b/.'j / D bi i .'j / D bj D f .'j / for all j 2 ; b 2 R ; i2
therefore the functional ƒ b and the original right-hand side f of (9.1) yield the same Galerkin approximation un . We have to prove that ƒ is a bounded mapping with respect to the correct norms. In order to do so, we first have to prove that I is L2 -stable. The shape-regularity of the mesh implies that there is a constant Csr 2 N such that #fj 2 W supp 'i \ supp 'j ¤ ;g Csr
for all i 2
holds. Since the local finite element spaces are finite-dimensional, there is also a constant Cov 2 N such that #fj 2 W 'j .x/ ¤ 0g Cov
for all x 2 :
Using these estimates, we can not only prove the well-established global L2 -stability of the interpolation operator I, but also localized counterparts on subdomains corresponding to clusters: p Lemma 9.8 (Stability). Let Ccl ´ Ccs Csr Cov . For all tO and all with supp 'i
for all i 2 tO
we define the local interpolation operator I t W L2 . / ! Vn ;
v 7!
X
i .v/'i :
i2tO
Then we have kI t vkL2 ./ Ccl kvkL2 . /
for all v 2 L2 . /:
(9.20)
In particular, the interpolation operator I is L2 -stable, i.e., satisfies kIvkL2 ./ Ccl kvkL2 ./
for all v 2 L2 . /:
Proof. Let t 2 T and v 2 L2 . t /. We define the functions i and ij by ´ 1 if 'i .x/ ¤ 0; i W ! N; x 7! for all i 2 ; 0 otherwise; ´ 1 if 'i .x/ ¤ 0; 'j .x/ ¤ 0; ij W ! N; x 7! for all i; j 2 0 otherwise
(9.21)
9.3 Approximation of matrix blocks
and observe X i .x/ Cov ;
X
i2
ij .x/ D
i2
X
379
for all j 2 ; x 2 :
j i .x/ Csr
i2
Combining these estimates with Cauchy’s inequality yields Z X 2 X 2 2 i .v/'i 2 D i .v/'i .x/ dx kI t vkL2 ./ D L ./
O
D
t Z i2X X
D
i2tO
i .v/j .v/'i .x/'j .x/ dx
i2tO j 2tO
Z XX
ij .x/i .v/j .v/'i .x/'j .x/ dx
i2tO j 2tO
Z XX
ij .x/i .v/2 'i .x/2
1=2
i2tO j 2tO
XX
ij .x/j .v/2 'j .x/2
1=2 dx
i2tO j 2tO
D
Z XX
D Csr
ij .x/i .v/2 'i .x/2 dx Csr
i2tO j 2tO
X
XZ i2tO
i .v/2 'i .x/2 dx
2 ki .v/'i kL 2 ./ :
i2tO
Now we apply the local stability estimate (9.17) to get 2 kI t vkL 2 ./
Csr Ccs2
X i2tO
D Csr Ccs2 D
2 kvkL 2 .supp ' / i
XZ
D
Csr Ccs2
XZ i2tO
v.x/2 dx supp 'i
Z
i .x/v.x/2 dx Csr Cov Ccs2
i2tO 2 Csr Cov Ccs2 kvkL 2 . /
v.x/2 dx
This is equivalent to (9.20). For tO ´ and ´ , this estimate implies (9.21). We are interested in bounding ƒ b by a norm of the vector b, so we need a connection between coefficient vectors and the corresponding elements of Vn . Since the finite element mesh is shape-regular, a simple application of Proposition 3.1 in [38] yields that there is a positive definite diagonal matrix H 2 R satisfying Cb1 kH d=2 xk2 kˆxkL2 ./ Cb2 kH d=2 xk2
for all x 2 R ;
380
9 Application to elliptic partial differential operators
where Cb1 ; Cb2 2 R>0 are constants depending only on the shape-regularity of the mesh. Using this inequality, we can prove the necessary properties of ƒ : Lemma 9.9 (ƒ bounded and local). Let % 2 Œ0; 1 and b 2 R . We have kƒ bkH 1C% ./
Ccl kH d=2 bk2 ; Cb1
(9.22)
i.e., ƒ is a continuous mapping from R to H 1C% . /. The mapping preserves locality, i.e., it satisfies [ fsupp 'i W i 2 ; bi ¤ 0g: supp.ƒ b/
(9.23)
Proof. Let v 2 H01% . /. Let y ´ ƒv and vn ´ ˆy D Iv. By the definition of ƒ we get .ƒ b/.v/ D hb; ƒvi2 D hb; yi2 D hb; H d=2 H d=2 yi2 D hH d=2 b; H d=2 yi2 kH d=2 bk2 kH d=2 yk2 kH d=2 bk2 kH d=2 bk2 kˆyk2 D kvn kL2 ./ Cb1 Cb1 kH d=2 bk2 Ccl D kIvkL2 ./ kH d=2 bk2 kvkL2 ./ ; Cb1 Cb1
and this implies (9.22) due to kvkL2 ./ kvkH 1% ./ . Due to (9.15) and the definition of ƒ , we have X X [ [ supp.ƒ b/ D supp bi i D supp b i i supp i supp 'i : i2
i2 bi ¤0
i2 bi ¤0
i2 bi ¤0
This is the desired inclusion. This result allows us to switch from the vector b corresponding to the discrete setting to the functional f of the variational setting. In the variational setting, we can apply Theorem 9.5 to construct the desired approximation of the solution, then we have to switch back to the discrete setting. Unfortunately, we cannot use the Galerkin projection to perform this last step, which would be the natural choice considering that we want to approximate un , since it is a global operator and the approximation result only holds for a subdomain. Therefore we have to rely on the Clément-type interpolation operator again, which has the desired locality property. Using interpolation instead of the Galerkin projection leads to a second discrete approximation of L1 , given by the matrix S D ƒL1 ƒ 2 R :
9.3 Approximation of matrix blocks
381
If we let b 2 R , f ´ ƒ b, u ´ L1 f and uQ n ´ Iu, we observe ˆS b D uQ n , i.e., S provides us with the coefficients of the Clément-type approximation of the solution operator. Since I is L2 -stable (cf. (9.21)) and a projection (cf. (9.19)), we have the estimate kuQ n ukL2 ./ D kIu ukL2 ./ D kIu Ivn C vn ukL2 ./ D kI.u vn / C .vn u/kL2 ./ kI.vn u/kL2 ./ C kvn ukL2 ./ .Ccl C 1/kvn ukL2 ./
(9.24)
for all vn 2 Vn , i.e., uQ n is close to the best possible approximation of u with respect to the L2 -norm. In particular we can apply (9.24) to the Galerkin solution un and get kuQ n ukL2 ./ .Ccl C 1/kun ukL2 ./ ;
(9.25)
therefore uQ n will converge to the same limit as un , and its rate of convergence will be at least as good. In fact, (9.24) even implies that uQ n may converge faster than un in situations of low regularity: due to u 2 H 1 . /, we can always expect uQ n to converge at least like O.h/ with respect to the L2 -norm, where h is the maximal meshwidth. If we assume that the equation (9.1) is H 1% . /-regular, it is possible to derive a refined error estimate for the matrices S and L1 : Lemma 9.10 (Clément vs. Galerkin). Let % 2 Œ0; 1. We assume that for all functionals f 2 H 1C% . /, the solution u ´ L1 f satisfies u 2 H01C% . / and kukH 1C% ./ Crg kf kH 1C% ./ :
(9.26)
Then there is a constant Ccg 2 R>0 depending only on Crg , Ccl , Cb1 , and the shaperegularity of the mesh with kH d=2 .S L1 /bk2 Ccg h2˛ kH d=2 bk2
for all b 2 R ;
where h 2 R>0 is the maximal meshwidth. Proof. Let b 2 R , let f ´ ƒ b and u ´ L1 f . Let x ´ L1 b and un ´ ˆx. Q By definition, we have uQ n D Iu. Let xQ ´ S b and uQ n ´ ˆx. Using the standard Aubin–Nitsche lemma yields ku un kL2 ./ Can h2˛ kf kH 1C% ./ ; with a constant Can depending only on Crg and the shape-regularity parameters of the mesh. Combining this estimate with (9.25) gives us kun uQ n kL2 ./ kun ukL2 ./ C ku uQ n kL2 ./ Can .Ccl C 2/h2˛ kf kH 1C˛ ./ :
382
9 Application to elliptic partial differential operators
We observe 1 kˆS b ˆL1 bkL2 ./ Cb1 1 Can .Ccl C 2/ 2˛ D kuQ n un kL2 ./ h kf kH 1C˛ ./ Cb1 Cb1 Can .Ccl C 2/ 2˛ Ccl h kH d=2 bk2 Cb1 Cb1
kH d=2 .S L1 /bk2
2 and complete the proof be setting Ccg ´ Can Ccl .Ccl C 2/=Cb1 .
Due to this result, a good approximation of S on a sufficiently fine mesh is also a good approximation of L1 , and a good approximation of S can be constructed by Theorem 9.5: we pick subsets tO; sO and subsets ; Rd with supp 'i ;
supp 'j ;
is convex
for all i 2 tO; j 2 sO :
(9.27)
Theorem 9.5 yields the following result: Theorem 9.11 (Blockwise low-rank approximation). Let 2 R>0 , q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all tO; sO and ; Rd satisfying (9.27) and the admissibility condition diam. / 2 dist.; / (9.28) and all p 2 N2 we can find a rank k 2 N with k Cdim p d C1 and matrices X t;s 2 RtOk , Y t;s 2 RsO k with /bk2 Cblk q p kHsd=2 bk2 kH td=2 .SjtOOs X t;s Y t;s
for all b 2 RsO
for H t ´ H jtOtO , Hs ´ H jsO Os , i.e., the submatrix of S corresponding to the block tO sO can be approximated by a matrix of rank k. Proof. If \ D ; or \ D ;, we have u 0 and the error estimates holds for the trivial space V D f0g. Therefore we can restrict our attention to the case \ ¤ ;, \ ¤ ;. Due to Theorem 9.5, there is a space V L2 . / with dim V Cdim p d C1 such that for all f 2 H 1 . / with supp f we can find a function v 2 V satisfying kuj vkH 1 . / Capx .dist.; /=8 C 1/q p kf kH 1 ./ with u ´ L1 f . Let b 2 RsO . We extend b to a vector bO 2 R by ´ bj if j 2 sO ; bOj ´ for all j 2 : 0 otherwise
(9.29)
9.3 Approximation of matrix blocks
383
Due to Lemma 9.9, the functional f ´ ƒ bO satisfies supp f ;
kf kH 1 ./
Ccl O 2 D Ccl kH d=2 bk2 : kH d=2 bk s Cb1 Cb1
(9.30)
Let u ´ L1 f , and let v 2 V be the local approximation introduced in (9.29). Since v approximates u only locally, we need local variants of ƒ and ˆ: ƒ t W L2 . / ! RtO ; ˆ t W RtO ! Vh ;
v 7! .i .v//i2tO ; X y 7! yi 'i : i2tO
O O D Sj O b. We let xQ ´ ƒ t u. According to the definition of S , we have xQ D .S b/j t t Os Let us now turn our attention to the local approximation of u. Due to \ ¤ ; and \ ¤ ;, we have dist.; / diam. /, and we have already seen that we can find a function v 2 V with kuj vkH 1 . / Capx .dist.; /=8 C 1/q p kf kH 1 ./ Capx .diam. /=8 C 1/q p kf kH 1 ./ :
(9.31)
We let yQ ´ ƒ t v and observe that Lemma 9.8 implies 1 1 kˆ t .xQ y/k Q L2 . / D kˆ t ƒ t .uj v/kL2 . / Cb1 Cb1 1 Ccl D kI t .uj v/kL2 . / kuj vkL2 . / : (9.32) Cb1 Cb1
kH td=2 .xQ y/k Q L2 . /
Now we can define Cblk ´ Capx .diam. /=8 C 1/
Ccl2 2 Cb1
and combining (9.30), (9.31) and (9.32) yields kH td=2 .xQ y/k Q L2 . / Cblk q p kHsd=2 bk2 : of S jtOOs : we Using this result, we can now derive the low-rank approximation X t;s Y t;s introduce the space Zh ´ fH td=2 ƒ t w W w 2 V g
and observe k ´ dim Zh dim V Cdim p d C1 for the dimension of Zh and H td=2 yQ D H td=2 ƒ t v 2 Zh . We fix an orthogonal basis of Zh , i.e., a matrix Q 2 RtOk with orthogonal columns and range Q D Zh . We define zQ ´ H t1=2 QQ H td=2 x. Q Since Q is orthogonal, QQ is the orthogonal projection onto Zh and we get hH td=2 .xQ z/; Q wi2 D hH td=2 xQ QQ H td=2 x; Q QQ wi2 D 0
384
9 Application to elliptic partial differential operators
for all w 2 Zh , , i.e., H td=2 zQ is the best approximation of H td=2 xQ in the space Zh . In Q and we get particular, H td=2 zQ is at least as good as H td=2 y, kH td=2 .xQ z/k Q 2 kH td=2 .xQ y/k Q 2 Cblk q p kHs1=2 bk2 : We let X t;s ´ H t1=2 Q 2 RtOk ;
Y t;s ´ S jtOOs H td=2 Q 2 RsO k
and conclude b; zQ D H t1=2 QQ H td=2 xQ D .H t1=2 Q/.Q H td=2 S jtOOs /b D X t;s Y t;s
which completes the proof.
9.4 Compression of the discrete solution operator In order to apply Theorem 9.11, we have to use an admissibility condition of the type (9.10). The straightforward choice for the convex set is the bounding box (cf. (3.10)), and with this choice, the admissibility condition takes the form maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs / we have already encountered in (4.49). According to Corollary 6.17, finding low-rank approximations for the total cluster basis .S t / t2T (cf. Definition 6.11) corresponding to the matrix S is equivalent to finding an orthogonal nested cluster basis V D .V t / t2T such that the orthogonal projection …T ;V; S into the space of left semi-uniform matrices defined by T and V is a good approximation of S. Lemma 9.12 (Approximation of S t ). Let 2 R>0 and q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all t 2 T and all p 2 N2 we can find a rank k 2 N with k Cdim p d C1 and matrices X t 2 RtOk ; Y t 2 Rk with kH d=2 .S t X t Y t /bk2 Cblk q p kH d=2 bk
for all b 2 R :
Proof. Let t 2 T . The matrix S t of the total cluster basis of S is given by X St D t Ss ; s2row .t/
i.e., it corresponds to the restriction of S to the rows in tO and the columns in [ Nt ´ sO : s2row .t/
9.4 Compression of the discrete solution operator
385
Due to definition (cf. Lemma 5.7) we can find a t C 2 pred.t / for each s 2 row .t / such that .t C ; s/ is an admissible leaf of T , i.e., .t C ; s/ 2 LC . Since we are using the admissibility condition (4.49), this implies diam.Q t / diam.Q t C / 2 dist.Q t C ; Qs / 2 dist.Q t ; Qs /: We let ´ Qt ;
´
[
Qs
s2row .t/
and conclude that is convex, that supp 'i ;
supp 'j
hold for all i 2 tO; j 2 N t ;
and that diam./ D diam.Q t / minf2 dist.Q t ; Qs / W s 2 row .t /g D 2 dist.; / holds, i.e., and satisfy the requirements of Theorem 9.11. Applying the theorem and extending the resulting rank-k-approximation of S jtON t by zero completes the proof. Theorem 9.13 (Approximation of S). Let 2 R>0 and q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all p 2 N2 we can find a nested cluster basis V D .V t / t2T with a rank distribution .L t / t2T satisfying #L t Cdim p d C1 and z 2 Cyblk .p C 1/pc q p kH d=2 bk2 kH d=2 .S S/bk
for all b 2 R ;
where Sz 2 H 2 .T ; V; V / is an H 2 -matrix and p is the depth of T . Proof. Let Cblk and Cdim be defined as in Lemma 9.12. We let Sy ´ H d=2 SH d=2 and denote its total cluster basis by .Syt / t2T . Replacing b by H 1=2 bO in this lemma yields O 2 Cblk q p kbk O 2 /H d=2 bk kH d=2 .S t X t;s Y t;s
for all bO 2 R ;
i.e., the matrices Syt can be approximated by rank Cdim p d C1 up to an accuracy of Cblk q p . We apply Corollary 6.18 to prove that there exists a nested orthogonal cluster basis Q D .Q t / t2T with rank distribution L D .L t / t2T satisfying #L t Cdim p d C1 and X 2 2p 2 y 22 Cblk q Cblk .#T /q 2p : (9.33) kSy …T ;Q; ŒSk t2T
This provides us with an estimate for the row cluster basis.
386
9 Application to elliptic partial differential operators
Since L is self-adjoint, we have L D L and L1 D .L1 / and conclude S D .ƒL1 ƒ / D .ƒ / .L1 / ƒ D ƒL1 ƒ D S; therefore we can use Q also as a column cluster basis. Restricting (9.33) to admissible blocks b D .t; s/ 2 T yields p y s k2 Cblk #T q p for all b D .t; s/ 2 LC : k t Sys Q t Qt S We introduce the cluster basis V D .V t / t2T by V t ´ H d=2 Q t define the approximation of S by X Sz ´ t Ss C bD.t;s/2L
for all t 2 T ; X
V t Qt SyQs Vs
bD.t;s/2LC
and get z s /H d=2 k22 D kH d=2 . t Ss V t Qt SyQs Vs /H d=2 k22 kH d=2 . t .S S/ y s Q t Qt SQ y s Qs k22 D k t S y s Q t Qt S y s k22 C kQ t Qt .Sys SyQs Qs /k22 D k t S y s Q t Qt S y s k22 C ks Sy t Qs Qs Sy t k22 k t S 2 2Cblk .#T /q 2p
for all b D .t; s/ 2 LC :
We use Theorem 4.47 to get kH d=2 .S Sz/H d=2 k2 Csp
p X
Cblk
p p 2#T q p Cblk Csp .p C 1/ 2#T q p ;
`D0
p
and setting Cyblk ´ Csp Cblk 2 and substituting b for bO completes the proof.
Chapter 10
Applications
We have considered methods for approximating integral operators in the Chapters 2 and 4, we have discussed methods for reducing the storage requirements of these approximations in the Chapters 5 and 6, and we have investigated techniques for performing algebraic operations with H 2 -matrices in the Chapters 7 and 8. Using the tools provided by these chapters, we can now consider practical applications of H 2 -matrices. Due to their special structure, H 2 -matrix techniques are very well suited for problems related to elliptic partial differential equations, and we restrict our attention to applications of this kind: • In Section 10.1, we consider a simple boundary integral equation related to Poisson’s equation on a bounded or unbounded domain. • In Section 10.2, we approximate the operator mapping Dirichlet to Neumann boundary values of Poisson’s equation. • Section 10.3 demonstrates that the approximative arithmetic operations for H 2 matrices can be used to construct efficient preconditioners for boundary integral equations. • Section 10.4 shows that our techniques also work for non-academic examples. • Section 10.5 is devoted to the construction of solution operators for elliptic partial differential equations. These operators are non-local and share many properties of the integral operators considered here, but they cannot be approximated by systems of polynomials. Our goal in this chapter is not to prove theoretical estimates for the accuracy or the complexity, but to demonstrate that the H 2 -matrix techniques work in practice and to provide “rules of thumb” for choosing the parameters involved in setting up the necessary algorithms. Assumptions in this chapter: We assume that cluster trees T and TJ for the finite index sets and J are given. Let T be an admissible block cluster tree for T . Let p and pJ be the depths of T and TJ .
388
10 Applications
10.1 Indirect boundary integral equation Let R3 be a bounded Lipschitz domain. We consider Laplace’s equation u ´
3 X
@2i u D 0
(10.1a)
iD1
with Dirichlet-type boundary conditions uj D uD
(10.1b)
for the solution u 2 H 1 . / and the Dirichlet boundary values uD 2 H 1=2 ./.
Boundary integral equation According to [92], Subsection 4.1.1, we can solve the problem (10.1) by finding a density function f 2 H 1=2 ./ satisfying Symm’s integral equation Z f .y/ dy D uD .x/ for almost all x 2 : (10.2)
4kx yk2 Using this f , the solution u 2 H 1 . / is given by Z f .y/ u.x/ D dy for almost all x 2 : 4kx yk2
We introduce the single layer potential operator Z V W H 1=2 ./ ! H 1=2 ./; f 7! x 7!
f .y/ dy ; 4kx yk2
and note that (10.2) takes the form V f D uD :
(10.3)
We multiply both sides of the equation by test functions v 2 H 1=2 ./, integrate, and see that (10.3) is equivalent to the variational equation aV .v; f / D hv; uD i
for all v 2 H 1=2 ./
with the bilinear form aV W H 1=2 ./ H 1=2 ./ ! R; Due to [92], Satz 3.5.3, aV is H tion.
1=2
Z .v; f / 7!
Z v.x/
(10.4)
f .y/ dy dx: 4kx yk2
./-coercive, therefore (10.4) has a unique solu-
10.1 Indirect boundary integral equation
389
Discretization In order to discretize (10.4) by Galerkin’s method, we assume that can be represented by a conforming triangulation G ´ fi W i 2 g (cf. Definition 4.1.2 in [92]) and introduce the family .'i /i2 of piecewise constant basis functions given by ´ 1 if x 2 i ; 'i .x/ ´ for all i 2 ; x 2 : 0 otherwise Applying Galerkin’s method to the bilinear variational equation (10.4) in the finitedimensional space S0 ´ spanf'i W i 2 g H 1=2 ./ leads to the system of linear equations Vx D b for the matrix V 2 R and the right-hand side vector b 2 R defined by Z Z 'j .y/ Vij ´ 'i .x/ dy dx 4kx yk2
Z Z 1 D dy dx for all i; j 2 ; 4kx yk2 Zi j uD .x/ dx for all i 2 : bi ´
(10.5)
(10.6)
i
The solution vector x 2 R of (10.5) defines the approximation X fh ´ xi 'i i2
of the solution f 2 H 1=2 ./.
Approximation of the stiffness matrix We can see that Vij > 0 holds for all i; j 2 , therefore the matrix V is densely populated and cannot be handled efficiently by standard techniques. Fortunately, the matrix V 2 R fits perfectly into the framework presented in Chapter 4, i.e., we can construct an H 2 -matrix approximation Vz 2 R and solve the perturbed system Vz xQ D b:
(10.7)
390
10 Applications
The solution of the perturbed system is given by X xQ i 'i fQh ´ i2
and can be characterized by the variational equation aQ V .vh ; fQh / D hvh ; uD i
for all vh 2 S0 ;
where the perturbed bilinear form aQ V .; / is defined by aQ V .'i ; 'j / ´ Vzij
for all i; j 2 :
(10.8)
We have to ensure that approximating V by Vz , i.e., approximating a.; / by a.; Q /, does not reduce the speed of convergence of the discretization scheme. According to [92], Satz 4.1.32, we can expect kf fh k1=2; . h3=2 in the case of maximal regularity, i.e., if f 2 H 1 ./ holds, therefore our goal is to choose the accuracy of the approximation Vz in such a way that kf fQh k1=2; . h3=2 holds. Strang’s first lemma (e.g., Theorem 4.1.1 in [34] or Satz III.1.1 in [26]) yields the estimate jaV .vh ; wh / aQ V .vh ; wh /j kf fQh k1=2; . inf kf vh k1=2; C sup vh 2S0 kwh k1=2; wh 2S0 if aQ V is H 1=2 ./-elliptic, i.e., if aQ V is “close enough” to the elliptic bilinear form aV . We choose vh D fh in order to bound the first term on the right-hand side and now have to prove that the second term is also bounded. The latter term is related to the approximation error introduced by using Vz instead of V . In order to bound it, we assume that the triangulation G is shape-regular and quasiuniform with grid parameter h 2 R>0 . Due to the inverse estimate [38], Theorem 4.6, we have h1=2 kwh k0; . kwh k1=2; and conclude
for all wh 2 S0
jaV .vh ; wh / aQ V .vh ; wh /j 1=2 sup kf vh k1=2; Ch : vh 2S0 kwh k0; wh 2S0
kf fQh k1=2; . inf
We can bound the right-hand side of this estimate by using vh D fh and the following estimate of the approximation error:
391
10.1 Indirect boundary integral equation
Lemma 10.1 (Accuracy of aQ V .; /). We have jaV .vh ; wh / aQ V .vh ; wh /j . hd kV Vz k2 kvh k0; kwh k0;
for all vh ; wh 2 S0 :
Proof. Let x; y 2 R be the coefficient vectors of vh and wh satisfying X X vh D yj 'j : xi 'i ; wh D j 2
i2
We have
ˇXX ˇ ˇ ˇ jaV .vh ; wh / aQ V .vh ; wh /j D ˇ xi yj .aV .'i ; 'j / aQ V .'i ; 'j //ˇ i2 j 2J
ˇXX ˇ ˇ ˇ Dˇ xi yj .Vij Vzij /ˇ i2 j 2J
D jhx; .V Vz /yi2 j kV Vz k2 kxk2 kyk2 : Due to [38], Proposition 3.1, we find hd =2 kxk2 . kvh k0; ;
hd =2 kyk2 . kwh k0; ;
and combining these estimates completes the proof. Applying this result to the estimate from Strang’s lemma yields kf fQh k1=2; . kf fh k1=2; C hd 1=2 kV Vz k2 kfh k0; if kV Vz k2 is small enough to ensure that Vz is still positive definite. Since fh converges to f if the mesh size decreases, we can assume that kfh k0; is bounded by a constant and get kf fQh k1=2; . kf fh k1=2; C hd 1=2 kV Vz k2 : If we can ensure kV Vz k2 . hd C2 ;
(10.9)
we get kf fQh k1=2; . h3=2 ; which yields the desired optimal order of convergence. Combining the error estimate provided by Theorem 4.49 with the simplified interpolation error estimate of Remark 4.23 yields kV Vz k2 . Cov C2 q m
p X `D0
² max
³ j t j .`/ W t 2 T : diam.Q t /
(10.10)
392
10 Applications
The piecewise constant basis functions satisfy Cov D 1 and C . hd =2 . The kernel function defining V has a singularity of order D 1, therefore we can assume j t j . diam.Q t /d D diam.Q t /: diam.Q t / This means that the sum in estimate (10.10) can be bounded by a geometric sum dominated by the large clusters (cf. Remark 4.50), and these clusters can be bounded independently of h. We conclude kV Vz k2 . hd q m : In order to guarantee (10.9), we have to choose the order m 2 N large enough to yield q m . h2 , i.e., log.h/= log.q/ . m.
Experiments According to our experiments (cf. Table 4.2), increasing the interpolation order by one if the mesh width h is halved should be sufficient to ensure q m . h2 and therefore kf fQh k1=2; . h3=2 . We test the method for the approximations of the unit sphere S used in the previous chapters and the harmonic functions u1 W R3 ! R;
x 7! x1 C x2 C x3 ;
u2 W R3 ! R;
x 7! x12 x32 ; 1 x 7! kx x0 k2
u3 W R3 ! R;
for x0 D .6=5; 6=5; 6=5/
by computing the corresponding densities f1;h , f2;h and f3;h and comparing Z fi;h .y/ ui;h .x/ ´ dy for all i 2 f1; 2; 3g
4kx yk2 to the real value of the harmonic function at the test point xO ´ .1=2; 1=2; 1=2/. Due to the results in Chapter 12 of [97], we expect the error to behave like i ´ jui .x/ O ui;h .x/j O . h3
for all i 2 f1; 2; 3g:
In a first series of experiments, we use interpolation (cf. Section 4.4) to create an initial approximation of V , and then apply Algorithm 30 with blockwise control of the relative spectral error (cf. Section 6.8) for an error tolerance O h2 . In order to reduce the complexity, we use an incomplete cross approximation [104] of the coupling matrices Sb , which can be computed efficiently by theACA algorithm [4]. The nearfield matrices are constructed using the black-box quadrature scheme described in [89], [43]. The
393
10.1 Indirect boundary integral equation
resulting system of linear equations is solved approximately by the conjugate gradient method [99] with a suitable preconditioner (cf. [50]). The parameters used for the interpolation and the recompression are given in Table 10.1: as usual, n gives the number of degrees of freedom, in this case the number of triangles of the surface approximation, m is the order of the initial interpolation and O is the tolerance used by the recompression algorithm. Table 10.1. Solving Poisson’s equation using an indirect boundary element formulation. The matrix is constructed by applying the recompression Algorithm 30 to an initial H 2 -matrix approximation constructed by interpolation.
n m O 512 3 23 2048 4 54 8192 5 14 32768 6 25 131072 7 56 524288 8 16 100000
Build Mem Mem=n 1 3:1 1:3 2:7 6:64 21:7 8:2 4:1 1:84 152:1 45:3 5:7 2:76 1027:8 238:1 7:4 2:97 6950:0 1160:9 9:1 1:27 67119:0 5583:8 10:9 4:08
2 2:74 2:35 2:36 2:67 3:08 4:69
0.14
Build O(n log(n)^4)
3 1:84 1:35 8:06 9:07 8:48 9:610
Build/n O(log(n)^4)
0.12
10000 0.1 1000
0.08 0.06
100
0.04 10 0.02 1 100
1000
10000
10000
100000
1e+06
0 100
1000
10000
0.001
Memory O(n log(n)^2)
1e+06
eps1 eps2 eps3 O(h^3)
0.0001
1000
100000
1e-05 1e-06
100 1e-07 1e-08
10
1e-09 1 100
1000
10000
100000
1e+06
1e-10 100
1000
10000
100000
1e+06
We know that a local rank of k log2 .n/ is sufficient to ensure (10.9), since multipole expansions require this rank. Due to the best-approximation property of Algorithm 25, and therefore Algorithm 30, we expect the compressed H 2 -matrix to
394
10 Applications
have the same rank, and this expectation is indeed justified by Table 10.1: the time required for the matrix construction is proportional to n log4 .n/, i.e., nk 2 for k log2 .n/, and the storage requirements are proportional to n log2 .n/, i.e., nk for our estimate of k. The surprisingly long computing time for the last experiment (n D 524288) may be attributed to the NUMA architecture of the computer: although the resulting matrix requires less than 6 GB, Algorithm 30 needs almost 40 GB for weight matrices of the original cluster bases. Since this amount of storage is not available on a single processor board of the computer, some memory accesses have to use the backplane, and the bandwidth of the backplane is significantly lower than that of the local storage. The high temporary storage requirements are caused by the high rank of the initial approximation constructed by interpolation: for m D 8, the initial rank is k D m3 D 512, and using two 512 512-matrices for each of the 23855 clusters requires a very large amount of temporary storage. In practical applications, it can even happen that the auxiliary matrices require far more storage than the resulting compact H 2 -matrix approximation (it is not uncommon to see that the matrices .RX;s /s2TJ require more than two times the storage of the final H 2 -matrix, and the ratio grows as the order m becomes larger). Therefore we look for an alternative approach providing a lower rank for the initial approximation. A good choice is the hybrid cross approximation technique [17], which constructs blockwise low-rank approximations of the matrix, thus providing an initial H -matrix approximation. We can apply the recompression Algorithm 33 in order to convert the low-rank blocks into H 2 -matrices as soon as they become available. This allows us to take advantage of both the fast convergence of the HCA approximation and the low storage requirements of the resulting H 2 -matrix. The results of this approach are listed in Table 10.2. We use a modified HCA implementation by Lars Grasedyck that is able to increase the order m according to an internal heuristic depending on the prescribed accuracy O . The resulting orders are given in the column “mad ”. We can see that this approach yields results similar to those provided by Algorithm 30, but that the reduction of the amount of temporary storage seems to lead to an execution time proportional to n log5 .n/ instead of the time proportional to n log4 .n/ that we could observe for Algorithm 30. Remark 10.2 (Symmetry). In the case investigated here, the matrix V is symmetric and can therefore be approximated by a symmetric H 2 -matrix Vz . Instead of storing the entire matrix, it would be sufficient to store the lower or upper half, thus reducing the storage and time requirements by almost 50 percent.
395
10.2 Direct boundary integral equation
Table 10.2. Solving Poisson’s equation using an indirect boundary element formulation. The matrix is constructed by applying the recompression Algorithm 33 to an initial H -matrix approximation constructed by the HCA method.
n 512 2048 8192 32768 131072 524288
O 23 54 14 25 56 16
1e+06
Mem Mem=n 1 mad Build 4 7:1 1:6 3:2 4:24 5 35:4 8:1 4:0 5:76 7 298:3 45:6 5:7 5:37 9 2114:5 240:9 7:5 1:47 11 12774:5 1170:0 9:1 4:18 14 91501:2 5507:3 10:8 4:08
2 8:06 8:16 8:97 1:17 3:88 4:810
0.2
Build O(n log(n)^5)
Build/n O(log(n)^5)
0.18
100000
3 1:74 6:65 8:06 9:27 8:88 1:49
0.16 0.14
10000
0.12 1000
0.1 0.08
100
0.06 0.04
10
0.02 1 100
1000
10000
10000
100000
1e+06
0 100
1000
10000
0.001
Memory O(n log(n)^2)
1e+06
eps1 eps2 eps3 O(h^3)
0.0001
1000
100000
1e-05 1e-06
100 1e-07 1e-08
10
1e-09 1 100
1000
10000
100000
1e+06
1e-10 100
1000
10000
100000
1e+06
10.2 Direct boundary integral equation Now we consider a refined boundary integral approach for solving Laplace’s equation (10.1) with Dirichlet boundary values. Instead of working with the density function f , we now want to compute the Neumann values @u uN W ! R; x 7! .x/; @n corresponding to the Dirichlet values uD D uj of the harmonic function u directly.
396
10 Applications
According to [97], Kapitel 12, both functions are connected by the variational equation 1 aV .v; uN / D aK .v; uD / C hv; uD i0; 2
for all v 2 H 1=2 ./;
(10.11)
where the bilinear form aK corresponding to the double layer potential is given by Z Z hx y; n.y/iu.y/ 1=2 1=2 aK W H ./ H ./ ! R; .v; u/ 7! v.x/ dy dx: 4kx yk32
As in the previous example, existence and uniqueness of uN 2 H 1=2 ./ is guaranteed since aV is H 1=2 ./-coercive.
Discretization Usually we cannot compute the right-hand side of (10.11) explicitly, instead we rely on a discrete approximation: we introduce the family .xj /j 2J of vertices of the triangulation G and the family . j /j 2J of corresponding continuous piecewise linear basis functions given by ´ 1 if i D k; for all j; k 2 J; j .xk / D 0 otherwise j j i
for all i 2 ; j 2 J:
is affine
We approximate uD by its L2 -orthogonal projection uD;h into the space S1 ´ spanf
j
W j 2 Jg H 1=2 ./:
This projection can be computed by solving the variational equation hvh ; uD;h i0; D hvh ; uD i0;
for all vh 2 S1 ;
(10.12)
and the corresponding system of linear equations is well-conditioned and can be solved efficiently by the conjugate gradient method. Now we proceed as before: we replace the infinite-dimensional space H 1=2 ./ by the finite-dimensional space S0 and the exact Dirichlet value uD by uD;h and get the system of linear equations 1 Vx D K C M y (10.13) 2 for the matrix V 2 R already introduced in (10.6), the matrix K 2 RJ defined by Z Z hx y; n.y/i2 j .y/ Kij ´ 'i .x/ dy dx for all i 2 ; j 2 J; 4kx yk32
397
10.2 Direct boundary integral equation
the mass matrix M 2 RJ defined by Z 'i .x/ j .y/ dy dx Mij ´
for all i 2 ; j 2 J;
and the coefficient vector y 2 RJ corresponding to uD;h , which is defined by X yj j : uD;h D j 2J
Since aV is H 1=2 ./-coercive, the matrix V is symmetric and positive definite, so (10.13) has a unique solution x 2 R which corresponds to the Galerkin approximation X xi 'i uN;h ´ i2
of the Neumann boundary values uN .
Approximation of the matrices The mass matrix M and the matrix corresponding to (10.12) are sparse and can be computed exactly by standard quadrature. The matrices V and K are not sparse, therefore we have to replace them by efficient approximations. For V we can use interpolation (cf. Section 4.4), while for K the equation hx y; n.y/i2 @ 1 D 3 @n.y/ 4kx yk2 4kx yk2 suggests using derivatives of interpolants (cf. Section 4.5). Replacing V and K by their respective H 2 -matrix approximations Vz and Kz leads to the perturbed system
1 Vz xQ D Kz C M y: 2
(10.14)
The solution of the corresponding perturbed system corresponds to the function X uQ N;h ´ xQ i 'i i2
and can be characterized by the variational equation 1 aQ V .vh ; uQ N;h / D aQ K .vh ; uD;h / C hvh ; uD;h i0; 2
for all vh 2 S0
for the perturbed bilinear forms aQ V defined by (10.8) and aQ K defined by aQ K .'i ;
j/
D Kzij
for all i 2 ; j 2 J:
(10.15)
398
10 Applications
Strang’s first lemma (e.g., Satz III.1.1 in [26]) yields kuN uQ N;h k1=2; . inf kuN vh k1=2; vh 2S0
C sup wh 2S0
C sup wh 2S0
jaV .vh ; wh / aQ V .vh ; wh /j kwh k1=2;
Q h /j j.wh / .w kwh k1=2;
for the right-hand side functional 1 .wh / ´ aK .wh ; uD / C hwh ; uD i0; 2
for all wh 2 S0
and its perturbed counterpart Q h / ´ aQ K .wh ; uD;h / C 1 hwh ; uD;h i0; for all wh 2 S0 : .w 2 We have already given estimates for the first two terms of the error estimate in the previous section, so we now only have to find a suitable bound for the difference of the right-hand sides. To this end, we separate the approximation of uD and the approximation of aK : Q h / D aK .wh ; uD / aQ K .wh ; uD;h / C 1 hwh ; uD uD;h i0; .wh / .w 2 D aK .wh ; uD uD;h / C aK .wh ; uD;h / 1 aQ K .wh ; uD;h / C hwh ; uD uD;h i0; : 2 Since uD;h is computed by the L2 -projection, we have kuD uD;h k1=2; . h3=2 according to [97], equation (12.19), in the case of optimal regularity uD 2 H 2 ./. This implies hwh ; uD uD;h i0; kwh k1=2; kuD uD;h k1=2; . h3=2 kwh k1=2; : According to Satz 6.11 in [97], we have aK .wh ; uD uD;h / . kwh k1=2; kuD uD;h k1=2; . h3=2 kwh k1=2; : For the perturbed bilinear form, combining the technique used in the proof of Lemma 10.1 with the inverse estimate ([38], Theorem 4.6) yields z 2 kwh k0; kuD;h k0; jaK .wh ; uD;h / aQ K .wh ; uD;h /j . hd kK Kk d 1=2 z 2 kwh k1=2; kuD;h k0; : .h kK Kk
10.2 Direct boundary integral equation
399
Due to uD 2 H 1=2 ./ L2 ./, we can assume that kuD;h k0; is bounded independently of h and conclude Q h /j . .hd 1=2 kK Kk z 2 C h3=2 /kwh k1=2; ; j.wh / .w i.e., if we can ensure
z 2 . hd C2 ; kK Kk
(10.16)
we get kuN uQ N;h k1=2; . h3=2 : Since the kernel function corresponding to K is based on first derivatives of a kernel function with singularity order D 1, Corollary 4.32 yields an approximation error of kg gk Q 1; t s .
qm dist.Q t ; Qs /2
for all b D .t; s/ 2 LC J :
Assuming again j t j . diam.Q t /d ;
j s j . diam.Qs /d
for all t 2 T ; s 2 TJ
and applying Theorem 4.49 gives us z 2 . Cov C CJ q m maxfp ; pJ g: kK Kk
(10.17)
For our choice of basis functions, we have Cov D 3, C ; CJ . hd =2 and get z 2 . hd q m maxfp ; pJ g: kK Kk In order to guarantee (10.16), we have to choose the order m large enough to yield q m maxfp ; pJ g . h2 .
Experiments We can assume p . log n and pJ . log nJ , therefore our previous experiments (cf. Table 4.4) suggest that increasing the interpolation order by one if the mesh width h is halved should be sufficient to ensure q m maxfp ; pJ g . h2 and therefore the desired error bound kuN uQ N;h k1=2; . h3=2 . We once more test the method for the unit sphere and the harmonic functions u1 , u2 and u3 introduced in the previous section. For these functions, the Neumann boundary values u1;N , u2;N and u3;N can be computed and compared to their discrete approximations u1;N;h , u2;N;h and u3;N;h . Since computing a sufficiently accurate approximation of the norm of H 1=2 ./ is too complicated, we measure the L2 -norm instead. According to the inverse estimate [38], Theorem 4.6, we can expect i ´ kuN;i uN;i;h k0; . h
for all i 2 f1; 2; 3g:
400
10 Applications
The results in Table 10.3 match our expectations: the errors decrease at least like h, the faster convergence of 1 is due to the fact that uN;1 can be represented exactly by the discrete space, therefore the discretization error is zero and the error 1 is only caused by the quadrature and the matrix approximation, and its fast convergence is owed to the fact that we choose the error tolerance O slightly smaller than strictly necessary. Table 10.3. Solving Poisson’s equation using a direct boundary element formulation. The matrix V is constructed by applying the recompression Algorithm 30 to an initial H 2 -matrix approximation constructed by interpolation. The matrix K is approximated on the fly by interpolation and not stored.
n m O 512 3 23 2048 4 54 8192 5 14 32768 6 25 131072 7 56 524288 8 16 100000
Build Mem Mem=n 1 3:1 1:3 2:7 3:22 21:7 8:2 4:1 1:12 152:1 45:3 5:7 4:43 1027:8 238:1 7:4 1:23 6950:0 1160:9 9:1 4:64 67119:0 5583:8 10:9 1:44
2 2:51 1:21 6:22 3:12 1:52 7:73
0.14
Bld O(n log(n)^4)
3 5:12 2:32 1:52 5:63 2:83 1:43
Bld/n O(log(n)^4)
0.12
10000 0.1 1000
0.08 0.06
100
0.04 10 0.02 1 100
1000
10000
10000
100000
1e+06
0 100
Mem O(n log(n)^2)
1000
0.1
100
0.01
10
0.001
1 100
1000
10000
100000
1000
10000
1
1e+06
0.0001 100
100000
1e+06
eps1 eps2 eps3 O(h)
1000
10000
100000
1e+06
10.3 Preconditioners for integral equations
401
10.3 Preconditioners for integral equations According to Lemma 4.5.1 in [92], the condition number of the matrix V can be expected to behave like h1 , i.e., the linear systems (10.5) and (10.13) become illconditioned if the resolution of the discrete space grows. For moderate problem dimensions, this poses no problem, since the conjugate gradient method used for solving the linear systems is still quite fast, especially if V is approximated by an H 2 -matrix and Algorithm 8 is used to perform the matrix-vector multiplication required in one step of the solver. For large problem dimensions, the situation changes: due to the growing condition number, a large number of matrix-vector multiplications are required by the solver, and each of these multiplications requires an amount of time that is not negligible.
Preconditioned conjugate gradient method If we still want to be able to solve the linear systems efficiently, we have to either speed up the matrix-vector multiplication or reduce the number of these multiplications required by the solver. The time per matrix-vector multiplication can be reduced by compressing the matrix as far as possible, e.g., by applying the Algorithms 30 or 33, since reducing the storage required for the H 2 -matrix also reduces the time needed to process it, or by parallelizing Algorithm 8. The number of multiplications required by the conjugate gradient method can be reduced by using a preconditioner, i.e., by solving the system Vy xO D bO for the transformed quantities Vy ´ P 1=2 VP 1=2 ;
xO ´ P 1=2 x;
bO ´ P 1=2 b;
where P 2 R is a symmetric positive definite matrix, called the preconditioning matrix for V . The idea is to construct a matrix P which is, in a suitable sense, “close” to V , since then we can expect Vy to be “close” to the identity, i.e., to be well-conditioned. The matrix P can be constructed by a number of techniques: wavelet methods [74], [40], [36] apply a basis transformation and suitable diagonal scaling, the matrix P can be constructed by discretizing a suitable pseudo-differential operator [98], but we can also simply rely on H 2 -matrix arithmetic operations introduced in Chapter 8 by computing an approximative LU factorization of V .
402
10 Applications
LU decomposition In order to do so, let us first consider a suitable recursive algorithm for the computation of the exact LU factorization of V . For all t; s 2 T , we denote the submatrix corresponding to the block .t; s/ by V t;s ´ V jtOOs . Let t 2 T be a cluster. The LU factorization of V t;t can be computed by a recursive algorithm: if t is a leaf cluster, the admissibility condition (4.49) and, in fact, all other admissibility conditions we have introduced, implies that .t; t / 2 L is an inadmissible leaf of the block cluster tree T . Therefore it is stored in one of the standard representations for densely populated matrices and we can compute the LU factorization of V t;t by standard algorithms. If t is not a leaf cluster, the admissibility condition guarantees that .t; t / is inadmissible, i.e., that V t;t is a block matrix. We consider only the simple case # sons.t / D 2, the general case # sons.t / 2 N can be handled by induction. Let sons.t / D ft1 ; t2 g. We have V t1 ;t1 V t1 ;t2 V t;t D V t2 ;t1 V t2 ;t2 and the LU factorization V t;t D L.t/ U .t/ takes the form ! L.t/ U t.t/ V t1 ;t1 V t1 ;t2 t1 ;t1 1 ;t1 D .t/ V t2 ;t1 V t2 ;t2 L.t/ L t2 ;t1 t2 ;t2
! U t.t/ 1 ;t2 ; U t.t/ 2 ;t2
which is equivalent to the four equations .t/ V t1 ;t1 D L.t/ t1 ;t1 U t1 ;t1 ;
V t1 ;t2 D V t2 ;t1 D V t2 ;t2 D
.t/ L.t/ t1 ;t1 U t1 ;t2 ; .t/ L.t/ t2 ;t1 U t1 ;t1 ; .t/ L.t/ t2 ;t1 U t1 ;t2 C
(10.18a) (10.18b) (10.18c) .t/ L.t/ t2 ;t2 U t2 ;t2 :
(10.18d)
According to (10.18a), we compute L.t1 / and U .t1 / by recursively constructing the LU .t1 / factorization V t1 ;t1 D L.t1 / U .t1 / of the submatrix V t1 ;t1 and setting L.t/ t1 ;t1 ´ L and U t.t/ ´ U .t1 / . 1 ;t1 .t/ .t/ .t/ Once we have L.t/ t1 ;t1 and U t1 ;t1 , we compute U t1 ;t2 and L t2 ;t1 by using equations (10.18b) and (10.18c). Then equation (10.18d) yields .t/ .t/ .t/ V t2 ;t2 L.t/ t2 ;t1 U t1 ;t2 D L t2 ;t2 U t2 ;t2 ; .t/ and we compute the left-hand side by constructing V t02 ;t2 ´ V t2 ;t2 L.t/ t2 ;t1 U t1 ;t2 , using .t2 / a recursion to find its LU factorization V t02 ;t2 D L.t2 / U .t2 / , and setting L.t/ t2 ;t2 ´ L
and U t.t/ ´ U .t2 / . 2 ;t2
10.3 Preconditioners for integral equations
403
Algorithm 55. LU decomposition. procedure LUDecomposition(t, X , var L, U ); if sons.t/ D ; then Compute factorization X D LU else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Xij X jtOi tOj end for; for i D 1 to do LUDecomposition(ti , Xi i , var Li i , Ui i ); for j 2 fi C 1; : : : ; g do Uij Xij ; fAlgorithm 56g LowerBlockSolve(ti , Li i , Uij ); Lj i Xj i ; UpperBlockSolve(ti , Ui i , Lj i ) fAlgorithm 57g end for; for j; k 2 fi C 1; : : : ; g do Xj k Xj k Lj i Uik fUse approximative multiplicationg end for end for; for i; j 2 f1; : : : ; g do if i > j then LjtOi ;tOj Lij ; U jtOi ;tOj 0 fUnify resulting matrices by Algorithm 32g else if i < j then LjtOi ;tOj 0; U jtOi ;tOj Uij fUnify resulting matrices by Algorithm 32g else LjtOi ;tOj Lij ; U jtOi ;tOj Uij fUnify resulting matrices by Algorithm 32g end if end for end if The equations (10.18b) and (10.18c) are solved by recursive forward substitution: for an arbitrary finite index set K and a matrix Y 2 RtOK , we can find the solution X 2 RtOK of Y D L.t/ X by again using the block representation. We let X t1 ´ XjtO1 K , X t2 ´ X jtO2 K , Y t1 ´ Y jtO1 K and Y t2 ´ Y jtO2 K and get ! L.t/ X t1 Y t1 t1 ;t1 D ; .t/ .t/ Y t2 X t2 L t2 ;t1 L t2 ;t2 which is equivalent to Y t1 D L.t/ t1 ;t1 X t1 ;
(10.19a)
404
10 Applications .t/ Y t2 D L.t/ t2 ;t1 X t1 C L t2 ;t2 X t2 :
(10.19b)
The first equation (10.19a) is solved by recursion. For the second equation (10.19b), we .t/ 0 compute the matrix Y t02 ´ Y t2 L.t/ t2 ;t1 X t1 and then solve the equation Y t2 D L t2 ;t2 X t2 by recursion. For the general case # sons.t / 2 N, the entire recursion is summarized in Algorithm 56. Algorithm 56. Forward substitution for a lower triangular block matrix. procedure LowerBlockSolve(t, L, var X ); if sons.t/ D ; then Y X; Apply standard backward substitution to solve LX D Y else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Lij LjtOi tOj end for; for i 2 f1; : : : ; g do Xi X jtOi K end for; for i D 1 to do LowerBlockSolve(ti , Li i , Xi ); for j 2 fi C 1; : : : ; g do Xj Xj Lj i Xi fUse approximative multiplicationg end for end for; for i 2 f1; : : : ; g do Xj Oi K Xi fUnify resulting matrix by Algorithm 32g end for end if A similar procedure can be applied to (10.18c): for a finite index set K and a matrix Y 2 RKtO , we compute the solution X 2 RKtO of Y D X U .t/ by considering the subblocks X t1 ´ X jKtO1 , X t2 ´ X jKtO2 , Y t1 ´ Y jKtO1 and Y t2 ´ Y jKtO2 and the corresponding block equation !
U t.t/;t U t.t/;t 1 1 1 2 Y t1 Y t 2 D X t1 X t2 : U t.t/ 2 ;t2 The resulting equations Y t1 D X t1 U t.t/ ; 1 ;t1 Y t2 D
X t1 U t.t/ 1 ;t2
C
(10.20a) X t2 U t.t/ 2 ;t2
(10.20b)
10.3 Preconditioners for integral equations
405
and then are solved by applying recursion to (10.20a), computing Y t02 ´ Y t2 X t1 U t.t/ 1 ;t2
. The entire recursion is given in Algorithm 57. applying recursion to Y t02 D X t2 U t.t/ 2 ;t2 Algorithm 57. Forward substitution for an upper triangular block matrix. procedure UpperBlockSolve(t, U , var X ); if sons.t/ D ; then Y X; Apply standard backward substitution to solve X U D Y else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Uij U jtOi tOj end for; for i 2 f1; : : : ; g do Xi X jKtOi end for; for i D 1 to do UpperBlockSolve(ti , Ui i , Xi ); for j 2 fi C 1; : : : ; g do Xj Xj Xi Uij fUse approximative multiplicationg end for end for; for i 2 f1; : : : ; g do XjK Oi Xi fUnify resulting matrix by Algorithm 32g end for end if
The Algorithms 55, 56 and 57 are given in the standard notation for dense matrices and will compute the exact LU factorization of a given matrix. An excellent preconditioner for our system V x D b would then be the matrix P D LU , since P 1 D U 1 L1 can be easily computed by forward and backward substitution and P 1=2 VP 1=2 D I means that the resulting preconditioned method would converge in one step.
Approximate H 2 -LU decomposition This strategy is only useful for small problems, i.e., in situations when the matrices can be stored in standard array representation. For large problems, we cannot treat V in this way, and we also cannot compute the factors L and U in this representation, either. Therefore we use the H 2 -matrix approximation Vz of V and try to compute H 2 -matrix z and Uz of the factors L and U . approximations L
406
10 Applications
Fortunately, switching from the original Algorithms 55, 56 and 57 to their approximative H 2 -matrix counterparts is straightforward: we replace the exact matrix-matrix multiplication by the approximative multiplication algorithms introduced in Chapter 7 or Chapter 8. This allows us to compute all submatrices Lij , Uij and Xij in the algorithms efficiently. Since we are working with H 2 -matrices, we cannot simply combine the submatrices to form L, U or X , but we have to use the unification Algorithm 32 to construct uniform row and column cluster bases for the results. z Uz Vz is a sufficiently good approximation, we can now use P ´ Assuming that L z z LU as a preconditioner if we are able to provide efficient algorithms for evaluating z D y and Uz x D y for vectors z 1 , which is equivalent to solving Lx P 1 D Uz 1 L x; y 2 R . We consider only the forward substitution, i.e., the computation of x 2 R solving Lx D y for a vector y 2 R and a lower triangular H 2 -matrix for a cluster tree T and nested cluster bases V D .V t / t2T and W D .Ws /s2T with transfer matrices .E t / t2T and .Fs /s2T , respectively. Let us first investigate the simple case # sons.t / D 2 and ft1 ; t2 g D sons.t /. As before, we let L11 ´ LjtO1 tO1 , L21 ´ LjtO2 tO1 , L22 ´ LjtO2 tO2 , x1 ´ xjtO1 , x2 ´ xjtO2 , y1 ´ yjtO1 and y2 ´ yjtO2 and observe that Lx D y is equivalent to y1 L11 x1 D ; y2 L21 L22 x2 i.e., to the two equations y1 D L11 x1 ; y2 D L21 x1 C L22 x2 : The first of these equations can be solved by recursion, for the second one we introduce y20 ´ y2 L21 x1 and solve L22 x2 D y20 , again by recursion. So far, the situation is similar to the one encountered in the discussion of Algorithm 56. Now we have to take a look at the computation of y20 . If we would use the standard matrix-vector multiplication Algorithm 8 to compute L21 x1 , the necessary forward and backward transformations would lead to a suboptimal complexity, since some coefficient vectors would be recomputed on all levels of the cluster tree. In order to reach the optimal order of complexity, we have to avoid these redundant computations. Fortunately, this is possible: in the simple case of our 2 2 block matrix, we can see that the computation of x1 requires no matrix-vector multiplication, and therefore no forward or backward transformations. The computation of L21 x1 requires the coefficient vectors xO s ´ Ws x for all s 2 sons .t1 / and computes contributions to y only in the coefficient vectors yOs for all s 2 sons .t2 /. This observation suggests a simple algorithm: we assume that y is given in the form X y D y0 C V t yO t t 2sons .t/
10.3 Preconditioners for integral equations
407
and that we have to compute xjtO and all coefficient vectors xO s ´ Ws x for s 2 sons .t/. If t is a leaf, we have sons .t / D ft g and compute y D y0 C V t yO t directly. Since t is a leaf, LjtOtO is a dense matrix and we use the standard forward substitution algorithm for dense matrices. Then we can compute xO t ´ W t x directly. If t is not a leaf, we use the equation X V t yO t D V t 0 E t 0 yO t t 0 2sons.t/
to express yO t in terms of the sons of t, perform the recursion described above, and then assemble the coefficient vector X F t0 xO t 0 : xO t D t 0 2sons.t/
In short, we mix the elementary steps of the forward and backward transformations with the computation of the solution. Algorithm 58. Forward substitution. procedure SolveLower(t , L, var x, y, x, O y); O if sons.t/ D ; then yjtO yjtO C .V t yO t /jtO ; Solve LjtOtO xjtO D yjtO ; xO t W t x else # sons.t /; ft1 ; : : : ; t g sons.t /; xO t 0; for i D 1 to do yO ti yO ti C E ti yO t ; for j 2 f1; : : : ; i 1g do b0 .ti ; tj /; for b D .t ; s / 2 sons .b 0 / do if b 2 L then yjtO LjtO Os xjsO yjtO else if b 2 LC then yO t Sb xO s yO t end if end for end for; SolveLower(ti , L, x, y, x, O y); O xO t C F ti xO ti xO t end for end if
408
10 Applications
The resulting recursive procedure is given in Algorithm 58. An efficient algorithm for the backward substitution to solve the system Ux D y can be derived in a very similar way. A closer investigation reveals that both algorithms have the same complexity as the standard matrix-vector multiplication Algorithm 8, i.e., that their complexity can be considered to be of optimal order.
Experiments In the first experiment, we construct an approximation Vz of the matrix V for a polygonal approximation of the unit sphere with n D 32768 degrees of freedom using the same parameters as before, i.e., with an interpolation order of m D 6, a tolerance Orc D 106 for the recompression and a quadrature order of 3. We have already seen that these parameters ensure that the error introduced by the matrix approximation is sufficiently small. The approximation of Vz requires 336:8 MB, its nearfield part accounts for 117:5 MB, and the construction of the matrix is accomplished in 1244:7 seconds. The preconditioner is constructed by computing an approximate H 2 -matrix LU factorization of Vz with an error tolerance O for the blockwise relative spectral error. Table 10.4 lists the experimental results for different values of O : the first column contains the value of O , the second the time in seconds required for the construction of z and Uz , the third the storage requirements in MB, the fourth the storage requirements L z Uz k2 computed by per degree of freedom in KB, the fifth an estimate for ´ kVz L a number of steps of the power iteration, the sixth gives the maximal number of steps that where required to solve the problems (10.5) and (10.13), and the last column gives the time required for solving the system. Without preconditioning, up to 321 steps of the conjugate gradient method are required to solve one of the problems (10.5) or (10.13), and due to the corresponding large number of matrix-vector multiplications the solver takes 270 seconds. Even with the lowest accuracy O D 0:2, the number of iterations is reduced significantly, and only 44:4 seconds are required by the solver, and the storage requirements of 142:1 MB are clearly dominated by the nearfield part of 117:5 MB and less than half of the amount needed by Vz . Although the estimate for seems to indicate that the approximate LU factors are too inaccurate, they are well-suited for the purpose of preconditioning. The construction of the approximate LU factors requires 434:6 seconds, i.e., approximately a third of the time required to find the approximate matrix Vz . If we use smaller values of O , we can see that the error is proportional to the error tolerance O and that the storage requirements and the time for preparing the preconditioner seem to grow like log2 .1=O /. Now let us consider the behaviour of the preconditioner for different matrix dimensions n. We have seen in the previous experiment that the error is proportional to the error tolerance O . Since the condition number can be expected to grow like h1 as the mesh parameter h decreases, we use the choice O h to ensure stable convergence rates.
409
10.3 Preconditioners for integral equations
Table 10.4. Preconditioners of different quality for the single layer potential matrix V and the matrix dimension n D 32768.
O 21 11 52 22 12 53 23 13 1000
Mem Mem=n 142:1 4:4 7:20 146:6 4:6 5:10 152:9 4:8 3:10 162:6 5:1 1:10 170:0 5:3 5:41 178:9 5:6 3:61 194:9 6:1 1:61 210:4 6:6 7:62
LU 434:6 471:1 524:8 613:1 682:6 746:7 851:3 958:3
Steps Solve 32 44:4 26 36:1 21 30:2 15 22:7 11 17:3 9 15:1 7 12:2 6 11:0
220
LU O(log^2(1/eps))
Memory O(log^2(1/eps))
210
900 200 800
190 180
700 170 600
160 150
500 140 400 0.001
0.01
10
0.1
1
130 0.001
0.01
0.1
35
Error O(eps)
1 Steps
30 1
25 20
0.1
15 10
0.01 0.001
0.01
0.1
1
5 0.001
0.01
0.1
1
The results in Table 10.5 suggest that this approach works: the number of iteration steps required for solving the system is bounded and even seems to decay for large problem dimensions, while the time for the construction of the preconditioner seems to grow like n log3 .n/, i.e., the ratio between the time required for the construction of Vz and the preconditioner improves as n grows. Both the storage requirements and the time for solving the linear system grow like n log2 .n/, i.e., at the same rate as the storage requirements for the matrix Vz . We can conclude that the preconditioner works well: the time for solving the linear systems (10.5) and (10.13) is proportional to the storage requirements for the matrix
410
10 Applications
Table 10.5. Preconditioners for the single layer potential matrix V for different matrix dimensions n and O h.
n 512 2048 8192 32768 131072 524288
O 1:01 6:72 3:22 1:62 7:93 3:93
Build LU Mem Mem=n 3:1 2:4 1:2 2:4 2:02 21:2 19:7 6:1 3:1 5:51 151:9 102:2 32:1 4:0 9:11 1038:3 582:1 165:1 5:2 7:51 6953:5 3131:9 825:9 6:5 9:91 66129:6 18771:4 4009:8 7:8 1:20
100000
10000
LU O(n log^3(n))
10000
1000
1000
100
100
10
10
1
1 100
1000
10000
16
100000
1e+06
0.1 100
Memory O(n log^2(n))
1000
1000
Steps
Steps Solve 11 0:1 13 0:9 13 4:2 13 18:8 10 56:1 10 259:8
10000
100000
1e+06
Solve O(n log^2(n))
14 100 12 10
10
8 1 6 4
0.1
2 100
1000
10000
100000
1e+06
0.01 100
1000
10000
100000
1e+06
approximation Vz , which is the best result we can hope for, and the construction of the preconditioner takes less time than the construction of Vz , it even becomes more efficient as the problem dimension grows. Remark 10.3 (Symmetry). If Vz is symmetric and positive definite, it is advisable to use the Cholesky decomposition instead of the general LU decomposition. In this way, the computation time can be halved, and the Cholesky factorization can be computed even if only the lower half of Vz has been stored (cf. Remark 10.2).
10.4 Application to realistic geometries
411
10.4 Application to realistic geometries Until now, we have only considered problems on “academical” domains: the unit circle or sphere and the unit square. Since these geometries are very smooth, we expect the discretization error to be small, at least for those solutions that are of practical interest. According to Strang’s first lemma, this means that we have to ensure that the approximation of the matrices V and K is very accurate if we intend to get the optimal result. For geometries appearing in practical applications, a lower accuracy may be acceptable, since the discretization error will usually be quite large, but the algorithm has to be able to handle non-uniform meshes, possibly with fine geometric details. In order to investigate the performance of our techniques under these conditions, we now consider geometries that are closer to practical applications: Figure 10.1 contains a surface resembling a crank shaft1 , a sphere2 pierced with cylindrical “drill holes” and an object with foam-like structure3 .
Figure 10.1. Examples of practical computational domains.
We test our algorithms in the same way as in the previous section: we construct an H 2 -matrix approximation of the matrix V on the surface mesh by compressing an initial approximation based on interpolation, we compute the L2 -projection of the Dirichlet values and apply an H 2 -matrix approximation of the matrix K to compute the right-hand side of equation (10.14). Then we solve the equation by a conjugate gradient method using an approximate LU factorization of Vz as a preconditioner. Since the triangulation is less regular than the ones considered before, we use higher quadrature orders q 2 f4; 5; 6g to compute the far- and nearfield entries and determine the accuracy of the matrix approximation by trial and error: the matrices are constructed for the interpolation orders m 2 f6; 7; 8g and the error tolerances O 2 f106 ; 107 ; 108 g, and we check whether the approximation of the Neumann data changes. The results for the “crank shaft” problem are given in Table 10.6: switching from m D 6 to m D 7 hardly changes the errors 2 and 3 , only the error 1 is affected. 1
This mesh was created by the NGSolve package by Joachim Schöberl. Also created by NGSolve. 3 This mesh is used by courtesy of Heiko Andrä and Günther Of. 2
412
10 Applications
Table 10.6. Matrix approximations of different degrees of accuracy for the “crank shaft” geometry.
m q
O
Matrix V Build Mem
Matrices LU Build Mem
Slv.
1
2
3
6 7 8
4 16 4 17 4 18
1543:3 456:3 1011:5 180:1 20:6 5:73 2478:7 597:1 881:4 213:1 18:9 3:13 5009:6 753:3 893:3 248:9 20:1 2:93
1:12 1:12 1:12
5:53 4:13 4:03
6 7 8
5 16 5 17 5 18
2268:6 456:3 1016:3 180:1 21:1 3505:1 597:1 890:6 213:1 18:9 6319:0 753:3 890:3 248:9 20:1
5:13 1:73 1:33
1:12 1:12 1:12
4:33 2:23 2:13
7 8
6 17 6 18
5311:9 597:1 8660:5 753:3
1:43 9:74
1:12 1:12
2:03 1:83
885:3 213:1 19:2 896:0 248:9 20:2
If we increase the quadrature order, the errors 1 and 3 decrease, the error 2 is not affected. As mentioned before, 1 is only determined by the quadrature and matrix approximation errors, since the corresponding solution uN;1 is contained in the discrete space and therefore would be represented exactly by an exact Galerkin scheme. If we increase the interpolation error to m D 8, all errors remain almost unchanged. The fact that 1 changes only slightly suggests that the remaining errors are caused by the quadrature scheme, not by the matrix approximation. Table 10.7. Matrix approximations of different degrees of accuracy for the “pierced sphere” problem.
O
Matrix V Build Mem
6 7 8
4 16 4 17 4 18
1682:1 440:4 2696:0 577:2 5155:2 729:5
6 7
5 16 5 17
2444:1 440:4 3705:9 577:2
5 6 7
6 56 6 16 6 17
m q
Matrices LU Build Mem
Slv.
1
2
3
955:1 873:1 883:5
187:2 24:2 213:9 24:2 249:3 23:8
7:82 7:82 7:82
4:42 4:42 4:42
6:02 6:02 6:02
959:2 871:4
187:2 24:3 4:32 213:9 22:8 4:32
3:82 3:82
3:32 3:32
2595:6 359:2 1168:0 174:7 25:7 2:72 3758:8 440:4 962:1 187:2 24:2 2:22 5497:8 577:2 874:8 213:9 22:8 2:22
3:72 3:62 3:62
2:02 1:82 1:82
10.5 Solution operators of elliptic partial differential equations
413
We can conclude that an interpolation order of m D 7 and a quadrature order of q D 5 are sufficient to approximate the solutions uN;2 and uN;3 up to their respective discretization errors. For the “pierced sphere” problem, the results are collected in Table 10.7: even an interpolation order of m D 6 seems to be sufficient, since changing to m D 7 or m D 8 does not significantly change the approximation error. Increasing the quadrature order q, on the other hand, reduces the errors 1 and 3 , therefore we can conclude that the quadrature error dominates and the H 2 -matrix approximation is sufficiently accurate. For the “foam block” problem, the situation is similar: Table 10.8 shows that an interpolation order of m D 5 seems to be sufficient, the quadrature error dominates. The approximations of uN;2 and uN;3 are only slightly changed if the interpolation or the quadrature order are increased. Table 10.8. Matrix approximations of different degrees of accuracy for the “foam block” problem.
m q
O
Matrix V Build Mem
Matrices LU Build Mem
Slv.
1
2
3
1926:0 817:3 2969:7 976:7 4693:8 1240:3 8017:0 1521:0
2151:0 290:0 1807:4 317:8 1703:4 382:7 1767:4 469:4
91:1 85:5 78:7 78:6
6:92 6:32 6:22 6:22
1:41 1:41 1:41 1:41
7:22 7:12 7:12 7:12
2852:8 817:3 2180:5 290:0 4295:7 976:7 1822:2 317:8 6561:9 1240:3 1693:7 382:6 10861:6 1521:0 1769:5 469:4
90:9 83:4 79:3 78:8
4:12 2:82 2:72 2:72
1:41 1:41 1:41 1:41
6:62 6:42 6:42 6:42
290:0 92:3 317:9 88:3
3:32 1:32
1:41 1:41
6:42 6:22
5 6 7 8
4 4 4 4
56 16 17 18
5 6 7 8
5 5 5 5
56 16 17 18
5 6
6 56 6 16
4503:1 6636:4
817:3 2164:0 976:7 1816:3
10.5 Solution operators of elliptic partial differential equations All examples we have considered so far have been, in one way or another, based on polynomial expansion: even the hybrid cross approximation relies on interpolation. Now we investigate a problem class that cannot be treated in this way: the approximation of the inverse of a finite element stiffness matrix corresponding to an elliptic partial differential equation (cf. Chapter 9).
414
10 Applications
Finite element discretization We consider the equation div C.x/ grad u.x/ D f .x/ u.x/ D 0
for all x 2 ; for all x 2 @ ;
(10.21a) (10.21b)
where R2 is a bounded domain and C 2 L1 . ; R22 / is a mapping that assigns a 2 2-matrix to each point x 2 . We assume that C is symmetric and that there are constants ˛; ˇ 2 R with 0 < ˛ ˇ and .C.x// Œ˛; ˇ for all x 2 , i.e., that the partial differential operator is strongly elliptic. The variational formulation of the partial differential equation (10.21) is given by Z Z hrv.x/; C.x/ru.x/i2 dx D v.x/f .x/ dx for all v 2 H01 . /; (10.22)
where u 2 H01 . / is now only a weak solution (which nevertheless coincides with u if a classical solution exists). The variational formulation is discretized by a Galerkin scheme: as in Section 10.2, we use the space S1 spanned by the family .'i /i2 of piecewise linear nodal basis functions on a conforming triangulation G and consider the equation Ax D b; (10.23) where the matrix A 2 R is defined by Z Aij D hr'i .x/; C.x/r'j .x/i2 dx
for all i; j 2 ;
(10.24)
the vector b 2 R is defined by Z bi D 'i .x/f .x/ dx
for all i 2
and the solution vector x 2 R corresponds to the discrete solution X uh ´ xi 'i i2
of the variational equation. A is a symmetric positive definite matrix, therefore we could compute approximations of its Cholesky or LU factorization in order to construct efficient preconditioners (cf. [52], [79], [56]). A factorization is very efficient if we are only interested in solving a system of linear equations, but there are tasks, e.g., the computation of the Schur complement of a saddle point problem or the treatment of a matrix equation, when we require a sufficiently accurate approximation of the inverse B ´ A1 of the matrix A.
Inversion. As in the previous section, we reduce this task to a blockwise recursion using matrix-matrix multiplications. For all t, s ∈ T_I, we denote the submatrices corresponding to the block (t, s) by A_{t,s} := A|_{t̂×ŝ} and B_{t,s} := B|_{t̂×ŝ}.

Let t ∈ T_I be a cluster. If t is a leaf cluster, the admissibility condition implies that (t, t) ∈ L is an inadmissible leaf of T_{I×I}, and A_{t,t} is stored in standard array representation. Its inverse B^{(t)} can be computed by the well-known algorithms from linear algebra.

If t is not a leaf cluster, A_{t,t} is a block matrix. As in the previous section, we consider only the simple case #sons(t) = 2 with sons(t) = {t₁, t₂}. The defining equation A_{t,t} B^{(t)} = I of B^{(t)} takes the form

$$\begin{pmatrix} A_{t_1,t_1} & A_{t_1,t_2} \\ A_{t_2,t_1} & A_{t_2,t_2} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2} \\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} I & \\ & I \end{pmatrix}.$$

We compute the inverse B^{(t_1)} of A_{t_1,t_1} by recursion and multiply the equation by the matrix

$$\begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}$$

in order to get

$$\begin{pmatrix} I & B^{(t_1)} A_{t_1,t_2} \\ & A_{t_2,t_2} - A_{t_2,t_1} B^{(t_1)} A_{t_1,t_2} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2} \\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}.$$

By denoting the Schur complement by S^{(t_2)} := A_{t_2,t_2} − A_{t_2,t_1} B^{(t_1)} A_{t_1,t_2}, we can write this equation in the short form

$$\begin{pmatrix} I & B^{(t_1)} A_{t_1,t_2} \\ & S^{(t_2)} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2} \\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}.$$

We compute the inverse T^{(t_2)} of S^{(t_2)} by recursion and multiply the equation by the matrix

$$\begin{pmatrix} I & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)} \\ & T^{(t_2)} \end{pmatrix},$$

which yields the desired result

$$\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2} \\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} I & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)} \\ & T^{(t_2)} \end{pmatrix}
\begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} + B^{(t_1)} A_{t_1,t_2} T^{(t_2)} A_{t_2,t_1} B^{(t_1)} & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)} \\ -T^{(t_2)} A_{t_2,t_1} B^{(t_1)} & T^{(t_2)} \end{pmatrix}.$$
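The resulting block formulas are easy to check numerically on dense matrices; a minimal sketch in Python (plain NumPy standing in for the H²-matrix arithmetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 4
# a random symmetric positive definite matrix, split into a 2x2 block structure
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

B1 = np.linalg.inv(A11)                    # inverse of the upper left block
T = np.linalg.inv(A22 - A21 @ B1 @ A12)    # inverse of the Schur complement

B = np.block([[B1 + B1 @ A12 @ T @ A21 @ B1, -B1 @ A12 @ T],
              [-T @ A21 @ B1,                 T]])
print(np.linalg.norm(np.eye(n1 + n2) - B @ A))   # close to machine precision
```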
This construction of the inverse of A_{t,t} gives rise to the recursive Algorithm 59: we start with the upper left block, invert it by recursion, and use it to compute the matrices X_{1,2} := B^{(t_1)} A_{t_1,t_2} and X_{2,1} := A_{t_2,t_1} B^{(t_1)}. Then we overwrite A_{t_2,t_2} by the Schur complement, given by A_{t_2,t_2} − A_{t_2,t_1} X_{1,2}. In the next iteration of the loop, A_{t_2,t_2} is inverted, which gives us the lower right block of the inverse. The remaining blocks can be computed by multiplying this block by X_{1,2} and X_{2,1}.

Algorithm 59. Matrix inversion, overwrites A with its inverse A⁻¹ and uses X for temporary matrices.

procedure Inversion(t, var A, X);
if sons(t) = ∅ then
  Compute A⁻¹ directly
else
  σ ← #sons(t);  {t₁, …, t_σ} ← sons(t);
  for i, j ∈ {1, …, σ} do A_{ij} ← A|_{t̂_i × t̂_j} end for;
  for i = 1 to σ do
    Inversion(t_i, A_{ii}, X_{ii});
    for j ∈ {i+1, …, σ} do
      X_{ij} ← A_{ii} A_{ij};   {Use approximative multiplication}
      X_{ji} ← A_{ji} A_{ii}    {Use approximative multiplication}
    end for;
    for j, k ∈ {i+1, …, σ} do
      A_{jk} ← A_{jk} − A_{ji} X_{ik}   {Use approximative multiplication}
    end for
  end for;
  for i = σ downto 1 do
    for j ∈ {i+1, …, σ} do
      A_{ij} ← 0;  A_{ji} ← 0;
      for k ∈ {i+1, …, σ} do
        A_{ij} ← A_{ij} − X_{ik} A_{kj};   {Use approximative multiplication}
        A_{ji} ← A_{ji} − A_{jk} X_{ki}    {Use approximative multiplication}
      end for;
      A_{ii} ← A_{ii} − X_{ij} A_{ji}   {Use approximative multiplication}
    end for
  end for;
  for i, j ∈ {1, …, σ} do
    A|_{t̂_i × t̂_j} ← A_{ij}   {Unify resulting matrix by Algorithm 32}
  end for
end if
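For orientation, here is a dense-matrix transcription of Algorithm 59 in Python: the approximative multiplications and the unification by Algorithm 32 are replaced by exact dense operations, so this sketch only illustrates the recursion and loop structure, not the complexity. The cluster tree is represented by nested lists, an assumption of this illustration.

```python
import numpy as np

def size(tree):
    """Number of indices in a cluster: a leaf is an int, an inner node a list."""
    return tree if isinstance(tree, int) else sum(size(sub) for sub in tree)

def inversion(A, tree):
    """Overwrite the NumPy array (or view) A with its inverse, mirroring
    the loop structure of Algorithm 59 with exact dense arithmetic."""
    if isinstance(tree, int):                  # leaf cluster: invert directly
        A[:] = np.linalg.inv(A)
        return
    off = np.cumsum([0] + [size(sub) for sub in tree])
    blk = [slice(off[i], off[i + 1]) for i in range(len(tree))]
    sigma = len(tree)
    X = np.zeros_like(A)                       # temporary matrices X_ij
    for i in range(sigma):                     # forward (block LDU) pass
        inversion(A[blk[i], blk[i]], tree[i])
        for j in range(i + 1, sigma):
            X[blk[i], blk[j]] = A[blk[i], blk[i]] @ A[blk[i], blk[j]]
            X[blk[j], blk[i]] = A[blk[j], blk[i]] @ A[blk[i], blk[i]]
        for j in range(i + 1, sigma):          # Schur complement updates
            for k in range(i + 1, sigma):
                A[blk[j], blk[k]] -= A[blk[j], blk[i]] @ X[blk[i], blk[k]]
    for i in range(sigma - 1, -1, -1):         # backward pass: assemble inverse
        for j in range(i + 1, sigma):
            A[blk[i], blk[j]] = 0.0
            A[blk[j], blk[i]] = 0.0
            for k in range(i + 1, sigma):
                A[blk[i], blk[j]] -= X[blk[i], blk[k]] @ A[blk[k], blk[j]]
                A[blk[j], blk[i]] -= A[blk[j], blk[k]] @ X[blk[k], blk[i]]
            A[blk[i], blk[i]] -= X[blk[i], blk[j]] @ A[blk[j], blk[i]]

# quick check on a random SPD matrix with a two-level block structure
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8.0 * np.eye(8)
B = A.copy()
inversion(B, [[2, 2], [2, 2]])
print(np.linalg.norm(np.eye(8) - B @ A))       # close to machine precision
```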
As in the case of the LU factorization, we only need matrix-matrix multiplications and recursion, so we can construct an approximate inverse by replacing the matrix-matrix products by their H²-matrix approximations, which can be computed by the algorithms introduced in Chapters 7 and 8. Using the unification Algorithm 32, the blocks of the intermediate representation of the inverse can then be combined into the final H²-matrix representation.
Experiments. We use the inversion Algorithm 59 to construct H²-matrix approximations of the inverse of the stiffness matrix (10.24) for different types of coefficients. In our first experiment, we consider the simple case C = I, i.e., we discretize Poisson's equation on the square domain Ω = [−1, 1]² with a regular triangulation. Table 10.9 gives the computation times, storage requirements and approximation errors ε := ‖I − BA‖₂ for matrix dimensions ranging from n = 1024 to n = 1048576. The tolerance ε̂ for the approximative arithmetic operations is chosen to ensure ε ≈ 2·10⁻⁴. Since we can expect the condition number of A to grow like n, choosing ε̂ ~ 1/n should ensure a good approximation, and the experiment shows that this leads to good results.

Table 10.9. Approximate inverse of Poisson's equation in two spatial dimensions for different matrix dimensions n.
n          ε̂          Build       Mem    Mem/n   ε
1024       4·10⁻⁶        1.8       3.3     3.3   6.3·10⁻⁵
4096       1·10⁻⁶       23.0      17.4     4.3   2.0·10⁻⁴
16384      1·10⁻⁷      229.0      86.4     5.4   1.2·10⁻⁴
65536      2·10⁻⁸     1768.7     395.1     6.2   1.4·10⁻⁴
262144     4·10⁻⁹    12588.0    1785.2     7.0   1.5·10⁻⁴
1048576    1·10⁻⁹    80801.3    7585.0     7.4   2.2·10⁻⁴
[Figure: inversion time compared with O(n log⁴(n)) (left) and memory per degree of freedom Mem/n compared with O(log(n)) (right), plotted against n.]
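The error measure ε = ‖I − BA‖₂ reported in the tables can be estimated using matrix-vector products only, e.g., by a power iteration applied to EᵀE with E = I − BA; the following sketch is an illustration of this idea, not necessarily the procedure used for the experiments:

```python
import numpy as np

def spectral_error(A, B, steps=100, rng=None):
    """Estimate eps = ||I - B A||_2 by power iteration on E^T E with
    E = I - B A; only products with A, B and their transposes are needed,
    so the same idea applies when A and B are H^2-matrices."""
    n = A.shape[0]
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(steps):
        y = x - B @ (A @ x)            # y = E x
        x = y - A.T @ (B.T @ y)        # x = E^T y
        lam = np.linalg.norm(x)        # -> largest eigenvalue of E^T E
        if lam == 0.0:
            return 0.0
        x /= lam
    return np.sqrt(lam)                # sigma_max(E) = sqrt(lambda_max)
```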
Combining Theorem 9.13 (cf. Theorem 2.8 in [6] for a related result for H-matrices) with Lemma 9.10 yields that a rank of O(log(n)³) should be sufficient to approximate the inverse B of A, but the numerical experiment shows that in fact O(log(n)) is sufficient, and Lemma 3.38 then yields that the storage requirements are in O(n log(n)).

Our estimates also apply to strongly elliptic partial differential equations with non-smooth coefficients in L^∞, and even to anisotropic coefficients. We therefore consider two examples with non-smooth coefficients: in the first case, we split Ω = [−1, 1]² into four equal quarters and choose C = 100·I in the lower left and upper right quarters and C = I in the remaining quarters:

$$C_Q \colon \Omega \to \mathbb{R}^{2\times 2}, \quad x \mapsto \begin{cases} \begin{pmatrix} 100 & \\ & 100 \end{pmatrix} & \text{if } x \in [-1,0) \times [-1,0) \,\cup\, [0,1] \times [0,1], \\ \begin{pmatrix} 1 & \\ & 1 \end{pmatrix} & \text{otherwise}. \end{cases}$$

Due to the jump in the coefficients, the Green function of the corresponding elliptic differential operator will not be smooth, and approximating it by polynomials is not an option. The general theory of Section 6.3 and Chapter 9, on the other hand, can still be applied, therefore we expect to be able to approximate the inverse of A by an H²-matrix.

Table 10.10. Approximate inverse of elliptic partial differential equations with coefficients in L^∞ for different matrix dimensions n.
n         ε̂          Quartered                          Anisotropy
                     Build        Mem    ε              Build        Mem    ε
1024      4·10⁻⁶        1.8       3.3    9.2·10⁻⁵          1.8       3.3    2.3·10⁻⁴
4096      1·10⁻⁶       22.7      17.5    2.6·10⁻⁴         22.8      17.9    4.5·10⁻⁴
16384     1·10⁻⁷      223.6      87.1    1.7·10⁻⁴        240.4      91.2    2.1·10⁻⁴
65536     2·10⁻⁸     1756.2     399.6    1.9·10⁻⁴       2137.2     429.4    2.8·10⁻⁴
262144    4·10⁻⁹    12670.2    1786.4    2.2·10⁻⁴      17021.6    1964.7    3.9·10⁻⁴

[Figure: inversion times for the quartered (Q) and anisotropic (A) problems compared with O(n log⁴(n)) (left) and memory per degree of freedom Mem/n compared with O(log(n)) (right), plotted against n.]
We can conclude that H²-matrices can be used in situations where the underlying operator cannot be approximated by polynomials and that the H²-matrix arithmetic algorithms find suitable approximations in almost linear complexity.

In a second example, we investigate the influence of anisotropic coefficients. We split Ω into two halves and use an anisotropic coefficient matrix in one of them:

$$C_A \colon \Omega \to \mathbb{R}^{2\times 2}, \quad x \mapsto \begin{cases} \begin{pmatrix} 100 & \\ & 1 \end{pmatrix} & \text{if } x \in [-1,1] \times [0,1], \\ \begin{pmatrix} 1 & \\ & 1 \end{pmatrix} & \text{otherwise}. \end{cases}$$

The approximate inverse is computed by using the recursive inversion Algorithm 59 in combination with the adaptive matrix-matrix multiplication introduced in Chapter 8; this yields the results given in Table 10.10. We can see that, compared to the case of Poisson's equation, the storage requirements are only slightly increased and still proportional to n log(n). For the anisotropic problem, the computation time is increased by approximately 35%, but it is still proportional to n log(n)⁴.
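For reference, the two coefficient fields C_Q and C_A are easy to write down as code; a small sketch (the function names are ours) that could be passed to the assembly routine sketched earlier in this section:

```python
import numpy as np

def C_Q(x):
    """Quartered coefficient: 100*I on the lower left and upper right
    quarters of [-1,1]^2, identity elsewhere (jumping coefficient)."""
    if (x[0] < 0 and x[1] < 0) or (x[0] >= 0 and x[1] >= 0):
        return 100.0 * np.eye(2)
    return np.eye(2)

def C_A(x):
    """Anisotropic coefficient: diag(100, 1) on the upper half
    [-1,1] x [0,1], identity on the lower half."""
    if x[1] >= 0:
        return np.diag([100.0, 1.0])
    return np.eye(2)
```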
Bibliography The numbers at the end of each item refer to the pages on which the respective work is cited. [1] S. Amini and A. T. J. Profit, Multi-level fast multipole solution of the scattering problem. Eng. Anal. Bound. Elem. 27 (2003), 547–564. 3 [2] C. R. Anderson, An implementation of the fast multipole method without multipoles. SIAM J. Sci. Statist. Comput. 13 (1992), 923–947. 2 [3] L. Banjai and W. Hackbusch, Hierarchical matrix techniques for low- and high-frequency Helmholtz problems. IMA J. Numer. Anal. 28 (2008), 46–79. 3 [4] M. Bebendorf, Approximation of boundary element matrices. Numer. Math. 86 (2000), 565–589. 2, 248, 392 [5] M. Bebendorf, Effiziente numerische Lösung von Randintegralgleichungen unter Verwendung von Niedrigrang-Matrizen. PhD thesis, Universität Saarbrücken, Saarbrücken 2000; dissertation.de -Verlag im Internet GmbH, Berlin 2001. 2 [6] M. Bebendorf and W. Hackbusch, Existence of H -matrix approximants to the inverse FE-matrix of elliptic operators with L1 -coefficients. Numer. Math. 95 (2003), 1–28. 2, 3, 6, 363, 367, 368, 370, 375, 376, 418 [7] M. Bebendorf and S. Rjasanow, Adaptive low-rank approximation of collocation matrices. Computing 70 (2003), 1–24. 2, 227, 248 [8] C. Bernardi and V. Girault, A local regularization operator for triangular and quadrilateral finite elements. SIAM J. Numer. Anal. 35 (1998), 1893–1916. 377 [9] G. Beylkin, R. Coifman, and V. Rokhlin, Fast wavelet transforms and numerical algorithms I. Comm. Pure Appl. Math. 44 (1991), 141–183. 2 [10] S. Börm, Approximation of integral operators by H 2 -matrices with adaptive bases. Computing 74 (2005), 249–271. 163, 212 [11] S. Börm, H 2 -matrix arithmetics in linear complexity. Computing 77 (2006), 1–28. 281, 329 [12] S. Börm, Adaptive variable-rank approximation of dense matrices. SIAM J. Sci. Comput. 30 (2007), 148–168. 212 [13] S. Börm, Data-sparse approximation of non-local operators by H 2 -matrices. Linear Algebra Appl. 422 (2007), 380–403. 6, 211 [14] S. Börm, Construction of data-sparse H 2 -matrices by hierarchical compression. SIAM J. Sci. Comput. 31 (2009), 1820–1839. 257 [15] S. Börm, Approximation of solution operators of elliptic partial differential equations by H - and H 2 -matrices. Numer. Math. 115 (2010), 165–193. 3, 6, 363, 366, 376 [16] S. Börm and L. Grasedyck, Low-rank approximation of integral operators by interpolation. Computing 72 (2004), 325–332. 2 [17] S. Börm and L. Grasedyck, Hybrid cross approximation of integral operators. Numer. Math. 101 (2005), 221–249. 2, 75, 248, 394
[18] S. Börm, L. Grasedyck, and W. Hackbusch, Hierarchical matrices. Lecture Note 21 of the Max Planck Institute for Mathematics in the Sciences, Leipzig 2003. 49, 269, 366 http://www.mis.mpg.de/de/publications/andere-reihen/ln/lecturenote-2103.html [19] S. Börm, L. Grasedyck, and W. Hackbusch, Introduction to hierarchical matrices with applications. Eng. Anal. Bound. Elem. 27 (2003), 405–422. 49 [20] S. Börm and W. Hackbusch, Approximation of boundary element operators by adaptive H 2 -matrices. In Foundations of computational mathematics: Minneapolis 2002, London Math. Soc. Lecture Note Ser. 312, Cambridge University Press, Cambridge 2004, 58–75. 163 [21] S. Börm and W. Hackbusch, Hierarchical quadrature of singular integrals. Computing 74 (2005), 75–100. 272 [22] S. Börm, M. Löhndorf, and J. M. Melenk, Approximation of integral operators by variableorder interpolation. Preprint 82/2002, Max Planck Institute for Mathematics in the Sciences, Leipzig 2002. http://www.mis.mpg.de/publications/preprints/2002/prepr2002-82.html 129 [23] S. Börm, M. Löhndorf, and J. M. Melenk, Approximation of integral operators by variableorder interpolation. Numer. Math. 99 (2005), 605–643. 2, 3, 75, 94, 125, 129, 135, 161, 278 [24] S. Börm and J. Ostrowski, Fast evaluation of boundary integral operators arising from an eddy current problem. J. Comput. Phys. 193 (2004), 67–85. 6 [25] S. Börm and S. A. Sauter, BEM with linear complexity for the classical boundary integral operators. Math. Comp. 74 (2005), 1139–1177. 126 [26] D. Braess, Finite Elemente. 4th ed., Springer-Verlag, Berlin 2007. 390, 398 [27] A. Brandt, Multilevel computations of integral transforms and particle interactions with oscillatory kernels. Comput. Phys. Comm. 65 (1991), 24–38. 83 [28] A. Brandt and A. A. Lubrecht, Multilevel matrix multiplication and fast solution of integral equations. J. Comput. Phys. 90 (1990), 348–370. 83 [29] S. Chandrasekaran, M. Gu, and W. Lyons,A fast adaptive solver for hierarchically semiseparable representations. Calcolo 42 (2005), 171–185. 59 [30] S. Chandrasekaran, M. Gu, and T. Pals, Fast and stable algorithms for hierarchically semi-separable representations. Tech. rep., Department of Mathematics, University of California, Berkeley, 2004. 59 [31] S. Chandrasekaran, M. Gu, and T. Pals, A fast ULV decomposition solver for hierarchically semiseparable representations. SIAM J. Matrix Anal. Appl. 28 (2006), 603–622. 59 [32] S. Chandrasekaran, M. Gu, X. Sun, J. Xia, and J. Zhu, A superfast algorithm for Toeplitz systems of linear equations. SIAM J. Matrix Anal. Appl. 29 (2007), 1247–1266. 59 [33] S. Chandrasekaran and I. C. F. Ipsen, On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15 (1994), 592–622. 227 [34] P. G. Ciarlet, The finite element method for elliptic problems. Classics Appl. Math. 40, Society for Industrial and Applied Mathematics (SIAM), Philadelphia 2002. 365, 390
[35] P. Clément, Approximation by finite element functions using local regularization. RAIRO Anal. Numér. 9 (1975), 77–84. 376 [36] A. Cohen, W. Dahmen, and R. DeVore, Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comp. 70 (2001), 27–75. 401 [37] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comp. 19 (1965), 297–301. 1 [38] W. Dahmen, B. Faermann, I. G. Graham, W. Hackbusch, and S. A. Sauter, Inverse inequalities on non-quasi-uniform meshes and application to the mortar element method. Math. Comp. 73 (2004), 1107–1138. 376, 379, 390, 391, 398, 399 [39] W. Dahmen, H. Harbrecht, and R. Schneider, Compression techniques for boundary integral equations—asymptotically optimal complexity estimates. SIAM J. Numer. Anal. 43 (2006), 2251–2271. 2 [40] W. Dahmen, S. Prössdorf, and R. Schneider, Multiscale methods for pseudo-differential equations on smooth closed manifolds. In Wavelets: theory, algorithms, and applications (Taormina, 1993), Wavelet Anal. Appl. 5, Academic Press, San Diego 1994, 385–424. 401 [41] W. Dahmen and R. Schneider, Wavelets on manifolds I: Construction and domain decomposition. SIAM J. Math. Anal. 31 (1999), 184–230. 2 [42] R. A. DeVore and G. G. Lorentz, Constructive approximation. Grundlehren Math. Wiss. 303, Springer-Verlag, Berlin 1993. 95, 106 [43] S. Erichsen and S.A. Sauter, Efficient automatic quadrature in 3-d Galerkin BEM. Comput. Methods Appl. Mech. Engrg. 157 (1998), 215–224. 156, 392 [44] P. P. Ewald, Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Physik 369 (1921), 253–287. 2 [45] K. Giebermann, Multilevel approximation of boundary integral operators. Computing 67 (2001), 183–207. 1, 3, 74 [46] Z. Gimbutas and V. Rokhlin, A generalized fast multipole method for nonoscillatory kernels. SIAM J. Sci. Comput. 24 (2002), 796–817. 2 [47] G. Golub, Numerical methods for solving linear least squares problems. Numer. Math. 7 (1965), 206–216. 227 [48] G. H. Golub and C. F. Van Loan, Matrix computations. 3rd. ed., The Johns Hopkins University Press, Baltimore 1996. 180, 181, 186, 187, 188, 233 [49] L. Grasedyck, Theorie und Anwendungen Hierarchischer Matrizen. PhD thesis, Universität Kiel, Kiel 2001. 2, 50, 122, 262, 361, 376 http://eldiss.uni-kiel.de/macau/receive/dissertation_diss_00000454 [50] L. Grasedyck, Adaptive recompression of H -matrices for BEM. Computing 74 (2005), 205–223. 248, 393 [51] L. Grasedyck, Existence of a low-rank or H -matrix approximant to the solution of a Sylvester equation. Numer. Linear Algebra Appl. 11 (2004), 371–389. 2 [52] L. Grasedyck and W. Hackbusch, Construction and arithmetics of H -matrices. Computing 70 (2003), 295–334. 2, 4, 6, 29, 37, 38, 50, 347, 361, 366, 414
[53] L. Grasedyck, W. Hackbusch, and B. N. Khoromskij, Solution of large scale algebraic matrix Riccati equations by use of hierarchical matrices. Computing 70 (2003), 121–165. 2 [54] L. Grasedyck, W. Hackbusch, and R. Kriemann, Performance of H-LU preconditioning for sparse matrices. Comput. Methods Appl. Math. 8 (2008), 336–349. 2 [55] L. Grasedyck, R. Kriemann, and S. Le Borne, Parallel black box H -LU preconditioning for elliptic boundary value problems. Comput. Visual Sci. 11 (2008), 273–291. 2 [56] L. Grasedyck, R. Kriemann, and S. Le Borne, Domain decomposition based H -LU preconditioning. Numer. Math. 112 (2009), 565–600. 2, 414 [57] L. Greengard, D. Gueyffier, P.-G. Martinsson, and V. Rokhlin, Fast direct solvers for integral equations in complex three-dimensional domains. Acta Numer. 18 (2009), 243– 275. 2 [58] L. Greengard and V. Rokhlin, A fast algorithm for particle simulations. J. Comput. Phys. 73 (1987), 325–348. 1, 3, 5, 74 [59] L. Greengard and V. Rokhlin, On the numerical solution of two-point boundary value problems. Comm. Pure Appl. Math. 44 (1991), 419–452. 3, 36, 59 [60] L. Greengard and V. Rokhlin, A new version of the fast multipole method for the Laplace in three dimensions. Acta Numer. 6 (1997), 229–269. 1, 3, 5, 74 [61] W. Hackbusch, Multigrid methods and applications. Springer Ser. Comput. Math. 4, Springer-Verlag, Berlin 1985. 366 [62] W. Hackbusch, A sparse matrix arithmetic based on H -matrices. Part I: Introduction to H -matrices. Computing 62 (1999), 89–108. v, 2, 6, 29, 36, 49, 366 [63] W. Hackbusch, Hierarchische Matrizen. Springer-Verlag, Dordrecht 2009. 2, 6, 29, 49, 84, 104, 328 [64] W. Hackbusch and S. Börm, Data-sparse approximation by adaptive H 2 -matrices. Computing 69 (2002), 1–35. v, 2, 211, 212 [65] W. Hackbusch and S. Börm, H 2 -matrix approximation of integral operators by interpolation. Appl. Numer. Math. 43 (2002), 129–143. 3, 74, 75 [66] W. Hackbusch and B. N. Khoromskij, H -matrix approximation on graded meshes. In The mathematics of finite elements and applications X (MAFELAP 1999), Elsevier, Kidlington 2000, 307–316. 274 [67] W. Hackbusch and B. N. Khoromskij, A sparse H -matrix arithmetic: general complexity estimates. J. Comp. Appl. Math. 125 (2000), 479–501. 2, 29 [68] W. Hackbusch and B. N. Khoromskij, A sparse H -matrix arithmetic. Part II: Application to multi-dimensional problems. Computing 64 (2000), 21–47. 2, 29 [69] W. Hackbusch, B. N. Khoromskij, and R. Kriemann, Hierarchical matrices based on a weak admissibility criterion. Computing 73 (2004), 207–243. 36, 59 [70] W. Hackbusch, B. N. Khoromskij, and S. A. Sauter, On H 2 -matrices. In Lectures on applied mathematics, Springer-Verlag, Berlin 2000, 9–29. v, 2, 29, 125 [71] W. Hackbusch and Z. P. Nowak, O cloznosti metoda panelej (On complexity of the panel method, Russian). In Vychislitelnye protsessy i sistemy 6, Nauka, Moscow 1988, 233–244. 1
[72] W. Hackbusch and Z. P. Nowak, On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. 54 (1989), 463–491. 1, 3, 5, 9, 74 [73] H. Harbrecht and R. Schneider, Wavelet Galerkin schemes for boundary integral equations—implementation and quadrature. SIAM J. Sci. Comput. 27 (2006), 1347–1370. 2 [74] S. Jaffard, Wavelet methods for fast resolution of elliptic problems. SIAM J. Numer. Anal. 29 (1992), 965–986. 401 [75] S. Lang, Real and functional analysis. 3rd ed., Graduate Texts in Math. 142, SpringerVerlag, New York 1993. 10, 12 [76] S. Le Borne, H -matrices for convection-diffusion problems with constant convection. Computing 70 (2003), 261–274. 2 [77] S. Le Borne, Hierarchical matrices for convection-dominated problems. In Domain decomposition methods in science and engineering (Berlin, 2003), Lect. Notes Comput. Sci. Eng. 40, Springer-Verlag, Berlin 2005, 631–638. 2 [78] S. Le Borne and L. Grasedyck, H -matrix preconditioners in convection-dominated problems. SIAM J. Matrix Anal. Appl. 27 (2006), 1172–1183. 2 [79] S. Le Borne, L. Grasedyck, and R. Kriemann, Domain-decomposition based H -LU preconditioners. In Domain decomposition methods in science and engineering XVI, Lect. Notes Comput. Sci. Eng. 55, Springer-Verlag, Berlin 2007, 667–674. 414 [80] M. Löhndorf, Effiziente Behandlung von Integraloperatoren mit H 2 -Matrizen variabler Ordnung. PhD thesis, Universität Leipzig, Leipzig 2003. http://www.mis.mpg.de/scicomp/Fulltext/Dissertation_Loehndorf.pdf 65, 161 [81] M. Löhndorf and J. M. Melenk, Approximation of integral operators by variable-order approximation. Part II: Non-smooth domains. In preparation. 65, 274 [82] J. Makino, Yet another fast multipole method without multipoles—pseudoparticle multipole method. J. Comput. Phys. 151 (1999), 910–920. 2 [83] P.-G. Martinsson, A fast direct solver for a class of elliptic partial differential equations. J. Sci. Comput. 38 (2009), 316–330. 59 [84] E. Michielssen, A. Boag, and W. C. Chew, Scattering from elongated objects: direct solution in O.N log2 N / operations. IEE Proc.-Microw. Antennas Propag. 143 (1996),277– 283. 36 [85] G. Of, O. Steinbach, and W. L. Wendland, The fast multipole method for the symmetric boundary integral formulation. IMA J. Numer. Anal. 26 (2006), 272–296. 2 [86] J. Ostrowski, Boundary element methods for inductive hardening. PhD thesis, Universität Tübingen, Tübingen 2003. http://tobias-lib.uni-tuebingen.de/dbt/volltexte/2003/672/ 2 [87] T. J. Rivlin, The Chebyshev polynomials. Wiley-Interscience, New York 1974. 94 [88] V. Rokhlin, Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60 (1985), 187–207. 2, 74 [89] S. A. Sauter, Cubature techniques for 3-D Galerkin BEM. In Boundary elements: implementation and analysis of advanced algorithms (Kiel, 1996), Notes Numer. Fluid Mech. 54, Vieweg, Braunschweig 1996, 29–44. 156, 392
[90] S. A. Sauter, Variable order panel clustering (extended version). Preprint 52/1999, Max Planck Institute for Mathematics in the Sciences, Leipzig 1999. 37, 125, 126, 278 http://www.mis.mpg.de/publications/preprints/1999/prepr1999-52.html [91] S. A. Sauter, Variable order panel clustering. Computing 64 (2000), 223–261. 1, 2, 5, 37, 75, 125, 126 [92] S. A. Sauter and C. Schwab, Randelementmethoden. B. G. Teubner, Wiesbaden 2004. 37, 388, 389, 390, 401 [93] O. Schenk, K. Gärtner, W. Fichtner, and A. Stricker, PARDISO: a high-performance serial and parallel sparse linear solver in semiconductor device simulation. Future Generat. Comput. Syst. 18 (2001), 69–78. 366 [94] R. Schneider, Multiskalen- und Wavelet-Matrixkompression. B. G. Teubner, Stuttgart 1998. 2 [95] L. R. Scott and S. Zhang, Finite element interpolation of nonsmooth functions satisfying boundary conditions. Math. Comp. 54 (1990), 483–493. 377 [96] P. Starr and V. Rokhlin, On the numerical solution of two-point boundary value problems II. Comm. Pure Appl. Math. 47 (1994), 1117–1159. 3, 36, 59 [97] O. Steinbach, Numerische Näherungsverfahren für elliptische Randwertprobleme. B. G. Teubner, Wiesbaden 2003. 392, 396, 398 [98] O. Steinbach and W. L. Wendland, The construction of some efficient preconditioners in the boundary element method. Adv. Comput. Math. 9 (1998), 191–216. 401 [99] E. Stiefel, Über einige Methoden der Relaxationsrechnung. Z. Angew. Math. Physik 3 (1952), 1–33. 393 [100] J. Tausch, The variable order fast multipole method for boundary integral equations of the second kind. Computing 72 (2004), 267–291. 3, 75, 125, 278 [101] J. Tausch, A variable order wavelet method for the sparse representation of layer potentials in the non-standard form. J. Numer. Math. 12 (2004), 233–254. 125 [102] J. Tausch and J. White, Multiscale bases for the sparse representation of boundary integral operators on complex geometry. SIAM J. Sci. Comput. 24 (2003), 1610–1629. 2, 191 [103] E. Tyrtyshnikov, Mosaic-skeleton approximations. Calcolo 33 (1996), 47–57. 2 [104] E. E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method. Computing 64 (2000), 367–380. 227, 392 [105] T. von Petersdorff and C. Schwab, Fully discrete multiscale Galerkin BEM. In Multiscale wavelet methods for partial differential equations, Wavelet Anal. Appl. 6, Academic Press, San Diego 1997, 287–346. 2 [106] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Fast algorithms for hierarchically semiseparable matrices. Numer. Linear Algebra Appl., to appear. Article first published online: 22 DEC 2009, DOI: 10.1002/nla.691 59 [107] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Superfast multifrontal method for large structured linear systems of equations, SIAM J. Matrix Anal. Appl. 31 (2009), 1382–1411. 59 [108] L. Ying, G. Biros, and D. Zorin, A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196 (2004), 591–626. 2
Algorithm index
Adaptive bases for H -matrices, 238 Adaptive bases for H 2 -matrices, 246 Adaptive bases for dense matrices, 230 Adaptive bases for dense matrices, theoretical, 229 Adaptive bases for weighted dense matrices, 263 Backward transformation, 61 Block backward transformation, 339 Block cluster tree, 43 Block forward transformation, 169 Blockwise conversion of an H -matrix, 257 Bounding boxes, 48
Forward transformation, 61 Geometric cluster trees, 42 Householder factorization, 181 Low-rank approximation of leaf blocks, 351 Low-rank approximation of matrix blocks, 349 Low-rank approximation of subdivided blocks, 353 Low-rank projection, 188 Lower forward substitution, 404 LU decomposition, 403
Check for orthogonality, 166 Cluster basis product, 176 Coarsening, 354 Construction of a nested cluster basis from a general one, 224 Construction of weights for the total cluster basis, 243 Conversion, H to H 2 , 173 Conversion, H to H 2 , with error norm, 202 Conversion, H 2 to H 2 , 178 Conversion, H 2 to H 2 , with error norm, 203 Conversion, dense to H 2 , 170 Conversion, dense to H 2 , with error norm, 201 Covering balls, 46
Matrix addition, 296 Matrix backward transformation, 291 Matrix backward transformation, elementary step, 290 Matrix forward transformation, 286 Matrix forward transformation, elementary step, 285 Matrix inversion, 416 Matrix multiplication, 314 Matrix multiplication, bA admissible, 312 Matrix multiplication, bB admissible, 312 Matrix multiplication, bC admissible, 313 Matrix multiplication, inadmissible, 314 Matrix multiplication, recursion, 311 Matrix-vector multiplication, 62
Expansion of a cluster basis, 338
Orthogonalization of cluster bases, 184
Find optimal rank, 187 Forward substitution, 407
Recursive construction of cluster bases for H -matrices, 237
Recursive construction of cluster bases for H 2 -matrices, 246 Recursive construction of unified cluster bases, 252 Regular cluster trees, 40 Semi-uniform matrix forward transformation, 338, 340
Semi-uniform matrix product, multiplication step, 342 Truncation of cluster bases, 190 Unification of submatrices, 255 Upper forward substitution, 405
Subject index
Admissibility condition, 36 Admissibility condition, weak, 36 Anisotropic coefficients, 419 Approximation of derivatives, 104 Asymptotically smooth function, 83 Backward transformation, blocks, 339 Backward transformation, matrices, 287 Block backward transformation, 339 Block cluster tree, 34 Block cluster tree, admissible, 37 Block cluster tree, consistent, 144 Block cluster tree, induced, 298, 325 Block cluster tree, sparse, 51 Block cluster tree, transposed, 215 Block columns, 50 Block columns, admissible, 59 Block forward transformation, 169 Block restriction, 167 Block rows, 50 Block rows, admissible, 59 Block rows, extended, 170 Boundary integral formulation, direct, 395 Boundary integral formulation, indirect, 388 Bounding boxes, 45, 90 Cacciopoli inequality, 367 Call tree for multiplication, 315 Characteristic point, 38 Chebyshev interpolation, 93 Clément interpolation operator, 377 Cluster basis, 54 Cluster basis product, 176 Cluster basis, conversion into nested basis, 223 Cluster basis, induced, 299, 321, 325
Cluster basis, nested, 54 Cluster basis, orthogonal, 164 Cluster basis, orthogonalization, 180 Cluster basis, truncation, 185 Cluster basis, unified, 248 Cluster operator, 175 Cluster tree, 30 Cluster tree, bounded, 63 Cluster tree, depth, 34 Cluster tree, quasi-balanced, 129 Cluster tree, regular, 69 Cluster, descendants, 30 Cluster, father, 30 Cluster, level, 31 Cluster, predecessors, 30 Coarsening the block structure, 354 Complexity of block forward transformation, 169 Complexity of cluster basis expansion, 342 Complexity of coarsening, 357 Complexity of collecting submatrices, 284 Complexity of compression of dense matrices, 229 Complexity of compression of H -matrices, 236 Complexity of compression of H 2 -matrices, 245 Complexity of conversion of dense matrices, 171 Complexity of conversion of H -matrices, 173 Complexity of conversion of H 2 -matrices, 178 Complexity of finding cluster weights, 243
Complexity of hierarchical compression, 258 Complexity of matrix backward transformation, 292 Complexity of matrix forward transformation, 286 Complexity of orthogonalization, 183 Complexity of projected matrix addition, 295 Complexity of projected multiplication, 317 Complexity of semi-uniform forward transformation, 343 Complexity of semi-uniform matrix multiplication, 346 Complexity of splitting into submatrices, 289 Complexity of the cluster basis product, 177 Complexity of truncation, 190 Complexity of unification of cluster bases, 250 Complexity of unification of submatrices, 255 Condensation of H -matrix blocks, 234 Condensation of total cluster bases, 240, 249 Conjugate gradient method, 401 Control of blockwise relative error, 262 Control of spectral error by variable rank compression, 264 Covering a regularity ellipse with circles, 152 Density function, 388 Direct boundary integral formulation, 395 Directional interpolation, 98 Double layer potential, 155 Elliptic partial differential equation, 363, 413 Error decomposition for truncation, 194
Error estimate for asymptotically smooth functions, 102 Error estimate for coarsening, 354 Error estimate for compression, 232 Error estimate for derived asymptotically smooth functions, 113 Error estimate for discrete solution operators, 385 Error estimate for isotropic interpolation, 101 Error estimate for multi-dimensional derived interpolation, 111 Error estimate for multi-dimensional interpolation, 100 Error estimate for multi-dimensional re-interpolation, 141 Error estimate for one-dimensional derived interpolation, 108 Error estimate for one-dimensional interpolation, 97 Error estimate for one-dimensional re-interpolation, 136 Error estimate for re-interpolation of analytic functions, 137 Error estimate for re-interpolation of asymptotically smooth functions, 142 Error estimate for semi-uniform projection, 224 Error estimate for solution operators, 372 Error estimate for submatrices of solution operators, 382 Error estimate for the double layer potential, 399 Error estimate for the single layer potential, 391 Error estimate for the Taylor expansion, 85 Error estimate for truncation, 195 Error estimate for variable-order interpolation, 147 Error orthogonality, 192 Expansion matrices, 54
Expansion system, 77 Farfield, 37 Finite element method, 76, 365, 389, 414 Forward substitution, 406 Forward substitution, blocks, 402 Forward transformation, blocks, 169 Forward transformation, matrices, 285 Forward transformation, semi-uniform matrices, 338, 340 Frobenius error, blockwise, 116 Frobenius error, integral operator, 120 Frobenius error, projection, 200 Frobenius error, total, 117 Frobenius inner product, 166 Frobenius norm, 115, 166, 200 Galerkin’s method, 76, 365, 389, 414 H -matrix, 49 H 2 -matrix, 56 Hierarchical compression, 257 Hierarchical matrix, 49 Holomorphic extension, 151 Householder factorization, 180 HSS-matrix, 59 Indirect boundary integral formulation, 388 Induced block cluster tree, 298, 325 Induced cluster basis, 299, 321, 325 Integral operator, double layer potential, 155 Integral operator, Frobenius error, 120 Integral operator, single layer potential, 155, 388 Integral operator, spectral error, 124 Integral operator, truncation, 198 Integral operator, variable-rank compression, 265 Interior regularity, 366, 367 Interpolation, 88 Interpolation scheme, 93
Interpolation, best approximation property, 94 Jumping coefficient, 418 Lebesgue constant, 93 Locally L-harmonic, 367 LU decomposition, 402 Markov’s inequality, 106 Matrix addition, exact, 300 Matrix addition, projected, 293 Matrix backward transformation, 287 Matrix forward transformation, 285 Matrix inversion, 415 Matrix multiplication, exact, 327 Matrix multiplication, projected, 310 Matrix multiplication, semi-uniform, 340 Nearfield, 37 Operator norm, 120 Orthogonality criterion, 165 Orthogonalization of cluster bases, 180 Overlapping supports, 115 Partial differential equation, 364 Partial interpolation, 98 Polynomial approximation of analytic functions, 95 Preconditioner, 401 Projection into H 2 -matrix spaces, 167 Projection into semi-uniform matrix spaces, 214 Projection of H -matrices, 172 Projection of dense matrices, 168 Quasi-optimality of compression, 233 Rank distribution, 54 Rank distribution, bounded, 64 Rank distribution, maximum, 176 Re-interpolated kernel, 132
Re-interpolation, 131 Re-interpolation cluster basis, 131 Restriction of cluster basis, 220 Schur complement, 415 Semi-uniform hierarchical matrix, 212, 334 Semi-uniform matrix forward transformation, 340 Semi-uniform matrix forward transformation, 338 Separable approximation by Taylor expansion, 81 Separable approximation by tensor interpolation, 90 Sequence of -regular bounding boxes, 141 Sequence of -regular boxes, 140 Sequence of -regular intervals, 135 Single layer potential, 155, 388 Singular value decomposition, 185, 348 Spaces of semi-uniform hierarchical matrices, 213 Spectral error, blockwise, 120 Spectral error, factorized estimate, 124 Spectral error, integral operator, 124 Spectral error, projection, 216 Spectral norm, 120
Spectral norm, total, 121, 122 Stability of re-interpolation, 135 Stable interpolation scheme, 93 Strang’s lemma, 390 Support, 38 Symm’s equation, 388 Tausch/White wavelets, 191 Taylor expansion, 80 Tensor interpolation, 90 Total cluster basis, 218 Total cluster basis, geometric interpretation, 222 Total cluster basis, properties, 218 Total cluster basis, weighted, 261 Transfer matrices, 54 Transfer matrices, long-range, 193 Tree, 29, 30 Truncation of cluster bases, 185 Unification, 248 Variable-order approximation, 125 Variable-order interpolation of the kernel function, 142 Variable-rank compression, integral operator, 265