Spectral Properties of Banded Toeplitz Matrices
Spectral Properties of Banded Toeplitz Matrices

Albrecht Böttcher
Chemnitz University of Technology, Chemnitz, Germany

Sergei M. Grudsky
CINVESTAV del I. P. N., Mexico City, Mexico
and Rostov-on-Don State University, Rostov-on-Don, Russia
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB® is a registered trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software. For MATLAB® product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, [email protected], www.mathworks.com/

Library of Congress Cataloging-in-Publication Data

Böttcher, Albrecht.
  Spectral properties of banded Toeplitz matrices / Albrecht Böttcher, Sergei M. Grudsky.
    p. cm.
  Includes bibliographical references and index.
  ISBN 0-89871-599-7 (pbk.)
  Toeplitz matrices. I. Grudsky, Sergei M., 1955- II. Title.
  QA188.B674 2005
  512.9'434—dc22    2005051608

Partial royalties from the sale of this book are placed in a fund to help students attend SIAM meetings and other SIAM-related activities. This fund is administered by SIAM, and qualified individuals are encouraged to write directly to SIAM for guidelines.
Contents

Preface

1  Infinite Matrices
   1.1  Toeplitz and Hankel Matrices
   1.2  Boundedness
   1.3  Products
   1.4  Wiener-Hopf Factorization
   1.5  Spectra
   1.6  Norms
   1.7  Inverses
   1.8  Eigenvalues and Eigenvectors
   1.9  Selfadjoint Operators

2  Determinants
   2.1  Circulant Matrices
   2.2  Tridiagonal Toeplitz Matrices
   2.3  The Baxter-Schmidt Formula
   2.4  Widom's Formula
   2.5  Trench's Formula
   2.6  Szegö's Strong Limit Theorem
   2.7  The Szegö-Widom Theorem
   2.8  Geronimo, Case, Borodin, Okounkov

3  Stability
   3.1  Strong and Weak Convergence
   3.2  Stable Sequences
   3.3  The Baxter-Gohberg-Feldman Theorem
   3.4  Silbermann Theory
   3.5  Asymptotic Inverses

4  Instability
   4.1  Outside the Essential Spectrum
   4.2  Exponential Growth Is Generic
   4.3  Arbitrarily Fast Growth
   4.4  Sequences Versus Polynomials
   4.5  Symbols with Zeros: Lower Estimates
   4.6  Symbols with Zeros: Upper Estimates
   4.7  Inside the Essential Spectrum
   4.8  Semi-Definite Matrices

5  Norms
   5.1  A Universal Estimate
   5.2  Spectral Norm of Toeplitz Matrices
   5.3  Fejér Means
   5.4  Toeplitz-Like Matrices
   5.5  Exponentially Fast Convergence Is Generic
   5.6  Slow Convergence
   5.7  Summary

6  Condition Numbers
   6.1  Asymptotic Inverses of Toeplitz-Like Matrices
   6.2  The Limit of the Condition Numbers
   6.3  Convergence Speed Estimates
   6.4  Generic and Exceptional Cases
   6.5  Norms of Inverses of Pure Toeplitz Matrices
   6.6  Condition Numbers of Pure Toeplitz Matrices
   6.7  Conclusions

7  Substitutes for the Spectrum
   7.1  Pseudospectra
   7.2  Norm of the Resolvent
   7.3  Limits of Pseudospectra
   7.4  Pseudospectra of Infinite Toeplitz Matrices
   7.5  Numerical Range
   7.6  Collective Perturbations

8  Transient Behavior
   8.1  The General Message
   8.2  Polynomial Numerical Hulls
   8.3  The Pseudospectra Perspective
   8.4  A Triangular Example
   8.5  Gauss-Seidel for Large Toeplitz Matrices
   8.6  Genuinely Finite Results
   8.7  The Sky Region Contains an Angle
   8.8  Oscillations
   8.9  Exponentials

9  Singular Values
   9.1  Approximation Numbers
   9.2  The Splitting Phenomenon
   9.3  Singular Values of Circulant Matrices
   9.4  Extreme Singular Values
   9.5  The Limiting Set
   9.6  The Limiting Measure
   9.7  Proper Clusters
   9.8  Norm of Matrix Times Random Vector
   9.9  The Case of Toeplitz and Circulant Matrices
   9.10 The Nearest Structured Matrix

10  Extreme Eigenvalues
    10.1  Hermitian Matrices
    10.2  First-Order Trace Formulas
    10.3  The Spectral Radius
    10.4  Matrices with Nonnegative Entries

11  Eigenvalue Distribution
    11.1  Toward the Limiting Set
    11.2  Structure of the Limiting Set
    11.3  Toward the Limiting Measure
    11.4  Limiting Set and Limiting Measure
    11.5  Connectedness of the Limiting Set

12  Eigenvectors and Pseudomodes
    12.1  Tridiagonal Circulant and Toeplitz Matrices
    12.2  Eigenvectors of Triangular and Tridiagonal Matrices
    12.3  Asymptotics of Eigenvectors
    12.4  Pseudomodes of Circulant Matrices
    12.5  Pseudomodes of Toeplitz Matrices

13  Structured Perturbations
    13.1  Toeplitz Pseudospectra
    13.2  The Nearest Singular Matrix
    13.3  Structured Normwise Condition Numbers
    13.4  Toeplitz Systems
    13.5  Exact Right-Hand Sides
    13.6  The Condition Number for Matrix Inversion
    13.7  Once More the Nearest Singular Matrix

14  Impurities
    14.1  The Discrete Laplacian
    14.2  An Uncertain Block
    14.3  Emergence of Antennae
    14.4  Behind the Black Hole
    14.5  Can Structured Pseudospectra Jump?

Bibliography

Index
Preface

Toeplitz matrices emerge in plenty of applications and have been extensively studied for about a century. The literature on them is immense and ranges from thousands of articles in periodicals to huge monographs. This does not imply that there is nothing left to say on the topic. On the contrary, Toeplitz matrices are an active field of research with many facets, and the amount of material gathered only in the last decade would easily fill several volumes.

The present book lives within two limitations: it is restricted to banded Toeplitz matrices on the one hand and to the spectral properties of such matrices on the other. As a third limitation, we consider large matrices only, and most of the results are actually asymptotics. When speaking of banded Toeplitz matrices, we have in mind an n × n Toeplitz matrix of bandwidth 2r + 1, and we silently assume that n is large in comparison with 2r + 1.

A Toeplitz matrix is completely specified by the (complex) numbers that constitute its first row and its first column. The function on the complex unit circle whose Fourier coefficients are just these numbers is referred to as the symbol of the matrix. In the case of Toeplitz band matrices, the symbol is a Laurent polynomial. Thus, we need not struggle with piecewise continuous or oscillating symbols, which arise in many applications, but "only" with Laurent polynomials. This circumstance simplifies part of the investigation. On the other hand, Laurent polynomials raise questions that are different from those one encounters in connection with more general symbols. In the end, Toeplitz band matrices form their own realm in the world of Toeplitz matrices.

We understand spectral properties in a broad sense. Of course, we study such problems as the evolution of the eigenvalues of banded n × n Toeplitz matrices as n goes to infinity. The pioneering result in this direction was already proved by Schmidt and Spitzer in 1960, and every worker in the field has a personal copy of the Schmidt/Spitzer paper. Here we give a full proof of this result for the first time in the monographical literature. This proof is Schmidt and Spitzer's original proof with several simplifications and improvements introduced by Hirschman and Widom.

We regard the singular values of a matrix as its most important spectral characteristics after the eigenvalues and pseudoeigenvalues; hence, we pay due attention to the asymptotic behavior of the singular values as the matrix dimension increases to infinity. Clearly, questions about the norm, the norm of the inverse, and the condition numbers of a matrix are questions about the extreme singular values. Normal Toeplitz matrices raise specific problems, and these will be discussed. However, a Toeplitz matrix is typically nonnormal; hence, pseudospectra tell us more about Toeplitz matrices than spectra. Accordingly, we embark on pseudospectra of Toeplitz matrices and on related issues, such as the transient behavior of powers of large Toeplitz matrices.
Finally, the book contains some very recent results on the spectral behavior of Toeplitz matrices under certain structured perturbations. These results are far from what one wants to know about Toeplitz matrices with randomly perturbed main diagonal, but they are beautiful, they point in a good direction, and they have the potential to stimulate further research.

As already stated, the majority of the results describe the asymptotic behavior as the matrix dimension n goes to infinity. Many questions considered here can be easily answered by a few MATLAB commands if the matrix dimension is moderate, say in the low hundreds. We try to deliver answers in the case where n is really large and the computer quits. Some of the results are equipped with estimates of the convergence speed, which provides the user at least with a vague feeling as to whether one can invoke the result for n in the hundreds. And, most importantly, several problems of this book are motivated by applications in statistical physics, where n is around $10^8$, the cube root of the Avogadro number $10^{23}$, and, for such astronomical values of n, asymptotic formulas are the only chance to describe and to understand something.

In summary, the book provides several pieces of information about the eigenvalues, singular values, determinants, norms, norms of inverses, (unstructured and structured) condition numbers, (unstructured and structured) pseudospectra, transient behavior, eigenvectors and pseudomodes, and spectral phenomena caused by perturbations of large Toeplitz band matrices. The selection of the material represents our taste and is to some extent determined by subjects we have worked on ourselves and about which we think we can tell the community something. Naturally, numerous problems are left open. Moreover, various important topics, such as fast inversion of Toeplitz matrices or fast solution of Toeplitz systems, are not touched at all. These topics are the business of other books (see, for example, [157] and [177]). However, the material of the present book is certainly useful and in many cases even indispensable when dealing with such practical problems as the effective solution of a large banded Toeplitz system.

The book is intended as an introductory text to some advanced topics. We assume that the reader is familiar with the basics of real and complex analysis, linear algebra, and functional analysis. Almost all results are accompanied by full proofs.

A baby version of this book was published under the title Toeplitz Matrices, Asymptotic Linear Algebra, and Functional Analysis by Hindustan Book Agency, New Delhi, and Birkhäuser, Basel, in 2000.

S. M. Grudsky thankfully acknowledges financial support by CONACYT grant N 40564-F (México).

We sincerely thank our wives, Sylvia Böttcher and Olga Grudskaya, for their usual patient and excellent work on the LaTeX masters and on part of the illustrations. We are also greatly indebted to Mark Embree for valuable remarks on a draft of this book and to Linda Thiel and the staff of SIAM for their help with publishing the book.

Tragically, Olga Grudskaya died in a car accident in February 2004. We have lost an exceptional woman, a wonderful friend, and an irreplaceable colleague. Her early death leaves an emptiness that can never be filled. In late 2003, she began working on the illustrations for this book with great enthusiasm. She could not accomplish her visions. We were left with the drafts of her illustrations and included some of them.
They provide us with an idea of the beauty that would have emerged had she been able to complete her work. May this book keep the memory of our irreplaceable Olga.

Chemnitz and Mexico City, spring 2005
The authors
Chapter 1
Infinite Matrices
When studying large finite matrices, it is natural to look also at their infinite counterparts. The spectral phenomena of the latter are sometimes easier to understand than those of the former. The question whether properties of infinite Toeplitz matrices mimic the corresponding properties of their large finite sections is very delicate and is, in a sense, the topic of this book. We regard infinite Toeplitz matrices as operators on $\ell^p$. This chapter is concerned with some basic properties of these operators, including boundedness, norms, invertibility and inverses, spectrum, eigenvalues, and eigenvectors. Wiener-Hopf factorization provides us with a fairly effective tool for the inversion of infinite (but not of finite) Toeplitz matrices. We also embark on some of the problems that are specific for selfadjoint operators.
1.1 Toeplitz and Hankel Matrices

An infinite Toeplitz matrix is a matrix of the form
$$
(a_{j-k})_{j,k=0}^{\infty} = \begin{pmatrix} a_0 & a_{-1} & a_{-2} & \ldots \\ a_1 & a_0 & a_{-1} & \ldots \\ a_2 & a_1 & a_0 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}. \qquad (1.1)
$$
Such matrices are characterized by the property of being constant along the parallels to the main diagonal. Clearly, the matrix (1.1) is completely determined by its entries in the first row and first column, that is, by the sequence
$$
\{a_k\}_{k=-\infty}^{\infty} = \{\ldots, a_{-2}, a_{-1}, a_0, a_1, a_2, \ldots\}. \qquad (1.2)
$$
Throughout this book we assume that the $a_k$'s are complex numbers. The matrix (1.1) is a band matrix if and only if at most finitely many of the numbers in (1.2) are nonzero. Although our subject is Toeplitz band matrices, it is also necessary to study Toeplitz matrices which are not band matrices. For example, the inverse of the band
matrix
$$
\begin{pmatrix}
1 & -\tfrac{1}{2} & 0 & 0 & \ldots \\
0 & 1 & -\tfrac{1}{2} & 0 & \ldots \\
0 & 0 & 1 & -\tfrac{1}{2} & \ldots \\
0 & 0 & 0 & 1 & \ldots \\
\ldots & \ldots & \ldots & \ldots & \ldots
\end{pmatrix}
$$
is the Toeplitz matrix
$$
\begin{pmatrix}
1 & \tfrac{1}{2} & \tfrac{1}{2^2} & \tfrac{1}{2^3} & \ldots \\
0 & 1 & \tfrac{1}{2} & \tfrac{1}{2^2} & \ldots \\
0 & 0 & 1 & \tfrac{1}{2} & \ldots \\
0 & 0 & 0 & 1 & \ldots \\
\ldots & \ldots & \ldots & \ldots & \ldots
\end{pmatrix},
$$
and this is not a band matrix.

There is another type of matrix that arises when working with Toeplitz matrices. These are the Hankel matrices. An infinite Hankel matrix has the form
$$
(a_{j+k+1})_{j,k=0}^{\infty} = \begin{pmatrix} a_1 & a_2 & a_3 & \ldots \\ a_2 & a_3 & \ldots & \ldots \\ a_3 & \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}. \qquad (1.3)
$$
Notice that (1.3) is completely given by only the numbers with positive indices in (1.2). Obviously, if the sequence (1.2) has finite support, then the matrix (1.3) contains only finitely many nonzero entries.
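Finite sections of such matrices are easy to generate in software. The following is a minimal Python sketch (an illustration assuming NumPy and SciPy are available, not taken from the text) that builds truncations of the two matrices above and of a small Hankel matrix from a coefficient sequence.

```python
# Illustration only (not from the text): finite sections of T(a) and H(a).
# scipy.linalg.toeplitz(c, r) takes the first column c and the first row r;
# scipy.linalg.hankel(c, r) takes the first column c and the last row r.
import numpy as np
from scipy.linalg import toeplitz, hankel

n = 8
# Band symbol a(t) = 1 - t^{-1}/2, i.e. a_0 = 1, a_{-1} = -1/2, all other a_k = 0.
first_col = np.r_[1.0, np.zeros(n - 1)]        # a_0, a_1, a_2, ... down the first column
first_row = np.r_[1.0, -0.5, np.zeros(n - 2)]  # a_0, a_{-1}, a_{-2}, ... along the first row
Tn = toeplitz(first_col, first_row)

# The inverse symbol has coefficients (1/2)^k along the first row, as in the example above.
Tn_inv = toeplitz(np.r_[1.0, np.zeros(n - 1)], 0.5 ** np.arange(n))

# For these triangular matrices the finite sections invert each other exactly.
assert np.allclose(Tn @ Tn_inv, np.eye(n))

# A Hankel section built from a_1 = a_2 = 1 and a_k = 0 otherwise:
Hn = hankel(np.r_[1.0, 1.0, np.zeros(n - 2)], np.zeros(n))
print(Hn[:3, :3])
```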
1.2 Boundedness
The Wiener algebra. Let $T := \{t \in C : |t| = 1\}$ be the complex unit circle. The Wiener algebra $W$ is defined as the set of all functions $a: T \to C$ with absolutely convergent Fourier series, that is, as the collection of all functions $a: T \to C$ which can be represented in the form
$$
a(t) = \sum_{n=-\infty}^{\infty} a_n t^n \quad (t \in T) \qquad \text{with} \qquad \|a\|_W := \sum_{n=-\infty}^{\infty} |a_n| < \infty. \qquad (1.4)
$$
Notice that instead of (1.4) we could also write
$$
a(e^{i\theta}) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta} \quad (e^{i\theta} \in T) \qquad \text{with} \qquad \|a\|_W := \sum_{n=-\infty}^{\infty} |a_n| < \infty. \qquad (1.5)
$$
The numbers $a_n$ are the Fourier coefficients of $a$, and they can be computed by the formula
$$
a_n = \frac{1}{2\pi} \int_0^{2\pi} a(e^{i\theta})\, e^{-in\theta}\, d\theta. \qquad (1.6)
$$
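For a Laurent polynomial the integral (1.6) can be evaluated exactly by sampling; the following sketch (Python with NumPy assumed, function name ad hoc) recovers the coefficients of a banded symbol with the FFT.

```python
# Illustration only (not from the text): formula (1.6) discretized by the FFT.
import numpy as np

def fourier_coefficients(a, n_max, num_points=512):
    """Approximate a_n for |n| <= n_max by the trapezoidal rule applied to (1.6)."""
    theta = 2 * np.pi * np.arange(num_points) / num_points
    values = a(np.exp(1j * theta))
    c = np.fft.fft(values) / num_points   # c[k] approximates a_k, indices modulo num_points
    return {n: c[n % num_points] for n in range(-n_max, n_max + 1)}

# Example symbol: a(t) = t^{-1} + 2 + 3t, a Laurent polynomial (banded symbol).
a = lambda t: 1 / t + 2 + 3 * t
coeffs = fourier_coefficients(a, n_max=2)
print({n: np.round(c, 10) for n, c in coeffs.items()})
# Expected: a_{-1} = 1, a_0 = 2, a_1 = 3, the remaining coefficients numerically zero.
```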
Sometimes it will be convenient to identify a function $a: T \to C$ with the function $\theta \mapsto a(e^{i\theta})$; the latter function may be thought of as being given on $[0, 2\pi)$, $(-\pi, \pi]$, or even on all of the real line $R$. Clearly, functions in $W$ are continuous on $T$ and, when regarded as functions on $R$, they are $2\pi$-periodic continuous functions.

Now let $a \in W$ and let $\{a_n\}_{n=-\infty}^{\infty}$ be the sequence of the Fourier coefficients of $a$. We denote by $T(a)$ and $H(a)$ the matrices (1.1) and (1.3), respectively:
$$
T(a) := \begin{pmatrix} a_0 & a_{-1} & a_{-2} & \ldots \\ a_1 & a_0 & a_{-1} & \ldots \\ a_2 & a_1 & a_0 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}, \qquad
H(a) := \begin{pmatrix} a_1 & a_2 & a_3 & \ldots \\ a_2 & a_3 & \ldots & \ldots \\ a_3 & \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}.
$$
The matrix $T(a)$ is called the infinite Toeplitz matrix generated by $a$, while $a$ is referred to as the symbol of the matrix $T(a)$. Note that if $\sum_{n=-\infty}^{\infty} |a_n| < \infty$, then there is exactly one $a \in W$ such that (1.6) holds for all $n$. On the other hand, although $H(a)$ is uniquely determined by $a$, it is only the numbers $a_n$ with $n \ge 1$ that can be recovered from the matrix $H(a)$. In other words: $H(a) = H(b)$ if and only if $a_n = b_n$ for all $n \ge 1$.

Infinite matrices as operators. We let $\ell^p := \ell^p(Z_+)$ $(1 \le p \le \infty)$ stand for the usual Banach spaces of complex-valued sequences $\{x_n\}_{n=0}^{\infty}$: for $1 \le p < \infty$,
$$
x = \{x_n\}_{n=0}^{\infty} \in \ell^p \iff \|x\|_p^p := \sum_{n=0}^{\infty} |x_n|^p < \infty,
$$
and for $p = \infty$,
$$
x = \{x_n\}_{n=0}^{\infty} \in \ell^\infty \iff \|x\|_\infty := \sup_{n \ge 0} |x_n| < \infty.
$$
An infinite matrix $A = (a_{jk})_{j,k=0}^{\infty}$ is said to induce a bounded operator on $\ell^p$ if there is a constant $M \in (0, \infty)$ such that for every $x = \{x_n\}_{n=0}^{\infty} \in \ell^p$ the inequality
$$
\sum_{j=0}^{\infty} \Bigl| \sum_{k=0}^{\infty} a_{jk} x_k \Bigr|^p \le M^p \sum_{k=0}^{\infty} |x_k|^p \qquad (1.7)
$$
holds; we remark that (1.7) includes the requirement that the series
$$
y_j = \sum_{k=0}^{\infty} a_{jk} x_k \quad (j \ge 0) \qquad \text{and} \qquad \sum_{j=0}^{\infty} |y_j|^p
$$
are convergent. If $A = (a_{jk})_{j,k=0}^{\infty}$ induces a bounded operator on $\ell^p$, we can simply think of $A$ as being a bounded operator on $\ell^p$ which, after writing the elements of $\ell^p$ as column vectors, acts by the rule
$$
y = Ax \quad \text{with} \quad
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \end{pmatrix} =
\begin{pmatrix} a_{00} & a_{01} & a_{02} & \ldots \\ a_{10} & a_{11} & a_{12} & \ldots \\ a_{20} & a_{21} & a_{22} & \ldots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{pmatrix}.
$$
If $A$ induces a bounded operator on $\ell^p$, then there is a smallest $M$ for which (1.7) is true for all $x \in \ell^p$. This number $M$ is the norm of $A$, and it is denoted by $\|A\|_p$:
$$
\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \sup_{\|x\|_p = 1} \|Ax\|_p.
$$
If $A$ does not induce a bounded operator on $\ell^p$, we put $\|A\|_p = \infty$.

Let $Z$ be the set of all integers. For $n \in Z$, define $\chi_n \in W$ by $\chi_n(t) = t^n$ $(t \in T)$. The matrix $T(\chi_n)$ has units on a single parallel to the main diagonal and zeros elsewhere. Obviously, for $n \ge 0$,
$$
T(\chi_n)x = \{\underbrace{0, \ldots, 0}_{n}, x_0, x_1, \ldots\}, \qquad T(\chi_{-n})x = \{x_n, x_{n+1}, \ldots\}. \qquad (1.8)
$$
Similarly, $H(\chi_n)$ is the zero matrix for $n \le 0$ and is a matrix with units on a single "antidiagonal" and zeros elsewhere for $n \ge 1$:
$$
H(\chi_n)x = \{x_{n-1}, x_{n-2}, \ldots, x_0, 0, 0, \ldots\} \quad \text{for } n \ge 1. \qquad (1.9)
$$
Proposition 1.1. If $a \in W$, then $T(a)$ and $H(a)$ induce bounded operators on the space $\ell^p$ $(1 \le p \le \infty)$ and $\|T(a)\|_p \le \|a\|_W$, $\|H(a)\|_p \le \|a\|_W$.

Proof. If $a$ is given by (1.4), then
$$
T(a) = \sum_{n=-\infty}^{\infty} a_n T(\chi_n), \qquad H(a) = \sum_{n=1}^{\infty} a_n H(\chi_n),
$$
and from (1.8) and (1.9) we infer that $\|T(\chi_n)\|_p = 1$ for all $n$ and $\|H(\chi_n)\|_p = 1$ for $n \ge 1$, whence
$$
\|T(a)\|_p \le \sum_{n=-\infty}^{\infty} |a_n|, \qquad \|H(a)\|_p \le \sum_{n=1}^{\infty} |a_n|.
$$
By virtue of Proposition 1.1, we can regard $T(a)$ and $H(a)$ as bounded linear operators on $\ell^p$. For Hankel operators, we can say even much more.

Proposition 1.2. If $a \in W$, then $H(a)$ is compact on $\ell^p$ $(1 \le p \le \infty)$.

Proof. Write $a$ in the form (1.4) and put
$$
(S_N a)(t) := \sum_{n=-N}^{N} a_n t^n \quad (t \in T).
$$
The operator $H(S_N a)$ is given by the matrix
$$
H(S_N a) = \begin{pmatrix}
a_1 & \ldots & a_N & 0 & \ldots \\
\vdots & & \vdots & \vdots & \\
a_N & \ldots & 0 & 0 & \ldots \\
0 & \ldots & 0 & 0 & \ldots \\
\vdots & & \vdots & \vdots &
\end{pmatrix}
$$
and is therefore a finite rank operator. From Proposition 1.1 we infer that
$$
\|H(a) - H(S_N a)\|_p = \|H(a - S_N a)\|_p \le \|a - S_N a\|_W = \sum_{|n| \ge N} |a_n| = o(1) \quad \text{as } N \to \infty.
$$
Therefore, $H(a)$ is a uniform limit of finite rank operators. This implies that $H(a)$ is compact.
1.3 Products
It is easily seen that $W$ is a Banach algebra with pointwise algebraic operations and the norm $\|\cdot\|_W$, i.e., $(W, \|\cdot\|_W)$ is a Banach space, and if $a, b \in W$, then $ab \in W$ and $\|ab\|_W \le \|a\|_W \|b\|_W$.

Given $a \in W$, we define the function $\tilde{a}$ by $\tilde{a}(t) := a(1/t)$ $(t \in T)$. Clearly, $\tilde{a}$ also belongs to $W$. Since
$$
a(t) = \sum a_n t^n \;\Longrightarrow\; \tilde{a}(t) = \sum a_{-n} t^n,
$$
we see that $T(\tilde{a})$ and $H(\tilde{a})$ are the matrices
$$
T(\tilde{a}) = \begin{pmatrix} a_0 & a_1 & a_2 & \ldots \\ a_{-1} & a_0 & a_1 & \ldots \\ a_{-2} & a_{-1} & a_0 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}, \qquad
H(\tilde{a}) = \begin{pmatrix} a_{-1} & a_{-2} & a_{-3} & \ldots \\ a_{-2} & a_{-3} & \ldots & \ldots \\ a_{-3} & \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}.
$$
Thus, $T(\tilde{a})$ is simply the transpose of $T(a)$, but $H(\tilde{a})$ has nothing to do with $H(a)$.

Proposition 1.3. If $a, b \in W$, then $T(ab) = T(a)T(b) + H(a)H(\tilde{b})$.

Proof. The $j,k$ entry of $T(ab)$ is
$$
(ab)_{j-k} = \sum_{m+n=j-k} a_m b_n = \sum_{\ell=-\infty}^{\infty} a_{j+\ell}\, b_{-k-\ell},
$$
the $j,k$ entry of $T(a)T(b)$ equals
$$
(a_j \;\; a_{j-1} \;\; \ldots) \begin{pmatrix} b_{-k} \\ b_{-k+1} \\ \vdots \end{pmatrix} = \sum_{\ell=-\infty}^{0} a_{j+\ell}\, b_{-k-\ell},
$$
and the $j,k$ entry of $H(a)H(\tilde{b})$ is equal to
$$
(a_{j+1} \;\; a_{j+2} \;\; \ldots) \begin{pmatrix} b_{-k-1} \\ b_{-k-2} \\ \vdots \end{pmatrix} = \sum_{\ell=1}^{\infty} a_{j+\ell}\, b_{-k-\ell}.
$$

Moral: The product of two infinite Toeplitz matrices is in general not a Toeplitz matrix, but it is always a Toeplitz matrix minus the product of two Hankel matrices. The previous proposition indicates the role played by Hankel matrices in the theory of Toeplitz matrices.

We now introduce two important subalgebras $W_+$ and $W_-$ of $W$. Let $W_+$ and $W_-$ stand for the set of all functions $a \in W$ which are of the form
$$
a(t) = \sum_{n=0}^{\infty} a_n t^n \quad (t \in T) \qquad \text{and} \qquad a(t) = \sum_{n=-\infty}^{0} a_n t^n \quad (t \in T),
$$
respectively. Equivalently, for $a \in W$ we have
$$
a \in W_+ \iff H(\tilde{a}) = 0 \iff T(a) \text{ is lower-triangular},
$$
$$
a \in W_- \iff H(a) = 0 \iff T(a) \text{ is upper-triangular}.
$$

Proposition 1.4. If $a_- \in W_-$, $b \in W$, $a_+ \in W_+$, then $T(a_- b a_+) = T(a_-)T(b)T(a_+)$.

Proof. Since $H(a_-) = H(\tilde{a}_+) = 0$, we deduce from Proposition 1.3 that
$$
T(a_- b a_+) = T(a_-)T(b a_+) + H(a_-)H(\widetilde{b a_+}) = T(a_-)T(b a_+)
= T(a_-)T(b)T(a_+) + T(a_-)H(b)H(\tilde{a}_+) = T(a_-)T(b)T(a_+).
$$
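Proposition 1.3 is easy to test numerically for banded symbols, since every entry of $T(ab)$, $T(a)T(b)$, and $H(a)H(\tilde{b})$ then involves only finitely many terms. The following Python sketch (NumPy/SciPy assumed; the helper names are ad hoc) compares the upper-left corners computed from sufficiently large finite sections.

```python
# Illustration only (not from the text): checking T(ab) = T(a)T(b) + H(a)H(b~)
# entrywise for banded symbols via large finite sections.
import numpy as np
from scipy.linalg import toeplitz, hankel

def T(coeffs, n):   # n x n section of T(a); coeffs maps k -> a_k
    return toeplitz([coeffs.get(j, 0.0) for j in range(n)],
                    [coeffs.get(-j, 0.0) for j in range(n)])

def H(coeffs, n):   # n x n section of H(a), entries a_{j+k+1}
    return hankel([coeffs.get(j + 1, 0.0) for j in range(n)],
                  [coeffs.get(n + j, 0.0) for j in range(n)])

def mult(a, b):     # Fourier coefficients of the product ab
    c = {}
    for m, am in a.items():
        for k, bk in b.items():
            c[m + k] = c.get(m + k, 0.0) + am * bk
    return c

a = {-1: 2.0, 0: 1.0, 2: -3.0}
b = {-2: 1.0, 1: 4.0}
bt = {-k: v for k, v in b.items()}              # b~(t) := b(1/t)

n, m = 40, 10                                   # big section n, corner m to compare
lhs = T(mult(a, b), n)[:m, :m]
rhs = (T(a, n) @ T(b, n) + H(a, n) @ H(bt, n))[:m, :m]
print(np.max(np.abs(lhs - rhs)))                # zero up to rounding
```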
1.4 Wiener-Hopf Factorization

In what follows, we have to work with a few important subsets of the Wiener algebra: $GW$, $\exp W$, $GW_\pm$, $\exp W_\pm$.

Wiener's theorem. We let $GW$ stand for the group of the invertible elements of the algebra $W$. Thus, $a \in GW$ if and only if $a \in W$ and if there is a $b \in W$ such that $a(t)b(t) = 1$ for all $t \in T$. Clearly, a function $a \in GW$ cannot have zeros on $T$. The following famous theorem by Wiener says that the converse is also true.

Theorem 1.5. $GW = \{a \in W : a(t) \neq 0 \text{ for all } t \in T\}$.

The winding number. The set $\exp W$ is defined as the collection of all $a \in W$ which have a logarithm in $W$, that is, which are of the form $a = e^b$ with $b \in W$. To characterize $\exp W$, we need the notion of the winding number. If $a: T \to C \setminus \{0\}$ is a continuous function, then $a(t)$ traces out a continuous and closed curve in $C \setminus \{0\}$ as $t$ moves once around the
counterclockwise oriented unit circle. The number of times this curve surrounds the origin counterclockwise is called the winding number of $a$ and is denoted by $\operatorname{wind} a$. Another (equivalent) definition is as follows. Every continuous function $a: T \to C \setminus \{0\}$ can be written in the form $a(e^{i\theta}) = |a(e^{i\theta})|\, e^{ic(\theta)}$ $(e^{i\theta} \in T)$, where $c: [0, 2\pi) \to R$ is a continuous function. The number
$$
\frac{1}{2\pi}\bigl(c(2\pi - 0) - c(0 + 0)\bigr)
$$
is an integer which is independent of the particular choice of $c$. This integer is $\operatorname{wind} a$.

Theorem 1.6. $\exp W = \{a \in GW : \operatorname{wind} a = 0\}$.

Analytic Wiener functions. In Section 1.3, we introduced the algebras $W_\pm$. We denote by $GW_\pm$ the functions $a_\pm \in W_\pm$ for which there exist $b_\pm \in W_\pm$ such that $a_\pm(t) b_\pm(t) = 1$ for all $t \in T$, and we let $\exp W_\pm$ stand for the functions $a_\pm \in GW_\pm$ which can be represented in the form $a_\pm = e^{b_\pm}$ with $b_\pm \in W_\pm$. Notice that $GW_\pm$ is a proper subset of $W_\pm \cap GW$: for example, if $a_+ \in W_+$ is given by $a_+(t) = t$, then $1/a_+(t) = t^{-1}$ is a function in $W_-$.

Let $D := \{z \in C : |z| < 1\}$ be the open unit disk. Every function $a_+ \in W_+$ can be extended to an analytic function in $D$ by the formula
$$
a_+(z) = \sum_{n=0}^{\infty} a_n z^n \quad (z \in D),
$$
where $\{a_n\}_{n=0}^{\infty}$ is the sequence of the Fourier coefficients of $a_+$. Analogously, every function $a_- \in W_-$ admits analytic continuation to $\{z \in C : |z| > 1\} \cup \{\infty\}$ via
$$
a_-(z) = \sum_{n=0}^{\infty} a_{-n} z^{-n} \quad (1 < |z| \le \infty).
$$

Theorem 1.7. We have
$$
GW_+ = \{a \in W_+ : a(z) \neq 0 \text{ for all } |z| \le 1\}, \qquad
GW_- = \{a \in W_- : a(z) \neq 0 \text{ for all } |z| \ge 1 \text{ and for } z = \infty\},
$$
$$
\exp W_+ = GW_+, \qquad \exp W_- = GW_-.
$$
Theorems 1.5 to 1.7 are standard results of the theory of commutative Banach algebras and are essentially equivalent to the facts that the maximal ideal spaces of $W$, $W_+$, $W_-$ are $T$, $D \cup T$, $(C \cup \{\infty\}) \setminus D$, respectively.

Theorem 1.8 (Wiener-Hopf factorization). Let $a \in W$ and suppose that $a(t) \neq 0$ for all $t \in T$ and that $\operatorname{wind} a = m$. Then $a$ can be written in the form $a(t) = a_-(t)\, t^m a_+(t)$ $(t \in T)$ with $a_\pm \in GW_\pm$.

Proof. Recall that $\chi_m(t) = t^m$. We have $\operatorname{wind}(a\chi_{-m}) = \operatorname{wind} a + \operatorname{wind} \chi_{-m} = m - m = 0$, whence $a\chi_{-m} = e^b$ with some $b \in W$ by Theorems 1.5 and 1.6. Let
$$
b(t) = \sum_{n=-\infty}^{\infty} b_n t^n \quad (t \in T)
$$
and put
$$
b_-(t) = \sum_{n=-\infty}^{-1} b_n t^n, \qquad b_+(t) = \sum_{n=0}^{\infty} b_n t^n.
$$
It is obvious that $e^{b_\pm} \in GW_\pm$. The representation $a = e^{b_-} \chi_m e^{b_+}$ is the desired factorization.
Laurent polynomials. These are the functions in the Wiener algebra with only finitely many nonzero Fourier coefficients. Thus, $b: T \to C$ is a Laurent polynomial if and only if $b$ is of the form
$$
b(t) = \sum_{j=-r}^{s} b_j t^j \quad (t \in T), \qquad (1.10)
$$
where $r$ and $s$ are integers and $-r \le s$. We denote the set of all Laurent polynomials by $\mathcal{P}$ and we write $\mathcal{P}_{r,s}$ for the Laurent polynomials of the form (1.10). We also put $\mathcal{P}_s := \mathcal{P}_{s,s}$. Finally, we let $\mathcal{P}_s^+ := \mathcal{P}_{0,s-1}$ stand for the analytic polynomials of degree at most $s-1$ and we set $\mathcal{P}^+ = \cup_{s \ge 1} \mathcal{P}_s^+$.

Let us assume that $b \in \mathcal{P}_{r,s}$ is not identically zero and that $b_{-r} \neq 0$ and $b_s \neq 0$. We can write
$$
b(t) = t^{-r}(b_{-r} + b_{-r+1} t + \cdots + b_s t^{r+s}).
$$
If $b(t) \neq 0$ for $t \in T$, we further have
$$
b(t) = t^{-r} b_s \prod_{j=1}^{J} (t - \delta_j) \prod_{k=1}^{K} (t - \mu_k), \qquad (1.11)
$$
where $|\delta_j| < 1$ for all $j$ and $|\mu_k| > 1$ for all $k$. Obviously, $\operatorname{wind}(t - \delta_j) = 1$ and $\operatorname{wind}(t - \mu_k) = 0$, whence
$$
\operatorname{wind} b = J - r; \qquad (1.12)
$$
that is, $\operatorname{wind} b$ is the number of zeros of $b$ in $D$ minus the number of poles of $b$ in $D$ (all counted according to their multiplicity). The factorization
$$
b(t) = b_s \prod_{j=1}^{J} \Bigl(1 - \frac{\delta_j}{t}\Bigr)\, t^{J-r} \prod_{k=1}^{K} (t - \mu_k) \qquad (1.13)
$$
is a Wiener-Hopf factorization; notice that
$$
\Bigl(1 - \frac{\delta_j}{t}\Bigr)^{-1} = 1 + \frac{\delta_j}{t} + \frac{\delta_j^2}{t^2} + \cdots \quad (t \in T) \qquad (1.14)
$$
and
$$
(t - \mu_k)^{-1} = -\frac{1}{\mu_k}\Bigl(1 + \frac{t}{\mu_k} + \frac{t^2}{\mu_k^2} + \cdots\Bigr) \quad (t \in T) \qquad (1.15)
$$
are functions in $W_-$ and $W_+$, respectively.
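For a concrete Laurent polynomial, the numbers $\delta_j$, $\mu_k$, and $\operatorname{wind} b$ in (1.11)-(1.13) can be obtained from the zeros of $t^r b(t)$. A small Python sketch (NumPy assumed, function name ad hoc):

```python
# Illustration only (not from the text): the data entering (1.11)-(1.13) for
# b(t) = sum_{j=-r}^{s} b_j t^j, read off from the roots of t^r b(t).
import numpy as np

def wiener_hopf_data(coeffs, r):
    """coeffs = [b_{-r}, ..., b_s]; returns (deltas, mus, wind b) as in (1.11), (1.12)."""
    roots = np.roots(coeffs[::-1])          # np.roots wants highest power first
    if np.any(np.isclose(np.abs(roots), 1.0)):
        raise ValueError("b has zeros on the unit circle; (1.11) does not apply")
    deltas = roots[np.abs(roots) < 1]       # zeros delta_j inside the unit disk
    mus = roots[np.abs(roots) > 1]          # zeros mu_k outside the closed unit disk
    return deltas, mus, len(deltas) - r     # wind b = J - r by (1.12)

# Example: b(t) = t^{-1} - 5/2 + t has zeros 1/2 and 2, so wind b = 1 - 1 = 0.
deltas, mus, wind = wiener_hopf_data([1.0, -2.5, 1.0], r=1)
print(deltas, mus, wind)                    # approximately [0.5], [2.0], 0
```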
1.5 Spectra
Fredholm operators. Let $X$ be a Banach space. We denote by $\mathcal{B}(X)$ and $\mathcal{K}(X)$ the bounded and compact linear operators on $X$, respectively. The spectrum of an operator $A \in \mathcal{B}(X)$ is the set
$$
\operatorname{sp} A = \{\lambda \in C : A - \lambda I \text{ is not invertible}\}.
$$
The operator valued function $C \setminus \operatorname{sp} A \to \mathcal{B}(X)$, $\lambda \mapsto (A - \lambda I)^{-1}$ is well defined and analytic. It is called the resolvent of $A$. An operator $A \in \mathcal{B}(X)$ is said to be Fredholm if it is invertible modulo compact operators, that is, if there is an operator $B \in \mathcal{B}(X)$ such that $AB - I$ and $BA - I$ are compact. We define the essential spectrum of $A \in \mathcal{B}(X)$ as the set
$$
\operatorname{sp}_{\mathrm{ess}} A = \{\lambda \in C : A - \lambda I \text{ is not Fredholm}\}.
$$
Clearly, $\operatorname{sp}_{\mathrm{ess}} A \subset \operatorname{sp} A$ and $\operatorname{sp}_{\mathrm{ess}} A$ is invariant under compact perturbations. The kernel and the image (= range) of $A \in \mathcal{B}(X)$ are defined as usual: $\operatorname{Ker} A = \{x \in X : Ax = 0\}$, $\operatorname{Im} A := A(X)$. An operator $A \in \mathcal{B}(X)$ is said to be normally solvable if $\operatorname{Im} A$ is a closed subspace of $X$. In that case the cokernel of $A$ is $\operatorname{Coker} A = X/\operatorname{Im} A$. One can show that $A \in \mathcal{B}(X)$ is Fredholm if and only if $A$ is normally solvable and both $\operatorname{Ker} A$ and $\operatorname{Coker} A$ have finite dimensions. The index of a Fredholm operator $A \in \mathcal{B}(X)$ is the integer $\operatorname{Ind} A := \dim \operatorname{Ker} A - \dim \operatorname{Coker} A$.

Theorem 1.9. Let $a \in W$. The operator $T(a)$ is Fredholm on $\ell^p$ $(1 \le p \le \infty)$ if and only if $a(t) \neq 0$ for all $t \in T$. In that case $\operatorname{Ind} T(a) = -\operatorname{wind} a$.

Proof. If $a$ has no zeros on $T$ and if the winding number of $a$ is $m$, then $a = a_- \chi_m a_+$ with $a_\pm \in GW_\pm$ by virtue of Theorem 1.5. From Proposition 1.4 we infer that $T(a) = T(a_-) T(\chi_m) T(a_+)$, and the same proposition tells us that $T(a_\pm)$ are invertible, the inverses being $T(a_\pm^{-1})$. From (1.8) we see that $T(\chi_m)$ has closed range and that
$$
\dim \operatorname{Ker} T(\chi_m) = \begin{cases} 0 & \text{if } m \ge 0, \\ |m| & \text{if } m < 0, \end{cases} \qquad
\dim \operatorname{Coker} T(\chi_m) = \begin{cases} m & \text{if } m \ge 0, \\ 0 & \text{if } m < 0, \end{cases}
$$
which implies that $T(\chi_m)$ is Fredholm of index $-m$. Consequently, $T(a)$ is also Fredholm of index $-m$.

Conversely, suppose now that $T(a)$ is Fredholm and let $m$ be the index. Contrary to what we want, let us assume that $a(t_0) = 0$ for some $t_0 \in T$. We can then find $b, c \in GW$
such that $\|a - b\|_W$ and $\|a - c\|_W$ are as small as desired and such that $|\operatorname{wind} b - \operatorname{wind} c| = 1$. Since Fredholmness and index are stable under small perturbations, it follows that $T(b)$ and $T(c)$ are Fredholm and that $\operatorname{Ind} T(b) = \operatorname{Ind} T(c) = m$. However, from what was proved in the preceding paragraph and from the equality $|\operatorname{wind} b - \operatorname{wind} c| = 1$ we know that $|\operatorname{Ind} T(b) - \operatorname{Ind} T(c)| = 1$. This contradiction shows that $a$ cannot have zeros on $T$.

Corollary 1.10. If $a \in W$, then $\operatorname{sp}_{\mathrm{ess}} T(a) = a(T)$.

Proof. Apply Theorem 1.9 to $a - \lambda$.

Corollary 1.11. Let $a \in W$. The operator $T(a)$ is invertible on $\ell^p$ $(1 \le p \le \infty)$ if and only if $a(t) \neq 0$ for all $t \in T$ and $\operatorname{wind} a = 0$.

Proof. If $T(a)$ is invertible, then $T(a)$ is Fredholm of index zero and Theorem 1.9 shows that $a$ has no zeros on $T$ and that $\operatorname{wind} a = 0$. If $a(t) \neq 0$ for $t \in T$ and $\operatorname{wind} a = 0$, then $a = a_- a_+$ with $a_\pm \in GW_\pm$ due to Theorem 1.5. From Proposition 1.4 we deduce that $T(a_+^{-1}) T(a_-^{-1})$ is the inverse of the operator $T(a) = T(a_-) T(a_+)$.

The following beautiful purely geometric description of the spectrum of a Toeplitz operator is illustrated by Figure 1.1.

Corollary 1.12. If $a \in W$, then
$$
\operatorname{sp} T(a) = a(T) \cup \bigl\{\lambda \in C \setminus a(T) : \operatorname{wind}(a - \lambda) \neq 0\bigr\}.
$$

Proof. This is Corollary 1.11 with $a$ replaced by $a - \lambda$.

In Section 1.2, we observed that $H(a)$ is compact for every $a \in W$. The following result shows that the zero operator is the only compact Toeplitz operator.

Corollary 1.13. If $a \in W$ and $T(a)$ is compact on $\ell^p$ $(1 \le p \le \infty)$, then $a$ vanishes identically.

Proof. If $T(a)$ is compact, then $\operatorname{sp}_{\mathrm{ess}} T(a) = \{0\}$, and Corollary 1.10 tells us that this can only happen if $a(T) = \{0\}$.
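Corollary 1.12 lends itself to experiments: one samples the curve $a(T)$ and counts its winding about a query point $\lambda$. A Python sketch (NumPy assumed; the symbol chosen here is only an example):

```python
# Illustration only (not from the text): testing membership in sp T(a) via Corollary 1.12
# by counting the winding of the sampled symbol curve about a point lambda.
import numpy as np

def winding_number(curve, lam):
    """Winding number of a finely sampled closed curve about the point lam."""
    z = curve - lam
    increments = np.angle(np.roll(z, -1) / z)   # angle increments between consecutive samples
    return int(np.round(increments.sum() / (2 * np.pi)))

t = np.exp(1j * np.linspace(0, 2 * np.pi, 4096, endpoint=False))
a_curve = 1 / t + 2 + 3 * t       # a(t) = t^{-1} + 2 + 3t; a(T) is an ellipse around 2
for lam in [0.0, 2.0, 10.0]:
    w = winding_number(a_curve, lam)
    # lambda lies in sp T(a) iff it is on a(T) or wind(a - lambda) != 0 (Corollary 1.12)
    print(lam, w, "in sp T(a)" if w != 0 else "outside sp T(a) (if lambda is off the curve)")
```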
1.6 Norms
The cases $p = 1$ and $p = \infty$. It is well known that an infinite matrix $A = (a_{jk})_{j,k=0}^{\infty}$ induces a bounded operator on $\ell^1$ and $\ell^\infty$, respectively, if and only if
$$
\sup_{k} \sum_{j} |a_{jk}| < \infty \qquad \text{and} \qquad \sup_{j} \sum_{k} |a_{jk}| < \infty,
$$
in which case
$$
\|A\|_1 = \sup_{k} \sum_{j} |a_{jk}| \qquad \text{and} \qquad \|A\|_\infty = \sup_{j} \sum_{k} |a_{jk}|. \qquad (1.16)
$$
Figure 1.1. The essential spectrum $\operatorname{sp}_{\mathrm{ess}} T(a) = a(T)$ on the left and the spectrum $\operatorname{sp} T(a)$ on the right.

This easily implies the following.

Theorem 1.14. If $a \in W$, then $\|T(a)\|_1 = \|T(a)\|_\infty = \|a\|_W$.

The case $p = 2$. Let $L^2 := L^2(T)$ be the usual Lebesgue space of complex-valued functions on $T$ with the norm
$$
\|f\|_2 := \Bigl(\int_T |f(t)|^2 \,\frac{|dt|}{2\pi}\Bigr)^{1/2} = \Bigl(\int_0^{2\pi} |f(e^{i\theta})|^2 \,\frac{d\theta}{2\pi}\Bigr)^{1/2}.
$$
The set $H^2 := H^2(T) := \{f \in L^2 : f_n = 0 \text{ for } n < 0\}$ is a closed subspace of $L^2$ and is referred to as the Hardy space of $L^2$. Let $P: L^2 \to H^2$ be the orthogonal projection. Thus, if $f \in L^2$ is given by
$$
f(t) = \sum_{n=-\infty}^{\infty} f_n t^n \quad (t \in T),
$$
then
$$
(Pf)(t) = \sum_{n=0}^{\infty} f_n t^n \quad (t \in T).
$$
The map
$$
\Phi: H^2 \to \ell^2, \quad f \mapsto \{f_n\}_{n=0}^{\infty} \qquad (1.17)
$$
is a unitary operator of $H^2$ onto $\ell^2$. It is not difficult to check that if $a \in W$, then $\Phi^{-1} T(a) \Phi$ is the operator
$$
\Phi^{-1} T(a) \Phi: H^2 \to H^2, \quad f \mapsto P(af), \qquad (1.18)
$$
where $(af)(t) := a(t) f(t)$. The observation (1.18) is in fact the key to the theory of Toeplitz operators on $\ell^2$. We here confine ourselves to the following consequence of (1.18).

Theorem 1.15. If $a \in W$, then $\|T(a)\|_2 = \|a\|_\infty$, where $\|a\|_\infty := \max_{t \in T} |a(t)|$.

Proof. If $f \in H^2$, then
$$
\|\Phi^{-1} T(a) \Phi f\|_2 = \|P(af)\|_2 \le \|af\|_2 \le \|a\|_\infty \|f\|_2,
$$
whence $\|T(a)\|_2 = \|\Phi^{-1} T(a) \Phi\|_2 \le \|a\|_\infty$. On the other hand, Corollary 1.12 implies that the spectral radius
$$
\operatorname{rad} T(a) = \max\{|\lambda| : \lambda \in \operatorname{sp} T(a)\}
$$
is equal to $\|a\|_\infty$. Because $\operatorname{rad} T(a) \le \|T(a)\|_2$, it follows that $\|a\|_\infty \le \|T(a)\|_2$.

Other values of $p$. This case is more delicate, but one has at least two-sided estimates.

Proposition 1.16. If $a \in W$ and $1 \le p \le \infty$, then $\|a\|_\infty \le \|T(a)\|_p \le \|a\|_W$.

Proof. The inequality $\|T(a)\|_p \le \|a\|_W$ results from Proposition 1.1, and the inequality $\|a\|_\infty \le \|T(a)\|_p$ is a consequence of Corollary 1.12 together with the estimate $\|a\|_\infty = \operatorname{rad} T(a) \le \|T(a)\|_p$.
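Theorem 1.15 concerns the infinite matrix; the spectral norms of its $n \times n$ finite sections are bounded by $\|a\|_\infty$ and approach it as $n$ grows, as the following Python sketch (NumPy/SciPy assumed; the symbol is only an example) illustrates.

```python
# Illustration only (not from the text): ||T_n(a)||_2 increases towards ||a||_inf = 6
# for the symbol a(t) = t^{-1} + 2 + 3t, whose maximum modulus on the circle is 6.
import numpy as np
from scipy.linalg import toeplitz

a_coeffs = {-1: 1.0, 0: 2.0, 1: 3.0}

def finite_section(coeffs, n):
    col = np.array([coeffs.get(j, 0.0) for j in range(n)])    # a_0, a_1, ... down the column
    row = np.array([coeffs.get(-j, 0.0) for j in range(n)])   # a_0, a_{-1}, ... along the row
    return toeplitz(col, row)

for n in (8, 32, 128, 512):
    print(n, np.linalg.norm(finite_section(a_coeffs, n), 2))  # tends to 6 from below
```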
1.7 Inverses
Let $b$ be a Laurent polynomial of the form (1.10). Suppose $b(t) \neq 0$ for $t \in T$ and $\operatorname{wind} b = 0$. From Section 1.4 we know that $b$ can be written in the form $b = b_- b_+$ with
$$
b_-(t) = \prod_{j=1}^{r} \Bigl(1 - \frac{\delta_j}{t}\Bigr), \qquad b_+(t) = b_s \prod_{k=1}^{s} (t - \mu_k), \qquad (1.19)
$$
where $\delta := \max(|\delta_1|, \ldots, |\delta_r|) < 1$ and $\mu := \min(|\mu_1|, \ldots, |\mu_s|) > 1$. When proving Corollary 1.11, we observed that
$$
T^{-1}(b) = T(b_+^{-1})\, T(b_-^{-1}). \qquad (1.20)
$$
From (1.19) we see that
$$
b_-^{-1}(t) = \prod_{j=1}^{r} \Bigl(1 + \frac{\delta_j}{t} + \frac{\delta_j^2}{t^2} + \cdots\Bigr) =: \sum_{m=0}^{\infty} \frac{c_m}{t^m},
$$
$$
b_+^{-1}(t) = \frac{1}{b_s} \prod_{k=1}^{s} \Bigl(-\frac{1}{\mu_k}\Bigr) \prod_{k=1}^{s} \Bigl(1 + \frac{t}{\mu_k} + \frac{t^2}{\mu_k^2} + \cdots\Bigr) =: \sum_{m=0}^{\infty} d_m t^m.
$$
With the coefficients $c_m$ and $d_m$, formula (1.20) takes the form
$$
T^{-1}(b) = \begin{pmatrix} d_0 & & & \\ d_1 & d_0 & & \\ d_2 & d_1 & d_0 & \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}
\begin{pmatrix} c_0 & c_1 & c_2 & \ldots \\ & c_0 & c_1 & \ldots \\ & & c_0 & \ldots \\ & & & \ldots \end{pmatrix}. \qquad (1.21)
$$
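The series coefficients $c_m$ and $d_m$ in (1.21) are plain geometric data. The sketch below (Python, NumPy/SciPy assumed; the concrete symbol $b$ is only an example) compares the upper-left corner of (1.21) with the inverse of a large finite section of $T(b)$.

```python
# Illustration only (not from the text): (1.20)/(1.21) for b(t) = (1 - delta/t)(t - mu)
# with delta = 1/2, mu = 2, i.e. r = s = 1 and b_s = 1.
import numpy as np
from scipy.linalg import toeplitz

delta, mu = 0.5, 2.0
N, m = 400, 6                                # large finite section, small corner to inspect

# Fourier coefficients of b: b(t) = delta*mu*t^{-1} - (mu + delta) + t.
col = np.zeros(N); row = np.zeros(N)
col[0] = row[0] = -(mu + delta)              # b_0
col[1] = 1.0                                 # b_1
row[1] = delta * mu                          # b_{-1}
Bn = toeplitz(col, row)

# c_m and d_m from the geometric series for b_-^{-1} and b_+^{-1}.
c = delta ** np.arange(m)                    # b_-^{-1}(t) = sum_m c_m t^{-m}
d = -(1.0 / mu) ** (np.arange(m) + 1)        # b_+^{-1}(t) = sum_m d_m t^{m}
predicted = np.tril(toeplitz(d)) @ np.triu(toeplitz(c))   # corner of (1.21)

corner = np.linalg.inv(Bn)[:m, :m]
print(np.max(np.abs(corner - predicted)))    # tiny: the large finite section already agrees
```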
Proposition 1.3 and (1.20) imply that
$$
T^{-1}(b) = T(b^{-1}) - H(b_+^{-1})\, H(\widetilde{b_-^{-1}}). \qquad (1.22)
$$
Representation (1.20) gives us $T^{-1}(b)$ as the product of the lower triangular matrix $T(b_+^{-1})$ and the upper triangular matrix $T(b_-^{-1})$, while (1.22) shows that $T^{-1}(b)$ is the difference of the (in general full) Toeplitz matrix $T(b^{-1})$ and the product $H(b_+^{-1}) H(\widetilde{b_-^{-1}})$ of two Hankel matrices.

Let $\alpha$ be any number satisfying
$$
0 < \alpha < \min\Bigl(\log\frac{1}{\delta},\; \log\mu\Bigr). \qquad (1.23)
$$

Lemma 1.17. For every $n \ge 0$,
$$
|(b_-^{-1})_{-n}| \le \Bigl(\min_{|z|=\delta+\varepsilon} |b_-(z)|\Bigr)^{-1} (\delta+\varepsilon)^n \quad (\varepsilon > 0), \qquad (1.24)
$$
$$
|(b_+^{-1})_{n}| \le \Bigl(\min_{|z|=\mu-\varepsilon} |b_+(z)|\Bigr)^{-1} (\mu-\varepsilon)^{-n} \quad (0 < \varepsilon < \mu). \qquad (1.25)
$$
Consequently, $(b_-^{-1})_{-n}$ and $(b_+^{-1})_{n}$ are $O(e^{-\alpha n})$ as $n \to \infty$.

Proof. Since $1/b_-(z)$ is analytic for $|z| > \delta$, we get
$$
(b_-^{-1})_{-n} = \frac{1}{2\pi i} \int_{|z|=1} \frac{z^{n-1}\, dz}{b_-(z)} = \frac{1}{2\pi i} \int_{|z|=\delta+\varepsilon} \frac{z^{n-1}\, dz}{b_-(z)}
$$
and hence
$$
|(b_-^{-1})_{-n}| \le \frac{1}{2\pi} \Bigl(\min_{|z|=\delta+\varepsilon} |b_-(z)|\Bigr)^{-1} (\delta+\varepsilon)^{n-1}\, 2\pi(\delta+\varepsilon),
$$
which is (1.24). Estimate (1.25) can be verified analogously.

Proposition 1.18. For the $j,k$ entry of $T^{-1}(b)$ we have the estimate
$$
\bigl[T^{-1}(b)\bigr]_{j,k} = (b^{-1})_{j-k} + O\bigl(e^{-\alpha(j+k)}\bigr).
$$

Proof. From (1.22) we see that
$$
\Bigl| \bigl[T^{-1}(b)\bigr]_{j,k} - (b^{-1})_{j-k} \Bigr|
= \Bigl| \sum_{\ell=1}^{\infty} (b_+^{-1})_{j+\ell}\, (b_-^{-1})_{-k-\ell} \Bigr|
\le \Bigl(\sum_{\ell=1}^{\infty} |(b_+^{-1})_{j+\ell}|^2\Bigr)^{1/2} \Bigl(\sum_{\ell=1}^{\infty} |(b_-^{-1})_{-k-\ell}|^2\Bigr)^{1/2}, \qquad (1.26)
$$
and Lemma 1.17 implies that (1.26) is
$$
O\Bigl(\Bigl(\sum_{\ell=1}^{\infty} e^{-2\alpha(j+\ell)}\Bigr)^{1/2}\Bigr)\; O\Bigl(\Bigl(\sum_{\ell=1}^{\infty} e^{-2\alpha(k+\ell)}\Bigr)^{1/2}\Bigr) = O(e^{-\alpha j})\, O(e^{-\alpha k}).
$$
Given two sequences $x = \{x_k\}$ and $y = \{y_k\}$, we set $(x, y) = \sum x_k \overline{y_k}$. The $j,k$ entry of $T^{-1}(b)$ is just $(T^{-1}(b)x, y)$ for $x = e_k$ and $y = e_j$, where $\{e_n\}$ is the standard basis of $\ell^2$. The following useful observation evaluates $(T^{-1}(b)x, y)$ at another interesting pair $(x, y)$. For $z \in D$, define $w_z \in \ell^2$ by $(w_z)_n = z^n$ $(n \ge 0)$.

Proposition 1.19. Let $b = b_- b_+$ with $b_\pm$ given by (1.19). Then for $\alpha, \beta \in D$,
$$
(T^{-1}(b)\, w_\alpha, w_\beta) = \frac{1}{b_s}\, \frac{1}{1 - \alpha\overline{\beta}} \prod_{j=1}^{r} \frac{1}{1 - \delta_j \alpha} \prod_{k=1}^{s} \frac{1}{\overline{\beta} - \mu_k} \qquad (1.27)
$$
$$
= \frac{1}{1 - \alpha\overline{\beta}}\, \frac{1}{b_-(1/\alpha)\, b_+(\overline{\beta})}. \qquad (1.28)
$$

Proof. We have
$$
(T^{-1}(b)\, w_\alpha, w_\beta) = (T(b_+^{-1}) T(b_-^{-1})\, w_\alpha, w_\beta) = (T(b_-^{-1})\, w_\alpha,\; [T(b_+^{-1})]^* w_\beta).
$$
Define $\chi_n(t) := t^n$ $(t \in T)$. It is easily seen that, for $|\delta| < 1$,
$$
T^{-1}(1 - \delta\chi_{-1})\, w_\alpha = T(1 + \delta\chi_{-1} + \delta^2\chi_{-2} + \cdots)\, w_\alpha = \frac{1}{1 - \delta\alpha}\, w_\alpha,
$$
whence
$$
T(b_-^{-1})\, w_\alpha = \Biggl(\prod_{j=1}^{r} \frac{1}{1 - \delta_j \alpha}\Biggr) w_\alpha.
$$
Analogously, if $|\mu| > 1$,
$$
T^{-1}(\chi_{-1} - \mu)\, w_\beta = -\frac{1}{\mu}\, T^{-1}\Bigl(1 - \frac{1}{\mu}\chi_{-1}\Bigr) w_\beta = -\frac{1}{\mu}\, \frac{1}{1 - \mu^{-1}\beta}\, w_\beta = \frac{1}{\beta - \mu}\, w_\beta,
$$
which implies that
$$
[T(b_+^{-1})]^* w_\beta = \frac{1}{\overline{b_s}} \Biggl(\prod_{k=1}^{s} \frac{1}{\beta - \overline{\mu_k}}\Biggr) w_\beta.
$$
Consequently,
$$
(T(b_-^{-1})\, w_\alpha,\; [T(b_+^{-1})]^* w_\beta) = \frac{1}{b_s} \prod_{j=1}^{r} \frac{1}{1 - \delta_j \alpha} \prod_{k=1}^{s} \frac{1}{\overline{\beta} - \mu_k}\, (w_\alpha, w_\beta).
$$
As $(w_\alpha, w_\beta) = 1/(1 - \alpha\overline{\beta})$, we arrive at (1.27). Clearly, (1.28) is nothing but another way of writing (1.27).
1.8 Eigenvalues and Eigenvectors
Let $b$ be a Laurent polynomial. In this section we study the problem of finding the $\lambda \in \operatorname{sp} T(b)$ for which there exist nonzero $x \in \ell^p$ such that $T(b)x = \lambda x$. These $\lambda$ are called eigenvalues of $T(b)$ on $\ell^p$, and the corresponding $x$'s are referred to as eigenelements or eigenvectors (which sounds much better). Since $T(b) - \lambda I = T(b - \lambda)$, our problem is equivalent to the question of when a Toeplitz operator has a nontrivial kernel. Throughout this section we assume that $b$ is not constant on the unit circle $T$.

Outside the essential spectrum. For a point $\lambda \in C \setminus b(T)$, we denote by $\operatorname{wind}(b, \lambda)$ the winding number of $b$ about $\lambda$, that is, $\operatorname{wind}(b, \lambda) := \operatorname{wind}(b - \lambda)$. A sequence $\{x_n\}_{n=0}^{\infty}$ is said to be exponentially decaying if there are $C \in (0, \infty)$ and $\gamma \in (0, \infty)$ such that $|x_n| \le C e^{-\gamma n}$ for all $n \ge 0$.

Proposition 1.20. Let $1 \le p \le \infty$. A point $\lambda \notin b(T)$ is an eigenvalue of $T(b)$ as an operator on $\ell^p$ if and only if $\operatorname{wind}(b, \lambda) = -m < 0$, in which case $\operatorname{Ker}(T(b) - \lambda I)$ has the dimension $m$ and each eigenvector is exponentially decaying.

Proof. From Theorem 1.8 (or simply from (1.13)) we get a Wiener-Hopf factorization $b(t) - \lambda = b_-(t)\, t^{-m} b_+(t)$. Proposition 1.4 implies that $T(b - \lambda)$ decomposes into the product $T(b_-) T(\chi_{-m}) T(b_+)$ and that the operators $T(b_\pm)$ are invertible, the inverses being $T(b_\pm^{-1})$. Thus, $x \in \operatorname{Ker} T(b - \lambda)$ if and only if $T(\chi_{-m}) T(b_+) x = 0$. If $m \le 0$, this is equivalent to the equation $T(b_+)x = 0$ and hence to the equality $x = 0$. So let $m > 0$. We denote by $e_j \in \ell^p$ the sequence given by $(e_j)_k = 1$ for $k = j$ and $(e_j)_k = 0$ for $k \neq j$. Clearly, $T(\chi_{-m}) T(b_+) x = 0$ if and only if $T(b_+)x$ belongs to the linear hull $\operatorname{lin}\{e_0, \ldots, e_{m-1}\}$ of $e_0, \ldots, e_{m-1}$. Consequently,
$$
\operatorname{Ker} T(b - \lambda) = \operatorname{lin}\{T(b_+^{-1}) e_0, \ldots, T(b_+^{-1}) e_{m-1}\}.
$$
This shows that $\dim \operatorname{Ker} T(b - \lambda) = m$, and from Lemma 1.17 we deduce that the sequences in $\operatorname{Ker} T(b - \lambda)$ are exponentially decaying.

Inside the essential spectrum. Things are a little bit more complicated for points $\lambda \in b(T)$. In that case $b - \lambda$ has zeros on $T$. For $\tau \in T$, we define the function $\xi_\tau$ by
$$
\xi_\tau(t) = 1 - \frac{\tau}{t} \quad (t \in T).
$$
Notice that $\xi_\tau$ has a single zero on $T$ and that $T(\xi_\tau)$ is the upper triangular matrix
$$
T(\xi_\tau) = \begin{pmatrix} 1 & -\tau & 0 & \ldots \\ 0 & 1 & -\tau & \ldots \\ 0 & 0 & 1 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}.
$$

Lemma 1.21. Let $\tau_1, \ldots, \tau_\ell$ be distinct points on $T$ and let $\alpha_1, \ldots, \alpha_\ell$ be positive integers. Then
$$
\operatorname{Ker} T\bigl(\xi_{\tau_1}^{\alpha_1} \cdots \xi_{\tau_\ell}^{\alpha_\ell}\bigr) = \{0\} \quad \text{on } \ell^p \ (1 \le p < \infty) \qquad (1.29)
$$
i i
i
i
i
i
16
Chapter 1. Infinite Matrices
and Ker T ξτα11 . . . ξτα = lin {wτ1 , . . . , wτ } on ∞ ,
(1.30)
where (wτ )n := 1/τ n . Proof. Put ξ = ξτα11 . . . ξτα and write ξ(t) = a0 + a1
1 1 1 + a 2 2 + · · · + aN N . t t t
The equation T (ξ )x = 0 is the difference equation a0 xn + a1 xn+1 + · · · + aN xn+N = 0
(n ≥ 0),
which is satisfied if and only if xn =
α 1 −1
γk(1)
k=0
α −1 k nk () n + · · · + γ k τ1n τn k=0
(1.31)
(j )
with complex numbers γk . The sequence given by (1.31) belongs to p (1 ≤ p < ∞) if and only if it is identically zero, which proves (1.29), and it is in ∞ if and only if it is of the form xn = γ0(1)
1 () 1 , n + · · · + γ0 τ1 τn
which gives (1.30). Given λ ∈ b(T), we denote the distinct zeros of b − λ on T by τ1 , . . . , τ and their multiplicities by α1 , . . . , α . We extract the zeros by “anti-analytic” linear factors, that is, we write b − λ in the form b(t) − λ =
1−
j =1
τj αj c(t), t
(1.32)
where c(t) = 0 for t ∈ T. Proposition 1.22. Let 1 ≤ p < ∞. Suppose λ ∈ b(T) and write b − λ in the form (1.32). Then λ is an eigenvalue of the operator T (b) on p if and only if wind c = −m < 0, in which case Ker (T (b) − λI ) is of the dimension m and all eigenvectors are exponentially decaying. Proof. By Proposition 1.4, T (b − λ) =
[T (ξτj )]αj T (c).
(1.33)
j =1
i
i i
i
i
i
i
1.8. Eigenvalues and Eigenvectors
17
From (1.29) we see that Ker T (b − λ) = Ker T (c), and Proposition 1.20 therefore gives the assertion. We now turn to the case p = ∞. A sequence {xn }∞ n=0 is called extended if lim sup |xn | > 0. n→∞
Proposition 1.23. Let λ ∈ b(T) and write b−λ in the form (1.32). Then λ is an eigenvalue of T (b) on ∞ if and only if wind c = −m < . In that case the dimension of Ker (T (b) − λI ) is m + . There is a basis in Ker (T (b) − λI ) whose elements enjoy the following properties: (a) if m > 0, then m elements of the basis decay exponentially and elements have zeros in the first m places and are extended; (b) if m ≤ 0, then all the − |m| elements of the basis are extended. Proof. Combining (1.30) and (1.33), we see that T (b − λ)x = 0 if and only if there are complex numbers γj such that T (c)x =
γ j w τj .
(1.34)
j =1
Let c = c− χ−m c+ be a Wiener-Hopf factorization of c. It can be readily checked that −1 −1 −1 T (c− )wτ = c− (1/τ )wτ . Thus, setting δj = γj c− (1/τj ), we can rewrite (1.34) in the form T (χ−m )T (c+ )x =
δ j w τj .
(1.35)
j =1
If m ≥ 0, then (1.35) holds if and only if T (c+ )x ∈ lin {e0 , . . . , em−1 , T (χm )wτ1 , . . . , T (χm )wτ }, which is equivalent to the requirement that x be in −1 −1 −1 −1 lin {T (c+ )e0 , . . . , T (c+ )em−1 , T (χm )T (c+ )wτ1 , . . . , T (χm )T (c+ )wτ }. −1 The sequences T (c+ )ej decay exponentially (Lemma 1.17), and since −1 )wτ ]n = [T (c+
n 1 −1 k (c )k τ , τ n k=0 +
−1 it follows that the sequences T (c+ )wτj are extended. This completes the proof in the case m ≥ 0. Let now m < 0. In that case (1.35) is satisfied if and only if
δj (1/τj )n = 0 for n = 0, 1, . . . , |m| − 1
(1.36)
j =1
i
i i
i
i
i
i
18
Chapter 1. Infinite Matrices
and x=
−1 δj T (c+ )T (χ−|m| )wτj .
(1.37)
j =1
Equations (1.36) are a Vandermonde system for the δj ’s. If |m| ≥ , then (1.36) has only the trivial solution. If |m| < , we can choose δ1 , . . . , δ−|m| arbitrarily. The numbers δ−|m|+1 , . . . , δ are then uniquely determined. This shows that the set of all x of the form (1.37) has the dimension − |m| and that all nonzero x in this set are extended. In geometric terms, the winding number of the function c in (1.32) can be determined as follows. Choose a number > 1 and consider the function b defined by b (t) = b(t) (t ∈ T). If is sufficiently close to 1, then b − λ has no zeros on T and hence wind (b , λ) is well defined. Proposition 1.24. We have wind c = lim wind (b , λ). →1+0
! τ Proof. Let c (t) := c(t). From (1.32) we obtain b (t) − λ = j =1 (1 − tj )αj c (t), whence wind (b , λ) = wind c . Since wind c converges to wind c as → 1, we arrive at the assertion. Frequently, the following observation is very useful. Proposition 1.25. Let λ ∈ b(T) and suppose is a connected component of C \ b(T) whose boundary contains λ. If wind (b, z) = κ for z ∈ , then wind c ≥ κ. Proof. For z ∈ , we have b(t) − z = bs t −r
r+κ
(t − δj (z))
j =1
s−κ
(t − μk (z))
k=1
with |δj (z)| < 1 and |μk (z)| > 1. Now let z ∈ approach λ ∈ ∂. The some of the δj (z), say δ1 (z), . . . , δm (z), move onto the unit circle, while the remaining δj (z) stay in the open unit disk. Analogously, some of the μk (z), say μ1 (z), . . . , μ (z), attain modulus 1, whereas the remaining μk (z) keep modulus greater than 1. We can write b(t) − λ as bs t −r
m
(t − δj (λ))
j =1
r+κ j =m+1
(t − δj (λ))
(t − μk (λ))
k=1
s−κ
(t − μk (λ)).
k=+1
The zeros of b − λ are δj (λ)/ and μk (λ)/. If > 1, then |δj (λ)| < 1 for all j . If, in addition, > 1 is sufficiently close to 1, then |μk (λ)/| > 1 for k = + 1, . . . , s − κ and |μk (λ)/| < 1 for k = 1, . . . , . The result is that lim wind (b − λ) = −r + (r + κ) + = κ + ≥ κ.
→1+0
From Proposition 1.24 we so obtain that wind c ≥ κ.
i
i i
i
i
i
i
1.8. Eigenvalues and Eigenvectors
19
Corollary 1.26. If λ lies on the boundary of a connected component of C \ b(T) such that wind (b, z) ≥ 0 for z ∈ , then λ cannot be an eigenvalue of T (b) on p (1 ≤ p < ∞). Proof. Immediate from Propositions 1.22 and 1.25. Corollary 1.27. If b is a real-valued Laurent polynomial, then T (b) as an operator on p (1 ≤ p < ∞) has no eigenvalues. Proof. This is a straightforward consequence ! of Corollary 1.26. Here is an alternative proof. Let λ ∈ b(T) and write b(t) − λ = br t −r 2r j =1 (t − zj ). Since b(t) − λ is real valued, ! passage to the complex conjugate gives b(t) − λ = br t −r 2r j =1 (t − 1/ zj ). Thus, if b − λ has ≥ 1 distinct zeros τ1 , . . . , τ on T and n ≥ 0 distinct zeros μ1 , . . . , μn of modulus greater than 1, then b(t) − λ = br t −r
n
[(t − μj )(t − 1/ μj )]βj
j =1
(t − τj )αj =
j =1
1−
j =1
τj αj c(t) t
with c(t) = br t −r t α1 +···+α
n
[(t − μj )(t − 1/ μj )]βj .
j =1
Clearly, wind c = −r+α1 +· · ·+α +β1 +· · ·+βn . Since α1 +· · ·+α +2β1 +· · ·+2βn = 2r and αj ≥ 1 for all j , we get β1 + · · · + βn < r and thus wind c = r − β1 − · · · − βn > 0. The assertion now follows from Proposition 1.22. Example 1.28. We remark that Corollary 1.27 is not true for p = ∞: the sequence {1, 0, −1, 0, 1, 0, −1, 0, . . . } is obviously in the kernel of the operator ⎛ ⎞ 0 1 0 0 ... ⎜ 1 0 1 0 ... ⎟ ⎜ ⎟ ⎜ 0 1 0 1 ... ⎟ T (χ−1 + χ1 ) = ⎜ ⎟. ⎝ 0 0 1 0 ... ⎠ ... ... ... ... ... For this operator, things are as follows. The symbol is b(t) = t −1 + t and hence sp T (b) = b(T) = [−2, 2]. For λ ∈ [−2, 2], τ1 (λ) τ2 (λ) b(t) − λ = 1 − 1− t, t t where τ1 (λ), τ2 (λ) ∈ T are given by λ τ1,2 (λ) = ± i 2
" 1−
λ2 . 4
Thus, if λ ∈ (−2, 2), then Proposition 1.23 (with wind c = 1 and = 2) implies that T (b) − λI has a one-dimensional kernel in ∞ whose nonzero elements are extended, and
i
i i
i
i
i
i
20
Chapter 1. Infinite Matrices
if λ ∈ {−2, 2}, then Proposition 1.23 (with wind c = 1 and = 1) shows that the kernel of T (b) − λI in ∞ is trivial. Example 1.29. Let b(t) = (1 + 1/t)3 . The image b(T) is the solid curve in the left picture of Figure 1.2; this curve is traced out in the clockwise direction. The curve intersects itself at the point −1. We see that C \ b(T) has two bounded connected components 1 and 2 with wind (b, λ) = −1 for λ ∈ 1 and wind (b, λ) = −2 for λ ∈ 2 . Thus, sp T (b) = 1 ∪ 2 ∪ b(T). By Proposition 1.20, the points in 1 ∪ 2 are eigenvalues. Looking at Figure 1.2 and using Proposition 1.24, we see that wind c = −1 for the points on the two small open arcs of b(T) that join 0 and −1. Thus, by virtue of Propositions 1.22 and 1.23, these points are also eigenvalues. The points of b(T) which are boundary points of the unbounded connected component of C\b(T), including the point −1, are not eigenvalues if 1 ≤ p < ∞ (Corollary 1.26), but they are eigenvalues if p = ∞, because wind c = 0 (Propositions 1.23 and 1.24). Finally, for λ = 0 we have = 1, and the right picture of Figure 1.2 reveals that wind c = 0. Consequently, λ = 0 is not an eigenvalue if 1 ≤ p < ∞ (Proposition 1.22) and is an eigenvalue if p = ∞ (Proposition 1.23).
Figure 1.2. The curves $b(T)$ (solid) and $b_\varrho(T)$ with $\varrho = 1.05$ (dotted). Both curves are traced out clockwise. The right picture is a close-up (with a magnification about 4300) of the left picture in a neighborhood of the origin, which is marked by +.
Remark 1.30. Let b be a real-valued Laurent polynomial and suppose λ is a point in b(T). We know from Corollary 1.27 that Ker T (b − λ) = {0} in p (1 ≤ p < ∞). This implies that if 1 < p < ∞, then the range Im T (b − λ) is not closed but dense in p . In other words, T (b) has no residual spectrum on p for 1 < p < ∞. However, the polynomial b = χ−1 + χ1 is an example of a symbol for which Ker T (b) = {0} in ∞ and thus Im T (b) is not dense in 1 . Consequently, there are b’s such that T (b) has a residual spectrum on 1 .
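The symbol $b(t) = t^{-1} + t$ of Example 1.28 is convenient for experiments: the eigenvalues of its $n \times n$ finite sections are the classical values $2\cos(k\pi/(n+1))$, which fill out $\operatorname{sp} T(b) = [-2, 2]$ as $n$ grows, even though $T(b)$ has no eigenvalues on $\ell^p$ for $1 \le p < \infty$ (Corollary 1.27). A Python sketch (NumPy assumed):

```python
# Illustration only (not from the text): eigenvalues of finite sections of T(b),
# b(t) = t^{-1} + t, versus the closed formula 2 cos(k pi/(n+1)).
import numpy as np

n = 200
Tn = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
eigs = np.sort(np.linalg.eigvalsh(Tn))
expected = 2 * np.cos(np.pi * np.arange(n, 0, -1) / (n + 1))   # ascending order
print(np.max(np.abs(eigs - expected)))   # agrees with the closed formula
print(eigs.min(), eigs.max())            # approaches -2 and 2 from inside [-2, 2]
```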
i
i i
i
i
i
i
1.9 Selfadjoint Operators
We now consider Toeplitz operators on the space 2 . Obviously, T (b) is selfadjoint if and only if bn = b−n for all n, that is, if and only if b is real valued. Thus, let b(eix ) =
s
bk eikx = b0 +
k=−s
s
(an cos nx + cn sin nx),
n=1
where b0 , an , cn are real numbers. The resolution of the identity. Let A be a bounded selfadjoint operator on the space 2 . Then the operator f (A) is well defined for every bounded Borel function f on R. For λ ∈ R, put E(λ) = χ(−∞,λ] (A), where χ(−∞,λ] is the characteristic function of (−∞, λ]. The family {E(λ)}λ∈R is called the resolution of the identity for A. Stone’s formula states that 1 (E(λ + 0) + E(λ − 0))x 2 λ 1 = lim (A − (λ + iε)I )−1 − (A − (λ − iε)I )−1 x dλ ε→0+0 2πi −∞
(1.38)
for every x ∈ 2 . Let 2pp , 2ac , 2sing denote the set of all x ∈ 2 for which the measure dx (λ) := d(E(λ)x, x) is a pure point measure, is absolutely continuous with respect to Lebesgue measure, and is singular continuous with respect to Lebesgue measure, respectively. The sets 2pp , 2ac , 2sing are closed subspaces of 2 whose orthogonal sum is all of 2 . Moreover, each of the spaces 2pp , 2ac , 2sing is an invariant subspace of A. The spectra of the restrictions A|2ac and A|2sing are referred to as the absolutely continuous spectrum and the singular continuous spectrum of A, respectively. The point spectrum of A is defined as the set of the eigenvalues of A (and not as the spectrum of the restriction A|2pp ). It is well known that the spectrum of A is the union of the absolutely continuous spectrum, the singular continuous spectrum, and the closure of the point spectrum. The spectrum of T (b) is the line segment b(T) =: [m, M]. Corollary 1.27 tells us that the point spectrum of T (b) is empty unless b is a constant. As the following theorem shows, the singular continuous spectrum is also empty. Theorem 1.31 (Rosenblum). The spectrum of a Toeplitz operator generated by a realvalued nonconstant Laurent polynomial is purely absolutely continuous. Proof. Let b be a real-valued nonconstant Laurent polynomial. We may without loss of generality assume that the highest coefficient bs is 1. Fix λ ∈ R and ε > 0, and put z = λ + iε. As in Section 1.4, we can write δj (t − μj ), b(t) − z = 1− t where |δj | = |δj (z)| < 1 and |μj | = |μj (z)| > 1. Passing to the complex conjugate, we
i
i i
i
i
i
i
22
Chapter 1. Infinite Matrices
get 1 1 b(t) − z = . 1− t− μj t δj Proposition 1.19 now implies that, for α ∈ D, 1 1 1 1 , (T −1 (b − z)wα , wα ) = , 2 f (z) 1 − |α| f (z) 1 − |α|2 ! ! where f (z) := (1 − δj α) (α − μj ). We have Im f (z) 1 2 1 2 2 f (z) − f (z) = 2 |f (z)|2 ≤ |f (z)| = ! |1 − δ α| ! | α − μ | ≤ (1 − |α|)2s , j j (T −1 (b − z)wα , wα ) =
because |1 − δj α| ≥ 1 − |α| and | α − μj | ≥ 1 − |α| for all j . Since E(λ − 0) = E(λ + 0), formula (1.38) gives λ2 2 1 dλ, |(E(λ2 )wα , wα ) − (E(λ1 )wα , wα )| ≤ 2π λ1 (1 − |α|)2s which shows that the function λ → (E(λ)wα , wα ) is absolutely continuous for each α ∈ D. It follows that wα ∈ 2ac for each α ∈ D, and as the linear hull of the set {wα }α∈D is dense in 2 , we arrive at the conclusion that 2ac = 2 , which is the assertion. The problem of diagonalizing selfadjoint bounded Toeplitz operators is solved. More or less explicit formulas can be found in [227], [228], [229], and [288]. We here confine ourselves to a few simple observations. Chebyshev polynomials. We denote by {Un }∞ n=0 the normalized Chebyshev polynomials of the second kind: " 2 sin(n + 1)θ . Un (cos θ) = π sin θ 2 The {Un }∞ n=0 constitute an orthonormal basis in the Hilbert space L ((−1, 1), √ polynomials 2 2 1 − λ ) =: L (σ ), 1 # Uj (λ)Uk (λ) 1 − λ2 dλ = δj k , −1
and they satisfy the identities λUn (λ) =
1 1 Un+1 (λ) + Un−1 (λ), 2 2
U−1 (λ) := 0.
(1.39)
For α ∈ T, we define Vα : 2 → L2 (σ ) by (Vα x)(λ) =
∞
xn α n Un (λ),
λ ∈ (−1, 1).
n=0
i
i i
i
i
i
i
1.9. Selfadjoint Operators
23
Clearly, Vα is unitary and Vα−1 : L2 (σ ) → 2 acts by the rule 1 # 1 (Vα−1 f )n = n f (λ)Un (λ) 1 − λ2 dλ, n ≥ 0. α −1 We denote by Mf (λ) the operator of multiplication by the function f (λ) on L2 (σ ). Proposition 1.32. Let b(eix ) = b1 e−ix + b0 + b1 eix = b0 + a1 cos x + c1 sin x be a real valued trinomial. Put $ $ % b1 a1 + ic1 1 2 α= = , β = 2|b1 | = a + c12 . b1 a1 − ic1 2 1 Then T (b) = Vα−1 Mb0 +βλ Vα . Proof. Using (1.39) and the orthonormality of the polynomials Un we obtain 1 # 1 −1 (Vα Mλ Vα x)n = n λ(Vα x)(λ)Un (λ) 1 − λ2 dλ α −1 ∞ 1 # 1 = n xk α k λUk (λ)Un (λ) 1 − λ2 dλ α k=0 −1 # ∞ 1 1 1 1 k = n xk α 1 − λ2 dλ Uk (λ)Un+1 (λ) + Uk (λ)Un−1 (λ) α k=0 2 −1 2 α n+1 α n−1 α 1 1 + xn−1 = xn+1 + xn−1 , = n xn+1 α 2 2 2 2α where x−1 := 0. Equivalently, Vα−1 Mλ Vα = T
α 1 χ−1 + χ1 . 2 2α
This implies that Vα−1 Mb0 +2|b1 |λ Vα is the Toeplitz operator with the symbol $ $ b1 b1 b0 + |b1 |χ−1 + |b1 |χ1 = b0 + b1 χ−1 + b1 χ1 . b1 b1 In particular,
1 1 χ−1 + χ1 = V1−1 Mλ V1 , 2 2 i i T (sin x) := T χ1 − χ−1 = Vi−1 Mλ Vi . 2 2
T (cos x) := T
(1.40) (1.41)
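Proposition 1.32 is easy to test numerically. The following sketch is our own illustration, not part of the original text; it assumes NumPy and uses arbitrary illustrative values of $b_0, a_1, c_1$. It checks that the eigenvalues of a large finite section of $T(b)$ lie in, and nearly fill, the predicted segment $[b_0 - \beta, b_0 + \beta]$ with $\beta = \sqrt{a_1^2 + c_1^2}$.

```python
import numpy as np

# Numerical companion to Proposition 1.32 (illustrative values, not from the book).
# For b(e^{ix}) = b0 + a1*cos(x) + c1*sin(x), the spectrum of T(b) is the
# segment [b0 - beta, b0 + beta] with beta = sqrt(a1**2 + c1**2).
b0, a1, c1 = 1.0, 2.0, 0.5
beta = np.hypot(a1, c1)
b_1 = (a1 + 1j * c1) / 2                       # Fourier coefficient of e^{-ix}

n = 200
T = (np.diag([b0] * n).astype(complex)
     + np.diag([b_1] * (n - 1), 1)             # entry b_{-1} on the superdiagonal
     + np.diag([np.conj(b_1)] * (n - 1), -1))  # entry b_{+1} on the subdiagonal
eig = np.linalg.eigvalsh(T)                    # T is Hermitian
print(eig.min(), eig.max(), "vs", b0 - beta, b0 + beta)
```

The extreme eigenvalues of the section approach the endpoints of the segment as $n$ grows.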
Diagonalization of symmetric and skewsymmetric Toeplitz matrices. The polynomials
$$g(x) = b_0 + \sum_{n=1}^{s} a_n \cos nx \qquad \text{and} \qquad u(x) = \sum_{n=1}^{s} c_n \sin nx$$
generate symmetric ($A^\top = A$) and skewsymmetric ($A^\top = -A$) Toeplitz matrices, respectively. From identities (1.39) we infer that
$$\lambda^2 U_n(\lambda) = \tfrac{1}{4}U_{n+2}(\lambda) + \tfrac{1}{2}U_n(\lambda) + \tfrac{1}{4}U_{n-2}(\lambda),$$
$$\lambda^3 U_n(\lambda) = \tfrac{1}{8}U_{n+3}(\lambda) + \tfrac{3}{8}U_{n+1}(\lambda) + \tfrac{3}{8}U_{n-1}(\lambda) + \tfrac{1}{8}U_{n-3}(\lambda),$$
and so on, where $U_{-2}(\lambda) = U_{-3}(\lambda) = \cdots := 0$. Consequently, as in the proof of Proposition 1.32,
$$T\big(\tfrac{1}{2} + \tfrac{1}{2}\cos 2x\big) = T\big(\tfrac{1}{4}\chi_{-2} + \tfrac{1}{2}\chi_0 + \tfrac{1}{4}\chi_2\big) = V_1^{-1} M_{\lambda^2} V_1,$$
$$T\big(\tfrac{3}{4}\cos x + \tfrac{1}{4}\cos 3x\big) = T\big(\tfrac{1}{8}\chi_{-3} + \tfrac{3}{8}\chi_{-1} + \tfrac{3}{8}\chi_1 + \tfrac{1}{8}\chi_3\big) = V_1^{-1} M_{\lambda^3} V_1,$$
etc. This shows that we can find coefficients $\gamma_0, \gamma_1, \ldots, \gamma_s$ such that
$$T\Big(b_0 + \sum_{n=1}^{s} a_n \cos nx\Big) = V_1^{-1} M_{\gamma_0 + \gamma_1\lambda + \cdots + \gamma_s\lambda^s} V_1.$$
One can diagonalize the skewsymmetric matrices T (u) analogously. Resolution of the identity for Toeplitz operators. Let A be a bounded selfadjoint operator −1 on 2 and suppose we have √ a unitary operator V such that V AV is multiplication by λ on 2 2 2 L (σ ) := L ((−1, 1), 1 − λ ). Then the resolution of the identity for A can be computed from the formula E(λ) = V Mχ(−∞,λ] V −1 . Clearly, we can think of E(λ) as an infinite matrix (Ej k (λ))∞ j,k=0 . Proposition 1.33. The resolution of the identity for T (cos x) = T ( 21 χ−1 + 21 χ1 ) is given by E(λ) = 0 for λ ∈ (−∞, −1], E(λ) = I for λ ∈ [1, ∞), and ⎧ 1 sin(j + k + 2)θ sin(j − k)θ ⎪ ⎪ − if j = k ⎪ ⎨ π j +k+2 j −k Ej k (λ) = ⎪ ⎪ ⎪ 1 sin(2j + 2)θ − θ + π if j = k ⎩ π 2j + 2 with θ = arccos λ for λ ∈ (−1, 1). Proof. It suffices to consider λ in (−1, 1). Let en be the nth element of the standard basis
of 2 . By virtue of (1.40), Ej k (λ) = (E(λ)ek , ej ) = (V1−1 Mχ(−∞,λ) V1 ek , ej ) = (Mχ(−∞,λ) V1 ek , V1 ej ) = (Mχ(−∞,λ) Uk , Uj ) λ # = Uk (μ)Uj (μ) 1 − μ2 dμ −1 2 π sin(k + 1)ϕ sin(j + 1)ϕ = sin2 ϕ dϕ π θ sin ϕ sin ϕ 1 π = [cos(j − k)ϕ − cos(j + k + 2)ϕ] dϕ, π θ which implies the assertion. From (1.40) we also deduce that if f is any continuous function on [−1, 1], then the j, k entry of f (T (cos x)) is [f (T (cos x))]j k = (V1−1 Mf (λ) V1 ek , ej ) = (Mf (λ) Uk , Uj ) 1 # = f (λ)Uk (λ)Uj (λ) 1 − λ2 dλ −1 2 π sin(k + 1)θ sin(j + 1)θ = f (cos θ) sin2 θ dθ π θ sin θ sin θ 1 π = f (cos θ)[cos(j − k)θ − cos(j + k + 2)θ ] dθ. π θ For example, the nonnegative square root of ⎛
⎞ 2 −1 0 ... ⎜ −1 2 −1 . . . ⎟ ⎟ T (2 − 2 cos x) = ⎜ ⎝ 0 −1 2 ... ⎠ ... ... ... ...
has j, k entry 1 π√ 2 − 2 cos θ [cos(j − k)θ − cos(j + k + 2)θ ] dθ π θ 2 π θ = sin [cos(j − k)θ − cos(j + k + 2)θ ] dθ π θ 2 * 1 1 1 1 = sin j − k + θ − sin j − k − θ π −1 2 2 + 1 1 − sin j + k + 2 + θ + sin j + k + 2 − θ dθ 2 2 1 1 1 1 1 = , − + − π j − k + 21 j − k − 21 j + k + 2 + 21 j + k + 2 − 21 and this equals 4 π
1 1 − . 4(j + k + 2)2 + 1 4(j − k)2 + 1
Exercises 1. (a) Find a function a ∈ L∞ (T) whose Fourier coefficients an (n ∈ Z) are just an = 1/(n + 1/2). (b) Show that the infinite Toeplitz matrix ⎛ ⎞ 1 − 21 − 13 . . . ⎜ 1 1 − 21 . . . ⎟ ⎜ 2 ⎟ ⎜ 1 ⎟ 1 ⎝ 3 1 ... ⎠ 2 ... ... ... ... induces a bounded operator on 2 but not on 1 . (c) Show that the infinite Toeplitz matrix ⎛ 1 1 2 ⎜ 1 1 ⎜ 2 ⎜ 1 1 ⎝ 3 2 ... ...
1 3 1 2
... ... 1 ... ... ...
⎞ ⎟ ⎟ ⎟ ⎠
does not generate a bounded operator on 2 . (d) Show that the infinite upper-triangular Toeplitz matrix ⎛ ⎞ 1 21 13 . . . ⎜ 1 21 . . . ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ 1 ... ⎠ ... does not define a bounded operator on 2 ⎛ 1 1 2 ⎜ 1 1 ⎜ 2 3 ⎜ 1 ⎝ 3 ... ... ...
but that the infinite Hankel matrix ⎞ 1 ... 3 ... ... ⎟ ⎟ ⎟ ... ... ⎠ ... ...
is the matrix of a bounded operator on 2 . ˜ + T (a) H (b) for all a, b ∈ W . 2. Prove that H (ab) = H (a) T (b) 3. Find a Wiener-Hopf factorization of 6t − 41 + 31t −1 − 6t −2 . 4. Let b1 , . . . , bm ∈ P + have no common zero on T. Prove that there are c1 , . . . , cm ∈ W such that T (c1 )T (b1 ) + · · · + T (cm )T (bm ) = I . Can one choose the c1 , . . . , cm as rational functions without poles on T? 5. Let b(t) = 1 + 2t + γ t 3 . Show that there is no γ ∈ C for which T (b) is invertible.
6. Let ⎛ ⎜ ⎜ b(t) = det ⎜ ⎜ ⎝
2 1 0 0 1
1 2 1 0 t
0 1 2 1 t2
0 0 1 2 t3
0 0 0 1 t4
⎞ ⎟ ⎟ ⎟. ⎟ ⎠
Show that b has no zeros on T and that wind b = 4. Try to prove the analogue of this if the 5 × 5 determinant is replaced by an n × n determinant in the obvious way.
7. Let b(t) = 4 + 5j =−5 t j . Show that T (b) is invertible. 8. Let b be a Laurent polynomial and 1 ≤ p ≤ ∞. Show that T (b) : p → p has closed range if and only if either b is identically zero or b has no zeros on T. 9. Let bn (t) = 1 + 21 (t + t −1 ) + 13 (t 2 + t −2 ) + · · · + n1 (t n + t −n ). Show that 2 log n + 0.1544 ≤ T (bn )4 ≤ 2 log n + 0.1545 for all sufficiently large n. 10. Prove that T (a) + Kp ≥ T (a)p for every a ∈ W and every compact operator on p (1 ≤ p ≤ ∞). Deduce that the zero operator is the only compact Toeplitz operator. 11. Prove that T n (a)2 = T (a n )2 for every a ∈ W . 12. Show that there exist Laurent polynomials b such that H (b)2 < b∞ and H (b)1 < bW .
13. For $b = \sum_j b_j \chi_j \in P$, define
$$S_n b = \sum_{|j| \le n-1} b_j \chi_j, \qquad \sigma_n b = \frac{1}{n}(S_1 b + \cdots + S_n b).$$
Prove that always T (σn b)2 ≤ T (b)2 but that there exist b and n such that T (Sn b)2 > 100 T (b)2 . 14. Show that if a ∈ W and T (a) is a unitary operator on 2 , then a is a unimodular constant. 15. Let B = (bj k )∞ j,k=1 with b23 = b32 = −1 and bj k = 0 otherwise. Show that the operator T (χ−1 + χ1 ) + B ∈ B(p ) (1 ≤ p ≤ ∞) has eigenvalues in (−2, 2). 2 16. Let A = T (χ−1 + χ1 ) + diag (vj )∞ j =1 . Show that if vj = o(1/j ), then A ∈ B( ) has at most finitely many eigenvalues in each segment [α, β] ⊂ (−2, 2) and that if vj = o(1/j 1+ε ) with some ε > 0, then the only possible eigenvalues of A ∈ B(2 ) are −2 and 2.
17. Show that the positive square root of $T(2 + 2\cos x)$ is the Toeplitz-plus-Hankel matrix
$$\frac{4}{\pi}\left[\frac{(-1)^{j-k+1}}{(2j-2k-1)(2j-2k+1)} + \frac{(-1)^{j+k+2}}{(2j+2k+3)(2j+2k+5)}\right]_{j,k=0}^{\infty}.$$
Notes In his 1911 paper [267], Otto Toeplitz considered doubly infinite matrices of the form 2 (aj −k )∞ j =−∞ and proved that the spectrum of the corresponding operator on (Z) is just the curve , ∞ k ak t : t ∈ T . k=−∞
The matrices L(a) := (aj −k )∞ j =−∞ are nowadays called Laurent matrices. In a footnote of [267], Toeplitz established that the simply infinite matrix (aj −k )∞ j =0 induces a bounded operator on 2 (Z+ ) if and only if the doubly infinite matrix (aj −k )∞ j =−∞ generates a bounded ∞ 2 operator on (Z). This is why the matrices (aj −k )j =0 now bear his name. The material of Sections 1.1 to 1.7 is standard. The books [71] and [130] may serve as introductions to the basic phenomena in connection with infinite Toeplitz matrices. A nice source is also [150]. In [25], infinite systems with a banded Toeplitz matrix T (a) are treated with the tools of the theory of difference equations; in this book, we find formulas for the entries of the inverses in terms of the zeros of a(z) (z ∈ C) and solvability criteria in the n spaces of sequences x = {xn }∞ n=1 subject to the condition xn = O( ). Advanced topics in the theory of infinite Toeplitz matrices (= Toeplitz operators) are treated in the monographs [70], [103], [195], [196]. The standard texts on Hankel matrices are [196], [201], [204], [213]. Full proofs of Theorems 1.5, 1.6, 1.7 can be found in [103] or [230], for example. Theorem 1.8 as it is stated is due to Mark Krein [184]. The method of Wiener-Hopf factorization was introduced by N. Wiener and E. Hopf in 1931. What we call Wiener-Hopf factorization has its origin in the work of Gakhov [123], although the basic idea (in the case of vanishing winding number) was already employed by Plemelj [205]. Mark Krein [184] was the first to understand the operator theoretic essence and the Banach algebraic background of Wiener-Hopf factorization and to present the method in a crystal-clear manner. The results of Section 1.5 are also due to Krein [184]. However, it had been known a long time before that T (a) is Fredholm of index −wind a whenever a has no zeros on T; this insight is more or less explicit in works by F. Noether, S. G. Mikhlin, N. I. Muskhelishvili, F. D. Gakhov, V. V. Ivanov, A. P. Calderón, F. Spitzer, H. Widom, A. Devinatz, G. Fichera, and certainly others. Moreover, in 1952, Israel Gohberg [128] had already proved that T (a) is Fredholm if and only if a has no zeros on T. From this result it is only a small step (from the present-day understanding of the matter) to the formula Ind T (a) = −wind a. Section 1.8 is based on known results of [129], [130], [184]. Rosenblum’s papers [226], [227], [228] are the classics on selfadjoint Toeplitz operators. The monograph [229] contains very readable material on the topic. In these works
one can also find precise references to previous work on selfadjoint Toeplitz operators. For example, in [229] it is pointed out that the diagonalization (1.40) was carried out by Hilbert (1912) and Hellinger (1941). Proposition 1.19 is from [81] and [226] and Theorem 1.31 was established in [226]. The results around Proposition 1.32 are special cases of more general results in [227], [228]. Part of Rosenblum’s theory was simplified and generalized by Vreugdenhil [288]. We took Proposition 1.33 and the example following after it from [288]. Exercises 5 and 6 are from [208]. Exercises 15 and 16 are results of the papers [182], [183]. Actually, these two papers are devoted to the following more general problem: If λ is not an eigenvalue for T (b) ∈ B(p ), for which perturbations B ∈ B(p ) is λ not an eigenvalue of T (b) + B? In [183] it is in particular proved that if B = (bj k )∞ j,k=1 is such that p induces a bounded operator on (1 ≤ p < ∞), then bj k = 0 for j > k and (j 1+ε bj k )∞ j,k=1 the interval (−2, 2) contains no eigevalues of T (χ−1 + χ1 ) + B. As Exercise 15 shows, the requirement that B be upper-triangular is essential. A solution to Exercise 17 is in [288].
Chapter 2
Determinants
In this chapter, the main actors of this book enter the scene: finite Toeplitz matrices. For $a$ in the Wiener algebra $W$ and $n \in \{1, 2, 3, \ldots\}$, we define the $n \times n$ Toeplitz matrix $T_n(a)$ as the principal $n \times n$ section of $T(a)$, that is, by
$$T_n(a) := (a_{j-k})_{j,k=1}^{n} = \begin{pmatrix} a_0 & a_{-1} & \ldots & a_{-(n-1)} \\ a_1 & a_0 & \ldots & a_{-(n-2)} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & \ldots & a_0 \end{pmatrix}. \qquad (2.1)$$
If a finite Toeplitz matrix is a circulant matrix, then nearly every piece of information on its spectral properties is explicitly available. We also provide formulas for the eigenvalues and eigenvectors of tridiagonal Toeplitz matrices. Things are significantly more complicated for general Toeplitz matrices. The focus of this chapter is on the determinants Dn (a) := det Tn (a). We establish several exact and asymptotic formulas for these determinants, including the Szegö-Widom limit theorem and the Geronimo-Case-Borodin-Okounkov formula. Clearly, nowadays nobody would determine the eigenvalues of Tn (a) by computing the zeros of the polynomial Dn (a − λ) = det(Tn (a) − λI ) = det Tn (a − λ). However, the results of Chapter 11 on the asymptotic distribution of the eigenvalues of Tn (a) in the limit n → ∞ are heavily based on consideration of determinants and, independently of eigenvalues, Toeplitz determinants are a hot topic in statistical physics.
2.1
Circulant Matrices
Circulant matrices are the “periodic cousins” of Toeplitz matrices. While Toeplitz matrices usually emerge in stationary problems with zero boundary conditions, circulant matrices arise in connection with periodic boundary conditions. From the viewpoint of spectral theory, circulant matrices are much simpler than (noncirculant) Toeplitz matrices. Given a0 , a1 , . . . , an−1 ∈ C, we denote by circ (a0 , a1 , . . . , an−1 ) the circulant matrix 31
whose first column is $(\,a_0\ \ a_1\ \ \ldots\ \ a_{n-1}\,)^\top$,
$$\mathrm{circ}\,(a_0, a_1, \ldots, a_{n-1}) = \begin{pmatrix} a_0 & a_{n-1} & a_{n-2} & \ldots & a_1 \\ a_1 & a_0 & a_{n-1} & \ldots & a_2 \\ a_2 & a_1 & a_0 & \ldots & a_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & a_{n-3} & \ldots & a_0 \end{pmatrix}.$$
Let $\omega_n = \exp(2\pi i/n)$ and put
$$F_n = \frac{1}{\sqrt{n}} \begin{pmatrix} 1 & 1 & 1 & \ldots & 1 \\ 1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\ 1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)} \end{pmatrix}.$$
The matrix $F_n$ is called the Fourier matrix of order $n$. Obviously, $F_n$ is unitary. By a straightforward computation one can readily verify that
$$\mathrm{circ}\,(a_0, a_1, \ldots, a_{n-1}) = F_n^*\, \mathrm{diag}\,(a(1), a(\omega_n), \ldots, a(\omega_n^{n-1}))\, F_n, \qquad (2.2)$$
where
$$a(z) := a_0 + a_1 z + \cdots + a_{n-1} z^{n-1}. \qquad (2.3)$$
Identities (2.2) and (2.3) tell us that the eigenvalues of $\mathrm{circ}\,(a_0, a_1, \ldots, a_{n-1})$ are $a(1), a(\omega_n), a(\omega_n^2), \ldots, a(\omega_n^{n-1})$ and that
$$\frac{1}{\sqrt{n}}\,(\,1\ \ \omega_n^{j}\ \ \omega_n^{2j}\ \ \ldots\ \ \omega_n^{(n-1)j}\,)^\top$$
is a (normalized) eigenvector for $a(\omega_n^j)$. Notice that the eigenvectors are extended, which means that their moduli do not show any kind of exponential decay.
Let $z_1, \ldots, z_{n-1}$ be the zeros of the polynomial (2.3). For the determinant, we obtain
$$\det \mathrm{circ}\,(a_0, a_1, \ldots, a_{n-1}) = \prod_{j=0}^{n-1} a(\omega_n^j) = \prod_{j=0}^{n-1} a_{n-1} \prod_{k=1}^{n-1} (\omega_n^j - z_k) = a_{n-1}^n \prod_{k=1}^{n-1} \prod_{j=0}^{n-1} (\omega_n^j - z_k)$$
$$= a_{n-1}^n (-1)^{n(n-1)} \prod_{k=1}^{n-1} (z_k^n - 1) = a_{n-1}^n (-1)^{n-1} \prod_{k=1}^{n-1} (1 - z_k^n), \qquad (2.4)$$
because $\prod_{j=0}^{n-1} (x - \omega_n^j) = x^n - 1$.
Now let $b$ be a Laurent polynomial of the form
$$b(t) = \sum_{j=-r}^{s} b_j t^j, \qquad r \ge 1, \quad s \ge 1, \quad b_{-r} b_s \ne 0. \qquad (2.5)$$
If $n \ge \max(r, s) + 1$, then the first and last columns of $T_n(b)$ are
$$(\,b_0\ \ b_1\ \ \ldots\ \ b_s\ \ 0\ \ \ldots\ \ 0\,)^\top \qquad \text{and} \qquad (\,0\ \ \ldots\ \ 0\ \ b_{-r}\ \ \ldots\ \ b_{-1}\ \ b_0\,)^\top,$$
respectively. For n ≥ r + s + 1, we define the circulant matrix Cn (b) as Cn (b) = circ (b0 , b1 , . . . , bs , 0, . . . , 0, b−r , . . . , b−1 ). Thus, Cn (b) results from Tn (b) by “periodization.” For example, if b(t) = b−2 t −2 + b−1 t −1 + b0 + b1 t + b2 t 2 + b3 t 3 , then
$$C_8(b) = \begin{pmatrix}
b_0 & b_{-1} & b_{-2} & 0 & 0 & b_3 & b_2 & b_1 \\
b_1 & b_0 & b_{-1} & b_{-2} & 0 & 0 & b_3 & b_2 \\
b_2 & b_1 & b_0 & b_{-1} & b_{-2} & 0 & 0 & b_3 \\
b_3 & b_2 & b_1 & b_0 & b_{-1} & b_{-2} & 0 & 0 \\
0 & b_3 & b_2 & b_1 & b_0 & b_{-1} & b_{-2} & 0 \\
0 & 0 & b_3 & b_2 & b_1 & b_0 & b_{-1} & b_{-2} \\
b_{-2} & 0 & 0 & b_3 & b_2 & b_1 & b_0 & b_{-1} \\
b_{-1} & b_{-2} & 0 & 0 & b_3 & b_2 & b_1 & b_0
\end{pmatrix}.$$
Proposition 2.1. If $n \ge r + s + 1$, then
$$C_n(b) = F_n^*\, \mathrm{diag}\,(b(1), b(\omega_n), \ldots, b(\omega_n^{n-1}))\, F_n, \qquad (2.6)$$
the eigenvalues of $C_n(b)$ are $b(1), b(\omega_n), \ldots, b(\omega_n^{n-1})$, and a (normalized) eigenvector for $b(\omega_n^j)$ is
$$\frac{1}{\sqrt{n}}\,(\,1\ \ \omega_n^{j}\ \ \omega_n^{2j}\ \ \ldots\ \ \omega_n^{(n-1)j}\,)^\top.$$
Proof. In this case the polynomial (2.3) is
$$a(z) = b_0 + b_1 z + \cdots + b_s z^s + b_{-r} z^{n-r} + \cdots + b_{-1} z^{n-1}.$$
Since
$$a(\omega_n^j) = b_0 + b_1 \omega_n^{j} + \cdots + b_s \omega_n^{js} + b_{-r} \omega_n^{-jr} + \cdots + b_{-1} \omega_n^{-j} = b(\omega_n^j),$$
the assertions follow from the corresponding result on $\mathrm{circ}\,(a_0, a_1, \ldots, a_{n-1})$.
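Proposition 2.1 is easy to check on a computer. The following sketch is our own illustration (not from the original text); it assumes NumPy and uses arbitrary illustrative coefficients with $r = 2$, $s = 3$, $n = 8$, comparing the eigenvalues of $C_n(b)$ with the values $b(\omega_n^j)$.

```python
import numpy as np

# Check of Proposition 2.1 for an illustrative banded symbol with r = 2, s = 3.
coeffs = {-2: 0.5, -1: 2.0, 0: 1.0, 1: -1.0, 2: 0.25, 3: 3.0}
r, s, n = 2, 3, 8                      # n >= r + s + 1

def b(t):
    return sum(c * t**j for j, c in coeffs.items())

# First column of C_n(b): (b_0, b_1, ..., b_s, 0, ..., 0, b_{-r}, ..., b_{-1})^T
col = np.zeros(n, dtype=complex)
for j, c in coeffs.items():
    col[j % n] = c
C = np.array([[col[(j - k) % n] for k in range(n)] for j in range(n)])

omega = np.exp(2j * np.pi / n)
symbol_values = np.array([b(omega**j) for j in range(n)])
assert np.allclose(np.sort_complex(np.linalg.eigvals(C)),
                   np.sort_complex(symbol_values))
```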
Proposition 2.2. If $n \ge r + s + 1$, then the determinant of $C_n(b)$ is
$$\det C_n(b) = b_s^n (-1)^{s(n-1)} \prod_{k=1}^{r+s} (1 - z_k^n),$$
where $z_1, \ldots, z_{r+s}$ are the zeros of the polynomial $z^r b(z)$.
Proof. Using Proposition 2.1 we get
$$\det C_n(b) = \prod_{j=0}^{n-1} b(\omega_n^j) = \prod_{j=0}^{n-1} b_s\, \omega_n^{-jr} \prod_{k=1}^{r+s} (\omega_n^j - z_k) = b_s^n \Big(\prod_{j=0}^{n-1} \omega_n^{-j}\Big)^{r} \prod_{j=0}^{n-1} \prod_{k=1}^{r+s} (\omega_n^j - z_k)$$
$$= b_s^n (-1)^{(n-1)r} (-1)^{(r+s)n} \prod_{k=1}^{r+s} \prod_{j=0}^{n-1} (z_k - \omega_n^j) = b_s^n (-1)^{ns-r} \prod_{k=1}^{r+s} (z_k^n - 1)$$
$$= b_s^n (-1)^{ns-r} (-1)^{r+s} \prod_{k=1}^{r+s} (1 - z_k^n) = b_s^n (-1)^{s(n-1)} \prod_{k=1}^{r+s} (1 - z_k^n).$$
k=1
Example 2.3. Let $b(t) = t + \alpha^2 t^{-1}$, where $\alpha \in (0, 1]$. Since $b(e^{ix}) = (1+\alpha^2)\cos x + i(1-\alpha^2)\sin x$, the values $b(t)$ trace out an ellipse in the counterclockwise direction as $t$ moves along the unit circle in the counterclockwise direction; for $\alpha = 1$, the ellipse degenerates to the line segment $[-2, 2]$. The eigenvalues of $C_n(b)$ are quite regularly distributed on this ellipse, and the eigenvectors are all extended. The zeros of $zb(z) = z^2 + \alpha^2$ are $\pm i\alpha$, and hence for $n \ge 3$, the determinant $\det C_n(b)$ equals
$$(-1)^{n-1}\big(1 - (i\alpha)^n\big)\big(1 - (-i\alpha)^n\big) = (-1)^{n-1}\Big(1 - 2\alpha^n \cos\frac{n\pi}{2} + \alpha^{2n}\Big).$$
2.2 Tridiagonal Toeplitz Matrices
By a tridiagonal Toeplitz matrix we understand a matrix of the form
$$T(a) = \begin{pmatrix} a_0 & a_{-1} & 0 & 0 & \ldots \\ a_1 & a_0 & a_{-1} & 0 & \ldots \\ 0 & a_1 & a_0 & a_{-1} & \ldots \\ 0 & 0 & a_1 & a_0 & \ldots \\ \ldots & \ldots & \ldots & \ldots & \ldots \end{pmatrix}.$$
The symbol of this matrix is $a(t) = a_{-1}t^{-1} + a_0 + a_1 t$. Suppose $a_{-1} \ne 0$ and $a_1 \ne 0$. We fix any value $\alpha = \sqrt{a_{-1}/a_1}$ and define $\sqrt{a_1/a_{-1}} := 1/\alpha$ and $\sqrt{a_1 a_{-1}} := a_1\alpha$. Recall that $T_n(a)$ is the principal $n \times n$ block of $T(a)$.
Theorem 2.4. The eigenvalues of $T_n(a)$ are
$$\lambda_j = a_0 + 2\sqrt{a_1 a_{-1}}\,\cos\frac{\pi j}{n+1} \qquad (j = 1, \ldots, n), \qquad (2.7)$$
and an eigenvector for $\lambda_j$ is $x_j = (\,x_1^{(j)}\ \ldots\ x_n^{(j)}\,)^\top$ with
$$x_k^{(j)} = \Big(\sqrt{a_1/a_{-1}}\Big)^{k} \sin\frac{k\pi j}{n+1} \qquad (k = 1, \ldots, n). \qquad (2.8)$$
Proof. Put $b(t) = t + \alpha^2 t^{-1}$. Thus,
$$T(b) = \begin{pmatrix} 0 & \alpha^2 & 0 & 0 & \ldots \\ 1 & 0 & \alpha^2 & 0 & \ldots \\ 0 & 1 & 0 & \alpha^2 & \ldots \\ 0 & 0 & 1 & 0 & \ldots \\ \ldots & \ldots & \ldots & \ldots & \ldots \end{pmatrix}.$$
Since, obviously, Tn (a) = a0 + a1 Tn (b), it suffices to prove that Tn (b) has the eigenvalues μj = 2α cos (j )
and that xj = ( x1
πj n+1
(j = 1, . . . , n)
. . . xn ) with (j )
xk = α −k sin (j )
kπj n+1
(k = 1, . . . , n)
is an eigenvector for μj . This is equivalent to proving the equalities (j )
(j )
α 2 x2 = μj x1 , (j )
(j )
(j )
xk + α 2 xk+2 = μj xk+1 (j ) xn−1
=
(k = 1, . . . , n − 2),
(2.9)
μj xn(j ) .
But these equalities can easily be verified: For example, (2.9) amounts to α −k sin
kπj (k + 2)πj πj (k + 1)πj + α 2 α −k−2 sin = 2αα −k−1 cos cos , n+1 n+1 n+1 n+1
which follows from the identity sin β + sin γ = 2 cos
β +γ β −γ sin . 2 2
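A quick numerical check of formula (2.7) — our own illustration, not from the book, assuming NumPy and arbitrary coefficients with $a_1 a_{-1} > 0$ so that the spectrum is real:

```python
import numpy as np

# Check of formula (2.7) for a tridiagonal Toeplitz matrix with illustrative
# coefficients (a_1 * a_{-1} > 0, so the eigenvalues are real).
a_m1, a_0, a_1 = 0.36, 2.0, 1.0        # a_{-1}, a_0, a_1
n = 12

Tn = (np.diag([a_0] * n)
      + np.diag([a_m1] * (n - 1), 1)   # superdiagonal carries a_{-1}
      + np.diag([a_1] * (n - 1), -1))  # subdiagonal carries a_1

j = np.arange(1, n + 1)
predicted = a_0 + 2 * np.sqrt(a_1 * a_m1) * np.cos(np.pi * j / (n + 1))
assert np.allclose(np.sort(np.linalg.eigvals(Tn).real), np.sort(predicted))
```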
Example 2.5. Let b be as in Example 2.3. In contrast to the situation for Cn (b), the eigenvalues of Tn (b) are distributed along the interval (−2α, 2α), which is the interval between
the foci of the ellipse $b(\mathbb{T})$. Also notice that the eigenvectors are localized and exponentially decaying from the right for $\alpha \in (0, 1)$ (non-Hermitian case, $b(\mathbb{T})$ is a nondegenerate ellipse) and that they are extended for $\alpha = 1$ (Hermitian case, $b(\mathbb{T})$ degenerates to $[-2, 2]$).
Theorem 2.6. Let $q_1, q_2$ be the zeros of the polynomial $q^2 - a_0 q + a_1 a_{-1}$. Then
$$D_n(a) = \frac{q_2^{n+1} - q_1^{n+1}}{q_2 - q_1} \quad \text{if } q_1 \ne q_2, \qquad (2.10)$$
$$D_n(a) = (n+1)\,q^n \quad \text{if } q_1 = q_2 = q. \qquad (2.11)$$
Proof. It suffices to prove (2.10), because (2.11) results from (2.10) by the limit passage q2 → q1 . We have D1 (a) = a0 = q1 + q2 , D2 (a) = a02 − a−1 a1 = (q1 + q2 )2 − (q1 q2 )2 = q12 + q1 q2 + q22 , Dn (a) = a0 Dn−1 (a) − a−1 a1 Dn−2 (a) (n ≥ 3). Let δn denote the right-hand side of (2.10). Since δ1 = q1 + q2 , δ2 = q12 + q1 q2 + q22 , δn − a0 δn−1 + a−1 a1 δn−2 = 0 (n ≥ 3), it follows that Dn (a) = δn for all n ≥ 1. In particular, for the matrix ⎛ ⎜ ⎜ T (χ−1 + χ1 ) = ⎜ ⎜ ⎝
0 1 0 0 ...
1 0 1 0 ...
0 1 0 1 ...
0 0 1 0 ...
... ... ... ... ...
⎞ ⎟ ⎟ ⎟ ⎟ ⎠
we obtain q1 = i, q2 = −i and thus Theorems 2.4 and 2.6 give ⎧ n ⎨ 0 if n ≡ 1, 3 (mod 4), πj −1 if n ≡ 2 (mod 4), Dn (χ−1 + χ1 ) = 2n cos = n + 1 ⎩ 1 if n ≡ 0 (mod 4). j =1
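Theorem 2.6 can likewise be verified numerically; the sketch below is our own illustration (assuming NumPy, with illustrative coefficients chosen so that $q_1 \ne q_2$) and compares formula (2.10) with the determinant computed directly.

```python
import numpy as np

# Check of Theorem 2.6: D_n(a) for the tridiagonal symbol
# a(t) = a_{-1} t^{-1} + a_0 + a_1 t with illustrative coefficients.
a_m1, a_0, a_1 = 2.0, 3.0, 1.0
q1, q2 = np.roots([1.0, -a_0, a_1 * a_m1])   # zeros of q^2 - a_0*q + a_1*a_{-1}

for n in range(1, 9):
    Tn = (np.diag([a_0] * n)
          + np.diag([a_m1] * (n - 1), 1)
          + np.diag([a_1] * (n - 1), -1))
    assert np.isclose(np.linalg.det(Tn),
                      (q2**(n + 1) - q1**(n + 1)) / (q2 - q1))   # formula (2.10)
```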
2.3 The Baxter-Schmidt Formula
Let
$$a(z) = a_0 + a_1 z + a_2 z^2 + \cdots, \qquad a_0 \ne 0,$$
be a function that is analytic and nonzero in some open neighborhood of the origin. In this neighborhood, we can consider the analytic function
$$c(z) := 1/a(z) = c_0 + c_1 z + c_2 z^2 + \cdots, \qquad c_0 = a_0^{-1}.$$
We denote by Dn (z−r a(z)) and Dr (z−n c(z)) the determinants of the n×n and r ×r principal submatrices of the Toeplitz matrices ⎛ ⎞ ar ar−1 . . . a0 0 0 ... ⎜ ar+1 ar . . . a1 a0 0 ... ⎟ ⎜ ⎟ ⎝ ar+2 ar+1 . . . a2 a1 a0 . . . ⎠ ... ... ... ... ... ... ... and ⎛
cn
⎜ cn+1 ⎜ ⎝ cn+2 ...
cn−1 cn cn+1 ...
... ... ... ...
c0 c1 c2 ...
⎞ 0 ... 0 ... ⎟ ⎟, c0 . . . ⎠ ... ...
0 c0 c1 ...
respectively.
Theorem 2.7 (Baxter and Schmidt). If $n, r \ge 1$, then
$$a_0^{-r}\, D_n(z^{-r}a(z)) = (-1)^{rn}\, c_0^{-n}\, D_r(z^{-n}c(z)). \qquad (2.12)$$
Proof. Since (2.12) is symmetric in n and r, we may without loss of generality assume that n ≥ r. Put ⎛ ⎞ ar ... a1 a0 ... 0 ⎜ ar+1 ... a2 a1 ... 0 ⎟ ⎜ ⎟ ⎜ .. .. .. .. ⎟ ⎜ ⎟ . . . . ⎜ ⎟ ⎟ a . . . a a . . . a A=⎜ n−r n−r−1 0 ⎟ ⎜ n−1 ⎜ an ⎟ . . . a a . . . a n−r+1 n−r 1 ⎟ ⎜ ⎜ .. .. .. .. ⎟ ⎝ . . . . ⎠ an an−1 . . . ar an+r−1 . . . and ⎛
c0 c1 .. .
⎜ ⎜ ⎜ ⎜ ⎜ C=⎜ ⎜ cr−1 ⎜ cr ⎜ ⎜ . ⎝ .. cn−1
... ...
0 0 .. .
0 0 .. .
... ...
c0 c1 .. .
0 1 .. .
...
cn−r
0
Taking into account that a(z)c(z) = 1, we get 0 AC = −R
D ∗
⎞ ... 0 ... 0 ⎟ ⎟ .. ⎟ . ⎟ ⎟ ... 0 ⎟ ⎟. ... 0 ⎟ ⎟ .. ⎟ . ⎠ ... 1 ,
where D is a lower-triangular (n − r) × (n − r) matrix whose diagonal elements are all equal to a0 and where R is the r × r matrix ⎛ ⎞ a 0 cn . . . a0 cn−r+1 ⎜ ⎟ .. R = ⎝ ... ⎠ . ar−1 cn + · · · + a0 cn−r+1 . . . ar−1 cn−r+1 + · · · + a0 cn ⎛ ⎜ ⎜ =⎜ ⎝
⎞⎛
a0 a1 .. .
a0 .. .
ar−1
ar−2
..
. ...
cn
⎟ ⎜ cn+1 ⎟⎜ ⎟⎜ .. ⎠⎝ . cn+r−1 a0
cn−1 cn .. .
... ... .. .
cn−r+1 cn−r+2 .. .
cn+r−2
...
cn
⎞ ⎟ ⎟ ⎟. ⎠
Clearly, det A = Dn (z−r a(z)),
det C = c0r ,
det AC = (−1)r(n−r) a0n−r (−1)r det R, det R = a0r Dr (z−n c(z)),
whence c0r Dn (z−r a(z)) = (−1)r(n−r)+r a0n Dr (z−n c(z)). Since r(n − r) + r ≡ rn (mod 2), c0 = a0−1 , a0 = c0−1 , we therefore arrive at the desired formula (2.12).
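Since identity (2.12) is a polynomial identity in the coefficients, it can be checked numerically for truncated power series. The following sketch is our own illustration, assuming NumPy; the coefficients of $a$ are arbitrary apart from $a_0 = 1$.

```python
import numpy as np

# Check of the Baxter-Schmidt formula (2.12) with truncated power series.
rng = np.random.default_rng(0)
n, r = 6, 4
N = n + r + 5                                    # number of coefficients kept

a = np.concatenate(([1.0], 0.5 * rng.standard_normal(N)))
c = np.zeros(N + 1)                              # c = 1/a as a power series
c[0] = 1.0 / a[0]
for k in range(1, N + 1):
    c[k] = -np.dot(a[1:k + 1], c[k - 1::-1]) / a[0]

def D(coeffs, shift, size):
    """size x size Toeplitz determinant with (j, k) entry coeffs[shift + j - k]."""
    M = np.array([[coeffs[shift + j - k] if 0 <= shift + j - k <= N else 0.0
                   for k in range(size)] for j in range(size)])
    return np.linalg.det(M)

lhs = a[0]**(-r) * D(a, r, n)                    # a_0^{-r} D_n(z^{-r} a(z))
rhs = (-1)**(r * n) * c[0]**(-n) * D(c, n, r)    # (-1)^{rn} c_0^{-n} D_r(z^{-n} c(z))
assert np.isclose(lhs, rhs)
```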
2.4 Widom's Formula
Let now $b$ be a Laurent polynomial,
$$b(t) = \sum_{j=-r}^{s} b_j t^j \qquad (t \in \mathbb{T}).$$
We are interested in a formula for the determinant $D_n(b)$ of the Toeplitz band matrix $T_n(b)$ whose complexity is independent of $n$. If $r \le 0$ or $s \le 0$, then $T_n(b)$ is triangular and hence $D_n(b)$ is the $n$th power of the entry on the main diagonal. Thus, assume $r \ge 1$, $s \ge 1$, $b_{-r} \ne 0$, and $b_s \ne 0$. We can write
$$b(t) = b_s\, t^{-r} \prod_{j=1}^{r+s} (t - z_j) \qquad (t \in \mathbb{T}), \qquad (2.13)$$
where $z_1, \ldots, z_{r+s}$ are the zeros of the polynomial
$$z^r b(z) = b_{-r} + b_{-r+1} z + \cdots + b_s z^{r+s}. \qquad (2.14)$$
Theorem 2.8 (Widom). If the zeros $z_1, \ldots, z_{r+s}$ are pairwise distinct then, for every $n \ge 1$,
$$D_n(b) = \sum_{M} C_M\, w_M^n, \qquad (2.15)$$
where the sum is over all $\binom{r+s}{s}$ subsets $M \subset \{1, 2, \ldots, r+s\}$ of cardinality $|M| = s$ and, with $\overline{M} := \{1, 2, \ldots, r+s\} \setminus M$,
$$w_M := (-1)^s b_s \prod_{j \in M} z_j, \qquad C_M := \prod_{j \in M} z_j^{r} \prod_{j \in M} \prod_{k \in \overline{M}} (z_j - z_k)^{-1}.$$
Proof. From (2.13) we see that Dn (b) = bsn Dn (z−r a(z))
(2.16)
where a(z) = (z − z1 ) · · · (z − zr+s ) = a0 + a1 z + · · · + zr+s . Put c(z) = 1/a(z) = c0 + c1 z + c2 z2 + · · · . The Baxter-Schmidt formula (2.12) gives Dn (z−r a(z)) = (−1)rn a0r c0−n Dr (z−n c(z)) = (−1)rn a0r+n Dr (z−n c(z)).
(2.17)
We decompose c(z) into partial fractions: Bi 1 1 = = a(z) (z − z1 ) · · · (z − zr+s ) zi − z i Bi z Bi Bi 1 z , = = = +1 z 1 − z/z z z z i i i i i i i i
c(z) =
which holds whenever |z| < |zi | for all i (notice that b−r = 0 so that |zi | = 0 for all indices i). Using the Cauchy-Binet formula and the formula for Vandermonde determinants, we obtain cn . . . cn−r+1 .. .. .. Dn (z−n c(z)) = . . . cn+r−1 . . . cn
n+1 n−r+2 ... i Bi /zi i Bi /zi .. .. .. = . . .
n+r n+1 B . . . /z B /z i i i i i i 1/z1n+1 . . . 1/z1n−r+2 B1 ... Br+s .. .. .. .. = . . . . B1 /zr−1 . . . Br+s /zr−1 1/zn+1 . . . 1/zn−r+2 r+s r+s r+s 1 Bi1 1/zin+1 . . . ... Bir 1 .. .. .. = . . . 1≤i1 ≤···≤ir ≤r+s B /zr−1 . . . B /zr−1 1/zin+1 ... i1 i 1 ir ir r
1/zin−r+2 r .. . 1/zin−r+2 r
=
i∈M
=
i∈M
=
i∈M
=
B i 1 · · · Bi r (zi1 · · · zir )n+1 B i 1 · · · Bi r (zi1 · · · zir )n+1
1 B i 1 · · · Bi r (ziβ − ziα ) n+1 (zi1 · · · zir ) zir−1 β β β=α ⎛ ⎝
M
=
M
1 . . . zir−1 1 ... 1 1 .. .. .. .. . . . . 1/zr−1 . . . 1/zr−1 1 . . . zr−1 i1 ir ir 1 1 − (zi − ziα ) z iβ ziα β>α β β>α
⎞⎛ Bi ⎠ ⎝
i∈M
⎛ ⎝
⎞
⎛
Bi ⎠ ⎝
i∈M
⎞
⎜ ⎟ −r−1 ⎟ zi−n−1 ⎠ ⎜ (z − z ) zi i k ⎝ ⎠
i∈M
⎞⎛
⎞
⎛
i,k∈M k =i
i∈M
⎞
⎜ ⎟ zi−n−r ⎠ ⎜ (zi − zk )⎟ ⎝ ⎠.
i∈M
(2.18)
i,k∈M k =i
The coefficients Bi of the partial fraction decomposition are Bi = −
(zi − z ). =i
Hence ⎛ ⎝
⎞
⎛
⎞
⎜ ⎟ Bi ⎠ ⎜ (zi − zk )⎟ ⎝ ⎠
i∈M
= (−1)r
i,k∈M k =i
(zi − zj )−1
i∈M j ∈M
= (−1)r+rs
(zi − z )−1
i,∈M =i
(zi − zk )
i,k∈M k =i
(zi − zj )−1 .
(2.19)
i∈M j ∈M
Combining (2.16), (2.17), (2.18), and (2.19), we arrive at the equality
Dn (b) = bsn (−1)rn a0r+n
M
⎛ ⎝
i∈M
⎞
⎛
⎞
⎜ ⎟ zi−n−r ⎠ (−1)r−rs ⎜ (zi − zj )−1 ⎟ ⎝ ⎠. i∈M j ∈M
Since a0 = (−1)r+s z1 · · · zr+s , it follows that
Dn (b) = bsn (−1)rn+(r+s)(r+n)+r+rs
⎛ ⎝
j ∈M
M
⎞
⎞
⎛
⎟ ⎜ −1 ⎟ zin+r ⎠ ⎜ (z − z ) i j ⎠. ⎝ i∈M j ∈M
As rn + (r + s)(r + n) + r + rs ≡ 2rn + 2rs + r(r + 1) + sn ≡ sn (mod 2) and ⎛ ⎛ ⎞ ⎞ ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ n+r ⎜ ⎟ −1 ⎟ ⎝ ⎝ zin+r ⎠ ⎜ (z − z ) zi ⎠ ⎜ (zi − zk )−1 ⎟ i j ⎝ ⎝ ⎠= ⎠, M
j ∈M
M
i∈M j ∈M
j ∈M
k∈M j ∈M
we finally obtain (2.15). Example 2.9. Let T (a) be the tridiagonal matrix considered in Section 2.2 and write a(t) = t −1 (t − z1 )(t − z2 ). Suppose z1 = z2 . There are exactly two sets M ⊂ {1, 2} of cardinality |M| = 1, namely M = {1} and M = {2}. Thus, by Theorem 2.8, w{1} = (−1)a1 z1 ,
C{1} = z1 (z1 − z2 )−1 ,
w{2} = (−1)a1 z2 ,
C{2} = z2 (z2 − z1 )−1 ,
Dn (a) =
zn+1 − z2n+1 z1 z2 (−1)n a1n z1n + (−1)n a1n z2n = (−1)n a1n 1 . z1 − z 2 z2 − z 1 z2 − z 1
We leave it as an exercise to verify that this is in accordance with the formula provided by Theorem 2.6.
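Widom's formula is also convenient for a numerical sanity check. The sketch below is our own illustration, not from the book; it assumes NumPy, evaluates the right-hand side of (2.15) for an illustrative symbol whose polynomial (2.14) happens to have pairwise distinct zeros, and compares it with $\det T_n(b)$ computed directly.

```python
import numpy as np
from itertools import combinations

# Check of Widom's formula (2.15) against det T_n(b) for an illustrative symbol.
r, s = 2, 3
b = {-2: 1.0, -1: 0.5, 0: -3.0, 1: 2.0, 2: -1.0, 3: 0.7}

poly = [b[s - m] for m in range(r + s + 1)]      # z^r b(z), highest degree first
z = np.roots(poly)                               # zeros z_1, ..., z_{r+s}
bs = b[s]

def widom_Dn(n):
    total = 0.0
    for M in combinations(range(r + s), s):
        Mbar = [k for k in range(r + s) if k not in M]
        wM = (-1)**s * bs * np.prod([z[j] for j in M])
        CM = (np.prod([z[j]**r for j in M])
              * np.prod([1.0 / (z[j] - z[k]) for j in M for k in Mbar]))
        total += CM * wM**n
    return total

for n in range(1, 8):
    Tn = np.array([[b.get(j - k, 0.0) for k in range(n)] for j in range(n)])
    assert np.isclose(widom_Dn(n), np.linalg.det(Tn))
```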
2.5 Trench's Formula
Let $b$ be as in Section 2.4. We now establish a formula for $D_n(b)$ that is also applicable to the case where the polynomial (2.14) has multiple zeros. We denote by $g_n(z)$ the row
$$g_n(z) = (\,1\ \ z\ \ \ldots\ \ z^{r-1}\ \ z^{r+n}\ \ z^{r+n+1}\ \ \ldots\ \ z^{r+n+s-1}\,).$$
Let $\xi_1, \ldots, \xi_m$ be the distinct roots of the polynomial (2.14) and let $\alpha_1, \ldots, \alpha_m$ be their multiplicities. We define $G_n$ as the determinant of the $(r+s) \times (r+s)$ matrix $A_{r+s}$ whose first $\alpha_1$ rows are $g_n(\xi_1), g_n'(\xi_1), \ldots, g_n^{(\alpha_1-1)}(\xi_1)$, whose next $\alpha_2$ rows are $g_n(\xi_2), g_n'(\xi_2), \ldots, g_n^{(\alpha_2-1)}(\xi_2)$, and so on.
Theorem 2.10 (Trench). We have $G_0 \ne 0$ and
$$D_n(b) = (-1)^{ns}\, b_s^n\, \frac{G_n}{G_0} \qquad \text{for every } n \ge 1. \qquad (2.20)$$
Confluent Vandermonde determinants. Before turning to the proof of this theorem, we cite the formula for the so-called confluent Vandermonde determinants. Let hβ (z) be the
row ( zβ zβ+1 . . . zβ+α−1 ). Given m distinct number ξ1 , . . . , ξm and natural numbers α1 , . . . , αm such that α1 + · · · + αm = α, we let V (ξ1 (β1 , α1 ), . . . , ξm (βm , αm )) denote the determinant of the α × α matrix whose rows are m −1) (ξm ). hβ1 (ξ1 ), hβ1 (ξ1 ), . . . , hβ(α11 −1) (ξ1 ), . . . , hβm (ξm ), hβm (ξm ), . . . , h(α βm
In the case α1 = · · · = αm = 1, this is a pure Vandermonde determinant and hence V (ξ1 (β1 , 1), . . . , ξα (βα , 1)) =
i
β
ξi i
(ξj − ξi ).
j >i
Appropriate limit passages and l’Hospital’s rule therefore yield V (ξ1 (β1 , α1 ), . . . , ξm (βm , αm )) =
α βi
ξi i
i
G(αi + 1)
i
(ξj − ξi )αj αi ,
(2.21)
j >i
where G(α + 1) := (α − 1)!(α − 2)! . . . 1!0!. Proof of Theorem 2.10. By formula (2.21), G0 = V (ξ1 (0, α1 ), . . . , ξm (0, αm )) =
G(αi + 1)
i
(ξj − ξi )αj αi = 0.
j >i
Suppose first that the zeros z1 , . . . , zr+s of the polynomial (2.14) are all simple. We show that in this case the right-hand side of (2.20) coincides with the right-hand side of (2.15). Let M range over all subsets {j1 , . . . , js } of {1, 2, . . . , r + s} whose cardinality is s and put {1, 2, . . . , r + s} \ M =: {k1 , . . . , kr }. Using Laplace’s expansion theorem, we obtain that Gn is (−1)1+···+r+k1 +···+kr
1 .. .
z k1 .. .
...
zkr−1 1 .. .
r+n zj 1 .. . r+n z js
...
zjr+n+s−1 1 .. . zjr+n+s−1 s
zkr−1 ... r = (−1)1+···+r+k1 +···+kr (zj1 . . . zjs )r+n (zkα − zkβ ) (zjγ − zjδ ). M
1
z kr
...
α>β
M
(2.22)
γ >δ
Permuting the rows 1, 2, . . . , r + s of G0 to k1 , . . . , kr , j1 , . . . , js shows that G0 equals (−1)(k1 −1)+(k2 −2)+···+(kr −r)
α>β
(zkα − zkβ )
γ >δ
(zjγ − zjδ )
(zjγ − zkα ).
(2.23)
γ ,α
From (2.22) and (2.23) we get Gn = (zj1 . . . zjs )r+n (zjγ − zkα )−1 G0 γ ,α M ⎛ ⎞n ⎝ = zj ⎠ zjr (zj − zk )−1 j ∈M
M
=
j ∈M
j ∈M k∈M
n ((−1)s bs−1 wM )n CM = (−1)sn bs−n CM wM , M
M
which completes the proof in the case of simple zeros. If there are multiple zeros among z1 , . . . , zr+s , then the formula follows from what was just proved by the appropriate limit passages.
2.6
Szegö's Strong Limit Theorem
Let $b(t) = \sum_{j=-r}^{s} b_j t^j$ be a Laurent polynomial and suppose
$$b(t) \ne 0 \ \text{ for } t \in \mathbb{T} \qquad \text{and} \qquad \mathrm{wind}\, b = 0. \qquad (2.24)$$
In that case there is a function $\log b \in C(\mathbb{T})$ such that $b = \exp(\log b)$. Clearly, $\log b$ is determined uniquely up to an additive constant in $2\pi i\mathbb{Z}$. Let $(\log b)_k$ be the $k$th Fourier coefficient of $\log b$ and put
$$G(b) = \exp(\log b)_0, \qquad (2.25)$$
$$E(b) = \exp \sum_{k=1}^{\infty} k\, (\log b)_k (\log b)_{-k}. \qquad (2.26)$$
Since $b \in C^\infty(\mathbb{T})$, the function $\log b$ also belongs to $C^\infty(\mathbb{T})$. This implies that $(\log b)_k = O(1/|k|^m)$ for every $m \ge 1$ and shows that the series in (2.26) converges absolutely. The constants $G(b)$ and $E(b)$ are obviously independent of the particular choice of $\log b$.
We know from Section 1.4 that if (2.24) holds, then
$$b(t) = b_s \prod_{i=1}^{k} \Big(1 - \frac{\delta_i}{t}\Big)^{\varrho_i} \prod_{j=1}^{\ell} (t - \mu_j)^{\sigma_j},$$
where $\delta_1, \ldots, \delta_k, \mu_1, \ldots, \mu_\ell$ are distinct, $|\delta_i| < 1$, $|\mu_j| > 1$, $\varrho_1 + \cdots + \varrho_k = r$, and $\sigma_1 + \cdots + \sigma_\ell = s$. On writing
$$\log b(t) = \log b_s + \sum_{i=1}^{k} \varrho_i \log\Big(1 - \frac{\delta_i}{t}\Big) + \sum_{j=1}^{\ell} \sigma_j \log(-\mu_j) + \sum_{j=1}^{\ell} \sigma_j \log\Big(1 - \frac{t}{\mu_j}\Big)$$
and using the formula $\log(1 - z) = -z - z^2/2 - z^3/3 - \cdots$ ($|z| < 1$), we get
$$(\log b)_0 = \log b_s + \sum_{j=1}^{\ell} \sigma_j \log(-\mu_j),$$
$$(\log b)_n = -\sum_{j=1}^{\ell} \frac{\sigma_j}{n\,\mu_j^n} \quad (n \ge 1), \qquad (\log b)_{-n} = -\sum_{i=1}^{k} \frac{\varrho_i\, \delta_i^n}{n} \quad (n \ge 1).$$
Thus, the constants (2.25) and (2.26) are
$$G(b) = b_s (-1)^s \prod_{j=1}^{\ell} \mu_j^{\sigma_j}, \qquad (2.27)$$
$$E(b) = \prod_{i=1}^{k} \prod_{j=1}^{\ell} \Big(1 - \frac{\delta_i}{\mu_j}\Big)^{-\varrho_i \sigma_j}. \qquad (2.28)$$
Theorem 2.11 (Szegö’s strong limit theorem). If b is a Laurent polynomial satisfying (2.24), then Dn (b) = G(b)n E(b)(1 + O(q n )) as n → ∞
(2.29)
with some constant q ∈ (0, 1). Proof. Formula (2.20) and Laplace expansion of the determinant Gn show that (−1)ns bsb V (δ1 (0, 1 ), . . . , δk (0, k )) G0 × V (μ1 (r + s, σ1 ), . . . , μ (r + n, σ )) (1 + O(q n ))
Dn (b) =
with max | δi | / min |μj | < q < 1.
(2.30)
By formula (2.21), V (δ1 (0, 1 ), . . . , δk (0, k )) = G(i + 1) (δi2 − δi1 )i2 i1 , i
i2 >i1
V (μ1 (r + s, σ1 ), . . . , μ (r + n, σ )) σ (r+n) = μj i G(σj + 1) (μj2 − μj1 )σj2 σj1 , j
j
j2 >j1
45
G0 = V (δ1 (0, 1 ), . . . , δk (0, k ), μ1 (0, σ1 ), . . . , μ (0, σ )) = G(i + 1) G(σj + 1) i
×
j
(δi2 − δi1 )
i2 i1
i2 >i1
(μj2 − μj1 )σj2 σj1
j2 >j1
j
(μj − δi )σj i ,
i
whence (−1)ns bsb V (δ1 (0, 1 ), . . . , δk (0, k )) V (μ1 (r + s, σ1 ), . . . , μ (r + n, σ )) G0 ⎛ ⎞n σj σj r = (−1)ns bsn ⎝ μj ⎠ μj (μj − δi )−σj i , j
and since
σ r
μj j
j
j
=
j
σ r μj j
j
j
i
(μj − δi )−σj i
i
j
−σ r μj j
i
δi 1− μj
−σj i
=
j
i
δi 1− μj
−σj i ,
formulas (2.27) and (2.28) imply the assertion.
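The convergence asserted by Theorem 2.11 can be observed numerically. The following sketch is our own illustration, assuming NumPy; it uses an illustrative positive symbol, approximates the Fourier coefficients of $\log b$ by the FFT, and prints $D_n(b)/G(b)^n$ next to $E(b)$.

```python
import numpy as np

# Illustration of Theorem 2.11 for the (illustrative) positive symbol
# b(e^{ix}) = 5 + 4*cos(x) + cos(2x), which has no zeros on T and wind b = 0.
coeffs = {-2: 0.5, -1: 2.0, 0: 5.0, 1: 2.0, 2: 0.5}

N = 4096
t = np.exp(2j * np.pi * np.arange(N) / N)
logb = np.log(sum(c * t**j for j, c in coeffs.items()))
hat = np.fft.fft(logb) / N                 # hat[k] ~ (log b)_k for small |k|

G = np.exp(hat[0].real)
E = np.exp(sum(k * hat[k] * hat[-k] for k in range(1, 60)).real)

for n in (10, 20, 40):
    Tn = np.array([[coeffs.get(j - k, 0.0) for k in range(n)] for j in range(n)])
    print(n, np.linalg.det(Tn) / G**n, "->", E)
```

The printed ratios approach $E(b)$ quickly, in accordance with the exponentially small error term in (2.29).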
2.7 The Szegö-Widom Theorem The road from Baxter-Schmidt through Widom and Trench led to a completely elementary but rather computational proof of Szegö’s strong limit theorem. With a little bit of operator theory, this theorem can be proved in a straightforward and very nifty way. 2 Trace class. Let (cj k )∞ j,k=0 be an infinite matrix that defines a compact operator K on . The operator K is said to be a Hilbert-Schmidt operator if ∞
|cj k |2 < ∞,
j,k=0
and K is called a trace class operator if it is the product of two Hilbert-Schmidt operators. The Hilbert-Schmidt and trace class operators form two-sided ideals in the algebra of all bounded operators. Operator determinants. Let K be a trace class operator and let {λj (K)}N j =1 (N finite or N = ∞) be the sequence of its eigenvalues, each eigenvalue repeated according to its algebraic multiplicity. Then N
|λj (K)| < ∞,
j =1
i
i i
i
i
i
i
46
buch7 2005/10/5 page 46 i
Chapter 2. Determinants
and the determinant det(I + K) is defined by det(I + K) =
N
(1 + λj (K)).
j =1
The operator I + K is invertible if and only if det(I + K) = 0. If K and L are trace class operators, then det(I + K)(I + L) = det(I + K) det(I + L).
(2.31)
If C is an invertible operator, then det(I + CKC −1 ) = det C(I + K)C −1 = det(I + K).
(2.32)
For n ≥ 1, let Pn : 2 → 2 be the projection defined by Pn : {x0 , x1 , x2 , . . . } → {x0 , x1 , . . . , xn−1 , 0, . . . }. We identify Pn (I + K)Pn with the n × n matrix (δj k + cj k )n−1 j,k=1 , where δj k is the Kronecker delta. Then, if K is of trace class, lim det Pn (I + K)Pn = det(I + K).
(2.33)
n→∞
Prologue. Recall that H (a) is the Hankel operator given by the infinite matrix (aj +k+1 )∞ j,k=0 . Obviously, H (a) is Hilbert-Schmidt ⇐⇒
∞
n|an |2 < ∞.
(2.34)
n=1
Now let b be a Laurent polynomial satisfying (2.24). We then can write b = b− b+ with s r δi 1− (t − μj ), (2.35) , b+ (t) = bs b− (t) = t i=1 j =1 −1 where |δi | < 1 and |μj | > 1. The matrix T (b− )! is upper triangular with 1 on the main −1 diagonal, while T (b+ ) is lower triangular with (bs (−μj ))−1 on the main diagonal. This implies that −1 −1 )Pn = T (b− )Pn , Pn T (b−
and
−1 −1 Pn T (b+ )Pn = Pn T (b+ )
⎛ −1 ) = 1, det Tn (b−
−1 det Tn (b+ ) = ⎝bs
s
(2.36)
⎞−n (−μj )⎠
= G(b)−n
(2.37)
j =1
(recall (2.27)). By Proposition 1.3, b−1 ) T (b)T (b−1 ) = I − H (b)H (
(2.38)
i
i i
i
i
i
i
2.7. The Szegö-Widom Theorem
buch7 2005/10/5 page 47 i
47
and −1 −1 −1 −1 −1 T (b+ )T (b)T (b− ) = T (b− )T (b− ) − H (b+ )H ( b)T (b− ) −1 −1 = I − H (b+ )H (b)T (b− ). −1 −1 )T (b)T (b− ) are of the We therefore deduce from (2.34) that both T (b)T (b−1 ) and T (b+ form I + trace class operator.
Theorem 2.12 (Szegö-Widom limit theorem). If b is a Laurent polynomial without zeros on the unit circle and with winding number zero, then lim
n→∞
Dn (b) = det T (b)T (b−1 ). G(b)n
Proof. From (2.36) and (2.37) we see that −1 −1 −1 −1 )T (b)T (b− )Pn = det Pn T (b+ )Pn T (b)Pn T (b− )Pn det Pn T (b+ −1 −1 = det Tn (b+ )Tn (b)Tn (b− ) = Dn (b)/G(b)n .
(2.39)
−1 −1 )T (b)T (b− ) − I is of trace class, we infer from (2.33) that On the other hand, since T (b+ −1 −1 −1 −1 )T (b)T (b− )Pn = det T (b+ )T (b)T (b− ), lim det Pn T (b+
n→∞
and (2.32) shows that −1 −1 −1 −1 )T (b)T (b− ) = det T (b)T (b− )T (b+ ) = det T (b)T (b−1 ). det T (b+
To compute the operator determinant det T (b)T (b−1 ) we need two auxiliary results. We abbreviate det T (b)T (b−1 ) to E(b). Lemma 2.13. Let f± , g± , h± be Laurent polynomials in W± without zeros on T and with winding number zero. Then E(f− g+ h+ ) = E(f− g+ )E(f− h+ ), E(g− h− f+ ) = E(g− f+ )E(h− f+ ).
(2.40)
Proof. Using (2.31) and (2.32), we obtain −1 E(f− g+ h+ ) = det T (f− g+ h+ )T (f−−1 h−1 + g+ ) −1 = det T (f− )T (g+ )T (h+ )T (f−−1 )T (h−1 + )T (g+ ) −1 = det T (g+ )T (f− )T (g+ )T (h+ )T (f−−1 )T (h−1 + ) −1 = det T (g+ )T (f− )T (g+ )T (f−−1 ) det T (f− )T (h+ )T (f−−1 )T (h−1 + ) −1 = det T (f− )T (g+ )T (f−−1 )T (g+ ) det T (f− )T (h+ )T (f−−1 )T (h−1 + )
= E(f− g+ )E(f− h+ ). The proof of the second equality of (2.40) is analogous.
i
i i
i
i
i
i
48
buch7 2005/10/5 page 48 i
Chapter 2. Determinants Lemma 2.13 implies that * + δi E(b) = E 1− (t − μj ) , t i=1 j =1 s r
(2.41)
and hence we are left with computing the factors on the right-hand side of (2.41). Lemma 2.14. If |δ| < 1 and |μ| > 1, then * + δ δ −1 E 1− . (t − μ) = 1 − t μ
(2.42)
Proof. Put c(t) = (1 − δ/t)/(t − μ). By virtue of Proposition 1.3, E(c) = det T (c)T (c−1 ) = det(I − H (c)H ( c−1 )). Clearly, I − H (c)H ( c−1 ) equals ⎛ 1 0 0 ... ⎜ 0 0 . .. I −⎜ ⎝ 0 ... ... ⎛ 1 − (c−1 )−1 ∗ ⎜ 0 1 =⎜ ⎝ 0 0 ... ...
⎞⎛ ⎟⎜ ⎟⎜ ⎠⎝ ∗ ∗ 1 ...
(c−1 )−1 (c−1 )−2 (c−1 )−3 ... ⎞ ... ... ⎟ ⎟, ... ⎠ ...
(c−1 )−2 (c−1 )−3 ...
(c−1 )−3 ...
...
⎞ ⎟ ⎟ ⎠
which shows that E(c) = 1 − (c−1 )−1 . The −1st Fourier coefficient of c−1 is * + δ2 t t2 δ 1 −1 (c )−1 = 1 + + 2 + · · · 1 + + 2 + ··· − t t μ μ μ 0 1 δ δ2 δ3 1 =− δ+ + 2 + ··· = − , μ μ μ μ 1 − δ/μ whence 1 − (c−1 )−1 = 1/(1 − δ/μ. Combining (2.41) and (2.42) we arrive at the formula s r δi −1 det T (b)T (b−1 ) = 1− , μj i=1 j =1 which is in accordance with (2.28).
2.8
Geronimo, Case, Borodin, Okounkov
Jacobi’s theorem says that if A is an invertible m × m matrix, then the determinant of the upper-left n×n block of A−1 is equal to the determinant of the lower-right (m−n)×(m−n) block of A divided by det A. Let Qn : 2 → 2 be the projection acting by the rule Qn : {x0 , x1 , x2 , . . . } → {0, . . . , 0, xn , xn+1 , . . . }.
i
i i
i
i
i
i
2.8. Geronimo, Case, Borodin, Okounkov
buch7 2005/10/5 page 49 i
49
Thus, Qn = I − Pn . If A = I + K with a trace class operator K, then det Pn A−1 Pn =
det Qn AQn . det A
(2.43)
Indeed, with A replaced by Pm APm , this is Jacobi’s theorem, and for general A the formula follows from (2.33) after the limit passage m → ∞. Now let b = b− b+ , where b− and b+ are given by (2.35). Put −1 , u = b− b+
−1 v = b− b+ .
Since uv = 1, we have T (u)T (v) = I − H (u)H ( v ). Notice that H (u)H ( v ) is in the trace class. The remarkable formula contained in the following theorem was established by Geronimo and Case [127] in 1979 (for positive symbols b) and rediscovered by Borodin and Okounkov [31] in 2000. Theorem 2.15. Let b be a Laurent polynomial without zeros on the unit circle and with winding number zero. Then for all n ≥ 1, det Qn T (u)T (v)Qn Dn (b) det(I − Qn H (u)H ( v )Qn ) = = . n G(b) det T (u)T (v) det(I − H (u)H ( v ))
(2.44)
Proof. By (2.39) and (2.43), det Qn T (b− )T −1 (b)T (b+ )Qn Dn (b) −1 −1 = det P T (b )T (b)T (b )P = , n n + − G(b)n det T (b− )T −1 (b)T (b+ ) and as −1 −1 )T (b− )T (b+ ) = T (u)T (v) = I − H (u)H ( v ), T (b− )T (b)T (b+ ) = T (b− )T (b+
we arrive at the assertion. Since det(I − Qn H (u)H ( v )Qn ) → 1 as n → ∞, formula (2.44) implies at once the Szegö-Widom limit theorem: Dn (b) 1 = = det T −1 (v)T −1 (u) n n→∞ G(b) det T (u)T (v) −1 −1 = det T (b+ )T (b− )T (b+ )T (b− ) lim
−1 −1 = det T (b− )T (b+ )T (b− )T (b+ ) = det T (b)T (b−1 ).
i
i i
i
i
i
i
50
buch7 2005/10/5 page 50 i
Chapter 2. Determinants
Exercises 1. Prove that
⎛ ⎜ ⎜ det ⎜ ⎝
d1 x .. .
x d2 .. .
... ... .. .
x x .. .
x
x
...
dn
⎞ ⎟ ⎟ ⎟ ⎠
= x(d1 − x) · · · (dn − x) 2. Prove that
⎛ ⎜ ⎜ ⎜ det ⎜ ⎜ ⎝
x a1 a1 .. .
a1 x a2 .. .
a2 a2 x .. .
a1
a2
a3
1 1 1 + + ··· + x d1 − x dn − x
.
⎞ . . . an . . . an ⎟ ⎟ . . . an ⎟ ⎟ .. ⎟ .. . . ⎠ ... x
= (x + a1 + a2 + · · · + an )(x − a1 )(x − a2 ) · · · (x − an ). 3. Let b(t) = x + t + t −1 . Prove that n − 1 n−2 n − 2 n−4 Dn (b) = x n − x + x − +··· . 1 2 4. Let ωn = e2π i/n . ⎛ 1 ⎜ 1 ⎜ ⎜ det ⎜ 1 ⎜ .. ⎝ .
Prove that
1 5. Prove that
⎛
1
⎜ 1 ⎜ 2 det ⎜ ⎜ .. ⎝ .
1 n
1 ωn ωn2 .. .
1 ωn2 ωn4 .. .
... 1 . . . ωnn−1 . . . ωn2(n−1) .. .
ωnn−1
ωn2(n−1)
...
1 2 1 3
1 3 1 4
.. .
.. .
1 n+1
1 n+2
... ...
⎟ ⎟ ⎟ ⎟ = nn/2 i −(n−1)(n+2)/2 . ⎟ ⎠
ωn(n−1)(n−1)
1 n 1 n+1
.. . ...
⎞
⎞ ⎟ 3 ⎟ ⎟ = [1! 2! · · · (n − 1)!] ⎟ n! (n + 1)! · · · (2n − 1)! ⎠
1 2n−1
and that, with G(m) := (m − 2)! · · · 2! 1! 0!, this is the same as (G(n + 1))4 . G(2n + 1)
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 51 i
51
6. Let a1 = 1, a2 = 3, a3 = 240, a4 = 1512000. Which of the three numbers 1512030752000,
1536288768000, 1541291254000
is a5 ? 7. Let b(t) = 8t 2 − 54t + 101 − 54t −1 + 8t −2 . Prove that Dn (b) > 26n−1 for all sufficiently large n. 8. Let b ∈ P satisfy (2.24) and let b = b− b+ be a Wiener-Hopf factorization. (a) Show that T (b)T (b−1 ) = eT (log b− ) eT (log b+ ) e−T (log b− ) e−T (log b− ) . (b) Show that tr (T (log b− )T (log b+ ) − T (log b+ )T (log b− )) = tr H (log b)H ((log b)) =
∞
k(log b)k (log b)−k .
k=1
9. Let b ∈ P and suppose T (b) is invertible. Show that det Pn T −1 (b)Pn = 1/G(b)n and that, therefore, the Szegö-Widom limit theorem can also be written as det Pn T (b)Pn det Pn T −1 (b)Pn → det T (b)T (b−1 ). 10. Let b ∈ Pr and suppose that b has no zeros on T and winding number zero. Prove that Dn (b−1 ) = G(b−1 )n det T (b−1 )T (b) for all n ≥ r. 11. Let |αi | < 1 and |βj | < 1. Prove that 1 1 ! Dn ! = −1 1 − α i βj (1 − αj t) (1 − βj t ) i,j for all n ≥ 1. 12. Let a ∈ W and suppose that a(t) > 0 for t ∈ T. Prove that if there are nonzero such that constants G(a) and E(a) n E(a) Dn (a) = G(a) for all n ≥ r, then a −1 ∈ Pr . 13. (a) Show that if H (a) is a trace class operator, then |a1 | + |a2 | + |a3 | + · · · < ∞. (b) Show that if |a1 | + 2 |a2 | + 3 |a3 | + · · · < ∞, then H (a) is of trace class. (c) Prove that if a ∈ C 2 (T), then H (a) is a trace class operator.
i
i i
i
i
i
i
52
buch7 2005/10/5 page 52 i
Chapter 2. Determinants
14. Let b(t) = t −70 (t − α)40 (t − β)60 with distinct points α and β on T. Prove that 1 G(31)G(21)G(11) α 400 β 600 10 20 n 1100 1+O Dn (b) = . (α β ) n G(60) (α − β)1000 n 15. Let
τ1 δ1 t γ1 t γ2 τ 2 δ2 b(t) = 1 − 1− 1− , 1− t τ1 t τ2
where τ1 ∈ T, τ2 ∈ T, τ1 = τ2 , and δ1 , γ1 , δ2 , γ2 are nonnegative integers. Find all quadruples (δ1 , γ1 , δ2 , γ2 ) such that δ1 + γ1 = 9, δ2 + γ2 = 3, and Dn (b)/nδ1 γ1 +δ2 γ2 converges to a finite and nonzero limit as n → ∞. 16. Let b be a nonzero Laurent polynomial. Prove that Dn (b) equals 2π 2π n iθ 1 e j − eiθk 2 · · · b(eiθj ) dθ1 . . . dθn . (2π)n n! 0 0 1≤j 0 and the derivative b satisfies a Hölder condition, then Dn (b)/G(b)n → E(b) as n → ∞. Baxter [18], Hirschman [164], and Devinatz [102] then became to understand that the positivity of b may be replaced by the “index zero condition” (2.24), and they gave different proofs of the formula Dn (b) = G(b)n E(b)(1 + o(1)) for symbols b from classes much larger than the class of Laurent polynomials. The development culminated with Widom’s paper [294]. Basor [13] writes: “The proofs of the various Szegö theorems were for the most part difficult, indirect, and worst of all gave no ‘natural’indication why the terms in the expansion, especially the E(b), occurred. Fortunately, this state of affairs was considerably altered in 1976 by Widom [294], whose elegant application of ideas from operator theory extended Szegö’s theorem to the block case and gave easy proofs of the results.” For more details on Szegö’s limit theorem we refer the reader to the books [70] and [71]. Comprehensive studies of trace class operators and infinite determinants are in [133], [217], [254]. The story with formula (2.44) started in 1979, but let us begin 20 years later. In June 1999, during an MSRI workshop on random matrices, Alexander Its and Percy Deift
i
i i
i
i
i
i
54
buch7 2005/10/5 page 54 i
Chapter 2. Determinants
raised the question of whether there is a general formula that expresses the determinant of the Toeplitz matrix Tn (a) as the Fredholm determinant of an operator I − K, where K acts on 2 {n, n + 1, . . . }. Borodin and Okounkov [31] then showed that such a formula indeed exists. The form in which we cited their formula in Theorem 2.15 is due to Harold Widom. The original proof by Borodin and Okounkov is based on representation theory and combinatorics, in particular on results by Okounkov on infinite wedge and random partitions and a theorem by I. M. Gessel expressing a Toeplitz determinant as a sum over partitions of products of Schur functions. Two other proofs were subsequently given by Basor and Widom [16]. The first of these proofs uses an identity for det Tn−1 (a) / det Tn (a) containing just H (b)H (c), ˜ which was established by Widom in 1973, and the second is a further development of the argument employed by Basor and J. W. Helton in 1980 to prove the Szegö-Widom limit theorem. The two proofs by Basor and Widom are operator-theoretic and very lucid. A third operator-theoretic proof was found [37]. It follows from the identity −1 −1 Tn−1 (a) = Tn (a+ )(I − Pn T (c)Qn XQn T (b)Pn )Tn (a− ),
where X = (I − Qn H (b)H (c)Q ˜ n )−1 . This identity, which lifts formula (2.44) from the determinant level to the matrix level, was obtained in 1980 by B. Silbermann and one of the authors [63]. The proof given here is from [38]. It is a modification of the second proof of [16] and is the probably shortest proof of identity (2.44). In July 2003, the people involved in formula (2.44) and its proof since 1999 received an email from Percy Deift. This email was as follows. “Recently Jeff Geronimo showed me a 1979 paper of his with Ken Case in which they wrote down the Borodin-Okounkov formula in the context of proving strong Szegö. The reference is [127]. See, in particular, formula VII.28 on page 308. It’s quite remarkable that the formula was already known in 1979. The proof of the formula by Geronimo-Case is inverse-scattering theoretic and is the analogue of Dyson’s second-derivative log det formula for the Schrödinger case.” There is nothing we can add, except for our congratulations to Jeff Geronimo and Ken Case on their brilliant feat, for our gratification of the eventual recognition of their great success, and for expressing our regret that their names are missing in [16], [31], [37], [38]. Simon’s book [255] contains a proof of (2.44)
under the most general (and natural) 2 smoothness conditions on b: It is only required that ∞ k=−∞ |k| |bk | < ∞. This proof is based on the proof given here and on additional technical arguments due to Rowan Killip and Percy Deift. Exercises 1 to 5 are from [113] and [208]. The observation of Exercise 10 was probably already known to Szegö (see [145]). Exercise 11 is due to Baxter [17] (see also [79] and [174]). Solutions to Exercises 10 and 12 are in [63] and [67]. The problem of characterizing the symbols a for which H (a) is of trace class had been open for a long time before it was completely solved by Peller [202], [203] (see also his recent capital monograph [204]). Simple solutions to Exercise 13 can be found in [67]. For Exercise 15 see [64]. The formula of Exercise 16 goes back to Szegö [262]. The result of Exercise 17 is due to Fejér and F. Riesz; see, e.g., [145]. Exercise 18 summarizes classical results by Szegö and we refer the reader to [108], [145], [255] for proofs and more material. The link between Exercises 8(a) and 8(b) is the formula det eA eB e−A e−B = etr (AB−BA) ,
(2.45)
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 55 i
55
which is true whenever A and B are bounded on 2 and AB − BA is a trace class operator on 2 . Formula (2.45) appeared first in [158] and [206]. It was used by Widom in [294] to show that ∞ k(log b)k (log b)−k det T (b)T (b−1 ) = exp k=1
and thus to recover Szegö’s original strong limit theorem from Theorem 2.12. The proofs of (2.45) given in [158] and [206] are difficult. Recently Torsten Ehrhardt [106] found a remarkably simple proof of the stronger identity det eA eB e−A−B = etr 2 (AB−BA) . 1
(2.46)
This identity is true for arbitrary A, B ∈ B( ) for which AB − BA is of trace class. Note that (2.46) implies that 2
det eA+B e−A e−B = etr 2 (AB−BA) , 1
(2.47)
and that multiplication of (2.46) and (2.47) yields (2.45). Further results: Fisher-Hartwig symbols. The formulas established by Widom and Trench can be employed to compute the determinants Dn (b) for an arbitrary Laurent polynomial b. However, in order to understand the asymptotic behavior of Dn (b), additional work must be done (see, for example, Exercise 14). In the following we cite some asymptotic results for Laurent polynomials in the so-called Fisher-Hartwig class. Let first t γ τ δ c(t), t ∈ T, 1− b(t) = 1 − t τ where τ ∈ T, δ and γ are nonnegative integers, and c is a Laurent polynomial without zeros on T and with wind c = 0. In the case where c is identically 1 (and thus absent), an exact formula for Dn (b) was established in [68]: * + τ δ t γ 1− Dn 1 − t τ G(1 + δ)G(1 + γ ) G(n + 1)G(n + 1 + γ + δ) = , (2.48) G(1 + γ + δ) G(n + 1 + δ)G(n + 1 + γ ) where G(1) = 1 and G(m) = (m − 2)! · · · 2! 1! 0! for m ≥ 2. (Two elementary proofs of this identity can be found in [73].) The right-hand side of (2.48) is G(1 + δ)G(1 + γ ) δγ n (1 + o(1)) as n → ∞. G(1 + γ + δ) Now suppose that c is present. We put G(c) = exp(log c)0 ,
E(c) = exp
∞
k(log c)k (log c)−k ,
k=1
c+ (t) = exp
∞ k=1
(log c)k t k ,
c− (t) = exp
∞
(log c)−k t −k .
k=1
i
i i
i
i
i
i
56
buch7 2005/10/5 page 56 i
Chapter 2. Determinants
Clearly, c = G(c)c− c+ . In [69], it is shown that * * + +/ t γ t γ τ δ τ δ 1− 1− c(t) Dn 1 − Dn 1 − t τ t τ G(c)n E(c) 1 = 1+O . c− (τ )γ c+ (τ )δ n Things are essentially more complicated for symbols with more than one zero on the unit circle. Let now b(t) =
N
1−
j =1
τj δj t γj 1− c(t), t τj
t ∈ T,
where τ1 , . . . , τN are distinct points on T, δj and γj are nonnegative integers, and c is as above. The asymptotics of Dn (b) in this case was obtained in [64]. The result is as follows. Without loss of generality assume that γ := γ1 + · · · + γN ≤ δ1 + · · · + δN =: δ (otherwise pass to the adjoint matrix). Put zj = γj + δj , M = {(m1 , . . . , mN ) : mj ∈ Z, 0 ≤ mj ≤ zj , m1 + · · · + mn = γ }, ⎫ ⎧ N ⎬ ⎨ (mj zj − m2j ) : (m1 , . . . , mN ) ∈ M , Q = max ⎭ ⎩ j =1 ⎧ ⎫ N ⎨ ⎬ M∗ = (m1 , . . . , mN ) ∈ M : (mj zj − m2j ) = Q . ⎩ ⎭ j =1
Then Dn (b) = G(c)n E(c) nQ (1 + O(1/n)) ⎡ ⎤ m1 n τ1 · · · τNmN 1 ⎦, ×⎣ Am1 ,...,mN +O γ1 γN n τ · · · τ ∗ 1 N (m ,...,m )∈M 1
(2.49)
n
where G(c) and E(c) are as above and τi (mi −zi )mj Am1 ,...,mN = 1− τj i=j ×
N N G(zj − mj + 1)G(mj + 1) 1 . m j G(zj + 1) c (τ ) c+ (τj )zj −mj j =1 j =1 − j
(2.50)
The constellation where δj = γj for all j is especially interesting. In this case the asymptotics of Dn (b) was described earlier by Widom [293]. Thus, let b(t) =
N
|t − τj |2γj c(t),
t ∈ T.
j =1
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 57 i
57
If (m1 , . . . , mN ) ∈ M, then N
(mj · 2γj − m2j ) ≤
j =1
N
(m2j + γj2 − m2j ) =
j =1
N
γj2 =
j =1
N
(γj · 2γj − γj2 )
j =1
and equality holds if and only if mj = γj for all j . Hence Q = γ12 + · · · + γN2 and M∗ = {(γ1 , . . . , γN )}. Formulas (2.49) and (2.50) therefore yield γ1 γ n τ · · · τNN 2 2 (1 + O(1/n)) Dn (b) = G(c)n E(c) nγ1 +···+γN A 1γ1 γ τ1 · · · τNN = G(c)n E(c) A nγ1 +···+γN (1 + O(1/n)) 2
2
(2.51)
with A=
1−
i=j
Since
τi τj
−γi γj N N G(2γi − γi + 1)G(γi + 1) 1 . γ j G(2γi + 1) c (τ ) c+ (τj )γj j =1 j =1 − j
7 8 τj −γj γi τi −γi γj τi −γi γj 1− 1− 1− = τj τj τi i=j i>j −2γi γj 1 − τi = = |τi − τj |−2γi γj = |τi − τj |−γi γj , τ j i>j i>j i=j
we obtain A = G(c)γ
i=j
|τi − τj |−γi γj
N
N 1 G(γj + 1)2 . c(τj )γj j =1 G(2γj + 1) j =1
(2.52)
As already said, (2.51) and (2.52) are already in [293].
i
i i
i
i
buch7 2005/10/5 page 58 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 59 i
Chapter 3
Stability
For a ∈ W and n ∈ {1, 2, 3, . . . }, the n × n Toeplitz matrix Tn (a) is defined by (2.1). We now consider the sequence {Tn (a)}∞ n=1 as an entity associated with T (a). In this chapter, we study the following problem: Is there an n0 ≥ 1 and an M ∈ (0, ∞) such that Tn (a) is invertible for all n ≥ n0 and Tn−1 (a)p ≤ M for all n ≥ n0 ?
(3.1)
This problem plays a key role in the theory of large Toeplitz matrices. Since
$$\|T_n^{-1}(a)\|_2 = \frac{1}{\sigma_{\min}(T_n(a))} \ge \mathrm{rad}\, T_n^{-1}(a),$$
where σmin (Tn (a)) is the smallest singular value of Tn (a) and rad Tn−1 (a) stands for the spectral radius of Tn−1 (a), question (3.1) for p = 2 is equivalent to asking whether σmin (Tn (a)) stays away from zero, and if the answer is yes, we can conclude that the eigenvalues of Tn (a) are bounded away from zero as n → ∞.
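The following small experiment is our own illustration (assuming NumPy; both symbols are illustrative choices, not taken from the book). It shows the two typical behaviors of $\sigma_{\min}(T_n(b))$: it stays bounded away from zero when $b$ has no zeros on $\mathbb{T}$ and winding number zero, and it decays when $b$ vanishes somewhere on $\mathbb{T}$.

```python
import numpy as np

# sigma_min(T_n(b)) for two illustrative symbols:
#   b1(t) = 2.5 + t + 1/t   has no zeros on T and winding number 0,
#   b2(t) = 1 - t           vanishes at t = 1.
symbols = {
    "b1 = 2.5 + t + 1/t": {-1: 1.0, 0: 2.5, 1: 1.0},
    "b2 = 1 - t":         {0: 1.0, 1: -1.0},
}

for name, coeffs in symbols.items():
    for n in (8, 32, 128):
        Tn = np.array([[coeffs.get(j - k, 0.0) for k in range(n)] for j in range(n)])
        smin = np.linalg.svd(Tn, compute_uv=False)[-1]
        print(f"{name:20s} n={n:4d}  sigma_min = {smin:.4f}")
```

For b1 the printed values stay near $\min_{t\in\mathbb{T}}|b_1(t)| = 0.5$, while for b2 they decay roughly like $1/n$.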
3.1
Strong and Weak Convergence
Throughout this section, X and Y are Banach spaces. We denote by B(X, Y ) the Banach space of all bounded linear operators from X to Y , and we let K(X, Y ) stand for the Banach space of all compact operators from X to Y . As usual, we put B(X, X) =: B(X) and K(X, X) =: K(X). Let {An }∞ n=1 be a sequence of operators An ∈ B(X, Y ). We say that An converges to an operator A ∈ B(X, Y ) uniformly if An − A → 0, strongly if An x − Ax → 0 for all x ∈ X, weakly if |(An x, y) − (Ax, y)| → 0 for all x ∈ X and all y ∈ Y ∗ ; here Y ∗ is the dual space of Y , and for z ∈ Y and y ∈ Y ∗ , we let (z, y) denote the value of the functional y at z. 59
i
i i
i
i
i
i
60
buch7 2005/10/5 page 60 i
Chapter 3. Stability
Theorem 3.1 (Banach and Steinhaus). Let {An }∞ n=1 be a sequence of operators An ∈ B(X, Y ) such that {An x}∞ is a convergent sequence in Y for each x ∈ X. Then n=1 supn≥1 An < ∞, the operator A defined by Ax := limn→∞ An x belongs to B(X, Y ), and A ≤ lim inf n→∞ An . This theorem, which is also known as the uniform boundedness principle, is proved in every text on functional analysis. Proposition 3.2. If K ∈ K(X, Y ) and the operators An ∈ B(Y ) converge strongly to A ∈ B(Y ), then the operators An K converge uniformly to AK. Proof. Let B1 := {x ∈ X : x ≤ 1} and S1 := {x ∈ X : x = 1}. Fix ε > 0. Since K maps B1 to a set whose closure is a compact subset of Y , there exists some finite collection of elements x1 , . . . , xN ∈ B1 such that for each x ∈ S1 we can find an xj satisfying Kx − Kxj < ε. Clearly, An Kx − AKx ≤ An Kx − Kxj + An Kxj − AKxj + A Kxj − Kx ≤ An ε + An Kxj − AKxj + A ε. Theorem 3.1 implies that An ≤ M < ∞ for all n, and as An converges strongly to A, the norms An Kxj − AKxj are less than ε for all j provided n is sufficiently large. Thus, for n large enough, An Kx − AKx ≤ (M + 1 + A) ε = (M + 1 + A)εx, which shows that An K − AK → 0. Proposition 3.3. If K ∈ K(X, Y ) and the operators An ∈ B(X) converge weakly to A ∈ B(X), then the operators KAn converge strongly to KA. Proof. Fix an arbitrary x ∈ X. Since (An x, y) → (Ax, y) for all x ∈ X and all y ∈ Y ∗ , we see that the operators Tn defined by Tn : Y ∗ → C, y → (An x, y) converge strongly to the operator T : y → (Ax, y). Thus, by Theorem 3.1, supn≥1 An x = supn≥1 Tn < ∞. This shows that the sequence {An x}∞ n=1 is bounded. Contrary to what we want, let us assume that KAn x does not converge to KAx. Then there exist an ε > 0 and a sequence {nk } such that KAnk x − KAx ≥ ε for all k.
(3.2)
Since K is compact and {Ank }∞ k=1 is bounded, there exists a subsequence {nkj } of {nk } such that {KAnkj x} has a limit z in Y . The weak convergence of KAnkj to KA implies that z = KAx, whence KAnkj x − KAx → 0 as j → ∞. But this contradicts (3.2). Let p (1 ≤ p ≤ ∞) be the spaces introduced in Section 1.2. We denote by · p both the norm in p and B(p ). Furthermore, we let c0 denote the closed subspace of ∞ that is constituted by the sequences that converge to zero. The norm in c0 is the · ∞ norm.
i
i i
i
i
i
i
3.2. Stable Sequences
buch7 2005/10/5 page 61 i
61
For n = 1, 2, . . . , let Pn and Qn be the projections on p and c0 defined by Pn : {x0 , x1 , x2 , . . . } → {x0 , x1 , . . . , xn−1 , 0, 0, . . . }, Qn : {x0 , x1 , x2 , . . . } → {0, . . . , 0, xn , xn+1 , . . . }.
(3.3) (3.4)
n
Note that Pn + Qn = I . It is clear that the operators Pn do not converge uniformly. The operators Pn converge strongly to I on p if 1 ≤ p < ∞ and on c0 , but they do not converge strongly on ∞ .
3.2
Stable Sequences
Throughout this section we assume that the underlying Banach space X is c0 or p (1 ≤ p ≤ ∞). For n = 1, 2, . . . , we denote by Xn the space Cn with the p norm if X = p and with the ∞ norm if X = c0 . On identifying {x0 , x1 , . . . , xn−1 } and
{x0 , x1 , . . . , xn−1 , 0, 0, . . . },
we can identify Xn with the image of X under the projection Pn given by (3.3). If A is an n × n matrix, we think of A as an operator on (the column space) Xn and therefore A is well defined. Notice that A = Pn APn , where A is the norm of A as an element of B(Xn ) and Pn APn stands for the norm of the (well-defined) operator Pn APn ∈ B(X). Let {An } := {An }∞ n=1 be a sequence of n × n matrices An . The sequence {An } is said to be stable on X if lim sup A−1 n < ∞, n→∞
where, by convention, A−1 n := ∞ if An is not invertible. Equivalently, {An } is stable on X if and only if there exist n0 ≥ 1 and M ∈ (0, ∞) such that An is invertible for all n ≥ n0 and A−1 n ≤ M for all n ≥ n0 . Lemma 3.4. Suppose X = c0 or X = p with 1 < p < ∞. Let {An } be a sequence of n × n matrices An and assume there is an operator A ∈ B(X) such that An → A and A∗n → A∗ strongly. If lim inf A−1 n 0 there is an n0 (ε) such that (Qn T −1 (a)Qn )−1 Qn < (1 + ε)T −1 (a −1 ) for all n ≥ n0 (ε). For these n we obtain from Lemma 3.6 that the matrices Tn (a) = Pn T (a)Pn | Im Pn are invertible and that (Pn T (a)Pn )−1 Pn = Pn T −1 (a)Pn − Pn T −1 (a)Qn (Qn T −1 (a)Qn )−1 Qn T −1 (a)Pn . This implies that Tn−1 (a) ≤ T −1 (a) + (1 + ε)T −1 (a) T −1 (a −1 ) T −1 (a) and hence gives (3.7).
i
i i
i
i
i
i
64
buch7 2005/10/5 page 64 i
Chapter 3. Stability
Let now X be the space 1 . From what was already proved, we know that lim sup Tn−1 (a)0 < ∞ if T (a) is invertible on c0 and that lim Tn−1 (a)0 = ∞ if T (a) is not invertible on c0 . As Tn−1 (a)0 = Tn−1 (a)1 and as T (a) is invertible on c0 if and only if T (a) is invertible on 1 , we arrive at (3.7) and (3.8) for X = 1 . Finally, the preceding argument can be employed to reduce the case X = ∞ to the (by now settled) case X = 1 . Corollary 3.8. Let X be c0 or p with 1 ≤ p < ∞ and a ∈ W . If T (a) is invertible, then the operators Tn−1 (a)Pn converge strongly to T −1 (a) on X. Proof. The proof is immediate from Proposition 3.5 and Theorem 3.7. Corollary 3.8 tells us that if T (a) is invertible, we can approximately solve the infinite system T (a)x = y by replacing it with the finite system Tn (a)x (n) = Pn y, x (n) ∈ Im Pn . This is called the finite section method. In this connection it is of interest to know something about the first n0 such that Tn (a) is invertible for all n ≥ n0 . In the case of banded matrices, the following result reveals that this n0 is in general not an astronomic number. Proposition 3.9. Let b be a Laurent polynomial and suppose b has no zeros on the unit circle T and wind b = 0. Choose a number α satisfying (1.23). Then there is a constant C(b, α), depending only on b and α, such that Tn (b) is invertible whenever C(b, α) e−2αn < 1. Proof. By Lemma 3.6, Tn (b) = Pn T (b)Pn | Im Pn is invertible if and only if the operator An := Qn T −1 (b)Qn | Im Qn is invertible. We have −1 −1 T −1 (b) = T (b+ )T (b− ) = T (b−1 ) − K,
−1 −1 K := H (b+ )H ( b− ).
Since the operator Qn T (b−1 )Qn | Im Qn has the same matrix as T (b−1 ), the operator An is certainly invertible if Qn KQn T −1 (b−1 ) < 1. But from Lemma 1.17 we infer that −1 −1 Qn KQn ≤ Qn H (b+ ) H ( b− )Qn ⎛ ⎞⎛ ⎞ ∞ ∞ −1 −1 ≤ ⎝ (j + 1) (b+ )n+j ⎠ ⎝ (j + 1) (b− )−n−j ⎠ = O(e−2αn ). j =0
3.4
j =0
Silbermann Theory
In this section we present Silbermann’s approach to the stability problem. This approach yields another proof to Theorem 3.7 and, moreover, allows us to extend this theorem to Toeplitz matrices with certain perturbations. We begin with the analogue of Proposition 1.3 for finite Toeplitz matrices. For n = 1, 2, 3, . . . , we define the operators Wn on p by Wn : {x0 , x1 , x2 , . . . } → {xn−1 , . . . , x1 , x0 , 0, 0, . . . }.
(3.9)
Obviously, Wn2 = Pn . It is also easy to verify that Wn Tn (a)Wn = Tn ( a ),
(3.10)
i
i i
i
i
i
i
3.4. Silbermann Theory
buch7 2005/10/5 page 65 i
65
where, as usual, a (t) := a(1/t) (t ∈ T). Proposition 3.10 (Widom). If a, b ∈ W then Tn (ab) = Tn (a)Tn (b) + Pn H (a)H ( b)Pn + Wn H ( a )H (b)Wn . Proof. From Proposition 1.3 we obtain b)Pn , Tn (ab) = Pn T (ab)Pn = Pn T (a)T (b)Pn + Pn H (a)H ( and since Pn T (a)T (b)Pn = Pn T (a)Pn T (b)Pn + Pn T (a)Qn T (b)Pn = Tn (a)Tn (b) + Pn T (a)Qn T (b)Pn , it suffices to check that a )H (b)Wn . Pn T (a)Qn T (b)Pn = Wn H (
(3.11)
But it is easily seen that the j, k entry (1 ≤ j, k ≤ n) of each side of (3.11) is a−n+j −1 bn−k+1 + a−n+j −2 bn−k+2 + · · · . We now suppose that X is c0 or p with 1 < p < ∞. Let F denote the set of all sequences {An } := {An }∞ n=1 of n × n matrices An such that supn≥1 An < ∞. It is easily seen that F is a Banach algebra with the algebraic operations {An } + {Bn } := {An + Bn },
α{An } := {αAn }, {An }{Bn } := {An Bn }
and the norm {An } := supn≥1 An . We define S as the subset of F that is constituted by ∈ B(X) such that the sequences {An } for which there exist two operators A ∈ B(X) and A An → A,
A∗n → A∗ ,
Wn An Wn → A,
∗ Wn A∗n Wn → A
strongly. It is not difficult to check that S is a closed subalgebra of F, and hence S itself is a Banach algebra. Finally, let J be the set of all sequences {An } ∈ F that are of the form An = Pn KPn + Wn LWn + Cn
(3.12)
with compact operators K and L and with Cn → 0 as n → ∞. Lemma 3.11 (Silbermann). J is a closed two-sided ideal of S. Proof. Let An be of the form (3.12). Since Wn → 0 weakly, we deduce from Proposition 3.3 that An → K strongly. Similarly, A∗n → K ∗ , Wn An Wn → L, Wn A∗n Wn → L∗ strongly. Thus, J is a subset of S. It is clear that J is a linear space. To prove that J is closed, notice first that, by Theorem 3.1, K ≤ lim inf An , n→∞
L ≤ lim inf Wn An Wn . n→∞
i
i i
i
i
i
i
66
buch7 2005/10/5 page 66 i
Chapter 3. Stability
(j ) ∞ Consequently, if {An }∞ j =1 is a Cauchy sequence in J, then the sequences {K }j =1 and (j ) ∞ {L }j =1 are Cauchy sequences in K(X). This implies that there exist operators K and L in K(X) such that K (j ) → K and L(j ) → L uniformly. Now it follows easily that (j ) {An } − {An } → 0 as j → ∞ for some {An } ∈ S. This proves that J is closed. Finally, let {Bn } ∈ S and {An } ∈ J. Then (j )
Bn An = Bn Pn KPn + Bn Wn LWn + Bn Cn = Pn Bn KPn + Wn (Wn Bn Wn )Pn LWn + Bn Cn n + Cn = Pn BKPn + Wn BLW uniformly due to Proposition with Cn → 0, since Bn → BK and Wn Bn Wn Pn L → BL 3.2. Consideration of adjoints shows that {An Bn } is also in J. Theorem 3.12 (Silbermann). Let X be c0 or p with 1 < p < ∞. A sequence {An } ∈ S is are invertible in B(X) and the coset {An } + J is invertible in stable if and only if A and A the quotient algebra S/J. Moroever, if {Rn } + J is the inverse of {An } + J in S/J, then −1 − R)W n Bn := Rn + Pn (A−1 − R)Pn + Wn (A satisfies An Bn = Pn + Cn and Bn An = Pn + Cn with Cn → 0 and Cn → 0. Proof. Suppose the sequence {An } is stable. Then A is invertible by Lemma 3.4. Since (Wn An Wn )−1 = A−1 n , the sequence {Wn An Wn } is also stable, and hence, again by is invertible. The stability of {An } implies the existence of a sequence Lemma 3.4, A {Bn } ∈ F such that Bn An = Pn + Cn with Cn → 0 (simply take Bn = A−1 n for all sufficiently large n). Using Proposition 3.5, it is not difficult to see that {Bn } belongs to S. Thus, {Bn } + J is a left inverse of {An } + J. Analogously one can show the {An } + J is invertible from the right. are invertible and that {Rn } + J is the inverse Conversely, suppose now that A and A of {An } + J. Then An Rn = Pn + Pn KPn + Wn LWn + Cn , where K and L are compact and Cn → 0. Passage to the strong limit n → ∞ gives R = I +L. Thus, S := A−1 −R = −A−1 K and T := A −1 − R = −AL AR = I +K and A are also compact. Put Bn = Rn + Pn SPn + Wn T Wn . Then {Bn } ∈ S and, by Proposition 3.2, An Bn = Pn + Pn KPn + Wn LWn + Cn + An Pn SPn + An Wn T Wn = Pn + Pn (K + An Pn S)Pn + Wn (L + Wn An Wn T )Wn + Cn )Wn + Cn = Pn + Pn (K + AS)Pn + Wn (L + AT = Pn + Cn
i
i i
i
i
i
i
3.4. Silbermann Theory
buch7 2005/10/5 page 67 i
67
with Cn → 0. Consideration of adjoints yields that Bn An = Pn + Cn with Cn → 0. An infinite matrix is said to have finite support if at most finitely many of its entries are nonzero. Clearly, infinite matrices with finite support generate compact operators. Recall that a is defined by a (t) := a(1/t) (t ∈ T). Theorem 3.13 (Silbermann). Let X be c0 or p with 1 ≤ p ≤ ∞ and let An = Tn (a) + Pn KPn + Wn LWn , where a ∈ W and K and L are matrices with finite support. Put = T ( A = T (a) + K, A a ) + L. Then lim sup A−1 n < ∞ if A and A are invertible,
(3.13)
lim A−1 n = ∞ if A or A is not invertible.
(3.14)
n→∞
n→∞
Consequently, {An }∞ n=1 is stable if and only if both A and A are invertible. Proof. We first consider the case where X = c0 or X = p with 1 < p < ∞. −1 If lim inf A−1 n < ∞, A is invertible by virtue of Lemma 3.4. Since (Wn An Wn ) −1 is invertible = An , we can apply Lemma 3.4 to the sequence {Wn An Wn } to obtain that A whenever lim inf A−1 < ∞. This completes the proof of (3.14). n be invertible. It is clear that {An } ∈ S. Thus, (3.13) will follow Let now A and A from Theorem 3.12 once we have shown that {An } + J is invertible in S/J. As {An } + J = {Tn (a)} + J, we are left to prove that {Tn (a)} + J is invertible. But if A = T (a) + K is invertible, then T (a) is Fredholm and a ∈ GW . By Proposition 3.10, Tn (a −1 )Tn (a) = Pn − Pn H (a −1 )H ( a )Pn − Wn H ( a −1 )H (a)Wn , and since all occurring Hankel operators are compact, it follows that {Pn H (a −1 )H ( a )Pn + Wn H ( a −1 )H (a)Wn } ∈ J. Thus, {Tn (a −1 )}+J is a left inverse of {Tn (a)}+J. Similarly one can show that {Tn (a −1 )}+J is a right inverse of {Tn (a)} + J. This proves that {Tn (a)} + J is invertible, as desired. To dispose of the case where X = 1 , we proceed as in the proof of Theorem 3.7. Define Bn := Tn (a) + Pn K ∗ Pn + Wn L∗ Wn , where K ∗ and L∗ are the usual Hermitian adjoints of the matrices K and L. By what was already proved, lim sup Bn−1 0 < ∞ if = T ( a) + L∗ are invertible on c0 , while lim inf Bn−1 0 = ∞ if B or B = T (a) + K ∗ and B B is not invertible on c0 . But Bn−1 0 = (Bn−1 )∗ 1 = A−1 n 1 , and B and B are invertible ∗ 1 ∗ on c0 if and only if B = A and B = A are invertible on . This gives (3.13) and (3.14) for X = 1 . The ∞ case can be reduced to the 1 case by the same reasoning. Corollary 3.14. Let X be c0 or p with 1 ≤ p < ∞ and let An = Tn (a)+Pn KPn +Wn LWn where a ∈ W and K and L are finitely supported. Then the operators A−1 n Pn converge strongly to (T (a) + K)−1 on X if and only if both T (a) + K and T ( a ) + L are invertible.
i
i i
i
i
i
i
68
buch7 2005/10/5 page 68 i
Chapter 3. Stability
Proof. This follows from Proposition 3.5 and Theorem 3.13.
3.5 Asymptotic Inverses The sequence {Bn } delivered by Theorem 3.12 is an asymptotic inverse for An : we have A−1 n = Bn + Cn with Cn → 0 (notice that if Pn − An Bn → 0 and {An } is stable, −1 then A−1 n − Bn ≤ An Pn − An Bn → 0). In the special case An = Tn (a) we have = T ( A = T (a) and A a ), and when proving Theorem 3.13, we saw that Rn = Tn (a −1 ) = T ( a −1 ). Thus, on defining does the desired job, whence R = T (a −1 ) and R K(a) := T −1 (a) − T (a −1 ),
K( a ) := T −1 ( a ) − T ( a −1 ),
(3.15)
for a ∈ GW with wind a = 0, we obtain a )Wn + Cn Tn−1 (a) = Tn (a −1 ) + Pn K(a)Pn + Wn K(
(3.16)
with Cn → 0 on c0 and p for 1 ≤ p < ∞ (note that the proof of the convergence Cn → 0 in Theorem 3.12 also works for 1 ). Here are some alternative expressions for the operators (3.15). From Proposition 1.3 we infer that T (a)T (a −1 ) = I − H (a)H ( a −1 ), whence T −1 (a) − T (a −1 ) = −1 −1 a ). This shows that T (a)H (a)H ( a −1 ), K(a) = T −1 (a)H (a)H (
K( a ) = T −1 ( a )H ( a )H (a −1 ).
(3.17)
Combining (3.17) and Proposition 1.2 we see that K(a) and K( a ) are compact. Anala ), we get T −1 (a) − T (a −1 ) = ogously, starting with T (a −1 )T (a) = I − H (a −1 )H ( a )T −1 (a) and thus H (a −1 )H ( a )T −1 (a), K(a) = H (a −1 )H (
K( a ) = H ( a −1 )H (a)T −1 ( a ).
(3.18)
Given a Wiener-Hopf factorization a = a− a+ , we can also write −1 −1 −1 −1 −1 −1 )T (a− ) − T (a+ a− ) = −H (a+ )H ( a− ), K(a) = T (a+
(3.19)
−1 −1 −H ( a− )H (a+ ).
(3.20)
K( a) =
−1 −1 T ( a− )T ( a+ )
−
−1 −1 T ( a− a+ )
=
Finally, from (3.19) and Exercise 2 of Chapter 1 we obtain −1 −1 )H ( a− ) = −H (a −1 a− )H ( a+ a −1 ) K(a) = −H (a+
= −H (a −1 )T ( a− )T ( a+ )H ( a −1 ) = −H (a −1 )T −1 ( a )H ( a −1 )
(3.21)
and analogously, K( a ) = −H ( a −1 )T −1 (a)H (a −1 ).
(3.22)
In the case of Toeplitz band matrices, we can show that the norm of the matrices Cn in (3.16) decays exponentially. Theorem 3.15. Let b be a Laurent polynomial without zeros on T and with winding number zero. Choose an α satisfying (1.23) and let 1 ≤ p ≤ ∞. Then (3.16) holds with a replaced by b and with Cn p = O(e−αn ).
i
i i
i
i
i
i
3.5. Asymptotic Inverses
buch7 2005/10/5 page 69 i
69
Proof. From Proposition 3.10 and formula (3.10) we see that Tn−1 (b) equals Tn (b−1 ) + Tn−1 (b)Pn H (b)H ( b−1 )Pn + Wn Tn−1 ( b)Pn H ( b)H (b−1 )Wn .
(3.23)
We have Tn−1 (b)Pn H (b)H ( b−1 )Pn = Tn−1 (b)Pn (I − T (b)T (b−1 ))Pn = Tn−1 (b)Pn (T (b)T −1 (b) − T (b)T (b−1 ))Pn = Tn−1 (b)Pn T (b)K(b)Pn = Tn−1 (b)Pn T (b)Pn K(b)Pn + Tn−1 (b)Pn T (b)Qn K(b)Pn = Pn K(b)Pn + Tn−1 (b)Pn T (b)Qn K(b)Pn .
(3.24)
Let b = b− b+ be the Wiener-Hopf factorization (1.19). By virtue of (3.19), K(b) = −1 −1 −H (b+ )H ( b− ). Using Lemma 3.6 we obtain as in the proof of Proposition 3.9 that Qn K(b)p ≤ M1
∞
∞ −1 (j + 1) (b+ )n+j ≤ M1 M2 e−α(n+j ) = O(e−αn ).
j =0
j =0
Since Tn−1 (b) = O(1) due to Theorem 3.7, we arrive at the conclusion that (3.24), that is, the second term of (3.23), is Pn K(b)Pn + O(e−αn ). Analogously, one can show that the b)Wn + O(e−αn ). third term of (3.23) is Wn K( Corollary 3.16. Let b be a Laurent polynomial without zeros on T and suppose the winding number is zero. Let α satisfy (1.23) and let 1 ≤ p ≤ ∞. Then for each natural number k, Tn−1 (b)Pk − T −1 (b)Pk p = O(e−αn ) as n → ∞. Proof. For j ≥ 0, let ej ∈ p be the sequence defined by (ej )k = 1 for k = j and (ej )k = 0 for k = j . Clearly, it suffices to prove that Tn−1 (b)ej − T −1 (b)ej p = O(e−αn ) for each fixed j . Let n ≥ j . By (3.16), Tn−1 (b)ej − T −1 (b)ej = Tn (b−1 )ej + Pn (T −1 (b) − T (b−1 ))Pn ej − T −1 (b)ej + Wn K( b)Wn ej + Cn ej −1 = −Qn T (b)ej + Wn K( b)Wn ej + Cn ej . (3.25)
r j Let b(t) = j =−r bj t . Then H (b) = H (b)Pr . Since Wn ej = en−j , we therefore obtain from (3.18) that Wn K( b)Wn ej p ≤ H ( b−1 )H (b)p Pr T −1 ( b)en−r p . As −1 p Pr T (b)en−r p is the norm of the first r components of the (n − r)th column of b), we deduce from Lemma 1.17 and Proposition 1.18 that the norm the matrix T −1 ( Pr T −1 ( b)en−r p is O(e−αn ). Also by Lemma 1.17 and Proposition 1.18, Qn T −1 (b)p = O(e−αn ). Finally, because Cn ej p = O(e−αn ) by virtue of Theorem 3.15, all three terms in (3.25) have norm O(e−αn ). Corollary 3.17. Under the hypotheses of Corollary 3.16, Pk Tn−1 (b)Pn − Pk T −1 (b)p = O(e−αn ) as n → ∞.
i
i i
i
i
i
i
70
buch7 2005/10/5 page 70 i
Chapter 3. Stability
1.2 1 0.8 0.6 0.4 0.2 0 0
20
40
60
80
100
Figure 3.1. The componentwise error |xj − xj(99) | against j = 0, . . . , 98. Proof. Put En = Tn−1 (b)Pk −T −1 (b)Pk . Corollary 3.16 says that if we consider En on c0 or p (1 ≤ p ≤ ∞), then En = O(e−αn ). This implies that En∗ = O(e−αn ) for the adjoint operator En∗ = Pk Tn−1 (b)Pn − Pk T −1 (b). As 1 = c0∗ and p = (q )∗ with 1/p + 1/q = 1 for 1 < p ≤ ∞, we arrive at the assertion. Example 3.18. The symbol b(t) = −2 − 3it − 2t 2 − 3t 3 + t −1 − 2it −2 + 3t −3 generates an invertible Toeplitz operator T (b). (To see this, plot b(T).) Choose x ∈ 2 by xj = 1 for 0 ≤ j ≤ 99 and xj = 0 for j ≥ 100 and put y = T (b)x. Obviously, only the first 103 components of y are nonzero. The equation T99 (b)x (99) = P99 y has a unique solution, and Figure 3.1 shows the componentwise error |xj − xj(99) | versus j = 0, . . . , 98. It is clearly seen that P50 x (99) is a good approximation to P50 x, but that x (99) and P99 x differ heavily in their last components. For example, the last two components of x (99) are 0.5082 − 0.7108i and 2.0239 − 0.0622i (while those of P99 x are 1 and 1). Things change dramatically when passing from 99 to 100: we have x (100) = P100 x exactly and MATLAB reports x (100) − P100 x∞ < 2 · 10−15 . Figure 3.2 shows the norms x (n) − Pn x∞ and x (n) − Pn x2 versus n. Example 3.19. Let b(t) = 5+t −t 2 +t 3 −t −2 . The operator T (b) is invertible and hence we can solve the equation T (b)x = y for each y ∈ 2 approximately by passing to the truncated systems Tn (b)x (n) = Pn y. We choose 200 vectors xj of length 100 randomly from the unit sphere S99 of the Euclidean space R100 with the uniform distribution, extend xj by zeros to a sequence in 2 , and put yj = T (b)xj . Then we solve the 200 systems Tn (b)xj(n) = Pn yj for n = 30, n = 60, and n = 90. The logarithmic error log10 Pn xj − xj(n) 2 of the computational results mildly fluctuates around −2 for n = 30, n = 60, n = 90, that is, there is no significant improvement when increasing n. However, the improvement in a
i
i i
i
i
i
i
3.5. Asymptotic Inverses
buch7 2005/10/5 page 71 i
71
9 8 7 6 5 4 3 2 1 0 −1 0
20
40
60
80
100
Figure 3.2. The norms x (n) − Pn x∞ (solid) and x (n) − Pn x2 (dashed) versus n for 5 ≤ n ≤ 105. 0 −2 −4 −6 −8 −10 −12 −14 −16 −18 −20 0
50
100
150
200
Figure 3.3. Let T (b)xj = yj with 200 random vectors xj of length 100. The picture shows log10 P30 xj −P30 xj(n) 2 versus j = 1, . . . , 200 for the 200 systems Tn (b)xj(n) = Pn yj with n = 30 (upper curve), n = 60 (middle curve), and n = 90 (lower curve). fixed set of components is drastic. Figure 3.3 shows that log10 P30 xj − P30 xj(n) 2 fluctuates around the values −2, −12, and −16 for n = 30, n = 60, and n = 90, respectively.
i
i i
i
i
i
i
72
buch7 2005/10/5 page 72 i
Chapter 3. Stability
It is well known that in many instances Toeplitz matrices generated by the inverses of Laurent polynomials have much better properties than those generated by Laurent polynomials. The following result illustrates this phenomenon.
Theorem 3.20. Let b(t) = sj =−r bj t j be a Laurent polynomial without zeros on the unit circle T and with winding number zero. Put a = b−1 . Then a )Wn Tn−1 (a) = Tn (b) + Pn K(a)Pn + Wn K(
(3.26)
for all n ≥ n0 := max(r, s). Moreover, all entries outside the n0 × n0 upper-left blocks of K(a) and K( a ) are zero. If b = b− b+ is a Wiener-Hopf factorization of b, then b− ), K( a ) = −H ( b− )H (b+ ). K(a) = −H (b+ )H ( Proof. Let b = b− b+ be the Wiener-Hopf factorization (1.19). By Lemma 3.6, Tn (a) = Pn T (a)Pn | Im Pn is invertible if and only if Qn T −1 (a)Qn | Im Qn is invertible. We have b− )Qn , Qn T −1 (a)Qn = Qn T (b+ )T (b− )Qn = Qn T (b)Qn − Qn H (b+ )H ( and from (3.19) we know that H (b+ )H (b− ) = −K(a). Since b+ and b− are analytic polynomials of degree s and r in t and t −1 , respectively, we see that K(a) has nonzero entries in the principal n0 × n0 block only. This implies that Qn K(a) = 0 for n ≥ n0 , whence Qn T −1 (a)Qn = Qn T (b)Qn for n ≥ n0 . As Qn T (b)Qn | Im Qn has the same matrix as T (b), we conclude that Tn (a) is invertible for all n ≥ n0 . From (3.20) we see that K( a ) = −H ( b− )H (b+ ), and all nonzero entries of this matrix are clearly concentrated in the principal n0 × n0 block. Finally, as in the proof of Theorem 3.15 we obtain that Tn−1 (a) is equal to Tn (b−1 ) + Pn K(a)Pn + Tn−1 (a)Pn T (a)Qn K(a)Pn + Wn K( a )Wn + Wn Tn−1 ( a )Pn T ( a )Qn K( a )Wn , a ) = 0 for n ≥ n0 , we arrive at (3.26). and since Qn K(a) = 0 and Qn K(
Exercises 1. Let b ∈ P and suppose b has no zeros on T and winding number zero. Let ⎛ ⎜ ⎜ Tn (b) ⎜ ⎝
x0(n) x1(n) .. . (n) xn−1
⎞
⎛
⎟ ⎜ ⎟ ⎜ ⎟=⎜ ⎠ ⎝
1 0 .. . 0
⎞ ⎟ ⎟ ⎟, ⎠
⎛
⎞ ⎛ x0 ⎜ ⎟ ⎜ T (b) ⎝ x1 ⎠ = ⎝ .. .
⎞ 1 0 ⎟ ⎠. .. .
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 73 i
73
Show that x0(n) =
Dn−1 (b) , Dn (b)
x0 =
1 , G(b)
deduce that Dn (b)/Dn−1 (b) converges to G(b) exponentially fast, and use this insight to conclude that Dn (b)/G(b)n converges to a finite and nonzero limit. 2. Show that Corollary 3.8 is not true for p = ∞. 3. Let a ∈ W and QN x = 0. Put y = T (a)x. Show that if n ≥ N and Tn (a) is invertible, then the solution x (n) of Tn (a)x (n) = Pn y satisfies the equality PN x (n) = PN x. 4. Explain the straight horizontal parts and the sudden descents of the two curves of Figure 3.2. space of all real sequences in 1 over the scalar 5. In this exercise, 1 is the real Banach
field R. Let y ∈ 1 and let b+ (t) = rj =0 bj t j be an analytic polynomial with real coefficients. Suppose that b+ has no zeros on T. Put d = inf y − T (b+ )x1 . x∈1
(a) Show that there exists an x0 ∈ 1 such that y − T (b+ )x0 1 = d. (b) Show that dn := min Pn y − Pn T (b+ )Pn−r x1 x∈1
converges to d as n → ∞. (c) Let κ be the number of zeros (counted with multiplicities) of b+ in the open unit disk. Show that dn0 := min Pn y − Pn T (b+ )Pn−κ x1 x∈1
converges to d as n → ∞. 6. (a) Let {Bn } be a sequence of n × n matrices, let B be an invertible operator on 2 , and let K be a trace class operator on 2 . Suppose Bn−1 Pn → B −1 strongly. Prove that lim
n→∞
det (Bn + Pn KPn ) = det (I + B −1 K). det Bn
(b) Let a, b ∈ P and suppose b has no zeros on T and winding number zero. Let Tn (b) and Hn (a) denote the n × n truncations of the Toeplitz matrix T (b) and the Hankel matrix H (a), respectively. Prove that T (b−1 )T (b) + T (b−1 )H (a) is of the form identity plus trace class operator and show that lim
n→∞
det (Tn (b) + Hn (a)) = det (T (b−1 )T (b) + T (b−1 )H (a)). G(b)n
i
i i
i
i
i
i
74
buch7 2005/10/5 page 74 i
Chapter 3. Stability (c) Let a, b be as in (b) and let b = b− b+ be a Wiener-Hopf factorization. Put −1−1 c = b− b+ a. Prove that det (T (b−1 )T (b) + T (b−1 )H (a)) = det T (b−1 )T (b) det (I + H (c)).
7. Let a(z, w) = amn zm w n be a Laurent polynomial of two variables z and w. The quarter-plane Toeplitz operator T (2) (a) is defined on 2 (N × N) by (T
(2)
(a)x)ij =
∞
ai−k,j − xk ,
i, j ≥ 1.
k,=1
Suppose a(z, w) = b(zw−1 ) with some Laurent polynomial b of a single variable. (a) Prove that T (2) (a) is Fredholm if and only if {Tn (b)}∞ n=1 is stable. (b) Prove that T (2) (a) is invertible if and only if {Tn (b)}∞ n=1 is stable and Dn (b) = 0 for all n ≥ 1. 8. A Wiener-Hopf integral operator is an operator of the form ∞ k(x − t)f (t)dt, x > 0. (Af )(x) = 0
Suppose k ∈ L (R) and k(x) = 0 for |x| > M. Let {en }∞ n=0 be the orthonormal basis in L2 (0, ∞) constituted by the Laguerre functions 1
ex/2 d n n −x (x e ). n! dx n To solve the equation f + Af = g approximately, one can look for an approximate solution in the form en (x) =
f (n) = γ1(n) e1 + · · · + γn(n) en and determine the coefficients γ1(n) , . . . , γn(n) from the n linear equations γj(n) + (Af (n) , ej ) = (g, ej ) for j = 1, . . . , n. (This is called a Galerkin method.) (a) Prove that A is bounded on L2 (0, ∞) and that the matrix representation of A in ∞ the basis {en }∞ n=0 is the Toeplitz matrix T (a) = (aj −k )j,k=1 with n ∞ 1 dξ ˆ ) ξ + i/2 k(ξ , an = 2π −∞ ξ − i/2 ξ 2 + 1/4 9 ˆ ) := ∞ k(x)eiξ x dx. where k(ξ −∞
(b) Prove that a ∈ W .
(c) Suppose I +A is invertible on L2 (0, ∞). Let g ∈ L2 (0, ∞) and let f ∈ L2 (0, ∞) be the solution of the equation f + Af = g. Show that if the approximate solutions f (n) are computed as described above, then f (n) converges in L2 (0, ∞) to f . ˆ ) = 0 for all (d) Prove that I + A is invertible on L2 (0, ∞) if and only if 1 + k(ξ ξ ∈ R and the winding number of 1 + kˆ about the origin is zero.
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 75 i
75
9. The Cauchy singular integral operator S is defined by 1 f (τ ) (Sf )(t) = v.p. dτ , t ∈ T. πi T τ −t (a) Show that Sχn = χn for n ≥ 0 and Sχn = −χn for n < 0. (b) Let a, b ∈ P. Compute the matrix representation R = (rj k )∞ j,k=−∞ of the socalled singular integral operator aI + bS in the orthonormal basis {(1/(2π ) χn }∞ n=−∞ of L2 (T). (c) Put Rn = (rj k )nj,k=−n . Prove that {Rn } is stable on 2 (Z) if and only if a + b and a − b have no zeros on T and winding number zero. (d) Use (b) and (c) to establish an approximation method for the singular integral equation af + bSf = g on L2 (T). 10. Let L2 (D) denote the Hilbert space of all functions f on D that satisfy 1 1 f 2 := |f (z)|2 dA(z) := |f (reiθ )|2 r dr dθ < ∞. π D π D The Bergman space A2 (D) is the set of all functions in L2 (D) that are analytic in D. It is well known that A2 (D) is a closed subspace of L2 (D) and that the orthogonal projection of L2 (D) onto A2 (D) acts by the rule 1 f (w) (Pf )(z) = dA(w), z ∈ D. π D (1 − zw)2 The Bergman space Toeplitz operator T B (a) induced by a function a ∈ W is defined by T B (a) : A2 (D) → A2 (D),
f → P (af ˆ ),
where aˆ is the harmonic extension of a into D, a(re ˆ iθ ) =
∞
aj r |j | eij θ .
j =−∞
An basis in A2 (D) is built by the functions {en }∞ n=1 defined by en (z) = √ orthonormal n−1 nz . ∞ (a) Show that the matrix representation of T B (a) in the basis {en }∞ n=1 is (bj k )j,k=1 with √ 2 jk aj −k , bj k = |j − k| + j + k 9 2π 1 iθ −inθ where an = 2π dθ. 0 a(e )e
(b) Show that the matrix (bj k − aj −k )∞ j,k=1 induces a compact operator on the space 2 . (c) Use (a) and (b) to establish an approximation method for the equation T B (a)f = g.
i
i i
i
i
i
i
76
buch7 2005/10/5 page 76 i
Chapter 3. Stability
Notes The material of Section 3.1 can be found in every standard text on functional analysis. Proposition 3.5 is usually referred to as Polski’s theorem [207]. For a more thorough discussion of the issues of Section 3.2 we recommend the books [71], [130], and [149]. Lemma 3.6 and its various disguises were discovered repeatedly and successfully employed in several contexts by many people. We learned of it from Anatoli Kozak in the second half of the 1970s. Theorem 3.7 was established by Baxter [18] and Reich [218] for X = 1 and by Gohberg and Feldman for X = c0 and X = p . There exist numerous different proofs of this basic theorem. The proof given here is from [67]. In [25], Theorem 3.7 is proved for banded Toeplitz matrices with methods from the theory of difference equations. The approach presented in Section 3.4 originated from a lucky constellation that emerged in the late 1970s. Silbermann was then embarking on Szegö’s limit theorem and read Widom’s paper [294] on this occasion. That paper contained in particular Proposition 3.10. At the same time, Silbermann and one of the authors were able to extend Theorem 3.7 to Toeplitz operators whose symbols are piecewise continuous with a finite number of jumps [65], and Silbermann was anxious to further extend the result to symbols with countably many jumps, that is, for general piecewise continuous symbols. In this connection, he understood that Widom’s formula is the perfect tool to carry out so-called localization over a central subalgebra provided everything is appropriately adjusted. This led him to consider the Banach algebra S and its ideal J, and in the groundbreaking paper [253], which contains all the results of Section 3.4, he solved the basic problems concerning the finite section method for Toeplitz operators that had been open at that time. Moreover, this paper laid the foundation for a new level of application of Banach algebra techniques to numerical analysis and thus to an approach that has led to plenty of impressive results during the last 25 years. Except for Theorem 3.15 and its two corollaries, the results of Section 3.5 are in principle already in [253] and [294]. We have no explicit reference for Theorem 3.15 and Corollaries 3.16 and 3.17, but these results are well known to specialists. Strohmer’s paper [260] is a very readable account of several aspects of the inversion and of the inverses of positive definite (infinite and finite) Toeplitz matrices. For Exercises 1, 8, and 9 we refer to [130]. Exercise 7 is based on [104], and Exercise 10 is from [33]. Exercise 5 is a result of [168]. We remark that the prevailing operators in discrete-time linear time-invariant systems are lower-triangular Toeplitz operators T (b+ ). If the inputs and outputs are in 2 (that is, of finite energy), then T (b+ ) must be considered on 2 and its norm is T (b+ )2 = b+ ∞ . This leads to H ∞ control. However, if the inputs and outputs are from ∞ (persistent perturbations bounded in magnitude), then the relevant norm is T (b+ )∞ = b+ W , that is, the 1 norm of the coefficients of b+ . This is the origin of what is called 1 control and provides a good example for the necessity of studying Toeplitz operators not only on 2 . For the observation made in Exercise 6, see [70, Proposition 10.25]. Recently Basor and Ehrhardt [14] studied in detail the case a = b and showed that det (I + T −1 (b)H (b)) = E(b)F (b),
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 77 i
77
where E(b) is defined by (2.26) and, with the Wiener-Hopf factorization b = G(b)b− b+ , ∞ 1 b+ (1) 1/2 exp − k(log b)2k F (b) = b+ (−1) 2 k=1 ∞ ∞ 1 2 = exp (log b)2k−1 − k(log b)k . 2 k=1 k=1 Further results: effective solution of Toeplitz systems. Theorem 3.7 and its refinements, such as the formulas of Section 3.5, are the tools we need to study asymptotic spectral properties of large Toeplitz matrices. A big business deals with exact formulas for Tn−1 (a) and with fast (O(n2 ) operations) or superfast (O(n(log n)δ ) operations) algorithms for the solution of finite Toeplitz systems Tn (a)x = y. As this is not the topic of the present book, we confine ourselves to a few modest remarks. Trench [276], [277] established recursion formulas that allowed him to obtain all entries of Tn−1 (a) from the entries of the first and last columns of Tn−1 (a). This result was independently found by Gohberg and Sementsul [134], who, moreover, formulated the result in beautiful matrix language. The Gohberg-Sementsul formula is as follows: if Tn (a) is invertible, if (x1 . . . xn ) and (y1 . . . yn ) are the first and last columns of Tn−1 (a), respectively, and if x1 = 0, then ⎛ ⎞⎛ ⎞ x1 yn yn−1 . . . y1 ⎟⎜ x1 yn . . . y2 ⎟ 1 ⎜ ⎜ x2 ⎟⎜ ⎟ Tn−1 (a) = ⎜ .. ⎟ ⎜ .. .. ⎟ . . . . ⎠⎝ x1 ⎝ . . . . ⎠ . xn ⎛ 1 − x1
⎜ ⎜ ⎜ ⎜ ⎜ ⎝
xn−1
...
x1
0 y1 y2 .. .
0 y1 .. .
0 .. .
yn−1
yn−2
...
⎞⎛ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝ y1
0
yn 0
xn 0
xn−1 xn
... ... .. .
x2 x3 .. .
⎞
⎟ ⎟ ⎟ ⎟. ⎟ xn ⎠ 0
This representation of Tn−1 (a) via triangular matrices can be used to design effective algorithms for inverting Tn (a) or for solving systems of the form Tn (a)x = y. There exist many other formulas of the above type, in particular, formulas that do not need any additional assumptions such as x1 = 0. We refer the reader to Heinig and Rost’s books [155], [157] and the references therein for more on this subject. Formulas for Tn−1 (a) in which the triangular matrices of the Gohberg-Sementsul formula are replaced by circulant matrices were detected by Lerer and Tismenetsky [188] and by Ammar and Gader [3]. These lead to different effective algorithms for finite Toeplitz systems. Finally, formulas involving only diagonal matrices and a few discrete Fourier transforms as well as the associated algorithms for systems with Toeplitz matrices can be found in [156] and the references listed there. Further results: approximating inverses of Toeplitz matrices by circulants. Given a Laurent polynomial b, define the circulant matrix Cn (b) as in Section 2.1. Suppose b has no zeros on T and wind b = 0. Then Tn (b) and Cn (b) are invertible for all sufficiently large n.
i
i i
i
i
i
i
78
buch7 2005/10/5 page 78 i
Chapter 3. Stability
Theorem 3.7 implies that the first column of Tn−1 (b) converges to the first column of T −1 (b). If b = b− b+ is a Wiener-Hopf factorization subject to the normalization (b− )0 = 1, then the first column of T −1 (b) is −1 −1 −1 −1 −1 −1 T −1 (b)e0 = T (b+ )T (b− )e0 = T (b+ )e0 = ( (b+ )0 (b+ )1 (b+ )2 . . . ) .
On the other hand, using (2.6) it easy to see that the first column of Cn−1 (b) converges to 1 ( (b−1 )0 (b−1 )1 (b−1 )2 . . . ) . 2π Consequently, in general one cannot approximate the first and last columns of Tn−1 (b) by the corresponding columns of Cn−1 (b). However, things are different for the central columns. Let tj(n) and cj(n) denote the j th column of Tn−1 (b) and Cn−1 (b), respectively. Fix − cj(n) 2 → 0 exponentially fast as n → ∞ whenever a natural number K. Then tj(n) n n |jn − n/2| ≤ K for all n. Thus, the middle columns of the inverses of banded Toeplitz matrices are approximated by the corresponding columns of appropriate circulant matrices exponentially well. This was first observed in [260] and [261] for positive symbols b and was proved in [60] for symbols with no zeros on T and with winding number zero. Figure 3.4 illustrates the phenomenon. We took the same symbol b as in Example 3.18 and computed tj(n) − cj(n) 2 (1 ≤ j ≤ n) for n = 50, n = 100, and n = 150. 0.9 0.8 n=50 0.7 0.6
n=150
n=100
0.5 0.4 0.3 0.2 0.1 0
0
50
100
150
Figure 3.4. The norms tj(n) − cj(n) 2 versus 1 ≤ j ≤ n for three choices of n.
i
i i
i
i
i
i
buch7 2005/10/5 page 79 i
Chapter 4
Instability
We know from the Baxter-Gohberg-Feldman theorem that Tn−1 (b)p → ∞ if T (b) is a noninvertible Toeplitz band matrix. This chapter is devoted to estimates of the growth of Tn−1 (b)p as n → ∞. We can slightly restate the problem. Namely, let λ ∈ C and consider Tn−1 (b − λ)p . If λ ∈ / sp T (b), then the theorem tells us that Tn−1 (b − λ)p remains bounded. In this chapter we consider the case where λ ∈ sp T (b). We first embark on the case where λ ∈ sp T (b) \ spess T (b), and after this we pass to numbers λ in spess T (b).
4.1
Outside the Essential Spectrum
Let b be a Laurent polynomial which does not vanish on T and write b in the form (1.11). As in Section 1.7, we define δ ∈ (0, 1) and μ ∈ (1, ∞) by δ = max(|δ1 |, . . . , |δJ |),
μ = min(|μ1 |, . . . , |μK |).
p
Finally, let n stand for Cn with the p norm. Theorem 4.1. Let b be a Laurent polynomial and suppose wind b = 0. Let further 1 ≤ p ≤ ∞. Then for every 1 α < min log , log μ δ there is a constant Cα depending only on α (and b, p) such that Tn−1 (b)p ≥ Cα eαn for all n ≥ 1.
(4.1)
Proof. Put χk (t) = t k (t ∈ T). We have b = χκ c− c+ with c− (t) = bs
J
1−
j =1
δj t
,
c+ (t) =
K
(t − μk ),
k=1
79
i
i i
i
i
i
i
80
buch7 2005/10/5 page 80 i
Chapter 4. Instability
and κ = wind b ∈ Z \ {0}. We assume without loss of generality that κ < 0, since otherwise we may pass to adjoint matrices. By Lemma 1.17, ∞
−1 c+ (t) =
d t
with
d = O(e−α ).
=0
Define x
(n)
∈
p n
and x ∈ by x p
(n)
= (d0 , . . . , dn−1 ) and x = {d0 , d1 , . . . }. Clearly,
Tn (b)x (n) = Pn T (c− )T (χκ )T (c+ )x (n) .
(4.2)
Since κ < 0, we get Pn (T (c− )T (χκ )e0 ) = 0, and as e0 = T (c+ )x, it follows that Pn T (c− )T (χκ )T (c+ )x = 0.
(4.3)
From (4.2) and (4.3) we obtain that Tn (b)x (n) = Pn T (c− )T (χκ )T (c+ )(x (n) − x) = Pn T (b)(x (n) − x), and since x (n) − xp = O(e−αn ), we arrive at the estimate Tn (b)x (n) p ≤ T (b)p x (n) − xp ≤ Dα e−αn . Taking into account that x (n) p → xp > 0 as n → ∞, we see that, for all sufficiently large n, Tn−1 (b)p ≥
x (n) p (1/2)xp ≥ = Bα eαn . Tn (b)x (n) p Dα e−αn
Figure 4.1 shows the norms Tn−1 (b − λ)2 (5 ≤ n ≤ 80) for b(t) = t −2 + 0.75 · t −1 + 0.65 · t and λ = −0.5, 0.82, 0.83 + 0.7i (top pictures and left picture in the middle) and for b(t) = t −2 − 2 t −1 + 1.25 · t 3 and λ = −3.405, 1.48, 0.995 + 3i (right middle picture and bottom pictures). The curve b(T) and the point λ are indicated in the lower right corners of the pictures.
4.2
Exponential Growth Is Generic
Theorem 4.1 provides us with a general lower bound for the norms Tn−1 (b)p in case b has no zeros but nonvanishing winding number. In this section we consider the problem of finding upper bounds. If b(t) = t, then Tn (b) is not invertible and hence Tn−1 (b)p = ∞ for all n ≥ 1. This indicates that a universal upper bound for the growth of Tn−1 (b)p will hardly exist. Hadamard’s inequality says that if a matrix X ∈ Cn×n has the columns x1 , . . . , xn , then |det X| ≤ x1 2 · · · xn 2 ,
where xj 2 is the 2 norm of xj . Note that if b(t) = sj =−r bj t j (t ∈ T), then s
1 |bj | = 2π j =−r
2
0
2π
|b(eiθ )|2 dθ = b22 .
(4.4)
i
i i
i
i
i
i
4.2. Exponential Growth Is Generic
buch7 2005/10/5 page 81 i
81
15
4
10
10
3
10
10
10
2
10 5
10
1
10 0
10
0
0
20
40
60
80
100
6
10
0
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
4
10
10
3
10
4
10
2
10 2
10
1
10 0
10
0
0
20
40
60
80
100
15
10
0
2
10
10
10
10
1
10 5
10
0
10
0
0
20
40
60
80
100
10
0
Figure 4.1. Norms Tn−1 (b − λ)2 for two symbols b and three λ’s.
i
i i
i
i
i
i
82
buch7 2005/10/5 page 82 i
Chapter 4. Instability
Theorem 4.2. Let b be a Laurent polynomial and suppose that the determinant Dn (b) := det Tn (b) is nonzero. Then for every 1 ≤ p ≤ ∞, Tn−1 (b)p ≤
1 n1/q bn−1 2 , |Dn (b)|
(4.5)
where 1/p + 1/q = 1 and n1/∞ := 1. n Proof. We have Tn−1 (b) = (1/Dn (b))An (b), where An (b) = (aj(n) k (b))j,k=1 and the number (n) j +k times the determinant of the matrix arising from Tn (b) by deleting the aj k (b) is (−1) kth row and the j th column. Simple application of Hölder’s inequality shows that
1/q (n) q q An (b)p ≤ max |aj(n) (b)| + · · · + |a (b)| . 1 jn 1≤j ≤n
(4.6)
By Hadamard’s inequality, ⎛ ⎝ |aj(n) k (b)| ≤
s
j =−r
⎞1/2 |bj |2 ⎠
⎛ ···⎝
s
⎞1/2 |bj |2 ⎠
j =−r
,
n−1 n−1 and (4.4) therefore gives |aj(n) k (b)| ≤ b2 . Combining this estimate and (4.6) we arrive at (4.5).
Example 4.3. Let ⎛
⎞ 0 −4 0 ... ⎜ 1 0 −4 . . . ⎟ ⎟. T (b) = ⎜ ⎝ 0 1 0 ... ⎠ ... ... ... ... Then b(t) = t − 4t −1 = t −1 (t − 2)(t + 2), which shows that wind b = −1. From Example 2.9 we get n |2n+1 − (−2)n+1 | 2 if n is even, |Dn (b)| = = 0 if n is odd. |2 − (−2)| √ By virtue of (4.4), b2 = 17. Hence, (4.5) implies that Tn−1 (b)2 is at most √ n √ 1 17 1 1/2 = √ n1/2 en log( 17/2) < 0.25 n e0.723 n < 0.25 e0.73 n √ n 2 17 17 for all sufficiently large even n. On the other hand, (4.1) with α = log 2 − 0.001 shows that Tn−1 (b)2 ≥ Cen(log 2−0.001) > Ce0.69 n with some constant C > 0 for all n. In summary, there are C1 , C2 ∈ (0, ∞) such that C1 e0.69 n ≤ Tn−1 (b)2 ≤ C2 e0.73 n for all even n, and we have Tn−1 (b)2 = ∞ for all odd n.
i
i i
i
i
i
i
4.3. Arbitrarily Fast Growth
buch7 2005/10/5 page 83 i
83
Let D be the set of all Laurent polynomials of the form b(t) = bs t −r (t − z1 ) · · · (t − zr+s )
(t ∈ T)
(4.7)
with bs = 0 and 0 < |z1 | < |z2 | < · · · < |zr+s |. Corollary 4.4. Let 1 ≤ p ≤ ∞. If b ∈ D, then there are constants γ ∈ (0, ∞) and Dγ ∈ (0, ∞) depending only on b and p such that Tn−1 (b)p ≤ Dγ eγ n for all n ≥ 1.
(4.8)
Proof. From Theorem 2.8 we deduce that |Dn (b)| = A |bs |n |zr+1 · · · zr+s |n (1 + O(q n ))
(4.9)
with some A = 0 and some q ∈ (0, 1). Combining (4.9) and (4.5), we get (4.8). Corollary 4.5. Let 1 ≤ p ≤ ∞ and let E be the set of all Laurent polynomials that have no zeros on T and whose winding number is nonzero. (a) E ∩ D is a dense and open subset of the set E (with the uniform metric). (b) If b ∈ E ∩ D, then there are constants C1 , C2 ∈ (0, ∞) and γ1 , γ2 ∈ (0, ∞) depending only on b and p such that C1 eγ1 n ≤ Tn−1 (b)p ≤ C2 eγ2 n for all n ≥ 1.
(4.10)
Proof. (a) It is clear that E ∩ D is an open subset of E. Every Laurent polynomial b can be written in the form (4.7). Put bε1 ,...,εr+s (t) = bs t −r (t − z1 − ε1 ) · · · (t − zr+s − εr+s ). Clearly, given any ε > 0, we can find ε1 , . . . , εr+s ∈ C such that the moduli |z1 + ε1 |, . . . , |zr+s + εr+s | are pairwise distinct and such that b − bε1 ,...,εr+s ∞ < ε. If b ∈ E, then bε1 ,...,εr+s can obviously also be chosen in E. This proves that E ∩ D is dense in E. (b) The lower estimate of inequality (4.10) holds for every b ∈ E by virtue of Theorem 4.1, and the upper estimate of (4.10) is satisfied for every b ∈ D due to Corollary 4.4. In view of the preceding corollary, we may say that exponential growth of the norms Tn−1 (b)p is generic for Toeplitz band matrices which generate Fredholm operators of nonzero index.
4.3 Arbitrarily Fast Growth In the previous section we observed that if T (b) is Fredholm of nonzero index, then there may be infinitely many n such that Tn−1 (b)p = ∞. Example 4.3 gives rise to the following question: Is there a kind of function ϕ : N → N (for example, ϕ(n) = eγ n ) such that either
i
i i
i
i
i
i
84
buch7 2005/10/5 page 84 i
Chapter 4. Instability
Tn−1 (b)p = ∞ or Tn−1 (b)p ≤ Cϕ(n) with some constant C ∈ (0, ∞) independent of n? The purpose of the present section is to show that the answer to this question is no. Pick α ∈ (0, 1) and put b(t) = t + α 2 t −1 = t −1 (t + iα)(t − iα)
(t ∈ T).
(4.11)
Since b(eiθ ) = (1 + α 2 ) cos θ + i(1 − α 2 ) sin θ , we see that b(T) is an ellipse with the foci −2α and 2α. If λ ∈ (−2α, 2α), then b − λ has no zeros on T and wind (b − λ) equals 1. Theorem 4.6. Let ϕ : N → N be any monotonically increasing function, for example, ϕ(n) = exp(nn ), and let 1 ≤ p ≤ ∞. Then, with b given by (4.11), there exists a number λ ∈ (−2α, 2α) such that Tn−1 (b)p < ∞ for all n ≥ 1 and Tn−1 (b)p > nk ϕ(nk ) for k infinitely many nk ∈ N. Proof. Every λ ∈ (−2α, 2α) can be written in the form λ = 2α cos y with y ∈ (0, π ). By Theorem 2.4, the eigenvalues of Tn (b − λ) are λ(n) j := 2α cos
πj − 2α cos y, n+1
j ∈ {1, . . . , n}.
(4.12)
Notice that |λ(n) j |
1 y 1 j πj πj = 4α sin − y sin + y ≤ 2απ − . 2 n+1 2 n+1 π n + 1
(4.13)
(n) Let λ(n) be the minimum of |λ(n) 1 |, . . . , |λn |. Since Tn−1 (b − λ)p ≥ rad Tn−1 (b − λ) = 1/λ(n),
where rad (·) denotes the spectral radius, it suffices to prove that there is a y such that 0 < λ(n) for all n ≥ 1 and λ(nk ) < 1/(nk ϕ(nk )) for infinitely many nk . Pick any natural number N1 ≥ 1 and choose natural numbers N1 < N2 < N3 < · · · successively by requiring that (4.14) 2απ 10N1 +···+Nk − 1 ϕ 10N1 +···+Nk − 1 < 10Nk+1 (k ≥ 1). Put nk := 10N1 +···+Nk − 1 and y/π := 10−N1 + 10−N1 −N2 + 10−N1 −N2 −N3 + · · · . Obviously, y/π is irrational. This implies that none of the eigenvalues (4.12) is zero, and hence λ(n) > 0 for all n ≥ 1. As 0 < 10−N1 + 10−N1 −N2 + · · · + 10−N1 −···−Nk < 1, it follows that 10−N1 + 10−N1 −N2 + · · · + 10−N1 −···−Nk = jk 10−N1 −···−Nk with a natural number jk satisfying 1 ≤ jk ≤ 10N1 +···+Nk − 1 = nk . We have jk y − = 10−N1 + 10−N1 −N2 + · · · π nk + 1 − 10−N1 + 10−N1 −N2 + · · · + 10−N1 −···−Nk = 10−N1 −···−Nk+1 + 10−N1 −···−Nk+2 + · · · ,
i
i i
i
i
i
i
4.4. Sequences Versus Polynomials
buch7 2005/10/5 page 85 i
85
which shows that 0
0, it results that Pn T (b − λ)x = Pn T (ξβ c− )T (χ−m )T (c+ )x = 0. Hence, again as in the proof of Theorem 4.1, Tn (b − λ)x (n) 2 = Pn T (b − λ)x (n) 2 = Pn T (b − λ)(x (n) − x)2 ≤ T (b − λ)2 x (n) − x2 ≤ D e−αn with certain constants D, α ∈ (0, ∞), which yields (4.36). Example 4.19. Consider the symbol b(t) = (t − 1)2 t k (2.001 + t + 0.49t −1 ). Figure 4.3 shows what happens in the five cases k = −3, −2, −1, 0, 1. In each picture we see the norm Tn−1 (b)2 against n. We also plotted the shape of the curve b(T) in the lower-right corner; the origin is marked by a big dot. As predicted by Theorem 4.18, the norms increase at least exponentially for k = −3 and k = 1, while the growth of the norms is polynomial for −2 ≤ k ≤ 0. In the picture in the bottom, we replaced values greater than 1015 by the value 1017 . Our next objective is to translate Theorem 4.18 into geometrical language. We label each connected component of C \ b(T) by the winding number of the oriented curve b(T) about the points of the component. Let λ ∈ b(T)\S(b). Then there is an open neighborhood Uλ ⊂ C of λ such that Uλ ∩ b(T) =: γλ is an oriented analytic arc. Clearly, λ belongs to − + the boundaries of exactly two components + λ and λ of C \ b(T). We let λ stand for − the component on the left of γλ , and, accordingly, λ is the component on the right of γλ . − Lemma 4.20. Let λ ∈ b(T) \ S(b). If m is the winding number of + λ , then λ has the winding number m − 1.
Proof. Let n be the winding number of − λ . Fix a sufficiently small disk Uλ centered at λ, pick a point μ ∈ Uλ ∩ + , and replace b(T) by the continuous curve δλ that coincides with λ b(T) outside Uλ and with ∂Uλ ∩ + otherwise. It is obvious that wind (δλ , μ) = m − 1. On λ the other hand, since μ and − are contained in the same connected component of C \ δλ , λ we have wind (δλ , μ) = n. Consequently, n = m − 1. For r ∈ (0, 1), put br (t) = b(rt) (t ∈ T). Lemma 4.21. Let λ ∈ b(T) \ S(b) and let β be the order of the zero of b − λ on T. Suppose r ∈ (0, 1) is sufficiently close to 1 and a point moves along the curve br (T), following the orientation of this curve. Then, in a small neighborhood Uλ ⊂ C of λ, this point is first in + λ ∩ Uλ , then it encircles λ exactly [(β − 1)/2] times in the clockwise direction, after which it is again in + λ ∩ Uλ .
i
i i
i
i
i
i
98
buch7 2005/10/5 page 98 i
Chapter 4. Instability 15
3
10
10
k = −3
k = −2
10
2
10
10
5
1
10
10
0
10
0
0
20
40
60
80
3
10
0
60
80
40
60
80
10
k = −1
k=0
2
2
10
10
1
1
10
10
0
0
40
3
10
10
20
0
20
40
60
80
10
0
20
20
10
15
10
k=1
10
10
5
10
0
10
0
20
40
60
80
Figure 4.3. Norms Tn−1 (b)2 for several symbols b with zeros.
i
i i
i
i
i
i
4.7. Inside the Essential Spectrum
buch7 2005/10/5 page 99 i
99
Proof. This follows from the fact that the Riemann surface of b(z) − λ at z = t0 is locally homeomorphic to the Riemann surface of zβ at the origin and that rT lies “on the left” of the circle T. Theorem 4.22. Let λ ∈ b(T) \ S(b). Suppose the order of the zero of b − λ is β and the winding number of + λ is m. Then * + * + β +1 β Tn−1 (b − λ)2 nβ if − ≤m≤ , (4.37) 2 2 and there are constants C ∈ (0, ∞) and α ∈ (0, ∞) such that * + * + β +1 β −1 αn Tn (b − λ)2 ≥ C e if m < − or m > . 2 2
(4.38)
Proof. Write b − λ in the form (4.34). Then b(rt) − λ = (rt − t0 )β r k t k c(rt) for t ∈ T. If r ∈ (0, 1) is sufficiently close to 1, then λ ∈ / br (T) and wind (b(rt), λ) = wind (b(rt) − λ, 0) = wind ((rt − t0 )β , 0) + k + wind (c(rt), 0) = 0 + k + 0 = k. Evidently, br − b∞ → 0 as r → 1 − 0. This in conjunction with Lemma 4.21 shows that if μ ∈ − λ is sufficiently close to λ, then * + β −1 wind (b(rt), λ) = wind (b(rt), μ) = wind (b(t), μ) − . 2 Consequently, k = wind (b, μ) − [(β − 1)/2]. By Lemma 4.20, wind (b, μ) = m − 1. It results that k = m − 1 − [(β − 1)/2], and since * + * + * + β −1 β β +1 −β ≤ m − 1 − ≤ 0 ⇐⇒ − ≤m≤ , 2 2 2 (4.37) and (4.38) follow from (4.35) and (4.36). Here are two interesting special cases. Recall that, by Corollary 1.12, sp T (b) = b(T) ∪ λ ∈ C \ b(T) : wind (b, λ) = 0 . Corollary 4.23. If λ ∈ b(T) \ S(b) is located on ∂ sp T (b), the boundary of the spectrum of T (b), and β is the order of the zero of b − λ, then Tn−1 (b − λ)2 nβ . − Proof. By assumption, + λ or λ has the winding number zero. From Lemma 4.20 we deduce that the winding number m of + λ is 0 or 1. Since [(β + 1)/2)] ≥ 1 and −[β/2] ≤ 0 for every β ∈ N, the assertion follows from (4.37).
Corollary 4.24. If λ belongs to b(T) \ S(b) and the order of the zero of b − λ is 1, then Tn−1 (b −λ)2 n for λ ∈ ∂ sp T (b), while Tn−1 (b −λ)2 increases at least exponentially in case λ ∈ / ∂ sp T (b).
i
i i
i
i
i
i
100
buch7 2005/10/5 page 100 i
Chapter 4. Instability
Proof. The first part of the assertion is immediate from Corollary 4.23. So suppose λ ∈ + − b(T) \ (S(b) ∪ ∂ sp T (b)) and let m be the winding number of + λ . As neither λ nor λ have the winding number zero, we infer from Lemma 4.20 that m = 0 and m − 1 = 0. Consequently, m > 1 or m < 0, and Theorem 4.22 with β = 1 completes the proof. Example 4.25. Let b(t) = (t + 1)3 . We have b(T) = A ∪ {−1} ∪ B ∪ {0}, where A and B are as in Figure 4.4. 6
A
4
2
B 0
–1
A
0 B
–2
–4 A
–6 –2
–1
0
1
2
3
4
5
6
7
8
9
Figure 4.4. The curve b(T) for b(t) = (t + 1)3 . Theorem 4.22 implies that Tn−1 (b
− λ)2
n n3
for λ ∈ A, for λ = 0,
and that Tn−1 (b − λ)2 increases at least exponentially for λ ∈ B. We are left with λ = −1. In that case b(t) − λ = (t + 1)3 + 1 = (t 2 + t + 1)(t + 2) = (t − ω)(t − ω2 )(t + 2), where ω = e2π i/3 . Since the two zeros of b − λ on T are of order 1, Corollary 4.12 implies that Tn−1 (b − λ)2 ≥ C n with some constant C. The results we have established so far cannot be used to obtain an upper estimate for Tn−1 (b − λ)2 . However, in the case at hand we can proceed as follows. Since Tn (b(t) − λ) = Tn (t 2 + t + 1)Tn (t + 2) and supn Tn−1 (t + 2)2 < ∞ by Theorem 3.7, it remains to estimate the norms Tn−1 (t 2 + t + 1)2 from above. It can be checked straightforwardly that Tn−1 (t 2 + t + 1) is the lower-triangular Toeplitz matrix whose first column is ( 1, −1, 0, 1, −1, 0, 1, −1, 0, . . . ) .
i
i i
i
i
i
i
4.8. Semi-Definite Matrices
buch7 2005/10/5 page 101 i
101
By computing the Frobenius norm of that matrix or by writing it as Tn (1) + Tn (−t) + Tn (t 3 ) + Tn (−t 4 ) + Tn (t 6 ) + Tn (−t 7 ) + · · · , we see that the spectral norm is O(n). In summary, Tn (b − λ)2 n for λ = −1. Example 4.26. Things are very complicated for points λ ∈ S(b). Let, for example, b(t) − λ = t k (t − 1)(t + 1) (t ∈ T). Corollary 4.12 gives Tn−1 (b − λ)2 ≥ C n for all n ≥ 1. If k ≤ −3 or k ≥ 1, then Tn (b − λ) is triangular with zeros on the main diagonal and hence Tn−1 (b−λ)2 = ∞. In the case k = 0, the inverse of Tn (b−λ) is the lower-triangular Toeplitz matrix with the first column ( −1, 0, −1, 0, −1, 0, . . . ) , and as in Example 4.25 we obtain that the spectral norm of this matrix is O(n). Thus, Tn−1 (b − λ)2 n for k = 0. Passage to adjoints yields the same result for k = −2. Finally, let k = −1. Then Tn (b − λ) is a skew-symmetric tridiagonal Toeplitz matrix. Thus, Tn−1 (b − λ)2 = ∞ if n is odd. For even n we may employ the fact that Tn (b − λ) is normal and hence Tn−1 (b − λ)2 = 1/μn where μn is the minimum of the moduli of the eigenvalues of Tn (b − λ). From Theorem 2.4 we infer that the eigenvalues of T2m (b − λ) are 2i cos πj/(2m + 1) (j = 1, . . . , 2m), which shows that μ2m = 2 cos
πm π 2π π π = 2 sin ∼ ∼ = . 2m + 1 4m + 2 4m + 2 2m n
Consequently, Tn−1 (b − λ)2 ∼ n/π for even n, where here and throughout the book αn ∼ βn means that αn /βn → 1.
4.8
Semi-Definite Matrices
A matrix A ∈ Cn×n is said to be positive semi-definite if Re (Ax, x) ≥ 0 for all x in Cn and is called positive definite if there is an ε > 0 such that Re (Ax, x) ≥ εx22 for all x ∈ Cn . From (4.17) we infer that if a ∈ W and Re a(t) ≥ 0 for all t ∈ T, then Tn (a) is positive semi-definite, and that if a ∈ W and Re a(t) ≥ ε > 0 for all t ∈ T, then Tn (a) is positive definite. In this section we establish upper estimates for Tn−1 (b)2 in terms of the zeros of the real part Re b of b provided b is positive semi-definite, that is, in the case where Re b(t) ≥ 0 for all t ∈ T. The estimates we will obtain are very coarse in general, but they have some advantages. First, they imply that if b is a Laurent polynomial and λ ∈ ∂ conv b(T) then Tn−1 (b−λ)2 grows at most polynomially; note that λ is allowed to belong to S(b). Second, our estimates for Tn−1 (b)2 are sharp in case b is real valued and nonnegative. For a ∈ W , let R(a) = a(T) be the range of a, let conv R(a) stand for the convex hull of R(a), let ∂ conv R(a) denote the boundary of conv R(a), and put dist (0, conv R(a)) := min{|z| : z ∈ conv R(a)}. Proposition 4.27. Suppose a ∈ W does not vanish identically and R(a) is not a line segment containing the origin in its interior. If 0∈ / conv R(a) or 0 ∈ ∂ conv R(a),
(4.39)
i
i i
i
i
i
i
102
buch7 2005/10/5 page 102 i
Chapter 4. Instability
then Tn (a) is invertible for all n ≥ 1. n Proof. Assume Tn (a) is not invertible. Then Tn (a)x = 0 for some 9 x ∈2 C \ {0}, and (4.17) + implies that there exists a polynomial f ∈ Pn \ {0}9 such that a|f | = 0. By (4.39), we can find a number γ ∈ T such that Re (γ a) ≥ 0. As Re (γ a)|f |2 = 0 and |f |2 > 0 almost everywhere, it follows that Re (γ a) 9 = 0 throughout T. Consequently, R(γ a) = i[m, M] with real numbers m < M. Since Im (γ a)|f |2 = 0 and |f |2 > 0 almost everywhere, we deduce that m < 0 and M > 0. However, this case was excluded.
Corollary 4.28. Let a ∈ W . If R(a) is a singleton, then sp Tn (a) = R(a). If R(a) is a line segment [z1 , z2 ] with z1 = z2 , then sp Tn (a) ⊂ [z1 , z2 ] \ {z1 , z2 }.
(4.40)
Finally, if R(a) is neither a singleton nor a line segment, then sp Tn (a) ⊂ conv R(a) \ ∂ conv R(a).
(4.41)
Proof. The case where R(a) is a singleton is trivial. The inclusion (4.41) is immediate from Proposition 4.27. We are so left with the case where R(a) is a proper line segment. From Proposition 4.27 we deduce that 9 sp Tn (a) ⊂ [z1 , z2 ]. Assume z1 is in sp Tn (a). Then, as in the proof of Proposition 4.27, (a − z1 )|f |2 = 0 for some nonzero f ∈ Pn+ . Since R(γ (a − z1 )) ⊂ [0, ∞) for some γ ∈ T, we therefore see that a(t) = z1 for all t ∈ T, / sp Tn (a). Analogously one can show that which means that R(a) is a singleton. Thus, z1 ∈ z2 ∈ / sp Tn (a). Theorem 4.29 (Brown and Halmos). Let a ∈ W and suppose d := dist (0, conv R(a)) > 0.
(4.42)
Then T (a) is invertible on 2 with T −1 (a)2 ≤ 1/d
(4.43)
and Tn (a) is invertible for all n ≥ 1 with Tn−1 (a)2 ≤ 1/d.
(4.44)
Proof. There is a γ ∈ T such that the set γ conv R(a) is contained in the half-plane {z ∈ C : Re z ≥ d}. Fix any ε ∈ (0, d). If r is sufficiently large, then the disk {z ∈ C : |z − (d + r − ε)| < r} certainly contains the set γ conv R(a). Thus |γ a(t) − d − r + ε| < r for all t ∈ T and therefore γ a(t) r d + r − ε − 1 < d + r − ε for all t ∈ T. Since γ T (a) = I + T d +r −ε
γa −1 d +r −ε
i
i i
i
i
i
i
4.8. Semi-Definite Matrices
buch7 2005/10/5 page 103 i
103
and the norm of the Toeplitz operator on the right is less than r/(d + r − ε) < 1, it follows from expansion into the Neumann series that T (a) is invertible and that d +r −ε 1 d +r −ε = T −1 (a)2 < , r |γ | 1 − d+r−ε d −ε whence T −1 (a)2 < 1/(d − ε). As ε ∈ (0, d) can be chosen as small as desired, we arrive at estimate (4.43). Clearly, the argument remains true with T (a) replaced by Tn (a) and so also gives (4.44). Example 4.30. Consider the n × n Toeplitz matrix ⎛ 2 −1 0 ⎜ −1 2 −1 ⎜ ⎜ Tn (b) = ⎜ 0 −1 2 ⎜ .. .. .. ⎝ . . . 0
0
0
⎞ ... 0 ... 0 ⎟ ⎟ ... 0 ⎟ ⎟. .. ⎟ .. . . ⎠ ... 2
The symbol is b(eiθ ) = −e−iθ + 2 − eiθ = 2(1 − cos θ). Obviously, R(b) = [0, 4] and hence d = dist (0, conv R(b)) = 0. Thus, Theorem 4.29 is not applicable. However, let us replace b by b + ign , where gn (eiθ ) = cos nθ = (einθ + e−inθ )/2. As the Fourier coefficients (gn )k of gn are zero for |k| ≤ n − 1, we have Tn (b) = Tn (b + ign ).
(4.45)
Clearly, dn := dist (0, conv R(b + ign )) > 0, and we can therefore apply Theorem 4.29 to the matrices (4.45). Our aim is to estimate dn from below. The graph of b + ign in C = R2 is given by (2 − 2 cos θ, cos nθ ),
θ ∈ (−π, π ].
(4.46)
Put εn =
π 8 2 π 4 1 − cos = sin . 3 3n 3 6n
(4.47)
1 1 − (2 − 2 cos θ) , 2 εn
(4.48)
The graph of 2 − 2 cos θ,
θ ∈R
is the straight line y = 1/2 − (1/εn )x. We show that the range of b + ign lies above this line. By (4.46) and (4.48) this is equivalent to showing that 1 1 (2 − 2 cos θ) + cos nθ ≥ εn 2
(4.49)
i
i i
i
i
i
i
104
buch7 2005/10/5 page 104 i
Chapter 4. Instability
for θ ∈ (−π, π]. If |nθ| < π/3, then cos nθ > 1/2 and hence (4.49) is true. If |nθ| ≥ π/3, then cos θ ≤ cos(π/(3n)), whence, by (4.47), 1 π 1 1 2 − 2 cos (2 − 2 cos θ ) + cos nθ ≥ −1= , εn εn 3n 2 which gives (4.49) again. Thus, dn ≥ Dn where Dn is the distance of the origin to the straight line y = 1/2 − (1/εn )x. Obviously, Dn =
εn 1 . # 4 1/4 + εn2 /4
√ Since εn → 0 as n → ∞, we have Dn > εn /(2 2 ) for all sufficiently large n. Taking into account (4.44), (4.45), (4.47) we therefore obtain √ √ √ 3 2 π 6n 2 1 1 2 2 3 2 1 −1 < ≤ < = < 10 n2 . Tn (b)2 ≤ dn Dn εn 4 sin2 (π/(6n)) 4 2 π To extend the trick employed in this example to symbols with more than one zero, we need a further result. Lemma 4.31 (Dirichlet). Let β1 , . . . , βN be real numbers and μ > 0. Then there exists a number q ∈ N such that 1 ≤ q ≤ ([1/μ] + 1)N and qβj ∈ Z + (−μ, μ) for all j ∈ {1, . . . , N}. Proof. For x ∈ R, denote by {x} the fractional part of x. Thus, x = [x] + {x} with [x] ∈ Z and {x} ∈ [0, 1). Put K = [1/μ] + 1 and divide the cube [0, 1)N into K N congruent cubes of the form [i1 /K, (i1 + 1)/K) × · · · × [iN /K, (iN + 1)/K).
(4.50)
The K N +1 points ({β1 }, . . . , {βN }), = 0, 1, . . . , K N all belong to [0, 1)N and therefore two of them must be located in the same cube (4.50). Consequently, there are 1 , 2 such that 0 ≤ 1 < 2 ≤ K N and |{2 βj } − {1 βj }| < 1/K for all j . Put q = 2 − 1 and mj = [2 βj ] − [1 βj ]. Then |qβj − mj | = |2 βj − [2 βj ] − (1 βj − [1 βj ])| = |{2 βj } − {1 βj }| < 1/K < μ. Theorem 4.32. Let b be a Laurent polynomial and suppose 0 ∈ b(T). Assume that Re b ≥ 0 on T and that Re b is not identically zero. Then Re b has a finite number of zeros on T and the orders of these zeros are all even. If 2α is the maximal order of the zeros of Re b on T, then Tn−1 (b)2 ≤ D n2α for all n ≥ 1 with some constant D ∈ (0, ∞) independent of n.
i
i i
i
i
i
i
4.8. Semi-Definite Matrices
buch7 2005/10/5 page 105 i
105
Proof. Put u(θ ) := Re b(eiθ ). Clearly, u is also a Laurent polynomial. By assumption, u does not vanish identically, u(θ ) ≥ 0 for all θ , and u has N ≥ 1 zeros θ1 , . . . , θN ∈ (−π, π ] of even orders 2α1 , . . . , 2αN . Since Tn (b) is invertible for all n ≥ 1 due to Proposition 4.27, it suffices to prove the estimate Tn−1 (b)2 ≤ D n2α for all n large enough. Using Lemma 4.31 with μ = 1/12 and βj = nθj /(2π ), we get an integer qn such that 1 ≤ qn ≤ 13N ,
nqn θj ∈ 2π Z + (−π/6, π/6).
(4.51)
We have cos(nqn θ ) = cos(nqn θj ) cos(nqn (θ − θj )) − sin(nqn θj ) sin(nqn (θ − θj )), and (4.51) shows that √ π 3 cos(nqn θj ) > cos = , 6 2
sin(nqn θj ) < sin
π 1 = . 6 2
If |nqn (θ − θj )| < π/6, then √ 3 π cos(nqn (θ − θj )) > cos = , 6 2
sin(nqn (θ − θj )) < sin
π 1 = . 6 2
Hence, √ √ 3 3 1 1 π 1 . − = for |θ − θj | < cos(nqn θ ) > 2 2 2 2 2 6nqn Choose δ > 0 so that the sets (θj − δ, θj + δ) are pairwise disjoint and put : 1 1 := min u(θ ) : ≤ |θ − θj | < δ . ωj (n) n
(4.52)
(4.53)
Since θj is a zero of the order 2αj , there is a constant Cj ∈ (1, ∞) such that (1/Cj ) n2αj ≤ ωj (n) ≤ Cj n2αj . Let v(θ ) := Im b(eiθ ) and put 1 εn,j
:= 3 (v∞ + 1) ωj
6nqn π
(4.54)
,
M := 2 (v∞ + 1).
(4.55)
Consider the function an (eiθ ) := b(eiθ ) + i M cos(nqn θ). Since qn ≥ 1, we have Tn (b) = Tn (an ). Now let n be so large that π/(6nqn ) < δ. We claim that the range R(an |(θj − δ, θj + δ)) lies above the straight line y = 1 − x/εn,j . As an (eiθ ) = u(θ ) + i(v(θ ) + M cos(nqn θ)), this is equivalent to claiming that v(θ ) + M cos(nqn θ) > 1 − u(θ )/εn,j
i
i i
i
i
i
i
106
buch7 2005/10/5 page 106 i
Chapter 4. Instability
for all θ ∈ (θj − δ, θj + δ). We prove that actually u(θ )/εn,j + M cos(nqn θ) > 1 + v∞
(4.56)
for all θ ∈ (θj − δ, θj + δ). If |θ − θj | < π/(6nqn ), then (4.52), (4.54), and the nonnegativity of u(θ ) give u(θ )/εn,j + M cos(nqn θ) >
m = v∞ + 1. 2
So let π/(6nqn ) ≤ |θ − θj | < δ. Then, by (4.53) and (4.55), u(θ )/εn,j + M cos(nqn θ) ≥
1 −M εn,j ωj (6nqn /π )
= 3 (v∞ + 1) − 2 (v∞ + 1) = v∞ + 1. This completes the proof of (4.56). ; Thus, the range of the restriction of an to j (θj − δ, θj + δ) lies above the line y = 1 − x/εn ,
εn := min εn,j j
(4.57)
(here we also took into account that Re an ≥ 0). The number η given by ⎫ ⎧ < N ⎬ ⎨ . (θj − δ, θj + δ) η = inf u(θ ) : θ ∈ (−π, π ] ⎭ ⎩ j =1
;
is positive. If θ ∈ (−π, π ] \ j (θj − δ, θj + δ), then an (eiθ ) is located on the right of the vertical line x = η. Since 1/εn → ∞ as n → ∞, it follows that R(an ) is contained in the half-plane above the line (4.57) for all sufficiently large n. # The distance of the origin to the line (4.57) is Dn := εn / 1 + εn2 . Thus, Dn > εn /2 if only n is large enough. From Theorem 4.29 and (4.55), (4.57) we now obtain that 6nqn 1 2 −1 −1 < = 6 (v∞ + 1) max ωj , Tn (b)2 = Tn (an )2 ≤ j Dn εn π whence, by (4.51) and (4.54), Tn−1 (b)2 = O(nmax(2α1 ,...,2αN ) ). Example 4.33. In general, there is a gap between Theorems 4.11 and 4.32. Let b(t) = t and λ = eiθ0 ∈ T. Since b − λ has a zero of the order 1 on T, Theorem 4.11 gives Tn−1 (b − λ)2 ≥ C n. We have Re (−λ−1 (b − λ)) = 1 − Re (e−iθ0 eiθ ) = 1 − cos(θ − θ0 ) ≥ 0, and 1 − cos(θ − θ0 ) has a zero of the order 2. Thus, by Theorem 4.32, Tn−1 (b − λ)2 = Tn−1 (−λ−1 (b − λ))2 ≤ D n2 .
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 107 i
107
In fact Tn−1 (b − λ)2 ≤ n, because
⎛ Tn−1 (b
⎜ 1⎜ ⎜ − λ) = − ⎜ λ⎜ ⎝
1 1/λ 1/λ2 .. .
0 1 1/λ .. .
... ... ... .. .
0 0 0 .. .
1/λn−1
1/λn−2
... 1
⎞ ⎟ ⎟ ⎟ ⎟, ⎟ ⎠
and on writing this matrix as 1 1 1 − Tn (1) − 2 Tn (t) − · · · − n−1 Tn (t n−1 ), λ λ λ we see that its spectral norm is at most n. In certain cases combination of Theorems 4.11 and 4.32 yields a sharp result. Here is the most important of these cases. Corollary 4.34. Let b be a real-valued Laurent polynomial and suppose b is not constant. Then R(b) = [m, M] with m < M. If λ ∈ {m, M} and the maximal order of the zeros of b − λ on T is 2α, then Tn−1 (b − λ)2 n2α . Proof. The proof is immediate from Theorems 4.11 and 4.32.
Exercises 1. Try to prove Theorem 4.14. 2. (a) Find a b ∈ P such that b(T) = [m, M] with m < 0, M > 0, and Dn (b) = 0 for all n ≥ 1. (b) Find a b ∈ P such that b(T) = [m, M] with m < 0 and M > 0 and such that there are infinitely many n with Dn (b) = 0 and infinitely many n with Dn (b) = 0. 3. Show that if b ∈ P does not vanish identically and b(T) ⊂ R, then Dn (b) = 0 for infinitely many n. 4. Let a ∈ W and denote by Sn a and σn a the nth partial sum of the Fourier series of a and the nth Fejér mean, respectively. (a) Show that if Re a ≥ ε > 0 on T, then T −1 (Sn a)2 → T −1 (a)2 ,
T −1 (σn a)2 → T −1 (a)2 .
(b) Show that if a(eiθ ) = θ 2 /4 (θ ∈ (−π, π ]), then a ∈ W and T (Sn a) is not invertible whenever n is odd.
i
i i
i
i
i
i
108
buch7 2005/10/5 page 108 i
Chapter 4. Instability (c) Show that if Re a ≥ 0 on T and Re a has exactly N zeros of the orders 2α1 , . . . , 2αN on T, then T −1 (σn a)2 = O n2 max(α1 ,...,αN ) .
5. Let a ∈ W and suppose a vanishes identically on some subarc of T. Show that there exist positive constants C and α such that Tn−1 (a)2 ≥ Ceαn for all n ≥ 1. 6. Let a ∈ W and suppose Re a ≥ 0 on T. Assume that Tn−1 (a)2 = O(nα ) for some α 0. We denote by 2α the Hilbert space of all complex sequences {xn }∞ n=1 for which
≥ 2α n |xn |2 < ∞. Prove that if y is an element of 2 such that the equation T (a)x = y has a solution x ∈ 2α , then the solution x (n) of Tn (a)x (n) = Pn y converges to x in the norm of 2 . 7. We denote by 2∞ the countably normed space of all complex sequences x = {xn }∞ n=1 satisfying ∞
x22,k :=
n2k |xn |2 < ∞
n=1
for all k ∈ N. Let b ∈ P. (a) Show that T (b) is bounded on 2∞ . (b) Let τ1 , . . . , τm be the zeros of b on T (repeated according to their multiplicities). Then b(t) =
m
1−
j =1
τj c(t), t
where c(t) = 0 for t ∈ T. Show that T (b) is invertible on 2∞ if and only if wind c = 0. (c) Let K be a compact operator on 2∞ and suppose that T (b) + K is invertible on 2∞ . Show that for each y ∈ 2∞ the equations (Tn (a) + Pn KPn )x (n) = Pn y have a unique solution whenever n is large enough and that x (n) converges in 2∞ to the solution x ∈ 2∞ of the equation (T (a) + K)x = y. 8. Show that Tn−1 (2 − t − t −1 ) equals ⎛ 1 1 ... ... 1 ⎜ 1 2 ... . . . 2 ⎜ ⎜ .. .. . . .. ⎜ . . . . ⎜ ⎜ .. .. ⎝ . . n−1 n−1 1 2 ... n − 1 n
⎞
⎛ ⎟ ⎟ ⎜ ⎟ ⎟− 1 ⎜ ⎟ n+1⎜ ⎝ ⎟ ⎠
1 2 .. .
⎞ ⎟ ⎟ ⎟ 1 ⎠
2
...
n
.
n
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 109 i
109
Notes Estimates for the growth of Tn−1 (b)p in the case where b has zeros or nonzero winding number have been of great interest for a long time. For example, questions on random walks or on Toeplitz determinants with so-called Fisher-Hartwig symbols necessitate such estimates. As Tn (b)2 Tn−1 (b)2 is just the (spectral) condition number of Tn (b), the need for such estimates is of course also currently emerging in numerical analysis. Theorem 4.1 as it is stated was established in our paper [47] (by different methods). However, at least for p = 2 and in a few other contexts, the theorem was known before. For example, the formulas of [25] for the entries of Tn−1 (a) show that these entries grow exponentially for symbols with nonzero winding number. The abstract of Reichel and Trefethen’s paper [219] begins as follows: “The eigenvalues of a nonhermitian Toeplitz matrix A are usually highly sensitive to perturbations, having condition numbers that increase exponentially with the dimension N . An equivalent statement is that the resolvent (zI − A)−1 of a Toeplitz matrix may be much larger in norm than the eigenvalues alone would suggest - exponentially large as a function of N , even when z is far away from the spectrum.” The results of Sections 4.1 and 4.3 are also from [47]. The change between sequences and polynomials has been successfully employed since the earliest studies of Toeplitz matrices. In particular results like Lemmas 4.7 to 4.9 are already in [145]. Lemma 4.10 is taken from [46]. In the case of Hermitian matrices, a good deal of the results of Sections 4.5 and 4.8 are known. In particular, Corollary 4.34 goes back to Kac, Murdock, Szegö, Parter, and Widom for symbols with a single zero and to Serra Capizzano for symbols with several zeros. We will discuss this problem in detail in Chapter 10. In the form stated here, the results of Sections 4.5 and 4.8 were established in our paper [46]. Theorem 4.13 appeared in Andreas Pomp’s preprints [209], [210] for the first time. People working on Toeplitz operators and matrices with “singular” symbols have come to appreciate Duduchava and Roch’s formula (4.29) as a kind of magic wand. For γ + δ = 0, the formula was established by Duduchava [105]. When studying Toeplitz determinants with Fisher-Hartwig symbols, Silbermann and one of the authors realized the need for the formula for Re (γ + δ) > −1. Steffen Roch was able to extend the formula to this significantly more general case. It was published in [68] for the first time. The original proofs by Duduchava and Roch were very complicated. A simpler proof is in [68] and [70]: This proof is based on expanding a hypergeometric function into a power series in two different ways and on subsequently comparing the coefficients of equal powers. Corollary 4.15 and Theorems 4.16 and 4.17 are from [69], and the results of Section 4.7 are all taken from [47]. We thank André Eppler for Figures 4.1 to 4.3. Further results: entries of the inverse. Let c ∈ P and suppose c is positive, c(T) ⊂ (0, ∞). For each natural number α, the matrices Tn (|t − 1|2α c(t)) are positive definite banded Toeplitz matrices. The inverses Tn−1 (|t −1|2α c(t)) have been of considerable interest since papers by Spitzer and Stone [257] and Kesten [178], [179]. It has been well known for a long time, at least since [90], that for 1 ≤ k ≤ ≤ n the k, entry of Tn−1 (|t − 1|2 ) is 1 1 −1 2 [Tn (|t − 1| )]k, = k − . (4.58) n+1
i
i i
i
i
i
i
110
buch7 2005/10/5 page 110 i
Chapter 4. Instability
In the case α = 2, we have for 1 ≤ k ≤ ≤ n the formula 1 [T −1 (|t − 1|4 )]k, k(k + 1)( + 1) n 1 k+−1 1 1 1 − − − = +2 n+3 2 ( + 1)( + 2) (n + 2)(n + 3) (k − 1)( − 1) 1 1 + − 3 ( + 1)( + 2) (n + 1)(n + 2)(n + 3)
(4.59)
(see, e.g., [2]). Since the matrices Tn−1 (|t − 1|2α ) are Hermitian, the right-hand sides of (4.58) and (4.59) also give the entries [Tn−1 (|t − 1|2α )],k for 1 ≤ k ≤ ≤ n. Now take k = [nx] and = [ny] with fixed 1 ≤ x ≤ y ≤ 1. We here (and only here) denote by [nx] and [ny] the smallest integer in {1, . . . , n} that is greater than or equal to nx and ny, respectively. We so arrive at the asymptotic formulas 1 −1 [T (|t − 1|2 )][nx],[ny] = x(1 − y) + o(1), n n 1 1 [T −1 (|t − 1|4 )][nx],[ny] = x 2 (1 − y)2 (3y − 2xy − x) + o(1). n3 n 6 Note that all these formulas concern the case where the function c is identically 1. The presence of the factor c(t) complicates things significantly. Recently Rambour and Seghier [215], [216] proved that if x, y ∈ [0, 1], then [Tn−1 (|t − 1|2α c(t))][nx],[ny] =
1 Gα (x, y)n2α−1 + o(n2α−1 ) as n → ∞, c(1)
(4.60)
uniformly with respect to x and y in [0, 1] (see also [214] for the case α = 1). The constant Gα (x, y) is independent of c. Thus, it is only the value of c at the zero of |t − 1|2α that enters the principal term of the right-hand side of (4.60). The constant Gα (x, y) satisfies Gα (x, y) = Gα (y, x) = Gα (1 − x, 1 − y) = Gα (1 − y, 1 − x)
(4.61)
and hence it suffices to find Gα (x, y) for 0 ≤ x ≤ y ≤ 1 or even only for 0 ≤ x ≤ 1 and y ≥ max(x, 1 − x). Rambour and Seghier showed that, for these x and y, G1 (x, y) = x(1 − y), 1 G2 (x, y) = x 2 (1 − y)2 (3y − x − 2xy), 6 and they established a formula that allows at least in principle the computation Gα (x, y) for all α. In [40], the Duduchava-Roch formula (4.29) was used to prove (4.60) for c = 1 and to find the constant Gα (x, y) in “closed form.” The result is as follows: If 0 ≤ x ≤ 1 and y ≥ max(x, 1 − x), then xαyα Gα (x, y) = [(α − 1)!]2
y
1
(t − x)α−1 (t − y)α−1 dt. t 2α
(4.62)
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 111 i
111
In particular, Gα (x, x) =
x 2α−1 (1 − x)2α−1 . (2α − 1)[(α − 1)!]2
(4.63)
Combining (4.60) and (4.63) we get the trace formula tr Tn−1 (|t − 1|2α c(t)) =
1 2α (2α − 1)!(2α − 2)! + o(n2α ). n c(1) (4α − 1)![(α − 1)!]2
The sum of the entries of the inverse is also known to be of interest (see, e.g., [1], [242], [287]). In [40], it is shown that (4.60), (4.61), (4.62) yield the formula n j,k=1
[Tn−1 (|t − 1|2α c(t))]j,k =
1 1 c(1) 2α + 1
*
α! (2α)!
+2 n2α+1 + o(n2α+1 ).
We will say more on Gα (x, y) and the accompanying story in Chapter 10.
i
i i
i
i
buch7 2005/10/5 page 112 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 113 i
Chapter 5
Norms
The norms of pure Toeplitz and of Toeplitz-like matrices approach a limit as the matrix dimension goes to infinity. In this chapter we identify this limit and give estimates for the speed of convergence.
5.1 A Universal Estimate Let b be a Laurent polynomial. We write b in the form b(t) =
s
bj t j
(t ∈ T),
(5.1)
j =−s
assuming that at least one of the coefficients bs and b−s is nonzero. It is clear that Tn (b)1 = Tn (b)∞ =
s
|bj | = T (b)1 = T (b)∞
j =−s
whenever n ≥ 2s + 1. So let 1 < p < ∞. Obviously, Tn (b)p ≤ T (b)p . Since Tn (b) converges strongly to T (b), we deduce from Theorem 3.1 that T (b)p ≤ lim inf Tn (b)p . n→∞
Thus, lim Tn (b)p = T (b)p .
n→∞
(5.2)
The purpose of this section is to show that always Tn (b)p = T (b)p + O(1/n). We begin with a few elementary lemmas. As usual, [α] is the integral part of α. 113
i
i i
i
i
i
i
114
buch7 2005/10/5 page 114 i
Chapter 5. Norms
Lemma 5.1. Let n and s be natural numbers. (a) The identity n = [4n/3] − [ [4n/3] ] holds. 4 (b) If n ≥ 12 then [4n/3] ≥ 5n/4. (c) If n ≥ 8 + 12 s then [ 1s [ [4n/3] ]] ≥ 4sn . 4 Proof. (a) For n = 3k + with k ∈ N and ∈ {0, 1, 2}, the asserted identity is equivalent to the obvious identity n = 4k + − k. (b) This is immediate from the inequality [4n/3] ≥ 4n/3 − 1. (c) Let n = 3k + with k ∈ N and ∈ {0, 1, 2}. Then * * ++ * * * +++ * + 1 [4n/3] 1 1 4 k k = k+ = ≥ −1 s 4 s 4 3 s s 1 n−2 n − 2 − 3s n n − 8 − 12s ≥ −1= = + . s 3 3s 4s 12s Lemma 5.2. If 1 < p < ∞ and 0 ≤ x ≤ 1/2, then (1 − x)1/p > 1 − 2x/p. Proof. A look at the graphs of the functions y = (1 − x)1/p and y = 1 − 2x/p shows that it suffices to prove the asserted inequality for x = 1/2. Thus, we must prove that 1/p + 1/21/p > 1. Consider the function f (t) = t + 2−t . Since f (0) = 1 and f (t) = 1 − (log 2)2−t > 1 − log 2 > 0 for t > 0, it follows that f (t) > 1 for t > 0. Taking t = 1/p we get 1/p + 1/21/p > 1, as desired. Lemma 5.3. If 1 < p < ∞ and 1/p + 1/q = 1, then Tn (b)p = Tn ( b)p = Tn (b)q = Tn ( b)q . Proof. From (3.10) we obtain Tn ( b)p = Wn Tn (b)Wn p ≤ Tn (b)p = Wn Tn ( b)Wn p ≤ Tn ( b)p , whence Tn (b)p = Tn ( b)p . As Tn ( b) is the transpose matrix of Tn (b), we see that Tn (b)p = Tn (b)q . Lemma 5.4. Let 1 < p < ∞ and let s, , n be natural numbers satisfying n ≥ 3 and p 1 ≤ s ≤ ≤ n/3. If x ∈ n is a unit vector, xpp
=
n−1
|xj |p = 1,
j =0
then there is a natural number m such that + s ≤ m ≤ 3 − s and m+s−1 j =m−s
|xj |p ≤
* +−1 . s
i
i i
i
i
i
i
5.1. A Universal Estimate
buch7 2005/10/5 page 115 i
115
Proof. Put d = [/s]. Since + 2ds ≤ 3, we have 1≥
3−1
|xj |p ≥ (|x |p + · · · + |x+2s−1 |p )
j =
+ (|x+2s |p + · · · + |x+4s−1 |p ) + · · · + (|x+2(d−1)s |p + · · · + |x+2ds−1 |p ).
As there are d terms on the right-hand side, at least one of them does not exceed 1/d. Hence, there exist a k0 ∈ {0, . . . , d − 1} such that |x+2k0 s |2 + · · · + |x+2k0 s+2s−1 |2 ≤
1 . d
The assertion now follows with m = + 2k0 s + s. Theorem 5.5. Let b be given by (5.1) and let 1 < p < ∞. If n ≥ 40 s, then 40 s T (b)p 1 − ≤ Tn (b)p ≤ T (b)p . pn
(5.3)
Proof. Put M0 = T (b)p and εn = M0 − Tn (b)p . We already know that εn converges monotonically to zero. Thus, we are left with showing that εn ≤
40 s M0 . pn
(5.4)
The assertion is trivial for s = 0. So let s ≥ 1 and n ≥ 40 s. Choose a vector x = p (x0 , . . . , xn−1 ) ∈ n so that xp = 1 and Tn (b)xp = M0 − εn . Set = [n/4]. By Lemma 5.4, there exists a natural number m for which * +−1 m+s−1 p + s ≤ m ≤ 3 − s and |xj | < . s j =m−s
(5.5)
(5.6)
We have Tn (b)xpp =
m−1
|(Tn (b)x)j |p +
j =0
n−1
|(Tn (b)x)j |p .
(5.7)
j =m
Since bj −k = 0 for j ≤ m − 1 and k ≥ m + s, we get p m−1 m−1 m+s−1 |(Tn (b)x)j |p = bj −k xk j =0 j =0 k=0 p m+s−1 m+s−1 = = m+s−1 =p ≤ bj −k xk = =Tm+s (b)(xk )k=0 p j =0
k=0
≤ (M0 − εm+s )p
m+s−1
|xk |p .
(5.8)
k=0
i
i i
i
i
i
i
116
buch7 2005/10/5 page 116 i
Chapter 5. Norms
Analogously, starting with the observation that bj −k = 0 if j ≥ m and k ≤ m − s − 1, we obtain n−1 p p n−1 n−1 n−1 n−m+s−1 |(Tn (b)x)j |p = bj −k xk = bj −n+1+r xn−1−r r=0 j =m j =m k=m−s j =m p n−m−1 n−m+s−1 p n−m−1 n−m+s−1 = bn−1−i−n+1+r xn−1−r = br−i xn−1−r r=0 r=0 i=0 i=0 p n−m+s−1 n−m+s−1 = = n−m+s−1 =p ≤ br−i xn−1−r = =Tn−m+s ( b)(xn−1−r )r=0 p i=0
r=0
≤ (M0 − εn−m+s )p
n−m+s−1
|xn−1−r |p
(by Lemma 5.3)
r=0
= (M0 − εn−m+s )p
n−1
|xk |p .
(5.9)
k=m−s
Combining (5.7), (5.8), (5.9), we arrive at the inequality Tn (b)xpp ≤ (M0 − εm+s )p
m+s−1
|xk |p + (M0 − εn−m+s )p
k=0
n−1
|xk |p .
k=m−s
As m + s ≤ n − and n − m + s ≤ n − , and as εk is monotonically decreasing, it follows that m+s−1 n−1 p p p p Tn (b)xp ≤ (M0 − εn− ) |xk | + |xk | . k=0
k=m−s
The equality xp = 1 and inequality (5.6) imply that m+s−1 k=0
|xk |p +
n−1
|xk |p ≤ 1 + [/s]−1 ,
(5.10)
k=m−s
whence Tn (b)xpp ≤ (M0 − εn− )p (1 + [/s]) ≤ (M0 − εn− )p + M0 [/s]−1 . p
Thus, by (5.5), (M0 − εn )p ≤ (M0 − εn− )p + M0 [/s]−1 or, equivalently, * +−1 εn p εn− p 1− ≤ 1− + . M0 M0 s
(5.11)
Recall that = [n/4] and put n1 = [4n/3]. In (5.11), we can replace n by n1 . Taking into consideration Lemma 5.1(a),(c) we get εn p εn p 4s 1− 1 (5.12) ≤ 1− + ; M0 M0 n
i
i i
i
i
i
i
5.2. Spectral Norm of Toeplitz Matrices
buch7 2005/10/5 page 117 i
117
notice that n ≥ 40 s > 8 + 12 s. Now substitute the n in (5.12) consecutively by * + * * + + 4nj −1 4n 4n1 n1 = , n2 = , . . . , nj = , 3 3 3 add the corresponding inequalities and pass to the limit j → ∞. What results is that ∞ εn p 4s 4s 1− 1− ≤ + . M0 n n j =1 j By Lemma 5.1(b), n 4 ≤ , n1 5
4 n1 ≤ , n2 5
4 n2 ≤ , n3 5
... .
Consequently, ∞ 4s 20 s 4s 4 j εn p ≤ = + . 1− 1− M0 n n j =1 5 n Since n ≥ 40 s, we infer from Lemma 5.2 that 20 s 1/p 40 s εn ≤ 1 − 1 − M0 ≤ M0 , n pn which is (5.4). Notice that, by Lemma 5.3, estimate (5.3) can be improved to 40 s T (b)p 1 − ≤ Tn (b)p ≤ T (b)p . n max(p, q)
5.2
Spectral Norm of Toeplitz Matrices
Theorem 5.5 results from the techniques we employed to prove it, and this theorem can probably be sharpened. In this section we establish a significant improvement of Theorem 5.5 in the case p = 2. If b(t) takes a constant value b0 for all t ∈ T, then Tn (b)2 = |b0 | = T (b)2 for all n ≥ 1. The following proposition describes the Laurent polynomials with constant modulus. Proposition 5.6. If b is a Laurent polynomial and |b| is constant on T, then b(t) = γ t m (t ∈ T) with constants γ ∈ C and m ∈ Z. Proof. The assertion is trivial if b vanishes identically. So suppose b is not identically zero and let bm be the first nonzero coefficient of b. Then t −m b(t) = bm + bm+1 t + · · · + bm+k t k
(t ∈ T)
i
i i
i
i
i
i
118
buch7 2005/10/5 page 118 i
Chapter 5. Norms
with bm+k = 0. In the case k = 0 we are done. So assume k ≥ 1. The function f (z) := (bm + bm+1 z + · · · + bm+k zk )(bm + bm+1 z−1 + · · · + bm+k z−k ) is analytic in C \ {0} and takes the constant value of |b|2 on T. It follows that f is constant throughout C \ {0}, which is impossible because f (z) = bm+k bm zk + O(zk−1 ) as z → ∞ and bm+k bm = 0. Proposition 5.6 implies that if |b| is constant on T, then Tn (b)2 = |γ | = T (b)2 for all n ≥ m + 1, where b(t) = γ t m . We now turn to symbols whose modulus is not constant. We begin with semi-definite Hermitian matrices. Lemma 5.7. Let b be a nonconstant and nonnegative Laurent polynomial and let 2γ be the maximal order of the zeros of b∞ − b(t) on T. Then there exist constants 0 < d1 < d2 < ∞ independent of n such that d2 d1 b∞ 1 − 2γ ≤ Tn (b)2 ≤ b∞ 1 − 2γ for all n ≥ 1. n n Proof. Let M = b∞ . By Corollary 4.34, there are constants 0 < D1 < D2 < ∞ such that D1 n2γ ≤ Tn−1 (M − b)2 ≤ D2 n2γ .
(5.13)
Given a positive definite Hermitian matrix A, we denote by λmin (A) and λmax (A) the minimal and maximal eigenvalue of A, respectively. Recall that rad (·) stands for the spectral radius. We have Tn−1 (M − b)2 = rad (Tn−1 (M − b)) = 1/λmin (MI − Tn (b)) = 1/(M − λmax (Tn (b))) = 1/(M − Tn (b)2 ). Inserting this into (5.13) we get 1 1 1 1 ≤ T . (b) ≤ M 1 − M 1− n 2 MD2 n2γ MD1 n2γ As the following result shows, Lemma 5.7 is almost literally true in the general case, too. Recall that T (b)2 = b∞ . Theorem 5.8. Let b be a Laurent polynomial, suppose |b| is not constant on T, and let 2γ be the maximal order of the zeros of b∞ − |b(t)| on T. Then there are constants 0 < d1 < d2 < ∞ which do not depend on n such that d2 d1 (5.14) b∞ 1 − 2γ ≤ Tn (b)2 ≤ b∞ 1 − 2γ for all n ≥ 1. n n
Proof. Let b(t) = sj =−s bj t j (t ∈ T) with s ≥ 1. By Proposition 3.10, Tn (b)Tn (b) = Tn (|b|2 ) − Pn KPn − Wn LWn ,
i
i i
i
i
i
i
5.2. Spectral Norm of Toeplitz Matrices
buch7 2005/10/5 page 119 i
119
where ⎞⎛ ⎞ b1 b2 . . . b1 b2 . . . ⎠ ⎝ b2 . . . ⎠ K = H (b)H ( b) = ⎝ b2 . . . ... ... ⎛
and ⎛
b−1 L = H ( b)H (b) = ⎝ b−2 ...
b−2 ...
...
⎞⎛
b−1 ⎠ ⎝ b−2 ...
b−2 ...
...
⎞ ⎠.
Since (Kx, x) ≥ 0 and (Lx, x) ≥ 0 for all x ∈ 2 , it follows that (Tn (b)Tn (b)x, x) ≤ (Tn (|b|2 )x, x) for all x ∈ 2 . As Tn (b)Tn (b) and Tn (|b|2 ) are Hermitian, we therefore obtain Tn (b)22 = Tn (b)Tn (b)2 ≤ Tn (|b|2 )2 .
(5.15)
Lemma 5.7 implies that Tn (|b|2 )2 ≤ |b|2 ∞ (1 − d/n2γ ) = b2∞ (1 − d/n2γ ) ≤ b2∞ (1 − d/(2n2γ ))2 .
(5.16)
Combining (5.15) and (5.16), we get the upper estimate in (5.14) with d1 = d/2. We are left to prove the lower estimate in (5.14). Put M = b2∞ . Since Tn (b(t/t0 )) = diag (1, t0−1 , . . . , t0−n+1 ) Tn (b) diag (1, t0 , . . . , t0n+1 ), we can without loss of generality assume that M = |b(1)|2 . By assumption, there is a constant C ∈ (0, ∞) such that M − |b(eiθ )|2 ≤ C |θ |2γ for all θ ∈ [−π, π ].
(5.17)
For x ∈ 2n , Tn (b)22 x22 = Tn (b)Tn (b)2 x22 ≥ |(Tn (b)Tn (b)x, x)| = |(Tn (|b|2 )x, x) − (Pn KPn x, x) − (Wn LWn x, x)| ≥ Mx22 − |((M − Tn (|b|2 ))x, x)| − |(Pn KPn x, x)| − |(Wn LWn x, x)|.
(5.18)
We now identify 2n and Pn+ as in Section 4.4. Put j = γ + 1, let m be the natural number given by mj < n ≤ (m + 1)j , define f ∈ Pn+ by
f (e ) = 1 + e + · · · + e iθ
iθ
imθ j
=e
imj θ/2
m+1 θ 2 θ sin 2
sin
j (5.19)
i
i i
i
i
i
i
120
buch7 2005/10/5 page 120 i
Chapter 5. Norms
(recall (4.20) and (4.21)), and let x ∈ 2n be the sequence of the Fourier coefficients of f . From Lemma 4.10 we know that f 2 ≥ C1 (m + 1)j −1/2 with some C1 > 0.
(5.20)
We estimate the terms on the right of (5.18) separately. Since bj = 0 for |j | > s, we have |(Pn KPn x, x)| = |(Ps KPs x, x)| = |(KPs x, Ps x)| ≤ K2 Ps x22 , and from (5.19) we infer that Ps x is independent of n provided n (and thus m) is large enough. Consequently, by (5.20) and Lemma 4.7, C2 C2 j 2j −1 C2 (γ + 1)2γ +1 |(Pn KPn x, x)| ≤ ≤ = (m + 1)2j −1 n2j −1 n2γ +1 x22
(5.21)
with some C2 ∈ (0, ∞) independent of n. Analogously, C3 (γ + 1)2γ +1 |(Wn LWn x, x)| ≤ , 2 n2γ +1 x2
(5.22)
where C3 ∈ (0, ∞) does not depend on n. From (5.17), (5.19), and Lemma 4.15 we obtain 2π |((M − Tn (|b|2 ))x, x)| = 2π |(Tn (M − |b|2 )x, x)| 2j π iθ 2 sin((m + 1)θ/2) = (M − |b(e )| ) dθ sin(θ/2) −π (m + 1)|θ | 2j π 2 2j 2γ ≤ C|θ | dθ 2 2 |θ | |θ|1/(m+1) ≤ C3
(m + 1)2j 1 1 + C4 = C3 (m + 1) + C4 ≤ C5 n (m + 1)2γ +1 (m + 1)2j −2γ (m + 1)2
with constants C3 , C4 , C5 ∈ (0, ∞) independent of n. Taking into account (5.20) we therefore get |((M − Tn (|b|2 ))x, x)| n 1 ≤ C6 2j −1 = C6 2γ , n n x22
(5.23)
where C6 ∈ (0, ∞) is independent of n. Putting (5.18), (5.21), (5.22), (5.23) together we arrive at the estimate Tn (b)22 ≥ M(1 − C7 /n2γ ), where C7 ∈ (0, ∞) is independent of n, whence Tn (b)2 ≥ b∞ (1 − C7 /n2γ ) as soon as n2γ > C7 . An example is considered in Figure 5.1. The function b∞ − |b(t)| has two zeros, one of the order 4 and of the order 2. Hence, by Theorem 5.8, b∞ − Tn (b)2 decays as cn /n4 with a bounded sequence {cn }. Figure 5.1 shows precisely this decay with cn stabilizing very quickly at a constant value.
i
i i
i
i
i
i
5.3. Fejér Means
buch7 2005/10/5 page 121 i
121
0
10
60 40 20 0 −20
–1
10
−40 −60 −50
0
50
66 64
–2
10
62 60 58 56 54
–3
52
−1
0
1
2
3
4
10
0
10
1
10
2
10
Figure 5.1. The symbol is b(t) = (64 − |t − 1|4 |t + 1|2 )t. The set b(T) is shown in the upper-left picture. The lower-left picture shows |b(eiθ )| for −π/2 ≤ θ ≤ 3π/2 with the maximum line b∞ = 64. In the right picture, the asterisks mark b∞ − Tn (b)2 versus n. As we have logarithmic scales, the slope −4 of the asterisks corresponds to a decay of b∞ − Tn (b)2 as constant/n4 . The straight line in the picture is simply a line with the slope −4. Corollary 5.9. If b is a Laurent polynomial, then there is a constant d ∈ (0, ∞) depending only on b such that d b∞ 1 − 2 ≤ Tn (b)2 ≤ b∞ for all n ≥ 1. n Proof. By Proposition 5.6, this is trivial if |b| is constant on T. So assume that |b| is not constant on T. Since the function b∞ − |b(t)| does always have a zero of order at least 2, the lower estimate follows from Theorem 5.8. The upper estimate is obvious.
5.3
Fejér Means
In this section, we slightly change our view at Toeplitz matrices. Namely, given a sequence n−1 b = {bj }∞ j =−∞ of complex numbers, we denote by Tn (b) := (bj −k )j,k=0 the n × n Toeplitz
i
i i
i
i
i
i
122
buch7 2005/10/5 page 122 i
Chapter 5. Norms
matrix generated by this sequence. We consider the Laurent polynomials (Sn b)(eiϕ ) =
bj eij ϕ ,
|j |≤n−1
(σn b)(eiϕ ) =
|j | ij ϕ bj 1 − e . n |j |≤n−1
1 Clearly, if {bj }∞ j =−∞ is the sequence of the Fourier coefficients of an L function a, then Sn b is just the nth partial sum Sn a of the Fourier series of a and σn b is the nth Fejér mean σn a of the Fourier series of a. Since Tn (b) = Tn (Sn b), it is clear that Tn (b)2 ≤ Sn b∞ for all n ≥ 1 and all sequences b. Is there a universal finite constant C such that Tn (b)2 ≤ Cσn b∞ for all n ≥ 1 and b? The answer is negative: If the sequence b has only one nonzero term, say bn−1 = 1 (n ≥ 1), then Tn (b) is the matrix with 1 in the lower-left corner and zeros elsewhere, so that Tn (b)2 = 1, although (σn b)(eiϕ ) = ei(n−1)ϕ /n, which implies that σn b∞ = 1/n. What about estimates of Tn (b)2 from below? Is there a universal constant c > 0 such that cSn b∞ ≤ Tn (b)2 for all n ≥ 1 and all b? The answer is again negative. Indeed, assume the answer is in the affirmative. Let a be a continuous function on the complex unit circle T and let b = {bj } be the sequence of the Fourier coefficients of a. Since Tn (b)2 ≤ a∞ for all n ≥ 1, it follows that Sn a∞ ≤ (1/c)a∞ for all continuous functions a. This implies that Sn a converges uniformly to a for every continuous function a, which is well known not to be true since du Bois-Reymond’s 1876 paper [29] (also see [304, Theorem VIII.1.2]). On the other hand, we know from Fejér [119] that the means σn a converge uniformly to a for every continuous function a. The following theorem provides us with a simple lower estimate for the spectral norm of a Toeplitz matrix through the Fejér mean.
Theorem 5.10. The inequality Tn (b)2 ≥ σn b∞ holds for every n ≥ 1 and every sequence b. Proof. Fix t = eiθ ∈ T and let xt ∈ Cn be the vector xt = xt = 1 and hence Tn (b)2 ≥ |(Tn (b)xt , xt )|. Since
(Tn (b)xt , xt ) = =
n−1 √1 (1, t, . . . , t ). n
Then
n−1 1 bj −k e−ikθ eij θ n j,k=0
n−1 n−1 1 1 bj −k ei(j −k)θ = (n − |j |)bj eij θ = (σn b)(eiθ ), n j,k=0 n j =−(n−1)
we get Tn (b)2 ≥ |(σn b)(eiθ )| = |(σn b)(t)|. As t ∈ T can be chosen arbitrarily, it follows that Tn (b)2 ≥ σn b∞ .
i
i i
i
i
i
i
5.4. Toeplitz-Like Matrices
buch7 2005/10/5 page 123 i
123
5.4 Toeplitz-Like Matrices In this section we consider sequences {Bn }∞ n=1 of n × n matrices Bn of the form Bn = Tn (b) + Pn KPn + Wn LWn ,
s
(5.24)
where b(t) = j =−s bj t (t ∈ T) and where K and L have only a finite number of nonzero entries, which means that j
Pn0 KPn0 = K,
Pn0 LPn0 = L
(5.25)
for some n0 ∈ N. Thus, the matrix Bn differs from Tn (b) by the n0 ×n0 block K in the upper left and the n0 × n0 block Wn LWn in the lower right corners. We refer to such matrices as Toeplitz-like matrices. From Section 3.5 we know that the investigation of the inverses Tn−1 (a) of Toeplitz matrices leads to matrices having the structure (5.24). Here is another context in which matrices of the form (5.24) emerge. Proposition 5.11. Let bj k be a finite collection of Laurent polynomials. Then there exist matrices K and L satisfying (5.25) for some n0 ∈ N such that ⎛ ⎞ Tn (bj k ) = Tn ⎝ bj k ⎠ + Pn KPn + Wn LWn j
k
j
k
for all sufficiently large n. Proof. It suffices to prove that Tn (b1 ) . . . Tn (bN ) = Tn (b1 . . . bN ) + Pn KN Pn + Wn LN Wn
(5.26)
for all sufficiently large n. Clearly, (5.26) holds with KN = LN = 0 if N = 1. So assume (5.26) is true with Pn0 KN Pn0 = KN and Pn0 LN Pn0 = LN for some N ≥ 1. We have Tn (b1 ) . . . Tn (bN )Tn (bN +1 ) − Tn (b1 . . . bN )Tn (bN +1 ) = Pn KN Pn Tn (bN +1 ) + Wn LN Wn Tn (bN +1 ) = Pn KN T (bN +1 )Pn − Pn KN Qn T (bN +1 )Pn − Wn LN T ( bN +1 )Wn − Wn LN Qn T ( bN +1 )Wn . Since KN Qn = 0 and LN Qn = 0 for n ≥ n0 , we obtain from Proposition 3.10 that (5.26) is also true with N replaced by N + 1. From (1.16) we infer that Bn 1 = max (T (b) + K1 , T ( b) + L1 ), Bn ∞ = max (T (b) + K∞ , T ( b) + L∞ )
(5.27) (5.28)
whenever n ≥ max(2s + 1, 2n0 + 1). Throughout what follows in this chapter we therefore assume that 1 < p < ∞. Our objective is to show that lim Bn p = max (T (b) + Kp , T ( b) + Lp ).
n→∞
i
i i
i
i
i
i
124
buch7 2005/10/5 page 124 i
Chapter 5. Norms
Since Wn → 0 weakly, we deduce from Proposition 3.3 that Wn LWn → 0 strongly,
Wn KWn → 0 strongly.
Consequently, Bn → T (b) + K strongly,
Wn Bn Wn → T ( b) + L strongly.
Because Wn Bn Wn p = Bn p , we deduce from Theorem 3.1 that b) + Lp ) ≤ lim inf Bn p . max (T (b) + Kp , T ( n→∞
(5.29)
We define b) + Lp ), Mp := max (T (b) + Kp , T (
Mp0 := T (b)p .
(5.30)
Lemma 5.12. We have T (b) + Kp ≥ T (b)p and T ( b) + Lp ≥ T (b)p . Proof. For n ∈ Z, let χn (t) = t n (t ∈ T). From Proposition 1.4 we obtain the equality T (χ−n )T (b)T (χn ) = T (b). On the other hand, from (5.25) we see that KT (χn ) = 0 for n ≥ n0 . Hence, for n ≥ n0 , T (b)p = T (χ−n )(T (b)+K)T (χn )p , and since T (χ±n )p = 1, it follows that T (b)p ≤ T (b) + Kp . Analogously we get T ( b)p ≤ T ( b) + Lp . As b)p = limn→∞ Tn ( b)p , we deduce from Lemma T (b)p = limn→∞ Tn (b)p and T ( 5.3 that T ( b)p = T (b)p . This gives the inequality T ( b) + Lp ≥ T (b)p . If s = 0, then Bn = b0 I + Pn KPn + Wn LWn , and it is clear that Bn p = max(b0 I + Kp , |b0 |, b0 I + Lp ) for n ≥ 2n0 + 1. As b0 I + Kp ≥ |b0 | and b0 I + Lp ≥ |b0 | by Lemma 5.12, it results that Bn p = max(b0 I + Kp , b0 I + Lp ) = Mp for all n ≥ 2n0 + 1. Thus, in the following we will always assume that s ≥ 1. The desired result is Corollary 5.14. The next theorem gives the difficult part of that corollary and, in addition, an upper estimate of Bn p . Theorem 5.13. If n ≥ max(8s + 8, 4n0 ), then 8s . Bn p ≤ Mp 1 + pn p
Proof. Put = [n/4] and let x = (xj )n−1 j =0 ∈ n be any vector such that xp = 1. By Lemma 5.4, there is an m ∈ N such that * +−1 m+s−1 p |xj | ≤ . (5.31) + s ≤ m ≤ 3 − s, s j =m−s
i
i i
i
i
i
i
5.4. Toeplitz-Like Matrices
buch7 2005/10/5 page 125 i
125
We now proceed as in the proof of Theorem 5.5. First, we have Bn xpp =
m−1
|(Bn x)j |p +
j =0
n−1
|(Bn x)j |p .
(5.32)
j =m
It is clear that ≤ n/4. By assumption, n ≥ 4n0 , whence = [n/4] ≥ n0 . Thus, n ≥ 4 = + 3 ≥ n0 + 3 > n0 + 3 − s ≥ n0 + m.
(5.33)
If j ≤ m − 1, then (5.33) implies that n − 1 − j ≥ n − m > n0 .
(5.34)
Let K = (Kij ) and L = (Lij ). Since (Wn LWn x)j = (LWn x)n−1−j =
n 0 −1
Ln−1−j,i xn−1−i ,
i=0
we deduce from (5.25) and (5.34) that (Wn LWn x)j = 0 whenever j ≤ m−1. Consequently, m−1
|(Bn x)j | = p
j =0
m−1 m+s−1 j =0
= Pm (T (b) +
bj −i xi +
i=0
n 0 −1 i=0
p K j i xi
m+s−1 p K)(xi )i=0 p
≤ T (b) + Kpp
m+s−1
|xi |p ≤ Mpp
m+s−1
i=0
|xi |p .
(5.35)
i=0
Similarly, if j ≥ m then (Pn KPn x)j = 0, whence n−1 p n n−1 0 −1 |(Bn x)j | = bj −i xi + Ln−1−j,i xn−1−i j =m j =m i=m−s i=0 p n n−1 n−m+s−1 0 −1 = bj −n+1+r xn−1−r + Ln−1−j,r xn−1−r r=0 r=0 j =m p n n−m−1 0 −1 n−m+s−1 = bn−1−k−n+1+r xn−1−r + Ln−1−n+1+k,r xn−1−r r=0 r=0 k=0 n n−m−1 0 −1 p n−m+s−1 = br−k xn−1−r + Lk,r xn−1−r n−1
p
k=0
r=0
r=0
n−m+s−1 p = Pn−m (T ( b) + L)(xn−1−r )r=0 p
≤ T ( b) + Lpp
n−m+s−1 r=0
|xn−1−r |p ≤ Mpp
n−1
|xj |p .
(5.36)
j =m−s
i
i i
i
i
i
i
126
buch7 2005/10/5 page 126 i
Chapter 5. Norms
Combining (5.32), (5.35), and (5.36) we get ⎛ ⎞ m+s−1 n−1 Bn xpp ≤ Mpp ⎝ |xj |p + |xj |p ⎠ , j =0
which, by (5.10), implies that 8 + 8s, it results that
p Bn xp
≤
j =m−s
p Mp (1
+ [/s]−1 ). As [/s] ≥ n/(8s) for n ≥
8s 1/p 8s ≤ Mp 1 + . Bn p ≤ Mp 1 + n pn
Corollary 5.14. We have lim Bn p = Mp .
n→∞
Proof. The proof is immediate from (5.29), (5.30), and Theorem 5.13.
5.5
Exponentially Fast Convergence Is Generic
We now turn to the problem of estimating the speed with which Bn p converges to Mp . The solution will be as follows: generically Bn p converges to Mp exponentially fast, but in some exceptional cases it may happen that | Bn p − Mp | decays only as 1/n2 . Precise results are in this and the next sections. Let Bn be as in the previous section and put m0 = max(s, n0 ) and B = T (b) + K. We know from Lemma 5.12 that always Bp ≥ T (b)p . The following lemma shows the existence of a rapidly decaying sequence at which B attains its norm. p Lemma 5.15. If Bp > T (b)p , then there exists an x0 = {xj(0) }∞ j =0 ∈ such that
x0 p = 1, Bx0 p = Bp , and
⎛ ⎝
∞
j =m
⎞1/p |xj(0) |p ⎠
≤
T (b)p Bp
(5.37)
(m−(m0 +s))/(2s) (5.38)
for every m ≥ m0 + s. Proof. Put M0 = T (b)p and M = Bp . Pick any natural number m ≥ m0 . There are p xk = {xj(k) }∞ j =0 ∈ such that xk p = 1, Bxk p = M p − δk ,
(5.39)
where δk goes monotonically to zero as k → ∞. As in the proof of Theorem 5.13 we see that + Qm T (b)(xj(k) )∞ Bxk = Pm (T (b) + K)(xj(k) )jm+s−1 j =m−s , =0
i
i i
i
i
i
i
5.5. Exponentially Fast Convergence Is Generic
buch7 2005/10/5 page 127 i
127
whence M p − δk ≤ M p
m+s−1
p
|xj(k) |p + M0
j =0
∞
|xj(k) |p .
(5.40)
j =m−s
On defining Om(k) =
∞
|xj(k) |p ,
γk = δk /M p ,
q = (M0 /M)p ,
j =m (k) (k) we can rewrite (5.40) in the form 1 − γk ≤ (1 − Om+s ) + qOm−s . Thus, (k) (k) Om+s ≤ qOm−s + γk for all m ≥ m0 .
(5.41)
Since Om(k) ≤ 1, we obtain from (5.41) that Om0 +s+v ≤ q + γk for all v in the set {0, 1, . . . , 2s − 1}. This and (5.41) give Om(k)0 +3s+v ≤ qOm(k)0 +s+v + γk ≤ q 2 + (q + 1)γk for v ∈ {0, 1, . . . , 2s − 1}. Repeating this argument we arrive at the inequalities Om0 +(2j +1)s+v ≤ q j +1 + (q j + · · · + 1)γk ≤ q j +1 +
γk 1−q
(5.42)
for v ∈ {0, 1, . . . , 2s − 1}. Let r := q 1/(2s) . Then (5.42) can be written as Om(k)0 +s+2j s+v ≤ q j +1 +
γk γk = r 2j s+2s + , 1−q 1−q
and it results that Om(k)0 +s+ ≤ r +
γk for ≥ 0. 1−q
(5.43)
p We now show that {xk }∞ k=1 can be taken to be a Cauchy sequence in . Given ε > 0, (N−(m0 +s))/p < ε/4. On passing to a subsequence there is a natural number N such that r p if necessary, we can assume that {PN xk }∞ is a Cauchy sequence in N . Hence, there is a k=1 natural number R such that N−1
|xj(k1 ) − xj(k2 ) |p
m0 by assumption. For < m < 2, we have p n−m−1 n−m+s−1 p n m−1 0 −1 m+s−1 (n) (n) (n) p Bn p = bj −k xk + Kj k xk + bj −k xk j =0 k=0 j =m k=m−s k=0 n n−1 n−1 0 −1 p (n) (n) + b x + (W LW ) x j −k k n n jk k j =n−m k=n−m+s
≤
k=0
m+s−1 p n−m+s−1 p (T (b) + K)(xk(n) )k=0 p + T (b)(xk(n) )k=m−s p p + (Tn (b) + Wn LWn )(xk(n) )n−1 k=n−m+s p .
b)Wn and Wn p = 1, it follows from (5.46) that Since Tn (b) = Wn Tn ( M p + εn ≤ M p
m+s−1 k=0
p
|xk(n) |p + M0
n−m+s−1 k=m−s
|xk(n) |p + M p
n−1
|xk(n) |p .
k=n−m+s
i
i i
i
i
i
i
5.5. Exponentially Fast Convergence Is Generic
buch7 2005/10/5 page 129 i
129
With Om(n) :=
n−m−1
|xj(n) |p ,
j =m p
(n) (n) (n) (n) ) + M0 Om−s , whence Om+s ≤ qOm−s − γn with we therefore get M p + εn ≤ M p (1 − Om+s p p q := (M0 /M) and γn := εn /M . This inequality is of the form (5.41). Consequently, in analogy to (5.42) and (5.43) we have
Om(n)0 +s+j ≤ r j −
1 − rj γn , 1−q
r := q 1/(2s)
(5.47)
as long as Om(n)0 +s+j is well defined, i.e., for all j satisfying m0 +s +j ≤ n−(m0 +s +j )−1 or, equivalently, for j≤
n−1 n − 2m0 − 2s − 1 − m0 − s = . 2 2
(5.48)
As the left-hand side of (5.47) is nonnegative provided (5.48) holds, we get r (n−2m0 −2s−1)/2 1 − q (n−2m0 −2s−1)/2 r ≤ (n−2m −2s−1)/2 0 1−r 1−r 2s p n−2m02−2s−1 1 − (M0 /M)p/(2s) M0 2s = 1 − (M0 /M)p/(2s) M 4sp (n−2m0 −2s−1) 1 M0 M0 4s (n−2m0 −2s−1) ≤ 2s ≤ 2s . M M
γn ≤ (1 − q)
(5.49)
Put ηn := Bn p − M. By (5.46), ηn = (M p + εn )1/p − M. If εn ≥ 0, then εn 1/p εn + εn ηn = M 1 + p −M = −M ≤M , p M pM pM p−1 and if εn < 0, then ηn < 0. Since γn = εn /M p , we see from (5.49) that in either case M ηn ≤ 2s p
M M0
(2m0 +2s+1)/(4s)
M0 M
n/(4s) ≤ Mcp pn .
This is the upper estimate in (5.45). To prove the lower estimate in (5.45) we make use of Lemma 5.15. For the sake of definiteness, suppose M = T (b) + Kp (since Bn p = Wn Bn Wn p , the case M = T ( b) + Lp can be reduced to the case we consider). Let x0 satisfy (5.37) and (5.38). Put v = [n/2] (> m0 ) and yn = Pv x0 . Then Bn p equals Pn (T (b) + K)yn p = (T (b) + K)yn − Qn (T (b) + K)yn p (0) ∞ (0) v−1 = (T (b) + K)(xj(0) )∞ j =0 − (T (b) + K)(xj )j =v − Qn (T (b) + K)(xj )j =v−s p
i
i i
i
i
i
i
130
buch7 2005/10/5 page 130 i
Chapter 5. Norms
and this is clearly greater than or equal to (0) v−1 (T (b) + K)x0 p − (T (b) + K)(xj(0) )∞ j =v p − Qn (T (b) + K)(xj )j =v−s p ⎞1/p ⎛ ∞ (0) v−1 ⎝ ≥ M − (xj(0) )∞ |xj(0) |p ⎠ j =v p − (xj )j =v−s p ≥ M − 2M
≥ M − 2M
M0 M
(v−s−(m0 +s))/(2s)
which equals
M − 2M
M M0
≥ M − 2M
(2m0 +4s+1)/(4s)
M0 M
j =v−s
M0 M
((n−1)/2−s−(m0 +s))/(2s) ,
(n/(4s) ≥ M − Mcp pn .
Since yn p ≤ 1, we finally obtain Bn p ≥ Bn yn p /yn p ≥ Bn yn p ≥ M − Mcp pn . The crucial assumption of Theorem 5.16 is the strict inequality Mp0 < Mp . In the case p = 2, the following argument can be employed to see that this inequality is generically true. Let P be the set of all Laurent polynomials and let X denote the set of all infinite matrices with only finitely many nonzero entries. We equip P × X × X with the norm (b, K, L) := max(T (b)2 , K2 , L2 ). Recall that T (b)2 = b∞ (Theorem 1.15). Proposition 5.17. The set {(b, K, L) ∈ P × X × X : max(T (b) + K2 , T ( b) + L2 ) > T (b)2 }
(5.50)
is an open and dense subset of P × X × X . Proof. It is clear that (5.50) is an open subset of P × X × X . To show that it is dense in P × X × X , it suffices to prove that the set of all (b, K) ∈ P × X satisfying T (b) + K2 > T (b)2 is dense in P × X . Pick (b, K) ∈ P × X and suppose T (b) + K2 = T (b)2 = b∞ . Given any ε > 0, we can find a λ ∈ C such that |λ| < ε and b + λ∞ > b∞ . Then T (b) + K + λI 2 = T (b − λ) + K2 ≥ T (b + λ)2 = b + λ∞ > b∞ , and since T (b) + K + λPn converges strongly to T (b) + K + λI , it follows that lim inf T (b) + K + λPn 2 ≥ T (b) + K + λI 2 > T (b)2 . n→∞
Consequently, letting K0 := K + λPn , we see that T (b) + K0 2 > T (b)2 for all sufficiently large n, while (T (b) + K) − (T (b) + K0 )2 ≤ |λ| Pn 2 = |λ| < ε. Now let Bn be given by (5.24) and let p = 2. Proposition 5.17 tells us that the strict inequality M20 < M2 represents the generic case, whereas the equality M20 = M2 is the exceptional case. Thus, Theorem 5.16 says that generically Bn 2 − M2 decays to zero with exponential speed.
i
i i
i
i
i
i
5.6. Slow Convergence
5.6
buch7 2005/10/5 page 131 i
131
Slow Convergence
In concrete situations, we nevertheless often encounter the exceptional case where Mp0 = Mp . For instance, we are definitely in this case whenever Bn = Tn (b). Theorem 5.8 reveals that in the exceptional case it is possible that M2 − Bn 2 = M20 − Bn 2 ≥ d/n2 with some d > 0 independent of n, which means that we do not have exponentially fast convergence. The following theorem certainly does not give sharp estimates, but it provides us with a universal estimate with good constants. Theorem 5.18. If s ≥ 1 and n ≥ max(4n0 , 81 s), then 81 s 8 s Mp0 1 − ≤ Bn p ≤ Mp 1 + . p n p n Proof. The upper estimate is immediate from Theorem 5.13. So we are left to prove the lower estimate. p By Theorem 5.5, there is an x ∈ [n/2] such that xp = 1 and 40 s 0 T[n/2] (b)xp ≥ Mp 1 − . p [n/2] p
Let = [n/4] and define the unit vector y ∈ n by y = (0, . . . , 0, x0 , x1 , . . . , x[n/2]−1 , 0, . . . , 0).
By assumption, ≥ n0 and ≥ n0 . This implies that Bn y = Tn (b)y, whence ⎞1/p ⎛ +[n/2] Bn yp = Tn (b)yp ≥ ⎝ |(Tn (b)y)j |p ⎠ j =+1
p ⎞1/p +[n/2] +[n/2] =⎝ bj −k xk ⎠ = T[n/2] (b)xp j =+1 k=+1 40 s 40 2s 81 s 0 0 0 ≥ Mp 1 − ≥ Mp 1 − , ≥ Mp 1 − p [n/2] p n−1 p n ⎛
the last estimate resulting from the assumption that n ≥ 81s ≥ 81. Corollary 5.19. Suppose Mp0 = Mp . If s ≥ 1 and n ≥ max(4n0 , 81 s), then 81 s 8 s Mp 1 − ≤ Bn p ≤ Mp 1 + . p n p n Proof. The proof is immediate from Theorem 5.18.
i
i i
i
i
i
i
132
5.7
buch7 2005/10/5 page 132 i
Chapter 5. Norms
Summary
Let us summarize the essence of this chapter. We are given matrices Bn = Tn (b) + Pn KPn + Wn LWn , where b is a Laurent polynomial and K and L have only finitely many nonzero entries. We put b) + Lp ), Mp := max (T (b) + Kp , T (
Mp0 := T (b)p .
The inequality Mp ≥ Mp0 is always true, and we know that at least for p = 2 the strict inequality Mp > Mp0 represents the generic case. If 1 < p < ∞ and Mp > Mp0 , then there is a γ > 0 such that Bn p − Mp = O(e−γ n ),
(5.51)
while if 1 < p < ∞ and Mp = Mp0 , then 1 . Bn p − Mp = O n
(5.52)
If p = 1 or p = ∞, then Bn p − Mp = 0
(5.53)
for all sufficiently large n. Now suppose that Bn = Tn (b) is a pure Toeplitz band matrix. In this case, Mp = Mp0 = T (b)p . If 1 < p < ∞, then 1 . Tn (b)p − T (b)p = O n
(5.54)
In the case p = 2 this can be improved to 1 . Tn (b)2 − T (b)2 = O n2
(5.55)
The convergence in (5.55) may be faster, of the form O(1/n2γ ) with some natural number γ , but it is never exponentially fast unless |b| is constant on T (which happens if and only if b has at most one nonzero coefficient). Finally, if p = 1 or p = ∞, then (5.56) Tn (b)p − T (b)p = 0 for all n large enough.
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 133 i
133
Exercises 1. Let A ∈ B(2 ) be selfadjoint and positive definite. Put An = Pn APn . Prove that {An }∞ n=1 is a stable sequence and that An 2 ≤ A2 for all n ≥ 1, A−1 n 2
lim An 2 = A2 ,
n→∞
−1
≤ A 2 for all n ≥ 1,
−1 lim A−1 n 2 = A 2 .
n→∞
2. Let b ∈ P and let Pcirc (Tn (b)) be the circulant matrix whose first column is col
(n − j )bj + j b−(n−j ) n
n−1 j =0
(with b−n := 0).
Prove that Pcirc (Tn (b)) is the best approximation of Tn (b) by a circulant matrix in the Frobenius norm. 3. Let a, b ∈ P. Prove that Tn (a)Tn (b) − Tn (b)Tn (a)p = T (a)T (b) − T (b)T (a)p for all sufficiently large n. 4. Let α and β be positive integers and put ⎛ 0 1α β ⎜ 1 0 ⎜ ⎜ 2β 1β An = ⎜ ⎜ .. .. ⎝ . . (n − 1)β
(n − 2)β
2α 1α 0 .. .
... ... ... .. .
(n − 1)α (n − 2)α (n − 3)α .. .
(n − 3)β
...
0
⎞ ⎟ ⎟ ⎟ ⎟. ⎟ ⎠
Prove that An 2F /(nAn 22 ) → 0 as n → ∞, where · F is the Frobenius norm. 2
2
2
2
2
5. Consider the sequence {21 , 22 , 23 , 24 , 25 , . . . }. Define ak = k if k belongs to this sequence and ak = 0 otherwise. Let An be the lower-triangular Toeplitz matrix ⎞ ⎛ a0 ⎟ ⎜ a1 a0 ⎟ ⎜ An = ⎜ . ⎟. . . .. .. ⎠ ⎝ .. an−1
an−2
...
a0
Prove that for each number μ in the segment [0, 1] there exists a sequence {nj }∞ j =1 such that Anj 2F /(nj Anj 22 ) → μ as nj → ∞. 6. Let a be a complex number with |a| ≥ 4. Let b(t) = 2 + t + t −1 and put Bn = Tn (b) + diag (a, 0, . . . , 0, a),
B = T (b) + diag (a, 0, . . . ).
i
i i
i
i
i
i
134
buch7 2005/10/5 page 134 i
Chapter 5. Norms Show that for n ≥ 10,
1 11 Bn 2 − B2 < (|a| + 2) 4 5 7. Let
⎛
3 ⎜ −1 B = T (b) + K = ⎜ ⎝ 0 ...
#
84 3 −1 ...
n4
4 |a|2 + 1
0 −1 3 ...
.
⎞ ... ... ⎟ ⎟ ... ⎠ ...
and Bn = Tn (b) + Pn KPn . (a) Show that M20 = 5 and 80 ≤ M2 ≤ 90. (b) Use Theorem 5.16 to show that for n ≥ 14, 120150 . Bn 2 − B2 < 2n (c) MATLAB gives B32 2 = 84.1117. Deduce that B2 = 84.1117 ± 0.0001. 8. Show that if ϕ ∈ L∞ and
T2 (ϕ) =
then ϕ∞ ≥
0 1
1 0
,
√ 2.
Notes Section 5.1 is from [58]. The results of Section 5.2 are well known in the case where b is real valued and, hence, Tn (b) is Hermitian: See Kac, Murdock, and Szegö [176], Grenander and Szegö [145], Widom [290], Parter [198], and Serra Capizzano [247], [248]. In the form stated here, Theorem 5.8 was established in [58]. Theorem 5.10 probably first appears in our paper [53]. Our original proof of Theorem 5.10 used the Fejér kernel and gave only the inequality √ Tn (b)2 ≥ (1/ 3π ) σn b∞ , which, however, was sufficient for the purpose of our paper [53]. Then Stefano Serra Capizzano √ communicated an alternative proof to us which, moreover, showed that the constant 1/ 3π can be improved to the (optimal) value 1. Subsequently we were able to modify (and even to simplify) our original proof so that it yielded the constant 1 as well, but this was no longer the point. Serra Capizzano’s proof was based on arguments that are standard in the preconditioning literature (see, e.g., [84], [88], [249]). He first showed that
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 135 i
135
Tn (b)2 ≥ Pcirc (Tn (b))2 , where Pcirc (Tn (b)) is the circulant matrix of Exercise 2. One can straightforwardly verify that Pcirc (Tn (b))2 = max |(σn b)(t)|,
(5.57)
t∈Tn
where Tn is the group of the nth unit roots. A scaling trick, viz., the equality Tn (b)2 = Tn (bζ )2 with (bζ )j := bj ζ j (ζ ∈ T), eventually gives the desired inequality Tn (b)2 = max Tn (bζ )2 ≥ max max |(σn bζ )(t)| ζ ∈T
ζ ∈T t∈Tn
= max max |(σn b)(ζ t)| = σn b∞ . ζ ∈T t∈Tn
In an attempt to find out who was the first to write down (5.57) explicitly, we looked into the paper [249], and once we saw the first lines of the proof of Lemma 2.1 of [249], we came to understand that our Theorem 5.10 can actually be proved as in Section 5.3. The results of Sections 5.4 to 5.7 are also taken from paper [58]. Figure 5.1 was done by André Eppler. For Exercises 4 and 5 see [53]. Exercise 8 is from [194], where it is attributed to A. Volberg. Further results: symbols of minimal norm. Suppose we are given a Toeplitz matrix Tn (a). Since only the Fourier coefficients ak with |k| ≤ n − 1 enter the matrix, we have Tn (a) = Tn (ϕ) for every ϕ ∈ L∞ satisfying ϕk = ak for |k| ≤ n − 1. Put νn (a) = inf{ϕ∞ : ϕk = ak for |k| ≤ n − 1}. It is clear that Tn (a)2 ≤ νn (a) ≤ a∞ . Nikolskaya and Farforovskaya [194] showed that actually Tn (a)2 ≥
1 νn (a). 3
Thus, we can find a symbol ϕ ∈ L∞ such that Tn (a) = Tn (ϕ)
and
Tn (a)2 ≤ ϕ∞ ≤ 3Tn (a)2 .
From Exercise 8 we learn that if a(t) = t + t −1 , then ν2 (a) ≥ 1 = T2 (a)2 ≥ c ν2 (a) ≥ c
√ 2, which implies that
√ 2
√ the conclusion that the optimal constant in the cannot be true if c > 1/ 2. We so arrive at √ inequality Tn (a)2 ≥ c νn (a) is in [1/3, 1/ 2]. Together with Tn (a) = Pn T (a)Pn we may also consider the rectangular Toeplitz matrices Tn−r,n+r (a) = Pn−r T (a)Pn+r
(|r| ≤ n − 1).
i
i i
i
i
i
i
136
buch7 2005/10/5 page 136 i
Chapter 5. Norms
For example, in the case n = 3 we have the five matrices ⎞ ⎛ ⎛ a−2 a−1 a−2 ⎜ a−1 ⎟ ⎟ ⎜ ⎜ a0 a−1 ⎟ ⎜ T5,1 (a) = ⎜ ⎜ a0 ⎟ , T4,2 (a) = ⎝ a1 a0 ⎝ a1 ⎠ a2 a1 a2 ⎞ ⎛ a0 a−1 a−2 T3,3 (a) = ⎝ a1 a0 a−1 ⎠ , T2,4 (a) = a2 a1 a0 T1,5 (a) = a2 a1 a0 a−1 a−2 .
⎞ ⎟ ⎟, ⎠
a0 a1
a1 a2
a−1 a0
a−2 a−1
,
Put δn (a) = max Tn−r,n+r (a)2 . |r|≤n−1
Again it is obvious that δn (a) ≤ νn (a) ≤ a∞ . Bakonyi and Timotin [12] proved the estimate δn (a) ≥
1 n+2 νn (a). 2 n+1
Consequently, there is a symbol ϕ ∈ L∞ such that Tn (a) = Tn (ϕ)
and
δn (a) ≤ ϕ∞ ≤ 2
n+1 δn (a). n+2
i
i i
i
i
i
i
buch7 2005/10/5 page 137 i
Chapter 6
Condition Numbers
Let Bn be an n × n matrix. For 1 ≤ p ≤ ∞, we denote by κp (Bn ) the condition number of p Bn as an operator on n : κp (Bn ) := Bn p Bn−1 p . Throughout what follows we put Bn−1 p = ∞ in case Bn is not invertible. In this chapter, we study the behavior of the condition numbers of Toeplitz band matrices Tn (b) and of Toeplitz-like matrices Bn = Tn (b) + Pn KPn + Wn LWn for large n.
6.1 Asymptotic Inverses of Toeplitz-Like Matrices As in Section 5.4, we assume that Bn is given by Bn = Tn (b) + Pn KPn + Wn LWn ,
(6.1)
where b(t) = sj =−s bj t j (t ∈ T) and where K and L have only a finite number of nonzero entries, that is, Pn0 KPn0 = K,
Pn0 LPn0 = L
(6.2)
for some n0 ∈ N. Theorem 3.15 gives an asymptotic inverse for the pure Toeplitz matrices Tn (b). The purpose of this section is to extend this result to the Toeplitz-like matrices Bn . We put = T ( B = T (b) + K, B b) + L,
(6.3)
are invertible (which, by Theorem 1.9, implies that b is invertible and and in case B and B that wind b = 0), we set X = B −1 − T (b−1 ),
−1 − T ( Y =B b−1 ).
(6.4)
137
i
i i
i
i
i
i
138
buch7 2005/10/5 page 138 i
Chapter 6. Condition Numbers
are Lemma 6.1. Let 1 ≤ p ≤ ∞ and let α be any number satisfying (1.23). If B and B p −αn −αn invertible on , then Qn Xp = O(e ) and Qn Y p = O(e ) as n → ∞. Proof. From (6.4) we infer that I = (T (b−1 ) + X)(T (b) + K). This and Proposition 1.3 yield I = X(T (b) + K) + T (b−1 )T (b) + T (b−1 )K = X(T (b) + K) + I − H (b−1 )H ( b) + T (b−1 )K, whence X = (H (b−1 )H ( b) − T (b−1 )K)(T (b) + K)−1 and thus Qn Xp ≤ Qn H (b−1 )p H ( b)p + Qn T (b−1 )Kp (T (b) + K)−1 p . Taking into account that K has only a finite number of nonzero entries and using Lemma 1.17 it is easy to show that Qn H (b−1 )p and Qn T (b−1 )Kp are O(e−αn ), which implies that Qn Xp = O(e−αn ). The proof is analogous for Qn Y p . are invertible on Theorem 6.2. Let 1 ≤ p ≤ ∞, let α satisfy (1.23), and suppose B and B p . Then the matrices Bn are invertible for all sufficiently large n and Bn−1 = Tn (b−1 ) + Pn XPn + Wn Y Wn + En , where X and Y are given by (6.4) and En p = O(e−αn ). Proof. Put An = Tn (b−1 ) + Pn XPn + Wn Y Wn . Theorem 3.13 tells us that the matrices Bn are invertible for all sufficiently large n and that Bn−1 p remains bounded as n → ∞. We have En = Bn−1 − An = Bn−1 (Pn − Bn An ), and the theorem will follow as soon as we have shown that Pn − Bn An p = O(e−αn ). The matrix Bn An − Pn equals Pn (T (b) + K)Pn (T (b−1 ) + X)Pn − Pn + Pn (T (b) + K)Pn Wn Y Wn + Wn LWn Pn (T (b−1 ) + X)Pn + Wn LWn Wn Y Wn . Let n0 be the number in (6.2). Since Pn0 Wn = Wn0 T (χ−(n−n0 ) )Pn Qn−n0 for all n ≥ n0 , we obtain from (6.2) that Pn KPn Wn Y Wn = Pn0 KPn0 Wn Y Wn = Pn0 KWn0 T (χ−(n−n0 ) )Pn Qn−n0 Y Wn , and Lemma 6.1 therefore gives Pn KPn Wn Y Wn p = O(e−αn ). Analogously one can show that Wn LWn Pn XPn = O(e−αn ). Thus Bn An − Pn is Pn (T (b) + K)Pn (T (b−1 ) + X)Pn − Pn + Pn T (b)Pn Wn Y Wn + Wn LWn Pn T (b−1 )Pn + Wn LWn Wn Y Wn + En
(6.5)
with En p = O(e−αn ). The identities Pn T (b)Wn Y Wn = Wn T ( b)Pn Y Wn , Wn LWn T (b−1 )Pn = Wn LPn T ( b−1 )Wn ,
i
i i
i
i
i
i
6.2. The Limit of the Condition Numbers
buch7 2005/10/5 page 139 i
139
imply that (6.5) equals Pn (T (b) + K)Pn (T (b−1 ) + X)Pn − Pn + Wn (T ( b) + L)Pn (T ( b−1 ) + Y )Wn − Wn T ( b)Pn T ( b−1 )Wn + En ,
(6.6)
b) + L)(T ( b−1 ) + Y ) = I , it results that (6.6) and since (T (b) + K)(T (b−1 ) + X) = (T ( is equal to Pn − Pn (T (b) + K)Qn (T (b−1 ) + X)Pn − Pn + Pn − Wn (T ( b) + L)Qn (T ( b−1 ) + Y )Wn − Wn T ( b)Pn T ( b−1 )Wn + En .
(6.7)
Using (6.2) and Lemma 6.1 we see that (6.7) is Pn − Pn T (b)Qn T (b−1 )Pn − Pn + Pn − Wn T ( b)Qn T ( b−1 )Wn − Wn T ( b)Pn T ( b−1 )Wn + En
(6.8)
with En p = O(e−αn ). Finally, taking into account Propositions 1.3 and 3.10 we obtain that (6.8) minus En is b)T ( b−1 )Wn Pn − Pn T (b)Qn T (b−1 )Pn − Wn T ( = Pn − Pn T (b)T (b−1 )Pn + Pn T (b)Pn T (b−1 )Pn − Wn T ( b)T ( b−1 )Wn = Pn − Pn (I − H (b)H ( b−1 ))Pn + Tn (b)Tn (b−1 ) − Wn (I − H ( b)H (b−1 ))Wn = Pn − Pn + Pn H (b)H ( b−1 )Pn + Tn (b)Tn (b−1 ) − Pn + Wn H ( b)H (b−1 )Wn = Pn − Pn + Pn − Pn = 0. Thus, Bn An − Pn = En .
6.2 The Limit of the Condition Numbers by (6.3). We know from Theorem 3.13 that Let Bn be of the form (6.1) and define B and B is not invertible. So suppose B and B are invertible and put Bn−1 p → ∞ if B or B −1 p ). Np := max(B −1 p , B Again by Theorem 3.13, the matrices Bn are invertible for all sufficiently large n and lim sup Bn−1 p < ∞. The following theorem shows that in fact the limit lim Bn−1 p exists and it identifies this limit. Theorem 6.3. If 1 ≤ p ≤ ∞, then lim Bn−1 p = Np .
n→∞
(6.9)
Proof. Fix ε > 0. By virtue of Theorem 6.2, Bn−1 = Tn (b−1 ) + Pn XPn + Wn Y Wn + En ,
i
i i
i
i
i
i
140
buch7 2005/10/5 page 140 i
Chapter 6. Condition Numbers
where X and Y are given by (6.4) and En p = O(e−αn ). We can choose a Laurent polynomial a such that b − a −1 W < ε and b−1 − aW < ε. Due to Lemma 6.1, there exist infinite matrices U and V with only finitely many entries such that X − U p < ε and Y − V p < ε. Put An = Tn (a) + Pn U Pn + Wn V Wn , Corollary 5.14 implies that
Mp = max(T (a) + U p , T ( a ) + V p ).
An p − Mp < ε
(6.10)
for all sufficiently large n. We have −1 B p − An p n = Tn (b−1 ) + Pn XPn + Wn Y Wn + En p − Tn (a) + Pn U Pn + Wn V Wn p ≤ Tn (b−1 − a)p + Pn (X − U )Pn p + Wn (Y − V )Wn p + En p ≤ b−1 − aW + X − U p + Y − V p + En p < 4ε
(6.11)
provided n is large enough. Furthermore, | Mp − Np | −1 p ) = max(T (a) + U p , T ( a ) + V p ) − max(B −1 p , B −1 p | ≤ max | T (a) + U p − B −1 p |, | T ( a ) + V p − B −1 p ≤ max T (a) + U − B −1 p , T ( a) + V − B a − b−1 ) + V − Y p = max T (a − b−1 ) + U − Xp , T ( ≤ max a − b−1 W + U − Xp , a − b−1 W + V − Y p < max (ε + ε, ε + ε) = 2ε.
(6.12)
Putting (6.10), (6.11), (6.12) together, we arrive at the conclusion that | Bn−1 p − Np | < ε + 4ε + 2ε = 7ε for all sufficiently large n. = 0, then Corollary 6.4. Let 1 ≤ p ≤ ∞. If B = 0 or B lim κp (Bn ) = Mp Np ,
n→∞
(6.13)
p ) and Np = max(B −1 p , B −1 p ). where Mp = max(Bp , B = 0, Proof. This is immediate from Corollary 5.14 and Theorem 6.3. (Notice that if B = B then the assertion reads lim 0 = 0 · ∞.) Corollary 6.5. If 1 ≤ p ≤ ∞ and b is a Laurent polynomial that does not vanish identically, then lim Tn−1 (b)p = max(T −1 (b)p , T −1 ( b)p ),
(6.14)
lim κp (Tn (b)) = max(κp (T (b)), κp (T ( b))).
(6.15)
n→∞ n→∞
i
i i
i
i
i
i
6.2. The Limit of the Condition Numbers
buch7 2005/10/5 page 141 i
141
Proof. In the case where T (b) is invertible, formula (6.14) is a special case of Theorem 6.3 and formula (6.15) follows from Corollary 6.4: lim Tn (b)p Tn−1 (b)p = T (b)p lim Tn−1 (b)p
n→∞
n→∞
= T (b)p max(T −1 (b)p , T −1 ( b)p ) = max(T (b)p T −1 (b)p , T (b)p T −1 ( b)p ) −1 −1 = max(T (b)p T (b)p , T (b)p T (b)p ), the equality T (b)p = T ( b)p resulting from Theorem 1.14 for p = 1 and p = ∞ and from Lemma 5.3 for 1 < p < ∞. If T (b) is not invertible, then Theorem 3.7 tells us that both sides of (6.14) and (6.15) are infinite. If p = 2, then T (b)2 = T ( b)2 = b∞ and T −1 (b)2 = T −1 ( b)2 (note that T ( b) is just the transpose of T (b)), whence lim Tn−1 (b)2 = T −1 (b)2 ,
n→∞
lim κ2 (Tn (b)) = κ2 (T (b)).
(6.16)
n→∞
As the following example shows, the situation is different for p = 2. Example 6.6. Let a(t) =
1 , a− (t)a+ (t)
a− (t) = 4 + 4t −1 + 3t −2 ,
a+ (t) = 2 − t.
It is easily seen that a± ∈ GW± (Theorem 1.7). Thus, T (a) is invertible T (a+ )T (a− ) due to Proposition 1.4. We have ⎛ ⎞⎛ 2 0 0 ... 4 4 3 0 ⎜ −1 ⎟⎜ 0 2 0 . . . 4 4 3 ⎟⎜ T (a+ )T (a− ) = ⎜ ⎝ 0 −1 2 ... ⎠⎝ 0 0 4 4 ... ... ... ... ... ... ... ... ⎛ ⎞ 8 8 6 0 0 ... ⎜ −4 4 5 6 0 ... ⎟ ⎟. =⎜ ⎝ 0 −4 4 5 6 ... ⎠ ... ... ... ... ... ...
and T −1 (a) = ⎞ ... ... ⎟ ⎟ ... ⎠ ...
This in conjunction with (1.16) shows that T −1 (a)1 = max(12, 16, 19) = 19, T −1 ( a )1 = max(22, 19) = 22. a )1 = aW , it follows that Since T (a)1 = T ( T −1 (a)1 < T −1 ( a )1 ,
κ1 (T (a)) < κ1 (T ( a )).
If bn − aW → 0, then T −1 (bn ) − T −1 (a)1 ≤ T −1 (a)1 a − bn W T −1 (bn )1 → 0,
i
i i
i
i
i
i
142
buch7 2005/10/5 page 142 i
Chapter 6. Condition Numbers
and hence we can find a Laurent polynomial b such that T −1 (b)1 < T −1 ( b)1 ,
κ1 (T (b)) < κ1 (T ( b)).
By the Riesz-Thorin interpolation theorem, the numbers log T −1 (b)p ,
log T −1 ( b)p ,
log T (b)p ,
log T ( b)p
are convex functions of 1/p ∈ [0, 1]. These functions are therefore continuous, which implies that if p > 1 is sufficiently close to 1, then T −1 (b)p < T −1 ( b)p ,
κp (T (b)) < κp (T ( b)).
The moral is that, in general, we cannot remove the maxima in formulas (6.14) and (6.15) for p = 2.
6.3
Convergence Speed Estimates
Again let Bn = Tn (b) + Pn KPn + Wn LWn be as in Section 6.1, set B = T (b) + K, = T ( B b) + L, and put −1 p ), Np = max(B −1 p , B
Np0 = T (b−1 )p .
are invertible. Theorem 6.7. Let 1 < p < ∞ and suppose B and B 0 If Np > Np , then there is a γ > 0 depending only on b, K, L such that √ Bn p − Np = O(e−γ n ).
(6.17)
If Np = Np0 , then Bn p − Np = O
log n . n
(6.18)
Proof. We know from Theorem 6.2 that Bn−1 = Tn (a) + Pn XPn + Wn Y Wn + En , where a = b−1 , X and Y are given by (6.4), and En p = O(e−αn ). Let s(n) be a sequence of natural numbers such that s(n) → ∞,
n → ∞. s(n)
(6.19)
Put Xs(n) = Ps(n) XPs(n) , Ys(n) = Ps(n) Y Ps(n) , and let cn denote the s(n)th partial sum of the Fourier series of a. Set An = Tn (cn ) + Pn Xs(n) Pn + Wn Ys(n) Wn .
(6.20)
Lemma 1.17 implies that a − cn W = O(s(n) ) with some < 1.
(6.21)
i
i i
i
i
i
i
6.3. Convergence Speed Estimates
buch7 2005/10/5 page 143 i
143
We have −1 B p − An p ≤ B −1 − An p n
n
≤ Tn (a − cn )p + Pn (X − Xs(n) )Pn p + Wn (Y − Ys(n) )Wn p + En p ≤ a − cs(n) W + X − Xs(n) p + Y − Ys(n) p + En p .
(6.22)
By Lemma 6.1, X − Ps XPs p = O(σ s ) and Y − Ps Y Ps p = O(σ s ) with σ = e−α < 1 as s → ∞. This and (6.17) show that (6.22) is O(s(n) ) + O(σ s(n) ) + O(σ s(n) ) + O(σ s(n) ) = O(τ s(n) ), where max(, σ ) < τ < 1. Let Mp (s(n)) = max(T (cn ) + Xs(n) p , T ( cn ) + Ys(n) p ),
Mp0 (s(n)) = T (cn )p .
Then | Mp (s(n)) − Np | does not exceed cn ) + Ys(n) p − T ( max T (cn ) + Xs(n) p − T (a) + Xp , T ( a ) + Y p ≤ max T (cn − a) + Xs(n) − Xp , T ( cn − a ) + Ys(n) − Y p , and again taking into account (6.21) and Lemma 6.1 we see that this is O(s(n) ) + O(σ s(n) ) = O(τ s(n) ). Furthermore, also by (6.21), 0 M (s(n)) − N 0 p p = T (cn )p − T (a)p ≤ T (cn − a)p = O(s(n) ) = O(τ s(n) ). In summary, at the present moment we have shown that −1 B p − Np ≤ An p − Mp (s(n)) + O(τ s(n) ) n
(6.23)
and that Mp (s(n)) = Np + O(τ s(n) ),
Mp0 (s(n)) = Np0 + O(τ s(n) ).
(6.24)
Now suppose that Np > Np0 . Then, by (6.24), Mp (s(n)) > Mp0 (s(n)) for all sufficiently large n, and Theorem 5.16 therefore gives An p − Mp (s(n)) n/(4s(n)) (6s(n)+1)/(4s(n)) Mp0 (s(n)) Mp (s(n)) ≤ 2s(n)Mp (s(n)) Mp0 (s(n)) Mp (s(n)) 2 n/(4s(n)) Mp0 (s(n)) Mp (s(n)) ≤ 2s(n)Mp (s(n)) . (6.25) Mp0 (s(n)) Mp (s(n))
i
i i
i
i
i
i
144
buch7 2005/10/5 page 144 i
Chapter 6. Condition Numbers
Choose any ε > 0 so that (Np0 + ε)/(Np − ε) ∈ (0, 1) and assume that τ ∈ (0, 1) is larger than (Np0 + ε)/(Np − ε). If n is sufficiently large, then (6.25) is at most
Np + ε 2s(n)(Np + ε) Np0 − ε
2
Np0 + ε
n/(4s(n))
Np − ε
= O s(n)τ n/(4s(n)) .
This in conjunction with (6.23) yields the estimate −1 B p − Np = O s(n)τ n/(4s(n)) . n √ Letting s(n) = [ n ], we obtain (6.17) with any γ > 0 such that τ 1/4 < e−γ . Now assume Np = Np0 . Then Mp (s(n)) ≥ Mp0 (s(n)) for all n by virtue of Lemma 6.10. From Theorem 5.13 and (6.24) we obtain 8 s(n) s(n) An p − Mp (s(n)) ≤ Mp (s(n)) =O , p n n while the lower estimate of Theorem 5.18 and (6.24) give −An p + Mp (s(n)) 81 0 s(n) ≤ Mp (s(n)) + Mp (s(n)) − Mp0 (s(n)) p n s(n) =O + O(τ s(n) ) n (to get the O(τ s(n) ) we used (6.24) and the equality Np = Np0 ). Choose γ > 0 and α > 0 so that τ < e−γ and γ α > 1. Then put s(n) = [α log n]. Clearly, (6.19) is satisfied. As s(n) log n O =O , n n log n s(n) −γ α log n −γ α O(τ ) = O(e , ) = O(n ) = O n we get (6.18) from (6.23). are invertible. Let further α > 0 Theorem 6.8. Let p = 1 or p = ∞ and suppose B and B be any number satisfying (1.23). Then Bn p − Np = O(e−αn/2 ). (6.26) Proof. Write Bn−1 as in the proof of Theorem 6.7, put s(n) = [n/2] − 1, and define An by (6.20). Proceeding in the way we derived (6.22) and using Lemmas 1.17 and 6.1, we get −1 B p − An p n ≤ a − cn W + X − Xs(n) p + Y − Ys(n) p + En p = O(e−αs(n) ) = O(e−αn/2 ).
(6.27)
Since 2s(n) + 1 ≤ n, we may employ (5.27) and (5.28) to conclude that An p = max T (cn ) + Xs(n) p , T ( cn ) + Ys(n) p , whence, again by Lemmas 1.17 and 6.1, An p − Np ≤ max cn − aW + X − Xs(n) p , cn − a W + Y − Ys(n) p = O(e−αs(n) ) = O(e−αn/2 ).
(6.28)
Combining (6.27) and (6.28) we arrive at the asserted estimate (6.26).
Recall that

    M_p = max(||B||_p, ||B̃||_p),   M_p^0 = ||T(b)||_p,   (6.29)
    N_p = max(||B^{-1}||_p, ||B̃^{-1}||_p),   N_p^0 = ||T(b^{-1})||_p.   (6.30)

Theorem 6.9. Suppose B and B̃ are invertible on ℓ^p. If 1 < p < ∞, M_p > M_p^0, and N_p > N_p^0, then there is a constant γ > 0 (depending on p, b, K, L) such that

    | κ_p(B_n) − M_p N_p | = O(e^{−γ√n}).   (6.31)

If 1 < p < ∞ and if M_p = M_p^0 or N_p = N_p^0, then

    | κ_p(B_n) − M_p N_p | = O(log n / n).   (6.32)

If p = 1 or p = ∞, then

    | κ_p(B_n) − M_p N_p | = O(e^{−αn/2}),   (6.33)
where α > 0 satisfies (1.23). Proof. Estimate (6.31) follows from (5.51) and Theorem 6.7, estimate (6.32) is a consequence of (5.51), (5.52), and Theorem 6.7, and equality (6.33) results from (5.27), (5.28), and Theorem 6.8.
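The convergence described by Theorems 6.7 to 6.9 is easy to observe experimentally. The following Python/NumPy sketch is purely illustrative and not part of the original text: the helper T_n, the symbol b(t) = −2 + t, and the corner perturbation K are our own choices. It prints ||B_n^{-1}||_2 for growing n; the values settle to a limit N_2 already for quite small n, and Theorem 6.7 quantifies the possible rates.

    import numpy as np
    from scipy.linalg import toeplitz

    def T_n(coeffs, n):
        """n x n Toeplitz matrix with entries b_{j-k} for the Laurent polynomial
        b(t) = sum_k coeffs[k] t^k (coeffs is a dict {power: coefficient})."""
        col = np.array([coeffs.get(j, 0.0) for j in range(n)], dtype=complex)
        row = np.array([coeffs.get(-j, 0.0) for j in range(n)], dtype=complex)
        return toeplitz(col, row)

    # b(t) = -2 + t has no zeros on the unit circle and winding number zero;
    # K is a fixed finite-rank perturbation in the upper left corner, L = 0.
    b = {0: -2.0, 1: 1.0}
    K = np.zeros((3, 3))
    K[0, 0] = 3.0

    for n in [10, 20, 40, 80, 160]:
        Bn = T_n(b, n)
        Bn[:3, :3] += K                     # B_n = T_n(b) + P_n K P_n
        print(n, np.linalg.norm(np.linalg.inv(Bn), 2))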
6.4
Generic and Exceptional Cases
Let B_n = T_n(b) + P_n K P_n + W_n L W_n be as in Section 6.1, put B = T(b) + K, B̃ = T(b̃) + L, and define M_p, M_p^0, N_p, N_p^0 by (6.29) and (6.30).

Lemma 6.10. Let 1 ≤ p ≤ ∞. We always have M_p^0 ≤ M_p and N_p^0 ≤ N_p.

Proof. The inequality M_p^0 ≤ M_p follows from Lemma 5.12 for 1 < p < ∞ and from (1.16) for p = 1 and p = ∞. To prove that N_p^0 ≤ N_p, note first that B^{-1} = T(b^{-1}) + X
by (6.4). Hence Qn B −1 Qn = Qn T (b−1 )Qn + Qn XQn , and since Qn T (b−1 )Qn p = T (b−1 )p and Qn XQn p → 0 (Lemma 6.1), it follows that lim Qn B −1 Qn p = T (b−1 )p . As B −1 p ≥ Qn B −1 Qn p , we conclude that B −1 p ≥ T (b−1 )p . Anal−1 p ≥ T ( b−1 )p = T (b−1 )p . ogously, B In Chapter 5, we observed that the norm Bn p converges exponentially fast to Mp if Mp > Mp0 , while the convergence may be slow in the case Mp = Mp0 . We also proved that the strict inequality M2 > M20 represents the generic case and that we have the (exceptional) equality in the case where Bn = Tn (b) is a pure Toeplitz matrix. The results of Section 6.3 show that the same phenomenon is encountered when treating the norms of inverses: Bn−1 p converges to Np very fast if Np > Np0 , whereas this convergence may be slow provided Np = Np0 . As the following proposition reveals, at least for p = 2 the strict inequality N2 > N20 is the generic case. Define P × X × X as in the paragraph before Proposition 5.17 and let P0 be the set of all Laurent polynomials without zeros on T. Proposition 6.11. The set of all (b, K, L) ∈ P0 × X × X for which max((T (b) + K)−1 2 , (T ( b) + L)−1 2 ) > T (b−1 )2 is an open and dense subset of P0 × X × X . Proof. Clearly, the set under consideration is open. It remains to show that {(b, K) ∈ P0 × X : (T (b) + K)−1 2 > T (b−1 )2 } is dense in P0 ×X . So let b ∈ P0 and K ∈ X and suppose (T (b)+K)−1 2 = T (b−1 )2 = b−1 ∞ . Fix a number ε > 0. We know from (6.4) and Lemma 6.1 that (T (b) + K)−1 = T (b−1 )+X with a compact operator X. If λ ∈ C is sufficiently small, then T (b−1 )+X+λPn is invertible and (T (b−1 ) + X)−1 2 (T (b−1 ) + X + λPn )−1 2 ≤ . 1 − (T (b−1 ) + X)−1 2 |λ| Consequently, (T (b−1 ) + X + λPn )−1 − (T (b−1 ) + X)−1 2 < ε/2
(6.34)
for all n ≥ 1 whenever |λ| is small enough. Now, as in the proof of Proposition 5.17, choose a λ ∈ C of sufficiently small absolute value so that b−1 +λ∞ > b−1 ∞ , T (b−1 )+X +λPn is invertible, and (6.34) holds. It follows, again as in the proof of Proposition 5.17, that T (b−1 ) + X + λPn 2 > T (b−1 )2 for all sufficiently large n. Put Kn = (T (b−1 ) + X + λPn )−1 − T (b). By construction, (T (b) + Kn )−1 2 > T (b−1 )2 . Since Kn is compact together with X + λPn , the operators Pm Kn Pm ∈ X converge uniformly to Kn as m → ∞ (Proposition 3.2). Hence (T (b) + Pm Kn Pm )−1 > T (b−1 )2 for all sufficiently large m. If m is large enough, then Pm Kn Pm − Kn 2 < ε/2. This together with (6.34) shows that Kn − K2 = (T (b−1 ) + X + λPn )−1 − (T (b−1 ) + X)−1 2 < ε, which completes the proof.
Combining Propositions 5.17 and 6.11, we arrive at the conclusion that, at least for p = 2, the fast convergence (6.31) of the condition numbers is generic, while the slow convergence (6.32) occurs in exceptional cases only.
6.5
Norms of Inverses of Pure Toeplitz Matrices
We now turn to pure Toeplitz matrices. Throughout what follows, b is a Laurent polynomial. In the case B_n = T_n(b), we have

    M_p = ||T(b)||_p,   M_p^0 = ||T(b)||_p,
    N_p = max(||T^{-1}(b)||_p, ||T^{-1}(b̃)||_p),   N_p^0 = ||T(b^{-1})||_p.

Corollary 6.12. Let b be a Laurent polynomial and suppose T(b) is invertible on the space ℓ^p. If 1 < p < ∞ and ||T^{-1}(b)||_p > ||T(b^{-1})||_p, then there exists a γ > 0 (depending only on p and b) such that

    | ||T_n^{-1}(b)||_p − max(||T^{-1}(b)||_p, ||T^{-1}(b̃)||_p) | = O(e^{−γ√n}).   (6.35)

If 1 < p < ∞ and ||T^{-1}(b)||_p = ||T(b^{-1})||_p, then

    | ||T_n^{-1}(b)||_p − max(||T^{-1}(b)||_p, ||T^{-1}(b̃)||_p) | = O(log n / n).   (6.36)

If p = 1 or p = ∞, then

    | ||T_n^{-1}(b)||_p − max(||T^{-1}(b)||_p, ||T^{-1}(b̃)||_p) | = O(e^{−αn/2}),   (6.37)

where α > 0 is any number satisfying (1.23).

Proof. This is Theorems 6.7 and 6.8 in the special case B_n = T_n(b).

We remark that for p = 2 the number max(||T^{-1}(b)||_2, ||T^{-1}(b̃)||_2) can be replaced by ||T^{-1}(b)||_2. Thus, N_2 = ||T^{-1}(b)||_2 and N_2^0 = ||T(b^{-1})||_2. Corollary 6.12 motivates the discussion of the question of when in ||T^{-1}(b)||_p ≥ ||T(b^{-1})||_p equality or strict inequality holds. This section gives partial answers to this question in the case p = 2.

Proposition 6.13. Let T(b) be invertible and let b = b_− b_+ be a Wiener-Hopf factorization. If |b_−| and |b_+| attain their minimum on T at the same point,

    min_{t∈T} |b_−(t)| = |b_−(t_0)|,   min_{t∈T} |b_+(t)| = |b_+(t_0)|,   (6.38)

then N_2 = N_2^0 = 1/|b(t_0)|.

Proof. By (6.38),

    ||T^{-1}(b)||_2 = ||T(b_+^{-1}) T(b_−^{-1})||_2 ≤ ||b_+^{-1}||_∞ ||b_−^{-1}||_∞
                    = |b_+^{-1}(t_0)| |b_−^{-1}(t_0)| = |b^{-1}(t_0)| ≤ ||b^{-1}||_∞ = ||T(b^{-1})||_2.
Since always T (b−1 )2 ≤ T −1 (b)2 due to Lemma 6.10, we arrive at the assertion.
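A quick numerical check of Proposition 6.13 can be done with the following Python/NumPy sketch (our own illustration; the symbol is chosen for convenience, not taken from the text). For b(t) = (1 − t/2)(1 − t^{-1}/2) = 5/4 − (t + t^{-1})/2 both Wiener-Hopf factors attain their minimum at t_0 = 1, so the proposition predicts N_2 = N_2^0 = 1/|b(1)| = 4.

    import numpy as np
    from scipy.linalg import toeplitz

    def T_n(coeffs, n):
        col = np.array([coeffs.get(j, 0.0) for j in range(n)])
        row = np.array([coeffs.get(-j, 0.0) for j in range(n)])
        return toeplitz(col, row)

    # b(t) = 5/4 - (t + 1/t)/2; T_n(b) is symmetric positive definite here,
    # so ||T_n^{-1}(b)||_2 = 1/lambda_min(T_n(b)).
    b = {-1: -0.5, 0: 1.25, 1: -0.5}
    for n in [16, 64, 256, 1024]:
        lam_min = np.linalg.eigvalsh(T_n(b, n)).min()
        print(n, 1.0 / lam_min)      # increases towards the predicted value 4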
Corollary 6.14. Suppose T (b) is invertible. If T (b) is Hermitian or triangular, then N2 = N20 . Moreover, if t0 ∈ T is any point at which |b| attains its minimum on T, then N2 = N20 = 1/|b(t0 )|. Proof. Let T (b) be Hermitian. Then b is real valued, and the invertibility of T (b) implies that b(t) ≥ ε > 0 or b(t) ≤ −ε < 0 for all t ∈ T. In the first case, log b is a real-valued function in W . Therefore (log b)−k = (log b)k , whence ∞ 1 −k (log b)0 + (log b)−k t b− (t) := exp 2 k=1 ∞ 1 = exp (log b)0 + (log b)k t k =: b+ (t). 2 k=1 The factorization b = b− b+ is a Wiener-Hopf factorization. Since |b+ | = |b− |, condition (6.38) is satisfied for some t0 ∈ T. Thus, N2 = N20 = 1/|b(t0 )| by Proposition 6.13. If b(t) ≤ −ε < 0, we can apply the preceding argument to −b. If T (b) is triangular, then T −1 (b) = T (b−1 ), and this yields all assertions. Theorem 6.15. Let P00 be the set of all Laurent polynomials that have no zeros on T and whose winding number is zero. Equip P00 with the L∞ metric. The set of all b ∈ P00 for which N2 = T −1 (b)2 > T (b−1 )2 = N20 is an open and dense subset of P00 . Proof. It is clear that the set is open. To show that it is dense, pick b ∈ P00 and ε > 0. We construct a c ∈ P00 such that c − b∞ < ε and T −1 (c)2 > T (c−1 )2 . Since T (μb(t/t0 )) = μ diag (1, t0 , t02 , . . . ) T (b) diag (1, t0−1 , t0−2 , . . . ), we may without loss of generality assume that mint∈T |b(t)| = 1 and b(1) = 1. Let 0 < δ < 1/4 and consider c = bϕδ− ϕδ+ where ϕδ− (t) =
1 − 2δ + 2δt −n , 1 − δ + δt −n
ϕδ+ (t) =
1 − δ + δt n . 1 − 2δ + 2δt n
Clearly, c−b∞ < ε if only δ > 0 is sufficiently small. It is easy to see that |ϕδ− (t)ϕδ+ (t)| = 1 for all t ∈ T and that ϕδ− (1) = ϕδ+ (1) = 1. Since T (c−1 )2 = c−1 ∞ = 1, we are left with proving that T −1 (c)2 > 1. Let b = b− b+ be a Wiener-Hopf factorization with b− (1) = b+ (1) = 1. Then c = c− c+ ,
c− := b− ϕδ− ,
c+ := b+ ϕδ+
is a Wiener-Hopf factorization of c. Given z ∈ C such that |z| > 1, put $ t −1 1 . fz (t) = 1 − 2 1 − |z| z The sequence of the Fourier coefficients of fz is $ : 1 1 1 xz := 1 − 2 . . . , 0, 0, 1, , 2 , . . . . |z| z z
Thus, we may think of xz as a unit vector in 2 . It is readily seen that T (d− )xz = d− (z)xz for every d− ∈ W− . This implies that −1 −1 −1 −1 )T (c− )xz 22 = |c− (z)|2 T (c+ )xz 22 . T −1 (c)xz 22 = T (c+
Using the analogue of Lemma 4.7 for infinite matrices, we get −1 −1 )xz 22 = c+ fz 22 T (c+ −1 −1 −1 = (c+ − c+ (1/z))fz + c+ (1/z)fz 22 −1 −1 −1 = (c+ − c+ (1/z))fz 22 + |c+ (1/z)|2 2π −1 iθ −1 −1 (c+ (e ) − c+ 1 (1/z)) c+ (1/z) dθ + 2 1 − 2 Re . iθ −iθ |z| (1 − e /z)(1 − e /z) 2π 0
The integral is the 0th Fourier coefficient of its integrand, and taking into account that * + + * e−iθ e−2iθ a+ (eiθ ) iθ 2iθ = a + a e + a e + · · · 1 + + · · · + 0 1 2 1 − eiθ /z 0 z z2 0 a1 a2 = a0 + + 2 + · · · = a+ (1/z), z z we see that the integral is zero. Thus, −1 −1 −1 −1 −1 (z)c+ (1/z)|2 + |c− (z)|2 (c+ − c+ (1/z))fz 22 . T −1 (c)xz 22 = |c−
(6.39)
Now put n = [1/δ 3 ] and z = 1 + δ 3 . We have 1 ϕδ− (z)
=
1 δ(1 − (1 + δ 3 )−n ) = 1 + 1 − 2δ(1 − (1 + δ 3 )−n ) + δ3)
ϕδ− (1
and since (1 + 1/n)−n = e−1 + O(1/n) and hence (1 + δ 3 )−n = e−1 + O(δ 3 ), we obtain that δ(1 − e−1 ) + O(δ 4 ) = 1 + δ(1 − e−1 ) + O(δ 2 ). 1 − 2δ(1 − e−1 )
(6.40)
δ(1 − e−1 ) 1 = 1 − + O(δ 4 ) = 1 − δ(1 − e−1 ) + O(δ 2 ). 1 − δ(1 − e−1 ) ϕδ+ (1/z)
(6.41)
1 ϕδ− (z)
=1+
Analogously,
It is obvious that ϕδ− (z)ϕδ+ (1/z) = 1.
(6.42)
Since z = z, equalities (6.39) and (6.42) imply that −1 −1 −1 −1 −1 (z)b+ (1/z)|2 + |b− (z)|2 |1/ϕδ− (z)|2 (c+ − c+ (1/z))fz 22 . T −1 (c)xz 22 = |b−
The formulas b− (z) = b− (1 + δ 3 ) = 1 + O(δ 3 ),
b+ (1/z) = b+ (1/(1 + δ 3 )) = 1 + O(δ 3 )
together with (6.40) therefore give −1 −1 T −1 (c)xz 22 = 1 + O(δ 3 ) + (1 + O(δ 3 ))(1 + O(δ)) (c+ − c+ (1/z))fz 22 .
(6.43)
Further, −1 −1 −1 −1 (c+ − c+ (1/z))fz 2 = ((ϕδ+ )−1 b+ − (ϕδ+ )−1 (1/z)b+ (1/z))fz 2 −1 −1 −1 ≥ ((ϕδ+ )−1 − (ϕδ+ )−1 (1/z))b+ fz 2 − |(ϕδ+ )−1 (1/z)| (b+ − b+ (1/z))fz 2 =: A − B.
We have B = 2
|(ϕδ+ )−1 (1/z)|2
1 1− 2 z
T
−1 −1 |b+ (t) − b+ (1/z)|2 |dt| |1 − t/z|2 2π
−1 −1 |b+ 1 (t) − b+ (1/z)|2 |dt| = |(ϕδ+ )−1 (1/z)|2 1 − 2 z |t − 1/z|2 2π T |dt| 1 = |(ϕδ+ )−1 (1/z)|2 1 − 2 |gz (t)|2 z 2π T with the function ∞
gz (t) =
−1 −1 (t) − b+ (1/z) b+ hj (t − 1/z)j . = t − 1/z j =1
The function gz is analytic in some open disk containing D. Thus, there is a constant M < ∞ such that gz ≤ M for all z sufficiently close to 1. Since, by (6.41), |(ϕδ+ )−1 (1/z)|2 = 1 + O(δ) and, obviously, 1 − 1/z2 = O(δ 3 ), it follows that B ≤ (1 + O(δ)) O(δ 3/2 ) M = O(δ 3/2 ). On the other hand,
−1 |(ϕδ+ )−1 (t) − (ϕδ+ )−1 (1/z)|2 |b+ (t)|2 |dt| 1 A2 = 1 − 2 z |1 − t/z|2 2π T + −1 + −1 2 1 |(ϕ ) (t) − (ϕ ) (1/z)| |dt| δ δ ≥ γ2 1 − 2 , z |1 − t/z|2 2π T
−1 where γ = mint∈T |b+ (t)|. Taking into account (6.41) we get
δ(1 − t n ) δ(1 − 1/zn ) (ϕδ+ )−1 (t) − (ϕδ+ )−1 (1/z) = 1 − − 1 + 1 − δ(1 − t n ) 1 − δ(1 − 1/zn ) n −1 1−e 1−t − + O(δ 3 ) =δ n 1 − δ(1 − t ) 1 − δ(1 − e−1 ) = δ(1 − t n − (1 − e−1 ) + O(δ)) = δ(e−1 − t n + O(δ))
and hence arrive at the estimate 1 (|t n − e−1 | + O(δ))2 |dt| A2 ≥ γ 2 1 − 2 δ 2 z |1 − t/z|2 2π T −1 2 1 (1 − e + O(δ)) |dt| ≥ γ 2 1 − 2 δ2 z |1 − t/z|2 2π T = γ 2 δ 2 (1 − e−1 + O(δ))2 = γ 2 (1 − e−1 )2 δ 2 + O(δ 3 ). In summary, (6.43) yields T −1 (c)xz 2 ≥ 1 + O(δ 3 ) + (1 + O(δ))(A − B)2 ≥ 1 + O(δ 3 ) + (1 + O(δ))(γ (1 − e−1 )δ + O(δ 3/2 ))2 = 1 + γ 2 (1 − e−1 )2 δ 2 + O(δ 5/2 ), and this is certainly greater than 1 if only δ > 0 is sufficiently small. By virtue of Theorem 6.15, we may say that the strict inequality N2 > N20 is the generic case, whereas the equality N2 = N20 represents the exceptional case. Corollary 6.14 tells us that we are in the exceptional case whenever T (b) is Hermitian or triangular. Estimates (6.35) and (6.36) result from our techniques, and it may be that these estimates can be improved. We conjecture that Corollary 6.12 remains valid with the righthand sides of (6.35) and (6.36) replaced by O(e−γ n ) and O (1/n2 ), respectively. However, as the following example shows, the gap between (6.35) and (6.36) is essential and cannot be removed. Example 6.16. Let b(t) = 3 − t − t −1 (t ∈ T). By Theorem 2.4, the eigenvalues of the Hermitian matrix Tn (b) are jπ λj (Tn (b)) = 3 + 2 cos (j = 1, . . . , n). n+1 This implies that 1 1 π2 −1 Tn (b)2 = , =1− 2 +O λn (Tn (b)) n n3 and it also shows that Tn−1 (b)2 − T −1 (b)2 = Tn−1 (b)2 − 1 cannot decay faster than O(1/n2 ). Estimate (6.36) is universal in the sense that it does not depend on any further information about b. In case we know more about b, we can improve (6.36). Lemma 6.17. Let b be a Laurent polynomial for which T (b) is invertible. Further, let t0 ∈ T be any point at which |b| attains its minimum and suppose that |b(t) − b(t0 )| ≤ D|t − t0 |γ for all t ∈ T, where D ∈ (0, ∞) and γ ∈ N. Then Tn−1 (b)2 ≥ T (b−1 )2 −
C for all n ≥ 1 nγ
with some constant C ∈ (0, ∞).
Proof. Without loss of generality assume that t0 = 1. Put j = γ + 1 and let m be j + the nonnegative integer for which mj < n ≤ (m + 1)j . Define pm ∈ Pmj +1 by (4.20), (4.21). Clearly, pm ∈ Pn+ . To estimate Tn−1 (b)2 , we make use of Lemma 4.9. We have j j j bpm 2 ≤ |b(1)| pm 2 + (b − b(1))pm 2 , and 2j π dθ j 2 iθ 2 sin((m + 1)θ/2) (b − b(1))pm 2 = |b(e ) − b(1)| sin(θ/2) 2π −π 2γ 2j dθ 2γ −2j dθ + =O |θ| (m + 1) |θ | |θ | 2π 2π |θ|1/(m+1) 2j 2γ +1 (m + 1) (m + 1) =O = O (m + 1) = O(m). + 2γ +1 (m + 1) (m + 1)2j j
Thus, by Lemma 4.10,
j
bpm 2 j
pm 2
≤ |b(1)| + O
m1/2 mj −1/2
= |b(1)| + O
From Lemma 4.9 we now obtain that Tn−1 (b)2 is at least 1 1 1 = ≥ −O |b(1)| + O(1/mγ ) |b(1)| + O(1/nγ ) |b(1)|
1 nγ
1 mγ
≥
.
1 1 −C γ, |b(1)| n
and since T (b−1 )2 = b−1 ∞ = 1 / min |b| = 1/|b(1)|, we arrive at the assertion. Theorem 6.18. Let b be a Laurent polynomial and suppose T (b) is invertible. Let t0 ∈ T and assume |b(t0 )| = min |b(t)|, |b(t) − b(t0 )| ≤ D|t − t0 |γ t∈T
(t ∈ T).
If T (b) is Hermitian or triangular, then N2 = T (b−1 )2 = 1/|b(t0 )| and C N2 1 − γ ≤ Tn−1 (b)2 ≤ N2 for all n ≥ 1, n
(6.44)
where C ∈ (0, ∞) is some constant independent of n. Proof. The equalities N2 = T (b−1 )2 = 1/|b(t0 )| follow from Corollary 6.14, and the lower estimate in (6.44) results from Lemma 6.17. To prove the upper estimate of (6.44), assume first that T (b) is Hermitian. Then b(t) ≥ |b(t0 )| > 0 or b(t) ≤ −|b(t0 )| < 0 for all t ∈ T. We consider the first case (the second case can be reduced to the first case by replacing b with −b). Let λmin be the smallest eigenvalue of Tn (b) and let x be an eigenvector such that x2 = 1. Define f ∈ Pn+ by (4.15). Then, by (4.16), 2π 1 λmin = (Tn (b)x, x) = b(eiθ )|f (eiθ )|2 dθ ≥ |b(t0 )|, 2π 0 whence Tn−1 (b)2 = 1/λmin ≤ 1/|b(t0 )| = N2 . This is the upper estimate of (6.44) for Hermitian matrices. If T (b) is triangular, we have Tn−1 (b) = Tn (b−1 ), which implies that Tn−1 (b)2 = Tn (b−1 )2 ≤ b−1 ∞ = 1/|b(t0 )| = N2 .
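Example 6.16 and Theorem 6.18 are easy to reproduce numerically. The following Python/NumPy sketch (our own illustration, not part of the original text) computes ||T_n^{-1}(b)||_2 = 1/λ_min(T_n(b)) for b(t) = 3 − t − t^{-1} and checks that the deviation from the limit N_2 = 1 decays like π²/n²:

    import numpy as np
    from scipy.linalg import toeplitz

    # Example 6.16 revisited: b(t) = 3 - t - 1/t gives a symmetric tridiagonal T_n(b).
    for n in [50, 100, 200, 400]:
        Tn = toeplitz(np.r_[3.0, -1.0, np.zeros(n - 2)])
        lam_min = np.linalg.eigvalsh(Tn).min()
        inv_norm = 1.0 / lam_min                      # = ||T_n^{-1}(b)||_2
        print(n, inv_norm, (1.0 - inv_norm) * n**2)   # last column approaches pi^2 = 9.87...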
6.6 Condition Numbers of Pure Toeplitz Matrices
Combining our results on the convergence speed of norms and norms of inverses, we can establish a result on the convergence speed for condition numbers.

Corollary 6.19. Let b be a Laurent polynomial and let T(b) be invertible on the space ℓ^p. If 1 < p < ∞, then

    | κ_p(T_n(b)) − max(κ_p(T(b)), κ_p(T(b̃))) | = O(log n / n).   (6.45)

If p = 1 or p = ∞, then

    | κ_p(T_n(b)) − max(κ_p(T(b)), κ_p(T(b̃))) | = O(e^{−αn/2}),   (6.46)

where α > 0 is subject to (1.23).

Proof. In the case at hand, M_p N_p = max(κ_p(T(b)), κ_p(T(b̃))). Thus, (6.32) gives (6.45), while (6.33) yields (6.46).

Note that in the case p = 2 the equality max(κ_2(T(b)), κ_2(T(b̃))) = κ_2(T(b)) holds. We emphasize that generically ||T_n^{-1}(b)||_p converges to N_p very fast, namely, at least of the order O(e^{−γ√n}). The generically slow convergence of κ_p(T_n(b)) to M_p N_p is caused by the generically slow convergence of ||T_n(b)||_p to M_p. As in the case of the norms of the inverses, we believe that (6.45) can be sharpened. However, the following result shows that generically we cannot expect more than polynomially fast convergence.

Proposition 6.20. The set of all Laurent polynomials b ∈ P00 for which there exist constants 2γ ∈ {2, 4, 6, . . .} and μ > 0 such that

    | κ_2(T_n(b)) − κ_2(T(b)) | ≥ μ/n^{2γ}   for all n ≥ 1

is an open and dense subset of P00.
E2 = {b ∈ P00 : T −1 (b)2 > T (b−1 )2 }.
Proposition 5.6 implies that E1 is open and dense, and Theorem 6.15 says that E2 is open and dense. Hence, E1 ∩ E2 is also an open and dense subset of P00 . If b ∈ E1 ∩ E2 , then √ c | Tn (b)2 − T (b)2 | ≥ 2γ , Tn−1 (b)2 − T −1 (b)2 ≤ de−δ n n by Theorem 5.8 and (6.35), whence | κ2 (Tn (b)) − κ2 (T (b)) | = Tn (b)2 Tn−1 (b)2 − T (b)2 T −1 (b)2 ≥ T −1 (b)2 Tn (b)2 − T (b)2 − Tn (b)2 Tn−1 (b)2 − T −1 (b)2 ≥ T −1 (b)2 cn−2γ n − Tn (b)2 de−δ
√
n
≥ μn−2γ
for some μ > 0.
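The contrast between the two convergence speeds is visible already for moderate n. The sketch below (Python/NumPy; the symbol b(t) = 2t^{-1} + 4 + t is our own illustrative choice) prints ||T_n(b)||_2, ||T_n^{-1}(b)||_2, and κ_2(T_n(b)). Since |b| is not constant on T, the norm column can approach ||b||_∞ = 7 at best polynomially fast, while the inverse-norm column typically settles much earlier.

    import numpy as np
    from scipy.linalg import toeplitz

    def T_n(coeffs, n):
        col = np.array([coeffs.get(j, 0.0) for j in range(n)])
        row = np.array([coeffs.get(-j, 0.0) for j in range(n)])
        return toeplitz(col, row)

    b = {-1: 2.0, 0: 4.0, 1: 1.0}      # b(t) = 2/t + 4 + t, no zeros on T, winding number 0
    for n in [25, 50, 100, 200, 400]:
        s = np.linalg.svd(T_n(b, n), compute_uv=False)
        norm, inv_norm = s[0], 1.0 / s[-1]
        print(n, norm, inv_norm, norm * inv_norm)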
6.7 Conclusions
In Chapters 5 and 6, we proved several estimates that make precise the following insights:

(a) Within the class of Toeplitz-like matrices, fast convergence of the norms, of the norms of the inverses, and of the condition numbers is generic.

(b) Within the class of pure Toeplitz matrices, norms converge generically slowly, norms of inverses converge generically fast, and condition numbers converge generically slowly.
Exercises 1. Prove that there exist b, c ∈ P such that T (b)T (c)2 < T (c)T (b)2 . 2. Let K = diag (0, −3/4, 0, 0, . . . ) and L = diag (2, −1/2, 0, 0, . . . ). Then for n ≥ 4, 1 1 An := I + Pn KPn + Wn LWn = diag 1, , 1, . . . , 1, , 3 . 4 3 n−4
denote the strong limits of An and Wn An Wn . Show that Let A and A lim κ2 (An ) = 12,
n→∞
κ2 (A) = 4,
= 6. max(κ2 (A), κ2 (A))
3. Show that there exist b ∈ P such that κ2 (Tn (b)) → ∞ as n → ∞ although Tn (b)2 ≤ 2 and det Tn (b) = 1 for all n ≥ 1. 2 4. Let {an }∞ n=0 be a sequence of complex numbers satisfying |an | = O(1/n ) and put ⎞ ⎛ 0 0 ... 0 a0 ⎜ a0 0 ... 0 a1 ⎟ ⎟ ⎜ ⎜ a1 a0 ... 0 a2 ⎟ An = ⎜ ⎟. ⎜ .. .. .. .. ⎟ . . ⎝ . . . . . ⎠
an
an−1
...
a0
an+1
Fix p ∈ [1, ∞]. Show that κp (An ) remains bounded as n → ∞ if and only if
∞ n n=0 an z = 0 for |z| ≤ 1. 5. Find four Laurent polynomials a, b, c, d such that ad − bc has no zeros on T and winding number zero but Tn (a) Tn (b) → ∞ as n → ∞ κp Tn (c) Tn (d) for every p ∈ [1, ∞]. 6. Let b ∈ P and suppose b ≥ 0 on T and G(b) = 1. Prove that κ2 (Tn (b))/Dn (b) converges to zero if b has at least three distinct zeros on T. What can be said about the ratio κ2 (Tn (b))/Dn (b) if b has no, exactly one, or exactly two distinct zeros on T?
7. Let p(z) = p0 + · · · + pn−1 zn−1 + zn (p0 = 0) and suppose we know that p(z) has m zeros in {z ∈ C : |z| < r < 1} and n − m zeros in {z ∈ C : |z| > R > 1}. We want a polynomial factorization p(z) = v(z)(z) such that all zeros of v(z) and (z) are of modulus less than r and greater than R, respectively. To estimate the conditioning (z) be another polynomial of the same form as p(z) and denote of this problem, let p (z) = by p v (z) (z) the corresponding factorization. (a) Prove that the coefficients of are the solution x of the equation T (a −1 )x = e1 , where a(t) = t −m p(t) and e1 = {1, 0, 0, . . . }. ∞ ≤ ε p∞ , then (b) Prove that if p − p − 2 ≤ ε κ2 (T (a −1 )) p∞ p −1 ∞ + O(ε 2 ) 2
as
ε → 0,
where · 2 and · ∞ are the norms in L2 (T) and L∞ (T). 8. Let An be an n × n matrix with distinct eigenvalues λ1 , . . . , λn . Choose nonzero vectors xj and yj such that An xj = λj xj and A∗n yj = λj yj (j = 1, . . . , n). The instability index i(An ) is defined as i(An ) = max
1≤j ≤n
xj 2 yj 2 . |(xj , yj )|
(a) Prove the following implications: Vn−1 An Vn is diagonal for some Vn with κ2 (Vn ) ≤ k ⇒
(An − λI )−1 2 ≤ k/dist (λ, sp An ) for all λ ∈ / sp An
⇒
i(An ) ≤ k.
(b) Show that if
⎛ ⎜ ⎜ An = ⎜ ⎝
f0 f1 a .. .
fn−1 a −1 f0 .. .
... ...
f1 a −(n−1) f2 a −(n−2) .. .
fn−1 a n−1
fn−2 a n−2
...
f0
⎞ ⎟ ⎟ ⎟ ⎠
with a > 1, then i(An ) =
a (a n − a −n ). (a 2 − 1)n
Notes

Corollary 6.5 was established in [34] for p = 2 and in [146] for p = 1. Example 6.6 is from [146]. All other results of this chapter are taken from the papers [49], [59], [62]. We remark that Theorem 6.15 is stated in [59] but that the proof given there is incorrect. We thank Alexander Rogozhin for bringing this to our attention. The proof presented here is new. The result of Exercise 7 was established in [28]. Exercise 8 is from [95]. Note that what is called the "instability index" in [95] and here is also known as the maximum of the eigenvalue condition numbers (see [298] and [137]).
Chapter 7
Substitutes for the Spectrum
As will be seen in Chapter 11, the spectrum sp Tn (b) need not mimic sp T (b) as n goes to infinity. In contrast to this, pseudospectra and numerical ranges behave as nicely as we could ever expect. These sets are the concern of the present chapter.
7.1
Pseudospectra
For ε > 0, the ε-pseudospectrum of an operator A ∈ B(X) on a Banach space X is defined by

    sp_ε A = {λ ∈ C : ||(A − λI)^{-1}||_{B(X)} ≥ 1/ε}.   (7.1)

Here we put ||(A − λI)^{-1}||_{B(X)} = ∞ if A − λI is not invertible. Thus, the usual spectrum sp A is always a subset of sp_ε A. Clearly, sp_ε A depends on X. If A acts on ℓ^p or ℓ^p_n, we denote the ε-pseudospectrum of A by sp_ε^{(p)} A.
Definition (7.1) admits several modifications and generalizations. An important generalization, which is motivated by plenty of applications, is the so-called structured pseudospectrum. In this context we are given three operators A, B, C ∈ B(X) and we define

    sp_ε^{B,C} A = sp A ∪ {λ ∉ sp A : ||C(A − λI)^{-1}B||_{B(X)} ≥ 1/ε}.   (7.2)

Evidently, sp_ε A is nothing but sp_ε^{I,I} A. Furthermore, some authors prefer (7.1) and (7.2) with "≥" replaced by ">". Theorem 7.2 and its Corollary 7.3 will give alternative descriptions of the sets (7.1) and (7.2) in the Hilbert space case.

Lemma 7.1. If M and N are linear operators, then I + MN is invertible if and only if I + NM is invertible, in which case (I + MN)^{-1} = I − M(I + NM)^{-1}N.

Proof. Simply check that (I + MN)(I − M(I + NM)^{-1}N) = (I − M(I + NM)^{-1}N)(I + MN) = I.
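For an n × n matrix A and the spectral norm, (7.1) is equivalent to σ_min(A − λI) ≤ ε, which leads to the usual grid-based computation. Here is a minimal Python/NumPy sketch of that idea (the function name and the example matrix are ours, chosen only for illustration):

    import numpy as np

    def pseudospectrum_indicator(A, eps, grid_re, grid_im):
        """True at grid points lying in sp_eps(A), using the matrix
        characterization sigma_min(A - lambda I) <= eps."""
        n = A.shape[0]
        result = np.zeros((len(grid_im), len(grid_re)), dtype=bool)
        for i, y in enumerate(grid_im):
            for j, x in enumerate(grid_re):
                smin = np.linalg.svd(A - (x + 1j * y) * np.eye(n),
                                     compute_uv=False)[-1]
                result[i, j] = smin <= eps
        return result

    # Example: the 32 x 32 lower shift, i.e., T_32(b) with b(t) = t.
    A = np.diag(np.ones(31), -1)
    mask = pseudospectrum_indicator(A, 0.1,
                                    np.linspace(-1.5, 1.5, 61),
                                    np.linspace(-1.5, 1.5, 61))
    print(mask.sum(), "grid points lie in the 0.1-pseudospectrum")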
Theorem 7.2. If H is a Hilbert space, A, B, C ∈ B(H), and ε > 0, then

    ∪_{||K|| ≤ ε} sp(A + BKC) = sp A ∪ {λ ∉ sp A : ||C(A − λI)^{-1}B|| ≥ 1/ε},   (7.3)

    ∪_{||K|| < ε} sp(A + BKC) = sp A ∪ {λ ∉ sp A : ||C(A − λI)^{-1}B|| > 1/ε}.   (7.4)

Corollary 7.3. If H is a Hilbert space, A ∈ B(H), and ε > 0, then

    sp_ε A = ∪_{||K|| ≤ ε} sp(A + K),

the union taken over all K ∈ B(H) of norm at most ε.

Proof. This is (7.3) with B = C = I.

The following result sharpens (7.4).

Theorem 7.4. Let X be a Banach space, let A, B, C be operators in B(X), and let ε > 0. Then

    sp A ∪ {λ ∉ sp A : ||C(A − λI)^{-1}B|| > 1/ε} = ∪_{||K|| < ε} sp(A + BKC)   (7.5)
                                                  = ∪_{||K|| < ε, rank K = 1} sp(A + BKC).   (7.6)

Proof. Assume that A is invertible and ||CA^{-1}B|| > 1/ε. Then we can find a u ∈ X such that ||u|| = 1 and ||CA^{-1}Bu|| > 1/ε. Thus, ||CA^{-1}Bu|| = 1/δ with δ < ε. By the Hahn-Banach theorem, there is a functional ϕ ∈ X* such that ||ϕ|| = 1 and ϕ(CA^{-1}Bu) = ||CA^{-1}Bu|| = 1/δ. Let K ∈ B(X) be the rank-one operator defined by Kx = −δϕ(x)u. Clearly, ||K|| ≤ δ < ε. Furthermore,

    BKCA^{-1}Bu = B(−δϕ(CA^{-1}Bu)u) = −δϕ(CA^{-1}Bu)Bu = −δ(1/δ)Bu = −Bu.   (7.8)

Put y = A^{-1}Bu. If y = 0, then CA^{-1}Bu = Cy = 0, which contradicts the assumption ||CA^{-1}Bu|| = 1/δ > 0. Consequently, y ≠ 0. From (7.8) we see that BKCy = −Bu = −Ay, whence (A + BKC)y = 0. This implies that A + BKC is not invertible.
We have not been able to prove Theorem 7.4 with strict inequalities replaced by nonstrict inequalities. However, one can show that if X is a Banach space, A, B, C ∈ B(X), ε > 0, and at least one of the operators B or C is compact, then

    sp A ∪ {λ ∉ sp A : ||C(A − λI)^{-1}B|| ≥ 1/ε} = ∪_{||K|| ≤ ε} sp(A + BKC) = ∪_{||K|| ≤ ε, rank K = 1} sp(A + BKC).

To see this, assume that A is invertible and ||CA^{-1}B|| = 1/δ. Since CA^{-1}B is compact, there exists a u ∈ X such that ||u|| = 1 and ||CA^{-1}Bu|| = 1/δ. The rest of the proof is as in the proof of Theorem 7.4.
7.2
Norm of the Resolvent
In this section we show that the norm of the resolvent of a bounded operator on ℓ^p (1 < p < ∞) cannot be locally constant. It should be noted that such a result is not true for arbitrary analytic operator-valued functions. To see this, consider the function

    A : C → B(ℓ^p_2),   λ ↦ [ λ  0 ; 0  1 ].

Obviously, ||A(λ)||_p = max(|λ|, 1) and thus ||A(λ)||_p = 1 for all λ in the unit disk.

Theorem 7.5 (Daniluk). Let H be a Hilbert space and A ∈ B(H). Suppose that A − λI is invertible for all λ in some open subset U of C. If there is an M < ∞ such that ||(A − λI)^{-1}|| ≤ M < ∞ for all λ ∈ U, then ||(A − λI)^{-1}|| < M for all λ ∈ U.

Proof. A little thought reveals that what we must show is the following: If U is an open subset of C containing the origin and ||(A − λI)^{-1}|| ≤ M for all λ ∈ U, then ||A^{-1}|| < M.
To prove this, assume the contrary, i.e., let A−1 = M. There is a sufficiently small r > 0 such that (A − λI )−1 =
∞
λj A−j −1 for |λ| = r.
j =0
Given f ∈ H , we therefore get (A − λI )−1 f 2 =
k
λj λ (A−j −1 f, A−k−1 f )
j,k≥0
whenever λ = reiϕ . Integrating the last equality we obtain 1 2π
2π
−1
(A − re I ) f dϕ = iϕ
2
0
∞
r 2j A−j −1 f 2 ,
j =0
and since (A − reiϕ I )−1 f ≤ Mf , it follows that A−1 f 2 + r 2 A−2 f 2 ≤ M 2 f 2 . Now pick an arbitrary ε > 0 and choose an fε ∈ H such that fε = 1 and A−1 fε 2 > M 2 − ε. Then M 2 − ε + r 2 A−2 fε 2 < M 2 , whence 1 = fε 2 ≤ A2 2 A−2 fε 2 < εr −2 A2 2 , which is impossible if ε > 0 is small enough. The following result extends Theorem 7.5 to operators on p for 1 < p < ∞. Theorem 7.6. Let 1 < p < ∞ and let A ∈ B(p ). If A − λI is invertible for all λ in some open set U ⊂ C and (A − λI )−1 p ≤ M < ∞ for λ ∈ U , then (A − λI )−1 p < M for λ ∈ U. Proof. We may without loss of generality assume that p ≥ 2; otherwise we can pass to adjoint operators. Again it suffices to show that A−1 p < M provided U contains the origin. Assume the contrary, that is, let A−1 p = M. There is an r > 0 such that (A − λI )−1 =
∞
λj A−j −1
j =0
for all λ = reiϕ . Hence, for every x ∈ p , p 2p/2 ∞ ∞ ∞ ∞ −1 p j −j −1 j ij ϕ −j −1 λ (A x)n = r e (A x)n (A − λI ) xp = n=0 j =0 n=0 j =0 p/2 ∞ ∞ ∞ j ij ϕ −j −1 k −ikϕ = r e (A x)n r e (A−k−1 x)n n=0 j =0 k=0 p/2 ∞ ∞ = B (r, ϕ, n) , (7.9) C(r, n) + n=0
=1
where C(r, n) =
∞
r 2j |(A−j −1 x)n |2 ,
j =0
B (r, ϕ, n) = 2
∞
r +2k Re eiϕ (A−−k−1 x)n (A−k−1 x)n .
k=0
For m = 0, 1, 2, . . . , put p/2 ∞ ∞ B2m (r, ϕ, n) . Im (r, ϕ) = C(r, n) + n=0
=1
Clearly, lim Im (r, ϕ) =
m→∞
∞
|C(r, n)|p/2 .
(7.10)
n=0
We now apply the inequality |a|p/2 ≤
1 |a + b|p/2 + |a − b|p/2 2
(7.11)
to a = C(r, n) +
∞
B2 (r, ϕ, n),
b=
=1
∞
B2−1 (r, ϕ, n)
=1
and sum up the results for n = 0, 1, 2, . . . . Taking into account that nothing but I0 (r, ϕ + π), we get 1 I0 (r, ϕ) + I0 (r, ϕ + π ) . I1 (r, ϕ) ≤ 2
∞ n=0
|a − b|p/2 is
(7.12)
Letting a = C(r, n) +
∞
B4 (r, ϕ, n),
=1
b=
∞
B4−2 (r, ϕ, n)
=1
in (7.11), we analogously obtain that I2 (r, ϕ) ≤
1 I1 (r, ϕ) + I1 (r, ϕ + π/2) . 2
Combining (7.12) and (7.13) we arrive at the inequality 1 I1 (r, ϕ) + I1 (r, ϕ + π/2) I2 (r, ϕ) ≤ 2 1 I0 (r, ϕ) + I0 (r, ϕ + π/2) + I0 (r, ϕ + π ) + I0 (r, ϕ + 3π/2) . ≤ 4
(7.13)
(7.14)
In the same way we see that I3 (r, ϕ) ≤
1 I2 (r, ϕ) + I2 (r, ϕ + π/4) , 2
which together with (7.14) gives 7 kπ 1 I3 (r, ϕ) ≤ I0 r, ϕ + . 8 k=0 4 Continuing this procedure we get 2 −1 kπ 1 Im (r, ϕ) ≤ m I0 r, ϕ + m−1 2 k=0 2 m
(7.15)
for every m ≥ 0. Now put ϕ = 0 in (7.15) and pass to the limit m → ∞. The limit of the left-hand side is given by (7.10). The right-hand side is an integral sum and hence 2π 2m −1 ∞ 1 1 2π kπ 2kπ 1 lim m r, = I0 r, m−1 = lim I I0 (r, ϕ)dϕ. 0 m→∞ 2 m→∞ 2π 2m 2 2m 2π 0 k=0 k=0 Thus, ∞
| C(r, n)|p/2 ≤
n=0
1 2π
2π
I0 (r, ϕ)dϕ. 0
Since (A − λI )−1 p ≤ M, we have I0 (r, ϕ) = (A − reiϕ I )−1 xp ≤ M p xp . Consequently, p
∞
p
|C(r, n)|p/2 ≤ M p xpp .
(7.16)
n=0
Because ∞
|C(r, n)|p/2 ≥
n=0
∞ 2 p/2 −1 2 (A x)n + r 2 (A−2 x)n n=0
∞ ∞ −1 p −2 p (A x)n + r p (A x)n = A−1 xp + r p A−2 xp ≥ p p n=0
n=0
(here we used the inequality (|a| + |b|)p/2 ≥ |a|p/2 + |b|p/2 ), we deduce from (7.16) that A−1 xpp + r p A−2 xpp ≤ M p xpp .
(7.17)
Finally, let ε > 0 and choose xε ∈ p so that xε p = 1 and A−1 xp > M p − ε. Then p (7.17) yields M p − ε + r p A−2 xε p < M p , and this implies that p
1 = xε pp ≤ A2 pp A−2 xε pp < εr −p A2 pp . This inequality is impossible if ε > 0 is sufficiently small. Thus, our assumption A−1 p = M must be false.
7.3 Limits of Pseudospectra
Limits of Pseudospectra
Let {Mn }∞ n=1 be a sequence of sets Mn ⊂ C. We define lim inf Mn n→∞
as the set of all λ ∈ C for which there are λ1 ∈ M1 , λ2 ∈ M2 , . . . such that λn → λ, and we let lim sup Mn n→∞
denote the set of all λ ∈ C for which there exist n1 < n2 < · · · and λnk ∈ Mnk such that λnk → λ. In other words, λ ∈ lim inf Mn if and only if λ is the limit of some sequence {λn }∞ n=1 with λn ∈ Mn , while λ ∈ lim sup Mn if and only if λ is a partial limit of such a sequence. We remark that if M and the members of the sequence {Mn } are nonempty compact subsets of C, then lim inf Mn = lim sup Mn = M n→∞
n→∞
if and only if Mn converges to M in the Hausdorff metric, which means that d(Mn , M) → 0 with d(A, B) := max max dist (a, B), max dist (b, A) . a∈A
b∈B
This result is due to Hausdorff. Proofs can be found in [149, Sections 3.1.1 and 3.1.2] and [153, Section 2.8]. Theorem 7.7. Let b be a Laurent polynomial. Then for every ε > 0 and every p ∈ (1, ∞), (p) (p) (p) lim inf sp(p) ε Tn (b) = lim sup spε Tn (b) = spε T (b) ∪ spε T (b). n→∞
(7.18)
n→∞
Proof. We first show that (p) sp(p) ε T (b) ⊂ lim inf spε Tn (b). n→∞
(7.19)
If λ ∈ sp T (b), then Tn−1 (b − λ)p → ∞ by virtue of Lemma 3.4. Thus, we have (p) Tn−1 (b − λ)p ≥ 1/ε for all n ≥ n0 , which implies that λ ∈ spε Tn (b) for all n ≥ n0 . (p) Consequently, λ belongs to lim inf spε Tn (b). (p) Now suppose that λ ∈ spε T (b) \ sp T (b). Then T −1 (b − λ)p ≥ 1/ε. Let U ⊂ C be any open neighborhood of λ. From Theorem 7.6 we deduce that there is a point μ ∈ U such that T −1 (b − μ)p > 1/ε. Hence, we can find a natural number k0 such that T −1 (b − μ)p ≥
1 for all k ≥ k0 . ε − 1/k
As U was arbitrary, it follows that there exists a sequence μ1 , μ2 , . . . such that μk ∈ (p) spε−1/k T (b) and μk → λ. For every invertible operator A ∈ B(p ), A−1 xp yp Ayp −1 −1 = sup = inf . (7.20) A p = sup y =0 yp xp x =0 y =0 Ayp
Since T −1 (b − μk )p ≥ 1/(ε − 1/k), it results that inf T (b − μk )yp ≤ ε − 1/k.
yp =1
Thus, there are yk ∈ p such that yk p = 1 and T (b − μk )yk p < ε − 1/(2k). Clearly, Tn (b − μk )Pn yk p → T (b − μk )yk p and Pn yk p → yk p = 1 as n → ∞. Hence, Tn (b − μk )Pn yk p < ε − 1/(3k) Pn yk p for all n > n0 (k). Again invoking (7.20) we see that Tn−1 (b − μk )p > (ε − 1/(3k))−1 > 1/ε (p)
and thus μk ∈ spε Tn (b) for all n > n0 (k). This implies that λ = lim μk belongs to (p) lim inf spε Tn (b). At this point the proof of (7.19) is complete. Repeating the above reasoning with b in place of b we get the inclusion (p) sp(p) ε T (b) ⊂ lim inf spε Tn (b).
(7.21)
n→∞
p (p) As Tn ( b − λ) = Wn Tn (b − λ)Wn and Wn is an isometry of n , it is clear that spε Tn ( b) = (p) b) on the right by Tn (b), which in spε Tn (b). Thus, in (7.21) we may replace the Tn ( (p) (p) (p) conjunction with (7.19) proves that spε T (b) ∪ spε T ( b) is contained in lim inf spε Tn (b). We are left to prove the inclusion (p) (p) lim sup sp(p) ε Tn (b) ⊂ spε T (b) ∪ spε T (b).
(7.22)
n→∞
(p) (p) So let λ ∈ / spε T (b) ∪ spε T ( b). Then T −1 (b − λ)p < 1/ε and T −1 ( b − λ)p < 1/ε, whence, by Theorem 6.3,
Tn−1 (b − λ)p < 1/ε − δ < 1/ε for all n ≥ n0
(7.23)
with some δ > 0. If |μ − λ| is sufficiently small, then Tn (b − μ) is invertible together with Tn (b − λ), and we have, from the first resolvent identity, Tn−1 (b − μ)p ≤
Tn−1 (b − λ)p
1 − |μ − λ| Tn−1 (b − λ)p
.
(7.24)
Let |μ − λ| < εδ(1/ε − δ)−1 . In this case (7.23) and (7.24) give Tn−1 (b − μ)p
0, then (2) (2) lim inf sp(2) ε Tn (b) = lim sup spε Tn (b) = spε T (b). n→∞
n→∞
i
Proof. Since T ( b) is simply the transpose of T (b), the two norms T −1 (b − λ)2 and −1 (2) T (b − λ)2 coincide. Therefore sp(2) ε T (b) = spε T (b). The assertion is now immediate from Theorem 7.7. Figures 7.1 and 7.2 show an example with the symbol b(t) = −(6 − 13i)t + (5 − 4i)t 2 + 3t 3 − (4 + 3i)t −2 + 3t −3 .
7.4
Pseudospectra of Infinite Toeplitz Matrices
For every operator A ∈ B(ℓ^2) the inequality

    1/dist(λ, sp A) ≤ ||(A − λI)^{-1}||_2   (7.25)

holds. This implies that ||(A − λI)^{-1}||_2 ≥ 1/ε whenever dist(λ, sp A) ≤ ε and hence yields the universal lower estimate sp A + εD ⊂ sp_ε^{(2)} A. For Toeplitz operators, Theorem 4.29 gives

    ||(T(a) − λI)^{-1}||_2 = ||T^{-1}(a − λ)||_2 ≤ 1/dist(λ, conv R(a)).   (7.26)

Consequently, if dist(λ, conv R(a)) > ε, then ||T^{-1}(a − λ)||_2 < 1/ε and λ cannot belong to sp_ε^{(2)} T(a). We therefore arrive at the upper estimate sp_ε^{(2)} T(a) ⊂ conv R(a) + εD. Given a ∈ W, let V(a) be the set of all λ ∈ C \ sp T(a) for which dist(λ, sp T(a)) = dist(λ, conv R(a)). From (7.25) and (7.26) we infer that if λ ∈ V(a), then ||T^{-1}(a − λ)||_2 = 1/dist(λ, conv R(a)), and hence in V(a) the level curves ||T^{-1}(a − λ)||_2 = 1/ε coincide with the curves dist(λ, conv R(a)) = ε. If V(a) = C \ sp T(a), or equivalently, if sp T(a) is a convex set, then

    sp_ε^{(2)} T(a) = conv R(a) + εD.   (7.27)
Equality (7.27) is particularly true for tridiagonal Toeplitz matrices, in which case sp T (a) = conv R(a) is an ellipse.
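For a tridiagonal symbol this can be checked directly. The following Python/NumPy sketch is our own illustration (the symbol a(t) = t + 0.25 t^{-1} and the test point λ are arbitrary choices): it compares 1/dist(λ, conv R(a)) with the finite-section approximation of ||T^{-1}(a − λ)||_2 for a point λ outside the elliptical spectrum.

    import numpy as np
    from scipy.linalg import toeplitz

    # a(t) = t + 0.25/t; a(T) is the ellipse 1.25*cos(theta) + 0.75i*sin(theta).
    theta = np.linspace(0, 2 * np.pi, 4000)
    boundary = 1.25 * np.cos(theta) + 0.75j * np.sin(theta)

    def finite_section(n, lam):
        col = np.r_[-lam, 1.0, np.zeros(n - 2)]    # first column of T_n(a - lam)
        row = np.r_[-lam, 0.25, np.zeros(n - 2)]   # first row of T_n(a - lam)
        return toeplitz(col, row)

    lam = 1.0 + 1.0j                                # a point outside the ellipse
    dist = np.min(np.abs(boundary - lam))           # distance to conv a(T)
    inv_norm = 1.0 / np.linalg.svd(finite_section(400, lam), compute_uv=False)[-1]
    print(1.0 / dist, inv_norm)                     # the two numbers should be close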
7.5
Numerical Range
Let X be a Banach space and put (X) = {(f, x) ∈ X ∗ × X : f = 1, x = 1, f (x) = 1}.
Figure 7.1. In the two top pictures we see b(T) (left) and b(T) together with the 100 eigenvalues of T100 (b) (right). The other four pictures indicate sp(2) ε Tn (b) for ε = 1/100 and n = 50, 100, 150, 200. Each picture shows the superposition of the spectra sp (Tn (b) + E) for 50 randomly chosen matrices E with E2 = ε.
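The random-perturbation pictures described in the caption are easy to reproduce. Here is a Python/NumPy sketch (our own code, not the program used for the figure) that superimposes the eigenvalues of T_n(b) + E for 50 random perturbations with ||E||_2 = ε:

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    coeffs = {1: -(6 - 13j), 2: 5 - 4j, 3: 3, -2: -(4 + 3j), -3: 3}

    def T_n(c, n):
        col = np.array([c.get(j, 0) for j in range(n)], dtype=complex)
        row = np.array([c.get(-j, 0) for j in range(n)], dtype=complex)
        return toeplitz(col, row)

    n, eps, trials = 100, 1e-2, 50
    Tn = T_n(coeffs, n)
    points = []
    for _ in range(trials):
        E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        E *= eps / np.linalg.norm(E, 2)        # scale so that ||E||_2 = eps
        points.append(np.linalg.eigvals(Tn + E))
    points = np.concatenate(points)
    print(points.shape)   # 5000 eigenvalues; plotting them gives a picture like Figure 7.1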
Figure 7.2. These pictures were done by Mark Embree. In contrast to the lower four pictures of Figure 7.1, the solid curves are the boundaries of the pseudospectra sp(2) ε Tn (b) for ε = 1/100 and n = 50, 100, 150, 200. These curves were determined with the help of Tom Wright’s package [299]. The (spatial) numerical range (= field of values) HX (A) of an operator A ∈ B(X) is defined as HX (A) = {f (Ax) : (f, x) ∈ (X)}. If X = H is a Hilbert space, we can identify the dual space X ∗ with H and, accordingly, (H ) = {(y, x) ∈ H × H : y = 1, x = 1, (y, x) = 1}. Since equality holds in the Cauchy-Schwarz inequality |(y, x)| ≤ y x if and only if y and x are linearly dependent, we see that actually (H ) = {(x, x) ∈ H × H : x = 1}. This implies that in the Hilbert space case the numerical range may also be defined by HH (A) = {(Ax, x) : x = 1}.
It is well known that HX (A) is always a bounded and connected set whose closure contains the spectrum of A: sp A ⊂ clos HX (A). The Toeplitz-Hausdorff-Stone theorem says that if X = H is a Hilbert space, then HH (A) is necessarily convex. For finite-dimensional Banach spaces X, the numerical range HX (A) is obviously closed. This shows that if X = H = 2n , then HH (A) contains the convex hull of the eigenvalues of A. p If X = p or X = n , we denote HX (A) by Hp (A). If A ∈ B(p ), then Pn APn may p be thought of as an operator on n . The purpose of this section is to show that if 1 < p < ∞, then always lim inf Hp (Pn APn ) = lim sup Hp (Pn APn ) = clos Hp (A). n→∞
n→∞
In particular, lim inf Hp (Tn (b)) = lim sup Hp (Tn (b)) = clos Hp (T (b)). n→∞
n→∞
The dual space of may be identified with q (1/p + 1/q = 1). Thus, for f in (p )∗ = q , Pn f is a well-defined element of (p )∗ = q . p
Lemma 7.9. Let 1 < p < ∞. If (f, x) ∈ (p ), then (Pn f/Pn f q , Pn x/Pn xp ) is in p (n ) for all sufficiently large n. Proof. Since Pn f q → f q = 1 and Pn xp → xp = 1, it follows that Pn f q = 0 and Pn xp = 0 for all n large enough. Put fn = Pn f/Pn f q and xn = Pn x/Pn xp . We are left with showing that fn (xn ) = 1. Let x = {r0 eiϕ0 , r1 eiϕ1 , . . . } with 0 ≤ rj < ∞ and 0 ≤ ϕj < 2π . By assumption, p
p
xp = (r0 + r1 + · · · )1/p = 1. Set g = {r0 e−iϕ0 , r1 e−iϕ1 , . . . }. Then p/q
p/q
p
p
gq = (r0 + r1 + · · · )1/q = 1 and p/q
p/q
p
p
g(x) = r0 r0 + r1 r1 + · · · = r0 + r1 + · · · = 1. Thus, (g, x) ∈ (p ). The space q is uniformly convex. This means that if h1 q = h2 q = 1 and h1 + h2 q = 2, then h1 = h2 . Since f q = gq = 1 and f + gq ≥ |f (x) + g(x)| = |1 + 1| = 2, we arrive at the conclusion that f = g. Consequently, Pn f q Pn xp = Pn gq Pn xp p p p 1/q p p p 1/p = r0 + r1 + · · · + rn−1 r0 + r1 + · · · + rn−1 p p p = r0 + r1 + · · · + rn−1 p/q
p/q
p/q
= r0 r0 + r1 r1 + · · · + rn−1 rn−1 = (Pn g)(Pn x) = (Pn f )(Pn x),
i
buch7 2005/10/5 page 169 i
169
which is equivalent to the desired equality fn (xn ) = 1. Theorem 7.10 (Roch). Let 1 < p < ∞ and A ∈ B(p ). Then lim inf Hp (Pn APn ) = lim sup Hp (Pn APn ) = clos Hp (A). n→∞
Proof. On regarding
n→∞
p n
p
as a subspace of m for n ≤ m, we have (pn ) ⊂ (pm ) ⊂ (p ),
whence Hp (Pn APn ) ⊂ Hp (Pm APm ) ⊂ Hp (A). This shows that lim sup Hp (Pn APn ) ⊂ clos Hp (A). n→∞
p
To prove the reverse inclusion, let (f, x) ∈ (p ) and define (fn , xn ) ∈ (n ) as in (the proof of) Lemma 7.9. Since fn − f q → 0 and xn − xp → 0, we obtain that f (Ax) = lim fn (Axn ) = lim fn (Pn APn xn ). n→∞
n→∞
p
From Lemma 7.9 we infer that (fn , xn ) ∈ (n ). Thus, Hp (A) ⊂ lim inf Hp (Pn APn ), n→∞
and since limiting sets are always closed, it results that clos Hp (A) ⊂ lim inf Hp (Pn APn ). n→∞
In the case p = 2 and A = T (a), the limit in Theorem 7.10 is known. Theorem 7.11. If a ∈ W , then clos H2 (T (a)) = conv sp T (a) = conv a(T). Proof. Let M(a) : L2 → L2 be the operator of multiplication by a. From Section 1.6 we know that −1 T (a) = P M(a)|H 2 , where P is the orthogonal projection of L2 onto H 2 . This implies that H2 (T (a)) = HH 2 (−1 T (a)) = HH 2 (P M(a)|H 2 ) = {(P M(a)f, f ) : f ∈ H 2 , f 2 = 1} = {(M(a)f, f ) : f ∈ H 2 , f 2 = 1} (since P ∗ f = Pf = f ) ⊂ {(M(a)g, g) : g ∈ L2 , g2 = 1} = HL2 (M(a)).
(7.28)
The closure of the numerical range of a normal operator is the convex hull of its spectrum (see, e.g., [150, Problem 171]). As M(a) is normal, we deduce that clos HL2 (M(a)) = conv sp M(a) = conv a(T).
(7.29)
Consequently, clos H2 (T (a)) ⊂ clos HL2 (M(a))
(by (7.28)) = conv a(T) (by (7.29)) ⊂ conv sp T (a) (by Corollary 1.12) ⊂ clos H2 (T (a))
(since always sp A ⊂ clos HX (A)),
which gives the assertion.
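A standard way to compute H_2(T_n(b)) numerically is the rotation trick: for each angle ϕ, the largest eigenvalue of the Hermitian part of e^{-iϕ}T_n(b) yields a boundary point of the numerical range. The following Python/NumPy sketch (our own helper, with an arbitrarily chosen tridiagonal symbol) illustrates Theorems 7.10 and 7.11; for b(t) = t + 0.25 t^{-1}, conv b(T) is the ellipse with half-axes 1.25 and 0.75, and the printed extents approach these values as n grows.

    import numpy as np
    from scipy.linalg import toeplitz

    def numerical_range_boundary(A, m=360):
        pts = []
        for phi in np.linspace(0, 2 * np.pi, m, endpoint=False):
            H = (np.exp(-1j * phi) * A + np.exp(1j * phi) * A.conj().T) / 2
            w, V = np.linalg.eigh(H)
            v = V[:, -1]                      # eigenvector of the largest eigenvalue
            pts.append(v.conj() @ A @ v)      # a boundary point of H_2(A)
        return np.array(pts)

    coeffs = {1: 1.0, -1: 0.25}               # b(t) = t + 0.25/t
    n = 60
    col = np.array([coeffs.get(j, 0.0) for j in range(n)])
    row = np.array([coeffs.get(-j, 0.0) for j in range(n)])
    bnd = numerical_range_boundary(toeplitz(col, row))
    print(np.max(bnd.real), np.max(np.abs(bnd.imag)))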
7.6
Collective Perturbations
Let G be the collection of all sequences {G_n}_{n=1}^∞ of complex n × n matrices G_n such that ||G_n||_2 → 0 as n → ∞.

Theorem 7.12 (Roch). If a ∈ W, then

    ∪_{{G_n} ∈ G} lim sup_{n→∞} sp(T_n(a) + G_n) = sp T(a).
Proof. Let λ ∈ / sp T (a). Then, by Theorem 3.7, {Tn (a − λ)} is a stable sequence: lim sup Tn−1 (a − λ)2 < ∞. n→∞
It follows that if Gn 2 → 0 and μ is in some sufficiently small open neighborhood U of λ, then lim sup (Tn (a − μ) + Gn )−1 2 < ∞. n→∞
This implies that U ∩ sp (Tn (a) + Gn ) = ∅ for all sufficiently large n, whence λ∈ / lim sup sp (Tn (a) + Gn ). n→∞
Now take λ ∈ sp T (a). By virtue of Theorem 3.7, {Tn (a − λ)} is not stable. If Tnk (a − λ) is not invertible for infinitely many nk , then λ ∈ lim sup sp Tn (a). n→∞
So assume Tn (a − λ) is invertible for all n ≥ n0 but lim sup Tn−1 (a − λ)2 = ∞. n→∞
There are n1 < n2 < n3 < · · · and xnk ∈ 2nk such that xnk 2 = 1 and Tn−1 (a − λ)xnk 2 ≥ k. k Put ynk = Tn−1 (a − λ)xnk and let Gnk be the matrix of the linear operator k Gnk : 2nk → 2nk , z → −
(z, ynk )xnk . ynk 22
Obviously, xnk 2 1 |(z, ynk )| xnk 2 ≤ ≤ . 2 ynk 2 k ynk 2 z2 =1
Gnk 2 = sup
Let Gn = 0 for n ∈ N \ {n1 , n2 , . . . }. Then {Gn } ∈ G. Since Tnk (a) + Gnk − λI ynk = Tnk (a − λ)ynk + Gnk ynk = xnk − xnk = 0, it follows that Tnk + Gnk − λI is not invertible. Hence λ ∈ lim sup sp (Tn (a) + Gn ). n→∞
Exercises 1. Let K be a compact operator on 2 . Prove that lim inf sp (I + Pn KPn ) = lim sup sp (I + Pn KPn ) = sp (I + K). n→∞
n→∞
2. Let a and b satisfy 0 < a ≤ b ≤ ∞. Show that there exists a selfadjoint operator A ∈ B(2 ) such that A−1 2 = a
and
lim sup (Pn APn )−1 Pn 2 = b. n→∞
3. Let A be a selfadjoint operator on 2 and suppose that spess A is a connected set. Put An = Pn APn |Im Pn . Prove that lim inf sp An = lim sup sp An = sp A, n→∞
n→∞
lim (An − λI )−1 2 = (A − λI )−1 2
n→∞
(λ ∈ C \ sp A).
4. For m ≥ 0, let A : 2 → 2 be the operator A : {x0 , x1 , x2 , . . . } → {xm , xm+1 , . . . }. Prove that sp(2) ε A = (1 + ε)D.
5. Let m ≥ 2 and b(t) = t + t −m /m. Find the set V (b) and show that {λ ∈ V (b) : T −1 (b − λ)2 = 1/ε} is the union of n + 1 pure circular arcs of curvature 1/ε. 6. Let An be an n × n matrix and put ⎛ ⎞ n |aij |⎠ − |aii |. Ri (An ) = ⎝ j =1
Show that sp(p) ε (An ) ⊂
n . {λ ∈ C : |λ − aii | ≤ Ri (An ) + εn}. i=1
7. Let ε > 0. Show that there exist Laurent polynomials b and c such that sp T (b) + εD = sp(2) ε T (b),
sp(2) ε T (c) = conv R(c) + εD.
8. Show that there exist finite matrices A and B such that (λI − A)−1 2 = (λI − B)−1 2 for all λ ∈ C but p(A)2 = p(B)2 for some polynomial p. 9. Show that the numerical range is robust in the following sense: If E ∈ B(X) and E ≤ ε, then HX (A + E) ⊂ HX (A) + εD. 10. Let A be an n × n matrix. The numbers αH (A) = max Re λ, λ∈H2 (A)
α(A) = max Re λ λ∈sp A
are called the numerical and spectral abscissas of A, respectively. Prove that lim
t→0+0
d log etA 2 = αH (A), dt
lim
t→+∞
d log etA 2 = α(A). dt
11. Let a, b ∈ P \{0} and suppose that a0 = b0 = 0. Prove that the equality Tn (a)Tn (b) = 0 is impossible for n ≤ 3 but possible for n ≥ 4.
Notes Embree and Trefethen’s Web page [110] and book [275] are inexhaustible sources on all aspects of pseudospectra. The first chapter of [275] is on eigenvalues and it ends as follows: “In the highly nonnormal case, vivid though the image may be, the location of the eigenvalues may be as fragile an indicator of underlying character as the hair color of a Hollywood actor.
We shall see that pseudospectra provide equally compelling images that may capture the spirit underneath more robustly.” “In summary, eigenvalues and eigenfunctions have a distinguished history of application throughout the mathematical sciences; we could not get around without them. Their clearest successes, however, are associated with problems that involve well-behaved systems of eigenvectors, which in most contexts means matrices or operators that are normal or nearly so. This class encompasses the majority of applications, but not all of them. For nonnormal problems, the record is less clear, and even the conceptual significance of eigenvalues is open to question.” As for the history of pseudospectra, we take the liberty of citing Trefethen and Embree [275] again. “These data suggest that the notion of pseudospectra has been invented at least five times: J. M. Varah H. J. Landau S. K. Godunov et al. L. N. Trefethen D. Hinrichsen and A. J. Pritchard
1967 1979 1975 1975 1990 1992
r-approximate eigenvalue ε-spectrum ε-approximate eigenvalues spectral portrait ε-pseudospectrum spectral value set
One should not trust this table too far, however, as even recent history is notoriously hard to pin down. It is entirely possible that Godunov or Wilkinson thought about pseudospectra in the 1960s, and indeed von Neumann may have thought about them in the 1930s. Nor were others such as Dunford and Schwartz, Gohberg, Halmos, Kato, Keldysch, or Kreiss far away.” The infinite Toeplitz matrix T (b) is normal if and only if the essential range of b is a line segment in the complex plane [77]. Consequently, infinite Toeplitz matrices are typically nonnormal and hence pseudospectra are expected to tell us more about them than spectra. The pioneering work on pseudospectra of Toeplitz matrices is the paper [219] by Reichel and Trefethen. This paper was the source of inspiration for one of the authors’ paper [34] and thus for investigations that have essentially resulted in large parts of the present book. For B = C = I , that is, in the unstructured case, Theorems 7.2 and 7.4 are in principle already in [269], [270]. In the structured case, these theorems are due to Hinrichsen, Kelb, Pritchard, and Gallestey [124], [125], [162], [163]. Section 7.1 is based on ideas of [125] and follows our paper [50]. The question whether the resolvent norm (A − λI )−1 may be locally constant arose in connection with [34]. One of the authors (A. B.) posed this question as an open problem at a Banach semester in Warsaw in 1994, and a few weeks later, Andrzej Daniluk of Cracow was able to solve the problem. The proof of Theorem 7.5 is due to him. Theorem 7.6 was established in [62]. Corollary 7.8, that is, Theorem 7.7 for p = 2, is due to Landau [185], [186], [187] and Reichel and Trefethen [219]. The first clean proof of this result was given in [34]. For general p ∈ (1, ∞), Theorem 7.7 was proved in [62]. The (Hilbert space) numerical range HH (A) was introduced by Otto Toeplitz [268]. For more on numerical ranges, in particular for proofs of the properties quoted in the text, we refer to the books [30], [148], [150], [167], [275]. Theorem 7.10 was established by
Roch [220]. Theorem 7.12 is also Roch’s; it appeared first in [149, Theorem 3.19]. Theorem 7.11 and the proof given here are due to Halmos [150]. This theorem gives us the closure of H2 (T (b)). The set H2 (T (b)) itself was determined by Klein [180]. There are two theorems in [180]. Theorem 1 says that if the Toeplitz operator has a nonconstant symbol and is normal so that the spectrum is a closed interval [γ , δ] ⊂ C, then the numerical range is the corresponding open interval (γ , δ). Theorem 2 states that if the Toeplitz operator is not normal, then its numerical range is the interior of the convex hull of its spectrum. We remark that Halmos and Klein’s results are actually true for arbitrary b ∈ L∞ . We will say more on the numerical range of finite Toeplitz matrices in the notes to Chapter 8. Exercise 6 is from [111]. For Exercise 7 see [71]. Exercise 8 is a result by Greenbaum and Trefethen [144] (and can also be found in [275]). A solution to Exercise 10, which shows that αH (A) and α(A) give the initial and final slope of the curve t → log etA 2 , is in [275], for example. Exercise 11 is from [147]. Further results: convergence speed. Let b be a Laurent polynomial. If λ is in C \ b(T) and wind (b, λ) = 0, then Tn−1 (b − λ)2 goes to infinity at least exponentially due to Theorem 4.1. This implies that the inequality Tn−1 (b − λ)2 ≥ 1/ε is already satisfied for (2) n’s of moderate size, and consequently, the convergence of sp(2) ε Tn (b) to spε T (b), which is ensured by Corollary 7.8, is very fast. In [34], it is shown that Corollary 7.8 remains true for dense Toeplitz matrices provided the symbol b is piecewise continuous. It was observed in [45] that in the case of piecewise continuous symbols the convergence of sp(2) ε Tn (b) to −1 T (b) may be spectacularly slow, which has its reason in the fact that T sp(2) ε n (b − λ)2 may grow only polynomially. The main result of our paper [54] says that such a slow convergence of pseudospectra is generic even within the class of continuous symbols. In [54], we proved the following. Let b ∈ C 2 and let λ ∈ C be a point whose winding number with respect to b(T) is −1 (respectively, 1). Then Tn−1 (b − λ)2 increases faster than every polynomial, lim Tn−1 (b − λ)2 n−β = ∞
n→∞
for each
if and only if P b (respectively, Qb) is in C ∞ . Here (P b)(t) :=
−1
∞ j j j =−∞ bj t for b(t) = j =−∞ bj t .
β > 0,
∞
j =0
bj t j and (Qb)(t) :=
Further results: operator polynomials. Roch [221] considered the polynomials Ln (λ) = Tn (b0 ) + Tn (b1 )λ + · · · + Tn (bk )λk , L∞ (λ) = T (b0 ) + T (b1 )λ + · · · + T (bk )λk , thought of as acting on 2n and 2 , respectively, and proved that if T (bk ) is invertible, then −1 lim inf {λ ∈ C : L−1 n (λ)2 ≥ 1/ε} = lim sup{λ ∈ C : Ln (λ)2 ≥ 1/ε} n→∞
= {λ ∈ C :
n→∞
L−1 ∞ (λ)2
≥ 1/ε}
for each ε > 0.
Further results: higher order relative spectra. Let A ∈ B(2 ) and let L be a closed subspace of 2 . We denote by PL the orthogonal projection of 2 onto L. For a natural number k, the kth order spectrum spk (A, L) of A relative to L is defined as the set of all λ ∈ C for which the compression PL (A − λI )k PL |L is not invertible on L. This definition is due to Brian Davies [94], who suggested that second order spectra might be useful for the approximate computation of spectra of self-adjoint operators. Shargorodsky’s paper [252] is devoted to the geometry of spk (A, L) for fixed L and to the limiting behavior of spk (A, Ln ) as PLn converges strongly to the identity operator. One main result is a purely geometric description of the minimal set Qk (K) with the property that spk (A, L) ⊂ Qk (K) whenever A is a normal operator with sp A ⊂ K. Let, for example, K be a compact subset of R. Put a = min K and b = max K. The set (a, b) \ K is an at most countable union of open intervals, that is, of the form ∪j (aj , bj ). Let B(c1 , c2 ) denote the closed disk with diameter [c1 , c2 ]. Then . Q2 (K) = B(a, b) \ Int B(aj , bj ), j
where Int stands for the interior points. Another remarkable result of [252] states that if A is normal, then . lim sup spk (A, Ln ) = sp A ∪ Qk (spess A), {Ln }
n→∞
where spess A is the essential spectrum of A and the union is over all sequences {Ln } for which PLn converges strongly to the identity operator. As a consequence, Shargorodsky obtains that if k is even, then . lim sup spk (A, Ln ) ∩ R = sp A {Ln }
n→∞
for every selfadjoint operator A. (In the last two equalities, lim sup may be replaced by lim inf.) This shows that, in contrast to usual spectra, even order relative spectra do not deliver spurious points in the gaps of spess A when employing a projection method for the approximate computation of sp A. For an arbitrary bounded operator A, the estimate . {Ln }
lim sup spk (A, Ln ) ⊂ sp A ∪ n→∞
Aess D sin(π/(2k))
is proved in [252]. Further results: normal finite Toeplitz matrices. The characterization of finite normal Toeplitz matrices is discussed in [114], [126], [147], [169], [170], [171], [172]. The approach of Gu and Patton [147] is especially elegant and is applicable to the more general problem of determining all n × n Toeplitz matrices A, B, C, D such that AB − CD is again Toeplitz or zero. For example, [147] contains a simple proof of the following result: The matrix Tn (a) = (aj −k )nj,k=1 is normal if and only if there is a λ ∈ T such that either aj = λa−(n−j ) for 1 ≤ j ≤ n − 1 or aj = λ a−j for 1 ≤ j ≤ n − 1. Note that if aj = λ a−j
i
n
λ a−j t j + a−j t −j
j =1
= a0 + μ
n
n μ a−j t j + μa−j t −j = a0 + 2μ Re μ a−j t j .
j =1
j =1
In the other case, aj = λ a−(n−j ) with λ ∈ T for 1 ≤ j ≤ n − 1, the range a(T) need not to be a line segment (consider, for instance, n = 3 and a(t) = t −1 + it 2 ).
Chapter 8
Transient Behavior
Let A_n be a complex n × n matrix. The behavior of the norms ||A_n^k|| is of considerable interest in connection with several problems. We specify || · || to be the spectral norm || · ||_2. The norms ||A_n^k||_2 converge to zero as k → ∞ if and only if rad A_n < 1, where rad A_n denotes the spectral radius of A_n. However, sole knowledge of the spectral radius or even of all the eigenvalues of A_n does not tell us whether the norms ||A_n^k||_2 run through a critical transient phase, that is, whether there are k for which ||A_n^k||_2 becomes very large, before eventually decaying exponentially to zero. In this chapter we embark on this problem in the case where A_n is a Toeplitz band matrix.
8.1 The General Message

So what can be said about the norms ||T_n^k(b)||_2 := ||(T_n(b))^k||_2? The computer is expected to give a reliable answer in the case where n is small. We therefore assume that n is large. The message of this chapter is that ||T_n^k(b)||_2 has critical behavior as k increases (which means that the norms ||T_n^k(b)||_2 become large before decaying to zero or that the norms ||T_n^k(b)||_2 go to infinity) if and only if ||b||_∞ > 1. In other words, to find out whether ||T_n^k(b)||_2 shows critical transient or limiting behavior, we need only look at whether the L^∞ norm of the symbol is greater than one. Notice that it is much more difficult to decide whether rad T_n(b) or lim sup_{n→∞} rad T_n(b) is smaller than one (see Sections 10.3 and 10.4). Since T_n(b) = P_n T(b) P_n |_{Im P_n}, it follows at once that

    ||T_n^k(b)||_2 ≤ ||T_n(b)||_2^k ≤ ||T(b)||_2^k = ||b||_∞^k   (8.1)

for all n and k. Thus, for the norms of powers of Toeplitz matrices we have the simple universal upper estimate (8.1). In particular, if ||b||_∞ ≤ 1, then there is no critical behavior. The following theorem shows that (8.1) contracts to an equality in the n → ∞ limit. This theorem may be viewed as an argument in support of the statement that T_n^k(b) shows critical behavior whenever ||b||_∞ > 1.
Theorem 8.1. If b ∈ W, then

    lim_{n→∞} ||T_n^k(b)||_2 = ||b||_∞^k
for each natural number k. Proof. This follows from Corollary 5.14. Here is a direct proof. For fixed k, the operators Tnk (b) converge strongly to T k (b) as n → ∞. Hence, by the Banach-Steinhaus theorem, lim inf Tnk (b)2 ≥ T k (b)2 . n→∞
(8.2)
By (8.1) and (8.2), we are left with proving that T k (b)2 ≥ bk∞ . But T k (b)2 ≥ rad T k (b) = (rad T (b))k , and Corollary 1.12 says that the spectrum of T (b) contains the range of b, whence rad T (b) ≥ b∞ . The problem studied here amounts to looking for peaks of the “surface” (k, n) → Tnk (b)2 along the lines n = constant, while Theorem 8.1 concerns the behavior of Tnk (b)2 along the lines k = constant. We will return to this question later. For the moment, let us consider an example. Example 8.2. The simplest nontrivial Toeplitz matrix is the Jordan block ⎛ ⎞ λ 0 0 ... 0 ⎜ 1 λ 0 ... 0 ⎟ ⎜ ⎟ ⎜ ⎟ Jn (λ) = ⎜ 0 1 λ . . . 0 ⎟ . ⎜ .. .. .. . . . ⎟ ⎝ . . . . .. ⎠ 0
0
0
...
(8.3)
λ
Clearly, Jn (λ) = Tn (b) with b(t) = λ + t (t ∈ T). Figure 8.1 indicates the shape of the “surface” (k, n) → Jnk (0.8)2 . We will omit the quotation marks in the following and will simply speak of the norm surface. We will also silently identify the surface with its “map” in the k, n plane; that is, we will not distinguish the point (k, n, f (k, n)) on the surface from its projection (k, n) in the plane. The spectral radius of Jn (0.8) equals 0.8, while the L∞ norm of the symbol is b∞ = 1.8. We see that the surface has a lowland (say below the curve Jnk (0.8)2 = 10−6 , a steep region (say between the curves Jnk (0.8)2 = 10−6 and Jnk (0.8)2 = 102 ), and a part where it grows enormously (say above the curve Jnk (0.8)2 = 102 ). We call the last part the sky region. If n is fixed and k increases, we move horizontally in Figure 8.1. On the surface, we will soon be in the sky region, then step down the steep region, and finally be caught in the lowland forever. Thus, we have the critical transient phase shown in the left picture of Figure 8.2. On the other hand, if we fix k and let n increase, then this corresponds to a vertical movement in Figure 8.1. This time we will fairly quickly reach the sky region and move at nearly constant height for the rest of the journey, as shown the right picture of Figure 8.2. Note that in the right picture of Figure 8.2 the norms converge to the limit 1.820 = 12.75 · 104 and that the norms are already very close to this limit beginning with n between 20 and 40.
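The norm surface is easy to explore on a computer. The following Python/NumPy sketch (our own illustration, not part of the original text) computes ||J_n^k(0.8)||_2 for n = 20 and k = 1, . . . , 500, reproducing the transient hump of the left picture of Figure 8.2, and compares ||J_20^20(0.8)||_2 with the limit 1.8^20 of the right picture:

    import numpy as np

    n, lam = 20, 0.8
    J = lam * np.eye(n) + np.diag(np.ones(n - 1), -1)   # the Jordan block J_n(lam)

    norms = []
    P = np.eye(n)
    for k in range(1, 501):
        P = P @ J
        norms.append(np.linalg.norm(P, 2))
    norms = np.array(norms)
    print("max over k:", norms.max(), "attained at k =", norms.argmax() + 1)
    print("||J_20^20||_2 =", norms[19], "   limit 1.8^20 =", 1.8 ** 20)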
i
i i
i
i
i
i
8.2. Polynomial Numerical Hulls
buch7 2005/10/5 page 179 i
179
60
50
40
30
20
10
0 0
100
200
300
400
500
Figure 8.1. Level curves Jnk (0.8)2 = h for h = 10−6 , 102 , 1010 , 1018 , 1026 , 1034 (the lower curve corresponds to h = 10−6 , the upper to 1034 ). We took n = 3, 4, 5, . . . , 60 and k = 5, 10, 15, . . . , 500. 12
2.5
4
x 10
14
x 10
12
2
10 1.5
8
1
6 4
0.5 0 0
2 100
200
300
400
500
0 0
20
40
60
Figure 8.2. Movement on the surface (k, n) → Jnk (0.8)2 along the horizontal line n = 20 (left) and along the vertical line k = 20 (right).
8.2
Polynomial Numerical Hulls
Let An be an n × n matrix. The polynomial numerical hull Gk (An ) of degree k is defined as + }, Gk (An ) = {z ∈ C : |p(z)| ≤ p(An )2 for all p ∈ Pk+1
i
i i
i
i
i
i
180
buch7 2005/10/5 page 180 i
Chapter 8. Transient Behavior
+ where Pk+1 is the set of all polynomials of the form
p(z) = p0 + p1 z + · · · + pk zk . The objective of Gk (An ) is to employ the obvious inequality p(An )2 ≥ max |p(z)| z∈Gk (An )
in order to get a lower estimate for p(An )2 . The sets Gk (An ) were introduced by Olavi Nevanlinna [192], [193] and were independently discovered by Anne Greenbaum [141], [142]. Their works describe various properties of polynomial numerical hulls. In general, polynomial numerical hulls can be computed only numerically (see, e.g., [141]), and the development of algorithms and software for polynomial numerical hulls is still far away from the advanced level of the pseudospecta counterpart [299], [300]. Faber, Greenbaum, and Marshall [112] obtained remarkably precise results on the polynomial numerical hulls of Jordan blocks. Let Jn (λ) be the Jordan block (8.3). If k is greater than the degree of the minimal polynomial of An , then Gk (An ) collapses to the spectrum of An . Thus, Gk (Jn (λ)) = {λ} for k ≥ n. In [112], it is shown that if 1 ≤ k ≤ n−1, then Gk (Jn (λ)) is a closed disk with the center λ whose radius n,k satisfies log(2n) log(log(2n)) 1 π = 1,n ≥ k,n ≥ n−1,n = 1 − + +o . cos n+1 n n n Consequently, Jnk (λ)2 ≥
max |ζ |k = (|λ| + n,k )k .
|ζ −λ|≤n,k
It is also shown in [112] that n−1,n is greater than or equal to the positive root of 2r n +r −1 = 0, which implies that n−1,n > 1 −
log(2n) n
for all
n.
50 50 (0.8)2 ≥ 1012.11 . MATLAB gives that J100 (0.8)2 equals This yields, for instance, J100 12.76 10 . We remark that a matrix with a norm that is gigantic in comparison with the matrix dimension must have a gigantic entry. For n > k, the th entry of the first column of the lower-triangular Toeplitz matrix Jnk (λ) is ( k ) λk− . Taking = [k/(1 + |λ|)], where [·] denotes the integer part, we get k k Jn (λ)2 ≥ |λ|k−[k/(1+|λ|)] . [k/(1 + |λ|)]
This simple observation delivers
50 (0.8)2 J100
50 ≥ 0.823 = 1011.80 , 27
which is fairly good.
i
i i
i
i
i
i
8.3. The Pseudospectra Perspective
buch7 2005/10/5 page 181 i
181
8.3 The Pseudospectra Perspective The role of pseudospectra in connection with the norms of powers of matrices and operators is as follows: norms of powers can be related to the resolvent norm, and pseudospectra decode information about the resolvent norm in a visual manner. A relation between resolvent and power norms is established by the Kreiss matrix theorem. Let An be an n × n matrix with rad An ≤ 1. For λ outside the closed unit disk D, put Rn (λ) = (An − λI )−1 2 . The Kreiss matrix theorem says that sup (|λ| − 1)Rn (λ) ≤ max Akn 2 ≤ en sup (|λ| − 1)Rn (λ). k≥0
|λ|>1
|λ|>1
We refer the reader to [289] for a delightful discussion of the theorem. The “easy half” of the Kreiss matrix theorem is the lower estimate. It implies that if we pick a point λ ∈ C with := |λ| > 1, then the maximum of the norms Akn 2 (k ≥ 0) is at least ( − 1)Rn (λ). Thus, the maximum is seen to be large whenever we can find a λ outside D with large resolvent norm. The following theorem and its proof are due to Nick Trefethen [272]. This theorem is similar to but nevertheless slightly different from the lower estimate of the Kreiss matrix theorem. Theorem 8.3. Let A be a bounded linear operator and let λ ∈ C \ sp A. Put = |λ| and R(λ) = (A − λI )−1 2 . If R(λ) > 1/, then / k − 1 1 j k 1+ max A 2 ≥ . (8.4) 1≤j ≤k − 1 R(λ) − 1 Proof. Put Mk = max1≤j ≤k Aj 2 . By assumption, R(λ) − 1 is positive, so the term in the large brackets of (8.4) is at least 1 and the right-hand side of (8.4) is thus at most k . If does not exceed the spectral radius of A, then (8.4) is accordingly trivial. We may therefore assume that is larger than the spectral radius of A. In this case we have λ(λI − A)−1 = (I − λ−1 A)−1 = I + (λ−1 A) + (λ−1 A)2 + (λ−1 A)3 + · · · and the corresponding bound R(λ) ≤ 1 + −1 A2 + −2 A2 2 + −3 A3 2 + · · · .
(8.5)
Let us take the case k = 2 for illustration. Clearly, A2 , A2 2 ≤ M2 ,
A3 2 , A4 2 ≤ M22 ,
A5 2 , A6 2 ≤ M23 ,
and so on. Grouping the terms in (8.5) accordingly into pairs gives R(λ) ≤ 1 + (−2 M2 )(1 + ) + (−2 M2 )2 (1 + ) + · · · . If M2 ≥ 2 , then (8.4) is true, so we may assume that −2 M2 < 1. In this case R(λ) ≤ 1 +
+1 −2 M2 (1 + ) . =1+ 2 1 − −2 M2 /M2 − 1
i
i i
i
i
i
i
182
buch7 2005/10/5 page 182 i
Chapter 8. Transient Behavior
For general k, we obtain similarly that R(λ) is at most 1 + (−k Mk )(1 + · · · + k−1 ) + (−k Mk )2 (1 + · · · + k−1 ) + · · · =1+
1 + + · · · + k−1 , k /Mk − 1
that is, R(λ) − 1 ≤
1 k − 1 . k − 1 /Mk − 1
It follows that −1 1 ≥ k R(λ) − 1 −1
k −1 , Mk
which implies that 1 k − 1 k . −1≤ Mk − 1 R(λ) − 1 This is (8.4). Thus, if sp A is known to be a subset of the closed unit disk D and if there is an ε ≤ 1 such that the pseudospectrum spε A contains points outside D, then, by (7.1), each point λ ∈ spε A \ D yields an estimate / max Aj 2 ≥ k
1≤j ≤k
1+
k − 1 1 − 1 /ε − 1
(8.6)
with = |λ|. Different choices of λ give different values of the right-hand sides of (8.4) and (8.6), and we want these right-hand sides to be as large as possible. Tom Wright [299], [300] has implemented the search for an optimal λ in EigTool. Recall that P denotes the set of all Laurent polynomials. Fix b ∈ P and consider Tn (b). We put Mk (n) = max Tnj (b)2 , 1≤j ≤k
M(n) = lim Mk (n). k→∞
Clearly, M(n) = supk≥1 Mk (n); that is, M(n) is the height of the highest peak of Tnk (b)2 as k ranges over N. Thus, the powers of Tn (b) go to infinity if and only if M(n) = ∞ and they have a critical transient behavior before decaying to zero if and only if M(n) is large but finite. We assume that the plane Lebesgue measure of sp T (b) is nonzero. Equivalently, we assume that there exist points in the plane that are encircled by the (naturally oriented) curve b(T) with nonzero winding number. Suppose b∞ > 1. Then there exist points λ ∈ C \ b(T) such that = |λ| > 1 and the winding number of b about λ is nonzero. From Theorem 4.1 we know that Rn (λ) := (Tn (b) − λI )−1 2 = Tn−1 (b − λ)2
i
i i
i
i
i
i
8.3. The Pseudospectra Perspective
buch7 2005/10/5 page 183 i
183
increases at least exponentially, i.e., there exist positive constants C = C(b, λ) and β = β(b, λ) such that Rn (λ) ≥ Ceβn
(8.7)
for all n ≥ 1. Consequently, Rn (λ) > 1 for all sufficiently large n and we obtain from Theorem 8.3 that / k − 1 1 k Mk (n) ≥ 1+ , − 1 Rn (λ) − 1 and M(n) ≥ ( − 1)(Rn (λ) − 1).
(8.8)
This in conjunction with (8.7) gives M(n) ≥ ( − 1)(Ceβn − 1),
(8.9)
that is, there must be critical transient or limiting behavior of Tnk (b)2 for all n larger than a moderately sized n0 . We will now say something about the constants in (8.7) and (8.9). We denote by Pr (r ≥ 1) the set of all Laurent polynomials of degree at most r, that is, the set of all functions of the form b(t) =
r
bj t j .
(8.10)
j =−r
Let b ∈ Pr and suppose br = 0. Assume b∞ > 1 and choose λ as above. For the sake of definiteness, let wind (b, λ) = −κ ≤ −1. We have t r (b(t) − λ) = br
s
(t − μi )
i=1
2r
(t − δi )
i=s+1
with |μi | > 1 and |δi | < 1 (recall Section 1.4). It follows that 2r s δj −κ b(t) − λ = br t 1− (t − μi ) , t i=1 i=s+1 where κ = s − r (≥ 1). Put d(t) =
s ∞ (t − μi )−1 =: dj t j
Pn d22 =
(t ∈ T),
(8.11)
j =0
i=1 n−1
|dj |2 ,
j =0
Qn d22 =
∞
|dj |2 .
(8.12)
j =n
Theorem 8.4. For all n ≥ 1, Rn (λ) ≥
Pn d2 . b − λ∞ Qn d2
i
i i
i
i
i
i
184
buch7 2005/10/5 page 184 i
Chapter 8. Transient Behavior
Proof. Put c = b − λ. With the notation χk (t) = t k , we have c = χ−κ c− c+ , where c− (t) =
2r i=s+1
δi 1− t
,
c+ (t) = br
s
(t − μi ).
i=1
Define x (n) ∈ Cn and x ∈ 2 by x (n) = (d0 , . . . , dn−1 ) and x = {d0 , d1 , . . . }. Since T (c) = T (c− )T (χ−κ )T (c+ ), we obtain Tn (c)x (n) = Pn T (c− )T (χ−κ )T (c+ )x (n) .
(8.13)
Because κ ≥ 1, we have Pn T (c− )T (χ−κ )e0 = 0, where e0 = {1, 0, 0, . . . }. As T (c+ )x = e0 , it follows that Pn T (c− )T (χ−κ )T (c+ )x = 0. This equality and (8.13) give Tn (c)x (n) = Pn T (c− )T (χ−κ )T (c+ )(x (n) − x) = Pn T (c)(x (n) − x). Taking into account that x (n) − x2 = Qn d2 , we arrive at the estimate Tn (c)x (n) 2 ≤ T (c)2 Qb d2 = c∞ Qn d2 , and since x (n) 2 = Pn d2 , it results that Tn−1 (c)2 ≥
x (n) 2 Pn d2 ≥ . Tn (c)x (n) 2 c∞ Qn d2
For the sake of simplicity, assume that the zeros μ1 , . . . , μs are distinct and that |μ1 | < |μi | for all i ≥ 2. Decomposition into partial fractions gives d(t) =
s i=1
∞
Ai Ai = tj , j 1 − t/μi j =0 i=1 μi s
(8.14)
with explicitly available constants A1 , . . . , As . Theorem 8.4 in conjunction with the estimates s s 1 Pn d2 ≥ |d0 | = Ai = |μi | i=1 i=1 and s 2 ∞ ∞ s s A |Ai |2 |Ai |2 s i Qn d22 = ≤ s = j 2j 2n 1 − 1/|μ |2 |μ | |μ | i i i j =n i=1 μi i=1 j =n i=1 s s Bi Bi μ1 2n B1 1 + = =: |μi |2n |μ1 |2n B1 μi i=1 i=2 yields Rn (λ)2 ≥
1 |μ1 |2n !s
s . 2 2 B1 b − λ∞ i=1 |μi | 1 + i=2 (Bi /B1 )|μ1 /μi |2n
(8.15)
i
i i
i
i
i
i
8.3. The Pseudospectra Perspective
buch7 2005/10/5 page 185 i
185
In practice, we could try computing Rn (λ) directly via the MATLAB commands Rn (λ) = norm(inv(Tn (b − λ))) or Rn (λ) = 1/min(svd(Tn (b − λ))). However, as (8.7) shows, this is an ill-conditioned problem for large n. Estimate (8.15) is more reliable. It first of all shows that (8.7) and (8.9) are true with β = log |μ1 |. To get the constants contained in (8.15) we may proceed as follows. We first determine the zeros μ1 , . . . , μs , we then use MATLAB’s residue command to find the numbers Ai in formula (8.14), and finally we put Bi = s|Ai |2 /(1 − 1/|μi |2 ). Note that s is in general much smaller than n so that possible numerical instabilities are no longer caused by large matrix dimensions but at most by unfortunate location of the zeros μ1 , . . . , μs . Notice also that (8.11) implies that the numbers d0 , . . . , ds−1 are the entries of the first column of the lower-triangular Toeplitz matrix Ts (d). Thus, alternatively we could solve the s × s system s Ts (t − μi ) ( d0 . . . ds−1 ) = ( 1 0 . . . 0 ) i=1
to obtain d0 , . . . , ds−1 and then, taking into account (8.14), find the constants A1 , . . . , As as the solutions of the s × s Vandermonde system ⎞⎛ ⎞ ⎛ ⎞ ⎛ 1 1 ... 1 A1 d0 ⎜ ⎟ ⎜ ⎟ ⎜ 1/μ1 1/μ2 . . . 1/μs ⎟ ⎟ ⎜ A2 ⎟ ⎜ d1 ⎟ ⎜ ⎟ ⎜ .. ⎟ = ⎜ .. ⎟ . ⎜ .. .. .. ⎠⎝ . ⎠ ⎝ . ⎠ ⎝ . . . s−1 s−1 s−1 As ds−1 1/μ1 1/μ2 . . . 1/μs Here is an example that can be done by hand. Example 8.5. Let b(t) = t −1 + α 2 t with 0 < α < 1/2. The range b(T) is the ellipse x2 y2 + =1 (1 + α 2 )2 (1 − α 2 )2 and the eigenvalues of the matrix Tn (b) are densely spread over the interval (−2α, 2α) between the foci of the ellipse. The spectral radius is rad (Tn (b)) = 2α cos
π . n+1
This is smaller than 1 but may be close to 1. The norm of the symbol is b∞ = 1 + α 2 > 1. Fix λ = ∈ (1, 1 + α 2 ). The zeros of t (b(t) − ) are # # − 2 − 4α 2 + 2 − 4α 2 , μ = . μ1 = 2 2α 2 2α 2 The numbers μ1 and μ2 are greater than 1, and we have A1 =
1 , μ1 − μ 2
A2 =
1 , μ2 − μ 1
B1 =
2A21 , 1 − 1/μ21
B2 =
2A22 . 1 − 1/μ22
i
i i
i
i
i
i
186
buch7 2005/10/5 page 186 i
Chapter 8. Transient Behavior
Thus, (8.15) becomes α 4 μ2n 1 Rn (λ) ≥ B1 ( + 1 + α 2 )2 2
>
B2 1+ B1
μ1 μ2
2n .
Here are a few concrete samples. In each case we picked λ = = 1.01. Note that (8.8) with = 1.01 gives M(n) ≥ 0.1(1.01 Rn (1.01) − 1), which is almost the same as the Kreiss estimate M(n) ≥ 0.1Rn (1.01). α = 0.2. We have μ1 = 1.0323, μ2 = 24.2177, Rn (1.01) ≥ 100.0138 n−1.0863 =: E0.2 (n). In particular, E0.2 (1000) = 1012.70 . MATLAB result: R1000 (1.01) = 1015.03 . α = 0.4. Now μ1 = 1.2296, μ2 = 5.0829, Rn (1.01) ≥ 100.0897 n−0.9316 =: E0.4 (n). We get E0.4 (100) = 108.04 and E0.4 (1000) = 1088.77 , while the MATLAB results are R100 (1.01) = 109.48 and R1000 (1.01) = 1090.28 . α = 0.49. This time μ1 = 1.5945, μ2 = 2.6121, Rn (1.01) ≥ 100.2026 n−1.2233 =: E0.49 (n). We obtain E0.49 (50) = 108.91 ,
E0.49 (100) = 1019.04 ,
E0.49 (1000) = 10201.38 ,
E0.49 (2000) = 10403.98 ,
and MATLAB delivers R50 (1.01) = 1010.22 ,
R100 (1.01) = 1020.35 ,
R1000 (1.01) = 10202.71 ;
the last two values are delivered with a warning. MATLAB returns a warning without a value for n = 2000.
8.4 A Triangular Example The next question
after Theorem 8.1 concerns the convergence speed. Let b ∈ P be of the form b(t) = j ≥0 bj t j , that is, suppose T (b) is lower triangular. Assume also that |b| is not constant on T. We have Tnk (b) = Tn (bk ), and hence we may employ Theorem 5.8 with b replaced by bk . It results that there are constants ck , dk ∈ (0, ∞) and γ ∈ {1, 2, . . . } such that bk∞ −
dk ck ≤ Tnk (b)2 ≤ bk∞ − 2γ n2γ n
(8.16)
i
i i
i
i
i
i
8.5. Gauss-Seidel for Large Toeplitz Matrices
buch7 2005/10/5 page 187 i
187
for all n ≥ 1. However, the constants ck and dk may be large if k is large. We remark that the constants C of inequalities like b∞ − |b(t)| ≤ C|t − t0 |2γ
(8.17)
enter the d2 of (5.14). For the powers of b, we obtain from (8.17) something like bk∞ − |bk (t)| Ckbk∞ |t − t0 |2γ , and since Ckbk∞ is large whenever b∞ > 1 and k is large, we can expect that dk is also large. Let us consider a concrete example. Take 1 1 1 1 3 1 4 2 2 11 4 b(t) = 10 − − t + t − t − t = 10t − |1 − t| . 16 4 4 16 8 16 In this case 2γ = 4. The spectral radius of Tn (b) is about 0.63 and b∞ equals 110/8 = 13.75. The level curves of the norm surface are shown in Figure 8.3 and we clearly see the expected critical transient behavior. An interesting feature of Figure 8.3 is the indents of the level curves. These indents correspond to vertical valleys in the surface. For instance, along the horizontal line n = 18 we have Figure 8.4. The interesting piece of Figure 8.4 is between k = 15 and k = 40. If we move vertically along one of the lines k = 15, k = 20, . . . , k = 40, we obtain Figure 8.5. Figure 8.6 reveals some mild turbulence in the steep region, but Figure 8.5 convincingly shows that in the sky region everything goes smoothly. In particular, the speed of the convergence of Tnk (b)2 to T k (b)2 = bk∞ is not affected by small fluctuations of the exponent k.
8.5
Gauss-Seidel for Large Toeplitz Matrices
To solve the n × n system Cn x = y, one decomposes Cn into a sum Cn = Ln + Un of a lower-triangular matrix Ln and an upper-triangular matrix Un with zeros on the main diagonal. If Ln is invertible, then the system Cn x = y is equivalent to the system −1 x = −L−1 n Un x + Ln y.
(8.18)
The Gauss-Seidel iteration consists of choosing an initial x0 and computing the iterations by −1 xk+1 = −L−1 n Un xk + Ln y.
(8.19)
Sometimes (8.18) and (8.19) are written in the form x = x + L−1 n (y − Cn x),
xk+1 = xk + L−1 n (y − Cn xk ).
k −1 We have xk − x = (−L−1 n Un ) (x0 − x). Thus, the iteration matrix is An := −Ln Un = C , and the iteration converges whenever rad A < 1. The problem is whether a I − L−1 n n n critical transient behavior of the norms Akn may garble the solution.
i
i i
i
i
i
i
188
buch7 2005/10/5 page 188 i
Chapter 8. Transient Behavior
40 35 30 25 20 15 10 5 0 0
50
100
150
200
250
300
350
400
1 1 4 Figure 8.3. The symbol is b(t) = 10 (− 16 − 41 t + t 2 − 41 t 3 − 16 t ). The plot k −2 2 4 shows the level curves Tn (b)2 = h for h = 10 , 1, 10 , 10 , . . . , 1024 (the lower curve corresponds to h = 10−2 , the upper to 1024 ). We took 6 ≤ n ≤ 40 and k = 10, 15, 20, . . . , 400.
12
3
x 10
2.5
2
1.5
1
0.5
0 0
20
40
60
80
100
k Figure 8.4. The symbol is as in Figure 8.3. The picture shows the norms T18 (b)2 for 1 ≤ k ≤ 100.
i
i i
i
i
i
i
8.5. Gauss-Seidel for Large Toeplitz Matrices
buch7 2005/10/5 page 189 i
189
45 40 35 30 25 20 15 10 5 0 0
10
20
30
40
50
60
70
Figure 8.5. The symbol is again as in Figure 8.3. We see the values of log10 Tnk (b)2 for 6 ≤ n ≤ 70 and k = 15, 20, 25, . . . , 40. Eventually higher curves correspond to higher values of k.
12
10
8
6
4
2
6
8
10
12
14
16
18
Figure 8.6. A close-up of Figure 8.5.
i
i i
i
i
i
i
190
buch7 2005/10/5 page 190 i
Chapter 8. Transient Behavior
Now suppose Cn = Tn (c) is a Toeplitz matrix and let us for the sake of simplicity assume that ∞
c(t) =
cj t j ,
where
j =−∞
∞
|cj | < ∞.
j =−∞
We write Cn = Ln + Un = Tn (c+ ) + Tn (c− ) with c+ (t) =
∞
cj t j ,
c− (t) =
j =0
−1
cj t j .
j =−∞
−1 If c+ (z) = 0 for |z| ≤ 1, then the inverse of Tn (c+ ) is Tn (c+ ) and the iteration matrix −1 becomes An = −Tn (c+ )Tn (c− ). The spectrum of An is the set of all complex numbers λ for which −1 −1 −Tn (c+ )Tn (c− ) − λI = −Tn (c+ )Tn (c− + λc+ ) −1 is not invertible. Since Tn (c+ ) is invertible by assumption, we have to look for the λ’s for which Tn (c− + λc+ ) is not invertible. By the Banach-Steinhaus theorem, −1 −1 lim inf Akn 2 = lim inf (Tn (c+ )Tn (c− ))k 2 ≥ (T (c+ )T (c− ))k 2 . n→∞
n→∞
−1 −k k Propositions 1.2 and 1.3 imply that (T (c+ )T (c− ))k is T (c+ c− ) plus a compact operator. −k k The function c+ c− is not a Laurent polynomial, but it lies in the Wiener algebra, and the argument of the proof of Lemma 5.12 remains valid for symbols in the Wiener algebra. This −1 −k k −1 shows that (T (c+ )T (c− ))k 2 ≥ c+ c− ∞ = c+ c− k∞ . Consequently, −1 lim inf Akn 2 ≥ c+ c− k∞ . n→∞
(8.20)
Thus, if n is large enough, a critical transient phase will certainly occur in the case where −1 c− ∞ > 1. c+ If a(t) = a−1 t −1 + a0 + a1 t, then the eigenvalues of Tn (a) are known to be πj √ a0 + 2 a1 a−1 cos n+1
(j = 1, . . . , n)
(Theorem 2.4). Hence, in case c is a trinomial, c(t) = c−1 t −1 + c0 + c1 t, the matrix Tn (c− + λc+ ) has the eigenvalues √ πj √ λc0 + 2 c1 c−1 λ cos n+1 It follows that
sp An =
(j = 1, . . . , n).
: 4c1 c−1 2 πj : j = 1, . . . , n cos n+1 c02
i
i i
i
i
i
i
8.5. Gauss-Seidel for Large Toeplitz Matrices
buch7 2005/10/5 page 191 i
191
and rad An =
π 4|c1 | |c−1 | cos2 . 2 |c0 | n+1
Anne Greenbaum [141] discussed the example c(t) = −1.16t −1 + 1 + 0.16t. The spectral radius of An is about 0.73. Using (numerically computed) polynomial numerical 29 ≈ 700 and trying the computer directly, she hulls, she obtained that A29 30 2 ≥ 1.256 29 arrived at the much better result A30 2 ≈ 104 . Our estimate (8.20) with c+ (t) = 1 + 0.16t and c− (t) = −1.16t −1 gives lim inf A29 n 2 n→∞
−1.16t −1 29 ≥ = max t∈T 1 + 0.16t 29 1.16 = = 1.38129 = 104.07 . 1 − 0.16 −1 c+ c− 29 ∞
(8.21)
Thus, although (8.21) is an “n → ∞” result, it is already strikingly good for n about 30. Figure 8.7 shows the norm surface. 50 45 40 35 30 25 20 15 10 5 0 0
50
100
150
−1 Figure 8.7. The norm surface for An = Tn (c+ )Tn (c− ) with c+ (t) = 1 + 0.16t and c− (t) = −1.16t −1 . The picture shows the level curves Akn 2 = h for h = 10−2 , 1, 102 , 104 , 106 (the lower curve corresponds to h = 10−2 , the upper to 106 ). We took 3 ≤ n ≤ 50 and k = 5, 10, 15, . . . , 150.
Nick Trefethen [270] considered the symbol c(t) = t −1 − 2 + t. In this case c+ (t) = −2 + t and c− (t) = t −1 , so −1 t −1 = 1, c+ c− ∞ = max t∈T −2 + t
i
i i
i
i
i
i
192
buch7 2005/10/5 page 192 i
Chapter 8. Transient Behavior
and hence (8.20) does not provide any useful piece of information. However, the brute estimate 1 k k k −1 k k −1 k k ·1 =1 An 2 ≤ Tn (c+ )2 Tn (c− )2 ≤ c+ ∞ c− ∞ = max t∈T −2 + t π shows that there is no critical behavior. The spectral radius of An is cos2 n+1 , which is approximately 0.9990325 for n = 100. MATLAB gives rad A100 = 0.9990 (fantastic!) and tells us that the norms Ak100 2 are 0.9080, 0.6169, and 0.3804 for k = 100, 500, 1000, respectively.
8.6
Genuinely Finite Results
We begin with an observation similar
to Proposition 5.11. We denote by Pr the set of all Laurent polynomials of the form rj =−r bj t j . Lemma 8.6. Let b ∈ Pr and n > 2r(k − 1). Then Tnk (b) = Tn (bk ) + Pn Xk Pn + Wn Yk Wn with r(k − 1) × r(k − 1) matrices Xk and Yk independent of n. Proof. For k = 1, the assertion is true with X1 = Y1 = 0. Assume the asserted equality is valid for k. We prove it for k + 1. Obviously, Tnk+1 (b) = Tn (bk )Tn (b) + Pn Xk Pn T (b)Pn + Wn Yk Wn Pn T (b)Pn . By Proposition 3.10, Tn (bk )Tn (b) = Tn (bk+1 ) − Pn H (bk )H ( b)Pn − Wn H ( bk )H (b)Wn . We have ⎛
Z11 ⎜ .. ⎜ . ⎜ H (bk ) = ⎜ Zk1 ⎜ ⎝ 0 ...
...
Z1k .. .
. . . Zkk ... 0 ... ...
0 .. .
...
⎞
⎟ ⎟ ⎟ , 0 ... ⎟ ⎟ ⎠ 0 ... ... ...
⎛
U H ( b) = ⎝ 0 ...
⎞ 0 ... 0 ... ⎠ ... ...
with certain r × r blocks Zij and U , which implies that ⎛
Z11 U ⎜ .. ⎜ . ⎜ H (bk )H ( b) = ⎜ Zk1 U ⎜ ⎝ 0 ...
0 .. .
...
⎞
⎟ ⎟ ⎟ . 0 ... ⎟ ⎟ ⎠ 0 ... ... ...
i
i i
i
i
i
i
8.6. Genuinely Finite Results
buch7 2005/10/5 page 193 i
193
Thus, Pn H (bk )H ( b)Pn = Pn Xk+1 Pn with a kr × r matrix Xk+1 independent of n. Analo k b )H (b)Wn = Wn Yk+1 Wn with some kr × r matrix Yk+1 . Further, gously, Wn H ( ⎛ ⎞ . . . X1,k−1 0 ... X11 ⎜ ⎟ .. .. .. ⎜ ⎟ . . . ⎜ ⎟ Xk = ⎜ Xk−1,1 . . . Xk−1,k−1 0 . . . ⎟ ⎜ ⎟ ⎝ 0 ... 0 0 ... ⎠ ... ... ... ... ...
and
⎛
A0 ⎜ A1 T (b) = ⎜ ⎝
A−1 A0 A1
⎞ A−1 A0 ...
A−1 ...
⎟ ⎟ ⎠ ...
with r × r blocks Xij and A independent of n. This shows that at most the first (k − 1)r rows of Xk Pn T (b) are nonzero. For 1 ≤ j ≤ k − 1 and ≥ k + 1, the j, block of Xk Pn T (b) is k−1
Xj m Am− = Xj,−1 A−1 + Xj, A0 + Xj,+1 A1 ,
m=1
and as Xj,−1 = Xj, = Xj,+1 = 0 for ≥ k +1, it follows that at most the first kr columns Pn with some (k − 1)r × kr of Xk Pn T (b) are nonzero. Thus, Pn Xk Pn T (b)Pn = Pn Xk+1 matrix Xk+1 independent of n. Finally, Wn Yk Wn Pn T (b)Pn = Wn Yk Pn T ( b)Wn , Wn with some (k − 1)r × kr and the same argument as above shows that this equals Wn Yk+1 matrix Yk+1 that does not depend on n. Inequality (8.16) is an asymptotic result. In contrast to this, Corollary 5.19 is genuinely finite. Combining Lemma 8.6 and Corollary 5.19 with M2 = M20 = bk∞ yields the inequality 41r(k − 1) (8.22) ≤ Tnk (b)2 ≤ bk∞ bk∞ 1 − n
for all n and k. Unfortunately, the lower bound of (8.22) is positive for n > 41r(k − 1) only, which is not yet of much use for n’s below 1000. Better estimates can be derived from Theorem 5.10 on the Fejér means. Here is such an estimate. It is already applicable to n > 2r(k − 1). Theorem 8.7. If b ∈ Pr and n > 2r(k − 1), then Tnk (b)2 ≥ σn−2r(k−1) (bk )∞ . Proof. By Lemma 8.6,
⎛
Xk Tnk (b) = Tn (bk ) + ⎝ 0 0
0 0 0
⎞ ⎛ 0 ∗ ∗ ⎠=⎝ ∗ A 0 W k Yk W k ∗ ∗
⎞ ∗ ∗ ⎠, ∗
i
i i
i
i
i
i
194
buch7 2005/10/5 page 194 i
Chapter 8. Transient Behavior
where Xk and Wk Yk Wk are (k − 1)r × (k − 1)r matrices and A = Tn−2r(k−1) (bk ). This implies that Tnk (b)2 ≥ A2 = Tn−2r(k−1) (bk )2 . It remains to combine the last inequality and Theorem 5.10.
Recall that · W is the Wiener norm of a Laurent polynomial: bW = j |bj |. Theorem 8.8. If b ∈ Pr and n > r(3k − 2), then Tnk (b)2 ≥ bk∞ −
kr bkW . n − 2r(k − 1)
Proof. For a ∈ Pkr and m = n − 2r(k − 1) > kr, |j | 1 1− aj t j − |j |aj t j , aj t j = (σm a)(t) = m m |j |≤kr |j |≤kr |j |≤kr whence
kr j |(σm a)(t)| ≥ aj t − |aj | m |j |≤kr |j |≤kr
and thus σm a∞ ≥ a∞ − (kr/m)aW . This inequality in conjunction with Theorem 8.7 gives the assertion. Corollary 8.9. If b ∈ Pr and bj ≥ 0 for all j , then ⎞k ⎛ n − 3kr ⎝ bj ⎠ for n ≥ 3kr. Tnk (b)2 ≥ n − 2kr j
Proof. In this case b∞ = bW = j bj and, hence, by Theorem 8.8, k kr n − 3kr k Tnk (b)2 ≥ 1 − bj ≥ bj . n − 2r(k − 1) n − 2kr Example 8.10. Let b(t) = t + α 2 t −1 with α ∈ R. From Corollary 8.9 we infer that 1 (1 + α 2 )k ≤ T4kk (b)2 ≤ (1 + α 2 )k 2 for all k ≥ 1. Example 8.11. Suppose T (b) is lower triangular. Then Tnk (b) = Tn (bk ) for all n and k, and hence we can have immediate recourse to Theorem 5.10 to obtain that σn (bk )∞ ≤ Tnk (b)2 ≤ bk∞ for all n and k.
(8.23)
i
i i
i
i
i
i
8.7. The Sky Region Contains an Angle
buch7 2005/10/5 page 195 i
195
Let first b(t) = λ + t. Thus, Tn (b) is the Jordan block Jn (λ). For k ≤ n, (σn (b ))(t) = k
k j =0
j 1− n
k t k k−j j k λ t = (λ + t) 1 − j n t +λ
and hence (8.23) yields 1 k (1 + |λ|)k ≤ Jnk (λ)2 ≤ (1 + |λ|)k 1− n 1 + |λ|
(k ≤ n).
In particular, 50 (0.8) ≤ 1012.77 , 1012.62 ≤ J100
100 1025.17 ≤ J100 (0.8) ≤ 1025.53 ,
which is better than the results of Section 8.1. Now let a(t) = t + t 2 . Then Tn (a) is a “super Jordan block” [270]. For every natural number k, σ2k (a )∞ = k
2k−1 j =k
j 1− 2k
k k−j
=
3 k 2 , 8
and thus (8.23) implies that (3/8) 2k ≤ T2kk (a) ≤ 2k .
8.7 The Sky Region Contains an Angle Let b ∈ Pr and suppose b∞ > 1. In Figures 8.1 and 8.3, the sky region looks approximately like an angle: It is bounded by a nearly vertical line on the left and by a curve close to the graph of a linear function n = ck + d from the right and below. The question is whether this remains true beyond the cutouts we see in the pictures and whether this is valid in general. To be more precise, we fix a (large) number B and we call the set SB = {(k, n) : Tnk (b)2 > B} the sky region (for our choice of the bound B). The following theorem proves that the lower-right boundary of the sky region is always linear or sublinear. Theorem 8.12. If b ∈ Pr and b∞ > 1, then there exist positive constants c and k0 , depending on b and B, such that SB contains the angle {(k, n) : k > k0 , n > ck}. Proof. This is a simple consequence of inequality (8.22), which shows that 41rk Tnk (b)2 ≥ bk∞ 1 − n
(8.24)
for all k and n: if k > k0 where bk∞0 > 2B and n > 82rk, then the right-hand side of (8.24) is greater than B.
i
i i
i
i
i
i
196
buch7 2005/10/5 page 196 i
Chapter 8. Transient Behavior
Thus, if we walk on the norm surface (k, n) → Tnk (b)2 along a curve whose projection in the k, n plane is given by n = ϕ(k), then we will eventually reach any height B and stay above this height forever provided ϕ(k)/k → ∞. Note that this is satisfied for k ϕ(k) = k log log k. Or in yet other terms, if ϕ(k)/k → ∞, then Tϕ(k) (b)2 → ∞. Theorem 8.12 is the deciding argument in support of the statement that Tnk (b)2 runs through a critical transient phase if limn→∞ rad Tn (b) < 1 but b∞ > 1. Suppose, for example, k = 100. If the sky region were roughly of the form SB = {(k, n) : k > k0 , n > k 2 }, then Tn100 (b)2 would be larger than B for n > 10000 only. It is the linearity or sublinearity of the lower-right border of the sky region that allows us to conclude that Tn100 (b)2 is already larger than B for n in the hundreds. We call the number LB (n) = #{k : Tnk (b)2 > B} the length of the critical transient phase for the matrix dimension n. Theorem 8.12 implies that LB (n) > n/c − k0 . In other words, LB (n) increases at least linearly with the matrix dimension n. Equivalently and a little more elegantly, lim inf n→∞
LB (n) > 0. n
May the lower-right boundary of the sky region be strictly
sublinear? Or alternatively, may the lowland contain an angle? Let b ∈ Pr and b(t) = rj =−r bj t j . For ∈ (0, ∞), we define b ∈ Pr by b (t) =
r
bj j t j .
j =−r
Theorem 8.13. Let b ∈ Pr , b∞ > 1, and B > 1. If there exists a number in (0, ∞) such that b ∞ < 1, then SB is contained in an angle {(k, n) : n > ck + d} with c > 0 and d > 0. Proof. The key observation is due to Schmidt and Spitzer [243] and will be extensively exploited in Chapter 11. It consists of the equality Tn (b) = D−1 Tn (b ) D , where D = diag (1, , . . . , n−1 ). Letting M = max(, 1/), we get Tnk (b)2 ≤ D−1 Tnk (b )D 2 ≤ D−1 2 Tnk (b)2 D 2 ≤ M n−1 b k∞ for all n and k. Thus, if (k, n) ∈ SB , then B < M n−1 b k∞ . Since = 1 and thus M > 1, it follows that n>k
log(1/b ∞ ) log B + + 1 =: ck + d, log M log M
and as b ∞ < 1 and B > 1, we see that c > 0 and d > 0. We will see in Chapter 10 that always lim sup rad Tn (b) ≤ n→∞
inf b ∞
∈(0,∞)
i
i i
i
i
i
i
8.7. The Sky Region Contains an Angle
buch7 2005/10/5 page 197 i
197
and that in certain special cases the equality lim rad Tn (b) =
n→∞
inf b ∞
∈(0,∞)
(8.25)
holds. Equality (8.25) is particularly true if T (b) is Hermitian or tridiagonal or triangular or nonnegative, where nonnegativity means that bj ≥ 0 for all j . Hermitian matrices are uninteresting in our context, because for them the value given by (8.25) coincides with a∞ . However, if T (b) is tridiagonal or triangular or nonnegative and if lim rad Tn (b) < 1, then (8.25) implies that we can find a ∈ (0, ∞) such that b ∞ < 1 and hence, by Theorem 8.13, the sky region is contained in an angle. Equivalently, the lowland contains an angle. It also follows that in these cases lim sup n→∞
LB (n) < ∞. n
Suppose now that b ∈ Pr , lim rad Tn (b) < 1, and b∞ > 1. If the lower border of the sky region is strictly sublinear, then, by Theorem 8.13, inf b ∞ must be at least 1. We looked for such symbols in P3 and observed that they are difficult to identify. The determination of inf b ∞ for a given b ∈ P3 is simple. The problem comes with checking whether lim sup rad Tn (b) is smaller than 1 for a given candidate b ∈ P3 . It turns out that inf b ∞ and lim sup rad Tn (b) are usually extremely close. We took 3000 random symbols b ∈ P3 whose real and imaginary parts of the 7 coefficients b−3 , . . . , b3 were drawn from the uniform distribution on (−1, 1). Each time MATLAB computed q=
inf b ∞ /rad T64 (b).
∈(0,∞)
The result was as follows: 1.00 ≤ q < 1.02 in 2793 samples (= 93.1 %) 1.02 ≤ q < 1.04 in 117 samples (= 3.9 %) 1.04 ≤ q < 1.06 in 48 samples (= 1.6 %) 1.06 ≤ q < 1.08 in 23 samples (= 0.77 %) 1.08 ≤ q < 1.10 in 10 samples (= 0.33 %) 1.10 ≤ q < 1.20 in 9 samples (= 0.3 %) and there was no sample with q ≥ 1.20. Thus, the dice show that if rad T64 (b) ≤ 0.98, then inf b ∞ is at most 1.02 · 0.98 = 0.9996 < 1 with probability about 93 % and if rad T64 (b) ≤ 0.83, then inf b ∞ does not exceed 1.20 · 0.83 = 0.996 < 1 nearly surely. This result reveals that if we had a symbol b ∈ P3 for which lim sup rad Tn (b) < 1 and inf b ∞ ≥ 1, then lim sup rad Tn (b) would be dramatically close to 1. Ensuring that lim sup rad Tn (b) is really strictly smaller than 1 and guaranteeing at the same time that inf b ∞ ≥ 1 requires subtle tiny adjustments in the higher decimals after the comma of the coefficients. As we were not sure whether these subtleties survive the numerics needed plot the norm surface, that is, to compute Tnk (b)2 for n in the 30’s and k in the hundreds, we gave up. Thus, we do not know a single symbol with strictly sublinear lower-right border of the sky region or, equivalently, with a lowland that does not contain an angle.
i
i i
i
i
i
i
198
buch7 2005/10/5 page 198 i
Chapter 8. Transient Behavior
Example 8.14. Figure 8.8 shows the norm surface for the symbol b(t) = t −1 + 0.492 t. π The spectral radius of Tn (b) is exactly 0.98 cos n+1 and b∞ = 1 + 0.492 = 1.2401. We k clearly see the critical transient behavior of Tn (b)2 for n exceeding 20 or 30. We also nicely see the indents in the level curves. The matrix T (b) is nonnegative, and hence, by Theorem 8.13, the sky region must be contained in an angle. The strange thing with Figure 8.8 is that the lower-right pieces of the level curves nevertheless look slightly sublinear. Let n = ϕB (k) be the equation of the lower-right piece of the level curve Tnk (b)2 = B. If ϕB were sublinear, then ϕB (k)/k would approach zero as k → ∞. The left picture of Figure 8.9, showing 10 ϕB (k)/k and 10 ϕB (k) log k/k for 100 ≤ k ≤ 1000 does not yet convincingly indicate that ϕB (k)/k tends to a positive finite limit. However, the right picture of Figure 8.9, where we extended the range of the k’s up to 3000, reveals that there must be a positive finite limit for the lower curve 10 ϕB (k)/k. Theorem 8.12 tells us that the lower-right boundary of the sky region is always sublinear and Theorem 8.13 shows that it is superlinear in many cases. Things are completely different for the n × n truncations An of arbitrary bounded linear operators on 2 . Let ψ : N → (e, ∞) be any function such that ψ(n) → ∞ as n → ∞. We put ξ(n) =
ψ(n) , log ψ(n)
λn = e−1/ξ(n) .
Clearly, 1/e < λn < 1. We define the operator A by A = diag (J2 (λ1 ), J2 (λ2 ), . . . ) λ2 λ1 0 , := diag 1 λ1 1
0 λ2
,...
.
Since λn < 1 for all n, the operator A is bounded on 2 . We have diag (J2k (λ1 ), . . . , J2k (λm )) for n = 2m, Akn = diag (J2k (λ1 ), . . . , J2k (λm ), λkm+1 ) for n = 2m + 1. The equality
J2k (λ)
=
λk kλk−1
0 λk
and the condition 1/e < λ < 1 imply that kλk ≤ kλk−1 ≤ J2k (λ)2 ≤ λk + kλk−1 ≤ (1 + ek)λk . Consequently, up to constants independent of n and k, we may replace Akn 2 by kλkm with m = [n/2]. The function fm (x) := xλxm has its maximum at ξ(m) and fm (ξ(m)) = (1/e)ξ(m). If k ≥ ψ(m), then fm (k) ≤ fm (ψ(m)) because fm is monotonically decreasing on the right of ψ(m) (note that ψ(m) ≥ ξ(m)). As fm (ψ(m)) = 1, it follows that SB ⊂ {(k, n) : k < ψ([n/2])}
(8.26)
once B is large enough, say B > 10. Thus, on choosing very slowly increasing functions ψ, we obtain very narrow sky regions. In particular, the sky regions need not contain any angles.
i
i i
i
i
i
i
8.7. The Sky Region Contains an Angle
buch7 2005/10/5 page 199 i
199
40 35 30 25 20 15 10 5 0 0
50
100
150
200
250
300
350
400
Figure 8.8. The symbol is b(t) = t −1 + 0.492 t. The picture shows the level curves = h for h = 1, 10, 102 , 103 , . . . , 106 (the lower curve corresponds to h = 1, the upper to 106 ). We took 3 ≤ n ≤ 40 and k = 5, 10, 15, . . . , 400.
Tnk (b)2
3
3
2.5
2.5
2
2
1.5
1.5
1
1
0.5
0.5
0 0
200
400
600
800
1000
0 0
1000
2000
3000
Figure 8.9. The symbol is b(t) = t −1 + 0.492 t. The pictures show 10 ϕB (k)/k and 10 ϕB (k) log k/k for B = 10−5 over two different ranges of k.
i
i i
i
i
i
i
200
buch7 2005/10/5 page 200 i
Chapter 8. Transient Behavior
This is the right moment to return to what was said after Theorem 8.12. For the operator A just constructed, we have Akn 2 kλk[n/2] → k as n → ∞. Thus, when moving along the line k = 104 on the norm surface, we will eventually be at a height of about 104 and may conclude that Akn 2 is close to 104 for all sufficiently large n. But if, for example, ψ(n) = log n for large n, then (8.26) shows that we will be in S104 /2 only for the n’s satisfying log[n/2] > k = 104 , that is, for n > 2 exp(104 ) ≈ 2 · 104343 . We beautifully see that in this case movement along the lines k = constant does practically not provide us with information about the evolution of the norms along the lines n = constant. Moral: Theorem 8.1 is a good reason for expecting critical behavior whenever a∞ > 1, but it is Theorem 8.1 in conjunction with Theorem 8.12 that justifies this expectation within reasonable dimensions. In the language of critical transient phase lengths, (8.26) says that lim sup n→∞
LB (n) e22 ξ(m) for 21 ξ(m) < k < 2 ξ(m). Consequently, LB (n) ≥ (3/2)ξ([n/2]) if only (2/e2 )ξ([n/2]) > B. As the last inequality is satisfied for all sufficiently large n, we arrive at the conclusion that lim inf n→∞
LB (n) 3 ≥ . ξ([n/2]) 2
Finally, since ξ(m) = ψ(m)/ log ψ(m), it follows that lim inf n→∞
LB (n) log ψ([n/2]) > 0. ψ([n/2])
Rapidly growing functions ψ, such as ψ(n) = en , therefore yield gigantic critical phase lengths.
8.8
Oscillations
k Let b(t) = t −1 + 0.492 t be as in Example 8.14. When plotting T30 (b) for k between 1 and 300, we see Figure 8.10 on the screen, and the purpose of this section is to explain the oscillating behavior in Figure 8.10. Let An be an n × n matrix and suppose rad An < 1. For the sake of simplicity, assume that all eigenvalues λ1 , . . . , λn of An are simple. We then have An = CC −1 , where = diag (λ1 , . . . , λn ) and C is an invertible n × n matrix. It follows that Akn = Ck C −1 , and hence
Akn = C1 λk1 + · · · + Cn λkn
(8.27)
i
i i
i
i
i
i
8.8. Oscillations
buch7 2005/10/5 page 201 i
201
4
6
x 10
5
4
3
2
1
0 0
50
100
150
200
250
300
k Figure 8.10. The symbol is b(t) = t −1 + 0.492 t. We see the norms T30 (b)2 for k = 1, 2, 3, . . . , 300. A close-up is in the left picture of Figure 8.11.
with certain n × n matrices C1 , . . . , Cn that do not depend on k (note that Cj is simply the product of the j th column of C by the j th row of C −1 ). Assume that |λ1 | = · · · = |λs | > max |λj |. j ≥s+1
Then (8.27) gives Akn = C1 λk1 + · · · + Cs λks + O(σ k )
as
k→∞
with σ = maxj ≥s+1 |λj |. Thus, if k is large, then Akn 2 ≈ C1 λk1 + · · · + Cs λks 2 ,
(8.28)
which has good chances for oscillatory behavior due to the fact that λ1 to λs have equal moduli. In the case where An is the Toeplitz matrix Tn (b) with b(t) = t −1 +α 2 t, the eigenvalues are given by 2α cos
πj n+1
(j = 1, . . . , n).
i
i i
i
i
i
i
202
buch7 2005/10/5 page 202 i
Chapter 8. Transient Behavior
The two dominating eigenvalues are λ1 = 2α cos
π n+1
λ2 = 2α cos
and
nπ π = −2α cos . n+1 n+1
Consequently, Tnk (b)2 ≈ |2α| cos
π n+1
k C1 + (−1)k C2 2
(8.29)
π for large k. Thus, the damping factor |2α|k cosk n+1 has an amplitude that equals C1 +C2 2 for even k and C1 − C2 2 for odd k. If the damping factor is not too small, then (8.29) (which holds for large k only) should be already valid in the critical transient phase so that we can see it with our eyes. This would be an explanation for the oscillating behavior in Figure 8.10. Let us check our example. Thus, take α = 0.49 and n = 30. Then
100 T30 (b)2 = 104.6457 , 101 T30 (b)2 = 104.7014 ,
π 100 C1 + C2 2 = 105.1438 , 31 π 101 0.98 cos C1 − C2 2 = 105.1959 , 31 0.98 cos
that is, (8.29) cannot be said to be satisfied. The point is that the modulus of the quotient of the dominant eigenvalues and the next eigenvalue is very close to 1, which implies that approximation (8.29) is not yet good enough in the high transient phase. However, consideration of a few more terms does set things right: = = π k = cos(2π/31) k = = = k k 0.98 cos =C1 + (−1) C2 + (C3 + (−1) C4 ) = 31 = cos(π/31) =
2
equals 104.4597 and 104.5201 for k = 100 and k = 101, respectively, and 0.98 cos
= k π k = =C1 + (−1)k C2 + (C3 + (−1)k C4 ) cos(2π/31) 31 = cos(π/31) k = cos(3π/31) = = + (C5 + (−1)k C6 ) cos(π/31) =2
is 104.6512 and 104.7067 for k = 100 and k = 101, respectively. From the paper [96] by Brian Davies, we learned that such oscillation phenomena for semigroup norms are well known. We also learned from [96] that the kind of oscillation may depend on the norm chosen. Figures 8.11 and 8.12 show the different oscillatory behavior of the spectral norms Tnk (a)2 and the Frobenius norms Tnk (a)F fairly convincingly. Finally, Figure 8.13 illustrates what happens when walking on the norm surface along the lines n = constant.
i
i i
i
i
i
i
8.8. Oscillations
buch7 2005/10/5 page 203 i
203
4
4
x 10
6.5
x 10
5 6 4.8 5.5 4.6 5 4.4 4.5 80
90
100
110
120
80
90
100
110
120
Figure 8.11. The symbol is given by b(t) = t −1 + 0.492 t. The left picture is a k close-up of part of Figure 8.10. The right picture shows the Frobenius norms T30 (b)F .
18 16 14 12 10 8 6 4 2 0 0
20
40
60
80
100
k Figure 8.12. The symbol is b(t) = t −1 + 0.492 t. We see the norms T12 (b)2 k (lower curve) and the Frobenius norms T12 (b)F (upper curve) for k = 1, 2, 3, . . . , 100.
i
i i
i
i
i
i
204
buch7 2005/10/5 page 204 i
Chapter 8. Transient Behavior 9
3
6
x 10
4
x 10
2.5 3 2 1.5
2
1 1 0.5 0 0
50
100
150
200
0 32
34
36
38
Figure 8.13. The symbol is again as in Figure 8.10. The pictures show the norms Tn100 (a) (solid) and Tn101 (a) (dashed) for two different ranges of n. Another example is considered in Figures 8.14 and 8.15. Now the symbol is b(t) = + t −2 ). We have rad T30 (b) = 0.9847 and b∞ = 1.04. The dominating eigenvalues of T30 (b) are 10 (t 19
λ1 = μ, λ2 = μω, λ3 = μω2 and hence
(μ = 0.9847, ω = e2π i/3 )
⎧ ⎨ C1 + C2 + C3 2 |μ|k k C1 + C2 ω + C3 ω2 2 |μ|k T30 (b)2 ≈ ⎩ C1 + C2 ω2 + C3 ω2 |μ|k
for for for
k ≡ 0 (mod 3), k ≡ 1 (mod 3), k ≡ 2 (mod 3).
The period 3 is nicely seen in the right picture of Figure 8.15.
8.9
Exponentials
Let An be an n × n matrix and let τ > 0. Then eτ An → 0 as τ → ∞ if and only if the spectrum of An is contained in the open left half-plane. Now let An = Tn (b) with b ∈ W . The decomposition b = Re b + i Im b of b into the real and imaginary parts yields 1 (An + A∗n ) = Tn (Re b), 2 1 Im An := (An − A∗n ) = Tn (Im b). 2i Re An :=
We denote by max Re b the maximum of Re b on the unit circle T. Here is the analogue of Theorem 8.1. Theorem 8.15. Let b ∈ W . Then eτ Tn (b) 2 ≤ eτ sup Re b for all τ > 0 and all n ∈ N.
(8.30)
i
i i
i
i
i
i
8.9. Exponentials
buch7 2005/10/5 page 205 i
205
1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 −1
−0.5
0
0.5
1
Figure 8.14. The range a(T) for a(t) = 10 (t + t −2 ) and the eigenvalues of T30 (a). 19 The three dominating eigenvalues are 0.9847, 0.9847ω, 0.9847ω2 with ω = e2π i/3 . The maximum modulus of the remaining 27 eigenvalues is 0.9549. 4 3.5
3.66
3 3.64
2.5 3.62
2 1.5
3.6
1 3.58
0.5 0
50
100
150
200
52
54
56
58
60
62
64
k Figure 8.15. The pictures show the norms T30 (b)2 for the symbol defined by b(t) = + t −2 ). 10 (t 19
Moreover, lim eτ Tn (b) 2 = eτ sup Re b for each τ > 0.
n→∞
(8.31)
Proof. We have eAn 2 ≤ eRe An 2 for every matrix An (see, e.g., [27, p. 258]). Furthermore, eRe An 2 = eλmax (Re An ) , where λmax (Re An ) is the maximal eigenvalue of the Hermitian matrix Re An . Thus, eτ Tn (b) 2 ≤ eτ λmax (Tn (Re b)) . The number λmax (Tn (Re b)) does not exceed max Re b. The proof is as follows. Pick
i
i i
i
i
i
i
206
buch7 2005/10/5 page 206 i
Chapter 8. Transient Behavior
n μ > max Re b and assume Tn (Re b)x = μx for some nonzero x = (xj )n−1 j =0 ∈ C . Put
x(eiθ ) = x0 + x1 eiθ + · · · + xn−1 ei(n−1)θ . Formula (4.16) shows that (Tn (μ − Re b)x, x) =
1 2π
2π
(μ − (Re b)(eiθ ))|x(eiθ )|2 dθ,
(8.32)
0
and since Tn (μ−Re b)x = 0, the right-hand side of (8.32) must be zero. As x(eiθ ) vanishes only at finitely many eiθ ∈ T, it follows that μ − Re b = 0 almost everywhere, which is impossible for μ > max Re b. Thus, the proof of (8.30) is complete. Since eτ Tn (b) → eτ T (b) strongly, the Banach-Steinhaus theorem gives lim inf eτ Tn (b) 2 ≥ eτ T (b) 2 ≥ rad eτ T (b) . n→∞
From the spectral mapping theorem we deduce that sp eτ Tn (b) = eτ sp T (b) , and hence Corollary 1.12 implies that lim inf eτ Tn (b) 2 ≥ eτ max Re b . n→∞
This and (8.30) imply (8.31). Thus, Theorem 8.15 tells us that if n is large, then eτ Tn (b) has critical behavior if and only if max Re b > 0. To get realistic estimates, one can employ the analogue of (8.6). Trefethen’s note [272] and [96], [275] contain an analogue of Theorem 8.3 for exponentials. This result implies that if the pseudospectrum spε A contains points in the open right halfplane, then each point λ ∈ spε A with β := Re λ > 0 gives us an estimate of the form / eβτ0 − 1 τA βτ0 1+ε . sup e ≥ e β 0 0. For B > 0, put SB = {(τ, n) ∈ (0, ∞) × N : eτ Tn (b) 2 > B}. Then for each ε > 0, there exist positive and finite constants τ0 and c depending only on B, b, ε such that SB ⊃ (τ, n) : τ > τ0 , n > cτ (log τ )1+ε . Clearly, this result is weaker than Theorem 8.12 because of the presence of the factor (log τ )1+ε . From the practical point of view we can say that, for large τ , this factor may
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 207 i
207
be ignored in comparison with τ . As for the theoretical side of the problem, we remark that this factor emerges from our techniques and that it can probably be removed by more powerful machinery. We conclude with an example. Example 8.17. Let T (b) be the tridiagonal matrix generated by b(t) = t −1 + α 2 t − λ with α ∈ [0, ∞) and λ ∈ C. The eigenvalues of Tn (b) are densely spread over the interval (−2α − λ, 2α − λ). Therefore, eτ Tn (b) 2 → 0 as τ → ∞ for each n if and only if 2α < Re λ. We have that max Re b = −Re λ + 1 + α 2 . Consequently, eτ Tn (b) 2 has a critical transient phase before decaying to zero for large n if and only if 2α < Re λ < 1+α 2 . Figure 8.16 reveals that the norm surfaces of exponentials look much like their counterparts for powers.
60
50
40
30
20
10
0 0
50
100
150
200
Figure 8.16. The symbol is b(t) = t −1 + 32 t − 7. The picture shows the level curves eτ Tn (b) 2 = h for h = 10−2 , 1, 102 , 104 , . . . , 1010 (the lower curve corresponds to h = 10−2 , the upper to 1010 ). We took 3 ≤ n ≤ 60 and τ = 5, 10, 15, . . . , 200.
Exercises 1. Prove that if Un is unitary, then Gk (Un∗ An Un ) = Gk (An ). 2. Let An be an n × n matrix and let m be the degree of the minimal polynomial of An . Prove that H2 (An ) = G1 (An ) ⊃ G2 (An ) ⊃ · · · ⊃ Gm (An ) = Gm+1 (An ) = · · · = sp An .
i
i i
i
i
i
i
208
buch7 2005/10/5 page 208 i
Chapter 8. Transient Behavior
3. Let An be a Hermitian matrix. Show that Gk (An ) equals conv sp An for k = 1 and sp An for k ≥ 2. 4. The polynomial convex hull of degree k of a set S ⊂ C is defined by + pcok S := {ζ ∈ C : |p(ζ )| ≤ max |p(z)| for all p ∈ Pk+1 }. z∈S
Let b ∈ P. Prove that if n is large enough, then Gk (Cn (b)) = pcok b(Tn ), where Tn := {e2π i/n : = 0, 1, . . . , n − 1}. Does pcok b(Tn ) converge to pcok b(T)? 5. Prove that pcok b(T) ⊂ Gk (T (b)) ⊂ conv R(b) for every b ∈ P. 6. Let k ≥ 2. Find a bounded operator A on 2 such that Gk (Pn APn ) converges in the Hausdorff metric to a set that contains Gk (A) properly. 7. Prove that lim sup Gk (Tn (b)) ⊂ Gk (T (b)) n→∞
for every Laurent polynomial b. 8. Let A be a nonzero n × n matrix and let rad A denote the spectral radius of the matrix A. Show that lim sup k→∞
Ak 2 (rad A)k
is finite if and only if A is diagonalizable.
Notes In this chapter we follow [39]. As already said, polynomial numerical hulls were independently invented by Nevanlinna [192], [193] and Greenbaum [141], [142]. It should be mentioned that Faber, Greenbaum, and Marshall pointed out in [112] that the asymptotic formula log(2n) log(log(2n)) 1 n−1,n = 1 − + +o n n n is in fact a very old result. Namely, the problem of determining n−1,n is equivalent to a classical problem in complex approximation theory (closely related to the CarathéodoryFejér interpolation problem) which was explicitly solved by Schur and Szegö [244] and then rediscovered with a different proof by Goluzin [138], [139, Theorem 6, pp. 522–523]. However, the proof in [112] yields more information than the earlier proofs since it allowed √ the authors to determine 4m−2,4m , for example (it is shown that 4m−2,4m = 2m−1,2m ). The exploitation of pseudospectra in connection with the transient behavior of powers of Toeplitz matrices goes back at least to Figure 2 of Reichel and Trefethen’s paper [219]. Section 8.3 addresses issues concerning the computation of concrete pseudospectra. Tom
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 209 i
209
Wright’s package EigTool [299] provides us with fantastic software for computing pseudospectra numerically. In 1999, Trefethen [271] wrote: “In 1990, getting a good plot of pseudospectra on a workstation for a 30×30 matrix took me several minutes. Today I would expect the same of a 300 × 300 matrix, and pseudospectra of matrices with dimensions in the thousands are around the corner.” Concerning the computation of pseudospectra of dense matrices, Wright notes in his 2002 thesis [300]: “What was once a very expensive computation has become one that is practically of a similar order of complexity to that of computing eigenvalues.” We refer the reader to [110], [271], [275] for more on the subject. The results of Exercises 1 to 3 are from [141], [192], [193]. In connection with Exercise 7 we remark that James Burke and Anne Greenbaum [80] recently proved that lim inf Gk (Tn (b)) contains the closure of the interior of Gk (T (b)). Further results: numerical range of finite Toeplitz matrices. We denote by Pr the set
of the Laurent polynomials of degree r, and for b(t) = rj =−r bj t j in Pr , we define the Abel-Poisson mean (= harmonic extension) h b ∈ Pr by (h b)(t) =
r
|j | bj t j
(t ∈ T).
j =−r
It is clear that H2 (Tn (b)) ⊂ H2 (T (b)). The closure of the convex set H2 (T (b)) is given by Theorem 7.11. The following lower estimate for H2 (Tn (b)) was communicated to us by Anne Greenbaum [143]: If b ∈ Pr , then . H2 (Tn (b)) ⊃ (h b)(T), (8.33) 0≤≤r,n
where r,n is as in Section 8.2. Anne’s proof is as follows. Put b+ (z) =
r
j
bj z ,
b− (z) =
j =0
r
b−j zj .
j =1
Then Tn (b) = b+ (J )+b− (J ), where J := Jn (0) (recall (8.3)). Let ζ = (J x, x) be a point in the polynomial numerical hull Gr (J ). In [141], it is shown that ((J ) x, x) = (J x, x) for = 1, . . . , r. This implies that (J x, x) = ((J ) x, x) = (J x, x) for = 1, . . . , r. Hence (Tn (b)x, x) = b+ ((J x, x)) + b− ((J x, x) ), and we arrive at the conclusion that H2 (Tn (b)) contains {b+ (ζ ) + b− (ζ ) : ζ ∈ Gr (J )} = {b+ (eiθ ) + b− (e−iθ ) : 0 ≤ ≤ r,n , θ ∈ [0, 2π )} = {(h b)(t) : 0 ≤ ≤ r,n , t ∈ T}, which completes the proof. The numerical range of a tridiagonal n × n matrix was found by Eiermann [107] (see also Brown and Spitkovsky’s paper [78]). If b(t) = t + α 2 t −1 with 0 < α < 1,
i
i i
i
i
i
i
210
buch7 2005/10/5 page 210 i
Chapter 8. Transient Behavior
then H2 (Tn (b)) is the ellipse with the foci ±2α cos(π/(n + 1)), the minor half-axis length (1 − α 2 ) cos(π/(n + 1)), and the major half-axis length (1 + α 2 ) cos(π/(n + 1)). In other terms, H2 (Tn (b)) = cos(π/(n + 1)) clos H2 (T (b)). We know from Section 8.2 that cos(π/(n + 1)) is just 1,n . Thus, in the case at hand estimate (8.33) amounts to the containment H2 (Tn (b)) ⊃ {(t + α 2 t −1 ) : 0 ≤ ≤ 1,n , t ∈ T} : π 2 −1 = (t + α t ) : 0 ≤ ≤ cos ,t ∈T n+1 π = cos clos H2 (T (b)). n+1
i
i i
i
i
i
i
buch7 2005/10/5 page 211 i
Chapter 9
Singular Values
The asymptotic behavior of the singular values of the matrices Tn (b) is, in a sense, a mirror image of the topological properties of the symbol b: Roch and Silbermann’s splitting phenomenon says that if b has no zeros on T and winding number k about the origin, then the |k| smallest singular values of Tn (b) go exponentially fast to zero while the remaining n − |k| singular values stay away from zero. We also determine the limiting set of the singular values of Tn (b) as n → ∞, and we prove the Avram-Parter theorem, which identifies the corresponding limiting measure. As an application of the Avram-Parter theorem, we show that if x is a randomly chosen vector of length n and n is large, then Tn (b)x22 clusters sharply around a certain value which, moreover, is much smaller than one would predict.
9.1 Approximation Numbers For j ∈ {0, 1, . . . , n}, let Fj(n) denote the collection of all n × n matrices of rank at most j . The j th approximation number with respect to the p norm of an n × n matrix An is defined by (p) aj (An ) = distp (An , Fj(n) ) := min An − Fn p : Fn ∈ Fj(n) . (p)
(p)
(p)
(p)
Clearly, 0 = an (An ) ≤ an−1 (An ) ≤ · · · ≤ a1 (An ) ≤ a0 (An ) = An p . Put (p)
(p)
σj (An ) = an−j (An ). (p)
(p)
(p)
(p)
Thus, 0 = σ0 (An ) ≤ σ1 (An ) ≤ · · · ≤ σn−1 (An ) ≤ σn (An ) = An p . It is well (p) known that in the case p = 2 the numbers σj (An ) are the singular values of an , that is, the nonnegative square roots of the eigenvalues λj (A∗n An ) (j = 1, . . . , n) of the matrix A∗n An : # (9.1) σj (An ) := σj(2) (An ) = λj (A∗n An ). The following results are standard. 211
i
i i
i
i
i
i
212
buch7 2005/10/5 page 212 i
Chapter 9. Singular Values
Theorem 9.1 (singular value decomposition). If An is an n × n matrix, then there exist unitary matrices Un and Vn such that An = Un diag (σ1 (An ), . . . , σn (An )) Vn . Theorem 9.2. If 1 ≤ p ≤ ∞ and An is an n × n matrix, then if An is invertible, 1/A−1 (p) n p σ1 (An ) = 0 if An is not invertible. Theorem 9.3. If An , Bn , Cn are n × n matrices and 1 ≤ p ≤ ∞, then (p)
(p)
σj (An Bn Cn ) ≤ An p σj (Bn ) Cn p for every j ∈ {0, 1, . . . , n}. Given a Hilbert space H and A ∈ B(H ), we define (A) = σ ∈ [0, ∞) : σ 2 ∈ sp A∗ A . In particular, for an n × n matrix An we have (An ) = {σ1 (An ), . . . , σn (An )} . Finally, we set (p) p (An ) = σ1 (An ), . . . , σn(p) (An ) .
9.2 The Splitting Phenomenon The splitting phenomenon is the most striking property of the singular values (approximation numbers) of Toeplitz matrices. It is described by the following theorem. An illustration is in Figure 9.1. Theorem 9.4 (Roch and Silbermann). Let b be a Laurent polynomial and suppose T (b) is Fredholm of index k ∈ Z. Let α be any number satisfying (1.23) and 1 ≤ p ≤ ∞. Then (p) (p) the |k| first approximation numbers σ1 (Tn (b)), . . . , σ|k| (Tn (b)) of Tn (b) go to zero with exponential speed, σ|k| (Tn (b)) = O(e−αn ), (p)
(p)
(9.2) (p)
while the remaining n − |k| approximation numbers σ|k|+1 (Tn (b)), . . . , σn (Tn (b)) stay away from zero, (p)
σ|k|+1 (Tn (b)) ≥ d > 0
(9.3)
for all sufficiently large n, where d is a constant depending only on b.
i
i i
i
i
i
i
9.2. The Splitting Phenomenon
buch7 2005/10/5 page 213 i
213
9 8 7 6 5 4 3 2 1 0 −1 0
10
20
30
40
50
60
10
20
30
40
50
60
10
8
6
4
2
0 0
Figure 9.1. In the small pictures we see the range of the symbol a(t) = t −1 − it + 2it − 5it 3 , which has winding numbers 3 and 2 about the origin and the point −3 − 2i, respectively. The singular values of Tn (a) and Tn (a + 3 + 2i) for 5 ≤ n ≤ 60 are shown in the top and bottom pictures. As predicted by Theorem 9.4, three of them go to zero in the top picture and two of them approach zero in the bottom picture. The rest stays away from zero. 2
i
i i
i
i
i
i
214
buch7 2005/10/5 page 214 i
Chapter 9. Singular Values
Proof. We first prove (9.2). For the sake of definiteness, let us assume that k > 0; there is nothing to be proved for k = 0, and the case k < 0 can be reduced to the case k > 0 by passage to adjoints. Recall that χm is defined by χm (t) := t m (t ∈ T) and that Pn+ is the set of all analytic polynomials of degree at most n − 1. Let b = b− χ−k b+ be a Wiener-Hopf factorization of b. If n is sufficiently large, then cn (t) :=
n−k
−1 −1 (b+ ) t
=0
is a function in W . Let Fn : Pn+ → Pn+ be the linear operator that sends f to n (cn χ−k b− f ), where n is the orthogonal projection of L2 (T) onto Pn+ . For j = 0, 1, . . . , k − 1, the function χj cn−1 belongs to Pn+ , and we have Fn (χj cn−1 ) = n (cn χ−k b− χj cn−1 ) = n (χj −k b− ) = 0. Hence, dim Im Fn = n−dim Ker Fn ≤ n−k. Let Gn be the matrix representation of Fn with (p) (n) respect to the basis {χ0 , χ1 , . . . , χn−1 } of Pn+ . Then Gn ∈ Fn−k and thus, σk (Tn (b)) = (p) an−k (Tn (b)) ≤ Tn (b) − Gn p . Since Tn (b) − Gn is the matrix representation of the linear operator Pn+ → Pn+ , f → n ((b+ −cn )χ−k b− f ) in the basis {χ0 , χ1 , . . . , χn−1 }, it follows that (p)
σk (Tn (b)) ≤ b+ − cn W χ−k W b− W = O(b+ − cn W ). −1 From Lemma 1.17 we know that b+ −cn−1 W = O(e−αn ). This implies that b+ −cn W = −αn O(e ) and so gives (9.2). We now prove inequality (9.3). This time we assume without loss of generality that k = −j < 0, since for k = 0 the assertion follows from Theorems 3.7 and 9.2 and for k > 0 we may pass to adjoints. We have b = cχj , where c has no zeros on T and the winding number of c is zero. As T (χ−j )p = 1, we deduce from Theorem 9.3 that (p)
(p)
(p)
σj +1 (Tn (b)) = σj +1 (Tn (cχj )) = σj +1 (Tn (cχj )) T (χ−j )p (p)
(p)
≥ σj +1 (Tn (cχj )Tn (χ−j )) = σj +1 (Tn (c) − Pn H (cχj )H (χj )Pn ) (recall Proposition 3.10 for the last equality). Obviously, dim Im H (χj ) = j . Consequently, Hj := Pn H (cχj )H (χj )Pn ∈ Fj(n) and hence (p)
(p)
σj +1 (Tn (c) − Hj ) = an−j −1 (Tn (c) − Hj ) (n) = min Tn (c) − Hj − Kn−j −1 p : Kn−j −1 ∈ Fn−j −1 (n) ≥ min Tn (c) − Ln−1 p : Ln−1 ∈ Pn−1 (p)
(p)
= an−1 (Tn (c)) = σ1 (Tn (c)). Since T (c) is invertible, Theorems 3.7 and 9.2 yield that lim inf σ1 (Tn (c)) = lim inf Tn−1 (c)−1 p = d > 0. (p)
n→∞
n→∞
i
i i
i
i
i
i
9.3. Singular Values of Circulant Matrices
9.3
buch7 2005/10/5 page 215 i
215
Singular Values of Circulant Matrices
Throughout the rest of this chapter we restrict ourselves to the case p = 2. Let b be a Laurent polynomial, b(t) =
r
bj t j
(t ∈ T).
(9.4)
j =−r
In (9.4) we do not require that both the coefficients b−r and br are nonzero. For n ≥ 2r + 1, we define the circulant matrix Cn (b) as in Section 2.1. We have O(n−r)×(n−r) Dr , (9.5) Cn (b) − Tn (b) = Er Or×r where O denotes the zero matrix and ⎛ br br−1 . . . b1 ⎜ 0 br . . . b2 ⎜ Dr = ⎜ . . . . .. .. ⎝ .. . . 0
0
...
⎞ ⎟ ⎟ ⎟, ⎠
⎛ ⎜ ⎜ Er = ⎜ ⎝
br
b−r b−r+1 .. .
0 b−r .. .
b−1
b−3
... 0 ... 0 . . .. . . . . . b−r
⎞ ⎟ ⎟ ⎟. ⎠
Proposition 9.5. The singular values of Cn (b) are |b(1)|, |b(ωn )|, . . . , |b(ωnn−1 )|, where ωn := exp(2πi/n). Proof. Clearly, Cn∗ (b) = Cn (b). From (2.6) we infer that Cn∗ (b)Cn (b) = Cn (bb) = j Cn (|b|2 ). Thus, by Proposition 2.1, the eigenvalues of Cn∗ (b)Cn (b) are just |b(ωn )|2 (j = 0, 1, . . . , n − 1). Formula (9.1) completes the proof. Theorem 9.6. Let b be a Laurent polynomial of the form (9.4) and suppose |b| is not constant. Put m = min |b(t)|, M = max |b(t)|, t∈T
t∈T
denote by α ∈ N the maximal order of the zeros of |b| − m on T, and let β ∈ N be the maximal order of the zeros of M − |b| on T. Then for each k ∈ N and all sufficiently large n, m ≤ σk (Cn (b)) ≤ m + Ek
1 1 , M − Dk β ≤ σn−k (Cn (b)) ≤ M, nα n
where Ek , Dk ∈ (0, ∞) are constants independent of n. Proof. Put f (θ ) = |b(eiθ )| − m for θ ∈ [0, 2π ) and let f have a zero of the maximal order α at θ0 ∈ [0, 2π). By Proposition 9.5, the singular values of Cn (b) (n ≥ 2r + 1) are f (2πj/n) + m (j = 0, 1, . . . , n − 1). Let Uk,n be the segment [θ0 , θ0 + 4π k/n] and
i
i i
i
i
i
i
216
buch7 2005/10/5 page 216 i
Chapter 9. Singular Values
denote by j1 < · · · < jq the numbers jμ for which 2πjμ /n belongs to Uk,n . If n is large enough, then f is strictly monotonically increasing on Uk,n and q is approximately equal to 2k. Hence σk (Cn (b)) ≤ f (2πjq /n) + m. It follows that 0 ≤ σk (Cn (b)) − m ≤ f
2πjq n
≤f
4π k θ0 + n
≤E
4π k n
α = Ek
1 . nα
The estimate for σn−k (Cn (b)) can be shown analogously.
9.4
Extreme Singular Values
Theorem 9.4 leaves us with the case where T (b) is not Fredholm. In this section we show that in that case σk (Tn (b)) goes to zero as n → ∞ for each fixed k. This will be done with the help of Theorem 9.6 and the following well-known interlacing result for singular values. Theorem 9.7. Let A be a complex n×n matrix and let B = Pn−1 APn−1 be the (n−1)×(n−1) principal submatrix. Then σ1 (A)
≤
σ3 (B)
σ2 (A)
≤
σ4 (B)
≤
... σn−3 (A)
≤
σn−1 (B)
σn−2 (B) ≤
σn−2 (A)
≤
σn (B)
σ2 (B) ≤ σn−3 (B) σn−1 (B)
≤ σn−1 (A).
Here is the desired result on the lower singular values. Theorem 9.8. Let b be a nonconstant Laurent polynomial and suppose T (b) is not Fredholm. Let α ∈ N be the maximal order of the zeros of |b| on T. Then for each natural number k ≥ 1, σk (Tn (b)) = O(1/nα ) as n → ∞. Proof. First notice that, by Theorem 1.9, |b| does have zeros on T if T (b) is not Fredholm. Let b be as in (9.4) and n ≥ 2r + 1. From (9.5) we know that Tn (b) can be successively extended to Cn+r (b) by adding one row and one column in each step. We have σk (Tn (b)) ≤ σk+1 (Tn (b)). Since k + 1 ≥ 2, we can r times employ Theorem 9.7 to get σk (Tn (b)) ≤ σk+1 (Cn+r (b)) for all sufficiently large n, and Theorem 9.6 with m = 0 implies that σk+1 (Cn+r (b)) = O(1/nα ). Note that Theorem 9.8 together with the equality Tn−1 (b)2 = 1/σ1 (Tn (b)) yields another proof of Corollary 4.12. The following result shows that the upper singular values σn−k (Tn (b)) always approach T (b)2 = b∞ as n → ∞, independently of whether T (b) is Fredholm or not.
i
i i
i
i
i
i
9.5. The Limiting Set
buch7 2005/10/5 page 217 i
217
Theorem 9.9. Let b be a Laurent polynomial and suppose the modulus |b| is not constant on T. Denote by β ∈ N the maximal order of the zeros of b∞ − |b| on the unit circle T. Then for each k ≥ 0, b∞ − Dk
1 ≤ σn−k (Tn (b)) ≤ b∞ nβ
with some constant Dk ∈ (0, ∞) independent of n. Proof. If n is large enough then, by (9.5) and Theorem 9.7, σn−k (Tn (b)) ≥ σn−k−2r (Cn+r (b)). The assertion is therefore immediate from Theorem 9.6. What happens if |b| is constant? By Proposition 5.6, this occurs if and only if b(t) = γ t m (t ∈ T) with γ ∈ C and m ∈ Z. If γ = 0, then all singular values of Tn (b) are zero, and if γ = 0, it is easy to see that |m| singular values of Tn (b) are zero and that the n − |m| remaining singular values are equal to |γ |.
9.5 The Limiting Set The objective of this section is the determination of the limiting sets lim inf (Bn ) and lim sup (Bn ) in the case where Bn is Toeplitz-like. Recall that the set (B) of the singular values of a Hilbert space operator is defined as the set of all σ ∈ [0, ∞) for which σ 2 ∈ sp B ∗ B. Lemma 9.10. Let Bn = Tn (b) + Pn KPn + Wn LWn + Cn , where b is a Laurent polynomial, K and L are matrices with only a finite number of nonzero entries, and Cn 2 → 0 as = T ( n → ∞. Put B = T (b) + K and B b) + L. Then lim inf sp (Bn ) ⊂ lim sup sp (Bn ) ⊂ sp B ∪ sp B, n→∞
(9.6)
n→∞
and if, in addition, the matrices Bn are all Hermitian, then lim inf sp (Bn ) = lim sup sp (Bn ) = sp B ∪ sp B. n→∞
(9.7)
n→∞
− λI are invertible, Then B − λI and (B − λI )∼ = B Proof. Let λ ∈ / sp B ∪ sp B. and hence Theorem 3.13 implies that there are numbers n0 and M ∈ (0, ∞) such that (Bn − λI )−1 2 ≤ M for all n ≥ n0 . It follows that the spectral radius of (Bn − λI )−1 is at most M, which gives U1/M (0) ∩ sp (Bn − λI ) = ∅ for n ≥ n0 , where Uδ (μ) := {λ ∈ C : |λ − μ| < δ}. Hence U1/M (λ) ∩ sp Bn = ∅ for n ≥ n0 and thus λ ∈ / lim sup sp Bn . This completes the proof of (9.6). are selfadjoint and all spectra Now suppose that Bn = Bn∗ for all n. Then B and B occurring in (9.7) are subsets of the real line. We are left to show that if λ ∈ R and λ ∈ / But if λ is real and not in lim inf sp Bn , then there exists a lim inf sp Bn , then λ ∈ / sp B ∪sp B. δ > 0 such that Uδ (λ)∩sp Bnk = ∅ for infinitely many nk , that is, Uδ (0)∩sp (Bnk −λI ) = ∅
i
i i
i
i
i
i
218
buch7 2005/10/5 page 218 i
Chapter 9. Singular Values
for infinitely many nk . As Bnk − λI is Hermitian, the spectral radius and the norm of the operator (Bnk − λI )−1 coincide, which gives (Bnk − λI )−1 2 < 1/δ for infinitely many nk . It follows that {Bnk − λI } and thus also {Wnk (Bnk − λI )Wnk } are stable. Lemma 3.4 − λI are invertible. This proves (9.7). now shows that B − λI and B be as in Lemma 9.10. Then Corollary 9.11. Let Bn , B, B lim inf (Bn ) = lim sup (Bn ) = (B) ∪ (B). n→∞
(9.8)
n→∞
In particular, for every Laurent polynomial b, b)). lim inf (Tn (b)) = lim sup (Tn (b)) = (T (b)) ∪ (T ( n→∞
(9.9)
n→∞
Proof. We have Bn∗ Bn = Tn (b b) + Pn XPn + Wn Y Wn + Dn , where X and Y have only a finitely many nonzero entries and Dn 2 → 0 as n → ∞. Equalities (9.8) are therefore straightforward from (9.7). Remark 9.12. Let V : 2 → 2 be the map given by (V x)j = xj . Since sp V AV = sp A,
sp (A∗ A) ∪ {0} = sp (AA∗ ) ∪ {0}
(9.10)
b), we obtain for every A ∈ B(2 ) and V T (b)V = T ( ((T (b)))2 = sp T (b)T (b) = sp V T (b)T (b)V = sp V T (b)V V T (b)V = sp T ( b)T ( b) = ((T ( b)))2 , that is, (T ( b)) = (T (b)). This and the second equality of (9.10) imply that (T (b)) ∪ {0} = (T ( b)) ∪ {0}. However, in general the sets (T (b)) and (T ( b)) need not coincide: If b(t) = t, then b)T ( b) = diag (0, 1, 1, . . . ), whence (T (b)) = {1} T ∗ (b)T (b) = diag (1, 1, 1, . . . ), T ∗ ( and (T ( b)) = {0, 1}. The set (T (b)) is available in special cases only. Sometimes the following is useful. Proposition 9.13. If b is a Laurent polynomial, then [min |b|, max |b|] ⊂ (T (b)) ⊂ [0, max |b|]. Proof. From Propositions 1.2 and 1.3 and Corollary 1.10 we see that there is a compact operator K such that ((T (b)))2 = sp T (b)T (b) = sp (T (|b|2 ) + K) ⊃ spess (T (|b|2 ) + K) = spess T (|b|2 ) = [min |b|2 , max |b|2 ],
i
i i
i
i
i
i
9.6. The Limiting Measure
buch7 2005/10/5 page 219 i
219
and obviously, ((T (b)))2 = sp T (b)T (b) ⊂ [0, T (b)22 ] = [0, max |b|2 ]. Thus, if T (b) is not Fredholm, which is equivalent to the equality min |b| = 0, then (T (b)) ∪ (T ( b)) = [0, max |b|]. However, if T (b) is Fredholm, the question of finding (T (b)) ∪ (T ( b)) ∩ [0, min |b|] is difficult.
9.6 The Limiting Measure The purpose of this section is to show that if b is a Laurent polynomial, then 1 1 f (σk (Tn (b))) = n→∞ n 2π k=1 n
2π
f (|b(eiθ )|)dθ
lim
(9.11)
0
for every compactly supported function f : R → C of bounded variation. Formula (9.11) is the Avram-Parter theorem. The approach of this section is due to Zizler, Zuidwijk, Taylor, and Arimoto [303]. Functions of bounded variation. Let f : R → C be a function with compact support. The function f is said to have bounded variation on a segment [a, b] ⊂ R, f ∈ BV [a, b], if there exists a constant V ∈ [0, ∞) such that m
|f (xj ) − f (xj −1 )| ≤ V
(9.12)
j =1
for every partition a = x0 < x1 < · · · < xm = b of [a, b]. The minimal V for which (9.12) is true for every partition of the segment [a, b] is called the total variation of f on [a, b] and is denoted by V[a,b] (f ). We let BV stand for the set of all functions f : R → C that have compact support and are of bounded variation on each segment [a, b] ⊂ R. Such functions are simply referred to as functions of bounded variation. If f is compactly supported and continuously differentiable, then f is clearly BV and V[a,b] (f ) ≤ f ∞ (b − a). The characteristic function χE of a finite interval E is also of bounded variation and V[a,b] (χE ) = 2 whenever [a, b] ⊃ E. If f ∈ BV and a ≤ x ≤ y ≤ b, then |f (y) − f (x)| ≤ V[a,b] (f );
(9.13)
indeed, by the definition of V[a,b] (f ), we even have |f (x) − f (a)| + |f (y) − f (x)| + |f (b) − f (x)| ≤ V[a,b] (f ).
i
i i
i
i
i
i
220
buch7 2005/10/5 page 220 i
Chapter 9. Singular Values
We begin with a result on the singular values of matrices, large blocks of which coincide. Theorem 9.14. Let r and n be natural numbers such that 1 ≤ r < n, let K = {k1 , . . . , kr } be a subset of {1, 2, . . . , n}, and put L = {1, 2, . . . , n} \ K. Suppose A and A are two complex n × n matrices whose j k entries coincide for all (j, k) ∈ L × L. If f ∈ BV and [a, b] is any segment that contains all singular values of A and A , then n f (σk (A)) − f (σk (A )) ≤ 3rV[a,b] (f ). k=1
Proof. Suppose first that r = 1. We can without loss of generality assume that K = {n} (the general case can be reduced to this case by permutation similarity). Let A = (aj k )nj,k=1 and define B = (aj k )n−1 j,k=1 . Applying Theorem 9.7 to the pairs (A, B) and (A , B), we get σ1 (A), σ1 (A ) ∈ [a, σ3 (B)], σ2 (A), σ2 (A ) ∈ [σ2 (B), σ4 (B)], ... σn−2 (A), σn−2 (A ) ∈ [σn−2 (B), b], σn−1 (A), σn−1 (A ) ∈ [σn−1 (B), b], σn (A), σn (A ) ∈ [σn−1 (B), b]. This in conjunction with (9.13) and the abbreviation σj (B) := σj gives n f (σk (A)) − f (σk (A )) k=1
≤ V[a,σ3 ] (f ) + V[σ2 ,σ4 ] (f ) + · · · + V[σn−2 ,b] (f ) + V[σn−1 ,b] (f ) + V[σn−1 ,b] (f ). Since each point of [a, b] is covered by at most three of the segments occurring in the last sum, it follows that this sum is at most 3V[a,b] (f ), which completes the proof for r = 1. Now let r > 1. Again we may assume that K = {n − r + 1, . . . , n}. Define n × n matrices A(0) , A(1) , . . . , A(r) so that A(0) = A, A(r) = A , and the pairs A(ν−1) , A(ν) (ν = 1, . . . , r) are as in the case r = 1 considered above. This can be achieved by setting, for ν = 0, . . . , r, aj(ν) k =
aj k aj k
for for
1 ≤ j, k ≤ n − ν, n − ν < j ≤ n or n − ν < k ≤ n.
Let [c, d] ⊃ [a, b] by any segment which contains the singular values of all A(ν) and define f : R → C by f =
0 f (x)
for for
x ∈ (−∞, a) ∪ (b, ∞), x ∈ [a, b].
i
i i
i
i
i
i
9.6. The Limiting Measure
buch7 2005/10/5 page 221 i
221
Clearly, f ∈ BV . From what was proved for r = 1 we obtain n n f (σk (A)) − f (σk (A )) = f(σk (A)) − f(σk (A(r) )) k=1
k=1
n r r f(σk (A(ν−1) )) − f(σk (A(ν) )) ≤ ≤ 3V[c,d] (f) = 3rV[a,b] (f ). ν=1 k=1
ν=1
Theorem 9.15. Let b(t) = rj =−r bj t j (t ∈ T) be a Laurent polynomial and let f ∈ BV . If [c, d] is any segment that contains [0, b∞ ], then for all n ≥ 1 n 2π n iθ f (σk (Tn (b))) − f (|b(e )|)dθ ≤ 7rV[c,d] (f ). 2π 0 k=1
Proof. Suppose first that |b| is constant on T. As observed in the end of Section 9.4, in that case b(t) = γ t m , |m| singular values of Tn (b) are zero and n − |m| singular values are equal to |γ |. Hence n
f (σk (Tn (b))) = |m|f (0) + (n − |m|)f (|γ |),
k=1
n 2π
2π
f (|b(eiθ )|)dθ = nf (|γ |),
0
and the assertion amounts to the inequality |m| |f (0) − f (|γ |)| ≤ 7|m|V[c,d] (f ), which is certainly true because |f (0) − f (|γ |)| ≤ V[c,d] (f ) by virtue of (9.13). Now suppose that |b| is not constant on T. Define Cn (b) as in Section 2.1 for n ≥ 2r +1 and put Cn (b) = Tn (b) for n ≤ 2r. The singular values of Cn (b) and Tn (b) are all contained in [0, b∞ ]. If n ≥ 2r + 1, then (9.5) implies that Tn (b) and Cn (b) differ only in the last r columns and rows. Consequently, by Theorem 9.14, n n f (σk (Tn (b))) − f (σk (Cn (b))) ≤ 3rV[c,d] (f ). (9.14) k=1
k=1
Put h(θ ) = f (|b(eiθ )|). By Proposition 9.5, n k=1
f (σk (Cn (b))) =
n−1 2π k h , n k=0
which gives n 2π n f (σk (Cn (b))) − h(θ )dθ 2π 0 k=1 n−1 2π(k+1)/n n 2π k = − h(θ ) dθ h 2π n 2π k/n k=0
i
i i
i
i
i
i
222
buch7 2005/10/5 page 222 i
Chapter 9. Singular Values n−1 n 2π(k+1)/n V[2π k/n,2π(k+1)/n] (h)dθ 2π k=0 2π k/n
≤
n−1
=
(recall (9.13))
V[2π k/n,2π(k+1)/n] (h) = V[0,2π ] (h).
(9.15)
k=0
Now let u(θ ) = |b(eiθ )|2 . By assumption, u is a nonconstant and nonnegative Laurent polynomial of degree at most 2r. Thus, u has at least 2 and at most 4r local extrema in [0, 2π). Let θ1 < θ2 < · · · < θ denote the local extrema. As |b| is monotonous on [θj , θj +1 ] (θ+1 := θ1 + 2π), we get V[0,2π ] (h) = V[θ1 ,θ1 +2π ] (f ◦ |b|) = V[θ1 ,θ2 ] (f ◦ |b|) + V[θ2 ,θ3 ] (f ◦ |b|) + · · · + V[θ ,θ1 +2π] (f ◦ |b|) ≤ V[c,d] (f ) + V[c,d] (f ) + · · · + V[c,d] (f ) = V[c,d] (f ) ≤ 4rV[c,d] (f ).
(9.16)
Combining (9.14), (9.15), and (9.16) we arrive at the assertion. Corollary 9.16. Let b be a Laurent polynomial and let f : R → C be a function with compact support. If f is continuous or of bounded variation, then 2π n 1 1 lim f (σk (Tn (b))) = f (|b(eiθ )|)dθ. (9.17) n→∞ n 2π 0 k=1 Proof. For f ∈ BV , the assertion is immediate from Theorem 9.15. So suppose f is continuous. Then f can be uniformly approximated by compactly supported functions of bounded variation (e.g., by continuously differentiable functions) fm . Given ε > 0, there is an m0 such that |f (x) − fm0 (x)| ≤ ε for x ∈ R. It follows that n n 1 1 f (σk (Tn (b))) − fm0 (σk (Tn (b))) n n k=1
k=1
1 1 f (σk (Tn (b))) − fm0 (σk (Tn (b))) ≤ nε = ε, n k=1 n 2π 2π 1 f (|b(eiθ )|)dθ − fm0 (|b(eiθ )|)dθ 2π 0 0 2π 1 f (|b(eiθ )|) − fm (|b(eiθ )|) dθ ≤ 1 2π ε = ε, 0 2π 0 2π n
≤ 1 2π ≤ and as
2π n 1 1 fm0 (σk (Tn (b))) − fm0 (|b(eiθ )|)dθ < ε n 2π 0 k=1
for all sufficiently large n due to Theorem 9.15, we get 2π n 1 1 iθ f (σk (Tn (b))) − f (|b(e )|)dθ < 3ε n 2π 0 k=1
i
i i
i
i
i
i
9.6. The Limiting Measure
buch7 2005/10/5 page 223 i
223
whenever n is large enough. This implies (9.17). Let E be a (Lebesgue) measurable subset of R. Given n ∈ N, we denote by Nn (E) the number of singular values of Tn (b) in E (multiplicities taken into account): Nn (E) =
n
χE (σk (Tn (b))).
k=1
We define the measure μn by μn (E) =
1 Nn (E) n
and we let μ denote the measure given by 2π 1 1 χE (|b(eiθ )|)dθ = μ(E) = {t ∈ T : |b(t)| ∈ E} , 2π 0 2π where | · | stands for the Lebesgue measure on T. Corollary 9.17. If b is a Laurent polynomial, then the measures μn converge weakly to the measure μ, that is, f dμn → f dμ R
R
for every compactly supported continuous function f : R → C. Proof. Since R
1 f (σk (Tn (b))), n k=1 n
f dμn =
f dμ = R
1 2π
2π
f (|b(eiθ )|)dθ, 0
this is a straightforward consequence of Corollary 9.16. Obviously, all singular values of Tn (b) lie in [0, max |b|]. Corollary 9.18. Let b is a Laurent polynomial of the form b(t) = E ⊂ R is any segment, then | Nn (E) − nμ(E) | ≤ 14r for all n ≥ 1,
r
j =−r
bj t j (t ∈ T). If (9.18)
and if E = [min |b|, max |b|], then even | Nn (E) − n | ≤ 7r for all n ≥ 1.
(9.19)
Proof. Theorem 9.15 with f = χE and [c, d] = [0, max |b|] gives |Nn (E) − nμ(E)| ≤ 7rV[c,d] (χE ). Since V[c,d] (χE ) ≤ 2, we get (9.18). If E = [min |b|, max |b|], then μ(E) = 1 and V[c,d] (χE ) ≤ 1. This yields (9.19). Our next objective is an improvement of estimate (9.19). For this purpose we need the following analogue of Theorem 9.7.
i
i i
i
i
i
i
224
buch7 2005/10/5 page 224 i
Chapter 9. Singular Values
Theorem 9.19 (Cauchy’s interlacing theorem). Let A be a Hermitian n × n matrix and let B = Pn−1 APn−1 be the (n − 1) × (n − 1) principal submatrix. Then λ1 (B) ≤
λ1 (A)
≤ λ1 (B)
λ2 (A)
≤
λn−2 (B) ≤
... λn−1 (A)
λn−1 (B) ≤
λn (A).
Theorem 9.20. Let b(t) =
r
j =−s
λ2 (B)
≤ λn−1 (B)
bj t j (t ∈ T) with r, s ≥ 0. Then
Nn ([0, min |b|)) ≤ r + s for all n ≥ 1.
(9.20)
Proof. If n ≤ r + s, then (9.20) is trivial. So let n ≥ r + s + 1. We have Tn (b)Tn (b) = Tn (|b|2 ) − Pn Ks Pn − Wn Lr Wn , where Ks and Lr are infinite matrices whose entries outside the upper-left s × s and r × r blocks, respectively, vanish (see the beginning of the proof of Theorem 5.8). Thus, we may think of Tn (b)Tn (b) as resulting from Tn−r−s (|b|2 ) by r + s times adding a row and a column. On r + s times employing Theorem 9.19, we get λ1 (Tn−r−s (|b|2 )) ≤ λr+s+1 (Tn (b)Tn (b)). As λ1 (Tn−r−s (|b|2 ) ≥ min |b|2 by Corollary 4.28, it follows that λr+s+1 (Tn (b)Tn (b)) ≥ min |b|2 . Consequently, at most r + s eigenvalues of Tn (b)Tn (b) are located in [0, min |b|2 ). This is equivalent to saying that at most r + s singular values of Tn (b) lie in the set [0, min |b|).
Let b(t) = rj =−s bj t j (t ∈ T) with r, s ≥ 0 and suppose min |b| > 0. Denote by k the winding number of b about the origin. Since b(t) = t −s (b−s + b−s+1 t + · · · + br t r+s ) and since k is the difference of the number of zeros and the number of poles of b(z) in the unit disk, we see that |k| ≤ max(r, s). Form Theorem 9.4 we know that if n is sufficiently large, then at least |k| singular values of Tn (b) lie in [0, min |b|), and Theorem 9.20 shows that, for every n ≥ 1, at most r + s singular values of Tn (b) are contained in [0, min |b|).
9.7
Proper Clusters
Let E be a subset of R and denote by γn (E) the number of the singular values of Tn (b) (multiplicities taken into account) that do not belong to E. Thus, with Nn (E) as in Section 9.6, γn (E) = n − Nn (E). For ε > 0, put Uε (E) = {λ ∈ R : dist (λ, E) < ε}. Tyrtyshnikov calls E a cluster and a proper cluster for (Tn (b)) if, respectively, γn (Uε (E)) = o(n) and γn (Uε (E)) = O(1) for each ε > 0. Put R(|b|) = [min |b|, max |b|]. Theorem 9.21. Let b be a Laurent polynomial of degree r. Then γn (R(|b|)) ≤ 7r and hence R(|b|) is a proper cluster for (Tn (b)). If E is a subset of R(|b|) and the closure of E is properly contained in R(|b|), then E is not a cluster for (Tn (b)). Proof. Formula (9.19) is equivalent to the inequality γn (R(|b|)) ≤ 7r. As γn (Uε (R(|b|))) ≤ γn (R(|b|)),
i
i i
i
i
i
i
9.8. Norm of Matrix Times Random Vector
buch7 2005/10/5 page 225 i
225
it follows that R(|b|) is a proper cluster. Now let E ⊂ R(|b|) and suppose R(|b|) \ E contains some interval (c, d) with c < d. The R(|b|) \ Uε (E) also contains some interval (cε , dε ) with cε < dε if only ε > 0 is sufficiently small. Clearly, μ((cε , dε )) =
1 | {t ∈ T : |b(t)| ∈ (cε , dε )} | =: δε > 0. 2π
From formula (9.18) we therefore obtain that γn (Uε (E)) = Nn (R(|b|) \ Uε (E)) ≥ Nn ((cε , dε )) ≥ nμ((cε , dε )) − 14r = nδε − 14r, which shows that γn (Uε (E))/n does not converge to zero. Thus, E cannot be a cluster for (Tn (b)).
9.8
Norm of Matrix Times Random Vector
Let An be a real n×n matrix and let σ1 ≤ σ2 ≤ · · · ≤ σn be the singular values of An . We have An x2 ≤ An 2 for every unit vector x ∈ Rn , and the set {An x2 /An 2 : x2 = 1} coincides with the segment [σ1 /σn , 1]. The purpose of this section is to show that for a randomly chosen unit vector x the value of An x22 /An 22 typically lies near 1 σ12 + · · · + σn2 . σn2 n
(9.21)
Notice that σn = An 2 and that σ12 + · · · + σn2 = An 2F , where An F is the Frobenius (or Hilbert-Schmidt norm). Thus, if An 2 = 1, then for a typical unit vector x the value of An x22 is close to An 2F /n. Obviously, in the case where An is a large Toeplitz matrix, the expression (9.21) can be tackled by the Avram-Parter formula (9.11). Let Bn = {x ∈ Rn : x2 ≤ 1} and Sn−1 = {x ∈ Rn : x2 = 1}. For a given real n × n matrix An , we consider the random variable Xn (x) =
An x2 , An 2
where x is uniformly distributed on Sn−1 . For k ∈ N, the expectation of Xnk is An xk2 1 k EXn = dσ (x), |Sn−1 | Sn−1 An k2 where dσ is the surface measure on Sn−1 . The variance of Xnk is 2 2 σ 2 Xnk = E Xnk − EXnk = EXn2k − EXnk . Lemma 9.22. For every natural number k, 1 An xk2 An xk2 1 dσ (x) = dx. |Sn−1 | Sn−1 An k2 |Bn | Bn An k2 xk2
i
i i
i
i
i
i
226
buch7 2005/10/5 page 226 i
Chapter 9. Singular Values
Proof. Using spherical coordinates, x = rx with x ∈ Sn−1 , we get 1 An xk2 r k An x k2 n−1 1 dx = r dσ (x )dr = An x k2 dσ (x ), k rk n Sn−1 0 Bn x2 Sn−1 and since |Sn−1 | =
2π n/2 (n/2)
and
|Bn | =
π n/2 (n/2 + 1)
(9.22)
and thus |Sn−1 |/n = |Bn |, the assertion follows. Theorem 9.23. If An = 0, then 1 σ12 + · · · + σn2 , σn2 n 2 4 4 2 2 σ + · · · + σ + · · · + σ 2 1 σ n n 1 1 . σ 2 Xn2 = − n + 2 σn4 n n EXn2 =
(9.23) (9.24)
Proof. Let An = Un Dn Vn be the singular value decomposition. Thus, Un and Vn are orthogonal matrices and Dn = diag (σ1 , . . . , σn ). By Lemma 9.22, 1 Un Dn Vn x22 2 EXn = dx |Bn | Bn Un Dn Vn 22 x22 Dn Vn x22 Dn x22 1 1 dx = dx = |Bn | Bn Dn 22 Vn x22 |Bn | Bn Dn 22 x22 σ12 x12 + · · · + σn2 xn2 1 (9.25) = dx1 . . . dxn . |Bn | Bn σn2 (x12 + · · · + xn2 ) A formula by Liouville states that if λ < (p1 + · · · + pn )/2, then p −1 p −1 x1 1 . . . xn n ··· dx1 . . . dxn (x12 + · · · + xn2 )λ x1 , . . . , xn ≥ 0 x12 + · · · + xn2 ≤ 1 p p 1 n . . . 1 2 2 = p1 + · · · + pn n p1 + · · · + pn 2 −λ 2 2
(9.26)
(see, e.g., [120, No. 676.14]). From (9.22) and (9.26) we infer that xj2 1 dx |Bn | Bn x12 + · · · + xn2 n−1 n 3 1 +1 n 2 1 2 2 2 = . = n/2 n − 1 3 3 n − 1 n π + −1 + 2n 2 2 2 2
i
i i
i
i
i
i
9.8. Norm of Matrix Times Random Vector This together with (9.25) gives (9.23). In analogy to (9.25), (σ12 x12 + · · · + σn2 xn2 )2 1 EXn4 = dx1 . . . dxn . |Bn | Bn σn4 (x12 + · · · + xn2 )2
buch7 2005/10/5 page 227 i
227
(9.27)
From (9.26) we obtain xj4 1
dx (x12 + · · · + xn2 )2 n−1 n 5 1 +1 n 3 2 2 2 2 = = , n/2 n−1 5 n−1 5 n(n + 2) π + −2 + 2n 2 2 2 2
|Bn |
1 |Bn |
Bn
xj2 xk2
dx (x12 + · · · + xn2 )2 n−2 2 n 3 1 +1 n 2 1 2 2 2 = = , n − 2 n − 2 3 3 3 3 n(n + 2) π n/2 n 2 + + −2 ++ + 2 2 2 2 2 2 Bn
whence, by (9.27), EXn4 =
n σj4 j =1
=
σj2 σk2 3 1 + 2 4 n(n + 2) σn4 n(n + 2) σ n j 0. This reveals that for large n the values of An x22 /(An 22 x22 ) cluster around 1 σ12 + · · · + σn2 . σn2 n Notice also that σ 2 Xn2 can be written as
2 σj2 − σi2 2 1 . σ 2 Xn2 = n + 2 σn4 i<j n
Figures 9.2 to 9.4 illustrate the phenomenon by two examples. Obvious modifications of the proof of Theorem 9.23 show that Theorem 9.23 remains true for complex matrices on Cn with the 2 norm.
i
i i
i
i
i
i
228
buch7 2005/10/5 page 228 i
Chapter 9. Singular Values
2
2
1.5
1.5
1
1
0.5
0.5
0
0
−0.5 0
10
20
30
40
50
−0.5 0
10
20
30
40
50
Figure 9.2. Let An be the n × n matrix all entries of which are 1. We see the values An x22 /An 22 for 50 vectors x that were randomly drawn from the unit sphere of Rn with the uniform distribution. Note that the expected value of An x22 /An 22 is 1/n and that the variance is less than 2/n2 . The n is 20 in the left picture and 100 in the right.
1
1
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0 0
10
20
30
40
50
0 0
10
20
30
40
50
Figure 9.3. Let b(t) = t + t −1 . The pictures show Tn (b)x22 /Tn (b)22 for 50 vectors x that were randomly drawn from the unit sphere of Rn with the uniform distribution. We have n = 30 in the left picture and n = 600 in the right. The expected value for Tn (b)x22 /Tn (b)22 converges to 0.5 as n → ∞.
i
i i
i
i
i
i
9.9. The Case of Toeplitz and Circulant Matrices
500
buch7 2005/10/5 page 229 i
229
2000
400
1500
300 1000 200 500
100 0 0
20
40
60
80
0 0
100
20
40
60
80
100
Figure 9.4. The symbol is again b(t) = t + t −1 . The pictures show the distribution of 100 Tn (b)x22 /Tn (b)22 for 10000 vectors x that were randomly drawn from the unit sphere of Rn with the uniform distribution. In the left picture we took n = 30 and in the right n = 600. Notice the different scales of the vertical axes.
9.9 The Case of Toeplitz and Circulant Matrices We need one more simple auxilary result. Lemma 9.24. Let EXn2 = μ2n and suppose μn → μ as n → ∞. If ε > 0 and |μn − μ| < ε, then P (|Xn − μ| ≥ ε) ≤
μ2n (ε
σ 2 Xn2 . − |μn − μ|)2
Proof. We have P (|Xn − μ| ≥ ε) ≤ P |Xn − μn | ≥ ε − |μn − μ| ≤ P |Xn − μn |(Xn + μn ) ≥ μn (ε − |μn − μ|) = P |Xn2 − μ2n | ≥ μn (ε − |μn − μ|) , and the assertion is now immediate from Chebyshev’s inequality. Now let b be a Laurent polynomial and let σ1 (Tn (b)) ≤ · · · ≤ σn (Tn (b)) be the singular values of Tn (b). We abbreviate σj (Tn (b)) to σj . The Avram-Parter formula (9.11) tells us that 2π f (σ1 ) + · · · + f (σn ) 1 = f (|b(eiθ )|)dθ (9.29) lim n→∞ n 2π 0
i
i i
i
i
i
i
230
buch7 2005/10/5 page 230 i
Chapter 9. Singular Values
for every compactly supported function f : R → C with bounded variation. In particular, σ1k + · · · + σnk 1 = bkk := n→∞ n 2π
2π
lim
|b(eiθ )|k dθ
(9.30)
0
for every natural number k. Moreover, if T (b) is invertible, then σ1 = σ1 (Tn (b)) stays away from zero as n → ∞ (Theorem 3.7), and hence (9.29) with f (s) equal to a negative integral power of s times the characteristic function of [m, M] for appropriate 0 < m < M shows that (9.30) is true for every integer k. Theorem 9.25. If |b| is not constant on the unit circle T, then for each ε > 0 there is an n0 such that Tn (b)x2 3 1 b44 − b22 b2 P ≥ε ≤ − (9.31) Tn (b)2 x2 b∞ n + 2 ε 2 b22 b2∞ for all n ≥ n0 . If |b| is constant throughout T, then Tn (b)x2 1 P ≤1−ε =o Tn (b)2 x2 n
(9.32)
for each ε > 0. Proof. Put $ 1 μn = σn
σ12 + · · · + σn2 , n
μ=
b2 . b∞
Suppose first that |b| is not constant. Then b4 > b2 . From (9.30) we know that μn → μ. Moreover, (9.30) and Theorem 9.23 imply that n+2 2 2 1 σ Xn → b44 − b42 . 4 2 b∞ Thus, Lemma 9.24 shows that P (|Xn − μ| ≥ ε) ≤
1 1 3 4 4 b − b 4 2 n + 2 b4∞ μ2 ε 2
for all sufficiently large n, which is (9.31). On the other hand, if |b| is constant, we infer from (9.30) and Theorem 9.23 that μn → 1
and
n+2 2 2 σ Xn = o(1), 2
whence, by Lemma 9.24, P (Xn ≤ 1 − ε) ≤
1 3 1 o(1) 2 = o , n+2 ε n
i
i i
i
204
SING METHODS FOR DIFFERENTIAL EQUATIONS
The vectors
and
have dimension mxmy x 1, where Theorem A.22 has been used. Note that co(Z/(2)) corresponds to a natural or lexicographic ordering of the sine gridpoints from left to right, bottom to top. Thus (zt>3/j) follows (xk, yi) if yj > yi or if yj = yi and z; > z*. For purposes of illustrating a solution method for the general equation (5.84) or equivalently (5.101), assume that D(^)^4(v)JD(0!c) and [D((f>'y)A(w)D((j)'y)]Tare diagonalizable (this will depend on the choice of weight function). Diagonalizability guarantees two nonsingular matrices P and Q such that
and
From Appendix A.3, (5.84) is equivalent to
where and
Thus if the spectrums of the matrices are denoted by
and
i
i
i
232
buch7 2005/10/5 page 232 i
Chapter 9. Singular Values
with 0 < σj ≤ σj +1 ≤ · · · ≤ σn and j ≤ k, and from Theorem 9.23 we infer that 1 1 1 EXn2 = σj2 + ··· + 2 n σn σj2 2 σj2 σj2 1 σj 1 1 = + ··· + 2 + + ··· + 2 2 n σj2 n σn σk σk+1 1 C 2 e−2γ n n − k (k − j + 1) + n n λ2 2 −2γ n k C e k+1 ≤ + ≤ n λ2 n ≤
for all sufficiently large n. Also by Theorem 9.23, ⎛ 2 ⎞ 1 2 1 1 1 1 σ 2 Xn2 = + ··· + 4 − + ··· + 2 ⎠, σ4 ⎝ n(n + 2) j σj4 σn n σj2 σn and, analogously,
C 4 e−4γ n (n − k) ≤ k + 1, λ4 2 2 1 1 C 2 e−2γ n (n − k) 4 σj + · · · + ≤ k + ≤ (k + 1)2 , σn2 λ2 σj2 σj4
1 1 + ··· + 4 σn σj4
≤k+
which gives σ 2 Xn2 = O(1/n2 ). If ε > 0, then k + 1 ε2 P (Xn ≥ ε) = P (Xn2 ≥ ε2 ) ≤ P Xn2 ≥ + n 2 for all sufficiently large n, and thus, ε2 2 2 P (Xn ≥ ε) ≤ P Xn ≥ EXn + 2 2 4 2 2 ε 1 2 2 ≤ P |Xn − EXn | ≥ . ≤ 4 σ Xn = O 2 ε n2 Define the circulant matrices Cn (b) as in Section 2.1. The singular values of Cn (b) j are |b(ωn )| (j = 0, . . . , n − 1), where ωn = e2π i/n . The only Laurent polynomials b of constant modulus are b(t) = αt k (t ∈ T) with α ∈ C; in this case Cn (b)x2 = |α| x2 for all x. Theorem 9.28. If |b| is not constant, then for each ε > 0 there exists an n0 such that Cn (b)x2 3 1 b44 − b22 b2 P ≥ ε ≤ − for all n ≥ n0 . Cn (b)2 x2 b∞ n + 2 ε 2 b22 b2∞
i
i i
i
i
i
i
9.10. The Nearest Structured Matrix
buch7 2005/10/5 page 233 i
233
Proof. The proof is analogous to the proof of (9.31). Note that now (9.30) amounts to the fact that the integral sum σ1k + · · · + σnk 1 = |b(e2π ij/n )|k n n j =0 n−1
converges to the Riemann integral 1 2π iθ k |b(e )| dθ = 0
2π
|b(eiθ )|k
0
dθ =: bkk . 2π
j
Furthermore, it is obvious that σn = max |b(ωn )| → b∞ . If b has no zeros on T, then Cn−1 (b) = Cn (b−1 ), and hence Theorem 9.28 delivers Cn−1 (b)x2 3 1 b−1 44 − b−1 22 b−1 2 P ≥ ε ≤ − (9.33) b−1 ∞ n + 2 ε 2 b−1 22 b−1 2∞ Cn−1 (b)2 x2 for all sufficiently large n.
9.10 The Nearest Structured Matrix We denote by Mn (R) the linear space of all n×n matrices with real entries. Let An ∈ Mn (R) and let 0 ≤ σ1 ≤ · · · ≤ σn be the singular values of An . Suppose σn > 0. The random variable Xn2 = An x22 /An 22 assumes its values in [0, 1]. In this section we establish a few results on the distribution function of this random variable and we give a nice application to the problem of describing in probabilistic terms the distance of a matrix to the nearest matrix of a given structure. With notation as in the proof of Theorem 9.23, : : An x22 Dn Vn x22 Eξ := x ∈ Sn−1 : < ξ = x ∈ Sn−1 : 0 and β > 0. We first consider 2 × 2 matrices, that is, we let n = 2. From (9.34) we infer that F2 (ξ ) = P
σ12 2 2 x + x < ξ . 2 σ22 1
(9.35)
The constellation σ1 = σ2 is uninteresting, because F2 (ξ ) = 0 for ξ < 1 and F2 (ξ ) = 1 for ξ ≥ 1 in this case. Theorem 9.29. If σ1 < σ2 , then the random variable X22 is subject to the B( 21 , 21 ) distribution on (σ12 /σ22 , 1). 1 Proof. Put τ = σ1 /σ2 . By (9.35), F2 (ξ ) is 2π times the length of the piece of the unit 2 2 circle x1 + x2 = 1 that is contained in the interior of the ellipse τ 2 x12 + x22 = ξ . This gives F2 (ξ ) = 0 for ξ ≤ τ 2 and F2 (ξ ) = 1 for ξ ≥ 1. Thus, let ξ ∈ (τ 2 , 1). Then the circle and the ellipse intersect at the four points
⎛ " ⎞ $ 2 ⎝± 1 − ξ , ± ξ − τ ⎠ , 1 − τ2 1 − τ2 and consequently, $ 2 F2 (ξ ) = arctan π
ξ − τ2 , 1−ξ
which implies that F2 (ξ ) equals 1 1 (ξ − τ 2 )−1/2 (1 − ξ )−1/2 = (ξ − τ 2 )−1/2 (1 − ξ )−1/2 π B(1/2, 1/2) and proves that X22 has the B( 21 , 21 ) distribution on (τ 2 , 1). In the case n ≥ 3, things are more involved. An idea of the variety of possible distribution functions is provided by the class of matrices whose singular values satisfy 0 = σ1 = · · · = σn−2 < σn−1 < σn . For y ∈ (0, 1), the complete elliptic integrals K(y) and E(y) are defined by K(y) = 0
π/2
#
dϕ 1 − y 2 sin2 ϕ
,
E(y) =
π/2
%
1 − y 2 sin2 ϕ dϕ.
0
i
i i
i
i
i
i
9.10. The Nearest Structured Matrix
buch7 2005/10/5 page 235 i
235
Put μ = σn /σn−1 . In [52], we showed that on (0, 1/μ2 ) one has the following densities: ⎞ ⎛$ 2 − 1) μ ξ(μ ⎠, f3 (ξ ) = √ K⎝ 1−ξ π 1−ξ f4 (ξ ) = μ
(uniform distribution), ⎛$ ⎞ √ 3μ 1 − ξ ⎝ ξ(μ2 − 1) ⎠ f5 (ξ ) = E , π 1−ξ f6 (ξ ) = 2μ − μ(μ2 + 1)ξ , and f7 (ξ ) equals ⎛ ⎛$ ⎞ ⎛$ ⎞⎞ 2 − 1) 2 − 1) 5μ # ξ(μ ξ(μ ⎠ − (1 − ξ μ2 ) K ⎝ ⎠⎠ . 1 − ξ ⎝(4 − 2ξ − 2ξ μ2 ) E⎝ 3π 1−ξ 1−ξ In some particular cases, one gets a complete answer. Here is an example. Theorem 9.30. Let n ≥ 3. If σ1 = · · · = σn−m = 0 and σn−m+1 = · · · = σn > 0, then the random variable Xn2 is B( m2 , n−m ) distributed on (0, 1). 2 This can be proved by the argument of the proof of Theorem 9.29, the only difference being that now one has to compute some multidimensional integrals. A full proof is in [52]. Orthogonal projections have just the singular value pattern of Theorem 9.30. This leads to some pretty nice conclusions. Let E be an N -dimensional Euclidean space and let U be an m-dimensional linear subspace of E. We denote by PU the orthogonal projection of E onto U . Then for y ∈ E, the element PU y is the best approximation of y in U and we have y2 = PU y2 + y − PU y2 . The singular values of PU are N − m zeros and m units. Thus, Theorem 9.30 implies that if y is uniformly distributed on the unit sphere of E, then PU y2 has the B( m2 , N−m ) distribution on (0, 1). In particular, if N is large, then 2 #m PU y lies with high probability close to the sphere of radius N and the squared distance m . y − PU y2 clusters sharply around 1 − N Now take E = Mn (R). With the Frobenius norm · F , E is an n2 -dimensional Euclidean space. Let U = Str n (R) denote any class of structured matrices that form an m-dimensional linear subspace of Mn (R). Examples include the Toeplitz matrices, Toepn (R) the Hankel matrices, Hank n (R) the tridiagonal matrices, Tridiagn (R) the tridiagonal Toeplitz matrices, TridiagToepn (R) the symmetric matrices, Symmn (R) the lower-triangular matrices, Lowtriangn (R) the matrices with zero main diagonal, Zerodiagn (R) the matrices with zero trace, Zerotracen (R).
i
i i
i
i
i
i
236
buch7 2005/10/5 page 236 i
Chapter 9. Singular Values
The dimensions of these linear spaces are dim Toepn (R) = 2n − 1,
dim Hank n (R) = 2n − 1,
dim Tridiagn (R) = 3n − 2, dim Symmn (R) =
n +n , 2 2
dim Zerodiagn (R) = n2 − n,
dim TridiagToepn (R) = 3, dim Lowtriangn (R) =
n2 + n , 2
dim Zerotracen (R) = n2 − 1.
Suppose n is large and Yn ∈ Mn (R) is uniformly distributed on the unit sphere on Mn (R), Yn 2F = 1. Let PStr Yn be the best approximation of Yn by a matrix in Str n (R). Notice that the determination of PStr Yn is a least squares problem that can be easily solved. For instance, PToep Yn is the Toeplitz matrix whose kth diagonal, k = −(n − 1), . . . , n − 1, is formed by the arithmetic mean of the numbers in the kth diagonal of Yn . Recall that dim Str n (R) = m. 2 From what was said in the preceding paragraph, we conclude that PStr Yn 2F is B( m2 , n −m ) 2 2 2n−1 n −2n+1 2 distributed on (0, 1). For example, PToep Yn has the B( 2 , ) distribution on 2 (0, 1). The expected value of the variable Yn − PToep Yn 2 is 1 − n2 + n12 and the variance does not exceed n43 . Hence, Chebyshev’s inequality gives 2 ε 2 ε 1 1 4 2 P 1 − + 2 − < Yn − PToep Yn < 1 − + 2 + (9.36) ≥ 1− 2. n n n n n n nε Consequently, PToep Yn is with high probability found near the sphere with the radius % 2 − n12 and Yn − PToep Yn 2F is tightly concentrated around 1 − n2 + n12 . n We arrive at the conclusion that nearly all n × n matrices of Frobenius norm 1 are at nearly the same distance to the set of all n × n Toeplitz matrices! This does not imply that the Toeplitz matrices are at the center of the universe. In fact, the conclusion is true for each of the classes Str n (R) listed above. For instance, from Chebyshev’s inequality we obtain 1 1 1 1 1 P (9.37) − − ε < Yn − PSymm Yn 2 < − +ε ≥1− 2 2 2 2n 2 2n 2n ε and
P
1 ε 1 ε − 2 < Yn − PZerotrace Yn 2 < 2 + 2 2 n n n n
≥1−
2 n2 ε 2
.
If the expected value of Yn − PStruct Yn 2 stays away from 0 and 1 as n → ∞, we have much sharper estimates. Namely, Lemma 2.2 of [93] in conjunction with Theorem 9.30 implies that if Xn2 has the B( m2 , N −m ) distribution on (0, 1), then 2 m/2 m m 1−τ m/2 P Xn2 ≤ σ , P Xn2 ≥ τ (9.38) ≤ σ e1−σ ≤ τe N N for 0 < σ < 1 < τ . This yields, for example, 1 1 1 1 P σ − < Yn − PSymm Yn 2F < τ − 2 2n 2 2n 1−σ (n2 +n)/4 1−τ (n2 +n)/4 − τe (9.39) ≥ 1 − σe
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 237 i
237
whenever 0 < σ < 1 < τ . Clearly, (9.39) is better than (9.37). On the other hand, let ε > 0 be small and choose τ such that τ (1 − n2 + n12 ) = 1 − n2 + n12 + nε . Then ε2 (τ e1−τ )n−1/2 = 1 − 2n + O ( n12 ), the O depending on ε, and hence (9.38) amounts to 2 ε 1 ε2 1 2 P Yn − PToep Yn ≥ 1 − + 2 + , ≤1− +O n n n 2n n2 which is worse than the Chebyshev estimate (9.36).
Exercises p
1. Let n be the space Cn with the p norm. Does every n × n matrix An have a p representation An = Un Sn Vn where Un and Vn induce invertible isometries on n and Sn is a diagonal matrix? 2. Let {An }, {Bn }, {En }, {Rn } be sequences of n × n matrices and suppose An = Bn + En + Rn ,
En 2F = o(n),
rank Rn = o(n).
Let λj (Cn ) and σj (Cn ) (j = 1, . . . , n) denote the eigenvalues and singular values of an n × n matrix Cn . (a) Prove that if, in addition, An , Bn , En , Rn are all Hermitian, then the eigenvalues of An and Bn are tied by the relation 1 ϕ(λj (An )) − ϕ(λj (Bn )) = 0 n→∞ n j =1 n
lim
for every continuous function ϕ : R → C with compact support. (b) Show that if Cn is an arbitrary n × n matrix and Hn denotes the Hermitian matrix ( C0n∗ C0n ), then {λ1 (Hn ), . . . , λ2n (Hn )} = {σ1 (Cn ), . . . , σn (Cn ), −σ1 (Cn ), . . . , −σn (Cn )}. (c) Deduce from (a) and (b) that the singular values of An and Bn satisfy 1 ϕ(σj (An )) − ϕ(σj (Bn )) = 0 n→∞ n j =1 n
lim
for every compactly supported continuous function ϕ : R → C. 3. Let X1 , . . . , Xn be independent random variables subject to the Gaussian normal distribution with mean 0 and variance 1. Show that (X1 , . . . , Xn ) % X12 + · · · + Xn2 is uniformly distributed on the unit sphere Sn−1 .
i
i i
i
i
i
i
238
buch7 2005/10/5 page 238 i
Chapter 9. Singular Values
4. Let An be the n × n matrix
⎛
1 ⎜ 1 An = ⎜ ⎝ ... 1
1 ... 1 ... ... ... 1 ...
⎞ 1 1 ⎟ ⎟, ... ⎠ 1
let x = (x1 , . . . , xn ) be uniformly distributed on Sn−1 , and consider the random variable Xn2 = An x22 /An 22 . (a) Show that the inequality An x22 ≤ An 22 x22 is the inequality (x1 + · · · + xn )2 ≤ n (x12 + · · · + xn2 ). (b) Compute the singular values of An . (c) Show that EXn2 =
1 , n
σ 2 Xn2 =
2 1 n+2 n
1 1− . n
Use Chebyshev’s inequality to deduce that the inequality n (9.40) (x1 + · · · + xn )2 ≤ (x12 + · · · + xn2 ) 2 is true with probability of at least 90 % for n ≥ 18 and with probability of at least 99 % for n ≥ 57 and that the inequality n (x1 + · · · + xn )2 ≤ (9.41) (x 2 + · · · + xn2 ) 100 1 is true with probability of at least 90 % for n ≥ 895 and with probability of at least 99 % for n ≥ 2829. (d) Prove that Xn2 is subject to the B ( 21 , n−1 ) distribution on (0, 1) and use this 2 insight to show that (9.40) is true with probability of at least 90 % for n ≥ 6 and with probability of at least 99 % for n ≥ 12 and that (9.41) is true with probability of at least 90 % for n ≥ 271 and with probability of at least 99 % for n ≥ 662. 5. Every Hilbert space operator A with closed range has a well-defined Moore-Penrose inverse A+ . Let A = T (a) : 2 → 2 with a ∈ W and suppose T (a) has closed range. Heinig and Hellinger [154] (also see [71]) showed that Tn+ (a) converges strongly to T + (a) if and only if a(t) = 0 for t ∈ T and one of the following conditions is satisfied: wind a = 0,
(9.42) −1
wind a > 0 and (a )−m = 0 for all sufficiently large m, −1
wind a < 0 and (a )m = 0 for all sufficiently large m,
(9.43) (9.44)
where (a −1 )j denotes the j th Fourier coefficient of a −1 . Show that if a ∈ P, then (9.43) or (9.44) are only possible if a is of the form a(t) = t k p+ (t) with k ≥ 1 and a polynomial p+ ∈ P + such that p+ (z) = 0 for |z| ≤ 1 or a(t) = t −k p− (t) with k ≥ 1 and a polynomial p− ∈ P − such that p− (1/z) = 0 for |z| ≤ 1.
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 239 i
239
6. To compute the numerical range H2 (An ) = {(An x, x) : x2 = 1} of an n × n matrix An one could try drawing a large number N of random vectors xj from the uniform distribution on the unit sphere of Cn and plotting the superposition of the values (An xj , xj ) (j = 1, . . . , N). In the right picture of Figure 9.5 we see the result for An = Tn (b) with b as in Figure 7.1, n = 50 and N = 500. As the numerical range contains at least the convex hull of the eigenvalues, which are shown in the left picture, we conclude that our experiment failed dramatically. Why did it fail? 20
20
15
15
10
10
5
5
0
0
−5
−5
−10
−10
−15
−15
−20
−20
−25
−25
−30
−20
−10
0
10
20
−30
−20
−10
0
10
20
Figure 9.5. We see the range b(T) and the 50 eigenvalues of T50 (b) (left) and the values (T50 (b)x, x) for 500 vectors x drawn randomly from the uniform distribution on the unit sphere of C50 (right).
Notes Proofs of the theorems of Section 9.1 are in [27], [133], [166], for example. The splitting phenomenon, Theorem 9.4, was discovered by Roch and Silbermann [224], [225]. They proved Theorem 9.4 for p = 2 and their proof is based on C ∗ -algebra techniques. In [35], another proof was given and the result was extended to 1 ≤ p ≤ ∞. Estimate (9.2) was established in our paper [47]. The proof of Theorem 9.4 presented here is a combination of arguments of [35] and [48]. The idea of deriving results on Toeplitz matrices by comparing Toeplitz matrices with their circulant cousins has been developed by Shigeru Arimoto and coauthors in a series of papers since about 1985 which deal with problems of theoretical chemistry and in which sequences of banded circulant matrices are called alpha matrices. Independently, the same idea has emerged in papers by Beam and Warming [20], Tyrtyshnikov [284], and Serra Capizzano and Tilli [251]. Theorems 9.6 to 9.9 are from our book [48], but we are sure that versions of these theorems had been known earlier. We noticed in [48] that we found these theorems in our manuscripts but that we cannot remember whether we obtained them ourselves some time ago or whether we took them from somewhere. Theorems 9.7 and 9.19 can be found in [27] and [167]. We thank Estelle Basor for pointing out that Theorem 9.7 was incorrectly stated in our previous book [48].
i
i i
i
i
i
i
240
buch7 2005/10/5 page 240 i
Chapter 9. Singular Values
Formula (9.9) is Widom’s [295] and its generalization (9.8) is due to Roch and Silbermann [222]. Formula (9.11), the so-called Avram-Parter theorem, was established by Parter [200] for symbols b ∈ L∞ which are locally normal, that is, which can be written as the product of a continuous and a real-valued function. Avram [10] proved (9.11) for general b ∈ L∞ . We refer the reader to Sections 5.6 and 5.8 of [71] and to [301] for further generalizations of the Avram-Parter theorem. The elegant approach and the results of Section 9.6 are due to Zizler, Zuidwijk, Taylor, and Arimoto [303]. The notions of the cluster and the proper cluster were introduced by Tyrtyshnikov in [284]. Sections 9.8 to 9.10 are based on our paper [52]. A solution to Exercise 1 is in [71] (see also [250]). Exercise 2 is a result of Tyrtyshnikov [282], [284]. Further results: C ∗ -algebras I. The purpose of the following is to illustrate how a few simple C ∗ -algebra arguments yield part of the results established in the previous chapters very quickly. Of course, the C ∗ machinery forces us to limit ourselves to the case p = 2. A Banach algebra is a complex Banach space A with an associative and distributive multiplication such that ab ≤ a b for all a, b ∈ A. If a Banach algebra has a unit element, which is usually denoted by e, 1, or I , it is referred to as a unital Banach algebra. A conjugate-linear map a → a ∗ of a Banach algebra into itself is called an involution if a ∗∗ = a and (ab)∗ = b∗ a ∗ for all a, b ∈ A. Finally, a C ∗ -algebra is a Banach algebra with an involution such that a ∗ a = a2 for all a ∈ A. Notice that C ∗ -algebras are especially nice Banach algebras, although the terminology suggests the contrary. If H is a Hilbert space, then the sets B(H ) and K(H ) of all bounded and compact linear operators on H are C ∗ -algebras under the operator norm and passage to the Hermitian adjoint as involution. This is in particular the case for B := B(2 ) and K := K(2 ). Clearly, B is unital, but K has no unit element. The set C := C(T) is a C ∗ -algebra under the norm · ∞ and the involution a → a (passage to the complex conjugate). The Wiener algebra W is a Banach algebra with an involution, a → a, but it is not a C ∗ -algebra, because the equality aaW = a2W is not satisfied for all a ∈ W . In the case p = 2, the Banach algebras F and S introduced in Section 3.4 are C ∗ -algebras. The involution is defined by {An }∗ := {A∗n }. A subset of a C ∗ -algebra A that is itself a C ∗ -algebra is called a C ∗ -subalgebra. Let A be a C ∗ -algebra and let E be a subset of A. The C ∗ -algebra generated by E is the smallest C ∗ -subalgebra of A that contains E. (In other words, the C ∗ -algebra generated by E is the intersection of all C ∗ -subalgebras of A that contain E.) If this C ∗ -algebra is A itself, one says that A is generated by E. One of the many excellent properties of C ∗ -algebras is their inverse closedness. This means the following. If A1 is a unital C ∗ -algebra with the unit element e and A2 is a C ∗ -subalgebra of A1 which contains e, then every element a ∈ A2 that is invertible in A1 is automatically invertible in A2 . So far we have considered the Toeplitz operator T (a) for a ∈ W only. For a ∈ C (or even a ∈ L∞ (T)), this operator is also defined via the matrix (aj −k )∞ j,k=1 formed of the Fourier coefficients. It is not difficult to prove that T (a) is bounded on 2 for a ∈ C (or even a ∈ L∞ (T)). If b ∈ P, then the sequence {Tn (b)} is an element of the C ∗ -algebra F. 
Let A denote the C ∗ -algebra generated by E = {{Tn (b)} : b ∈ P} in F. Let finally G be the
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 241 i
241
set of all {An } ∈ F for which An 2 → 0 as n → ∞. Theorem on A. The C ∗ -algebra A is the set of all sequences {An } of the form An = Tn (a) + Pn KPn + Wn LWn + Cn
(9.45)
with a ∈ C, K ∈ K, L ∈ K, {Cn } ∈ G. This theorem was established in [66] (proofs are also in [70, Proposition 7.27] and [62, Proposition 2.2]). A C ∗ -algebra homomorphism is a map f : A1 → A2 of a C ∗ -algebra A1 into a ∗ C -algebra A2 satisfying f (αa) = αf (a),
f (a + b) = f (a) + f (b),
f (ab) = f (a)f (b),
f (a ∗ ) = f (a)∗
for all α ∈ C, a ∈ A1 , b ∈ A1 . The set G is obviously a closed two-sided ideal of the C ∗ -algebra A. Therefore the quotient algebra A/G is a C ∗ -algebra with the usual quotient operations and the usual quotient norm. The sum B ⊕B is the C ∗ -algebra of all ordered pairs (A, B) ∈ B 2 with the natural operations and the norm (A, B) := max(A2 , B2 ). Let {An } be given by (9.45). Then An → A := T (a) + K
and
:= T ( W n An Wn → A a) + L
strongly as n → ∞. Theorem on A/G. The map Sym defined by Sym : A/G → B ⊕ B, {An } + G → (A, A) is a C ∗ -algebra homomorphism that preserves spectra and norms. It is easily verified that Sym is a C ∗ -algebra homomorphism. Since the only compact Toeplitz operator is the zero operator (this is Corollary 1.13 for a ∈ C), it follows that Sym is injective. As injective homomorphisms of unital C ∗ -algebras automatically preserve spectra and norms (which is another exquisite property of C ∗ -algebras that is not shared by general Banach algebras), we arrive at the conclusion of the theorem. This simple reasoning goes back to [66], [70, Theorem 7.11] and is explicit in [34]. Here are some immediate consequences of the theorem on A/G. Consequence 1. A sequence {An } ∈ A is stable on 2 if and only if A = T (a) + K = T ( and A a ) + L are invertible on 2 (Theorem 3.13 for p = 2). Indeed, due to the inverse closedness of A/G in F/G, the stability of {An } is equivalent to the condition that 0 does not belong to the spectrum of {An } + G in A/G. are invertible on 2 , then Consequence 2. If {An } ∈ A and both A and A −1 A−1 n = Tn (a ) + Pn XPn + Wn Y Wn + Dn
(9.46)
with X ∈ K, Y ∈ K, {Dn } ∈ G for all sufficiently large n (recall Section 3.5). Indeed, the assumption implies that ({An } + G)−1 ∈ A/G and the theorem on A therefore yields (9.46). Passing to the strong limit n → ∞ in (9.46) we get T −1 (a) = T (a −1 ) + X
and
T −1 ( a ) = T ( a −1 ) + Y,
i
i i
i
i
i
i
242
buch7 2005/10/5 page 242 i
Chapter 9. Singular Values
that is, we recover (3.15) and (3.16). Consequence 3. If {An } ∈ A, then 2) lim An 2 = max(A2 , A
n→∞
(Corollary 5.14 for p = 2). This follows from Theorem 3.1 and the equalities 2 ). lim sup An 2 = {An } + GA/G = Sym ({An } + G)B⊕B = max(A2 , A n→∞
are invertible on 2 , then Consequence 4. If {An } ∈ A and A and A −1 −1 lim A−1 n 2 = max(A 2 , A 2 )
n→∞
(Theorem 6.3 for p = 2). To see this, combine Consequences 2 and 3. is nonzero, Consequence 5. If {An } ∈ A and at least one of the operators A and A then 2 ) max(A−1 2 , A −1 2 ) lim κ2 (An ) = max(A2 , A
n→∞
(Corollary 6.4 for p = 2). This is straightforward from Consequences 3 and 4. Consequence 6. If {An } ∈ A and ε > 0, then (2) (2) (2) lim inf sp(2) ε An = lim sup spε An = spε A ∪ spε A n→∞
n→∞
(generalization of Theorem 7.7 in the case p = 2). Once Consequence 4 is available, this can be proved by the argument of the proof of Theorem 7.7. Consequence 7. If {An } ∈ A and A∗n = An for all n, then lim inf sp An = lim sup sp An = sp A ∪ sp A n→∞
n→∞
Then {An − λI } is stable (Consequence 1) (Lemma 9.10). To see this, let λ ∈ / sp A ∪ sp A. and hence the spectral radius of (An − λI )−1 remains bounded as n → ∞. This implies that / lim inf sp An . Then there exists a δ > 0 λ∈ / lim sup sp An . Conversely, let λ ∈ R and λ ∈ such that Uδ (λ) ∩ sp An = ∅ for infinitely many n, that is, Uδ (0) ∩ sp (An − λI ) = ∅ for infinitely many n. As An − λI is Hermitian, the spectral radius and the norm of (An − λI )−1 coincide, which gives that (An − λI )−1 2 ≤ 1/δ for infinitely many n. Consequently, we arrive at a subsequence {nk } such that {Ank − λI } and thus also {Wnk (Ank − λI )Wnk } is − λI . stable. Lemma 3.4 now yields the invertibility of A − λI and A Consequence 8. If {An } ∈ A, then lim inf (An ) = lim sup (An ) = (A) ∪ (A) n→∞
n→∞
(Corollary 9.11). Since {A∗n An } ∈ A, this follows from Consequence 7. Summary. Thus, we have demonstrated that many sharp convergence results can be obtained very comfortably by working with appropriate C ∗ -algebras. For more details and
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 243 i
243
for further developments of this idea we refer the reader to [36], [48], [223] and especially to Hagen, Roch, and Silbermann’s monograph [149]. Another approach to questions of numerical analysis via C ∗ -algebras was worked out by Arveson [7], [8], [9]. Fragments of this approach will be outlined in the notes to Chapter 14. We nevertheless want to emphasize that the C ∗ -algebra approach has its limitations. For example, it is restricted to Hilbert space operators. Moreover, refinements of the consequences cited above, such as estimates of the convergence speed, require hard analysis and hence the tools presented in the preceding chapters.
i
i i
i
i
buch7 2005/10/5 page 244 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 245 i
Chapter 10
Extreme Eigenvalues
In this chapter we embark on the extreme eigenvalues of Hermitian Toeplitz matrices and on estimates for the spectral radius of not necessarily Hermitian Toeplitz matrices.
10.1
Hermitian Matrices
Let b be a nonconstant Laurent polynomial. The matrix Tn (b) is Hermitian if and only if bj = b−j , that is, if and only if b is real valued. So suppose b is real valued and let m = min b(t),
M = max b(t).
t∈T
t∈T
By Lemma 4.7, 1 (Tn (b)x, x) = 2π
2π
b(eiθ )|f (eiθ )|2 dθ ,
(10.1)
0
where f (eiθ ) = x0 + x1 eiθ + · · · + xn−1 ei(n−1)θ . This implies that all eigenvalues of Tn (b) are contained in the open interval (m, M), m < λ1 (Tn (b)) ≤ λ2 (Tn (b)) ≤ · · · ≤ λn (Tn (b)) < M.
(10.2)
Theorem 10.1. Let b be a nonconstant real-valued Laurent polynomial, let R(b) = [m, M], and denote by 2α and 2β the maximal order of the zeros of b − m and M − b, respectively. Then for each fixed k, λk (Tn (b)) − m
1 1 , M − λn−k (Tn (b)) 2β , 2α n n
where the notation xn yn means that there are constants C1 , C2 ∈ (0, ∞) such that C1 yn ≤ xn ≤ C2 yn for all sufficiently large n. Proof. Put a = b − m. Then, by (10.1), (Tn (a)x, x) ≥ 0 for all x ∈ 2n . This shows that (Tn∗ (a)Tn (a))1/2 = (Tn (a)Tn (a))1/2 = Tn (a). Consequently, the eigenvalues of Tn (a) 245
i
i i
i
i
i
i
246
buch7 2005/10/5 page 246 i
Chapter 10. Extreme Eigenvalues
coincide with its singular values. Theorem 9.8 therefore gives λk (Tn (a)) ≤ Ek n−2α for all n with some Ek ∈ (0, ∞) independent of n. On the other hand, from Theorem 4.32 we infer that λk (Tn (a)) ≥ λ1 (Tn (a)) = Tn−1 (a)−1 ≥ Dk n−2α for all n with some constant Dk ∈ (0, ∞) which does not depend on n. Thus, λk (Tn (b)) − m = λk (Tn (a)) n−2α . Repeating the above reasoning with a = M − b, we obtain that M − λn−k (Tn (b)) = λk+1 (Tn (a)) n−2β . In concrete cases, the estimates provided by Theorem 10.1 can be made more precise by comparing Toeplitz matrices with appropriate circulants. For n ≥ 2r + 1, define the circulant Cn (b) as in Sections 2.1 and 9.3. On using Theorem 9.19 r times, we obtain λk (Cn+r (b)) ≤ λk (Tn (b)) ≤ λk+r (Cn+r (b)), λn−k (Cn+r (b)) ≤ λn−k (Tn (b)) ≤ λn+r−k (Cn+r (b)).
(10.3) (10.4)
From Proposition 2.1 we know that if n ≥ r + 1, then the eigenvalues of Cn+r (b) are b(e2π ij/(n+r) )
(j = 0, 1, . . . , n + r − 1).
This in conjunction with (10.3) and (10.4) often gives good bounds for the extreme eigenvalues of Tn (b). Example 10.2. Let b be a nonnegative and nonconstant Laurent polynomial of degree r ≥ 1. Suppose b has exactly one zero on T. Without loss of generality assume that this is a zero of order 2α ≥ 2 at the point 1. Thus, if we write h(θ ) = b(eiθ ), then h(θ ) =
h(2α) (0) 2α 1 + O(θ 2α+1 ) . θ (2α)!
Put μα = h(2α) (0)/(2α)!. It is clear that, for sufficiently large n, λ1 (Cn+r (b)) = h(0) = 0, 2α 2π 2π λ2 (Cn+r (b)) ∼ h , ∼ μα n+r n 2α 2π 2π λ3 (Cn+r (b)) ∼ h , ∼ μα n+r n 2α 4π 4π λ4 (Cn+r (b)) ∼ h ∼ μα , n+r n ... 2π [j/2] 2α 2π [j/2] λj (Cn+r (b)) ∼ h . ∼ μα n+r n
(10.5)
Combining (10.3) and (10.5) we see that, for fixed k ≥ 1, 2π [k/2] 2α 2π [(k + r)/2] 2α 10 9 ≤ λk (Tn (b)) ≤ μα μα 10 n 9 n for all sufficiently large n.
i
i i
i
i
i
i
10.1. Hermitian Matrices
buch7 2005/10/5 page 247 i
247
Although the eigenvalue distribution of Toeplitz band matrices will be the subject of the next chapter, we already now have everything at our disposal in order to treat the Hermitian case. Since sp Tn (b) = m + sp Tn (b − m) and since the eigenvalues of Tn (b − m) coincide with the singular values of Tn (b − m), we can have immediate recourse to the results of Section 9.4. However, since Theorem 9.19 is a little bit sharper than Theorem 9.7, repetition of the reasoning of Section 9.6 with Theorem 9.7 replaced by Theorem 9.19 yields slightly better results. We begin with the analogue of Theorem 9.14. Theorem 10.3. Let r and n be natural numbers such that 1 ≤ r < n, let K = {k1 , . . . , kr } be a subset of {1, 2, . . . , n} consisting of r distinct elements, and put L = {1, 2, . . . , n} \ K. Suppose A and A are two Hermitian n × n matrices whose j k entries coincide for all (j, k) ∈ L × L. If f ∈ BV and [a, b] is any segment which contains the eigenvalues of both A and A , then n f (λk (A)) − f (λk (A )) ≤ rV[a,b] (f ). k=1
Proof. We proceed exactly as in the proof of Theorem 9.14. Define B as in that proof. Now Theorem 9.19 gives λ1 (A), λ1 (A ) ∈ [a, λ1 (B)], λ2 (A), λ2 (A ) ∈ [λ1 (B), λ2 (B)], ... λn−1 (A), λn−1 (A ) ∈ [λn−2 (B), λn−1 (B)], λn (A), λn (A ) ∈ [λn−1 (B), b], whence, as in the proof of Theorem 9.14, n f (λk (A)) − f (λk (A )) ≤ V[a,λ ] (f ) + V[λ ,λ ] (f ) + · · · + V[λ ,b] (f ) = V[a,b] (f ) 1 1 2 n−1 k=1
with λj := λj (B). Thus, the factor 3 that we encountered in the proof of Theorem 9.14 disappears.
r j Theorem 10.4. Let b(t) = j =−r bj t (t ∈ T) be a nonconstant real-valued Laurent polynomial and let f ∈ BV . If [c, d] is any segment which contains R(b) = [min b, max b], then for all n ≥ 1, n 2π n f (λk (Tn (b))) − f (b(eiθ ))dθ ≤ 3rV[c,d] (f ). 2π 0 k=1
Proof. This follows from the reasoning of the proof of Theorem 9.15. In (9.14) we can drop the factor 3, and considering u(θ ) = b(eiθ ) instead of u(θ ) = |b(eiθ )|2 , we can replace the factor 4 in (9.16) by the factor 2. This gives the assertion.
i
i i
i
i
i
i
248
buch7 2005/10/5 page 248 i
Chapter 10. Extreme Eigenvalues
Corollary 10.5. Let b be a real-valued Laurent polynomial and let f : R → C be a function with compact support. If f is continuous or of bounded variation, then 1 1 f (λk (Tn (b))) = n→∞ n 2π k=1 n
2π
f (b(eiθ ))dθ.
lim
0
Proof. Use the same arguments as in the proof of Corollary 9.16. Given a (Lebesgue) measurable subset E of R, we put Nn (E) =
n
χE (λk (Tn (b))),
μn (E) =
k=1
1 μ(E) = 2π
2π
χE (b(eiθ ))dθ =
0
1 Nn (E), n
1 |{t ∈ T : b(t) ∈ E}|, 2π
where | · | is Lebesgue measure on T. In the same way we proved Corollaries 9.17 and 9.18 we now obtain the following two results. Corollary 10.6. Let b be a real-valued Laurent polynomial. Then f dμn → f dμ R
R
for every compactly supported continuous function f : R → C. Corollary 10.7. If b is a real-valued Laurent polynomial of degree r and E ⊂ R is a segment, then |Nn (E) − nμ(E)| ≤ 6r for every n ≥ 1. Corollary 10.8. Let b be a real-valued Laurent polynomial. Then sp Tn (b) ⊂ [min b, max b]
(10.6)
for every n ≥ 1 and lim inf sp Tn (b) = lim sup sp Tn (b) = [min b, max b] ( = sp T (b) ). n→∞
n→∞
Proof. The inclusion (10.6) follows from (10.2). Let λ ∈ [min b, max b] and put E = [λ − ε, λ + ε], where ε > 0 can be chosen as small as desired. Then μ(E) > 0, and Corollary 10.7 therefore implies that Nn (E) ≥ 1 for all sufficiently large n. This shows that [min b, max b] ⊂ lim inf sp Tn (b).
10.2
First-Order Trace Formulas
In this section we establish some simple results which are of interest on their own and will be needed in the next section. The trace tr A of an n × n matrix A = (aj k )nj,k=1 is defined as usual: tr A = a11 + a22 + · · · + ann .
i
i i
i
i
i
i
10.2. First-Order Trace Formulas
buch7 2005/10/5 page 249 i
249
Denoting by λ_1(A), ..., λ_n(A) the eigenvalues of A, we have tr A^k = λ_1^k(A) + ··· + λ_n^k(A) for every natural number k. The trace norm of A is defined by ‖A‖_tr = σ_1(A) + ··· + σ_n(A), where σ_1(A), ..., σ_n(A) are the singular values of A. From Theorem 9.3 we deduce that
$$\|ABC\|_{\mathrm{tr}} \le \|A\|_2\,\|B\|_{\mathrm{tr}}\,\|C\|_2. \qquad (10.7)$$
It is also well known that
$$|\mathrm{tr}\,A| \le \|A\|_{\mathrm{tr}}. \qquad (10.8)$$
Finally, we denote by O the collection of all sequences {K_n}_{n=1}^∞ of complex n × n matrices K_n such that (1/n)‖K_n‖_tr → 0.

Lemma 10.9. If a and b are Laurent polynomials, then {T_n(a)T_n(b) − T_n(ab)} ∈ O.

Proof. By Proposition 3.10, T_n(a)T_n(b) − T_n(ab) = −P_n H(a)H(b̃)P_n − W_n H(ã)H(b)W_n. The matrices H(a)H(b̃) and H(ã)H(b) have only finitely many nonzero entries. Thus, since ‖P_n‖_2 = ‖W_n‖_2 = 1, inequality (10.7) yields
$$\frac{1}{n}\|P_n H(a)H(\tilde b)P_n\|_{\mathrm{tr}} \le \frac{1}{n}\|P_n\|_2\,\|H(a)H(\tilde b)\|_{\mathrm{tr}}\,\|P_n\|_2 = o(1),$$
$$\frac{1}{n}\|W_n H(\tilde a)H(b)W_n\|_{\mathrm{tr}} \le \frac{1}{n}\|W_n\|_2\,\|H(\tilde a)H(b)\|_{\mathrm{tr}}\,\|W_n\|_2 = o(1).$$

Lemma 10.10. If b is a Laurent polynomial and k ∈ N, then {T_n^k(b) − T_n(b^k)} ∈ O.

Proof. The assertion is trivial for k = 1. Now suppose that the assertion is true for some k ∈ N. Then T_n^{k+1}(b) = T_n^k(b)T_n(b) = T_n(b^k)T_n(b) + K_n T_n(b) with some {K_n} ∈ O. Since ‖K_n T_n(b)‖_tr ≤ ‖K_n‖_tr ‖T_n(b)‖_2 ≤ ‖K_n‖_tr ‖b‖_∞, it is clear that {K_n T_n(b)} ∈ O. Lemma 10.9 implies that {T_n(b^k)T_n(b) − T_n(b^{k+1})} ∈ O. This shows that {T_n^{k+1}(b) − T_n(b^{k+1})} is a sequence in O.

Theorem 10.11. Let b be a Laurent polynomial and k ∈ N. Then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\lambda_j^k(T_n(b)) = \frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta. \qquad (10.9)$$
Proof. First notice that
$$\frac{1}{n}\sum_{j=1}^{n}\lambda_j^k(T_n(b)) = \frac{1}{n}\,\mathrm{tr}\,T_n^k(b).$$
By Lemma 10.10,
$$\frac{1}{n}\,\mathrm{tr}\,T_n^k(b) = \frac{1}{n}\,\mathrm{tr}\,T_n(b^k) + \frac{1}{n}\,\mathrm{tr}\,K_n$$
with {K_n} ∈ O. Since
$$\frac{1}{n}\,\mathrm{tr}\,T_n(b^k) = \frac{1}{n}\bigl((b^k)_0 + \cdots + (b^k)_0\bigr) = \frac{1}{n}\,n(b^k)_0 = (b^k)_0 = \frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta$$
and, by (10.8), |tr K_n|/n ≤ ‖K_n‖_tr/n = o(1), we arrive at (10.9).
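Theorem 10.11 is also easy to check on the computer: (1/n) tr T_n^k(b) should approach the 0th Fourier coefficient of b^k. The following sketch uses an arbitrarily chosen (non-Hermitian) symbol and approximates the right-hand side of (10.9) by averaging b^k over a fine grid of the circle.

```python
import numpy as np

b = {1: 1.0, -2: 0.7, 0: 0.3}              # an arbitrary Laurent polynomial

def T(symb, n):
    A = np.zeros((n, n), dtype=complex)
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

theta = np.linspace(0, 2 * np.pi, 100000, endpoint=False)
bvals = sum(c * np.exp(1j * j * theta) for j, c in b.items())

k = 3
rhs = np.mean(bvals ** k)                   # (1/2pi) int (b(e^{i theta}))^k d theta = (b^k)_0
for n in (20, 80, 320):
    lhs = np.trace(np.linalg.matrix_power(T(b, n), k)) / n
    print(n, "lhs =", lhs, "  rhs =", rhs)
```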
10.3 The Spectral Radius

Given a matrix or an operator A, we denote by rad A its spectral radius, rad A = max{|λ| : λ ∈ sp A}. In general, rad T_n(b) does not converge to rad T(b); indeed, if b(t) = t, then rad T_n(b) = 0 and rad T(b) = 1.
Suppose $b(t) = \sum_{j=-r}^{r} b_j t^j$ (t ∈ T). For ρ ∈ (0, ∞), we define the Laurent polynomial b_ρ by
$$b_\rho(e^{i\theta}) = b(\rho e^{i\theta}) = \sum_{j=-r}^{r} b_j \rho^j e^{ij\theta}.$$
Theorem 10.12 (Schmidt and Spitzer). If b is a Laurent polynomial, then
$$\limsup_{k\to\infty}\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} \le \liminf_{n\to\infty}\mathrm{rad}\,T_n(b) \le \limsup_{n\to\infty}\mathrm{rad}\,T_n(b) \le \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty. \qquad (10.10)$$
Proof. Let λ_j(T_n(b)) be the eigenvalues of T_n(b). Obviously,
$$\left|\frac{1}{n}\sum_{j=1}^{n}\lambda_j^k(T_n(b))\right|^{1/k} \le \max_{1\le j\le n}|\lambda_j(T_n(b))| = \mathrm{rad}\,T_n(b).$$
From Theorem 10.11 we therefore get
$$\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} \le \liminf_{n\to\infty}\mathrm{rad}\,T_n(b),$$
which is stronger than the first inequality of (10.10). Since
$$T_n(b_\rho) = \mathrm{diag}(1, \rho, \ldots, \rho^{n-1})\, T_n(b)\, \mathrm{diag}(1, \rho^{-1}, \ldots, \rho^{-(n-1)}),$$
we have sp T_n(b) = sp T_n(b_ρ) and thus
$$\mathrm{rad}\,T_n(b) = \mathrm{rad}\,T_n(b_\rho) \le \|T_n(b_\rho)\|_2 \le \|b_\rho\|_\infty.$$
This implies that
$$\mathrm{rad}\,T_n(b) \le \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty$$
and gives the last inequality of (10.10).
By Hadamard's three circles theorem,
$$\log M(\rho) := \log\|b_\rho\|_\infty = \log\max_{\theta\in[0,2\pi)}|b(\rho e^{i\theta})|$$
is a convex function of log ρ. Since log M(ρ) → +∞ as ρ → 0 and ρ → +∞, it follows that there is a unique ρ_0 ∈ (0, ∞) such that
$$\inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty = \|b_{\rho_0}\|_\infty.$$
We exhibit two cases in which all inequalities of (10.10) become equalities.

Theorem 10.13. Let b be a Laurent polynomial and suppose T(b) is Hermitian. Then
$$\lim_{n\to\infty}\mathrm{rad}\,T_n(b) = \mathrm{rad}\,T(b) = \|b\|_\infty = \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty = \limsup_{k\to\infty}\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k}.$$

Proof. The function b is real valued. Let R(b) = [m, M]. We know from (10.2) and Theorem 10.1 that
$$\mathrm{sp}\,T_n(b) \subset [m, M], \qquad \lambda_1(T_n(b)) \to m, \qquad \lambda_n(T_n(b)) \to M.$$
This implies that
$$\mathrm{rad}\,T_n(b) = \max\bigl(|\lambda_1(T_n(b))|, |\lambda_n(T_n(b))|\bigr) \to \max(|m|, |M|) = \|b\|_\infty = \mathrm{rad}\,T(b).$$
If f is a nonnegative continuous function on T, then
$$\lim_{k\to\infty}\left(\frac{1}{2\pi}\int_0^{2\pi}(f(e^{i\theta}))^k\,d\theta\right)^{1/k} = \|f\|_\infty.$$
Since b² is nonnegative, we therefore get
$$\limsup_{k\to\infty}\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} \ge \limsup_{k\to\infty}\left(\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^{2k}\,d\theta\right)^{1/(2k)} = \left(\lim_{k\to\infty}\left(\frac{1}{2\pi}\int_0^{2\pi}(b^2(e^{i\theta}))^{k}\,d\theta\right)^{1/k}\right)^{1/2} = \|b^2\|_\infty^{1/2} = \|b\|_\infty.$$
On the other hand, it is obvious that
$$\limsup_{k\to\infty}\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} \le \|b\|_\infty.$$
Consequently,
$$\limsup_{k\to\infty}\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} = \|b\|_\infty.$$
We are left with showing that $\inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty = \|b\|_\infty$. Assume the contrary, that is, $\inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty < \|b\|_\infty$. Because $b_{-j} = \overline{b_j}$, we have $b_\rho(e^{i\theta}) = \overline{b_{\rho^{-1}}(e^{i\theta})}$. Thus, we can assume that $\|b_\rho\|_\infty = \|b_{\rho^{-1}}\|_\infty < \|b\|_\infty$ for some ρ ∈ (0, 1). This means that the analytic function b(z) does not attain its maximum modulus on the boundary of the annulus {z ∈ C : ρ ≤ |z| ≤ ρ^{-1}}, which contradicts the maximum modulus principle.

Theorem 10.14. Let b be a Laurent polynomial and suppose T(b) is triangular. Let b_0 denote the 0th Fourier coefficient of b. Then
$$\mathrm{rad}\,T_n(b) = |b_0| = \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty = \left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k}$$
for every n ≥ 1 and every k ≥ 1.

Proof. For the sake of definiteness, assume that b(z) = b_0 + b_1 z + ··· + b_s z^s. Clearly, sp T_n(b) = {b_0} and hence rad T_n(b) = |b_0|. The maximum modulus principle implies that M(ρ) := ‖b_ρ‖_∞ decreases monotonically as ρ → 0. Therefore
$$\inf_{\rho\in(0,\infty)} M(\rho) = \lim_{\rho\to 0} M(\rho) = |b(0)| = |b_0|.$$
Finally,
$$\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta$$
is the 0th Fourier coefficient of b^k and thus equal to b_0^k. Consequently,
$$\left|\frac{1}{2\pi}\int_0^{2\pi}(b(e^{i\theta}))^k\,d\theta\right|^{1/k} = |b_0^k|^{1/k} = |b_0|.$$
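The two cases of Theorems 10.13 and 10.14 can be contrasted with a few lines of code. The sketch below (symbols, the ρ-grid, and the matrix size are illustrative choices) compares rad T_n(b) with a brute-force approximation of inf_ρ ‖b_ρ‖_∞; for the Hermitian symbol both numbers should be close to ‖b‖_∞, for the triangular one both should be close to |b_0|.

```python
import numpy as np

def T(symb, n):
    A = np.zeros((n, n), dtype=complex)
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

def inf_rho_norm(symb, rhos=np.logspace(-3, 1, 400), m=4096):
    theta = np.linspace(0, 2 * np.pi, m, endpoint=False)
    best = np.inf
    for rho in rhos:
        vals = sum(c * (rho ** j) * np.exp(1j * j * theta) for j, c in symb.items())
        best = min(best, float(np.max(np.abs(vals))))
    return best

herm = {1: 1.0, -1: 1.0, 2: 0.5, -2: 0.5}    # T(b) Hermitian, ||b||_inf = 3
tri = {0: 0.3, 1: 1.0, 2: 0.5}               # T(b) lower triangular, |b_0| = 0.3

for name, symb in (("Hermitian", herm), ("triangular", tri)):
    rad = max(np.abs(np.linalg.eigvals(T(symb, 400))))
    print(name, " rad T_400(b) =", rad, "  inf_rho ||b_rho||_inf ~", inf_rho_norm(symb))
```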
10.4 Matrices with Nonnegative Entries
We now consider Toeplitz band matrices generated by Laurent polynomials
$$b(t) = \sum_{j=-r}^{r} b_j t^j$$
for which b_j ≥ 0 for all j. Schmidt and Spitzer showed that in this case the inequalities in (10.10) also become equalities. We here follow Elsner and Friedland [109], who developed an alternative approach to the problem.

A nonempty subset K ⊂ R^n is called a cone if x + y ∈ K and αx ∈ K for all x, y ∈ K and all α ≥ 0. A cone K is said to be proper with respect to a linear subspace L ⊂ R^n if K ⊂ L, K is a closed and convex subset of R^n, K ∩ (−K) = {0} (property of being a pointed cone), and K − K = L (property of being solid in L).

Theorem 10.15 (Krein and Rutman). Let L be a linear subspace of R^n and let K ⊂ L be a cone that is proper with respect to L. If a linear operator A : R^n → R^n leaves K invariant, then rad (A|L) is an eigenvalue of A with an associated eigenvector in K.

For a proof see, e.g., [26, p. 6]. Let A be a real n × n matrix and let x ∈ R^n. In this section we write A ≥ 0 if all entries of A are nonnegative and x ≥ 0 in case all components of x are nonnegative.

Corollary 10.16. Suppose A is a real n × n matrix and y ∈ R^n is a vector such that
$$y \ge 0, \quad Ay \ge 0, \quad A^2 y \ge 0, \quad \ldots.$$
Then A has a nonnegative real eigenvalue with an associated eigenvector z ≥ 0.

Proof. Let K be the smallest closed and convex cone which contains y, Ay, A²y, ... and put L = K − K. Since K is a subset of the standard cone of the nonnegative vectors of R^n, it is clear that K ∩ (−K) = {0}. Thus, K is proper with respect to L. Obviously, A leaves K and therefore L invariant. By Theorem 10.15, rad (A|L) is a nonnegative eigenvalue of the matrix A with an associated eigenvector z ∈ K. Hence z ≥ 0.

Theorem 10.17. Let b be a Laurent polynomial and suppose T_n(b) ≥ 0 for all n. Then
$$\lim_{n\to\infty}\mathrm{rad}\,T_n(b) = \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty.$$
Proof. To make the proof transparent, we only consider the special case where b(t) = b_{-2}t^{-2} + b_{-1}t^{-1} + b_0 + b_1 t; it is easily seen that the argument used in the following works also in the general case. Let μ_n be the spectral radius of T_n(b) and, for n ≥ 3, consider the equation
$$T_n(b)x^{(n)} = \mu_n x^{(n)}. \qquad (10.11)$$
Theorem 10.15 (applied to the case where K is the standard cone of the nonnegative vectors of R^n) implies that (10.11) has a solution
$$x^{(n)} = (x_0^{(n)}, x_1^{(n)}, \ldots, x_{n-1}^{(n)}) \ge 0, \qquad x_{n-1}^{(n)} \ne 0.$$
The first n − 2 equations of (10.11) yield
$$b_{-2}x_2^{(n)} = (\mu_n - b_0)x_0^{(n)} - b_{-1}x_1^{(n)},$$
$$b_{-2}x_3^{(n)} = -b_1 x_0^{(n)} + (\mu_n - b_0)x_1^{(n)} - b_{-1}x_2^{(n)},$$
$$b_{-2}x_4^{(n)} = -b_1 x_1^{(n)} + (\mu_n - b_0)x_2^{(n)} - b_{-1}x_3^{(n)},$$
$$\ldots$$
$$b_{-2}x_{n-1}^{(n)} = -b_1 x_{n-4}^{(n)} + (\mu_n - b_0)x_{n-3}^{(n)} - b_{-1}x_{n-2}^{(n)}.$$
Put
$$A_n = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -b_{-2}^{-1}b_1 & b_{-2}^{-1}(\mu_n - b_0) & -b_{-2}^{-1}b_{-1} \end{pmatrix},$$
$$y^{(0,n)} = \begin{pmatrix} x_0^{(n)} \\ x_1^{(n)} \\ x_2^{(n)} \end{pmatrix}, \quad y^{(1,n)} = \begin{pmatrix} x_1^{(n)} \\ x_2^{(n)} \\ x_3^{(n)} \end{pmatrix}, \quad \ldots, \quad y^{(n-3,n)} = \begin{pmatrix} x_{n-3}^{(n)} \\ x_{n-2}^{(n)} \\ x_{n-1}^{(n)} \end{pmatrix}.$$
The last n − 3 equations of the above n − 2 equations can then be written in the form
$$A_n y^{(0,n)} = y^{(1,n)}, \quad A_n y^{(1,n)} = y^{(2,n)}, \quad \ldots, \quad A_n y^{(n-4,n)} = y^{(n-3,n)}. \qquad (10.12)$$
We can without loss of generality assume that ‖y^{(0,n)}‖_2 = 1. Then there is a sequence n_k → ∞ such that y^{(0,n_k)} converges to some y with ‖y‖_2 = 1. Since T_n(b) is a principal submatrix of T_{n+1}(b), we deduce that μ_n ≤ μ_{n+1} (see, e.g., [26, p. 28]). As μ_n ≤ ‖b‖_∞ for all n, it follows that the sequence {μ_n} has a limit μ ∈ (0, ∞). Consequently, the matrices A_n converge to the matrix
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -b_{-2}^{-1}b_1 & b_{-2}^{-1}(\mu - b_0) & -b_{-2}^{-1}b_{-1} \end{pmatrix}.$$
For every integer j ≥ 0,
$$A^j y = \lim_{k\to\infty} A_{n_k}^j\, y^{(0,n_k)} = \lim_{k\to\infty} y^{(j,n_k)} \ge 0.$$
Thus, by Corollary 10.16, the matrix A has a nonnegative eigenvalue λ with an associated eigenvector z = (z_1, z_2, z_3) ≥ 0. The companion structure of A implies that z can be assumed to be z = (1, λ, λ²). Considering the last row of the equality Az = λz we obtain
$$b_{-2}^{-1}\bigl(-b_1 + (\mu - b_0)\lambda - b_{-1}\lambda^2\bigr) = \lambda\cdot\lambda^2,$$
whence
$$\mu = b_1\Bigl(\frac{1}{\lambda}\Bigr) + b_0 + b_{-1}\Bigl(\frac{1}{\lambda}\Bigr)^{-1} + b_{-2}\Bigl(\frac{1}{\lambda}\Bigr)^{-2}.$$
Thus, μ = b(1/λ). Since ‖b_{1/λ}‖_∞ ≤ b(1/λ), it results that
$$\mu \ge \inf_{\rho\in(0,\infty)}\|b_\rho\|_\infty.$$
The reverse inequality follows from Theorem 10.12.
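For a symbol with nonnegative coefficients the infimum in Theorem 10.17 is easy to evaluate: since all b_j ≥ 0, the maximum of |b_ρ| over the unit circle is attained at θ = 0, so ‖b_ρ‖_∞ = Σ_j b_j ρ^j. The sketch below (with an arbitrarily chosen symbol; not from the book) minimizes this scalar function over ρ and compares the result with rad T_n(b) for growing n.

```python
import numpy as np
from scipy.optimize import minimize_scalar

b = {-2: 2.0, -1: 1.0, 0: 0.5, 1: 3.0}     # all coefficients nonnegative, so T_n(b) >= 0

def T(symb, n):
    A = np.zeros((n, n))
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

# ||b_rho||_inf = sum_j b_j rho^j because the coefficients are nonnegative
f = lambda logrho: sum(c * np.exp(logrho * j) for j, c in b.items())
res = minimize_scalar(f, bounds=(-3, 3), method="bounded")
inf_norm = res.fun

for n in (20, 80, 320):
    rad = max(abs(np.linalg.eigvals(T(b, n))))
    print(n, "rad T_n(b) =", rad, " -> inf_rho ||b_rho||_inf =", inf_norm)
```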
Exercises

1. Let A ∈ B(ℓ²) be selfadjoint and put
$$m = \inf_{\|x\|_2=1}(Ax, x), \qquad M = \sup_{\|x\|_2=1}(Ax, x).$$
(a) Prove that A_n = P_n A P_n |Im P_n has the following properties:
$$m \le \lambda_{\min}(A_n) \le \lambda_{\max}(A_n) \le M, \qquad \lim_{n\to\infty}\lambda_{\min}(A_n) = m, \qquad \lim_{n\to\infty}\lambda_{\max}(A_n) = M,$$
$$\{m, M\} \subset \mathrm{sp}\,A \subset \liminf_{n\to\infty}\mathrm{sp}\,A_n \subset \limsup_{n\to\infty}\mathrm{sp}\,A_n \subset [m, M].$$
(b) Find an A for which m = 0, M = 1, sp A = {0, 1}, and sp A_n = {0, 1} for all n ≥ 2.
(c) Find an A such that m = 0, M = 1, sp A = {0, 1}, and
$$\liminf_{n\to\infty}\mathrm{sp}\,A_n = \limsup_{n\to\infty}\mathrm{sp}\,A_n = [0, 1].$$
2. Let a, b ∈ P be real-valued Laurent polynomials such that a ≤ b on T and denote by λ_j(T_n(a)) and λ_j(T_n(b)) the eigenvalues of T_n(a) and T_n(b) in nondecreasing order. Show that λ_j(T_n(a)) ≤ λ_j(T_n(b)) for all j.

3. Let a, b ∈ P be nonnegative on T. Prove that the eigenvalues of T_n^{-1}(a)T_n(b) are all located in (r, R), where
$$r = \inf_{t\in\mathbb{T},\, a(t)>0}\frac{b(t)}{a(t)}, \qquad R = \sup_{t\in\mathbb{T},\, a(t)>0}\frac{b(t)}{a(t)}.$$

4. Let c_k = aρ^{|k|} + bρ^{-|k|} with real numbers a, b, ρ such that b ≠ 0 and 0 < ρ < 1. Denote by λ_1(A_n) ≤ ··· ≤ λ_n(A_n) the eigenvalues of the Toeplitz matrix A_n = (c_{j-k})_{j,k=1}^n. Prove that there are finite numbers m and M such that
$$m \le \lambda_2(A_n) \le \cdots \le \lambda_{n-1}(A_n) \le M \qquad (10.13)$$
for all n and that
$$\lim_{n\to\infty}\lambda_1(A_n) = -\infty, \qquad \lim_{n\to\infty}\lambda_n(A_n) = +\infty.$$
5. Let H denote the set of all Hermitian Toeplitz matrices (aj −k )nj,k=1 with a0 ∈ [α0 , β0 ] and ak ∈ [αk , βk ] + i[γk , δk ] for 1 ≤ k ≤ n − 1, and let H0 stand for the subset of H constituted by the matrices with a0 = 0 and ak ∈ {αk , βk } + i{γk , δk }. The set H is called a Hermitian Toeplitz interval matrix and the 4n−1 matrices in H0 are referred to as the vertex matrices of H. Prove that the maximum (minimum) of the eigenvalues of the matrices in H equals β0 (α0 ) plus the maximum (minimum) of the eigenvalues of the matrices in H0 .
Notes The asymptotic behavior of the extreme eigenvalues of Hermitian Toeplitz matrices has been extensively studied for a long time and by many authors, including Kac, Murdock, and Szegö [176], Grenander and Szegö [145], Widom [290], [291], [292], Parter [197], [198], [199], and Serra Capizzano [247], [248]. We will say more on the contributions by Parter and Widom below. In the case of a single zero, Theorem 10.1 is classical. For symbols with several zeros, this theorem is due to Serra Capizzano [247]. The proof given here is from our book [48] and the same comments as on Theorem 9.8 apply to Theorem 10.1. Beginning with Theorem 10.3, we follow [303] in Section 10.1. Theorem 10.11 is a special case of what is called Szegö’s first limit theorem. For symbols b ∈ L∞ , this theorem was established by Szegö [262]. Versions of the theorem are now known to be true even for symbols b ∈ L1 or for Toeplitz matrices generated by the Fourier coefficients of Radon measures [285], [301]. We refer the reader to [71] for some more details. The set O was introduced by SeLegue [245], and the simple approach of Section 10.2 is based on ideas of SeLegue [245] and Fasino and Tilli [116]. The results of Sections 10.3 and 10.4 are due to Schmidt and Spitzer [243]. The arguments of Section 10.4 are from Elsner and Friedland’s paper [109]. Exercise 2 is from [290]. For Exercise 3 see [23]. Exercise 4 is due to Trench [279] (see also [281]). He even proved the following. If we put g(θ) =
$$g(\theta) = \frac{(a-b)(1-\rho^2)}{1 - 2\rho\cos\theta + \rho^2},$$
then (10.13) is true with m = min g(θ) and M = max g(θ) for all sufficiently large n and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=2}^{n-1}\varphi(\lambda_j(A_n)) = \frac{1}{2\pi}\int_0^{2\pi}\varphi(g(\theta))\,d\theta$$
for every ϕ ∈ C[m, M]. Exercise 5 is from [159].

Further results: Parter and Widom. These two outstanding mathematicians have no joint published work, but their parallel efforts made the late 1950s and early 1960s the heyday of extreme eigenvalues of Hermitian Toeplitz matrices. Citing their results with proofs would go beyond the scope of this book. We think the following does nevertheless provide an idea of the exciting territory that was explored by Seymour Parter and Harold Widom more than 40 years ago.

Let α be a natural number and let c ∈ P be a real-valued and positive Laurent polynomial, c(T) ⊂ (0, ∞). Put b(t) = |t − 1|^{2α} c(t). The matrices T_n(b) are all Hermitian
and positive definite. We denote by 0 < λ_1(T_n(b)) ≤ λ_2(T_n(b)) ≤ ··· ≤ λ_n(T_n(b)) their eigenvalues. As in the notes to Chapter 4, we consider the functions
$$[T_n^{-1}(b)]_{[nx],[ny]}, \qquad (x, y) \in [0, 1]^2.$$
Let H_α^{(n)} be the integral operator on L²(0, 1) with the kernel n[T_n^{-1}(b)]_{[nx],[ny]}. Widom [291], [292] observed that sp H_α^{(n)} = sp T_n^{-1}(b) and showed that n^{-2α}H_α^{(n)} converges in the operator norm on L²(0, 1) to an integral operator H_α with a certain kernel F_α(x, y). The expression for F_α(x, y) was quite complicated, but it resembled G_α(x, y)/c(1) with G_α(x, y) given by (4.62). To see that this should not come as a surprise, let us shortly jump to the year 2004. Formula (4.60) says that
$$n^{-2\alpha}\, n\, [T_n^{-1}(b)]_{[nx],[ny]} \to \frac{1}{c(1)}\, G_\alpha(x, y) \quad \text{as } n \to \infty$$
uniformly on [0, 1]². This clearly implies that n^{-2α}H_α^{(n)} converges in the operator norm on L²(0, 1) to the integral operator with the kernel G_α(x, y)/c(1). Thus, Widom's kernel F_α(x, y) must coincide with G_α(x, y)/c(1).

Let us denote by K_α the integral operator on L²(0, 1) whose kernel is G_α(x, y). Obviously, K_α = c(1)H_α. The operator K_α is compact, selfadjoint, and positive definite. Let
$$\|K_\alpha\| = \mu_1(K_\alpha) \ge \mu_2(K_\alpha) \ge \cdots > 0$$
be its eigenvalues (repeated according to their multiplicities). The largest eigenvalues of n^{-2α}H_α^{(n)} are
$$\frac{1}{n^{2\alpha}\lambda_1(T_n(b))} \ge \frac{1}{n^{2\alpha}\lambda_2(T_n(b))} \ge \cdots.$$
Since n^{-2α}H_α^{(n)} converges in the operator norm to (1/c(1))K_α, we arrive at the conclusion that
$$\lim_{n\to\infty} n^{2\alpha}\lambda_j(T_n(b)) = \frac{c(1)}{\mu_j(K_\alpha)}$$
for each j. We write q_n ∼ r_n if r_n ≠ 0 for all n and q_n/r_n → 1 as n → ∞. Thus, we culminate with
$$\lambda_j(T_n(b)) \sim \frac{c(1)}{\mu_j(K_\alpha)}\,\frac{1}{n^{2\alpha}}.$$
This formula was proved by Kac, Murdock, and Szegö [176] for α = 1 and by Parter in [198] for α = 2 and then in [197] for general α. The above derivation is Widom's [291], [292].
What can be said about the eigenvalues μ_j(K_α)? Parter [197] and Widom [291] showed that G_α(x, y) is the Green kernel of the boundary value problem
$$(-1)^\alpha u^{(2\alpha)}(x) = v(x), \quad x \in [0, 1], \qquad (10.14)$$
$$u^{(k)}(0) = u^{(k)}(1) = 0, \quad k = 0, 1, \ldots, \alpha - 1. \qquad (10.15)$$
For α = 1, this is a classical result [90]. For general α, this was independently rediscovered by Rambour and Seghier [215], [216]. A very short and self-contained proof is also in [40]. Thus, the solution of (10.14), (10.15) is given by
$$u(x) = \int_0^1 G_\alpha(x, y)\,v(y)\,dy.$$
Consequently, if we denote by 0 < γ_1(α) ≤ γ_2(α) ≤ ··· the eigenvalues of the boundary value problem (10.14), (10.15), then γ_j(α) = 1/μ_j(K_α). It follows that
$$\lambda_j(T_n(b)) \sim \frac{c(1)\,\gamma_j(\alpha)}{n^{2\alpha}}$$
for each j. In particular,
$$\lambda_{\min}(T_n(b)) \sim \frac{c(1)\,\gamma_{\min}(\alpha)}{n^{2\alpha}},$$
where λ_min = λ_1 and γ_min = γ_1. We have γ_min(α) = min (Au, u)/(u, u), where A is the operator given by (10.14), (10.15). Due to the boundary conditions, we can partially integrate α times to get
$$\frac{(Au, u)}{(u, u)} = \frac{\int_0^1 [u^{(\alpha)}(x)]^2\,dx}{\int_0^1 [u(x)]^2\,dx}.$$
Hence, γ_min(α) can also be characterized as 1/C_α, where C_α is the best constant for which the Wirtinger-Sobolev type inequality
$$\int_0^1 [u(x)]^2\,dx \le C_\alpha \int_0^1 [u^{(\alpha)}(x)]^2\,dx$$
is true for all smooth functions u satisfying (10.15). It is well known (and easily seen) that γ_min(1) = π². Computation of an appropriate 4 × 4 determinant yields that γ_min(2) = δ⁴, where δ is the smallest positive solution of the equation cosh δ cos δ = 1. This yields γ_min(2) = 500.5467. The constant γ_min(3) can be found as the smallest zero of a function that is given by a 6 × 6 determinant. We arrived numerically at γ_min(3) ≈ 61529. In [74] it is shown that γ_min(α) has the asymptotics
$$\gamma_{\min}(\alpha) = \sqrt{8\pi\alpha}\left(\frac{4\alpha}{e}\right)^{2\alpha}\left(1 + O\!\left(\frac{1}{\sqrt{\alpha}}\right)\right) \quad \text{as } \alpha \to \infty.$$
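Two of the numbers above can be reproduced with a few lines of code (a sketch; the bisection bracket and the matrix sizes are ad hoc): the constant δ with cosh δ cos δ = 1, which gives γ_min(2) = δ⁴, and the convergence n²λ_min(T_n(b)) → c(1)γ_min(1) = π² for b(t) = |t − 1|² = 2 − t − t^{-1} (so c ≡ 1 and α = 1).

```python
import numpy as np

# gamma_min(2) = delta^4, cosh(delta)cos(delta) = 1, smallest positive root; bisection on [4, 5]
lo, hi = 4.0, 5.0
for _ in range(80):
    mid = (lo + hi) / 2
    if np.cosh(mid) * np.cos(mid) - 1 > 0:
        hi = mid
    else:
        lo = mid
delta = (lo + hi) / 2
print("gamma_min(2) = delta^4 =", delta ** 4)      # should be close to 500.5467

# n^2 * lambda_min(T_n(2 - t - 1/t)) -> pi^2 = gamma_min(1)
for n in (50, 200, 800):
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    lam_min = np.linalg.eigvalsh(A)[0]
    print(n, n ** 2 * lam_min, " -> ", np.pi ** 2)
```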
In the case α = 1 we also have γ_j(1) = j²π². Parter [198] considered the case α = 2 and proved the formula
$$\gamma_j(2) = 16\left(\frac{(2j+1)\pi + E_j}{4}\right)^4,$$
where E_j is determined by the equation
$$\tan\frac{(2j+1)\pi + E_j}{4} = (-1)^j\tanh\frac{(2j+1)\pi + E_j}{4}.$$
He also computed the first four values:
$$\frac{3\pi + E_1}{4} = 2.3650, \quad \frac{5\pi + E_2}{4} = 3.9266, \quad \frac{7\pi + E_3}{4} = 5.4978, \quad \frac{9\pi + E_4}{4} = 7.0686.$$
The above results are first order asymptotics. Parter and Widom also established second-order asymptotics. Here is a sample result. Put h(θ) = b(e^{iθ}). Widom [290] showed that if c is even, that is, c(t) = c(1/t) for all t ∈ T, then
$$\lambda_j(T_n(|t-1|^2 c(t))) = \frac{\sigma^2\pi^2 j^2}{2(n+1)^2}\left(1 + \frac{2\Delta}{n+1}\right) + o\!\left(\frac{1}{n^3}\right),$$
where σ² = h″(0) (> 0) and
$$\Delta = \frac{1}{2\sigma^2}\int_{-\pi}^{\pi} \frac{h(\theta)}{2\sin^2(\theta/2)}\,\log\left|\cot\frac{\theta}{2}\right|\,d\theta.$$
Further results: preconditioning. Let b be a nonzero Laurent polynomial and suppose b(t) ≥ 0 for all t ∈ T. Then the matrices Tn (b) are positive definite. To solve the system Tn (b)x = y numerically, one can use preconditioning, that is, one can pass to the equivalent system −1 A−1 n Tn (b)x = An y
(10.16)
with an appropriate positive definite matrix An . System (10.16) can be solved by conjugate gradient iteration (see, e.g., [11], [85], [92], [101], [121], [211], [269], [273]). One starts with an initial vector x0 , and at the kth iteration step one obtains an approximate solution xk satisfying the error estimate √ κn − 1 2k ≤ 4 √ −1/2 −1/2 κn + 1 (An Tn (b)An (x − x0 ), x − x0 ) −1/2
(An
−1/2
Tn (b)An
(x − xk ), x − xk )
(10.17)
with −1/2
κn =
λmax (An
−1/2
λmin (An
−1/2
Tn (b)An
−1/2
Tn (b)An
)
)
=
λmax (A−1 n Tn (b)) λmin (A−1 n Tn (b))
.
(10.18)
i
i i
i
i
i
i
260
buch7 2005/10/5 page 260 i
Chapter 10. Extreme Eigenvalues
In view of (10.17), the task of designing a good preconditioner A_n^{-1} includes the problem of keeping κ_n (≥ 1) as close to 1 as possible, which, by (10.18), means that the extreme eigenvalues of A_n^{-1}T_n(b) should be as close to one another as possible. Gilbert Strang [258] and Tony Chan [89] proposed using circulant matrices A_n (see also [87]). This works if b has no zeros on T, but Tyrtyshnikov [283] proved that such preconditioners may fail if b has zeros on T. Potts and Steidl [212] showed that so-called ω-circulant matrices A_n lead to better results. Raymond Chan [83] proposed the use of banded Toeplitz matrices as preconditioners. Suppose, for example, b has a single zero on T, say at t_0, and let 2α be the order of this zero. Theorem 10.1 implies that κ(T_n(b)) ≍ n^{2α}. Taking A_n = T_n(a) with a(t) = |t − t_0|^{2α} one gets
$$\kappa_n = \frac{\lambda_{\max}(A_n^{-1}T_n(b))}{\lambda_{\min}(A_n^{-1}T_n(b))} = O(1)$$
(see [83], [86]). This idea was elaborated also in [23], [246]. Chan, Ng, and Yip’s survey [86] and Potts’ dissertation [211] are excellent and upto-date introductions to the advanced precoditioning business for positive definite Toeplitz matrices. For additional issues addressing banded Toeplitz matrices we also refer to [151], [190], [260].
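The effect of such a banded Toeplitz preconditioner is easy to observe with a bare-bones preconditioned conjugate gradient iteration. The sketch below is a hypothetical illustration (the symbol, the size, and the tolerance are ad hoc): b(t) = |1 − t|²c(t) with c > 0 has a second-order zero at t = 1, and A_n = T_n(|1 − t|²) is applied as preconditioner via a dense inverse for simplicity.

```python
import numpy as np

def T(symb, n):
    A = np.zeros((n, n))
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

def pcg(A, rhs, solve_M, tol=1e-10, maxit=5000):
    x = np.zeros_like(rhs); r = rhs - A @ x; z = solve_M(r); p = z.copy()
    for it in range(1, maxit + 1):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol * np.linalg.norm(rhs):
            return x, it
        z_new = solve_M(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x, maxit

n = 400
lap = {0: 2.0, 1: -1.0, -1: -1.0}                     # |1 - t|^2 = 2 - t - 1/t
c = {0: 2.5, 1: 1.0, -1: 1.0}                         # c(t) = 2.5 + t + 1/t > 0 on T
b = {}
for j1, c1 in lap.items():                            # b = |1 - t|^2 c(t) by coefficient convolution
    for j2, c2 in c.items():
        b[j1 + j2] = b.get(j1 + j2, 0.0) + c1 * c2

Tn, An = T(b, n), T(lap, n)
Ainv = np.linalg.inv(An)                              # preconditioner solve (dense, for simplicity)
rhs = np.ones(n)
x1, it_plain = pcg(Tn, rhs, lambda r: r)              # no preconditioning
x2, it_prec = pcg(Tn, rhs, lambda r: Ainv @ r)
print("CG iterations without / with T_n(|1-t|^2) preconditioner:", it_plain, it_prec)
```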
Chapter 11
Eigenvalue Distribution
This chapter is devoted to the results of Schmidt, Spitzer, and Hirschman on the asymptotic eigenvalue distribution of T_n(b) as n → ∞. We show that the spectra sp T_n(b) converge in the Hausdorff metric to a certain limiting set Λ(b), which is either a singleton or the union of finitely many analytic arcs. We also determine the corresponding limiting measure, which characterizes the density of the asymptotic distribution of the eigenvalues along the limiting set.
11.1 Toward the Limiting Set

Because things are trivial in the case where T(b) is triangular, we will throughout this chapter assume that
$$b(t) = \sum_{j=-r}^{s} b_j t^j, \qquad r \ge 1, \quad s \ge 1, \quad b_{-r} \ne 0, \quad b_s \ne 0.$$
As first observed by Schmidt and Spitzer, it turns out that the eigenvalue distribution of Toeplitz band matrices is in no obvious way related to the spectrum of the corresponding infinite matrices. To see this, choose ρ ∈ (0, ∞) and put
$$b_\rho(t) = \sum_{j=-r}^{s} b_j \rho^j t^j.$$
Clearly, b_ρ(T) = b(ρT). We have
$$T_n(b_\rho) = \mathrm{diag}(\rho, \rho^2, \ldots, \rho^n)\, T_n(b)\, \mathrm{diag}(\rho^{-1}, \rho^{-2}, \ldots, \rho^{-n}), \qquad (11.1)$$
and hence
$$\mathrm{sp}\,T_n(b_\rho) = \mathrm{sp}\,T_n(b). \qquad (11.2)$$
Thus, if there were any reason for sp T_n(b) to mimic b(T) or sp T(b), this reason would also force sp T_n(b) to mimic b_ρ(T) or sp T(b_ρ).

There are at least two possible definitions of the limiting set of the spectra sp T_n(b). We define
$$\Lambda_s(b) := \liminf_{n\to\infty}\mathrm{sp}\,T_n(b)$$
as the set of all λ ∈ C for which there exist λ_n ∈ sp T_n(b) such that λ_n → λ, and we let
$$\Lambda_w(b) := \limsup_{n\to\infty}\mathrm{sp}\,T_n(b)$$
stand for the set of all λ ∈ C for which there are n_1 < n_2 < n_3 < ··· and λ_{n_k} ∈ sp T_{n_k}(b) such that λ_{n_k} → λ. Obviously, Λ_s(b) ⊂ Λ_w(b).

Lemma 11.1. We have Λ_s(b) ⊂ Λ_w(b) ⊂ sp T(b).

Proof. Let λ_0 ∉ sp T(b). Then, by Theorem 3.7, {T_n(b − λ_0)} is stable, that is, ‖T_n^{-1}(b − λ_0)‖_2 ≤ M < ∞ for all n ≥ n_0. It follows that if |λ − λ_0| < 1/(2M), then ‖T_n^{-1}(b − λ)‖_2 ≤ 2M for all n ≥ n_0, which shows that λ_0 has a neighborhood U(λ_0) such that U(λ_0) ∩ sp T_n(b) = ∅ for all n ≥ n_0. Consequently, λ_0 ∉ Λ_w(b).

Corollary 11.2. We even have
$$\Lambda_s(b) \subset \Lambda_w(b) \subset \bigcap_{\rho\in(0,\infty)}\mathrm{sp}\,T(b_\rho). \qquad (11.3)$$
Proof. From (11.2) we know that Λ_s(b) = Λ_s(b_ρ) and Λ_w(b) = Λ_w(b_ρ). The assertion is therefore an immediate consequence of Lemma 11.1.

We will show that all inclusions of (11.3) are actually equalities. At the present moment, we restrict ourselves to giving another description of the intersection occurring in (11.3). For λ ∈ C, put
$$Q(\lambda, z) = z^r(b(z) - \lambda) = b_{-r} + \cdots + (b_0 - \lambda)z^r + \cdots + b_s z^{r+s}$$
and denote by z_1(λ), ..., z_{r+s}(λ) the zeros of Q(λ, z) for fixed λ:
$$Q(\lambda, z) = b_s \prod_{j=1}^{r+s}(z - z_j(\lambda)).$$
Label the zeros so that |z_1(λ)| ≤ |z_2(λ)| ≤ ··· ≤ |z_{r+s}(λ)| and define
$$\Lambda(b) = \{\lambda \in \mathbb{C} : |z_r(\lambda)| = |z_{r+1}(\lambda)|\}. \qquad (11.4)$$
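Definition (11.4) translates directly into a numerical test: for a given λ, compute the r+s roots of Q(λ, ·), sort them by modulus, and measure the gap |z_{r+1}(λ)| − |z_r(λ)|, which vanishes exactly on Λ(b). The sketch below (symbol and sizes are illustrative) evaluates this gap at the eigenvalues of T_n(b); by the results of this chapter the maximum gap over sp T_n(b) should shrink as n grows.

```python
import numpy as np

b = {-1: 1.0, 2: 0.5}                       # b(t) = 1/t + 0.5 t^2,  r = 1, s = 2
r, s = 1, 2

def gap(lmbda):
    # Q(lambda, z) = b_s z^{r+s} + ... + (b_0 - lambda) z^r + ... + b_{-r}
    coeff = np.zeros(r + s + 1, dtype=complex)   # coeff[0] is the z^{r+s} coefficient
    for j, c in b.items():
        coeff[s - j] += c                        # b_j multiplies z^{r+j}
    coeff[s] -= lmbda                            # the z^r coefficient gets (b_0 - lambda)
    z = np.sort(np.abs(np.roots(coeff)))
    return z[r] - z[r - 1]                       # |z_{r+1}| - |z_r|  (0-based indexing)

def T(symb, n):
    A = np.zeros((n, n), dtype=complex)
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

for n in (30, 60, 120):
    ev = np.linalg.eigvals(T(b, n))
    print(n, "max over sp T_n(b) of |z_{r+1}| - |z_r|:", max(gap(e) for e in ev))
```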
Theorem 11.3. The following equality holds:
$$\bigcap_{\rho\in(0,\infty)}\mathrm{sp}\,T(b_\rho) = \Lambda(b).$$
Proof. By Corollary 1.11, T(b) − λ is invertible if and only if b(z) − λ has no zeros on T and wind(b − λ) = 0. As wind(b − λ) equals the difference of the numbers of zeros and poles of b(z) − λ in D := {z ∈ C : |z| < 1} and as the only pole of b(z) − λ = b_{-r}z^{-r} + ··· + (b_0 − λ) + ··· + b_s z^s is a pole of multiplicity r at z = 0, it results that T(b) − λ is invertible if and only if b(z) − λ has no zeros on T and exactly r zeros in D. Equivalently, T(b) − λ is invertible exactly if Q(λ, z) has no zeros on T and precisely r zeros in D. Analogously, T(b_ρ) − λ is invertible if and only if Q(λ, z) has no zero on ρ^{-1}T and exactly r zeros in ρ^{-1}D.

Now suppose λ ∉ Λ(b). Then |z_r(λ)| < |z_{r+1}(λ)|. Consequently, there is a ρ such that |z_r(λ)| < ρ < |z_{r+1}(λ)|. It follows that Q(λ, z) has no zero on ρT and exactly r zeros in ρD. Thus, T(b_{1/ρ}) − λ is invertible and therefore λ is not in ∩_{ρ∈(0,∞)} sp T(b_ρ). Conversely, suppose there is a ρ ∈ (0, ∞) such that λ ∉ sp T(b_ρ). Then, by what was said above, Q(λ, z) has no zeros on ρ^{-1}T and precisely r zeros in ρ^{-1}D. This implies that |z_r(λ)| < ρ^{-1} < |z_{r+1}(λ)|, whence λ ∉ Λ(b).

We will henceforth in this chapter always suppose that the greatest common divisor of the indices k with b_k ≠ 0 is 1:
$$\mathrm{g.c.d.}\,\{k : b_k \ne 0\} = 1. \qquad (11.5)$$
This is no loss of generality. Indeed, consider, for example, T_5(b) with
$$b(t) = \sum_k b_{2k} t^{2k}.$$
It is easily seen that
$$T_5(b) = \begin{pmatrix}
b_0 & 0 & b_{-2} & 0 & b_{-4}\\
0 & b_0 & 0 & b_{-2} & 0\\
b_2 & 0 & b_0 & 0 & b_{-2}\\
0 & b_2 & 0 & b_0 & 0\\
b_4 & 0 & b_2 & 0 & b_0
\end{pmatrix} \qquad (11.6)$$
is unitarily equivalent (via a permutation matrix) to
$$\begin{pmatrix}
b_0 & b_{-2} & b_{-4} & 0 & 0\\
b_2 & b_0 & b_{-2} & 0 & 0\\
b_4 & b_2 & b_0 & 0 & 0\\
0 & 0 & 0 & b_0 & b_{-2}\\
0 & 0 & 0 & b_2 & b_0
\end{pmatrix}, \qquad (11.7)$$
which shows that (11.6) and (11.7) have the same eigenvalues. It follows that Λ_s(b) = Λ_s(b^#) and Λ_w(b) = Λ_w(b^#), where
$$b^\#(t) := \sum_k b_{2k} t^k.$$
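The reduction above is easy to verify numerically: when b contains only even powers, T_n(b) is permutation-similar to the direct sum of T_⌈n/2⌉(b^#) and T_⌊n/2⌋(b^#), so the two spectra coincide as multisets. A sketch with an arbitrary choice of coefficients:

```python
import numpy as np

bh = {-2: 1.0, -1: 0.3, 1: 0.8, 2: 0.5}          # b#(t); then b(t) = b#(t^2) has even powers only
b = {2 * j: c for j, c in bh.items()}

def T(symb, n):
    A = np.zeros((n, n), dtype=complex)
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

n = 9
ev_full = np.sort_complex(np.linalg.eigvals(T(b, n)))
ev_split = np.sort_complex(np.concatenate([
    np.linalg.eigvals(T(bh, (n + 1) // 2)),       # indices of one parity
    np.linalg.eigvals(T(bh, n // 2)),             # indices of the other parity
]))
print("max difference of the two spectra:", np.max(np.abs(ev_full - ev_split)))
```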
11.2 Structure of the Limiting Set
A point λ0 ∈ C is called a branch point if Q(λ0 , z) has multiple zeros. Lemma 11.4. There are at most 2(r + s) − 1 branch points. Proof. A point λ is a branch point if and only if the polynomials Q(λ, z) = bs zr+s + · · · + (b0 − λ)zr + · · · + b−r and ∂ Q(λ, z) = (r + s)bs zr+s−1 + · · · + r(b0 − λ)zr−1 + · · · + b−r+1 ∂z have a common zero. This happens exactly if the resultant of Q and ∂Q/∂z is zero. The resultant of Q and ∂Q/∂z is the determinant ... b0 − λ ... b−r bs ... ... ... ... bs ... b0 − λ ... b−r (11.8) , (r + s)br+s ... r(b0 − λ) ... b−r+1 ... ... ... ... (r + s)bs ... r(b0 − λ) ... b−r+1 which has r + s − 1 rows starting with bs and r + s rows beginning with (r + s)bs . The determinant (11.8) equals r+s + O λ2(r+s)−2 ±(b0 − λ)r+s−1 r(b0 − λ) = ±r r+s λ2(r+s)−1 + O λ2(r+s)−2 , which is a polynomial of degree 2(r + s) − 1 and therefore has at most 2(r + s) − 1 distinct zeros. Since b−r = 0, none of the zeros zj (λ) of Q(λ, z) is zero. Lemma 11.5. If λ0 is not a branch point, then zj (λ)/zk (λ) (j = k) is not constant in some open neighborhood of λ0 . Proof. Assume zj (λ) = γ zk (λ) with some γ ∈ C \ {0} for all λ in some open neighborhood of λ0 . Then Q(λ, zk (λ)) = 0,
Q(λ, γ zk (λ)) = 0.
Consequently, 0 = γ −r Q(λ, γ zk (λ)) − Q(λ, zk (λ)) = γ −r b−r + · · · + (b0 − λ)(γ zk (λ))r + · · · + bs (γ zk (λ))r+s − b−r + · · · + (b0 − λ)(zk (λ))r + · · · + bs (zk (λ))r+s = (γ −r − 1)b−r + · · · + 0 + · · · + (γ s − 1)bs (zk (λ))r+s .
(11.9)
The function zk (λ) is not constant in a neighborhood of λ0 : If Q(λ, z0 ) = 0 for all λ sufficiently close to λ0 , then λz0r = b−r + · · · + b0 z0r + · · · + bs z0r+s , and since b−r = 0, it follows that z0 = 0, whence λ = b−r z0−r + · · · + b0 + · · · + bs z0s = const, which is impossible. Thus, from (11.9) we infer that γ k = 1 whenever bk = 0. By (11.5), this shows that γ = 1. Thus, zj (λ) = zk (λ) and, in particular, zj (λ0 ) = zk (λ0 ). This, however, contradicts our hypothesis that Q(λ0 , z) have no multiple zeros. Recall that we labelled the zeros zj (λ) of Q(λ, z) so that |z1 (λ)| ≤ |z2 (λ)| ≤ · · · ≤ |zr+s (λ)| and that we defined (b) = {λ ∈ C : |zr (λ)| = |zr+1 (λ)|}. Proposition 11.6. Let λ0 ∈ (b) and suppose λ0 is not a branch point. Then (a) λ0 is not an isolated point of (b), (b) there is an open neighborhood U of λ0 such that (b) ∩ U is a finite union of analytic arcs. Proof. Since λ0 ∈ (b), there are p ≥ 1 and q ≥ 1 such that |z1 (λ0 )| ≤ · · · ≤ |zr−p (λ0 )| < |zr−p+1 (λ0 )| = · · · = |zr+q (λ0 )| < |zr+q+1 (λ0 )| ≤ · · · ≤ |zr+s (λ0 )|. There is an open neighborhood U of λ0 such that |z1 (λ)|, . . . , |zr−p (λ)| < |zr−p+1 (λ)|, . . . , |zr+q (λ)| < |zr+q+1 (λ)|, . . . , |zr+s (λ0 )|
for
λ ∈ U.
Pick j, k ∈ {r − p + 1, . . . , r + q} and let γj k = {λ ∈ U : zj (λ)/zk (λ) ∈ T}. Clearly, λ0 ∈ γj k . By Lemma 11.5, the function ϕ(z) := zj (λ)/zk (λ) is not constant. Hence, there is a smallest m ∈ N such that ϕ (m) (λ0 ) = 0. We have ϕ(λ) = ϕ(λ0 ) +
ϕ (m) (λ0 ) (λ − λ0 )m (1 + ψ(λ)), m!
where ψ is analytic in U and ψ(λ0 ) = 0. This implies ; that γj k is the union of 2m analytic arcs starting at λ0 and terminating on ∂U . Put = j =k γj k . Of course, is also a finite
union of analytic arcs beginning at λ0 and ending on ∂U . Each arc of is the carrier of a set of relations |zj1 (λ)| = |zj2 (λ)| = · · · = |zjn (λ)|
(n ≥ 2).
(11.10)
An arc of is contained in (b) if and only if |zr (λ)| = |zr+1 (λ)|
(11.11)
is a part of the relations (11.10). If (11.11) is not contained in (11.10), then λ0 is an isolated point of (b). Thus, we show that λ0 cannot be an isolated point of (b). This will complete the proof of (a) and (b). Assume λ0 is isolated. Then, after suitably relabelling the zeros, |zj (λ)| < |zk (λ)| for j ∈ {1, . . . , r}, k ∈ {r + 1, . . . , r + s}, λ ∈ U \ {λ0 }. Let χ (λ) = zr (λ)/zr+1 (λ). It follows that |χ (λ)| < 1 for λ ∈ U \ {λ0 } and that |χ (λ0 )| = 1. As χ is analytic in U , this contradicts the maximum modulus principle. Hence λ0 cannot be isolated. If λ0 is a branch point, we can introduce a uniformization parameter t such that λ = λ0 + t m ,
Q(λ0 + t m , z) = bs
r+s
(z − τj (t)),
j =1
and the functions τj are analytic in some open neighborhood V (0) of the origin. Lemma 11.7. If λ0 is a branch point and j = k, then τj (t)/τk (t) is not constant in V (0). Proof. We proceed as in the proof of Lemma 11.5. Assume τj (t) = γ τk (t) for some γ ∈ C \ {0}. Then Q(λ0 + t m , τk (t)) = 0 and Q(λ0 + t m , γ τk (t)) = 0, whence 0 = γ −r Q(λ0 + t m , γ τk (t)) − Q(λ0 + t m , τk (t)) = (γ −r − 1)b−r + · · · + 0 + · · · + (γ s − 1)bs (τk (t))r+s .
(11.12)
If τk (t) = z0 is constant, then Q(λ0 + t m , z0 ) = 0 for all t ∈ V (0), so (λ0 + t m )z0r = b−r + · · · + b0 z0r + · · · + bs z0r+s for all t ∈ V (0), and as z0 = 0, it results that λ0 + t m = b−r z0−r + · · · + b0 + · · · + bs z0s = const for all t ∈ V (0), which is impossible. Thus, τk (t) cannot be constant. From (11.12) and (11.5) we therefore obtain that γ = 1 and, consequently, τj (t) = τk (t). By Lemma 11.4, Q(λ0 + t m , z) has only simple zeros for t ∈ V (0) \ {0}. Hence τj (t) = τk (t) for t ∈ V (0) \ {0}. This contradiction completes the proof.
Proposition 11.8. If λ0 ∈ (b) is a branch point, then (a) λ0 is not an isolated point of (b), (b) there exists an open neighborhood U of λ0 such that (b) ∩ U is a finite union of analytic arcs. Proof. Using Lemma 11.7, this can be proved by the argument of the proof of Proposition 11.6, the only difference being that the γj k appearing there must be replaced by γj k = {t ∈ V (0) : τj (t)/τk (t) ∈ T}. A point λ ∈ (b) will be called an exceptional point if λ is a branch point or if there is no open neighborhood U of λ such that (b) ∩ U is an analytic arc starting and terminating on ∂U . Theorem 11.9. The set (b) is the union of a finite number of pairwise disjoint (open) analytic arcs and a finite number of exceptional points, and the set (b) has no isolated points. Proof. By Lemma 11.4, (b) has at most finitely many branch points. Theorem 11.3 implies that (b) is compact, which in conjunction with Propositions 11.6 and 11.8 shows that (b) contains at most a finite number of exceptional points. The assertion is now immediate from Propositions 11.6 and 11.8.
11.3 Toward the Limiting Measure If λ ∈ / (b), then, by definition (11.4), there is a real number satisfying |zr (λ)| < < |zr+1 (λ)|.
(11.13)
As usual, let Dn (a) = det Tn (a). Lemma 11.10. There is a continuous function g : C \ (b) → (0, ∞) such that lim |Dn (b − λ)|1/n = g(λ)
n→∞
uniformly on compact subsets of C \ (b). If is given by (11.13), then g(λ) = exp
2π
log |b (eiθ ) − λ|
0
dθ . 2π
(11.14)
Proof. Let λ ∈ / (b) and pick any such that (11.13) is satisfied. By (11.1), Dn (b − λ) = Dn (b − λ).
i
i i
i
i
i
i
268
buch7 2005/10/5 page 268 i
Chapter 11. Eigenvalue Distribution
Clearly, b (t) − λ = 0 for t ∈ T. The function b (z) − λ has exactly one pole in D, namely, a pole of the order r at z = 0. Due to (11.13), b (z) − λ has exactly r zeros in D. Hence wind (b − λ) = 0. From Theorem 2.11 we deduce that (11.15) Dn (b − λ) = G(b − λ)n E(b − λ) 1 + o(qλn ) , where, by (2.30), qλ ∈ (0, 1) can be taken as |zr (λ)| 1 qλ = +1 . 2 |zr+1 (λ)| Since qλ depends continuously on λ ∈ C \ (b), it follows that (11.15) holds uniformly with respect to compact subsets of C \ (b). From (11.15) we obtain that |Dn (b − λ)|1/n → |G(b − λ)| =: g(λ), and (2.25) implies that |G(b − λ)| = exp
2π
log |b (eiθ ) − λ|
0
dθ . 2π
Now let λ0 ∈ (b) and suppose λ0 is not an exceptional point. Let U be a sufficiently small open neighborhood of λ0 . Then U \ (b) has exactly two connected components. We denote them by D1 and D2 . For λ ∈ (b) ∩ U , there are p ≥ 1 and q ≥ 1 such that |z1 (λ)| ≤ · · · ≤ |zr−p (λ)| < |zr−p+1 (λ)| = · · · = |zr+q (λ)| < |zr+q+1 (λ)| ≤ · · · ≤ |zr+s (λ)|.
(11.16)
We can label the numbers |zj (λ)| so that max |z1 (λ)|, . . . , |zr (λ)| < min |zr+1 (λ)|, . . . , |zr+s (λ)| for λ ∈ D1 . Put N1 = {r + 1, . . . , r + s}. If λ ∈ D2 , there is a (unique) set N2 of integers drawn from {1, 2, . . . , r + s} such that max |zj (λ)| < min |zk (λ)|. j ∈N / 2
k∈N2
Clearly, N2 is the union of {r + q + 1, . . . , r + s} and a set formed by s − q natural numbers from {r − p + 1, . . . , r + q}. Consequently, if λ ∈ (b) ∩ U , then
r+s
|zk (λ)| =
k∈N1
|zk (λ)| =
k=r+1
|zk (λ)|.
(11.17)
k∈N2
Lemma 11.11. Let i ∈ {1, 2}. Then g(λ) = |bs |
|zk (λ)| for λ ∈ Di .
k∈Ni
i
i i
i
i
i
i
11.3. Toward the Limiting Measure
buch7 2005/10/5 page 269 i
269
Proof. Fix λ ∈ D1 and choose a satisfying (11.13). We have Q(λ, eiθ ) = bs
r+s
eiθ − zj (λ) = r eirθ b(eiθ ) − λ
j =1
and thus −r −irθ
b(e ) − λ = bs e iθ
r+s
eiθ − zj (λ)
j =1 r
r+s
j =1
It follows that
eiθ − zk (λ) .
1 − −1 e−iθ zj (λ)
= bs
k=r+1
2π dθ dθ log b(eiθ ) − λ log bs = 2π 2π 0 0 r 2π dθ + log 1 − −1 e−iθ zj (λ) 2π j =1 0 2π
r+s
+
2π
k=r+1 0
dθ log eiθ − zk (λ) + 2mπ i 2π
with some m ∈ Z. Because ∞ 1 −1 −iθ log 1 − −1 e−iθ zj (λ) = − e zj (λ) , =1
log eiθ − zk (λ) = log(−zk (λ)) + log 1 − eiθ /zk (λ) = log(−zk (λ)) −
∞ 1 =1
eiθ /zk (λ) ,
we get 0
2π
r+s dθ log b(eiθ ) − λ log(−zk (λ)) + 2mπ i = log bs + 2π k=r+1
and hence exp 0
2π
r+s dθ = bs log b(eiθ ) − λ (−zk (λ)). 2π k=r+1
Taking moduli and using Lemma 11.10 we arrive at the equality g(λ) = |bs |
r+s k=r+1
|zk (λ)| = |bs |
|zk (λ)|.
k∈N1
i
i i
i
i
i
i
270
buch7 2005/10/5 page 270 i
Chapter 11. Eigenvalue Distribution
This proves the assertion for i = 1. The proof is analogous for i = 2. Put Gi (λ) = bs (−zk (λ)) (i = 1, 2). k∈Ni
Obviously, Gi is analytic in U . Lemma 11.11 shows that g(λ) = |Gi (λ)| for λ ∈ Di .
(11.18)
This implies that g extends to a continuous function on the closure D i of Di . Lemma 11.12. We have |G1 (λ)| = |G2 (λ)| for λ ∈ (b) ∩ U , |G1 (λ)| > |G2 (λ)| for λ ∈ D1 , |G2 (λ)| > |G1 (λ)| for λ ∈ D2 .
(11.19) (11.20) (11.21)
Figure 11.1 illustrates the situation.
g(λ)
g(λ)
|G2 (λ)|
|G1 (λ)|
|G1 (λ)|
|G2 (λ)|
D1
(b)
D2 D1
(b)
D2
Figure 11.1. An illustration to Lemma 11.12. Proof. Equality (11.19) is obvious from (11.17). Let us prove inequality (11.20). For j ∈ {r − p + 1, . . . , r} and k ∈ {r + 1, . . . , r + q}, consider ϕj k (λ) := zj (λ)/zk (λ). By Lemma 11.5, ϕj k is not constant on U . Consequently, there is a λ0 ∈ (b) ∩ U such that ϕj k (λ0 ) = 0 for all j and all k. By the labelling of the numbers |z (λ)|, ϕj k (λ0 ) ∈ T and |ϕj k (λ)| < 1 for λ ∈ D1 .
(11.22)
i
i i
i
i
i
i
11.4. Limiting Set and Limiting Measure
buch7 2005/10/5 page 271 i
271
As ϕj k maps a small neighborhood of λ0 univalently onto a region of C, we can assume that ϕj k is univalent on U (simply choose U small enough). Thus, by (11.22), |ϕj k (λ)| > 1 for λ ∈ D2 , which implies that |zj (λ)| > |zk (λ)| for λ ∈ D2 . Taking into account the definition of N2 , we therefore see that N2 contains at least min(p, q) ≥ 1 numbers from {r − p + 1, . . . , r}. Consequently, for λ ∈ D2 we have |G1 (λ)| = |bs |
r+s
|zk (λ)| > |bs |
k=r+1
|zk (λ)| = |G2 (λ)|.
k∈N2
This completes the proof of (11.20). The inequality (11.21) follows from (11.20) by symmetry. For λ ∈ (b) ∩ U , let n1 = n1 (λ) and n2 = n2 (λ) denote the outer normal vector of D1 and D2 at λ, respectively. Corollary 11.13. (a) The function g extends to a function that is continuous and positive on C \ (b) and at the nonexceptional points of (b). (b) If λ ∈ (b) is a nonexceptional point, then the normal derivatives ∂g/∂n1 and ∂g/∂n2 exist at λ and ∂g ∂g (λ) + (λ) = 0. ∂n1 ∂n2
(11.23)
Proof. (a) We already observed that g admits the asserted continuous extension (see the paragraph before Lemma 11.12). Since all zeros of Q(λ, z) are nonzero, Lemma 11.11 implies that g(λ) > 0. (b) Since G1 and G2 are analytic in U , we see from (11.18) that the normal derivatives of g exist. Lemma 11.12 in conjunction with (11.18) implies (11.23).
11.4
Limiting Set and Limiting Measure
We denote by μn = μn,b the measure that assigns each eigenvalue λj (Tn (b)) (j = 1, . . . , n) measure 1/n. Thus, if E is a subset of C, then μn (E) is 1/n times the number of eigenvalues of Tn (b) (counted up in accordance with their multiplicities) which lie in E. Let C(C) be the set of all continuous functions on C. Obviously,
1 ϕ(λ) dμn (λ) = ϕ(λj (Tn (b))) for all ϕ ∈ C(C). n j =1 C n
(11.24)
Let g be as in Section 11.3. Then log g is locally integrable in C \ (b). As the two-dimensional Lebesgue measure of (b) is zero (Theorem 11.9), we see that log g is defined almost everywhere in C. Let log g stand for the distributional Laplacian of log g. We let C0∞ (C) denote the set of all infinitely differentiable and compactly supported functions of C = R2 into C. Furthermore, dA and ds will denote area and length measures.
i
i i
i
i
i
i
272
buch7 2005/10/5 page 272 i
Chapter 11. Eigenvalue Distribution
Lemma 11.14. The measures μn converge in the distributional sense to the measure 1 log g dA, that is, 2π 1 ϕ(λ) dμn (λ) → log g(λ) ϕ(λ) dA(λ) (11.25) 2π C C for every ϕ ∈ C0∞ . Proof. Evidently,
1 log |λj (Tn (b)) − λ| = log |Dn (b − λ)|1/n . n j =1 n
log |z − λ| dμn (z) = C
Thus, by Lemma 11.10, log |z − λ| dμn (z) → log g(λ)
(11.26)
C
uniformly on compact subsets of C \ (b). It follows that (11.26) is also true in the distributional sense, whence 1 1 λ λ log g(λ) log |z − λ| dμn (z) → (11.27) 2π 2π C in the distributional sense. It is well known from potential theory that 1 λ log |z − λ| dν(z) = ν(λ) 2π C for every compactly supported finite Borel measure ν. Consequently, (11.27) implies that μn (λ) →
1 λ log g(λ) 2π
in the distributional sense, which says that (11.25) holds for all ϕ ∈ C0∞ . By Corollary 11.13, g and the normal derivatives ∂g/∂n1 and ∂g/∂n2 are well defined at the nonexceptional points of (b). From Theorem 11.9 we know that (b) has at most finitely many exceptional points. Lemma 11.15. If ϕ ∈ C0∞ (C), then log g(λ)(ϕ)(λ) dA(λ) = C
1 ∂g ∂g ϕ(λ) (λ) + (λ) ds(λ). g(λ) ∂n1 ∂n2 (b)
Proof. Let λ0 ∈ (b) be a nonexceptional point and let U be a sufficiently small open neighborhood of λ0 . We denote the boundaries of the two connected components D1 and D2 of U \(b) by ∂D1 and ∂D2 , and we put 1 = ∂D1 ∩(b), 2 = ∂D2 ∩(b). Clearly, 1 and 2 coincide as sets. However, the outer normal n is n1 at the points of 1 and it is n2 at the points of 2 (see Figure 11.2).
i
i i
i
i
i
i
11.4. Limiting Set and Limiting Measure
buch7 2005/10/5 page 273 i
273
(b) 1
D1
2
D2
U
Figure 11.2. An illustration to the proof. From Green’s formula, (u v − v u) dA =
∂
we deduce that (log g ϕ − ϕ log g) dA = D1
∂v ∂u u −v ∂n ∂n
log g
∂D1
(11.28)
ds,
∂ϕ ∂ −ϕ log g ∂n1 ∂n1
By (11.14), log g is harmonic in D1 . Thus, log g = 0, whence ∂ϕ ∂ log g ϕ dA = log g ds − ϕ log g ds. ∂n1 ∂n1 D1 ∂D1 ∂D1
ds.
(11.29)
Analogously,
∂ϕ log g ϕ dA = log g ds − ∂n2 D2 ∂D2
ϕ ∂D2
∂ log g ds. ∂n2
(11.30)
Adding (11.29) and (11.30) we obtain ∂ϕ ∂ϕ ∂ϕ log g ϕ dA = log g log g ds + log g ds ds + ∂n ∂n1 ∂n2 U ∂U 1 2 ∂ ∂ ∂ − ϕ ϕ log g ds − ϕ log g ds, log g ds − ∂n ∂n ∂n 1 2 ∂U 1 2 and since ∂ϕ ∂ϕ + =0 ∂n1 ∂n2 it follows that
∂ 1 ∂g log g = , ∂n g ∂n
∂ϕ 1 ∂g ds − ds ϕ ∂n g ∂n ∂U ∂U ∂g 1 ∂g − ds. ϕ + g ∂n1 ∂n2 (b)
log g ϕ dA = U
and
log g
(11.31)
i
i i
i
i
i
i
274
buch7 2005/10/5 page 274 i
Chapter 11. Eigenvalue Distribution
From (11.18) and Lemma 11.12 we infer that ∂g 0 for all sufficiently large n, which implies that λ0 ∈ s (b). We are left to prove that w (b) ⊂ (b). This follows from Corollary 11.2 and Theorem 11.3. We can also argue as follows. Let λ0 ∈ / (b). Then |zr (λ0 )| < |zr+1 (λ0 )|. Since |zr (λ)| and |zr+1 (λ)| depend continuously on λ, there is an open neighborhood U of λ0 such that |zr (λ)| < |zr+1 (λ)| for all λ ∈ U . By Lemma 11.10, |Dn (b − λ)|1/n → g(λ) > 0 for all λ ∈ U . Hence, Dn (b − λ) = 0 for all sufficiently large n and all λ ∈ U . This shows that sp Tn (b) ∩ U = ∅ for all sufficiently large n and implies that λ0 ∈ / w (b). Example 11.18. We already know (b) in the case where b is a trinomial of the form b(t) = b−1 t −1 + b0 + b1 t. One can also describe (b) explicitly if b is a general trinomial, that is, b(t) = b−r t −r + b0 + bs t s . By translation of the plane, we may assume that b0 = 0, and by further rotation and change of the scale, we may confine ourselves to the case where b(t) = t −r + t s ,
r ≥ 1,
s ≥ 1, g.c.d.(r, s) = 1.
Schmidt and Spitzer showed that in this case Λ(b) is the star
$$\Lambda(b) = \bigl\{\omega^k d : \omega = e^{2\pi i/(r+s)},\ k = 0, 1, \ldots, r+s-1,\ 0 \le d \le R\bigr\}, \qquad R = (r+s)\, s^{-s/(r+s)}\, r^{-r/(r+s)}.$$
The pictures at the top right corners of the beginning pages of the chapters are examples of symbol curves b(T) and the eigenvalues of T_60(b). Figures 11.3 to 11.5 show some more sophisticated examples. These figures were scanned from printouts left by Olga Grudskaya.
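For the trinomial b(t) = t^{-r} + t^s the star can be compared directly with computed eigenvalues. The following sketch (with illustrative parameters) measures how far the eigenvalues of T_n(b) lie from the star with the radius R given above; the distances should shrink as n grows.

```python
import numpy as np

r, s = 1, 2
R = (r + s) * s ** (-s / (r + s)) * r ** (-r / (r + s))
rays = [np.exp(2j * np.pi * k / (r + s)) for k in range(r + s)]

def dist_to_star(z):
    # distance from z to the union of the segments [0, R*omega], omega^(r+s) = 1
    d = np.inf
    for w in rays:
        t = np.clip(np.real(z * np.conj(w)), 0.0, R)   # orthogonal projection onto the ray
        d = min(d, abs(z - t * w))
    return d

def T(symb, n):
    A = np.zeros((n, n), dtype=complex)
    for j, c in symb.items():
        A += c * np.eye(n, k=-j)
    return A

b = {-r: 1.0, s: 1.0}
for n in (30, 60, 120):
    ev = np.linalg.eigvals(T(b, n))
    print(n, "R =", R, " max distance of sp T_n(b) to the star:", max(dist_to_star(z) for z in ev))
```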
11.5
Connectedness of the Limiting Set
Theorem 11.19 (Ullman). If b is a Laurent polynomial, then (b) is a connected set. Proof. Assume (b) is not connected. Then we can find a subset K ⊂ (b) and a function ϕ ∈ C0∞ (C) such that ϕ|K = 1 and ϕ|(b) \ K = 0. Moreover, ϕ can be chosen so that ϕ|1 = 1 and ϕ|2 = 0 where 1 and 2 are open sets with smooth Jordan boundaries which contain K and (b) \ K, respectively. Let 0 and 3 be open subsets of C which contain K and (b)\K and are contained in the interior of 1 and 2 , respectively. We may assume that 0 and 3 have smooth Jordan boundaries. Finally, put = C \ (0 ∪ 3 ). Abbreviate the limiting measure (11.33) to dμ.
i
i i
i
i
i
i
276
buch7 2005/10/5 page 276 i
Chapter 11. Eigenvalue Distribution
Figure 11.3. Legacy of Olga Grudskaya: The ranges b(T) for two Laurent polynomials and the eigenvalues of the matrices Tn (b).
i
i i
i
i
i
i
11.5. Connectedness of the Limiting Set
buch7 2005/10/5 page 277 i
277
Figure 11.4. Legacy of Olga Grudskaya: The ranges b(T) for two Laurent polynomials and the eigenvalues of the matrices Tn (b).
i
i i
i
i
i
i
278
buch7 2005/10/5 page 278 i
Chapter 11. Eigenvalue Distribution
Figure 11.5. Legacy of Olga Grudskaya: The ranges b(T) for two Laurent polynomials and the eigenvalues of the matrices Tn (b).
i
i i
i
i
i
i
11.5. Connectedness of the Limiting Set
buch7 2005/10/5 page 279 i
279
Since μ((b)) = 1 and neither K nor (b) \ K can degenerate to a point (Theorem 11.9), it follows that 0 < μ(K) < 1. From Lemma 11.15 we obtain μ(K) = dμ = K
ϕ dμ =
log g ϕ dA, C
(b)
and since ϕ = 0 on 0 and 2 , we get
(11.35)
μ(K) =
log g ϕ dA.
As log g = 0 on , Green’s formula (11.28) implies that ∂ϕ ∂ log g log g −ϕ ds, μ(K) = ∂n ∂n ∂ where n is now the outer normal to . Taking into account that ∂ϕ ∂ϕ ϕ | ∂3 = 0, ∂3 = 0, ϕ | ∂0 = 1, ∂n ∂n we get
μ(K) = ∂0
∂ log g ds, ∂n
∂0 = 0,
(11.36)
where n is now the outer normal to 0 . From (11.15) we infer that if λ0 is a point in C \ (b), then the analytic functions Dn+1 (b − λ)/Dn (b − λ) converge uniformly in some open neighborhood of λ0 to some analytic function. Thus, there is an analytic function G : C \ (b) → C such that lim
n→∞
Dn+1 (b − λ) = G(λ) = 0 Dn (b − λ)
for all λ ∈ C \ (b). Lemma 11.10 implies that g(λ) = |G(λ)|. Fix a point λ0 ∈ ∂0 and choose an argument v(λ) = arg G(λ) of G(λ) which is continuous on ∂0 \ {λ0 }. Put u = log g. Since log G = log |G| + i arg G = u + iv is analytic in a neighborhood of ∂0 \ {λ0 }, the Cauchy-Riemann equations give ux = vy and uy = −vx . Let x = x(t) and y = y(t) (t ∈ (0, 2π )) be a parametrization of of ∂0 \ {λ0 }. From (11.36) we now obtain that 2π ∂ log g ds = (ux y˙ − uy x) ˙ dt μ(K) = ∂n 0 ∂0 2π 2π 1 ∂v = ds = v (vx x˙ + vy y) ˙ dt = . (11.37) ∂s 2π ∂0 \{λ0 } 0 0
i
i i
i
i
i
i
280
buch7 2005/10/5 page 280 i
Chapter 11. Eigenvalue Distribution
But the number (11.37) is always an integer, which contradicts (11.35). As the following result shows, (b) may separate the plane. Theorem 11.20. Let b(t) = μ + t −r (t − α)r (t − β)r , where μ, α, β are complex numbers and αβ = 0. If r = 1 or r = 2, then C \ (b) is connected. If r ≥ 3, then C \ (b) has at most [(r + 1)/2] components (including the unbounded component), and for each natural number j between 1 and [(r + 1)/2] there exist α and β such that C \ (b) has exactly j components. Proof. Put a(t) = t −1 (t − α)(t − β). The curve a(T) is an ellipse with the foci −(α + β) ± √ 2 αβ and (a) is the line segment between the foci. Obviously, b(t) = μ + (a(t))r . We claim that (b) = μ + ((a))r .
(11.38)
To prove our claim, we assume for the sake of definiteness that a(t) traces out a(T) counterclockwise as t moves around T counterclockwise. Notice that b = μ + ar for all ∈ (0, ∞). Pick λ ∈ (a). Then λ ∈ sp T (a ) for all ∈ (0, ∞) due to Theorem 11.3. Put ω = e2π i/r . If ωk λ ∈ a (T) for some k ∈ {1, . . . , r}, then μ + λr = μ + (ωk λ)r ∈ b (T) and hence μ + λr ∈ sp T (b ). Now assume that ωk λ ∈ / a (T) for all k ∈ {1, . . . , r}. Since λ ∈ sp T (a ), it follows that wind (a − λ) = 1. As b − μ − λr = ar − λr , we have wind (b − μ − λr ) = wind
r
(a − ωk λ) =
k=1
r
wind (a − ωk λ),
k=1
and as wind (a − ωk λ) is either 0 or 1, we conclude that wind (b − μ − λr ) ≥ 1. Consequently, μ + λr ∈ sp T (b ). From Theorem 11.3 we now obtain the inclusion “⊃” in (11.38). To verify the reverse inclusion, let ∈ C \ {0} be any number such that 2 = αβ. Then a (T) = (a). It results that b (T) = μ + ((a))r , and as wind (b − ζ ) = 0 for all ζ ∈ / b (T), we see that sp T (b ) = μ + ((a))r . Since ? ? sp T (b ) = sp T (b ), ∈C\{0}
∈(0,∞)
we infer from Theorem 11.3 that (b) ⊂ sp T (b ) = μ + ((a))r , which is the inclusion “⊂” of (11.38). Since neither a line segment nor the square of a line segment does separate the plane, we obtain from equality (11.38) the assertion for r = 1 and r = 2. Combining (11.38) with the fact that (a) is a line segment, one can easily see that the set C \ (b) has at most [(r + 1)/2] components and that each number of components between 1 and [(r + 1)/2] can indeed be realized (for example, we get exactly [(r + 1)/2] components if |μ − (α + β)| > 0 is sufficiently small and |αβ| is sufficiently large.
i
i i
i
i
i
i
Exercises
buch7 2005/10/5 page 281 i
281
Exercises 1. Gershgorin’s theorem states that if An = (aij ) is an n × n matrix and ⎛ ⎞ n |aij |⎠ − |aii |, Ri (An ) := ⎝ j =1
then sp An ⊂
n .
{λ ∈ C : |λ − aii | ≤ Ri (An )}.
i=1
Show that in the case of large Toeplitz band matrices this theorem amounts to the trivial estimate rad Tn (b) ≤ bW . 2. Let b ∈ P and suppose R(b) ⊂ R. Let further {a1 , a2 , a3 , . . . } be a convergent sequence of real numbers. Put An = Tn (b) + diag (a1 , . . . , an ),
A = T (b) + diag (a1 , a2 , . . . ).
Show that lim inf sp An = lim sup sp An = sp A. n→∞
n→∞
3. Let a ∈ P and let Hn (a) = Pn H (a)Pn |Im Pn . (a) Show that (Hn (a)) and sp Hn (a) converge to (H (a)) and sp H (a), respectively, in the Hausdorff metric. (b) Show that (H (a)) and sp H (a) are finite sets containing the origin. (c) Prove that 1 1 ϕ(σj (Hn (a))) = lim ϕ(λj (Hn (a))) = ϕ(0) n→∞ n n→∞ n j =1 j =1 n
n
lim
for every measurable function ϕ that is continuous at the origin. 4. Let b be a Laurent polynomial. (a) Show that sp Tn (b) ⊂ conv R(b) for all n ≥ 1. (b) Prove that 1 1 ϕ(λj (Tn (b))) = n→∞ n 2π j =1 n
2π
ϕ(eiθ ) dθ
lim
(11.39)
0
for every function ϕ that is analytic in a disk containing conv R(b).
i
i i
i
i
i
i
282
buch7 2005/10/5 page 282 i
Chapter 11. Eigenvalue Distribution (c) Let ⊂ C be an open set such that conv R(b) ⊂ . Prove that (11.39) is true for every harmonic function ϕ on . (d) What does (11.39) say for b(t) = b0 + b1 (t)? (e) Let be an open set in the plane and ϕ ∈ C(). Prove that if (11.39) is true for every b with conv R(b) ⊂ , then ϕ is harmonic on .
5. Let a, b ∈ P and consider An = Tn (b) + Hn (a). Fix an open subset of C that contains sp T (b) ∪ sp (T (b) + H (a)). (a) Show that sp An ⊂ for all sufficiently large n. (b) Show that ⎤ ⎡ n n 1 ⎣ lim p(λj (An )) − p(λj (Tn (b)))⎦ n→∞ n j =1 j =1 = lim
n→∞
1 tr [p(An ) − p(Tn (b))] = 0 n
for every polynomial p ∈ P + . (c) Prove that if ϕ is harmonic in , then 1 1 ϕ(λj (An )) = n→∞ n 2π j =1 n
lim
ϕ(λ)dμ(λ), (b)
where dμ is the measure (11.33). 6. Do the mass centers of the sets (b) and sp T (b) always coincide? 7. Compute the determinant and the eigenvalues of the Toeplitz matrix ⎞ ⎛ c+1 a . . . a n−1 ⎜ a −1 c + 1 . . . a n−2 ⎟ ⎟ ⎜ An = ⎜ .. .. .. ⎟ . .. ⎝ . . . . ⎠ −(n−1) −(n−2) a a ... c + 1
Notes Sections 11.1 to 11.4 are based on the pioneering papers on this topic, Schmidt and Spitzer [243] and Hirschman [165], and on some significant improvements of the original proofs that were introduced by Widom [296], [297]. Theorem 11.19 was established in [286]. Exercise 3 is of course easy. For the spectral distribution of more general Hankel matrices we refer to Fasino and Tilli’s papers [115] and [116]. Exercise 4 is a result of Tilli [264]. The analogue of Exercise 5 for the singular values is in [116], for example. The result of Exercise 5 can probably be generalized to a broader class of test functions ϕ. Exercise 7 is from Sakhnovich’s paper [241].
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 283 i
283
Further results: limiting sets separating the plane. For a long time it had been an open question whether the set (b) may separate the plane. In 1992, Anselone and Sloan [6] studied the truncated Wiener-Hopf operator x τ (Aτ f )(x) = 2 e−(x−t) f (t)dt + ex−t f (t)dt (0 < x < τ ) 0
x
on L (0, τ ). They were interested in the set of all λ ∈ C such that λ = lim λn with λn ∈ is the sp Aτn and τn → ∞, and on the basis of numerical experiments, they conjectured √ that √ union of the circle {λ ∈ C : |λ−1/12| = 1/12} and the line segment [3/2− 2, 3/2+ 2 ]. Note that this set separates the plane. The symbol of the corresponding “infinite” WienerHopf operator x ∞ −(x−t) (Af )(x) = 2 e f (t)dt + ex−t f (t)dt (0 < x < ∞) 2
0
x
on L2 (0, ∞) is the Fourier transform of the kernel, 0 ∞ 3 + ix a(x) = eixt et dt + eixt · 2e−t dt = 1 + x2 −∞ 0
˙ := R ∪ {∞}). (x ∈ R
˙ is an ellipse. In a sense, Anselone and Sloan’s conjecture This is a rational function and a(R) answered the continuous analogue on the question whether (b) may separate the plane in the negative. In [72], the continuous analogue of Theorem 11.17 was established, and this result implied that the conjecture of Anselone and Sloan was true. Thus, it was Wiener-Hopf operators with rational symbols that showed us for the first time a limiting set for which C \ is disconnected (see Figure 11.6). 0.2
1.5
0.15
1
0.1
0.5 0.05
0
0
−0.5
−0.05 −0.1
−1 −0.15
−1.5 0
1
2
3
−0.1
0
0.1
0.2
0.3
˙ and the limiting set on the left and a zoom in on Figure 11.6. The ellipse a(R) the right. In the context of Toeplitz band matrices, symbols b for which (b) separates the plane were detected numerically in [20] (Figure 5.3(c)) and [71] (lower left picture of
i
i i
i
i
i
i
284
buch7 2005/10/5 page 284 i
Chapter 11. Eigenvalue Distribution
Figure 38, the “smiling shark”). The picture on the first page of this chapter also shows a disconnected set C \ (b); the corresponding symbol is b(t) = t −3 + 0.99 t −2 + 0.1 t 2 − 0.44 t 3 . Theorem 11.20, which is a first “analytic” result in this direction, is from our paper [50]. Further results: dense Toeplitz matrices. The sets s (b) and w (b) can be defined for Toeplitz matrices T (b) with arbitrary symbols b, but for general symbols b things remain mysterious. In the case of rational symbols, the limiting sets were characterized by K. M. Day [97], [98] (the result is also cited in [71, Section 5.9], and for the underlying determinant formula see also [32]). For certain classes of continuous or piecewise continuous symbols b, the asymptotic eigenvalue distribution of Tn (b) can be described by formulas of the Szegö type. These say (n) that if λ(n) 1 , . . . , λn are the eigenvalues of Tn (b), then 1 1 lim ϕ(λ(n) j )= n→∞ n 2π j =1 n
2π
ϕ(b(eiθ ))dθ
(11.40)
0
for every continuous function ϕ : C → R with compact support (see [15], [71], [265], [296], [297] for details). Notice that (11.40) implies that, up to o(n) possible outliers, the eigenvalues cluster along the (essential) range b(T) of the symbol but that (11.40) does not tell us whether the possible o(n) outliers produce additional pieces of w (b). In order to find w (b), one can try computing sp Tn (b) for some large values of n numerically, having hopes that the result is a more or less good approximation to w (b). This works for rational symbols and, in general, also for the symbols for which (11.40) holds. Another possibility of determining w (b) is to approximate b by a Laurent polynomial bn , surmising that w (bn ) is close to w (b). This approach fails for piecewise continuous symbols, which should not come as a surprise, since a properly piecewise continuous function can never be approximated uniformly by Laurent polynomials as closely as desired. Unexpectedly, this approach does in general also not work for continuous symbols. This is a consequence of the main result of [51], which shows that the asymptotic spectrum w (·) is discontinuous on the space of continuous symbols. The result of [51] is as follows: There exist b ∈ W such that w (Sn b) does not converge to w (b) in the Hausdorff metric. Here Sn b denotes the nth partial sum of the Fourier series (see Section 5.3). Figures 11.7 and 11.8, which are from [51] and were produced by Olga Grudskaya, convincingly illustrate what happens. The symbol b is b(t) = t −1 (33 − (t + t 2 )(1 − t 2 )3/4 ). This is a continuous function which is piecewise C ∞ but not C ∞ . The matrix T (b) is a lower Hessenberg matrix. The results of Widom [296] imply that (11.40) is valid, that is, we expect that the eigenvalues of Tn (b) cluster along the range b(T) of the symbol. The two pictures of Figure 11.7 show the range b(T) and the eigenvalues of Tm (b) for m = 128 (left) and m = 512 (right). In Figure 11.8, we plot the eigenvalues of T128 (Sn b) for n = 4, 6, 8, 12. These eigenvalue distributions mimic the sets w (Sn b) sufficiently well, and it is clearly seen that w (Sn a) grows like a rampant tree that in the n → ∞ limit has nothing in common with w (b).
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 285 i
285
Figure 11.7. The range b(T) and the eigenvalues of T128 (b) (left) and T512 (b) (right).
Figure 11.8. The range b(T) and the eigenvalues of T128 (Sn b) for the values n = 4, 6, 8, 12.
i
i i
i
i
buch7 2005/10/5 page 286 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 287 i
Chapter 12
Eigenvectors and Pseudomodes
In previous chapters, we studied the asymptotic behavior of eigenvalues and pseudoeigenvalues (= points in the pseudospectrum) of large Toeplitz matrices. This chapter concerns asymptotic results on eigenvectors and pseudomodes (= pseudoeigenvectors). We will in particular point out that there are striking differences between the Toeplitz and circulant cases.
12.1 Tridiagonal Circulant and Toeplitz Matrices

Let b(t) = t + α²t^{-1}. In Sections 2.1 and 2.2 we found explicit formulas for the eigenvectors of C_n(b) and T_n(b). The purpose of this section is to give the reader some delight with a few pictures that can be easily produced with MATLAB.

Let n ≥ 3. We know from Proposition 2.1 that the eigenvectors of C_n(b) normalized to ℓ² norm 1 are x_1, ..., x_n with
$$x_j = \frac{1}{\sqrt{n}}\Bigl(1, \omega_n^{j-1}, \omega_n^{2(j-1)}, \ldots, \omega_n^{(n-1)(j-1)}\Bigr), \qquad (12.1)$$
where ω_n = e^{2πi/n}. Theorem 2.4 tells us that after normalization to the ℓ² norm 1 the eigenvectors of T_n(b) are x_1, ..., x_n with
$$x_j = c_n(\alpha)\Bigl(\frac{1}{\alpha}\sin\frac{\pi j}{n+1}, \frac{1}{\alpha^2}\sin\frac{2\pi j}{n+1}, \ldots, \frac{1}{\alpha^n}\sin\frac{n\pi j}{n+1}\Bigr), \qquad (12.2)$$
where c_n(α) is the normalization constant. Note first of all that (12.1) is independent of b, while (12.2) depends on b. A second immediate observation is that the absolute values of the components of the eigenvectors (12.1) are constant, whereas those of (12.2) are localized in the left for |α| > 1 and in the right for |α| < 1. The eigenvectors x_1, ..., x_n form a basis in C^n. When identifying C^n with R^{2n}, these eigenvectors deliver the basis
$$(\mathrm{Re}\,x_1, \mathrm{Im}\,x_1), \ldots, (\mathrm{Re}\,x_n, \mathrm{Im}\,x_n), (-\mathrm{Im}\,x_1, \mathrm{Re}\,x_1), \ldots, (-\mathrm{Im}\,x_n, \mathrm{Re}\,x_n) \qquad (12.3)$$
287
i
i i
i
i
i
i
288
buch7 2005/10/5 page 288 i
Chapter 12. Eigenvectors and Pseudomodes
in R2n . Notice that Im xj = 0 for all j in the Toeplitz case provided α is real. The second half of the basis (12.3) clearly mimics the first half. In Figure 12.1 we plotted the 10 vectors of the first halves of the bases (12.3) for C10 (b) and T10 (b) with α = 3/2. We arrange the 2n vectors (12.3) to a (2n)×(2n) matrix and denote this matrix by ECn in the circulant case and by ETn (α) in the Toeplitz case. Figure 12.2 shows pseudocolor plots of the matrices EC30 and ET30 (3/2). A gray point indicates a matrix entry near zero, dark points stand for positive entries, and light points represent negative ones. The right picture of Figure 12.1 and especially the bottom picture of Figure 12.2 show that (12.3) is a very ill-conditioned basis in the Toeplitz case. Finally, Figures 12.3 and 12.4 depict surface plots of the matrices EC30 and ET30 (3/2) from two different viewpoints. Whatever insights these two figures might provide, they show us at least the bizarre difference between two worlds.
12.2
Eigenvectors of Triangular and Tridiagonal Matrices
We now turn to the eigenvectors of general banded Toeplitz matrices. Let b(t) = be a Laurent polynomial. As in Chapter 11, we write
s
b(t) − λ = t −r bs (t − z1 (λ)) . . . (t − zr+s (λ)), |z1 (λ)| ≤ · · · ≤ |zr+s (λ)|.
j =−r
bj t j
(12.4)
The system Tn (b − λ)xn = 0 (xn ∈ Cn ) is a difference equation with constant coefficients and r +s boundary conditions. Analogously, the equation T (b −λ)x0 = 0 (x0 ∈ 2 ) is a difference equation that has constant coefficients, and the constraints are s boundary conditions and the requirement that x0 be in 2 . Thus, once λ is an eigenvalue, the eigenvectors can be expressed in terms of the zeros z1 (λ), . . . , zr+s (λ) of (12.4) and certain coefficients that can be determined from the boundary constraints. The resulting formulas are nevertheless very complicated and inappropriate for answering the question we are interested in here in a straightforward fashion. Trench [278] derived simpler equations for the eigenvalues of Tn (b) and constructed simpler formulas for the eigenvectors (see Exercise 2). Again, these formulas cannot be immediately employed in order to see the asymptotic behavior of the eigenvectors. An interesting result on the eigenvectors was established by Zamarashkin and Tyrtyshnikov [301], [302]. They proved that the eigenvectors of Tn (b) are asymptotically distributed as the columns of the Hermitian adjoint of the Fourier matrix, 1 2π (n) (n) Fn∗ = √ (e−i n j k )n−1 j,k=0 =: (col1 , . . . , coln ), n in the following sense: For each ε > 0, the number of integers k ∈ {1, . . . , n} for which minλ Tn (b − λ) col(n) k 2 > ε is o(n) as n → ∞. We here consider an entirely different question. Take a λn in sp Tn (b) for each n ≥ 1 and suppose the points λn converge to some point λ. Clearly, λ belongs to the limiting set (b) studied in Chapter 11. Conversely, given any λ ∈ (b), we can find λn ∈ sp Tn (b) such that λn → λ. If λ is an eigenvalue of T (b), are the eigenvectors of Tn (b) corresponding to the eigenvalues λn related to the eigenvectors of T (b) corresponding to λ? What can be said about the eigenvectors of Tn (b) for λn in the case where λ is not an eigenvalue of T (b)?
i
i i
i
i
i
i
12.2. Eigenvectors of Triangular and Tridiagonal Matrices
buch7 2005/10/5 page 289 i
289
Figure 12.1. Eigenvectors of a tridiagonal circulant matrix (left) and a tridiagonal Toeplitz matrix (right).
i
i i
i
i
i
i
290
buch7 2005/10/5 page 290 i
Chapter 12. Eigenvectors and Pseudomodes
60 55 50 45 40 35 30 25 20 15 10 5 5
10
15
20
25
30
35
40
45
50
55
60
5
10
15
20
25
30
35
40
45
50
55
60
60 55 50 45 40 35 30 25 20 15 10 5
Figure 12.2. A pseudocolor plot of the real eigenvector basis of a tridiagonal circulant matrix (top) and a tridiagonal Toeplitz matrix (bottom).
i
i i
i
i
i
i
12.2. Eigenvectors of Triangular and Tridiagonal Matrices
buch7 2005/10/5 page 291 i
291
Eigenvectors of infinite Toeplitz matrices were investigated in Section 1.8. We repeat Proposition 1.20 in slightly modified form. Proposition 12.1. Let λ ∈ sp T (b) and suppose that λ ∈ / b(T). The point λ is an eigenvalue of T (b) as an operator on 2 if and only if wind (b, λ) = −m < 0. In that case dimKer T (b− λ) = m and Ker T (b − λ) = lin{x0 , V x0 , . . . , V m−1 x0 }, where V is the shift operator V : 2 → 2 , {w1 , w2 , . . . } → {0, w1 , w2 , . . . } (0) and x0 = {xk(0) }∞ k=1 is an exponentially decaying sequence with x1 = 1.
Proof. The proof of Proposition 1.20 gives −1 −1 )e0 , . . . , T (b+ )em−1 }. Ker T (b − λ) = lin {T (b+ −1 −1 )ej as the j th column of the lower-triangular Toeplitz matrix T (b+ ). We may think of T (b+ −1 As the 0th Fourier coefficient of b+ may be assumed to be 1 and the Fourier coefficients of −1 b+ decay exponentially, we get all assertions.
Triangular Toeplitz matrices. Suppose b(t) = b0 + b−1 t −1 + · · · + b−r t −r with r ≥ 1 and b−r = 0. The matrix T (b) is upper triangular. It is clear that sp Tn (b) = {b0 } for all / b(T). Let b−1 = · · · = b−m0 +1 = 0 and n and hence (b) = {b0 }. We assume that b0 ∈ b−m0 = 0. Then b(t) − b0 = b−m0 t −r (t − δ1 ) . . . (t − δp )(t − μ1 ) . . . (t − μq ), where |δk | < 1 and |μk | > 1 for all k. Since p + q = r − m0 , we get wind (b, b0 ) = −r + p = −m0 − q ≤ −m0 ≤ −1.
(12.5)
Thus, by Proposition 12.1, Ker T (b − b0 ) has the dimension m0 + q and from the proofs of Propositions 1.20 and 12.1 we infer that −1 −1 )e1 , . . . , T (b+ )em0 +q } Ker T (b − b0 ) = lin {T (b+
with −1 b+ (t) =
q
1+
k=1
t t2 + 2 + ··· μ1 μ1
.
On the other hand, it is easily seen that Ker Tn (b − b0 ) = lin {e1 , . . . , em0 }. Consequently, in general Ker Tn (b − b0 ) and Ker T (b − b0 ) are in no way related. If b−1 = 0 (⇔ m0 = 1), which is the generic case, then Ker Tn (b−b0 ) is spanned by e1 . If, in addition, wind (b, b0 ) = −1, then q = 0 and hence Ker T (b − b0 ) is also spanned by e1 . This is in accordance with Theorem 12.3, which will be proved below. However, in the case where b−1 = 0 and wind (b, b0 ) ≤ −2, no eigenvector of T (b) that is outside lin {e1 } can be approximated by eigenvectors of Tn (b). If b(t) = b0 + b1 t + · · · + bs t s (s ≥ 1, bs = 0), then T (b) is lower triangular. Again sp Tn (b) = {b0 } for all values of n and (b) = {b0 }. From (12.5) we obtain that
i
i i
i
i
i
i
292
buch7 2005/10/5 page 292 i
Chapter 12. Eigenvectors and Pseudomodes
0.2 0.15 0.1 0.05 0 –0.05 –0.1 –0.15 –0.2 60 50
60
40
50 30
40 30
20
20
10
10 0
0
1
0.5
0
–0.5
–1 60 50
60
40
50 30
40 30
20
20
10
10 0
0
Figure 12.3. A surface plot of the real eigenvector basis of a tridiagonal circulant matrix (top) and a tridiagonal Toeplitz matrix (bottom).
i
i i
i
i
i
i
12.2. Eigenvectors of Triangular and Tridiagonal Matrices
buch7 2005/10/5 page 293 i
293
0.2 0.15 0.1 0.05 0 –0.05 0 –0.1 –0.15 –0.2 60
20
40 50
40
30
20
10
0
60
1
0.5
0
0 –0.5 20 –1 60
40 50
40
30
20
10
0
60
Figure 12.4. Another view at the surface plots of Figure 12.3.
i
i i
i
i
i
i
294
buch7 2005/10/5 page 294 i
Chapter 12. Eigenvectors and Pseudomodes
wind (b, b0 ) = −wind ( b, b0 ) ≥ 1, and hence Ker T (b − b0 ) = {0} by virtue of Proposition 12.1. If b1 = · · · = bm0 −1 = 0 and bm0 = 0, then Ker Tn (b − b0 ) = lin {en , . . . , en−m0 +1 }. Tridiagonal Toeplitz matrices. Let b(t)√= b−1 t −1 + b0 + b1 t. Fix one of the two values √ of b−1 /b1 and denote it by α. Define b1 b−1 as b1 α. By Theorem 2.4, the eigenvalues of Tn (b) are # πj λj = b0 + 2 b1 b−1 cos n+1
(j = 1, . . . , n),
(n) n and an eigenvector for λj is xj,n = (xj,k )k=1 with
k−1 1 kπj πj = sin / sin (k = 1, . . . , n). α n+1 n+1 √ √ It follows that (b) is the line segment [ b0 − 2 b1 b−1 , b0 + 2 b1 b−1 ]. The range b(T) √ is an ellipse with the foci b0 ± 2 b1 b−1 for |α| = 1 and the line segment (b) for |α| = 1. Consequently, the spectrum of T (b) consists of the points on the ellipse and in its interior for |α| = 1 and coincides with the line segment (b) for |α| = 1. If |α| < 1, then wind (b, λ) = 1 for all λ ∈ (b), while if |α| > 1, we have wind (b, λ) = −1 for all λ ∈ (b). √ Now pick a point λ = b0 + 2 b1 b−1 cos θ ∈ (b) (θ ∈ [0, π ]) and chose any jn ∈ {1, . . . , n} such that πjn /(n + 1) → θ as n → ∞. Then (n) xj,k
# πjn λn := b0 + 2 b1 b−1 cos ∈ sp Tn (b) n+1 and λn → λ. For the kth component of the eigenvector xn := xjn ,n we obtain xk(n) =
k−1 k−1 1 sin kθ kπjn πjn 1 sin / sin → =: xk(0) as n → ∞, α n+1 n+1 α sin θ
with the convention that sin(kθ ) / sin θ = k in the cases θ = 0 and θ = π . Put x0 = {xk(0) }∞ / 2 , and T (b) is known to have no eigenvalues in this case k=1 . If |α| ≤ 1, then x0 ∈ (Proposition 12.1 for |α| < 1 and Theorem 1.31 for |α| = 1). However, if |α| > 1, then x0 ∈ 2 , Ker T (b − λ) is one dimensional and x0 is an eigenvector of T (b) for λ. Clearly, if |α| > 1, then xn converges to x0 not only componentwise but even in 2 . The conclusion is as follows. Let λ ∈ (b), λn ∈ sp Tn (b), and λn → λ. The matrix Tn (b) has exactly one eigenvector xn for λn satisfying the normalization condition x1(n) = 1. If λ is an eigenvalue of T (b) and x0 is an eigenvector for λ, then x0 can be normalized so that x1(0) = 1. In the case where λ is an eigenvalue of T (b), the vectors xn converge to x0 in 2 , but in the case where λ is not an eigenvalue of T (b), the vectors xn converge componentwise to some vector x0 that does not belong to the space 2 .
12.3 Asymptotics of Eigenvectors Now take λ ∈ C \ b(T) and suppose wind (b, λ) = −m < 0. We then can write b − λ = χ−m c, where χk (t) := t k (t ∈ T) and c is a Laurent polynomial without zeros on T and
i
i i
i
i
i
i
12.3. Asymptotics of Eigenvectors
buch7 2005/10/5 page 295 i
295
with wind (c, 0) = 0. Let c(t) = j cj t j (t ∈ T). Since 0 ∈ / c(T) and wind (c, 0) = 0, it follows that the operator T (c) is invertible on 2 , that the matrices Tn (c) are invertible for all sufficiently large n, and that Tn−1 (c)Pn converges strongly to T −1 (c) on 2 (Corollaries 1.11 and 3.8). Lemma 12.2. Let λ ∈ sp Tn (b). Suppose λ ∈ / b(T) and wind (b, λ) = −m < 0. A vector x = (xj )nj=1 belongs to Ker Tn (b − λ) if and only if ⎛ ⎞ xm+1 ⎜ xm+2 ⎟ ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ c1 cm ⎜ .. ⎟ ⎜ . ⎟ ⎜ c2 ⎟ ⎜ cm+1 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎟ x + · · · + x (12.6) −Tn (c) ⎜ = x ⎜ ⎟ ⎟. .. .. n m 1⎜ ⎜ ⎟ ⎝ ⎠ ⎝ ⎠ . . ⎜ 0 ⎟ ⎜ ⎟ cn cm+n−1 ⎜ . ⎟ ⎝ .. ⎠ 0 Proof. We have b − λ = cχ−m and hence, by Proposition 3.10, Tn (b − λ) = Tn (cχ−m ) = Tn (c)Tn (χ−m ) + Pn H (c)H (χm )Pn . It follows that Tn (b − λ)x = 0 if and only if −Tn (c)Tn (χ−m )x = Pn H (c)H (χm )Pn x, which is the same as (12.6). The following result shows that things are remarkably nice in the case m = 1. Theorem 12.3. Let λ ∈ (b), λn ∈ sp Tn (b), λn → λ. Suppose that λ ∈ / b(T) and that wind (b, λ) = −1. Then there exist n0 ∈ N, xn ∈ Cn (n ≥ n0 ), and x0 ∈ 2 such that x1(n) = 1 and Ker Tn (b − λn ) = lin {xn }, Ker T (b − λ) = lin {x0 }, xn → x0 in 2 . Proof. We have b − λ = χ−1 c with c as above and b − λn = b − λ + λ − λn = χ−1 (c + (λ − λn )χ1 ) =: χ−1 (c + δn χ1 ). Since Tn (c + δn χ1 ) = Tn (c) [I + δn Tn−1 (c)Tn (χ1 )] and δn Tn−1 (c)Tn (χ1 ) ≤ 1/2 for all n large enough, we see that Tn (c + δn χ1 ) is invertible for all sufficiently large n and that Tn−1 (c + δn χ1 )Pn converges strongly to T −1 (c) on 2 . Lemma 12.2 shows that xn = (xj(n) )nj=1 is in Ker Tn (b − λn ) if and only if ⎛
x2(n) x3(n) .. .
⎜ ⎜ ⎜ ⎜ ⎜ ⎝ x (n) n 0
⎞
⎛ c1 + δ n ⎟ ⎜ ⎟ c2 ⎟ (n) ⎜ ⎟ = −Tn−1 (c + δn χ1 ) x1 ⎜ .. ⎟ ⎝ . ⎠ cn
⎞ ⎟ ⎟ ⎟. ⎠
(12.7)
i
i i
i
i
i
i
296
buch7 2005/10/5 page 296 i
Chapter 12. Eigenvectors and Pseudomodes
Since dim Ker Tn (b − λn ) ≥ 1, the system (12.7) must have a solution. As equation (12.7) determines x2(n) , . . . , xn(n) uniquely once x1(n) is given, it follows that dim Ker Tn (b − λn ) = 1 and that Ker Tn (b − λn ) = lin {xn }, where xn = (xj(n) )nj=1 is specified by x1(n) = 1 and ⎛
x2(n) x3(n) .. .
⎜ ⎜ ⎜ ⎜ ⎜ ⎝ x (n) n 0
⎞ ⎟ ⎟ ⎟ ⎟ = −Tn−1 (c + δn χ1 ) ⎟ ⎠
⎛ ⎜ ⎜ ⎜ ⎝
c1 + δn c2 .. .
⎞ ⎟ ⎟ ⎟. ⎠
(12.8)
cn
The right-hand side of (12.8) converges to ⎛
⎞ c1 ⎜ ⎟ −T −1 (c) ⎝ c2 ⎠ . .. . (0) Consequently, xn converges in 2 to x0 = {xj(0) }∞ j =1 with x1 = 1 and
⎛ ⎞ ⎞ c1 x2(0) ⎜ ⎟ ⎜ x (0) ⎟ −1 ⎝ 3 ⎠ = −T (c) ⎝ c2 ⎠ . . .. .. . ⎛
This can be written as T (c)T (χ−1 )x0 +H (c)H (χ1 )x0 = 0, which is equivalent to the equality 0 = T (cχ−1 )x0 = T (b − λ)x0 . Thus, x0 is a nonzero element of Ker T (b − λ). Because dim Ker T (b − λ) = 1 by Proposition 12.1, it follows that Ker T (b − λ) = lin {x0 }. Example 12.4. This example illustrates Theorem 12.3. Let b(t) = i t 2 + 2 + t −1 +
1 −2 t − 2 t −3 . 2
The range b(T) and sp T50 (b) are shown in Figure 12.5. We consider the following points λn ∈ sp Tn (b): λ10 = 4.4190 + 0.2617i λ20 = 4.7423 + 0.1834i λ30 = 4.8177 + 0.1639i λ40 = 4.8463 + 0.1563i
∈ sp T10 (b), ∈ sp T20 (b), ∈ sp T30 (b), ∈ sp T40 (b),
λ50 = 4.8601 + 0.1526i ∈ sp T50 (b). These points are the beginning of a sequence of points λn ∈ sp Tn (b) that converge to some point λ ∈ (b), which is indicated by the arrow in Figure 12.5. For each of these λn , we compute an eigenvector xn ∈ Cn of Tn (b) and normalize it so that its first component is 1. By Theorem 12.3, the vectors xn must converge to some x0 ∈ 2 . This is convincingly seen
i
i i
i
i
i
i
12.3. Asymptotics of Eigenvectors
buch7 2005/10/5 page 297 i
297
3 2 1 0 −1 −2 −3 −4 −5 −2
0
2
4
6
Figure 12.5. The range b(T) and the 50 eigenvalues of T50 (b) for the symbol of Example 12.4. The arrow points to a point in (b) that is approximated by points λn ∈ sp Tn (b). in Figure 12.6, which shows the real part of xn in the left column and the imaginary part of xn in the right. The top row corresponds to n = 10, the bottom row to n = 50. We now turn to the case where wind (b, λ) = −m ≤ −2. We will then prove that generically the kernels of Tn (b − λn ) are one dimensional, Tn (b − λn ) = lin {xn }, that the vectors xn ∈ Cn converge to some x0 ∈ Ker T (b − λ), and that the limits x0 corresponding to different choices of the sequence λn → λ all belong to a single one-dimensional subspace of Ker T (b − λ). Given λ ∈ (b), we write the factorization (12.4) now in the form (0) b(t) − λ = bs t −r (t − z1(0) ) . . . (t − zr+s ),
(0) |z1(0) | ≤ · · · ≤ |zr+s |.
Since (b) = {λ ∈ C : |zr (λ)| = |zr+1 (λ)|} by virtue of Theorem 11.17, the point λ (0) |. belongs to (b) if and only if |zr(0) | = |zr+1 Theorem 12.5. Let r ≥ 1, s ≥ 1, λ ∈ (b), λ ∈ / b(T), λn ∈ sp Tn (b), and λn → λ. (0) Suppose |zr−1 | < |zr(0) |. Then, for all sufficiently large n, Ker Tn (b − λn ) = lin {xn } with certain xn =
(xj(n) )nj=1
∈ C satisfying x1(n) = 1. The limits n
xk(0) = lim xk(n) n→∞
exist for all k ≥ 2. If wind (b, λ) ≤ −1, then {1, x2(0) , x3(0) , . . . } ∈ Ker T (b − λ), while if wind (b, λ) ≥ 0, we have Ker T (b − λ) = {0} and {1, x2(0) , x3(0) , . . . } ∈ / 2 .
i
i i
i
i
i
i
298
buch7 2005/10/5 page 298 i
Chapter 12. Eigenvectors and Pseudomodes
1
1
0
0
−1
−1 0
10
20
30
40
50
1
1
0
0
−1
−1 0
10
20
30
40
50
1
1
0
0
−1
−1 0
10
20
30
40
50
1
1
0
0
−1
−1 0
10
20
30
40
50
1
1
0
0
−1
−1 0
10
20
30
40
50
0
10
20
30
40
50
0
10
20
30
40
50
0
10
20
30
40
50
0
10
20
30
40
50
0
10
20
30
40
50
Figure 12.6. The figure shows the real parts (left) and imaginary parts (right) of eigenvectors of Tn (b) corresponding to the eigenvalues λn of Example 12.4 for n = 10, 20, 30, 40, 50 (from the top to the bottom). As predicted by Theorem 12.3, these eigenvectors converge to some (infinite) vector in 2 .
i
i i
i
i
i
i
12.3. Asymptotics of Eigenvectors
buch7 2005/10/5 page 299 i
299
Proof. Recall the definition of b given in Section 11.1. The key observation, already used in Chapter 11, is the identity Tn (b − λn ) = D Tn (b − λn )D−1 , where D = diag (1, , . . . , n−1 ). This identity implies that sp Tn (b) = sp Tn (b ),
Ker Tn (b − λn ) = D−1 Ker Tn (b − λn ).
(12.9)
We have (0) b (t) − λ = bs −r t −r (t − z1(0) ) . . . (t − zr+s )
=
(0) bs (−z1(0) ) . . . (−zr+s )−r t −r
1−
t
z1(0)
... 1 −
t (0) zr+s
.
By assumption, we can choose a ∈ (0, ∞) such that |z1(0) |
≥ ··· ≥
(0) |zr−1 |
>1>
|zr(0) |
≥ ··· ≥
(0) |zr+s |
.
This gives wind (1 − t/zj(0) ) = 1 for j = 1, . . . , r − 1 and wind (1 − t/zj(0) ) = 0 for j = r, . . . , r + s, and hence wind (b , λ) = −r + r − 1 = −1. Thus, we can use Theorem 12.3 to conclude that there are yn ∈ Cn and y0 ∈ 2 such that y1(n) = 1 and Ker Tn (b − λn ) = lin {yn }, Ker T (b − λ) = lin {y0 },
yn → y0 in 2 .
From (12.9) we obtain that xn = (xj(n) )nj=1 ∈ Ker Tn (b − λn ) if and only if xj(n) = −j yj(n) with (yj(n) )nj=1 in Ker Tn (b − λn ). As Ker Tn (b − λn ) = lin {(1, y2(n) , . . . , yn(n) )}, where yk(n) → yk as n → ∞ for each k, it follows that Ker Tn (b − λn ) = lin {(1, x2(n) , . . . , xn(n) )} with xk(n) = −k yk(n) → −k yk =: xk(0) as n → ∞ for each k. Now suppose wind (b, λ) = −m ≤ −1. By Lemma 12.2 and (12.10), ⎛ (n) ⎞ ⎛ ⎡ ⎛ ⎞ xm+1 cm + δ n c 1 ⎜ . ⎟ ⎜ ⎢ .. ⎜ ⎟ ⎜ .. ⎟ . ⎜ ⎢ .. . ⎜ ⎟ ⎟ ⎜ ⎜ ⎢ ⎜ ⎟ ⎜ x (n) ⎟ (n) ⎜ .. (n) ⎜ ⎟ ⎜ n ⎟ = −T −1 (c + δn χm ) ⎢ c + δ + · · · + x x ⎢ m ⎜ m n ⎟ n 1 ⎜ . ⎜ 0 ⎟ ⎜ ⎢ ⎜ ⎟ ⎟ ⎜ .. ⎜ ⎢ .. ⎜ . ⎟ ⎝ ⎠ . ⎝ ⎣ . ⎝ .. ⎠ cn c m+n−1 0
(12.10)
⎞⎤ ⎟⎥ ⎟⎥ ⎟⎥ ⎟⎥ ⎟⎥ ⎟⎥ ⎟⎥ ⎠⎦
with x1(n) = 1. As x2(n) → x2(0) , . . . , xm(n) → xm(0) ,
δn → 0,
Tn−1 (c + δn χm ) → T −1 (c) (strongly),
i
i i
i
i
i
i
300
buch7 2005/10/5 page 300 i
Chapter 12. Eigenvectors and Pseudomodes
we obtain that (n) (0) (0) {xm+1 , . . . , xn(n) , 0, . . . } → {xm+1 , xm+2 ,...}
in 2 . Thus, xn → x0 in 2 . Because Tn (b − λn )xn = 0, it follows that T (b − λ)x0 = 0, that is, x0 belongs to Ker T (b − λ). Finally, let wind (b, λ) ≥ 0. Then Ker T (b − λ) = {0} by Proposition 12.1. Put xj(n) = 0 and xj(0) = 0 for j < 0. The kth equation of the system Tn (b − λn )xn = 0 reads (n) (n) (n) (n) bs xk−s + · · · + b1 xk−1 + (b0 − λn )xk(n) + b−1 xk+1 + · · · + b−r xk+r = 0.
Passage to the limit n → ∞ gives (0) (0) (0) (0) bs xk−s + · · · + b1 xk−1 + (b0 − λ)xk(0) + b−1 xk+1 + · · · + b−r xk+r = 0,
which is the kth equation of the infinite system T (b − λ)x0 = 0. Thus, if the sequence {1, x2(0) , x3(0) , . . . } would belong to 2 , then it would be in Ker T (b − λ), which is impossible. Example 12.6. We let b be as in Example 12.4, but we now consider the transpose matrices Tn ( b). Choose λn ∈ sp Tn ( b) = sp Tn (b) and the limiting point λ ∈ ( b) = (b) exactly as in Example 12.4. While wind (b, λ) = −1 in Example 12.4, we now have wind ( b, λ) = 1, and hence Theorem 12.5 implies that the corresponding eigenvectors (with first component equal to 1) do not converge in 2 . Figure 12.7 indicates that this is indeed the case. This figure also shows that nevertheless, as predicted by Theorem 12.5, for each fixed k the kth components of the eigenvectors converge. We finally remark that the assumption of Theorem 12.5 (that is, the requirement that (0) |zr−1 | < |zr(0) |) is generically satisfied in the following sense. Let P denote the set of all Laurent polynomials on T and equip P with the L∞ metric. Then the set of all b ∈ P satisfying the condition of Theorem 12.5 is open and dense in P.
12.4
Pseudomodes of Circulant Matrices
Let A be a bounded linear operator on a complex Hilbert space H. A point λ in C is said to be an ε-pseudoeigenvalue of A if (A − λI )−1 ≥ 1/ε (with the convention that (A − λI )−1 := ∞ in case A − λI is not invertible). In the language of Section 7.1, the ε-pseudoeigenvalues are just the points of the ε-pseudospectrum spε A. If λ is an εpseudoeigenvalue, then there exists a nonzero x ∈ H such that (A − λI )x ≤ εx. Each such x is called an ε-pseudomode (or ε-pseudoeigenvector) for A at λ. n×n Now suppose we are given a sequence {An }∞ . We think of n=1 of matrices An ∈ C n 2 An as an operator on C with the norm. We call a point λ ∈ C an asymptotically good pseudoeigenvalue for {An } if (An − λI )−1 2 → ∞ as n → ∞. In that case we can find nonzero vectors xn ∈ Cn satisfying (An − λI )xn 2 /xn 2 → 0 as n → ∞, and each sequence {xn } with this property will be called an asymptotically good pseudomode for {An } at λ.
i
i i
i
i
i
i
12.4. Pseudomodes of Circulant Matrices
301
8
8
6
6
4
4
2
2
0
0
–2
–2
–4
–4
–6 0
20
40
–6 0
80
60
60
40
40
20
20
0
0
–20
–20
–40
–40
–60
–60 0
20
40
–80 0
600
600
400
400
200
buch7 2005/10/5 page 301 i
20
40
20
40
20
40
20
40
200
0 0 –200 –200
–400
–400
–600 –800 0
20
40
6000
–600 0
6000
4000
4000
2000 2000 0 0 –2000 –2000
–4000 –6000 0
20
40
–4000 0
Figure 12.7. This figure is the analogue of the upper eight pictures of Figure b) 12.6 and shows the real parts (left) and imaginary parts (right) of eigenvectors of Tn ( corresponding to the eigenvalues λn of Example 12.6 for n = 10, 20, 30, 40 (from the top to the bottom). Notice the different scales of the vertical axes.
i
i i
i
i
i
i
302
buch7 2005/10/5 page 302 i
Chapter 12. Eigenvectors and Pseudomodes
This and the following section are devoted to the structure of asymptotically good pseudomodes for sequences constituted by the circulant cousins of Toeplitz band matrices (called α-matrices in theoretical chemistry [303]) and by Toeplitz matrices themselves. Given a subset Jn of {1, 2, . . . , n}, we denote by PJn the projection on Cn defined by yj for j ∈ Jn , (PJn y)j = (12.11) 0 for j ∈ / Jn . The number of elements in Jn will be denoted by |Jn |. Let {yn }∞ n=1 be a sequence of nonzero vectors yn ∈ Cn . We say that {yn } is asymptotically localized if there exists a sequence {Jn }∞ n=1 of sets Jn ⊂ {1, . . . , n} such that lim
n→∞
|Jn | =0 n
and
lim
n→∞
PJn yn 2 = 1. yn 2
We denote by Fn ∈ Cn×n the Fourier matrix: 1 n−1 Fn = √ ωnj k j,k=0 , n
ωn := e2π i/n .
n A sequence {yn }∞ n=1 of nonzero vectors yn ∈ C will be called asymptotically extended if {Fn yn } is asymptotically localized. We define the circulant matrix Cn (b) as in Section 2.1.
Theorem 12.7. Let b be a Laurent polynomial. A point λ ∈ C is an asymptotically good pseudoeigenvalue for {Cn (b)} if and only if λ ∈ b(T), in which case every asymptotically good pseudomode for {Cn (b)} is asymptotically extended. Proof. Clearly, Cn (b) − λI = Cn (b − λ). We know from Proposition 2.1 that ∗ Cn (b − λ) = Fn∗ diag (b(ωnj ) − λ)n−1 j =0 Fn =: Fn Dn Fn .
Since Fn is unitary, it follows that Cn−1 (b − λ)2 =
1 j
min |b(ωn ) − λ|
,
0≤j ≤n−1
which shows that Cn−1 (b − λ)2 → ∞ if and only if λ ∈ b(T). Now pick λ ∈ b(T) and suppose {xn } is an asymptotically good pseudomode for {Cn (b)} at λ. We may without loss of generality assume that xn 2 = 1. Put yn = (yj(n) )nj=1 = Fn xn . We have Cn (b − λ)xn 2 = Fn∗ Dn Fn xn 2 = Dn yn 2 .
(12.12)
Fix an ε > 0. For δ > 0, we put Gn (δ) = {j ∈ {1, . . . , n} : |b(ωnj −1 ) − λ| ≤ δ}, E(δ) = {θ ∈ [0, 2π ) : |b(eiθ ) − λ| < δ}.
i
i i
i
i
i
i
12.5. Pseudomodes of Toeplitz Matrices
buch7 2005/10/5 page 303 i
303
Since b is analytic in C \ {0}, the set E(δ) is a finite union of intervals. Hence |Gn (δ)|/n → |E(δ)|/(2π) as n → ∞, where |E(δ)| denotes the (length) measure of E(δ). Because |E(δ)| → 0 as δ → 0, there exist δ(ε) > 0 and N1 (ε) ≥ 1 such that |Gn (δ(ε))|/n < ε for all n ≥ N1 (ε). From (12.12) we infer that Dn yn 22 → 0 as n → ∞. Consequently, Dn yn 22 < εδ(ε)2 for all n ≥ N2 (ε). Since Dn yn 22 =
n
|b(ωnj −1 ) − λ|2 |yj(n) |2 ≥ δ(ε)2
j =1
|yj(n) |2 ,
j ∈G / n (δ(ε))
(n) 2 2 it follows that j ∈G / n (δ(ε)) |yj | < ε for n ≥ N2 (ε). Thus, PGn (δ(ε)) yn 2 > 1 − ε for all n ≥ N2 (ε). Put n(ε) = max(N1 (ε), N2 (ε)). Now let εk = 1/k (k ≥ 2). With δk := δ(εk ) and nk := n(εk ) we then have 1 |Gn (δk )| < n k
and
PGn (δk ) yn 22 > 1 −
1 k
for n ≥ nk .
(12.13)
We may without loss of generality assume that 1 < n2 < n3 < · · · . For 1 ≤ n < n2 , we let Jn denote an arbitrary subset of {1, . . . , n}. For n ≥ n2 , we define the sets Jn ⊂ {1, . . . , n} by Jn2 = Gn2 (δ2 ), Jn2 +1 = Gn2 +1 (δ2 ), . . . , Jn3 −1 = Gn3 −1 (δ2 ), Jn3 = Gn3 (δ3 ), Jn3 +1 = Gn3 +1 (δ3 ), . . . , Jn4 −1 = Gn4 −1 (δ3 ),
... .
From (12.13) we see that 1 |Jn −1 | 1 |Jn2 | < ,..., 3 < , n2 2 n3 − 1 2 1 |Jn4 −1 | |Jn3 | 1 < ,..., < , n3 3 n4 − 1 3
...,
which shows that |Jn |/n → 0 as n → ∞. Also by (12.13), 1 PJn2 yn2 22 > 1 − , . . . , PJn3 −1 yn3 −1 22 > 1 − 2 1 2 PJn3 yn3 2 > 1 − , . . . , PJn4 −1 yn4 −1 22 > 1 − 3
1 , 2 1 , 3
...,
and hence PJn yn 2 → 1 as n → ∞. Since yn 2 = 1 for all n, it results that {yn } is asymptotically localized. Consequently, {xn } is asymptotically extended.
12.5
Pseudomodes of Toeplitz Matrices
Let b be a Laurent polynomial. Suppose that λ ∈ / b(T) and wind (b, λ) = −m < 0. We then can write b − λ = cχ−m , where 0 ∈ / c(T), wind (c, 0) = 0, and χk is defined by χk (t) = t k (t ∈ T). The operator T (c) is invertible on 2 and, moreover, the matrices Tn (c) are invertible for all sufficiently large n, lim Tn−1 (c)2 = T −1 (c)2
n→∞
and
Tn−1 (c)Pn → T −1 (c) strongly
(12.14)
i
i i
i
i
i
i
304
buch7 2005/10/5 page 304 i
Chapter 12. Eigenvectors and Pseudomodes
(see Corollaries 3.8 and 6.5). We know from the proof of Proposition 1.20 that the m elements uj := T −1 (c)ej
(j = 1, . . . , m)
(12.15)
form a basis in Ker T (b − λ), where ej ∈ 2 is the sequence whose j th term is 1 and the remaining terms of which are zero. By Theorem 3.7, each point λ ∈ C \ b(T) with wind (b, λ) = 0 is an asymptotically good pseudoeigenvalue for {Tn (b)}. The following theorem provides us with a complete description of the structure of asymptotically good pseudomodes. Theorem 12.8. Suppose λ ∈ / b(T) and wind (b, λ) = −m < 0. Let xn ∈ Cn be unit vectors. The sequence {xn } is an asymptotically good pseudomode for {Tn (b)} at λ if and only if there exist γ1(n) , . . . , γm(n) ∈ C and zn ∈ Cn such that xn = γ1(n) Pn u1 + · · · + γm(n) Pn um + zn , sup
n≥1, 1≤j ≤m
|γj(n) |
< ∞,
lim zn 2 = 0,
n→∞
(12.16) (12.17)
where u1 , . . . , um are given by (12.15). Proof. Assume that (12.16) and (12.17) hold. Since Tn (b − λ)Pn 2 ≤ b − λ∞ , we see that Tn (b − λ)zn → 0. As the numbers |γj(n) | are bounded by a constant independent of n and as Pn → I strongly and T (b − λ)uj = 0, we obtain that lim Tn (b − λ)xn =
n→∞
m j =1
lim γ (n) Tn (b n→∞ j
− λ)uj = 0.
Thus, {xn } is an asymptotically good pseudomode. Conversely, suppose Tn (b −λ)xn 2 → 0. Put yn = Tn (b −λ)xn . With Qn = I −Pn , we have Tn (b − λ) = Tn (χ−m c) = Pn T (χ−m c)Pn = Pn T (χ−m )T (c)Pn = Pn T (χ−m )Pn T (c)Pn + Pn T (χ−m )Qn T (c)Pn =: An + Bn . Since T (χ−m ) is nothing but the shift operator {ξ1 , ξ2 , . . . } → {ξm+1 , ξm+2 , . . . }, it follows that Im An ⊂ Im Pn−m ,
Im Bn ⊂ Im P{n−m+1,...,n} ,
(12.18)
where Im C refers to the image (= range) of the operator C. Also recall (12.11). This implies that yn 22 = An xn + Bn xn 22 = An xn 22 + Bn xn 22 , and hence An xn 2 → 0 because yn 2 → 0. The equality Tn (χ−m )Tn (c)xn = An xn gives Tn (c)xn = γ1(n) e1 + · · · + γm(n) em + Tn (χm )An xn
i
i i
i
i
i
i
12.5. Pseudomodes of Toeplitz Matrices
buch7 2005/10/5 page 305 i
305
with certain complex numbers γ1(n) , . . . , γm(n) . Since ⎛ ⎞1/2 m ⎝ |γj(n) |2 ⎠ ≤ Tn (c)xn 2 + T (χm )2 An xn 2 ≤ b∞ + An xn 2 , j =1
we conclude that there is an M < ∞ such that |γj(n) | ≤ M for all n and j . Finally, from (12.14), (12.15) and the equality xn = γ1(n) Tn−1 (c)e1 + · · · + γm(n) Tn−1 (c)em + Tn−1 (c)Tn (χm )An xn we get (12.16) and (12.17) with zn = Tn−1 (c)Tn (χm )An xn +
m
γj(n) (Tn−1 (c)ej − Pn T −1 (c)ej ).
j =1
This completes the proof. We now sharpen the definition of an asymptotically localized sequence. We say that a sequence {yn } of vectors yn ∈ Cn is asymptotically strongly localized in the beginning part if P{1,...,jn } yn 2 lim =1 (12.19) n→∞ yn 2 for every sequence {jn }∞ n=1 such that jn → ∞ and 1 ≤ jn ≤ n. Asymptotic strong localization in the beginning part implies, for example, that (12.19) is true with jn = log log n for sufficiently large n. Theorem 12.9. Suppose λ ∈ / b(T) and wind (b, λ) = −m < 0. Then every asymptotically good pseudomode for {Tn (b)} at λ is asymptotically strongly localized in the beginning part. Proof. Let {xn } be an asymptotically good pseudomode for {Tn (b)} at λ. We may without loss of generality assume that xn 2 = 1 for all n. By Theorem 12.8, xn = γ1(n) Pn u1 + · · · + γm(n) Pn um + zn =: wn + zn , where u1 , . . . , um are given by (12.15) and γ1(n) , . . . , γm(n) , zn satisfy (12.17). Choose M < ∞ so that |γi(n) | ≤ M for all i and n. Let {jn } be any sequence such that jn → ∞ and 1 ≤ jn ≤ n. Put Jn = {1, . . . , jn } and Jnc = {jn + 1, . . . , n}. From (12.15) we infer that
(i) ∞ 2 u1 , . . . , um ∈ 2 . We have PJnc wn 2 ≤ M m i=1 PJnc ui 2 . Since ui = {uk }k=1 is in and hence P
Jnc
ui 22
=
n k=jn +1
2 |u(i) k |
≤
∞
2 |u(i) k | = o(1)
as
jn → ∞,
k=jn +1
it follows that PJnc wn 2 → 0 as n → ∞. Finally, 1 ≥ PJn xn 22 = 1 − PJnc xn 22 = 1 − PJnc (wn + zn )22 2 ≥ 1 − PJnc wn 2 + PJnc zn 2 ,
i
i i
i
i
i
i
306
buch7 2005/10/5 page 306 i
Chapter 12. Eigenvectors and Pseudomodes
and because PJnc wn 2 → 0 and PJnc zn 2 → 0 as n → ∞, we arrive at the conclusion that PJn xn 2 → 1. To conclude this section, suppose that λ ∈ C \ b(T) and wind (b, λ) = m > 0. We then have λ ∈ / b(T) and wind ( b, λ) = −m < 0. Moreover, with Wn defined by (3.9), Wn Tn (b − λ)Wn = Tn ( b − λ) and hence Tn (b − λ)xn 2 = Tn ( b − λ)Wn xn 2 . Consequently, by Theorem 12.8, a sequence {xn } of unit vectors is an asymptotically good pseudomode of {Tn (b)} at λ if and only if Wn xn = γ1(n) Pn u1 + · · · + γm(n) Pn um + z n ,
(12.20)
where |γj(n) | ≤ M < ∞ for all j and n, zn 2 → 0 as n → ∞, and u1 , . . . , um are given −1 c )ej . Clearly, (12.20) can be rewritten in the form by uj = T ( xn = γ1(n) Wn u1 + · · · + γm(n) Wn um + zn with zn = Wn zn . The analogue of Theorem 12.9 says that every asymptotically good pseudomode {xn } for {Tn (b)} at λ is asymptotically strongly localized in the terminating part, that is, the sequence {Wn xn } is asymptotically strongly localized in the beginning part.
Exercises 1. Let T2n (b) be a real symmetric Toeplitz matrix of order 2n. An eigenvalue λ of T2n (b) is said to be even (odd) if there exists an eigenvector x for λ such that Wn x = x (Wn x = −x). Show that the even and odd eigenvalues of T2n (b) are the eigenvalues of Tn (b) + Hn (b) and Tn (b) − Hn (b), respectively, where Hn (b) is the principal n × n submatrix of the infinite Hankel matrix H (b).
2. Let b(t) = sj =−r bj t j with rsb−r bs = 0. Suppose λ is an eigenvalue of Tn (b). Denote by ξ1 , . . . , ξm the distinct zeros of the polynomial zr (b(z) − λ) and let αi be the multiplicity of ξi . Define the matrix Ar+s as in Section 2.5. Finally, put dn (λ) = dim Ker (Tn (b) − λI ). (a) Prove that x = ( x0 x1 . . . xn−1 ) is an eigenvector of Tn (b) for λ if and only if xk =
m α i −1 i=1 j =0
n − k n−k−j Cj i j ! ξi j
(k = 0, 1, . . . , n − 1),
where ( C01 . . . Cα1 −1,1 . . . C0m . . . Cαm −1,m ) Ar+s = ( 0 . . . 0 ). (b) Show that dn (λ) = r + s − rank Ar+s . (c) Show that dn (λ) ≤ min(r, s). (d) Show that if dn (λ) = m ≥ 2, then λ is also an eigenvalue of Tn+1 (b) and Tn−1 (b) and dn+1 (λ) ≥ m − 1 and dn−1 (λ) ≥ m − 1.
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 307 i
307
3. Let k be a function in L1 (R) and suppose k(x) = 0 for |x| ≥ r. For τ > 2r, there is a unique continuation of k to a τ -periodic function kτ on all of R. A continuous analogue of the circulant matrix Cn (b) is the operator on L2 (0, τ ) that is defined by τ (Aτ f )(x) = γf (x) + kτ (x − t)f (t)dt, x ∈ (0, τ ), 0
where γ is a fixed complex number. Prove that a point λ ∈ C is an asymptotically good eigenvalue, which means that (Aτ − λI )−1 2 → ∞ as τ → ∞, if and only if λ = γ or r λ=γ + k(x)eiξ x dx −r
for some ξ ∈ R.
Notes The results of Sections 12.2 and 12.3 are from our paper [61] with Ramírez de Arellano. Sections 12.4 and 12.5 are based on our article [55]. Exercise 1 is a special case of results of [5] and [82]; there one can also find a corresponding result for T2n+1 (b). Exercise 2 summarizes the basic results of Trench’s paper [278]. A solution to Exercise 3 is in [55]. Further results: real symmetric Toeplitz matrices. Let b ∈ P. The matrix Tn (b) is real and symmetric if and only if b(eiθ ) (θ ∈ (−π, π )) is real and even. Assume this condition is satisfied. We then have π 1 1 π iθ −inθ bn = b(e )e dθ = b(eiθ ) cos nθ dθ. 2π −π π 0 If λ is an eigenvalue of Tn (b), then there exists an eigenvector x for λ such that Wn x = x or Wn x = −x. In the former case λ is called an even eigenvalue and in the latter case λ is said to be odd (recall Exercise 1). In [5], [82] it is shown that Tn (b) has exactly [ n+1 ] 2 even and exactly [ n2 ] odd eigenvalues (a repeated eigenvalue is necessarily both even and odd [99]). For α < β, we denote by Neven (α, β; n) and Nodd (α, β; n) the number of even and odd eigenvalues of Tn (b) in [α, β]. Trench [280] proved that lim
n→∞
Neven (α, β; n) Nodd (α, β; n) 1 = lim = |{θ ∈ (0, π ) : α ≤ b(eiθ ) ≤ β}|, n→∞ n n π
where | · | denotes Lebesgue measure.
i
i i
i
i
buch7 2005/10/5 page 308 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 309 i
Chapter 13
Structured Perturbations
The pseudospectrum spε Tn (b) measures the extent to which the spectrum of Tn (b) may change by an arbitrary perturbation of norm at most ε. We now consider perturbations that have the same Toeplitz band structure as Tn (b). In this way we can find the distance of Tn (b) to the nearest singular matrix within the set of all matrices of the same banded structure as Tn (b). Various condition numbers measure the sensitivity of properties of a matrix subject to perturbations. If the matrix has a certain structure, it is natural to require that the perturbations be of the same structure. This leads to the notion of structured condition numbers. In this chapter we study structured condition numbers for the Toeplitz structure. We give in particular a probabilistic argument which shows that in general we do not win anything by passing from unstructured condition numbers of banded Toeplitz matrices to Toeplitz-structured condition numbers.
13.1 Toeplitz Pseudospectra Let Pr,s denote the set of all Laurent polynomials c of the form c(t) =
s
j =−r
cj t j (t ∈ T). Toep[r,s]
For b ∈ Pr,s and ε > 0, we define the Toeplitz-structured pseudospectrum spε by . Tn (b) = sp Tn (b + ϕ). spToep[r,s] ε
Tn (b)
ϕ∈Pr,s , ϕ∞ ≤ε Toep[r,s]
Clearly, spε Tn (b) is a subset of spε Tn (b). We denote by Uσ (λ0 ) the open disk of radius σ centered at λ0 . Lemma 13.1. Let c ∈ Pr,s . If λ0 ∈ / (c), then there exist n0 ∈ N, σ > 0, δ > 0 such that Uσ (λ0 ) ∩ sp Tn (c + ϕ) = ∅ whenever ϕ ∈ Pr,s , ϕ∞ ≤ δ, and n ≥ n0 . Proof. From Theorem 11.3 we infer that there is a number ∈ (0, ∞) such that λ0 does not belong to sp T (c ). Hence, by Theorem 3.7, there exist n0 ∈ N and M ∈ (0, ∞) such 309
i
i i
i
i
i
i
310
buch7 2005/10/5 page 310 i
Chapter 13. Structured Perturbations
that Tn−1 (c − λ0 )2 ≤ M for all n ≥ n0 . Put σ = 1/(4M) and suppose that λ ∈ Uσ (λ0 ) and ϕ ∈ Pr,s . We have Tn ((c + ϕ) − λ) = Tn (c − λ0 ) + Tn (ϕ ) + (λ0 − λ)Pn and Tn (c − λ0 )2 ≥ 1/M > 1/(2M). Since |λ0 − λ| < 1/(4M) and Tn (ϕ )2 ≤ ϕ ∞ < 1/(4M) if only ϕ∞ ≤ δ for some sufficiently small δ > 0, it follows that Tn ((c + ϕ) − λ) is invertible for all λ ∈ Uσ (λ0 ) and all ϕ ∈ Pr,s with ϕ∞ ≤ δ. As the invertibility of Tn (c + ϕ − λ) is equivalent to the invertibility of Tn ((c + ϕ) − λ) (notice that the two matrices are similar), we arrive at the assertion. Theorem 13.2. If b ∈ Pr,s then lim inf spToep[r,s] Tn (b) = lim sup spToep[r,s] Tn (b) = ε ε n→∞
n→∞
.
(b + ϕ).
ϕ∈Pr,s , ϕ∞ ≤ε
Toep[r,s]
Proof. Let λ ∈ lim sup spε Tn (b). Then there are ϕnk in Pr,s and λnk in sp Tnk (b + ϕnk ) such that ϕnk ∞ ≤ ε and λnk → λ. As the unit ball of Cr+s+1 is compact, the sequence {ϕnk } has a subsequence {ϕnk } converging to some ϕ ∈ Pr,s of ∞-norm at most ε. We claim that λ ∈ (b + ϕ). Indeed, if λ ∈ / (b + ϕ) then, by Lemma 13.1, Uσ (λ) ∩ sp Tnk (b + ϕnk ) = ∅ for some σ > 0 and all sufficiently large nk , which is impossible because λnk → λ. Thus, we have proved that λ is in ∪ (b + ϕ), the union over all ϕ ∈ Pr,s for which ϕ∞ ≤ ε. Now let λ ∈ (b + ϕ) for some ϕ ∈ Pr,s with ϕ∞ ≤ ε. Then there are λn in Toep[r,s] sp Tn (b + ϕ) such that λn → λ, which shows that λ is in lim inf spε Tn (b).
13.2 The Nearest Singular Matrix Given an n × n matrix An , the distance d(An ) to the nearest singular matrix is defined as the infimum of the set of all ε > 0 for which there exists an n × n matrix Kn such that Kn 2 ≤ ε and 0 ∈ sp (An + Kn ). If An is singular, then d(An ) = 0. If An is invertible, we have d(An ) = inf{ε > 0 : 0 ∈ spε An } = inf{ε > 0 : A−1 n 2 ≥ 1/ε} −1 −1 −1 = inf{ε > 0 : ε ≥ A−1 n 2 } = An 2 = σ1 (An ).
Thus, in either case d(An ) equals the minimal singular value of An . Analogously, if A is a bounded linear operator on 2 , we define d(A) as the infimum of all ε > 0 such that 0 ∈ spε A. As above, d(A) = 0 if A is not invertible, while d(A) = A−1 −1 2 in case A is invertible. Now let b be a Laurent polynomial and suppose b has no zeros on the unit circle T. If wind b = 0, then (6.16) implies that −1 d(Tn (b)) = σ1 (Tn (b)) = Tn−1 (b)−1 (b)−1 2 → T 2 = d(T (b)),
that is, d(Tn (b)) stays away from zero. However, if wind b = 0, we deduce from Theorem 9.4 that d(Tn (b)) = σ1 (Tn (b)) converges to zero with at least exponential speed.
i
i i
i
i
i
i
13.2. The Nearest Singular Matrix
buch7 2005/10/5 page 311 i
311
Things are less dramatic when restricting ourselves to structured perturbations. For b ∈ Pr,s , we define d Toep[r,s] (Tn (b)) = inf ε > 0 : 0 ∈ sp Tn (b + ϕ), ϕ ∈ Pr,s , ϕ∞ ≤ ε . In the notation of Section 13.1,
d Toep[r,s] (Tn (b)) = inf ε > 0 : 0 ∈ spToep[r,s] Tn (b) . ε
Theorem 13.3. If b ∈ Pr,s , then lim d Toep[r,s] (Tn (b)) = inf
n→∞
⎧ ⎨
ε>0:0∈
⎩
. ϕ∈Pr,s , ϕ∞ ≤ε
⎫ ⎬
(b + ϕ) , ⎭
(13.1)
and we always have dist (0, sp T (b)) ≤ lim d Toep[r,s] (Tn (b)) ≤ dist (0, (b)). n→∞
(13.2)
Proof. Clearly, the infimum on the right of (13.1) is equal to inf f ∞ : f ∈ Pr,s , 0 ∈ (b + f ) =: . Let f ∈ Pr,s , 0 ∈ (b + f ), f ∞ < + ε. There are λn ∈ sp Tn (b + f ) such that λn → 0. As 0 ∈ sp Tn (b + f − λn ), we have d Toep[r,s] (Tn (b)) ≤ f − λn ∞ , whence lim sup d Toep[r,s] (Tn (b)) ≤ f ∞ < + ε. Since ε > 0 can be chosen arbitrarily, it follows that lim sup d Toep[r,s] (Tn (b)) ≤ . Let σ := lim inf d Toep[r,s] (Tn (b)). n→∞
Given any ε > 0, there exist fnk ∈ Pr,s such that 0 ∈ sp Tnk (b + fnk ) and fnk ∞ ≤ σ + ε. We may assume that the fnk ’s converge in L∞ to some f ∈ Pr,s satisfying f ∞ ≤ σ + ε (if necessary, we can pass to a subsequence). Assume 0 ∈ / (b + f ). Then, by Lemma 13.1, Uσ (0) ∩ sp Tnk (b + fnk ) = ∅ for all sufficiently large nk , which is impossible. Hence 0 ∈ (b + f ). We arrive at the conclusion that ≤ σ + ε, and as ε > 0 is an arbitrary number, it follows that ≤ σ . This completes the proof of equality (13.1). Since (b + δ) = (b) + δ for every complex number δ, we see that always lim d Toep[r,s] (Tn (b)) ≤ dist (0, (b)), and as (b + ϕ) ⊂ sp T (b + ϕ) and . sp T (b + ϕ) = sp T (b) + εD, ϕ∞ ≤ε
we obtain that dist (0, sp T (b)) ≤ lim d Toep[r,s] (Tn (b)). This gives (13.2). The following two examples show that the estimates (13.2) are sharp. Example 13.4. Let b ∈ P0,1 be given by b(t) = b0 + b1 t with |b0 | > |b1 | > 0. Then sp T (b) = b0 + b1 D, (b) = {b0 }, dist (0, sp T (b)) = |b0 | − |b1 |, dist (0, (b)) = |b0 |, . (b + ϕ) = b0 + εD. ϕ∈P0,1 , ϕ∞ ≤ε
i
i i
i
i
i
i
312
buch7 2005/10/5 page 312 i
Chapter 13. Structured Perturbations
Thus, by (13.1), dist (0, sp T (b)) < lim d Toep[0,1] (Tn (b)) = dist (0, (b)). n→∞
Example 13.5. Define b ∈ P1,1 by b(t) = 4 + 2t + t −1 . The range b(T) is the ellipse {4 + 3 cos θ + i sin θ : θ ∈ [0, 2π )}. Hence dist (0, sp T (b)) = 1. From Theorem 2.4 we know that if c(t) = c0 + c1 t + c−1 t −1 , then (c) is the line segment between the foci of the ellipse c(T), √ √ (c) = [c0 − 2 c1 c−1 , c0 + 2 c1 c−1 ]. (13.3) √ √ It follows √ that in the case at−1hand (b) = [4 − 2 2, 4 + 2 2], whence dist (0, (b)) = 4 − 2 2 > 1. Put ϕ(t) = t . Then, by (13.3), √ √ (b + ϕ) = [4 − 2 2 · 2, 4 + 2 2 · 2] = [0, 8], which together with Theorem 13.3 implies that lim d Toep[1,1] (Tn (b)) ≤ ϕ∞ = 1. Consequently, dist (0, sp T (b)) = lim d Toep[1,1] (Tn (b)) < dist (0, (b)). n→∞
The following lemma is needed to prove Corollary 13.7. This corollary clarifies what happens with the distance of a large Toeplitz band matrix to the nearest singular matrix when the distance is measured within the Toeplitz band matrices of the same structure as the original matrix. Lemma 13.6. The function b → (b) is upper-semicontinuous on Pr,s , that is, given any ε > 0, there is a δ > 0 such that (b + ϕ) ⊂ (b) + εD whenever ϕ ∈ Pr,s and ϕ∞ ≤ δ. Proof. The case where b vanishes identically is trivial. We may therefore without loss of generality assume that ε ∈ (0, b∞ ). Let K be the compact set / (b) + εD}. K = {λ ∈ C : |λ| ≤ 2b∞ , λ ∈ By virtue of Lemma 13.1, for each λ ∈ K there are n ∈ N, σ > 0, δ > 0 such that Uσ (λ) ∩ sp Tk (b + ϕ) = ∅ for all ϕ ∈ Pr,s with ϕ∞ ≤ δ and all k ≥ n. Since K is compact, we can find finitely many λj ∈ K, nj ∈ N, σj > 0, δj > 0 (j = 1, . . . , m) such that K⊂
m .
Uσj (λj ),
Uσj (λj ) ∩ sp Tk (b + ϕ) = ∅
j =1
for ϕ ∈ Pr,s , ϕ∞ ≤ δj , k ≥ nj . Put n0 = max nj and δ = min(δ1 , . . . , δm , b∞ ). If λ ∈ K, then λ ∈ Uσj (λj ) for some j and hence Tk (b + ϕ − λ) is invertible for ϕ ∈ Pr,s , ϕ∞ ≤ δ, k ≥ n0 . If λ > 2b∞ , then Tk (b + ϕ − λ) is invertible for all k and all ϕ ∈ Pr,s with ϕ∞ ≤ δ because Tk (b + ϕ) ≤ b∞ + ϕ∞ ≤ 2b∞ < |λ|.
i
i i
i
i
i
i
13.3. Structured Normwise Condition Numbers
buch7 2005/10/5 page 313 i
313
Thus, we have shown that if λ ∈ / (b) + εD, then Tk (b + ϕ − λ) is invertible for k ≥ n0 and ϕ ∈ Pr,s , ϕ∞ ≤ δ. Consequently, sp Tk (b + ϕ) ⊂ (b) + εD for all k ≥ n0 and all ϕ ∈ Pr,s satisfying ϕ∞ ≤ δ. This implies that (b + ϕ) ⊂ (b) + εD for ϕ ∈ Pr,s with ϕ∞ ≤ δ. Corollary 13.7. Let b ∈ Pr,s . If 0 ∈ (b), then d Toep[r,s] (Tn (b)) → 0 as n → ∞. If 0∈ / (b), then there is an ε > 0 such that d Toep[r,s] (Tn (b)) ≥ ε for all sufficiently large n. Proof. If 0 ∈ (b), then d Toep[r,s] (Tn (b)) goes to zero due to Theorem 13.3. If 0 ∈ / (b), we can find an ε > 0 such that 0 ∈ / (b) + εD. Theorem 13.3 and Lemma 13.6 so imply that lim d Toep[r,s] (Tn (b)) ≥ ε.
13.3
Structured Normwise Condition Numbers
Let K stand for R or C, let Mn (K) be the collection of all n × n matrices with entries in K, and let Str n (K) denote the matrices in Mn (K) which possess a certain prescribed structure. For example, Str n (K) might be the set of all symmetric Toeplitz matrices with entries in K. For an invertible matrix An ∈ Str n (K), a vector b ∈ Kn , and x ∈ Kn \ {0}, one defines δx2 κbStr (An , x) = lim sup : (An + δAn )(x + δx) = An x + δb, ε→0 εx2 δAn ∈ Str n (K), δAn 2 ≤ εAn 2 : δb ∈ Kn , δb2 ≤ εb2 . Two natural choices are b = 0 (no perturbations to the right-hand side) and b = An x (right-hand sides with the same relative error as in the matrix of the system). In the first case we speak of the structured (normwise) condition number and in the second case of the full structured (normwise) condition number, and we introduce the notations κ Str (An , x) := κ0Str (An , x),
Str κfull (An , x) := κAStrn x (An , x).
In what follows we also need the number Str (An , x) = sup A−1 n δAn x2 : δAn ∈ Str n (K), δAn 2 ≤ 1 . Proposition 13.8. If Str n (K) is invariant under multiplication by real numbers, then Str (An , x) b2 b2 Str −1 An 2 + ≤ κb (An , x) ≤ An 2 An 2 + . x2 x2 x2 Proof. Let δAn ∈ Str n (K), δb ∈ Kn and suppose δAn 2 ≤ εAn 2 , δb2 ≤ εb2 . If ε > 0 is sufficiently small, then A−1 n δAn 2 < 1. In this case the equation (An + δAn )(x + δx) = An x + δb gives −1 −1 x + δx = (I + A−1 n δAn ) (x + An δb) −1 2 = (I − A−1 n δAn )(x + An δb) + O(ε ) −1 2 = x + A−1 n δb − An δAn x + O(ε ).
(13.4)
i
i i
i
i
i
i
314
buch7 2005/10/5 page 314 i
Chapter 13. Structured Perturbations
It follows that δx2 A−1 A−1 n 2 δAn 2 x2 n 2 δb2 ≤ + + O(ε) εx2 εx2 εx2 A−1 n 2 b2 ≤ + A−1 n 2 An 2 + O(ε), x2 whence κbStr (An , x) ≤
A−1 n 2 b2 + A−1 n 2 An 2 , x2
which is the asserted estimate from above. To estimate κbStr (An , x) from below, choose δb = −
b2 δAn x. An 2 x2
Clearly, δb2 ≤ εb2 . By (13.4), −1 2 δx = A−1 n δb − An δAn x + O(ε ) b2 2 =− 1+ A−1 n δAn x + O(ε ), An 2 x2
which yields A−1 b2 δx2 n δAn x2 = 1+ + O(ε) εx2 An 2 x2 εx2 = = = b2 An 2 = 1 −1 = = 1+ A δAn x = = + O(ε). An 2 x2 x2 = n εAn 2 2 Since (1/(εAn 2 ))δAn ∈ Str n (K), we obtain that An 2 Str b2 (An , x) κbStr (An , x) ≥ 1 + An 2 x2 x2 Str (An , x) b2 = An 2 + , x2 x2 which proves the asserted lower estimate. From Proposition 13.8 we see in particular that Str (An , x) ≤ 2κ(An ) := An 2 A−1 κ Str (An , x) ≤ κfull n 2 .
For b = 0, Proposition 13.8 can be sharpened. Proposition 13.9. If Str n (K) is invariant under multiplication by real numbers, then κ Str (An , x) =
An 2 Str (An , x). x2
i
i i
i
i
i
i
13.3. Structured Normwise Condition Numbers
buch7 2005/10/5 page 315 i
315
2 Proof. Formula (13.4) now reads δx = −A−1 n δAn x + O(ε ), and hence
A−1 δx2 n δAn x2 = + O(ε). εx2 εx2 Since : A−1 n δAn x2 : δAn ∈ Str n (K), δAn 2 ≤ εAn 2 εx2 An 2 sup A−1 = n δBn x2 : δBn ∈ Str n (K), δBn 2 ≤ 1 , x2
sup
we arrive at the assertion. In the case where Str n (K) = Mn (K), we write κb (An , x), κ(An , x), κfull (An , x), and Str (An , x) instead of κbStr (An , x), κ Str (An , x), κfull (An , x), and Str (An , x). Proposition 13.10. We have (An , x) = A−1 n 2 x2 , κ(An , x) =
A−1 n 2
(13.5)
An 2 = κ(An ),
−1 κfull (An , x) = A−1 n 2 An 2 + An 2
(13.6) An x2 . x2
(13.7)
Proof. It is clear that (An , x) ≤ A−1 n 2 x2 . To prove that equality actually holds, −1 n n choose a nonzero y ∈ Kn such that A−1 n y2 = An 2 y2 and define δAn : K → K by y x δAn ξ = ξ, . x2 y2 Then δAn 2 = 1 and = = = −1 x y = = = x2 A−1 y2 = A−1 2 x2 , = x, A−1 δA x = A n 2 n n n = n x2 y2 =2 y2 which completes the proof of (13.5). Combining Proposition 13.9 and (13.5) we get (13.6), while Proposition 13.8 in conjunction with (13.5) gives (13.7). Theorem 13.11. Suppose Str n (K) is invariant under multiplication by real numbers. If An ∈ Str n (K) is an invertible matrix, x ∈ Kn is a nonzero vector, and ∗ Str (An , x) ≥ ω(A−1 n ) x2
(13.8)
with some ω ∈ (0, ∞), then " Str (An , x) ≥ κfull
ω A−1 n 2 An 2 . 2
(13.9)
i
i i
i
i
i
i
316
buch7 2005/10/5 page 316 i
Chapter 13. Structured Perturbations
Proof. Without loss of generality assume that x2 = 1. From (13.4) we see that : 1 −1 Str −1 κfull (An , x) = lim sup A δAn x − An δb2 , ε→0 ε n the supremum over all δAn ∈ Str n (K) and δb ∈ Kn such that (An + δAn )(x + δx) = An x + δb, δAn 2 ≤ εAn 2 , δb2 ≤ εAn x2 . Since −δb2 = δb2 , it follows that : 1 Str −1 −1 −1 −1 κfull (An , x) = lim sup max An δAn x − An δb2 , An δAn x + An δb2 , ε→0 ε the supremum over the same set as before. For arbitrary u, v ∈ Kn we have % 1 max u + v2 , u − v2 ≥ u22 + v22 ≥ √ (u2 + v2 ) 2 (note that if we are given a parallelogram with the sides a and b, then the length of its longest diagonal is d 2 = a 2 + b2 − 2ab cos ϕ ≥ a 2 + b2 because cos ϕ ≤ 0). Consequently, : 1 −1 Str −1 κfull (An , x) ≥ lim sup √ An δAn x2 + An δb2 , ε→0 2ε the supremum again taken over the same set as above, whence 1 Str κfull (An , x) ≥ √ An 2 Str (An , x) + An x2 A−1 n 2 . 2 Thus, we are left to show that An 2 Str (An , x) + An x2 A−1 n 2 ≥
% ωA−1 n 2 An 2 .
# ωAn 2 /A−1 n 2 . So assume that % An x2 < ωAn 2 /A−1 n 2 .
This is certainly true if An x2 ≥
(13.10)
The product of the 1 × n matrix x ∗ A−1 n and the n × 1 matrix An x is 1. Hence −1 ∗ 1 ≤ x ∗ A−1 n 2 An x2 = (An ) x2 An x2 .
(13.11)
From (13.8), (13.10), and (13.11) we get ∗ An 2 Str (An , x) ≥ ωAn 2 (A−1 n ) x2 % ωAn 2 ωAn 2 ≥ ># = ωAn 2 A−1 n 2 . −1 An x2 ωAn 2 /An 2
This completes the proof of (13.9). Things are very simple for circulant matrices. Let Circn (K) stand for the collection of the matrices Cn (b) defined in Section 2.1 with entries in K. Suppose b has no zeros on
i
i i
i
i
i
i
13.4. Toeplitz Systems
buch7 2005/10/5 page 317 i
317
T. Then Cn (b) is invertible for all n. Note that (Cn (b))∗ = Cn (b). Proposition 2.1 implies that κ(Cn (b)) → b∞ b−1 ∞ as n → ∞. Corollary 13.12. If Cn (b) is invertible, then Circ (Cn (b), x) = Cn−1 (b)x2 = Cn−1 (b)x2 , Cn (b)2 −1 κ Circ (Cn (b), x) = Cn (b)x2 , x2 " 1 Circ κfull (Cn (b), x) ≥ κ(Cn (b)). 2 Proof. By Proposition 2.1, Cn (b) = Fn∗ Dn (b)Fn with a unitary matrix Fn and the diagonal matrix Dn (b) := diag (b(1), b(ωn ), . . . , b(ωnn−1 )). Thus we can write Cn−1 (b) = Fn∗ Dn (b−1 )Fn . By definition, Circ (Cn (b), x) = sup{Cn−1 (b)Cn (g)x2 : Cn (g)2 ≤ 1}, and since Cn−1 (b) is also a circulant matrix and circulant matrices commute, we see that Circ (Cn (b), x) = sup{Cn (g)Cn−1 (b)x2 : Cn (g)2 ≤ 1} ≤ Cn−1 (b)x2 . As I ∈ Circn (K), we have Circ (Cn (b), x) ≥ Cn−1 (b)x2 . Thus, Circ (Cn (b), x) = Cn−1 (b)x2 . −1 −1 −1 Furthermore, Cn−1 (b)x2 = Fn∗ Dn (b )Fn x2 = Dn (b )Fn x2 , and since Dn (b ) = Sn Dn (b−1 ) with a unitary diagonal matrix Sn , we get −1
Dn (b )Fn x2 = Sn Dn (b−1 )Fn x2 = Dn (b−1 )Fn x2 = Fn∗ Dn (b−1 )Fn x2 = Cn−1 (b)x2 . Now the formula for κ Circ (Cn (b), x) follows from Proposition 13.9, while the estimate for Circ κfull (Cn (b), x) results from Theorem 13.11. Formula (9.33) and Corollary 13.12 show that if the Laurent polynomial b has no zeros on T, then for a typical x, κ Circ (Cn (b), x) ≈
b∞ b−1 2 b−1 ∞ x2 = b∞ b−1 2 x2 b−1 ∞
for all sufficiently large n. This is a little better than κ(Cn (b)) ≈ b∞ b−1 ∞ , but the improvement is hardly significant.
13.4 Toeplitz Systems Define Toepn (K) as the set of all n × n Toeplitz matrices with entries in K. In this section Toep we estimate κfull (Tn (b), x) and show that these full structured condition numbers always have the same (good or bad) behavior as the usual condition numbers κ(Tn (b)) as n → ∞. Following the Highams [160] and Rump [236], we associate the n × (2n − 1) matrix ⎞ ⎛ x0 xn−1 . . . x1 ⎟ ⎜ xn−1 . . . x1 x0 ⎟ x := xToep := ⎜ (13.12) ⎠ ⎝ ... ... ... ... xn−1 . . . x1 x0
i
i i
i
i
i
i
318
buch7 2005/10/5 page 318 i
Chapter 13. Structured Perturbations
with a vector x = (x0 , x1 , . . . , xn−1 ) ∈ Kn . For every n × n Toeplitz matrix ⎛
g−1 g0 ... gn−2
g0 ⎜ g1 Tn (g) = ⎜ ⎝ ... gn−1
⎞ . . . g−(n−1) . . . g−(n−2) ⎟ ⎟ ... ... ⎠ ... g0
we have the equality ⎛ ⎜ ⎜ Tn (g)x = x ⎜ ⎜ ⎝
g−(n−1) g−(n−2) ... gn−2 gn−1
⎞ ⎟ ⎟ ⎟ =: x g, ⎟ ⎠
(13.13)
which for n = 3 reads ⎛
g0 ⎝ g1 g2
g−1 g0 g1
⎞⎛
⎞
⎛
x0 x2 g−2 g−1 ⎠ ⎝ x1 ⎠ = ⎝ 0 0 g0 x2
x1 x2 0
0 x0 x1
x0 x1 x2
⎞
⎛
⎜ 0 ⎜ ⎜ ⎠ 0 ⎜ ⎝ x0
g−2 g−1 g0 g1 g2
⎞ ⎟ ⎟ ⎟, ⎟ ⎠
and which can be readily verified for general n. We also define the map W n by W n : Kn → Kn , (x0 , x1 , . . . , xn−1 ) → (x n−1 , . . . , x 1 , x 0 ), the bar denoting passage to the complex conjugate. Lemma 13.13. For each x ∈ Kn there exists a Toeplitz matrix Tn (g) ∈ Toepn (K) such that Tn (g)W n x = x and Tn (g)2 ≤ 1. Proof. Put y = W n x. In accordance with (13.12), ⎛ x 0 x 1 . . . x n−1 ⎜ x0 x1 . . . x n−1 y = ⎜ ⎝ ... ... ... x0 x1
⎞ ⎟ ⎟. ⎠ ... . . . x n−1
We extend the n × (2n − 1) matrix y to a (2n − 1) × (2n − 1) circulant matrix Cy by adding n − 1 more rows. For example, if n = 3, ⎛
x0 y = ⎝ 0 0
x1 x0 0
x2 x1 x0
0 x2 x1
⎞
0 0 ⎠, x2
⎛ ⎜ ⎜ Cy = ⎜ ⎜ ⎝
x0 0 0 x2 x1
x1 x0 0 0 x2
x2 x1 x0 0 0
0 x2 x1 x0 0
0 0 x2 x1 x0
⎞ ⎟ ⎟ ⎟. ⎟ ⎠
i
i i
i
i
i
i
13.4. Toeplitz Systems
buch7 2005/10/5 page 319 i
319
By Section 2.1, Cy = U ∗ DU with the unitary matrix U = F2n−1 and a diagonal matrix D = diag (d1 , . . . , d2n−1 ). Put dj+ = 0 if dj = 0 and dj+ = dj−1 if dj = 0. Then let + ) and C = U ∗ D + D ∗ U . We have D + = diag (d1+ , . . . , d2n−1 Cy C = U ∗ DU U ∗ D + D ∗ U = U ∗ DD + D ∗ U = U ∗ D ∗ U = Cy∗ . Since y consists of the first n rows of Cy , it follows that y C is the matrix constituted by the first n rows of Cy∗ : y C = Pn Cy∗ . Now define ⎞ ⎛ ⎞ ⎛ 1 g−(n−1) ⎜ 0 ⎟ ⎜ g−(n−2) ⎟ ⎟ ⎜ ⎟ ⎜ ⎟ := C ⎜ . . . ⎟ =: Ce1 . . . . (13.14) g := ⎜ ⎟ ⎜ ⎟ ⎜ ⎝ 0 ⎠ ⎝ gn−2 ⎠ 0 gn−1 From (13.13), (13.14), and the equality y C = Pn Cy∗ we obtain that Tn (g)y = y g = y Ce1 = Pn Cy∗ e1 = x. It remains to show that Tn (g)2 ≤ 1. The matrix C is obviously a circulant matrix, and by virtue of (13.14), g is the first column of C. This implies that Tn (g) coincides with the lower-left n × n block of C. For example, if n = 2, ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ g−1 c1 c 1 c2 c 3 C = ⎝ c 3 c 1 c 2 ⎠ , g = ⎝ g 0 ⎠ = ⎝ c2 ⎠ , c2 c3 c1 g1 c3 c2 c1 g0 g−1 T2 (g) = = . g1 g 0 c3 c2 Consequently, Tn (g)2 ≤ C2 = D + D ∗ 2 = 1. Theorem 13.14. For every vector x ∈ Kn , Toep κfull (Tn (b), x)
" ≥
1 κ(Tn (b)) . 2
Proof. By definition, Toep (Tn (b), x) = sup{Tn−1 (b)Tn (f )x2 : Tn (f )2 ≤ 1}. Since W n Tn (a)W n = Tn (a) and W n is isometric, we get Toep (Tn (b), x) = sup{W n Tn−1 (b)W n W n Tn (f )W n W n x2 : Tn (f )2 ≤ 1} = sup{Tn−1 (b)Tn (f )W n x2 : Tn (f )2 ≤ 1}. Lemma 13.13 tells us that there is an f such that Tn (f )W n x = x and Tn (f )2 ≤ 1. Hence Toep (Tn (b), x) ≥ Tn−1 (b)x2 = (Tn−1 (b))∗ x2 , and Theorem 13.11 now gives the assertion. Thus, if κ(Tn (b)) increases at least exponentially or polynomially, then so does Toep κfull (Tn (b), x). In other words, we do not win much when passing from unstructured condition numbers to full structured condition numbers.
i
i i
i
i
i
i
320
13.5
buch7 2005/10/5 page 320 i
Chapter 13. Structured Perturbations
Exact Right-Hand Sides
We now consider the structured condition numbers κ Toep (Tn (b), x). We begin with a simple auxiliary result. Lemma 13.15. Let n ≤ k, let A be an n × n matrix, and let B be an n × k matrix. Then AB2 ≥ A2 σmin (B), where σmin (B) denotes the smallest singular value of the matrix B. Proof. There is nothing to prove for σmin (B) = 0. So assume that σmin (B) > 0 and hence rank B = n. There is an x such that A2 = Ax2 and x2 = 1. Since rank B = n, there exists a y with By = x. It follows that 1 = By2 ≥ σmin (B)y2 , and this gives AB2 ≥ ABy2 /y2 = A2 /y2 ≥ A2 σmin (B). Theorem 13.16. For every vector x ∈ Kn , κ Toep (Tn (b), x) 1 σmin (x ) . ≥√ κ(Tn (b), x) n x2 Proof. We know from (13.13) that Tn (f )x = x f . Since √ Tn (f )2 ≤ Tn (f )F ≤ n f 2 , we have Toep (Tn (b), x) = sup{Tn−1 (b)Tn (f )x2 : Tn (f )2 ≤ 1} = sup{Tn−1 (b)x f 2 : Tn (f )2 ≤ 1} √ ≥ sup{Tn−1 (b)x f 2 : n f 2 ≤ 1} √ √ 1 = √ sup{Tn−1 (b)x n f 2 : n f 2 ≤ 1} n 1 1 = √ Tn−1 (b)x 2 ≥ √ Tn−1 (b)2 σmin (x ), n n the last inequality resulting from Lemma 13.15. The assertion now follows from Propositions 13.9 and 13.10. Thus, the ratio κ Toep (Tn (b), x)/κ(Tn (b), x) can be estimated from below by the smallest singular value of the n × (2n − 1) matrix x . Rump [236] took 106 samples of x ∈ R100 with independent xj that are either uniformly distributed random variables in [−1, 1] or random variables with standard normal distribution. In either case, he observed that the mean of σmin (x )/x2 is 0.31 and that the standard deviation is 0.069. Thus, vectors x with small σmin (x )/x2 are rare. One such rare vector is composed of the coefficients of the polynomial x(t) = (t + 1)n−1 . This vector was proposed by Georg Heinig, and Rump showed numerically that for this vector n σmin (x ) 2 x2 5
i
i i
i
i
i
i
13.5. Exact Right-Hand Sides
buch7 2005/10/5 page 321 i
321
(see Figure 13.1). Notice, however, that a small value of σmin (x )/x2 does not yet imply that the quotient κ Toep (Tn (b), x)/κ(Tn (b), x) is also small, because Theorem 13.16 contains only a lower estimate (see also Example 13.19 below). Moreover, in general the determination of σmin (x ) is difficult.
0 −5 −10 −15 −20
10
20
30
40
Figure 13.1. The curve log10 (σmin (x )/x2 ) (solid) and the curve n log10 (2/5) (dashed) for 5 ≤ n ≤ 40. Here is another estimate for κ Toep (Tn (b), x). Theorem 13.17. Let x = (x0 , x1 , . . . , xn−1 ) ∈ Kn and suppose the polynomial x(t) = x0 + x1 t + · · · + xn−1 t n−1 has exactly zeros (counted with multiplicities) on the unit circle T. Thus, x(t) =
(t − μj ) z(t), μj ∈ T, z(t) = 0 for t ∈ T,
j =1
where z(t) is a polynomial of degree n − − 1. Put min |z| = mint∈T |z(t)|. If the matrix Tn (b) is invertible, then Toep (Tn (b), x) ≥
min |z| T −1 (b)2 2/2 n+1/2 n
(13.15)
and hence κ Toep (Tn (b), x) min |z| 1 ≥ /2 +1/2 . κ(Tn (b), x) 2 n x2
(13.16)
Proof. To avoid involved notation, let us assume that $n = 4$ and $\ell = 2$. Thus,
\[ x_0 + x_1t + x_2t^2 + x_3t^3 = (t-\mu_1)(t-\mu_2)(z_0+z_1t) \tag{13.17} \]
with $|\mu_1| = |\mu_2| = 1$ and $z_0 + z_1t \ne 0$ for $t \in \mathbb{T}$. Factorization (13.17) is equivalent to the matrix equality
\[ \begin{pmatrix} x_0\\ x_1\\ x_2\\ x_3 \end{pmatrix}
= \begin{pmatrix} -\mu_1 & 0 & 0\\ 1 & -\mu_1 & 0\\ 0 & 1 & -\mu_1\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -\mu_2 & 0\\ 1 & -\mu_2\\ 0 & 1 \end{pmatrix}
\begin{pmatrix} z_0\\ z_1 \end{pmatrix}, \]
or, in other and self-evident notation,
\[ x = T_{4,3}(t-\mu_1)\,T_{3,2}(t-\mu_2)\,z. \tag{13.18} \]
Choose $y \in K^4$ so that $\|y\|_2 = 1$ and $\|T_4^{-1}(b)y\|_2 = \|T_4^{-1}(b)\|_2$. We first assume that $K = \mathbb{C}$. We begin with determining an $h \in \mathcal{P}_3$ such that
\[ T_{4,2}(h)z = y, \tag{13.19} \]
that is,
\[ \begin{pmatrix} h_0 & h_{-1}\\ h_1 & h_0\\ h_2 & h_1\\ h_3 & h_2 \end{pmatrix}\begin{pmatrix} z_0\\ z_1 \end{pmatrix} = \begin{pmatrix} y_0\\ y_1\\ y_2\\ y_3 \end{pmatrix}. \tag{13.20} \]
Equation (13.20) is certainly satisfied if
\[ \begin{pmatrix} z_1 & z_0 & 0 & 0 & 0\\ 0 & z_1 & z_0 & 0 & 0\\ 0 & 0 & z_1 & z_0 & 0\\ 0 & 0 & 0 & z_1 & z_0\\ z_0 & 0 & 0 & 0 & z_1 \end{pmatrix}\begin{pmatrix} h_{-1}\\ h_0\\ h_1\\ h_2\\ h_3 \end{pmatrix} = \begin{pmatrix} y_0\\ y_1\\ y_2\\ y_3\\ 0 \end{pmatrix}, \]
or, equivalently, $C_5(z_1+z_0t^{-1})h = y$. Thus, we can take
\[ h = C_5^{-1}(z_1+z_0t^{-1})\begin{pmatrix} y\\ 0 \end{pmatrix}, \]
whence
\[ \|h\|_2 \le \frac{1}{\min_{t\in\mathbb{T}}|z_1+z_0t^{-1}|}\,\|y\|_2 = \frac{1}{\min|z|}\,\|y\|_2. \tag{13.21} \]
Next we seek a $g \in \mathcal{P}_3$ such that
\[ T_{4,3}(g)\,T_{3,2}(t-\mu_2) = T_{4,2}(h), \tag{13.22} \]
which is the system
\[ \begin{pmatrix} g_0 & g_{-1} & g_{-2}\\ g_1 & g_0 & g_{-1}\\ g_2 & g_1 & g_0\\ g_3 & g_2 & g_1 \end{pmatrix}\begin{pmatrix} -\mu_2 & 0\\ 1 & -\mu_2\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} h_0 & h_{-1}\\ h_1 & h_0\\ h_2 & h_1\\ h_3 & h_2 \end{pmatrix}. \]
The last system is satisfied as soon as $g_{-2} = 0$ and
\[ \begin{pmatrix} -\mu_2 & 0 & 0 & 0 & 0\\ 1 & -\mu_2 & 0 & 0 & 0\\ 0 & 1 & -\mu_2 & 0 & 0\\ 0 & 0 & 1 & -\mu_2 & 0\\ 0 & 0 & 0 & 1 & -\mu_2 \end{pmatrix}\begin{pmatrix} g_{-1}\\ g_0\\ g_1\\ g_2\\ g_3 \end{pmatrix} = \begin{pmatrix} h_{-1}\\ h_0\\ h_1\\ h_2\\ h_3 \end{pmatrix}, \tag{13.23} \]
that is, $T_5(t-\mu_2)g = h$. The solution of (13.23) is
\[ \begin{pmatrix} g_{-1}\\ g_0\\ g_1\\ g_2\\ g_3 \end{pmatrix} = -\frac{1}{\mu_2}\begin{pmatrix} 1 & & & &\\ 1/\mu_2 & 1 & & &\\ 1/\mu_2^2 & 1/\mu_2 & 1 & &\\ 1/\mu_2^3 & 1/\mu_2^2 & 1/\mu_2 & 1 &\\ 1/\mu_2^4 & 1/\mu_2^3 & 1/\mu_2^2 & 1/\mu_2 & 1 \end{pmatrix}\begin{pmatrix} h_{-1}\\ h_0\\ h_1\\ h_2\\ h_3 \end{pmatrix}, \]
which gives $\|g\|_2^2 \le (5+4+3+2+1)\,\|h\|_2^2$ and thus
\[ \|g\|_2 \le \sqrt{2}\cdot 4\,\|h\|_2 \tag{13.24} \]
(for general $n$ the sum $5+4+3+2+1$ and the factor $\sqrt{2}\cdot 4$ become $(2n-3)+(2n-4)+\cdots+1$ and $\sqrt{2}\,n$, respectively). Finally, we want an $f \in \mathcal{P}_3$ such that
\[ T_4(f)\,T_{4,3}(t-\mu_1) = T_{4,3}(g), \tag{13.25} \]
or, equivalently,
\[ \begin{pmatrix} f_0 & f_{-1} & f_{-2} & f_{-3}\\ f_1 & f_0 & f_{-1} & f_{-2}\\ f_2 & f_1 & f_0 & f_{-1}\\ f_3 & f_2 & f_1 & f_0 \end{pmatrix}\begin{pmatrix} -\mu_1 & 0 & 0\\ 1 & -\mu_1 & 0\\ 0 & 1 & -\mu_1\\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} g_0 & g_{-1} & 0\\ g_1 & g_0 & g_{-1}\\ g_2 & g_1 & g_0\\ g_3 & g_2 & g_1 \end{pmatrix}. \]
This is satisfied provided $f_{-2} = f_{-3} = 0$ and
\[ \begin{pmatrix} -\mu_1 & 0 & 0 & 0 & 0\\ 1 & -\mu_1 & 0 & 0 & 0\\ 0 & 1 & -\mu_1 & 0 & 0\\ 0 & 0 & 1 & -\mu_1 & 0\\ 0 & 0 & 0 & 1 & -\mu_1 \end{pmatrix}\begin{pmatrix} f_{-1}\\ f_0\\ f_1\\ f_2\\ f_3 \end{pmatrix} = \begin{pmatrix} g_{-1}\\ g_0\\ g_1\\ g_2\\ g_3 \end{pmatrix}. \tag{13.26} \]
As above, we have
\[ \|f\|_2 \le \sqrt{2}\cdot 4\,\|g\|_2. \tag{13.27} \]
From (13.18), (13.25), (13.22), (13.19) we get
\[ T_4(f)x = T_4(f)\,T_{4,3}(t-\mu_1)\,T_{3,2}(t-\mu_2)\,z = T_{4,3}(g)\,T_{3,2}(t-\mu_2)\,z = T_{4,2}(h)\,z = y, \tag{13.28} \]
and from (13.27), (13.24), (13.21) we obtain
\[ \|T_4(f)\|_2 \le \|T_4(f)\|_F \le \sqrt{4}\,\|f\|_2 \le \sqrt{4}\,(\sqrt{2}\cdot 4)(\sqrt{2}\cdot 4)\,\frac{1}{\min|z|}\,\|y\|_2 = \frac{2^{\ell/2}\,n^{\ell+1/2}}{\min|z|}\,\|y\|_2 = \frac{2^{\ell/2}\,n^{\ell+1/2}}{\min|z|} =: M. \tag{13.29} \]
We have
\[ \Phi^{\mathrm{Toep}}(T_n(b),x) = \sup\{\|T_n^{-1}(b)T_n(\varphi)x\|_2 : \|T_n(\varphi)\|_2 \le 1\}, \]
and since $\|T_n(f/M)\|_2 \le 1$ due to (13.29), it follows that $\Phi^{\mathrm{Toep}}(T_n(b),x) \ge \|T_n^{-1}(b)T_n(f)x\|_2/M$. Equality (13.28) now implies that
\[ \Phi^{\mathrm{Toep}}(T_n(b),x) \ge \|T_n^{-1}(b)y\|_2/M = \|T_n^{-1}(b)\|_2/M. \]
This completes the proof of (13.15). Estimate (13.16) follows from (13.15) and Propositions 13.9 and 13.10:
\[ \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} = \frac{\|T_n(b)\|_2\,\Phi^{\mathrm{Toep}}(T_n(b),x)}{\|x\|_2}\cdot\frac{1}{\|T_n(b)\|_2\,\|T_n^{-1}(b)\|_2} \ge \frac{1}{M}\,\frac{1}{\|x\|_2}. \]
This completes the proof in the case $K = \mathbb{C}$.

Now assume $K = \mathbb{R}$. Then the zeros of $x(t)$ are either real or form pairs $\mu, \overline{\mu}$. In the case of real zeros, we can proceed as before. So assume, for the sake of definiteness, that $\mu_1 = \mu$ and $\mu_2 = \overline{\mu}$ in (13.17). Then the two systems (13.23) and (13.26), which read $T_5(t-\overline{\mu})g = h$ and $T_5(t-\mu)f = g$, can be united in the single system $A_5f = h$ with
\[ A_5 = T_5(t-\overline{\mu})\,T_5(t-\mu) = T_5\bigl(t^2 - 2(\operatorname{Re}\mu)\,t + 1\bigr). \]
Since $A_5$ is an invertible real matrix, it follows that the solution $f$ is also real. The rest is as above.

In the following two examples we assume that $b$ is a Laurent polynomial without zeros on $\mathbb{T}$ but with nonzero winding number. By Theorem 4.1 and Proposition 13.10, in this case the usual condition number increases at least exponentially,
\[ \kappa(T_n(b),x) = \|T_n(b)\|_2\,\|T_n^{-1}(b)\|_2 \ge C e^{\gamma n} \tag{13.30} \]
with constants $C > 0$ and $\gamma > 0$, where $\|T_n^{-1}(b)\|_2 := \infty$ if $T_n(b)$ is not invertible.

Example 13.18. Fix $x = (x_0, x_1, \ldots, x_{m-1}) \in K^m$ and extend $x$ by zeros before $x_0$ or after $x_{m-1}$ to a vector in $K^n$ ($n \ge m$). Then $\ell$ and $\min|z|$ are constant, and the right-hand side of (13.16) goes to zero as $n^{-\ell-1/2}$. This in conjunction with (13.30) shows that $\kappa^{\mathrm{Toep}}(T_n(b),x)$ grows exponentially. Since in the case at hand $\sigma_{\min}(x')$ does not depend on $n$, the same conclusion can also be drawn from Theorem 13.16.

Example 13.19. Suppose $x(t) = 1 + t^{n-1}$. Then $x(t)$ has $n-1$ zeros on $\mathbb{T}$, and (13.16) tells us that
\[ \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} \ge \frac{1}{2^{(n-1)/2}\,n^{n-1/2}\,\sqrt{2}}. \]
This nourishes the hope that $\kappa^{\mathrm{Toep}}(T_n(b),x)$ might grow essentially more slowly than $\kappa(T_n(b),x)$. However, this is not the case. To see this, we modify the argument in the
proof of Theorem 13.17. Take $y \in K^n$ so that $\|y\|_2 = 1$ and $\|T_n^{-1}(b)y\|_2 = \|T_n^{-1}(b)\|_2$. We seek a $g \in \mathcal{P}_{n-1}$ with coefficients in $K$ such that $T_n(g)x = y$, that is,
\[ g_0 + g_{-(n-1)} = y_0, \quad g_1 + g_{-(n-2)} = y_1, \quad \ldots, \quad g_{n-1} + g_0 = y_{n-1}. \]
This is satisfied for
\[ g_0 = g_{-1} = \cdots = g_{-(n-2)} = 0, \quad g_{-(n-1)} = y_0, \quad g_1 = y_1,\ g_2 = y_2,\ \ldots,\ g_{n-1} = y_{n-1}. \]
It follows that $\|T_n(g)\|_2 \le \|T_n(g)\|_F \le \sqrt{n}\,\|g\|_2 = \sqrt{n}\,\|y\|_2 = \sqrt{n}$, whence
\[ \Phi^{\mathrm{Toep}}(T_n(b),x) \ge \|T_n^{-1}(b)T_n(g/\sqrt{n})x\|_2 = \frac{1}{\sqrt{n}}\,\|T_n^{-1}(b)y\|_2 = \frac{1}{\sqrt{n}}\,\|T_n^{-1}(b)\|_2, \]
and thus
\[ \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} \ge \frac{1}{\sqrt{n}}\,\frac{1}{\|x\|_2} = \frac{1}{\sqrt{2n}}. \]
Consequently, $\kappa^{\mathrm{Toep}}(T_n(b),x)$ increases exponentially together with $\kappa(T_n(b),x)$.

Moral: Inequality (13.16) is a lower estimate that does a good job in many cases but that may be too crude in other cases.

Here is a probabilistic argument which shows that if (13.30) holds, then we at least in general do not win anything by passing from unstructured condition numbers to structured condition numbers. Recall that a Rademacher variable is a random variable that assumes the value 1 with probability 1/2 and the value $-1$ with probability 1/2. A complex random variable is said to have a certain distribution if its real and imaginary parts are independent random variables with that distribution.

Theorem 13.20. Let $x_0, x_1, \ldots, x_{n-1} \in K$ be independent standard normal or independent Rademacher variables and put $x = (x_0, x_1, \ldots, x_{n-1})$. There exist universal constants $\varepsilon \in (0,\infty)$ and $n_0 \in \mathbb{N}$ such that
\[ \mathbf{P}\left( \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} \ge \frac{\varepsilon}{n^{3/2}} \right) > \frac{99}{100} \]
for all Laurent polynomials $b$ with coefficients in $K$ and all $n \ge n_0$.

Proof. We confine ourselves to the case $K = \mathbb{C}$; the case $K = \mathbb{R}$ is analogous (and a little simpler). Thus, let $u_0, v_0, \ldots, u_{n-1}, v_{n-1}$ be independent standard normal or independent Rademacher variables. Put $x_0 = u_0 + iv_0, \ldots, x_{n-1} = u_{n-1} + iv_{n-1}$ and let
\[ u(t) = u_0 + u_1t + \cdots + u_{n-1}t^{n-1}, \qquad v(t) = v_0 + v_1t + \cdots + v_{n-1}t^{n-1}, \qquad x(t) = u(t) + iv(t). \]
A deep recent result by Konyagin and Schlag [181] states that there is a universal constant $D \in (0,\infty)$ such that
\[ \limsup_{n\to\infty}\,\mathbf{P}\left( \min_{t\in\mathbb{T}}|u(t)| < \frac{\delta}{\sqrt{n}} \right) \le D\delta \]
for each $\delta > 0$ (they even proved this with the unit circle $\mathbb{T}$ replaced by the annulus $\{t \in \mathbb{C} : ||t|-1| < \varepsilon/n^2\}$). Choose $\delta > 0$ so that $D\delta < 0.005$. Then there is an $n_0$ such that
\[ \mathbf{P}\left( \min_{t\in\mathbb{T}}|x(t)| < \frac{\delta}{\sqrt{n}} \right) \le \mathbf{P}\left( \min_{t\in\mathbb{T}}|u(t)| < \frac{\delta}{\sqrt{n}} \right) < 0.005 \tag{13.31} \]
for all $n \ge n_0$. Theorem 13.17 tells us that
\[ \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} \ge \min_{t\in\mathbb{T}}|x(t)|\,\frac{1}{\|x\|_2\sqrt{n}}. \tag{13.32} \]
In the case of Rademacher variables we have $\|x\|_2 = \sqrt{2n}$, and (13.31) in conjunction with (13.32) gives
\[ \mathbf{P}\left( \min_{t\in\mathbb{T}}|x(t)| \ge \frac{\delta}{\sqrt{n}} \ \text{ and }\ \frac{1}{\|x\|_2} \ge \frac{1}{20\sqrt{n}} \right) > 0.99. \]
Now (13.32) gives
\[ \mathbf{P}\left( \frac{\kappa^{\mathrm{Toep}}(T_n(b),x)}{\kappa(T_n(b),x)} \ge \frac{\delta}{20\,n^{3/2}} \right) \ge \mathbf{P}\left( \min_{t\in\mathbb{T}}|x(t)|\,\frac{1}{\|x\|_2\sqrt{n}} \ge \frac{\delta}{20\,n^{3/2}} \right) > 0.99, \]
which is the assertion with $\varepsilon = \delta/20$.

We conclude with an example of a Toeplitz-like structure for which there may occur indeed drastic differences between structured and unstructured condition numbers.
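Before turning to that example, readers who want to see the mechanism behind (13.31) can estimate the Konyagin–Schlag probability by simulation. The following Python sketch is only an illustration of ours (grid size, $\delta$, and the number of trials are arbitrary choices; the grid minimum only overestimates $\min_{t\in\mathbb{T}}|x(t)|$, so the estimated probability is a slight underestimate).

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, delta = 100, 2000, 0.5
theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
grid = np.exp(1j * theta)                 # sample points on the unit circle T
powers = grid[:, None] ** np.arange(n)    # powers t^0, ..., t^{n-1} on the grid

hits = 0
for _ in range(trials):
    u = rng.choice([-1.0, 1.0], size=n)   # Rademacher coefficients
    v = rng.choice([-1.0, 1.0], size=n)
    x = u + 1j * v
    minmod = np.abs(powers @ x).min()     # approximate min_{t in T} |x(t)|
    if minmod < delta / np.sqrt(n):
        hits += 1

print("estimated P(min |x(t)| < delta/sqrt(n)) ~", hits / trials)
```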
Example 13.21. Let $\mathrm{Str}_n(\mathbb{R}) = \mathrm{SymtridiagToep}_n(\mathbb{R})$ be the set of all $n \times n$ symmetric tridiagonal Toeplitz matrices with real entries. If $T_n(f)$ is a matrix in $\mathrm{SymtridiagToep}_n(\mathbb{R})$, then the equality $T_n(f)x = x'f$ is satisfied with
\[ x' := x'_{\mathrm{SymtridiagToep}} := \begin{pmatrix}
 x_0 & x_1\\
 x_1 & x_0+x_2\\
 x_2 & x_1+x_3\\
 \vdots & \vdots\\
 x_{n-2} & x_{n-3}+x_{n-1}\\
 x_{n-1} & x_{n-2}
\end{pmatrix}. \]
Since $\|T_n(f)\|_2$ is at least the $\ell^2$ norm of its second column, we have $\|T_n(f)\|_2 \ge \|f\|_2$ and hence, for every $T_n(b) \in \mathrm{SymtridiagToep}_n(\mathbb{R})$,
\begin{align*}
\Phi^{\mathrm{SymtridiagToep}}(T_n(b),x)
 &= \sup\{\|T_n^{-1}(b)T_n(f)x\|_2 : T_n(f) \in \mathrm{SymtridiagToep}_n(\mathbb{R}),\ \|T_n(f)\|_2 \le 1\}\\
 &= \sup\{\|T_n^{-1}(b)x'f\|_2 : T_n(f) \in \mathrm{SymtridiagToep}_n(\mathbb{R}),\ \|T_n(f)\|_2 \le 1\}\\
 &\le \sup\{\|T_n^{-1}(b)x'f\|_2 : T_n(f) \in \mathrm{SymtridiagToep}_n(\mathbb{R}),\ \|f\|_2 \le 1\}\\
 &\le \|T_n^{-1}(b)x'\|_2. \tag{13.34}
\end{align*}
Now suppose that the natural number $n$ is divisible by 6, that is, $n = 6m$. Put $z = (y, y, 0)$ and $x = (z, -z, z, -z, \ldots, z, -z)$. Then $x'$ has two equal columns, $x' = (x\ \ x)$. It is easily seen that $T_n(b)x = (b_0+b_1)x$, whence
\[ T_n^{-1}(b)x = \frac{1}{b_0+b_1}\,x. \]
Consequently,
\[ \|T_n^{-1}(b)x'\|_2 = \frac{\sqrt{2}}{|b_0+b_1|}\,\|x\|_2, \]
and from (13.34) we infer that
\[ \Phi^{\mathrm{SymtridiagToep}}(T_n(b),x) \le \frac{\sqrt{2}\,\|x\|_2}{|b_0+b_1|}, \]
which together with Proposition 13.9 gives
\[ \kappa^{\mathrm{SymtridiagToep}}(T_n(b),x) \le \frac{\sqrt{2}}{|b_0+b_1|}\,\|T_n(b)\|_2 \le \sqrt{2}\,\frac{|b_0|+2|b_1|}{|b_0+b_1|}. \]
Thus, $\kappa^{\mathrm{SymtridiagToep}}(T_n(b),x)$ remains bounded as $n = 6m \to \infty$. On the other hand, from Corollary 4.34 we know that if $|b_0| \le 2|b_1|$, then
\[ \kappa(T_n(b),x) = \|T_n(b)\|_2\,\|T_n^{-1}(b)\|_2 \asymp n^2. \]
13.6 The Condition Number for Matrix Inversion

For an invertible matrix $A_n \in \mathrm{Str}_n(K)$, the structured condition number for matrix inversion is defined by
\[ \kappa^{\mathrm{Str}}(A_n) = \lim_{\varepsilon\to 0}\,\sup\left\{ \frac{\|(A_n+\delta A_n)^{-1} - A_n^{-1}\|_2}{\varepsilon\,\|A_n^{-1}\|_2} : \delta A_n \in \mathrm{Str}_n(K),\ \frac{\|\delta A_n\|_2}{\|A_n\|_2} \le \varepsilon \right\}. \]
The role played by $\Phi^{\mathrm{Str}}(A_n,x)$ in Section 13.3 is now played by
\[ \Phi^{\mathrm{Str}}(A_n) := \sup\{\|A_n^{-1}\,\delta A_n\,A_n^{-1}\|_2 : \delta A_n \in \mathrm{Str}_n(K),\ \|\delta A_n\|_2 \le 1\}. \]

Proposition 13.22. If $\mathrm{Str}_n(K)$ is invariant under multiplication by real scalars, then
\[ \kappa^{\mathrm{Str}}(A_n) = \frac{\|A_n\|_2}{\|A_n^{-1}\|_2}\,\Phi^{\mathrm{Str}}(A_n). \]

Proof. If $\varepsilon > 0$ is small enough, then $\|A_n^{-1}\|_2\,\|\delta A_n\|_2 < 1$ and hence
\[ (A_n+\delta A_n)^{-1} = [A_n(I + A_n^{-1}\delta A_n)]^{-1} = (I - A_n^{-1}\delta A_n + O(\varepsilon^2))A_n^{-1} = A_n^{-1} - A_n^{-1}\delta A_n A_n^{-1} + O(\varepsilon^2). \]
It follows that
\[ \frac{\|(A_n+\delta A_n)^{-1} - A_n^{-1}\|_2}{\varepsilon\,\|A_n^{-1}\|_2}
 = \frac{\|A_n^{-1}\delta A_n A_n^{-1} + O(\varepsilon^2)\|_2}{\varepsilon\,\|A_n^{-1}\|_2}
 = \frac{\|A_n\|_2}{\|A_n^{-1}\|_2}\,\left\| A_n^{-1}\,\frac{\delta A_n}{\varepsilon\|A_n\|_2}\,A_n^{-1} \right\|_2 + O(\varepsilon), \]
which implies that
\[ \kappa^{\mathrm{Str}}(A_n) = \frac{\|A_n\|_2}{\|A_n^{-1}\|_2}\,\sup\{\|A_n^{-1}\,\delta A_n\,A_n^{-1}\|_2 : \delta A_n \in \mathrm{Str}_n(K),\ \|\delta A_n\|_2 \le 1\}. \]

As usual, we omit the superscript Str in case $\mathrm{Str}_n(K) = M_n(K)$. The following observation shows that in the unstructured case we get back the usual condition number.

Proposition 13.23. If $A_n$ is invertible, then $\kappa(A_n) = \|A_n\|_2\,\|A_n^{-1}\|_2$.

Proof. It is clear that $\Phi(A_n) \le \|A_n^{-1}\|_2^2$. To prove that $\Phi(A_n) \ge \|A_n^{-1}\|_2^2$, choose $x \in K^n$ so that $\|x\|_2 = 1$ and $\|A_n^{-1}x\|_2 = \|A_n^{-1}\|_2$. Thus, $A_n^{-1}x = \|A_n^{-1}\|_2\,y$ with $\|y\|_2 = 1$. We obtain
\[ \Phi(A_n) \ge \sup\{\|A_n^{-1}\,\delta A_n\,A_n^{-1}x\|_2 : \|\delta A_n\|_2 \le 1\} = \|A_n^{-1}\|_2\,\sup\{\|A_n^{-1}\,\delta A_n\,y\|_2 : \|\delta A_n\|_2 \le 1\}, \]
and since $\delta A_n$ defined by $\delta A_n z = (z,y)x$ obviously satisfies $\|\delta A_n\|_2 = 1$ and $\delta A_n y = x$, it results that $\Phi(A_n) \ge \|A_n^{-1}\|_2\,\|A_n^{-1}x\|_2 = \|A_n^{-1}\|_2^2$. The assertion is now immediate from Proposition 13.22.
The next result is the key for what follows.

Theorem 13.24 (Takagi). Let $A_n \in M_n(K)$ be symmetric, $A_n^{\top} = A_n$, and let $\sigma_1 \le \cdots \le \sigma_n$ be the singular values of $A_n$. Then there exists a unitary matrix $V_n \in M_n(K)$ such that
\[ A_n = V_n\,\mathrm{diag}\,(\sigma_1,\ldots,\sigma_n)\,V_n^{\top}. \]

A proof is in [166, Corollary 4.4.4], for example. Now we are able to prove that for the structure $\mathrm{Toep}_n(K)$ there is actually no difference between the structured and the usual condition numbers for matrix inversion.

Theorem 13.25. We have $\kappa^{\mathrm{Toep}}(T_n(b)) = \kappa(T_n(b))$.

Proof. In view of Propositions 13.22 and 13.23 it remains to prove that $\Phi^{\mathrm{Toep}}(T_n(b)) \ge \|T_n^{-1}(b)\|_2^2$. Let $0 < \sigma_1 \le \cdots \le \sigma_n$ be the singular values of $T_n(b)$. Recall that $W_n$ is the matrix with units on the antidiagonal and zeros elsewhere. As this matrix is unitary, the singular values of $T_n(b)W_n$ coincide with those of $T_n(b)$. The (Hankel) matrix $T_n(b)W_n$ is symmetric, and hence Theorem 13.24 ensures the existence of a unitary matrix $V_n$ such that
\[ T_n(b)W_n = V_n\,\mathrm{diag}\,(\sigma_1,\ldots,\sigma_n)\,V_n^{\top}. \]
It follows that
\[ W_nT_n^{-1}(b) = \overline{V}_n\,\mathrm{diag}\left(\frac{1}{\sigma_1},\ldots,\frac{1}{\sigma_n}\right)V_n^{*}. \]
Put $x = V_ne_1$ and $y = \overline{V}_ne_1$. Then
\[ W_nT_n^{-1}(b)x = \overline{V}_n\,\mathrm{diag}\left(\frac{1}{\sigma_1},\ldots,\frac{1}{\sigma_n}\right)e_1 = \frac{1}{\sigma_1}\,\overline{V}_ne_1 = \frac{1}{\sigma_1}\,y \]
and, consequently,
\begin{align*}
\Phi^{\mathrm{Toep}}(T_n(b)) &= \sup\{\|T_n^{-1}(b)T_n(g)T_n^{-1}(b)\|_2 : \|T_n(g)\|_2 \le 1\}\\
 &\ge \sup\{\|T_n^{-1}(b)T_n(g)T_n^{-1}(b)x\|_2 : \|T_n(g)\|_2 \le 1\}\\
 &= \frac{1}{\sigma_1}\,\sup\{\|T_n^{-1}(b)T_n(g)W_ny\|_2 : \|T_n(g)\|_2 \le 1\}.
\end{align*}
By Lemma 13.13, there is a matrix $T_n(g)$ of norm at most 1 such that $T_n(g)W_ny = y$. Thus,
\[ \Phi^{\mathrm{Toep}}(T_n(b)) \ge \frac{1}{\sigma_1}\,\|T_n^{-1}(b)y\|_2. \]
Since
\[ T_n^{-1}(b)y = W_nW_nT_n^{-1}(b)y = W_n\overline{V}_n\,\mathrm{diag}\left(\frac{1}{\sigma_1},\ldots,\frac{1}{\sigma_n}\right)V_n^{*}V_ne_1 = \frac{1}{\sigma_1}\,W_n\overline{V}_ne_1, \]
we finally obtain that
\[ \Phi^{\mathrm{Toep}}(T_n(b)) \ge \frac{1}{\sigma_1}\,\|T_n^{-1}(b)y\|_2 = \frac{1}{\sigma_1^2}\,\|W_n\overline{V}_ne_1\|_2 = \frac{1}{\sigma_1^2} = \|T_n^{-1}(b)\|_2^2. \]
13.7 Once More the Nearest Singular Matrix

In contrast to Section 13.1, we now measure the distance of a Toeplitz matrix to the nearest singular matrix within the set of all Toeplitz matrices. Thus, let
\[ d^{\mathrm{Toep}}(T_n(b)) = \inf\{\varepsilon > 0 : 0 \in \mathrm{sp}\,T_n(b+\varphi),\ \|T_n(\varphi)\|_2 \le \varepsilon\}. \]
The following theorem tells us that this distance is equal to the usual distance measured within the set of all $n \times n$ matrices.

Theorem 13.26. We have $d^{\mathrm{Toep}}(T_n(b)) = \sigma_1(T_n(b))$.

Proof. Obviously, $d^{\mathrm{Toep}}(T_n(b)) \ge d(T_n(b)) = \sigma_1(T_n(b))$ (recall Section 13.2). To prove the reverse inequality, we proceed as in the proof of Theorem 13.25. The matrix $T_n(b)W_n$ is symmetric and hence, by Theorem 13.24, $T_n(b)W_n = V_nSV_n^{\top}$, where $V_n$ is unitary and $S = \mathrm{diag}\,(\sigma_1(T_n(b)),\ldots,\sigma_n(T_n(b)))$. There is an $x$ in $K^n\setminus\{0\}$ such that $V_n^{\top}x = e_1$. With $\sigma_1 := \sigma_1(T_n(b))$ we therefore get
\[ T_n(b)W_nx = V_nSV_n^{\top}x = V_nSe_1 = \sigma_1V_ne_1 = \sigma_1V_nV_n^{*}\,\overline{x} = \sigma_1\overline{x}. \]
From Lemma 13.13 we infer the existence of a matrix $T_n(g)$ of norm at most 1 such that $T_n(g)W_nx = \overline{x}$. Thus, we have $(T_n(b) - \sigma_1T_n(g))W_nx = 0$, which shows that $d^{\mathrm{Toep}}(T_n(b))$ is not larger than $\sigma_1\|T_n(g)\|_2 = \sigma_1$.
Exercises

1. Let $A_n$ and $K_n$ be the $n \times n$ matrices
\[ A_n = \begin{pmatrix}
 1 & -1 & -1 & \ldots & -1\\
 & 1 & -1 & \ldots & -1\\
 & & 1 & \ldots & -1\\
 & & & \ddots & \vdots\\
 & & & & 1
\end{pmatrix}, \qquad
K_n = \begin{pmatrix}
 -\varepsilon & 0 & \ldots & 0\\
 -\varepsilon & 0 & \ldots & 0\\
 \vdots & \vdots & & \vdots\\
 -\varepsilon & 0 & \ldots & 0
\end{pmatrix}. \]
Prove that $\det(A_n+K_n) = 1 - 2^{n-1}\varepsilon$ and hence $d(A_n) \le \sqrt{n}/2^{n-1}$. Show that, on the other hand, $d^{\mathrm{Toep}[0,n]}(A_n) = 1$.

2. Let $b(t) = 2 + \alpha + t + t^{-1}$ where $\alpha > 0$ is small. Show that
\[ \kappa(C_n(b)) = \frac{4+\alpha}{\alpha} \approx \frac{4}{\alpha}. \]
Prove that if $n$ is large and $x$ is drawn from the unit sphere of $\mathbb{R}^n$ with the uniform distribution, then
\[ \kappa^{\mathrm{Circ}}(C_n(b),x) = \frac{(4+\alpha)^{1/4}(2+\alpha)^{1/2}}{\alpha^{3/4}} \approx \frac{2}{\alpha^{3/4}} \]
with probability near 1. Thus, for $\alpha = 0.01$,
\[ \kappa(C_n(b)) \approx 400, \qquad \kappa^{\mathrm{Circ}}(C_n(b),x) \approx 63, \]
and for $\alpha = 0.0001$,
\[ \kappa(C_n(b)) \approx 40000, \qquad \kappa^{\mathrm{Circ}}(C_n(b),x) \approx 2000. \]
How does this fit with the remark after the proof of Corollary 13.12?

3. Find the eigenvalues of the $2n \times 2n$ matrix
\[ \begin{pmatrix}
 a & & & & b\\
 & a & & b &\\
 & & \ddots & &\\
 & b & & a &\\
 b & & & & a
\end{pmatrix}. \]

4. Let
\[ F(a,n) := T_n(|t-1|^2) + (a-2)E_{nn} := \begin{pmatrix}
 2 & -1 & 0 & \ldots & 0 & 0\\
 -1 & 2 & -1 & \ldots & 0 & 0\\
 0 & -1 & 2 & \ldots & 0 & 0\\
 \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
 0 & 0 & 0 & \ldots & 2 & -1\\
 0 & 0 & 0 & \ldots & -1 & a
\end{pmatrix}. \]
Prove that
\[ F(1,n)^{-1} = \bigl(\min(j,k)\bigr)_{j,k=1}^{n}, \quad
 F(2,n)^{-1} = \Bigl(\min(j,k) - \frac{jk}{n+1}\Bigr)_{j,k=1}^{n}, \quad
 F(3,n)^{-1} = \Bigl(\min(j,k) - \frac{2jk}{2n+1}\Bigr)_{j,k=1}^{n}. \]
Show that $\det F(a,n) = n(a-1)+1$.

5. (a) Let $A$ and $B$ be $k \times k$ matrices. Prove that the spectrum of the $nk \times nk$ block Toeplitz matrix
\[ \begin{pmatrix}
 A & B & & &\\
 B & A & B & &\\
 & B & A & \ddots &\\
 & & \ddots & \ddots & B\\
 & & & B & A
\end{pmatrix} \]
is $\displaystyle\bigcup_{j=1}^{n}\mathrm{sp}\,\Bigl(A + 2B\cos\frac{\pi j}{n+1}\Bigr)$.

(b) Let $C_n$ be the $nk \times nk$ matrix $(T_n(c_{ij}))_{i,j=1}^{k}$ with $c_{ij}(t) = a_{ij} + b_{ij}(t+t^{-1})$ $(a_{ij}, b_{ij} \in \mathbb{C})$. Put $A = (a_{ij})_{i,j=1}^{k}$ and $B = (b_{ij})_{i,j=1}^{k}$. Prove that
\[ \mathrm{sp}\,C_n = \bigcup_{j=1}^{n}\mathrm{sp}\,\Bigl(A + 2B\cos\frac{\pi j}{n+1}\Bigr). \]
6. The structured componentwise condition number of a matrix $A_n \in \mathrm{Str}_n(\mathbb{R})$ at $x \in \mathbb{R}^n$ is defined by
\[ \mathrm{cond}^{\mathrm{Str}}_{E,f}(A_n,x) = \lim_{\varepsilon\to 0}\sup\Bigl\{ \frac{\|\delta x\|_\infty}{\varepsilon\,\|x\|_\infty} :
 (A_n+\delta A_n)(x+\delta x) = A_nx + \delta b,\ \ \delta A_n \in \mathrm{Str}_n(\mathbb{R}),\ |\delta A_n| \le \varepsilon|E_n|,\ \ \delta b \in \mathbb{R}^n,\ |\delta b| \le \varepsilon|f| \Bigr\}. \]
Here $E_n \in \mathrm{Str}_n(\mathbb{R})$ is a given weight matrix and $f \in \mathbb{R}^n$ is a given weight vector. (The cases $E_n = A_n$, $f = b$ and $E_n = A_n$, $f = 0$ are of particular interest.) Furthermore, we use the absolute value and comparison of vectors and matrices componentwise.

(a) Prove that if $\mathrm{Str}_n(\mathbb{R}) = M_n(\mathbb{R})$, then
\[ \mathrm{cond}^{\mathrm{Str}}_{E,f}(A_n,x) = \frac{\bigl\|\,|A_n^{-1}|\,|E_n|\,|x| + |A_n^{-1}|\,|f|\,\bigr\|_\infty}{\|x\|_\infty}. \]
(b) Prove that if $\mathrm{Str}_n(\mathbb{R}) = \mathrm{Toep}_n(\mathbb{R})$, then
\[ \mathrm{cond}^{\mathrm{Str}}_{E,f}(A_n,x) = \frac{\bigl\|\,|A_n^{-1}x'|\,|p_{E_n}| + |A_n^{-1}|\,|f|\,\bigr\|_\infty}{\|x\|_\infty}, \]
where $x'$ is the matrix (13.12) and, for $E_n = T_n(h)$,
\[ p_{E_n} := \bigl( h_{-(n-1)}\ \ h_{-(n-2)}\ \ \ldots\ \ h_{n-1} \bigr)^{\top} \]
(so that $E_nx = x'p_{E_n}$ as in (13.13)).

(c) Prove that if $\mathrm{Str}_n(\mathbb{R}) = \mathrm{Circ}_n(\mathbb{R})$, then
\[ \mathrm{cond}^{\mathrm{Str}}_{E,f}(A_n,x) = \frac{\bigl\|\,|E_n|\,|A_n^{-1}x| + |A_n^{-1}|\,|f|\,\bigr\|_\infty}{\|x\|_\infty}. \]
7. In our files, we found a copy of a transparency by Siegfried Rump showing the
following:
\[ A = \begin{pmatrix}
 2 & -1 & &\\
 -1 & 2 & -1 &\\
 & -1 & 2 & \ddots\\
 & & \ddots & \ddots
\end{pmatrix}, \qquad
x = \begin{pmatrix} 1\\ -1\\ 1\\ -1\\ \vdots \end{pmatrix}, \]
\begin{gather*}
\kappa(A) = \|A^{-1}\|\,\|A\| = 8.2\cdot 10^6, \qquad \kappa^{\mathrm{SymmToep}}(A,x) < 6.5\cdot 10^4,\\
\mathrm{cond}_{A,0}(A,x) = \frac{\bigl\|\,|A^{-1}|\,|A|\,|x|\,\bigr\|_\infty}{\|x\|_\infty} = 5.0\cdot 10^5, \qquad
\mathrm{cond}^{\mathrm{SymmToep}}_{A,0}(A,x) = 1.
\end{gather*}
Which dimension does the matrix $A$ have?

8. Let $A_n = T_n(1+\chi_1) + (-1)^{n+1}E_{1n}$, where $E_{1n}$ is the matrix whose $1,n$ entry is 1 and whose other entries are all zero. Let $d_{\mathrm{comp}}(A_n,|A_n|)$ be the infimum of all $\varepsilon > 0$ for which there exists a matrix $F_n \in M_n(\mathbb{R})$ such that $|F_n| \le \varepsilon|A_n|$ and $0 \in \mathrm{sp}\,(A_n+F_n)$. Show that
\[ \det A_n = 2, \qquad d_{\mathrm{comp}}(A_n,|A_n|) = 1, \qquad \mathrm{rad}\,(|A_n^{-1}|\,|A_n|) = n. \]

9. The full-Toeplitz structured pseudospectrum of a matrix $A_n \in M_n(\mathbb{C})$ is defined by
\[ \mathrm{sp}^{\mathrm{Toep}_n(\mathbb{C})}_{\varepsilon}(A_n) = \bigcup_{E_n \in \mathrm{Toep}_n(\mathbb{C}),\ \|E_n\|_2 \le \varepsilon}\mathrm{sp}\,(A_n+E_n). \]
Prove that if $A_n \in \mathrm{Toep}_n(\mathbb{C})$, then $\mathrm{sp}_\varepsilon(A_n) = \mathrm{sp}^{\mathrm{Toep}_n(\mathbb{C})}_{\varepsilon}(A_n)$.
Notes The problem of determining the distance of a structured matrix to the nearest singular matrix within the matrices of the same given structure and the problem of finding the componentwise (and not normwise) distance to the nearest singular matrix are studied in the work of Demmel [100], Gohberg and Koltracht [131], [132], D. J. Higham and N. J. Higham [160], [161], and Rump [231], [234], [235], [236], [237]. The results of Sections 13.1 and 13.2 are from our paper [57] with Kozak. The structured condition numbers κbStr (An , x) and κ Str (An ) as well as the distance Str d (An ) were introduced by D. J. Higham and N. J. Higham [160], [161]. We learned of this topic from Siegfried Rump in 2001, and Sections 13.3 to 13.7 are based on Rump’s paper [236]. In particular, all results and proofs of Sections 13.3, 13.6, and 13.7 are taken from [236]. The Tn (g)x = x g trick with the matrix (13.12) (and the analogue of this trick for other structures) was invented by D. J. Higham and N. J. Higham [160]. The marvelous Lemma 13.13 and Theorem 13.14 are Rump’s [236]. Theorem 13.16 and Example 13.21 are
also from [236]. Theorem 13.17 was established in [56]. Notice that it is this theorem that allows us to have recourse to the Konyagin and Schlag result [181] in an easy and luxurious way. We also remark that Theorems 13.14, 13.16, 13.17, 13.25, 13.26 are valid for general n × n Toeplitz matrices and are not limited to banded Toeplitz matrices. Clearly, the study of Toeplitz band matrices within the structure Toepn (K) of all Toeplitz matrices is still too crude. Given b ∈ Pr,s , the natural territory for the matrices Tn (b) is the structure Toep[r, s]n (K) of all matrices Tn (f ) with f ∈ Pr,s and with the coefficients of f in K. This strategy was pursued in Sections 13.1 and 13.2, and Corollary 13.7 is a really striking result: it says that outside (b), or at least at a certain distance from (b), no evil will happen. In contrast to this, the conclusion of Sections 13.4 to 13.7 is that the structured condition numbers and the distance to the nearest singular matrix in the structure Toepn (K) are never (or at least almost never) significantly better than their unstructured counterparts. We believe that for structured condition numbers things change when passing from Toepn (K) Toep[r,s] to Toep[r, s]n (K), that is, we conjecture that κfull (Tn (b), x), κ Toep[r,s] (Tn (b), x), and Toep[r,s] (Tn (b)) behave much better than κ(Tn (b), x) = κ(Tn (b)) and κ(Tn (b)). Example κ 13.21 and Corollary 13.7 show that there are good reasons for such a belief. Exercise 1 is well known. We took it from [275]. We found Exercise 3 in [113]. Exercise 4 is a result of [122]. Part (a) of Exercise 6 is well known and due to Skeel [256] (see also [161]), part (b) is from [160] and [237], and part (c) is an observation of [237]. Exercise 8 is contained in [233]. The result of Exercise 9 was established in [140] and [238]. The last two papers have analogous results for plenty of other structures, too. Rump [238] also studies the case where Toepn (C) is replaced by Toepn (R). Theorem 8.2 of [62] implies that the analogue of the result of Exercise 9 for bounded Toeplitz operators on 2 is not true. This is another beautiful example of the fact that results being valid for all finite matrices need not extend to infinite matrices. Further results: componentwise distance to the nearest singular matrix. For a matrix C = (cj k ), we denote by |C| the matrix (|cj k |). An inequality of the form C ≤ D is here (in contrast to the rest of the text) understood entrywise, that is, C ≤ D means cj k ≤ dj k for all j, k. Let En ∈ Mn (R) be a given matrix with nonnegative entries. The componentwise distance dcomp (an , En ) of a matrix An ∈ Mn (R) to the nearest singular matrix is defined as the infimum of all ε > 0 for which there exists a matrix Fn ∈ Mn (R) such that |Fn | ≤ εEn and An + Fn is singular. One always has dcomp (An , En ) ≥ 1/rad (|A−1 n | En ). In 1992, N. J. Higham and J. Demmel raised the conjecture that there exists a γ (n) < ∞ such that dcomp (An , |An |) ≤
$\gamma(n)\,/\,\mathrm{rad}\,(|A_n^{-1}|\,|A_n|)$ (see [100]). This conjecture was confirmed by Rump [232], [233] (see also [234]), who even proved that for arbitrary weight matrices $E_n$ the inequality
\[ \frac{1}{\mathrm{rad}\,(|A_n^{-1}|\,E_n)} \le d_{\mathrm{comp}}(A_n,E_n) \le \frac{(3+2\sqrt{2}\,)\,n}{\mathrm{rad}\,(|A_n^{-1}|\,E_n)} \]
holds. Notice that Exercise 8 implies that $\gamma(n) \ge n$.
Chapter 14
Impurities
In this chapter we describe several phenomena that arise when randomly perturbing a relatively small number of entries of an infinite or a large finite Toeplitz matrix. We illustrate the appearance of localized eigenvectors by a very simple example. We also use the pseudospectral approach to explain the emergence of bubbles and antennae. It turns out that there are again big differences between the cases of infinite and large finite Toeplitz matrices. The chapter ends with some results on the important question whether structured pseudospectra can jump.
14.1 The Discrete Laplacian

Let $\sigma(t) = t + t^{-1}$ $(t \in \mathbb{T})$. The matrix $T_n(\sigma)$ is referred to as the discrete Laplacian, and matrices of the form $T_n(\sigma) + V_n$ with a real diagonal matrix $V_n = \mathrm{diag}\,(v_1,\ldots,v_n)$ are called discrete Hamiltonians. We here consider the case where $V_n$ has only one nonzero entry, that is, we study the discrete Laplacian with a single impurity. In this simple situation a fairly complete analysis is possible. Let $E_{jj}$ denote the matrix whose $jj$ entry is 1 and all other entries of which are zero. The spectrum of $T_n(\sigma)$ is a subset of $(-2,2)$. Moreover, we know from Section 2.2 that $T_n(\sigma)$ has the eigenvalues
\[ \lambda_j^{(n)} = 2\cos\frac{\pi j}{n+1} \qquad (j = 1,\ldots,n) \]
and that
\[ x^{(j)} = \left( \sin\frac{\pi j}{n+1},\ \sin\frac{2\pi j}{n+1},\ \ldots,\ \sin\frac{n\pi j}{n+1} \right) \]
is an eigenvector for $\lambda_j^{(n)}$. Notice that all eigenvectors are extended.

Proposition 14.1. If $v > 1$ and $j \in \{2,\ldots,n-1\}$, then $T_n(\sigma) + vE_{jj}$ has an eigenvalue $\lambda > 2$ for each $n \ge 3$. The matrices $T_n(\sigma) + vE_{11}$ and $T_n(\sigma) + vE_{nn}$ have an eigenvalue $\lambda > 2$ whenever $v > 1 + 1/n$.
Proof. Let $v > 1$. The eigenvalues of $T_3(\sigma) + vE_{22}$ are
\[ \lambda_1 = 0, \qquad \lambda_2 = \frac{v}{2} - \frac{\sqrt{v^2+8}}{2}, \qquad \lambda_3 = \frac{v}{2} + \frac{\sqrt{v^2+8}}{2}, \]
and $\lambda_3 > 2$ for $v > 1$. Since $T_3(\sigma) + vE_{22}$ is a submatrix of $T_n(\sigma) + vE_{jj}$ for every $n \ge 3$ and every $j \in \{2,\ldots,n-1\}$, we see that for these $n$ and $j$,
\[ \|T_n(\sigma) + vE_{jj}\|_2 \ge \|T_3(\sigma) + vE_{22}\|_2 = \lambda_3 > 2, \]
which implies that the maximal eigenvalue of $T_n(\sigma) + vE_{jj}$ is also greater than 2 (note that this conclusion can also be drawn from Theorem 9.19).

The determinant of $T_n(\sigma-\lambda) + vE_{11}$ is a polynomial in $\lambda$ of degree $n$ with the leading coefficient $(-1)^n$. Hence, this determinant has a zero in $(2,\infty)$ if its value at $\lambda = 2$ is $(-1)^n$ times a negative number. By formula (2.11),
\begin{align*}
\det(T_n(\sigma-2) + vE_{11}) &= (v-2)\det T_{n-1}(\sigma-2) - \det T_{n-2}(\sigma-2)\\
 &= (v-2)(-1)^{n-1}n - (-1)^{n-2}(n-1) = (-1)^n\bigl(-(v-2)n - (n-1)\bigr) = (-1)^n(-nv+n+1),
\end{align*}
and $-nv+n+1 < 0$ whenever $v > 1 + 1/n$. The case of $T_n(\sigma) + vE_{nn}$ can be disposed of analogously, or it can simply be reduced to the case of $T_n(\sigma) + vE_{11}$ by a similarity transformation.

Proposition 14.2. Consider the matrix $T_n(\sigma) + \mathrm{diag}\,(v_1,\ldots,v_n)$. If $\lambda > 2$ is an eigenvalue and $x = (x_1,\ldots,x_n)$ is an eigenvector for $\lambda$, then
\[ |x_1|^2 + \cdots + |x_n|^2 \le \frac{1}{(\lambda-2)^2}\bigl(|v_1x_1|^2 + \cdots + |v_nx_n|^2\bigr). \]

Proof. Put $V = \mathrm{diag}\,(v_1,\ldots,v_n)$. We have $T_n(\sigma-\lambda)x = -Vx$, and taking into account that $\|T_n(\sigma)\|_2 \le \|\sigma\|_\infty = 2 < \lambda$, we get
\[ x = -T_n^{-1}(\sigma-\lambda)Vx = \frac{1}{\lambda}\sum_{k=0}^{\infty}\frac{1}{\lambda^k}\,T_n^{k}(\sigma)\,Vx, \]
whence
\[ \|x\|_2 \le \frac{1}{\lambda}\sum_{k=0}^{\infty}\left(\frac{2}{\lambda}\right)^{k}\|Vx\|_2 = \frac{1}{\lambda-2}\,\|Vx\|_2. \]

Applying Proposition 14.2 to the matrix $T_n(\sigma) + vE_{jj}$, we obtain that
\[ |x_1|^2 + \cdots + |x_n|^2 \le \frac{|v|^2}{(\lambda-2)^2}\,|x_j|^2. \tag{14.1} \]
This inequality shows that if $n$ is large, then the vector $x$ must be localized in some sense. For example, if $|v|^2/(\lambda-2)^2 = 25$ and $|x_j| = 1$, then
\[ |x_1|^2 + \cdots + |x_{j-1}|^2 + |x_{j+1}|^2 + \cdots + |x_n|^2 \le 24, \]
which implies that at most 24 values of $|x_k|^2$ can be near 1, at most 48 values of $|x_k|^2$ can be near 1/2, etc. Clearly, the same localization phenomenon can be deduced from Proposition 14.2 in case $\mathrm{diag}\,(v_1,\ldots,v_n)$ has only a finite number of nonzero entries and this number is small in comparison with $n$.

Put $D_k(\lambda) = \det T_k(\sigma-\lambda)$. From Theorem 2.6 we deduce that if the zeros
\[ q_1 = -\frac{\lambda}{2} + \frac{\sqrt{\lambda^2-4}}{2}, \qquad q_2 = -\frac{\lambda}{2} - \frac{\sqrt{\lambda^2-4}}{2} \]
of the polynomial $q^2 + \lambda q + 1$ are distinct, then
\[ D_k(\lambda) = \frac{q_2^{k+1} - q_1^{k+1}}{q_2 - q_1}. \]
Let
\[ p_1 = \frac{\lambda}{2} - \frac{\sqrt{\lambda^2-4}}{2}, \qquad p_2 = \frac{\lambda}{2} + \frac{\sqrt{\lambda^2-4}}{2} \tag{14.2} \]
be the zeros of the polynomial $p^2 - \lambda p + 1$ and put
\[ F_k(\lambda) = \frac{p_2^{k+1} - p_1^{k+1}}{p_2 - p_1} \]
for $p_1 \ne p_2$. Clearly, for $k \ge 1$,
\[ p_1 = -q_1, \qquad p_2 = -q_2, \qquad F_k(\lambda) = (-1)^kD_k(\lambda). \]

Lemma 14.3. Let $v \in \mathbb{R}$. A number $\lambda \in \mathbb{R}$ is an eigenvalue of $T_n(\sigma) + vE_{jj}$ if and only if
\[ (v-\lambda)F_{j-1}(\lambda)F_{n-j}(\lambda) + F_{j-2}(\lambda)F_{n-j}(\lambda) + F_{j-1}(\lambda)F_{n-j-1}(\lambda) = 0. \tag{14.3} \]

Proof. Expanding the determinant $\det(T_n(\sigma) + vE_{jj} - \lambda I)$ by the $j$th row, we get $(-1)^{n+1}$ times the left-hand side of (14.3).

Proposition 14.4. Let $v > 1$ and let $\lambda_n(v,1)$ be the maximal eigenvalue of the matrix $T_n(\sigma) + vE_{11}$. Then, as $n \to \infty$,
\[ \lambda_n(v,1) = v + \frac{1}{v} + o(1). \]

Proof. Proposition 14.1 implies that $\lambda_n := \lambda_n(v,1) > 2$ for all sufficiently large $n$. Moreover, the argument of the first half of the proof of Proposition 14.1 shows that $\lambda_n \le \lambda_{n+1}$
for all $n$. Since $\lambda_n \le \|T_n(\sigma) + vE_{11}\|_2 \le 2 + v$, it follows that $\lambda_n$ converges to some limit $\lambda_\infty$. From Lemma 14.3 we infer that
\[ v - \lambda_n = -\frac{F_{n-2}(\lambda_n)}{F_{n-1}(\lambda_n)} = -\frac{p_{2,n}^{\,n-1} - p_{1,n}^{\,n-1}}{p_{2,n}^{\,n} - p_{1,n}^{\,n}} \tag{14.4} \]
with
\[ p_{1/2,n} = \frac{\lambda_n}{2} \mp \frac{\sqrt{\lambda_n^2-4}}{2} \to \frac{\lambda_\infty}{2} \mp \frac{\sqrt{\lambda_\infty^2-4}}{2} =: p_{1/2}. \]
Since $p_2 > 1 + \varepsilon$ and $0 < p_1 < 1 - \varepsilon$ with some $\varepsilon > 0$ for all sufficiently large $n$, we conclude that the right-hand side of (14.4) converges to $-1/p_2 = -p_1$. Thus,
\[ v - \lambda_\infty = -\frac{\lambda_\infty}{2} + \frac{\sqrt{\lambda_\infty^2-4}}{2}, \]
whence $\lambda_\infty = v + 1/v$.

Proposition 14.5. Let $j_n = [n/2]+1$. For $v > 1$, define $\lambda_n(v,j_n)$ as the maximal eigenvalue of $T_n(\sigma) + vE_{j_n,j_n}$. Then, as $n \to \infty$,
\[ \lambda_n(v,j_n) = \sqrt{v^2+4} + o(1). \]

Proof. The matrix $T_n(\sigma) + vE_{j_n,j_n}$ is a submatrix of $T_{n+1}(\sigma) + vE_{j_{n+1},j_{n+1}}$. Hence $\lambda_n := \lambda_n(v,j_n)$ converges monotonically to some limit $\lambda_\infty \in (2, 2+v)$ as $n \to \infty$ (recall the proof of Proposition 14.4). From Lemma 14.3 we obtain
\[ v - \lambda_n = -\frac{F_{j_n-2}(\lambda_n)}{F_{j_n-1}(\lambda_n)} - \frac{F_{n-j_n-1}(\lambda_n)}{F_{n-j_n}(\lambda_n)}, \]
and as in the proof of Proposition 14.4 it follows that
\[ -\frac{F_{j_n-2}(\lambda_n)}{F_{j_n-1}(\lambda_n)} - \frac{F_{n-j_n-1}(\lambda_n)}{F_{n-j_n}(\lambda_n)} \to -\frac{1}{p_2} - \frac{1}{p_2} = -2p_1. \]
Consequently, $v - \lambda_\infty = -\lambda_\infty + \sqrt{\lambda_\infty^2-4}$, which gives $\lambda_\infty = \sqrt{v^2+4}$.

The last two propositions provide us with two cases in which the constant $v/(\lambda-2)$ appearing in (14.1) is seen to be no astronomic number. In the case of a single corner impurity we have
\[ \frac{v}{\lambda_n(v,1)-2} \approx \frac{v}{v+1/v-2} \le 5 \quad \text{for all } v > 1.8, \]
while in the case of a single center impurity we even get
\[ \frac{v}{\lambda_n(v,[n/2]+1)-2} \approx \frac{v}{\sqrt{v^2+4}-2} = \sqrt{1+\frac{4}{v^2}} + \frac{2}{v} < \sqrt{5} + 2 < 5 \quad \text{for all } v > 1. \]
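The asymptotics of Propositions 14.4 and 14.5 are easy to observe numerically. The following Python/NumPy sketch is our own illustration (the values $v = 2$ and $n = 400$ are arbitrary); it computes the maximal eigenvalue for a corner impurity and for a center impurity and compares it with the limits $v + 1/v$ and $\sqrt{v^2+4}$.

```python
import numpy as np

def impurity_matrix(n, v, j):
    """T_n(sigma) + v*E_jj: the discrete Laplacian with one impurity at site j (1-based)."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    A[j - 1, j - 1] = v
    return A

v, n = 2.0, 400
corner = np.linalg.eigvalsh(impurity_matrix(n, v, 1)).max()
center = np.linalg.eigvalsh(impurity_matrix(n, v, n // 2 + 1)).max()
print(corner, v + 1.0 / v)            # Proposition 14.4: limit v + 1/v = 2.5
print(center, np.sqrt(v * v + 4.0))   # Proposition 14.5: limit sqrt(v^2 + 4) ~ 2.828
```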
Proposition 14.6. Let $v \in \mathbb{R}$ and let $\lambda \in \mathbb{R}\setminus\{-2,2\}$ be an eigenvalue of the matrix $T_n(\sigma) + vE_{jj}$. Then an eigenvector for $\lambda$ is given by
\[ x = (\psi_1,\ldots,\psi_j,\varphi_{n-j},\ldots,\varphi_1), \tag{14.5} \]
where
\begin{align*}
\psi_k &= F_{n-j}(\lambda)F_{k-1}(\lambda) = \frac{p_2^{\,n-j+1} - p_1^{\,n-j+1}}{p_2-p_1}\cdot\frac{p_2^{\,k} - p_1^{\,k}}{p_2-p_1} \qquad (1 \le k \le j),\\
\varphi_k &= F_{j-1}(\lambda)F_{k-1}(\lambda) = \frac{p_2^{\,j} - p_1^{\,j}}{p_2-p_1}\cdot\frac{p_2^{\,k} - p_1^{\,k}}{p_2-p_1} \qquad (1 \le k \le n-j),\\
p_{1/2} &= \frac{\lambda}{2} \mp \frac{\sqrt{\lambda^2-4}}{2}.
\end{align*}

Proof. We must verify that
\[ \bigl(T_n(\sigma-\lambda)x + vE_{jj}x\bigr)_k = 0 \tag{14.6} \]
for $k \in \{1,\ldots,n\}$. If $k \in \{1,\ldots,j-1\}$, then the left-hand side of (14.6) is
\[ \psi_{k-1} - \lambda\psi_k + \psi_{k+1} = F_{n-j}(\lambda)\bigl(F_{k-2}(\lambda) - \lambda F_{k-1}(\lambda) + F_k(\lambda)\bigr) \]
(note that $F_{-1}(\lambda) = 0$), and this is zero because $p_1, p_2$ are the zeros of the polynomial $p^2 - \lambda p + 1$. If $k = j+\ell$ with $\ell \in \{1,\ldots,n-j\}$, then the left-hand side of (14.6) equals
\[ x_{j+\ell-1} - \lambda x_{j+\ell} + x_{j+\ell+1} = \varphi_{n-j-\ell+2} - \lambda\varphi_{n-j-\ell+1} + \varphi_{n-j-\ell} \]
(notice that $x_j = \psi_j = F_{n-j}(\lambda)F_{j-1}(\lambda) = \varphi_{n-j+1}$), which is
\[ F_{j-1}(\lambda)\bigl(F_{n-j-\ell+1}(\lambda) - \lambda F_{n-j-\ell}(\lambda) + F_{n-j-\ell-1}(\lambda)\bigr) = 0, \]
again because $p_1, p_2$ are the roots of the equation $p^2 - \lambda p + 1 = 0$. Finally, the $j$th equation of (14.6) is
\[ 0 = \psi_{j-1} + (v-\lambda)\psi_j + \varphi_{n-j} = F_{n-j}(\lambda)F_{j-2}(\lambda) + (v-\lambda)F_{n-j}(\lambda)F_{j-1}(\lambda) + F_{j-1}(\lambda)F_{n-j-1}(\lambda), \]
which is equivalent to (14.3).

The following corollary is illustrated in Figure 14.1.

Corollary 14.7. Let $v \in \mathbb{R}$ and let $\lambda \in \mathbb{R}$ be an eigenvalue of $T_n(\sigma) + vE_{jj}$. If $|\lambda| < 2$, then the eigenvector (14.5) is extended, while if $|\lambda| > 2$, then the eigenvector (14.5) is localized around $j$.

Proof. This follows from Proposition 14.6 along with the observation that $|p_1| = |p_2| = 1$ if $|\lambda| < 2$, $0 < p_1 < 1 < p_2$ if $\lambda > 2$, and $p_1 < -1 < p_2 < 0$ if $\lambda < -2$.
Figure 14.1. The matrix T30 (σ ) + 3 E5,5 has 30 real eigenvalues λ1 < · · · < λ30 . The maximal eigenvalue λ30 is 3.6055; the remaining eigenvalues are located in (−2, 2). The pictures show eigenvectors to λ1 = −1.9858, λ8 = −1.3385, λ15 = −0.0243, λ22 = 1.3102, λ29 = 1.9850, and λ30 = 3.6055 (from the top to the bottom). The vertical axis is always from −1 to 1.
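The localization described by Corollary 14.7 and shown in Figure 14.1 is simple to reproduce. Here is a small Python/NumPy sketch of ours (not from the book; the choice of which eigenvectors to inspect and of the window around the impurity is arbitrary) that measures how much of each eigenvector's mass sits near the impurity site.

```python
import numpy as np

n, v, j = 30, 3.0, 5                      # same parameters as in Figure 14.1
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
A[j - 1, j - 1] = v
lam, X = np.linalg.eigh(A)                # eigenvalues in ascending order

for k in (0, n // 2, n - 1):              # smallest, middle, and largest eigenvalue
    x = X[:, k]
    near = np.sum(x[max(j - 4, 0): j + 3] ** 2)   # mass within 3 sites of the impurity
    print(f"lambda = {lam[k]: .4f},  mass near site {j}: {near / np.sum(x**2):.3f}")
```

Only the eigenvector of the largest eigenvalue (which is greater than 2) concentrates its mass near the impurity; the eigenvectors belonging to eigenvalues in $(-2,2)$ remain extended.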
Finally, let $n$ be large, choose $m$ small in comparison with $n$ (say, $m = 10$), and let $1 = j_0 < j_1 < \cdots < j_m < j_{m+1} = n$. Consider the matrix
\[ A_n = T_n(\sigma) + \sum_{\ell=1}^{m} v_\ell\, E_{j_\ell,j_\ell} \]
and suppose λ ∈ R\{−2, 2} is an eigenvalue of An . Let x be an eigenvector for λ. Assume we have an for which j+1 −j is large (say, n/5). The values of xk for k ∈ {j +1, . . . , j+1 −1} are determined by the difference equation xk−1 − λxk + xk+1 = 0 and the two boundary conditions xj − λxj +1 + xj +2 = xj+1 −2 − λxj+1 −1 + xj+1 = 0 (with x−1 = xn+1 = 0). This gives xk = c1 p1k + c2 p2k with certain constants c1 , c2 and # p1/2 = λ/2 ∓ (1/2) λ2 − 4. If |λ| > 2, then necessarily 0 < p1 < 1 < p2 or p1 < −1 < p2 < 0. For the sake of definiteness, assume 0 < p1 < 1 < p2 . If c2 = 0, then xk is localized near j+1 . If c2 = 0 and c1 = 0, then xk is localized near j . In case c1 = c2 = 0, xk vanishes identically for k ∈ {j + 1, . . . , j+1 − 1}, which may also be viewed as being localized. If |λ| < 2, then |p1 | = |p2 | = 1 and hence xk is extended unless c1 = c2 = 0.
14.2 An Uncertain Block We now turn to general Toeplitz band matrices with a finite number of impurities. Let A be a bounded linear operator on 2 (Z), 2 (N), or Cn = 2 ({1, . . . , n}). Recall that Pm is the orthogonal projection onto the first m coordinates. For ε > 0, we set . sp (A + Pm KPm ). spm ε A= K2 ≤ε
Thus, spm ε A measures the extent to which sp A can increase by a perturbation (impurity, uncertainty) of norm at most ε localized in the upper-left m × m block of A. Pm ,Pm In the notation of Section 7.1, spm A, and the results of Section 7.1 ε A is just spε imply that . sp (A + Pm KPm ) spm ε A= K2 ≤ε, rankK=1
i
i i
i
i
i
i
342
buch7 2005/10/5 page 342 i
Chapter 14. Impurities
and : 1 −1 spm A = sp A ∪ λ ∈ / sp A : P (A − λI ) P ≥ . m m 2 ε ε
(14.7)
If b(t) = b0 is constant (equivalently, if T (b) is diagonal), then, obviously, 1 spm ε T (b) ⊃ spε T (b) = b0 + εD,
sp T (b) = {b0 }.
Thus, spm ε T (b) is strictly larger than sp T (b) for every ε > 0. As the following theorem shows, nondiagonal infinite Toeplitz band matrices behave differently. Theorem 14.8. If b is a nonconstant Laurent polynomial, then there exists a number ε1 = ε1 (b, m) > 0 such that spm ε T (b) = sp T (b) for all ε ∈ (0, ε1 ). Proof. Due to (14.7), it suffices to show that sup Pm T −1 (b − λ)Pm 2 < ∞.
λ∈sp / T (b)
If λ ∈ / sp T (b), we can write b − λ = b− b+ with b− (t) =
r
1−
j =1
δj t
b+ (t) = bs
,
s
(t − μk ),
k=1
where |δj | < 1, |μk | > 1, bs = 0, and hence −1 −1 )T (b− ) T −1 (b − λ) = T (b+
(recall Section 1.4). Since −1 −1 ) = Pm T (b+ )Pm , Pm T (b+
−1 −1 T (b− )Pm = Pm T (b− )Pm ,
it follows that −1 −1 )Tm (b− ). Pm T −1 (b − λ)Pm = Tm (b+
We have ∞ δ2 δ2 cn δ1 δr −1 + 21 + · · · . . . 1 + + 2r + · · · =: (t) = 1 + b− t t t t tn n=0 with
|cn | =
α1 +···+αr =n
δ1α1
. . . δrαr
≤ (|δ1 | + · · · + |δr |)n ≤ r n .
i
i i
i
i
i
i
14.2. An Uncertain Block
buch7 2005/10/5 page 343 i
343
Thus, −1 2 Tm (b− )2 ≤ m|c0 |2 + (m − 1)|c1 |2 + · · · + |cm−1 |2
≤ m + (m − 1)r 2 + · · · + r 2m−2 . On writing b+ (t) = bs
s
(−μk )
k=1
s
t 1− μk
k=1
,
we obtain analogously that 1 m + (m − 1)s 2 + · · · + s 2m−2 . 2 |bs |
−1 2 T (b+ )2 ≤
In summary, Pm T
−1
(b −
λ)Pm 22
1 ≤ |bs |2
m−1
(m − )r
2
m−1
=0
(m − )s
2
,
(14.8)
=0
which is the desired result. Let b(t) =
s
bj t j ,
r ≥ 1,
s ≥ 1,
b−r bs = 0.
j =−r
Inequality (14.8) implies that Theorem 14.8 is true with ε1 = |bs |
m−1
(m − )r
=0
2
−1/2 m−1
−1/2 (m − )s
2
.
=0
Considering the transpose of T (b), we can slightly improve (and symmetrize) this estimate to −1/2 m−1 −1/2 m−1 ε1 ≤ max(|b−r |, |bs |) (m − )r 2 (m − )s 2 . (14.9) =0
=0
We now turn to large finite Toeplitz band matrices. From Theorems 11.3 and 11.17 we know that ? lim sp Tn (b) = (b) = sp T (b ). (14.10) n→∞
>0
If T (b) is triangular, then, obviously, 1 spm ε Tn (b) ⊃ spε Tn (b) = b0 + εD
i
i i
i
i
i
i
344
buch7 2005/10/5 page 344 i
Chapter 14. Impurities
for each ε > 0 and each n ≥ 1. This implies that the limit of spm ε Tn (b) as n → ∞ is strictly larger than (b) = {b0 } for each ε > 0. The following theorem shows that this cannot happen for nontriangular Toeplitz band matrices provided ε > 0 is sufficiently small. Theorem 14.9. Let b be a Laurent polynomial and suppose T (b) is not a triangular matrix. Then there exists an ε2 = ε2 (b, m) > 0 such that lim spm ε Tn (b) = (b) for all ε ∈ (0, ε2 ).
n→∞
The proof of this theorem is based on three lemmas. Since T (b) is supposed to be nontriangular, we can write b(t) =
s
bj t j ,
r ≥ 1,
s ≥ 1,
b−r = 0,
bs = 0.
j =−r
Lemma 14.10. There exists a constant δ = δ(b) > 1 such that ? sp T (b ). (b) = ∈[1/δ,δ]
Proof. We have bs−1 1 1 b−r b (t) = bs s t s 1 + . + ··· + bs t bs r+r t r+s Hence, if is large enough, then, for all λ ∈ sp T (b), b − λ has no zeros on T and wind (b − λ) = s = 0. This implies there is a 1 ∈ (1, ∞) such that sp T (b) ⊂ sp T (b ) for all > 1 . Analogously, from the representation b−r+1 bs r+s r+s −r −r 1+ t + · · · + t b (t) = b−r t b−r b−r we infer that there exists a 2 ∈ (0, 1) such that sp T (b) ⊂ sp T (b ) for all < 2 . Letting δ := max(1 , 1/2 ) we get ? sp T (b ) ⊃ sp T (b), ∈[1/δ,δ] /
whence ?
⎡ sp T (b ) ⊃ sp T (b) ∩ ⎣
?
⎤ sp T (b )⎦ =
∈[1/δ,δ]
∈(0,∞)
?
sp T (b ).
∈[1/δ,δ]
Lemma 14.11. For every ε > 0, m lim sup spm ε Tn (b) ⊂ spε T (b). n→∞
i
i i
i
i
i
i
14.2. An Uncertain Block
buch7 2005/10/5 page 345 i
345
Proof. Pick λ ∈ C \ spm / sp T (b) and, by (14.7), ε T (b). Then λ ∈ Pm T −1 (b − λ)Pm 2 < 1/ε. It follows that there is an open neighborhood U ⊂ C of λ such that if μ ∈ U , then μ∈ / sp T (b) and Pm T −1 (b − μ)Pm 2 < 1/ε. From Corollary 3.8 we infer that Tn−1 (b − μ) converges strongly to T −1 (b − μ). Consequently, Pm Tn−1 (b − μ)Pm − Pm T −1 (b − μ)Pm 2 → 0 as n → ∞,
(14.11)
and it is straightforward to check that the convergence in (14.11) is uniform with respect to μ in compact subsets of U . Thus, there exist an open neighborhood V ⊂ U of λ and a natural number n0 such that Pm Tn−1 (b − μ)Pm 2 < 1/ε for all μ ∈ V and all n ≥ n0 . This, in conjunction with (14.7), implies that V ∩ spm ε Tn (b) = ∅ for all n ≥ n0 , whence λ∈ / lim supn→∞ spm T (b). n ε Lemma 14.12. Let δ be the constant given by Lemma 14.10. If ∈ [1/δ, δ] and ε > 0, then m lim sup spm ε Tn (a) ⊂ spβ T (a ),
(14.12)
n→∞
where β = ε max(m−1 , −m+1 ). Proof. We have Tn (b − λ) = Dn−1 ()Tn (b − λ)Dn (), where Dn () is the diagonal matrix diag (, 2 , . . . , n ). It follows that −1 Pm Tn−1 (b − λ)Pm = Dm ()Pm Tn−1 (b − λ)Pm Dm (),
whence Pm Tn−1 (b − λ)Pm 2 ≤ κm ()Pm Tn−1 (b − λ)Pm 2 , where −1 ()2 Dm ()2 . κm () := Dm
Since also sp Tn (b) = sp Tn (b ), we conclude from (14.7) that spm ε Tn (b) = sp Tn (b) ∪ {λ ∈ / sp Tn (b) : Pm Tn−1 (a − λ)Pm 2 ≥ 1/ε} = sp Tn (b ) ∪ {λ ∈ / sp Tn (b ) : Pm Tn−1 (b − λ)Pm 2 ≥ 1/ε} ⊂ sp Tn (b ) ∪ {λ ∈ / sp Tn (b ) : κm ()Pm Tn−1 (b − λ)Pm 2 ≥ 1/ε} m = spεκm () Tn (b ),
i
i i
i
i
i
i
346
buch7 2005/10/5 page 346 i
Chapter 14. Impurities
and now Lemma 14.11 yields the inclusion m lim sup spm ε Tn (b) ⊂ spεκm () T (b ). n→∞
Because κm () = the assertion.
m−1
if ≥ 1 and κm () = −m+1 if < 1, we are then able to arrive at
Proof of Theorem 14.9. Denote the right-hand side of (14.9) by C(r, s, m) max(|b−r |, |bs |), let δ be the constant from Lemma 14.10, and put ε2 := C(r, s, m)|bs |/δ s+m−1 . We claim that Theorem 14.9 is true with this choice of ε2 . Let ε ∈ (0, ε2 ). Lemma 14.12 shows that if ∈ [1/δ, δ], then (14.12) holds with β = ε max(m−1 , −m+1 ) < C(r, s, m)|bs | max(m−1 , −m+1 )/δ s+m−1 ≤ C(r, s, m)|bs |s δ s max(m−1 , −m+1 )/δ s+m−1 = C(r, s, m)|bs | max( s
m−1
,
−m+1
)/δ
(since 1 ≤ δ)
m−1
≤ C(r, s, m)|bs | . s
(14.13)
Applying Theorem 14.8 to the matrix T (b ) and taking into account that |bs | does not exceed max(|b−r |, |bs |), we see that spm β T (b ) = sp T (b )
for
β < C(r, s, m)|bs |s .
Thus, (14.12) and (14.13) give lim sup spm ε Tn (b) ⊂ sp T (b ) n→∞
for every ∈ [1/δ, δ]. By Lemma 14.10, this implies that lim sup spm ε Tn (b) ⊂ (b). n→∞
To establish the inclusion in the other direction, note that (14.10) implies (b) ⊂ lim inf sp Tn (b) ⊂ lim inf spm ε Tn (b), n→∞
n→∞
thus completing the proof of Theorem 14.9. m In summary, Theorems 14.8 and 14.9 show that spm ε T (b) and lim spε Tn (b) stabilize at constant sets before ε reaches zero and that, moreover, these two constant sets are in general different. Thus, except for some trivial cases, we always have m lim spm ε Tn (b) spε T (b)
n→∞
for sufficiently small ε, implying that the passage from the “finite volume case” to the “infinite volume case” is discontinuous.
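The criterion (14.7), which is also the workhorse of the proofs above, translates directly into a computation. The following Python/NumPy sketch is our own illustration (the symbol, the truncation size $n = 60$, the block size $m = 2$, and the value of $\varepsilon$ are arbitrary choices): it tests whether a given point $\lambda$ belongs to $\mathrm{sp}^m_\varepsilon T_n(b)$ by examining the norm of the upper-left $m \times m$ block of the resolvent; sweeping $\lambda$ over a grid yields pictures of the whole set.

```python
import numpy as np

def toeplitz_band(b_coeffs, n):
    """Banded Toeplitz matrix T_n(b) = (b_{j-k}) for b(t) = sum_k b_k t^k."""
    T = np.zeros((n, n), dtype=complex)
    for k, bk in b_coeffs.items():        # coefficient b_k sits on the k-th diagonal
        T += bk * np.eye(n, k=-k)
    return T

def in_sp_m_eps(b_coeffs, n, m, eps, lam):
    """Criterion (14.7): lam lies in sp^m_eps T_n(b) iff T_n(b) - lam*I is singular
    or ||P_m (T_n(b) - lam*I)^{-1} P_m||_2 >= 1/eps."""
    T = toeplitz_band(b_coeffs, n) - lam * np.eye(n)
    try:
        block = np.linalg.inv(T)[:m, :m]
    except np.linalg.LinAlgError:
        return True
    return np.linalg.norm(block, 2) >= 1.0 / eps

b = {1: 1.0, -1: 1.0 / 9.0}               # b(t) = t + (1/9) t^{-1}
print(in_sp_m_eps(b, 60, 2, 0.05, 0.5 + 0.5j))
```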
i
i i
i
i
i
i
14.3. Emergence of Antennae
14.3
buch7 2005/10/5 page 347 i
347
Emergence of Antennae
Let A be a bounded operator on 2 over Z, N, or {1, . . . , n}. We denote by Ej the projection defined by xj for k = j , (Ej x)k = 0 for k = j and by Ej k the operator given by the matrix whose j, k entry is 1 and all other entries of which are zero. For a subset of C, we put . (j,k) sp A = sp (A + ωEj k ). (14.14) ω∈ (j,k)
Thus, sp A is the union of all possible spectra that may emerge as the result of a perturbation of A in the j, k site by a number randomly chosen in . Notice that in the case where E ,E = εD, the set (14.14) is just the set spε j k A introduced in Section 7.1 (recall Theorem 7.2). For a set M ⊂ C, we define −1/M as the set −1/M := {z ∈ C : 1 + μz = 0 for some μ ∈ M}. Here is an analogue to (7.3). Lemma 14.13. If 0 ∈ , then sp A = sp A ∪ {λ ∈ / sp A : [(A − λI )−1 ]kj ∈ −1/ }, (j,k)
where [(A − λI )−1 ]kj is the k, j entry of the resolvent (A − λI )−1 . (j,k)
Proof. Since 0 ∈ , we have sp A = sp A ∪ X with some set X ⊂ C \ sp A. Fix λ ∈ C \ sp A. Obviously, λ belongs to X if and only if there is an ω ∈ such that A − λI + ωEj k is not invertible or, equivalently, such that I + (A − λI )−1 ωEj k is not invertible. As Ej k is a trace class operators, the operator I + (A − λI )−1 ωEj k is not invertible if and only if 0 = det(I + (A − λI )−1 ωEj k ) = 1 + ω[(A − λI )−1 ]kj , which proves the assertion. Now let b be a Laurent polynomial. For λ ∈ / sp T (b), we denote the j, k entry of T −1 (b − λ) by dj k (λ). Thus, T −1 (b − λ) = (dj k (λ))∞ j,k=1 for λ ∈ C \ sp T (b). We also put jk
H (b) = {λ ∈ C \ sp T (b) : dkj (λ) ∈ −1/ }.
(14.15)
i
i i
i
i
i
i
348
buch7 2005/10/5 page 348 i
Chapter 14. Impurities
Theorem 14.14. If 0 ∈ , then (j,k)
jk
sp T (b) = sp T (b) ∪ H (b). jk
Proof. This is immediate from Lemma 14.13 and the definition of H (b). Example 14.15. Let b(t) = t + α 2 t −1 (t ∈ T) with α ∈ [0, 1). The set b(T) is the ellipse {(1 + α 2 ) cos θ + i(1 − α 2 ) sin θ : θ ∈ [0, 2π )}. Let E+ and E− denote the interior and the exterior of the ellipse b(T), respectively. Clearly, C \ sp T (b) = E− . For > 1, define b by b (t) = t + α 2 −1 t −1 . Then b (T) is the ellipse {( + α 2 −1 ) cos θ + i( − α 2 −1 ) sin θ : θ ∈ [0, 2π )}. Since + α 2 −1 > 1 + α 2
and
− α 2 −1 > 1 − α 2
for
> 1,
the ellipses b (T) are contained in E− for every > 1, and each point of E− lies on exactly one of these ellipses. In other words, each point λ ∈ E− can be uniquely written as λ = eiθ + α 2 −1 e−iθ with ∈ (1, ∞), θ ∈ [0, 2π ). We have b(t) − λ = t −1 (t 2 − λt + α 2 ) = t −1 (t − z1 )(t − z2 ) with z1 = α 2 −1 e−iθ ,
z2 = eiθ .
−1 −1 )T (b− ) with Since |z1 | = α 2 −1 < 1 and |z2 | = > 1, we get T −1 (b − λ) = T (b+ t z1 , b− (t) = t −1 (t − z1 ) = 1 − . b+ (t) = t − z2 = −z2 1 − z2 t
Taking into account that −1 (t) b+
1 =− z2
t t2 1+ + 2 + ··· z2 z2
,
−1 b− (t) = 1 +
we arrive at the representation ⎛
1 1 ⎜ 1/z2 −1 T (b − λ) = − ⎜ z2 ⎝ 1/z22 ...
⎞⎛ 1 1/z2 ...
1 ...
⎟⎜ ⎟⎜ ⎠⎝ ...
1
z2 z1 + 21 + · · · , t t
z1 1
z12 z1 1
⎞ ... ... ⎟ ⎟. ... ⎠ ...
i
i i
i
i
i
i
14.3. Emergence of Antennae
buch7 2005/10/5 page 349 i
349
sp(1,1) T (b) εD for ε = 0, 23 , 2, 25
sp(1,1) [−ε,ε] T (b) for ε =
5 2
2 −1 Figure 14.2. The sets sp(1,1) T (b) and sp(1,1) with α = [−ε,ε] T (b) for b(t) = t + α t εD
9 . 10
In particular, 1 1 z1 α2 = − iθ , d12 (λ) = − = − 2 2iθ , z2 e z2 e 2 α 1 1 1 z1 1 d21 (λ) = − 2 = − 2 2iθ , d22 (λ) = − + 1 = − iθ +1 . e z 2 z2 e 2 e2iθ z2 d11 (λ) = −
These formulas and their analogues for general dj k (λ) can be used to compute the sets jk H (b). In the cases = εD and = [−ε, ε], (14.15) reads : 1 jk HεD (b) = λ ∈ E− : |dkj (λ)| > , ε : 1 1 jk H[−ε,ε] (b) = λ ∈ E− : dkj (λ) ∈ −∞, − ∪ ,∞ . ε ε Figure 14.2 shows examples of sp(1,1) T (b) and sp(1,1) [−ε,ε] T (b): An elliptic “halo” emerges εD in the former spectrum and two “wings” or “antennae” arise in the latter. In Figures 14.3 (3,3) (3,3) and 14.4, we illustrate sp(2,2) T (b) and sp(2,2) [−ε,ε] T (b), and spεD T (b) and sp[−ε,ε] T (b) for the εD T (b) and sp(3,3) T (b) contain holes that disappear for same values of ε. Notice that sp(2,2) εD εD (j,k)
larger values of ε. Computing the sets sp[−ε,ε] T (b) for all (j, k) with ε = 5, we arrive at Figure 14.5, which is examined in more detail in the three close-ups of Figure 14.7. Finally, Figure 14.6 is an attempt to obtain Figure 14.5 by replacing the infinite matrix T (b) with T30 (b); we will say more about this in the following section.
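Pictures such as Figures 14.2 to 14.5 can be approximated without the explicit formulas: for $\lambda$ outside the spectrum one may replace $d_{kj}(\lambda) = [T^{-1}(b-\lambda)]_{kj}$ by the corresponding entry of the inverse of a large finite section (the entrywise convergence is discussed in the next section). The following Python/NumPy sketch is our own illustration (the parameters $\alpha$, $\varepsilon$, the truncation size $N$, and the test point are arbitrary, and the membership test uses a crude numerical tolerance): it checks whether a point $\lambda$ lies in $H^{jk}_{[-\varepsilon,\varepsilon]}(b)$, that is, whether $d_{kj}(\lambda)$ falls into $(-\infty,-1/\varepsilon]\cup[1/\varepsilon,\infty)$.

```python
import numpy as np

alpha, eps, N = 0.9, 2.5, 300    # symbol b(t) = t + alpha^2 t^{-1}; truncation size N

def d_entry(lam, k, j):
    """Approximate d_kj(lam) = [T^{-1}(b - lam)]_{kj} by an N x N finite section."""
    T = np.eye(N, k=-1) + alpha**2 * np.eye(N, k=1) - lam * np.eye(N)
    return np.linalg.inv(T)[k - 1, j - 1]

lam = 2.2 + 0.0j                 # a point outside the ellipse b(T)
d = d_entry(lam, 1, 1)
in_wing = abs(d.imag) < 1e-8 and abs(d.real) >= 1.0 / eps   # d_11(lam) in -1/[-eps,eps]?
print(d, in_wing)
```

Sweeping $\lambda$ over a grid of the exterior of the ellipse and marking the points where the test succeeds reproduces the "wings" seen in Figure 14.2.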
i
i i
i
i
i
i
350
buch7 2005/10/5 page 350 i
Chapter 14. Impurities
sp(2,2) T (b) εD for ε = 0, 23 , 2, 25
sp(2,2) [−ε,ε] T (b) for ε =
5 2
2 −1 Figure 14.3. The sets sp(2,2) T (b) and sp(2,2) with α = [−ε,ε] T (b) for b(t) = t + α t εD
9 . 10
sp(3,3) T (b) εD for ε = 23 , 2, 25
sp(3,3) [−ε,ε] T (b) for ε =
5 2
2 −1 Figure 14.4. The sets sp(3,3) T (b) and sp(3,3) with α = [−ε,ε] T (b) for b(t) = t + α t εD
9 . 10
i
i i
i
i
i
i
14.3. Emergence of Antennae
buch7 2005/10/5 page 351 i
351
Figure 14.5. The set ∪(j,k)∈N×N sp[−ε,ε] T (b) for b(t) = t + α 2 t −1 with α = (j,k)
2 5
and ε = 5.
Figure 14.6. Computed eigenvalues of 5000 Toeplitz matrices of dimension 30 with b(t) = t + α 2 t −1 for α = 25 , each perturbed in a single random entry by a random number uniformly distributed in [−5, 5].
i
i i
i
i
i
i
352
buch7 2005/10/5 page 352 i
Chapter 14. Impurities
Figure 14.7. Close-ups of the set shown in Figure 14.5; the gray boxes in the top image indicate the axes of the images below.
i
i i
i
i
i
i
14.4. Behind the Black Hole
14.4
buch7 2005/10/5 page 353 i
353
Behind the Black Hole (j,k)
We now study the behavior of sp Tn (b) as n goes to infinity. Let b be a Laurent polynomial, b(t) =
s
bk t k ,
b−r bs = 0,
(14.16)
k=−r
and for ∈ (0, ∞), define b by b (t) = k bk k t k . The limiting set (b) given by (11.4) can be characterized as in Theorem 11.3. Thus, if λ ∈ C \ (b), then there is a > 0 such that T (b − λ) is invertible. Lemma 14.16. The analytic functions dj k : C \ sp T (b) → C defined in Section 14.3 can be continued to analytic functions dj k : C \ (b) → C such that if λ ∈ C \ (b), > 0, and T (b − λ) is invertible, then T −1 (b − λ) = (j −k dj k (λ))∞ j,k=1 .
(14.17)
Proof. Suppose first that T (b) is not triangular; that is, let b be of the form (14.16) with r ≥ 1 and s ≥ 1. Pick λ ∈ C \ (b) and choose any > 0 such that T (b − λ) is invertible. One can write b(t) − λ = t −r bs
r+s
(t − zj (λ)),
j =1
whence −r −r
b (t) − λ = t
bs
r+s
(t − zj (λ)).
j =1
Taking into account that T (b − λ) is invertible if and only if b − λ has no zeros on T and has winding number zero, it is not difficult to check that the invertibility of T (b − λ) is equivalent to the existence of a labelling of the zeros zj (λ) such that |z1 (λ)| ≤ · · · ≤ |zr (λ)| < < |zr+1 (λ)| ≤ · · · ≤ |zr+s (λ)|.
(14.18)
Abbreviating zj (λ) to zj , we have b (t) − λ = bs b− (t)b+ (t) with r r+s zj 1− t − zj , b+ (t) = b− (t) = t j =1 j =r+1 and thus −1 −1 T −1 (b − λ) = bs−1 T (b+ )T (b− ).
Clearly,
(14.19)
∞ zj2 zj 1+ = bn −n t −n , + 2 2 + · · · =: t t n=0 j =1 ∞ r+s s 2 2 t (−1) t −1 b+ (t) = 1+ + 2 + ··· = cn n t n , zr+1 . . . zr+s j =r+1 zj zj n=0
−1 b− (t)
r
i
i i
i
i
i
i
354
buch7 2005/10/5 page 354 i
Chapter 14. Impurities
where bn = bn (λ) and cn = cn (λ) are independent of . Convergence of these series is a consequence of (14.18). Thus, by (14.19), T −1 (b − λ) equals ⎛ ⎞ ⎞⎛ c0 b0 b1 / b2 /2 . . . ⎟⎜ 1 ⎜ b0 b1 / . . . ⎟ ⎜ c1 2 c0 ⎟, ⎟⎜ ⎝ ⎠ ⎝ c c c ... ⎠ b0 bs 2 1 0 ... ... ... ... ... which shows that [T −1 (b − λ)]j,k = j −k bs−1 (cj −1 bk−1 + cj −2 bk−2 + · · · ). This proves (14.17) with certain numbers dj k (λ) independent of . If T (b −λ) is invertible, then so is T (b −μ) for all μ in some open neighborhood of λ, and the entries of the inverse of T (b − μ) are analytic functions of μ in this neighborhood. Thus, the functions dj k are defined and analytic in C \ (b). As outside sp T (b) these functions coincide with the functions dj k introduced in Section 14.3, we arrive at the assertion of the lemma. Finally, the proof is similar if T (b) is triangular. The following two results will be used in the proof of Theorem 14.19, which is the main result of this section. Lemma 14.17. Fix a site (j, k) and let λ ∈ C \ (b). If > 0 and T (b − λ) is invertible, then there exist an open neighborhood U ⊂ C \ (b) of λ and a natural number n0 such that T (b − μ) is invertible for all μ ∈ U , the matrices Tn (b − μ) are invertible for all μ ∈ U and all n ≥ n0 , and −1 Tn (b − μ) j k → T −1 (b − μ) j k as n → ∞ uniformly with respect to μ ∈ U . Proof. By Theorem 3.7, there exist an open neighborhood V of λ and a natural number m0 such that T (b − μ) is invertible for all μ ∈ V , M := sup sup Tn−1 (b − μ) < ∞, n≥m0 μ∈V
and
Tn−1 (b − μ)
jk
→ T −1 (b − μ)
jk
as n → ∞
for all μ ∈ V . Hence, given any ε > 0, we can find a number n0 ≥ m0 and an open neighborhood U ⊂ V of λ such that −1 Tn (b − λ) j k − T −1 (b − λ) j k < ε/3, −1 T (b − μ)
jk
− T −1 (b − λ)
< ε/3, jk
i
i i
i
i
i
i
14.4. Behind the Black Hole and
−1 Tn (a − μ)
jk
buch7 2005/10/5 page 355 i
355
− Tn−1 (a − λ)
jk
≤ Tn−1 (a − μ) − Tn−1 (a − λ) ≤ |μ − λ| Tn−1 (a − μ) Tn−1 (a − λ) ≤ M 2 |μ − λ| < ε/3 for all μ ∈ U and all n ≥ n0 . The assembly of these three ε/3 inequalities yields the assertion. Theorem 14.18 (Hurwitz). Let G ⊂ C be an open set, let f be a function that is analytic in G and does not vanish identically, and let {fn } be a sequence of analytic functions in G that converges to f uniformly on compact subsets of G. If f (λ) = 0 for some λ ∈ G, then there is a sequence {λn } of points λn ∈ G such that λn → λ as n → ∞ and fn (λn ) = 0 for all sufficiently large n. jk
By virtue of Lemma 14.16 we can extend the sets H (b) ⊂ C \ sp T (b) given by (14.15) to C \ (b). Thus let henceforth jk
H (b) = {λ ∈ C \ (b) : dkj (λ) ∈ −1/ }. Here is the analogue of Theorem 14.14 for large finite matrices. The technical assumptions made in the following result will be discussed later. Theorem 14.19. Let b be a Laurent polynomial and let be a compact subset of C that contains the origin. If dkj : C \ (b) → C is identically zero or nowhere locally constant or assumes a constant value c that does not belong to −1/ , then (j,k)
jk
lim sp Tn (b) = (b) ∪ H (b).
(14.20)
n→∞
Proof. Let G be a connected component of C \ (b). We first prove that (j,k) jk lim inf sp Tn (b) ∩ G ⊃ (b) ∪ H (b) ∩ G. n→∞
(14.21)
jk
/ −1/ or dkj (μ) = 0 for all μ ∈ , then H (b) = ∅, and (14.21) is If dkj (μ) = c ∈ evident from Theorem 11.17. (Recall that 0 ∈ .) Thus assume dkj is not constant in G and jk H (b) is not empty. Take λ in the right-hand side of (14.21). If λ is in the boundary ∂G of (j,k) jk G, then λ is in (b) and hence in lim inf sp Tn (b). Thus, let λ ∈ G. Since λ ∈ H (b), there is an ω ∈ such that 1 + ωdkj (λ) = 0. Choose > 0 so that T (b − λ) is invertible and let U and n0 be as in Lemma 14.17. Due to Lemma 14.16, f (μ) := 1 + ωdkj (μ) = 1 + ωj −k T −1 (b − μ) kj for μ ∈ U . Lemma 14.17 shows that the functions fn defined in U by fn (μ) = 1 + ωj −k Tn−1 (b − μ) kj
i
i i
i
i
i
i
356
buch7 2005/10/5 page 356 i
Chapter 14. Impurities
converge uniformly to f in U . Since f is not constant in U and is zero at λ ∈ U , Theorem 14.18 implies that there are λn ∈ U such that λn → λ and fn (λn ) = 0. Let D = diag (1, , . . . , n−1 ). It can be readily verified that Tn (b − μ) = D Tn (b − μ)D−1 , whence
Tn−1 (b − μ)
and thus
kj
= k−j Tn−1 (b − μ)
0 = fn (λn ) = 1 + ω Tn−1 (b − λn )
kj
(14.22)
kj
(14.23)
.
(j,k)
From Lemma 14.13 we now deduce that λn ∈ sp Tn (b), and since λn → λ, it follows that λ is in the left-hand side of (14.21). We now show that (j,k) jk (14.24) lim sup sp Tn (b) ∩ G ⊂ (b) ∪ H (b) ∩ G. n→∞
Pick λ in the left-hand side of (14.24). If λ ∈ ∂G ⊂ (b), then λ is obviously in the righthand side of (14.24). We can therefore assume that λ ∈ G. By the definition of the partial (j,k) limiting set, there are λn ∈ sp Tn (b) ∩ G such that λn → λ. Choose > 0 so that T (b −λ) is invertible. By Lemma 14.17, the matrices Tn (b −λn ) are invertible whenever n is sufficiently large, and from (14.22) it then follows that the matrices Tn (b − λn ) are also (j,k) invertible for all n large enough. Hence, taking into account that λn ∈ sp Tn (b) and −1 using Lemma 14.13, we see that there are ωn ∈ such that 1 + ωn [Tn (b − λn )]kj = 0. Due to (14.23), this implies that (b − λn ) kj = 0. (14.25) 1 + ωn j −k Tn−1 Since is compact, the sequence {ωn } has a partial limit ω in . Consequently, (14.25) and Lemma 14.17 give 1 + ωj −k [T −1 (a − λ)]kj = 0, and Lemma 14.16 now yields the jk equality 1 + ωdkj (λ) = 0. It results that dkj (λ) ∈ −1/ and thus that λ ∈ H (b). (j,k) From (14.21) and (14.24) we obtain that lim inf sp Tn (b) ∩ G is equal to G ∩ jk ((b) ∪ H (b)). Considering the union over all components of C \ (b) we arrive at (14.20). Example 14.20. Let b(t) = t + α 2 t −1 be as in Example 14.15. The range b(T) is an ellipse, sp T (b) equals b(T) ∪ E+ , where E+ denotes the set of points inside this ellipse, and (b) = [−2α, 2α] is the line segment between the foci of the ellipse. The expressions for dkj (λ) found in Example 14.15 for λ outside the ellipse E+ extend by analyticity to all λ ∈ C \ (b). Since C \ (b) is connected, it is not difficult to show that the functions dkj are nowhere locally constant (details are given in the next section). Thus, (14.20) is valid in the case at hand. At the intersection of the j th row and kth column of Figure 14.8 we see the set jk (j,k) (b) ∪ H (b) and thus limn→∞ sp Tn (b) for = [−5, 5]. Figure 14.9 illustrates the effects of complex single-entry perturbations. In the (j,k) j th row and the kth column of Figure 14.9 we see spD T25 (b) for b(t) = t + 41 t −1 .
i
i i
i
i
i
i
14.4. Behind the Black Hole
buch7 2005/10/5 page 357 i
357
k=1
k=2
k=3
j =1
j =2
j =3
Figure 14.8. The sets (b) ∪ H (b) for b(t) = t + 19 t −1 , = [−5, 5], and the nine possible choices of (j, k) with j, k ∈ {1, 2, 3}. jk
k=1
k=2
k=3
j =1
j =2
j =3
Figure 14.9. Complex single-entry perturbations to T (b) and Tn (b) for b(t) = t + 41 t −1 and = D, the closed unit disk.
i
i i
i
i
i
i
358
buch7 2005/10/5 page 358 i
Chapter 14. Impurities
The plot in the j th row and kth column shows the superimposed eigenvalues of 1000 random perturbations to T30 (b) in the (j, k) entry for j, k ∈ {1, 2, 3}. Each perturbation is a random number uniformly distributed in D. The boundaries of the regions (j,k)
(j,k)
lim sp D Tn (b) = (b) ∪ HD
n→∞
(b)
are drawn as solid curves. While the emergence of wings (or antennae) is typical for realvalued perturbations, one finds that complex perturbations usually lead to “bubbles.” For example, we see two bubbles in sp(2,2) T25 (b), which split into three bubbles in sp(3,3) T30 (b). D D Figures 14.10 and 14.11 illustrate the following experiment. We choose one of the entries of the upper m × m block of Tn (b) randomly with probability 1/m2 and then perturb Tn (b) in this entry by a random number uniformly distributed in = [−5, 5], plotting the n eigenvalues of the perturbed matrix. We repeat this N times and consider the superimposition of the N n eigenvalues obtained. Equality (14.20) suggests that this superimposition should approximate . 1≤j,k≤m
.
(j,k)
lim sp[−5,5] Tn (b) =
n→∞
(j,k)
(b) ∪ H[−5,5] (b)
(14.26)
1≤j,k≤m
as n → ∞ and N → ∞. For m = 3 and m = 5, the sets (14.26) are shown in the middle pictures of Figures 14.10 and 14.11, respectively. Notice that, up to a change in scale, the middle picture of Figure 14.10 is nothing but the union of the nine pictures of Figure 14.8. The bottom pictures of Figures 14.10 and 14.11 depict the result of concrete numerical experiments with N = 2000 and n = 20. The agreement between the n → ∞ theory and practice is striking even for modest values of n. The top pictures of Figures 14.10 and 14.11 illustrate .
(j,k)
sp[−5,5] T (b) =
1≤j,k≤m
(j,k) sp T (b) ∪ H[−5,5] (b) .
.
(14.27)
1≤j,k≤m
Obviously, these top pictures (infinite volume case) differ significantly from the middle pictures (finite volume case). Even more than that, in the finite volume case we discover a remarkable structure in the set (14.27). In the infinite volume case, this structure is hidden behind the black ellipse E+ , so we are only aware of the ends of certain arcs, resembling antennae sprouting from the ellipse. The top picture of Figure 14.12 shows .
(j,k)
sp[−5,5] T (b) =
(j,k)∈N×N
.
(j,k)
sp T (b) ∪ H[−5,5] (b)
(j,k)∈N×N
(infinite volume case), while in the middle picture we approximate . (j,k)∈N×N
(j,k)
lim sp[−5,5] Tn (b) =
n→∞
.
(j,k)
(b) ∪ H[−5,5] (b)
(j,k)∈N×N
i
i i
i
i
i
i
14.4. Behind the Black Hole
buch7 2005/10/5 page 359 i
359
Figure 14.10. Real single-entry perturbations to T (b) and Tn (b) for b(t) = (j,k) t + 19 t −1 and = [−5, 5]. The top picture shows the union of sp T (b) over all (j, k) (j,k) in the upper 3 × 3 block; the middle picture represents the union of limn→∞ sp Tn (b) over the same (j, k). The bottom picture superimposes the eigenvalues of 2000 random single-entry perturbations of T20 (b).
i
i i
i
i
i
i
360
buch7 2005/10/5 page 360 i
Chapter 14. Impurities
Figure 14.11. This is the analogue of Figure 14.10 for real single-entry perturbations in the upper 5 × 5 block.
i
i i
i
i
i
i
14.4. Behind the Black Hole
361
Figure 14.12. Real single-entry perturbations to $T(b)$ and $T_n(b)$ for $b(t) = t + \frac{1}{9}t^{-1}$ and $\Omega = [-5,5]$. The top picture is the union of $\mathrm{sp}^{(j,k)}_{\Omega}T(b)$ over $(j,k) \in \mathbb{N}\times\mathbb{N}$. The middle picture shows computed eigenvalues of 1000 single-entry perturbations to $T_{50}(b)$, where the perturbed entry is chosen at random anywhere in the matrix, and the perturbation itself is randomly chosen from the uniform distribution on $[-5,5]$. The two bottom pictures intend to motivate the dark interior elliptic region in the middle picture. They are explained in the text.
(finite volume case) by the union of sp(T_50(b) + ωE_{jk}) for 1000 random choices of j, k ∈ {1, 2, . . . , 50} and ω ∈ [−5, 5].

The eigenvalues of the finite Toeplitz matrices T_n(a) are highly sensitive to perturbations even for modest dimensions. It is interesting that the single-entry perturbations investigated here do not generally change the qualitative nature of that eigenvalue instability. This is revealed for a specific example by the pseudospectral plots at the bottom of Figure 14.12 (computed using [300]). The interior elliptical region of high eigenvalue concentration in the middle picture is an artifact of finite precision arithmetic; this is revealed by the two bottom pictures, which show the boundaries of the pseudospectra sp_ε T_50(b) (left) and sp_ε(T_50(b) + ωE_{54}) with ω = −3 (right) for ε = 10^{-1}, 10^{-3}, . . . , 10^{-15}. Dots (·) denote computed eigenvalues; circles (◦) show the true eigenvalue locations. This explains the dark interior ellipse in the center plot of Figure 14.12: many of these computed eigenvalues are inaccurate due to rounding errors. Generic perturbations of norm 10^{-15} obscure the effects of our larger, single-entry perturbations.

The true structure is more delicate, as emphasized by Figure 14.13, which zooms in on the middle image of Figure 14.11 for perturbations to the upper left 5 × 5 corner of T_n(b). We compare the n → ∞ structure to the eigenvalues of perturbations of T_10(b) and T_50(b). The convergence to the asymptotic limit is compelling, though from an applications perspective, any point in the interior of b(T) will behave like an eigenvalue when n is large.

Finally, Figure 14.14 exhibits a Toeplitz matrix with six diagonals. The symbol b is given by
\[
b(t) = (1.5 - 1.2\,\mathrm{i})t^{-1} + (0.34 + 0.84\,\mathrm{i})t + (-0.46 - 0.1\,\mathrm{i})t^{2} + (0.17 - 1.17\,\mathrm{i})t^{3} + (-1 + 0.77\,\mathrm{i})t^{4}.
\]
Notice again the emergence of many wings, which make the set Λ(b) (middle picture) become something reminiscent of a horse in cave paintings.
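The rounding-error effect discussed above is easy to observe directly. The following MATLAB sketch is our own illustration and not part of the original experiments: for b(t) = t + (1/9)t^{-1}, the similarity of Exercise 6 with D = diag(3^j) turns T_n(b) into the symmetric matrix (1/3)T_n(t + t^{-1}), whose eigenvalues (2/3)cos(jπ/(n+1)) can be computed reliably (cf. Section 2.2), while the eigenvalues computed naively from T_50(b) scatter over an elliptic region.

    % Naive eigenvalues of T_50(b), b(t) = t + (1/9)t^{-1}, versus accurate ones.
    n = 50;
    T = toeplitz([0; 1; zeros(n-2,1)], [0, 1/9, zeros(1,n-2)]);
    ev_naive = eig(T);                 % contaminated by rounding errors
    % D^{-1} T D with D = diag(3.^(0:n-1)) is the symmetric matrix below:
    S = (1/3)*(diag(ones(n-1,1),1) + diag(ones(n-1,1),-1));
    ev_exact = eig(S);                 % real, contained in (-2/3, 2/3)
    plot(real(ev_naive), imag(ev_naive), '.', ev_exact, zeros(n,1), 'o')
    axis equal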
14.5 Can Structured Pseudospectra Jump?
Let H be a Hilbert space and let A, B, C ∈ B(H). In accordance with (7.2), we define the structured pseudospectrum
\[
\operatorname{sp}^{B,C}_{\varepsilon}A = \operatorname{sp}A \cup \bigl\{\lambda \notin \operatorname{sp}A : \|C(A-\lambda I)^{-1}B\| \ge 1/\varepsilon\bigr\}. \qquad(14.28)
\]
This section addresses the question whether $\operatorname{sp}^{B,C}_{\varepsilon}A$ can jump as ε ∈ (0, ∞) changes continuously. By virtue of (14.28), this question is equivalent to asking whether the norm $\|C(A-\lambda I)^{-1}B\|$ may be locally constant. If H = ℓ²(N), we have
\[
\operatorname{sp}^{(j,k)}_{\varepsilon D}A = \operatorname{sp}^{E_j,E_k}_{\varepsilon}A,
\]
where E_j is the projection on the jth coordinate. Thus, the question whether
\[
|d_{jk}(\lambda)| = |[T^{-1}(b-\lambda)]_{jk}| = \|E_j T^{-1}(b-\lambda)E_k\|
\]
can locally be a nonzero constant, which is of importance in connection with Theorem 14.19, amounts to the problem whether $\operatorname{sp}^{E_j,E_k}_{\varepsilon}T(b_\varrho)$ can jump.
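As an illustration of how such quantities can be explored numerically, the following MATLAB sketch evaluates the norm ‖C(A − λI)^{-1}B‖ on a grid and draws its level curves 1/ε, i.e., the boundaries of $\operatorname{sp}^{B,C}_{\varepsilon}A$ for several ε. It is our own sketch: the matrix size, the indices, and the use of a finite truncation in place of the infinite operator T(b) are arbitrary choices made only for illustration.

    % Level curves of lambda -> ||C (A - lambda I)^{-1} B|| on a grid; these are
    % the boundaries of the structured pseudospectra (14.28) for the chosen eps.
    n = 40;
    A = toeplitz([0; 1; zeros(n-2,1)], [0, 1/9, zeros(1,n-2)]);  % a truncation T_40(b)
    B = zeros(n,1); B(4) = 1;      % B = e_4, so the norm below is |[(A-lam I)^{-1}]_{5,4}|
    C = zeros(1,n); C(5) = 1;      % C = e_5^T
    [x, y] = meshgrid(linspace(-1.5, 1.5, 201));
    nrm = zeros(size(x));
    for p = 1:numel(x)
        lam = x(p) + 1i*y(p);
        nrm(p) = norm(C * ((A - lam*eye(n)) \ B));
    end
    contour(x, y, nrm, 1./[0.1 0.01 0.001])   % eps = 0.1, 0.01, 0.001
    axis equal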
Figure 14.13. Closer inspection of Figure 14.11. The top plot shows a portion of $\lim_{n\to\infty}\operatorname{sp}_{[-5,5]}^{(j,k)}T_n(b)$, taken over all (j, k) in the top 5 × 5 corner. The middle image shows eigenvalues of 10000 random perturbations to a single entry in the top corner of T_50(b); the bottom image shows the same for T_10(b). (The accurate eigenvalues for n = 50 were obtained by reducing the nonnormality in the problem via a similarity transformation.)
Proposition 14.21. Let A, B, C be bounded linear operators on a Hilbert space H and suppose B or C is compact. Let G_∞ denote the unbounded component of C \ sp A and let G be any component of C \ sp A. Suppose further that there exists a path in the plane that connects G and G_∞ along which C(A − λI)^{-1}B can be analytically continued from G to some function f(λ) defined in some open subset V of G_∞, and that f(λ) = C(A − λI)^{-1}B for λ ∈ V. Then ‖C(A − λI)^{-1}B‖ is either nowhere locally constant in G or identically zero in G.
Figure 14.14. Real single-entry perturbations to T_n(b). The range b(T) of the Laurent polynomial b is seen in the top picture. The middle picture shows sp T_50(b) and provides a very good idea of Λ(b). The bottom picture depicts the superimposed eigenvalues of 1000 perturbations of T_50(b) in a randomly chosen entry from (1, 1), (2, 2), (3, 3) by a random number uniformly distributed in [−5, 5].
Of course, here "nowhere locally constant in G" means that there is no open subset of G on which the function is constant. Proposition 14.21 implies in particular that, provided B or C is compact, $\operatorname{sp}^{B,C}_{\varepsilon}A$ cannot jump if C \ sp A is connected (which is certainly true for finite matrices A as well as for selfadjoint or compact operators A).
Proof. Abbreviate C(A − λI)^{-1}B to f(λ). Pick λ_0 ∈ G. Since f(λ_0) is compact, there exist x_0, y_0 ∈ H of norm 1 such that |(x_0, f(λ_0)y_0)| = ‖f(λ_0)‖. Put h(λ) = (x_0, f(λ)y_0). The function h is analytic and hence either an open map or constant. In the former case, every neighborhood of λ_0 contains a λ_1 ∈ G such that
\[
\|f(\lambda_0)\| = |h(\lambda_0)| < |h(\lambda_1)| = |(x_0, f(\lambda_1)y_0)| \le \|f(\lambda_1)\|,
\]
which shows that ‖f(λ)‖ cannot be locally constant in G. In the latter case, analytic continuation of h along the given path to G_∞ and subsequently to infinity yields that the constant value assumed by h must be zero.

Example 14.22. Let U be the forward shift on ℓ²(Z), that is, (Ux)_n = x_{n−1}. Equivalently, U is the Laurent operator L(b) induced by b(t) = t. Then sp U is the unit circle T and the central 4 × 4 block of the resolvent operator (U − λI)^{-1} is
\[
\begin{pmatrix}
0 & 1 & \lambda & \lambda^2\\
0 & 0 & 1 & \lambda\\
0 & 0 & 0 & 1\\
0 & 0 & 0 & 0
\end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix}
-1/\lambda & 0 & 0 & 0\\
-1/\lambda^2 & -1/\lambda & 0 & 0\\
-1/\lambda^3 & -1/\lambda^2 & -1/\lambda & 0\\
-1/\lambda^4 & -1/\lambda^3 & -1/\lambda^2 & -1/\lambda
\end{pmatrix}
\]
for |λ| < 1 and |λ| > 1, respectively. Thus, ‖P_2(U − λI)^{-1}P_2‖ = 1 for |λ| < 1. In this case the crucial hypothesis of Proposition 14.21 is not satisfied: although each entry of (U − λI)^{-1} can be analytically continued from |λ| < 1 to all of C, the result of this continuation is different from the corresponding entry of (U − λI)^{-1} for |λ| > 1.

Using (14.28) and the explicit expressions for (U − λI)^{-1} displayed above, it is easy to compute $\operatorname{sp}^{P_2,P_2}_{\varepsilon}U$. Put $\varepsilon_0 = \sqrt{2/(3+\sqrt{5})} = 0.618\ldots$. There is a continuous and strictly monotonically increasing function h : [ε_0, ∞) → [0, ∞) such that h(ε_0) = 0, h(∞) = ∞, and
\[
\operatorname{sp}^{P_2,P_2}_{\varepsilon}U =
\begin{cases}
\{\lambda\in\mathbb{C} : |\lambda| = 1\} & \text{for } 0 < \varepsilon \le \varepsilon_0,\\
\{\lambda\in\mathbb{C} : 1 \le |\lambda| < 1 + h(\varepsilon)\} & \text{for } \varepsilon_0 < \varepsilon \le 1,\\
\{\lambda\in\mathbb{C} : 0 \le |\lambda| < 1 + h(\varepsilon)\} & \text{for } 1 < \varepsilon.
\end{cases}
\]
Clearly, $\operatorname{sp}^{P_2,P_2}_{\varepsilon}U$ jumps at ε = 1.
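For the reader's convenience, here is the small computation behind the value of ε_0; it is our own verification and uses only the 2 × 2 corner of the second matrix displayed above. For |λ| > 1,
\[
P_2(U-\lambda I)^{-1}P_2 \;\text{corresponds to}\;
\begin{pmatrix} -1/\lambda & 0\\ -1/\lambda^2 & -1/\lambda \end{pmatrix}
= -\frac{1}{\lambda}\begin{pmatrix} 1 & 0\\ 1/\lambda & 1\end{pmatrix},
\]
and the norm of the right-hand side increases as |λ| decreases to 1, its supremum over |λ| > 1 being the spectral norm of $\begin{pmatrix}1&0\\1&1\end{pmatrix}$. Since $\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix}=\begin{pmatrix}1&1\\1&2\end{pmatrix}$ has the eigenvalues $(3\pm\sqrt5)/2$, this supremum equals $\sqrt{(3+\sqrt5)/2}$. Consequently, some λ with |λ| > 1 satisfies ‖P_2(U − λI)^{-1}P_2‖ ≥ 1/ε exactly when $1/\varepsilon < \sqrt{(3+\sqrt5)/2}$, that is, when $\varepsilon > \sqrt{2/(3+\sqrt5)} = \varepsilon_0$, which explains the first case in the formula above.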
Now let b be a Laurent polynomial of the form
\[
b(t)=\sum_{k=-r}^{s} b_k t^k, \qquad r \ge 0,\quad s \ge 0,\quad b_{-r}\ne 0,\quad b_s\ne 0. \qquad(14.29)
\]
Conjecture 14.23. Let the Laurent polynomial b be of the form (14.29) and suppose that one of the operators B and C is compact. If G is any connected component of C \ sp T(b), then ‖CT^{-1}(b − λ)B‖ is either nowhere locally constant in G or identically zero in G.

The rest of this section is devoted to the proof of the following result.
Theorem 14.24. Suppose B and C are of the form B = P_{m_1}B̃ and C = C̃P_{m_2} with bounded operators B̃ and C̃. Then Conjecture 14.23 is true in each of the following cases:

(a) C \ Λ(b) is connected;
(b) m_1 = m_2 = 1;
(c) T(b) is Hessenberg, that is, r or s equals 1;
(d) r + s is a prime number and r or s equals 2;
(e) r + s ≤ 5 or r + s = 7.

Before proceeding to the proof, we cite a corollary of Theorem 14.24.

Corollary 14.25. Let b be of the form (14.29) and let G be a connected component of C \ Λ(b). The function d_{11} is always nowhere locally constant in G. If j ≥ 1 or k ≥ 1, then the function d_{jk} is either identically zero in G or nowhere locally constant in G provided one of the following conditions is satisfied:

(a) C \ Λ(b) is connected;
(b) T(b) is Hessenberg, that is, r or s equals 1;
(c) r + s is a prime number and r or s equals 2;
(d) r + s ≤ 5 or r + s = 7.

Proof. Pick λ_0 ∈ G. There is a ϱ > 0 such that λ_0 ∈ C \ sp T(b_ϱ) and d_{jk}(λ) = E_j T^{-1}(b_ϱ − λ)E_k for all λ in some open neighborhood of λ_0. It remains to use Theorem 14.24 with b replaced by b_ϱ, with m_1 = j and B̃ = E_j, and with m_2 = k and C̃ = E_k.

We emphasize that C \ Λ(b) is connected in many cases. This set is in particular connected if T(b) has at most three nonzero diagonals (see Example 11.18), or if T(b) is triangular (in which case Λ(b) is a singleton), or if T(b) is Hermitian (which implies that Λ(b) is a line segment of the real line). We do not know any b with r + s ≤ 5 for which C \ Λ(b) is disconnected. Theorem 11.20 delivers a b with r = s = 3 such that C \ Λ(b) is disconnected.

We now turn to the proof of Theorem 14.24. Let b be of the form (14.29) and put
\[
n = r + s, \qquad Q_n(z) = b_{-r} + b_{-r+1}z + \cdots + b_s z^{r+s}. \qquad(14.30)
\]
For λ ∈ C, let z_1(λ), . . . , z_n(λ) be the zeros of the polynomial Q_n(z) − λz^r,
\[
Q_n(z) - \lambda z^r = b_s\,(z - z_1(\lambda)) \cdots (z - z_n(\lambda)). \qquad(14.31)
\]
The operator T(b − λ) is invertible if and only if r of the zeros z_1(λ), . . . , z_n(λ) have modulus less than 1 and the remaining s zeros are of modulus greater than 1. We denote the former zeros by δ_1(λ), . . . , δ_r(λ) and the latter zeros by μ_1(λ), . . . , μ_s(λ). We put
\[
\mu(\lambda) = \mu_1(\lambda)\cdots\mu_s(\lambda), \qquad u_0(\lambda) = 1, \qquad v_0(\lambda) = 1,
\]
\[
u_m(\lambda) = \sum_{\substack{\alpha_1+\cdots+\alpha_s = m\\ \alpha_j \ge 0}} \mu_1(\lambda)^{-\alpha_1}\cdots\mu_s(\lambda)^{-\alpha_s} \quad (m \ge 1),
\qquad
v_m(\lambda) = \sum_{\substack{\beta_1+\cdots+\beta_r = m\\ \beta_j \ge 0}} \delta_1(\lambda)^{\beta_1}\cdots\delta_r(\lambda)^{\beta_r} \quad (m \ge 1),
\]
\[
U(\lambda) = \begin{pmatrix}
u_0(\lambda) & & & \\
u_1(\lambda) & u_0(\lambda) & & \\
u_2(\lambda) & u_1(\lambda) & u_0(\lambda) & \\
\dots & \dots & \dots & \dots
\end{pmatrix},
\qquad
V(\lambda) = \begin{pmatrix}
v_0(\lambda) & v_1(\lambda) & v_2(\lambda) & \dots\\
 & v_0(\lambda) & v_1(\lambda) & \dots\\
 & & v_0(\lambda) & \dots\\
 & & & \dots
\end{pmatrix}.
\]
Then
\[
T^{-1}(b-\lambda) = \frac{(-1)^s}{b_s}\cdot\frac{1}{\mu(\lambda)}\,U(\lambda)V(\lambda). \qquad(14.32)
\]
From (14.32) we see that each entry of T^{-1}(b − λ) is of the form
\[
[T^{-1}(b-\lambda)]_{jk} = R_{jk}\bigl(\delta_1(\lambda),\dots,\delta_r(\lambda);\,\mu_1(\lambda),\dots,\mu_s(\lambda)\bigr), \qquad(14.33)
\]
where R_{jk} is a rational function of n = r + s variables with coefficients in Z. Moreover, R_{jk} is symmetric in the first r variables and in the last s variables.

Throughout what follows we let f(λ) = ‖CT^{-1}(b − λ)B‖ and we assume that G is some bounded component of C \ sp T(b).

We consider the Riemann surface of Q_n(z) − λz^r = 0. The points λ ∈ C for which Q_n(z) − λz^r has a multiple zero are called the finite branch points. There exist at most n finite branch points. We denote them by λ_1, . . . , λ_k. The point at infinity is also a branch point. Thus, the set of all branch points is W := {λ_1, . . . , λ_k, ∞}. We join λ_1 to λ_2 by a cut S_1, λ_2 to λ_3 by a cut S_2, . . . , and λ_k to ∞ by a cut S_k. Put S = S_1 ∪ · · · ∪ S_k ∪ W. We can draw S_1, . . . , S_k so that C \ S is connected. The zeros in (14.31) can be chosen so that z_1(λ), . . . , z_n(λ) are analytic functions in C \ S. Take n copies of C \ S, labelled 1, . . . , n, and think of z_j as a map of the jth copy to C. We glue the ith and jth copies along a cut S_ℓ whenever the function z_i(λ) can be continued analytically to the function z_j(λ) across S_ℓ. The resulting set is the Riemann surface of Q_n(z) − λz^r = 0, and the jth copy is referred to as the jth branch of the surface. Each path in C \ W induces a permutation of the branches in a natural way. The set of all these permutations is a group G, the monodromy group of Q_n(z) − λz^r = 0. Let π_j
(j = 1, . . . , k) be the permutation corresponding to a small counterclockwise-oriented circle around the branch point λ_j. Clearly, G contains π_1, . . . , π_k. We put π_∞ = π_1 · · · π_k. Thus, π_∞ is the permutation of the branches resulting from a large counterclockwise-oriented circle containing all finite branch points in its interior. The group G is generated by π_1, . . . , π_k.

Let λ ∈ C \ sp T(b). We have |z_j(λ)| < 1 for exactly r values of j. We call the branches corresponding to these values of j the small branches at λ. The s branches for which |z_j(λ)| > 1 will be called the large branches at λ.

Proposition 14.26. If there is a π ∈ G that permutes the set of the small branches at the points of G to the set of the small branches at the points of the unbounded component G_∞, then f(λ) is either nowhere locally constant in G or identically zero in G.

Proof. Consider a path in C \ W that corresponds to the permutation π. From (14.32) and (14.33) we see that each entry of P_{m_2}T^{-1}(b − λ)P_{m_1} and hence also C̃P_{m_2}T^{-1}(b − λ)P_{m_1}B̃ can be analytically continued along this path. Since π permutes the small branches into themselves, the result of the analytic continuation coincides with the operator C̃P_{m_2}T^{-1}(b − λ)P_{m_1}B̃ for λ ∈ G_∞. It remains to apply Proposition 14.21.

From Chapter 11 we know that when labelling the zeros z_1(λ), . . . , z_{r+s}(λ) of (14.31) so that
\[
|z_1(\lambda)| \le |z_2(\lambda)| \le \cdots \le |z_{r+s}(\lambda)|, \qquad(14.34)
\]
then
\[
\Lambda(b) = \{\lambda \in \mathbb{C} : |z_r(\lambda)| = |z_{r+1}(\lambda)|\}. \qquad(14.35)
\]
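As an aside, the splitting (14.34)/(14.35) is easy to explore numerically. The following MATLAB sketch is our own illustration; the symbol b(t) = t + (1/9)t^{-1} (so r = s = 1 and Q_2(z) = 1/9 + z^2) is an arbitrary choice, and for this symbol Λ(b) is the segment [−2/3, 2/3]. For other symbols, r and the coefficient vector must be adapted accordingly.

    % Colour the lambda-plane by the gap |z_{r+1}(lambda)| - |z_r(lambda)|;
    % Lambda(b) is the set where this gap vanishes, cf. (14.35).
    r = 1;
    Q = [1, 0, 1/9];                     % coefficients of Q_2(z) = z^2 + 0*z + 1/9
    [x, y] = meshgrid(linspace(-1, 1, 301));
    gap = zeros(size(x));
    for p = 1:numel(x)
        lam = x(p) + 1i*y(p);
        P = Q;  P(end-r) = P(end-r) - lam;   % subtract lambda * z^r
        z = sort(abs(roots(P)));             % moduli in increasing order, (14.34)
        gap(p) = z(r+1) - z(r);
    end
    contour(x, y, gap, [1e-3 1e-3])          % a thin neighbourhood of Lambda(b)
    axis equal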
Proof of Theorem 14.24(a). Pick a point λ_0 ∈ G. Since C \ Λ(b) is connected, there is a path in C \ (W ∪ Λ(b)) joining λ_0 to infinity. By (14.35), we have |z_r(λ)| < |z_{r+1}(λ)| throughout this path. This means that when moving along this path we eventually stay in the r small branches at infinity. In other words, there is a π ∈ G permuting the set of the small branches at λ_0 to the set of the small branches at infinity. The assertion is therefore immediate from Proposition 14.26.

The group G is said to be r-transitive if for every two r-tuples (i_1, . . . , i_r) and (j_1, . . . , j_r) of r distinct branches there is a π ∈ G such that π(i_1) = j_1, . . . , π(i_r) = j_r. We call the group G weakly r-transitive if for every two sets {i_1, . . . , i_r} and {j_1, . . . , j_r} of r distinct branches there exists a π ∈ G such that {π(i_1), . . . , π(i_r)} = {j_1, . . . , j_r}, i.e., such that π(i_1), . . . , π(i_r) coincide with j_1, . . . , j_r up to the arrangement. Clearly, weak 1-transitivity and 1-transitivity are equivalent.

Proposition 14.27. If the monodromy group of Q_n(z) − λz^r = 0 is weakly r-transitive, then f(λ) is either nowhere locally constant in G or identically zero in G.
Proof. Weak r-transitivity means that we can permute any prescribed set of r branches into any prescribed set of r branches. We can in particular permute the r small branches at the points of G into the r small branches at the points of G_∞. The assertion is therefore a direct consequence of Proposition 14.26.

Proposition 14.28. The monodromy group of the polynomial Q_n(z) − λz^r = 0 is always 1-transitive.

Proof. It is well known (see, e.g., [175, Section 4.14]) that G is 1-transitive if and only if Q_n(z) − λz^r is irreducible in C[z, λ]. But the irreducibility of Q_n(z) − λz^r in C[z, λ] can be readily verified.

Proof of Theorem 14.24(b). Let λ be a point in C \ sp T(b). We label the zeros z_1(λ), . . . , z_{r+s}(λ) so that (14.34) holds. Thus, the small branches at λ are 1, . . . , r and the large branches at λ are r + 1, . . . , r + s. By formula (14.32), the function g(λ) := [T^{-1}(b − λ)]_{11} equals
\[
(-1)^s b_s^{-1}\mu_1(\lambda)^{-1}\cdots\mu_s(\lambda)^{-1} = (-1)^s b_s^{-1} z_{r+1}(\lambda)^{-1}\cdots z_{r+s}(\lambda)^{-1}.
\]
This shows that g(λ) ≠ 0. Since the group G is 1-transitive (Proposition 14.28), there is a path in C \ W starting and terminating at λ such that the s large branches at λ are permuted into s branches i_1, . . . , i_s containing at least one small branch. Consequently, after analytic continuation along this curve, g(λ) becomes $(-1)^s b_s^{-1} z_{i_1}(\lambda)^{-1}\cdots z_{i_s}(\lambda)^{-1}$, and because, obviously,
\[
|z_{i_1}(\lambda)\cdots z_{i_s}(\lambda)| < |z_{r+1}(\lambda)\cdots z_{r+s}(\lambda)|,
\]
it follows that |g(λ)| cannot be locally constant. Since CT^{-1}(b − λ)B = C̃P_1\,g(λ)P_1\,B̃ = g(λ)\,C̃P_1B̃, we arrive at the conclusion that f(λ) = |g(λ)|\,‖C̃P_1B̃‖ is either identically zero or nowhere locally constant in G.

Proof of Theorem 14.24(c). Without loss of generality assume r = 1; the case s = 1 can be reduced to the case r = 1 by passage to adjoint operators. Combining Propositions 14.27 and 14.28 we arrive at the assertion.

We now turn to the case n = r + s = 4. By what was already proved, we are left with the constellation r = s = 2. Everything would be fine if the monodromy group of Q_4(z) − λz^2 = 0 were always weakly 2-transitive (Proposition 14.27). Unfortunately, this need not be the case. Indeed, let Q_4(z) be the polynomial μz^2 + (z − α)^2(z − β)^2, where α, β, μ ∈ C, α ≠ β, and αβ ≠ 0. We have exactly two finite branch points, λ_1 (= μ) and λ_2, and the monodromy group is generated by π_1 = (12)(43) and π_2 = (13)(24). Here and in what follows we identify the branches with the numbers 1, 2, . . . . Clearly, we cannot permute {1, 2} into {1, 3}. Fortunately, in this case we can have recourse to Theorem 11.20, which shows that C \ Λ(b) is connected.

Let us return for a moment to the case of general n and r, s. The permutation π associated with a branch point λ can be written as a product of cycles. We say that λ is of the type (L_1, L_2, . . .) if the cycle lengths of π are L_1, L_2, . . . . In the previous paragraph, we encountered two finite branch points of the type (2, 2).

Proposition 14.29. If all finite branch points of the Riemann surface of the polynomial
Q_n(z) − λz^2 = 0 are of the type (2, 1, . . . , 1), then the monodromy group of Q_n(z) − λz^2 = 0 is weakly 2-transitive.

Proof. It suffices to show that we can permute {1, 2} to {1, m} for arbitrary m ≠ 1. Let λ_1, . . . , λ_k be the finite branch points. Without loss of generality suppose that the permutations (1 2), (1 3), . . . , (1 p) are among π_1, . . . , π_k and that the permutations (1 p+1), . . . , (1 n) are not among π_1, . . . , π_k. It is easily seen that we can permute {1, 2} to {1, m} for every m ∈ {2, . . . , p}. Thus, let m ≥ p + 1. Since G is 1-transitive (Proposition 14.28), there is a path on the Riemann surface joining the branch 1 to the branch m. This path goes through branches x_1, x_2, . . . , and we may assume that x_j ≠ 1 and x_j ≠ m for all j and that x_1 ∈ {2, . . . , p}. As each branch point is of the type (2, 1, . . . , 1), it follows that the path goes from the branch x_1 to the branch 1 and stays there. Thus, we can permute {1, x_1} to {1, m}.

Proposition 14.30. If r = s = 2, then f(λ) is either nowhere locally constant in G or identically zero in G.

Proof. If there is a λ such that Q_4(z) − λz^2 has two distinct zeros of multiplicity 2 or one zero of multiplicity 4, then the polynomial Q_4(z) is of the form λz^2 + (z − α)^2(z − β)^2, and the assertion follows from Theorem 11.20 and Theorem 14.24(a). Thus, we may restrict ourselves to the case where all of the (at most four) finite branch points are of the types (3, 1) or (2, 1, 1). It is easily seen that the branch point at infinity is of the type (2, 2). Our goal is to show that G is weakly 2-transitive, so that the present theorem follows from Proposition 14.27.

If all finite branch points have the type (2, 1, 1), then Proposition 14.29 implies weak 2-transitivity. Thus, assume there is at least one finite branch point, λ_1, of the type (3, 1) and that π_1 = (123). Since G is 1-transitive, there must exist at least one more finite branch point λ_2. We first consider the case where all finite branch points different from λ_1 are of the type (2, 1, 1). As G is 1-transitive, branch 4 must be in the cycle of length 2 of one of these branch points, say λ_2. By symmetry, we may assume that the permutation π_2 is (14). But a group of permutations of 1, 2, 3, 4 containing π_1 = (123) and π_2 = (14) is easily seen to be weakly 2-transitive. We are left with the case where λ_2 is of the type (3, 1). If branch 4 is in the cycle of length 3 of π_2, then G is readily checked to be weakly 2-transitive. So assume branch 4 is left fixed by π_2. Then there must exist a third finite branch point λ_3 of the type (2, 1, 1) such that branch 4 is contained in a cycle of length 2. By direct inspection of the few possible cases one sees that the group G is always weakly 2-transitive.

Lemma 14.31. The order of the monodromy group G of Q_n(z) − λz^r = 0 is divisible by n.

Proof. For 1 ≤ j ≤ n, let G_j = {π ∈ G : π(1) = j}. Clearly, G = G_1 ∪ · · · ∪ G_n and G_i ∩ G_j = ∅ for i ≠ j. Given two distinct numbers i, j in {1, . . . , n}, we can find a σ ∈ G such that σ(i) = j (Proposition 14.28). The map G_i → G_j, π ↦ σπ is obviously bijective.
This implies that all the sets G_j have the same number of elements, say ℓ. Consequently, the order of G is nℓ.

Lemma 14.32. If n is a prime number, then the monodromy group G of the polynomial Q_n(z) − λz^r = 0 contains an n-cycle, that is, after appropriately labelling the branches, (1 2 . . . n) ∈ G.

Proof. This follows from Lemma 14.31 and a well-known theorem by Cauchy, which says that if n is a prime and the order of a finite group is divisible by n, then the group contains an element of order n.

Lemma 14.33 (well known). If a subgroup G of the full symmetric group S_n contains an n-cycle and a transposition, then G = S_n.

Proof of Theorem 14.24(d). If r = 2 or s = 2, then the branch point at infinity of the Riemann surface of Q_n(z) − λz^r = 0 is of the type (n − 2, 2). Hence π_∞^{n−2} is a transposition (notice that n − 2 is odd). Combining Proposition 14.27 with Lemmas 14.32 and 14.33, we arrive at the assertion.

Proof of Theorem 14.24(e). Without loss of generality, assume r ≤ s. Parts (a) and (c) of Theorem 14.24 imply the desired result for r = 0 and r = 1. Proposition 14.30 disposes of the case r = s = 2, and Theorem 14.24(d) gives the assertion for r = 2, s = 3 and r = 2, s = 5. We are left with the case where r = 3 and s = 4. In this case we deduce from Lemma 14.32 that G contains a 7-cycle, say π = (1 2 . . . 7). The branch point at infinity provides us with a permutation of the type σ = (1 x y z)(a b c). By checking all possible cases, one sees that the group generated by π and σ is always weakly 3-transitive. It remains to make use of Proposition 14.27.

Remark. The arguments used above are standard in Galois theory, and in fact the monodromy group G of Q_n(z) − λz^r = 0 is known to be isomorphic to the Galois group G_0 of Q_n(z) − λz^r = 0. Let R = C(λ) be the field of rational functions (of λ) with complex coefficients. We think of Q_n(z) − λz^r as an element of R[z]. The splitting field of this polynomial is R(z_1(λ), . . . , z_n(λ)), where z_1(λ), . . . , z_n(λ) are given by (14.31). The Galois group G_0 may be identified with the group of all permutations of z_1(λ), . . . , z_n(λ) that can be extended to an automorphism of the field R(z_1(λ), . . . , z_n(λ)) that leaves the elements of R fixed. When working with the monodromy group, we transform valid equalities into new equalities by analytic continuation, while from the algebraic point of view, valid equalities are transformed into new equalities by the action of the Galois group. The polynomial Q_n(z) − λz^r can be shown to be the minimal polynomial of each of its zeros z_1(λ), . . . , z_n(λ). This implies that the dimension of R(z_1(λ)) over R is n. Lemma 14.31 thus amounts to saying that this dimension divides the order of G_0, and this is one of the conclusions of the fundamental theorem of Galois theory.
Exercises

1. Let U_n(λ) denote the Chebyshev polynomial of the second kind introduced in Section 1.9.

(a) Show that the eigenvalues of T_n(σ) + vE_{nn} are the zeros of the polynomial U_n(λ/2) − vU_{n−1}(λ/2).

(b) Let λ_j be an eigenvalue of T_n(σ) + vE_{nn} and let θ_j ∈ C be any number such that cos θ_j = λ_j/2. Prove that
\[
x_j = \left(\frac{2\sin(k\theta_j)}{\sqrt{2n+1-U_{2n}(\lambda_j/2)}}\right)_{k=1}^{n}
\]
is an eigenvector for λ_j and that ‖x_j‖_2 = 1.

(c) Use (b) to show that the eigenvector x_j is extended for |λ_j| ≤ 2 and localized for |λ_j| > 2.

2. Let A_n = T_n(σ) + vI_{n,n}, where v is uniformly distributed in [−M, M] and I_{n,n} is the n × n identity matrix. The density of the eigenvalues of A_n is defined as
\[
\varrho_n(\lambda) = E\left(\frac{1}{n}\sum_{j=1}^{n}\delta(\lambda - \lambda_j(A_n))\right).
\]
By Theorem 2.4,
\[
\varrho_n(\lambda) = \frac{1}{2Mn}\int_{-M}^{M}\sum_{j=1}^{n}\delta\left(\lambda - v - 2\cos\frac{\pi j}{n+1}\right)dv.
\]
Clearly, ϱ_n(λ) = 0 for λ ∈ C \ R. Prove that for λ ∈ R,
\[
\lim_{n\to\infty}\varrho_n(\lambda) = \frac{1}{2M\pi}\int_{\lambda-M}^{\lambda+M}\frac{\chi_{[-2,2]}(x)}{\sqrt{4-x^2}}\,dx,
\]
where χ_{[−2,2]}(x) is 1 for −2 ≤ x ≤ 2 and zero otherwise.

3. (a) Let a ∈ W and λ ∈ C \ sp T(a). Show that [T^{-1}(a − λ)]_{1,1} = 1/G(a − λ), where G(b) is defined by (2.25), and deduce that
\[
\operatorname{sp}^{(1,1)}_{\varepsilon D}T(a) = \operatorname{sp}T(a) \cup \{\lambda \notin \operatorname{sp}T(a) : G(|a - \lambda|) \le \varepsilon\}.
\]
(b) Show that there is a dense subset M of W such that if a ∈ M, then $\operatorname{sp}^{(1,1)}_{\varepsilon D}T(a)$ is strictly larger than sp T(a) for every ε > 0.
4. For a ∈ W , we denote by L(a) the Laurent matrix (aj −k )∞ j,k=−∞ (recall the notes to Chapter 1). (a) Show that L(a) generates a bounded operator on 2 (Z). (b) Show that sp L(a) = a(T). L(a) = sp L(a) for all sufficiently small ε > 0 if and only if (c) Prove that spm εD sup
max
|k|≤m−1 λ∈a(T) /
2π
0
eikθ < ∞. dθ a(eiθ ) − λ
L(a) is strictly larger than (d) Deduce that if a ∈ P is real valued on T, then spm εD sp L(a) for every ε > 0. 5. Show that the matrix ⎛
v1 ⎜ e−g1 ⎜ ⎜ ⎜ ⎝ e gn
e g1 v2 ...
e−gn e g2 ...
e−gn−2
... vn−1 e−gn−1
egn−1 vn
⎞ ⎟ ⎟ ⎟ ⎟ ⎠
is similar to the matrix ⎛
v1 ⎜ e−g ⎜ ⎜ ⎜ ⎝ eg
eg v2 ...
e−g eg ... e−g
... vn−1 e−g
eg vn
⎞ ⎟ ⎟ ⎟, ⎟ ⎠
where g = (g1 + · · · + gn )/n. 6. Show that the matrix ⎛
v1 ⎜ 1 ⎜ ⎜ ⎜ ⎝
is similar to the matrix ⎛
α2 v2 ...
v1 /α ⎜ 1 ⎜ α⎜ ⎜ ⎝
⎞ α2 ... ... 1 vn−1 1
1 v2 /α ...
⎟ ⎟ ⎟ ⎟ 2 ⎠ α vn
(α = 0)
⎞ 1 ... ... 1 vn−1 /α 1
1 vn /α
⎟ ⎟ ⎟. ⎟ ⎠
i
i i
i
i
i
i
374
buch7 2005/10/5 page 374 i
Chapter 14. Impurities
7. Show that the matrix ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
v1
1 v2
⎞ 1 .. .
..
. vn−1
⎟ ⎟ ⎟ ⎟ ⎟ 1 ⎠ vn
is similar to the matrix ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
v1
⎞
β v2
β .. .
β −n
..
. vn−1
⎟ ⎟ ⎟ ⎟. ⎟ β ⎠ vn
8. Show that ⎛ ⎝
a
−1 b
⎞−1 −1 ⎠ c
⎛ =⎝
a −1
a −1 b−1 b−1
⎞ a −1 b−1 c−1 b−1 c−1 ⎠ c−1
and generalize the result from 3 × 3 to n × n matrices. 9. Let A be an n × n matrix and let u and v be column vectors of length n. Show that det (A + uv ) = det A + v adj (A) u, where adj (A) is the adjugate matrix of A.
Notes The study of discrete Hamiltonians, either with deterministic or with random potential Vn , is a big business and goes beyond the scope of the present book. We therefore leave things with a few remarks. Anderson [4] considered the selfadjoint matrix Tn (σ ) with random perturbations on the main diagonal and discovered numerically that the eigenvectors for eigenvalues outside the original spectrum in [−2, 2] are localized. Figure 14.15 illustrates the phenomenon. The top picture shows the 50 eigenvalues of the matrix T50 (σ ) + diag (vj )50 j =1 for a single realization of 50 independent vj ’s drawn from the uniform distribution on [−1, 1]. The (extended) eigenvector for the encircled eigenvalue in the middle is shown in the bottom left picture, and the (localized) eigenvector for the encircled boundary eigenvalue is seen in the bottom right picture. The discrete Laplacian Tn (σ ) is selfadjoint, as it should be in quantum mechanics. The interest in randomly perturbed general tridiagonal Toeplitz matrices Tn (t + α 2 t −1 ) comes from so-called non-Hermitian quantum mechanics and was pioneered by Hatano and Nelson
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 375 i
375 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1
−2
−1
0
1
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0
−0.1
−0.1
−0.2
−0.2
−0.3
−0.3
−0.4
−0.4
−0.5
−0.5 0
20
40
0
2
20
40
Figure 14.15. The Anderson model.
[152]. Their discovery is illustrated in Figure 14.16. To make the effect more visible, one replaces Tn (t + α 2 t −1 ) by Cn (t + α 2 t −1 ). In the top picture we plotted the 50 eigenvalues of Cn (t + 0.49t −1 ) + diag (vj )50 j =1 for a single realization of 50 independent diagonal entries vj from the uniform distribution on [−1, 1]. We see a bubble and two wings. The absolute value of the (extended) eigenvector for the encircled eigenvalue on the bubble is shown in the bottom left picture; the absolute value of the (localized) eigenvector for the encircled eigenvector on the wing is in the bottom right picture. Profound theoretical investigations about this topic are due to Brézin and Zee [75], Brouwer, Silvestrov, and Beenakker [76], Davies [95], Feinberg and Zee [117], [118], Goldsheid and Khoruzhenko [135], [136], and Janik et al. [173], to cite only a few figures. Applications to population dynamics appear in [91], [191], and applications to small world networks can be found in [189], [259], for instance. An introduction to the ideas of Feinberg and Zee follows below. In this chapter we confine ourselves to showing how structured pseudospectra can be employed in order to get interesting information about the possible spectra of banded (and not necessarily Hermitian) Toeplitz matrices with impurities. The idea of formulating
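A Hatano-Nelson realization like the one in Figure 14.16 can be generated along the same lines. Here is a minimal sketch (ours, with our own variable names), using the circulant C_n(t + 0.49 t^{-1}) as in the figure; the bubble and the two wings should be visible in the plotted eigenvalues.

    % One realization of the Hatano-Nelson model C_50(t + 0.49 t^{-1}) + diag(v_j).
    n = 50;
    C = diag(ones(n-1,1),-1) + 0.49*diag(ones(n-1,1),1);
    C(1,n) = 1;  C(n,1) = 0.49;       % wrap-around entries of the circulant
    v = -1 + 2*rand(n,1);             % v_j uniform on [-1,1]
    lam = eig(C + diag(v));
    plot(real(lam), imag(lam), 'o'), axis equal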
i
i i
i
i
i
i
376
buch7 2005/10/5 page 376 i
Chapter 14. Impurities 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1
−2
−1
0
1
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0
−0.1
0
20
40
−0.1
0
2
20
40
Figure 14.16. The Hatano-Nelson model.
problems on random Toeplitz matrices in the language of pseudospectral analysis goes back to Trefethen, Contedini, and Embree [274]. For the reader’s convenience, we cite their main results below. The approach of [274] depends heavily on the circumstance that a simple explicit inverse for bidiagonal matrices is available (see Exercise 8). For matrices with more than two diagonals, things become significantly more intricate. Papers [41] to [44] are devoted to this more general situation and exhibit several phenomena that can be explained with the help and in the language of structured pseudospectra. The purpose of Section 14.1 is to illustrate a few phenomena by a transparent example. Paper [189] is closely related to this section. The results of Section 14.2 were established in [43], and Section 14.3 is based on [42]. In Section 14.4 we follow [44]. All of Section 14.5 is from [50]. Exercise 3 is a result of [122]. Exercises 3 and 4 are from [42]. These two exercises reveal that Theorem 14.8 is not a consequence of some sort of general perturbation theory. Exercise 5 is taken from [91]. Exercise 7 is probably well known. We found it in [274]. Note that if |β| is large, then the matrix with the β’s differs from a purely bidiagonal matrix in an exponentially small term only—the spectral properties of Cn (χ ) + random diagonal
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 377 i
377
and Tn (χ1 ) + random diagonal are nevertheless significantly different. Exercise 8 is the basis of the approach of [274], because it delivers explicit expressions for the entries of the inverse of the matrix λI − (Tn (χ1 ) + diag (vj )). Exercise 9 is explicit in [233]. Further results: Arveson and C ∗ -algebras II. The following is intended as another illustration of the usefulness of C ∗ -algebras in numerical analysis on the one hand and as another instance of the emergence of Toeplitz plus diagonal matrices on the other. The results and ideas cited here are due to Arveson [7], [8], [9] (see also [21], [22], [149]). The general problem is to find the spectrum of a selfadjoint operator A in B(2 (Z)) by spectral approximation, say by considering the 2n × 2n matrices An = Pn APn |Im Pn , where now (in contrast to the rest of the book) Pn denotes the projection on 2 (Z) defined by Pn : {ξk }∞ k=−∞ → {. . . , 0, ξ−n , . . . , ξ−1 , ξ0 , . . . , ξn−1 , 0, . . . }. Put Qn = I − Pn . The Følner algebra F({Pn }) associated with the sequence {Pn }∞ n=1 is the C ∗ -subalgebra of B(2 (Z)) that is constituted by all operators A for which tr Pn A∗ Qn APn = 0, n→∞ n lim
tr Pn AQn A∗ Pn = 0. n→∞ n lim
It is easily seen that F({Pn }) contains all bounded Laurent operators and all bounded diagonal operators. A tracial state of a unital C ∗ -algebra A is a bounded linear functional τ : A → C satisfying τ (a ∗ a) ≥ 0 and τ (ab) = τ (ba) for all a, b ∈ A and τ (e) = 1 for the unit e of A. A unital C ∗ -algebra is said to have a unique tracial state if the set of its tracial states is a singleton. Let A have a unique tracial state τ . Then for each selfadjoint a ∈ A the map C0 (R) → C, ϕ → τ (ϕ(a)) is a positive linear functional. (Here C0 (R) stands for the compactly supported continuous functions on R.) By the Riesz-Markov theorem, there is a probability measure μa on R such that ∞ ϕ(x)dμa (x) for all ϕ ∈ C0 (R). τ (ϕ(a)) = −∞
This measure μa is called the spectral distribution of a. Arveson’s theorem. Let A be a unital C ∗ -subalgebra of the Følner algebra F({Pn }) and suppose A has a unique tracial state τ . Let A ∈ A be a selfadjoint operator, let λj (An ) (j = 1, . . . , 2n) be the eigenvalues of An , and let μA be the spectral distribution of A. Then 1 lim ϕ(λj (An )) = n→∞ 2n j =1 2n
∞
−∞
ϕ(x)f μA (x) for every ϕ ∈ C0 (R).
(14.36)
An irrational rotation C ∗ -algebra is a C ∗ -algebra that is generated by two unitary elements u and v which satisfy uv = e2π iθ vu with some irrational number θ ∈ R. It
i
i i
i
i
i
i
378
buch7 2005/10/5 page 378 i
Chapter 14. Impurities
turns out that all such algebras with the same θ are isomorphic. To be more precise, if A(j ) (j = 1, 2) are C ∗ -algebras that are generated by unitary elements uj , vj satisfying uj vj = e2π iθ vj uj , then the map u1 → u2 , v1 → v2 extends to a C ∗ -algebra isomorphism of A(1) onto A(2) . A simple example of an irrational rotation C ∗ -algebra is the C ∗ -subalgebra of B(2 (Z)) that is generated by the two unitary operators (U ξ )n = ξn−1 ,
(V ξ )n = e−2π inθ ξn .
(14.37)
We denote this C ∗ -algebra by Aθ . Since, obviously, U and V belong to the Følner algebra F({Pn }), the entire C ∗ -algebra Aθ is contained in F({Pn }). One can also show that Aθ has a unique tracial state. Let us now turn to quantum mechanics. The one-dimensional Hamiltonian 1 (Hf )(x) = − f (x) + ψ(x)f (x) 2 is discretized by Hδ =
1 2 P + ψ(Qδ ), 2 δ
where δ, the numerical step size, is a small positive rational number and Pδ , Qδ (which should not be confused with the projections Pn , Qn introduced above) are the selfadjoint operators defined on L2 (R) by f (x + δ) − f (x − δ) , 2iδ
(Pδ f )(x) =
(Qδ f )(x) =
sin(δx) f (x). δ
With the unitary operators U and V given on L2 (R) by (Uδ f )(x) = f (x + 2δ), we have Hδ = −
1 8δ 2
*
Uδ + Uδ∗ − 8δ 2 ψ 2
(Vδ f )(x) = eiδx f (x), +
1 (Vδ − Vδ∗ ) 2iδ
+
1 I. 4δ 2
(14.38)
2
It is easily seen that Uδ Vδ = e2iδ Vδ Uδ = e2π i(δ /π ) Vδ Uδ . By what was said above, the C ∗ -algebra generated by Uδ and Vδ is isomorphic to Aδ2 /π (recall that δ is rational and that hence δ 2 /π is irrational). We may therefore identify the operator in the brackets of (14.38) with the operator 1 A = U + U ∗ − 8δ 2 ψ (V − V ∗ ) ∈ Aθ ⊂ B(2 (Z)), (14.39) 2iδ where U and V are given by (14.37) with θ = δ 2 /π . Clearly, we can write (14.39) in the form 2 A = L(χ−1 + χ1 ) + diag (vj )∞ j =−∞ ∈ Aθ ⊂ B( (Z))
(14.40)
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 379 i
379
with vj = −8δ 2 ψ((−1/δ) sin(2j δ 2 )). Application of Arveson’s theorem now yields that (14.36) is true with ⎞ ⎛ 1 v−n ⎟ ⎜ 1 v−n+1 1 ⎟ ⎜ ⎟ ⎜ ... ... ... An = ⎜ ⎟ ⎝ 1 vn−2 1 ⎠ 1 vn−1 = T2n (χ−1 + χ1 ) + diag (vj )n−1 j =−n and thus sets at least a theoretical foundation for the spectral approximation of A and hence Hδ . Further results: Feinberg and Zee. The following material is taken from [117] and [118]. Let An be a random n × n matrix. The eigenvalue density of An is ⎞ ⎛ n 1 δ(λ − λj (An ))⎠ , (λ) = E ⎝ n j =1 where E denotes the expected value (= mean) and δ is the Dirac delta function. The key formulas for computing (λ) are 1 1 ∗ −1 (14.41) tr [(λI − An I ) ] , (λ) = ∂ E π n 1 1 = ∂ ∂ ∗E log det (λI − An )(λI − A∗n ) , (14.42) π n where ∂∗ =
∂
=
1 2
∂ ∂ +i ∂x ∂y
,
∂=
∂ 1 = ∂λ 2
∂ ∂ −i ∂x ∂y
. ∂λ The function in the parentheses on the right of (14.41) is called Green’s function and denoted by G(λ). Thus, 1 1 ∗ −1 tr [(λI − An I ) ] . (λ) = ∂ G(λ), G(λ) = E π n Let now An = Tn (b)+vE11 where b ∈ P is fixed and v is drawn from some probability distribution. If |λ| is large, we have (λI − An )−1 = (Cn (λ − b) − vE11 )−1 = (I − Cn−1 (λ − b)vE11 )−1 Cn−1 (λ − b) = Cn−1 (λ − b) +
∞
v k ([Cn−1 (λ − b)]11 )k−1 Cn−1 (λ − b)E11 Cn−1 (λ − b)
k=1
and hence 1 tr [(λI − An )−1 ] n ∞ 1 = tr Cn−1 (λ − b) + v k ([Cn−1 (λ − b)]11 )k−1 tr Cn−1 (λ − b)E11 Cn−1 (λ − b). n k=1
i
i i
i
i
i
i
380
buch7 2005/10/5 page 380 i
Chapter 14. Impurities
We know from Proposition 2.1 that Cn−1 (λ − b) = Un∗ diag
1 λ−
j b(ωn )
Un ,
Cn−2 (λ − b) = Un∗ diag
1 j
(λ − b(ωn ))2
Un ,
where ωn = e2π i/n . Put 1 1 1 . tr Cn−1 (λ − b) = n n j =0 λ − b(ωnj ) n−1
G0 (λ) = Then
1 1 ∂G0 (λ) = =− n j =0 (λ − b(ωnj ))2 ∂λ n−1
tr E11 Cn−2 (λ
− b)E11 =
[Cn−2 (λ
− b)]11
and consequently, G(λ) = G0 (λ) −
1 ∂G0 (λ) E n ∂λ
v 1 − G0 (λ)v
(14.43)
.
As both sides of (14.43) are analytic functions of λ, equality (14.43) is actually true for all λ outside sp Cn (b) = {b(1), b(ωn ), . . . , b(ωnn−1 )}. Now take b(t) = χ1 (t) := t. For large n, 2π n−1 dθ 1 1 1 0 G0 (λ) = ≈ = 1/λ n j =0 λ − ωnj 2π 0 λ − eiθ
for for
|λ| < 1, |λ| > 1,
and thus, by (14.43), G(λ) ≈
0 1 λ
+
1 nλ
E
v λ−v
for |λ| < 1, for |λ| > 1.
With the Heaviside function H (x) = 0 for x < 0 and H (x) = 1 for x > 1, we can write * + 1 1 v G(λ) ≈ H (λλ − 1) + E . (14.44) λ nλ λ−v As a first concrete example, assume v takes the two values r and −r with equal probability, P (v = r) = P (v = −r) =
1 . 2
(14.45)
Then (14.44) becomes *
+ 1 1 r r G(λ) ≈ H (λλ − 1) + − λ 2nλ λ − r λ+r + * 1 1 1 1 1 = H (λλ − 1) 1 − + + . n λ 2n λ − r λ+r
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 381 i
381
Taking into account that H (x) = δ(x) and ∂ ∗ (1/λ) = π δ(λ) we get π (λ) = ∂ ∗ G(λ)
+ * λ r r 1 ≈ δ(λλ − 1) 1 − + + n 2n λ − r λ+r * + 1 1 1 + H (λλ − 1) 1 − π δ(λ) + π δ(λ − r) + π δ(λ + r) . n 2n 2n
We have δ(λ) = 0 for |λ| > 1, and the term λ r r δ(λλ − 1) + 2n λ − r λ+r can be neglected for large n. In summary * + 1 1 1 1 (λ) ≈ 1− δ(|λ|2 − 1) + H (|λ|2 − 1) δ(λ − r) + δ(λ + r) . π n 2n 2n Thus, in the n → ∞ limit, the situation is as follows. If r ∈ (0, 1), then the eigenvalues are distributed on T, and if r ∈ (1, ∞), then n − 2 eigenvalues are distributed on T, while two eigenvalues are located at −r and r with the density 1/(2n). Notice that the coefficient 1/π before δ(|λ|2 − 1) is correct, since if is any region containing T, then 2π 1+ε 1 1 2 δ(|λ| − 1) dA(λ) = δ(r 2 − 1) rdrdθ π π 0 1−ε 1+ε (1+ε)2 1 =2 δ(r 2 − 1) rdr = 2 δ(s − 1) ds = 1. 2 1−ε (1−ε)2 We remark that the special distribution (14.45) can also be treated in a more elementary manner. Namely, the eigenvalues of Cn (χ1 ) + vE11 are the roots of the polynomial λn − vλn−1 − 1. If |v| > 1 then, in the n → ∞ limit, one of these roots is at v with the density 1/n and the remaining roots are distributed on T. If |v| ≤ 1 then, in the n → ∞ limit, all zeros are distributed on T. We now take v from the uniform distribution on [−M, M]. Proceeding as above, one can show that then * + 1 1 1 λ−M G(λ) ≈ H (λλ − 1) 1 − − log n λ 2nM λ+M and (λ) ≈
1 π
1−
1 H (|λ|2 − 1) δ(|λ|2 − 1) + δ(Im λ)H (M − |Re λ|). n 2nM
We see in particular that if M > 1, then asymptotically the eigenvalues lie on T and two wings [−M, −1] and [1, M]. Figure 14.17 shows the results of two numerical realizations. If x = (x1 , . . . , xn ) is an eigenvector for Cn (χ1 ) + vE11 , then xj +1 = λxj for j = 2, . . . , n − 1. Thus, if |λ| = 1 then x is extended, whereas for λ = ±r with r > 1 the vector x is localized at the beginning.
i
i i
i
i
i
i
382
buch7 2005/10/5 page 382 i
Chapter 14. Impurities 2
2
1.5
1.5
1
1
0.5
0.5
0
0
−0.5
−0.5
−1
−1
−1.5
−1.5
−2
−2 −2
−1
0
1
2
−2
−1
0
1
2
Figure 14.17. The superposition of the eigenvalues of C100 (χ1 ) + vE11 for 50 random choices of v. In the left picture, v takes the values −2 and 2 with equal probability. In the right picture, v is drawn from the uniform distribution on the segment [−2, 2]. Feinberg and Zee also consider An = Cn (χ1 ) + diag (v1 , . . . , vn ), where the vj are independent and P (vj = r) = P (vj = −r) = 1/2 for all j . In this case one can employ (14.42). In the case at hand, n det An = (λ − vj ) − 1 k=1
and hence E[log det (λI − An )] =
n 1 n log (λ − r)j (λ + r)n−j − 1 . n 2 j =0 j
In the limit n → ∞, the binomial coefficients are sharply peaked around j ≈ n/2, and thus E[log det (λI − An )] ≈ log (λ − r)n/2 (λ + r)n/2 − 1 . (14.46) A similar approximation is true for E[log det (λI − A∗n )]. By (14.42), the support of the density (λ) is the singularities of the right-hand side of (14.46). These singularities occur at (λ2 − r 2 )n/2 = 1, that is, at # λj = ± r 2 + e4π ij/n (j = 0, 1, . . . , n/2). (14.47) It follows that the original circle spectrum is distorted by the specific randomness considered into the curve λ2 = r 2 + eiθ (0 ≤ θ < 2π ). Figure 14.18 shows some examples. Further results: Trefethen, Contedini, and Embree. All of the following is from [274]. This paper is devoted to matrices of the form Tn (χ1 ) + random diagonal and to Cn (χ1 ) + random diagonal, and it exhibits a Hatano-Nelson bubble and an associated localizationdelocalization phenomenon in the context of pseudospectra and resolvents instead of eigenvalues and eigenvectors.
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 383 i
383
1.5
1.5 r = 0.5
r = 0.9
1
1
0.5
0.5
0
0
−0.5
−0.5
−1
−1
−1.5
−1.5 −1.5
−1
−0.5
0
0.5
1
1.5
1.5
−1.5
−1
−0.5
0
0.5
1
1.5
1
1.5
1.5 r = 1.0
r = 1.1
1
1
0.5
0.5
0
0
−0.5
−0.5
−1
−1
−1.5
−1.5 −1.5
−1
−0.5
0
0.5
1
1.5
−1.5
−1
−0.5
0
0.5
Figure 14.18. Each picture shows the 100 eigenvalues (marked by +) of the matrix C100 (χ−1 + χ1 ) + diag (vj ) with independent vj taking the values −r and r with equal probability. The values of r are 0.5, 0.9, 1.0, 1.1. The dots are the points (14.47) for n = 100.
Let V be a random variable taking values dense in a compact subset supp V of the complex plane. For λ ∈ C, we define dmin (λ) = min |λ − v|, v∈supp V
dmax (λ) = max |λ − v|, v∈supp V
dmean (λ) = exp E(log |λ − V |). We then have 0 ≤ dmin (λ) ≤ dmean (λ) ≤ dmax (λ) < ∞ for every λ ∈ C. Let
i
i i
i
i
i
i
384
buch7 2005/10/5 page 384 i
Chapter 14. Impurities I = {λ ∈ C : dmax (λ) < 1}, II = {λ ∈ C : dmean (λ) < 1 ≤ dmax (λ)}, III = {λ ∈ C : dmin (λ) ≤ 1 ≤ dmean (λ)}, IV = {λ ∈ C : 1 < dmin (λ)}.
Obviously, these four sets are disjoint, and their union is C. Here are two concrete examples. Suppose first that supp V = {−1, 1} and that P (V = −1) = P (V = 1) = 1/2. Then I is empty, II is the closed set bounded by the lemniscate |λ2 − 1|2 = 1, III is the two closed disks {−1, 1} + D minus II , and IV is C \ III (see the left picture of Figure 14.19). If V is uniformly distributed on [−2, 2], then I is again empty, II is the bubble bounded by two symmetric arcs in the upper and lower half-planes meeting at λ0 ∈ (1, 2) and −λ0 ∈ (−2, −1), III is [−2, 2] + D minus III , and IV is the complement of III (see the right picture of Figure 14.19). 3
2 1.5
IV
2
1
IV III
1
0.5
III II
0
II
0
−0.5 −1 −1 −2
−1.5 −2
−3 −2
−1
0
1
2
−3
−2
−1
0
1
2
3
Figure 14.19. The sets II , III , IV for the case of the uniform distribution on supp V = {−1, 1} (left) and supp V = [−2, 2] (right). Now put An = Tn (χ1 ) + diag (v1 , . . . , vn ),
A = T (χ1 ) + diag (vj )∞ j =1 ,
Bn = Cn (χ1 ) + diag (v1 , . . . , vn ),
B = L(χ1 ) + diag (vj )∞ j =1 ,
where L(c) := (cj −k )∞ j,k=−∞ and the vj are independent random variables with the same distribution as V . We say some result holds almost surely if it is true with probability 1. Theorem on An . (a) If λ ∈ I , then (λI − An )−1 2 ≥ (1/dmax (λ))n (guaranteed exponential growth). (b) If λ ∈ I ∪ II , then (λI − An )−1 2 → ∞ as n → ∞ almost surely. If, in 1/n addition, λ ∈ / supp V , then (λI − An )−1 2 → 1/dmean (λ) as n → ∞ almost surely (almost sure exponential growth). (c) If λ ∈ III , then (λI −An )−1 2 → ∞ as n → ∞ almost surely. If, in addition, λ ∈ / 1/n supp V , then (λI − An )−1 2 → 1 as n → ∞ almost surely (almost sure subexponential growth).
i
i i
i
i
i
i
Notes
buch7 2005/10/5 page 385 i
385
(d) If λ ∈ IV , then (λI − An )−1 2 < 1/(dmin (λ) − 1) (guaranteed boundedness). The spectrum sp An converges in the Hausdorff metric to supp V as n → ∞ almost surely. Theorem on A. If λ ∈ I , then (λI −A)−1 2 = ∞. If λ ∈ II ∪III , then (λI −A)−1 2 = ∞ almost surely. If λ ∈ IV , then (λI − A)−1 2 ≤ 1/(dmin (λ) − 1) and this inequality is an equality almost surely. The spectrum sp A equals I ∪ II ∪ III almost surely. We define Sbubble = {λ ∈ C : dmean (λ) = 1},
Swings = {λ ∈ supp V : dmean (λ) > 1}.
Theorem on Bn . (a) If λ ∈ I , then (λI − Bn )−1 2 ≤ 1/(1 − dmax (λ)). (b) If λ ∈ II , then (λI − Bn )−1 2 → ∞ as n → ∞ almost surely and the quantity 1/n (λI − Bn )−1 2 goes to 1 as n → ∞ almost surely. (c) If λ ∈ III , then (λI − Bn )−1 2 → ∞ as n → ∞ almost surely. If in addition 1/n λ∈ / Sbubble ∪ Swings , then (λI − Bn )−1 2 → 1 as n → ∞ almost surely. −1 (d) If λ ∈ IV , then (λI − Bn ) 2 ≤ 1/(dmin (λ) − 1). If Sbubble consists only of curves disjoint from supp V except at isolated points, then sp Bn converges in the Hausdorff metric to Sbubble ∪ Swings as n → ∞ almost surely. Theorem on B. If λ ∈ I then (λI − B)−1 2 ≤ 1/(1 − dmax (λ)), and this inequality is an equality almost surely. If λ ∈ II ∪ III , then (λI − B)−1 2 = ∞ almost surely. If λ ∈ IV then (λI − B)−1 2 ≤ 1/(dmin (λ) − 1), and this inequality is an equality almost surely. The spectrum sp B coincides with II ∪ III almost surely.
i
i i
i
i
buch7 2005/10/5 page 386 i
i
i
i
i
i
i
i
i
i
buch7 2005/10/5 page 387 i
Bibliography [1] V. M. Adamyan, Asymptotic properties for positive and Toeplitz matrices, Oper. Theory Adv. Appl., 43 (1990), pp. 17–38. [2] E. L. Allgower, Exact inverses of certain band matrices, Numer. Math., 21 (1973), pp. 279–284. [3] G. Ammar and P. Gader, A variant of the Gohberg–Semencul formula involving circulant matrices, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 534–540. [4] P. W. Anderson, Absence of diffusion in certain random lattices, Phys. Rev. (Second Series), 109 (1958), pp. 1492–1505. [5] A. L. Andrew, Eigenvectors of certain matrices, Linear Algebra Appl., 7 (1973), pp. 151–162. [6] P. M. Anselone and I. H. Sloan, Spectral approximations for Wiener-Hopf operators II, J. Integral Equations Appl., 4 (1992), pp. 465–489. [7] W. Arveson, Noncommutative spheres and numerical quantum mechanics, in Operator Algebras, Mathematical Physics, and Low-Dimensional Topology (Istanbul, 1991), Res. Notes Math., Vol. 5, A. K. Peters, Wellesley, MA, 1993, pp. 1–10. [8] W. Arveson, The role of C ∗ -algebras in infinite-dimensional numerical linear algebra, Contemp. Math., 167 (1994), pp. 114–129. [9] W. Arveson, C ∗ -algebras and numerical linear algebra, J. Funct. Anal., 122 (1994), pp. 333–360. [10] F. Avram, On bilinear forms in Gaussian random variables and Toeplitz matrices, Probab. Theory Related Fields, 79 (1988), pp. 37–45. [11] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, UK, 1996. [12] M. Bakonyi and D. Timotin, On an extension problem for polynomials, Bull. London Math. Soc., 33 (2001), pp. 599–605. [13] E. Basor, Review of “Invertibility and Asymptotics of Toeplitz Matrices”, Linear Algebra Appl., 68 (1985), pp. 275–278. 387
i
i i
i
i
i
i
388
buch7 2005/10/5 page 388 i
Bibliography
[14] E. Basor and T. Ehrhardt, Asymptotic formulas for determinants of a sum of finite Toeplitz and Hankel matrices, Math. Nachr., 228 (2001), pp. 5–45. [15] E. Basor and K. E. Morrison, The Fisher-Hartwig conjecture and Toeplitz eigenvalues, Linear Algebra Appl., 202 (1994), pp. 129–142. [16] E. Basor and H. Widom, On a Toeplitz determinant identity of Borodin and Okounkov, Integral Equations Operator Theory, 37 (2000), pp. 397–401. [17] G. Baxter, Polynomials defined by a difference system, J. Math. Anal. Appl., 2 (1961), pp. 223–263. [18] G. Baxter, A norm inequality for a finite-section Wiener-Hopf equation, Illinois J. Math., 7 (1963), pp. 97–103. [19] G. Baxter and P. Schmidt, Determinants of a certain class of non-Hermitian Toeplitz matrices, Math. Scand., 9 (1961), pp. 122–128. [20] R. M. Beam and R. F. Warming, The asymptotic spectra of banded Toeplitz and quasi-Toeplitz matrices, SIAM J. Sci. Comput., 14 (1993), pp. 971–1006. [21] E. Bédos, On filtrations for C ∗ -algebras, Houston J. Math., 20 (1994), pp. 63–74. [22] E. Bédos, On Følner nets, Szegö’s theorem and other eigenvalue distribution theorems, Exposition. Math., 15 (1997), pp. 193–228. [23] F. Di Benedetto, G. Fiorentino, and S. Serra, CG preconditioning for Toeplitz matrices, J. Comput. Math. Appl., 25 (1993), pp. 35–45. [24] L. Berg, Über eine Identität von W. F. Trench zwischen der Toeplitzschen und einer verallgemeinerten Vandermondeschen Determinante, Z. Angew. Math. Mech., 66 (1986), pp. 314–315. [25] L. Berg, Lineare Gleichungssysteme mit Bandstruktur und ihr asymptotisches Verhalten, Deutscher Verlag der Wissenschaften, Berlin, 1986. [26] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Classics Appl. Math. 9, SIAM, Philadelphia, 1994. [27] R. Bhatia, Matrix Analysis, Springer-Verlag, New York, 1997. [28] D. Bini and A. Böttcher, Polynomial factorization through Toeplitz matrix computations, Linear Algebra Appl., 366 (2003), pp. 25–37. [29] P. du Bois-Reymond, Untersuchungen über die Convergenz und Divergenz der Fourierschen Darstellungsformeln, Abh. d. Math.-Phys. Classe d. Königl. Bayerischen Akad. d. Wiss., 12 (1876), pp. 1–13. [30] F. F. Bonsall and J. Duncan, Numerical Ranges of Operators on Normed Spaces and Elements of Normed Algebras, Cambridge University Press, Cambridge, UK, 1971.
i
i i
i
i
i
i
Bibliography
buch7 2005/10/5 page 389 i
389
[31] A. Borodin and A. Okounkov, A Fredholm determinant formula for Toeplitz determinants, Integral Equations Operator Theory, 37 (2000), pp. 386–396. [32] A. Böttcher, Status Report on Rationally Generated Block Toeplitz and WienerHopf Determinants, unpublished manuscript, 34 pages, 1989 (available from the author on request). [33] A. Böttcher, Truncated Toeplitz operators on the polydisk, Monatshefte f. Math., 110 (1990), pp. 23–32. [34] A. Böttcher, Pseudospectra and singular values of large convolution operators, J. Integral Equations Appl., 6 (1994), pp. 267–301. [35] A. Böttcher, On the approximation numbers of large Toeplitz matrices, Documenta Mathematica, 2 (1997), pp. 1–29. [36] A. Böttcher, C ∗ -algebras in numerical analysis, Irish Math. Soc. Bull., 45 (2000), pp. 57–133. [37] A. Böttcher, One more proof of the Borodin-Okounkov formula for Toeplitz determinants, Integral Equations Operator Theory, 41 (2001), pp. 123–125. [38] A. Böttcher, On the determinant formulas by Borodin, Okounkov, Baik, Deift, and Rains, Oper. Theory Adv. Appl., 135 (2002), pp. 91–99. [39] A. Böttcher, Transient behavior of powers and exponentials of large Toeplitz matrices, Electron. Trans. Numer. Anal., 18 (2004), pp. 1–41. [40] A. Böttcher, The constants in the asymptotic formulas by Rambour and Seghier for inverses of Toeplitz matrices, Integral Equations Operator Theory, 50 (2004), pp. 43–55. [41] A. Böttcher, M. Embree, and M. Lindner, Spectral approximation of banded Laurent matrices with localized random perturbations, Integral Equations Operator Theory, 42 (2002), pp. 142–165. [42] A. Böttcher, M. Embree, and V. I. Sokolov, Infinite Toeplitz and Laurent matrices with localized impurities, Linear Algebra Appl., 343/344 (2002), pp. 101–118. [43] A. Böttcher, M. Embree, and V. I. Sokolov, On large Toeplitz band matrices with an uncertain block, Linear Algebra Appl., 366 (2003), pp. 87–97. [44] A. Böttcher, M. Embree, and V. I. Sokolov, The spectra of large Toeplitz band matrices with a randomly perturbed entry, Math. Comp., 72 (2003), pp. 1329–1348. [45] A. Böttcher, M. Embree, and L. N. Trefethen, Piecewise continuous Toeplitz matrices and operators: Slow approach to infinity, SIAM J. Matrix Anal. Appl., 24 (2002) pp. 484–489. [46] A. Böttcher and S. Grudsky, On the condition numbers of large semi-definite Toeplitz matrices, Linear Algebra Appl., 279 (1998), pp. 285–301.
i
i i
i
i
i
i
390
buch7 2005/10/5 page 390 i
Bibliography
[47] A. Böttcher and S. Grudsky, Toeplitz band matrices with exponentially growing condition numbers, Electron. J. Linear Algebra, 5 (1999), pp. 104–125. [48] A. Böttcher and S. Grudsky, Toeplitz Matrices, Asymptotic Linear Algebra, and Functional Analysis, Hindustan Book Agency, New Delhi, 2000, and Birkhäuser Verlag, Basel, 2000. [49] A. Böttcher and S. Grudsky, Condition numbers of large Toeplitz-like matrices, Contemp. Math., 280 (2001), pp. 273–299. [50] A. Böttcher and S. Grudsky, Can spectral value sets of Toeplitz band matrices jump?, Linear Algebra Appl., 351/352 (2002), pp. 99–116. [51] A. Böttcher and S. Grudsky, Asymptotic spectra of dense Toeplitz matrices are unstable, Numer. Algorithms, 33 (2003), pp. 105–112. [52] A. Böttcher and S. Grudsky, The norm of the product of a large matrix and a random vector, Electronic J. Probability, 8 (2003), Paper 7, pp. 1–29. [53] A. Böttcher and S. Grudsky, Fejér means and norms of large Toeplitz matrices, Acta Sci. Math. (Szeged), 69 (2003), pp. 889–900. [54] A. Böttcher and S. Grudsky, Toeplitz matrices with slowly growing pseudospectra, in Factorization, Singular Integral Operators, and Related Topics, S. Samko, A. Lebre, and A. F. dos Santos, eds., Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003, pp. 43–54. [55] A. Böttcher and S. Grudsky, Asymptotically good pseudomodes for Toeplitz matrices and Wiener-Hopf operators, Oper. Theory Adv. Appl., 147 (2004), pp. 175–188. [56] A. Böttcher and S. Grudsky, Structured condition numbers of large Toeplitz matrices are rarely better than usual condition numbers, Numer. Linear Algebra Appl., 12 (2005), pp. 95–102. [57] A. Böttcher, S. Grudsky, and A. Kozak, On the distance of a large Toeplitz band matrix to the nearest singular matrix, Oper. Theory Adv. Appl., 135 (2002), pp. 101–106. [58] A. Böttcher, S. Grudsky, A. Kozak, and B. Silbermann, Norms of large Toeplitz band matrices, SIAM J. Matrix Anal. Appl., 21 (1999), pp. 547–561. [59] A. Böttcher, S. Grudsky, A. Kozak, and B. Silbermann, Convergence speed estimates for the norms of the inverses of large truncated Toeplitz matrices, Calcolo, 36 (1999), pp. 103–122. [60] A. Böttcher, S. Grudsky, and E. Ramírez de Arellano, Approximating inverses of Toeplitz matrices by circulant matrices, Methods Appl. Anal., 11 (2004), pp. 211– 220.
i
i i
i
i
i
i
Bibliography
buch7 2005/10/5 page 391 i
391
[61] A. Böttcher, S. Grudsky, and E. Ramírez de Arellano, On the asymptotic behavior of the eigenvectors of large banded Toeplitz matrices, Math. Nachr., to appear. [62] A. Böttcher, S. Grudsky, and B. Silbermann, Norms of inverses, spectra, and pseudospectra of large truncated Wiener-Hopf operators and Toeplitz matrices, New York J. Math., 3 (1997), pp. 1–31. [63] A. Böttcher and B. Silbermann, Notes on the asymptotic behavior of block Toeplitz matrices and determinants, Math. Nachr., 98 (1980), pp. 183–210. [64] A. Böttcher and B. Silbermann, The asymptotic behavior of Toeplitz determinants for generating functions with zeros of integral orders, Math. Nachr., 102 (1981), pp. 79–105. [65] A. Böttcher and B. Silbermann, Über das Reduktionsverfahren für diskrete Wiener-Hopf-Gleichungen mit unstetigem Symbol, Z. Anal. Anwendungen, 1, no. 2 (1982), pp. 1–5. [66] A. Böttcher and B. Silbermann, The finite section method for Toeplitz operators on the quarter-plane with piecewise continuous symbols, Math. Nachr., 110 (1983), pp. 279–291. [67] A. Böttcher and B. Silbermann, Invertibility and Asymptotics of Toeplitz Matrices, Akademie-Verlag, Berlin, 1983. [68] A. Böttcher and B. Silbermann, Toeplitz matrices and determinants with FisherHartwig symbols, J. Funct. Anal., 63 (1985), pp. 178–214. [69] A. Böttcher and B. Silbermann, Toeplitz operators and determinants generated by symbols with one Fisher-Hartwig singularity, Math. Nachr., 127 (1986), pp. 95– 123. [70] A. Böttcher and B. Silbermann, Analysis of Toeplitz Operators, AkademieVerlag, Berlin, 1989 and Springer-Verlag, Berlin, 1990. [71] A. Böttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer-Verlag, New York, 1999. [72] A. Böttcher and H. Widom, Two remarks on spectral approximations for WienerHopf operators, J. Integral Equations Appl., 6 (1994), pp. 31–36. [73] A. Böttcher and H. Widom, Two elementary derivations of the pure Fisher-Hartwig determinant, Integral Equations Operator Theory, to appear. [74] A. Böttcher and H. Widom, From Toeplitz eigenvalues through Green’s kernels to higher-order Wirtinger-Sobolev inequalities, Oper. Theory Adv. Appl., to appear. [75] E. Brézin and A. Zee, Non-Hermitean delocalization: Multiple scattering and bounds, Nuclear Phys. B, 509 (1998), pp. 599–614.
i
i i
i
i
i
i
392
buch7 2005/10/5 page 392 i
Bibliography
[76] P. W. Brouwer, P. G. Silvestrov, and C. W. J. Beenakker, Theory of directed localization in one dimension, Phys. Rev. B, 56 (1997), pp. R4333–R4335. [77] A. Brown and P. Halmos, Algebraic properties of Toeplitz operators, J. Reine Angew. Math., 213 (1963/1964), pp. 89–102. [78] E. S. Brown and I. M. Spitkovsky, On matrices with elliptical numerical range, Linear Multilinear Algebra, 52 (2004), pp. 177–193. [79] D. Bump and P. Diaconis, Toeplitz minors, J. Combin. Theory Ser. A, 97 (2002), pp. 252–271. [80] J. Burke and A. Greenbaum, Some equivalent characterizations of the polynomial numerical hull of degree k, Oxford University Computing Laboratory Report, no. 04/29 (2004). [81] A. Calderón, F. Spitzer, and H. Widom, Inversion of Toeplitz matrices, Illinois J. Math., 3 (1959), pp. 490–498. [82] A. Cantoni and F. Butler, Eigenvalues and eigenvectors of symmetric centrosymmetric matrices, Linear Algebra Appl., 13 (1976), pp. 275–288. [83] R. H. Chan, Toeplitz preconditioners for Toeplitz systems with nonnegative generating functions, IMA J. Numer. Anal., 11 (1991), pp. 333–345. [84] R. H. Chan, X.-Q. Jin, and M.-C. Yeung, The circulant operator in the Banach algebra of matrices, Linear Algebra Appl., 149 (1991), pp. 41–53. [85] R. H. Chan and M. K. Ng, Conjugate gradient methods for Toeplitz systems, SIAM Rev., 38 (1996), pp. 427–482. [86] R. H. Chan, M. K. Ng, and A. M. Yip, A survey of preconditioners for ill-conditioned Toeplitz systems, Contemp. Math., 281 (2001), pp. 175–191. [87] R. H. Chan and G. Strang, Toeplitz equations by conjugate gradients with circulant preconditioner, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 104–119. [88] R. H. Chan and M.-C. Yeung, Circulant preconditioners constructed from kernels, SIAM J. Numer. Anal., 29 (1992), pp. 1093–1103. [89] T. Chan, An optimal circulant preconditioner for Toeplitz systems, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 766–771. [90] R. Courant, K. Friedrichs, and H. Lewy, Über die partiellen Differenzengleichungen der mathematischen Physik, Math. Ann., 100 (1928), pp. 32–74. [91] K. A. Dahmen, D. R. Nelson, and N. M. Shnerb, Life and death near a windy oasis, J. Math. Biol., 41 (2000), pp. 1–23. [92] J. W. Daniel, The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal., 4 (1967), pp. 10–26.
[93] S. Dasgupta and A. Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures & Algorithms, 22 (2003), pp. 60–65.
[94] E. B. Davies, Spectral enclosures and complex resonances for general self-adjoint operators, LMS J. Comput. Math., 1 (1998), pp. 42–74.
[95] E. B. Davies, Spectral properties of random non-self-adjoint matrices and operators, Proc. Roy. Soc. London Ser. A, 457 (2001), pp. 191–206.
[96] E. B. Davies, Semigroup growth bounds, J. Oper. Theory, to appear.
[97] K. M. Day, Toeplitz matrices generated by the Laurent series expansion of an arbitrary rational function, Trans. Amer. Math. Soc., 206 (1975), pp. 224–245.
[98] K. M. Day, Measures associated with Toeplitz matrices generated by the Laurent expansion of rational functions, Trans. Amer. Math. Soc., 209 (1975), pp. 175–183.
[99] P. Delsarte and Y. Genin, Spectral properties of finite Toeplitz matrices, in Mathematical Theory of Networks and Systems (Beer Sheva, 1983), Lecture Notes in Control and Inform. Sci. 58, Springer-Verlag, London, 1984, pp. 194–213.
[100] J. Demmel, The componentwise distance to the nearest singular matrix, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 10–19.
[101] P. Deuflhard and A. Hohmann, Numerische Mathematik. Eine algorithmisch orientierte Einführung, de Gruyter, Berlin, 1991.
[102] A. Devinatz, The strong Szegö limit theorem, Illinois J. Math., 11 (1967), pp. 160–175.
[103] R. G. Douglas, Banach Algebra Techniques in Operator Theory, 2nd edition, Springer-Verlag, New York, 1998.
[104] R. G. Douglas and R. Howe, On the C*-algebra of Toeplitz operators on the quarter-plane, Trans. Amer. Math. Soc., 158 (1971), pp. 203–217.
[105] R. V. Duduchava, On discrete Wiener-Hopf equations, Trudy Tbilis. Mat. Inst., 50 (1975), pp. 42–59 (in Russian).
[106] T. Ehrhardt, A generalization of Pincus' formula and Toeplitz operator determinants, Arch. Math. (Basel), 80 (2003), pp. 302–309.
[107] M. Eiermann, Fields of values and iterative methods, Linear Algebra Appl., 180 (1993), pp. 167–197.
[108] R. L. Ellis and I. Gohberg, Orthogonal Systems and Convolution Operators, Birkhäuser Verlag, Basel, 2003.
[109] L. Elsner and S. Friedland, The limit of the spectral radius of block Toeplitz matrices with nonnegative entries, Integral Equations Operator Theory, 36 (2000), pp. 193–200.
[110] M. Embree and L. N. Trefethen, Pseudospectra Gateway, Web site: http://www.comlab.ox.ac.uk/pseudospectra.
[111] M. Embree and L. N. Trefethen, Generalizing eigenvalue theorems to pseudospectra theorems, SIAM J. Sci. Comput., 23 (2001), pp. 583–590.
[112] V. Faber, A. Greenbaum, and D. E. Marshall, The polynomial numerical hulls of Jordan blocks and related matrices, Linear Algebra Appl., 374 (2003), pp. 231–246.
[113] D. K. Faddeev and I. S. Sominskiĭ, A Collection of Exercises in Higher Algebra, 10th edition, Nauka, Moscow, 1972 (in Russian).
[114] D. R. Farenick, M. Krupnik, N. Krupnik, and W. Y. Lee, Normal Toeplitz matrices, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 1037–1043.
[115] D. Fasino, Spectral properties of Toeplitz-plus-Hankel matrices, Calcolo, 33 (1996), pp. 87–98.
[116] D. Fasino and P. Tilli, Spectral clustering properties of block multilevel Hankel matrices, Linear Algebra Appl., 306 (2000), pp. 155–163.
[117] J. Feinberg and A. Zee, Non-Hermitian localization and delocalization, Phys. Rev. E, 59 (1999), pp. 6433–6443.
[118] J. Feinberg and A. Zee, Spectral curves of non-hermitian hamiltonians, Nuclear Phys. B, 552 (1999), pp. 599–623.
[119] L. Fejér, Untersuchungen über Fouriersche Reihen, Math. Ann., 58 (1904), pp. 501–569.
[120] G. M. Fichtenholz, Differential- und Integralrechnung III, fünfte Auflage, Deutscher Verlag d. Wiss., Berlin, 1972.
[121] B. Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems, John Wiley & Sons, Ltd., Chichester, UK, and B. G. Teubner, Stuttgart, 1996.
[122] J. Fortiana and C. M. Cuadras, A family of matrices, the discretized Brownian bridge, and distance-based regression, Linear Algebra Appl., 264 (1997), pp. 173–188.
[123] F. D. Gakhov, On Riemann's boundary value problem, Matem. Sbornik, 2 (44) (1937), pp. 673–683 (in Russian).
[124] E. Gallestey, D. Hinrichsen, and A. J. Pritchard, Spectral value sets of infinite-dimensional systems, in Open Problems in Mathematical Systems and Control Theory, Comm. Control Engrg. Ser., Springer-Verlag, London, 1999, pp. 109–113.
[125] E. Gallestey, D. Hinrichsen, and A. J. Pritchard, Spectral value sets of closed linear operators, Proc. Roy. Soc. Lond. Ser. A, 456 (2000), pp. 1397–1418.
[126] V. I. Gel'fgat, A normality criterion for Toeplitz matrices, Comput. Math. Math. Phys., 35 (1995), pp. 1147–1150.
[127] J. S. Geronimo and K. M. Case, Scattering theory and polynomials orthogonal on the unit circle, J. Math. Phys., 20 (1979), pp. 299–310.
[128] I. Gohberg, On an application of the theory of normed rings to singular integral equations, Uspekhi Mat. Nauk, 7 (1952), pp. 149–156 (in Russian).
[129] I. Gohberg, On the number of solutions of homogeneous singular integral equations with continuous coefficients, Dokl. Akad. Nauk SSSR, 122 (1958), pp. 327–330 (in Russian).
[130] I. Gohberg and I. A. Feldman, Convolution Equations and Projection Methods for Their Solution, AMS, Providence, RI, 1974.
[131] I. Gohberg and I. Koltracht, Mixed, componentwise, and structured condition numbers, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 688–704.
[132] I. Gohberg and I. Koltracht, Structured condition numbers for linear matrix structures, in Linear Algebra for Signal Processing (Minneapolis, 1992), IMA Vol. Math. Appl. 69, Springer-Verlag, New York, 1995, pp. 17–26.
[133] I. Gohberg and M. G. Krein, Introduction to the Theory of Linear Non-Selfadjoint Operators in Hilbert Space, AMS, Providence, RI, 1969.
[134] I. Gohberg and A. A. Sementsul, The inversion of finite Toeplitz matrices and their continual analogues, Matem. Issled., 7 (1972), pp. 201–223 (in Russian).
[135] I. Ya. Goldsheid and B. A. Khoruzhenko, Distribution of eigenvalues in non-Hermitian Anderson models, Phys. Rev. Lett., 80 (1998), pp. 2897–2900.
[136] I. Ya. Goldsheid and B. A. Khoruzhenko, Eigenvalue curves of asymmetric tridiagonal random matrices, Electronic J. Probability, 5 (2000), Paper 16, pp. 1–28.
[137] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins University Press, Baltimore, MD, 1996.
[138] G. M. Goluzin, Some estimates for bounded functions, Matem. Sbornik, N. S., 26 (68) (1950), pp. 7–18 (in Russian).
[139] G. M. Goluzin, Geometric Theory of Functions of a Complex Variable, AMS, Providence, RI, 1969.
[140] S. Graillat, A note on structured pseudospectra, J. Comput. Appl. Math., to appear.
[141] A. Greenbaum, Generalizations of the field of values useful in the study of polynomial functions of a matrix, Linear Algebra Appl., 347 (2002), pp. 233–249.
[142] A. Greenbaum, Card shuffling and the polynomial numerical hull of degree k, SIAM J. Sci. Comput., 25 (2003), pp. 408–416.
[143] A. Greenbaum, Personal communication, August 2003.
[144] A. Greenbaum and L. N. Trefethen, Do the Pseudospectra of a Matrix Determine Its Behavior?, Technical Report TR 93-1371, Comp. Sci. Dept., Cornell University, Ithaca, NY, August 1993.
[145] U. Grenander and G. Szegö, Toeplitz Forms and Their Applications, University of California Press, Berkeley, CA, 1958.
[146] S. Grudsky and A. V. Kozak, On the convergence speed of the norms of inverses of truncated Toeplitz operators, in Integro-Differential Equations and Their Applications, Rostov State Univ. Press, Rostov-on-Don, 1995, pp. 45–55 (in Russian).
[147] C. Gu and L. Patton, Commutation relations for Toeplitz and Hankel matrices, SIAM J. Matrix Anal. Appl., 24 (2003), pp. 728–746.
[148] K. E. Gustafson and D. K. M. Rao, Numerical Range. The Field of Values of Linear Operators and Matrices, Springer-Verlag, New York, 1997.
[149] R. Hagen, S. Roch, and B. Silbermann, C*-Algebras and Numerical Analysis, Marcel Dekker, New York, 2001.
[150] P. Halmos, A Hilbert Space Problem Book, D. van Nostrand, Princeton, 1967.
[151] M. Hanke and J. G. Nagy, Toeplitz approximate inverse preconditioner for banded Toeplitz matrices, Numer. Algorithms, 7 (1994), pp. 183–199.
[152] N. Hatano and D. R. Nelson, Vortex pinning and non-Hermitian quantum mechanics, Phys. Rev. B, 56 (1997), pp. 8651–8673.
[153] F. Hausdorff, Set Theory, Chelsea, New York, 1957.
[154] G. Heinig and F. Hellinger, The finite section method for Moore-Penrose inversion of Toeplitz operators, Integral Equations Operator Theory, 19 (1994), pp. 419–446.
[155] G. Heinig and K. Rost, Algebraic Methods for Toeplitz-Like Matrices and Operators, Akademie-Verlag, Berlin, 1984 and Birkhäuser Verlag, Basel, 1984.
[156] G. Heinig and K. Rost, DFT representations of Toeplitz-plus-Hankel Bezoutians with application to fast matrix-vector multiplication, Linear Algebra Appl., 284 (1998), pp. 157–175.
[157] G. Heinig and K. Rost, Introduction to Structured Matrices, book to appear.
[158] J. W. Helton and R. E. Howe, Integral operators: Commutators, traces, index, and homology, in Proceedings of a Conference on Operator Theory, Lecture Notes in Math. 345, Springer-Verlag, Berlin, 1973, pp. 141–209.
[159] D. Hertz, On the extreme eigenvalues of Toeplitz and real Hankel interval matrices, Multidimens. Systems Signal Process., 4 (1993), pp. 83–90.
[160] D. J. Higham and N. J. Higham, Backward error and condition of structured linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 162–175.
[161] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
[162] D. Hinrichsen and B. Kelb, Spectral value sets: A graphical tool for robustness analysis, Systems Control Lett., 21 (1993), pp. 127–136.
[163] D. Hinrichsen and A. J. Pritchard, Real and complex stability radii: A survey, in Control of Uncertain Systems, Progr. Systems Control Theory 6, Birkhäuser Verlag, Boston, 1990, pp. 119–162.
[164] I. I. Hirschman, Jr., On a formula of Kac and Achiezer, J. Math. Mech., 16 (1966), pp. 167–196.
[165] I. I. Hirschman, Jr., The spectra of certain Toeplitz matrices, Illinois J. Math., 11 (1967), pp. 145–159.
[166] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985.
[167] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, UK, 1991.
[168] Z. Hurák, A. Böttcher, and M. Šebek, Minimum distance to the range of a banded lower triangular Toeplitz operator in ℓ1 and application in ℓ1-optimal control, to appear.
[169] Kh. D. Ikramov, On a description of normal Toeplitz matrices, Comput. Math. Math. Phys., 34 (1994), pp. 399–404.
[170] Kh. D. Ikramov, Classification of normal Toeplitz matrices with real elements, Math. Notes, 57 (1995), pp. 463–469.
[171] Kh. D. Ikramov and V. N. Chugunov, A criterion for the normality of a complex Toeplitz matrix, Comput. Math. Math. Phys., 36 (1996), pp. 131–137.
[172] T. Ito, Every normal Toeplitz matrix is either of type I or of type II, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 998–1006.
[173] R. A. Janik, M. A. Nowak, G. Papp, and I. Zahed, Localization transitions from free random variables, Acta Physica Polonica B, 30 (1999), pp. 45–58.
[174] K. Johansson, On random matrices from the compact classical groups, Annals of Math., 145 (1997), pp. 519–545.
[175] G. A. Jones and D. Singerman, Complex Functions - An Algebraic and Geometric Viewpoint, Cambridge University Press, Cambridge, UK, 1987.
[176] M. Kac, W. L. Murdock, and G. Szegö, On the eigenvalues of certain Hermitian forms, J. Rational Mech. Anal., 2 (1953), pp. 767–800.
[177] T. Kailath and A. H. Sayed, eds., Fast Reliable Algorithms for Matrices with Structure, SIAM, Philadelphia, 1999.
[178] H. Kesten, On a theorem of Spitzer and Stone and random walks with absorbing barriers, Illinois J. Math., 5 (1961), pp. 246–266.
[179] H. Kesten, Random walks with absorbing barriers and Toeplitz forms, Illinois J. Math., 5 (1961), pp. 267–290.
[180] E. M. Klein, The numerical range of a Toeplitz operator, Proc. Amer. Math. Soc., 35 (1972), pp. 101–103.
[181] S. V. Konyagin and W. Schlag, Lower bounds for the absolute value of random polynomials on a neighborhood of the unit circle, Trans. Amer. Math. Soc., 351 (1999), pp. 4963–4980.
[182] P. A. Kozhukhar, Linear operators, Matem. Issled., 54 (1980), pp. 50–55 (in Russian).
[183] P. A. Kozhukhar, The absence of eigenvalues in a perturbed discrete Wiener-Hopf operator, Izv. Akad. Nauk Moldav. SSR Mat., 1990/3 (1990), pp. 26–35 (in Russian).
[184] M. G. Krein, Integral equations on the half-line with a kernel depending on the difference of the arguments, Uspekhi Mat. Nauk, 13 (1958), pp. 3–120 (in Russian).
[185] H. Landau, On Szegö's eigenvalue distribution theorem and non-Hermitian kernels, J. Analyse Math., 28 (1975), pp. 335–357.
[186] H. Landau, Loss in unstable resonators, J. Opt. Soc. Amer., 66 (1976), pp. 525–529.
[187] H. Landau, The notion of approximate eigenvalues applied to an integral equation of laser theory, Quart. Appl. Math., April 1977, pp. 165–171.
[188] L. Lerer and M. Tismenetsky, Generalized Bezoutians and the inversion problem for block matrices, I, General scheme, Integral Equations Operator Theory, 9 (1986), pp. 790–819.
[189] X. Liu, G. Strang, and S. Ott, Localized eigenvectors from widely spaced matrix modifications, SIAM J. Discrete Math., 16 (2003), pp. 479–498.
[190] J. Nagy, R. Plemmons, and T. Torgersen, Iterative image restoring using approximate inverse preconditioning, IEEE Trans. Image Processing, 15 (1996), pp. 1151–1162.
[191] D. R. Nelson and N. M. Shnerb, Non-Hermitian localization and population biology, Phys. Rev. E, 58 (1998), pp. 1383–1403.
[192] O. Nevanlinna, Convergence of Iterations for Linear Equations, Birkhäuser Verlag, Basel, 1993.
[193] O. Nevanlinna, Hessenberg matrices in Krylov subspaces and the computation of the spectrum, Numer. Funct. Anal. Optim., 16 (1995), pp. 443–473.
[194] L. N. Nikolskaya and Yu. B. Farforovskaya, Toeplitz and Hankel matrices as Hadamard-Schur multipliers, Algebra i Analiz, 15 (2003), pp. 141–160 (in Russian).
[195] N. K. Nikolski, Treatise on the Shift Operator, Springer-Verlag, Berlin, 1986.
[196] N. K. Nikolski, Operators, Functions, and Systems: An Easy Reading. Vol. 1. Hardy, Hankel, and Toeplitz, AMS, Providence, RI, 2002.
[197] S. V. Parter, On the extreme eigenvalues of truncated Toeplitz matrices, Bull. Amer. Math. Soc., 67 (1961), pp. 191–196.
[198] S. V. Parter, Extreme eigenvalues of Toeplitz forms and applications to elliptic difference equations, Trans. Amer. Math. Soc., 99 (1961), pp. 153–192.
[199] S. V. Parter, On the extreme eigenvalues of Toeplitz matrices, Trans. Amer. Math. Soc., 100 (1961), pp. 263–276.
[200] S. V. Parter, On the distribution of the singular values of Toeplitz matrices, Linear Algebra Appl., 80 (1986), pp. 115–130.
[201] J. R. Partington, An Introduction to Hankel Operators, Cambridge University Press, Cambridge, UK, 1988.
[202] V. V. Peller, Smooth Hankel operators and their applications (ideals Sp, Besov classes, random processes), Dokl. Akad. Nauk SSSR, 252 (1980), pp. 43–48 (in Russian).
[203] V. V. Peller, Hankel operators of class Sp and their applications (rational approximation, Gaussian processes, the problem of majorization of operators), Mat. Sbornik, 113 (1980), pp. 538–581 (in Russian).
[204] V. V. Peller, Hankel Operators and Their Applications, Springer-Verlag, New York, 2003.
[205] J. Plemelj, Ein Ergänzungssatz zur Cauchy'schen Integraldarstellung analytischer Funktionen, Randwerte betreffend, Monatshefte Math. Phys., 19 (1908), pp. 205–210.
[206] J. D. Pincus, On the Trace of Commutators in the Algebra of Operators Generated by an Operator with Trace Class Self-Commutator, unpublished manuscript, 1972.
[207] N. I. Polski, Projection methods for solving linear equations, Uspekhi Mat. Nauk, 18 (1963), pp. 179–180 (in Russian).
[208] G. Pólya and G. Szegö, Problems and Theorems in Analysis, Vols. I and II, Springer-Verlag, Berlin, 1998.
[209] A. Pomp, Zur Konvergenz des Reduktionsverfahrens für Wiener-Hopfsche Gleichungen, Teil I: Ein allgemeines Operatorenschema, Preprint P-MATH-03/81, Akad. Wiss. DDR, Inst. f. Math., Berlin, 1981.
[210] A. Pomp, Zur Konvergenz des Reduktionsverfahrens für Wiener-Hopfsche Gleichungen, Teil II: Anwendungen auf diskrete Wiener-Hopfsche Gleichungen und Fehlerabschätzungen, Preprint P-MATH-05/81, Akad. Wiss. DDR, Inst. f. Math., Berlin, 1981.
[211] D. Potts, Schnelle Polynomialtransformationen und Vorkonditionierer für Toeplitz-Matrizen, Shaker Verlag, Aachen, 1998.
[212] D. Potts and G. Steidl, Preconditioners for ill-conditioned Toeplitz matrices, BIT, 39 (1999), pp. 513–533.
[213] S. C. Power, Hankel Operators on Hilbert Space, Pitman, Boston, London, 1982.
[214] P. Rambour, J.-M. Rinkel, and A. Seghier, Inverse asymptotique de la matrice de Toeplitz et noyau de Green, C. R. Acad. Sci. Paris, 331 (2000), pp. 857–860.
[215] P. Rambour and A. Seghier, Exact and asymptotic inverse of the Toeplitz matrix with polynomial singular symbol, C. R. Acad. Sci. Paris, 335 (2002), pp. 705–710; erratum in C. R. Acad. Sci. Paris, 336 (2003), pp. 399–400.
[216] P. Rambour and A. Seghier, Formulas for the inverses of Toeplitz matrices with polynomially singular symbols, Integral Equations Operator Theory, 50 (2004), pp. 83–114.
[217] M. Reed and B. Simon, Methods of Modern Mathematical Physics, Vol. I, Academic Press, New York, 1972.
[218] E. Reich, On non-Hermitian Toeplitz matrices, Math. Scand., 10 (1962), pp. 145–152.
[219] L. Reichel and L. N. Trefethen, Eigenvalues and pseudo-eigenvalues of Toeplitz matrices, Linear Algebra Appl., 162/164 (1992), pp. 153–185.
[220] S. Roch, Numerical ranges of large Toeplitz matrices, Linear Algebra Appl., 282 (1998), pp. 185–198.
[221] S. Roch, Pseudospectra of operator polynomials, Oper. Theory Adv. Appl., 124 (2001), pp. 545–558.
[222] S. Roch and B. Silbermann, Limiting sets of eigenvalues and singular values of Toeplitz matrices, Asymptotic Anal., 8 (1994), pp. 293–309.
[223] S. Roch and B. Silbermann, C*-algebra techniques in numerical analysis, J. Oper. Theory, 35 (1996), pp. 241–280.
[224] S. Roch and B. Silbermann, Index calculus for approximation methods and singular value decomposition, J. Math. Anal. Appl., 225 (1998), pp. 401–426.
[225] S. Roch and B. Silbermann, A note on singular values of Cauchy-Toeplitz matrices, Linear Algebra Appl., 275/276 (1998), pp. 531–536.
[226] M. Rosenblum, The absolute continuity of Toeplitz's matrices, Pacific J. Math., 10 (1960), pp. 987–996.
[227] M. Rosenblum, Self-adjoint Toeplitz operators and associated orthonormal functions, Proc. Amer. Math. Soc., 13 (1962), pp. 590–595.
[228] M. Rosenblum, A concrete spectral theory for self-adjoint Toeplitz operators, Amer. J. Math., 87 (1965), pp. 709–718.
[229] M. Rosenblum and J. Rovnyak, Hardy Classes and Operator Theory, Oxford University Press, New York, 1985.
[230] W. Rudin, Real and Complex Analysis, 3rd edition, McGraw-Hill, New York, 1987.
[231] S. M. Rump, Estimation of the sensitivity of linear and nonlinear algebraic problems, Linear Algebra Appl., 153 (1991), pp. 1–34.
[232] S. M. Rump, Almost sharp bounds for the componentwise distance to the nearest singular matrix, Linear and Multilinear Algebra, 42 (1997), pp. 93–107.
[233] S. M. Rump, Bounds for the componentwise distance to the nearest singular matrix, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 83–103.
[234] S. M. Rump, Structured perturbations and symmetric matrices, Linear Algebra Appl., 278 (1998), pp. 121–132.
[235] S. M. Rump, Ill-conditioned matrices are componentwise near to singularity, SIAM Rev., 41 (1999), pp. 102–112.
[236] S. M. Rump, Structured perturbations part I: Normwise distances, SIAM J. Matrix Anal. Appl., 25 (2003), pp. 1–30.
[237] S. M. Rump, Structured perturbations part II: Componentwise distances, SIAM J. Matrix Anal. Appl., 25 (2003), pp. 31–56.
[238] S. M. Rump, Eigenvalues, pseudospectrum and structured perturbations, Linear Algebra Appl., to appear.
[239] D. E. Rutherford, Some continuant determinants arising in physics and chemistry, Proc. Royal Soc. Edin., 62 A (1947), pp. 229–236.
[240] D. E. Rutherford, Some continuant determinants arising in physics and chemistry, II, Proc. Royal Soc. Edin., 63 A (1952), pp. 232–241.
[241] A. L. Sakhnovich, Szegö limits for infinite Toeplitz matrices determined by the Taylor series of two rational functions, Linear Algebra Appl., 343/344 (2002), pp. 291–302.
[242] A. L. Sakhnovich and I. M. Spitkovsky, Block-Toeplitz matrices and associated properties of a Gaussian model on the half axis, Teoret. Mat. Fiz., 63 (1985), pp. 154–160 (in Russian).
[243] P. Schmidt and F. Spitzer, The Toeplitz matrices of an arbitrary Laurent polynomial, Math. Scand., 8 (1960), pp. 15–38.
[244] I. Schur and G. Szegö, Über die Abschnitte einer im Einheitskreise beschränkten Potenzreihe, Sitzungsberichte Preuss. Akad. Wiss. Berlin, 1925, pp. 545–560.
[245] D. SeLegue, A C*-algebraic extension of the Szegö trace formula, talk given at the GPOTS, Arizona State University, Tempe, May 22, 1996.
[246] S. Serra, Preconditioning strategies for Hermitian Toeplitz systems with nondefinite generating functions, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 1007–1019.
[247] S. Serra Capizzano, On the extreme spectral properties of Toeplitz matrices generated by L1 functions with several minima/maxima, BIT, 36 (1996), pp. 135–142.
[248] S. Serra Capizzano, On the extreme eigenvalues of Hermitian (block) Toeplitz matrices, Linear Algebra Appl., 270 (1998), pp. 109–129.
[249] S. Serra, How to choose the best iterative strategy for symmetric Toeplitz systems, SIAM J. Numer. Anal., 36 (1999), pp. 1078–1103.
[250] S. Serra Capizzano, Spectral behavior of matrix sequences and discretized boundary value problems, Linear Algebra Appl., 337 (2001), pp. 37–78.
[251] S. Serra Capizzano and P. Tilli, Extreme singular values and eigenvalues of non-Hermitian block Toeplitz matrices, J. Comput. Appl. Math., 108 (1999), pp. 113–130.
[252] E. Shargorodsky, Geometry of higher order relative spectra and projection methods, J. Oper. Theory, 44 (2000), pp. 43–62.
[253] B. Silbermann, Lokale Theorie des Reduktionsverfahrens für Toeplitzoperatoren, Math. Nachr., 104 (1981), pp. 137–146.
[254] B. Simon, Notes on infinite determinants of Hilbert space operators, Advances in Math., 24 (1977), pp. 244–273.
[255] B. Simon, Orthogonal Polynomials on the Unit Circle, Part 1, AMS, Providence, RI, 2005.
[256] R. D. Skeel, Scaling for numerical stability in Gaussian elimination, J. Assoc. Comput. Mach., 26 (1979), pp. 494–526.
[257] F. Spitzer and C. J. Stone, A class of Toeplitz forms and their application to probability theory, Illinois J. Math., 4 (1960), pp. 253–277.
[258] G. Strang, A proposal for Toeplitz matrix computations, Stud. Appl. Math., 74 (1986), pp. 171–176.
[259] G. Strang, From the SIAM President, SIAM News, April 2000 and May 2000.
[260] T. Strohmer, Four short stories about Toeplitz matrix calculations, Linear Algebra Appl., 343/344 (2002), pp. 321–344.
[261] F.-W. Sun, Y. Jiang, and J. S. Baras, On the convergence of the inverses of Toeplitz matrices and its applications, IEEE Trans. Inform. Theory, 49 (2003), pp. 180–190.
[262] G. Szegö, Ein Grenzwertsatz über die Toeplitzschen Determinanten einer reellen positiven Funktion, Math. Ann., 76 (1915), pp. 490–503.
[263] G. Szegö, On certain Hermitian forms associated with the Fourier series of a positive function, in Festschrift Marcel Riesz, Lund, 1952, pp. 222–238.
[264] P. Tilli, Singular values and eigenvalues of non-Hermitian block Toeplitz matrices, Linear Algebra Appl., 272 (1998), pp. 59–89.
[265] P. Tilli, Some results on complex Toeplitz eigenvalues, J. Math. Anal. Appl., 239 (1999), pp. 390–401.
[266] M. Tismenetsky, Determinant of block-Toeplitz band matrices, Linear Algebra Appl., 85 (1987), pp. 165–184.
[267] O. Toeplitz, Zur Theorie der quadratischen und bilinearen Formen von unendlichvielen Veränderlichen, Math. Ann., 70 (1911), pp. 351–376.
[268] O. Toeplitz, Das algebraische Analogon zu einem Satze von Fejér, Math. Z., 2 (1918), pp. 187–197.
[269] L. N. Trefethen, Approximation theory and numerical linear algebra, in Algorithms for Approximation II, J. C. Mason and M. G. Cox, eds., Chapman and Hall, London, 1990, pp. 336–360.
[270] L. N. Trefethen, Pseudospectra of matrices, in Numerical Analysis 1991, D. F. Griffiths and G. A. Watson, eds., Longman Sci. Tech., Harlow, Essex, UK, 1992, pp. 234–266.
[271] L. N. Trefethen, Computation of pseudospectra, Acta Numerica, 8 (1999), pp. 247–295.
[272] L. N. Trefethen, Personal communication, February 2003.
[273] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[274] L. N. Trefethen, M. Contedini, and M. Embree, Spectra, pseudospectra, and localization for random bidiagonal matrices, Comm. Pure Appl. Math., 54 (2001), pp. 594–623.
[275] L. N. Trefethen and M. Embree, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, Princeton, 2005.
[276] W. F. Trench, An algorithm for the inversion of finite Toeplitz matrices, J. Soc. Indust. Appl. Math., 12 (1964), pp. 515–522.
[277] W. F. Trench, Inversion of Toeplitz band matrices, Math. Comp., 28 (1974), pp. 1089–1095.
[278] W. F. Trench, On the eigenvalue problem for Toeplitz band matrices, Linear Algebra Appl., 64 (1985), pp. 199–214.
[279] W. F. Trench, Asymptotic distribution of the spectra of a class of generalized Kac-Murdock-Szegö matrices, Linear Algebra Appl., 294 (1999), pp. 181–192; erratum in Linear Algebra Appl., 320 (2000), p. 213.
[280] W. F. Trench, Asymptotic distribution of the even and odd spectra of real symmetric Toeplitz matrices, Linear Algebra Appl., 302/303 (1999), pp. 155–162.
[281] W. F. Trench, Spectral distribution of generalized Kac-Murdock-Szegö matrices, Linear Algebra Appl., 347 (2002), pp. 251–273.
[282] E. E. Tyrtyshnikov, Influence of matrix operations on the distribution of eigenvalues and singular values of Toeplitz matrices, Linear Algebra Appl., 207 (1994), pp. 225–249.
[283] E. E. Tyrtyshnikov, Circulant preconditioners with unbounded inverses, Linear Algebra Appl., 216 (1995), pp. 1–24.
[284] E. E. Tyrtyshnikov, A unifying approach to some old and new theorems on distribution and clustering, Linear Algebra Appl., 232 (1996), pp. 1–43.
[285] E. E. Tyrtyshnikov and N. L. Zamarashkin, Toeplitz eigenvalues for Radon measures, Linear Algebra Appl., 343/344 (2002), pp. 345–354.
[286] J. L. Ullman, A problem of Schmidt and Spitzer, Bull. Amer. Math. Soc., 73 (1967), pp. 883–885.
[287] V. S. Vladimirov and I. V. Volovich, A model of statistical physics, Teoret. Mat. Fiz., 54 (1983), pp. 8–22 (in Russian).
[288] R. Vreugdenhil, The resolution of the identity for selfadjoint Toeplitz operators with rational matrix symbol, Integral Equations Operator Theory, 20 (1994), pp. 449–490.
[289] E. Wegert and L. N. Trefethen, From the Buffon needle problem to the Kreiss matrix theorem, Amer. Math. Monthly, 101 (1994), pp. 132–139.
[290] H. Widom, On the eigenvalues of certain Hermitian operators, Trans. Amer. Math. Soc., 88 (1958), pp. 491–522.
[291] H. Widom, Extreme eigenvalues of translation kernels, Trans. Amer. Math. Soc., 100 (1961), pp. 252–262.
[292] H. Widom, Extreme eigenvalues of N-dimensional convolution operators, Trans. Amer. Math. Soc., 106 (1963), pp. 391–414.
[293] H. Widom, Toeplitz determinants with singular generating functions, Amer. J. Math., 95 (1973), pp. 333–383.
[294] H. Widom, Asymptotic behavior of block Toeplitz matrices and determinants, II, Advances in Math., 21 (1976), pp. 1–29.
[295] H. Widom, On the singular values of Toeplitz matrices, Z. Anal. Anwendungen, 8 (1989), pp. 221–229.
[296] H. Widom, Eigenvalue distribution of nonselfadjoint Toeplitz matrices and the asymptotics of Toeplitz determinants in the case of nonvanishing index, Oper. Theory Adv. Appl., 48 (1990), pp. 387–421.
[297] H. Widom, Eigenvalue distribution for nonselfadjoint Toeplitz matrices, Oper. Theory Adv. Appl., 71 (1994), pp. 1–8.
[298] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, UK, 1965.
[299] T. Wright, EigTool software package, Web site: http://www.comlab.ox.ac.uk/pseudospectra/eigtool.
[300] T. Wright, Algorithms and Software for Pseudospectra, Thesis, University of Oxford, Oxford, UK, 2002.
[301] N. L. Zamarashkin and E. E. Tyrtyshnikov, Distribution of the eigenvalues and singular numbers of Toeplitz matrices under weakened requirements on the generating function, Sb. Math., 188 (1997), pp. 1191–1201.
[302] N. L. Zamarashkin and E. E. Tyrtyshnikov, On the distribution of the eigenvectors of Toeplitz matrices under weakened requirements for the generating function, Russian Math. Surveys, 52 (1997), pp. 1333–1334.
[303] P. Zizler, R. A. Zuidwijk, K. F. Taylor, and S. Arimoto, A finer aspect of eigenvalue distribution of selfadjoint band Toeplitz matrices, SIAM J. Matrix Anal. Appl., 24 (2002), pp. 59–67.
[304] A. Zygmund, Trigonometric Series, Vol. I, Cambridge University Press, Cambridge, UK, 1988.
Index

‖·‖p, 4, 60
‖·‖∞, 12, 60
‖·‖F, 225
‖·‖tr, 249
, 96
∼, 101
ã, 5
a , 250, 261
aj(p)(A), 211
algebra
  Banach, 240
  C*, 240
  Følner, 377
  irrational rotation, 378
  Wiener, 2
algorithm
  fast, 77
  superfast, 77
Anderson model, 374
approximation number, 211
asymptotically extended sequence, 302
asymptotically good pseudoeigenvalue, 300
asymptotically good pseudomode, 300
asymptotically localized sequence, 302, 305
Avram-Parter theorem, 219
Banach algebra, 240
  unital, 240
Banach-Steinhaus theorem, 60
Baxter-Gohberg-Feldman theorem, 63
Baxter-Schmidt formula, 37
Bergman space, 75
beta distribution, 233
bounded variation, 219
branch point, 264, 367
Brown-Halmos theorem, 102
BV, BV[a, b], 219
B(X), 9
B(X, Y), 59
C, complex numbers
c0, 60
Cn(b), 33
C*-algebra, 240
χn, 4
Cauchy singular integral operator, 75
Cauchy's interlacing theorem, 224
Chebyshev polynomial, 22
circ, 32
circulant matrix, 32
cluster, 224
Coker A, 9
cokernel, 9
componentwise condition number, 332
condition number, 137
  componentwise, 332
  for matrix inversion, 328
  full structured, 313
  normwise, 313
  structured, 313
confluent Vandermonde, 42
conv, convex hull
convergence
  strong, 59
  uniform, 59
  weak, 59
critical behavior, 177
critical transient phase, 177
D, open unit disk
∂, boundary
Dn(a), 31
djk(λ), 347
determinant, 46
discrete Hamiltonian, 335
discrete Laplacian, 335
Duduchava-Roch formula, 92
E, expected value
Ej, 347
Ejj, 335
Ejk, 347
E(a), 43
ηβ, 90
eigenvalue density, 379
essential spectrum, 9
exp W, exp W±, 7
expectation, 225
exponentially decaying sequence, 15
extended sequence, 17
Fj(n), 211
factorization
  Wiener-Hopf, 7
fast algorithm, 77
Fejér mean, 122
field of values, 167
finite section method, 64
Følner algebra, 377
formula
  Baxter-Schmidt, 37
  Duduchava-Roch, 92
  Gohberg-Sementsul, 77
  Trench's, 41
  Widom's, 38, 65
Fourier coefficients, 2
Fourier matrix, 32
Fredholm operator, 9
function of bounded variation, 219
G(a), 43
Gk(An), 179
GW, GW±, 6
Galerkin method, 74
Gauss-Seidel iteration, 187
Gohberg-Sementsul formula, 77
H(a), 3
H2, 11
Hjk(b), 347, 355
HX(A), Hp(A), 167
Hadamard's inequality, 80
Hamiltonian
  discrete, 335
Hankel matrix, 2
Hardy space, 11
Hardy's inequality, 91
Hatano-Nelson model, 375
higher order relative spectrum, 175
Hilbert-Schmidt operator, 45
Hirschman's theorem, 274
homomorphism of C*-algebras, 241
hull
  polynomial convex, 208
  polynomial numerical, 179
Hurwitz' theorem, 355
Im A, 9
image, 9
Ind A, 9
index, 9
inequality
  Hadamard's, 80
  Hardy's, 91
instability index, 155
interval matrix, 256
inverse closedness, 240
involution, 240
irrational rotation C*-algebra, 378
Jn(λ), 178
Jacobi's theorem, 48
κp(An), 137
κ(A, x), 315
κb(A, x), 315
κStr(A), 328
κStr(A, x), 313
κbStr(A, x), 313
κfull(A, x), 315
κfullStr(A, x), 313
K(X), 9
K(X, Y), 59
Ker A, 9
kernel, 9
Krein-Rutman theorem, 253
Kreiss matrix theorem, 181
L(a), 28
Lp := Lp(T), 11
Λ(b), 262
Λs(b), 262
Λw(b), 262
ℓp := ℓp(Z+), 3
ℓ2(β), 90
ℓpn, 79
Laplacian
  discrete, 335
Laurent matrix, 28
Laurent polynomial, 8
lim inf Mn, 163
lim sup Mn, 163
lin, linear hull
log, natural logarithm
Mn(K), 313
μn(E), μ(E), 223
matrix
  circulant, 32
  finite Toeplitz, 31
  Fourier, 32
  infinite Hankel, 2
  infinite Toeplitz, 1
  Laurent, 28
  positive definite, 101
  positive semi-definite, 101
  Toeplitz-like, 123
  tridiagonal Toeplitz, 34
monodromy group, 367
N, natural numbers
Nn(E), 223
norm surface, 178
normal solvability, 9
normwise condition number, 313
nowhere locally constant, 364
numerical range, 167
O, 249
operator
  Cauchy singular integral, 75
  Fredholm, 9
  Hilbert-Schmidt, 45
  normally solvable, 9
  trace class, 45
  Wiener-Hopf, 74
order of a zero, 88
P, probability
Pn, 46, 61, 377
P, 8
P+, 8
Ps+, Pn+, 8, 85
Pr, 8, 183
Pr,s, 8, 309
(A, x), 315
Str(A, x), 313
polynomial convex hull, 208
polynomial numerical hull, 179
positive definite, 101
positive semi-definite, 101
preconditioning, 259
proper cluster, 224
pseudoeigenvalue, 300
  asymptotically good, 300
pseudomode, 300
  asymptotically good, 300
pseudospectrum, 157
  structured, 157
Qn, 49, 61, 377
R, real numbers
R(a), 101
Rn(λ), 181
rad A, 12
resolution of the identity, 21
resolvent, 9
Riesz-Markov theorem, 377
S(b), 96
σ(t), 335
σj, 225
σj(A), 211
σmin(A), 59
σn b, 122
σ2, variance
σj(p)(A), 211
(A), p(A), 212
Schmidt-Spitzer theorem, 274
second order relative spectrum, 175
sequence
  asymptotically extended, 302
  asymptotically localized, 302, 305
  exponentially decaying, 15
  extended, 17
  stable, 61
singular value, 211
singular value decomposition, 212
singular value interlacing, 216
sky region, 178
sp A, 9
spess A, 9
spε A, 157
spεm A, 341
spε(p) A, 157
spεB,C A, 157
sp(j,k) A, 347
space
  Bergman, 75
  Hardy, 11
spectral distribution, 377
spectral radius, 12
spectrum, 9
  absolutely continuous, 21
  essential, 9
  higher order relative, 175
  point, 21
  second order relative, 175
  singular continuous, 21
splitting phenomenon, 212
stable sequence, 61
Stone's formula, 21
Strn(K), 313
strong convergence, 59
structured condition number, 313
  for matrix inversion, 328
  full, 313
structured pseudospectrum, 157
superfast algorithm, 77
symbol, 3
Szegö's strong limit theorem, 44
Szegö-Widom limit theorem, 47
T, complex unit circle
T(a), 3
T−1(a) := (T(a))−1
Tn(a), 31
Tn−1(a) := (Tn(a))−1
theorem
  Avram-Parter, 219
  Banach-Steinhaus, 60
  Baxter-Gohberg-Feldman, 63
  Brown-Halmos, 102
  Cauchy's interlacing, 224
  Hirschman's, 274
  Hurwitz', 355
  Jacobi's, 48
  Krein-Rutman, 253
  Kreiss matrix, 181
  Riesz-Markov, 377
  Schmidt-Spitzer, 274
  singular value interlacing, 216
  Szegö's strong limit, 44
  Szegö-Widom limit, 47
  Wiener's, 6
Toeplitz matrix, 1, 31
Toeplitz-like matrix, 123
tr A, 248
trace, 248
trace class operator, 45
tracial state, 377
Trench's formula, 41
tridiagonal Toeplitz matrix, 34
uniform boundedness principle, 60
uniform convergence, 59
V(a), 165
V[a,b] f, 219
variance, 225
W, 2
W±, 6
Wn, 64
weak convergence, 59
Widom's formula, 38, 65
Wiener algebra, 2
Wiener's theorem, 6
Wiener-Hopf factorization, 7
Wiener-Hopf operator, 74
wind a, 7
wind (a, λ), 15
winding number, 7
ξβ, 90
Z, integers
Z+, nonnegative integers