GRAPH SPECTRA FOR COMPLEX NETWORKS
PIET VAN MIEGHEM
Delft University of Technology
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521194587
© Cambridge University Press 2011
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2011
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
ISBN 978-0-521-19458-7 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
to Saskia
who tolerated with love that science is a passionate and most demanding mistress
Contents
Preface
Acknowledgements
Symbols
1 Introduction
1.1 Interpretation and contemplation
1.2 Outline of the book
1.3 Classes of graphs
1.4 Outlook
Part I Spectra of graphs
2 Algebraic graph theory
2.1 Graph related matrices
2.2 Walks and paths
3 Eigenvalues of the adjacency matrix
3.1 General properties
3.2 The number of walks
3.3 Regular graphs
3.4 Bounds for the largest, positive eigenvalue $\lambda_1$
3.5 Eigenvalue spacings
3.6 Additional properties
3.7 The stochastic matrix $P = \Delta^{-1} A$
4 Eigenvalues of the Laplacian $Q$
4.1 General properties
4.2 Second smallest eigenvalue of the Laplacian $Q$
4.3 Partitioning of a graph
4.4 The modularity and the modularity matrix $M$
4.5 Bounds for the diameter
4.6 Eigenvalues of graphs and subgraphs
5 Spectra of special types of graphs
5.1 The complete graph
5.2 A small-world graph
5.3 A circuit on $N$ nodes
5.4 A path of $N - 1$ hops
5.5 A path of $h$ hops
5.6 The wheel $W_{N+1}$
5.7 The complete bipartite graph $K_{m,n}$
5.8 A general bipartite graph
5.9 Complete multi-partite graph
5.10 An $m$-fully meshed star topology
5.11 A chain of cliques
5.12 The lattice
6 Density function of the eigenvalues
6.1 Definitions
6.2 The density when $N \to \infty$
6.3 Examples of spectral density functions
6.4 Density of a sparse regular graph
6.5 Random matrix theory
7 Spectra of complex networks
7.1 Simple observations
7.2 Distribution of the Laplacian eigenvalues and of the degree
7.3 Functional brain network
7.4 Rewiring Watts-Strogatz small-world graphs
7.5 Assortativity
7.6 Reconstructability of complex networks
7.7 Reaching consensus
7.8 Spectral graph metrics
Part II Eigensystem and polynomials
8 Eigensystem of a matrix
8.1 Eigenvalues and eigenvectors
8.2 Functions of a matrix
8.3 Hermitian and real symmetric matrices
8.4 Vector and matrix norms
8.5 Non-negative matrices
8.6 Positive (semi) definiteness
8.7 Interlacing
8.8 Eigenstructure of the product $AB$
8.9 Formulae of determinants
9 Polynomials with real coefficients
9.1 General properties
9.2 Transforming polynomials
9.3 Interpolation
9.4 The Euclidean algorithm
9.5 Descartes' rule of signs
9.6 The number of real zeros in an interval
9.7 Locations of zeros in the complex plane
9.8 Zeros of complex functions
9.9 Bounds on values of a polynomial
9.10 Bounds for the spacing between zeros
9.11 Bounds on the zeros of a polynomial
10 Orthogonal polynomials
10.1 Definitions
10.2 Properties
10.3 The three-term recursion
10.4 Zeros of orthogonal polynomials
10.5 Gaussian quadrature
10.6 The Jacobi matrix
References
Index
Preface
During the first years of the third millennium, considerable interest arose in complex networks such as the Internet, the world-wide web, biological networks, utility infrastructures (for transport of energy, waste, water, trains, cars and aircraft), social networks, human brain networks, and so on. It was realized that complex networks are omnipresent and of crucial importance to humanity, whose still augmenting living standards increasingly depend on complex networks. Around the beginning of the new era, general laws such as "preferential attachment" and the "power law of the degree" were observed in many, totally different complex networks. This fascinating coincidence gave birth to an area of new research that is still continuing today. But, as is often the case in science, deeper investigations lead to more questions and to the conclusion that so little is understood of (large) networks. For example, the rather simple but highly relevant question "What is a robust network?" seems beyond the realm of present understanding. The most natural way to embark on solving the question consists of proposing a set of metrics that tend to specify and quantify "robustness". Soon one discovers that there is no universal set of metrics, and that the metrics of any set are dependent on each other and on the structure of the network.

Any complex network can be represented by a graph. Any graph can be represented by an adjacency matrix, from which other matrices such as the Laplacian are derived. These graph related matrices are defined in Chapter 2. One of the most beautiful aspects of linear algebra is the notion that, to each matrix, a set of eigenvalues with corresponding eigenvectors can be associated. The physical meaning of an "eigen" system is best understood by regarding the matrix as a geometric transformation of "points" in a space.
Those "points" define a vector: a line segment from an origin that ends in the particular point and that is directed from origin to end. The transformation (rotation, translation, scaling) of the vector is again a vector in the same space, but generally different from the original vector. The vector that turns out to be proportional to itself after the transformation is called an eigenvector, and the proportionality strength or scaling factor is the eigenvalue. The Dutch and German adjective "eigen" means something that is inherent to itself, a characteristic or fundamental property. Thus, knowing that each graph is represented by a matrix, it is natural to investigate the "eigensystem", the set of all eigenvalues with corresponding eigenvectors, because the "eigensystem" characterizes the graph. Even stronger, since both the adjacency and the Laplacian matrix are symmetric, there is a one-to-one correspondence between the matrix and the "eigensystem", established in art. 151.

In a broader context, transformations have proved very fruitful in science. The most prominent is undoubtedly the Fourier (or Laplace) transform. Many branches of science, ranging from mathematics and physics to engineering, abound with examples that show the power and beauty of the Fourier transform. The general principle of such transforms is that one may study the problem in either of two domains, the original one and the domain after transformation, and that there exists a one-to-one correspondence between both domains. For example, a signal is a continuous function of time that may represent a message or some information produced over time. Some properties of the signal are more appropriately studied in the time domain, while others are better studied in the transformed domain, the frequency domain. This analogy motivates us to investigate some properties of a graph in the topology domain, represented by a graph consisting of a set of nodes connected by a set of links, while other properties may be more conveniently dealt with in the spectral domain, specified by the set of eigenvalues and eigenvectors. The duality between topology and spectral domain is, of course, not new and has been studied in the field of mathematics called algebraic graph theory. Several books on the topic, for example by Cvetković et al. (1995); Biggs (1996); Godsil and Royle (2001) and recently by Cvetković et al. (2009), have already appeared. Notwithstanding these books, the present one is different in a few aspects. First, I have tried to build up the theory as a connected set of basic building blocks, called articles, which are abbreviated by art. The presentation in article-style was inspired by great scholars in the past, such as Gauss (1801) in his great treatise Disquisitiones Arithmeticae, Titchmarsh (1964) in his Theory of Functions, and Hardy and Wright (1968) in their splendid Introduction to the Theory of Numbers, and many others who cannot all be mentioned.
To some extent, it is a turning back to the past, when books were written for peers and without exercise sections, which currently seem standard in almost all books. Thus, this book does not contain exercises. Second, the book focuses on general theory that applies to all graphs, and much less on particular graphs with special properties, of which the Petersen graph, shown in Fig. 2.3, is perhaps the champion among all. In that respect, the book does not deal with a zoo of special graphs and their properties, but confines itself to a few classes of graphs that depend on at least a single parameter, such as the number of nodes, that can be varied. Complex networks all differ and vary in at least some parameters. Less justifiable is the omission of multigraphs, directed graphs and weighted graphs. Third, I have attempted to make the book as self-contained as possible and, as a peculiar consequence, the original appendices consumed about half of the book! Thus, I decided to create two parts: the main Part I on the spectra, while Part II overviews interesting results in linear algebra and the theory of polynomials that are used in Part I. Since each chapter in Part II discusses a wide area in mathematics, separate books on each topic would in fact be required. Hence, only the basic theory is discussed, while advanced topics are not covered, because the goal of including Part II was to support Part I. Besides being supportive, Part II contains interesting theory that opens possibilities to advance spectral results. For example, Laguerre's beautiful Theorem 51 may one day be applied to the characteristic polynomials of a class of graphs with the same number of negative, positive and zero eigenvalues of the adjacency matrix.

A drawback is that the book does not contain a detailed list of references pointing to the original, first published papers: it was not my intention to survey the literature on the spectra of graphs, but rather to write a cohesive manuscript on results and on methodology. Sometimes, different methods or new proofs of the same result are presented. The monograph by Cvetković et al. (1995), complemented by Cvetković et al. (2009), still remains the invaluable source for references and tables of graph spectra. The book is a temporal reflection of the current state of the art: during the process of writing, progress is being made. In particular, the many bounds that typify the field are continuously improved. The obvious expectation is that future progress will increasingly shape and fine-tune the field into (hopefully) maturity. Hence, the book will surely need to be updated, and all input is most welcome. Finally, I hope that the book may be of use to others and that it may stimulate and excite people to dive into the fascinating world of complex networks with the rigorous devices of algebraic graph theory offered here.
Ars mathematicae
October 2010
Piet Van Mieghem
Acknowledgements
I would like to thank Huijuan Wang for her general interest, input and help in pointing me to interesting articles. Further, I am most grateful to Fernando Kuipers for proofreading the first version of the manuscript, to Roeloef Koekoek for reviewing Chapter 10 on orthogonal polynomials, and to Jasmina Omic for the numerical evaluation of bounds on the largest eigenvalue of the adjacency matrix. Javier Martin Hernandez, Dajie Liu and Xin Ge have provided me with many nice pictures of graphs and plots of spectra. Stojan Trajanovski has helped me with the m-dimensional lattice and art. 105. Wynand Winterbach showed that the assortativity of regular graphs is not necessarily equal to one, by pointing to the example of the complete graph minus one link (Section 7.5.1.1). Rob Kooij has constructed Fig. 4.1 as a counterexample to the common belief that Fiedler's algebraic connectivity is always an adequate metric for network robustness with respect to graph disconnectivity. As in my previous book (Van Mieghem, 2006b), David Hemsley has suggested a number of valuable textual improvements.
Symbols
We deviate from the standard notation and symbols outlined here only when this is explicitly mentioned. Random variables and matrices are written with capital letters, while complex, real, integer, etc., variables are in lower case. For example, $X$ refers to a random variable and $A$ to a matrix, whereas $x$ is a real number and $z$ is a complex number. Usually, $i, j, k, l, m, n$ are integers. Operations on random variables are denoted by $[\cdot]$, whereas $(\cdot)$ is used for real or complex variables. A set of elements is embraced by $\{\cdot\}$.
Linear algebra
$A = \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix}$: $n \times m$ matrix
$\det A$: determinant of a square matrix $A$; also denoted by $\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$
$\operatorname{trace}(A) = \sum_{j=1}^{n} a_{jj}$: sum of the diagonal elements of $A$
$\operatorname{diag}(a_k) = \operatorname{diag}(a_1, a_2, \ldots, a_n)$: a diagonal matrix with the listed diagonal elements, while all off-diagonal elements are zero
$A^T$: transpose of a matrix; the rows of $A$ are the columns of $A^T$
$A^*$: matrix in which each element is the complex conjugate of the corresponding element in $A$
$A^H = (A^*)^T$: Hermitian of matrix $A$
$c_A(x) = \det(A - xI)$: characteristic polynomial of $A$
$\operatorname{adj} A = A^{-1} \det A$: adjugate of $A$
$Q(\lambda) = c_A(\lambda)(\lambda I - A)^{-1}$: adjoint of $A$
$e_j$: basic vector, all components are zero, except for component $j$ that is 1
$\delta_{kj}$: Kronecker delta, $\delta_{kj} = 1$ if $k = j$, else $\delta_{kj} = 0$
Probability theory
$\Pr[X]$: probability of the event $X$
$E[X] = \mu$: expectation of the random variable $X$
$\operatorname{Var}[X] = \sigma_X^2$: variance of the random variable $X$
$f_X(x) = \frac{dF_X(x)}{dx}$: probability density function of $X$
$F_X(x)$: probability distribution function of $X$
$\varphi_X(z)$: probability generating function of $X$; $\varphi_X(z) = E[z^X]$ when $X$ is a discrete r.v., and $\varphi_X(z) = E[e^{-zX}]$ when $X$ is a continuous r.v.
$\{X_k\}_{1 \le k \le m} = \{X_1, X_2, \ldots, X_m\}$
$X_{(k)}$: $k$-th order statistic, the $k$-th smallest value in the set $\{X_k\}_{1 \le k \le m}$
$P$: transition probability matrix (Markov process)
$1_{\{x\}}$: indicator function: $1_{\{x\}} = 1$ if the event or condition $\{x\}$ is true, else $1_{\{x\}} = 0$. For example, $\delta_{kj} = 1_{\{k=j\}}$
Graph theory
$\mathcal{L}$: set of links in a graph
$\mathcal{N}$: set of nodes in a graph
$L = |\mathcal{L}|$: number of links in a graph
$N = |\mathcal{N}|$: number of nodes in a graph
$K_N$: the complete graph with $N$ nodes
$K_{n,m}$: the complete bipartite graph with $N = n + m$ nodes
$d_j$: degree of node $j$
$d_{(j)}$: the $j$-th largest degree of a node in a graph
$d_{\max}$: maximum degree in the graph
$d_{\min}$: minimum degree in the graph
$D$: degree in a graph (random variable)
$\kappa_N(G)$: vertex (node) connectivity of graph $G$
$\kappa_L(G)$: edge (link) connectivity of graph $G$
$H$: hopcount in a graph (random variable)
$A$: adjacency matrix of graph $G$
$B$: incidence matrix of graph $G$
$Q = BB^T$: Laplacian matrix of graph $G$
$J$: all-one matrix
$u$: all-one vector
$\blacktriangle_G$: the number of triangles in $G$
$\Delta = \operatorname{diag}(d_1, d_2, \ldots, d_N)$: diagonal matrix of the nodal degrees
$\{\lambda_k\}_{1 \le k \le N}$: set of eigenvalues of $A$ ordered as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$
$\{\mu_k\}_{1 \le k \le N}$: set of eigenvalues of $Q$ ordered as $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_N$
$N_k$: total number of walks with length $k$
$W_k$: number of closed walks with length $k$
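The walk counts $N_k$ and $W_k$ in the list above can be obtained from powers of the adjacency matrix. Below is a minimal pure-Python sketch. It assumes the standard identities: $(A^k)_{ij}$ counts the walks of length $k$ from $i$ to $j$, so $N_k = u^T A^k u$ and $W_k = \operatorname{trace}(A^k)$. The function names are ours, chosen for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)] for i in range(n)]


def matpow(a, k):
    """A^k for k >= 0 by repeated multiplication (adequate for small graphs)."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(k):
        result = matmul(result, a)
    return result


def total_walks(a, k):
    """N_k = u^T A^k u: number of walks with length k, summed over all node pairs."""
    return sum(sum(row) for row in matpow(a, k))


def closed_walks(a, k):
    """W_k = trace(A^k): number of closed walks with length k."""
    ak = matpow(a, k)
    return sum(ak[j][j] for j in range(len(a)))


# Complete graph K_4: every node has degree 3, L = 6 links and 4 triangles.
k4 = [[int(i != j) for j in range(4)] for i in range(4)]
print(total_walks(k4, 1), closed_walks(k4, 3))  # 12 24
```

The two printed numbers check the identities. First, $N_1 = 2L$. Second, each triangle is traversed as a closed walk of length 3 from each of its 3 nodes in 2 directions, so $\blacktriangle_G = W_3/6$, which gives $24/6 = 4$ for $K_4$.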
1 Introduction
Despite the fact that complex networks are the driving force behind the investigation of the spectra of graphs, it is not the purpose of this book to dwell on complex networks. A generally accepted, all-encompassing definition of a complex network does not seem to be available. Instead, complex networks are understood by instantiation: the Internet, transportation (car, train, airplane) and infrastructural (electricity, gas, water, sewer) networks, biological molecules, the human brain network, social networks and software dependency networks are examples of complex networks. By now, the literature on complex networks, predominantly in the physics community, is so large that providing a detailed survey is a daunting task. We content ourselves here with referring to some review articles by Strogatz (2001); Newman et al. (2001); Albert and Barabási (2002); Newman (2003b), to books in the field by Watts (1999); Barabási (2002); Dorogovtsev and Mendes (2003); Barrat et al. (2008); Dehmer and Emmert-Streib (2009); Newman (2010), and to references in these works. Applications of spectral graph theory to chemistry and physics are found in Cvetković et al. (1995, Chapter 8).

Complex networks can be represented by a graph, denoted by $G$, consisting of a set $\mathcal{N}$ of $N$ nodes connected by a set $\mathcal{L}$ of $L$ links. Sometimes, nodes and links are called vertices and edges, respectively, and are correspondingly denoted by the sets $V$ and $E$. Here and in my book on Performance Analysis (Van Mieghem, 2006b), a graph is denoted by $G(\mathcal{N}, \mathcal{L})$ or $G(N, L)$ to avoid conflicts with the expectation operator $E$ in probability theory. There is no universal notation for a graph: in graph theory $G = (V, E)$ often occurs, while in network theory and other applied fields, nodes and links are used and the notation $G(N, L)$ appears. None of these notations is ideal or optimized but, fortunately, in most cases the notation $G$ for a graph seems sufficient.
Graphs, in turn, can be represented by a matrix (art. 1). The simplest among these graph-associated matrices is the adjacency matrix $A$, whose entries or elements are
$$a_{ij} = 1_{\{\text{node } i \text{ is connected to node } j\}} \qquad (1.1)$$
where $1_{\{x\}}$ is the indicator function, equal to one if the condition $x$ is true, else zero.
All elements $a_{ij}$ of the adjacency matrix are thus either 1 or 0, and $A$ is symmetric for undirected graphs. Unless mentioned otherwise, we assume in this book that the graph is undirected and that $A$ (and other graph-associated matrices) are symmetric. If the graph consists of $N$ nodes and $L$ links, then art. 151 demonstrates that the $N \times N$ symmetric adjacency matrix can be written as
$$A = X \Lambda X^T \qquad (1.2)$$
where the $N \times N$ orthogonal matrix $X$ contains as columns the eigenvectors $x_1, x_2, \ldots, x_N$ of $A$ belonging to the real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, and where the matrix $\Lambda = \operatorname{diag}(\lambda_k)$. This basic relation (1.2) equates the topology domain, represented by the adjacency matrix, to the spectral domain of the graph, represented by the eigensystem in terms of the orthogonal matrix $X$ of eigenvectors and the diagonal matrix $\Lambda$ with corresponding eigenvalues. The major difficulty lies in the map from topology to spectral domain, $A \to X \Lambda X^T$, because the inverse map from spectral to topology domain, $X \Lambda X^T \to A$, consists of straightforward matrix multiplications. Thus, most of the efforts in this book lie in computing or deducing properties of $X$ and $\Lambda$, given $A$. Even more confining, most energy is devoted to $\Lambda$ and to the distribution and properties of the eigenvalues $\{\lambda_j\}_{1 \le j \le N}$ of $A$ and of other graph related matrices. It is fair to say that not too much is known about the eigenvectors and the distribution and properties of eigenvector components. A state of the current art is presented by Cvetković et al. (1997).
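The asymmetry between the two maps in (1.2) can be made concrete. The sketch below is a minimal pure-Python illustration, not the book's method. It builds $A$ from a list of links following (1.1). It then obtains $X$ and $\Lambda$ with the classical cyclic Jacobi rotation method, which is our choice; any symmetric eigensolver would do. Finally, it recovers $A$ by the plain multiplication $X \Lambda X^T = \sum_k \lambda_k x_k x_k^T$. All function names are hypothetical.

```python
import math


def adjacency_matrix(n, links):
    """Zero-one adjacency matrix (1.1) of an undirected graph without self-loops."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in links:
        if i == j:
            raise ValueError("self-loops are not allowed")
        a[i][j] = a[j][i] = 1.0  # undirected graph: A is symmetric
    return a


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][q] * b[q][j] for q in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Cyclic Jacobi method for a real symmetric matrix (topology -> spectral domain).

    Returns (lam, x): lam[0] >= lam[1] >= ... and the orthogonal matrix x
    whose k-th column is a unit eigenvector belonging to lam[k].
    """
    n = len(a)
    m = [row[:] for row in a]  # driven towards the diagonal matrix Lambda
    x = identity(n)            # product of all rotations applied so far
    for _ in range(max_sweeps):
        off = math.sqrt(sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # rotation angle that makes the (p, q) entry of r^T m r vanish
                theta = 0.5 * math.atan2(2.0 * m[p][q], m[q][q] - m[p][p])
                c, s = math.cos(theta), math.sin(theta)
                r = identity(n)
                r[p][p], r[q][q], r[p][q], r[q][p] = c, c, s, -s
                m = matmul(transpose(r), matmul(m, r))
                x = matmul(x, r)
    order = sorted(range(n), key=lambda k: -m[k][k])
    lam = [m[k][k] for k in order]
    x = [[row[k] for k in order] for row in x]
    return lam, x


def reconstruct(lam, x):
    """Spectral -> topology domain: X Lambda X^T = sum_k lam_k x_k x_k^T."""
    n = len(lam)
    return [[sum(lam[k] * x[i][k] * x[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = adjacency_matrix(3, [(0, 1), (1, 2)])  # path of 2 hops on 3 nodes
lam, x = jacobi_eigen(a)
print(lam)  # approximately sqrt(2), 0 and -sqrt(2)
rec = reconstruct(lam, x)
print(max(abs(rec[i][j] - a[i][j]) for i in range(3) for j in range(3)))  # close to 0
```

The forward map needs an iterative algorithm whose cost grows quickly with $N$, whereas `reconstruct` is a single triple sum. This mirrors the asymmetry described above.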
1.1 Interpretation and contemplation
One of the most studied eigenvalue problems is the stationary Schrödinger equation in quantum mechanics (see, e.g., Cohen-Tannoudji et al. (1977)),
$$H \psi(r) = E \psi(r)$$
where $\psi(r)$ is the wave function and $E$ is the energy eigenvalue of the Hamiltonian (linear) differential operator
$$H = -\frac{\hbar^2}{2m} \Delta + V(r)$$
in which the Laplacian operator is $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$, $\hbar = \frac{h}{2\pi}$ and $h \approx 6.62 \times 10^{-34}$ Js is Planck's constant, $m$ is the mass of an object subject to a potential field $V(r)$, and $r$ is a three-dimensional location vector. The wave function $\psi(r)$ is generally complex, but $|\psi(r)|^2$ represents the density function of the probability that the object is found at position $r$. The mathematical theory of second-order linear differential operators is treated, for instance, by Titchmarsh (1962, 1958). The interpretation of the eigenfunction $\psi(r)$ of the Hamiltonian $H$ (the continuous counterpart of an eigenvector with discrete components) and of its corresponding energy eigenvalue $E$ is well understood. In contrast, the meaning of an eigenvector of a graph is rather vague and not satisfactory. An attempt is as follows. The basic
equation (8.1) of the eigenvalue problem, $Ax = \lambda x$, combined with the zero-one nature of the adjacency matrix $A$, states that the $j$-th component of the eigenvector $x_k$ belonging to eigenvalue $\lambda_k$ can be written as
$$\lambda_k (x_k)_j = (A x_k)_j = \sum_{l=1}^{N} a_{jl} (x_k)_l = \sum_{l \text{ is a direct neighbor of } j} (x_k)_l \qquad (1.3)$$
Since $a_{jj} = 0$, the eigenvector component $(x_k)_j$ weighted (multiplied) by the eigenvalue $\lambda_k$ equals the sum of the other eigenvector components $(x_k)_l$ over all direct neighbors $l$ of node $j$. Since all eigenvectors are orthogonal¹, each eigenvector can be interpreted as describing a different inherent property of the graph. What that property means is still unclear. However, the basic eigenvalue equation (1.2) says that there are only $N$ such inherent properties, and the orthogonality of $X$, or of the eigenvectors, tells us that these inherent properties are independent. The component equation (1.3) then expresses the following. At each node $j$, the value $(x_k)_j$ of the inherent property $k$, belonging to the eigenvalue $\lambda_k$ and specified by the eigenvector $x_k$, equals a weighted sum of those values $(x_k)_l$ over all its direct neighbors $l$, and each such sum has the same weight $\frac{1}{\lambda_k}$. This holds provided $\lambda_k \neq 0$; otherwise one may say that the average over all direct neighbors of those values $(x_k)_l$ is zero. Since both sides of the basic equation (8.1), $Ax = \lambda x$, can be multiplied by some non-zero number or quantity, we may interpret that the value of property $k$ is expressed in different "physical" units. Perhaps, depending on the nature of the complex network, some of these units can be determined or discovered, but the pure mathematical description (8.1) of the eigenvalue problem does not contain this information. Although the focus here is on eigenvectors, equation (1.3) also provides interesting information about the eigenvalues, for which we refer to art. 172.

Equation (1.3) reflects a local property with value $(x_k)_j$ that only depends on the corresponding values $(x_k)_l$ of direct neighbors. But this local property for node $j$ holds globally for any node $j$, with the same strength or factor $\lambda_k$. This local and global aspect of the eigenstructure is another fascinating observation, and it is preserved under "self-replication". Indeed, substituting (1.3), written for each neighbor $l$ instead of $j$, into the right-hand side of (1.3) yields
$$\lambda_k^2 (x_k)_j = \sum_{l_1=1}^{N} a_{jl_1} \sum_{l_2=1}^{N} a_{l_1 l_2} (x_k)_{l_2} = \sum_{l_2=1}^{N} (A^2)_{jl_2} (x_k)_{l_2} = d_j (x_k)_j + \sum_{l_2 \neq j} (A^2)_{jl_2} (x_k)_{l_2}$$
The last sum runs over the second-hop neighbors $l_2$ of $j$, each counted $(A^2)_{jl_2}$ times, that is, as many times as there are two-hop walks from $j$ to $l_2$. The diagonal term follows because (see art. 30) $(A^2)_{jj} = \sum_{q=1}^{N} a_{jq} a_{qj} = \sum_{q=1}^{N} a_{jq}^2$ by the symmetry $A = A^T$. In addition, $\sum_{q=1}^{N} a_{jq}^2 = \sum_{q=1}^{N} a_{jq} = d_j$ due to the zero-one nature of $a_{ij}$, where $d_j$ is the degree (the number of neighbors) of node $j$. The idea can be continued: a subsequent substitution of (1.3) leads to an expression that involves a sum over all nodes three hops away from node $j$. Subsequent iterations relate the expansion of the graph around node $j$ in the number of hops $m$, further elaborated in art. 17 and art. 36, to the eigenvalue structure as
$$\lambda_k^m (x_k)_j = \sum_{l_m \text{ is an } m\text{-th hop neighbor of } j} (A^m)_{jl_m} (x_k)_{l_m} \qquad (1.4)$$
where an $m$-th hop neighbor $l_m$ is a node reached from $j$ by a walk of $m$ hops, so that $(A^m)_{jl_m} > 0$.

¹ Mathematically, the eigenvectors form an orthogonal basis that spans the entire $N$-dimensional space. Each eigenvector "adds" or specifies one dimension or one axis (orthogonal to all others) in that $N$-dimensional coordinate frame.
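Equations (1.3) and (1.4) can be checked numerically on a graph whose eigenvectors are known in closed form. The sketch below assumes the standard result for the circuit (cycle) on $N$ nodes. Its adjacency matrix has the eigenvalues $\lambda = 2\cos(2\pi k/N)$ with (unnormalized) eigenvectors $(x)_j = \cos(2\pi k j/N)$, $j = 0, \ldots, N-1$. Because both equations are linear in $x$, the missing normalization does not matter. The function names are illustrative.

```python
import math


def cycle_adjacency(n):
    """Adjacency matrix of the circuit on n nodes: node j is linked to j - 1 and j + 1 (mod n)."""
    a = [[0] * n for _ in range(n)]
    for j in range(n):
        a[j][(j + 1) % n] = a[(j + 1) % n][j] = 1
    return a


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)] for i in range(n)]


def cycle_eigenpair(n, k):
    """Assumed closed form: lambda = 2 cos(2 pi k / n), (x)_j = cos(2 pi k j / n)."""
    theta = 2 * math.pi * k / n
    return 2 * math.cos(theta), [math.cos(theta * j) for j in range(n)]


def hop_expansion_error(a, lam, x, m):
    """Largest violation over all nodes j of lam^m (x)_j = sum_l (A^m)_{jl} (x)_l, cf. (1.4)."""
    n = len(a)
    am = a
    for _ in range(m - 1):
        am = matmul(am, a)
    return max(abs(lam ** m * x[j] - sum(am[j][l] * x[l] for l in range(n))) for j in range(n))


n = 6
a = cycle_adjacency(n)
a2 = matmul(a, a)
print([a2[j][j] for j in range(n)])  # [2, 2, 2, 2, 2, 2]: (A^2)_jj equals the degree d_j
for k in range(n):
    lam, x = cycle_eigenpair(n, k)
    # m = 1 is the neighbor equation (1.3); m = 2 and m = 3 are instances of (1.4)
    print(k, [hop_expansion_error(a, lam, x, m) for m in (1, 2, 3)])
```

All reported errors are at the level of floating-point round-off.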
Again, this local expansion around node $j$ holds globally for any node $j$.

The alternative representation (8.31)
$$A = \sum_{k=1}^{N} \lambda_k x_k x_k^T$$
shows that there is a hierarchy in importance of the properties, specified by the absolute value of the eigenvalues, because all eigenvectors are scaled to have equal unit norm. In particular, possible zero eigenvalues contain properties that the graph does not possess, because the corresponding eigenvectors do not contribute to the structure of the graph, namely the adjacency matrix $A$. In contrast, the properties belonging to the largest (in absolute value) eigenvalues have a definite and strong influence on the graph structure.

Another observation² is that the definition of the adjacency matrix $A$ is somewhat arbitrary. Indeed, we may agree to assign the value $\alpha$ to the existence of a link and $\beta$ otherwise, where $\alpha$ and $\beta \neq \alpha$ can be any complex numbers. Clearly, the graph is then equally well described by a new adjacency matrix $A(\alpha, \beta) = (\alpha - \beta) A + \beta J$, where $J$ is the all-one matrix. Unless $\alpha = 1$ and $\beta = 0$, the eigenvalues of $A(\alpha, \beta)$ differ from those of $A$, and for $\beta \neq 0$ the eigenvectors generally differ as well. This implies that an entirely different, but still consistent, theory of the spectra of graphs can be built. We have not pursued this track here, although we believe that for certain problems a more appropriate choice of $\alpha$ and $\beta$ may simplify the solution.

When encountering the subject for the first time, one may wonder where all the energy is spent. After all, the problem of finding the eigenvalues of $A$, reviewed in Chapter 8, basically boils down to finding the zeros of the associated characteristic polynomial (art. 138). In addition, we know (art. 1), due to the symmetry of $A$, that all zeros are real, a fact that considerably simplifies matters, as shown in Chapter 9. Indeed, nearly all polynomials with real coefficients possess complex zeros, and only a very small subset has zeros that are all real. This suggests that there must be something special about these eigenvalues and characteristic polynomials of $A$.
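A quick check that $A(\alpha, \beta)$ describes the same graph while changing the spectrum. The entries equal $\alpha$ on links and $\beta$ on non-adjacent pairs; with this definition, the diagonal also holds $\beta$. Since the trace equals the sum of the eigenvalues, $\operatorname{trace} A = 0$ while $\operatorname{trace} A(\alpha, \beta) = N\beta$, so the eigenvalues must change whenever $\beta \neq 0$. For $\beta = 0$ the matrix is $\alpha A$, whose eigenvalues are those of $A$ scaled by $\alpha$. Minimal pure-Python sketch; the function names are hypothetical.

```python
def alternative_adjacency(a, alpha, beta):
    """A(alpha, beta) = (alpha - beta) A + beta J, with J the all-one matrix."""
    n = len(a)
    return [[(alpha - beta) * a[i][j] + beta for j in range(n)] for i in range(n)]


def trace(m):
    """Sum of the diagonal elements, which equals the sum of the eigenvalues."""
    return sum(m[i][i] for i in range(len(m)))


# Path on 3 nodes: links (0, 1) and (1, 2); nodes 0 and 2 are not linked.
a = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
b = alternative_adjacency(a, alpha=2, beta=0.5)
print(b[0][1], b[0][2], b[0][0])  # 2.0 0.5 0.5: alpha on a link, beta elsewhere
print(trace(a), trace(b))         # 0 1.5: the eigenvalue sums differ, so the spectra differ
```

Complex values are allowed as well; for example, `alternative_adjacency(a, 1j, 0)` returns $iA$.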
There is one particularly fascinating class of polynomials with real coefficients whose zeros are all real: orthogonal polynomials, which are studied in Chapter 10. In some particular cases, there is indeed a relation between the spectrum (eigenvalues) of the graph and the zeros of orthogonal polynomials. Much of the research in the spectral analysis of graphs is devoted to understanding properties of the graph by inspecting the spectra of mainly two matrices, the adjacency matrix $A$ and the Laplacian $Q$, defined in art. 2. For example, how does the spectrum show that a graph is connected? What is the physical meaning of the largest and smallest eigenvalue, and how large or small can they be? How do eigenvalues change when nodes and/or links are added to the graph? Deeper questions include:
- "Is $\Lambda$ alone, without $X$ in (1.2), sufficient to characterize a graph?"
- "How are the spacings (the differences between consecutive eigenvalues) distributed, and what do spacings physically mean?"
- extremal problems such as "What is the class of graphs on $N$ nodes and $L$ links that achieves the largest second smallest eigenvalue of the Laplacian?"
and so on.

² Communicated to me by Dajie Liu.

Fig. 1.1. A realization of an Erdős-Rényi random graph $G_p(N)$ with $N = 400$ nodes, $L = 793$ links and average degree $\frac{2L}{N}$ of about 4. The link density $p \approx 10^{-2}$ equals the probability of a link between two arbitrarily chosen nodes in $G_p(N)$. The size of a node is drawn proportional to its degree.
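The Laplacian $Q$ is defined in art. 2. The sketch below assumes the standard definition $Q = \Delta - A$ with $\Delta = \operatorname{diag}(d_1, \ldots, d_N)$. This agrees with $Q = BB^T$ from the list of symbols when each column of the incidence matrix $B$ carries $+1$ and $-1$ at the two end nodes of a link; the orientation is an arbitrary choice and cancels in $BB^T$. The code also shows two basic facts behind the connectivity question. First, every row of $Q$ sums to zero, so $Qu = 0$ and 0 is always a Laplacian eigenvalue. Second, $x^T Q x = \sum_{\text{links } (i,j)} (x_i - x_j)^2 \ge 0$. The function names are illustrative.

```python
def laplacian(n, links):
    """Q = Delta - A, where Delta = diag(d_1, ..., d_N) holds the nodal degrees."""
    q = [[0] * n for _ in range(n)]
    for i, j in links:
        q[i][j] -= 1
        q[j][i] -= 1
        q[i][i] += 1
        q[j][j] += 1
    return q


def incidence(n, links):
    """N x L incidence matrix B: column l has +1 at one end of link l and -1 at the other."""
    b = [[0] * len(links) for _ in range(n)]
    for col, (i, j) in enumerate(links):
        b[i][col] = 1
        b[j][col] = -1
    return b


def b_bt(b):
    """B B^T for B stored as a list of rows."""
    return [[sum(x * y for x, y in zip(r, s)) for s in b] for r in b]


links = [(0, 1), (1, 2), (2, 0), (2, 3)]  # a triangle with a pendant node
q = laplacian(4, links)
print(q == b_bt(incidence(4, links)))     # True
print([sum(row) for row in q])            # [0, 0, 0, 0]: Q u = 0
x = [3, -1, 2, 5]
quad = sum(x[i] * q[i][j] * x[j] for i in range(4) for j in range(4))
print(quad == sum((x[i] - x[j]) ** 2 for i, j in links))  # True: x^T Q x >= 0
```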
1.2 Outline of the book
Chapter 2 introduces some definitions and concepts of algebraic graph theory that are needed in Part I. We embark on the subject in Chapter 3, which focuses on the eigenvalues of the adjacency matrix $A$. In Chapter 4, we continue with the investigation of the spectrum of the Laplacian $Q$. As argued by Mohar, the theory of the Laplacian spectrum is richer and contains more beautiful achievements than that of the adjacency matrix. In Chapter 5, we compute the entire adjacency spectrum, and sometimes also the Laplacian spectrum, of special classes of graphs containing at least one variable parameter, such as the number of nodes $N$ and/or the number of links $L$. This chapter thus illustrates the theory of Chapter 3 and Chapter 4 by useful examples. In fact, the book originated from Chapter 5, and it was a goal to collect all spectra of graphs (with at least one parameter that can be varied). The underlying thought was to explain the spectrum of a complex network by features appearing in "known spectra". Chapter 6 complements Chapter 5 asymptotically when graphs grow large, $N \to \infty$. For large graphs, the density or distribution of the eigenvalues (as nearly continuous variables) is more appealing and informative than the long list of eigenvalues. Apart from the three marvelous scaling laws by Wigner, Marčenko-Pastur and McKay, we did not find many explicit results on densities of eigenvalues of graphs. Finally, Chapter 7, the last chapter of Part I, applies the spectral knowledge of the previous chapters to gain physical insight into the nature of complex networks. As mentioned in the Preface, the results derived in Part I have been built on the general theory of linear algebra and of polynomials with real coefficients summarized in Part II.

Fig. 1.2. An instance of a Barabási-Albert graph with $N = 400$ nodes and $L = 780$ links, which is about the same as in Fig. 1.1. The size of a node is drawn proportional to its degree.

Fig. 1.3. The Watts-Strogatz small-world graph on $N = 100$ nodes with nodal degree $D = 4$ (or $k = 2$, as explained in Section 5.2) and rewiring probability $p_r = \frac{1}{100}$.
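The rewiring mechanism behind Fig. 1.3 can be sketched as follows. This is one common variant, assumed here rather than taken from the book. Start from a circuit on $N$ nodes in which every node is linked to its $k$ nearest neighbors on each side, so that the degree is $D = 2k$. Then visit every original link once and, with probability $p_r$, move one of its ends to a uniformly chosen node, avoiding self-loops and double links. Function names and the seed are illustrative.

```python
import random


def ring_lattice(n, k):
    """Circuit on n nodes where each node links to its k nearest neighbours on both sides (n > 2k)."""
    return {frozenset((j, (j + h) % n)) for j in range(n) for h in range(1, k + 1)}


def watts_strogatz(n, k, p_r, seed=None):
    """Rewire every lattice link with probability p_r; the number of links n * k is preserved."""
    rng = random.Random(seed)
    lattice = ring_lattice(n, k)
    links = set(lattice)
    for link in sorted(lattice, key=sorted):  # fixed visiting order, so a seed is reproducible
        if rng.random() < p_r:
            i = min(link)  # keep this end and move the other one
            candidates = [w for w in range(n)
                          if w != i and frozenset((i, w)) not in links]
            if candidates:
                links.remove(link)
                links.add(frozenset((i, rng.choice(candidates))))
    return links


g = watts_strogatz(100, 2, 1 / 100, seed=1)  # the parameters of Fig. 1.3
print(len(g))  # 200 links, N * k, as in the lattice
```

Small $p_r$ keeps the lattice (and its long hopcounts) nearly intact, whereas $p_r \to 1$ rewires almost every link.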
Fig. 1.4. A Barabási "fractal-like" tree with $N = 1000$ nodes, grown by attaching at each step one new node to a node already in the tree, chosen proportionally to its degree.
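The growth rule in the caption of Fig. 1.4 can be sketched in a few lines. The sketch assumes the simplest reading of that rule. The tree starts from a single link, and each new node attaches by one link to an existing node chosen with probability proportional to its current degree. This is implemented by sampling uniformly from a list in which each node appears once per incident link. Names and seed are illustrative.

```python
import random


def preferential_attachment_tree(n, seed=None):
    """Grow a tree on n >= 2 nodes: each new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    links = [(0, 1)]  # start from a single link
    ends = [0, 1]     # node j appears d_j times in this list
    for new in range(2, n):
        old = rng.choice(ends)  # uniform over link ends, i.e. proportional to degree
        links.append((old, new))
        ends.extend((old, new))
    return links


def degree_sequence(n, links):
    d = [0] * n
    for i, j in links:
        d[i] += 1
        d[j] += 1
    return d


tree = preferential_attachment_tree(1000, seed=42)
d = degree_sequence(1000, tree)
print(len(tree), max(d))  # 999 links; the maximum degree depends on the seed
```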
1.3 Classes of graphs
The main classes of graphs in the study of complex networks are:
- the class of Erdős-Rényi random graphs (Fig. 1.1), whose fascinating properties are derived in Bollobás (2001);
- the class of Watts-Strogatz small-world graphs (Fig. 1.3), first explored in Watts (1999);
- the class of Barabási-Albert power law graphs (Fig. 1.2 and Fig. 1.4), introduced by Barabási and Albert (1999);
- the regular hyperlattices in several dimensions.

The Erdős-Rényi random graph is the simplest random model for a network. Its analytic tractability in a wide range of graph problems has resulted in the richest and most beautiful theory among classes of graphs. In many cases, the Erdős-Rényi random graph serves as a basic model that provides a fast benchmark for first-order estimates and behaviors in real networks. Usually, if a graph problem cannot be solved analytically for the Erdős-Rényi random graph or for hyperlattices, little hope exists that other classes of (random) graphs may have a solution. However, the degree distribution of complex networks, in particular, does not match well with the binomial degree distribution of Erdős-Rényi random graphs (drawn in Fig. 1.5), and this observation has spurred the search for "more realistic models".
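A minimal sketch of the Erdős-Rényi model $G_p(N)$ and its binomial degree law. It assumes the usual definition, in which each pair of nodes is linked independently with probability $p$. Under that definition, the degree of a node is Binomial$(N-1, p)$ with mean $(N-1)p$. The sketch also reproduces the arithmetic behind the caption of Fig. 1.1: the average degree $2L/N$ and the link density $2L/(N(N-1))$. Names and seed are illustrative; `math.comb` requires Python 3.8 or later.

```python
import math
import random


def erdos_renyi(n, p, seed=None):
    """G_p(N): each of the N(N-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def binomial_pmf(k, n, p):
    """Pr[D = k] for D ~ Binomial(n, p); in G_p(N) the degree D is Binomial(N - 1, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def average_degree(n, num_links):
    return 2 * num_links / n


def link_density(n, num_links):
    """Fraction of all N(N-1)/2 node pairs that are linked."""
    return 2 * num_links / (n * (n - 1))


# The numbers quoted for Fig. 1.1: N = 400 nodes and L = 793 links.
print(average_degree(400, 793))          # 3.965, "about 4"
print(round(link_density(400, 793), 5))  # 0.00994, i.e. p is about 10^-2

g = erdos_renyi(400, 0.01, seed=7)
deg = [0] * 400
for i, j in g:
    deg[i] += 1
    deg[j] += 1
print(sum(deg) / 400, (400 - 1) * 0.01)  # sample mean degree versus E[D] = (N - 1) p
```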
[Plot of Fig. 1.5: $\Pr[D = k]$ versus the degree $k$ of a node for the Erdős-Rényi random graph, the Barabási-Albert power law graph and the Binomial$(N-1, p)$ distribution; the inset shows $\Pr[D = k]$ versus $k$ on log-log axes with the fit $\Pr[D = k] = c\,k^{-2.3}$.]
Fig. 1.5. The probability density function (pdf) of the nodal degree in the Erdős-Rényi random graph shown in Fig. 1.1 and in the Barabási-Albert power law graph in Fig. 1.2.
The Watts-Strogatz small-world graphs (after random rewiring of links) possess a relatively high clustering and a short hopcount. The probability $p_r$ that a link is rewired seems to be a powerful tool in Watts-Strogatz small-world graphs to balance between “long hopcounts” ($p_r$ is small) and “small worlds” ($p_r \to 1$). The most distinguishing property of the Barabási-Albert power law graphs is the power law degree distribution, $\Pr[D = k] \sim c\,k^{-\tau}$ with power index $\tau = 3$ for large $N$, where $c$ is a normalization constant; this power law is observed as a major characteristic in many real-world complex networks. Fig. 1.5 compares the degree distribution of the Erdős-Rényi random graph shown in Fig. 1.1 and of the Barabási-Albert power law graph in Fig. 1.2, both with the same number of nodes ($N = 400$) and almost the same average degree ($E[D] = 4$). The inset illustrates the characteristic power law of the Barabási-Albert graph, recognized by a straight line in a log-log plot. Most nodes in the Barabási-Albert power law graph have a small degree, while a few nodes have a degree larger than 10 (which is the maximum degree in the Erdős-Rényi random graph with the same number of nodes and links), and one node even has 36 neighbors. A power law graph is often called a “scale-free graph”, meaning that there is no typical scale for the degree. Thus, the standard deviation $\sigma_D = \sqrt{\mathrm{Var}[D]}$ is usually larger than the average $E[D]$, such that the latter is not a good estimate for the random variable $D$ of the degree, in contrast to Gaussian or binomial distributions, where the bell shape is centered around the mean with, usually, a small variance. Physically, power law behavior can be explained by the notion of long-range dependence, heavy correlations over large spatial or temporal intervals, and of self-similarity. A property is self-similar if about the same behavior is observed on various scales (in time or space) or aggregation levels (e.g., hierarchical structuring of nodes in a network). The result is that a local property is magnified or scaled up towards a global extent. Mathematically, $\Pr[D = k] \sim c\,k^{-\tau}$, from which $\Pr\left[\frac{1}{\alpha}D = k\right] = \Pr[D = \alpha k] \sim \alpha^{-\tau}\Pr[D = k]$; scaling the property (here, the degree $D$) by a factor $\frac{1}{\alpha}$ leads to precisely the same distribution, apart from a proportionality constant $\alpha^{-\tau}$. Thus, on different scales, the behavior “looks” similar. There is also a large number of more dedicated classes, such as Ramanujan graphs and the Kautz graphs, shown in Fig. 1.6, that possess interesting extremal properties. We will not further elaborate on the different properties of these classes; we have merely included some of them here to illustrate that complex networks are studied by comparing observed characteristics to those of “classes of graphs with known properties”.
Fig. 1.6. The Kautz graph of degree $d = 3$ and of dimension $n = 3$ has $(d+1)d^n$ nodes and $(d+1)d^{n+1}$ links. The Kautz graph has the smallest diameter of any possible directed graph with $N$ nodes and degree $d$.
1.4 Outlook
I believe that we still do not understand “networks” sufficiently. For example, if the data (e.g., the adjacency matrix) of a large graph are given and you are not allowed to visualize the network, it seems quite complex to tell, by computing graph metrics only, what the properties of the network are. You may list a large number of topological metrics such as hopcount, eccentricity, diameter, girth, expansion, betweenness, distortion, degree, assortativity, coreness, clique number, clustering coefficient, vertex and edge connectivity and others. We as humans see a pile of numbers, but often miss the overall picture and understanding. I believe that the spectrum, which for a sufficiently large graph is a unique fingerprint as conjectured in van Dam and Haemers (2003), may reveal much more. First, graph or topology metrics are generally correlated and dependent. In contrast, eigenvalues weigh the importance of eigenvectors, which are all orthogonal, and this makes the spectrum a more desirable device. Second, the belief in the spectrum stems from earlier research in condensed matter (Borghs et al., 1989), where we deduced quite useful and precise information about the structural properties of doped GaAs substrates from the photoluminescence spectra. By inspecting long and carefully the differences in peaks and valleys, in gaps and in the broadness of the distribution of eigenvalues, which physically represented energy levels in the solid described by Schrödinger’s equation in Section 1.1, insight gradually arose. A similar track may be followed to understand real, complex networks, because at the time of writing, “reading and understanding” the spectrum of a graph seems beyond our ability. We hope that the mathematical properties of spectra, presented here, may help in achieving this goal.
Part I Spectra of graphs
2 Algebraic graph theory
The elementary basics of the matrix theory for graphs $G(N, L)$ are outlined. The books by Cvetković et al. (1995) and Biggs (1996) are standard works on algebraic graph theory.
2.1 Graph related matrices
1. The adjacency matrix $A$ of a graph $G$ with $N$ nodes is an $N \times N$ matrix with elements $a_{ij} = 1$ only if the pair of nodes $(i, j)$ is connected by a link of $G$, otherwise $a_{ij} = 0$. If the graph is undirected, the existence of a link implies that $a_{ij} = a_{ji}$, so that the adjacency matrix $A = A^T$ is a real symmetric matrix. It is further assumed that the graph $G$ contains neither self-loops ($a_{ii} = 0$) nor multiple links between two nodes. The complement $G^c$ of the graph $G$ consists of the same set of nodes, but with a link between $(i, j)$ if there is no link $(i, j)$ in $G$ and vice versa. Thus, $(G^c)^c = G$ and the adjacency matrix $A^c$ of the complement $G^c$ is $A^c = J - I - A$, where $J$ is the all-one matrix ($(J)_{ij} = 1$). The links in a graph can be numbered in some way, for example, lexicographically as illustrated in Fig. 2.1.
Fig. 2.1. A graph with $N = 6$ and $L = 9$. The links are lexicographically ordered, $e_1 = 1 \to 2$, $e_2 = 1 \to 3$, $e_3 = 1 \leftarrow 6$, etc.
Information about the direction of the links is specified by the incidence matrix $B$, an $N \times L$ matrix with elements
$$b_{il} = \begin{cases} 1 & \text{if link } e_l = i \to j \\ -1 & \text{if link } e_l = i \leftarrow j \\ 0 & \text{otherwise} \end{cases}$$
Fig. 2.1 exemplifies the definition of $A$ and $B$:
$$A = \begin{bmatrix} 0&1&1&0&0&1\\ 1&0&1&0&1&1\\ 1&1&0&1&0&0\\ 0&0&1&0&1&0\\ 0&1&0&1&0&1\\ 1&1&0&0&1&0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1&1&-1&0&0&0&0&0&0\\ -1&0&0&1&-1&1&0&0&0\\ 0&-1&0&-1&0&0&1&0&0\\ 0&0&0&0&0&0&-1&-1&0\\ 0&0&0&0&1&0&0&1&-1\\ 0&0&1&0&0&-1&0&0&1 \end{bmatrix}$$
An important property of the incidence matrix $B$ is that each column contains exactly one $1$ and one $-1$, so that the sum of the rows equals the zero vector,
$$u^T B = 0 \qquad (2.1)$$
where $u = (1, 1, \ldots, 1)$ is the all-one vector. An undirected graph can be represented by an $N \times (2L)$ incidence matrix $B$, where each link $(i, j)$ is counted twice, once for the direction $i \to j$ and once for the direction $j \to i$. In that case, the degree of each node is just doubled. Instead of the incidence matrix, the unsigned incidence matrix $R$, defined in art. 7, can be more appropriate.
2. The relation between adjacency and incidence matrix is given by the admittance matrix or Laplacian $Q$,
$$Q = BB^T = \Delta - A$$
where $\Delta = \mathrm{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix. Indeed, if $i \neq j$, and noting that each column has precisely two non-zero elements in different rows,
$$q_{ij} = \left(BB^T\right)_{ij} = \sum_{k=1}^{L} b_{ik} b_{jk} = \begin{cases} -1 & \text{if } (i, j) \text{ are linked} \\ 0 & \text{if } (i, j) \text{ are not linked} \end{cases}$$
If $i = j$, then $\sum_{k=1}^{L} b_{ik}^2 = d_i$, the number of links that have node $i$ in common. By the definition of $A$, the row sum $i$ of $A$ equals the degree $d_i$ of node $i$,
$$d_i = \sum_{k=1}^{N} a_{ik} \qquad (2.2)$$
Consequently, each row sum $\sum_{k=1}^{N} q_{ik} = 0$, which shows that $Q$ is singular, implying that $\det Q = 0$. Since $\sum_{i=1}^{N}\sum_{k=1}^{N} a_{ik} = 2L$, the basic law for the degree follows as
$$\sum_{i=1}^{N} d_i = 2L \qquad (2.3)$$
Probabilistically, when considering an arbitrary nodal degree $D$ (thus, $D$ is viewed as a random variable of the degree in a graph with possible realization or outcome equal to one of the values $d_1, d_2, \ldots, d_N$), the basic law for the degree equals
$$E[D] = \frac{2L}{N}$$
meaning that the average degree or expectation of $D$ in any graph $G$ is twice the ratio of the number $L$ of links over the number $N$ of nodes. Especially in large real-world networks, a probabilistic approach is adequate, as illustrated in Section 7. Let us define the degree vector $d = \begin{bmatrix} d_1 & d_2 & \cdots & d_N \end{bmatrix}^T$; then both (2.2) and (2.3) have a compact vector presentation as
$$Au = d \qquad (2.4)$$
and
$$u^T A u = d^T u = 2L \qquad (2.5)$$
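The relations (2.1) to (2.5) can be verified on the graph of Fig. 2.1 with plain Python. The sketch below is ours: the names `LINKS`, `A`, `B` and `Q` are illustrative, and the link orientations are those read off the incidence matrix $B$ of Fig. 2.1 (for instance, link $e_3$ points from node 6 to node 1).

```python
# Links of Fig. 2.1 in lexicographic order; (s, e) means the link points s -> e,
# following the signs in the incidence matrix B (e.g. e3 = 6 -> 1).
LINKS = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (2, 6), (3, 4), (5, 4), (6, 5)]
N, L = 6, len(LINKS)

# Adjacency matrix A (0-based indices): symmetric, no self-loops.
A = [[0] * N for _ in range(N)]
for s, e in LINKS:
    A[s - 1][e - 1] = A[e - 1][s - 1] = 1

# Incidence matrix B: +1 in the row where link l starts, -1 where it ends.
B = [[0] * L for _ in range(N)]
for l, (s, e) in enumerate(LINKS):
    B[s - 1][l] = 1
    B[e - 1][l] = -1

d = [sum(row) for row in A]                                   # (2.2), (2.4)
uTB = [sum(B[i][l] for i in range(N)) for l in range(L)]      # (2.1)
Q = [[sum(B[i][l] * B[j][l] for l in range(L)) for j in range(N)]
     for i in range(N)]                                       # Q = B B^T
Delta_minus_A = [[(d[i] if i == j else 0) - A[i][j] for j in range(N)]
                 for i in range(N)]

print("d =", d)                                  # d = [3, 4, 3, 2, 3, 3]
print("u^T B = 0:", all(x == 0 for x in uTB))    # True
print("B B^T = Delta - A:", Q == Delta_minus_A)  # True
print("sum of degrees = 2L:", sum(d) == 2 * L)   # True, (2.3)
print("E[D] = 2L/N =", 2 * L / N)                # 3.0
```

Because every column of $B$ holds one $1$ and one $-1$, each row of $Q$ sums to zero, which is the singularity of $Q$ noted above.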
3. Many other graph related matrices can be defined. We mention as examples the distance matrix $H$, where the element $H_{ij}$ is equal to the shortest distance (in hops) between node $i$ and node $j$, and the modularity matrix $M$, defined in art. 104. The matrix $P = \Delta^{-1}A$ is a stochastic matrix, because all elements of $P$ lie in the interval $[0, 1]$ and each row sum is 1. Often weighted graphs are considered, defined by a non-negative matrix $W$, where each element $w_{ij}$ represents the weight of a link between node $i$ and $j$ and $w_{jj} = 0$ for all $1 \le j \le N$. A particular class of weighted graphs are undirected weighted graphs, where $W = W^T$. Similarly, the corresponding weighted Laplacian can be defined as $\tilde Q = \mathrm{diag}\left(\sum_{j=1}^{N} w_{ij}\right) - W$, thus $\tilde q_{ij} = -w_{ij}$ if $i \neq j$, else $\tilde q_{jj} = -\sum_{i=1; i\neq j}^{N} \tilde q_{ji}$.
2.1.1 The incidence matrix $B$
The $N \times L$ incidence matrix $B$ transforms an $L \times 1$ vector $y$ of the “link” space to an $N \times 1$ vector $x$ of the “nodal” space by $x = By$. Physically, this transformation is best understood when $y$ is a flow or current vector in a network. Then, the row $(By)_i = \sum_{k=1}^{L} B_{ik} y_k = x_i$ equals the sum of the in-flows and out-flows at node $i$. Left-multiplying both sides of $x = By$ by $u^T$ and using (2.1) yields $u^T x = 0$, which means that the net flow, influx plus outflow, in the network is zero. Thus, $By = x$ reflects a conservation law: the demand $x_i$ offered at node $i$ in the network is balanced by the sum of currents or flows at node $i$, and the net demand or influx (outflow) to the network is zero.
4. Rank of the incidence matrix $B$.
Theorem 1 If the network $G$ is connected, then $\mathrm{rank}(B) = N - 1$.
Proof: The basic property (2.1) implies that $\mathrm{rank}(B) \le N - 1$. Suppose that there exists a non-zero vector $x \neq \alpha u$, for any real number $\alpha$, such that $x^T B = 0$. Under that assumption, the vectors $u$ and $x$ are independent, and the left null space of $B$ (the kernel of $B^T$), consisting of all vectors $y$ such that $y^T B = 0$, has at least rank 2,
and consequently $\mathrm{rank}(B) \le N - 2$. We will show that $x$ is not independent, but proportional to $u$. Consider row $j$ in $B$ corresponding to the non-zero component $x_j$. All non-zero elements in the row vector $(B)_j$ are links incident to node $j$. Since each column of $B$ consists of only two non-zero elements (with opposite signs), for each link $l$ incident to node $j$, there is precisely one other row $k$ in $B$ with a non-zero element in column $l$. In order for the linear relation $x^T B = 0$ to hold, we thus conclude that $x_j = x_k$, and this observation holds for all nodal indices $j$ and $k$ because $G$ is connected. This implies that $x = x_j u$ is proportional to $u$, which contradicts the assumption and shows that the rank of the incidence matrix cannot be lower than $N - 1$. □
An immediate consequence is that $\mathrm{rank}(B) = N - k$ if the graph has $k$ disjoint, but connected, components, because then (see also art. 80) there exists a relabeling of the nodes such that $B$ can be partitioned as
$$B = \begin{bmatrix} B_1 & O & \cdots & O \\ O & B_2 & & \vdots \\ \vdots & & \ddots & \\ O & \cdots & & B_k \end{bmatrix}$$
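Theorem 1 and the conservation law of Section 2.1.1 can be checked numerically. The following sketch (our own helper names, exact rational arithmetic from Python's `fractions` module) computes the rank of $B$ for Fig. 2.1 and for a hypothetical graph of two disjoint triangles, for which the rank should be $N - k$ with $k = 2$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0                                   # pivots found so far
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def incidence(n, links):
    """N x L incidence matrix: +1 at the start node, -1 at the end node."""
    B = [[0] * len(links) for _ in range(n)]
    for l, (s, e) in enumerate(links):
        B[s - 1][l] = 1
        B[e - 1][l] = -1
    return B


fig21 = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (2, 6), (3, 4), (5, 4), (6, 5)]
B = incidence(6, fig21)
print(rank(B))                  # 5 = N - 1, Fig. 2.1 is connected

# Two disjoint triangles: k = 2 components, hence rank(B) = N - k = 4.
two_triangles = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]
print(rank(incidence(6, two_triangles)))   # 4

# Conservation law of Section 2.1.1: for any link flow y, x = B y sums to 0.
y = [3, -1, 2, 0, 5, 1, -4, 2, 7]
x = [sum(B[i][l] * y[l] for l in range(len(y))) for i in range(6)]
print(sum(x))                   # 0
```

Exact fractions avoid the tolerance choices that a floating-point rank computation would need.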
5. The cycle-space and cut-space of a graph $G$. The cycle-space of a graph $G$ consists of all possible cycles in that graph. A cycle¹ of length $k$ from node $i$ back to node $i$ is a succession of $k$ undirected links of the form $(n_0 - n_1)(n_1 - n_2)\cdots(n_{k-1} - n_0)$, where $n_0 = i$. A cycle can have two cycle orientations. This means that the orientation of each link in a cycle either coincides with the cycle orientation or is the reverse of it. For example, the cycle $(1 - 2)(2 - 6)(6 - 1)$ in Fig. 2.1 corresponds to the links (columns in $B$) 1, 3 and 6, and all links are oriented in the same direction along the cycle. When adding columns 1, 3 and 6, the sum is zero, which is equivalent to $By = 0$ with $y = (1, 0, 1, 0, 0, 1, 0, 0, 0)$. On the other hand, the cycle $(1 - 2)(2 - 3)(3 - 1)$ corresponds to the links 1, 2 and 4, but not all links are oriented in the same direction, such that $y = (1, -1, 0, 1, 0, 0, 0, 0, 0)$ now has a negative component. In general, if $By = 0$, then the non-zero components of the vector $y$ are links of a cycle. Indeed, consider the $j$-th row $(By)_j = x_j$. If node $j$ is not incident with links of the cycle, then $x_j = 0$. If node $j$ is incident with some links of the cycle, then it is incident with precisely two links, with opposite sign, such that $x_j$ is again zero. Since the rank of $B$ is $N - k$ (where $k$ is the number of connected components), the rank of the kernel (or null space) of $B$ is $L - N + k$. Hence, the dimension of the cycle-space of a graph equals the rank of the kernel of $B$, which is $L - N + k$. The orthogonal complement of the cycle-space is called the cut-space, with dimension $N - k$. Thus, the cut-space is the space consisting of all vectors $y$ for which $By = x \neq 0$. Since $u^T x = 0$, the non-negative components of $x$ are the nodes belonging to one partition and the negative components define the other partition. These two disjoint sets (or partitions) of nodes thus define a cut in the graph. For example, in Fig. 2.1, $Bu = \begin{bmatrix} 1 & 0 & -1 & -2 & 1 & 1 \end{bmatrix}^T$ defines a cut that separates nodes 3 and 4 from the rest. Section 4.3 further investigates the partitioning of a graph.
¹ Art. 17 defines a walk, from which it follows that a cycle is a closed walk, but a cycle is slightly more, because the direction of a link $(n_k - n_{k+1})$ can be either $(n_k \to n_{k+1})$ or $(n_k \leftarrow n_{k+1})$.
6. Spanning trees and the incidence matrix $B$. Consider the incidence matrix $B$ of a graph $G$ and an arbitrary row in $B$, corresponding to a node $n$. Let $M_n$ be one of the $\binom{L}{N-1}$ square $(N-1)\times(N-1)$ submatrices of $B$ without row $n$, and let $G_n$ denote the subgraph of $G$ on $N - 1$ nodes formed by the links in the columns of $M_n$. Since there are $N - 1$ columns in $M_n$, the subgraph $G_n$ has precisely $N - 1$ links, where some links may start or end at node $n$, outside the node set of $G_n$. We will now investigate $\det M_n$.
(a) Suppose first that there is no node with degree 1 in $G_n$, except possibly for $n$, in which case $G_n$ is not a tree spanning $N - 1$ nodes. Since the number of links is $L(G_n) = N - 1$, the basic law of the degree (2.3) shows that there must be a zero degree node in $G_n$. If the zero degree node is not $n$, then $M_n$ has a zero row and $\det M_n = 0$. If $n$ is the zero degree node, then each column of $M_n$ contains a $1$ and a $-1$. Thus, the rows of $M_n$ sum to the zero vector and $\det M_n = 0$.
(b) In the other case, $G_n$ has a node $i$ with degree 1. Then, the $i$-th row in $M_n$ has only one non-zero element, either $1$ or $-1$. After expanding $\det M_n$ by this $i$-th row, we obtain a new $(N-2)\times(N-2)$ determinant $M_{n;i}$ corresponding to the graph $G_{n;i}$, formed by the links in the columns of $M_{n;i}$. For $\det M_{n;i}$, we can repeat the analysis: either $G_{n;i}$ is not a tree spanning the $N - 2$ nodes of $G$ except for $n$ and $i$, in which case $\det M_{n;i} = 0$, or $\det M_{n;i} = \pm\det M_{n;i;k}$.
Iterating this process shows that the determinant of any square $(N-1)\times(N-1)$ submatrix of $B$ is either 0, when the graph formed by the links corresponding to the columns in $B$ is not a spanning tree, or $\pm 1$, when that graph is a spanning tree. Thus, we have shown:
Theorem 2 (Poincaré) The determinant of any square submatrix of the incidence matrix $B$ is either $0$, $1$, or $-1$.
If the determinant of any square submatrix of a matrix is $0$, $1$, or $-1$, then that matrix is said to be totally unimodular.
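The statements of art. 5 and 6 can be checked on Fig. 2.1 with exact arithmetic. The sketch below (plain Python, our own helper names, link orientations read from $B$) verifies that the two cycle vectors lie in the kernel of $B$, recomputes the cut vector $Bu$, and enumerates all $\binom{9}{5} = 126$ submatrices $M_n$ with the row of node $n = 1$ removed. Every determinant should be $0$ or $\pm 1$, and the number of non-zero ones (the spanning trees) should agree with a cofactor of $Q$, which is what the Cauchy-Binet formula combined with Theorem 2 predicts.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result            # a row swap flips the sign
        result *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return result


LINKS = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (2, 6), (3, 4), (5, 4), (6, 5)]
N, L = 6, len(LINKS)
B = [[0] * L for _ in range(N)]
for l, (s, e) in enumerate(LINKS):
    B[s - 1][l], B[e - 1][l] = 1, -1


def times(M, y):
    """Matrix-vector product M y."""
    return [sum(a * b for a, b in zip(row, y)) for row in M]


print(times(B, [1, 0, 1, 0, 0, 1, 0, 0, 0]))    # cycle 1-2-6-1: all zeros
print(times(B, [1, -1, 0, 1, 0, 0, 0, 0, 0]))   # cycle 1-2-3-1: all zeros
print(times(B, [1] * L))                        # cut vector [1, 0, -1, -2, 1, 1]

# Theorem 2 for the (N-1) x (N-1) submatrices M_n with node n = 1 removed.
reduced = B[1:]
dets = [det([[row[c] for c in cols] for row in reduced])
        for cols in combinations(range(L), N - 1)]
print(sorted(set(dets)))                        # values drawn from {-1, 0, 1}
n_trees = sum(1 for x in dets if x != 0)        # one per spanning tree

# Cross-check: by Cauchy-Binet, det(M M^T) with M = reduced is the sum of the
# squared minors, i.e. the cofactor of Q = B B^T obtained by deleting node 1.
Q_reduced = [[sum(B[i][l] * B[j][l] for l in range(L)) for j in range(1, N)]
             for i in range(1, N)]
print(n_trees, det(Q_reduced))                  # the two numbers agree
```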
2.1.2 The line graph
7. The line graph $l(G)$ of the graph $G(N, L)$ has as set of nodes the links of $G$, and two nodes in $l(G)$ are adjacent if and only if they have, as links in $G$, exactly one node of $G$ in common. The line graph $l(G)$ of $G$ is sometimes called the “dual graph” or the “derived graph” of $G$. For example, the line graph of the star $K_{1,n}$ is the complete graph $K_n$, and the line graph of the example graph in Fig. 2.1 is drawn in Fig. 2.2.
Fig. 2.2. The line graph of the graph drawn in Fig. 2.1.
We denote by $R$ the absolute value of the incidence matrix $B$, i.e., $r_{ij} = |b_{ij}|$. In other words, $r_{ij} = 1$ if node $i$ and link $j$ are incident, otherwise $r_{ij} = 0$. Hence, the unsigned incidence matrix $R$ ignores the direction of links in the graph, in contrast to the incidence matrix $B$. Analogous to the definition of the Laplacian in art. 2, we may verify that the $N \times N$ matrix $\Delta + A$ of the graph $G$, the sum of the degree matrix and the adjacency matrix, is written in terms of the unsigned $N \times L$ node-link incidence matrix $R$ as
$$\Delta + A = RR^T \qquad (2.6)$$
The $L \times L$ adjacency matrix of the line graph $l(G)$ is similarly written in terms of $R$ as
$$A_{l(G)} = R^T R - 2I \qquad (2.7)$$
We remark that $B^T B$ is generally a $(-1, 0, 1)$-matrix (apart from its diagonal elements, which all equal 2), and that taking the absolute value of its entries equals $R^T R$, whereas the Laplacian matrix $Q = 2\Delta - RR^T = BB^T$. In a graph $G$, where multiple links with the same direction between two nodes are excluded, we consider, for $i \neq j$,
$$\left(B^T B\right)_{ij} = \sum_{n=1}^{N} b_{ni} b_{nj} = \begin{cases} 1 & \text{if both link } i \text{ and } j \text{ either start or end in node } n \\ -1 & \text{if link } i \text{ starts and link } j \text{ ends in node } n, \text{ or vice versa} \\ -2 & \text{if link } i \text{ and } j \text{ have two nodes in common} \end{cases}$$
The latter case, where $(B^T B)_{ij} = -2$, occurs for a bidirectional link between two nodes. When the links at each node of the graph $G$ either all start or all end, we observe that $(B^T B)_{ij}$ is either 0 or 1 for all links $i \neq j$ and, in that case, $B^T B = R^T R$ holds. An interesting example of such a graph is the general bipartite graph, studied in Section 5.8, where the direction of the links is the same from each node in the set $\mathcal{M}$ to each node in the other set $\mathcal{N}$.
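8. Basic properties of the line graph. The number of nodes in the line graph $l(G)$ equals $L$, the number of links in $G$. The number of links in the line graph $l(G)$ is computed from the basic law of the degree (2.5) and (2.7), with $u$ the $L \times 1$ all-one vector, as
$$L_{l(G)} = \frac{1}{2} u^T A_{l(G)} u = \frac{1}{2} u^T R^T R u - u^T u = \frac{1}{2}\|Ru\|_2^2 - L$$
It follows from the definition of the unsigned incidence matrix $R$ that $u^T_{1\times N} R = 2u^T_{1\times L}$, which is the companion of (2.1), and that $Ru = d$, because the row sum $\sum_{l=1}^{L} R_{il} = d_i$ is the number of links in $G$ incident to node $i$. Hence, we find that the number of links in the line graph $l(G)$ equals
$$L_{l(G)} = \frac{1}{2} d^T d - L = \frac{1}{2}\sum_{i=1}^{N} d_i^2 - L \qquad (2.8)$$
Alternatively, each node $i$ in $G$ with degree $d_i$ generates in the line graph $l(G)$ precisely $d_i$ nodes that are all connected to each other (as in a clique), corresponding to $\binom{d_i}{2}$ links. The number of links in $l(G)$ is thus also $L_{l(G)} = \sum_{i=1}^{N}\binom{d_i}{2}$. Art. 2 indicates that the average degree of a node in the line graph $l(G)$ is
$$E\left[D_{l(G)}\right] = \frac{2L_{l(G)}}{N_{l(G)}} = \frac{1}{L}\sum_{i=1}^{N} d_i^2 - 2$$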
The degree vector of the line graph $l(G)$ follows from (2.4) as
$$d_{l(G)} = A_{l(G)} u_{L\times 1} = R^T R u - 2u = R^T d - 2u$$
Since in each column of $R$ (as in the incidence matrix $B$) there are only two non-zero elements, the vector element $\left(R^T d\right)_l = d_{l^+} + d_{l^-}$, where $l^+$ denotes the node at the start and $l^-$ the node at the end of the link $l$. Hence, the maximum (and similarly minimum) degree of the line graph $l(G)$ equals
$$\max d_{l(G)} = \max_{1\le l\le L}\left(d_{l^+} + d_{l^-} - 2\right) \le d_{(1)} + d_{(2)} - 2$$
where $d_{(k)}$ denotes the $k$-th largest degree in $G$. When $G$ is connected, then $l(G)$ is also connected, as follows from the construction² (or definition) of the line graph $l(G)$. Given a line graph $l(G)$, it is possible to reconstruct the original graph $G$, although the reconstruction or inverse $G = l^{-1}(l(G))$ is not so easy. Each link $l$ in $G$ connects two nodes $i$ and $j$ and is transformed in the line graph $l(G)$ to a node $l$ that bridges two cliques $K_{d_i}$ and $K_{d_j}$. If a line graph $l(G)$ can be partitioned into cliques, then the number of those cliques equals the number $N$ of nodes in $G$, and each node $l$ in $l(G)$ that bridges two cliques $i$ and $j$ corresponds to a link $l$ in $G$ between the two nodes $i$ and $j$. Apart from $G = K_3$, the reconstruction or inverse line graph $l^{-1}(G)$ is unique. Algorithms to compute the original graph $G$ from the line graph $l(G)$ are presented by Lehot (1974) and Roussopoulos (1973).
² In a connected graph $G$, each node is reachable from any other node via a path (a sequence of adjacent links, art. 17). Similarly, in the dual setting corresponding to the line graph, each link in $G$ is reachable from any other link via a path (a sequence of adjacent nodes or neighbors).
9. Since $R^T R$ is a Gram matrix (art. 175), all eigenvalues of $R^T R$ are non-negative. Hence, it follows from (2.7) that the eigenvalues of the adjacency matrix of the line graph $l(G)$ are not smaller than $-2$. The adjacency spectra of the line graph $l(G)$ and of $G$ are related by Lemma 10, since
$$\det\left(\left(R^T R\right)_{L\times L} - \lambda I\right) = (-\lambda)^{L-N}\det\left(\left(RR^T\right)_{N\times N} - \lambda I\right)$$
Using the definitions (2.7) and (2.6) in art. 7 yields
$$\det\left(A_{l(G)} - (\lambda - 2)I\right) = (-\lambda)^{L-N}\det\left(\Delta + A - \lambda I\right)$$
or
$$\det\left(A_{l(G)} - \lambda I\right) = \left(-(\lambda + 2)\right)^{L-N}\det\left(\Delta + A - (\lambda + 2)I\right) \qquad (2.9)$$
The eigenvalues of the adjacency matrix of the line graph $l(G)$ are those of the unsigned Laplacian $\Delta + A$ shifted over $-2$, and an eigenvalue at $-2$ with multiplicity $L - N$. If $B^T B = R^T R$, then we have by Lemma 10 that
$$\det\left(\left(B^T B\right)_{L\times L} - \lambda I\right) = (-\lambda)^{L-N}\det\left(\left(BB^T\right)_{N\times N} - \lambda I\right)$$
from which
$$\det\left(Q - \lambda I\right) = (-\lambda)^{N-L}\det\left(A_{l(G)} - (\lambda - 2)I\right)$$
or
$$\det\left(A_{l(G)} - \lambda I\right) = \left(-(\lambda + 2)\right)^{L-N}\det\left(Q - (\lambda + 2)I\right) \qquad (2.10)$$
In graphs $G$ where $B^T B = R^T R$, the eigenvalues of the adjacency matrix of the line graph $l(G)$ are those of the Laplacian $Q = \Delta - A$ shifted over $-2$, and an eigenvalue at $-2$ with multiplicity $L - N$. The restriction that all eigenvalues of an adjacency matrix are not less than $-2$ is not sufficient to characterize line graphs (Biggs, 1996, p. 18). The state-of-the-art knowledge about line graphs is reviewed by Cvetković et al. (2004), who treat the characterization of line graphs in detail. A graph is a line graph if it satisfies certain conditions. Referring for proofs to Cvetković et al. (1995, 2004), we mention here only:
Theorem 3 (Krausz) A graph is a line graph if and only if its set of links can be partitioned into “non-trivial” cliques such that (i) two cliques have at most one node in common and (ii) each node belongs to at most two cliques.
Theorem 4 (Van Rooij and Wilf) A graph is a line graph if and only if (i) it does not contain the star $K_{1,3}$ as an induced subgraph and (ii) the remaining (or opposite) nodes in any two triangles with a common link must be adjacent and each of such triangles must be connected to at least one other node in the graph by an odd number of links.
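The line graph relations (2.6) to (2.9) can be checked exactly on the graph of Fig. 2.1. In this sketch (helper names ours), $R$ is built from the link list, $A_{l(G)}$ is obtained both from (2.7) and directly from the definition in art. 7, and (2.9) is tested as a polynomial identity at a few integer values of $\lambda$, using determinants in exact rational arithmetic rather than numerically computed eigenvalues.

```python
from fractions import Fraction

LINKS = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (2, 6), (3, 4), (5, 4), (6, 5)]
N, L = 6, len(LINKS)


def matmul(X, Y):
    """Matrix product X Y for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def shifted(M, s):
    """Return M - s I."""
    return [[x - (s if i == j else 0) for j, x in enumerate(row)]
            for i, row in enumerate(M)]


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return result


# Unsigned incidence matrix R (N x L): r_il = 1 if node i is an end of link l.
R = [[1 if i + 1 in link else 0 for link in LINKS] for i in range(N)]
RT = [list(col) for col in zip(*R)]
A = [[0] * N for _ in range(N)]
for s, e in LINKS:
    A[s - 1][e - 1] = A[e - 1][s - 1] = 1
d = [sum(row) for row in A]
Delta_plus_A = [[A[i][j] + (d[i] if i == j else 0) for j in range(N)]
                for i in range(N)]

A_line = shifted(matmul(RT, R), 2)                        # (2.7)
# Direct definition: two distinct links are adjacent iff they share one node.
A_direct = [[int(l != m and len(set(LINKS[l]) & set(LINKS[m])) == 1)
             for m in range(L)] for l in range(L)]

print(matmul(R, RT) == Delta_plus_A)     # (2.6): True
print(A_line == A_direct)                # (2.7): True
links_line = sum(map(sum, A_line)) // 2
print(links_line, sum(x * x for x in d) // 2 - L)         # (2.8): 19 19
deg_line = [sum(row) for row in A_line]
print(deg_line == [d[s - 1] + d[e - 1] - 2 for s, e in LINKS])   # True
# (2.9) is a polynomial identity in lambda; check it at a few integers.
for lam in range(4):
    lhs = det(shifted(A_line, lam))
    rhs = (-(lam + 2)) ** (L - N) * det(shifted(Delta_plus_A, lam + 2))
    print(lam, lhs == rhs)               # True for every lam
```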
2.1.3 The quotient graph
10. Permutation matrix $P$. Consider the set $\mathcal{N} = \{n_1, n_2, \ldots, n_N\}$ of nodes of $G$, where $n_j$ is the label of node $j$. The most straightforward way is the labeling $n_j = j$. Suppose that the nodes in $G$ are relabeled. This means that there is a permutation, often denoted by $\pi$, that rearranges the node identifiers $n_j$ as $n_i = \pi(n_j)$. The corresponding permutation matrix $P$ has element $p_{ij} = 1$ if $n_i = \pi(n_j)$, and $p_{ij} = 0$ otherwise. For example, the set of nodes $\{1, 2, 3, 4\}$ is permuted to the set $\{2, 4, 1, 3\}$ by the permutation matrix
$$P = \begin{bmatrix} 0&1&0&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&0&1&0 \end{bmatrix}$$
If the vector $v = (1, 2, 3, 4)$, then $Pv = w$, where the permuted vector $w = (2, 4, 1, 3)$. Next, $Pw = z = (4, 3, 2, 1)$, then $Pz = y = (3, 1, 4, 2)$, and $Py = v$. Thus, $P^4 v = v$. In general, repeated relabeling always returns to the initial labeling: $P^m = I$ for some integer $m \ge 1$, where the smallest such $m$, the order of the permutation, equals the least common multiple of the lengths of the cycles of $\pi$ and divides $N!$. In the example, $\pi$ is a single cycle of length $N = 4$, so that $m = N$; in general, however, $P^N \neq I$ is possible (for instance, a permutation of $N = 5$ labels consisting of a cycle of length 2 and a cycle of length 3 has order 6). Each label $n_j$ is mapped to $\pi(n_j) = n_i$, where, generally, $n_i \neq n_j$, else certain elements are not permuted³. The definition (8.77) of the determinant shows that $\det P = \pm 1$, because in each row and in each column there is precisely one non-zero element equal to 1. Another example of a permutation matrix is the unit-shift relabeling transformation in Section 5.2.1.
11. A permutation matrix $P$ is an orthogonal matrix. Since a permutation matrix $P$ relabels a vector $v$ to a vector $w = Pv$, both vectors $v$ and $w$ contain the same components, but in a different order (provided $P \neq I$), such that their norms (art. 161) are equal, $\|v\| = \|w\|$. Using the Euclidean norm $\|x\|_2^2 = x^T x$, the equality $v^T v = w^T w = v^T P^T P v$ for all vectors $v$ implies that $P^T P = I$, such that $P$ is an orthogonal matrix (art. 151).
12. If $G_1$ and $G_2$ are two directed graphs on the same set of nodes, then they are called isomorphic if and only if there is a permutation matrix $P$ such that $P^T A_{G_1} P = A_{G_2}$. Since permutation matrices are orthogonal, $P^{-1} = P^T$, the spectra of $G_1$ and $G_2$ are identical (art. 151): the spectrum is an invariant of the isomorphism class of a graph.
³ The special permutation $P = I$ does not, in fact, relabel nodes.
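The order of a permutation matrix, the smallest $m \ge 1$ with $P^m = I$, can be computed directly. The sketch below (plain Python, helper names ours) reproduces the $4 \times 4$ example of art. 10, a single cycle of length $N = 4$ with order 4, and contrasts it with a hypothetical $5 \times 5$ permutation made of a 2-cycle and a 3-cycle, whose order is $\mathrm{lcm}(2, 3) = 6$; it also checks the orthogonality $P^T P = I$ of art. 11.

```python
def matmul(X, Y):
    """Matrix product X Y for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def apply(P, v):
    """Matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def order(P):
    """Smallest m >= 1 with P^m = I; P must be a permutation matrix."""
    I = identity(len(P))
    M, m = P, 1
    while M != I:
        M, m = matmul(M, P), m + 1
    return m


# The example of art. 10: a single cycle of length N = 4.
P4 = [[0, 1, 0, 0],
      [0, 0, 0, 1],
      [1, 0, 0, 0],
      [0, 0, 1, 0]]
print(apply(P4, [1, 2, 3, 4]), order(P4))   # [2, 4, 1, 3] 4

# N = 5: swap the first two entries and rotate the last three, i.e. a
# 2-cycle and a 3-cycle, so the order is lcm(2, 3) = 6 and P^5 != I.
P5 = [[0, 1, 0, 0, 0],
      [1, 0, 0, 0, 0],
      [0, 0, 0, 1, 0],
      [0, 0, 0, 0, 1],
      [0, 0, 1, 0, 0]]
print(order(P5))                                    # 6
print(matmul(transpose(P5), P5) == identity(5))     # art. 11: True
```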
13. Automorphism. We investigate the effect of a permutation of the nodal set $\mathcal{N}$ of a graph on the structure of the adjacency matrix $A$. Suppose that $n_i = \pi(n_j)$ and $n_k = \pi(n_l)$; then we have, with the definition of $P$ in art. 10,
$$(PA)_{il} = \sum_{m=1}^{N} p_{im} a_{ml} = a_{jl}, \qquad (AP)_{il} = \sum_{m=1}^{N} a_{im} p_{ml} = a_{ik}$$
In order for $A$ and $P$ to commute, i.e., $PA = AP$, we observe that $a_{jl} = a_{ik}$ must hold for every node pair: the pair $(n_j, n_l)$ is linked if and only if its permuted pair $(\pi(n_j), \pi(n_l)) = (n_i, n_k)$ is linked. An automorphism of a graph is a permutation $\pi$ of the nodal set $\mathcal{N}$ such that $(n_i, n_j)$ is a link of $G$ if and only if $(\pi(n_i), \pi(n_j))$ is a link of $G$. Hence, if the permutation $\pi$ is an automorphism, then $A$ and $P$ commute. The consequences for the spectrum of the adjacency matrix $A$ are interesting. Suppose that $x$ is an eigenvector of $A$ belonging to the eigenvalue $\lambda$; then
$$APx = PAx = P\lambda x = \lambda Px$$
which implies that $Px$ is also an eigenvector of $A$ belonging to the eigenvalue $\lambda$. If $x$ and $Px$ are linearly independent, then $\lambda$ cannot be a simple eigenvalue. Thus, an automorphism can produce multiple eigenvectors belonging to a same eigenvalue.
14. Partitions. A generalization of a permutation is a partition that separates the nodal set $\mathcal{N}$ of a graph into disjoint, non-empty subsets of $\mathcal{N}$, whose union is $\mathcal{N}$. The $k \in [1, N]$ disjoint, non-empty subsets generated by a partition are sometimes called cells, and denoted by $\{C_1, C_2, \ldots, C_k\}$. If $k = N$, the partition reduces to a permutation. We also denote a partition by $\pi$. Let $\{C_1, C_2, \ldots, C_k\}$ be a partition of the set $\mathcal{N} = \{1, 2, \ldots, N\}$ of nodes and let $A$ be a symmetric matrix that is partitioned as
$$A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,k} \\ \vdots & & \vdots \\ A_{k,1} & \cdots & A_{k,k} \end{bmatrix}$$
where the block matrix $A_{i,j}$ is the submatrix of $A$ formed by the rows in $C_i$ and the columns in $C_j$. For example, the partition $C_1 = \{1, 3\}$, $C_2 = \{2, 4, 6\}$ and $C_3 = \{5\}$ of the nodes in Fig. 2.1 leads to the partitioned adjacency matrix, with rows and columns ordered as the nodes 1, 3, 2, 4, 6, 5,
$$A = \left[\begin{array}{cc|ccc|c} 0&1&1&0&1&0\\ 1&0&1&1&0&0\\ \hline 1&1&0&0&1&1\\ 0&1&0&0&0&1\\ 1&0&1&0&0&1\\ \hline 0&0&1&1&1&0 \end{array}\right]$$
The characteristic matrix $S$ of the partition $\pi$, also called the community matrix $S$, is the $N \times k$ matrix whose columns are the indicator vectors of the cells $C_1, \ldots, C_k$, with the rows labeled in accordance with $A$. Thus, in the example, the matrix $S$ corresponding to the partition $C_1 = \{1, 3\}$, $C_2 = \{2, 4, 6\}$ and $C_3 = \{5\}$ is
$$S = \begin{bmatrix} 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&1&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} u_2 & 0 & 0 \\ 0 & u_3 & 0 \\ 0 & 0 & u_1 \end{bmatrix}$$
where $u_j$ is the all-one vector of dimension $j$. Clearly, $S^T S = \mathrm{diag}(2, 3, 1)$. In general, $S^T S = \mathrm{diag}(|C_1|, |C_2|, \ldots, |C_k|)$, where $|C_j|$ equals the number of elements in the set $C_j$. Each row of $S$ contains only one non-zero element, which follows from the definition of a partition: a node can only belong to one cell (or community) of the partition and the union of all cells is again the complete set $\mathcal{N}$ of nodes. Thus, the elements of the $N \times k$ community matrix $S$ are
$$S_{ij} = \begin{cases} 1 & \text{if node } i \text{ belongs to the set (or community) } C_j \\ 0 & \text{otherwise} \end{cases}$$
and the columns of $S$ are orthogonal and $\mathrm{trace}\left(S^T S\right) = N$.
15. Quotient matrix. The quotient matrix corresponding to the partition $\pi$ specified by $\{C_1, C_2, \ldots, C_k\}$ is defined as the $k \times k$ matrix
$$A^\pi = \left(S^T S\right)^{-1} S^T A S \qquad (2.11)$$
where $A$ is in the partitioned form of art. 14 and $(S^T S)^{-1} = \mathrm{diag}\left(\frac{1}{|C_1|}, \frac{1}{|C_2|}, \ldots, \frac{1}{|C_k|}\right)$. The quotient matrix of the matrix $A$ of the example in art. 14 is
$$A^\pi = \begin{bmatrix} 1 & 2 & 0 \\ \frac{4}{3} & \frac{2}{3} & 1 \\ 0 & 3 & 0 \end{bmatrix}$$
We can verify that $(A^\pi)_{ij}$ equals the average row sum of the block matrix $A_{i,j}$. An example of the quotient matrix $Q^\pi$ of a Laplacian $Q$ is given in Section 5.11. If the row sum of each block matrix $A_{i,j}$ is constant, then the partition $\pi$ is called equitable (or regular). In that case, $A_{i,j}u = (A^\pi)_{ij}u$, or $AS = SA^\pi$. Equivalently, a partition is equitable if, for any $i$ and $j$, the number of neighbors that a node in $C_i$ has in the cell $C_j$ does not depend on the choice of a node in $C_i$. For example, consider a node $v$ in the Petersen graph shown in Fig. 2.3 and construct the three-cell partition as $C_1 = \{v\}$, $C_2$ is the set of the neighbors of $v$ and $C_3$ is the set of nodes two hops away from $v$. The number of neighbors of $v$ in $C_2$ is three and zero in $C_3$, while the number of neighbors of a node in $C_2$ within $C_3$ is two, such that
$$A^\pi = \begin{bmatrix} 0 & 3 & 0 \\ 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix}$$
Fig. 2.3. The Petersen graph.
A distance partition with respect to node $v$ is the partition of $\mathcal{N}$ into the sets of nodes in $G$ at distance $r$ from the node $v$. A distance partition is, in general, not equitable. If the partition is equitable and $y$ is an eigenvector of $A^\pi$ belonging to the eigenvalue $\lambda$, then $Sy$ is an eigenvector of $A$ belonging to the same eigenvalue $\lambda$. Indeed, left-multiplication of the eigenvalue equation $A^\pi y = \lambda y$ by $S$ yields
$$\lambda Sy = SA^\pi y = ASy$$
This property makes equitable partitions quite powerful. For example, the adjacency matrix of the complete bipartite graph $K_{m,n}$ (see Section 5.7) has an equitable partition with $k = 2$. The corresponding quotient matrix is $A^\pi = \begin{bmatrix} 0 & m \\ n & 0 \end{bmatrix}$, whose eigenvalues are $\pm\sqrt{mn}$, which are the non-zero eigenvalues of $K_{m,n}$. The quotient matrix of the complete multi-partite graph is derived in Section 5.9. The quotient graph of an equitable partition, denoted by $G^\pi$, is the directed graph with the cells of the partition as its nodes and with $(A^\pi)_{ij}$ links going from cell/node $C_i$ to node $C_j$. Thus, $(A^\pi)_{ij}$ equals the number of links that join a node in the cell $C_i$ to the nodes in cell $C_j$. In general, the quotient graph contains multiple links and self-loops. The subgraph induced by each cell in an equitable partition is necessarily a regular graph, because each node in cell $C_i$ has the same number $(A^\pi)_{ii}$ of neighbors in its own cell $C_i$.
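The quotient matrix (2.11) and the equitable-partition test can be sketched as follows. The helpers `quotient` and `is_equitable` are our own names, nodes are renumbered from 0, and the Petersen graph is built with one common labeling (an outer 5-cycle, an inner pentagram and five spokes); this labeling is our choice, and because the Petersen graph is distance-regular, the distance partition from any node yields the same quotient matrix.

```python
from fractions import Fraction


def quotient(adj, cells):
    """Quotient matrix (2.11): entry (i, j) is the average, over the nodes of
    cell i, of their number of neighbours in cell j (cells are 0-based)."""
    return [[Fraction(sum(adj[v][w] for v in Ci for w in Cj), len(Ci))
             for Cj in cells] for Ci in cells]


def is_equitable(adj, cells):
    """True if every node of C_i has the same number of neighbours in C_j."""
    return all(len({sum(adj[v][w] for w in Cj) for v in Ci}) == 1
               for Ci in cells for Cj in cells)


def from_links(n, links):
    adj = [[0] * n for _ in range(n)]
    for a, b in links:
        adj[a][b] = adj[b][a] = 1
    return adj


def show(M):
    return [[str(x) for x in row] for row in M]


# Fig. 2.1 (nodes renumbered 0..5) with C1 = {1,3}, C2 = {2,4,6}, C3 = {5}.
fig21 = from_links(6, [(0, 1), (0, 2), (0, 5), (1, 2), (1, 4), (1, 5),
                       (2, 3), (3, 4), (4, 5)])
cells21 = [[0, 2], [1, 3, 5], [4]]
print(show(quotient(fig21, cells21)))  # [['1', '2', '0'], ['4/3', '2/3', '1'], ['0', '3', '0']]
print(is_equitable(fig21, cells21))    # False

# Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, spokes i -- i+5.
petersen = from_links(10, [pair for i in range(5) for pair in
                           ((i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5))])
cells = [[0], [1, 4, 5], [2, 3, 6, 7, 8, 9]]   # distance partition from node 0
A_pi = quotient(petersen, cells)
print(show(A_pi), is_equitable(petersen, cells))
# [['0', '3', '0'], ['1', '0', '2'], ['0', '1', '2']] True

# Lift an eigenvector of A_pi to one of A: y = (3, 1, -1) has eigenvalue 1.
cell_of = {v: c for c, C in enumerate(cells) for v in C}
y = [3, 1, -1]
Sy = [y[cell_of[v]] for v in range(10)]
ASy = [sum(petersen[v][w] * Sy[w] for w in range(10)) for v in range(10)]
print(ASy == Sy)                                # True: A (S y) = 1 * (S y)
```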
16. Trees. We consider an application of equitable partitions to trees. A tree $T$ is centrally symmetric with center $u$ if and only if there is an automorphism of $T$ which fixes $u$ and which maps $x$ to $y$, where $x$ and $y$ are any two nodes in $T$ at the same distance from $u$.
Lemma 1 A tree is centrally symmetric with respect to node $u$ if and only if the distance partition is equitable. □
Proof: see Godsil (1993).
Lemma 2 Any tree with maximum degree $d_{\max}$ is a subgraph of a centrally symmetric tree $T_{d_{\max}}$ with the property that all nodes of $T_{d_{\max}}$ have degree 1 or $d_{\max}$. □
Proof: see Godsil (1993).
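The tree $T_{d_{\max}}$ is completely determined once the radius, the distance from the center to an end-node, is known. The quotient matrix $A^\pi$ of the adjacency matrix of $T_{d_{\max}}$ corresponding to the distance partition is
$$A^\pi = \begin{bmatrix} 0 & d_{\max} & & & \\ 1 & 0 & d_{\max} - 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 0 & d_{\max} - 1 \\ & & & 1 & 0 \end{bmatrix}$$
A similarity transform does not alter the eigenvalues (art. 142). If $H = \mathrm{diag}\left(\left\{\left(\sqrt{d_{\max} - 1}\right)^{i-1}\right\}_i\right)$, then $\tilde A = H A^\pi H^{-1}$, with the same dimensions as $A^\pi$, is
$$\tilde A = \begin{bmatrix} 0 & \frac{d_{\max}}{\sqrt{d_{\max}-1}} & & & \\ \sqrt{d_{\max}-1} & 0 & \sqrt{d_{\max}-1} & & \\ & \ddots & \ddots & \ddots & \\ & & \sqrt{d_{\max}-1} & 0 & \sqrt{d_{\max}-1} \\ & & & \sqrt{d_{\max}-1} & 0 \end{bmatrix}$$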
These results are used in art. 63.
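The similarity transform of art. 16 can be checked numerically for one choice of $d_{\max}$ and radius (here $d_{\max} = 3$ and radius 4, an arbitrary example). The sketch below (helper names ours) builds the tridiagonal quotient matrix $A^\pi$, forms $\tilde A = H A^\pi H^{-1}$, and compares the traces of the first few powers, which similar matrices share; floating-point comparisons use `math.isclose` with a small absolute tolerance.

```python
import math


def quotient_tree(dmax, radius):
    """Tridiagonal quotient matrix of T_dmax for its distance partition
    (cells at distance 0, 1, ..., radius from the center)."""
    if radius < 1 or dmax < 2:
        raise ValueError("need radius >= 1 and dmax >= 2")
    k = radius + 1
    A = [[0.0] * k for _ in range(k)]
    A[0][1] = float(dmax)              # the center has dmax neighbours
    for i in range(1, k):
        A[i][i - 1] = 1.0              # one neighbour closer to the center
        if i < k - 1:
            A[i][i + 1] = dmax - 1.0   # dmax - 1 neighbours further out
    return A


def similar(A, dmax):
    """H A H^{-1} with H = diag((sqrt(dmax - 1))^(i-1)), i = 1, ..., k."""
    s = math.sqrt(dmax - 1)
    h = [s ** i for i in range(len(A))]
    return [[h[i] * A[i][j] / h[j] for j in range(len(A))] for i in range(len(A))]


def matmul(X, Y):
    """Matrix product X Y for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def trace_powers(A, m_max):
    """tr(A^m) for m = 1, ..., m_max; these agree for similar matrices."""
    out, M = [], A
    for _ in range(m_max):
        out.append(sum(M[i][i] for i in range(len(M))))
        M = matmul(M, A)
    return out


A = quotient_tree(dmax=3, radius=4)
At = similar(A, 3)
print([round(x, 4) for x in At[0][:2]], round(At[1][0], 4), round(At[1][2], 4))
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(trace_powers(A, 5), trace_powers(At, 5)))
print(same)
```

The printed entries of $\tilde A$ should match $d_{\max}/\sqrt{d_{\max}-1}$ in the first row and $\sqrt{d_{\max}-1}$ on both off-diagonals of the other rows.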
2.2 Walks and paths 17. A walk of length n from node l to node m is a succession of n arcs of the form (q0 $ q1 )(q1 $ q2 ) · · · (qn1 $ qn ) where q0 = l and qn = m. A path is a walk in which all vertices are dierent, i.e., qo 6= qp for all 0 o 6= p n. A closed walk of length n, also called a cycle of length n, is a walk that starts in node l and returns, after n hops, to that same node l.
Lemma 3 The number of walks of length $k$ from node $i$ to node $j$ is equal to the element $(A^k)_{ij}$.
Proof (by induction): For $k = 1$, the number of walks of length 1 between node $i$ and node $j$ equals the number of direct links between $i$ and $j$. By definition, this is the element $a_{ij}$ of the adjacency matrix $A$. Suppose the lemma holds for $k - 1$. A walk of length $k$ consists of a walk of length $k - 1$ from $i$ to some vertex $r$ that is adjacent to $j$. By the induction hypothesis, the number of walks of length $k - 1$ from $i$ to $r$ is $(A^{k-1})_{ir}$, and the number of walks of length 1 from $r$ to $j$ equals $a_{rj}$. The total number of walks of length $k$ from $i$ to $j$ then equals $\sum_{r=1}^{N} (A^{k-1})_{ir}\, a_{rj} = (A^k)_{ij}$ by the rules of matrix multiplication. □
Explicitly,
$$(A^k)_{ij} = \sum_{r_1=1}^{N} \sum_{r_2=1}^{N} \cdots \sum_{r_{k-1}=1}^{N} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-2} r_{k-1}} a_{r_{k-1} j} \qquad (2.12)$$
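Lemma 3 can be verified by brute force on a small graph. The following minimal sketch assumes NumPy and uses a hypothetical 5-node graph: a triangle with a pendant path attached. It enumerates every node sequence, which is the sum (2.12) written out term by term, and compares the count with the entries of $A^k$.

```python
import itertools
import numpy as np

# Hypothetical 5-node example: triangle 0-1-2 with the path 2-3-4 attached.
links = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
N = 5
A = np.zeros((N, N), dtype=int)
for i, j in links:
    A[i, j] = A[j, i] = 1

def count_walks(A, i, j, k):
    """Count walks of length k from i to j by enumerating every sequence
    i, r_1, ..., r_{k-1}, j, i.e. every term of the sum (2.12)."""
    n = A.shape[0]
    count = 0
    for inner in itertools.product(range(n), repeat=k - 1):
        seq = (i, *inner, j)
        if all(A[a, b] == 1 for a, b in zip(seq, seq[1:])):
            count += 1
    return count

mismatches = [
    (i, j, k)
    for k in range(1, 5)
    for i in range(N)
    for j in range(N)
    if count_walks(A, i, j, k) != np.linalg.matrix_power(A, k)[i, j]
]
```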
18. For $k > 1$ and $N > 2$, the number of paths with $k$ hops between node $i$ and node $j$ is
$$X_k(i,j;N) = \sum_{r_1 \notin \{i,j\}} \; \sum_{r_2 \notin \{i,r_1,j\}} \cdots \sum_{r_{k-1} \notin \{i,r_1,\ldots,r_{k-2},j\}} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-1} j}$$
The number of paths with $k = 1$ hop between the node pair $(i,j)$ is $X_1(i,j;N) = a_{ij}$. Symmetry of the adjacency matrix $A$ implies that $X_k(i,j;N) = X_k(j,i;N)$. The definition of a path restricts the first index $r_1$ to $N - 2$ possible values, the second index $r_2$ to $N - 3$, and so on. The maximum number of $k$-hop paths is attained in the complete graph $K_N$, where $a_{ij} = 1$ for each link $(i,j)$, and equals
$$\prod_{l=1}^{k-1} (N - 1 - l) = \frac{(N-2)!}{(N-k-1)!}$$
By contrast, (2.12) gives $N^{k-1}$ as the total possible number of walks. The total number of paths $M_N$ between two nodes in the complete graph is
$$M_N = \sum_{k=1}^{N-1} \frac{(N-2)!}{(N-k-1)!} = (N-2)! \sum_{k=0}^{N-2} \frac{1}{k!} = (N-2)!\, e - R$$
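where
$$R = (N-2)! \sum_{j=N-1}^{\infty} \frac{1}{j!} = \sum_{j=0}^{\infty} \frac{(N-2)!}{(N-1+j)!} = \frac{1}{N-1} + \frac{1}{(N-1)N} + \frac{1}{(N-1)N(N+1)} + \cdots < \sum_{j=1}^{\infty} \left(\frac{1}{N-1}\right)^{j} = \frac{1}{N-2}$$
For $N \ge 3$, this implies $R < 1$. Since $M_N$ is an integer, the total number of paths in $K_N$ is exactly
$$M_N = \lfloor e\,(N-2)! \rfloor \qquad (2.13)$$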
where $e = 2.718281\ldots$ and $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$. Any graph is a subgraph of the complete graph. Hence, in any graph, the total number of paths between two nodes is at most $\lfloor e\,(N-2)! \rfloor$.
19. Diameter of a graph. A graph $G$ is connected if there exists a walk between each pair of nodes in $G$. Lemma 3 shows that connectivity is equivalent to the following condition: for each nodal pair $(i,j)$, there exists some integer $k > 0$ with $(A^k)_{ij} \neq 0$. The diameter of the graph $G$ is the lowest integer $k$ such that, for each pair $(i,j)$ of distinct nodes, $(A^m)_{ij} \neq 0$ for some $1 \le m \le k$. The diameter thus equals the length of the longest shortest hop path in $G$.
20. A shortest path. To each link $e_l = i \to j \in \mathcal{L}$ from node $i \in \mathcal{N}$ to node $j \in \mathcal{N}$ in the network, we assign a link weight $w(e_l) = w_{ij}$. This weight is a non-negative real number that quantifies a property of the link, such as the delay incurred or energy needed when traveling over it, its distance, its capacity or its monetary cost. The set of all link weights is called the link weight structure of $G$. We consider only additive link weights: the weight of a path $P$ is $w(P) = \sum_{e_l \in P} w(e_l)$, i.e., $w(P)$ equals the sum of the weights of the constituent links of $P$. The shortest path $P^*_{A \to B}$ from $A$ to $B$ is the path with minimal weight. Thus, $w(P^*_{A \to B}) \le w(P_{A \to B})$ for all paths $P_{A \to B}$. If all link weights equal $w_{ij} = 1$, shortest paths are shortest hop paths, and $w(P^*_{A \to B})$ is also called the distance between nodes $A$ and $B$. Many routing algorithms exist to compute shortest paths in networks. The most important of them are explained, for example, in Van Mieghem (2006a) and Cormen et al. (1991).
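Both counts in this subsection can be illustrated with short programs. The sketch below uses plain Python with only the standard library. It does two things:
- It enumerates the paths between two nodes of $K_N$ by depth-first search and compares the count with (2.13).
- It computes shortest path weights with Dijkstra's algorithm, one standard routing algorithm for non-negative additive link weights. This algorithm is chosen only for illustration; the text does not prescribe one.

The weighted example graph is hypothetical.

```python
import heapq
import math

def count_paths(adj, source, target):
    """Count all paths (walks without repeated nodes) from source to target by DFS."""
    def dfs(node, visited):
        if node == target:
            return 1
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt, visited)
                visited.remove(nxt)
        return total
    return dfs(source, {source})

def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

# (2.13): M_N = floor(e (N-2)!) paths between two nodes of K_N
path_counts = {n: count_paths(complete_graph(n), 0, 1) for n in range(3, 9)}
formula = {n: math.floor(math.e * math.factorial(n - 2)) for n in range(3, 9)}

def dijkstra(adj, source):
    """Minimal additive path weights w(P*) from source; link weights must be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                      # stale heap entry, a shorter path was found already
        for nbr, weight in adj[v]:
            candidate = d + weight
            if candidate < dist.get(nbr, math.inf):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist

# Hypothetical weighted graph: node -> list of (neighbor, link weight w_ij)
weighted = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 2.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 2.0), ("D", 1.0)],
    "D": [("B", 5.0), ("C", 1.0)],
}
shortest = dijkstra(weighted, "A")
```

The path enumeration grows like $e\,(N-2)!$, so it is only practical for small $N$.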
3 Eigenvalues of the adjacency matrix
Only general results on the eigenvalue spectrum of a graph $G$ are treated here. Special types of graphs have a wealth of additional, but specific, properties of their eigenvalues.
3.1 General properties
21. Since $A$ is a real symmetric matrix, art. 151 shows that $A$ has $N$ real eigenvalues, which we order as $\lambda_N \le \lambda_{N-1} \le \cdots \le \lambda_1$. Apart from a similarity transform (art. 142), the set of eigenvalues with corresponding eigenvectors is unique. A similarity transform consists of a relabeling of the nodes in the graph. Such a relabeling does not alter the structure of the graph; it merely expresses the eigenvectors in a different base. Consider the classical Perron-Frobenius Theorem 38 in art. 168 for non-negative square matrices. For a connected graph, it states that $\lambda_1$ is a simple and non-negative root of the characteristic polynomial in (8.3), and that its eigenvector is the only eigenvector of $A$ with non-negative components. The largest eigenvalue $\lambda_1$ is also called the spectral radius of the graph.
22. The characteristic polynomial $c_A(\lambda) = \det(A - \lambda I)$, defined in art. 138, has integer coefficients and $c_N = (-1)^N$. Art. 197 then shows that the only rational zeros of $c_A(\lambda)$, i.e., zeros belonging to $\mathbb{Q}$, are integers. This property also holds for the Laplacian matrix $Q$. For example, $\frac{3}{4}$ is never an eigenvalue of $A$ nor of $Q$.
23. Gerschgorin's Theorem 36 applied to the adjacency matrix states that any eigenvalue of $A$ lies in the interval $[-d_{\max}, d_{\max}]$. Hence, $\lambda_1 \le N - 1$, and this maximum is attained in the complete graph (see Section 5.1).
24. Theorem 63, with $m = 1$ and using (3.2), indicates that all the eigenvalues of $A$ are contained in the interval $\left[-\sqrt{\frac{2L(N-1)}{N}}, \sqrt{\frac{2L(N-1)}{N}}\right]$.
25. Since $a_{ii} = 0$, we have that $\operatorname{trace}(A) = 0$. From (8.7), the coefficient $c_{N-1}$ of the characteristic polynomial $c_A(\lambda)$ is
$$(-1)^{N-1} c_{N-1} = \sum_{k=1}^{N} \lambda_k = 0 \qquad (3.1)$$
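Articles 21 to 25 can be checked numerically on any small graph. The sketch below assumes NumPy and uses a hypothetical connected 6-node graph: two triangles $\{0,1,2\}$ and $\{3,4,5\}$ joined by the links $(1,3)$ and $(2,4)$. It verifies the following:
- the eigenvalues sum to zero, as in (3.1);
- they lie within the Gerschgorin interval of art. 23 and within the interval of art. 24;
- the eigenvector of $\lambda_1$ has components of one sign.

```python
import numpy as np

# Hypothetical connected example: triangles {0,1,2} and {3,4,5} joined by links (1,3) and (2,4).
links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N, L = 6, len(links)
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)

lam, X = np.linalg.eigh(A)             # ascending order, so lam[-1] is lambda_1
sum_is_zero = np.isclose(lam.sum(), np.trace(A)) and np.isclose(lam.sum(), 0.0)   # (3.1)
within_gerschgorin = np.all(np.abs(lam) <= d.max() + 1e-12)                       # art. 23
radius_bound = np.sqrt(2 * L * (N - 1) / N)
within_art24 = np.all(np.abs(lam) <= radius_bound + 1e-12)                        # art. 24
x1 = X[:, -1]                          # eigenvector of lambda_1 (its overall sign is arbitrary)
perron = np.all(x1 > 0) or np.all(x1 < 0)
```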
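26. Apply the Newton identities (9.3) to the characteristic polynomial (8.3) and (8.5) of the adjacency matrix, with $z_k = \lambda_k$, $a_k = (-1)^N c_k$ and $c_{N-1} = 0$ (from (3.1)). For the first few values, this yields
$$(-1)^N c_{N-2} = -\frac{1}{2}\sum_{k=1}^{N} \lambda_k^2$$
$$(-1)^N c_{N-3} = -\frac{1}{3}\sum_{k=1}^{N} \lambda_k^3$$
$$(-1)^N c_{N-4} = \frac{1}{8}\left[\left(\sum_{k=1}^{N} \lambda_k^2\right)^2 - 2\sum_{k=1}^{N} \lambda_k^4\right]$$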
27. From (8.4), the coefficient $c_{N-2}$ of the characteristic polynomial follows from the sum of all principal minors $M_2$ of order 2: $(-1)^N c_{N-2} = \sum_{\text{all}} M_2$. Each principal minor $M_2$ of the adjacency matrix $A$ has a principal submatrix of the form $\begin{bmatrix} 0 & x \\ y & 0 \end{bmatrix}$ with $x, y \in \{0, 1\}$. A minor $M_2$ is non-zero if and only if $x = y = 1$, in which case $M_2 = -1$. Each pair of adjacent nodes gives such a non-zero minor, which implies that $(-1)^N c_{N-2} = -L$. The Newton identities in art. 26 then show that the number of links $L$ equals
$$L = \frac{1}{2}\sum_{k=1}^{N} \lambda_k^2 \qquad (3.2)$$
Since the mean $E[\lambda] = \frac{1}{N}\sum_{k=1}^{N} \lambda_k = 0$ (art. 25), the basic law of the degree (2.3) gives the variance of the adjacency eigenvalues as
$$\operatorname{Var}[\lambda] = \frac{1}{N}\sum_{k=1}^{N} \lambda_k^2 = \frac{2L}{N} = E[D]$$
This stochastic interpretation is especially helpful to understand the density function of the adjacency eigenvalues (for example, in Section 7).
28. Each principal submatrix $M_{3\times 3}$ of the adjacency matrix $A$ is of the form
$$M_{3\times 3} = \begin{bmatrix} 0 & x & z \\ x & 0 & y \\ z & y & 0 \end{bmatrix}$$
The corresponding minor $M_3 = \det M_{3\times 3} = 2xyz$ is non-zero only for $x = y = z = 1$. That form of $M_{3\times 3}$ corresponds with a subgraph of three fully connected nodes. Hence, $(-1)^{N-3} c_{N-3} = \sum_{\text{all}} M_3 = 2 \times$ the number of triangles $\blacktriangle_G$ in $G$. From art. 26, it follows that the number of triangles in $G$ is
$$\blacktriangle_G = \frac{1}{6}\sum_{k=1}^{N} \lambda_k^3 \qquad (3.3)$$
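The identities of arts. 26 to 28 can be confirmed numerically. This minimal sketch assumes NumPy. `np.poly(A)` returns the coefficients of $\det(\lambda I - A)$, highest power first. Its entry $k$ is the coefficient of $\lambda^{N-k}$, which equals $(-1)^N c_{N-k}$ for $c_A(\lambda) = \det(A - \lambda I)$. The graph is the same hypothetical two-triangle graph, with $L = 8$ links and two triangles.

```python
import itertools
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N, L = 6, len(links)
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1

# p[k] is the coefficient of lambda^(N-k) in det(lambda I - A), i.e. (-1)^N c_{N-k}
p = np.poly(A)
lam = np.linalg.eigvalsh(A)
s2, s3, s4 = (lam ** 2).sum(), (lam ** 3).sum(), (lam ** 4).sum()

newton = {
    2: -s2 / 2,                          # (-1)^N c_{N-2}
    3: -s3 / 3,                          # (-1)^N c_{N-3}
    4: (s2 ** 2 - 2 * s4) / 8,           # (-1)^N c_{N-4}
}
triangles = sum(1 for i, j, k in itertools.combinations(range(N), 3)
                if A[i, j] and A[j, k] and A[i, k])
```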
29. In general, from (8.4) and by identifying the structure of a minor $M_k$ of $A$, any coefficient $c_{N-k}$ can be expressed in terms of graph characteristics:
$$(-1)^N c_{N-k} = \sum_{G \in \mathcal{G}_k} (-1)^{\text{cycles}(G)} \qquad (3.4)$$
Here $\mathcal{G}_k$ is the set of all subgraphs of $G$ with exactly $k$ nodes, and $\text{cycles}(G)$ is the number of cycles in a subgraph $G \in \mathcal{G}_k$. More precisely, the sum runs over every set of disjoint cycles that covers all $k$ nodes of a subgraph. In such a set, a link counts as a cycle of length 2, and a cycle of length at least 3 contributes once for each of its two orientations. The minor $M_k$ is the determinant of a $k\times k$ principal submatrix of $A$ and is defined (see art. 187) as
$$M_k = \sum_{p} (-1)^{\sigma(p)} a_{1p_1} a_{2p_2} \cdots a_{kp_k}$$
The sum is over all $k!$ permutations $p = (p_1, p_2, \ldots, p_k)$ of $(1, 2, \ldots, k)$. $\sigma(p)$ is the parity of $p$, i.e., the number of interchanges of $(1, 2, \ldots, k)$ needed to obtain $(p_1, p_2, \ldots, p_k)$. The product $a_{1p_1} a_{2p_2} \cdots a_{kp_k}$ is non-zero only if all the links $(1,p_1), (2,p_2), \ldots, (k,p_k)$ are contained in $G$. Since $a_{jj} = 0$, the contributing links $(1,p_1), (2,p_2), \ldots, (k,p_k)$ form a set of disjoint cycles, and each of the $k$ nodes belongs to exactly one of these cycles. The parity $\sigma(p)$ depends on the number of those disjoint cycles. Now, $M_k$ is constructed from a specific set $G \in \mathcal{G}_k$ of $k$ out of $N$ nodes, and $\mathcal{G}_k$ contains $\binom{N}{k}$ such sets in total. Combining all contributions leads to the expression (3.4).
Harary (1962) discusses the determinant $\det A$ of a directed graph. From it, another expression than (3.4) can be derived for the coefficients $c_k$ of the characteristic polynomial $c_A(\lambda)$. An elementary subgraph of $G$ on $k$ nodes is a graph in which each component is either a link between two distinct nodes or a cycle. Here, a cycle thus has length at least 3, i.e., at least three nodes or links. Harary observes the following for the determinant of the adjacency matrix $A$ of a directed graph, or for each of its minors: each directed cycle of even (odd) length contributes negatively (positively) to $\det A$. Let $e_c$ denote the number of even components in an elementary subgraph, i.e., components containing an even number of nodes. Each cycle in an undirected graph corresponds to the two directions in its directed companion. Harary (1962) shows that the determinant of the adjacency matrix of an elementary subgraph $H$ equals
$$\det A(H) = (-1)^{e_c(H)}\, 2^{|c(H)|} \prod_{a_k \in e(H)} a_k^2 \prod_{a_m \in c(H)} a_m$$
where $c(H)$ is the set of components of $H$ that are cycles and $e(H)$ is the set of components of $H$ that are simple links. Finally, let $\mathcal{H}_k$ denote the set of all elementary subgraphs of $G$ with $k$ nodes. The coefficients of the characteristic polynomial $c_A(\lambda)$ can then be written as
$$(-1)^{N-k} c_{N-k} = \sum_{H \in \mathcal{H}_k} (-1)^{e_c(H)}\, 2^{|c(H)|}$$
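Harary's elementary-subgraph expansion can be checked by brute force for $k = 4$. An elementary subgraph on four nodes is either a pair of disjoint links, contributing $+1$, or a 4-cycle, contributing $-2$. The minimal sketch below assumes NumPy and the same hypothetical two-triangle graph. `principal_minor_sum` and `harary_k4` are illustrative helpers, not functions from any library. The sketch compares the sums of principal minors with the coefficients returned by `np.poly`, which satisfy $\texttt{np.poly(A)}[k] = (-1)^k \sum_{\text{all}} M_k$.

```python
import itertools
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N = 6
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1

def principal_minor_sum(A, k):
    """Sum of all k x k principal minors of A."""
    return sum(np.linalg.det(A[np.ix_(S, S)])
               for S in itertools.combinations(range(A.shape[0]), k))

def harary_k4(A):
    """Sum over elementary subgraphs on 4 nodes of (-1)^{e_c} 2^{|c|}:
    two disjoint links contribute +1, a 4-cycle contributes -2."""
    total = 0.0
    for a, b, c, d in itertools.combinations(range(A.shape[0]), 4):
        for (a1, b1), (a2, b2) in [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]:
            total += A[a1, b1] * A[a2, b2]          # perfect matching on the 4 nodes
        for cyc in [(a, b, c, d), (a, b, d, c), (a, c, b, d)]:
            closed = cyc + (cyc[0],)
            if all(A[x, y] for x, y in zip(closed, closed[1:])):
                total -= 2                          # Hamiltonian 4-cycle on the 4 nodes
    return total

p = np.poly(A)     # p[k] = coefficient of lambda^(N-k) in det(lambda I - A)
minors = {k: principal_minor_sum(A, k) for k in (2, 3, 4)}
```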
30. Since $A$ is a symmetric 0-1 matrix, (2.2) gives
$$(A^2)_{ii} = \sum_{k=1}^{N} a_{ik} a_{ki} = \sum_{k=1}^{N} a_{ik}^2 = \sum_{k=1}^{N} a_{ik} = d_i$$
Hence, with (3.2) and (8.7), the basic law for the degree (2.3) is expressed as
$$\operatorname{trace}(A^2) = \sum_{k=1}^{N} \lambda_k^2 = \sum_{k=1}^{N} d_k = 2L \qquad (3.5)$$
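Furthermore,
$$\sum_{i=1}^{N} \sum_{j=1; j\neq i}^{N} (A^2)_{ij} = \sum_{i=1}^{N} \sum_{j=1; j\neq i}^{N} \sum_{k=1}^{N} a_{ik} a_{kj} = \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} \sum_{j=1; j\neq i}^{N} a_{kj} = \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} (d_k - a_{ki}) = \sum_{k=1}^{N} \left( d_k \sum_{i=1}^{N} a_{ki} - \sum_{i=1}^{N} a_{ki} \right)$$
or
$$\sum_{i=1}^{N} \sum_{j=1; j\neq i}^{N} (A^2)_{ij} = \sum_{k=1}^{N} d_k (d_k - 1) \qquad (3.6)$$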
By Lemma 3, $\sum_{i=1}^{N}\sum_{j=1; j\neq i}^{N} (A^2)_{ij}$ equals the total number of two-hop walks whose source and destination nodes differ. In this count, each two-hop path appears twice, once in each direction. In other words, the total number of connected triplets of nodes in $G$ equals half of (3.6). Art. 8 further shows that
$$\sum_{i=1}^{N}\sum_{j=1; j\neq i}^{N} (A^2)_{ij} = 2\sum_{i=1}^{N} \binom{d_i}{2} = 2 L_{l(G)}$$
Hence, the number of links $L_{l(G)}$ in the line graph $l(G)$ of the graph $G$ equals the number of connected triplets of nodes in $G$.
31. The Hadamard inequality (8.93) for the determinant of a matrix yields, with (2.2),
$$|\det A| \le \prod_{j=1}^{N} \left( \sum_{i=1}^{N} a_{ji}^2 \right)^{1/2} = \prod_{j=1}^{N} \left( \sum_{i=1}^{N} a_{ji} \right)^{1/2} = \prod_{j=1}^{N} \sqrt{d_j}$$
Hence, with (8.6),
$$(\det A)^2 = \prod_{k=1}^{N} \lambda_k^2 \le \prod_{j=1}^{N} d_j \qquad (3.7)$$
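The counting identities of arts. 30 and 31 are easy to check numerically. The sketch below assumes NumPy and the hypothetical two-triangle graph. It counts the links of the line graph directly, as pairs of links that share a node, so that this count does not rely on (3.6). For the two-triangle graph, $\det A = 0$ makes (3.7) trivial. The sketch therefore also checks (3.7) on the cycle $C_5$, whose adjacency matrix is non-singular.

```python
import itertools
import numpy as np

def adjacency(N, links):
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1
    return A

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
A = adjacency(6, links)
L = len(links)
d = A.sum(axis=1)
A2 = A @ A

trace_ok = np.isclose(np.trace(A2), 2 * L)                          # (3.5)
off_diagonal = A2.sum() - np.trace(A2)                              # sum over i != j of (A^2)_ij
identity_36 = np.isclose(off_diagonal, np.sum(d * (d - 1)))         # (3.6)
# links of the line graph l(G): pairs of links of G that share an end node
line_graph_links = sum(1 for e, f in itertools.combinations(links, 2) if set(e) & set(f))
triplets = off_diagonal / 2                                          # connected triplets of nodes

def hadamard_holds(A):
    """(3.7): (det A)^2 = prod lambda_k^2 <= prod d_j."""
    lam = np.linalg.eigvalsh(A)
    return np.prod(lam ** 2) <= np.prod(A.sum(axis=1)) + 1e-9

C5 = adjacency(5, [(i, (i + 1) % 5) for i in range(5)])
```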
32. Art. 158 relates the diagonal elements of a symmetric matrix to its eigenvalues. Since $a_{jj} = 0$, the matrix equation (8.35) becomes $Y^T \lambda = 0$. Here $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)^T$, and the non-negative matrix $Y = [\,y_1\; y_2\; \cdots\; y_N\,]$ consists of the column vectors $y_j = (x_{1j}^2, x_{2j}^2, \ldots, x_{Nj}^2)$, where $x_{kj}$ is the $j$-th component of the $k$-th eigenvector of $A$, belonging to $\lambda_k$. Geometrically, $Y^T\lambda = 0$ means that the vector $\lambda$ is orthogonal to all $N$ vectors $y_j$. A non-zero solution for $\lambda$ therefore requires $\det Y = 0$. This means that the matrix $Y$ has a zero eigenvalue. As shown in art. 158, all other eigenvalues of $Y$ lie within the unit circle, and the largest eigenvalue equals exactly 1. In addition, $\det Y = 0$ implies that the vectors $y_1, y_2, \ldots, y_N$ are linearly dependent. The $k$-th row of $Y$ equals the vector $z_k$ of the squared components of the $k$-th eigenvector $x_k$. Hence, $\det Y = \det Y^T = 0$ also implies that the vectors $z_1, z_2, \ldots, z_N$ are linearly dependent.
Since $(A^2)_{jj} = d_j$, another instance of (8.36) gives
$$Y^T \lambda^2 = d$$
where $\lambda^2 = (\lambda_1^2, \lambda_2^2, \ldots, \lambda_N^2)$ and $d = (d_1, d_2, \ldots, d_N)$ is the degree vector. Since $Y u = u$ (art. 158), left-multiplying with $u^T$ yields $u^T Y^T \lambda^2 = (Yu)^T\lambda^2 = u^T \lambda^2 = u^T d = 2L$, which is (3.5).
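The relations of art. 32 can be illustrated numerically. The sketch below assumes NumPy and the hypothetical two-triangle graph. `np.linalg.eigh` returns orthonormal eigenvectors as columns, and row $k$ of `Y` holds the squared components of the eigenvector $x_k$. The sketch checks the following:
- $Y^T\lambda = 0$ and $Y^T\lambda^2 = d$;
- $Y$ is doubly stochastic;
- $\det Y$ vanishes up to rounding.

```python
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N, L = 6, len(links)
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)
u = np.ones(N)

lam, X = np.linalg.eigh(A)       # column k of X is the normalized eigenvector x_k
Y = (X ** 2).T                   # Y[k, j] = x_{kj}^2: row k holds the squared components of x_k
```

The identities hold for any orthonormal eigenbasis, so repeated eigenvalues do not affect them.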
3.2 The number of walks
33. By Lemma 3, the total number $N_k$ of walks of length $k$ in a graph is
$$N_k = \sum_{i=1}^{N}\sum_{j=1}^{N} (A^k)_{ij} = u^T A^k u \qquad (3.8)$$
For example, $N_0 = N$ and $N_1 = 2L$. Invoking (2.4) and $A^T = A$, we can write
$$N_k = u^T A^T A^{k-2} A u = d^T A^{k-2} d$$
For example, if $k = 2$, we obtain $N_2 = d^T d$, or
$$N_2 = \sum_{k=1}^{N} d_k^2 = N\left( (E[D])^2 + \operatorname{Var}[D] \right) \qquad (3.9)$$
For the graph of Fig. 2.1, ignoring directions, the numbers of walks $\{N_0, N_1, \ldots, N_{10}\}$ up to $k = 10$ are $\{6, 18, 56, 174, 542, 1688, 5258, 16378, 51016, 158910, 494990\}$. Since the adjacency matrix $A$ is symmetric, art. 156 provides us with
$$A^k = \sum_{n=1}^{N} \lambda_n^k\, x_n x_n^T \qquad (3.10)$$
The total number $N_k = u^T A^k u$ of walks of length $k$ is therefore expressed in terms of the eigenvalues of $A$ as
$$N_k = \sum_{n=1}^{N} \left(x_n^T u\right)^2 \lambda_n^k \qquad (3.11)$$
where $x_n^T u = \sum_{j=1}^{N} (x_n)_j$ is the sum of the components of the eigenvector $x_n$. In regular graphs (art. 41), the normalized eigenvector is $x_1 = \frac{u}{\sqrt{N}}$ and $\lambda_1 = r$. In that case, the number of all walks with $k$ hops equals $N_k = N r^k$. Geometrically, the scalar product $x_n^T u = \sum_{j=1}^{N} (x_n)_j$ is the projection of the eigenvector $x_n$ onto the vector $u$,
$$x_n^T u = \|x_n\|_2 \|u\|_2 \cos\alpha_n = \sqrt{N}\cos\alpha_n \qquad (3.12)$$
where $\alpha_n$ is the angle between the eigenvector $x_n$ and the all-one vector $u$. Cvetković et al. (1997) coined the term "graph angles" for these angles. In terms of them, the total number $N_k$ of walks of length $k$ is
$$N_k = N \sum_{n=1}^{N} \lambda_n^k \cos^2\alpha_n \qquad (3.13)$$
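Expressions (3.8), (3.9) and (3.13) give three routes to the same walk counts. The sketch below assumes NumPy and the hypothetical two-triangle graph, and compares the three. It also checks $N_k = N r^k$ for the regular cycle $C_6$, where $r = 2$.

```python
import numpy as np

def adjacency(N, links):
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1
    return A

A = adjacency(6, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
N = A.shape[0]
u = np.ones(N)
d = A @ u                                   # degree vector, d = A u as in (2.4)

def walks_direct(A, k):
    """(3.8): N_k = u^T A^k u."""
    u = np.ones(A.shape[0])
    return u @ np.linalg.matrix_power(A, k) @ u

def walks_spectral(A, k):
    """(3.13): N_k = N sum_n lambda_n^k cos^2(alpha_n), with cos(alpha_n) = x_n^T u / sqrt(N)."""
    n = A.shape[0]
    lam, X = np.linalg.eigh(A)
    cos_alpha = X.T @ np.ones(n) / np.sqrt(n)
    return n * np.sum(lam ** k * cos_alpha ** 2)

direct = [walks_direct(A, k) for k in range(8)]
spectral = [walks_spectral(A, k) for k in range(8)]
n2_moments = N * (d.mean() ** 2 + d.var())  # (3.9), with the population variance of the degree

C6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
cycle_walks = [walks_direct(C6, k) for k in range(8)]   # regular graph: N r^k = 6 * 2^k
```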
Since $N_k = u^T A^m A^{k-m} u = (A^m u)^T (A^{k-m} u)$, the Cauchy-Schwarz inequality (8.41) shows that
$$\left| (A^m u)^T (A^{k-m} u) \right|^2 \le \left( (A^m u)^T (A^m u) \right) \left( (A^{k-m} u)^T (A^{k-m} u) \right)$$
For integers $k \ge 0$, we obtain from this the inequality
$$N_k^2 \le \left( u^T A^{2m} u \right) \left( u^T A^{2(k-m)} u \right) \qquad (3.14)$$
This is valid for any number $m$, provided $A^m$ exists and is a real matrix. Equality holds, for example, for regular graphs. If $2m$ is an integer, we have $N_k^2 \le N_{2m} N_{2k-2m}$. In particular, for $m = 0$, $N_k^2 \le N N_{2k}$. For $2m = 1$, we would erroneously deduce that $N_k^2 \le (2L)\, N_{2k-1}$. This is wrong because at least one eigenvalue $\lambda_N(A)$ is negative, so that $A^{1/2} = X \operatorname{diag}\left(\sqrt{\lambda_j(A)}\right) X^T$ is not a real matrix, as required for the Cauchy-Schwarz inequality.
34. The generating function $N_G(z)$. The generating function of the total number of walks in a graph $G$ is defined as
$$N_G(z) = \sum_{k=0}^{\infty} N_k z^k \qquad (3.15)$$
The two different expressions in art. 33 result in two different expressions for $N_G(z)$. First, substituting the definition (3.8) into (3.15) yields
$$N_G(z) = u^T \left( \sum_{k=0}^{\infty} A^k z^k \right) u = u^T (I - zA)^{-1} u \qquad (3.16)$$
where $|z| < \frac{1}{\lambda_1}$ is needed for the infinite series to converge (art. 149). Since $A$ is symmetric, $f(A) = (f(A))^T$ holds for any analytic function $f(z)$, i.e., any function possessing a power series expansion around some point. Thus,
$$u^T (I - zA)^{-1} u = u^T \left( (I - zA)^{-\frac{1}{2}} \right)^T (I - zA)^{-\frac{1}{2}} u = \left\| (I - zA)^{-\frac{1}{2}} u \right\|_2^2$$
which shows that
$$N_G(z) = \left\| (I - zA)^{-\frac{1}{2}} u \right\|_2^2 \ge 0$$
for all real $z$ obeying $|z| < \frac{1}{\lambda_1}$. As follows from art. 180, the zeros of $N_G(z)$ are simple and lie between two consecutive eigenvalues of $A$. Second, invoking (3.11) gives, for $|z| < \frac{1}{\lambda_1}$,
$$N_G(z) = \sum_{n=1}^{N} \left(x_n^T u\right)^2 \sum_{k=0}^{\infty} \lambda_n^k z^k = \sum_{n=1}^{N} \frac{\left(x_n^T u\right)^2}{1 - \lambda_n z} \qquad (3.17)$$
Since $N_G(0) = N_0 = N$, we have
$$N = \sum_{n=1}^{N} \left(x_n^T u\right)^2 \qquad (3.18)$$
and, with (3.13),
$$\sum_{n=1}^{N} \cos^2\alpha_n = 1$$
In regular graphs (art. 41), $x_1 = \frac{u}{\sqrt{N}}$ is the eigenvector belonging to $\lambda_1 = r$. There, the generating function (3.17) of the total number of walks simplifies to
$$N_{\text{regular graph}}(z) = \frac{N}{1 - rz} \qquad (3.19)$$
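The power series (3.15), the resolvent form (3.16) and the spectral form (3.17) can be compared at a single point inside the disk of convergence. The sketch below makes three assumptions:
- it uses NumPy;
- it uses the hypothetical two-triangle graph, for which $\lambda_1 < 3$, so $z = 0.1$ lies well inside $|z| < 1/\lambda_1$;
- it truncates the series after 200 terms.

It also checks (3.18), and checks (3.19) for the cycle $C_6$.

```python
import numpy as np

def adjacency(N, links):
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1
    return A

def N_G_series(A, z, terms=200):
    """Truncated (3.15): sum_k N_k z^k, accumulating A^k u iteratively."""
    u = np.ones(A.shape[0])
    v, total = u.copy(), 0.0
    for k in range(terms):
        total += (u @ v) * z ** k
        v = A @ v
    return total

def N_G_resolvent(A, z):
    """(3.16): u^T (I - z A)^{-1} u."""
    u = np.ones(A.shape[0])
    return u @ np.linalg.solve(np.eye(A.shape[0]) - z * A, u)

def N_G_spectral(A, z):
    """(3.17): sum_n (x_n^T u)^2 / (1 - lambda_n z)."""
    lam, X = np.linalg.eigh(A)
    proj = X.T @ np.ones(A.shape[0])
    return np.sum(proj ** 2 / (1 - lam * z))

A = adjacency(6, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
z = 0.1
values = (N_G_series(A, z), N_G_resolvent(A, z), N_G_spectral(A, z))
proj_sq_sum = np.sum((np.linalg.eigh(A)[1].T @ np.ones(6)) ** 2)   # (3.18): equals N

C6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
regular_value = N_G_resolvent(C6, z)                              # (3.19): 6 / (1 - 2 z)
```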
Cvetković et al. (1995, p. 45) found an elegant formula for $N_G(z)$ by rewriting $u^T (I - zA)^{-1} u$ with (8.81). Indeed, take $k = 1$ in (8.81), $C_{N\times 1} = zu$ and $D_{N\times 1} = u$. With $J = u.u^T$, we obtain
$$\det(A + zJ) = \det A \,\det\left(1 + z u^T A^{-1} u\right) = \left(1 + z u^T A^{-1} u\right) \det A$$
Replacing $A \to I - zA$ results in
$$u^T (I - zA)^{-1} u = \frac{1}{z}\left( \frac{\det\left(I + z(J - A)\right)}{\det(I - zA)} - 1 \right)$$
The right-hand side can be written in terms of the complement $A^c = J - I - A$ as
$$u^T (I - zA)^{-1} u = \frac{1}{z}\left( (-1)^N \frac{\det\left(A^c + \frac{z+1}{z} I\right)}{\det\left(A - \frac{1}{z} I\right)} - 1 \right)$$
Finally, using the characteristic polynomial of a matrix $A$, $c_A(z) = \det(A - zI)$, we arrive at Cvetković's formula, for $|z| < \frac{1}{\lambda_1}$,
$$N_G(z) = \frac{1}{z}\left( (-1)^N \frac{c_{A^c}\left(-\frac{1}{z} - 1\right)}{c_A\left(\frac{1}{z}\right)} - 1 \right) \qquad (3.20)$$
This shows that $z N_G(z) + 1$ is a ratio of two real polynomials, both with real zeros and of degree at most $N$. The right-hand sides of (3.16), (3.17) and (3.20) can serve as analytic continuations of $N_G(z)$ for $|z| \ge \frac{1}{\lambda_1}$.
35. The total number of walks $N_k$ and the sum of degree powers. We will prove the inequality
$$N_k \le \sum_{j=1}^{N} d_j^k \qquad (3.21)$$
due to Fiol and Garriga (2009). Equality in (3.21) for all $k \ge 0$ is only achieved for regular graphs, because $N_{k;\text{regular graph}} = N r^k$ (art. 33). For $k \le 2$, equality in (3.21) holds in general, because $N_0 = N = \sum_{j=1}^{N} d_j^0$, $N_1 = 2L = \sum_{j=1}^{N} d_j$ and $N_2 = d^T d = \sum_{j=1}^{N} d_j^2$. For $k > 2$, the total number $N_k$ of walks of length $k$ is
$$N_k = u^T A^T A^{k-2} A u = d^T A^{k-2} d = \sum_{i=1}^{N}\sum_{j=1}^{N} d_i \left(A^{k-2}\right)_{ij} d_j = \sum_{i=1}^{N} \left(A^{k-2}\right)_{ii} d_i^2 + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} d_i \left(A^{k-2}\right)_{ij} d_j$$
where the last expression uses the symmetry $A = A^T$. From $0 \le (a - b)^2 = a^2 + b^2 - 2ab$, we see that $a^2 + b^2 \ge 2ab$, such that
$$2\sum_{i=1}^{N}\sum_{j=i+1}^{N} \left(A^{k-2}\right)_{ij} d_i d_j \le \sum_{i=1}^{N}\sum_{j=i+1}^{N} \left(A^{k-2}\right)_{ij} \left\{ d_i^2 + d_j^2 \right\}$$
and
$$N_k \le \sum_{i=1}^{N} \left(A^{k-2}\right)_{ii} d_i^2 + \sum_{i=1}^{N}\sum_{j=i+1}^{N} \left(A^{k-2}\right)_{ij}\left\{ d_i^2 + d_j^2 \right\} = \sum_{i=1}^{N}\sum_{j=1}^{N} \left(A^{k-2}\right)_{ij} d_j^2 = u^T A^{k-2} d^2$$
where the vector $d^m = \left(d_1^m, d_2^m, \ldots, d_N^m\right)$. This derivation suggests the induction argument
$$N_k \le u^T A^{k-m} d^m \le u^T A^{k-m-1} d^{m+1} \qquad (3.22)$$
which has already been demonstrated for $m = 0$, 1 and 2. Assume now that it holds
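for $m = \mu$. The induction inequality (3.22) is then proved if it also holds for $m = \mu + 1$. Using $Au = d$ in (2.4) and $A = A^T$,
$$u^T A^{k-\mu} d^\mu = u^T A^T A^{k-\mu-1} d^\mu = d^T A^{k-\mu-1} d^\mu = \sum_{i=1}^{N}\sum_{j=1}^{N} d_i \left(A^{k-\mu-1}\right)_{ij} d_j^\mu = \sum_{i=1}^{N} \left(A^{k-\mu-1}\right)_{ii} d_i^{\mu+1} + \sum_{i=1}^{N}\sum_{j=i+1}^{N} \left(A^{k-\mu-1}\right)_{ij} \left( d_i d_j^\mu + d_j d_i^\mu \right)$$
Fiol and Garriga (2009) now cleverly use the following inequality, valid for any positive numbers $a$ and $b$ and $\mu > 0$:
$$a^\mu b + a b^\mu = a^{\mu+1} + b^{\mu+1} - \left(a^\mu - b^\mu\right)(a - b) \le a^{\mu+1} + b^{\mu+1}$$
with equality if and only if $a = b$. They obtain
$$u^T A^{k-\mu} d^\mu \le \sum_{i=1}^{N} \left(A^{k-\mu-1}\right)_{ii} d_i^{\mu+1} + \sum_{i=1}^{N}\sum_{j=i+1}^{N} \left(A^{k-\mu-1}\right)_{ij} \left( d_i^{\mu+1} + d_j^{\mu+1} \right) = \sum_{i=1}^{N}\sum_{j=1}^{N} \left(A^{k-\mu-1}\right)_{ij} d_j^{\mu+1} = u^T A^{k-(\mu+1)} d^{\mu+1}$$
This establishes the induction inequality (3.22) and completes the proof of (3.21).
Apply the fundamental form of the Laplacian (4.11) in art. 77 to $x = d$:
$$d^T Q d = \sum_{l \in \mathcal{L}} \left(d_{l^+} - d_{l^-}\right)^2$$
Also, $d^T Q d = d^T(\Delta - A)d = \sum_{j=1}^{N} d_j^3 - d^T A d$, where $d^T A d = N_3$. Combining both, we find
$$\sum_{j=1}^{N} d_j^3 - N_3 = \sum_{l \in \mathcal{L}} \left(d_{l^+} - d_{l^-}\right)^2 = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \left(d_i - d_j\right)^2 a_{ij} \qquad (3.23)$$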
Here, the right-hand side sums, over all links $l$ in the graph, the squared difference between the degrees at both ends of the link $l$. We will see in Section 7.5 how this expression relates to the linear correlation coefficient of the degrees in a graph and to its (dis)assortative property.
36. The number of closed walks $W_k$ of length $k$ in graph $G$ is defined in art. 17. Lemma 3 and art. 144 show that
$$W_k = \sum_{j=1}^{N} \left(A^k\right)_{jj} = \operatorname{trace}\left(A^k\right) = \sum_{j=1}^{N} \lambda_j^k \qquad (3.24)$$
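The inequality (3.21), the identity (3.23) and the spectral form (3.24) of the closed walk count can all be checked on the hypothetical two-triangle graph used before. This is a minimal NumPy sketch. `Delta` denotes the diagonal degree matrix, so that `Delta - A` is the Laplacian $Q$.

```python
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N = 6
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1
u = np.ones(N)
d = A @ u
Delta = np.diag(d)

def total_walks(k):
    return u @ np.linalg.matrix_power(A, k) @ u        # N_k, (3.8)

# (3.21): N_k <= sum_j d_j^k for every k >= 0
fiol_garriga_ok = all(total_walks(k) <= np.sum(d ** k) + 1e-9 for k in range(12))

# (3.23): sum_j d_j^3 - N_3 = d^T Q d = sum over links of squared degree differences
lhs = np.sum(d ** 3) - total_walks(3)
quadratic_form = d @ (Delta - A) @ d
link_sum = sum((d[i] - d[j]) ** 2 for i, j in links)

# (3.24): closed walks W_k = trace(A^k) = sum_j lambda_j^k
lam = np.linalg.eigvalsh(A)
W_trace = [np.trace(np.linalg.matrix_power(A, k)) for k in range(8)]
W_spectral = [np.sum(lam ** k) for k in range(8)]
```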
Since the mean $E[\lambda] = 0$ (art. 25), the definition (3.24) demonstrates that all centered moments of the adjacency eigenvalues are non-negative and equal to
$$E\left[(\lambda - E[\lambda])^k\right] = \frac{W_k}{N}$$
Hence, the centered $k$-th moment equals the number of closed walks of length $k$ per node. The special case $k = 2$ is $\operatorname{Var}[\lambda] = \frac{2L}{N}$, which was deduced in art. 27. For $k = 3$, (3.3) indicates that
$$E\left[(\lambda - E[\lambda])^3\right] = \frac{6\blacktriangle_G}{N}$$
The skewness $s$ measures the lack of symmetry of the distribution around the mean. It is defined as the normalized third moment,
$$s = \frac{E\left[(\lambda - E[\lambda])^3\right]}{\left(E\left[(\lambda - E[\lambda])^2\right]\right)^{3/2}} = \frac{3\blacktriangle_G}{L\sqrt{E[D]}}$$
A tree has no triangles, so $\blacktriangle_G = 0$. Hence, a tree achieves the minimum possible skewness $s = 0$ in the distribution of adjacency eigenvalues, as does, more generally, any triangle-free graph. In Section 5.8, we will show that the adjacency spectrum of a tree is symmetric around the mean, i.e., around the origin $\lambda = 0$. This symmetry holds, more generally, for any bipartite graph.
The number of closed walks $W_k$ of length $k$ in graph $G$ has a nice generating function, which is derived from Jacobi's general identity (art. 193). Let $F = zA$ in (8.89). Then $\operatorname{trace}\log(I - zA) = \log\det(I - zA)$, and we expand
$$\log(I - zA) = -\sum_{k=1}^{\infty} \frac{(zA)^k}{k}$$
This Taylor series converges (art. 149) provided $|z| < \frac{1}{\lambda_1}$. After differentiation with respect to $z$, we obtain
$$\frac{d}{dz}\log\det(I - zA) = -\frac{1}{z}\sum_{k=1}^{\infty} \operatorname{trace}\left(A^k\right) z^k$$
With $W_k = \operatorname{trace}(A^k)$ and $W_0 = N$, the generating function of the number of closed walks $W_k$ in $G$, convergent for $|z| < \frac{1}{\lambda_1}$, is
$$W_G(z) = \sum_{k=0}^{\infty} W_k z^k = N - z\frac{d}{dz}\log\det(I - zA) \qquad (3.25)$$
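Substituting the last equality in (3.24) into the generating function (3.25) yields, for $|z| < \frac{1}{\lambda_1}$,
$$W_G(z) = \sum_{j=1}^{N}\sum_{k=0}^{\infty} \lambda_j^k z^k = \sum_{j=1}^{N} \frac{1}{1 - \lambda_j z} \qquad (3.26)$$
Consider the characteristic polynomial $c_A(\lambda) = \sum_{k=0}^{N} c_k \lambda^k$ of $A$, which is $c_A(\lambda) = \det(A - \lambda I) = (-\lambda)^N \det\left(I - \lambda^{-1} A\right)$. In terms of it, we have
$$\det(I - zA) = (-z)^N c_A\left(z^{-1}\right) = \sum_{k=0}^{N} (-1)^N c_{N-k} z^k$$
with $(-1)^N c_N = 1$. We then deduce from (3.25) that
$$\sum_{k=0}^{N} (-1)^N c_{N-k} z^k = \exp\left( -\sum_{k=1}^{\infty} W_k \frac{z^k}{k} \right)$$
By Taylor's theorem, it follows that
$$(-1)^N c_{N-k} = \frac{1}{k!}\left.\frac{d^k}{dz^k}\exp\left( -\sum_{n=1}^{\infty} \frac{W_n}{n} z^n \right)\right|_{z=0}$$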
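The Taylor coefficients above need not be obtained by repeated symbolic differentiation. Write $p_k = (-1)^N c_{N-k}$ for the coefficients of $\det(I - zA)$. Differentiating the exponential relation gives the recursion $k\,p_k = -\sum_{j=1}^{k} W_j\, p_{k-j}$ with $p_0 = 1$, which is one convenient way to evaluate these coefficients. The minimal sketch below assumes NumPy and the hypothetical two-triangle graph; `coefficients_from_closed_walks` is an illustrative name. It computes the coefficients from the closed walk counts alone and compares them with `np.poly(A)`.

```python
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N = 6
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1

def coefficients_from_closed_walks(W, n):
    """Return p_0, ..., p_n with det(I - zA) = sum_k p_k z^k, computed from the
    closed walk counts W_1, ..., W_n via k p_k = -sum_{j=1}^{k} W_j p_{k-j}."""
    p = [1.0]
    for k in range(1, n + 1):
        p.append(-sum(W[j] * p[k - j] for j in range(1, k + 1)) / k)
    return np.array(p)

W = [np.trace(np.linalg.matrix_power(A, k)) for k in range(N + 1)]   # W_0, ..., W_N from (3.24)
p = coefficients_from_closed_walks(W, N)
# np.poly(A) lists the coefficients of det(lambda I - A); entry k equals p_k
reference = np.poly(A)
```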
This relation is equivalent to the Newton identities (art. 26). In fact, the characteristic coefficients defined in Van Mieghem (2007) allow the above derivatives to be computed explicitly for any finite $k$. The result is precisely that of art. 26.
37. The generating function of the number of closed walks of length $k$ that start and terminate at node $j$ (art. 17) is defined as
$$W_G(z;j) = \sum_{k=0}^{\infty} \left(A^k\right)_{jj} z^k \qquad (3.27)$$
Substituting the $j$-th diagonal element of (3.10) into (3.27) yields, for $|z| < \frac{1}{\lambda_1}$,
$$W_G(z;j) = \sum_{n=1}^{N} \left(x_n x_n^T\right)_{jj} \sum_{k=0}^{\infty} \lambda_n^k z^k = \sum_{n=1}^{N} \frac{\left(x_n x_n^T\right)_{jj}}{1 - \lambda_n z}$$
Art. 157 indicates that $\left(x_k x_k^T\right)_{jj} = (x_k)_j^2$, such that
$$W_G(z;j) = \sum_{n=1}^{N} \frac{(x_n)_j^2}{1 - \lambda_n z} \qquad (3.28)$$
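The node-level generating function can be evaluated in three equivalent ways:
- as the truncated series (3.27);
- through the spectral form (3.28);
- as the diagonal element $\left((I - zA)^{-1}\right)_{jj}$ of the resolvent, which follows by summing the geometric series of matrices in (3.27).

The sketch below assumes NumPy, the hypothetical two-triangle graph, node $j = 0$ and $z = 0.1$. It also checks that the node values add up to (3.26).

```python
import numpy as np

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
N = 6
A = np.zeros((N, N))
for i, k in links:
    A[i, k] = A[k, i] = 1
j, z = 0, 0.1                    # node j and a point with |z| < 1 / lambda_1

# (3.27): truncated series sum_k (A^k)_{jj} z^k, with A^k e_j updated iteratively
v = np.zeros(N)
v[j] = 1.0
series = 0.0
for k in range(200):
    series += v[j] * z ** k
    v = A @ v

lam, X = np.linalg.eigh(A)
spectral = np.sum(X[j, :] ** 2 / (1 - lam * z))            # (3.28)
resolvent = np.linalg.inv(np.eye(N) - z * A)[j, j]         # ((I - zA)^{-1})_{jj}
per_node = [np.sum(X[i, :] ** 2 / (1 - lam * z)) for i in range(N)]
W_G = np.sum(1 / (1 - lam * z))                            # (3.26)
```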
By definition, $W_G(z) = \sum_{j=1}^{N} W_G(z;j)$. Comparing (3.26) and (3.28) then shows that $\sum_{j=1}^{N} (x_n)_j^2 = 1$, which confirms the normalization $x_n^T x_n = 1$ of the eigenvector $x_n$. Combining (8.65) and (8.66) in art. 178 yields
$$\sum_{n=1}^{N} \frac{\left(x_n x_n^T\right)_{jj}}{z - \lambda_n} = \frac{\det\left(zI - A_{\backslash\{j\}}\right)}{\det(zI - A)}$$
where $A_{\backslash\{j\}}$ is the $(N-1)\times(N-1)$ matrix obtained from $A$ by deleting the $j$-th row and column. Thus, $A_{\backslash\{j\}}$ is the adjacency matrix of the subgraph $G\backslash\{j\}$ of $G$, obtained by deleting node $j$ and all its incident links. Hence,
$$\frac{1}{z} W_G\left(\frac{1}{z};j\right) = \frac{\det\left(zI - A_{\backslash\{j\}}\right)}{\det(zI - A)}$$
Written in terms of the characteristic polynomial of a matrix $A$, $c_A(z) = \det(A - zI)$, this becomes
$$W_G(z;j) = -\frac{c_{A_{\backslash\{j\}}}\left(\frac{1}{z}\right)}{z\, c_A\left(\frac{1}{z}\right)}$$
The relation between the characteristic polynomials $c_{A_{\backslash\{j\}}}(z)$ and $c_A(z)$ is further studied in art. 60.
38. Relations between $N_k$ and $W_k$. Let $m$ be the maximizer of $x_k^T u$ over all $N$ eigenvectors, i.e., $x_m^T u \ge x_k^T u$ for any $1 \le k \neq m \le N$. Consider the "graph angle" representation in (3.12), $x_k^T \frac{u}{\sqrt{N}} = \cos\alpha_k$. Geometrically, all orthogonal eigenvectors $x_1, x_2, \ldots, x_N$ start at the origin and end on an $N$-dimensional unit sphere centered at the origin. The graph angle between $x_k$ and $\frac{u}{\sqrt{N}}$ is smallest, so that $\cos\alpha_k$ is largest, for $x_1$. This follows from the Perron-Frobenius Theorem 38 in art. 168: $x_1$ and $\frac{u}{\sqrt{N}}$ lie in the same $N$-dimensional "quadrant", because all their components are non-negative. Any other vector $x_k$ must be orthogonal to $x_1$. Hence, $x_k$ cannot lie in the "opposite" $N$-dimensional "quadrant", where all components or coordinates are negative and where $|\cos\alpha_k|$ could also be large. Another, though less transparent, argument follows from the Cauchy identity (8.86),
$$\left(u^T x_k\right)^2 = N - \frac{1}{2}\sum_{j=1}^{N}\sum_{l=1}^{N} \left( (x_k)_j - (x_k)_l \right)^2 = N - \sum_{j=1}^{N}\sum_{l=1}^{j-1} \left( (x_k)_j - (x_k)_l \right)^2$$
This illustrates that the maximizer over all $\left(u^T x_k\right)^2$ has the smallest differences between its components. Thus, the eigenvector $x_m$ is the one closest to the vector $\frac{1}{\sqrt{N}}u$, whose components are all exactly equal. In conclusion, $m = 1$ and $x_1^T u > x_k^T u$ for all $1 < k \le N$. Art. 162 further demonstrates that $x_1^T u \ge 1$, because $x_1^T u = \sum_{j=1}^{N} (x_1)_j = \sum_{j=1}^{N} \left|(x_1)_j\right| = \|x_1\|_1$ and $\|x_1\|_1 \ge \|x_1\|_2 = 1$.
Likewise, let $q$ be the index that minimizes $\left(x_k^T u\right)^2$, i.e., $\left(x_k^T u\right)^2 \ge \left(x_q^T u\right)^2$ for any $1 \le k \neq q \le N$. Recall that $x_q^T u = 0$ for a regular graph. For even $k$, where all $\lambda_n^k \ge 0$, (3.11) is then bounded from below and above as
$$\left(x_q^T u\right)^2 \sum_{n=1}^{N} \lambda_n^k \le \sum_{n=1}^{N} \left(x_n^T u\right)^2 \lambda_n^k \le \left(x_1^T u\right)^2 \sum_{n=1}^{N} \lambda_n^k$$
Invoking the number of closed walks $W_k = \sum_{n=1}^{N} \lambda_n^k$ of length $k$ in graph $G$ (art. 36) and the total number $N_k$ of walks (3.11) leads to the inequality
$$\left(x_q^T u\right)^2 W_k \le N_k \le \left(x_1^T u\right)^2 W_k \le N W_k$$
where the last inequality follows from (3.18). With (3.11), the $N\times 1$ total walk vector $\mathbf{N} = (N, N_1, N_2, \ldots, N_{N-1})$ can be written as
$$\begin{bmatrix} N \\ N_1 \\ \vdots \\ N_{N-2} \\ N_{N-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_{N-1} & \lambda_N \\ \vdots & \vdots & & \vdots & \vdots \\ \lambda_1^{N-2} & \lambda_2^{N-2} & \cdots & \lambda_{N-1}^{N-2} & \lambda_N^{N-2} \\ \lambda_1^{N-1} & \lambda_2^{N-1} & \cdots & \lambda_{N-1}^{N-1} & \lambda_N^{N-1} \end{bmatrix} \begin{bmatrix} \left(u^T x_1\right)^2 \\ \left(u^T x_2\right)^2 \\ \vdots \\ \left(u^T x_{N-1}\right)^2 \\ \left(u^T x_N\right)^2 \end{bmatrix}$$
In matrix notation, $\mathbf{N} = V_N(\lambda)\, w_x$. Here $V_N(\lambda)$ is the Vandermonde matrix (8.90) in art. 194, and the $N\times 1$ vector $w_x$ has $\left(u^T x_k\right)^2$ as its $k$-th component. Similarly, the closed walk vector $\mathbf{W} = (W_0, W_1, W_2, \ldots, W_{N-1})$ is written as $\mathbf{W} = V_N(\lambda)\, u$.
39. Diameter of a graph.
Theorem 5 The number of distinct eigenvalues of the adjacency matrix $A$ is at least equal to $\rho + 1$, where $\rho$ is the diameter of the graph.
First proof: Lemma 3 implies that $(A^k)_{ij}$ is non-zero if and only if nodes $i$ and $j$ can be joined in the graph by a walk of length $k$. Thus, if the shortest path from node $i$ to $j$ consists of $h$ hops, then $(A^h)_{ij} \neq 0$, while $(A^k)_{ij} = 0$ for $k < h$. Consequently, the matrix $A^h$ cannot be written as a linear combination of $I, A, A^2, \ldots, A^{h-1}$. The diameter is defined as the longest shortest path, so the matrices $I, A, A^2, \ldots, A^\rho$ are linearly independent. Art. 156 shows that the matrix $E_k$, which represents the orthogonal projection onto the eigenspace of $\lambda_k$, is a polynomial in $A$. Conversely, every polynomial in $A$ is a linear combination of the matrices $E_k$. Thus, the vector space spanned by $I, A, A^2, \ldots, A^\rho$ is contained in the space spanned by the matrices $E_k$, which obey $E_k E_m = 1_{\{k=m\}} E_k$ (art. 156). Let $Y = \sum_{k} c_k E_k$. Then $E_j Y = c_j E_j$, so $Y = O$ implies $c_j = 0$ for each $j$. Projection matrices belonging to distinct eigenvalues of $A$ are therefore linearly independent, and the dimension of their span equals the number of distinct eigenvalues. That span contains the $\rho + 1$ linearly independent matrices $I, A, A^2, \ldots, A^\rho$. Hence, at least $\rho + 1$ eigenvalues of $A$ must be distinct.
□
We may rephrase Theorem 5 as: "The diameter of a graph $G$ obeys $\rho \le l - 1$, where $l$ is the number of different eigenvalues of $A$." The second proof may be found easier and more elegant.
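Theorem 5 is easy to test numerically. The sketch below assumes NumPy and does two things:
- It computes the diameter by breadth-first search from every node, which assumes a connected graph.
- It counts distinct eigenvalues up to a tolerance.

For the path $P_6$, the cycle $C_6$ and the complete graph $K_5$, the bound $\rho + 1$ is attained. The hypothetical two-triangle graph has more distinct eigenvalues than $\rho + 1$.

```python
from collections import deque
import numpy as np

def adjacency(N, links):
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1
    return A

def diameter(A):
    """Longest shortest hop path, by breadth-first search from every node
    (a connected graph is assumed)."""
    N = A.shape[0]
    longest = 0
    for s in range(N):
        dist = [-1] * N
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in np.flatnonzero(A[v]):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        longest = max(longest, max(dist))
    return longest

def distinct_eigenvalues(A, tol=1e-8):
    lam = np.sort(np.linalg.eigvalsh(A))
    return 1 + int(np.sum(np.diff(lam) > tol))

graphs = {
    "two triangles": adjacency(6, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]),
    "path P6": adjacency(6, [(i, i + 1) for i in range(5)]),
    "cycle C6": adjacency(6, [(i, (i + 1) % 6) for i in range(6)]),
    "complete K5": adjacency(5, [(i, j) for i in range(5) for j in range(i + 1, 5)]),
}
summary = {name: (diameter(A), distinct_eigenvalues(A)) for name, A in graphs.items()}
```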
Second proof: As defined in art. 19, the diameter $\rho$ is the smallest integer such that, for each pair $(i,j)$ of nodes, $(A^k)_{ij} \neq 0$ for some $0 \le k \le \rho$. Suppose that the adjacency matrix $A$ has precisely $l$ distinct eigenvalues. Art. 145 shows that $A$ obeys $m_A(A) = O$, where the minimal polynomial $m_A(x) = \sum_{k=0}^{l} b_k x^k$ has degree $l$. Hence, we may write
$$\left(A^l\right)_{ij} = -\frac{1}{b_l}\sum_{k=0}^{l-1} b_k \left(A^k\right)_{ij}$$
which shows that $\rho \le l - 1$. Indeed, assume that $\rho > l - 1$. Then there is at least one pair $(i,j)$ for which $(A^k)_{ij} = 0$ for $0 \le k \le l - 1$. The minimal polynomial then shows that also $(A^l)_{ij} = 0$. Further, $(A^{l+t})_{ij} = 0$ for every integer $t \ge 0$, because $A^t m_A(A) = O$. Then no walk connects $i$ and $j$, which contradicts the connectivity of $G$. Hence, the assumption $\rho > l - 1$ is false. □
As an example, consider the complete graph $K_N$. As computed in Section 5.1, its adjacency matrix has precisely $l = 2$ distinct eigenvalues, $\lambda_1 = N - 1$ and $\lambda_2 = -1$. Theorem 5 states that the diameter is at most $\rho \le l - 1 = 1$. Since the diameter is at least 1, we conclude from Theorem 5 that the diameter of the complete graph equals $\rho = 1$, as anticipated.
40. The characteristic polynomial of the complement $G^c$ is
$$\det\left(A^c - \lambda I\right) = \det\left(J - A - (\lambda + 1)I\right) = (-1)^N \det\left( \left(A + (\lambda+1)I\right)\left( I - \left(A + (\lambda+1)I\right)^{-1} J \right) \right) = (-1)^N \det\left(A + (\lambda+1)I\right) \det\left( I - \left(A + (\lambda+1)I\right)^{-1} u.u^T \right)$$
where we have used that $J = u.u^T$. Using the "rank 1 update" formula (8.82), we find
$$\det\left(A^c - \lambda I\right) = (-1)^N g(\lambda) \det\left(A + (\lambda+1)I\right) \qquad (3.29)$$
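with the definition
$$g(\lambda) = 1 - u^T \left(A + (\lambda+1)I\right)^{-1} u = 1 + z N_G(z)\Big|_{z = -\frac{1}{\lambda+1}}$$
The last equation is written in terms of the generating function (3.16) of the total number of walks (art. 34). In general, $g(\lambda)$ is not a simple function of $\lambda$, although a little more is known. For example, consider $\lambda > -1 - \lambda_N$, where $A + (\lambda+1)I$ is positive definite. There, $g(\lambda) = 1 - \left\| \left(A + (\lambda+1)I\right)^{-\frac{1}{2}} u \right\|_2^2$, which shows that $g(\lambda) \in (-\infty, 1]$. Analogous to the derivations in art. 34, we can express $g(\lambda)$ in terms of "graph angles" as
$$g(\lambda) = 1 - N\sum_{j=1}^{N} \frac{\cos^2\alpha_j}{\lambda + 1 + \lambda_j}$$
This shows that $g(\lambda)$ increases strictly with $\lambda$ between consecutive poles $\lambda = -1 - \lambda_j$. Hence, $g(\lambda)$ has exactly one zero to the right of its largest pole, and this is its largest zero. In Section 5.9, we give two methods to approximate this largest zero arbitrarily closely. With (8.5), we have $c_A(-\lambda-1) = \det\left(A + (\lambda+1)I\right) = \prod_{k=1}^{N} (\lambda_k + 1 + \lambda)$. Thus, the characteristic polynomial $c_{A^c}(\lambda)$ of the complement $A^c$ is
$$\det\left(A^c - \lambda I\right) = \frac{(-1)^N}{N}\sum_{j=1}^{N} \left(\lambda + 1 + \lambda_j - N^2\cos^2\alpha_j\right) \prod_{k=1; k\neq j}^{N} (\lambda_k + 1 + \lambda) \qquad (3.30)$$
This shows that the poles of $g(\lambda)$ are exactly compensated by the zeros of the characteristic polynomial $c_A(-\lambda-1)$. Thus, the eigenvalues of $A^c$ generally differ from $\{-\lambda_j - 1\}_{1\le j\le N}$, where $\lambda_j$ is an eigenvalue of $A$. The exception is when $u$ is an eigenvector of $A$, belonging to some $\lambda_k$. Then $g(\lambda) = \frac{\lambda + 1 + \lambda_k - N}{\lambda + 1 + \lambda_k}$, and all eigenvalues of $A^c$ belong to the set $\{-\lambda_j - 1\}_{1\le j\neq k\le N} \cup \{N - 1 - \lambda_k\}$. According to art. 41, $u$ can only be an eigenvector, belonging to $\lambda_1$, when the graph is regular. Combining (3.17) and (3.20) yields, with $\lambda = \frac{1}{z}$,
$$(-1)^N \frac{c_{A^c}(-\lambda - 1)}{c_A(\lambda)} = 1 + \sum_{n=1}^{N} \frac{\left(x_n^T u\right)^2}{\lambda - \lambda_n}$$
The right-hand side can be written as a fraction of two polynomials whose denominator has only simple zeros. From this observation, Cvetković et al. (1995) deduced the following. If $c_A(\lambda)$ has an eigenvalue $\lambda$ with multiplicity $s > 1$, then the characteristic polynomial $c_{A^c}(\lambda)$ of the complement has an eigenvalue $-\lambda - 1$ with multiplicity $t$, where $s - 1 \le t \le s + 1$.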
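Formula (3.29) and the regular case can be checked numerically. The sketch below assumes NumPy and does two things:
- For the hypothetical two-triangle graph, it evaluates both sides of (3.29) at a few values of $\lambda$, chosen away from the poles $-1 - \lambda_j$.
- For the regular cycle $C_6$ with $r = 2$, it confirms that the complement's spectrum equals $\{-\lambda_j - 1\}_{j \ge 2} \cup \{N - 1 - r\}$.

```python
import numpy as np

def adjacency(N, links):
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1
    return A

def complement(A):
    N = A.shape[0]
    return np.ones((N, N)) - np.eye(N) - A          # A^c = J - I - A

def g(A, lmbda):
    """g(lambda) = 1 - u^T (A + (lambda + 1) I)^{-1} u."""
    N = A.shape[0]
    u = np.ones(N)
    return 1 - u @ np.linalg.solve(A + (lmbda + 1) * np.eye(N), u)

A = adjacency(6, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
N = A.shape[0]
pairs = []
for lmbda in (0.7, 2.5, -3.0):                      # none of these equals a pole -1 - lambda_j
    lhs = np.linalg.det(complement(A) - lmbda * np.eye(N))
    rhs = (-1) ** N * g(A, lmbda) * np.linalg.det(A + (lmbda + 1) * np.eye(N))
    pairs.append((lhs, rhs))

# Regular case: u is the eigenvector of lambda_1 = r, so the spectrum of A^c follows directly
C6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
lam = np.sort(np.linalg.eigvalsh(C6))[::-1]         # lambda_1 = r = 2 comes first
predicted = np.sort(np.append(-lam[1:] - 1, N - 1 - lam[0]))
actual = np.sort(np.linalg.eigvalsh(complement(C6)))
```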
3.3 Regular graphs The class of regular graph possesses a lot of specific and remarkable properties that justify the discussion of some spectrum related properties here. 41. Regular graphs. Every node m in a regular graph has the same degree gm = u and relation (2.2) indicates that each row sum of D equals u. The basic law of the degree (2.3) reduces for regular graphs to 2O = Q u, implying that, if the degree u is odd, then the number of nodes Q must be even. Theorem 6 The maximum degree gmax = max1mQ gm is the largest eigenvalue of the adjacency matrix D of a connected graph J if and only if the corresponding graph is regular (i.e., gm = gmax = u for all m). Proof: If { is an eigenvector of D belonging to eigenvalue = gmax so is each vector n{ for each complex n (art. 138). Thus, we can scale the eigenvector { such that the maximum component, say {p = 1, and {n 1 for all n. The eigenvalue
44
Eigenvalues of the adjacency matrix
equation D{ = gmax { for that maximum component {p is gmax {p = gmax =
Q X
dpm {m
m=1
which implies that all $x_j = 1$ whenever $a_{mj} = 1$, i.e., when node $j$ is adjacent to node $m$. Hence, the degree of node $m$ is $d_m = d_{\max}$. For any node $j$ adjacent to $m$, for which the component $x_j = 1$, the same eigenvalue relation holds and thus $d_j = d_{\max}$. Proceeding with this process shows that every node $k \in G$ has the same degree $d_k = d_{\max}$, because $G$ is connected. Hence, $x = u$, where $u^T = [1\ 1\ \cdots\ 1]$, and the Perron-Frobenius Theorem 38 shows that $u$ is the eigenvector belonging to the largest eigenvalue of $A$. Conversely, if $G$ is connected and regular, then $\sum_{j=1}^N a_{mj} = d_{\max} = r$ for each $m$, such that $u$ is the eigenvector belonging to eigenvalue $\lambda = d_{\max}$, and it is the only possible eigenvector with non-negative components (art. 21). Hence, the largest eigenvalue equals $d_{\max} = r$ and it is simple. □
Theorem 6 shows that, for a regular graph, $Au = ru$ and, thus, $AJ = rJ$. After taking the transpose, $(AJ)^T = JA = rJ$, we see that $AJ = JA$. Thus, $A$ and $J$ commute if $G$ is regular.
Theorem 7 (Hoffman) A graph $G$ is regular and connected if and only if there exists a polynomial $p$ such that $J = p(A)$.
Proof: (a) If $J = p(A)$, then $J$ and $A$ commute and, hence, $G$ is regular, because $(AJ)_{ij} = d_i$ and $(JA)_{ij} = d_j$ must be equal for all $i$ and $j$. (b) Since the largest eigenvalue $r$ is simple (art. 21), the Laplacian $Q = rI - A$ has a zero eigenvalue with multiplicity 1. Theorem 11 then states that a regular graph $G$ is connected.
Conversely, let $G$ be connected and regular. We can diagonalize the adjacency matrix $A$ of $G$ by using an orthogonal matrix formed by its eigenvectors (art. 151). This basis of eigenvectors of $A$ also diagonalizes $J$ as $\mathrm{diag}(N, 0, \ldots, 0)$, because $J$ and $A$ commute. Consider the polynomial
$$q(x) = \frac{c_A(x)}{r - x} = \prod_{k=2}^{N} \left(\lambda_k(A) - x\right)$$
where $c_A(x)$ is the characteristic polynomial of $A$. Then $J = N\,\frac{q(A)}{q(r)}$, because the projections on the basis vectors are $q(A)\,x_j = 0$ if $x_j \neq u$ and $q(A)\,u = q(r)\,u$, while $Ju = Nu$. Thus, the polynomial $p(x) = N\,\frac{q(x)}{q(r)}$ satisfies the requirement. □
The proof shows that, if $m_{c_A}(x)$ is the minimal polynomial (art. 145) associated to the characteristic polynomial $c_A(x)$ and $q_m(x) = \frac{m_{c_A}(x)}{r - x}$, the polynomial $p_m(x) = N\,\frac{q_m(x)}{q_m(r)}$ of possibly lower degree can be found.
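Theorem 7 can be checked numerically. The sketch below assumes NumPy is available and uses the cycle $C_7$ (our own choice of a connected regular graph, not an example from the text). It evaluates $p(A) = N\,q(A)/q(r)$ with $q(x) = \prod_{k\ge 2}(\lambda_k - x)$ and compares the result with $J$. Because the eigenvalues come from a floating-point eigensolver, this is an approximate check rather than an exact construction.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle C_n, a connected 2-regular graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def hoffman_polynomial(A):
    """Return (p(A), r) with p(x) = N q(x) / q(r) and q(x) = prod_{k>=2} (lambda_k - x).

    Assumes A is the adjacency matrix of a connected regular graph, so that
    r = lambda_1 is a simple eigenvalue and q(r) != 0."""
    N = A.shape[0]
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]    # lambda_1 >= ... >= lambda_N
    r = lam[0]
    qA, qr = np.eye(N), 1.0
    for lk in lam[1:]:
        qA = qA @ (lk * np.eye(N) - A)            # factor (lambda_k I - A)
        qr *= lk - r                              # the same factor evaluated at x = r
    return N * qA / qr, r

A = cycle_adjacency(7)
J = np.ones_like(A)
P, r = hoffman_polynomial(A)
print("r =", r, "| AJ = JA:", np.allclose(A @ J, J @ A), "| p(A) = J:", np.allclose(P, J))
```

The matrix product is built factor by factor so that the code follows the definition of $q$ literally. A production implementation would work with the few distinct eigenvalues of the minimal polynomial instead.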
42. Strongly regular graphs. Following Cvetković et al. (1995), we first define $\varphi(v, w)$ as the number of nodes adjacent to both node $v$ and node $w \neq v$. In other words, $\varphi(v, w)$ is the number of common neighbors of $v$ and $w$. A regular graph $G$ of degree $r > 0$, different from the complete graph $K_N$, is called strongly regular if $\varphi(v, w) = n_1$ for each pair $(v, w)$ of adjacent nodes and $\varphi(v, w) = n_2$ for each pair $(v, w)$ of non-adjacent nodes. A strongly regular graph is completely defined by the parameters $(N, r, n_1, n_2)$. For example, the Petersen graph in Fig. 2.3 is a strongly regular graph with parameters $(10, 3, 0, 1)$. Cvetković et al. (2009) show how many strongly regular graphs can be constructed from line graphs. For example, the line graph $l(K_N)$ of the complete graph is strongly regular with parameters $\left(\frac{N(N-1)}{2}, 2N-4, N-2, 4\right)$ for $N > 3$. Another example is the class of Paley graphs $P_q$, whose nodes belong to the finite field $\mathbb{F}_q$ of order $q$, where $q$ is a prime power congruent to 1 modulo 4, and whose links $(i, j)$ are present if and only if $i - j$ is a quadratic residue (see Hardy and Wright (1968)). The Paley graph $P_q$ is strongly regular with parameters $\left(|\mathbb{F}_q|, \frac{q-1}{2}, \frac{q-5}{4}, \frac{q-1}{4}\right)$. Bollobás (2001, Chapter 13) discusses properties of the Paley graph and its generalizations, the Cayley graphs and conference graphs.
The number of common neighbors of two different nodes $i$ and $j$ is equal to the number of 2-hop walks between $i$ and $j$. Thus, Lemma 3 states that $\varphi(i, j) = (A^2)_{ij}$ if $i \neq j$. Art. 30 shows that $(A^2)_{ii} = d_i = r$. The condition for strong regularity states that, for different nodes $i$ and $j$, $(A^2)_{ij} = n_1 a_{ij} + n_2 (1 - a_{ij})$, because $\varphi(i, j) = n_1$ if nodes $i$ and $j$ are neighbors, hence $a_{ij} = 1$, and $\varphi(i, j) = n_2$ if they are not, i.e., $a_{ij} = 0$. Adding the two mutually exclusive conditions together with $\varphi(i, j) = (A^2)_{ij}$ demonstrates the relation. Combining all entries into a matrix form yields
$$A^2 = n_1 A + n_2 A^c + rI$$
Finally, using $A^c = J - I - A$ (art. 1), we obtain the matrix relation that characterizes strong regularity,
$$A^2 = (n_1 - n_2)\,A + n_2 J + (r - n_2)\,I$$
from which $J = \frac{1}{n_2}\left(A^2 + (n_2 - n_1)\,A + (n_2 - r)\,I\right)$. Hence, the polynomial $J = p(A)$ in Hoffman's Theorem 7 is the quadratic polynomial
$$p_2(z) = \frac{1}{n_2}\left(z^2 + (n_2 - n_1)\,z + (n_2 - r)\right)$$
from which we deduce that the minimal polynomial $m_{c_A}(x) = \frac{q_m(r)}{N}\,(r - x)\,p_2(x)$ is of degree 3. The definition of a minimal polynomial in art. 211 implies that the adjacency matrix $A$ of $G$ possesses precisely three distinct eigenvalues $\lambda_1 = r$, $\lambda_2$ and $\lambda_3$, where $\lambda_2$ and $\lambda_3$ are zeros of $p_2(x)$, related by $n_1 - n_2 = \lambda_2 + \lambda_3$ and $n_2 - r = \lambda_2 \lambda_3$. The property that strongly regular graphs have three different eigenvalues explains why the complete graph $K_N$ must be excluded in the definition above. In summary, we have proved:
Theorem 8 A connected graph $G$ is strongly regular with degree $r > 0$ if and only if its adjacency matrix $A$ has three distinct eigenvalues $\lambda_1 = r$, $\lambda_2$ and $\lambda_3$, which satisfy
$$n_1 = r + \lambda_2 + \lambda_3 + \lambda_2 \lambda_3$$
$$n_2 = r + \lambda_2 \lambda_3$$
where $n_1$ and $n_2$ are the number of common neighbors of adjacent and non-adjacent nodes, respectively.
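The Petersen example of art. 42 can be verified directly. The sketch below assumes NumPy and builds the Petersen graph from one common labeling: an outer 5-cycle, spokes, and an inner pentagram. This labeling is our assumption, not the layout of Fig. 2.3. The sketch reads off $(N, r, n_1, n_2)$ from $A^2$, checks the matrix relation $A^2 = (n_1 - n_2)A + n_2 J + (r - n_2)I$, and lists the distinct eigenvalues that Theorem 8 relates to $n_1$ and $n_2$.

```python
import numpy as np
from itertools import combinations

def petersen_adjacency():
    """Outer 5-cycle on 0..4, spokes i -- i+5, inner pentagram on 5..9."""
    A = np.zeros((10, 10), dtype=int)
    for i in range(5):
        for a, b in [(i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)]:
            A[a, b] = A[b, a] = 1
    return A

def srg_parameters(A):
    """Return (N, r, n1, n2) if A is strongly regular, otherwise None."""
    N = A.shape[0]
    deg = A.sum(axis=1)
    if not np.all(deg == deg[0]):
        return None
    A2 = A @ A      # (A^2)_{ij} = number of common neighbors of i != j
    pairs = list(combinations(range(N), 2))
    adj = {int(A2[i, j]) for i, j in pairs if A[i, j] == 1}
    non = {int(A2[i, j]) for i, j in pairs if A[i, j] == 0}
    if len(adj) != 1 or len(non) != 1:
        return None
    return N, int(deg[0]), adj.pop(), non.pop()

A = petersen_adjacency()
N, r, n1, n2 = srg_parameters(A)
I, J = np.eye(N, dtype=int), np.ones((N, N), dtype=int)
# Matrix relation of art. 42: A^2 = (n1 - n2) A + n2 J + (r - n2) I
identity_holds = np.array_equal(A @ A, (n1 - n2) * A + n2 * J + (r - n2) * I)
# Distinct eigenvalues, rounded to absorb floating-point noise, in decreasing order
eig = np.unique(np.round(np.linalg.eigvalsh(A.astype(float)), 8))[::-1]
print((N, r, n1, n2), identity_holds, eig)
```

The common-neighbor counts are exact integer computations. Only the eigenvalue step is floating-point, which is why the values are rounded before the distinct ones are collected.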
3.4 Bounds for the largest, positive eigenvalue $\lambda_1$
The largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ appears in many applications. In particular, in dynamic processes on graphs, the inverse of the largest eigenvalue $\lambda_1$ characterizes the threshold of the phase transition of both virus spread (Van Mieghem et al., 2009) and synchronization of coupled oscillators (Restrepo et al., 2005) in networks. Sharp bounds or exact expressions for $\lambda_1$ are desirable to control these processes. Bounds for $\lambda_2$ and $\lambda_N$ in connected graphs follow from the general bounds (8.56) and (8.57), respectively, on eigenvalues of non-negative, irreducible, symmetric matrices in art. 172.
43. Classical lower bound. The Rayleigh inequalities in art. 152 indicate that
$$\lambda_1 = \sup_{x \neq 0} \frac{x^T A x}{x^T x}$$
and that the maximum is attained if and only if $x$ is the eigenvector of $A$ belonging to $\lambda_1$, while for any other vector $y \neq x$, $\lambda_1 \ge \frac{y^T A y}{y^T y}$. By choosing the vector $y = u$, we obtain, with (2.5), the classical bound
$$\lambda_1 \ge \frac{u^T A u}{u^T u} = \frac{2L}{N} \tag{3.31}$$
Equality is reached in a regular graph, because the average degree is $E[D] = \frac{2L}{N} = r$, since $d_j = r$ for each node $j$, and because $r$ is the largest eigenvalue of $A$ belonging to the eigenvector $u$ (Theorem 6). The differences $\lambda_1 - E[D]$ and $d_{\max} - \lambda_1$ can be considered as measures for the irregularity of a graph. The Interlacing Theorem 42 states that $\lambda_1$ is larger than or equal to the largest eigenvalue of any subgraph $G_s$ of $G$:
$$\lambda_1 \ge \max_{\text{all } G_s \subset G} \lambda_1(A_{G_s}) \tag{3.32}$$
Of course, the lower bounds deduced in this Section 3.4, such as (3.31) and (3.34), also apply to each individual subgraph $G_s$. It is a matter of ingenuity to find the subgraph $G_s$ with the highest largest eigenvalue $\lambda_1(A_{G_s})$. The lower bound (3.32) can also be deduced from the Rayleigh inequality by choosing zero components in the vector $y$ such that $y^T A y = w^T A_{G_s} w$, where the vector $w$ contains the non-zero components of $y$ and $A_{G_s}$ is the adjacency matrix of the subgraph obtained by deleting those rows and columns that correspond to the zero components in $y$.
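As a small numerical illustration of (3.31) and (3.32), the sketch below evaluates three lower bounds and compares each with $\lambda_1$: the Rayleigh quotient for $y = u$, the largest eigenvalue of an induced subgraph obtained by deleting one node, and the best Rayleigh quotient over a few random positive vectors. It assumes NumPy. The 5-node test graph, a triangle with a two-hop tail, is a hypothetical example.

```python
import numpy as np

def adjacency(N, edges):
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def rayleigh(A, y):
    """Rayleigh quotient y^T A y / y^T y, a lower bound for lambda_1."""
    return float(y @ A @ y) / float(y @ y)

def largest_eigenvalue(A):
    return float(np.linalg.eigvalsh(A)[-1])   # eigvalsh sorts eigenvalues in ascending order

# Hypothetical test graph: triangle 0-1-2 with the path 2-3-4 attached.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
N, L = 5, len(edges)
A = adjacency(N, edges)

lam1 = largest_eigenvalue(A)
classical = rayleigh(A, np.ones(N))                    # bound (3.31): equals 2L/N
keep = [0, 1, 2, 3]                                    # delete node 4: induced subgraph G_s
subgraph = largest_eigenvalue(A[np.ix_(keep, keep)])   # one candidate in (3.32)
rng = np.random.default_rng(0)
random_y = max(rayleigh(A, rng.random(N)) for _ in range(100))
print(classical, subgraph, random_y, lam1)
```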
We remark that the largest eigenvalue of a non-negative matrix that is not necessarily symmetric also obeys the Rayleigh principle (8.28), as can be verified from art. 152 by incorporating the Perron-Frobenius Theorem 38. Hence, most of the bounds deduced in this Section 3.4 also apply to directed graphs, whose adjacency matrix is generally non-symmetric.
44. Applying the Rayleigh inequalities to $A^k$ and using art. 144 and (3.8) leads to
$$\lambda_1 \ge \left(\frac{u^T A^k u}{u^T u}\right)^{1/k} = \left(\frac{N_k}{N}\right)^{1/k} \tag{3.33}$$
For example, for $k = 2$, art. 33 shows that
$$\lambda_1 \ge \sqrt{\frac{1}{N}\sum_{k=1}^{N} d_k^2} = \sqrt{\mathrm{Var}[D] + (E[D])^2} = \frac{2L}{N}\sqrt{1 + \frac{\mathrm{Var}[D]}{(E[D])^2}} \tag{3.34}$$
because the variance $\mathrm{Var}[D] \ge 0$, and $\mathrm{Var}[D]$ is only zero for regular graphs. The lower bound (3.34) is thus always better than the classical bound (3.31) for non-regular graphs. From the inequality (3.14) in art. 33, we deduce that
$$\left(\frac{N_k}{N}\right)^{1/k} \le \left(\frac{N_{2k}}{N}\right)^{1/(2k)} \le \lambda_1$$
Since the sequence $\frac{N_1}{N}, \left(\frac{N_2}{N}\right)^{1/2}, \left(\frac{N_4}{N}\right)^{1/4}, \ldots$ is non-decreasing, while each term is bounded by $\lambda_1$, we arrive at
$$\lim_{k\to\infty}\left(\frac{N_k}{N}\right)^{1/k} = \lambda_1$$
45. Variations on the Rayleigh inequality. A series of other bounds can be deduced from the Rayleigh inequality $\lambda_1 \ge \frac{y^T A y}{y^T y}$. By choosing the vector $y = (d_1^{\alpha}, d_2^{\alpha}, \ldots, d_N^{\alpha})$ for some real number $\alpha$, we have
$$\lambda_1 \ge \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\, d_i^{\alpha} d_j^{\alpha}}{\sum_{k=1}^{N} d_k^{2\alpha}} = \frac{\sum_{i=1}^{N}\sum_{j \in \text{neighbors}(i)} (d_i d_j)^{\alpha}}{\sum_{k=1}^{N} d_k^{2\alpha}}$$
Maximizing the lower bound over $\alpha$ is difficult. The basic law of the degree (2.3) suggests to take $\alpha = \frac{1}{2}$, which gives, with each link counted once,
$$\lambda_1 \ge \frac{1}{L}\sum_{\text{all pairs } (i,j) \text{ of neighbors}} \sqrt{d_i d_j}$$
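From (1.3), we deduce
$$\lambda_1 (x_1)_j \le d_j \max_{l \text{ is a direct neighbor of } j} (x_1)_l$$
for any node $j$. Thus, for any two nodes $j$ and $k$, it holds that
$$\lambda_1^2 \le d_j d_k\, \frac{\max_{l \text{ is a direct neighbor of } j} (x_1)_l}{(x_1)_j}\, \frac{\max_{m \text{ is a direct neighbor of } k} (x_1)_m}{(x_1)_k}$$
When choosing the pair $(j, k)$ such that $(x_1)_j = \max_{m \text{ is a direct neighbor of } k} (x_1)_m$ and $(x_1)_k = \max_{l \text{ is a direct neighbor of } j} (x_1)_l$ (such a pair of neighbors always exists: take for $j$ the node with the largest component of $x_1$ and for $k$ its neighbor with the largest component), we find that
$$\lambda_1 \le \max_{\text{all pairs } (i,j) \text{ of neighbors}} \sqrt{d_i d_j}$$
Combining both bounds as
$$\frac{1}{L}\sum_{\text{all pairs } (i,j) \text{ of neighbors}} \sqrt{d_i d_j} \le \lambda_1 \le \max_{\text{all pairs } (i,j) \text{ of neighbors}} \sqrt{d_i d_j}$$
yields the improvement of the analogous classical inequalities $E[D] \le \lambda_1 \le d_{\max}$. Choosing $\alpha = 1$, equivalent to $y = d$ in Rayleigh's inequality, yields (art. 33)
$$\lambda_1 \ge \frac{d^T A d}{d^T d} = \frac{N_3}{N_2}$$
This bound is the special case for $m = 1$ of
$$\lambda_1 \ge \frac{u^T A^{2m+1} u}{u^T A^{2m} u} = \frac{N_{2m+1}}{N_{2m}} \tag{3.35}$$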
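which is obtained from Rayleigh's inequality for $y = A^m u$. The classical bound (3.31) is recovered when $m = 0$. Invoking (3.13) yields, for any non-negative integers $k$ and $j$,
$$\frac{N_{k+j}}{N_k} = \lambda_1^{j}\, \frac{1 + S(k+j)}{1 + S(k)} = \lambda_1^{j}\left(1 - \frac{S(k) - S(k+j)}{1 + S(k)}\right) \tag{3.36}$$
where
$$S(k) = \sum_{n=2}^{N} \left(\frac{\lambda_n}{\lambda_1}\right)^k \frac{\cos^2\theta_n}{\cos^2\theta_1} = \sum_{\lambda_n > 0} \left(\frac{\lambda_n}{\lambda_1}\right)^k \frac{\cos^2\theta_n}{\cos^2\theta_1} + \sum_{\lambda_n < 0} \left(\frac{\lambda_n}{\lambda_1}\right)^k \frac{\cos^2\theta_n}{\cos^2\theta_1} \tag{3.37}$$
Assuming that $\left|\frac{\lambda_n}{\lambda_1}\right| < 1$ for all $2 \le n \le N$, which excludes bipartite graphs by Theorem 22, (3.37) tends exponentially fast in $k$ to zero. Hence, the sequence $\frac{N_j}{N_0}, \frac{N_{1+j}}{N_1}, \frac{N_{2+j}}{N_2}, \ldots$ converges to $\lambda_1^{j}$. Excluding in the sequel regular graphs, for which $S(k) = 0$, and bipartite graphs, we have that
$$S(k) - S(k+j) = \sum_{\lambda_n > 0} \left|\frac{\lambda_n}{\lambda_1}\right|^k \left(1 - \left|\frac{\lambda_n}{\lambda_1}\right|^j\right) \frac{\cos^2\theta_n}{\cos^2\theta_1} + (-1)^k \sum_{\lambda_n < 0} \left|\frac{\lambda_n}{\lambda_1}\right|^k \left(1 - (-1)^j \left|\frac{\lambda_n}{\lambda_1}\right|^j\right) \frac{\cos^2\theta_n}{\cos^2\theta_1}$$
which demonstrates for even $k = 2m \ge 0$ that $S(2m) - S(2m+j) > 0$. Since $1 + S(k) > 0$, we deduce from (3.36) that
$$\lambda_1^{j} \ge \frac{N_{2m+j}}{N_{2m}} \tag{3.38}$$
from which (3.35) follows alternatively. For odd $k = 2m + 1 \ge 1$ and odd $j$,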
the sign of $S(k) - S(k+j)$ can be negative. Indeed, we can always rewrite (3.37) as an alternating series with non-increasing (in absolute value) terms. If $\left|\frac{\lambda_N}{\lambda_1}\right|^k \frac{\cos^2\theta_N}{\cos^2\theta_1} > \left|\frac{\lambda_2}{\lambda_1}\right|^k \frac{\cos^2\theta_2}{\cos^2\theta_1}$, then
$$S(k) = -\left|\frac{\lambda_N}{\lambda_1}\right|^k \frac{\cos^2\theta_N}{\cos^2\theta_1} + \left|\frac{\lambda_2}{\lambda_1}\right|^k \frac{\cos^2\theta_2}{\cos^2\theta_1} - \left|\frac{\lambda_{N-1}}{\lambda_1}\right|^k \frac{\cos^2\theta_{N-1}}{\cos^2\theta_1} + \left|\frac{\lambda_3}{\lambda_1}\right|^k \frac{\cos^2\theta_3}{\cos^2\theta_1} - \cdots$$
and we have that
$$0 < \left|\frac{\lambda_N}{\lambda_1}\right|^k \frac{\cos^2\theta_N}{\cos^2\theta_1} - \left|\frac{\lambda_2}{\lambda_1}\right|^k \frac{\cos^2\theta_2}{\cos^2\theta_1} \le -S(k) \le \left|\frac{\lambda_N}{\lambda_1}\right|^k \frac{\cos^2\theta_N}{\cos^2\theta_1}$$
such that $S(k) < 0$ and, similarly, $S(k) - S(k+j) < 0$. In such a case and confining to $j = 1$, both the lower and upper bound,
$$\frac{N_{2m+1}}{N_{2m}} < \lambda_1 < \frac{N_{2m+2}}{N_{2m+1}}$$
converge with increasing $m$ to $\lambda_1$. By applying the inequality (Hardy et al., 1999)
$$\min_{1\le k\le n} \frac{a_k}{q_k} \le \frac{a_1 + a_2 + \cdots + a_n}{q_1 + q_2 + \cdots + q_n} \le \max_{1\le k\le n} \frac{a_k}{q_k} \tag{3.39}$$
where $q_1, q_2, \ldots, q_n$ are positive real numbers and $a_1, a_2, \ldots, a_n$ are real numbers, we obtain
$$\frac{N_{2m+j}}{N_{2m}} = \frac{\sum_{n=1}^{N} (x_n^T u)^2 \lambda_n^{2m+j}}{\sum_{n=1}^{N} (x_n^T u)^2 \lambda_n^{2m}} \le \max_{1\le n\le N} \lambda_n^{j} = \lambda_1^{j}$$
establishing the inequality (3.38) again. For example, since $S(2k+2) < S(2k)$, the sequence $\frac{N_2}{N_0}, \frac{N_4}{N_2}, \frac{N_6}{N_4}, \ldots$ (the case $j = 2$) monotonically increases towards $\lambda_1^2$.
46. Optimized Rayleigh lower bounds. Yet another choice is $y = u + \beta d$, where the parameter $\beta$ will be tuned to maximize the Rayleigh lower bound. Introducing this vector $y$ into Rayleigh's inequality $\lambda_1 \ge \frac{y^T A y}{y^T y}$ leads to
$$\lambda_1 \ge \frac{2L + 2\beta\, d^T d + \beta^2\, d^T A d}{N + 4\beta L + \beta^2\, d^T d} = \frac{N_1 + 2\beta N_2 + \beta^2 N_3}{N_0 + 2\beta N_1 + \beta^2 N_2}$$
where we have used $N = N_0$, $2L = u^T d = N_1$, $d^T d = N_2$ and $d^T A d = u^T A^3 u = N_3$, as shown in art. 33. The zeros of the denominator $N_0 + 2\beta N_1 + \beta^2 N_2 = 0$ are
$$\beta_{1,2} = \frac{-N_1 \pm \sqrt{N_1^2 - N_0 N_2}}{N_2}$$
The inequality (3.14) shows that $N_1^2 - N_0 N_2 \le 0$, implying that the zeros of the denominator are complex (except for regular graphs). Maximizing the lower bound over $\beta$ yields, after a tedious calculation,
$$\lambda_1 \ge \frac{N_0 N_3 - N_1 N_2 + \sqrt{N_0^2 N_3^2 - 6 N_0 N_1 N_2 N_3 - 3 N_1^2 N_2^2 + 4\left(N_1^3 N_3 + N_0 N_2^3\right)}}{2\left(N_0 N_2 - N_1^2\right)} \tag{3.40}$$
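The walk-count bounds of arts. 44 to 46 need only $N_0, \ldots, N_3$. The sketch below assumes NumPy and uses a hypothetical 5-node test graph. It computes $N_k$ by repeated matrix-vector products and evaluates (3.31), (3.34), $N_3/N_2$ from (3.35), (3.40) and the degree-pair bounds of art. 45. Note that (3.40) is undefined for regular graphs, where $N_0 N_2 = N_1^2$.

```python
import numpy as np

def walk_counts(A, kmax):
    """N_k = u^T A^k u for k = 0..kmax, using only matrix-vector products."""
    w = np.ones(A.shape[0])
    counts = []
    for _ in range(kmax + 1):
        counts.append(float(w.sum()))    # u^T (A^k u)
        w = A @ w
    return counts

def walk_bounds(A):
    """Lower bounds (3.31), (3.34), N_3/N_2 and (3.40) for lambda_1."""
    N0, N1, N2, N3 = walk_counts(A, 3)
    disc = (N0**2 * N3**2 - 6 * N0 * N1 * N2 * N3 - 3 * N1**2 * N2**2
            + 4 * (N1**3 * N3 + N0 * N2**3))
    return {
        "3.31": N1 / N0,
        "3.34": np.sqrt(N2 / N0),
        "3.35 (m=1)": N3 / N2,
        # (3.40): the denominator vanishes for regular graphs
        "3.40": (N0 * N3 - N1 * N2 + np.sqrt(disc)) / (2 * (N0 * N2 - N1**2)),
    }

def degree_pair_bounds(A):
    """(1/L) sum sqrt(d_i d_j) <= lambda_1 <= max sqrt(d_i d_j) over the links (i, j)."""
    d = A.sum(axis=1)
    i, j = np.nonzero(np.triu(A))        # each link counted once
    g = np.sqrt(d[i] * d[j])
    return g.sum() / len(i), g.max()

A = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]:   # hypothetical test graph
    A[a, b] = A[b, a] = 1.0
lam1 = np.linalg.eigvalsh(A)[-1]
bounds = walk_bounds(A)
low, high = degree_pair_bounds(A)
print(bounds, (low, high), lam1)
```

Because the optimization over $\beta$ includes $\beta = 0$ and the limit $\beta \to \infty$, the value of (3.40) is never below $2L/N$ or $N_3/N_2$. It is not guaranteed to exceed (3.34).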
Numerical results in Table 3.1 and Fig. 7.6 show that this bound is better than (3.34), which is not surprising, because it includes, via $N_3$, additional information about the graph.

                                 Air transport    Random ER    Complete bipartite
$\lambda_1$ (exact)                   80.9576       19.3405              105.5557
bound (3.31) $= 2L/N$                 18.3079       18.3304               17.8701
bound (3.34)                          42.7942       18.8005              105.5557
bound (3.40)                          75.9029       19.2867              105.5557

Table 3.1. Comparison of a few lower bounds for $\lambda_1$. All networks have $N = 1247$ nodes. The European direct airport-to-airport traffic network is obtained from Eurostat, while the Erdős-Rényi graph is defined in Section 1.3 and the complete bipartite graph $K_{n,m}$ in Section 5.7.

From a computational point of view, given the $N \times N$ adjacency matrix $A$, the number of elementary computations for $N_k$ is about $O(N^2)$ for each additional hop, because $A^k u = A\,(A^{k-1} u)$ is obtained by a matrix-vector product, which requires $O(N^2)$ operations, while the final inner product $N_k = u^T (A^k u)$ requires only $O(N)$ operations. On the other hand (see Golub and Van Loan (1996)), a full eigenvalue decomposition of $A$ requires at most $O(N^3)$ elementary operations.
An extension of the last approach is to choose the vector
$$y = \sum_{j=0}^{m} \beta_j A^j u$$
such that Rayleigh's inequality becomes
$$\lambda_1 \ge \frac{\sum_{j=0}^{m}\sum_{k=0}^{m} \beta_j \beta_k\, u^T A^{j+k+1} u}{\sum_{j=0}^{m}\sum_{k=0}^{m} \beta_j \beta_k\, u^T A^{j+k} u} = \frac{\sum_{j=0}^{m}\sum_{k=0}^{m} \beta_j \beta_k\, N_{j+k+1}}{\sum_{j=0}^{m}\sum_{k=0}^{m} \beta_j \beta_k\, N_{j+k}}$$
The right-hand side is a multi-dimensional function that, in principle, can be optimized for $\beta_1, \beta_2, \ldots, \beta_m$.
47. Rayleigh's inequality and the walk generating function. Continuing as in art. 45, we now propose to choose in the Rayleigh inequality $\lambda_1 \ge \frac{y^T A y}{y^T y}$ the vector
$$y = \sum_{j=0}^{\infty} A^j u\, z^j = u + z d + z^2 A d + \cdots$$
which converges for $|z| < \frac{1}{\lambda_1}$ (art. 34). Using the Cauchy product of power series yields
$$y^T y = \sum_{m=0}^{\infty} u^T A^m z^m \sum_{j=0}^{\infty} A^j u\, z^j = \sum_{k=0}^{\infty} \left(\sum_{m=0}^{k} u^T A^{k-m} A^m u\right) z^k = \sum_{k=0}^{\infty} (k+1)\, u^T A^k u\, z^k$$
Written in terms of the generating function (3.15) of the total number of walks in a graph $G$, this results in
$$y^T y = \sum_{k=0}^{\infty} (k+1)\, N_k z^k = \frac{d\left(z N_G(z)\right)}{dz}$$
Similarly, we find that
$$y^T A y = \sum_{k=0}^{\infty} (k+1)\, N_{k+1} z^k = \frac{dN_G(z)}{dz}$$
Rayleigh's inequality becomes, for $|z| < \frac{1}{\lambda_1}$,
$$\lambda_1 \ge \frac{\frac{dN_G(z)}{dz}}{\frac{d}{dz}\left(z N_G(z)\right)} = \frac{\frac{dN_G(z)}{dz}}{z\,\frac{dN_G(z)}{dz} + N_G(z)} \tag{3.41}$$
For example, for regular graphs, whose generating function (3.19) of the total number of walks is particularly simple, equality in (3.41) is established for all $|z| < \frac{1}{\lambda_1}$. Conversely, the solution of the differential equation deduced from (3.41) with equality sign is precisely the generating function (3.19) of regular graphs. When $z = 0$, (3.41) reduces to $\lambda_1 \ge \frac{N_1}{N}$, which is the classical lower bound (3.31). For small $z$, we substitute the power series of $N_G(z)$ up to order three in $z$ in the right-hand side of (3.41),
$$\lambda_1 \ge \frac{N_1 + 2N_2 z + 3N_3 z^2 + O(z^3)}{N_0 + 2N_1 z + 3N_2 z^2 + O(z^3)}$$
which shows that the right-hand side is similar to the $\beta$-variant in art. 46 that led to (3.40) and that it increases from $z = 0$ with $z$, because $N_3 > N_2 > N_1 > N_0$. The function on the right-hand side of (3.41) is thus maximal when
$$N_G(z)\, \frac{d^2 N_G(z)}{dz^2} = 2\left(\frac{dN_G(z)}{dz}\right)^2$$
whose solution for real $z$, provided $N_G(z)$ is known, will lead to the best lower bound for $\lambda_1$ in (3.41). On the other hand, this differential equation can be solved for $N_G(z)$. Rewritten as
$$\frac{1}{\frac{dN_G(z)}{dz}}\, \frac{d^2 N_G(z)}{dz^2} = 2\, \frac{\frac{dN_G(z)}{dz}}{N_G(z)}$$
and integrating both sides from 0 to $z$ gives
$$\log \frac{dN_G(z)}{dz} - \log \left.\frac{dN_G(z)}{dz}\right|_{z=0} = 2 \log \frac{N_G(z)}{N_G(0)}$$
Using $N_G(0) = N$ and $\left.\frac{dN_G(z)}{dz}\right|_{z=0} = N_1 = 2L$, we obtain after simplification,
$$\frac{dN_G(z)}{\left(N_G(z)\right)^2} = \frac{2L}{N^2}\, dz$$
After integrating both sides from 0 to $z$ and after some rearrangement, we arrive at
$$N_G(z) = \frac{N}{1 - \frac{2L}{N}\, z}$$
which is, again, the generating function (3.19) of a regular graph, because $E[D] = \frac{2L}{N} = r$. At last, we can rewrite the inequality in (3.41) as
$$\lambda_1 \ge \frac{1}{z + \frac{N_G(z)}{\frac{dN_G(z)}{dz}}}$$
or
$$\lambda_1 \ge \frac{1}{z + \frac{1}{\frac{d}{dz}\log N_G(z)}}$$
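The bound (3.41) can be scanned over $z$ without truncating the power series. The sketch below assumes NumPy. It relies on the fact that the vector $y$ of art. 47 equals $x = (I - zA)^{-1}u$, so that $N_G(z) = u^T x$ and $\frac{dN_G(z)}{dz} = x^T A x$. The scan stays below $1/d_{\max} \le 1/\lambda_1$, a conservative range chosen here so that $\lambda_1$ itself is not needed. The 5-node test graph is hypothetical, and $C_6$ serves as a regular graph, for which (3.41) holds with equality.

```python
import numpy as np

def gf_bound(A, z):
    """Right-hand side of (3.41) at a real z with 0 <= z < 1/lambda_1.

    The vector x = (I - zA)^{-1} u equals y = sum_j z^j A^j u of art. 47,
    so N_G(z) = u^T x and dN_G(z)/dz = x^T A x."""
    N = A.shape[0]
    u = np.ones(N)
    x = np.linalg.solve(np.eye(N) - z * A, u)
    NG, dNG = u @ x, x @ A @ x
    return dNG / (z * dNG + NG)

def adjacency(N, edges):
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

A = adjacency(5, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])   # hypothetical test graph
dmax = A.sum(axis=1).max()
zs = np.linspace(0.0, 0.99 / dmax, 200)    # lambda_1 <= d_max keeps z below 1/lambda_1
values = np.array([gf_bound(A, z) for z in zs])
lam1 = np.linalg.eigvalsh(A)[-1]
print("z = 0:", values[0], "best:", values.max(), "at z =", zs[values.argmax()], "lambda_1:", lam1)

C6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])      # 2-regular: equality in (3.41)
```

Solving one linear system per $z$ replaces the infinite sum. This is a design choice for the sketch, not the procedure of the text, which works with the power series of $N_G(z)$.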
48. An improvement of the classical bound (3.31) in terms of the total number (3.8) of walks $N_k$ is derived in Van Mieghem (2007) (and improved in Walker and Van Mieghem (2008)),
μ ¶ Q3 Q1 Q2 Q1 Q13 4 1 +2 2 + ) (3.42) 0 + R(w Q 2Q Q2 2Q 3 s where w W , 0 = w Q, X 1 |dlm |) (3.43) max (dmm + W =s Q 1mp l6=m
Since $N_1 = u^T A u = 2L$, the first term in (3.42) is the classical bound (3.31). The Lagrange series (3.42), with terms containing powers $\xi_0^{2j}$ for $j > 0$, measures the irregularity $\lambda_1 - E[D]$ of the graph.
49. A direct consequence of art. 24 is that the largest eigenvalue $\lambda_1$ is bounded by
$$\lambda_1 \le \sqrt{\frac{2L(N-1)}{N}} \tag{3.44}$$
and equality in (3.44) occurs for the complete graph $K_N$ (see Section 5.1). Alternatively, in terms of the average degree $d_a = \frac{1}{N}\sum_{j=1}^{N} d_j = \frac{2L}{N}$, the largest eigenvalue $\lambda_1$ is bounded by the geometric mean of the average degree and the
maximum possible degree, $\lambda_1 \le \sqrt{d_a (N-1)}$. Combining the lower bound (3.34) and the upper bound (3.44) with art. 23 yields
$$\frac{2L}{N}\sqrt{1 + \frac{\mathrm{Var}[D]}{(E[D])^2}} \le \lambda_1 \le \min\left\{\sqrt{\frac{2L(N-1)}{N}},\ d_{\max}\right\} \tag{3.45}$$
More generally, combining Theorem 63, art. 23 and art. 44 gives, for any integer $k \ge 1$,
$$\left(\frac{N_{2k}}{N}\right)^{\frac{1}{2k}} \le \lambda_1 \le \min\left\{\left(\frac{W_{2k}}{1 + (N-1)^{1-2k}}\right)^{\frac{1}{2k}},\ d_{\max}\right\} \tag{3.46}$$
where $W_k$ is the number of closed walks with $k$ hops. For $k = 1$, (3.46) reduces to (3.45). When we assume that $\left|\frac{\lambda_n}{\lambda_1}\right| < 1$ for all $2 \le n \le N$, which, as mentioned in art. 45, excludes bipartite graphs, the definition of $W_k$ (art. 36) indicates that
$$W_k^{1/k} = \lambda_1\left(1 + \sum_{j=2}^{N}\left(\frac{\lambda_j}{\lambda_1}\right)^k\right)^{1/k} \le \lambda_1\left(1 + (N-1)\left|\frac{\max(\lambda_2, |\lambda_N|)}{\lambda_1}\right|^k\right)^{1/k}$$
This implies that the upper bound for $W_k^{1/k}$ is decreasing in $k$, because $(1 + x)^{1/k}$ is decreasing in $k$ for $x > 0$ and, in addition, $\left(\frac{\max(\lambda_2, |\lambda_N|)}{\lambda_1}\right)^k$ is exponentially decaying in $k$. Hence, $\lim_{k\to\infty} W_k^{1/k} = \lambda_1$. While the left-hand side of (3.46) is increasing in $k$ (art. 44), the right-hand side is decreasing in $k$. Together, they provide increasingly sharp bounds for $\lambda_1$ when $k$ increases.
50. Bounds for connected graphs. As will be deduced in Section 4.1.1, a connected graph has an adjacency matrix that is irreducible (Section 8.5). We apply the bounds (8.54) in art. 169 to $A^2$ by choosing $y = u$,
$$\min_{1\le i\le N} (A^2 u)_i \le \lambda_1^2 \le \max_{1\le i\le N} (A^2 u)_i$$
where $(A^2 u)_i = (Ad)_i = \sum_{j=1}^{N} a_{ij} d_j = \sum_{j \in \text{neighbors}(i)} d_j$. Thus,
$$\sqrt{\min_{1\le i\le N} \sum_{j \in \text{neighbors}(i)} d_j} \le \lambda_1 \le \sqrt{\max_{1\le i\le N} \sum_{j \in \text{neighbors}(i)} d_j} \tag{3.47}$$
Invoking the basic law of the degree (2.3), we have
$$(A^2 u)_i = 2L - d_i - \sum_{j \notin \text{neighbors}(i) \cup \{i\}} d_j \le 2L - d_i - (N - 1 - d_i)$$
where the inequality arises from the connectivity of the graph, which implies that the degree $d_j$ of each node $j$ is at least one. Thus, $\max_{1\le i\le N} (A^2 u)_i \le 2L - N + 1$, and this maximum is reached in the complete graph $K_N$ and in the star $K_{1,N-1}$. Hence, for any connected graph, we obtain the bound
$$\lambda_1 \le \sqrt{2L - N + 1} \tag{3.48}$$
which is sharper than (3.44), but the latter bound did not assume connectivity of the graph. When choosing $y = d$ in (8.54) in art. 169, we obtain a companion of (3.47) for connected graphs:
$$\min_{1\le i\le N} \frac{1}{d_i}\sum_{j \in \text{neighbors}(i)} d_j \le \lambda_1 \le \max_{1\le i\le N} \frac{1}{d_i}\sum_{j \in \text{neighbors}(i)} d_j \tag{3.49}$$
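The degree-based bounds of arts. 49 and 50 require only the degree vector $d$ and $s = Ad$, whose components are the sums of the neighbor degrees. The sketch below assumes NumPy. It evaluates (3.44), (3.47), (3.48) and (3.49) for a hypothetical connected test graph and for the star $K_{1,5}$, where (3.48) is attained with equality.

```python
import numpy as np

def connected_graph_bounds(A):
    """Bounds (3.44), (3.47), (3.48), (3.49) and d_max for lambda_1.

    (3.47)-(3.49) assume that the graph is connected."""
    N = A.shape[0]
    d = A.sum(axis=1)
    L = d.sum() / 2
    s = A @ d           # s_i = (A^2 u)_i = sum of the degrees of the neighbors of i
    return {
        "3.44": np.sqrt(2 * L * (N - 1) / N),
        "3.47": (np.sqrt(s.min()), np.sqrt(s.max())),
        "3.48": np.sqrt(2 * L - N + 1),
        "3.49": ((s / d).min(), (s / d).max()),
        "d_max": d.max(),
    }

def adjacency(N, edges):
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

A = adjacency(5, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])   # hypothetical test graph
star = adjacency(6, [(0, j) for j in range(1, 6)])           # K_{1,5}
for name, M in [("test graph", A), ("star K_{1,5}", star)]:
    print(name, np.linalg.eigvalsh(M)[-1], connected_graph_bounds(M))
```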
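51. From the inequality (8.42) for Hölder $q$-norms, we find for $p > q > 0$ that $\left(\sum_{k=1}^{N} |\lambda_k|^p\right)^{1/p} < \left(\sum_{k=1}^{N} |\lambda_k|^q\right)^{1/q}$. Since $\sum_{k=1}^{N} \lambda_k = 0$, not all $\lambda_k$ can be positive and, combined with $\left|\sum_{k=1}^{N} \lambda_k^p\right| \le \sum_{k=1}^{N} |\lambda_k|^p$, we also have that $\left|\sum_{k=1}^{N} \lambda_k^p\right|^{1/p} < \left(\sum_{k=1}^{N} |\lambda_k|^q\right)^{1/q}$. Applied to the case where $q = 2$ and $p = 3$, this gives the following implication: if $\sum_{k=2}^{N} \lambda_k^2 < \lambda_1^2$, then $\left|\sum_{k=2}^{N} \lambda_k^3\right| < \lambda_1^3$. In that case, the number of triangles given in (3.3) is
$$\blacktriangle_G = \frac{1}{6}\lambda_1^3 + \frac{1}{6}\sum_{k=2}^{N} \lambda_k^3 \ge \frac{1}{6}\lambda_1^3 - \frac{1}{6}\left|\sum_{k=2}^{N} \lambda_k^3\right| > 0$$
Hence, if $2L = \sum_{k=1}^{N} \lambda_k^2 < 2\lambda_1^2$, then the number of triangles $\blacktriangle_G$ in $G$ is at least one. Equivalently, in view of (3.2), if $\lambda_1 > \sqrt{L}$, then the graph $G$ contains at least one triangle.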
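The triangle criterion of art. 51 is easy to test empirically. The sketch below assumes NumPy. It compares the spectral triangle count from (3.3) with a brute-force count on random 8-node graphs, which are arbitrary test data generated with a fixed seed. It also records whether the sufficient condition $\lambda_1 > \sqrt{L}$ holds. The complete bipartite graph $K_{3,3}$ shows the boundary case $\lambda_1 = \sqrt{L}$ without any triangle.

```python
import numpy as np
from itertools import combinations

def triangles_spectral(A):
    """Number of triangles via (3.3): the sum of lambda_k^3 divided by 6."""
    return float(np.sum(np.linalg.eigvalsh(A) ** 3) / 6.0)

def triangles_brute(A):
    """Direct count over all node triples."""
    N = A.shape[0]
    return sum(1 for i, j, k in combinations(range(N), 3)
               if A[i, j] and A[j, k] and A[i, k])

def triangle_guaranteed(A, tol=1e-9):
    """Sufficient condition of art. 51: lambda_1 > sqrt(L) implies a triangle."""
    L = A.sum() / 2
    return bool(np.linalg.eigvalsh(A)[-1] > np.sqrt(L) + tol)

rng = np.random.default_rng(1)
records = []
for _ in range(200):
    upper = np.triu(rng.random((8, 8)) < 0.4, k=1)    # random simple graph on 8 nodes
    A = (upper | upper.T).astype(float)
    records.append((triangles_spectral(A), triangles_brute(A), triangle_guaranteed(A)))

# K_{3,3} has N^2/4 = 9 links, lambda_1 = sqrt(L) = 3 and no triangle.
K33 = np.zeros((6, 6))
K33[:3, 3:] = 1.0
K33[3:, :3] = 1.0
print(sum(g for _, _, g in records), "of", len(records), "graphs satisfy lambda_1 > sqrt(L)")
```

The small tolerance in `triangle_guaranteed` keeps floating-point noise from turning the boundary case $\lambda_1 = \sqrt{L}$ into a false positive.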
52. A theorem of Turán states that:
Theorem 9 A graph $G$ with $N$ nodes and more than $\left[\frac{N^2}{4}\right]$ links contains at least one triangle.
This theorem is a consequence of art. 51. For, $L > \left[\frac{N^2}{4}\right]$ implies, because $L$ is an integer, that $L > \frac{N^2}{4}$, which is equivalent to $N < 2\sqrt{L}$. Using this in the bound (3.31) on the largest eigenvalue gives
$$\lambda_1 \ge \frac{2L}{N} > \frac{2L}{2\sqrt{L}} = \sqrt{L}$$
and $\lambda_1 > \sqrt{L}$ is precisely the condition in art. 51 to have at least one triangle.
53. The graph formed by the union of all shortest hop paths, defined in art. 20, between node $u$ and node $v$ has at most $\left[\frac{N^2}{4}\right]$ links. Indeed, the union of two shortest paths $P_1$ and $P_2$ between the same source-destination pair cannot have a triangle. For, suppose that there were a triangle between three nodes $i$, $j$ and $k$ in the union $P_1 \cup P_2$. Since the link $i \to k$ is a subsection of a shortest hop path, it obeys the strict triangle inequality (all link weights $w = 1$ are the same),
$$w(i \to k) < w(i \to j) + w(j \to k)$$
otherwise the link $i \to k$ is not the shortest hop path from $i$ to $k$. Hence, the subsection $i \to j \to k$ is not a shortest hop path. The triangle inequality thus implies that the union of the shortest hop paths between the same source-destination pair cannot have a triangle. Turán's Theorem 9 then states that the number of links in that union is at most $\left[\frac{N^2}{4}\right]$.
54. We can deduce a set of lower bounds for $\lambda_1$ by considering (1.4). The Perron-Frobenius theorem, as explained in art. 21, states that all eigenvector components are non-negative, which leads in (1.4) to the bound
$$\lambda_1 \ge \sqrt[m]{(A^m)_{jj}}$$
where $(A^m)_{jj}$ equals the number of closed $m$-hop walks starting and ending at node $j$ (see art. 17). Since the bound holds for any node $j$ and any integer $m \ge 1$, we arrive at
$$\lambda_1 \ge \max_{m \ge 1} \sqrt[m]{\max_{1\le j\le N} (A^m)_{jj}} \tag{3.50}$$
The largest eigenvalue of the adjacency matrix is at least as large as the $m$-th root of the largest number of closed $m$-hop walks around a node in the graph. For example, invoking art. 30, (3.50) shows in the case $m = 2$ that $\lambda_1 \ge \sqrt{d_{\max}}$. The lower bound (3.50) is, in general, weak. The related upper bound, valid for any integer $m \ge 1$,
$$\lambda_1 \le \sqrt[m]{\max_{1\le i\le N} \sum_{j=1}^{N} (A^m)_{ij}}$$
follows from (8.51) and (8.47). Since $W_m = \sum_{j=1}^{N} (A^m)_{jj}$, the above upper bound is different from (3.46).
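Both bounds of art. 54 follow from the matrix powers $A^m$: the diagonal gives the closed-walk lower bound (3.50), and the maximum row sum gives the upper bound. The sketch below assumes NumPy and truncates both optimizations at an arbitrary $m_{\max} = 12$. The star $K_{1,4}$ illustrates the case $m = 2$, where $\sqrt{d_{\max}}$ equals $\lambda_1 = 2$. The other test graph is hypothetical.

```python
import numpy as np

def closed_walk_bounds(A, mmax=12):
    """Best lower bound (3.50) and best row-sum upper bound over m = 1..mmax."""
    lower, upper = 0.0, np.inf
    Am = np.eye(A.shape[0])
    for m in range(1, mmax + 1):
        Am = Am @ A                                            # A^m
        lower = max(lower, np.diag(Am).max() ** (1.0 / m))     # closed m-hop walks, (3.50)
        upper = min(upper, Am.sum(axis=1).max() ** (1.0 / m))  # maximum row sum of A^m
    return lower, upper

def adjacency(N, edges):
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

A = adjacency(5, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])   # hypothetical test graph
star = adjacency(5, [(0, j) for j in range(1, 5)])           # K_{1,4}, lambda_1 = 2
for M in (A, star):
    print(closed_walk_bounds(M), np.linalg.eigvalsh(M)[-1])
```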
3.5 Eigenvalue spacings
The difference $\lambda_k - \lambda_{k+1}$, for $1 \le k \le N-1$, between two consecutive eigenvalues of the adjacency matrix $A$ is called the $k$-th eigenvalue spacing of $A$. Only basic and simple, but general, relations are deduced. Higher-order differences (see art. 207) are not considered, nor is the combination with the powerful Interlacing Theorem 42 in art. 180.
55. Spectral gap. The difference between the largest eigenvalue $\lambda_1$ and the second largest $\lambda_2$, called the spectral gap, is never larger than $N$:
$$\lambda_1 - \lambda_2 \le N \tag{3.51}$$
Indeed, since $\lambda_1 > 0$ as indicated by (3.45), it follows from (3.1) that
$$0 = \sum_{k=1}^{N} \lambda_k = \lambda_1 + \sum_{k=2}^{N} \lambda_k \le \lambda_1 + (N-1)\,\lambda_2$$
such that
$$\lambda_2 \ge -\frac{\lambda_1}{N-1}$$
Hence,
$$\lambda_1 - \lambda_2 \le \lambda_1 + \frac{\lambda_1}{N-1} = \frac{N}{N-1}\,\lambda_1$$
Art. 23 states that the largest possible eigenvalue is $\lambda_1 = N - 1$, attained in the complete graph, which proves (3.51). Again, the equality sign in (3.51) occurs in case of the complete graph (see Section 5.1). When a link is removed in the complete graph, the spectral gap drops by at least 1 (see Section 5.10). The spectral gap plays an important role in the dynamics of processes on graphs (art. 64) and it characterizes the robustness of a graph due to its relation with the algebraic connectivity (art. 74 and Section 4.2).
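A quick numerical check of (3.51) and of the two statements about the complete graph follows. The sketch assumes NumPy and uses $N = 7$ and a batch of random graphs generated with a fixed seed, both arbitrary choices for illustration.

```python
import numpy as np

def spectral_gap(A):
    """lambda_1 - lambda_2 of a symmetric adjacency matrix."""
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    return lam[0] - lam[1]

N = 7
K = np.ones((N, N)) - np.eye(N)
gap_complete = spectral_gap(K)          # equality case of (3.51): gap = N
K_minus = K.copy()
K_minus[0, 1] = K_minus[1, 0] = 0.0     # remove one link from K_N
gap_minus = spectral_gap(K_minus)

rng = np.random.default_rng(2)
gaps = []
for _ in range(100):
    upper = np.triu(rng.random((N, N)) < 0.5, k=1)
    gaps.append(spectral_gap((upper | upper.T).astype(float)))
print(gap_complete, gap_minus, max(gaps))
```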
56. Eigenvalue spacings. The sum over all spacings between two consecutive eigenvalues equals
$$\sum_{k=1}^{N-1} (\lambda_k - \lambda_{k+1}) = \lambda_1 - \lambda_N \tag{3.52}$$
Since each spacing $\lambda_k - \lambda_{k+1} \ge 0$, the largest possible spacing occurs when all but one spacing are zero, in which case $\max_{1\le k\le N-1} (\lambda_k - \lambda_{k+1})$ is equal to $\lambda_1 - \lambda_N$. However, each spacing consists of two consecutive eigenvalues, which implies that $\lambda_N = \lambda_2$ (or $\lambda_{N-1} = \lambda_1$). Art. 55 shows that the largest possible spacing is attained in the complete graph and is equal to $N$, the largest possible spectral gap. Let $\Delta$ denote an arbitrary spacing between two consecutive eigenvalues. Then the telescoping series (3.52) shows that its average equals
$$E[\Delta] = \frac{\lambda_1 - \lambda_N}{N-1} \le \frac{2 d_{\max}}{N-1}$$
The variance $\mathrm{Var}[\Delta] = E[\Delta^2] - (E[\Delta])^2$ is, however, more difficult to compute, because it requires the determination of the sum
$$E[\Delta^2] = \frac{1}{N-1}\sum_{k=1}^{N-1} (\lambda_k - \lambda_{k+1})^2$$
or, equivalently, of the sum $\sum_{k=1}^{N-1} \lambda_k \lambda_{k+1}$. Abel's partial summation
$$\sum_{k=1}^{n} a_k b_k = \sum_{k=1}^{n-1}\left(\sum_{l=1}^{k} a_l\right)(b_k - b_{k+1}) + b_n \sum_{l=1}^{n} a_l \tag{3.53}$$
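applied to (3.1) shows that
$$\sum_{k=1}^{N-1} k\,(\lambda_k - \lambda_{k+1}) = -N\lambda_N$$
Invoking (3.39) with $a_k = k\,(\lambda_k - \lambda_{k+1})$ and $q_k = k$ yields bounds for the minimum and maximum spacing between consecutive eigenvalues of the adjacency matrix $A$:
$$0 \le \min_{1\le k\le N-1} (\lambda_k - \lambda_{k+1}) \le -\frac{2\lambda_N}{N-1} \le \max_{1\le k\le N-1} (\lambda_k - \lambda_{k+1}) \tag{3.54}$$
Relation (8.57) in art. 172 implies that
$$-\lambda_N \le \left\lceil \frac{N}{2} \right\rceil$$
such that the minimum spacing is never larger than
$$\min_{1\le k\le N-1} (\lambda_k - \lambda_{k+1}) \le \frac{2}{N-1}\left\lceil \frac{N}{2} \right\rceil$$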
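The spacing identities and bounds of art. 56 can be confirmed numerically. The sketch below assumes NumPy and uses one random 12-node graph generated with a fixed seed as arbitrary test data. It computes the spacings $\lambda_k - \lambda_{k+1}$, the partial-summation identity $\sum k(\lambda_k - \lambda_{k+1}) = -N\lambda_N$, and the middle term of (3.54).

```python
import numpy as np

def spacings(A):
    """Eigenvalues in decreasing order and the spacings lambda_k - lambda_{k+1}."""
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    return lam, lam[:-1] - lam[1:]

rng = np.random.default_rng(3)
N = 12
upper = np.triu(rng.random((N, N)) < 0.3, k=1)
A = (upper | upper.T).astype(float)

lam, s = spacings(A)
k = np.arange(1, N)                        # k = 1..N-1
mean_gap = s.mean()                        # E[Delta]
middle = -2 * lam[-1] / (N - 1)            # middle term of (3.54)
d_max = A.sum(axis=1).max()
print(mean_gap, s.min(), middle, s.max())
```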
57. Inequalities for $\lambda_N$. Besides the general bounds in art. 172, new bounds for the smallest eigenvalue $\lambda_N$ of the adjacency matrix $A$ can be deduced when known relationships are rewritten in terms of the spacings. Partial summation (3.53) of the total number of closed walks (3.24) yields, for any integer $0 \le m \le k$,
$$W_k = \sum_{l=1}^{N} \lambda_l^{k-m}\lambda_l^{m} = \sum_{j=1}^{N-1}\left(\lambda_j^{k-m} - \lambda_{j+1}^{k-m}\right)\left(\sum_{l=1}^{j}\lambda_l^{m}\right) + \lambda_N^{k-m} W_m \tag{3.55}$$
while the generalization of the telescoping series (3.52) is, for any $n$,
$$\sum_{j=1}^{N-1}\left(\lambda_j^{n} - \lambda_{j+1}^{n}\right) = \lambda_1^{n} - \lambda_N^{n} \tag{3.56}$$
The difference $\lambda_j^n - \lambda_{j+1}^n$ can be negative when eigenvalues are negative and $n$ is even. The sum $\sum_{l=1}^{j}\lambda_l^m$ is always positive for $j < N$, which is immediate for even $m$. For odd $m$, denoting by $q$ the index such that $\lambda_q \ge 0$ and $\lambda_{q+1} < 0$, we can write
$$\sum_{l=1}^{j}\lambda_l^{m} = \sum_{l=1}^{q}\lambda_l^{m} + \sum_{l=q+1}^{j}\lambda_l^{m}$$
where the first sum is strictly positive and the second is strictly negative. The second sum decreases with increasing $j$ and is thus larger than or equal to $\sum_{l=q+1}^{N-1}\lambda_l^m$. However, in that extreme case where $j = N-1$, the sum $\sum_{l=1}^{N-1}\lambda_l^m = W_m - \lambda_N^m > 0$. The minimum value of the sum $\sum_{l=1}^{j}\lambda_l^m$ is attained for even $m$ at $j = 1$ and, for odd $m$, either at $j = 1$, if $\lambda_1^m < W_m - \lambda_N^m$, or at $j = N-1$, if $\lambda_1^m > W_m - \lambda_N^m$. If $m = 1$, the minimum occurs at $j = N-1$ provided $\lambda_1 > |\lambda_N|$, which excludes, as in art. 45, bipartite graphs.
With this preparation, the inequality (3.39), with $a_j = q_j \sum_{l=1}^{j}\lambda_l^m$, $q_j = \lambda_j^{k-m} - \lambda_{j+1}^{k-m} > 0$ and $k - m$ odd, becomes, using (3.55) and (3.56),
$$\frac{W_k - \lambda_N^{k-m} W_m}{\lambda_1^{k-m} - \lambda_N^{k-m}} \ge \min_{1\le j\le N-1}\sum_{l=1}^{j}\lambda_l^{m} = \min\left(W_m - \lambda_N^m,\ \lambda_1^m\right) 1_{\{m \text{ is odd}\}} + \lambda_1^m\, 1_{\{m \text{ is even}\}}$$
from which we arrive at the bound, for even $m$ and for odd $m$ provided $W_m - \lambda_N^m > \lambda_1^m$,
$$\lambda_N^{k-m} \le \frac{W_k - \lambda_1^k}{W_m - \lambda_1^m} \tag{3.57}$$
and, for odd $m$ provided $W_m - \lambda_N^m < \lambda_1^m$,
$$\lambda_1^{k-m} \le \frac{W_k - \lambda_N^k}{W_m - \lambda_N^m} \tag{3.58}$$
For example, for $k - m = 1$ and excluding bipartite graphs, (3.58) reduces for $k = 2$ and $m = 1$ and using (3.2) to
$$\lambda_1 \le \frac{2L - \lambda_N^2}{-\lambda_N}$$
from which we deduce that
$$\lambda_N \ge \frac{1}{2}\left(\lambda_1 - \sqrt{\lambda_1^2 + 8L}\right)$$
For $k = 3$ and $m = 2$, and using (3.3), (3.57) becomes
$$\lambda_N \le \frac{6\blacktriangle_G - \lambda_1^3}{2L - \lambda_1^2}$$
Since we can only compute the sum $\sum_{j=1}^{N-1}\left(\sum_{l=1}^{j}\lambda_l^m\right)$ for $m = 0$, the inequality (3.39), with $a_j = q_j\left(\lambda_j^k - \lambda_{j+1}^k\right)$ and $q_j = j$, yields for all integers $k \ge 0$, using (3.55),
$$\min_{1\le j\le N-1}\left(\lambda_j^k - \lambda_{j+1}^k\right) \le \frac{2\left(W_k - N\lambda_N^k\right)}{N(N-1)} \le \max_{1\le j\le N-1}\left(\lambda_j^k - \lambda_{j+1}^k\right) \tag{3.59}$$
which is the generalization of (3.54). In particular, for odd values of $k = 2n + 1$, which conserve the ordering in the powers of eigenvalues, i.e., $\lambda_j^{2n+1} \ge \lambda_{j+1}^{2n+1}$ for all $1 \le j \le N-1$, the bounds in (3.59) can be of interest.
3.6 Additional properties
58. Cliques and cocliques. A clique of size $m$ in a graph $G$ with $N \ge m$ nodes is a set of $m$ pairwise adjacent nodes. Only when $m = N$ or when the clique is a disjoint subgraph of $G$ does each node of the clique have degree $m - 1$ in $G$. A coclique is the complement of a clique; thus, a set of pairwise non-adjacent nodes. The clique number is the size of the largest clique, while the independence number is the size of the largest coclique.
Suppose that $G$ has a coclique of size $c$. We can always relabel the nodes such that the nodes belonging to that coclique possess the first $c$ labels. The corresponding adjacency matrix $A$ has the form
$$A = \begin{bmatrix} O_{c\times c} & F_{c\times(N-c)} \\ F^T_{(N-c)\times c} & \tilde{A}_{(N-c)\times(N-c)} \end{bmatrix}$$
Since the principal submatrix $O_{c\times c}$ has $c$ eigenvalues equal to zero, the Interlacing Theorem 42 shows that, for $1 \le j \le c$,
$$\lambda_{N-c+j}(A) \le 0 \le \lambda_j(A)$$
Hence, the adjacency matrix has at least $c$ non-negative eigenvalues ($\lambda_1, \ldots, \lambda_c$) and at least $c$ non-positive eigenvalues ($\lambda_{N-c+1}, \ldots, \lambda_N$). The converse is that the number $n_+ = |\{j : \lambda_j(A) \ge 0\}|$ of non-negative eigenvalues of $A$ provides an upper bound for the independence number. Likewise, the number $n_- = |\{j : \lambda_j(A) \le 0\}|$ of non-positive eigenvalues of $A$ is an upper bound for the independence number. Only for the complete graph $K_N$, where $c = 1$, is there only one non-negative eigenvalue. If one link (e.g., between nodes 1 and 2) in the complete graph is removed, the coclique has size $c = 2$, and two eigenvalues are non-negative. Consequently, the second largest eigenvalue $\lambda_2$ in any graph apart from $K_N$ is at least equal to zero. Another argument is that, apart from the complete graph, any graph possesses the star $K_{1,2}$ as a subgraph, whose adjacency eigenvalues are computed in Section 5.7. It then follows again from the Interlacing Theorem 42 that $\lambda_2 \ge 0$.
Similarly, if $G$ has a clique of size $c$, then, after relabeling, the adjacency matrix has the form
$$A = \begin{bmatrix} (J - I)_{c\times c} & G_{c\times(N-c)} \\ G^T_{(N-c)\times c} & \tilde{A}_{(N-c)\times(N-c)} \end{bmatrix}$$
Since the principal submatrix $(J - I)_{c\times c}$ has one eigenvalue $c - 1$ and $c - 1$ eigenvalues equal to $-1$ by (5.1), the Interlacing Theorem 42 shows that
$$\lambda_{N-c+1}(A) \le c - 1 \le \lambda_1(A)$$
and, for $2 \le j \le c$,
$$\lambda_{N-c+j}(A) \le -1 \le \lambda_j(A)$$
The bounds for the clique are less elegant than those for the coclique.
59. Almost all non-star-like trees are not determined by the spectrum of the adjacency matrix (van Dam and Haemers, 2003). Godsil and Royle (2001) start with the remark that the spectrum of a graph does not determine the degrees, nor whether the graph is planar, and that there are many graphs that are co-spectral, i.e., although the graphs are different, their spectrum is the same. Cvetković et al. (2009)
devote a whole chapter to the characterization of graphs by their spectrum. They list theorems on graphs that are determined by their spectrum (such as regular graphs with degree $r = 2$ and complete bipartite graphs), but they also present counterexamples. Finally, van Dam and Haemers (2003) conjecture that sufficiently large graphs are determined by their spectrum, roughly speaking because the probability of having co-spectral graphs becomes vanishingly small when the number of nodes $N$ increases.
60. Addition of a node to a graph. When one node $n$ is added to a graph $G_N$ to form the graph $G_{N+1}$, the adjacency matrix of the latter is expressed as
$$A_{N+1} = \begin{bmatrix} A_N & v_{N\times 1} \\ v^T_{1\times N} & 0 \end{bmatrix} \tag{3.60}$$
where $v_{N\times 1}$ is the zero-one connection vector of the new node $n$ to any other node in $G_N$. The analysis below can be complemented with that in art. 181. The characteristic polynomial of $A_{N+1}$ is, invoking (8.79),
$$\det(A_{N+1} - \lambda I) = \det\begin{bmatrix} A_N - \lambda I & v \\ v^T & -\lambda \end{bmatrix} = \det(A_N - \lambda I)\,\det\left(-\lambda - v^T (A_N - \lambda I)^{-1} v\right)_{1\times 1} = -\left(\lambda + v^T (A_N - \lambda I)^{-1} v\right)\det(A_N - \lambda I)$$
With the expansion of the resolvent of $A$, valid for $|\lambda| > \lambda_1(A)$,
$$(A - \lambda I)^{-1} = -\frac{1}{\lambda}\left(I - \frac{A}{\lambda}\right)^{-1} = -\frac{1}{\lambda}\sum_{k=0}^{\infty}\frac{A^k}{\lambda^k} = -\frac{1}{\lambda}\left(I + \frac{A}{\lambda} + \frac{A^2}{\lambda^2} + \cdots\right) \tag{3.61}$$
we can write
$$v^T (A_N - \lambda I)^{-1} v = -\frac{1}{\lambda}\sum_{k=0}^{\infty}\frac{v^T A_N^k v}{\lambda^k} = -\frac{1}{\lambda}\left(d_n + \sum_{k=1}^{\infty}\frac{v^T A_N^k v}{\lambda^k}\right)$$
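where $v^T v = d_n$ is the degree of node $n$. If we denote the characteristic polynomial of $A_N$ by $c_{A_N}(\lambda) = \sum_{k=0}^{N} c_k(N)\,\lambda^k$, then we obtain
$$\sum_{k=0}^{N+1} c_k(N+1)\,\lambda^k = -\left(\lambda - \sum_{k=0}^{\infty}\frac{v^T A_N^k v}{\lambda^{k+1}}\right)\sum_{k=0}^{N} c_k(N)\,\lambda^k$$
$$= -\sum_{k=0}^{N} c_k(N)\,\lambda^{k+1} + \sum_{k=0}^{\infty}\sum_{j=0}^{N} c_j(N)\left(v^T A_N^k v\right)\lambda^{j-k-1}$$
$$= -\sum_{k=1}^{N+1} c_{k-1}(N)\,\lambda^k + \sum_{m=-\infty}^{N-1}\ \sum_{j=\max(0,\,m+1)}^{N} c_j(N)\left(v^T A_N^{j-m-1} v\right)\lambda^m$$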
Equating corresponding powers in $\lambda$ yields¹, for $1 \le k \le N-1$, apart from $c_{N+1}(N+1) = (-1)^{N+1}$ and $c_N(N+1) = -c_{N-1}(N) = 0$, a recursion that expresses the coefficients of the characteristic polynomial of $A_{N+1}$ in terms of those of $A_N$,
$$c_k(N+1) = -c_{k-1}(N) + \sum_{j=k+1}^{N} c_j(N)\left(v^T A_N^{j-(k+1)} v\right)$$
while, for $k = 0$,
$$c_0(N+1) = \sum_{j=1}^{N} c_j(N)\left(v^T A_N^{j-1} v\right) = \det A_{N+1}$$
Given the coefficients $\{c_k(N)\}_{0\le k<N-1}$, the quadratic forms $r_k(v) = v^T A_N^k v$ for $1 \le k < N$ constitute the major computational effort to determine the coefficients $\{c_k(N+1)\}_{0\le k<N}$. Starting with $N = 2$, the set can be iterated up to any size $N$ and any structure in $A_N$, each $N$ producing the set of coefficients $\{c_k(N)\}_{0\le k<N-1}$ of a polynomial with integer coefficients and real zeros lying in the interval $[-(N-1), N-1]$. Moreover, art. 180 shows that all eigenvalues of $A_N$ interlace with those of $A_{N+1}$. These properties are also shared by orthogonal polynomials (see Chapter 10).
Suppose that $v$ is an eigenvector of $A_N$ with eigenvalue $\lambda_v$. Then $\lambda_v = \lambda_{\max}(A_N) = \lambda_1$ by the Perron-Frobenius Theorem 38, because $v$ has non-negative components². Hence, if $v$ is the eigenvector belonging to the largest eigenvalue and scaled to have only zero and one components, then $A_N v = \lambda_1 v$ and $(A_N - \lambda I)^{-1} v = (\lambda_1 - \lambda)^{-1} v$ for $\lambda \neq \lambda_1$, such that
$$\left(v^T (A_N - \lambda I)^{-1} v\right)_{1\times 1} = \frac{d_n}{\lambda_1 - \lambda}$$
and
$$\det(A_{N+1} - \lambda I) = \frac{\lambda^2 - \lambda_1\lambda - d_n}{\lambda_1 - \lambda}\,\det(A_N - \lambda I)$$
Hence, if $v$ is the (unscaled) eigenvector of $A_N$ belonging to the largest eigenvalue, whose norm is $\|v\|_2^2 = v^T v = d_n$, then the spectrum of $A_{N+1}$ consists of all eigenvalues of $A_N$, except for $\lambda = \lambda_1$, and two new eigenvalues,
$$\frac{\lambda_1}{2}\left(1 \pm \sqrt{1 + \frac{4 d_n}{\lambda_1^2}}\right)$$
¹ Observe that, for $m < 0$, an identity is found because
$$\sum_{j=0}^{N} c_j(N)\, v^T A_N^{j-m-1} v = v^T A_N^{-m-1}\left(\sum_{j=0}^{N} c_j(N)\, A_N^{j}\right) v = 0$$
by the Cayley-Hamilton Theorem (art. 145), stating that $c_{A_N}(A_N) = \sum_{j=0}^{N} c_j(N)\, A_N^{j} = O$.
² If $v$ has zero components, then $A$ is reducible, which implies that the graph $G$ is disconnected.
In other words, the largest eigenvalue $\lambda_1$ of $A_N$ is split up into a slightly larger one and a smaller one with strength related to the degree $d_n$. Such a vector $v$ exists, for example, when $v = u$ and $A_N$ is the adjacency matrix of a regular graph (art. 41).

61. Invoking (8.80), we obtain the alternative expression for the determinant
$$\det\begin{bmatrix} A_N - \lambda I & v_{N\times 1} \\ v^T & -\lambda \end{bmatrix} = -\lambda \det\left(A_N - \lambda I + \frac{v v^T}{\lambda}\right) \qquad (3.62)$$
For any complex number $z$, the determinant $\det(A_{N+1} - \lambda I)$ can be split into two others:
$$\det\begin{bmatrix} A_N - \lambda I & v_{N\times 1} \\ v^T & -\lambda \end{bmatrix} = \det\begin{bmatrix} A_N - \lambda I & v_{N\times 1} \\ w^T & z \end{bmatrix} + \det\begin{bmatrix} A_N - \lambda I & v_{N\times 1} \\ v^T - w^T & -\lambda - z \end{bmatrix}$$
The particular case where $v = w = u$ then reduces, for any $z \neq 0$, to
$$\det\begin{bmatrix} A_N - \lambda I & u_{N\times 1} \\ u^T & -\lambda \end{bmatrix} = \det\begin{bmatrix} A_N - \lambda I & u_{N\times 1} \\ u^T & z \end{bmatrix} + \det\begin{bmatrix} A_N - \lambda I & u_{N\times 1} \\ 0_{1\times N} & -(\lambda + z) \end{bmatrix}$$
When adding $-\frac{1}{z}$ times the last row to each other row in the first determinant, we obtain, with $J = u.u^T$,
$$\det(A_{N+1} - \lambda I) = z \det\left(A_N - \frac{1}{z}J - \lambda I\right) - (\lambda + z)\det(A_N - \lambda I) \qquad (3.63)$$
which reduces, for $z = -\lambda$, to (3.62). Since the adjacency matrix of the complement $G^c$ equals $A^c = J - I - A$, choosing $z = 1$ in (3.63) results in
$$\det(A_{N+1} - \lambda I) = (-1)^N \det\left(A_N^c + (\lambda + 1) I\right) - (\lambda + 1)\det(A_N - \lambda I)$$
When a node that connects to all other nodes in $G_N$ is added such that $v = u$, the resulting graph $G_{N+1}$ is called the cone of $G_N$. The cone is always a connected graph. The cone construction is useful to convert a reducible matrix into an irreducible one, or to connect a graph with several disconnected clusters of components. An interesting application occurs in Google's PageRank as discussed in Van Mieghem (2006b, pp. 224-228). Finally, in case $v = u$, the quadratic forms in art. 60, $r_k(u) = u^T A_N^k u = N_k$, represent the total number of walks with $k$ hops (art. 33).

62. If $\pi$ is an equitable partition of the connected graph $G$, then the adjacency matrix $A$ and the corresponding quotient matrix $A^{\pi}$ have the same spectral radius. Indeed, art. 15 shows that the eigenvalues of the quotient matrix corresponding to an equitable partition are a subset of the eigenvalues of the symmetric matrix. Moreover, any eigenvector $v$ of $A^{\pi}$ belonging to eigenvalue $\lambda$ is transformed to an eigenvector $Sv$ of $A$ with the same eigenvalue $\lambda$. The Perron-Frobenius Theorem 38 states that the eigenvector belonging to $\lambda_1$ is the only one with non-negative components. Both $A$ and $A^{\pi}$ are non-negative matrices. Since the characteristic matrix $S$ of the partition has non-negative elements, with exactly one 1 in each row, both the Perron eigenvector $v$ of $A^{\pi}$ and $Sv$ have positive vector components and, thus, must belong to the spectral radius.
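The cone identity (3.63) with $z = 1$ can be checked exactly in rational arithmetic. The sketch below is a minimal illustration, assuming a hypothetical four-node graph (a triangle with a pendant node); it compares both sides of the identity at a range of rational test values of $\lambda$, using a hand-written exact determinant rather than any library routine.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return result

def shift(M, s):
    """Return M + s*I."""
    n = len(M)
    return [[M[i][j] + (s if i == j else 0) for j in range(n)] for i in range(n)]

# Hypothetical G_N: a triangle {0, 1, 2} with a pendant node 3.
N = 4
A = [[0] * N for _ in range(N)]
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i][j] = A[j][i] = 1

A_cone = [row + [1] for row in A] + [[1] * N + [0]]                            # v = u
A_comp = [[0 if i == j else 1 - A[i][j] for j in range(N)] for i in range(N)]  # J - I - A

def identity_holds(lam):
    """Check (3.63) with z = 1 at a rational test point lam."""
    lhs = det(shift(A_cone, -lam))
    rhs = (-1) ** N * det(shift(A_comp, lam + 1)) - (lam + 1) * det(shift(A, -lam))
    return lhs == rhs

print(all(identity_holds(Fraction(k, 3)) for k in range(-9, 10)))   # True
```

Exact fractions avoid any floating-point tolerance, so equality is tested literally.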
63. The spectral radius of any tree with maximum degree $d_{\max} > 1$ is smaller than $2\sqrt{d_{\max} - 1}$. Indeed, Lemmas 2 and 7 show that the spectral radius of any tree is at most equal to that of a centrally symmetric tree $T_{d_{\max}}$ as defined in art. 16. Art. 62 allows us to concentrate on the quotient matrix $A^{\pi}$ instead of the adjacency matrix. Gershgorin's Theorem 36, applied to the transformed quotient matrix $\widetilde{A^{\pi}}$ derived in art. 16, shows that no eigenvalue exceeds $2\sqrt{d_{\max} - 1}$ because $\frac{d_{\max}}{\sqrt{d_{\max} - 1}} \le 2\sqrt{d_{\max} - 1}$ for $d_{\max} > 1$.

3.7 The stochastic matrix $P = \Delta^{-1} A$

64. The stochastic matrix $P = \Delta^{-1} A$, introduced in art. 3, characterizes a random walk on a graph. A random walk is described by a finite Markov chain that is time-reversible. Alternatively, a time-reversible Markov chain can be viewed as a random walk on an undirected graph. Random walks on graphs have many applications in different fields (see, e.g., the survey by Lovász (1993) and the relation with electric networks by Doyle and Snell (1984)); perhaps the most important application is randomly searching or sampling. The combination of Markov theory and algebra leads to interesting properties of $P = \Delta^{-1} A$. The left-eigenvector of $P$ belonging to eigenvalue $\lambda = 1$ is the steady-state vector $\pi$ (which is a $1 \times N$ row vector, see Van Mieghem (2006b)). The corresponding right-eigenvector is the all-one vector $u$. These eigenvectors obey the eigenvalue equations $P^T \pi^T = \pi^T$ and $P u = u$ and the orthogonality relation $\pi u = 1$ (art. 140). If $d = (d_1, d_2, \ldots, d_N)$ is the degree vector, then the basic law for the degree (2.5) is rewritten as $\left(\frac{d}{2L}\right)^T u = 1$. The steady-state eigenvector $\pi$ is unique (see Van Mieghem (2006b)) such that the equations $\pi u = 1$ and $\left(\frac{d}{2L}\right)^T u = 1$ imply that the steady-state vector is
$$\pi = \left(\frac{d}{2L}\right)^T \quad \text{or} \quad \pi_j = \frac{d_j}{2L} \qquad (3.64)$$
In general, the matrix $P$ is not symmetric, but, after a similarity transform $H = \Delta^{1/2}$, a symmetric matrix $R = \Delta^{1/2} P \Delta^{-1/2} = \Delta^{-1/2} A \Delta^{-1/2}$ is obtained whose eigenvalues are the same as those of $P$ (art. 142).
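The steady-state formula (3.64) is easy to confirm exactly. The minimal sketch below assumes a hypothetical connected, non-regular graph (a triangle with a pendant node), forms $P = \Delta^{-1}A$ with rational entries, and checks that $\pi_j = d_j/(2L)$ is a left eigenvector and $u$ a right eigenvector for the eigenvalue 1.

```python
from fractions import Fraction

def random_walk(n, edges):
    """Return P = Delta^{-1} A (exact fractions) and the degree list of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    deg = [sum(row) for row in A]
    P = [[Fraction(A[i][j], deg[i]) for j in range(n)] for i in range(n)]
    return P, deg

# Hypothetical connected, non-regular example: a triangle {0, 1, 2} plus a pendant node 3.
n, edges = 4, [(0, 1), (0, 2), (1, 2), (2, 3)]
P, deg = random_walk(n, edges)
two_L = sum(deg)                                   # basic law: sum of degrees = 2L
pi = [Fraction(d, two_L) for d in deg]             # candidate steady state (3.64)

pi_P = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]   # row vector pi P
P_u = [sum(P[i][j] for j in range(n)) for i in range(n)]            # column vector P u

print(pi)            # [Fraction(1, 4), Fraction(1, 4), Fraction(3, 8), Fraction(1, 8)]
print(pi_P == pi)    # True: pi is a left eigenvector for eigenvalue 1
print(P_u)           # every entry equals 1: u is the right eigenvector
```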
The powerful property (art. 151) of symmetric matrices shows that all eigenvalues are real and that $R = U \operatorname{diag}(\lambda(R))\, U^T$, where the columns of the orthogonal matrix $U$ consist of the normalized eigenvectors $v_k$ that obey $v_j^T v_k = \delta_{jk}$. Written explicitly in terms of these eigenvectors, this gives (art. 156)
$$R = \sum_{k=1}^{N} \lambda_k v_k v_k^T$$
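where, with the Perron-Frobenius Theorem 38, the real eigenvalues are ordered as $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N \ge -1$. If we exclude bipartite graphs (where the set of nodes is $\mathcal{N} = \mathcal{N}_1 \cup \mathcal{N}_2$ with $\mathcal{N}_1 \cap \mathcal{N}_2 = \emptyset$ and where each link connects a node in $\mathcal{N}_1$ and in $\mathcal{N}_2$) or reducible Markov chains (art. 167), then $|\lambda_k| < 1$ for $k > 1$. Art. 142 shows that the similarity transform $H = \Delta^{1/2}$ maps the steady-state vector $\pi$ into $v_1 = H^{-1}\pi^T$ and, with (3.64),
$$v_1 = \frac{\Delta^{-1/2}\pi^T}{\left\|\Delta^{-1/2}\pi^T\right\|_2} \quad \text{or} \quad v_{1j} = \frac{\sqrt{d_j}/(2L)}{\sqrt{\sum_{m=1}^{N}\left(\sqrt{d_m}/(2L)\right)^2}} = \sqrt{\frac{d_j}{2L}} = \sqrt{\pi_j}$$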
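Finally, since $P = \Delta^{-1/2} R\, \Delta^{1/2}$, the spectral decomposition of the transition probability matrix of a random walk on a graph with adjacency matrix $A$ is
$$P = \sum_{k=1}^{N}\lambda_k \Delta^{-1/2} v_k v_k^T \Delta^{1/2} = u\pi + \sum_{k=2}^{N}\lambda_k \Delta^{-1/2} v_k v_k^T \Delta^{1/2}$$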
The $n$-step transition probability is, with $\left(v_k v_k^T\right)_{ij} = v_{ki} v_{kj}$ and (3.64),
$$P_{ij}^n = \frac{d_j}{2L} + \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}\lambda_k^n\, v_{ki} v_{kj}$$
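To see the decomposition at work, the minimal sketch below iterates $P^n$ in floating point for a hypothetical connected, non-bipartite graph (a triangle with a pendant node) and records $\max_{i,j}|P^n_{ij} - \pi_j|$. Because the graph is not bipartite, $|\lambda_k| < 1$ for $k > 1$, so the deviation from the steady state decays geometrically; the example graph and the number of steps are illustrative choices only.

```python
def random_walk(n, edges):
    """Float transition matrix P = Delta^{-1} A and the degree list of an undirected graph."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [[a / deg[i] for a in A[i]] for i in range(n)], deg

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# Hypothetical connected, non-bipartite graph (it contains a triangle).
n, edges = 4, [(0, 1), (0, 2), (1, 2), (2, 3)]
P, deg = random_walk(n, edges)
pi = [d / sum(deg) for d in deg]        # pi_j = d_j / (2L), equation (3.64)

Pn, errors = P, []
for step in range(1, 101):
    if step > 1:
        Pn = matmul(Pn, P)              # Pn holds P^step
    errors.append(max(abs(Pn[i][j] - pi[j]) for i in range(n) for j in range(n)))

# Geometric decay: successive ratios approach max(|lambda_2|, |lambda_N|) of P.
print(errors[0], errors[9], errors[49])
print(errors[40] / errors[39])
```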
The convergence rate towards the steady state $\pi_j$, also coined the “mixing rate”, can be estimated from
$$\left|P_{ij}^n - \pi_j\right| \le \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}|\lambda_k|^n\, |v_{ki}|\, |v_{kj}| < \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}|\lambda_k|^n$$
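Denoting by $\lambda = \max(|\lambda_2|, |\lambda_N|)$ and by $\lambda'$ the largest element of the reduced set $\{|\lambda_k|\} \setminus \{\lambda\}$ with $2 \le k \le N$, we obtain
$$\left|P_{ij}^n - \pi_j\right| < \sqrt{\frac{d_j}{d_i}}\,\lambda^n + O\left(\lambda'^n\right)$$
Hence, the smaller $\lambda$ or, equivalently, the larger the spectral gap $|\lambda_1| - |\lambda_2| = 1 - |\lambda_2|$, the faster the random walk converges to its steady state.

65. The stochastic matrix $P = \Delta^{-1} A$ can also be expressed in terms of the Laplacian $Q = \Delta - A$ as $P = I - \Delta^{-1} Q$. This shows that the eigenvector $x$ of $P$ with corresponding eigenvalue $\lambda$ is the same as that of the normalized Laplacian $\Delta^{-1} Q$ belonging to $\tilde{\mu} = 1 - \lambda$, where $0 \le \tilde{\mu} \le 2$. Hence, the spectral gap of a stochastic matrix $P$ also equals the second smallest eigenvalue of the normalized Laplacian $\Delta^{-1} Q$. Moreover, $\operatorname{trace}(P) = \operatorname{trace}(A) = 0$ and $\operatorname{trace}(P^2) = \operatorname{trace}(R^2)$ imply, with $(R)_{ij} = \frac{a_{ij}}{\sqrt{d_i d_j}}$, that
$$\sum_{k=1}^{N}\lambda_k^2(P) = \sum_{i=1}^{N}\sum_{j=1}^{N}\frac{a_{ij}}{\sqrt{d_i d_j}}\frac{a_{ji}}{\sqrt{d_j d_i}} = \sum_{i=1}^{N}\sum_{j=1}^{N}\frac{a_{ij}}{d_i d_j}$$
With $\frac{1}{d_i d_j} = \frac{1}{2}\left(\frac{1}{d_i^2} + \frac{1}{d_j^2}\right) - \frac{1}{2}\left(\frac{1}{d_i} - \frac{1}{d_j}\right)^2$, we obtain that
$$\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{a_{ij}}{d_i d_j} = \frac{1}{2}\sum_{i=1}^{N}\frac{1}{d_i^2}\sum_{j=1}^{N}a_{ij} + \frac{1}{2}\sum_{j=1}^{N}\frac{1}{d_j^2}\sum_{i=1}^{N}a_{ji} - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}a_{ij}\left(\frac{1}{d_i} - \frac{1}{d_j}\right)^2 = \sum_{i=1}^{N}\frac{1}{d_i} - \sum_{i=1}^{N}\sum_{j=1}^{i-1}a_{ij}\left(\frac{1}{d_i} - \frac{1}{d_j}\right)^2$$
Thus,
$$\sum_{k=1}^{N}\lambda_k^2(P) = \sum_{i=1}^{N}\frac{1}{d_i} - \sum_{i=1}^{N}\sum_{j=1}^{i-1}a_{ij}\left(\frac{1}{d_i} - \frac{1}{d_j}\right)^2$$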
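which shows that $\sum_{k=1}^{N}\lambda_k^2(P) \le \sum_{i=1}^{N}\frac{1}{d_i}$, where $N / \sum_{i=1}^{N}\frac{1}{d_i}$ is the harmonic mean of the degree set $\{d_i\}_{1 \le i \le N}$. Only for regular graphs, where $d_i = r$, does the double sum disappear, so that $\sum_{k=1}^{N}\lambda_k^2(P) = \frac{N}{r}$. Since
$$\sum_{k=1}^{N}\lambda_k^2(P) = \sum_{k=1}^{N}\left(1 - \tilde{\mu}_k\right)^2 = 1 + \sum_{k=2}^{N}\left(1 - \tilde{\mu}_k\right)^2 \le 1 + (N-1)\left(1 - \tilde{\mu}_2\right)^2$$
where $0 = \tilde{\mu}_1 \le \tilde{\mu}_2 \le \cdots \le \tilde{\mu}_N$ are the eigenvalues of $\Delta^{-1} Q$ and the inequality presumes that $|\lambda_k| \le \lambda_2$ for all $k \ge 2$, we find, for regular graphs, an upper bound for the spectral gap
$$\tilde{\mu}_2 \le 1 - \sqrt{\frac{N - r}{r(N-1)}}$$
A tight upper bound
$$\tilde{\mu}_2 \le 1 - 2\frac{\sqrt{d_{\max} - 1}}{d_{\max}}\left(1 - \frac{2}{\rho}\right) + \frac{2}{\rho}$$
for a graph with diameter $\rho \ge 4$ is derived by Nilli (1991) using Rayleigh's equation (4.12) and some ingenuity.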
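The identity for $\sum_k \lambda_k^2(P) = \operatorname{trace}(P^2)$ can be verified in exact arithmetic. The minimal sketch below computes the trace directly, the double sum $\sum_i\sum_j a_{ij}/(d_i d_j)$, and the harmonic form with its correction term, for two hypothetical example graphs: a triangle with a pendant node, and the 2-regular cycle $C_5$, for which the correction vanishes and the value is $N/r = 5/2$.

```python
from fractions import Fraction

def trace_P2_three_ways(n, edges):
    """Compare trace(P^2) with the two closed forms derived above (exact arithmetic)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    d = [sum(row) for row in A]
    P = [[Fraction(A[i][j], d[i]) for j in range(n)] for i in range(n)]
    trace_P2 = sum(P[i][k] * P[k][i] for i in range(n) for k in range(n))  # sum_k lambda_k(P)^2
    double_sum = sum(Fraction(A[i][j], d[i] * d[j]) for i in range(n) for j in range(n))
    harmonic = sum(Fraction(1, x) for x in d)
    correction = sum(A[i][j] * (Fraction(1, d[i]) - Fraction(1, d[j])) ** 2
                     for i in range(n) for j in range(i))
    return trace_P2, double_sum, harmonic - correction

# Hypothetical examples: a triangle with a pendant node, and the cycle C_5.
print(trace_P2_three_ways(4, [(0, 1), (0, 2), (1, 2), (2, 3)]))   # all three equal 11/6
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print(trace_P2_three_ways(5, cycle5))                              # all three equal N/r = 5/2
```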
4 Eigenvalues of the Laplacian $Q$

In the sequel, we denote the eigenvalues of the Laplacian $Q$ by $\mu$ to distinguish them from the eigenvalues $\lambda$ of the adjacency matrix $A$.
4.1 General properties

66. The Laplacian $Q = \Delta - A$ is symmetric because $A$ and $\Delta$ are both symmetric. Symmetry $Q = Q^T$ also follows from the other definition $Q = BB^T$. The quadratic form defined in art. 160,
$$x^T Q x = x^T B B^T x = \left\|B^T x\right\|_2^2 \ge 0 \qquad (4.1)$$
is positive semidefinite, which implies that all eigenvalues of $Q$ are non-negative and at least one is zero because $\det Q = 0$ as shown in art. 2. Thus, the zero eigenvalue is the smallest eigenvalue of $Q$. The eigenvector belonging to the zero eigenvalue is $u$ because $\sum_{k=1}^{N} q_{ik} = 0$ for each row $1 \le i \le N$; in vector notation, $Qu = 0$. We order the real, non-negative eigenvalues of the Laplacian $Q$ as $0 = \mu_N \le \mu_{N-1} \le \cdots \le \mu_1$. As for the adjacency matrix (art. 22), none of the eigenvalues of the Laplacian is a fraction of the form $\frac{a}{b}$, where $a$ and $b$ are co-prime and $b > 1$. Only integer and irrational eigenvalues are possible.

67. Since $Q$ is a symmetric matrix, all eigenvectors $x_1, x_2, \ldots, x_N$ are orthogonal (art. 151). Art. 66 shows that the eigenvector $x_N = u$ belongs to the smallest eigenvalue $\mu_N = 0$, such that, for all $1 \le j \le N-1$,
$$u^T x_j = \sum_{k=1}^{N}(x_j)_k = 0$$
Thus, the sum of all vector components of an eigenvector different from $x_N = u$ is zero. When these eigenvector components are ranked in increasing order, then the smallest and largest eigenvector component of $x_j \neq u$, with $1 \le j \le N-1$, have a different sign.

68. Gerschgorin's Theorem 36 states that each eigenvalue $\mu$ of the Laplacian lies
in an interval $|\mu - d_j| \le d_j$ around a degree $d_j$-value. Hence, $0 \le \mu \le 2d_j$, which shows that Gerschgorin's Theorem 36, alternatively to art. 66, demonstrates that $Q$ is positive semidefinite. Moreover, $\mu_1 \le 2d_{\max}$. This same bound (4.10) is also found by considering the non-negative matrix $d_{\max} I - Q$, whose largest eigenvalue is $d_{\max}$ and smallest eigenvalue is $d_{\max} - \mu_1$. The Perron-Frobenius Theorem 38 states that the positive largest eigenvalue is at least the absolute value of any other eigenvalue, whence $d_{\max} \ge |d_{\max} - \mu_1|$. This inequality is essentially the same as Gerschgorin's and, thus, it implies (4.10).

69. The definition of $Q = \Delta - A$ and (3.1) show that $\operatorname{trace}(Q) = \operatorname{trace}(\Delta) = \sum_{j=1}^{N} d_j$. The basic law of the degree (2.3) and relation (8.7) combine to
$$\sum_{k=1}^{N}\mu_k = 2L \qquad (4.2)$$
Hence, the average value of a Laplacian eigenvalue equals the average degree, $E[\mu] = E[D]$. Corollary 1 shows that any partial sum with $1 \le j \le N$ of ordered eigenvalues satisfies
$$\sum_{k=1}^{j} d_{(k)} \le \sum_{k=1}^{j}\mu_k \qquad (4.3)$$
where $d_{(k)}$ denotes the $k$-th largest degree in the graph, i.e., $d_{(N)} \le d_{(N-1)} \le \cdots \le d_{(1)}$.
where g(n) denotes the n-th largest degree in the graph, i.e., g(Q ) g(Q 1) · · · g(1) . 70. Applying the general relation (8.20) to the Laplacian yields Q X
¡ ¢ 2n = trace T2
n=1
The square equals ³ ´ T2 = ( D)2 = 2 + D2 D + (D)W such that Q ¡ ¢ X ¡ ¢ trace T2 = g2n + trace D2 n=1
Using (3.5) leads to Q X n=1
2n =
Q X n=1
g2n + 2O
(4.4)
Stochastically¹, when considering the eigenvalue $\mu$ and the degree $D$ in a graph as random variables, (4.4) translates to
$$\operatorname{Var}[\mu] = \operatorname{Var}[D] + E[D]$$
where the variance is $\operatorname{Var}[X] = E[X^2] - (E[X])^2$ for any random variable $X$. Since $E[D] > 0$ (excluding graphs without links), the variability of the Laplacian eigenvalues is larger than that of the degree $D$ in the graph. Applying Corollary 1 yields, for $1 \le j \le N$,
$$\sum_{k=1}^{j} d_{(k)}^2 + \sum_{k=1}^{j} d_{(k)} \le \sum_{k=1}^{j}\mu_k^2 \qquad (4.5)$$
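where $d_{(k)}$ denotes the $k$-th largest degree in the graph.

71. The case for the third powers in (8.20) needs the computation of the trace of
$$Q^3 = (\Delta - A)^3 = \Delta^3 - \Delta^2 A - \Delta A \Delta + \Delta A^2 - A\Delta^2 + A\Delta A + A^2\Delta - A^3$$
Since $a_{jj} = 0$, all matrices to first power in $A$ have a vanishing trace. By computing the product of the matrices, we find that
$$\operatorname{trace}\left(\Delta A^2\right) = \operatorname{trace}(A\Delta A) = \operatorname{trace}\left(A^2\Delta\right) = \sum_{k=1}^{N} d_k^2$$
Hence,
$$\operatorname{trace}\left(Q^3\right) = \sum_{k=1}^{N} d_k^3 + 3\sum_{k=1}^{N} d_k^2 - \operatorname{trace}\left(A^3\right)$$
where
$$\operatorname{trace}\left(A^3\right) = \sum_{j=1}^{N}\sum_{k=1}^{N}\sum_{l=1}^{N} a_{jk} a_{kl} a_{lj} = \sum_{k=1}^{N}\lambda_k^3$$
and (3.3) shows that $\operatorname{trace}(A^3)$ equals six times the number of triangles in the graph, which we denote by $\blacktriangle_G$. Combining all yields
$$\sum_{k=1}^{N}\mu_k^3 = \sum_{k=1}^{N} d_k^3 + 3\sum_{k=1}^{N} d_k^2 - 6\blacktriangle_G \qquad (4.6)$$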
For the complete graph, we have that $\operatorname{trace}(A^3) = N(N-1)(N-2) = 6\blacktriangle_G$ and $\sum_{k=1}^{N} d_k^2 = N(N-1)^2$, such that
$$3\sum_{k=1}^{N} d_k^2 - 6\blacktriangle_G = 3N(N-1)^2 - N(N-1)(N-2) = N(N-1)(2N-1) > 0$$
while for a tree, where $\blacktriangle_G = 0$, the last two sums reduce to $3\sum_{k=1}^{N} d_k^2 > 0$. Thus, in both cases, the sum of the third powers of the Laplacian eigenvalues exceeds the corresponding sum of the degrees. Stochastically, the third centered moment, which quantifies the skewness of the distribution, follows from (4.6) and (4.4) as
$$E\left[(\mu - E[\mu])^3\right] = E\left[(D - E[D])^3\right] + 3\operatorname{Var}[D] - \frac{6\blacktriangle_G}{N}$$
The third centered moment of the Laplacian eigenvalue differs from that of the degree $D$ by an amount $3\left(\operatorname{Var}[D] - \frac{2\blacktriangle_G}{N}\right)$, which can be positive or negative.

¹ Each of the values $\mu_1, \mu_2, \ldots, \mu_N$ is interpreted as a realization (outcome) of the random variable $\mu$, and the mean of the $m$-th powers is computed as $E[\mu^m] = \frac{1}{N}\sum_{k=1}^{N}\mu_k^m$.
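Since $\sum_k \mu_k^m = \operatorname{trace}(Q^m)$, the moment relations (4.2), (4.4) and (4.6) can be checked with integer matrix products alone, without computing any eigenvalue. The minimal sketch below does so for a hypothetical triangle-with-pendant graph and for complete graphs $K_N$, where it also confirms $\sum_k \mu_k^3 - \sum_k d_k^3 = N(N-1)(2N-1)$.

```python
from itertools import combinations

def laplacian(n, edges):
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def triangles(n, edges):
    E = {frozenset(e) for e in edges}
    return sum(1 for a, b, c in combinations(range(n), 3)
               if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= E)

def moment_identities(n, edges):
    """Pairs (trace(Q^m), right-hand side) for (4.2), (4.4) and (4.6)."""
    Q = laplacian(n, edges)
    Q2 = matmul(Q, Q)
    Q3 = matmul(Q2, Q)
    d = [Q[i][i] for i in range(n)]
    L, T = len(edges), triangles(n, edges)
    return [
        (trace(Q), 2 * L),                                                     # (4.2)
        (trace(Q2), sum(x**2 for x in d) + 2 * L),                             # (4.4)
        (trace(Q3), sum(x**3 for x in d) + 3 * sum(x**2 for x in d) - 6 * T),  # (4.6)
    ]

def complete(n):
    return list(combinations(range(n), 2))

print(moment_identities(4, [(0, 1), (0, 2), (1, 2), (2, 3)]))
for n in range(3, 8):
    excess = moment_identities(n, complete(n))[2][0] - n * (n - 1) ** 3   # sum mu^3 - sum d^3
    print(n, excess, n * (n - 1) * (2 * n - 1))
```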
72. We can extend the previous computation to higher powers and compute, for any integer $m \ge 0$,
$$\operatorname{trace}\left(Q^m\right) = \operatorname{trace}\left((\Delta - A)^m\right)$$
by using the linearity of the trace and the commutativity of the trace of a matrix product, $\operatorname{trace}(FG) = \operatorname{trace}(GF)$. When expanding the product $(\Delta - A)^m$ in a sum of $2^m$ matrix products² and applying the commutativity of the trace, we find the usual binomial expansion
$$\operatorname{trace}\left((\Delta - A)^m\right) = \sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\operatorname{trace}\left(\Delta^k A^{m-k}\right)$$
Since $\Delta^k = \operatorname{diag}\left(d_1^k, d_2^k, \ldots, d_N^k\right)$, we have that
$$\operatorname{trace}\left(\Delta^k A^{m-k}\right) = \sum_{l=1}^{N}\sum_{q=1}^{N}\left(\Delta^k\right)_{lq}\left(A^{m-k}\right)_{ql} = \sum_{l=1}^{N} d_l^k\left(A^{m-k}\right)_{ll}$$
and
$$\operatorname{trace}\left(Q^m\right) = \sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\sum_{l=1}^{N} d_l^k\left(A^{m-k}\right)_{ll}$$
Taking into account that $A_{ll} = 0$ and $\left(A^2\right)_{ll} = d_l$, we can express, using (8.20), the $m$-th moment of the Laplacian eigenvalues, for $m \ge 3$, in terms of the degree sequence and the number of closed walks $\left(A^j\right)_{ll}$ of length $j$ starting and returning at node $l$ (art. 17), as
$$\sum_{k=1}^{N}\mu_k^m = \sum_{l=1}^{N} d_l^m + \binom{m}{2}\sum_{l=1}^{N} d_l^{m-1} + \sum_{k=1}^{m-3}\binom{m}{k}(-1)^{m-k}\sum_{l=1}^{N} d_l^k\left(A^{m-k}\right)_{ll} + (-1)^m W_m \qquad (4.7)$$
where we have introduced $W_k = \operatorname{trace}\left(A^k\right)$ as defined in art. 36.

² Each product can be associated with the binary expression of a number $0 \le j \le 2^m - 1$, in which a zero bit is replaced by the matrix $\Delta$ and a one bit by the matrix $A$.
73. Art. 158 relates the diagonal elements of a symmetric matrix to its eigenvalues and so provides another relation between the degree $d_j$ of node $j$ and the set of Laplacian eigenvalues $0 = \mu_N \le \mu_{N-1} \le \cdots \le \mu_1$. The matrix equation (8.35) becomes $Y\mu = d$, where the vector $\mu = (\mu_1, \mu_2, \ldots, \mu_N)^T$, $d = (d_1, d_2, \ldots, d_N)^T$, and where the non-negative matrix $Y = \left[y_1 \,|\, y_2 \,|\, \cdots \,|\, y_N\right]$ consists of column vectors $y_j = \left(x_{1j}^2, x_{2j}^2, \ldots, x_{Nj}^2\right)^T$, where $x_{kj}$ is the $k$-th component of the eigenvector $x_j$ of $Q$ belonging to $\mu_j$. Analogously to the adjacency matrix in art. 32, also for the Laplacian the matrix $Y$ is singular, $\det Y = 0$. This follows from (8.37) and the fact that $y_N = \frac{1}{N}u$, because the sum of the first $N-1$ columns in $Y$ is a multiple of the last column. Hence, besides the largest eigenvalue at 1, $Y$ (and $Y^T$) also has a zero eigenvalue. The obvious consequence is that $Y\mu = d$ cannot be inverted to $\mu = Y^{-1}d$. However, when deleting the last column and last row, the resulting matrix $\tilde{Y}$ (minor of $Y$) can be inverted and the degrees $d_1, d_2, \ldots, d_{N-1}$ can be determined. The degree $d_N$ of the $N$-th node then follows from the basic law (2.3).

74. If $G$ is regular, where all nodes have the same degree, $d_j = r$ for all $1 \le j \le N$, then the eigenvalues of the Laplacian $Q$ and the adjacency matrix $A$ are directly connected because $\det(Q - \mu I) = \det((r - \mu)I - A)$. Thus, for all $1 \le j \le N$,
$$\mu_j(Q) = r - \lambda_{N+1-j}(A) \qquad (4.8)$$
Since $\mu_N(Q) = 0$, we again find as in art. 41 that the largest eigenvalue of the adjacency matrix in a regular graph equals $\lambda_1(A) = r$. From (4.8), the difference, for all $2 \le j \le N$,
$$\mu_{j-1}(Q) - \mu_j(Q) = \lambda_{N+1-j}(A) - \lambda_{N+2-j}(A)$$
shows that the spectral gap (art. 55) in a regular graph equals $\lambda_1(A) - \lambda_2(A) = \mu_{N-1}(Q)$. This relation might suggest that the spectral gap in any graph is related to the second smallest eigenvalue $\mu_{N-1}$ of the Laplacian, whose properties are further explored in Section 4.2. However, Section 7.5.2 exhibits a graph with a large spectral gap and a small $\mu_{N-1}$.

75. A direct application of Lemma 8 to $A = \Delta - Q$ yields, for any eigenvalue index $1 \le k \le N$,
$$d_{\min} - \mu_{N+1-k}(Q) \le \lambda_k(A) \le d_{\max} - \mu_{N+1-k}(Q)$$
and
$$d_{(k)} - \lambda_1(A) \le \mu_k(Q) \le d_{(k)} - \lambda_N(A)$$
Equality is only reached when $d_{\min} = d_{\max} = r$ as in a regular graph (art. 74).

76. The Laplacian spectrum of the complement $G^c$ of $G$. From the adjacency matrix $A^c = J - I - A$ of the complement $G^c$ of a graph $G$ (art. 1), the Laplacian
of the complement $G^c$ is immediate as
$$Q^c = \Delta^c - A^c = (N-1)I - \Delta - J + I + A = NI - J - Q$$
Let $x_1, x_2, \ldots, x_N = u$ denote the eigenvectors of $Q$ belonging to the eigenvalues $\mu_1, \mu_2, \ldots, \mu_{N-1}$ and $\mu_N = 0$, respectively. The eigenvalues of $J$ are $N$ and $[0]^{N-1}$ as shown in (5.1). Since $Ju = Nu$ and $Jx_j = 0$ for $1 \le j \le N-1$ as demonstrated in art. 85, we observe that $Q^c u = 0$ and $Q^c x_j = (N - \mu_j)x_j$. Hence, the set of eigenvectors of $Q$ and of the complement $Q^c$ are the same, while the ordered eigenvalues, for $1 \le j \le N-1$, are
$$\mu_j(Q^c) = N - \mu_{N-j}(Q) \qquad (4.9)$$
Art. 66, and alternatively also art. 68, indicate that all eigenvalues of a Laplacian matrix are non-negative, hence $\mu_j(Q^c) \ge 0$ for all $1 \le j \le N$, such that (4.9) then implies that $N - \mu_{N-j}(Q) \ge 0$, or that all Laplacian eigenvalues must lie in the interval $[0, N]$. Hence, the upper bound for $\mu_1$ in art. 68 needs to be refined to
$$\mu_1 \le \min(N, 2d_{\max}) \qquad (4.10)$$
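The complement relation $Q^c x_j = (N - \mu_j)x_j$ can be observed on a graph whose Laplacian eigenpairs are known in closed form. The minimal sketch below assumes the cycle $C_6$, whose Laplacian has eigenvalues $2 - 2\cos(2\pi k/N)$ with eigenvectors $\left(\cos(2\pi k m/N)\right)_{m}$; it builds $Q^c = NI - J - Q$ and measures the residual of each eigen-equation.

```python
import math

def laplacian(n, edges):
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def complement_laplacian(Q):
    """Q^c = N I - J - Q."""
    n = len(Q)
    return [[(n if i == j else 0) - 1 - Q[i][j] for j in range(n)] for i in range(n)]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

N = 6
Q = laplacian(N, [(i, (i + 1) % N) for i in range(N)])   # cycle C_6
Qc = complement_laplacian(Q)

residuals = []
for k in range(1, N):
    theta = 2 * math.pi * k / N
    x = [math.cos(theta * m) for m in range(N)]   # eigenvector of Q, orthogonal to u
    mu = 2 - 2 * math.cos(theta)                   # its Laplacian eigenvalue
    Qcx = matvec(Qc, x)
    residuals.append(max(abs(a - (N - mu) * b) for a, b in zip(Qcx, x)))

print(max(residuals))          # rounding-error level
print(matvec(Qc, [1] * N))     # Q^c u = 0
```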
77. The Rayleigh inequalities in art. 152 motivate us to consider the quadratic form (4.1) further. The $l$-th component of $B^T x$ is $\left(B^T x\right)_l = x_i - x_j$, where the link $l = i \to j$ connects node $i$ and $j$, starting at node $i = l^+$ and ending at node $j = l^-$. This observation allows us to write
$$x^T Q x = \left\|B^T x\right\|_2^2 = \sum_{l \in \mathcal{L}}\left(x_{l^+} - x_{l^-}\right)^2 \qquad (4.11)$$
Since the vector $x$ can consist of any real numbers, we may consider $x$ as a real function $f(n)$ acting on a node $n$. Thus, with $x_{l^+} = f(l^+)$ and $x_{l^-} = f(l^-)$, we finally arrive at
$$(Qf, f) = \sum_{l \in \mathcal{L}}\left(f(l^+) - f(l^-)\right)^2$$
where $(g, f) = \sum_{x \in \mathcal{N}} f(x) g(x)$ denotes the scalar product (art. 241) of two real functions $f$ and $g$ belonging to $L^2(\mathcal{N})$, the space of all real functions on the set of nodes $\mathcal{N}$ for which the norm $\|f\|^2 = (f, f)$ exists.

78. Art. 67 shows that the eigenvector $x$ of $Q$ belonging to $\mu_{N-1}$ must satisfy $x^T u = 0$. By requiring this additional constraint and choosing the scaling of the
eigenvector such that $x^T x = 1$, Rayleigh's principle (art. 152) applied to the second smallest eigenvalue of the Laplacian results in
$$\mu_{N-1} = \min_{\|x\|_2^2 = 1 \text{ and } x^T u = 0} x^T Q x \qquad (4.12)$$
Applied to the complement $Q^c$ and with (4.9), we obtain
$$\mu_{N-1}(Q^c) = N - \mu_1(Q) = \min_{\|x\|_2^2 = 1 \text{ and } x^T u = 0} x^T Q^c x$$
Since $x^T Q^c x = x^T(NI - J - Q)x = N - x^T Q x$ as follows from art. 76, we obtain that
$$N - \mu_1(Q) = \min_{\|x\|_2^2 = 1 \text{ and } x^T u = 0}\left(N - x^T Q x\right) = N - \max_{\|x\|_2^2 = 1 \text{ and } x^T u = 0} x^T Q x$$
Hence, the largest eigenvalue of $Q$ obeys
$$\mu_1(Q) = \max_{\|x\|_2^2 = 1 \text{ and } x^T u = 0} x^T Q x = N - \mu_{N-1}(Q^c)$$
4.1.1 Eigenvalues and connectivity

79. Disconnectivity is a special case of the reducibility of a matrix (art. 167) and expresses that no communication is possible between two nodes in a different component or cluster. A component of a graph $G$ is a connected subgraph of $G$.

Theorem 10 The graph $G$ is connected if and only if $\mu_{N-1} > 0$.

Proof: The theorem is a consequence of the Perron-Frobenius Theorem 38 for a non-negative, irreducible matrix. Indeed, consider the non-negative matrix $\alpha I - Q$, where $\alpha \ge d_{\max}$. If $G$ is connected, then $\alpha I - Q$ is irreducible and the Perron-Frobenius Theorem 38 states that the largest eigenvalue $r$ of $\alpha I - Q$ is positive and simple and the corresponding eigenvector $x_r$ has positive components. Hence, $Qx_r = (\alpha - r)x_r$. Since eigenvectors of a symmetric matrix belonging to different eigenvalues are orthogonal (art. 151), while $u$ is itself an eigenvector (art. 66) and $u^T x_r > 0$, $x_r$ must be proportional to $u$, and thus $\mu_N = \alpha - r = 0$. Since there is only one such eigenvector $x_r$ and since the eigenvalue $r$ exceeds all others, all other eigenvalues of $Q$ must exceed zero. □

80. A graph $G$ has $k$ components (or clusters) if there exists a relabeling of the nodes such that the adjacency matrix has the structure
$$A = \begin{bmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A_k \end{bmatrix}$$
where the square submatrix $A_m$ is the adjacency matrix of the connected component $m$. The corresponding Laplacian is
$$Q = \begin{bmatrix} Q_1 & O & \cdots & O \\ O & Q_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & Q_k \end{bmatrix}$$
Using (8.79) indicates that
$$\det(Q - \mu I) = \prod_{m=1}^{k}\det(Q_m - \mu I)$$
Since each block matrix $Q_m$ is a Laplacian, whose row sum is zero and $\det Q_m = 0$, the characteristic polynomial $\det(Q - \mu I)$ has at least a $k$-fold zero at $\mu = 0$. If each block matrix $Q_m$ is irreducible, i.e., the $m$-th cluster is connected, Theorem 10 shows that $Q_m$ has only one zero eigenvalue. Hence, we have proved:

Theorem 11 The multiplicity of the smallest eigenvalue $\mu = 0$ of the Laplacian $Q$ is equal to the number of components in the graph $G$.

If $Q$ has only one zero eigenvalue with corresponding eigenvector $u$ (art. 66), then the graph is connected; it has only one component. Theorem 11 as well as Theorem 10 also imply that, if the second smallest eigenvalue $\mu_{N-1}$ of $Q$ is zero, the graph $G$ is disconnected.
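Theorem 11 can be illustrated without an eigenvalue solver: because $Q$ is symmetric and hence diagonalizable, the multiplicity of $\mu = 0$ equals $N - \operatorname{rank}(Q)$. The minimal sketch below computes that rank exactly with rational Gauss-Jordan elimination and compares it with a depth-first-search component count on a few hypothetical graphs.

```python
from fractions import Fraction

def laplacian(n, edges):
    Q = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def rank(M):
    """Rank by exact Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def components(n, edges):
    """Number of connected components by depth-first search."""
    nbrs = {v: set() for v in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    seen, count = set(), 0
    for s in range(n):
        if s in seen:
            continue
        count += 1
        seen.add(s)
        stack = [s]
        while stack:
            v = stack.pop()
            for w in nbrs[v] - seen:
                seen.add(w)
                stack.append(w)
    return count

def zero_multiplicity(n, edges):
    # Q is symmetric, hence diagonalizable: multiplicity of mu = 0 equals N - rank(Q).
    return n - rank(laplacian(n, edges))

examples = {
    "triangle + pendant": (4, [(0, 1), (0, 2), (1, 2), (2, 3)]),
    "triangle, link, isolated node": (6, [(0, 1), (0, 2), (1, 2), (3, 4)]),
    "empty graph on 3 nodes": (3, []),
}
for name, (n, edges) in examples.items():
    print(name, zero_multiplicity(n, edges), components(n, edges))
```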
4.1.2 The number of spanning trees and the Laplacian $Q$

81. Matrix Tree Theorem. The coefficients $\{c_k(Q)\}_{0 \le k \le N}$ of the characteristic polynomial of the Laplacian
$$c_Q(x) = \det(Q - xI) = \sum_{k=0}^{N} c_k(Q)\, x^k$$
can be expressed in terms of sums over minors (see art. 138). Apart from $c_N = (-1)^N$, we apply (8.4) for $0 < m < N$ to the Laplacian $Q = BB^T$,
$$(-1)^{N-m} c_{N-m}(Q) = \sum_{\text{all}}\operatorname{minor}_m\left(BB^T\right) = \sum_{\text{all}}\det\left(BB^T\right)_m$$
where $(Q)_m = \left(BB^T\right)_m$ denotes an $m \times m$ submatrix of $Q$ obtained by deleting the same set of $N-m$ rows and columns, and where the sum is over all $\binom{N}{m} = \binom{N}{N-m}$ ways in which $N-m$ rows can be deleted among the $N$ rows. Since $Q = BB^T$ and $q_{ij} = \sum_{k=1}^{L} b_{ik} b_{jk}$, deleting a row $i$ in $Q$ translates to deleting row $i$ in $B$. Thus, $(B)_m$ is an $m \times L$ submatrix of $B$ in which the same $N-m$ rows in $B$ are deleted.
We apply the Binet-Cauchy Theorem 45 to $\det\left(BB^T\right)_m$. Using (8.85) gives
$$\det\left(BB^T\right)_m = \sum_{k_1=1}^{L}\ \sum_{k_2=k_1+1}^{L}\cdots\sum_{k_m=k_{m-1}+1}^{L}\left(\det\begin{bmatrix} b_{1k_1} & \cdots & b_{1k_m} \\ \vdots & & \vdots \\ b_{mk_1} & \cdots & b_{mk_m}\end{bmatrix}\right)^2$$
which illustrates that $\det\left(BB^T\right)_m$ is non-negative, so that $(-1)^N c_Q(-x)$ is a polynomial with non-negative coefficients. Descartes' rule (art. 212) shows that $c_Q(-x)$ has no real positive zero, agreeing with the non-negativity of the Laplacian eigenvalues (art. 66). Poincaré's Theorem 2 tells us that the square of the above determinant in the multiple sum is either zero or one. It remains to investigate for which set $(k_1, k_2, \ldots, k_m)$ the determinant is non-zero, hence, of rank $m$. Art. 6 shows that the determinant is non-zero only if the subgraph formed by the $m$ links (columns in the matrix of the above determinant) is a spanning tree. To conclude, $\det\left(BB^T\right)_m$ equals the total number of trees with $m$ links that can be formed in the graph on $m+1$ given nodes. The coefficient $(-1)^{N-m}c_{N-m}(Q)$ then counts all these spanning trees with $m$ links over all possible ways of deleting $N-m$ nodes in the graph. In summary, we have demonstrated the famous Matrix Tree Theorem:

Theorem 12 (Matrix Tree Theorem) In a graph $G$ with $N$ nodes, the coefficient $(-1)^{N-m}c_{N-m}(Q)$ of the characteristic polynomial of the Laplacian $Q$ equals the number of all spanning trees with $m$ links in all subgraphs of $G$ that are obtained after deleting $N-m$ nodes in all possible ways.

Clearly, $c_0(Q) = \det Q = 0$ because there does not exist a tree with $N$ links that spans the $N$ nodes in a graph. The other extreme is, by convention, $(-1)^N c_N(Q) = 1$. Further, $(-1)^{N-1}c_{N-1}(Q) = 2L$ equals the number of spanning trees in $G$ each consisting of one link, which equals twice the number of links in $G$. Indeed, $\det\left(BB^T\right)_1 = \sum_{k_1=1}^{L}|b_{1k_1}|$ is the number of neighbors of node 1; taking the sum over all $N$ possible choices of the single retained row results in $(-1)^{N-1}c_{N-1}(Q) = \sum_{i=1}^{N}\sum_{k_i=1}^{L}|b_{ik_i}|$, which is the sum of the absolute values of all elements in $B$. This result also follows from the general relation (8.7) for the second highest degree coefficient in any polynomial and art. 69. When $m = N-1$, art. 6 shows that $\det\left(BB^T\right)_{N-1}$ equals the number of all spanning trees with $N-1$ links in the graph $G$. Since there are precisely $N$ ways to remove one node (one row in $B$), the coefficient $c_1$ counts $N$ times all trees spanning all $N$ nodes in $G$.

The characteristic polynomial of the Laplacian of the example graph in Fig. 2.1 is
$$c_Q(x) = x^6 - 18x^5 + 125x^4 - 416x^3 + 659x^2 - 396x = x\left(x - \frac{7 - \sqrt{13}}{2}\right)\left(x - \frac{7 - \sqrt{5}}{2}\right)(x - 4)\left(x - \frac{7 + \sqrt{5}}{2}\right)\left(x - \frac{7 + \sqrt{13}}{2}\right)$$
The example graph has 18 spanning trees with one link, 125 consisting of two links, $\ldots$, and 66 spanning trees with five links ($396 = 6 \times 66$).

There is another Matrix Tree Theorem variant for the coefficients of the characteristic polynomial of $Q$ due to Kelmans and Chelnokov (1974) based on the notion of a forest. A forest is a collection of trees. A $k$-forest, denoted by $F_k$, is a forest consisting of $k$ components, and a 1-forest is a tree. A component $j$ is a set $\mathcal{N}_j$ of nodes of $G$, and two different components possess different nodes such that $\mathcal{N}_j \cap \mathcal{N}_l = \emptyset$ for each component $j$ and $l$ of a $k$-forest. A $k$-spanning forest of $G$ is a $k$-forest whose union of components consists of all nodes of $G$, thus $\cup_{l=1}^{k}\mathcal{N}_l = \mathcal{N}$, and a $k$-spanning forest of $G$ has $N - k$ links. Two $k$-spanning forests are different if they have different sets of links. Finally, we denote $\gamma(F_k) = \prod_{l=1}^{k} n_l$, where $n_l = |\mathcal{N}_l|$.

Theorem 13 (Matrix Tree Theorem according to Kelmans) In a graph $G$ with $N$ nodes, the coefficient $(-1)^m c_m(Q)$ of the characteristic polynomial of the Laplacian $Q$ equals, for $m = 0$, $c_0 = 0$ and, for $1 \le m \le N$,
$$(-1)^m c_m(Q) = \sum_{\text{all } F_m}\gamma(F_m)$$
where the sum is over all possible $m$-spanning forests of the graph $G$ with precisely $m$ components.

Kelmans' Theorem 13 is used in art. 87. Kelmans and Chelnokov (1974) also give³
$$(-1)^{N-2}c_{N-2}(Q) = 2L^2 - L - \frac{1}{2}\sum_{k=1}^{N} d_k^2$$
$$(-1)^{N-3}c_{N-3}(Q) = \frac{4}{3}L^3 - 2L^2 - (L-1)\sum_{k=1}^{N} d_k^2 + \frac{1}{3}\sum_{k=1}^{N} d_k^3 - 2\blacktriangle_G$$
where $\blacktriangle_G$ is the number of triangles in $G$. Invoking the Newton identities in art. 198, we may verify that these expressions for the coefficients $c_k(Q)$ are consistent with (4.4) and (4.6).
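The coefficient statements above can be checked on small graphs once the characteristic polynomial of $Q$ is available exactly. The minimal sketch below computes it with the Faddeev-LeVerrier recursion in rational arithmetic (a technique swapped in here for illustration; the text itself works with minors) and then tests $c_0 = 0$, $c_N = (-1)^N$, $(-1)^{N-1}c_{N-1} = 2L$, the two Kelmans-Chelnokov expressions, and $-c_1 = N\xi(G)$. The example graphs and their spanning-tree counts $\xi$ (supplied by hand) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def laplacian(n, edges):
    Q = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly(M):
    """Coefficients c[0..n] of det(M - x I) = sum_k c[k] x^k (Faddeev-LeVerrier, exact)."""
    n = len(M)
    a = [Fraction(0)] * (n + 1)            # det(x I - M) = sum_k a[k] x^k
    a[n] = Fraction(1)
    B = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        MB = matmul(M, B)
        B = [[MB[i][j] + (a[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        MB = matmul(M, B)
        a[n - k] = -sum(MB[i][i] for i in range(n)) / k
    return [(-1) ** n * coef for coef in a]   # det(M - xI) = (-1)^n det(xI - M)

def triangles(n, edges):
    E = {frozenset(e) for e in edges}
    return sum(1 for t in combinations(range(n), 3)
               if all(frozenset(p) in E for p in combinations(t, 2)))

def coefficient_checks(n, edges, xi):
    """xi: number of spanning trees of the example, supplied by hand."""
    c = char_poly(laplacian(n, edges))
    L, T = len(edges), triangles(n, edges)
    deg = [sum(1 for e in edges if v in e) for v in range(n)]
    S2, S3 = sum(x**2 for x in deg), sum(x**3 for x in deg)
    return [
        c[0] == 0,                                   # det Q = 0
        c[n] == (-1) ** n,
        (-1) ** (n - 1) * c[n - 1] == 2 * L,
        (-1) ** (n - 2) * c[n - 2] == 2 * L**2 - L - Fraction(S2, 2),
        (-1) ** (n - 3) * c[n - 3]
        == Fraction(4, 3) * L**3 - 2 * L**2 - (L - 1) * S2 + Fraction(S3, 3) - 2 * T,
        -c[1] == n * xi,                             # -c_1 = N xi(G)
    ]

K4 = list(combinations(range(4), 2))
print(char_poly(laplacian(4, K4)))   # coefficients of x(x - 4)^3 = x^4 - 12x^3 + 48x^2 - 64x
print(coefficient_checks(4, K4, 16))
```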
4.1.3 The complexity

82. The complexity $\xi(G)$ of the graph $G$ equals the number of all possible spanning trees in the graph. Let $J$ denote the all-one matrix with $(J)_{ij} = 1$ and $J = u.u^T$; then
$$\operatorname{adj} Q = \xi(G)\, J \qquad (4.13)$$

³ The first result is presented without proof, but a reference to the Russian PhD thesis of Kelmans is given, while the second result is obtained by using special types of graphs.
where $X^{-1} = \frac{\operatorname{adj} X}{\det X}$. Indeed, if $\operatorname{rank}(Q) < N-1$, then every cofactor of $Q$ is zero, thus $\operatorname{adj} Q = O$ and (4.13) shows that $\xi(G) = 0$, implying that the graph is disconnected. If $\operatorname{rank}(Q) = N-1$, then $Q \operatorname{adj} Q = I \det Q = O$, which means that each column vector of $\operatorname{adj} Q$ is orthogonal to the $(N-1)$-dimensional space spanned by the row vectors of $Q$. Thus, each column vector of $\operatorname{adj} Q$ belongs to the null-space or kernel of $Q$, which is one-dimensional and spanned by $u$, since $Qu = 0$. Hence, each column vector of $\operatorname{adj} Q$ is a multiple of the vector $u$. Since $Q$ is symmetric, so is $\operatorname{adj} Q$, and all the multipliers must be equal such that $\operatorname{adj} Q = \xi J$. Since $(\operatorname{adj} Q)_{jj} = \det\left(BB^T\right)_{N-1}$, the Matrix Tree Theorem 12 in art. 81 shows that $\xi(G)$ equals the total number of trees that span $N$ nodes.

We apply the relation (4.13) to the complete graph $K_N$, where $Q = NI - J$. Equation (4.13) demonstrates that all elements of $\operatorname{adj} Q$ are equal to $\xi(G)$. Hence, it suffices to compute one suitable element of $\operatorname{adj} Q$, for example, $(\operatorname{adj} Q)_{11}$, which is equal to the determinant of the $(N-1) \times (N-1)$ principal submatrix of $Q$ obtained by deleting the first row and column in $Q$,
$$(\operatorname{adj} Q)_{11} = \det\begin{bmatrix} N-1 & -1 & \cdots & -1 \\ -1 & N-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & N-1 \end{bmatrix}$$
Adding all rows to the first and subsequently adding this new first row to all other rows gives
$$(\operatorname{adj} Q)_{11} = \det\begin{bmatrix} 1 & 1 & \cdots & 1 \\ -1 & N-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & N-1 \end{bmatrix} = \det\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & N & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & N \end{bmatrix} = N^{N-2}$$
Hence, the total number of spanning trees in the complete graph $K_N$, which is also the total number of possible spanning trees in any graph with $N$ nodes, equals $N^{N-2}$. This is a famous theorem of Cayley of which many proofs exist, see, e.g., Lovász (2003) and van Lint and Wilson (1996, Chapter 2).

83. Equation (4.13) shows that all $N$ minors $M_{N-1}$ of $Q$ are equal to $\xi(G)$. Application of the general relation (8.4) for the coefficients of the characteristic polynomial then gives $-c_1 = N\xi(G)$, as earlier established in art. 81. Using (8.8) and the fact that $\mu_N = 0$ (see art. 66) yields $-c_1 = \prod_{j=1}^{N-1}\mu_j$. By combining both, the total number of spanning trees $\xi(G)$ in a connected graph is expressed in terms of the eigenvalues of the Laplacian $Q$ as
$$\xi(G) = \frac{1}{N}\prod_{j=1}^{N-1}\mu_j \qquad (4.14)$$
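Cayley's formula and relation (4.13) lend themselves to an exact cofactor computation. The minimal sketch below evaluates the $(1,1)$ cofactor of the Laplacian of $K_N$ for $2 \le N \le 7$ and, for a hypothetical triangle-with-pendant graph (which has $\xi(G) = 3$), confirms that every cofactor of $Q$ takes the same value, as $\operatorname{adj} Q = \xi(G) J$ requires.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return result

def laplacian(n, edges):
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def cofactor(M, i, j):
    """(-1)^(i+j) times the minor of M with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)

# Cayley: the (1,1) cofactor of the Laplacian of K_N equals N^(N-2).
cayley = {n: cofactor(laplacian(n, list(combinations(range(n), 2))), 0, 0) for n in range(2, 8)}
print(cayley)

# (4.13): all cofactors of Q coincide; this hypothetical graph has xi(G) = 3.
Q = laplacian(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print({cofactor(Q, i, j) for i in range(4) for j in range(4)})
```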
84. The complexity of $G$ is also given by
$$\xi(G) = \frac{\det(J + Q)}{N^2} \qquad (4.15)$$
Indeed, observe that $JQ = (JB)B^T = O$ since $JB = O$, as follows from $J = u.u^T$ and from (2.1) in art. 1. Hence,
$$(NI - J)(J + Q) = NJ + NQ - J^2 - JQ = NQ$$
and
$$\operatorname{adj}\left((NI - J)(J + Q)\right) = \operatorname{adj}(J + Q)\operatorname{adj}(NI - J) = \operatorname{adj}(NQ)$$
Since $Q_{K_N} = NI - J$ and, as shown in art. 82, $\operatorname{adj}(NI - J) = N^{N-2}J$, and since $\operatorname{adj}(NQ) = N^{N-1}\operatorname{adj} Q = N^{N-1}\xi(G)J$, where we have used (4.13), we obtain
$$\operatorname{adj}(J + Q)\, J = N\xi(G)\, J$$
Left-multiplication with $J + Q$, taking into account that $QJ = JQ = O$ and $J^2 = NJ$, finally gives
$$(J + Q)\operatorname{adj}(J + Q)\, J = \det(J + Q)\, J = N^2\xi(G)\, J$$
which proves (4.15).

85. Since $Qu = 0$, we also have that $QJ = O$ and, after taking the transpose, $J^T Q^T = JQ = O$. Hence, the Laplacian $Q = \Delta - A$ commutes with the all-one matrix $J$, $QJ = JQ$. Recall from art. 41 that the adjacency matrix $A$ and the all-one matrix $J$ only commute if the graph is regular. Since commuting matrices have a common, not necessarily complete, set of eigenvectors (Lemma 11), $Q$ and $J$ have a common basis of eigenvectors. The all-one vector $u$ is also an eigenvector of $J$ with eigenvalue $\lambda(J) = N$. All eigenvalues (5.1) of the rank-1 symmetric matrix $J = u.u^T$ are explicitly known. If $X$ is the matrix containing as columns the eigenvectors $z_1 = \frac{u}{\sqrt{N}}, z_2, \ldots, z_N$ of $J$ and $X^T X = I$, then $\operatorname{diag}(\mu_k(Q)) = X^T Q X$. The matrix $Y = I - \frac{1}{N}J$ projects any vector $x$ onto the space orthogonal to the vector $u$. Hence, a set of eigenvectors of $J$ consists of $N-1$ columns of $Y$ and the vector $u$. However, there are infinitely many sets of basis vectors that are also eigenvectors of $J$, but not necessarily of $Q$. Hence, the difficulty lies in finding $X_Q$ among all those of $X_J$.

86. Vice versa, if $x_k$ is an eigenvector of $Q$ belonging to $\mu_k > 0$, then it is also an eigenvector of $J$ because $Jx_k = 0$ for any $x_k$ orthogonal to $u$. This means that the eigenvalues of $J + Q$ consist of the eigenvalue $N$ with eigenvector $u$, the set $0 < \mu_j \le \mu_{j-1} \le \cdots \le \mu_1$, where $j \le N-1$, and $N-1-j$ zero eigenvalues. If $j = N-1$, Theorem 11 shows that the graph is connected, else the graph is disconnected.
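Formula (4.15) avoids choosing a cofactor: it only needs one full determinant. The minimal sketch below compares $\det(J + Q)/N^2$ with the $(1,1)$ cofactor of $Q$ from (4.13) for a few hypothetical graphs, including a disconnected one for which both give zero.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return result

def laplacian(n, edges):
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] -= 1
        Q[j][i] -= 1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

def complexity(n, edges):
    """xi(G) = det(J + Q) / N^2, equation (4.15)."""
    Q = laplacian(n, edges)
    return det([[Q[i][j] + 1 for j in range(n)] for i in range(n)]) / n**2

def complexity_by_cofactor(n, edges):
    """xi(G) as the (1,1) cofactor of Q, equation (4.13)."""
    Q = laplacian(n, edges)
    return det([row[1:] for row in Q[1:]])

examples = {
    "cycle C_5": (5, [(i, (i + 1) % 5) for i in range(5)]),
    "K_4": (4, list(combinations(range(4), 2))),
    "triangle + pendant": (4, [(0, 1), (0, 2), (1, 2), (2, 3)]),
    "two disjoint links": (4, [(0, 1), (2, 3)]),
}
for name, (n, edges) in examples.items():
    print(name, complexity(n, edges), complexity_by_cofactor(n, edges))
```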
A connected graph satisfies $\det(J + Q) = N\prod_{j=1}^{N-1}\mu_j$ and the complexity via (4.15) leads again to (4.14). If $G_r$ is a regular graph where all nodes have degree $r$, then art. 74 shows that
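$\mu_j = r - \lambda_{N+1-j}$. Substituted in (4.14), this yields
$$\xi(G_r) = \frac{1}{N}\prod_{j=1}^{N-1}\left(r - \lambda_{N+1-j}\right) = \frac{1}{N}\prod_{m=2}^{N}\left(r - \lambda_m\right)$$
The characteristic polynomial $c_{A_r}(x)$ of the adjacency matrix of $G_r$ equals
$$c_{A_r}(x) = (x - r)\prod_{m=2}^{N}(x - \lambda_m)$$
from which we deduce that
$$\left.\frac{dc_{A_r}(x)}{dx}\right|_{x=r} = \prod_{m=2}^{N}(r - \lambda_m) = N\xi(G_r)$$

87. Since $c_0 = \det Q = 0$, the characteristic polynomial of the Laplacian is
$$c_Q(x) = x\sum_{k=0}^{N-1} c_{k+1}(Q)\, x^k$$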
Applying the Newton equations (art. 198) to $\frac{c_Q(x)}{x}$ gives
$$\sum_{k=1}^{N-1}\frac{1}{\mu_k} = -\frac{c_2(Q)}{c_1(Q)}$$
Since all zeros of $\sum_{k=0}^{N-1}c_{k+1}(Q)x^k$ for a connected graph are positive and $-\frac{c_{N-1}}{c_N} = 2L$ (art. 81), art. 198 provides the bound
$$-\frac{c_2(Q)}{c_1(Q)} \ge \frac{(N-1)^2}{2L}$$
Art. 83 shows that $-c_1(Q) = N\xi(G)$, while the Matrix Tree Theorem 12 in art. 81 indicates that $c_2(Q)$ equals the number of all spanning trees with $N-2$ links in all subgraphs of $G$ that are obtained after deleting any pair of two nodes in $G$. For a tree $G = T$, we have that $\xi(G) = 1$ and $-c_1(Q) = N$, while Kelmans' Theorem 13 states that
$$c_2(Q) = \sum_{\text{all } F_2}\gamma(F_2)$$
where the sum is over all possible 2-spanning forests of the graph $G$ with precisely two components. A 2-spanning forest is constructed from a spanning tree of $G$ in which one link is deleted such that two disjoint trees $T_1$ and $T_2$ are obtained. Now, $\gamma(F_2) = n_1 n_2$ is also equal to the number of ways of choosing a node $v_1$ in tree $T_1$ (component 1) and a node $v_2$ in $T_2$ (component 2). Since $G$ is a tree, the number of pairs $(T_1, v_1)$ and $(T_2, v_2)$ equals the distance $h(v_1, v_2)$ in hops between node $v_1$
80
Eigenvalues of the Laplacian T
and node y2 , since (W1 > y1 ) and (W2 > y2 ) can only be obtained by deleting one of the links in J on the path from y1 to y2 . Thus, X X Q (Q 1) H [KW ] f2 (T) = k (y1 > y2 ) = 2 y1 5N y2 6=y1 5N
where KW is the hopcount in the tree W . Hence, the average hopcount in any tree satisfies Q 1 2 X 1 (4.16) H [KW ] = Q 1 n n=1
Mohar (1991) has attributed formula (4.16) to Brendan McKay, who provided the above derivation. Section 7.8.3 demonstrates, via inequality (7.26), that the righthand side of (4.16) is a lower bound for the average hopcount in any graph.
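The identities (4.14) to (4.16) lend themselves to a quick numerical check. The sketch below assumes NumPy is available; the helper names and the two small test graphs (a 5-node tree and the 4-cycle) are hypothetical choices for illustration only. It computes the complexity $\xi(G)$ as a cofactor of $Q$ (Matrix Tree Theorem), compares it with $\det(J+Q)/N^2$ from (4.15), and compares the average hopcount of the tree, obtained by breadth-first search, with the spectral expression (4.16).

```python
import numpy as np
from collections import deque


def laplacian(n, edges):
    """Laplacian Q = Delta - A of an undirected simple graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def complexity(Q):
    """Number of spanning trees xi(G): any cofactor of Q (Matrix Tree Theorem)."""
    return np.linalg.det(Q[1:, 1:])


def hop_distances(n, edges):
    """All-pairs hop distances obtained by breadth-first search from every node."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dist = np.full((n, n), -1, dtype=int)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if dist[s, w] < 0:
                    dist[s, w] = dist[s, v] + 1
                    queue.append(w)
    return dist


# Hypothetical example tree: node 0 joined to 1, 2, 3, and node 3 joined to 4.
n = 5
tree = [(0, 1), (0, 2), (0, 3), (3, 4)]
Q = laplacian(n, tree)
J = np.ones((n, n))

xi = complexity(Q)                                  # equals 1 for a tree
xi_via_415 = np.linalg.det(J + Q) / n**2            # (4.15): det(J + Q) = N^2 xi(G)

mu = np.sort(np.linalg.eigvalsh(Q))[::-1]           # mu_1 >= ... >= mu_N = 0
EH_bfs = hop_distances(n, tree).sum() / (n * (n - 1))   # mean over ordered pairs u != v
EH_spec = 2.0 / (n - 1) * np.sum(1.0 / mu[:-1])          # right-hand side of (4.16)
print(xi, xi_via_415, EH_bfs, EH_spec)
```

For a graph with cycles, (4.16) no longer holds with equality, but `complexity` and (4.15) still apply; for the 4-cycle both give $\xi = 4$.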
4.2 Second smallest eigenvalue of the Laplacian Q

The second smallest eigenvalue $\mu_{N-1}$ of the Laplacian has many interesting properties and was coined the algebraic connectivity of a graph by Fiedler (1973). Mainly bounds are presented in this section, whereas art. 97 provides the major motivation to focus in depth on the algebraic connectivity $\mu_{N-1}$.
4.2.1 Upper bounds for $\mu_{N-1}$

88. The eigenvector belonging to the smallest, zero eigenvalue $\mu_N$ is $u$ (art. 66). In the terminology of art. 77 and art. 241, any constant function $f(x) = c$ is an eigenfunction of $\mu_N$. Rayleigh's theorem (art. 152) states that, for any function $f$ orthogonal to a constant function $c$, we have $\mu_{N-1}(f, f) \le (Qf, f)$, and the minimizer, for which equality holds, is the eigenfunction belonging to the second smallest eigenvalue $\mu_{N-1}$. With art. 77, we obtain
$$\mu_{N-1} \le \frac{\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)} \tag{4.17}$$
for any $f$ that satisfies $(f, c) = c\sum_{n\in\mathcal{N}}f(n) = 0$. The latter condition is always fulfilled if we choose $f(x) = g(x) - \frac{1}{N}\sum_{n\in\mathcal{N}}g(n)$, where the last term can be interpreted as an average of $g$ over all nodes of the graph. In addition, for such a choice, $f(l^+) - f(l^-) = g(l^+) - g(l^-)$, such that
$$\mu_{N-1} \le \frac{\sum_{l\in\mathcal{L}}\left(g(l^+) - g(l^-)\right)^2}{\sum_{n\in\mathcal{N}}g^2(n) - \frac{1}{N}\left(\sum_{x\in\mathcal{N}}g(x)\right)^2}$$
for any non-constant function $g$.
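The upper bound (4.17) can be probed numerically: any centred vector gives a ratio at least $\mu_{N-1}$, and the Fiedler vector attains it. A minimal sketch, assuming NumPy; the 6-cycle with a chord is a hypothetical test graph and `upper_bound_417` a hypothetical helper name.

```python
import numpy as np


def laplacian(n, edges):
    """Laplacian Q = Delta - A of an undirected simple graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def upper_bound_417(edges, g):
    """Right-hand side of (4.17) for f = g - mean(g), which is orthogonal to u."""
    f = g - g.mean()
    numerator = sum((f[i] - f[j]) ** 2 for i, j in edges)
    return numerator / np.dot(f, f)


# Hypothetical test graph: a 6-cycle with the chord (0, 3).
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
Q = laplacian(n, edges)
mu, X = np.linalg.eigh(Q)        # eigenvalues in ascending order: mu[1] is mu_{N-1}
alg_conn = mu[1]

rng = np.random.default_rng(1)
ratios = [upper_bound_417(edges, rng.normal(size=n)) for _ in range(1000)]
print(alg_conn, min(ratios))                  # every ratio is at least mu_{N-1}
print(upper_bound_417(edges, X[:, 1]))        # the Fiedler vector attains equality
```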
For example, for two non-adjacent nodes $u$ and $v$, choose the vector or eigenfunction $f$ as $f(u) = 1$, $f(v) = -1$ and $f(n) = 0$ for any node $n \ne v \ne u$. This vector is orthogonal to the constant, $(f, c) = 0$. Inequality (4.17) then gives
$$\mu_{N-1} \le \frac{d_u + d_v}{2}$$
A sharper bound using the same method is obtained in (4.20).

89. There is an alternative representation for $(f, f)$, or for $x^Tx = \|x\|_2^2$, due to Fiedler. Since
$$\sum_{i=1}^N\sum_{j=1}^N\left(x_i - x_j\right)^2 = \sum_{i=1}^N\sum_{j=1}^N x_i^2 - 2\sum_{i=1}^N x_i\sum_{j=1}^N x_j + \sum_{i=1}^N\sum_{j=1}^N x_j^2 = 2Nx^Tx - 2\left(u^Tx\right)^2$$
and since any eigenvector $x$ that does not belong to $\mu_N = 0$ is orthogonal to $u$, we find that
$$x^Tx = \frac{1}{2N}\sum_{i=1}^N\sum_{j=1}^N\left(x_i - x_j\right)^2$$
Thus, if $f$ is the eigenfunction of $Q$ belonging to $\mu_{N-1}$, then
$$\mu_{N-1} = \frac{2N\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2}{\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}\left(f(u) - f(v)\right)^2} \tag{4.18}$$
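Fiedler's representation (4.18) can be checked directly: the ratio equals $\mu_{N-1}$ for the Fiedler vector, does not change when a constant is added, and for the indicator $1_{\{x=w\}}$ it equals $N d_w/(N-1)$, which is the step leading to (4.20). A minimal sketch, assuming NumPy; the "house" graph and the helper `fiedler_ratio` are hypothetical illustrations.

```python
import numpy as np


def laplacian(n, edges):
    """Laplacian Q = Delta - A of an undirected simple graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def fiedler_ratio(edges, f):
    """2N sum_l (f(l+) - f(l-))^2 / sum_u sum_v (f(u) - f(v))^2, as in (4.18) and (4.19)."""
    n = len(f)
    numerator = 2 * n * sum((f[i] - f[j]) ** 2 for i, j in edges)
    denominator = np.sum((f[:, None] - f[None, :]) ** 2)   # double sum over all (u, v)
    return numerator / denominator


# Hypothetical example: a "house" graph, i.e. a 4-cycle 0-1-2-3 with roof node 4.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (3, 4)]
Q = laplacian(n, edges)
mu, X = np.linalg.eigh(Q)            # ascending order; X[:, 1] is a Fiedler vector
f = X[:, 1]
print(mu[1], fiedler_ratio(edges, f), fiedler_ratio(edges, f + 3.0))

w = 4                                # indicator function 1_{x = w}
ind = np.zeros(n)
ind[w] = 1.0
print(fiedler_ratio(edges, ind), n * Q[w, w] / (n - 1))
```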
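The numerator and denominator are invariant to the addition of a constant. If $f$ is any non-constant function ($f \ne c$), not necessarily the eigenfunction of $Q$ belonging to $\mu_{N-1}$, Rayleigh's principle in art. 152 states that
$$\mu_{N-1} \le \frac{2N\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2}{\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}\left(f(u) - f(v)\right)^2} \tag{4.19}$$
The advantage of this inequality is that explicit orthogonality of $f$ to the constant function $c$, $(c, f) = 0$, is no longer required, since it is implicitly incorporated into the denominator. For example, choosing the function $f$ as $f(x) = 1_{\{x=w\}}$ leads, with $(f(u) - f(v))^2 = 1_{\{u=w\}} + 1_{\{v=w\}}$ provided $u \ne v$, and
$$\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}\left(f(u) - f(v)\right)^2 = \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}\setminus\{u\}}1_{\{u=w\}} + \sum_{v\in\mathcal{N}}\sum_{u\in\mathcal{N}\setminus\{v\}}1_{\{v=w\}} = 2\sum_{u\in\mathcal{N}}1_{\{u=w\}}\sum_{v\in\mathcal{N}\setminus\{u\}}1 = 2(N-1)$$
to
$$\mu_{N-1} \le \frac{N\,d_w}{N-1}$$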
Since the inequality holds for any node $w$, the sharpest bound is reached when $d_w = d_{\min}$, and we find an inequality for the second smallest eigenvalue of the Laplacian
$$\mu_{N-1} \le \frac{N}{N-1}\,d_{\min} \tag{4.20}$$
Since equality is attained for the complete graph $K_N$, as shown in Section 5.1, the bound (4.20) is generally the best possible. Recall that this inequality also follows from (8.62) in Fiedler's Theorem 41 for symmetric, positive semidefinite matrices. The bound (4.20) is also derived from the Alon-Milman inequality (4.33), as shown in art. 100.

90. We apply (4.20) to the complement $G^c$ of a graph $G$,
$$\mu_{N-1}(Q^c) \le \frac{N}{N-1}\,d_{\min}(G^c) = \frac{N}{N-1}\left(N - 1 - d_{\max}(G)\right) = N - \frac{N\,d_{\max}(G)}{N-1}$$
Using (4.9) yields a lower bound for the largest eigenvalue of the Laplacian
$$\frac{N}{N-1}\,d_{\max} \le \mu_1 \le \min\left(N, 2d_{\max}\right) \tag{4.21}$$
where the upper bound follows from (4.10). Grone and Merris (1994) succeeded in improving Fiedler's lower bound (4.21). Since their proof is not illuminating, however, we merely quote the improved lower bound, valid for any graph with at least one link,
$$\mu_1 \ge d_{\max} + 1 \tag{4.22}$$
which they claim is a strict inequality when $d_{\max} < N - 1$. Applying (4.22) to the complement $G^c$ of a graph $G$ that is not complete (so that $G^c$ has at least one link) then shows that $\mu_1(Q^c) \ge d_{\max}(G^c) + 1 = N - d_{\min}(G)$ and, with (4.9), that
$$\mu_{N-1}(Q) \le d_{\min}(G) \tag{4.23}$$
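The bounds (4.20) to (4.23) can be spot-checked on random graphs. The sketch below is illustrative only, assuming NumPy; the Erdős-Rényi generator and the helper names are hypothetical. Note the two side conditions encoded in the checks: (4.22) requires at least one link, and (4.23) requires a non-complete graph (for $K_N$, $\mu_{N-1} = N > d_{\min} = N - 1$).

```python
import numpy as np


def random_laplacian(n, p, rng):
    """Laplacian of an Erdos-Renyi random graph with link probability p (test input)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(float)
    return np.diag(A.sum(axis=1)) - A


def satisfies_bounds(Q, tol=1e-9):
    """Check (4.20) to (4.23) for one Laplacian Q."""
    n = Q.shape[0]
    d = np.diag(Q)                                   # nodal degrees
    mu = np.sort(np.linalg.eigvalsh(Q))[::-1]        # mu_1 >= ... >= mu_N = 0
    mu1, mu_fiedler = mu[0], mu[-2]
    checks = [
        mu_fiedler <= n * d.min() / (n - 1) + tol,                          # (4.20)
        n * d.max() / (n - 1) - tol <= mu1 <= min(n, 2 * d.max()) + tol,    # (4.21)
    ]
    if d.max() > 0:                                  # (4.22) needs at least one link
        checks.append(mu1 >= d.max() + 1 - tol)
    if d.min() < n - 1:                              # (4.23) needs G to be non-complete
        checks.append(mu_fiedler <= d.min() + tol)
    return all(checks)


rng = np.random.default_rng(7)
print(all(satisfies_bounds(random_laplacian(12, p, rng))
          for p in np.linspace(0.05, 0.95, 19) for _ in range(20)))
```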
4.2.2 Lower bounds for $\mu_{N-1}$

91. Lower bounds for any Laplacian eigenvalue. Recently, Brouwer and Haemers (2008) have impressively extended the type of lower bound (4.22) of Grone and Merris.

Theorem 14 (Brouwer and Haemers) For any graph but $K_m + (N-m)K_1$, the disjoint union of the complete graph $K_m$ and $N - m$ isolated nodes, the $j$-th largest Laplacian eigenvalue is lower bounded, for $1 \le j \le N$, by
$$\mu_j \ge d_{(j)} - j + 2 \tag{4.24}$$
where $d_{(j)}$ is the $j$-th largest nodal degree.

Proof: The proof of Brouwer and Haemers (2008) cleverly combines the generalized interlacing Theorem 43 applied to a specific quotient matrix, defined in art. 15. The proof is rather complex and thus omitted. ¤

Brouwer and Haemers (2008) also discuss graphs for which equality in (4.24) is reached. Since $\mu_j \ge 0$, the bound (4.24) becomes useless when $d_{(j)} < j - 2$. In fact, we may introduce slack variables $\varepsilon_j \ge 0$ in (4.24) to obtain the equality
$$\mu_j = d_{(j)} - j + 2 + \varepsilon_j$$
Substitution into the $m$-th moment formula (4.7) specifies the moments $\sum_{j=1}^N\varepsilon_j^m$. For example, for $m = 1$, we find from (4.2), using $\sum_{j=1}^N d_{(j)} = \sum_{j=1}^N d_j$,
$$\sum_{j=1}^N\varepsilon_j = \frac{N(N-3)}{2}$$
which shows that the average of the $\varepsilon_j$'s increases linearly with $N$. The cases for higher values of $m$ are more involved, as illustrated for $m = 2$, which is derived from (4.4),
$$\sum_{j=1}^N\varepsilon_j^2 + 2\sum_{j=1}^N\left(d_{(j)} - j + 2\right)\varepsilon_j = 2L + 2\sum_{j=1}^N(j-2)\,d_{(j)} - \frac{N\left(2N^2 - 9N + 13\right)}{6}$$
The sum $\sum_{j=1}^N d_{(j)}\varepsilon_j$ is related to the covariance $E[D\varepsilon] - E[D]E[\varepsilon]$ and is, in general, difficult to assess. The above method of equating moments (see also art. 246) suggests to consider $\mu_j = d_{(j)} + \tilde\varepsilon_j$, where the difference $\tilde\varepsilon_j$ can be negative as well as positive, but the average difference is zero.

92. Lower bounds for $\mu_{N-1}$. First, the upper bound (4.10) applied to the complement $G^c$, $\mu_1(Q^c) \le 2d_{\max}(G^c) = 2\left(N - 1 - d_{\min}(G)\right)$, and (4.9) give
$$\mu_{N-1} \ge 2d_{\min}(G) - N + 2$$
which is useless whenever the right-hand side is negative, as it is in most cases. Hence, more ingenious methods are needed. We will now apply the functional framework of art. 77 and art. 88 to derive a better lower bound for the second smallest eigenvalue of the Laplacian. Assume that $f$ is the eigenfunction of $Q$ belonging to $\mu_{N-1}$, for which the equality sign holds in (4.17),
$$\mu_{N-1} = \frac{\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)}$$
Let $u$ be a node for which $|f(u)| = \max_{n\in\mathcal{N}}|f(n)| > 0$. Clearly,
$$\sum_{n\in\mathcal{N}}f^2(n) \le N f^2(u)$$
Since $\sum_{n\in\mathcal{N}}f(n) = 0$, as shown in art. 88, there exists a node $v$ for which $f(u)f(v) < 0$. Since $\mu_{N-1} > 0$ provided the graph is connected, there exists a path $P(v,u)$ from $v$ to $u$ with hopcount $h(P)$. The minimum number of links to connect a graph occurs in a minimum spanning tree (MST) consisting of $N - 1$ links. Only if the hopcount $h(P)$ is smaller than $N - 1$ do we have a strict inequality in
$$\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2 \ge \sum_{l\in\mathrm{MST}}\left(f(l^+) - f(l^-)\right)^2 \ge \sum_{l\in P(v,u)}\left(f(l^+) - f(l^-)\right)^2$$
By the Cauchy-Schwarz inequality (8.41), we have
$$h(P)\sum_{l\in P(v,u)}\left(f(l^+) - f(l^-)\right)^2 \ge \Big(\sum_{l\in P(v,u)}\left|f(l^+) - f(l^-)\right|\Big)^2 \ge \Big(\sum_{l\in P(v,u)}\left(f(l^+) - f(l^-)\right)\Big)^2 = \left(f(v) - f(u)\right)^2 \ge f^2(u)$$
because $f(u)f(v) < 0$. Thus,
$$\sum_{l\in P(v,u)}\left(f(l^+) - f(l^-)\right)^2 \ge \frac{f^2(u)}{h(P)} \ge \frac{f^2(u)}{\rho}$$
where $\rho$ is the diameter of $G$. Combining all inequalities leads to
$$\mu_{N-1} \ge \frac{1}{N\rho}$$
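This bound can only be improved a little by using the definition (4.18). Indeed, taking for $P(v,u)$ a shortest hop path between $v$ and $u$ and using the above inequality based on the Cauchy-Schwarz inequality, we have
$$\begin{aligned}
\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}\left(f(u) - f(v)\right)^2 &\le \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\sum_{l\in P(v,u)}\left(f(l^+) - f(l^-)\right)^2\\
&= \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2 1_{\{l\in P(v,u)\}}\\
&= \sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\,1_{\{l\in P(v,u)\}}\\
&\le \rho\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}1_{\{l\in P(v,u)\}}
\end{aligned}$$
where the betweenness of a link $l$ is defined as
$$B_l = \frac{1}{2}\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}1_{\{l\in P(v,u)\}}$$
Art. 53 shows that the number of links in the union of all shortest hop paths between a same source $u$ and destination $v$ is at most $\left[\frac{N^2}{4}\right]$. This means that an arbitrary link $l$ can only occur in at most $\left[\frac{N^2}{4}\right]$ unions⁴ of shortest hop paths between all pairs $(u,v)$,
$$2B_l = \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}1_{\{l\in P(v,u)\}} \le 2\left[\frac{N^2}{4}\right] \tag{4.25}$$
Thus, (4.18) leads to
$$\mu_{N-1} \ge \frac{4}{N\rho} \tag{4.26}$$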
This lower bound (4.26) is the best possible. As mentioned by Mohar (1991), McKay has shown that, in a tree of diameter $\rho = t + 2$, obtained from a $t$-hop path where $k$ nodes are connected to each of its end-nodes such that $N = t + 1 + 2k$, (4.26) is sharp if $kt \to \infty$.

93. We present another interpretation, deduced from art. 92, by rewriting
$$\begin{aligned}
r &= \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\,1_{\{l\in P(v,u)\}} = \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\left(1_{\{l\in P(v,u)\}} - 1 + 1\right)\\
&= \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u)) - \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}\setminus\{u\}}h(P(v,u))\left(1 - 1_{\{l\in P(v,u)\}}\right)
\end{aligned}$$
because $h(P(u,u)) = 0$. In all other cases, where nodes $u$ and $v$ are different, $h(P(v,u)) \ge 1$, such that the last sum is bounded:
$$\begin{aligned}
r_e &= \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}\setminus\{u\}}h(P(v,u))\left(1 - 1_{\{l\in P(v,u)\}}\right) \ge \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}\setminus\{u\}}\left(1 - 1_{\{l\in P(v,u)\}}\right)\\
&= 2\binom{N}{2} - \sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}\setminus\{u\}}1_{\{l\in P(v,u)\}} \ge 2\binom{N}{2} - 2\,\frac{N^2}{4} = \frac{N(N-2)}{2}
\end{aligned}$$
where in the last line (4.25) has been used. With the definition of the average hopcount,
$$\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u)) = N(N-1)E[H]$$
we thus find
$$\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}h(P(v,u))\,1_{\{l\in P(v,u)\}} \le N(N-1)E[H] - \frac{N(N-2)}{2}$$
Again, (4.18) shows that
$$\mu_{N-1} \ge \frac{2}{(N-1)E[H] - \frac{N-2}{2}}$$
or
$$E[H] \ge \frac{N-2}{2(N-1)} + \frac{2}{(N-1)\mu_{N-1}}$$

⁴ The maximum betweenness in any graph, $B_l \le \left[\frac{N^2}{4}\right]$, can also be proved differently, as shown by Wang et al. (2008).
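Both (4.26) and the hopcount bound of art. 93 involve only $\mu_{N-1}$, the diameter $\rho$ and the average hopcount $E[H]$, so they can be tested on random connected graphs. A minimal sketch, assuming NumPy; the random-graph construction (a random tree plus extra random links) and all helper names are hypothetical.

```python
import numpy as np
from collections import deque


def hop_matrix(A):
    """All-pairs hop distances by breadth-first search (graph assumed connected)."""
    n = len(A)
    nbrs = [np.flatnonzero(A[i]) for i in range(n)]
    H = np.full((n, n), -1, dtype=int)
    for s in range(n):
        H[s, s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if H[s, w] < 0:
                    H[s, w] = H[s, v] + 1
                    queue.append(w)
    return H


def diameter_bounds(A):
    """Return mu_{N-1}, bound (4.26), E[H] and the lower bound on E[H] of art. 93."""
    n = len(A)
    Q = np.diag(A.sum(axis=1)) - A
    mu = np.linalg.eigvalsh(Q)[1]              # ascending order: index 1 is mu_{N-1}
    H = hop_matrix(A)
    rho = H.max()                              # diameter
    EH = H.sum() / (n * (n - 1))               # average hopcount over ordered pairs
    return mu, 4.0 / (n * rho), EH, (n - 2) / (2 * (n - 1)) + 2.0 / ((n - 1) * mu)


def random_connected(n, extra, rng):
    """Random tree (node i > 0 attached to an earlier node) plus `extra` random links."""
    A = np.zeros((n, n))
    for i in range(1, n):
        j = rng.integers(i)
        A[i, j] = A[j, i] = 1.0
    for _ in range(extra):
        i, j = rng.choice(n, size=2, replace=False)
        A[i, j] = A[j, i] = 1.0
    return A


rng = np.random.default_rng(3)
for _ in range(5):
    mu, lb_mu, EH, lb_EH = diameter_bounds(random_connected(15, 10, rng))
    print(f"mu={mu:.4f} >= {lb_mu:.4f}   E[H]={EH:.4f} >= {lb_EH:.4f}")
```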
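94. Another type of lower bound for $\mu_{N-1}$. Let $f$ be the eigenfunction of $Q$ belonging to $\mu_{N-1}$. Since $f$ is non-zero and orthogonal to the constant function,
$$0 = \sum_{n\in\mathcal{N}}f(n) = \sum_{n^+\in\mathcal{N}}f(n^+) - \sum_{n^-\in\mathcal{N}}\left|f(n^-)\right|$$
where, for $n^+\in\mathcal{N}$, $f(n^+) > 0$ and, for $n^-\in\mathcal{N}$, $f(n^-) \le 0$. Let us define the set of positive nodes $\mathcal{N}^+ = \{n\in\mathcal{N} : f(n) > 0\}$ and $\mathcal{N}^- = \mathcal{N}\setminus\mathcal{N}^+$. Similarly, let $\mathcal{L}^+ = \{u^+v^+\in\mathcal{L} : u^+, v^+\in\mathcal{N}^+\}$ denote the set of all links between positive nodes and $\mathcal{L}^- = \{u^+v^-\in\mathcal{L} : u^+\in\mathcal{N}^+, v^-\in\mathcal{N}^-\}$ denote the set of all links between positive nodes and negative nodes. Since $f$ is an eigenfunction of $Q$, for each nodal component $u\in\mathcal{N}$ it holds that
$$(Qf)(u) = \mu_{N-1}f(u)$$
Multiplying both sides by $f(u)$ and summing over the positive nodes yields
$$\mu_{N-1} = \frac{\sum_{v\in\mathcal{N}^+}Qf(v)\,f(v)}{\sum_{v\in\mathcal{N}^+}f^2(v)}$$
Using the definition (art. 2) of the Laplacian $Q = \Delta - A$,
$$\begin{aligned}
\sum_{v\in\mathcal{N}^+}Qf(v)\,f(v) &= \sum_{v\in\mathcal{N}^+}f(v)\left(\Delta f - Af\right)(v) = \sum_{v\in\mathcal{N}^+}f(v)\Big(d(v)f(v) - \sum_{u\in\text{neighbor}(v)}f(u)\Big)\\
&= \sum_{v\in\mathcal{N}^+}\sum_{u\in\text{neighbor}(v)}f(v)\left(f(v) - f(u)\right)
\end{aligned}$$
Further, after splitting the neighbors into positive and negative nodes,
$$\sum_{v\in\mathcal{N}^+}Qf(v)\,f(v) = \sum_{v^+u^+\in\mathcal{L}^+}f(v)\left(f(v) - f(u)\right) + \sum_{v^+u^-\in\mathcal{L}^-}f(v)\left(f(v) - f(u)\right)$$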
Since the graph is bidirectional, i.e., $A^T = A$, the link $v^+u^+ = u^+v^+$ appears twice in the sum, such that
$$\sum_{v^+u^+\in\mathcal{L}^+}f(v)\left(f(v) - f(u)\right) = \sum_{u^+v^+\in\mathcal{L}^+}\left\{f(v)\left(f(v) - f(u)\right) + f(u)\left(f(u) - f(v)\right)\right\} = \sum_{u^+v^+\in\mathcal{L}^+}\left(f(v) - f(u)\right)^2$$
where, in the last two sums, a link $u^+v^+\in\mathcal{L}^+$ is only counted once. Similarly as before, we denote the link $l = u^+v^+$ by its head $l^+ = u^+$ and its tail $l^- = v^+$. Thus, we arrive at
$$\sum_{v\in\mathcal{N}^+}Qf(v)\,f(v) = \sum_{l\in\mathcal{L}^+}\left(f(l^+) - f(l^-)\right)^2 + \sum_{v^+u^-\in\mathcal{L}^-}f(v)\left(f(v) - f(u)\right)$$
and
$$\mu_{N-1} = \frac{\sum_{l\in\mathcal{L}^+}\left(f(l^+) - f(l^-)\right)^2 + \sum_{v^+u^-\in\mathcal{L}^-}f(v)\left(f(v) - f(u)\right)}{\sum_{v\in\mathcal{N}^+}f^2(v)}$$
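Also the last sum in the numerator is non-negative, because each term $f(v^+)\left(f(v^+) - f(u^-)\right) > 0$, such that
$$\sum_{v\in\mathcal{N}^+}Qf(v)\,f(v) \ge \sum_{u^+v^+\in\mathcal{L}^+}\left(f(v) - f(u)\right)^2$$
which leads to a lower bound
$$\mu_{N-1} \ge \frac{\sum_{l\in\mathcal{L}^+}\left(f(l^+) - f(l^-)\right)^2}{\sum_{n\in\mathcal{N}^+}f^2(n)} \tag{4.27}$$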
The lower bound (4.27) resembles the upper bound (4.17), except that only positive nodes and links are considered and that $f$ is not arbitrary, but the eigenfunction of $Q$ belonging to the eigenvalue $\mu_{N-1}$. We can improve this lower bound (4.27) by incorporating the positive terms in $\sum_{v^+u^-\in\mathcal{L}^-}f(v)\left(f(v) - f(u)\right)$ that we have neglected. This means that links outside the positive cluster are also taken into account. Following Alon (1986), we can define $g(v) = f(v)\,1_{\{v\in\mathcal{N}^+\}}$ such that
$$\sum_{v^+u^-\in\mathcal{L}^-}f(v)\left(f(v) - f(u)\right) \ge \sum_{v^+u^-\in\mathcal{L}^-}\left(g(v) - g(u)\right)^2$$
because $g(u) = 0$ and $f(u) \le 0$ for $u\in\mathcal{N}^-$. With this function, the first sum remains unaltered,
$$\sum_{l\in\mathcal{L}^+}\left(f(l^+) - f(l^-)\right)^2 = \sum_{l\in\mathcal{L}^+}\left(g(l^+) - g(l^-)\right)^2 = \sum_{l\in\mathcal{L}}\left(g(l^+) - g(l^-)\right)^2 - \sum_{l\in\mathcal{L}^-}\left(g(l^+) - g(l^-)\right)^2$$
and, also, $\sum_{n\in\mathcal{N}^+}f^2(n) = \sum_{n\in\mathcal{N}^+}g^2(n) = \sum_{n\in\mathcal{N}}g^2(n)$. Thus, the improved lower bound is
$$\mu_{N-1} \ge \frac{\sum_{l\in\mathcal{L}}\left(g(l^+) - g(l^-)\right)^2}{\sum_{n\in\mathcal{N}}g^2(n)} \tag{4.28}$$
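The two lower bounds (4.27) and (4.28) only require the positive part of a Fiedler vector. The sketch below, assuming NumPy, evaluates both for a small test graph (two triangles joined by a link, a hypothetical example); the construction of art. 94 implies the ordering (4.27) $\le$ (4.28) $\le \mu_{N-1}$, since (4.28) adds the non-negative contribution of the links in $\mathcal{L}^-$.

```python
import numpy as np


def positive_part_bounds(A):
    """Return mu_{N-1} together with the lower bounds (4.27) and (4.28)."""
    Q = np.diag(A.sum(axis=1)) - A
    mu, X = np.linalg.eigh(Q)
    lam, f = mu[1], X[:, 1]                 # f: an eigenfunction belonging to mu_{N-1}
    links = np.argwhere(np.triu(A) > 0)     # each link (i, j) with i < j listed once
    pos = f > 0                             # positive nodes N^+
    # (4.27): only links with both end nodes in N^+
    num_427 = sum((f[i] - f[j]) ** 2 for i, j in links if pos[i] and pos[j])
    lb_427 = num_427 / np.sum(f[pos] ** 2)
    # (4.28): g = f on N^+ and 0 elsewhere, summed over all links
    g = np.where(pos, f, 0.0)
    lb_428 = sum((g[i] - g[j]) ** 2 for i, j in links) / np.sum(g ** 2)
    return lam, lb_427, lb_428


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the link (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(positive_part_bounds(A))   # (mu_{N-1}, bound (4.27), bound (4.28))
```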
95. Let $G + \{e\}$ denote a graph obtained from $G$ by adding a link $e$ between two nodes of $G$. For any $f$ orthogonal to a constant function, we have that
$$\frac{\sum_{l\in\mathcal{L}(G+\{e\})}\left(f(l^+) - f(l^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)} = \frac{\sum_{l\in\mathcal{L}(G)}\left(f(l^+) - f(l^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)} + \frac{\left(f(e^+) - f(e^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)}$$
If $f = f_{G+\{e\}}$ is an eigenfunction of $G + \{e\}$ corresponding to $\mu_{N-1}(G+\{e\})$, then (4.17) shows that
$$\mu_{N-1}(G+\{e\}) \ge \mu_{N-1}(G) + \frac{\left(f_{G+\{e\}}(e^+) - f_{G+\{e\}}(e^-)\right)^2}{\sum_{n\in\mathcal{N}}f_{G+\{e\}}^2(n)}$$
On the other hand, if $f = f_G$ is an eigenfunction of $G$ corresponding to $\mu_{N-1}(G)$, then
$$\mu_{N-1}(G+\{e\}) \le \mu_{N-1}(G) + \frac{\left(f_G(e^+) - f_G(e^-)\right)^2}{\sum_{n\in\mathcal{N}}f_G^2(n)}$$
In the first bound, with $f = f_{G+\{e\}}$,
$$\theta = \frac{\left(f(e^+) - f(e^-)\right)^2}{\sum_{n\in\mathcal{N}}f^2(n)} = \frac{f^2(e^+) + f^2(e^-) - 2f(e^+)f(e^-)}{f^2(e^+) + f^2(e^-) + \sum_{n\in\mathcal{N}\setminus\{e^+,e^-\}}f^2(n)} \le \frac{f^2(e^+) + f^2(e^-) + 2\left|f(e^+)f(e^-)\right|}{f^2(e^+) + f^2(e^-)} \le 2$$
because $\max_{x\ge 0,\,y\ge 0}\frac{2xy}{x^2+y^2} = \max_{r\ge 0}\frac{2r}{1+r^2} = 1$. With $\theta \ge 0$ in the first bound, and the same argument bounding the fraction in the second bound by 2, we arrive at
$$\mu_{N-1}(G) \le \mu_{N-1}(G+\{e\}) \le \mu_{N-1}(G) + 2 \tag{4.29}$$
The same bounds (4.29) are elegantly proved by invoking interlacing (art. 183) on $Q_{G+\{e\}} = Q_G + Q_{\{e\}}$. Indeed, the Laplacian $Q_{\{e_{ij}\}}$ of a link $e_{ij}$ between node $i$ and $j$ has precisely four non-zero elements: $q_{ij} = q_{ji} = -1$ and $q_{ii} = q_{jj} = 1$. The eigenvalues of $Q_{\{e_{ij}\}}$ are obtained from $\det\left(Q_{\{e\}} - \mu I\right)$ after expanding the determinant in cofactors over row $i$ (or $j$),
$$\det\left(Q_{\{e\}} - \mu I\right) = (-\mu)^{N-2}(1-\mu)^2 - (-\mu)^{N-2} = (-\mu)^{N-1}(2 - \mu)$$
The eigenvalues of $Q_{\{e_{ij}\}}$ are thus $[0]^{N-1}$ and $2$; the interlacing inequality (8.72) leads to (4.29). Art. 69 shows that, by adding one link, the sum of all eigenvalues increases by 2. Hence, when the upper bound in (4.29) is achieved, all other eigenvalues of $Q_{G+\{e\}}$ are precisely equal to those of $Q_G$.
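The effect of adding one link can be observed on the full spectrum. In the minimal sketch below (assuming NumPy; the path-to-cycle example is a hypothetical choice), closing the path $0-1-2-3-4$ into the 5-cycle illustrates (4.29), the interlacing $\mu_k(G) \le \mu_k(G+\{e\}) \le \mu_{k-1}(G)$ for the rank-one update $Q_{\{e\}}$, and the increase of the eigenvalue sum by exactly 2.

```python
import numpy as np


def laplacian_spectrum(A):
    """Laplacian eigenvalues mu_1 >= ... >= mu_N of the graph with adjacency matrix A."""
    Q = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(Q))[::-1]


def add_link(A, i, j):
    """Adjacency matrix of G + {e} with the new link e = (i, j)."""
    B = A.copy()
    B[i, j] = B[j, i] = 1.0
    return B


# Hypothetical example: the path 0-1-2-3-4 becomes the 5-cycle after adding (0, 4).
A = np.zeros((5, 5))
for k in range(4):
    A[k, k + 1] = A[k + 1, k] = 1.0
mu_G = laplacian_spectrum(A)
mu_Ge = laplacian_spectrum(add_link(A, 0, 4))

print(mu_G[-2], mu_Ge[-2])         # mu_{N-1}(G) <= mu_{N-1}(G + e) <= mu_{N-1}(G) + 2
print(mu_Ge.sum() - mu_G.sum())    # adding one link raises the eigenvalue sum by 2
```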
4.3 Partitioning of a graph

The problem of graph partitioning consists of dividing the nodes of a graph into a number of disjoint groups, also called partitions (see art. 14), such that a certain criterion is met. The most popular criterion is that the number of links between these disjoint groups is minimized. Sometimes, the number of those partitions and their individual sizes are prescribed. Most, but not all (see art. 96), variants of the graph partitioning problem are NP-hard.

96. Graph partitioning into two disjoint subsets. When confining to a graph partitioning into two disjoint subsets (subgroups, clusters, partitions, ...), an index vector $y$ can be defined with vector component $y_j = 1$ if the node $j$ belongs to one partition and $y_j = -1$ if node $j$ belongs to the other partition. The number of links $R$ between the two disjoint subsets, also called the cut size or size of the separator, elegantly follows from the characteristic property (4.11) of the Laplacian,
$$R = \frac{1}{4}\sum_{l\in\mathcal{L}}\left(y_{l^+} - y_{l^-}\right)^2 = \frac{1}{4}y^TQy \tag{4.30}$$
because $\left(y_{l^+} - y_{l^-}\right)^2 = 4$ only if the starting node $l^+$ and the ending node $l^-$ of a link $l$ belong to different partitions; otherwise, $y_{l^+} = y_{l^-}$. The minimum cut size is obviously
$$R_{\min} = \min_{y\in\mathcal{Y}}\frac{1}{4}y^TQy$$
where $\mathcal{Y}$ is the set of all possible index vectors of the $N$-dimensional space with either $-1$ or $1$ components. Since all eigenvectors $\{x_k\}_{1\le k\le N}$ of the Laplacian $Q$ are orthogonal (art. 151), any vector can be written as a linear combination of them. Let $y = \sum_{j=1}^N\alpha_jx_j$; then
$$R = \frac{1}{4}\sum_{j=1}^N\sum_{k=1}^N\alpha_j\alpha_k\,x_j^TQx_k$$
and, using the orthogonality property (8.25) in art. 151, we obtain
$$R = \frac{1}{4}\sum_{j=1}^N\alpha_j^2\mu_j \tag{4.31}$$
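The three expressions for the cut size, namely a direct count, the quadratic form (4.30) and the spectral sum (4.31) with orthonormal eigenvectors, can be compared on a toy example. A minimal sketch, assuming NumPy; the two-triangle graph and the helper `cut_size` are hypothetical.

```python
import numpy as np


def cut_size(A, y):
    """Cut size R of the partition encoded by y in {-1, 1}^N, computed three ways."""
    Q = np.diag(A.sum(axis=1)) - A
    links = np.argwhere(np.triu(A) > 0)
    R_count = sum(1 for i, j in links if y[i] != y[j])     # direct count of cut links
    R_quad = y @ Q @ y / 4.0                               # (4.30)
    mu, X = np.linalg.eigh(Q)                              # orthonormal eigenvectors x_j
    alpha = X.T @ y                                        # y = sum_j alpha_j x_j
    R_spec = np.sum(alpha ** 2 * mu) / 4.0                 # (4.31)
    return R_count, R_quad, R_spec


# Hypothetical example: two triangles joined by the single link (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(cut_size(A, np.array([1, 1, 1, -1, -1, -1])))   # only the bridge (2, 3) is cut
print(cut_size(A, np.array([1, -1, 1, -1, 1, -1])))
```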
Since $\mu_N = 0$ and all other eigenvalues are larger than zero for a connected graph (Theorem 10), the alternative eigenvalue expression (4.31) shows that $R$ is a sum of non-negative real numbers. Although Stoer and Wagner (1997) have presented a highly efficient, non-spectral min-cut algorithm with a computational complexity of $O\left(NL + N^2\log N\right)$, which demonstrates that the min-cut problem is not NP-hard, the minimization of (4.31) is generally difficult. However, if one chooses in (4.30) $y = \alpha_{N-1}x_{N-1}$, then $R = \frac{1}{4}\alpha_{N-1}^2\mu_{N-1}$, which is, in view of (4.31), obviously the best possible to minimize $R$. Unfortunately, choosing the index vector $y$ parallel to the Fiedler vector $x_{N-1}$ is generally not possible, because $x_{N-1}\notin\mathcal{Y}$. A good strategy is to choose the sign of the components in $y$ according to the sign of the corresponding component in the Fiedler vector. A slightly better approach is the choice $y = \alpha_Nu + x_{N-1}$, since the eigenvector $u$ belonging to $\mu_N = 0$ does not affect the value of $R$ in (4.31) and it provides a higher degree of freedom to choose the size of each partition. This strategy agrees with Fiedler's graph partitioning explained in art. 103.

97. The Alon-Milman inequality. Another approach to the separator problem is to establish useful bounds. As we will demonstrate here, it turns out that the algebraic connectivity $\mu_{N-1}$ plays an important role in such bounds. Our starting point is the upper bound in (4.17) for $\mu_{N-1}$. The ingenuity lies in finding a function $f$, introduced in art. 88, satisfying $(f, c) = 0$, that both has a graph interpretation and provides a tight bound for $\mu_{N-1}$ in (4.17). Alon and Milman (1985) have proposed the function
$$g(u) = \frac{1}{a} - \frac{1}{h}\left(\frac{1}{a} + \frac{1}{b}\right)\min\left(h, h(u, A)\right)$$
Since, in general, $(g, c) \ne 0$, we use $f = g - \bar g$ as in art. 88, with $\bar g = \frac{1}{N}\sum_{n\in\mathcal{N}}g(n)$. Further, $h$ is the distance (in hops) between two disjoint subsets $A$ and $B$ of $\mathcal{N}$, $h(u, A)$ is the shortest distance of node $u\in\mathcal{N}$ to a node of the set $A$, and $a = \frac{N_A}{N}$ and $b = \frac{N_B}{N}$, where $N_K = |\mathcal{N}_K|$ is the number of nodes of set $\mathcal{N}_K$.
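Clearly, if $u\in A$, then $g(u) = \frac{1}{a}$, while, if $u\in B$, then $\min(h, h(u, A)) = h$ and $g(u) = -\frac{1}{b}$. Moreover, if $u$ and $v$ are adjacent, i.e., they are either head ($u = l^+$) or tail ($u = l^-$) of a link $l$, then
$$\left|g(u) - g(v)\right| \le \frac{1}{h}\left(\frac{1}{a} + \frac{1}{b}\right) \tag{4.32}$$
Indeed, if $u$ and $v$ both belong to $A$ or both belong to $B$, then $g(u) - g(v) = 0$. If $u\in A$ and $v\notin A$, then $h(v, A) = 1$, because $u$ and $v$ are adjacent, and $g(u) - g(v) = \frac{1}{h}\left(\frac{1}{a} + \frac{1}{b}\right)$. If both $u$ and $v$ do not belong to $A$, then $\left|h(v, A) - h(u, A)\right| \le 1$ and
$$\left|g(u) - g(v)\right| = \frac{1}{h}\left(\frac{1}{a} + \frac{1}{b}\right)\left|\min\left(h, h(v, A)\right) - \min\left(h, h(u, A)\right)\right|$$
where the difference of the min-operators is at most 1 (and vanishes when both $h(u, A)$ and $h(v, A)$ are at least $h$). This proves (4.32). Hence, using this bound (4.32), the numerator in (4.17) is
$$\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2 = \sum_{l\in\mathcal{L}}\left(g(l^+) - g(l^-)\right)^2 = \sum_{l\in\mathcal{L}\setminus(\mathcal{L}_A\cup\mathcal{L}_B)}\left(g(l^+) - g(l^-)\right)^2 \le \frac{1}{h^2}\left(\frac{1}{a} + \frac{1}{b}\right)^2\left(L - L_A - L_B\right)$$
where $L_A$ and $L_B$ are the number of links in the sets $A$ and $B$, respectively. The denominator of (4.17) is
$$\begin{aligned}
\sum_{n\in\mathcal{N}}f^2(n) &\ge \sum_{n\in A\cup B}f^2(n) = \sum_{n\in A}\left(g(n) - \bar g\right)^2 + \sum_{n\in B}\left(g(n) - \bar g\right)^2\\
&= N_A\left(\frac{1}{a} - \bar g\right)^2 + N_B\left(\frac{1}{b} + \bar g\right)^2 = N\left(\frac{1}{a} + \frac{1}{b}\right) + N(a + b)\,\bar g^2 \ge N\left(\frac{1}{a} + \frac{1}{b}\right)
\end{aligned}$$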
Finally, with (4.17), Alon and Milman (1985) arrive at
$$\mu_{N-1} \le \frac{1}{Nh^2}\left(\frac{1}{a} + \frac{1}{b}\right)\left(L - L_A - L_B\right) = \frac{1}{h^2}\left(\frac{1}{N_A} + \frac{1}{N_B}\right)\left(L - L_A - L_B\right) \tag{4.33}$$
The Alon-Milman inequality (4.33) shows that a large algebraic connectivity $\mu_{N-1}$ leads to a high number of links between the two clusters $A$ and $B$. Indeed, consider all subsets $A$ and $B$ in a graph $G$ with a fixed number of nodes $N_A$ and $N_B$ and the same separation $h$; then a large $\mu_{N-1}$ implies a large number of links $L - L_A - L_B$ between any pair of subsets $A$ and $B$, thus also between minimal pairs. Hence, a large $\mu_{N-1}$ means a more intertwined subgraph structure and, consequently, it is more difficult to cut away a subgraph from $G$. A graph with a large second smallest Laplacian eigenvalue $\mu_{N-1}$ is thus more "robust", in the sense of being better connected or interlinked. Precisely this property of $\mu_{N-1}$ has made the second smallest Laplacian eigenvalue a fundamental characterizer of the robustness of a graph. However, the algebraic connectivity $\mu_{N-1}$ should not be viewed as a strict disconnectivity or robustness metric. Fig. 4.1 depicts two graphs $G_1$ and $G_2$, each with $N = 7$ nodes, $L = 10$ links and diameter $\rho = 4$, but with different algebraic connectivity $\mu_{N-1}(G_1) = 0.6338$ and $\mu_{N-1}(G_2) = 0.5858$. Although $\mu_{N-1}(G_1) > \mu_{N-1}(G_2)$, it is easier to disconnect $G_1$ than $G_2$, because one link removal disconnects $G_1$, while two links need to be deleted in $G_2$.

[Figure: the graphs $G_1$ and $G_2$]
Fig. 4.1. Two graphs $G_1$ and $G_2$, each with $N = 7$ nodes, $L = 10$ links and diameter $\rho = 4$, but with different algebraic connectivity.

98. Bounds for the separator. The Alon-Milman method of art. 97 can be extended to deduce bounds for the separator $S$ of two disjoint subsets $A$ and $B$ that are at a distance $h$ from each other. The separator $S$ is the set of nodes at a distance less than $h$ hops from $A$ and belonging neither to $A$ nor to $B$, $S = \{u\in\mathcal{N}\setminus A : h(u, A) < h\}$, and $A\cup B\cup S = \mathcal{N}$. Sometimes, when $h = 1$, the separator is called the cut size, since there is a cut that splits the graph into two partitions. As in art. 97, we define $a = \frac{N_A}{N}$, $b = \frac{N_B}{N}$ and $s = \frac{N_S}{N}$, where $N_C$ is the number of nodes in the set $C$. Instead of using the inequality (4.17), Pothen et al. (1990) start from the Fiedler inequality (4.19), in which they use
$$f(u) = 1 - \frac{2}{h}\min\left(h, h(u, A)\right)$$
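which is recognized as the Alon-Milman function $g(u)$ with $a = b = 1$. If $u\in S$, then $f(u) = 1 - \frac{2}{h}h(u, A)$ and $1 - \frac{2}{h} \ge f(u) \ge 1 - \frac{2(h-1)}{h} = -1 + \frac{2}{h}$. The numerator in (4.19) is computed precisely as in art. 97 with $a = b = 1$,
$$\sum_{l\in\mathcal{L}}\left(f(l^+) - f(l^-)\right)^2 \le \left(\frac{2}{h}\right)^2\left(L - L_A - L_B\right) \le \left(\frac{2}{h}\right)^2 sN\,d_{\max}$$
where the last step holds because, for $h > 1$, every link that lies neither within $A$ nor within $B$ is incident to one of the $N_S = sN$ nodes of $S$. The denominator $q$ in (4.19) is
$$\begin{aligned}
q &= \frac{1}{2}\sum_{u\in\mathcal{N}}\sum_{v\in\mathcal{N}}\left(f(u) - f(v)\right)^2 = \Big(\sum_{u\in A}\sum_{v\in S} + \sum_{u\in A}\sum_{v\in B} + \sum_{u\in B}\sum_{v\in S} + \sum_{u\in S}\sum_{v\in S:\,v>u}\Big)\left(f(u) - f(v)\right)^2\\
&\ge \Big(\sum_{u\in A}\sum_{v\in S} + \sum_{u\in A}\sum_{v\in B} + \sum_{u\in B}\sum_{v\in S}\Big)\left(f(u) - f(v)\right)^2\\
&\ge \left(1 - \left(1 - \frac{2}{h}\right)\right)^2N^2as + \left(1 - (-1)\right)^2N^2ab + \left(-1 + \frac{2}{h} - (-1)\right)^2N^2bs\\
&= \left(\frac{2}{h}\right)^2N^2\left\{s(a + b) + h^2ab\right\}
\end{aligned}$$
With $b = 1 - a - s$, we arrive at
$$\mu_{N-1} \le \frac{s\,d_{\max}}{s(1-s) + a(1-a-s)h^2} \tag{4.34}$$
which provides a quadratic inequality in $s$, from which a lower bound for $s$ can be derived.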
99. Pothen et al. (1990) present another inequality for the normalized size $s$ of the separator, which is a direct application of the Wielandt-Hoffman inequality (8.75) for symmetric matrices. We can always relabel the nodes in the graph $G$ corresponding to the sets $A$, $B$ and $S$ such that the Laplacian becomes
$$Q = \begin{bmatrix} Q_{Na\times Na} & O_{Na\times Nb} & Q_{Na\times Ns}\\ O_{Nb\times Na} & Q_{Nb\times Nb} & Q_{Nb\times Ns}\\ \left(Q_{Na\times Ns}\right)^T & \left(Q_{Nb\times Ns}\right)^T & Q_{Ns\times Ns}\end{bmatrix}$$
The idea, then, is to consider another matrix whose eigenvalues are all known, such as $M = \operatorname{diag}\left(J_{Na\times Na}, J_{Nb\times Nb}, J_{Ns\times Ns}\right)$, where $J$ is the all-one matrix. The eigenvalues of $M$ are those of the separate block matrices, which follow from (5.1) as $Na$, $Nb$ and $Ns$, while all the others are zero. Let us assume that $a \ge b \ge s$. We apply the Wielandt-Hoffman inequality (8.75) to $Q$ and $-M$ (to have a consistent ordering of the eigenvalues) such that
$$\sum_{k=1}^N\lambda_k(Q)\,\lambda_k(-M) = -\sum_{k=1}^N\mu_{N+1-k}\,\lambda_k(M) = -\left(0\cdot Na + \mu_{N-1}Nb + \mu_{N-2}Ns\right)$$
while $\operatorname{trace}\left(Q(-M)\right) = -\operatorname{trace}(QM)$ and, with the shorter notation $U_{Nl} = U_{Nl\times Nl}$ for a square matrix,
$$\operatorname{trace}(QM) = \operatorname{trace}\left(Q_{Na}J_{Na}\right) + \operatorname{trace}\left(Q_{Nb}J_{Nb}\right) + \operatorname{trace}\left(Q_{Ns}J_{Ns}\right) = \Big(\sum_{u\in A} + \sum_{u\in B} + \sum_{u\in S}\Big)\left(d_u - d_u^*\right) = 2\left(L - L_A - L_B - L_S\right)$$
where $d_u^*$ is the number of links incident to the node $u$ whose other end node lies in the same set as $u$. Substituting both in (8.75) yields
$$\mu_{N-1}Nb + \mu_{N-2}Ns \le 2\left(L - L_A - L_B - L_S\right) \le 2\left(L - L_A - L_B\right) \le 2Ns\,d_{\max}$$
from which, using $b = 1 - a - s$, a lower bound for the size of the separator follows as
$$s \ge \frac{(1 - a)\,\mu_{N-1}}{2d_{\max} - \left(\mu_{N-2} - \mu_{N-1}\right)}$$
This inequality, which contains besides the algebraic connectivity $\mu_{N-1}$ also the gap $\mu_{N-2} - \mu_{N-1}$, complements the inequality (4.34).

100. Applications of the Alon-Milman bound (4.33). Alon and Milman (1985) mention the following applications of the bound (4.33). First, let $A = \{u\}$ and $B = \mathcal{N}\setminus\{u\}$; then $h = 1$ and $L - L_A - L_B = d_u$, the degree of node $u$. Since this inequality holds for any node $u$, the tightest bound is obtained by choosing a node $u$ with minimum degree $d_{\min} = \min_{u\in\mathcal{N}}d_u$, which leads again to (4.20).
Second, let $a = b = \frac{1}{2}$; then the set of all links connecting a node in $A$ to a node in $B$ is called the bisector of $G$. The minimum size of the bisector is related to min-cut, max-flow problems. The Alon-Milman bound (4.33) provides a lower bound for the bisector,
$$\text{bisector}(G) \ge \frac{N\mu_{N-1}}{4}$$
Third, if $h > 1$, then every link in the set $\mathcal{L}\setminus\left(\mathcal{L}_A\cup\mathcal{L}_B\right)$ is incident with at least one of the $N - N_A - N_B$ nodes of the set $S = \mathcal{N}\setminus\left(\mathcal{N}_A\cup\mathcal{N}_B\right)$, such that $L - L_A - L_B \le \left(N - N_A - N_B\right)d_{\max}$. The Alon-Milman bound (4.33) becomes, using $a + b < 1$,
$$\mu_{N-1} \le \frac{1}{h^2}\left(\frac{1}{a} + \frac{1}{b}\right)(1 - a - b)\,d_{\max} \le \frac{(1 - a - b)\,d_{\max}}{abh^2} = \frac{s\,d_{\max}}{a(1 - a - s)h^2}$$
which is clearly weaker than (4.34) because $0 < s < 1$. It provides an upper bound for the fraction $b = \frac{N_B}{N}$ as
$$b \le \frac{1 - a}{1 + \frac{ah^2\mu_{N-1}}{d_{\max}}} \tag{4.35}$$
where $h > 1$. Based on (4.35), Alon and Milman (1985) also derive a second bound
$$b \le (1 - a)\exp\left(-\ln(1 + 2a)\left[h\sqrt{\frac{\mu_{N-1}}{2d_{\max}}}\right]\right) \tag{4.36}$$
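where $[x]$ denotes the largest integer smaller than or equal to $x$.

Proof: The idea is to construct subsets $A_r$ of $\mathcal{N}$ that include, besides the original subset $A$, additional nodes of $\mathcal{N}$ within distance $r\in\mathbb{R}$ hops from $A$, i.e., $A_r = \{v\in\mathcal{N} : h(v, A) \le r\}$, and to denote by $a_r$ the fraction of nodes in $A_r$. We construct a sequence of those subsets at distance $r = j\beta$ for $j = 0, 1, \ldots, k$, such that $A_{j\beta}$ and $\mathcal{N}\setminus A_{(j+1)\beta}$ are separated by more than $\beta$ hops, which requires that $h > \beta$. For those subsets $A \subset A_\beta \subset A_{2\beta} \subset \cdots \subset A_{k\beta} \subset \mathcal{N}$, application of (4.35) yields
$$1 - a_{(j+1)\beta} \le \frac{1 - a_{j\beta}}{1 + \frac{a_{j\beta}\beta^2\mu_{N-1}}{d_{\max}}} \le \frac{1 - a_{j\beta}}{1 + \frac{a\beta^2\mu_{N-1}}{d_{\max}}}$$
The largest possible $k$ is such that $k\beta \le \rho$, where $\rho$ is the diameter of the graph. Now, with (4.20), we observe that, for $N \ge 2$,
$$\frac{\mu_{N-1}}{2d_{\max}} \le \frac{N}{N-1}\,\frac{d_{\min}}{2d_{\max}} \le 1$$
such that $\frac{2d_{\max}}{\mu_{N-1}} \ge 1$. Let $\beta^2 = \frac{2d_{\max}}{\mu_{N-1}}$; then
$$1 - a_{(j+1)\beta} \le \left(1 - a_{j\beta}\right)\frac{1}{1 + 2a}$$
for $0 \le j < k$. Multiplying those inequalities yields
$$1 - a_{k\beta} \le (1 - a)\frac{1}{(1 + 2a)^k} = (1 - a)\,e^{-k\ln(1 + 2a)}$$
and, by construction, $B \subset \mathcal{N}\setminus A_{k\beta}$, or $b \le 1 - a_{k\beta}$, and $k < \frac{h}{\beta} = h\sqrt{\frac{\mu_{N-1}}{2d_{\max}}}$. This proves (4.36) for any $h < \rho$. ¤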
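The Alon-Milman bound (4.33) only needs the hop distance $h$ between the two sets, their sizes and the link counts $L$, $L_A$ and $L_B$. A minimal sketch, assuming NumPy; the 10-node path and the chosen sets are hypothetical examples, and `alon_milman_bound` is a hypothetical helper name. With $A = \{u\}$ and $B = \mathcal{N}\setminus\{u\}$ it reproduces $N d_u/(N-1)$, the first application in art. 100.

```python
import numpy as np
from collections import deque


def hop_matrix(A):
    """All-pairs hop distances by breadth-first search (graph assumed connected)."""
    n = len(A)
    nbrs = [np.flatnonzero(A[i]) for i in range(n)]
    H = np.full((n, n), -1, dtype=int)
    for s in range(n):
        H[s, s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if H[s, w] < 0:
                    H[s, w] = H[s, v] + 1
                    queue.append(w)
    return H


def alon_milman_bound(A, set_a, set_b):
    """Right-hand side of (4.33) for disjoint node sets A and B at hop distance h."""
    h = hop_matrix(A)[np.ix_(set_a, set_b)].min()      # distance between A and B
    L = A.sum() / 2
    L_a = A[np.ix_(set_a, set_a)].sum() / 2             # links inside A
    L_b = A[np.ix_(set_b, set_b)].sum() / 2             # links inside B
    return (1.0 / h**2) * (1.0 / len(set_a) + 1.0 / len(set_b)) * (L - L_a - L_b)


def algebraic_connectivity(A):
    Q = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(Q)[1]


# Hypothetical example: path on 10 nodes with A = {0, 1} and B = {8, 9}, so h = 7.
n = 10
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
print(algebraic_connectivity(A), alon_milman_bound(A, [0, 1], [8, 9]))
```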
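101. Isoperimetric constant $\eta$. If we choose the set $B$ equal to $\mathcal{N}\setminus\mathcal{N}_A$, then $\mathcal{L}\setminus\left(\mathcal{L}_A\cup\mathcal{L}_B\right)$ is the set of links with one end in $A$ and the other in $B$. Thus, $\partial A = L - L_A - L_B$ is the number of links between $A$ and its complement $\mathcal{N}\setminus\mathcal{N}_A$. The isoperimetric constant $\eta$ of the graph $G$ is defined⁵ as
$$\eta = \min_{\mathcal{N}_A}\frac{\partial A}{N_A} \tag{4.37}$$
where the minimum is over all non-empty subsets $\mathcal{N}_A$ of $\mathcal{N}$ satisfying $N_A \le \left[\frac{N}{2}\right]$. The isoperimetric constant is also called the Cheeger constant. The Alon-Milman bound (4.33) reduces (with $h = 1$) to
$$\mu_{N-1} \le \partial A\left(\frac{1}{N_A} + \frac{1}{N - N_A}\right)$$
If we denote $\eta_k = \min_{\mathcal{N}_A}\left\{\frac{\partial A}{N_A} \,\middle|\, N_A = k\right\}$, then $\mu_{N-1} \le \frac{\partial A}{k}\,\frac{N}{N-k}$, and this inequality holds for any set with $N_A = k$, also for the minimizer of the right-hand side. Thus,
$$\mu_{N-1} \le \eta_k\frac{N}{N-k} \quad\text{and}\quad \eta_k \ge \mu_{N-1}\frac{N-k}{N}$$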
We may further minimize both sides over all $k = 1, 2, \ldots, \left[\frac{N}{2}\right]$. Observe that $\eta = \min_{1\le k\le[N/2]}\eta_k$. Hence, the Alon-Milman bound (4.33) leads to a lower bound for the isoperimetric constant
$$\eta \ge \frac{\mu_{N-1}}{2}$$
Using Alon's machinery of art. 94 that led to the lower bound (4.28), Mohar (1989) showed that, for $N > 3$,
$$\eta \le \sqrt{\mu_{N-1}\left(2d_{\max} - \mu_{N-1}\right)}$$

102. Expanders. A graph $G$ with $N$ nodes is a $c$-expander if every subset $\mathcal{N}_A$ with $N_A \le \left[\frac{N}{2}\right]$ nodes is connected to its complement $\mathcal{N}\setminus\mathcal{N}_A$ by at least $cN_A$ links. Since $\partial A \ge cN_A$, art. 101 indicates that $c = \eta$. Expanders are thus very difficult to disconnect, because every set of nodes in $G$ is well connected to its complement. This "robustness" property makes expanders highly desirable in the design of fault-tolerant networks, such as man-made infrastructures like communications networks and electric power transmission networks. A part of the network can only be cut off by destroying a large number of individual connections. In particular, sparse expanders, i.e., graphs with few links, are of great interest, because the cost of a network usually increases with the number of links. A well-studied subclass of expanders are regular graphs. In Govers et al. (2008), Wigderson mentions that almost every regular graph with degree $r \ge 3$ is an expander. The proof is probabilistic and does not provide insight into how to construct a regular $c$-expander. Although nearly any regular graph is an expander, it turns out that there are only a few methods to construct them explicitly. It follows from the bounds in art. 101 and $c = \eta$ that
$$\frac{\mu_{N-1}}{2} \le c \le \sqrt{\mu_{N-1}\left(2r - \mu_{N-1}\right)}$$
where $\mu_{N-1} = r - \lambda_2(A)$ also equals the spectral gap (art. 55 and art. 74). The larger the spectral gap or the smaller $\lambda_2(A)$, the larger $c$ and the stronger or more robust the expander is. A remarkable achievement is the discovery that, for large $r$-regular graphs, $\lambda_2(A)$ cannot be substantially smaller than $2\sqrt{r-1}$, and that graphs attaining $\lambda_2(A) \le 2\sqrt{r-1}$, the Ramanujan graphs, can be constructed explicitly when $r - 1$ is a prime power, as shown by Lubotzky et al. (1988).

103. Graph partitioning. Since $rI - Q$ is a non-negative matrix for $r > d_{\max}$, a direct application of Fiedler's Theorem 40 in art. 171 for $k = 2$ shows that a connected graph $G$ can be partitioned into two distinct, connected components $G_1$ and $G_2$, where the nodes of $G_1 = G\setminus G_2$ are elements of the set $\mathcal{M} = \{j\in\mathcal{N} : (x_{N-1})_j \ge \alpha\}$, where $x_{N-1}$ is the eigenvector belonging to the second smallest eigenvalue $\mu_{N-1}$ of the Laplacian $Q$ and $\alpha$ is some threshold value that specifies different disjoint partitions. Clearly, if $\alpha > \max_{1\le j\le N}(x_{N-1})_j$ or if $\alpha \le \min_{1\le j\le N}(x_{N-1})_j$, there is only the "trivial" partition consisting of the original graph $G$ itself. Fiedler (1975) demonstrates that, by varying the threshold $\alpha$, all possible cuts that separate the graph $G = G_1\cup G_2$ into two distinct ($G_1\cap G_2 = \emptyset$) connected components $G_1$ and $G_2$ can be obtained in this way. Art. 67 indicates that the sum over all positive vector components equals, in absolute value, the sum over all negative ones. This means that the value $\alpha = 0$ in Fiedler's partitioning algorithm divides the graph into two "equivalent" partitions, where "equivalent" is measured with respect to the second smallest Laplacian eigenvector. It does not imply, however, that both partitions have the same number of nodes.

⁵ The computation of the isoperimetric constant is an NP-complete problem, as shown by Mohar (1989).
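Fiedler's threshold partitioning can be sketched as a sweep over the components of the Fiedler vector. The sketch below is a minimal illustration under stated assumptions: NumPy is available, the test graph (two 4-cliques joined through node 8) is hypothetical, and the helper names are hypothetical. It uses $\mathcal{M} = \{j : (x_{N-1})_j \ge \alpha\}$, evaluates the cut size (4.30) of each non-trivial split, and verifies that $\mathcal{M}$ induces a connected subgraph for thresholds $\alpha \le 0$, which is the case covered by Fiedler's connectivity result.

```python
import numpy as np
from itertools import combinations


def is_connected(nodes, A):
    """True if the subgraph of A induced by `nodes` is connected (depth-first search)."""
    nodes = [int(v) for v in nodes]
    inside, seen, stack = set(nodes), {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for w in np.flatnonzero(A[v]):
            w = int(w)
            if w in inside and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(inside)


def fiedler_sweep(A):
    """Threshold the Fiedler vector at each of its component values alpha.

    Returns (alpha, M, R) with M = {j : (x_{N-1})_j >= alpha} and R the cut size (4.30).
    """
    Q = np.diag(A.sum(axis=1)) - A
    _, X = np.linalg.eigh(Q)
    x = X[:, 1]                                   # Fiedler vector x_{N-1}
    sweep = []
    for alpha in np.unique(x):
        in_M = x >= alpha
        if in_M.all():                            # trivial partition: M is the whole graph
            continue
        y = np.where(in_M, 1.0, -1.0)
        sweep.append((alpha, np.flatnonzero(in_M), y @ Q @ y / 4.0))
    return x, sweep


# Hypothetical example: cliques {0,1,2,3} and {4,5,6,7} joined via node 8.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 8), (8, 4)]
A = np.zeros((9, 9))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
x, sweep = fiedler_sweep(A)
best = min(sweep, key=lambda t: t[2])
print(best[1], best[2])                          # a smallest cut found by the sweep
```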
4.4 The modularity and the modularity matrix M

104. Modularity. The modularity, proposed by Newman and Girvan (2004), is a measure of the quality of a particular division of the network. The modularity is proportional to the number of links falling within clusters or groups minus the expected number in an equivalent network with links placed at random. Thus, if the number of links within a group is no better than random, the modularity is zero. A modularity approaching one reflects networks with strong community structure: a dense intra-group and a sparse inter-group connection pattern. If links are placed at random, then the expected number of links between node $i$ and node $j$ equals $\frac{d_id_j}{2L}$. The modularity $m$ is defined by Newman (2006) as
$$m = \frac{1}{2L}\sum_{i=1}^N\sum_{j=1}^N\left(a_{ij} - \frac{d_id_j}{2L}\right)1_{\{i \text{ and } j \text{ belong to the same cluster}\}} \tag{4.38}$$
We consider first a network partitioning into two clusters or subgraphs as in art. 96. The condition (indicator function) is rewritten in terms of the $y$ vector, defined in art. 96, as
$$1_{\{i \text{ and } j \text{ belong to the same cluster}\}} = \frac{1}{2}\left(y_iy_j + 1\right)$$
so that
$$m = \frac{1}{4L}\sum_{i=1}^N\sum_{j=1}^N\left(a_{ij} - \frac{d_id_j}{2L}\right)y_iy_j$$
because, by the basic law for the degree (2.3) and by (2.2),
$$\sum_{j=1}^N\left(a_{ij} - \frac{d_id_j}{2L}\right) = \sum_{j=1}^N a_{ij} - \frac{d_i}{2L}\sum_{j=1}^N d_j = 0 \tag{4.39}$$
Clearly, if there is only one partition to which all nodes belong, then $y = u$ and the modularity is $m = 0$, as follows from (4.39). After defining the symmetric modularity matrix $M$ with elements $m_{ij} = a_{ij} - \frac{d_id_j}{2L}$, such that
$$M = A - \frac{1}{2L}\,d\cdot d^T \tag{4.40}$$
we rewrite the modularity $m$, with respect to a partitioning into two clusters specified by the vector $y$, as a quadratic form
$$m = \frac{1}{4L}\,y^TMy$$
which is analogous to the number of links $R$ in (4.30) between the two disjoint partitions. Generally, for a partitioning of the network into $c$ clusters, instead of the vector $y$, the $N\times c$ community matrix $S$, defined in art. 14, can be used to rephrase the condition as
$$1_{\{i \text{ and } j \text{ belong to the same cluster}\}} = \sum_{k=1}^cS_{ik}S_{jk}$$
which leads to the matrix representation of the modularity
$$m = \frac{1}{2L}\sum_{i=1}^N\sum_{j=1}^N\sum_{k=1}^cS_{ik}\,m_{ij}\,S_{jk} = \frac{\operatorname{trace}\left(S^TMS\right)}{2L} \tag{4.41}$$
Eigenvalues of the Laplacian T
98
We define the community vector vn as the n-th column of the community matrix V, which specifies the n-th cluster: all components of vn , corresponding to nodes belonging to cluster Fn , are equal to one, otherwise they are zero. For f = 2 clusters, we note that the vector | = v1 v2 , and that, in case f = 2, only one vector su!ces for the partitioning, instead of v1 and v2 . Using the eigenvalue decomposition (art. 156) of the symmetric modularity matrix P = Z diag(m (P )) Z W , where Z is the orthogonal Q × Q matrix with the m-th eigenvector zm belonging to m (P ) in column m, the general spectral expression for the modularity p for any number of clusters f follows from (4.41) as ³¡ ´ ¢W trace Z W V diag (m (P )) Z W V p= 2O à f ! Q 1 X X ¡ W ¢2 zm vn (4.42) m (P ) = 2O m=1 n=1
¢
¡
PQ
W because Z W V mn = t=1 Ztm Vtn = zm vn . In particular, the scalar product P zmW vn = t5Fn (zm )t is the sum of those eigenvector components of zm that belong to cluster Fn . If we write the community vector as a linear combination of P the eigenvectors of P , vn = Q m=1 nm zm , then the orthogonality of eigenvectors indicates that the coe!cients equal nm = zmW vn . Moreover, art. 14 shows that the Pf vectors v1 > v2 > = = = > vf are orthogonal vectors, and, by definition, that n=1 vn = x. Since x is an eigenvector of P belonging to the zero eigenvalue as follows from (4.39), we observe that f X zmW vn = 0 n=1
provided the eigenvector zm 6= x. Using the Cauchy identity (8.86) Ã f !2 f f p1 X X X X¡ ¡ W ¢2 ¢2 zm vn zmW (vp vn ) zmW vn = f n=1
n=1
we find that 1 X p= 2Of m=1 Q
Ã
p=2 n=1
! f p1 X X¡ ¢2 W zm (vp vn ) m (P )
p=2 n=1
which reduces for f = 2 and | = v1 v2 to (4.60)³ below. ´ ¡ ¢W ¡ ¢ Since Z Z W = L (art. 151), we have that trace Z W V Z W V = trace V W V = Q (art. 14), such that we obtain a companion of (4.42) f Q X X ¡ W ¢2 zm vn = Q
(4.43)
m=1 n=1
Let zt =
sx Q
denote the eigenvector of P belonging to the eigenvalue t (P ) = 0,
4.4 The modularity and the modularity matrix P then
99
f f f X ¡ W ¢2 1 X ¡ W ¢2 1 X 2 zt vn = x vn = qn Q Q
n=1
n=1
n=1
where qn is the number of nodes in cluster Fn . Invoking the inequality (3.39) to (4.42) subject to (4.43) yields ³P ³P ¡ W ¢2 ´ ¡ W ¢2 ´ PQ f f m (P ) m (P ) m=1;m6=t n=1 zm vn n=1 zm vn max = 1 (P ) ¡ W ¢2 PQ Pf ¡ W ¢2 P f 1mQ m=1;m6=t n=1 zm vn n=1 zm vn 2O Q ,
from which we find, with H [G] =
a spectral upper bound for the modularity ! Ã f 1 (P ) 1 X 2 p qn (4.44) 1 2 H [G] Q n=1
This bound can also be written as p
1 (P ) H [G]
μ ¶ 1 f 1 2 Var [qF ] f Q
where qF is the number of nodes in an arbitrary cluster, because H [qF ] = Pf Q n=1 qn = f .
1 f
105. Upper bound for the modularity. The general definition (4.38) is first rewritten as follows. We transform the nodal representation to a counting over links o = l m such that Q X f Q X X dlm 1{l and m b elong to the same cluster} = 2 On l=1 m=1
n=1
where On is the number of links of cluster Fn , and the factor 2 arises from the fact that all links are counted twice, due the symmetry D = DW of the adjacency matrix. If we denote by Ointer the number of inter-community links, i.e. the number of links that are cut by partitioning the network into f communities or clusters, then O=
f X
On + Ointer
n=1
Similarly, Q Q X X
gl gm 1{l and
m b elong to the same cluster}
=
l=1 m=1
f X n=1
where GFn =
X
Ã
X
l5Fn
4 !3 f X X 2 gl C gm D = GF n m5Fn
n=1
gl
l5Fn
is the sum of the degrees of all nodes that belong to cluster Fn . Clearly, GFn 2On , because some nodes in cluster Fn may possess links connected to nodes in
Eigenvalues of the Laplacian T
100
Pf other clusters. The basic law of the degree (2.3) then shows that n=1 GFn = 2O. Substituting these expressions in the definition (4.38) leads to an alternative expression6 for the modularity Ã μ ¶2 ! f X GFn On p= (4.45) O 2O n=1 Pf Pf 2 Subject to the basic law of the degree, n=1 GFn = 2O, the sum n=1 GFn is 2O maximized when GFn = f for all 1 n f. Indeed, the corresponding Lagrangian à f ! f X X 2 L= GF + GFn 2O n n=1
n=1
where is a Lagrange multiplier, supplies the set of equations for the optimal Pf CL solution, CG = 2GFm + = 0 for 1 m f and CL n=1 GFn 2O = 0, which C = Fm 2 P 4O 2O 2 is satified for = f and GFm = f for all 1 m f. Hence, fn=1 GF (2O) f . n The modularity in (4.45) is minimized, for f A 1, if On = 0 for 1 n f and Pf 1 2 n=1 GFn is maximized such that p f . In conclusion, the modularity of any 1 graph is never smaller than 2 , and this minimum is obtained for the complete bipartite graph. Pf Invoking the Cauchy identity (8.86) and n=1 GFn = 2O, f X n=1
¢2 (2O)2 1 X X ¡ GFm GFn + f f m=2 f
2 GF = n
m1
n=1
results in yet another expression for the modularity ¶2 f m1 μ Ointer 1 1 X X GFm GFn p=1 O f f m=2 2O
(4.46)
n=1
Since the double sum is always positive, (4.46) provides us with an upper bound for the modularity, 1 Ointer p1 (4.47) f O The upper bound (4.47) is only attained if the degree sum of all clusters is the same. In addition, the upper bound (4.47) shows that p 1 and that a modularity of 1 is only reached asymptotically, when the number of clusters f $ 4 and Ointer = r (O), implying that the fraction of inter-community links over the total number of links O is vanishingly small for large ¯ graphs (Q ¯ $ 4 and O $ 4). Let GF = max{Fm >Fn } ¯GFm GFn ¯, then a lower bound of the modularity, deduced from (4.46), is μ ¶2 1 (f 1) GF Ointer (4.48) p1 O f 2 2O 6
Newman (2010) presents still another expression for the modularity.
4.4 The modularity and the modularity matrix P
101
Only if GF = 0, the lower bound (4.48) equals the upper bound (4.47) and the equality sign can occur. Excluding the case that GF = 0, then not all GFm are equal, and we may assume an ordering GF1 GF2 = = = GFf , with at least one strict inequality. We demonstrate that, for f A 2, not all dierences GFm GFn = GF A 0 for any pair (m> n). For, assume the contrary so that GF1 GF2 = GF2 GF3 = GF1 GF3 = GF A 0, then GF = GF1 GF3 = (GF1 GF2 ) + (GF2 GF3 ) = 2GF , which cannot hold for GF A 0. Hence, if GF A 0, the inequality in (4.48) is strict; alternatively, the lower bound (4.48) is not attainable in that case. In order for a network to have modular structure, the modularity must be positive. The requirement that the lower bound (4.48) is non-negative, supplies us with an upper bound for the maximum dierence GF in the nodal degree sum between two clusters in a “modular” graph s μ ¶ Ointer 2 1 GF 2O 1 (4.49) f1 O f For f A 1, (4.49) demonstrates that GF ? 2O. Ignoring the integer nature of f, the lower bound (4.48) is maximized with respect to the number of clusters f when s 2 2O s f = A 2 (4.50) GF resulting in p1
Ointer s 2 O
μ
GF 2O
¶ +
1 2
μ
GF 2O
¶2
F Theμright-hand ¶ side in this lower bound is positive provided that 1 A G2O A q s nter F . When this lower bound for G2O 2 1 Oi O is satisfied, the modularity p
is certainly positive, implying that the graph exhibits modular structure. 106. Spectrum of the modularity matrix P . Since the row sum (4.39) of the modularity matrix P is zero, which translates to P x = 0, the modularity matrix has a zero eigenvalue corresponding to the eigenvector x, similar to the Laplacian matrix (art. 2). Unlike the Laplacian T, the modularity matrix P always has negative eigenvalues. Indeed, from (8.7) and art. 25, the sum of the eigenvalues of P equals Q Q X 1 X 2 Q2 m (P ) = gm = (4.51) 2O m=1 Q1 m=1 where Qn is the total number of walks of length n (art. 33). With a little more eort, we find that Q X
Q ¢ ¢ ¡ ¡ ¢ 1 ¡ 1 X 2 2m (P ) = trace D2 trace DggW + gm trace g=gW 2 O (2O) m=1 m=1
Eigenvalues of the Laplacian T
102
Using (3.5) and art. 33, we have 42 3 μ ¶2 Q X 1 1 2Q3 Q2 2 W 2D C m (P ) = 2O g Dg + gm = Q1 + O 2O m=1 Q1 Q1 m=1
Q X
(4.52)
In general, P and D do not commute, similar to the fact that T and D also do not commute. Hence, art. 184 shows that the set of eigenvectors {zn }1nQ of P is dierent from the set of eigenvectors {{n }1nQ of D. 1 The eigenvalues of the modularity matrix P = D 2O g=gW are zeros of the characteristic polynomial μ ¶ 1 W det (P L) = det D L g=g 2O ¶ μ 1 1 W (D L) g=g = det (D L) det L 2O Using the “rank one update” formula (8.82), we have ¶ μ 1 W 1 g (D L) g det (P L) = det (D L) 1 2O
(4.53)
We invoke the resolvent (8.66) in art. 178 gW (D L)
1
g=
¢2 Q ¡ W X g {p p p=1
where {p is the eigenvector of D belonging to eigenvalue p . Since gW {p = xW D{p = p xW {p , we obtain ( ¡ ¢2 ) Q X 2p xW {p 1 1 W 1 g (D L) g = 1 2O 2O 2O p p=1 Using Q1 = 2O and (3.11),
¡ ¢2 ) Q ¡ W ¢2 X 2p xW {p p x {p p p=1 p=1 ¢2 Q ¡ X xW {p p = 2O p=1 p
1 1 W g (D L)1 g = 1 2O 2O
(
Q X
which can, in view of (3.17), be written in terms of the generating function QJ (}) of the total number of walks (art. 34). Thus, we arrive at7 μ ¶ μ ¶ 1 det (D L) QJ Q (4.54) det (P L) = 2O 7
Invoking (3.20) and fD () = det (D 3 L), another expression for the characteristic polynomial is det (P 3 L) = (31)Q fDf (3 3 1) 3 ( + Q) fD () 2O
4.4 The modularity and the modularity matrix P 103 ¡1¢ Since lim$0 QJ = 0, the characteristic polynomial (4.54) of P illustrates that = 0 is an eigenvalue of P , corresponding to the eigenvalue x as shown above. 1 W By a¡ same argument as in art. 180, the function j () = 1 2O g (D L)1 g = ¢ QJ 1 Q has simple zeros that lie in between two consecutive eigenvalues of the adjacency matrix D. In summary, the eigenvalues of the modularity matrix P interlace with the eigenvalues of adjacency matrix D: 1 (D) 1 (P ) 2 (D) 2 (P ) = = = Q (D) Q (P ). 107. Spectrum of the modularity matrix P for regular graphs. For regular graphs, where each node has degree u and Dx = ux (art. 41), we have that (D L) x = 1 1 (u ) x from which (D L) x = (u ) x. Substituted in (4.53) yields, with the degree vector g = u=x, ³ ´ u det (P L) = det (D L) 1 xW (D L)1 x Q ¶ μ u xW x = det (D L) = det (D L) 1 Q (u ) u After invoking the basic relation (8.5), we arrive at Y (n (D) ) u Q
det (P L) =
n=1
=
Q Y
(n (D) )
n=2
Hence, the eigenvalues of the modularity matrix P of a regular graph are precisely equal to the eigenvalues of the corresponding adjacency matrix D, except that the largest eigenvalue 1 (D) = u is replaced by the eigenvalue at zero. ¡ ¢ 108. The largest eigenvalue of the modularity matrix. Since QJ 1 Q A 0 in (4.54) for 1 as follows from (3.15) in art. 34, 1 (P ) 1 (D). This inequality is also found from the interlacing property of P and D derived in art. 106. We will show here that 1 (P ) ? 1 (D). Since = 0 is always an eigenvalue of P (art. 106), there cannot be a smaller largest eigenvalue than zero. The interlacing property bounds the largest eigenvalue from below, 1 (P ) 2 (D), and art. 58 demonstrates that all graphs have a non-negative second largest eigenvalue 2 (D) 0, except for the complete graph. The modularity matrix of the complete graph NQ is PNQ = Q1 M L, whose charQ Q 1 acteristic polynomial is det (P L) = (1) (1 + ) as follows from (5.1). This illustrates that the largest eigenvalue of the complete graph is 1 (PNQ ) = 0, which is also the smallest possible largest modularity eigenvalue of all graphs. The eigenvector z1 of P belonging to 1 (P ) has negative components (in contrast to the largest eigenvector {1 of D), because xW z1 = 0, which is similar to the eigenvectors of the Laplacian T (art. 67). The Rayleigh equation (8.28) and the
Eigenvalues of the Laplacian T
104
Rayleigh inequalities in art. 152 demonstrate that ¡ ¢2 z1W P z1 z1W Dz1 1 z1W g 1 ¡ W ¢2 z g = 1 (D) 1 (P ) = 2O z1W z1 2O 1 z1W z1 z1W z1
(4.55)
because z1W z1 = 1 as the orthogonal eigenvectors are normalized (art. 151). The scalar product z1W g is only zero for regular graphs, where each node has degree u, because the degree vector is g = u=x and z1W x = 0, provided z1 6= sxQ (as in the complete graph). However, art. 107 shows that the largest eigenvalue for regular graphs equals 1 (Pu ) = max (0> 2 (Du )) ? 1 (Du ), where the subscript u explicitly refers to regular graphs. Due to interlacing (art. 106), of all graphs, the regular graph has the smallest largest eigenvalue of the modularity matrix. Because the last term in the above upper bound is always strictly positive for nonregular graphs, we obtain the range of 1 (P ) for any graph: 0 1 (P ) ? 1 (D). In summary, the largest eigenvalue of the modularity matrix P is always strictly smaller than the largest eigenvalue of the corresponding adjacency matrix D. For non-regular graphs, the degree vector g is not proportional to the eigenvector x of P (art. 41) and z1W g 6= 0. We can write the degree vector g as a linear combination of the eigenvectors of P , g=
Q X
n zn
where n = gW zn
(4.56)
n=1
s Q t and t A n for 1 n 6= t Q PQ by a similar argument as in art. 38. In addition, we have gW g = n=1 n2 and ¢ ¡ 2 gW g Q 2O = Q Var[G] A 0, such that Q
Let zt =
sx , Q
then gW x = 2O =
1 1 ¡ W ¢2 Var [G] z g = 2O 1 H [G] 2O
Q X
n2
n=2;n6=t
Unfortunately, it is di!cult ¡ W ¢2to estimate the last sum in order to provide a good 2 lower bound for 1 = z1 g in (4.55). Next, we apply the Rayleigh principle to the adjacency matrix D, 1 (D) =
{W1 D{1 {W1 P {1 1 ¡ W ¢2 1 ¡ W ¢2 {1 g 1 (P ) + { g = + W W 2O 2O 1 {1 {1 {1 {1
(4.57)
Combining both Rayleigh inequalites (4.57) and (4.55), we obtain bounds for the dierence 1 (D) 1 (P ) A 0, 1 ¡ W ¢2 1 ¡ W ¢2 z1 g 1 (D) 1 (P ) { g 2O 2O 1 Since {W1 g = {W1 DW x = (D{1 )W x = 1 (D) {W1 x, we arrive from (4.57) at the lower bound ( ) ¡ W ¢2 {1 x 1 (D) 1 (P ) 1 (D) 1 2O
4.4 The modularity and the modularity matrix P 105 ¡ W ¢2 which is only useful when the term in brackets is positive, {1 x can be determined accurately8 and when the lower bound is larger than 2 (D). On the other hand, the scalar product {W1 g is maximal if {1 = s gW , such that, using (4.51), g g
Q X Q2 1 ¡ W ¢2 gW g {1 g = = m (P ) 2O 2O Q1 m=1
from which we obtain, together with (4.57), the upper bound 1 (D)
Q X
m (P )
m=2
109. Bounds for the largest eigenvalue of the modularity matrix. Since gW P g = ¡ W ¢2 Q2 1 g g = Q3 Q21 , we obtain with (4.56) gW Dg 2O Q22 X 2 = n n (P ) Q1 Q
Q3
(4.58)
n=1
As shown in Section 7.5, the sign of (4.58) determines whether a graph is assortative (positive sign) or disassortative (negative sign). Similarly, from gW P 2 g, we deduce that Q Q3 Q2 Q3 X 2 2 Q4 2 + 22 = n n (P ) Q1 Q1 n=1
By applying the inequality (3.39), we obtain PQ n2 n (P ) Q2 Q3 2 n (P ) = n=1 max n 2 = 1 (P ) PQ 2 1nq Q2 Q1 n n=1 n and Q3 Q4 2 + Q2 Q1
μ
Q2 Q1
¶2 21 (P )
Application of Laguerre’s Theorem 64, combined with art. 198 and trace relations (4.51) and (4.52), yields the rather complicated upper bound sμ ¶ μ ¶ μ ¶ 2 Q 1 1 Q2 Q 2Q3 Q2 1 (P ) + (4.59) Q1 Q Q1 Q Q1 Q 1 Q1 For regular graphs where Qn = Q un and 0 ? 1 (Pu ) = 2 (Du ), the bound (4.59) provides an upper bound for the second largest eigenvalue of the adjacency matrix, p u 1p 2 (Du ) + u (Q 1) Q 2 (Q + 1) u Q Q 8
2 Indeed, the bound {W $ Q, as shown in art. 38, leads to a lower bound 1 (D) 3 1x 1 (P), that is smaller than zero for non-regular graphs and, thus, useless.
2 1 (D) H[G]
$
106
Eigenvalues of the Laplacian T
For the complete graph NQ , where u = Q 1 and 1 (PNQ ) = 0, the bound (4.59) is exact. In view of the upper bound (4.44) for the modularity, the bound (4.59) is only useful when the right-hand side is smaller than the average degree H [G]. Numerical evaluations indicate that the bound (4.59) is seldom sharp. 110. Maximizing the modularity. Maximizing the modularity p consists of finding the best Q × f community matrix V in either definition (4.41) or (4.42). Numerous algorithms, that (approximately for f A 2) find the best community matrix V, exist, for which we refer to Newman (2010). Here, we concentrate on a spectral method. 1 W Starting from the quadratic form p = 4O | P | for the modularity, where the number of clusters f = 2, Newman (2006) mimics the method in art. 96 by writing PQ the vector | = m=1 m zm with m = | W zm as a linear combination of the orthogonal eigenvectors z1 > z2 > = = = > zQ of P , 1 X 2 m (P ) 4O m=1 m Q
p=
(4.60)
Maximizing the modularity p is thus equal to choosing the vector | as a linear combination of the few largest eigenvectors, such that components of | are either 1 and +1, which is di!cult as mentioned above in art. 96. ¯ Newman (2006) ¯ PQ ¯ ¯ proposes to maximize 1 = | W z1 and the maximum 1 = m=1 ¯(z1 )m ¯ is reached when each component |m = 1 if (z1 )m ? 0 or |m = 1 if (z1 )m 0. Moreover, using properties of norms (art. 162), we find that 1 = kz1 k1 kz1 k2 = 1, and by construction and the orthogonality of the eigenvectors, m ? kzm k1 . This separation of nodes into two partitions according to the sign of the vector components in the largest eigenvector z1 of P is similar in spirit to Fiedler’s algorithm (art. 103). Apart from the sign considered so far, a large eigenvector component contributes more to the modularity p in (4.60) than a small (in absolute value) component. Thus, the magnitude (in absolute value) of the components in z1 measure how firmly the corresponding node in the graph belongs to its assigned group, which is a general characteristic of a class of spectral measures called “eigenvalue centralities”, defined in Section 7.8.1. Notice that, since x is the eigenvector belonging to (P ) = 0, the trivial partition of whole the network in one group, is excluded from modularity (because (P ) = 0 does not contribute to the sum in (4.60)) and that any other eigenvector, due to the orthogonality (art. 151), must have at least one negative component. In contrast to the Fiedler partitioning based on the Laplacian, the situation where all non-zero eigenvalues of P are negative might occur (as in the complete graph, for example; art. 108), which indicates that there is no partition, except for the trivial one, and that the modularity p in (4.60) is negative. This observation is important: Newman (2006) exploits the fact that p ? 0 to not partition a (sub)network. 111. Newman’s iterated bisection. 
Let us now consider a network partitioning in more than two clusters or groups. Usually, as in Fiedler’s approach, we partition
4.4 The modularity and the modularity matrix P
107
the graph first into two subgraph, then apply Fiedler’s algorithm recursively to each subgraph, which is divided again into two parts, and so on. Newman (2006) remarks that deleting the links between the two partitions, and then applying the second iteration of partitioning into two parts, is not correct, because the modularity in (4.38) will change if links are deleted. Instead, Newman proposes to write the additional contribution p to the modularity upon further dividing a group j of Qj nodes into two as 3 4 X X 1 C plm |l |m plm D p = 4O l>m5j l>m5j 3 4 X X 1 1 W Cplm lm | Pj |j = pln D |l |m = (4.61) 4O l>m5j 4O j n5j
where the Qj × Qj symmetric matrix Pj has elements (Pj )lm indexed by the links (l> m) within the group j, X (Pj )lm = plm lm pln P
P
n5j
P We note that Pj x = m5j plm m5j lm n5j pln = 0 and that n5j pln = 0 for all l, only if j = J, because then P x = 0, implying that this sum vanishes in the first step of partitioning process (as in (4.38)). Hence, Pj formally possesses the same properties as P and we can apply the above spectral algorithm derived from (4.60) to further partition the group j into two parts, provided Pj has positive eigenvalues. Since j is a subgraph of J, the Interlacing Theorem 42 states that the eigenvalues of Pj interlace with those of P , such that subsequent divisions have a smaller impact on the modularity. As explained above, if Pj has nonpositive eigenvalues, the group j should not be divided, because the contribution p to the total modularity should be positive, else nothing is gained. The recursive subpartitioning of a group into two smaller groups is terminated if the eigenvalues of Pj are all non-positive, which is an elegant check in Newman’s partitioning algorithm. This stopping criterion is su!cient to determine indivisibility of a group, but it is not always necessary. Indeed, when there are a few, small positive and many large negative eigenvalues of Pj , the sum in (4.60) can be negative. We can guard against this possibility by just checking in (4.61) whether p 0. If p ? 0, we leave the corresponding subgraph undivided. As a result, the outcome of the algorithm gives subgraphs that are all indivisible, according to the modularity measure. Finally, we end by mentioning another algorithm that does not involve spectral analysis. Assume some initial split of the graph into two groups. We move a node from one group to the other, only if the resulting modularity increases, and we start preferably with the node whose move has the largest increase in p. We repeatedly move each node once in such way. To be sure that no greater modularity is possible P
108
Eigenvalues of the Laplacian T
after one entire round over all the nodes, we can repeat the process iteratively until the modularity cannot further be improved. The best partitioning results according to Newman (2006) seem to be achieved when first the spectral method above is followed up to some broad division of the network, which is then refined by the repeated move algorithm, due to Kerninghan and Lin (1970).
4.5 Bounds for the diameter 112. Diameter . Another noteworthy deduction from the Alon-Milman bound (4.33) is: Theorem 15 (Alon-Milman) The diameter of a connected graph is at most "s # 2gmax log2 Q + 1 (4.62) Q1 Proof: If E is the set of all nodes of J at a larger distance than k from D and D contains at least half of the nodes (d 12 ), then (4.36) gives μ r ¸¶ 1 Q 1 e exp ln (2) k 2 2gmax ³ h q i´ q Q 1 2gmax 1 If we require that exp ln (2) k 2g , then k Q Q1 log2 Q ? max hq i 2gmax 1 Q1 log2 Q + 1. By construction, for such k, e ? Q or E = B, which implies
that D = N . Next, if y 5 N , then the subset {yk } of nodes that is reached within k hops of node y contains more than Q@2 nodes. Indeed, suppose the converse and define D = N \ {yk }. Then d = D@Q A 12 . But, we have shown that, if k = , then D = N . This contradicts the hypothesis. Hence, all nodes in J are reached from an arbitrary node within k = hops, where is specified in (4.62). ¤ Theorem 16 (Van Dam-Haemers) The diameter of a connected graph is at most " # log 2 (Q 1) ¢ ¡s ¢ +1 ¡s (4.63) s s log 1 + Q 1 log 1 Q 1 Proof: The proof is based on the quotient matrix (art. 15) of a graph and on interlacing (art. 182). ¤ Theorem 17 (Mohar) The diameter of a connected graph is at most 6 5 Q log ³ 2 ´: 29 9 + : : 9 log ggmax max where is the isoperimetric constant.
(4.64)
4.6 Eigenvalues of graphs and subgraphs
109
Proof: Mohar (1989) considers the subsets Dx (u) = {y 5 N : g (y> x) u} at distance u of node x. The definition (4.37) shows that, for |Dx (u)|
£Q ¤ 2 ,
(|Dx (u)| + |Dx (u 1)|) CDx (u) + CDx (u 1) where CDx (u) contains all the links between the set Dx (u) \Dx (u 1) and the set Dx (u + 1) \Dx (u). Hence, CDx (u) + CDx (u 1) contains all links in two-hop shortest paths between the set Dx (u 1) \Dx (u 2) and the set Dx (u + 1) \Dx (u), which equals X CDx (u) + CDx (u 1) = gy gmax (|Dx (u)| |Dx (u 1)|) y5Dx (u)\Dx (u1)
Thus, (|Dx (u)| + |Dx (u 1)|) gmax (|Dx (u)| |Dx (u 1)|) £ ¤ from which, for |Dx (u)| Q2 , |Dx (u)| gmax + |Dx (u 1)| gmax Since |Dx (0)| = 1 (and |Dx (1)| = gx ), iterating yields ¶u μ gmax + |Dx (u)| gmax ¼ » £ ¤ log Q provided |Dx (u)| Q2 , which restricts umax log gmax2 + . This maximum ( gmax ) hopcount reaches half of the nodes. To reach also the other half of nodes in the complement, at most 2umax hops are needed, which proves (4.64). ¤
4.6 Eigenvalues of graphs and subgraphs 113. If J1 and J2 are link-disjoint graphs on the same set of nodes, then the union J = J1 ^ J2 possesses the adjacency matrix DJ = DJ1 + DJ2 and the Laplacian TJ = TJ1 + TJ2 . Art. 183 then states that, for each eigenvalue 1 n Q , Q (J1 ) + n (J2 ) n (J) n (J2 ) + 1 (J1 ) n (J2 ) n (J) n (J2 ) + 1 (J1 ) This shows that the Laplacian eigenvalues n (J) are non-decreasing if links are added in a graph, or, more generally, if J2 J and both have the same number of nodes, QJ2 = QJ , then n (J2 ) n (J). 114. The general result in art. 113 can be sharpened for the specific case of adding one link to a graph. If J + {h} is the graph obtained from J by adding a link, then the incidence matrix EJ+{h} consists of the incidence matrix EJ with one added
110
Eigenvalues of the Laplacian T
column containing the vector }, that has only two non-zero elements, 1 at row h+ and 1 at row h . Hence, W TJ+{h} = EJ EJ + }} W = TJ + }} W
and ¡ ¢ ¡ ¢ det TJ+{h} L = det TJ + }} W L ³ ´ = det (TJ L) det L + (TJ L)1 }} W Applying the “rank one update” formula (8.82) yields ³ ´ 1 1 det L + (TJ L) }} W = 1 + } (TJ L) } W The same argument as in art. 180 shows that the strictly increasing rational funcdet(TJ+{h} L ) only possesses simple poles and zeros that lie in between the tion det(T J L) poles. From the common zero Q (J) = Q (J + {h}) = 0 on, the function det(TJ+{h} L ) increases implying that first the pole at Q 1 (J) is reached bedet(TJ L) fore the zero at Q1 (J + {h}). Hence, we obtain m (J) m (J + {h}) m1 (J) for all 1 ? l Q and at l = 1, 1 (J) 1 (J + {h}). Comparing this bound for m = Q 1 with (4.29) in art. 95 yields Q 1 (J) Q 1 (J + {h}) min (Q 2 (J) > Q1 (J) + 2) 115. Laplacian spectrum of the cone of a graph. When a node n is added to a graph J, a similar analysis as in art. 60 applies. From the adjacency matrix (3.60), the corresponding Laplacian is " # TQ ×Q¡ + ¢diag (yl ) yQ×1 T(Q +1)×(Q +1) = y W 1×Q gn Let us confine ourselves to the special case y = x, where the new node n is connected to any other node in graph, thus forming the cone of the graph J.£ Let z be¤ the eigenvector of TQ ×Q belonging to Q 1 , then, for the vector } W = zW 0 , " # ¸ ¸ TQ + L y (Q 1 + 1) z z ×Q Q ×1 ¢ ¡ = T(Q+1)×(Q +1) } = y W 1×Q Q 0 yW z Only if y = x, in which case xW z = 0, then } is an eigenvalue of T(Q +1)×(Q+1) belonging to Q1 + 1. In fact, as shown by Das (2004), the entire spectrum can be deduced by considering the complement JfQ+1 of the cone of JQ . Since the cone node has degree Q , the complement JfQ +1 is disconnected. Theorem 10 states that the Laplacian of JfQ+1 has at least two eigenvalues fQ = fQ1 = 0, while art. 80 tells us that the
4.6 Eigenvalues of graphs and subgraphs
111
then shows remaining Laplacian eigenvalues of JfQ+1 are those ©of J¡fQ . Using (4.9)¢ª that the set of eigenvalues of the cone of a graph, m T(Q+1)×(Q +1) 1mQ +1 , are Q + 1, m (TQ×Q ) + 1 for 1 m Q 1> and zero. 116. Removal of a node. Let us consider the graph J\ {m} obtained by removing an arbitrary node m and its incident links from J. Art. 115 shows that the Laplacian eigenvalues of the cone of J\ {m} equal Q , 1 (J\ {m}) + 1, 2 (J\ {m}) + 1, . . . , Q 2 (J\ {m}) + 1 and 0. The original graph J is a subgraph of the cone of J\ {m}. Since the Laplacian eigenvalues are non-decreasing if links are added to the graph (art. 113), we conclude that, for all 1 m Q 2, m (J\ {m}) + 1 m+1 (J) 117. Vertex connectivity N (J). The vertex connectivity of a graph, N (J), is the minimum number of nodes whose removal (together with adjacent links) disconnects the graph J. The Rayleigh principle (art. 152) shows,¡for any other ¢connection vector y 6= x in art. 115, that } W T(Q +1)×(Q+1) } Q1 T(Q +1)×(Q+1) such that ¡ ¢ Q 1 T(Q +1)×(Q+1) Q 1 (TQ ×Q ) + 1 Repeating the argument gives ¡ ¢ Q1 T(Q+n)×(Q +n) Q 1 (TQ ×Q ) + n If N (J) = n, the above relation shows that Q 1 N (J)
(4.65)
of minimum Indeed, for a disconnected graph Q 1 (TQ ×Q )¡ = 0 and the addition ¢ (J) = n nodes connects the graph, i.e., Q1 T(Q+n)×(Q +n) A 0. 118. Edge connectivity L (J). The edge connectivity of a graph, L (J), is the minimum number of links whose removal disconnects the graph J. For any connected graph J, it holds that N (J) L (J) gmin (J)
(4.66)
Indeed, let us concentrate on a connected graph J that is not a complete graph. Since gmin (J) is the minimum degree of a node, say q, in J, by removing all links of node q, J is disconnected. By definition, since L (J) is the minimum number of links that leads to disconnectivity, it follows that L (J) gmin (J) and L (J) Q 2 because J is not a complete graph and consequently the minimum nodal degree is at most Q 2. Furthermore, the definition of L (J) implies that there exists a set V of L (J) links whose removal splits the graph J into two connected subgraphs J1 and J2 , as illustrated in Fig. 4.2. Any link of that set V connects a node in J1 to a node in J2 . Indeed, adding an arbitrary link of that set makes J again connected. But J can be disconnected into the same two connected subgraphs by removing nodes in J1 and/or J2 . Since possible disconnectivity inside
Eigenvalues of the Laplacian T
112
either J1 or J2 can occur before L (J) nodes are removed, it follows that N (J) cannot exceed L (J), which establishes the inequality (4.66).
G2
G1 C
A
B
Fig. 4.2. A graph J with Q = 16 nodes and O = 32 links. Two connected subgraphs J1 and J2 are shown. The graph’s connectivity parameters are N (J) = 1 (removal of node = 4. F), L (J) = 2 (removal of links from F to J1 ), gmin (J) = 3 and H [G] = 2O Q
Let us proceed to find the number of link-disjoint paths between D and E in a connected graph J. Suppose that K is a set of links whose removal separates D from E. Thus, the removal of all links in the set K destroys all paths from D to E. The maximum number of link-disjoint paths between D and E cannot exceed the number of links in K. However, this property holds for any set K, and thus also for the set with the smallest possible number of links. A similar argument applies to node-disjoint paths. Hence, we end up with Theorem 18: Theorem 18 (Menger’s Theorem) The maximum number of link- (node)-disjoint paths between D and E is equal to the minimum number of links (nodes) separating (or disconnecting) D and E. Recall that the edge connectivity L (J) (analogously vertex connectivity N (J)) is the minimum number of links (nodes) whose removal disconnects J. By Menger’s Theorem, it follows that there are at least L (J) link-disjoint paths and at least N (J) node-disjoint paths between any pair of nodes in J. 119. Edge connectivity L (J) and the algebraic connectivity Q1 . Fiedler (1973) has proved a lower bound for Q 1 in terms of the edge connectivity L (J). Theorem 19 (Fiedler) For any graph J with O links and Q nodes, ³ ´ Q 1 2L (J) 1 cos Q
(4.67)
1 T in Theorem Proof: Consider the symmetric, stochastic matrix S = L gmax Q 1 39. The spectral gap of S equals 1 2 (S ) = gmax , and is lower bounded in (8.55) by
#q (u (S )) 1 2 (S ) ¢ ¡ where #q ({) = 2{ 1 cos q for { 12 and the measure of irreducibility u (S ),
4.6 Eigenvalues of graphs and subgraphs
113
(J) . Indeed, by Merger’s Theorem 18, the defined in (8.52), equals u (S ) = gLmax maximum number of link-disjoint paths between node D and E equals the minimum number of links that separates D from E. Hence, there are at least L (J) linkdisjoint paths between any pair of nodes in J. ¤
We note that the function $\psi_n(x)$ in Theorem 39 provides a second bound
$$\mu_{N-1} \ge 2I(G)\left(\cos\frac{\pi}{N}-\cos\frac{2\pi}{N}\right) - 2d_{\max}\cos\frac{\pi}{N}\left(1-\cos\frac{\pi}{N}\right)$$
which is only better than (4.67) if and only if $2I(G) > d_{\max}$. Using (5.7) for a circuit $C$ shows that $\mu_{N-1}(C) = 2\left(1-\cos\frac{2\pi}{N}\right)$, while, for a path $P$, $\mu_{N-1}(P) = 2\left(1-\cos\frac{\pi}{N}\right)$ follows from (5.9). Also, $I(C) = K(C) = 2$ and $I(P) = K(P) = 1$, which shows that equality is achieved in the bound (4.67) for the path $P$. However, in most cases, as verified for example from Fig. 4.1, the lower bound (4.67) is rather weak.

120. Pendants in a graph. A node with degree one is called a pendant. Many complex networks possess pendants. If a connected graph $G$ has a pendant, then the second smallest eigenvalue $\mu_{N-1} \le 1$, as follows from (4.22) in art. 90.⁹ If the pendant is not adjacent to the highest degree node, then $\mu_{N-1} < 1$. The latter result is based on the observation that the complement $G^c$ of $G$ has at least one node of degree $N-2$, namely the complementary node of the pendant. This node is thus almost the corner node in the cone of $G^c$, and the star tree with this corner node in the center is a spanning tree $T$ if precisely one node at distance 2 of the corner node is added to that star. Since $T = K_{1,N-2} + \{e\}$, art. 114 shows that $\mu_1(T) > \mu_1(K_{1,N-2}) = N-1$, where $\mu_1(K_{1,N-2}) = N-1$ is computed in Section 5.7. Since the spanning tree $T$ is a subgraph of $G^c$, art. 113 implies that $\mu_1(G^c) \ge \mu_1(T) > N-1$, such that we arrive, with (4.9), at $\mu_{N-1}(G) < 1$.

⁹ Fiedler's general upper bound (4.20) for $\mu_{N-1}$ leading to (4.21) is not sharp enough to establish this result.

The following theorem is due to Das (2004):

Theorem 20 (Das) If $G$ is a connected graph with a Laplacian eigenvalue $0 < \mu < 1$, then the diameter of $G$ is at least 3.

Proof: Let $x$ denote the eigenvector belonging to $\mu$; then the equation for the $j$-th component is
$$\mu x_j = \sum_{k=1}^N q_{jk}x_k = d_jx_j - \sum_{k=1}^N a_{jk}x_k = d_jx_j - \sum_{k\in\mathrm{neighbor}(j)} x_k$$
If $x_j = \max_{1\le k\le N} x_k = x_{\max}$ is the largest component of the eigenvector $x$, and $x_{\min;j} = \min_{k\in\mathrm{neighbor}(j)} x_k$ with $\arg x_{\min;j} = k_j$, then
$$x_{\min;j} = (d_j-\mu)x_j - \sum_{k\in\mathrm{neighbor}(j)\setminus k_j} x_k \ge (1-\mu)x_j$$
because $\sum_{k\in\mathrm{neighbor}(j)\setminus k_j} x_k \le (d_j-1)x_{\max} = (d_j-1)x_j$. Similarly, if $x_j = \min_{1\le k\le N} x_k = x_{\min}$ is the smallest component of the eigenvector $x$, and $x_{\max;j} = \max_{k\in\mathrm{neighbor}(j)} x_k$ with $\arg x_{\max;j} = k_j$, then
$$x_{\max;j} = (d_j-\mu)x_j - \sum_{k\in\mathrm{neighbor}(j)\setminus k_j} x_k \le (1-\mu)x_j$$
because $\sum_{k\in\mathrm{neighbor}(j)\setminus k_j} x_k \ge (d_j-1)x_{\min} = (d_j-1)x_j$. These inequalities show that, if $0<\mu<1$, the eigenvector components corresponding to the neighbors of the node $j$ with largest (smallest) eigenvector component have the same sign as $x_j$. Art. 67 shows that the largest and smallest eigenvector components (for $\mu \ne \mu_N = 0$) must have a different sign. This implies that the nodes with largest and smallest eigenvector component are not neighbors (not directly connected), nor have neighbors in common. Since $G$ is connected, this means that the diameter of $G$ is at least 3. □
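The pendant bound of art. 120 and Theorem 20 can be illustrated numerically. The sketch below is a minimal check, assuming NumPy; the helpers `from_edges`, `laplacian`, `algebraic_connectivity` and `diameter`, as well as the three example graphs, are illustrative choices and not part of the text. For the path on four nodes, $\mu_{N-1} = 2\left(1-\cos\frac{\pi}{4}\right)\approx 0.586$ lies in $(0,1)$ and the diameter is 3; the star $K_{1,3}$, whose pendants are adjacent to the highest degree node, reaches $\mu_{N-1} = 1$; the hypothetical "lollipop" graph has a pendant that is not adjacent to the highest degree node.

```python
import numpy as np

def from_edges(N, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0..N-1."""
    A = np.zeros((N, N))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def laplacian(A):
    """Laplacian Q = Delta - A, with Delta the diagonal matrix of node degrees."""
    return np.diag(A.sum(axis=1)) - A

def algebraic_connectivity(A):
    """Second smallest Laplacian eigenvalue mu_{N-1} (eigvalsh returns ascending order)."""
    return np.linalg.eigvalsh(laplacian(A))[1]

def diameter(A):
    """Largest hopcount between any two nodes, via boolean reachability (small graphs only)."""
    N = len(A)
    step = (A > 0).astype(int)
    reach = np.eye(N, dtype=bool)                          # pairs at hopcount 0
    for d in range(N):
        if reach.all():
            return d
        reach = reach | ((reach.astype(int) @ step) > 0)   # extend every path by one hop
    raise ValueError("graph is not connected")

# Hypothetical examples: a path, a star K_{1,3}, and K_4 with a two-hop tail ("lollipop")
P4 = from_edges(4, [(0, 1), (1, 2), (2, 3)])
star = from_edges(4, [(0, 1), (0, 2), (0, 3)])
lollipop = from_edges(6, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])

for name, A in [("P4", P4), ("star", star), ("lollipop", lollipop)]:
    print(name, round(algebraic_connectivity(A), 4), diameter(A))
```

The diameter routine relies on repeated boolean matrix products, which is adequate for the tiny graphs here; a breadth-first search would scale better.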
5 Spectra of special types of graphs
This chapter presents spectra of graphs that are known in closed form.
5.1 The complete graph

The eigenvalues of the adjacency matrix of the complete graph $K_N$ are $\lambda_1 = N-1$ and $\lambda_2 = \cdots = \lambda_N = -1$. Since $K_N$ is a regular graph (art. 74), the eigenvalues of the Laplacian are, apart from $\mu_N = 0$, all equal to $\mu_j = N$ for $1\le j\le N-1$. The eigenvalues can be computed from the determinant in (8.2) in the same way as in art. 82. Alternatively, the adjacency matrix of the complete graph is $A_{K_N} = J - I$ and $J = u\cdot u^T$. A direct computation yields
$$\det(J - I - \lambda I) = \det\left(u\cdot u^T - (\lambda+1)I\right) = \left(-(\lambda+1)\right)^N\det\left(I - \frac{u\cdot u^T}{\lambda+1}\right)$$
Using (8.82) and $u^Tu = N$, we obtain
$$\det(J-I-\lambda I) = \left(-(\lambda+1)\right)^N\left(1-\frac{N}{\lambda+1}\right) = (-1)^N(\lambda+1)^{N-1}(\lambda+1-N)$$
from which the eigenvalues of the adjacency matrix of the complete graph $K_N$ are immediate. In summary,
$$\det(J - xI)_{n\times n} = (-1)^n x^{n-1}(x-n) \tag{5.1}$$
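A quick numerical sanity check of Section 5.1, given as a sketch that assumes NumPy (the function names are hypothetical): it builds $A_{K_N} = J - I$, compares its adjacency and Laplacian spectra with $\{N-1, [-1]^{N-1}\}$ and $\{0, [N]^{N-1}\}$, and evaluates the closed form (5.1) against a numerically computed determinant.

```python
import numpy as np

def complete_graph_spectra(N):
    """Adjacency and Laplacian eigenvalues (ascending) of K_N, using A = J - I."""
    A = np.ones((N, N)) - np.eye(N)
    Q = (N - 1) * np.eye(N) - A        # K_N is (N-1)-regular, so Q = (N-1) I - A
    return np.linalg.eigvalsh(A), np.linalg.eigvalsh(Q)

def det_J_minus_xI(n, x):
    """Closed form (5.1): det(J - x I) = (-1)^n x^(n-1) (x - n) for the n x n all-one matrix J."""
    return (-1) ** n * x ** (n - 1) * (x - n)

lam, mu = complete_graph_spectra(6)
print(np.round(lam, 10))   # -1 with multiplicity N - 1, then N - 1
print(np.round(mu, 10))    # 0 once, then N with multiplicity N - 1
```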
5.2 A small-world graph

In a small-world graph $G_{SWk}$, each node is placed on a ring as illustrated in Fig. 5.1 and has links to precisely $k$ subsequent neighbors and, by the cyclic structure of the ring, also to $k$ previous neighbors. The small-world graph has been proposed by Watts and Strogatz (1998), and is further discussed in Watts (1999), to study the effect of adding random links to a regular network or of rewiring links randomly. The thus modified small-world graphs are found to be highly clustered, like regular graphs. As mentioned in Section 1.3, depending on the rewiring process of links, typical paths may have a large hopcount, unlike in random graphs. The adjacency matrix $A_{SWk}$ is a symmetric circulant Toeplitz matrix, whose eigenvalue structure (eigenvalues and eigenvectors) can be determined exactly by the Fourier matrix.
Fig. 5.1. A Watts-Strogatz small-world graph $G_{SWk}$ with $k = 2$ is a regular graph with degree $d = 4$.
5.2.1 The eigenvalue structure of a circulant matrix

A circulant matrix $C$ is an $n\times n$ matrix with the form
$$C = \begin{bmatrix} c_0 & c_{n-1} & c_{n-2} & \cdots & c_1\\ c_1 & c_0 & c_{n-1} & \cdots & c_2\\ c_2 & c_1 & c_0 & \ddots & \vdots\\ \vdots & \vdots & \ddots & \ddots & c_{n-1}\\ c_{n-1} & c_{n-2} & c_{n-3} & \cdots & c_0\end{bmatrix}$$
Each column is precisely the same as the previous one, but the elements are shifted one position down and wrapped around at the bottom. In fact, $c_{jk} = c_{(j-k)\bmod n}$, which shows that diagonals parallel to the main diagonal contain the same elements. The elementary circulant matrix $E$ has all zero elements except for $c_1 = 1$,
$$E = \begin{bmatrix} 0 & 0 & 0 & \cdots & 1\\ 1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{bmatrix}$$
and represents a unit-shift relabeling transformation of nodes: $1\to2, 2\to3, \ldots, n\to1$. Thus, the unit-shift relabeling transformation, which is a particular example of a permutation (art. 10), maps the vector $x = (x_1, x_2, \ldots, x_n)$ into $Ex = (x_n, x_1, x_2, \ldots, x_{n-1})$. Applying the unit-shift relabeling transformation again maps $Ex$ into $E^2x = (x_{n-1}, x_n, x_1, \ldots, x_{n-2})$, which is a two-shift relabeling transformation, and
$$E^2 = \begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & 0\\ 0 & 0 & 0 & \cdots & 0 & 1\\ 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & 0 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & \cdots & 0 & 1 & 0 & 0\end{bmatrix}$$
Hence, we observe that $E^2$ equals the circulant matrix $C$ with all $c_j = 0$, except for $c_2 = 1$. In general, $E^k$ represents a $k$-shift relabeling transformation, where each node label $n_j\to n_{(j+k)\bmod n}$, and $E^k$ equals $C$ with all $c_j = 0$, except for $c_k = 1$. Alternatively, a general circulant matrix $C$ can be decomposed into elementary $k$-shift relabeling matrices $E^k$ (where $E^0 = I$) as
$$C = c_0I + c_1E + c_2E^2 + \cdots + c_{n-1}E^{n-1} = \sum_{k=0}^{n-1}c_kE^k$$
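Denoting the polynomial $p(x) = \sum_{k=0}^{n-1}c_kx^k$, we can write that $C = p(E)$. The eigenstructure of $E$ can be found quite elegantly. Indeed, $Ex = \lambda x$ is equivalent to solving the set, for both $\lambda$ and all components $x_k$ of $x$,
$$x_n = \lambda x_1,\quad x_1 = \lambda x_2,\quad x_2 = \lambda x_3,\quad \ldots,\quad x_{n-1} = \lambda x_n$$
After multiplying all equations, we find $\prod_{j=1}^n x_j = \lambda^n\prod_{j=1}^n x_j$, from which $\lambda^n = 1$ and $\lambda_k = e^{\frac{2\pi ik}{n}}$, for $k = 0, 1, \ldots, n-1$. The roots of unity $\lambda_k = e^{\frac{2\pi ik}{n}}$ obey $\lambda_k^* = e^{-\frac{2\pi ik}{n}} = \lambda_k^{-1}$ and $\lambda_k\lambda_k^* = |\lambda_k|^2 = 1$ and, thus, with (8.6) in art. 138, we obtain $\det E = \prod_{k=0}^{n-1}\lambda_k = (-1)^{n-1}$. Since any eigenvector is only determined apart from a scaling factor, we may choose $x_1 = \alpha$ and, after backsubstitution in the set, we find that $x_k = \alpha\lambda^{1-k}$ for all $k = 1, \ldots, n-1$ and $x_n = \lambda\alpha = \alpha\lambda^{1-n}$ because $\lambda^n = 1$. Thus, the eigenvector of $E$ belonging to the eigenvalue $\lambda_k$ equals $\alpha\left(1, \lambda_k^{-1}, \lambda_k^{-2}, \ldots, \lambda_k^{-n+1}\right)$, and the matrix $X$ containing the eigenvectors of $E$ as column vectors is, with $\omega = e^{-\frac{2\pi i}{n}}$,
$$X = \alpha\begin{bmatrix}1 & 1 & 1 & \cdots & 1\\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1}\\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{n-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & \omega^{n-1} & \omega^{n-2} & \cdots & \omega\end{bmatrix}$$
where $(X)_{kj} = \alpha\omega^{(k-1)(j-1)}$. We observe that $X^T = X$. If $x$ and $y$ are the eigenvectors belonging to eigenvalue $\lambda_k$ and $\lambda_j$, respectively, then the inner product $x^Hy$ (art. 150) is
$$x^Hy = \sum_{l=1}^n x_l^*y_l = \alpha^2\sum_{l=1}^n\left(\lambda_k^{1-l}\right)^*\lambda_j^{1-l} = \alpha^2\sum_{l=0}^{n-1}\left(e^{\frac{2\pi i(k-j)}{n}}\right)^l = \alpha^2\,\frac{1-e^{2\pi i(k-j)}}{1-e^{\frac{2\pi i(k-j)}{n}}}$$
Since $e^{2\pi im} = 1$ for any integer $m$, we have that $x^Hy = 0$ if $k\ne j$, and $x^Hy = n\alpha^2$ if $k = j$, which suggests the normalization $n\alpha^2 = 1$. Hence, with $\alpha = \frac{1}{\sqrt n}$, we have shown that $X$ is a unitary matrix (art. 151) that obeys $X^HX = XX^H = I$. The matrix $X$ is also called the Fourier matrix. The eigenvalue equation, written in terms of the matrix $X$, is $EX = X\Lambda$, where
$$\Lambda = \operatorname{diag}\left(1, e^{\frac{2\pi i}{n}}, e^{\frac{4\pi i}{n}}, \ldots, e^{\frac{2\pi ik}{n}}, \ldots, e^{\frac{2\pi i(n-1)}{n}}\right) = \operatorname{diag}\left(1, \omega^{-1}, \omega^{-2}, \ldots, \omega^{-(n-1)}\right)$$
Using the unitary property results, after left multiplication of both sides by $X^H$, in
$$X^HEX = \operatorname{diag}\left(1, \omega^{-1}, \omega^{-2}, \ldots, \omega^{-(n-1)}\right)$$
and
$$X^HE^kX = \operatorname{diag}\left(1, \omega^{-k}, \omega^{-2k}, \ldots, \omega^{-(n-1)k}\right)$$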
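Since $\det E\ne0$, the inverse $E^{-1}$ exists and is found as
$$E^{-1} = X\operatorname{diag}\left(1, \omega, \omega^2, \ldots, \omega^{n-1}\right)X^H$$
Explicitly,
$$\left(E^{-1}\right)_{kj} = \sum_{m=1}^n(X)_{km}\,\omega^{m-1}\left(X^H\right)_{mj} = \frac1n\sum_{m=1}^n\omega^{(k-1)(m-1)+(m-1)-(m-1)(j-1)} = \frac1n\sum_{m=0}^{n-1}e^{-\frac{2\pi i(k-j+1)m}{n}} = \frac1n\,\frac{1-e^{-2\pi i(k-j+1)}}{1-e^{-\frac{2\pi i(k-j+1)}{n}}} = 1_{\{k\equiv j-1 \pmod n\}}$$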
Thus, $E^{-1} = E^{n-1}$, which corresponds to a unit-shift relabeling transformation in the other direction: $1\to n, 2\to1, \ldots, n\to n-1$. Finally, the eigenvalue structure of a general circulant matrix $C = p(E) = \sum_{k=0}^{n-1}c_kE^k$ is
$$X^HCX = X^H\left(\sum_{k=0}^{n-1}c_kE^k\right)X = \sum_{k=0}^{n-1}c_kX^HE^kX = \sum_{k=0}^{n-1}c_k\operatorname{diag}\left(1, \omega^{-k}, \omega^{-2k}, \ldots, \omega^{-(n-1)k}\right)$$
$$= \operatorname{diag}\left(\sum_{k=0}^{n-1}c_k,\ \sum_{k=0}^{n-1}c_k\omega^{-k},\ \sum_{k=0}^{n-1}c_k\omega^{-2k},\ \ldots,\ \sum_{k=0}^{n-1}c_k\omega^{-(n-1)k}\right)$$
In terms of the polynomial $p(x)$, we arrive at the eigenvalue decomposition of a general circulant matrix $C$,
$$X^HCX = \operatorname{diag}\left(p(1), p\left(\omega^{-1}\right), p\left(\omega^{-2}\right), \ldots, p\left(\omega^{-(n-1)}\right)\right) \tag{5.2}$$
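The decomposition (5.2) can be verified directly on a small example. In this sketch (NumPy assumed; function names hypothetical), `circulant` builds $C$ from its first column using $c_{jk} = c_{(j-k)\bmod n}$, `fourier_matrix` builds $X$ with $\omega = e^{-2\pi i/n}$ and $\alpha = 1/\sqrt n$ (rows and columns indexed from 0), and `circulant_eigenvalues` evaluates $p(\omega^{-m})$ for $m = 0, \ldots, n-1$.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with entries C[j, k] = c[(j - k) mod n] (first column c)."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    j, k = np.indices((n, n))
    return c[(j - k) % n]

def fourier_matrix(n):
    """Unitary Fourier matrix with (X)_{jk} = omega^{jk} / sqrt(n), omega = exp(-2 pi i / n)."""
    omega = np.exp(-2j * np.pi / n)
    j, k = np.indices((n, n))
    return omega ** (j * k) / np.sqrt(n)

def circulant_eigenvalues(c):
    """p(omega^{-m}) for m = 0..n-1, with p(x) = sum_k c_k x^k, as in (5.2)."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    omega = np.exp(-2j * np.pi / n)
    # np.polyval expects the highest-degree coefficient first, hence c[::-1]
    return np.array([np.polyval(c[::-1], omega ** (-m)) for m in range(n)])

c = [4.0, 1.0, 0.0, 2.0, 3.0]
C, X = circulant(c), fourier_matrix(len(c))
D = X.conj().T @ C @ X            # expected to be diagonal, cf. (5.2)
print(np.round(np.diag(D), 6))
print(np.round(circulant_eigenvalues(c), 6))
```

Because $p(\omega^{-m}) = \sum_k c_k e^{2\pi imk/n}$, these eigenvalues coincide with $n$ times NumPy's inverse FFT of $(c_0, \ldots, c_{n-1})$, which is how one would compute them for large $n$.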
5.2.2 The spectrum of a small-world graph

The adjacency matrix $A_{SWk}$ of a small-world graph, where each node placed on a ring has links to precisely $k$ subsequent and $k$ previous neighbors, is a symmetric circulant matrix with $c_{N-j} = c_j$ (symmetry), $c_0 = 0$ and $c_j = 1_{\{j\in[1,k]\}}$ for $1\le j\le N-k-1$ (the remaining coefficients follow from symmetry), where $1_y$ is the indicator function. Since the degree of each node is $2k$ and the maximum possible degree is $N-1$, the value of $k$ is limited by $2k+1\le N$. The corresponding polynomial is denoted as $p_{SWk}(z) = \sum_{j=0}^{N-1}c_jz^j$. Since $c_0 = 0$, we have that
$$p_{SWk}(z) = \sum_{j=1}^{N-1}c_jz^j = \sum_{j=1}^ac_jz^j + \sum_{j=a+1}^{N-1}c_jz^j = \sum_{j=1}^ac_jz^j + \sum_{j=1}^{N-a-1}c_{N-j}z^{N-j}$$
By the symmetry $c_{N-j} = c_j$, the last sum becomes, for any integer $a\in[1,N-1]$,
$$p_{SWk}(z) = \sum_{j=1}^ac_jz^j + \sum_{j=1}^{N-a-1}c_jz^{N-j}$$
When choosing $a = k$, the bound $2k+1\le N$ implies that $N-a-1 = N-k-1\ge k$. Invoking $c_j = 1_{\{j\in[1,k]\}}$ and with $a = k$, we obtain
$$p_{SWk}(z) = \sum_{j=1}^kz^j + z^N\sum_{j=1}^kz^{-j} = z\,\frac{1-z^k}{1-z} + z^{N-1}\,\frac{1-z^{-k}}{1-z^{-1}}$$
Fig. 5.2. The complete spectrum (5.3) for $N = 101$. The $x$-axis plots all values of $k$ from 1 to $\frac{N-1}{2} = 50$, and for each $k$, all values $(\lambda_{SWk})_m$ with $1\le m\le N$ are shown.
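The spectrum of $A_{SWk}$ follows for $m = 1, \ldots, N$ from (5.2), with $\omega = e^{-\frac{2\pi i}{N}}$, as
$$(\lambda_{SWk})_m = p_{SWk}\left(\omega^{1-m}\right) = \omega^{1-m}\,\frac{1-\omega^{(1-m)k}}{1-\omega^{1-m}} + \omega^{(1-m)(N-1)}\,\frac{1-\omega^{(m-1)k}}{1-\omega^{m-1}}$$
$$= e^{\frac{2\pi i}{N}(m-1)}\,\frac{1-e^{\frac{2\pi i}{N}(m-1)k}}{1-e^{\frac{2\pi i}{N}(m-1)}} + e^{-\frac{2\pi i}{N}(m-1)}\,\frac{1-e^{-\frac{2\pi i}{N}(m-1)k}}{1-e^{-\frac{2\pi i}{N}(m-1)}} = 2\operatorname{Re}\left(e^{\frac{2\pi i}{N}(m-1)}\,\frac{1-e^{\frac{2\pi i}{N}(m-1)k}}{1-e^{\frac{2\pi i}{N}(m-1)}}\right)$$
After rewriting
$$\frac{1-e^{\frac{2\pi i}{N}(m-1)k}}{1-e^{\frac{2\pi i}{N}(m-1)}} = e^{\frac{\pi i}{N}(m-1)(k-1)}\,\frac{\sin\left(\frac{\pi(m-1)k}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)}$$
the eigenvalue with index $m$ of $A_{SWk}$ is
$$(\lambda_{SWk})_m = 2\operatorname{Re}\left(e^{\frac{\pi i}{N}(m-1)(k+1)}\,\frac{\sin\left(\frac{\pi(m-1)k}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)}\right) = 2\cos\left(\frac{\pi(m-1)(k+1)}{N}\right)\frac{\sin\left(\frac{\pi(m-1)k}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)}$$
$$= \frac{\sin\left(\frac{\pi(m-1)(2k+1)}{N}\right) - \sin\left(\frac{\pi(m-1)}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)}$$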
Finally, the unordered¹ eigenvalues of $A_{SWk}$ are, for $1\le m\le N$,
$$(\lambda_{SWk})_m = \frac{\sin\left(\frac{\pi(m-1)(2k+1)}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)} - 1 \tag{5.3}$$
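Formula (5.3) can be compared against a numerical eigensolver. The sketch below (NumPy assumed; function names hypothetical) builds $A_{SWk}$ on a ring and evaluates (5.3), replacing the removable singularity at $m = 1$ by its limit $2k$.

```python
import numpy as np

def small_world_adjacency(N, k):
    """A_SWk: each node on a ring linked to its k successors and k predecessors (2k+1 <= N)."""
    if 2 * k + 1 > N:
        raise ValueError("requires 2k + 1 <= N")
    A = np.zeros((N, N))
    for j in range(N):
        for s in range(1, k + 1):
            A[j, (j + s) % N] = A[(j + s) % N, j] = 1.0
    return A

def small_world_spectrum(N, k):
    """Unordered eigenvalues (5.3) for m = 1..N; the m = 1 entry is the limit 2k."""
    m = np.arange(2, N + 1)
    x = np.pi * (m - 1) / N
    return np.concatenate(([2.0 * k], np.sin((2 * k + 1) * x) / np.sin(x) - 1.0))

spec = small_world_spectrum(101, 5)
print(spec.max(), spec.min())      # the degree 2k, and the most negative eigenvalue
```

Sorting is needed when comparing with `np.linalg.eigvalsh`, because (5.3) lists the eigenvalues by the unordered index $m$, whereas `eigvalsh` returns them in ascending order.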
The complete spectrum for $N = 101$ is drawn in Fig. 5.2, which is representative for values of $N$ roughly above 50. Fig. 5.2 illustrates the spectral evolution (as a function of $k$ on the abscissa) from a circuit ($k = 1$) towards the complete graph ($k = \lfloor\frac{N-1}{2}\rfloor$). Applying $\sin(x+\pi n) = (-1)^n\sin x$, valid for any integer $n$, we find additional symmetry in the eigenvalue spectrum,
$$(\lambda_{SWk})_m = \frac{\sin\left(\frac{(2k+1)\pi(N-m+1)}{N}\right)}{\sin\left(\frac{\pi(N-m+1)}{N}\right)} - 1 = (\lambda_{SWk})_{N+2-m}$$
for $2\le m\le N$. In general, we cannot deduce more symmetry because, if $N$ is a prime, precisely $\lfloor\frac N2\rfloor + 1$ eigenvalues are distinct for any $k < \frac{N-1}{2}$. By Theorem 5, the diameter of $G_{SWk}$ is then at most $\lfloor\frac N2\rfloor$ when $N$ is prime. Fig. 5.3 reflects the irregular dependence of the number of different eigenvalues, which reminds us of the irregular structure of quantities in number theory, such as the number of divisors and the prime number factorization.

The Chebyshev polynomial of the second kind (Abramowitz and Stegun, 1968, Sections 22.3.7, 22.3.16), defined by
$$U_n(z) = \frac{\sin\left((n+1)\arccos z\right)}{\sin(\arccos z)} = \sum_{k=0}^{\lfloor n/2\rfloor}(-1)^k\frac{(n-k)!}{k!\,(n-2k)!}(2z)^{n-2k} \tag{5.4}$$
has zeros at $z_m = \cos\frac{m\pi}{n+1}$ for $m = 1, \ldots, n$, such that
$$U_n(z) = 2^n\prod_{m=1}^n\left(z-\cos\frac{m\pi}{n+1}\right) \tag{5.5}$$
In terms of the Chebyshev polynomial of the second kind, we can write (5.3) as
$$(\lambda_{SWk})_m = U_{2k}\left(\cos\left(\frac{\pi(m-1)}{N}\right)\right) - 1$$
For example, if $m = 1$ in (5.3), then $(\lambda_{SWk})_1 = 2k$. We know that $(\lambda_{SWk})_1$ is the largest eigenvalue because, for a regular graph, the maximum eigenvalue equals the degree (art. 41). Also, if $2k+1 = N$, in which case $A_{SWk} = J - I$, then (5.3) yields $(\lambda_{SWk})_m = -1$ for all $m\ne1$, because $\sin(\pi(m-1)) = 0$, while $(\lambda_{SWk})_1 = N-1$. Of course, this spectrum corresponds to that of the complete graph $K_N$, derived in Section 5.1. Since a small-world graph is a regular graph (art. 74), the Laplacian is $Q_{SWk} = 2kI - A_{SWk}$, and the corresponding unordered spectrum is
$$(\mu_{SWk})_{N+1-m} = 2k - (\lambda_{SWk})_m = 2k+1-\frac{\sin\left(\frac{\pi(m-1)(2k+1)}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)}$$
As Theorem 10 prescribes, there is precisely one zero eigenvalue $(\mu_{SWk})_N = 0$. Art. 74 demonstrates that $(\mu_{SWk})_{N-1}$ equals the spectral gap, the difference between the largest and second largest eigenvalue at each $k$, as illustrated in Fig. 5.2.

The largest negative eigenvalue (the one with the largest absolute value) among the $(\lambda_{SWk})_m$ occurs for $\frac{N}{2k+1} < m-1 < \frac{2N}{2k+1}$ (and, by the symmetry $m\to N+2-m$, for $\frac{(2k-1)N}{2k+1} < m-1 < \frac{2kN}{2k+1}$). Indeed, if we let $x = \frac{\pi(m-1)}{N}$ and $x\in[0,\pi)$, then the function $f(x) = \frac{\sin(2k+1)x}{\sin x} - 1$ has the same derivative as $\tilde f(x) = \frac{\sin(2k+1)x}{\sin x}$, which has zeros at $x = \frac{l\pi}{2k+1}\in[0,\pi)$ for $l = 1, 2, \ldots, 2k$. By Rolle's Theorem, $f'(x) = \tilde f'(x)$ has a zero between any two consecutive zeros of $\tilde f(x)$, since $\tilde f(x)$ is continuous and differentiable on $(0,\pi)$. Since $\sin x$ has the same sign for $x\in(0,\pi)$, the largest absolute values of $f(x)$ will occur near $x\to0$ and $x\to\pi$, where $\sin x$ has zeros. A good estimate for the value at which the largest negative eigenvalue occurs is the middle of the interval, hence, $m_{\min} = \left\lfloor\frac{3N}{2(2k+1)}\right\rfloor + 1$. The corresponding eigenvalue is, approximately,
$$(\lambda_{SWk})_{m_{\min}} \approx -\frac{1}{\sin\left(\frac{3\pi}{2(2k+1)}\right)} - 1 \le -2$$
Numerical values indicate that $m_{\min} = \left\lfloor\frac{3N}{2(2k+1)}\right\rfloor + 1$ is, in many cases, exact. Hence, the eigenvalues $\lambda_{SWk}$ of the adjacency matrix $A_{SWk}$ lie in the interval $\left[(\lambda_{SWk})_{m_{\min}}, 2k\right]$, and most of them lie in the interval $\left[(\lambda_{SWk})_{m_{\min}}, 0\right]$. This interval is, to a good approximation, independent of the size $N$ of the graph and only a function of the degree $2k$ of each node. The approximation fails for the complete graph $K_N$, when $2k+1 = N$ (and thus $N$ is odd) and $(\lambda_{SWk})_{m_{\min}} = -1$.

¹ The $m$-th ordered eigenvalue $(\lambda_{SWk})_{(m)}$, satisfying $(\lambda_{SWk})_{(m)}\ge(\lambda_{SWk})_{(m+1)}$ for $1\le m<N$, is not easy to determine, as Fig. 5.2 suggests.

[Figure 5.3: number of different eigenvalues versus the number of nodes $N$; insert: number of different eigenvalues versus $k$.]
Fig. 5.3. The number of different eigenvalues in $A_{SWk}$ as a function of $N$ for different values of $k = 1, 2, 3, 4, 5$ and $k = 10, 20, 30, 40, 50$. The insert shows the number of different eigenvalues for $N = 200$ versus $k$.

5.3 A circuit on $N$ nodes

A circuit $C$ is a ring topology where each node on a circle is connected to its previous and subsequent neighbor on the ring. Hence, the circuit is a special case of the small-world graph for $k = 1$. The adjacency matrix of the circuit is thus $A_C = E + E^{-1}$. The eigenvalues of the circuit follow directly from (5.3) as
$$(\lambda_C)_m = \frac{\sin\left(\frac{3\pi(m-1)}{N}\right)}{\sin\left(\frac{\pi(m-1)}{N}\right)} - 1$$
Using the identities $\sin3x = 3\sin x - 4\sin^3x$ and $1 - 2\sin^2x = \cos2x$ yields, for $m = 1, \ldots, N$,
$$(\lambda_C)_m = 2\cos\left(\frac{2\pi(m-1)}{N}\right) \tag{5.6}$$
which shows that $(\lambda_C)_m = (\lambda_C)_{N-m+2}$ and that $-2\le(\lambda_C)_m\le2$. The lower bound $-2$ is only attained when $N$ is even. We notice that the line graph of the circuit is the circuit itself: $l(C) = C$. Since the number of links $L = N$ in the circuit $C$, the prefactor $(\lambda+2)^{L-N}$ in the general expression (2.9) of the characteristic polynomial of a line graph reduces to 1. Nevertheless, only if $N$ is even, this line graph $l(C)$ still has an eigenvalue equal to $-2$, while all other eigenvalues are larger (art. 9). The corresponding Laplacian spectrum follows from art. 74 as
$$(\mu_C)_{N+1-m} = 2 - 2\cos\left(\frac{2\pi(m-1)}{N}\right), \qquad m = 1, \ldots, N \tag{5.7}$$
The characteristic polynomial of the circuit $C$ is
$$c_C(\lambda) = \prod_{m=1}^N\left(2\cos\left(\frac{2\pi(m-1)}{N}\right)-\lambda\right) = \prod_{m=1}^N\left(2\cos\left(\frac{2\pi m}{N}\right)-\lambda\right)$$
where the last equality follows from the periodicity $\cos\left(\frac{2\pi N}{N}\right) = \cos0$. The zeros of the Chebyshev polynomial $T_n(z)$ of the first kind (Abramowitz and Stegun, 1968, Sections 22.3.6, 22.3.15), defined by
$$T_n(z) = \cos(n\arccos z) = \frac n2\sum_{k=0}^{\lfloor n/2\rfloor}(-1)^k\frac{(n-k-1)!}{k!\,(n-2k)!}(2z)^{n-2k}$$
are $z_m = \cos\left(\frac{(2m-1)\pi}{2n}\right)$ for $m = 1, \ldots, n$, from which the product form
$$T_n(z) = 2^{n-1}\prod_{m=1}^n\left(z-\cos\left(\frac{(2m-1)\pi}{2n}\right)\right)$$
follows. On the other hand, the zeros of $\cos(n\arccos z) - 1 = 0$ are $z_m = \cos\left(\frac{2\pi m}{n}\right)$ for $m = 1, \ldots, n$. Thus,
$$c_C(\lambda) = \prod_{m=1}^N\left(2\cos\left(\frac{2\pi m}{N}\right)-\lambda\right) = 2^N\prod_{m=1}^N\left(\cos\left(\frac{2\pi m}{N}\right)-\frac\lambda2\right)$$
and, in terms of the Chebyshev polynomial $T_N(x)$ of the first kind,
$$c_C(\lambda) = 2(-1)^N\left(T_N\left(\frac\lambda2\right)-1\right)$$

5.4 A path of $N-1$ hops

A path consisting of $N-1$ hops has an adjacency matrix $A_P$ in which each row, except for the last one, has precisely one non-zero element in the upper triangular part. There exists a relabeling transformation that transforms the adjacency matrix $A_P$ of the path (line) on $N$ nodes into a tri-diagonal Toeplitz matrix, where each non-zero element of the upper triangular part appears on the line parallel to and just above the main diagonal. The eigenstructure of the general $N\times N$ tri-diagonal Toeplitz matrix
$$T = \begin{bmatrix} b & a & & & \\ c & b & a & & \\ & \ddots & \ddots & \ddots & \\ & & c & b & a\\ & & & c & b\end{bmatrix}$$
is computed in Van Mieghem (2006b, Section A.5.2.1). The matrix $T$ has $N$ distinct eigenvalues $\lambda_m$, for $1\le m\le N$,
$$\lambda_m = b + 2\sqrt{ac}\cos\left(\frac{m\pi}{N+1}\right)$$
The components $(x_m)_k$ of the eigenvector $x_m$ belonging to $\lambda_m$ are, for $1\le k\le N$,
$$(x_m)_k = \left(\frac ca\right)^{k/2}\sin\left(\frac{mk\pi}{N+1}\right)$$
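Both closed forms just above, the circuit characteristic polynomial in terms of $T_N$ and the tri-diagonal Toeplitz eigenpairs, can be checked on small instances. This sketch assumes NumPy and its `numpy.polynomial.chebyshev.chebval` for evaluating $T_N$; the helper names and the test values $a = 2$, $b = 0.5$, $c = 3$ are arbitrary illustrative choices. It verifies $TX = X\,\mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ for $ac > 0$ and compares $\det(A_C - \lambda I)$ with $2(-1)^N\left(T_N(\lambda/2)-1\right)$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def tridiagonal_toeplitz(N, a, b, c):
    """N x N matrix with b on the diagonal, a just above it and c just below it."""
    return b * np.eye(N) + a * np.eye(N, k=1) + c * np.eye(N, k=-1)

def toeplitz_eigenpairs(N, a, b, c):
    """Closed form for a*c > 0: lambda_m = b + 2 sqrt(ac) cos(m pi/(N+1)) and
    (x_m)_k = (c/a)^(k/2) sin(m k pi/(N+1)), m, k = 1..N; column m-1 of X holds x_m."""
    idx = np.arange(1, N + 1)
    lam = b + 2.0 * np.sqrt(a * c) * np.cos(idx * np.pi / (N + 1))
    k, m = np.meshgrid(idx, idx, indexing="ij")      # row index k, column index m
    X = (c / a) ** (k / 2.0) * np.sin(m * k * np.pi / (N + 1))
    return lam, X

def circuit_charpoly(N, lam):
    """c_C(lambda) = 2 (-1)^N (T_N(lambda/2) - 1), T_N the Chebyshev polynomial of the first kind."""
    return 2.0 * (-1) ** N * (cheb.chebval(lam / 2.0, [0] * N + [1]) - 1.0)

T = tridiagonal_toeplitz(7, 2.0, 0.5, 3.0)
lam, X = toeplitz_eigenpairs(7, 2.0, 0.5, 3.0)
print(np.abs(T @ X - X * lam).max())    # residual of T X = X diag(lambda)
```

In the checks, the circuit adjacency matrix $E + E^{-1}$ is built by rolling the identity matrix one position down and one position up.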
Since the eigenvalues are invariant under a similarity transform such as a relabeling transformation (art. 142), the complete eigenvalue and eigenvector system of $A_P$ follows, for $a = c = 1$ and $b = 0$, from the eigenstructure of the general $N\times N$ tri-diagonal Toeplitz matrix for $m = 1, \ldots, N$, as
$$(\lambda_P)_m = 2\cos\left(\frac{m\pi}{N+1}\right) \tag{5.8}$$
Formula (5.8) shows that $(\lambda_P)_m = -(\lambda_P)_{N+1-m}$ and that all eigenvalues of the $N-1$ hops path $P$ are strictly smaller than 2 in absolute value, $-2 < (\lambda_P)_m < 2$. The largest eigenvalue of the path $P$ is the smallest largest adjacency eigenvalue among all connected graphs, as proved by Lovász and Pelikán (1973). We provide another reasoning: Lemma 7 shows that a tree has the smallest $\lambda_1$, because it is the connected graph with the minimum number of links. Further, the tree with minimum maximum degree ($d_{\max} = 2$) and minimum degree variance is the path. According to the bound (3.45) and $L = N-1$ in any tree, the largest eigenvalue of the path satisfies
$$\sqrt{4-\frac6N} \le \lambda_1(P) \le 2$$
and the lower bound even tends to the upper bound for large $N$. Any other tree has a larger degree variance, thus a larger lower bound in (3.45), while also the upper bound $d_{\max} > 2$ is larger. The characteristic polynomial of the path is
$$c_P(\lambda) = \prod_{m=1}^N\left(2\cos\left(\frac{m\pi}{N+1}\right)-\lambda\right) = (-1)^NU_N\left(\frac\lambda2\right)$$
where the Chebyshev polynomial $U_N(x)$ of the second kind (5.5) has been used. After a suitable relabeling (as above), the Laplacian $Q_P$ of an $N-1$ hops path is, except for the first and last row, a Toeplitz matrix,
$$Q_P = \begin{bmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1\\ & & & -1 & 1\end{bmatrix}$$
We compute here the eigenstructure of $Q_P$ analogous to the derivation of the eigenstructure of the general $N\times N$ tri-diagonal Toeplitz matrix in Van Mieghem (2006b, Section A.5.2.1). An eigenvector $x$ corresponding to eigenvalue $\mu$ satisfies $(Q_P-\mu I)x = 0$ or, written per component,
$$\begin{aligned} (1-\mu)x_1 - x_2 &= 0\\ -x_{k-1} + (2-\mu)x_k - x_{k+1} &= 0, \qquad 2\le k\le N-1\\ -x_{N-1} + (1-\mu)x_N &= 0\end{aligned}$$
Consider the generating function $G(z) = \sum_{k=1}^Nx_kz^k$, where all $x_k$ are real because all eigenvalues of the Laplacian are real. Art. 67 demonstrates that $G(1) = 0$ when the eigenvector $x\ne\alpha u$; otherwise, the eigenvector $\alpha u$, where $\alpha$ is a suitable normalization, belongs to the eigenvalue $\mu = 0$, with $G(z) = \alpha\sum_{k=1}^Nz^k = \alpha z\frac{1-z^N}{1-z}$ and $G(1) = \alpha N$. After multiplying the $k$-th vector component equation by $z^k$ and summing over all $k\in[2,N-1]$, the above difference equation is transformed into
$$-z\sum_{k=1}^{N-2}x_kz^k + (2-\mu)\sum_{k=2}^{N-1}x_kz^k - z^{-1}\sum_{k=3}^Nx_kz^k = 0$$
and, in terms of $G(z)$,
$$-z\left(G(z) - x_{N-1}z^{N-1} - x_Nz^N\right) + (2-\mu)\left(G(z) - x_1z - x_Nz^N\right) - z^{-1}\left(G(z) - x_2z^2 - x_1z\right) = 0$$
Thus,
$$\left(z^2+(\mu-2)z+1\right)G(z) = x_{N-1}z^{N+1} + x_Nz^{N+2} + (\mu-2)x_1z^2 + (\mu-2)x_Nz^{N+1} + x_2z^2 + x_1z = (z-1)z\left(x_Nz^N - x_1\right)$$
where in the last step the first and last vector component equations have been used. Solving for $G(z)$ yields
$$G(z) = \frac{(z-1)z\left(x_Nz^N - x_1\right)}{z^2+(\mu-2)z+1} = \frac{x_N(z-1)z\left(z^N - \frac{x_1}{x_N}\right)}{(z-r_1)(z-r_2)}$$
where $r_1$ and $r_2$ are the roots of the polynomial $z^2+(\mu-2)z+1 = 0$, thus obeying $r_1+r_2 = 2-\mu$ and $r_1r_2 = 1$. Since $G(z)$ is a polynomial of order $N$, the zeros $r_1$ and $r_2$ must also be zeros of $(z-1)z\left(z^N - \frac{x_1}{x_N}\right)$. Thus, $r_1$ and $r_2$ must be either 0, 1 or $\left(\frac{x_1}{x_N}\right)^{1/N}e^{\frac{2\pi im}{N}}$ for $m = 0, 1, \ldots, N-1$. Since $r_1r_2 = 1$, neither $r_1$ nor $r_2$ can be zero. If $r_1 = 1$, then also $r_2 = 1$, in which case $\mu = 0$ and $x_1 = x_N$, as follows by raising $\left(\frac{x_1}{x_N}\right)^{1/N}e^{\frac{2\pi im}{N}} = 1$ to the power $N$. In that case, $G(z) = \frac{x_Nz\left(z^N-1\right)}{z-1} = x_N\sum_{k=1}^Nz^k$, such that the corresponding eigenvector is, indeed, the scaled all-one vector $\alpha u$ with $\alpha = x_N$. All positive eigenvalues $\mu > 0$ correspond to distinct zeros $r_1 = \left(\frac{x_1}{x_N}\right)^{1/N}e^{\frac{2\pi im}{N}}$ for $m = 0, 1, 2, \ldots, N-1$. But, since $r_2 = r_1^{-1}$, the zero $r_2$ must also be of this form, $r_2 = \left(\frac{x_1}{x_N}\right)^{1/N}e^{\frac{2\pi in}{N}}$ for some $0\le n\ne m\le N-1$. Thus, the product
$$r_1r_2 = \left(\frac{x_1}{x_N}\right)^{2/N}e^{\frac{2\pi i(m+n)}{N}} = 1$$
raised to the power $N$, shows that $\frac{x_1}{x_N} = \pm1 = e^{i\pi k}$, such that $r_1 = e^{\frac{i\pi(2m+k)}{N}}$ and $r_2 = e^{\frac{i\pi(2n+k)}{N}}$. Requiring that $r_2 = r_1^{-1}$ results in $r_1 = e^{\frac{i\pi(m-n)}{N}}$ and $r_2 = e^{\frac{i\pi(n-m)}{N}} = e^{-\frac{i\pi(m-n)}{N}}$. Now, $r_1$ changes with $m$, while $r_2$ changes with $n$. For each $m = 1, 2, \ldots, N-1$, there must correspond to $l = m-n$ in the exponent of $r_1$ a value $-l = -(m-n)$ in $r_2$, only by changing $n\ne m$, thus $n = m-l$. The integer $l$ ranges over $-(N-1)\le l\le N-1$, and to each $l$ there must correspond a $-l$. Hence, for $l = 1, 2, \ldots, N-1$, we finally find that
$$\mu_l = 2-(r_1+r_2) = 2-\left(e^{\frac{i\pi l}{N}} + e^{-\frac{i\pi l}{N}}\right) = 2\left(1-\cos\frac{\pi l}{N}\right) = 4\sin^2\left(\frac{\pi l}{2N}\right)$$
and to $l = 0$ corresponds the case $r_1 = r_2 = 1$ with $\mu_0 = 0$. In summary, the ordered Laplacian eigenvalues of the $N-1$ hops path are
$$(\mu_P)_{N-m} = 2\left(1-\cos\left(\frac{\pi m}{N}\right)\right), \qquad m = 0, 1, \ldots, N-1 \tag{5.9}$$
All Laplacian eigenvalues of the path are simple, while most of the circuit Laplacian eigenvalues in (5.7) have double multiplicity.

We now determine the eigenvectors $x_1, x_2, \ldots, x_{N-1}$ and use the notation $(x_m)_j$ for the $j$-th component of the eigenvector $x_m$. The eigenvector $x_m$ corresponding to $\mu_m = (\mu_P)_{N-m} > 0$ has the generating function $G_m(z) = \sum_{k=1}^N(x_m)_kz^k$, which equals
$$G_m(z) = \frac{(x_m)_N(z-1)z\left(z^N - \frac{(x_m)_1}{(x_m)_N}\right)}{\left(z-e^{\frac{i\pi m}{N}}\right)\left(z-e^{-\frac{i\pi m}{N}}\right)}$$
Invoking art. 210 for the polynomial
$$p_0(z) = (x_m)_N(z-1)z\left(z^N - \frac{(x_m)_1}{(x_m)_N}\right)$$
which is written as
$$p_0(z) = \sum_{j=0}^{N+2}a_jz^j = (x_m)_Nz^{N+2} - (x_m)_Nz^{N+1} - (x_m)_1z^2 + (x_m)_1z$$
yields
$$G_m(z) = \sum_{k=0}^N\left\{\sum_{j=k+1}^{N+2}a_j\,\frac{e^{(j-k-1)\frac{i\pi m}{N}} - e^{-(j-k-1)\frac{i\pi m}{N}}}{e^{\frac{i\pi m}{N}} - e^{-\frac{i\pi m}{N}}}\right\}z^k = \frac{1}{\sin\frac{\pi m}{N}}\sum_{k=0}^N\left\{\sum_{j=k+1}^{N+2}a_j\sin\left(\frac{\pi m}{N}(j-k-1)\right)\right\}z^k$$
from which, for $0\le k\le N$,
$$(x_m)_k = \frac{1}{\sin\frac{\pi m}{N}}\sum_{j=k+1}^{N+2}a_j\sin\left(\frac{\pi m}{N}(j-k-1)\right)$$
Since $(x_m)_0 = 0$, this component relation reduces to
$$0 = \sum_{j=1}^{N+2}a_j\sin\left(\frac{\pi m}{N}(j-1)\right) = (x_m)_N\sin\left(\frac{\pi m}{N}(N+1)\right) - (x_m)_1\sin\left(\frac{\pi m}{N}\right)$$
such that $(x_m)_N = (-1)^m(x_m)_1$. For $k > 0$, we have that
$$(x_m)_k = \frac{1}{\sin\frac{\pi m}{N}}\sum_{j=k+2}^{N+2}a_j\sin\left(\frac{\pi m}{N}(j-k-1)\right) = \frac{(x_m)_N}{\sin\frac{\pi m}{N}}\left\{\sin\left(\frac{\pi m}{N}(N-k+1)\right) - \sin\left(\frac{\pi m}{N}(N-k)\right)\right\}$$
$$= \frac{2(x_m)_N}{\sin\frac{\pi m}{N}}\cos\left(\frac{\pi m}{2N}(2N-2k+1)\right)\sin\left(\frac{\pi m}{2N}\right)$$
Using $(x_m)_N = (-1)^m(x_m)_1$, we finally find the $k$-th component of the eigenvector $x_m$ belonging to the eigenvalue $\mu_m = (\mu_P)_{N-m} > 0$,
$$(x_m)_k = \frac{(x_m)_1}{\cos\frac{\pi m}{2N}}\cos\left(\frac{\pi m}{2N}(2k-1)\right)$$
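A proper normalization of the eigenvectors, obeying $x_k^Tx_m = \delta_{km}$ as in art. 151, is readily obtained for $1\le m\le N-1$ as
$$(x_m)_k = \sqrt{\frac2N}\cos\left(\frac{\pi m}{2N}(2k-1)\right) \qquad\text{for } 1\le k\le N$$
Obviously, the eigenvector $x_N = \frac{u}{\sqrt N}$ belonging to $\mu_N = 0$ has the components
$$(x_N)_j = \frac{1}{\sqrt N} \qquad\text{for } 1\le j\le N$$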
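The path results of this section, namely (5.8), the characteristic polynomial in terms of $U_N$, (5.9) and the normalized Laplacian eigenvectors, can be checked numerically. This is a sketch assuming NumPy, with hypothetical helper names; $U_N$ is evaluated by the standard three-term recursion $U_{j+1}(x) = 2xU_j(x) - U_{j-1}(x)$ rather than by the sum in (5.4).

```python
import numpy as np

def path_adjacency(N):
    """Adjacency matrix of the path on N nodes (N - 1 hops), labeled along the path."""
    return np.eye(N, k=1) + np.eye(N, k=-1)

def chebyshev_u(n, x):
    """U_n(x) by the recursion U_{j+1} = 2x U_j - U_{j-1}, with U_0 = 1 and U_1 = 2x."""
    u_prev, u = 1.0, 2.0 * x
    if n == 0:
        return u_prev
    for _ in range(n - 1):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u

def path_laplacian_eigenpairs(N):
    """Column m (m = 0..N-1) is the eigenvector of mu = 2(1 - cos(pi m / N)), cf. (5.9):
    sqrt(2/N) cos(pi m (2k - 1) / (2N)) for m >= 1, and u / sqrt(N) for m = 0."""
    k = np.arange(1, N + 1)[:, None]
    m = np.arange(N)[None, :]
    mu = 2.0 * (1.0 - np.cos(np.pi * np.arange(N) / N))
    X = np.sqrt(2.0 / N) * np.cos(np.pi * m * (2 * k - 1) / (2 * N))
    X[:, 0] = 1.0 / np.sqrt(N)          # m = 0: normalized all-one vector
    return mu, X

N = 9
A = path_adjacency(N)
Q = np.diag(A.sum(axis=1)) - A
mu, X = path_laplacian_eigenpairs(N)
```

The eigenvector matrix is the orthonormal DCT-II basis, which is why `X.T @ X` equals the identity.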
5.5 A path of $h$ hops

A path of $h > 0$ hops (links) in a graph with $N$ nodes has $h$ non-zero rows with one non-zero element in the upper triangular part. After a similarity transform, the corresponding adjacency matrix can be transformed into
$$A_{h\text{-hop path}} = \begin{bmatrix}(A_P)_{(h+1)\times(h+1)} & O_{(h+1)\times(N-h-1)}\\ O_{(N-h-1)\times(h+1)} & O_{(N-h-1)\times(N-h-1)}\end{bmatrix}$$
where $A_P$ is the tri-diagonal Toeplitz adjacency matrix of an $h$ hops path in a graph with $h+1$ nodes. Invoking (5.8), the spectrum of an $h$ hops path possesses a zero eigenvalue of multiplicity $N-h-1$ and $h+1$ eigenvalues
$$(\lambda_{h\text{-hop path}})_k = 2\cos\left(\frac{k\pi}{h+2}\right)$$
for $k = 1, \ldots, h+1$.
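A sketch of the statement above, assuming NumPy (function names hypothetical): nodes $0, \ldots, h$ form the path and the remaining $N-h-1$ nodes are isolated, so the spectrum should consist of $N-h-1$ zeros together with $2\cos\frac{k\pi}{h+2}$ for $k = 1, \ldots, h+1$.

```python
import numpy as np

def h_hop_path_adjacency(N, h):
    """N nodes, of which nodes 0..h form a path of h hops; the other N - h - 1 nodes are isolated."""
    A = np.zeros((N, N))
    i = np.arange(h)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def h_hop_path_spectrum(N, h):
    """N - h - 1 zero eigenvalues plus 2 cos(k pi / (h + 2)) for k = 1..h+1, sorted ascending."""
    k = np.arange(1, h + 2)
    return np.sort(np.concatenate((np.zeros(N - h - 1), 2.0 * np.cos(k * np.pi / (h + 2)))))

print(h_hop_path_spectrum(10, 4))
```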
5.6 The wheel $W_{N+1}$

The wheel graph $W_{N+1}$ is the graph obtained by adding to the circuit graph one central node with links (spokes) to each node of the circuit. Thus, the wheel graph is the cone of the circuit graph. The adjacency matrix is a special case of art. 60,
$$A_W = \begin{bmatrix}A_C & u_{N\times1}\\ u^T_{1\times N} & 0\end{bmatrix}$$
Since $u$ is an eigenvector of $A_C$ belonging to $\lambda_C = 2$ (the circuit is a regular graph), all eigenvalues of $A_C$ are also eigenvalues of $A_{W_{N+1}}$, except for the largest eigenvalue $\lambda_C = 2$, which is replaced by two new ones, $1\pm\sqrt{1+N}$, as derived in art. 60. Hence, the spectrum of the wheel with $N+1$ nodes is $\left\{1-\sqrt{1+N},\ 2\cos\left(\frac{2\pi(m-1)}{N}\right) \text{ for } 2\le m\le N,\ 1+\sqrt{1+N}\right\}$.
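The wheel spectrum can be checked in the same way. In this sketch (NumPy assumed; function names hypothetical) the hub is node $N$; the third helper encodes the Laplacian spectrum given in the next paragraph, so that both spectra can be compared with a numerical eigensolver.

```python
import numpy as np

def wheel_adjacency(N):
    """W_{N+1}: circuit on nodes 0..N-1 plus a hub (node N) linked to every circuit node."""
    A = np.zeros((N + 1, N + 1))
    for j in range(N):
        A[j, (j + 1) % N] = A[(j + 1) % N, j] = 1.0
    A[N, :N] = A[:N, N] = 1.0
    return A

def wheel_adjacency_spectrum(N):
    """2cos(2 pi (m-1)/N) for m = 2..N, plus 1 -/+ sqrt(1 + N) replacing the circuit eigenvalue 2."""
    m = np.arange(2, N + 1)
    rest = 2.0 * np.cos(2.0 * np.pi * (m - 1) / N)
    return np.sort(np.concatenate((rest, [1.0 - np.sqrt(1.0 + N), 1.0 + np.sqrt(1.0 + N)])))

def wheel_laplacian_spectrum(N):
    """0, N + 1 and 3 - 2cos(2 pi (m-1)/N) for m = 2..N, sorted ascending."""
    m = np.arange(2, N + 1)
    return np.sort(np.concatenate(([0.0, N + 1.0], 3.0 - 2.0 * np.cos(2.0 * np.pi * (m - 1) / N))))

print(wheel_adjacency_spectrum(8))
```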
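The Laplacian spectrum follows from art. 115 and (5.7) as $(\mu_W)_{N+1} = 0$, $(\mu_W)_1 = N+1$ and
$$(\mu_W)_{N+2-m} = 3-2\cos\left(\frac{2\pi(m-1)}{N}\right), \qquad m = 2, \ldots, N$$

5.7 The complete bipartite graph $K_{m,n}$

The complete bipartite graph $K_{m,n}$ consists of two sets $\mathcal M$ and $\mathcal N$ with $m = |\mathcal M|$ and $n = |\mathcal N|$ nodes, respectively, where each node of one set is connected to all nodes of the other set. There are no links between nodes of the same set. The adjacency matrix of $K_{m,n}$ is, with $N = m+n$,
$$A_{K_{m,n}} = \begin{bmatrix}O_{m\times m} & J_{m\times n}\\ J_{n\times m} & O_{n\times n}\end{bmatrix}$$
and the characteristic polynomial is
$$\det\left(A_{K_{m,n}}-\lambda I\right) = \det\begin{bmatrix}-\lambda I_{m\times m} & J_{m\times n}\\ J_{n\times m} & -\lambda I_{n\times n}\end{bmatrix}$$
Invoking (8.79) and $J_{k\times n}J_{n\times l} = nJ_{k\times l}$ gives
$$\det\left(A_{K_{m,n}}-\lambda I\right) = (-\lambda)^m\det\left(-\lambda I_{n\times n} + \frac1\lambda J_{n\times m}J_{m\times n}\right) = (-\lambda)^m\det\left(\frac m\lambda J_{n\times n} - \lambda I_{n\times n}\right) = (-\lambda)^m\left(\frac m\lambda\right)^n\det\left(J_{n\times n} - \frac{\lambda^2}{m}I_{n\times n}\right)$$
Using (5.1), the characteristic polynomial of $K_{m,n}$ is
$$\det\left(A_{K_{m,n}}-\lambda I\right) = (-\lambda)^m\left(\frac m\lambda\right)^n(-1)^n\left(\frac{\lambda^2}{m}\right)^{n-1}\left(\frac{\lambda^2}{m}-n\right) = (-1)^{m+n}\lambda^{m+n-2}\left(\lambda^2-mn\right) = (-1)^N\lambda^{N-2}\left(\lambda^2-mn\right)$$
from which the eigenvalues² follow as $\lambda_{\max}$, $[0]^{N-2}$ and $-\lambda_{\max}$, with $\lambda_{\max} = \sqrt{mn}$. This spectrum reduces to that of a star topology $K_{1,n}$ for $m = 1$. The Laplacian of the complete bipartite graph $K_{m,n}$ is
$$Q_{K_{m,n}} = \begin{bmatrix}nI_{m\times m} & -J_{m\times n}\\ -J_{n\times m} & mI_{n\times n}\end{bmatrix}$$
and the characteristic polynomial is
$$\det\left(Q_{K_{m,n}}-\mu I\right) = \det\begin{bmatrix}(n-\mu)I_{m\times m} & -J_{m\times n}\\ -J_{n\times m} & (m-\mu)I_{n\times n}\end{bmatrix}$$
A derivation similar to the above results in
$$\det\left(Q_{K_{m,n}}-\mu I\right) = (m-\mu)^{n-1}(n-\mu)^{m-1}\left((m-\mu)(n-\mu) - nm\right)$$
The eigenvalues of $Q_{K_{m,n}}$ are $0$, $[m]^{n-1}$, $[n]^{m-1}$ and $m+n = N$. In the case of the star $K_{1,n}$, the eigenvalues of $Q_{K_{1,n}}$ are $0$, $[1]^{n-1}$ and $n+1$. The complexity $\xi(K_{m,n})$, the number of spanning trees in $K_{m,n}$, is found from (4.14) as
$$\xi(K_{m,n}) = \frac1N\prod_{k=1}^{N-1}\mu_k = m^{n-1}n^{m-1}$$
and clearly, for the star where $m = 1$, $\xi(K_{1,n}) = 1$.

² We denote the multiplicity $m$ of an eigenvalue $\lambda$ by $[\lambda]^m$.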
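The spectra of $K_{m,n}$ and its complexity can be checked numerically. This sketch assumes NumPy and uses the matrix-tree theorem in its cofactor form (the determinant of the Laplacian with one row and the corresponding column deleted), which is equivalent to $\frac1N\prod_{k=1}^{N-1}\mu_k$ in (4.14); the function names are hypothetical.

```python
import numpy as np

def complete_bipartite_adjacency(m, n):
    """A_{K_{m,n}} = [[O, J], [J, O]] with J the m x n all-one block."""
    A = np.zeros((m + n, m + n))
    A[:m, m:] = 1.0
    A[m:, :m] = 1.0
    return A

def spanning_tree_count(A):
    """Number of spanning trees: determinant of the Laplacian with row and column 0 deleted."""
    Q = np.diag(A.sum(axis=1)) - A
    return int(round(np.linalg.det(Q[1:, 1:])))

m, n = 3, 5
A = complete_bipartite_adjacency(m, n)
lam = np.linalg.eigvalsh(A)                          # -sqrt(mn), zeros, sqrt(mn)
mu = np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)
print(spanning_tree_count(A), m ** (n - 1) * n ** (m - 1))
```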
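5.8 A general bipartite graph

5.8.1 Undirected bipartite graph

Instead of connecting each of the $m\le n$ nodes in the set $\mathcal M$ to each of the $n$ nodes in the other set $\mathcal N$, we may consider an arbitrary linking between the two sets represented by a matrix $R_{m\times n}$, resulting in a general bipartite graph $B_{m,n}$ with adjacency matrix
$$A_{B_{m,n}} = \begin{bmatrix}O_{m\times m} & R_{m\times n}\\ R^T_{n\times m} & O_{n\times n}\end{bmatrix}$$
Using (8.80) when $m\le n$, the characteristic polynomial is
$$\det\left(A_{B_{m,n}}-\lambda I\right) = (-\lambda)^n\det\left(-\lambda I_{m\times m} + \frac1\lambda R_{m\times n}R^T_{n\times m}\right) = (-1)^n\lambda^{n-m}\det\left(R_{m\times n}R^T_{n\times m} - \lambda^2I_{m\times m}\right)$$
while, using (8.79) when $m>n$, we obtain
$$\det\left(A_{B_{m,n}}-\lambda I\right) = (-1)^m\lambda^{m-n}\det\left(R^T_{n\times m}R_{m\times n} - \lambda^2I_{n\times n}\right)$$
These two forms for $m\le n$ and $m>n$ are an illustration of Lemma 10. In the sequel, we confine ourselves to the case $m\le n$ without loss of generality. The singular value decomposition of $R$ is $R = U_{m\times m}(\Sigma_R)_{m\times n}V^T_{n\times n}$, where $\Sigma_R = \operatorname{diag}(\sigma_1, \ldots, \sigma_m, 0, \ldots, 0)$, because the rank of $R$ cannot be larger than $m\le n$, and where $U$ and $V$ are orthonormal matrices (art. 151). From $UU^T = I$ and $RR^T = U_{m\times m}\Sigma^2_{m\times m}U^T_{m\times m}$, we see that
$$\det\left(R_{m\times n}R^T_{n\times m} - \lambda^2I_{m\times m}\right) = \det\left(U_{m\times m}\left(\Sigma^2_{m\times m} - \lambda^2I_{m\times m}\right)U^T_{m\times m}\right) = \prod_{j=1}^m\left(\sigma_j^2-\lambda^2\right)$$
Hence, the characteristic polynomial of the general bipartite graph is
$$\det\left(A_{B_{m,n}}-\lambda I\right) = (-1)^n\lambda^{n-m}\prod_{j=1}^m\left(\sigma_j^2-\lambda^2\right)$$
which shows that, apart from the zero eigenvalues, the spectrum is completely determined by the singular values of $R$, because $\lambda = \pm\sigma_j$ for $j = 1, \ldots, m$. Since
$$A^2_{B_{m,n}} = \begin{bmatrix}R_{m\times n}R^T_{n\times m} & O_{m\times n}\\ O_{n\times m} & R^T_{n\times m}R_{m\times n}\end{bmatrix}$$
and, further, for any integer $k\ge1$,
$$A^{2k}_{B_{m,n}} = \begin{bmatrix}\left(R_{m\times n}R^T_{n\times m}\right)^k & O_{m\times n}\\ O_{n\times m} & \left(R^T_{n\times m}R_{m\times n}\right)^k\end{bmatrix}$$
and
$$A^{2k+1}_{B_{m,n}} = \begin{bmatrix}O_{m\times m} & \left(R_{m\times n}R^T_{n\times m}\right)^kR_{m\times n}\\ \left(R^T_{n\times m}R_{m\times n}\right)^kR^T_{n\times m} & O_{n\times n}\end{bmatrix}$$
the even powers $A^{2k}_{B_{m,n}}$ are reducible non-negative matrices (art. 167), while the odd powers again represent a "bipartite" matrix structure.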
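A small numerical illustration of Section 5.8.1, as a sketch assuming NumPy: for a hypothetical linking matrix $R$ with $m = 3$ and $n = 5$, the eigenvalues of $A_{B_{m,n}}$ should be $\pm\sigma_j$ together with $n-m$ zeros, $A^2_{B_{m,n}}$ should be block diagonal, and odd powers should keep the bipartite block structure.

```python
import numpy as np

def bipartite_adjacency(R):
    """A_{B_{m,n}} = [[O, R], [R^T, O]] for an m x n 0/1 linking matrix R."""
    m, n = R.shape
    return np.block([[np.zeros((m, m)), R], [R.T, np.zeros((n, n))]])

# A hypothetical linking between m = 3 and n = 5 nodes
R = np.array([[1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 0, 1, 1]], dtype=float)
A = bipartite_adjacency(R)
sigma = np.linalg.svd(R, compute_uv=False)                  # the m singular values of R
predicted = np.sort(np.concatenate((sigma, -sigma, np.zeros(R.shape[1] - R.shape[0]))))
A2 = A @ A                                                  # blocks R R^T and R^T R
print(np.round(predicted, 4))
```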
5.8.2 Directed bipartite graph

A general directed bipartite graph $BG$ has an adjacency matrix,
$$A_{BG} = \begin{bmatrix}O_{m\times m} & B_{m\times n}\\ C_{n\times m} & O_{n\times n}\end{bmatrix} \tag{5.10}$$
Any tree $T$ on $N = n+m$ nodes can be represented in the form of a levelset. Denote by $\left\{X_N^{(k)}\right\}$ the $k$-th level set of a tree $T$, which is the set of nodes in the tree $T$ at hopcount $k$ from the root in a graph with $N$ nodes, and by $X_N^{(k)}$ the number of elements in the set $\left\{X_N^{(k)}\right\}$. Then, we have $X_N^{(0)} = 1$, because the zeroth level can only contain the root node itself. For all $k > 0$, it holds that $0\le X_N^{(k)}\le N-1$ and that
$$\sum_{k=0}^{N-1}X_N^{(k)} = N \tag{5.11}$$
Nodes at a same level $k$ are not interconnected. Fig. 5.4 draws a tree organized per level.
[Figure 5.4: a tree rooted at node 1, drawn per level, with $X_N^{(0)} = 1$, $X_N^{(1)} = 5$, $X_N^{(2)} = 9$, $X_N^{(3)} = 7$ and $X_N^{(4)} = 4$ nodes.]
Fig. 5.4. An instance of a tree with $N = 26$ nodes organized per level $0\le k\le4$. The nodes in the tree are arbitrarily labeled.
The levelset can be folded level by level to form a general bipartite graph. Indeed, the root connects to the $X_N^{(1)}$ nodes at hop 1; those $X_N^{(1)}$ nodes are the ancestors of all the nodes on levelset $\left\{X_N^{(2)}\right\}$. We may arrange these $X_N^{(2)}$ nodes at the side of the root. Next, these $X_N^{(2)}$ nodes are the ancestors of all $X_N^{(3)}$ nodes, which we move to the other side, that of the $X_N^{(1)}$ nodes. In this way, all even levels are placed at the side of the root and all odd levels at the other side, thus creating a general directed bipartite graph. Hence, the adjacency matrix of any tree can be recast in the form of (5.10), where $m = \sum_{k=0}^{\lfloor\frac{N-1}{2}\rfloor}X_N^{(2k)}$ and $n = \sum_{k=0}^{\lfloor\frac{N-2}{2}\rfloor}X_N^{(2k+1)} = N-m$. In a stochastic setting, where $E\left[X_N^{(k)}\right] = N\Pr[H_N = k]$, we observe that the average multiplicity of the zero eigenvalue equals (assuming $n>m$)
$$E[n-m] = N\sum_{k=0}^{N-1}(-1)^{k+1}\Pr[H_N = k] = -N\varphi_{H_N}(-1)$$
where the probability generating function of the hopcount in a random tree is $\varphi_{H_N}(z) = E\left[z^{H_N}\right] = \sum_{k=0}^{N-1}\Pr[H_N = k]z^k$.

If $x^T = \begin{bmatrix}x_C^T & x_B^T\end{bmatrix}$ is an eigenvector belonging to eigenvalue $\lambda$, which means that
$$\begin{bmatrix}O_{m\times m} & B_{m\times n}\\ C_{n\times m} & O_{n\times n}\end{bmatrix}\begin{bmatrix}(x_C)_{m\times1}\\ (x_B)_{n\times1}\end{bmatrix} = \begin{bmatrix}Bx_B\\ Cx_C\end{bmatrix} = \lambda\begin{bmatrix}x_C\\ x_B\end{bmatrix}$$
then also $x^T = \begin{bmatrix}x_C^T & -x_B^T\end{bmatrix}$ is an eigenvector belonging to the eigenvalue $-\lambda$, which shows that the spectrum is symmetric around $\lambda = 0$. The same result can be derived from (8.79), analogously to the spectrum of $A_{B_{m,n}}$ above, as
$$\det\left(A_{BG}-\lambda I\right) = (-1)^n\lambda^{n-m}\det\left(B_{m\times n}C_{n\times m} - \lambda^2I_{m\times m}\right)$$
Hence, $A_{BG}$ has at least $n-m$ zero eigenvalues. Consequently, we have demonstrated:

Theorem 21 The spectrum of the adjacency matrix of any tree is symmetric around $\lambda = 0$ with at least $n-m$ zero eigenvalues.
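Theorem 21 and the folding of level sets can be illustrated on a random tree. This sketch assumes NumPy; the tree model (each new node attaches to a uniformly chosen earlier node), the seed and all names are hypothetical choices for illustration only. It computes the level of each node by breadth-first search, counts the nodes on even ($m$) and odd ($n$) levels, and the checks confirm that links only join levels of different parity, that the spectrum is symmetric around zero, and that there are at least $|n-m|$ zero eigenvalues.

```python
import numpy as np
from collections import deque

def random_recursive_tree(N, rng):
    """Hypothetical tree model: node j >= 1 links to a uniformly chosen node in 0..j-1."""
    A = np.zeros((N, N))
    for j in range(1, N):
        p = int(rng.integers(0, j))
        A[j, p] = A[p, j] = 1.0
    return A

def levels(A, root=0):
    """Hopcount of every node from the root, i.e., the index k of its level set X_N^(k)."""
    level = np.full(len(A), -1)
    level[root] = 0
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in np.flatnonzero(A[v]):
            if level[w] < 0:
                level[w] = level[v] + 1
                queue.append(w)
    return level

rng = np.random.default_rng(7)
A = random_recursive_tree(26, rng)
lev = levels(A)
m = int(np.sum(lev % 2 == 0))      # even levels: the side of the root
n = len(A) - m                     # odd levels: the other side
lam = np.linalg.eigvalsh(A)
print(np.bincount(lev), m, n)      # level sizes X_N^(k), and the sizes of both sides
```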
5.8.3 Symmetry in the spectrum of an adjacency matrix $A$

Theorem 21 can also be proven as follows. If the spectrum of $A$ is symmetric, the characteristic polynomial $c_A(x) = c_A(-x)$ is even, which implies that the odd coefficients $c_{2k+1}$ of the characteristic polynomial $c_A(x)$ are all zero. Art. 29 shows that the product $a_{1p_1}a_{2p_2}\cdots a_{kp_k}$ of a permutation $p = (p_1, p_2, \ldots, p_k)$ of $(1, 2, \ldots, k)$ in a tree is always zero for odd $k$. Hence, (3.4) indicates that the spectrum of the adjacency matrix of any tree is symmetric. Indeed, the only nonzero product $a_{1p_1}a_{2p_2}\cdots a_{kp_k}$ in a tree is obtained when a link is traversed in both directions. Loops longer than two hops are not possible due to the tree structure and the permutation requirement for the determinant. The latter only admits paths of $k$ hops as subgraphs $G_k$ in art. 29, because all first as well as all second indices need to be different in $a_{1p_1}a_{2p_2}\cdots a_{kp_k}$ (since $a_{jj} = 0$). A longer (even) loop containing more than two hops would visit the intermediate nodes of the $k$-hop path twice, which the determinant structure does not allow. The skewness, defined in art. 36, is zero for a tree, which again agrees with Theorem 21.

We have called the spectrum of a matrix $A$ symmetric around $\lambda = 0$ when, for each eigenvalue $\lambda = r > 0$ of $A$, there is an eigenvalue $\lambda = -r$ of $A$. The converse of Theorem 21 is:

Theorem 22 If the spectrum of an adjacency matrix $A$ is symmetric around $\lambda = 0$, then the corresponding graph is a bipartite graph.

Proof: Consider the adjacency matrix
$$A = \begin{bmatrix}O_{m\times m} & B_{m\times n}\\ C_{n\times m} & D_{n\times n}\end{bmatrix}$$
where $C = B^T$ if the graph is undirected. Any adjacency matrix can be written in this form for $m\ge1$, because $a_{jj} = 0$. The characteristic polynomial $\det(A-\lambda I)$ follows from (8.79) as
$$\det(A-\lambda I) = (-1)^m\lambda^{m-n}\det\left(\lambda D_{n\times n} - \lambda^2I_{n\times n} + C_{n\times m}B_{m\times n}\right)$$
The determinant on the right-hand side is only a symmetric polynomial in $\lambda$ if $D = O$. In that case, $A$ equals the adjacency matrix (5.10) of a bipartite graph. □
5.8.4 Laplacian spectrum of a tree
We have shown in Section 5.8.2 that any tree can be represented by a general bipartite graph after properly folding the levelsets. Art. 7 indicates that the unsigned incidence matrix $R$ and the incidence matrix $B$ satisfy $B^TB = R^TR$, provided the links are directed from a node at an even levelset $X_N^{(2k)}$ to a node at an odd levelset $X_N^{(2k+1)}$ (or all in the opposite direction), for all levels $0 \le k < N - 1$. Under this condition, the relation (2.10) in art. 9 applies such that, with $L = N - 1$ in any tree $T$,
$$\mu_j(T) = \lambda_j\left(A_{l(T)}\right) + 2 \tag{5.12}$$
for $1 \le j < N$ and $\mu_N(T) = 0$, as for any graph. Hence, the $j$-th Laplacian eigenvalue of a tree $T$ equals the $j$-th eigenvalue of the $(N-1)\times(N-1)$ adjacency matrix of its corresponding line graph $l(T)$, plus 2.

Petrović and Gutman (2002) have elegantly proven that the path (with $N-1$ hops) is the tree with the smallest largest Laplacian eigenvalue. Their arguments are as follows. When adding a link to a graph, the Laplacian eigenvalues are nondecreasing, as shown in art. 114, such that, among all connected graphs, some tree will have the smallest largest Laplacian eigenvalue, because a tree has the minimum number of links of any connected graph. Now, (5.12) shows that the smallest largest Laplacian eigenvalue among trees is attained in the tree whose line graph has the smallest largest adjacency eigenvalue. This line graph has $N-1$ nodes. The line graph of any tree on $N > 4$ nodes possesses cycles, except for the path $P_N$ (of $N-1$ hops). Any connected cycle-containing graph $G$ on $N-1$ nodes has a spanning tree that contains the minimum number of links and whose largest adjacency eigenvalue is smaller than $\lambda_1(G)$ by Lemma 7. As shown in Section 5.4, the path $P_{N-1}$ has the smallest largest adjacency eigenvalue among all trees on $N-1$ nodes. Since the line graph of the path $P_N$ is the path $P_{N-1}$, we conclude that the path $P_N$ has the smallest largest Laplacian eigenvalue among connected graphs. Combining (5.8) for $P_{N-1}$ and (5.12) yields
$$\mu_1(P_N) = 2\left(1 + \cos\frac{\pi}{N}\right)$$
which agrees, indeed, with (5.9).
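Relation (5.12) and the extremal property of the path can be checked numerically. In this minimal NumPy sketch (helper names are hypothetical), the line graph adjacency matrix is formed as $R^TR - 2I$ from the unsigned incidence matrix $R$, which for a tree has the same spectrum as $B^TB - 2I$. Random trees are generated by attaching each new node to a uniformly chosen earlier node, an arbitrary choice made only for testing.

```python
import numpy as np

def random_tree_edges(n_nodes, rng):
    """Edge list of a random tree: node i attaches to a uniformly chosen earlier node."""
    return [(int(rng.integers(0, i)), i) for i in range(1, n_nodes)]

def laplacian(n_nodes, edges):
    """Laplacian Q = Delta - A of an undirected simple graph."""
    A = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def line_graph_adjacency(n_nodes, edges):
    """A_{l(G)} = R^T R - 2I, with R the unsigned N x L incidence matrix."""
    R = np.zeros((n_nodes, len(edges)))
    for k, (u, v) in enumerate(edges):
        R[u, k] = R[v, k] = 1.0
    return R.T @ R - 2.0 * np.eye(len(edges))

def check_5_12(n_nodes, seed=0):
    """True if mu_j(T) = lambda_j(A_{l(T)}) + 2 for j < N and mu_N(T) = 0."""
    rng = np.random.default_rng(seed)
    edges = random_tree_edges(n_nodes, rng)
    mu = np.sort(np.linalg.eigvalsh(laplacian(n_nodes, edges)))[::-1]
    lam = np.sort(np.linalg.eigvalsh(line_graph_adjacency(n_nodes, edges)))[::-1]
    return bool(np.allclose(mu[:-1], lam + 2.0) and abs(mu[-1]) < 1e-9)

def path_mu1(n_nodes):
    """Largest Laplacian eigenvalue of the path P_N."""
    edges = [(i, i + 1) for i in range(n_nodes - 1)]
    return np.linalg.eigvalsh(laplacian(n_nodes, edges)).max()

def smallest_mu1_random_trees(n_nodes, trials, seed=0):
    """Smallest mu_1 over a sample of random trees (an illustration, not a proof)."""
    rng = np.random.default_rng(seed)
    return min(np.linalg.eigvalsh(laplacian(n_nodes, random_tree_edges(n_nodes, rng))).max()
               for _ in range(trials))
```

For example, `check_5_12(12)` should return `True`, and `path_mu1(10)` should agree with $2(1 + \cos(\pi/10))$. The random search only illustrates, and cannot prove, that no tree beats the path.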
5.9 Complete multi-partite graph
Instead of two partitions, we now consider the complete $m$-partite graph, where each partition of $k_j$ nodes with $1 \le j \le m$ is internally not connected, but fully connected to any other partition. The corresponding adjacency matrix is
$$A_{m\text{-partite}} = \begin{bmatrix} O_{k_1} & J_{k_1\times k_2} & \cdots & J_{k_1\times k_m} \\ J_{k_2\times k_1} & O_{k_2} & \cdots & J_{k_2\times k_m} \\ \vdots & \vdots & \ddots & \vdots \\ J_{k_m\times k_1} & J_{k_m\times k_2} & \cdots & O_{k_m} \end{bmatrix}$$
The complement of the $m$-partite graph is the union of $m$ cliques $K_{k_1}, K_{k_2}, \ldots, K_{k_m}$, whose spectrum is the union of the eigenvalues of each clique, given by (5.1). Thus, the eigenvalues of $A^c_{m\text{-partite}}$ are $\{k_j - 1\}_{1\le j\le m}$ and $[-1]^{N-m}$, where $N = \sum_{j=1}^m k_j$. As we will see below, but also from art. 40, the eigenvalues of $A^c_{m\text{-partite}}$ are not quite helpful to derive those of $A_{m\text{-partite}}$. If all $k_j = k$, then $A_{m\text{-partite}} = A_{K_m}\otimes J_{k\times k}$, whose spectrum follows from art. 185 as $\{\lambda_j(A_{K_m})\,\lambda_l(J_{k\times k})\}_{1\le j\le m,\,1\le l\le k}$, where, according to (5.1), $\lambda_j(A_{K_m}) \in \{m-1, [-1]^{m-1}\}$ and $\lambda_l(J_{k\times k}) \in \{k, [0]^{k-1}\}$. Thus, when all $k_j = k$ and $N = km$, the eigenvalues of $A_{m\text{-partite}}$ are $(m-1)k$, $[0]^{N-m}$ and $[-k]^{m-1}$.

The general case is obtained using the quotient matrix (art. 15). Since the row sum of each block matrix in $A_{m\text{-partite}}$ is the same, the partition is equitable, with corresponding quotient matrix
$$\left(A^\pi\right)_{m\text{-partite}} = \begin{bmatrix} 0 & k_2 & \cdots & k_m \\ k_1 & 0 & \cdots & k_m \\ \vdots & \vdots & \ddots & \vdots \\ k_1 & k_2 & \cdots & 0 \end{bmatrix} = (J - I)_{m\times m}\,\mathrm{diag}(k_j)$$
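The eigenvalues of $(A^\pi)_{m\text{-partite}}$ are the non-trivial eigenvalues of $A_{m\text{-partite}}$. The remaining $N - m$ eigenvalues of $A_{m\text{-partite}}$ are zero, because, only when $\lambda = 0$, the matrix $A_{m\text{-partite}} - \lambda I$ has in each block $k_j$ identical rows. The eigenvalues of $(A^\pi)_{m\text{-partite}}$ are obtained by subtracting the first row from all the others, which results in
$$\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right) = \det\begin{bmatrix} -\lambda & k_2 & k_3 & \cdots & k_m \\ k_1+\lambda & -(\lambda+k_2) & 0 & \cdots & 0 \\ k_1+\lambda & 0 & -(\lambda+k_3) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ k_1+\lambda & 0 & 0 & \cdots & -(\lambda+k_m) \end{bmatrix} = \det\begin{bmatrix} -\lambda & y^T \\ (k_1+\lambda)\,u & -\mathrm{diag}(\lambda+k_j) \end{bmatrix}$$
where the $(m-1)\times 1$ vector $y = (k_2, k_3, \ldots, k_m)$ and $u$ is the all-one vector. Using the Schur complement (8.80),
$$\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right) = (-1)^{m-1}\prod_{j=2}^m(\lambda+k_j)\left(-\lambda + (k_1+\lambda)\,y^T\mathrm{diag}\left(\frac{1}{\lambda+k_j}\right)u\right) = (-1)^m\prod_{j=2}^m(\lambda+k_j)\left(\lambda - (k_1+\lambda)\sum_{j=2}^m\frac{k_j}{\lambda+k_j}\right)$$
and the eigenvalues are the zeros of the polynomial of degree $m$ in $\lambda$
$$\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right) = (-1)^m\left(\lambda\prod_{j=2}^m(\lambda+k_j) - \sum_{l=2}^m k_l\prod_{j=1;\,j\ne l}^m(\lambda+k_j)\right)$$
When multiplying out, we find that all coefficients $c_j$ of the polynomial
$$\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right) = (-1)^m\sum_{j=0}^m c_j\lambda^j$$
are negative, except for $c_m = 1$ and $c_{m-1} = 0$. Explicitly, we have
$$c_0 = (-1)^m\det\left((A^\pi)_{m\text{-partite}}\right) = -(m-1)\prod_{j=1}^m k_j$$
and
$$c_j = -(m-j-1)\,e_{m-j}(k_1, k_2, \ldots, k_m)$$
for $0 \le j \le m-2$, where $e_k$ is the elementary symmetric polynomial (art. 200). Art. 218 demonstrates that $\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right)$ has only one positive zero, while all others are negative. The positive zero, which is the largest eigenvalue of both $A_{m\text{-partite}}$ and $(A^\pi)_{m\text{-partite}}$, is the unique, positive solution of
$$\sum_{j=1}^m\frac{1}{1 + \lambda/k_j} = 1 \tag{5.13}$$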
which shows that $(m-1)\min_{1\le j\le m}k_j \le \lambda \le (m-1)\max_{1\le j\le m}k_j$. The spectral gap of $A_{m\text{-partite}}$ is equal to the largest eigenvalue of $(A^\pi)_{m\text{-partite}}$, because $\lambda_2(A_{m\text{-partite}}) = 0$. Therefore, an explicit expression for the largest eigenvalue $\lambda_1(A_{m\text{-partite}})$ is desirable to estimate the influence of the partition sizes $k_j$ on the spectral gap. Below, we devote some effort and present two different expansions for $\lambda_1(A_{m\text{-partite}})$. If all $k_j = k$, then, as found above,
$$\det\left((A^\pi)_{m\text{-partite}} - \lambda I\right) = (-1)^m(\lambda+k)^{m-1}\left(\lambda - (m-1)k\right)$$
which reduces to the characteristic polynomial (5.1) of the complete graph $K_m$ if $k = 1$. The spectral gap is $mk - k = N - k$, which equals that of the complete graph $K_N$ minus $k$. When not all $k_j$ are equal, the spectral gap is smaller than $N - N/m$ (the value for equal sizes $k = N/m$), as verified from Lagrange optimization of (5.13) for all $k_j$ subject to $N = \sum_{j=1}^m k_j$. This underlines that regularity in a graph's structure scores highest in terms of robustness.

The Lagrange expansion (art. 234) of the zero $\zeta(z_0) = \lambda_1(A_{m\text{-partite}})$ of
$$f(z) = \sum_{j=1}^m\frac{1}{1 + z/k_j} - 1$$
around $z_0$, where $z_0 \ge (m-1)\min_{1\le j\le m}k_j$, follows from (9.42), in which
$$\frac{f_0(z_0)}{f_1(z_0)} = \frac{1 - \sum_{j=1}^m\frac{k_j}{k_j+z_0}}{\sum_{j=1}^m\frac{k_j}{(k_j+z_0)^2}}$$
and, for any integer $q > 0$,
$$\frac{f_q(z_0)}{f_1(z_0)} = (-1)^{q-1}\frac{\sum_{j=1}^m\frac{k_j}{(k_j+z_0)^{q+1}}}{\sum_{j=1}^m\frac{k_j}{(k_j+z_0)^2}}$$
Clearly, the closer $z_0$ is chosen to $\lambda_1(A_{m\text{-partite}})$, the smaller $\frac{f_0(z_0)}{f_1(z_0)}$ and the faster the Lagrange series converges (because $\left|\frac{f_q(z_0)}{f_1(z_0)}\right| < 1$ for any $q > 1$ and $z_0 \ge 0$). A reasonable choice is $z_0 = N$, although $z_0 = \frac{m-1}{m}N$ is better, as deduced below. Since the Lagrange series (9.42) up to five terms lacks elegance and insight with these $\frac{f_q(z_0)}{f_1(z_0)}$, we present another method.
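The quotient-matrix reduction and equation (5.13) lend themselves to a direct check. The NumPy sketch below (hypothetical helper names) builds $A_{m\text{-partite}}$, the quotient matrix $(J - I)\,\mathrm{diag}(k_j)$, and solves (5.13) by bisection on the bracket $[(m-1)\min k_j, (m-1)\max k_j]$ derived in the text. Bisection is used here for simplicity instead of the Lagrange series (9.42).

```python
import numpy as np

def multipartite_adjacency(sizes):
    """Adjacency matrix of the complete multi-partite graph with part sizes k_1..k_m."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    return (labels[:, None] != labels[None, :]).astype(float)

def quotient_matrix(sizes):
    """(A^pi) = (J - I) diag(k_j) for the equitable partition into the m parts."""
    k = np.asarray(sizes, dtype=float)
    m = len(k)
    return (np.ones((m, m)) - np.eye(m)) @ np.diag(k)

def largest_root_5_13(sizes, tol=1e-13):
    """Unique positive root of sum_j 1/(1 + lambda/k_j) = 1, found by bisection."""
    k = np.asarray(sizes, dtype=float)
    m = len(k)
    f = lambda lam: np.sum(1.0 / (1.0 + lam / k)) - 1.0  # decreasing in lam
    lo, hi = (m - 1) * k.min(), (m - 1) * k.max()      # bracket from the text
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

`np.poly(quotient_matrix(sizes))` returns the coefficients $c_m, c_{m-1}, \ldots, c_0$ of $\det(\lambda I - A^\pi) = \sum_j c_j\lambda^j$, so the sign pattern stated above ($c_m = 1$, $c_{m-1} = 0$, all others negative) can be inspected directly.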
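We can rewrite (5.13) as
$$\frac{\lambda}{m-1} = \frac{1}{\sum_{j=1}^m\frac{1}{\lambda+k_j}} \tag{5.14}$$
from which $\lambda > \frac{m-1}{\sum_{j=1}^m\frac{1}{k_j}}$. Iterating (5.14) once gives a sharper lower bound
$$\frac{\lambda}{m-1} > \frac{1}{\sum_{j_1=1}^m\frac{1}{k_{j_1} + \frac{m-1}{\sum_{j_2=1}^m\frac{1}{k_{j_2}}}}} = \frac{1}{\left(\sum_{j_2=1}^m\frac{1}{k_{j_2}}\right)\sum_{j_1=1}^m\frac{1}{m-1+\sum_{j_2=1}^m\frac{k_{j_1}}{k_{j_2}}}}$$
After iterating the equation $q$ times, we obtain a finite continued fraction expansion
$$\frac{\lambda}{m-1} > 1\Bigg/\sum_{j_1=1}^m\frac{1}{k_{j_1} + (m-1)\Big/\sum_{j_2=1}^m\frac{1}{k_{j_2} + \cdots + (m-1)\big/\sum_{j_q=1}^m\frac{1}{k_{j_q}}}}$$
that approaches $\lambda_1(A_{m\text{-partite}})$ arbitrarily closely from below for sufficiently large $q$.

Finally, for real positive numbers $a_1, a_2, \ldots, a_n$, the harmonic, geometric and arithmetic mean inequality (Hardy et al., 1999) is
$$\frac{n}{\sum_{j=1}^n\frac{1}{a_j}} \le \sqrt[n]{\prod_{j=1}^n a_j} \le \frac{1}{n}\sum_{j=1}^n a_j \tag{5.15}$$
with equality only if all $a_j$ are equal. Applied to (5.14), it yields
$$\frac{\lambda}{m-1} = \frac{1}{\sum_{j=1}^m\frac{1}{\lambda+k_j}} \le \frac{1}{m}\left(\frac{1}{m}\sum_{j=1}^m(\lambda+k_j)\right) = \frac{\lambda}{m} + \frac{N}{m^2}$$
whence
$$\lambda \le \frac{m-1}{m}N$$
Only when $m = N$ (in case all $k_j = 1$), the largest eigenvalue $\lambda_1(A_{m\text{-partite}})$ equals that of the complete graph.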
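The lower bounds obtained by iterating (5.14) and the upper bound $\frac{m-1}{m}N$ can be compared numerically. In this sketch (NumPy; helper names hypothetical), starting the iteration of (5.14) at $\lambda = 0$ reproduces the first bound $(m-1)/\sum_j k_j^{-1}$ after one step, and every further step evaluates one more level of the continued fraction.

```python
import numpy as np

def lower_bound_iterates(sizes, q):
    """Iterate (5.14), lambda <- (m-1) / sum_j 1/(lambda + k_j), starting from lambda = 0.

    Every iterate is a lower bound for lambda_1 of the complete m-partite graph; the
    first equals (m-1)/sum_j 1/k_j and the q-th is the q-level continued fraction.
    """
    k = np.asarray(sizes, dtype=float)
    m = len(k)
    lam, bounds = 0.0, []
    for _ in range(q):
        lam = (m - 1) / np.sum(1.0 / (lam + k))
        bounds.append(lam)
    return bounds

def upper_bound(sizes):
    """Harmonic-arithmetic mean bound: lambda_1 <= (m-1) N / m."""
    m, N = len(sizes), sum(sizes)
    return (m - 1) * N / m
```

The iterates increase monotonically, but the convergence is only linear, so many iterations may be needed when the partition sizes differ strongly.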
5.10 An $m$-fully meshed star topology
In the complete bipartite graph $K_{m,n}$, the $m$ star nodes are not interconnected among themselves. The opposite variant, which we now consider, is essentially $K_{m,n}$ where all nodes in the $m$ set are fully connected. We denote this topology by $G_{m\text{star}}$. The adjacency matrix of a graph of $m$ stars, in which node 1 up to node $m$ has degree $N-1$ while all other nodes have degree $m$, is
$$A_{m\text{star}} = \begin{bmatrix} (J-I)_{m\times m} & J_{m\times(N-m)} \\ J_{(N-m)\times m} & O_{(N-m)\times(N-m)} \end{bmatrix}$$
Observe that $A_{m\text{star}} = A_{K_{m,n}} + \check{A}_{K_m}$, where
$$\check{A}_{K_m} = \begin{bmatrix} (J-I)_{m\times m} & O_{m\times(N-m)} \\ O_{(N-m)\times m} & O_{(N-m)\times(N-m)} \end{bmatrix}$$
The characteristic polynomial is
$$\det(A_{m\text{star}} - \lambda I) = \det\begin{bmatrix} (J-(\lambda+1)I)_{m\times m} & J_{m\times(N-m)} \\ J_{(N-m)\times m} & -\lambda I_{(N-m)\times(N-m)} \end{bmatrix}$$
and this determinant will be solved in two ways, by applying (8.79) first and then (8.80). Applying (8.79) and using
$$X_{m\times m} = (J-(\lambda+1)I)_{m\times m} \tag{5.16}$$
gives
$$\det(A_{m\text{star}} - \lambda I) = \det(X)\det\left(-\lambda I - J_{(N-m)\times m}X^{-1}_{m\times m}J_{m\times(N-m)}\right)$$
We first need to compute the inverse of $X_{m\times m} = (J-(\lambda+1)I)_{m\times m}$, which we compute as $X^{-1} = \frac{\mathrm{adj}\,X}{\det X}$, where the adjoint matrix $\mathrm{adj}(A)$ is the transpose of the matrix of the cofactors of $A$. An inspection of the matrix $X$ shows that there are precisely two types of cofactors. The cofactor $\mathring{X}_{jj}$ of a diagonal element of $X$ equals
$$\mathring{X}_{jj} = \det\begin{bmatrix} -\lambda & 1 & \cdots & 1 \\ 1 & -\lambda & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & -\lambda \end{bmatrix} = \det(J-(\lambda+1)I)_{(m-1)\times(m-1)}$$
The off-diagonal cofactor $\mathring{X}_{ij}$ (with $i \ne j$) equals $(-1)^{i+j}$ times the determinant of the $(m-1)\times(m-1)$ matrix obtained from $X$ by deleting row $i$ and column $j$. In this matrix, the row stemming from row $j$ of $X$ and the column stemming from column $i$ of $X$ consist of all ones, while the remaining elements equal $-\lambda$ at the former diagonal positions of $X$ and 1 elsewhere. Subtracting the all-one row from all other rows leaves in each of these other rows a single non-zero element $-(\lambda+1)$, while the all-one column now has only one non-zero element, the 1 in the all-one row. Expanding along that column, the determinant equals, up to its sign, the minor $(-1)^{m-2}(\lambda+1)^{m-2}$; the sign follows from the check $X\,\mathrm{adj}(X) = \det(X)\,I$. Hence, all off-diagonal elements of the adjoint matrix are equal to $(-1)^{m-1}(\lambda+1)^{m-2}$, while, by (5.1), the diagonal elements equal $\mathring{X}_{jj} = (-1)^{m-1}(\lambda+1)^{m-2}(\lambda+2-m)$, so that
$$\mathrm{adj}(X) = (-1)^{m-1}(\lambda+1)^{m-2}J + \left((-1)^{m-1}(\lambda+1)^{m-2}(\lambda+2-m) - (-1)^{m-1}(\lambda+1)^{m-2}\right)I = (-1)^{m-1}(\lambda+1)^{m-2}\left(J + (\lambda+1-m)I\right)$$
and, since $\det X = (-1)^m(\lambda+1)^{m-1}(\lambda+1-m)$, the inverse matrix of $X$ is
$$X^{-1} = -\frac{1}{(\lambda+1)(\lambda+1-m)}\left(J + (\lambda+1-m)I\right)_{m\times m} \tag{5.17}$$
We now compute
$$Y = J_{(N-m)\times m}X^{-1}_{m\times m}J_{m\times(N-m)} = -\frac{1}{(\lambda+1)(\lambda+1-m)}J_{(N-m)\times m}\left(J_{m\times m} + (\lambda+1-m)I_{m\times m}\right)J_{m\times(N-m)}$$
Using $J_{k\times n}J_{n\times l} = nJ_{k\times l}$ gives
$$Y = -\frac{\left(mJ_{(N-m)\times m} + (\lambda+1-m)J_{(N-m)\times m}\right)J_{m\times(N-m)}}{(\lambda+1)(\lambda+1-m)} = -\frac{\left(m^2 + m(\lambda+1-m)\right)J_{(N-m)\times(N-m)}}{(\lambda+1)(\lambda+1-m)}$$
whence
$$Y = -\frac{m}{\lambda+1-m}J_{(N-m)\times(N-m)} \tag{5.18}$$
Combining all yields
$$\det(A_{m\text{star}} - \lambda I) = \det(J-(\lambda+1)I)_{m\times m}\det\left(\frac{m}{\lambda+1-m}J - \lambda I\right)_{(N-m)\times(N-m)} = \det(X)\left(\frac{m}{\lambda+1-m}\right)^{N-m}\det\left(J - \frac{\lambda(\lambda+1-m)}{m}I\right)$$
Finally, using (5.1) leads to
$$\det(A_{m\text{star}} - \lambda I) = (-1)^N(\lambda+1)^{m-1}(\lambda+1-m)\left(\frac{m}{\lambda+1-m}\right)^{N-m}\left(\frac{\lambda(\lambda+1-m)}{m}\right)^{N-m-1}\left(\frac{\lambda(\lambda+1-m)}{m} - N + m\right)$$
$$= (-1)^N(\lambda+1)^{m-1}\lambda^{N-m-1}\left(\lambda(\lambda+1-m) - m(N-m)\right) \tag{5.19}$$
$$= (-1)^N(\lambda+1)^{m-1}\lambda^{N-m-1}(\lambda - \lambda_-)(\lambda - \lambda_+)$$
where
$$\lambda_\pm = \frac{m-1}{2} \pm \sqrt{m(N-m) + \left(\frac{m-1}{2}\right)^2}$$
The eigenvalues of $A_{m\text{star}}$ are $\lambda_\pm$, $[-1]^{m-1}$, $[0]^{N-m-1}$, and $(\lambda_{\max})_{m\text{star}} = \lambda_+$, which is larger than $(\lambda_{\max})_{K_{m,n}} = \sqrt{m(N-m)}$, as was expected from Gerschgorin's Theorem 36. When viewing the complete spectrum, we observe (a) that the spectrum is not symmetric in $\lambda$ anymore for $m > 1$ and (b) that all other eigenvalues are very small, except for $\lambda_- = -|\lambda_-| < 0$, which is of order $O(\lambda_+) = O(\sqrt{mN})$. If $m = N-1$, the $m$star topology equals $K_N$. It is readily verified that, indeed, for $m = N-1$, the spectrum reduces to that of $K_N$. If $m = N-2$, then the $m$star topology equals $K_N$ minus one link, for which the eigenvalues are $\lambda_\pm$, $[-1]^{N-3}$, 0, and
$$(\lambda_{\max})_{m\text{star}} = \lambda_+ = \frac{N-3}{2}\left(1 + \sqrt{1 + \frac{8(N-2)}{(N-3)^2}}\right) = N - 1 - \frac{2(N^2-2N-1)}{(N-3)^3} + O\left(N^{-2}\right)$$
Hence, by deleting one link in the complete graph $K_N$, the spectral gap (art. 55) reduces from $N$ to $\lambda_+ < N-1$. The spectral gap of the complete multi-partite graph (Section 5.9) equals $N-2$ when $k = 2$ and $N = 2m$. In that case, $m$ links are removed from the complete graph $K_N$ in such a way that each node still has degree $N-2$.

The second, considerably more efficient way of computing $\det(A_{m\text{star}} - \lambda I)$ is based on (8.80),
$$\det(A_{m\text{star}} - \lambda I) = (-\lambda)^{N-m}\det\left((J-(\lambda+1)I)_{m\times m} + \frac{1}{\lambda}J_{m\times(N-m)}J_{(N-m)\times m}\right)$$
Using $J_{k\times n}J_{n\times l} = nJ_{k\times l}$ and (5.1) leads, after some manipulations, to (5.19). The first, elaborate computation supplies us with the matrices $X^{-1}$ in (5.17) and $Y$ in (5.18), which will be of use later in Sections 5.10.2 and 5.10.3.
The spectrum of $A_{m\text{star}}$ can be determined in yet another way³. Since $A_{m\text{star}}$ has $N-m$ identical rows, it has an eigenvalue 0 with multiplicity at least $N-m-1$. Further, since $A_{m\text{star}} + I$ has $m$ identical rows, it follows that $A_{m\text{star}}$ has an eigenvalue $-1$ with multiplicity at least equal to $m-1$. The remaining two eigenvalues are obtained after determining the eigenvectors, with constant components on each of the two node sets, that are orthogonal to the eigenvectors belonging to $\lambda = 0$ and to those belonging to $\lambda = -1$.
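The closed-form spectrum following from (5.19) is easy to verify against a numerical eigensolver. The NumPy sketch below (hypothetical helper names) builds $A_{m\text{star}}$ directly and also checks the expansion of $\lambda_+$ for $m = N-2$.

```python
import numpy as np

def mstar_adjacency(N, m):
    """A_mstar: the m star nodes form a clique and are joined to all N - m other nodes."""
    A = np.ones((N, N)) - np.eye(N)
    A[m:, m:] = 0.0  # the N - m non-star nodes are not interconnected
    return A

def mstar_spectrum(N, m):
    """Spectrum implied by (5.19): lambda_+, lambda_-, [-1]^(m-1) and [0]^(N-m-1)."""
    half = (m - 1) / 2.0
    root = np.sqrt(m * (N - m) + half ** 2)
    return np.sort(np.r_[half + root, half - root, -np.ones(m - 1), np.zeros(N - m - 1)])

def lambda_plus_expansion(N):
    """Expansion of lambda_+ for m = N - 2, without the O(N^-2) remainder."""
    return N - 1 - 2 * (N ** 2 - 2 * N - 1) / (N - 3) ** 3
```

The case $m = N-1$ reproduces the spectrum $N-1$, $[-1]^{N-1}$ of $K_N$, and $m = 1$ gives the star $K_{1,N-1}$ with eigenvalues $\pm\sqrt{N-1}$.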
5.10.1 Fully-interconnected stars linked to two separate groups
Instead of the $J_{m\times(N-m)}$ matrix in $A_{m\text{star}}$ of Section 5.10, a next step is to consider some matrix $B$. Thus, instead of connecting each of the $m$ fully interconnected stars to all other non-star nodes, each such star does not necessarily need to connect to all other nodes, but to a few. Let us consider
$$A_{lmn\text{star}} = \begin{bmatrix} D_{m\times m} & B_{m\times(N-m)} \\ B^T_{(N-m)\times m} & O_{(N-m)\times(N-m)} \end{bmatrix}$$
where
$$B = \begin{bmatrix} J_{n\times l} & O_{n\times(N-m-l)} \\ O_{(m-n)\times l} & J_{(m-n)\times(N-m-l)} \end{bmatrix}$$
which means that $n$ stars all reach the same $l$ nodes and $m-n$ stars all reach $N-m-l$ other nodes. The eigenvalue analysis is simplified if we consider $D = O$. Then, using (8.79) gives
$$\det(A_{lmn\text{star}} - \lambda I) = (-\lambda)^m\det\left(-\lambda I + \frac{1}{\lambda}B^TB\right)$$
where
$$B^TB = \begin{bmatrix} J_{l\times n} & O_{l\times(m-n)} \\ O_{(N-m-l)\times n} & J_{(N-m-l)\times(m-n)} \end{bmatrix}\begin{bmatrix} J_{n\times l} & O_{n\times(N-m-l)} \\ O_{(m-n)\times l} & J_{(m-n)\times(N-m-l)} \end{bmatrix} = \begin{bmatrix} nJ_{l\times l} & O_{l\times(N-m-l)} \\ O_{(N-m-l)\times l} & (m-n)J_{(N-m-l)\times(N-m-l)} \end{bmatrix}$$

³ This method was pointed out to me by E. van Dam.
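Thus,
$$e = \det\left(-\lambda I + \frac{1}{\lambda}B^TB\right) = \det\left(\frac{1}{\lambda}\left(B^TB - \lambda^2I\right)\right) = \lambda^{m-N}\det\left(B^TB - \lambda^2I\right)$$
$$= \lambda^{m-N}\det\left(nJ_{l\times l} - \lambda^2I\right)\det\left((m-n)J_{(N-m-l)\times(N-m-l)} - \lambda^2I\right)$$
$$= \lambda^{m-N}n^l(m-n)^{N-m-l}\det\left(J_{l\times l} - \frac{\lambda^2}{n}I\right)\det\left(J_{(N-m-l)\times(N-m-l)} - \frac{\lambda^2}{m-n}I\right)$$
With (5.1), we arrive at
$$\det(A_{lmn\text{star}} - \lambda I) = (-\lambda)^m\lambda^{m-N}n^l(m-n)^{N-m-l}(-1)^l\left(\frac{\lambda^2}{n}\right)^{l-1}\left(\frac{\lambda^2}{n} - l\right)(-1)^{N-m-l}\left(\frac{\lambda^2}{m-n}\right)^{N-m-l-1}\left(\frac{\lambda^2}{m-n} - (N-m-l)\right)$$
$$= (-1)^N\lambda^{N-4}\left(\lambda^2 - nl\right)\left(\lambda^2 - (N-m-l)(m-n)\right)$$
and the eigenvalues of $A_{lmn\text{star}}$ are $\pm\sqrt{nl}$, $\pm\sqrt{(N-m-l)(m-n)}$ and $[0]^{N-4}$. For $l = n = 0$, the spectrum reduces to that of $K_{m,N-m}$ (as it should, since then $A_{lmn\text{star}} = A_{K_{m,N-m}}$).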
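A NumPy check of the spectrum $\pm\sqrt{nl}$, $\pm\sqrt{(N-m-l)(m-n)}$, $[0]^{N-4}$ is sketched below; the helper names are hypothetical, the star nodes are placed first and, as in the text, $D = O$.

```python
import numpy as np

def lmnstar_adjacency(N, m, n, l):
    """D = O; n of the m star nodes link to the same l nodes, the other m - n stars
    link to the remaining N - m - l non-star nodes (block matrix B in the text)."""
    B = np.zeros((m, N - m))
    B[:n, :l] = 1.0
    B[n:, l:] = 1.0
    A = np.zeros((N, N))
    A[:m, m:] = B
    A[m:, :m] = B.T
    return A

def lmnstar_spectrum(N, m, n, l):
    """Closed-form spectrum: +-sqrt(nl), +-sqrt((N-m-l)(m-n)) and N - 4 zeros."""
    s1, s2 = np.sqrt(n * l), np.sqrt((N - m - l) * (m - n))
    return np.sort(np.r_[s1, -s1, s2, -s2, np.zeros(N - 4)])
```

The choice $n = l = 0$ turns $B$ into $J_{m\times(N-m)}$ and therefore recovers $K_{m,N-m}$.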
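5.10.2 Star-like, two-hierarchical structure
We compute the spectrum of a classical star-like, two-hierarchical telephony network where
$$A_{m\text{doublestar}} = \begin{bmatrix} (J-I)_{m\times m} & B_{m\times(N-m)} \\ B^T_{(N-m)\times m} & O_{(N-m)\times(N-m)} \end{bmatrix}$$
where
$$B = \begin{bmatrix} 1\cdots1 & 0\cdots0 & \cdots & 0\cdots0 \\ 0\cdots0 & 1\cdots1 & \cdots & 0\cdots0 \\ \vdots & \vdots & \ddots & \vdots \\ 0\cdots0 & 0\cdots0 & \cdots & 1\cdots1 \end{bmatrix} = I_{m\times m}\otimes u_{1\times l}$$
with $u_{1\times l}$ the $l$-component all-one vector, and the Kronecker product is defined in art. 185. Thus, the dimension of $B$ is $m\times lm$ and $N - m = lm$, and the number of nodes in $A_{m\text{doublestar}}$ is $N = (l+1)m$. All $m$ fully interconnected nodes ($D_{m\times m} = (J-I)_{m\times m}$) may represent the highest level core in a telephony network. Each of these $m$ nodes connects to $l$ different lower level nodes, the local exchanges, in the telephony network. Applying (8.79) and denoting $X_{m\times m} = (J-(\lambda+1)I)_{m\times m}$, the characteristic polynomial is
$$\det(A_{m\text{doublestar}} - \lambda I) = \det(X)\det\left(-\lambda I - B^T_{(N-m)\times m}X^{-1}_{m\times m}B_{m\times(N-m)}\right)$$
In Section 5.10, the inverse of $X_{m\times m} = (J-(\lambda+1)I)_{m\times m}$ is computed in (5.17). Thus,
$$B^T_{(N-m)\times m}X^{-1}_{m\times m}B_{m\times(N-m)} = -\frac{B^T_{(N-m)\times m}\left(J + (\lambda+1-m)I\right)_{m\times m}B_{m\times(N-m)}}{(\lambda+1)(\lambda+1-m)}$$
Further, with
$$J_{m\times m}B = J_{m\times m}(I_{m\times m}\otimes u_{1\times l}) = (J_{m\times m}\otimes u_{1\times1})(I_{m\times m}\otimes u_{1\times l}) = (J_{m\times m}I_{m\times m})\otimes(u_{1\times1}u_{1\times l}) = J_{m\times m}\otimes u_{1\times l} = J_{m\times ml}$$
and, similarly,
$$B^TJ_{m\times ml} = (I_{m\times m}\otimes u_{l\times1})(J_{m\times m}\otimes u_{1\times l}) = J_{m\times m}\otimes(u_{l\times1}u_{1\times l}) = J_{m\times m}\otimes J_{l\times l} = J_{ml\times ml}$$
we obtain, using properties of the Kronecker product (Meyer, 2000, p. 598), that
$$Y = B^T\left(J + (\lambda+1-m)I\right)_{m\times m}B = B^TJ_{m\times ml} + (\lambda+1-m)B^TB = J_{ml\times ml} + (\lambda+1-m)(I_{m\times m}\otimes u_{l\times1})(I_{m\times m}\otimes u_{1\times l})$$
$$= J_{m\times m}\otimes J_{l\times l} + (\lambda+1-m)(I_{m\times m}\otimes J_{l\times l}) = \left\{J_{m\times m} + (\lambda+1-m)I_{m\times m}\right\}\otimes J_{l\times l}$$
Hence,
$$F = \det\left(-\lambda I - B^T(J-(\lambda+1)I)^{-1}_{m\times m}B\right) = \det\left(-\lambda I + \frac{\left\{J_{m\times m} + (\lambda+1-m)I_{m\times m}\right\}\otimes J_{l\times l}}{(\lambda+1)(\lambda+1-m)}\right)$$
$$= \frac{\det\left(-\lambda(\lambda+1)(\lambda+1-m)I + \left\{J_{m\times m} + (\lambda+1-m)I_{m\times m}\right\}\otimes J_{l\times l}\right)}{(\lambda+1)^{ml}(\lambda+1-m)^{ml}}$$
The eigenvalues of $D_{m\times m}\otimes E_{l\times l}$ are the $ml$ numbers $\{\lambda_j(D)\lambda_k(E)\}_{1\le j\le m,\,1\le k\le l}$ (art. 185). The eigenvalues of $D = J_{m\times m} + (\lambda+1-m)I_{m\times m}$ follow from (5.1) as $\lambda(D) \in \{[\lambda+1-m]^{m-1}, \lambda+1\}$, while the eigenvalues of $E = J_{l\times l}$ are $\lambda(E) \in \{[0]^{l-1}, l\}$. Hence, $\det(D\otimes E - zI) = 0$ has the zeros $[0]^{ml-m}$, $l(\lambda+1)$ and $[l(\lambda+1-m)]^{m-1}$, which are the same as those of the polynomial
$$z^{ml-m}\left(z - l(\lambda+1)\right)\left(z - l(\lambda+1-m)\right)^{m-1}$$
such that, with $z^* = \lambda(\lambda+1)(\lambda+1-m)$,
$$F = (-1)^{ml}\left.\frac{z^{ml-m}\left(z - l(\lambda+1)\right)\left(z - l(\lambda+1-m)\right)^{m-1}}{(\lambda+1)^{ml}(\lambda+1-m)^{ml}}\right|_{z=z^*}$$
Combined with $\det X = (-1)^m(\lambda+1)^{m-1}(\lambda+1-m)$, again from (5.1), this yields
$$\det(A_{m\text{doublestar}} - \lambda I) = (-1)^{m+ml}(\lambda+1)^{m-1}(\lambda+1-m)\left.\frac{z^{ml-m}\left(z - l(\lambda+1)\right)\left(z - l(\lambda+1-m)\right)^{m-1}}{(\lambda+1)^{ml}(\lambda+1-m)^{ml}}\right|_{z=z^*}$$
Simplified, with $N = m(l+1)$,
$$\det(A_{m\text{doublestar}} - \lambda I) = (-1)^N\lambda^{ml-m}\left(\lambda(\lambda+1-m) - l\right)\left(\lambda(\lambda+1) - l\right)^{m-1}$$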
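The simplified characteristic polynomial can be checked by building $B = I_{m\times m}\otimes u_{1\times l}$ with `np.kron` and comparing the numerical spectrum with the zeros of the two quadratic factors. The sketch below assumes NumPy and uses hypothetical helper names.

```python
import numpy as np

def doublestar_adjacency(m, l):
    """Core clique K_m; core node j additionally serves l private leaves (B = I_m kron u_{1 x l})."""
    B = np.kron(np.eye(m), np.ones((1, l)))
    N = m * (l + 1)
    A = np.zeros((N, N))
    A[:m, :m] = np.ones((m, m)) - np.eye(m)
    A[:m, m:] = B
    A[m:, :m] = B.T
    return A

def doublestar_spectrum(m, l):
    """Zeros of lambda^(ml-m) (lambda(lambda+1-m) - l) (lambda(lambda+1) - l)^(m-1)."""
    r1 = np.sqrt((m - 1) ** 2 + 4 * l) / 2.0
    r2 = np.sqrt(1 + 4 * l) / 2.0
    return np.sort(np.r_[(m - 1) / 2.0 + r1, (m - 1) / 2.0 - r1,
                         np.full(m - 1, -0.5 + r2), np.full(m - 1, -0.5 - r2),
                         np.zeros(m * l - m)])
```

Rounding the numerical eigenvalues and counting the distinct values shows five different eigenvalues for $m \ge 2$, $l \ge 2$, and four when $l = 1$ (no zero eigenvalue then).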
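The eigenvalues of $A_{m\text{doublestar}}$ are, besides a high-multiplicity root at zero $[0]^{ml-m}$, $\frac{m-1}{2} \pm \frac{1}{2}\sqrt{(m-1)^2 + 4l}$ and $\left[-\frac{1}{2} \pm \frac{1}{2}\sqrt{1+4l}\right]^{m-1}$. For $m \ge 2$, the number of different eigenvalues equals five if $l \ge 2$ and four if $l = 1$, which is consistent with the diameter three of this graph (art. 39). The double star with $m = 2$ and $N = 2(l+1)$ is proved in Das and Kumar (2004) to have as largest eigenvalue
$$\lambda_{\max}(A_{2\text{doublestar}}) = \frac{1 + \sqrt{2N-3}}{2} = \frac{1}{2} + \frac{1}{2}\sqrt{2N-3}$$

5.10.3 Complementary double cone
We consider a complete graph $K_N$ to which two nodes, labeled by $N+1$ and $N+2$, are connected. Node $N+1$ is connected to $m$ nodes in $K_N$ and node $N+2$ to the $N-m$ other nodes. The corresponding adjacency matrix of this complementary double cone (CDC) on $K_N$ is
$$A_{CDC} = \begin{bmatrix} (J-I)_{N\times N} & B_{N\times2} \\ B^T_{2\times N} & O_{2\times2} \end{bmatrix}_{(N+2)\times(N+2)}$$
where
$$B_{N\times2} = \begin{bmatrix} u_{m\times1} & 0_{m\times1} \\ 0_{(N-m)\times1} & u_{(N-m)\times1} \end{bmatrix}$$
The CDC graph has diameter 3 and each other graph with diameter 3 is a subgraph of a CDC (see also art. 42 on strongly regular graphs). The corresponding Laplacian is
$$Q_{CDC} = \begin{bmatrix} NI - (J-I)_{N\times N} & -B_{N\times2} \\ -B^T_{2\times N} & \mathrm{diag}(m, N-m) \end{bmatrix}$$
whose eigenvalues follow from
$$\det(Q_{CDC} - \mu I) = \det\begin{bmatrix} (N+1-\mu)I - J_{N\times N} & -B_{N\times2} \\ -B^T_{2\times N} & \mathrm{diag}(m-\mu, N-m-\mu) \end{bmatrix}$$
We apply (8.80), with $D = \mathrm{diag}(m-\mu, N-m-\mu)$, whose inverse is $D^{-1} = \mathrm{diag}\left((m-\mu)^{-1}, (N-m-\mu)^{-1}\right)$. Since the off-diagonal blocks are $-B_{N\times2}$ and $-B^T_{2\times N}$, the product needed in (8.80) is $(-B)D^{-1}(-B^T) = BD^{-1}B^T$, and
$$BD^{-1}B^T = \begin{bmatrix} u_{m\times1} & 0_{m\times1} \\ 0_{(N-m)\times1} & u_{(N-m)\times1} \end{bmatrix}\begin{bmatrix} \frac{1}{m-\mu} & 0 \\ 0 & \frac{1}{N-m-\mu} \end{bmatrix}\begin{bmatrix} u_{1\times m} & 0_{1\times(N-m)} \\ 0_{1\times m} & u_{1\times(N-m)} \end{bmatrix} = \begin{bmatrix} \frac{1}{m-\mu}J_{m\times m} & O_{m\times(N-m)} \\ O_{(N-m)\times m} & \frac{1}{N-m-\mu}J_{(N-m)\times(N-m)} \end{bmatrix}$$
such that, with $A = (N+1-\mu)I - J_{N\times N}$, we obtain
$$W = A - BD^{-1}B^T = \begin{bmatrix} (N+1-\mu)I_{m\times m} - \left(\frac{1}{m-\mu} + 1\right)J_{m\times m} & -J_{m\times(N-m)} \\ -J_{(N-m)\times m} & (N+1-\mu)I - \left(\frac{1}{N-m-\mu} + 1\right)J \end{bmatrix}$$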
Hence, $\det(Q_{CDC} - \mu I) = \det D\,\det W = (m-\mu)(N-m-\mu)\det W$. The determinant of $W$ is computed with (8.79). The computation is similar to that of $m$ fully connected stars in Section 5.10. Using (5.16), we express the upper-left block as
$$(N+1-\mu)I_{m\times m} - \frac{1+m-\mu}{m-\mu}J_{m\times m} = -\frac{1+m-\mu}{m-\mu}X_{m\times m}$$
where $\lambda + 1 = \frac{m-\mu}{1+m-\mu}(N+1-\mu)$. With (5.18), we have
$$-\frac{m-\mu}{1+m-\mu}J_{(N-m)\times m}X^{-1}J_{m\times(N-m)} = \frac{m(m-\mu)}{(1+m-\mu)(\lambda+1-m)}J_{(N-m)\times(N-m)}$$
such that, with
$$\xi = \frac{m(m-\mu)}{(1+m-\mu)(\lambda+1-m)} + \frac{N-m+1-\mu}{N-m-\mu},$$
$$\det W = \det\left(-\frac{1+m-\mu}{m-\mu}X_{m\times m}\right)\det\left((N+1-\mu)I_{(N-m)\times(N-m)} - \xi J\right) = \left(-\frac{1+m-\mu}{m-\mu}\right)^m(-\xi)^{N-m}\det(J-(\lambda+1)I)_{m\times m}\det\left(J - \frac{N+1-\mu}{\xi}I\right)$$
Using (5.1) yields
$$\det W = \frac{1+m-\mu}{m-\mu}(\lambda+1-m)(N+1-\mu)^{N-2}\left(N+1-\mu - (N-m)\xi\right)$$
After simplification, we find that
$$\xi = \frac{m(N-m)(N+1) - \left\{(N+1)^2 - m + m(N-m)\right\}\mu + 2(N+1)\mu^2 - \mu^3}{\left(m(N-m) - (N+1)\mu + \mu^2\right)(N-m-\mu)}$$
We now compute $N+1-\mu - (N-m)\xi = \frac{\mu\,v(\mu)}{r}$, where
$$r = \left(m(N-m) - (N+1)\mu + \mu^2\right)(N-m-\mu)$$
The result is
$$v(\mu) = \mu^3 - 2(N+1)\mu^2 + \left\{(N+1)^2 + m(N-m)\right\}\mu - m(N-m)(N+2)$$
The polynomial $v(\mu)$ has degree 3 in $\mu$, the sum of its zeros is $2(N+1)$, while their product is $m(N-m)(N+2)$. Combining all factors yields
$$\det W = \frac{\mu(N+1-\mu)^{N-2}v(\mu)}{(m-\mu)(N-m-\mu)}$$
and
$$\det(Q_{CDC} - \mu I) = \det D\,\det W = \mu(N+1-\mu)^{N-2}v(\mu)$$
In summary, the eigenvalues of $(Q_{CDC})_{(N+2)\times(N+2)}$ are 0, $[N+1]^{N-2}$, and the three real positive roots of $v(\mu)$.
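The summary can be confirmed numerically. The sketch below (NumPy, hypothetical helper names, with nodes $N+1$ and $N+2$ stored at indices $N$ and $N+1$) builds $Q_{CDC}$ and compares its eigenvalues with 0, $[N+1]^{N-2}$ and the zeros of $v(\mu)$ obtained by `np.roots`.

```python
import numpy as np

def cdc_laplacian(N, m):
    """Laplacian of the complementary double cone on K_N."""
    A = np.zeros((N + 2, N + 2))
    A[:N, :N] = np.ones((N, N)) - np.eye(N)
    A[:m, N] = A[N, :m] = 1.0              # node N+1 joins m clique nodes
    A[m:N, N + 1] = A[N + 1, m:N] = 1.0    # node N+2 joins the other N - m
    return np.diag(A.sum(axis=1)) - A

def v_coefficients(N, m):
    """v(mu) = mu^3 - 2(N+1) mu^2 + ((N+1)^2 + m(N-m)) mu - m(N-m)(N+2)."""
    return [1.0, -2.0 * (N + 1), (N + 1) ** 2 + m * (N - m), -m * (N - m) * (N + 2)]

def cdc_spectrum(N, m):
    """0, [N+1]^(N-2) and the three (real, positive) zeros of v."""
    roots = np.roots(v_coefficients(N, m)).real
    return np.sort(np.r_[0.0, np.full(N - 2, N + 1.0), roots])
```

For $N = 2$ and $m = 1$ the CDC is the path on four nodes, and $v(\mu) = (\mu - 2)(\mu^2 - 4\mu + 2)$ indeed yields its Laplacian eigenvalues $2$ and $2 \pm \sqrt{2}$.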
5.11 A chain of cliques
A chain of $D+1$ cliques is a graph $G_D(n_1, n_2, \ldots, n_{D+1})$ consisting of $D+1$ complete graphs $K_{n_j}$ (cliques) with $1 \le j \le D+1$, where each clique $K_{n_j}$ is fully interconnected with its neighboring cliques $K_{n_{j-1}}$ and $K_{n_{j+1}}$. Two graphs $G_1$ and $G_2$ are fully interconnected if each node in $G_1$ is connected to each node in $G_2$. An example of a member of the class $G_D(n_1, n_2, \ldots, n_{D+1})$ is drawn in Fig. 5.5. The total number of nodes in $G_D(n_1, n_2, \ldots, n_{D+1})$ is
$$N = \sum_{j=1}^{D+1}n_j \tag{5.20}$$
The total number of links in $G_D$ is
$$L = \sum_{j=1}^{D+1}\binom{n_j}{2} + \sum_{j=1}^{D}n_jn_{j+1} \tag{5.21}$$
Fig. 5.5. A chain of cliques $G_3(8, 1, 3, 4)$, with cliques $K_8$, $K_1$, $K_3$ and $K_4$.
where the first sum equals the number of intra-cluster links and the second the number of inter-cluster links. The main motivation to study the class of graphs $G_D(n_1, n_2, \ldots, n_{D+1})$ with $n_j \ge 1$ are its extremal properties, which are proved in Wang et al. (2010b):

Theorem 23 Any graph $G(N, D)$ with $N$ nodes and diameter $D$ is a subgraph of at least one graph in the class $G_D(n_1 = 1, n_2, \ldots, n_D, n_{D+1} = 1)$.

Theorem 24 The maximum of any Laplacian eigenvalue $\mu_i(G_D)$, $i \in [1, N]$, achieved in the class $G_D(n_1 = 1, n_2, \ldots, n_D, n_{D+1} = 1)$ is also the maximum among all the graphs with $N$ nodes and diameter $D$.

Theorem 25 The maximum number of links in a graph with given size $N$ and diameter $D$ is $L_{\max}(N, D) = \binom{N-D+2}{2} + D - 3$, which can only be obtained by either $G_D(1, \ldots, 1, n_j = N-D, 1, \ldots, 1)$ with $j \in [2, D]$, where only one clique has size larger than one, or by $G_D(1, \ldots, 1, n_j > 1, n_{j+1} > 1, 1, \ldots, 1)$ with $j \in [2, D-1]$, where only two cliques have size larger than one and they are next to each other.

Another valuable theorem, due to van Dam (2007) and related to Theorem 25, is:

Theorem 26 The graph $G_D(n_1, n_2, \ldots, n_{D+1})$ with $n_{\lceil (D+1)/2\rceil} = N - D$ and all other $n_j = 1$ is the graph with the largest spectral radius (i.e., the largest eigenvalue of the adjacency matrix) among all graphs with the same diameter $D$ and number of nodes $N$.

Here, we will compute the Laplacian spectrum of $G_D(n_1, n_2, \ldots, n_{D-1}, n_D, n_{D+1})$: we will show that $N - D$ eigenvalues are exactly known, while the remaining $D$ eigenvalues are the positive zeros of an orthogonal polynomial. The adjacency matrix $A_{G_D}$ of $G_D(n_1, n_2, \ldots, n_{D-1}, n_D, n_{D+1})$ is
$$A_{G_D} = \begin{bmatrix} \tilde{J}_{n_1\times n_1} & J_{n_1\times n_2} & & & \\ J_{n_2\times n_1} & \tilde{J}_{n_2\times n_2} & J_{n_2\times n_3} & & \\ & \ddots & \ddots & \ddots & \\ & & J_{n_i\times n_{i-1}} & \tilde{J}_{n_i\times n_i} & J_{n_i\times n_{i+1}} \\ & & & \ddots & \ddots \\ & & & J_{n_{D+1}\times n_D} & \tilde{J}_{n_{D+1}\times n_{D+1}} \end{bmatrix}$$
where $\tilde{J} = J - I$.

Theorem 27 The characteristic polynomial of the Laplacian $Q_{G_D}$ of the class of graphs $G_D(n_1, n_2, \ldots, n_{D+1})$ equals
$$\det\left(Q_{G_D} - \mu I\right) = p_D(\mu)\prod_{j=1}^{D+1}(d_j + 1 - \mu)^{n_j-1} \tag{5.22}$$
where $d_j = n_{j-1} + n_j + n_{j+1} - 1$ denotes the degree of a node in clique $j$. The polynomial $p_D(\mu) = \prod_{j=1}^{D+1}\theta_j$ is of degree $D+1$ in $\mu$ and the function $\theta_j = \theta_j(D; \mu)$ obeys the recursion
$$\theta_j = (d_j + 1 - \mu) - \left(\frac{n_{j-1}}{\theta_{j-1}} + 1\right)n_j \tag{5.23}$$
with initial condition $\theta_0 = 1$ and with the convention that $n_0 = n_{D+2} = 0$.

The proof below elegantly uses the concept of a quotient matrix, defined in Section 2.1.3. An elementary, though more elaborate proof, which is basically an extension of the derivation in Section 5.10.3, is found in Van Mieghem and Wang (2009).

Consider the $k$-partition of a graph $G$ that separates the node set $\mathcal{N}$ of $G$ into $k \in [1, N]$ disjoint, non-empty subsets $\{\mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_k\}$. Correspondingly, the quotient matrix $A^\pi$ of the adjacency matrix of $G$ is a $k\times k$ matrix, where $A^\pi_{i,j}$ is the average number of neighbors in $\mathcal{N}_j$ of nodes in $\mathcal{N}_i$. Similarly, the quotient matrix $Q^\pi$ of the Laplacian matrix $Q$ of $G$ is a $k\times k$ matrix, where
$$Q^\pi_{i,j} = \begin{cases} -A^\pi_{i,j} & \text{if } i \ne j \\ \sum_{l\ne i}A^\pi_{i,l} & \text{if } i = j \end{cases}$$
As defined in art. 15, a partition is called regular or equitable if for all $1 \le i, j \le k$ the number of neighbors in $\mathcal{N}_j$ is the same for all the nodes in $\mathcal{N}_i$. The eigenvalues derived from the quotient matrix $A^\pi$ ($Q^\pi$) of the adjacency matrix $A$ (Laplacian $Q$) are also eigenvalues of $A$ (Laplacian $Q$), given the partition is equitable (see art. 15).

Proof: The partition that separates the graph $G_D(n_1, n_2, \ldots, n_{D+1})$ into the $D+1$ cliques $K_{n_1}, K_{n_2}, \ldots, K_{n_{D+1}}$ is equitable. The quotient matrix $Q^\pi$ of the Laplacian matrix $Q$ of $G$ is
$$Q^\pi = \begin{bmatrix} n_2 & -n_2 & & & & \\ -n_1 & n_1+n_3 & -n_3 & & & \\ & -n_2 & n_2+n_4 & -n_4 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -n_{D-1} & n_{D-1}+n_{D+1} & -n_{D+1} \\ & & & & -n_D & n_D \end{bmatrix} \tag{5.24}$$
We use (8.79) to expand
$$\det(Q^\pi - \mu I) = (n_2 - \mu)\det\begin{bmatrix} n_1+n_3-\mu-\frac{n_1n_2}{n_2-\mu} & -n_3 & & \\ -n_2 & n_2+n_4-\mu & -n_4 & \\ & \ddots & \ddots & \ddots \\ & & -n_D & n_D-\mu \end{bmatrix}$$
We repeat the method and obtain
$$\det(Q^\pi - \mu I) = (n_2-\mu)\left(n_1+n_3-\mu-\frac{n_1n_2}{n_2-\mu}\right)\det\begin{bmatrix} n_2+n_4-\mu-\frac{n_2n_3}{n_1+n_3-\mu-\frac{n_1n_2}{n_2-\mu}} & -n_4 & \\ \ddots & \ddots & \ddots \\ & -n_D & n_D-\mu \end{bmatrix}$$
Eventually, after subsequent expansions using (8.79), we find
$$\det(Q^\pi - \mu I) = \prod_{j=1}^{D+1}\theta_j = p_D(\mu)$$
where $\theta_j$ follows the recursion
$$\theta_j = (n_{j-1} + n_{j+1} - \mu) - \frac{n_{j-1}n_j}{\theta_{j-1}}$$
with initial condition $\theta_0 = 1$ and with the convention that $n_0 = n_{D+2} = 0$. When written in terms of the degree $d_j = n_{j-1} + n_j + n_{j+1} - 1$, we obtain (5.23).

Any two nodes $s$ and $t$ in the same clique $K_{n_i}$ of $G_D$ are connected to each other and they are connected to the same set of neighbors. The two rows in $\det(Q_{G_D} - \mu I)$ corresponding to nodes $s$ and $t$ are the same when $\mu = d_i + 1$, where $d_i$ is the degree of all nodes in clique $K_{n_i}$. In this case, $\det(Q_{G_D} - \mu I) = 0$, since the rank of $Q_{G_D} - \mu I$ is reduced by 1. Hence, $\mu = d_i + 1$ is an eigenvalue of the Laplacian matrix $Q_{G_D}$. The corresponding eigenvector $x$ has only two non-zero components, $x_s = -x_t \ne 0$. Since the $D+1$ partitions of $G_D(n_1, n_2, \ldots, n_{D+1})$ are equitable, the $D+1$ eigenvalues of $Q^\pi$, which are the roots of $\det(Q^\pi - \mu I) = 0$, are also eigenvalues of the Laplacian matrix $Q_{G_D}$. Each eigenvector of $Q_{G_D}$ belonging to these $D+1$ eigenvalues has the same elements $x_s = x_t$ if the nodes $s$ and $t$ belong to the same clique. Hence, the Laplacian matrix $Q_{G_D}$ has $D+1$ non-trivial eigenvalues, which are the roots of $\det(Q^\pi - \mu I) = 0$, and trivial eigenvalues $d_j + 1$ with multiplicity $n_j - 1$ for $1 \le j \le D+1$. □
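Theorem 27 and recursion (5.23) can be tested on a concrete chain of cliques. In the NumPy sketch below (hypothetical helper names), the Laplacian is built from clique labels, the non-trivial eigenvalues are taken from the quotient matrix (5.24), and `p_D` evaluates $\prod_j\theta_j$ with the recursion in the form derived in the proof, which should coincide with $\det(Q^\pi - \mu I)$ for any $\mu$ at which no $\theta_j$ vanishes.

```python
import numpy as np

def chain_of_cliques_laplacian(sizes):
    """Laplacian of G_D(n_1, ..., n_{D+1}): cliques K_{n_j}, neighbouring cliques fully joined."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    adjacent = np.abs(labels[:, None] - labels[None, :]) <= 1
    A = adjacent.astype(float) - np.eye(len(labels))
    return np.diag(A.sum(axis=1)) - A

def quotient_laplacian(sizes):
    """Q^pi of (5.24): off-diagonal -n_{j+1} (upper) and -n_{j-1} (lower), zero row sums."""
    n = np.asarray(sizes, dtype=float)
    size = len(n)
    Qp = np.zeros((size, size))
    for j in range(size - 1):
        Qp[j, j + 1] = -n[j + 1]
        Qp[j + 1, j] = -n[j]
    Qp -= np.diag(Qp.sum(axis=1))
    return Qp

def p_D(sizes, mu):
    """prod_j theta_j, theta_j = (n_{j-1}+n_{j+1}-mu) - n_{j-1} n_j / theta_{j-1}, theta_0 = 1."""
    n = [0] + list(sizes) + [0]   # convention n_0 = n_{D+2} = 0
    theta, prod = 1.0, 1.0
    for j in range(1, len(sizes) + 1):
        theta = (n[j - 1] + n[j + 1] - mu) - n[j - 1] * n[j] / theta
        prod *= theta
    return prod

def predicted_spectrum(sizes):
    """Theorem 27: eigenvalues of Q^pi plus d_j + 1 with multiplicity n_j - 1."""
    n = [0] + list(sizes) + [0]
    trivial = []
    for j in range(1, len(sizes) + 1):
        d_j = n[j - 1] + n[j] + n[j + 1] - 1  # degree of a node in clique j
        trivial += [d_j + 1.0] * (n[j] - 1)
    nontrivial = np.linalg.eigvals(quotient_laplacian(sizes)).real
    return np.sort(np.r_[nontrivial, trivial])
```

The chain $G_1(5, 5)$ is simply $K_{10}$, whose Laplacian spectrum $0$, $[10]^9$ is reproduced by this construction.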
5.11.1 Orthogonal polynomials
In the sequel, we will show that the polynomial $p_D(\mu)$ in Theorem 27 belongs to a set of orthogonal polynomials (see Chapter 10). From now on, the dependence of $\theta_j$ on the diameter $D$ and on $\mu$ is written explicitly.

Lemma 4 For all $j \ge 0$, the functions $\theta_j(D; x)$ are rational functions
$$\theta_j(D; x) = \frac{t_j(D; x)}{t_{j-1}(D; x)} \tag{5.25}$$
where $t_j(D; x)$ is a polynomial of degree $j$ in $x = -\mu$ and $t_0(D; x) = 1$.

Proof: It holds for $j = 1$, as verified from (5.23), because $\theta_0(D; x) = 1$. Let us assume that (5.25) holds for $j-1$ (induction argument). Substitution of (5.25) into the right-hand side of (5.23),
$$\theta_j(D; x) = \begin{cases} \dfrac{(x + n_{j-1} + n_{j+1})\,t_{j-1}(D; x) - n_{j-1}n_j\,t_{j-2}(D; x)}{t_{j-1}(D; x)} & 1 \le j \le D \\[2ex] \dfrac{(x + n_D)\,t_D(D; x) - n_Dn_{D+1}\,t_{D-1}(D; x)}{t_D(D; x)} & j = D+1 \end{cases}$$
indeed shows that the left-hand side is of the form (5.25) for $j$. This demonstrates the induction argument and proves the lemma. □

The polynomial of interest,
$$p_D(\mu) = \prod_{j=1}^{D+1}\theta_j(D; \mu) = \sum_{k=0}^{D+1}c_k(D)\,\mu^k = \prod_{k=1}^{D+1}(z_k - \mu) \tag{5.26}$$
(where the product with the zeros $z_{D+1} \le z_D \le \cdots \le z_1$ follows from the definition of the eigenvalue equation (8.5)) equals, with (5.25),
$$p_D(x) = \frac{\prod_{j=1}^{D+1}t_j(D; x)}{\prod_{j=1}^{D+1}t_{j-1}(D; x)} = t_{D+1}(D; x)$$
We rewrite (5.25) as $t_j(D; x) = \theta_j(D; x)\,t_{j-1}(D; x)$ and, with (5.23), we obtain the set of polynomials
$$\begin{cases} t_{D+1}(D; x) = (x + n_D)\,t_D(D; x) - n_Dn_{D+1}\,t_{D-1}(D; x) & \\ t_j(D; x) = (x + n_{j-1} + n_{j+1})\,t_{j-1}(D; x) - n_{j-1}n_j\,t_{j-2}(D; x) & \text{for } 1 < j \le D \\ t_1(D; x) = (x + n_2)\,t_0(D; x) & \end{cases} \tag{5.27}$$
where $t_0(D; x) = 1$. Art. 248 demonstrates that, for a fixed $D$, the sequence $\{t_j(D; x)\}_{0\le j\le D+1}$ is a set of orthogonal polynomials because (5.27) obeys Favard's three-term recurrence relation. By Theorem 66, the zeros of any set of orthogonal polynomials are all simple, real and lie in the orthogonality interval $[a, b]$, which is here, for the Laplacian, equal to $[0, N]$. By iterating the equation upwards, we find that
$$t_j(D; 0) = \begin{cases} \prod_{m=2}^{j+1}n_m & 1 \le j \le D \\ 0 & j = D+1 \end{cases} \tag{5.28}$$
Thus, $t_{D+1}(D; 0) = 0$ (and thus $\theta_{D+1}(D; 0) = 0$) implies that $p_D(\mu)$ must have a zero at $\mu = 0$, which is, indeed, a general property of any Laplacian (art. 66). From (5.25), it then follows that $\theta_j(D; 0) = n_{j+1} > 0$ for $1 \le j \le D$.

The eigenvalues of the Jacobi matrix (art. 260)
$$M = \begin{bmatrix} n_2 & -1 & & & \\ -n_1n_2 & n_1+n_3 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -n_{D-1}n_D & n_{D-1}+n_{D+1} & -1 \\ & & & -n_Dn_{D+1} & n_D \end{bmatrix} \tag{5.29}$$
are equal to the zeros $z_k$ of $p_D(\mu)$. Moreover, we observe that the quotient matrix $Q^\pi$ in (5.24) also possesses the same eigenvalues as the Jacobi matrix $M$. Since the eigenvalues of $M$ are simple, art. 142 shows that there exists a similarity transform that maps the Jacobi matrix $M$ into the quotient matrix $Q^\pi$ (and vice versa). Moreover, the matrix $M$ can be symmetrized by the similarity transform
$$H = \mathrm{diag}\left(1, \frac{1}{\sqrt{n_1n_2}}, \ldots, \frac{1}{\prod_{k=2}^{j}\sqrt{n_{k-1}n_k}}, \ldots, \frac{1}{\prod_{k=2}^{D+1}\sqrt{n_{k-1}n_k}}\right)$$
and the eigenvector belonging to zero equals
$$h(D; 0) = H\,t(D; 0) = \begin{bmatrix} 1 & \sqrt{\frac{n_2}{n_1}} & \cdots & \sqrt{\frac{n_D}{n_1}} & \sqrt{\frac{n_{D+1}}{n_1}} \end{bmatrix}^T$$
where $t(D; 0) = \left(t_0(D; 0), t_1(D; 0), \ldots, t_D(D; 0)\right)^T$.
After the similarity transform $H$, the result is $\tilde{M} = HMH^{-1}$,
$$\tilde{M} = \begin{bmatrix} n_2 & -\sqrt{n_1n_2} & & & \\ -\sqrt{n_1n_2} & n_1+n_3 & -\sqrt{n_2n_3} & & \\ & \ddots & \ddots & \ddots & \\ & & -\sqrt{n_{D-1}n_D} & n_{D-1}+n_{D+1} & -\sqrt{n_Dn_{D+1}} \\ & & & -\sqrt{n_Dn_{D+1}} & n_D \end{bmatrix}$$
The corresponding square root matrix $A$ of the Gram matrix $\tilde{M} = A^TA$ can be computed explicitly as
$$A = \begin{bmatrix} \sqrt{n_2} & -\sqrt{n_1} & & & \\ 0 & \sqrt{n_3} & -\sqrt{n_2} & & \\ & \ddots & \ddots & \ddots & \\ & & 0 & \sqrt{n_{D+1}} & -\sqrt{n_D} \\ & & & 0 & 0 \end{bmatrix}$$
in contrast to the general theory in art. 264, where each element is a continued fraction. In summary, all non-trivial eigenvalues of $Q_{G_D}$ are also eigenvalues of the simpler matrices $Q^\pi$, $M$ or $\tilde{M}$. Properties and bounds on those nontrivial eigenvalues and zeros of $p_D(\mu)$, as well as the spectrum of the corresponding adjacency matrix, are studied in Van Mieghem and Wang (2009). We mention the asymptotic scaling law:

Theorem 28 For a constant diameter $D$ and a large number $N$ of nodes, all nontrivial eigenvalues of both the adjacency and Laplacian matrix of any graph in the class $G_D(n_1, n_2, \ldots, n_{D+1})$ scale linearly with $N$, the number of nodes.

All coefficients $c_k(D)$ of $p_D(\mu)$ in (5.26) can be computed explicitly in terms of the clique sizes $n_1, n_2, \ldots, n_{D+1}$, for which we refer to Van Mieghem and Wang (2009). We merely list here the first few polynomials $q_D(\mu) = \frac{p_D(\mu)}{-\mu}$:
$$q_1(\mu) = -(\mu - N)$$
$$q_2(\mu) = \mu^2 - (N + n_2)\mu + Nn_2 = (\mu - N)(\mu - n_2)$$
$$q_3(\mu) = -\mu^3 + (2N - n_1 - n_4)\mu^2 - \left(n_2^2 + n_3^2 + n_1n_2 + n_1n_3 + n_1n_4 + 3n_2n_3 + n_2n_4 + n_3n_4\right)\mu + Nn_2n_3$$
$$q_4(\mu) = \mu^4 - (2N - n_1 - n_5)\mu^3 + \left(n_2^2 + n_3^2 + n_4^2 + n_4n_5 + n_3(3n_4 + n_5) + n_2(3n_3 + 3n_4 + 2n_5) + n_1(n_2 + n_3 + 2n_4 + n_5)\right)\mu^2$$
$$\qquad - \left(n_3n_4(n_3 + n_4 + n_5) + n_2\left(n_3^2 + n_4^2 + 4n_3n_4 + (n_3 + n_4)n_5 + n_2(n_3 + n_4 + n_5)\right) + n_1(n_2 + n_4)(n_3 + n_4 + n_5)\right)\mu + Nn_2n_3n_4$$
For increasing $D$, the explicit expressions rapidly become involved without a simple structure. There is one exception: $G_D(n_1, n_2, \ldots, n_{D+1})$ with all unit-size cliques, $n_j = 1$, is a $D$-hop line topology, whose spectrum is exactly given in (5.9), such that
$$q_D\left(\mu; \{n_j = 1\}_{1\le j\le D+1}\right) = \prod_{k=1}^{D}\left(2\left(1 - \cos\frac{k\pi}{D+1}\right) - \mu\right)$$
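The Jacobi matrix, its symmetrized version $\tilde{M}$, the Gram factor $A$, and the listed polynomial $q_3(\mu)$ can all be cross-checked numerically. The sketch below assumes NumPy and uses hypothetical helper names. Here $M$ is taken with diagonal $n_{j-1} + n_{j+1}$ (with $n_0 = n_{D+2} = 0$), upper diagonal $-1$ and lower diagonal $-n_{j-1}n_j$, so that its eigenvalues are the zeros of $p_D$ in the variable $\mu$.

```python
import numpy as np

def jacobi_matrix(sizes):
    """M with diagonal n_{j-1}+n_{j+1}, upper diagonal -1, lower diagonal -n_{j-1} n_j."""
    n = [0] + list(sizes) + [0]
    size = len(sizes)
    M = np.zeros((size, size))
    for j in range(1, size + 1):
        M[j - 1, j - 1] = n[j - 1] + n[j + 1]
        if j < size:
            M[j - 1, j] = -1.0
        if j > 1:
            M[j - 1, j - 2] = -n[j - 1] * n[j]
    return M

def m_tilde(sizes):
    """Symmetrised matrix: same diagonal as M, off-diagonal -sqrt(n_j n_{j+1})."""
    n = np.asarray(sizes, dtype=float)
    off = -np.sqrt(n[:-1] * n[1:])
    diag = np.r_[n[1], n[:-2] + n[2:], n[-2]]
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

def gram_factor(sizes):
    """Upper bidiagonal A with A^T A = M~: row j holds sqrt(n_{j+1}), -sqrt(n_j); last row zero."""
    r = np.sqrt(np.asarray(sizes, dtype=float))
    size = len(sizes)
    A = np.zeros((size, size))
    for j in range(size - 1):
        A[j, j] = r[j + 1]
        A[j, j + 1] = -r[j]
    return A

def q3_coefficients(n1, n2, n3, n4):
    """Coefficients of q_3(mu), highest power first, as listed in the text."""
    N = n1 + n2 + n3 + n4
    e2 = n2**2 + n3**2 + n1*n2 + n1*n3 + n1*n4 + 3*n2*n3 + n2*n4 + n3*n4
    return [-1.0, 2 * N - n1 - n4, -e2, N * n2 * n3]
```

Because $\tilde{M}$ is symmetric, `np.linalg.eigvalsh` can be used for it, whereas the non-symmetric $M$ needs the general solver `np.linalg.eigvals`.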
Finally, we mention that t3 () appears as the polynomial v () in the Laplacian spectrum of the complementary double cone (CDC) in Section 5.10.3. The CDC, written as J3 (1> p> Q p> 1), is clearly a member of the class JG (q1 > q2 > ===> qG+1 ) P with 4m=1 qm = Q + 2. 5.12 The lattice Consider a rectangular lattice with size }1 and }2 where at each lattice point with two integer coordinates (n> o) a node is placed. A node at (n> o) is connected to its direct neighbors at (n + 1> o), (n 1> o), (n> o + 1) and (n> o 1) where possible. At border points, nodes only have three neighbors and at the four corner points only two. The number of lattice points (nodes) equals Q = (}1 + 1)(}2 + 1) and the number of links is O = 2}1 }2 + (}1 + }2 ). Meyer (2000) nicely relates the Laplacian of the lattice JLa(Q ) to the discrete version of the Laplacian operator, C2x C2x + 2 C{2 C| In a similar vein, Cvetkovi´c et al. (2009, Chapter 9) discuss the Laplacian operator and its discretization in the solution of the wave equation with rectangular boundary. The adjacency matrix, following Meyer (2000), is 5
$$A_{La(N)} = \begin{bmatrix} T_{(z_1+1)\times(z_1+1)} & I_{(z_1+1)\times(z_1+1)} & & \\ I_{(z_1+1)\times(z_1+1)} & T_{(z_1+1)\times(z_1+1)} & \ddots & \\ & \ddots & \ddots & I_{(z_1+1)\times(z_1+1)} \\ & & I_{(z_1+1)\times(z_1+1)} & T_{(z_1+1)\times(z_1+1)} \end{bmatrix}$$
where the Toeplitz matrix

$$T_{(z_1+1)\times(z_1+1)} = \begin{bmatrix} 0 & 1 & & \\ 1 & 0 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & 0 \end{bmatrix}$$
is the adjacency matrix of a $z_1$-hop path, whose eigenvalues are given by (5.8) with $N \to z_1 + 1$. The Laplacian $Q_{La(N)}$ is not easily given in general form, because the row sums of $A_{La(N)}$, i.e. the node degrees, are not constant. The adjacency matrix $A_{La(N)}$ is a block Toeplitz matrix whose structure is most elegantly written in terms of a Kronecker product. We may verify that⁴

$$A_{La(N)} = I_{(z_2+1)\times(z_2+1)} \otimes T_{(z_1+1)\times(z_1+1)} + T_{(z_2+1)\times(z_2+1)} \otimes I_{(z_1+1)\times(z_1+1)} \quad (5.30)$$

The eigenvalues of $A_{La(N)}$ are immediate from art. 185. Thus, for all $1 \le j \le z_1 + 1$, $1 \le k \le z_2 + 1$,

$$\lambda_{jk}\left(A_{La(N)}\right) = \lambda_j\left(T_{(z_1+1)\times(z_1+1)}\right) + \lambda_k\left(T_{(z_2+1)\times(z_2+1)}\right)$$

and with (5.8), we arrive at

$$\lambda_{jk}\left(A_{La(N)}\right) = 2\cos\left(\frac{\pi j}{z_1 + 2}\right) + 2\cos\left(\frac{\pi k}{z_2 + 2}\right)$$
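where $1 \le j \le z_1 + 1$, $1 \le k \le z_2 + 1$. Several extensions are possible. For a cubic or three-dimensional lattice, the adjacency matrix generalizes to

$$\begin{aligned} A_{La(N)} = {} & I_{(z_3+1)\times(z_3+1)} \otimes I_{(z_2+1)\times(z_2+1)} \otimes T_{(z_1+1)\times(z_1+1)} \\ & + I_{(z_3+1)\times(z_3+1)} \otimes T_{(z_2+1)\times(z_2+1)} \otimes I_{(z_1+1)\times(z_1+1)} \\ & + T_{(z_3+1)\times(z_3+1)} \otimes I_{(z_2+1)\times(z_2+1)} \otimes I_{(z_1+1)\times(z_1+1)} \end{aligned}$$

with spectrum

$$\lambda_{jkl}\left(A_{La(N)}\right) = 2\cos\left(\frac{\pi j}{z_1 + 2}\right) + 2\cos\left(\frac{\pi k}{z_2 + 2}\right) + 2\cos\left(\frac{\pi l}{z_3 + 2}\right)$$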
where $1 \le j \le z_1 + 1$, $1 \le k \le z_2 + 1$, $1 \le l \le z_3 + 1$. Replacing, in the Kronecker product, the Toeplitz matrix $T$ of the path by the circulant Toeplitz matrix of the circuit yields a lattice on a torus (Cvetković et al., 1995, p. 74).

We end this section by considering the $m$-dimensional lattice $La_m$ with lengths $z_1, z_2, \ldots, z_m$ in the respective dimensions. At each lattice point with integer coordinates a node is placed, connected to its nearest neighbors, i.e. those whose coordinates differ by one in exactly one component. The total number of nodes in $La_m$ is $N = (z_1 + 1) \times (z_2 + 1) \times \cdots \times (z_m + 1)$. The lattice graph can be written as a Cartesian product (Cvetković et al., 1995) of $m$ path graphs, which we denote by $La_m = P_{z_1+1} \,\square\, P_{z_2+1} \,\square\, \cdots \,\square\, P_{z_m+1}$. According to Cvetković et al. (1995), each eigenvalue of $La_m$ is a sum of eigenvalues of the path graphs, one taken from each factor. The corresponding eigenvector is the Kronecker product of the corresponding eigenvectors of the same path graphs,

$$\begin{aligned} \lambda_{i_1 i_2 \cdots i_m}(La_m) &= \sum_{j=1}^{m} \lambda_{i_j}\left(P_{z_j+1}\right) \\ x_{i_1 i_2 \cdots i_m}(La_m) &= x_{i_1}\left(P_{z_1+1}\right) \otimes x_{i_2}\left(P_{z_2+1}\right) \otimes \cdots \otimes x_{i_m}\left(P_{z_m+1}\right) \end{aligned} \quad (5.31)$$

where $i_j \in \{1, 2, \ldots, z_j + 1\}$ for each $j \in \{1, 2, \ldots, m\}$. Both the adjacency and the Laplacian spectrum of the path graph $P_N$ are completely known (Section 5.4). Hence the corresponding spectra of the $m$-dimensional lattice $La_m$ can also be computed analytically from (5.31), by substituting $N = z_j + 1$ in the derivations of Section 5.4.

⁴ When applying the identity (Meyer, 2000, p. 597) $(A_1 \otimes B_1)(A_2 \otimes B_2) = (A_1 A_2 \otimes B_1 B_2)$ to compute the square of $A_{La(N)}$ given by (5.30), powers of the Toeplitz matrix appear. However,

$$T^2_{(z_1+1)\times(z_1+1)} = \begin{bmatrix} 1 & 0 & 1 & & \\ 0 & 2 & 0 & \ddots & \\ 1 & 0 & \ddots & \ddots & 1 \\ & \ddots & \ddots & 2 & 0 \\ & & 1 & 0 & 1 \end{bmatrix}$$

shows that the Toeplitz structure is destroyed.

Lemma 5 The number $L$ of links in the $m$-dimensional lattice $La_m$ is, for $m \ge 1$,

$$L = \prod_{j=1}^{m} (z_j + 1) \sum_{j=1}^{m} \frac{z_j}{z_j + 1}$$
Proof: We will prove the lemma by induction. Let the number of links in the $k$-dimensional lattice $La_k$ be $l(z_1, z_2, \ldots, z_k)$. For $k = 1$, we have a path graph $P_{z_1+1}$ and its number of links is $L = l(z_1) = z_1 = (z_1 + 1)\frac{z_1}{z_1 + 1}$. Let us assume that the lemma holds for $k$-dimensional lattices. We consider the $(k+1)$-dimensional lattice $La_{k+1}$. It is constructed from $k$-dimensional lattices $La_{(z_{i_1}+1)\times(z_{i_2}+1)\times\cdots\times(z_{i_k}+1)}$, where $i_1, i_2, \ldots, i_k$ are distinct elements of $\{1, 2, \ldots, k+1\}$, as follows. We position a total of $(z_{i_{k+1}} + 1)$ such $k$-dimensional lattices next to each other in the direction of dimension $i_{k+1}$, for each of the $k+1$ choices of $i_{k+1}$. In this way, every link is counted $k$ times.

This construction is easier to imagine in three dimensions. There, the three-dimensional lattice $La_{(z_1+1)\times(z_2+1)\times(z_3+1)}$ is built from three families of planes, all positioned next to each other:

- $(z_3 + 1)$ consecutive two-dimensional $La_{(z_1+1)\times(z_2+1)}$ planes, in the direction of the third dimension;
- $(z_2 + 1)$ consecutive two-dimensional $La_{(z_1+1)\times(z_3+1)}$ planes, in the direction of the second dimension;
- $(z_1 + 1)$ consecutive two-dimensional $La_{(z_2+1)\times(z_3+1)}$ planes, in the direction of the first dimension.

All links in this process are counted twice. Returning to the $k$-dimensional case, we thus deduce that

$$l(z_1, z_2, \ldots, z_{k+1}) = \frac{1}{k}\sum_{i=1}^{k+1}(z_i + 1)\, l(z_{j_1}, z_{j_2}, \ldots, z_{j_k})$$

where $j_w \ne i$ for each $i = 1, 2, \ldots, k+1$ and $w = 1, 2, \ldots, k$. Introducing the induction hypothesis for $k$-dimensional lattices, we obtain

$$\begin{aligned} l(z_1, z_2, \ldots, z_{k+1}) &= \frac{1}{k}\sum_{i=1}^{k+1}(z_i + 1)\prod_{j=1, j\ne i}^{k+1}(z_j + 1)\sum_{j=1, j\ne i}^{k+1}\frac{z_j}{z_j + 1} \\ &= \frac{1}{k}\prod_{j=1}^{k+1}(z_j + 1)\sum_{i=1}^{k+1}\sum_{j=1, j\ne i}^{k+1}\frac{z_j}{z_j + 1} \\ &= \frac{1}{k}\prod_{j=1}^{k+1}(z_j + 1)\, k\sum_{j=1}^{k+1}\frac{z_j}{z_j + 1} \end{aligned}$$

This shows that the induction hypothesis holds for $k + 1$; consequently, the lemma is true for each dimension $m \ge 1$. □
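The Kronecker-sum construction (5.30), the spectrum (5.31) and Lemma 5 can all be checked on a small example. The sketch below assumes NumPy, and the function names are hypothetical. The Kronecker factors are ordered so that $z_1$ varies fastest, matching (5.30).

```python
import numpy as np
from itertools import product

def path_adjacency(n):
    """Toeplitz adjacency matrix T of the path with n nodes (n - 1 hops)."""
    return np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

def lattice_adjacency(z):
    """Adjacency matrix of La_m for z = (z_1, ..., z_m), as a Kronecker sum (cf. (5.30))."""
    dims = [zj + 1 for zj in z]
    size = int(np.prod(dims))
    A = np.zeros((size, size))
    for i in range(len(dims)):
        term = np.ones((1, 1))
        for j in reversed(range(len(dims))):     # z_1 ends up as the last factor
            factor = path_adjacency(dims[j]) if j == i else np.eye(dims[j])
            term = np.kron(term, factor)
        A += term
    return A

def lattice_spectrum(z):
    """All sums of path eigenvalues 2 cos(pi j / (z_i + 2)), one per dimension (cf. (5.31))."""
    per_dim = [2 * np.cos(np.pi * np.arange(1, zi + 2) / (zi + 2)) for zi in z]
    return np.sort([sum(combo) for combo in product(*per_dim)])

def lemma5_links(z):
    """Number of links prod_j (z_j + 1) * sum_j z_j / (z_j + 1) from Lemma 5."""
    z = np.asarray(z, dtype=float)
    return np.prod(z + 1) * np.sum(z / (z + 1))

z = (3, 2, 4)
A = lattice_adjacency(z)
print(np.allclose(np.sort(np.linalg.eigvalsh(A)), lattice_spectrum(z)))
print(A.sum() / 2, lemma5_links(z))
```

For $m = 2$, `lemma5_links` reduces to $2 z_1 z_2 + z_1 + z_2$, the link count given at the start of this section.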
6 Density function of the eigenvalues

This chapter studies general properties of the density function of the eigenvalues. Most articles in this chapter implicitly concern the eigenvalues of the adjacency matrix $A$. Especially for large graphs and random graphs, the density function is more suitable than the list of eigenvalues.
6.1 Definitions

121. The density function of the eigenvalues $\{\lambda_m\}_{1\le m\le N}$ is, by definition,

$$f_\lambda(t) = \frac{1}{N}\sum_{m=1}^{N}\delta(t - \lambda_m) \quad (6.1)$$

where $\delta(t)$ is the Dirac function, which can be written as a complex integral

$$\delta(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{zt}\,dz \qquad (c > 0)$$

Hence, we have for $c > 0$,

$$f_\lambda(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{zt}\varphi_\lambda(z)\,dz \quad (6.2)$$

where, analogous to the definition of a probability generating function,

$$\varphi_\lambda(z) = \frac{1}{N}\sum_{m=1}^{N}\exp(-z\lambda_m) \quad (6.3)$$

can be interpreted as the generating function of the density function of the eigenvalues $\{\lambda_m\}_{1\le m\le N}$.

122. In fact, $\varphi_\lambda(z)$ is a general Dirichlet series (Hardy and Riesz, 1964). Summing

$$\lambda_m^{-s} = \frac{1}{\Gamma(s)}\int_0^\infty x^{s-1}e^{-\lambda_m x}\,dx$$

over $m$ gives

$$\sum_{m=1}^{N} e^{-s\log\lambda_m} = \frac{1}{\Gamma(s)}\int_0^\infty x^{s-1}\sum_{m=1}^{N}e^{-\lambda_m x}\,dx$$
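which relates, by a Mellin transform, the general Dirichlet series on the set $\{\lambda_m\}_{1\le m\le N}$ to that on the set $\{\log\lambda_m\}_{1\le m\le N}$. By the inverse Mellin transform, we thus find an alternative expression for (6.3),

$$\varphi_\lambda(z) = \frac{1}{2\pi i N}\int_{c'-i\infty}^{c'+i\infty}\Gamma(s)\sum_{m=1}^{N}e^{-s\log\lambda_m}z^{-s}\,ds$$

where $c' > 0$. By closing the contour over the negative half $\mathrm{Re}(s)$-plane, we obtain the Taylor expansion of (6.3) around zero. Hence, interesting insight may be gained only if $\sum_{m=1}^{N}e^{-s\log\lambda_m}$ can be summed, in contrast to $\sum_{m=1}^{N}e^{-\lambda_m x}$. Formal substitution into (6.2) yields

$$\begin{aligned} f_\lambda(t) &= \frac{1}{(2\pi i)^2 N}\int_{c-i\infty}^{c+i\infty}dz\int_{c'-i\infty}^{c'+i\infty}ds\, z^{-s}e^{zt}\,\Gamma(s)\sum_{m=1}^{N}e^{-s\log\lambda_m} \\ &= \frac{1}{2\pi i N}\int_{c'-i\infty}^{c'+i\infty}ds\,\Gamma(s)\sum_{m=1}^{N}e^{-s\log\lambda_m}\left[\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}z^{-s}e^{zt}\,dz\right] \end{aligned}$$

Using Hankel's contour integral (see Abramowitz and Stegun (1968, Section 6.1.4); Sansone and Gerretsen (1960, p. 202)), the integral between brackets equals

$$\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}z^{-s}e^{zt}\,dz = \frac{t^{s-1}}{\Gamma(s)}$$

Hence, besides (6.2), $f_\lambda(t)$ can be expressed as an inverse Mellin transform

$$f_\lambda(t) = \frac{1}{2\pi i N}\int_{c'-i\infty}^{c'+i\infty}t^{s-1}\sum_{m=1}^{N}e^{-s\log\lambda_m}\,ds$$

of $\frac{1}{N}\sum_{m=1}^{N}e^{-s\log\lambda_m}$.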
123. Another integral form can be derived by applying Abel summation. The result (Hardy and Riesz, 1964) is

$$\varphi_\lambda(z) = z\int_{\lambda_N}^{\lambda_1}A(u)e^{-uz}\,du + e^{-\lambda_1 z}$$

where

$$A(t) = \frac{1}{N}\sum_{\lambda_m < t}1 = \frac{N - m}{N} \quad \text{if } \lambda_{m+1} \le t < \lambda_m$$

is, in terms of probability theory, nothing else than $\Pr[\lambda \le t] = F_\lambda(t) = \int_{-\infty}^{t}f_\lambda(u)\,du$.

124. A noteworthy relation, based on (8.7) in art. 138, is

$$\varphi_\lambda(z) = \frac{1}{N}\operatorname{trace}\left(e^{-zA}\right) \quad (6.4)$$

This holds because, if $\{\lambda_m\}_{1\le m\le N}$ are the eigenvalues of a symmetric matrix $A_{N\times N}$, then $\{e^{-z\lambda_m}\}_{1\le m\le N}$ are the eigenvalues of $e^{-zA}$. Indeed, $e^{-zA} = \sum_{k=0}^{\infty}\frac{(-z)^k}{k!}A^k$ and $A^k = X\operatorname{diag}(\lambda_m^k)X^T$, where the orthogonal matrix $X$ has the eigenvectors of $A$ as columns (art. 151). Hence

$$e^{-zA} = X\operatorname{diag}\left(e^{-z\lambda_m}\right)X^T$$

The argument shows, in addition, that the eigenvector of $A$ belonging to the eigenvalue $\lambda_m$ is also the eigenvector of $e^{-zA}$ belonging to the eigenvalue $e^{-z\lambda_m}$. The relation with a probability generating function, $\varphi_\lambda(z) = E\left[e^{-z\lambda}\right]$, suggests that the moments are $E[\lambda^n] = (-1)^n\varphi_\lambda^{(n)}(0)$, and with (6.4) that

$$E[\lambda^n] = \frac{1}{N}\operatorname{trace}(A^n) \quad (6.5)$$

This relation lies at the basis of Wigner's moment approach (art. 135) to computing the eigenvalues of random matrices.
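Relations (6.3), (6.4) and (6.5) can be compared numerically on one sample of a random graph. In this sketch, NumPy is assumed, and the seed, $N$, $p$ and $z$ are arbitrary illustrative choices. The matrix exponential $e^{-zA}$ is evaluated from its Taylor series rather than from the eigendecomposition, so that the comparison with $\frac{1}{N}\sum_m e^{-z\lambda_m}$ is not circular.

```python
import numpy as np

def expm_series(M, terms=60):
    """exp(M) from its Taylor series (adequate for the moderate norms used here)."""
    result = np.eye(M.shape[0])
    power = np.eye(M.shape[0])
    for k in range(1, terms):
        power = power @ M / k          # now equals M^k / k!
        result = result + power
    return result

rng = np.random.default_rng(1)
N, p = 30, 0.2
upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
A = upper + upper.T                    # adjacency matrix of one G_p(N) sample
lam = np.linalg.eigvalsh(A)

z = 0.3
phi_from_spectrum = np.mean(np.exp(-z * lam))        # definition (6.3)
phi_from_trace = np.trace(expm_series(-z * A)) / N   # relation (6.4)
print(phi_from_spectrum, phi_from_trace)

for n in range(1, 5):
    # moment relation (6.5): E[lambda^n] = trace(A^n) / N
    print(n, np.mean(lam**n), np.trace(np.linalg.matrix_power(A, n)) / N)
```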
6.2 The density when $N \to \infty$

125. Since $x \mapsto e^{-zx}$ is convex for real $z$, an application of the general convexity bound (Van Mieghem, 2006b, eq. (5.3)), from which Jensen's inequality is also derived, gives

$$\varphi_\lambda(z) = \frac{1}{N}\sum_{m=1}^{N}\exp(-z\lambda_m) \ge \exp\left(-\frac{z}{N}\sum_{m=1}^{N}\lambda_m\right) = 1$$

where we have used that $\sum_{m=1}^{N}\lambda_m = 0$ for the adjacency matrix $A$ (art. 25).
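The bound $\varphi_\lambda(z) \ge 1$ holds for every real $z$ and any graph. A small NumPy sketch, with arbitrary illustrative parameters, evaluates $\varphi_\lambda(z)$ on a grid of $z$ values for one random graph:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 40
upper = np.triu(rng.random((N, N)) < 0.15, k=1).astype(float)
lam = np.linalg.eigvalsh(upper + upper.T)      # eigenvalues of one random graph

zs = np.linspace(-2.0, 2.0, 41)
phi = np.array([np.mean(np.exp(-z * lam)) for z in zs])
print(abs(lam.sum()), phi.min())               # trace(A) = 0; phi(z) >= 1, equality at z = 0
```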
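126. The basic summation formula (Titchmarsh and Heath-Brown, 1986, p. 13),

$$\sum_{a<k\le b}f(k) = \int_a^b f(x)\,dx + \int_a^b\left(x - [x] - \frac{1}{2}\right)\frac{df(x)}{dx}\,dx + \left(a - [a] - \frac{1}{2}\right)f(a) - \left(b - [b] - \frac{1}{2}\right)f(b) \quad (6.6)$$

is valid for any function $f(x)$ with continuous derivative in the interval $[a, b]$. We define the continuous function $\lambda(x)$ on $[0, N]$ such that $\lambda(m) = \lambda_m$ by Lagrange interpolation (art. 206). For any $m \in [1, N]$, art. 23 shows that $-(N-1) < \lambda_m \le N - 1$, and $\lambda_N \le \lambda_{N-1} \le \cdots \le \lambda_1$. Hence $|\lambda(x)|$ is bounded on $[1, N]$ by $N - 1$, and $\lambda(m) \le \lambda(m-1)$ for any $m$. Thus, $\lambda(x)$ is continuous and not increasing on $[0, N]$. Application of (6.6) yields

$$\varphi_\lambda(z) = \frac{1}{N}\int_0^N e^{-z\lambda(x)}\,dx - y_N(z) + \frac{e^{-z\lambda(N)} - e^{-z\lambda(0)}}{2N}$$

where

$$y_N(z) = \frac{z}{N}\int_0^N\left(x - [x] - \frac{1}{2}\right)\frac{d\lambda(x)}{dx}e^{-z\lambda(x)}\,dx$$

Since $-\frac{1}{2} \le x - [x] - \frac{1}{2} \le \frac{1}{2}$ and $\frac{d\lambda(x)}{dx} \le 0$, we may bound $y_N(z)$ for real $z > 0$ as

$$\frac{z}{2N}\int_0^N e^{-z\lambda(x)}\frac{d\lambda(x)}{dx}\,dx \le y_N(z) \le -\frac{z}{2N}\int_0^N e^{-z\lambda(x)}\frac{d\lambda(x)}{dx}\,dx$$

With

$$\int_0^N e^{-z\lambda(x)}\frac{d\lambda(x)}{dx}\,dx = \int_{\lambda(0)}^{\lambda(N)}e^{-z\lambda}\,d\lambda = \frac{e^{-z\lambda(0)} - e^{-z\lambda(N)}}{z}$$

we have that

$$-\frac{e^{-z\lambda(N)} - e^{-z\lambda(0)}}{2N} \le y_N(z) \le \frac{e^{-z\lambda(N)} - e^{-z\lambda(0)}}{2N}$$

such that, for real $z > 0$,

$$\frac{1}{N}\int_0^N e^{-z\lambda(x)}\,dx \le \varphi_\lambda(z) \le \frac{1}{N}\int_0^N e^{-z\lambda(x)}\,dx + \frac{e^{-z\lambda(N)} - e^{-z\lambda(0)}}{N}$$

The density function $f_\lambda(t)$ involves a line integration (6.2) over $\mathrm{Re}(z) = c > 0$. If, for $\mathrm{Re}(z) > 0$,

$$\lim_{N\to\infty}\frac{e^{-z\lambda(N)} - e^{-z\lambda(0)}}{N} = \lim_{N\to\infty}\frac{e^{-z\lambda(N)}}{N} = 0$$

then the limit

$$\lim_{N\to\infty}\varphi_\lambda(z) = \lim_{N\to\infty}\frac{1}{N}\int_0^N e^{-z\lambda(x)}\,dx$$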
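exists and, hence, so does $\lim_{N\to\infty}f_\lambda(t)$. The condition means that the absolute value of the smallest eigenvalue $\lambda(N) = \lambda_N < 0$ grows at most as $|\lambda_N| = O(\log N)$, for $\mathrm{Re}(z) = c > 0$ but arbitrarily small. This condition is quite restrictive and suggests considering the spectrum of normalized eigenvalues.

127. We start from $E[\lambda^m] = \frac{1}{N}\operatorname{trace}(A^m)$ and, using the Stieltjes integral (art. 241), write

$$E[\lambda^m] = \int_{-\infty}^{\infty}x^m\,dF_{\lambda(G_N)}(x) = \frac{1}{N}\operatorname{trace}\left(A^m_{N\times N}\right)$$

If $\lim_{N\to\infty}\frac{1}{N}\operatorname{trace}\left(A^m_{N\times N}\right) = w_G(m)$ exists, then

$$\int_{-\infty}^{\infty}x^m\,dF_\infty(x) = w_G(m) \quad (6.7)$$

which implies that the limiting distribution $F_\infty$ of the eigenvalues of $G_\infty = \lim_{N\to\infty}G_N$ exists. Assuming that this distribution is also differentiable, $\int_{-\infty}^{\infty}x^m\,dF_\infty = \int_{-\infty}^{\infty}x^m f_\infty(x)\,dx$. Since $\operatorname{trace}(A_{N\times N}) = 0$ and $\operatorname{trace}\left(A^2_{N\times N}\right) = 2L$, we find, besides $\int_{-\infty}^{\infty}f_\infty(x)\,dx = 1$, that

$$\int_{-\infty}^{\infty}x f_\infty(x)\,dx = 0, \qquad \int_{-\infty}^{\infty}x^2 f_\infty(x)\,dx = E[D]$$

where $E[D]$ is the mean degree. Multiplying both sides of (6.7) by $\frac{(-z)^m}{m!}$ and summing over all integers $m$ yields again the pgf

$$\varphi_\infty(z) = \sum_{m=0}^{\infty}w_G(m)\frac{(-z)^m}{m!} = \int_{-\infty}^{\infty}e^{-zx}\,dF_\infty(x) \quad (6.8)$$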
6.3 Examples of spectral density functions

128. Applying (6.3) to the $(N-1)$-hop path using (5.8) yields

$$\varphi_P(z) = \frac{1}{N}\sum_{k=1}^{N}\exp\left(-2z\cos\frac{k\pi}{N+1}\right)$$

Since $\lambda(x) = 2\cos\frac{x\pi}{N+1}$ and

$$\lim_{N\to\infty}\frac{e^{-z\lambda(N)}}{N} = \lim_{N\to\infty}\frac{e^{-2z\cos\frac{N\pi}{N+1}}}{N} = 0$$

art. 126 shows that the limit generating function exists,

$$\lim_{N\to\infty}\varphi(z) = \lim_{N\to\infty}\frac{1}{N}\int_0^N e^{-2z\cos\frac{x\pi}{N+1}}\,dx = \lim_{N\to\infty}\frac{N+1}{N\pi}\int_0^{\frac{N\pi}{N+1}}e^{-2z\cos\theta}\,d\theta$$

Hence,

$$\lim_{N\to\infty}\varphi_P(z) = \frac{1}{\pi}\int_0^\pi\exp(-2z\cos\theta)\,d\theta = I_0(-2z) = I_0(2z)$$

where $I_n(z)$ is the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6.19). The inverse Laplace transform is

$$\lim_{N\to\infty}f_\lambda(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}e^{zt}I_0(2z)\,dz = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}e^{z(t+2)}\left\{e^{-2z}I_0(2z)\right\}dz$$

and with the Laplace pair in Abramowitz and Stegun (1968, Section 29.3.124), we arrive at the spectral density function of an infinitely long path:

$$\lim_{N\to\infty}f_\lambda(t) = \frac{1}{\pi}\frac{1}{\sqrt{4 - t^2}}1_{\{|t| < 2\}} \quad (6.9)$$
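The limit (6.9) can be illustrated in two ways:

- compare the empirical distribution of the path eigenvalues (5.8) with the distribution function $\frac{1}{2} + \frac{1}{\pi}\arcsin\frac{t}{2}$ of the arcsine density (6.9);
- compare the finite-$N$ generating function with $I_0(2z)$.

This is a sketch assuming NumPy, whose `np.i0` evaluates $I_0$; $N = 2000$ and $z = 0.5$ are arbitrary choices.

```python
import numpy as np

def path_eigenvalues(N):
    """Adjacency eigenvalues 2 cos(k pi / (N + 1)), k = 1..N, of the (N-1)-hop path."""
    k = np.arange(1, N + 1)
    return 2.0 * np.cos(k * np.pi / (N + 1))

def arcsine_cdf(t):
    """Distribution function belonging to the limit density (6.9) on (-2, 2)."""
    return 0.5 + np.arcsin(np.clip(t / 2.0, -1.0, 1.0)) / np.pi

N = 2000
lam = np.sort(path_eigenvalues(N))
empirical = np.arange(1, N + 1) / N          # empirical distribution at the sorted eigenvalues
print(np.max(np.abs(empirical - arcsine_cdf(lam))))

z = 0.5
print(np.mean(np.exp(-z * lam)), np.i0(2 * z))   # phi_P(z) for finite N versus I_0(2z)
```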
129. The spectrum of an arbitrary path in a graph with $N$ nodes can be computed if the distribution of the hopcount $H_N > 0$ of that path is known. Indeed, the law of total probability yields

$$\begin{aligned} \Pr\left[\lambda_{\text{arbitrary path}} \le t\right] &= \sum_{k=1}^{N-1}\Pr\left[\lambda_{\text{arbitrary path}} \le t \mid H_N = k\right]\Pr[H_N = k] \\ &= \sum_{k=1}^{N-1}\Pr\left[\lambda_{k\text{-hop path}} \le t\right]\Pr[H_N = k] \end{aligned}$$

Differentiation gives us the density,

$$f_{\lambda_{\text{arbitrary path}}}(t) = \sum_{k=1}^{N-1}f_{\lambda_{k\text{-hop path}}}(t)\Pr[H_N = k]$$

Introducing the definition (6.1), combined with the spectrum specified in Section 5.5,

$$f_{\lambda_{k\text{-hop path}}}(t) = \frac{N-k-1}{N}\delta(t) + \frac{1}{N}\sum_{m=1}^{k+1}\delta\left(t - 2\cos\frac{m\pi}{k+2}\right)$$

gives

$$f_{\lambda_{\text{arbitrary path}}}(t) = \frac{N-1-E[H_N]}{N}\delta(t) + \frac{1}{N}\sum_{k=1}^{N-1}\Pr[H_N = k]\sum_{m=1}^{k+1}\delta\left(t - 2\cos\frac{m\pi}{k+2}\right)$$

The first term is the spectral peak at $t = 0$ due to the nodes outside the path; its strength equals $\frac{N-1-E[H_N]}{N}$. Just as for the $(N-1)$-hop path, the spectrum lies in the interval $(-2, 2)$. It sits at the discrete values $t = 2\cos\frac{m\pi}{k+2}$, which range over more possible values than for a path with a constant hop count. Moreover, the strength or amplitude of a peak is modulated by the hopcount distribution.
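The mixture above can be tabulated directly. In the following Python sketch, the hopcount distribution is a hypothetical input (a uniform law on $\{1, 2, 3, 4\}$ in the example). Eigenvalues are rounded so that equal values from different $k$ are merged. For even $k$, the path itself has an eigenvalue $2\cos\frac{\pi}{2} = 0$. The total mass the sketch finds at $t = 0$ therefore contains $\frac{1}{N}\Pr[H_N \text{ even}]$ on top of $\frac{N-1-E[H_N]}{N}$.

```python
import numpy as np
from collections import defaultdict

def arbitrary_path_atoms(N, hop_pmf, decimals=12):
    """Weights of the Dirac atoms in f_{arbitrary path}(t).

    hop_pmf maps a hopcount k (1 <= k <= N - 1) to Pr[H_N = k].  A k-hop path
    leaves N - k - 1 isolated nodes (eigenvalue 0) and contributes the k + 1
    eigenvalues 2 cos(m pi / (k + 2)); every eigenvalue carries weight 1/N.
    """
    atoms = defaultdict(float)
    for k, prob in hop_pmf.items():
        atoms[0.0] += prob * (N - k - 1) / N
        for m in range(1, k + 2):
            t = float(round(2.0 * np.cos(m * np.pi / (k + 2)), decimals)) + 0.0
            atoms[t] += prob / N
    return dict(atoms)

N = 10
hop_pmf = {k: 0.25 for k in range(1, 5)}        # hypothetical hopcount law
atoms = arbitrary_path_atoms(N, hop_pmf)
mean_hops = sum(k * q for k, q in hop_pmf.items())
print(sum(atoms.values()), atoms[0.0], (N - 1 - mean_hops) / N)
```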
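130. Applying (6.3) to the small-world graph $SW_k$, with (5.3), gives

$$\varphi_{SW_k}(z) = \frac{1}{N}\sum_{m=1}^{N}\exp\left(-z\left(\frac{\sin\frac{(m-1)(2k+1)\pi}{N}}{\sin\frac{(m-1)\pi}{N}} - 1\right)\right) = \frac{e^z}{N}\sum_{m=0}^{N-1}\exp\left(-z\frac{\sin\frac{m(2k+1)\pi}{N}}{\sin\frac{m\pi}{N}}\right)$$

Here

$$\lambda(x) = \frac{\sin\frac{(x-1)(2k+1)\pi}{N}}{\sin\frac{(x-1)\pi}{N}} - 1 \qquad \text{and} \qquad \lambda(N) = \frac{\sin\frac{(N-1)(2k+1)\pi}{N}}{\sin\frac{(N-1)\pi}{N}} - 1 \ge \left(\lambda_{SW_k}\right)_{\min}$$

Since $\lambda(N)$ is thus bounded from below independently of $N$, the limit generating function exists,

$$\lim_{N\to\infty}\varphi_{SW_k}(z) = \lim_{N\to\infty}\frac{e^z}{N}\int_0^N\exp\left(-z\frac{\sin\frac{(x-1)(2k+1)\pi}{N}}{\sin\frac{(x-1)\pi}{N}}\right)dx = \lim_{N\to\infty}\frac{e^z}{\pi}\int_{-\pi/N}^{(N-1)\pi/N}\exp\left(-z\frac{\sin(2k+1)\theta}{\sin\theta}\right)d\theta$$

Thus,

$$\lim_{N\to\infty}\varphi_{SW_k}(z) = \frac{e^z}{\pi}\int_0^\pi\exp\left(-z\frac{\sin(2k+1)\theta}{\sin\theta}\right)d\theta \quad (6.10)$$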
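In terms of the Chebyshev polynomial $U_n(x)$ of the second kind,

$$\lim_{N\to\infty}\varphi_{SW_k}(z) = \frac{e^z}{\pi}\int_0^\pi\exp\left(-zU_{2k}(\cos\theta)\right)d\theta$$

Since $\frac{\sin(2k+1)x}{\sin x} = \frac{\sin(2k+1)(\pi - x)}{\sin(\pi - x)}$, the definition (5.4) shows that $U_{2k}(\cos\theta) = U_{2k}(\cos(\pi - \theta))$. Hence,

$$\lim_{N\to\infty}\varphi_{SW_k}(z) = \frac{2e^z}{\pi}\int_0^{\pi/2}\exp\left(-zU_{2k}(\cos\theta)\right)d\theta \quad (6.11)$$

We return to the generating function (6.10). With $U_{2k}(\cos\theta) = \frac{\sin(2k+1)\theta}{\sin\theta} = 1 + 2\sum_{j=1}^{k}\cos 2j\theta$, we have

$$\lim_{N\to\infty}\varphi_{SW_k}(z) = \frac{1}{\pi}\int_0^\pi\exp\left(-2z\sum_{j=1}^{k}\cos 2j\theta\right)d\theta$$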
If $k = 1$, we find after some manipulations that $\frac{1}{\pi}\int_0^\pi\exp(-2z\cos 2\theta)\,d\theta = I_0(2z)$. This shows that the limit density of the infinite cycle ($k = 1$) is the same as that (6.9) of the infinite path. For $k = 2$, we find

$$\lim_{N\to\infty}\varphi_{SW_2}(z) = \frac{1}{\pi}\int_0^\pi e^{-2z\cos 2\theta}e^{-2z\cos 4\theta}\,d\theta$$

Applying the generating function (Abramowitz and Stegun, 1968, Section 9.6.34) of the modified Bessel function,

$$e^{z\cos x} = I_0(z) + 2\sum_{j=1}^{\infty}I_j(z)\cos(jx)$$

to $e^{-2z\cos 4\theta}$ gives

$$\lim_{N\to\infty}\varphi_{SW_2}(z) = I_0(2z)\frac{1}{\pi}\int_0^\pi e^{-2z\cos 2\theta}\,d\theta + 2\sum_{j=1}^{\infty}(-1)^j I_j(2z)\frac{1}{\pi}\int_0^\pi e^{-2z\cos 2\theta}\cos(4j\theta)\,d\theta$$

Using (Abramowitz and Stegun, 1968, Section 9.6.19)

$$I_n(z) = \frac{1}{\pi}\int_0^\pi e^{z\cos x}\cos nx\,dx$$

yields

$$\lim_{N\to\infty}\varphi_{SW_2}(z) = I_0^2(2z) + 2\sum_{j=1}^{\infty}(-1)^j I_j(2z)I_{2j}(2z)$$

The inverse Laplace transform is

$$\lim_{N\to\infty}f_{SW_2}(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}e^{zt}\left\{I_0^2(2z) + 2\sum_{j=1}^{\infty}(-1)^j I_j(2z)I_{2j}(2z)\right\}dz$$

We cannot evaluate this integral in closed form.
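Although the inverse transform is not available in closed form, the Bessel series for $\lim_{N\to\infty}\varphi_{SW_2}(z)$ itself can be checked against the defining integral. The sketch uses only Python's `math` module and NumPy. It evaluates $I_n$ from its power series, an implementation choice not taken from the text. The $\pi$-periodic integral is approximated by a uniform average over one period.

```python
import math
import numpy as np

def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) from its power series (integer n >= 0)."""
    return sum((x / 2) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def phi_sw2_integral(z, points=4096):
    """(1/pi) * integral over [0, pi] of exp(-2z (cos 2t + cos 4t)); the integrand is
    pi-periodic, so a uniform average over one period is highly accurate."""
    theta = np.pi * np.arange(points) / points
    return np.mean(np.exp(-2 * z * (np.cos(2 * theta) + np.cos(4 * theta))))

def phi_sw2_series(z, jmax=30):
    """I_0(2z)^2 + 2 * sum_{j=1}^{jmax} (-1)^j I_j(2z) I_{2j}(2z)."""
    x = 2 * z
    total = bessel_i(0, x) ** 2
    for j in range(1, jmax + 1):
        total += 2 * (-1) ** j * bessel_i(j, x) * bessel_i(2 * j, x)
    return total

for z in (0.1, 0.5, 1.0):
    print(z, phi_sw2_integral(z), phi_sw2_series(z))
```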
6.4 Density of a sparse regular graph

We present here an ingenious method of McKay (1981), who succeeded in finding the asymptotic density of the eigenvalues of the adjacency matrix of a regular, sparse graph.

131. A large sparse, regular graph. Consider a regular graph $G(r; N)$, where each node has degree $r$. The sparseness of $G(r; N)$ is here understood in the sense that $G(r; N)$ has a local tree-like structure. In other words, for small enough integers $h$, the graph induced by the nodes at hop distance $1, 2, \ldots, h$ from a certain node $n$ is a tree, more specifically a $k$-ary, regular tree with out-degree $k = r - 1$. We will first determine the moments via $\operatorname{trace}(A^m)$, as explained in art. 127, where each element $(A^m)_{jj}$ equals the number of closed walks of $m$ hops starting at node $j$ and returning at $j$ (art. 17). In view of the regularity of $G(r; N)$, it is expected that, for $N \to \infty$,

$$\frac{1}{N}\operatorname{trace}(A^m) = \frac{1}{N}\sum_{j=1}^{N}(A^m)_{jj} \to (A^m)_{nn}$$

Hence, for large $N$ and fixed $m$, the local structure around any node $n$ is almost the same. In addition, as long as the contribution $c(N)$ of cycles to $\operatorname{trace}(A^m)$ is small, i.e., $c(N) = o(N)$, the above limit is unaltered, for $\frac{1}{N}\left(\sum_{j=1}^{N}(A^m)_{jj} \pm c(N)\right) \to (A^m)_{nn}$. The fact that the number of cycles in $G(r; N)$ grows less than proportionally with $N$ is an alternative way to define the sparseness of $G(r; N)$.

132. Random walks and the reflection principle. McKay (1981) had the fortunate idea of relating the computation of the number of closed walks to the powerful reflection principle, primarily used in the theory of random walks. The largest hop distance reached in a $k$-ary tree by a closed walk of $m$ hops is $\left[\frac{m}{2}\right]$. The length $m$ of all closed walks in a $k$-ary tree is even. Moreover, all walks travel some hops down and return along the same path back to the root $n$. Due to the regular structure, the analogy with a path in a simple random walk is very effective.

In a simple random walk, an item moves along the (vertical) $x$-axis over the integers during $n$ epochs, measured along the horizontal $k$-axis. At each epoch $k$, the item jumps either one step to the right ($x_k = x_{k-1} + 1$) or one step to the left ($x_k = x_{k-1} - 1$). Assuming that the item starts at the origin at epoch 0, then $x_0 = 0$, and its position at $k = n$ equals $x_n = r - l$, where $r$ and $l$ are the total number of right and left steps, respectively. Geometrically, plotting the $x$-distance versus discrete time $k$, the sequence $x_1, x_2, \ldots, x_n$ represents a path from the origin to the point $(n, x_n)$. The number of such paths with $r$ right steps is $\binom{n}{r} = \binom{n}{n-r} = \binom{n}{l}$. This equals the number of paths with $l$ left steps, because $l + r = n$. Also, since $r = \frac{x+n}{2}$ (in the sequel we write $x$ for $x_n$), the number of paths from the origin to the point $(n, x)$ is

$$T_{(n,x)} = \binom{n}{\frac{x+n}{2}}1_{\left\{\frac{x+n}{2}\in\mathbb{N}\right\}} \quad (6.12)$$
In general, $x_k$ can be negative, positive or zero. The reflection principle states that:

Theorem 29 (Reflection principle) The number of paths from the point $a = (m, |x|)$ to the point $b = (n, |y|)$ that cross or touch the $k$-axis is equal to the number of all paths from $a^* = (m, -|x|)$ to that same point $b$. The reflection of a point $(t, x)$ is the point $(t, -x)$.

Proof: The reflection principle is demonstrated by a one-to-one correspondence between two subpaths. The first is the subpath from $a = (m, |x|)$ to $c = (\tau, 0)$, where $\tau$ is the first epoch at which the path touches the $k$-axis. The second is its reflection, the subpath from $a^* = (m, -|x|)$ to $c$. For each subpath from $a$ to $c$, there corresponds precisely one subpath from $a^*$ to $c$, and the sequel from $c$ to $b$ is the same in both cases. □

A direct consequence of Theorem 29 is the so-called ballot theorem:

Theorem 30 (Ballot) The number of paths from the origin to $(n, x)$, where $n, x \in \mathbb{N}_0$, that never touch the discrete time $k$-axis equals $\frac{x}{n}T_{(n,x)}$.
Never touching the $k$-axis implies that $x_1 > 0, x_2 > 0, \ldots, x_n = x > 0$. The proof is too nice not to include.

Proof: Since $x_1 > 0$, the first step in such a path necessarily leads to the point $(1, 1)$. Hence, the number of paths from the origin above the $k$-axis to the point $(n, x)$ is equal to the number of paths from $(1, 1)$ to $(n, x)$ lying above the $k$-axis. By the reflection principle and (6.12), that number equals $T_{(n-1,x-1)} - T_{(n-1,x+1)} = \frac{x}{n}T_{(n,x)}$. □

With this preparation, we can determine $(A^m)_{nn}$. Each walk of $m$ hops can be represented by the sequence of points $(1, 1), (2, h_2), \ldots, (m, h_m)$, where $h_j \ge 0$ is the distance in hops from the root node $n$. A closed walk means that $h_m = 0$. Each such walk of $m$ steps may consist of smaller walks, split each time that $h_j = 0$ for $0 < j < m$. In the language of a random walk, this happens each time that the path from the origin back to the origin, lying only above the $k$-axis, touches the $k$-axis at $(j, h_j = 0)$. We thus need to compute the number of such paths with $l$ points touching the $k$-axis. Feller (1970, pp. 90-91) proves that this number of paths equals $\frac{l}{m-l}\binom{m-l}{[m/2]}1_{\{\frac{m}{2}\in\mathbb{N}\}}$.

An elementary closed walk $c \in [1, l]$ consists of one excursion to some maximum level $H_c$ and back along the same track. If there is more than one maximum, then $H_c$ denotes the sum of these maxima. The total number of such elementary closed walks $c$ is $r(r-1)^{H_c-1}$, because the root has degree $r$ and, from hop level 1 on, each node has out-degree $r - 1$. Only the upward steps towards the local maxima contribute to the total number of walks of type $c$. Since there are $l$ such elementary closed walks, their total is $\prod_{c=1}^{l}r(r-1)^{H_c-1} = r^l(r-1)^{-l}(r-1)^{\sum_{c=1}^{l}H_c}$. Now, each closed walk has an even number of hops and precisely as many steps up as down in the $k$-ary tree. Hence, $\sum_{c=1}^{l}H_c = \left[\frac{m}{2}\right]$, the highest possible level to be reached. Thus, we end up with a total of $\frac{l}{m-l}\binom{m-l}{[m/2]}1_{\{\frac{m}{2}\in\mathbb{N}\}}r^l(r-1)^{[\frac{m}{2}]-l}$ walks with $l$ touching points. Finally, summing over all possible values of $l$ yields McKay's basic result

$$\left(A^{2m}\right)_{nn} = \sum_{l=1}^{m}\frac{l}{2m-l}\binom{2m-l}{m}r^l(r-1)^{m-l} \quad (6.13)$$

$$\left(A^{2m+1}\right)_{nn} = 0$$

133. Asymptotic density $f_\infty(x)$. The next hurdle is the inversion of (6.7) in art. 127. We assume that the limit density exists and is differentiable, such that

$$\int_{-\infty}^{\infty}x^{2m}f_\infty(x)\,dx = \sum_{l=1}^{m}\frac{l}{2m-l}\binom{2m-l}{m}r^l(r-1)^{m-l}$$

and that $f_\infty(-x) = f_\infty(x)$ is even, to satisfy $\left(A^{2m+1}\right)_{nn} = 0$. Recall from Theorems 21 and 22 that symmetry in the spectrum of $A$ is the unique fingerprint of a bipartite structure, of which a tree is a special case. McKay succeeded in finding $f_\infty(x)$ by inverting this relation, using a rather complicated method.
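The counting arguments of art. 132 can be verified by brute force for small sizes. This Python sketch (function names hypothetical) makes two comparisons. First, the ballot formula of Theorem 30 against an exhaustive enumeration. Second, McKay's formula (6.13) against a direct count of closed walks in the infinite $r$-regular tree, which tracks how many walks are at each hop distance from the root.

```python
from itertools import product
from math import comb

def ballot_brute_force(n, x):
    """Paths of n +-1 steps from the origin to (n, x) with x_1, ..., x_n > 0."""
    count = 0
    for steps in product((1, -1), repeat=n):
        position = 0
        positive = True
        for step in steps:
            position += step
            if position <= 0:
                positive = False
                break
        if positive and position == x:
            count += 1
    return count

def ballot_formula(n, x):
    """(x / n) T_(n,x), with T_(n,x) from (6.12)."""
    if (x + n) % 2:
        return 0
    return x * comb(n, (x + n) // 2) // n

def closed_walks_mckay(r, m):
    """(A^{2m})_nn in the r-regular tree from McKay's formula (6.13)."""
    total = 0
    for l in range(1, m + 1):
        paths = l * comb(2 * m - l, m) // (2 * m - l)   # Feller's count, an integer
        total += paths * r**l * (r - 1) ** (m - l)
    return total

def closed_walks_tree(r, steps):
    """Closed walks of the given length from the root of the infinite r-regular tree."""
    counts = {0: 1}                    # counts[h]: walks currently at hop distance h
    for _ in range(steps):
        new_counts = {}
        for h, c in counts.items():
            away = r if h == 0 else r - 1        # neighbours one hop further out
            new_counts[h + 1] = new_counts.get(h + 1, 0) + c * away
            if h > 0:                            # one step back towards the root
                new_counts[h - 1] = new_counts.get(h - 1, 0) + c
        counts = new_counts
    return counts.get(0, 0)

print(ballot_brute_force(8, 2), ballot_formula(8, 2))
print([closed_walks_mckay(3, m) for m in range(1, 5)])
print([closed_walks_tree(3, 2 * m) for m in range(1, 5)])
```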
Fig. 6.1. The spectral density $f_\infty(x)$ of a large sparse regular graph for various values of the degree $r$ (curves for $r = 2, 3, 4, 5$; horizontal axis: eigenvalue $x$).
He presents various alternative sums of (6.13) without derivation. He then derives an asymptotic form of (6.13) for large $m$ to conclude that the extent of $f_\infty(x)$ is bounded, i.e., $f_\infty(x)$ exists only for $|x| \le 2\sqrt{r-1}$. After normalizing the $x$-range to the interval $[-1, 1]$, he employs Chebyshev polynomials and their orthogonality properties to execute the inversion, resulting in:

Theorem 31 (McKay's Law) The asymptotic density $f_\infty(x)$ of the eigenvalues of the adjacency matrix of a large, sparse regular graph with degree $r$ equals

$$f_\infty(x) = \frac{r\sqrt{4(r-1) - x^2}}{2\pi\left(r^2 - x^2\right)}1_{\{|x| \le 2\sqrt{r-1}\}} \quad (6.14)$$

This spectral density (6.14) is plotted in Fig. 6.1. For $r = 2$, we again find the spectral density of an infinitely long path (6.9).
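As a sanity check of (6.14), its moments should reproduce (6.13). The zeroth moment is 1, the second is $r$ and the fourth is $r(2r-1)$, the cases $m = 1, 2$ of (6.13). A NumPy sketch using a midpoint rule with an arbitrary number of grid points, for $r \ge 3$, where the density is bounded:

```python
import numpy as np

def mckay_density(x, r):
    """McKay's law (6.14); intended for r >= 3, where the density is bounded."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= 2.0 * np.sqrt(r - 1)
    root = np.sqrt(np.where(inside, 4.0 * (r - 1) - x**2, 0.0))
    return np.where(inside, r * root / (2.0 * np.pi * (r**2 - x**2)), 0.0)

def mckay_moment(r, power, points=400_000):
    """Midpoint-rule approximation of the power-th moment of (6.14)."""
    radius = 2.0 * np.sqrt(r - 1)
    edges = np.linspace(-radius, radius, points + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(mid**power * mckay_density(mid, r)) * (edges[1] - edges[0])

for r in (3, 4):
    print(r, [round(mckay_moment(r, q), 6) for q in (0, 2, 4)], [1, r, r * (2 * r - 1)])
```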
6.5 Random matrix theory

134. Random matrix theory investigates the eigenvalues of an $N \times N$ matrix $A$ whose elements $a_{ij}$ are random variables with a given joint distribution. Even in case all elements $a_{ij}$ are independent, there does not exist a general expression for the distribution of the eigenvalues. However, in some particular cases (such as Gaussian elements $a_{ij}$), there exist nice results. Moreover, if the elements $a_{ij}$ are properly scaled, in various cases the spectrum seems to converge rapidly, in the limit $N \to \infty$, to a deterministic limit distribution. Mehta (1991) gives an overview of the fascinating results of random matrix theory and of its applications, ranging from nuclear physics to the distribution of the non-trivial zeros of the Riemann Zeta function. Recent advances in random matrix theory, discussed by Edelman and Raj Rao (2005), present a general framework that relates, among others, the laws of Wigner (Theorem 32), McKay (Theorem 31) and Marčenko-Pastur (Theorem 34) to Hermite, Jacobi and Laguerre orthogonal polynomials (see Chapter 10), respectively. A rigorous mathematical treatment of random matrix theory has just appeared in Anderson et al. (2010). Random matrix theory immediately applies to the adjacency matrix of the Erdős-Rényi random graph $G_p(N)$, where each element $a_{ij}$ is 1 with probability $p$ and zero with probability $1 - p$.
6.5.1 Wigner's Semicircle Law

135. Wigner's Semicircle Law is the fundamental result in the spectral theory of large random matrices.

Theorem 32 (Wigner's Semicircle Law) Let $A$ be a random $N \times N$ real symmetric matrix with independent and identically distributed elements $a_{ij}$ with $\sigma^2 = \mathrm{Var}[a_{ij}]$. Denote by $\lambda(A_N)$ an eigenvalue of the set of the $N$ real eigenvalues of the scaled matrix $A_N = \frac{A}{\sqrt{N}}$. The probability density function $f_{\lambda(A_N)}(x)$ tends for $N \to \infty$ to

$$\lim_{N\to\infty}f_{\lambda(A_N)}(x) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}\,1_{\{|x| \le 2\sigma\}} \quad (6.15)$$

Since Wigner's first proof (Wigner, 1955) of Theorem 32 and his subsequent generalizations (Wigner, 1957, 1958), many proofs have been published. However, none of them is short and easy enough to include here. Wigner's Semicircle Law illustrates that, for sufficiently large $N$, the distribution of the eigenvalues of $\frac{A}{\sqrt{N}}$ no longer depends on the probability distribution of the elements $a_{ij}$. Hence, Wigner's Semicircle Law exhibits a universal property of a class of large, real symmetric matrices with independent random elements. Mehta (1991) suspects that, for a much broader class of large random matrices, a mysterious yet unknown law of large numbers must be hidden.

The adjacency matrix of the Erdős-Rényi random graph satisfies the conditions in Theorem 32 with $\sigma^2 = p(1-p)$, and its eigenvalues (apart from the largest) grow as $O\left(\sqrt{N}\right)$. In order to obtain the finite limit distribution (6.15), scaling by $\frac{1}{\sqrt{N}}$ is necessary.

The moment relation (6.5) for the eigenvalues suggests computing the moments of Wigner's Semicircle Law (6.15),

$$E[\lambda^n] = \int_{-2\sigma}^{2\sigma}x^n\lim_{N\to\infty}f_{\lambda(A_N)}(x)\,dx = \frac{1}{2\pi\sigma^2}\int_{-2\sigma}^{2\sigma}x^n\sqrt{4\sigma^2 - x^2}\,dx$$

Thus, $E[\lambda^n] = \sigma^n C_n$, where

$$C_n = \frac{2^{n+1}}{\pi}\int_{-1}^{1}t^n\sqrt{1 - t^2}\,dt \quad (6.16)$$
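shows that $E[\lambda^n] = 0$ for odd values of $n$, because the integrand is an odd function over a symmetric interval. Using the integral of the Beta function (Abramowitz and Stegun, 1968, Section 6.2.1), for $\mathrm{Re}(z) > 0$ and $\mathrm{Re}(w) > 0$,

$$B(z, w) = \int_0^1 t^{z-1}(1 - t)^{w-1}\,dt = \frac{\Gamma(z)\Gamma(w)}{\Gamma(z + w)}$$

we find that

$$C_{2k} = \frac{2^{2k+2}}{\pi}\int_0^1 t^{2k}\sqrt{1 - t^2}\,dt = \frac{2^{2k+1}}{\pi}\int_0^1 x^{k-\frac{1}{2}}(1 - x)^{\frac{1}{2}}\,dx = \frac{2^{2k+1}}{\pi}\frac{\Gamma\left(k + \frac{1}{2}\right)\Gamma\left(\frac{3}{2}\right)}{\Gamma(k + 2)}$$

Using the functional equation $\Gamma(z + 1) = z\Gamma(z)$, $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ and the duplication formula (Abramowitz and Stegun, 1968, Section 6.1.18),

$$\Gamma(2z) = \frac{2^{2z-1}}{\sqrt{\pi}}\Gamma(z)\Gamma\left(z + \frac{1}{2}\right)$$

finally gives

$$C_{2k} = \frac{1}{k+1}\binom{2k}{k} = \frac{(2k)!}{(k+1)!\,k!}$$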
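The moments $E\left[\lambda^{2k}\right] = \sigma^{2k}C_{2k}$ can be checked in two ways: by integrating (6.15) numerically and by sampling a Wigner matrix. The sketch below assumes NumPy. The $\pm 1$ entries (so $\sigma^2 = 1$), the size $N$ and the seed are illustrative choices, and finite-$N$ sample moments agree only approximately.

```python
import numpy as np
from math import comb

def catalan(k):
    """C_{2k} = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

def semicircle_moment(n, sigma=1.0, points=200_000):
    """Midpoint-rule approximation of the n-th moment of the semicircle density (6.15)."""
    edges = np.linspace(-2 * sigma, 2 * sigma, points + 1)
    x = 0.5 * (edges[:-1] + edges[1:])
    density = np.sqrt(4 * sigma**2 - x**2) / (2 * np.pi * sigma**2)
    return np.sum(x**n * density) * (edges[1] - edges[0])

for k in range(1, 5):
    print(2 * k, semicircle_moment(2 * k), catalan(k))

# Sampled Wigner matrix with i.i.d. +-1 elements (sigma^2 = 1), scaled by 1/sqrt(N)
rng = np.random.default_rng(3)
N = 1000
upper = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), k=1)
A = upper + upper.T + np.diag(rng.choice([-1.0, 1.0], size=N))
lam = np.linalg.eigvalsh(A / np.sqrt(N))
sample_moments = [np.mean(lam ** (2 * k)) for k in range(1, 4)]
print(sample_moments)
```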
These numbers $C_{2k}$ are known as Catalan numbers (Comtet, 1974). A distribution with bounded support is uniquely determined by its moments. Hence the only such distribution whose moments are Catalan numbers is the semicircle distribution, with density function given by (6.15).

We now give another derivation of the integral (6.16), one that avoids the theory of the Gamma function. We can rewrite

$$C_n = \frac{2^{n+1}}{\pi}\int_{-1}^{1}t^n\sqrt{1 - t^2}\,dt = \frac{2^{n+1}}{\pi}\int_{-1}^{1}\frac{t}{\sqrt{1 - t^2}}\left\{t^{n-1}\left(1 - t^2\right)\right\}dt$$

Recognizing that $\frac{d}{dt}\sqrt{1 - t^2} = -\frac{t}{\sqrt{1 - t^2}}$, partial integration gives

$$\begin{aligned} C_n &= \frac{2^{n+1}}{\pi}\int_{-1}^{1}\sqrt{1 - t^2}\,\frac{d}{dt}\left\{t^{n-1}\left(1 - t^2\right)\right\}dt \\ &= \frac{2^{n+1}}{\pi}\int_{-1}^{1}\sqrt{1 - t^2}\left\{(n-1)t^{n-2} - (n+1)t^n\right\}dt \\ &= 2^2(n-1)C_{n-2} - (n+1)C_n \end{aligned}$$

which leads, with $C_0 = 1$ and $C_1 = 0$, to the recursion

$$C_n = \frac{2^2(n-1)}{n+2}C_{n-2}$$

Iteration gives

$$C_n = 2^{2p}\,\frac{n-1}{n+2}\cdot\frac{n-3}{n}\cdot\frac{n-5}{n-2}\cdots\frac{n-(2p-1)}{n-(2p-4)}\,C_{n-2p}$$

If $n$ is odd, $C_n = 0$ (as found above), while if $n = 2k$ and $p = k$, then

$$C_{2k} = 2^{2k}\,\frac{2k-1}{2k+2}\cdot\frac{2k-3}{2k}\cdot\frac{2k-5}{2k-2}\cdots\frac{1}{4} = 2^{2k}\,\frac{(2k)!}{2^k k!}\cdot\frac{1}{2^k(k+1)!} = \frac{(2k)!}{(k+1)!\,k!}$$
which again results in the Catalan numbers. The Catalan numbers appear in many combinatorial problems (see, e.g., Comtet (1974)). For example, consider the paths in the simple random walk that start from the origin, return to the origin at time $n = 2m$, and never cross (but may touch) the $k$-axis. Their number is deduced from the reflection principle (Theorem 29) as

$$T_{(2m,0)} - T_{(2m,2)} = \binom{2m}{m} - \binom{2m}{m-1} = C_{2m}$$

Indeed, the number of paths from the origin to $(2m, 0)$ that never cross the $k$-axis equals the total number of such paths, $T_{(2m,0)}$, minus the number of paths from the origin to $(2m, 0)$ that cross the $k$-axis at some point. A path that crosses the $k$-axis touches the line $x = -1$. The reflection principle, stated with respect to the line $x = 0$ (the $k$-axis), evidently also applies to a reflection around any line $x = j \in \mathbb{Z}$. Thus, the number of paths from $(2m, 0)$ to the origin that touch or cross the line $x = -1$ is equal to the total number of paths from $(2m, -2)$ to the origin. That latter number is $T_{(2m,2)}$, which demonstrates the claim.

136. A single eigenvalue has measure zero and does not contribute to the limit probability density function (6.15). By using Wigner's method, Füredi and Komlós (1981) have extended Wigner's Theorem 32.

Theorem 33 (Füredi-Komlós) Let $A$ be a random $N \times N$ real symmetric matrix whose elements $a_{ij} = a_{ji}$ are independent (not necessarily identically distributed) random variables bounded by a common bound $K$. Assume that, for $i \ne j$, these random variables possess a common mean $E[a_{ij}] = \mu$ and common variance $\mathrm{Var}[a_{ij}] = \sigma^2$, while $E[a_{ii}] = \nu$.

(a) If $\mu > 0$, then the distribution of the largest eigenvalue $\lambda_1(A)$ can be approximated to within order $O\left(\frac{1}{\sqrt{N}}\right)$ by a Gaussian distribution with mean

$$E[\lambda_1(A)] \simeq (N-1)\mu + \nu + \frac{\sigma^2}{\mu}$$

and (bounded) variance

$$\mathrm{Var}[\lambda_1(A)] \simeq 2\sigma^2$$

In addition, with probability tending to 1,

$$\max_{j>1}|\lambda_j(A)| < 2\sigma\sqrt{N} + O\left(N^{1/3}\log N\right) \quad (6.17)$$

(b) If $\mu = 0$, then all eigenvalues of $A$, including the largest, obey the last bound (6.17).

137. Spectrum of the Erdős-Rényi random graph. We apply the powerful Theorem 33 to the Erdős-Rényi random graph $G_p(N)$. Since $\mu = p$, $\nu = 0$ and $\sigma^2 = p(1-p)$, Theorem 33 states the following. The largest eigenvalue $\lambda_1$ is a Gaussian random variable with mean $E[\lambda_1] = (N-2)p + 1 + O\left(\frac{1}{\sqrt{N}}\right)$ and $\mathrm{Var}[\lambda_1] \simeq 2p(1-p)$. All other eigenvalues are smaller in absolute value than $2\sqrt{p(1-p)N} + O\left(N^{1/3}\log N\right)$.
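A single simulated realization illustrates the statements of art. 137. In this NumPy sketch, $N = 400$, $p = 0.3$ and the seed are arbitrary. The comparison is only approximate: $\lambda_1$ fluctuates with standard deviation about $\sqrt{2p(1-p)}$, and (6.17) contains a lower-order term.

```python
import numpy as np

def gp_adjacency(N, p, rng):
    """Adjacency matrix of one sample of the Erdos-Renyi random graph G_p(N)."""
    upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    return upper + upper.T

rng = np.random.default_rng(2024)
N, p = 400, 0.3
lam = np.sort(np.linalg.eigvalsh(gp_adjacency(N, p, rng)))[::-1]   # descending

mean_lambda1 = (N - 2) * p + 1               # E[lambda_1] from Theorem 33
bulk_radius = 2 * np.sqrt(p * (1 - p) * N)   # leading term of the bound (6.17)
print(lam[0], mean_lambda1)
print(np.max(np.abs(lam[1:])), bulk_radius)
```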
Fig. 6.2. The probability density function $f_\lambda(x)$ of an eigenvalue in $G_p(50)$ for $p = 0.1, 0.2, \ldots, 0.9$ (horizontal axis: eigenvalue $x$; the location $E[\lambda_1] = p(N-2) + 1$ of the largest eigenvalue is indicated). Wigner's Semicircle Law, rescaled and for $p = 0.5$ ($\sigma^2 = \frac{1}{4}$), is shown in bold. We observe that the spectrum for $p$ and $1 - p$ is similar, but slightly shifted. The high peak for $p = 0.1$ reflects disconnectivity, while the high peak at $p = 0.9$ shows the tendency to the spectrum of the complete graph, where $N - 1$ eigenvalues are precisely $-1$.
The spectrum of $G_p(50)$ together with the properly rescaled Wigner's Semicircle Law (6.15) is plotted in Fig. 6.2. Already for this small value of $N$, we observe that Wigner's Semicircle Law is a reasonable approximation for the intermediate $p$-region. The largest eigenvalue $\lambda_1$ for finite $N$ is almost Gaussian distributed around $p(N-2) + 1$ with variance $2p(1-p)$ by Theorem 33, and is shown in Fig. 6.2. Since it is not incorporated in Wigner's Semicircle Law, it influences the average $E[\lambda] = \frac{1}{N}\sum_{k=1}^{N}\lambda_k = 0$. As a result, the major bulk of the pdf around $x = 0$ shifts leftward compared to Wigner's Semicircle Law, which is perfectly centered around $x = 0$. The finite size variant of the Wigner Semicircle Law for the eigenvalue distribution of the adjacency matrix of the Erdős-Rényi random graph $G_p(N)$ is

$$f_\lambda(x) \simeq \frac{\sqrt{4Np(1-p) - (x + p)^2}}{2\pi Np(1-p)}, \qquad |x + p| \le 2\sqrt{p(1-p)N} \quad (6.18)$$
The expression (6.18) for the bulk density of eigenvalues, thus also ignoring the largest eigenvalue $\lambda_1$, agrees very well with simulations for finite $N$. Below, we sketch the derivation of (6.18). The probabilistic companion of (3.1) is
$$E[\lambda] = \sum_{k} \lambda_k \Pr[\lambda = \lambda_k] = 0$$
while the discrete random variable $\lambda$ needs to satisfy
$$\sum_{k} \Pr[\lambda = \lambda_k] = 1$$
The Perron-Frobenius Theorem 38 states that any connected graph has one largest eigenvalue $\lambda_1$ with multiplicity one, such that $\Pr[\lambda = \lambda_1] = \frac{1}{N}$. Both the mean and the law of total probability can be written, for one realization of an Erdős-Rényi random graph, as
$$E[\lambda] = \frac{\lambda_1}{N} + \sum_{\text{all others}} \lambda_k \Pr[\lambda = \lambda_k] = 0 \tag{6.19}$$
and
$$\sum_{\text{all others}} \Pr[\lambda = \lambda_k] = 1 - \frac{1}{N}$$
Fig. 6.2 suggests considering the Semicircle Law for finite $N$ shifted over some value $\varepsilon$,
$$f(x;\varepsilon) = \frac{\sqrt{4Np(1-p) - (x+\varepsilon)^2}}{2\pi N p(1-p)}, \qquad |x+\varepsilon| \le 2\sqrt{p(1-p)N}$$
Denoting the radius $R = 2\sqrt{p(1-p)N}$ and passing to the continuous random variable, relation (6.19) becomes
$$0 = \frac{\lambda_1}{N} + \int_{-R-\varepsilon}^{R-\varepsilon} x f(x;\varepsilon)\,dx = \frac{\lambda_1}{N} + \int_{-R-\varepsilon}^{R-\varepsilon} (x+\varepsilon) f(x;\varepsilon)\,dx - \varepsilon\int_{-R-\varepsilon}^{R-\varepsilon} f(x;\varepsilon)\,dx$$
Since $\int_{-R-\varepsilon}^{R-\varepsilon} (x+\varepsilon) f(x;\varepsilon)\,dx = 0$ due to symmetry and $\int_{-R-\varepsilon}^{R-\varepsilon} f(x;\varepsilon)\,dx = 1 - \frac{1}{N}$, we obtain
$$\frac{\lambda_1}{N} - \varepsilon\left(1 - \frac{1}{N}\right) = 0$$
Finally, Theorem 33 states that $\lambda_1 = (N-2)p + O(1)$, such that $\varepsilon = p + O\left(N^{-1}\right)$, leading to (6.18).
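The shift $\varepsilon$ can be checked on one realization of $G_p(N)$. Because $\mathrm{trace}(A) = 0$, the $N-1$ bulk eigenvalues average exactly $-\lambda_1/(N-1)$, which the derivation above approximates by $-p$. The sketch below assumes NumPy; $N = 400$, $p = 0.3$ and the seed are arbitrary choices.

```python
import numpy as np


def erdos_renyi_adjacency(N, p, rng):
    """Symmetric 0/1 adjacency matrix of one realization of G_p(N), without self-loops."""
    upper = np.triu(rng.random((N, N)) < p, k=1)
    return (upper | upper.T).astype(float)


rng = np.random.default_rng(2011)
N, p = 400, 0.3
A = erdos_renyi_adjacency(N, p, rng)
eig = np.linalg.eigvalsh(A)        # ascending order
lam1, bulk = eig[-1], eig[:-1]

# trace(A) = 0, so the N - 1 bulk eigenvalues average exactly -lambda_1 / (N - 1).
eps = -bulk.mean()
print(f"lambda_1 = {lam1:.3f}   p(N-2)+1 = {p * (N - 2) + 1:.3f}")
print(f"eps = {eps:.4f}   lambda_1/(N-1) = {lam1 / (N - 1):.4f}   p = {p}")
```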
[Figure 6.3: $f_\lambda(x)$ versus eigenvalue $x$ for $N = 100$ and $p = 0.2, 0.3, \ldots, 0.9$, together with Wigner's Semicircle Law for $p = 0.5$.]
Fig. 6.3. The spectrum of the adjacency matrix of $G_p(100)$ (full lines) and of the corresponding matrix with i.i.d. uniform elements (dotted lines). The small peaks at higher values of $x$ are due to $\lambda_1$.
The complement of $G_p(N)$ is $(G_p(N))^c = G_{1-p}(N)$, because a link in $G_p(N)$ is present with probability $p$ and absent with probability $1-p$, and $(G_p(N))^c$ is also a random graph. For large $N$, there exists a large range of $p$ values for which both $p > p_c$ and $1-p > p_c$, such that both $G_p(N)$ and $(G_p(N))^c$ are connected almost surely. Figure 6.2 shows that the normalized spectra of $G_p(N)$ and $G_{1-p}(N)$ are, apart from a small shift and ignoring the largest eigenvalue, almost identical.
Equation (3.30) indicates that the spectra of a graph and of its complement tend to each other if $\cos\theta_j \to 0$. The exception is the largest eigenvalue, whose eigenvector will tend to $u$. This seems to suggest that $G_p(N)$ and $G_{1-p}(N)$ tend to regular graphs with degree $p(N-1)$ and $(1-p)(N-1)$, respectively. These regular graphs (even for small $N$) would then have nearly the same spectrum, apart from the largest eigenvalues $p(N-1)$ and $(1-p)(N-1)$, respectively: $\lambda_{1-p} \simeq -1 - \lambda_p$, where $\lambda_p$ is an eigenvalue of $G_p(N)$ other than the largest one.

Figure 6.3 shows the probability density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix $A$ of $G_p(N)$ with $N = 100$. It also shows the eigenvalues of the corresponding matrix $A_U$, in which all one-elements of the adjacency matrix of $G_p(100)$ are replaced by i.i.d. uniform random variables on $[0, 1]$. Wigner's Semicircle Law provides an even better approximation than for $N = 50$. The elements of $A_U$ are always smaller (with probability 1) than those of $A$. Hence the matrix norm satisfies $\|A_U\|_q < \|A\|_q$, and together with the inequality (8.51) this implies that $\lambda_1(A_U) < \lambda_1(A)$. In addition, relation (3.5) shows that $\sum_{k=1}^{N}\lambda_k^2(A_U) < 2L$, such that $\mathrm{Var}[\lambda(A_U)] < \mathrm{Var}[\lambda(A)]$. This is manifested by a narrower and higher peaked pdf centered around $x = 0$.
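The two inequalities for $A_U$ are easy to reproduce numerically. The sketch below assumes NumPy; the seed and the values $N = 100$, $p = 0.5$ are illustrative. $A_U$ is built as described above: each one-element of $A$ is replaced by an independent uniform weight, keeping the matrix symmetric.

```python
import numpy as np

rng = np.random.default_rng(7)
N, p = 100, 0.5

# Adjacency matrix A of one realization of G_p(N).
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)

# A_U: every one-element of A replaced by an i.i.d. uniform weight on [0, 1), kept symmetric.
W = np.triu(rng.random((N, N)), k=1)
A_U = A * (W + W.T)

L = int(A.sum() / 2)
lam = np.linalg.eigvalsh(A)
lam_U = np.linalg.eigvalsh(A_U)

print(f"lambda_1(A) = {lam[-1]:.3f}, lambda_1(A_U) = {lam_U[-1]:.3f}")
print(f"sum lambda^2(A) = {np.sum(lam ** 2):.1f} (2L = {2 * L}), sum lambda^2(A_U) = {np.sum(lam_U ** 2):.1f}")
print(f"Var[lambda(A)] = {lam.var():.3f}, Var[lambda(A_U)] = {lam_U.var():.3f}")
```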
6.5.2 The Marčenko-Pastur Law

The last of the classical laws in random matrix theory with an analytic density function for the eigenvalues is given in the next theorem without proof:

Theorem 34 (The Marčenko-Pastur law) Let $C$ be a random $m \times n$ matrix with independent and identically distributed complex elements $c_{ij}$ with finite $\sigma^2 = \mathrm{Var}[c_{ij}]$ and zero mean $E[c_{ij}] = 0$, or the complex elements $c_{ij}$ are independently distributed with a finite fourth-order moment. Let $y = \frac{m}{n}$ as $n \to \infty$ and define $a(y) = \sigma^2\left(1 - \sqrt{y}\right)^2$ and $b(y) = \sigma^2\left(1 + \sqrt{y}\right)^2$. Denote by $\lambda(S)$ an eigenvalue of the set of the $m$ real eigenvalues of the scaled Hermitian matrix $S = \frac{1}{n}CC^*$. The probability density function $f_{\lambda(S)}(x)$ tends for $n \to \infty$ to
$$\lim_{n\to\infty} f_{\lambda(S)}(x) = \frac{1_{\{a(y)\le x\le b(y)\}}}{2\pi x y \sigma^2}\sqrt{\left(x - a(y)\right)\left(b(y) - x\right)} + \left(1 - \frac{1}{y}\right)\delta(x)\,1_{\{y>1\}} \tag{6.20}$$

Marčenko and Pastur (1967) prove Theorem 34 by deriving a first-order partial differential equation, from whose solution the unique Stieltjes transform $m(\psi; z)$ of $\psi(x) = \lim_{n\to\infty} f_{\lambda(S)}(x)$ is found. The Stieltjes transform of a function $f(x)$, defined by
$$m(f; z) = \int_{-\infty}^{\infty} \frac{f(x)}{z - x}\,dx$$
is essentially a special case of an integral of the Cauchy type, which is treated, together with its inverse, in art. 252. This method is essentially different from the moments method used by McKay (sketched in Section 6.4) and earlier by Wigner (1955).
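To make (6.20) concrete, the following sketch evaluates its continuous part and integrates it numerically. It assumes NumPy and $\sigma^2 = 1$ by default, and the helper names are hypothetical. The continuous part should carry mass 1 for $y \le 1$ and mass $1/y$ for $y > 1$. The remaining $1 - 1/y$ is the point mass at $x = 0$.

```python
import math

import numpy as np


def mp_continuous(x, y, sigma2=1.0):
    """Continuous part of (6.20) on [a(y), b(y)]; the point mass at x = 0 is excluded."""
    a = sigma2 * (1.0 - math.sqrt(y)) ** 2
    b = sigma2 * (1.0 + math.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    inside = (x > a) & (x < b)
    safe_x = np.where(inside, x, 1.0)              # avoids division by zero outside the support
    root = np.sqrt(np.clip((x - a) * (b - x), 0.0, None))
    return np.where(inside, root / (2.0 * math.pi * safe_x * y * sigma2), 0.0)


def continuous_mass(y, sigma2=1.0, n=400_000):
    """Midpoint-rule integral of mp_continuous over [a(y), b(y)]."""
    a = sigma2 * (1.0 - math.sqrt(y)) ** 2
    b = sigma2 * (1.0 + math.sqrt(y)) ** 2
    h = (b - a) / n
    mid = a + h * (np.arange(n) + 0.5)
    return float(np.sum(mp_continuous(mid, y, sigma2)) * h)


for y in (0.25, 0.5, 4.0):
    print(f"y = {y}: continuous mass = {continuous_mass(y):.5f}, min(1, 1/y) = {min(1.0, 1.0 / y):.5f}")
```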
[Figure 6.4: $f_{\lambda(S)}(x)$ versus eigenvalue $x$ for $y = 1, 0.5, 0.25, 0.1, 0.01$.]
Fig. 6.4. The Marčenko-Pastur probability density function (6.20) for various values of $y$. Each curve starts at $x = a(y)$, which increases from 0 to 1 when $y$ decreases from 1 to 0, and ends at $x = b(y)$, which decreases from 4 to 1 when $y$ decreases from 1 to 0. When $y \to 0$, the Marčenko-Pastur probability density function tends to a delta function at $x = 1$.
The last term in (6.20), the point mass at $x = 0$, is a consequence of the non-square form of $C$. Since $\mathrm{rank}(CC^*) \le \min(n, m)$, for $y > 1$ the $m \times m$ matrix $\frac{1}{n}CC^*$ has $m - n = m\left(1 - \frac{1}{y}\right)$ zero eigenvalues. All other $n$ eigenvalues are the same as those of $\frac{1}{n}C^*C$, which follows from art. 184. Consider the case $m = n$ and $y = 1$, with $C = C^T = A$ symmetric. The eigenvalues of $S = \frac{1}{n}A^2$ are then the squares of those of $\frac{A}{\sqrt{n}}$. The latter eigenvalues obey the Wigner Semicircle Law (6.15). Moreover, for any random variable $X$, the density $f_{X^2}(x) = \frac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}}$, as shown in Van Mieghem (2006b, p. 50). Combining both, we find, indeed for $y = 1$, that
$$f_{\lambda(S)}(x) = \frac{f_{\lambda(A/\sqrt{n})}(\sqrt{x}) + f_{\lambda(A/\sqrt{n})}(-\sqrt{x})}{2\sqrt{x}} = \frac{f_{\lambda(A/\sqrt{n})}(\sqrt{x})}{\sqrt{x}}$$
Also, in that case, the matrix $S$ represents a square covariance matrix. In general, for $m$ real random $n \times 1$ vectors, $S$ represents the $m \times m$ (sample) covariance matrix, which appears in many applications of signal and information theory and physics. Fig. 6.4 illustrates the Marčenko-Pastur probability density function $f_{\lambda(S)}(x)$ for various values of $y \le 1$.
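The identity for $y = 1$ can also be checked pointwise. The sketch below assumes NumPy and unit variance ($\sigma^2 = 1$). The eigenvalues of $A/\sqrt{n}$ then follow the semicircle on $[-2, 2]$, and (6.20) has support $[0, 4]$.

```python
import math

import numpy as np


def semicircle(u):
    """Semicircle density on [-2, 2]: eigenvalues of A / sqrt(n) for unit-variance entries."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 2.0, np.sqrt(np.clip(4.0 - u ** 2, 0.0, None)) / (2.0 * math.pi), 0.0)


def mp_square(x):
    """(6.20) for y = 1 and sigma^2 = 1: support [0, 4] and no point mass."""
    x = np.asarray(x, dtype=float)
    inside = (x > 0.0) & (x < 4.0)
    safe_x = np.where(inside, x, 1.0)
    return np.where(inside, np.sqrt(np.clip(x * (4.0 - x), 0.0, None)) / (2.0 * math.pi * safe_x), 0.0)


x = np.linspace(0.01, 3.99, 9)
lhs = mp_square(x)
# Density of X^2 when the density of X is symmetric around 0: f_X(sqrt(x)) / sqrt(x).
rhs = semicircle(np.sqrt(x)) / np.sqrt(x)
print("max |difference|:", np.max(np.abs(lhs - rhs)))
```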
7 Spectra of complex networks
This chapter presents some examples of the spectra of complex networks, which we have tried to interpret or to understand using the theory of previous chapters. In contrast to the mathematical rigor of the other chapters, this chapter is more intuitively oriented and it touches topics that are not yet understood or that lack maturity. Nevertheless, the examples may give a flavor of how real-world complex networks are analyzed as a sequence of small and partial steps towards (hopefully) complete understanding.
7.1 Simple observations

When we visualize the density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix of a graph, defined in art. 121, peaks at $x = 0$, $x = -1$ and $x = -2$ are often observed. The occurrence of adjacency eigenvalues at those integer values has a physical explanation.
7.1.1 A graph with eigenvalue $\lambda(A) = 0$

A matrix has a zero eigenvalue if its determinant is zero (art. 138). A determinant is zero if two rows are identical or, more generally, if some of the rows are linearly dependent. For example, two rows are identical, resulting in $\lambda(A) = 0$, if two nodes that are not mutually interconnected are connected to the same set of nodes. The elements $a_{ij}$ of an adjacency matrix $A$ are only 0 or 1. Hence, linear dependence of rows occurs every time the sum of a set of rows equals another row in the adjacency matrix. For example, consider the sum of two rows. Suppose node $n_1$ is connected to the set $S_1$ of nodes and node $n_2$ is connected to the distinct set $S_2$, where $S_1 \cap S_2 = \emptyset$ and $n_1 \neq n_2$. Then the graph has a zero adjacency eigenvalue if another node $n_3 \notin \{n_1, n_2\}$ is connected to exactly the nodes in the set $S_1 \cup S_2$. These two types of zero eigenvalues occur when a graph possesses a "local bipartiteness". In real networks, this type of interconnection often occurs.
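Both mechanisms can be seen in small hand-made graphs. In the sketch below, which assumes NumPy, the node labels and the extra links are arbitrary illustrations. Each case should produce an eigenvalue equal to zero up to rounding.

```python
import numpy as np


def adjacency(N, links):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0, ..., N-1."""
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1.0
    return A


# Identical rows: nodes 0 and 1 are not linked but have the same neighbour set {2, 3}.
A1 = adjacency(5, [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)])

# Row sum: node 0 has neighbours S1 = {3, 4}, node 1 has S2 = {5}, and node 2 is linked
# to exactly the union {3, 4, 5}, so row 2 equals row 0 plus row 1.
A2 = adjacency(7, [(0, 3), (0, 4), (1, 5), (2, 3), (2, 4), (2, 5), (5, 6)])

for name, A in (("identical rows", A1), ("row sum", A2)):
    smallest = np.min(np.abs(np.linalg.eigvalsh(A)))
    print(f"{name}: smallest |eigenvalue| = {smallest:.2e}")
```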
7.1.2 A graph with eigenvalue $\lambda(A) = -1$
An adjacency matrix $A$ has an eigenvalue $\lambda(A) = -1$ every time a node pair $n_1$ and $n_2$ in the graph is connected to the same set $S$ of different nodes, and $n_1$ and $n_2$ are also mutually interconnected. Indeed, without loss of generality, we can relabel the nodes such that $n_1 = 1$ and $n_2 = 2$. In that case, the first two rows in $A$ are of the form
$$\begin{matrix} 0 & 1 & a_{13} & a_{14} & \cdots & a_{1N} \\ 1 & 0 & a_{13} & a_{14} & \cdots & a_{1N} \end{matrix}$$
and the corresponding rows in $\det(A - \lambda I)$ of the characteristic polynomial are
$$\begin{matrix} -\lambda & 1 & a_{13} & a_{14} & \cdots & a_{1N} \\ 1 & -\lambda & a_{13} & a_{14} & \cdots & a_{1N} \end{matrix}$$
If two rows are identical, the determinant is zero. In order to make these rows identical, it suffices to take $\lambda = -1$, so that $\det(A + I) = 0$. This shows that $\lambda = -1$ is an eigenvalue of $A$ with this particular form. This observation generalizes to a graph where $k$ nodes are fully meshed and, in addition, all $k$ nodes are connected to the same set $S$ of different nodes. Again, we may relabel nodes such that the first $k$ rows describe these $k$ nodes in a complete graph configuration, also called a clique. Let $x$ denote an $(N-k) \times 1$ zero-one vector and let $u$ be the $k \times 1$ all-one vector. Then $u\,x^T$ is a matrix with all rows identical and equal to $x^T$. The structure of $\det(A - \lambda I)$ is
$$\det(A - \lambda I) = \begin{vmatrix} \left(J - (\lambda+1)I\right)_{k\times k} & u\,x^T \\ B_{(N-k)\times k} & (C - \lambda I)_{(N-k)\times(N-k)} \end{vmatrix}$$
This shows that the first $k$ rows are identical if $\lambda = -1$, implying that the multiplicity of this eigenvalue is at least $k - 1$. Observe that the spectrum in Section 5.1 of the complete graph $K_N$, where $k = N$, indeed contains an eigenvalue $\lambda = -1$ with multiplicity $N - 1$. We can also say that a peak in the density of the adjacency eigenvalues at $\lambda = -1$ reflects that a set of interconnected nodes all have the same neighbors (different from those in the interconnected set).
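The sketch below gives a small example of the clique argument. It uses $k = 4$ fully meshed nodes that share the neighbour set $S = \{4, 5\}$, embedded in an otherwise arbitrary graph (the extra links are hypothetical). The rows of $A + I$ belonging to the clique are then identical, so $-1$ should appear with multiplicity at least $k - 1$. NumPy is assumed.

```python
import numpy as np


def adjacency(N, links):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0, ..., N-1."""
    A = np.zeros((N, N))
    for i, j in links:
        A[i, j] = A[j, i] = 1.0
    return A


k = 4                                                         # clique on nodes 0..3
clique = [(i, j) for i in range(k) for j in range(i + 1, k)]
shared = [(i, s) for i in range(k) for s in (4, 5)]          # all clique nodes link to S = {4, 5}
rest = [(4, 6), (5, 7), (6, 7)]                               # hypothetical remainder of the graph
A = adjacency(8, clique + shared + rest)

eig = np.linalg.eigvalsh(A)
mult = int(np.sum(np.abs(eig + 1.0) < 1e-9))
print(f"multiplicity of -1: {mult} (at least k - 1 = {k - 1})")
```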
7.1.3 A graph with eigenvalue $\lambda(A) = -2$

If the graph is a line graph (art. 7), then art. 9 demonstrates that the adjacency matrix has an eigenvalue $\lambda(A) = -2$ with multiplicity at least $L - N$. Here $N$ and $L$ refer to the original graph whose line graph is considered. However, it is in general rather difficult to conclude that a graph is a line graph. Each node with degree $d$ (locally, a star $K_{1,d}$) is transformed in the line graph into a clique $K_d$ with $\binom{d}{2}$ links. Thus, a line graph can be regarded as a collection of interconnected cliques $K_{d_j}$, where $1 \le j \le N$. The presence of an eigenvalue $\lambda(A) = -2$ is insufficient to deduce that a graph is a line graph. A more elaborate discussion on line graphs is found in Cvetković et al. (2009, Section 3.4). Peaks in the density $f_\lambda(x)$ of the eigenvalues of the adjacency matrix at $\lambda(A) = 2$ and $\lambda(A) = -2$ may correspond to a very long path (art. 128). As shown in Fig. 6.1, these peaks occur in large, sparse regular graphs with degree $r = 2$ (McKay's Theorem 31).
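The line-graph statement can be tested with the standard identity $A_{l(G)} = R^T R - 2I$, where $R$ is the $N \times L$ unsigned incidence matrix of $G$. This is assumed here to be the relation used in art. 9. For $G = K_4$, the line graph is the octahedron, whose spectrum is $\{4, 0^{(3)}, -2^{(2)}\}$, so the multiplicity of $-2$ equals $L - N = 2$. The sketch below assumes NumPy.

```python
from itertools import combinations

import numpy as np


def line_graph_adjacency(N, links):
    """A_{l(G)} = R^T R - 2I with R the N x L unsigned node-link incidence matrix of G."""
    R = np.zeros((N, len(links)))
    for col, (i, j) in enumerate(links):
        R[i, col] = R[j, col] = 1.0
    return R.T @ R - 2.0 * np.eye(len(links))


N = 4
links = list(combinations(range(N), 2))      # K_4 with L = 6 links
L = len(links)
A_line = line_graph_adjacency(N, links)       # adjacency matrix of the octahedron

eig = np.linalg.eigvalsh(A_line)
mult = int(np.sum(np.abs(eig + 2.0) < 1e-9))
print("spectrum of l(K_4):", np.round(eig, 6))
print(f"multiplicity of -2: {mult}, L - N = {L - N}")
```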
7.2 Distribution of the Laplacian eigenvalues and of the degree

The moments of the Laplacian eigenvalues (art. 70-72) can be expressed in terms of those of the degree. Even so, the degree distribution and the Laplacian distribution differ in most real-world networks. In this section, we present a curious example where both distributions are remarkably alike. Software is assembled from many interacting units and subsystems at several levels of granularity (subroutines, classes, source files, libraries, etc.). The interactions and collaborations of those parts can be represented by a graph, called the software collaboration graph. Fig. 7.1 depicts the topology of the VTK network, which represents the collaborations in the VTK visualization C++ library. This network has been documented and studied by Myers (2003).
Fig. 7.1. The connected graph of the VTK network with $N = 771$ and $L = 1357$. The size of each node is drawn proportional to its degree.
Fig. 7.2 shows the correspondence between the degree $D$ and the Laplacian eigenvalue $\mu$ in the connected VTK graph with $N = 771$ nodes. For this graph, $E[D] = E[\mu] = 3.5201$, $\mathrm{Var}[D] = 33.0603$ and $\mathrm{Var}[\mu] = 36.5804$, which agrees with the theory in art. 70. Both the degree $D$ and the Laplacian eigenvalue $\mu$ of the VTK graph approximately follow a power law, a general characteristic of many complex networks. Each power law is specified by the fit in the legend in Fig. 7.2, where $c_D$ and $c_\mu$ are normalization constants. The much more surprising fact is shown by the insert in Fig. 7.2: the ordered Laplacian eigenvalues $\mu_k$ closely follow the ordered degree $d_{(k)}$. We have observed such a close relationship between $D$ and $\mu$ only in software collaboration networks (such as MySQL, studied in Myers (2003)).
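The agreement of $E[\mu]$ with $E[D]$, and of $\mathrm{Var}[\mu]$ with $\mathrm{Var}[D] + E[D]$, follows from $\mathrm{trace}(Q) = \sum_i d_i$ and $\mathrm{trace}(Q^2) = \sum_i d_i^2 + \sum_i d_i$. The sketch below checks both on a random graph, which is a hypothetical stand-in because the VTK data are not reproduced here. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 300, 0.02
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)
d = A.sum(axis=1)                      # degree vector
Q = np.diag(d) - A                     # Laplacian Q = Delta - A
mu = np.linalg.eigvalsh(Q)

print(f"E[D] = {d.mean():.4f}, E[mu] = {mu.mean():.4f}")
print(f"Var[D] + E[D] = {d.var() + d.mean():.4f}, Var[mu] = {mu.var():.4f}")
```

With the VTK values, $2L/N = 2 \cdot 1357/771 \approx 3.5201$ and $33.0603 + 3.5201 = 36.5804$, matching the numbers reported above.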
[Figure 7.2: log-log plot of $\Pr[D = x]$ with fit $c_D x^{-1.90}$ and of $f_\mu(x)$ with fit $c_\mu x^{-1.53}$; insert: Laplacian eigenvalue $\mu_k$ and ordered degree $d_{(k)}$ versus $k$.]
Fig. 7.2. The density function of the degree and of the Laplacian eigenvalues in the software dependence network VTK. The insert shows how close the ordered degree and Laplacian eigenvalues are.
We do not have an explanation of why $D$ and $\mu$ are so close. In view of the definition of the Laplacian $Q = \Delta - A$, the observation suggests that the influence of the adjacency matrix on the eigenvalues of the Laplacian is almost negligible. The bounds in art. 70, derived from the interlacing principle,
$$d_{(k)} - \lambda_1(A) \le \mu_k(Q) \le d_{(k)} - \lambda_N(A)$$
are too weak, because $\lambda_1(A) = 11.46$ and $\lambda_N(A) = -9.13$. Fig. 7.3 presents the density function $f_\lambda(x)$ of the adjacency eigenvalues, which is typically treelike. It has a high peak $f_\lambda(0) = 0.42$ at the origin $x = 0$, and it is almost symmetric around the origin, $f_\lambda(x) \approx f_\lambda(-x)$. If a graph is locally treelike (art. 131), we would expect its density to approximately follow McKay's law (Theorem 31) drawn in Fig. 6.1. At first glance, the peaks in $f_\lambda(x)$ at roughly $x = -1$ and $x = 1$ may hint at such a locally tree-like structure. However, the degree $r$ in (6.14) should be at least 2, so a locally tree-like structure should have peaks at roughly $x = -2$ and $x = 2$. The variance $\mathrm{Var}[\lambda] = E[D] = 3.52$ (art. 27) is small: it is much smaller than $\mathrm{Var}[D]$ and than $\mathrm{Var}[\mu] = \mathrm{Var}[D] + \mathrm{Var}[\lambda]$. This supports the observation above that the adjacency spectrum only marginally influences the Laplacian eigenvalues.
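The interlacing bounds follow from Weyl's inequalities applied to $Q = \Delta + (-A)$. The sketch below checks them on a random graph and reports how far $\mu_k$ is from $d_{(k)}$. It assumes NumPy, and the graph size and density are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 200, 0.03
upper_tri = np.triu(rng.random((N, N)) < p, k=1)
A = (upper_tri | upper_tri.T).astype(float)
d = A.sum(axis=1)
Q = np.diag(d) - A                                # Laplacian Q = Delta - A

lam = np.sort(np.linalg.eigvalsh(A))[::-1]        # lambda_1 >= ... >= lambda_N
mu = np.sort(np.linalg.eigvalsh(Q))[::-1]         # mu_1 >= ... >= mu_N
d_ord = np.sort(d)[::-1]                          # d_(1) >= ... >= d_(N)

lower = d_ord - lam[0]                            # d_(k) - lambda_1(A)
upper = d_ord - lam[-1]                           # d_(k) - lambda_N(A)
print("interlacing bounds hold:", bool(np.all(lower <= mu + 1e-9) and np.all(mu <= upper + 1e-9)))
print(f"lambda_1 = {lam[0]:.3f}, lambda_N = {lam[-1]:.3f}, max |mu_k - d_(k)| = {np.max(np.abs(mu - d_ord)):.3f}")
```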
[Figure 7.3: $f_\lambda(x)$ versus $x$; insert: $\lambda_k$ versus rank $k$.]
Fig. 7.3. The density of the eigenvalues of the adjacency matrix of the VTK graph. The insert shows the ordered eigenvalues $\lambda_k$ versus their rank $k$, where $\lambda_1 = 11.46$ and $\lambda_N = -9.13$.
Finally, we mention the nice estimate of Dorogovtsev et al. (2003). Using an approximate analysis from statistical physics, but inspired by McKay's result (Section 6.4) based on random walks, Dorogovtsev et al. (2003) derived the asymptotic law for the tails of $f_\lambda(x)$ of locally tree-like graphs as
$$f_\lambda(x) \approx 2|x| \Pr\left[D = x^2\right]$$
for large $x$. For example, in a power law graph where $\Pr[D = k] = c k^{-\tau}$, the asymptotic tail behavior of the density function of the adjacency eigenvalues is
$$f_\lambda(x) \approx 2c|x|^{1-2\tau}$$
As shown in Fig. 7.2, the power law exponent for the VTK network is about $\tau \simeq 1.9$, such that $2\tau - 1 \simeq 2.8$. However, fitting the tail region of $f_\lambda(x)$ in a log-log plot gives a slope of about $-1.7$ instead of $-2.8$. This again seems to indicate that the VTK graph is not sufficiently close to a locally tree-like, power law graph.
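Substituting the power law into the tail law makes the predicted log-log slope explicit. The following worked step uses only the symbols defined above:

$$f_\lambda(x) \approx 2|x|\,c\left(x^2\right)^{-\tau} = 2c\,|x|^{1-2\tau} \quad\Longrightarrow\quad \log f_\lambda(x) \approx \log(2c) - (2\tau - 1)\log|x|$$

so that $\tau \simeq 1.9$ predicts a tail slope of $-(2\tau - 1) \simeq -2.8$ in a log-log plot.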
7.3 Functional brain network
The interactions between brain areas can be represented by a functional brain network, as shown by Stam and Reijneveld (2007). The concept of functional connectivity refers to the statistical interdependencies between physiological time series recorded in various brain areas. It is thought to reflect communication between several brain areas. Magneto-encephalography (MEG), a recording of the brain's magnetic activity, is a method used to assess functional connectivity within the brain. Each MEG channel is regarded as a node in the functional brain network. The functional connectivity between each pair of channels is represented by a link, whose weight reflects the strength of the connectivity, measured via the synchronization likelihood. The synchronization likelihood is based on the concept of generalized synchronization (Rulkov et al., 1995) and takes both linear and nonlinear synchronization between two time series into account. The synchronization likelihood $w_{ij}$ between time series $i$ and $j$ lies in the interval $[0, 1]$, with $w_{ij} = 0$ indicating no synchronization and $w_{ij} = 1$ meaning total synchronization. We adopt the convention that $w_{jj} = 0$, rather than $w_{jj} = 1$, because of the association with the adjacency matrix of the corresponding functional brain graph.
[Figure 7.4: $\lambda_{N-k}(A)$ versus $k$ before and after surgery; insert: $\lambda_{N-k}(A_{\text{before}}) - \lambda_{N-k}(A_{\text{after}})$ versus $k$.]
Fig. 7.4. The eigenvalues of the weighted adjacency matrix of the functional brain network before and after surgery in increasing order. The insert shows the differences between the eigenvalues before and after surgery.
The weighted adjacency matrix $W$ of the human functional brain network contains as elements $w_{ij}$ the synchronization likelihood between the $N = 151$ different MEG channels. Each channel probes a specific area in the human brain, as detailed in Wang et al. (2010a). Since all functional brain areas are correlated, the matrix $W$ has the structure of the adjacency matrix $A_{K_N}$ of the complete graph $K_N$, where the one-elements $a_{ij}$ are substituted by the correlations $|w_{ij}| \le 1$. Because all elements satisfy $|w_{ij}| \le 1$, the matrix norm obeys $\|W\|_q \le \|A_{K_N}\|_q$. Art. 166 then indicates that $\lambda_1(W) \le \|W\|_q$ and $\lambda_1(W) \le \lambda_1(A_{K_N}) = N - 1$.

Fig. 7.4 shows the eigenvalues of the weighted adjacency matrix $W$ of the functional brain network of a typical patient before and after surgery. The correlations $w_{ij}$ before and after surgery are almost the same. The spectrum in Fig. 7.4 is closely related to that of the complete graph $K_N$: the eigenvalue $-1$ with multiplicity $N - 1$ in $K_N$ is here spread over the interval $[-1, 1)$. All eigenvalues are simple. The largest eigenvalue, in $[14, 15]$, is clearly the most sensitive to the changes in the weighted adjacency matrix $W$, as the insert in Fig. 7.4 shows. Hence, the changes in the few largest eigenvalues seem to be good indicators to evaluate the effect of the brain surgery.

The (weighted) adjacency matrix of any graph with at least one link has at least two different eigenvalues, as follows from art. 25. The complete graph is the only connected graph with $N > 2$ nodes that has precisely two different eigenvalues. Strongly regular graphs (art. 42) have three different eigenvalues. A small number of different eigenvalues implies a small diameter (art. 39). The weighted, symmetric adjacency matrix $W = W^T$ deviates from the zero-one matrix. Since almost all real $w_{ij}$ are different, all of its eigenvalues are simple with high probability (art. 181).
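The bound $\lambda_1(W) \le N - 1$ and the simplicity of the spectrum can be illustrated with a synthetic weighted matrix. The weights below are hypothetical, drawn uniformly from $[0, 0.2]$ as a stand-in for synchronization likelihoods; they are not MEG data. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(151)
N = 151                                            # number of MEG channels in the text
# Hypothetical weights: symmetric, uniform on [0, 0.2], zero diagonal (w_jj = 0).
U = np.triu(rng.uniform(0.0, 0.2, size=(N, N)), k=1)
W = U + U.T

eig = np.linalg.eigvalsh(W)                        # ascending
print(f"lambda_1(W) = {eig[-1]:.3f} <= N - 1 = {N - 1}")
print("all eigenvalues simple:", bool(np.all(np.diff(eig) > 1e-10)))
print(f"other eigenvalues lie in [{eig[0]:.3f}, {eig[-2]:.3f}]")
```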
7.4 Rewiring Watts-Strogatz small-world graphs

The spectrum of the Watts-Strogatz small-world graph $G_{SW}^{(k)}$ without link rewiring is computed in Section 5.2. Recall that $G_{SW}^{(k)}$ is a regular graph (art. 41) in which each node has degree $r = 2k$. When links in $G_{SW}^{(k)}$ are rewired, independently and with probability $p_r$, the graph's topology and properties change with $p_r$. Fig. 1.3 presents a rewired Watts-Strogatz small-world graph, while the original regular small-world graph $G_{SW}^{(k)}$ is shown in Fig. 5.1. Here, we investigate the influence of the link rewiring probability $p_r$ on the eigenvalues of the adjacency matrix of Watts-Strogatz small-world graphs.

Fig. 7.5 shows the pdf $f_\lambda(x)$ of an eigenvalue of the adjacency matrix of a Watts-Strogatz small-world graph. In the absence of randomness ($p_r = 0$), the spectrum is discrete. This is reflected by the peaks in Fig. 7.5, and the spectrum is drawn differently for all $k$ in Fig. 5.2. When randomness is introduced by increasing $p_r > 0$, the peaks smooth out, and Fig. 7.5 indicates that the pdf $f_\lambda(x)$ tends rapidly to that of the Erdős-Rényi random graph shown in Fig. 6.2. Fig. 7.5 thus suggests that a bell-shape of the spectrum around the origin is a fingerprint of "randomness" in a graph, while peaks reflect "regularity" or "structure"¹. We also observe that "irregularity" can be measured, as mentioned in art. 43, by the amount that the largest eigenvalue deviates from the mean degree $E[D] = 2k$. Rewiring does not change the mean degree, because the number of links and nodes is kept constant and $E[D] = \frac{2L}{N}$. Nevertheless, we clearly see an increase of the largest eigenvalue from $\lambda_1 = 8 = 2k$ when $p_r = 0$ to about 9 for $p_r = 1$.

Fig. 5.3 has shown how irregularly the number of different eigenvalues of $G_{SW}^{(k)}$ without rewiring behaves as a function of $N$ and $k$. Simulations indicate that, even for a small rewiring probability of $p_r = 0.01$, the spectrum contains only simple eigenvalues with high probability. More precisely, when only one link in $G_{SW}^{(k)}$ with $N = 200$ and $k = 4$ is rewired, the number of distinct eigenvalues dramatically increases from 95 to about 190. In other words, destroying the regular adjacency matrix structure by even one element has a profound impact on the multiplicity of the eigenvalues. This very high sensitivity is a known fact in the study of zeros of polynomials (Wilkinson, 1965, Chapter 2): small perturbations of the coefficients of a polynomial may heavily impact the multiplicity of the real zeros (and whether the perturbed zeros are still real). Another consequence concerns the upper bound in Theorem 5 on the diameter in terms of the number of different eigenvalues. In real-world graphs, most of the eigenvalues are different, so the bound in Theorem 5 reduces to $N - 1$, the maximum possible diameter, and is almost useless. By rewiring links in $G_{SW}^{(k)}$, we even observe contrasting effects. The regular structure of $G_{SW}^{(k)}$ is destroyed, which in most cases causes the diameter to shrink, while the number of different eigenvalues suddenly jumps to almost the maximum $N$.

¹ The quotes here refer to an intuitive meaning. A commonly agreed and precise definition of "randomness" and "structure" of a graph is lacking.

[Figure 7.5: $f_\lambda(x)$ versus eigenvalue $x$; the arrow indicates increasing $p_r$.]
Fig. 7.5. The probability density function $f_\lambda(x)$ of an eigenvalue in Watts-Strogatz small-world graphs with $N = 200$ and $k = 4$ for various rewiring probabilities $p_r$ ranging from 0 to 1, first in steps of 0.01 until $p_r = 0.1$, followed by an increase in steps of 0.1 up to $p_r = 1$. The arrow shows the direction of increasing $p_r$.
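The sensitivity of the eigenvalue multiplicities to a single rewiring can be reproduced with a few lines of NumPy. The sketch below builds the unrewired graph with $N = 200$ and $k = 4$. It then rewires one link deterministically, from $0 \sim 1$ to $0 \sim 100$; this is an arbitrary choice, not the random rewiring used for the simulations in the text. Finally, it counts distinct eigenvalues with a small tolerance.

```python
import numpy as np


def ring_lattice(N, k):
    """Unrewired Watts-Strogatz graph: node i is linked to i +- 1, ..., i +- k (mod N)."""
    A = np.zeros((N, N))
    for i in range(N):
        for s in range(1, k + 1):
            A[i, (i + s) % N] = A[(i + s) % N, i] = 1.0
    return A


def count_distinct(values, tol=1e-8):
    """Number of distinct values, treating sorted neighbours closer than tol as equal."""
    v = np.sort(values)
    return 1 + int(np.sum(np.diff(v) > tol))


N, k = 200, 4
A = ring_lattice(N, k)
before = count_distinct(np.linalg.eigvalsh(A))

# Rewire the single link 0 ~ 1 into 0 ~ N/2 (node N/2 is not a neighbour of node 0).
B = A.copy()
B[0, 1] = B[1, 0] = 0.0
t = N // 2
B[0, t] = B[t, 0] = 1.0
after = count_distinct(np.linalg.eigvalsh(B))
print(f"distinct eigenvalues: {before} before, {after} after rewiring one link")
```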
7.5 Assortativity

7.5.1 Theory

"Mixing" in complex networks refers to the tendency of network nodes to connect preferentially to other nodes with either similar or opposite properties. Mixing is computed via the correlations between the properties, such as the degree, of nodes in a network. Here, we study the degree mixing in undirected graphs. Generally, the linear correlation coefficient between two random variables $X$ and $Y$ is defined (Van Mieghem, 2006b, p. 30) as
$$\rho(X, Y) = \frac{E[XY] - \mu_X \mu_Y}{\sigma_X \sigma_Y} \tag{7.1}$$
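where $\mu_X = E[X]$ and $\sigma_X = \sqrt{\mathrm{Var}[X]}$ are the mean and standard deviation of the random variable $X$, respectively. Newman (2003a, eq. (21)) has expressed the linear degree correlation coefficient of a graph as
$$\rho_D = \frac{\sum_{xy} xy\left(e_{xy} - a_x b_y\right)}{\sigma_a \sigma_b} \tag{7.2}$$
Here $e_{xy}$ is the fraction of all links that connect the nodes with degree $x$ and $y$. The quantities $a_x$ and $b_y$ are the fractions of links that start and end at nodes with degree $x$ and $y$, respectively, and $\sigma_a$ and $\sigma_b$ are the standard deviations of the distributions $a_x$ and $b_y$. These quantities satisfy the following three conditions:
$$\sum_{xy} e_{xy} = 1, \qquad a_x = \sum_{y} e_{xy} \qquad \text{and} \qquad b_y = \sum_{x} e_{xy}$$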
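A direct implementation of (7.2) shows what $e_{xy}$, $a_x$ and $b_y$ are. In this sketch every undirected link contributes both orientations, so $a_x = b_x$, and the example edge list is hypothetical. Degrees are used instead of excess degrees; a common shift of all degrees leaves the correlation unchanged. As a cross-check, the result is compared with the Pearson correlation of the degrees at the two ends of each link. NumPy is assumed.

```python
from collections import Counter

import numpy as np


def degrees(N, links):
    """Degree of every node of an undirected graph given as a list of links."""
    d = np.zeros(N, dtype=int)
    for i, j in links:
        d[i] += 1
        d[j] += 1
    return d


def newman_rho(links, d):
    """rho_D via (7.2); each undirected link i ~ j contributes (d_i, d_j) and (d_j, d_i)."""
    pairs = [(d[i], d[j]) for i, j in links] + [(d[j], d[i]) for i, j in links]
    total = len(pairs)
    e = {xy: c / total for xy, c in Counter(pairs).items()}      # e_xy, sums to 1
    a, b = Counter(), Counter()
    for (x, y), exy in e.items():
        a[x] += exy                                             # a_x = sum_y e_xy
        b[y] += exy                                             # b_y = sum_x e_xy
    mean_a = sum(x * ax for x, ax in a.items())
    mean_b = sum(y * by for y, by in b.items())
    sigma_a = np.sqrt(sum(x * x * ax for x, ax in a.items()) - mean_a ** 2)
    sigma_b = np.sqrt(sum(y * y * by for y, by in b.items()) - mean_b ** 2)
    # sum_xy xy (e_xy - a_x b_y) = sum_xy xy e_xy - (sum_x x a_x)(sum_y y b_y)
    num = sum(x * y * exy for (x, y), exy in e.items()) - mean_a * mean_b
    return num / (sigma_a * sigma_b)


N = 8
links = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (4, 5), (5, 6), (6, 7), (7, 4)]
d = degrees(N, links)
rho = newman_rho(links, d)

# Cross-check: Pearson correlation of the degrees at both ends of every link (both orientations).
x = np.array([d[i] for i, j in links] + [d[j] for i, j in links], dtype=float)
y = np.array([d[j] for i, j in links] + [d[i] for i, j in links], dtype=float)
print(f"rho_D (7.2) = {rho:.6f}, Pearson over link ends = {np.corrcoef(x, y)[0, 1]:.6f}")
```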
When $\rho_D > 0$, the graph possesses assortative mixing, a preference of high-degree nodes to connect to other high-degree nodes. When $\rho_D < 0$, the graph features disassortative mixing, where high-degree nodes are connected to low-degree nodes. The translation of (7.2) into the notation of random variables is presented as follows. Denote by $D_i$ and $D_j$ the node degrees of two connected nodes $i$ and $j$ in an undirected graph with $N$ nodes. In fact, we are interested in the degrees of the nodes at both sides of a link, without counting the link under consideration itself. As Newman (2003a) points out, we need to consider the number of excess links at both sides. Hence, we use the degrees $D_{l^+} = D_i - 1$ and $D_{l^-} = D_j - 1$, where the link $l$ starts at $l^+ = i$ and ends at $l^- = j$. The linear correlation coefficient of those excess degrees is
$$\rho(D_{l^+}, D_{l^-}) = \frac{E[D_{l^+} D_{l^-}] - E[D_{l^+}]\,E[D_{l^-}]}{\sigma_{D_{l^+}} \sigma_{D_{l^-}}} = \frac{E\left[(D_{l^+} - E[D_{l^+}])(D_{l^-} - E[D_{l^-}])\right]}{\sqrt{E\left[(D_{l^+} - E[D_{l^+}])^2\right] E\left[(D_{l^-} - E[D_{l^-}])^2\right]}}$$
Since $D_{l^+} - E[D_{l^+}] = D_i - E[D_i]$, subtracting one (the link itself) from every degree does not change the linear correlation coefficient. This requires $D_i > 0$ (and similarly $D_j > 0$), which is the case if there are no isolated nodes. Removing isolated nodes from the graph does not alter the linear degree correlation coefficient (7.2). Hence, we can assume that the graph has no zero-degree nodes. In summary, the linear degree correlation coefficient is
$$\rho(D_{l^+}, D_{l^-}) = \rho(D_i, D_j) = \frac{E[D_i D_j] - \mu_{D_i}^2}{E[D_i^2] - \mu_{D_i}^2} \tag{7.3}$$
We now proceed by expressing $E[D_i D_j]$, $E[D_i^2]$ and $\mu_{D_i}$ in the definition of $\rho(D_{l^+}, D_{l^-}) = \rho(D_i, D_j)$ for undirected graphs in terms of more appropriate quantities of algebraic graph theory. First, we have that
$$E[D_i D_j] = \frac{1}{2L}\sum_{i=1}^{N}\sum_{j=1}^{N} d_i d_j a_{ij} = \frac{d^T A d}{2L}$$
where $d_i$ and $d_j$ are the elements of the degree vector $d = (d_1, d_2, \ldots, d_N)$. The element $a_{ij}$, defined in (1.1), of the (symmetric) adjacency matrix $A$ expresses connectivity between nodes $i$ and $j$. The quadratic form $d^T A d$ can be written in terms of the total number $N_k = u^T A^k u$ of walks with $k$ hops (art. 33). The total number $N_3 = d^T A d$ of walks with three hops is called the $s$ metric in Li et al. (2006). The averages $\mu_{D_i}$ and $\mu_{D_j}$ are the mean node degrees of the two connected nodes $i$ and $j$, respectively. They are not the mean of the degree $D$ of a random node, which equals $E[D] = \frac{2L}{N}$. Thus,
$$\mu_{D_i} = \frac{1}{2L}\sum_{i=1}^{N}\sum_{j=1}^{N} d_i a_{ij} = \frac{1}{2L}\sum_{i=1}^{N} d_i \sum_{j=1}^{N} a_{ij} = \frac{1}{2L}\sum_{i=1}^{N} d_i^2 = \frac{d^T d}{2L}$$
while
$$\mu_{D_j} = \frac{1}{2L}\sum_{i=1}^{N}\sum_{j=1}^{N} d_j a_{ij} = \mu_{D_i}$$
The variance is $\sigma_{D_i}^2 = \mathrm{Var}[D_i] = E[D_i^2] - \mu_{D_i}^2$, with
$$E[D_i^2] = \frac{1}{2L}\sum_{i=1}^{N}\sum_{j=1}^{N} d_i^2 a_{ij} = \frac{1}{2L}\sum_{i=1}^{N} d_i^3 = E[D_j^2]$$
After substituting all terms into the expression (7.3) of the linear degree correlation coefficient, we obtain, with $N_1 = 2L$ and $N_2 = d^T d$, our reformulation of Newman's definition (7.2) in terms of $N_k$,
$$\rho_D = \rho(D_i, D_j) = \frac{N_1 N_3 - N_2^2}{N_1 \sum_{i=1}^{N} d_i^3 - N_2^2} \tag{7.4}$$
The crucial understanding of (dis)assortativity lies in the total number $N_3$ of walks with three hops compared to those with two hops, $N_2$, and one hop, $N_1 = 2L$.

7.5.1.1 Discussion of (7.4)

As shown in art. 35, the total number $N_k = u^T A^k u$ of walks of length $k$ is upper bounded by
$$N_k \le \sum_{j=1}^{N} d_j^k$$
with equality for $k \le 2$ and, for $k \ge 3$, only if the graph is regular. Hence, (7.4) shows that $\rho_D = 1$ only if the graph is regular, implying that maximum assortativity is only possible in regular graphs². Consider the variance of the degrees at one side of an arbitrary link,
$$\sigma_{D_i}^2 = \frac{1}{N_1}\sum_{i=1}^{N} d_i^3 - \left(\frac{N_2}{N_1}\right)^2 \ge 0 \tag{7.5}$$
Since this variance is non-negative, the sign of $N_1 N_3 - N_2^2$ in (7.4) distinguishes between assortativity ($\rho_D > 0$) and disassortativity ($\rho_D < 0$). The sign of $N_1 N_3 - N_2^2$ can also be determined from (4.58). Using (3.23) and denoting a link $l = i \sim j$, the degree correlation (7.4) can be rewritten as
$$\rho_D = 1 - \frac{\sum_{i\sim j}(d_i - d_j)^2}{\sum_{i=1}^{N} d_i^3 - \frac{1}{2L}\left(\sum_{i=1}^{N} d_i^2\right)^2} \tag{7.6}$$
The graph is zero assortative ($\rho_D = 0$) if
$$N_2^2 = N_1 N_3 \tag{7.7}$$
We can show that the connected Erdős-Rényi random graph $G_p(N)$ is zero-assortative for all $N$ and link density $p = L/\binom{N}{2} > p_c$, where $p_c$ is the disconnectivity threshold. Asymptotically for large $N$, the Barabási-Albert power law graph is zero-assortative, as shown in Nikoloski et al. (2005).
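Formulas (7.4) and (7.6) can be cross-checked against each other and against the Pearson form (7.3). The sketch below computes $N_k = u^T A^k u$ directly and uses a random graph as a hypothetical test case. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(11)
N, p = 60, 0.1
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)
keep = A.sum(axis=1) > 0                        # isolated nodes do not affect rho_D
A = A[np.ix_(keep, keep)]

u = np.ones(len(A))
d = A @ u                                       # degree vector d = A u
N1, N2, N3 = u @ A @ u, u @ A @ A @ u, u @ A @ A @ A @ u   # walks with 1, 2 and 3 hops
S3 = np.sum(d ** 3)

rho_74 = (N1 * N3 - N2 ** 2) / (N1 * S3 - N2 ** 2)                         # (7.4)
i, j = np.nonzero(np.triu(A))                                              # each link i ~ j once
rho_76 = 1 - np.sum((d[i] - d[j]) ** 2) / (S3 - np.sum(d ** 2) ** 2 / N1)  # (7.6)
# Pearson correlation (7.3) over both orientations of every link.
x = np.concatenate([d[i], d[j]])
y = np.concatenate([d[j], d[i]])
rho_73 = np.corrcoef(x, y)[0, 1]
print(f"(7.4): {rho_74:.6f}  (7.6): {rho_76:.6f}  (7.3): {rho_73:.6f}  N3 <= sum d^3: {N3 <= S3}")
```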
² Notice that the definition (7.4) is inadequate (due to a zero denominator and numerator) for a regular graph with degree $r$, because $N_k = N r^k$ for a regular graph (art. 33). For regular graphs, where $\sum_{i=1}^{N} d_i^3 = N_3$, the perfect disassortativity condition (7.8) becomes $N_2^2 = N_1 N_3$, which is equal to the zero assortativity condition (7.7). One may argue that $\rho_D = 1$ for a regular graph, since all degrees are equal and thus perfectly correlated. On the other hand, the complete graph $K_N$ minus one link $l$ has $\rho_D(K_N \setminus \{l\}) = -\frac{2}{N-1}$, which suggests that $\rho_D(K_N) = 0$ instead of 1.
Perfect disassortativity ($\rho_D = -1$ in (7.4)) implies that
$$N_2^2 = \frac{N_1}{2}\left(N_3 + \sum_{i=1}^{N} d_i^3\right) \tag{7.8}$$
For a complete bipartite graph $K_{m,n}$ (Section 5.7), we have that
$$\sum_{i\sim j}(d_i - d_j)^2 = mn(n-m)^2, \qquad \sum_{i=1}^{N} d_i^3 = nm\left(n^2 + m^2\right) \qquad \text{and} \qquad \sum_{i=1}^{N} d_i^2 = nm(n+m)$$
such that (7.6) becomes $\rho_D = -1$, provided $m \neq n$. Hence, any complete bipartite graph $K_{m,n}$ is perfectly disassortative, irrespective of its size and structure $(m, n)$, except for the regular graph variant where $m = n$. The perfect disassortativity of complete bipartite graphs is in line with the definition of disassortativity, because each node only has links to nodes of a different set with different properties. Nevertheless, all complete bipartite graphs $K_{m,n}$ with $m \neq n$ have $\rho_D = -1$. This includes those with nearly the same degrees $m = n \pm 1$, which are thus close to regular graphs typified by $\rho_D = 1$. This shows that the assortativity and disassortativity of a graph are not easy to predict. It remains to be shown that the complete bipartite graphs $K_{m,n}$ with $m \neq n$ are the only perfectly disassortative class of graphs.

There is an interesting relation between the linear degree correlation coefficient $\rho_D$ of the graph $G$ and the variance of the degree of a node in the corresponding line graph $l(G)$ (art. 7). The $l$-th component of the $L \times 1$ degree vector in the line graph $l(G)$ (art. 8) is $(d_{l(G)})_l = d_i + d_j - 2$, where node $i$ and node $j$ are connected by link $l = i \sim j$. The variance of the degree $D_{l(G)}$ of a random node in the line graph equals
$$\mathrm{Var}\left[D_{l(G)}\right] = E\left[(D_i + D_j)^2\right] - \left(E[D_i + D_j]\right)^2$$
which we rewrite as
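$$\mathrm{Var}\left[D_{l(G)}\right] = 2\left(E[D_i^2] - \mu_{D_i}^2\right) + 2\left(E[D_i D_j] - \mu_{D_i}^2\right)$$
Using (7.3), we arrive at
$$\mathrm{Var}\left[D_{l(G)}\right] = 2(1 + \rho_D)\left(E[D_i^2] - \mu_{D_i}^2\right) = 2(1 + \rho_D)\,\mathrm{Var}[D_i] = 2(1 + \rho_D)\left(\frac{1}{N_1}\sum_{i=1}^{N} d_i^3 - \left(\frac{N_2}{N_1}\right)^2\right) \tag{7.9}$$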
Curiously, the expression (7.9) shows that $\mathrm{Var}[D_{l(G)}] = 0$ for perfectly disassortative graphs ($\rho_D = -1$). The latter means that $l(G)$ is then a regular graph, but this does not imply that the original graph $G$ is regular. Indeed, if $G$ is regular, then $l(G)$ is also regular, as follows from the $l$-th component of the degree vector, $(d_{l(G)})_l = d_i + d_j - 2$. However, the reverse is not necessarily true. As shown above for complete bipartite graphs $K_{m,n}$ with $m \neq n$, it is possible that $l(G)$ is regular while $G$ is not. In summary, in both extreme cases $\rho_D = -1$ and $\rho_D = 1$, the corresponding line graph $l(G)$ is a regular graph.
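The sketch below checks the two statements above numerically. First, $\rho_D = -1$ for $K_{m,n}$ with $m \ne n$ (here $m = 3$, $n = 4$), and every link of $K_{3,4}$ has line-graph degree $m + n - 2$. Second, relation (7.9) holds on a random test graph, whose size, density and seed are arbitrary. NumPy is assumed.

```python
import numpy as np


def rho_D(A):
    """Linear degree correlation via (7.6); assumes no isolated nodes and a non-regular graph."""
    d = A.sum(axis=1)
    i, j = np.nonzero(np.triu(A))            # every link i ~ j once
    two_L = d.sum()
    return 1 - np.sum((d[i] - d[j]) ** 2) / (np.sum(d ** 3) - np.sum(d ** 2) ** 2 / two_L)


def complete_bipartite(m, n):
    A = np.zeros((m + n, m + n))
    A[:m, m:] = 1.0
    A[m:, :m] = 1.0
    return A


A = complete_bipartite(3, 4)
d = A.sum(axis=1)
i, j = np.nonzero(np.triu(A))
line_degrees = d[i] + d[j] - 2              # degree vector of the line graph l(G)
print("rho_D(K_{3,4}) =", rho_D(A), " line-graph degrees:", np.unique(line_degrees))

# Check (7.9) on a hypothetical random graph without isolated nodes.
rng = np.random.default_rng(1)
upper = np.triu(rng.random((40, 40)) < 0.15, k=1)
B = (upper | upper.T).astype(float)
keep = B.sum(axis=1) > 0
B = B[np.ix_(keep, keep)]
dB = B.sum(axis=1)
iB, jB = np.nonzero(np.triu(B))
var_line = np.var(dB[iB] + dB[jB])                        # Var[D_{l(G)}] over all links
N1, N2 = dB.sum(), np.sum(dB ** 2)
rhs = 2 * (1 + rho_D(B)) * (np.sum(dB ** 3) / N1 - (N2 / N1) ** 2)
print(f"Var[D_l(G)] = {var_line:.6f}, right-hand side of (7.9) = {rhs:.6f}")
```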
7.5.1.2 Relation between $\lambda_1$ and $\rho_D$

The largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ of a graph is an important characterizer of a graph. Here, we present a new lower bound for $\lambda_1$ in terms of the linear degree correlation coefficient $\rho_D$. Art. 44 presents improvements on the classical lower bound (3.31) for $\lambda_1$ in terms of $N_k$. For $k = 3$ in (3.33) and using (7.4), we obtain
$$\lambda_1^3 \ge \frac{N_3}{N} = \frac{1}{N}\left(\rho_D\left(\sum_{i=1}^{N} d_i^3 - \frac{N_2^2}{N_1}\right) + \frac{N_2^2}{N_1}\right) \tag{7.10}$$
Together with (7.5), inequality (7.10) shows that the lower bound for the largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ is strictly increasing in the linear degree correlation coefficient $\rho_D$ (except for regular graphs). Given that the degree vector $d$ is constant, inequality (7.10) shows that the lower bound for $\lambda_1$ increases when we succeed in increasing the assortativity of the graph by degree-preserving rewiring. Such rewiring is discussed in Section 7.5.2.
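Inequality (7.10) is easy to test numerically. The sketch below computes $\rho_D$ from (7.4) for a hypothetical random graph and compares $\lambda_1^3$ with the right-hand side of (7.10). NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)
N, p = 80, 0.08
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)
u = np.ones(N)
d = A @ u
N1, N2, N3 = d.sum(), d @ d, d @ A @ d
S3 = np.sum(d ** 3)
rho = (N1 * N3 - N2 ** 2) / (N1 * S3 - N2 ** 2)           # (7.4)

lam1 = np.linalg.eigvalsh(A)[-1]
bound = (rho * (S3 - N2 ** 2 / N1) + N2 ** 2 / N1) / N     # right-hand side of (7.10), equals N3/N
print(f"lambda_1^3 = {lam1 ** 3:.2f} >= bound = {bound:.2f} (N3/N = {N3 / N:.2f})")
```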
[Figure 7.6: $\lambda_1$ (exact) and lower bounds 1, 2 and 3 versus $\rho_D$.]
Fig. 7.6. The largest eigenvalue $\lambda_1$ of the Barabási-Albert power law graph with $N = 500$ nodes and $L = 1960$ links versus the linear degree correlation coefficient $\rho_D$. Various lower bounds are plotted: bound 1 is (7.10), bound 2 is (3.40) and bound 3 is (3.35) for $m = 1$. The corresponding classical lower bound (3.31) is 7.84, while the lower bound (3.34) is 10.548. The latter two lower bounds are independent of $\rho_D$.
Fig. 7.6 illustrates how the largest eigenvalue $\lambda_1$ of the Barabási-Albert power law graph evolves as a function of the linear degree correlation coefficient $\rho_D$, which can be changed by degree-preserving rewiring. The lower bound (3.40) clearly outperforms the lower bound³ (7.10).

³ Especially for strongly negative $\rho_D$, we found (very rarely, though) that (3.40) can be slightly worse than (3.34).
192
Spectra of complex networks 7.5.1.3 Relation between Q1 and G
The Rayleigh principle in art. 88 provides an upper bound for the second smallest eigenvalue $\mu_{N-1}$ of the Laplacian $Q$ for the choice $g(n) = d_n$, the degree of a node $n$,
$$\mu_{N-1} \le \frac{\sum_{l \in \mathcal{L}} \left(d_{l^+} - d_{l^-}\right)^2}{\sum_{j=1}^N d_j^2 - \frac{1}{N}\left(\sum_{j=1}^N d_j\right)^2}$$
After introducing (7.6), we find for any non-regular graph
$$\mu_{N-1} \le (1 - r_D)\,\frac{\sum_{i=1}^N d_i^3 - \frac{1}{2L}\left(\sum_{i=1}^N d_i^2\right)^2}{\sum_{j=1}^N d_j^2 - \frac{1}{N}\left(\sum_{j=1}^N d_j\right)^2} = (1 - r_D)\,\frac{E[D]\,E\left[D^3\right] - \left(E\left[D^2\right]\right)^2}{E[D]\operatorname{Var}[D]} \tag{7.11}$$
which is an upper bound for the algebraic connectivity $\mu_{N-1}$ in terms of the linear correlation coefficient of the degree $r_D$. In degree-preserving rewiring, the fraction in (7.11), which is always positive, is unchanged and we observe that the upper bound decreases linearly in $r_D$.
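Both forms of the bound (7.11) can be checked numerically. The sketch below assumes a graph stored as a list of neighbour lists and uses a hypothetical function name; it evaluates the Rayleigh form with $g(n) = d_n$ and the moment form with $r_D$ obtained from (7.6), so the two returned values must coincide. For the path on four nodes both equal 2, which indeed exceeds its algebraic connectivity $\mu_{N-1} = 2 - \sqrt{2}$.

```python
def algebraic_connectivity_bound(adj):
    """Upper bound (7.11) on mu_{N-1}, computed in its Rayleigh form and its moment form."""
    N = len(adj)
    d = [len(nbrs) for nbrs in adj]
    n1, s2, s3 = sum(d), sum(x ** 2 for x in d), sum(x ** 3 for x in d)
    mean, mean2, mean3 = n1 / N, s2 / N, s3 / N
    var = mean2 - mean ** 2
    if var == 0:
        raise ValueError("(7.11) holds only for non-regular graphs")
    # Rayleigh form (art. 88) with g(n) = d_n: each link i < j contributes (d_i - d_j)^2
    link_sum = sum((d[i] - d[j]) ** 2 for i in range(N) for j in adj[i] if i < j)
    rayleigh = link_sum / (s2 - n1 ** 2 / N)
    # r_D from (7.6), then the moment form of (7.11)
    r_d = 1 - link_sum / (s3 - s2 ** 2 / n1)
    moment_form = (1 - r_d) * (mean * mean3 - mean2 ** 2) / (mean * var)
    return rayleigh, moment_form, r_d

print(algebraic_connectivity_bound([[1], [0, 2], [1, 3], [2]]))  # path on four nodes
```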
7.5.2 Degree-preserving rewiring
In degree-preserving rewiring, links in a graph are rewired while the degree distribution is kept unchanged. This means that the degree vector $d$ is constant and, consequently, that $N_1 = \sum_{i=1}^N d_i$, $N_2 = \sum_{i=1}^N d_i^2$ and $\sum_{i=1}^N d_i^3$ do not change during degree-preserving rewiring; only $N_3$ does, and, by (7.4), also the (dis)assortativity $r_D$. Degree-preserving rewiring changes only the term $\sum_{i \sim j}(d_i - d_j)^2$ in (7.6), which allows us to understand how a degree-preserving rewiring operation changes the linear degree correlation $r_D$. Each step in degree-preserving random rewiring consists of first randomly selecting two links $i \sim j$ and $k \sim l$ associated with the four nodes $i, j, k, l$. Next, the links can be rewired either into $i \sim k$ and $j \sim l$ or into $i \sim l$ and $j \sim k$.

Theorem 35. Consider a graph in which two links are rewired in a degree-preserving way. We order the degrees of the four involved nodes as $d_{(1)} \ge d_{(2)} \ge d_{(3)} \ge d_{(4)}$. The two links are associated with the four nodes $n_{d_{(1)}}$, $n_{d_{(2)}}$, $n_{d_{(3)}}$ and $n_{d_{(4)}}$ in only one of the following three ways: (a) $n_{d_{(1)}} \sim n_{d_{(2)}}$, $n_{d_{(3)}} \sim n_{d_{(4)}}$; (b) $n_{d_{(1)}} \sim n_{d_{(3)}}$, $n_{d_{(2)}} \sim n_{d_{(4)}}$; and (c) $n_{d_{(1)}} \sim n_{d_{(4)}}$, $n_{d_{(2)}} \sim n_{d_{(3)}}$. The corresponding linear degree correlations introduced by these three possibilities obey $r_D^{(a)} \ge r_D^{(b)} \ge r_D^{(c)}$.

Proof: In these three ways of placing the two links, the degree of each node remains the same. According to the definition (7.6), the linear degree correlation changes only via $\rho = \sum_{i \sim j}(d_i - d_j)^2$; since the denominator in (7.6) is positive and unchanged, a smaller $\rho$ corresponds to a larger $r_D$. Since the rest of the graph remains the same in all three cases, the difference in $\rho$ between (a) and (b) is
$$\rho_a - \rho_b = \left(d_{(1)} - d_{(2)}\right)^2 + \left(d_{(3)} - d_{(4)}\right)^2 - \left(d_{(1)} - d_{(3)}\right)^2 - \left(d_{(2)} - d_{(4)}\right)^2 = -2\left(d_{(2)} - d_{(3)}\right)\left(d_{(1)} - d_{(4)}\right) \le 0$$
Similarly,
$$\rho_a - \rho_c = -2\left(d_{(2)} - d_{(4)}\right)\left(d_{(1)} - d_{(3)}\right) \le 0$$
$$\rho_b - \rho_c = -2\left(d_{(1)} - d_{(2)}\right)\left(d_{(3)} - d_{(4)}\right) \le 0$$
These three inequalities prove the theorem. □
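The three identities in the proof of Theorem 35 are elementary algebra, but a brute-force check makes the ordering $\rho_a \le \rho_b \le \rho_c$ tangible. The Python sketch below (function name hypothetical) evaluates the contribution of the two rewired links to $\rho$ for each pairing and asserts the identities on random degree quadruples.

```python
import random

def pairing_rhos(degrees):
    """Sort four degrees as d(1) >= d(2) >= d(3) >= d(4); return rho of the two links for (a), (b), (c)."""
    d1, d2, d3, d4 = sorted(degrees, reverse=True)
    rho_a = (d1 - d2) ** 2 + (d3 - d4) ** 2  # (a): n_d(1)~n_d(2), n_d(3)~n_d(4)
    rho_b = (d1 - d3) ** 2 + (d2 - d4) ** 2  # (b): n_d(1)~n_d(3), n_d(2)~n_d(4)
    rho_c = (d1 - d4) ** 2 + (d2 - d3) ** 2  # (c): n_d(1)~n_d(4), n_d(2)~n_d(3)
    return (d1, d2, d3, d4), (rho_a, rho_b, rho_c)

rng = random.Random(35)
for _ in range(10_000):
    (d1, d2, d3, d4), (ra, rb, rc) = pairing_rhos([rng.randint(1, 50) for _ in range(4)])
    # The three differences from the proof, each non-positive
    assert ra - rb == -2 * (d2 - d3) * (d1 - d4) <= 0
    assert ra - rc == -2 * (d2 - d4) * (d1 - d3) <= 0
    assert rb - rc == -2 * (d1 - d2) * (d3 - d4) <= 0
```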
A direct consequence of Theorem 35 is that we can now design a rewiring rule that increases or decreases the linear degree correlation $r_D$ of a graph. We define degree-preserving assortative rewiring as follows: randomly select two links associated with four nodes and then rewire the two links such that, as in (a), the two nodes with the highest degree and the two nodes with the lowest degree are connected. If any of the new links exists before rewiring, discard this step and randomly select a new pair of links. Similarly, the procedure for degree-preserving disassortative rewiring is: randomly select two links associated with four nodes and then rewire the two links such that, as in (c), the highest-degree node and the lowest-degree node are connected, while the remaining two nodes are also linked, provided the new links do not exist before rewiring. Theorem 35 shows that the degree-preserving assortative (disassortative) rewiring operations increase (decrease) the degree correlation of a graph. The assortativity range, defined as the difference $\max r_D - \min r_D$, may be regarded as a metric of a given degree vector $d$, which reflects its adaptivity in (dis)assortativity under degree-preserving rewiring. As shown earlier, for some graphs, such as regular graphs, the difference $\max r_D - \min r_D = 0$, while in general $\max r_D - \min r_D \le 2$ because $-1 \le r_D \le 1$.

Degree-preserving rewiring is an interesting tool to modify a graph in which the resources of the nodes are constrained. For instance, the number of outgoing links in a router as well as the number of flights per day at many airports are almost fixed. As an example, we consider degree-preserving rewiring in the US air transportation network⁴, where each node is an American airport and each link is a flight connection between two airports. We are interested in an infection process, where viruses are spread via airplanes from one city to another.
From a topological point of view, the infection threshold $\tau_c = \frac{1}{\lambda_1}$ is the critical design parameter, which we would like to have as high as possible because an effective infection rate $\tau > \tau_c$ translates into a certain percentage of people that remain infected after a sufficiently long time (for details see Van Mieghem et al. (2009)). Since most airports operate near full capacity, the number of flights per airport should hardly change during the re-engineering to modify the largest eigenvalue $\lambda_1$.

Fig. 7.7 shows how the adjacency eigenvalues of the US air transportation network change with degree-preserving assortative rewiring, while the disassortative companion figure is shown in Van Mieghem et al. (2010). In each step of the rewiring process, only four one-elements (i.e., two links) in the adjacency matrix change position. If we relabel the nodes in such a way that the links between nodes 1 and 2 and between nodes 3 and 4 (case (a) in Theorem 35) are rewired to either case (b) or (c), then only a $4 \times 4$ submatrix $A_4$ of the adjacency matrix
$$A = \begin{bmatrix} A_4 & C \\ C^T & A_c \end{bmatrix}$$
is altered. The Interlacing Theorem 42 states that $\lambda_{j+4}(A) \le \lambda_j(A_c) \le \lambda_j(A)$ for $1 \le j \le N - 4$, which holds as well for the rewired matrix $A_r$ after just one degree-preserving rewiring step, because $A_c$ is unchanged. Thus, most of the eigenvalues of $A$ and $A_r$ are interlaced, as observed from Fig. 7.7. The large bulk of the 2179 eigenvalues (not shown in Fig. 7.7) remains centered around zero and confined to the almost constant white strip between $\lambda_{10}$ and $\lambda_{N-5}$. As shown in Section 7.5.1.2, assortative rewiring increases $\lambda_1$. Fig. 7.7 illustrates, in addition, that the spectral width or range $\lambda_1 - \lambda_N$ increases as well, while the spectral gap $\lambda_1 - \lambda_2$ remains high, in spite of the fact that the algebraic connectivity $\mu_{N-1}$ is small. In fact, Fig. 7.8 shows that $\mu_{N-1}$ decreases, in agreement with (7.11), and vanishes after about 10% of the link rewirings, which indicates (art. 80) that the graph is then disconnected. Fig. 7.8 further shows that by rewiring all links on average once (100%), assortative degree-preserving rewiring has dissected the US air transportation network into 20 disconnected clusters. Increasing assortativity implies that high-degree nodes are increasingly linked to each other, and low-degree nodes likewise, which, intuitively, explains why the network breaks up into more and more disconnected clusters during the rewiring process.

The opposite occurs in disassortative rewiring: the algebraic connectivity $\mu_{N-1}$ was found to increase during degree-preserving rewiring from about 0.25 to almost 1, which is the maximum possible due to (4.23) and $d_{\min} = 1$, as follows from the inset in Fig. 7.8. Hence, in order to suppress virus propagation via air transport while guaranteeing connectivity, disassortative degree-preserving rewiring is advocated, which, in return, enhances the topological robustness as explained in art. 97. Finally, we mention that highly disassortative graphs possess a zero eigenvalue of the adjacency matrix with large multiplicity, which can be understood from Section 7.1.1: high-degree nodes are preferentially connected to a large set of low-degree nodes that are not interconnected among themselves.

⁴ The number of nodes is $N = 2179$ and the number of links is $L = 31326$.

[Figure 7.7: adjacency eigenvalues versus percentage of rewired links; inset: $r_D$ versus percentage of rewired links]
Fig. 7.7. The ten largest and five smallest eigenvalues of the adjacency matrix of the US airport transport network versus the percentage of rewired links. The inset shows the linear degree correlation coefficient $r_D$ as a function of the assortative degree-preserving rewiring.

[Figure 7.8: Laplacian eigenvalues versus percentage of rewired links; inset: degree distribution $\Pr[D = k]$ versus degree $k$ on log-log axes, with $\Pr[D = k] \sim k^{-1.21}$]
Fig. 7.8. The twenty smallest eigenvalues of the Laplacian matrix of the US air transportation network versus the percentage of rewired links. The inset shows the degree distribution that is maintained in each degree-preserving rewiring step.
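The degree-preserving assortative rewiring rule defined in this section translates into a few lines of code. The following Python sketch is an illustration under our own assumptions (an undirected simple graph stored as a dictionary of neighbour sets with integer node labels, and hypothetical function names); it is not the implementation used for Figs. 7.7 and 7.8. Each attempt picks two links at random, orders the four end nodes by degree and, as in case (a) of Theorem 35, connects the two highest-degree nodes and the two lowest-degree nodes. The attempt is discarded if a new link already exists, or, as an extra assumption of ours, if the two links share a node. Connecting $n_{(1)}$ with $n_{(4)}$ and $n_{(2)}$ with $n_{(3)}$ instead would give the disassortative variant.

```python
import random

def degree_correlation(adj):
    """r_D via (7.4) for a non-regular graph given as {node: set of neighbours}."""
    d = {v: len(nb) for v, nb in adj.items()}
    n1 = sum(d.values())
    n2 = sum(x * x for x in d.values())
    n3 = sum(d[i] * d[j] for i in adj for j in adj[i])
    s3 = sum(x ** 3 for x in d.values())
    return (n1 * n3 - n2 ** 2) / (n1 * s3 - n2 ** 2)

def assortative_rewire_step(adj, rng):
    """One attempt of degree-preserving assortative rewiring; True if two links were rewired."""
    links = [(i, j) for i in adj for j in adj[i] if i < j]
    (a, b), (c, e) = rng.sample(links, 2)
    if len({a, b, c, e}) < 4:
        return False  # the two links share a node
    # Order the four nodes by degree: d(1) >= d(2) >= d(3) >= d(4)
    n1, n2, n3, n4 = sorted((a, b, c, e), key=lambda v: len(adj[v]), reverse=True)
    new_links = [(n1, n2), (n3, n4)]  # pairing (a) of Theorem 35
    if any(y in adj[x] for x, y in new_links):
        return False  # a new link already exists: discard this step
    for x, y in ((a, b), (c, e)):
        adj[x].discard(y)
        adj[y].discard(x)
    for x, y in new_links:
        adj[x].add(y)
        adj[y].add(x)
    return True

rng = random.Random(7)
adj = {v: set() for v in range(40)}
for i in range(40):
    for j in range(i + 1, 40):
        if rng.random() < 0.1:
            adj[i].add(j)
            adj[j].add(i)
degrees_before = {v: len(nb) for v, nb in adj.items()}
r_before = degree_correlation(adj)
accepted = sum(assortative_rewire_step(adj, rng) for _ in range(5000))
print(accepted, r_before, degree_correlation(adj))
```

Every accepted step leaves all degrees untouched and, by Theorem 35, cannot decrease $r_D$.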
7.6 Reconstructability of complex networks
In this section, we investigate, given the set of eigenvectors $x_1, x_2, \ldots, x_N$, how many eigenvalues of the adjacency matrix $A$ are needed to be able to reconstruct $A$ exactly. Specifically, we perturb the spectrum by omitting the $m$ smallest eigenvalues in absolute value of $A$ and we determine the maximal value of $m$ such that the matrix $A$ can be exactly reconstructed.

Since $\sum_{j=1}^N \lambda_j = 0$ (art. 25), on average half of the eigenvalues of the adjacency matrix $A$ are negative. Therefore, we reorder the eigenvalues as $|\lambda_{(1)}| \le |\lambda_{(2)}| \le \cdots \le |\lambda_{(N)}|$ such that $\lambda_{(j)}$ is the $j$-th smallest (in absolute value) eigenvalue corresponding to the eigenvector $x_{(j)}$. Let us define the $N \times N$ matrices
$$\Lambda^{(m)} = \operatorname{diag}\left(0, \ldots, 0, \lambda_{(m+1)}, \lambda_{(m+2)}, \ldots, \lambda_{(N)}\right) \quad \text{and} \quad A^{(m)} = \widetilde{X}\,\Lambda^{(m)}\,\widetilde{X}^T$$
where $\widetilde{X} = \left[x_{(1)}\; x_{(2)}\; \cdots\; x_{(N)}\right]$ is the reordered version of the orthogonal matrix $X$ in (1.2) corresponding to the eigenvalues ranked in absolute value. Thus, $\Lambda^{(m)}$ is the diagonal matrix where the $m$ smallest (in absolute value) eigenvalues are put equal to zero, or, equivalently, are removed from the spectrum of $A$. The spectral perturbation considered here consists of consecutively removing more eigenvalues from the spectrum until we can no longer reconstruct the adjacency matrix $A$. Clearly, when $m = 0$, we have that $A^{(0)} = A$ and that, for any other $m > 0$, $A^{(m)} \ne A$, unless all removed eigenvalues are zero. Moreover, when $m > 0$, $A^{(m)}$ is, in general, no longer a zero-one matrix.

Fig. 7.9 plots the histograms of the entries of $A^{(5)}$, $A^{(10)}$, $A^{(15)}$ and $A^{(20)}$ for an Erdős-Rényi random graph with $N = 36$ nodes and link density $p = 0.5$. The removal of a part of the eigenvalues causes roughly the same impact on the 1 and 0 elements of the adjacency matrix $A$, as shown in Fig. 7.9. This means that the deviations on 1s and 0s are almost the same, and that the distributions of values around 1 and 0 will reach 1/2 roughly simultaneously when the number of removed eigenvalues increases gradually.

[Figure 7.9: four histograms of the entries of $A^{(5)}$, $A^{(10)}$, $A^{(15)}$ and $A^{(20)}$]
Fig. 7.9. The histograms of the entries of $A^{(5)}$, $A^{(10)}$, $A^{(15)}$ and $A^{(20)}$. The matrix $A$ ($A = A^{(0)}$) is the adjacency matrix of an Erdős-Rényi random graph with $N = 36$ nodes and link density $p = 0.5$.

Using Heaviside's step function
$$h(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{1}{2} & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$
we truncate the elements of $A^{(m)}$ as $h\left(\left(A^{(m)}\right)_{ij} - \frac{1}{2}\right)$. If we now define the operator $H$ applied to a matrix $A^{(m)}$ that replaces each element of $A^{(m)}$ by $h\left(\left(A^{(m)}\right)_{ij} - \frac{1}{2}\right)$, then $C_m = H\left(A^{(m)}\right)$ is a zero-one matrix, with the possible exception of elements equal to $\frac{1}{2}$. The interesting observation from extensive simulations is that there seems to exist a maximal number $\Xi$, such that
$$\begin{cases} C_m = A, & \text{if } m \le \Xi \\ C_m \ne A, & \text{if } m > \Xi \end{cases}$$
In other words, $\Xi$ is the maximum number of eigenvalues that can be removed from the spectrum of the graph such that the graph can still be reconstructed precisely, given the matrix $X$. We therefore call $\Xi$ the reconstructability coefficient.
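The definition of $\Xi$ can be turned into a direct computation for small graphs. The pure-Python sketch below is illustrative only: it uses the cyclic Jacobi method (our choice, to stay free of external libraries) for the eigendecomposition $A = X \Lambda X^T$, forms $A^{(m)}$ by zeroing the $m$ eigenvalues that are smallest in absolute value, applies the operator $H$, and returns the largest $m$ for which $H(A^{(m')}) = A$ for every $m' \le m$. Eigenvalues with equal absolute value are ordered arbitrarily (here by index), and exact ties at $1/2$ are decided by rounding errors, so results for highly symmetric graphs should be read with care. All function names are hypothetical.

```python
import math

def jacobi_eigh(A, max_sweeps=50):
    """Eigenvalues and orthonormal eigenvectors (columns of X) of a real symmetric matrix."""
    n = len(A)
    a = [[float(v) for v in row] for row in A]
    X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that annihilates a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- P^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # X <- X P accumulates the eigenvectors
                    xkp, xkq = X[k][p], X[k][q]
                    X[k][p], X[k][q] = c * xkp - s * xkq, s * xkp + c * xkq
    return [a[i][i] for i in range(n)], X

def truncated_matrix(eigs, X, m):
    """A^(m): rebuild A after zeroing the m eigenvalues that are smallest in absolute value."""
    n = len(eigs)
    kept = sorted(range(n), key=lambda k: abs(eigs[k]))[m:]
    return [[sum(eigs[k] * X[i][k] * X[j][k] for k in kept) for j in range(n)] for i in range(n)]

def heaviside_operator(M):
    """The operator H: each element M_ij becomes h(M_ij - 1/2), with h(0) = 1/2."""
    def h(x):
        return 1.0 if x > 0 else (0.5 if x == 0 else 0.0)
    return [[h(v - 0.5) for v in row] for row in M]

def reconstructability(A):
    """Largest m such that H(A^(m')) equals A for every m' <= m."""
    eigs, X = jacobi_eigh(A)
    xi = 0
    for m in range(1, len(A) + 1):
        if heaviside_operator(truncated_matrix(eigs, X, m)) != A:
            break
        xi = m
    return xi

path5 = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
print(reconstructability(path5))
```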
7.6.1 Theory
Art. 156 shows that any real, symmetric matrix $A$ can be rewritten as (8.31),
$$A = \sum_{k=1}^N \lambda_k x_k x_k^T = \sum_{k=1}^N \lambda_k E_k$$
where the matrix $E_k = x_k x_k^T$ is the outer product of $x_k$ by itself. Any element of $A$ can be written, with the above relabelling of the eigenvectors according to a ranking in absolute values of the eigenvalues $|\lambda_{(1)}| \le |\lambda_{(2)}| \le \cdots \le |\lambda_{(N)}|$, as
$$a_{ij} = \sum_{k=1}^{m} \lambda_{(k)}\left(E_{(k)}\right)_{ij} + \sum_{k=m+1}^{N} \lambda_{(k)}\left(E_{(k)}\right)_{ij} \tag{7.12}$$
where $m \in [1, N]$ is, for the time being, an integer. As shown in art. 157, the 2-norm of $E_k$ is not larger than 1, so that $\left|\left(E_{(k)}\right)_{ij}\right| \le 1$ for any $1 \le k \le N$, which implies that $-1 \le \left(E_{(k)}\right)_{ij} \le 1$. Relation (8.31) also explains why an ordering in absolute value is most appropriate for our spectral perturbation: the usual ordering $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{N-1} \ge \lambda_N$ in algebraic graph theory would first remove $\lambda_N < 0$, then $\lambda_{N-1}$ and so on. However, $|\lambda_N|$ can be large and its omission from the spectrum is likely to cause too big an impact.

The reconstructability of a graph is now reformulated as follows. Since $a_{ij}$ is either zero or one, it follows from (7.12) that, if
$$\left| a_{ij} - \sum_{k=m+1}^{N} \lambda_{(k)}\left(E_{(k)}\right)_{ij} \right| < \frac{1}{2} \tag{7.13}$$
we can reconstruct the element $a_{ij}$ as
$$a_{ij} = \begin{cases} 1 & \text{if } \sum_{k=m+1}^{N} \lambda_{(k)}\left(E_{(k)}\right)_{ij} > \frac{1}{2} \\ 0 & \text{if } \sum_{k=m+1}^{N} \lambda_{(k)}\left(E_{(k)}\right)_{ij} < \frac{1}{2} \end{cases}$$
The reconstructability requirement (7.13) determines the values of $m$ that satisfy the inequality. The largest value of $m$ obeying (7.13) is denoted by $\Xi$, called the reconstructability coefficient of a graph. Using (7.12), the reconstructability requirement (7.13) is equivalent to
$$\left| \sum_{k=1}^{m} \lambda_{(k)}\left(E_{(k)}\right)_{ij} \right| < \frac{1}{2}$$
A further analysis is difficult due to the appearance of the matrix elements $\left(E_{(k)}\right)_{ij}$, of which, in general, not much is known. Since $\left|\left(E_{(k)}\right)_{ij}\right| \le 1$, we can bound the sum as
$$\left| \sum_{k=1}^{m} \lambda_{(k)}\left(E_{(k)}\right)_{ij} \right| \le \sum_{k=1}^{m} \left|\lambda_{(k)}\right| \left|\left(E_{(k)}\right)_{ij}\right| \le \sum_{k=1}^{m} \left|\lambda_{(k)}\right| \tag{7.14}$$
In many cases, this bound is conservative because, on average, half of the eigenvalues of the adjacency matrix $A$ are negative. Moreover, the matrix element $\left(E_{(k)}\right)_{ij}$ can also be negative. Liu et al. (2010) show for the class $G_p(N)$ of Erdős-Rényi random graphs that the bound (7.14) is, indeed, too conservative and that only extensive simulations seem appropriate to determine the reconstructability coefficient $\Xi$.
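The bound (7.14) yields a simple sufficient condition: if $\sum_{k=1}^m |\lambda_{(k)}| < \frac{1}{2}$, then (7.13) holds for all elements $a_{ij}$, so that the largest such $m$ is a lower bound for $\Xi$. The Python sketch below (hypothetical function name) computes that $m$ from a list of eigenvalues. For the path on five nodes, whose adjacency eigenvalues are $2\cos(k\pi/6)$, the condition only guarantees the removal of the zero eigenvalue, which illustrates how conservative (7.14) can be.

```python
import math

def guaranteed_removals(eigenvalues):
    """Largest m with sum_{k<=m} |lambda_(k)| < 1/2; by (7.14) this m is a lower bound on Xi."""
    total, m = 0.0, 0
    for lam in sorted(eigenvalues, key=abs):
        total += abs(lam)
        if total >= 0.5:
            break
        m += 1
    return m

path5 = [2 * math.cos(k * math.pi / 6) for k in range(1, 6)]  # adjacency spectrum of the path on 5 nodes
print(guaranteed_removals(path5))
```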
7.6.2 The average reconstructability coefficient $E[\Xi]$
Via extensive simulations, Liu et al. (2010) investigated the properties of the reconstructability coefficient $\Xi$ for several important types of complex networks introduced in Section 1.3, such as Erdős-Rényi random graphs, Barabási-Albert scale-free networks and Watts-Strogatz small-world networks, and also other special deterministic types of graphs. A general linear scaling law was found:
$$E[\Xi] = aN \tag{7.15}$$
where the real number $a \in [0, 1]$ depends on the graph $G$. Moreover, the variance $\operatorname{Var}[\Xi]$ was sufficiently smaller than the mean $E[\Xi]$ such that $E[\Xi]$ serves as an excellent estimate for $\Xi$. For sufficiently large $N$, a portion $a$ of the smallest eigenvalues (in absolute value) can be removed from the spectrum and the adjacency matrix is still reconstructable with its original eigenvectors. The magnitude of $a$ for different types of complex networks with different parameters was found to vary from 39% to 76%, which is surprisingly high.

The basic eigenvalue relation (8.31) shows that the set of orthogonal eigenvectors is weighted by their corresponding eigenvalues. Any eigenvector specifies an orthogonal direction in the $N$-dimensional space (see Section 1.1). An eigenvector with an eigenvalue close to zero in absolute value contains redundant information about the topology of the graph, in the sense that after the removal of this eigenvalue the network can still be reconstructed from the remaining spectrum. Liu et al. (2010) observe that when the graphs are more “regularly structured”, the parameter $a$ seems higher. Deterministic graphs, like path, ring and lattice graphs, are more “regularly structured” than Erdős-Rényi random graphs, power-law graphs and rewired small-world graphs. In the spectral domain, the more regular a graph is, the more constraints it needs to obey, and the less the $N$-dimensional space is “sampled”. In other words, fewer spectral bases (eigenvectors) are needed to reconstruct the graph. One may also say that the embedding of the graph structure in the $N$-dimensional space does not need those orthogonal dimensions (which act similarly as a kernel of a linear transformation). The reconstructability coefficient $\Xi$ (or the scaled coefficient $a = \frac{E[\Xi]}{N}$ in (7.15)) can be regarded as a spectral metric of the graph that expresses, via $N - \Xi$, how many dimensions of the $N$-dimensional space are needed to represent or reconstruct the graph. Roughly, a high reconstructability coefficient reflects a “geometrically simple” graph that only needs a few orthogonal dimensions to be described. The precise physical or topological meaning of the reconstructability coefficient $\Xi$ is not yet entirely clear. Finally, it would be desirable to have a rigorous proof of the claimed linear law (7.15).
7.7 Reaching consensus
Each node $j$ in a network possesses a value $x_j[0]$ at discrete time $k = 0$. That value can be an estimate of, for example, the temperature, the pressure or the traffic load, measured by each sensor node in the network, or it can represent an opinion score in a social network. The goal is to reach consensus among all the nodes in the network about the value $x$ over time. All nodes exchange their value with their neighbors at discrete time $k$ and they update their value at time $k + 1$. A simple updating strategy is a weighted sum,
$$x_j[k+1] = w_{jj}[k]\, x_j[k] + \sum_{l \text{ is a neighbor of } j} w_{lj}[k]\, x_l[k]$$
where $w_{lj}[k]$ is the weight that node $j$ applies at time $k$ to the value received from the neighboring node $l$. Given the initial vector $x[0]$, the governing equations for the dynamic consensus process are, in matrix form,
$$x[k+1] = W[k]\, x[k]$$
where we can consider $W[k]$ as a weighted adjacency matrix at time $k$ plus a diagonal matrix $\operatorname{diag}\left(\{w_{jj}\}_{1 \le j \le N}\right)$, because $a_{jj} = 0$ and, here, the diagonal values are usually the most important ones. The formal solution is obtained by iteration as
$$x[k] = W[k-1]\, W[k-2] \cdots W[0]\, x[0]$$
This matrix equation is similar to the system equation of a Markov process, where $W[k]$ is then the stochastic transition probability matrix at time $k$. Section 3.7 describes the time-convergence of a random walk on a graph towards the steady state. A simple example of this linear updating strategy is
$$W[k] = I - wQ[k] = I - w\Delta[k] + wA[k]$$
where $w$ is chosen such that $W[k]$ is a non-negative, symmetric matrix and that consensus is reached rapidly. The whole dynamics of the process then depends on the time-dependence of the graph, reflected by the corresponding Laplacian $Q[k]$. A further simplification lies in considering a static weight matrix $W[k] = W$. In that case, the consensus process behaves as
$$x[k] = W^k x[0] = (I - wQ)^k x[0]$$
which is entirely determined by the eigenvalues and eigenvectors of the matrix $I - wQ$. The eigenvalues $\{\nu_j\}_{1 \le j \le N}$ of $I - wQ$ are equal to the set $\{1 - w\mu_j\}_{1 \le j \le N}$, with corresponding eigenvectors equal to those of the Laplacian $Q$. Fast convergence and non-negativity require that each eigenvalue of $I - wQ$ obeys $0 \le \nu_j \le 1$, which implies with (4.10) that $w < \frac{1}{\mu_1}$ or, safely, $w < \frac{1}{2d_{\max}}$.

7.8 Spectral graph metrics
Most of the graph metrics, such as the hopcount, diameter, clustering coefficient, and many more listed in Section 1.4, are defined in the topology domain. In this section, we deal with graph metrics that are defined in the spectral domain. We have already encountered some spectral graph metrics such as the algebraic connectivity $\mu_{N-1}$ in Section 4.2 and the reconstructability coefficient $\Xi$ in Section 7.6.
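The algebraic connectivity $\mu_{N-1}$ also has a dynamic meaning. In the static consensus iteration $x[k] = (I - wQ)^k x[0]$ of Section 7.7, the average of $x[0]$ is preserved, because $u^T Q = 0$, and, when $0 \le 1 - w\mu_j \le 1$ for all $j$ in a connected graph, the distance to that average decays asymptotically as $(1 - w\mu_{N-1})^k$. The Python sketch below illustrates this on the path with four nodes, for which $\mu_{N-1} = 2 - \sqrt{2}$; the graph encoding, the choice $w = 0.9/(2d_{\max})$ and the function names are our own assumptions.

```python
def consensus_step(adj, x, w):
    """x[k+1] = (I - wQ) x[k], i.e. x_j <- x_j + w * sum over neighbours l of (x_l - x_j)."""
    return [x[j] + w * sum(x[l] - x[j] for l in adj[j]) for j in range(len(x))]

def run_consensus(adj, x0, w, steps):
    """Trajectory x[0], x[1], ..., x[steps] of the static-weight consensus process."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(consensus_step(adj, trajectory[-1], w))
    return trajectory

path = [[1], [0, 2], [1, 3], [2]]           # path on four nodes
d_max = max(len(nbrs) for nbrs in path)
w = 0.9 / (2 * d_max)                       # strictly below the safe choice 1 / (2 d_max)
traj = run_consensus(path, [4.0, 0.0, 0.0, 0.0], w, 200)
print(traj[-1])                             # every entry approaches the initial average
```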
7.8.1 Eigenvector centrality
The per-component eigenvalue equation (1.3) of the $k$-th eigenvector,
$$(x_k)_j = \frac{(Ax_k)_j}{\lambda_k} = \frac{1}{\lambda_k} \sum_{l \text{ is a direct neighbor of } j} (x_k)_l \tag{7.16}$$
is called the eigenvector centrality of node $j$ according to the eigenvector $x_k$ of the adjacency matrix $A$ of a graph $G$. This “centrality” measure reflects the importance of a node in a network with respect to the eigenvector $x_k$ and provides a ranking of the nodes in the network according to $x_k$. Since the eigenvector $x_1$ has non-zero components (art. 21), this largest eigenvector is considered most often. Perhaps the best known example of this spectral graph metric is Google's PageRank, explained in Van Mieghem (2006b, Section 11.6), where the importance of webpages is ranked according to the components of the largest eigenvector of a weighted adjacency matrix, actually the stochastic matrix $P = \Delta^{-1}A$ of the web. Since the eigenvectors of the symmetric adjacency matrix $A$ are assumed to be orthogonal and normalized (art. 151), it holds that $-1 \le (x_k)_j \le 1$. Thus, the definition of eigenvector centrality (7.16) shows that
$$\left|(x_k)_j\right| = \frac{1}{|\lambda_k|} \left| \sum_{l \text{ is a direct neighbor of } j} (x_k)_l \right| \le \frac{d_j}{|\lambda_k|}$$
which suggests the interpretation of the eigenvector centrality (7.16) as a “weighted degree”. Using (2.4), $d_j = (Au)_j$, art. 41 indicates that the eigenvector centrality of regular graphs, according to the largest eigenvalue, equals $(x_1)_j = \frac{1}{\sqrt{N}}$, which is constant for all nodes in the regular graph. Thus, for regular graphs, the eigenvector centrality is, apart from a scaling constant, the same as the degree.
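The eigenvector centrality with respect to $x_1$ can be computed without a full eigendecomposition. The Python sketch below is a minimal illustration under our own assumptions (a connected graph stored as a list of neighbour lists, hypothetical function name): it runs power iteration on $A + I$, whose dominant eigenvector is $x_1$, and returns $\lambda_1$ via the Rayleigh quotient together with the normalized vector $x_1$. For the star $K_{1,3}$ it yields $\lambda_1 = \sqrt{3}$, centrality $1/\sqrt{2}$ for the hub and $1/\sqrt{6}$ for each leaf; for a regular graph every node receives $1/\sqrt{N}$, as stated above.

```python
import math

def eigenvector_centrality(adj, iters=1000):
    """lambda_1 and the normalized principal eigenvector x_1 of A for a connected graph."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [x[j] + sum(x[l] for l in adj[j]) for j in range(n)]  # (A + I) x; shift handles bipartite graphs
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    lam = sum(x[j] * sum(x[l] for l in adj[j]) for j in range(n))  # Rayleigh quotient x^T A x
    return lam, x

star = [[1, 2, 3], [0], [0], [0]]
lam, x = eigenvector_centrality(star)
ranking = sorted(range(len(x)), key=lambda j: -x[j])  # nodes ranked by (x_1)_j
print(lam, x, ranking)
```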
7.8.2 Graph energy
The graph energy $E_G$ is defined as
$$E_G = \sum_{j=1}^N \left|\lambda_j(A)\right| \tag{7.17}$$
The definition (7.17) of the graph energy is inspired by the energy eigenstates of the Hamiltonian applied to molecules (such as hydrocarbons) and was first proposed by Gutman (Dehmer and Emmert-Streib, 2009, Chapter 7). The chemical origin does not directly help to interpret the notion of graph energy, so that the graph energy is best considered as one of the spectral metrics of a graph.
The absolute value in the definition (7.17) complicates exact computations, but a large number of bounds exist. A direct application of the inequality (9.49) and art. 144 gives
$$\left(E_G - \lambda_1(A)\right)^{2m} \le (N-1)^{2m-1}\left\{\operatorname{trace}\left(A^{2m}\right) - \lambda_1^{2m}(A)\right\}$$
Rewritten with the definition $W_k = \operatorname{trace}\left(A^k\right)$ in art. 36, we obtain, for any integer $m > 0$, the upper bound
$$E_G \le \lambda_1(A) + (N-1)^{1-1/(2m)} \sqrt[2m]{W_{2m} - \lambda_1^{2m}(A)}$$
Since the function $f(x) = x + (N-1)^{1-1/(2m)} \sqrt[2m]{W_{2m} - x^{2m}}$ is decreasing in the interval $\arg\max f(x) \le x \le \sqrt[2m]{W_{2m}}$, a lower bound for $\lambda_1(A)$ of the type (3.33) with $k = 2m$ that lies in that interval can be used to derive a (slightly) less tight upper bound for $E_G$,
$$E_G \le \sqrt[2m]{\frac{N_{2m}}{N}} + (N-1)^{1-1/(2m)} \sqrt[2m]{W_{2m} - \frac{N_{2m}}{N}}$$
For example, for $m = 1$, we obtain with (3.9)
$$E_G \le \frac{2L}{N}\sqrt{1 + \frac{\operatorname{Var}[D]}{(E[D])^2}} + \sqrt{(N-1)\left(2L - \left(\frac{2L}{N}\right)^2\left(1 + \frac{\operatorname{Var}[D]}{(E[D])^2}\right)\right)}$$
Other upper bounds are found in Dehmer and Emmert-Streib (2009, Chapter 7).

A lower bound is deduced from
$$E_G^2 = \sum_{j=1}^N \lambda_j^2(A) + \sum_{j=1}^N \sum_{k=1; k \ne j}^N \left|\lambda_j(A)\right| \left|\lambda_k(A)\right|$$
We apply the harmonic, geometric and arithmetic mean inequality (5.15) to the last sum,
$$\frac{1}{N(N-1)} \sum_{j=1}^N \sum_{k=1; k \ne j}^N \left|\lambda_j(A)\right| \left|\lambda_k(A)\right| \ge \left( \prod_{j=1}^N \prod_{k=1; k \ne j}^N \left|\lambda_j(A)\right| \left|\lambda_k(A)\right| \right)^{\frac{1}{N(N-1)}}$$
With $\prod_{k=1}^N \left|\lambda_k(A)\right| = \left|\prod_{k=1}^N \lambda_k(A)\right| = |\det(A)|$ (art. 138), we have
$$\prod_{j=1}^N \prod_{k=1; k \ne j}^N \left|\lambda_j(A)\right| \left|\lambda_k(A)\right| = \prod_{j=1}^N \left( \left|\lambda_j(A)\right|^{N-2} \prod_{k=1}^N \left|\lambda_k(A)\right| \right) = |\det(A)|^N \prod_{j=1}^N \left|\lambda_j(A)\right|^{N-2} = |\det(A)|^{2(N-1)}$$
such that, with (3.2),
$$E_G \ge \sqrt{2L + N(N-1)\left(|\det(A)|\right)^{2/N}} = \sqrt{2}\sqrt{L + L_{\max}\left(|\det(A)|\right)^{2/N}}$$
Clearly, $E_G \ge \sqrt{2L}$.

The determination of graphs that maximize the graph energy $E_G$ is an active domain of research. A complete solution is not known. We content ourselves here with listing a few results and refer to Dehmer and Emmert-Streib (2009, Chapter 7) for a detailed review of reported results. Obviously, the graph with minimum energy is the zero graph consisting of isolated nodes, i.e., the complement of the complete graph $(K_N)^c$. Among the trees, the star $K_{1,n}$ has minimal graph energy, whereas the path possesses maximum energy. A curious fact observed from simulations, as mentioned by Gutman et al. in Dehmer and Emmert-Streib (2009, Chapter 7), is that $E_G$ seems to decrease almost linearly in the multiplicity of the zero eigenvalue for a certain class of graphs. Such an “almost⁵” linear scaling law bears resemblance to the linear scaling law (7.15) of the reconstructability coefficient.
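Given the adjacency eigenvalues, the graph energy (7.17) and the bounds above take a few lines to evaluate. The sketch below (hypothetical function name) assumes that the caller supplies the eigenvalues and the degree sequence; it returns $E_G$, the lower bound $\sqrt{2L + N(N-1)|\det A|^{2/N}}$, the $m = 1$ upper bound $\lambda_1 + \sqrt{(N-1)(2L - \lambda_1^2)}$ and its degree-moment version, in which $\lambda_1^2$ is replaced by $N_2/N = E[D^2]$. For the path on four nodes, with eigenvalues $2\cos(k\pi/5)$, the energy $E_G = 2\sqrt{5}$ lies between $\sqrt{18}$ and both upper bounds.

```python
import math

def energy_and_bounds(eigs, degrees):
    """Graph energy (7.17), its |det A| lower bound and the two m = 1 upper bounds."""
    N = len(eigs)
    two_L = sum(degrees)                                # 2L = trace(A^2) = W_2
    energy = sum(abs(v) for v in eigs)
    lam1 = max(eigs)
    upper_lam1 = lam1 + math.sqrt(max(0.0, (N - 1) * (two_L - lam1 ** 2)))
    mean = two_L / N
    var = sum(d * d for d in degrees) / N - mean ** 2
    n2_over_n = mean ** 2 * (1 + var / mean ** 2)       # N_2 / N = E[D^2]
    upper_degree = math.sqrt(n2_over_n) + math.sqrt(max(0.0, (N - 1) * (two_L - n2_over_n)))
    det = math.prod(eigs)                               # det A = product of the eigenvalues
    lower = math.sqrt(two_L + N * (N - 1) * abs(det) ** (2 / N))
    return energy, lower, upper_lam1, upper_degree

path4 = [2 * math.cos(k * math.pi / 5) for k in range(1, 5)]  # adjacency spectrum of the path on 4 nodes
print(energy_and_bounds(path4, [1, 2, 2, 1]))
```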
7.8.3 Effective graph resistance
We consider a network in which a flow with magnitude $I$ is injected in node $a$ and the flow leaves the network at node $b$. As explained in Section 2.1.1, the in-flows and out-flows at a node $i$ satisfy a conservation law:
$$\sum_{j \in \text{neighbor}(i)} y_{ij} = \sum_{j=1}^N a_{ij}\, y_{ij} = I\left(1_{\{i=a\}} - 1_{\{i=b\}}\right) \tag{7.18}$$
where $y_{ij} = -y_{ji}$ is the flow over link $l = i \sim j$ from node $i$ to $j$. Thus, if node $i$ is neither the source node $a$ nor the sink node $b$, then the net flow, the sum of the flows, at node $i$ is zero. Each link $l = i \sim j$ between node $i$ and $j$ contains a resistor $r_{ij}$. A flow $y_{ij}$ is said to be physical if there is an associated potential function $v$ on the nodes of the network such that
$$v_i - v_j = r_{ij}\, y_{ij} \tag{7.19}$$
In electrical networks, the potential function is called the “voltage”, whereas in hydraulic networks, it is called the “pressure”. The relation (7.19), known as the law of Ohm, reflects that the potential difference $v_i - v_j$ generates a force that drives the current $y_{ij}$ from node $i$ to node $j$ (if $v_i - v_j > 0$, else in the opposite direction) and that the potential difference is proportional to the current $y_{ij}$. The proportionality constant equals $r_{ij}$, the resistance between node $i$ and $j$. For other electrical network elements such as capacitors and inductances, the relations between potential and current are more complicated than Ohm's law (7.19) and can be derived from the laws of Maxwell.

⁵ Simulations show that the relation is weakly concave.
Confining ourselves to a connected resistor network in which the flows are entirely specified by the conservation law (7.18) and Ohm's law (7.19), we will now translate the flow problem into a graph theoretical setting. The aim is to determine the effective resistance matrix $\Omega$ with elements $\omega_{ab}$ that satisfy $v_a - v_b = \omega_{ab} I$. The effective graph resistance, defined as
$$R_G = \frac{1}{2}\sum_{a=1}^N \sum_{b=1}^N \omega_{ab} = \frac{1}{2} u^T \Omega u \tag{7.20}$$
can be regarded as a graph metric that measures the difficulty of transport in a graph $G$. Since the flow injected in node $a$ can spread out over multiple paths towards node $b$, Klein and Randić (1993) characterize the effective graph resistance $R_G$ by the “multiple-route distance diminishment” feature, which is sometimes desirable over the conventional distance that uses a single path from $a$ to $b$. Intuitively, even with two paths of different length, the robustness of communication between $a$ and $b$ might be enhanced somewhat by the longer path. The effective resistance $\omega_{ab}$ is a generalization to any configuration of the classical series and parallel formulas for the resistance.

Substituting Ohm's law (7.19) into the conservation law (7.18) yields
$$I\left(1_{\{i=a\}} - 1_{\{i=b\}}\right) = \sum_{j=1}^N \frac{a_{ij}}{r_{ij}}\left(v_i - v_j\right) = v_i \sum_{j=1}^N \frac{a_{ij}}{r_{ij}} - \sum_{j=1}^N \frac{a_{ij}}{r_{ij}}\, v_j$$
We introduce the weighted adjacency matrix $W$, defined in art. 3, with elements $w_{ij} = \frac{a_{ij}}{r_{ij}}$ and obtain
$$I\left(1_{\{i=a\}} - 1_{\{i=b\}}\right) = v_i \sum_{j=1}^N w_{ij} - \sum_{j=1}^N w_{ij}\, v_j$$
Since this relation holds for any node $i$, the linear set of all such equations is written in matrix form as
$$I\left(e_a - e_b\right) = \left( \operatorname{diag}\left( \sum_{j=1}^N w_{ij} \right) - W \right) v = \widetilde{Q}\, v$$
where $\widetilde{Q}$ is the weighted Laplacian (art. 3) and $e_k$ is the basic vector with the $m$-th component equal to $1_{\{m=k\}}$. Similar to the Laplacian $Q$, one may verify from the proof of Theorem 10 that the weighted Laplacian $\widetilde{Q}$ has non-negative eigenvalues and that the smallest eigenvalue is zero, belonging to the eigenvector $u$ (because $\widetilde{Q}u = 0$, as follows from the definition). Clearly, for the standard choice of unit resistances, $W$ and $\widetilde{Q}$ reduce to the adjacency matrix $A$ and the Laplacian $Q$ of the graph $G$.
For the subsequent algebraic manipulations, there is no loss in generality when we limit ourselves to the Laplacian. Due to the zero eigenvalue $\mu_N = 0$, the matrix equation $I(e_a - e_b) = Qv$ cannot be inverted. However, using the spectral decomposition $Q = X \operatorname{diag}(\mu_k) X^T$ in art. 156, we have
$$\operatorname{diag}(\mu_k)\, X^T v = I\, X^T (e_a - e_b)$$
which shows that the last equation corresponding to $\mu_N = 0$ can be omitted. This means that one of the potentials $v_1, v_2, \ldots, v_N$ can be chosen at will, which coincides physically with the fact that only the potential difference matters. The simplest way of omitting one equation and choosing one of the potentials as a reference is to proceed with the symmetric $N \times N$ matrix $\widehat{Q} = \widehat{X} \operatorname{diag}(\hat{\mu}_k) \widehat{X}^T$ instead of $Q$, where the $N \times (N-1)$ matrix $\widehat{X}$ consists of all eigenvectors of $Q$, except for the eigenvector $u$ belonging to the eigenvalue $\mu_N = 0$, and the $(N-1) \times (N-1)$ diagonal matrix $\operatorname{diag}(\hat{\mu}_k)$ contains only the positive eigenvalues of $Q$. Art. 148 shows that $\widehat{Q}$ equals the adjoint matrix of $Q$ with $\mu = 0$. The orthogonality of the eigenvectors shows that $\widehat{X}^T \widehat{X} = I$, and the pseudo-inversion now equals
$$v = \widehat{Q}^{-1} I (e_a - e_b) = I\, \widehat{X} \operatorname{diag}\left(\hat{\mu}_k^{-1}\right) \widehat{X}^T (e_a - e_b) \tag{7.21}$$
We present another expression for the pseudo-inverse $\widehat{Q}^{-1}$, the inverse of the Laplacian within the subspace orthogonal to the vector $u$. Art. 85 shows that $\widehat{Q}^{-1} Q = Q \widehat{Q}^{-1} = Y$, where $Y = I - \frac{1}{N}J$ is the projection onto the subspace orthogonal to the vector $u$. Since $\widehat{Q}^{-1} J = \widehat{X} \operatorname{diag}\left(\hat{\mu}_k^{-1}\right) \widehat{X}^T u.u^T = 0$, we have that
$$\widehat{Q}^{-1}(Q + J) = Y = I - \frac{1}{N}J$$
and the matrix $Q + J$ has an inverse for any connected graph (art. 86). Thus,
$$\widehat{Q}^{-1} = (Q + J)^{-1} - \frac{1}{N} J (Q + J)^{-1}$$
Art. 84 shows that $J(Q + J)^{-1} = \frac{1}{N}J$, so that the pseudo-inverse equals
$$\widehat{Q}^{-1} = (Q + J)^{-1} - \frac{1}{N^2} J$$
The definition $v_a - v_b = \omega_{ab} I$ and $v_a - v_b = (e_a - e_b)^T v$ combined with (7.21) then lead to the quadratic form
$$\omega_{ab} = (e_a - e_b)^T \widehat{Q}^{-1} (e_a - e_b)$$
Multiplying out yields
$$\omega_{ab} = \left(\widehat{Q}^{-1}\right)_{aa} + \left(\widehat{Q}^{-1}\right)_{bb} - 2\left(\widehat{Q}^{-1}\right)_{ab}$$
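from which the symmetric effective resistance matrix $\Omega$ is obtained as
$$\Omega = z.u^T + u.z^T - 2\widehat{Q}^{-1} \tag{7.22}$$
where the vector $z = \left( \left(\widehat{Q}^{-1}\right)_{11}, \left(\widehat{Q}^{-1}\right)_{22}, \ldots, \left(\widehat{Q}^{-1}\right)_{NN} \right)$ and all diagonal elements of $\Omega$ are zero, as follows from the definition $v_a - v_b = \omega_{ab} I$. Finally, the effective graph resistance (7.20) equals
$$R_G = \frac{1}{2} u^T \Omega u = \frac{1}{2} u^T z.u^T u + \frac{1}{2} u^T u.z^T u - u^T \widehat{Q}^{-1} u = N u^T z = N \operatorname{trace}\left(\widehat{Q}^{-1}\right)$$
because $u^T \widehat{Q}^{-1} u = u^T \widehat{X} \operatorname{diag}\left(\hat{\mu}_k^{-1}\right) \widehat{X}^T u = 0$, as the vector $u$ is orthogonal to each other eigenvector $x_k$ of $Q$. Using (8.7) leads to the spectral expression for the effective graph resistance
$$R_G = N \sum_{k=1}^{N-1} \frac{1}{\mu_k} \tag{7.23}$$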
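The pseudo-inverse $\widehat{Q}^{-1} = (Q + J)^{-1} - \frac{1}{N^2}J$ lends itself to exact computation with rational arithmetic. The Python sketch below is a minimal illustration, assuming unit resistances, a connected graph given as an edge list on nodes $0, \ldots, N-1$, and hypothetical function names: it inverts $Q + J$ by Gauss-Jordan elimination over fractions, builds $\Omega$ via (7.22) and returns $R_G = N \operatorname{trace}(\widehat{Q}^{-1})$. For the 3-hop line it returns $R_G = 10$, the sum of all hopcounts, as expected for a tree, where $\omega_{ab} = H_{ab}$.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Laplacian Q = Delta - A of an undirected graph with unit resistances, as exact fractions."""
    Q = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        Q[i][i] += 1
        Q[j][j] += 1
        Q[i][j] -= 1
        Q[j][i] -= 1
    return Q

def inverse(M):
    """Gauss-Jordan inverse over the rationals; M must be non-singular."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def effective_resistance(n, edges):
    """Omega from (7.22) and R_G = N trace(Q^+) for a connected graph."""
    Q = laplacian(n, edges)
    inv = inverse([[q + 1 for q in row] for row in Q])            # (Q + J)^{-1}
    P = [[v - Fraction(1, n * n) for v in row] for row in inv]    # pseudo-inverse (Q + J)^{-1} - J / N^2
    z = [P[k][k] for k in range(n)]
    omega = [[z[a] + z[b] - 2 * P[a][b] for b in range(n)] for a in range(n)]
    return omega, n * sum(z)

omega, R = effective_resistance(4, [(0, 1), (1, 2), (2, 3)])  # 3-hop line
print(R, omega[0][3])
```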
For example, for the undirected version of the graph in Fig. 2.1, the effective resistance matrix $\Omega$, computed from (7.22), is
$$\Omega = \begin{bmatrix}
0 & \frac{31}{66} & \frac{37}{66} & \frac{32}{33} & \frac{49}{66} & \frac{6}{11} \\
\frac{31}{66} & 0 & \frac{17}{33} & \frac{5}{6} & \frac{17}{33} & \frac{31}{66} \\
\frac{37}{66} & \frac{17}{33} & 0 & \frac{15}{22} & \frac{8}{11} & \frac{49}{66} \\
\frac{32}{33} & \frac{5}{6} & \frac{15}{22} & 0 & \frac{15}{22} & \frac{32}{33} \\
\frac{49}{66} & \frac{17}{33} & \frac{8}{11} & \frac{15}{22} & 0 & \frac{37}{66} \\
\frac{6}{11} & \frac{31}{66} & \frac{49}{66} & \frac{32}{33} & \frac{37}{66} & 0
\end{bmatrix}$$
and the corresponding effective graph resistance (7.23) is $R_G = \frac{659}{66} \simeq 9.98485$. A much more interesting example is the effective resistance of the chain of cliques $G_D(n_1, n_2, \ldots, n_{D+1})$, defined in Section 5.11. By using Theorem 27, art. 87 and the explicit relations for the coefficients $c_2(D)$ and $c_1(D)$ of the characteristic polynomial $p_D$ in Van Mieghem and Wang (2009), the effective resistance of the chain of cliques $G_D(n_1, n_2, \ldots, n_{D+1})$ is
$$R_{G_D} = \sum_{q=2}^{D+1} \frac{\left(N - \sum_{k=1}^{q-1} n_k\right) \sum_{k=1}^{q-1} n_k}{n_{q-1}\, n_q} + N \sum_{k=1}^{D+1} \frac{n_k - 1}{n_{k-1} + n_k + n_{k+1}} \tag{7.24}$$
with $n_0 = n_{D+2} = 0$, and where the number of nodes $N$ and the number of links $L$ are given in (5.20) and (5.21), respectively. Theorem 23 shows that the minimum effective resistance in any graph with $N$ nodes and diameter $D$ is achieved in the class $G_D(n_1, n_2, \ldots, n_{D+1})$. Hence, minimizing (7.24) with respect to $n_1, n_2, \ldots, n_{D+1}$ subject to (5.20) yields the smallest possible effective resistance in any graph with $N$ nodes and diameter $D$. An extreme case is the $D$-hop line topology (see the end of Section 5.11), for which all $n_j = 1$, such that $N = D + 1$ and the effective graph resistance, computed via (7.24) and via (7.23) with (5.9), is
$$R_{D\text{-hop line}} = (D+1) \sum_{k=1}^{D} \frac{1}{2\left(1 - \cos\frac{k\pi}{D+1}\right)} = \frac{D(D+1)(D+2)}{6}$$
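Formula (7.24) and the line-topology result can be cross-checked numerically. The sketch below (hypothetical function names) evaluates (7.24) exactly with Python fractions, using the convention $n_0 = n_{D+2} = 0$, and evaluates (7.23) for the $D$-hop line with its Laplacian eigenvalues $2\left(1 - \cos\frac{k\pi}{D+1}\right)$; both agree with $D(D+1)(D+2)/6$. As a further check, $G_1(n_1, n_2)$ is the complete graph $K_{n_1+n_2}$, for which (7.24) reduces to $N - 1$.

```python
import math
from fractions import Fraction

def chain_of_cliques_resistance(sizes):
    """Effective graph resistance (7.24) of G_D(n_1, ..., n_{D+1}), with n_0 = n_{D+2} = 0."""
    N = sum(sizes)
    n = [0] + list(sizes) + [0]                 # n[k] = n_k for k = 0, ..., D+2
    total = Fraction(0)
    prefix = 0
    for q in range(2, len(sizes) + 1):          # q = 2, ..., D+1
        prefix += n[q - 1]                      # sum_{k=1}^{q-1} n_k
        total += Fraction((N - prefix) * prefix, n[q - 1] * n[q])
    for k in range(1, len(sizes) + 1):          # k = 1, ..., D+1
        total += Fraction(N * (n[k] - 1), n[k - 1] + n[k] + n[k + 1])
    return total

def line_resistance_spectral(D):
    """(7.23) for the D-hop line, with Laplacian eigenvalues 2(1 - cos(k pi / (D+1)))."""
    N = D + 1
    return N * sum(1 / (2 * (1 - math.cos(k * math.pi / N))) for k in range(1, N))

for D in (1, 2, 5, 10):
    print(D, chain_of_cliques_resistance([1] * (D + 1)), line_resistance_spectral(D), D * (D + 1) * (D + 2) / 6)
```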
We end this section by establishing a lower and an upper bound for the effective graph resistance $R_G$. Art. 87 demonstrates that
$$R_G = \frac{c_2(Q)}{\xi(G)} \ge \frac{(N-1)^2}{E[D]} \qquad (7.25)$$
where the complexity $\xi(G)$ of the graph $G$ equals the number of all possible spanning trees in the graph, where $c_2(Q)$ equals the number of all spanning trees with $N-2$ links in all subgraphs of $G$ that are obtained after deleting any pair of two nodes in $G$, and where $E[D]$ is the average degree. The lower bound in (7.25) for the effective graph resistance $R_G$ is attained by the complete graph $K_N$, for which $R_G = N - 1$. An upper bound follows from the physical fact that the effective resistance $\omega_{ab}$ is never larger than the sum of the resistances in the shortest path from $a$ to $b$, such that $\omega_{ab} \le H_{ab}$, where the distance matrix $H$ is defined in art. 3. This bound $\omega_{ab} \le H_{ab}$ in (7.20) yields
$$R_G \le \binom{N}{2} E[H]$$
where $E[H]$ is the average hopcount in the graph. Alternatively, this bound is equivalent to
$$\frac{2}{N-1}\sum_{k=1}^{N-1}\frac{1}{\mu_k} \le E[H] \qquad (7.26)$$
where equality is obtained for a tree, as shown in (4.16) of art. 87.
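The bounds (7.25) and (7.26) can be checked on small graphs. The sketch below again assumes $R_G = N\sum_{k=1}^{N-1}1/\mu_k$, computes the average degree $E[D] = 2L/N$ and obtains the average hopcount $E[H]$ by breadth-first search. The example graphs (the cycle $C_5$, the complete graph $K_4$ and the star $K_{1,3}$) are illustrative choices.

```python
from collections import deque

import numpy as np


def hopcount_matrix(adj):
    """All-pairs hopcounts H_ab via breadth-first search (unweighted, connected graph)."""
    n = adj.shape[0]
    H = np.full((n, n), np.inf)
    for source in range(n):
        H[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(H[source, v]):
                    H[source, v] = H[source, u] + 1
                    queue.append(v)
    return H


def resistance_with_bounds(adj):
    """Return ((N-1)^2 / E[D], R_G, binom(N,2) * E[H]) for a connected graph."""
    n = adj.shape[0]
    mu = np.linalg.eigvalsh(np.diag(adj.sum(axis=1)) - adj)
    R = n * np.sum(1.0 / mu[1:])
    mean_degree = adj.sum() / n                      # E[D] = 2L/N
    H = hopcount_matrix(adj)
    mean_hop = H[np.triu_indices(n, k=1)].mean()     # E[H] over all node pairs
    return (n - 1) ** 2 / mean_degree, R, n * (n - 1) / 2 * mean_hop


cycle5 = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
complete4 = np.ones((4, 4)) - np.eye(4)
star4 = np.zeros((4, 4))
star4[0, 1:] = 1
star4[1:, 0] = 1

for name, adj in [("C5", cycle5), ("K4", complete4), ("star K_{1,3}", star4)]:
    lower, R, upper = resistance_with_bounds(adj)
    print(f"{name}: {lower:.4f} <= {R:.4f} <= {upper:.4f}")
```

For $K_4$ the lower bound is attained, and for the star, which is a tree, the upper bound is attained, in line with the two equality cases mentioned in the text.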
Part II Eigensystem and polynomials
8 Eigensystem of a matrix
This chapter reviews general results from linear algebra. In-depth analyses are found in the classical books by Gantmacher (1959a,b), Wilkinson (1965) and Meyer (2000). We refer to Golub and Van Loan (1996) for matrix computational methods and algorithms. In this chapter, $A$ is a general matrix, not the adjacency matrix.
8.1 Eigenvalues and eigenvectors

138. The algebraic eigenproblem consists in the determination of the eigenvalues $\lambda$ and the corresponding eigenvectors $x$ of an $n \times n$ matrix $A$ for which the set of $n$ homogeneous linear equations in $n$ unknowns,
$$Ax = \lambda x \qquad (8.1)$$
has a non-zero solution. Clearly, the zero vector $x = 0$ is always a solution of (8.1). A non-zero solution of (8.1) is possible if and only if the matrix $A - \lambda I$ is singular, that is,
$$\det(A - \lambda I) = 0 \qquad (8.2)$$
This determinant can be expanded in a polynomial in $\lambda$ of degree $n$,
$$c_A(\lambda) = \sum_{k=0}^{n} c_k \lambda^k = c_n \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1 \lambda + c_0 = 0 \qquad (8.3)$$
which is called the characteristic or eigenvalue polynomial of the matrix $A$. Apart from $c_n = (-1)^n$, the coefficients for $0 \le k < n$ are
$$c_k = (-1)^k \sum_{\text{all}} M_{n-k} \qquad (8.4)$$
and $M_k$ is a principal minor¹. Since a polynomial of degree $n$ has $n$ complex zeros (art. 196), the matrix $A$ possesses $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, not all necessarily distinct. In general, the characteristic polynomial can be written as
$$c_A(\lambda) = \prod_{k=1}^{n} (\lambda_k - \lambda) \qquad (8.5)$$
Since $c_A(\lambda) = \det(A - \lambda I)$, it follows from (8.3) and (8.5) that, for $\lambda = 0$,
$$\det A = c_0 = \prod_{k=1}^{n} \lambda_k \qquad (8.6)$$
Hence, if $\det A = 0$, there is at least one zero eigenvalue. Also,
$$(-1)^{n-1} c_{n-1} = \sum_{k=1}^{n} \lambda_k = \operatorname{trace}(A) \qquad (8.7)$$
and
$$c_1 = -\sum_{k=1}^{n} \prod_{j=1; j \neq k}^{n} \lambda_j = -\det A \sum_{k=1}^{n} \frac{1}{\lambda_k} \qquad (8.8)$$
where the last equality requires that $A$ has no zero eigenvalue.

¹ A principal minor $M_k$ is the determinant of a principal $k \times k$ submatrix $M_{k \times k}$ obtained by deleting the same $n - k$ rows and columns in $A$. Hence, the main diagonal elements $(M_{k \times k})_{ii}$ are $k$ elements of the main diagonal elements $\{a_{ii}\}_{1 \le i \le n}$ of $A$.
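The relations (8.6) to (8.8) are easy to confirm numerically. In this sketch the coefficients $c_k$ are obtained from NumPy's `np.poly`, which returns the monic polynomial $\det(\lambda I - A)$ with the highest power first; since $c_A(\lambda) = \det(A - \lambda I) = (-1)^n \det(\lambda I - A)$, a sign factor and a reversal are needed. The non-symmetric test matrix is an arbitrary choice.

```python
import numpy as np


def char_poly_coefficients(A):
    """Return c_0, c_1, ..., c_n of c_A(lambda) = det(A - lambda I), lowest degree first."""
    n = A.shape[0]
    monic = np.poly(A)              # coefficients of det(lambda I - A), highest degree first
    return (-1) ** n * monic[::-1]  # c_A(lambda) = (-1)^n det(lambda I - A)


A = np.array([[4.0, 1.0, 2.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
n = A.shape[0]
c = char_poly_coefficients(A)
lam = np.linalg.eigvals(A)          # may be complex for a non-symmetric matrix

print(np.isclose(c[n], (-1) ** n))                                          # c_n = (-1)^n
print(np.isclose(c[0], np.linalg.det(A)), np.isclose(c[0], np.prod(lam)))   # (8.6)
print(np.isclose((-1) ** (n - 1) * c[n - 1], np.trace(A)))                  # (8.7)
print(np.isclose(c[1], -np.linalg.det(A) * np.sum(1.0 / lam)))              # (8.8)
```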
For any eigenvalue $\lambda$, the set (8.1) has at least one non-zero eigenvector $x$. Furthermore, if $x$ is a non-zero eigenvector, then $kx$ is also an eigenvector for any $k \neq 0$. Therefore, eigenvectors are often normalized; for instance, a probabilistic eigenvector has the sum of its components equal to 1, or a norm $\|x\|_1 = 1$ as defined in (8.39). If the rank of $A - \lambda I$ is less than $n - 1$, there will be more than one independent eigenvector. Precisely these cases seriously complicate the eigenvalue problem. In the sequel, we omit the discussion on multiple eigenvalues and refer to Wilkinson (1965).

139. General bounds on the position of eigenvalues.

Theorem 36 (Gerschgorin) Every eigenvalue of a matrix $A$ lies in at least one of the circular discs with center $a_{jj}$ and radii $R_j = \sum_{k=1; k \neq j}^{n} |a_{jk}|$ or $r_j = \sum_{k=1; k \neq j}^{n} |a_{kj}|$.

Proof: Suppose that the $r$-th component of the eigenvector $x$ of $A$ belonging to eigenvalue $\lambda$ has the largest modulus. An eigenvector can always be scaled and we normalize such that $x^T = (x_1, x_2, \ldots, x_{r-1}, 1, x_{r+1}, \ldots, x_n)$, where $|x_j| \le 1$ for all $j$. Equating the $r$-th component on both sides of the eigenvalue equation $Ax = \lambda x$ gives
$$\sum_{k=1}^{n} a_{rk} x_k = \lambda x_r = \lambda$$
Hence,
$$|\lambda - a_{rr}| \le \sum_{k=1; k \neq r}^{n} |a_{rk} x_k| \le \sum_{k=1; k \neq r}^{n} |a_{rk}|\,|x_k| \le \sum_{k=1; k \neq r}^{n} |a_{rk}|$$
which shows that $\lambda$ lies in a circular disc centered at $a_{rr}$ with a radius not larger than $\sum_{k=1; k \neq r}^{n} |a_{rk}|$. The other radius mentioned follows from the fact that $A$ and $A^T$ have the same eigenvalues, as shown in art. 140. □

140. The eigenproblem of the transpose $A^T$,
$$A^T y = \lambda y \qquad (8.9)$$
is of singular importance. Since the determinant of a matrix is equal to the determinant of its transpose, $\det(A^T - \lambda I) = \det(A - \lambda I)$, which shows that the eigenvalues of $A$ and $A^T$ are the same. However, the eigenvectors are, in general, different. Alternatively, transposing (8.9) yields
$$y^T A = \lambda y^T \qquad (8.10)$$
The vector $y_j^T$ is therefore called the left-eigenvector of $A$ belonging to the eigenvalue $\lambda_j$, whereas $x_j$ is called the right-eigenvector belonging to the same eigenvalue $\lambda_j$. An important relation between the left- and right-eigenvectors of a matrix $A$ is, for $\lambda_j \neq \lambda_k$,
$$y_j^T x_k = 0 \qquad (8.11)$$
Indeed, left-multiplying (8.1) with $\lambda = \lambda_k$ by $y_j^T$ gives $y_j^T A x_k = \lambda_k y_j^T x_k$ and, similarly, right-multiplying (8.10) with $\lambda = \lambda_j$ by $x_k$ gives $y_j^T A x_k = \lambda_j y_j^T x_k$. Subtraction leads to $0 = (\lambda_k - \lambda_j)\, y_j^T x_k$, and (8.11) follows. Since eigenvectors may be complex in general and since $y_j^T x_k = x_k^T y_j$, the expression $y_j^T x_k$ is not an inner product, for which $\overline{y^H x} = x^H y$ holds (art. 150). However, (8.11) expresses that the sets of left- and right-eigenvectors are orthogonal if $\lambda_j \neq \lambda_k$.

141. If $A$ has $n$ distinct eigenvalues, then the $n$ eigenvectors are linearly independent and span the whole $n$-dimensional space. The proof is by reductio ad absurdum. Assume that $s$ is the smallest number of linearly dependent eigenvectors, labelled by the first $s$ smallest indices. Linear dependence then means that
$$\sum_{k=1}^{s} \alpha_k x_k = 0 \qquad (8.12)$$
where $\alpha_k \neq 0$ for $1 \le k \le s$. Left-multiplying by $A$ and using (8.1) yields
$$\sum_{k=1}^{s} \alpha_k \lambda_k x_k = 0 \qquad (8.13)$$
On the other hand, multiplying (8.12) by $\lambda_s$ and subtracting from (8.13) leads to
$$\sum_{k=1}^{s-1} \alpha_k (\lambda_k - \lambda_s) x_k = 0$$
which, because all eigenvalues are distinct (so that all coefficients $\alpha_k(\lambda_k - \lambda_s)$ are non-zero), implies that there is a smaller set of $s - 1$ linearly dependent eigenvectors. This contradicts the initial hypothesis.

This important property has a number of consequences. First, it applies to left- as well as to right-eigenvectors. Relation (8.11) then shows that the sets of left- and right-eigenvectors form a bi-orthogonal system with $y_k^T x_k \neq 0$. For, if $x_k$ were orthogonal to $y_k$ (or $y_k^T x_k = 0$), (8.11) demonstrates that $x_k$ would be orthogonal to all left-eigenvectors $y_j$. Since the set of left-eigenvectors spans the $n$-dimensional vector space, it would mean that the $n$-dimensional vector $x_k$ would be orthogonal to the whole $n$-space, which is impossible because $x_k$ is not the null vector. Second, any $n$-dimensional vector can be written in terms of either the left- or the right-eigenvectors.

142. Let us denote by $X$ the matrix with the right-eigenvector $x_j$ in column $j$ and by $Y^T$ the matrix with the left-eigenvector $y_k^T$ in row $k$. If the right- and left-eigenvectors are scaled such that, for all $1 \le k \le n$, $y_k^T x_k = 1$, then
$$Y^T X = I \qquad (8.14)$$
or the matrix $Y^T$ is the inverse of the matrix $X$. Furthermore, for any right-eigenvector, (8.1) holds, which, rewritten in matrix form, gives
$$AX = X\,\mathrm{diag}(\lambda_k) \qquad (8.15)$$
Left-multiplying by $X^{-1} = Y^T$ yields the similarity transform of matrix $A$,
$$X^{-1} A X = Y^T A X = \mathrm{diag}(\lambda_k) \qquad (8.16)$$
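A small numerical illustration of (8.11), (8.14) and (8.16): the left-eigenvectors are computed independently from $A^T$ with NumPy's `np.linalg.eig`, matched to the right-eigenvectors by eigenvalue, and the plain transpose $y_j^T x_k$ (no complex conjugation) is formed, as in (8.11). The non-symmetric test matrix, with one real and two complex conjugate eigenvalues, all distinct, is an arbitrary choice.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 5.0]])

lam, X = np.linalg.eig(A)        # columns of X: right-eigenvectors x_k
mu, Yc = np.linalg.eig(A.T)      # columns of Yc: left-eigenvectors y_j (A^T y = mu y)

# pair each left-eigenvector with the right-eigenvector of the same (distinct) eigenvalue
order = [int(np.argmin(np.abs(mu - l))) for l in lam]
Yc = Yc[:, order]

G = Yc.T @ X                      # G[j, k] = y_j^T x_k, plain transpose as in (8.11)
print("off-diagonal y_j^T x_k vanish:", np.allclose(G - np.diag(np.diag(G)), 0))

# rescale so that y_k^T x_k = 1; then Y^T X = I (8.14) and Y^T A X = diag(lambda) (8.16)
YT = Yc.T / np.diag(G)[:, None]
print("Y^T X = I:", np.allclose(YT @ X, np.eye(3)))
print("Y^T A X = diag(lambda):", np.allclose(YT @ A @ X, np.diag(lam)))
```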
Thus, when the eigenvalues of $A$ are distinct, there exists a similarity transform $H^{-1} A H$ that reduces $A$ to diagonal form. In many applications, similarity transforms are applied to simplify matrix problems. Observe that a similarity transform preserves the eigenvalues, because, if $Ax = \lambda x$, then $\lambda H^{-1} x = H^{-1} A x = (H^{-1} A H) H^{-1} x$. The eigenvectors are transformed to $H^{-1} x$.

When $A$ has multiple eigenvalues, it may be impossible to reduce $A$ to a diagonal form by similarity transforms. Instead of a diagonal form, the most compact form when $A$ has $r$ distinct eigenvalues, each with multiplicity $m_j$ such that $\sum_{j=1}^{r} m_j = n$, is the Jordan canonical form $C$,
$$C = \begin{bmatrix} C_{m_1 - a}(\lambda_1) & & & & \\ & C_a(\lambda_1) & & & \\ & & \ddots & & \\ & & & C_{m_{r-1}}(\lambda_{r-1}) & \\ & & & & C_{m_r}(\lambda_r) \end{bmatrix}$$
where $C_m(\lambda)$ is an $m \times m$ submatrix of the form
$$C_m(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}$$
The number of independent eigenvectors is equal to the number of submatrices. If an eigenvalue $\lambda$ has multiplicity $m$, there can be one large submatrix $C_m(\lambda)$, but also a number $k$ of smaller submatrices $C_{b_j}(\lambda)$ such that $\sum_{j=1}^{k} b_j = m$. This illustrates, as mentioned in art. 138, the much higher complexity of the eigenproblem in case of multiple eigenvalues. For more details we refer to Wilkinson (1965).

143. The companion matrix of a polynomial $p_n(z) = \sum_{k=0}^{n} a_k z^k$ is defined as
$$C = \begin{bmatrix} -\frac{a_{n-1}}{a_n} & -\frac{a_{n-2}}{a_n} & \cdots & -\frac{a_1}{a_n} & -\frac{a_0}{a_n} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
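although other variants also appear in the literature, such as
$$C = \begin{bmatrix} 0 & 0 & \cdots & 0 & -\frac{a_0}{a_n} \\ 1 & 0 & \cdots & 0 & -\frac{a_1}{a_n} \\ 0 & 1 & \cdots & 0 & -\frac{a_2}{a_n} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -\frac{a_{n-1}}{a_n} \end{bmatrix}$$
The basic property of the companion matrix $C$ is
$$\det(C - \lambda I) = (-1)^n \frac{p_n(\lambda)}{a_n} \qquad (8.17)$$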
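Indeed, in
$$\det(C - \lambda I) = \begin{vmatrix} -\frac{a_{n-1}}{a_n} - \lambda & -\frac{a_{n-2}}{a_n} & \cdots & -\frac{a_1}{a_n} & -\frac{a_0}{a_n} \\ 1 & -\lambda & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & -\lambda & 0 \\ 0 & \cdots & 0 & 1 & -\lambda \end{vmatrix}$$
multiply the first column by $\lambda^{n-1}$, the second column by $\lambda^{n-2}$, and so on, and add them to the last column. The resulting last column elements are zero, except for that in the first row, which is $-\frac{p_n(\lambda)}{a_n}$. The corresponding minor, obtained by deleting the first row and the last column, is an upper triangular determinant with ones on its diagonal and thus equals one, so that expanding along the last column gives $(-1)^{1+n}\left(-\frac{p_n(\lambda)}{a_n}\right)$, which proves (8.17). From (9.2), art. 144 and (8.17), it follows that the inverse of the companion matrix $C$ is
$$C^{-1} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\frac{a_n}{a_0} & -\frac{a_{n-1}}{a_0} & -\frac{a_{n-2}}{a_0} & \cdots & -\frac{a_1}{a_0} \end{bmatrix}$$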
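The sketch below builds the companion matrix of art. 143 from the coefficients $a_0, \ldots, a_n$ (stored lowest degree first, a convention assumed here), checks the determinant property (8.17) at a sample point, and verifies numerically an explicit form of $C^{-1}$ obtained by solving $Cv = w$ for $v$: ones on the superdiagonal and last row $-a_n/a_0, \ldots, -a_1/a_0$, which requires $a_0 \neq 0$. The test polynomial $(z-1)(z-2)(z-3)$ is illustrative.

```python
import numpy as np


def companion(a):
    """Companion matrix of p(z) = sum_k a[k] z^k (first-row form), with a[n] != 0."""
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    C = np.zeros((n, n))
    C[0, :] = -a[n - 1::-1] / a[n]       # -a_{n-1}/a_n, ..., -a_0/a_n
    C[1:, :-1] = np.eye(n - 1)           # ones on the subdiagonal
    return C


def companion_inverse(a):
    """Explicit inverse of companion(a), valid when a[0] != 0."""
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    Cinv = np.zeros((n, n))
    Cinv[:-1, 1:] = np.eye(n - 1)        # ones on the superdiagonal
    Cinv[-1, :] = -a[n:0:-1] / a[0]      # -a_n/a_0, ..., -a_1/a_0
    return Cinv


a = [-6.0, 11.0, -6.0, 1.0]              # p(z) = z^3 - 6 z^2 + 11 z - 6 = (z-1)(z-2)(z-3)
C = companion(a)
n = len(a) - 1
lam = 0.7                                 # arbitrary test point
lhs = np.linalg.det(C - lam * np.eye(n))
rhs = (-1) ** n * np.polyval(a[::-1], lam) / a[n]    # right-hand side of (8.17)
print(np.isclose(lhs, rhs))
print(np.sort(np.linalg.eigvals(C).real))            # the zeros of p(z)
print(np.allclose(C @ companion_inverse(a), np.eye(n)))
```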
The companion matrix of the characteristic polynomial (8.3) of $A$ is defined as
$$C = \begin{bmatrix} (-1)^{n-1} c_{n-1} & (-1)^{n-1} c_{n-2} & \cdots & (-1)^{n-1} c_1 & (-1)^{n-1} c_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
such that $\det(C - \lambda I) = c_A(\lambda)$. If $A$ has distinct eigenvalues, $A$ as well as $C$ are similar to $\mathrm{diag}(\lambda_i)$. It has been shown in art. 142 that the similarity transform $H$ for $A$ equals $H = X$. The similarity transform for $C$ is the Vandermonde matrix $V_n(\lambda)$, where
$$V_n(x) = \begin{bmatrix} x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} & x_n^{n-1} \\ x_1^{n-2} & x_2^{n-2} & \cdots & x_{n-1}^{n-2} & x_n^{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_1 & x_2 & \cdots & x_{n-1} & x_n \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$$
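Furthermore,
$$V_n(\lambda)\,\mathrm{diag}(\lambda_i) = \begin{bmatrix} \lambda_1^n & \lambda_2^n & \cdots & \lambda_n^n \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_n^2 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \end{bmatrix}$$
while
$$C V_n(\lambda) = \begin{bmatrix} (-1)^{n-1} c_A(\lambda_1) + \lambda_1^n & (-1)^{n-1} c_A(\lambda_2) + \lambda_2^n & \cdots & (-1)^{n-1} c_A(\lambda_n) + \lambda_n^n \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_n^2 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \end{bmatrix}$$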
Since $c_A(\lambda_j) = 0$, it follows that $C V_n(\lambda) = V_n(\lambda)\,\mathrm{diag}(\lambda_i)$, which demonstrates the claim. Hence, the eigenvector $x_k$ of $C$ belonging to eigenvalue $\lambda_k$ is
$$x_k^T = \begin{bmatrix} \lambda_k^{n-1} & \lambda_k^{n-2} & \cdots & \lambda_k & 1 \end{bmatrix}$$
The Vandermonde matrix $V_n(\lambda)$ is clearly non-singular if all eigenvalues are distinct (see also art. 194). In the case that $\det V_n(\lambda) \neq 0$, the matrix $V_n(\lambda)$ is of rank $n$, implying that all eigenvectors are linearly independent. The eigenvectors are only orthogonal if $x_k^T x_m = 0$ for each pair $(k, m)$ with $k \neq m$. In other words, if
$$0 = \sum_{j=1}^{n} (\lambda_k)^{n-j} (\lambda_m)^{n-j} = \frac{(\lambda_k \lambda_m)^n - 1}{\lambda_k \lambda_m - 1}$$
The solution is $\lambda_k \lambda_m = e^{\frac{2\pi i l}{n}}$ for $l = 1, 2, \ldots, n-1$, which implies that each of the $n$ eigenvalues $\{\lambda_k\}_{1 \le k \le n}$ must be a distinct $n$-th root of unity and that the polynomial associated to the companion matrix is $p_n(z) = a_n(z^n \pm 1)$. The first component or row in the eigenvalue equation $Cx = \lambda x$ expresses explicitly the root equation $c_A(\lambda) = 0$ of the polynomial,
$$(C x_k)_1 - (\lambda_k x_k)_1 = (-1)^{n-1} c_A(\lambda_k) = 0 \qquad (8.18)$$
and any other row is an identity. If $C$ has an eigenvalue $\lambda_k$ of multiplicity $m$, then $\lambda_k$ satisfies $c_A(\lambda_k) = c_A'(\lambda_k) = \cdots = c_A^{(m-1)}(\lambda_k) = 0$. The first equality is equivalent to (8.18). The others are similarly derived by differentiating (8.18) with respect to $\lambda_k$ such that, for $1 \le j \le m - 1$,
$$\left(C x_k^{(j)}\right)_1 - \left(j\, x_k^{(j-1)} + \lambda_k x_k^{(j)}\right)_1 = (-1)^{n-1} c_A^{(j)}(\lambda_k) = 0$$
where $x_k^{(j)}$ denotes the $j$-th derivative of $x_k$ with respect to $\lambda_k$. Hence, if $\lambda_k$ is a zero with multiplicity $m$, then $C x_k = \lambda_k x_k$, where $x_k$ is the eigenvector, and the other $2 \le j \le m$ equations are
$$C y_j = \lambda_k y_j + y_{j-1}$$
where $y_j = \frac{x_k^{(j-1)}}{(j-1)!}$ is a generalized eigenvector,
$$y_j^T = \begin{bmatrix} \binom{n-1}{j-1}\lambda_k^{n-j} & \cdots & \binom{j}{j-1}\lambda_k & 1 & 0 & \cdots & 0 \end{bmatrix}$$
which has a 1 in the $(n - j + 1)$-th position. Clearly, with this notation, $x_k = y_1$. Moreover, the eigenvector and the $m - 1$ generalized eigenvectors are linearly independent because the $m \times n$ matrix formed by their components has rank $m$.

144. Left-multiplying (8.1) by $A$, we obtain $A^2 x = \lambda A x = \lambda^2 x$ and, in general, for any integer $k \ge 0$,
$$A^k x = \lambda^k x \qquad (8.19)$$
Moreover, if $A$ has no zero eigenvalue, i.e., $A^{-1}$ exists, then left-multiplying (8.1) with $A^{-1}$ yields $A^{-1} x = \lambda^{-1} x$. We apply (8.19) to the matrix $A^{-1}$ and conclude that $A^{-k} x = \lambda^{-k} x$. In other words, if the inverse matrix $A^{-1}$ exists, then equation (8.19) is valid for any integer, positive as well as negative. Combining (8.19) and (8.7) implies that
$$\operatorname{trace}\left(A^k\right) = \sum_{j=1}^{n} \lambda_j^k(A) \qquad (8.20)$$
145. The Cayley–Hamilton Theorem. Since any eigenvalue $\lambda$ satisfies its characteristic polynomial, $c_A(\lambda) = 0$, we find from (8.19) that $c_A(A) x = c_A(\lambda) x = 0$ for every eigenvector $x$. When the eigenvectors of $A$ span the $n$-dimensional space (for instance, when all eigenvalues are distinct, art. 141), this implies that the matrix $A$ satisfies its own characteristic equation,
$$c_A(A) = O \qquad (8.21)$$
This result is the Cayley–Hamilton Theorem, which holds for any square matrix; there exist several other proofs of the Cayley–Hamilton Theorem that do not require the eigenvectors to span the space.

Let $m_{c_A}(z) = \sum_{k=0}^{l} b_k z^k$ denote the minimal polynomial, defined in art. 211, of the characteristic polynomial $c_A(z)$ of a matrix $A$; the degree of the minimal polynomial obeys $l \le n$, where $l$ is the number of different eigenvalues of $A$. Any eigenvalue $\lambda$ of the matrix $A$ satisfies $m_{c_A}(\lambda) = 0$. Analogously as for the Cayley–Hamilton Theorem, we also have from (8.19) that $m_{c_A}(A) = O$. Clearly, if $F(z)$ is any polynomial satisfying $F(A) = O$, then $m_{c_A}(z) \mid F(z)$ as well as $m_{c_A}(z) \mid c_A(z)$.
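The Cayley–Hamilton Theorem (8.21) and the trace relation (8.20) can be checked numerically. The sketch evaluates $c_A(A)$ with Horner's scheme; the coefficients again come from `np.poly`, with the sign convention $c_A(\lambda) = (-1)^n \det(\lambda I - A)$. The Jordan block is an assumed test case whose eigenvectors do not span the space, a case that the eigenvector argument above does not cover.

```python
import numpy as np


def char_poly_at_matrix(A):
    """Evaluate c_A(A) = sum_k c_k A^k by Horner's scheme, c_A(lambda) = det(A - lambda I)."""
    n = A.shape[0]
    c = (-1) ** n * np.poly(A)          # c_n, c_{n-1}, ..., c_0 (highest degree first)
    result = np.zeros_like(A, dtype=float)
    for ck in c:
        result = result @ A + ck * np.eye(n)
    return result


general = np.array([[1.0, 2.0, 0.0],
                    [0.0, 1.0, 3.0],
                    [4.0, 0.0, 1.0]])
jordan = np.array([[2.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0],
                   [0.0, 0.0, 2.0]])     # defective: only one independent eigenvector

for M in (general, jordan):
    print(np.allclose(char_poly_at_matrix(M), 0.0, atol=1e-8))        # (8.21)
    lam = np.linalg.eigvals(M)
    print(all(np.isclose(np.trace(np.linalg.matrix_power(M, k)), np.sum(lam ** k))
              for k in range(1, 6)))                                  # (8.20)
```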
146. Power method. Let $x_1, x_2, \ldots, x_n$ denote the eigenvectors of $A$ belonging to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Art. 141 demonstrates that this set of linearly independent vectors spans the $n$-dimensional space such that any other vector $w$ can be written as a linear combination,
$$w = \sum_{j=1}^{n} \alpha_j x_j$$
Then, for any integer $k$, using (8.19) yields
$$A^k w = \sum_{j=1}^{n} \alpha_j A^k x_j = \sum_{j=1}^{n} \alpha_j \lambda_j^k x_j$$
If the largest eigenvalue obeys $|\lambda_1| > |\lambda_2|$ and $|\lambda_2| \ge |\lambda_j|$ for any $3 \le j \le n$, then, for large $k$ and provided $\alpha_1 \neq 0$, we observe that
$$A^k w = \alpha_1 \lambda_1^k \left( x_1 + O\!\left( \left( \frac{|\lambda_2|}{|\lambda_1|} \right)^k \right) \right)$$
This shows that, after subsequent multiplications with $A$, an arbitrary vector $w$ with a non-zero component along $x_1$ aligns increasingly more towards the largest eigenvector $x_1$, i.e., the eigenvector belonging to the largest eigenvalue. This so-called power method lies at the basis of the computation of the largest eigenvector, especially in large and sparse matrices. In particular, the sequence $Aw, A^2 w, A^4 w, \ldots, A^{2^m} w$ tends exponentially fast to a vector proportional to the largest eigenvector $x_1$ of $A$ (under the very mild condition that $|\lambda_1| > |\lambda_2|$).
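A minimal sketch of the power method, assuming a matrix whose eigenvalue of largest modulus is simple and a start vector with a non-zero component along $x_1$. Each iterate is renormalized, a common practical choice (not discussed above) that only rescales $A^k w$ and avoids overflow; the Rayleigh quotient of the final unit vector serves as the estimate of $\lambda_1$. The stopping tolerance and the test matrix are illustrative.

```python
import numpy as np


def power_method(A, w0, num_iter=500, tol=1e-12):
    """Approximate the largest eigenvalue and its eigenvector by repeated multiplication with A."""
    w = np.asarray(w0, dtype=float)
    w = w / np.linalg.norm(w)
    for _ in range(num_iter):
        w_next = A @ w
        w_next = w_next / np.linalg.norm(w_next)
        # stop when the direction no longer changes (up to a sign)
        if min(np.linalg.norm(w_next - w), np.linalg.norm(w_next + w)) < tol:
            w = w_next
            break
        w = w_next
    return w @ A @ w, w      # Rayleigh quotient of the unit vector w, and the vector itself


A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])   # eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3)
lam1, x1 = power_method(A, np.ones(3))
print(lam1, np.max(np.linalg.eigvalsh(A)))
```

Here $|\lambda_2|/|\lambda_1| = 3/(3+\sqrt{3}) \approx 0.63$, so the error shrinks by roughly that factor per iteration, as predicted by the $O\big((|\lambda_2|/|\lambda_1|)^k\big)$ term.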
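8.2 Functions of a matrix

147. Consider an arbitrary matrix polynomial in $\lambda$,
$$F(\lambda) = \sum_{k=0}^{m} F_k \lambda^k$$
where all $F_k$ are $n \times n$ matrices and $F_m \neq O$. Any matrix polynomial $F(\lambda)$ can be right and left divided by another matrix polynomial $B(\lambda)$ with a non-singular leading coefficient in a unique way, as proved in Gantmacher (1959a, Chapter IV). Hence, the left-quotient and left-remainder, $F(\lambda) = B(\lambda) Q_L(\lambda) + L(\lambda)$, and the right-quotient and right-remainder, $F(\lambda) = Q_R(\lambda) B(\lambda) + R(\lambda)$, are unique. Let us concentrate on the right-remainder in the case where $B(\lambda) = \lambda I - A$ is a linear polynomial in $\lambda$. Using Euclid's division scheme for polynomials (art. 208),
$$\begin{aligned} F(\lambda) &= F_m \lambda^{m-1} (\lambda I - A) + (F_m A + F_{m-1}) \lambda^{m-1} + \sum_{k=0}^{m-2} F_k \lambda^k \\ &= \left[ F_m \lambda^{m-1} + (F_m A + F_{m-1}) \lambda^{m-2} \right] (\lambda I - A) + \left( F_m A^2 + F_{m-1} A + F_{m-2} \right) \lambda^{m-2} + \sum_{k=0}^{m-3} F_k \lambda^k \end{aligned}$$
and, continuing, we arrive at
$$F(\lambda) = \left[ F_m \lambda^{m-1} + \cdots + \lambda^{k-1} \sum_{j=k}^{m} F_j A^{j-k} + \cdots + \sum_{j=1}^{m} F_j A^{j-1} \right] (\lambda I - A) + \sum_{j=0}^{m} F_j A^j$$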
In summary, $F(\lambda) = Q_R(\lambda)(\lambda I - A) + R(\lambda)$ (and similarly for the left-quotient and left-remainder) with
$$\begin{aligned} Q_R(\lambda) &= \sum_{k=1}^{m} \lambda^{k-1} \left( \sum_{j=k}^{m} F_j A^{j-k} \right), & R(\lambda) &= \sum_{j=0}^{m} F_j A^j = F(A) \\ Q_L(\lambda) &= \sum_{k=1}^{m} \lambda^{k-1} \left( \sum_{j=k}^{m} A^{j-k} F_j \right), & L(\lambda) &= \sum_{j=0}^{m} A^j F_j \end{aligned} \qquad (8.22)$$
where the right-remainder is independent of $\lambda$. The Generalized Bézout Theorem states that the polynomial $F(\lambda)$ is divisible by $(\lambda I - A)$ on the right (left) if and only if $F(A) = O$ ($L(\lambda) = O$).

148. The adjoint matrix. By the Generalized Bézout Theorem, the polynomial $F(\lambda) = g(\lambda) I - g(A)$ is divisible by $(\lambda I - A)$ because $F(A) = g(A) - g(A) = O$. If $F(\lambda)$ is an ordinary polynomial, the right- and left-quotient and remainder are equal. The Cayley–Hamilton Theorem (8.21) states that $c_A(A) = O$, which indicates that $c_A(\lambda) I = Q(\lambda)(\lambda I - A)$ and also $c_A(\lambda) I = (\lambda I - A) Q(\lambda)$. The matrix $Q(\lambda) = (\lambda I - A)^{-1} c_A(\lambda)$ is called the adjoint matrix of $A$. Explicitly, from (8.22),
$$Q(\lambda) = \sum_{k=1}^{n} \lambda^{k-1} \left( \sum_{j=k}^{n} c_j A^{j-k} \right)$$
and, with (8.6),
$$Q(0) = (-A)^{-1} \det A = \sum_{j=1}^{n} c_j A^{j-1}$$
The main theoretical interest of the adjoint matrix stems from its definition,
$$c_A(\lambda) I = Q(\lambda)(\lambda I - A) = (\lambda I - A) Q(\lambda)$$
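In case $\lambda = \lambda_k$ is an eigenvalue of $A$, then $(\lambda_k I - A) Q(\lambda_k) = O$, which indicates by (8.1) and the commutative property $(\lambda I - A) Q(\lambda) = Q(\lambda)(\lambda I - A)$ that every non-zero column (row) of the adjoint matrix $Q(\lambda_k)$ is a right- (left-) eigenvector belonging to the eigenvalue $\lambda_k$. In addition, by differentiation with respect to $\lambda$, we obtain
$$c_A'(\lambda) I = (\lambda I - A) Q'(\lambda) + Q(\lambda)$$
If $Q(\lambda_k) = O$, then $c_A'(\lambda_k) I = (\lambda_k I - A) Q'(\lambda_k)$ is singular, so that $c_A'(\lambda_k) = 0$ and the eigenvalue $\lambda_k$ has higher multiplicity. Conversely, if $\lambda_k$ is a simple root of $c_A(\lambda)$, then $Q(\lambda_k) \neq O$. A multiple eigenvalue may nevertheless have $Q(\lambda_k) \neq O$, as for $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $\lambda_k = 0$.

The adjoint matrix $Q(\lambda) = (\lambda I - A)^{-1} c_A(\lambda)$ is computed by observing that, by the Generalized Bézout Theorem, the polynomial $c_A(\mu) - c_A(\nu)$ is divisible by $\mu - \nu$ without remainder. By replacing $\mu$ and $\nu$ in the quotient by $\lambda I$ and $A$, respectively, $Q(\lambda)$ readily follows.

149. Consider the arbitrary polynomial of degree $l$,
$$g(x) = g_0 \prod_{j=1}^{l} (x - \mu_j)$$
Substitute $x$ by $A$, then
$$g(A) = g_0 \prod_{j=1}^{l} (A - \mu_j I)$$
Since $\det(AB) = \det A \det B$ and $\det(kA) = k^n \det A$, we have
$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \det(A - \mu_j I) = g_0^n \prod_{j=1}^{l} c_A(\mu_j)$$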
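With (8.5),
$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \prod_{k=1}^{n} (\lambda_k - \mu_j) = \prod_{k=1}^{n} \left( g_0 \prod_{j=1}^{l} (\lambda_k - \mu_j) \right) = \prod_{k=1}^{n} g(\lambda_k)$$
Applying this result to $h(x) = g(x) - \lambda$, we arrive at the general result: for any polynomial $g(x)$, the eigenvalues of $g(A)$ are $g(\lambda_1), \ldots, g(\lambda_n)$ and the characteristic polynomial is
$$\det(g(A) - \lambda I) = \prod_{k=1}^{n} \left( g(\lambda_k) - \lambda \right) \qquad (8.23)$$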
which is a polynomial in $\lambda$ of degree $n$. Since the result holds for an arbitrary polynomial, it should not surprise that, under appropriate conditions of convergence, it can be extended to infinite polynomials, in particular to the Taylor series of a complex function. As proved in Gantmacher (1959a, Chapter V), if the power series of a function $f(z)$ around $z = z_0$,
$$f(z) = \sum_{j=0}^{\infty} f_j(z_0) (z - z_0)^j \qquad (8.24)$$
converges for all $z$ in the disc $|z - z_0| < R$, then $f(A) = \sum_{j=0}^{\infty} f_j(z_0) (A - z_0 I)^j$, provided that all eigenvalues $\lambda$ of $A$ lie within the region of convergence of (8.24), i.e., $|\lambda - z_0| < R$. For example,
$$e^{Az} = \sum_{k=0}^{\infty} \frac{z^k A^k}{k!} \quad \text{for all } A$$
$$\log A = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} (A - I)^k \quad \text{for } |\lambda_j - 1| < 1, \text{ all } 1 \le j \le n$$
and, from (8.23), the eigenvalues of $e^{Az}$ are $e^{z\lambda_1}, \ldots, e^{z\lambda_n}$. Hence, the knowledge of the eigenstructure of a matrix $A$ allows us to compute any function of $A$ (under the same convergence restrictions as for complex numbers $z$).
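The eigenvalue statement (8.23) and the series definition of a matrix function can be compared numerically. The sketch below, for an assumed diagonalizable test matrix, evaluates $e^{Az}$ both from the truncated power series and from the eigen-decomposition $A = X\,\mathrm{diag}(\lambda_k)\,X^{-1}$ of art. 142, and checks that the eigenvalues of $g(A) = A^2 + I$ are $g(\lambda_k)$. The truncation length of 40 terms is an illustrative choice that is only adequate for small $\|Az\|$.

```python
import numpy as np


def expm_series(A, z=1.0, num_terms=40):
    """e^{Az} from the truncated power series sum_k z^k A^k / k!."""
    n = A.shape[0]
    result = np.zeros((n, n))
    term = np.eye(n)                          # (Az)^0 / 0!
    for k in range(num_terms):
        result = result + term
        term = term @ A * (z / (k + 1))       # (Az)^{k+1} / (k+1)!
    return result


def function_via_eigen(A, f):
    """f(A) = X diag(f(lambda_k)) X^{-1}, valid for a diagonalizable A."""
    lam, X = np.linalg.eig(A)
    return X @ np.diag(f(lam)) @ np.linalg.inv(X)


A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])                  # eigenvalues -1 and -2
z = 0.5
E_series = expm_series(A, z)
E_eigen = function_via_eigen(A, lambda lam: np.exp(z * lam))
print(np.allclose(E_series, E_eigen))

gA = A @ A + np.eye(2)                        # g(x) = x^2 + 1
lam = np.linalg.eigvals(A)
print(np.sort(np.linalg.eigvals(gA)), np.sort(lam ** 2 + 1))   # both (2, 5), cf. (8.23)
```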
8.3 Hermitian and real symmetric matrices

150. A Hermitian matrix. A Hermitian matrix $A$ is a complex matrix that obeys $A^H = \left(\bar{A}\right)^T = A$, where $\bar{a}_{ij}$ is the complex conjugate of $a_{ij}$. The superscript $H$, in honor of Charles Hermite, means to take the complex conjugate and then the transpose. Hermitian matrices possess a number of attractive properties. A particularly interesting subclass of Hermitian matrices are real, symmetric matrices that obey $A^T = A$. The inner product of vectors $y$ and $x$ is defined as $y^H x$ and obeys $\overline{y^H x} = \left(y^H x\right)^H = x^H y$. The inner product $x^H x = \sum_{j=1}^{n} |x_j|^2$ is real and positive for all vectors except for the null vector.

151. The eigenvalues of a Hermitian matrix are all real. Indeed, left-multiplying (8.1) by $x^H$ yields
$$x^H A x = \lambda x^H x$$
and, since $\left(x^H A x\right)^H = x^H A^H x = x^H A x$, it follows that $\lambda x^H x = \bar{\lambda} x^H x$, or $\lambda = \bar{\lambda}$, because $x^H x$ is a positive real number. Furthermore, since $A = A^H$, we have
$$A^H x = \lambda x$$
Taking the complex conjugate yields
$$A^T \bar{x} = \lambda \bar{x}$$
In general, the eigenvectors of a Hermitian matrix are complex, but they can be chosen real for a real symmetric matrix since $A^H = A^T$. Moreover, the left-eigenvector $y^T$ equals $x^H$, i.e., $y = \bar{x}$ is the complex conjugate of the right-eigenvector $x$. Hence, the orthogonality relation (8.11) reduces, after normalization, to the inner product
$$x_k^H x_j = \delta_{kj} \qquad (8.25)$$
where $\delta_{kj}$ is the Kronecker delta, which is zero if $k \neq j$ and else $\delta_{kk} = 1$. Consequently, (8.14) reduces to
$$X^H X = I \qquad (8.26)$$
which implies that the matrix $X$ formed by the eigenvectors is a unitary matrix ($X^{-1} = X^H$). For a real symmetric matrix $A$, the corresponding relation $X^T X = I$ implies that $X$ is an orthogonal matrix ($X^{-1} = X^T$) obeying $X^T X = X X^T = I$, where the first equality follows from the commutativity of a matrix with its inverse, $X^{-1} X = X X^{-1}$. Hence, all eigenvectors of a symmetric matrix are orthogonal. Although the arguments so far (see Section 8.1) have assumed that the eigenvalues of $A$ are distinct, the theorem applies in general, as proved in Wilkinson (1965, Section 47): for any Hermitian matrix $A$, there exists a unitary matrix $U$ such that, for real $\lambda_j$,
$$U^H A U = \mathrm{diag}(\lambda_j)$$
and, for any real symmetric matrix $A$, there exists an orthogonal matrix $U$ such that, for real $\lambda_j$,
$$U^T A U = \mathrm{diag}(\lambda_j)$$

152. The Rayleigh inequalities. The normalized eigenvectors $x_k$ and $x_m$ of a real symmetric² matrix $A$ obey $x_k^T A x_m = 0$ if $k \neq m$ and $x_k^T A x_k = \lambda_k$ (art. 151). These $n$ eigenvectors span the $n$-dimensional space. Let $w$ be an $n \times 1$ vector that can be written as a linear combination of the first $j$ eigenvectors of $A$,
$$w = c_1 x_1 + c_2 x_2 + \cdots + c_j x_j$$
where all $c_k \in \mathbb{R}$. Equivalently, $w \in \mathcal{X}_j$, where $\mathcal{X}_j$ is the space spanned by the vectors $\{x_1, x_2, \ldots, x_j\}$. Then $w^T w = \sum_{k=1}^{j} \sum_{m=1}^{j} c_k c_m x_k^T x_m = \sum_{k=1}^{j} c_k^2$ and
$$w^T A w = \sum_{k=1}^{j} \sum_{m=1}^{j} c_k c_m x_k^T A x_m = \sum_{k=1}^{j} c_k^2 \lambda_k$$
Since $A$ has real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, this ordering of the eigenvalues leads to the bound $\lambda_j \sum_{k=1}^{j} c_k^2 \le \sum_{k=1}^{j} c_k^2 \lambda_k \le \lambda_1 \sum_{k=1}^{j} c_k^2$, from which the Rayleigh inequalities for $w \in \mathcal{X}_j$ follow as
$$\lambda_j \le \frac{w^T A w}{w^T w} \le \lambda_1 \qquad (8.27)$$

² The extension to a Hermitian matrix is straightforward and omitted.

Equality in the upper (lower) bound of (8.27) is only attained provided $w \in \mathcal{X}_j$ is an eigenvector of $A$ belonging to the eigenvalue $\lambda_1$ ($\lambda_j$). If $w$ is a vector that is orthogonal to the first $j$ eigenvectors of $A$, which means that $w = c_{j+1} x_{j+1} + c_{j+2} x_{j+2} + \cdots + c_n x_n$ can be written as a linear combination of the last $n - j$ eigenvectors, or that $w \in \mathcal{X}_j^{\perp}$, then $\lambda_n \le \frac{w^T A w}{w^T w} \le \lambda_{j+1}$. The two extreme eigenvalues can thus be written as
$$\lambda_1 = \sup_{y \neq 0} \frac{y^T A y}{y^T y} \qquad (8.28)$$
$$\lambda_n = \inf_{y \neq 0} \frac{y^T A y}{y^T y} \qquad (8.29)$$
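The Rayleigh inequalities (8.27) to (8.29) lend themselves to a randomized check. In the sketch, a symmetric test matrix is generated from a seeded random matrix (an arbitrary choice), the eigenvectors returned by `np.linalg.eigh` are reordered so that $\lambda_1 \ge \cdots \ge \lambda_n$ as in the text, and random vectors $w$ in $\mathcal{X}_j$ and in $\mathcal{X}_j^{\perp}$ are tested; the small tolerance `eps` only absorbs rounding errors.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                           # real symmetric test matrix
lam, X = np.linalg.eigh(A)                  # eigh returns ascending eigenvalues
lam, X = lam[::-1], X[:, ::-1]              # lam[0] = lambda_1 >= ... >= lam[-1] = lambda_n


def rayleigh_quotient(A, w):
    return (w @ A @ w) / (w @ w)


j, eps = 3, 1e-12
inside_ok, outside_ok = True, True
for _ in range(2000):
    w_in = X[:, :j] @ rng.standard_normal(j)            # w in span{x_1, ..., x_j}
    r = rayleigh_quotient(A, w_in)
    inside_ok = inside_ok and (lam[j - 1] - eps <= r <= lam[0] + eps)     # (8.27)
    w_out = X[:, j:] @ rng.standard_normal(6 - j)        # w orthogonal to x_1, ..., x_j
    r = rayleigh_quotient(A, w_out)
    outside_ok = outside_ok and (lam[-1] - eps <= r <= lam[j] + eps)      # lambda_n .. lambda_{j+1}
print(inside_ok, outside_ok)
print(np.isclose(rayleigh_quotient(A, X[:, 0]), lam[0]))   # supremum attained at x_1, cf. (8.28)
```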
153. Complex symmetric matrix. Let $A$ be a complex symmetric matrix and let $x_k$ be the eigenvector belonging to eigenvalue $\lambda_k$,
$$A x_k = \lambda_k x_k$$
In general, both $x_k$ and $\lambda_k$ are complex. For example, the matrix $\mathrm{diag}(i\omega_k)$ with $\omega_k \in \mathbb{R}$ is symmetric and all its eigenvalues are purely imaginary. Since $A = A^T$ is symmetric, the right- and left-eigenvectors are the same and (8.14) shows that the matrix $X$ with the eigenvectors placed in columns is orthogonal. This illustrates that the set of eigenvectors spans the $n$-dimensional space. Left-multiplication with $x_k^T$ gives
$$\lambda_k = \frac{x_k^T A x_k}{x_k^T x_k}$$
Also $x_k^T x_k$ is complex in general. Only when taking the absolute value can an ordering $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ be obtained, and the same argument as in art. 152 leads to
$$|\lambda_n| \le \left| \frac{w^T A w}{w^T w} \right| \le |\lambda_1|$$
for any (complex) vector $w$. Symmetry $A = A^T$ guarantees that the set of (possibly complex) eigenvectors is an orthogonal basis for the $n$-dimensional space. If $A$ is not symmetric, diagonalization via a similarity transform may not be possible (art. 142), so that Rayleigh's inequalities are not valid. For example, if $A$ is an $n \times n$ upper triangular matrix with all elements on and above the diagonal equal to one, then all eigenvalues are equal to 1, one eigenvector is $e_1 = (1, 0, \ldots, 0)$ and there is no other independent eigenvector. Choosing $w = u$, the all-one vector, gives $\frac{u^T A u}{u^T u} = \frac{n+1}{2} > 1$ for $n > 1$, violating the Rayleigh inequality (8.27) for the largest eigenvalue.

154. Field of values. The field of values $\mathcal{F}(A)$ is a set of complex numbers associated to an $n \times n$ matrix $A$,
$$\mathcal{F}(A) = \left\{ x^H A x : x \in \mathbb{C}^n,\ x^H x = 1 \right\} \qquad (8.30)$$
While the spectrum of a matrix is a discrete set, the field of values $\mathcal{F}(A)$, of which instances appeared in art. 152 and art. 153, can be a continuum. However, $\mathcal{F}(A)$ is always a convex subset of $\mathbb{C}$ for any matrix $A$, a fundamental fact known as the Toeplitz–Hausdorff Theorem and proved in Horn and Johnson (1991, Section 1.3). Another property is the subadditivity, $\mathcal{F}(A + B) \subseteq \mathcal{F}(A) + \mathcal{F}(B)$, which follows from the definition (8.30). Indeed,
$$\begin{aligned} \mathcal{F}(A + B) &= \left\{ x^H A x + x^H B x : x \in \mathbb{C}^n,\ x^H x = 1 \right\} \\ &\subseteq \left\{ x^H A x : x \in \mathbb{C}^n,\ x^H x = 1 \right\} + \left\{ y^H B y : y \in \mathbb{C}^n,\ y^H y = 1 \right\} = \mathcal{F}(A) + \mathcal{F}(B) \end{aligned}$$
Since the set $\lambda(A)$ of eigenvalues of $A$ belongs to $\mathcal{F}(A)$, it holds that
$$\lambda(A + B) \subseteq \mathcal{F}(A + B) \subseteq \mathcal{F}(A) + \mathcal{F}(B)$$
which can provide information about the eigenvalues of $A + B$, given $\mathcal{F}(A)$ and $\mathcal{F}(B)$. In general, given the spectra $\lambda(A)$ and $\lambda(B)$, less can be said about $\lambda(A + B)$ (see also art. 183 and 184). For example, even if the eigenvalues of $A$ and $B$ are known and bounded, the largest eigenvalue of $A + B$ can be unbounded, as deduced from the following example, inspired by Horn and Johnson (1991, p. 5), where
$$A = \begin{bmatrix} 1 - x & 1 \\ f(x) & x \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 + x & 1 \\ g(x) & -x \end{bmatrix}$$
The eigenvalues of $A$, $B$ and $A + B$ are
$$\lambda_{1,2}(A) = \frac{1}{2}\left( 1 \pm \sqrt{1 + 4\left( x^2 - x + f(x) \right)} \right)$$
$$\lambda_{1,2}(B) = \frac{1}{2}\left( 1 \pm \sqrt{1 + 4\left( x^2 + x + g(x) \right)} \right)$$
$$\lambda_{1,2}(A + B) = 1 \pm \sqrt{1 + 2\left( f(x) + g(x) \right)}$$
It suffices to choose $f(x) = -x^2 + x + c_1$ and $g(x) = -x^2 - x + c_2$ for arbitrary constants $c_1$ and $c_2$ to have bounded eigenvalues of $A$ and $B$, independent of $x$, while $\lim_{x \to \infty} |\lambda_{1,2}(A + B)| = \infty$.

155. The spectrum of a unitary matrix. We denote the eigenvalues of the $n \times n$ unitary matrix $U$ by $\lambda_1, \lambda_2, \ldots, \lambda_n$.

Theorem 37 All eigenvalues of a unitary matrix have absolute value 1, i.e., $|\lambda_k| = 1$ for all $1 \le k \le n$.

Proof: The orthogonality relation (8.25) for $k = j$, or, equivalently, the $j$-th diagonal element of the matrix product in the orthogonality relation (8.26), gives
$$\sum_{i=1}^{n} |U_{ij}|^2 = 1$$
which implies that the elements $U_{ij}$ of a unitary matrix cannot exceed unity in absolute value. Therefore, the absolute value of the coefficients $c_k$ in (8.4) of the characteristic polynomial is bounded for any $n \times n$ unitary matrix $U$. Taking the determinant of the orthogonality relation (8.26) gives
$$1 = \det\left( U^H U \right) = \det\left( U^H \right) \det U = |\det U|^2$$
while (8.6) then leads to $\prod_{k=1}^{n} |\lambda_k|^2 = 1$. Hence, a unitary matrix cannot have a zero eigenvalue. In addition, it shows, together with the bounds on $|c_k|$ that are only a function of $n$, that all eigenvalues must lie between some lower and upper bound for any $n \times n$ unitary matrix $U$, and these bounds do not depend on the elements of the unitary matrix considered. Art. 144 shows that any integer power $m$ (positive as well as negative) of $U$, $U^m$, has the eigenvalues of $U$ raised to the power $m$. In addition, $U^m$ is also an $n \times n$ unitary matrix obeying $\left( U^m \right)^H U^m = I$, as follows by induction on
$$\left( U^m \right)^H U^m = \left( U^{m-1} \right)^H U^H U\, U^{m-1} = \left( U^{m-1} \right)^H U^{m-1}$$
Hence, $|\det U^m|^2 = 1$ and, by (8.6), we have, for any $m \in \mathbb{Z}$, that
$$\prod_{k=1}^{n} |\lambda_k|^{2m} = 1$$
But the absolute values of these powers (positive as well as negative) can only remain below a bound independent of $m$ provided $|\lambda_k| = 1$ for all $k$. □

An orthogonal matrix $U_R$ that obeys $U_R^T U_R = I$ can be regarded as a unitary matrix $U = U_R + i U_I$ with imaginary part $U_I = 0$. In general, an orthogonal matrix is not symmetric, unless $U_R^{-1} = U_R$. Theorem 37 states that the $j$-th eigenvector $z_j = x_j + i y_j$ obeys the eigenvalue equation $U z_j = e^{i\theta_j} z_j$ for real $\theta_j$, explicitly,
$$(U_R + i U_I)(x_j + i y_j) = (\cos\theta_j + i \sin\theta_j)(x_j + i y_j)$$
Thus, in general, the eigenvalues $e^{i\theta_j}$ and eigenvectors $z_j$ of an orthogonal matrix $U_R$ (with $U_I = 0$) are complex.

156. The eigenvalue decomposition of a symmetric (Hermitian) matrix. Art. 151 shows that any real, symmetric matrix $A_{n \times n}$ can be written as $A = X \Lambda X^T$, where $\Lambda = \mathrm{diag}(\lambda_j)_{1 \le j \le n}$ and where $X = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$ is an orthogonal matrix (such that $X^T X = I$) formed by the real and normalized eigenvectors $x_1, x_2, \ldots, x_n$ of $A$ corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. In vector notation,
$$A = \sum_{k=1}^{n} \lambda_k x_k x_k^T = \sum_{k=1}^{n} \lambda_k E_k \qquad (8.31)$$
where the matrix $E_k = x_k x_k^T$ is the outer product of $x_k$ by itself. It represents the orthogonal projection³ onto the eigenspace of $\lambda_k$. For any analytic function $f$, we have that $f(A) = X f(\Lambda) X^T$ and
$$f(A) = \sum_{k=1}^{n} f(\lambda_k) E_k \qquad (8.32)$$
157. Properties of the matrix $E_k$. From the definition $E_k = x_k x_k^T$, we deduce that $E_k = E_k^T$ is symmetric. The explicit form of the matrix $E_k$ is
$$E_k = x_k x_k^T = \begin{bmatrix} (x_{k1})^2 & x_{k1} x_{k2} & x_{k1} x_{k3} & \cdots & x_{k1} x_{kn} \\ x_{k2} x_{k1} & (x_{k2})^2 & x_{k2} x_{k3} & \cdots & x_{k2} x_{kn} \\ x_{k3} x_{k1} & x_{k3} x_{k2} & (x_{k3})^2 & \cdots & x_{k3} x_{kn} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{kn} x_{k1} & x_{kn} x_{k2} & x_{kn} x_{k3} & \cdots & (x_{kn})^2 \end{bmatrix}$$
which shows that the diagonal element $(E_k)_{ii} = (x_{ki})^2$ equals the square of the $i$-th vector component of the eigenvector $x_k$. Hence,
$$\operatorname{trace}(E_k) = \sum_{i=1}^{n} (x_{ki})^2 = x_k^T x_k = 1 \qquad (8.33)$$
It follows from the orthogonality property (8.25) of the eigenvectors $x_k$ of a symmetric matrix that $E_k^2 = E_k$ and $E_k E_m = 0$ for $k \neq m$. Let us denote the eigenvalue equation of the symmetric matrix $E_k$ by $E_k y_j = \mu_j y_j$. After left-multiplication by $E_k$, we obtain $E_k^2 y_j = \mu_j^2 y_j$ and, since $E_k^2 = E_k$, we arrive at $E_k y_j = \mu_j^2 y_j$. Hence, for any eigenvalue $\mu_j$ and corresponding eigenvector $y_j$, we have that $\mu_j y_j = \mu_j^2 y_j$, which implies that $\mu_j$ is either zero or 1. The trace relation (8.7) and (8.33) indicate that $\sum_{j=1}^{n} \mu_j = 1$. Consequently⁴, we conclude that $n - 1$ eigenvalues are zero and one eigenvalue equals 1, such that $\|E_k\|_2 = 1$, which follows from (8.49). The zero eigenvalues imply that $\det(E_k) = 0$ and that the inverse of $E_k$ does not exist.

³ Let $\mathcal{X}$ and $\mathcal{Y}$ be complementary subspaces of a vector space $\mathcal{V}$ so that every vector $v \in \mathcal{V}$ can be uniquely resolved as $v = x + y$, where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. The unique linear operator $P$ defined by $Pv = x$ is called the projector onto $\mathcal{X}$ along $\mathcal{Y}$, and $P$ has the following properties: (a) $P^2 = P$ (i.e., $P$ is idempotent), (b) $I - P$ is the complementary projector onto $\mathcal{Y}$ along $\mathcal{X}$, (c) if $\mathcal{V}$ is $\mathbb{R}^n$, then
$$P = \begin{bmatrix} X & Y \end{bmatrix} \begin{bmatrix} I & O \\ O & O \end{bmatrix} \begin{bmatrix} X & Y \end{bmatrix}^{-1}$$
where the columns of $X$ and $Y$ are respective bases for $\mathcal{X}$ and $\mathcal{Y}$. These results are proved in Meyer (2000, p. 386). If $\mathcal{Y} = \mathcal{X}^{\perp}$, then $P$ is the orthogonal projector onto $\mathcal{X}$. In that case, Meyer (2000, p. 430) shows that $P = X \left( X^T X \right)^{-1} X^T$ and, if the basis vectors of $\mathcal{X}$ are orthonormal, i.e., $X^T X = I$, we have $P = X X^T$.

⁴ The eigenvalues of $E_k$ directly follow from the rank-one update formula (8.82) and $x_k^T x_k = 1$ because
$$\det\left( x_k x_k^T - \lambda I \right) = (-\lambda)^n \det\left( I - \frac{1}{\lambda} x_k x_k^T \right) = (-\lambda)^{n-1} (1 - \lambda)$$
228
Eigensystem of a matrix
Geometrically, this is understood because, by projecting, information is lost and the inverse cannot create information. The notation so far has implicitly assumed that all eigenvalues are different, which is not the case in general. Thus, if the eigenvalue $\lambda_k$ has multiplicity $m_k$, then there are $m_k$ eigenvectors belonging to $\lambda_k$ that form an orthonormal basis for the eigenspace belonging to $\lambda_k$. Let $X_k$ denote the matrix whose columns are those $m_k$ eigenvectors belonging to $\lambda_k$. Then the matrix $E_k = X_k X_k^T$ is the obvious generalization of $E_k = x_k x_k^T$. Thus, with (8.7), $\operatorname{trace}(E_k)$ is equal to the rank of $E_k$, which is the dimension of the eigenspace associated with $\lambda_k$. If the multiplicity of $\lambda_k$ is $m_k$, then $\operatorname{rank}(E_k) = m_k$. Consider now the $n \times n$ matrix $Y = \sum_{k=1}^{t} E_k$, where the index $k$ ranges over all $t \le n$ distinct eigenvalues $\{\lambda_k\}_{1 \le k \le t}$ of $A$. Since $E_k^2 = E_k$ and $E_k E_m = O$ for $k \neq m$, we find that

$$Y^2 = \sum_{k=1}^{t} E_k \sum_{j=1}^{t} E_j = \sum_{k=1}^{t} E_k^2 + 2 \sum_{k=1}^{t} \sum_{j=1}^{k-1} E_k E_j = Y$$

such that all eigenvalues of the symmetric (Hermitian) matrix $Y$ are either 1 or 0. But,

$$\operatorname{trace}(Y) = \sum_{k=1}^{t} \operatorname{trace}(E_k) = \sum_{k=1}^{t} m_k = n$$

which implies that all eigenvalues of $Y$ must be equal to 1, and thus that $Y = I$. The fact that

$$\sum_{k=1}^{t} E_k = I_{n \times n} \tag{8.34}$$

also follows from (8.32) for $f(x) = e^{zx}$, after letting $z = 0$. Moreover, this relation is rewritten as $X X^T = I$, which, combined with the normalization (8.26), implies that $X$ is an orthogonal matrix, which we already knew from art. 151. It means that the sum of the orthogonal projections onto all eigenspaces of $A$ spans again the total $n \times n$ space. By right-multiplying the eigenvalue equation $A x_k = \lambda_k x_k$ by $x_k^T$, we obtain

$$A E_k = \lambda_k E_k$$

This result also follows from (8.31) by using the orthogonality property $E_k E_m = E_k 1_{\{k=m\}}$ of the matrices $E_k$ and $E_m$. Finally, let $f(x) = \frac{c_A(x)}{x - \lambda_k} = \prod_{j=1; j \neq k}^{n} (x - \lambda_j)$, where $c_A(x)$ is the characteristic polynomial of $A$. Then (8.32) shows that

$$\prod_{j=1; j \neq k}^{n} (A - \lambda_j I) = f(\lambda_k) E_k$$

Hence, $E_k$ is a polynomial in $A$.
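The projector identities above can be checked numerically. The following is a minimal sketch in Python with NumPy (our choice of tooling, not the book's). The test matrix, with eigenvalues 3 (twice), 1 and $-2$, is hypothetical, and so are the helper `spectral_projectors` and the tolerance that decides when two computed eigenvalues count as equal. The last loop builds $E_k$ as a polynomial in $A$ from the distinct eigenvalues only (Lagrange form). For simple eigenvalues, this coincides with the product formula above after division by $f(\lambda_k)$.

```python
import numpy as np

def spectral_projectors(A, tol=1e-9):
    """Group the eigenvectors of a real symmetric A by distinct eigenvalue and
    return a list of (lambda_k, m_k, E_k) with E_k = X_k X_k^T (hypothetical helper)."""
    lam, X = np.linalg.eigh(A)              # ascending eigenvalues, orthonormal columns
    groups, start = [], 0
    for i in range(1, len(lam) + 1):
        # close the current group when the next eigenvalue differs by more than tol
        if i == len(lam) or lam[i] - lam[start] > tol:
            Xk = X[:, start:i]
            groups.append((lam[start:i].mean(), i - start, Xk @ Xk.T))
            start = i
    return groups

# hypothetical test matrix with eigenvalues 3 (twice), 1 and -2
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([3.0, 3.0, 1.0, -2.0]) @ Q.T
A = (A + A.T) / 2                           # remove round-off asymmetry

proj = spectral_projectors(A)
I = np.eye(A.shape[0])
assert np.allclose(sum(E for _, _, E in proj), I)          # (8.34)
for lam_k, m_k, E_k in proj:
    assert np.allclose(E_k @ E_k, E_k)                     # E_k is a projector
    assert round(float(np.trace(E_k))) == m_k              # trace = rank = m_k
    assert np.allclose(A @ E_k, lam_k * E_k)               # A E_k = lambda_k E_k

# E_k as a polynomial in A (Lagrange form over the distinct eigenvalues)
lams = [l for l, _, _ in proj]
for k, (lam_k, _, E_k) in enumerate(proj):
    P = I.copy()
    for j, lam_j in enumerate(lams):
        if j != k:
            P = P @ (A - lam_j * I) / (lam_k - lam_j)
    assert np.allclose(P, E_k)
```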
158. Diagonal elements of $E_k$. It directly follows from (8.31) that, for each $1 \le j \le n$,

$$a_{jj} = \sum_{k=1}^{n} \lambda_k (E_k)_{jj} = \sum_{k=1}^{n} \lambda_k (x_k)_j^2$$

Geometrically, the scalar product of the vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$ with the vector $y_j = (x_{1j}^2, x_{2j}^2, \ldots, x_{nj}^2)$ equals the diagonal element $a_{jj}$. Here, $x_{kj}$ is the $j$-th component of the $k$-th eigenvector of $A$, which belongs to $\lambda_k$. Denoting the non-negative matrix $Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$ and $\tilde{a} = (a_{11}, a_{22}, \ldots, a_{nn})$, the above relation reads in matrix form

$$Y^T \lambda = \tilde{a} \tag{8.35}$$

Generally, denote the vector $\tilde{f} = ((f(A))_{11}, (f(A))_{22}, \ldots, (f(A))_{nn})$ and, similarly, the vector $f(\lambda) = (f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n))$. It follows from (8.32) that

$$Y^T f(\lambda) = \tilde{f} \tag{8.36}$$

The normalization (8.25) of the eigenvectors shows that each row sum of $Y$ (i.e., each column sum of $Y^T$) is equal to one. Thus,

$$Y u = u$$

which shows that $Y$ and $Y^T$ have an eigenvalue equal to 1, belonging to the eigenvector $u$ of $Y$. Since $Y$ is a non-negative matrix, the Perron-Frobenius Theorem in art. 168 indicates that this eigenvalue, which belongs to an eigenvector with non-negative components, is the largest one. It also indicates that the absolute value of any other eigenvalue does not exceed 1. Relation (8.34) translates to

$$1 = \sum_{k=1}^{n} (x_k)_j^2 \tag{8.37}$$
and to $Y^T u = u$, illustrating that $u$ is both a left- and right-eigenvector of $Y$. Incidentally, $Y^T$ is a doubly stochastic matrix, and $y_{ij}$ can be regarded as a transition probability from state $j$ to state $i$.

159. Orthogonal transformation. Using the orthogonality property $U^T U = I$, from which $U^{-1} = U^T$, the transformation of a vector $z$ by an orthogonal matrix $U$ results in the vector $x = Uz$. The inverse transform $z = U^T x$ is also an orthogonal transformation. Moreover, the quadratic form

$$x^T x = z^T U^T U z = z^T z$$

is invariant under an orthogonal transformation. Since $x^T x = \sum_{k=1}^{n} x_k^2$, which can be regarded as the square of the Euclidean distance of the vector $x$ from the
origin, the invariance means that, after orthogonal transformation, that distance is preserved. Geometrically, any orthogonal transformation is a rotation, possibly combined with a reflection, of the vector $x$ around the origin to a vector $z$. Both $x$ and $z$ have equal Euclidean distance or norm (see art. 161).

160. To a real symmetric matrix $A$, a bilinear form $x^T A y$ is associated, which is a scalar defined as

$$x^T A y = y^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i y_j$$

We call a bilinear form a quadratic form if $y = x$. A necessary and sufficient condition for a quadratic form to be positive definite, i.e., $x^T A x > 0$ for all $x \neq 0$, is that all eigenvalues of $A$ are positive. Indeed, art. 151 shows the existence of an orthogonal matrix $U$ that transforms $A$ to a diagonal form. Let $x = Uz$. Then

$$x^T A x = z^T U^T A U z = \sum_{k=1}^{n} \lambda_k z_k^2 \tag{8.38}$$

which is positive for all $z \neq 0$ if and only if $\lambda_k > 0$ for all $k$. From (8.6), a positive definite quadratic form $x^T A x$ possesses a positive determinant, $\det A > 0$. This analysis shows that the problem of determining an orthogonal matrix $U$ (or the eigenvectors of $A$) is equivalent to the geometrical problem of determining the principal axes of the hyper-ellipsoid

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = 1$$

Relation (8.38) illustrates that the reciprocal eigenvalue $1/\lambda_k$ is the square of the principal (semi-)axis along the $z_k$ direction. A multiple eigenvalue refers to an indeterminacy of the principal axes. For example, if $n = 3$, an ellipsoid with two equal principal axes means that any section perpendicular to the third axis is a circle. Any two perpendicular diameters of the largest circle orthogonal to the third axis are principal axes of that ellipsoid. For additional properties of quadratic forms, such as the inertia theorem⁵, we refer to Courant and Hilbert (1953) and Gantmacher (1959a).
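The following is a short numerical check of (8.35), (8.37) and (8.38). It is again a sketch in Python with NumPy, applied to a hypothetical random symmetric matrix. The array `Y` holds $Y_{kj} = (x_k)_j^2$, so that its columns are the vectors $y_j$ of art. 158. Shifting $A$ by a multiple of the identity is only our device to produce a positive definite example.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # hypothetical real symmetric test matrix
lam, X = np.linalg.eigh(A)               # column X[:, k] is the eigenvector x_k

# Y[k, j] = (x_k)_j^2, so column j of Y is y_j = (x_{1j}^2, ..., x_{nj}^2)
Y = X.T ** 2
assert np.allclose(Y.T @ lam, np.diag(A))     # (8.35): Y^T lambda = diagonal of A
assert np.allclose(Y.sum(axis=1), 1.0)        # row sums of Y are one (normalization)
assert np.allclose(Y.sum(axis=0), 1.0)        # (8.37): column sums of Y are one

# (8.38): with z = U^T x, the quadratic form is sum_k lambda_k z_k^2
x = rng.standard_normal(5)
z = X.T @ x
assert np.isclose(x @ A @ x, np.sum(lam * z ** 2))

# shifting the spectrum above zero gives a positive definite matrix with det > 0
C = A + (1.0 - lam.min()) * np.eye(5)
assert np.all(np.linalg.eigvalsh(C) > 0) and np.linalg.det(C) > 0
```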
⁵ The inertia theorem states that the number of positive and negative coefficients in a quadratic form reduced to the form (8.38) by a nonsingular real linear transformation does not depend on the particular transformation.

8.4 Vector and matrix norms

161. Vector and matrix norms, denoted by $\|x\|$ and $\|A\|$ respectively, provide a single number reflecting a "size" of the vector or matrix. They may be regarded as an extension of the concept of the modulus of a complex number. A norm is a certain function of the vector components or matrix elements. All norms, vector as well as matrix norms, satisfy the three "distance" relations:
(i) $\|x\| > 0$ unless $x = 0$;
(ii) $\|\alpha x\| = |\alpha| \, \|x\|$ for any complex number $\alpha$;
(iii) $\|x + y\| \le \|x\| + \|y\|$.
In general, the Hölder $q$-norm of a vector $x$ is defined as

$$\|x\|_q = \Big( \sum_{j=1}^{n} |x_j|^q \Big)^{1/q} \tag{8.39}$$
For example, the well-known Euclidean norm or length of the vector $x$ is found for $q = 2$, with $\|x\|_2^2 = x^H x$. In probability theory, $x$ may denote a discrete probability density function. The law of total probability then states that $\|x\|_1 = \sum_{j=1}^{n} x_j = 1$, and we will write $\|x\|_1 = \|x\|$. Finally, $\max_j |x_j| = \lim_{q \to \infty} \|x\|_q = \|x\|_\infty$. In three dimensions ($n = 3$), the unit-spheres $S_q = \{x \mid \|x\|_q = 1\}$ are an octahedron for $q = 1$, a sphere for $q = 2$ and a cube for $q = \infty$. Furthermore, $S_1$ fits into $S_2$, which in turn fits into $S_\infty$. This implies that $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$ for any $x$. The Hölder inequality states that, for $\frac{1}{p} + \frac{1}{q} = 1$ and real $p, q > 1$,

$$|x^H y| \le \|x\|_p \|y\|_q \tag{8.40}$$

and is given explicitly in (9.46). A special case of the Hölder inequality, where $p = q = 2$, is the Cauchy-Schwarz inequality

$$|x^H y| \le \|x\|_2 \|y\|_2 \tag{8.41}$$

The Cauchy-Schwarz inequality (8.41) follows immediately from the Cauchy identity (8.86), as shown in art. 192. The $q = 2$ norm is invariant under a unitary (hence also orthogonal) transformation $U$, where $U^H U = I$, because $\|Ux\|_2^2 = x^H U^H U x = x^H x = \|x\|_2^2$ (see art. 159). An example of a non-homogeneous vector norm is the quadratic form $\|x\|_A = \sqrt{x^T A x}$, provided $A$ is positive definite. Relation (8.38) shows that, if not all eigenvalues $\lambda_j$ of $A$ are the same, then not all components of the vector $x$ are weighted similarly. Thus, in general, $\|x\|_A$ is a non-homogeneous norm. The quadratic form $\|x\|_I = \sqrt{x^T x}$ equals the homogeneous Euclidean norm $\|x\|_2$.

8.4.1 Properties of norms

162. All norms are equivalent in the sense that there exist positive real numbers $c_1$ and $c_2$ such that, for all $x$,

$$c_1 \|x\|_p \le \|x\|_q \le c_2 \|x\|_p$$
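For example,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2, \qquad \|x\|_\infty \le \|x\|_1 \le n \|x\|_\infty, \qquad \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty$$

In the Hölder inequality (8.40), choose $p = \frac{1}{\alpha}$ and $q = \frac{1}{1-\alpha}$ with $0 < \alpha < 1$. Replace $x_j \to w_j^{\alpha} |x_j|^{\alpha s}$ for real $s > 0$, and replace $y_j \to w_j^{1-\alpha}$ with weights $w_j > 0$. We then obtain an inequality for the weighted $q$-norm

$$\left( \frac{\sum_{j=1}^{n} w_j |x_j|^{\alpha s}}{\sum_{j=1}^{n} w_j} \right)^{\frac{1}{\alpha s}} \le \left( \frac{\sum_{j=1}^{n} w_j |x_j|^{s}}{\sum_{j=1}^{n} w_j} \right)^{\frac{1}{s}}$$

For $w_j = 1$, the weights disappear, such that the inequality for the Hölder $q$-norm becomes

$$\|x\|_{\alpha s} \le \|x\|_s \, n^{\frac{1}{s}\left(\frac{1}{\alpha} - 1\right)}$$

where $n^{\frac{1}{s}\left(\frac{1}{\alpha} - 1\right)} \ge 1$. On the other hand, with $0 < \alpha < 1$ and for real $s > 0$,

$$\frac{\|x\|_s}{\|x\|_{\alpha s}} = \frac{\left(\sum_{j=1}^{n} |x_j|^s\right)^{\frac{1}{s}}}{\left(\sum_{k=1}^{n} |x_k|^{\alpha s}\right)^{\frac{1}{\alpha s}}} = \left( \sum_{j=1}^{n} \left( \frac{|x_j|^{\alpha s}}{\sum_{k=1}^{n} |x_k|^{\alpha s}} \right)^{\frac{1}{\alpha}} \right)^{\frac{1}{s}}$$

Since $y = \frac{|x_j|^{\alpha s}}{\sum_{k=1}^{n} |x_k|^{\alpha s}} \le 1$ and $\frac{1}{\alpha} > 1$, it holds that $y^{1/\alpha} \le y$ and

$$\left( \sum_{j=1}^{n} \left( \frac{|x_j|^{\alpha s}}{\sum_{k=1}^{n} |x_k|^{\alpha s}} \right)^{\frac{1}{\alpha}} \right)^{\frac{1}{s}} \le \left( \frac{\sum_{j=1}^{n} |x_j|^{\alpha s}}{\sum_{k=1}^{n} |x_k|^{\alpha s}} \right)^{\frac{1}{s}} = 1$$

which leads to the opposite inequality, $\|x\|_s \le \|x\|_{\alpha s}$. In summary, if $p > q > 0$, then the general inequality for the Hölder $q$-norm is

$$\|x\|_p \le \|x\|_q \le \|x\|_p \, n^{\frac{1}{q} - \frac{1}{p}} \tag{8.42}$$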
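The norm inequality (8.42) is easy to probe numerically. This sketch in Python with NumPy evaluates (8.39) directly and samples pairs $p > q > 0$. It also compares the result with NumPy's built-in vector norms for $q = 1, 2, \infty$. The helper `holder_norm` and the random test vectors are our own. For $0 < q < 1$, the expression (8.39) is not a norm, but (8.42) still holds.

```python
import numpy as np

def holder_norm(x, q):
    """Hölder q-norm (8.39); q = np.inf gives max_j |x_j| (hypothetical helper)."""
    x = np.abs(np.asarray(x, dtype=float))
    if np.isinf(q):
        return x.max()
    return (x ** q).sum() ** (1.0 / q)

rng = np.random.default_rng(2)
n = 6
for _ in range(1000):
    x = rng.standard_normal(n)
    q, p = np.sort(rng.uniform(0.2, 8.0, size=2))     # random pair with 0 < q <= p
    lower = holder_norm(x, p)
    upper = holder_norm(x, p) * n ** (1 / q - 1 / p)
    assert lower <= holder_norm(x, q) * (1 + 1e-12)   # ||x||_p <= ||x||_q
    assert holder_norm(x, q) <= upper * (1 + 1e-12)   # ||x||_q <= ||x||_p n^(1/q - 1/p)

# agreement with NumPy's vector norms for q = 1, 2, inf
x = rng.standard_normal(n)
for q in (1, 2, np.inf):
    assert np.isclose(holder_norm(x, q), np.linalg.norm(x, q))
```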
163. For $m \times n$ matrices $A$, the most frequently used norms are the Euclidean or Frobenius norm

$$\|A\|_F = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \Big)^{1/2} \tag{8.43}$$

and the $q$-norm

$$\|A\|_q = \sup_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_q} \tag{8.44}$$

The second distance relation in art. 161, $\frac{\|Ax\|_q}{\|x\|_q} = \left\| A \frac{x}{\|x\|_q} \right\|_q$, shows that

$$\|A\|_q = \sup_{\|x\|_q = 1} \|Ax\|_q \tag{8.45}$$
Furthermore, the matrix $q$-norm (8.44) implies that

$$\|Ax\|_q \le \|A\|_q \|x\|_q \tag{8.46}$$

The vector norm is a continuous function of the vector components, and the domain $\|x\|_q = 1$ is closed and bounded. Hence, there must exist a vector $x$ for which equality $\|Ax\|_q = \|A\|_q \|x\|_q$ holds. Since the $i$-th vector component of $Ax$ is $(Ax)_i = \sum_{j=1}^{n} a_{ij} x_j$, it follows from (8.39) that

$$\|Ax\|_q = \Big( \sum_{i=1}^{m} \Big| \sum_{j=1}^{n} a_{ij} x_j \Big|^q \Big)^{1/q}$$

For example, for all $x$ with $\|x\|_1 = 1$, we have that

$$\|Ax\|_1 = \sum_{i=1}^{m} \Big| \sum_{j=1}^{n} a_{ij} x_j \Big| \le \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| |x_j| = \sum_{j=1}^{n} |x_j| \sum_{i=1}^{m} |a_{ij}| \le \Big( \sum_{j=1}^{n} |x_j| \Big) \max_j \sum_{i=1}^{m} |a_{ij}| = \max_j \sum_{i=1}^{m} |a_{ij}|$$

Clearly, there exists a vector $x$ for which equality holds. Namely, if $k$ is the column in $A$ with maximum absolute sum, then choose $x = e_k$, the $k$-th basis vector, whose components are all zero except for the $k$-th one, which is 1. Similarly, for all $x$ with $\|x\|_\infty = 1$,

$$\|Ax\|_\infty = \max_i \Big| \sum_{j=1}^{n} a_{ij} x_j \Big| \le \max_i \sum_{j=1}^{n} |a_{ij}| |x_j| \le \max_i \sum_{j=1}^{n} |a_{ij}|$$

Again, let $r$ be the row with maximum absolute sum and let $x_j = \operatorname{sign}(a_{rj})$, such that $\|x\|_\infty = 1$. Then $(Ax)_r = \sum_{j=1}^{n} |a_{rj}| = \max_i \sum_{j=1}^{n} |a_{ij}| = \|Ax\|_\infty$. Hence, we have proved that

$$\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}| \tag{8.47}$$

$$\|A\|_1 = \max_j \sum_{i=1}^{m} |a_{ij}| \tag{8.48}$$

from which $\|A^H\|_1 = \|A\|_\infty$.
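The column-sum and row-sum formulas (8.47) and (8.48) can be checked with a sketch like the following. It also tests the maximizing vectors $e_k$ and $x_j = \operatorname{sign}(a_{rj})$ used in the proof. The code uses Python with NumPy, and the random $4 \times 6$ test matrix is hypothetical. `numpy.linalg.norm` with `ord=1` and `ord=np.inf` computes exactly these induced matrix norms.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))              # hypothetical real 4 x 6 matrix

col_sums = np.abs(A).sum(axis=0)             # sum_i |a_ij| for each column j
row_sums = np.abs(A).sum(axis=1)             # sum_j |a_ij| for each row i
assert np.isclose(np.linalg.norm(A, 1), col_sums.max())         # (8.48)
assert np.isclose(np.linalg.norm(A, np.inf), row_sums.max())    # (8.47)
assert np.isclose(np.linalg.norm(A.T, 1), np.linalg.norm(A, np.inf))

# the maximizing vectors used in the proof
k = col_sums.argmax()
e_k = np.zeros(A.shape[1])
e_k[k] = 1.0
assert np.isclose(np.abs(A @ e_k).sum(), col_sums.max())        # ||A e_k||_1
r = row_sums.argmax()
x = np.sign(A[r])                             # x_j = sign(a_rj), so ||x||_inf = 1
assert np.isclose(np.abs(A @ x).max(), row_sums.max())          # ||A x||_inf
```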
164. The $q = 2$ matrix norm, $\|A\|_2$, is obtained differently. Consider

$$\|Ax\|_2^2 = (Ax)^H Ax = x^H A^H A x$$

Since $A^H A$ is a Hermitian matrix, art. 151 shows that all its eigenvalues are real. They are also non-negative because $\|Ax\|_2^2 \ge 0$. These ordered eigenvalues are denoted as
$\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_n^2 \ge 0$. Applying the theorem in art. 151, there exists a unitary matrix $U$ such that $x = Uz$ yields

$$x^H A^H A x = z^H U^H A^H A U z = z^H \operatorname{diag}(\sigma_j^2)\, z \le \sigma_1^2 z^H z = \sigma_1^2 \|z\|_2^2$$

The $q = 2$ norm is invariant under a unitary (orthogonal) transform, so $\|x\|_2 = \|z\|_2$. By the definition (8.44),

$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1 \tag{8.49}$$

where the supremum is achieved if $x$ is the eigenvector of $A^H A$ belonging to $\sigma_1^2$. Meyer (2000, p. 279) proves the corresponding result for the minimum eigenvalue, provided that $A$ is non-singular,

$$\|A^{-1}\|_2 = \frac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \frac{1}{\sigma_n}$$

The non-negative quantity $\sigma_j$ is called the $j$-th singular value, and $\sigma_1$ is the largest singular value of $A$. The importance of this result lies in an extension of the eigenvalue problem to non-square matrices, which is called the singular value decomposition. A detailed discussion is found in Golub and Van Loan (1996) and Horn and Johnson (1991, Chapter 3).

165. The Frobenius norm satisfies $\|A\|_F^2 = \operatorname{trace}(A^H A)$. With (8.7) and the analysis of $A^H A$ above,

$$\|A\|_F^2 = \sum_{k=1}^{n} \sigma_k^2 \tag{8.50}$$

In view of (8.49), the bounds $\|A\|_2 \le \|A\|_F \le \sqrt{n}\, \|A\|_2$ may be attained.
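The following sketch illustrates (8.49), (8.50) and the bounds of art. 165 numerically. It assumes Python with NumPy and a hypothetical random complex $5 \times 3$ matrix. Here $n = 3$ is the number of columns, so $A^H A$ is $3 \times 3$ and its three eigenvalues are the $\sigma_j^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))   # hypothetical

sigma = np.linalg.svd(A, compute_uv=False)          # sigma_1 >= ... >= sigma_n >= 0
eig_AhA = np.linalg.eigvalsh(A.conj().T @ A)[::-1]  # eigenvalues of A^H A, descending
assert np.allclose(sigma ** 2, eig_AhA)             # sigma_j^2 are eigenvalues of A^H A
assert np.isclose(np.linalg.norm(A, 2), sigma[0])   # (8.49)
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(sigma ** 2))    # (8.50)
assert sigma[0] <= np.linalg.norm(A, 'fro') <= np.sqrt(n) * sigma[0] + 1e-12

# square non-singular case: ||A^{-1}||_2 = 1 / sigma_n
S = rng.standard_normal((4, 4))
s = np.linalg.svd(S, compute_uv=False)
assert np.isclose(np.linalg.norm(np.linalg.inv(S), 2), 1 / s[-1])
```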
8.4.2 Applications of norms

166. (a) Since $\|A^k\| = \|A A^{k-1}\| \le \|A\| \, \|A^{k-1}\|$, by induction, we have for any integer $k$ that

$$\|A^k\| \le \|A\|^k \quad \text{and} \quad \lim_{k \to \infty} A^k = O \quad \text{if } \|A\| < 1$$

(b) By taking the norm of the eigenvalue equation (8.1), $\|Ax\| = |\lambda| \, \|x\|$, and with (8.46),

$$|\lambda| \le \|A\|_q \tag{8.51}$$

Applied to $A^H A$, for any $q$-norm,

$$\sigma_1^2 \le \|A^H A\|_q \le \|A^H\|_q \|A\|_q$$
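Choosing $q = 1$ and using (8.49),

$$\|A\|_2^2 \le \|A^H\|_1 \|A\|_1 = \|A\|_\infty \|A\|_1$$

(c) Any matrix $A$ can be transformed by a similarity transform $H$ to a Jordan canonical form $C$ (art. 4) as $A = H C H^{-1}$, from which $A^k = H C^k H^{-1}$. The elements of the $k$-th power of a typical $m \times m$ Jordan block, $(C_m(\lambda))^k$, are of the form $\binom{k}{i} \lambda^{k-i}$ with $0 \le i \le m-1$. For $\lambda \neq 0$, they are $\lambda^k$ multiplied by a polynomial in $k$. Hence, for large $k$, $A^k \to O$ if and only if $|\lambda| < 1$ for all eigenvalues.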
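The following sketch uses Python with NumPy and a hypothetical random matrix. It checks (8.51) for several norms and the bound $\|A\|_2^2 \le \|A\|_\infty \|A\|_1$ of (b). It also illustrates (c): after the matrix is scaled to spectral radius $0.9$, its powers vanish, even though the scaled matrix may still have a 2-norm larger than 1. The Frobenius norm is included because $\|A\|_2 \le \|A\|_F$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))                  # hypothetical non-symmetric matrix
rho = np.abs(np.linalg.eigvals(A)).max()         # spectral radius max_k |lambda_k|

# (8.51) for the induced 1-, 2- and inf-norms, and for the Frobenius norm
for q in (1, 2, np.inf, 'fro'):
    assert rho <= np.linalg.norm(A, q) + 1e-12

# (b): ||A||_2^2 <= ||A||_1 ||A||_inf
assert np.linalg.norm(A, 2) ** 2 <= np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf) + 1e-12

# (c): spectral radius 0.9 < 1 makes B^k -> O, whatever ||B||_2 is
B = 0.9 * A / rho
Bk = np.linalg.matrix_power(B, 400)
assert np.linalg.norm(Bk, 2) < 1e-6
```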
8.5 Non-negative matrices

167. Reducibility. A matrix $A$ is reducible if there is a relabeling that leads to

$$\tilde{A} = \begin{bmatrix} A_1 & B \\ O & A_2 \end{bmatrix}$$

where $A_1$ and $A_2$ are square matrices. Otherwise, $A$ is irreducible. Relabeling amounts to permuting rows and columns in the same fashion. Thus, there exists a similarity transform $H$ (a permutation matrix) such that $A = H \tilde{A} H^{-1}$. For doubly stochastic matrices, where $\sum_{k=1}^{n} a_{kj} = \sum_{k=1}^{n} a_{jk} = 1$, Fiedler (1972) has proposed the "measure $r(A)$ of irreducibility" of $A$, defined as

$$r(A) = \min_{\emptyset \neq \mathcal{M} \subsetneq \mathcal{N}} \; \sum_{i \in \mathcal{M},\, k \notin \mathcal{M}} a_{ik} \tag{8.52}$$

This definition is motivated as follows. $A$ is reducible if there exists a non-empty subset $\mathcal{M}$ of the set of all indices (or nodes) $\mathcal{N}$ such that $a_{ik} = 0$ for all $i \in \mathcal{M}$ and $k \notin \mathcal{M}$. Hence, if $A$ is reducible, then $r(A) = 0$. The choice $\mathcal{M} = \{1\}$ gives $\sum_{i \in \mathcal{M},\, k \notin \mathcal{M}} a_{ik} = \sum_{k=2}^{n} a_{1k} \le 1$. Therefore, the measure of irreducibility lies between $0 \le r(A) \le 1$.

168. The famous Perron-Frobenius theorem for non-negative matrices.

Theorem 38 (Perron-Frobenius) An irreducible non-negative $n \times n$ matrix $A$ always has a real, positive eigenvalue $\lambda_1 = \lambda_{\max}(A)$, and the modulus of any other eigenvalue does not exceed $\lambda_{\max}(A)$, i.e., $|\lambda_k(A)| \le \lambda_{\max}(A)$ for $k = 2, \ldots, n$. Moreover, $\lambda_1$ is a simple zero of the characteristic polynomial $\det(A - \lambda I)$. The eigenvector belonging to $\lambda_1$ has positive components. Suppose $A$ has $h$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_h$ with $|\lambda_1| = \cdots = |\lambda_h| = \lambda_1$. Then all these equal-moduli eigenvalues satisfy the polynomial $\lambda^h - \lambda_1^h = 0$, i.e., $\lambda_k = \lambda_1 e^{2\pi \mathrm{i} (k-1)/h}$ for $k = 1, \ldots, h$.

Proof: See, e.g., Gantmacher (1959b, Chapter XIII). □
If a non-negative $n \times n$ matrix $A$ is reducible, then $A$ always has a non-negative eigenvalue $\lambda_{\max}(A)$, and no other eigenvalue has a larger modulus than $\lambda_{\max}(A)$. The corresponding eigenvector belonging to $\lambda_{\max}(A)$ has non-negative components. Hence, reducibility removes the positivity of the largest eigenvalue and that of the components of its corresponding eigenvector. Frobenius' proof uses the variational property of the largest eigenvalue,

$$\lambda_{\max}(A) = \max_{x \ge 0,\, x \neq 0} \; \min_{1 \le j \le n,\, x_j > 0} \frac{(Ax)_j}{x_j} \tag{8.53}$$

which is akin to Rayleigh's inequality (8.28) and a consequence of (8.54) in art. 169 for a symmetric matrix. Besides (8.53), an essential lemma in Frobenius' proof is:

Lemma 6 Let $A$ be an $n \times n$ non-negative, irreducible matrix and let $C$ be an $n \times n$ complex matrix in which each element obeys $|c_{ij}| \le a_{ij}$. Then every eigenvalue $\lambda(C)$ of $C$ satisfies the inequality $|\lambda(C)| \le \lambda_{\max}(A)$.

Proof: See, e.g., Gantmacher (1959b, Chapter XIII). □
An application of Lemma 6 is the following lemma for non-negative matrices, which is useful in assessing the largest eigenvalue of the adjacency matrix of a graph:

Lemma 7 If one element in a non-negative matrix $A$ is increased, then the largest eigenvalue is also increased. The increase is strict for irreducible matrices.

Proof: Consider the non-negative matrix $C$ and $A = C + \varepsilon e_i e_j^T$, where $\varepsilon > 0$ and $e_i$ and $e_j$ are the basic vectors. The matrix $\tilde{O} = e_i e_j^T$ is the zero matrix, except for the element $\tilde{O}_{ij} = 1$. Lemma 6 shows that $\lambda_{\max}(A) \ge \lambda_{\max}(C)$. We now demonstrate the strict inequality for irreducible matrices. Let $x$ denote the eigenvector belonging to the largest eigenvalue of $C$. Then the variational property (8.53) implies

$$\lambda_{\max}(A) \ge \frac{x^T A x}{x^T x} = \frac{x^T C x}{x^T x} + \varepsilon \frac{x^T e_i e_j^T x}{x^T x} = \lambda_{\max}(C) + \varepsilon \frac{x_i x_j}{x^T x}$$

All components of the largest eigenvector $x$ are non-negative, and they are even positive if $C$ is irreducible. This proves the lemma. □

169. Bounds for the largest eigenvalue of symmetric, irreducible, non-negative matrices. If the irreducible, non-negative matrix is symmetric, we can exploit symmetry to deduce bounds for the largest eigenvalue. To do so, consider the quadratic form $y^T A x_1 = x_1^T A y$, where $y$ is a vector with positive components. Here, $x_1$ is the eigenvector with positive components (Perron-Frobenius Theorem 38) belonging to the largest eigenvalue $\lambda_1$. Using the eigenvalue equation $A x_1 = \lambda_1 x_1$, we obtain $y^T A x_1 = \lambda_1 y^T x_1$. On the other hand, we have

$$x_1^T A y = \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} (x_1)_i y_j = \sum_{i=1}^{N} (x_1)_i y_i \left( \frac{\sum_{j=1}^{N} a_{ij} y_j}{y_i} \right)$$

Since the components of $y$ and $x_1$ are positive,

$$\min_{1 \le i \le N} \left( \frac{\sum_{j=1}^{N} a_{ij} y_j}{y_i} \right) \sum_{i=1}^{N} (x_1)_i y_i \le x_1^T A y \le \max_{1 \le i \le N} \left( \frac{\sum_{j=1}^{N} a_{ij} y_j}{y_i} \right) \sum_{i=1}^{N} (x_1)_i y_i$$

We now combine both expressions, taking into account that $y^T x_1 = \sum_{i=1}^{N} (x_1)_i y_i > 0$. This yields the bounds

$$\min_{1 \le i \le N} \frac{(Ay)_i}{y_i} \le \lambda_1 \le \max_{1 \le i \le N} \frac{(Ay)_i}{y_i} \tag{8.54}$$

which are valid for any symmetric, irreducible, non-negative matrix $A$.

170. The next remarkable theorem by Fiedler (1972) bounds the spectral gap of symmetric stochastic matrices.

Theorem 39 (Fiedler) Let $P$ be a symmetric stochastic $n \times n$ matrix with second largest eigenvalue $\lambda_2(P)$. Then

$$\psi_n(r(P)) \le 1 - \lambda_2(P) \le \frac{n}{n-1}\, r(P) \tag{8.55}$$

where the measure of irreducibility $r(A)$ is defined in (8.52). The function $\psi_n(x) \in [0, 1]$ is continuous, convex and increasing, and is given by

$$\psi_n(x) = \begin{cases} 2x \left(1 - \cos\frac{\pi}{n}\right) & 0 \le x \le \frac{1}{2} \\[4pt] 1 - 2(1-x)\cos\frac{\pi}{n} - (2x - 1)\cos\frac{2\pi}{n} & \frac{1}{2} < x \le 1 \end{cases}$$

The inequality (8.55) is best possible. If $u, v \in \mathbb{R}$ satisfy $0 \le u \le 1$ and $\psi_n(u) \le 1 - v \le \frac{n}{n-1} u$, then there exists a symmetric stochastic matrix $P$ with $r(P) = u$ and $\lambda_2(P) = v$.

Proof: The proof is rather involved and we refer to Fiedler (1972). □
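Fiedler's measure (8.52) can be evaluated by brute force for small matrices, which makes it possible to check Theorem 39 on an example. The sketch below uses Python with NumPy, and the helper names are ours. It enumerates all non-empty proper subsets $\mathcal{M}$, so its cost grows as $2^n$ and it is only meant for illustration. The test matrix is a hypothetical choice: a lazy random walk on a 6-cycle, $P = \frac{1}{2}I + \frac{1}{4}(S + S^T)$, where $S$ is the cyclic shift.

```python
import itertools
import numpy as np

def irreducibility_measure(A):
    """Brute-force r(A) of (8.52): minimum over non-empty proper subsets M of the
    total weight sum_{i in M, k not in M} a_ik. Exponential in n (hypothetical helper)."""
    n = A.shape[0]
    best = np.inf
    for size in range(1, n):
        for M in itertools.combinations(range(n), size):
            notM = [k for k in range(n) if k not in M]
            best = min(best, A[np.ix_(M, notM)].sum())
    return best

def psi(x, n):
    """Fiedler's lower-bound function psi_n(x) in (8.55)."""
    c1, c2 = np.cos(np.pi / n), np.cos(2 * np.pi / n)
    if x <= 0.5:
        return 2 * x * (1 - c1)
    return 1 - 2 * (1 - x) * c1 - (2 * x - 1) * c2

# hypothetical symmetric stochastic matrix: lazy random walk on a 6-cycle
n = 6
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] = P[i, (i - 1) % n] = 0.25

r = irreducibility_measure(P)
lam2 = np.sort(np.linalg.eigvalsh(P))[-2]           # second largest eigenvalue
gap = 1 - lam2
assert psi(r, n) - 1e-12 <= gap <= n / (n - 1) * r + 1e-12   # (8.55)
```

For this example, $r(P) = \frac{1}{2}$, obtained by cutting the cycle into two arcs. The eigenvalues are $\frac{1}{2} + \frac{1}{2}\cos\frac{2\pi k}{6}$, so $1 - \lambda_2(P) = \frac{1}{4}$, and (8.55) reads $1 - \cos\frac{\pi}{6} \approx 0.134 \le 0.25 \le 0.6$.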
171. Eigenvector components of a non-negative matrix. Fiedler (1975) found a nice property regarding the signs of eigenvector components of a non-negative symmetric matrix. This property has a profound impact on graph partitioning (art. 103).

Theorem 40 (Fiedler) Let $A$ be an irreducible, non-negative symmetric $n \times n$ matrix with eigenvalues $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$. Let $z$ be a vector such that $Az \ge \lambda_k(A) z$ with $k \ge 2$. Then the set of indices (nodes) $\mathcal{M} = \{j \in \mathcal{N} : z_j \ge 0\}$ is not empty. Moreover, the principal submatrix $A(\mathcal{M})$, whose rows and columns have indices in $\mathcal{M}$, has at most $k - 1$ connected components.

Before proving the theorem, we rephrase it for the case where $A$ is the adjacency matrix of a graph $G$. The non-negative vector components of $z$ correspond to nodes that induce a subgraph, specified by the adjacency matrix $A(\mathcal{M})$, with at most $k - 1$ distinct connected components.
Proof⁶: The set $\mathcal{M}$ cannot be empty. For, if $\mathcal{M}$ were empty, then all components of $z$ would be negative, such that $v = -z$ satisfies $Av \le \lambda_k(A) v$. Since $A$ is irreducible, the Perron-Frobenius Theorem 38 demonstrates that $A x_1 = \lambda_1(A) x_1$ with $x_1$ positive and $\lambda_1(A) > \lambda_2(A) \ge \lambda_k(A)$. Because $x_1^T v > 0$, it follows that $x_1^T A v = \lambda_1(A) x_1^T v > \lambda_k(A) x_1^T v$. The hypothesis, however, implies that $x_1^T A v \le \lambda_k(A) x_1^T v$, which leads to a contradiction. If $\mathcal{M} = \mathcal{N}$, the theorem is true by the Perron-Frobenius Theorem 38. Suppose now that $\mathcal{M} \neq \mathcal{N}$. Then we can always write the matrix $A$ as

$$A = \begin{bmatrix} \tilde{A} & C \\ C^T & D \end{bmatrix}$$

where $\tilde{A}$ consists of $r$ distinct connected or irreducible matrices $A_j$, subject to

$$\sum_{j=1}^{r} \dim(A_j) = \dim \mathcal{M} < n$$

with structure

$$\tilde{A} = \begin{bmatrix} A_1 & O & \cdots & O \\ O & A_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & O \\ O & \cdots & O & A_r \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_r \end{bmatrix}$$

We partition the vector $z$ conformally,

$$z = \begin{bmatrix} x \\ -y \end{bmatrix}$$

where the vector $x^T = \begin{bmatrix} x_1^T & x_2^T & \cdots & x_r^T \end{bmatrix}$ has subvectors $x_j$, all with non-negative components, whose indices belong to $\mathcal{M}$. This implies that $y$ contains only positive components; otherwise, a component of $-y$ would belong to $\mathcal{M}$. The condition $Az \ge \lambda_k(A) z$ implies that $A_j x_j - C_j y \ge \lambda_k(A) x_j$. Since $A$ is irreducible, none of the block matrices $C_j$ can be zero. Hence $C_j y \ge 0$, with strict inequality in some component, because all components of $y$ are strictly positive. Consequently, $A_j x_j \ge \lambda_k(A) x_j$ holds with strict inequality in some component. This implies that, for $1 \le j \le r$,

$$x_j^T A_j x_j > \lambda_k(A)\, x_j^T x_j$$

By construction, each $A_j$ is irreducible. The Perron-Frobenius Theorem 38 and the Rayleigh inequality (art. 152) for the largest eigenvalue state that $\lambda_1(A_j)\, x_j^T x_j \ge x_j^T A_j x_j$, such that $\lambda_1(A_j) > \lambda_k(A)$. Finally, if $\lambda_1(A_j) > \lambda_k(A)$ for all $1 \le j \le r$, then the interlacing Theorem 42 shows that $\lambda_r(A) > \lambda_k(A)$. Hence $r \le k - 1$, which proves the theorem. □

⁶ We have combined Fiedler's proof with that of Powers (1988).
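For $k = 2$ and an eigenvector $z$ of $\lambda_2(A)$, Theorem 40 says that the nodes with $z_j \ge 0$ induce a connected subgraph. A sketch of that check in Python with NumPy follows. The graph is our own assumption: two 4-cliques joined through node 8, plus a pendant node 9. The helper `is_connected` and the tolerance that treats round-off around zero as $z_j = 0$ are also our assumptions. Because $-z$ is an eigenvector as well, the check holds whichever sign `numpy.linalg.eigh` returns.

```python
import numpy as np
from collections import deque

def is_connected(adj, nodes):
    """Breadth-first search: is the subgraph induced by `nodes` connected? (hypothetical helper)"""
    nodes = [int(v) for v in nodes]
    if not nodes:
        return False
    allowed = set(nodes)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(adj[i]):
            j = int(j)
            if j in allowed and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen == allowed

# hypothetical connected graph: two 4-cliques joined by the path 3-8-4, pendant node 9
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]
edges += [(3, 8), (8, 4), (7, 9)]
n = 10
A = np.zeros((n, n))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0

lam, V = np.linalg.eigh(A)        # ascending, so V[:, -2] belongs to lambda_2(A)
z = V[:, -2]                      # A z = lambda_2 z: Theorem 40 applies with k = 2
tol = 1e-9                        # treat round-off around zero as z_j = 0
M = np.flatnonzero(z >= -tol)     # M = {j : z_j >= 0}
assert len(M) > 0 and is_connected(A, M)    # at most k - 1 = 1 component
```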
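An immediate consequence concerns the vector $z = v_1 + \alpha v_2$. Here, $v_1$ is the largest eigenvector (with all positive components) and $v_2$ is the second largest eigenvector of $A$. For $\alpha \ge 0$, this vector satisfies

$$A(v_1 + \alpha v_2) = \lambda_1(A) v_1 + \alpha \lambda_2(A) v_2 \ge \lambda_2(A)(v_1 + \alpha v_2)$$

and thus the inequality in Theorem 40 for $k = 2$. Hence, the index set $\mathcal{M} = \{j \in \mathcal{N} : (v_1)_j + \alpha (v_2)_j \ge 0\}$ corresponds to an irreducible submatrix $A(\mathcal{M})$. Let $\mathcal{M}^c = \{j \in \mathcal{N} : (v_1)_j + \alpha (v_2)_j < 0\}$, so that $\mathcal{M} \cup \mathcal{M}^c = \mathcal{N}$. Since $A$ and $A(\mathcal{M})$ are irreducible, $A(\mathcal{M}^c)$ is also irreducible. This index set $\mathcal{M}$ decomposes the set of indices (nodes) into two irreducible submatrices (connected subgraphs).

172. Bounds on eigenvalues of symmetric, non-negative and irreducible matrices. We present a consequence of Fiedler's eigenvector component Theorem 40. Consider the eigenvalue equation $A x_k = \lambda_k(A) x_k$, where the eigenvalue $\lambda_k(A)$ is smaller than the largest eigenvalue $\lambda_1(A)$. The corresponding real eigenvector $x_k$ is orthogonal to $x_1$, whose vector components are positive by virtue of the Perron-Frobenius Theorem 38. Let us denote the nodal sets

$$\mathcal{M}_+ = \{j \in \mathcal{N} : (x_k)_j > 0\}, \quad \mathcal{M}_- = \{j \in \mathcal{N} : (x_k)_j < 0\}, \quad \mathcal{M}_0 = \{j \in \mathcal{N} : (x_k)_j = 0\}$$

such that $\mathcal{M}_+ \cup \mathcal{M}_- \cup \mathcal{M}_0 = \mathcal{N}$. Since $x_k^T x_1 = 0$ (by orthogonality, art. 151), it holds that $|\mathcal{M}_+| \ge 1$ and $|\mathcal{M}_-| \ge 1$. Hence $|\mathcal{M}_0| \le N - 2$ for any eigenvalue $\lambda_k(A)$ with index $k > 1$. Suppose that $(x_k)_l = \min_{1 \le j \le N} (x_k)_j < 0$ and $(x_k)_m = \max_{1 \le j \le N} (x_k)_j > 0$. In the remainder of this article, we assume that $0 \le a_{ij} \le 1$ and $a_{jj} = 0$, as for an adjacency matrix. Assuming that $\lambda_k(A) > 0$, the eigenvalue equation (1.3) for component $l$ is

$$\lambda_k(A)\, (x_k)_l = \sum_{j=1}^{N} a_{lj} (x_k)_j \ge (|\mathcal{M}_-| - 1)(x_k)_l$$

The equation for component $m$ similarly gives

$$\lambda_k(A)\, (x_k)_m \le (|\mathcal{M}_+| - 1)(x_k)_m$$

Thus, provided $\lambda_k(A) > 0$, we have that $\lambda_k(A) \le |\mathcal{M}_-| - 1$ and $\lambda_k(A) \le |\mathcal{M}_+| - 1$, from which

$$\lambda_k(A) \le \min(|\mathcal{M}_-|, |\mathcal{M}_+|) - 1$$

Since $|\mathcal{M}_0| \ge 0$, we find that

$$\lambda_k(A) \le \frac{|\mathcal{M}_-| + |\mathcal{M}_+|}{2} - 1 = \frac{N - |\mathcal{M}_0|}{2} - 1$$

and, because $\min(|\mathcal{M}_-|, |\mathcal{M}_+|)$ is an integer,

$$\lambda_2(A) \le \left\lfloor \frac{N}{2} \right\rfloor - 1 \tag{8.56}$$

When $\lambda_k(A) < 0$, we obtain similarly

$$\lambda_k(A)\, (x_k)_l = |\lambda_k(A)| \, |(x_k)_l| \le |\mathcal{M}_+| \, (x_k)_m$$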
and $\lambda_k(A)\, (x_k)_m \ge |\mathcal{M}_-| \, (x_k)_l$, or $|\lambda_k(A)| \, (x_k)_m \le |\mathcal{M}_-| \, |(x_k)_l|$. Multiplying both inequalities, we deduce that

$$|\lambda_k(A)| \le \sqrt{|\mathcal{M}_+| \, |\mathcal{M}_-|} \le \frac{|\mathcal{M}_-| + |\mathcal{M}_+|}{2}$$

We have $\sqrt{|\mathcal{M}_+| \, |\mathcal{M}_-|} = \sqrt{|\mathcal{M}_+| \left(N - |\mathcal{M}_+| - |\mathcal{M}_0|\right)}$ and $|\mathcal{M}_0| \ge 0$. Hence, this quantity is maximal if $|\mathcal{M}_+| = \lfloor N/2 \rfloor$ and $|\mathcal{M}_0| = 0$, and the smallest eigenvalue of $A$ obeys

$$\lambda_N(A) \ge -\sqrt{\left\lfloor \frac{N}{2} \right\rfloor \left\lceil \frac{N}{2} \right\rceil} \tag{8.57}$$

In addition to this bound (8.57), the Perron-Frobenius Theorem 38 as well as Theorem 63 indicate that $\lambda_N(A) \ge -\lambda_1(A)$.

We end this section on non-negative matrices by pointing to yet another nice article by Fiedler and Pták (1962). It studies the class of real square matrices with non-positive off-diagonal elements, to which the Laplacian matrix of a graph belongs. We also mention totally positive matrices. An $n \times m$ matrix is said to be totally positive if all its minors are non-negative. The current state of the art is treated by Pinkus (2010), who shows that the eigenvalues of square, totally positive matrices are both real and non-negative.
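The bounds (8.54), (8.56) and (8.57), as well as Lemma 7, can be tried out on small adjacency matrices. In this sketch (Python with NumPy), the helper names and test graphs are hypothetical. The complete bipartite graph $K_{3,3}$ attains (8.57) with equality, since its smallest eigenvalue is $-3 = -\sqrt{3 \cdot 3}$. Adding an edge increases two symmetric elements, so Lemma 7 is applied twice.

```python
import numpy as np

def adjacency(n, edges):
    """Symmetric 0-1 adjacency matrix of an undirected graph (hypothetical helper)."""
    A = np.zeros((n, n))
    for a, b in edges:
        A[a, b] = A[b, a] = 1.0
    return A

def complete_bipartite(p, q):
    return adjacency(p + q, [(a, p + b) for a in range(p) for b in range(q)])

def check_bounds(A, y):
    """Assert (8.54) for a positive vector y, and (8.56) and (8.57)."""
    lam = np.linalg.eigvalsh(A)              # ascending: lam[-1] = lambda_1, lam[0] = lambda_N
    N = A.shape[0]
    ratio = (A @ y) / y                      # (Ay)_i / y_i
    assert ratio.min() - 1e-9 <= lam[-1] <= ratio.max() + 1e-9          # (8.54)
    assert lam[-2] <= N // 2 - 1 + 1e-9                                 # (8.56)
    assert lam[0] >= -np.sqrt((N // 2) * ((N + 1) // 2)) - 1e-9         # (8.57)
    return lam

# K_{3,3} attains (8.57): lambda_N = -3 = -sqrt(3 * 3)
lam = check_bounds(complete_bipartite(3, 3), np.ones(6))

# two 4-cliques joined by one edge; y = degree vector (positive)
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)] + [(3, 4)]
A = adjacency(8, edges)
check_bounds(A, A.sum(axis=1))

# Lemma 7: adding the edge 0-7 to this irreducible A strictly raises lambda_1
A_plus = A.copy()
A_plus[0, 7] = A_plus[7, 0] = 1.0
assert np.linalg.eigvalsh(A_plus)[-1] > np.linalg.eigvalsh(A)[-1]
```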
8.6 Positive (semi) definiteness

173. Positive definiteness. A matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if the quadratic form $x^T A x > 0$ for all non-zero vectors $x \in \mathbb{R}^n$. This definition implies that $A$ is nonsingular. Otherwise, there would exist a non-zero vector $x$ such that $Ax = 0$ and, hence, $x^T A x = 0$. We start with a basic property: if $A \in \mathbb{R}^{n \times n}$ is positive definite and $Y \in \mathbb{R}^{n \times k}$ has rank $k$, then the $k \times k$ matrix $B = Y^T A Y$ is also positive definite. Indeed, suppose that the non-zero vector $z \in \mathbb{R}^k$ satisfies $0 \ge z^T B z = (Yz)^T A Yz$. Then $Yz = 0$ by the positive definiteness of $A$. But $Y$ has full column rank, which implies that $z = 0$, leading to a contradiction. A consequence of the basic property is that all principal submatrices of $A$ are positive definite. In particular, all diagonal elements of a positive definite matrix $A$ are positive. To see this, choose for $Y$ the $k$ column vectors of the identity matrix $I_{n \times n}$ that correspond to the indices of the principal submatrix. Any principal submatrix of $A$ is then found as $B = Y^T A Y$, and the basic property demonstrates the consequence. If $A$ is positive semidefinite, then any principal submatrix of $A$ is also positive semidefinite. This property is less stringent than the basic property for positive definiteness, because $z^T B z = (Yz)^T A Yz \ge 0$ for any vector $Yz$.
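The basic property of art. 173 can be checked with a Cholesky factorization, which exists exactly when a symmetric matrix is positive definite. The sketch assumes Python with NumPy. `is_positive_definite` is our helper, and the random matrices are hypothetical; a random Gaussian $Y$ has full column rank with probability one.

```python
import numpy as np

def is_positive_definite(S):
    """A symmetric S is positive definite iff its Cholesky factor exists (hypothetical helper)."""
    try:
        np.linalg.cholesky(S)
        return True
    except np.linalg.LinAlgError:
        return False

rng = np.random.default_rng(6)
n, k = 6, 3
G = rng.standard_normal((n, n))
A = G @ G.T + 0.1 * np.eye(n)          # hypothetical positive definite matrix
Y = rng.standard_normal((n, k))        # n x k with rank k (almost surely)
B = Y.T @ A @ Y                        # k x k
assert is_positive_definite(A) and is_positive_definite(B)

# a principal submatrix is Y^T A Y with Y = selected columns of the identity
idx = [0, 2, 5]
Y_sel = np.eye(n)[:, idx]
assert np.allclose(Y_sel.T @ A @ Y_sel, A[np.ix_(idx, idx)])
assert is_positive_definite(A[np.ix_(idx, idx)])
assert np.all(np.diag(A) > 0)          # diagonal elements are positive
```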
174. Elements in a symmetric positive semidefinite matrix. If $A \in \mathbb{R}^{n \times n}$ is symmetric positive semidefinite, then

$$|a_{ij}| \le \frac{1}{2}(a_{ii} + a_{jj}) \tag{8.58}$$

$$|a_{ij}| \le \sqrt{a_{ii} a_{jj}} \tag{8.59}$$

Proof: We first show the arithmetic mean inequality (8.58). Positive semidefiniteness implies that $x^T A x \ge 0$ for any vector $x$. Choose now $x = e_i + e_j$ with $i \neq j$, where $e_i$ is a basis vector with all components zero except at position $i$, where it is one. Then $x^T A x = a_{ii} + a_{jj} + 2a_{ij} \ge 0$. Similarly, for $x = e_i - e_j$, we find $x^T A x = a_{ii} + a_{jj} - 2a_{ij} \ge 0$. Combining both inequalities demonstrates the arithmetic mean inequality. Since the inequality holds for all $i$ and $j$, it also implies

$$\max_{i,j} |a_{ij}| \le \max_j a_{jj}$$

The geometric mean inequality (8.59) follows by considering a principal submatrix of $A$, which is also positive semidefinite (art. 173). Without loss of generality, we can choose the principal submatrix $A_s = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and the vector $z = \begin{bmatrix} x \\ 1 \end{bmatrix}$. Then $0 \le z^T A_s z = a_{11} x^2 + 2a_{12} x + a_{22}$ for all real $x$, which requires that the discriminant satisfies $4a_{12}^2 - 4a_{11} a_{22} \le 0$. □

175. The Gram matrix associated to the vectors $a_1, a_2, \ldots, a_n$ is defined as

$$G = A^T A, \qquad A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$

so that $G_{ij} = a_i^T a_j$ and $G_{ii} = a_i^T a_i = \|a_i\|_2^2$ for $i = 1, \ldots, n$. The Gram matrix $G = A^T A$ is symmetric and positive semidefinite because $x^T G x = (Ax)^T Ax = \|Ax\|_2^2 \ge 0$. Art. 160 implies that all eigenvalues of $G$ are real and non-negative.

When a matrix $G$ is positive semidefinite and symmetric, we can find the matrix $A$ as a square root $A = \sqrt{G}$. Indeed, the eigenvalue decomposition is $G = U \operatorname{diag}(\lambda_k(G))\, U^T$. Here, $U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$ is an orthogonal matrix, $U^T U = U U^T = I$, formed by the scaled, real eigenvectors $u_k$ belonging to the eigenvalues $\lambda_k(G)$. Since all eigenvalues are real and non-negative, the numbers $\sqrt{\lambda_k(G)}$ are real, such that

$$G = U \operatorname{diag}(\lambda_k(G))\, U^T = U \operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) \operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) U^T = \left(\operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) U^T\right)^T \left(\operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) U^T\right)$$

Hence, we can choose $A = \operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) U^T$. Left-multiplying this solution by the orthogonal matrix $U$ yields the symmetric⁷ square root $U \operatorname{diag}\left(\sqrt{\lambda_k(G)}\right) U^T$. Moreover, if $R$ is an orthogonal matrix for which $R^T R = I$, then $\tilde{A} = RA$ has the same Gram matrix, since

$$\tilde{G} = \tilde{A}^T \tilde{A} = (RA)^T RA = A^T R^T R A = A^T A = G$$

Hence, given a solution $A$ of $G = A^T A$, all other solutions are found by orthogonal transformation. In summary, any symmetric, positive semidefinite matrix can be considered as a Gram matrix $G$ whose diagonal elements are non-negative, $G_{ii} \ge 0$. The non-negativeness of the diagonal elements was already demonstrated in art. 173 and art. 174.

⁷ The Cholesky method gives a solution $A = \sqrt{G}$ that is, in general, not symmetric. Another example of a non-symmetric "square root" matrix $A$ is given in art. 264.

176. If all eigenvalues are real and $\lambda_k \ge 0$, as in a symmetric, positive semidefinite matrix (art. 160), we can apply the general theorem of the arithmetic and geometric mean in several real variables $x_k \ge 0$, which is nicely treated by Hardy et al. (1999),
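$$\prod_{k=1}^{n} x_k^{q_k} \le \sum_{k=1}^{n} q_k x_k \tag{8.60}$$

where $q_k \ge 0$ and $\sum_{k=1}^{n} q_k = 1$. We apply (8.60) to (8.6) and (8.7) with $q_k = \frac{\alpha_k}{\sum_{j=1}^{n} \alpha_j}$ and $\alpha_k \ge 0$, which gives

$$\prod_{k=1}^{n} \lambda_k^{\alpha_k} \le \left( \frac{\sum_{k=1}^{n} \alpha_k \lambda_k}{\sum_{j=1}^{n} \alpha_j} \right)^{\sum_{j=1}^{n} \alpha_j}$$

By choosing $\alpha_j = 1$, we find the inequality

$$\det A \le \left( \frac{\operatorname{trace}(A)}{n} \right)^n$$

177. Let $M$ be a symmetric and positive semidefinite $n \times n$ matrix such that $Mu = 0$. Any square matrix whose $n$ row sums are zero has an eigenvalue zero with corresponding eigenvector $u$. Let $W$ denote the set of all column vectors $x$ that satisfy $x^T x = 1$ and $x^T u = 0$. If $M$ is positive semidefinite, then the second smallest eigenvalue is

$$\lambda_{n-1}(M) = \min_{x \in W} x^T M x \tag{8.61}$$

which follows from the Rayleigh inequalities in art. 152 and the fact that the smallest eigenvalue is $\lambda_n(M) = 0$.

Theorem 41 (Fiedler) Let $M$ be a symmetric, positive semidefinite $n \times n$ matrix with $Mu = 0$. Its second smallest eigenvalue $\lambda_{n-1}(M)$ obeys

$$\lambda_{n-1}(M) \le \frac{n}{n-1} \min_{1 \le j \le n} m_{jj} \tag{8.62}$$

In addition,

$$2 \max_{1 \le j \le n} \sqrt{m_{jj}} \le \sum_{k=1}^{n} \sqrt{m_{kk}} \tag{8.63}$$

and

$$2 \max_{1 \le j \le n} \sqrt{m_{jj} - \lambda_{n-1}(M)\left(1 - \frac{1}{n}\right)} \le \sum_{k=1}^{n} \sqrt{m_{kk} - \lambda_{n-1}(M)\left(1 - \frac{1}{n}\right)} \tag{8.64}$$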
Proof: Fiedler (1973) observes that the matrix

$$\tilde{M} = M - \lambda_{n-1}(M) \left( I - \frac{1}{n} J \right)$$

is also positive semidefinite. To see this, let $y$ be any vector in $\mathbb{R}^n$. Then $y$ can be written as $y = c_1 u + c_2 x$, where $x \in W$. We have $\tilde{M} u = 0$ because $Ju = nu$, and $Jx = u u^T x = 0$. It follows that

$$y^T \tilde{M} y = c_2^2\, x^T \tilde{M} x = c_2^2 \left( x^T M x - \lambda_{n-1}(M) \right) \ge 0$$

by (8.61). Any symmetric, positive semidefinite matrix can be considered as a Gram matrix, whose diagonal elements are non-negative (see art. 175). Hence, the minimum diagonal element of $\tilde{M}$ is non-negative,

$$0 \le \min_{1 \le j \le n} \tilde{m}_{jj} = \min_{1 \le j \le n} \left( m_{jj} - \lambda_{n-1}(M) \left( 1 - \frac{1}{n} \right) \right)$$

which proves (8.62). The symmetric matrix $M$ is also a Gram matrix, i.e., $M = A^T A$ and $m_{ij} = a_i^T a_j$. The fact that $Mu = 0$ translates to $Au = 0$, because $u^T M u = \|Au\|_2^2 = 0$. This implies that the column vectors $a_1, a_2, \ldots, a_n$ of $A$ obey $\sum_{k=1}^{n} a_k = 0$. Hence, $a_j = -\sum_{k=1; k \neq j}^{n} a_k$. Taking the Euclidean norm of both sides leads to

$$\|a_j\| \le \sum_{k=1; k \neq j}^{n} \|a_k\|$$

Since this inequality holds for any $1 \le j \le n$, it also holds for $\max_{1 \le j \le n} \|a_j\|$,

$$2 \max_{1 \le j \le n} \|a_j\| \le \sum_{k=1}^{n} \|a_k\|$$

With $m_{ii} = \|a_i\|^2$, we arrive at (8.63). Applied to $\tilde{M}$, this inequality yields (8.64). □
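Theorem 41 applies in particular to the Laplacian matrix of a graph. Its diagonal elements are the node degrees and its row sums vanish. Hence, (8.62) bounds the algebraic connectivity by $\frac{n}{n-1}$ times the minimum degree. The sketch below uses Python with NumPy on a hypothetical 7-node graph. It checks (8.62), (8.63), (8.64) and the positive semidefiniteness of $\tilde{M}$ from the proof.

```python
import numpy as np

# hypothetical connected graph on 7 nodes; M = diag(degrees) - A is its Laplacian,
# which is symmetric positive semidefinite with M u = 0 for the all-one vector u
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (4, 5), (5, 6), (6, 4)]
n = 7
A = np.zeros((n, n))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0
M = np.diag(A.sum(axis=1)) - A
assert np.allclose(M @ np.ones(n), 0)

lam = np.linalg.eigvalsh(M)          # ascending: lam[0] = 0, lam[1] = lambda_{n-1}(M)
mu = lam[1]
d = np.diag(M)                       # m_jj, here the node degrees
assert mu <= n / (n - 1) * d.min() + 1e-9                    # (8.62)
assert 2 * np.sqrt(d).max() <= np.sqrt(d).sum() + 1e-9       # (8.63)

# (8.64): the diagonal of M~ = M - mu (I - J/n) is m_jj - mu (1 - 1/n) >= 0
dt = d - mu * (1 - 1 / n)
assert dt.min() >= -1e-9
dt = np.clip(dt, 0, None)            # drop round-off below zero before the square root
assert 2 * np.sqrt(dt).max() <= np.sqrt(dt).sum() + 1e-9
M_tilde = M - mu * (np.eye(n) - np.ones((n, n)) / n)
assert np.linalg.eigvalsh(M_tilde).min() >= -1e-9            # M~ is positive semidefinite
```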
8.7 Interlacing

178. The resolvent. The resolvent of matrix $A$ is defined as $(xI - A)^{-1}$. It is related (art. 148) to the adjoint matrix $Q(x) = (xI - A)^{-1} c_A(x)$, where $c_A(x)$ is the characteristic polynomial of $A$. Recall the general expression (Meyer, 2000, p. 479) of the inverse of a matrix $B$,

$$B^{-1} = \frac{\operatorname{adj} B}{\det B}$$

where the adjugate $\operatorname{adj} B$ is the transpose of the matrix of cofactors of $B$. A cofactor of element $(i, j)$ is $(-1)^{i+j} \det \tilde{B}_{ij}$, where $\tilde{B}_{ij}$ is the matrix obtained from $B$ by deleting the $i$-th row and the $j$-th column. Applied to the diagonal element of $(xI - A)^{-1}$, this yields

$$(xI - A)^{-1}_{jj} = \frac{\det\left(xI - A_{\backslash\{j\}}\right)}{\det(xI - A)} \tag{8.65}$$

where $A_{\backslash\{j\}}$ is the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $j$-th row and column. Furthermore, from the general expression of the derivative of a determinant (Meyer, 2000, p. 471), we find that

$$\frac{d}{dx} \det(xI - A) = \sum_{j=1}^{n} \det\left(xI - A_{\backslash\{j\}}\right)$$

Hence,

$$\sum_{j=1}^{n} (xI - A)^{-1}_{jj} = \frac{\frac{d}{dx} \det(xI - A)}{\det(xI - A)} = \frac{d}{dx} \log \det(xI - A)$$

We apply (8.32) for a symmetric matrix $A$ to the function $f(y) = \frac{1}{x - y}$, which is everywhere analytic except for $y = x$. This yields

$$(xI - A)^{-1} = \sum_{k=1}^{t} \frac{1}{x - \lambda_k} E_k \tag{8.66}$$

where $k$ ranges over the $t$ distinct eigenvalues of $A$. After taking the trace of both sides and recalling from art. 157 that $\operatorname{trace}(E_k) = m_k$, we finally arrive at

$$\sum_{j=1}^{n} (xI - A)^{-1}_{jj} = \sum_{k=1}^{t} \frac{m_k}{x - \lambda_k} = \frac{d}{dx} \log \det(xI - A)$$
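The identities of art. 178 can be checked numerically at a point $x$ that is not an eigenvalue. In this sketch (Python with NumPy), the random symmetric matrix and the finite-difference step are our assumptions. The trace of the resolvent is compared with $\sum_k \frac{1}{x - \lambda_k}$ over all $n$ eigenvalues counted with multiplicity, which equals $\sum_k \frac{m_k}{x - \lambda_k}$. It is also compared with a central difference of $\log \det(xI - A)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                          # hypothetical real symmetric matrix
lam = np.linalg.eigvalsh(A)
x = lam.max() + 1.5                        # a point x that is not an eigenvalue
R = np.linalg.inv(x * np.eye(n) - A)       # resolvent (xI - A)^{-1}

def char_poly(M, x):
    """det(xI - M), evaluated numerically."""
    return np.linalg.det(x * np.eye(M.shape[0]) - M)

# (8.65): diagonal element = det(xI - A with row/column j deleted) / det(xI - A)
for j in range(n):
    A_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
    assert np.isclose(R[j, j], char_poly(A_j, x) / char_poly(A, x))

# trace of the resolvent = sum_k 1/(x - lambda_k) = d/dx log det(xI - A)
h = 1e-6
dlogdet = (np.log(char_poly(A, x + h)) - np.log(char_poly(A, x - h))) / (2 * h)
assert np.isclose(np.trace(R), np.sum(1 / (x - lam)))
assert np.isclose(np.trace(R), dlogdet, rtol=1e-6)
```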
179. Christoffel-Darboux formula for resolvents. The resolvent $(xI - A)^{-1}$ of matrix $A$ obeys

$$(xI - A)^{-1} - (yI - A)^{-1} = (y - x)(xI - A)^{-1}(yI - A)^{-1}$$

which is verified by left-multiplication by $(yI - A)$ and right-multiplication by $(xI - A)$. With $B^{-1} = \frac{\operatorname{adj} B}{\det B}$,

$$\frac{\operatorname{adj}(xI - A)}{\det(xI - A)} - \frac{\operatorname{adj}(yI - A)}{\det(yI - A)} = (y - x) \frac{\operatorname{adj}(xI - A) \operatorname{adj}(yI - A)}{\det(xI - A) \det(yI - A)}$$

We multiply both sides by $(y - x)^{-1} \det(xI - A) \det(yI - A)$. The element $(i, j)$ of the resulting matrix equation is

$$t_{ij} = \frac{\operatorname{adj}_{ij}(xI - A) \det(yI - A) - \operatorname{adj}_{ij}(yI - A) \det(xI - A)}{y - x} = \sum_{k=1}^{n} \operatorname{adj}_{ik}(xI - A) \operatorname{adj}_{kj}(yI - A)$$

The adjugate and the determinant of $xI - A$ are polynomials in $x$ of degree $n - 1$ and $n$, respectively. Hence, the Christoffel-Darboux identity reflects a polynomial identity, whose strength was first applied in the study of orthogonal polynomials (see art. 249). In particular, the limit $y \to x$ results in

$$\lim_{y \to x} t_{ij} = \operatorname{adj}_{ij}(xI - A) \frac{d}{dx} \det(xI - A) - \det(xI - A) \frac{d}{dx} \operatorname{adj}_{ij}(xI - A) = \sum_{k=1}^{n} \operatorname{adj}_{ik}(xI - A) \operatorname{adj}_{kj}(xI - A)$$

If $i = j$ and $A$ is symmetric (such that $\operatorname{adj} A$ is symmetric), then

$$\operatorname{adj}_{ii}(xI - A) \frac{d}{dx} \det(xI - A) - \det(xI - A) \frac{d}{dx} \operatorname{adj}_{ii}(xI - A) = \sum_{k=1}^{n} \left( \operatorname{adj}_{ik}(xI - A) \right)^2 \tag{8.67}$$

By the same arguments as in art. 254, the above expression implies that the zeros of the polynomial $p(x) = \operatorname{adj}_{ii}(xI - A)$ and those of the polynomial $q(x) = \det(xI - A)$ interlace. Here, we follow the proof of Godsil (1993). First, let $z$ be a zero of $p(x)$ with multiplicity $m_p$ and a zero of $q(x)$ with multiplicity $m_q$. Both terms on the left-hand side of (8.67) have a zero at $z$ of multiplicity at least $m_p + m_q - 1$. The right-hand side consists of a sum of squares, such that each term must have a zero at $z$ of multiplicity at least $m_p + m_q - 1$. Now, the $k = i$-th term has a zero at $z$ of multiplicity $2m_p$. Thus, $2m_p \ge m_p + m_q - 1$, implying that $m_p \ge m_q - 1$ and that $\frac{p(x)}{q(x)}$ is a rational function with only simple poles. Further, the residue $r_z$ of $\frac{p(x)}{q(x)}$ at $x \to z$ is

$$r_z = \lim_{x \to z} \frac{(x - z)\, p(x)}{q(x)} = \lim_{x \to z} \frac{p(x)}{q'(x)} \cdot \frac{(x - z)\, q'(x)}{q(x)} = m_q \lim_{x \to z} \frac{p(x)}{q'(x)}$$

because $q(x) = (x - z)^{m_q} s(x)$ and the polynomial $s(z) \neq 0$. At $x = z$ with $q(z) = 0$, (8.67) shows that $p(x)\, q'(x)$ is positive. A rational function $\frac{p(x)}{q(x)}$ with simple poles that possess a positive residue implies that the zeros of the polynomials $p(x)$ and $q(x)$ interlace. Art. 180 presents another proof.

180. Interlacing. For any $n \times 1$ vector $y$, we obtain from (8.66) for a symmetric matrix $A$ that

$$\varphi_y(x) = y^T (xI - A)^{-1} y = \sum_{k=1}^{t} \frac{y^T E_k y}{x - \lambda_k}$$
which implies that the rational function !| ({) has simple poles at the real eigenvalues of D. Dierentiation with respect to { yields X | W Hn | g!| ({) = 2 g{ ({ n ) q
n=1
246
Eigensystem of a matrix
Applying (8.32) to the function $f(y)=\frac{1}{(x-y)^2}$ shows that
$$(xI-A)^{-2}=\sum_{k=1}^{n}\frac{1}{(x-\lambda_k)^2}E_k$$
from which
$$y^T(xI-A)^{-2}y=\sum_{k=1}^{n}\frac{y^TE_ky}{(x-\lambda_k)^2}=-\frac{d\omega_y(x)}{dx}$$
Since
$$y^T(xI-A)^{-2}y=y^T(xI-A)^{-1}(xI-A)^{-1}y=\left((xI-A)^{-1}y\right)^T(xI-A)^{-1}y=\left\|(xI-A)^{-1}y\right\|_2^2>0$$
for any nonzero vector $y$, we observe that $\frac{d\omega_y(x)}{dx}$ is always strictly negative whenever $x$ is not a pole of $\omega_y(x)$. This implies that each zero of $\omega_y(x)$ must be simple and lie between two consecutive poles of $\omega_y(x)$, i.e., between two consecutive eigenvalues of $A$. As this result holds for any vector $y$, we find, by choosing $y=e_j$ equal to the $j$-th base vector, that $\omega_{e_j}(x)=\left((xI-A)^{-1}\right)_{jj}$. Art. 178, in particular (8.65), then indicates that $\det\left(xI-A_{\backslash\{j\}}\right)$ has simple zeros that lie between the zeros of $\det(xI-A)$. Hence, all eigenvalues of the symmetric matrix $A_{\backslash\{j\}}$ lie in between eigenvalues of $A=A^T$,
$$\lambda_{i+1}(A)\le\lambda_i\left(A_{\backslash\{j\}}\right)\le\lambda_i(A)$$
for any $1\le i\le n-1$. This property is called interlacing. Since it also applies to a principal submatrix of $A_{\backslash\{j\}}$ obtained by deleting a same row and column, we arrive at the general interlacing theorem:

Theorem 42 (Interlacing) For a real symmetric matrix $A_{n\times n}$ and any principal submatrix $B_{m\times m}$ of $A$ obtained by deleting $n-m$ same rows and columns in $A$, the eigenvalues of $B$ interlace with those of $A$ as
$$\lambda_{n-m+i}(A)\le\lambda_i(B)\le\lambda_i(A)\qquad(8.68)$$
for any $1\le i\le m$.

Also the zeros of orthogonal polynomials (art. 254) are interlaced. There is an interesting corollary of Theorem 42:

Corollary 1 Let $A$ be a real symmetric $n\times n$ matrix with eigenvalues $\lambda_n(A)\le\lambda_{n-1}(A)\le\cdots\le\lambda_1(A)$ and ordered diagonal elements $d_n\le d_{n-1}\le\cdots\le d_1$. Then, for any $1\le k\le n$, it holds that
$$\sum_{j=1}^{k}d_j\le\sum_{j=1}^{k}\lambda_j(A)$$
Proof: Let $B$ denote the principal submatrix of $A$ obtained by deleting the rows and columns containing the $n-k$ smallest diagonal elements $d_{k+1}\ge d_{k+2}\ge\cdots\ge d_n$. By (8.7), we have that $\operatorname{trace}(B)=\sum_{j=1}^{k}\lambda_j(B)$ and, by construction of $B$, $\operatorname{trace}(B)=\sum_{j=1}^{k}d_j$. The Interlacing Theorem provides the inequality (8.68), from which
$$\sum_{j=1}^{k}\lambda_j(B)\le\sum_{j=1}^{k}\lambda_j(A)$$
Combining the relations proves the corollary. □
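The interlacing inequality (8.68) and Corollary 1 can be checked numerically. The following is a minimal, dependency-free Python sketch, assuming plain nested lists for matrices; the helper names (`eigvals_sym`, `interlaces`, `corollary_1`) are introduced here for illustration only, and the eigenvalues are computed with cyclic Jacobi rotations, a standard method that is not discussed in the text.

```python
import math
import random


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def eigvals_sym(a, sweeps=60):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations,
    sorted as in the text: lambda_1 >= lambda_2 >= ... >= lambda_n."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation G chosen so that the (p, q) entry of G^T A G vanishes
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p], g[p][q], g[q][p], g[q][q] = c, s, -s, c
                a = matmul([list(col) for col in zip(*g)], matmul(a, g))
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_symmetric(n, rng):
    """Symmetric matrix with entries drawn uniformly from [-1, 1]."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.uniform(-1.0, 1.0)
    return a


def interlaces(a, keep, eps=1e-9):
    """Theorem 42: lambda_{n-m+i}(A) <= lambda_i(B) <= lambda_i(A), where the
    principal submatrix B keeps the rows/columns listed in `keep` (0-based)."""
    n, m = len(a), len(keep)
    la = eigvals_sym(a)
    lb = eigvals_sym([[a[i][j] for j in keep] for i in keep])
    # list index i corresponds to the 1-based index i + 1 of the text
    return all(la[n - m + i] - eps <= lb[i] <= la[i] + eps for i in range(m))


def corollary_1(a, eps=1e-9):
    """Sum of the k largest diagonal elements <= sum of the k largest eigenvalues."""
    d = sorted((a[i][i] for i in range(len(a))), reverse=True)
    lam = eigvals_sym(a)
    return all(sum(d[:k]) <= sum(lam[:k]) + eps for k in range(1, len(a) + 1))


A = random_symmetric(6, random.Random(7))
print(interlaces(A, [0, 2, 3]), corollary_1(A))  # deletes rows/columns 2, 5 and 6
```

Other `keep` lists or seeds should behave the same way; the tolerance `eps` only absorbs floating-point rounding and is not part of the theorem.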
181. Strict interlacing. We present yet another derivation of the interlacing principle that allows us to specify strict inequalities when we possess additional information. Let us consider the symmetric $n\times n$ matrix $A_n$, which we write in terms of the $(n-1)\times(n-1)$ symmetric matrix $A_{n-1}$ by adding the last column and row as
$$A_n=\begin{bmatrix}A_{n-1}&v_{(n-1)\times1}\\\left(v^T\right)_{1\times(n-1)}&a_{nn}\end{bmatrix}$$
where the $(n-1)\times1$ vector $v=(a_{1n},a_{2n},\ldots,a_{n-1,n})$. The characteristic polynomial of $A_n$ is, invoking (8.79),
$$\det(A_n-\lambda I)=\det\begin{bmatrix}A_{n-1}-\lambda I&v\\v^T&a_{nn}-\lambda\end{bmatrix}=\det(A_{n-1}-\lambda I)\det\left(a_{nn}-\lambda-v^T(A_{n-1}-\lambda I)^{-1}v\right)_{1\times1}$$
and
$$\det(A_n-\lambda I)=\left(a_{nn}-\lambda-v^T(A_{n-1}-\lambda I)^{-1}v\right)\det(A_{n-1}-\lambda I)\qquad(8.69)$$
For any symmetric matrix $A_{n-1}$, the resolvent can be expressed via (8.66) in art. 178 as
$$(A_{n-1}-\lambda I)^{-1}=\sum_{m=1}^{n-1}\frac{x_mx_m^T}{\lambda_m-\lambda}$$
where $x_1,x_2,\ldots,x_{n-1}$ are the orthonormal eigenvectors of $A_{n-1}$, belonging to the eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{n-1}$, respectively. Hence,
$$v^T(A_{n-1}-\lambda I)^{-1}v=\sum_{m=1}^{n-1}\frac{\left(v^Tx_m\right)^2}{\lambda_m-\lambda}$$
and (8.69) is written with the projection⁸ $c_m=v^Tx_m$ of the vector $v$ on the eigenvector $x_m$ as
$$\det(A_n-\lambda I)=\left(a_{nn}-\lambda-\sum_{m=1}^{n-1}\frac{c_m^2}{\lambda_m-\lambda}\right)\det(A_{n-1}-\lambda I)$$

⁸ The values $(c_1,c_2,\ldots,c_{n-1})$ can be regarded as the coordinates of the point $v$ in the $(n-1)$-dimensional space with respect to the coordinate axes generated by the orthogonal eigenvectors of $A_{n-1}$.
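Since $\det(A_{n-1}-\lambda I)=\prod_{k=1}^{n-1}(\lambda_k-\lambda)$ as shown in art. 138, we find the characteristic polynomial of $A_n$,
$$c_{A_n}(\lambda)=\det(A_n-\lambda I)=(a_{nn}-\lambda)\prod_{k=1}^{n-1}(\lambda_k-\lambda)-\sum_{m=1}^{n-1}c_m^2\prod_{k=1;k\neq m}^{n-1}(\lambda_k-\lambda)\qquad(8.70)$$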
Equation (8.70) shows that
$$c_{A_n}(\lambda_q)=(-1)^{n-q}\,c_q^2\prod_{k=1}^{q-1}|\lambda_k-\lambda_q|\prod_{k=q+1}^{n-1}|\lambda_k-\lambda_q|\qquad(8.71)$$
A number of interesting conclusions can be derived from (8.70) and (8.71). First, if $\lambda_q=\lambda_{q+1}$ is an eigenvalue of $A_{n-1}$ with multiplicity larger than 1, then (8.71) indicates that $c_{A_n}(\lambda_q)=0$, implying that $\lambda_q$ is also an eigenvalue of $A_n$. Eigenvalues of $A_{n-1}$ with multiplicity exceeding 1 are found as a degenerate case of the simple eigenvalue situation when $\lambda_q\to\lambda_{q+l}$ with $l\ge1$. Thus, we assume next that all eigenvalues of $A_{n-1}$ are distinct and simple, such that the product of the absolute values of the differences of eigenvalues in (8.71) is strictly positive. Then, (8.71) shows that the eigenvalue $\lambda_q$ of $A_{n-1}$ cannot be an eigenvalue of $A_n$, unless $c_q=0$, which means that $v$ is orthogonal to the eigenvector $x_q$. If $v$ is not orthogonal to any eigenvector of $A_{n-1}$, then $c_m\neq0$ for $1\le m\le n-1$ and $c_{A_n}(\lambda_q)\neq0$ for $1\le q\le n-1$. Moreover, $c_{A_n}(\lambda_q)$ alternates in sign: it is negative for $q=n-1$, positive for $q=n-2$, again negative for $q=n-3$, etc. Since the polynomial $c_{A_n}(x)=(-x)^n+O\left(x^{n-1}\right)$ for large $x$, as follows from (8.70), there is a zero smaller than $\lambda_{n-1}$ (because $c_{A_n}(\lambda_{n-1})<0$ and $\lim_{x\to-\infty}c_{A_n}(x)>0$) and a zero larger than $\lambda_1$ (because $(-1)^{n-1}c_{A_n}(\lambda_1)>0$ and $\lim_{x\to\infty}(-1)^{n-1}c_{A_n}(x)<0$). Since all zeros of $c_{A_n}(x)$ are real (art. 151) and the total number of zeros is $n$ (art. 196), all zeros of $c_{A_n}(x)$ are simple and there is precisely one zero of $c_{A_n}(x)$ in between two consecutive zeros of $c_{A_{n-1}}(x)$. This argument presents another derivation of the interlacing principle in art. 180. But the conclusion is more precise and akin to interlacing for orthogonal polynomials (art. 254): if the vector $v$ is not orthogonal to any eigenvector of $A_{n-1}$, which is equivalent to the requirement that $c_m\neq0$ for $1\le m\le n-1$, then the interlacing is strict in the sense that
$$\lambda_n(A_n)<\lambda_{n-1}(A_{n-1})<\lambda_{n-1}(A_n)<\cdots<\lambda_1(A_{n-1})<\lambda_1(A_n)$$
Only if $v$ is orthogonal to some eigenvectors are the corresponding eigenvalues the same for $A_n$ and $A_{n-1}$.
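The strict interlacing of art. 181, and the case treated next in the text where $v$ is proportional to an eigenvector, can be observed numerically. In the sketch below (pure Python; `eig_sym` and `border` are names introduced here, not from the text), a Jacobi eigensolver also accumulates the orthonormal eigenvectors $x_m$ of $A_{n-1}$, so that $v$ can be chosen either generic or equal to $2x_q$. The dimension, the random seed and the value $a_{nn}=0.5$ are arbitrary assumptions.

```python
import math
import random


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def eig_sym(a, sweeps=60):
    """Cyclic Jacobi rotations for a real symmetric matrix. Returns the eigenvalues
    lambda_1 >= ... >= lambda_n and the matching orthonormal eigenvectors."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # product of rotations
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p], g[p][q], g[q][p], g[q][q] = c, s, -s, c
                a = matmul([list(col) for col in zip(*g)], matmul(a, g))
                v = matmul(v, g)  # the columns of v converge to eigenvectors
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[i][k] for i in range(n)] for k in order]


def border(a, v, a_nn):
    """A_n = [[A_{n-1}, v], [v^T, a_nn]] as in art. 181."""
    return [row + [v[i]] for i, row in enumerate(a)] + [list(v) + [a_nn]]


rng = random.Random(3)
m = 4  # dimension of A_{n-1}, so n = 5
A1 = [[0.0] * m for _ in range(m)]
for i in range(m):
    for j in range(i, m):
        A1[i][j] = A1[j][i] = rng.uniform(-1.0, 1.0)
lam1, x = eig_sym(A1)

# generic v: no projection c_m = v^T x_m vanishes, so the interlacing is strict
v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
lam = eig_sym(border(A1, v, 0.5))[0]
strict = all(lam[k] > lam1[k] > lam[k + 1] for k in range(m))

# v = 2 x_q: A_n keeps the other n - 2 eigenvalues of A_{n-1}, but not lambda_q
q = 1
lam_q = eig_sym(border(A1, [2.0 * t for t in x[q]], 0.5))[0]
shared = all(min(abs(l - lam1[k]) for l in lam_q) < 1e-9 for k in range(m) if k != q)
lost = min(abs(l - lam1[q]) for l in lam_q) > 1e-6
print(strict, shared, lost)
```

The tolerances `1e-9` and `1e-6` are ad hoc choices for double precision; with a random $A_{n-1}$ the eigenvalue gaps are many orders of magnitude larger.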
If $v$ is proportional to an eigenvector, say $v=c_qx_q$, then $c_m=0$ for all $1\le m\le n-1$, except when $m=q$, such that (8.70) reduces to
$$\det(A_n-\lambda I)=(a_{nn}-\lambda)\prod_{k=1}^{n-1}(\lambda_k-\lambda)-c_q^2\prod_{k=1;k\neq q}^{n-1}(\lambda_k-\lambda)=\left\{(a_{nn}-\lambda)(\lambda_q-\lambda)-c_q^2\right\}\prod_{k=1;k\neq q}^{n-1}(\lambda_k-\lambda)$$
which shows that $\det(A_n-\lambda I)$ and $\det(A_{n-1}-\lambda I)$ have $n-2$ zeros in common and only $\lambda_q$ and the zeros of the quadratic equation are different. Indeed, $\lambda_q$ is not a zero of $p_2(\lambda)=(a_{nn}-\lambda)(\lambda_q-\lambda)-c_q^2$ because $p_2(\lambda_q)=-c_q^2\neq0$, by construction. This observation is readily extended: if $v$ is a linear combination of $l$ eigenvectors, then there are $n-1-l$ eigenvalues in common. From (8.70) with $c_{l+1}=c_{l+2}=\cdots=c_{n-1}=0$, we have
$$\det(A_n-\lambda I)=(a_{nn}-\lambda)\prod_{k=1}^{n-1}(\lambda_k-\lambda)-\sum_{m=1}^{l}c_m^2\prod_{k=1;k\neq m}^{n-1}(\lambda_k-\lambda)=p_{l+1}(\lambda)\prod_{k=l+1}^{n-1}(\lambda_k-\lambda)$$
where the polynomial is
$$p_{l+1}(\lambda)=(a_{nn}-\lambda)\prod_{k=1}^{l}(\lambda_k-\lambda)-\sum_{m=1}^{l}c_m^2\prod_{k=1;k\neq m}^{l}(\lambda_k-\lambda)$$
We see that $p_{l+1}(\lambda_q)=-c_q^2\prod_{k=1;k\neq q}^{l}(\lambda_k-\lambda_q)\neq0$, for $1\le q\le l$, by construction (the eigenvalues $\lambda_k$ are distinct and $c_q\neq0$), and that the $l+1$ real zeros of $p_{l+1}(\lambda)$ determine the zeros of $c_{A_n}(x)$ that are different from those of $c_{A_{n-1}}(x)$. In summary, if we build up the matrix $A_n$ by iterating from $n=2$ and requiring in each iteration $i$ that the corresponding $(i-1)\times1$ vector $v$ is not orthogonal to any eigenvector of $A_{i-1}$, then each matrix in the sequence $A_2,A_3,\ldots,A_n$ has only simple eigenvalues, that all interlace over $2\le i\le n$. Their associated characteristic polynomials are very likely a set of orthogonal polynomials. In order to have simple, distinct eigenvalues, it is sufficient for $A_2$ that all elements in the upper triangular part including the diagonal are different. However, the statement that "the symmetric matrix $A_n$ has only real, simple eigenvalues provided all its upper triangular (including the diagonal) elements are different" is not correct for $n>2$, as follows from the counterexample⁹
$$A_3=\begin{bmatrix}9&3&6\\3&1&2\\6&2&4\end{bmatrix}=A_3^T$$
because the eigenvalues of $A_3$ are $14,0,0$. In fact, $A_3$ is, up to a simultaneous permutation of its rows and columns, the $n=3$ case of a Fibonacci product matrix $A_n$, with elements $a_{ij}=F_{i+1}F_{j+1}$, where $F_i$ denotes the $i$-th Fibonacci number. Since $F_{i+1}F_{j+1}$ and $F_{k+1}F_{l+1}$, with $i\le j$ and $k\le l$, are only equal if $i=k$ and $j=l$, all elements in the upper triangular part are different. Since all rows are multiples of the vector $(F_2,F_3,\ldots,F_{n+1})$, so that $A_n$ has rank one, we have $n-1$ eigenvalues equal to 0 and one eigenvalue equal to the sum of the diagonal elements, $\sum_{j=1}^{n}F_{j+1}^2$. Although not correct for $n>2$, we provide a probabilistic argument that the statement is in most, but not all, cases correct. A random vector $v$ has almost surely all real elements (components) different. In addition, such a random vector $v$ is almost never orthogonal to any of the $n-1$ given orthogonal eigenvectors of $A_{n-1}$, that span the $(n-1)$-dimensional space. Intuitively, one may think of a unit sphere in $n=3$ dimensions in which the eigenvectors form an orthogonal coordinate axis. The (normalized) vector $v$ is a point on the surface of that unit sphere. The orthogonality requirement translates to three circles on the sphere's surface, each of them passing through two orthogonal eigenvector points. The vector $v$ is not allowed to lie on such a circle. These circles occupy a region with negligible area, thus they have Lebesgue measure zero on that surface. Hence, the probability that $v$ coincides with such a forbidden region is zero. Geometric generalizations to higher dimensions are difficult to imagine, but the argument, that the forbidden "orthogonality" regions have a vanishingly small probability to be occupied by a random vector, also holds for $n>3$. In practice, most matrices $A_n$ that obey the statement have distinct eigenvalues.

⁹ This example is due to F.A. Kuipers.

182. General interlacing. Art. 180, in particular Theorem 42, has been generalized by Haemers (1995):

Theorem 43 (Generalized Interlacing) Let $A$ be a real symmetric $n\times n$ matrix and $S$ be a real $n\times m$ matrix with orthonormal columns, satisfying $S^TS=I$. Denote by $v_k$ the eigenvector belonging to the eigenvalue $\lambda_k(B)$ of the $m\times m$ matrix $B=S^TAS$.
Then, (i) the eigenvalues of $B$ interlace with those of $A$; (ii) if $\lambda_k(B)=\lambda_k(A)$ (or $\lambda_k(B)=\lambda_{n-m+k}(A)$) for some $k\in[1,m]$, then $Sv_k$ is an eigenvector of $A$ belonging to $\lambda_k(A)$ (respectively, to $\lambda_{n-m+k}(A)$); (iii) if there exists an integer $k\in[0,m]$ such that $\lambda_j(B)=\lambda_j(A)$ for $1\le j\le k$ and $\lambda_j(B)=\lambda_{n-m+j}(A)$ for $k+1\le j\le m$, then $SB=AS$.

Proof: Rayleigh's inequalities (8.27) in art. 152, applied to an $m\times1$ vector $s_j$ belonging to the space spanned by the eigenvectors $\{v_1,v_2,\ldots,v_j\}$, are
$$\frac{s_j^TBs_j}{s_j^Ts_j}\ge\lambda_j(B)$$
Since
$$\frac{s_j^TBs_j}{s_j^Ts_j}=\frac{(Ss_j)^TASs_j}{(Ss_j)^TSs_j}$$
Rayleigh's principle, now applied to the vector $Ss_j$, states that the right-hand side is at most $\lambda_j(A)$ provided $Ss_j$ belongs to the space spanned by $\{x_j,x_{j+1},\ldots,x_n\}$, the last $n+1-j$ eigenvectors of $A$. In that case, $Ss_j$ can be written as a linear combination,
$$Ss_j=\sum_{k=j}^{n}c_kx_k$$
Left-multiplying by $S^T$ and using $S^TS=I$ gives
$$s_j=\sum_{k=j}^{n}c_kS^Tx_k$$
Hence, if we choose $s_j$ belonging to the space spanned by $\{v_1,v_2,\ldots,v_j\}$ and orthogonal to the space spanned by $\{S^Tx_1,S^Tx_2,\ldots,S^Tx_{j-1}\}$ (a nonzero such $s_j$ exists, because $j-1$ orthogonality conditions cannot annihilate a $j$-dimensional space), then $Ss_j$ is orthogonal to $x_1,\ldots,x_{j-1}$ and $\lambda_j(B)\le\lambda_j(A)$ for any $1\le j\le m$. If the same reasoning is applied to $-A$ and $-B$, we obtain $\lambda_j(B)\ge\lambda_{n-m+j}(A)$, thereby proving (i). Equality in the Rayleigh inequalities, $\lambda_j(B)=\lambda_j(A)$, means that $s_j=v_j$ is an eigenvector of $B$ belonging to the eigenvalue $\lambda_j(B)$ and that $Ss_j=Sv_j=x_j$ is an eigenvector of $A$ belonging to the eigenvalue $\lambda_j(A)$. This proves (ii). The last point (iii) implies, using (ii), that $Sv_1,Sv_2,\ldots,Sv_m$ is an orthonormal set of eigenvectors of $A$ belonging to the eigenvalues $\lambda_1(B),\lambda_2(B),\ldots,\lambda_m(B)$. Left-multiplying the eigenvalue equation $Bv_j=\lambda_j(B)v_j$ by $S$ yields
$$SBv_j=\lambda_j(B)Sv_j=\lambda_j(A)x_j=Ax_j=ASv_j$$
from which $SB=AS$ follows because the $m$ eigenvectors $v_1,\ldots,v_m$ span the $m$-dimensional space. □

By choosing $S=\begin{bmatrix}I_{m\times m}&O_{m\times(n-m)}\end{bmatrix}^T$, we find that $B$ is just a principal submatrix of $A$. This observation shows that Theorem 42, which was already known to Cauchy, is a special case of the general Theorem 43.

183. Interlacing and the sum $A+B$.

Lemma 8 For symmetric $n\times n$ matrices $A,B$, it holds that
$$\lambda_k(A)+\lambda_n(B)\le\lambda_k(A+B)\le\lambda_k(A)+\lambda_1(B)\qquad(8.72)$$
Proof: The proof is based on Rayleigh's inequalities (art. 152) of eigenvalues (see, e.g., Wilkinson (1965, pp. 101-102)). □

An extension of Lemma 8 is, for $k+j-1\le n$,
$$\lambda_{k+j-1}(A+B)\le\lambda_k(A)+\lambda_j(B)$$
which is also called an interlacing property. These inequalities are also known as the Courant-Weyl inequalities and also hold for Hermitian matrices.
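A numerical sketch of Lemma 8 and of the Courant-Weyl inequalities for two random symmetric matrices. It uses a plain cyclic Jacobi eigensolver (`eigvals_sym`, a name introduced here) as a stand-in for a proper library routine, and a small tolerance `eps` to absorb rounding; a passing check illustrates, but of course does not prove, the inequalities.

```python
import math
import random


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def eigvals_sym(a, sweeps=60):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations,
    sorted as lambda_1 >= lambda_2 >= ... >= lambda_n."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p], g[p][q], g[q][p], g[q][q] = c, s, -s, c
                a = matmul([list(col) for col in zip(*g)], matmul(a, g))
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_symmetric(n, rng):
    """Symmetric matrix with entries drawn uniformly from [-1, 1]."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.uniform(-1.0, 1.0)
    return a


def weyl_checks(a, b, eps=1e-9):
    """Return the truth values of (8.72) and of the Courant-Weyl inequalities."""
    n = len(a)
    la, lb = eigvals_sym(a), eigvals_sym(b)
    lab = eigvals_sym([[a[i][j] + b[i][j] for j in range(n)] for i in range(n)])
    # (8.72): lambda_k(A) + lambda_n(B) <= lambda_k(A + B) <= lambda_k(A) + lambda_1(B)
    lemma_8 = all(la[k] + lb[-1] - eps <= lab[k] <= la[k] + lb[0] + eps for k in range(n))
    # lambda_{k+j-1}(A + B) <= lambda_k(A) + lambda_j(B); with 0-based k, j the left
    # index becomes k + j and the condition k + j - 1 <= n becomes k + j <= n - 1
    courant_weyl = all(lab[k + j] <= la[k] + lb[j] + eps
                       for k in range(n) for j in range(n - k))
    return lemma_8, courant_weyl


rng = random.Random(5)
A, B = random_symmetric(5, rng), random_symmetric(5, rng)
print(weyl_checks(A, B))
```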
Lemma 9 If
$$X=\begin{bmatrix}A&C\\C^T&B\end{bmatrix}$$
is a real symmetric matrix, where $A$ and $B$ are square, and consequently symmetric, matrices, then
$$\lambda_{\max}(X)+\lambda_{\min}(X)\le\lambda_{\max}(A)+\lambda_{\max}(B)\qquad(8.73)$$
Proof: See, e.g., Biggs (1996, p. 56). □
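Theorem 44 (Wielandt-Hoffman) For symmetric matrices $A$ and $B$, it holds that
$$\sum_{k=1}^{n}\left(\lambda_k(A+B)-\lambda_k(A)\right)^2\le\sum_{k=1}^{n}\lambda_k^2(B)\qquad(8.74)$$
Proof: See, e.g., Wilkinson (1965, pp. 104-108). □

We can rewrite (8.74) with $C=A+B$ and $B=C-A$ as
$$\sum_{k=1}^{n}\lambda_k^2(A)+\sum_{k=1}^{n}\lambda_k^2(C)-\sum_{k=1}^{n}\lambda_k^2(C-A)\le2\sum_{k=1}^{n}\lambda_k(A)\lambda_k(C)$$
Using (8.20), we have
$$2\sum_{k=1}^{n}\lambda_k(A)\lambda_k(C)\ge\operatorname{trace}\left(A^2\right)+\operatorname{trace}\left(C^2\right)-\operatorname{trace}\left((C-A)^2\right)=\operatorname{trace}\left(A^2+C^2-(C-A)^2\right)=\operatorname{trace}(CA+AC)=2\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}c_{ij}$$
Hence, an equivalent form of the Wielandt-Hoffman Theorem 44 for symmetric matrices $A$ and $B$ is
$$\operatorname{trace}(AB)\le\sum_{k=1}^{n}\lambda_k(A)\lambda_k(B)\qquad(8.75)$$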
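The Wielandt-Hoffman inequality (8.74), its equivalent form (8.75) and Lemma 9 can be probed in the same way. This is an illustrative sketch only: the random symmetric test matrices, the plain Jacobi eigensolver `eigvals_sym` (introduced here) and the rounding tolerance `eps` are all assumptions of the example. For symmetric $B$, $\operatorname{trace}(AB)=\sum_{i,j}a_{ij}b_{ij}$, which is what the code evaluates.

```python
import math
import random


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def eigvals_sym(a, sweeps=60):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations,
    sorted as lambda_1 >= lambda_2 >= ... >= lambda_n."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p], g[p][q], g[q][p], g[q][q] = c, s, -s, c
                a = matmul([list(col) for col in zip(*g)], matmul(a, g))
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_symmetric(n, rng):
    """Symmetric matrix with entries drawn uniformly from [-1, 1]."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.uniform(-1.0, 1.0)
    return a


def wielandt_hoffman(a, b, eps=1e-9):
    """Return the truth values of (8.74) and of its equivalent form (8.75)."""
    n = len(a)
    la, lb = eigvals_sym(a), eigvals_sym(b)
    lab = eigvals_sym([[a[i][j] + b[i][j] for j in range(n)] for i in range(n)])
    lhs_874 = sum((s - t) ** 2 for s, t in zip(lab, la))
    rhs_874 = sum(t ** 2 for t in lb)
    trace_ab = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    rhs_875 = sum(s * t for s, t in zip(la, lb))
    return lhs_874 <= rhs_874 + eps, trace_ab <= rhs_875 + eps


def lemma_9(a, b, c, eps=1e-9):
    """(8.73) for X = [[A, C], [C^T, B]] with symmetric A (n x n) and B (m x m)."""
    n, m = len(a), len(b)
    x = [a[i] + c[i] for i in range(n)] + [[c[i][j] for i in range(n)] + b[j] for j in range(m)]
    lx, la, lb = eigvals_sym(x), eigvals_sym(a), eigvals_sym(b)
    return lx[0] + lx[-1] <= la[0] + lb[0] + eps


rng = random.Random(11)
A, B = random_symmetric(5, rng), random_symmetric(5, rng)
A3, B2 = random_symmetric(3, rng), random_symmetric(2, rng)
C32 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]
print(wielandt_hoffman(A, B), lemma_9(A3, B2, C32))
```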
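8.8 Eigenstructure of the product AB

184. Eigenvalues of the product AB.

Lemma 10 For all matrices $A_{n\times m}$ and $B_{m\times n}$ with $n\ge m$, it holds that $\lambda(AB)=\lambda(BA)$ and $\lambda(AB)$ has $n-m$ extra zero eigenvalues.

Proof: Consider the matrix identities
$$\begin{bmatrix}I_{n\times n}&O_{n\times m}\\-B_{m\times n}&\lambda I_{m\times m}\end{bmatrix}\begin{bmatrix}\lambda I_{n\times n}&A_{n\times m}\\B_{m\times n}&\lambda I_{m\times m}\end{bmatrix}=\begin{bmatrix}\lambda I_{n\times n}&A_{n\times m}\\O_{m\times n}&\left(\lambda^2I-BA\right)_{m\times m}\end{bmatrix}$$
and
$$\begin{bmatrix}\lambda I_{n\times n}&-A_{n\times m}\\O_{m\times n}&I_{m\times m}\end{bmatrix}\begin{bmatrix}\lambda I_{n\times n}&A_{n\times m}\\B_{m\times n}&\lambda I_{m\times m}\end{bmatrix}=\begin{bmatrix}\left(\lambda^2I-AB\right)_{n\times n}&O_{n\times m}\\B_{m\times n}&\lambda I_{m\times m}\end{bmatrix}$$
Taking the determinants of both sides of each identity and denoting
$$X=\begin{bmatrix}\lambda I_{n\times n}&A_{n\times m}\\B_{m\times n}&\lambda I_{m\times m}\end{bmatrix}$$
gives, respectively,
$$\lambda^m\det X=\lambda^n\det\left(\lambda^2I-BA\right)$$
$$\lambda^n\det X=\lambda^m\det\left(\lambda^2I-AB\right)$$
from which it follows, with $\mu=\lambda^2$, that $\mu^{n-m}\det(\mu I-BA)=\det(\mu I-AB)$, which is an equation of two polynomials in $\mu$. Equating corresponding powers in $\mu$ proves Lemma 10. □

Lemma 11 If square matrices $A_{n\times n}$ and $B_{n\times n}$ commute, such that $AB=BA$, and all $n$ eigenvalues of $A$ are distinct (so that its $n$ eigenvectors are independent), then every eigenvector of $A$ is also an eigenvector of $B$. The converse more generally holds: if any two matrices $A$ and $B$ have a common complete set of eigenvectors, then $AB=BA$.

Proof: If $x_k$ is an eigenvector of $A$ corresponding to eigenvalue $\lambda_k$, then $Ax_k=\lambda_kx_k$. Left-multiplying both sides by $B$ and using the commutative property yields $A(Bx_k)=\lambda_k(Bx_k)$, which implies that, to any eigenvector $x_k$ with eigenvalue $\lambda_k$, the matrix $A$ also possesses an eigenvector $Bx_k$ (unless $Bx_k=0$) with the same eigenvalue $\lambda_k$. Since $\lambda_k$ is a simple eigenvalue, its eigenspace is spanned by $x_k$ alone, so that $Bx_k=\mu_kx_k$, which means that $x_k$ is also an eigenvector of $B$. The converse follows from art. 142 since $A=X\operatorname{diag}(\lambda_k)X^{-1}$ and, similarly, $B=X\operatorname{diag}(\mu_k)X^{-1}$. Indeed,
$$AB=X\operatorname{diag}(\lambda_k)X^{-1}X\operatorname{diag}(\mu_k)X^{-1}=X\operatorname{diag}(\lambda_k\mu_k)X^{-1}$$
$$BA=X\operatorname{diag}(\mu_k)X^{-1}X\operatorname{diag}(\lambda_k)X^{-1}=X\operatorname{diag}(\mu_k\lambda_k)X^{-1}$$
shows that $AB=BA$. □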
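Lemma 10 can be verified exactly on a small integer example. The sketch below computes characteristic polynomials with the Faddeev-LeVerrier recursion (a classical method that the text does not use, chosen here because it runs in exact rational arithmetic with Python's `fractions` module) and checks $\det(\mu I-AB)=\mu^{n-m}\det(\mu I-BA)$ for an arbitrary $3\times2$ matrix $A$ and $2\times3$ matrix $B$.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def char_poly(a):
    """Coefficients [c_0, c_1, ..., c_n] of det(mu I - A), computed exactly with
    the Faddeev-LeVerrier recursion M_k = A M_{k-1} + c_{n-k+1} I,
    c_{n-k} = -trace(A M_k) / k, starting from M_0 = O and c_n = 1."""
    n = len(a)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    mk = [[Fraction(0)] * n for _ in range(n)]  # M_0 = O
    for k in range(1, n + 1):
        am = matmul(a, mk)
        mk = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        am = matmul(a, mk)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c


A = [[1, 2], [0, -1], [3, 1]]   # n x m with n = 3, m = 2
B = [[2, -1, 0], [1, 1, 4]]     # m x n
p_ab = char_poly(matmul(A, B))  # degree n
p_ba = char_poly(matmul(B, A))  # degree m
# Lemma 10: det(mu I - AB) = mu^(n - m) det(mu I - BA), i.e. shift the coefficients
print(p_ab == [Fraction(0)] * (len(A) - len(B)) + p_ba)  # prints True
```

Here $BA=\begin{bmatrix}2&5\\13&5\end{bmatrix}$ has characteristic polynomial $\mu^2-7\mu-55$, and $AB$ has the same nonzero eigenvalues plus one extra zero eigenvalue.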
If all eigenvalues are distinct, all eigenvectors are independent. However, in case of multiple eigenvalues, the situation can be more complex such that there are fewer than $n$ independent eigenvectors. In that case, Lemma 11 is not applicable. A direct consequence of Lemma 11 is that, for commuting matrices $A$ and $B$, the eigenvalues of $A+B$ are $\lambda_k+\mu_k$ and both eigenvalues belong to the same eigenvector $x_k$. If matrices are not commuting, remarkably little can be said about the eigenvalues of $A+B$, given the spectra of $A$ and $B$ (see also art. 154).

185. Kronecker product. The Kronecker product of the $n\times m$ matrix $A$ and the $p\times q$ matrix $B$ is the $np\times mq$ matrix $A\otimes B$, where
$$A\otimes B=\begin{bmatrix}a_{11}B&a_{12}B&\cdots&a_{1m}B\\a_{21}B&a_{22}B&\cdots&a_{2m}B\\\vdots&\vdots&\ddots&\vdots\\a_{n1}B&a_{n2}B&\cdots&a_{nm}B\end{bmatrix}$$
The Kronecker product $A\otimes B$ features many properties (Meyer, 2000, p. 597). The eigenvalues of $A_{n\times n}\otimes B_{m\times m}$ are the $nm$ numbers $\{\lambda_j(A)\lambda_k(B)\}_{1\le j\le n,1\le k\le m}$. Likewise, the set of eigenvalues of $I_m\otimes A_{n\times n}+B_{m\times m}\otimes I_n$ equals the set of $nm$ eigenvalues $\{\lambda_j(A)+\lambda_k(B)\}_{1\le j\le n,1\le k\le m}$.

186. The commutator of a matrix. Consider the matrix equation
$$A_{n\times n}X_{n\times m}+X_{n\times m}B_{m\times m}=C_{n\times m}$$
that includes the commutator equation, $AX-XA=O$, where $X$ are all matrices that commute with $A$, as a special case, as well as the Lyapunov equation (Horn and Johnson, 1991, Chapter 4). The matrix equation is written in Kronecker form as
$$\left(I_m\otimes A_{n\times n}+B_{m\times m}^T\otimes I_n\right)\operatorname{vec}(X)=\operatorname{vec}(C)\qquad(8.76)$$
where the $nm\times1$ vector is
$$\operatorname{vec}(X)=\left(x_1^T,x_2^T,\ldots,x_m^T\right)^T=(x_{11},\ldots,x_{n1},x_{12},\ldots,x_{n2},\ldots,x_{1m},\ldots,x_{nm})^T$$
where $x_j$ is the $j$-th $n\times1$ column vector of $X$. The mixed-product property (Meyer, 2000, p. 597),
$$(A_1\otimes B_1)(A_2\otimes B_2)=A_1A_2\otimes B_1B_2$$
shows that
$$(I_m\otimes A_{n\times n})(B_{m\times m}\otimes I_n)=B_{m\times m}\otimes A_{n\times n}=(B_{m\times m}\otimes I_n)(I_m\otimes A_{n\times n})$$
In other words, the square $mn\times mn$ matrices $I_m\otimes A_{n\times n}$ and $B_{m\times m}\otimes I_n$ commute. Horn and Johnson (1991) prove that, if $w_k$ is an $n\times1$ eigenvector of $A$ belonging to $\lambda_k(A)$ and $y_l$ an $m\times1$ eigenvector of $B$ belonging to $\lambda_l(B)$, then $y_l\otimes w_k$ is an $nm\times1$ eigenvector of $I_m\otimes A_{n\times n}+B_{m\times m}\otimes I_n$ belonging to the eigenvalue $\lambda_k(A)+\lambda_l(B)$. The linear equation (8.76) has a unique solution provided none of the eigenvalues $\lambda_k(A)+\lambda_l(B)$ is zero for $1\le k\le n$ and $1\le l\le m$, because $\lambda_l\left(B^T\right)=\lambda_l(B)$ by art. 140. Likewise, the homogeneous equation $AX-XB=O$ has a nonzero solution only if $\{\lambda_k(A)\}_{1\le k\le n}\cap\{\lambda_l(B)\}_{1\le l\le m}\neq\emptyset$. Thus, when $B=A$, in which case $X$ is the commutator of $A$, there are at least $n$ zero eigenvalues of $I_n\otimes A_{n\times n}-A_{n\times n}^T\otimes I_n$ (and more than $n$ if $A$ has multiple eigenvalues), illustrating that there are many possible commutators of a matrix $A$. If $C\neq O$ and $B=A$, the equation $AX-XA=C$ has, in general, no solution for $X$: since $\operatorname{trace}(AX-XA)=0$, no solution exists when $\operatorname{trace}(C)\neq0$. A theorem of Shoda, proved in Horn and Johnson (1991, p. 288), states that $C$ can be written as $C=XY-YX$ for some matrices $X$ and $Y$ provided $\operatorname{trace}(C)=0$.

8.9 Formulae of determinants

The theory of determinants is discussed in historical order up to 1920 by Muir (1930) in five impressive volumes. Muir claims to be comprehensive. His treatise summarizes each paper and relates that paper to others. A remarkably large number of papers are by his hand. Many papers deal with specially structured determinants that sometimes possess a nice, closed form¹⁰.

187. A determinant of an $n\times n$ matrix $A$ is defined by
$$\det A=\sum_{p}(-1)^{\sigma(p)}\prod_{j=1}^{n}a_{jk_j}\qquad(8.77)$$
where the sum is over all the $n!$ permutations $p=(k_1,k_2,\ldots,k_n)$ of $(1,2,\ldots,n)$ and $\sigma(p)$ is the number of interchanges between $p$ and the natural order $(1,2,\ldots,n)$. For example, $p=(1,3,2,4)$ has 1 interchange, $\sigma(p)=1$, while $p=(4,3,2,1)$ has $\sigma(p)=2$. Thus, $\sigma(p)$ is the number of interchanges to bring $p$ back to the natural order; only its parity matters in (8.77), and that parity does not depend on the chosen sequence of interchanges. The determinant of a non-square matrix is not defined.

188. From the Schur identity
$$\begin{bmatrix}A&B\\C&D\end{bmatrix}=\begin{bmatrix}I&O\\CA^{-1}&I\end{bmatrix}\begin{bmatrix}A&B\\O&D-CA^{-1}B\end{bmatrix}\qquad(8.78)$$
we find that
$$\det\begin{bmatrix}A&B\\C&D\end{bmatrix}=\det A\det\left(D-CA^{-1}B\right)\qquad(8.79)$$
and $D-CA^{-1}B$ is called the Schur complement of $A$. Similarly (see, e.g., Meyer (2000)),
$$\det\begin{bmatrix}A&B\\C&D\end{bmatrix}=\det D\det\left(A-BD^{-1}C\right)\qquad(8.80)$$
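¹⁰ We mention as an example the following $n\times n$ determinant of Scott (1880) in Muir (1930, vol. IV, p. 124), with zero diagonal and element $a_i+a_j$ in position $(i,j)$ for $i\neq j$, which involves all players of the harmonic, geometric and arithmetic mean inequality (5.15),
$$\begin{vmatrix}0&a_1+a_2&a_1+a_3&\cdots\\a_1+a_2&0&a_2+a_3&\cdots\\a_1+a_3&a_2+a_3&0&\cdots\\\vdots&\vdots&\vdots&\ddots\end{vmatrix}=\frac{(-2)^{n-1}}{2}\prod_{j=1}^{n}a_j\left\{\left(\sum_{j=1}^{n}a_j\right)\left(\sum_{j=1}^{n}\frac{1}{a_j}\right)-(n-2)^2\right\}$$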
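The definition (8.77), the Schur complement formulas (8.79) and (8.80), and Scott's determinant, in the form $\frac{(-2)^{n-1}}{2}\prod_ja_j\{(\sum_ja_j)(\sum_j1/a_j)-(n-2)^2\}$, can all be checked in exact arithmetic. A small sketch, assuming integer test data; `det` evaluates (8.77) literally over all $n!$ permutations (so it is only meant for tiny matrices) and takes $(-1)^{\sigma(p)}$ from the parity of the number of inversions of $p$. All helper names are introduced here.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def sign(p):
    """(-1)^sigma(p), taken from the parity of the number of inversions of p."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(a):
    """Definition (8.77): sum over all n! permutations p = (k_1, ..., k_n)."""
    n = len(a)
    return sum(sign(p) * prod(a[j][p[j]] for j in range(n)) for p in permutations(range(n)))


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def sub(x, y):
    return [[s - t for s, t in zip(rx, ry)] for rx, ry in zip(x, y)]


def inv2(a):
    """Exact inverse of an invertible 2 x 2 matrix."""
    d = Fraction(det(a))
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


# blocks of M = [[A, B], [C, D]]; A and D are invertible
A, B = [[2, 1], [1, 3]], [[0, 4], [1, -1]]
C, D = [[5, 2], [-3, 1]], [[1, 0], [2, 7]]
M = [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]
via_A = det(A) * det(sub(D, matmul(matmul(C, inv2(A)), B)))   # (8.79)
via_D = det(D) * det(sub(A, matmul(matmul(B, inv2(D)), C)))   # (8.80)
print(det(M), via_A, via_D)


def scott(a):
    """Scott's determinant: zero diagonal, a_i + a_j in position (i, j), i != j."""
    n = len(a)
    lhs = det([[0 if i == j else a[i] + a[j] for j in range(n)] for i in range(n)])
    rhs = (Fraction((-2) ** (n - 1), 2) * prod(a)
           * (sum(a) * sum(Fraction(1, t) for t in a) - (n - 2) ** 2))
    return lhs == rhs


print(all(scott(a) for a in ([1, 2], [1, 2, 3], [1, 1, 1, 2], [2, 3, 5, 7, 11])))
```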
189. An interesting application of art. 188 is
$$\det\begin{bmatrix}A_{n\times n}&C_{n\times k}\\-D^T_{k\times n}&I_k\end{bmatrix}=\det A\det\left(I_k+D^TA^{-1}C\right)=\det\left(A_{n\times n}+C_{n\times k}D^T_{k\times n}\right)\qquad(8.81)$$
which follows by applying both (8.80) and (8.79). For $k=1$ and $A=I$ in (8.81), we obtain the "rank one update" formula
$$\det\left(I+cd^T\right)=1+d^Tc\qquad(8.82)$$
This example shows that interesting relations can be obtained when the inverse of either $A$ or $D$ or both in (8.79) and (8.80) are explicitly known.

190. Cramer's rule. The linear set of equations, $Ax=b$, has a unique solution $x=A^{-1}b$ provided $\det A\neq0$. If we write the matrix $A$ in terms of its column vectors $a_k=(a_{1k},a_{2k},\ldots,a_{nk})$, then
$$A=\begin{bmatrix}a_1&\cdots&a_{k-1}&a_k&a_{k+1}&\cdots&a_n\end{bmatrix}$$
Cramer's rule expresses the unique solution $x=(x_1,x_2,\ldots,x_n)$ per component as
$$x_k=\frac{\det\begin{bmatrix}a_1&\cdots&a_{k-1}&b&a_{k+1}&\cdots&a_n\end{bmatrix}}{\det A}\qquad(8.83)$$
Indeed, we can write the matrix $A$ with the $k$-th column replaced by the vector $b$ as $A_k=A+(b-a_k)e_k^T$, where $e_k$ is the $k$-th basis vector. Thus, $ye_k^T$ equals the zero matrix with the $k$-th column replaced by the vector $y$, and it has rank 1. Then,
$$\det A_k=\det\left(A+(b-a_k)e_k^T\right)=\det A\det\left(I+A^{-1}(b-a_k)e_k^T\right)$$
Application of the "rank one update" formula (8.82), recalling that $Ae_k=a_k$ and $e_k=A^{-1}a_k$, yields
$$\det\left(I+A^{-1}(b-a_k)e_k^T\right)=1+e_k^TA^{-1}(b-a_k)=1+e_k^T\left(A^{-1}b-A^{-1}a_k\right)=1+e_k^T(x-e_k)=x_k$$
which demonstrates (8.83).

191. Expansion of the determinant of a product.

Theorem 45 (Binet-Cauchy) Let $C=AB$ where $A_{m\times n}$ and $B_{n\times m}$. Then,
$$\det C=\sum_{1\le k_1<k_2<\cdots<k_m\le n}\begin{vmatrix}a_{1k_1}&\cdots&a_{1k_m}\\\vdots&&\vdots\\a_{mk_1}&\cdots&a_{mk_m}\end{vmatrix}\begin{vmatrix}b_{k_11}&\cdots&b_{k_1m}\\\vdots&&\vdots\\b_{k_m1}&\cdots&b_{k_mm}\end{vmatrix}\qquad(8.84)$$
Proof: See, e.g., Gantmacher (1959a, pp. 9-10). □
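A small exact check of the rank-one update (8.82), Cramer's rule (8.83) and the Binet-Cauchy formula (8.84), together with its special case $m=2$ (the Cauchy identity (8.86) of art. 192) and the variance formula (8.87). The matrices and vectors are arbitrary integer examples chosen for illustration, and `det` is a literal (8.77) expansion, adequate only for tiny sizes.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def det(a):
    """Leibniz expansion (8.77) in exact arithmetic; fine for tiny matrices."""
    n = len(a)
    total = Fraction(0)
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inversions * prod(Fraction(a[j][p[j]]) for j in range(n))
    return total


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


# rank one update (8.82): det(I + c d^T) = 1 + d^T c
c, d = [1, -2, 3], [4, 0, -1]
I_cd = [[int(i == j) + c[i] * d[j] for j in range(3)] for i in range(3)]
rank_one = det(I_cd) == 1 + sum(p * q for p, q in zip(d, c))

# Cramer's rule (8.83): replace column k of A by b
A = [[2, 1, -1], [1, 3, 2], [1, 0, 1]]
b = [3, 5, 1]
x = [det([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / det(A)
     for k in range(3)]
cramer = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)] == b

# Binet-Cauchy (8.84) with m = 2, n = 4: sum over column subsets k_1 < k_2
A2 = [[1, 2, 0, -1], [3, 1, 2, 1]]
B2 = [[2, 1], [0, -1], [1, 1], [4, 0]]
expansion = sum(det([[A2[r][k] for k in ks] for r in range(2)])
                * det([[B2[k][r] for r in range(2)] for k in ks])
                for ks in combinations(range(4), 2))
binet_cauchy = det(matmul(A2, B2)) == expansion

# Cauchy identity (8.86) and variance formula (8.87)
xs, ys = [1, 4, -2, 3], [2, -1, 5, 0]
n = len(xs)
lhs = sum(t * t for t in xs) * sum(t * t for t in ys) - sum(p * q for p, q in zip(xs, ys)) ** 2
rhs = Fraction(1, 2) * sum((xs[j] * ys[k] - xs[k] * ys[j]) ** 2 for j in range(n) for k in range(n))
cauchy = lhs == rhs
var = Fraction(sum(t * t for t in xs), n) - Fraction(sum(xs), n) ** 2
variance = var == Fraction(sum((xs[j] - xs[k]) ** 2 for j in range(n) for k in range(j)), n * n)
print(rank_one, cramer, binet_cauchy, cauchy, variance)
```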
If $B_{n\times m}=(A_{m\times n})^T$ (thus $b_{ij}=a_{ji}$), then the Binet-Cauchy formula (8.84) reduces to
$$\det AA^T=\sum_{k_1=1}^{n}\sum_{k_2=k_1+1}^{n}\cdots\sum_{k_m=k_{m-1}+1}^{n}\begin{vmatrix}a_{1k_1}&\cdots&a_{1k_m}\\\vdots&&\vdots\\a_{mk_1}&\cdots&a_{mk_m}\end{vmatrix}^2\qquad(8.85)$$

192. The Cauchy identity. The Cauchy identity
$$\sum_{j=1}^{n}x_j^2\sum_{j=1}^{n}y_j^2-\left(\sum_{j=1}^{n}x_jy_j\right)^2=\frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n}\left(x_jy_k-x_ky_j\right)^2\qquad(8.86)$$
is the special case for the dimension $m=2$ in the Binet-Cauchy Theorem 45. Specifically, (8.85) reduces to (8.86) for the matrix $A_{2\times n}=\begin{bmatrix}x^T\\y^T\end{bmatrix}$, where $x$ and $y$ are $n\times1$ vectors. Since the right-hand side in the Cauchy identity (8.86) is nonnegative for real vectors $x$ and $y$, the Cauchy-Schwarz inequality (8.41), written as
$$\left(\sum_{j=1}^{n}x_jy_j\right)^2\le\sum_{j=1}^{n}x_j^2\sum_{j=1}^{n}y_j^2$$
is a consequence of (8.86). Since $\operatorname{Var}[X]=E\left[X^2\right]-(E[X])^2$, Cauchy's identity (8.86) with $y_j=1$ shows that, for any random variable $X$ in a specific graph that takes the value $x_j$ at node $j$ with probability $\frac{1}{n}$, the variance equals
$$\operatorname{Var}[X]=\frac{1}{n}\sum_{j=1}^{n}x_j^2-\left(\frac{1}{n}\sum_{j=1}^{n}x_j\right)^2=\frac{1}{n^2}\sum_{j=2}^{n}\sum_{k=1}^{j-1}(x_j-x_k)^2\qquad(8.87)$$
where the last term sums the square of the difference in realizations of $X$ over all pairs of nodes in the graph.

193. We give an identity for any matrix $B$ due to Jacobi,
$$e^{\operatorname{trace}(B)}=\det e^B\qquad(8.88)$$
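Before the proofs, a quick numerical sanity check of (8.88) for an arbitrary non-symmetric $3\times3$ matrix with trace $0.5$. The sketch assumes that $\|B\|$ is small, so that a truncated Taylor series (40 terms) is an adequate stand-in for $e^B$; production code would rather call a library routine such as SciPy's `scipy.linalg.expm`, which is deliberately not used here to keep the example dependency-free.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def expm(b, terms=40):
    """Truncated Taylor series sum_{k < terms} B^k / k!; adequate for small ||B||."""
    n = len(b)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k for t in row] for row in matmul(term, b)]  # now B^k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d


B = [[0.3, -0.5, 0.2], [0.1, 0.4, -0.3], [0.6, 0.0, -0.2]]
trace_B = sum(B[i][i] for i in range(len(B)))
print(det(expm(B)), math.exp(trace_B))  # both close to e^0.5
```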
We present two proofs. The first proof is the general proof. The logarithm $\log(I-C)$ has a formal Taylor series provided all eigenvalues satisfy $|\lambda(C)|<1$,
$$\log(I-C)=-\sum_{k=1}^{\infty}\frac{C^k}{k}=-\sum_{k=1}^{\infty}C^k\int_0^1y^{k-1}dy=-\int_0^1\sum_{k=0}^{\infty}C^{k+1}y^kdy=-\int_0^1C(I-Cy)^{-1}dy$$
The relation with the determinant follows from the general definition of the inverse of a matrix, which is $(I-Cy)^{-1}=\frac{1}{\det(I-Cy)}\operatorname{adj}(I-Cy)$. Hence,
$$\log(I-C)=-\int_0^1C\operatorname{adj}(I-Cy)\frac{dy}{\det(I-Cy)}$$
By matrix multiplications and denoting the dimension of the matrix $B$ (and $C$) by $N$, we have that
$$\{C\operatorname{adj}(I-Cy)\}_{ij}=\sum_{n=1}^{N}c_{in}\times\operatorname{cofactor}_{jn}(I-Cy)$$
and
$$\operatorname{trace}\{C\operatorname{adj}(I-Cy)\}=\sum_{j=1}^{N}\sum_{n=1}^{N}c_{jn}\times\operatorname{cofactor}_{jn}(I-Cy)$$
Any determinant can be expanded in a sum of cofactors as
$$\det(A)=\sum_{n=1}^{N}a_{jn}\times\operatorname{cofactor}_{jn}A$$
and the derivative with respect to $y$ is (Meyer, 2000, ex. 6.2.25)
$$\frac{d\det(A)}{dy}=\sum_{j=1}^{N}\sum_{n=1}^{N}\frac{da_{jn}}{dy}\times\operatorname{cofactor}_{jn}A=\operatorname{trace}\left(\operatorname{adj}A\times\frac{dA}{dy}\right)$$
If $A^{-1}=\frac{\operatorname{adj}A}{\det A}$ exists, then Jacobi's analysis shows that
$$\frac{d\det(A)}{dy}=\det A\operatorname{trace}\left(A^{-1}\times\frac{dA}{dy}\right)$$
Thus, with $A=I-Cy$ and $\frac{dA}{dy}=-C$,
$$\operatorname{trace}\{C\operatorname{adj}(I-Cy)\}=-\frac{d\det(I-Cy)}{dy}$$
and
$$\operatorname{trace}\log(I-C)=-\int_0^1\operatorname{trace}\{C\operatorname{adj}(I-Cy)\}\frac{dy}{\det(I-Cy)}=\int_0^1\frac{\frac{d}{dy}\det(I-Cy)}{\det(I-Cy)}dy=\left.\log\det(I-Cy)\right|_0^1=\log\det(I-C)$$
We arrive at
$$\operatorname{trace}\log(I-C)=\log\det(I-C)\qquad(8.89)$$
By substitution of $B=\log(I-C)$, we find Jacobi's expression (8.88). Finally, the condition that $|\lambda(C)|<1$ does not restrict the eigenvalues of $B$.

The second proof is much shorter, but it assumes that all eigenvalues are distinct such that, from the eigenvalue representation, we have $e^B=X\operatorname{diag}\left(e^{\lambda(B)}\right)X^{-1}$. Taking the determinant of both sides gives
$$\det e^B=\det X\det\left(\operatorname{diag}\left(e^{\lambda(B)}\right)\right)\det X^{-1}=\prod_{j=1}^{N}e^{\lambda_j(B)}=e^{\sum_{j=1}^{N}\lambda_j(B)}=e^{\operatorname{trace}B}$$
After taking the logarithm in (8.88), the trace is expressed in terms of the determinant,
$$\operatorname{trace}(A)=\log\det e^A$$
while by substituting $A=e^B$ in (8.88), Jacobi's identity expresses a determinant as a function of the trace,
$$\det A=e^{\operatorname{trace}(\log A)}$$
Expanding the last expression (via Taylor series) shows the relation with the Newton identities (9.3) as demonstrated in art. 36.

194. The $n\times n$ Vandermonde matrix of the vector $x$ is defined as¹¹
$$V_n(x)=\begin{bmatrix}1&x_1&x_1^2&x_1^3&\cdots&x_1^{n-1}\\1&x_2&x_2^2&x_2^3&\cdots&x_2^{n-1}\\1&x_3&x_3^2&x_3^3&\cdots&x_3^{n-1}\\\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_n&x_n^2&x_n^3&\cdots&x_n^{n-1}\end{bmatrix}\qquad(8.90)$$
The Vandermonde matrix appeared earlier in art. 143 as the eigenvector matrix of the companion matrix. The Vandermonde determinant obeys the recursion
$$\det V_n(x)=\prod_{j=1}^{n-1}(x_n-x_j)\det V_{n-1}(x)\qquad(8.91)$$
with $\det V_2(x)=x_2-x_1$. Indeed, subtracting the last row from all previous rows and using the algebraic formula $x^k-y^k=(x-y)\sum_{j=0}^{k-1}x^{k-1-j}y^j$ yields
$$\det V_n(x)=\begin{vmatrix}0&x_1-x_n&(x_1-x_n)(x_1+x_n)&\cdots&x_1^{n-1}-x_n^{n-1}\\0&x_2-x_n&(x_2-x_n)(x_2+x_n)&\cdots&x_2^{n-1}-x_n^{n-1}\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&x_{n-1}-x_n&(x_{n-1}-x_n)(x_{n-1}+x_n)&\cdots&x_{n-1}^{n-1}-x_n^{n-1}\\1&x_n&x_n^2&\cdots&x_n^{n-1}\end{vmatrix}$$
After expanding the determinant along the first column, i.e., as $(-1)^{n+1}$ times the minor of the last element of the first column, and dividing each row $r$ by the factor $x_r-x_n$, the resulting determinant is
$$\det W_{n-1}=\begin{vmatrix}1&x_1+x_n&x_1^2+x_1x_n+x_n^2&\cdots&\sum_{j=0}^{n-2}x_n^{n-2-j}x_1^j\\1&x_2+x_n&x_2^2+x_2x_n+x_n^2&\cdots&\sum_{j=0}^{n-2}x_n^{n-2-j}x_2^j\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_{n-1}+x_n&x_{n-1}^2+x_{n-1}x_n+x_n^2&\cdots&\sum_{j=0}^{n-2}x_n^{n-2-j}x_{n-1}^j\end{vmatrix}$$
so that $\det V_n(x)=\prod_{j=1}^{n-1}(x_n-x_j)\det W_{n-1}$, because $(-1)^{n+1}\prod_{r=1}^{n-1}(x_r-x_n)=\prod_{r=1}^{n-1}(x_n-x_r)$. A determinant remains unchanged by adding a column multiplied by some number to another column. Since $\sum_{j=0}^{k-1}x^{k-1-j}y^j=x^{k-1}+y\sum_{j=0}^{k-2}x^{k-2-j}y^j$, we can subsequently, starting from the last column and moving to the left, multiply each but the last column $k$ by $x_n$ and subtract the result from the column $k+1$ to arrive at $\det W_{n-1}=\det V_{n-1}(x)$. This establishes the recursion (8.91). Iterating the recursion (8.91) results in
$$\det V_n(x)=\prod_{1\le i<j\le n}(x_j-x_i)=\prod_{i=1}^{n}\prod_{j=i+1}^{n}(x_j-x_i)\qquad(8.92)$$

¹¹ There are different ways to define the Vandermonde matrix, for instance, by organizing the powers of the vector $x$ in rows (as in art. 143) instead of in columns, and by choosing the sequence of powers in either decreasing or increasing order.
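The closed form (8.92) and the recursion (8.91) can be confirmed in exact rational arithmetic. In this sketch the determinant is computed by Gaussian elimination over `Fraction`s (an implementation choice, not the row and column manipulation of the text), and the vector $x$ is an arbitrary example with distinct components.

```python
from fractions import Fraction
from math import prod


def det_exact(a):
    """Exact determinant by Gaussian elimination over the rationals."""
    n = len(a)
    a = [[Fraction(t) for t in row] for row in a]
    d = Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d


def vandermonde(x):
    """V_n(x) of (8.90): row i is (1, x_i, x_i^2, ..., x_i^(n-1))."""
    n = len(x)
    return [[xi ** k for k in range(n)] for xi in x]


x = [2, -1, 3, Fraction(1, 2), 5]
n = len(x)
closed_form = prod(x[j] - x[i] for i in range(n) for j in range(i + 1, n))   # (8.92)
recursion = (prod(x[-1] - x[j] for j in range(n - 1))
             * det_exact(vandermonde(x[:-1])))                              # (8.91)
print(det_exact(vandermonde(x)) == closed_form == recursion)
```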
195. Hadamard's inequality. Consider the matrix $A=\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}$, with as columns the vectors $\{a_k\}_{1\le k\le n}$. The Hadamard inequality for the determinant, proved in Meyer (2000, p. 469), is
$$|\det A|\le\prod_{k=1}^{n}\|a_k\|_2=\prod_{k=1}^{n}\sqrt{\sum_{j=1}^{n}|a_{jk}|^2}\qquad(8.93)$$
with equality only if all the vectors $a_1,a_2,\ldots,a_n$ are mutually orthogonal, i.e., if $(a_k)^Ta_j=0$ or, when complex, $(a_k)^Ha_j=0$, for all pairs $k\neq j$ (or if one of the vectors is zero). As proved by Meyer (2000, p. 469), the volume $V_n$ of an $n$-dimensional parallelepiped, a possibly skewed rectangular box generated by $n$ independent vectors $a_1,a_2,\ldots,a_n$, equals $V_n=|\det A|$. This relation provides a geometrical interpretation of the determinant. Hadamard's inequality (8.93) asserts that the volume of an $n$-dimensional parallelepiped generated by the columns of $A$ cannot exceed the volume of a rectangular box whose sides have length $\|a_k\|_2$. In general, an $n$-dimensional parallelepiped is skewed, i.e., its $n$ independent, generating vectors $a_1,a_2,\ldots,a_n$ are not orthogonal, which geometrically explains Hadamard's inequality (8.93).

We apply the Hadamard inequality (8.93) to the Vandermonde determinant in art. 194, where the components of the vector $x$ are ordered as $|x_1|>|x_2|>\cdots>|x_m|>1\ge|x_{m+1}|>\cdots>|x_n|$. After dividing the first $m$ rows, corresponding to the components with absolute value larger than 1, by $x_j^{n-1}$ for $1\le j\le m$, we obtain
$$\det V_n(x)=\prod_{j=1}^{m}x_j^{n-1}\begin{vmatrix}x_1^{-(n-1)}&x_1^{-(n-2)}&x_1^{-(n-3)}&\cdots&1\\\vdots&\vdots&\vdots&&\vdots\\x_m^{-(n-1)}&x_m^{-(n-2)}&x_m^{-(n-3)}&\cdots&1\\1&x_{m+1}&x_{m+1}^2&\cdots&x_{m+1}^{n-1}\\\vdots&\vdots&\vdots&&\vdots\\1&x_n&x_n^2&\cdots&x_n^{n-1}\end{vmatrix}$$
Since none of the elements in this determinant exceeds unity in absolute value, Hadamard's inequality (8.93), applied to the rows, shows that
$$|\det V_n(x)|\le n^{\frac{n}{2}}\prod_{j=1}^{m}|x_j|^{n-1}$$
with equality if and only if the row vectors are orthogonal. Art. 143 shows that orthogonality is only possible if all $x_j=e^{2\pi\mathrm{i}j/n}$, corresponding to the zeros of $p_n(z)=a_n(z^n\pm1)$. Using (8.92) and $|x_j|=1$ yields
$$\prod_{k=1}^{n}\prod_{j=k+1}^{n}\left|e^{\frac{2\pi\mathrm{i}j}{n}}-e^{\frac{2\pi\mathrm{i}k}{n}}\right|=n^{\frac{n}{2}}\qquad(8.94)$$
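A floating-point illustration of Hadamard's inequality (8.93) for a random real matrix, and of the equality case (8.94) for $x_k=e^{2\pi\mathrm{i}k/n}$, here with $n=6$ so that $n^{n/2}=216$. The `det` helper (Gaussian elimination with partial pivoting, working for real and complex entries), the seed and the sizes are assumptions of this sketch.

```python
import cmath
import math
import random


def det(a):
    """Gaussian elimination with partial pivoting; real or complex entries."""
    n = len(a)
    a = [row[:] for row in a]
    d = 1
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0:
            return 0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d


def hadamard_bound(a):
    """Right-hand side of (8.93): product of the Euclidean norms of the columns."""
    n = len(a)
    return math.prod(math.sqrt(sum(abs(a[r][c]) ** 2 for r in range(n))) for c in range(n))


rng = random.Random(2)
A = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(5)]
print(abs(det(A)), "<=", hadamard_bound(A))

# equality case (8.94): x_k = exp(2 pi i k / n) for k = 1, ..., n
n = 6
x = [cmath.exp(2j * math.pi * k / n) for k in range(1, n + 1)]
V = [[xk ** p for p in range(n)] for xk in x]
pairs = math.prod(abs(x[q] - x[p]) for p in range(n) for q in range(p + 1, n))
print(abs(det(V)), pairs, n ** (n / 2))
```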
9 Polynomials with real coefficients
The characteristic polynomial of real matrices possesses real coefficients. This chapter aims to summarize general results on the location and determination of the zeros of polynomials with mainly real coefficients. The operations here are assumed to be performed over the set $\mathbb{C}$ of complex numbers. Restricting operations to other number systems, such as the ring $\mathbb{Z}$ of integers or finite fields, is omitted because, in that case, we need to enter an entirely different and more complex area, which requires Galois theory, advanced group theory and number theory. A general outline for the latter is found in Govers et al. (2008) and a nice introduction to Galois theory is written by Stewart (2004). The study of polynomials is one of the oldest research areas in mathematics. The insolubility of the quintic, famously proved by Abel and extended by Galois (see art. 196 and Govers et al. (2008) for more details and for the historical context), shifted the root finding problem in polynomials from pure to numerical analysis. Numerical methods as well as matrix methods based on the companion matrix (art. 143) are extensively treated by McNamee (2007), but omitted here. A complex function theoretic approach, covering more recent results such as self-inversive polynomials and extensions of Grace's Theorem (art. 227), is presented by Sheil-Small (2002) and by Milovanović et al. (1994), who also list many polynomial inequalities. In addition, Milovanović et al. (1994) treat polynomial extremal problems of the type: given that the absolute value of a polynomial is bounded in some region of the complex plane, how large can its derivative be in that region?
9.1 General properties

196. The fundamental theorem of algebra, first proved by Gauss and later by Liouville (Titchmarsh, 1964, p. 118), states that any polynomial of degree $n$ has precisely $n$ zeros in the complex plane. If these zeros coincide, we count zeros according to their multiplicity. Thus, if there are $l<n$ distinct zeros and a zero $z_k$ has multiplicity $m_k$, then the fundamental theorem states that $\sum_{k=1}^{l}m_k=n$. In the sequel, we represent zeros as if they are single, however, with the convention that a coinciding zero $z_k$ with multiplicity $m_k$ is counted $m_k$ times. Let $p_n(z)$ denote a polynomial of degree $n$ defined by
$$p_n(z)=\sum_{k=0}^{n}a_kz^k=a_n\prod_{k=1}^{n}(z-z_k)\qquad(9.1)$$
where $\{z_k\}_{1\le k\le n}$ is the set of $n$ zeros and the coefficient $a_k$ (for $0\le k\le n$) is a (finite) complex number. Moreover, we require that $a_n\neq0$, otherwise the polynomial is not of degree $n$. If $a_n=1$, which is an often used normalization, the polynomial is called "monic". Once the set of zeros is known, the coefficients $a_k$ can be computed by multiplying out the product in (9.1). The other direction, the determination of the set of zeros given the set of coefficients $\{a_k\}_{0\le k\le n}$, proves to be much more challenging. Abel and Galois have shown that only up to degree $n=4$ explicit relations of the zeros exist in terms of a finite number of elementary operations such as additions, subtractions, multiplications, divisions and radicals on the coefficients. The solution of the cubic ($n=3$) and quartic ($n=4$) can be found, for example, in Stewart (2004) and Milovanović et al. (1994). An important aspect of the theory of polynomials thus lies in the determination of the set of zeros.

It follows immediately from (9.1) that $p_n(0)=a_0=a_n\prod_{k=1}^{n}(-z_k)$. This shows that the absolute value of any zero of a polynomial must be finite. From (9.1), one readily verifies that
$$z^np_n\left(\frac{1}{z}\right)=\sum_{k=0}^{n}a_{n-k}z^k=a_0\prod_{k=1}^{n}\left(z-\frac{1}{z_k}\right)\qquad(9.2)$$
P Hence, the polynomial qn=0 dqn } n with the coe!cients in reverse order possesses P as zeros the inverses of those of the original polynomial qn=0 dn } n . Pq 197. If all the coe!cients dn of sq (}) = n=0 dn {n are integers and if = uv is a rational zero¡(i.e. ¢ u and v are integers and coprime), then u|d0 and v|dq . Indeed, rewriting sq uv = 0 yields u
q1 X
dn+1 un vqn1 = vq d0
n=0
which shows that u divides the left-hand side and, hence, u|vq d0 . Since the prime factorization of u and v do not have a prime number in common because u and v are coprime, u|vq¡d0¢ implies that u|d0 . The second statement follows analogously after P n q1n rewriting sq uv = 0 as dq uq = v q1 . The zeros of a monic polynon=0 dn u v mial with integer coe!cients are called algebraic numbers and play a fundamental role in algebraic number fields (see, e.g., Govers et al. (2008)).
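Because of the divisibility conditions of art. 197, finding rational zeros reduces to a finite, exact check. The following is a minimal Python sketch. It assumes that the coefficients are stored constant term first, as in (9.1), and that $a_0 \neq 0$ (a zero at the origin would first have to be factored out). The helper names `divisors`, `evaluate` and `rational_zeros` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from math import gcd

# Illustrative helper names (hypothetical), not from the text.


def divisors(m):
    """Positive divisors of a non-zero integer m."""
    m = abs(m)
    return [d for d in range(1, m + 1) if m % d == 0]


def evaluate(a, z):
    """Horner evaluation of p(z) = sum_k a[k] * z**k, with a[0] the constant term."""
    result = 0
    for coeff in reversed(a):
        result = result * z + coeff
    return result


def rational_zeros(a):
    """All rational zeros of an integer polynomial with a[0] != 0 and a[n] != 0.

    By art. 197, a zero r/s in lowest terms satisfies r | a[0] and s | a[n],
    so only finitely many candidates +-r/s have to be tested, exactly.
    """
    if a[0] == 0 or a[-1] == 0:
        raise ValueError("expects a[0] != 0 and a[n] != 0")
    zeros = set()
    for r in divisors(a[0]):
        for s in divisors(a[-1]):
            if gcd(r, s) != 1:
                continue
            for candidate in (Fraction(r, s), Fraction(-r, s)):
                if evaluate(a, candidate) == 0:
                    zeros.add(candidate)
    return sorted(zeros)


# p(z) = (2z - 1)(3z + 1)(z + 2) = 6z^3 + 11z^2 - 3z - 2
print(rational_zeros([-2, -3, 11, 6]))
```

Exact `Fraction` arithmetic avoids any rounding doubt when a candidate is tested. For this example the call should return the three zeros $-2$, $-\frac{1}{3}$ and $\frac{1}{2}$.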
198. The Newton identities for $0 \le l < n$,

$$a_l = -\frac{1}{n-l} \sum_{k=l+1}^{n} a_k Z_{k-l} \tag{9.3}$$

derived in Van Mieghem (2006b, Appendix B.2), are a recursive set of equations that relate the coefficients $a_k$ of a polynomial $p_n(z) = \sum_{k=0}^{n} a_k z^k$ to the sums of positive powers $Z_j = \sum_{k=1}^{n} z_k^j$ of the zeros $z_1, z_2, \ldots, z_n$ of $p_n(z)$. An extension to the sums of negative powers follows from the same method and yields, for any $l < 0$ and with $Z_0 = n$,

$$Z_l = -\frac{1}{a_0} \sum_{k=1}^{n} a_k Z_{k+l}$$

By rewriting the Newton identities as

$$Z_j = -\frac{1}{a_n} \left( j\, a_{n-j} + \sum_{k=1}^{j-1} a_{k+n-j} Z_k \right) \tag{9.4}$$

we obtain, for $1 \le j \le n$, a recursion that expresses the positive powers $Z_j$ of the zeros in terms of the coefficients $a_k$. Explicitly, for the first few $Z_j$,

$$\begin{aligned}
Z_1 &= \sum_{k=1}^{n} z_k = -\frac{a_{n-1}}{a_n} \\
Z_2 &= \sum_{k=1}^{n} z_k^2 = \frac{a_{n-1}^2}{a_n^2} - \frac{2a_{n-2}}{a_n} \\
Z_3 &= \sum_{k=1}^{n} z_k^3 = -\frac{a_{n-1}^3}{a_n^3} + \frac{3a_{n-2}a_{n-1}}{a_n^2} - \frac{3a_{n-3}}{a_n} \\
Z_4 &= \sum_{k=1}^{n} z_k^4 = \frac{a_{n-1}^4}{a_n^4} - \frac{4a_{n-2}a_{n-1}^2}{a_n^3} + \frac{2a_{n-2}^2 + 4a_{n-3}a_{n-1}}{a_n^2} - \frac{4a_{n-4}}{a_n}
\end{aligned} \tag{9.5}$$

Applying (9.5) to the polynomial $\sum_{k=0}^{n} a_{n-k} z^k$ with the coefficients in reverse order (art. 196) gives

$$Z_{-1} = \sum_{k=1}^{n} \frac{1}{z_k} = -\frac{a_1}{a_0}, \qquad Z_{-2} = \sum_{k=1}^{n} \frac{1}{z_k^2} = \frac{a_1^2}{a_0^2} - \frac{2a_2}{a_0}$$

If all zeros $\{z_k\}_{1\le k\le n}$ are real and positive, the harmonic, geometric and arithmetic mean inequality (5.15) shows that

$$\frac{n}{Z_{-1}} \le \sqrt[n]{\prod_{j=1}^{n} z_j} \le \frac{Z_1}{n}$$

from which we find

$$\frac{a_{n-1}\, a_1}{a_n\, a_0} \ge n^2$$
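The recursion (9.4) translates directly into code. Below is a minimal Python sketch using exact rationals. The name `power_sums` is hypothetical. Coefficients with a negative index are treated as zero, an assumption that lets the same loop continue past $j = n$, where the Newton identities no longer contain the $j\,a_{n-j}$ term. Reversing the coefficient list (art. 196) then gives the negative power sums.

```python
from fractions import Fraction

# Illustrative helper name (hypothetical), not from the text.


def power_sums(a, jmax):
    """Power sums Z_1, ..., Z_jmax of the zeros of p(z) = sum_k a[k] z**k.

    Implements the recursion (9.4),
        Z_j = -(j a_{n-j} + sum_{k=1}^{j-1} a_{k+n-j} Z_k) / a_n,
    with exact rational arithmetic. Coefficients with a negative index are
    taken as zero, which also covers j > n.
    """
    n = len(a) - 1

    def coef(i):
        return a[i] if 0 <= i <= n else 0

    Z = {}
    for j in range(1, jmax + 1):
        s = j * coef(n - j) + sum(coef(k + n - j) * Z[k] for k in range(1, j))
        Z[j] = Fraction(-s) / a[n]
    return [Z[j] for j in range(1, jmax + 1)]


a = [-6, 11, -6, 1]                 # (z - 1)(z - 2)(z - 3)
print(power_sums(a, 4))             # values 6, 14, 36, 98 (as Fractions)
inverse = power_sums(a[::-1], 2)    # reversed coefficients: zeros 1/z_k (art. 196)
print(inverse)                      # Z_{-1} = 11/6, Z_{-2} = 49/36
# Harmonic-arithmetic mean bound for positive zeros: Z_1 * Z_{-1} >= n^2
print(power_sums(a, 1)[0] * inverse[0] >= 9)
```

Exact fractions make the check of the closed forms in (9.5) unambiguous: $Z_{-1} = -a_1/a_0 = 11/6$ agrees with $1 + \frac{1}{2} + \frac{1}{3}$.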
199. Vieta's formulae express the coefficients $a_k$ of $p_n(z)$ explicitly in terms of its zeros $\{z_k\}_{1\le k\le n}$.

Theorem 46 (Vieta) For any polynomial defined by (9.1), it holds that, for $0 \le k < n$,

$$\frac{a_k}{a_n} = (-1)^{n-k} \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} \prod_{i=1}^{n-k} z_{j_i} \qquad (j_0 = 0) \tag{9.6}$$

$$\frac{a_k}{a_n} = (-1)^{n-k} \sum_{1 \le j_1 < j_2 < \cdots < j_{n-k} \le n} \prod_{i=1}^{n-k} z_{j_i} \tag{9.7}$$

Proof: The proof is based on the principle of induction. Relation (9.6) is easily verified for $n = 2$. Assume that it holds for $n$. Consider now the polynomial of order $n+1$ whose zeros are precisely those of $p_n(z)$, thus $z_k(n+1) = z_k(n)$ for $1 \le k \le n$, with the addition of the $(n+1)$-th zero $z_{n+1}(n+1)$. Hence,

$$\begin{aligned}
p_{n+1}(z) &= \left(z - z_{n+1}(n+1)\right) p_n(z) = \sum_{k=1}^{n+1} a_{k-1}(n)\, z^k - \sum_{k=0}^{n} z_{n+1}(n+1)\, a_k(n)\, z^k \\
&= -z_{n+1}(n+1)\, a_0(n) + \sum_{k=1}^{n} \left[a_{k-1}(n) - z_{n+1}(n+1)\, a_k(n)\right] z^k + a_n(n)\, z^{n+1}
\end{aligned}$$

from which the recursion $a_k(n+1) = a_{k-1}(n) - z_{n+1}(n+1)\, a_k(n)$ is immediate. The coefficient of the highest power equals unity (the polynomials are taken monic, thus $a_n(n) = a_{n+1}(n+1) = 1$), and the constant term indeed equals $(-1)^{n+1}$ times the product of all $n+1$ zeros. We therefore only have to verify the relation for the coefficients $a_k(n+1)$ with $1 \le k \le n$, i.e., to check whether (9.6) satisfies the recursion relation. Substitution yields

$$\begin{aligned}
a_k(n+1) &= (-1)^{n+1-k} \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_{n+1-k}=j_{n-k}+1}^{n} \prod_{i=1}^{n+1-k} z_{j_i}(n) \\
&\quad - (-1)^{n-k} \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} \prod_{i=1}^{n-k} z_{j_i}(n)\; z_{n+1}(n+1)
\end{aligned}$$

Distributing the product of zeros over the sums and using $z_k(n+1) = z_k(n)$ for $1 \le k \le n$ leads to

$$\begin{aligned}
a_k(n+1) &= (-1)^{n+1-k} \sum_{j_1=1}^{n} z_{j_1}(n+1) \sum_{j_2=j_1+1}^{n} z_{j_2}(n+1) \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} z_{j_{n-k}}(n+1) \left[ \sum_{j_{n+1-k}=j_{n-k}+1}^{n} z_{j_{n+1-k}}(n+1) + z_{n+1}(n+1) \right] \\
&= (-1)^{n+1-k} \sum_{j_1=1}^{n} z_{j_1}(n+1) \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} z_{j_{n-k}}(n+1) \sum_{j_{n+1-k}=j_{n-k}+1}^{n+1} z_{j_{n+1-k}}(n+1)
\end{aligned}$$

Since $\sum_{k=a}^{b} f(k) = 0$ if $a > b$, the last relation equals (9.6) when $n$ is replaced by $n+1$. $\square$

In fact, this summation convention implies that (9.6) also equals

$$\frac{a_k}{a_n} = (-1)^{n-k} \sum_{j_1=1}^{k+1} \sum_{j_2=j_1+1}^{k+2} \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} \prod_{i=1}^{n-k} z_{j_i} \tag{9.8}$$

If $z_k(n) = 1$ for all $k$, we have $p_n(z) = (z-1)^n = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} z^k$. This gives the simple check

$$\sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_{n-k}=j_{n-k-1}+1}^{n} 1 = \binom{n}{n-k} = \binom{n}{k} = \frac{n(n-1)\cdots(k+1)}{(n-k)!}$$

which holds because only one ordering of the $n-k$ indices $\{j_i\}$ out of $(n-k)!$ is allowed. Hence, the multiple sum in (9.6) consists of $\binom{n}{k}$ terms. Applying (9.6) to (9.2) yields, after substitution of $m = n - k$, the following alternative expression:

$$\frac{a_m}{a_0} = (-1)^m \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_m=j_{m-1}+1}^{n} \prod_{i=1}^{m} \frac{1}{z_{j_i}} \tag{9.9}$$
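The induction step of the proof doubles as an algorithm. Multiplying by one linear factor at a time applies $a_k(n+1) = a_{k-1}(n) - z_{n+1} a_k(n)$. The following minimal Python sketch assumes monic polynomials, as in the proof. It compares that recursion with the subset sum (9.7) and checks the term count $\binom{n}{k}$. The function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod

# Illustrative helper names (hypothetical), not from the text.


def coefficients_from_zeros(zeros):
    """Monic coefficients [a_0, ..., a_n] of prod_k (z - z_k), built with the
    recursion of the proof of Theorem 46: a_k(n+1) = a_{k-1}(n) - z_{n+1} a_k(n)."""
    a = [1]                          # n = 0: the constant polynomial 1
    for z_new in zeros:
        shifted = [0] + a            # a_{k-1}(n): multiplication by z
        padded = a + [0]             # a_k(n), with a_{n+1}(n) = 0
        a = [s - z_new * t for s, t in zip(shifted, padded)]
    return a


def vieta_ratio(zeros, k):
    """a_k / a_n from (9.7): (-1)^(n-k) times the sum over all (n-k)-subsets."""
    n = len(zeros)
    return (-1) ** (n - k) * sum(prod(c) for c in combinations(zeros, n - k))


zeros = [1, 2, 3]
print(coefficients_from_zeros(zeros))               # [-6, 11, -6, 1]
print([vieta_ratio(zeros, k) for k in range(4)])    # [-6, 11, -6, 1]
# The multiple sum in (9.6) with n = 5, k = 2 has C(5, 2) = 10 terms:
print(sum(1 for _ in combinations(range(5), 3)), comb(5, 2))   # 10 10
```

The recursion costs $O(n^2)$ operations. The subset sum (9.7) enumerates all $\binom{n}{k}$ products, so it is useful only as a check for small $n$.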
Finally, if the multiplicity of the zeros is known, then the polynomial can be written as

$$p_n(z) = a_n \prod_{k=1}^{l} (z - z_k)^{m_k}$$

Using Newton's binomium $(z - z_k)^{m_k} = \sum_{j=0}^{m_k} \binom{m_k}{j} (-z_k)^j z^{m_k - j}$, expansion of the product yields

$$p_n(z) = a_n \sum_{j_1=0}^{m_1} \binom{m_1}{j_1} (-z_1)^{j_1} \sum_{j_2=0}^{m_2} \binom{m_2}{j_2} (-z_2)^{j_2} \cdots \sum_{j_l=0}^{m_l} \binom{m_l}{j_l} (-z_l)^{j_l}\; z^{\,n - \sum_{k=1}^{l} j_k}$$

where we have used $\sum_{k=1}^{l} m_k = n$ (art. 196). Let $q = \sum_{k=1}^{l} j_k$, then

$$p_n(z) = a_n \sum_{q=0}^{n} (-1)^q \left\{ \sum_{\sum_{k=1}^{l} j_k = q;\; j_k \ge 0} \; \prod_{k=1}^{l} \binom{m_k}{j_k} z_k^{j_k} \right\} z^{n-q}$$

from which the coefficient $a_{n-q}$ follows as

$$a_{n-q} = (-1)^q a_n \sum_{\sum_{k=1}^{l} j_k = q;\; j_k \ge 0} \; \prod_{k=1}^{l} \binom{m_k}{j_k} z_k^{j_k}$$

The last sum is an instance of a characteristic coefficient of a complex function, first defined in Van Mieghem (1996).

200. The elementary symmetric polynomials of degree $k$ in $n$ variables $z_1, z_2, \ldots, z_n$ are defined by

$$e_k(z_1, z_2, \ldots, z_n) = (-1)^k \frac{a_{n-k}}{a_n}$$

where $\frac{a_{n-k}}{a_n}$ is given in either (9.6) or (9.7). For example,

for $n = 1$: $e_1(z_1) = z_1$
for $n = 2$: $e_1(z_1, z_2) = z_1 + z_2$, $\; e_2(z_1, z_2) = z_1 z_2$
for $n = 3$: $e_1(z_1, z_2, z_3) = z_1 + z_2 + z_3$, $\; e_2(z_1, z_2, z_3) = z_1 z_2 + z_1 z_3 + z_2 z_3$, $\; e_3(z_1, z_2, z_3) = z_1 z_2 z_3$
for $n = 4$: $e_1(z_1, z_2, z_3, z_4) = z_1 + z_2 + z_3 + z_4$, $\; e_2(z_1, z_2, z_3, z_4) = z_1 z_2 + z_1 z_3 + z_1 z_4 + z_2 z_3 + z_2 z_4 + z_3 z_4$, $\; e_3(z_1, z_2, z_3, z_4) = z_1 z_2 z_3 + z_1 z_2 z_4 + z_1 z_3 z_4 + z_2 z_3 z_4$, $\; e_4(z_1, z_2, z_3, z_4) = z_1 z_2 z_3 z_4$

For each positive integer $k \le n$, there exists exactly one elementary symmetric polynomial of degree $k$ in $n$ variables, which is formed by the sum of all different products of $k$-tuples of the $n$ variables. Since each such product $\prod_{m=1}^{k} z_{j_m}$ is commutative, all linear combinations of products of the elementary symmetric polynomials constitute a commutative ring, which lies at the basis of Galois theory. For example, it can be shown that any symmetric polynomial in $n$ variables can be expressed in a unique way in terms of the elementary symmetric polynomials $e_k(z_1, z_2, \ldots, z_n)$ for $1 \le k \le n$.
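The statement that symmetric polynomials can be written in terms of the $e_k$ can be checked on concrete numbers. The following minimal Python sketch verifies $\sum z_k^2 = e_1^2 - 2e_2$ and $\sum z_k^3 = e_1^3 - 3e_1e_2 + 3e_3$ exactly. These are standard identities, the first two Newton identities in elementary symmetric form. The sketch also checks the sign convention $e_k = (-1)^k a_{n-k}/a_n$. The sample values and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Illustrative helper names and sample values (hypothetical), not from the text.


def elementary_symmetric(z, k):
    """e_k(z_1, ..., z_n): sum of the products over all k-element subsets (e_0 = 1)."""
    return sum(prod(c) for c in combinations(z, k))


def power_sum(z, j):
    return sum(x ** j for x in z)


z = [Fraction(1, 2), Fraction(-3), Fraction(5, 7), Fraction(2)]
e1, e2, e3 = (elementary_symmetric(z, k) for k in (1, 2, 3))

# The symmetric polynomials sum z_k^2 and sum z_k^3 expressed in the e_k:
print(power_sum(z, 2) == e1**2 - 2 * e2)                  # True
print(power_sum(z, 3) == e1**3 - 3 * e1 * e2 + 3 * e3)    # True

# Link with the coefficients: (z - 1)(z - 2)(z - 3) has a = [-6, 11, -6, 1]
a = [-6, 11, -6, 1]
print([elementary_symmetric([1, 2, 3], k) == (-1) ** k * a[3 - k] for k in range(1, 4)])
```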
201. Discriminant of a polynomial. The discriminant of a polynomial $p_n(z)$ is defined for $n \ge 2$ as

$$\Delta(p_n) = a_n^{2n-2} \prod_{1 \le k < j \le n} (z_j - z_k)^2 \tag{9.10}$$

with the convention that $\Delta(p_1) = 1$. In view of (8.92), the discriminant can be written in terms of the Vandermonde determinant as

$$\Delta(p_n) = a_n^{2n-2} \left( \det V_n(z) \right)^2 \tag{9.11}$$

where $z = (z_1, z_2, \ldots, z_n)$ is the vector of the zeros of $p_n(z)$. The definition (9.10) of the discriminant shows that $\Delta(p_n) = 0$ when at least one zero has a multiplicity larger than 1. In other words, $\Delta(p_n) \neq 0$ if and only if all zeros of $p_n(z)$ are simple or distinct. Since the discriminant is a symmetric polynomial in the zeros and any symmetric polynomial can be expressed in a unique way in terms of the elementary symmetric polynomials (art. 200), the discriminant can also be expressed in terms of the coefficients of the polynomial. For example, for $n = 2$, we obtain the well-known discriminant of the quadratic polynomial $p_2(z) = az^2 + bz + c$ as

$$\Delta(p_2) = b^2 - 4ac$$

The discriminant of the cubic $p_3(z) = az^3 + bz^2 + cz + d$ is

$$\Delta(p_3) = b^2c^2 - 4ac^3 - 4b^3d - 27a^2d^2 + 18abcd$$

202. Discriminant and the derivative of a polynomial. The logarithmic derivative of $p_n(z)$, defined in (9.1), is

$$\frac{d \log p_n(z)}{dz} = \frac{p_n'(z)}{p_n(z)} = \sum_{k=1}^{n} \frac{1}{z - z_k}$$

which shows that

$$p_n'(z) = a_n \sum_{k=1}^{n} \prod_{j=1;\, j\neq k}^{n} (z - z_j)$$

Evaluated at a zero $z_m$ of $p_n(z)$, this gives

$$p_n'(z_m) = a_n \prod_{j=1;\, j\neq m}^{n} (z_m - z_j) = a_n (-1)^{m-1} \prod_{j=1}^{m-1} (z_j - z_m) \prod_{j=m+1}^{n} (z_m - z_j)$$

from which we obtain

$$\prod_{m=1}^{n} p_n'(z_m) = a_n^n\, (-1)^{\frac{n(n-1)}{2}} \prod_{m=1}^{n} \prod_{j=m+1}^{n} (z_j - z_m)^2$$

By invoking the definition (9.10) of the discriminant, we arrive at

$$\Delta(p_n) = (-1)^{\frac{n(n-1)}{2}}\, a_n^{n-2} \prod_{m=1}^{n} p_n'(z_m) \tag{9.12}$$

which shows that, if the discriminant is non-zero, none of the zeros of the derivative $p_n'(z)$ coincides with a (simple) zero of $p_n(z)$. In cases where a differential equation for a set of polynomials is known, such as for most classical orthogonal polynomials, the relation (9.12) can be used to express the discriminant in closed form, as shown in Milovanović et al. (1994, p. 67).
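The definition (9.10), the derivative relation (9.12) and the closed forms for $n = 2$ and $n = 3$ should all agree. The following minimal Python sketch computes the discriminant in three ways from prescribed zeros, using exact arithmetic. It assumes simple zeros in (9.12) and $n \ge 2$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Illustrative helper names (hypothetical), not from the text.


def discriminant_from_zeros(an, zeros):
    """Definition (9.10): a_n^(2n-2) * prod_{k<j} (z_j - z_k)^2, for n >= 2."""
    n = len(zeros)
    return an ** (2 * n - 2) * prod((zj - zk) ** 2 for zk, zj in combinations(zeros, 2))


def discriminant_from_derivative(an, zeros):
    """Relation (9.12): (-1)^(n(n-1)/2) a_n^(n-2) prod_m p'(z_m), where
    p'(z_m) = a_n prod_{j != m} (z_m - z_j) at the (simple) zeros."""
    n = len(zeros)
    derivs = [an * prod(zm - zj for j, zj in enumerate(zeros) if j != m)
              for m, zm in enumerate(zeros)]
    return (-1) ** (n * (n - 1) // 2) * Fraction(an) ** (n - 2) * prod(derivs)


def cubic_discriminant(a, b, c, d):
    """Closed form for p_3(z) = a z^3 + b z^2 + c z + d."""
    return b*b*c*c - 4*a*c**3 - 4*b**3*d - 27*a*a*d*d + 18*a*b*c*d


# 2(z - 1)(z + 1)(z - 3) = 2z^3 - 6z^2 - 2z + 6
print(discriminant_from_zeros(2, [1, -1, 3]),
      discriminant_from_derivative(2, [1, -1, 3]),
      cubic_discriminant(2, -6, -2, 6))      # 4096 4096 4096
# 2(z - 1)(z - 3) = 2z^2 - 8z + 6, so b^2 - 4ac = 64 - 48 = 16
print(discriminant_from_zeros(2, [1, 3]), (-8) ** 2 - 4 * 2 * 6)   # 16 16
```

A repeated zero makes (9.10) vanish, in line with the remark that $\Delta(p_n) = 0$ exactly when some zero is multiple.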
9.2 Transforming polynomials

203. Any polynomial

$$q_n(z) = \sum_{j=0}^{n} b_j z^j = b_n \prod_{k=1}^{n} (z - y_k)$$

where $b_n \neq 0$, can be reduced by a linear transformation $z = x + c$ to a polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ in which the coefficient $a_{n-1}$ of $x^{n-1}$ is zero. Indeed,

$$\begin{aligned}
q_n(x+c) &= \sum_{j=0}^{n} b_j (x+c)^j = \sum_{j=0}^{n} b_j \sum_{k=0}^{j} \binom{j}{k} x^k c^{j-k} = \sum_{k=0}^{n} \left( \sum_{j=k}^{n} \binom{j}{k} b_j c^{j-k} \right) x^k \\
&= b_n x^n + (b_{n-1} + n\, b_n c)\, x^{n-1} + \cdots + \sum_{j=0}^{n} b_j c^j
\end{aligned}$$

which shows that, if $c = -\frac{b_{n-1}}{n b_n}$, the polynomial $p_n(x) = q_n(x + c)$ possesses a zero coefficient of $x^{n-1}$, thus $a_{n-1} = 0$. Clearly, a linear transform shifts the zeros $\{y_k\}_{1\le k\le n}$ of $q_n(z)$ to the zeros $\{y_k - c\}_{1\le k\le n}$ of $p_n(x)$. The sum of the zeros of $p_n(x)$ is zero by (9.5), such that $c$ is the mean of the zeros of $q_n(z)$. Hence, without loss of generality, any polynomial $q_n(z)$ can first be transformed by $z = x + c$ with $c = -\frac{b_{n-1}}{n b_n}$ into the polynomial $p_n(x)$, where $a_k = \sum_{j=k}^{n} \binom{j}{k} b_j c^{j-k}$ and $a_{n-1} = 0$. The Newton identity $Z_2 = \sum_{k=1}^{n} (y_k - c)^2 = -\frac{2a_{n-2}}{a_n}$ shows that the (real) coefficients $a_{n-2}$ and $a_n$ of a polynomial $p_n(x)$ with $a_{n-1} = 0$ must have opposite sign if all zeros are real.

204. The conformal mapping $w = \frac{1-z}{1+z}$, with inverse $z = \frac{1-w}{1+w}$, transforms the right-half complex plane $\operatorname{Re}(z) > 0$ into the interior $|w| < 1$ of the unit circle. For, let $z = r e^{i\theta}$, then

$$w = \frac{1 - e^{i\theta + \ln r}}{1 + e^{i\theta + \ln r}} = \frac{e^{-\frac{i\theta + \ln r}{2}} - e^{\frac{i\theta + \ln r}{2}}}{e^{-\frac{i\theta + \ln r}{2}} + e^{\frac{i\theta + \ln r}{2}}} = -\tanh\left(\frac{\ln r + i\theta}{2}\right)$$
which can also be written as

$$w = \sqrt{\frac{\cosh \ln r - \cos\theta}{\cosh \ln r + \cos\theta}}\; e^{i\left(\arctan\left(\frac{\sin\theta}{\sinh \ln r}\right) + \pi\right)}$$

If $r = 1$, then $w = -i \tan\frac{\theta}{2}$, which shows that a point $z$ on the unit circle is mapped into a point $w$ on the imaginary axis (and vice versa). If $\operatorname{Re}(z) < 0$, or $\theta \in \left(\frac{\pi}{2}, \frac{3\pi}{2}\right)$, then $\cos\theta < 0$ and $|w| = \sqrt{\frac{\cosh \ln r - \cos\theta}{\cosh \ln r + \cos\theta}} > 1$. If instead $\operatorname{Re}(z) > 0$, or $\theta \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ so that $\cos\theta > 0$, then $|w| < 1$.

If all zeros of $p_n(z)$ lie in the right-half plane $\operatorname{Re}(z) > 0$ (equivalently, all zeros of $p_n(-z)$ lie in the left-half complex plane), then all zeros of $p_n\left(\frac{1-w}{1+w}\right)$ lie inside the unit circle $|w| < 1$. The function$^{1}$

$$(1+w)^n\, p_n\left(\frac{1-w}{1+w}\right) = \sum_{k=0}^{n} a_k (1+w)^{n-k} (1-w)^k$$

has the same (finite) zeros as the polynomial

$$q_n(w) = \sum_{k=0}^{n} a_k (1+w)^{n-k} (1-w)^k$$

$^{1}$ Notice that $p_n\left(-\frac{1-w}{1+w}\right)$ has all zeros outside the unit circle and that $w^n q_n\left(w^{-1}\right) = \sum_{m=0}^{n} \sum_{j=0}^{m} (-1)^j \sum_{k=0}^{n} a_k (-1)^k \binom{k}{j} \binom{n-k}{m-j} w^m$.

Using Newton's binomium and the Cauchy product gives

$$(1+w)^{n-k} (1-w)^k = \sum_{m=0}^{n} w^m \sum_{j=0}^{m} (-1)^j \binom{n-k}{m-j} \binom{k}{j}$$

such that $q_n(w) = \sum_{m=0}^{n} b_m w^m$, where

$$b_m = \sum_{k=0}^{n} \left( \sum_{j=0}^{m} (-1)^j \binom{n-k}{m-j} \binom{k}{j} \right) a_k = \sum_{k=0}^{n} B_{mk}\, a_k$$

Defining the matrix elements $B_{mk} = \sum_{j=0}^{m} (-1)^j \binom{k}{j} \binom{n-k}{m-j}$ of the $(n+1) \times (n+1)$ matrix $B$, we can write the coefficient vector $b$ in terms of the coefficient vector $a$ as $b = Ba$. We have $q_n(1) = 2^n a_0$. For any set of coefficients $a_k$, this is equivalent to

$$\sum_{m=0}^{n} b_m = \sum_{m=0}^{n} \sum_{k=0}^{n} \sum_{j=0}^{m} (-1)^j \binom{n-k}{m-j} \binom{k}{j}\, a_k = 2^n a_0$$

so that, by equating corresponding coefficients $a_k$, we find

$$\sum_{m=0}^{n} B_{mk} = \sum_{m=0}^{n} \sum_{j=0}^{m} (-1)^j \binom{k}{j} \binom{n-k}{m-j} = 2^n\, 1_{\{k=0\}}$$

Hence, the column sums of $B$ are zero, except for the zeroth column $k = 0$.

Let us consider the inverse transform $w = \frac{1-z}{1+z}$. From $p_n\left(\frac{1-w}{1+w}\right) = (1+w)^{-n} q_n(w)$, we have that

$$p_n(z) = 2^{-n} (1+z)^n\, q_n\left(\frac{1-z}{1+z}\right)$$

On the other hand,

$$q_n\left(\frac{1-z}{1+z}\right) = \sum_{m=0}^{n} b_m \left(\frac{1-z}{1+z}\right)^m = (1+z)^{-n} \sum_{m=0}^{n} b_m (1+z)^{n-m} (1-z)^m = (1+z)^{-n} \sum_{m=0}^{n} c_m z^m$$

where, since $b_k = \sum_{q=0}^{n} a_q \sum_{l=0}^{k} (-1)^l \binom{q}{l} \binom{n-q}{k-l}$,

$$\begin{aligned}
c_m &= \sum_{j=0}^{m} \sum_{k=0}^{n} (-1)^j \binom{n-k}{m-j} \binom{k}{j}\, b_k \\
&= \sum_{q=0}^{n} a_q \sum_{k=0}^{n} \left( \sum_{j=0}^{m} (-1)^j \binom{k}{j} \binom{n-k}{m-j} \right) \left( \sum_{l=0}^{k} (-1)^l \binom{q}{l} \binom{n-q}{k-l} \right)
\end{aligned}$$

Thus, $p_n(z) = 2^{-n} \sum_{m=0}^{n} c_m z^m$ and, since $p_n(z) = \sum_{m=0}^{n} a_m z^m$, we must have that $a_m = 2^{-n} c_m$, i.e.,

$$a_m = 2^{-n} \sum_{q=0}^{n} a_q \sum_{k=0}^{n} \left( \sum_{j=0}^{m} (-1)^j \binom{k}{j} \binom{n-k}{m-j} \right) \left( \sum_{l=0}^{k} (-1)^l \binom{q}{l} \binom{n-q}{k-l} \right)$$

or, after equating corresponding coefficients $a_q$,

$$\sum_{k=0}^{n} \left( \sum_{j=0}^{m} (-1)^j \binom{k}{j} \binom{n-k}{m-j} \right) \left( \sum_{l=0}^{k} (-1)^l \binom{q}{l} \binom{n-q}{k-l} \right) = 2^n \delta_{qm}$$

In terms of the matrix elements $B_{mk}$, we observe that $\sum_{k=0}^{n} B_{mk} B_{kq} = 2^n \delta_{qm}$. Hence, $B^2 = 2^n I$.

If all zeros of $q_n(w)$ lie inside the unit circle, then the sequence of the power sums $Z_j = \sum_{k=1}^{n} w_k^j$ is strictly decreasing in $j$. Art. 229 below gives another check. In addition, the sum of the inverses of the zeros $w_1, w_2, \ldots, w_n$ of $q_n(w)$ is (art. 198)

$$\sum_{k=1}^{n} \frac{1}{w_k} = -\frac{b_1}{b_0} = -\frac{\sum_{k=0}^{n} (n - 2k)\, a_k}{\sum_{k=0}^{n} a_k} = -\frac{1}{p_n(1)} \frac{d}{dw}\left[ (1+w)^n\, p_n\left(\frac{1-w}{1+w}\right) \right]_{w=0} = -n + \frac{2\, p_n'(1)}{p_n(1)}$$

205. Consider an even polynomial $r_{2n}(z) = \sum_{k=0}^{n} a_{2k} z^{2k}$. The conformal mapping $w = \frac{1-z}{1+z}$ and $z = \frac{1-w}{1+w}$ leads to
$$r_{2n}\left(\frac{1-w}{1+w}\right) = (1+w)^{-2n} \sum_{k=0}^{n} a_{2k} (1+w)^{2n-2k} (1-w)^{2k} = (1+w)^{-2n}\, t_{2n}(w)$$

where

$$t_{2n}(w) = \sum_{m=0}^{2n} b_m w^m \qquad \text{and} \qquad b_m = \sum_{k=0}^{n} \left( \sum_{j=0}^{m} (-1)^j \binom{2n-2k}{m-j} \binom{2k}{j} \right) a_{2k}$$

The inverse transform $w = \frac{1-z}{1+z}$ applied to $r_{2n}\left(\frac{1-w}{1+w}\right) = (1+w)^{-2n} t_{2n}(w)$ gives

$$r_{2n}(z) = 2^{-2n} (1+z)^{2n}\, t_{2n}\left(\frac{1-z}{1+z}\right) = 2^{-2n} (1-z)^{2n}\, t_{2n}\left(\frac{1+z}{1-z}\right)$$

where the latter follows from $r_{2n}(-z) = r_{2n}(z)$. Explicitly, we have that

$$(1+z)^{2n}\, t_{2n}\left(\frac{1-z}{1+z}\right) = \sum_{m=0}^{2n} b_m (1-z)^m (1+z)^{2n-m}$$

and

$$(1-z)^{2n}\, t_{2n}\left(\frac{1+z}{1-z}\right) = \sum_{m=0}^{2n} b_m (1-z)^{2n-m} (1+z)^m = \sum_{q=0}^{2n} b_{2n-q} (1-z)^q (1+z)^{2n-q}$$

Since the polynomials $(1-z)^m (1+z)^{2n-m}$, $0 \le m \le 2n$, are linearly independent, equating corresponding coefficients shows that

$$b_m = b_{2n-m}$$

Hence, the coefficients $b_m$ are symmetric around $b_n$. Thus,

$$w^{2n}\, t_{2n}\left(\frac{1}{w}\right) = \sum_{m=0}^{2n} b_m w^{2n-m} = \sum_{m=0}^{2n} b_{2n-m} w^{2n-m} = \sum_{q=0}^{2n} b_q w^q = t_{2n}(w)$$

or, in symmetric form$^{2}$, $w^n\, t_{2n}\left(\frac{1}{w}\right) = w^{-n}\, t_{2n}(w)$. This shows that, if the polynomial $t_{2n}(w)$ does not have a zero inside (and thus also outside) the unit circle, all $2n$ zeros of $t_{2n}(w)$ must lie on the unit circle, which is equivalent to the fact that all zeros of $r_{2n}(z)$ lie on the imaginary axis. On the other hand, we can write

$$\begin{aligned}
t_{2n}(w) &= \sum_{m=0}^{n} b_m w^m + \sum_{m=n+1}^{2n} b_m w^m = \sum_{m=0}^{n-1} b_m w^m + b_n w^n + \sum_{m=0}^{n-1} b_{2n-m} w^{2n-m} \\
&= w^n \left( b_n + \sum_{m=0}^{n-1} b_m \left( w^{m-n} + w^{n-m} \right) \right)
\end{aligned}$$

Let $w = r e^{i\theta}$; then, with $w^{m-n} + w^{n-m} = 2\cosh\left((n-m)(\ln r + i\theta)\right)$, we have

$$w^{-n}\, t_{2n}(w) = b_n + 2 \sum_{k=1}^{n} b_{n+k} \cosh\left(k (\ln r + i\theta)\right)$$

If $r = 1$, and if $b_n < 2b_{n+1} < 2b_{n+2} < \cdots < 2b_{2n}$, then

$$e^{-in\theta}\, t_{2n}\left(e^{i\theta}\right) = b_n + 2 \sum_{k=1}^{n} b_{n+k} \cos(k\theta)$$

has $2n$ distinct real roots in the interval $0 < \theta < 2\pi$, and no imaginary roots at all. The proof is given in Markushevich (1985, Vol. II, pp. 50-52).
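The coefficient maps of this section are linear, so they are easy to check exactly. The following minimal Python sketch covers three of them. It implements the shift of art. 203 to a polynomial with $a_{n-1} = 0$. It builds the matrix $B$ of art. 204 and verifies $B^2 = 2^n I$ together with the column sums. For art. 205, it confirms that an even polynomial yields palindromic coefficients $b_m = b_{2n-m}$. The helper names are hypothetical, and coefficients are stored constant term first.

```python
from fractions import Fraction
from math import comb

# Illustrative helper names (hypothetical), not from the text.


def shift(b, c):
    """Coefficients of q(x + c): a_k = sum_{j>=k} C(j, k) b_j c^(j-k) (art. 203)."""
    n = len(b) - 1
    return [sum(comb(j, k) * b[j] * c ** (j - k) for j in range(k, n + 1))
            for k in range(n + 1)]


def depress(b):
    """Shift by the mean c = -b_{n-1} / (n b_n) of the zeros, so that a_{n-1} = 0."""
    n = len(b) - 1
    c = Fraction(-b[n - 1], n * b[n])
    return shift(b, c), c


def B_matrix(n):
    """(n+1) x (n+1) matrix of art. 204: B[m][k] = sum_j (-1)^j C(k, j) C(n-k, m-j),
    the coefficient of w^m in (1 + w)^(n-k) (1 - w)^k."""
    return [[sum((-1) ** j * comb(k, j) * comb(n - k, m - j) for j in range(m + 1))
             for k in range(n + 1)] for m in range(n + 1)]


def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]


# (z - 1)(z - 2)(z - 3) shifted by its mean c = 2 gives (x + 1) x (x - 1) = x^3 - x
print(depress([-6, 11, -6, 1]))
# B^2 = 2^n I, here for n = 3
print(matmul(B_matrix(3), B_matrix(3)))
# Even polynomial r_4(z) = z^4 + 3z^2 + 1 (art. 205): palindromic b = B a
print(matvec(B_matrix(4), [1, 0, 3, 0, 1]))   # [5, 0, 6, 0, 5]
```

`math.comb(n, k)` returns 0 when $k > n$. That property implements the convention that out-of-range binomials vanish in $B_{mk}$ without extra bounds checks.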
$^{2}$ Polynomials $p_n(z)$ with complex coefficients that satisfy $\sum_{k=0}^{n} a_k z^k = \sum_{k=0}^{n} \overline{a_{n-k}}\, z^k$, equivalent to $p_n(z) = z^n\, \overline{p_n\left(1/\bar{z}\right)}$, are called self-inversive and are discussed in Sheil-Small (2002, Chapter 7).

9.3 Interpolation

206. Lagrange interpolation. Interpolation consists of constructing a polynomial that passes through a set of $n$ distinct points, defined by their finite coordinates $(x_j, y_j)$ for $1 \le j \le n$. In many cases, the ordinates are special instances of a function, $y_j = f(x_j)$, that we want to approximate by a polynomial. Lagrange developed a most convenient way to construct this "interpolating" polynomial. We start by considering the polynomial of degree $n$,

$$F_n(x) = \prod_{j=1}^{n} (x - x_j) \tag{9.13}$$

Clearly, $\frac{F_n(x)}{x - x_k} = \prod_{j=1;\, j\neq k}^{n} (x - x_j)$ is a polynomial of degree $n-1$ and, since $F_n(x_k) = 0$, we have that

$$\lim_{x \to x_k} \frac{F_n(x)}{x - x_k} = \lim_{x \to x_k} \frac{F_n(x) - F_n(x_k)}{x - x_k} = F_n'(x_k) = \prod_{j=1;\, j\neq k}^{n} (x_k - x_j) \tag{9.14}$$

Since all $x_k$ are distinct, $x_k$ is a simple zero of $F_n(x)$ such that $F_n'(x_k) \neq 0$. Hence, the polynomial of degree $n-1$,

$$l_{n-1}(x; x_k) = \frac{F_n(x)}{(x - x_k)\, F_n'(x_k)} = \prod_{j=1;\, j\neq k}^{n} \frac{x - x_j}{x_k - x_j} \tag{9.15}$$

possesses the interesting property that it vanishes at every abscissa $x_1, x_2, \ldots, x_n$ except at $x = x_k$, where it equals one. Thus, with the Kronecker delta $\delta_{kj}$, defined as $\delta_{kj} = 0$ if $j \neq k$ and $\delta_{kk} = 1$, it holds that $l_{n-1}(x_j; x_k) = \delta_{kj}$. Lagrange observed that the polynomial of degree (at most) $n-1$,

$$p_{n-1}(x) = \sum_{j=1}^{n} y_j\, l_{n-1}(x; x_j) \tag{9.16}$$

passes through all $n$ points $\{(x_j, y_j)\}_{1\le j\le n}$, with $y_j = p_{n-1}(x_j)$ for $1 \le j \le n$. The polynomial (9.16) is called the Lagrange interpolation polynomial corresponding to the set of $n$ points $\{(x_j, y_j)\}_{1\le j\le n}$. The Lagrange polynomial (9.16) is unique. Indeed, assume that there is another polynomial $q_{n-1}(x)$ of degree at most $n-1$ that passes through the same set of $n$ points. Then $p_{n-1}(x) - q_{n-1}(x)$ is again a polynomial of degree at most $n-1$ that possesses $n$ zeros at $x_j$ for $1 \le j \le n$, which is impossible (art. 196) unless it is identically zero. Hence, $p_{n-1}(x) = q_{n-1}(x)$, which establishes the uniqueness.

207. Newton interpolation. When the abscissae $\{x_k\}_{1\le k\le n}$ are chosen equidistantly as $x_k = k\,\Delta x$ for $1 \le k \le n$, then the Lagrange interpolating polynomial (9.16) reduces to

$$p_{n-1}(x) = F_n(x) \sum_{j=1}^{n} \frac{p_{n-1}(j\,\Delta x)}{(x - j\,\Delta x)\, (\Delta x)^{n-1} \prod_{m=1;\, m\neq j}^{n} (j - m)}$$

Using $\prod_{m=1;\, m\neq j}^{n} (j - m) = (-1)^{n-j} (j-1)!\, (n-j)!$, we obtain a variant of Newton's interpolating polynomial,

$$p_{n-1}(x) = \frac{\prod_{k=1}^{n} (x - k\,\Delta x)}{(\Delta x)^{n-1}\, (n-1)!} \sum_{j=1}^{n} (-1)^{n-j} \binom{n-1}{j-1} \frac{p_{n-1}(j\,\Delta x)}{x - j\,\Delta x}$$
It is often more convenient to interpolate the polynomial $p_n(x)$ of degree $n$ at the $n+1$ equidistant points $0, y, 2y, \ldots, ny$ (step $\Delta x = y$). In that case we arrive at the classical Newton interpolating polynomial:

$$p_n(xy) = \frac{\prod_{k=0}^{n} (x - k)}{n!} \sum_{j=0}^{n} \binom{n}{j} (-1)^{n-j} \frac{p_n(jy)}{x - j} \tag{9.17}$$

In particular, the special case where $x = -1$ leads to

$$p_n(-y) = \sum_{j=0}^{n} \binom{n+1}{j+1} (-1)^j\, p_n(jy) \tag{9.18}$$

which expresses the value of a polynomial at a negative argument in terms of its values at non-negative arguments. Finally, we present

$$p_n(xy) = \sum_{q=0}^{n} \binom{y}{q} \Delta^q p_n(0) \tag{9.19}$$

where

$$\Delta^q p_n(0) = \sum_{k=0}^{q} (-1)^{q-k} \binom{q}{k}\, p_n(kx)$$

is the $q$-th difference with step $x$. With $f_m = p_n(mx)$, the differences obey $\Delta^q f_m = \Delta^{q-1} f_{m+1} - \Delta^{q-1} f_m$ for all $q \in \mathbb{N}$, with $\Delta^0 f_m = f_m$, thus $\Delta f_m = f_{m+1} - f_m$. By iteration, we have that $\Delta^q f_m = \sum_{k=0}^{q} \binom{q}{k} (-1)^{q-k} f_{m+k}$. Substituting the polynomial form $p_n(x) = \sum_{k=0}^{n} a_k x^k$ into the $q$-th difference yields

$$\frac{\Delta^q p_n(0)}{q!} = \sum_{m=q}^{n} \mathcal{S}_m^{(q)}\, a_m x^m$$

where $\mathcal{S}_m^{(q)}$ are the Stirling Numbers of the Second Kind (Abramowitz and Stegun, 1968, Section 24.1.4). For $x = 1$, the relation (9.19) is commonly known as Newton's difference expansion for polynomials. If both sides of (9.19) converge in the limit $n \to \infty$, the left-hand side converges to the Taylor series of a complex function $f$, and the right-hand side then equals the difference expansion for $f$. Using the formula

$$\sum_{j=k}^{n} (-1)^{k+j} \binom{y}{j} \binom{j}{k} = \frac{(-1)^{k+n}\, \Gamma(1+y)}{(y-k)\, k!\, (n-k)!\, \Gamma(y-n)} \tag{9.20}$$

which equals $\delta_{n,k}$ if $y = n$, and where $\Gamma(x)$ is the Gamma function, we may verify that Newton's difference expansion (9.19) for polynomials is equivalent to Newton's interpolating formula (9.17).
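The interpolation formulas of this section can be cross-checked on a small cubic. The following minimal Python sketch evaluates the Lagrange polynomial (9.16) and Newton's forward-difference expansion (9.19) with step $x = 1$. It also checks (9.18) with $y = 1$. It assumes integer nodes and, for `newton_forward`, a non-negative integer argument, because `math.comb` only accepts integers. The helper names and the test polynomial are hypothetical.

```python
from fractions import Fraction
from math import comb

# Illustrative helper names and test polynomial (hypothetical), not from the text.


def lagrange(points, x):
    """Evaluate the Lagrange polynomial (9.16) through distinct points (x_j, y_j) at x."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        basis = Fraction(1)                       # l_{n-1}(x; x_j) of (9.15)
        for m, (xm, _) in enumerate(points):
            if m != j:
                basis *= Fraction(x - xm) / (xj - xm)
        total += yj * basis
    return total


def newton_forward(values, y):
    """Newton's difference expansion (9.19) with step 1:
    p(y) = sum_q C(y, q) Delta^q p(0), for integer y >= 0 and values = [p(0), ..., p(n)]."""
    total, diffs = 0, list(values)
    for q in range(len(values)):
        total += comb(y, q) * diffs[0]            # diffs[0] = Delta^q p(0)
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return total


def p(x):
    return x**3 - 2*x + 5


values = [p(j) for j in range(4)]                 # p(0), ..., p(3) = 5, 4, 9, 26
print(lagrange(list(enumerate(values)), 7), p(7))       # 334 334
print(newton_forward(values, 7))                        # 334
# (9.18) with y = 1: p(-1) = sum_j C(n+1, j+1) (-1)^j p(j)
n = 3
print(sum(comb(n + 1, j + 1) * (-1) ** j * values[j] for j in range(n + 1)), p(-1))  # 6 6
```

Four equidistant samples determine the cubic completely, so both evaluations reproduce $p(7)$ exactly, consistent with the uniqueness argument of art. 206.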
9.4 The Euclidean algorithm

208. Consider two polynomials$^{3}$ $p_0(z) = \sum_{k=0}^{n} a_k z^k$ and $p_1(z) = \sum_{k=0}^{m} b_k z^k$, both with complex coefficients, where the degree $n$ of $p_0(z)$ is larger than or equal to the degree $m$ of $p_1(z)$. Then, there always exists a polynomial $q_1(z)$, called the quotient, such that $p_0(z) = q_1(z)\, p_1(z) + p_2(z)$ and the degree of the remainder polynomial $p_2(z)$ is smaller than $m$. Indeed, we can always remove the highest degree term $a_n z^n$ in $p_0(z)$ by subtracting $\frac{a_n}{b_m} z^{n-m} p_1(z)$. This first step in the long division yields

$$p_0(z) - \frac{a_n}{b_m} z^{n-m} p_1(z) = \sum_{j=0}^{n-1} \left( a_j - \frac{a_n}{b_m} b_{j-n+m} \right) z^j = r(z)$$

where we use the convention $b_j = 0$ for $j < 0$ and where the degree of $r(z) = \sum_{j=0}^{n-1} r_j z^j$ is at most $n-1$. If the degree of the remainder $r(z)$ is at least $m$, we repeat the process and subtract $\frac{r_{n-1}}{b_m} z^{n-1-m} p_1(z)$ from $r(z)$, resulting in a remainder with degree at most $n-2$. As long as the degree of the remainder polynomial is at least $m$, we repeat this process of successively lowering the highest degree. Eventually, we arrive at a remainder $p_2(z)$ with degree smaller than $m$. This operation is the well-known long division. Next, we can rewrite the equation as

$$\frac{p_0(z)}{p_1(z)} = q_1(z) + \frac{p_2(z)}{p_1(z)} = q_1(z) + \frac{1}{\frac{p_1(z)}{p_2(z)}}$$

and apply the same recipe to $\frac{p_1(z)}{p_2(z)} = q_2(z) + \frac{p_3(z)}{p_2(z)}$, where the degree of $p_3(z)$ is smaller than that of $p_2(z)$. Thus,

$$\frac{p_0(z)}{p_1(z)} = q_1(z) + \cfrac{1}{q_2(z) + \cfrac{1}{\frac{p_2(z)}{p_3(z)}}}$$

We can repeat the recipe with $\frac{p_2(z)}{p_3(z)} = q_3(z) + \frac{p_4(z)}{p_3(z)}$, where again the degree of $p_4(z)$ is smaller than that of $p_3(z)$. Hence, we can always reduce the degree of the remainder and eventually it will be equal to zero. The result is a finite continued fraction for $\frac{p_0(z)}{p_1(z)}$ in terms of the subsequent quotients $q_1(z), q_2(z), \ldots, q_m(z)$,

$$\frac{p_0(z)}{p_1(z)} = q_1(z) + \cfrac{1}{q_2(z) + \cfrac{1}{q_3(z) + \cfrac{1}{\ddots + \cfrac{1}{q_m(z)}}}}$$

Alternatively, we obtain a system of polynomial equations

$$\begin{aligned}
p_0(z) &= q_1(z)\, p_1(z) + p_2(z) && (0 \le \deg p_2 < \deg p_1) \\
p_1(z) &= q_2(z)\, p_2(z) + p_3(z) && (0 \le \deg p_3 < \deg p_2) \\
p_2(z) &= q_3(z)\, p_3(z) + p_4(z) && (0 \le \deg p_4 < \deg p_3) \\
&\;\;\vdots \\
p_{m-2}(z) &= q_{m-1}(z)\, p_{m-1}(z) + p_m(z) && (0 \le \deg p_m < \deg p_{m-1}) \\
p_{m-1}(z) &= q_m(z)\, p_m(z)
\end{aligned}$$

which is known as Euclid's algorithm. The last equation shows that $p_m(z)$ divides $p_{m-1}(z)$. The last but one equation indicates that $p_m(z)$ also divides $p_{m-2}(z)$. Continuing upwards, we see that the polynomial $p_m(z)$ divides all polynomials $p_k(z)$ with $0 \le k \le m$. Hence, $p_m(z)$ cannot be zero for all $z$, otherwise all $p_k(z)$ would be zero. Also, $p_m(z)$ is the greatest common divisor polynomial$^{4}$ of $p_0(z)$ and $p_1(z)$. Indeed, any common divisor polynomial $d(z)$ of $p_0(z)$ and $p_1(z)$ obeys $d \mid p_0$ and $d \mid p_1$. The first Euclidean equation then indicates that $d \mid p_2$, and subsequently $d \mid p_k$ for all $k$, in particular $d \mid p_m$. Since the degree of the sequence of polynomials $p_k(z)$ strictly decreases, $p_m(z)$ is the largest possible common divisor polynomial. Consequently, the functions $f_k(z) = \frac{p_k(z)}{p_m(z)}$ are again polynomials.

$^{3}$ The index $k$ in $p_k(z)$ here deviates from the general definition (9.1), where it denotes the degree.

209. Division by a first degree polynomial. The division of $p_0(z) = \sum_{k=0}^{n} a_k z^k$ by $p_1(z) = z - \zeta$ can be computed explicitly. The long division of $p_0(z)$ by $z - \zeta$ gives the remainder $p_2(z) = p_0(\zeta)$ and the quotient

$$q_1(z) = \sum_{k=0}^{n-1} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=k+1}^{n} a_j \zeta^j \right\} z^k \tag{9.21}$$
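The quotient (9.21) is what the familiar synthetic (Horner) division produces. The following minimal Python sketch compares the two. It writes (9.21) as $q_k = \sum_{j=k+1}^{n} a_j \zeta^{j-k-1}$, an equivalent form that avoids dividing by $\zeta$. It also checks the remainder $p_0(\zeta)$ and the property $q_1(\zeta) = p_0'(\zeta)$ derived below. The function names are hypothetical.

```python
# Illustrative helper names (hypothetical), not from the text.


def quotient_by_formula(a, zeta):
    """Coefficients q_0, ..., q_{n-1} of the quotient (9.21), written without division
    by zeta: q_k = sum_{j=k+1}^{n} a_j zeta^(j-k-1)."""
    n = len(a) - 1
    return [sum(a[j] * zeta ** (j - k - 1) for j in range(k + 1, n + 1)) for k in range(n)]


def synthetic_division(a, zeta):
    """Long division of p(z) = sum_k a[k] z**k by (z - zeta), Horner style.
    Returns the quotient coefficients and the remainder p(zeta)."""
    n = len(a) - 1
    q = [0] * n
    carry = a[n]
    for k in range(n - 1, -1, -1):
        q[k] = carry                   # q_k = a_{k+1} + zeta * q_{k+1}
        carry = a[k] + zeta * carry
    return q, carry                    # carry = p(zeta)


a = [-6, 11, -6, 1]                    # z^3 - 6z^2 + 11z - 6
q, remainder = synthetic_division(a, 4)
print(q, remainder)                    # [3, -2, 1] 6
print(q == quotient_by_formula(a, 4))  # True
# q_1(zeta) = p'(zeta): evaluate the quotient at zeta = 4 and compare with p'(4)
print(sum(c * 4**k for k, c in enumerate(q)), 3 * 4**2 - 12 * 4 + 11)   # 11 11
```

The Horner form needs only $n$ multiplications. The explicit formula exposes each coefficient as a tail sum of the $a_j$, which is the form used in the series argument that follows.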
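It is instructive to relate the long division to Taylor series expansions in analysis. With the convention that $a_k = 0$ if $k > n$, the Cauchy product of two Taylor series around $z = 0$ yields, for $|z| < |\zeta|$,

$$\frac{p_0(z)}{z - \zeta} = -\sum_{k=0}^{\infty} \frac{z^k}{\zeta^{k+1}} \sum_{k=0}^{\infty} a_k z^k = -\sum_{k=0}^{\infty} \left\{ \sum_{j=0}^{k} a_j \zeta^j \right\} \frac{z^k}{\zeta^{k+1}}$$

We split the series into two parts and take into account that $a_k = 0$ if $k > n$,

$$\frac{p_0(z)}{z - \zeta} = -\sum_{k=0}^{n-1} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=0}^{k} a_j \zeta^j \right\} z^k - \sum_{k=n}^{\infty} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=0}^{n} a_j \zeta^j \right\} z^k$$

Observing that $p_0(\zeta) = \sum_{j=0}^{n} a_j \zeta^j$, the last sum equals

$$\sum_{k=n}^{\infty} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=0}^{n} a_j \zeta^j \right\} z^k = p_0(\zeta) \sum_{k=n}^{\infty} \frac{z^k}{\zeta^{k+1}} = p_0(\zeta) \left( \sum_{k=0}^{\infty} \frac{z^k}{\zeta^{k+1}} - \sum_{k=0}^{n-1} \frac{z^k}{\zeta^{k+1}} \right) = -\frac{p_0(\zeta)}{z - \zeta} - p_0(\zeta) \sum_{k=0}^{n-1} \frac{z^k}{\zeta^{k+1}}$$

such that

$$\frac{p_0(z)}{z - \zeta} = \sum_{k=0}^{n-1} \frac{1}{\zeta^{k+1}} \left\{ p_0(\zeta) - \sum_{j=0}^{k} a_j \zeta^j \right\} z^k + \frac{p_0(\zeta)}{z - \zeta} \tag{9.22}$$

where the sum is the quotient $q_1(z)$ in (9.21) obtained by the long division. Hence, we obtain the difference quotient

$$q_1(z) = \frac{p_0(z) - p_0(\zeta)}{z - \zeta}$$

from which it follows that $q_1(\zeta) = p_0'(\zeta)$. The latter is, indeed, deduced from (9.21) as

$$q_1(\zeta) = \sum_{k=0}^{n-1} \sum_{j=k}^{n-1} a_{j+1} \zeta^j = \sum_{j=0}^{n-1} a_{j+1} \zeta^j \sum_{k=0}^{j} 1 = \sum_{j=0}^{n-1} (j+1)\, a_{j+1} \zeta^j = p_0'(\zeta)$$

$^{4}$ For a particular value $z = z_0$, we obtain the greatest common divisor of the two numbers $p_0(z_0)$ and $p_1(z_0)$.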
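The case in which $\zeta$ is a zero of $p_0(z)$ is of particular interest:

$$\frac{p_0(z)}{z - \zeta} = \sum_{k=0}^{n-1} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=k+1}^{n} a_j \zeta^j \right\} z^k = -\sum_{k=0}^{n-1} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=0}^{k} a_j \zeta^j \right\} z^k$$

This expression gives the explicit quotient polynomial $q_1(z)$ after division by a linear factor $z - \zeta$, where $p_0(\zeta) = 0$.

210. Division by an $m$-degree polynomial. By using the Taylor series in case $m = 2$,

$$\frac{1}{(z - \zeta)(z - \eta)} = \frac{1}{\zeta\eta} \sum_{k=0}^{\infty} \frac{z^k}{\zeta^k} \sum_{k=0}^{\infty} \frac{z^k}{\eta^k} = \frac{1}{\zeta\eta} \sum_{k=0}^{\infty} \left\{ \frac{1}{\eta^k} \sum_{j=0}^{k} \left(\frac{\eta}{\zeta}\right)^j \right\} z^k = \frac{1}{\zeta - \eta} \sum_{k=0}^{\infty} \left( \frac{1}{\eta^{k+1}} - \frac{1}{\zeta^{k+1}} \right) z^k$$

a similar series expansion manipulation as in art. 209 leads to

$$\frac{p_0(z)}{(z - \zeta)(z - \eta)} = \frac{1}{\zeta - \eta} \sum_{k=0}^{n-2} \left\{ \frac{1}{\zeta^{k+1}} \sum_{j=k+1}^{n} a_j \zeta^j - \frac{1}{\eta^{k+1}} \sum_{j=k+1}^{n} a_j \eta^j \right\} z^k + \frac{p_0(\zeta)}{\zeta - \eta} \frac{1}{z - \zeta} + \frac{p_0(\eta)}{\eta - \zeta} \frac{1}{z - \eta}$$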
When $\eta = \zeta$, differentiation (i.e., the limit $\eta \to \zeta$) yields

$$\frac{p_0(z)}{(z - \zeta)^2} = \sum_{k=0}^{n-2} \left\{ \frac{1}{\zeta^{k+2}} \sum_{j=k+1}^{n} (j - k - 1)\, a_j \zeta^j \right\} z^k + \frac{p_0'(\zeta)}{z - \zeta} + \frac{p_0(\zeta)}{(z - \zeta)^2}$$

The general result is most elegantly deduced from Cauchy's integral theorem,

$$f(z) = \frac{1}{2\pi i} \oint_{C(z)} \frac{f(w)}{w - z}\, dw$$

where the contour $C(z)$ encloses the point $w = z$. Let $p_1(z) = \prod_{j=1}^{m} (z - \zeta_j)$ and assume that all zeros $\zeta_j$ of $p_1(z)$ are different$^{5}$, then

$$\frac{1}{p_1(z)} = \frac{1}{2\pi i} \oint_{C(z)} \frac{dw}{\prod_{j=1}^{m} (w - \zeta_j)\, (w - z)}$$

Since $\lim_{z \to \infty} \frac{1}{|p_1(z)|} = 0$, we can deform the contour $C(z)$ to enclose the entire complex plane except for an arbitrarily small region around the point $w = z$. The function $\frac{1}{p_1(w)}$ is analytic everywhere, except for the simple poles at $w = \zeta_j$. Cauchy's residue theorem (Titchmarsh, 1964) then leads to

$$\frac{1}{p_1(z)} = -\sum_{l=1}^{m} \lim_{w \to \zeta_l} \frac{w - \zeta_l}{\prod_{j=1}^{m} (w - \zeta_j)\, (w - z)} = \sum_{l=1}^{m} \frac{1}{\prod_{j=1;\, j\neq l}^{m} (\zeta_l - \zeta_j)} \frac{1}{z - \zeta_l} = -\sum_{l=1}^{m} \frac{1}{\prod_{j=1;\, j\neq l}^{m} (\zeta_l - \zeta_j)} \sum_{k=0}^{\infty} \frac{z^k}{\zeta_l^{k+1}}$$

Thus, we find the Taylor expansion, for $|z| < \min_{1\le l\le m} |\zeta_l|$,

$$\frac{1}{p_1(z)} = \frac{1}{\prod_{j=1}^{m} (z - \zeta_j)} = -\sum_{k=0}^{\infty} \left\{ \sum_{l=1}^{m} \frac{1}{\zeta_l^{k+1}} \prod_{j=1;\, j\neq l}^{m} \frac{1}{\zeta_l - \zeta_j} \right\} z^k \tag{9.24}$$

$^{5}$ When not all zeros are simple, we need to invoke the $k$-th derivative of the Cauchy integral,

$$\frac{1}{k!} \frac{d^k f(z)}{dz^k} = \frac{1}{2\pi i} \oint_{C(z)} \frac{f(w)}{(w - z)^{k+1}}\, dw \tag{9.23}$$

Proceeding similarly as in art. 209 by computing the Cauchy product of $p_0(z)$ and $\frac{1}{p_1(z)}$, we arrive at

$$\frac{p_0(z)}{p_1(z)} = \sum_{k=0}^{n-m} \left\{ \sum_{l=1}^{m} \frac{1}{\zeta_l^{k+1} \prod_{j=1;\, j\neq l}^{m} (\zeta_l - \zeta_j)} \sum_{q=k+1}^{n} a_q \zeta_l^q \right\} z^k + \sum_{l=1}^{m} \frac{1}{\prod_{j=1;\, j\neq l}^{m} (\zeta_l - \zeta_j)} \frac{p_0(\zeta_l)}{z - \zeta_l} \tag{9.25}$$

Observe from (9.13) that $\prod_{j=1;\, j\neq l}^{m} (\zeta_l - \zeta_j) = F_m'(\zeta_l)$ and that $p_1(z) = F_m(z)$, where $x_k = \zeta_k$. Hence (9.25), when $m = n$, equals

$$p_0(z) = \sum_{l=1}^{n} p_0(\zeta_l) \frac{F_n(z)}{(z - \zeta_l)\, F_n'(\zeta_l)}$$

which is the Lagrange interpolation polynomial (9.16) corresponding to the set of $n$ points $\{(\zeta_l, p_0(\zeta_l))\}_{1\le l\le n}$. The first sum in (9.25), which reduces to the $k = 0$ term when $n = m$, vanishes because

$$\sum_{l=1}^{n} \frac{p_0(\zeta_l) - p_0(0)}{\zeta_l\, F_n'(\zeta_l)} = \sum_{l=1}^{n} \frac{p_0(\zeta_l)}{\zeta_l\, F_n'(\zeta_l)} - p_0(0) \sum_{l=1}^{n} \frac{1}{\zeta_l\, F_n'(\zeta_l)} = 0$$

which follows from the Taylor expansion (9.24) and the Lagrange polynomial, both evaluated at $z = 0$. We have implicitly assumed that not all $\zeta_l$ are zeros of $p_0(z)$. If all $\{\zeta_l\}_{1\le l\le m}$ are simple zeros of $p_0(z)$, then (9.25), written in terms of the polynomial in (9.13), reduces to the quotient polynomial

$$q_m(z) = \frac{p_0(z)}{F_m(z)} = \sum_{k=0}^{n-m} \left\{ \sum_{l=1}^{m} \frac{\sum_{q=0}^{n-k-1} a_{q+k+1}\, \zeta_l^q}{F_m'(\zeta_l)} \right\} z^k \tag{9.26}$$

Compared to the quotient polynomial (9.21) for $m = 1$, the coefficients in the general version (9.26) of the quotient polynomial only require $m$ similar polynomial evaluations $\sum_{q=0}^{n-k-1} a_{q+k+1}\, \zeta_l^q$ as in (9.21) and $m$ additional $F_m'(\zeta_l)$ computations.

211. Minimal polynomial. The minimal polynomial associated to a polynomial $p_n(z) = a_n \prod_{k=1}^{l} (z - z_k)^{m_k}$, where $m_k$ denotes the multiplicity of zero $z_k$, is defined as

$$m_{p_n}(z) = a_n \prod_{k=1}^{l} (z - z_k) \tag{9.27}$$

The minimal polynomial divides $p_n(z)$ and is the lowest degree polynomial possessing the same zeros as $p_n(z)$, all with multiplicity 1. If $p_n(z)$ has only simple zeros, i.e., $m_k = 1$ for all $1 \le k \le n$, then $m_{p_n}(z) = p_n(z)$. The minimal polynomial plays an important role in matrix polynomials and the Cayley-Hamilton theorem (art. 145).
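Euclid's algorithm of art. 208 and the minimal polynomial (9.27) fit together. Over $\mathbb{C}$, $\gcd(p_n, p_n') = \prod_k (z - z_k)^{m_k - 1}$ (a standard fact not stated in the text), so $p_n / \gcd(p_n, p_n')$ equals $m_{p_n}(z)$ without computing any zero. The following minimal Python sketch performs the long division of art. 208 and Euclid's algorithm with exact rational coefficients, stored constant term first. All function names are hypothetical, and the gcd is normalized to be monic by assumption.

```python
from fractions import Fraction

# Illustrative helper names (hypothetical), not from the text.


def trim(p):
    """Drop vanishing leading coefficients (constant term first); keep at least [0]."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def divmod_poly(num, den):
    """Long division of art. 208: num = quot * den + rem with deg rem < deg den."""
    num = trim([Fraction(c) for c in num])
    den = trim([Fraction(c) for c in den])
    if den == [0]:
        raise ZeroDivisionError("division by the zero polynomial")
    quot = [Fraction(0)] * max(len(num) - len(den) + 1, 1)
    rem = num
    while len(rem) >= len(den) and rem != [0]:
        shift = len(rem) - len(den)
        factor = rem[-1] / den[-1]           # removes the highest-degree term
        quot[shift] = factor
        for i, c in enumerate(den):
            rem[shift + i] -= factor * c
        rem = trim(rem[:-1]) if len(rem) > 1 else [Fraction(0)]
    return quot, rem


def gcd_poly(p0, p1):
    """Euclid's algorithm: the last non-zero remainder, normalized to be monic."""
    p0, p1 = trim([Fraction(c) for c in p0]), trim([Fraction(c) for c in p1])
    while p1 != [0]:
        _, r = divmod_poly(p0, p1)
        p0, p1 = p1, r
    return [c / p0[-1] for c in p0]


def derivative(p):
    return [k * p[k] for k in range(1, len(p))] or [0]


def minimal_polynomial(p):
    """a_n prod_k (z - z_k) of (9.27), obtained here as p / gcd(p, p')."""
    quot, rem = divmod_poly(p, gcd_poly(p, derivative(p)))
    assert rem == [0]
    return quot


p = [2, -3, 0, 1]                      # (z - 1)^2 (z + 2) = z^3 - 3z + 2
print(gcd_poly(p, derivative(p)))      # z - 1, i.e. [-1, 1] as Fractions
print(minimal_polynomial(p))           # z^2 + z - 2, i.e. [-2, 1, 1]
```

Exact fractions matter here: with floating point, a remainder that should vanish leaves a tiny non-zero residue and Euclid's algorithm would not terminate at the true gcd.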
9.5 Descartes' rule of signs

212. A famous theorem due to René Descartes is:

Theorem 47 (Descartes' rule of signs) Let $C$ denote the number of changes of sign in the sequence of real coefficients $a_0, a_1, \ldots, a_n$ of a polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ and let $Z$ denote the number of positive real zeros of $p_n(x)$, then $C - Z = 2k \ge 0$, where $k$ is a non-negative integer.

Before proving Theorem 47, we make the following observation. The product form (9.1) for a real polynomial with $v$ real and $2k$ complex zeros, such that $n = v + 2k$, can be written as

$$p_n(z) = a_n \prod_{j=1}^{v} (z - x_j) \prod_{j=1}^{k} \left( (z - \operatorname{Re} z_j)^2 + (\operatorname{Im} z_j)^2 \right)$$

from which

$$a_0 = p_n(0) = a_n \prod_{j=1}^{v} (-x_j) \prod_{j=1}^{k} \left( (\operatorname{Re} z_j)^2 + (\operatorname{Im} z_j)^2 \right)$$

shows that the sign of $a_0$ does not depend on the complex zeros. For example, $a_0$ has the sign of $a_n$ if all real zeros $x_j$ are negative. A sequence $a_0, a_1, \ldots, a_n$ in which $\operatorname{sign}(a_0) = \operatorname{sign}(a_n)$, equivalent to $a_0 a_n > 0$, has an even number of changes in sign. This is best verified when the sequence is plotted as a piecewise linear function through the points $(k, a_k)$ for $0 \le k \le n$, similar to the random walk in art. 132. The number of sign changes equals the number of $k$-axis crossings. Zero coefficients do not contribute to a change in sign. For example, the sequence $\{1, -2, 0, 0, 1, 0, 1\}$ has two sign changes. Generalizing this observation, consider a polynomial $p_n(z)$ with real coefficients. If it has an even number of real, positive zeros, so that $\operatorname{sign}(a_0) = \operatorname{sign}(a_n)$, the number $C$ of sign changes in $a_0, a_1, \ldots, a_n$ is even. If it has an odd number of real, positive zeros, so that $\operatorname{sign}(a_0) = -\operatorname{sign}(a_n)$, the number $C$ of sign changes is odd. This argument demonstrates that $C - Z = 2k$ is even. To show that $k$ is non-negative, a deeper argument is needed.

Proof$^{6}$ (by Laguerre): Let $z = e^x$; then the number of real zeros of the function $p_n(e^x) = \sum_{k=0}^{n} a_k e^{kx}$ is the same as the number of positive zeros of $p_n(z)$, because $x = \log z$ is monotonously increasing for $z > 0$. Laguerre actually proves a more general result by considering the entire function

$$F(x) = \sum_{k=0}^{n} a_k e^{\lambda_k x}$$

where the real numbers obey $\lambda_0 < \lambda_1 < \cdots < \lambda_n$. Clearly, if we choose $\lambda_k = k$, we obtain the result for $p_n(e^x)$. Let $C$ denote the number of changes in sign in the sequence $a_0, a_1, \ldots, a_n$ and let $Z$ denote the number of real zeros of the entire function $F(x)$. For $x \to \infty$ the term $a_n e^{\lambda_n x}$ dominates, while for $x \to -\infty$ the term $a_0 e^{\lambda_0 x}$ is dominant, so the argument above shows that $C - Z$ is even. The proof that $C - Z \ge 0$ is by induction. If there are no changes of sign ($C = 0$), then there are no zeros ($Z = 0$) and $C \ge Z$. Assume that the Theorem holds for $C - 1$ changes of sign (hypothesis). Suppose that $F(x)$ has $C > 0$ changes of sign and let $\mu + 1$ be an index of change, i.e., $a_\mu a_{\mu+1} < 0$ for some $0 \le \mu < n$. Consider now the related function

$$G(x) = \sum_{k=0}^{n} a_k (\lambda_k - \xi)\, e^{\lambda_k x}$$

Then, for $\lambda_\mu < \xi < \lambda_{\mu+1}$, the number of changes of sign in the sequence $a_0(\lambda_0 - \xi), a_1(\lambda_1 - \xi), \ldots, a_\mu(\lambda_\mu - \xi), a_{\mu+1}(\lambda_{\mu+1} - \xi), \ldots, a_n(\lambda_n - \xi)$ is precisely $C_G = C - 1$. This is because now $a_\mu(\lambda_\mu - \xi)\, a_{\mu+1}(\lambda_{\mu+1} - \xi) > 0$, whereas the signs of all other consecutive products remain unchanged. Further,

$$G(x) = e^{\xi x} \frac{d}{dx} \left( e^{-\xi x} F(x) \right)$$

and, since $e^{-\xi x} > 0$ for all real $x$, $e^{-\xi x} F(x)$ and $F(x)$ have the same real zeros. As a consequence of Rolle's Theorem, the derivative $f'(x)$ has no fewer than $Z - 1$ zeros in the same interval where $f(x)$ has $Z$ zeros. Hence, $G(x)$ has $Z_G \ge Z - 1$ zeros. On the other hand, $G(x)$ has at most $Z - 1$ zeros$^{7}$. Thus, $C_G - Z_G = C - 1 - (Z - 1)$ and, by the induction hypothesis applied to $G(x)$, we arrive at $C_G \ge Z_G$. Introducing $C_G = C - 1$ and $Z_G = Z - 1$, we finally obtain that $C \ge Z$, which completes the induction. $\square$

Since the exponents $\{\lambda_k\}_{0\le k\le n}$ can be arbitrary real numbers, Laguerre's proof extends Descartes' rule of signs to a finite sum of non-integer powers of $x$. For example, $x^3 - x^2 + x^{1/3} + x^{1/7} - 1 - x^{-2} = 0$ has $C = 3$ sign changes, and thus at most $Z \le 3$ positive (real) zeros. Since the polynomial $p_n(-x)$ has coefficients $(-1)^k a_k$, Descartes' rule of signs indicates that the number of negative real zeros of $p_n(x)$ is not larger than the number of changes in sign in the coefficients of $p_n(-x)$.

$^{6}$ In 1828, Gauss proved Descartes' rule, which was published in 1637 in Descartes' Géométrie. Laguerre studied and extended Descartes' rule in several papers, combined by Hermite et al. (1972) in the part on Algebra.

$^{7}$ If $f(z)$ is an analytic function in the interior of a single closed contour $C$ defined by $|f(z)| = M$, where $M$ is a constant, then the number of zeros of $f(z)$ in this region exceeds the number of zeros of the derivative $f'(z)$ in that same region by unity (Whittaker and Watson, 1996, p. 121).
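Counting sign changes is the only computation the rule requires. The following minimal Python sketch implements the count as described in art. 212, with zero coefficients skipped. It returns both bounds: $C$ for positive zeros from $p_n(x)$, and the bound for negative zeros from $p_n(-x)$. It also applies the same count to Laguerre's extension with real exponents, for $x > 0$, after ordering the terms by exponent. The function names and sample polynomials are hypothetical.

```python
# Illustrative helper names and sample polynomials (hypothetical), not from the text.


def sign_changes(seq):
    """Number of sign changes in a real sequence; zero entries are skipped (art. 212)."""
    signs = [c > 0 for c in seq if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def descartes_bounds(a):
    """Upper bounds (C+, C-) for the numbers of positive and negative real zeros of
    p(x) = sum_k a[k] x**k; C- counts the sign changes of (-1)^k a_k, i.e. of p(-x)."""
    return sign_changes(a), sign_changes([(-1) ** k * ak for k, ak in enumerate(a)])


def laguerre_bound(terms):
    """Laguerre's extension: for a finite sum of terms c * x**lam with real exponents,
    count the sign changes of the coefficients ordered by increasing exponent."""
    return sign_changes([c for _, c in sorted(terms)])


print(descartes_bounds([6, -7, 0, 1]))          # x^3 - 7x + 6 = (x-1)(x-2)(x+3) -> (2, 1)
print(descartes_bounds([-1, 1, -1, 1]))         # x^3 - x^2 + x - 1 -> (3, 0)
print(laguerre_bound([(2.5, 1), (1, -3), (0, 2)]))   # x^(5/2) - 3x + 2 -> 2
```

For $(x-1)(x-2)(x+3)$ both bounds are attained. For $x^3 - x^2 + x - 1$ there is only one positive zero against $C = 3$, the difference $C - Z = 2$ being even, as the theorem requires.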
Descartes' rule of signs is only exact if $C < 2$, because $C - Z = 2k$ with $k \geq 0$ and $Z \geq 0$; thus, in case there is no ($C = 0$) or only one ($C = 1$) sign variation, there is, respectively, no or exactly one positive real zero. The reverse of the $C = 0$ case holds: if all zeros of a real polynomial have negative real part, then all coefficients have the same sign and there is no change in sign. However, the reverse implication $\{Z = 1\} \Rightarrow \{C = 1\}$ does not hold in general, as the example $x^3 - x^2 + x - 1 = (x - 1)(x - i)(x + i)$ shows.
Example 1 The polynomial $p_5(x) = 2x^5 - x^4 + x^3 + 11x^2 - x + 2$ has four changes in sign, while $p_5(-x) = -2x^5 - x^4 - x^3 + 11x^2 + x + 2$ only has one change in sign. Hence, while there are in total precisely five zeros, there is at most one negative real zero and there are at most four positive real zeros. Since complex zeros appear in conjugate pairs, there can be either four, two or zero positive real zeros, but precisely one negative zero.
Example 2 Milovanović et al. (1994) mention the remarkable inequality, valid for all real $x$ and even integers $n > 0$,
$$q_n(x) = x^n - nx + n - 1 \geq 0$$
with equality only if $x = 1$. Since $q_n(-x) = x^n + nx + n - 1$ has only positive coefficients, $q_n(-x) > 0$ for $x \geq 0$. For $x > 0$, there are $C = 2$ changes in sign in $q_n(x)$ and Descartes' rule of signs states that there are at most two positive real zeros. Since $q_n(1) = q_n'(1) = 0$, the polynomial $q_n(x)$ has a double zero at $x = 1$, which is thus the only real zero, and this implies $q_n(x) \geq 0$.
213. Number of sign changes in the sequence of the differences. Let $C$ be the number of sign changes in the sequence $a_0, a_1, \ldots, a_n$ and assume that these sign changes occur between the elements $(a_{k_1}, a_{m_1}), (a_{k_2}, a_{m_2}), \ldots, (a_{k_C}, a_{m_C})$, where $k_j \leq m_j - 1$, with equality if and only if there are no zero elements between $a_{k_j}$ and $a_{m_j}$. We denote $a_{m_0} = a_0$ and $a_{k_{C+1}} = a_n$, which has the same sign as $a_{m_C}$, and $a_{k_{C+1}+1} = a_{n+1} = 0$. Assume, without loss of generality, that $a_{m_0} > 0$. Then, we have that $\operatorname{sign} a_{k_j} = (-1)^{j-1}$ and $\operatorname{sign} a_{m_j} = (-1)^j$ for $1 \leq j \leq C$, and
$$(-1)^{j-1} a_{m_j - 1} \geq 0$$
Consider now the sequence of the differences $a_0, a_1 - a_0, a_2 - a_1, \ldots, a_n - a_{n-1}, a_{n+1} - a_n$. We denote the difference by $\Delta a_j = a_j - a_{j-1}$ for $1 \leq j \leq n + 1$ and $\Delta a_0 = a_0$. The sign of the $C$ elements $\Delta a_{m_j}$, $1 \leq j \leq C$, is known:
$$(-1)^j \left( a_{m_j} - a_{m_j - 1} \right) = (-1)^j \Delta a_{m_j} > 0$$
Since $\Delta a_{m_{j-1}}$ and $\Delta a_{m_j}$ have opposite sign for $1 \leq j \leq C$, the number of sign changes among the differences between them is odd (an odd number of crossings of the horizontal axis). The last subsequence, between $\Delta a_{m_C}$ and $\Delta a_{n+1} = a_{n+1} - a_{k_{C+1}} = -a_n$, whose sign is $-\operatorname{sign} a_{m_C} = -\operatorname{sign} \Delta a_{m_C}$, also has an odd number of sign changes. Summing the sign changes in all $C + 1$ subintervals yields $C$ plus an odd number of sign changes. Thus, we have proved:
Lemma 12 If $C$ is the number of sign changes in the sequence $a_0, a_1, \ldots, a_n$, then the number of sign changes in the sequence of the differences $\Delta a_0, \Delta a_1, \ldots, \Delta a_{n+1}$, where $\Delta a_j = a_j - a_{j-1}$ for $1 \leq j \leq n + 1$ with $a_{n+1} = 0$, and $\Delta a_0 = a_0$, equals $C$ plus an odd positive number.
214. Consider the polynomial
$$q_{n+1}(x) = (x - \zeta)\, p_n(x) = a_n x^{n+1} + \sum_{k=1}^{n} (a_{k-1} - \zeta a_k)\, x^k - \zeta a_0 = \sum_{j=0}^{n+1} (a_{j-1} - \zeta a_j)\, x^j$$
with the convention that $a_{-1} = a_{n+1} = 0$. If $\zeta > 0$, the number of sign changes in the coefficients $b_j = a_{j-1} - \zeta a_j$ of $q_{n+1}(x)$ equals the number of sign changes in the difference $\Delta(\zeta^j a_j) = \zeta^j a_j - \zeta^{j-1} a_{j-1} = -\zeta^{j-1} b_j$. Lemma 12, applied to the sequence $\zeta^j a_j$, which has as many sign changes as the coefficients of $p_n(z)$, shows that the number of sign changes in the difference sequence equals that in the polynomial $p_n(z)$ plus an odd positive number $2m + 1$. Descartes' rule of signs, Theorem 47, states that $C_{p_n(z)} = Z_{p_n(z)} + 2k$ and, hence, since $Z_{q_{n+1}(z)} = Z_{p_n(z)} + 1$,
$$C_{q_{n+1}(z)} = Z_{p_n(z)} + 2k + 2m + 1 = Z_{q_{n+1}(z)} + 2(k + m)$$
Notice that the argument and Lemma 12 provide a second proof of Descartes' rule of signs, Theorem 47, because we have just shown the inductive step: if the rule holds for $p_n(z)$, it also holds for $q_{n+1}(z)$. Descartes' rule of signs definitely holds for $n = 0$ and this completes the second proof. If $\zeta > 0$ and the number of changes in sign in $p_n(x)$ is zero (which implies, by Descartes' rule of signs, Theorem 47, that $p_n(x)$ has no positive real zeros), then $\zeta$ is the largest real zero of $q_{n+1}(x)$. If $\zeta < 0$ and the coefficients of $p_n(x)$ are alternating (which implies that $p_n(x)$ does not have negative real zeros (art. 212)), then $\zeta$ is the smallest real zero of $q_{n+1}(x)$.
215. Laguerre (see Hermite et al. (1972)) has elegantly and ingeniously extended Descartes' rule of signs. As in art. 212, Laguerre considers, as a generalization of the polynomial $p_n(z) = \sum_{k=0}^{n} a_k z^k$, the entire function
$$F(z) = \sum_{k=0}^{n} a_k z^{\alpha_k} = a_n z^{\alpha_n} + a_{n-1} z^{\alpha_{n-1}} + \cdots + a_0 z^{\alpha_0} \tag{9.28}$$
where $\alpha_n > \alpha_{n-1} > \cdots > \alpha_0$ are real numbers.
Theorem 48 (Laguerre) The number of real zeros $Z$ of the entire function $F(z)$, defined in (9.28), that are larger than a positive number $\xi$, is at most equal to the number $C$ of changes in sign of the sequence
$$a_n \xi^{\alpha_n},\; a_n \xi^{\alpha_n} + a_{n-1} \xi^{\alpha_{n-1}},\; \ldots,\; F(\xi)$$
and $C - Z = 2k \geq 0$.
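Lemma 12 and the argument of art. 214 can be checked by brute force. The sketch below is a hypothetical test harness, not part of the original text: it draws random integer sequences, appends $a_{n+1} = 0$, forms the differences $\Delta a_j$, and asserts that the number of sign changes increases by a positive odd number; it repeats the check for the coefficients $b_j = a_{j-1} - \zeta a_j$ of $(x - \zeta) p_n(x)$ with an integer $\zeta > 0$. Integer arithmetic keeps every sign exact.

```python
import random


def sign_changes(seq):
    """Number of sign changes in a real sequence; zero entries are skipped."""
    signs = [x > 0 for x in seq if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def differences(a):
    """Delta a_0 = a_0 and Delta a_j = a_j - a_{j-1} for 1 <= j <= n+1, with a_{n+1} = 0."""
    ext = list(a) + [0]
    return [ext[0]] + [ext[j] - ext[j - 1] for j in range(1, len(ext))]


def times_linear(a, zeta):
    """Coefficients b_j = a_{j-1} - zeta*a_j (lowest degree first) of (x - zeta) p_n(x),
    using the convention a_{-1} = a_{n+1} = 0."""
    ext = [0] + list(a) + [0]
    return [ext[j] - zeta * ext[j + 1] for j in range(len(a) + 1)]


def check(trials=2000, seed=0):
    """Random test of Lemma 12 and of the sign-change increase in art. 214."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(-3, 3) for _ in range(rng.randint(1, 8))]
        if not any(a):
            continue  # the lemma needs at least one non-zero element
        extra = sign_changes(differences(a)) - sign_changes(a)
        assert extra > 0 and extra % 2 == 1, a  # Lemma 12
        zeta = rng.randint(1, 5)
        extra = sign_changes(times_linear(a, zeta)) - sign_changes(a)
        assert extra > 0 and extra % 2 == 1, (a, zeta)  # art. 214
    return True


print(differences([1, 0, 1]))  # [1, -1, 1, -1]
print(check())  # True
```

The example $a = (1, 0, 1)$ has no sign change, while its difference sequence has three, in agreement with "$C$ plus an odd positive number".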
Proof: We start from the polynomial identity (9.22),
$$\frac{p_n(z)}{z - \xi} = \sum_{k=0}^{n-1} \left( \sum_{j=k+1}^{n} a_j \xi^{j-k-1} \right) z^k + \frac{p_n(\xi)}{z - \xi}$$
Using the expansion $(z - \xi)^{-1} = \sum_{k=0}^{\infty} \frac{\xi^k}{z^{k+1}}$ for $z > \xi$ results in
$$\frac{p_n(z)}{z - \xi} = \sum_{k=0}^{n-1} \left( \sum_{j=k+1}^{n} a_j \xi^{j-k-1} \right) z^k + \sum_{k=0}^{\infty} \frac{\xi^k p_n(\xi)}{z^{k+1}}$$
Laguerre's extension of Descartes' rule of signs (art. 212) indicates that the number of real zeros $Z$ of $p_n(z)$ that are larger than $\xi$ is at most equal to the number of sign changes of the right-hand side. Since $\xi > 0$, all terms $\xi^k p_n(\xi)$ for $k \geq 0$ have the same sign as $p_n(\xi)$, which implies that the number $C$ of sign changes is equal to the number of sign changes of the coefficients in the first $k$-sum, followed by $p_n(\xi)$. Each of these coefficients has the same sign as the partial sum $\sum_{j=k+1}^{n} a_j \xi^j$ of $p_n(\xi)$, because $\xi > 0$. This proves the theorem in case $F(z)$ is a polynomial. At last, we can always reduce $F(z)$ to a polynomial form. If $\alpha_0 < 0$ and all exponents $\alpha_k$ are integers, $z^{-\alpha_0} F(z)$ is a polynomial and the above argument applies. If $\alpha_k \in \mathbb{Q}^+$, then $F(z^w)$ is a polynomial provided $w$ is the least common multiple of the denominators of the set $\{\alpha_k\}_{0 \leq k \leq n}$. Finally, since each real number can be approximated arbitrarily closely by a rational number, $F(z)$ with real exponents can be approximated arbitrarily closely by such a polynomial, which demonstrates Theorem 48. □
Theorem 48 is readily modified when we want the number of positive zeros smaller than $\xi$. In that case, we may verify, by following the same steps as in the proof above, that Theorem 48 also holds for the number of positive zeros smaller than $\xi$, provided the terms in (9.28) are ordered according to increasing exponents, i.e., $\alpha_n < \alpha_{n-1} < \cdots < \alpha_0$. As a corollary, the number of zeros of $F(z)$ in $[0, 1]$ is at most equal to the number of sign changes in the sequence $a_n, a_n + a_{n-1}, \ldots, F(1)$. Another application is $F(z) = p_n(z + h)$, which we expand by Taylor's theorem
$$p_n(z + h) = \sum_{j=0}^{n} \frac{p_n^{(j)}(h)}{j!} z^j = p_n(h) + z p_n'(h) + z^2 \frac{p_n''(h)}{2} + \cdots + z^n \frac{p_n^{(n)}(h)}{n!}$$
into a polynomial, written with exponents of $z$ in increasing order. The number of real zeros of $F(z)$ in $[0, \xi]$, and thus the number of real zeros of $p_n(z)$ in $[h, h + \xi]$, is at most equal to the number of sign changes in the sequence
$$\left\{ p_n(h),\; p_n(h) + \xi p_n'(h),\; p_n(h) + \xi p_n'(h) + \xi^2 \frac{p_n''(h)}{2},\; \ldots,\; p_n(\xi + h) \right\}$$
and their difference is an even integer (possibly zero).
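The interval form of Theorem 48 is easy to evaluate exactly. In this sketch (helper names are ours), `taylor_shift` computes the Taylor coefficients $p_n^{(j)}(h)/j!$ through the binomial expansion of $p_n(z + h)$, and `interval_sign_bound` counts the sign changes of the partial sums $p_n(h), p_n(h) + \xi p_n'(h), \ldots, p_n(h + \xi)$, an upper bound for the number of zeros of $p_n$ in $(h, h + \xi)$. The polynomial $(z - 1)(z - 2)(z - 5)$ is a hypothetical example with known zeros; `Fraction` avoids rounding.

```python
from fractions import Fraction
from math import comb


def sign_changes(seq):
    """Number of sign changes in a real sequence; zero entries are skipped."""
    signs = [x > 0 for x in seq if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def taylor_shift(a, h):
    """Coefficients c_j = p^(j)(h)/j! of p(z + h); a is lowest degree first."""
    n = len(a) - 1
    return [sum(a[k] * comb(k, j) * h ** (k - j) for k in range(j, n + 1))
            for j in range(n + 1)]


def interval_sign_bound(a, h, xi):
    """Sign changes of the partial sums sum_{j<=m} c_j xi^j, m = 0..n (Theorem 48)."""
    partial, total = [], 0
    for j, c in enumerate(taylor_shift(a, h)):
        total += c * xi ** j
        partial.append(total)
    return sign_changes(partial)


p = [-10, 17, -8, 1]  # (z - 1)(z - 2)(z - 5) = z^3 - 8z^2 + 17z - 10
print(interval_sign_bound(p, 0, 3))               # 2 (zeros 1 and 2 lie in (0, 3))
print(interval_sign_bound(p, Fraction(3, 2), 3))  # 1 (zero 2 lies in (1.5, 4.5))
print(interval_sign_bound(p, 0, 6))               # 3 (zeros 1, 2 and 5 lie in (0, 6))
```

In these three cases the bound happens to be sharp; in general it exceeds the true count by an even number.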
The whole idea can subsequently be applied to $\frac{p_n(z)}{(z - \xi)^m}$ using art. 210. Since the number of sign changes in $\frac{p_n(z)}{(z - \xi)^m}$ is at most equal to that in $\frac{p_n(z)}{(z - \xi)^{m-1}}$ (art. 214), but not smaller than the number of real zeros of $p_n(z)$ larger than $\xi$, we may expect to deduce, by choosing an appropriate $m$, an exact way to determine the number of such real zeros. In fact, Laguerre succeeded (Hermite et al., 1972, pp. 24-25) in proposing an exact method that involves the discriminant (art. 201), which is hard to compute. In summary, his method turns out to be less attractive than that of Sturm, discussed in art. 225.
216. We present another nice approach due to Laguerre (Hermite et al., 1972, pp. 26-41). Consider the polynomial
$$f_n(z) = \sum_{j=1}^{m} A_j\, p_n(\theta_j z)$$
where $0 < \theta_m < \theta_{m-1} < \cdots < \theta_1$ and $p_n(z) = \sum_{k=0}^{n} a_k z^k$. Explicitly,
$$f_n(z) = \sum_{k=0}^{n} \left( a_k \sum_{j=1}^{m} A_j \theta_j^k \right) z^k$$
Descartes' rule of signs, Theorem 47, states that the number $Z$ of positive zeros is at most equal to the number $C$ of variations in sign in the sequence
$$\left\{ a_0 \sum_{j=1}^{m} A_j,\; a_1 \sum_{j=1}^{m} A_j \theta_j,\; \ldots,\; a_n \sum_{j=1}^{m} A_j \theta_j^n \right\}$$
That number $C$ is also equal to the number $C_1$ of sign changes in the sequence ($l < n$)
$$S_1 = \left\{ a_0 \sum_{j=1}^{m} A_j,\; a_1 \sum_{j=1}^{m} A_j \theta_j,\; \ldots,\; a_l \sum_{j=1}^{m} A_j \theta_j^l \right\}$$
plus the number $C_2$ of changes of sign in the remaining sequence
$$S_2 = \left\{ a_l \sum_{j=1}^{m} A_j \theta_j^l,\; a_{l+1} \sum_{j=1}^{m} A_j \theta_j^{l+1},\; \ldots,\; a_n \sum_{j=1}^{m} A_j \theta_j^n \right\} = \left\{ a_l \varphi(0),\; a_{l+1} \varphi(1),\; \ldots,\; a_n \varphi(n - l) \right\}$$
where
$$\varphi(x) = \sum_{j=1}^{m} A_j \theta_j^l \theta_j^x = \sum_{j=1}^{m} A_j \theta_j^l \left( e^x \right)^{\log \theta_j}$$
If we suppose that all $a_k \geq 0$, then the number $C_2$ of variations in sign in $S_2$ is at most equal to the number $Z_\varphi$ of positive zeros of $\varphi(x)$ (because, even if $\varphi(k)\varphi(k-1) > 0$, there can be an even number of zeros in the interval $(k - 1, k)$).
The number $Z_\varphi$ is also equal to the number of real zeros of $\varphi(\log z) = 0$ that are greater than 1. Theorem 48 in art. 215 shows that the number $Z_\varphi$ of real zeros of $\varphi(\log z)$ larger than 1 is at most equal to the number $C_\varphi$ of sign changes in the sequence
$$\left\{ A_1 \theta_1^l,\; A_1 \theta_1^l + A_2 \theta_2^l,\; \ldots,\; \varphi(0) \right\}$$
Hence, $Z \leq C \leq C_1 + C_\varphi$. Since the above holds for all $0 \leq l < n$, the simplest choice is $l = 0$. Thus, we have proved
Theorem 49 (Laguerre) The number of real roots $Z$ of the equation
$$\sum_{j=1}^{m} A_j\, p_n(\theta_j z) = 0$$
where $0 < \theta_m < \theta_{m-1} < \cdots < \theta_1$ and $p_n(z) = \sum_{k=0}^{n} a_k z^k$, is at most equal to the number of variations in sign of the sequence $\left\{ A_1, A_1 + A_2, \ldots, \sum_{j=1}^{m} A_j \right\}$.
Theorem 49 holds for $f(z) = \lim_{n \to \infty} p_n(z)$ provided the polynomial sum converges. As an application, we consider $f(z) = e^z$. The equation $\sum_{j=0}^{m} A_j \exp(\theta_j z) = 0$ possesses the same roots as $\sum_{j=0}^{m} A_j \exp((\theta_j + \kappa) z) = 0$, where $\kappa$ is a finite real number, such that the restriction $0 < \theta_m$ can be removed. Let $\theta_j = a + jt$ and $\theta_m = b > a$, such that $m = \frac{b - a}{t}$; then, with $A_j = \psi(a + jt)$ and the root variable denoted by $x$, we obtain the Riemann sum
$$\lim_{t \to 0} \sum_{j=1}^{(b-a)/t} A_j\, e^{(a + jt)x}\, t = \int_a^b e^{xz} \psi(z)\, dz$$
where $\psi(z)$ is a completely arbitrary function (continuous or discontinuous), because the coefficients $A_0, A_1, \ldots, A_m$ are completely arbitrary. The number of changes in sign in the sequence $\left\{ A_1, A_1 + A_2, \ldots, \sum_{j=1}^{m} A_j \right\}$ is, in that limit, at most equal to the number of zeros of $\int_a^x \psi(z)\, dz = 0$ in the interval $(a, b)$. For example, let
$$\psi(z) = \sum_{k=0}^{n} a_k \frac{z^{\lambda_k + w - 1}}{\Gamma(\lambda_k + w)}$$
where $w \geq 0$ and $\Gamma(x)$ is the Gamma function. For $a = 0$ and $b = \infty$, and provided all $\lambda_k + w > 0$ (so that the integrals converge), the equation $\int_0^\infty e^{-xz} \psi(z)\, dz = 0$ becomes
$$\sum_{k=0}^{n} \frac{a_k}{x^{\lambda_k + w}} = 0$$
whose number of positive zeros is, after the transformation $x \to x^{-1}$, precisely equal to that of $\sum_{k=0}^{n} a_k x^{\lambda_k + w} = 0$ and, thus, of $\sum_{k=0}^{n} a_k x^{\lambda_k} = 0$. On the other hand, the number of positive zeros of the equation $\int_0^x \psi(z)\, dz = 0$, computed as
$$\sum_{k=0}^{n} a_k \frac{x^{\lambda_k + w}}{\Gamma(\lambda_k + w + 1)} = 0$$
is at least equal to that of $\sum_{k=0}^{n} a_k x^{\lambda_k} = 0$. Now, for $\lambda_k = k$ the equations reduce to polynomials and we observe that, after a transform $x \to -x$, the number of negative zeros of the polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ is at most equal to that of the polynomial $\sum_{k=0}^{n} a_k \frac{x^k}{\Gamma(k + w + 1)}$. Consequently, we arrive at
Theorem 50 (Laguerre) If all zeros of the polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ are real, then the zeros of the related polynomial $q_n(x; w) = \sum_{k=0}^{n} a_k \frac{x^k}{\Gamma(k + w + 1)}$ are also all real, for any real number $w \geq 0$.
Many extensions of Laguerre's Theorem 50 have been deduced, the so-called zero mapping transformations. Consider a set of real numbers $\{\gamma_k\}_{k \geq 0}$, called a zero mapping transformation, that satisfies certain properties: if all zeros of the polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ are real, then the zeros of the related, transformed polynomial $t_n(x; \gamma) = \sum_{k=0}^{n} \gamma_k a_k x^k$ are also all real. A large list of particular sequences $\{\gamma_k\}_{k \geq 0}$ is presented in Milovanović et al. (1994).
217. Theorem 50 in art. 216 can be extended:
Theorem 51 (Laguerre) Let $p_n(z) = \sum_{k=0}^{n} a_k z^k$ be a polynomial with real zeros and let $f(z)$ be an entire function (of genus 0 or 1), which is real for real $z$ and all of whose zeros are real and negative. Then, the polynomial
$$g_n(z) = \sum_{k=0}^{n} a_k f(k)\, z^k$$
has all real zeros, and as many positive, zero and negative zeros as $p_n(z)$.
Proof: See Hermite et al. (1972, p. 200) or Titchmarsh (1964, pp. 268-269). □
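Theorems 50 and 51 can be illustrated numerically. The sketch below uses a hypothetical example, $p_3(x) = (x + 1)(x + 2)(x + 3)$, and the multiplier $f(k) = 1/\Gamma(k + w + 1)$ of Theorem 50, which is entire with only negative zeros, and checks with NumPy that the transformed polynomial again has three negative real zeros for a few values of $w$. Such a check illustrates, but of course does not prove, the theorems.

```python
import math

import numpy as np


def laguerre_multiplier(a, w=0.0):
    """Coefficients a_k / Gamma(k + w + 1), lowest degree first (Theorem 50)."""
    return [ak / math.gamma(k + w + 1) for k, ak in enumerate(a)]


a = [6, 11, 6, 1]  # (x + 1)(x + 2)(x + 3), lowest degree first
results = {}
for w in (0.0, 0.5, 1.0):
    b = laguerre_multiplier(a, w)
    roots = np.roots(b[::-1])  # np.roots expects the highest degree first
    results[w] = roots
    # sorted real parts and the largest imaginary part (zero for real roots)
    print(w, np.sort(roots.real), float(np.max(np.abs(roots.imag))))
```

For $w = 1$ the transformed polynomial is proportional to $z^3 + 24z^2 + 132z + 144 = (z + 6)(z^2 + 18z + 24)$, with zeros $-6$ and $-9 \pm \sqrt{57}$.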
It can be shown (Titchmarsh, 1964, pp. 269-270) that, if $n \to \infty$ and $p(z) = \lim_{n \to \infty} p_n(z)$ is an entire function, then $g(z) = \lim_{n \to \infty} g_n(z)$ is entire, all of whose zeros are real and negative. Hence, applied to $p(z) = e^z$, Laguerre's theorem (extended to $n \to \infty$) shows that the Taylor series
$$g(z) = \sum_{k=0}^{\infty} \frac{f(k)}{k!} z^k$$
is an entire function $g(z)$ with negative, real zeros.
218. The polynomial
$$r_n(z) = |a_n| z^n - \sum_{k=0}^{n-1} |a_k| z^k$$
where $|a_n| > 0$ and $\sum_{k=0}^{n-1} |a_k| > 0$, has precisely one positive zero. Descartes' rule of signs, Theorem 47, tells us that there is at most one positive zero, because there is one change of sign. We show that there is precisely one zero by noting that in
$$r_n(z) = |a_n| z^n \left( 1 - \sum_{k=0}^{n-1} \frac{|a_k|}{|a_n|} z^{k-n} \right)$$
the $k$-sum is monotone decreasing from $\infty$ to 0 when $z$ increases from 0 to $\infty$. Hence, there is precisely one point $z = \rho$ at which the $k$-sum is 1 and $r_n(\rho) = 0$. Moreover, $r_n(z) < 0$ if $0 < z < \rho$ and $r_n(z) > 0$ if $z > \rho$. Consider now
$$\tilde{r}_n(z) = z^n r_n\left( z^{-1} \right) = |a_n| - \sum_{k=1}^{n} |a_{n-k}| z^k$$
The same argument as above for the $k$-sum demonstrates that there is precisely one positive zero, which is, by art. 196, equal to $\frac{1}{\rho}$.
219. If $z_0$ is a zero of $p_n(z) = a_n z^n + \sum_{k=0}^{n-1} a_k z^k$ and $\rho$ is the only positive zero of $r_n(z) = |a_n| z^n - \sum_{k=0}^{n-1} |a_k| z^k$ in art. 218, then $|z_0| \leq \rho$.
Proof:
$$|a_n| |z_0|^n = \left| \sum_{k=0}^{n-1} a_k z_0^k \right| \leq \sum_{k=0}^{n-1} |a_k| |z_0|^k = |a_n| |z_0|^n - r_n(|z_0|)$$
which shows that $0 \leq -r_n(|z_0|)$. Art. 218 indicates that this condition, equivalent to $r_n(|z_0|) \leq 0$, implies that $|z_0| \leq \rho$. □
Similarly, applying this result to the reversed polynomial $z^n p_n(z^{-1})$ (art. 196), whose zeros are the reciprocals of those of $p_n(z)$, shows that, if $a_0 \neq 0$, the absolute values of the zeros of $p_n(z)$ are larger than or equal to the only positive zero of $|a_0| - \sum_{k=1}^{n} |a_k| z^k$.
220. Cauchy's rule. We derive an upper bound $\xi \geq 0$ for any positive zero of the real polynomial $p_n(z) = \sum_{k=0}^{n} a_k z^k$, without resorting to Descartes' rule. The upper bound $\xi$ must be such that $\frac{p_n(z)}{a_n} > 0$ for all $z > \xi$, where
$$\frac{p_n(z)}{a_n} = z^n + \sum_{k=0}^{n-1} \frac{a_k}{a_n} z^k$$
Since the coefficients of $p_n(z)$ are real, we rewrite this expression as
$$\frac{p_n(z)}{a_n} = z^n \left( 1 - \sum_{k=0;\; a_k/a_n < 0}^{n-1} \left| \frac{a_k}{a_n} \right| z^{k-n} \right) + \sum_{k=0;\; a_k/a_n > 0}^{n-1} \frac{a_k}{a_n} z^k$$
Each $z^{k-n}$ with $k < n$ decreases in $z$, so that it suffices to require that the term between brackets is non-negative at $z = \xi$. Let us denote $c_k = \left| \frac{a_k}{a_n} \right| \xi^{k-n} \geq 0$ for all indices $k$ for which $\frac{a_k}{a_n} < 0$, such that $\xi = \left( \frac{|a_k|}{c_k |a_n|} \right)^{1/(n-k)}$. The above requirement then reads
$$\sum_{k=0;\; a_k/a_n < 0}^{n-1} c_k \leq 1 \tag{9.29}$$
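Since all $c_k \geq 0$, we observe that, for any choice of positive $c_k$ that satisfies (9.29), no positive zero of $p_n(z)$ is larger than
$$\xi = \max_{0 \leq k \leq n-1;\; a_k/a_n < 0} \left( \frac{|a_k|}{c_k |a_n|} \right)^{1/(n-k)} \tag{9.30}$$
which is Cauchy's rule. If $\lambda$ denotes the number of negative coefficients of $\frac{p_n(z)}{a_n}$, then $\sum_{k=0;\; a_k/a_n < 0}^{n-1} \frac{1}{\lambda} = 1$ and the choice $c_k = \frac{1}{\lambda}$ clearly satisfies the condition (9.29), leading to
$$\xi = \max_{0 \leq k < n;\; a_k/a_n < 0} \left( \lambda \left| \frac{a_k}{a_n} \right| \right)^{1/(n-k)} \tag{9.31}$$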
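Cauchy's rule (9.31) and the modulus bound $\rho$ of arts. 218 and 219 are both cheap to evaluate. In the sketch below (function names are ours), $\rho$ is located by bisection on $r_n(z)$, which is justified because $r_n(z) < 0$ for $0 < z < \rho$ and $r_n(z) > 0$ for $z > \rho$. The polynomial $(z - 1)(z - 2)(z - 5)$ and the random polynomials in the accompanying check are hypothetical examples.

```python
import numpy as np


def cauchy_rule(a):
    """Bound (9.31) on the positive zeros; a is lowest degree first, a[-1] != 0."""
    n = len(a) - 1
    neg = [k for k in range(n) if a[k] / a[n] < 0]
    lam = len(neg)  # number of negative coefficients of p_n(z)/a_n
    if lam == 0:
        return 0.0  # no sign change: no positive zero at all
    return max((lam * abs(a[k] / a[n])) ** (1.0 / (n - k)) for k in neg)


def modulus_bound(a, rel_tol=1e-12):
    """Unique positive zero rho of r_n(z) = |a_n| z^n - sum_{k<n} |a_k| z^k (art. 218)."""
    n = len(a) - 1
    if not any(a[:n]):
        raise ValueError("r_n needs at least one non-zero coefficient a_k with k < n")

    def r(z):
        return abs(a[n]) * z ** n - sum(abs(a[k]) * z ** k for k in range(n))

    lo, hi = 0.0, 1.0
    while r(hi) <= 0:  # grow the bracket until r(hi) > 0
        hi *= 2.0
    while hi - lo > rel_tol * hi:  # bisection: r(lo) <= 0 < r(hi)
        mid = 0.5 * (lo + hi)
        if r(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


a = [-10, 17, -8, 1]  # (z - 1)(z - 2)(z - 5), lowest degree first
print(cauchy_rule(a))    # 16.0, from the term k = 2: (2 * 8)^(1/1)
print(modulus_bound(a))  # approximately 9.83; every |zero| <= rho by art. 219
```

Both bounds are valid but far from the largest zero 5 in this example; Cauchy's rule only concerns positive zeros, whereas $\rho$ bounds the modulus of every zero.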
A weaker bound, derived from the inequality in (9.30), follows from the choice $c_k = \frac{1}{n}$, since $\lambda \leq n$. Another choice, which satisfies (9.29) for all $k$, is $c_k = \binom{n-1}{k} 2^{1-n}$.
221. The equation $p_n(z) = 0$ where $a_n \neq 0$ can be transformed by the substitution $z = \beta x$ into
$$x^n + \sum_{k=0}^{n-1} \frac{a_k}{a_n} \beta^{k-n} x^k = 0$$
Let us confine ourselves to odd $n$. Polynomials of odd degree with real coefficients have at least one real zero. We now choose $\beta$ such that $\frac{a_0}{a_n} \beta^{-n} = -1$, or $\beta = \left( -\frac{a_0}{a_n} \right)^{1/n}$. This choice reduces the original equation to
$$q_n(x) = x^n - \sum_{k=1}^{n-1} \frac{a_k}{a_0} \left( -\frac{a_0}{a_n} \right)^{k/n} x^k - 1 = 0$$
Since $q_n(0) = -1 < 0$ and $\lim_{x \to \infty} q_n(x) > 0$, at least one real root must lie in the interval $(0, \infty)$. If $q_n(1) > 0$, a root must lie between 0 and 1; if $q_n(1) < 0$, then a root lies in the interval $(1, \infty)$. By the transform $x = y^{-1}$, the interval $(1, \infty)$ can be changed to $(0, 1)$. Alternatively, art. 196 shows that $\prod_{k=1}^{n} x_k = 1$, which indicates that not all zeros can lie in $(0, 1)$ nor in $(1, \infty)$. Hence, we have reduced the problem of finding a real zero of $p_n(z)$ with $n$ odd to the new problem of finding a real root of $q_n(x)$ in $(0, 1)$. We refer to Lanczos (1988) for a scheme of successively lowering the order of the polynomial $q_n(x)$ by shifted Chebyshev polynomials.
222. Isolation of real zeros via continued fractions. Let $m_1$ be a non-negative integer and $m_k$ a positive integer for all $k > 1$.
Theorem 52 (Vincent-Uspensky-Akritas) There exists a continued fraction transform with a non-negative $m_1$ and further positive integer partial quotients $\{m_k\}_{2 \leq k \leq l}$,
$$z = m_1 + \cfrac{1}{m_2 + \cfrac{1}{\ddots + \cfrac{1}{m_l + \cfrac{1}{w}}}} = \frac{A_l w + A_{l-1}}{B_l w + B_{l-1}} \tag{9.32}$$
that transforms the polynomial $p_n(z)$ with rational coefficients $a_k$ and simple zeros into the function $p_n\left( \frac{A_l w + A_{l-1}}{B_l w + B_{l-1}} \right) = (B_l w + B_{l-1})^{-n}\, \tilde{p}_n(w)$ such that the polynomial $\tilde{p}_n(w)$ has either zero or one sign variation. The integer $l$ is the smallest integer such that $F_{l-1} \frac{d}{2} > 1$ and $F_{l-1} F_l\, d > 1 + \frac{1}{\varepsilon_n}$, where $d$ is the minimum distance between any two zeros, $F_m$ is the $m$-th Fibonacci number that obeys $F_m = F_{m-1} + F_{m-2}$ for $m > 1$ with $F_0 = F_1 = 1$, and where $\varepsilon_n = \left( 1 + \frac{1}{n} \right)^{\frac{1}{n-1}} - 1$.
Proof: See Akritas (1989, pp. 367-371). □
While the converse of Descartes' Theorem 47 in case $C = 0$, implying that there is no positive real zero, is generally true, the converse of the case $C = 1$ is not generally true, as demonstrated in art. 212. The part of Theorem 52 that details the determination of the integer $l$ guarantees that, if there is one zero with positive real part and all others have negative real part and lie in an $\varepsilon_n$-disk around $-1$, the corresponding polynomial has exactly one change in sign. The Fibonacci numbers $F_m$ enter the scene because they are the denominators of the $m$-th convergent of the continued fraction of the golden mean (see e.g. Govers et al. (2008, p. 316)),
$$\frac{1 + \sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\ddots}}}$$
in the limit case where all $m_k = 1$ for $k \geq 1$. The continued fraction transform (9.32) roughly maps one zero to the interval $(0, \infty)$ and all others in clusters around $-1$ with negative real part. Moreover, it is equivalent to a series of successive substitutions of the form $z = m_j + \frac{1}{w}$ for $1 \leq j \leq l$. The best way to choose the set of integers $\{m_k\}_{1 \leq k \leq l}$ is still an open issue. Akritas (1989) motivates choosing $m_j$ in each substitution round equal to Cauchy's estimate (9.31). Finally, Akritas (1989) claims that his method for isolating a zero is superior in computational effort to Sturm's classical bisection method based on Theorem 54.
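The successive substitutions $z = m_j + 1/w$ behind (9.32) can be sketched with integer arithmetic. Each substitution replaces $p(z)$ by $w^n p(m + 1/w)$, which amounts to a Taylor shift by $m$ followed by reversal of the coefficient list. The partial quotients below are picked by hand for the hypothetical example $p(z) = 2z^3 - 9z^2 + z + 12 = (2z - 3)(z - 4)(z + 1)$; this is not Akritas' rule for choosing them, only an illustration of how the number of sign variations drops to one once a positive zero is isolated.

```python
from math import comb


def sign_changes(seq):
    """Number of sign changes in a real sequence; zero entries are skipped."""
    signs = [x > 0 for x in seq if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def taylor_shift(a, m):
    """Coefficients of p(z + m), lowest degree first."""
    n = len(a) - 1
    return [sum(a[k] * comb(k, j) * m ** (k - j) for k in range(j, n + 1))
            for j in range(n + 1)]


def substitute(a, m):
    """Coefficients of w^n p(m + 1/w), lowest degree first: shift by m, then reverse."""
    return taylor_shift(a, m)[::-1]


def continued_fraction_transform(a, quotients):
    """Apply z = m_1 + 1/w_1, w_1 = m_2 + 1/w_2, ... for the given partial quotients."""
    for m in quotients:
        a = substitute(a, m)
    return a


p = [12, 1, -9, 2]  # 2z^3 - 9z^2 + z + 12, zeros 1.5, 4 and -1
for quotients in ([], [1], [2], [1, 1]):
    q = continued_fraction_transform(p, quotients)
    print(quotients, q, sign_changes(q))
# quotients []     -> [12, 1, -9, 2], 2 sign changes (positive zeros 1.5 and 4)
# quotients [1]    -> [2, -3, -11, 6], 2 sign changes (w = 1/(z - 1): zeros 2, 1/3, -1/2)
# quotients [2]    -> [2, 3, -11, -6], 1 sign change (isolates z = 4, i.e. w = 1/2)
# quotients [1, 1] -> [6, 7, -7, -6], 1 sign change (isolates z = 1.5, i.e. w = 1)
```

Once a transformed polynomial shows exactly one sign variation, the corresponding interval of $z$ contains exactly one zero.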
9.6 The number of real zeros in an interval
223. The Cauchy index. Consider the rational function $r(z) = \frac{p_m(z)}{p_n(z)}$ that has at most $n$ poles, the zeros of the polynomial $p_n(z)$ that are not zeros of the numerator polynomial $p_m(z)$. We further assume $n > m$; otherwise we can always write the rational function as the sum of a polynomial and a rational function, where the numerator polynomial has a smaller degree than the denominator polynomial, as explained in art. 208. If $\zeta_k$ is a zero with multiplicity $m_k$ of $p_n(z)$ but not of $p_m(z)$, then
$$r(z) = p_j(z) \prod_{k=1}^{l} (z - \zeta_k)^{-m_k}$$
where $p_j(z)$ is a polynomial of degree $j \leq m$ and $p_j(\zeta_k) \neq 0$. The behavior of $r(z)$ around a pole $\zeta_q$ of order $m_q$ is dominated by $b_q (z - \zeta_q)^{-m_q}$, where $b_q = p_j(\zeta_q) \prod_{k=1;\, k \neq q}^{l} (\zeta_q - \zeta_k)^{-m_k} \neq 0$. The Cauchy index of a rational function $r(z)$ at a real pole $y$ is defined to be $+1$ if $\lim_{x \uparrow y} r(x) = -\infty$ and $\lim_{x \downarrow y} r(x) = +\infty$, and $-1$ if $\lim_{x \uparrow y} r(x) = +\infty$ and $\lim_{x \downarrow y} r(x) = -\infty$, while the Cauchy index is zero if both limits are the same. Hence, the Cauchy index at $\zeta_q$ (assumed to be real) equals 0 if $m_q$ is even and $\operatorname{sign}(b_q)$ if $m_q$ is odd. The Cauchy index of a rational function $r$ for the interval $[a, b]$, denoted by $I_a^b r(x)$, is defined as the sum of the Cauchy indices at all real poles $y$ with $a < y < b$, where $a$ and $b$ are not poles of $r$.
Now, let $p_n(z) = \prod_{k=1}^{l} (z - \zeta_k)^{m_k}$ and consider its logarithmic derivative
$$r(z) = \frac{d \log p_n(z)}{dz} = \frac{p_n'(z)}{p_n(z)} = \sum_{k=1}^{l} \frac{m_k}{z - \zeta_k} = \sum_{k=1}^{s} \frac{m_k}{z - \zeta_k} + r_1(z)$$
where only the first $s$ zeros are real and lie in the interval $[a, b]$. Each of these zeros is a simple pole of $r$ with $b_q = m_q > 0$, so that the Cauchy index of $r$ for the interval $[a, b]$ is $I_a^b \frac{p_n'(x)}{p_n(x)} = s$. Hence, $I_a^b \frac{p_n'(x)}{p_n(x)}$ is equal to the number of distinct real zeros of $p_n(z)$ in the interval $[a, b]$. Since $p_n(z)$ has a finite number of zeros, $I_{-\infty}^{+\infty} \frac{p_n'(x)}{p_n(x)}$ equals the number of all distinct real zeros of $p_n(z)$. The remainder of the subject thus consists in finding the Cauchy index of the logarithmic derivative in order to determine the number of real zeros of a polynomial in a (possibly infinite) interval $[a, b]$. One of the methods is based on Sturm's classical theorem.
224. A Sturm sequence. A sequence of real polynomials $f_1(x), f_2(x), \ldots, f_m(x)$ is a Sturm sequence on the interval $(a, b)$ if it obeys two properties: (i) $f_m(x) \neq 0$ for $a < x < b$ and (ii) $f_{k-1}(x) f_{k+1}(x) < 0$ for any $k$ where $f_k(x) = 0$ and $a < x < b$. Let $V(x)$ denote the number of changes in sign of the sequence $f_1(x), f_2(x), \ldots, f_m(x)$ at a fixed $x \in (a, b)$. It is clear that, as $x$ varies from $a$ to $b$, the value of $V(x)$ can only change when one of the functions $f_k(x)$ passes through zero. However, for a Sturm sequence, property (ii) shows that, when $f_k(x) = 0$ for any $2 \leq k \leq m - 1$, the value of $V(x)$ does not change. Only if $f_1(x)$ passes through a zero $\xi \in (a, b)$ does $V(x)$ change: it decreases by the Cauchy index of $\frac{f_2(x)}{f_1(x)}$ at $x = \xi$. Hence, we have shown:
Theorem 53 (Sturm) If $f_1(x), f_2(x), \ldots, f_m(x)$ is a Sturm sequence on the interval $(a, b)$, then
$$I_a^b \frac{f_2(x)}{f_1(x)} = V(a) - V(b)$$
225. An interesting application of a Sturm sequence is its connection to the Euclidean algorithm (art. 208), which we modify (all remainders have negative sign) into
$$\begin{aligned} p_0(z) &= q_1(z)\, p_1(z) - p_2(z) && (0 \leq \deg p_2 < \deg p_1) \\ p_1(z) &= q_2(z)\, p_2(z) - p_3(z) && (0 \leq \deg p_3 < \deg p_2) \\ p_2(z) &= q_3(z)\, p_3(z) - p_4(z) && (0 \leq \deg p_4 < \deg p_3) \\ &\;\;\vdots \\ p_{m-2}(z) &= q_{m-1}(z)\, p_{m-1}(z) - p_m(z) && (0 \leq \deg p_m < \deg p_{m-1}) \\ p_{m-1}(z) &= q_m(z)\, p_m(z) \end{aligned}$$
The sequence $\{p_k(x)\}_{0 \leq k \leq m}$ is a Sturm sequence if the greatest common divisor polynomial $p_m(x)$ does not vanish in the interval $(a, b)$. By the modified Euclidean construction, we observe that property (ii) in art. 224 is always fulfilled. Indeed, in the modified Euclidean algorithm, for any $0 < k < m$ and $x \in (a, b)$, the relation $p_{k-1}(x) = q_k(x)\, p_k(x) - p_{k+1}(x)$ shows that, if $p_k(x) = 0$, then $p_{k-1}(x)$ and $p_{k+1}(x)$ have opposite sign and do not contribute to changes in $V(x)$.
The Euclidean algorithm, applied to the logarithmic derivative $r(z) = \frac{p'(z)}{p(z)}$ with $p_0(z) = p(z)$ and $p_1(z) = p'(z)$, provides information about the multiplicity of zeros of the polynomial $p(z)$. If $\zeta$ is a zero with multiplicity $\mu$ of $p(z)$, then it is a zero with multiplicity $\mu - 1$ of $p'(z)$. Hence, both $p(z)$ and $p'(z)$ have the factor $(z - \zeta)^{\mu - 1}$ in common and, since, by construction, $p_m(z)$ is the greatest common divisor polynomial, $p_m(z)$ also must possess the factor $(z - \zeta)^{\mu - 1}$. In summary, applying the (modified) Euclidean algorithm to the logarithmic derivative $r(x) = \frac{p'(x)}{p(x)}$ of a polynomial $p(x)$, art. 223 together with Theorem 53 leads to:
Theorem 54 (Sturm) Let $p(z)$ be a polynomial with real coefficients and let $\{p_k\}$ be the sequence of polynomials generated by the (modified) Euclidean algorithm starting with $p_0(z) = p(z)$ and $p_1(z) = p'(z)$. The polynomial $p(z)$ has exactly $V(a) - V(b)$ distinct real zeros in $(a, b)$, where $a$ and $b$ are not zeros of $p(z)$ and $V(x)$ denotes the number of changes of sign in the sequence $\{p_k(x)\}$.
A complex number $\zeta$ is a zero of multiplicity $\mu$ of $p(z)$ if and only if $\zeta$ is a zero of multiplicity $\mu - 1$ of $p_m(z)$. Thus, all zeros of $p(z)$ in $(a, b)$ are simple if and only if $p_m(z)$ has no zeros in $(a, b)$.
Example Let $p(z) = z^4 - 2z^2 + z + 1$. Descartes' rule of signs (Theorem 47) states that there are either 2 or 0 positive real zeros. The (modified) Euclidean algorithm yields, with $p_0(z) = p(z)$ and $p_1(z) = p_0'(z)$,
$$\begin{aligned} p_0(z) &= z^4 - 2z^2 + z + 1 = p_1(z) \left( \frac{z}{4} \right) - \left( z^2 - \frac{3}{4} z - 1 \right) \\ p_1(z) &= 4z^3 - 4z + 1 = (4z + 3)\, p_2(z) - \left( -\frac{9}{4} z - 4 \right) \\ p_2(z) &= z^2 - \frac{3}{4} z - 1 = \left( -\frac{4z}{9} + \frac{91}{81} \right) p_3(z) - \left( -\frac{283}{81} \right) \\ p_3(z) &= -\frac{9}{4} z - 4 = \left( \frac{729}{1132} z + \frac{324}{283} \right) p_4(z) \end{aligned}$$
and $p_4(z) = -\frac{283}{81}$.
The corresponding continued fraction of the modified Euclidean algorithm is
$$\frac{p_0(z)}{p_1(z)} = q_1(z) - \cfrac{1}{q_2(z) - \cfrac{1}{q_3(z) - \cfrac{1}{\ddots - \cfrac{1}{q_m(z)}}}}$$
and, here,
$$\frac{p_0(z)}{p_1(z)} = \frac{z}{4} - \cfrac{1}{4z + 3 - \cfrac{1}{-\frac{4z}{9} + \frac{91}{81} - \cfrac{1}{\frac{729}{1132} z + \frac{324}{283}}}}$$
The sequence of signs at $z = 0$ is $+, +, -, -, -$ such that $V(0) = 1$. For $z \to \infty$, the signs of the leading coefficients are $+, +, +, -, -$ and $V(\infty) = 1$, while $V(-\infty) = 3$. There is no positive real zero, but there are two negative zeros. The zeros are simple because $p_4(z)$ is constant. The zeros of $p(z)$ are $z_1 = -1.49$, $z_2 = -0.52$ and $z_{3,4} = 1.01 \pm 0.51i$.
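Theorem 54 translates directly into code when the polynomial remainders are computed exactly. The sketch below uses Python's `fractions.Fraction` (our choice, to avoid rounding in the Euclidean divisions), with coefficient lists ordered from the highest degree down and helper names of our own. It reproduces the Sturm sequence of the example above and the counts $V(-\infty) = 3$, $V(0) = 1$, $V(\infty) = 1$, and also treats the hypothetical polynomial $(z - 1)^2 (z + 2)$, where only distinct zeros are counted.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num / den; coefficients highest degree first."""
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= factor * den[i]
        num.pop(0)  # the leading coefficient is now zero
    while num and num[0] == 0:  # strip leading zeros of the remainder
        num.pop(0)
    return num


def derivative(p):
    """Coefficients of p'(z); p has degree >= 1, highest degree first."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def sturm_sequence(p):
    """Modified Euclidean algorithm: p_0 = p, p_1 = p', p_{k+1} = -rem(p_{k-1}, p_k)."""
    seq = [[Fraction(c) for c in p], [Fraction(c) for c in derivative(p)]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def evaluate(p, x):
    acc = Fraction(0)
    for c in p:  # Horner's scheme
        acc = acc * x + c
    return acc


def sign_changes(values):
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def V(seq, x):
    """Sign changes of {p_k(x)} at a finite x, or at x = +inf / -inf (floats)."""
    if x == float("inf"):
        return sign_changes([q[0] for q in seq])  # signs of leading coefficients
    if x == -float("inf"):
        return sign_changes([q[0] * (-1) ** (len(q) - 1) for q in seq])
    return sign_changes([evaluate(q, x) for q in seq])


def count_distinct_real_zeros(p, a, b):
    """V(a) - V(b): number of distinct real zeros of p in (a, b) (Theorem 54)."""
    seq = sturm_sequence(p)
    return V(seq, a) - V(seq, b)


p = [1, 0, -2, 1, 1]  # z^4 - 2z^2 + z + 1
seq = sturm_sequence(p)
print(seq[-1])  # [Fraction(-283, 81)]
inf = float("inf")
print(V(seq, -inf), V(seq, 0), V(seq, inf))  # 3 1 1
print(count_distinct_real_zeros([1, 0, -3, 2], -3, 3))  # 2 for (z - 1)^2 (z + 2)
```

For $(z - 1)^2 (z + 2)$ the sequence stops at $p_2(z) = 2z - 2$, the greatest common divisor of $p$ and $p'$, which reveals the double zero at $z = 1$.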
9.7 Locations of zeros in the complex plane
226. Consider the real numbers⁸ $m_1 > 0, m_2 > 0, \ldots, m_n > 0$ with $\sum_{j=1}^{n} m_j = 1$, and let $\{z_k\}_{1 \leq k \leq n}$ denote the $n$ complex zeros of a polynomial $p_n(z)$; then the center of gravity is defined as
$$\bar{z} = \sum_{j=1}^{n} m_j z_j \tag{9.33}$$
and the number $m_j$ can be interpreted as a mass placed at the position $z_j$. If we consider all possible sets $\{m_j\}_{1 \leq j \leq n}$ of masses at the fixed points $\{z_k\}_{1 \leq k \leq n}$ in the complex plane, then the corresponding centers of gravity cover the interior of a convex polygon, the smallest one containing the points $z_1, z_2, \ldots, z_n$. The only exception occurs if all zeros lie on a straight line. In that case, all the centers of gravity lie in the smallest line segment that contains all the points $z_1, z_2, \ldots, z_n$. Any straight line through the center of gravity⁹ separates the set $\{z_k\}_{1 \leq k \leq n}$ into parts, one on each side of the line (except if all the points $z_1, z_2, \ldots, z_n$ lie on a line). Indeed, since $\sum_{j=1}^{n} m_j = 1$, we can write (9.33) as
$$0 = \sum_{j=1}^{n} m_j (z_j - \bar{z})$$
If all the points $w_1, w_2, \ldots, w_n$ are on the same side of a straight line passing through the origin, then $\sum_{j=1}^{n} w_j \neq 0$ and $\sum_{j=1}^{n} \frac{1}{w_j} \neq 0$. Indeed, we can always rotate the coordinate axes such that the imaginary axis coincides with the straight line through the origin. If all points are on one side, then they lie in either the positive or the negative half-plane, and both $\sum_{j=1}^{n} \operatorname{Re}(w_j) = \operatorname{Re}\left( \sum_{j=1}^{n} w_j \right)$ and $\sum_{j=1}^{n} \operatorname{Re}\left( w_j^{-1} \right) = \sum_{j=1}^{n} \frac{\operatorname{Re}(w_j)}{|w_j|^2}$ are non-zero. Now, let $w_j = m_j (z_j - \bar{z})$; the above argument shows that not all the points $m_j (z_j - \bar{z})$ lie on the same side of a line through the origin. Translate the origin from the center of gravity $\bar{z}$ to any other point in the plane and verify that the property still holds.
⁸ Here $m_k$ is not the multiplicity of the zero $z_k$.
⁹ We may also interpret the vector $z_j - \bar{z}$ as a force directed from $\bar{z}$ to $z_j$ with magnitude $m_j |z_j - \bar{z}|$. Then $\bar{z}$ represents an equilibrium position of a material point subject to repellant forces exerted by the points $z_1, z_2, \ldots, z_n$. If $\bar{z}$ were outside the smallest convex polygon that contains the $z_j$'s, the resultant of the several forces acting on $\bar{z}$ could not vanish: no equilibrium would be possible.
Theorem 55 (Gauss) No zero of the derivative $p_n'(z)$ of a polynomial $p_n(z)$ lies outside the smallest convex polygon that contains all the zeros of $p_n(z)$.
Proof: Let $z_1, z_2, \ldots, z_n$ denote the zeros of $p_n(z)$ and let $w$ be a zero of $p_n'(z)$, different from $z_1, z_2, \ldots, z_n$; then
$$\frac{p_n'(w)}{p_n(w)} = \sum_{j=1}^{n} \frac{1}{w - z_j} = 0$$
or, since also the complex conjugate $\sum_{j=1}^{n} \frac{1}{\overline{w} - \overline{z_j}} = 0$, we have that
$$\sum_{j=1}^{n} \frac{w - z_j}{|w - z_j|^2} = 0$$
This is equivalent to $w \sum_{j=1}^{n} \frac{1}{|w - z_j|^2} = \sum_{j=1}^{n} \frac{z_j}{|w - z_j|^2}$. With $M = \sum_{j=1}^{n} \frac{1}{|w - z_j|^2}$, we arrive at
$$w = \sum_{j=1}^{n} \frac{z_j}{M |w - z_j|^2}$$
which expresses a center of gravity if $m_j = \frac{1}{M |w - z_j|^2}$ in (9.33) and, by construction, $\sum_{j=1}^{n} m_j = 1$. Above it is shown that any center of gravity lies inside the smallest convex polygon formed by the points $z_1, z_2, \ldots, z_n$. □
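The proof of Theorem 55 is constructive: it produces explicit masses $m_j = 1/(M |w - z_j|^2)$ that write a zero $w$ of $p_n'(z)$ as a center of gravity of the zeros $z_j$. The NumPy sketch below (the zeros are an arbitrary, hypothetical choice) recovers those masses for every zero of $p_n'(z)$ and checks that they are positive, sum to one and reproduce $w$ through (9.33), which places $w$ inside the convex hull of the zeros.

```python
import numpy as np

zeros = np.array([0, 4, 1 + 3j, 2 - 2j])  # hypothetical zeros z_1, ..., z_4 of p_n(z)
p = np.poly(zeros)                         # coefficients of prod_j (z - z_j), highest first
critical = np.roots(np.polyder(p))         # zeros w of p_n'(z)

for w in critical:
    inv = 1.0 / np.abs(w - zeros) ** 2     # 1 / |w - z_j|^2
    m = inv / inv.sum()                    # masses m_j = 1 / (M |w - z_j|^2)
    center = np.sum(m * zeros)             # center of gravity (9.33)
    print(np.round(w, 6), np.round(center, 6), bool(np.all(m > 0)))
```

The second printed value coincides with the first up to rounding, as the proof predicts.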
Any smallest convex polygon containing all zeros can be enclosed by a circular disk $C$ because all zeros are finite. If $c$ is a point lying on the boundary of the circle $C$, then the Möbius transform $s = t(z) = \frac{1}{z - c}$ maps the disk into a half-plane (containing the point at infinity, since $s(c) = \infty$). Further considerations of Gauss's Theorem 55 and the Möbius transform are discussed in Henrici (1974).
227. There exists a quite remarkable result that relates the zeros of two polynomials that satisfy the apolar condition. Two polynomials $p_n(z) = \sum_{k=0}^{n} a_k z^k$ and $q_n(z) = \sum_{k=0}^{n} b_k z^k$ are called apolar if they satisfy
$$\sum_{k=0}^{n} (-1)^k \frac{a_k\, b_{n-k}}{\binom{n}{k}} = 0 \tag{9.34}$$
Let $\gamma_k = (-1)^{n-k} \frac{b_k}{\binom{n}{k}}$; then the Cauchy product of the polynomials $p_n(z)$ and $\tilde{q}_n(z) = \sum_{k=0}^{n} \gamma_k z^k$ is
$$p_n(z)\, \tilde{q}_n(z) = \sum_{k=0}^{2n} \left( \sum_{j=0}^{k} a_j \gamma_{k-j} \right) z^k$$
with $a_j = \gamma_j = 0$ for $j > n$, which shows that the apolar condition (9.34) implies that the $n$-th coefficient (the $n$-th derivative at $z = 0$ divided by $n!$) of the product $p_n(z)\, \tilde{q}_n(z)$ is zero.
Theorem 56 (Grace) Let $p_n(z) = \sum_{k=0}^{n} a_k z^k$ and $q_n(z) = \sum_{k=0}^{n} b_k z^k$ be apolar, thus satisfying (9.34). Suppose that all zeros of $p_n(z)$ lie in a circular region $R$; then $q_n(z)$ has at least one zero in $R$.
Proof: See, e.g., Szegő (1922), Henrici (1974, pp. 469-472). □
Example Consider $q_n(z) = z^n + b_{n-k} z^{n-k}$, whose coefficient $b_{n-k}$ is chosen to satisfy the apolar condition (9.34), such that $a_0 + (-1)^k \binom{n}{k}^{-1} a_k b_{n-k} = 0$. Thus, for $b_{n-k} = (-1)^{k+1} \binom{n}{k} \frac{a_0}{a_k}$, the zeros of $q_n(z)$ are $0$ (with multiplicity $n - k$) and $(-b_{n-k})^{1/k} e^{2\pi i l/k}$ for $0 \leq l < k$. All zeros of $q_n(z)$ lie at the origin or on the circle $R$ around the origin with radius $\left| \binom{n}{k} \frac{a_0}{a_k} \right|^{1/k}$. Since the apolar condition (9.34) is symmetric in $p_n$ and $q_n$ (up to the factor $(-1)^n$), Grace's Theorem 56 states that there is at least one zero of $p_n(z)$ that lies inside or on that circle $R$. The example shows that, by choosing an appropriate polynomial $q_n(z)$ whose zeros are known and that can be made apolar to $p_n(z)$, valuable information about the locations of some zeros of $p_n(z)$ can be derived. Related to Grace's Theorem 56 is:
Theorem 57 (Szegő's Composition Theorem) Suppose that all the zeros of $p_n(z) = \sum_{k=0}^{n} a_k \binom{n}{k} z^k$ lie in a circular region $R$. If $\beta_1, \ldots, \beta_n$ are the zeros of $q_n(z) = \sum_{k=0}^{n} b_k \binom{n}{k} z^k$, then each zero $\gamma$ of $w_n(z) = \sum_{k=0}^{n} a_k b_k \binom{n}{k} z^k$ can be written as $\gamma = -\beta_j s$, for some zero $\beta_j$ and some point $s$ belonging to $R$.
Proof: See Szegő (1922). □
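The Example after Theorem 56 gives, for every $k$ with $a_k \neq 0$, a disk $|z| \leq \left| \binom{n}{k} a_0 / a_k \right|^{1/k}$ that must contain at least one zero of $p_n(z)$. The NumPy sketch below (function names and test polynomials are hypothetical) computes these radii and compares them with the smallest zero modulus obtained numerically.

```python
import math

import numpy as np


def grace_radii(a):
    """Radii |binom(n,k) a_0/a_k|^(1/k) for a_k != 0; a is lowest degree first, a_0 != 0."""
    n = len(a) - 1
    return {k: abs(math.comb(n, k) * a[0] / a[k]) ** (1.0 / k)
            for k in range(1, n + 1) if a[k] != 0}


def smallest_zero_modulus(a):
    """Smallest |z| over the zeros of p_n(z); np.roots wants the highest degree first."""
    return float(np.min(np.abs(np.roots(a[::-1]))))


a = [-10, 17, -8, 1]  # (z - 1)(z - 2)(z - 5)
print(grace_radii(a))            # radii for k = 1, 2, 3: about 1.765, 1.936 and 2.154
print(smallest_zero_modulus(a))  # about 1.0: the zero z = 1 lies in every disk
```

Each radius gives an independent upper bound on the distance from the origin to the nearest zero; the smallest radius is the most informative one.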
228. Let us assume that there is no zero of the polynomial $p_n(z) = \sum_{k=0}^{n} a_k z^k$ in a disk around $z_0$ with radius $\rho$. After transforming $z \to z - z_0$, we obtain the polynomial expansion around $z_0$, $p_n(z) = \sum_{k=0}^{n} b_k(z_0) (z - z_0)^k$, where $b_0(z_0) = p_n(z_0) \neq 0$, by the assumption. Further, we bound $p_n(z)$ for $|z - z_0| < \rho$ as
$$|p_n(z)| = \left| b_0(z_0) + \sum_{k=1}^{n} b_k(z_0) (z - z_0)^k \right| \geq |b_0(z_0)| - \sum_{k=1}^{n} |b_k(z_0)|\, |z - z_0|^k > |b_0(z_0)| - \sum_{k=1}^{n} |b_k(z_0)|\, \rho^k = |b_0(z_0)| \left\{ 1 - \sum_{k=1}^{n} \frac{|b_k(z_0)|}{|b_0(z_0)|} \rho^k \right\}$$
Cauchy's rule in art. 220 shows that we may deduce a sharper bound if all coefficients $b_k(z_0)$ are real. There is exactly one positive solution for $\rho$ of $|b_0(z_0)| = \sum_{k=1}^{n} |b_k(z_0)| \rho^k$, because the right-hand side is monotonously increasing from zero (at $\rho = 0$) onwards. Since finding such a solution is generally not easy, we proceed as in art. 220. Let $\alpha_k = \frac{|b_k(z_0)|}{|b_0(z_0)|} \rho^k > 0$ for each $k$ where $|b_k(z_0)| > 0$; then
$$|p_n(z)| > |b_0(z_0)| \left\{ 1 - \sum_{k=1;\; |b_k(z_0)| > 0}^{n} \alpha_k \right\} \geq |b_0(z_0)| \left\{ 1 - \sum_{k=1}^{n} \alpha_k \right\}$$
It suffices to require that $\sum_{k=1}^{n} \alpha_k \leq 1$ to obtain $|p_n(z)| > 0$. Hence, given a set of positive numbers $\alpha_k$ satisfying $\sum_{k=1}^{n} \alpha_k \leq 1$, there are no zeros in a disk around $z_0$ with radius
$$\rho = \min_{1 \leq k \leq n;\; |b_k(z_0)| > 0} \alpha_k^{1/k} \left| \frac{b_k(z_0)}{b_0(z_0)} \right|^{-1/k}$$
Example 1 If $\alpha_k = 2^{-k}$, for which $\sum_{k=1}^{n} \alpha_k = \sum_{k=1}^{n} 2^{-k} < \sum_{k=1}^{\infty} 2^{-k} = 1$, then a zero-free disk around $z_0$ has radius
$$\rho = \frac{1}{2} \min_{1 \leq k \leq n;\; |b_k(z_0)| > 0} \left| \frac{b_k(z_0)}{b_0(z_0)} \right|^{-1/k}$$
Example 2 If $\alpha_k = \binom{n}{k} y^k (1 - y)^{n-k}$ with $0 < y < 1$, then $\sum_{k=1}^{n} \alpha_k = 1 - (1 - y)^n < 1$ and
$$\rho = \frac{y}{1 - y} \min_{1 \leq k \leq n;\; |b_k(z_0)| > 0} (1 - y)^{n/k} \left| \binom{n}{k}^{-1} \frac{b_k(z_0)}{b_0(z_0)} \right|^{-1/k}$$
Since $0 < y < 1$, we have $(1 - y)^{n/k} \geq (1 - y)^n$ for $k \geq 1$, such that
$$\rho \geq y (1 - y)^{n-1} \min_{1 \leq k \leq n;\; |b_k(z_0)| > 0} \left| \binom{n}{k}^{-1} \frac{b_k(z_0)}{b_0(z_0)} \right|^{-1/k}$$
Finally, the maximum of $y (1 - y)^{n-1}$ occurs at $y = \frac{1}{n}$ and is $\frac{1}{n} \left( 1 - \frac{1}{n} \right)^{n-1} > \frac{1}{ne}$. Thus, a zero-free disk around $z_0$ has radius
$$\rho = \frac{1}{ne} \min_{1 \leq k \leq n;\; |b_k(z_0)| > 0} \left| \binom{n}{k}^{-1} \frac{b_k(z_0)}{b_0(z_0)} \right|^{-1/k}$$
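The zero-free radii of Examples 1 and 2 in art. 228 only need the Taylor coefficients $b_k(z_0) = p_n^{(k)}(z_0)/k!$. The NumPy sketch below (the zeros and the center $z_0$ are hypothetical choices, the function names are ours) evaluates both radii and compares them with the true distance $d$ from $z_0$ to the nearest zero: both radii must be smaller than $d$, and the radius $\rho$ of Example 2 also obeys $d \leq ne\rho$, the bound derived from Vieta's formulae in art. 228.

```python
import math

import numpy as np


def taylor_coeffs(p, z0):
    """b_k(z0) = p^(k)(z0) / k!, k = 0..n; p is highest degree first (NumPy convention)."""
    n = len(p) - 1
    return [np.polyval(np.polyder(p, k), z0) / math.factorial(k) for k in range(n + 1)]


def zero_free_radii(p, z0):
    """Radii of Example 1 (alpha_k = 2^-k) and Example 2 (binomial alpha_k) in art. 228."""
    b = taylor_coeffs(p, z0)
    n = len(b) - 1
    ks = [k for k in range(1, n + 1) if abs(b[k]) > 0]
    rho1 = 0.5 * min(abs(b[0] / b[k]) ** (1.0 / k) for k in ks)
    rho2 = min(abs(math.comb(n, k) * b[0] / b[k]) ** (1.0 / k) for k in ks) / (n * math.e)
    return rho1, rho2


zeros = np.array([1, -2, 3j, 4 - 1j])  # hypothetical zeros of p_4(z)
p = np.poly(zeros)
z0 = 0.5 + 0.5j
d = float(np.min(np.abs(zeros - z0)))  # distance from z0 to the nearest zero
rho1, rho2 = zero_free_radii(p, z0)
print(d, rho1, rho2)
```

Both radii are conservative by construction; they trade sharpness for not having to solve $|b_0(z_0)| = \sum_{k} |b_k(z_0)| \rho^k$.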
Example 2 has another interesting property. Vieta's formulae (9.9), applied to $p_n(z) = \sum_{k=0}^n b_k(z_0)(z - z_0)^k$, show that
$$\frac{b_m(z_0)}{b_0(z_0)} = (-1)^m \sum_{j_1=1}^n \sum_{j_2=j_1+1}^n \cdots \sum_{j_m=j_{m-1}+1}^n \prod_{l=1}^m \frac{1}{z_{j_l} - z_0}$$
where the multiple sum contains $\binom{n}{m}$ terms, as shown in art. 199. Now, let $d = \min_{1\le k\le n}|z_k - z_0|$ denote the distance from $z_0$ to the nearest zero of $p_n(z)$. Then $|z_k - z_0|^{-1} \le d^{-1}$ for all $1 \le k \le n$. Introducing this bound in the above Vieta formula yields, for $1 \le m \le n$,
$$\left| \frac{b_m(z_0)}{b_0(z_0)} \right| \le \sum_{j_1=1}^n \sum_{j_2=j_1+1}^n \cdots \sum_{j_m=j_{m-1}+1}^n \prod_{l=1}^m \left| \frac{1}{z_{j_l} - z_0} \right| \le \binom{n}{m} d^{-m}$$
from which
$$d \le \min_{1\le k\le n;\,|b_k(z_0)|>0} \left| \binom{n}{k} \frac{b_0(z_0)}{b_k(z_0)} \right|^{1/k} = ne\rho$$
Thus, we have shown that there is at least one zero in the disk around $z_0$ with radius $ne\rho$, while Example 2 demonstrates that there are no zeros in the disk with the same center $z_0$ but radius $\rho$. Finally, we use the bound $\left| \frac{b_m(z_0)}{b_0(z_0)} \right| \le \binom{n}{m} d^{-m}$ in $|b_0(z_0)| = \sum_{k=1}^n |b_k(z_0)|\rho^k$ and find
$$1 = \sum_{k=1}^n \frac{|b_k(z_0)|}{|b_0(z_0)|}\rho^k \le \sum_{k=1}^n \binom{n}{k}\left(\frac{\rho}{d}\right)^k = \left(1 + \frac{\rho}{d}\right)^n - 1$$
such that $d \le \frac{\rho}{2^{1/n} - 1}$. Given the solution $\rho$ of $|b_0(z_0)| = \sum_{k=1}^n |b_k(z_0)|\rho^k$, the disk around $z_0$ with radius $\frac{\rho}{2^{1/n} - 1}$ contains at least one zero of $p_n(z)$. There exist theorems, for which we refer to Henrici (1974, pp. 457-462), that give conditions for the radius of a disk to enclose at least $m$ zeros.

229. If $a_0 > a_1 > \cdots > a_n > 0$, then the polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$ has no zero in the unit disk $|z| \le 1$ nor on the positive real axis.

Proof: If $z = r$ is real and positive, then $p_n(r) > 0$. For the other cases, where $z = re^{i\theta}$ and $\theta \neq 0$, consider
$$|(1-z)p_n(z)| = \left| a_0 - \left( \sum_{k=1}^n (a_{k-1} - a_k) z^k + a_n z^{n+1} \right) \right| \ge a_0 - \left| \sum_{k=1}^n (a_{k-1} - a_k) z^k + a_n z^{n+1} \right|$$
Further, with $r \le 1$,
$$\left| \sum_{k=1}^n (a_{k-1} - a_k) z^k + a_n z^{n+1} \right| = \left| \sum_{k=1}^n (a_{k-1} - a_k) r^k e^{ik\theta} + a_n r^{n+1} e^{i(n+1)\theta} \right| < \sum_{k=1}^n (a_{k-1} - a_k) + a_n = a_0$$
where the strict inequality stems from the fact that not all arguments $e^{ik\theta}$ are equal (since $\theta \neq 0$). Hence, $|(1-z)p_n(z)| > 0$ for $|z| \le 1$. □

Art. 229 also holds for a polynomial with alternating coefficients, $t_n(z) = \sum_{k=0}^n (-1)^k a_k z^k$, where $a_0 > a_1 > \cdots > a_n > 0$. If $x$ is a zero of $t_n(z)$, then $-x$ is a zero of $p_n(z)$, for which $|x| > 1$. If $a_n > a_{n-1} > \cdots > a_0 > 0$, then all the zeros of the polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$ lie within the unit disk $|z| < 1$. This case is a consequence of art. 229 and art. 196. We refer to Milovanović et al. (1994) for extensions.

230. If the polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$ has real, positive coefficients, then all its zeros lie in the annulus
$$\min_{1\le k\le n}\left(\frac{a_{k-1}}{a_k}\right) \le |z| \le \max_{1\le k\le n}\left(\frac{a_{k-1}}{a_k}\right)$$
Proof: Consider $p_n\left(\frac{z}{x}\right) = \sum_{k=0}^n a_k x^{-k} z^k$. We can always choose $x$ such that $a_k x^{-k} < a_{k-1} x^{1-k}$ for each $1 \le k \le n$. Indeed, it suffices that $x^{-1} < \frac{a_{k-1}}{a_k}$ for each $1 \le k \le n$, i.e., $x^{-1} < \min_{1\le k\le n}\left(\frac{a_{k-1}}{a_k}\right)$. For those $x$, art. 229 shows that $\left|p_n\left(\frac{z}{x}\right)\right| > 0$ for $|z| \le 1$. This implies that $p_n(z)$ has no zeros within the disk with radius $x^{-1}$, thus $|z_k| > x^{-1}$.

Apply the same method to $z^n p_n\left(\frac{y}{z}\right) = \sum_{k=0}^n a_{n-k} y^{n-k} z^k$ and choose $y$ such that $a_{n-k} y^{n-k} < a_{n-k+1} y^{n-k+1}$ for each $1 \le k \le n$, i.e., $y > \max_{1\le k\le n}\left(\frac{a_{k-1}}{a_k}\right)$. For those $y$, art. 229 indicates that $\left|p_n\left(\frac{y}{z}\right)\right| > 0$ for $|z| \le 1$. This implies that all zeros $y/z_k$ of $z^n p_n\left(\frac{y}{z}\right)$ lie outside the unit disk. In view of art. 196, the zeros $z_k$ of $p_n(z)$ thus lie within the disk with radius $y$, i.e., $|z_k| < y$. Combining both bounds and letting $x^{-1}$ and $y$ approach the minimum and the maximum, respectively, completes the proof. □

231. Weierstrass's iterative method. Let $w_j$ be an approximation to the zero $z_j$ of the polynomial $p_n(z)$. The new update $\hat w_j = w_j + \Delta w_j$ is ideally equal to $z_j$, so that $p_n(\hat w_j) = 0$. By Taylor's theorem, we have
$$p_n(\hat w_j) = p_n(w_j + \Delta w_j) = p_n(w_j) + p_n'(w_j)\Delta w_j + O\left((\Delta w_j)^2\right)$$
Requiring that $p_n(\hat w_j) = 0$ and ignoring second-order terms leads to the Newton-Raphson rule for the increment
$$\Delta w_j = -\frac{p_n(w_j)}{p_n'(w_j)} \tag{9.35}$$
When the approximation $w_j$ is sufficiently close to $z_j$, the iterations $w_j^{(k+1)} = w_j^{(k)} + \Delta w_j^{(k)}$ converge quadratically in $k$ to $z_j$.
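The Newton-Raphson rule (9.35) can be sketched in a few lines. The Python below is an illustrative sketch and does not come from the text. The helper names `horner_with_derivative` and `newton_raphson`, the stopping tolerance and the iteration cap are our own assumptions. The code evaluates $p_n$ and $p_n'$ together with Horner's scheme and applies the increment $\Delta w = -p_n(w)/p_n'(w)$ until the step becomes negligible.

```python
def horner_with_derivative(coeffs, z):
    """Evaluate p(z) and p'(z) for p(z) = sum_k coeffs[k] * z**k (coeffs[0] = a_0)."""
    p, dp = 0j, 0j
    for a in reversed(coeffs):
        dp = dp * z + p   # derivative picks up the previous partial value of p
        p = p * z + a
    return p, dp


def newton_raphson(coeffs, w0, tol=1e-14, max_iter=100):
    """Iterate w <- w + Delta w with Delta w = -p(w)/p'(w), the increment (9.35)."""
    w = complex(w0)
    for _ in range(max_iter):
        p, dp = horner_with_derivative(coeffs, w)
        if dp == 0:
            raise ZeroDivisionError("p'(w) vanished; choose another starting point")
        step = -p / dp
        w += step
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    return w


# p(z) = z^3 - 2z - 5, stored as [a_0, a_1, a_2, a_3]
root = newton_raphson([-5, -2, 0, 1], 2.0)
print(root.real)  # approximately 2.0945514815
```

Started at $w = 2$, close to the zero, the iteration settles within a handful of steps, in line with the quadratic convergence mentioned above. A poor starting point may instead wander or hit $p_n'(w) = 0$.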
Weierstrass argues similarly. Ideally, $\hat w_j = z_j$ for all zeros $1 \le j \le n$, such that the product form (9.1) of the polynomial equals
$$p_n(z) = a_n \prod_{j=1}^n (z - w_j - \Delta w_j)$$
Applying Taylor's theorem to expand the $n$-dimensional function $p_n(z; z_1, \ldots, z_n)$ in the vector $(z_1, z_2, \ldots, z_n)$ around the vector $(w_1, w_2, \ldots, w_n)$ yields
$$p_n(z) = p_n(z; w_1, \ldots, w_n) + \sum_{j=1}^n \left. \frac{\partial p_n(z)}{\partial z_j} \right|_{z_j = w_j} \Delta w_j + r$$
where the remainder $r$ contains higher-order terms in $\Delta w_j$, such as $(\Delta w)^T H \Delta w$, where $H$ is the Hessian. Ignoring the remainder as in Newton-Raphson's rule and computing the derivative yields
$$p_n(z) \simeq a_n \prod_{j=1}^n (z - w_j) - a_n \sum_{j=1}^n \prod_{k=1;\,k\neq j}^n (z - w_k)\,\Delta w_j$$
All increments $\Delta w_j$ for $1 \le j \le n$ are solved from this relation by successively letting $z = w_m$ for $1 \le m \le n$, resulting in
$$p_n(w_m) \simeq -a_n \sum_{j=1}^n \Delta w_j \prod_{k=1;\,k\neq j}^n (w_m - w_k) = -a_n \Delta w_m \prod_{k=1;\,k\neq m}^n (w_m - w_k)$$
from which Weierstrass's increments for $1 \le m \le n$ are obtained:
$$\Delta w_m = -\frac{p_n(w_m)}{a_n \prod_{k=1;\,k\neq m}^n (w_m - w_k)} \tag{9.36}$$
The iterations $w_m^{(k+1)} = w_m^{(k)} + \Delta w_m^{(k)}$ for $1 \le m \le n$ in $k = 0, 1, \ldots$ also converge quadratically in $k$ to all $n$ zeros $z_m$, under much milder conditions than the Newton-Raphson rule. McNamee (2007) demonstrates that Weierstrass's scheme nearly always converges, irrespective of the initial guesses $w_m^{(0)}$ for $1 \le m \le n$.

There is an interesting alternative derivation of the Weierstrass increments (9.36). Applying the Newton-Raphson rule (9.35) to the coefficients (9.6) of Vieta's formula, expressed in terms of the zeros, yields a set of linear equations in $\Delta w_m$ that leads to (9.36). The simplest linear equation, $-\frac{a_{n-1}}{a_n} = \sum_{k=1}^n z_k$ for $k = n-1$ in (9.6), is already linear in $z_k = \hat w_k = w_k + \Delta w_k$. It shows that, at each iteration $k$, $\sum_{m=1}^n w_m^{(k+1)} = -\frac{a_{n-1}}{a_n}$, meaning that the sum of the approximations equals the exact sum. Just as there are many improvements of the Newton-Raphson rule to enhance the convergence towards the root, so there are many variants that improve Weierstrass's rule. Moreover, there are conditions on the initial values $w_m^{(0)}$ that guarantee convergence, which are discussed in McNamee (2007).
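Below is a sketch of the Weierstrass iteration (9.36), in which all $n$ approximations are updated simultaneously from the previous iterate. The function names are hypothetical. The starting values are an assumed heuristic and are not a choice made in the text: they lie on a circle of radius $1 + \max_k|a_k/a_n|$ (Cauchy's classical bound on the moduli of the zeros), with a small rotation.

```python
import cmath


def poly_eval(coeffs, z):
    """Horner evaluation of p(z) = sum_k coeffs[k] * z**k, with coeffs[0] = a_0."""
    p = 0j
    for a in reversed(coeffs):
        p = p * z + a
    return p


def weierstrass(coeffs, tol=1e-14, max_iter=500):
    """Simultaneous iteration (9.36): w_m <- w_m - p(w_m) / (a_n * prod_{k != m} (w_m - w_k))."""
    n = len(coeffs) - 1
    a_n = coeffs[-1]
    # Assumed starting values: n distinct points on a circle enclosing all zeros.
    radius = 1 + max(abs(a / a_n) for a in coeffs[:-1])
    w = [radius * cmath.exp(1j * (2 * cmath.pi * m / n + 0.4)) for m in range(n)]
    for _ in range(max_iter):
        new_w = []
        for m in range(n):
            denom = a_n
            for k in range(n):
                if k != m:
                    denom *= w[m] - w[k]
            new_w.append(w[m] - poly_eval(coeffs, w[m]) / denom)  # w_m + Delta w_m
        change = max(abs(x - y) for x, y in zip(new_w, w))
        w = new_w  # all increments used the previous vector, as in (9.36)
        if change < tol * max(1.0, max(abs(x) for x in w)):
            break
    return w


# (z - 1)(z - 2)(z - 3) = z^3 - 6z^2 + 11z - 6
print(sorted(weierstrass([-6, 11, -6, 1]), key=lambda z: z.real))
```

All increments are computed from the same previous vector. As argued above via Vieta's formula, the sum of the approximations therefore equals $-a_{n-1}/a_n$ after each iteration.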
9.8 Zeros of complex functions

232. The argument principle.

Theorem 58 If $f(z)$ is analytic on and inside the contour $C$ and has no zeros on $C$, then the number of zeros of $f(z)$ inside $C$ is
$$N = \frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z)}\,dz = \frac{1}{2\pi}\Delta_C \arg f(z)$$
where $\Delta_C$ denotes the variation of the argument of $f$ round the contour $C$.

Proof: See Titchmarsh (1964, p. 116). □
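Theorem 58 also suggests a simple numerical zero counter. Sample $p_n$ on a circle, accumulate the change of $\arg p_n(z)$ and divide by $2\pi$. The sketch below assumes that no zero lies on, or very near, the circle. It also assumes the sampling is dense enough for every phase step to stay below $\pi$. The function name and the default number of samples are illustrative.

```python
import cmath
import math


def count_zeros_in_circle(coeffs, center=0.0, radius=1.0, samples=4096):
    """Estimate N = (1/(2 pi)) * Delta_C arg p(z) on the circle |z - center| = radius."""
    def p(z):
        acc = 0j
        for a in reversed(coeffs):  # Horner, coeffs[0] = a_0
            acc = acc * z + a
        return acc

    total = 0.0
    prev = p(center + radius)
    for k in range(1, samples + 1):
        z = center + radius * cmath.exp(2j * math.pi * k / samples)
        cur = p(z)
        total += cmath.phase(cur / prev)  # phase increment in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))


# (z - 0.5)(z + 0.2)(z - 3) = z^3 - 3.3 z^2 + 0.8 z + 0.3
coeffs = [0.3, 0.8, -3.3, 1]
print(count_zeros_in_circle(coeffs, 0.0, 1.0))  # 2
```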
Since polynomials are analytic in the entire complex plane, Theorem 58 is valid for any contour $C$ and can be used to compute the number of zeros inside a certain contour, as shown in Section 9.6.

233. The famous and simple theorems of Rouché and of Jensen are very powerful tools in complex analysis.

Theorem 59 (Rouché) If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $C$.

Proof: See Titchmarsh (1964, p. 116). □

If, at all points of a contour $C$ around the origin, it holds that
$$|a_k z^k| > \left| \sum_{j=0;\,j\neq k}^n a_j z^j \right|$$
then the contour $C$ encloses $k$ zeros of $p_n(z) = \sum_{k=0}^n a_k z^k$.

Proof: A proof not directly based on Rouché's theorem is given in Whittaker and Watson (1996, p. 120). The result follows directly from Rouché's Theorem 59 with $f(z) = a_k z^k$, which has a $k$-multiple zero at the origin, and $g(z) = \sum_{j=0;\,j\neq k}^n a_j z^j$. □

We give another application of Rouché's theorem to a polynomial $p_n(z)$ with real coefficients $a_0 > a_1 > \cdots > a_n > 0$. If $R$ is such that $a_0 > \sum_{k=1}^n a_k R^k$, then $p_n(z)$ has no zeros in the disk around the origin with radius $R$. If $R > 1$, an improvement of art. 229 is obtained.

Theorem 60 (Jensen) Let $f(z)$ be analytic for $|z| < R$. Suppose that $f(0) \neq 0$, and let $r_1 \le r_2 \le \cdots \le r_n \le \cdots$ be the moduli of the zeros of $f(z)$ in the circle $|z| < R$. Then, if $r_n \le r \le r_{n+1}$,
$$\sum_{j=1}^n \log r_j = -\frac{1}{2\pi}\int_0^{2\pi} \log\left|f\left(re^{i\theta}\right)\right| d\theta + n\log r + \log|f(0)|$$
Proof: See Titchmarsh (1964, p. 125). □

Consider the polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$ with zeros ordered as $|z_1| \ge |z_2| \ge \cdots \ge |z_m| > 1 \ge |z_{m+1}| \ge \cdots \ge |z_n|$. Assuming that $a_0 \neq 0$, Jensen's Theorem 60 states for $r = 1$ that
$$\log \frac{|a_0|}{\prod_{j=m+1}^n |z_j|} = \frac{1}{2\pi}\int_0^{2\pi} \log\left|p_n\left(e^{i\theta}\right)\right| d\theta$$
Using $a_0 = (-1)^n a_n \prod_{k=1}^n z_k$ in art. 196 yields
$$\frac{1}{2\pi}\int_0^{2\pi} \log\left|p_n\left(e^{i\theta}\right)\right| d\theta = \log\left(|a_n| \prod_{k=1}^m |z_k|\right) \tag{9.37}$$
With
$$\left|p_n\left(e^{i\theta}\right)\right| = \left| \sum_{k=0}^n a_k e^{ik\theta} \right| \le \sum_{k=0}^n |a_k|$$
we obtain the inequality of Mahler (1960),
$$|a_n| \prod_{k=1}^m |z_k| \le \sum_{k=0}^n |a_k| \tag{9.38}$$
Recall that, if all coefficients $\{a_k\}_{0\le k\le n}$ of the polynomial $p_n(z)$ are real, then $\prod_{k=1}^m |z_k| = (-1)^l \prod_{k=1}^m z_k$, where $l$ is the number of real negative zeros among $z_1, \ldots, z_m$. Mahler (1960) also derives a lower bound for $|a_n| \prod_{k=1}^m |z_k|$. Let $1 \le j_1 < j_2 < \cdots < j_k \le n$. Then, for any $0 \le k \le n$,
$$\prod_{l=1}^k |z_{j_l}| \le \prod_{l=1}^m |z_l|$$
Vieta's formula (9.6) shows that, for each $0 \le k \le n$,
$$\left| \frac{a_{n-k}}{a_n} \right| = \left| \sum_{j_1=1}^n \sum_{j_2=j_1+1}^n \cdots \sum_{j_k=j_{k-1}+1}^n \prod_{l=1}^k z_{j_l} \right| \le \sum_{j_1=1}^n \sum_{j_2=j_1+1}^n \cdots \sum_{j_k=j_{k-1}+1}^n \prod_{l=1}^k |z_{j_l}| \le \prod_{l=1}^m |z_l| \sum_{j_1=1}^n \sum_{j_2=j_1+1}^n \cdots \sum_{j_k=j_{k-1}+1}^n 1 = \binom{n}{k} \prod_{l=1}^m |z_l|$$
Multiplying by $\lambda^{n-k}$, with $\lambda > 0$, and summing over all $k$ results in
$$\sum_{k=0}^n |a_k| \lambda^k = \sum_{k=0}^n |a_{n-k}| \lambda^{n-k} \le |a_n| \prod_{l=1}^m |z_l| \sum_{k=0}^n \binom{n}{k} \lambda^{n-k} = (1+\lambda)^n |a_n| \prod_{l=1}^m |z_l|$$
which gives
$$(1+\lambda)^{-n} \sum_{k=0}^n |a_k| \lambda^k \le |a_n| \prod_{l=1}^m |z_l| \tag{9.39}$$
Mahler's lower bound is the case $\lambda = 1$.
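The relations (9.37) to (9.39) are easy to check numerically. The sketch below uses hypothetical helper names. It builds the polynomial from prescribed zeros and computes the left-hand side of (9.38), often called the Mahler measure, from the zeros outside the unit circle. The Jensen integral in (9.37) is approximated by the trapezoidal rule, which is very accurate here because no zero lies on the unit circle.

```python
import cmath
import math


def poly_from_roots(a_n, roots):
    """Coefficients [a_0, ..., a_n] of a_n * prod_k (z - z_k)."""
    coeffs = [complex(a_n)]
    for r in roots:
        shifted = [0j] + coeffs                     # z * p(z)
        scaled = [-r * c for c in coeffs] + [0j]    # -r * p(z)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def mahler_measure(a_n, roots):
    """|a_n| times the product of |z_k| over the zeros with |z_k| > 1, cf. (9.37)."""
    m = abs(a_n)
    for r in roots:
        if abs(r) > 1:
            m *= abs(r)
    return m


def jensen_mean(coeffs, samples=2048):
    """Trapezoidal approximation of (1/2pi) int_0^{2pi} log|p(e^{i theta})| d theta."""
    total = 0.0
    for k in range(samples):
        z = cmath.exp(2j * math.pi * k / samples)
        total += math.log(abs(sum(a * z**j for j, a in enumerate(coeffs))))
    return total / samples


a_n, roots = 2.0, [3.0, -1.5 + 0.5j, -1.5 - 0.5j, 0.25]
coeffs = poly_from_roots(a_n, roots)
M = mahler_measure(a_n, roots)
S = sum(abs(a) for a in coeffs)
n = len(roots)
print(math.log(M), jensen_mean(coeffs))  # both sides of (9.37)
print(S / 2**n <= M <= S)                # (9.39) with lambda = 1, and (9.38): True
```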
234. Lagrange's series for the inverse of a function. Let the function $w = f(z)$ be analytic around the point $z_0$, with $f'(z_0) \neq 0$. Then there exists a region around $w_0 = f(z_0)$ in which each point has a unique inverse $z = f^{-1}(w)$ belonging to the analytic region around $z_0$. The Lagrange series for the inverse of a function (Markushevich, 1985, II, p. 88),
$$f^{-1}(w) = z_0 + \sum_{n=1}^\infty \frac{1}{n!} \left. \frac{d^{n-1}}{dz^{n-1}} \left[ \left( \frac{z - z_0}{f(z) - f(z_0)} \right)^n \right] \right|_{z=z_0} (w - f(z_0))^n \tag{9.40}$$
is a special case (for $G(z) = z$) of the more general result
$$G(f^{-1}(w)) = G(z_0) + \sum_{n=1}^\infty \frac{1}{n!} \left. \frac{d^{n-1}}{dz^{n-1}} \left[ G'(z) \left( \frac{z - z_0}{f(z) - f(z_0)} \right)^n \right] \right|_{z=z_0} (w - f(z_0))^n \tag{9.41}$$
Suppose that $G(z)$ is analytic inside the contour $C$ around $z_0$, and that $C$ encloses a region where $f(z) - w_0$ has only a single zero. Then the series (9.41) follows from expanding the integral
$$G(f^{-1}(w)) = \frac{1}{2\pi i} \oint_C G(z) \frac{f'(z)}{f(z) - w}\,dz$$
in a Taylor series around $w_0 = f(z_0)$,
$$G(f^{-1}(w)) = \sum_{n=0}^\infty \left[ \frac{1}{2\pi i} \oint_C G(z) \frac{f'(z)}{(f(z) - f(z_0))^{n+1}}\,dz \right] (w - f(z_0))^n$$
Applying integration by parts for $n > 0$ gives
$$\frac{1}{2\pi i} \oint_C G(z) \frac{f'(z)}{(f(z) - f(z_0))^{n+1}}\,dz = -\frac{1}{2\pi i n} \oint_C G(z)\,d\left( \frac{1}{(f(z) - f(z_0))^n} \right) = \frac{1}{2\pi i n} \oint_C \frac{G'(z)}{(f(z) - f(z_0))^n}\,dz$$
After rewriting,
$$\frac{1}{2\pi i} \oint_C \frac{G'(z)}{(f(z) - f(z_0))^n}\,dz = \frac{1}{2\pi i} \oint_C \left( \frac{z - z_0}{f(z) - f(z_0)} \right)^n \frac{G'(z)}{(z - z_0)^n}\,dz$$
and invoking Cauchy's integral formula (9.23) for the $(n-1)$-th derivative, we obtain (9.41).

A zero $\zeta$ of a function $w = f(z)$, whose inverse is $z = f^{-1}(w)$, satisfies $\zeta = f^{-1}(0)$. If the Taylor series $f(z) = \sum_{k=0}^\infty f_k(z_0)(z - z_0)^k$ is known, the Lagrange series (9.40) can be computed formally to any desired order using characteristic coefficients (Van Mieghem, 2007). Explicitly, up to order five in $\frac{f_0(z_0)}{f_1(z_0)}$, and writing $f_k$ for $f_k(z_0)$, we have
$$\zeta(z_0) = z_0 - \frac{f_0}{f_1} - \frac{f_2}{f_1}\left(\frac{f_0}{f_1}\right)^2 + \left[ -2\left(\frac{f_2}{f_1}\right)^2 + \frac{f_3}{f_1} \right]\left(\frac{f_0}{f_1}\right)^3 + \left[ -5\left(\frac{f_2}{f_1}\right)^3 + 5\frac{f_2}{f_1}\frac{f_3}{f_1} - \frac{f_4}{f_1} \right]\left(\frac{f_0}{f_1}\right)^4$$
$$+ \left[ -14\left(\frac{f_2}{f_1}\right)^4 + 21\left(\frac{f_2}{f_1}\right)^2\frac{f_3}{f_1} - 3\left(\frac{f_3}{f_1}\right)^2 - 6\frac{f_2}{f_1}\frac{f_4}{f_1} + \frac{f_5}{f_1} \right]\left(\frac{f_0}{f_1}\right)^5 + \cdots \tag{9.42}$$
from which we observe that the first two terms are Newton-Raphson's correction (9.35) in art. 231. The formal Lagrange expansion, of which only a few terms are shown in (9.42), converges only if $z_0$ is chosen sufficiently close to the zero $\zeta(z_0)$, which underlines the importance of the choice of $z_0$. Another observation is that, if $z_0$ and all Taylor coefficients $f_k(z_0)$ are real, the Lagrange series only possesses real terms, such that the zero $\zeta(z_0)$ is real. Thus, for any polynomial $f(z) = p_n(z)$ with given real coefficients, a converging Lagrange series for some real $z_0$ identifies a real zero $\zeta(z_0)$.
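The truncated Lagrange series (9.42) translates directly into code. The sketch below obtains the Taylor coefficients of a polynomial from $f_k(z_0) = \sum_{j\ge k}\binom{j}{k}a_jz_0^{j-k}$ and then sums the terms up to order five in $f_0/f_1$. The function names are illustrative. The result is only meaningful when $z_0$ is close enough to a zero for the series to converge.

```python
from math import comb


def taylor_coefficients(coeffs, z0):
    """f_k(z0) with p(z) = sum_k f_k(z0) (z - z0)^k, for p(z) = sum_j coeffs[j] z^j."""
    n = len(coeffs) - 1
    return [sum(comb(j, k) * coeffs[j] * z0 ** (j - k) for j in range(k, n + 1))
            for k in range(n + 1)]


def lagrange_zero(coeffs, z0):
    """Truncation of the Lagrange series (9.42) at order five in t = f_0/f_1."""
    f = taylor_coefficients(coeffs, z0) + [0.0] * 6  # pad so that f[0..5] exist
    t = f[0] / f[1]
    c2, c3, c4, c5 = (f[k] / f[1] for k in (2, 3, 4, 5))
    return (z0 - t - c2 * t**2
            + (-2 * c2**2 + c3) * t**3
            + (-5 * c2**3 + 5 * c2 * c3 - c4) * t**4
            + (-14 * c2**4 + 21 * c2**2 * c3 - 3 * c3**2 - 6 * c2 * c4 + c5) * t**5)


print(lagrange_zero([-2, 0, 1], 1.4))       # close to sqrt(2)
print(lagrange_zero([-5, -2, 0, 1], 2.1))   # close to the real zero of z^3 - 2z - 5
```

Keeping only `z0 - t` reproduces one Newton-Raphson step, which matches the remark after (9.42).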
9.9 Bounds on values of a polynomial

Milovanović et al. (1994) have collected a large number of bounds on values of polynomials of several types. Here, we only mention the first contributions to the field by Pavnuty Chebyshev.

235. Chebyshev proved the following theorem:

Theorem 61 (Chebyshev) Let $p_n(x) = \sum_{k=0}^n a_k x^k$ be a polynomial with real coefficients and $a_n = 1$. Then, for $n \ge 1$,
$$\max_{-1\le x\le 1} |p_n(x)| \ge \frac{1}{2^{n-1}}$$
The equality sign is obtained for $p_n(x) = \frac{T_n(x)}{2^{n-1}}$, where $T_n(x) = \cos(n \arccos x)$ are the Chebyshev polynomials of the first kind.

Proof: See, e.g., Aigner and Ziegler (2003, Chapter 18). □

An immediate consequence is:

Corollary 2 For a real and monic polynomial $p_n(x)$, suppose that $|p_n(x)| \le c$ for all $x \in [a, b]$. Then $b - a \le 4\left(\frac{c}{2}\right)^{1/n}$.

Proof: We map the $x$-interval $[a, b]$ onto the $y$-interval $[-1, 1]$ by the linear transform $y = \frac{2}{b-a}(x - a) - 1$. The corresponding polynomial
$$q_n(y) = p_n\left( \frac{b-a}{2}(y + 1) + a \right)$$
has leading coefficient $\left(\frac{b-a}{2}\right)^n$ and satisfies
$$\max_{-1\le y\le 1} |q_n(y)| = \max_{a\le x\le b} |p_n(x)|$$
By Chebyshev's Theorem 61, we deduce that $\max_{-1\le y\le 1} |q_n(y)| \ge \left(\frac{b-a}{2}\right)^n \frac{1}{2^{n-1}} = 2\left(\frac{b-a}{4}\right)^n$. Hence,
$$2\left(\frac{b-a}{4}\right)^n \le \max_{a\le x\le b} |p_n(x)| \le c$$
such that $b - a \le 4\left(\frac{c}{2}\right)^{1/n}$. □
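Theorem 61 and Corollary 2 can be illustrated numerically. In the sketch below, the helper names and the sampling grid are assumptions. The monic polynomial $T_n(x)/2^{n-1}$ reaches exactly $2^{1-n}$ on $[-1,1]$, while another monic polynomial such as $x^n$ exceeds it. The Chebyshev polynomial mapped onto $[a,b]$ turns the inequality of Corollary 2 into an equality.

```python
def chebyshev_T(n, x):
    """T_n(x) from T_{k+1} = 2x T_k - T_{k-1}; equals cos(n arccos x) on [-1, 1]."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t


def max_abs_on_grid(f, a=-1.0, b=1.0, points=20001):
    """Approximate max_{a <= x <= b} |f(x)| by sampling (endpoints included)."""
    return max(abs(f(a + (b - a) * k / (points - 1))) for k in range(points))


def corollary2_width(c, n):
    """Corollary 2: |p_n| <= c on [a, b] for a real monic p_n forces b - a <= 4 (c/2)^(1/n)."""
    return 4 * (c / 2) ** (1 / n)


n = 6
monic_cheb = lambda x: chebyshev_T(n, x) / 2 ** (n - 1)
print(max_abs_on_grid(monic_cheb), 2.0 ** (1 - n))  # the bound of Theorem 61 is attained
print(max_abs_on_grid(lambda x: x ** n))            # x^n is also monic, but reaches 1

# Equality case of Corollary 2: the monic Chebyshev polynomial mapped onto [a, b]
a, b = 0.0, 3.0
mapped = lambda x: ((b - a) / 2) ** n * monic_cheb(2 * (x - a) / (b - a) - 1)
print(corollary2_width(max_abs_on_grid(mapped, a, b), n), b - a)
```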
9.10 Bounds for the spacing between zeros

236. Minimum distance between zeros. Mahler (1964) proved a beautiful theorem that bounds the minimum spacing, or distance, between any pair of simple zeros of a polynomial. As shown in art. 201, the discriminant $\Delta(p_n)$ is non-zero only if all zeros are simple, or distinct. Moreover, Mahler's lower bound (9.43) is the best possible.

Theorem 62 (Mahler) Let $p_n(x)$ be any polynomial of degree $n \ge 2$ with distinct zeros, ordered as $|z_1| \ge |z_2| \ge \cdots \ge |z_m| > 1 \ge |z_{m+1}| \ge \cdots \ge |z_n|$. Then the minimum distance between any pair of zeros is bounded from below by
$$\min_{1\le k<j\le n} |z_k - z_j| > \frac{\sqrt{3|\Delta(p_n)|}}{n^{\frac{n}{2}+1}\left( |a_n| \prod_{j=1}^m |z_j| \right)^{n-1}} \ge \frac{\sqrt{3|\Delta(p_n)|}}{n^{\frac{n}{2}+1}\left( \sum_{k=0}^n |a_k| \right)^{n-1}} \tag{9.43}$$
where $\Delta(p_n)$ is the discriminant, defined in art. 201.

Proof: The relation (9.11) between the discriminant and the Vandermonde determinant suggests starting with the Vandermonde matrix $V_n(z)$ in (8.90) of the zeros, ordered as in Theorem 62. Subtract row $s$ from row $r$ and use the algebraic formula $x^k - y^k = (x - y)\sum_{j=0}^{k-1} x^{k-1-j} y^j$, such that
$$\det V_n(z) = (z_r - z_s) \det\begin{bmatrix}
1 & z_1 & z_1^2 & z_1^3 & \cdots & z_1^{n-1} \\
1 & z_2 & z_2^2 & z_2^3 & \cdots & z_2^{n-1} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 1 & z_r + z_s & z_r^2 + z_r z_s + z_s^2 & \cdots & \sum_{j=0}^{n-2} z_r^{n-2-j} z_s^j \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & z_n & z_n^2 & z_n^3 & \cdots & z_n^{n-1}
\end{bmatrix}$$
We now proceed as in art. 195 and divide the first $m$ rows, corresponding to the components with absolute value larger than 1, by $z_j^{n-1}$ for $1 \le j \le m$. The only difference lies in row $r$, which consists of the elements
$$\left[ 0,\ 1,\ z_r + z_s,\ z_r^2 + z_r z_s + z_s^2,\ \cdots,\ \sum_{j=0}^{n-2} z_r^{n-2-j} z_s^j \right] \quad \text{if } r > m$$
$$\left[ 0,\ \frac{1}{z_r^{n-1}},\ \frac{z_r + z_s}{z_r^{n-1}},\ \frac{z_r^2 + z_r z_s + z_s^2}{z_r^{n-1}},\ \cdots,\ \frac{\sum_{j=0}^{n-2} z_r^{n-2-j} z_s^j}{z_r^{n-1}} \right] \quad \text{if } r \le m$$
Since $r < s$, the ordering tells us that $|z_r| \ge |z_s|$. If $r > m$, then $1 \ge |z_r| \ge |z_s|$, and the $k$-th element in row $r$ is bounded by $\left| \sum_{j=0}^{k-2} z_r^{k-2-j} z_s^j \right| \le k - 1$. If $r \le m$, then $|z_r| > 1$, such that the $k$-th element in row $r$ is bounded by $\left| \frac{\sum_{j=0}^{k-2} z_r^{k-2-j} z_s^j}{z_r^{n-1}} \right| \le k - 1$. All elements of the other rows have modulus at most 1. Hadamard's inequality (8.93) then shows that
$$|\det V_n(z)| \le |z_r - z_s| \prod_{j=1}^m |z_j|^{n-1}\, n^{\frac{n-1}{2}} \sqrt{\sum_{k=1}^n (k-1)^2}$$
Using $\sum_{k=1}^{n-1} k^2 = \frac{n(n-1)(2n-1)}{6} < \frac{n^3}{3}$ (Abramowitz and Stegun, 1968, Section 23.1.4), we have
$$|\det V_n(z)| < \frac{|z_r - z_s|\, n^{\frac{n+2}{2}}}{\sqrt{3}} \prod_{j=1}^m |z_j|^{n-1} \tag{9.44}$$
The inequality (9.44) is nearly the best possible, because it is almost attained if $z_j = e^{\frac{2\pi i j}{n}}$, as shown in art. 195 and art. 143. Choose $z_r = 1$ and $z_s = e^{\frac{2\pi i}{n}}$, such that $|z_r - z_s| = 2\sin\frac{\pi}{n}$. We know from (8.94) that $|\det V_n(z)| = n^{n/2}$, such that
$$\frac{|\det V_n(z)|}{|z_r - z_s|} = \frac{n^{n/2}}{2\sin\frac{\pi}{n}} = \frac{\frac{\pi}{n}}{\sin\frac{\pi}{n}}\,\frac{n^{\frac{n+2}{2}}}{2\pi}$$
which tends to $\frac{n^{\frac{n+2}{2}}}{2\pi}$ for large $n$. This illustrates that (9.44) cannot be improved, except perhaps for a slightly smaller prefactor than $\frac{1}{\sqrt{3}}$. Since the inequality (9.44) holds for any $r$ and $s$, Theorem 62 now follows from the definition of the discriminant (9.11), $|\Delta(p_n)| = |a_n|^{2n-2}|\det V_n(z)|^2$. The last inequality in (9.43) follows from (9.38). □

Usually, the discriminant $\Delta(p_n)$ is not easy to determine. Consider, however, a polynomial with integer coefficients; this also covers rational coefficients, because $\alpha p_n(z)$ and $p_n(z)$ have the same zeros for any complex number $\alpha \neq 0$. For such a polynomial, art. 201 shows that $\Delta(p_n)$ is a function of the coefficients $a_k$ and, hence, an integer. In addition,
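$\Delta(p_n) \neq 0$, such that $|\Delta(p_n)| \ge 1$. Thus, the minimum spacing between the simple zeros of a polynomial with rational coefficients $a_k \in \mathbb{Q}$, scaled such that all coefficients are integers, is bounded from below by
$$\min_{1\le k<j\le n} |z_k - z_j| > \frac{\sqrt{3}}{n^{\frac{n}{2}+1}\left( \sum_{k=0}^n |a_k| \right)^{n-1}} \tag{9.45}$$

237. An upper bound for the spacing, due to Lupas (Milovanović et al., 1994, p. 106), is
$$\min_{1\le k<j\le n} |z_k - z_j| \le 2\sqrt{\frac{3\operatorname{Var}[z]}{n^2 - 1}}$$
where the variance of the zeros of a real polynomial with real zeros is defined as
$$\operatorname{Var}[z] = \frac{1}{n}\sum_{k=1}^n z_k^2 - \left( \frac{1}{n}\sum_{k=1}^n z_k \right)^2$$
Using the Newton identities in art. 198 yields
$$\operatorname{Var}[z] = \frac{1}{n^2}\left\{ (n-1)\frac{a_{n-1}^2}{a_n^2} - 2n\frac{a_{n-2}}{a_n} \right\}$$
Equality in the upper bound is attained for the polynomial
$$p_n(z) = \prod_{k=1}^n \left( z - E[z] - (n - 2k + 1)\sqrt{\frac{3\operatorname{Var}[z]}{n^2 - 1}} \right)$$
where the mean of the zeros is $E[z] = \frac{1}{n}\sum_{k=1}^n z_k = -\frac{a_{n-1}}{n a_n}$.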
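The spacing bounds of art. 236 and art. 237 can be compared on a small example. In the sketch below, the function names are hypothetical. The discriminant is computed from known zeros as $a_n^{2n-2}\prod_{i<j}(z_i - z_j)^2$, the Mahler bound is the right-most expression in (9.43), and the Lupas bound uses the variance expressed through $a_{n-1}$ and $a_{n-2}$.

```python
import itertools
import math


def poly_from_real_roots(roots):
    """Monic coefficients [a_0, ..., a_n] of prod_k (x - r_k)."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [(coeffs[k - 1] if k > 0 else 0.0) - r * (coeffs[k] if k < len(coeffs) else 0.0)
                  for k in range(len(coeffs) + 1)]
    return coeffs


def discriminant_from_roots(a_n, roots):
    """Delta(p_n) = a_n^(2n-2) * prod_{i<j} (z_i - z_j)^2."""
    d = a_n ** (2 * len(roots) - 2)
    for zi, zj in itertools.combinations(roots, 2):
        d *= (zi - zj) ** 2
    return d


def mahler_spacing_bound(coeffs, delta):
    """Right-most lower bound in (9.43): sqrt(3|Delta|) / (n^(n/2+1) (sum |a_k|)^(n-1))."""
    n = len(coeffs) - 1
    s = sum(abs(a) for a in coeffs)
    return math.sqrt(3 * abs(delta)) / (n ** (n / 2 + 1) * s ** (n - 1))


def lupas_spacing_bound(coeffs):
    """Upper bound 2 sqrt(3 Var[z] / (n^2 - 1)), with Var[z] from a_n, a_{n-1}, a_{n-2}."""
    n = len(coeffs) - 1
    a_n, a_n1, a_n2 = coeffs[n], coeffs[n - 1], coeffs[n - 2]
    var = ((n - 1) * (a_n1 / a_n) ** 2 - 2 * n * a_n2 / a_n) / n ** 2
    return 2 * math.sqrt(3 * var / (n ** 2 - 1))


roots = [1.0, 2.0, 4.0]                     # x^3 - 7x^2 + 14x - 8
coeffs = poly_from_real_roots(roots)
spacing = min(abs(x - y) for x, y in itertools.combinations(roots, 2))
delta = discriminant_from_roots(1.0, roots)
print(mahler_spacing_bound(coeffs, delta) < spacing <= lupas_spacing_bound(coeffs))  # True
```

For equally spaced zeros, e.g. $0, 1, 2, 3$, the Lupas bound equals the actual spacing, which is the equality case mentioned above.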
9.11 Bounds on the zeros of a polynomial

McNamee (2007) provides a long list of bounds on the modulus of the largest zero $\rho = \max_{1\le k\le n} |z_k|$ of $p_n(z)$, where the coefficients $a_k \in \mathbb{C}$ and $a_n = 1$. After testing a large number of polynomials, he reports that the relatively simple formula due to Deutsch (1970),
$$|a_{n-1}| + \max_{1\le k\le n-1 \text{ and } a_k \neq 0} \left| \frac{a_{k-1}}{a_k} \right|$$
ended up as second best. This formula is an extension of art. 196 to complex coefficients $a_k$, derived from the companion matrix (art. 143) using matrix norms. The best one, due to Kalantari,
$$\max_{4\le k\le n+3} \left| a_{n-1}^2 a_{n-k+3} - a_{n-1} a_{n-k+2} - a_{n-2} a_{n-k+3} + a_{n-k+1} \right|^{\frac{1}{k-1}}$$
is clearly more complicated.
238. Let $z_1$ denote the zero with largest modulus and define $\sigma_j = \sum_{k=1}^n |z_k|^j \ge |Z_j|$, where $Z_j = \sum_{k=1}^n z_k^j$. Evidently, $|z_1| \le \sigma_j^{1/j}$. On the other hand, since $|z_1| \ge |z_k|$ for any $1 \le k \le n$,
$$\sigma_{j+1} = \sum_{k=1}^n |z_k|^j |z_k| \le |z_1| \sum_{k=1}^n |z_k|^j = |z_1|\,\sigma_j$$
Combining both inequalities yields the bounds, for any $j > 0$,
$$\frac{\sigma_{j+1}}{\sigma_j} \le |z_1| \le \sigma_j^{1/j}$$
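The bounds of art. 238 can be evaluated from a given set of zeros, for illustration only. In practice, $\sigma_j$ has to come from the coefficients, for example via the recursion (9.4) when all zeros are real and $j$ is even. The sketch below assumes the zeros are known, and the function name is hypothetical. It shows both bounds closing in on $|z_1|$ as $j$ grows.

```python
def largest_modulus_bounds(zeros, j):
    """Art. 238: sigma_{j+1}/sigma_j <= |z_1| <= sigma_j^(1/j), sigma_j = sum_k |z_k|^j."""
    def sigma(p):
        return sum(abs(z) ** p for z in zeros)
    return sigma(j + 1) / sigma(j), sigma(j) ** (1.0 / j)


zeros = [3.0, -2.5, 1 + 1j, 1 - 1j, 0.5]  # |z_1| = 3
for j in (1, 2, 5, 20, 80):
    low, high = largest_modulus_bounds(zeros, j)
    print(j, round(low, 4), round(high, 4))
```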
239. The Hölder inequality, presented in vector form in (8.40), is, with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,
$$\sum_{j=1}^n |x_j y_j| \le \left( \sum_{j=1}^n |x_j|^p \right)^{\frac{1}{p}} \left( \sum_{j=1}^n |y_j|^q \right)^{\frac{1}{q}} \tag{9.46}$$
Quite likely, there exists a value of $p > 1$ that results in the lowest upper bound, but that value of $p$ will depend on the values of $x_j$ and $y_j$. For any integers $m$ and $j$, we may write
$$\sigma_j = \sum_{k=1}^n |z_k|^{j-m} |z_k|^m$$
Applying (9.46) gives
$$\sigma_j \le \left( \sum_{k=1}^n |z_k|^{p(j-m)} \right)^{\frac{1}{p}} \left( \sum_{k=1}^n |z_k|^{\frac{pm}{p-1}} \right)^{1-\frac{1}{p}}$$
where we require that $p(j - m) = l$ and $\frac{pm}{p-1} = h$, and that both $l$ and $h$ are integers. All solutions satisfy $m(l - h) = (l - j)h$ with $l > 0$ and $p = \frac{l}{j-m}$. For example, the solution $h = l = j$ and $p = \frac{j}{j-m}$ returns, for any $m$ and $j$, an equality, namely the definition of $\sigma_j$. The case $p = 2$ is
$$\sigma_j^2 \le \left( \sum_{k=1}^n |z_k|^{2(j-m)} \right) \left( \sum_{k=1}^n |z_k|^{2m} \right) = \sigma_{2(j-m)}\,\sigma_{2m} \tag{9.47}$$
which is particularly useful when $j$ is even and all zeros are real.

240. The next theorem sharpens the bounds in art. 238:

Theorem 63 Let $z_1, \ldots, z_n$ be the real zeros of a polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$ for which $Z_1 = \sum_{k=1}^n z_k = 0$. Then any zero $\zeta \in \{z_1, \ldots, z_n\}$ is bounded, for positive integers $1 \le m \le \left\lfloor \frac{n}{2} \right\rfloor$, by
$$-\left( \frac{Z_{2m}}{1 + \frac{1}{(n-1)^{2m-1}}} \right)^{\frac{1}{2m}} \le \zeta \le \left( \frac{Z_{2m}}{1 + \frac{1}{(n-1)^{2m-1}}} \right)^{\frac{1}{2m}} \tag{9.48}$$
where $Z_j = \sum_{k=1}^n z_k^j$ is uniquely expressed via the recursion (9.4) in terms of the coefficients $a_k$.

Before proving Theorem 63, we note that the condition $Z_1 = 0$ is not confining, as shown in art. 203. By (9.5), this condition is equivalent to the requirement that $a_{n-1} = 0$.

Proof: Let $z_1$ denote an arbitrary zero of $p_n(z)$ (because we can always relabel the zeros). Then
$$Z_j = z_1^j + \sum_{k=2}^n z_k^j$$
Applying the Hölder inequality (9.46) to $x_j = 1$ and $y_j = z_j \in \mathbb{R}$ for $2 \le j \le n$ yields, for even $q = 2m$,
$$\frac{1}{(n-1)^{2m-1}} \left( \sum_{j=2}^n |z_j| \right)^{2m} \le \sum_{k=2}^n z_k^{2m} \tag{9.49}$$
Since $\left| \sum_{j=2}^n z_j \right| \le \sum_{j=2}^n |z_j|$ and $\sum_{j=2}^n z_j = -\frac{a_{n-1}}{a_n} - z_1$, the inequality (9.49) becomes
$$\frac{1}{(n-1)^{2m-1}} \left( \frac{a_{n-1}}{a_n} + z_1 \right)^{2m} \le Z_{2m} - z_1^{2m} \tag{9.50}$$
Using the assumption that $a_{n-1} = 0$, we finally arrive, for all $1 \le m \le \left\lfloor \frac{n}{2} \right\rfloor$, at the bounds (9.48) for any zero of $p_n(z)$, and thus also for the largest zero. □
Theorem 63 actually generalizes a famous theorem due to Laguerre, in which $m = 1$ and the condition $Z_1 = 0$ is not needed.

Theorem 64 (Laguerre) If all the zeros $z_1, \ldots, z_n$ of a polynomial $p_n(x) = \sum_{k=0}^n a_k x^k$ with $a_n = 1$ are real, then they all lie in the interval $[y_-, y_+]$, where
$$y_\pm = -\frac{a_{n-1}}{n} \pm \frac{n-1}{n}\sqrt{a_{n-1}^2 - \frac{2n}{n-1}a_{n-2}}$$
Proof: Laguerre's theorem follows immediately from (9.50) for $m = 1$ and the Newton identities in art. 198. See also Aigner and Ziegler (2003, p. 101). □

The case $m = 2$ in (9.50) leads to a quartic equation, which is still solvable exactly. That case can therefore be expressed in closed form without using the condition $Z_1 = 0$, as in the proof of Laguerre's Theorem 64. However, all cases $m > 2$ are greatly simplified by the condition $Z_1 = 0$, which relieves us from solving a polynomial equation of order $2m$.

Theorem 65 If $z_1, \ldots, z_n$ are the real zeros of a polynomial $p_n(z) = \sum_{k=0}^n a_k z^k$, then any zero $\zeta \in \{z_1, \ldots, z_n\}$ is bounded from above, for positive integers $m$, by
$$\zeta \le \left( \frac{Z_{2m}}{n} + \sqrt{(n-1)\left( \frac{Z_{4m}}{n} - \left( \frac{Z_{2m}}{n} \right)^2 \right)} \right)^{\frac{1}{2m}} \tag{9.51}$$
and bounded from below, for odd integer values of $m$, by
$$\zeta \ge \left( \frac{Z_{2m}}{n} - \sqrt{(n-1)\left( \frac{Z_{4m}}{n} - \left( \frac{Z_{2m}}{n} \right)^2 \right)} \right)^{\frac{1}{2m}} \tag{9.52}$$
Proof: In a similar vein, applying (9.47) with $j = 2m$ and $m = l$ gives
$$\left( Z_{2m} - z_1^{2m} \right)^2 \le \left( Z_{2l} - z_1^{2l} \right)\left( Z_{2(2m-l)} - z_1^{2(2m-l)} \right)$$
or
$$2 z_1^{2m} Z_{2m} - z_1^{2l} Z_{4m-2l} - z_1^{4m-2l} Z_{2l} \ge Z_{2m}^2 - Z_{2l} Z_{4m-2l}$$
If $l = m$, then the dependence on $z_1$ disappears. If $l = \frac{m}{2}$ or $l = \frac{3m}{2}$, then
$$z_1^{3m} Z_m - 2 z_1^{2m} Z_{2m} + z_1^m Z_{3m} + Z_{2m}^2 - Z_{3m} Z_m \le 0$$
whose exact solution (via Cardano's formula) is less attractive. For $l = 0$, the quadratic inequality in $z_1^{2m}$,
$$z_1^{4m} - \frac{2 Z_{2m}}{n} z_1^{2m} + \frac{Z_{2m}^2 - (n-1) Z_{4m}}{n} \le 0$$
is obeyed for any $z_1^{2m}$ lying between
$$z_\pm = \frac{Z_{2m}}{n} \pm \sqrt{(n-1)\left( \frac{Z_{4m}}{n} - \left( \frac{Z_{2m}}{n} \right)^2 \right)}$$
The Cauchy-Schwarz inequality (8.41) shows that $\frac{Z_{4m}}{n} - \left( \frac{Z_{2m}}{n} \right)^2 \ge 0$, implying that the roots are real. Thus, we find the upper bound (9.51) and the lower bound (9.52), which always exists for odd $m$. □
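Laguerre's interval (Theorem 64) and the bound (9.48) of Theorem 63 can be compared on a real-rooted example. In the sketch below, the names are hypothetical. The zeros are chosen with $Z_1 = 0$, so that $a_{n-1} = 0$. For $m = 1$ both results then coincide, while $m = 2$ already gives a sharper bound. For brevity, $Z_{2m}$ is summed directly from the zeros instead of being obtained from the recursion (9.4).

```python
import math


def monic_from_real_roots(roots):
    """Monic coefficients [a_0, ..., a_n] of prod_k (x - r_k)."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [(coeffs[k - 1] if k > 0 else 0.0) - r * (coeffs[k] if k < len(coeffs) else 0.0)
                  for k in range(len(coeffs) + 1)]
    return coeffs


def laguerre_interval(coeffs):
    """Theorem 64: for monic p_n with only real zeros, all zeros lie in [y_-, y_+]."""
    n = len(coeffs) - 1
    a1, a2 = coeffs[n - 1], coeffs[n - 2]        # a_{n-1}, a_{n-2}
    centre = -a1 / n
    half = (n - 1) / n * math.sqrt(a1 ** 2 - 2 * n / (n - 1) * a2)
    return centre - half, centre + half


def theorem63_bound(roots, m):
    """(9.48), assuming Z_1 = 0: |zeta| <= (Z_2m / (1 + (n-1)^(1-2m)))^(1/(2m))."""
    n = len(roots)
    z2m = sum(z ** (2 * m) for z in roots)      # Z_{2m}
    return (z2m / (1 + 1 / (n - 1) ** (2 * m - 1))) ** (1 / (2 * m))


roots = [-3.0, -1.0, 0.5, 1.5, 2.0]             # real zeros with Z_1 = 0
print(laguerre_interval(monic_from_real_roots(roots)))
print(theorem63_bound(roots, 1), theorem63_bound(roots, 2))
```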
10 Orthogonal polynomials
This chapter reviews the classical theory of orthogonal polynomials from an algebraic point of view. The book by Szegő (1978) is still regarded as the standard text, although it approaches the theory via complex function theory and differential equations. The classical theory of orthogonal polynomials is remarkably beautiful and, at the same time, powerful. Moreover, as shown in Section 5.11, it has interesting relations with graph theory. An overview and properties of the classical orthogonal polynomials, such as the Legendre, Chebyshev, Jacobi, Laguerre and Hermite polynomials, are found in Abramowitz and Stegun (1968, Chapter 22). A general classification scheme of orthogonal polynomials is presented by Koekoek et al. (2010). Sturm and Liouville initiated the theory of expanding an arbitrary function in terms of solutions of a second-order differential equation. That theory was treated by Titchmarsh (1962), and by Titchmarsh (1958) for partial differential equations, and can be regarded as the generalization of orthogonal polynomial expansions.
10.1 Definitions

241. The usual scalar or inner product between two vectors $x$ and $y$, denoted as $x^T y$, is generalized to real functions $f$ and $g$ as the Stieltjes-Lebesgue integral¹ over the interval $[a, b]$,
$$(f, g) = \int_a^b f(u)\,g(u)\,dW(u) \tag{10.1}$$
where the distribution function $W$ is a non-decreasing, non-constant function in $[a, b]$. As in linear algebra, the functions $f$ and $g$ are called orthogonal if $(f, g) = 0$ and, likewise, the norm of $f$ is $\|f\| = \sqrt{(f, f)}$. Moreover, the generalization (10.1) is obviously linear, $(f + h, g) = (f, g) + (h, g)$, and commutative, $(f, g) = (g, f)$. The definition thus assumes knowledge of both the interval $[a, b]$ and the distribution function $W$. All functions $f$ for which the integral $\int_a^b |f(u)|^2\,dW(u)$ in (10.1) exists constitute the space $L^2_{[a,b]}$.

¹ As mentioned in the introduction of Van Mieghem (2006b, Chapter 2), the Stieltjes integral unifies continuous and differentiable distribution functions as well as discrete ones; in the discrete case, the integral reduces to a sum. If $W$ is differentiable, then
$$(f, g) = \int_a^b f(u)\,g(u)\,w(u)\,du$$
and the non-negative function $w(u) = W'(u)$ is often called a weight function. A broader discussion is given by Szegő (1978, Section 1.4).
242. An orthogonal set of real functions $f_0(x), f_1(x), \ldots, f_m(x)$ is defined, for any $k \neq l \in \{0, 1, \ldots, m\}$, by
$$(f_k, f_l) = \int_a^b f_k(u)\,f_l(u)\,dW(u) = 0 \tag{10.2}$$
When $(f_k, f_k) = 1$ for all $k \in \{0, 1, \ldots, m\}$, the set is normalized and is called an orthonormal set of functions. Just as in linear algebra, these functions $\{f_k\}_{0\le k\le m}$ are linearly independent. Since polynomials are special types of functions, an orthogonal set of polynomials $\{\pi_k\}_{0\le k\le m}$ is also defined by (10.2). We denote an orthogonal polynomial of degree $n$ by $\pi_n$ or $\pi_n(x)$. In addition, the general polynomial expression (9.1) is
$$\pi_n(x) = \sum_{k=0}^n c_{k;n} x^k \tag{10.3}$$
The special scalar product $m_k = (x^k, 1)$, or in integral form
$$m_k = \int_a^b u^k\,dW(u) \tag{10.4}$$
is called the moment of order $k$, and is further studied in art. 246. If $\pi_n(x)$ is an orthogonal polynomial, then $\tilde\pi_n(x) = \frac{\pi_n(x)}{\sqrt{(\pi_n, \pi_n)}}$ is an orthonormal polynomial. Although the highest coefficient $c_{n;n}$ can be any real number, we may always choose $c_{n;n} > 0$, since any polynomial can be multiplied by a number without affecting its zeros. The choice $c_{n;n} > 0$ is sometimes implicitly assumed.

243. The Gram-Schmidt orthogonalization process. In linear algebra, the Gram-Schmidt process transforms a set of $n$ linearly independent vectors that span the $n$-dimensional space into an orthonormal set of vectors. Analogously, it can be used to construct a set of orthonormal polynomials, defined by the scalar product (10.1). First, the constant polynomial of degree zero, $\pi_0(x) = \alpha$, is chosen to obey
$$1 = (\pi_0, \pi_0) = \alpha^2 \int_a^b dW(u) = \alpha^2 m_0$$
where the moment of order zero in (10.4) equals $m_0 = W(b) - W(a) > 0$, because the distribution function $W(x)$ is non-decreasing and non-constant in $x$. Thus, $\tilde\pi_0(x) = \frac{1}{\sqrt{m_0}}$.
The degree-one polynomial $\pi_1(x) = c_{1;1}x + c_{0;1}$ must be orthogonal to $\pi_0(x)$ and normalized, $(\pi_1, \pi_1) = 1$. These two requirements result in $(\pi_1, \pi_0) = c_{1;1}(x, \pi_0) + c_{0;1}(1, \pi_0) = 0$, such that $c_{0;1} = -\frac{c_{1;1}(x, \pi_0)}{(1, \pi_0)}$, while normalization requires that $\tilde\pi_1(x) = \frac{c_{1;1}x + c_{0;1}}{\sqrt{(\pi_1, \pi_1)}}$. Combining both leads to
$$\tilde\pi_1(x) = \frac{c_{1;1}}{\sqrt{(\pi_1, \pi_1)}}\left( x - \frac{(x, \pi_0)}{(1, \pi_0)} \right)$$
Both $\pi_1$ and $\pi_0$ are real polynomials. We can continue the process and compute the degree-two polynomial that is orthogonal to both $\pi_1$ and $\pi_0$ and that is also normalized. Suppose now that we have constructed a sequence of orthonormal polynomials $\pi_0, \pi_1, \ldots, \pi_{n-1}$, which are all real, obey the orthogonality condition (10.2) and are normalized, $(\pi_k, \pi_k) = 1$. Next, we construct the polynomial $\pi_n(x)$ that is orthogonal to all lower-degree orthonormal polynomials by considering
$$\pi_n(x) = c_{n;n}x^n - \sum_{k=0}^{n-1} a_k \pi_k(x)$$
Orthogonality requires, for $j < n$, that
$$0 = (\pi_n, \pi_j) = c_{n;n}(x^n, \pi_j) - \sum_{k=0}^{n-1} a_k (\pi_k, \pi_j) = c_{n;n}(x^n, \pi_j) - a_j(\pi_j, \pi_j)$$
such that
$$a_j = c_{n;n}\frac{(x^n, \pi_j)}{(\pi_j, \pi_j)}$$
After normalization, $(\pi_n, \pi_n) = 1$, we obtain the real orthonormal polynomial of degree $n$:
$$\tilde\pi_n(x) = \frac{c_{n;n}}{\sqrt{(\pi_n, \pi_n)}}\left( x^n - \sum_{k=0}^{n-1}\frac{(x^n, \pi_k)}{(\pi_k, \pi_k)}\pi_k(x) \right)$$
By induction on $n$, it follows that there exists an orthonormal set of polynomials belonging to the scalar product (10.1).
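The Gram-Schmidt construction of art. 243 can be carried out exactly with rational arithmetic when the scalar product is expressed through the moments (10.4). For polynomials $f$ and $g$, we have $(f,g) = \sum_{i,j}f_ig_jm_{i+j}$. The sketch below assumes the example distribution $dW(u) = du$ on $[-1,1]$, whose orthogonal polynomials are the Legendre polynomials. It produces the monic orthogonal polynomials $\pi_n$ with $c_{n;n} = 1$; dividing by $\sqrt{(\pi_n,\pi_n)}$ would give the orthonormal $\tilde\pi_n$. Function names are illustrative.

```python
from fractions import Fraction


def legendre_moments(count):
    """Moments m_k = int_{-1}^{1} u^k du of the assumed example measure dW(u) = du."""
    return [Fraction(2, k + 1) if k % 2 == 0 else Fraction(0) for k in range(count)]


def inner(f, g, moments):
    """(f, g) = sum_{i,j} f_i g_j m_{i+j}, i.e. (10.1) written with the moments (10.4)."""
    return sum(fi * gj * moments[i + j] for i, fi in enumerate(f) for j, gj in enumerate(g))


def gram_schmidt(n_max, moments):
    """Monic pi_0..pi_{n_max}: pi_n = x^n - sum_k (x^n, pi_k)/(pi_k, pi_k) pi_k."""
    polys = []
    for n in range(n_max + 1):
        xn = [Fraction(0)] * n + [Fraction(1)]  # coefficients of x^n
        pn = list(xn)
        for pk in polys:
            coef = inner(xn, pk, moments) / inner(pk, pk, moments)
            for i, c in enumerate(pk):
                pn[i] -= coef * c
        polys.append(pn)
    return polys


moments = legendre_moments(9)  # enough moments for degrees up to 4
polys = gram_schmidt(4, moments)
print(polys[2])  # [Fraction(-1, 3), Fraction(0, 1), Fraction(1, 1)], i.e. x^2 - 1/3
```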
10.2 Properties

244. Key orthogonality property. Let $p_n(x)$ be an arbitrary polynomial of degree $n$. Then it can be written as a linear combination of the (linearly independent) orthogonal polynomials $\{\pi_k\}_{0\le k\le m}$, provided $m \ge n$. Hence,
$$p_n(x) = \sum_{k=0}^n b_{k;n}\pi_k(x)$$
After multiplying both sides by $\pi_l(x)$, taking the scalar product and using the orthogonality in (10.2), we find that, for all $0 \le l \le n$,
$$b_{l;n} = \frac{(p_n, \pi_l)}{(\pi_l, \pi_l)}$$
Hence, any polynomial of degree $n \le m$ can be expressed in a unique way as a linear combination of the set of orthogonal polynomials $\{\pi_k\}_{0\le k\le m}$. Because $p_n(x)$ is of degree $n$, the coefficients $b_{l;n} = 0$ when $l > n$. In summary, a key property of orthogonality is
$$(p_n, \pi_l) = \begin{cases} b_{l;n}\|\pi_l\|^2 & \text{if } 0 \le l \le n \\ 0 & \text{if } l > n \end{cases}$$

245. Example of art. 244. If $p_n(x) = \pi_n(x) = \sum_{k=0}^n c_{k;n}x^k$, then
$$b_{l;n} = \delta_{l,n} = \frac{1}{(\pi_l, \pi_l)}\left( \sum_{k=0}^n c_{k;n}x^k, \pi_l \right) = \frac{1}{(\pi_l, \pi_l)}\sum_{k=l}^n c_{k;n}(x^k, \pi_l)$$
In particular, the coefficient of the highest degree gives $b_{n;n} = 1$, such that
$$c_{n;n} = \frac{(\pi_n, \pi_n)}{(x^n, \pi_n)} \tag{10.5}$$
246. A first interesting consequence of art. 244 arises for the special class of polynomials $p_n(x) = x^n$. In that case, if $l > n$, then $(x^n, \pi_l) = 0$, but $(x^n, \pi_n) \neq 0$. Introducing (10.3) and using $(x^n, x^k) = (x^{n+k}, 1) = m_{n+k}$ yields, for $n \le l$,
$$(x^n, \pi_l) = \sum_{k=0}^l c_{k;l}\,m_{n+k}$$
Taking into account that $(x^n, \pi_l) = (x^l, \pi_l)\,\delta_{l,n}$ for $0 \le n \le l$, this is written in matrix form as
$$\begin{bmatrix} m_0 & m_1 & \cdots & m_l \\ m_1 & m_2 & \cdots & m_{l+1} \\ \vdots & \vdots & \ddots & \vdots \\ m_l & m_{l+1} & \cdots & m_{2l} \end{bmatrix} \begin{bmatrix} c_{0;l} \\ c_{1;l} \\ \vdots \\ c_{l;l} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ (x^l, \pi_l) \end{bmatrix}$$
The symmetric moment matrix
$$M_l = \begin{bmatrix} m_0 & m_1 & \cdots & m_l \\ m_1 & m_2 & \cdots & m_{l+1} \\ \vdots & \vdots & \ddots & \vdots \\ m_l & m_{l+1} & \cdots & m_{2l} \end{bmatrix}$$
is an $(l+1) \times (l+1)$ Hankel matrix². The Gram-Schmidt orthogonalization process (art. 243) shows that there always exists an orthogonal set of polynomials, such that a set of not-all-zero coefficients $\{c_{k;l}\}_{0\le k\le l}$ exists. This implies that the determinant of the moment matrix on the left-hand side is non-zero. By Cramer's rule (8.83), the solution is
$$c_{k;l} = \frac{(x^l, \pi_l)\,\operatorname{cofactor}_{l+1,k}(M_l)}{\det M_l}$$
In particular,
$$c_{l;l} = \frac{\det M_{l-1}}{\det M_l}(x^l, \pi_l) \tag{10.6}$$
which is always different from zero. For any polynomial $p_n(x) = \sum_{k=0}^n a_k x^k$, we can write, by applying (10.6) for $n = l$,
$$(p_n, \pi_n) = \sum_{k=0}^n a_k (x^k, \pi_n) = a_n (x^n, \pi_n) = a_n c_{n;n}\frac{\det M_n}{\det M_{n-1}}$$
In particular, when choosing $p_n(x) = \pi_n(x)$, we observe that
$$\frac{(\pi_n, \pi_n)}{c_{n;n}^2} = \frac{\det M_n}{\det M_{n-1}} > 0 \tag{10.7}$$
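Relation (10.7) can be verified with exact arithmetic. The sketch below again assumes $dW(u) = du$ on $[-1,1]$ as an example. It solves the Hankel system of art. 246 by Cramer's rule, rescales the solution to a monic polynomial, and compares $(\pi_l,\pi_l)$ with $\det M_l/\det M_{l-1}$. The exact Gaussian elimination and all helper names are our own choices.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination on Fractions, with row swaps."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def hankel(moments, l):
    """(l+1) x (l+1) moment matrix M_l with entries m_{i+j}."""
    return [[moments[i + j] for j in range(l + 1)] for i in range(l + 1)]


def monic_orthogonal_from_moments(moments, l):
    """Solve M_l c = (0, ..., 0, 1)^T by Cramer's rule and rescale so that c_{l;l} = 1."""
    m = hankel(moments, l)
    d = det(m)
    c = []
    for k in range(l + 1):
        mk = [row[:] for row in m]
        for i in range(l + 1):
            mk[i][k] = Fraction(1) if i == l else Fraction(0)
        c.append(det(mk) / d)
    return [ck / c[-1] for ck in c]


moments = [Fraction(2, k + 1) if k % 2 == 0 else Fraction(0) for k in range(12)]
for l in range(1, 5):
    p = monic_orthogonal_from_moments(moments, l)
    norm = sum(pi * pj * moments[i + j] for i, pi in enumerate(p) for j, pj in enumerate(p))
    assert norm == det(hankel(moments, l)) / det(hankel(moments, l - 1))  # (10.7), c_{l;l} = 1
    if l == 2:
        print(norm)  # 8/45
```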
10.3 The three-term recursion

247. The three-term recursion. As another, even more important application of art. 244, we consider the polynomial $p_n(x) = x\pi_{n-1}(x)$, whose coefficients are

$$b_{l;n} = \frac{(x\pi_{n-1},\pi_l)}{(\pi_l,\pi_l)} = \frac{(\pi_{n-1},x\pi_l)}{(\pi_l,\pi_l)} = \frac{(x\pi_l,\pi_{n-1})}{(\pi_l,\pi_l)}$$

Since $x\pi_l$ is a polynomial of degree $l+1$, we know from art. 244 that $b_{l;n} = 0$ when $n-1 > l+1$, thus when $l < n-2$. Hence, we find that any set of orthogonal polynomials possesses a three-term recursion for $n\ge 2$,

$$x\pi_{n-1}(x) = b_{n;n}\pi_n(x) + b_{n-1;n}\pi_{n-1}(x) + b_{n-2;n}\pi_{n-2}(x) \qquad (10.8)$$

When $n = 1$, then $\pi_0(x)$ is a constant and $\pi_{-1}(x) = 0$. Observe that any other polynomial $p_n(x) = x^j\pi_{n-j}(x)$ with $j > 1$ will result in a recursion with $2j+1$ terms, because $\frac{(x^j\pi_{n-j},\pi_l)}{(\pi_l,\pi_l)} = \frac{(\pi_{n-j},x^j\pi_l)}{(\pi_l,\pi_l)} = 0$ if $n-j > l+j$, thus $l < n-2j$. Further, taking the scalar product in (10.8) with $x^{n-2}$ yields

$$(x\pi_{n-1},x^{n-2}) = b_{n;n}(\pi_n,x^{n-2}) + b_{n-1;n}(\pi_{n-1},x^{n-2}) + b_{n-2;n}(\pi_{n-2},x^{n-2})$$

Since $(x\pi_{n-1},x^{n-2}) = (\pi_{n-1},x^{n-1})$ while $(\pi_j,x^{n-2}) = 0$ for $j > n-2$, we find, besides $b_{n-2;n} = \frac{(\pi_{n-1},x\pi_{n-2})}{(\pi_{n-2},\pi_{n-2})}$, that

$$b_{n-2;n} = \frac{(\pi_{n-1},x^{n-1})}{(\pi_{n-2},x^{n-2})} = \frac{c_{n-1;n-1}\det M_{n-1}\det M_{n-3}}{c_{n-2;n-2}\left(\det M_{n-2}\right)^2} \qquad (10.9)$$

where the last formula follows from (10.6). It demonstrates that $b_{n-2;n}\neq 0$. Moreover, (10.7) shows that for monic polynomials, i.e., if $c_{n;n} = 1$, $b_{n-2;n} > 0$ and, thus, $(\pi_{n-1}(x),x\pi_{n-2}) > 0$. Substituting (10.3) in

$$b_{n-2;n} = \frac{(x\pi_{n-2},\pi_{n-1})}{(\pi_{n-2},\pi_{n-2})} = \frac{\sum_{k=1}^{n-1}c_{k-1;n-2}\,(x^k,\pi_{n-1})}{(\pi_{n-2},\pi_{n-2})} = \frac{c_{n-2;n-2}\,(x^{n-1},\pi_{n-1})}{(\pi_{n-2},\pi_{n-2})}$$

leads, with (10.5), to

$$b_{n-2;n} = \frac{c_{n-2;n-2}\,(\pi_{n-1},\pi_{n-1})}{c_{n-1;n-1}\,(\pi_{n-2},\pi_{n-2})} \qquad (10.10)$$

Further,

$$b_{n-1;n} = \frac{(x\pi_{n-1},\pi_{n-1})}{(\pi_{n-1},\pi_{n-1})} = \frac{1}{(\pi_{n-1},\pi_{n-1})}\int_a^b u\,\pi_{n-1}^2(u)\,dW(u) \qquad (10.11)$$

shows that $b_{n-1;n}$ is positive if $b > a \ge 0$ and negative if $a < b \le 0$. It can only be zero provided symmetry holds: $a = -b$ and $w(-u) = w(u)$, where $w(u) = \frac{dW}{du}$. The coefficient $b_{n;n}$ can be rewritten as

$$b_{n;n} = \frac{(x\pi_{n-1}(x),\pi_n)}{(\pi_n,\pi_n)} = \frac{\sum_{k=1}^{n}c_{k-1;n-1}\,(x^k,\pi_n)}{(\pi_n,\pi_n)} = \frac{c_{n-1;n-1}\,(x^n,\pi_n)}{(\pi_n,\pi_n)}$$

Using (10.5) leads to

$$b_{n;n} = \frac{c_{n-1;n-1}}{c_{n;n}} \qquad (10.12)$$

² We refer to Gantmacher (1959a, pp. 338-348) for properties of the Hankel matrix.
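The vanishing of $b_{l;n}$ for $l < n-2$ and the expression (10.12) can be observed numerically. The following sketch again assumes the Legendre weight $dW(u) = du$ on $[-1,1]$. It builds $\pi_k = P_k$ in the monomial basis with NumPy and evaluates $b_{l;n} = (x\pi_{n-1},\pi_l)/(\pi_l,\pi_l)$ by exact polynomial integration. The function names are illustrative only.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leg2poly


def inner(f, g):
    """(f, g) = integral of f(u) g(u) over [-1, 1] (assumed weight dW(u) = du)."""
    antiderivative = (f * g).integ()
    return antiderivative(1.0) - antiderivative(-1.0)


def legendre(n):
    """Legendre polynomial P_n converted to the monomial basis."""
    return Polynomial(leg2poly([0] * n + [1]))


def b_coefficients(n):
    """b_{l;n} = (x pi_{n-1}, pi_l) / (pi_l, pi_l) for l = 0, ..., n."""
    x_pi = Polynomial([0, 1]) * legendre(n - 1)
    return np.array([inner(x_pi, legendre(l)) / inner(legendre(l), legendre(l))
                     for l in range(n + 1)])


n = 5
b = b_coefficients(n)
print(np.round(b, 12))                  # only b_{n-2;n}, b_{n-1;n}, b_{n;n} can be non-zero
c = [legendre(k).coef[-1] for k in range(n + 1)]   # leading coefficients c_{k;k}
print(np.isclose(b[n], c[n - 1] / c[n]))           # (10.12)
```

For this symmetric weight, $b_{n-1;n}$ also vanishes, in line with the remark after (10.11).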
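The expressions (10.9) and (10.12) simplify for monic polynomials, where $c_{n;n} = 1$.

248. Often, the three-term recursion (10.8) is rewritten in normalized form, with $\pi_n(x) = \tilde\pi_n(x)\sqrt{(\pi_n,\pi_n)}$, as

$$\tilde\pi_n(x) = (x-b_{n-1;n})\frac{\sqrt{(\pi_{n-1},\pi_{n-1})}}{b_{n;n}\sqrt{(\pi_n,\pi_n)}}\,\tilde\pi_{n-1}(x) - \frac{b_{n-2;n}\sqrt{(\pi_{n-2},\pi_{n-2})}}{b_{n;n}\sqrt{(\pi_n,\pi_n)}}\,\tilde\pi_{n-2}(x)$$

Substituting the expressions (10.12), (10.11) and (10.10) for the $b$-coefficients yields

$$\tilde\pi_n(x) = (x-b_{n-1;n})\frac{\tilde c_{n;n}}{\tilde c_{n-1;n-1}}\,\tilde\pi_{n-1}(x) - \frac{\tilde c_{n;n}\,\tilde c_{n-2;n-2}}{\tilde c_{n-1;n-1}^2}\,\tilde\pi_{n-2}(x)$$

where $\tilde c_{n;n} = \frac{c_{n;n}}{\sqrt{(\pi_n,\pi_n)}}$. Thus,

$$\tilde\pi_n(x) = (A_n x + B_n)\,\tilde\pi_{n-1}(x) - C_n\,\tilde\pi_{n-2}(x) \qquad (10.13)$$

where $A_n = \frac{\tilde c_{n;n}}{\tilde c_{n-1;n-1}}$, $B_n = -b_{n-1;n}\frac{\tilde c_{n;n}}{\tilde c_{n-1;n-1}}$ and $C_n = \frac{\tilde c_{n;n}\,\tilde c_{n-2;n-2}}{\tilde c_{n-1;n-1}^2} = \frac{A_n}{A_{n-1}}$. The major advantage of the normalized expression is the relation $C_n = \frac{A_n}{A_{n-1}}$, as illustrated in art. 249. The converse is proven by Favard: if a set of polynomials satisfies a three-term recursion such as (10.8), then the set of polynomials is orthogonal. Favard's theorem is proven in Chihara (1978, p. 22) for monic polynomials, where $A_n = 1$ and $C_n > 0$.

249. Christoffel-Darboux formula. Multiplying both sides of the normalized three-term recursion (10.13) by $\tilde\pi_{n-1}(y)$ gives

$$\tilde\pi_n(x)\,\tilde\pi_{n-1}(y) = (A_n x + B_n)\,\tilde\pi_{n-1}(x)\,\tilde\pi_{n-1}(y) - C_n\,\tilde\pi_{n-2}(x)\,\tilde\pi_{n-1}(y)$$

Similarly, replacing $x$ by $y$ in (10.13) and multiplying both sides by $\tilde\pi_{n-1}(x)$ yields

$$\tilde\pi_n(y)\,\tilde\pi_{n-1}(x) = (A_n y + B_n)\,\tilde\pi_{n-1}(x)\,\tilde\pi_{n-1}(y) - C_n\,\tilde\pi_{n-2}(y)\,\tilde\pi_{n-1}(x)$$

Subtracting the second equation from the first results in

$$\tilde\pi_n(x)\,\tilde\pi_{n-1}(y) - \tilde\pi_{n-1}(x)\,\tilde\pi_n(y) = A_n(x-y)\,\tilde\pi_{n-1}(x)\,\tilde\pi_{n-1}(y) - C_n\left\{\tilde\pi_{n-2}(x)\,\tilde\pi_{n-1}(y) - \tilde\pi_{n-2}(y)\,\tilde\pi_{n-1}(x)\right\}$$

At this stage, we employ the relation $C_n = \frac{A_n}{A_{n-1}}$, which only holds for the normalized three-term recursion (10.13) and not for (10.8). Defining

$$g_n = \frac{\tilde\pi_{n-1}(y)\,\tilde\pi_n(x) - \tilde\pi_{n-1}(x)\,\tilde\pi_n(y)}{A_n}$$

leads to

$$(x-y)\,\tilde\pi_{n-1}(x)\,\tilde\pi_{n-1}(y) = g_n - g_{n-1}$$

Summing both sides over $n$,

$$(x-y)\sum_{n=1}^{m+1}\tilde\pi_{n-1}(x)\,\tilde\pi_{n-1}(y) = \sum_{n=1}^{m+1}g_n - \sum_{n=1}^{m+1}g_{n-1} = g_{m+1} - g_0 = g_{m+1}$$

because $\tilde\pi_{-1} = 0$. Hence, we arrive at the famous Christoffel-Darboux formula,

$$\sum_{k=0}^{m}\tilde\pi_k(x)\,\tilde\pi_k(y) = \frac{1}{A_{m+1}}\,\frac{\tilde\pi_m(y)\,\tilde\pi_{m+1}(x) - \tilde\pi_m(x)\,\tilde\pi_{m+1}(y)}{x-y} \qquad (10.14)$$

which can also be written as

$$\sum_{k=0}^{m}\tilde\pi_k(x)\,\tilde\pi_k(y) = \frac{\tilde\pi_m(y)}{A_{m+1}}\,\frac{\tilde\pi_{m+1}(x) - \tilde\pi_{m+1}(y)}{x-y} - \frac{\tilde\pi_{m+1}(y)}{A_{m+1}}\,\frac{\tilde\pi_m(x) - \tilde\pi_m(y)}{x-y}$$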
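A numerical check of the Christoffel-Darboux formula (10.14) may help. The sketch assumes the orthonormal Legendre polynomials $\tilde\pi_k(x) = \sqrt{\frac{2k+1}{2}}P_k(x)$ on $[-1,1]$. For these, the standard recurrence $kP_k(x) = (2k-1)xP_{k-1}(x) - (k-1)P_{k-2}(x)$ translates into $A_k = \frac{\sqrt{4k^2-1}}{k}$ and $B_k = 0$ in (10.13). These values are assumptions of the example, not part of the text.

```python
import numpy as np
from numpy.polynomial.legendre import legval


def ortho(k, x):
    """Orthonormal Legendre polynomial sqrt((2k+1)/2) P_k(x) on [-1, 1]."""
    return np.sqrt((2 * k + 1) / 2) * legval(x, [0] * k + [1])


def A(k):
    """Assumed A_k of (10.13) for orthonormal Legendre polynomials (B_k = 0)."""
    return np.sqrt(4 * k * k - 1) / k


def cd_sum(m, x, y):
    """Left-hand side of (10.14)."""
    return sum(ortho(k, x) * ortho(k, y) for k in range(m + 1))


def cd_closed_form(m, x, y):
    """Right-hand side of (10.14)."""
    numerator = ortho(m, y) * ortho(m + 1, x) - ortho(m, x) * ortho(m + 1, y)
    return numerator / (A(m + 1) * (x - y))


x, y = 0.3, -0.7
for m in range(6):
    print(m, cd_sum(m, x, y), cd_closed_form(m, x, y))
```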
The special case, where $x = y$, follows, after invoking the definition of the derivative, as

$$\sum_{k=0}^{m}\tilde\pi_k^2(x) = \frac{\tilde\pi_m(x)\,\tilde\pi_{m+1}'(x) - \tilde\pi_m'(x)\,\tilde\pi_{m+1}(x)}{A_{m+1}} = \frac{\tilde\pi_m^2(x)}{A_{m+1}}\,\frac{d}{dx}\left(\frac{\tilde\pi_{m+1}(x)}{\tilde\pi_m(x)}\right) \qquad (10.15)$$

250. Associated orthogonal polynomials. Similar to the derivation of the Christoffel-Darboux formula in art. 249, we consider the difference at two arguments of the three-term recursion (10.8) for $n > 1$,

$$x\pi_{n-1}(x) - y\pi_{n-1}(y) = b_{n;n}\left[\pi_n(x)-\pi_n(y)\right] + b_{n-1;n}\left[\pi_{n-1}(x)-\pi_{n-1}(y)\right] + b_{n-2;n}\left[\pi_{n-2}(x)-\pi_{n-2}(y)\right]$$

We rewrite the left-hand side as $x\pi_{n-1}(x) - y\pi_{n-1}(y) = x\left[\pi_{n-1}(x)-\pi_{n-1}(y)\right] + (x-y)\,\pi_{n-1}(y)$ and obtain

$$(x-y)\,\pi_{n-1}(y) = b_{n;n}\left[\pi_n(x)-\pi_n(y)\right] + (b_{n-1;n}-x)\left[\pi_{n-1}(x)-\pi_{n-1}(y)\right] + b_{n-2;n}\left[\pi_{n-2}(x)-\pi_{n-2}(y)\right]$$

After multiplying both sides by $\frac{dW(y)}{x-y}$ and integrating over $[a,b]$, we have

$$\int_a^b\pi_{n-1}(y)\,dW(y) = b_{n;n}\int_a^b\frac{\pi_n(x)-\pi_n(y)}{x-y}\,dW(y) + (b_{n-1;n}-x)\int_a^b\frac{\pi_{n-1}(x)-\pi_{n-1}(y)}{x-y}\,dW(y) + b_{n-2;n}\int_a^b\frac{\pi_{n-2}(x)-\pi_{n-2}(y)}{x-y}\,dW(y)$$

Since $\int_a^b\pi_{n-1}(y)\,dW(y) = (\pi_{n-1},1) = 0$ for $n > 1$ by orthogonality (art. 244), we arrive, with the definition

$$\sigma_n(x) = \int_a^b\frac{\pi_n(x)-\pi_n(y)}{x-y}\,dW(y) \qquad (10.16)$$

at the same recursion as (10.8) for $n > 1$,

$$x\sigma_{n-1}(x) = b_{n;n}\,\sigma_n(x) + b_{n-1;n}\,\sigma_{n-1}(x) + b_{n-2;n}\,\sigma_{n-2}(x)$$

If $n = 0$, in which case $\pi_0(x)$ is a constant, then (10.16) shows that $\sigma_0(x) = 0$. For $n = 1$, where $\pi_1(x) = c_{1;1}x + c_{0;1}$, the integral (10.16) with (10.4) gives $\sigma_1(x) = c_{1;1}m_0$. By introducing (10.3) in (10.16), we have

$$\sigma_n(x) = \sum_{k=0}^{n}c_{k;n}\int_a^b\frac{x^k-y^k}{x-y}\,dW(y) = \sum_{k=0}^{n}c_{k;n}\sum_{j=0}^{k-1}x^j\int_a^b y^{k-1-j}\,dW(y)$$

Using (10.4) yields

$$\sigma_n(x) = \sum_{k=0}^{n}c_{k;n}\sum_{j=0}^{k-1}m_{k-1-j}\,x^j$$

After reversal of the summation, we find that

$$\sigma_n(x) = \sum_{j=0}^{n-1}\left(\sum_{k=j+1}^{n}c_{k;n}\,m_{k-1-j}\right)x^j$$

is a polynomial of degree $n-1$. Since the polynomials $\sigma_n(x)$ satisfy a three-term recursion, Favard's theorem (art. 248) states that these polynomials are also orthogonal. The polynomials $\sigma_n(x)$, defined by the integral (10.16), are called orthogonal polynomials of the second kind or associated orthogonal polynomials. The analysis shows that, by choosing other initial conditions, another set of orthogonal polynomials can be obtained from the three-term recursion (10.8).

251. The integral (10.16) of $\sigma_n(x)$ cannot be split into two integrals when $x\in[a,b]$, due to the pole at $x$. However, when $x\in\mathbb{C}\setminus[a,b]$, we can write

$$\sigma_n(x) = \pi_n(x)\int_a^b\frac{dW(y)}{x-y} - \int_a^b\frac{\pi_n(y)\,dW(y)}{x-y}$$

For $|x| > \max(|a|,|b|)$, we expand $\frac{1}{x-y} = \sum_{k=0}^{\infty}\frac{y^k}{x^{k+1}}$ and interchange the integration and summation (which is valid when assuming absolute convergence). The first integral,

$$F(x) = \int_a^b\frac{dW(y)}{x-y} \qquad (10.17)$$

becomes

$$F(x) = \sum_{k=0}^{\infty}\frac{1}{x^{k+1}}\int_a^b y^k\,dW(y) = \sum_{k=0}^{\infty}\frac{m_k}{x^{k+1}}$$

while the second integral,

$$\rho_n(x) = \int_a^b\frac{\pi_n(y)\,dW(y)}{x-y} \qquad (10.18)$$

reads

$$\rho_n(x) = \sum_{k=0}^{\infty}\frac{1}{x^{k+1}}\int_a^b y^k\pi_n(y)\,dW(y) = \sum_{k=n}^{\infty}\frac{(y^k,\pi_n)}{x^{k+1}}$$

because, by orthogonality (art. 244), $(y^k,\pi_n) = 0$ for $k < n$. Hence, for large $x$, we rewrite $\sigma_n(x) = \pi_n(x)F(x) - \rho_n(x)$ as

$$\frac{\sigma_n(x)}{\pi_n(x)} = F(x) - \frac{\rho_n(x)}{\pi_n(x)} = F(x) + O\left(x^{-2n-1}\right) \qquad (10.19)$$

whose consequences are further explored in art. 257. Convergence considerations when $n\to\infty$ are discussed in Gautschi (2004).

252. Computing the weight function $w(x)$. The two functions $F(z)$ and $\rho_n(z)$ are analytic in $\mathbb{C}\setminus[a,b]$, and both integral representations resemble the Cauchy integral $f(z) = \frac{1}{2\pi i}\oint_{C(z)}\frac{f(\omega)}{\omega-z}\,d\omega$, where $C(z)$ is a contour that encloses the point $z\in\mathbb{C}$. A general theorem (Markushevich, 1985, p. 312) states that the integral of the Cauchy type,

$$f(z) = \frac{1}{2\pi i}\int_L\frac{\varphi(\omega)}{\omega-z}\,d\omega$$

satisfies

$$\lim_{z\to z_0;\,z\in I(z_0)}f(z) - \lim_{\omega\to z_0;\,\omega\in E(z_0)}f(\omega) = \varphi(z_0)$$

where $L$ is a (not necessarily closed) path in the complex plane on which $|\varphi(z)-\varphi(z_0)| \le c\,|z-z_0|^{\alpha}$ for any points $z,z_0\in L$, and where $c,\alpha > 0$ are constants. The interior $I(z_0)$ is a region enclosed³ by a closed contour $C(z_0)$ around $z_0\in L$ in which $\varphi(\omega)$ is analytic. The exterior $E(z_0)$ is the region that is not enclosed by a contour $C'(z_0)$. The contours $C(z_0)$ and $C'(z_0)$ are here formed by a circle around $z_0$ with a radius such that it intersects the path $L$ in two points $z_1$ and $z_2$, and such that $\varphi(\omega)$ is analytic in the enclosed region. The first contour $C(z_0)$ follows the path $L$ in the positive direction from $z_1$ to $z_2$ and returns to $z_1$ along the circle in the positive direction, whereas the contour $C'(z_0)$ similarly follows the path $L$ in the positive direction from $z_1$ to $z_2$, but returns to $z_1$ along the circle around $z_0$ in the negative direction. Hence, any point $z$ lying inside the contour $C(z_0)$ is enclosed in the positive direction, whereas any point $\omega$ lying inside the contour $C'(z_0)$ is enclosed in the negative direction. This direction does not contribute to the Cauchy integral, since it is equivalent to not being enclosed (see, e.g., Titchmarsh (1964)).

We apply this theorem to the integral (10.18). The path $L$ is the segment $[a,b]$ on the real axis and $z_0 = x_0\in[a,b]$. The contour $C(z_0)$ around $z_0 = x_0$ is the path from $x_0 - r > a$ to $x_0 + r < b$ along the real axis, closed by the circle segment lying above the real axis (with positive imaginary part). The contour $C'(z_0)$ follows the same segment from $x_0 - r > a$ to $x_0 + r < b$, but returns along the semicircle below the real axis. Hence, with $y > 0$,

$$\lim_{z\to z_0;\,z\in I(z_0)}\rho_n(z) = \lim_{y\to 0}\rho_n(x_0+iy), \qquad \lim_{\omega\to z_0;\,\omega\in E(z_0)}\rho_n(\omega) = \lim_{y\to 0}\rho_n(x_0-iy)$$

Writing $\rho_n(z) = \frac{1}{2\pi i}\int_a^b\frac{-2\pi i\,\pi_n(u)\,w(u)}{u-z}\,du$, the theorem with $\varphi(u) = -2\pi i\,\pi_n(u)\,w(u)$ gives $\lim_{y\to 0}\left[\rho_n(x_0+iy) - \rho_n(x_0-iy)\right] = -2\pi i\,\pi_n(x_0)\,w(x_0)$. Since $\rho_n^*(z) = \rho_n(z^*)$ by the reflection principle (Titchmarsh, 1964, p. 155), because $\rho_n(z)$ is real on the real axis outside $[a,b]$, we have that

$$\rho_n(x_0-iy) = \mathrm{Re}\,\rho_n(x_0-iy) + i\,\mathrm{Im}\,\rho_n(x_0-iy) = \mathrm{Re}\,\rho_n(x_0+iy) - i\,\mathrm{Im}\,\rho_n(x_0+iy)$$

and, finally,

$$-\frac{1}{\pi}\lim_{y\to 0}\mathrm{Im}\,\rho_n(x_0+iy) = \pi_n(x_0)\,w(x_0)$$

Similarly, from the integral (10.17), the density function is found at $x_0\in[a,b]$ by

$$w(x_0) = -\frac{1}{\pi}\lim_{y\to 0}\mathrm{Im}\,F(x_0+iy)$$

³ An observer traveling in the direction of the contour around $z_0$ finds the interior on his left and the exterior on his right.
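For the Legendre weight $w(u) = 1$ on $[-1,1]$ (again an assumption chosen for illustration), the integral (10.17) has the closed form $F(z) = \log\frac{z+1}{z-1}$ for $z$ off the segment. The principal branch is valid there, because $\frac{z+1}{z-1}$ maps $\mathbb{C}\setminus[-1,1]$ onto $\mathbb{C}\setminus(-\infty,0]$. The sketch below evaluates $-\frac{1}{\pi}\mathrm{Im}\,F(x_0+iy)$ for a small $y > 0$. It should recover $w(x_0) = 1$ inside $(-1,1)$ and zero outside $[-1,1]$.

```python
import cmath
import math


def stieltjes_legendre(z):
    """F(z) = integral over [-1, 1] of du / (z - u) for dW(u) = du, z off [-1, 1]."""
    return cmath.log((z + 1) / (z - 1))


def recovered_density(x0, eps):
    """Approximates w(x0) = -(1/pi) lim_{y -> 0} Im F(x0 + i y), using y = eps."""
    return -stieltjes_legendre(complex(x0, eps)).imag / math.pi


for x0 in (-0.9, -0.2, 0.0, 0.5, 0.95):
    print(x0, recovered_density(x0, 1e-8))   # close to w(x0) = 1 inside (-1, 1)
print(recovered_density(1.5, 1e-8))          # close to 0 outside [-1, 1]
```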
10.4 Zeros of orthogonal polynomials

As will be illustrated, a lot of information about the zeros of orthogonal polynomials can be deduced. We remark that $\pi_n(x) = \tilde\pi_n(x)\sqrt{(\pi_n,\pi_n)}$ shows that both the orthogonal polynomial and its normalized version possess the same zeros.

253. Zeros of orthogonal polynomials.

Theorem 66 All zeros of the orthogonal polynomial $\pi_l(u)$ are real, simple and lie inside the interval $[a,b]$.

Proof: Art. 246 has shown that $(x^k,\pi_l) = 0$ if $k < l$. The particular case $k = 0$ and $l \ge 1$, written with (10.1) as

$$\int_a^b\pi_l(u)\,dW(u) = 0$$

indicates that there must exist at least one point within the interval $(a,b)$ at which $\pi_l(u)$ changes sign, because $W(u)$ is a distribution function with positive density. The change in sign implies that such a point is a zero with odd multiplicity. Let $z_1, z_2, \ldots, z_k$ be all such points and consider the polynomial $q_k(x) = \prod_{j=1}^{k}(x-z_j)$. Art. 244 shows that $(\pi_l,q_l)\neq 0$ but that $(\pi_l,q_k) = 0$ if $l > k$,

$$\int_a^b\pi_l(u)\prod_{j=1}^{k}(u-z_j)\,dW(u) = 0$$

By construction, $\pi_l(u)\prod_{j=1}^{k}(u-z_j)$ does not change sign for any $u\in[a,b]$ and, hence, the integral cannot vanish. Orthogonality shows that the non-vanishing of $(\pi_l,q_k)$ is only possible provided $k \ge l$. Since $\pi_l$ has at most $l$ zeros, $k = l$. The fundamental theorem of algebra (art. 196), together with the odd multiplicity of each zero $z_j$, then implies that all zeros are simple. □

Szegő (1978, p. 45) presents other proofs. For example, the simplicity of the zeros can be deduced by applying the Sturm sequence (art. 224) to the three-term recursion (10.13) (assuming $c_{n;n} > 0$). Theorem 66 shows that any orthogonal polynomial only possesses real and simple zeros. An arbitrary polynomial with real coefficients possesses zeros that, with high probability, do not all lie on a line segment in the complex plane, which illustrates the peculiar nature of orthogonal polynomials. Let $a \le z_{n;n} < z_{n-1;n} < \cdots < z_{1;n} \le b$ denote the zeros of the orthogonal polynomial $\pi_n(x)$. Combining Theorem 66 and (9.1) yields

$$\pi_n(x) = c_{n;n}\prod_{j=1}^{n}(x-z_{j;n}) \qquad (10.20)$$

254. Interlacing property of zeros of orthogonal polynomials. The main observations are derived from the Christoffel-Darboux formula (10.15), which implies, assuming $A_{n+1} > 0$,

$$\frac{\tilde\pi_n(x)\,\tilde\pi_{n+1}'(x) - \tilde\pi_n'(x)\,\tilde\pi_{n+1}(x)}{A_{n+1}} \ge \frac{1}{m_0} > 0 \qquad (10.21)$$

because $\tilde\pi_0(x) = \frac{1}{\sqrt{m_0}}$. The simplicity of the zeros (Theorem 66) implies that the derivative $\tilde\pi_n'(x)$ cannot have the same zero as $\tilde\pi_n(x)$. Hence, the above inequality indicates that $\tilde\pi_n(x)$ and $\tilde\pi_{n+1}(x)$ cannot have a zero in common.

Theorem 67 (Interlacing) Let $a < z_{n;n} < z_{n-1;n} < \cdots < z_{1;n} < b$ be the zeros of the orthogonal polynomial $\pi_n(x)$. The zeros of $\pi_n(x)$ and $\pi_{n+1}(x)$ are interlaced,

$$a < z_{n+1;n+1} < z_{n;n} < z_{n;n+1} < z_{n-1;n} < \cdots < z_{1;n} < z_{1;n+1} < b$$

In other words, between each pair of consecutive zeros of $\pi_n(x)$ there lies a zero of $\pi_{n+1}(x)$, thus $z_{k;n} < z_{k;n+1} < z_{k-1;n}$ for all $2 \le k \le n$. The smallest and largest zeros obey $a < z_{n+1;n+1} < z_{n;n}$ and $z_{1;n} < z_{1;n+1} < b$.

Proof: Theorem 66 shows that the zeros are simple and real, such that

$$\pi_n'(z_{k;n})\,\pi_n'(z_{k-1;n}) < 0$$

On the other hand, the inequality (10.21) implies that

$$-\pi_n'(z_{k;n})\,\pi_{n+1}(z_{k;n}) > 0 \qquad\text{and}\qquad -\pi_n'(z_{k-1;n})\,\pi_{n+1}(z_{k-1;n}) > 0$$

Multiplying both and taking $\pi_n'(z_{k;n})\,\pi_n'(z_{k-1;n}) < 0$ into account yields

$$\pi_{n+1}(z_{k;n})\,\pi_{n+1}(z_{k-1;n}) < 0$$

which means that there is at least one zero $z_{j;n+1}$ with $z_{k;n} < z_{j;n+1} < z_{k-1;n}$. Since the inequalities hold for all $2 \le k \le n$, the argument accounts for at least $n-1$ zeros of $\pi_{n+1}(x)$. With the convention that $c_{n;n} > 0$ for all $n \ge 0$, we know that $\pi_n(x)$ is increasing at least from the largest zero on, i.e., $\pi_n'(z_{1;n}) > 0$. The inequality (10.21) then indicates that $\pi_{n+1}(z_{1;n}) < 0$. By the same convention, $\pi_{n+1}(b) > 0$, so there must be a zero of $\pi_{n+1}(x)$ in the interval $[z_{1;n},b]$; in fact it is the largest zero $z_{1;n+1}$. A similar argument applies for the smallest zero $z_{n+1;n+1}$, thereby proving the theorem. □

If $z_{n;n} = a$, the interlacing Theorem 67 implies that the set $\{\tilde\pi_k(x)\}_{0\le k\le n}$ is finite and that $\tilde\pi_n(x)$ is the highest-order polynomial of that finite orthogonal set with a zero equal to $z_{n;n} = a$. All other smallest zeros are larger, i.e., $z_{k;k} > a$ for $1 \le k \le n-1$.

Another noteworthy consequence of the interlacing Theorem 67 is the partial fraction decomposition

$$\frac{\pi_n(x)}{\pi_{n+1}(x)} = \sum_{k=1}^{n+1}\frac{\alpha_{k;n+1}}{x-z_{k;n+1}}$$

where the coefficients, in general, obey

$$\alpha_{k;n+1} = \lim_{x\to z_{k;n+1}}\frac{\pi_n(x)\,(x-z_{k;n+1})}{\pi_{n+1}(x)} = \frac{\pi_n(z_{k;n+1})}{\pi_{n+1}'(z_{k;n+1})}$$

Inequality (10.21) shows that all $\alpha_{k;n+1} > 0$.

We include here a sharpening of the interlacing property whose proof relies on the Gaussian quadrature Theorem 69 derived in Section 10.5.

Theorem 68 Between two zeros of $\pi_n(x)$, there is at least one zero of $\pi_l(x)$ with $l > n$.

Proof: Assume the contrary, namely that $\pi_l(x)$ has no zero between $z_{k;n}$ and $z_{k-1;n}$ for some $k\in[2,n]$. Then, the polynomial $p_m(x) = \pi_n(x)\,q_{n-2}(x)$ of degree $m = 2n-2$, where

$$q_{n-2}(x) = \frac{\pi_n(x)}{(x-z_{k;n})(x-z_{k-1;n})}$$

is non-negative everywhere in $[a,b]$, except in the interval $(z_{k;n},z_{k-1;n})$, where $p_m(x)$ is negative. The Gaussian quadrature Theorem 69 shows, for $m = 2n-2 < 2l$, that

$$\int_a^b p_m(x)\,dW(x) = \sum_{j=1}^{l}p_m(z_{j;l})\,\lambda_{j;l} > 0$$

for two reasons. First, the Christoffel numbers $\lambda_{j;l}$ are positive (art. 256). Second, $p_m(z_{j;l})$ cannot vanish at every zero $z_{j;l}$ of $\pi_l(x)$, and $p_m(z_{j;l}) \ge 0$, since, by hypothesis, $z_{j;l}\notin(z_{k;n},z_{k-1;n})$. But this contradicts the basic orthogonality property,

$$\int_a^b p_m(x)\,dW(x) = (\pi_n,q_{n-2}) = 0$$

established in art. 244. □

Szegő (1978, p. 112) mentions the following distance result between consecutive zeros. If the density or weight function $\frac{dW(x)}{dx} = w(x) \ge w_{\min} > 0$ and the zeros are written as $z_{k;n} = \frac{a+b}{2} + \frac{b-a}{2}\cos\theta_{k;n}$, where $0 < \theta_{k;n} < \pi$ for $1 \le k \le n$, then it holds that

$$\theta_{k+1;n} - \theta_{k;n} < K\,\frac{\log n}{n}$$

where the constant $K$ depends on $w_{\min}$, $a$ and $b$. If stronger constraints are imposed on the weight function $w$, the $\log n$ factor in the numerator can be removed. More precise results on the location of zeros only seem possible in specific cases and/or when the differential equation of the set of orthogonal polynomials is known.
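Theorems 66, 67 and 68 can be illustrated on the Legendre polynomials. NumPy returns their zeros as the Gauss-Legendre nodes via `numpy.polynomial.legendre.leggauss`. The sketch makes three checks:

- the zeros of $P_n$ are distinct and inside $(-1,1)$;
- the zeros of $P_n$ and $P_{n+1}$ strictly alternate;
- each gap between consecutive zeros of $P_n$ contains a zero of $P_l$ for several $l > n$.

The helper `interlaced` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def legendre_zeros(n):
    """Zeros of P_n in increasing order (the Gauss-Legendre nodes)."""
    return np.sort(leggauss(n)[0])


def interlaced(inner_zeros, outer_zeros):
    """True if every open gap between consecutive inner zeros holds an outer zero."""
    return all(np.any((outer_zeros > lo) & (outer_zeros < hi))
               for lo, hi in zip(inner_zeros[:-1], inner_zeros[1:]))


n = 6
z_n, z_next = legendre_zeros(n), legendre_zeros(n + 1)
# Theorem 66: real, simple (distinct) zeros inside (a, b) = (-1, 1)
print(np.all(np.diff(z_n) > 0), np.all(np.abs(z_n) < 1))
# Theorem 67: strict alternation z_{n+1;n+1} < z_{n;n} < z_{n;n+1} < ... < z_{1;n+1}
merged = np.empty(2 * n + 1)
merged[0::2], merged[1::2] = z_next, z_n
print(np.all(np.diff(merged) > 0))
# Theorem 68: every gap of P_n also contains a zero of P_l for l > n
print(all(interlaced(z_n, legendre_zeros(l)) for l in range(n + 1, n + 8)))
```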
10.5 Gaussian quadrature

Lanczos (1988, pp. 396-414) nicely explains Gauss's genial idea to compute the integral $\int_{-1}^{1}f(u)\,du$ with “double order accuracy” compared to other numerical integration methods. Gauss's renowned quadrature method rests on orthogonality and the properties of orthogonal polynomials. Before giving an example, we first focus on the theory.

255. We consider the Lagrange polynomial $q_{n-1}$ of degree $n-1$ (art. 206) that coincides at $n$ points, defined by their finite coordinates $(x_j,y_j)$ for $1 \le j \le n$, with the arbitrary polynomial $p_m(x)$ of degree $m > n$,

$$q_{n-1}(x) = \sum_{j=1}^{n}y_j\,l_{n-1}(x;x_j) = \sum_{j=1}^{n}y_j\,\frac{F_n(x)}{(x-x_j)\,F_n'(x_j)}$$

where $F_n(x) = \prod_{j=1}^{n}(x-x_j)$ and $y_j = p_m(x_j)$. We further assume that the abscissae coincide with the (distinct) zeros of the orthogonal polynomial $\pi_n(x)$, thus $x_j = z_{j;n}$. Then, from (10.20), it follows that $\frac{F_n(x)}{F_n'(z_{j;n})} = \frac{\pi_n(x)}{\pi_n'(z_{j;n})}$ for all $1 \le j \le n$, and we obtain

$$q_{n-1}(x) = \sum_{j=1}^{n}p_m(z_{j;n})\,\frac{\pi_n(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})}$$

Moreover, the difference polynomial $r_m(x) = p_m(x) - q_{n-1}(x)$ has degree $m$ and vanishes at the $n$ points $x_j = z_{j;n}$, taken as the zeros of $\pi_n(x)$. Thus,

$$r_m(x) = t_{m-n}(x)\,\pi_n(x)$$

where $t_{m-n}(x)$ is some polynomial of degree $m-n$. Taking the scalar product $(r_m,1)$ (or multiplying both sides by $dW(x)$ and integrating over $[a,b]$) shows that $(r_m,1) = (t_{m-n},\pi_n)$, which, by art. 244, vanishes provided $n > m-n$, or $2n > m$. In the case that $m$ is at most $2n-1$ and $(r_m,1) = (p_m - q_{n-1},1) = 0$, we find that

$$\int_a^b p_m(x)\,dW(x) = \int_a^b q_{n-1}(x)\,dW(x) = \sum_{j=1}^{n}p_m(z_{j;n})\int_a^b\frac{\pi_n(x)\,dW(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})}$$

In summary, we have demonstrated Gauss's famous quadrature formula.

Theorem 69 (Gauss's quadrature formula) Let $a < z_{n;n} < z_{n-1;n} < \cdots < z_{1;n} < b$ be the zeros of the orthogonal polynomial $\pi_n(x)$. For any polynomial $p_m(x)$ of degree $m$ at most $2n-1$,

$$\int_a^b p_m(x)\,dW(x) = \sum_{j=1}^{n}p_m(z_{j;n})\,\lambda_{j;n} \qquad (10.22)$$

where the Christoffel numbers are

$$\lambda_{j;n} = \int_a^b\frac{\pi_n(x)\,dW(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})} \qquad (10.23)$$

and $\pi_n(x)$ is orthogonal on the interval $[a,b]$ with respect to the distribution function $W(x)$.

256. The Christoffel numbers possess interesting properties. First, let $p_m(x) = \left\{\frac{\pi_n(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})}\right\}^2$, such that $p_m(z_{k;n}) = \delta_{kj}$. Then Gauss's quadrature formula (10.22) reduces to

$$\lambda_{j;n} = \int_a^b\left\{\frac{\pi_n(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})}\right\}^2 dW(x) \qquad (10.24)$$

demonstrating that all Christoffel numbers $\lambda_{j;n}$ are positive.

The integral (10.18), corresponding to the associated orthogonal polynomials $\sigma_n(x)$ and valid for $x\in\mathbb{C}\setminus[a,b]$, is actually finite at the zeros of $\pi_n(x)$. Comparison with (10.23) shows that

$$\rho_n(z_{j;n}) = -\lambda_{j;n}\,\pi_n'(z_{j;n})$$

Next, consider the Christoffel-Darboux formula (10.14) with $m = n$ and $y = z_{j;n}$, where the term $k = n$ vanishes because $\tilde\pi_n(z_{j;n}) = 0$. It reads

$$\sum_{k=0}^{n-1}\tilde\pi_k(x)\,\tilde\pi_k(z_{j;n}) = -\frac{1}{A_{n+1}}\,\frac{\tilde\pi_n(x)\,\tilde\pi_{n+1}(z_{j;n})}{x-z_{j;n}}$$

Taking the scalar product $(\cdot,1)$ of both sides yields

$$\sum_{k=0}^{n-1}(\tilde\pi_k,1)\,\tilde\pi_k(z_{j;n}) = -\frac{\tilde\pi_{n+1}(z_{j;n})}{A_{n+1}}\int_a^b\frac{\tilde\pi_n(x)\,dW(x)}{x-z_{j;n}}$$

Art. 244 shows that $(\tilde\pi_k,1) = 0$ except when $k = 0$. In that case, $\tilde\pi_0(x) = \frac{1}{\sqrt{m_0}}$ and $(\tilde\pi_0,1) = \int_a^b\tilde\pi_0(x)\,dW(x) = \sqrt{m_0}$, such that

$$\sum_{k=0}^{n-1}(\tilde\pi_k,1)\,\tilde\pi_k(z_{j;n}) = 1$$

The definition (10.23) of the Christoffel numbers shows that

$$\lambda_{j;n} = \int_a^b\frac{\pi_n(x)\,dW(x)}{(x-z_{j;n})\,\pi_n'(z_{j;n})} = \int_a^b\frac{\tilde\pi_n(x)\,dW(x)}{(x-z_{j;n})\,\tilde\pi_n'(z_{j;n})}$$

such that

$$1 = -\frac{\tilde\pi_{n+1}(z_{j;n})}{A_{n+1}}\int_a^b\frac{\tilde\pi_n(x)\,dW(x)}{x-z_{j;n}} = -\lambda_{j;n}\,\frac{\tilde\pi_{n+1}(z_{j;n})\,\tilde\pi_n'(z_{j;n})}{A_{n+1}}$$

Thus, the Christoffel numbers obey

$$\lambda_{j;n} = -\frac{A_{n+1}}{\tilde\pi_{n+1}(z_{j;n})\,\tilde\pi_n'(z_{j;n})} = \frac{A_n}{\tilde\pi_{n-1}(z_{j;n})\,\tilde\pi_n'(z_{j;n})} \qquad (10.25)$$

where the latter follows from (10.13). Finally, the Christoffel-Darboux formula (10.15) evaluated at $x = z_{j;n}$, combined with (10.25), gives

$$\lambda_{j;n} = \frac{1}{\sum_{k=0}^{n-1}\tilde\pi_k^2(z_{j;n})} \qquad (10.26)$$

257. Partial fraction decomposition of $\frac{\sigma_n(x)}{\pi_n(x)}$. The associated orthogonal polynomials (art. 250) are of degree $n-1$, such that the fraction $\frac{\sigma_n(x)}{\pi_n(x)}$ can be expanded as

$$\frac{\sigma_n(x)}{\pi_n(x)} = \sum_{k=1}^{n}\frac{a_{k;n}}{x-z_{k;n}}$$

where we need to determine the coefficients $a_{k;n}$. Equation (10.19) shows that, for large $x$,

$$\frac{\sigma_n(x)}{\pi_n(x)} - F(x) = O\left(x^{-2n-1}\right)$$

With the integral (10.17) of $F(x)$, we have

$$\sum_{k=1}^{n}\frac{a_{k;n}}{x-z_{k;n}} - \int_a^b\frac{dW(y)}{x-y} = O\left(x^{-2n-1}\right)$$

After expanding the left-hand side in a power series in $x^{-1}$ and equating the coefficients of the corresponding powers $x^{-(m+1)}$, we obtain, for $0 \le m \le 2n-1$,

$$\sum_{k=1}^{n}a_{k;n}\,(z_{k;n})^m - \int_a^b y^m\,dW(y) = 0$$

Gauss's quadrature formula (10.22) applied to $p_m(y) = y^m$ for $0 \le m \le 2n-1$ gives

$$\int_a^b y^m\,dW(y) = \sum_{k=1}^{n}\lambda_{k;n}\,z_{k;n}^m$$

whence $a_{k;n} = \lambda_{k;n}$. In summary, the partial fraction decomposition becomes

$$\frac{\sigma_n(x)}{\pi_n(x)} = \sum_{k=1}^{n}\frac{\lambda_{k;n}}{x-z_{k;n}}$$

from which the Christoffel numbers follow as

$$\lambda_{j;n} = \lim_{x\to z_{j;n}}\frac{\sigma_n(x)\,(x-z_{j;n})}{\pi_n(x)} = \frac{\sigma_n(z_{j;n})}{\pi_n'(z_{j;n})} \qquad (10.27)$$
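The identities of art. 250, art. 256 and art. 257 can be cross-checked against each other. The sketch below assumes the Legendre weight $dW(u) = du$ on $[-1,1]$ and builds $\sigma_n$ from the coefficients $c_{k;n}$ and the moments $m_k$ (art. 250). It then compares three computations of the Christoffel numbers:

- the Gauss-Legendre weights returned by `leggauss`;
- formula (10.27);
- formula (10.26), with the orthonormal polynomials.

It also tests the partial fraction decomposition at a point outside $[-1,1]$. The function `associated` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leg2poly, leggauss


def moment(k):
    """Moment m_k of dW(u) = du on [-1, 1] (assumed Legendre weight)."""
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0


def legendre(k):
    """P_k in the monomial basis, so that coef[j] = c_{j;k}."""
    return Polynomial(leg2poly([0] * k + [1]))


def associated(c):
    """sigma_n(x) = sum_j (sum_{k>j} c_{k;n} m_{k-1-j}) x^j from art. 250 (n >= 1)."""
    n = len(c) - 1
    return Polynomial([sum(c[k] * moment(k - 1 - j) for k in range(j + 1, n + 1))
                       for j in range(n)])


n = 5
pi_n = legendre(n)
sigma_n = associated(pi_n.coef)
z, lam = leggauss(n)                        # zeros z_{j;n} and Gauss-Legendre weights

lam_1027 = sigma_n(z) / pi_n.deriv()(z)     # formula (10.27)
lam_1026 = 1.0 / sum((legendre(k) * np.sqrt((2 * k + 1) / 2))(z) ** 2
                     for k in range(n))     # formula (10.26), orthonormal tilde-pi_k
print(np.allclose(lam, lam_1027), np.allclose(lam, lam_1026))

x = 2.5                                     # a point outside [-1, 1]
print(np.isclose(sigma_n(x) / pi_n(x), np.sum(lam / (x - z))))
```

Since $\sigma_n$ depends linearly on $\pi_n$, the check does not require the monic or orthonormal scaling in (10.27).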
258. Parameterized weight functions. Suppose that the distribution function $W$ is differentiable at any point of $[a,b]$, and that $W$ depends on a parameter $t$. In addition, we assume that the density or weight function $w(x;t) = \frac{dW(x;t)}{dx}$ is positive and that $w(x;t)$ is also continuous and differentiable in $t$. The explicit dependence on the parameter $t$ in Gauss's quadrature formula (10.22) is written as

$$\int_a^b p_m(x)\,w(x;t)\,dx = \sum_{j=1}^{n}p_m(z_{j;n}(t))\,\lambda_{j;n}(t)$$

Differentiation with respect to $t$ yields

$$\int_a^b p_m(x)\,\frac{\partial w(x;t)}{\partial t}\,dx = \sum_{j=1}^{n}p_m'(z_{j;n}(t))\,z_{j;n}'(t)\,\lambda_{j;n}(t) + \sum_{j=1}^{n}p_m(z_{j;n}(t))\,\lambda_{j;n}'(t)$$

For the particular choice $p_m(x) = \frac{\tilde\pi_n^2(x;t)}{x-z_{k;n}(t)}$, we have that $p_m(z_{j;n}(t)) = 0$ and that

$$p_m'(z_{j;n}(t)) = \left.\frac{dp_m(x)}{dx}\right|_{x=z_{j;n}(t)} = \left(\tilde\pi_n'(z_{j;n}(t);t)\right)^2\delta_{jk}$$

such that

$$\int_a^b\frac{\tilde\pi_n^2(x;t)}{x-z_{k;n}(t)}\,\frac{\partial w(x;t)}{\partial t}\,dx = \left(\tilde\pi_n'(z_{k;n}(t);t)\right)^2 z_{k;n}'(t)\,\lambda_{k;n}(t)$$

On the other hand,

$$\left(\tilde\pi_n,\frac{\tilde\pi_n}{x-z_{k;n}(t)}\right) = \int_a^b\frac{\tilde\pi_n^2(x;t)}{x-z_{k;n}(t)}\,w(x;t)\,dx = 0$$

by orthogonality (art. 244). Subtracting $\eta$ times this integral, for any constant $\eta$, from the previous integral yields

$$\left(\tilde\pi_n'(z_{k;n}(t);t)\right)^2 z_{k;n}'(t)\,\lambda_{k;n}(t) = \int_a^b\left\{\frac{\partial w(x;t)}{\partial t} - \eta\,w(x;t)\right\}\frac{\tilde\pi_n^2(x;t)}{x-z_{k;n}(t)}\,dx = \int_a^b\frac{\frac{1}{w(x;t)}\frac{\partial w(x;t)}{\partial t} - \eta}{x-z_{k;n}(t)}\,\tilde\pi_n^2(x;t)\,w(x;t)\,dx$$

Choose the constant equal to $\eta = \left.\frac{1}{w(x;t)}\frac{\partial w(x;t)}{\partial t}\right|_{x=z_{k;n}(t)}$. Then the function

$$\frac{\frac{1}{w(x;t)}\frac{\partial w(x;t)}{\partial t} - \eta}{x-z_{k;n}(t)} \ge 0$$

provided that $\frac{1}{w(x;t)}\frac{\partial w(x;t)}{\partial t}$ is increasing in $x$. In that case, the integral on the right-hand side is positive, because its integrand is non-negative and does not vanish identically on $[a,b]$. Hence $z_{k;n}'(t) > 0$: the zero $z_{k;n}(t)$ of $\tilde\pi_n(x;t)$ is increasing in the parameter $t$.

An interesting application is the choice $w(x;t) = (1-t)\,w_1(x) + t\,w_2(x)$, where $w_1$ and $w_2$ are two weight functions on $[a,b]$, both positive and continuous for $x\in(a,b)$. In addition,

$$\frac{1}{w(x;t)}\frac{\partial w(x;t)}{\partial t} = \frac{w_2(x) - w_1(x)}{(1-t)\,w_1(x) + t\,w_2(x)} = \frac{1}{t}\left(1 - \frac{1}{t\,\frac{w_2(x)}{w_1(x)} + 1 - t}\right)$$

is increasing if $\frac{w_2(x)}{w_1(x)}$ is increasing, for $0 < t < 1$. Then, we have shown above that the zero $z_{k;n}(t)$ of $\tilde\pi_n(x;t)$ is increasing in $t$. Let $\{z_{1;k;n}\}_{1\le k\le n}$ and $\{z_{2;k;n}\}_{1\le k\le n}$ denote the sets of zeros of the orthogonal polynomials corresponding to $w_1$ and $w_2$, respectively. Thus, $z_{2;k;n} = z_{k;n}(1)$ is larger than $z_{1;k;n} = z_{k;n}(0)$ for all $1 \le k \le n$, because $w(x;0) = w_1(x)$ and $w(x;1) = w_2(x)$. In summary, if the ratio $\frac{w_2(x)}{w_1(x)}$ of two weight functions is increasing on $x\in[a,b]$, then the respective zeros obey $z_{2;k;n} > z_{1;k;n}$ for all $1 \le k \le n$.
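The monotonicity of the zeros can be observed numerically. The sketch assumes $w_1(x) = 1$ and $w_2(x) = 1+x$ on $[-1,1]$, so that $w_2/w_1$ is increasing and $w(x;t) = 1+tx$. No explicit recursion is given for $w(x;t)$, so the sketch swaps in a technique not developed in this article: a discretized Stieltjes procedure, a standard method, yields the monic recursion coefficients. The zeros then follow as eigenvalues of the symmetric Jacobi matrix of Section 10.6. All names are illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def monic_recurrence(nodes, weights, n):
    """Discretized Stieltjes procedure for the monic recursion
    pi_{k+1}(x) = (x - alpha_k) pi_k(x) - beta_k pi_{k-1}(x)."""
    alpha, beta = np.zeros(n), np.zeros(n)
    p_prev, p = np.zeros_like(nodes), np.ones_like(nodes)
    norm_prev = 1.0
    for k in range(n):
        norm = np.sum(weights * p * p)                  # (pi_k, pi_k)
        alpha[k] = np.sum(weights * nodes * p * p) / norm
        beta[k] = norm / norm_prev                      # beta_0 is not used below
        p_prev, p = p, (nodes - alpha[k]) * p - (beta[k] if k > 0 else 0.0) * p_prev
        norm_prev = norm
    return alpha, beta


def orthogonal_zeros(weight, n, quad_points=100):
    """Zeros of the degree-n orthogonal polynomial for weight(x) dx on [-1, 1]."""
    x, lam = leggauss(quad_points)    # exact for the low-degree polynomial integrands here
    alpha, beta = monic_recurrence(x, lam * weight(x), n)
    off = np.sqrt(beta[1:])
    return np.linalg.eigvalsh(np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1))


n = 5
for t in np.linspace(0.0, 1.0, 5):
    # w(x; t) = (1 - t) w1(x) + t w2(x) with w1 = 1 and w2 = 1 + x
    print(round(float(t), 2), orthogonal_zeros(lambda x, t=t: 1.0 + t * x, n))
```

Each column of the printed zeros should increase with $t$, as art. 258 predicts.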
259. Numerical integration. Let us consider the integral

$$\int_a^b f(x)\,dW(x)$$

which exists for the function $f(x)$. The Gaussian quadrature formula (10.22) suggests

$$\int_a^b f(x)\,dW(x) = \sum_{j=1}^{n}f(z_{j;n})\,\lambda_{j;n} + R_n$$

where the $n$-term sum approximates the integral and where $R_n$ represents the error. The Christoffel numbers $\{\lambda_{j;n}\}_{1\le j\le n}$ and the zeros $\{z_{j;n}\}_{1\le j\le n}$ of the orthogonal polynomial $\pi_n(x)$ are independent of the function $f(x)$. Theorem 69 states that the above approximation is exact for any polynomial $f(x)$ of degree at most $2n-1$. In fact, it can be shown (see Gautschi (2004)) that the Gaussian quadrature formula is the only interpolating quadrature rule with the largest possible precision of $2n-1$. Since $dW(u) = du$ for the Legendre polynomials $P_n(x)$, where $a = -1$ and $b = 1$, the most straightforward numerical computation of the integral $\int_a^b f(u)\,du$ uses Legendre's orthogonal polynomials. After the substitution $u = \frac{b+a}{2} + \frac{b-a}{2}x$, we have

$$\int_a^b f(u)\,du = \frac{b-a}{2}\int_{-1}^{1}f\left(\frac{b+a}{2} + \frac{b-a}{2}x\right)dx$$

We refer to Lanczos (1988, pp. 400-404) for a numerical example that illustrates the power of the Gaussian quadrature formula.
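The change of variables above leads to a short $n$-point Gauss-Legendre rule. The sketch relies on `numpy.polynomial.legendre.leggauss`, which returns the zeros of $P_n$ and the corresponding Christoffel numbers for $dW(u) = du$ on $[-1,1]$. The wrapper `gauss_legendre` is hypothetical. The sketch illustrates that monomials up to degree $2n-1$ are integrated exactly, while degree $2n$ is not.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def gauss_legendre(f, a, b, n):
    """n-point Gaussian quadrature of int_a^b f(u) du via u = (b+a)/2 + (b-a)/2 x."""
    x, lam = leggauss(n)                # zeros z_{j;n} of P_n and Christoffel numbers
    u = 0.5 * (b + a) + 0.5 * (b - a) * x
    return 0.5 * (b - a) * np.sum(lam * f(u))


def exact_monomial(k, a, b):
    """int_a^b u^k du."""
    return (b ** (k + 1) - a ** (k + 1)) / (k + 1)


n, a, b = 4, 0.0, 2.0
for k in range(2 * n + 1):
    approx = gauss_legendre(lambda u: u ** k, a, b, n)
    print(k, np.isclose(approx, exact_monomial(k, a, b)))   # exact up to degree 2n - 1
print(gauss_legendre(np.exp, a, b, n), np.exp(b) - np.exp(a))
```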
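10.6 The Jacobi matrix

260. The Jacobi matrix. The three-term recursion (10.8) is written in matrix form by defining the vector $\boldsymbol{\pi}(x) = \left[\pi_0(x)\ \pi_1(x)\ \cdots\ \pi_{n-1}(x)\right]^T$ as

$$x\begin{pmatrix}\pi_0(x)\\ \pi_1(x)\\ \vdots\\ \pi_{n-2}(x)\\ \pi_{n-1}(x)\end{pmatrix} = \begin{pmatrix} b_{0;1} & b_{1;1} & & & \\ b_{0;2} & b_{1;2} & b_{2;2} & & \\ & \ddots & \ddots & \ddots & \\ & & b_{n-3;n-1} & b_{n-2;n-1} & b_{n-1;n-1}\\ & & & b_{n-2;n} & b_{n-1;n}\end{pmatrix}\begin{pmatrix}\pi_0(x)\\ \pi_1(x)\\ \vdots\\ \pi_{n-2}(x)\\ \pi_{n-1}(x)\end{pmatrix} + \begin{pmatrix}0\\ 0\\ \vdots\\ 0\\ b_{n;n}\pi_n(x)\end{pmatrix}$$

Thus,

$$x\,\boldsymbol{\pi}(x) = \Gamma\,\boldsymbol{\pi}(x) + b_{n;n}\,\pi_n(x)\,e_n \qquad (10.28)$$

where the basis vector $e_n = \left[0\ 0\ \cdots\ 0\ 1\right]^T$ and the $n\times n$ matrix

$$\Gamma = \begin{pmatrix} b_{0;1} & b_{1;1} & & & \\ b_{0;2} & b_{1;2} & b_{2;2} & & \\ & \ddots & \ddots & \ddots & \\ & & b_{n-3;n-1} & b_{n-2;n-1} & b_{n-1;n-1}\\ & & & b_{n-2;n} & b_{n-1;n}\end{pmatrix}$$

We observe that, when $x = z_k$ is a zero of $\pi_n(x)$, then (10.28) reduces to the eigenvalue equation

$$\Gamma\,\boldsymbol{\pi}(z_k) = z_k\,\boldsymbol{\pi}(z_k)$$

such that the zero $z_k$ is an eigenvalue of $\Gamma$ belonging to the eigenvector $\boldsymbol{\pi}(z_k)$. This eigenvector is never equal to the zero vector, because the first component $\pi_0(x) = c_{0;0}\neq 0$. There must be a similarity transform that makes the matrix symmetric, since all zeros of $\pi_n(x)$ are real (Theorem 66). A similarity transform (art. 142) preserves the eigenvalues. The simplest similarity transform is $H = \mathrm{diag}(h_1,h_2,\ldots,h_n)$ such that

$$\tilde\Gamma = H\Gamma H^{-1} = \begin{pmatrix} b_{0;1} & \frac{h_1}{h_2}b_{1;1} & & & \\ \frac{h_2}{h_1}b_{0;2} & b_{1;2} & \frac{h_2}{h_3}b_{2;2} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{h_{n-1}}{h_{n-2}}b_{n-3;n-1} & b_{n-2;n-1} & \frac{h_{n-1}}{h_n}b_{n-1;n-1}\\ & & & \frac{h_n}{h_{n-1}}b_{n-2;n} & b_{n-1;n}\end{pmatrix}$$

Thus, in order to produce a symmetric matrix $\tilde\Gamma = \tilde\Gamma^T$, we need to require that $\left(\tilde\Gamma\right)_{i,i-1} = \left(\tilde\Gamma\right)_{i-1,i}$ for all $2 \le i \le n$, implying that

$$\frac{h_i}{h_{i-1}}\,b_{i-2;i} = \frac{h_{i-1}}{h_i}\,b_{i-1;i-1}$$

whence

$$\left(\frac{h_i}{h_{i-1}}\right)^2 = \frac{b_{i-1;i-1}}{b_{i-2;i}}$$

Art. 247 shows that $b_{i-1;i-1}$ and $b_{i-2;i}$ have the same sign. Thus, $h_i = \sqrt{\frac{b_{i-1;i-1}}{b_{i-2;i}}}\,h_{i-1}$ for $2 \le i \le n$, and we can choose $h_1 = 1$ such that

$$h_j = \prod_{k=1}^{j-1}\sqrt{\frac{b_{k;k}}{b_{k-1;k+1}}}$$

The eigenvector belonging to the zero $z_k$ equals $\tilde{\boldsymbol{\pi}}(z_k) = H\boldsymbol{\pi}(z_k)$. After the similarity transform $H$, the result for $\tilde\Gamma$ is the matrix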
$$\begin{pmatrix} b_{0;1} & \sqrt{b_{0;2}b_{1;1}} & & & \\ \sqrt{b_{0;2}b_{1;1}} & b_{1;2} & \sqrt{b_{1;3}b_{2;2}} & & \\ & \ddots & \ddots & \ddots & \\ & & \sqrt{b_{n-3;n-1}b_{n-2;n-2}} & b_{n-2;n-1} & \sqrt{b_{n-2;n}b_{n-1;n-1}}\\ & & & \sqrt{b_{n-2;n}b_{n-1;n-1}} & b_{n-1;n}\end{pmatrix}$$

261. Similarly as in art. 260, the three-term recursion (10.13) of the normalized polynomials $\{\tilde\pi_j(x)\}_{0\le j\le n-1}$ is written in matrix form as

$$x\begin{pmatrix}\tilde\pi_0(x)\\ \tilde\pi_1(x)\\ \vdots\\ \tilde\pi_{n-2}(x)\\ \tilde\pi_{n-1}(x)\end{pmatrix} = \begin{pmatrix} -\frac{B_1}{A_1} & \frac{1}{A_1} & & & \\ \frac{1}{A_1} & -\frac{B_2}{A_2} & \frac{1}{A_2} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{A_{n-2}} & -\frac{B_{n-1}}{A_{n-1}} & \frac{1}{A_{n-1}}\\ & & & \frac{1}{A_{n-1}} & -\frac{B_n}{A_n}\end{pmatrix}\begin{pmatrix}\tilde\pi_0(x)\\ \tilde\pi_1(x)\\ \vdots\\ \tilde\pi_{n-2}(x)\\ \tilde\pi_{n-1}(x)\end{pmatrix} + \begin{pmatrix}0\\ 0\\ \vdots\\ 0\\ \frac{1}{A_n}\tilde\pi_n(x)\end{pmatrix}$$

Thus, in the normalized case where $C_n = \frac{A_n}{A_{n-1}}$, the matrix $\tilde\Gamma$ is symmetric, and

$$x\,\tilde{\boldsymbol{\pi}}(x) = \tilde\Gamma\,\tilde{\boldsymbol{\pi}}(x) + \frac{1}{A_n}\,\tilde\pi_n(x)\,e_n$$

where the vector $\tilde{\boldsymbol{\pi}}(x) = \mathrm{diag}\left(\|\pi_j\|^{-1}\right)\boldsymbol{\pi}(x)$. Suppose that two different similarity transforms $H_1$ and $H_2$ transform a matrix $A$ into two different symmetric matrices $B_1 = H_1AH_1^{-1}$ and $B_2 = H_2AH_2^{-1}$. Then $H_1^TH_1 = H_2^TH_2$. Indeed, $A = H_1^{-1}B_1H_1 = H_2^{-1}B_2H_2$, from which $B_1 = H_1H_2^{-1}B_2H_2H_1^{-1}$. Since $B_1 = B_1^T$ and $B_2 = B_2^T$, we have that

$$B_1 = \left(H_2H_1^{-1}\right)^TB_2\left(H_1H_2^{-1}\right)^T$$

Hence, $H_1H_2^{-1} = \left(H_2H_1^{-1}\right)^T$ and $H_2H_1^{-1} = \left(H_1H_2^{-1}\right)^T$, which leads to $H_1^TH_1 = H_2^TH_2$. If $H_1$ and $H_2$ are also symmetric, as in the case of a diagonal matrix, then $H_1^2 = H_2^2$, or $H_1 = \pm H_2$. Since $\tilde\pi_j(x) = \pi_j(x)\,\|\pi_j\|^{-1}$, both similarity transforms $H$ and $\mathrm{diag}\left(\|\pi_j\|^{-1}\right)$ must be the same. This implies that $H = \mathrm{diag}\left(\|\pi_j\|^{-1}\right)$, where $\|\pi_j\|^{-2} = \frac{1}{(\pi_j,\pi_j)}$. In addition, in agreement with art. 247,

$$-\frac{B_j}{A_j} = b_{j-1;j}, \qquad \frac{1}{A_j} = \sqrt{b_{j-1;j+1}\,b_{j;j}}$$

Hence, transforming $\Gamma$ into a symmetric matrix $\tilde\Gamma$ by a similarity transform $H$ corresponds to normalizing the orthogonal polynomials.

262. Gerschgorin's Theorem 36 tells us that there lies a zero $z_k$ of $\pi_n(x)$ in a disk centered around $b_{j-1;j} = -\frac{B_j}{A_j}$ with radius $\frac{1}{A_{j-1}} + \frac{1}{A_j}$. Overall, the symmetric matrix $\tilde\Gamma$ leads to the sharpest bounds on the eigenvalues/zeros of $\pi_n(x)$, because the above similarity transform $H$ minimizes the off-diagonal elements. However, this is not always the case. In particular, ignoring the attempt to symmetrize $\Gamma$, we may choose $h_1$ and $h_n$ in such a way that $\left(H\Gamma H^{-1}\right)_{12}$ and $\left(H\Gamma H^{-1}\right)_{n,n-1}$ are arbitrarily small (but not zero). But, by making $h_1$ and $h_n$ very small, we increase the radius around $b_{0;1}$ and $b_{n-1;n}$. Gerschgorin's Theorem 36 indicates that there is a zero $z_k$ close to $b_{0;1}$ and another zero close to $b_{n-1;n}$.

263. Continued fraction associated to orthogonal polynomials. By systematically multiplying each row and subtracting it from the next one, we can eliminate the lower-diagonal elements in the determinant $\det\left(\tilde\Gamma - xI\right)$. This eventually results in a continued fraction expansion of

$$T_n = \det\left(\tilde\Gamma - xI\right) = \begin{vmatrix} -\frac{B_1}{A_1}-x & \frac{1}{A_1} & & & \\ \frac{1}{A_1} & -\frac{B_2}{A_2}-x & \frac{1}{A_2} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{A_{n-2}} & -\frac{B_{n-1}}{A_{n-1}}-x & \frac{1}{A_{n-1}}\\ & & & \frac{1}{A_{n-1}} & -\frac{B_n}{A_n}-x\end{vmatrix}$$

We write the determinant $T_n$ in block form,

$$T_n = \begin{vmatrix} -\frac{A_1x+B_1}{A_1} & \frac{1}{A_1}e_1^T\\ \frac{1}{A_1}e_1 & T_{1;n}\end{vmatrix}$$

where the basis vector is $e_1 = (1,0,\ldots)$ and where the matrix $T_{1;n}$ is obtained by deleting the first row and the first column in $\tilde\Gamma - xI$,

$$T_{1;n} = \begin{pmatrix} -\frac{A_2x+B_2}{A_2} & \frac{1}{A_2} & & & \\ \frac{1}{A_2} & -\frac{A_3x+B_3}{A_3} & \frac{1}{A_3} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{A_{n-2}} & -\frac{A_{n-1}x+B_{n-1}}{A_{n-1}} & \frac{1}{A_{n-1}}\\ & & & \frac{1}{A_{n-1}} & -\frac{A_nx+B_n}{A_n}\end{pmatrix}$$

Invoking (8.79) yields

$$T_n = -\frac{A_1x+B_1}{A_1}\left|T_{1;n} + \frac{1}{A_1}\,\frac{e_1e_1^T}{A_1x+B_1}\right|$$

where $e_1e_1^T = \hat O$ equals the zero matrix (with the same dimensions as $T_{1;n}$), except for $\hat O_{11} = 1$. Thus,

$$T_n = -\frac{A_1x+B_1}{A_1}\begin{vmatrix} -\frac{A_2x+B_2}{A_2} + \frac{1}{A_1(A_1x+B_1)} & \frac{1}{A_2}e_1^T\\ \frac{1}{A_2}e_1 & T_{2;n}\end{vmatrix}$$

where we denote by $T_{j;n}$ the matrix obtained by deleting the first $j$ rows and the first $j$ columns in $\tilde\Gamma - xI$. Again invoking (8.79) yields, with $C_n = \frac{A_n}{A_{n-1}}$,

$$T_n = \frac{A_1x+B_1}{A_1A_2}\left(A_2x+B_2 - \frac{C_2}{A_1x+B_1}\right)\begin{vmatrix} -\frac{A_3x+B_3}{A_3} + \frac{1}{A_2\left(A_2x+B_2-\frac{C_2}{A_1x+B_1}\right)} & \frac{1}{A_3}e_1^T\\ \frac{1}{A_3}e_1 & T_{3;n}\end{vmatrix}$$

Next,

$$T_n = -\frac{A_1x+B_1}{A_1A_2A_3}\left(A_2x+B_2-\frac{C_2}{A_1x+B_1}\right)\left(A_3x+B_3-\frac{C_3}{A_2x+B_2-\frac{C_2}{A_1x+B_1}}\right)\times\begin{vmatrix} -\frac{A_4x+B_4}{A_4} + \frac{1}{A_3\left(A_3x+B_3-\frac{C_3}{A_2x+B_2-\frac{C_2}{A_1x+B_1}}\right)} & \frac{1}{A_4}e_1^T\\ \frac{1}{A_4}e_1 & T_{4;n}\end{vmatrix}$$

from which we deduce that

$$T_n = \frac{(-1)^n}{\prod_{k=1}^{n}A_k}\prod_{k=1}^{n}\kappa_k(x)$$

where the continued fraction $\kappa_k(x)$ equals

$$\kappa_k(x) = A_kx + B_k - \cfrac{C_k}{A_{k-1}x + B_{k-1} - \cfrac{C_{k-1}}{A_{k-2}x + B_{k-2} - \cfrac{C_{k-2}}{\ddots - \cfrac{C_2}{A_1x + B_1}}}} \qquad (10.29)$$

The continued fraction thus satisfies the recursion⁴

$$\kappa_k(x) = A_kx + B_k - \frac{C_k}{\kappa_{k-1}(x)} \qquad (10.30)$$

Art. 260 shows that the characteristic polynomial $T_n = \det\left(\tilde\Gamma - xI\right)$ has the same zeros as $\pi_n(x)$, such that $T_n = \frac{(-1)^n}{c_{n;n}}\,\pi_n(x)$, and

$$\pi_n(x) = \frac{c_{n;n}}{\prod_{k=1}^{n}A_k}\prod_{k=1}^{n}\kappa_k(x)$$

from which

$$\kappa_n(x) = \frac{A_n\,c_{n-1;n-1}}{c_{n;n}}\,\frac{\pi_n(x)}{\pi_{n-1}(x)} = \frac{\tilde\pi_n(x)}{\tilde\pi_{n-1}(x)}$$

Introducing $\kappa_n(x) = \frac{\tilde\pi_n(x)}{\tilde\pi_{n-1}(x)}$ into the recursion (10.30) again leads to the normalized three-term recursion (10.13). More results on continued fractions are presented in Gautschi (2004) and in Chihara (1978).
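The symmetric matrix $\tilde\Gamma$ of art. 261 and the continued fraction (10.30) lend themselves to a small experiment. The sketch assumes the orthonormal Legendre polynomials, with $A_k = \frac{\sqrt{4k^2-1}}{k}$, $B_k = 0$ and $C_k = A_k/A_{k-1}$. It makes three checks:

- The eigenvalues of $\tilde\Gamma$ are the zeros of $P_n$.
- $m_0$ times the squared first components of the normalized eigenvectors reproduce the Christoffel numbers. This follows from (10.26), because the normalized eigenvector is $\tilde{\boldsymbol{\pi}}(z_j)/\|\tilde{\boldsymbol{\pi}}(z_j)\|$ and $\tilde\pi_0^2 = 1/m_0$; it is the idea behind the Golub-Welsch algorithm.
- The recursion (10.30) yields $\tilde\pi_n(x)/\tilde\pi_{n-1}(x)$.

Function names are illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval


def A(k):
    """Assumed A_k of (10.13) for orthonormal Legendre (B_k = 0, C_k = A_k / A_{k-1})."""
    return np.sqrt(4.0 * k * k - 1.0) / k


def jacobi_matrix(n):
    """Symmetric tridiagonal tilde-Gamma: diagonal -B_k/A_k = 0, off-diagonal 1/A_k."""
    off = np.array([1.0 / A(k) for k in range(1, n)])
    return np.diag(off, 1) + np.diag(off, -1)


def kappa(n, x):
    """Continued fraction via (10.30): kappa_k = A_k x + B_k - C_k / kappa_{k-1}."""
    value = A(1) * x                                  # kappa_1 = A_1 x + B_1
    for k in range(2, n + 1):
        value = A(k) * x - (A(k) / A(k - 1)) / value
    return value


def ortho(k, x):
    """Orthonormal Legendre polynomial sqrt((2k+1)/2) P_k(x)."""
    return np.sqrt((2 * k + 1) / 2) * legval(x, [0] * k + [1])


n, m0 = 6, 2.0                       # m_0 = integral of dW(u) = du over [-1, 1]
eigenvalues, vectors = np.linalg.eigh(jacobi_matrix(n))
nodes, weights = leggauss(n)
print(np.allclose(eigenvalues, nodes))                  # zeros of P_n
print(np.allclose(m0 * vectors[0, :] ** 2, weights))    # Christoffel numbers

x = 0.37                             # not a zero of any tilde-pi_k with k < n
print(np.isclose(kappa(n, x), ortho(n, x) / ortho(n - 1, x)))
```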
264. If $\tilde\Gamma$ is positive semidefinite, then $\tilde\Gamma$ can be considered as a Gram matrix (art. 175), i.e., $\tilde\Gamma = A^TA$, where $\left(\tilde\Gamma\right)_{ii} \ge 0$. Art. 260 demonstrates that $\tilde\Gamma$ is positive semidefinite if all zeros of the orthogonal polynomials are non-negative. Theorem

⁴ In most textbooks, a finite continued fraction is written in a differently labeled form as

$$\kappa_n = a_0 - \cfrac{b_1}{a_1 - \cfrac{b_2}{a_2 - \cfrac{b_3}{\ddots - \cfrac{b_n}{a_n}}}}$$

from which the recursive structure is less naturally observed. If the determinant $T_n$ is expanded by the last row and last column, up to the first one, the same labeling would have been found. The main purpose, in the classical treatment, of using the highest index in the deepest fraction is to study the convergence of $\lim_{n}$