Journal of Computational and Applied Mathematics 122 (2000) ix–xi www.elsevier.nl/locate/cam
Preface
Numerical Analysis 2000 Vol. II: Interpolation and extrapolation C. Brezinski Laboratoire d'Analyse Numérique et d'Optimisation, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq Cedex, France
This volume is dedicated to two closely related subjects: interpolation and extrapolation. The papers can be divided into three categories: historical papers, survey papers and papers presenting new developments. Interpolation is an old subject since, as noticed in the paper by M. Gasca and T. Sauer, the term was coined by John Wallis in 1655. Interpolation was the first technique for obtaining an approximation of a function. Polynomial interpolation was then used in quadrature methods and methods for the numerical solution of ordinary differential equations. Obviously, some applications need interpolation by functions more complicated than polynomials. The case of rational functions with prescribed poles is treated in the paper by G. Mühlbach. He gives a survey of interpolation procedures using Cauchy–Vandermonde systems. The well-known formulae of Lagrange, Newton and Neville–Aitken are generalized. The construction of rational B-splines is discussed. Trigonometric polynomials are used in the paper by T. Strohmer for the reconstruction of a signal from non-uniformly spaced measurements. They lead to a well-posed problem that preserves some important structural properties of the original infinite-dimensional problem. More recently, interpolation in several variables was studied. It has applications in finite differences and finite elements for solving partial differential equations. Following the pioneering works of P. de Casteljau and P. Bézier, another very important domain where multivariate interpolation plays a fundamental role is computer-aided geometric design (CAGD) for the approximation of surfaces. The history of multivariate polynomial interpolation is related in the paper by M. Gasca and T. Sauer. The paper by R.A. Lorentz is devoted to the historical development of multivariate Hermite interpolation by algebraic polynomials. In his paper, G. Walz treats the approximation of multivariate functions by multivariate Bernstein polynomials. An asymptotic expansion of these polynomials is given and then used for building, by extrapolation, a new approximation method which converges much faster.
Extrapolation is based on interpolation. In fact, extrapolation consists of interpolation at a point outside the interval containing the interpolation points. Usually, this point is either zero or infinity. Extrapolation is used in numerical analysis to improve the accuracy of a process depending on a parameter or to accelerate the convergence of a sequence. The most well-known extrapolation processes are certainly Romberg's method for improving the convergence of the trapezoidal rule for the computation of a definite integral and Aitken's Δ² process, which can be found in any textbook of numerical analysis. An historical account of the development of the subject during the 20th century is given in the paper by C. Brezinski. The theory of extrapolation methods rests on the solution of the system of linear equations corresponding to the interpolation conditions. In their paper, M. Gasca and G. Mühlbach show, by using elimination techniques, the connection between extrapolation, linear systems, totally positive matrices and CAGD. There exist many extrapolation algorithms. From a finite section S_n, …, S_{n+k} of the sequence (S_n), they build an improved approximation of its limit S. This approximation depends on n and k. When at least one of these indexes goes to infinity, a new sequence is obtained with, possibly, a faster convergence. In his paper, H.H.H. Homeier studies scalar Levin-type acceleration methods. His approach is based on the notion of remainder estimate, which allows one to use asymptotic information on the sequence to build an efficient extrapolation process. The most general extrapolation process known so far is the sequence transformation known under the name of the E-algorithm. It can be implemented by various recursive algorithms. In his paper, N. Osada proves that the E-algorithm is mathematically equivalent to the Ford–Sidi algorithm. A slightly more economical algorithm is also proposed. When S depends on a parameter t, some applications need the evaluation of the derivative of S with respect to t. A generalization of the Richardson extrapolation process for treating this problem is considered in the paper by A. Sidi. Instead of being used for estimating the limit S of a sequence from S_n, …, S_{n+k}, extrapolation methods can also be used for predicting the next unknown terms S_{n+k+1}, S_{n+k+2}, …. The prediction properties of some extrapolation algorithms are analyzed in the paper by E.J. Weniger. Quite often in numerical analysis, sequences of vectors have to be accelerated. This is, in particular, the case in iterative methods for the solution of systems of linear and nonlinear equations. Vector acceleration methods are discussed in the paper by K. Jbilou and H. Sadok. Using projectors, they derive a different interpretation of these methods and give some theoretical results. Then, various algorithms are compared when used for the solution of large systems of equations coming out of the discretization of partial differential equations. Another point of view is taken in the paper by P.R. Graves-Morris, D.E. Roberts and A. Salam. After recalling, in the scalar case, the connection between the ε-algorithm, Padé approximants and continued fractions, these authors show that the vector ε-algorithm is the best all-purpose algorithm for the acceleration of vector sequences. There is a subject which can be related both to interpolation (more precisely, Hermite interpolation by a rational function at the point zero) and to convergence acceleration: Padé approximation.
Padé approximation is strongly connected to continued fractions, one of the oldest subjects in mathematics, since Euclid's g.c.d. algorithm is an expansion into a terminating continued fraction.
Although they were implicitly known before, Padé approximants were really introduced by Johann Heinrich Lambert in 1758 and Joseph Louis Lagrange in 1776. Padé approximants have important applications in many branches of applied sciences when the solution of a problem is obtained as a power series expansion and some of its properties have to be guessed from its first Taylor coefficients. In this volume, two papers deal with nonclassical applications of Padé approximation. M. Prévost shows how Padé approximants can be used to obtain Diophantine approximations of real and complex numbers and hence to prove irrationality. Padé approximation of the asymptotic expansion of the remainder of a series also provides Diophantine approximations. The solution of a discrete dynamical system can be related to matrix Hermite–Padé approximants, an approach developed in the paper by V. Sorokin and J. van Iseghem. Spectral properties of the band operator are investigated. The inverse spectral method is used for the solution of dynamical systems defined by a Lax pair. Obviously, all aspects of interpolation and extrapolation have not been treated in this volume. However, many important topics have been covered. I would like to thank all authors for their efforts.
Journal of Computational and Applied Mathematics 122 (2000) 1–21 www.elsevier.nl/locate/cam
Convergence acceleration during the 20th century C. Brezinski Laboratoire d'Analyse Numérique et d'Optimisation, UFR IEEA, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq Cedex, France Received 8 March 1999; received in revised form 12 October 1999
1. Introduction

In numerical analysis many methods produce sequences, for instance iterative methods for solving systems of equations, methods involving series expansions, discretization methods (that is, methods depending on a parameter such that the approximate solution tends to the exact one when the parameter tends to zero), perturbation methods, etc. Sometimes, the convergence of these sequences is slow and their effective use is quite limited. Convergence acceleration methods consist of transforming a slowly converging sequence (S_n) into a new sequence (T_n) converging to the same limit faster than the initial one. Among such sequence transformations, the most well known are certainly Richardson's extrapolation algorithm and Aitken's Δ² process.

All known methods are constructed by extrapolation and they are often called extrapolation methods. The idea consists of interpolating the terms S_n, S_{n+1}, …, S_{n+k} of the sequence to be transformed by a sequence satisfying a certain relationship depending on parameters. This set of sequences is called the kernel of the transformation, and every sequence of this set is transformed into a constant sequence by the transformation under consideration. For example, as we will see below, the kernel of Aitken's Δ² process is the set of sequences satisfying ∀n, a_0(S_n − S) + a_1(S_{n+1} − S) = 0, where a_0 and a_1 are parameters such that a_0 + a_1 ≠ 0. If Aitken's process is applied to such a sequence, then the constant sequence (T_n = S) is obtained. The parameters involved in the definition of the kernel are uniquely determined by the interpolation conditions, and then the limit of the interpolating sequence of the kernel is taken as an approximation of the limit of the sequence to be accelerated. Since this limit depends on the index n, it will be denoted by T_n. Effectively, the sequence (S_n) has been transformed into a new sequence (T_n).

This paper, which is based on [31], but includes new developments obtained since 1995, presents my personal views on the historical development of this subject during the 20th century. I do not pretend to be exhaustive nor even to quote every important contribution (if a reference does not
appear below, it does not mean that it is less valuable). I refer the interested reader to the literature and, in particular, to the recent books [55,146,33,144]. For an extensive bibliography, see [28].

I will begin with scalar sequences and then treat the case of vector ones. As we will see, a sequence transformation able to accelerate the convergence of all scalar sequences cannot exist. Thus, it is necessary to obtain many different convergence acceleration methods, each being suitable for a particular class of sequences. Many authors have studied the properties of these procedures and proved some important classes of sequences to be accelerable by a given algorithm. Scalar sequence transformations have also been extensively studied from the theoretical point of view. The situation is more complicated and more interesting for vector sequences. In the case of a sequence of vectors, it is always possible to apply a scalar acceleration procedure componentwise. However, such a strategy does not take into account connections which may exist between the various components, as in the important case of sequences arising from iterative methods for solving a system of linear or nonlinear equations.

2. Scalar sequences

Let (S_n) be a scalar sequence converging to a limit S. As explained above, an extrapolation method consists of transforming this sequence into a new one, (T_n), by a sequence transformation T : (S_n) → (T_n). The transformation T is said to accelerate the convergence of the sequence (S_n) if and only if

$$\lim_{n\to\infty} \frac{T_n - S}{S_n - S} = 0.$$

We can then say that (T_n) converges (to S) faster than (S_n). The first methods to have been used were linear transformations

$$T_n = \sum_{i=0}^{\infty} a_{ni} S_i, \qquad n = 0, 1, \ldots,$$
where the numbers a_{ni} are constants independent of the terms of the sequence (S_n). Such a linear transformation is usually called a summation process, and its properties are completely determined by the matrix A = (a_{ni}). For practical reasons, only a finite number of the coefficients a_{ni} are different from zero for each n. Among such processes are those named after Euler, Cesàro and Hölder. In the case of linear methods, the convergence of the sequence (T_n) to S for any converging sequence (S_n) is governed by the Toeplitz summability theorem; see [115] for a review. Examples of such processes are

$$T_n = \frac{1}{n+1}\sum_{i=0}^{n} S_i \qquad\text{or}\qquad T_n = \frac{1}{k+1}\sum_{i=n}^{n+k} S_i.$$
In the second case, the sequence (T_n) also depends on a second index, k, and the convergence has to be studied either when k is fixed and n tends to infinity, or when n is fixed and k tends to infinity.
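To make these two summation processes concrete, here is a minimal Python sketch (an illustration, not from the paper) applying them to the partial sums of log 2 = 1 − 1/2 + 1/3 − ⋯; the short moving average with k = 1 is the variant that visibly helps for this alternating series:

```python
import math

def partial_sums(n_terms):
    """Partial sums S_n of log(2) = 1 - 1/2 + 1/3 - ..."""
    s, out = 0.0, []
    for i in range(1, n_terms + 1):
        s += (-1) ** (i + 1) / i
        out.append(s)
    return out

def mean_all(S):
    """First process above: T_n = (S_0 + ... + S_n) / (n + 1)."""
    out, acc = [], 0.0
    for n, s in enumerate(S):
        acc += s
        out.append(acc / (n + 1))
    return out

def mean_window(S, k):
    """Second process above: T_n = (S_n + ... + S_{n+k}) / (k + 1), k fixed."""
    return [sum(S[n:n + k + 1]) / (k + 1) for n in range(len(S) - k)]

S = partial_sums(20)
print("S_19 error      :", abs(S[-1] - math.log(2)))
print("mean_all error  :", abs(mean_all(S)[-1] - math.log(2)))
print("window k=1 error:", abs(mean_window(S, 1)[-1] - math.log(2)))
```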
With respect to convergence acceleration, summation processes are usually only able to accelerate the convergence of restricted classes of sequences, and this is why the numerical analysts of the 20th century turned their efforts to nonlinear transformations. However, there is one exception: Richardson's extrapolation process.

2.1. Richardson's process

It seems that the first appearance of a particular case of what is now called the Richardson extrapolation process is due to Christian Huygens (1629–1695). In 1903, Robert Moir Milne (born in 1873) applied the idea of Huygens for computing π [101]. The same idea was exploited again by Karl Kommerell (1871–1948) in his book of 1936 [78]. As explained in [143], Kommerell can be considered as the real discoverer of Romberg's method, although he used this scheme in the context of approximating π.

Let us now come to the procedures used for improving the accuracy of the trapezoidal rule for computing approximations to a definite integral. In the case of a sufficiently smooth function, the error of this method is given by the Euler–Maclaurin expansion. In 1742, Colin Maclaurin (1698–1746) [90] showed that its precision could be improved by forming linear combinations of the results obtained with various stepsizes. His procedure can be interpreted as a preliminary version of Romberg's method; see [49] for a discussion.

In 1900, William Fleetwood Sheppard (1863–1936) used an elimination strategy in the Euler–Maclaurin quadrature formula with h_n = r_n h and 1 = r_0 < r_1 < r_2 < · · · to produce a better approximation to the given integral [132]. In 1910, combining the results obtained with the stepsizes h and 2h, Lewis Fry Richardson (1881–1953) eliminated the first term in a discretization process using central differences [119]. He called this procedure the deferred approach to the limit or h²-extrapolation. The transformed sequence (T_n) is given by

$$T_n = \frac{h_{n+1}^2\, S(h_n) - h_n^2\, S(h_{n+1})}{h_{n+1}^2 - h_n^2}.$$
In a 1927 paper [120] he used the same technique to solve a sixth-order differential eigenvalue problem. His process was called (h², h⁴)-extrapolation. Richardson extrapolation consists of computing the value at 0, denoted by T_k^{(n)}, of the interpolation polynomial of degree at most k which passes through the points (x_n, S_n), …, (x_{n+k}, S_{n+k}). Using the Neville–Aitken scheme for these interpolation polynomials, we immediately obtain

$$T_{k+1}^{(n)} = \frac{x_{n+k+1}\, T_k^{(n)} - x_n\, T_k^{(n+1)}}{x_{n+k+1} - x_n}$$

with T_0^{(n)} = S_n. Let us mention that Richardson referred to a 1926 paper by Nikolai Nikolaevich Bogolyubov (born in 1909) and Nikolai Mitrofanovich Krylov (1879–1955) where the procedure (often called the deferred approach to the limit) can already be found [11]. In 1955, Werner Romberg (born in 1909) was the first to use repeatedly an elimination approach for improving the accuracy of the trapezoidal rule [121]. He himself refers to the book of Lothar Collatz (1910–1990) of 1951 [50]. The procedure became widely known after the rigorous error analysis given in 1961 by Friedrich L. Bauer [3] and the work of Eduard L. Stiefel (1909–1978) [138].
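The Neville–Aitken recursion above takes only a few lines of code. The following Python sketch is an illustration, not from the paper: it builds the array T_k^{(n)} for given abscissae x_n and values S_n and extrapolates to 0; taking x_n = h_n² with S_n the trapezoidal values gives exactly the Romberg-type elimination discussed here.

```python
import math

def richardson(x, S):
    """Richardson extrapolation: value at 0 of the polynomial interpolating
    the points (x[n], S[n]), computed by the Neville-Aitken recursion
      T_{k+1}^(n) = (x_{n+k+1} T_k^(n) - x_n T_k^(n+1)) / (x_{n+k+1} - x_n),
    with T_0^(n) = S_n.  Returns the list of columns T_k."""
    N = len(S)
    T = [list(S)]                       # column k = 0
    for k in range(N - 1):
        prev = T[-1]
        T.append([(x[n + k + 1] * prev[n] - x[n] * prev[n + 1])
                  / (x[n + k + 1] - x[n]) for n in range(N - k - 1)])
    return T

# Example: S(h) = exp(h) sampled at h_n = 2^{-n}; the limit S(0) = 1 is sought.
h = [2.0 ** (-n) for n in range(6)]
S = [math.exp(hn) for hn in h]
T = richardson(h, S)
print("S(h_5)  :", S[-1])              # crude value at the smallest stepsize
print("T_5^(0) :", T[-1][0])           # extrapolated value, close to exp(0) = 1
```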
Romberg's derivation of his process was heuristic. It was proved by Pierre-Jean Laurent in 1963 [81] that the process comes out from the Richardson process by choosing x_n = h_n² and h_n = h_0/2^n. Laurent also gave conditions on the choice of the sequence (x_n) in order that the sequences (T_k^{(n)}) tend to S either when k or n tends to infinity. Weaker conditions were given by Michel Crouzeix and Alain L. Mignot in [52, pp. 52–55]. As we shall see below, extensions of Romberg's method to nonsmooth integrands lead to a method called the E-algorithm.

Applications of extrapolation to the numerical solution of ordinary differential equations were studied by H.C. Bolton and H.I. Scoins in 1956 [12], Roland Bulirsch and Josef Stoer in 1964–1966 [47] and William B. Gragg [65] in 1965. The case of difference methods for partial differential equations was treated by Guri Ivanovich Marchuk and V.V. Shaidurov [91]. Sturm–Liouville problems are discussed in [117]. Finally, we mention that Heinz Rutishauser (1918–1970) pointed out in 1963 [122] that Romberg's idea can be applied to any sequence as long as the error has an asymptotic expansion of a form similar to the Euler–Maclaurin one. For a detailed history of the Richardson method, its developments and applications, see [57,77,143].

2.2. Aitken's process and the ε-algorithm

The most popular nonlinear acceleration method is certainly Aitken's Δ² process, which is given by

$$T_n = \frac{S_n S_{n+2} - S_{n+1}^2}{S_{n+2} - 2S_{n+1} + S_n} = S_n - \frac{(S_{n+1} - S_n)^2}{S_{n+2} - 2S_{n+1} + S_n}, \qquad n = 0, 1, \ldots$$
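In code, Aitken's Δ² process reads as follows; this is a minimal Python sketch (an illustration, not from the paper), applied to the linearly converging fixed-point iteration x_{n+1} = cos x_n:

```python
import math

def aitken(S):
    """Aitken's Delta^2 process:
    T_n = S_n - (S_{n+1} - S_n)^2 / (S_{n+2} - 2 S_{n+1} + S_n).
    Returns the transformed sequence, two terms shorter than the input."""
    T = []
    for n in range(len(S) - 2):
        d2 = S[n + 2] - 2.0 * S[n + 1] + S[n]
        if d2 == 0.0:            # breakdown: the denominator vanishes
            break
        T.append(S[n] - (S[n + 1] - S[n]) ** 2 / d2)
    return T

# Example: x_{n+1} = cos(x_n) converges linearly to 0.7390851...
S = [1.0]
for _ in range(10):
    S.append(math.cos(S[-1]))
T = aitken(S)
print("plain iterate S_10:", S[-1])
print("accelerated  T_8  :", T[-1])    # much closer to the fixed point
```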
The method was stated by Alexander Craig Aitken (1895–1967) in 1926 [1], who used it to accelerate the convergence of Bernoulli's method for computing the dominant zero of a polynomial. Aitken pointed out that the same method was obtained by Hans von Naegelsbach (born in 1838) in 1876 in his study of Fürstenau's method for solving nonlinear equations [104]. The process was also given by James Clerk Maxwell (1831–1879) in his Treatise on Electricity and Magnetism of 1873 [95]. However, neither Naegelsbach nor Maxwell used it for the purpose of acceleration. Maxwell wanted to find the equilibrium position of a pointer oscillating with an exponentially damped simple harmonic motion from three experimental measurements.

It is surprising that Aitken's process was known to Takakazu Seki (1642–1708), often considered the greatest Japanese mathematician. In his book Katsuyo Sanpo, Vol. IV, he used this process to compute the value of π, the length of a chord and the volume of a sphere. This book was written around 1680 but only published in 1712 by his disciple Murahide Araki. Parts of it can be found in [73]. Let us mention that the Japanese characters corresponding to Takakazu have another pronunciation, which is Kowa. This is the reason why this mathematician is often called, erroneously as in [29,31], Seki Kowa.

What makes Aitken's process so popular is that it accelerates the convergence of all linearly converging sequences, that is, sequences such that

$$\exists a \neq 1, \qquad \lim_{n\to\infty} \frac{S_{n+1} - S}{S_n - S} = a.$$
It can even accelerate some logarithmic sequences (that is, those corresponding to a = 1), which are the sequences with the slowest convergence and the most difficult to accelerate.
Aitken's Δ² process is exact (which means that ∀n, T_n = S) for sequences satisfying ∀n, a_0(S_n − S) + a_1(S_{n+1} − S) = 0, with a_0 a_1 ≠ 0 and a_0 + a_1 ≠ 0. Such sequences form the kernel of Aitken's process. The idea naturally arose of finding a transformation with the kernel

a_0(S_n − S) + · · · + a_k(S_{n+k} − S) = 0, ∀n,

with a_0 a_k ≠ 0 and a_0 + · · · + a_k ≠ 0. A particular case of k = 2 was already treated by Maxwell in his book of 1873, and a particular case of an arbitrary value of k was studied by T.H. O'Beirne in 1947 [107]. This last work remains almost unknown since it was published only as an internal report. The problem was handled in full generality by Daniel Shanks (1917–1996) in 1949 [130] and again in 1955 [131]. He obtained the sequence transformation defined by

$$T_n = e_k(S_n) = \frac{\begin{vmatrix} S_n & S_{n+1} & \cdots & S_{n+k} \\ S_{n+1} & S_{n+2} & \cdots & S_{n+k+1} \\ \vdots & \vdots & & \vdots \\ S_{n+k} & S_{n+k+1} & \cdots & S_{n+2k} \end{vmatrix}}{\begin{vmatrix} \Delta^2 S_n & \cdots & \Delta^2 S_{n+k-1} \\ \vdots & & \vdots \\ \Delta^2 S_{n+k-1} & \cdots & \Delta^2 S_{n+2k-2} \end{vmatrix}}.$$
When k = 1, Shanks' transformation reduces to Aitken's Δ² process. It can be proved that e_k(S_n) = S, ∀n, if and only if (S_n) belongs to the kernel of the transformation given above. The same ratios of determinants were obtained by R.J. Schmidt in 1941 [127] in his study of a method for solving systems of linear equations. The determinants involved in the definition of e_k(S_n) have a very special structure. They are called Hankel determinants and were studied by Hermann Hankel (1839–1873) in his thesis in 1861 [72]. Such determinants satisfy a five-term recurrence relationship. This relation was used by O'Beirne and Shanks to implement the transformation by computing separately the numerators and the denominators of the e_k(S_n)'s. However, numerical analysts know that it is difficult to compute determinants (too many arithmetical operations are needed, and rounding errors due to the computer often lead to a completely wrong result). A recursive procedure for computing the e_k(S_n)'s without computing the determinants involved in their definition was needed. This algorithm was obtained in 1956 by Peter Wynn. It is called the ε-algorithm [147]. It is as follows. One starts with

ε_{-1}^{(n)} = 0, ε_0^{(n)} = S_n

and then

$$\varepsilon_{k+1}^{(n)} = \varepsilon_{k-1}^{(n+1)} + \frac{1}{\varepsilon_k^{(n+1)} - \varepsilon_k^{(n)}}.$$
Note that the numbers ε_k^{(n)} fill out a two-dimensional array. The ε-algorithm is related to Shanks' transformation by

ε_{2k}^{(n)} = e_k(S_n) and ε_{2k+1}^{(n)} = 1/e_k(ΔS_n).
Thus, the ε's with an odd lower index are only auxiliary quantities. They can be eliminated from the algorithm, thus leading to the so-called cross rule due to Wynn [153].
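The two rules translate directly into code. The sketch below (a Python illustration, not from the paper) builds the ε-table column by column and returns the even columns ε_{2k}^{(n)} = e_k(S_n); it implements none of the singular rules discussed next, so it simply stops if an exact division by zero occurs.

```python
import math

def epsilon_algorithm(S):
    """Wynn's epsilon-algorithm:
    eps_{-1}^(n) = 0, eps_0^(n) = S_n,
    eps_{k+1}^(n) = eps_{k-1}^(n+1) + 1/(eps_k^(n+1) - eps_k^(n)).
    Returns the even columns [eps_0, eps_2, eps_4, ...]."""
    prev = [0.0] * (len(S) + 1)          # column eps_{-1}
    curr = list(S)                       # column eps_0
    even = [list(curr)]
    for k in range(1, len(S)):
        nxt = []
        for n in range(len(curr) - 1):
            diff = curr[n + 1] - curr[n]
            if diff == 0.0:              # breakdown: a singular rule would be needed
                return even
            nxt.append(prev[n + 1] + 1.0 / diff)
        prev, curr = curr, nxt
        if k % 2 == 0:                   # keep the even columns, eps_{2k} = e_k(S_n)
            even.append(list(curr))
    return even

# Example: partial sums of log(2) = 1 - 1/2 + 1/3 - ...
S, s = [], 0.0
for i in range(1, 12):
    s += (-1) ** (i + 1) / i
    S.append(s)
cols = epsilon_algorithm(S)
print("S_10 error       :", abs(S[-1] - math.log(2)))
print("eps_10^(0) error :", abs(cols[-1][0] - math.log(2)))
```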
When implementing the ε-algorithm or using Wynn's cross rule, division by zero can occur and the algorithm must be stopped. However, if the singularity is confined, a term that will again be used in Section 2.6, that is, if it occurs only for some adjacent values of the indexes k and n, one may jump over it by using singular rules and continue the computation. If a division by a number close to zero arises, the algorithm becomes numerically unstable due to the cancellation errors. A similar situation holds for the other convergence acceleration algorithms. The study of such problems was initiated by Wynn in 1963 [151], who proposed particular rules for the ε-algorithm which are more stable than the usual rules. They were extended by Florent Cordellier in 1979 [51,151]. Particular rules for the θ-algorithm were obtained by Redivo Zaglia [155]. The convergence and acceleration properties of the ε-algorithm have been completely described only for two classes of sequences, namely totally monotonic and totally oscillating sequences [154,15,16]. Shanks' transformation and the ε-algorithm have close connections to Padé approximants, continued fractions and formal orthogonal polynomials; see, for example, [18].

2.3. Subsequent developments

The Shanks transformation and the ε-algorithm sparked the rebirth of the study of nonlinear acceleration processes. They now form an independent chapter in numerical analysis with connections to other important topics such as orthogonal and biorthogonal polynomials, continued fractions, and Padé approximants. They also have applications to the solution of systems of linear and nonlinear equations, the computation of the eigenvalues of a matrix, and many other topics; see [40]. Among other acceleration methods which were obtained and studied are the W-process of Samuel Lubkin [89], the method of Kjell J. Overholt [110], the ρ-algorithm of Wynn [148], the G-transformation of H.L. Gray, T.A. Atchison and G.V. McWilliams [70], the θ-algorithm of Claude Brezinski [14], the transformations of Bernard Germain-Bonne [63] and the various transformations due to David Levin [85]. To my knowledge, the only known acceleration theorem for the ρ-algorithm was obtained by Naoki Osada [108].

Simultaneously, several applications began to appear. For example, the ε-algorithm provides a quadratically convergent method for solving systems of nonlinear equations, and it does not require the knowledge of any derivative. This procedure was proposed simultaneously by Brezinski [13] and Eckhart Gekeler [61]. It has important applications to the solution of boundary value problems for ordinary differential equations [44]. Many other algorithms are given in the work of Ernst Joachim Weniger [145], which also contains applications to physics, or in the book of Brezinski and Michela Redivo Zaglia [40], where applications to various domains of numerical analysis can be found. The authors of this book provide FORTRAN subroutines. The book of Annie Cuyt and Luc Wuytack must also be mentioned [53]. The ε-algorithm has been applied to statistics, see the work of Alain Berlinet [9], and to the acceleration of the convergence of sequences of random variables, considered by Hélène Lavastre [82]. Applications to optimization were proposed by Le Ferrand [84] and Bouchta Rhanizar [118].

Instead of using a quite complicated algorithm, such as the ε-algorithm, it can be interesting to use a simpler one (for instance, Aitken's Δ² process) iteratively.
Such a use consists of applying the algorithm to (S_n) to produce a new sequence (T_n), then applying the same algorithm to (T_n), and so on. For example, applying the iterated Δ² process to the successive convergents
of a periodic continued fraction produces a better acceleration than using the ε-algorithm [24]. In particular, the iterated Δ² process transforms a logarithmic sequence into a sequence converging linearly, and linear convergence into superlinear; to my knowledge, these are the only known cases of such transformations.

The experience gained during these years led to a deeper understanding of the subject. Research workers began to study more theoretical and general questions related to the theory of convergence acceleration. The first attempt was made by R. Pennacchi in 1968 [114], who studied rational sequence transformations. His work was generalized by Germain-Bonne in 1973 [62], who proposed a very general framework and showed how to construct new algorithms for accelerating some classes of sequences. However, a ground-breaking discovery was made by Jean Paul Delahaye and Germain-Bonne in 1980 [56]. They proved that if a set of sequences satisfies a certain property, called remanence (too technical to be explained here), then a universal algorithm, i.e., one able to accelerate all sequences of this set, cannot exist. This result shows the limitations of acceleration methods. Many sets of sequences were proved to be remanent, for example, the sets of monotonic or logarithmic sequences. Even some subsets of the set of logarithmic sequences are remanent.

Moulay Driss Benchiboun [5] observed that all the sequence transformations found in the literature could be written as

$$T_n = \frac{f(S_n, \ldots, S_{n+k})}{Df(S_n, \ldots, S_{n+k})}$$
with D²f ≡ 0, where Df denotes the sum of the partial derivatives of the function f. The reason for that fact was explained by Brezinski [26], who showed that it is related to the translativity property of sequence transformations. Hassane Sadok [123] extended these results to the vector case. Abderrahim Benazzouz [7] proved that quasilinear transformations can be written as the composition of two projections. In many transformations, such as Shanks', the quantities computed are expressed as a ratio of determinants. This property is related to the existence of a triangular recurrence scheme for their computation, as explained by Brezinski and Guido Walz [46]. Herbert Homeier [74] studied a systematic procedure for constructing sequence transformations. He considered iterated transformations which are hierarchically consistent, which means that the kernel of the basic transformation is the lowest one in the hierarchy. The application of the basic transformation to a sequence which is higher in the hierarchy leads to a new sequence belonging to a kernel lower in the hierarchy. Homeier wrote several papers on this topic.

Thus, the theory of convergence acceleration methods has progressed impressively. The practical side was not forgotten, and authors devised a number of special devices for improving the efficiency of these methods. For example, when a certain sequence is to be accelerated, it is not obvious to know in advance which method will give the best result unless some properties of the sequence are already known. Thus, Delahaye [54] proposed using simultaneously several transformations and selecting, at each step of the procedure, one answer among the answers provided by the various algorithms. He proved that, under some assumptions, some tests are able to find automatically the best answer. The work of Delahaye was extended by Abdelhak Fdil [58,59]. The various answers could also be combined, leading to composite transformations [23]. It is possible, in some cases, to extract a linear subsequence from the original one and then to accelerate it, for example, by Aitken's Δ² process [37]. Devices for controlling the error were also constructed [21].
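To convey the flavour of such a selection device, here is a toy Python sketch; it is only an illustration of the idea, with |T_n − T_{n−1}| used as a crude accuracy indicator (my simplification, not the actual tests of Delahaye [54]):

```python
def aitken(S):
    """Aitken's Delta^2 process (skipping terms with a zero denominator)."""
    return [S[n] - (S[n + 1] - S[n]) ** 2 / (S[n + 2] - 2 * S[n + 1] + S[n])
            for n in range(len(S) - 2)
            if S[n + 2] - 2 * S[n + 1] + S[n] != 0.0]

def pairwise_mean(S):
    """A simple linear transformation: T_n = (S_n + S_{n+1}) / 2."""
    return [(S[n] + S[n + 1]) / 2.0 for n in range(len(S) - 1)]

def select(S, transforms):
    """Run several transformations and keep the final answer of the one whose
    last two terms agree best: |T_n - T_{n-1}| serves as an error estimate."""
    best, best_gap = None, float("inf")
    for t in transforms:
        T = t(S)
        if len(T) >= 2 and abs(T[-1] - T[-2]) < best_gap:
            best, best_gap = T[-1], abs(T[-1] - T[-2])
    return best

# Example: partial sums of the geometric series with ratio 0.6 (limit 2.5);
# this sequence lies in Aitken's kernel, so Aitken's answer is selected.
S, s = [], 0.0
for i in range(12):
    s += 0.6 ** i
    S.append(s)
print("selected answer:", select(S, [aitken, pairwise_mean]))
```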
When faced with the problem of accelerating the convergence of a given sequence, two approaches are possible. The first is to use a known extrapolation procedure and to try to prove that it accelerates the convergence of the given sequence. The second possibility is to construct an extrapolation procedure especially for that sequence. Convergence tests for sequences and series can be used for that purpose, as explained by Brezinski [25]. This approach was mostly developed by Ana Cristina Matos [92]. Special extrapolation procedures for sequences such that ∀n, S_n − S = a_n D_n, where (D_n) is a known sequence and (a_n) an unknown one, can also be constructed from the asymptotic properties of the sequences (a_n) and (D_n). Brezinski and Redivo Zaglia did this in [39]. A.H. Bentbib [10] considered the acceleration of sequences of intervals. Mohammed Senhadji [129] defined and studied the condition number of a sequence transformation.

2.4. The E-algorithm

As we saw above, the quantities involved in Shanks' transformation are expressed as a ratio of determinants, and the ε-algorithm allows one to compute them recursively. It is well known that an interpolation polynomial can be expressed as a ratio of determinants. Thus polynomial extrapolation also leads to such a ratio, and the Neville–Aitken scheme can be used to avoid the computation of these determinants, which leads to the Richardson extrapolation algorithm. A similar situation arises for many other transformations: in each case, the quantities involved are expressed as a ratio of special determinants and, in each case, one seeks a special recursive algorithm for the practical implementation of the transformation. Thus, there was a real need for a general theory of such sequence transformations and for a single general recursive algorithm for their implementation. This work was performed independently between 1973 and 1980 by five different people. It is now known as the E-algorithm.

It seems that the first appearance of this algorithm is due to Claus Schneider in a paper received on December 21, 1973 [128]. The quantities S(h_i) being given for i = 0, 1, …, Schneider looked for S'(h) = S' + a_1 g_1(h) + · · · + a_k g_k(h) satisfying the interpolation conditions S'(h_i) = S(h_i) for i = n, …, n + k, where the g_j's are given functions of h. Of course, the value of the unknown S' thus obtained will depend on the indexes k and n. Assuming that ∀j, g_j(0) = 0, we have S' = S'(0). Denoting by ℓ_k^n the extrapolation functional on the space of functions f defined at the points h_0 > h_1 > · · · > 0 and at the point 0, and such that ℓ_k^n f = f(0), we have ℓ_k^n S' = c_0 S(h_n) + · · · + c_k S(h_{n+k}) with c_0 + · · · + c_k = 1. The interpolation conditions become

ℓ_k^n E = 1 and ℓ_k^n g_j = 0, j = 1, …, k,

with E(h) ≡ 1. Schneider wanted to express the functional ℓ_k^n in the form ℓ_k^n = a ℓ_{k−1}^n + b ℓ_{k−1}^{n+1}. He obtained the two conditions

ℓ_k^n E = a + b = 1 and ℓ_k^n g_k = a ℓ_{k−1}^n g_k + b ℓ_{k−1}^{n+1} g_k = 0.

The values of a and b follow immediately, and we have

$$\ell_k^n = \frac{[\ell_{k-1}^{n+1} g_k]\,\ell_{k-1}^n - [\ell_{k-1}^n g_k]\,\ell_{k-1}^{n+1}}{[\ell_{k-1}^{n+1} g_k] - [\ell_{k-1}^n g_k]}.$$
Thus, the quantities ℓ_k^n S' can be recursively computed by this scheme. The auxiliary quantities ℓ_k^n g_j needed in this formula must be computed separately by the same scheme, using a different initialization. As we shall see below, this algorithm is just the E-algorithm. In a footnote, Schneider mentioned that this representation for ℓ_k^n was suggested by Börsch-Supan from Johannes Gutenberg Universität in Mainz.

In 1976, Günter Meinardus and G.D. Taylor wrote a paper [97] on best uniform approximation by functions from span(g_1, …, g_N) ⊂ C[a, b]. They defined the linear functionals L_k^n on C[a, b] by

$$L_k^n(f) = \sum_{i=n}^{n+k} c_i f(h_i),$$

where a ≤ h_1 < h_2 < · · · < h_{N+1} ≤ b and where the coefficients c_i, which depend on n and k, are such that c_n > 0, c_i ≠ 0 for i = n, …, n + k, sign c_i = (−1)^{i−n}, and

$$\sum_{i=n}^{n+k} |c_i| = 1, \qquad \sum_{i=n}^{n+k} c_i g_j(h_i) = 0, \quad j = 1, \ldots, k.$$
By using Gaussian elimination to solve the system of linear equations

$$\sum_{i=1}^{N} a_i g_i(h_j) + (-1)^j \lambda = f(h_j), \qquad j = 1, \ldots, k,$$
Meinardus and Taylor obtained a recursive scheme

$$L_i^k(f) = \frac{L_{i+1}^{k-1}(g_k)\, L_i^{k-1}(f) - L_i^{k-1}(g_k)\, L_{i+1}^{k-1}(f)}{L_{i+1}^{k-1}(g_k) - L_i^{k-1}(g_k)}$$
with L_i^0(f) = f(h_i), i = n, …, n + k. This is the same scheme as above.

Newton's formula for computing the interpolation polynomial is well known. It is based on divided differences. One can try to generalize these formulae to the case of interpolation by a linear combination of functions from a complete Chebyshev system (a technical concept which ensures the existence and uniqueness of the solution). We seek

P_k^{(n)}(x) = a_0 g_0(x) + · · · + a_k g_k(x)

satisfying the interpolation conditions

P_k^{(n)}(x_i) = f(x_i), i = n, …, n + k,
where the x_i's are distinct points and the g_i's given functions. The P_k^{(n)} can be recursively computed by an algorithm which generalizes the Neville–Aitken scheme for polynomial interpolation. This algorithm was obtained by Günter Mühlbach in 1976 [103] from a generalization of the notion of divided differences and their recurrence relationship. This algorithm was called the Mühlbach–Neville–Aitken algorithm, for short the MNA. It is as follows:

$$P_k^{(n)}(x) = \frac{g_{k-1,k}^{(n+1)}(x)\, P_{k-1}^{(n)}(x) - g_{k-1,k}^{(n)}(x)\, P_{k-1}^{(n+1)}(x)}{g_{k-1,k}^{(n+1)}(x) - g_{k-1,k}^{(n)}(x)}$$
with P_0^{(n)}(x) = f(x_n) g_0(x)/g_0(x_n). The g_{k,i}^{(n)}'s can be recursively computed by a quite similar relationship

$$g_{k,i}^{(n)}(x) = \frac{g_{k-1,k}^{(n+1)}(x)\, g_{k-1,i}^{(n)}(x) - g_{k-1,k}^{(n)}(x)\, g_{k-1,i}^{(n+1)}(x)}{g_{k-1,k}^{(n+1)}(x) - g_{k-1,k}^{(n)}(x)}$$
with g_{0,i}^{(n)}(x) = g_i(x_n) g_0(x)/g_0(x_n) − g_i(x). If g_0(x) ≡ 1 and if it is assumed that ∀i > 0, g_i(0) = 0, the quantities P_k^{(n)}(0) are the same as those obtained by the E-algorithm, and the MNA reduces to it. Let us mention that, in fact, the MNA is closely related to the work of Henri Marie Andoyer (1862–1929), which goes back to 1906 [2]; see [30] for detailed explanations.

We now come to the work of Tore Håvie. We already mentioned Romberg's method for accelerating the convergence of the trapezoidal rule. The success of this procedure is based on the existence of the Euler–Maclaurin expansion for the error. This expansion only holds if the function to be integrated has no singularity in the interval. In the presence of singularities, the expansion of the error is no longer a series in h² (the stepsize) but a more complicated one depending on the singularity. Thus, Romberg's scheme has to be modified to incorporate the various terms appearing in the expansion of the error. Several authors worked on this question, treating several types of singularities. In particular, Håvie began to study this question under Romberg (Romberg emigrated to Norway and came to Trondheim in 1949). In 1978, Håvie wrote a report, published one year later [71], where he treated the most general case of an error expansion of the form

S(h) − S = a_1 g_1(h) + a_2 g_2(h) + · · ·,

where S(h) denotes the approximation obtained by the trapezoidal rule with stepsize h to the definite integral S, and the g_i are the known functions (forming an asymptotic sequence when h tends to zero) appearing in the expansion of the error. Let h_0 > h_1 > · · · > 0, S_n = S(h_n) and g_i(n) = g_i(h_n). Håvie set

$$E_1^{(n)} = \frac{g_1(n+1)\, S_n - g_1(n)\, S_{n+1}}{g_1(n+1) - g_1(n)}.$$

Replacing S_n and S_{n+1} by their expansions, he obtained E_1^{(n)} = S + a_2 g_{1,2}^{(n)} + a_3 g_{1,3}^{(n)} + · · · with

$$g_{1,i}^{(n)} = \frac{g_1(n+1)\, g_i(n) - g_1(n)\, g_i(n+1)}{g_1(n+1) - g_1(n)}.$$
The same process can be repeated for eliminating g_{1,2}^{(n)} in the expansion of E_1^{(n)}, and so on. Thus, once again we obtain the E-algorithm

$$E_k^{(n)} = \frac{g_{k-1,k}^{(n+1)}\, E_{k-1}^{(n)} - g_{k-1,k}^{(n)}\, E_{k-1}^{(n+1)}}{g_{k-1,k}^{(n+1)} - g_{k-1,k}^{(n)}}$$

with E_0^{(n)} = S_n and g_{0,i}^{(n)} = g_i(n). The auxiliary quantities g_{k,i}^{(n)} are recursively computed by the quite similar rule

$$g_{k,i}^{(n)} = \frac{g_{k-1,k}^{(n+1)}\, g_{k-1,i}^{(n)} - g_{k-1,k}^{(n)}\, g_{k-1,i}^{(n+1)}}{g_{k-1,k}^{(n+1)} - g_{k-1,k}^{(n)}}$$

with g_{0,i}^{(n)} = g_i(n).
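The two rules of the E-algorithm are easily implemented together, since the auxiliary g's obey the same recursion as the E's. The following Python sketch is an illustration, not from the paper; the example checks the kernel property: a sequence of the exact form S_n = S + a_1 g_1(n) + a_2 g_2(n) is transformed into the constant S in the column k = 2.

```python
def e_algorithm(S, g):
    """E-algorithm.  S: the sequence (S_n); g: a list where g[i-1][n] = g_i(n).
    Main rule:
      E_k^(n) = (g_{k-1,k}^(n+1) E_{k-1}^(n) - g_{k-1,k}^(n) E_{k-1}^(n+1))
                / (g_{k-1,k}^(n+1) - g_{k-1,k}^(n)),
    and the same rule with E replaced by g_{.,i} for the auxiliary quantities.
    Returns the columns [E_0, E_1, ..., E_K]."""
    N, K = len(S), len(g)
    E = list(S)                                    # E_0^(n) = S_n
    G = [list(gi) for gi in g]                     # g_{0,i}^(n) = g_i(n)
    columns = [list(E)]
    for k in range(1, K + 1):
        gk = G[k - 1]                              # g_{k-1,k}^(n)
        m = N - k                                  # entries in column k
        E = [(gk[n + 1] * E[n] - gk[n] * E[n + 1]) / (gk[n + 1] - gk[n])
             for n in range(m)]
        G = [[(gk[n + 1] * Gi[n] - gk[n] * Gi[n + 1]) / (gk[n + 1] - gk[n])
              for n in range(m)] for Gi in G]
        columns.append(list(E))
    return columns

# Kernel check: S_n = 5 + 2 g_1(n) - 3 g_2(n) must give E_2^(n) = 5 for all n.
g1 = [1.0 / (n + 1) for n in range(8)]
g2 = [1.0 / (n + 1) ** 2 for n in range(8)]
S = [5.0 + 2.0 * g1[n] - 3.0 * g2[n] for n in range(8)]
print(e_algorithm(S, [g1, g2])[2])                 # [5.0, 5.0, ...] up to rounding
```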
Håvie gave an interpretation of this algorithm in terms of the Gaussian elimination process for solving the system

E_k^{(n)} + b_1 g_1(n+i) + · · · + b_k g_k(n+i) = S_{n+i}, i = 0, …, k,

for the unknown E_k^{(n)}.

In 1980, Brezinski took up the same problem, but from the point of view of extrapolation [19]. Let (S_n) be the sequence to be accelerated. Interpolating it by a sequence of the form S'_n = S + a_1 g_1(n) + · · · + a_k g_k(n), where the g_i's are known sequences which can depend on the sequence (S_n) itself, leads to

S'_{n+i} = S_{n+i}, i = 0, …, k.
Solving this system directly for the unknown S (which, since it depends on n and k, will be denoted by E_k^{(n)}) gives

$$E_k^{(n)} = \frac{\begin{vmatrix} S_n & \cdots & S_{n+k} \\ g_1(n) & \cdots & g_1(n+k) \\ \vdots & & \vdots \\ g_k(n) & \cdots & g_k(n+k) \end{vmatrix}}{\begin{vmatrix} 1 & \cdots & 1 \\ g_1(n) & \cdots & g_1(n+k) \\ \vdots & & \vdots \\ g_k(n) & \cdots & g_k(n+k) \end{vmatrix}}.$$

Thus E_k^{(n)} is given as a ratio of determinants which is very similar to the ratios previously mentioned. Indeed, for the choice g_i(n) = ΔS_{n+i−1}, the ratio appearing in Shanks' transformation results while, when g_i(n) = x_n^i, we obtain the ratio expressing the quantities involved in the Richardson extrapolation process. Other algorithms may be similarly derived. Now the problem is to find a recursive algorithm for computing the E_k^{(n)}'s. Applying Sylvester's determinantal identity, Brezinski obtained the two rules of the above E-algorithm. His derivation of the E-algorithm is closely related to Håvie's, since Sylvester's identity can be proved by using Gaussian elimination. Brezinski also gave convergence and acceleration results for this algorithm when the (g_i(n)) satisfy certain conditions [19]. These results show that, for accelerating the convergence of a sequence, it is necessary to know the expansion of the error S_n − S with respect to some asymptotic sequence (g_1(n)), (g_2(n)), …. The g_i(n) are those to be used in the E-algorithm. It can be proved that, ∀k,

$$\lim_{n\to\infty} \frac{E_{k+1}^{(n)} - S}{E_k^{(n)} - S} = 0.$$
These results were refined by Avram Sidi [134–136]. Thus the study of the asymptotic expansion of the error of the sequences to be accelerated is of primary importance; see Walz [144]. For example, Mohammed Kzaz [79,80] and Pierre Verlinden [142] applied this idea to the problem of accelerating the convergence of Gaussian quadrature formulae [79], and Pedro Lima and Mário Graça to boundary value problems with singularities [88,87] (see also the works of Lima and Diogo [87], and Lima and Carpentier [86]). Other acceleration results were obtained by Matos and Marc Prévost [94], Prévost
[116] and Pascal Mortreux and Prévost [102]. An algorithm, more economical than the E-algorithm, was given by William F. Ford and Avram Sidi [60]. The connection between the E-algorithm and the ε-algorithm was studied by Bernhard Beckermann [4]. A general ε-algorithm connected to the E-algorithm was given by Carsten Carstensen [48]. See [27] for a more detailed review of the E-algorithm.

Convergence acceleration algorithms can also be used for predicting the unknown terms of a series or a sequence. This idea, introduced by Jacek Gilewicz [64], was studied by Sidi and Levin [137], Brezinski [22] and Denis Vekemans [141].

2.5. A new approach

Over the years, a quite general framework was constructed for the theory of extrapolation algorithms. The situation was quite different for the practical construction of extrapolation algorithms, and there was little systematic research into their derivation. However, thanks to a formalism due to Weniger [145], such a construction is now possible; see Brezinski and Matos [38]. It is as follows. Let us assume that the sequence (S_n) to be accelerated satisfies, ∀n, S_n − S = a_n D_n, where (D_n) is a known sequence, called a remainder (or error) estimate for the sequence (S_n), and (a_n) an unknown sequence. It is possible to construct a sequence transformation such that its kernel is precisely this set of sequences. For that purpose, we have to assume that a difference operator L (that is, a linear mapping of the set of sequences into itself) exists such that ∀n, L(a_n) = 0. This means that the sequence obtained by applying L to the sequence (a_n) is identically zero. Such a difference operator is called an annihilation operator for the sequence (a_n). We have

S_n/D_n − S/D_n = a_n.

Applying L and using linearity leads to

L(S_n/D_n) − S L(1/D_n) = L(a_n) = 0.

We solve for S and designate it by T_n, which gives the sequence transformation

T_n = L(S_n/D_n) / L(1/D_n).

The sequence (T_n) is such that ∀n, T_n = S if and only if ∀n, S_n − S = a_n D_n. This approach is highly versatile. All the algorithms described above and the related devices, such as error control, composite sequence transformations, least-squares extrapolation, etc., can be put into this framework. Moreover, many new algorithms can be obtained using this approach. The E-algorithm can also be put into this framework, which provides a deeper insight and leads to new properties [41]. Matos [93], using results from the theory of difference equations, obtained new and general convergence and acceleration results when (a_n) has an asymptotic expansion of a certain form.

2.6. Integrable systems

The connection between convergence acceleration algorithms and discrete integrable systems is a subject whose interest is rapidly growing among physicists. When a numerical scheme is used for
integrating a partial differential evolution equation, it is important that it preserves the quantities that are conserved by the partial differential equation itself. An important character is the integrability of the equation. Although this term has not yet received a completely satisfactory definition (see [66]), it can be understood as the ability to write the solution explicitly in terms of a finite number of functions, or as the confinement of singularities in finite domains. The construction of integrable discrete forms of integrable partial differential equations is highly nontrivial. A major discovery in the field of integrability was the occurrence of a solitary wave (called a soliton) in the Korteweg–de Vries (KdV) equation. Integrability is a rare phenomenon, and the typical dynamical system is nonintegrable. A test of integrability, called singularity confinement, was given by B. Grammaticos, A. Ramani and V. Papageorgiou [67]. It turns out that this test is related to the existence of singular rules for avoiding a division by zero in convergence acceleration algorithms (see Section 2.2). The literature on this topic is vast, and we cannot enter into the details of it. We only want to give an indication of the connection between these two subjects, since both domains could benefit from it.

In the rule for the ε-algorithm, V. Papageorgiou, B. Grammaticos and A. Ramani set m = k + n and replaced ε_k^{(n)} by u(n, m) + mp + nq, where p and q satisfy p² − q² = 1. They obtained [111]

[p − q + u(n, m+1) − u(n+1, m)][p + q + u(n+1, m+1) − u(n, m)] = p² − q².

This is the discrete lattice KdV equation. Since this equation is integrable, one can expect integrability to hold also for the ε-algorithm, and, thanks to the singular rules of Wynn and Cordellier mentioned at the end of Section 2.2, this is indeed the case.

In the rule of the ρ-algorithm, making the change of variable k = t/δ³ and n − 1/2 = x/δ − ct/δ³ and replacing ρ_k^{(n)} by p + δ²u(x − δ/2, t), where c and p are related by 1 − 2c = 1/p², A. Nagai and J. Satsuma obtained [105]

$$\delta^2 u(x - \delta/2 + c\delta^3,\ t + \delta^3) - \delta^2 u(x + \delta/2 - c\delta^3,\ t - \delta^3) = \frac{1}{p + \delta^2 u(x + \delta/2,\ t)} - \frac{1}{p + \delta^2 u(x - \delta/2,\ t)}.$$

We have, to terms of order δ⁵, the KdV equation

$$u_t - \frac{1}{p^3}\, u u_x + \frac{1}{48 p^2}\,(1 - p^{-4})\, u_{xxx} = 0.$$
Other discrete numerical algorithms, such as the qd and LR algorithms, are connected to other discrete or continuous integrable equations (see, for example, [112]). Formal orthogonal polynomials, continued fractions and Padé approximation also play a role in this topic [113].

By replacing the integer n in the ε-algorithm by a continuous variable, Wynn derived the confluent form of the ε-algorithm [149]:

$$\varepsilon_{k+1}(t) = \varepsilon_{k-1}(t) + \frac{1}{\varepsilon_k'(t)}$$

with ε_{−1}(t) ≡ 0 and ε_0(t) = f(t). This algorithm is the continuous counterpart of the ε-algorithm, and its aim is to compute lim_{t→∞} f(t). Setting N_k(t) = ε_k'(t) ε_{k+1}'(t), A. Nagai, T. Tokihiro and J. Satsuma [106] obtained

N_k'(t) = N_k(t)[N_{k−1}(t) − N_{k+1}(t)].

The above equation is the Bäcklund transformation of the discrete Toda molecule equation [139].
So, we see that some properties of integrable systems are related to properties of convergence acceleration algorithms. On the other hand, discretizing integrable partial differential equations leads to new sequence transformations which have to be studied from the point of view of their algebraic and acceleration properties. Replacing the second integer k in the confluent form of the ε-algorithm by a continuous variable, Wynn obtained a partial differential equation [152]. Its relation with integrable systems is an open question. The connection between integrable systems and convergence acceleration algorithms needs to be investigated in more detail to fully understand its meaning, which is not clear yet.

3. The vector case

In numerical analysis, many iterative methods lead to vector sequences. To accelerate the convergence of such sequences, it is always possible to apply a scalar algorithm componentwise. However, vector sequence transformations, specially built for that purpose, are usually more powerful.

The first vector algorithm to be studied was the vector ε-algorithm. It was obtained by Wynn [150] by replacing, in the rule of the scalar ε-algorithm, 1/ε_k^{(n)} by (ε_k^{(n)})^{−1}, where the inverse y^{−1} of a vector y is defined by y^{−1} = y/(y, y). Thus, with this definition, the rule of the ε-algorithm can be applied to vector sequences. Using Clifford algebra, J.B. McLeod proved in 1971 [96] that ∀n, ε_{2k}^{(n)} = S if the sequence (S_n) satisfies a_0(S_n − S) + · · · + a_k(S_{n+k} − S) = 0, ∀n, with a_0 a_k ≠ 0, a_0 + · · · + a_k ≠ 0. This result is only valid for real sequences (S_n) and real a_i's. Moreover, contrary to the scalar case, this condition is only sufficient. In 1983, Peter R. Graves-Morris [68] extended this result to the complex case using a quite different approach.

A drawback to the development of the theory of the vector ε-algorithm was that it was not known whether a corresponding generalization of Shanks' transformation was underlying the algorithm, that is, whether the vectors ε_k^{(n)} obtained by the algorithm could be expressed as ratios of determinants (or some kind of generalization of determinants). This is why Brezinski [17], following the same path as Shanks, tried to construct a vector sequence transformation with the kernel a_0(S_n − S) + · · · + a_k(S_{n+k} − S) = 0. He obtained a transformation expressed as a ratio of determinants. He then had to develop a recursive algorithm for avoiding their computation. This was the so-called topological ε-algorithm. This algorithm has many applications, in particular, to the solution of systems of linear equations (it is related to the biconjugate gradient algorithm [18, p. 185]). In the case of a system of nonlinear equations, it gave rise to a generalization of Steffensen's method [13]. That algorithm has a quadratic convergence under some assumptions, as established by Hervé Le Ferrand [83] following the ideas presented by Khalide Jbilou and Sadok [75]. The denominator of the vector ε_{2k}^{(n)} obtained by the vector ε-algorithm was first written as a determinant of dimension 2k + 1 by Graves-Morris and Chris Jenkins in [69]. The numerator follows immediately by modifying the first row of the denominator, a formula given by Ahmed Salam and Graves-Morris [126]. However, the dimension of the corresponding determinants in the scalar case is only k + 1. It was proved by Salam [124] that the vectors ε_{2k}^{(n)} computed by the vector ε-algorithm can be expressed as a ratio of two designants of dimension k + 1.
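As a concrete illustration of the vector ε-algorithm with the inverse y^{−1} = y/(y, y) defined above, here is a minimal Python/NumPy sketch (not from the paper), applied to a linearly converging vector iteration x_{m+1} = Ax_m + b, whose errors satisfy exactly a relationship of the kernel type with k = 2; the matrix A and vector b are arbitrary example data:

```python
import numpy as np

def vector_epsilon(S):
    """Wynn's vector epsilon-algorithm: the scalar rule with the inverse of a
    vector y taken as y^{-1} = y/(y, y).  Returns the first entry of the
    highest even column reached (the accelerated value)."""
    prev = [np.zeros_like(S[0]) for _ in range(len(S) + 1)]   # column eps_{-1}
    curr = [np.asarray(s, dtype=float) for s in S]            # column eps_0
    last_even = curr[0]
    for k in range(1, len(S)):
        nxt = []
        for n in range(len(curr) - 1):
            d = curr[n + 1] - curr[n]
            dd = float(np.dot(d, d))
            if dd == 0.0:              # exact breakdown: keep the last safe value
                return last_even
            nxt.append(prev[n + 1] + d / dd)                  # vector inverse
        prev, curr = curr, nxt
        if k % 2 == 0:
            last_even = curr[0]
    return last_even

# Example: x_{m+1} = A x_m + b with spectral radius 0.7 (slow linear convergence).
A = np.array([[0.5, 0.2], [0.1, 0.6]])
b = np.array([1.0, 2.0])
x, S = np.zeros(2), []
for _ in range(6):
    S.append(x)
    x = A @ x + b
exact = np.linalg.solve(np.eye(2) - A, b)
print("last iterate error:", np.linalg.norm(S[-1] - exact))
print("vector eps error  :", np.linalg.norm(vector_epsilon(S) - exact))
```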
A designant is a generalization of a determinant arising when solving a system of linear equations in a noncommutative algebra. An algebraic approach to this algorithm was given in [125]. This approach, which involves the use of a Clifford algebra, was used in [45] for extending the mechanism given in [41] to the vector and matrix cases. The vector generalization
of the E-algorithm [19] can be explained similarly. This algorithm makes use of a fixed vector y. Jet Wimp [146, pp. 176–177] generalized it using a sequence (y_n) instead of y. Jeannette van Iseghem [140] gave an algorithm for accelerating vector sequences based on the vector orthogonal polynomials she introduced for generalizing Padé approximants to the vector case. Other vector sequence transformations are due to Osada [109] and Jbilou and Sadok [76]. Benchiboun [6] and Abderrahim Messaoudi [100] studied matrix extrapolation algorithms.

We have seen that, in the scalar case, the kernels of sequence transformations may be expressed as relationships with constant coefficients. This is also the case for the vector and the topological ε-algorithms and the vector E-algorithm. The first (and, to my knowledge, only) transformation treating a relationship with varying coefficients was introduced in [42]. The theory developed there also explains why the case of a relationship with non-constant coefficients is a difficult problem in the scalar case and why it could be solved, on the contrary, in the vector case. The reason is that the number of unknown coefficients appearing in the expression of the kernel must be strictly less than the dimension of the vectors.

Brezinski in [34] proposed a general methodology for constructing vector sequence transformations. It leads to a unified presentation of several approaches to the subject and to new results. He also discussed applications to linear systems. In fact, as shown by Sidi [133], and Jbilou and Sadok [75], vector sequence transformations are closely related to projection methods for the solution of systems of equations. In particular, the RPA, a vector sequence transformation defined by Brezinski [20], was extensively studied by Messaoudi, who showed its connections to direct and iterative methods for solving systems of linear equations [98,99]. Vector sequence transformations lead to new methods for the solution of systems of nonlinear equations. They also have other applications. First of all, it is quite important to accelerate the convergence of iterative methods for the solution of systems of linear equations; see [32,33,36]. Special vector extrapolation techniques were designed for the regularization of ill-posed linear systems in [43], and the idea of extrapolation was used in [35] to obtain estimates of the norm of the error when solving a system of linear equations by an arbitrary method, direct or iterative.

General theoretical results similar to those obtained in the scalar case are still lacking in the vector case, although some partial results have been obtained. Relevant results on quasilinear transformations are in the papers by Sadok [123] and Benazzouz [8]. The present author proposed a mechanism for vector sequence transformations in [45,34].

4. Conclusions and perspectives

In this paper, I have tried to give a survey of the development of convergence acceleration methods for scalar and vector sequences in the 20th century. These methods are based on the idea of extrapolation. Since a universal algorithm for accelerating the convergence of all sequences cannot exist (and this is even true for some restricted classes of sequences), it was necessary to define and study a large variety of algorithms, each of them being appropriate for some special subsets of sequences. It is, of course, always possible to construct other convergence acceleration methods for scalar sequences.
However, to be of interest, such new processes must provide a major improvement over existing ones. For scalar sequence transformations, the emphasis must be placed on the theory rather than on special devices (unless a quite powerful one is found) and on the application of new
methods to particular algorithms in numerical analysis and to various domains of applied sciences. In particular, the connection between convergence acceleration algorithms and continuous and discrete integrable systems brings a different and fresh look to both domains and could be of benefit to them. An important problem in numerical analysis is the solution of large, sparse systems of linear equations. Most of the methods used nowadays are projection methods. Often the iterates obtained in such problems must be subject to acceleration techniques. However, many of the known vector convergence acceleration algorithms require the storage of too many vectors to be useful. New and cheaper acceleration algorithms are required. This difficult project, in my opinion, offers many opportunities for future research. In this paper, I only briefly mentioned the confluent algorithms, whose aim is the computation of the limit of a function when the variable tends to infinity (the continuous analog of the problem of convergence acceleration for a sequence). This subject and its applications will provide fertile ground for new discoveries.

Acknowledgements

I would like to thank Jet Wimp for his careful reading of the paper. He corrected my English in many places, asked me to provide more explanations when needed, and suggested many improvements in the presentation. I am also indebted to Naoki Osada for his information about Takakazu Seki.

References

[1] A.C. Aitken, On Bernoulli's numerical solution of algebraic equations, Proc. Roy. Soc. Edinburgh 46 (1926) 289–305.
[2] H. Andoyer, Interpolation, in: J. Molk (Ed.), Encyclopédie des Sciences Mathématiques Pures et Appliquées, Tome I, Vol. 4, Fasc. 1, I–21, Gauthier–Villars, Paris, 1904–1912, pp. 127–160 (reprint by Éditions Gabay, Paris, 1993).
[3] F.L. Bauer, La méthode d'intégration numérique de Romberg, in: Colloque sur l'Analyse Numérique, Librairie Universitaire, Louvain, 1961, pp. 119–129.
[4] B. Beckermann, A connection between the E-algorithm and the epsilon-algorithm, in: C. Brezinski (Ed.), Numerical and Applied Mathematics, Baltzer, Basel, 1989, pp. 443–446.
[5] M.D. Benchiboun, Etude de Certaines Généralisations du Δ² d'Aitken et Comparaison de Procédés d'Accélération de la Convergence, Thèse 3ème cycle, Université de Lille I, 1987.
[6] M.D. Benchiboun, Extension of Henrici's method to matrix sequences, J. Comput. Appl. Math. 75 (1996) 1–21.
[7] A. Benazzouz, Quasilinear sequence transformations, Numer. Algorithms 15 (1997) 275–285.
[8] A. Benazzouz, GL(E)-quasilinear transformations and acceleration, Appl. Numer. Math. 27 (1998) 109–122.
[9] A. Berlinet, Sequence transformations as statistical tools, Appl. Numer. Math. 1 (1985) 531–544.
[10] A.H. Bentbib, Acceleration of convergence of interval sequences, J. Comput. Appl. Math. 51 (1994) 395–409.
[11] N. Bogolyubov, N. Krylov, On Rayleigh's principle in the theory of differential equations of mathematical physics and upon Euler's method in the calculus of variations, Acad. Sci. Ukraine (Phys. Math.) 3 (1926) 3–22 (in Russian).
[12] H.C. Bolton, H.I. Scoins, Eigenvalues of differential equations by finite-difference methods, Proc. Cambridge Philos. Soc. 52 (1956) 215–229.
[13] C. Brezinski, Application de l'ε-algorithme à la résolution des systèmes non linéaires, C.R. Acad. Sci. Paris 271 A (1970) 1174–1177.
[14] C. Brezinski, Accélération de suites à convergence logarithmique, C.R. Acad. Sci. Paris 273 A (1971) 727–730.
[15] C. Brezinski, Etude sur les ε- et ρ-algorithmes, Numer. Math. 17 (1971) 153–162.
Acknowledgements

I would like to thank Jet Wimp for his careful reading of the paper. He corrected my English in many places, asked me to provide more explanations when needed, and suggested many improvements in the presentation. I am also indebted to Naoki Osada for his information about Takakazu Seki.
Journal of Computational and Applied Mathematics 122 (2000) 23–35 www.elsevier.nl/locate/cam
On the history of multivariate polynomial interpolation
Mariano Gasca (a,∗), Thomas Sauer (b)
(a) Department of Applied Mathematics, University of Zaragoza, 50009 Zaragoza, Spain
(b) Institute of Mathematics, University Erlangen-Nürnberg, Bismarckstr. 1 1/2, D-91054 Erlangen, Germany
Received 7 June 1999; received in revised form 8 October 1999
Abstract
Multivariate polynomial interpolation is a basic and fundamental subject in Approximation Theory and Numerical Analysis, which has received and continues receiving not deep but constant attention. In this short survey, we review its development in the first 75 years of this century, including a pioneering paper by Kronecker in the 19th century. © 2000 Elsevier Science B.V. All rights reserved.
1. Introduction

Interpolation, by polynomials or other functions, is a rather old method in applied mathematics. This is already indicated by the fact that, apparently, the word "interpolation" itself was introduced by J. Wallis as early as 1655, as is claimed in [13]. Compared to this, polynomial interpolation in several variables is a relatively new topic and probably only started in the second half of the last century with the work in [6,22]. If one considers, for example, the Encyklopädie der Mathematischen Wissenschaften [13] (Encyclopedia of Mathematical Sciences), originated by the Preußische Akademie der Wissenschaften (Prussian Academy of Sciences) to sum up the "state of the art" of mathematics at its time, then the part on interpolation, written by J. Bauschinger (Bd. I, Teil 2), mentions only one type of multivariate interpolation, namely (tensor) products of sine and cosine functions in two variables, however, without being very specific. The French counterpart, the Encyclopédie des Sciences Mathématiques [14], also contains a section on interpolation (Tome I, vol. 4), where Andoyer translated and extended Bauschinger's exposition. Andoyer is even more
∗ Corresponding author.
E-mail addresses: [email protected] (M. Gasca), [email protected] (T. Sauer).
0377-0427/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0377-0427(00)00353-8
explicit with his opinion on multivariate polynomial interpolation, by making the following statement, which we think time has contradicted:

Il est manifeste que l'interpolation des fonctions de plusieurs variables ne demande aucun principe nouveau, car dans tout ce qui précède le fait que la variable indépendante était unique n'a souvent joué aucun rôle.¹

Nevertheless, despite Andoyer's negative assessment, multivariate polynomial interpolation has received not deep but constant attention from one part of the mathematical community and is today a basic subject in Approximation Theory and Numerical Analysis with applications to many mathematical problems. Of course, this field has definitely been influenced by the availability of computational facilities, and this is one of the reasons that more papers have been published about this subject in the last 25 years than in the preceding 75. To our knowledge, there is no paper before the present one surveying the early papers and books on multivariate polynomial interpolation. Our aim is a first, modest attempt to fill this gap. We do not claim to be exhaustive and, in particular, recognize our limitations with respect to the Russian literature. Moreover, it has to be mentioned that the early results on multivariate interpolation usually appear in the context of many different subjects. For example, papers on cubature formulas frequently have some part devoted to it. Another connection is Algebraic Geometry, since the solvability of a multivariate interpolation problem relies on the fact that the interpolation points do not lie on an algebraic surface of a certain type. So it is difficult to verify precisely if and when a result appeared somewhere for the first time, or if it had already appeared, probably even in an implicit way, in a different context. We remark that another paper in this volume [25] deals, complementarily, with recent results in the subject; see also [16]. Throughout the present paper we denote by $\Pi_k^d$ the space of d-variate polynomials of total degree not greater than k.

2. Kronecker, Jacobi and multivariate interpolation

Bivariate interpolation by the tensor product of univariate interpolation functions, that is, when the variables are treated separately, is the classical approach to multivariate interpolation. However, when the set of interpolation points is not a Cartesian product grid, it is impossible to use that idea. Today, given any set of interpolation points, there exist many methods² to construct an adequate polynomial space which guarantees unisolvence of the interpolation problem. Surprisingly, this idea of constructing an appropriate interpolation space was already pursued by Kronecker [22] in a widely unknown paper from 1865, which seems to be the first treatment of multivariate polynomial interpolation with respect to fairly arbitrary point configurations. Besides the mathematical elegance of this approach, we think it is worthwhile to devote some detailed attention to this paper and to restate its main ideas in today's terminology, in particular as it uses the "modern" approach of connecting polynomial interpolation to the theory of polynomial ideals.

¹ It is clear that the interpolation of functions of several variables does not demand any new principles, because in the above exposition the fact that the variable was unique has frequently not played any role.
² See [16,25] for exposition and references.
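Before following Kronecker's construction, a tiny numerical aside (our illustration, not part of the original paper) shows why the configuration of points matters in several variables: three collinear points do not determine an interpolant from the three-dimensional space spanned by {1, x, y}, while three non-collinear points do.

```python
import numpy as np

def collocation(points):
    # collocation matrix for the basis {1, x, y} of affine polynomials
    return np.array([[1.0, x, y] for (x, y) in points])

collinear = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # all on the line y = x
generic   = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

print(np.linalg.det(collocation(collinear)))  # 0.0: not unisolvent
print(np.linalg.det(collocation(generic)))    # nonzero: unisolvent
```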
Kronecker’s method to construct an interpolating polynomial assumes that the disjoint nodes z1 ; : : : ; zN ∈ Cd are given in implicit form, i.e., they are (all) the common simple zeros of d polynomials f1 ; : : : ; fd ∈ C[z] = C[1 ; : : : ; d ]. Note that the nonlinear system of equations fj (1 ; : : : ; d ) = 0;
j = 1; : : : ; d;
(1)
is a square one, that is, the number of equations and the number of variables coincide. We are interested in the nite variety V of solutions of (1) which is given as V :={z1 ; : : : ; zN } = {z ∈ Cd : f1 (z) = · · · = fd (z) = 0}:
(2)
The primary decomposition according to the variety V allows us to write the ideal I(V )={p : p(z)= 0; z ∈ V } as I(V ) =
N \
h1 − k; 1 ; : : : ; d − k; d i;
k=1
where zk = (k; 1 ; : : : ; k; d ). In other words, since fk ∈ I(V ), k = 1; : : : ; d, any of the polynomials f1 ; : : : ; fd can be written, for k = 1; : : : ; N , as fj =
d X i=1
gi;k j (·)(i − k; i );
(3)
where gi;k j are appropriate polynomials. Now consider the d × d square matrices of polynomials Gk = [gi;k j : i; j = 1; : : : ; d];
k = 1; : : : ; N
and note that, due to (3), and the assumption that fj (zk ) = 0; j = 1; : : : ; d; k = 1; : : : ; N , we have
f1 (Zj ) (j; 1 − k; 1 ) .. .. 0 = . = Gk (zj ) ; . fd (zj )
k = 1; : : : ; N:
(4)
(j; d − k; d )
Since the interpolation nodes are assumed to be disjoint, this means that for all j 6= k the matrix Gk (zj ) is singular, hence the determinant of Gk (zj ) has to be zero. Moreover, the assumption that z1 ; : : : ; zN are simple zeros guarantees that det Gk (zk ) 6= 0. Then, Kronecker’s interpolant takes, for any f : Cd → C, the form Kf =
N X
f(zj )
j=1
Hence,
det Gk (·) : det Gk (zk )
(5)
det Gk (·) : k = 1; : : : ; N P = span det Gk (zk ) is an interpolation space for the interpolation nodes z1 ; : : : ; zN . Note that this method does not give only one interpolation polynomial but in general several dierent interpolation spaces, depending on how the representation in (3) is chosen. In any way, note that for each polynomial f ∈ C[z] the dierence f−
N X j=1
f(zj )
det Gk (z) det Gk (zk )
26
M. Gasca, T. Sauer / Journal of Computational and Applied Mathematics 122 (2000) 23–35
belongs to the ideal hf1 ; : : : ; fd i, hence there exist polynomials q1 ; : : : ; qd such that N X
d X det Gk (z) f− f(zj ) qj fj : = det Gk (zk ) j=1 j=1
(6)
Moreover, as Kronecker points out, the “magic” polynomials gi;k j can be chosen such that their leading homogeneous terms, say Gi;k j , coincide with the leading homogeneous terms of (1=deg fj )@fj =@i . If we denote by Fj the leading homogeneous term of fj , j = 1; : : : ; d, then this means that Gi;k j =
1 @Fj ; deg Fj @i
i; j = 1; : : : ; d;
k = 1; : : : ; N:
(7)
But this implies that the homogeneous leading term of the “fundamental” polynomials det Gk coincides, after this particular choice of gi;k j , with
@Fj 1 det : i; j = 1; : : : ; d ; g= deg f1 · · · deg fd @i which is independent of k now; in other words, there exist polynomials gˆk ; k = 1; : : : ; N , such that deg gˆk ¡ deg g and det Gk = g + gˆk . Moreover, g is a homogeneous polynomial of degree at most deg f1 + · · · + deg fd − d. Now, let p be any polynomial, then Kp =
N X
p(zj )
j=1
N N X X p(zj ) p(zj ) det Gj (·) =g + gˆ : det Gj (zj ) det Gj (zj ) j=1 det Gj (zj ) j j=1
(8)
Combining (8) with (6) then yields the existence of polynomials q1 ; : : : ; qd such that p=g
N X j=1
N d X X p(zj ) p(zj ) + gˆj + qj fj det Gj (zj ) j=1 det Gj (zj ) j=1
and comparing homogeneous terms of degree deg g Kronecker realized that either, for any p such that deg p ¡ deg g, N X j=1
p(zj ) =0 det Gj (zj )
(9)
or there exist homogeneous polynomials h1 ; : : : ; hd such that g=
d X
hj det Fj :
(10)
j=1
The latter case, Eq. (10), says (in algebraic terminology) that there is a syzygy among the leading terms of the polynomials Fj ; j = 1; : : : ; d, and is equivalent to the fact that N ¡ deg f1 · · · deg fd , while (9) describes and even characterizes the complete intersection case that N = deg f1 · · · deg fd . In his paper, Kronecker also mentions that the condition (10) has been overlooked in [21]. Jacobi dealt there with the common zeros of two bivariate polynomials and derived explicit representations for the functional [z1 ; : : : ; zN ]f:=
N X j=1
f(zj ) ; det Gj (zj )
(11)
In addition, Kronecker refers to a paper [6] which he says treats the case of symmetric functions, probably elementary symmetric polynomials. Unfortunately, this paper has been unavailable to us so far.
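To make Kronecker's construction concrete, the following minimal sketch (our own illustration, not part of the original paper) works out the simplest nontrivial case d = 2 with f₁(x,y) = x² − 1 and f₂(x,y) = y² − 1, whose common simple zeros are the four points (±1, ±1). Here representation (3) can be written down by hand — f₁ = (x + ζ_{k,1})(x − ζ_{k,1}) + 0·(y − ζ_{k,2}) and f₂ = 0·(x − ζ_{k,1}) + (y + ζ_{k,2})(y − ζ_{k,2}) — so that det G_k(x,y) = (x + ζ_{k,1})(y + ζ_{k,2}), and the fundamental polynomials in (5) turn out to be exactly the bilinear Lagrange polynomials on the grid {−1,1}².

```python
import numpy as np

# Nodes: common zeros of f1 = x^2 - 1 and f2 = y^2 - 1.
nodes = [(x, y) for x in (-1.0, 1.0) for y in (-1.0, 1.0)]

def det_G(k, x, y):
    """det G_k(x, y) = (x + xk)(y + yk) for this choice of g_{i,j}^k."""
    xk, yk = nodes[k]
    return (x + xk) * (y + yk)

def kronecker_interpolant(f):
    """Return Kf as in (5): sum of f(z_j) * det G_j(.) / det G_j(z_j)."""
    weights = [f(*z) / det_G(j, *z) for j, z in enumerate(nodes)]
    return lambda x, y: sum(w * det_G(j, x, y) for j, w in enumerate(weights))

f = lambda x, y: np.exp(x) * np.sin(y)      # any test function
Kf = kronecker_interpolant(f)

for z in nodes:                              # Kf reproduces f at every node
    assert abs(Kf(*z) - f(*z)) < 1e-12
print("interpolation conditions verified; Kf(0,0) =", Kf(0.0, 0.0))
```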
3. Bivariate tables, the natural approach

Only very few research papers on multivariate polynomial interpolation were published during the first part of this century. In the classical book Interpolation [45], where one section (Section 19) is devoted to this topic, the author only refers to two related papers, recent at that time (1927), namely [27,28]. The latter one, [28], turned out to be inaccessible to us, unfortunately, but it is not difficult to guess that it might have pursued a tensor product approach, because this is the unique point of view of [45] (see also [31]). The formulas given in [27] are Newton formulas for tensor product interpolation in two variables, and the author, Narumi, claims (correctly) that they can be extended to "many variables". Since it is a tensor product approach, the interpolation points are of the form $(x_i, y_j)$, $0 \le i \le m$, $0 \le j \le n$, with $x_i, y_j$ arbitrarily distributed on the axes OX and OY, respectively. Bivariate divided differences for these sets of points are obtained in [27], by recurrence, separately for each variable. With the usual notations, the interpolation formula from [27] reads as
$$p(x,y) = \sum_{i=0}^{m} \sum_{j=0}^{n} f[x_0,\ldots,x_i;\, y_0,\ldots,y_j] \prod_{h=0}^{i-1}(x-x_h) \prod_{k=0}^{j-1}(y-y_k), \tag{12}$$
where empty products have the value 1. Remainder formulas based on the mean value theorem are also derived recursively in [27] from the corresponding univariate error formulas: for $f$ sufficiently smooth there exist values $\xi, \xi', \eta, \eta'$ such that
$$R(x,y) = \frac{\partial^{m+1} f(\xi,y)}{\partial x^{m+1}}\,\frac{\prod_{h=0}^{m}(x-x_h)}{(m+1)!} + \frac{\partial^{n+1} f(x,\eta)}{\partial y^{n+1}}\,\frac{\prod_{k=0}^{n}(y-y_k)}{(n+1)!} - \frac{\partial^{m+n+2} f(\xi',\eta')}{\partial x^{m+1}\,\partial y^{n+1}}\,\frac{\prod_{h=0}^{m}(x-x_h)}{(m+1)!}\,\frac{\prod_{k=0}^{n}(y-y_k)}{(n+1)!}. \tag{13}$$
The special case of equidistant points on both axes is particularly considered in [27], and since the most popular formulas at that time were based on finite differences with equally spaced arguments, Narumi shows how to extend the Gauss, Bessel and Stirling univariate interpolation formulas for equidistant points to the bivariate case by tensor product. He also applies the formulas he obtained to approximate the values of bivariate functions, but he mentions that some of his formulas had already been used in [49].

In [45], the Newton formula (12) is obtained in the same way, with the corresponding remainder formula (13). Moreover, Steffensen considers a more general case, namely when for each $i$, $0 \le i \le m$, the interpolation points are of the form $y_0,\ldots,y_{n_i}$, with $0 \le n_i \le n$. Now, with a similar argument, the interpolating polynomial becomes
$$p(x,y) = \sum_{i=0}^{m} \sum_{j=0}^{n_i} f[x_0,\ldots,x_i;\, y_0,\ldots,y_j] \prod_{h=0}^{i-1}(x-x_h) \prod_{k=0}^{j-1}(y-y_k) \tag{14}$$
with a slightly more complicated remainder formula. The most interesting particular cases occur when $n_i = n$, which is the Cartesian product considered above, and when $n_i = m - i$. This triangular case (triangular not because of the geometrical distribution of the interpolation points, but of the indices $(i,j)$) gives rise to the interpolating polynomial
$$p(x,y) = \sum_{i=0}^{m} \sum_{j=0}^{m-i} f[x_0,\ldots,x_i;\, y_0,\ldots,y_j] \prod_{h=0}^{i-1}(x-x_h) \prod_{k=0}^{j-1}(y-y_k), \tag{15}$$
that is,
$$p(x,y) = \sum_{0 \le i+j \le m} f[x_0,\ldots,x_i;\, y_0,\ldots,y_j] \prod_{h=0}^{i-1}(x-x_h) \prod_{k=0}^{j-1}(y-y_k). \tag{16}$$
Steffensen refers for this formula to Biermann's lecture notes [4] from 1905, and actually it seems that Biermann was the first who considered polynomial interpolation on the triangular grid, in a paper [3] from 1903 (cf. [44]) in the context of cubature. Since the triangular case corresponds to looking at the "lower triangle" of the tensor product situation only, this case can be resolved by tensor product methods. In particular, the respective error formula can be written as
$$R(x,y) = \sum_{i=0}^{m+1} \frac{\partial^{m+1} f(\xi_i,\eta_i)}{\partial x^{i}\,\partial y^{m+1-i}}\,\frac{\prod_{h=0}^{i-1}(x-x_h)}{i!}\,\frac{\prod_{k=0}^{m-i}(y-y_k)}{(m-i+1)!}. \tag{17}$$
In the case of the Cartesian product, Steffensen also provides the Lagrange formula for (12), which can obviously be obtained by tensor product of univariate formulas. Remainder formulas based on intermediate points $(\xi_i, \eta_i)$ can be written in many different forms. For them we refer to Stancu's paper [44], which also contains a brief historical introduction where the author refers, among others, to [3,15,27,40,41]. Multivariate remainder formulas with Peano (spline) kernel representation, however, have not been derived until very recently in [42] and, in particular, in [43], which treats the triangular situation.

4. Salzer's papers: from bivariate tables to general sets

In 1944, Salzer [33] considered the interpolation problem at points of the form $(x_1 + s_1 h_1,\ldots,x_n + s_n h_n)$, where (i) $(x_1,\ldots,x_n)$ is a given point in $\mathbb{R}^n$, (ii) $h_1,\ldots,h_n$ are given real numbers, and (iii) $s_1,\ldots,s_n$ are nonnegative integers summing up to $m$. This is the multivariate extension of the triangular case (16) for equally spaced arguments, where finite differences can be used. Often, different names are used for the classical Newton interpolation formula in the case of equally spaced arguments using forward differences: Newton–Gregory, Harriot–Briggs, also known by Mercator and Leibnitz, etc. See [18] for a nice discussion of this issue. In [33], Salzer takes the natural multivariate extension of this formula, considering the polynomial $q(t_1,\ldots,t_n) := p(x_1 + t_1 h_1,\ldots,x_n + t_n h_n)$ of total degree not greater than $m$ in the variables $t_1,\ldots,t_n$, which interpolates a function $f(x_1 + t_1 h_1,\ldots,x_n + t_n h_n)$ at the points corresponding to $t_i = s_i$, $i = 1,\ldots,n$, where the $s_i$ are all nonnegative integers such that $0 \le s_1 + \cdots + s_n \le m$. The formula, which is called in [33] a multiple Gregory–Newton formula, is rewritten there in terms of the values of the function $f$ at the interpolation points, i.e., in the form
$$q(t_1,\ldots,t_n) = \sum_{s_1+\cdots+s_n \le m} \binom{t_1}{s_1}\cdots\binom{t_n}{s_n}\binom{m-t_1-\cdots-t_n}{m-s_1-\cdots-s_n}\, f(x_1+s_1h_1,\ldots,x_n+s_nh_n). \tag{18}$$
Note that (18) is the Lagrange formula for this interpolation problem. Indeed, each function
$$\binom{t_1}{s_1}\cdots\binom{t_n}{s_n}\binom{m-t_1-\cdots-t_n}{m-s_1-\cdots-s_n} \tag{19}$$
is a polynomial in $t_1,\ldots,t_n$ of total degree $m$ which vanishes at all points $(t_1,\ldots,t_n)$ with $t_i$ nonnegative integers, $0 \le t_1 + \cdots + t_n \le m$, except at the point $(s_1,\ldots,s_n)$, where it takes the value 1. In particular, for $n = 1$ we get the well-known univariate Lagrange polynomials
$$\ell_s(t) = \binom{t}{s}\binom{m-t}{m-s} = \prod_{0\le i\le m,\ i\ne s} \frac{t-i}{s-i}$$
for $s = 0,\ldots,m$.

Salzer used these results in [34] to compute tables for the polynomials (18) and, some years later in [35], he studied in a similar form how to get the Lagrange formula for the more general case of formula (16), even starting with this formula. He obtained the multivariate Lagrange polynomials by a rather complicated expression involving the univariate ones. It should be noted that several books related to computations and numerical methods published around this time include parts on multivariate interpolation to some extent — surprisingly, more than most of the recent textbooks in Numerical Analysis. We have already mentioned Steffensen's book [45], but we should also mention Whittaker and Robinson [51, pp. 371–374], Mikeladze [26, Chapter XVII] and especially Kunz [23, pp. 248–274], but also Isaacson and Keller [20, pp. 294–299] and Berezin and Zhidkov [2, pp. 156–194], although in none of them really much more than in [45] is told.

In [36,37], Salzer introduced a concept of bivariate divided differences, abandoning the idea of iteration for each variable $x$ and $y$ taken separately. Apparently, this was the first time (in spite of the similarity with (11)) that bivariate divided differences were explicitly defined for irregularly distributed sets of points. Divided differences with repeated arguments are also considered in [37], by coalescence of the ones with different arguments. Since [36] was just a first attempt at [37], we only explain the latter one. Salzer considers the set of monomials $\{x^i y^j\}$, with $i, j$ nonnegative integers, ordered in a graded lexical term order, that is,
$$(i,j) < (h,k) \iff i+j < h+k \quad\text{or}\quad i+j = h+k,\ i > h. \tag{20}$$
Hence, the monomials are listed as
$$\{1,\, x,\, y,\, x^2,\, xy,\, y^2,\, x^3,\,\ldots\}. \tag{21}$$
For any set of $n+1$ points $(x_i, y_i)$, Salzer defines the associated divided difference
$$[01\ldots n]f := \sum_{k=0}^{n} A_k f(x_k, y_k), \tag{22}$$
choosing the coefficients $A_k$ in such a form that (22) vanishes when $f$ is any of the first $n$ monomials of list (21) and takes the value 1 when $f$ is the $(n+1)$st monomial of that list. In other words, the coefficients $A_k$ are the solution of the linear system
$$\sum_{k=0}^{n} A_k x_k^i y_k^j = 0,\quad x^iy^j \text{ any of the first } n \text{ monomials of (21)};\qquad \sum_{k=0}^{n} A_k x_k^i y_k^j = 1,\quad x^iy^j \text{ the } (n+1)\text{th monomial of (21)}. \tag{23}$$
These generalized divided differences share some of the properties of the univariate ones, but not all. Moreover, they have some limitations: for example, they exist only if the determinant of the coefficients in (23) is different from zero, and one has no control of that property in advance. On the other hand, observe that, for example, the simple divided difference with two arguments $(x_0,y_0)$ and $(x,y)$, which is
$$\frac{f(x,y) - f(x_0,y_0)}{x - x_0},$$
gives, when applied to $f(x,y) = xy$, the rational function
$$\frac{xy - x_0y_0}{x - x_0}$$
and not a polynomial of lower degree. In fact, Salzer's divided differences did not have great success. Several other definitions of multivariate divided differences have appeared since then, trying to keep as many as possible of the good properties of univariate divided differences, cf. [16].

5. Reduction of a problem to other simpler ones

Around the 1950s an important change of paradigm happened in multivariate polynomial interpolation, as several people began to investigate more general distributions of points, and not only (special) subsets of Cartesian products. So, when studying cubature formulae [32], Radon observed the following in 1948: if a bivariate interpolation problem with respect to a set $T \subset \mathbb{R}^2$ of $\binom{k+2}{2}$ interpolation points is unisolvent in $\Pi_k^2$, and $U$ is a set of $k+2$ points on an arbitrary straight line $\ell \subset \mathbb{R}^2$ such that $\ell \cap T = \emptyset$, then the interpolation problem with respect to $T \cup U$ is unisolvent in $\Pi_{k+1}^2$. Radon made use of this observation to build up point sets which give rise to unisolvent interpolation problems for $\Pi_m^2$ recursively by degree. Clearly, these interpolation points immediately yield interpolatory cubature formulae.

The well-known Bezout theorem, cf. [50], states that two planar algebraic curves of degree $m$ and $n$, with no common component, intersect each other at exactly $mn$ points in an algebraic closure of the underlying field, counting multiplicities. This theorem has many interesting consequences for bivariate interpolation problems, extensible to higher dimensions. For example, no unisolvent interpolation problem in $\Pi_n^2$ can have more than $n+1$ collinear points. Radon's method in [32] is a consequence of this type of observation, and some other more recent results of different authors can also be deduced in a similar form, as we shall see later.

Another example of a result which shows the more general point of view taken in multivariate interpolation at that time is due to Thacher Jr. and Milne [47] (see also [48]). Consider two univariate interpolation problems in $\Pi_{n-1}^1$, with $T_1, T_2$ as respective sets of interpolation points, both of cardinality $n$. Assume that $T_1 \cap T_2$ has cardinality $n-1$, hence $T = T_1 \cup T_2$ has cardinality $n+1$. The univariate Aitken–Neville interpolation formula combines the solutions of the two smaller problems based on $T_1$ and $T_2$ to obtain the solution in $\Pi_n^1$ of the interpolation problem with $T$ as the underlying set of interpolation points. The main idea is to find a partition of unity, in this case affine polynomials $\ell_1, \ell_2$, i.e., $\ell_1 + \ell_2 = 1$, such that $\ell_1(T_2 \setminus T_1) = \ell_2(T_1 \setminus T_2) = 0$, and then combine the solutions $p_1, p_2$ with respect to $T_1, T_2$ into the solution $\ell_1 p_1 + \ell_2 p_2$ with respect to $T$. This method was developed in the 1930s independently by Aitken and Neville, with the goal of avoiding the explicit use of divided differences in the computation of univariate Lagrange polynomial interpolants.

It was exactly this idea which Thacher and Milne extended to the multivariate case in [47]. Let us sketch their approach in the bivariate case. For example, consider an interpolation problem with 10 interpolation points, namely the set $T = \{(i,j) : 0 \le i+j \le 3\}$, where $i, j$ are nonnegative integers, and the interpolation space $\Pi_3^2$. The solution $p_T$ of this problem is obtained in [47] from the solutions $p_{T_k} \in \Pi_2^2$, $k = 1,2,3$, of the three interpolation problems with respect to the six-point sets $T_k \subset T$, $k = 1,2,3$, where
$$T_1 = \{(i,j) : 0 \le i+j \le 2\}, \quad T_2 = \{(i,j) \in T : i > 0\}, \quad T_3 = \{(i,j) \in T : j > 0\}.$$
Then
$$p_T = \ell_1 p_{T_1} + \ell_2 p_{T_2} + \ell_3 p_{T_3},$$
where $\ell_k$, $k = 1,2,3$, are appropriate polynomials of degree 1. In fact, in this case these polynomials are the barycentric coordinates relative to the simplex (0,0), (3,0), (0,3), and thus a partition of unity. In [47] the problem is studied in $d$ variables, and in that case $d+1$ "small" problems, with respective interpolation sets $T_k$, $k = 1,\ldots,d+1$, with a simplicial structure (the analogue of the triangular grid), are used to obtain the solution of the full problem with $T = T_1 \cup \cdots \cup T_{d+1}$ as interpolation points.

In 1970, Guenter and Roetman [19], among other observations, made a very interesting remark, which connects to the Radon/Bezout context and deserves to be explained here. Let us consider a set $T$ of $\binom{m+d}{d}$ points in $\mathbb{R}^d$, where exactly $\binom{m+d-1}{d-1}$ of these points lie on a hyperplane $H$. Then $T \setminus H$ consists of $\binom{m-1+d}{d}$ points. Let us denote by $\Pi_{d,H}^m$ the space of polynomials of $\Pi_d^m$ with the variables restricted to $H$, which is isomorphic to $\Pi_{d-1}^m$. If the interpolation problems defined by the sets $T \setminus H$ and $T \cap H$ are unisolvent in the spaces $\Pi_d^{m-1}$ and $\Pi_{d,H}^m$, respectively, then the interpolation problem defined by $T$ is unisolvent in $\Pi_d^m$. In other words, the idea is to decompose, whenever possible, a problem of degree $m$ in $d$ variables into two simpler problems: one of degree $m$ in $d-1$ variables and the other one of degree $m-1$ in $d$ variables.

6. The finite element approach

In 1943, Courant [11] suggested a finite difference method applicable to boundary value problems arising from variational problems. It is considered one of the motivations of the finite element method, which emerged from the engineering literature during the 1950s. It is a variational method of approximation which makes use of the Rayleigh–Ritz–Galerkin technique. The method became very successful, with hundreds of technical papers published (see, e.g., the monograph [52]), even before its mathematical basis was completely understood at the end of the 1960s.

Involved in the process of the finite element method there are local polynomial interpolation problems, generally for polynomials of low degree and thus with only few interpolation data. The global solution obtained by solving all the local interpolation problems is a piecewise polynomial of a certain regularity, depending on the amount and type of interpolation data on the common boundary between pieces. Some of the interest in multivariate polynomial interpolation during the 1960s and 1970s was due to this method. Among the most interesting mathematical papers of that time in finite elements we can mention [53,5]; see also the book [46] by Strang and Fix, but, in our opinion, the most relevant papers and book from the point of view of multivariate polynomial interpolation are due to Ciarlet et al., for example [7–9].

In 1972, Nicolaides [29,30] put the classical problem of interpolation on a simplicial grid of $\binom{m+d}{d}$ points of $\mathbb{R}^d$, regularly distributed, forming what he called a principal lattice, into the finite element context. He actually used barycentric coordinates for the Lagrange formula and, moreover, gave the corresponding error representations; see also [7]. However, much of this material can already be found in [3]. In general, taking into account that these results appeared under different titles, in a different context and in journals not accessible everywhere, it is not so surprising any more how often the basic facts on the interpolation problem with respect to the simplicial grid have been rediscovered.

7. Hermite problems

The use of partial or directional derivatives as interpolation data in the multivariate case had not received much attention prior to the finite element method, where they were frequently used. It seems natural to approach partial derivatives by coalescence, as in univariate Hermite interpolation problems. However, things are unfortunately much more complicated in several variables. As was already pointed out by Salzer and Kimbro [39] in 1958, the Hermite interpolation problem based on the values of a bivariate function $f(x,y)$ at two distinct points $(x_1,y_1), (x_2,y_2)$ and on the values of the partial derivatives $\partial f/\partial x$, $\partial f/\partial y$ at each of these two points is not solvable in the space $\Pi_2^2$ for any choice of points, although the number of interpolation conditions coincides with the dimension of the desired interpolation space. Some years later, Ahlin [1] circumvented some of these problems by using a tensor product approach: $k^2$ derivatives $\partial^{p+q} f/\partial x^p\,\partial y^q$ with $0 \le p, q \le k-1$ are prescribed at the $n^2$ points of a Cartesian product. The interpolation space is the one spanned by $x^\alpha y^\beta$ with $0 \le \alpha, \beta \le nk-1$, and a formula for the solution is easily obtained.
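The Salzer–Kimbro observation above is easy to verify numerically. The following sketch (our illustration, not from [39]) assembles the 6×6 collocation matrix for the conditions $f$, $\partial f/\partial x$, $\partial f/\partial y$ at two points with respect to the monomial basis $\{1, x, y, x^2, xy, y^2\}$ of $\Pi_2^2$ and shows that it is always singular: if $\ell$ is an affine polynomial vanishing on the line through the two points, then $p = \ell^2$ satisfies all six homogeneous conditions, so the matrix has a nontrivial kernel.

```python
import numpy as np

def rows(x, y):
    """Value and first-order partial derivatives of the basis
    {1, x, y, x^2, xy, y^2} at the point (x, y)."""
    val = [1.0, x, y, x * x, x * y, y * y]
    ddx = [0.0, 1.0, 0.0, 2 * x, y, 0.0]
    ddy = [0.0, 0.0, 1.0, 0.0, x, 2 * y]
    return [val, ddx, ddy]

rng = np.random.default_rng(0)
for _ in range(5):
    z1, z2 = rng.standard_normal(2), rng.standard_normal(2)
    M = np.array(rows(*z1) + rows(*z2))    # 6 conditions, 6 unknowns
    print("det =", np.linalg.det(M))        # ~0 for every choice of points
```

The printed determinants vanish up to rounding, confirming that no choice of two distinct points makes this Hermite problem unisolvent in $\Pi_2^2$.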
We must mention that Salzer came back to bivariate interpolation problems with derivatives in [38], studying hyperosculatory interpolation over Cartesian grids, that is, interpolation problems where all partial derivatives of first and second order and the value of the function are known at the interpolation points. Salzer gave some special configurations of points which yield solvability of this type of interpolation problem in an appropriate polynomial space and also provided the corresponding remainder formulae. Nowadays, Hermite and Hermite–Birkhoff interpolation problems have been studied much more systematically; see [16,25] for references.
8. Other approaches

In 1966, Coatmelec [10] studied the approximation of functions of several variables by linear operators, including interpolation operators. At the beginning of the paper, he only considered interpolation operators based on values of point evaluations of the function, but later he also used values of derivatives. In this framework he obtained some qualitative and quantitative results on the approximation order of polynomial interpolation. At the end of [10], Coatmelec also includes some examples in $\mathbb{R}^2$ of points which are distributed irregularly along lines: $n+1$ of the points on a line $r_0$, $n$ of them on another line $r_1$ but not on $r_0$, and so on, until 1 point is chosen on a line $r_n$ but not on $r_0 \cup \cdots \cup r_{n-1}$. He then points out the unisolvence of the corresponding interpolation problem in $\Pi_n^2$, which is, in fact, again a consequence of Bezout's theorem, as in [32] (a small numerical illustration is given at the end of this section).

In 1971, Glaeser [17] considers Lagrange interpolation in several variables from an abstract algebraic/analytic point of view and acknowledges the inconvenience of working with particular systems of interpolation points, due to the possibility of the nonexistence of a solution, in contrast to the univariate case. This is due to the nonexistence of polynomial spaces of dimension $k > 1$ in more than one variable such that the Lagrange interpolation problem has a unique solution for any system of $k$ interpolation points. In other words, there are no nontrivial Haar (or Chebyshev) spaces any more for two and more variables, cf. [12] or [24]. In [17], polynomial spaces with dimension greater than the number of interpolation conditions are considered in order to overcome this problem. Glaeser investigated these underdetermined systems, which he introduced as interpolation schemes in [17], and also studied the problem of how to particularize the affine space of all solutions of a given interpolation problem in order to obtain a unique solution. This selection process is done in such a way that it controls the variation of the solution when two systems of interpolation points are very "close" to each other, with the goal of obtaining a continuous selection process.
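As the promised illustration of the Coatmelec-type configurations (ours, with a concrete choice of lines made for the smallest nontrivial case n = 2), the following sketch places 3, 2 and 1 points on three distinct lines and verifies that the collocation matrix for $\Pi_2^2$ is nonsingular:

```python
import numpy as np

def basis(x, y):
    # monomial basis of bivariate polynomials of total degree <= 2
    return [1.0, x, y, x * x, x * y, y * y]

# 3 points on the line y = 0, 2 on y = 1, 1 on y = 2 (lines r0, r1, r2)
points = [(0, 0), (1, 0), (2, 0),
          (0, 1), (1, 1),
          (0, 2)]

V = np.array([basis(x, y) for (x, y) in points])
print("det =", np.linalg.det(V))   # nonzero: the problem is unisolvent
```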
Acknowledgements

1. We thank Carl de Boor for several references, in particular for pointing out to us the paper [32], with the result mentioned at the beginning of Section 4, related to Bezout's theorem. We are also grateful for the help of Elena Ausejo, from the group of History of Sciences of the University of Zaragoza, in the search for references.
2. M. Gasca has been partially supported by the Spanish Research Grant PB96 0730.
3. T. Sauer was supported by a DFG Heisenberg fellowship, Grant SA 627/6.
References
[1] A.C. Ahlin, A bivariate generalization of Hermite's interpolation formula, Math. Comput. 18 (1964) 264–273.
[2] I.S. Berezin, N.P. Zhidkov, Computing Methods, Addison-Wesley, Reading, MA, 1965 (Russian version in 1959).
[3] O. Biermann, Über näherungsweise Kubaturen, Monatshefte Math. Phys. 14 (1903) 211–225.
[4] O. Biermann, Vorlesungen über Mathematische Näherungsmethoden, Vieweg, Braunschweig, 1905.
[5] G. Birkhoff, M.H. Schultz, R.S. Varga, Piecewise Hermite interpolation in one and two variables with applications to partial differential equations, Numer. Math. 11 (1968) 232–256.
[6] W. Borchardt, Über eine Interpolationsformel für eine Art symmetrischer Funktionen und deren Anwendung, Abh. d. Preuß. Akad. d. Wiss. (1860) 1–20.
[7] P.G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.
[8] P.G. Ciarlet, P.A. Raviart, General Lagrange and Hermite interpolation in R^n with applications to finite element methods, Arch. Rational Mech. Anal. 46 (1972) 178–199.
[9] P.G. Ciarlet, C. Wagschal, Multipoint Taylor formulas and applications to the finite element method, Numer. Math. 17 (1971) 84–100.
[10] C. Coatmelec, Approximation et interpolation des fonctions différentiables de plusieurs variables, Ann. Sci. École Norm. Sup. 83 (1966) 271–341.
[11] R. Courant, Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49 (1943) 1–23.
[12] P.J. Davis, Interpolation and Approximation, Blaisdell, Waltham, MA, 1963 (2nd Edition, Dover, New York, 1975).
[13] Encyklopädie der mathematischen Wissenschaften, Teubner, Leipzig, 1900–1904.
[14] Encyclopédie des Sciences Mathématiques, Gauthier-Villars, Paris, 1906.
[15] I.A. Ezrohi, General forms of the remainder terms of linear formulas in multidimensional approximate analysis I, II, Mat. Sb. 38 (1956) 389–416 and 43 (1957) 9–28 (in Russian).
[16] M. Gasca, T. Sauer, Multivariate polynomial interpolation, Adv. Comput. Math. 12 (2000) 377–410.
[17] G. Glaeser, L'interpolation des fonctions différentiables de plusieurs variables, in: C.T.C. Wall (Ed.), Proceedings of Liverpool Singularities Symposium II, Lecture Notes in Mathematics, Vol. 209, Springer, Berlin, 1971, pp. 1–29.
[18] H.H. Goldstine, A History of Numerical Analysis from the 16th Through the 19th Century, Springer, Berlin, 1977.
[19] R.B. Guenter, E.L. Roetman, Some observations on interpolation in higher dimensions, Math. Comput. 24 (1970) 517–521.
[20] E. Isaacson, H.B. Keller, Analysis of Numerical Methods, Wiley, New York, 1966.
[21] C.G.J. Jacobi, Theoremata nova algebraica circa systema duarum aequationum inter duas variabiles propositarum, Crelle J. Reine Angew. Math. 14 (1835) 281–288.
[22] L. Kronecker, Über einige Interpolationsformeln für ganze Funktionen mehrerer Variabeln, Lecture at the academy of sciences, December 21, 1865, in: H. Hensel (Ed.), L. Kroneckers Werke, Vol. I, Teubner, Stuttgart, 1895, pp. 133–141 (reprinted by Chelsea, New York, 1968).
[23] K.S. Kunz, Numerical Analysis, McGraw-Hill, New York, 1957.
[24] G.G. Lorentz, Approximation of Functions, Chelsea, New York, 1966.
[25] R. Lorentz, Multivariate Hermite interpolation by algebraic polynomials: a survey, J. Comput. Appl. Math. 122 (2000), this volume.
[26] S.E. Mikeladze, Numerical Methods of Mathematical Analysis, translated from Russian, Office of Tech. Services, Department of Commerce, Washington, DC, pp. 521–531.
[27] S. Narumi, Some formulas in the theory of interpolation of many independent variables, Tohoku Math. J. 18 (1920) 309–321.
[28] L. Neder, Interpolationsformeln für Funktionen mehrerer Argumente, Skandinavisk Aktuarietidskrift (1926) 59.
[29] R.A. Nicolaides, On a class of finite elements generated by Lagrange interpolation, SIAM J. Numer. Anal. 9 (1972) 435–445.
[30] R.A. Nicolaides, On a class of finite elements generated by Lagrange interpolation II, SIAM J. Numer. Anal. 10 (1973) 182–189.
[31] K. Pearson, On the construction of tables and on interpolation, Vol. 2, Cambridge University Press, Cambridge, 1920.
[32] J. Radon, Zur mechanischen Kubatur, Monatshefte Math. Physik 52 (1948) 286–300.
[33] H.E. Salzer, Note on interpolation for a function of several variables, Bull. AMS 51 (1945) 279–280.
[34] H.E. Salzer, Table of coefficients for interpolating in functions of two variables, J. Math. Phys. 26 (1948) 294–305.
[35] H.E. Salzer, Note on multivariate interpolation for unequally spaced arguments with an application to double summation, J. SIAM 5 (1957) 254–262.
[36] H.E. Salzer, Some new divided difference algorithms for two variables, in: R.E. Langer (Ed.), On Numerical Approximation, University of Wisconsin Press, Madison, 1959, pp. 61–98.
[37] H.E. Salzer, Divided differences for functions of two variables for irregularly spaced arguments, Numer. Math. 6 (1964) 68–77.
[38] H.E. Salzer, Formulas for bivariate hyperosculatory interpolation, Math. Comput. 25 (1971) 119–133.
[39] H.E. Salzer, G.M. Kimbro, Tables for Bivariate Osculatory Interpolation over a Cartesian Grid, Convair Astronautics, 1958.
[40] A. Sard, Remainders: functions of several variables, Acta Math. 84 (1951) 319–346.
[41] A. Sard, Remainders as integrals of partial derivatives, Proc. Amer. Math. Soc. 3 (1952) 732–741.
[42] T. Sauer, Yuan Xu, On multivariate Lagrange interpolation, Math. Comput. 64 (1995) 1147–1170.
[43] T. Sauer, Yuan Xu, A case study in multivariate Lagrange interpolation, in: S.P. Singh (Ed.), Approximation Theory, Wavelets and Applications, Kluwer Academic Publishers, Dordrecht, 1995, pp. 443–452.
[44] D.D. Stancu, The remainder of certain linear approximation formulas in two variables, J. SIAM Numer. Anal. 1 (1964) 137–163.
[45] J.F. Steffensen, Interpolation, Chelsea, New York, 1927 (2nd Edition, 1950).
[46] G. Strang, G.J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, NJ, 1973.
[47] H.C. Thacher Jr., W.E. Milne, Interpolation in several variables, J. SIAM 8 (1960) 33–42.
[48] H.C. Thacher Jr., Derivation of interpolation formulas in several independent variables, Ann. N.Y. Acad. Sci. 86 (1960) 758–775.
[49] T.N. Thiele, Interpolationsrechnung, Teubner, Leipzig, 1909.
[50] R.S. Walker, Algebraic Curves, Springer, Berlin, 1978.
[51] E.T. Whittaker, G. Robinson, Calculus of Observations, 4th Edition, Blackie and Sons, London, 1944.
[52] O.C. Zienkiewicz, The Finite Element Method in Structural and Continuum Mechanics, McGraw-Hill, London, 1967.
[53] M. Zlamal, On the finite element method, Numer. Math. 12 (1968) 394–409.
Journal of Computational and Applied Mathematics 122 (2000) 37–50 www.elsevier.nl/locate/cam
Elimination techniques: from extrapolation to totally positive matrices and CAGD M. Gasca ∗ , G. Muhlbach Department of Applied Mathematics, University of Zaragoza, 5009 Zaragoza, Spain Received 20 May 1999; received in revised form 22 September 1999
Abstract In this survey, we will show some connections between several mathematical problems such as extrapolation, linear systems, totally positive matrices and computer-aided geometric design, with elimination techniques as the common tool c 2000 Elsevier Science B.V. All rights reserved. to deal with all of them.
1. Introduction Matrix elimination techniques are basic tools in many mathematical problems. In this paper we will show their crucial role in some results that various authors with us have obtained in two problems apparently distant: extrapolation and computer-aided geometric design (CAGD). A brief overview of how things were developed over time will show that, once again, two results which are apparently far from each other, even obtained by dierent groups in dierent countries, are the natural consequence of a sequence of intermediate results. Newton’s interpolation formula is a classical tool for constructing an interpolating polynomial by recurrence, by using divided dierences. In the 1930s, Aitken [1] and Neville [52] derived independently of each other algorithms to compute the interpolating polynomial from the solutions of two simpler interpolation problems, avoiding the explicit use of divided dierences. Some papers, [38,46] among others, extended both approaches at the beginning of the 1970s, to the more general setting of Chebyshev systems. Almost simultaneously, extrapolation methods were being studied and extended by several authors, as Schneider [54], Brezinski [4,5,7], Havie [31–33], Muhlbach [39 – 42,48] and Gasca and Lopez-Carmona [19]. For a historical overview of extrapolation methods confer Brezinski’s contribution [6] to this volume and the book [8]. It must be remarked that the ∗
Corresponding author. E-mail address:
[email protected] (M. Gasca). c 2000 Elsevier Science B.V. All rights reserved. 0377-0427/00/$ - see front matter PII: S 0 3 7 7 - 0 4 2 7 ( 0 0 ) 0 0 3 5 6 - 3
38
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
techniques used by these authors were dierent, and that frequently the results obtained using one of these techniques induced some progress in the other ones, in a very cooperative form. However, it is clear that the basic role in all these papers was played by elimination techniques. In [21] we studied general elimination strategies, where one strategy which we called Neville elimination proved to be well suited to work with some special classes of matrices, in particular totally positive matrices (that are matrices with all subdeterminants nonnegative). This was the origin of a series of papers [24 –27] where the properties of Neville elimination were carefully studied and its application to totally positive matrices allowed a much better knowledge of these matrices. Since one of the applications of totally positive matrices is CAGD, the results obtained for them have given rise in the last years to several other papers as [28,11,12]. In [11,12] Carnicer and Pe˜na proved the optimality in their respective spaces of some well-known function bases as Bernstein polynomials and B-splines in the context of shape preserving representations. Neville elimination has appeared, once again, as a way to construct other bases with similar properties. 2. Extrapolation and Schur complement A k-tuple L = (‘1 ; : : : ; ‘k ) of natural numbers, with ‘1 ¡ · · · ¡ ‘k , will be called an index list of length k over N. For I = (i1 ; : : : ; im ) and J = (j1 ; : : : ; jn ) two index lists over N, we write I ⊂ J i every element of I is an element of J . Generally, we shall use for index lists the same notations as for sets emphasizing that I \ J; I ∩ J; I ∪ J : : : always have to be ordered as above. Let A=(aji ) be a real matrix and I =(i1 ; : : : ; im ) and J =(j1 ; : : : ; jn ) index lists contained, repectively, in the index lists of rows and columns of A. By
A
J I
=A
j1 ; : : : ; jn i1 ; : : : ; im
=1; :::; n m×n = (aji )=1; ; :::; m ∈ R
we denote the submatrix of A with list of rows I and list of columns J . ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ If I ; I 0 and J ; J 0 are partitions of I and J , respectively, i.e., I ∪ I 0 = I; I ∩ I 0 = ∅; J ∪ J 0 = ◦ ◦0 J J; J ∩ J = ∅, we represent A( I ) in a corresponding partition
A
J I
◦
J ◦ I
◦
J 0 ◦ I
A A = ◦ ◦0 : J J A
I
◦0
A
I
(1)
◦0
If m = n, then by J A I
:= det A J = A j1 ; : : : ; jm ; I i1 ; : : : ; im
we denote the determinant of A( JI ) which is called a subdeterminant of A. Throughout we set A| ∅∅ | := 1. ◦ Let N ∈ N; I := (1; 2; : : : ; N +1) and I := (1; 2; : : : ; N ). By a prime we denote ordered complements with respect to I . Given elements f1 ; : : : ; fN and f =: fN +1 of a linear space E over R, elements L1 ; : : : ; LN and L =: LN +1 of its dual E ∗ , consider the problem of nding hL; p1N (f)i;
(2)
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
39
where p = p1N (f) = c1 · f1 + · · · + cN · fN satis es the interpolation conditions ◦
hLi ; pi = hLi ; fi i ∈ I :
(3)
Here h·; ·i means duality between E ∗ and E. If we write
j i
A
:= hLi ; fj i for i; j ∈ I;
(i; j) 6= (N + 1; N + 1);
and c is the vector of components ci , this problem is equivalent to solving the bordered system (cf. [16])
B·x=y
B=
where
A
A
◦
I ◦ I
0
◦
I N +1
1
;
x=
c
;
N +1 ◦ I
A y= : N +1
A
(4)
N +1
◦
Assuming A( II ◦ ) nonsingular this can be solved by eliminating the components of c in the last equation by adding a suitable linear combination of the rst N equations of (4) to the last one, yielding one equation for one unknown, namely :
=A
N +1 N +1
−A
◦
I N +1
·A
◦
I ◦ I
−1
A
N +1 ◦ I
:
(5)
Considering the eect of this block elimination step on the matrix
A=
A
A
◦
I ◦ I ◦
I N +1
A
A
N +1 ◦ I
; N +1
(6)
N +1
we nd it transformed to ◦ I N +1 ◦ ◦ A A : I I A˜ = If we take N +1 A := 0; N +1
(7)
then we have = −hL; p1N (f)i:
(8)
On the other hand, if instead of (7) we take
A
N +1 N +1
:= hLN +1 ; fN +1 i;
(9)
then, in this frame, we get = hL; r1N (f)i;
(10)
40
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
where r1N (f) := f − p1N (f) is the interpolation remainder. If the systems (f1 ; : : : ; fN ) and (L1 ; : : : ; LN ) are independent of f and L then these problems are called general linear extrapolation problems, and if one or both do depend on f = fN +1 or L = LN +1 they are called problems of quasilinear extrapolation. Observe, that with regard to determinants the block elimination step above is an elementary operation leaving the value of det A unchanged. Hence
I I = ◦ ; I det A ◦ I det A
◦
which is known as the Schur complement of A( II ◦ ) in A( II ). This concept, introduced in [34,35] has found many applications in Linear Algebra and Statistics [13,53]. It may be generalized in dierent ways, see, for example, [21,22,44] where we used the concept of general elimination strategy which is explained in the next section. 3. Elimination strategies In this section and the next two let k; m; n ∈ N such that k + m = n and I = (1; : : : ; n). Given a square matrix A = A( II ) over R, how can we simplify det A by elementary operations, not altering the value of det A, producing zeros in prescribed columns, e.g. in columns 1 to k?. Take a permutation of all rows, M = (m1 ; : : : ; mn ) say, then look for a linear combination of k rows from (m1 ; : : : ; mn−1 ) which, when added to row mn , will produce zeros in columns 1 to k. Then add to row mn−1 a linear combination of k of its predecessors in M , to produce zeros in columns 1 to k, etc. Finally, add to row mk+1 a suitable linear combination of rows m1 ; : : : ; mk to produce zeros in columns 1 to k. Necessarily, 1; : : : ; k 6= 0 jr ; : : : ; jr
A
1
k
is assumed when a linear combination of rows j1r ; : : : ; jkr is added to row mr (r = n; n − 1; : : : ; k + 1) to generate zeros in columns 1 to k, and jqr ¡ mr (q = 1; : : : ; k; r = n; n − 1; : : : ; k + 1) in order that in each step an elementary operation will be performed. ◦ Let us give a formal description of this general procedure. Suppose that (Is ; Is ) (s = 1; : : : ; m) are ◦ pairs of ordered index lists of length k +1 and k, respectively, over a basic index list M with Is ⊂ Is . Then the family ◦
:= ((Is ; Is ))s=1; :::; m will be called a (k; m)-elimination strategy over I := I1 ∪ · · · ∪ Im provided that for s = 2; : : : ; m (i) card(I1 ∪ · · · ∪ Is ) = k + s, ◦ (ii) Is ⊂ Is ∩ (I1 ∪ · · · ∪ Is−1 ):
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50 ◦
41
◦
By E(k; m; I ) we denote the set of all (k; m)-elimination strategies over I . I := I1 is called the ◦ basic index list of the strategy . For each s, the zeros in the row (s) := Is \ Is are produced with ◦ the rows of Is . For shortness, we shall abbreviate the phrase “elimination strategy” by e.s. Notice that, when elimination is actually performed, it is done in the reverse ordering: rst in row (m), then in row (m − 1), etc. The simplest example of e.s. over I = (1; : : : ; m + k), is Gauss elimination: ◦
◦
= ((Gs ; Gs ))s=1; :::; m ;
◦
◦
G = Gs = {1; : : : ; k};
Gs = G ∪ {k + s}:
(11)
For this strategy it is irrelevant in which order elimination is performed. This does not hold for another useful strategy over I : ◦
N = ((Ns ; Ns ))s=1; :::; m
(12)
◦
with Ns = (s; : : : ; s + k − 1); Ns = (s; : : : ; s + k); s = 1; : : : ; m, which we called [21,43,44] the Neville (k; m)–e.s. Using this strategy elimination must be performed from bottom to top. The reason for the name Neville is their relationship with Neville interpolation algorithm, based on consecutivity, see [43,23]. 4. Generalized Schur complements ◦
◦
Suppose that = ((Is ; Is ))s=1; :::; m ∈ E(k; m; I ) and that K ⊂ I is a xed index list of length k. ◦ We assume that the submatrices A( KI ◦ ) of a given matrix A = A( II ) ∈ Rn×n are nonsingular for s s = 1; : : : ; m. Then the elimination strategy transforms A into the matrix A˜ which, partitioned with ◦ ◦ ◦ ◦ respect to I ∪ I 0 = I; K ∪ K 0 = I , can be written as
◦
˜ K A I◦
A˜ =
with
◦
◦
◦0
K ◦ A˜ I 0
0
K ◦ A˜ I
K0 ◦ A˜ I
=A
◦
K ◦ I
◦
K0 ◦ A˜ I
;
◦
K0 ◦ I
=A
:
◦0
◦
˜ K◦0 ) of A˜ is called the Schur complement of A( K◦ ) in A with respect to the The submatrix S˜ := A( I I ◦ e.s. and the column list K , and is also denoted by
S˜ = A
I I
A
◦
K ◦ I
:
◦ When = as in (11) and K = {1; : : : ; k}, then S˜ is the classical Schur complement, which can also be written as
◦
K0 ◦ A˜ I 0
=A
◦
K0 ◦ I 0
−A
◦
K ◦ I 0
A
◦
K ◦ I
−1
A
◦
K0 ◦ I
:
42
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50 ◦
When = N is the Neville (k; m)–e.s. (12) and K = {1; : : : ; k}, then the rows of the Schur ◦0 ˜ K◦0 ) are complement S˜ = A( I A˜
◦
K0 k +s
=A
◦
K0 k +s
−A
◦
K k +s
A
−1
◦
K s; : : : ; s + k − 1
A
◦
K0 s; : : : ; s + k − 1
s=1; : : : ; m:
Whereas, the Schur complement of a submatrix depends essentially on the elimination strategies used, its determinant does not! There holds the following generalization of Schur’s classical determinantal identity [21,22,44]:
I I
det A
= (−1) det A
◦
K ◦ I
det A
I I
A
◦
K ◦ I
◦
for all e.s. ∈ E(k; m; I ), where is an integer depending only on and K . Also, Sylvester’s classical determinantal identity [55,56] has a corresponding generalization, see [18,21,22,43,44] for details. In the case of Gauss elimination we get Sylvester’s classical identity [9,10,55,56]
t=1; :::; m
1; : : : ; k; k + t det A 1; : : : ; k; k + s
s=1;:::;m
m−1
1; : : : ; k = det A A 1; : : : ; k
In the case of Neville elimination one has
t=1; :::; m
1; : : : ; k; k + t det A s; : : : ; s + k − 1; s + k
= det A s=1;:::;m
m Y s=2
:
A
1; : : : ; k : s; : : : ; s + k − 1
Another identity of Sylvester’s type has been derived in [3]. Also some applications to the E-algorithm [5] are given there. As we have seen, the technique of e.s. has led us in particular to general determinantal identities of Sylvester’s type. It can also be used to extend determinantal identities in the sense of Muir [51], see [47]. 5. Application to quasilinear extrapolation problems Suppose we are given elements f1 ; : : : ; fN of a linear space E and elements L1 ; : : : ; LN of its dual E ∗ . Consider furthermore elements f =: fN +1 of E and L =: LN +1 of E ∗ . Setting I = (1; : : : ; N + 1), by A we denote the generalized Vandermonde matrix
I I
A=A
=V
f1 ; : : : ; fN ; fN +1 L1 ; : : : ; LN ; LN +1
j=1; :::; N +1
:= hLi ; fj i i=1;:::;N +1 :
(13)
Assume now that k; m ∈ N; m6N + 1 − k and that ◦
= ((Is ; Is ))s=1; :::; m is a (k − 1; m)–e.s. over
A
G Is
Sm
s=1 Is
(14) ⊂(1; : : : ; N ): Let G := (1; : : : ; k): If the submatrices
are nonsingular for s = 1; : : : ; m;
(15)
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
43
then for s = 1; : : : ; m the interpolants psk (f) :=
k X j=1
cs;k j (f) · fj ;
(16)
satisfying the interpolation conditions hLi ; psk (f)i = hLi ; fi for i ∈ Is are well de ned as well as ks (f) := hL; psk (f)i: Clearly, in case of general linear extrapolation the mapping pk
E 3 f →s psk (f) is a linear projection onto span{f1 ; : : : ; fN } and cs;k j
E 3 f → cs;k j (f) is a linear functional. In case of quasilinear extrapolation we assume that, as a function of f ∈ E; psk remains idempotent. Then, as a function of f ∈ E, in general the coecients cs;k j (f) are not linear. We assume that, as functions of f ∈ span{f1 ; : : : ; fN }; cs;k j (f) remain linear. The task is (i) to nd conditions, such that p1N (f); N1 (f) are well de ned, and (ii) to nd methods to compute these quantities from psk (f); ks (f)(s = 1; : : : ; m), respectively. When translated into pure terms of Linear Algebra these questions mean: Consider matrix (13) and assume (15), (i) under which conditions can we ensure that A( 1;:::;N ) is nonsingular? 1;:::;N The coecient problem reads: (ii0 ) Suppose that we do know the solutions csk (f) = (cs;k j (f))j=1; :::; k of the linear systems
A
G Is
·
csk (f)
=A
N +1 Is
;
s = 1; : : : ; m:
How to get from these the solution c1N (f) = (c1;N j (f))j=1; :::; N of
A
1; : : : ; N 1; : : : ; N
· c1N (f) = A
N +1 ? 1; : : : ; N
The value problem reads: (iii) Suppose that we do know the values ks (f) = hL; psk (f)i;
s = 1; : : : ; m:
How to get from these the value N1 (f) = hL; p1N (f)i? A dual coecient problem can be also considered interchanging the roles of the spaces E and E ∗ . These problems were considered and solved in [20,7,19,31,40 – 42,45,48,50].
44
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
6. Applications to special classes of matrices General elimination strategies, in particular the Neville e.s. and generalized Schur complements have found other applications in matrix theory and related problems. In [21,22,44] we have considered some classes Ln of real n × n-matrices A including the classes (i) Cn of matrices satisfying det A( JJ ) ¿ 0 for all J ⊂(1; : : : ; n); det A( KJ ) · det A( KJ ) ¿ 0 for all J; K ⊂(1; : : : ; n) of the same cardinality, which was considered in [36]; (ii) of symmetric positive-de nite matrices; (iii) of strictly totally positive matrices (STP), which are de ned by the property that all square submatrices have positive determinants [36]; (iv) of Minkowski matrices, de ned by
A
j i
¡0
for all i 6= j;
det A
1; : : : ; k 1; : : : ; k
¿0
for all 16k6n:
In [21] we have proved that A ∈ Ln ⇒S˜ ∈ Lm ; where m=n−k and S˜ denotes the classical Schur complement of A( 1;:::;k ) in A. For STP matrices also 1;:::;k generalized Schur complements with respect to the Neville e.s. are STP. Using the Neville e.s. in [21,49] tests of algorithmic complexity O(N 4 ) for matrices being STP were derived for the rst time. Neville elimination, based on consecutivity, proved to be especially well suited for STP matrices, because these matrices were characterized in [36] by the property of having all subdeterminants with consecutive rows and columns positive. Elimination by consecutive rows is not at all new in matrix theory. It has been used to prove some properties of special classes of matrices, for example, totally positive (TP) matrices, which, as it has already been said, are matrices with all subdeterminants nonnegative. However, motivated by the above mentioned algorithm for testing STP matrices, Gasca and Pe˜na [24] initiated an exhaustive study of Neville elimination in an algorithmic way, of the pivots and multipliers used in the proccess to obtain new properties of totally positive matrices and to improve and simplify the known characterizations of these matrices. Totally positive matrices have interesting applications in many elds, as, for example, vibrations of mechanical systems, combinatorics, probability, spline functions, computer-aided geometric design, etc., see [36,37]. For this reason, remarkable papers on total positivity due to specialists on these elds have appeared, see for example the ones collected in [29]. The important survey [2] presents a complete list of references on totally positive matrices before 1987. One of the main points in the recent study of this class of matrices has been that of characterizing them in practical terms, by factorizations or by the nonnegativity of some minors (instead of all of them, as claimed in the de nition). In [24] for example, it was proved that a matrix is STP if and only if all subdeterminants with lists of consecutive rows and consecutive columns, starting at least one of these lists by 1, are positive. Necessarily, one of the lists must start with 1. Observe, that the new characterization considerably decreases the number of subdeterminants to be checked, compared with the classical characterization, due to Fekete and Polya [17], which used all subdeterminants with consecutive rows and columns.
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
45
This result means that the set of all subdeterminants of a matrix A with consecutive rows and columns, of the form 1; : : : ; j ; A i; : : : ; i + j − 1
i; : : : ; i + j − 1 ; A 1; : : : ; j
called in [24] column- and row-initial minors, play in total positivity a similar role to that of the leading principal minors 1; : : : ; j A 1; : : : ; j
in positive de niteness of symmetric real matrices. An algorithm based on Neville elimination was given in [24] with a complexity O(N 3 ) for a matrix of order N , instead of the one with O(N 4 ) previously obtained in [21,49]. Other similar simpli cations were obtained in [24] for the characterization of totally positive matrices (not strictly). Concerning factorizations, in [26] Neville elimination was described in terms of a product by bidiagonal unit-diagonal matrices. Some of the most well-known characterizations of TP and STP matrices are related to their LU factorization. Cryer [14,15], in the 1970s, extended to TP matrices what was previously known for STP matrices, thus obtaining the following result. A square matrix A is TP (resp. STP) i it has an LU factorization such that L and U are TP (STP). Here, as usual, L (resp. U ) denotes a lower (upper) triangular matrix and STP means triangular nonnegative matrices with all the nontrivial subdeterminants of any order strictly positive. Also Cryer pointed out that the matrix A is STP i it can be written in the form A=
N Y r=1
Lr
M Y
Us
s=1
where each Lr (resp. Us ) is a lower (upper) STP matrix. Observe that this result does not mention the relation of N or M with the order n of the matrix A. The matricial description of Neville elimination obtained in [26] produced in the same paper the following result. Let A be a nonsingular matrix of order n. Then A is STP i it can be expressed in the form: A = Fn−1 · · · F1 DG1 · · · Gn−1 ; where, for each i=1; 2; : : : ; n−1; Fi is a bidiagonal, lower triangular, unit diagonal matrix, with zeros in positions (2; 1); : : : ; (i; i − 1) and positive entries in (i + 1; i); : : : ; (n; n − 1); Gi has the transposed form of Fi and D is a diagonal matrix with positive diagonal. Similar results were obtained in [26] for TP matrices. In that paper all these new characterizations were collected in three classes: characterizations in terms of determinants, in terms of algorithms and in terms of factorizations. 7. Variation diminution and computer-aided geometric design An n × n matrix A is said to be sign-regular (SR) if for each 16k6n all its minors of order k have the same (non strict) sign (in the sense that the product of any two of them is greater than or
46
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
equal to zero). The matrix is strictly sign-regular (SSR) if for each 16k6n all its minors of order k are dierent from zero and have the same sign. In [27] a test for strict sign regularity is given. The importance of these types of matrices comes from their variation diminishing properties. By a sign sequence of a vector x = (x1 ; : : : ; x n )T ∈ Rn we understand any signature sequence for which i xi = |xi |; i = 1; 2; : : : ; n. The number of sign changes of x associated to , denoted by C(), is the number of indices i such that i i+1 ¡ 0; 16i6n − 1. The maximum (resp. minimum) variation of signs, V+ (x) (resp. V− (x)), is by de nition the maximum (resp. minimum) of C() when runs over all sign sequences of x. Let us observe that if xi 6= 0 for all i, then V+ (x) = V− (x) and this value is usually called the exact variation of signs. The next result (see [2, Theorems 5:3 and 5:6]) characterizes sign-regular and strictly sign-regular matrices in terms of their variation diminishing properties. Let A be an n × n nonsingular matrix. Then: (i) A is SR ⇔ V− (Ax)6V− (x) ∀x ∈ Rn . (ii) A is SR ⇔ V+ (Ax)6V+ (x) ∀x ∈ Rn . (iii) A is SSR ⇔ V+ (Ax)6V− (x) ∀x ∈ Rn \ {0}: The above matricial de nitions lead to the corresponding de nitions for systems of functions. A system of functions (u0 ; : : : ; un ) is sign-regular if all its collocation matrices are sign-regular of the same kind. The system is strictly sign-regular if all its collocation matrices are strictly sign-regular of the same kind. Here a collocation matrix is de ned to be a matrix whose (i; j)-entry is of the form ui (xj ) with any system of strictly increasing points xj . Sign-regular systems have important applications in CAGD. Given u0 ; : : : ; un , functions de ned on [a; b], and P0 ; : : : ; Pn ∈ Rk , we may de ne a curve (t) by
(t) =
n X
ui (t)Pi :
i=0
The points P0 ; : : : ; Pn are called control points, because we expect to modify the shape of the curve by changing these points adequately. The polygon with vertices P0 ; : : : ; Pn is called control polygon of . P In CAGD the functions u0 ; : : : ; un are usually nonnegative and normalized ( ni=0 ui (t)=1 ∀ t ∈ [a; b]). In this case they are called blending functions. These requirements imply that the curve lies in the convex hull of the control polygon (convex hull property). Clearly, (u0 ; : : : ; un ) is a system of blending functions if and only if all the collocation matrices are stochastic (that is, they are nonnegative matrices such that the elements of each row sum up to 1). For design purposes, it is desirable that the curve imitates the control polygon and that the control polygon even “exaggerates” the shape of the curve, and this holds when the system satis es variation diminishing properties. If (u0 ; : : : ; un ) is a sign-regular system of blending functions then the curve preserves many shape properties of the control polygon, due to the variation diminishing properties of (u0 ; : : : ; un ). For instance, any line intersects the curve no more often than it intersects the control polygon. A characterization of SSR matrices A by the Neville elimination of A and of some submatrices of A is obtained in [26, Theorem 4.1]. A system of functions (u0 ; : : : ; un ) is said to be totally positive if all its collocation matrices are totally positive. The system is normalized totally positive (NTP) if it is totally positive and Pn i=0 ui = 1.
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
47
Normalized totally positive systems satisfy an interesting shape-preserving property, which is very convenient for design purposes and which we call endpoint interpolation property: the initial and nal endpoints of the curve and the initial and nal endpoints (respectively) of the control polygon coincide. In summary, these systems are characterized by the fact that they always generate curves satisfying simultaneously the convex hull, variation diminishing and endpoint interpolation properties. Now the following question arises. Given a system of functions used in CAGD to generate curves, does there exist a basis of the space generated by that system with optimal shape preserving properties? Or equivalently, is there a basis such that the generated curves imitate better the form of the corresponding control polygon than the form of the corresponding control polygon for any other basis? In the space of polynomials of degree less than or equal to n on a compact interval, the Bernstein basis is optimal. This was conjectured by Goodman and Said in [30], and it was proved in [11]. In [12], there is also an armative answer to the above questions for any space with TP basis. Moreover, Neville elimination provides a constructive way to obtain optimal bases. In the space of polynomial splines, B-splines form the optimal basis. Since the product of TP matrices is a TP matrix, if (u0 ; : : : ; un ) is a TP system of functions and A is a TP matrix of order n+1, then the new system (u0 ; : : : ; un )A is again a TP system (which satis es a “stronger” variation diminishing property than (u0 ; : : : ; un )). If we obtain from a basis (u0 ; : : : ; un ), in this way, all the totally positive bases of the space, then (u0 ; : : : ; un ) will be the “least variation diminishing” basis of the space. In consequence, the control polygons with respect to (u0 ; : : : ; un ) will imitate the form of the curve better than the control polygons with respect to other bases of the space. Therefore, we may reformulate the problem of nding an optimal basis (b0 ; : : : ; bn ) in the following way: Given a vector space U with a TP basis, is there a TP basis (b0 ; : : : ; bn ) of U such that, for any TP basis (v0 ; : : : ; vn ) of U there exists a TP matrix K satisfying (v0 ; : : : ; vn ) = (b0 ; : : : ; bn )K?. The existence of such optimal basis (b0 ; : : : ; bn ) was proved in [12], where it was called B-basis. In the same paper, a method of construction, inspired by the Neville elimination process, was given. As mentioned above, Bernstein polynomials and B-splines are examples of B-bases. Another point of view for B-bases is closely related to corner cutting algorithms, which play an important role in CAGD. Given two NTP bases, (p0 ; : : : ; pn ); (b0 ; : : : ; bn ), let K be the nonsingular matrix such that (p0 ; : : : ; pn ) = (b0 ; : : : ; bn )K: Since both bases are normalized, if K is a nonnegative matrix, it is clearly stochastic. A curve can be expressed in terms of both bases
(t) =
n X i=0
Bi bi (t) =
n X
Pi pi (t);
t ∈ [a; b];
i=0
and the matrix K gives the relationship between both control polygons (B0 ; : : : ; Bn )T = K(P0 ; : : : ; Pn )T :
48
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
An elementary corner cutting is a transformation which maps any polygon P0 · · · Pn into another polygon B0 · · · Bn de ned by: Bj = Pj ;
j 6= i;
Bi = (1 − )Pi + Pi+1 ;
for one i ∈ {0; : : : ; n − 1}
(17)
for one i ∈ {1; : : : ; n}:
(18)
or Bj = Pj ;
j 6= i;
Bi = (1 − )Pi + Pi−1 ;
Here ∈ (0; 1). A corner-cutting algorithm is the algorithmic description of a corner cutting transformation, which is any composition of elementary corner cutting transformations. Let us assume now that the matrix K above is TP. Since it is stochastic, nonsingular and TP, it can be factorized as a product of bidiagonal nonnegative matrices, (as we have mentioned in Section 6), which can be interpreted as a corner cutting transformation. Such factorizations are closely related to the Neville elimination of the matrix [28]. From the variation diminution produced by the totally positive matrices of the process, it can be deduced that the curve imitates better the form of the control polygon B0 · · · Bn than that of the control polygon P0 · · · Pn . Therefore, we see again that an NTP basis (b0 ; : : : ; bn ) of a space U has optimal shape-preserving properties if for any other NTP basis (p0 ; : : : ; pn ) of U there exists a (stochastic) TP matrix K such that (p0 ; : : : ; pn ) = (b0 ; : : : ; bn )K:
(19)
Hence, a basis has optimal shape preserving properties if and only if it is a normalized B-basis. Neville elimination has also inspired the construction of B-bases in [11,12]. Many of these results and other important properties and applications of totally positive matrices have been collected, as we have already said in [28, Section 6]. References [1] A.G. Aitken, On interpolation by iteration of proportional parts without the use of dierences, Proc. Edinburgh Math. Soc. 3 (1932) 56–76. [2] T. Ando, Totally positive matrices, Linear Algebra Appl. 90 (1987) 165–219. [3] B. Beckermann, G. Muhlbach, A general determinantal identity of Sylvester type and some applications, Linear Algebra Appl. 197,198 (1994) 93–112. [4] Cl. Brezinski, The Muhlbach–Neville–Aitken-algorithm and some extensions, BIT 20 (1980) 444–451. [5] Cl. Brezinski, A general extrapolation algorithm, Numer. Math. 35 (1980) 175–187. [6] Cl. Brezinski, Convergence acceleration during the 20th century, this volume, J. Comput. Appl. Math. 122 (2000) 1–21. [7] Cl. Brezinski, Recursive interpolation, extrapolation and projection, J. Comput. Appl. Math. 9 (1983) 369–376. [8] Cl. Brezinski, M. Redivo Zaglia, Extrapolation methods, theory and practice, North-Holland, Amsterdam, 1991. [9] R.A. Brualdi, H. Schneider, Determinantal identities: Gauss, Schur, Cauchy, Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir and Cayley, Linear Algebra Appl. 52=53 (1983) 769–791. [10] R.A. Brualdi, H. Schneider, Determinantal identities revisited, Linear Algebra Appl. 59 (1984) 183–211. [11] J.M. Carnicer, J.M. Pe˜na, Shape preserving representations and optimality of the Bernstein basis, Adv. Comput. Math. 1 (1993) 173–196.
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
49
[12] J.M. Carnicer, J.M. Pe˜na, Totally positive bases for shape preserving curve design and optimality of B-splines, Comput. Aided Geom. Design 11 (1994) 633–654. [13] R.W. Cottle, Manifestations of the Schur complement, Linear Algebra Appl. 8 (1974) 189–211. [14] C. Cryer, The LU-factorization of totally positive matrices, Linear Algebra Appl. 7 (1973) 83–92. [15] C. Cryer, Some poperties of totally positive matrices, Linear algebra Appl. 15 (1976) 1–25. [16] D.R. Faddeev, U.N. Faddeva, Computational Methods of Linear Algebra, Freeman, San Francisco, 1963. [17] M. Fekete, G. Polya, Uber ein Problem von Laguerre, Rend. C.M. Palermo 34 (1912) 89–120. [18] M. Gasca, A. Lopez-Carmona, V. Ramrez, A generalized Sylvester’s identity on determinants and 1st application to interpolation problems, in: W. Schempp, K. Zeller (Eds.), Multivariate Approximation Theory II, ISNM, Vol. 61, Biskhauser, Basel, 1982, pp. 171–184. [19] M. Gasca, A. Lopez-Carmona, A general interpolation formula and its application to multivariate interpolation, J. Approx. Theory 34 (1982) 361–374. [20] M. Gasca, E. Lebron, Elimination techniques and interpolation, J. Comput. Appl. Math. 19 (1987) 125–132. [21] M. Gasca, G. Muhlbach, Generalized Schur-complements and a test for total positivity, Appl. Numer. Math. 3 (1987) 215–232. [22] M. Gasca, G. Muhlbach, Generalized Schur-complements, Publicacciones del Seminario Matematico Garcia de Galdeano, Serie II, Seccion 1, No. 17, Universidad de Zaragoza, 1984. [23] M. Gasca, J.M. Pe˜na, Neville elimination and approximation theory, in: S.P. Singh (Ed.), Approximation Theory, Wavelets and Applications, Kluwer Academic Publishers, Dordrecht, 1995, pp. 131–151. [24] M. Gasca, J.M. Pe˜na, Total positivity and Neville elimination, Linear Algebra Appl. 165 (1992) 25–44. [25] M. Gasca, J.M. Pe˜na, On the characterization of TP and STP matrices, in: S.P. Singh (Ed.), Aproximation Theory, Spline Functions and Applications, Kluwer Academic Publishers, Dordrecht, 1992, pp. 357–364. [26] M. Gasca, J.M. Pe˜na, A matricial description of Neville elimination with applications to total positivity, Linear Algebra Appl. 202 (1994) 33–54. [27] M. Gasca, J.M. Pe˜na, A test for strict sign-regularity, Linear Algebra Appl. 197–198 (1994) 133–142. [28] M. Gasca, J.M. Pe˜na, Corner cutting algorithms and totally positive matrices, in: P.J. Laurent, A. Le Mehaute, L.L. Schumaker (Eds.), Curves and Surfaces II, 177–184, A.K. Peters, Wellesley, MA, 1994. [29] M. Gasca, C.A. Micchelli (Eds.), Total Positivity and its Applications, Kluwer Academic Publishers, Dordrecht, 1996. [30] T.N.T. Goodman, H.B. Said, Shape preserving properties of the generalized ball basis, Comput. Aided Geom. Design 8 (115 –121) 1991. [31] T. Havie, Generalized Neville type extrapolation schemes, BIT 19 (1979) 204–213. [32] T. Havie, Remarks on a uni ed theory of classical and generalized interpolation and extrapolation, BIT 21 (1981) 465–474. [33] T. Havie, Remarks on the Muhlbach–Neville–Aitken-algorithm, Math. a. Comp. Nr. 2=80, Department of Numerical Mathematics, The University of Trondheim, 1980. [34] E. Haynsworth, Determination of the inertia of a partitioned Hermitian matrix, Linear Algebra Appl. 1 (1968) 73–81. [35] E. Haynsworth, On the Schur Complement, Basel Mathemathics Notes No. 20, June 1968. [36] S. Karlin, Total Positivity, Stanford University Press, Standford, 1968. [37] S. Karlin, W.J. Studden, Tchebyche Systems: with Applications in Analysis and Statistics, Interscience, New York, 1966. [38] G. 
Muhlbach, Neville–Aitken Algorithms for interpolation by functions of Ceby sev-systems in the sense of Newton and in a generalized sense of hermite, in: A.G. Law, B.N. Sahney (Eds.), Theory of Approximation, with Applications, Proceedings of the International Congress on Approximation Theory in Calgary, 1975, Academic Press, New York, 1976, pp. 200–212. [39] G. Muhlbach, The general Neville–Aitken-algorithm and some applications, Numer. Math. 31 (1978) 97–110. [40] G. Muhlbach, On two general algorithms for extrapolation with applications to numerical dierentiation and integration, in: M.G. de Bruin, H. van Rossum (Eds.), Pade Approximation and its Applications, Lecture Notes in Mathematics, Vol. 888, Springer, Berlin, 1981, pp. 326–340. [41] G. Muhlbach, Extrapolation algorithms as elimination techniques with applications to systems of linear equations, Report 152, Institut fur Mathematik der Universitat Hannover, 1982, pp. 1– 47. [42] G. Muhlbach, Algorithmes d’extrapolation, Publication ANO 118, Universite de Lille 1, January 1984.
50
M. Gasca, G. Muhlbach / Journal of Computational and Applied Mathematics 122 (2000) 37–50
[43] G. Muhlbach, Sur une identite generalisee de Sylvester, Publication ANO 119, Universite de Lille 1, January 1984. [44] G. Muhlbach, M. Gasca, A generalization of Sylvester’s identity on determinants and some applications, Linear Algebra Appl. 66 (1985) 221–234. [45] G. Muhlbach, Two composition methods for solving certain systems of linear equations, Numer. Math. 46 (1985) 339–349. [46] G. Muhlbach, A recurrence formula for generalized divided dierences and some applications, J. Approx. Theory. 9 (1973) 165–172. [47] G. Muhlbach, On extending determinantal identities, Publicaciones del Seminario Matematico Garcia de Galdeano, Serie II, Seccion 1, No. 139, Universidad de Zaragoza, 1987. [48] G. Muhlbach, Linear and quasilinear extrapolation algorithms, in: R. Vichnevetsky, J. Vignes (Eds.), Numerical Mathematics and Applications, Elsevier, North-Holland, Amsterdam, IMACS, 1986, pp. 65 –71. [49] G. Muhlbach, M. Gasca, A test for strict total positivity via Neville elimination, in: F. Uhlig, R. Grone (Eds.), Current Trends in Matrix Theory, North-Holland, Amsterdam, 1987, pp. 225–232. [50] G. Muhlbach, Recursive triangles, in: D. Beinov, V. Covachev (Eds.), Proceedings of the third International Colloquium on Numerical Analysis, Utrecht, VSP, 1995, pp. 123–134. [51] T. Muir, The law of extensible minors in determinants, Trans. Roy. Soc. Edinburgh 30 (1883) 1–4. [52] E.H. Neville, Iterative interpolation, J. Indian Math. Soc. 20 (1934) 87–120. [53] D.V. Ouellette, Schur complements and statistics, Linear Algebra Appl. 36 (1981) 186–295. [54] C. Schneider, Vereinfachte rekursionen zur Richardson-extrapolation in spezialfallen, Numer. Math. 24 (1975) 177–184. [55] J.J. Sylvester, On the relation between the minor determinants of linearly equivalent quadratic functions, Philos. Mag. (4) (1851) 295 –305. [56] J.J. Sylvester, Collected Mathematical Papers, Vol. 1, Cambridge University Press, Cambridge, 1904, pp. 241–250.
Journal of Computational and Applied Mathematics 122 (2000) 51–80 www.elsevier.nl/locate/cam
The epsilon algorithm and related topics a
P.R. Graves-Morrisa; ∗ , D.E. Robertsb , A. Salamc
School of Computing and Mathematics, University of Bradford, Bradford, West Yorkshire BD7 1DP, UK b Department of Mathematics, Napier University, Colinton Road, Edinburgh, EH14 1DJ Scotland, UK c Laboratoire de MathÃematiques Pures et AppliquÃees, UniversitÃe du Littoral, BP 699, 62228 Calais, France Received 7 May 1999; received in revised form 27 December 1999
Abstract The epsilon algorithm is recommended as the best all-purpose acceleration method for slowly converging sequences. It exploits the numerical precision of the data to extrapolate the sequence to its limit. We explain its connections with Pade approximation and continued fractions which underpin its theoretical base. Then we review the most recent extensions of these principles to treat application of the epsilon algorithm to vector-valued sequences, and some related topics. In this paper, we consider the class of methods based on using generalised inverses of vectors, and the formulation speci cally c 2000 Elsevier Science B.V. All rights reserved. includes the complex case wherever possible. Keywords: Epsilon algorithm; qd algorithm; Pade; Vector-valued approximant; Wynn; Cross rule; Star identity; Compass identity; Designant
1. Introduction A sequence with a limit is as basic a topic in mathematics as it is a useful concept in science and engineering. In the applications, it is usually the limit of a sequence, or a xed point of its generator, that is required; the existence of the limit is rarely an issue, and rapidly convergent sequences are welcomed. However, if one has to work with a sequence that converges too slowly, the epsilon algorithm is arguably the best all-purpose method for accelerating its convergence. The algorithm was discovered by Wynn [54] and his review article [59] is highly recommended. The epsilon algorithm can also be used for weakly diverging sequences, and for these the desired limit is usually de ned as being a xed point of the operator that generates the sequence. There are interesting exceptional cases, such as quantum well oscillators [51], where the epsilon algorithm is not powerful enough and we refer to the companion paper by Homeier [33] in which the more ∗
Corresponding author. E-mail address:
[email protected] (P.R. Graves-Morris). c 2000 Elsevier Science B.V. All rights reserved. 0377-0427/00/$ - see front matter PII: S 0 3 7 7 - 0 4 2 7 ( 0 0 ) 0 0 3 5 5 - 1
52
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
powerful Levin-type algorithms, etc., are reviewed. The connections between the epsilon algorithm and similar algorithms are reviewed by Weniger [50,52]. This paper is basically a review of the application of the epsilon algorithm, with an emphasis on the case of complex-valued, vector-valued sequences. There are already many reviews and books which include sections on the scalar epsilon algorithm, for example [1,2,9,17,53]. In the recent past, there has been progress with the problem of numerical breakdown of the epsilon algorithm. Most notably, Cordellier’s algorithm deals with both scalar and vector cases [13–16]. This work and its theoretical basis has been extensively reviewed [26,27]. In this paper, we focus attention on how the epsilon algorithm is used for sequences (si ) in which si ∈ Cd . The case d = 1 is the scalar case, and the formulation for si ∈ C is essentially the same as that for si ∈ R. Not so for the vector case, and we give full details of how the vector epsilon and vector qd algorithms are implemented when si ∈ Cd , and of the connections with vector Pade approximation. Understanding these connections is essential for specifying the range of validity of the methods. Frequently, the word “normally” appears in this paper to indicate that the results may not apply in degenerate cases. The adaptations for the treatment of degeneracy are almost the same for both real and complex cases, and so we refer to [25 –27] for details. In Section 2, we formulate the epsilon algorithm, and we explain its connection with Pade approximation and the continued fractions called C-fractions. We give an example of how the epsilon algorithm works in ideal circumstances, without any signi cant loss of numerical precision (which is an unusual outcome). In Section 3, we formulate the vector epsilon algorithm, and we review its connection with vector-valued Pade approximants and with vector-valued C-fractions. There are two major generalisations of the scalar epsilon algorithm to the vector case. One of them is Brezinski’s topological epsilon algorithm [5,6,35,48,49]. This algorithm has two principal forms, which might be called the forward and backward versions; and the backward version has the orthogonality properties associated with Lanczos methods [8]. The denominator polynomials associated with all forms of the topological epsilon algorithm have degrees which are the same as those for the scalar case [2,5,8]. By contrast, the other generalisation of the scalar epsilon algorithm to the vector case can be based on using generalised inverses of vectors, and it is this generalisation which is the main topic of this paper. We illustrate how the vector epsilon algorithm works in a two-dimensional real space, and we give a realistic example of how it works in a high-dimensional complex space. The denominator polynomials used in the scalar case are generalised both to operator polynomials of the same degree and to scalar polynomials of double the degree in the vector case, and we explain the connections between these twin generalisations. Most of the topics reviewed in Section 3 have a direct generalisation to the rational interpolation problem [25]. We also note that the method of GIPAs described in Section 3 generalises directly to deal with sequences of functions in L2 (a; b) rather than vectors Cd ; in this sense, the vectors are regarded as discretised functions [2]. 
In Section 4 we review the use of the vector qd algorithm for the construction of vector-valued C-fractions, and we note the connections between vector orthogonal polynomials and the vector epsilon algorithm. We prove the cross-rule (4.18), (4.22) using a Cliord algebra. For real-valued vectors, we observe that it is really an overlooked identity amongst Hankel designants. Here, the Cross Rule is proved as an identity amongst complex-valued vectors using Moore–Penrose inverses. The importance of studying the vector epsilon algorithm lies partly in its potential [20] for application to the acceleration of convergence of iterative solution of discretised PDEs. For
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
53
example, Gauss–Seidel iteration generates sequences of vectors which often converge too slowly to be useful. SOR, multigrid and Lanczos methods are alternative approaches to the problem which are currently popular, but the success of the techniques like CGS and LTPMs (see [31] for an explanation of the techniques and the acronyms) indicates the need for continuing research into numerical methods for the acceleration of convergence of vector-valued sequences. To conclude this introductory section, we recall that all algorithms have their domains of validity. The epsilon algorithm fails for logarithmically convergent sequences (which converge too slowly) and it fails to nd the xed point of the generator of sequences which diverge too fast. For example, if C + O(n−2 ); C 6= 0; n the sequence (sn ) is logarithmically convergent to s. More precisely, a sequence is de ned to converge logarithmically to s if it converges to s at a rate governed by sn+1 − s lim = 1: n→∞ sn − s sn − s =
Not only does the epsilon algorithm usually fail for such sequences, but Delahaye and Germain-Bonne [18,19] have proved that there is no universal accelerator for logarithmically convergent sequences. Reviews of series transformations, such as those of the energy levels of the quantum-mechanical harmonic oscillator [21,50,51], and of the Riemann zeta function [34], instructively show the inadequacy of the epsilon algorithm when the series coecients diverge too fast. Information about the asymptotic form of the coecients and scaling properties of the solution is exploited to create purpose-built acceleration methods. Exotic applications of the -algorithm appear in [55]. 2. The epsilon algorithm The epsilon algorithm was discovered by Wynn [54] as an ecient implementation of Shanks’ method [47]. It is an algorithm for acceleration of convergence of a sequence S = (s0 ; s1 ; s2 ; : : : ; si ∈ C)
(2.1)
and it comprises the following initialisation and iterative phases: Initialisation: For j = 0; 1; 2; : : : ( j) −1 =0
(arti cially);
0( j) = sj :
(2.2) (2.3)
Iteration: For j; k = 0; 1; 2; : : : ( j) ( j+1) k+1 = k−1 + [k( j+1) − k( j) ]−1 :
(2.4)
The entries k( j) are displayed in the epsilon table on the left-hand side of Fig. 1, and the initialisation has been built in.
54
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Fig. 1. The epsilon table, and a numerical example of it.
Example 2.1. Gregory’s series for tan−1 z is z3 z5 z7 + − + ··· : (2.5) 3 5 7 This series can be used to determine the value of by evaluating its MacLaurin sections at z = 1: tan−1 z = z −
sj := [4 tan−1 (z)]02j+1
z=1
;
j = 0; 1; 2; : : : :
(2.6)
Nuttall’s notation is used here and later on. For a function whose MacLaurin series is (z) = 0 + 1 z + 2 z 2 + · · · ; its sections are de ned by [(z)]kj =
k X
i z i
for 06j6k:
(2.7)
i=j
In fact, sj → as j → ∞ [2] but sequence (2.6) converges slowly, as is evidenced in the column k = 0 of entries sj = 0( j) in Fig. 1. The columns of odd index have little signi cance, whereas the columns of even index can be seen to converge to , which is the correct limit [2], increasingly ( j) fast, as far as the table goes. Some values of 2k are also shown on the bar chart (Fig. 2). Notice (2) (0) that 2 = 3:145 and 4 = 3:142 cannot be distinguished visually on this scale. In Example 2.1, convergence can be proved and the rate of convergence is also known [2]. From the theoretical viewpoint, Example 2.1 is ideal for showing the epsilon algorithm at its best. It is noticeable that the entries in the columns of odd index are large, and this eect warns us to beware of possible loss of numerical accuracy. Like all algorithms of its kind (which use reciprocal dierences of convergent sequences) the epsilon algorithm uses (and usually uses up) numerical precision of the data to do its extrapolation. In this case, there is little loss of numerical precision (0) using 16 decimal place (MATLAB) arithmetic, and 22 = almost to machine precision. In this case, the epsilon algorithm converges with great numerical accuracy because series (2.5) is a totally oscillating series [4,7,17,59]. To understand in general how and why the epsilon algorithm converges, whether we are referring ( j) ( j) to its even columns (2k ; j = 0; 1; 2; : : : ; k xed) or its diagonals (2k ; k = 0; 1; 2; : : : ; j xed) or any
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
55
( j) Fig. 2. Values of 2k for Example 2.1, showing the convergence rate of the epsilon algorithm using n + 1 = 1; 2; 3; 4; 5 terms of the given sequence.
other sequence, the connection with Pade approximation is essential [1,2,56]. Given a (possibly formal) power series f(z) = c0 + c1 z + c2 z 2 + · · · ;
(2.8)
the rational function A(z)B(z)−1 ≡ [‘=m](z)
(2.9)
is de ned as a Pade approximant for f(z) of type [‘=m] if (i) deg{A(z)}6‘;
deg{B(z)}6m;
(2.10)
(ii) f(z)B(z) − A(z) = O(z ‘+m+1 );
(2.11)
(iii) B(0) 6= 0:
(2.12)
The Baker condition B(0) = 1
(2.13)
is often imposed for reliability in the sense of (2.14) below and for a de nite speci cation of A(z) and B(z). The de nition above contrasts with the classical (Frobenius) de nition in which axiom (iii) is waived, and in this case the existence of A(z) and B(z) is guaranteed, even though (2.14) below is not. Using speci cation (2.10) – (2.13), we nd that f(z) − A(z)B(z)−1 = O(z ‘+m+1 );
(2.14)
56
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Fig. 3. Relative location of Pade approximants.
provided that a solution of (2.15) below can be found. To nd B(z), the linear equations corresponding to accuracy-through-orders z ‘+1 ; z ‘+2 ; : : : ; z ‘+m in (2.11) must be solved. They are
c‘−m+1 .. . c‘
::: :::
c‘ .. .
c‘+m−1
bm c‘+1 .. .. . = − . :
P
b1
(2.15)
c‘+m
The coecients of B(z) = mi=0 bi z i are found using an accurate numerical solver of (2.15). By contrast, for purely theoretical purposes, Cramer’s rule is applied to (2.15). We are led to de ne c ‘−m+1 c ‘−m+2 [‘=m] q (z) = ... c‘ zm
c‘−m+2 c‘−m+3 .. . c‘+1 z m−1
::: ::: ::: :::
c‘+1 c‘+2 .. .
c‘+m 1
(2.16)
and then we nd that B[‘=m] (z) = q[‘=m] (z)=q[‘=m] (0)
(2.17)
is the denominator polynomial for the Pade approximation problem (2.9) – (2.15) provided that q[‘=m] (0) 6= 0. The collection of Pade approximants is called the Pade table, and in Fig. 3 we show ve neighbouring approximants in the table. These approximants satisfy a ve-point star identity, [N (z) − C(z)]−1 + [S(z) − C(z)]−1 = [E(z) − C(z)]−1 + [W (z) − C(z)]−1 ;
(2.18)
called Wynn’s identity or the compass identity. The proof of (2.18) is given in [1,2], and it is also a corollary (in the case d = 1) of the more general result (3.59) that we prove in the next section. Assuming (2.18) for the moment, the connection between Pade approximation and the epsilon
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
57
Fig. 4. Some arti cial entries in the Pade table are shown circled.
algorithm is given by connecting the coecients of f(z) with those of S with c0 = s0 ;
ci = si − si−1 ;
i = 1; 2; 3; : : : ;
and by Theorem 2.1. The entries in columns of even index in the epsilon table are values of PadÃe approximants given by ( j) 2k = [ j + k=k](1)
(2.19)
provided (i) zero divisors do not occur in the construction of the epsilon table; and (ii) the corresponding PadÃe approximants identiÿed by (2:19) exist. Proof. The entries W; C; E in the Pade table of Figs. 3 and 4 may be taken to correspond to entries ( j−1) ( j) ( j+1) 2k ; 2k ; 2k , respectively, in the epsilon table. They neighbour other elements in columns of odd ( j) ( j+1) ( j) ( j−1) ; ne := 2k−1 ; se := 2k+1 and sw := 2k+1 . By re-pairing, we index in the epsilon table, nw := 2k−1 have (nw − sw) − (ne − se) = (nw − ne) − (sw − se):
(2.20)
By applying the epsilon algorithm to each term in (2.20), we obtain the compass identity (2.18). With our conventions, the approximants of type [‘=0] lie in the rst row (m = 0) of the Pade table. This is quite natural when we regard these approximants as MacLaurin sections of f(z). However, it must be noted that the row sequence ([‘=m](1); ‘ = m + j; m + j + 1; : : : ; m xed) ( j) corresponds to the column sequence of entries (2m ; j = 0; 1; 2; : : : ; m xed); this identi cation follows from (2.19). A key property of Pade approximants that is an axiom of their de nition is that of accuracy-throughorder, also called correspondence. Before Pade approximants were known as such, attention had rightly been focused on the particular sequence of rational fractions which are truncations of the
58
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
continued fraction c0 za1 za2 za3 ··· : f(z) = 1 − 1 − 1 − 1 −
(2.21)
The right-hand side of (2.21) is called a C-fraction (for instance, see [36]), which is short for corresponding fraction, and its truncations are called its convergents. Normally, it can be constructed by successive reciprocation and re-expansion. The rst stage of this process is 1 − c0 =f(z) a1 za2 za3 = ··· : (2.22) z 1 − 1 − 1 − By undoing this process, we see that the convergents of the C-fraction are rational fractions in the variable z. By construction, we see that these convergents agree order by order with f(z), provided all ai 6= 0, and this property is called correspondence. Example 2.2. We truncate (2.21) after a2 and obtain za1 A2 (z) c0 : = − B2 (z) 1 1 − za2
(2.23)
This is a rational fraction of type [1=1], and we take A2 (z) = c0 (1 − za2 );
B2 (z) = 1 − z(a1 + a2 ):
Provided all the ai 6= 0, the convergents of (2.21) are well de ned. The equality in (2.21) is not to be understood in the sense of pointwise convergence for each value of z, but in the sense of correspondence order by order in powers of z. The numerators and denominators of the convergents of (2.21) are usually constructed using Euler’s recursion. It is initialised, partly arti cially, by A−1 (z) = 0;
A0 (z) = c0 ;
B−1 (z) = 1;
B0 (z) = 1
(2.24)
and the recursion is Ai+1 (z) = Ai (z) − ai+1 zAi−1 (z);
i = 0; 1; 2; : : : ;
(2.25)
Bi+1 (z) = Bi (z) − ai+1 zBi−1 (z);
i = 0; 1; 2; : : : :
(2.26)
Euler’s formula is proved in many texts, for example, [1,2,36]. From (2.24) to (2.26), it follows by induction that i i+1 ‘ = deg{Ai (z)}6 ; m = deg{Bi (z)}6 ; (2.27) 2 2 where [ · ] represents the integer part function and the Baker normalisation is built in: Bi (0) = 1;
i = 0; 1; 2; : : : :
(2.28)
The sequence of approximants generated by (2.24) – (2.26) is shown in Fig. 5. From (2.19) and (2.27), we see that the convergents of even index i = 2k correspond to Pade (0) approximants of type [k=k]; when they are evaluated at z = 1, they are values of 2k on the leading diagonal of the epsilon table.
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
59
Fig. 5. A staircase sequence of approximants indexed by i, as in (2.27).
The epsilon algorithm was introduced in (2.1) – (2.4) as a numerical algorithm. Eq. (2.19) states its connection with values of certain Pade approximants. However, the epsilon algorithm can be given a symbolic interpretation if it is initialised with ( j) −1
= 0;
0( j)
=
j X
ci z i
(2.29)
i=0
instead of (2.2) and (2.3). In this case, (2.19) would become (j) (z) = [ j + k=k](z): 2k
(2.30)
The symbolic implementation of the iterative process (2.4) involves considerable cancellation of polynomial factors, and so we regard this procedure as being primarily of conceptual value. We have avoided detailed discussions of normality and degeneracy [1,2,25] in this paper so as to focus on the algorithmic aspects. The case of numerical breakdown associated with zero divisors is treated by Cordellier [14,15] for example. Refs. [1,2] contain formulae for the dierence between Pade approximants occupying neighbouring positions in the Pade table. Using these formulae, one can show that condition (i) of Theorem 2.1 implies that condition (ii) holds, and so conditions (ii) can be omitted. It is always worthwhile to consider the case in which an approximation method gives exact results at an intermediate stage so that the algorithm is terminated at that stage. For example, let f(z) = 0 +
k X =1
1 − z
(2.31)
with ; ∈ C, each | | ¡ 1, each 6= 0 and all distinct. Then f(z) is a rational function of precise type [k/k]. It is the generating function of the generalised geometric sequence S with elements sj = 0 +
k X =1
1 − j+1 ; 1 −
j = 0; 1; 2; : : : :
(2.32)
60
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
This sequence is sometimes called a Dirichlet series and it converges to s∞ = f(1) as j → ∞. Its elements can also be expressed as sj = s∞ −
k X =1
w j
(2.33)
if s∞ =
k X
+
=0
k X
w
and
w = (1 − )−1 :
=1
Then (2.33) expresses the fact that S is composed of exactly k non-trivial, distinct geometric components. Theorem 2.1 shows that the epsilon algorithm yields ( j) = s∞ ; 2k
j = 0; 1; 2; : : :
which is the ‘exact result’ in each row of the column of index 2k, provided that zero divisors have not occurred before this column is constructed. The algorithm should be terminated at this stage via a consistency test, because zero divisors necessarily occur at the next step. Remarkably, the epsilon algorithm has some smoothing properties [59], which may (or may not) disguise this problem when rounding errors occur. In the next sections, these results will be generalised to the vector case. To do that, we will also need to consider the paradiagonal sequences of Pade approximants given by ([m + J /m](z); m = (J) 0; 1; 2; : : : ; J ¿0; J xed). After evaluation at z = 1, we nd that this is a diagonal sequence (2m ; m= 0; 1; 2; : : : ; J ¿0; J xed) in the epsilon table. 3. The vector epsilon algorithm The epsilon algorithm acquired greater interest when Wynn [57,58] showed that it has a useful and immediate generalisation to the vector case. Given a sequence S = (s0 ; s1 ; s2 ; : : : : si ∈ Cd );
(3.1)
the standard implementation of the vector epsilon algorithm (VEA) consists of the following initialisation from S followed by its iteration phase: Initialisation: For j = 0; 1; 2; : : : ; ( j) −1 =0
(arti cially);
(3.2)
0( j) = sj :
(3.3)
Iteration: For j; k = 0; 1; 2; : : : ; ( j) ( j+1) k+1 = k−1 + [k( j+1) − k( j) ]−1 :
(3.4)
The iteration formula (3.4) is identical to (2.4) for the scalar case, except that it requires the speci cation of an inverse (reciprocal) of a vector. Usually, the Moore–Penrose (or Samelson) inverse , d X
C−1 = C∗ =(CH C) = C∗
i=1
|vi |2
(3.5)
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
61
Fig. 6. Columns k = 0; 2 and 4 of the vector epsilon table for Example 3.1 are shown numerically and graphically.
(where the asterisk denotes the complex conjugate and H the Hermitian conjugate) is the most useful, but there are exceptions [39]. In this paper, the vector inverse is de ned by (3.5). The vector epsilon table can then be constructed column by column from (3.2) to (3.4), as in the scalar case, and as shown in Fig. 6. Example 3.1. The sequence S is initialised by s0 := b := (−0:1; 1:5)T
(3.6)
(where T denotes the transpose) and it is generated recursively by sj+1 := b + Gsj ; with
0:6 G= −1
j = 0; 1; 2; : : :
0:5 : 0:5
The xed point of (3.7) is x = [1; 1], which is the solution of Ax = b with A = I − G. Notice that 4( j) = x
(3.7)
for j = 0; 1; 2
and this ‘exact’ result is clearly demonstrated in the right-hand columns of Fig. 6.
(3.8)
62
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Fig. 7. Schematic view of the two components of u1 (x) and the boundary
on the x1 -axis.
This elementary example demonstrates how the VEA can be a powerful convergence accelerator in an ideal situation. With the same rationale as was explained in the scalar case, the vector epsilon algorithm is used for sequences of vectors when their convergence is too slow. Likewise, the VEA can nd an accurate solution (as a xed point of an associated matrix operator) even when the sequence of vectors is weakly divergent. In applications, these vector sequences usually arise as sequences of discretised functions, and the operator is a (possibly nonlinear) integral operator. An example of this kind of vector sequence is one that arises in a problem of current interest. We consider a problem in acoustics, which is based on a boundary integral equation derived from the Helmholtz equation [12]. Our particular example includes impedance boundary conditions (3.12) relevant to the design of noise barriers. Example 3.2. This is an application of the VEA for the solution of u(x) = u1 (x) + ik
Z
G(x; y)[ (y) − 1]u(y) dy
(3.9)
for the acoustic eld u(x) at the space point x = (x1 ; x2 ). This eld is con ned to the half-space x2 ¿0 by a barrier shown in Fig. 7. The inhomogeneous term in (3.9) is u1 (x) = eik(x1 sin −x2 cos ) + R:eik(x1 sin +x2 cos )
(3.10)
which represents an incoming plane wave and a “partially re ected” outgoing plane wave with wave number k. The re ection coecient in (3.10) is given by R = −tan2
; 2
(3.11)
so that u1 (x) and u(x) satisfy the impedance boundary conditions @u1 = −iku1 @x2
and
@u = −ik u @x2
on :
(3.12)
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
63
Notice that u(x1 ; 0)=u1 (x1 ; 0) if (x1 ; 0) ≡ 1. Then a numerically useful form of the Green’s function in (3.9) is [10] eikr i G(x; y) = H0(1) (kr) + 2
Z
∞
0
t −1=2 e−krt (1 + + it) √ dt; t − 2i(t − i − i )2
(3.13)
where w = x − y; r = |w|, = w2 /r and H0(1) (z) is a Hankel function of the rst kind, as speci ed more fully in [10,11]. By taking x2 = 0 in (3.9), we see from (3.13) that u(x1 ; 0) satis es an integral equation with Toeplitz structure, and the fast Fourier transform yields its iterative solution eciently. Without loss of generality, we use the scale determined by k =1 in (3.9) – (3.13). For this example, the impedance is taken to be = 1:4ei=4 on the interval = {x: − 40 ¡ x1 ¡ 40; x2 = 0}. At two sample points (x1 ≈ −20 and 20) taken from a 400-point discretisation of , we found the following results with the VEA using 16 decimal place (MATLAB) arithmetic 0(12) = [ : : ; −0:36843 + 0:44072i; : : : ; −0:14507 + 0:55796i; : :]; 2(10) = [ : : ; −0:36333 + 0:45614i; : : : ; −0:14565 + 0:56342i; : :]; 4(8) = [ : : ; −0:36341 + 0:45582i; : : : ; −0:14568 + 0:56312i; : :];
6(6) = [ : : ; −0:36341 + 0:45583i; : : : ; −0:14569 + 0:56311i; : :];
8(4) = [ : : ; −0:36341 + 0:45583i; : : : ; −0:14569 + 0:56311i; : :]; where the converged gures are shown in bold face. Each of these results, showing just two of the components of a particular ( j) in columns = 0; 2; : : : ; 8 of the vector-epsilon table, needs 12 iterations of (3.9) for its construction. In this application, these results show that the VEA converges reasonably steadily, in contrast to Lanczos type methods, eventually yielding ve decimal places of precision. Example 3.2 was chosen partly to demonstrate the use of the vector epsilon algorithm for a weakly convergent sequence of complex-valued data, and partly because the problem is one which lends itself to iterative methods. In fact, the example also shows that the VEA has used up 11 of the 15 decimal places of accuracy of the data to extrapolate the sequence to its limit. If greater precision is required, other methods such as stabilised Lanczos or multigrid methods should be considered. The success of the VEA in examples such as those given above is usually attributed to the fact ( j) that the entries {2k ; j = 0; 1; 2; : : :} are the exact limit of a convergent sequence S if S is generated by precisely k nontrivial geometric components. This result is an immediate and direct generalisation of that for the scalar case given in Section 2. The given vector sequence is represented by sj = C0 +
k X
C
=1
j X
( )i = s∞ −
i=0
k X
w ( )j ;
j = 0; 1; 2; : : : ;
(3.14)
=1
where each C ; w ∈ Cd ; ∈ C; | | ¡ 1, and all the are distinct. The two representations used in (3.14) are consistent if k X =0
C = s∞ −
k X =1
w
and
C = w (−1 − 1):
To establish this convergence result, and its generalisations, we must set up a formalism which allows vectors to be treated algebraically.
64
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
From the given sequence S = (si ; i = 0; 1; 2; : : : ; : si ∈ Cd ), we form the series coecients c0 := s0 ;
ci := si − si−1 ;
i = 1; 2; 3; : : :
(3.15)
and the associated generating function f (z) = c0 + c1 z + c2 z 2 + · · · ∈ Cd [[z]]:
(3.16)
Our rst aim is to nd an analogue of (2.15) which allows construction, at least in principle, of the denominator polynomials of a vector-valued Pade approximant for f (z). This generalisation is possible if the vectors cj in (3.16) are put in one–one correspondence with operators cj in a Cliord algebra A. The details of how this is done using an explicit matrix representation were basically set out by McLeod [37]. We use his approach [26,27,38] and square matrices Ei , i = 1; 2; : : : ; 2d + 1 of dimension 22d+1 which obey the anticommutation relations Ei Ej + Ej Ei = 2ij I;
(3.17)
where I is an identity matrix. The special matrix J = E2d+1 is used to form the operator products Fi = JEd+i ;
i = 1; 2; : : : ; d:
(3.18)
Then, to each vector w = x + iy ∈ Cd whose real and imaginary parts x; y ∈ Rd , we associate the operator w=
d X
xi E i +
i=1
d X
yi Fi :
(3.19)
i=1
The real linear space V C is de ned as the set of all elements of the form (3.19). If w1 ; w2 ∈V C correspond to w1 ; w2 ∈ Cd and ; are real, then w3 = w1 + w2 ∈ V C
(3.20) d
corresponds uniquely to w3 = w1 + w2 ∈ C . Were ; complex, the correspondence would not be d one–one. We refer to the space V C as the isomorphic image of C , where the isomorphism preserves linearity only in respect of real multipliers as shown in (3.20). Thus the image of f (z) is f(z) = c0 + c1 z + c2 z 2 + · · · ∈ V C [[z]]:
(3.21)
The elements Ei , i = 1; 2; : : : ; 2d + 1 are often called the basis vectors of A, and their linear combinations are called the vectors of A. Notice that the Fi are not vectors of A and so the vectors of A do not form the space V C . Products of the nonnull vectors of A are said to form the Lipschitz group [40]. The reversion operator, denoted by a tilde, is de ned as the anti-automorphism which reverses the order of the vectors constituting any element of the Lipschitz group and the operation is extended to the whole algebra A by linearity. For example, if ; ∈ R and D = E1 + E4 E5 E6 ; then D˜ = E1 + E6 E5 E4 : Hence (3.18) and (3.19) imply that w˜ =
d X i=1
xi E i −
d X i=1
yi Fi :
(3.22)
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
65
We notice that w˜ corresponds to w∗ , the complex conjugate of w, and that ww ˜ =
d X i=1
(xi2 + yi2 )I = ||w||22 I
(3.23)
is a real scalar in A. The linear space of real scalars in A is de ned as S := {I; ∈ R}. Using (3.23) we can form reciprocals, and 2 w−1 = w=|w| ˜ ;
(3.24)
|w| := ||w||;
(3.25)
where so that w−1 is the image of w−1 as de ned by (3.5). Thus (3.19) speci es an isomorphism between (i) the space Cd , having representative element w = x + iy
and an inverse
w−1 = w∗ =||w||2 ;
(ii) the real linear space V C with a representative element w=
d X
xi E i +
d X
i=1
yi Fi
and its inverse given by
2 w−1 = w=|w| ˜ :
i=1
The isomorphism preserves inverses and linearity with respect to real multipliers, as shown in (3.20). Using this formalism, we proceed to form the polynomial q2j+1 (z) analogously to (2.15). The equations for its coecients are
c0 .. . cj
··· ···
(2j+1) qj+1 cj −cj+1 .. .. . . . = .. c2j −c2j+1 q1(2j+1)
(3.26)
which represent the accuracy-through-order conditions; we assume that q0(2j+1) = q2j+1 (0) = I . In (2j+1) (2j+1) ; qj ; : : : ; q2(2j+1) sequentially, nd q1(2j+1) and then principle, we can eliminate the variables qj+1 the rest of the variables of (3.26) by back-substitution. However, the resulting qi(2j+1) turn out to be higher grade quantities in the Cliord algebra, meaning that they involve higher-order outer products of the fundamental vectors. Numerical representation of these quantities uses up computer storage and is undesirable. For practical purposes, we prefer to work with low-grade quantities such as scalars and vectors [42]. The previous remarks re ect the fact that, in general, the product w1 ; w2 ; w3 6∈ V C when w1 ; w2 ; w3 ∈ V C . However, there is an important exception to this rule, which we formulate as follows [26], see Eqs. (6:3) and (6:4) in [40]. d Lemma 3.3. Let w; t ∈ V C be the images of w = x + iy; t = u + iC ∈ C . Then
(i) t w˜ + wt˜ = 2 Re(wH t)I ∈ S;
(3.27)
(ii) wt˜w = 2w Re(wH t) − t||w||2 ∈ V C:
(3.28)
66
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Proof. Using (3.17), (3.18) and (3.22), we have t w˜ + wt˜ =
d X d X
(ui Ei + vi Fi )(xj Ej − yj Fj ) + (xj Ej + yj Fj )(ui Ei − vi Fi )
i=1 j=1
= (uT x + CT y)I = 2 Re(wH t)I because, for i; j = 1; 2; : : : ; d; Fi Ej − Ej Fi = 0;
Fi Fj + Fj Fi = −2ij I:
For part (ii), we simply note that wt˜w = w(t˜w + wt) ˜ − wwt: ˜ We have noted that, as j increases, the coecients of q2j+1 (z) are increasingly dicult to store. Economical approximations to q2j+1(z) are given in [42]. Here we proceed with
c0 .. .
···
cj+1
···
(2j+1) qj+1 . ..
0 cj+1 .. . .. . (2j+1) = 0 q1 c2j+2 e2j+1 I
(3.29)
which are the accuracy-through-order conditions for a right-handed operator Pade approximant (OPA) p2j+1 (z)[q2j+1 (z)]−1 for f(z) arising from f(z)q2j+1 (z) = p2j+1 (z) + e2j+1 z 2j+2 + O(z 2j+3 ):
(3.30)
The left-hand side of (3.29) contains a general square Hankel matrix with elements that are operators from V C . A remarkable fact, by no means obvious from (3.29) but proved in the next theorem, is that e2j+1 ∈ V C:
(3.31)
This result enables us to use OPAs of f(z) without constructing the denominator polynomials. A quantity such as e2j+1 in (3.29) is called the left-designant of the operator matrix and it is denoted by c0 e2j+1 = ... c
j+1
···
cj+1 .. .
···
c2j+2
:
(3.31b)
l
The subscript l (for left) distinguishes designants from determinants, which are very dierent constructs. Designants were introduced by Heyting [32] and in this context by Salam [43]. For present purposes, we regard them as being de ned by the elimination process following (3.26). Example 3.4. The denominator of the OPA of type [0=1] is constructed using
c0 c1
c1 c2
"
#
0 q1(1) = : e I 1
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
67
We eliminate q1(1) as described above following (3.26) and nd that c e1 = 2 c1
c1 = c2 − c1 c0−1 c1 ∈ span{c0 ; c1 ; c2 }: c0 l
(3.32)
Proceeding with the elimination in (3.29), we obtain
c2 −
c1 c0−1 c1
.. . cj+2 − cj+1 c0−1 c1
··· ···
cj+2 − c2j+2 −
c1 c0−1 cj+1 .. .
cj+1 c0−1 cj+1
(2j+1) qj+2 . ..
0 .. .
: (2j+1) = 0 q1
I
(3.33)
e2j+1
Not all the elements of the matrix in (3.33) are vectors. An inductive proof that e2j+1 is a vector (at least in the case when the cj are real vectors and the algebra is a division ring) was given by Salam [43,44] and Roberts [41] using the designant forms of Sylvester’s and Schweins’ identities. We next construct the numerator and denominator polynomials of the OPAs of f(z) and prove (3.31) using Berlekamp’s method [3], which leads on to the construction of vector Pade approximants. Deÿnitions. Given the series expansion (3.22) of f(z), numerator and denominator polynomials Aj (z); Bj (z) ∈ A[z] of degrees ‘j ; mj are de ned sequentially for j = 0; 1; 2; : : : ; by −1 Aj+1 (z) = Aj (z) − zAj−1 (z)ej−1 ej ;
(3.34)
−1 Bj+1 (z) = Bj (z) − zBj−1 (z)ej−1 ej
(3.35)
in terms of the error coecients ej and auxiliary polynomials Dj (z) which are de ned for j=0; 1; 2; : : : by ej := [f(z)Bj (z)B˜ j (z)]j+1 ;
(3.36)
−1 : Dj (z) := B˜ j (z)Bj−1 (z)ej−1
(3.37)
These de nitions are initialised with A0 (z) = c0 ;
B0 (z) = I;
e 0 = c1 ;
A−1 (z) = 0;
B−1 (z) = I;
e−1 = c0 :
(3.38)
Example 3.5. A1 (z) = c0 ;
B1 (z) = I − zc0−1 c1 ;
−1 D1 (z) = c1−1 − z c˜1 c˜−1 0 c1 :
e1 = c2 − c1 c0−1 c1 ; (3.39)
Lemma 3.6. Bj (0) = I;
j = 0; 1; 2; : : : :
(3.40)
68
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Proof. See (3.35) and (3.38). Theorem 3.7. With the deÿnitions above; for j = 0; 1; 2; : : : ; (i)
f(z)Bj (z) − Aj (z) = O(z j+1 ):
(ii)
‘j := deg{Aj (z)} = [ j=2];
(iii)
Bj (z)B˜ j (z) = B˜ j (z)Bj (z) ∈ S[z]:
(3.43)
(iv)
ej ∈ V C:
(3.44)
(v)
Dj (z); Aj (z)B˜ j (z) ∈ V C [z]:
(3.45)
(vi)
f(z)Bj (z) − Aj (z) = ej z j+1 + O(z j+2 ):
(3.46)
mj := deg{Bj (z)} = [(j + 1)=2];
(3.41) deg{Aj (z)B˜ j (z)} = j:
(3.42)
Proof. Cases j=0; 1 are veri ed explicitly using (3.38) and (3.39). We make the inductive hypothesis that (i) – (vi) hold for index j as stated, and for index j − 1. Part (i): Using (3.34), (3.35) and the inductive hypothesis (vi), −1 f(z)Bj+1 (z) − Aj+1 (z) = f(z)Bj (z) − Aj (z) − z( f(z)Bj−1 (z) − Aj−1 (z))ej−1 ej = O(z j+2 ):
Part (ii): This follows from (3.34), (3.35) and the inductive hypothesis (ii). Part (iii): Using (3.27) and (3.35), and hypotheses (iii) – (iv) inductively, B˜ j+1 (z)Bj+1 (z) = B˜ j (z)Bj (z) + z 2 B˜ j−1 (z)Bj−1 (z)|ej |2 |ej−1 |−2 − z[Dj (z)ej + e˜ j D˜ j (z)] ∈ S[z] and (iii) follows after postmultiplication by B˜ j+1 (z) and premultiplication by [B˜ j+1 (z)]−1 , see [37, p. 45]. Part (iv): By de nition (3.36), 2mj+1
ej+1 =
X
cj+2−i i ;
i=0
where each i = [Bj+1 (z)B˜ j+1 (z)]i ∈ S is real. Hence ej+1 ∈ V C: Part (v): From (3.35) and (3.37), Dj+1 (z) = [B˜ j (z)Bj (z)]ej−1 − z[e˜ j D˜ j (z)ej−1 ]: Using part (v) inductively, parts (iii), (iv) and Lemma 3.3, it follows that Dj+1 (z) ∈ V C [z]. Using part (i), (3.40) and the method of proof of part (iv), we have Aj+1 (z)B˜ j+1 (z) = [f(z)Bj+1 (z)B˜ j+1 (z)]j+1 ∈V C [z]: 0 Part (vi): From part (i), we have f(z)Bj+1 (z) − Aj+1 (z) = j+1 z j+2 + O(z j+3 ) for some j+1 ∈ A. Hence, f(z)Bj+1 (z)B˜ j+1 (z) − Aj+1 (z)B˜ j+1 (z) = j+1 z j+2 B˜ j+1 (z) + O(z j+3 ): Using (ii) and (3.40), we obtain j+1 = ej+1 , as required.
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
69
Corollary. The designant of a Hankel matrix of real (or complex) vectors is a real (or complex) vector. Proof. Any designant of this type is expressed by e2j+1 in (3.31b), and (3.44) completes the proof. The implications of the previous theorem are extensive. From part (iii) we see that Qj (z) : I := Bj (z)B˜ j (z)
(3.47)
de nes a real polynomial Qj (z). Part (iv) shows that the ej are images of vectors ej ∈ Cd ; part (vi) (0) of justi es calling them error vectors but they are also closely related to the residuals b − A2j d ˜ Example 3.1. Part (v) shows that Aj (z)Bj (z) is the image of some Pj (z) ∈ C [z], so that Aj (z)B˜ j (z) =
d X i=1
[Re{Pj }(z)]i Ei +
d X i=1
[Im{Pj }(z)]i Fi :
(3.48)
From (3.17) and (3.18), it follows that Pj (z) · Pj∗ (z) = Qj (z)Qˆ j (z);
(3.49)
where Qˆ j (z) is a real scalar polynomial determined by Qˆ j (z)I = Aj (z)A˜j (z). Property (3.49) will later be used to characterise certain VPAs independently of their origins in A. Operator Pade approximants were introduced in (3.34) and (3.35) so as to satisfy the accuracy-through-order property (3.41) for f(z). To generalise to the full table of approximants, only initialisation (3.38) and the degree speci cations (3.42) need to be changed. For J ¿ 0, we use A(0J ) (z) =
J X
ci z i ;
B0( J ) (z) = I;
e0( J ) = cJ +1 ;
ci z i ;
(J) B−1 (z) = I;
(J) e−1 = cJ ;
i=0 J) A(−1 (z) =
J −1 X i=0
(3.50)
‘j( J ) := deg{A(j J ) (z)} = J + [ j=2]; m(j J ) := deg{Bj( J ) (z)} = [(j + 1)=2]
(3.51)
and then (3.38) and (3.42) correspond to the case of J = 0. For J ¡ 0, we assume that c0 6= 0, and de ne −1 ˜ ˜ f(z)] g(z) = [f(z)]−1 = f(z)[f(z)
(3.52)
corresponding to g(z) = [ f (z)]−1 = f ∗ (z)[ f (z) : f ∗ (z)]−1 :
(3.53)
(If c0 = 0, we would remove a maximal factor of z from f(z) and reformulate the problem.)
70
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Then, for J ¡ 0, A(0J ) (z) = I; A(1J ) (z) = I;
B0( J ) (z) = B1( J ) (z) =
−J X
gi z i ;
e0( J ) = [f(z)B0( J ) (z)]1−J ;
gi z i ;
e1( J ) = [f(z)B1( J ) (z)]2−J ;
i=0 1−J X i=0
‘j( J ) := deg{A(j J ) (z)} = [ j=2]; m(j J ) := deg{Bj( J ) (z)} = [(j + 1)=2] − J:
(3.54)
If an approximant of given type [‘=m] is required, there are usually two dierent staircase sequences of the form S (J ) = (A(j J ) (z)[Bj( J ) (z)]−1 ;
j = 0; 1; 2; : : :)
(3.55)
which contain the approximant, corresponding to two values of J for which ‘ = ‘j( J ) and m = m(j J ) . For ease of notation, we use p[‘=m] (z) ≡ A(j J ) (z) and q[‘=m] (z) ≡ Bj( J ) (z). The construction based on (3.41) is for right-handed OPAs, as in f(z) = p[‘=m] (z)[q[‘=m] (z)]−1 + O(z ‘+m+1 );
(3.56)
but the construction can easily be adapted to that for left-handed OPAs for which f(z) = [q[‘=m] (z)]−1 p [‘=m] (z) + O(z ‘+m+1 ):
(3.57)
Although the left- and right-handed numerator and denominator polynomials usually are dierent, the actual OPAs of given type are equal: Theorem 3.8 (Uniqueness). Left-handed and right-handed OPAs; as speciÿed by (3.56) and (3.57) are identical: [‘=m](z) := p[‘=m] (z)[q[‘=m] (z)]−1 = [q[‘=m] (z)]−1 p [‘=m] (z) ∈ V C
(3.58)
and the OPA of type [‘=m] for f(z) is unique. Proof. Cross-multiply (3.58), use (3.56), (3.57) and then (3.40) to establish the formula in (3.58). Uniqueness of [‘=m](z) follows from this formula too, and its vector character follows from (3.43) and (3.45). The OPAs and the corresponding VPAs satisfy the compass ( ve-point star) identity amongst approximants of the type shown in the same format as Fig. 3. Theorem 3.9 (Wynn’s compass identity [57,58]). [N (z) − C (z)]−1 + [S(z) − C (z)]−1 = [E(z) − C (z)]−1 + [W (z) − C (z)]−1 :
(3.59)
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
71
Proof. We consider the accuracy-through-order equations for the operators: p N (z)qC (z) − qN (z)pC (z) = z ‘+m p˙ N q˙C ; p C (z)qW (z) − qC (z)pW (z) = z ‘+m p˙ C q˙W ; p N (z)qW (z) − qN (z)pW (z) = z ‘+m p˙ N q˙W ; where q˙ ; p˙ denote the leading coecients of p (z); q (z), and care has been taken to respect noncommutativity. Hence [N (z) − C(z)]−1 − [W (z) − C(z)]−1 = [N (z) − C(z)]−1 (W (z) − N (z))[W (z) − C(z)]−1 = qC [p N qC − qN pC ]−1 (qN pW − p N qW )[qC pW − p C qW ]−1 qC −1
= z −‘−m qC (z)q˙−1 ˙ C qC (z): C p Similarly, we nd that −1
˙ C qC (z) [E(z) − C(z)]−1 − [S(z) − C(z)]−1 = z −‘−m qC (z)q˙−1 C p and hence (3.59) is established in its operator form. Complex multipliers are not used in it, and so (3.59) holds as stated. An important consequence of the compass identity is that, with z = 1, it becomes equivalent to the vector epsilon algorithm for the construction of E(1) as we saw in the scalar case. If the elements sj ∈ S have representation (3.14), there exists a scalar polynomial b(z) of degree k such that f (z) = a(z)=b(z) ∈ Cd [[z]]:
(3.60)
If the coecients of b(z) are real, we can uniquely associate an operator f(z) with f (z) in (3.60), and then the uniqueness theorem implies that ( j) = f (1) 2k
(3.61)
and we are apt to say that column 2k of the epsilon table is exact in this case. However, Example 3.2 indicates that the condition that b(z) must have real coecients is not necessary. For greater generality in this respect, generalised inverse, vector-valued Pade approximants (GIPAs) were introduced [22]. The existence of a vector numerator polynomial P [n=2k] (z) ∈ Cd [z] and a real scalar denominator polynomial Q[n=2k] (z) having the following properties is normally established by (3.47) and (3.48): (i) deg{P [n=2k] (z)} = n; (ii)
deg{Q[n=2k] (z)} = 2k;
Q[n=2k] (z) is a factor of P [n=2k] (z):P [n=2k]∗ (z);
(iii) Q[n=2k] (0) = 1; (iv)
f (z) − P [n=2k] (z)=Q[n=2k] (z) = O(z n+1 );
(3.62) (3.63) (3.64) (3.65)
where the star in (3.63) denotes the functional complex-conjugate. These axioms suce to prove the following result.
72
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Theorem 3.10 (Uniqueness [24]). If the vector-valued PadÃe approximant R[n=2k] (z) := P [n=2k] (z)=Q[n=2k] (z)
(3.66)
of type [n=2k] for f (z) exists; then it is unique. Proof. Suppose that R(z) = P(z)=Q(z);
ˆ ˆ ˆ R(z) = P(z)= Q(z)
are two dierent vector-valued Pade approximants having the same speci cation as (3.62) – (3.66). ˆ Let Qgcd (z) be the greatest common divisor of Q(z); Q(z) and de ne reduced and coprime polynomials by Qr (z) = Q(z)=Qgcd (z);
ˆ Qˆ r (z) = Q(z)=Q gcd (z):
From (3.63) and (3.65) we nd that ∗ ˆ ˆ∗ ˆ z 2n+2 Qr (z)Qˆ r (z) is a factor of [P(z)Qˆ r (z) − P(z)Q r (z)] · [P (z)Q r (z) − P (z)Qr (z)]:
(3.67)
The left-hand expression of (3.67) is of degree 2n+4k −2:deg{Qgcd (z)}+2. The right-hand expression of (3.67) is of degree 2n + 4k − 2:deg{Qgcd (z)}. Therefore the right-hand expression of (3.67) is identically zero. [n=2m] [n=2m] By taking Qˆ (z) = b(z):b∗ (z) and Pˆ (z) = a(z)b∗ (z), the uniqueness theorem shows that the generalised inverse vector-valued Pade approximant constructed using the compass identity yields
f (z) = a(z)b∗ (z)=b(z)b∗ (z) exactly. On putting z =1, it follows that the sequence S, such as the one given by (3.14), is summed exactly by the vector epsilon algorithm in the column of index 2k. For normal cases, we have now outlined the proof of a principal result [37,2]. Theorem 3.11 (McLeod’s theorem). Suppose that the vector sequence S satisÿes a nontrivial recursion relation k X
i si+j =
i=0
k X
!
i s ∞ ;
j = 0; 1; 2; : : :
(3.68)
i=0
with i ∈ C. Then the vector epsilon algorithm leads to ( j) 2k = s∞ ;
j = 0; 1; 2; : : :
(3.69)
provided that zero divisors are not encountered in the construction. The previous theorem is a statement about exact results in the column of index 2k in the vector epsilon table. This column corresponds to the row sequence of GIPAs of type [n=2k] for f (z), evaluated at z = 1. If the given vector sequence S is nearly, but not exactly, generalized geometric, we model this situation by supposing that its generating function f (z) is analytic in the closed unit except for k poles in D := {z: |z| ¡ 1}. This hypothesis ensures that f (z) is analytic at disk D, z = 1, and it is suciently strong to guarantee convergence of the column of index 2k in the vector
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
73
epsilon table. There are several convergence theorems of this type [28–30,39]. It is important to note that any row convergence theorem for generalised inverse vector-valued Pade approximants has immediate consequences as a convergence result for a column of the vector epsilon table. A determinantal formula for Q[n=2k] (z) can be derived [24,25] by exploiting the factorisation property (3.63). The formula is 0 M 10 .. Q[n=2k] (z) = . M2k−1; 0 z 2k
M01
M02
:::
0
M12
:::
.. .
.. .
M2k−1; 1
M2k−1; 2
z
2k−1
z
M0; 2k
M1; 2k .. .
2k−2
:::
M2k−1; 2k
:::
1
;
(3.70)
where the constant entries Mij are those in the rst 2k rows of an anti-symmetric matrix M ∈ R(2k+1)×(2k+1) de ned by
Mij =
j−i−1 X H c‘+i+n−2k+1 · cj−‘+n−2k
for j ¿ i;
l=0
−Mji
for i ¡ j;
0
for i = j:
As a consequence of the compass identity (Theorem 3.9) and expansion (3.16), we see that entries in the vector epsilon table are given by ( j) 2k = P [ j+2k=2k] (1)=Q[ j+2k=2k] (1);
j; k¿0;
From this result, it readily follows that each entry in the columns of even index in the vector epsilon table is normally given succinctly by a ratio of determinants: 0 M10 ( j) .. 2k = . M 2k−1; 0 s j
M01
:::
0
:::
.. .
0 M10 .. ÷ . M 2k−1; 0 1
M0; 2k
M1; 2k .. .
M2k−1; 1
:::
M2k−1; 2k
sj+1
:::
s2k+j
M01
:::
0
:::
.. .
M0; 2k
M1; 2k .. .
M2k−1; 1
:::
M2k−1; 2k
1
:::
1
:
For computation, it is best to obtain numerical results from (3.4). The coecients of Q[n=2k] (z) = [n=2k] i z should be found by solving the homogeneous, anti-symmetric (and therefore consistent) i=0 Qi linear system equivalent to (3.70), namely P2k
M q = 0; [n=2k] where qT = (Q2k−i ; i = 0; 1; : : : ; 2k).
74
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
4. Vector-valued continued fractions and vector orthogonal polynomials (0) The elements 2k lying at the head of each column of even index in the vector epsilon table are values of the convergents of a corresponding continued fraction. In Section 3, we noted that the entries in the vector epsilon table are values of vector Pade approximants of
f (z) = c0 + c1 z + c2 z 2 + · · ·
(4.1)
as de ned by (3.16). To obtain the continued fraction corresponding to (4.1), we use Viskovatov’s algorithm, which is an ingenious rule for eciently performing successive reciprocation and re-expansion of a series [2]. Because algebraic operations are required, we use the image of (4.1) in A, which is f(z) = c0 + c1 z + c2 z 2 + · · ·
(4.2)
with ci ∈ V C . Using reciprocation and re-expansion, we nd f(z) =
J −1 X
ci z i +
i=0
z J cJ z1( J ) z 1( J ) z2( J ) z 2( J ) ··· 1 − 1 − 1 − 1 − 1 −
(4.3)
with i( J ) ; i( J ) ∈ A and provided all i( J ) 6= 0; i( J ) 6= 0. By de nition, all the inverses implied in (4.3) are to be taken as right-handed inverses. For example, the second convergent of (4.3) is [J + 1=1](z) =
J −1 X
ci z i + z J cJ [1 − z1( J ) [1 − z 1( J ) ]−1 ]−1
i=0
and the corresponding element of the vector epsilon table is 2( J ) = [J + 1=1](1); where the type refers to the allowed degrees of the numerator and denominator operator polynomials. The next algorithm is used to construct the elements of (4.3). Theorem 4.1 (The vector qd algorithm [40]). With the initialisation 0( J ) = 0;
J = 1; 2; 3; : : : ;
1( J ) = cJ−1 cJ +1 ;
J = 0; 1; 2; : : : ;
(4.4) (4.5)
the remaining i( J ) ; i( J ) can be constructed using ( J +1) m( J ) + m( J ) = m( J +1) + m−1 ;
(4.6)
(J) = m( J +1) m( J +1) m( J ) m+1
(4.7)
for J = 0; 1; 2; : : : and m = 1; 2; 3; : : : : Remark. The elements connected by these rules form lozenges in the − array, as in Fig. 8. Rule (4.7) requires multiplications which are noncommutative except in the scalar case.
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
75
Fig. 8.
Proof. First, the identity C + z[1 + z D−1 ]−1 = C + z − z 2 [z + D]−1 is applied to (4.3) with = cJ ; = f(z) =
J X
ci z i +
i=0
1
−1( J ) ,
z J +1 cJ 1( J ) − z(1( J ) + 1( J ) )
then with =
−1
(4.8) − 1( J ) ;
z 2 1( J ) 2( J ) − z(2( J ) + 2( J ) )
−
=
−2( J ) ,
··· :
etc. We obtain (4.9)
Secondly, let J → J + 1 in (4.3), and then apply (4.8) with = −1( J +1) ; = − 1( J +1) , then with = −2( J +1) ; = − 2( J +1) , etc., to obtain f(z) =
J X
ci z i +
i=0
z J +1 cJ +1 z 2 1( J +1) 1( J +1) ··· : 1 − z1( J +1) − 1 − z( 1( J +1) + 2( J +1) ) −
(4.10)
These expansions (4.9) and (4.10) of f(z) must be identical, and so (4.4) – (4.7) follow by identi cation of the coecients. The purpose of this algorithm is the iterative construction of the elements of the C-fraction (4.3) starting from the coecients ci of (4.1). However, the elements i( J ) ; i( J ) are not vectors in the algebra. Our next task is to reformulate this algorithm using vector quantities which are amenable for computational purposes. The recursion for the numerator and denominator polynomials was derived in (3.34) and (3.35) for case of J = 0, and the more general sequence of approximants labelled by J ¿0 was introduced in (3.50) and (3.51). For them, the recursions are J) J) ( J )−1 ( J ) A(j+1 (z) = A(j J ) (z) − zA(j−1 (z)ej−1 ej ;
(4.11)
(J) (J) ( J )−1 ( J ) Bj+1 (z) = Bj( J ) (z) − zBj−1 (z)ej−1 ej
(4.12)
and accuracy-through-order is expressed by f(z)Bj( J ) (z) = A(j J ) (z) + ej( J ) z j+J +1 + O(z j+J +2 )
(4.13)
for j=0; 1; 2; : : : and J ¿0. Euler’s formula shows that (4.11) and (4.12) are the recursions associated with f(z) =
J −1 X i=0
ci z i +
cJ z J e0( J ) z e0( J )−1 e1( J ) z e1( J )−1 e2( J ) z ··· : − − 1 − 1 − 1 1
(4.14)
76
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
As was noted for (3.55), the approximant of (operator) type [J + m=m] arising from (4.14) is also a convergent of (4.14) with J → J + 1. We nd that (J) J +1) ( J +1) A(2mJ ) (z)[B2m (z)]−1 = [J + m=m](z) = A(2m−1 [B2m−1 (z)]−1
(4.15)
and their error coecients in (4.13) are also the same: (J) ( J +1) = e2m−1 ; e2m
m; J = 0; 1; 2; : : : :
(4.16)
These error vectors ei( J ) ∈ V C obey the following identity. Theorem 4.2 (The cross-rule [27,40,41,46]). With the partly artiÿcial initialisation ( J +1) e−2 = ∞;
e0( J ) = cJ +1
for J = 0; 1; 2; : : : ;
(4.17)
the error vectors obey the identity ( J −1) ( J +1)−1 = ei( J +1) + ei( J ) [ei−2 ei+2 − ei( J −1)−1 ]ei( J )
(4.18)
for J ¿0 and i¿0. Remark. These entries are displayed in Fig. 9 at positions corresponding to their associated approximants (see (4.13)) which satisfy the compass rule. Proof. We identify the elements of (4.3) and (4.14) and obtain (J) ( J )−1 ( J ) j+1 = e2j−1 e2j ;
(J) ( J )−1 ( J ) j+1 = e2j e2j+1 :
(4.19)
We use (4.16) to standardise on even-valued subscripts for the error vectors in (4.19): (J) ( J −1)−1 ( J ) = e2j e2j ; j+1
(J) ( J )−1 ( J −1) j+1 = e2j e2j+2 :
(4.20)
Substitute (4.20) in (4.6) with m = j + 1 and i = 2j, giving ( J −1) ( J +1)−1 ( J ) = ei( J )−1 ei( J +1) + ei−2 ei( J −1)−1 ei( J ) + ei( J )−1 ei+2 ei :
(4.21)
Result (4.18) follows from (4.21) directly if i is even, but from (4.16) and (4.20) if i is odd. Initialisation (4.17) follows from (3.50). From Fig. 9, we note that the cross-rule can be informally expressed as −1 )eC eS = eE + eC (eN−1 − eW
(4.22)
where e ∈ VC for = N; S; E; W and C. Because these error vectors are designants (see (3.31b)), Eq. (4.22) is clearly a fundamental compass identity amongst designants. In fact, this identity has also been established for the leading coecients p˙ of the numerator polynomials [23]. If we were to use monic normalisation for the denominators Q˙ (z) = 1;
(J) B˙ j (z) = I;
(J)
p˙ := A˙ j (z)
(4.23)
(where the dot denotes that the leading coecient of the polynomial beneath the dot is required), we would nd that ˙ −1 ˙C; p˙ S = p˙ E + p˙ C (p˙ −1 N −p W )p corresponding to the same compass identity amongst designants.
(4.24)
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
77
Fig. 9. Position of error vectors obeying the cross-rule.
Reverting to the normalisation of (3.64) with q (0) = I and Q (0) = 1, we note that formula (3.28) is required to convert (4.22) to a usable relation amongst vectors e ∈ Cd . We nd that 2
eS = eE − |eC |
eN eW − + 2eC Re eCH |eN |2 |eW |2
eN eW − |eN |2 |eW |2
and this formula is computationally executable. Implementation of this formula enables the calculation of the vectors e in Cd in a rowwise fashion (see Fig. 9). For the case of vector-valued meromorphic functions of the type described following (3.69) it is shown in [40] that asymptotic (i.e., as J tends to in nity) results similar to the scalar case are valid, with an interesting interpretation for the behaviour of the vectors ei( J ) as J tends to in nity. It is also shown in [40] that, as in the scalar case, the above procedure is numerically unstable, while a column-by-column computation retains stability – i.e., (4.22) is used to evaluate eE . There are also considerations of under ow and over ow which can be dealt with by a mild adaptation of the cross-rule. Orthogonal polynomials lie at the heart of many approximation methods. In this context, the orthogonal polynomials are operators i () ∈ A[], and they are de ned using the functionals c{·} and c{·}. These functionals are de ned by their action on monomials: c{i } = ci ;
c{i } = ci :
(4.25)
By linearity, we can normally de ne monic vector orthogonal polynomials by 0 () = I and, for i = 1; 2; 3; : : : ; by c{i ()j } = 0;
j = 0; 1; : : : ; i − 1:
The connection with the denominator polynomials (3.35) is Theorem 4.3. For i = 0; 1; 2; : : : i () = i B2i−1 (−1 ):
(4.26)
78
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
Proof. Since B2i−1 (z) is an operator polynomial of degree i, so is i (). Moreover, for j = 0; 1; : : : ; i − 1, j
i+j
−1
c{ i ()} = c{ B2i−1 ( )} =
i X ‘=0
c{
i+j−‘
B‘(2i−1) }
=
i X
ci+j−‘ B‘(2i−1)
‘=0
= [f(z)B2i−1 (z)]i+j = 0 as is required for (4.26). This theorem establishes an equivalence between approximation methods based on vector orthogonal polynomials and those based on vector Pade approximation. To take account of noncommutativity, more care is needed over the issue of linearity with respect to multipliers from A than is shown in (4.26). Much fuller accounts, using variants of (4.26), are given by Roberts [41] and Salam [44,45]. In this section, we have focussed on the construction and properties of the continued fractions associated with the leading diagonal sequence of vector Pade approximants. When these approximants (0) are evaluated at z = 1, they equal 2k , the entries on the leading diagonal of the vector epsilon table. These entries are our natural rst choice for use in the acceleration of convergence of a sequence of vectors. Acknowledgements Peter Graves-Morris is grateful to Dr. Simon Chandler-Wilde for making his computer programs available to us, and to Professor Ernst Weniger for his helpful review of the manuscript. References [1] G.A. Baker, Essentials of Pade Approximants, Academic Press, New York, 1975. [2] G.A. Baker Jr., P.R. Graves-Morris, Pade approximants, Encyclopedia of Mathematics and its Applications, 2nd Edition, Vol. 59, Cambridge University Press, New York, 1996. [3] E.R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, New York, 1968. [4] C. Brezinski, Etude sur les -et -algorithmes, Numer. Math. 17 (1971) 153–162. [5] C. Brezinski, Generalisations de la transformation de Shanks, de la table de Pade et de l’-algorithme, Calcolo 12 (1975) 317–360. [6] C. Brezinski, Acceleration de la Convergence en Analyse Numerique, Lecture Notes in Mathematics, Vol. 584, Springer, Berlin, 1977. [7] C. Brezinski, Convergence acceleration of some sequences by the -algorithm, Numer. Math. 29 (1978) 173–177. [8] C. Brezinski, Pade-Type Approximation and General Orthogonal Polynomials, Birkhauser, Basel, 1980. [9] C. Brezinski, M. Redivo-Zaglia, Extrapolation Methods, Theory and Practice, North-Holland, Amsterdam, 1991. [10] S.N. Chandler-Wilde, D. Hothersall, Ecient calculation of the Green function for acoustic propagation above a homogeneous impedance plane, J. Sound Vibr. 180 (1995) 705–724. [11] S.N. Chandler-Wilde, M. Rahman, C.R. Ross, A fast, two-grid method for the impedance problem in a half-plane, Proceedings of the Fourth International Conference on Mathematical Aspects of Wave Propagation, SIAM, Philadelphia, PA, 1998. [12] D. Colton, R. Kress, Integral Equations Methods in Scattering Theory, Wiley, New York, 1983. [13] F. Cordellier, L’-algorithme vectoriel, interpretation geometrique et regles singulieres, Expose au Colloque d’Analyse Numerique de Gourette, 1974.
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80
79
[14] F. Cordellier, Demonstration algebrique de l’extension de l’identite de Wynn aux tables de Pade non-normales, in: L. Wuytack (Ed.), Pade Approximation and its Applications, Springer, Berlin, Lecture Notes in Mathematics, Vol. 765, 1979, pp. 36 – 60. [15] F. Cordellier, Utilisation de l’invariance homographique dans les algorithmes de losange, in: H. Werner, H.J. Bunger (Eds.), Pade Approximation and its Applications, Bad Honnef 1983, Lecture Notes in Mathematics, Vol. 1071, Springer, Berlin, 1984, pp. 62–94. [16] F. Cordellier, Thesis, University of Lille, 1989. [17] A. Cuyt, L. Wuytack, Nonlinear Methods in Numerical Analysis, North-Holland, Amsterdam, 1987. [18] J.-P. Delahaye, B. Germain-Bonne, The set of logarithmically convergent sequences cannot be accelerated, SIAM J. Numer. Anal. 19 (1982) 840–844. [19] J.-P. Delahaye, Sequence Transformations, Springer, Berlin, 1988. [20] W. Gander, E.H. Golub, D. Gruntz, Solving linear systems by extrapolation in Supercomputing, Trondheim, Computer Systems Science, Vol. 62, Springer, Berlin, 1989, pp. 279 –293. [21] S. Gra, V. Grecchi, Borel summability and indeterminancy of the Stieltjes moment problem: Application to anharmonic oscillators, J. Math. Phys. 19 (1978) 1002–1006. [22] P.R. Graves-Morris, Vector valued rational interpolants I, Numer. Math. 42 (1983) 331–348. [23] P.R. Graves-Morris, B. Beckermann, The compass (star) identity for vector-valued rational interpolants, Adv. Comput. Math. 7 (1997) 279–294. [24] P.R. Graves-Morris, C.D. Jenkins, Generalised inverse vector-valued rational interpolation, in: H. Werner, H.J. Bunger (Eds.), Pade Approximation and its Applications, Vol. 1071, Springer, Berlin, 1984, pp. 144–156. [25] P.R. Graves-Morris, C.D. Jenkins, Vector-valued rational interpolants III, Constr. Approx. 2 (1986) 263–289. [26] P.R. Graves-Morris, D.E. Roberts, From matrix to vector Pade approximants, J. Comput. Appl. Math. 51 (1994) 205–236. [27] P.R. Graves-Morris, D.E. Roberts, Problems and progress in vector Pade approximation, J. Comput. Appl. Math. 77 (1997) 173–200. [28] P.R. Graves-Morris, E.B. Sa, Row convergence theorems for generalised inverse vector-valued Pade approximants, J. Comput. Appl. Math. 23 (1988) 63–85. [29] P.R. Graves-Morris, E.B. Sa, An extension of a row convergence theorem for vector Pade approximants, J. Comput. Appl. Math. 34 (1991) 315–324. [30] P.R. Graves-Morris, J. Van Iseghem, Row convergence theorems for vector-valued Pade approximants, J. Approx. Theory 90 (1997) 153–173. [31] M.H. Gutknecht, Lanczos type solvers for non-symmetric systems of linear equations, Acta Numer. 6 (1997) 271– 397. [32] A. Heyting, Die Theorie der linear Gleichungen in einer Zahlenspezies mit nichtkommutatives Multiplikation, Math. Ann. 98 (1927) 465–490. [33] H.H.H. Homeier, Scalar Levin-type sequence transformations, this volume, J. Comput. Appl. Math. 122 (2000) 81–147. [34] U.C. Jentschura, P.J. Mohr, G. So, E.J. Weniger, Convergence acceleration via combined nonlinear-condensation transformations, Comput. Phys. Comm. 116 (1999) 28–54. [35] K. Jbilou, H. Sadok, Vector extrapolation methods, Applications and numerical comparison, this volume, J. Comput. Appl. Math. 122 (2000) 149–165. [36] W.B. Jones, W. Thron, in: G.-C. Rota (Ed.), Continued Fractions, Encyclopedia of Mathematics and its Applications, Vol. 11, Addison-Wesley, Reading, MA, USA, 1980. [37] J.B. McLeod, A note on the -algorithm, Computing 7 (1972) 17–24. [38] D.E. Roberts, Cliord algebras and vector-valued rational forms I, Proc. Roy. Soc. 
London A 431 (1990) 285–300. [39] D.E. Roberts, On the convergence of rows of vector Pade approximants, J. Comput. Appl. Math. 70 (1996) 95–109. [40] D.E. Roberts, On a vector q-d algorithm, Adv. Comput. Math. 8 (1998) 193–219. [41] D.E. Roberts, A vector Chebyshev algorithm, Numer. Algorithms 17 (1998) 33–50. [42] D.E. Roberts, On a representation of vector continued fractions, J. Comput. Appl. Math. 105 (1999) 453–466. [43] A. Salam, An algebraic approach to the vector -algorithm, Numer. Algorithms 11 (1996) 327–337. [44] A. Salam, Formal vector orthogonal polynomials, Adv. Comput. Math. 8 (1998) 267–289. [45] A. Salam, What is a vector Hankel determinant? Linear Algebra Appl. 278 (1998) 147–161.
80 [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59]
P.R. Graves-Morris et al. / Journal of Computational and Applied Mathematics 122 (2000) 51–80 A. Salam, Pade-type approximants and vector Pade approximants, J. Approx. Theory 97 (1999) 92–112. D. Shanks, Non-linear transformations of divergent and slowly convergent sequences, J. Math. Phys. 34 (1955) 1–42. A. Sidi, W.F. Ford, D.A. Smith, SIAM J. Numer. Anal. 23 (1986) 178–196. R.C.E. Tan, Implementation of the topological epsilon algorithm, SIAM J. Sci. Statist. Comput. 9 (1988) 839–848. E.J. Weniger, Nonlinear sequence transformations for the acceleration of convergence and the summation of divergent series, Comput. Phys. Rep. 10 (1989) 371–1809. E.J. Weniger, A convergent, renormalised strong coupling perturbation expansion for the ground state energy of the quartic, sextic and octic anharmonic oscillator, Ann. Phys. 246 (1996) 133–165. E.J. Weniger, Prediction properties of Aitken’s iterated 2 process, of Wynn’s epsilon algorithm and of Brezinski’s iterated theta algorithm, this volume, J. Comp. Appl. Math. 122 (2000) 329–356. J. Wimp, Sequence Transformations and Their Applications, Academic Press, New York, 1981. P. Wynn, On a device for calculating the em (Sn ) transformations, Math. Tables Automat. Comp. 10 (1956) 91–96. P. Wynn, The epsilon algorithm and operational formulas of numerical analysis, Math. Comp. 15 (1961) 151–158. P. Wynn, L’-algoritmo e la tavola di Pade, Rendi. Mat. Roma 20 (1961) 403–408. P. Wynn, Acceleration techniques for iterative vector problems, Math. Comp. 16 (1962) 301–322. P. Wynn, Continued fractions whose coecients obey a non-commutative law of multiplication, Arch. Rational Mech. Anal. 12 (1963) 273–312. P. Wynn, On the convergence and stability of the epsilon algorithm, SIAM J. Numer. Anal. 3 (1966) 91–122.
Journal of Computational and Applied Mathematics 122 (2000) 81–147 www.elsevier.nl/locate/cam
Scalar Levin-type sequence transformations Herbert H.H. Homeier ∗; 1 Institut fur Physikalische und Theoretische Chemie, Universitat Regensburg, D-93040 Regensburg, Germany Received 7 June 1999; received in revised form 15 January 2000
Abstract Sequence transformations are important tools for the convergence acceleration of slowly convergent scalar sequences or series and for the summation of divergent series. The basic idea is to construct from a given sequence {{sn }} a new sequence {{sn0 }} = T({{sn }}) where each sn0 depends on a nite number of elements sn1 ; : : : ; snm . Often, the sn are the partial sums of an in nite series. The aim is to nd transformations such that {{sn0 }} converges faster than (or sums) {{sn }}. Transformations T({{sn }}; {{!n }}) that depend not only on the sequence elements or partial sums sn but also on an auxiliary sequence of the so-called remainder estimates !n are of Levin-type if they are linear in the sn , and nonlinear in the !n . Such remainder estimates provide an easy-to-use possibility to use asymptotic information on the problem sequence for the construction of highly ecient sequence transformations. As shown rst by Levin, it is possible to obtain such asymptotic information easily for large classes of sequences in such a way that the !n are simple functions of a few sequence elements sn . Then, nonlinear sequence transformations are obtained. Special cases of such Levin-type transformations belong to the most powerful currently known extrapolation methods for scalar sequences and series. Here, we review known Levin-type sequence transformations and put them in a common theoretical framework. It is discussed how such transformations may be constructed by either a model sequence approach or by iteration of simple transformations. As illustration, two new sequence transformations are derived. Common properties and results on convergence acceleration and stability are given. For important special cases, extensions of the general results are presented. Also, guidelines for the application of Levin-type sequence transformations are discussed, and a few numerical c 2000 Elsevier Science B.V. All rights reserved. examples are given. MSC: 65B05; 65B10; 65B15; 40A05; 40A25; 42C15 Keywords: Convergence acceleration; Extrapolation; Summation of divergent series; Stability analysis; Hierarchical consistency; Iterative sequence transformation; Levin-type transformations; Algorithm; Linear convergence; Logarithmic convergence; Fourier series; Power series; Rational approximation
∗
Fax: +49-941-943-4719. E-mail address:
[email protected] (H.H.H. Homeier) 1 WWW: http:==www.chemie.uni-regensburg.de= ∼hoh05008 c 2000 Elsevier Science B.V. All rights reserved. 0377-0427/00/$ - see front matter PII: S 0 3 7 7 - 0 4 2 7 ( 0 0 ) 0 0 3 5 9 - 9
82
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
1. Introduction In applied mathematics and the numerate sciences, extrapolation methods are often used for the convergence acceleration of slowly convergent sequences or series and for the summation of divergent series. For an introduction to such methods, and also further information that cannot be covered here, see the books of Brezinski and Redivo Zaglia [14] and Wimp [102] and the work of Weniger [84,88] and Homeier [40], but also the books of Baker [3], Baker and Graves-Morris [5], Brezinski [7,8,10 –12], Graves-Morris [24,25], Graves-Morris, Sa and Varga [26], Khovanskii [52], Lorentzen and Waadeland [56], Nikishin and Sorokin [62], Petrushev and Popov [66], Ross [67], Sa and Varga [68], Wall [83], Werner and Buenger [101] and Wuytack [103]. For the discussion of extrapolation methods, one considers a sequence {{sn }} = {{s ; s ; : : :}} with P Pn0 1 a with partial sums s = elements sn or the terms an = sn − sn−1 of a series ∞ n j=0 j j=0 aj for large n. A common approach is to rewrite sn as sn = s + Rn ;
(1)
where s is the limit (or antilimit in the case of divergence) and Rn is the remainder or tail. The aim then is to nd a new sequence {{sn0 }} such that sn0 = s + R0n ;
R0n =Rn → 0 for n → ∞:
(2)
Thus, the sequence {{sn0 }} converges faster to the limit s (or diverges less violently) than {{sn }}. To nd the sequence {{sn0 }}, i.e., to construct a sequence transformation {{sn0 }} = T({{sn }}), one needs asymptotic information about the sn or the terms an for large n, and hence about the Rn . This information then allows to eliminate the remainder at least asymptotically, for instance by substracting the dominant part of the remainder. Either such information is obtained by a careful mathematical analysis of the behavior of the sn and=or an , or it has to be extracted numerically from the values of a nite number of the sn and=or an by some method that ideally can be proven to work for a large class of problems. Suppose that one knows quantities !n such that Rn =!n = O(1) for n → ∞, for instance lim Rn =!n = c 6= 0;
(3)
n→∞
where c is a constant. Such quantities are called remainder estimates. Quite often, such remainder estimates can be found with relatively low eort but the exact value of c is often quite hard to calculate. Then, it is rather natural to rewrite the rest as Rn = !n n where n → c. The problem is how to describe or model the n . Suppose that one has a system of known functions j (n) such that −j for some 0 (n) = 1 and j+1 = o( j (n)) for j ∈ N0 . An example of such a system is j (n) = (n + ) ∈ R+ . Then, one may model n as a linear combination of the j (n) according to n ∼
∞ X
cj j (n)
for n → ∞;
(4)
j=0
whence the problem sequence is modelled according to sn ∼ s + ! n
∞ X j=0
cj j (n):
(5)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
83
The idea now is to eliminate the leading terms of the remainder with the unknown constants cj up to j = k − 1, say. Thus, one uses a model sequence with elements m = + !m
k−1 X
cj j (m);
m ∈ N0
(6)
j=0
and calculates exactly by solving the system of k + 1 equations resulting for m = n; n + 1; : : : ; n + k for the unknowns and cj , j = 0; : : : ; k − 1. The solution for is a ratio of determinants (see below) and may be denoted symbolically as = T (n ; : : : ; n+k ; !n ; : : : ; !n+k ; j (n); : : : ; j (n + k)):
(7)
The resulting sequence transformation is T({{sn }}; {{!n }}) = {{Tn(k) ({{sn }}; {{!n }})}}
(8)
Tn(k) ({{sn }}; {{!n }}) = T (sn ; : : : ; sn+k ; !n ; : : : ; !n+k ; j (n); : : : ; j (n + k)):
(9)
with
It eliminates the leading terms of the asymptotic expansion (5). The model sequences (6) are in the kernel of the sequence transformation T, de ned as the set of all sequences such that T reproduces their (anti)limit exactly. A somewhat more general approach is based on model sequences of the form n = +
k X
cj gj (n);
n ∈ N0 ; k ∈ N:
(10)
j=1
Virtually all known sequence transformations can be derived using such model sequences. This leads to the E algorithm as described below in Section 3.1. Also, some further important examples of sequence transformations are described in Section 3. However, the introduction of remainder estimates proved to be an important theoretical step since it allows to make use of asymptotic information of the remainder easily. The most prominent of the resulting sequence transformations T({{sn }}; {{!n }}) is the Levin transformation [53] that corresponds to the asymptotic system of functions given by j (n) = (n + )−j , and thus, to Poincare-type expansions of the n . But also other systems are of importance, like j (n) = 1=(n + )j leading to factorial series, or j (n) = tnj corresponding to Taylor expansions of t-dependent functions at the abscissae tn that tend to zero for large n. The question which asymptotic system is best, cannot be decided generally. The answer to this question depends on the extrapolation problem. To obtain ecient extrapolation procedures for large classes of problems requires to use various asymptotic systems, and thus, a larger number of dierent sequence transformations. Also, dierent choices of !n lead to dierent variants of such transformations. Levin [53] has pioneered this question and introduced three variants that are both simple and rather successful for large classes of problems. These variants and some further ones will be discussed. The question which variant is best, also
84
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
cannot be decided generally. There are, however, a number of results that favor certain variants for certain problems. For example, for Stieltjes series, the choice !n = an+1 can be theoretically justi ed (see Appendix A). Thus, we will focus on sequence transformations that involve an auxiliary sequence {{!n }}. To be more speci c, we consider transformations of the form T({{sn }}; {{!n }}) = {{Tn(k) }} with Pk
Tn(k)
=
(k) j=0 n; j sn+j =!n+j : Pk (k) j=0 n; j =!n+j
(11)
This will be called a Levin-type transformations. The known sequence transformations that involve remainder estimates, for instance the C; S, and M transformations of Weniger [84], the W algorithm of Sidi [73], and the J transformation of Homeier with its many special cases like the important p J transformations [35,36,38– 40,46], are all of this type. Interestingly, also the H; I, and K transformations of Homeier [34,35,37,40 – 44] for the extrapolation of orthogonal expansions are of this type although the !n in some sense cease to be remainder estimates as de ned in Eq. (3). The Levin transformation was also generalized in a dierent way by Levin and Sidi [54] who introduced the d(m) transformations. This is an important class of transformations that would deserve a thorough review itself. This, however, is outside the scope of the present review. We collect some important facts regarding this class of transformations in Section 3.2. Levin-type transformations as de ned in Eq. (11) have been used for the solution of a large variety of problems. For instance, Levin-type sequence transformations have been applied for the convergence acceleration of in nite series representations of molecular integrals [28,29, 33,65,82,98–100], for the calculation of the lineshape of spectral holes [49], for the extrapolation of cluster- and crystal-orbital calculations of one-dimensional polymer chains to in nite chain length [16,88,97], for the calculation of special functions [28,40,82,88,89,94,100], for the summation of divergent and acceleration of convergent quantum mechanical perturbation series [17,18,27,85,90 –93,95,96], for the evaluation of semiin nite integrals with oscillating integrands and Sommerfeld integral tails [60,61,75,81], and for the convergence acceleration of multipolar and orthogonal expansions and Fourier series [34,35,37,40 – 45,63,77,80]. This list is clearly not complete but sucient to demonstrate the possibility of successful application of these transformations. The outline of this survey is as follows: After listing some de nitions and notations, we discuss some basic sequence transformations in order to provide some background information. Then, special de nitions relevant for Levin-type sequence transformations are given, including variants obtained by choosing speci c remainder estimates !n . After this, important examples of Levin-type sequence transformations are introduced. In Section 5, we will discuss approaches for the construction of Levin-type sequence transformations, including model sequences, kernels and annihilation operators, and also the concept of hierarchical consistency. In Section 6, we derive basic properties, those of limiting transformations and discuss the application to power series. In Section 7, results on convergence acceleration are presented, while in Section 8, results on the numerical stability of the transformations are provided. Finally, we discuss guidelines for the application of the transformations and some numerical examples in Section 9.
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
85
2. Deÿnitions and notations 2.1. General de nitions 2.1.1. Sets Natural numbers: N = {1; 2; 3; : : :};
N0 = N ∪ {0}:
(12)
Integer numbers: Z = N ∪ {0; −1; −2; −3; : : :}:
(13)
Real numbers and vectors: R = {x : x real}; R+ = {x ∈ R : x ¿ 0} Rn = {(x1 ; : : : ; x n ) | xj ∈ R; j = 1; : : : ; n}:
(14)
Complex numbers: C = {z = x + iy : x ∈ R; y ∈ R; i2 = −1}; Cn = {(z1 ; : : : ; zn ) | zj ∈ C; j = 1; : : : ; n}:
(15)
For z = x + iy, real and imaginary parts are denoted as x = R (z); y = I (z). We use K to denote R or C. Vectors with nonvanishing components: Fn = {(z1 ; : : : ; zn ) | zj ∈ C; zj 6= 0; j = 1; : : : ; n}: Polynomials: Pk =
P : z 7→
k X
cj z j | z ∈ C; (c0 ; : : : ; ck ) ∈ Kk+1
j=0
(16)
:
(17)
Sequences: SK = {{{s0 ; s1 ; : : : ; sn ; : : :}} | sn ∈ K; n ∈ N0 }:
(18)
Sequences with nonvanishing terms: OK = {{{s0 ; s1 ; : : : ; sn ; : : :}} | sn 6= 0; sn ∈ K; n ∈ N0 }:
(19)
2.1.2. Special functions and symbols Gamma function [58, p. 1]: (z) =
Z
0
∞
t z−1 exp(−t) dt
(z ∈ R+ ):
(20)
Factorial: n! = (n + 1) =
n Y j=1
j:
(21)
86
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
Pochhammer symbol [58, p. 2]: n (a + n) Y = (a + j − 1): (a) j=1
(a)n =
(22)
Binomial coecients [1, p. 256, Eq. (6.1.21)]:
z w
=
(z + 1) : (w + 1) (z − w + 1)
(23)
Entier function: <x= = max{j ∈ Z: j6x; x ∈ R}:
(24)
2.2. Sequences, series and operators 2.2.1. Sequences and series For Stieltjes series see Appendix A. Scalar sequences with elements sn , tail Rn , and limit s: K {{sn }} = {{sn }}∞ n=0 = {{s0 ; s1 ; s2 ; : : :}} ∈ S ;
Rn = sn − s;
lim sn = s:
n→∞
(25)
If the sequence is not convergent but summable to s; s is called the antilimit. The nth element sn of a sequence = {{sn }} ∈ SK is also denoted by hin . A sequence is called a constant sequence, if all elements are constant, i.e., if there is a c ∈ K such that sn = c for all n ∈ N0 , in which case it is denoted by {{c}}. The constant sequence {{0}} is called the zero sequence. Scalar series with terms aj ∈ K, partial sums sn , tail Rn , and limit=antilimit s: s=
∞ X
aj ;
sn =
j=0
n X
aj ;
∞ X
Rn = −
j=0
aj = sn − s:
(26)
j=n+1
We say that aˆn are Kummer-related to the an with limit or antilimit sˆ if aˆn = 4sˆn−1 satisfy an ∼ aˆn P for n → ∞ and sˆ is the limit (or antilimit) of sˆn = nj=0 aˆj . Scalar power series in z ∈ C with coecients cj ∈ K, partial sums fn (z), tail Rn (z), and limit/antilimit f(z): f(z) =
∞ X
j
cj z ;
fn (z) =
j=0
n X
j
cj z ;
Rn (z) =
j=0
∞ X
cj z j = f(z) − fn (z):
(27)
j=n+1
2.2.2. Types of convergence Sequences {{sn }} satisfying the equation lim (sn+1 − s)=(sn − s) =
n→∞
(28)
are called linearly convergent if 0 ¡ || ¡ 1, logarithmically convergent for = 1 and hyperlinearly convergent for = 0. For || ¿ 1, the sequence diverges. A sequence {{un }} accelerates a sequence {{vn }} to s if lim (un − s)=(vn − s) = 0:
n→∞
If {{vn }} converges to s then we also say that {{un }} converges faster than {{vn }}.
(29)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
87
A sequence {{un }} accelerates a sequence {{vn }} to s with order ¿ 0 if (un − s)=(vn − s) = O(n− ):
(30)
If {{vn }} converges to s then we also say that {{un }} converges faster than {{vn }} with order . 2.2.3. Operators Annihilation operator: An operator A: SK → K is called an annihilation operator for a given sequence {{n }} if it satis es A({{sn + ztn }}) = A({{sn }}) + zA({{tn }})
for all {{sn }} ∈ SK ; {{tn }} ∈ SK ; z ∈ K;
A({{n }}) = 0:
(31)
Forward dierence operator. 4m g(m) = g(m + 1) − g(m); 4m gm = gm+1 − gm ; 4km = 4m 4mk−1 ; 4 = 4n ; k
4 gn =
k X
(−1)
k−j
k j
j=0
gn+j :
(32)
Generalized dierence operator n(k) for given quantities n(k) 6= 0: n(k) = (n(k) )−1 4 :
(33)
Generalized dierence operator
(k) ˜ n
for given quantities n(k) 6= 0:
(k) ˜ n = (n(k) )−1 42 :
(34)
Generalized dierence operator
5n(k) []
for given quantities
n(k)
6= 0:
5n(k) []fn = (n(k) )−1 (fn+2 − 2 cos fn+1 + fn ):
(35)
˜ (k) 6= 0: Generalized dierence operator @n(k) [] for given quantities n ˜ (k) )−1 ((2) fn+2 + (1) fn+1 + (0) fn ): @n(k) []fn = ( n+k n+k n+k n
(36)
Weighted dierence operators for given P (k−1) ∈ Pk−1 : Wn(k) = Wn(k) [P (k−1) ] = 4(k) P (k−1) (n): Polynomial operators P for given P (k) ∈ P(k) : Let P (k) (x) = P[P (k) ]gn =
k X
Pk
j=0
(37) pj(k) xj . Then put
pj(k) gn+j :
(38)
j=0
Divided dierence operator. For given {{x n }} and k; n ∈ N0 , put (k) n [{{x n }}](f(x))
=
(k) n (f(x))
= f[x n ; : : : ; x n+k ] =
k X j=0
f(x n+j )
k Y i=0 i6=j
1 ; x n+j − x n+i
88
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147 (k) n [{{x n }}]gn
=
(k) n gn
=
k X
gn+j
j=0
k Y i=0 i6=j
1 : x n+j − x n+i
(39)
3. Some basic sequence transformations 3.1. E Algorithm Putting for sequences {{yn }} and {{gj (n)}}; j = 1; : : : ; k yn g1 (n) En(k) [{{yn }}; {{gj (n)}}] = .. . g (n) k
· · · yn+k · · · g1 (n + k) ; .. .. . .
(40)
· · · gk (n + k)
one may de ne the sequence transformation En(k) ({{sn }}) =
En(k) [{{sn }}; {{gj (n)}}] : En(k) [{{1}}; {{gj (n)}}]
(41)
As is plain using Cramer’s rule, we have En(k) ({{n }}) = if the n satisfy Eq. (10). Thus, the sequence transformation yields the limit exactly for model sequences (10). The sequence transformation E is known as the E algorithm or also as Brezinski–Ha vie–Protocol [102, Section 10] after two of its main investigators, Ha vie [32] and Brezinski [9]. A good introduction to this transformation is also given in the book of Brezinski and Redivo Zaglia [14, Section 2.1], cf. also Ref. [15]. Numerically, the computation of the En(k) ({{sn }}) can be performed recursively using either the algorithm of Brezinski [14, p. 58f ] En(0) ({{sn }}) = sn ; En(k) ({{sn }})
=
(n) gk;(n)i = gk−1; i −
g0;(n)i = gi (n);
En(k−1) ({{sn }}) (n+1) (n) gk−1; i − gk−1; i (n+1) (n) gk−1; k − gk−; 1; k
−
n ∈ N0 ; i ∈ N; (k−1) E(n+1) ({{sn }}) − En(k−1) ({{sn }})
(n) gk−; 1; k ;
(n+1) gk−1; k
−
(n) gk−1; k
i = k + 1; k + 2; : : :
(n) gk−1; k;
(42)
or the algorithm of Ford and Sidi [22] that requires additionally the quantities gk+1 (n+j); j =0; : : : ; k for the computation of En(k) ({{sn }}). The algorithm of Ford and Sidi involves the quantities k; n (u) =
En(k) [{{un }}; {{gj (n)}}] En(k) [{{gk+1 (n)}}; {{gj (n)}}]
(43)
for any sequence {{u0 ; u1 ; : : :}}, where the gi (n) are not changed even if they depend on the un and the un are changed. Then we have En(k) ({{sn }}) =
k(n) (s) k(n) (1)
(44)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
89
and the are calculated recursively via k−1; n+1 (u) − k−1; n (u) k; n (u) = : (45) k−1; n+1 (gk+1 ) − k−1; n (gk+1 ) Of course, for gj (n) = !n j−1 (n), i.e., in the context of sequences modelled via expansion (5), the E algorithm may be used to obtain an explicit representation for any Levin-type sequence transformation of the form (cf. Eq. (9)) Tn(k) = T (sn ; : : : ; sn+k ; !n ; : : : ; !n+k ; j (n); : : : ; j (n + k))
(46)
as ratio of two determinants
En(k) [{{sn =!n }}; {{ j−1 (n)}}] : En(k) [{{1=!n }}; {{ j−1 (n)}}] This follows from the identity [14] En(k) [{{sn }}; {{!n j−1 (n)}}] En(k) [{{sn =!n }}; {{ j−1 (n)}}] = (k) ; En(k) [{{1}}; {{!n j−1 (n)}}] En [{{1=!n }}; {{ j−1 (n)}}] that is an easy consequence of usual algebraic manipulations of determinants. Tn(k) ({{sn }}; {{!n }}) =
(47)
(48)
3.2. The d(m) transformations As noted in the introduction, the d(m) transformations were introduced by Levin and Sidi [54] as a generalization of the u variant of the Levin transformation [53]. We describe a slightly modi ed variant of the d(m) transformations [77]: Let sr ; r = 0; 1; : : : be a real or complexPsequence with limit or antilimit s and terms a0 = s0 and ar = sr − sr−1 ; r = 1; 2; : : : such that sr = rr=0 aj ; r = 0; 1; : : : . For given m ∈ N and l ∈ N0 with l ∈ N0 and 060 ¡ 1 ¡ 2 ¡ · · · and = (n1 ; : : : ; nm ) with nj ∈ N0 the d(m) transformation yields a table of approximations s(m; j) for the (anti-)limit s as solution of the linear system of equations nk m X X ki sl = s(m; j) + (l + )k [k−1 al ] ; j6l6j + N (49) (l + )i i=0 k=1 P
with ¿ 0; N = mk=1 nk and the N +1 unknowns s(m; j) and k i . The [k aj ] are de ned via [0 aj ]=aj and [k aj ] = [k−1 aj+1 ] − [k−1 aj ]; k = 1; 2; : : : . In most cases, all nk are chosen equal and one puts = (n; n; : : : ; n). Apart from the value of , only the input of m and of ‘ is required from the (m; 0) user. As transformed sequence, often one chooses the elements s(n; :::; n) for n = 0; 1; : : : . The u variant of the Levin transformation is obtained for m = 1; = and l = l. The de nition above diers slightly from the original one [54] and was given in Ref. [22] with = 1. Ford and Sidi have shown, how these transformations can be calculated recursively with the W(m) algorithms [22]. The d(m) transformations are the best known special cases of the generalised Richardson Extrapolation process (GREP) as de ned by Sidi [72,73,78]. The d(m) transformations are derived by asymptotic analysis of the remainders sr − s for r → ∞ (m) for the family B˜ of sequences {{ar }} as de ned in Ref. [54]. For such sequences, the ar satisfy a dierence equation of order m of the form ar =
m X k=1
pk (r)k ar :
(50)
90
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
The pk (r) satisfy the asymptotic relation pk (r) ∼ r ik
∞ X pk‘ ‘=0
r‘
for r → ∞:
(51)
The ik are integers satisfying ik 6k for k = 1; : : : ; m. This family of sequences is very large. But still, Levin and Sidi could prove [54, Theorem 2] that under mild additional assumptions, the remainders for such sequences satisfy sr − s ∼
m X
jk
k−1
r (
ar )
k=1
∞ X k‘ ‘=0
r‘
for r → ∞:
(52)
The jk are integers satisfying jk 6k for k = 1; : : : ; m. A corresponding result for m = 1 was proven by Sidi [71, Theorem 6:1]. System (49) now is obtained by truncation of the expansions at ‘ = nn , evaluation at r = l , and some further obvious substitutions. The introduction of suitable l was shown to improve the accuracy and stability in dicult situations considerably [77]. 3.3. Shanks transformation and epsilon algorithm An important special case of the E algorithm is the choice gj (n) = 4sn+j−1 leading to the Shanks transformation [70] ek (sn ) =
En(k) [{{sn }}; {{4sn+j−1 }}] : En(k) [{{1}}; {{4sn+j−1 }}]
(53)
Instead of using one of the recursive schemes for the E algorithms, the Shanks transformation may be implemented using the epsilon algorithm [104] that is de ned by the recursive scheme (n) −1 = 0;
0(n) = sn ;
(n) (n+1) k+1 = k−1 + 1=[k(n+1) − k(n) ]:
(54)
The relations (n) = ek (sn ); 2k
(n) 2k+1 = 1=ek (4sn )
(55)
(n) hold and show that the elements 2k+1 are only auxiliary quantities. The kernel of the Shanks transformation ek is given by sequences of the form
sn = s +
k−1 X
cj 4 sn+j :
(56)
j=0
See also [14, Theorem 2:18]. Additionally, one can use the Shanks transformation – and hence the epsilon algorithm – to compute the upper-half of the Pade table according to [70,104] ek (fn (z)) = [n + k=k]f (z)
(k¿0; n¿0);
(57)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
where fn (z) =
n X
cj z j
91
(58)
j=0
are the partial sums of a power series of a function f(z). Pade approximants of f(z) are rational functions in z given as ratio of two polynomials p‘ ∈ P(‘) and qm ∈ P(m) according to [‘=m]f (z) = p‘ (z)=qm (z);
(59)
where the Taylor series of f and [‘=m]f are identical to the highest possible power of z, i.e., f(z) − p‘ (z)=qm (z) = O(z ‘+m+1 ):
(60)
Methods for the extrapolation of power series will be treated later. 3.4. Aitken process The special case 2(n) = e1 (sn ) is identical to the famous 2 method of Aitken [2] (sn+1 − sn )2 sn(1) = sn − sn+2 − 2sn+1 + sn with kernel sn = s + c (sn+1 − sn );
n ∈ N0 :
(61) (62)
2
Iteration of the method yields the iterated Aitken process [14,84,102] An(0) = sn ; (k) − An(k) )2 (An+1 : (63) (k) (k) An+2 − 2An+1 + An(k) The iterated Aitken process and the epsilon algorithm accelerate linear convergence and can sometimes be applied successfully for the summation of alternating divergent series.
An(k+1) = An(k) −
3.5. Overholt process The Overholt process is de ned by the recursive scheme [64] Vn(0) ({{sn }}) = sn ; (k−1) ({{sn }}) − (4sn+k )k Vn(k−1) ({{sn }}) (4sn+k−1 )k Vn+1 (64) (4sn+k−1 )k − (4sn+k )k for k ∈ N and n ∈ N0 . It is important for the convergence acceleration of xed point iterations.
Vn(k) ({{sn }}) =
4. Levin-type sequence transformations 4.1. De nitions for Levin-type transformations A set (k) = {n;(k)j ∈ K | n ∈ N0 ; 06j6k} is called a coecient set of order k with k ∈ N if n;(k)k 6= 0 for all n ∈ N0 . Also, = {(k) | k ∈ N} is called coecient set. Two coecient sets
92
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
(k) = {{n;(k)j }} and ˆ = {{ˆn; j }} are called equivalent, if for all n and k, there is a constant cn(k) 6= 0 (k) such that ˆ = c(k) n;(k)j for all j with 06j6k. n
n; j
For each coecient set (k) = {n;(k)j |n ∈ N0 ; 06j6k} of order k, one may de ne a Levin-type sequence transformation of order k by T[(k) ] : SK × Y(k) → SK : ({{sn }}; {{!n }}) 7→ {{sn0 }} = T[(k) ]({{sn }}; {{!n }}) with
(65)
Pk
sn0
=
Tn(k) ({{sn }}; {{!n }})
and Y(k) =
{{!n }} ∈ OK :
=
k X
(k) j=0 n; j sn+j =!n+j Pk (k) j=0 n; j =!n+j
n;(k)j =!n+j 6= 0 for all n ∈ N0
j=0
(66)
:
(67)
We call T[] = {T[(k) ]| k ∈ N} the Levin-type sequence transformation corresponding to the coecient set = {(k) | k ∈ N}. We write T(k) and T instead of T[(k) ] and T[], respectively, whenever the coecients n;(k)j are clear from the context. Also, if two coecient sets and ˆ are ˆ since equivalent, they give rise to the same sequence transformation, i.e., T[] = T[], Pk
j=0
Pk
(k) ˆn; j sn+j =!n+j (k)
ˆ j=0 n; j =!n+j
Pk
=
(k) j=0 n; j sn+j =!n+j Pk (k) j=0 n; j =!n+j
(k) for ˆn; j = cn(k) n(k)
(68)
with arbitrary cn(k) 6= 0. The number Tn(k) are often arranged in a two-dimensional table T0(0) T1(0) T2(0) .. .
T0(1) T1(1) T2(1) .. .
T0(2) T1(2) T2(2) .. .
··· ··· ··· .. .
(69)
that is called the T table. The transformations T(k) thus correspond to columns, i.e., to following vertical paths in the table. The numerators and denominators such that Tn(k) = Nn(k) =Dn(k) also are often arranged in analogous N and D tables. Note that for xed N , one may also de ne a transformation TN : {{sn+N }} 7→ {{TN(k) }}∞ k=0 :
(70)
This corresponds to horizontal paths in the T table. These are sometimes called diagonals, because rearranging the table in such a way that elements with constant values of n + k are members of the same row, TN(k) for xed N correspond to diagonals of the rearranged table. For a given coecient set de ne the moduli by n(k) = max {|n;(k)j |} 06j6k
(71)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
93
and the characteristic polynomials by n(k) ∈ P(k) : n(k) (z) =
k X
n;(k)j z j
(72)
j=0
for n ∈ N0 and k ∈ N. Then, T[] is said to be in normalized form if n(k) = 1 for all k ∈ N and n ∈ N0 . Is is said to be in subnormalized form if for all k ∈ N there is a constant ˜ (k) such that n(k) 6˜ (k) for all n ∈ N0 . Any Levin-type sequence transformation T[] can rewritten in normalized form. To see this, use cn(k) = 1=n(k)
(73)
in Eq. (68). Similarly, each Levin-type sequence transformation can be rewritten in (many dierent) subnormalized forms. A Levin-type sequence transformation of order k is said to be convex if n(k) (1) = 0 for all n in N0 . Equivalently, it is convex if {{1}} 6∈ Y(k) , i.e., if the transformation vanishes for {{sn }} = {{c!n }}; c ∈ K. Also, T[] is called convex, if T[(k) ] is convex for all k ∈ N. We will see that this property is important for ensuring convergence acceleration for linearly convergent sequences. A given Levin-type transformation T can also be rewritten as Tn(k) ({{sn }}; {{!n }}) = with
k X
n;(k)j (!n ) sn+j ;
!n = (!n ; : : : ; !n+k )
(74)
j=0
−1
k n;(k)j0 n;(k)j X
n;(k)j (!n ) = !n+j j0 =0 !n+j0
;
k X
n;(k)j (!n ) = 1:
(75)
j=0
Then, one may de ne stability indices by (k) n (T)
=
k X
| n;(k)j (!n )|¿1:
(76)
j=0
Note that any sequence transformation Q Qn(k) =
k X
qn;(k)j sn+j
(77)
j=0
with k X
qn;(k)j = 1
(78)
j=0
can formally be rewritten as a Levin-type sequence transformation according to Qn(k) = Tn(k) ({{sn }}; {{!n }}) with coecients n;(k)j = !n+j qn;(k)j n(k) where the validity of Eq. (78) requires to set n(k) =
k X j=0
n;(k)j =!n+j :
(79)
94
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
If for given k ∈ N and for a transformation T[(k) ] the following limits exist and have the values: ◦
lim n;(k)j = j(k)
(80)
n→∞
◦
◦
for all 06j6k, and if (k) is a coecient set of order k which means that at least the limit k(k) ◦ ◦ ◦ ◦ does not vanish, then a limiting transformation T [ (k) ] exists where (k) = { j(k) }. More explicitly, we have ◦
◦
(k) K (k) K T [ ] : S × Y → S : ({{sn }}; {{!n }}) 7→ {{sn0 }}
with ◦ sn0 = T (k) ({{sn }}; {{!n }})
Pk
=
◦
j=0
Pk
(k) j sn+j =!n+j
j=0
and ◦ (k)
Y
=
{{!n }} ∈ OK :
k X
(81)
(82)
◦
(k) j =!n+j
◦ (k) j =!n+j
6= 0
for all n ∈ N0
j=0
:
(83)
Obviously, this limiting transformation itself is a Levin-type sequence transformation and automatically is given in subnormalized form. 4.1.1. Variants of Levin-type transformations For the following, assume that ¿ 0 is an arbitrary constant, an =4sn−1 , and aˆn are Kummer-related to the an with limit or antilimit sˆ (cf. Section 2.2.1). A variant of a Levin-type sequence transformation T is obtained by a particular choice !n . For !n = fn ({{sn }}), the transformation T is nonlinear in the sn . In particular, we have [50,53,79]: t Variant: t
!n = 4sn−1 = an : t Tn(k) ({{sn }}) = Tn(k) ({{sn }}; {{t !n }}):
(84)
u Variant: u
!n = (n + ) 4 sn−1 = (n + )an : u Tn(k) ( ; {{sn }}) = Tn(k) ({{sn }}; {{u !n }}):
(85)
v Variant: v
!n = −
4sn−1 4 sn an an+1 v (k) = : Tn ({{sn }}) = Tn(k) ({{sn }}; {{v !n }}): 2 4 sn−1 an − an+1
(86)
t˜ Variant: t˜
!n = 4sn = an+1 : t˜Tn(k) ({{sn }}) = Tn(k) ({{sn }}; {{t˜!n }}):
(87)
lt Variant: lt
!n = aˆn : lt Tn(k) ({{sn }}) = Tn(k) ({{sn }}; {{lt !n }}):
(88)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
95
lu Variant: lu
!n = (n + )aˆn : lu Tn(k) ( ; {{sn }}) = Tn(k) ({{sn }}; {{lu !n }}):
(89)
lv Variant: lv
!n =
aˆn aˆn+1 lv (k) : Tn ({{sn }}) = Tn(k) ({{sn }}; {{lv !n }}): aˆn − aˆn+1
(90)
lt˜ Variant: lt˜
!n = aˆn+1 : lt˜Tn(k) ({{sn }}) = Tn(k) ({{sn }}; {{lt˜!n }}):
(91)
K Variant: K
!n = sˆn − s: ˆ K Tn(k) ({{sn }}) = Tn(k) ({{sn }}; {{K !n }}):
(92)
The K variant of a Levin-type transformation T is linear in the sn . This holds also for the lt, lu, lv and lt˜ variants. 4.2. Important examples of Levin-type sequence transformations In this section, we present important Levin-type sequence transformations. For each transformation, we give the de nition, recursive algorithms and some background information. 4.2.1. J transformation The J transformation was derived and studied by Homeier [35,36,38– 40,46]. Although the J transformation was derived by hierarchically consistent iteration of the simple transformation 4sn sn0 = sn+1 − !n+1 ; (93) 4!n it was possible to derive an explicit formula for its kernel as is discussed later. It may be de ned via the recursive scheme Nn(0) = sn =!n ;
Dn(0) = 1=!n ;
Nn(k) = n(k−1) Nn(k−1) ;
Dn(k) = n(k−1) Dn(k−1) ;
Jn(k) ({{sn }}; {{!n }}; {n(k) }) = Nn(k) =Dn(k) ;
(94)
where the generalized dierence operator de ned in Eq. (33) involves quantities n(k) 6= 0 for k ∈ N0 . Special cases of the J transformation result from corresponding choices of the n(k) . These are summarized in Table 1. Using generalized dierence operators n(k) , we also have the representation [36, Eq. (38)] n(k−1) n(k−2) : : : n(0) [sn =!n ] : (95) n(k−1) n(k−2) : : : n(0) [1=!n ] The J transformation may also be computed using the alternative recursive schemes [36,46] Jn(k) ({{sn }}; {{!n }}; {{n(k) }}) =
(0) Dˆ n = 1=!n ;
(0) Nˆ n = sn =!n ;
(k) (k−1) (k−1) Dˆ n = n(k−1) Dˆ n+1 − Dˆ n ;
k ∈ N;
(k) Nˆ n
k ∈ N;
(k−1) = n(k−1) Nˆ n+1
−
(k−1) Nˆ n ;
(96)
96
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147 Table 1 Special cases of the J transformationa Case Drummond transformation Dn(k) ({{sn }}; {{!n }}) Homeier I transformation In(k) (; {{sn }}; {{!n }}; {(k) n }) =Jn(2k) ({{sn }}; {{e−in !n }}; {(k) n }) Homeier F transformation F(k) n ({{sn }}; {{!n }}; {{x n }}) Homeier p J transformation (k) p Jn ( ; {{sn }}; {{!n }}) Levin transformation L(k) n ( ; {{sn }}; {{!n }}) generalized L transformation L(k) n (; ; {{sn }}; {{!n }}) Levin-Sidi d(1) transformation [22,54,77] (d(1) )(k) n (; {{sn }}) Mosig–Michalski algorithm [60,61] Mn(k) ({{sn }}; {{!n }}; {{x n }}) Sidi W algorithm (GREP(1) ) [73,77,78] Wn(k) ({{sn }}; {{!n }}; {{tn }}) Weniger C transformation [87] Cn(k) ( ; = ; {{sn }}; {{!n }}) Weniger M transformation Mn(k) (; {{sn }}; {{!n }}) Weniger S transformation Sn(k) ( ; {{sn }}; {{!n }}) Iterated Aitken process [2,84] An(k) ({{sn }}) =Jn(k) ({{sn }}; {{4sn }}; {(k) n })
j (n)
b
nj
Eq. (231)
a
Refs. [36,38,40]. For the de nition of the j; n see Eq. (5). c Factors independent of n are irrelevant. b
c
1 = exp(2in); (2‘) n (2‘+1) = exp(−2in)(‘) n n
Qn−1
(xj +k)(xj+k+1 +k−1) j=0 (xj +k−1)(xj+k+2 +k)
1=(x n )j
x n+k+1 −x n x n +k−1
Eq. (231)
1 (n+ +(p−1)k)2
(n + )−j
1 (n+ )(n+ +k+1)
(n + )−j
(n+ +k+1) −(n+ ) (n+ ) (n+ +k+1)
(Rn + )−j
1 Rn+k+1 +
Eq. (231)
1 x2n
tnj
tn+k+1 − tn
1 ( n+ )j
(n+1+( +k−1)= )k (n+( +k)= )k+2
1 (−n−)j
(n+1+−(k−1))k (n+−k)k+2
1=(n + )j
1 (n+ +2k)2
Eq. (231)
(4An
Overholt process [64] Vn(k) ({{sn }}) =Jn(k) ({{sn }}; {{4sn }}; {(k) n })
(k) n
Eq. (231)
1−
(k+1) (k)
−
1 Rn +
!n x2k n+1
!n+1 x2k n
(k)
({{sn }}))(42 An )({{sn }}) (k)
(4An ({{sn }}))(4An+1 ({{sn }}))
(sn+k+1 )[(sn+k )k+1 ] (sn+k )k+1
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
Jn(k) ({{sn }}; {{!n }}; {n(k) })
97
(k) Nˆ n = (k) Dˆ n
with n(0) = 1;
(1) (k−1) (0) n n · · · n ; (1) (k−1) (0) n+1 n+1 · · · n+1
n(k) =
k∈N
(97)
and (0) D˜ n = 1=!n ;
(0) N˜ n = sn =!n ;
(k) (k−1) (k−1) D˜ n = D˜ n+1 − n(k−1) D˜ n ;
k ∈N;
(k) N˜ n
k ∈ N;
(k−1) = N˜ n+1
−
(k−1) n(k−1) N˜ n ;
Jn(k) ({{sn }}; {{!n }}; {n(k) })
(98)
(k) N˜ n = (k) D˜ n
with n(0) = 1;
n(k) =
(1) (k−1) (0) n+k n+k−1 · · · n+1
(1) (k−1) (0) n+k−1 n+k−2 · · · n
;
k ∈ N:
(99)
The quantities n(k) should not be mixed up with the k; n (u) as de ned in Eq. (43). P (k) As shown in [46], the coecients for the algorithm (96) that are de ned via Dˆ n = kj=0 n;(k)j =!n+j , satisfy the recursion (k) (k) = n(k) n+1; n;(k+1) j j−1 − n; j
(100)
with starting values n;(0)j = 1. This holds for all j if we de ne n;(k)j = 0 for j ¡ 0 or j ¿ k. Because n(k) 6= 0, we have n;(k)k 6= 0 such that {n;(k)j } is a coecient set for all k ∈ N0 . (k) P (k) Similarly, the coecients for algorithm (98) that are de ned via D˜ n = kj=0 ˜n; j =!n+j , satisfy the recursion (k+1) (k) (k) ˜n; j = ˜n+1; j−1 − n(k) ˜n; j
(101)
(0) (k) with starting values ˜n; j = 1. This holds for all j if we de ne ˜n; j = 0 for j ¡ 0 or j ¿ k. In this (k) (k) case, we have ˜n; k = 1 such that {˜n; j } is a coecient set for all k ∈ N0 . Since the J transformation vanishes for {{sn }} = {{c!n }}, c ∈ K according to Eq. (95) for all k ∈ N, it is convex. This may also be shown by using induction in k using n;(1)1 = −n;(1)0 = 1 and the equation k+1 X j=0
n;(k+1) j
=
n(k)
k X
(k) n+1; j
j=0
that follows from Eq. (100).
−
k X j=0
n;(k)j
(102)
98
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
Assuming that the limits k = limn→∞ n(k) exist for all k ∈ N and noting that for k = 0 always ◦
◦
0 = 1 holds, it follows that there exists a limiting transformation J [ ] that can be considered as special variant of the J transformation and with coecients given explicitly as [46, Eq. (16)] ◦ (k) j
= (−1)
X
k−1 Y
j0 +j1 +:::+jk−1 =j; j0 ∈{0;1};:::; jk−1 ∈{0;1}
m=0
k−j
(m ) jm :
(103)
As characteristic polynomial we obtain ◦
(k) (z) =
k X
◦ (k) j j z
j=0
=
k−1 Y
(j z − 1):
(104)
j=0
◦
◦
Hence, the J transformation is convex since (k) (1) = 0 due to 0 = 1. The p J Transformation: This is the special case of the J transformation corresponding to n(k) =
1 (n + + (p − 1)k)2
(105)
or to [46, Eq. (18)] 2
n(k) =
n+ +2 n+ p−1 k p−1 k n+ +2 k
n(k)
=
(106) for p = 1
n+
or to
for p 6= 1;
n+ +k −1 n+ +k +1 p−2 p−2 k k n+ +k −1 k
n+ +k +1
for p 6= 2; (107) for p = 2;
that is, (k) p Jn ( ; {{sn }}; {{!n }})
= Jn(k) ({{sn }}; {{!n }}; {1=(n + + (p − 1)k)2 }): ◦
(108) ◦
The limiting transformation p J of the p J transformation exists for all p and corresponds to the J transformation with k = 1 for all k in N0 . This is exactly the Drummond transformation discussed in Section 4.2.2, i.e., we have ◦ (k) p J n ( ; {{sn }}; {{!n }})
2
= Dn(k) ({{sn }}; {{!n }}):
The equation in [46] contains an error.
(109)
H.H.H. Homeier / Journal of Computational and Applied Mathematics 122 (2000) 81–147
99
4.2.2. Drummond transformation This transformation was given by Drummond [19]. It was also discussed by Weniger [84]. It may be de ned as Dn(k) ({{sn }}; {{!n }}) =
4k [sn =!n ] : 4k [1=!n ]
(110)
Using the de nition (32) of the forward dierence operator, the coecients may be taken as n;(k)j
= (−1)
j
k j
;
(111)
k i.e., independent of n. As moduli, one has n(k) = (