Theoretical Computer Science 315 (2004) 307 – 308
www.elsevier.com/locate/tcs
Preface

Symbolic/algebraic and numerical algorithms are the backbone of modern computations in the sciences, engineering, and electrical engineering. These two classes of algorithms and the respective scientific communities have been separated historically and still have relatively little interaction; yet they may benefit from combining their power and techniques toward common computational goals. Such a combination is a rather recent undertaking, which became visible only in the last decade, but it is already popular in such central areas as root-finding for polynomials and systems of polynomials and computations with Toeplitz and other structured matrices (having ties to, and impact on, both numerical matrix methods and algebraic polynomial techniques). These areas happen to be central also to the Editors’ research interests, which motivated our efforts to bring the subjects to the attention of the TCS readers. (The readers can find links to the introductory and advanced bibliography on these subjects in the Editors’ web home pages.)

The present issue of TCS partly reflects the state of the art. It includes papers on various algebraic and numerical algorithms and techniques, focuses on combined application of the methods from both groups, and extensively represents topics in polynomial root-finding and structured matrix computations.

The issue includes recent advances, by Sommese et al., in the numerical study of the root variety for a system of multivariate polynomials by homotopy methods, leading to the variety’s irreducible decomposition and the approximation of all common roots. A complexity study is undertaken by Bompadre et al. for the problem of solving polynomial systems by combining different techniques, such as the Newton-Hensel lemma, characteristic polynomials, and resultants, in the setting of straight-line programs. The same setting is used by Pardo and San Martín [...] different from two.
A general linear system $T_N f = b$ with a nonsingular Toeplitz coefficient matrix can be solved “fast”, with complexity $O(N^2)$, using Levinson-type or Schur-type algorithms. A problem is that the classical Levinson and Schur algorithms work only if the matrix $T_N$ is strongly nonsingular, which means that all leading principal submatrices $T_k = [a_{i-j}]_{i,j=1}^{k}$ are nonsingular for $k = 1, \dots, N$. This condition is never satisfied for a
∗ This work was supported by Research Grant SM05/02 of Kuwait University.
© 2004 Elsevier B.V. All rights reserved. 0304-3975/$ - see front matter. doi:10.1016/j.tcs.2004.01.003
G. Heinig, K. Rost / Theoretical Computer Science 315 (2004) 453 – 468
skewsymmetric Toeplitz matrix, since skewsymmetric matrices of odd order are always singular. The problem of solving skewsymmetric Toeplitz systems fast was addressed in our recent paper [14]. In that paper fast algorithms were designed for skewsymmetric Toeplitz matrices which work under the condition that every leading principal submatrix of even order is nonsingular, which is equivalent to the nonsingularity of all central submatrices $T_{N-2\ell} = [a_{i-j}]_{i,j=\ell+1}^{N-\ell}$, $\ell = 0, 1, \dots, N/2 - 1$. Matrices with the latter property are called centro-nonsingular. The algorithms in [14] are, in principle, split algorithms in the sense of Delsarte and Genin [3,4]. Some algorithms in [14] are the skewsymmetric counterparts of double-step split algorithms for symmetric Toeplitz matrices proposed in [16] and [8]. However, surprisingly, there are also algorithms for skewsymmetric Toeplitz matrices that do not have an obvious symmetric counterpart, which is due to some additional symmetry properties of skewsymmetric Toeplitz matrices.

An algorithm for Toeplitz matrices working without additional conditions was first proposed in [7]. A discussion of algorithms of this kind can also be found in [19]. But these algorithms are for general Toeplitz matrices and do not fully utilize additional properties like symmetry or skewsymmetry. Thus, the aim of the present paper is to design (split) algorithms that exploit both the Toeplitz structure and the skewsymmetry and work without any assumption on the rank profile. Split algorithms for general symmetric Toeplitz matrices were designed in our recent paper [15]. Let us reiterate that the skewsymmetric case is not simply an analogue of the symmetric case but has some specific peculiarities.

Our approach is based on a look-ahead strategy. In the algorithms we consider only those submatrices $T_n$ which are nonsingular.
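The rank profile that drives the look-ahead strategy can be inspected directly on a small case. The following is only an illustrative sketch, not part of the algorithms of the paper: it builds the leading principal submatrices of the matrix used later in Example 1 (entries $a_1, \dots, a_5 = 1, 2, 3, 5, 6$) and lists the orders $n$ for which $T_n$ is nonsingular. The helper names (`skew_toeplitz`, `determinant`, `nonsingular_orders`) are hypothetical, and exact rational arithmetic is an assumption made to avoid rounding questions.

```python
from fractions import Fraction

def skew_toeplitz(a, n):
    """n x n skewsymmetric Toeplitz matrix [a_{i-j}]; a = [a_1, a_2, ...], a_0 = 0."""
    def entry(k):
        if k == 0:
            return 0
        return a[k - 1] if k > 0 else -a[-k - 1]
    return [[Fraction(entry(i - j)) for j in range(n)] for i in range(n)]

def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def nonsingular_orders(a, N):
    """Orders n in 1..N for which the leading principal submatrix T_n is nonsingular."""
    return [n for n in range(1, N + 1) if determinant(skew_toeplitz(a, n)) != 0]

a = [1, 2, 3, 5, 6]                 # a_1, ..., a_5
profile = nonsingular_orders(a, 6)
print(profile)
```

All odd orders drop out, as they must for skewsymmetric matrices, and $T_4$ is singular as well, so this matrix is nonsingular but not centro-nonsingular.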
Let $n_1 < n_2 < \cdots < n_r = N$ be the set of all $n = n_k$ for which $T_n$ is nonsingular, and let $u^{(k)}$ be the vector spanning the (one-dimensional) nullspace of $T_{n_k+1}$. Here $T_{N+1}$ means any skewsymmetric extension of $T_N$. The Levinson-type algorithm computes a vector $u^{(k+1)}$ from $u^{(k)}$ and $u^{(k-1)}$ by a three-term recursion, and the Schur-type algorithm computes the corresponding residuals. The last two vectors $u^{(k)}$ determine the inverse matrix via an “inversion formula”, which allows one to solve a linear system efficiently.

Note that a different approach for solving skewsymmetric Toeplitz systems will be discussed in a forthcoming paper [9]. The approach in [9] is based on the recursion of fundamental systems (see [13]). One of its advantages is that it can easily be generalized to the block case, which is not the case for the look-ahead approach.

Just as the classical Schur algorithm is related to an LU-factorization of the Toeplitz matrix and the classical Levinson algorithm is related to a UL-factorization of its inverse, the split Schur algorithm for symmetric Toeplitz matrices is related to a ZW-factorization¹ of the matrix and the split Levinson algorithm to a WZ-factorization of its inverse. This was observed in [5]. Concerning WZ-factorization for general matrices we refer to [6,18] and references therein. In [14] the structure of the ZW-factorization of centro-nonsingular skewsymmetric Toeplitz matrices was studied. It was shown that such a matrix $T_N$ admits a representation $T_N = ZXZ^T$ in which $Z$ is a special unit Z-matrix and $X$ is a skewsymmetric antidiagonal matrix (and a similar result holds for $T_N^{-1}$). In the present paper we show that, more generally, any nonsingular skewsymmetric Toeplitz matrix admits such a representation in which $X$ is a skewsymmetric block antidiagonal matrix and the blocks are multiples of the identity. The factors $Z$ and $X$ can be computed with the help of the generalized split Schur algorithm. The factorization combined with back substitution gives another possibility to solve linear systems without computing the vectors $u^{(k)}$.

¹ The definitions of Z- and W-matrices are given in Section 6.

Besides the solution via inversion formula and factorization we also discuss the solution via direct recursion. We refrain from computing the computational complexities in all cases, since their exact values depend on the rank profile of the matrix. However, it can be pointed out that these values are in general not essentially higher and in most cases even lower than the corresponding values computed in [14] for the case of a centro-nonsingular skewsymmetric Toeplitz matrix.

Let us introduce some notation that will be used throughout the paper. We denote by $J_n$ the $n \times n$ counteridentity matrix, which has ones on the antidiagonal and zeros elsewhere. A vector $u \in F^n$ is called symmetric if $u = J_n u$ and skewsymmetric if $u = -J_n u$. An $n \times n$ matrix $B$ is called centrosymmetric if $J_n B J_n = B$ and centro-skewsymmetric if $J_n B J_n = -B$. Let $F^n_{\pm}$ be the subspaces of $F^n$ consisting of all symmetric and all skewsymmetric vectors, respectively.

Occasionally we will use polynomial language. For a matrix $A = [a_{ij}]$, $A(t,s)$ will denote the bivariate polynomial
$$A(t,s) = \sum_{i,j} a_{ij}\, t^{i-1} s^{j-1},$$
and for $u = (u_i)_{i=1}^{n}$ we set $u(t) = \sum_{i=1}^{n} u_i t^{i-1}$. For a vector $u = (u_i)_{i=1}^{l}$, let $M_k(u)$ denote the $(k+l-1) \times k$ matrix
$$M_k(u) = \left.\begin{bmatrix} u_1 & & 0 \\ \vdots & \ddots & \\ u_l & & u_1 \\ & \ddots & \vdots \\ 0 & & u_l \end{bmatrix}\right\}\, k+l-1.$$
It is easily checked that, for $x \in F^k$, $(M_k(u)x)(t) = u(t)x(t)$. Furthermore, $e_k \in F^n$ will denote the $k$th vector in the standard basis of $F^n$, and $0_k$ will denote a zero vector of length $k$. If the length of the vector is clear or irrelevant we omit the subscript.

2. Inversion formula

From now on, let $T_N = [a_{i-j}]_{i,j=1}^{N}$ be a nonsingular skewsymmetric Toeplitz matrix and $T_{N+1}$ any skewsymmetric $(N+1) \times (N+1)$ Toeplitz extension of $T_N$. Clearly, $N$ must be even and $T_{N+1}$ and $T_{N-1}$ have one-dimensional nullspaces. Let $u \in F^{N+1}$
and $u' \in F^{N-1}$ be the vectors spanning these nullspaces. In [13] (see also [14]) it was shown that the vectors $u$ and $u'$ are symmetric. Since $T_N$ is nonsingular, the last component of $u$ is nonzero. Therefore, we may assume that it is equal to 1. Note that the last component of $u'$ might be zero. Let $r$ be defined by
$$r = [\,a_{N-1} \ \cdots \ a_1\,]\, u'.$$
Since $T_N$ is nonsingular, we have $r \neq 0$. It is worth mentioning that the vectors
$$\frac{1}{r}\begin{bmatrix} u' \\ 0 \end{bmatrix}, \qquad -\frac{1}{r}\begin{bmatrix} 0 \\ u' \end{bmatrix}$$
are the last and the first columns of $T_N^{-1}$, respectively. We introduce the (symmetric) vector
$$x = \frac{1}{r}\begin{bmatrix} 0 \\ u' \\ 0 \end{bmatrix} \in F^{N+1},$$
which is the solution of the equation $T_{N+1}x = e_{N+1} - e_1$. The following is a specification of a well-known inversion formula for general Toeplitz matrices (see [10,1]) to the case of skewsymmetric matrices and was discussed in [14].

Theorem 2.1. The inverse of $T_N$ is given by
$$T_N^{-1}(t,s) = \frac{u(t)x(s) - x(t)u(s)}{1 - ts}. \tag{1}$$

Formula (1) can be expressed in matrix form in many ways. Let us present one of them, which is the “classical” Gohberg-Semencul formula built from triangular Toeplitz matrices. For a vector $v = (v_i)_{i=1}^{N+1}$, let $L(v)$ denote the $N \times N$ lower triangular Toeplitz matrix
$$L(v) = \begin{bmatrix} v_1 & & 0 \\ \vdots & \ddots & \\ v_N & \cdots & v_1 \end{bmatrix}.$$

Corollary 2.2. The inverse of $T_N$ is given by
$$T_N^{-1} = L(u)L(x)^T - L(x)L(u)^T. \tag{2}$$
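A small numerical check can make the orientation of the Gohberg-Semencul type formula concrete. The sketch below is an illustration only: it assumes the index convention $T[i][j] = a_{i-j}$ with $a_{-k} = -a_k$, takes the vectors $u$ and $x$ of Example 1 in Section 4 as given data, and uses hypothetical helper names. With $x$ normalized by $T_{N+1}x = e_{N+1} - e_1$, the combination that reproduces the inverse under this convention is $L(u)L(x)^T - L(x)L(u)^T$; the opposite order yields its negative.

```python
def skew_toeplitz(a, n):
    """Order-n matrix [a_{i-j}] with a[0] = 0 and a_{-k} = -a_k (a[m] holds a_m)."""
    return [[a[i - j] if i >= j else -a[j - i] for j in range(n)] for i in range(n)]

def lower_toeplitz(v, n):
    """n x n lower triangular Toeplitz matrix L(v) built from v_1, ..., v_n."""
    return [[v[i - j] if i >= j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(p * q for p, q in zip(row, v)) for row in A]

N = 6
a = [0, 1, 2, 3, 5, 6, 0]        # a_0, a_1, ..., a_5 and the extension entry a_6 = 0
u = [1, 1, 8, -21, 8, 1, 1]      # spans the nullspace of T_7, last component 1
x = [0, 0, 1, -2, 1, 0, 0]       # solves T_7 x = e_7 - e_1

T7, T6 = skew_toeplitz(a, N + 1), skew_toeplitz(a, N)
kernel_check = matvec(T7, u)     # should vanish
rhs_check = matvec(T7, x)        # should equal e_7 - e_1

Lu, Lx = lower_toeplitz(u, N), lower_toeplitz(x, N)
P, Q = matmul(Lu, transpose(Lx)), matmul(Lx, transpose(Lu))
candidate = [[P[i][j] - Q[i][j] for j in range(N)] for i in range(N)]
identity = [[int(i == j) for j in range(N)] for i in range(N)]
product = matmul(T6, candidate)
print(product == identity)
```

In a fast implementation the products with $L(u)$ and $L(x)$ would be carried out as polynomial multiplications rather than by forming the triangular matrices explicitly.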
The direct application of (2) has complexity $O(N^2)$, but if $F$ is the field of real or complex numbers, fast algorithms with complexity $O(N \log N)$ can be applied. Let us mention that there are formulas for $T_N^{-1}$ that contain only diagonal matrices and discrete Fourier or real trigonometric transformations, which are ready for implementation (see for example [11,12] and references therein). Note also that formula (2) can be written in terms of polynomial multiplication, and polynomial multiplication can be carried out with complexity $O(N \log N \log\log N)$ in any field (see [17] and references therein).

3. Recursion background

We are going to show some facts which will be the basis for the split algorithms developed in the next sections. Besides the (nonsingular) matrix $T_N$ and its extension $T_{N+1}$ we consider its central submatrices. Recall that $N$ is even and so all central submatrices of $T_N$ have even order. These central submatrices coincide with the leading principal submatrices $T_k = [a_{i-j}]_{i,j=1}^{k}$ for even $k$.

Let $T_n$ be nonsingular. Then the kernel of $T_{n+1}$ has dimension one. Let $u_n$ span the kernel of $T_{n+1}$. Since the last component of $u_n$ does not vanish we may assume that it is equal to 1. As mentioned above, $u_n$ is symmetric. We introduce the numbers
$$r_j = [\,a_{j+n} \ \cdots \ a_j\,]\, u_n$$
for $j = 1, \dots, N-n$, which will be called residuals of $u_n$.

Proposition 3.1. Let $r_1 = \cdots = r_{d-1} = 0$, $r_d \neq 0$, and $m = n + 2d$. Then $T_{n+1}, \dots, T_{m-1}$ are singular and $T_m$ is nonsingular.

Proof. We have
$$T_m M_{2d}(u_n) = \begin{bmatrix} O_{d\times d} & -R \\ O & O \\ R^T & O_{d\times d} \end{bmatrix}, \tag{3}$$
where $R$ denotes the $d \times d$ upper triangular Toeplitz matrix
$$R = \begin{bmatrix} r_d & \cdots & r_{2d-1} \\ & \ddots & \vdots \\ 0 & & r_d \end{bmatrix}.$$
Hence
$$T_{n+2k+1} \begin{bmatrix} 0_k \\ u_n \\ 0_k \end{bmatrix} = 0$$
for $k = 0, \dots, d-1$, which means that the matrices $T_{n+1}, \dots, T_{m-1}$ are singular. Furthermore, we conclude from (3) that the vectors $e_1, \dots, e_d$ and $e_{m-d+1}, \dots, e_m$ belong to the range of $T_m$ and also to the range of $T_m^T$.
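Suppose that $T_m v = 0$. Then $g^T v = 0$ for all vectors $g$ from the range of $T_m^T$. Hence the first and last $d$ components of $v$ vanish, and $v$ is of the form $v = [\,0_d^T \ \ v'^T \ \ 0_d^T\,]^T$, where $v'$ belongs to the kernel of $T_n$. Since $T_n$ is nonsingular, we conclude that $v' = 0$ and hence $v = 0$. Thus $T_m$ is nonsingular.

Besides the vector $u_n$ we consider a solution $x_n$ of the equation $T_{n+1}x_n = e_{n+1} - e_1$. Since $u_n^T(e_{n+1} - e_1) = 0$, this equation has a (non-unique) solution $x_n$, which is symmetric, due to the centro-skewsymmetry of $T_{n+1}$. We introduce the numbers
$$s_j = [\,a_{j+n} \ \cdots \ a_j\,]\, x_n$$
for $j = 0, \dots, N-n$. In particular, $s_0 = 1$.

Let $x_m$ be a solution of the equation $T_{m+1}x_m = e_{m+1} - e_1$ and $u_m$ the vector spanning the kernel of $T_{m+1}$ with the last component equal to 1. We show now how $u_m$ and $x_m$ can be computed from $u_n$ and $x_n$. From (3) we conclude that
$$T_{m+1}\begin{bmatrix} 0_d \\ u_n \\ 0_d \end{bmatrix} = r_d\,(e_{m+1} - e_1).$$
Thus $x_m$ can be chosen as
$$x_m = \frac{1}{r_d}\begin{bmatrix} 0_d \\ u_n \\ 0_d \end{bmatrix}. \tag{4}$$
To find $u_m$ we observe that
$$T_{m+1} M_{2d+1}(u_n) = \begin{bmatrix}
0 & \cdots & 0 & -r_d & \cdots & -r_{2d} \\
\vdots & & & & \ddots & \vdots \\
0 & & \cdots & & 0 & -r_d \\
0 & & \cdots & & & 0 \\
\vdots & & & & & \vdots \\
0 & & \cdots & & & 0 \\
r_d & 0 & & \cdots & & 0 \\
\vdots & \ddots & & & & \vdots \\
r_{2d} & \cdots & r_d & 0 & \cdots & 0
\end{bmatrix}.$$
Let $\tilde R$ denote the $(d+1) \times (d+1)$ upper triangular Toeplitz matrix
$$\tilde R = \begin{bmatrix} r_d & \cdots & r_{2d} \\ & \ddots & \vdots \\ 0 & & r_d \end{bmatrix} \tag{5}$$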
and $c = (c_i)_{i=1}^{d+1}$ the solution of the triangular Toeplitz system
$$\tilde R^T c = s, \qquad \text{where } s = (s_{i-1})_{i=1}^{d+1}.$$
Furthermore, let $\tilde c = [\,c_1, \dots, c_d, c_{d+1}, c_d, \dots, c_1\,]^T \in F_+^{2d+1}$ be the symmetric extension of $c$, $q = 1/c_1$, and $p = q\tilde c$. Then we have
$$T_{m+1}\left( M_{2d+1}(u_n)\,p - q \begin{bmatrix} 0_d \\ x_n \\ 0_d \end{bmatrix} \right) = 0.$$
By construction, the last (and the first) component of $M_{2d+1}(u_n)p$ equals 1. We arrive at the relation
$$u_m = M_{2d+1}(u_n)\,p - q \begin{bmatrix} 0_d \\ x_n \\ 0_d \end{bmatrix}. \tag{6}$$
We write relations (4) and (6) in polynomial language and arrive at the following.

Proposition 3.2. The vectors $u_m$ and $x_m$ can be computed from $u_n$ and $x_n$ via
$$u_m(t) = p(t)\,u_n(t) - q\,t^d x_n(t), \qquad x_m(t) = \frac{1}{r_d}\, t^d u_n(t). \tag{7}$$
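One step of (7) can be traced in a few lines. This is a minimal sketch under stated assumptions: the residuals $r_d, \dots, r_{2d}$ and $s_0, \dots, s_d$ are passed in as precomputed lists, polynomials are coefficient lists in increasing powers of $t$, and the function names are hypothetical. The input values are those of the step from $n = 2$ to $m = 6$ in Example 1 of Section 4.

```python
from fractions import Fraction

def solve_lower_toeplitz(col, rhs):
    """Forward substitution for the lower triangular Toeplitz matrix with first column col."""
    c = []
    for i in range(len(rhs)):
        acc = Fraction(rhs[i]) - sum(col[i - j] * c[j] for j in range(i))
        c.append(acc / col[0])
    return c

def polymul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def recursion_step(u_n, x_n, r, s):
    """Apply (7); r = [r_d, ..., r_2d], s = [s_0, ..., s_d]."""
    d = len(r) - 1
    c = solve_lower_toeplitz(r, s)          # transposed R-tilde system for c
    q = 1 / c[0]
    p = [q * ci for ci in c + c[-2::-1]]    # q times the symmetric extension of c
    u_m = polymul(p, u_n)                   # p(t) u_n(t)
    for i, xi in enumerate(x_n):
        u_m[d + i] -= q * xi                # minus q t^d x_n(t)
    x_m = [Fraction(0)] * d + [ui / r[0] for ui in u_n] + [Fraction(0)] * d
    return u_m, x_m

# data of the step n = 2 -> m = 6 (d = 2) for a_1..a_6 = 1, 2, 3, 5, 6, 0
u_m, x_m = recursion_step([1, -2, 1], [0, 1, 0], r=[1, -1, -7], s=[1, 2, 3])
print([int(v) for v in u_m], [int(v) for v in x_m])
```

The only linear algebra involved is one triangular Toeplitz solve of size $d+1$; everything else is polynomial arithmetic.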
4. Split algorithms

We discuss now the algorithms emerging from the recursion described in Proposition 3.2. First we introduce some notation. Let $n_1 < \cdots < n_\ell = N$ be the integers $n \in \{1, 2, \dots, N\}$ for which $T_n$ is nonsingular, $d_k = \frac12(n_{k+1} - n_k)$, and let $u^{(k)}$ be the vector spanning the kernel of $T_{n_k+1}$ with last component equal to 1 and $x^{(k)}$ a solution of $T_{n_k+1}x^{(k)} = e_{n_k+1} - e_1$. The residuals $r_j^{(k)}$ and $s_j^{(k)}$ of $u^{(k)}$ and $x^{(k)}$ are defined by
$$r_j^{(k)} = [\,a_{j+n_k} \ \cdots \ a_j\,]\, u^{(k)}, \qquad s_j^{(k)} = [\,a_{j+n_k} \ \cdots \ a_j\,]\, x^{(k)}, \tag{8}$$
respectively, for $j = 0, \dots, N - n_k$. Clearly, $r_0^{(k)} = 0$ and $s_0^{(k)} = 1$.

Our aim is to find $u = u^{(\ell)}$ and $x = x^{(\ell)}$. Then the solution of a linear system $T_N f = b$ can be computed using the formula from Corollary 2.2 or another inversion formula. First let us note that according to (7)
$$x^{(k)} = \frac{1}{r^{(k-1)}_{d_{k-1}}} \begin{bmatrix} 0_{d_{k-1}} \\ u^{(k-1)} \\ 0_{d_{k-1}} \end{bmatrix}$$
and
$$s_j^{(k)} = \frac{1}{r^{(k-1)}_{d_{k-1}}}\, r^{(k-1)}_{j+d_{k-1}}.$$
That means it is sufficient to compute the residuals $r_j^{(k)}$ and to construct the vectors $u^{(k)}$.

For initialization we set $n_0 = 0$ and $u^{(0)} = 1$. Then $r_j^{(0)} = a_j$. If $a_1 = \cdots = a_{d-1} = 0$ and $a_d \neq 0$, then $n_1 = 2d$. The vector $u^{(1)}$ is the normalized solution of the homogeneous system $T_{2d+1}v = 0$. We show how this solution can be found. We form the matrix
$$\tilde R^{(0)} = \begin{bmatrix} a_d & \cdots & a_{2d} \\ & \ddots & \vdots \\ 0 & & a_d \end{bmatrix}.$$
Let $c$ be the solution of the triangular Toeplitz system $(\tilde R^{(0)})^T c = e_1$ and $v = \tilde c \in F^{2d+1}$ its symmetric extension. Then $T_{2d+1}v = 0$. Hence $u^{(1)} = (1/c_1)\,v$, where $c_1$ is the first component of $c$.

We assume now that $n_{k-1}$, $n_k$, $u^{(k-1)}$ and $u^{(k)}$ are given. We also need some of the values $r_j^{(k-1)}$ ($j = 1, \dots, 2d_{k-1}$) that are computed in the previous step. Now $n_{k+1}$ and $u^{(k+1)}$ are computed as follows. If $r_1^{(k)} = \cdots = r_{d-1}^{(k)} = 0$ and $r_d^{(k)} \neq 0$, then $d_k = d$, i.e. $n_{k+1} = n_k + 2d$. We compute the numbers $r^{(k)}_{d_k+1}, \dots, r^{(k)}_{2d_k}$ and form the matrix $\tilde R^{(k)}$ as
$$\tilde R^{(k)} = \begin{bmatrix} r^{(k)}_{d_k} & \cdots & r^{(k)}_{2d_k} \\ & \ddots & \vdots \\ 0 & & r^{(k)}_{d_k} \end{bmatrix}.$$
If $d_k > d_{k-1}$, then it will be necessary to compute also the numbers $r_j^{(k-1)}$ for $j = 2d_{k-1}+1, \dots, d_k + d_{k-1}$ to form the vector $r^{(k-1)} = (r_j^{(k-1)})_{j=d_{k-1}}^{d_k+d_{k-1}}$. Let $c^{(k)}$ be the solution of the triangular Toeplitz system
$$(\tilde R^{(k)})^T c^{(k)} = r^{(k-1)},$$
$q^{(k)} = 1/c_1$, where $c_1$ is the first component of $c^{(k)}$, and let $p^{(k)} = q^{(k)}\tilde c^{(k)} \in F_+^{2d_k+1}$ be the symmetric extension of $q^{(k)}c^{(k)}$. Then
$$u^{(k+1)} = M_{2d_k+1}(u^{(k)})\,p^{(k)} - q^{(k)} \begin{bmatrix} 0_{d_k+d_{k-1}} \\ u^{(k-1)} \\ 0_{d_k+d_{k-1}} \end{bmatrix}.$$
In polynomial language the recursion can be written as follows.
Theorem 4.1. The polynomials $u^{(k)}(t)$ satisfy the three-term recursion
$$u^{(k+1)}(t) = p^{(k)}(t)\,u^{(k)}(t) - t^{d_k+d_{k-1}}\, q^{(k)}\, u^{(k-1)}(t).$$

Example 1. Consider the skewsymmetric Toeplitz matrix $T_6 = [a_{i-j}]_{i,j=1}^{6}$ with $(a_k)_{k=1}^{5} = (1, 2, 3, 5, 6)$. Since we need also an extension of $T_6$ we set $a_6 = 0$. The standard setting for initialization is $n_0 = 0$, $u^{(0)} = 1$ and $r_j^{(0)} = a_j$. Since $r_1^{(0)} = 1 \neq 0$ we have $d_0 = 1$ and $n_1 = n_0 + 2d_0 = 2$. We obtain $x^{(1)} = [0, 1, 0]^T$ and $u^{(1)} = [1, -2, 1]^T$. With $u^{(0)}$ and $u^{(1)}$ we can start the recursion. We compute the residuals as $r_1^{(1)} = 0$, $r_2^{(1)} = 1$. Thus $d_1 = 2$, $n_2 = n_1 + 2d_1 = 6$, and $x^{(2)} = [0, 0, 1, -2, 1, 0, 0]^T$. In order to form the matrix $\tilde R^{(1)}$ we find that $r_3^{(1)} = -1$ and $r_4^{(1)} = -7$, and in order to form the vector $r^{(0)}$ we observe that $r_2^{(0)} = a_2 = 2$, $r_3^{(0)} = a_3 = 3$. The solution of the system $(\tilde R^{(1)})^T c^{(1)} = r^{(0)}$ is $c^{(1)} = [1, 3, 13]^T$. Hence $p^{(1)} = [1, 3, 13, 3, 1]^T$, which gives
$$u^{(2)} = [1, 1, 8, -21, 8, 1, 1]^T.$$
The inverse of $T_6$ is now given by Corollary 2.2 with $x = x^{(2)}$ and $u = u^{(2)}$. A check shows that this really gives the inverse matrix.

Let us discuss the complexity of the algorithm emerging from Theorem 4.1. Surprisingly, the existence of singular central submatrices does not increase the complexity; in many cases it even decreases it. For simplicity we assume that all $d_k$ are equal to $d$, where $d$ is small compared with $N$. We neglect lower order terms. The amount for inner product calculations will be almost independent of $d$. We have to compute about $N$ inner products of a symmetric and a general vector. For this $\frac12 N^2$ additions and $\frac14 N^2$ multiplications are needed. Then we have in each step $2d+1$ vector additions of symmetric vectors and $d+1$ multiplications of a symmetric vector by a scalar. This results in $\left(\frac14 + \frac{1}{8d}\right)N^2$ additions and $\left(\frac18 + \frac{1}{8d}\right)N^2$ multiplications. Thus, the overall complexity is about $\left(\frac34 + \frac{1}{8d}\right)N^2$ additions and $\left(\frac38 + \frac{1}{8d}\right)N^2$ multiplications. That means the amount decreases when $d$ increases.
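The Levinson-type recursion of Theorem 4.1, including the look-ahead over singular sections, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: residuals are computed by plain inner products, exact rational arithmetic replaces floating point, the initial step is treated as the general step with right-hand side $e_1$ and no $u^{(-1)}$ term, and the function names are hypothetical.

```python
from fractions import Fraction

def residual(a, u, j):
    """r_j = [a_{j+n} ... a_j] u for a vector u of length n + 1 (a[m] holds a_m)."""
    n = len(u) - 1
    return sum(a[j + n - i] * u[i] for i in range(n + 1))

def solve_lower_toeplitz(col, rhs):
    """Forward substitution for the lower triangular Toeplitz matrix with first column col."""
    c = []
    for i in range(len(rhs)):
        acc = Fraction(rhs[i]) - sum(col[i - j] * c[j] for j in range(i))
        c.append(acc / col[0])
    return c

def split_levinson(a_ext):
    """a_ext = [a_1, ..., a_N], where a_N is the freely chosen extension entry.
    Returns (u, x) with T_{N+1} u = 0 (last component 1) and T_{N+1} x = e_{N+1} - e_1."""
    N = len(a_ext)
    a = [Fraction(0)] + [Fraction(v) for v in a_ext]
    n, u, u_prev, d_prev, x = 0, [Fraction(1)], None, 0, None
    while n < N:
        # look-ahead: d_k is the index of the first nonvanishing residual
        d = next((j for j in range(1, (N - n) // 2 + 1) if residual(a, u, j) != 0), None)
        if d is None:
            raise ValueError("T_N is singular")
        col = [residual(a, u, j) for j in range(d, 2 * d + 1)]        # r_d, ..., r_2d
        if u_prev is None:
            rhs = [Fraction(1)] + [Fraction(0)] * d                   # initialization: e_1
        else:
            rhs = [residual(a, u_prev, j) for j in range(d_prev, d_prev + d + 1)]
        c = solve_lower_toeplitz(col, rhs)
        q = 1 / c[0]
        p = [q * ci for ci in c + c[-2::-1]]                          # symmetric extension
        u_next = [Fraction(0)] * (n + 2 * d + 1)
        for i, pi in enumerate(p):                                    # p(t) u^(k)(t)
            for j, uj in enumerate(u):
                u_next[i + j] += pi * uj
        if u_prev is not None:                                        # shifted u^(k-1) term
            for i, v in enumerate(u_prev):
                u_next[d + d_prev + i] -= q * v
        x = [Fraction(0)] * d + [v / col[0] for v in u] + [Fraction(0)] * d
        u_prev, d_prev, u, n = u, d, u_next, n + 2 * d
    return u, x

u, x = split_levinson([1, 2, 3, 5, 6, 0])
print([int(v) for v in u], [int(v) for v in x])
```

Run on the data of Example 1, the sketch takes two steps with $d_0 = 1$ and $d_1 = 2$ and returns the vectors $u^{(2)}$ and $x^{(2)}$ stated there.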
In the case $d = 1$, which is the centro-nonsingular case, Theorem 4.1 is just Theorem 3.2 in [14]. In this case the complexity is $\frac78 N^2$ additions and $\frac12 N^2$ multiplications (cf. [14]).

The algorithm just described is a split Levinson-type algorithm and includes the calculation of the residuals via long inner products, which might not be convenient in parallel computing. We show now that the residuals can also be computed by a Schur-type algorithm. The Schur-type algorithm is of independent interest, since it provides a factorization, which will be described in Section 6.

We consider the full residual vectors $r^{(k)} = (r_j^{(k)})_{j=1}^{N-n_k}$ and the corresponding polynomials $r^{(k)}(t)$. By the definition of the integer $d_k$, $\tilde r^{(k)}(t) = t^{-d_k+1} r^{(k)}(t)$ is a polynomial. The monic, symmetric polynomial $p^{(k)}(t)$ and $q^{(k)} \in F$ have been constructed in such a way that the polynomial
$$\tilde r^{(k)}(t)\,p^{(k)}(t) - q^{(k)}\,\tilde r^{(k-1)}(t)$$
has a zero of order $d_k + 1$ at $t = 0$. According to Theorem 4.1, the remainder will give us $r^{(k+1)}(t)$. Let $P_m$ denote the projector mapping a polynomial $\sum_{j=1}^{N} p_j t^{j-1}$ ($N > m$) to $\sum_{j=1}^{m} p_j t^{j-1}$, i.e. cutting off high powers. Theorem 4.1 gives us immediately the following recursion formula for the residuals.

Theorem 4.2. The polynomials $r^{(k)}(t)$ satisfy the recursion
$$r^{(k+1)}(t) = P_{N-n_{k+1}}\!\left( t^{-2d_k}\, p^{(k)}(t)\, r^{(k)}(t) - t^{-d_{k-1}-d_k}\, q^{(k)}\, r^{(k-1)}(t) \right).$$

To write this recursion in matrix form we introduce the matrix $Q^{(k)}$ by
$$Q^{(k)} = \left[\, r^{(k)}_{2d_k+i-j+1} \,\right]_{i=1,\ j=1}^{\mu_k,\ 2d_k+1},$$
where $\mu_k = N - n_{k+1} = N - n_k - 2d_k$. Now we have
$$r^{(k+1)} = Q^{(k)} p^{(k)} - q^{(k)}\, \mathring r^{(k-1)}, \qquad \text{where } \mathring r^{(k-1)} = \left[\, r^{(k-1)}_{d_k+d_{k-1}+i} \,\right]_{i=1}^{\mu_k}.$$
The recursion starts with $\mathring r^{(-1)} = 0$, $r^{(0)} = [a_j]_{j=1}^{N}$, $p^{(0)} = u^{(1)}$, and $Q^{(0)} = [\,a_{n_1+i-j+1}\,]_{i=1,\ j=1}^{N-n_1,\ n_1+1}$. The vector $u^{(1)}$ will be computed as described in the initialization of the Levinson-type recursion via the solution of a triangular $(d_1+1) \times (d_1+1)$ Toeplitz system. Theorem 4.2 can be combined with Theorem 4.1 to compute $u$ and $x$, the parameters for the inversion formula.
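The residual recursion of Theorem 4.2 can be sketched without any inner products against the vectors $u^{(k)}$. The code below is an illustration under stated assumptions: residual vectors are stored as lists with `r[j - 1]` holding $r_j^{(k)}$, the projection and the negative powers of $t$ are realized by index shifts, the first step uses the right-hand side $e_1$ in place of a previous residual vector, and the function names are hypothetical. The test data extend Example 1 by $a_6 = a_7 = a_8 = 0$, so that $N = 8$ and three steps occur.

```python
from fractions import Fraction

def solve_lower_toeplitz(col, rhs):
    """Forward substitution for the lower triangular Toeplitz matrix with first column col."""
    c = []
    for i in range(len(rhs)):
        acc = Fraction(rhs[i]) - sum(col[i - j] * c[j] for j in range(i))
        c.append(acc / col[0])
    return c

def split_schur(a_ext):
    """a_ext = [a_1, ..., a_N]. Returns a list of triples (n_k, d_k, r^(k))."""
    N = len(a_ext)
    r = [Fraction(v) for v in a_ext]          # r^(0) = (a_1, ..., a_N)
    r_prev, d_prev, n, history = None, 0, 0, []
    while n < N:
        d = next((j for j in range(1, len(r) // 2 + 1) if r[j - 1] != 0), None)
        if d is None:
            raise ValueError("T_N is singular")
        history.append((n, d, r))
        col = r[d - 1:2 * d]                  # r_d, ..., r_2d of the current step
        if r_prev is None:
            rhs = [Fraction(1)] + [Fraction(0)] * d
        else:
            rhs = r_prev[d_prev - 1:d_prev + d]
        c = solve_lower_toeplitz(col, rhs)
        q = 1 / c[0]
        p = [q * ci for ci in c + c[-2::-1]]  # coefficients of p^(k)(t)
        r_next = []
        for i in range(len(r) - 2 * d):       # N - n_{k+1} new residuals
            # coefficient of t^(i + 2 d_k) in p^(k)(t) r^(k)(t)
            val = sum(p[j] * r[i + 2 * d - j] for j in range(2 * d + 1))
            if r_prev is not None:
                val -= q * r_prev[i + d + d_prev]
            r_next.append(val)
        r_prev, d_prev, r, n = r, d, r_next, n + 2 * d
    return history

history = split_schur([1, 2, 3, 5, 6, 0, 0, 0])
for n_k, d_k, r_k in history:
    print(n_k, d_k, [int(v) for v in r_k])
```

Each new residual is a short inner product of length $2d_k + 1$, which is the property that makes the Schur-type version attractive for parallel computation.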
5. Solution of linear systems

In this section we show how to solve a linear system $T_N f_N = b_N$ with a nonsingular $N \times N$ skewsymmetric Toeplitz coefficient matrix $T_N$ recursively without using the inversion formula. We use all notations that were introduced in the previous section.

Suppose that $b = [b_i]_{i=1}^{N}$. We set $b^{(k)} = [b_i]_{i=\frac12(N-n_k+2)}^{\frac12(N+n_k)} \in F^{n_k}$. We consider the systems
$$T^{(k)} f^{(k)} = b^{(k)},$$
where $T^{(k)} = T_{n_k}$. Our aim is to compute $f^{(k+1)}$ from $f^{(k)}$. Since $T^{(k+1)}$ is of the form
$$T^{(k+1)} = \begin{bmatrix} * & -B_-^{(k)} & * \\ * & T^{(k)} & * \\ * & B_+^{(k)} & * \end{bmatrix},$$
where
$$B_+^{(k)} = \begin{bmatrix} a_{n_k} & \cdots & a_1 \\ \vdots & & \vdots \\ a_{n_k+d_k-1} & \cdots & a_{d_k} \end{bmatrix}, \qquad B_-^{(k)} = J_{d_k} B_+^{(k)} J_{n_k},$$
we have
$$T^{(k+1)} \begin{bmatrix} 0 \\ f^{(k)} \\ 0 \end{bmatrix} = \begin{bmatrix} -\varphi_-^{(k)} \\ b^{(k)} \\ \varphi_+^{(k)} \end{bmatrix},$$
where $\varphi_\pm^{(k)} = B_\pm^{(k)} f^{(k)}$. As in Section 4 we obtain
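$$T^{(k+1)} M_{2d_k}(u^{(k)}) = \begin{bmatrix} O & -R^{(k)} \\ O & O \\ (R^{(k)})^T & O \end{bmatrix},$$
where
$$R^{(k)} = \begin{bmatrix} r^{(k)}_{d_k} & \cdots & r^{(k)}_{2d_k-1} \\ & \ddots & \vdots \\ O & & r^{(k)}_{d_k} \end{bmatrix}.$$
Hence we have, for $\alpha_\pm^{(k)} \in F^{d_k}$,
$$T^{(k+1)} \left( \begin{bmatrix} 0 \\ f^{(k)} \\ 0 \end{bmatrix} + M_{2d_k}(u^{(k)}) \begin{bmatrix} \alpha_+^{(k)} \\ \alpha_-^{(k)} \end{bmatrix} \right) = \begin{bmatrix} -R^{(k)}\alpha_-^{(k)} - \varphi_-^{(k)} \\ b^{(k)} \\ (R^{(k)})^T \alpha_+^{(k)} + \varphi_+^{(k)} \end{bmatrix}. \tag{9}$$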
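From this relation we conclude the following.

Theorem 5.1. Suppose that
$$b^{(k+1)} = \begin{bmatrix} b_-^{(k)} \\ b^{(k)} \\ b_+^{(k)} \end{bmatrix},$$
where $b_\pm^{(k)} \in F^{d_k}$, and that $\alpha_\pm^{(k)}$ are the solutions of
$$(R^{(k)})^T \alpha_+^{(k)} = b_+^{(k)} - \varphi_+^{(k)}, \qquad R^{(k)} \alpha_-^{(k)} = -b_-^{(k)} - \varphi_-^{(k)}. \tag{10}$$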
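Then the solution $f^{(k+1)}$ of $T^{(k+1)} f^{(k+1)} = b^{(k+1)}$ is given by
$$f^{(k+1)} = \begin{bmatrix} 0 \\ f^{(k)} \\ 0 \end{bmatrix} + M_{2d_k}(u^{(k)}) \begin{bmatrix} \alpha_+^{(k)} \\ \alpha_-^{(k)} \end{bmatrix}. \tag{11}$$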
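The direct recursion of Theorem 5.1 can be traced on the matrix of Example 1. The following is a minimal sketch under several assumptions: the vectors $u^{(k)}$ and the jumps $d_k$ are taken as known inputs (for instance from the Levinson-type recursion), the vectors written here as `phi_plus` and `phi_minus` are computed by plain inner products with the Toeplitz blocks $B_\pm^{(k)}$, arithmetic is exact, and all function and variable names are hypothetical.

```python
from fractions import Fraction

def direct_recursion_solve(a, us, ds, b):
    """Solve T_N f = b; a[m] holds a_m (a[0] = 0), us[k] = u^(k), ds[k] = d_k."""
    N = len(b)
    f, n = [], 0                                   # f^(0) is empty since n_0 = 0
    for u, d in zip(us, ds):
        # residuals r_d, ..., r_{2d-1} of u^(k): first row of the triangular matrix R^(k)
        r = [sum(a[j + n - i] * u[i] for i in range(n + 1)) for j in range(d, 2 * d)]
        lo = (N - n) // 2                          # b^(k) occupies b[lo : lo + n]
        b_minus, b_plus = b[lo - d:lo], b[lo + n:lo + n + d]
        phi_plus = [sum(a[n + i - j] * f[j] for j in range(n)) for i in range(d)]
        phi_minus = [sum(a[d - i + j] * f[j] for j in range(n)) for i in range(d)]
        alpha_plus = []                            # lower triangular system of (10)
        for i in range(d):
            acc = Fraction(b_plus[i] - phi_plus[i])
            acc -= sum(r[i - j] * alpha_plus[j] for j in range(i))
            alpha_plus.append(acc / r[0])
        alpha_minus = [Fraction(0)] * d            # upper triangular system of (10)
        for i in reversed(range(d)):
            acc = Fraction(-b_minus[i] - phi_minus[i])
            acc -= sum(r[j - i] * alpha_minus[j] for j in range(i + 1, d))
            alpha_minus[i] = acc / r[0]
        alpha = alpha_plus + alpha_minus
        f_next = [Fraction(0)] * d + [Fraction(v) for v in f] + [Fraction(0)] * d
        for i, ui in enumerate(u):                 # add M_{2d}(u^(k)) alpha, cf. (11)
            for j, aj in enumerate(alpha):
                f_next[i + j] += ui * aj
        f, n = f_next, n + 2 * d
    return f

a = [0, 1, 2, 3, 5, 6]                             # a_0, a_1, ..., a_5
b = [1, 0, 2, -1, 0, 3]
f = direct_recursion_solve(a, us=[[1], [1, -2, 1]], ds=[1, 2], b=b)
T6 = [[a[i - j] if i >= j else -a[j - i] for j in range(6)] for i in range(6)]
check = [sum(T6[i][j] * f[j] for j in range(6)) for i in range(6)]
print([int(v) for v in f], check == b)
```

The two triangular systems share the coefficients $r^{(k)}_{d_k}, \dots, r^{(k)}_{2d_k-1}$, so only one short list of residuals is needed per step.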
For one step of the recursion one has first to compute the vectors $\varphi_\pm^{(k)}$, which requires the multiplication of a vector by the $d_k \times n_k$ Toeplitz matrix $B_\pm^{(k)}$, then to solve two triangular $d_k \times d_k$ Toeplitz systems (with actually the same coefficient matrix) to get $\alpha_\pm^{(k)}$, and finally to apply formula (11).

The computations of the vectors $\varphi_\pm^{(k)}$ require long inner product calculations, which can be avoided if the full residual vectors $\tilde\varphi_\pm^{(k)} \in F^{N-n_k}$ are considered. These vectors are given by
$$T_N \begin{bmatrix} 0 \\ f^{(k)} \\ 0 \end{bmatrix} = \begin{bmatrix} -\tilde\varphi_-^{(k)} \\ b^{(k)} \\ \tilde\varphi_+^{(k)} \end{bmatrix}.$$
Let $Q_\pm^{(k)}$ be defined by
$$Q_+^{(k)} = \left[\, r^{(k)}_{2d_k+i-j+1} \,\right]_{i=1,\ j=1}^{\nu_k,\ d_k}, \qquad Q_-^{(k)} = J_{\nu_k} Q_+^{(k)} J_{d_k},$$
where $\nu_k = \frac12(N - n_k)$. Then we conclude from (9) that
$$\tilde\varphi_\pm^{(k+1)} = Q_\pm^{(k)} \alpha_\pm^{(k)} + (\tilde\varphi_\pm^{(k)})',$$
where the prime at $\tilde\varphi_+^{(k)}$ means that the first $d_k$ components are deleted and at $\tilde\varphi_-^{(k)}$ that the last $d_k$ components are deleted.
6. Generalized ZW-factorization

Just as the classical Schur algorithm for symmetric Toeplitz matrices is related to the LU-factorization of the matrix and the classical Levinson algorithm to a UL-factorization of its inverse, the split Schur algorithm is related to a ZW-factorization of the matrix and the split Levinson algorithm to a WZ-factorization of the inverse. In [14] the latter factorizations were investigated for skewsymmetric Toeplitz matrices. It was shown that centro-nonsingular skewsymmetric matrices admit a ZW-factorization in which the factors possess some additional symmetry properties. We are going to generalize this result to arbitrary nonsingular skewsymmetric Toeplitz matrices. The factorization will make it possible to solve a linear system by a pure Schur-type algorithm.

To be more precise, let us recall some concepts. A matrix $A = [a_{ij}]_{i,j=1}^{n}$ is called a W-matrix if $a_{ij} = 0$ for all $(i,j)$ for which $i > j$ and $i + j > n$, or $i < j$ and $i + j \le n$.
The matrix $A$ will be called a unit W-matrix if, in addition, $a_{ii} = 1$ for $i = 1, \dots, n$ and $a_{i,n+1-i} = 0$ for $i \neq (n+1)/2$. The transpose of a W-matrix is called a Z-matrix. A matrix which is both a Z- and a W-matrix will be called an X-matrix. The names come from the shapes of the set of all possible positions for nonzero entries.

[Diagram: patterns of admissible nonzero positions (•) and forced zeros (◦) in a W-matrix and a Z-matrix, forming the shapes of the letters W and Z.]
A unit Z- or W-matrix is obviously nonsingular, and a linear system with such a coefficient matrix can be solved by back substitution with $n^2/2$ additions and $n^2/2$ multiplications.

A representation $A = ZXW$ in which $Z$ is a unit Z-matrix, $W$ is a unit W-matrix, and $X$ a nonsingular X-matrix is called a ZW-factorization. A WZ-factorization is defined analogously. $A$ admits a ZW-factorization if and only if $A$ is centro-nonsingular. Under the same condition $A^{-1}$ admits a WZ-factorization. That means if $A$ is not centro-nonsingular, then no such factorization exists and a generalization is not at hand. We show now that, nevertheless, in the special case of a skewsymmetric Toeplitz matrix there is a natural generalization of the factorization result in [14].

We introduce $N \times d_k$ matrices $W_\pm^{(k)}$ by
$$W_-^{(k)} = \begin{bmatrix} O_{\nu_k\times d_k} \\ M_{d_k}(u^{(k)}) \\ O_{d_k\times d_k} \\ O_{\nu_k\times d_k} \end{bmatrix}, \qquad W_+^{(k)} = \begin{bmatrix} O_{\nu_k\times d_k} \\ O_{d_k\times d_k} \\ M_{d_k}(u^{(k)}) \\ O_{\nu_k\times d_k} \end{bmatrix},$$
where $\nu_k = \frac12(N - n_{k+1})$, and form the matrix
$$W = [\,W_-^{(\ell-1)} \ \cdots \ W_-^{(0)} \ \ W_+^{(0)} \ \cdots \ W_+^{(\ell-1)}\,]. \tag{12}$$
Recall that $u^{(0)} = 1$, $n_0 = 0$. Obviously, $W$ is a centrosymmetric unit W-matrix. We have
$$T_N W_-^{(k)} = \begin{bmatrix} -S_-^{(k)} \\ O_{(n_{k+1}-d_k)\times d_k} \\ (R^{(k)})^T \\ S_+^{(k)} \end{bmatrix}, \qquad T_N W_+^{(k)} = \begin{bmatrix} -\hat S_+^{(k)} \\ -R^{(k)} \\ O_{(n_{k+1}-d_k)\times d_k} \\ \hat S_-^{(k)} \end{bmatrix},$$
where
$$S_+^{(k)} = \left[\, r^{(k)}_{2d_k+i-j} \,\right]_{i=1,\ j=1}^{\nu_k,\ d_k}, \qquad S_-^{(k)} = \left[\, r^{(k)}_{\nu_k-i+j} \,\right]_{i=1,\ j=1}^{\nu_k,\ d_k},$$
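and $\hat S_\pm^{(k)} = J_{\nu_k} S_\pm^{(k)} J_{d_k}$. We set $r^{(k)} = r^{(k)}_{d_k}$,
$$Z_+^{(k)} = \frac{1}{r^{(k)}}\, T_N W_-^{(k)}, \qquad Z_-^{(k)} = -\frac{1}{r^{(k)}}\, T_N W_+^{(k)},$$
and form the matrix
$$Z = [\,Z_-^{(\ell-1)} \ \cdots \ Z_-^{(0)} \ \ Z_+^{(0)} \ \cdots \ Z_+^{(\ell-1)}\,]. \tag{13}$$
Then $Z$ is a centrosymmetric unit Z-matrix. Furthermore,
$$T_N W = ZX,$$
where $X$ is the skewsymmetric block antidiagonal matrix
$$X = \begin{bmatrix}
0 & & & & & -r^{(\ell-1)} I_{d_{\ell-1}} \\
 & & & & \iddots & \\
 & & & -r^{(0)} I_{d_0} & & \\
 & & r^{(0)} I_{d_0} & & & \\
 & \iddots & & & & \\
r^{(\ell-1)} I_{d_{\ell-1}} & & & & & 0
\end{bmatrix}. \tag{14}$$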
This leads to the following.

Theorem 6.1. A nonsingular skewsymmetric Toeplitz matrix and its inverse admit representations
$$T_N = ZXZ^T, \qquad T_N^{-1} = WX^{-1}W^T,$$
where $Z$ is a centrosymmetric Z-matrix given by (13), $W$ is a centrosymmetric W-matrix given by (12), and $X$ is a skewsymmetric block antidiagonal matrix given by (14).

Example 2. Let us illustrate the factorizations for the example of a nonsingular skewsymmetric Toeplitz matrix $T_6 = [a_{i-j}]_{i,j=1}^{6}$ with $a_1 \neq 0$ for which $T_4$ is singular. That means we have $n_1 = 2$ and $N = n_2 = 6$. Let $u^{(1)} = [1 \ \ u \ \ 1]^T$ span the nullspace of $T_3$. Then the factors of the generalized ZW-factorization of $T_6$ and the generalized WZ-factorization of $T_6^{-1}$ are given by
$$W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
u & 1 & 0 & 0 & 0 & 0 \\
1 & u & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & u & 1 \\
0 & 0 & 0 & 0 & 1 & u \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \qquad
Z = \begin{bmatrix}
1 & \frac{r_3}{r_2} & \frac{a_3}{a_1} & -\frac{a_2}{a_1} & 0 & 0 \\
0 & 1 & \frac{a_2}{a_1} & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & -1 & \frac{a_2}{a_1} & 1 & 0 \\
0 & 0 & -\frac{a_2}{a_1} & \frac{a_3}{a_1} & \frac{r_3}{r_2} & 1
\end{bmatrix},$$
$$X = \begin{bmatrix}
0 & 0 & 0 & 0 & -r_2 & 0 \\
0 & 0 & 0 & 0 & 0 & -r_2 \\
0 & 0 & 0 & -a_1 & 0 & 0 \\
0 & 0 & a_1 & 0 & 0 & 0 \\
r_2 & 0 & 0 & 0 & 0 & 0 \\
0 & r_2 & 0 & 0 & 0 & 0
\end{bmatrix},$$
where $r_2 = a_4 + a_3 u + a_2$ and $r_3 = a_5 + a_4 u + a_3$.

Let us point out that the factorization of $T_N$ can be computed with the help of the Schur-type algorithm emerging from Theorem 4.2 and the factorization of $T_N^{-1}$ with the help of the Levinson-type algorithm emerging from Theorem 4.1. Thus these algorithms can be used to solve linear systems via factorization and back substitution or matrix multiplication, respectively.

7. Concluding remarks

The algorithms described in the previous sections lead to several methods for solving a linear system $T_N f = b$ with a nonsingular, skewsymmetric Toeplitz coefficient matrix. There are three possibilities, namely (a) via inversion formula, (b) via direct recursion, and (c) via factorization. For each possibility there is a Levinson-type and a Schur-type version. That means we have six methods.

In [14] these six methods (and two more) are described in detail and compared from the viewpoint of complexity in sequential processing in the centro-nonsingular case. In the general case complexity matters are more complicated, since the complexity heavily depends on the rank profile, but the comparison will give, in principle, the same result.

It turned out that the Levinson-type algorithm combined with the inversion formula is the most efficient one from the complexity point of view, provided that a fast algorithm is used for matrix-vector multiplication. If the multiplication is carried out in the classical way, then direct recursion and factorization are preferable.

Let us point out that complexity is not the only criterion for estimating the performance of an algorithm. In floating point arithmetic stability is an important issue. It is well known that, as a rule, Schur-type algorithms are more stable than Levinson-type algorithms (see [2]). From this point of view a solution via ZW-factorization and back substitution might be preferable over the other methods.
Furthermore, all Schur-type versions are preferable in parallel computing, since they avoid inner product calculations.
References

[1] D.A. Bini, V.Y. Pan, Matrix and Polynomial Computations 1: Fundamental Algorithms, Birkhäuser Verlag, Basel, Boston, Berlin, 1994.
[2] R.P. Brent, Stability of fast algorithms for structured linear systems, in: T. Kailath, A.H. Sayed (Eds.), Fast Reliable Algorithms for Matrices with Structure, SIAM, Philadelphia, 1999.
[3] P. Delsarte, Y. Genin, The split Levinson algorithm, IEEE Trans. Acoust. Speech Signal Process. ASSP-34 (1986) 470–477.
[4] P. Delsarte, Y. Genin, On the splitting of classical algorithms in linear prediction theory, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (1987) 645–653.
[5] C.J. Demeure, Bowtie factors of Toeplitz matrices by means of split algorithms, IEEE Trans. Acoust. Speech Signal Process. ASSP-37 (10) (1989) 1601–1603.
[6] D.J. Evans, M. Hatzopoulos, A parallel linear systems solver, Internat. J. Comput. Math. 7 (3) (1979) 227–238.
[7] G. Heinig, Inversion of Toeplitz and Hankel matrices with singular sections, Wiss. Zeitschr. d. TH Karl-Marx-Stadt 25 (3) (1983) 326–333.
[8] G. Heinig, Chebyshev-Hankel matrices and the splitting approach for centrosymmetric Toeplitz-plus-Hankel matrices, Linear Algebra Appl. 327 (1–3) (2001) 181–196.
[9] G. Heinig, A. Al-Rashidi, Fast algorithms for skewsymmetric Toeplitz matrices based on recursion of fundamental systems, in preparation.
[10] G. Heinig, K. Rost, Algebraic Methods for Toeplitz-like Matrices and Operators, Birkhäuser Verlag, Basel, Boston, Stuttgart, 1984.
[11] G. Heinig, K. Rost, DFT representations of Toeplitz-plus-Hankel Bezoutians with application to fast matrix-vector multiplication, Linear Algebra Appl. 284 (1998) 157–175.
[12] G. Heinig, K. Rost, Efficient inversion formulas for Toeplitz-plus-Hankel matrices using trigonometric transformations, in: V. Olshevsky (Ed.), Structured Matrices in Mathematics, Computer Science, and Engineering, Vol. 2, AMS-Series, Providence, RI, Contemp. Math. 281 (2001) 247–264.
[13] G. Heinig, K. Rost, Centrosymmetric and centro-skewsymmetric Toeplitz matrices and Bezoutians, Linear Algebra Appl. 343–344 (2001) 195–209.
[14] G. Heinig, K. Rost, Fast algorithms for skewsymmetric Toeplitz matrices, Oper. Theory Adv. Appl. 135 (2002) 193–208.
[15] G. Heinig, K. Rost, Split algorithms for symmetric Toeplitz matrices with arbitrary rank profile, Numer. Linear Algebra Appl., to appear.
[16] A. Melman, A two-step even-odd split Levinson algorithm for Toeplitz systems, Linear Algebra Appl. 338 (2001) 219–237.
[17] V.Y. Pan, Structured Matrices and Polynomials, Birkhäuser Verlag, Boston, Springer, New York, 2001.
[18] S. Chandra Sekhara Rao, Existence and uniqueness of WZ factorization, Parallel Comp. 23 (8) (1997) 1129–1139.
[19] V.V. Voevodin, E.E. Tyrtyshnikov, Numerical Processes with Toeplitz Matrices, Nauka, Moscow, 1987 (in Russian).
Theoretical Computer Science 315 (2004) 469 – 510
www.elsevier.com/locate/tcs
The aggregation and cancellation techniques as a practical tool for faster matrix multiplication
Igor Kaporin¹
Computational Center of the Russian Academy of Sciences, Vavilova 40, 119991 Moscow, Russia
Abstract

The main purpose of this paper is to present a fast matrix multiplication algorithm taken from the paper of Laderman et al. (Linear Algebra Appl. 162–164 (1992) 557) in a refined compact "analytical" form and to demonstrate that it can be implemented as quite efficient computer code. Our improved presentation enables us to simplify substantially the analysis of the computational complexity and numerical stability of the algorithm as well as its computer implementation. The algorithm multiplies two $N \times N$ matrices using $O(N^{2.7760})$ arithmetic operations. In the case where $N = 18 \cdot 48^k$, for a positive integer $k$, the total number of flops required by the algorithm is $4.894N^{2.7760} - 16.165N^2$, which may be compared to a similar estimate for the Winograd algorithm, $3.732N^{2.8074} - 5N^2$ flops, $N = 8 \cdot 2^k$, the latter being the current record bound among all known practical algorithms. Moreover, we present a pseudo-code of the algorithm which demonstrates its very moderate working memory requirements, much smaller than those of the best available implementations of the Strassen and Winograd algorithms. For matrices of medium-large size (say, $2000 \le N < 10{,}000$) we consider one-level algorithms and compare them with the (multilevel) Strassen and Winograd algorithms. The results of numerical tests clearly indicate that our accelerated matrix multiplication routines implementing the algorithms based on two or three disjoint products are comparable in computational time with an implementation of the Winograd algorithm and clearly outperform it with respect to working space and (especially) numerical stability. The tests were performed for matrices of order up to 7000, both in double and single precision. © 2004 Elsevier B.V. All rights reserved.

Keywords: Fast matrix multiplication; Strassen algorithm; Winograd algorithm; Pan's aggregation/cancellation method; Numerical stability; Computational complexity
¹ Supported by the NSF grant CCR-9732206. E-mail address: [email protected] (I. Kaporin).
0304-3975/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2004.01.004
I. Kaporin / Theoretical Computer Science 315 (2004) 469 – 510
1. Introduction

Matrix multiplication is one of the most basic computational tasks arising in numerical computing. Software implementing this operation (among other basic linear algebra modules) is always included in general-purpose scientific packages, or invoked by them, see, e.g., [10,13]. The most widely known is the LAPACK library, which includes, e.g., such routines as DGEMM and SGEMM (multiplication of general rectangular matrices in double and single precision, respectively). Matrix multiplication is also a basic operation for many important non-numerical computational problems such as:
• transitive closure and all-pairs shortest-distance problems in graphs [1,29];
• parsing algorithms for context-free grammars (as is known, context-free language recognition over an input sequence of length $n$ can be reduced to multiplication of $n \times n$ matrices) [15,27];
• pattern recognition tasks (classification and finding similar objects), arising, e.g., in factor analysis of texts or in image retrieval, see [8] and references therein;
• computational molecular biology (processing gene expression profiles, which is reduced to the problem of identification of Boolean networks) [2,5].
In some of the above problems, the matrices are Boolean rather than filled with floating-point numbers; however, most of the results on fast matrix multiplication still hold true. Moreover, the numerical stability problem disappears in Boolean settings.

As a part of the intensive development of software for fundamental computational kernels during the last three decades, a considerable effort was directed towards efficient implementation of fast matrix multiplication (MM) algorithms [3,7,12,17,23,26]. However, only the Strassen algorithm (1969) [25] and the rather similar Winograd algorithm (1974), see, e.g., [6,14], have been implemented. The latter is often referred to as Strassen-Winograd's, and hereafter we use the abbreviation SW.
The main deficiencies of the SW based implementations are:
• the much larger worst-case upper bound for the floating-point error as compared to that of the classical $O(n^3)$ procedure (hence, the Strassen-type algorithms cannot be safely used in single precision floating-point computations), cf. [6,7,12,14];
• the need for a rather large volume of work memory;
• the discrepancy between the algorithmic tunings providing the minimization of the total operation count and the tunings aimed at the maximization of Mflops performance on modern RISC computers, see, e.g., [24];
• algorithmic complications arising for inputs being rectangular matrices of arbitrary sizes.
Some problems also arise with efficient parallel implementation, but these issues are not treated here.

However, there exists another class of practical matrix multiplication algorithms which are clearly better than the SW ones with respect to numerical stability and workspace consumption, and are competitive with respect to operation count and running time for realistic matrix sizes. The basis for the construction of such algorithms was set in [19,20,21], where the so-called aggregation-cancellation techniques were proposed for calculating two or three disjoint matrix products. Later on, in [18] a great practical
potential hidden in such designs was revealed, in particular the gain in floating-point accuracy, but also their rather regular structure and very moderate working memory requirements, typically smaller than those of the available SW implementations.

Our refined algorithm multiplies two $N \times N$ matrices by using $O(N^{2.7760})$ flops (floating-point arithmetic operations). In the case where $N = 18 \cdot 48^k$, for a positive integer $k$, the total number of flops required by the algorithm is $4.894N^{2.7760} - 16.165N^2$, which may be related to the estimate $T_{SW} = 3.732N^{2.8074} - 5N^2$ flops, $N = 8 \cdot 2^k$, for the SW algorithm. The latter was the current record bound among all known practical algorithms. (We do not count the theoretically fast algorithms [9,16] that support even much smaller exponents ($2.375\ldots$ for square matrix multiplications) but are not competitive even with the classical algorithm unless $N$ is immensely large.)

Our numerical tests indicate that the fast matrix multiplication routine implementing our algorithm based on two and three disjoint products is comparable to an implementation of the SW algorithm with respect to time, but takes considerably less working storage and possesses much better numerical stability (almost as good as for some implementations of the standard MM algorithm). The tests were performed for matrices of order up to 7000, both in double and single precision.

The paper is organized as follows. In Section 2, we restate and refine some results from [18]; one of the main results is the $n \times 2n$ by $2n \times n$ MM algorithm requiring $n^3 + 3n^2 - n$ bilinear multiplications. This also serves as an elementary introduction into our subject. In Section 3 we present a refined compact version of the fast Disjoint Triple MM algorithm taken from [19] as well as the related $n \times 3n$ by $3n \times n$ matrix multiplication algorithm using $n^3 + 12n^2 + 24n$ bilinear multiplications derived similarly to [18].
We give there pseudo-codes for the key algorithms, as well as an analysis of the computational complexity and a discussion of the numerical stability and computer implementation of the algorithm. In Section 4, we outline one-level procedures derived from the above rectangular MM algorithms, in particular, their adjustment to odd-sized and rectangular inputs. In Section 5, the results of numerical tests are given. Finally, concluding remarks are given in Section 6.

2. Two disjoint product based algorithms

Let us devise fast MM algorithms [18,22] by relying on the aggregation technique, specifically, on the so-called 2-procedure; hereafter we refer to them as PK2-algorithms.

2.1. A recursive procedure for two disjoint MM

To compute two generally disjoint matrix products
\[ Z = XY, \qquad W = UV, \]
where all of $U, V, W, X, Y, Z$ are $n \times n$ block matrices with the blocks properly dimensioned, consider the $n^3$ aggregates
\[ m_{ijk} = (x_{ik} + u_{kj})(y_{kj} + v_{ji}). \]
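Summation over $k$ or over $j$ gives us $z_{ij}$ or $w_{ki}$, respectively, up to some additive correction terms which involve only $3n^2$ multiplications:
\[ z_{ij} = -c_j - (x_i + u_j)v_{ji} + \sum_{k=1}^{n} m_{ijk}, \qquad w_{ki} = -r_k - x_{ik}(y_k + v_i) + \sum_{j=1}^{n} m_{ijk}, \]
where
\[ c_j = \sum_{k=1}^{n} u_{kj}y_{kj}, \qquad r_k = \sum_{j=1}^{n} u_{kj}y_{kj}, \]
\[ x_i = \sum_{k=1}^{n} x_{ik}, \qquad u_j = \sum_{k=1}^{n} u_{kj}, \qquad y_k = \sum_{j=1}^{n} y_{kj}, \qquad v_i = \sum_{j=1}^{n} v_{ji}. \]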
Hence, the number of multiplications is only $\mu(n) = n^3 + 3n^2$ (compared to $2n^3$ for the double application of the standard algorithm). The number of additions and subtractions must be accounted separately for each typical size of matrix blocks involved. In what follows, the three matrix pairs $\{X, U\}$, $\{Y, V\}$, and $\{Z, W\}$ are composed of $l \times l/t$, $l/t \times l$, and $l \times l$ blocks, respectively, where $t = 2$ for the 2-procedure (Section 2) and $t = 3$ for the 3-procedure (Section 3). One can see that the number of additions and subtractions is $a_1(n) = 2n^3 + 6n^2 - 4n$ for the input-type blocks (i.e., related to the input matrices $X, Y, U, V$), and $a_2(n) = 2n^3 + 4n^2 - 2n$ for the output-type blocks (i.e., related to the output matrices $Z, W$).

Since the number $n^3 + 3n^2$ is always even, a recursive algorithm groups smaller MM problems into pairs, and for each pair, the same procedure applies. For $N = n^k l$ with some fixed $l$, one has
\[ b(N) = \frac{\mu(n)}{2}\, b(N/n) = \cdots = \left(\frac{n^3 + 3n^2}{2}\right)^{k} b(l), \]
where $b(N)$ is the number of multiplications for two $N \times N$ disjoint matrix products. Thus, the total number of operations can be estimated as $O(N^{\omega(n)})$, where
\[ \omega(n) = \log\!\left(\frac{n^3 + 3n^2}{2}\right) \Big/ \log n, \]
in particular, $\omega(13) = 2 + \log_{13} 8 \approx 2.81071$. This exponent $\omega$ slightly exceeds $\omega = \log_2 7 < 2.80736$ of the Strassen-type algorithms, but the fast MM algorithm above is much more appealing from the practical viewpoint, especially for floating-point calculations, cf. [18].
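The aggregation identities of this subsection can be checked numerically. The following is a minimal sketch in Python, assuming scalar entries in place of matrix blocks; the function names (`two_disjoint_products`, `naive_matmul`, `random_matrix`) are illustrative and do not come from the paper. It forms the $n^3$ aggregates, applies the correction terms, and counts the multiplications.

```python
import random

def naive_matmul(P, Q):
    """Standard product of two square matrices given as lists of lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def two_disjoint_products(X, Y, U, V):
    """Return (Z, W, mults) with Z = X*Y and W = U*V built from n^3 aggregates."""
    n = len(X)
    idx = range(n)
    # Sums of input entries (additions only).
    x = [sum(X[i][k] for k in idx) for i in idx]   # x_i = sum_k x_ik
    u = [sum(U[k][j] for k in idx) for j in idx]   # u_j = sum_k u_kj
    y = [sum(Y[k][j] for j in idx) for k in idx]   # y_k = sum_j y_kj
    v = [sum(V[j][i] for j in idx) for i in idx]   # v_i = sum_j v_ji
    # The n^2 products u_kj * y_kj are shared by c_j and r_k.
    uy = [[U[k][j] * Y[k][j] for j in idx] for k in idx]
    c = [sum(uy[k][j] for k in idx) for j in idx]
    r = [sum(uy[k][j] for j in idx) for k in idx]
    mults = n * n
    # Correction terms: 2 n^2 further products.
    Z = [[-c[j] - (x[i] + u[j]) * V[j][i] for j in idx] for i in idx]
    W = [[-r[k] - X[i][k] * (y[k] + v[i]) for i in idx] for k in idx]
    mults += 2 * n * n
    # Each aggregate m_ijk contributes to both z_ij and w_ki.
    for i in idx:
        for j in idx:
            for k in idx:
                m = (X[i][k] + U[k][j]) * (Y[k][j] + V[j][i])
                mults += 1
                Z[i][j] += m
                W[k][i] += m
    return Z, W, mults

def random_matrix(n, rng):
    """Random n x n integer matrix (integers keep the comparison exact)."""
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
```

With integer inputs the comparison against the standard product is exact, and the returned counter equals $n^3 + 3n^2$. With matrix blocks instead of scalars the same structure applies, but the order of the factors in each product must then be preserved.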
2.2. The algorithm for n × 2n by 2n × n product

For the computation of a single matrix product, one can save more operations. Consider the product $H$ of an $n \times 2n$ block matrix $F$ by a $2n \times n$ block matrix $G$:
\[ H = FG. \]
The standard algorithm "by definition", $h_{ij} = \sum_k f_{ik}g_{kj}$, uses $2n^3$ block multiplications and $2n^3 - n^2$ (output-type) block additions. The original problem is reduced to two disjoint products by the column splitting of $F$ and the row splitting of $G$ into two equal blocks each, that is,
\[ F = [\,X \;\; U\,], \qquad G = \begin{bmatrix} Y \\ V \end{bmatrix}, \]
where $X, U$ and $Y, V$ have the block sizes $n \times n$. The equations
\[ Z = XY, \qquad W = UV, \qquad H = Z + W \]
reduce the problem to a pair of disjoint matrix multiplications and an $n \times n$ matrix addition. Analysis of the expression for $z_{ii} + w_{ii}$ shows, however, that we may remove the aggregates $m_{iii}$ from the summation by spreading their terms among the diagonal corrections for $h_{ii}$. Indeed, for $i \ne j$ one can directly use the formulas of the preceding subsection,
\[ h_{ij} = z_{ij} + w_{ij} = -c_j - r_i - (x_i + u_j)v_{ji} - x_{ji}(y_i + v_j) + \sum_{k=1}^{n} (m_{ijk} + m_{jki}), \]
while for $i = j$ one readily obtains
\[ h_{ii} = -c_i - r_i - (x_i + u_i)v_{ii} - x_{ii}(y_i + v_i) + 2m_{iii} + \sum_{k \ne i} (m_{iik} + m_{iki}) \]
\[ \phantom{h_{ii}} = -\hat c_i - \hat r_i - (\hat x_i + \hat u_i - u_{ii})v_{ii} - x_{ii}(\hat y_i - y_{ii} + \hat v_i) + \sum_{k \ne i} (m_{iik} + m_{iki}), \]
where the hatted quantities are the corresponding sums with the diagonal term omitted:
\[ c_i = \hat c_i + u_{ii}y_{ii}, \quad r_i = \hat r_i + u_{ii}y_{ii}, \quad x_i = \hat x_i + x_{ii}, \quad u_i = \hat u_i + u_{ii}, \quad y_i = \hat y_i + y_{ii}, \quad v_i = \hat v_i + v_{ii}. \]
Introducing the notations
\[ F_{i,j} = x_{ij}, \qquad F_{i,n+j} = u_{ij}, \qquad G_{i,j} = y_{ij}, \qquad G_{n+i,j} = v_{ij}, \]
for the entries of the input matrices and
\[ F0_i = -\hat x_i - \hat u_i + u_{ii}, \qquad G0_i = -\hat y_i + y_{ii} - \hat v_i, \]
\[ F1_i = -x_i, \quad F2_i = -u_i, \quad G1_i = -y_i, \quad G2_i = -v_i, \quad H1_i = -c_i, \quad H2_i = -r_i, \]
for temporary variables, one can obtain the following pseudo-code for the algorithm (the latter eight equalities are valid at Step 4 below):

Step 1:
    F1_i = 0, F2_i = 0, G1_i = 0, G2_i = 0, H1_i = 0, H2_i = 0, i = 1, ..., n
Step 2:
    do i = 1, n
        do j = 1, n
            P := F_{i,n+j} · G_{i,j}
            if (i = j) then
                H_{i,i} := P
            else
                H1_j := H1_j − P
                H2_i := H2_i − P
                F1_i := F1_i − F_{i,j}
                F2_j := F2_j − F_{i,n+j}
                G1_i := G1_i − G_{i,j}
                G2_j := G2_j − G_{n+i,j}
            end if
        end do
    end do
Step 3:
    do i = 1, n
        F0_i := F1_i + F2_i + F_{i,n+i}
        F1_i := F1_i − F_{i,i}
        F2_i := F2_i − F_{i,n+i}
        G0_i := G1_i + G2_i + G_{i,i}
        G1_i := G1_i − G_{i,i}
        G2_i := G2_i − G_{n+i,i}
        P := H_{i,i}
        H_{i,i} := H1_i + H2_i
        H1_i := H1_i − P
        H2_i := H2_i − P
    end do
Step 4:
    do i = 1, n
        do j = 1, n
            if (i = j) then
                H_{i,i} := H_{i,i} + F0_i · G_{n+i,i} + F_{i,i} · G0_i
            else
                S1 := F1_i + F2_j
                S2 := G1_i + G2_j
                H_{i,j} := H1_j + H2_i + S1 · G_{n+j,i} + F_{j,i} · S2
            end if
        end do
    end do
Step 5:
    do i = 1, n
        do j = 1, n
            do k = 1, n
                if (|i − j| + |j − k| ≠ 0) then
                    S1 := F_{i,k} + F_{k,n+j}
                    S2 := G_{k,j} + G_{n+j,i}
                    P := S1 · S2
                    H_{i,j} := H_{i,j} + P
                    H_{k,i} := H_{k,i} + P
                end if
            end do
        end do
    end do

Here P, S1, S2 are temporary variables and the symbol ":=" denotes in-place updating. The symbols F0_i, H_{i,j}, ... indicate some storage areas rather than algebraic terms. The working memory is exactly defined by the matrix blocks F0_i, F1_i, F2_i, G0_i, G1_i, G2_i, H1_i, H2_i, i = 1, ..., n. For $n$ of the order of tens, this typically comprises only a small fraction of the total volume of the input and output data.

The operation count for the above algorithm is as follows. The number of multiplications is $\tilde\mu(n) = n^3 + 3n^2 - n$ ($n^2$ at Step 2; $2n^2$ at Step 4; $n^3 - n$ at Step 5), and the number of block additions and subtractions is $\tilde a_1(n) = 2n^3 + 6n^2 - 4n$ for the input-type blocks ($4(n^2 - 2n)$ at Steps 1–2; $8n$ at Step 3; $2(n^2 - n)$ at Step 4; $2(n^3 - n)$ at Step 5), and $\tilde a_2(n) = 2n^3 + 5n^2 - 4n$ ($2(n^2 - 2n)$ at Steps 1–2; $3n$ at Step 3; $2n + 3(n^2 - n)$ at Step 4; $2(n^3 - n)$ at Step 5) for the output-type blocks. Here we assumed a non-trivial initialization of F1, ..., H2 (different from the zeroing at Step 1 above), which allows us to eliminate $6n$ fictitious subtractions from zero at Step 2. A similar algorithm with a larger number of multiplications, $n^3 + 3n^2$, was described in [18].
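The five steps above translate almost line by line into executable form. The sketch below is one possible Python transcription, assuming scalar entries instead of blocks and 0-based indices; the names `pk2_single_product` and `naive_product` are illustrative only. It also counts multiplications so that the figure $n^3 + 3n^2 - n$ can be checked.

```python
import random

def naive_product(F, G):
    """Standard product of an n x 2n matrix F by a 2n x n matrix G."""
    n = len(F)
    return [[sum(F[i][t] * G[t][j] for t in range(2 * n)) for j in range(n)]
            for i in range(n)]

def pk2_single_product(F, G):
    """Return (H, mults) with H = F*G, following Steps 1-5 (scalar entries)."""
    n = len(F)
    mults = 0
    # Step 1: zero the temporaries.
    F0, F1, F2 = [0] * n, [0] * n, [0] * n
    G0, G1, G2 = [0] * n, [0] * n, [0] * n
    H1, H2 = [0] * n, [0] * n
    H = [[0] * n for _ in range(n)]
    # Step 2: products u_ij * y_ij and off-diagonal sums.
    for i in range(n):
        for j in range(n):
            P = F[i][n + j] * G[i][j]
            mults += 1
            if i == j:
                H[i][i] = P
            else:
                H1[j] -= P
                H2[i] -= P
                F1[i] -= F[i][j]
                F2[j] -= F[i][n + j]
                G1[i] -= G[i][j]
                G2[j] -= G[n + i][j]
    # Step 3: diagonal temporaries, then complete the sums.
    for i in range(n):
        F0[i] = F1[i] + F2[i] + F[i][n + i]
        F1[i] -= F[i][i]
        F2[i] -= F[i][n + i]
        G0[i] = G1[i] + G2[i] + G[i][i]
        G1[i] -= G[i][i]
        G2[i] -= G[n + i][i]
        P = H[i][i]
        H[i][i] = H1[i] + H2[i]
        H1[i] -= P
        H2[i] -= P
    # Step 4: correction terms (two products per output entry).
    for i in range(n):
        for j in range(n):
            if i == j:
                H[i][i] += F0[i] * G[n + i][i] + F[i][i] * G0[i]
            else:
                S1 = F1[i] + F2[j]
                S2 = G1[i] + G2[j]
                H[i][j] = H1[j] + H2[i] + S1 * G[n + j][i] + F[j][i] * S2
            mults += 2
    # Step 5: aggregates m_ijk, skipping the n triples with i = j = k.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if not (i == j == k):
                    P = (F[i][k] + F[k][n + j]) * (G[k][j] + G[n + j][i])
                    mults += 1
                    H[i][j] += P
                    H[k][i] += P
    return H, mults
```

For block entries the same code structure would apply with matrix additions and products in place of the scalar ones; the sketch keeps the zero initialization of Step 1 rather than the non-trivial initialization assumed in the operation count.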
2.3. The recursive algorithm for square matrices

Multiplying a pair of $N \times N$ matrices $F$ and $G$ with numerical entries, assume, for simplicity, that $N = n^k l$, where $n$ and $l$ are even, and $k > 1$, so $M = N/n$ is also even. Represent $F$ as an $n \times 2n$ block matrix with $N/n \times N/(2n)$ blocks, $G$ as a $2n \times n$ block matrix with $N/(2n) \times N/n$ blocks, and $H$ as an $n \times n$ block matrix with $N/n \times N/n$ blocks. Then the algorithm of the preceding subsection can be readily applied using
\[ T_{PK2}(N) = \tilde a_1(n)\frac{N^2}{2n^2} + \tilde a_2(n)\frac{N^2}{n^2} + \frac{\tilde\mu(n)}{2}\,T_2(N/n) \]
arithmetic operations, where $T_2(M)$ operations are required for the computation of a pair of $M \times M/2$ by $M/2 \times M$ matrix products. The latter problem can be solved either by a standard algorithm ($T_2(M) = 2M^3 - 2M^2$), which gives rise to the so-called one-level algorithm [18], or by the application of the (generally, recursive) algorithm of Section 2.1. The one-level algorithm is hereafter referred to as PK21. For this algorithm, one readily obtains that
\[ T_{PK21}(N) = \left(1 + \frac{3}{n} - \frac{1}{n^2}\right)N^3 + \left(2n + 5 - \frac{5}{n}\right)N^2, \]
which has its minimum near $n = O(N^{1/2})$. However, the actual constant within this "O" should be adjusted when running the corresponding PK21 code on a specific computer (see Section 5).

If one decides to use recursive calls, Step 5 in the above pseudo-code should be unrolled twice:

    do i = 1, n/2
        do j = 1, n
            do k = 1, n
                if (|i − j| + |j − k| ≠ 0) then
                    S1 := F_{i,k} + F_{k,n+j}
                    S2 := G_{k,j} + G_{n+j,i}
                    T1 := F_{n+1−i,n+1−k} + F_{n+1−k,2n+1−j}
                    T2 := G_{n+1−k,n+1−j} + G_{2n+1−j,n+1−i}
                    P := S1 · S2, Q := T1 · T2
                    H_{i,j} := H_{i,j} + P
                    H_{k,i} := H_{k,i} + P
                    H_{n+1−i,n+1−j} := H_{n+1−i,n+1−j} + Q
                    H_{n+1−k,n+1−i} := H_{n+1−k,n+1−i} + Q
                end if
            end do
        end do
    end do
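The one-level operation count $T_{PK21}(N)$ stated earlier in this subsection can be re-derived mechanically from the counts of Section 2.2. The sketch below, with illustrative function names that are not part of the paper, evaluates both the closed form and the sum of its three contributions in exact rational arithmetic, and searches for the block count $n$ that minimizes the closed form.

```python
from fractions import Fraction

def t_pk21_closed(N, n):
    """Closed form (1 + 3/n - 1/n^2) N^3 + (2n + 5 - 5/n) N^2."""
    N, n = Fraction(N), Fraction(n)
    return (1 + 3 / n - 1 / n**2) * N**3 + (2 * n + 5 - 5 / n) * N**2

def t_pk21_from_counts(N, n):
    """Same quantity assembled from the operation counts of Section 2.2."""
    N, n = Fraction(N), Fraction(n)
    mu = n**3 + 3 * n**2 - n            # block multiplications
    a1 = 2 * n**3 + 6 * n**2 - 4 * n    # additions on input-type blocks
    a2 = 2 * n**3 + 5 * n**2 - 4 * n    # additions on output-type blocks
    M = N / n
    t2 = 2 * M**3 - 2 * M**2            # standard cost of a pair of products
    return a1 * N**2 / (2 * n**2) + a2 * N**2 / n**2 + mu / 2 * t2

def best_block_count(N, candidates):
    """Candidate n with the smallest closed-form operation count."""
    return min(candidates, key=lambda n: t_pk21_closed(N, n))
```

Setting the derivative of the closed form to zero gives $n \approx \sqrt{3N/2}$ to leading order, which is the sense in which the minimum lies near $O(N^{1/2})$; as the text notes, the best value in practice depends on the machine.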
For the recursive algorithm we have
\[ T_2(M) = a_1(n)\frac{M^2}{2n^2} + a_2(n)\frac{M^2}{n^2} + \frac{\mu(n)}{2}\,T_2(M/n), \]
where $M = n^{k-1}l$ and $T_2(l) = 2l^3 - 2l^2$. We need the following simple technical result, cf. [1].

Lemma (FMM recursion). Let $T(l)$ be given, $M = n^m l$, and
\[ T(M) = \alpha T(M/n) + \beta n^{-2}M^2 \]
for some constants $\alpha > n^2$ and $\beta$. Then
\[ T(M) = (T(l) + \gamma l^2)\alpha^m - \gamma M^2, \]
where $\gamma = \beta/(\alpha - n^2)$.

Corollary. Under the assumptions of the FMM Recursion Lemma, it holds that
\[ T(M) = (T(l)l^{-\omega} + \gamma l^{2-\omega})M^{\omega} - \gamma M^2, \qquad \text{where } \omega = \frac{\log\alpha}{\log n}. \]

In our case,
\[ \alpha = \frac{\mu(n)}{2} = \frac{n^3 + 3n^2}{2}, \qquad \beta = \frac{a_1(n)}{2} + a_2(n) = 3n^3 + 7n^2 - 4n, \]
and, consequently,
\[ \omega = \frac{\log((n^3 + 3n^2)/2)}{\log n}, \qquad \gamma = \frac{6n^2 + 14n - 8}{n^2 + n}. \]
Applying the lemma and using $M = N/n$, $m = k - 1$, $T_2(l) = 2l^3 - 2l^2$, we obtain
\[ T_2(N/n) = \left(2l^3 + \frac{4n^2 + 12n - 8}{n^2 + n}\,l^2\right)\left(\frac{n^3 + 3n^2}{2}\right)^{k-1} - \frac{6n^2 + 14n - 8}{n^2 + n}\,(N/n)^2. \]
Insert this into the formula for $T_{PK2}$, use $((n^3 + 3n^2)/2)^k = (N/l)^{\omega}$, and after some simplifications, obtain
\[ T_{PK2}(N) = \frac{n^2 + 3n - 1}{n^2 + 3n}\left(2l^{3-\omega} + 4\,\frac{n^2 + 3n - 2}{n^2 + n}\,l^{2-\omega}\right)N^{\omega} - \frac{5n^3 + 12n^2 - 13n + 4}{n^3 + n^2}\,N^2. \]
For $n = 12$ we obtain $\omega \le 2.81086$ and
\[ T_{PK2}(N) = \frac{179}{180}\left(2l^{3-\omega} + \frac{178}{39}\,l^{2-\omega}\right)N^{\omega} - \frac{1277}{234}\,N^2. \]
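The FMM recursion lemma and the constants for $n = 12$ can be checked directly. The following Python sketch, whose function names are illustrative assumptions, iterates the recurrence in exact rational arithmetic, compares it with the closed form, and evaluates the leading and quadratic coefficients of $T_{PK2}(N)$ in floating point.

```python
from fractions import Fraction
import math

def iterate_recursion(T_l, l, n, m, alpha, beta):
    """Apply T(M) = alpha*T(M/n) + beta*M^2/n^2 for M = n*l, ..., n^m*l."""
    T, M = Fraction(T_l), l
    for _ in range(m):
        M *= n
        T = alpha * T + Fraction(beta * M * M, n * n)
    return T

def closed_form(T_l, l, n, m, alpha, beta):
    """Closed form (T(l) + gamma*l^2) * alpha^m - gamma*M^2 of the lemma."""
    gamma = Fraction(beta, alpha - n * n)
    M = l * n**m
    return (T_l + gamma * l * l) * alpha**m - gamma * M * M

def pk2_constants(n, l):
    """Exponent and the two coefficients of T_PK2(N) for given n and l."""
    omega = math.log((n**3 + 3 * n**2) / 2) / math.log(n)
    lead = (n * n + 3 * n - 1) / (n * n + 3 * n) * (
        2 * l ** (3 - omega)
        + 4 * (n * n + 3 * n - 2) / (n * n + n) * l ** (2 - omega))
    quad = (5 * n**3 + 12 * n**2 - 13 * n + 4) / (n**3 + n**2)
    return omega, lead, quad
```

For $n = 12$ and $l = 10$ the two coefficients come out close to 3.776 and 5.457, the values used in the bound on $T_{PK2}(N)$ quoted for $N = 10 \cdot 12^k$.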
Table 1
PK2 exponents ω(n)

n     (n^3 + 3n^2)/2    log((n^3 + 3n^2)/2) / log n
8     352               2.819810
10    650               2.812913
12    1080              2.810856
14    1666              2.810920
16    2432              2.811981
18    3402              2.813520
20    4600              2.815275
22    6050              2.817112
24    7776              2.818957
26    9802              2.820770
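The entries of Table 1 follow directly from the definition of $\omega(n)$. A small Python sketch (the function names are illustrative) regenerates the rows and locates the smallest exponent among the even $n$ listed:

```python
import math

def pk2_exponent(n):
    """omega(n) = log((n^3 + 3n^2)/2) / log n for the 2-procedure."""
    return math.log((n**3 + 3 * n**2) / 2) / math.log(n)

def table1_rows(ns=range(8, 27, 2)):
    """Rows (n, (n^3 + 3n^2)/2, omega(n)) as in Table 1."""
    return [(n, (n**3 + 3 * n**2) // 2, pk2_exponent(n)) for n in ns]

for n, pairs, omega in table1_rows():
    print(f"{n:3d} {pairs:6d} {omega:.6f}")
```

The printed values may differ from the table in the last digit, depending on whether the final decimal is rounded or truncated.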
Table 1 shows $\omega(n)$ for the nearby even $n$. (Although the smallest value is $\omega(13)$, odd $n$'s are less convenient for coding.) Finally, choosing $l = 10$, so $N = 10 \cdot 12^k$, we obtain
\[ T_{PK2}(N) \le 3.776N^{2.81086} - 5.457N^2. \]
This estimate should be compared with the similar bounds $T_S(N) = 3.895N^{2.80736} - 6N^2$, $N = 10 \cdot 2^k$, for the Strassen algorithm [25] and $T_{SW}(N) = 3.732N^{2.80736} - 5N^2$, $N = 8 \cdot 2^k$, for a similar algorithm by Winograd. Thus, the PK2 algorithm can be quite competitive with Strassen-type algorithms for not very large matrices.

Remark 1. Since all the above functions $T_{PK2}(N)$, $T_S(N)$, and $T_{SW}(N)$ are defined for the values of $N$ belonging to special subsets of integers (which never intersect), the above formulas cannot be used for extracting the "best" algorithm unless $N$ is very large. For concrete values of $N$ one should first specify the rule by which these algorithms are generalized for an arbitrary $N$. In particular, one can use padding by zeroes (i.e., inflating the matrix dimension to the closest regular value) or peeling (i.e., two by two block splitting with a regularly sized leading block of maximum possible dimension) in their static or dynamic versions. Also, for the PK2 method one can use $n \ne 12$, different for each recursion level. After all, the concrete software design and hardware features can affect the performance much more essentially than operation count variations of less than 10 per cent. For certain regular (but "non-optimal") matrix sizes $N$ and cut-off parameters $l$, one can find the values of $T_{SW}(N)$ and $T_{PK21}(N)$, e.g., in Table 5, see Section 5.

Remark 2. In [18], a somewhat underestimated operation count was mistakenly given for a similar matrix multiplication algorithm.
3. Three disjoint product based algorithms

Our next construction of fast MM algorithms relies on aggregation/cancellation techniques and on a two-level block matrix structure; the aggregates involve quadruple rather than double indexing of matrix entries. This enables us to develop the so-called 3-Procedure (for computing Three Disjoint MMs), and we refer to the resulting methods for a single matrix product as the "PK3 algorithms". In our exposition, we follow the notations of [11, Section 5]. Our basic problem is the calculation of three disjoint $n \times n$ matrix products
\[ C^0 = A^0B^0, \qquad W^0 = U^0V^0, \qquad Z^0 = X^0Y^0, \tag{1} \]
and, for simplicity, we let $n$ be even,
\[ n = 2m - 2. \tag{2} \]
We first describe preprocessing of the input matrices similar to that in [19].

3.1. Reduction to the case of zero row and column sums

We assume, for simplicity, that the entries of $A^0, B^0, U^0, V^0, X^0, Y^0$ are real numbers. (In general, these matrices can be composed of rectangular submatrices, and then our formulae (3)–(8) would still apply.) Write
\[ A^0 = \begin{bmatrix} A^0_{11} & A^0_{12} \\ A^0_{21} & A^0_{22} \end{bmatrix}, \qquad B^0 = \begin{bmatrix} B^0_{11} & B^0_{12} \\ B^0_{21} & B^0_{22} \end{bmatrix}, \]
and similarly for $U^0, V^0, X^0, Y^0$, where each of the four submatrices has the size $(m - 1) \times (m - 1)$, cf. (2). Let $I$ be the $(m - 1) \times (m - 1)$ identity matrix and let
\[ u_0^T = [1 \ldots 1] \qquad \text{and} \qquad u^T = [u_0^T \;\; 1] \]
denote the $(m - 1)$- and $m$-vectors composed of all ones, respectively. Define the matrices
\[ L = \begin{bmatrix} I \\ -u_0^T \end{bmatrix}, \qquad R = \begin{bmatrix} I - \frac{1}{m}u_0u_0^T & -\frac{1}{m}u_0 \end{bmatrix} \]
of sizes $m \times (m - 1)$ and $(m - 1) \times m$, respectively. Noting that
\[ u^TL = 0, \qquad Ru = 0, \qquad RL = I, \]
consider the transformations
\[ A_{11} = LA^0_{11}R, \qquad B_{11} = LB^0_{11}L^T \]
of the blocks $A^0_{11}$ and $B^0_{11}$. Then clearly,
\[ u^TA_{11} = 0, \qquad A_{11}u = 0, \qquad u^TB_{11} = 0, \qquad B_{11}u = 0, \]
\[ A_{11}B_{11} = (LA^0_{11}R)(LB^0_{11}L^T) = L(A^0_{11}B^0_{11})L^T = \begin{bmatrix} A^0_{11}B^0_{11} & * \\ * & * \end{bmatrix}. \]
Now, replace each of the four $(m - 1) \times (m - 1)$ blocks $A^0_{ij}$ in $A^0$ and $B^0_{ij}$ in $B^0$ by the transformed $m \times m$ blocks $A_{ij}$ and $B_{ij}$ with zero row and column sums and arrive at the matrices
\[ A = \begin{bmatrix} LA^0_{11}R & LA^0_{12}R \\ LA^0_{21}R & LA^0_{22}R \end{bmatrix} = \begin{bmatrix} L & 0 \\ 0 & L \end{bmatrix} A^0 \begin{bmatrix} R & 0 \\ 0 & R \end{bmatrix}, \]
\[ B = \begin{bmatrix} LB^0_{11}L^T & LB^0_{12}L^T \\ LB^0_{21}L^T & LB^0_{22}L^T \end{bmatrix} = \begin{bmatrix} L & 0 \\ 0 & L \end{bmatrix} B^0 \begin{bmatrix} L^T & 0 \\ 0 & L^T \end{bmatrix}. \]
The product
\[ A^0B^0 = C^0 = \begin{bmatrix} C^0_{11} & C^0_{12} \\ C^0_{21} & C^0_{22} \end{bmatrix} \]
is recovered from the $(m - 1) \times (m - 1)$ leading submatrices of the $m \times m$ blocks $C_{11}, C_{12}, C_{21}, C_{22}$ in the product
\[ C = AB = \begin{bmatrix} L & 0 \\ 0 & L \end{bmatrix} A^0B^0 \begin{bmatrix} L^T & 0 \\ 0 & L^T \end{bmatrix} = \begin{bmatrix} C^0_{11} & * & C^0_{12} & * \\ * & * & * & * \\ C^0_{21} & * & C^0_{22} & * \\ * & * & * & * \end{bmatrix}. \]
To conclude this section, let us specify the transformation $H = LGR$ of an $(m - 1) \times (m - 1)$-submatrix $G$ of a left multiplier (e.g., $G = A^0_{11}$ into $H = A_{11}$):
\[ H_{im} = -\frac{1}{m}\sum_{j=1}^{m-1} G_{ij}, \qquad i = 1, \ldots, m - 1; \tag{3} \]
\[ H_{ij} = G_{ij} + H_{im}, \qquad i = 1, \ldots, m - 1;\; j = 1, \ldots, m - 1; \tag{4} \]
\[ H_{mj} = -\sum_{i=1}^{m-1} H_{ij}, \qquad j = 1, \ldots, m - 1. \tag{5} \]
For the right multipliers, the transformation of an $(m - 1) \times (m - 1)$-submatrix $G$ (e.g., $G = B^0_{11}$ into $H = B_{11}$) given by $H = LGL^T$ is even simpler:
\[ H_{im} = -\sum_{j=1}^{m-1} G_{ij}, \qquad i = 1, \ldots, m - 1; \tag{6} \]
\[ H_{ij} = G_{ij}, \qquad i = 1, \ldots, m - 1;\; j = 1, \ldots, m - 1; \tag{7} \]
\[ H_{mj} = -\sum_{i=1}^{m-1} H_{ij}, \qquad j = 1, \ldots, m - 1. \tag{8} \]
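The bordering transformations (3)–(8) are easy to exercise on small examples. The Python sketch below uses illustrative function names and exact rational arithmetic; unlike the paper's algorithm, it also fills the corner entry $H_{mm}$, purely so that all row and column sums can be checked. It verifies the zero-sum property and the recovery of the original product from the leading submatrix.

```python
from fractions import Fraction
import random

def transform_left(G):
    """H = L G R for an (m-1) x (m-1) block G, following (3)-(5)."""
    k = len(G)
    m = k + 1
    H = [[Fraction(0)] * m for _ in range(m)]
    for i in range(k):
        H[i][k] = -Fraction(sum(G[i]), m)            # (3)
        for j in range(k):
            H[i][j] = G[i][j] + H[i][k]              # (4)
    for j in range(m):                               # (5); j = m only for the check
        H[k][j] = -sum(H[i][j] for i in range(k))
    return H

def transform_right(G):
    """H = L G L^T for an (m-1) x (m-1) block G, following (6)-(8)."""
    k = len(G)
    m = k + 1
    H = [[Fraction(0)] * m for _ in range(m)]
    for i in range(k):
        H[i][k] = Fraction(-sum(G[i]))               # (6)
        for j in range(k):
            H[i][j] = Fraction(G[i][j])              # (7)
    for j in range(m):                               # (8); j = m only for the check
        H[k][j] = -sum(H[i][j] for i in range(k))
    return H

def matmul(P, Q):
    """Standard product of two square matrices."""
    n = len(P)
    return [[sum(P[i][t] * Q[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def has_zero_sums(H):
    """True if every row and every column of H sums to zero."""
    return all(sum(r) == 0 for r in H) and all(sum(c) == 0 for c in zip(*H))
```

Because $RL = I$, the leading $(m-1) \times (m-1)$ block of `matmul(transform_left(G1), transform_right(G2))` coincides with the plain product of `G1` and `G2`.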
Due to (5) and (8), we avoid computing the matrices $A_{pm+m,qm+m}, B_{pm+m,qm+m}, \ldots, Y_{pm+m,qm+m}$; $p = 0, 1$; $q = 0, 1$, which are not used in our algorithm (as one can see in the next section).

Remark 3. The above preprocessing algorithm is different from that in Section 5 of [19], where the same transformation is made for both left and right multiplicands (e.g., for $A^0$ and $B^0$, respectively), followed by a post-processing stage. In our case, there is no numerical post-processing, and the operation count corresponding to (3)–(8) is, therefore, only about 5/8 times that involved in the preprocessing in [19].

To obtain our next algorithm for three disjoint matrix products, we removed some redundant operations in the algorithm in Section 5 of [19], changed some signs in the aggregates, and reordered rows and columns in the transformed matrices $A, B, U, V, X, Y$.

3.2. A compact form of the aggregation-cancellation algorithm

Suppose all six input matrices $A^0, B^0, U^0, V^0, X^0, Y^0$ are preprocessed as in the preceding subsection. Then the following three disjoint products,
\[ C = AB, \qquad W = UV, \qquad Z = XY, \]
are actually computed, where each matrix has size $(n + 2) \times (n + 2)$ for $n + 2 = 2m$. For the transformed matrices we have the following "zero-sum" relationships:
\[ \sum_{i=1}^{m} A_{pm+i,qm+j} = 0,\; 1 \le j \le m; \qquad \sum_{j=1}^{m} A_{pm+i,qm+j} = 0,\; 1 \le i \le m; \qquad p = 0, 1;\; q = 0, 1; \]
\[ \sum_{j=1}^{m} B_{qm+j,rm+k} = 0,\; 1 \le k \le m; \qquad \sum_{k=1}^{m} B_{qm+j,rm+k} = 0,\; 1 \le j \le m; \qquad q = 0, 1;\; r = 0, 1; \]
\[ \sum_{j=1}^{m} U_{rm+j,pm+k} = 0,\; 1 \le k \le m; \qquad \sum_{k=1}^{m} U_{rm+j,pm+k} = 0,\; 1 \le j \le m; \qquad r = 0, 1;\; p = 0, 1; \]
\[ \sum_{k=1}^{m} V_{pm+k,qm+i} = 0,\; 1 \le i \le m; \qquad \sum_{i=1}^{m} V_{pm+k,qm+i} = 0,\; 1 \le k \le m; \qquad p = 0, 1;\; q = 0, 1; \]
\[ \sum_{k=1}^{m} X_{qm+k,rm+i} = 0,\; 1 \le i \le m; \qquad \sum_{i=1}^{m} X_{qm+k,rm+i} = 0,\; 1 \le k \le m; \qquad q = 0, 1;\; r = 0, 1; \]
\[ \sum_{i=1}^{m} Y_{rm+i,pm+j} = 0,\; 1 \le j \le m; \qquad \sum_{j=1}^{m} Y_{rm+i,pm+j} = 0,\; 1 \le i \le m; \qquad r = 0, 1;\; p = 0, 1. \]
To devise our algorithm, consider the $8m^3 = (n + 2)^3$ products (the so-called aggregates, cf. [19])
\[ M^{pqr}_{ijk} = \big((-1)^r A_{pm+i,qm+j} + (-1)^q U_{rm+j,pm+k} + (-1)^p X_{qm+k,rm+i}\big) \times \big(B_{qm+j,rm+k} + V_{pm+k,qm+i} + Y_{rm+i,pm+j}\big), \tag{9} \]
\[ 1 \le i \le m;\; 1 \le j \le m;\; 1 \le k \le m;\; p = 0, 1;\; q = 0, 1;\; r = 0, 1. \]
Each of these products equals the sum of the following nine terms:
\[ M^{pqr}_{ijk} = (-1)^r A_{pm+i,qm+j}B_{qm+j,rm+k} + (-1)^r A_{pm+i,qm+j}V_{pm+k,qm+i} + (-1)^r A_{pm+i,qm+j}Y_{rm+i,pm+j} \]
\[ \qquad + (-1)^q U_{rm+j,pm+k}B_{qm+j,rm+k} + (-1)^q U_{rm+j,pm+k}V_{pm+k,qm+i} + (-1)^q U_{rm+j,pm+k}Y_{rm+i,pm+j} \]
\[ \qquad + (-1)^p X_{qm+k,rm+i}B_{qm+j,rm+k} + (-1)^p X_{qm+k,rm+i}V_{pm+k,qm+i} + (-1)^p X_{qm+k,rm+i}Y_{rm+i,pm+j}. \]
Sum these quantities over $q, j$, over $p, k$, and over $r, i$, note that the sums of the type
\[ \sum_{q,j} (-1)^q U_{rm+j,pm+k}Y_{rm+i,pm+j}, \qquad \sum_{p,k} (-1)^p X_{qm+k,rm+i}B_{qm+j,rm+k}, \qquad \sum_{r,i} (-1)^r A_{pm+i,qm+j}V_{pm+k,qm+i} \]
are equal to zero (due to the so-called cancellation effect, cf. [19]), and take into account the zero-sum properties of the input matrices. This produces the following expressions for $(AB)_{pm+i,rm+k}$, $(UV)_{rm+j,qm+i}$, and $(XY)_{qm+k,pm+j}$, respectively, which define the desired algorithm:
\[ (AB)_{pm+i,rm+k} = (-1)^r \sum_{q,j} M^{pqr}_{ijk} - (-1)^{p+r} m \sum_{q} X_{qm+k,rm+i}V_{pm+k,qm+i} - \sum_{q,j} A_{pm+i,qm+j}Y_{rm+i,pm+j} - \sum_{q,j} (-1)^{q+r} U_{rm+j,pm+k}B_{qm+j,rm+k}, \tag{10} \]
\[ 1 \le i \le m - 1;\; 1 \le k \le m - 1;\; p = 0, 1;\; r = 0, 1; \]
\[ (UV)_{rm+j,qm+i} = (-1)^q \sum_{p,k} M^{pqr}_{ijk} - (-1)^{r+q} m \sum_{p} A_{pm+i,qm+j}Y_{rm+i,pm+j} - \sum_{p,k} U_{rm+j,pm+k}B_{qm+j,rm+k} - \sum_{p,k} (-1)^{p+q} X_{qm+k,rm+i}V_{pm+k,qm+i}, \tag{11} \]
\[ 1 \le j \le m - 1;\; 1 \le i \le m - 1;\; r = 0, 1;\; q = 0, 1; \]
\[ (XY)_{qm+k,pm+j} = (-1)^p \sum_{r,i} M^{pqr}_{ijk} - (-1)^{q+p} m \sum_{r} U_{rm+j,pm+k}B_{qm+j,rm+k} - \sum_{r,i} X_{qm+k,rm+i}V_{pm+k,qm+i} - \sum_{r,i} (-1)^{r+p} A_{pm+i,qm+j}Y_{rm+i,pm+j}, \tag{12} \]
\[ 1 \le k \le m - 1;\; 1 \le j \le m - 1;\; q = 0, 1;\; p = 0, 1. \]
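Formulas (9)–(12) can be validated numerically on small zero-sum matrices. The Python sketch below is only an identity check and not an efficient implementation: it recomputes every product inside a single loop, and all function names are illustrative assumptions. Summing the correction product $X V$ over the free index $j$ reproduces the factor $m$ in (10), and similarly for (11) and (12).

```python
import random
from itertools import product

def zero_sum_block(m, rng):
    """Random m x m integer block whose rows and columns all sum to zero."""
    core = [[rng.randint(-4, 4) for _ in range(m - 1)] for _ in range(m - 1)]
    rows = [row + [-sum(row)] for row in core]
    rows.append([-sum(r[j] for r in rows) for j in range(m)])
    return rows

def zero_sum_matrix(m, rng):
    """2m x 2m matrix assembled from four m x m zero-sum blocks."""
    blocks = [[zero_sum_block(m, rng) for _ in range(2)] for _ in range(2)]
    return [[blocks[p][q][i][j] for q in range(2) for j in range(m)]
            for p in range(2) for i in range(m)]

def matmul(P, Q):
    """Standard product of two square matrices."""
    n = len(P)
    return [[sum(P[i][t] * Q[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def three_disjoint_products(A, B, U, V, X, Y, m):
    """Evaluate the right-hand sides of (10)-(12) for all index values."""
    size = 2 * m
    C = [[0] * size for _ in range(size)]
    W = [[0] * size for _ in range(size)]
    Z = [[0] * size for _ in range(size)]
    for p, q, r in product((0, 1), repeat=3):
        sp, sq, sr = (-1) ** p, (-1) ** q, (-1) ** r
        for i, j, k in product(range(m), repeat=3):
            a, b = A[p * m + i][q * m + j], B[q * m + j][r * m + k]
            u, v = U[r * m + j][p * m + k], V[p * m + k][q * m + i]
            x, y = X[q * m + k][r * m + i], Y[r * m + i][p * m + j]
            agg = (sr * a + sq * u + sp * x) * (b + v + y)   # aggregate (9)
            ay, ub, xv = a * y, u * b, x * v                 # correction products
            # (10): summing xv over j yields the factor m.
            C[p * m + i][r * m + k] += sr * agg - ay - sq * sr * ub - sp * sr * xv
            # (11): summing ay over k yields the factor m.
            W[r * m + j][q * m + i] += sq * agg - ub - sp * sq * xv - sr * sq * ay
            # (12): summing ub over i yields the factor m.
            Z[q * m + k][p * m + j] += sp * agg - xv - sr * sp * ay - sq * sp * ub
    return C, W, Z
```

In the actual algorithm each aggregate and each correction product is computed once and reused, which is what the pseudo-code of Section 3.4 arranges.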
In the next section, we estimate the arithmetic complexity of this algorithm.

Remark 4. The above algorithm can be easily generalized to the case where the sizes of the input matrices are $n_1 \times n_2$, $n_2 \times n_3$, $n_2 \times n_3$, $n_3 \times n_1$, $n_3 \times n_1$, and $n_1 \times n_2$ for $A^0, B^0, U^0, V^0, X^0$, and $Y^0$, respectively, as in [19].

Remark 5. For each fixed triple $i, j, k$, the eight products (9) obtained with different $p, q, r$ correspond exactly to the eight products $P_1, \ldots, P_8$ introduced in [11, p. 572] as follows: $P_1 = M^{000}$; $P_2 = M^{010}$; $P_3 = M^{100}$; $P_4 = M^{001}$; $P_5 = -M^{111}$; $P_6 = -M^{101}$; $P_7 = -M^{011}$; $P_8 = -M^{110}$.

3.3. Asymptotics for bilinear multiplicative cost

The algorithm can be summarized as follows:
• Split the matrices properly and apply transformation (3)–(8) to each of the 24 blocks $A^0_{11}, \ldots, Y^0_{22}$; then perform all the matrix additions involved in (9).
• Perform the (bilinear) matrix multiplications involved in (9)–(12) (in general, either a recursive call, or the trivial algorithm, or another algorithm can be applied here).
• Perform all additions involved in (10)–(12) (as follows from Section 3.1, for the resulting products $C$, $W$, and $Z$, the bordering rows and columns introduced at the preprocessing stage need not be calculated).
This rather rough sketch makes it possible to estimate the number of bilinear multiplications involved. To estimate the number of linear operations (additions, subtractions, and multiplications by scalars) and the working memory usage, we have to reorder the computations properly, see Sections 3.4–3.6.

Note that for all $p, q, r$ there is no actual need to calculate the products $M^{pqr}_{imm}$, $M^{pqr}_{mim}$, $M^{pqr}_{mmi}$, $i = 1, \ldots, m$, and $A_{pm+m,qm+m}Y_{rm+m,pm+m}$, $U_{rm+m,pm+m}B_{qm+m,rm+m}$, $X_{qm+m,rm+m}V_{pm+m,qm+m}$, since these quantities are never used in (10)–(12). The remaining products $M^{pqr}_{ijk}$ and the correction terms of the type $A_{pm+i,qm+j}Y_{rm+i,pm+j}$ are computed by using $8(m^3 - 3m + 2)$ and $24(m^2 - 1)$ multiplications, respectively. Add these quantities and recall $2m = n + 2$ to yield the following expression for the total number of bilinear multiplications:
\[ \mu(n) = 8m^3 + 24m^2 - 24m - 8 = n^3 + 12n^2 + 24n. \]
This number is divisible by 3 whenever
\[ n = 6k, \qquad k = 1, 2, \ldots \]
(recall that we already assumed that $n$ is even). Hence, the MMs of smaller size in this construction can be regrouped again in triples. Assuming that $N = n^k l$, $k > 1$, $l > 1$,
one readily obtains the following recurrence relation:
\[ b(N) = \frac{\mu(n)}{3}\, b\!\left(\frac{N}{n}\right) = \left(\frac{n^3}{3} + 4n^2 + 8n\right) b\!\left(\frac{N}{n}\right) = \cdots = \left(\frac{n^3}{3} + 4n^2 + 8n\right)^{k} b(l), \]
where $b(N)$ is the number of bilinear multiplications in the resulting recursive algorithm for three disjoint products of $N \times N$ matrices. For $n = 48$, fixed $l$, and $k \to \infty$, we obtain an algorithm with asymptotic complexity
\[ T(N) = O(N^{2.7760}). \]
In general, the "base $n$" algorithm has the asymptotic complexity $O(N^{\omega(n)})$, where $\omega(n) = \log_n(\mu(n)/3)$ (cf. Section 2.1); some exponents $\omega(n)$ are shown in Table 2. The above asymptotics hold for all $N$ since the limitation $N = n^k l$ can be relaxed using simple bordering of the original matrices by zeroes (also called static padding) [25]. Such techniques may also be of practical use, see Section 4.2, where the case of rectangular matrices is considered.

3.4. Implementation details for 3-procedure

Next, we study the computational scheme for Three Disjoint MMs in some detail to estimate the number of linear operations involved and the working memory used.
Table 2
PK3 exponents ω(n)

n     (n^3 + 12n^2 + 24n)/3    log((n^3 + 12n^2 + 24n)/3) / log n
12    1248                     2.869040
18    3384                     2.811685
24    7104                     2.790517
30    12,840                   2.781468
36    21,024                   2.777555
42    32,088                   2.776125
48    46,464                   2.775995
54    64,584                   2.776577
60    86,880                   2.777559
66    113,784                  2.778763
72    145,728                  2.780085
78    183,144                  2.781464
84    226,464                  2.782860
90    276,120                  2.784249
96    332,544                  2.785617
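The multiplication count of Section 3.3 and the exponents of Table 2 can be reproduced with a few lines of Python. In this sketch (function names are illustrative) the needed products are counted by brute force over the index ranges and compared with $n^3 + 12n^2 + 24n$; the exponent is then evaluated for $n = 6k$.

```python
import math
from itertools import product

def pk3_multiplications(n):
    """mu(n) = n^3 + 12 n^2 + 24 n bilinear multiplications."""
    return n**3 + 12 * n**2 + 24 * n

def count_needed_products(m):
    """Brute-force count of the products used in (10)-(12) for block size m."""
    # Aggregates: index triples with at least two of i, j, k below m.
    aggregates = sum(1 for i, j, k in product(range(1, m + 1), repeat=3)
                     if (i < m) + (j < m) + (k < m) >= 2)
    # Correction products: index pairs that are not both equal to m.
    corrections = sum(1 for i, j in product(range(1, m + 1), repeat=2)
                      if i < m or j < m)
    # 8 choices of (p, q, r); three kinds of correction products (AY, UB, XV).
    return 8 * aggregates + 24 * corrections

def pk3_exponent(n):
    """omega(n) = log(mu(n)/3) / log n for the 3-procedure."""
    return math.log(pk3_multiplications(n) / 3) / math.log(n)

for n in range(12, 97, 6):
    print(f"{n:3d} {pk3_multiplications(n) // 3:7d} {pk3_exponent(n):.6f}")
```

Among the multiples of 6 listed in the table, the smallest exponent is attained at $n = 48$, in agreement with the $O(N^{2.7760})$ bound.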
Let us introduce the following more compact notation using four-dimensional indexing:
\[ A_{pm+i,qm+j} = A^{pq}_{ij}, \quad \ldots, \quad Z_{qm+k,pm+j} = Z^{qp}_{kj}. \]
The main part of the algorithm described by (9)–(12) can be implemented as shown by the following pseudo-code:

Step 1:
    do p = 0,1; q = 0,1:
        do i = 1,...,m:
            C^{pq}_{mi} := 0, C^{pq}_{im} := 0, W^{pq}_{mi} := 0, W^{pq}_{im} := 0, Z^{pq}_{mi} := 0, Z^{pq}_{im} := 0
        end do
    end do
Step 2:
    do p = 0,1; q = 0,1; r = 0,1:
        do i = 1,...,m; j = 1,...,m:
            if (i < m or j < m) C^{00}_{mm} := A^{pq}_{ij} Y^{rp}_{ij}
            if (i < m and j < m) then
                if (p = 0) then
                    W^{rq}_{ji} := C^{00}_{mm}
                else
                    W^{rq}_{ji} := −(−1)^{q+r} m (W^{rq}_{ji} + C^{00}_{mm})
                end if
            end if
            if (i < m) C^{pr}_{im} := C^{pr}_{im} + C^{00}_{mm}
            if (j < m) C^{qp}_{mj} := C^{qp}_{mj} + (−1)^{p+r} C^{00}_{mm}
        end do
        do j = 1,...,m; k = 1,...,m:
            if (j < m or k < m) C^{00}_{mm} := U^{rp}_{jk} B^{qr}_{jk}
            if (j < m and k < m) then
                if (r = 0) then
                    Z^{qp}_{kj} := C^{00}_{mm}
                else
                    Z^{qp}_{kj} := −(−1)^{p+q} m (Z^{qp}_{kj} + C^{00}_{mm})
                end if
            end if
            if (j < m) W^{rq}_{jm} := W^{rq}_{jm} + C^{00}_{mm}
            if (k < m) W^{pr}_{mk} := W^{pr}_{mk} + (−1)^{q+r} C^{00}_{mm}
        end do
        do k = 1,...,m; i = 1,...,m:
            if (i < m or k < m) C^{00}_{mm} := X^{qr}_{ki} V^{pq}_{ki}
            if (i < m and k < m) then
                if (q = 0) then
                    C^{pr}_{ik} := C^{00}_{mm}
                else
                    C^{pr}_{ik} := −(−1)^{p+r} m (C^{pr}_{ik} + C^{00}_{mm})
                end if
            end if
            if (k < m) Z^{qp}_{km} := Z^{qp}_{km} + C^{00}_{mm}
            if (i < m) Z^{rq}_{mi} := Z^{rq}_{mi} + (−1)^{p+q} C^{00}_{mm}
        end do
    end do
Step 3:
    do p = 0,1; q = 0,1; r = 0,1:
        do i = 1,...,m; j = 1,...,m; k = 1,...,m:
            if ((i < m, j < m) or (j < m, k < m) or (k < m, i < m)) then
                C^{01}_{mm} := (−1)^r A^{pq}_{ij} + (−1)^q U^{rp}_{jk} + (−1)^p X^{qr}_{ki}
                C^{10}_{mm} := B^{qr}_{jk} + V^{pq}_{ki} + Y^{rp}_{ij}
                C^{00}_{mm} := C^{01}_{mm} C^{10}_{mm}
            end if
            if (i < m and k < m) C^{pr}_{ik} := C^{pr}_{ik} + (−1)^r C^{00}_{mm}
            if (i < m and j < m) W^{rq}_{ji} := W^{rq}_{ji} + (−1)^q C^{00}_{mm}
            if (j < m and k < m) Z^{qp}_{kj} := Z^{qp}_{kj} + (−1)^p C^{00}_{mm}
        end do
    end do
Step 4:
    do p = 0,1; r = 0,1:
        do i = 1,...,m−1; k = 1,...,m−1:
            C^{pr}_{ik} := C^{pr}_{ik} − C^{pr}_{im} − W^{pr}_{mk}
        end do
    end do
    do q = 0,1; r = 0,1:
        do j = 1,...,m−1; i = 1,...,m−1:
            W^{rq}_{ji} := W^{rq}_{ji} − W^{rq}_{jm} − Z^{rq}_{mi}
        end do
    end do
    do p = 0,1; q = 0,1:
        do k = 1,...,m−1; j = 1,...,m−1:
            Z^{qp}_{kj} := Z^{qp}_{kj} − Z^{qp}_{km} − C^{qp}_{mj}
        end do
    end do

We use the bordering rows and columns of the resulting matrices $C^{pr}$, $W^{rq}$, $Z^{qp}$ as temporary variables for the accumulation of appropriate sums. The symbol ":=" denotes in-place updating, so our symbols $C^{pr}_{ik}$, $W^{rq}_{ji}$, $Z^{qp}_{kj}$ indicate certain storage areas rather than algebraic terms. Obviously, the required memory does not exceed the amount of bordering
introduced for all the input and output matrices. We choose $n = 2m - 2$ of the order of tens, so this typically comprises only a moderate fraction (not larger than $(4n + 4)/n^2$) of the total input data volume.

We have not commented above on the grouping of matrix products into triples as implied by the recursion. However, for matrix sizes not larger than 10,000, the one-level scheme appears to be most efficient, at least for many modern RISC computers (see Sections 4 and 5). In this case, no grouping by triples is required, whereas grouping of pairs should be done if the 2-Procedure instead of the standard MM is used at the inner level; the latter choice seems to be good for very large matrix sizes.

3.5. Scalar multiplications and additions: exact operation count

To show that the algorithm is practically competitive, e.g., with the ones presented in [18,19,25], we should estimate the actual number of linear operations. The number of "linear operations" (i.e., matrix additions, subtractions, and multiplications by scalars $m^{-1}$ or $m$ required for performing (3) or computing the correction terms in (8)–(10), respectively) can be estimated as follows:
• Steps (3)–(8) take $12(5m^2 - 13m + 8)$ operations applied to input-type blocks.
• Step (9) involves $32(m^3 - 3m + 2)$ operations applied to input-type blocks.
• Steps (10)–(12) can be performed in $24(m^3 + 2m^2 - 6m + 3)$ operations applied to output-type blocks.
Substituting $2m = n + 2$, one obtains the estimates $\alpha_1(n) = 4n^3 + 39n^2 - 18n$ and $\alpha_2(n) = 3n^3 + 30n^2 + 12n$ for linear operations performed on the input-type and the output-type blocks, respectively. In Section 3.7, the above formulas are used as the basis for the operation count for a regular level of recursion in the above algorithm.

3.6. An algorithm for a single matrix product

The above procedure can be applied to multiply a single pair of $N \times N$ matrices with scalar coefficients in a way quite similar to the approach of Subsection 2.2 (cf. [18]).
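The substitution $2m = n + 2$ behind the two estimates of Section 3.5 can be checked mechanically. The following Python sketch sums the per-step counts listed there and compares them with the two closed forms for a range of even $n$; the function names are illustrative only and do not come from the paper.

```python
def input_type_ops(m):
    """Steps (3)-(8) plus Step (9): operations on input-type blocks."""
    return 12 * (5 * m**2 - 13 * m + 8) + 32 * (m**3 - 3 * m + 2)


def output_type_ops(m):
    """Steps (10)-(12): operations on output-type blocks."""
    return 24 * (m**3 + 2 * m**2 - 6 * m + 3)


def closed_form_input(n):
    """Closed form quoted in the text for input-type blocks."""
    return 4 * n**3 + 39 * n**2 - 18 * n


def closed_form_output(n):
    """Closed form quoted in the text for output-type blocks."""
    return 3 * n**3 + 30 * n**2 + 12 * n


def substitution_holds(max_n=200):
    """Compare both sides for every even n up to max_n, using m = (n + 2) / 2."""
    for n in range(2, max_n + 1, 2):
        m = (n + 2) // 2
        if input_type_ops(m) != closed_form_input(n):
            return False
        if output_type_ops(m) != closed_form_output(n):
            return False
    return True
```

Calling `substitution_holds()` confirms that the per-step counts and the closed forms agree term by term, with no constant left over.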
Consider the product $H = FG$ of two square $N \times N$ matrices. Let $N$ be an integer multiple of 3. Split the columns of $F$ and the rows of $G$ into three equal blocks each, that is,
\[ F = [\,A \;\; X \;\; U\,], \qquad G = \begin{bmatrix} B \\ Y \\ V \end{bmatrix}, \]
where $A, X, U$ and $B, Y, V$ have the sizes $N \times N/3$ and $N/3 \times N$, respectively. Then, by computing
\[ C = AB, \qquad Z = XY, \qquad W = UV, \]
one obtains the required product as
\[ H = C + Z + W, \]
and the problem is thus reduced to a triple of disjoint matrix multiplications, followed by a pair of $N \times N$ matrix additions. We keep working memory as small as in Section 3.4, by accumulating all three products simultaneously in the course of calculations. Indeed, as one can see from the pseudo-code below, after adding the bordering block rows and columns to the input and output matrices, all the subsequent computations can be performed in-place again. Write as above
\[ F_{pm+i,\,qm+j} = F^{pq}_{ij}, \qquad G_{qm+j,\,rm+k} = G^{qr}_{jk}, \qquad H_{pm+i,\,rm+k} = H^{pr}_{ik}, \]
and summarize the main part of the algorithm (performed after completing the bordering) as follows:

Step 1:
do p = 0, 1; q = 0, 1:
  do i = 1, ..., m; j = 1, ..., m:
    H^{pq}_{ij} := 0
  end do
end do

Step 2:
do p = 0, 1; q = 0, 1; r = 0, 1:
  do i = 1, ..., m; j = 1, ..., m:
    if (i<m or j<m) H^{00}_{mm} := F^{pq}_{ij} G^{rp}_{2m+i,j}
    if (i<m and j<m) H^{rq}_{ji} := H^{rq}_{ji} - (-1)^{q+r} H^{00}_{mm}
    if (i<m) H^{pr}_{im} := H^{pr}_{im} + H^{00}_{mm}
    if (j<m) H^{qp}_{mj} := H^{qp}_{mj} + (-1)^{p+r} H^{00}_{mm}
  end do
  do j = 1, ..., m; k = 1, ..., m:
    if (j<m or k<m) H^{00}_{mm} := F^{rp}_{j,m+k} G^{qr}_{jk}
    if (j<m and k<m) H^{qp}_{kj} := H^{qp}_{kj} - (-1)^{p+q} H^{00}_{mm}
    if (j<m) H^{rq}_{jm} := H^{rq}_{jm} + H^{00}_{mm}
    if (k<m) H^{pr}_{mk} := H^{pr}_{mk} + (-1)^{q+r} H^{00}_{mm}
  end do
  do k = 1, ..., m; i = 1, ..., m:
    if (i<m or k<m) H^{00}_{mm} := F^{qr}_{k,2m+i} G^{pq}_{m+k,i}
    if (i<m and k<m) H^{pr}_{ik} := H^{pr}_{ik} - (-1)^{q+r} H^{00}_{mm}
    if (k<m) H^{qp}_{km} := H^{qp}_{km} + H^{00}_{mm}
    if (i<m) H^{rq}_{mi} := H^{rq}_{mi} + (-1)^{p+q} H^{00}_{mm}
  end do
end do

Step 3:
do p = 0, 1; q = 0, 1:
  do i = 1, ..., m-1; j = 1, ..., m-1:
    H^{pq}_{ij} := m H^{pq}_{ij}
  end do
end do

Step 4:
do p = 0, 1; q = 0, 1; r = 0, 1:
  do i = 1, ..., m; j = 1, ..., m; k = 1, ..., m:
    if ((i<m, j<m) or (j<m, k<m) or (k<m, i<m)) then
      H^{01}_{mm} := (-1)^r F^{pq}_{ij} + (-1)^q F^{rp}_{j,m+k} + (-1)^p F^{qr}_{k,2m+i}
      H^{10}_{mm} := G^{qr}_{jk} + G^{pq}_{m+k,i} + G^{rp}_{2m+i,j}
      H^{00}_{mm} := H^{01}_{mm} H^{10}_{mm}
    end if
    if (i<m and k<m) H^{pr}_{ik} := H^{pr}_{ik} + (-1)^r H^{00}_{mm}
    if (i<m and j<m) H^{rq}_{ji} := H^{rq}_{ji} + (-1)^q H^{00}_{mm}
    if (j<m and k<m) H^{qp}_{kj} := H^{qp}_{kj} + (-1)^p H^{00}_{mm}
  end do
end do

Step 5:
do p = 0, 1; q = 0, 1:
  do i = 1, ..., m-1; j = 1, ..., m-1:
    H^{pq}_{ij} := H^{pq}_{ij} - H^{pq}_{im} - H^{pq}_{mj}
  end do
end do

Fortunately, the algorithm for a single MM appears to be even more compact than the generic Three Disjoint Product procedure. Here we use $4m^2$ redundant additions due to the simplistic initialization at Step 1 above, but we save many scalar multiplications by $m$, performing them just once at Step 3. The latter algorithm actually presents a procedure for multiplying an $n \times 3n$ matrix $F$ by a $3n \times n$ matrix $G$ and requires $\mu(n) = n^3 + 12n^2 + 24n$ bilinear multiplications (the same as above) and $\tilde\alpha_1(n) = 4n^3 + 39n^2 - 18n$ and $\tilde\alpha_2(n) = 3n^3 + 27n^2 + 9n$ linear operations performed on input-type and output-type blocks, respectively. The above formulas are used in the next section to estimate the complexity for the starting level of recursion in the PK3 algorithm.
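The reduction of a single product to three disjoint products can be illustrated independently of the aggregation scheme itself. The Python sketch below splits the columns of $F$ and the rows of $G$ into three equal blocks and sums the three block products. As an assumption made for brevity, the three disjoint products are computed with the standard algorithm rather than with the 3-Procedure, and all function names are hypothetical.

```python
def matmul(P, Q):
    """Standard product of two matrices given as lists of rows."""
    inner, cols = len(Q), len(Q[0])
    return [[sum(row[t] * Q[t][j] for t in range(inner)) for j in range(cols)]
            for row in P]


def split_columns(F):
    """Split the columns of F into three equal blocks A, X, U."""
    w = len(F[0]) // 3
    return [[row[b * w:(b + 1) * w] for row in F] for b in range(3)]


def split_rows(G):
    """Split the rows of G into three equal blocks B, Y, V."""
    h = len(G) // 3
    return [G[b * h:(b + 1) * h] for b in range(3)]


def add(P, Q):
    """Entrywise sum of two matrices of equal size."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def product_via_three_blocks(F, G):
    """H = AB + XY + UV, with N assumed to be a multiple of 3."""
    A, X, U = split_columns(F)
    B, Y, V = split_rows(G)
    C, Z, W = matmul(A, B), matmul(X, Y), matmul(U, V)
    return add(add(C, Z), W)


# Small 6 x 6 integer example.
F_EXAMPLE = [[(3 * i + 5 * j) % 7 - 3 for j in range(6)] for i in range(6)]
G_EXAMPLE = [[(2 * i - j) % 5 - 2 for j in range(6)] for i in range(6)]
```

In the paper's algorithm the three products are accumulated simultaneously in the bordered output matrix; the sketch only shows why the sum of the three block products equals $FG$.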
Note that in the above algorithm, the preprocessing stage of Section 3 is made separately for every $n \times n$ block, a triple of which composes $F$ or $G$. Similar to Section 3.4, the working memory volume for the above procedure is bounded by the total amount of the bordering blocks introduced at the preprocessing stage, i.e. $((n + 2)^2 - n^2)/n^2 = (4n + 4)/n^2$ times the memory occupied by $A$, $B$, and $C$. When the recursive base $n$ algorithm is applied (see the next subsection), the above quantity should be multiplied by $1 + n^{-2} + \cdots + n^{-2k+2} \le n^2/(n^2 - 1)$. For instance, in the case of multiplying $N \times N$ matrices ($C = A \cdot B$, $N = n^k l$), the working memory volume for the PK3 method is estimated as
\[ W_{PK3} \le \frac{12}{n-1}\, N^2, \]
while the Winograd method requires [12]
\[ W_{SW} \approx \frac{2}{3}\, N^2 \]
workspace. If one takes, e.g., $n = 48$, then the workspace for the Winograd method appears to be more than 2.6 times larger than that required in the PK3 method.

3.7. Recursive algorithm and its best-case performance

Let the (block) sizes of all these matrices be $n \times n$. This corresponds to the assumption that $N = n^k l$, where $n = 6k$ (as was assumed earlier) and $l$ is an integer multiple of 3, so each of the matrices $A, X, U$ and $B, Y, V$ is partitioned as a square $n \times n$ block matrix composed of $l \times l/3$ and $l/3 \times l$ submatrices, respectively ($l = N/n$). Hence, the above recursion scheme readily applies. Noting that the recursive 3-Procedure and the corresponding PK3 method for square matrix multiplication differ only in their initialization stage, one can formally write
\[ T_{PK3}(N) = -\frac{3n + 3}{n}\, N^2 + T_3(N). \]
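Returning briefly to the workspace comparison that closes Section 3.6, the factor "more than 2.6" for $n = 48$ follows from exact rational arithmetic. The sketch below multiplies the bordering fraction by the bound on the geometric series and by 3 (the matrices $A$, $B$, $C$ together occupy $3N^2$ entries); the function names are illustrative only.

```python
from fractions import Fraction


def pk3_workspace_factor(n):
    """Coefficient of N^2 in the PK3 workspace bound, i.e. 12 / (n - 1)."""
    bordering = Fraction(4 * n + 4, n * n)      # ((n + 2)^2 - n^2) / n^2
    series_bound = Fraction(n * n, n * n - 1)   # bound on 1 + n^-2 + ...
    return 3 * bordering * series_bound         # A, B and C occupy 3 N^2


def winograd_workspace_factor():
    """Coefficient of N^2 quoted in the text for the Winograd method."""
    return Fraction(2, 3)


def workspace_ratio(n):
    """How many times larger the Winograd workspace is than the PK3 bound."""
    return winograd_workspace_factor() / pk3_workspace_factor(n)
```

For $n = 48$ the ratio is $47/18 \approx 2.61$, in agreement with the figure stated in the text.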
The input-type and output-type linear operations take $(N/n)^2/3$ and $(N/n)^2$ flops, respectively, so the 3-Procedure (i.e., three disjoint products of $N \times N/3$ by $N/3 \times N$ matrices) uses
\[ T_3(N) = \left( \frac{\alpha_1(n)}{3} + \alpha_2(n) \right) \frac{N^2}{n^2} + \frac{\mu(n)}{3}\, T_3(N/n) = \frac{13n^3 + 129n^2 + 18n}{3}\, \frac{N^2}{n^2} + \frac{n^3 + 12n^2 + 24n}{3}\, T_3(N/n) \]
flops, and
\[ T_3(l) = 2l^3 - 3l^2, \qquad l \ll N. \]
Applying now the FMM recursion Lemma, one obtains
\[ T_3(N) = \left( 2l^{3-\omega} + \frac{10n^2 + 102n - 54}{n^2 + 9n + 24}\, l^{2-\omega} \right) N^{\omega} - \frac{13n^2 + 129n + 18}{n^2 + 9n + 24}\, N^2, \]
and therefore,
\[ T_{PK3}(N) = \left( 2l^{3-\omega} + \frac{10n^2 + 102n - 54}{n^2 + 9n + 24}\, l^{2-\omega} \right) N^{\omega} - \frac{16n^3 + 159n^2 + 117n + 72}{n^3 + 9n^2 + 24n}\, N^2, \]
where
\[ \omega = \frac{\log\left( \frac{n^3}{3} + 4n^2 + 8n \right)}{\log n}. \]
To minimize $\omega$, set $n = 48$ to obtain
\[ T_{PK3}(N) = \left( 2l^{3-\omega} + \frac{4647}{460}\, l^{2-\omega} \right) N^{\omega} - \frac{29743}{1840}\, N^2. \]
With optimum $l = 18$, this yields
\[ T_{PK3}(N) \le 4.894\, N^{2.7760} - 16.165\, N^2. \]
With respect to the total operations count, the above PK3 algorithm is quite competitive with Strassen's, for which
\[ T_S(N) \approx 3.895\, N^{2.80736} - 6N^2, \]
and even with Winograd's one, which has
\[ T_{SW}(N) \approx 3.732\, N^{2.80736} - 5N^2. \]
For the reasons quoted above in Remark 1, we would refrain from a direct comparison of Strassen-type methods and the PK3 algorithm based on the above best-case operation counts. With respect to the running time, the actual cross-over points for these methods will mostly depend on implementation details and computational platform rather than on their operation counts. (Of course, for sufficiently large values of $N$ the PK3 algorithm will always run faster due to its smaller exponent $\omega = 2.7760$.) For certain regular (but "non-optimal") matrix sizes $N$ and cut-off parameters $l$, one can find the values of $T_{SW}(N)$ and $T_{PK31}(N)$, e.g., in Table 5 below, see Section 5.

3.8. Cross-over point between PK and SW algorithms

It appears that the total operation count for FMM algorithms based on 2- and 3-Procedures is comparable with that of the Winograd algorithm for square matrices of the order $500 < N < 4640$. For larger orders, the new algorithms are slightly better, at least for $4641 \le N < 200{,}000$ with just a few marginal exceptions near $N = 33{,}000$. The numerical comparison was performed as follows. For an arbitrary $N$, the operation count for the Winograd algorithm was estimated as
\[ T_{SW}(N) = \min\left( T_{SW}^{\mathrm{stat\ padd}},\; T_{SW}^{\mathrm{dyn\ padd}},\; T_{SW}^{\mathrm{stat\ peel}},\; T_{SW}^{\mathrm{dyn\ peel}} \right), \]
where
• stat padd denotes the odd-size fix-up by "static padding", i.e., by embedding the original $N \times N$ matrices into matrices of the (generally larger) size $N_+ = 2^k l$ with
subsequent application of the Winograd algorithm. This approach was historically the first [25]. In our calculations, the values of $k$ and $l$ delivering the minimum total operation count were obtained through the exhaustive search.
• dyn padd denotes the odd-size fix-up by "dynamic padding", i.e., at each recursive step one increases, if necessary, the matrix size only by one to make it even. Then the Winograd recursion is applied, and the recursion stops when the operation count attains its minimum. The method of row/column duplication [12] is described in a different way but yields the same operation count.
• stat peel denotes the odd-size fix-up by "static peeling", i.e., splitting the original $N \times N$ matrices into $2 \times 2$ block form with the upper left block of the size $N_1 = 2^k l$ and subsequent application of the Winograd algorithm for the multiplication of such blocks. The rest of the calculations involving rectangular blocks is performed using the standard MM algorithm. The values of $k$ and $l$ delivering the minimum total operation count were obtained through the exhaustive search.
• dyn peel denotes the odd-size fix-up by "dynamic peeling", i.e., at each recursive step one splits, if necessary, the matrix into $2 \times 2$ block form with $1 \times 1$ right lower blocks to make the size $N - 1$ of the left upper block even. Then the Winograd recursion is applied for left upper blocks, while the arising matrix-vector and vector-vector operations are performed by the standard algorithm. The recursion stops when the operation count attains its minimum.

While static peeling appears to be always worse than dynamic peeling, there is no clear loser among the remaining three algorithms. On average, dynamic peeling requires a smaller number of operations (by up to 20%) than the padding algorithms in ≈ 85% of cases.
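The "dynamic padding" count described in the list above can be sketched as a short recursion. The sketch rests on two assumptions that are not spelled out in this section: the standard algorithm is charged $2N^3 - N^2$ operations for an $N \times N$ product, and one Winograd step costs 7 half-size products plus 15 half-size matrix additions. The function names are hypothetical.

```python
from functools import lru_cache


def standard_count(n):
    """Operation count assumed for the standard product: n^3 mults, n^3 - n^2 adds."""
    return 2 * n**3 - n**2


@lru_cache(maxsize=None)
def winograd_dynamic_padding(n):
    """Operation count with dynamic padding and an optimal stopping rule."""
    direct = standard_count(n)
    if n <= 1:
        return direct
    half = (n + 1) // 2          # pad by one row/column when n is odd
    recursive = 7 * winograd_dynamic_padding(half) + 15 * half**2
    # The recursion "stops when the operation count attains its minimum".
    return min(direct, recursive)
```

Under these assumptions the recursion switches to the standard algorithm at size 8, so that, e.g., a $16 \times 16$ product is charged $7 \cdot 960 + 15 \cdot 64 = 7680$ operations, which agrees with $3.732\,N^{2.80736} - 5N^2$ at $N = 16$ up to rounding of the constant.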
The operation count for the PK2/PK3 algorithm was estimated as
\[ T_{PK}(N) = \min\left( T_{PK22}^{\mathrm{stat\ padd}},\; T_{PK22}^{\mathrm{stat\ peel}},\; T_{PK31/21}^{\mathrm{stat\ padd}},\; T_{PK31/21}^{\mathrm{stat\ peel}},\; T_{PK32}^{\mathrm{stat\ padd}},\; T_{PK32}^{\mathrm{stat\ peel}} \right), \]
where
• stat padd and stat peel denote the same approaches to the odd-size fix-up as above but with the regular problem size $N = nml$ instead of $N = 2^k l$.
• PK22 denotes the two-level MM algorithm which uses the algorithms of Sections 2.2, 2.1, and the standard procedure at its outer, middle, and inner recursion levels, respectively. The total operation count for the regular case $N = nml$, with $n$ and $l$ even, is
\[ T_{PK22}(nml) = \frac{3n^2 + 8n - 6}{n}\, N^2 + \frac{n^2 + 3n - 1}{n} \times \left( \frac{m^2 + 2m - 2}{m}\, N^2 + \frac{m + 3}{2mn}\, N^3 \right). \]
• PK31/21 denotes the two-level MM algorithm which uses the algorithms of Sections 3.6, 2.1, and the standard procedure at its outer, middle, and inner recursion levels, respectively. The total operation count for the regular case $N = nml$,
with $n$ even and $l$ divisible by 3, is
\[ T_{PK31/21}(nml) = \frac{13n^2 + 120n + 9}{3n}\, N^2 + \frac{n^2 + 12n + 24}{n} \times \left( \frac{5m^2 + 9m - 10}{6m}\, N^2 + \frac{m + 3}{2mn}\, N^3 \right). \]
• PK32 denotes the two-level MM algorithm which uses the algorithms of Sections 3.6, 3.4, and the standard procedure at its outer, middle, and inner recursion levels, respectively. The total operation count for the regular case $N = nml$, with $n$ divisible by 6, $m$ even, and $l$ divisible by 3, is
\[ T_{PK32}(nml) = \frac{13n^2 + 120n + 9}{3n}\, N^2 + \frac{n^2 + 12n + 24}{3n} \times \left( \frac{12m^2 + 117m - 6}{3m}\, N^2 + \frac{m^2 + 12m + 24}{3m^2 n}\, N^3 \right). \]
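The exhaustive parameter searches used for these counts can be illustrated on the simpler best-case estimate of Section 3.7, whose constants are all stated in the text. The Python sketch below is not the search used for the two-level formulas listed here. It scans $n$ over multiples of 6 and $l$ over multiples of 3 (the divisibility assumptions of Section 3.7) and recomputes $\omega$, the leading constant and the coefficient of $N^2$; the function names and search ranges are illustrative assumptions.

```python
import math


def omega(n):
    """Exponent of the one-level PK3 recursion for base size n."""
    return math.log(n**3 / 3 + 4 * n**2 + 8 * n) / math.log(n)


def leading_constant(n, l):
    """Coefficient of N^omega in T_PK3(N) for base size n and cut-off l."""
    w = omega(n)
    ratio = (10 * n**2 + 102 * n - 54) / (n**2 + 9 * n + 24)
    return 2 * l**(3 - w) + ratio * l**(2 - w)


def quadratic_constant(n):
    """Coefficient of -N^2 in T_PK3(N) for base size n."""
    return (16 * n**3 + 159 * n**2 + 117 * n + 72) / (n**3 + 9 * n**2 + 24 * n)


# Exhaustive search over the admissible values (ranges chosen for illustration).
BEST_N = min(range(6, 301, 6), key=omega)
BEST_L = min(range(3, 301, 3), key=lambda l: leading_constant(BEST_N, l))
```

With these ranges the search returns $n = 48$ and $l = 18$, and the three functions reproduce $\omega \approx 2.7760$, a leading constant just below 4.894, and the $N^2$ coefficient $29743/1840 \approx 16.165$.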
In the above algorithms, the values of $n$, $m$ and $l$ for which the total operation count is minimum were obtained through the exhaustive search. The values of $N_{beg}$, $N_{end}$ and
\[ K = |\{N : N_{beg} \le N \le N_{end},\; T_{SW}/T_{PK} > 1\}|, \qquad R_{min} = \min_{N_{beg} \le N \le N_{end}} T_{SW}/T_{PK}, \qquad R_{max} = \max_{N_{beg} \le N \le N_{end}} T_{SW}/T_{PK} \]
are given in Table 3. These data confirm that the new algorithms are quite competitive with the Winograd algorithm with respect to the total operation count. Of course, the above two-level algorithms are efficient only for limited values of $N$. For instance, the obvious PK32/31 or PK33 three-level procedures should be tried as $N$ approaches 200,000.

Remark 6. The multiplicative constant in $T_{PK3}(N)$ becomes somewhat smaller than 4.894 when the algorithm of Section 2.1 is employed instead of the standard MM at the lowest level $l$. The one-level procedures PK21 and PK31 can be readily implemented in codes running at high Mflops rate in the range $1000 \le N \le 10{,}000$, which may not be the case for the above described two-level procedures. This explains the choice of algorithms for numerical testing in Section 5.

3.9. Estimating numerical stability of the 3-Procedure

As we show in Section 5, the presented matrix multiplication algorithm (similar to the one in [18]) demonstrates very good numerical stability due to the structural advantage given by the "long base" recursions. This is an essential property of the algorithms based on the schemes in [19,20,21], whereas the Strassen type algorithms use "base two" recursions and therefore are much less numerically stable. The techniques for the estimation of stability of MM algorithms can be found in [10,13,14]. The
Table 3
The ratio R = T_SW(N)/T_PK(N)

N_beg     N_end     K        R_min   R_max
500       999       1        0.905   1.003
1000      1999      24       0.941   1.012
2000      2999      131      0.943   1.022
3000      3999      822      0.975   1.051
4000      4999      976      0.991   1.056
5000      5999      1000     1.008   1.067
6000      6999      1000     1.013   1.086
7000      7999      1000     1.024   1.078
8000      8999      1000     1.021   1.078
9000      9999      1000     1.021   1.083
10,000    14,999    5000     1.018   1.088
15,000    19,999    5000     1.018   1.076
20,000    24,999    5000     1.015   1.065
25,000    29,999    5000     1.003   1.062
30,000    31,999    2000     1.006   1.064
32,000    32,999    989      0.999   1.044
33,000    34,999    2000     1.006   1.059
35,000    39,999    5000     1.006   1.066
40,000    49,999    10,000   1.006   1.074
50,000    59,999    10,000   1.017   1.091
60,000    69,999    10,000   1.023   1.084
70,000    79,999    10,000   1.015   1.081
80,000    89,999    10,000   1.025   1.079
90,000    99,999    10,000   1.028   1.082
100,000   109,999   10,000   1.037   1.084
110,000   119,999   10,000   1.022   1.085
120,000   129,999   10,000   1.028   1.075
130,000   139,999   10,000   1.015   1.069
140,000   159,999   20,000   1.015   1.066
160,000   199,999   40,000   1.012   1.063
general approach to theoretical estimation of the error growth factor for the floating-point implementation of such algorithms can be found in [6], where the whole class of Strassen-like algorithms was analyzed. Using the standard techniques [6,13,14] for estimating the numerical error growth for the 3-Procedure, one can obtain the following result (somewhat similar to that presented for the 2-Procedure in [18]). If we denote by $\varepsilon$ the machine tolerance (usually near $10^{-15}$ and $10^{-7}$ in double and single precision, respectively) and use the matrix norm
\[ \|S\| = \max_{i,j} |(S)_{i,j}|, \]
then the error in the floating point implementation of the 3-Procedure applied to a triple of $N \times N/3$ by $N/3 \times N$ products $C^0 = A^0 B^0$, $W^0 = U^0 V^0$, $Z^0 = X^0 Y^0$ with $N = n^{k-1} l$,
$l \le n$, and $n \ge 6$, satisfies the bound
\[ \|fl([C^0|W^0|Z^0]) - [C^0|W^0|Z^0]\| \le O\!\left( N^{\frac{\log(3n^2 + 24n - 52)}{\log n}} \right) \varepsilon\, \|[A^0|U^0|X^0]\|\, \|[B^0|V^0|Y^0]\| + O(\varepsilon^2). \]
Here and hereafter, $[C^0|W^0|Z^0]$ denotes the $N \times N$ matrix having $1 \times 3$ block structure, etc. The sketch of the proof is as follows. (We are trying to be as close as possible to the analysis of Strassen's algorithm in [13,14].) The floating point model of scalar additions/subtractions and multiplications is
\[ fl(a \pm b) = a(1 + \alpha) \pm b(1 + \beta), \qquad fl(ab) = ab(1 + \gamma), \]
where $|\alpha|, |\beta|, |\gamma| \le \varepsilon$. Hereafter, we will use the notation
\[ \Delta(S) = fl(S) - S. \]
Together with the simple estimate $\|\Delta(S_1 + S_2)\| \le 2\varepsilon \|[S_1|S_2]\|$, we use its general form
\[ \left\| \Delta\!\left( \sum_{i=1}^{J} S_i \right) \right\| \le \frac{J^2 + J - 2}{2}\, \varepsilon\, \|[S_1| \ldots |S_J]\| + O(\varepsilon^2), \]
valid for arbitrary matrices $S_1, \ldots, S_J$, as well as the error bound for the standard algorithm applied to the product $PQ$ of an $I \times J$ matrix $P$ by a $J \times K$ matrix $Q$:
\[ \|\Delta(PQ)\| \le \frac{J^2 + 3J - 2}{2}\, \varepsilon\, \|P\|\, \|Q\| + O(\varepsilon^2). \]
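The bound for the standard algorithm can be probed numerically. The Python sketch below multiplies two small random matrices in double precision with sequential accumulation, computes the exact product with rational arithmetic, and compares the entrywise error with $\frac{J^2 + 3J - 2}{2}\,\varepsilon\,\|P\|\,\|Q\|$. Taking $\varepsilon = 2^{-53}$ (the unit roundoff of IEEE double precision) is an assumption of this sketch, as are the function names and the test matrices.

```python
from fractions import Fraction
import random

EPS = Fraction(1, 2**53)  # assumed machine tolerance: unit roundoff of binary64


def max_norm(S):
    """Largest absolute value of an entry, computed exactly."""
    return max(abs(Fraction(x)) for row in S for x in row)


def product_float(P, Q):
    """Standard algorithm in floating point, summing the J terms left to right."""
    J, K = len(Q), len(Q[0])
    result = []
    for row in P:
        out_row = []
        for k in range(K):
            acc = row[0] * Q[0][k]
            for j in range(1, J):
                acc = acc + row[j] * Q[j][k]
            out_row.append(acc)
        result.append(out_row)
    return result


def product_exact(P, Q):
    """The same product in exact rational arithmetic."""
    J, K = len(Q), len(Q[0])
    return [[sum(Fraction(row[j]) * Fraction(Q[j][k]) for j in range(J))
             for k in range(K)] for row in P]


def error_and_bound(P, Q):
    """Return (max entrywise error, first-order bound from the text)."""
    J = len(Q)
    computed, exact = product_float(P, Q), product_exact(P, Q)
    error = max(abs(Fraction(c) - e)
                for crow, erow in zip(computed, exact)
                for c, e in zip(crow, erow))
    bound = Fraction(J * J + 3 * J - 2, 2) * EPS * max_norm(P) * max_norm(Q)
    return error, bound


random.seed(0)
P_TEST = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
Q_TEST = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
```

For such random data the observed error is typically far below the bound, which is a worst-case estimate over all roundings.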
The latter, taken with $I = K = l$, $J = l/3$, yields
\[ \|\Delta([A^0 B^0|U^0 V^0|X^0 Y^0])\| \le \varphi(l)\,\varepsilon \]

…nable irreducible component of the solution variety. In fact, this algorithm outputs information on some subvariety of $V(F)$ and we wanted to compute information concerning irreducibility. This can also be done by means of a factoring procedure adapted to straight-line program encoding of integers (cf. [5]). We also exhibit the following theorem:

Theorem 3. There is a bounded error probability Turing machine $M$ that performs the following task: The input of machine $M$ is
(1) A straight-line program of size $L$, depth $\ell$ and parameters in $\mathbb{Z}$ of bounded bit length at most $\log H$ that evaluates a list of polynomials $F := [f_1, \ldots, f_n] \in \mathbb{Z}[X_1, \ldots, X_n]^n$ such that $F$ is a generalised Pham system.
The output of $M$ is a Kronecker's encoding of the residue class field of some zero $\zeta \in V(F)$. The running time of $M$ is polynomial in the following quantities:
\[ L,\; d,\; n,\; \log H,\; \mathrm{def\,deg}(F),\; ht(\zeta), \]
where $d$ is the maximum of the degrees of the polynomials in $F$ and $ht(\zeta)$ is the height of the residue class field of the point $\zeta \in \mathbb{C}^n$, whose coordinates are algebraic over $\mathbb{Q}$.
L.M. Pardo, J. San Martín / Theoretical Computer Science 315 (2004) 593–625
2. Basic notions and notations

2.1. Kronecker's encoding

A $\mathbb{Q}$-definable algebraic variety $V \subseteq \mathbb{C}^n$ is the set of common zeros of a finite set of polynomial equations with coefficients in the field $\mathbb{Q}$. Namely, $V \subseteq \mathbb{C}^n$ is a $\mathbb{Q}$-definable algebraic variety if there are polynomials $F := [f_1, \ldots, f_s] \in \mathbb{Q}[X_1, \ldots, X_n]^s$ such that $V = V(F)$. The class of all $\mathbb{Q}$-definable algebraic varieties defines a unique Noetherian Zariski topology in $\mathbb{C}^n$. This Noetherian topology has the corresponding notion of irreducible closed sets, which we call $\mathbb{Q}$-definable irreducible algebraic subsets of $\mathbb{C}^n$. Additionally, every $\mathbb{Q}$-definable algebraic variety $V \subseteq \mathbb{C}^n$ has a unique minimal description as a finite union of $\mathbb{Q}$-definable irreducible algebraic varieties $V = V_1 \cup \cdots \cup V_t \subseteq \mathbb{C}^n$. These $\mathbb{Q}$-definable irreducible varieties $V_1, \ldots, V_t$ are called the $\mathbb{Q}$-irreducible components of $V$. The $\mathbb{C}$-irreducible components of $V$ are simply called irreducible components. Observe that if $W$ is an irreducible component of $V$, then there is a $\mathbb{Q}$-irreducible component $W_{\mathbb{Q}}$ of $V$ such that $W \subseteq W_{\mathbb{Q}}$.

A $\mathbb{Q}$-definable algebraic variety $V \subseteq \mathbb{C}^n$ is said to be a $\mathbb{Q}$-definable complete intersection of codimension $r$ if there are polynomials $F := [f_1, \ldots, f_r] \in \mathbb{Q}[X_1, \ldots, X_n]^r$ such that $V = V(F)$ and $\dim V = n - r$. Observe that from Macaulay's Unmixedness Theorem (cf. [36]), if $V \subseteq \mathbb{C}^n$ is a $\mathbb{Q}$-definable complete intersection variety of codimension $r$, all the $\mathbb{Q}$-irreducible components of $V$ also have dimension $n - r$.

In [33], L. Kronecker introduced a notion of description of equidimensional algebraic varieties that for the sake of readability we reproduce here. This notion has been extensively used in the sequence of papers [12–15,17,18,21,24,26]. Let $V \subseteq \mathbb{C}^n$ be an equidimensional $\mathbb{Q}$-definable algebraic variety of dimension $n - r$. From Noether's Normalisation Lemma, there are generically many non-singular matrices $N \in GL(n, \mathbb{Q})$ such that the following holds: Let $(Y_1, \ldots, Y_n)$ be the new coordinates of the affine space $\mathbb{C}^n$ defined by $N$.
Then, the following is an integral ring extension:
\[ A := \mathbb{Q}[Y_1, \ldots, Y_{n-r}] \hookrightarrow \mathbb{Q}[V] := \mathbb{Q}[X_1, \ldots, X_n]/I(V). \]
We say that the variables $(Y_1, \ldots, Y_n)$ defined by $N$ are in Noether position with respect to the variety $V$. Observe that if $(Y_1, \ldots, Y_n)$ are in Noether position with respect to an equidimensional algebraic variety $V \subseteq \mathbb{C}^n$ and if $W$ is a $\mathbb{Q}$-irreducible component of $V$, then the variables $(Y_1, \ldots, Y_n)$ are also in Noether position with respect to $W$.

Moreover, let $V \subseteq \mathbb{C}^n$ be a $\mathbb{Q}$-definable complete intersection variety of codimension $r$. Let $F := [f_1, \ldots, f_r] \in \mathbb{Q}[X_1, \ldots, X_n]^r$ be a system of polynomial equations defining the variety $V$ (i.e. $V(F) = V$). Let $(F)$ be the ideal in $\mathbb{Q}[X_1, \ldots, X_n]$ generated by $\{f_1, \ldots, f_r\}$ and assume that $(F)$ is a radical ideal. Let $N \in GL(n, \mathbb{Q})$ be a non-singular matrix that puts the variables in Noether position with respect to the variety $V$. Then, the ring extension
\[ A := \mathbb{Q}[Y_1, \ldots, Y_{n-r}] \hookrightarrow B := \mathbb{Q}[X_1, \ldots, X_n]/(F) \]
is integral. Because of Macaulay's Unmixedness Theorem, we conclude that $B$ is a Cohen–Macaulay ring and, from [16, Lemma 3.3.1], we also know that $B$ is a free $A$-module of positive rank.
Let $V \subseteq \mathbb{C}^n$ be a $\mathbb{Q}$-definable equidimensional algebraic variety of codimension $r$ and let $N \in GL(n, \mathbb{Q})$ be a non-singular matrix that puts the variables in Noether position with respect to $V$. We denote by $(Y_1, \ldots, Y_n)$ the set of coordinates in $\mathbb{C}^n$ given by $N$. Let $u \in \mathbb{Q}[Y_1, \ldots, Y_n]$ be a polynomial. We define the regular mapping $\varphi_u : \mathbb{C}^n \to \mathbb{C}^{n-r+1}$, depending on $N$ and $u$, as the mapping given by the following identity:
\[ \varphi_u(x_1, \ldots, x_n) := (y_1, \ldots, y_{n-r}, u(x_1, \ldots, x_n)). \]
Let $\varphi_u|_V : V \to \mathbb{C}^{n-r+1}$ be the restriction of $\varphi_u$ to the algebraic variety $V$. The image of $\varphi_u|_V$ (i.e. $\varphi_u(V)$) is a $\mathbb{Q}$-definable hypersurface $H_u$ of $\mathbb{C}^{n-r+1}$. Let $m_u \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}][Z]$ be the minimal polynomial equation of the hypersurface $H_u$. The polynomial $m_u$ is a square-free, primitive polynomial, monic with respect to the variable $Z$ (up to a non-zero integer). We say that $u$ is a primitive element with respect to the variety $V$ if $\varphi_u|_V$ defines a birational isomorphism between $V$ and $H_u$. In this case, there are polynomials:
• $\rho \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}] \setminus \{0\}$,
• $v_1, \ldots, v_n \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}, Z]$,
such that the rational mapping $(\varphi_u|_V)^{-1} : H_u \to V$ is given by the following identity:
\[ (\varphi_u|_V)^{-1}(y_1, \ldots, y_{n-r}, z) := \left( \frac{v_1}{\rho}(y_1, \ldots, y_{n-r}, z), \ldots, \frac{v_n}{\rho}(y_1, \ldots, y_{n-r}, z) \right) \]
for every $(y_1, \ldots, y_{n-r}, z) \in H_u$ such that $\rho(y_1, \ldots, y_{n-r}) \ne 0$. The rational functions $\{v_i/\rho : 1 \le i \le n\}$ are called the parametrisations with respect to the Noether normalisation given by $N$ and the primitive element $u$. The non-zero polynomial $\rho$ is called a discriminant associated to $N$ and $u$.

Definition 4. Let $V \subseteq \mathbb{C}^n$ be a $\mathbb{Q}$-definable equidimensional algebraic variety of codimension $r$. A Kronecker's encoding of $V$ is given by the following sequence of items:
(1) A non-singular matrix $N \in GL(n, \mathbb{Z})$ that puts the variables in Noether position with respect to the variety $V$.
(2) A linear form $u := \lambda_1 X_1 + \cdots + \lambda_n X_n \in \mathbb{Z}[X_1, \ldots, X_n]$, which is a primitive element with respect to the Noether normalisation given by $N$ and with respect to the variety $V$.
(3) The minimal polynomial $m_u \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}][Z]$ of the hypersurface $H_u := \varphi_u(V)$.
(4) A non-zero discriminant $\rho \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}]$ associated to $N$ and $u$.
(5) The parametrisations $\{v_1, \ldots, v_n\} \subseteq \mathbb{Z}[Y_1, \ldots, Y_{n-r}][Z]$ associated to $N$, $u$, $V$ and $\rho$.

In [14,41], Kronecker's encoding and Kronecker's polynomial system solver were rediscovered without knowledge of their existing ancestor. In [12,15] the main difficulties in Kronecker's original approach were solved. For a $\mathbb{Q}$-definable complete intersection algebraic variety $V \subseteq \mathbb{C}^n$ of dimension $n - r$, let $u \in \mathbb{Q}[Y_{n-r+1}, \ldots, Y_n]$ be a primitive element of some Kronecker's encoding of $V$. Let $H_u \subseteq \mathbb{C}^{n-r+1}$ be the hypersurface introduced above with minimal polynomial $m_u$.
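To make the five items of Definition 4 concrete, here is a toy illustration in Python for a zero-dimensional variety ($n = r = 2$, so there are no free variables $Y_1, \ldots, Y_{n-r}$ and all data are univariate in $Z$). The variety $V = \{x_1^2 - 1 = 0,\ x_2 - 2x_1 = 0\}$, the dense coefficient lists, the class name and the constant discriminant 2 (chosen only to show the division by the discriminant) are all assumptions of this sketch; the paper stores such data as straight-line programs instead.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class ToyKroneckerEncoding:
    matrix: List[List[int]]            # item (1): change of coordinates
    linear_form: List[int]             # item (2): coefficients of u
    minimal_poly: List[int]            # item (3): m_u(Z), lowest degree first
    discriminant: int                  # item (4): a non-zero constant here
    parametrisations: List[List[int]]  # item (5): v_1(Z), ..., v_n(Z)


def eval_poly(coeffs, z):
    """Horner evaluation of a dense univariate polynomial."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def point_over_root(enc, z):
    """Apply the parametrisations v_i / discriminant at a root z of m_u."""
    return tuple(Fraction(eval_poly(v, z), enc.discriminant)
                 for v in enc.parametrisations)


def recovered_points(enc, candidates):
    """Points of V lying over those candidates that are roots of m_u."""
    return [point_over_root(enc, z) for z in candidates
            if eval_poly(enc.minimal_poly, z) == 0]


def equations(x1, x2):
    """The defining equations of the toy variety."""
    return (x1 * x1 - 1, x2 - 2 * x1)


ENCODING = ToyKroneckerEncoding(
    matrix=[[1, 0], [0, 1]],            # identity: variables already in position
    linear_form=[1, 0],                 # u = X_1
    minimal_poly=[-1, 0, 1],            # m_u = Z^2 - 1
    discriminant=2,
    parametrisations=[[0, 2], [0, 4]],  # v_1 = 2Z, v_2 = 4Z
)
```

Evaluating `recovered_points(ENCODING, [1, -1, 0])` returns the two points $(1, 2)$ and $(-1, -2)$ of $V$; the candidate 0 is discarded because it is not a root of $m_u$.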
Then, the $\mathbb{Q}$-irreducible components of $V$ are in one-to-one correspondence to those of $H_u$ and, hence, in one-to-one correspondence to the irreducible factors of $m_u$.

2.2. Geometric degree

For the sake of completeness, we summarise some basic facts concerning the geometric degree as introduced in [23] (cf. also [9,60] for alternative notions). Let $V \subseteq \mathbb{C}^n$ be a zero-dimensional variety; the geometric degree of $V$ is the number of points in $V$. If $V \subseteq \mathbb{C}^n$ is an equidimensional algebraic variety, the geometric degree of $V$ is the maximum of the degrees of the intersections of $V$ with affine linear varieties $H$ of dimension $\dim H = \mathrm{codim}\, V$ such that $V \cap H$ is zero-dimensional. In the general case, when $V \subseteq \mathbb{C}^n$ is not equidimensional, let $V = \bigcup_j C_j$ be an equidimensional decomposition of the variety $V$; we define the (geometric) degree of $V$ as $\deg V := \sum_j \deg C_j$. A key result due to [23] is Bézout's Inequality: given two algebraic varieties $V, V' \subseteq \mathbb{C}^n$, then $\deg(V \cap V') \le \deg V \deg V'$. For instance, given a system $F := [f_1, \ldots, f_r] \in \mathbb{Q}[X_1, \ldots, X_n]^r$ of polynomial equations defining a complete intersection variety $V(F) \subseteq \mathbb{C}^n$, we have $\deg V(F) \le \prod_{i=1}^{r} \deg f_i$, and this quantity $\prod_{i=1}^{r} \deg f_i$ is called the Bézout number of the system $F$. This last inequality is not always an equality; however, it is generically (i.e. up to a zero measure set of the space of polynomial equations of given degree) an equality. A consequence of Bézout's inequality above is the following proposition.

Proposition 5 (Sabia and Solernó [44]). Let $V \subseteq \mathbb{C}^n$ be a $\mathbb{Q}$-definable equidimensional algebraic variety. Assume that the variables are in Noether position with respect to $V$. Let $m_u$ be the minimal polynomial of the complex hypersurface $H_u \subseteq \mathbb{C}^{n-r+1}$. Then, $\deg m_u \le \deg V$. Moreover, the total degree of the discriminant $\rho$ and the total degree of the parametrisations $v_1, \ldots, v_n$ are also bounded by a quantity that depends polynomially on $\deg V$.

2.3. Straight-line programs

Our basic data structure to handle integer numbers and polynomials is the straight-line program. In this section, we state its definition and the model to codify Kronecker's encoding of algebraic varieties. For a more detailed treatment of straight-line programs as data structures, see [31,41,56] and the references therein.

Definition 6. A division-free non-scalar straight-line program with inputs $X_1, \ldots, X_n$ is a pair $\Gamma := (G, Q)$, where $G$ is a directed acyclic graph, with $n + 1$ input gates, and $Q$ is a function that assigns to every gate $(i, j)$ one of the following instructions:
\[ i = 0: \quad Q_{0,1} := 1,\; Q_{0,2} := X_1,\; \ldots,\; Q_{0,n+1} := X_n, \]
\[ Q_{i,j} := \left( \sum_{r \le i-1,\; 1 \le s \le L_r} A^{r,s}_{i,j}\, Q_{r,s} \right) \cdot \left( \sum_{r' \le i-1,\; 1 \le s' \le L_{r'}} B^{r',s'}_{i,j}\, Q_{r',s'} \right), \]
where $0 \le i \le \ell$ and $A^{r,s}_{i,j}$, $B^{r',s'}_{i,j}$ are indeterminates over $\mathbb{Z}$ called the parameters of $\Gamma$. The size of the straight-line program $\Gamma$ is $L(\Gamma) = L_0 + \cdots + L_\ell$ (where $L_0 := n + 1$), and its depth is $\ell(\Gamma) = \ell$.

We identify $A = (A^{r,s}_{i,j})$ and $B = (B^{r',s'}_{i,j})$. Semantically speaking, the straight-line program $\Gamma$ defines an evaluation algorithm of the polynomials
\[ Q_{i,j} = \sum_{|\mu| \le 2^i} Q^{\mu}_{i,j}(A, B)\, X_1^{\mu_1} \cdots X_n^{\mu_n}, \]
where each coefficient $Q^{\mu}_{i,j}(A, B)$ is a polynomial in $\mathbb{Z}[A, B]$.

A finite set of polynomials $f_1, \ldots, f_r \in \mathbb{Z}[X_1, \ldots, X_n]$ is said to be evaluated by a straight-line program $\Gamma$ with parameters in a set $\mathcal{F} \subset \mathbb{Z}$ if, specialising the coordinates of the parameters $A$ and $B$ in $\Gamma$ to values $a$, $b$ in $\mathcal{F}$, there exist gates $(i_1, j_1), \ldots, (i_r, j_r)$ of $\Gamma$ such that $f_k = Q_{i_k, j_k}(a, b, X_1, \ldots, X_n)$ holds for every $k$, $1 \le k \le r$. Specialising in the indicated way the parameters of $\Gamma$ into values of $\mathcal{F}$ we obtain a copy $\gamma$ of the directed acyclic graph $G$ underlying the straight-line program $\Gamma$ and of its instruction assignment $Q$. We call this copy $\gamma$ a straight-line program in $\mathbb{Z}[X_1, \ldots, X_n]$ with parameters in $\mathcal{F}$. The gates of $\gamma$ correspond to polynomials belonging to $\mathbb{Z}[X_1, \ldots, X_n]$. In this way $f_1, \ldots, f_r$ are represented, computed or evaluated by $\gamma$. We say that $f \in \mathbb{Z}[X_1, \ldots, X_n]$ is computable (or evaluated) by a straight-line program with parameters of height $h$ if the specialisation of $A$ and $B$ is done with integer numbers of height bounded by $h$. Finally, we can encode an integer number by a straight-line program: an integer is said to be computed by a straight-line program if it can be computed by a straight-line program when considered as an element of $\mathbb{Z}[X]$.

2.3.1. Straight-line program encoding for varieties

Here, we discuss how our Turing machines work with Kronecker's encoding of algebraic varieties. Let $V := V(F) \subseteq \mathbb{C}^n$ be a complete intersection algebraic variety of codimension $r$, where $F := [f_1, \ldots, f_r]$. Then, a Kronecker's encoding of $V$ is the list of items $[N, u, m_u, \rho, v_1, \ldots, v_n]$ satisfying the properties described in Definition 4 above. A mixed dense/straight-line program data structure of a Kronecker's encoding of $V$ is a straight-line program $\Gamma$ such that:
(0) $\Gamma$ evaluates $\{f_1, \ldots, f_r\}$.
(I) $\Gamma$ evaluates the integral entries of $N \in GL(n, \mathbb{Q})$.
(II) $\Gamma$ evaluates $u := \lambda_1 X_1 + \cdots + \lambda_n X_n \in \mathbb{Z}[X_1, \ldots, X_n]$.
(III) $\Gamma$ evaluates $m_u \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}][Z]$. This polynomial $m_u$ is encoded as a list of its coefficients with respect to the variable $Z$. The coefficients in $\mathbb{Z}[Y_1, \ldots, Y_{n-r}]$ are polynomials evaluated by $\Gamma$ in labelled nodes.
(IV) $\Gamma$ evaluates $\rho \in \mathbb{Z}[Y_1, \ldots, Y_{n-r}]$.
(V) $\Gamma$ evaluates $\{v_1, \ldots, v_n\} \subseteq \mathbb{Z}[Y_1, \ldots, Y_{n-r}][Z]$. Again, the $v_i$'s are encoded as the list of their coefficients in $\mathbb{Z}[Y_1, \ldots, Y_{n-r}]$, and $\Gamma$ evaluates these coefficients at labelled nodes.
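A small evaluator may help to fix the shape of the instructions in Definition 6: every non-input gate is the product of two integer linear combinations of earlier gates. In the Python sketch below, gates are numbered consecutively instead of by pairs $(i, j)$, and each gate is stored as two dictionaries mapping earlier gate indices to the specialised integer parameters; this flat representation and the names are assumptions made for illustration.

```python
def evaluate_slp(gates, point):
    """Evaluate a specialised non-scalar straight-line program at a point.

    values[0] is the constant 1 and values[1..n] are the inputs X_1..X_n,
    mirroring the input gates Q_{0,1}, ..., Q_{0,n+1}.
    """
    values = [1] + list(point)
    for left_params, right_params in gates:
        left = sum(c * values[k] for k, c in left_params.items())
        right = sum(c * values[k] for k, c in right_params.items())
        values.append(left * right)   # one non-scalar multiplication per gate
    return values


# Three gates computing (1 + X_1)^8 by repeated squaring.
POWER_GATES = [
    ({0: 1, 1: 1}, {0: 1, 1: 1}),   # gate 2: (1 + X_1) * (1 + X_1)
    ({2: 1}, {2: 1}),               # gate 3: gate 2 squared
    ({3: 1}, {3: 1}),               # gate 4: gate 3 squared
]

# Five gates computing the integer 2^16 from the constant 1 alone.
INTEGER_GATES = [({0: 2}, {0: 1})] + [({k: 1}, {k: 1}) for k in range(1, 5)]
```

With one input, `evaluate_slp(POWER_GATES, [2])[-1]` gives $3^8 = 6561$ using three multiplications, whereas the dense expansion of $(1 + X_1)^8$ has nine coefficients. `evaluate_slp(INTEGER_GATES, [])[-1]` gives $2^{16} = 65536$, illustrating how a short program can encode a large integer.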
2.4. Some preliminary subalgorithms to be used in the sequel 2.4.1. Elimination step The following statement is a consequence of the technical tools used in the series of papers [12–15,17,18,21,24,26,41]. Theorem 7. There is a bounded error probability Turing machine M1 that performs the following task: • The input of machine M1 is given by the following list of items: ◦ A Kronecker’s encoding of the Q-de=nable algebraic variety V . ◦ A polynomial g ∈ Z[X1 ; : : : ; Xn ] such that g is not a zero divisor in the residue ring Q[V ] and such that V ∩ V (g) = ∅. • The output of machine M1 is a Kronecker’s encoding of the Q-de=nable equidimensional algebraic variety V ∩ V (g). The input of machine M1 is represented in the following form: (1) A straight-line program 1 that codi=es a mixed dense/straight-line program representation of a Kronecker’s encoding of V. (2) The additional polynomial g is given by a non-scalar straight-line program 2 that evaluates g. The running time of M1 is at most polynomial in the quantities deg(V ); L; n; d, where L is the maximum of the sizes of 1 and 2 , and d is the degree of g. The output of M1 (i.e. the Kronecker’s encoding of V ∩ V (g)) is also given using a mixed dense/straight-line program representation of the corresponding Kronecker’s encoding. 2.4.2. Non-Archimedean approximants Let b ∈ Z be a >xed integer number and K a >eld of characteristic 0. In this section we propose an algorithm to solve the following problem: “Given a non-Archimedean approximant of an integral formal power series 8 ∈ K[[T − b]], compute its minimal polynomial in K[T; Z].” This problem is just a classical in a series concerning non-Archimedean approximants and minimal polynomial. In [35] the authors introduced such a kind of algorithms for p-adic approximants. Diophantine approximants were considered in [30]. A close treatment to ours is that of [19]. 
The new outcome here is not the concept of the procedure but the fact that it is well suited to mixed dense/straight-line program data structures, with precise estimates of its complexity.

Definition 8. Let K be a field of characteristic zero. A formal power series 8 ∈ K[[T − b]] is an integral formal power series if there exists a non-zero polynomial q(T, Z) ∈ K[T, Z] such that the following properties hold:
• q(T, Z) is irreducible over K[T, Z].
• q(T, 8) = 0.
• deg q = degZ q.
Such a polynomial q is unique (up to a constant in K), and it is called the minimal polynomial of 8. If d = deg q, we say that 8 has degree d. The regular local ring K[[T − b]] has a natural non-Archimedean absolute value given by its discrete valuation (cf. [61] for instance). Let | · | : K[[T − b]] \ {0} → R+ be the non-Archimedean absolute value associated to the (T − b)-adic filtration in the local ring K[[T − b]]. For every formal power series 8 = Σ_{k≥0} ak (T − b)^k ∈ K[[T − b]] as above, and for every positive integer d ∈ N, we define the truncated Taylor series expansion of 8 up to degree d as the univariate polynomial
$$ 8_d := \sum_{k=0}^{d-1} a_k (T - b)^k. $$
For every polynomial q(T, Z) ∈ K[T, Z] and for every positive integer d ∈ N, we have |q(T, 8) − q(T, 8d)| ≤ 1/2^d, and the following equivalence also holds:
$$ |q(T, 8)| \le \frac{1}{2^d} \iff |q(T, 8_d)| \le \frac{1}{2^d}. \tag{2} $$
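The absolute value and the truncation used above can be made concrete with a short sketch. It assumes that a series known to finite precision is stored as the list of its coefficients a0, a1, … in powers of (T − b), and that the absolute value is normalised as 2^(−ord), which is inferred here from the bounds 1/2^d rather than stated in the text. The function names are illustrative only.

```python
from fractions import Fraction

def order(coeffs):
    """(T - b)-adic order: index of the first non-zero coefficient, None for 0."""
    for k, c in enumerate(coeffs):
        if c != 0:
            return k
    return None

def abs_value(coeffs):
    """Assumed normalisation: |s| = 2**(-order(s)), and |0| = 0."""
    k = order(coeffs)
    return Fraction(0) if k is None else Fraction(1, 2 ** k)

def truncate(coeffs, d):
    """Keep the coefficients of (T - b)^0, ..., (T - b)^(d - 1)."""
    return list(coeffs[:d])

# Toy series: 1 + (T - b) + (T - b)^2 + ..., known up to precision N.
N, d = 12, 5
series = [Fraction(1)] * N
series_d = truncate(series, d) + [Fraction(0)] * (N - d)
diff = [x - y for x, y in zip(series, series_d)]

# The truncation error starts at (T - b)^d, so its absolute value is 1/2^d.
print(abs_value(diff))
```

The same two helpers are enough to test membership in the sets defined next, since the condition |g(T, 8)| ≤ 1/2^k only says that the first k coefficients of g(T, 8) vanish.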
Definition 9. Let 8 ∈ K[[T − b]] be a formal power series and let m, k ∈ N be two positive integers. Let K[T, Z]m be the K-vector space of all polynomials in K[T, Z] of (total) degree at most m. We define the subset Lm,k(8) ⊆ K[T, Z]m by the following identity:
$$ L_{m,k}(8) := \left\{ g \in K[T,Z]_m : |g(T, 8)| \le \frac{1}{2^k} \right\}. $$
Observe that Lm,k(8) is a K-vector space of finite dimension. From Equivalence (2) above, we conclude the following chain of set equalities:
$$ L_{m,k}(8) = \left\{ g \in K[T,Z]_m : |g(T, 8_k)| \le \frac{1}{2^k} \right\} = \left\{ (a_{ij}) \in K^{\binom{m+2}{2}} : \sum_{i+j \le m} a_{ij}\, T^i (8_k)^j \in (T - b)^k \right\}. \tag{3} $$
Proposition 10. With the same notations as above, let 8 be an integral formal power series of degree d with coefficients in the field K. Let m, k ∈ N be two positive integers. If m ≥ d and k ≥ m^2 + 1, then, for every g ∈ Lm,k(8), g(T, 8) = 0.

Proof. Let q(T, Z) ∈ K[T, Z] be the minimal polynomial of 8. This polynomial is an irreducible polynomial, monic up to a constant, that defines a plane algebraic curve V(q) ⊆ K̄^2, where K̄ is the algebraic closure of K. Additionally, the ring extension A := K[T] ↪ B := K[T, Z]/(q) is integral and, from [16, Lemma 3.3.1], B is a free A-module. Now, assume that g ∈ Lm,k(8) is a non-zero polynomial. Let η_g : B → B be the homothesy given by
$$ \eta_g(\bar h) := \overline{g h} \in B, \quad \forall \bar h \in B, $$
where the bar denotes residue class modulo the ideal (q).

Let G(T, U) ∈ K[T][U] be the minimal polynomial of η_g. This polynomial is monic with respect to the variable U (up to a constant in K), and its total degree is at most deg(g) deg(q) ≤ m^2 (cf. [23]). As G(T, g) ∈ (q), the polynomial G(T, g(T, Z)) ∈ K[T, Z] vanishes on the curve V(q). Now we proceed by extending scalars by tensoring with K[[T − b]]. Namely, as B is a free A-module, the following is also an integral ring extension:
A ⊗_A K[[T − b]] = K[[T − b]] ↪ B′ := K[[T − b]] ⊗_A B,
and B′ is the completion of B. In fact, we have B′ = K[[T − b]][Z]/(q)^e. As G(T, g) ∈ (q) in B, we also have G(T, g) ∈ (q)^e in B′. As q(T, 8) = 0, we get G(T, g(T, 8)) = 0 too. Finally, observe that g(T, 8) ∈ K[[T − b]] is an integral formal power series, and we have just shown that the minimal polynomial with coefficients in K[T] satisfied by g(T, 8) has degree at most m^2. Let us denote by χ(T, R) ∈ K[T][R] the minimal polynomial of g(T, 8) over K[T]. We assume that it can be written in the following form:
$$ \chi(T, R) = a_0(T) + a_1(T) R + \cdots. $$
If we evaluate this last expression at R = g(T, 8), we get
$$ 0 = \chi(T, g(T, 8)) = a_0(T) + a_1(T)\, g(T, 8) + \cdots. \tag{4} $$
Since g(T, Z) belongs to Lm,k(8), it verifies g(T, 8) ∈ (T − b)^k, so by hypothesis g(T, 8) ∈ (T − b)^(m^2+1) also holds, and we conclude from Eq. (4) that a0(T) ∈ (T − b)^(m^2+1). As a0 ∈ K[T] and deg(a0) ≤ deg χ ≤ m^2, we conclude that a0(T) ≡ 0 in K[T]. Therefore, χ(T, R) = R A(T, R) ∈ K[T, R], where A(T, R) is a polynomial, monic with respect to the variable R, of total degree at most deg χ − 1. As χ is the minimal polynomial of g(T, 8) over K[T], we conclude that A(T, g(T, 8)) ≠ 0. Hence, as K[[T − b]] is an integral domain, the proof is finished since
$$ \chi(T, g(T, 8)) = 0 \;\wedge\; A(T, g(T, 8)) \ne 0 \implies g(T, 8) = 0 \quad \text{in } K[[T - b]]. $$
Remark 11. Let {U1, …, Un} be new variables and let K := Q(U1, …, Un) be a transcendental extension of Q. Let 8 ∈ K[[T − b]] be an integral formal power series and let q(T, Z) ∈ K[T, Z] be its minimal polynomial over K[T]. Then, q(T, Z) is an irreducible polynomial characterised by the following property: “Assume deg(q) = d and let m, k ∈ N be two positive integers such that m ≥ d and k ≥ m^2 + 1. Then, q(T, Z) is the lowest degree monic (up to a constant in K) polynomial in Lm,k(8).”

We are now in a position to state the basic algorithm of this section.

Theorem 12. Let K := Q(U1, …, Un) be a transcendental extension of Q as in Remark 11. Then, there is a universal constant c > 0 such that the following holds:
There is a bounded error probability Turing machine M2 that performs the following task:
• The input of machine M2 is a straight-line program of size L, depth ℓ and parameters in a finite set F ⊆ Z. The straight-line program evaluates the coefficients in K of some polynomial g ∈ K[T, Z] such that g is the Taylor expansion up to order D^2 + 1 of an integral power series 8 ∈ K[[T − b]] of degree D. Moreover, assume that deg(g) ≤ D^2 + 1.
• The output of machine M2 is a straight-line program of size L1, depth ℓ1 and parameters in the finite set F1 := F ∪ {x ∈ Z : |x| ≤ (nD)^c}. This output straight-line program evaluates the minimal polynomial of 8 over K[T, Z].
The running time of M2 is at most polynomial in the quantities D, L, n. The total size L1 of the output straight-line program is at most the running time of M2 and, hence, polynomial in the quantities D, L, n.

Proof. From Equality (3), given m, k ∈ N and given 8k = g, we can always compute a basis of the K-vector space Lm,k(8) using the Linear Algebra methods adapted to straight-line program encodings as in [31] (which are based on [1] or [8,39]). These methods include randomised procedures based either on Zippel–Schwartz tests (cf. [45] or [62]) or on correct-test sequences (cf. [27] or [31]). The running time of these procedures is polynomial in the required quantities. Once a basis of Lm,k(8) has been computed, we can easily find the required lowest degree monic (up to a constant in K) polynomial q(T, Z) ∈ Lm,k(8).

Remark 13. Observe that if either m^2 < D or k < m^2 + 1, the same algorithm either computes a minimal polynomial of some different integral formal power series, of lower degree than 8, or it outputs that Lm,k(8) is the null vector space. In either case we can proceed to the output for further discussion.
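The linear algebra behind Theorem 12 can be illustrated on a toy case. The sketch below is not the straight-line-program procedure of [31]: it works with dense exact arithmetic over K = Q instead of Q(U1, …, Un), and it searches m = 1, 2, …, D with the fixed precision k = D^2 + 1, returning the first non-zero element found in Lm,k. All names (`mul`, `power`, `nullspace`, `minimal_polynomial`, `sqrt_series`) are hypothetical helpers introduced for the illustration.

```python
from fractions import Fraction

def mul(a, b, k):
    """Product of two series in S = T - b, truncated modulo S**k."""
    out = [Fraction(0)] * k
    for i, x in enumerate(a[:k]):
        if x:
            for j, y in enumerate(b[:k - i]):
                out[i + j] += x * y
    return out

def power(a, e, k):
    """a**e truncated modulo S**k."""
    out = [Fraction(1)] + [Fraction(0)] * (k - 1)
    for _ in range(e):
        out = mul(out, a, k)
    return out

def nullspace(rows, ncols):
    """Basis of {x : rows * x = 0} over Q, by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        inv = 1 / rows[r][c]
        rows[r] = [x * inv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            v[pc] = -rows[row_idx][fc]
        basis.append(v)
    return basis

def minimal_polynomial(series, b, D):
    """Lowest-degree g with g(T, series) = 0 mod (T - b)^(D^2 + 1), total degree <= D.

    Returns {(i, j): coefficient of T^i Z^j}, or None if nothing is found.
    """
    k = D * D + 1
    if len(series) < k:
        raise ValueError("need at least D^2 + 1 Taylor coefficients")
    series = [Fraction(c) for c in series[:k]]
    T = ([Fraction(b), Fraction(1)] + [Fraction(0)] * k)[:k]   # T = b + S
    for m in range(1, D + 1):
        monos = [(i, j) for j in range(m + 1) for i in range(m + 1 - j)]
        cols = [mul(power(T, i, k), power(series, j, k), k) for i, j in monos]
        rows = [[col[r] for col in cols] for r in range(k)]    # one row per power of S
        ker = nullspace(rows, len(monos))
        if ker:
            return {mono: c for mono, c in zip(monos, ker[0]) if c != 0}
    return None

def sqrt_series(n):
    """First n Taylor coefficients of sqrt(T) at T = 1 (binomial series)."""
    c, out = Fraction(1), []
    for j in range(n):
        out.append(c)
        c = c * (Fraction(1, 2) - j) / (j + 1)
    return out

q = minimal_polynomial(sqrt_series(5), b=1, D=2)
print(q)   # the monomials T and Z^2 with opposite coefficients, i.e. Z^2 - T
```

For this input the space L1,5 is trivial and L2,5 is one-dimensional, spanned by Z^2 − T, which matches the characterisation in Remark 11 with d = 2.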
3. Generalised Pham systems

In this section, we briefly discuss some basic facts concerning generalised Pham systems. The reader may find additional information on Pham systems in [3,6] or [37,38] and the references therein.

3.1. Basic notions and notations

In the sequel, K will denote a field of characteristic zero and K̄ its algebraic closure.

Definition 14. A Pham system of codimension r (r ≤ n) is a finite subset of polynomials F := [f1, …, fr] ∈ K[X1, …, Xn]^r such that for every i, 1 ≤ i ≤ r, there are polynomials gi ∈ K[X1, …, Xn] and natural numbers di ∈ N \ {0} such that fi = Xi^di + gi and deg gi < di.
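Definition 14 is a purely syntactic condition and can be checked directly on a sparse representation. In the sketch below, a polynomial is assumed to be stored as a dictionary from exponent tuples (e1, …, en) to coefficients; this encoding and the name `is_pham_system` are choices made for the illustration, not the paper's data structure.

```python
def is_pham_system(polys, n):
    """Check f_i = X_i^{d_i} + g_i with deg g_i < d_i and d_i >= 1 for every i."""
    if len(polys) > n:                      # codimension r must satisfy r <= n
        return False
    for i, f in enumerate(polys):
        terms = {e: c for e, c in f.items() if c != 0}
        if not terms:
            return False
        d = max(sum(e) for e in terms)      # total degree of f_i
        lead = tuple(d if j == i else 0 for j in range(n))
        top = {e: c for e, c in terms.items() if sum(e) == d}
        # The only monomial of top degree must be X_i^d, with coefficient 1.
        if d == 0 or top != {lead: 1}:
            return False
    return True

# X1^2 + X2 + 1 and X2^3 + X1*X2 - 2 in two variables.
f1 = {(2, 0): 1, (0, 1): 1, (0, 0): 1}
f2 = {(0, 3): 1, (1, 1): 1, (0, 0): -2}
# X1^2 + X1*X2 has a second monomial of top degree, so it is rejected.
bad = {(2, 0): 1, (1, 1): 1}

print(is_pham_system([f1, f2], 2), is_pham_system([bad, f2], 2))
```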
For every Pham system of codimension r, F ∈ K[X1, …, Xn]^r, we denote by (F) the ideal in K[X1, …, Xn] generated by the elements in F. The next lemma follows from a classical and elementary argument.

Lemma 15. Let F := [f1, …, fr] ∈ K[X1, …, Xn]^r be a Pham system of codimension r, and let B be the module B := K[X1, …, Xn]/(F). Then, the extension K[Xr+1, …, Xn] ↪ B is an integral ring extension. In particular, V(F) ⊆ K̄^n is an algebraic variety of pure codimension r and B is a free (Cohen–Macaulay) K[Xr+1, …, Xn]-module.

Let X0 be a new variable. For every polynomial f ∈ K[X1, …, Xn], let f^h ∈ K[X0, X1, …, Xn] be the homogenisation of f with respect to the new variable X0. Let P^n(K̄) be the n-dimensional projective space over K̄ and let H∞ := {X0 = 0} ⊆ P^n(K̄) be the hyperplane of points at infinity in P^n(K̄) with respect to the new variable X0. For every list of polynomials F := [f1, …, fs] ∈ K[X1, …, Xn]^s, let us denote by V(F^h) the projective variety of the common zeros of [f1^h, …, fs^h] in P^n(K̄).

Definition 16. A generalised Pham system is a finite subset of polynomials F := [f1, …, fn] ∈ K[X1, …, Xn]^n such that the projective variety V(F^h) ⊆ P^n(K̄) is a zero-dimensional projective variety without points at infinity (i.e. V(F^h) ∩ H∞ = ∅). In other words, a system F := [f1, …, fn] ∈ K[X1, …, Xn]^n is a generalised Pham system if and only if for every i, 1 ≤ i ≤ n, there are polynomials Ai, gi ∈ K[X1, …, Xn] such that fi = Ai + gi and the following properties hold:
• For every i, 1 ≤ i ≤ n, Ai ∈ K[X1, …, Xn] is a homogeneous polynomial of degree deg fi.
• For every i, 1 ≤ i ≤ n, gi is a polynomial of degree at most deg fi − 1.
• The projective algebraic variety V(C) ⊆ P^(n−1)(K̄) is empty, where C := [A1, …, An] is the list of leading homogeneous terms of F.
For every generalised Pham system F ∈ K[X1, …, Xn]^n, we also denote by (F) the ideal in K[X1, …, Xn] generated by the elements in F.

Proposition 17. Let F := [f1, …, fn] ∈ K[X1, …, Xn]^n be a generalised Pham system. Then, V(F) ⊆ K̄^n is a non-empty zero-dimensional algebraic variety. Moreover, the Jacobian determinant det(DF) = det(∂fi/∂Xj) ∈ K[X1, …, Xn] is a non-zero polynomial.

The following elementary lemma follows from the upper degree bounds in the Hilbert Nullstellensatz. The reader may find some of them in [13,31,32,44] and the references therein.

Lemma 18. Let F := [f1, …, fn] ∈ K[X1, …, Xn]^n be a generalised Pham system. Then, the ideal (F) contains a Pham system of codimension n.

Proof of Proposition 17. Using the previous lemma, the ideal (F) contains a Pham system of codimension n. Hence, V(F) is either empty or a zero-dimensional affine algebraic variety.
Let V(F^h) ⊆ P^n(K̄) be the projective algebraic variety associated to the system F. Since V(F^h) is defined as the set of common zeros of n homogeneous polynomials in n + 1 variables, V(F^h) ≠ ∅ (see for instance [46]). Moreover, as F is a generalised Pham system, V(F^h) is a zero-dimensional projective variety such that V(F^h) ⊆ {x0 ≠ 0}, and that implies V(F) ≠ ∅. Thus, V(F) ⊆ K̄^n is a non-empty zero-dimensional algebraic variety.

As for the second claim, let F : K̄^n → K̄^n be the polynomial mapping given by the identity F(x) := (f1(x), …, fn(x)) ∀x ∈ K̄^n. First of all, we observe that F is surjective. In order to prove this claim, let ζ := (ζ1, …, ζn) ∈ K̄^n be a point in K̄^n. Then, the fibre F^(−1)(ζ) is defined as the set of common zeros of the generalised Pham system given by the sequence of polynomials [f1 − ζ1, …, fn − ζn] ∈ K[X1, …, Xn]^n. Thus, F^(−1)(ζ) is a non-empty zero-dimensional variety and F is a surjective mapping. From the Second Bertini Theorem (cf. [46, p. 141, Theorem 2]) there is a zero measure subset U ⊆ K̄^n such that for every x ∈ F^(−1)(K̄^n \ U) the tangent mapping DF(x) : T_x K̄^n → T_F(x) K̄^n is surjective. In particular, DF(x) is a non-zero matrix and det(DF) ∈ K[X1, …, Xn] is a non-zero polynomial.

3.2. Deforming a generalised Pham system

In the sequel we assume that all the polynomials in a generalised Pham system have degree at least 2. Let F := [f1, …, fn] ∈ K[X1, …, Xn]^n be a generalised Pham system and let a ∈ K^n be a regular point of the mapping F : K̄^n → K̄^n (namely, a ∈ K^n such that the Jacobian matrix DF(a) is non-singular). We define the deformation of F at a as the system of polynomial equations:
Fa := [f1 − T f1(a), …, fn − T fn(a)] ∈ K[T, X1, …, Xn]^n.
In a numerical analysis context, this deformation is called “Newton homotopy” or “global homotopy”. This deformation is a particular case of the linear deformation (1 − T)F + TG, where G ∈ K[X1, …, Xn]^n. In our particular case, G := F − F(a). Let V(Fa) ⊆ K̄^(n+1) be the K-definable algebraic variety given by
V(Fa) := {(t, x) ∈ K̄^(n+1) : fi(x) − t fi(a) = 0, 1 ≤ i ≤ n}.
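A small numerical sketch may help to fix the two ends of this deformation. It assumes the system is given as a list of Python callables, and the example system and the point a are chosen here for illustration only (its leading forms X1^2 and X2^2 have no common projective zero, and the Jacobian determinant 4·x1·x2 − 1 is non-zero at a).

```python
def deform(F, a):
    """Return the map (t, x) -> [f_i(x) - t * f_i(a)] for a list F of callables."""
    values_at_a = [f(*a) for f in F]
    def F_a(t, x):
        return [f(*x) - t * c for f, c in zip(F, values_at_a)]
    return F_a

# Example system: f1 = X1^2 + X2 - 3, f2 = X2^2 + X1 - 5.
F = [lambda x1, x2: x1**2 + x2 - 3,
     lambda x1, x2: x2**2 + x1 - 5]

a = (1, 1)
jacobian_det_at_a = 4 * a[0] * a[1] - 1      # det [[2*x1, 1], [1, 2*x2]] at a
F_a = deform(F, a)

at_start = F_a(1, a)          # the point (1, a) always lies on V(F_a)
at_target = F_a(0, (1, 2))    # at t = 0 the equations are those of F itself
print(jacobian_det_at_a, at_start, at_target)
```

At t = 1 the known point a solves the system by construction, while at t = 0 the solutions are exactly the zeros of F, here illustrated with the zero (1, 2).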
Finally, let (Fa) ⊆ K[T, X1, …, Xn] be the ideal generated by the set of polynomials {fi(X1, …, Xn) − T fi(a) : 1 ≤ i ≤ n}.

Proposition 19. Let F be a generalised Pham system with coefficients in K and let a ∈ K^n be a regular point of F : K̄^n → K̄^n (i.e. DF(a) ∈ GL(n, K)). With the same notations as above, the following properties hold:
(1) The ideal (Fa) contains a Pham system of codimension 1.
(2) The following is an integral ring extension: K[T] ↪ B := K[T, X1, …, Xn]/(Fa), and B is a free K[T]-module of positive rank.
(3) The variety V(Fa) is an equidimensional curve (i.e. V(Fa) has no isolated component of dimension 0).
(4) The point (1, a) ∈ V(Fa) is a smooth point of V(Fa) and there is one and only one K-irreducible component Wa of V(Fa) such that (1, a) ∈ Wa.

Proof. Assume that F := [f1, …, fn] ∈ K[X1, …, Xn]^n. According to the notations of Definition 16, for every j, 1 ≤ j ≤ n, fj = Aj + gj, where Aj ∈ K[X1, …, Xn] is a homogeneous polynomial of degree deg(fj) and gj ∈ K[X1, …, Xn] is a polynomial of degree at most deg(fj) − 1. As the projective algebraic variety VP(A1, …, An) is empty, there is some constant D = D(deg f1, …, deg fn) such that for every i, 1 ≤ i ≤ n, there are homogeneous polynomials hij ∈ K[X1, …, Xn], 1 ≤ j ≤ n, of degree D − deg(fj) such that the following equality holds:
$$ X_i^D = \sum_{j=1}^{n} h_{ij} A_j. $$
Hence the following equality also holds:
$$ X_i^D - \sum_{j=1}^{n} h_{ij}\,\bigl(f_j - T f_j(a)\bigr) = -\sum_{j=1}^{n} h_{ij}\,\bigl(g_j - T f_j(a)\bigr). $$
For every i, 1 ≤ i ≤ n, let Gi(T, X1, …, Xn) ∈ K[T, X1, …, Xn] be the polynomial given by the following identity:
$$ G_i(T, X_1, \ldots, X_n) := -\sum_{j=1}^{n} h_{ij}\, g_j + \sum_{j=1}^{n} T\, h_{ij}\, f_j(a). $$
Observe that
$$ X_i^D - G_i(T, X_1, \ldots, X_n) = \sum_{j=1}^{n} h_{ij}\,\bigl(f_j - T f_j(a)\bigr) \in (F_a). $$
As deg(fi) ≥ 2 for every i, 1 ≤ i ≤ n, we conclude that deg(Gi) ≤ D − 1 for every i, 1 ≤ i ≤ n. In particular, the system G := [X1^D − G1, …, Xn^D − Gn] ∈ K[T, X1, …, Xn]^n is a Pham system of codimension n. Moreover, (G) ⊆ (Fa). As (1, a) ∈ V(Fa), we conclude that V(Fa) is either a curve in K̄^(n+1) or a zero-dimensional algebraic variety. Moreover, from Lemma 15, the ring extension K[T] ↪ K[T, X1, …, Xn]/(G) is integral, where (G) is the ideal generated by the elements in G.

We claim that (Fa) ∩ K[T] = (0). In order to prove this claim, let h(T) ∈ K[T] be a polynomial in the ideal (Fa). Then, for every i, 1 ≤ i ≤ n, there are polynomials hi(T, X1, …, Xn) ∈ K[T, X1, …, Xn] such that the following holds:
$$ h(T) = \sum_{i=1}^{n} h_i(T, X_1, \ldots, X_n)\,\bigl(f_i(X_1, \ldots, X_n) - T f_i(a)\bigr). $$
Hence, if h(T) were a non-zero polynomial, there would exist t0 ∈ Q such that h(t0) ≠ 0. Thus it would follow that
$$ 0 \ne h(t_0) = \sum_{i=1}^{n} h_i(t_0, X_1, \ldots, X_n)\,\bigl(f_i(X_1, \ldots, X_n) - t_0 f_i(a)\bigr). \tag{5} $$
On the other hand, let Fa,t0 ⊆ K[X1, …, Xn] be the system of polynomials given by the following equality:
Fa,t0 := [f1(x) − t0 f1(a), …, fn(x) − t0 fn(a)] ∈ K[X1, …, Xn]^n.
Observe that Fa,t0 is a generalised Pham system in K[X1, …, Xn]. Hence Proposition 17 implies that V(Fa,t0) ≠ ∅, in contradiction with Eq. (5) above. Thus, (Fa) ∩ K[T] = (0) and we have the following commutative diagram of ring extensions:
K[T] ↪ B2 := K[T, X1, …, Xn]/(G)
↓ π
K[T] ↪ B1 := K[T, X1, …, Xn]/(Fa),
where π : B2 → B1 is the canonical projection. In particular, the ring extension K[T] ↪ B1 is an integral ring extension, and (Fa) is a complete intersection ideal of codimension 1. Now, from [16, Lemma 3.3.1] we conclude that B1 is a free K[T]-module of positive rank. From Macaulay’s Unmixedness Theorem (cf. [36, Proposition 16f] for instance), we know that the ideal (Fa) has no embedded associated primes. In particular, all associated primes over (Fa) have codimension 1 and the curve V(Fa) is an equidimensional curve.

Let ma ⊆ K[T, X1, …, Xn] be the maximal ideal associated to the point (1, a). Namely, ma := (T − 1, X1 − a1, …, Xn − an), where a = (a1, …, an) ∈ K^n. Let B1 := K[T, X1, …, Xn]_ma be the localisation of K[T, X1, …, Xn] at ma. From the Jacobian Criterion (cf. [22]) the set Fa is part of a regular system of parameters that generate the maximal ideal of B1. As the ideal (Fa) is a complete intersection ideal of codimension 1, we conclude that (B1)_ma := K[T, X1, …, Xn]_ma/(Fa)_ma is a regular local ring of dimension 1, and the ideal (Fa)_ma is a prime ideal in K[T, X1, …, Xn]_ma. Then, we conclude that there is a unique K-irreducible component Wa of V(Fa) such that (1, a) ∈ Wa. Moreover, I(V)_ma = I(Wa)_ma = (Fa)_ma in B1. Hence, we conclude that K[V(Fa)] = (B1)_ma is a regular local ring of dimension 1 and (1, a) ∈ V(Fa) is a smooth zero of V(Fa).
Corollary 20. With the same notations as in Proposition 19 above, let K = Q and let Wa be the unique Q-irreducible component of V(Fa) that contains (1, a). Then, there is at least one Q-irreducible component W of V(F) such that {0} × W ⊆ Wa ∩ V(T), where V(T) := {(0, x) ∈ C^(n+1) : x ∈ C^n}.

Proof. From the second claim of Proposition 19 above, we have the integral ring extension Q[T] ↪ B := Q[T, X1, …, Xn]/(Fa). Then, as I(Wa) is a minimal prime ideal
over (Fa), the following is also an integral ring extension:
Q[T] ↪ Q[Wa] = Q[T, X1, …, Xn]/I(Wa).
From the Krull–Cohen–Seidenberg Theorems, we conclude that Wa ∩ V(T) is a non-empty zero-dimensional algebraic variety. Hence, as Wa ∩ V(T) ⊆ V(F) and Wa ∩ V(T) ≠ ∅, the claim follows.

Observe that in the previous Corollary we have shown that T is not a zero divisor in Q[Wa] = Q[T, X1, …, Xn]/I(Wa). Hence, the algorithm cited in Theorem 7 can be applied to perform the following task:
• Take as input a Kronecker’s encoding of the curve V(Fa).
• Output a Kronecker’s encoding of some Q-definable component of V(F).

Corollary 21. With the same notations and assumptions as in Proposition 19 above, let Wa ⊆ Q^(n+1) be the unique Q-irreducible component of V(Fa) that contains (1, a) ∈ C^(n+1). Let Q[Wa]_ma be the localisation of Q[Wa] at the maximal ideal ma := (T − 1, X1 − a1, …, Xn − an), where a = (a1, …, an) ∈ Q^n. Then, Q[T]_(T−1) ↪ Q[Wa]_ma is an integral ring extension and Q[Wa]_ma is a free Q[T]_(T−1)-module of positive rank. Hence, the following inequalities hold:
$$ \operatorname{rank}_{\mathbb{Q}[T]_{(T-1)}} \mathbb{Q}[W_a]_{m_a} \le \deg(W_a) \le \prod_{i=1}^{n} \deg(f_i). $$

Proof. From Proposition 19 above, we have that A := Q[T] ↪ B := Q[T, X1, …, Xn]/(Fa) is an integral ring extension and B is a free A-module of positive rank. From Bézout’s inequality we also conclude that deg(Wa) ≤ deg(V(Fa)) ≤ ∏_{i=1}^{n} deg(fi). Finally, in the proof of Proposition 19 we have shown that
Q[Wa]_ma = Q[T, X1, …, Xn]_ma/(Fa)_ma.
Additionally, let Q(T) be the field of fractions of Q[T] and let Q(Wa) be the field of rational functions defined in Wa. As Q[T] ↪ Q[Wa] is an integral ring extension, Q(Wa) is a finite field extension of Q(T). From the definition of geometric degree in [23], we have [Q(Wa) : Q(T)] ≤ deg(Wa). In order to conclude the proof of this Corollary we just have to observe that
Q[T]_(T−1) = Q[T, X1, …, Xn]_ma/(X1 − a1, …, Xn − an)_ma
and the following is an integral ring extension:
Q[T, X1, …, Xn]_ma/(X1 − a1, …, Xn − an)_ma ↪ Q[T, X1, …, Xn]_ma/(Fa)_ma.
4. The algorithm

We are now in a position to exhibit the algorithm referred to in the Introduction. This algorithm has three main steps:

Step 1: Choose at random a point a ∈ Z^n of bounded height such that DF(a) ∈ GL(n, Q). This is achieved by any of the probabilistic zero tests based either on the Zippel–Schwartz test (as in [45,62]) or on correct-test sequences (as in [27] or [31]). In the sequel we always assume that the regular value a satisfies ‖a‖ ≤ (nd)^O(1). This upper bound is an immediate consequence of applying any of these probabilistic zero tests.

Step 2: Lifting Step. From the smooth point (1, a) of the curve V(Fa), compute a Kronecker’s encoding of the Q-irreducible component Wa ⊆ C^(n+1) of V(Fa).

Step 3: Using the algorithm cited in Theorem 7 above, compute a Kronecker’s encoding of the intersection Wa ∩ V(T).

The key ingredient is clearly the algorithm that performs Step 2. We start with a description of this algorithm.

4.1. Lifting step

First of all, the following technical property holds:

Proposition 22. Let F := [f1, …, fn] ∈ Q[X1, …, Xn]^n be a generalised Pham system, and let a ∈ Q^n be a point such that DF(a) ∈ GL(n, Q). Let Fa ⊆ Q[T, X1, …, Xn] be the deformation of F given by the regular point a ∈ Q^n. Then, the following properties hold:
(1) There is a holomorphic mapping A : D → C^n, defined in an open neighbourhood D ⊆ C of 1, such that V(Fa) agrees with the graph of A near the simple point (1, a) ∈ V(Fa).
(2) Assume that A := (A1, …, An), where Ai : D → C are holomorphic mappings. For every i, 1 ≤ i ≤ n, let 8i ∈ C[[T − 1]] be the Taylor expansion of Ai at T = 1. Then, 8i ∈ Q[[T − 1]] and 8i is integral over Q[T].
(3) Let Wa ⊆ C^(n+1) be the unique Q-irreducible component of V(Fa) that contains the point (1, a). Then, for every i, 1 ≤ i ≤ n, the integral formal power series 8i ∈ Q[[T − 1]] has degree at most deg(Wa).

Proof. The first claim of this proposition is granted by the Implicit Function Theorem (cf.
[20] for instance). Moreover, since Wa is the unique Q-irreducible component of V(Fa) that contains the point (1, a), we conclude that, near (1, a), Wa agrees with the graph of A. For every i, 1 ≤ i ≤ n, let πi : C^(n+1) → C^2 be the canonical projection πi(t, x1, …, xn) := (t, xi), ∀(t, x1, …, xn) ∈ C^(n+1), and let Vi := πi(Wa) be the ith projection of the Q-irreducible variety Wa. As Q[T] ↪ Q[Wa] is an integral ring extension, Vi ⊆ C^2 is a hypersurface and there is a polynomial qi(T, Xi) ∈ Q[T, Xi] of degree at most deg(Wa), monic with respect to the variable Xi, such that qi|Vi ≡ 0. As the graph of A locally agrees with Wa near (1, a), we conclude that the graph of the holomorphic mapping πi ◦ A : D → C^2 is included in Vi near πi(1, a). In particular, qi(T, Xi) vanishes on the graph of Ai. Then, by the Identity Principle (cf. [20] for instance) we conclude that for every i, 1 ≤ i ≤ n, the following holds:
qi(T, 8i) ≡ 0 in C[[T − 1]].
Moreover, since (1, a) ∈ Q^(n+1) and F ∈ Q[X1, …, Xn]^n, using Hensel’s Lemma (cf. [12] or [61] for instance) we conclude that 8i ∈ Q[[T − 1]]. In particular, 8i ∈ Q[[T − 1]] is an integral formal power series of degree at most deg(Wa), as wanted.

As in Section 2.4.2, let {U1, …, Un} be independent variables over Q, let K := Q(U1, …, Un) be the corresponding transcendental field extension of Q and let K̄ be the algebraic closure of K. For a generalised Pham system F := [f1, …, fn] ∈ Q[X1, …, Xn]^n and for a regular point a ∈ Q^n, let 81, …, 8n be the Taylor expansions of the holomorphic functions A1, …, An of the second claim of Proposition 22 above. We have 8i ∈ Q[[T − 1]] for every i, 1 ≤ i ≤ n. Then, the following is a formal power series in K[[T − 1]]:
u := U1 81 + · · · + Un 8n ∈ K[[T − 1]].
Moreover, u is integral over the ring K[T] and the following Proposition holds:

Proposition 23. With the same notations as above, let qu(T, Z) ∈ K[T, Z] be the minimal polynomial of the integral power series u = U1 81 + · · · + Un 8n defined above. Then, qu(T, Z) is the Chow polynomial of the K-definable irreducible variety Wa ⊆ K̄^(n+1) with respect to the Noether normalisation
A := K[T] ↪ B := K[T, X1, …, Xn]/I(Wa). (6)
In particular, qu(T, Z) is an irreducible polynomial of total degree at most 2 deg(Wa).

The reader should observe that the Chow polynomial with respect to the Noether normalisation (6) is also defined in the following terms: Let {U1, …, Un} be some new variables, K := Q(U1, …, Un), and let
K ⊗_Q A = K[T] ↪ K ⊗_Q B =: B_K = K[T, X1, …, Xn]/I(Wa)^e
be the integral ring extension obtained by extending scalars. Let η_u : B_K → B_K be the homothesy defined by
$$ \eta_u(\bar g) := \overline{(U_1 X_1 + \cdots + U_n X_n)\, g} \in B_K, \quad \forall \bar g \in B_K, $$
where the bar denotes residue class modulo the extended ideal I(Wa)^e. The minimal equation of η_u is a polynomial in K[U1, …, Un, T, Z], monic with respect to the variable Z, of total degree at most 2 deg(Wa). This minimal equation of η_u is called the Chow polynomial of Wa with respect to the Noether normalisation (6). The degree bound is a consequence of Bézout’s inequality as in [23].

Finally, we shall make use of the Newton operator as in [12]. From now on, let F := [f1, …, fn] ∈ Q[X1, …, Xn]^n be a generalised Pham system and let a ∈ Q^n be a
regular point of F (i.e. DF(a) ∈ GL(n, Q)). Let Wa be the unique Q-irreducible component of V(Fa) that contains the point (1, a). We define the Newton operator associated to the system Fa as
$$ N_{F_a}(Z_1, \ldots, Z_n) := \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} - DF_a(Z)^{-1} \begin{pmatrix} f_1(Z) - T f_1(a) \\ \vdots \\ f_n(Z) - T f_n(a) \end{pmatrix}. $$
This Newton operator satisfies the following standard and well-known Proposition.

Proposition 24. With the same notations and assumptions as above, for every positive integer k ∈ N, let N^k_Fa(a) ∈ Q[[T − 1]]^n be the list of rational functions (in Q[T]^n_(T−1)) given by the following recursion: N^0_Fa(a) = a ∈ Q[[T − 1]]^n and, for every k ≥ 1, we define N^k_Fa(a) := N_Fa(N^(k−1)_Fa(a)) ∈ Q[[T − 1]]^n. Then, the sequence {N^k_Fa(a) : k ∈ N} is well-defined. Moreover, let ‖·‖ : Q[[T − 1]]^n → R+ be the maximum norm with respect to the non-Archimedean absolute value | · | : Q[[T − 1]] → R. Then, for every positive integer k ∈ N, the following holds:
$$ \bigl\| (8_1, \ldots, 8_n) - N^k_{F_a}(a) \bigr\| \le \frac{1}{2^{2^k}}, $$
where (81, …, 8n) ∈ Q[[T − 1]]^n are the implicit formal power series of the second claim of Proposition 22.

The following algorithm easily follows from the one discussed in [12,15]. This algorithm uses Strassen’s Vermeidung von Divisionen technique (cf. [55] as adapted in [31]).

Proposition 25. There is a deterministic Turing machine M4 that performs the following task:
• The input of machine M4 is given by the following information:
◦ A straight-line program of size L, depth ℓ and parameters in Z of bit length at most log2 H that evaluates a generalised Pham system F := [f1, …, fn] ∈ Z[X1, …, Xn]^n.
◦ A regular point a ∈ Z^n such that ‖a‖ ≤ H.
◦ A positive integer D ∈ N.
• The output of machine M4 is the truncated Taylor series expansion (up to degree D) uD of the integral formal power series
u := U1 81 + · · · + Un 8n ∈ K(U1, …, Un)[[T − 1]].
The polynomial uD is given by its dense encoding in Q(U1, …, Un)[T − 1] and its coefficients are given by a straight-line program of size polynomial in the quantities D, L, d, n, where d := max{deg fi : 1 ≤ i ≤ n}.
The running time of M4 is polynomial in the quantities D, L, d, n, log H.

The following algorithm is due to [17] (cf. also [34]). We rewrite it as adapted to our particular situation.

Theorem 26 (Giusti et al. [17]). There is a bounded error probability Turing machine M5 that performs the following task:
• The machine M5 takes as input the following information:
◦ A straight-line program that evaluates a generalised Pham system F := [f1, …, fn] ∈ Z[X1, …, Xn]^n. The size of this straight-line program is at most L, its depth is ℓ and its parameters have bit length at most h.
◦ A regular value a ∈ Z^n of bit length at most h.
◦ An irreducible monic polynomial q ∈ Z[U1, …, Un][T, Z] encoded by a non-scalar straight-line program of size at most L, depth at most ℓ and parameters of bit length at most h. Assume that the total degree of q is at most D.
• The machine M5 outputs the following information:
◦ First of all, M5 decides whether q is the Chow polynomial of the unique Q-irreducible component Wa of V(Fa) with respect to the Noether normalisation Q[T] ↪ Q[T, X1, …, Xn]/(Fa).
◦ If so, M5 outputs a Kronecker’s encoding of Wa.
The running time of M5 is polynomial in max{D, deg Wa}, L, n, d, h, where d := max{deg fi : 1 ≤ i ≤ n}.

In fact, this algorithm in [17] can also be replaced by the “two-by-two reconstruction” algorithm in [31] with similar time bounds and characteristics. The procedure first computes a Kronecker’s encoding of some curve C associated to the polynomial q(U1, …, Un, T, Z). Then, M5 decides whether C ⊆ V(Fa) and (1, a) ∈ C. If this is the case, then C = Wa and we already have a Kronecker’s encoding of Wa.

Now we can finally define the subalgorithm that performs Step 2. This is Algorithm 1.

Algorithm 1.
INPUT
• A straight-line program that evaluates a generalised Pham system F := [f1, …, fn] ∈ Z[X1, …, Xn]^n.
• A regular point a ∈ Z^n.
D ← 1
already_computed ← false
while (D ≤ ∏_{i=1}^{n} deg(fi) ∧ ¬ already_computed) do
    Apply the Newton operator as in the Turing machine M4 of Proposition 25 above to compute a truncated Taylor expansion (up to degree D^2 + 1) uD ∈ K[[T − 1]].
    Apply the Turing machine M2 of Theorem 12 to uD. The output is a polynomial qD ∈ K[T, Z] of degree at most D^2 + 1.
    if qD = 0 then
        D ← D + 1
    else
        Apply the Turing machine M5 of Theorem 26 to decide whether qD is the Chow polynomial of Wa with respect to the Noether normalisation Q[T] ↪ Q[T, X1, …, Xn]/I(Wa).
        if this is the case then
            already_computed ← true
        else
            D ← D + 1
        end if
    end if
end while
OUTPUT the Kronecker’s encoding of Wa.

The following Theorem is simply a consequence of our previous discussion. The reader should simply note that the output of the Lifting Step is a Kronecker’s encoding given by polynomials in Z[T, Z] which are given by their dense encoding, and whose coefficients (in Z) are given by a straight-line program encoding whose size is at most the running time of the procedure.

Theorem 27. Algorithm 1 outputs a Kronecker’s encoding of Wa in time at most polynomial in the quantities deg(Wa), L, n, d, h, where d := max{deg(fi) : 1 ≤ i ≤ n}, L is an upper bound of the size of the input straight-line program and h is an upper bound of the bit length of its parameters and of the bit length of the coordinates of a.

4.2. Proofs of the main Theorems 1 and 3

Proof of Theorem 1. The algorithm cited in Theorem 1 is Algorithm 2 below.

Algorithm 2.
INPUT
• A non-scalar straight-line program evaluating a generalised Pham system F := [f1, …, fn] ∈ Z[X1, …, Xn]^n.
Choose at random a point a ∈ Z^n such that DF(a) ∈ GL(n, K).
Apply the LIFTING STEP Algorithm described in Theorem 27 above. The output is a Kronecker’s encoding of Wa.
Apply the elimination Algorithm of Theorem 7 above. The output is a Kronecker’s encoding of the non-empty (see Corollary 20 above) zero-dimensional algebraic variety W := Wa ∩ V(T).
OUTPUT the Kronecker’s encoding of W.
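The Lifting Step on which Algorithm 2 relies rests on the quadratic convergence of the Newton operator stated in Proposition 24. The following sketch illustrates that behaviour on a toy univariate case only: f(X) = X^2 with a = 1, so that Fa = X^2 − T and the implicit series is the expansion of √T at T = 1. It uses dense truncated series over Q with a fixed working precision `K`, not the straight-line-program machinery of Proposition 25, and every name in it is hypothetical.

```python
from fractions import Fraction

K = 16  # working precision: series are truncated modulo S**K, where S = T - 1

def mul(a, b):
    """Product of two length-K coefficient lists, truncated modulo S**K."""
    out = [Fraction(0)] * K
    for i, x in enumerate(a):
        for j, y in enumerate(b[:K - i]):
            out[i + j] += x * y
    return out

def inverse(a):
    """Inverse of a series with non-zero constant term, modulo S**K."""
    out = [Fraction(0)] * K
    out[0] = 1 / a[0]
    for n in range(1, K):
        out[n] = -out[0] * sum(a[i] * out[n - i] for i in range(1, n + 1))
    return out

def newton_step(z):
    """One application of Z -> Z - (Z^2 - T) / (2 Z), with T = 1 + S."""
    T = [Fraction(1), Fraction(1)] + [Fraction(0)] * (K - 2)
    residual = [x - y for x, y in zip(mul(z, z), T)]
    step = mul(residual, inverse([2 * c for c in z]))
    return [x - y for x, y in zip(z, step)]

def order(c):
    """Index of the first non-zero coefficient, None for the zero list."""
    return next((k for k, x in enumerate(c) if x != 0), None)

def sqrt_series():
    """Taylor coefficients of sqrt(T) at T = 1, from the binomial series."""
    c, out = Fraction(1), []
    for j in range(K):
        out.append(c)
        c = c * (Fraction(1, 2) - j) / (j + 1)
    return out

exact = sqrt_series()
z = [Fraction(1)] + [Fraction(0)] * (K - 1)   # zeroth iterate: the point a = 1
orders = []
for k in range(4):
    # Order of (exact - k-th iterate); the bound 1/2^(2^k) means order >= 2^k.
    orders.append(order([x - y for x, y in zip(exact, z)]))
    z = newton_step(z)
print(orders)
```

In this example the number of correct coefficients doubles at every step, so about log2(D^2 + 1) iterations are enough to reach the precision D^2 + 1 requested inside the loop of Algorithm 1.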
From Corollary 20, we know that $W := W_a \cap V(T)$ is a non-empty zero-dimensional subvariety of $V(F)$. Hence, this algorithm computes what was announced in Theorem 1. Concerning complexity, our intermediate results show that the running time of this procedure is polynomial in the input length and in the geometric degree $\deg(W_a)$. As $\deg(W_a) \le \operatorname{def\,deg}(F)$, the theorem follows.

Proof of Theorem 3. As observed in the Introduction, the output of the algorithm of Theorem 1 is the Kronecker's encoding of some zero-dimensional $\mathbb{Q}$-definable component $W$ of $V(F)$. This encoding consists of the following information:
(1) A primitive element $u := \lambda_1 X_1 + \cdots + \lambda_n X_n \in \mathbb{Z}[X_1, \ldots, X_n]$ whose coefficients are given by their binary/decimal expansion.
(2) The minimal equation $m_u \in \mathbb{Z}[T]$ of the primitive element. This polynomial is given in dense encoding, but its coefficients are given in straight-line program encoding.
(3) The discriminant $\rho \in \mathbb{Z}$, given by its straight-line program encoding.
(4) The parametrisations $v_1, \ldots, v_n \in \mathbb{Z}[T]$, whose coefficients are also given by their straight-line program encoding.

As $W$ is a $\mathbb{Q}$-definable non-empty zero-dimensional variety, there is some $\zeta \in \mathbb{C}^n$ such that $\zeta \in W$. Then, there is at least one $\mathbb{Q}$-irreducible component $W_\zeta$ of $W$ that contains the point $\zeta \in \mathbb{C}^n$. In fact, all $\mathbb{Q}$-irreducible components of $W$ are of this kind, and $W$ has a minimal irreducible decomposition $W = W_1 \cup \cdots \cup W_s$, where $\deg W = \sum_{i=1}^{s} \#(W_i) \le \operatorname{def\,deg}(F)$ and $\zeta_i \in W_i$ for every $i$, $1 \le i \le s$. Moreover, each $\mathbb{Q}$-irreducible component of $W$ corresponds one-to-one to an irreducible factor of the polynomial $m_u$ in $\mathbb{Q}[T]$. Thus, the algorithm that proves Theorem 3 is Algorithm 3 below.

Algorithm 3.
INPUT
• A non-scalar straight-line program evaluating a generalised Pham system $F := [f_1, \ldots, f_n] \in \mathbb{Z}[X_1, \ldots, X_n]^n$.
• Apply the algorithm of Theorem 1 to output a Kronecker's encoding of $W := W_a \cap V(T)$.
Factor the minimal polynomial $m_u \in \mathbb{Z}[Z]$ of the primitive element $u$ with respect to the variety $W$.
Choose one of these factors $q \in \mathbb{Z}[Z]$.
Reduce the parametrisations with respect to the polynomial $q$ and output new parametrisations $\rho, w_1, \ldots, w_n \in \mathbb{Z}[Z]$.
OUTPUT $q, \rho, w_1, \ldots, w_n \in \mathbb{Z}[Z]$.

There is one new task performed by this algorithm: factoring a univariate polynomial whose coefficients are given in straight-line program encoding. This task was first discussed in [28,29]. However, E. Kaltofen did not take into account that the bit complexity depends not only on the degree and
the size of the straight-line program. As observed in [5], the factorisation of univariate polynomials with integer coefficients given by straight-line programs also depends on the height of the factors. In fact, in [5] the authors proved the following statement:

Theorem 28 (Castro et al. [5]). There is a deterministic Turing machine $M_6$ that performs the following task:
• The input of $M_6$ is given by the following items:
  ◦ A polynomial $p \in \mathbb{Z}[T]$ of degree at most $d$ whose coefficients are encoded by a straight-line program of size $L$, using parameters of bit length at most $h$.
  ◦ A positive integer $H \in \mathbb{N}$.
• The output of $M_6$ is the list of all the irreducible factors of $p$ whose coefficients can be written with at most $H$ bits (i.e. the irreducible factors of $p$ of logarithmic height at most $H$).
The running time of $M_6$ is polynomial in the following quantities: $d$, $L$, $H$.

Using this algorithm $M_6$ in the factoring step of the algorithm of Theorem 3 above, we can find the minimum $H$ such that $m_u \in \mathbb{Z}[T]$ has an irreducible factor whose coefficients have bit length at most $H$. Choosing just one of these factors, we proceed to the reduction step of the same algorithm. The height of a zero $\zeta \in \mathbb{C}^n$ is precisely the maximum number of digits required to represent the coefficients of a Kronecker's encoding of $W_\zeta$. Hence, $\mathrm{ht}(\zeta) \le H$ and the theorem follows.

5. Universal behaviour

In this section, we show that, although the algorithm in Theorem 1 is not universal in the sense of [4,25,42], unfortunately, on the "average" it behaves as a universal symbolic polynomial equation solver.

Proposition 29. Let $F$ be a generalised Pham system with coefficients in $\mathbb{Q}$ and let $a \in \mathbb{Q}^n$ be a point such that $F(a) \in \mathbb{Q}^n$ is a regular value of $F$ (i.e. every point $c \in \mathbb{C}^n$ in the fibre $F^{-1}(\{F(a)\})$ is a regular point of $F$).
Then, we have:
(1) For every point $c \in \mathbb{C}^n$ in the fibre $F^{-1}(\{F(a)\})$, there is one and only one $\mathbb{Q}$-irreducible component $W_c$ of $V(F_a)$ that contains the point $(1, c)$ (i.e. $(1, c) \in W_c$).
(2) There is a finite subset $S \subseteq F^{-1}(\{F(a)\})$ such that the following is the decomposition of $V(F_a)$ into $\mathbb{Q}$-irreducible components: $V(F_a) = \bigcup_{c \in S} W_c$.

Proof. From Definition 16, if $F := [f_1, \ldots, f_n] \in \mathbb{Q}[X_1, \ldots, X_n]^n$ is a generalised Pham system, it is also a generalised Pham system in $\mathbb{C}[X_1, \ldots, X_n]^n$. As $c \in F^{-1}(\{F(a)\})$, we have $F(c) = F(a)$, and also $V(F_c) = V(F_a)$. As $F(a)$ is a regular value, $c \in \mathbb{C}^n$ is also a regular point of the mapping $F : \mathbb{C}^n \to \mathbb{C}^n$. Hence, Proposition 19 applies and there is one and only one ($\mathbb{C}$-)irreducible component $V_c$ of $V(F_c)$ that contains the smooth point $(1, c)$. Next, as $V(F_c) = V(F_a)$, there is at least one $\mathbb{Q}$-irreducible component $W_c$ of $V(F_a)$ that contains $V_c$ and the smooth point $(1, c)$. Additionally, as $(1, c)$ is a smooth point of $V(F_a) = V(F_c)$, the variety $W_c$ is unique and the first claim holds.
On the other hand, let $W \subseteq V(F_a)$ be a $\mathbb{Q}$-irreducible component of $V(F_a)$. The ring extension $\mathbb{Q}[T] \hookrightarrow \mathbb{Q}[T, X_1, \ldots, X_n]/I(W)$ is integral. In particular, $W \cap V(T - 1)$ is a non-empty algebraic variety contained in
$$V(F_a) \cap V(T - 1) = \{(1, x) \in \mathbb{C}^{n+1} : F(x) - F(a) = 0\}.$$
Then, if $(1, c) \in W \cap V(T - 1)$, we conclude that $F(c) = F(a)$ (or, equivalently, $c \in F^{-1}(\{F(a)\})$) and the first claim implies $W = W_c$.

Let $F \in \mathbb{Q}[X_1, \ldots, X_n]^n$ be a generalised Pham system. For every point $a \in \mathbb{Q}^n$ such that $F(a) \in \mathbb{Q}^n$ is a regular value, we can decompose $V(F_a)$ into either $\mathbb{Q}$-irreducible components or ($\mathbb{C}$-)irreducible components. We introduce some notation to distinguish the two decompositions. Thus, we may assume that there are two subsets $S, \tilde{S} \subseteq F^{-1}(\{F(a)\})$ such that
$$V(F_a) = \bigcup_{c \in S} W_c = \bigcup_{c \in \tilde{S}} \widetilde{W}_c,$$
where $W_c \subseteq V(F_a)$ is the unique $\mathbb{Q}$-irreducible component of $V(F_a)$ that contains the smooth zero $(1, c)$ and $\widetilde{W}_c$ is the unique irreducible component of $V(F_a)$ that contains the smooth zero $(1, c)$. Additionally, we have that $\widetilde{W}_c \subseteq W_c$. As $\{0\} \times V(F) = V(F_a) \cap V(T)$, the following corollary immediately follows:

Corollary 30. With the same notations and assumptions as above, let $a \in \mathbb{Q}^n$ be such that $F(a) \in \mathbb{Q}^n$ is a regular value and let $\zeta \in V(F)$ be a zero of the generalised Pham system. Then, there is some $c \in F^{-1}(\{F(a)\})$ such that
$$(0, \zeta) \in \widetilde{W}_c \subseteq W_c.$$

We shall make use of a generic deformation of a generalised Pham system in the following terms. Let $F := [f_1, \ldots, f_n] \in \mathbb{Q}[X_1, \ldots, X_n]^n$ be a generalised Pham system with rational coefficients. Let $\{Y_1, \ldots, Y_n\}$ be a set of variables algebraically independent over $\mathbb{C}$. Let us define the system of polynomials $F_Y$ given by the following identities:
$$f_i^{(Y)} := f_i(X_1, \ldots, X_n) - T f_i(Y_1, \ldots, Y_n) \in \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n],$$
$$F_Y := [f_1^{(Y)}, \ldots, f_n^{(Y)}] \in \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]^n.$$
We call $F_Y$ the generic deformation of the generalised Pham system $F$. Let $W(F_Y) \subseteq \mathbb{C}^{2n+1}$ be the algebraic variety given by
$$W(F_Y) := \{(t, x, y) \in \mathbb{C}^{2n+1} : f_i^{(Y)}(t, x, y) = 0, \ 1 \le i \le n\}.$$
Observe that for every $a := (a_1, \ldots, a_n) \in \mathbb{Q}^n$, the following equality holds:
$$V(F_a) = W(F_Y) \cap V(Y_1 - a_1, \ldots, Y_n - a_n).$$
Proposition 19 above may be rewritten in the following terms:
Proposition 31. Let $F := [f_1, \ldots, f_n] \in \mathbb{Q}[X_1, \ldots, X_n]^n$ be a generalised Pham system and let $F_Y \in \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]^n$ be its generic deformation. Let $K := \mathbb{Q}(Y_1, \ldots, Y_n)$ be the field of rational functions with rational coefficients and let $(F_Y)^e$ be the ideal generated by $F_Y$ in the ring $K[T, X_1, \ldots, X_n]$. Then, the ring extension $K[T] \hookrightarrow K[T, X_1, \ldots, X_n]/(F_Y)^e$ is integral. Moreover, there is a non-zero polynomial $h \in \mathbb{Q}[Y_1, \ldots, Y_n]$ such that the following is also an integral ring extension:
$$\mathbb{Q}[T, Y_1, \ldots, Y_n]_h \hookrightarrow \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/(F_Y)^{ec}, \qquad (7)$$
where $\mathbb{Q}[T, Y_1, \ldots, Y_n]_h$ and $\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h$ are the respective localisations at the multiplicative system $S := \{1, h, h^2, \ldots\}$, and $(F_Y)^{ec}$ is the ideal generated by $F_Y$ in $\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h$.

Proposition 32. With the same notations and assumptions as above, there is a unique prime ideal $\mathfrak{p}_Y \in \mathrm{Spec}(\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n])$ such that the following properties hold:
(1) $\mathfrak{p}_Y \cap \mathbb{Q}[Y_1, \ldots, Y_n] = (0)$.
(2) $\mathfrak{p}_Y$ is a minimal prime ideal over $(F_Y)$ of coheight $n + 1$.
(3) Let $\mathfrak{p}_Y^e$ be the prime ideal generated by $\mathfrak{p}_Y$ in $K[T, X_1, \ldots, X_n]$. Then, $\mathfrak{p}_Y^e$ is the unique minimal prime ideal over $(F_Y)^e$ contained in the maximal ideal of $K[T, X_1, \ldots, X_n]$ generated by $\{T - 1, X_1 - Y_1, \ldots, X_n - Y_n\}$.
(4) The following is an integral ring extension:
$$\mathbb{Q}[T, Y_1, \ldots, Y_n]_h \hookrightarrow \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/\mathfrak{p}_Y^{ec}, \qquad (8)$$
where $h \in \mathbb{Q}[Y_1, \ldots, Y_n] \setminus \{0\}$ is the non-zero polynomial of Proposition 31 above and $\mathfrak{p}_Y^{ec}$ is the ideal generated in $\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h$ by $\mathfrak{p}_Y$.

Proof. From Proposition 31 above, there is one and only one prime ideal $P \in \mathrm{Spec}(K[T, X_1, \ldots, X_n])$ such that $P$ is a minimal prime ideal over $(F_Y)^e$ and such that $P$ is contained in the ideal generated in $K[T, X_1, \ldots, X_n]$ by $\{T - 1, X_1 - Y_1, \ldots, X_n - Y_n\}$. Let $\mathfrak{m} \subseteq K[T, X_1, \ldots, X_n]$ be the maximal ideal given by $\mathfrak{m} := (T - 1, X_1 - Y_1, \ldots, X_n - Y_n)$. Then, the following properties hold:
• $F_Y$ is part of a regular system of parameters in the local ring
$$A := K[T, X_1, \ldots, X_n]_{\mathfrak{m}} = \mathbb{Q}[Y_1, \ldots, Y_n, T, X_1, \ldots, X_n]_{(T - 1, X_1 - Y_1, \ldots, X_n - Y_n)}.$$
• $P_{\mathfrak{m}} = (F_Y)^e_{\mathfrak{m}}$ is the unique prime ideal generated by $F_Y$ in the local ring $A$.
Then, there is a unique prime ideal $\mathfrak{p}_Y \in \mathrm{Spec}(\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n])$ such that $(\mathfrak{p}_Y)_{\mathfrak{m}} = (F_Y)^e_{\mathfrak{m}}$ and $\mathfrak{p}_Y \subseteq (T - 1, X_1 - Y_1, \ldots, X_n - Y_n)$. We also have that $(F_Y) \subseteq \mathfrak{p}_Y$ and $\mathfrak{p}_Y \cap \mathbb{Q}[Y_1, \ldots, Y_n] = (0)$. From Krull's Principal Ideal Theorem, we conclude that $\mathrm{ht}(\mathfrak{p}_Y) \le n$. Additionally, from the integral ring extension (7) we conclude that the ring extension $\mathbb{Q}[T, Y_1, \ldots, Y_n]_h \hookrightarrow \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/\mathfrak{p}_Y^{ec}$ is integral, where $\mathfrak{p}_Y^{ec}$ is the extension of $\mathfrak{p}_Y$ to $\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h$. In particular, we conclude that $\mathrm{ht}(\mathfrak{p}_Y) \ge n$ and the second claim follows. The reader should observe that the third and fourth claims have already been established.
Proposition 33. With the same notations and assumptions as in the previous Proposition, let $W_Y \subseteq \mathbb{C}^{2n+1}$ be the algebraic variety defined as the set of common zeros of the polynomials in $\mathfrak{p}_Y$. Then, the following properties hold:
(1) $W_Y$ is a $\mathbb{Q}$-definable irreducible algebraic variety of dimension $n + 1$.
(2) For every $c := (c_1, \ldots, c_n) \in \mathbb{C}^n$ such that $h(c) \ne 0$, the algebraic set $W_Y^{(c)} := W_Y \cap V(Y_1 - c_1, \ldots, Y_n - c_n)$ is a curve in $\mathbb{C}^{2n+1}$.
(3) For every point $c \in \mathbb{C}^n$ such that $F(c)$ is a regular value and such that $h$ does not vanish on the fibre $F^{-1}(\{F(c)\})$, $W_Y^{(c)}$ is equidimensional and verifies the inclusion $\widetilde{W}_c \times \{c\} \subseteq W_Y^{(c)}$. Moreover, if $c \in \mathbb{Q}^n$ is a rational point, then $W_Y^{(c)}$ is a $\mathbb{Q}$-definable equidimensional algebraic variety and verifies $W_c \times \{c\} \subseteq W_Y^{(c)}$.

Proof. We clearly have that $W_Y$ is a $\mathbb{Q}$-definable irreducible algebraic variety of dimension $n + 1$. Since $W_Y = V(\mathfrak{p}_Y) \subseteq \mathbb{C}^{2n+1}$, taking into account the integral ring extension (8) and extending scalars (i.e. tensoring by $\mathbb{C} \otimes_{\mathbb{Q}}$), the following is also an integral ring extension:
$$\mathbb{C}[T, Y_1, \ldots, Y_n]_h \hookrightarrow \mathbb{C}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/(\mathbb{C} \otimes_{\mathbb{Q}} \mathfrak{p}_Y^{ec}).$$
From the Krull–Cohen–Seidenberg Theorem, we conclude that for every $c \in \mathbb{C}^n$ with $h(c) \ne 0$, the algebraic set $W_Y^{(c)} := W_Y \cap V(Y_1 - c_1, \ldots, Y_n - c_n)$ is non-empty. From Krull's Principal Ideal Theorem, we conclude that $\dim W_Y^{(c)} \ge 1$. On the other hand, we have the inclusion $W_Y^{(c)} \subseteq V(F_c) \times \{c\}$, and the set $V(F_c) \times \{c\}$ is a curve. The second claim then follows.

Assume now that $F(c)$ is a regular value and that $h$ does not vanish on the fibre $F^{-1}(\{F(c)\})$. From Krull's Principal Ideal Theorem, every minimal prime ideal over $\mathfrak{p}_Y + (Y_1 - c_1, \ldots, Y_n - c_n)$ has height at most $2n$. Then, every irreducible component of $W_Y^{(c)}$ is also a curve. Thus, there is a finite subset $S_1 \subseteq F^{-1}(\{F(c)\})$ such that
$$W_Y^{(c)} = \bigcup_{a \in S_1} \widetilde{W}_a \times \{c\}.$$
Finally, as $\mathfrak{p}_Y \subseteq (T - 1, X_1 - Y_1, \ldots, X_n - Y_n)$, $W_Y$ also contains the diagonal $\Delta \subseteq \mathbb{C}^{2n+1}$ given by the identity $\Delta := \{(t, x, y) \in \mathbb{C}^{2n+1} : x = y\}$. In particular, $(1, c) \in W_Y^{(c)}$ and, by irreducibility, $\widetilde{W}_c \times \{c\} \subseteq W_Y^{(c)}$. Moreover, if $c$ belongs to $\mathbb{Q}^n$, then $W_Y^{(c)} = W_Y \cap V(Y_1 - c_1, \ldots, Y_n - c_n)$ is a $\mathbb{Q}$-definable algebraic variety contained in $V(F_c) \times \{c\}$. This implies $W_c \times \{c\} \subseteq W_Y^{(c)}$.

Proposition 34. With the same notations as above, let $\zeta \in V(F)$ be a zero of the generalised Pham system. Let $A_\zeta \subseteq \mathbb{C}^n$ be the constructible set given by the following identity:
$$A_\zeta := \{z \in \mathbb{C}^n : (0, \zeta, z) \in W_Y^{(z)}, \ h(z) \ne 0\}.$$
Then, $A_\zeta$ contains a non-empty Zariski open set.
Proof. Assume that $A_\zeta$ is contained in some proper hypersurface $H := V(G)$. From the second Bertini Theorem (cf. [46]), there is an open set $U \subseteq \mathbb{C}^n$ such that every $x \in U$ is a regular value of the surjective mapping $F : \mathbb{C}^n \to \mathbb{C}^n$. Let $c \in \mathbb{C}^n$ be such that $F(c) \in U$ is a regular value. Then, there is some $a \in \mathbb{C}^n$ such that $F(a) = F(c)$ and $(0, \zeta) \in \widetilde{W}_a$. Thus, either $h(a) = 0$, or $h(a) \ne 0$ and $(0, \zeta, a) \in W_Y^{(a)}$. This second case implies $a \in A_\zeta$ and hence $G(a) = 0$. In conclusion, $U$ is contained in the constructible set $U_0 := F(V(G) \cup V(h))$. But $\dim U_0 \le \dim(V(G) \cup V(h)) \le n - 1$, which yields a contradiction. Then, the Proposition follows.

Corollary 35. There is a Zariski open set $A \subseteq \mathbb{C}^n$ such that the following holds for every $c \in A$: Let $\varphi : \mathbb{C}^{2n+1} \to \mathbb{C}^n$ be the canonical projection onto the second group of coordinates, $\varphi(t, x, y) := x$ for all $(t, x, y) \in \mathbb{C}^{2n+1}$. Then, $\varphi(W_Y^{(c)} \cap V(T)) = V(F)$.

Proof. We just need to observe that $A := \bigcap_{\zeta \in V(F)} A_\zeta$, and the result follows from the previous Proposition.

Proposition 36. With the same notations as in Proposition 19, there exist infinitely many integer points $a \in \mathbb{Z}^n$ such that the following properties hold:
(1) $F(a)$ is a regular value of $F : \mathbb{C}^n \to \mathbb{C}^n$.
(2) $h(a) \ne 0$.
(3) $\varphi(W_a \cap V(T)) = V(F)$, where $\varphi : \mathbb{C}^{n+1} \to \mathbb{C}^n$ stands for the canonical projection $\varphi(t, x) := x$ for all $(t, x) \in \mathbb{C}^{n+1}$.

Proof. Since $a \in \mathbb{Z}^n$, we apply Proposition 33 to conclude that $W_a \times \{a\} \subseteq W_Y^{(a)}$. Then, it suffices to show that we can choose infinitely many $a \in \mathbb{Z}^n$ such that $W_Y^{(a)}$ is $\mathbb{Q}$-irreducible: for such $a$, the equality $W_a \times \{a\} = W_Y^{(a)}$ holds and the result follows from Corollary 35 above. Now, consider the following ring extension:
$$\mathbb{Q}[T, Y_1, \ldots, Y_n]_h \hookrightarrow \mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/\mathfrak{p}_Y^{ec}.$$
Observe that $\mathfrak{p}_Y^{ec}$ is a prime ideal in $\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h$. There is a polynomial $q(T, Y_1, \ldots, Y_n, Z) \in \mathbb{Q}[T, Y_1, \ldots, Y_n, Z]_h$ such that the following is an isomorphism:
$$\mathbb{Q}[T, X_1, \ldots, X_n, Y_1, \ldots, Y_n]_h/\mathfrak{p}_Y^{ec} \cong \mathbb{Q}[T, Y_1, \ldots, Y_n, Z]_h/(q)_h.$$
Observe that $q(T, Y_1, \ldots, Y_n, Z)$ is an irreducible polynomial in the ring $\mathbb{Q}[T, Y_1, \ldots, Y_n, Z]_h$. Now, for every integer point $a := (a_1, \ldots, a_n) \in \mathbb{Z}^n$ such that $h(a) \ne 0$ and $q(T, a_1, \ldots, a_n, Z)$ is irreducible in $\mathbb{Q}[T, Z]$, we have that $W_Y^{(a)}$ is a $\mathbb{Q}$-irreducible variety. The existence of infinitely many integer points $a \in \mathbb{Z}^n$ verifying that property is guaranteed by Hilbert's Irreducibility Theorem (cf. [7] or [63]).

Proof of Theorem 2. It follows from Proposition 36 above.
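As a toy illustration of how Hilbert's Irreducibility Theorem is used in the proof of Proposition 36, consider the one-parameter family $Z^2 - Y$, which is irreducible in $\mathbb{Q}[Y, Z]$. This family is an assumption chosen for the example; it is not the polynomial $q$ of the proof. The specialisation $Z^2 - a$ stays irreducible in $\mathbb{Q}[Z]$ exactly when the integer $a$ is not a perfect square, so irreducibility survives for most integer values of the parameter. The sketch below counts such values in a small range.

```python
from math import isqrt


def specialisation_is_irreducible(a: int) -> bool:
    """Decide whether Z^2 - a is irreducible over Q for an integer a.

    A monic quadratic with integer coefficients is reducible over Q
    exactly when it has an integer root, i.e. when a is a perfect square.
    """
    if a < 0:
        return True
    r = isqrt(a)
    return r * r != a


# Integer parameters in 1..100 for which irreducibility is preserved.
good = [a for a in range(1, 101) if specialisation_is_irreducible(a)]
print(len(good))  # 90: only the ten perfect squares 1, 4, ..., 100 fail
```

The exceptional set (the perfect squares) is thin, which is the qualitative behaviour the theorem guarantees in general.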
References
[1] S.J. Berkowitz, On computing the determinant in small parallel time using a small number of processors, Inform. Process. Lett. 18 (1984) 147–150.
[2] D.N. Bernstein, A.G. Kušnirenko, A.G. Hovanskiĭ, Newton polyhedra, Uspehi Mat. Nauk 31 (3(189)) (1976) 201–202.
[3] A. Bompadre, Un problema de eliminación geométrica en sistemas de Pham-Brieskorn, Master's Thesis, Universidad de Buenos Aires, Argentina, 2000.
[4] D. Castro, M. Giusti, J. Heintz, G. Matera, L.M. Pardo, The hardness of polynomial equation solving, Found. Comput. Math. 3 (2003) 347–420.
[5] D. Castro, K. Hägele, J.E. Morais, L.M. Pardo, Kronecker's and Newton's approaches to solving: a first comparison, J. Complexity 17 (1) (2001) 212–303.
[6] E. Cattani, A. Dickenstein, B. Sturmfels, Computing multidimensional residues, in: T. Recio, L. Gonzalez (Eds.), Algorithms in Algebraic Geometry and Applications, Birkhäuser, Basel, 1996, pp. 135–164.
[7] S.D. Cohen, The distribution of Galois groups and Hilbert's irreducibility theorem, Proc. London Math. Soc. (3) 43 (2) (1981) 227–250.
[8] L. Csanky, Fast parallel matrix inversion algorithms, SIAM J. Comput. 5 (1976) 618–623.
[9] W. Fulton, Intersection Theory, Ergebnisse der Mathematik, 3 Folge Band 2, Springer, Berlin, 1984.
[10] C.B. García, W.I. Zangwill, Pathways to Solutions, Fixed Points, and Equilibria, Prentice-Hall, Englewood Cliffs, NJ, 1981.
[11] M. Giusti, J. Heintz, La détermination des points isolés et de la dimension d'une variété algébrique peut se faire en temps polynomial, in: Computational Algebraic Geometry and Commutative Algebra, Cortona, 1991, Symp. Mathematics XXXIV, Cambridge University Press, Cambridge, 1993, pp. 216–256.
[12] M. Giusti, J. Heintz, K. Hägele, J.E. Morais, L.M. Pardo, J.L. Montaña, Lower bounds for Diophantine approximations, J. Pure Appl. Algebra 117/118 (1997) 277–317.
[13] M. Giusti, J. Heintz, J.E. Morais, J. Morgenstern, L.M. Pardo, Straight-line programs in geometric elimination theory, J. Pure Appl. Algebra 124 (1–3) (1998) 101–146.
[14] M. Giusti, J. Heintz, J.E. Morais, L.M. Pardo, When polynomial equation systems can be "solved" fast? in: Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Paris, 1995, Lecture Notes in Computer Science, Vol. 948, Springer, Berlin, 1995, pp. 205–231.
[15] M. Giusti, J. Heintz, J.E. Morais, L.M. Pardo, Le rôle des structures de données dans les problèmes d'élimination, C.R. Acad. Sci. Paris Sér. I Math. 325 (11) (1997) 1223–1228.
[16] M. Giusti, J. Heintz, J. Sabia, On the efficiency of effective Nullstellensätze, Comput. Complexity 3 (1) (1993) 56–95.
[17] M. Giusti, G. Lecerf, B. Salvy, A Gröbner free alternative for polynomial system solving, J. Complexity 17 (1) (2001) 154–211.
[18] M. Giusti, É. Schost, Solving some overdetermined polynomial systems, in: Proc. 1999 Internat. Symp. on Symbolic and Algebraic Computation, Vancouver, BC, ACM, New York, 1999, pp. 1–8 (electronic).
[19] D.Yu. Grigoriev, Factorization of polynomials over a finite field and the solution of systems of algebraic equations, Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta 137 (1984) 20–79.
[20] R.C. Gunning, H. Rossi, Analytic Functions of Several Complex Variables, Prentice-Hall Inc., Englewood Cliffs, NJ, 1965.
[21] K. Hägele, J.E. Morais, L.M. Pardo, M. Sombra, On the intrinsic complexity of the arithmetic Nullstellensatz, J. Pure Appl. Algebra 146 (2) (2000) 103–183.
[22] R. Hartshorne, Algebraic Geometry, Springer, New York, 1977.
[23] J. Heintz, Definability and fast quantifier elimination in algebraically closed fields, Theoret. Comput. Sci. 24 (3) (1983) 239–277.
[24] J. Heintz, T. Krick, S. Puddu, J. Sabia, A. Waissbein, Deformation techniques for efficient polynomial equation solving, J. Complexity 16 (1) (2000) 70–109.
[25] J. Heintz, G. Matera, L.M. Pardo, R. Wachenchauzer, The intrinsic complexity of parametric elimination methods, Electron. J. SADIO 1 (1) (1998) 37–51 (electronic).
[26] J. Heintz, G. Matera, A. Waissbein, On the time-space complexity of geometric elimination procedures, Appl. Algebra Eng. Comm. Comput. 11 (4) (2001) 239–296.
[27] J. Heintz, C.P. Schnorr, Testing polynomials which are easy to compute, in: Logic and Algorithmic, Monograph. Enseign. Math 30, Univ. Genève, Geneva, 1982, pp. 237–254.
[28] E. Kaltofen, Polynomial-time reductions from multivariate to bi- and univariate integral polynomial factorization, SIAM J. Comput. 14 (2) (1985) 469–489.
[29] E. Kaltofen, Factorization of polynomials given by straight-line programs, in: S. Micali (Ed.), Randomness and Computation, Advances in Computing Research, Vol. 5, JAI Press Inc., Greenwich, CT, 1989, pp. 375–412.
[30] R. Kannan, A.K. Lenstra, L. Lovász, Polynomial factorization and nonrandomness of bits of algebraic and some transcendental numbers, Math. Comput. 50 (181) (1988) 235–250.
[31] T. Krick, L.M. Pardo, A computational method for Diophantine approximation, in: T. Recio, L. Gonzalez (Eds.), Algorithms in Algebraic Geometry and Applications, Birkhäuser, Basel, 1996, pp. 193–253.
[32] T. Krick, L.M. Pardo, M. Sombra, Sharp estimates for the arithmetic Nullstellensatz, Duke Math. J. 109 (3) (2001) 521–598.
[33] L. Kronecker, Grundzüge einer arithmetischen Theorie der algebraischen Grössen, J. Reine Angew. Math. 92 (1882) 1–122.
[34] G. Lecerf, Une alternative aux méthodes de réécriture pour la résolution des systèmes algébriques, Ph.D. Thesis, École polytechnique, Palaiseau, France, 2001.
[35] A.K. Lenstra, H.W. Lenstra Jr., L. Lovász, Factoring polynomials with rational coefficients, Math. Ann. 261 (4) (1982) 515–534.
[36] H. Matsumura, Commutative Algebra, 2nd Edition, Benjamin/Cummings Publishing Co., Inc., Reading, MA, 1980.
[37] B. Mourrain, V.Y. Pan, Solving special polynomial systems by using structured matrices and algebraic residues, in: F. Cucker, M. Shub (Eds.), Foundations of Computational Mathematics, Springer, Berlin, 1997, pp. 287–304.
[38] B. Mourrain, V.Y. Pan, Multivariate polynomials, duality, and structured matrices, J. Complexity 16 (1) (2000) 110–180.
[39] K. Mulmuley, A fast parallel algorithm to compute the rank of a matrix over an arbitrary field, Combinatorica 7 (1) (1987) 101–104.
[40] V.Y. Pan, Y. Rami, X. Wang, Structured matrices and Newton's iteration: unified approach, Linear Algebra Appl. 343/344 (2002) 233–265.
[41] L.M. Pardo, How lower and upper complexity bounds meet in elimination theory, in: Applied Algebra, Algebraic Algorithms and Error-Correcting Codes (Paris, 1995), Lecture Notes in Computer Science, Vol. 948, Springer, Berlin, 1995, pp. 33–69.
[42] L.M. Pardo, Universal elimination requires exponential running time (extended abstract), in: Proceedings EACA'2000, 2000, pp. 25–51.
[43] J.M. Rojas, Some speed-ups and speed limits for real algebraic geometry, J. Complexity 16 (3) (2000) 552–571.
[44] J. Sabia, P. Solernó, Bounds for traces in complete intersections and degrees in the Nullstellensatz, Appl. Algebra Eng. Comm. Comput. 6 (6) (1995) 353–376.
[45] J.T. Schwartz, Fast probabilistic algorithms for verification of polynomial identities, J. ACM 27 (4) (1980) 701–717.
[46] I.R. Shafarevich, Basic Algebraic Geometry, Vol. 1, 2nd Edition, Springer, Berlin, 1994.
[47] M. Shub, S. Smale, Complexity of Bézout's theorem I: geometric aspects, J. AMS 6 (2) (1993) 459–501.
[48] M. Shub, S. Smale, Complexity of Bézout's theorem II: volumes and probabilities, in: Proceedings of the Effective Methods in Algebraic Geometry, Progress of Mathematics, Vol. 109, Birkhäuser, Basel, 1993, pp. 267–285.
[49] M. Shub, S. Smale, Complexity of Bézout's theorem III: condition number and packing, J. Complexity 9 (1993) 4–14.
[50] M. Shub, S. Smale, Complexity of Bézout's theorem V: polynomial time, Theoret. Comput. Sci. 133 (1994) 141–164.
[51] M. Shub, S. Smale, Complexity of Bézout's theorem. IV. probability of success and extensions, SIAM J. Numer. Anal. 33 (1) (1996) 128–148.
[52] S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (N.S.) 4 (1) (1981) 1–36.
[53] A.J. Sommese, J. Verschelde, C.W. Wampler, Numerical decomposition of the solution sets of polynomial systems into irreducible components, SIAM J. Numer. Anal. 38 (6) (2001) 2022–2046 (electronic).
[54] A.J. Sommese, J. Verschelde, C.W. Wampler, Numerical irreducible decomposition using projections from points on the components, in: Symbolic Computation: Solving Equations in Algebra, Geometry, and Engineering, Contemporary Mathematics, Vol. 286, American Mathematical Society, Providence, RI, 2001, pp. 37–51.
[55] V. Strassen, Vermeidung von Divisionen, Crelle J. Reine Angew. Math. 264 (1973) 184–202.
[56] V. Strassen, Algebraic complexity theory, in: Handbook of Theoretical Computer Science, Vol. A, Elsevier, Amsterdam, 1990, pp. 633–672.
[57] B. Sturmfels, Solving Systems of Polynomial Equations, CBMS Regional Conference Series in Mathematics, Vol. 97, Published for the Conf. Board of the Mathematical Sciences, Washington, DC, 2002.
[58] J. Verschelde, Toric Newton method for polynomial homotopies, J. Symbolic Comput. 29 (4–5) (2000) 777–793.
[59] J. Verschelde, P. Verlinden, R. Cools, Homotopies exploiting Newton polytopes for solving sparse polynomial systems, SIAM J. Numer. Anal. 31 (3) (1994) 915–930.
[60] W. Vogel, Lectures Notes on Bézout's Theorem, in: Tata Lectures Notes, Vol. 74, Springer, Berlin, 1984.
[61] O. Zariski, P. Samuel, Commutative Algebra II, Graduate Texts in Mathematics, Vol. 39, Springer, Berlin, 1960.
[62] R. Zippel, Probabilistic algorithms for sparse polynomials, in: Proceedings EUROSAM'79, 1979, pp. 216–226.
[63] R. Zippel, Effective Polynomial Computation, Kluwer Academic Publishers, Dordrecht, 1993.
Theoretical Computer Science 315 (2004) 627 – 650
www.elsevier.com/locate/tcs
Parametrization of approximate algebraic curves by lines
Sonia Pérez-Díaz (a), Juana Sendra (b), J. Rafael Sendra (a,∗)
(a) Departamento de Matemáticas, Universidad de Alcalá, Facultad de Ciencias, Apartado de Correos 20, E-28871 Madrid, Spain
(b) Departamento de Matemáticas, Universidad Carlos III, E-28911 Madrid, Spain
Abstract
It is well known that irreducible algebraic plane curves having a singularity of maximum multiplicity are rational and can be parametrized by lines. In this paper, given a tolerance $\epsilon > 0$ and an $\epsilon$-irreducible algebraic plane curve $C$ of degree $d$ having an $\epsilon$-singularity of multiplicity $d - 1$, we provide an algorithm that computes a proper parametrization of a rational curve that is exactly parametrizable by lines. Furthermore, the error analysis shows that, under certain initial conditions that ensure that points are projectively well defined, the output curve lies within the offset region of $C$ at distance at most $2\sqrt{2}\,\epsilon^{1/(2d)}\exp(2)$.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Approximate algebraic curves; Rational parametrization; Hybrid symbolic-numeric methods
1. Introduction

Over the past several years, many authors have approached computer algebra problems by means of symbolic-numeric techniques. For instance, among others, methods have been developed for computing greatest common divisors of approximate polynomials (see [6,9,15,29]), for determining functional decomposition (see [10]), for testing primality (see [21]), for finding zeros of multivariate systems (see [9,16,18]), for factoring approximate polynomials (see [11,20,30,31]), and for the numerical computation of Gröbner bases (see [28,36]).
Authors partially supported by BMF2002-04402-C02-01, HU2001-0002 and GAIA II (IST-2002-35512).
∗ Corresponding author.
0304-3975/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2004.01.010
S. P erez-D )az et al. / Theoretical Computer Science 315 (2004) 627 – 650
Similarly, hybrid (i.e. symbolic and numeric) methods for the algorithmic treatment of algebraic curves and surfaces have been presented. For instance, the computation of singularities has been treated in [3,5,13,22,26], implicitization methods have been proposed in [12,14], and the numerical condition of implicitly given algebraic curves and surfaces has been analyzed (see [17]). Also, piecewise parametrizations are provided (see [11,23,19]) by combining algebraic and numerical techniques for solving differential equations with rational B-spline manipulations. However, although many authors have addressed the problem of globally and symbolically parametrizing algebraic curves and surfaces (see [1,24,25,32–34]), only a few results have been achieved for the case of approximate algebraic varieties.

The statement of the problem for the approximate case is slightly different from the classical symbolic parametrization question. Intuitively speaking, one is given an irreducible affine algebraic plane curve $C$, which may or may not be rational, and a tolerance $\epsilon > 0$, and the problem consists in computing a rational curve $\bar{C}$, and its parametrization, such that almost all points of the rational curve $\bar{C}$ are in the "vicinity" of $C$. The notion of vicinity may be introduced as the offset region limited by the external and internal offset to $C$ at distance $\epsilon$ (see Section 4 for more details, and [2] for basic concepts on offsets), and therefore the problem consists in finding, if possible, a rational curve $\bar{C}$ lying within the offset region of $C$. For instance, let us suppose that we are given a tolerance $\epsilon = 0.001$ and the quartic $C$ defined by
$$16.001 + 24.001x + 8y - 2y^2 + 12yx + 14.001x^2 + 2y^2x + x^2y + x^4 - y^3 + 6.001x^3.$$
Note that $C$ has genus 3, and therefore the input curve is not rational. Our method provides as an answer the quartic $\bar{C}$ defined by
$$16.008 + 24.012x + 8y - 2y^2 + 12yx + 14.006x^2 + 2y^2x + x^2y + x^4 - y^3 + 6.001x^3.$$
Now, it is easy to check that the new curve $\bar{C}$ has an affine triple point at $(-2, -2)$, and hence it is rational. Furthermore, it can be parametrized by
$$P(t) = (t^3 - 0.001 - t - 2t^2, \ t^4 + 1.999t - t^2 - 2t^3 - 2).$$
In Fig. 1 one may check that $C$ and $\bar{C}$ are close (see Example 2 in Section 3 for more details).

The notion of vicinity is geometric and, in general, it may be difficult to deduce it directly from the coefficients of the implicit equations, in the sense that two implicit equations $f_1$ and $f_2$ may satisfy that $\|f_1 - f_2\|$ is small and nevertheless define algebraic curves that are not close; i.e. neither of them lies in the vicinity of the other. For example, if we consider the line $f_1 = x + y$ and the conic $f_2 = x + y + \frac{1}{1000}x^2 + \frac{1}{1000}y^2 - \frac{1}{1000}$, we have that $\|f_1 - f_2\|_\infty = \frac{1}{1000}$. Nevertheless, the curves defined by $f_1$ and $f_2$ are not close. The problem of relating the tolerance with the vicinity notion may be approached either by analyzing locally the condition number of the implicit equations (see [17]) or
Fig. 1. Curve $C$ (left) and curve $\bar{C}$ (right).
studying whether for almost every point $P$ on the original curve, there exists a point $Q$ on the output curve such that the Euclidean distance between $P$ and $Q$ is significantly smaller than the tolerance. In this paper our error analysis is based on the second approach. From this fact, and using [17], one may derive upper bounds for the distance of the offset region.

In [4], the problem described above is studied for the case of approximate irreducible conics, rational cubics and quadrics, and the error analysis for the conic case is presented. In this paper, although we do not give an answer for the general case, we extend the results in [4] by showing how to solve the question for the special case of curves parametrizable by lines. More precisely, we provide an algorithm that parametrizes approximate irreducible algebraic curves of degree $d$ having an $\epsilon$-singularity of multiplicity $d - 1$ (see Section 2). We illustrate the results by some examples (see Section 3), and we analyze the numerical error, showing that the output rational curve lies within the offset region of the input perturbed curve at distance at most $2\sqrt{2}\,\epsilon^{1/(2d)}\exp(2)$ (see Section 4).

2. Numerical parametrization by lines

It is well known that irreducible algebraic curves having a singularity of maximum multiplicity are rational, and that they can be parametrized by lines. Examples of curves parametrizable by lines are irreducible conics, irreducible cubics with a double point, irreducible quartics with a triple point, etc. In this section, we show that this property is also true if one considers approximate irreducible algebraic curves that "almost" have a singularity of maximum multiplicity.

Before describing the method for the approximate case, and for reasons of completeness, we briefly recall here the algorithmic approach for symbolically parametrizing
curves having a singularity of maximum multiplicity. The geometric idea for this type of curve is to consider a pencil of lines passing through the singular point if the curve has degree bigger than 2, or through a simple point if the curve is a conic. In this situation, all but finitely many lines in the pencil intersect the original curve exactly at two different points: the base point of the pencil and a free point on the curve. The free intersection point depends rationally on the parameter defining the line, and it yields a rational parametrization of the curve. More precisely, the symbolic algorithm for parametrizing curves by lines (where the trivial case of lines is excluded) can be outlined as follows (see [33,34] for details):

Symbolic parametrization by lines
• Given an irreducible polynomial $f(x,y) \in K[x,y]$ ($K$ is an algebraically closed field of characteristic zero), defining an irreducible affine algebraic plane curve $C$ of degree $d > 1$, with a $(d-1)$-fold point if $d \geq 3$.
• Compute a rational parametrization $P(t) = (p_1(t), p_2(t))$ of $C$.
1. If $d = 2$ take a point $P$ on $C$, else determine the $(d-1)$-fold point $P$ of $C$.
2. If $P$ is at infinity, consider a linear change of variables such that $P$ is transformed into an affine point. Let $P = (a,b)$.
3. Compute
$$A(x,y,t) = \frac{\dfrac{1}{(d-1)!}\dfrac{\partial^{d-1} f}{\partial x^{d-1}} + \dfrac{t}{(d-2)!}\dfrac{\partial^{d-1} f}{\partial x^{d-2}\partial y} + \cdots + \dfrac{t^{d-1}}{(d-1)!}\dfrac{\partial^{d-1} f}{\partial y^{d-1}}}{\dfrac{1}{d!}\dfrac{\partial^{d} f}{\partial x^{d}} + \dfrac{t}{(d-1)!}\dfrac{\partial^{d} f}{\partial x^{d-1}\partial y} + \cdots + \dfrac{t^{d}}{d!}\dfrac{\partial^{d} f}{\partial y^{d}}}$$
and return $P(t) = (-A(P,t) + a,\; -tA(P,t) + b)$.

Remark. The parametrization can also be obtained as
$$P(t) = \left(\frac{-g_{d-1}(1,t)}{g_d(1,t)} + a,\; \frac{-t\,g_{d-1}(1,t)}{g_d(1,t)} + b\right),$$
where $g_d(x,y)$ and $g_{d-1}(x,y)$ are the homogeneous components of $g(x,y) = f(x+a, y+b)$ of degree $d$ and $d-1$, respectively. Observe that both components of $P(t)$ have the same denominator.

Now, we proceed to describe the method to parametrize approximate algebraic curves by lines. For this purpose, we distinguish between the conic case and the general case. The main difference between these two cases is that in the case of conics, if the approximate curve is irreducible, the rationality is preserved. As we will see, the results obtained for conics are similar to those presented in [4]. Afterwards, the ideas for the 2-degree case will be generalized to any degree, and therefore the results in [4] will be extended.

Throughout this section, we fix a tolerance $\varepsilon > 0$ and we use the polynomial $\infty$-norm; i.e. if $p(x,y) = \sum_{(i,j)\in I} a_{i,j}x^iy^j \in \mathbb{C}[x,y]$ then $\|p(x,y)\|$ is defined as $\max\{|a_{i,j}| : (i,j) \in I\}$. In particular, if $p(x,y)$ is a constant coefficient, $\|p(x,y)\|$ denotes its modulus.
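As a small illustration of the $\infty$-norm just fixed, the following sketch revisits the line $f_1 = x + y$ and the conic $f_2$ from the introduction. It assumes a dictionary representation of polynomials (keys `(i, j)` for the monomial $x^i y^j$), which is an illustrative choice and not part of the paper; the helper names are hypothetical.

```python
from math import sqrt

# Coefficients indexed by (i, j) for the monomial x^i y^j (illustrative representation).
f1 = {(1, 0): 1.0, (0, 1): 1.0}                        # line  x + y
f2 = {(1, 0): 1.0, (0, 1): 1.0,                        # conic x + y + (x^2 + y^2 - 1)/1000
      (2, 0): 0.001, (0, 2): 0.001, (0, 0): -0.001}

def inf_norm(p):
    """Polynomial infinity-norm: largest absolute value of a coefficient."""
    return max(abs(c) for c in p.values())

def diff(p, q):
    """Coefficient-wise difference p - q."""
    return {k: p.get(k, 0.0) - q.get(k, 0.0) for k in set(p) | set(q)}

def evaluate(p, x, y):
    """Value of the polynomial p at the point (x, y)."""
    return sum(c * x**i * y**j for (i, j), c in p.items())

coeff_gap = inf_norm(diff(f1, f2))

# Multiplying f2 by 1000 and completing squares, f2 = 0 is the circle
# (x + 500)^2 + (y + 500)^2 = 500001.
radius = sqrt(500001.0)
far_point = (-500.0 - radius / sqrt(2.0), -500.0 - radius / sqrt(2.0))

# Euclidean distance from far_point (a point of the conic) to the line x + y = 0.
geom_gap = abs(far_point[0] + far_point[1]) / sqrt(2.0)
```

Here `coeff_gap` is the coefficient distance $1/1000$, while `geom_gap` measures how far one particular point of the conic lies from the line, which is the sense in which the two curves are not close.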
2.1. Parametrization of approximate conics

Let $C$ be a conic defined by an $\varepsilon$-irreducible (over $\mathbb{C}$) polynomial $f(x,y) \in \mathbb{C}[x,y]$; that is, $f(x,y)$ cannot be expressed as $f(x,y) = g(x,y)h(x,y) + E(x,y)$ where $g, h, E \in \mathbb{C}[x,y]$ and $\|E(x,y)\| < \varepsilon\|f(x,y)\|$ (see for instance [11]). In particular, this implies that $f(x,y)$ is irreducible and therefore $C$ is rational. Thus, one may try to apply the symbolic parametrization algorithm to $C$. In order to do that one has to compute a simple point on $C$. Furthermore, one may check whether the simple point can be taken over $\mathbb{R}$ and, if possible, compute it. This can be done either symbolically, for instance introducing algebraic numbers with the techniques presented in [35], or numerically by root finding methods. If one works symbolically then the direct application of the algorithm will provide an exact answer. Let us assume that the simple point is approximated. For this purpose, we introduce the notion of $\varepsilon$-point.

Definition 1. We say that $\bar{P} = (\bar{a}, \bar{b}) \in \mathbb{C}^2$ is an $\varepsilon$-affine point of an algebraic plane curve $C$ defined by a polynomial $f(x,y) \in \mathbb{C}[x,y]$ if it holds that
$$\frac{|f(\bar{P})|}{\|f(x,y)\|} < \varepsilon,$$
that is, $\bar{P}$ is a simple point on $C$ computed under fixed precision $\varepsilon\|f(x,y)\|$.

Note that we require the relative error w.r.t. $\|f(x,y)\|$ because for any non-zero complex number $\lambda$ the polynomial $\lambda f(x,y)$ also defines $C$.

In this situation, let $\bar{P} = (\bar{a}, \bar{b})$ be an $\varepsilon$-affine point of $C$, and let us consider the conic $\bar{C}$ defined by the polynomial
$$\bar{f}(x,y) = f(x,y) - f(\bar{P}).$$
Now, $\bar{P}$ is really a point on $\bar{C}$. Furthermore, $\bar{C}$ is irreducible. Indeed, if $\bar{f}$ factors as $\bar{f} = \bar{g}\bar{h}$ then $f = \bar{g}\bar{h} + f(\bar{P})$ and $|f(\bar{P})| < \varepsilon\|f(x,y)\|$, that is, $f$ is not $\varepsilon$-irreducible, which is impossible. Therefore, we have constructed a rational conic, namely $\bar{C}$, on which we know a simple point, namely $\bar{P}$. Hence, we may directly apply the symbolic algorithm to $\bar{C}$ to get the rational parametrization
$$\bar{P}(t) = (-A(\bar{P},t) + \bar{a},\; -tA(\bar{P},t) + \bar{b}),$$
where
$$A(x,y,t) = \frac{\dfrac{\partial f}{\partial x} + t\,\dfrac{\partial f}{\partial y}}{\dfrac{1}{2!}\dfrac{\partial^2 f}{\partial x^2} + t\,\dfrac{\partial^2 f}{\partial x\,\partial y} + \dfrac{t^2}{2!}\dfrac{\partial^2 f}{\partial y^2}}.$$
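The conic construction can be sketched in a few lines of exact rational arithmetic. The example below is only an illustration under stated assumptions: the circle $x^2 + y^2 - 1$, the tolerance $1/1000$, and the approximate point $(3/5, 8001/10000)$ are hypothetical inputs chosen here, and polynomials are stored as dictionaries keyed by exponent pairs.

```python
from fractions import Fraction as F

# Conic f(x, y) = x^2 + y^2 - 1 stored as {(i, j): coefficient of x^i y^j}.
f = {(2, 0): F(1), (0, 2): F(1), (0, 0): F(-1)}
eps = F(1, 1000)                       # hypothetical tolerance
P_bar = (F(3, 5), F(8001, 10000))      # hypothetical approximate point near (0.6, 0.8)

def evaluate(p, x, y):
    """Value of the polynomial p at (x, y)."""
    return sum(c * x**i * y**j for (i, j), c in p.items())

def inf_norm(p):
    """Largest absolute value of a coefficient."""
    return max(abs(c) for c in p.values())

def is_eps_point(p, pt, eps):
    """Definition 1: |f(P)| / ||f|| < eps."""
    return abs(evaluate(p, *pt)) / inf_norm(p) < eps

# f_bar = f - f(P_bar): only the constant term changes.
f_bar = dict(f)
f_bar[(0, 0)] = f.get((0, 0), F(0)) - evaluate(f, *P_bar)

def conic_point(p, pt, t):
    """P(t) = (-A + a, -t*A + b), with A built from the derivatives of p at pt."""
    a, b = pt
    c = lambda i, j: p.get((i, j), F(0))
    fx = 2 * c(2, 0) * a + c(1, 1) * b + c(1, 0)      # df/dx at pt
    fy = c(1, 1) * a + 2 * c(0, 2) * b + c(0, 1)      # df/dy at pt
    den = c(2, 0) + t * c(1, 1) + t * t * c(0, 2)     # f_xx/2! + t f_xy + t^2 f_yy/2!
    A = (fx + t * fy) / den
    return (-A + a, -t * A + b)
```

With these inputs, `conic_point(f_bar, P_bar, t)` returns a point that lies exactly on the perturbed conic $\bar{f} = 0$ for every rational slope `t`, since the arithmetic is exact.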
2.2. Parametrization of approximate curves

In this subsection we deal with approximate curves of degree bigger than 2. In this case, the main difficulty is that the given approximate algebraic curve is, in general, non-rational even though it might correspond to the perturbation of a rational curve. The idea to solve the problem is to generalize the construction done for conics. For
this purpose, we observe that the output curve in the 2-degree case is the original polynomial minus its Taylor expansion up to order 1 at the $\varepsilon$-point, i.e. the evaluation of the polynomial at the point (so "up to order $k$" refers to the terms of order strictly less than $k$). We will see that for curves of degree $d$ having "almost" a singularity of multiplicity $d-1$ one may subtract from the original polynomial its Taylor expansion up to order $d-1$ at the quasi-singularity to get a rational curve close to the given one. To be more precise, we first introduce the notion of $\varepsilon$-singularity.

Definition 2. We say that $\bar{P} = (\bar{a}, \bar{b}) \in \mathbb{C}^2$ is an $\varepsilon$-affine singularity of multiplicity $r$ of an algebraic plane curve defined by a polynomial $f(x,y) \in \mathbb{C}[x,y]$ if, for $0 \leq i+j \leq r-1$, it holds that
$$\frac{\left|\dfrac{\partial^{i+j} f}{\partial x^i\,\partial y^j}(\bar{P})\right|}{\|f(x,y)\|} < \varepsilon.$$

Note that an $\varepsilon$-singularity of multiplicity 1 is an $\varepsilon$-point on the curve. Similarly, one may introduce the corresponding notion for $\varepsilon$-singularities at infinity. However, here we will work only with $\varepsilon$-affine singularities, taking into account that the user can always prepare the input, by means of a suitable linear change of coordinates, in order to be in the affine case. Alternatively, one may also use the method described in [9].

In this situation, we denote by $L^{\varepsilon}_d$ the set of all $\varepsilon$-irreducible (over $\mathbb{C}$) real algebraic curves of degree $d$ having an $\varepsilon$-singularity of multiplicity $d-1$, that we assume is real. In the previous subsection we have seen how to parametrize by lines elements in $L^{\varepsilon}_2$. In the following, we assume that $d > 2$ and we show that elements in $L^{\varepsilon}_d$ can also be parametrized by lines.

In order to check whether a given curve $C$ of degree $d$, defined by a polynomial $f(x,y)$, belongs to $L^{\varepsilon}_d$, one has to check the $\varepsilon$-irreducibility of $f(x,y)$ as well as the existence of an $\varepsilon$-singularity of multiplicity $d-1$. For this purpose, to analyze the $\varepsilon$-irreducibility, one may use any of the existing algorithms (e.g. [11,21,20,31]). The algorithm given in [11] has polynomial complexity.
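The condition in Definition 2 is straightforward to test numerically once a candidate point is known. The sketch below does so for the quartic and the point reported in Example 3 of Section 3, in double precision. The dictionary representation and the function names are illustrative assumptions, and the code does not address $\varepsilon$-irreducibility or how the candidate point is found.

```python
from math import perm

def inf_norm(p):
    """Polynomial infinity-norm: largest absolute value of a coefficient."""
    return max(abs(c) for c in p.values())

def partial_at(p, i, j, x, y):
    """Value at (x, y) of the partial derivative d^(i+j) p / dx^i dy^j."""
    total = 0.0
    for (m, n), c in p.items():
        if m >= i and n >= j:
            # perm(m, i) = m! / (m - i)! is the factor produced by differentiating x^m i times.
            total += c * perm(m, i) * perm(n, j) * x**(m - i) * y**(n - j)
    return total

def is_eps_singularity(p, pt, r, eps):
    """Definition 2: all partial derivatives of order 0..r-1 at pt are below eps * ||p||."""
    bound = eps * inf_norm(p)
    return all(abs(partial_at(p, i, k - i, *pt)) < bound
               for k in range(r) for i in range(k + 1))

# Quartic of Example 3, keys (i, j) standing for x^i y^j.
quartic = {(4, 0): 1.0, (0, 4): 2.0, (3, 0): 1.001, (2, 1): 3.0, (1, 2): -1.0,
           (0, 3): -3.0, (0, 2): 0.00001, (1, 0): -0.001, (0, 1): -0.001, (0, 0): -0.001}
P_bar = (-0.1248595915e-6, 0.1249844199e-6)

triple = is_eps_singularity(quartic, P_bar, 3, 0.001)
```

For this input `triple` evaluates to `True`, matching the claim that $\bar{P}$ is a 3-fold $\varepsilon$-singularity for $\varepsilon = 0.001$; a point far from the origin such as $(1,1)$ fails the test already at order 0.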
Although the algorithm given in [21] has exponential complexity, in practice it performs very well. Furthermore, algorithms in [20,31] provide improvements to the methods described in [21].

For checking the existence and computation of $\varepsilon$-singularities of multiplicity $d-1$ one has to solve the system of algebraic equations
$$\frac{\partial^{i+j} f}{\partial x^i\,\partial y^j}(x,y) = 0, \qquad i+j = 0, \ldots, d-2,$$
under fixed precision $\varepsilon\cdot\|f(x,y)\|$, by applying root finding techniques (see [9,22,26,27]). Nevertheless, one may accelerate the computation by reducing the number of equations and degrees involved in the system. More precisely, for some $i_0, j_0, i_1, j_1$ such that $i_0 + j_0 = i_1 + j_1 = d-2$, one computes the solutions of the system
$$\frac{\partial^{i_0+j_0} f}{\partial x^{i_0}\,\partial y^{j_0}}(x,y) = \frac{\partial^{i_1+j_1} f}{\partial x^{i_1}\,\partial y^{j_1}}(x,y) = 0,$$
Fig. 2. Real part of the curve C.
under fixed precision $\varepsilon\|f(x,y)\|$. Note that the two equations involved are quadratic. For this purpose, one may use well-known methods (see for instance [9,22,26,27]). Once these solutions have been approximated, one may proceed as follows: if any of the roots obtained above, say $\bar{P}$, satisfies that
$$\left|\frac{\partial^{i+j} f}{\partial x^i\,\partial y^j}(\bar{P})\right| \leq \varepsilon\|f(x,y)\|, \qquad i+j = 0, \ldots, d-3,$$
then $\bar{P}$ is an $\varepsilon$-singularity of multiplicity $d-1$; otherwise, $C$ does not have $\varepsilon$-singularities of multiplicity $d-1$.

As an example (see Example 3 in Section 3), let $\varepsilon = 0.001$, and let $C$ be the real $\varepsilon$-irreducible quartic defined by
$$f(x,y) = x^4 + 2y^4 + 1.001x^3 + 3x^2y - y^2x - 3y^3 + 0.00001y^2 - 0.001x - 0.001y - 0.001.$$
Applying the process described above one gets that $C$ has a 3-fold $\varepsilon$-singularity at $\bar{P} = (-0.1248595915\cdot 10^{-6},\; 0.1249844199\cdot 10^{-6})$. Fig. 2 shows the plot of the real part of $C$, and one sees that $\bar{P}$ is "almost" a triple point of the curve.

As an alternative to the approach described above, one may use the techniques presented in [5] in combination with the Gap Theorem (see [8]) and the Test Criterion.

Now, in order to parametrize the approximate algebraic curve $C \in L^{\varepsilon}_d$ we consider a pencil of lines $H_t$ passing through the $\varepsilon$-singularity $\bar{P} = (\bar{a}, \bar{b})$ of multiplicity $d-1$. That is, $H_t$ is defined by the polynomial $H_t(x,y,t) = y - tx - \bar{b} + \bar{a}t$. If $\bar{P}$ had been really a singularity, then the above symbolic algorithm would have output the parametrization $(\bar{p}_1(t), \bar{p}_2(t)) \in \mathbb{R}(t)^2$, where $\bar{p}_1(t)$ is the root in $\mathbb{R}(t)$ of
the polynomial
$$\frac{f(x,\, tx + \bar{b} - \bar{a}t)}{(x-\bar{a})^{d-1}}$$
and $\bar{p}_2(t) = t\,\bar{p}_1(t) + \bar{b} - t\bar{a}$. However, in our case $\bar{P}$ is not a singularity but an $\varepsilon$-singularity. Then, the idea consists in computing the root in $\mathbb{R}(t)$ of the quotient of $f(x, tx + \bar{b} - \bar{a}t)$ and $(x-\bar{a})^{d-1}$ w.r.t. $x$ (note that $\deg_x(f(x, tx + \bar{b} - \bar{a}t)) = d$, and therefore the quotient has degree 1 in $x$), say $\bar{p}_1(t)$, to finally consider $\bar{P}(t) = (\bar{p}_1(t),\, t\,\bar{p}_1(t) + \bar{b} - t\bar{a})$ as approximate parametrization of $C$. In the next lemma we prove that $\bar{P}(t)$ is really a rational parametrization, and in Section 4 we will see that the error analysis shows that this construction generates a rational curve close to the original one.

Lemma 1. Let $f(x,y)$ be the implicit equation of a curve $C \in L^{\varepsilon}_d$ and let $\bar{P} = (\bar{a}, \bar{b})$ be the $\varepsilon$-singularity of multiplicity $d-1$ of $C$. Let $\bar{p}_1(t)$ be the root in $\mathbb{R}(t)$ of the quotient of $f(x, tx + \bar{b} - \bar{a}t)$ and $(x-\bar{a})^{d-1}$, and let $\bar{p}_2(t) = t\,\bar{p}_1(t) + \bar{b} - t\bar{a}$. Then $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$ is a rational parametrization.

Proof. To prove the lemma one has to show that at least one of the components of $\bar{P}(t)$ is not a constant. Let $g(x,t) = f(x, tx + \bar{b} - \bar{a}t)$. We see that $\bar{p}_1(t) \neq \bar{a}$. Indeed, if $\bar{p}_1(t) = \bar{a}$, since $\bar{p}_1(t)$ is the root of the quotient of $g(x,t)$ and $(x-\bar{a})^{d-1}$, one has that $g(x,t) = \lambda(x-\bar{a})^d + R(t)$, where $\lambda \in \mathbb{R}^{*}$ and $R(t) \in \mathbb{R}(t)$. Moreover, since $R(t)$ is the remainder and $(x-\bar{a})^{d-1}$ is monic in $x$, one has that $R(t)$ is a polynomial. Let us say that $R(t) = a_s t^s + \cdots + a_0$, with $a_s \neq 0$. Thus,
$$f(x,y) = g\!\left(x, \frac{y-\bar{b}}{x-\bar{a}}\right) = \lambda(x-\bar{a})^d + \frac{a_s(y-\bar{b})^s + a_{s-1}(y-\bar{b})^{s-1}(x-\bar{a}) + \cdots + a_0(x-\bar{a})^s}{(x-\bar{a})^s}.$$
However, if $s > 0$ this implies that $(x-\bar{a})$ divides $a_s(y-\bar{b})^s$, which is impossible because $a_s \neq 0$. Hence $s = 0$; i.e. $R(t)$ is a constant $\mu$. That is, $f(x,y) = \lambda(x-\bar{a})^d + \mu$. Therefore, since $f(x,y)$ is a univariate polynomial of degree bigger than 1, it is reducible and hence it is not $\varepsilon$-irreducible, which is impossible.

Lemma 2. The parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$ in Lemma 1 is proper.

Proof. Note that $t = (\bar{p}_2 - \bar{b})/(\bar{p}_1 - \bar{a})$. Thus, $\bar{P}(t)$ is proper and its inverse is $(y-\bar{b})/(x-\bar{a})$.

In the next lemma, for $P \in \mathbb{R}^2$ and $\delta > 0$, we denote by $D(P,\delta)$ the Euclidean disk
$$D(P,\delta) = \{(x,y) \in \mathbb{R}^2 \;:\; \|(x,y) - P\|_2 \leq \delta\}.$$
Lemma 3. Let $C$ be an affine algebraic curve, defined by a polynomial $f(x,y) \in \mathbb{R}[x,y]$, having a real $\varepsilon$-singularity $\bar{P}$ of multiplicity $r$. Then, there exists $\delta > 0$ such that any point $Q \in D(\bar{P}, \delta)$ is also an $\varepsilon$-singularity of multiplicity $r$ of $C$.

Proof. We denote by $f_{i,j}$ the partial derivative $\partial^{i+j} f/\partial x^i\,\partial y^j$. Since $\bar{P}$ is an $\varepsilon$-singularity of multiplicity $r$, for $i+j = 1, \ldots, r-1$, it holds that $|f_{i,j}(\bar{P})| < \varepsilon\|f(x,y)\|$. Let us denote $|f_{i,j}(\bar{P})| = \varepsilon_{i,j}$ for $i+j = 1, \ldots, r-1$. Then, for each $i,j$ there exists $\eta_{i,j} > 0$ such that
$$\varepsilon_{i,j} = \varepsilon\|f(x,y)\| - \eta_{i,j} < \varepsilon\|f(x,y)\|.$$
We consider $\eta = \min\{\eta_{i,j} : i+j = 1, \ldots, r-1\}$ (note that $\eta > 0$). On the other hand, since all partial derivatives are continuous, let $M$ bound all partial derivatives up to order $r$ in the compact set $D(\bar{P}, \eta)$, and let $\delta$ be strictly smaller than $\min\{\eta/(2M), \eta\}$; note that $M > 0$ since otherwise it would imply that $C$ contains a disk of points, which is impossible. Now, take $Q \in D(\bar{P}, \delta)$. Then, by applying the Mean Value Theorem, we have that for $i+j = 1, \ldots, r-1$
$$|f_{i,j}(Q)| \leq |f_{i,j}(\bar{P})| + |f_{i,j}(\bar{P}) - f_{i,j}(Q)| \leq \varepsilon_{i,j} + |\nabla f_{i,j}(\xi_{i,j}) \cdot (\bar{P} - Q)^T|,$$
where $\xi_{i,j}$ is on the segment joining $Q$ and $\bar{P}$. Then, one concludes that
$$|f_{i,j}(Q)| \leq \varepsilon\|f(x,y)\| - \eta_{i,j} + 2M\delta \leq \varepsilon\|f(x,y)\| - \eta + 2M\delta < \varepsilon\|f(x,y)\|.$$
Therefore, $Q$ is an $\varepsilon$-singularity of multiplicity $r$ of $C$.

Now, let $C \in L^{\varepsilon}_d$ be defined by the polynomial $f(x,y)$. Then by Lemma 3, one deduces that $C$ has infinitely many $(d-1)$-fold $\varepsilon$-singularities. For our purposes, we are interested in choosing the singularity appropriately. More precisely, we say that $\bar{P} = (\bar{a}, \bar{b})$ is a proper $(d-1)$-fold $\varepsilon$-singularity of $C$ if the polynomial
$$\sum_{j_1+j_2=d-1}^{d} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(\bar{P})\,(x-\bar{a})^{j_1}(y-\bar{b})^{j_2}$$
is irreducible over $\mathbb{C}$. Note that this is always possible because a small perturbation of the coefficients of a polynomial transforms it into an irreducible polynomial.

The following theorem shows that the implicit equation of the rational curve defined by the parametrization generated by the above process can also be obtained, as in the conic case, by Taylor expansions at the $\varepsilon$-singularity. In fact, the theorem includes as a particular case the result for conics. This result will avoid quotient computations and will be used to analyze the error.

Theorem 1. Let $f(x,y)$ be the implicit equation of a curve $C \in L^{\varepsilon}_d$ and let $\bar{P} = (\bar{a}, \bar{b})$ be a proper $\varepsilon$-singularity of multiplicity $d-1$ of $C$. Let $\bar{p}_1(t)$ be the root in $\mathbb{R}(t)$ of the quotient of $f(x, tx + \bar{b} - \bar{a}t)$ and $(x-\bar{a})^{d-1}$, and let $\bar{p}_2(t) = t\,\bar{p}_1(t) + \bar{b} - t\bar{a}$. Then the implicit equation of the rational curve $\bar{C}$ defined by the parametrization
$\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$ is
$$\bar{f}(x,y) = f(x,y) - T(x,y),$$
where $T(x,y)$ is the Taylor expansion up to order $d-1$ of $f(x,y)$ at $\bar{P}$.

Proof. Let
$$f(x,y) = f(\bar{P}) + \sum_{j_1+j_2=1}^{d} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(\bar{P})\,(x-\bar{a})^{j_1}(y-\bar{b})^{j_2}$$
be the Taylor expansion of $f(x,y)$ at $\bar{P}$. Thus,
$$f(x, tx + \bar{b} - t\bar{a}) = f(\bar{P}) + \sum_{j_1+j_2=1}^{d} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(\bar{P})\,(x-\bar{a})^{j_1+j_2}\,t^{j_2}$$
$$= (x-\bar{a})^{d-1} \sum_{j_1+j_2=d-1}^{d} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(\bar{P})\,(x-\bar{a})^{j_1+j_2-d+1}\,t^{j_2} + \left( f(\bar{P}) + \sum_{j_1+j_2=1}^{d-2} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(\bar{P})\,(x-\bar{a})^{j_1+j_2}\,t^{j_2} \right)$$
$$= (x-\bar{a})^{d-1} M(x,t) + N(x,t),$$
where
$$N(x,t) = T(x, tx + \bar{b} - t\bar{a}), \qquad M(x,t) = \frac{S(x, tx + \bar{b} - t\bar{a})}{(x-\bar{a})^{d-1}},$$
and $S(x,y)$ is the Taylor expansion from order $d-1$ up to order $d$ at $\bar{P}$. We observe that $\deg_x(M) = 1$ and $\deg_x(N) \leq d-2$. On the other hand, let $U(x,t)$ and $V(x,t)$ be the quotient and the remainder of $f(x, tx + \bar{b} - t\bar{a})$ and $(x-\bar{a})^{d-1}$ w.r.t. $x$, respectively. Then,
$$f(x, tx + \bar{b} - t\bar{a}) = (x-\bar{a})^{d-1}U(x,t) + V(x,t)$$
with $\deg_x(V) \leq d-2$. Therefore,
$$(x-\bar{a})^{d-1}\,(M(x,t) - U(x,t)) = V(x,t) - N(x,t).$$
Thus, since the degree w.r.t. $x$ of $V - N$ is smaller than or equal to $d-2$, and $(x-\bar{a})^{d-1}$ divides $V - N$, one gets that $M = U$ and $V = N$. In this situation,
$$\bar{f}(\bar{P}(t)) = f(\bar{P}(t)) - T(\bar{P}(t)) = f(\bar{p}_1(t),\, t\,\bar{p}_1(t) + \bar{b} - t\bar{a}) - T(\bar{P}(t))$$
$$= (\bar{p}_1(t) - \bar{a})^{d-1}\,U(\bar{p}_1(t), t) + N(\bar{p}_1(t), t) - T(\bar{P}(t)) = T(\bar{P}(t)) - T(\bar{P}(t)) = 0.$$
Moreover, since $\bar{P}$ is a proper $\varepsilon$-singularity of multiplicity $d-1$ of $C$, one has that $\bar{f}$ is irreducible, and thus $\bar{P}(t)$ parametrizes $\bar{C}$.
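Theorem 1 suggests a direct way to obtain $\bar{f}$: expand $f$ around $\bar{P}$, discard the terms of order at most $d-2$, and expand back. The following sketch does this in exact rational arithmetic and reproduces the output curve of Example 6 in Section 3 from its input curve and the point $(-0.99, 0)$. The dictionary representation and the helper names `shift` and `subtract_taylor` are illustrative assumptions, not the authors' implementation.

```python
from fractions import Fraction as F
from math import comb

def shift(p, a, b):
    """Coefficients of p(x + a, y + b), for p given as {(i, j): coeff of x^i y^j}."""
    out = {}
    for (i, j), c in p.items():
        for k in range(i + 1):
            for l in range(j + 1):
                # Binomial expansion of (x + a)^i (y + b)^j.
                term = c * comb(i, k) * a**(i - k) * comb(j, l) * b**(j - l)
                out[(k, l)] = out.get((k, l), F(0)) + term
    return out

def subtract_taylor(p, a, b):
    """f_bar = f - T: drop the Taylor terms of order <= d - 2 at (a, b)."""
    d = max(i + j for i, j in p)
    g = shift(p, a, b)                                     # expansion around (a, b)
    kept = {k: c for k, c in g.items() if sum(k) >= d - 1}
    back = shift(kept, -a, -b)                             # back to the variables x, y
    return {k: c for k, c in back.items() if c != 0}

# Input cubic of Example 6.
cubic = {(3, 0): F(1), (2, 1): F(1), (2, 0): F(1), (1, 2): F(1), (0, 3): F(1),
         (0, 2): F(1), (1, 0): F("-0.999990"), (0, 1): F("-0.999980"),
         (0, 0): F("-0.9999600")}

f_bar = subtract_taylor(cubic, F("-0.99"), F(0))
```

The resulting `f_bar` has linear and constant coefficients $-0.9603$, $-0.9801$ and $-0.960498$, in agreement with the output curve listed in Example 6, and no $xy$ term.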
This result can be applied to derive an algorithm for parametrizing approximate algebraic curves by lines that is similar to the symbolic algorithm.

Numerical parametrization by lines
• Given the defining polynomial $f(x,y)$ of $C \in L^{\varepsilon}_d$, $d \geq 2$.
• Compute a rational parametrization $\bar{P}(t)$ of a rational curve $\bar{C}$ close to $C$.
1. If $d = 2$ compute an affine $\varepsilon$-point $\bar{P}$ of $C$, else compute a proper $\varepsilon$-singularity $\bar{P}$ of $C$ of multiplicity $d-1$.
2. Compute $\bar{f}(x,y) = f(x,y) - T(x,y)$, where $T(x,y)$ is the Taylor expansion of $f(x,y)$ up to order $d-1$ at $\bar{P}$.
3. Apply step 3 of the symbolic algorithm to $\bar{f}$ and $\bar{P}$.

3. Examples

In this section, we illustrate the numerical parametrization algorithm developed in Section 2 by some examples where one can check that the output rational curve $\bar{C}$ is close to the original curve $C$. This behavior will be clarified in the error analysis section. We give one example in detail, where we explain how the algorithm is performed, and we summarize seven other examples in different tables. In these tables we show the input curve $C$, the tolerance $\varepsilon$ considered, the $\varepsilon$-singularity, the output curve $\bar{C}$, the output parametrization $\bar{P}(t)$ defining the curve $\bar{C}$, and a figure representing $C$ and $\bar{C}$.

Example 1. We consider $\varepsilon = 0.001$ and the curve $C$ of degree 6 defined by the polynomial
$$f(x,y) = y^6 + x^6 + 2yx^4 - 2y^4x + 10^{-3}x + 10^{-3}y + 2\cdot 10^{-3} + 10^{-3}x^4.$$
First of all, by applying the algorithm developed in [11], we observe that the polynomial $f(x,y)$ is $\varepsilon$-irreducible. Now, we apply the first step of the algorithm Numerical Parametrization by Lines, and we compute the $\varepsilon$-singularity. For this purpose, we determine the solutions of the system (see [9,27])
$$\frac{\partial^4 f}{\partial x^4}(x,y) = \frac{\partial^4 f}{\partial y^4}(x,y) = 0,$$
under fixed precision $\varepsilon\|f(x,y)\| = 0.002$. We get four solutions:
$$\bar{P}_1 = (-0.06650062380 + 0.1157587268\,I,\; 0.06683312414 + 0.1154704132\,I),$$
$$\bar{P}_2 = (-0.06650062380 - 0.1157587268\,I,\; 0.06683312414 - 0.1154704132\,I),$$
$$\bar{P}_3 = (0.1875000000\cdot 10^{-5},\; -0.50000002\cdot 10^{-3}),$$
$$\bar{P}_4 = (0.1329993725,\; -0.1331662483).$$
Only the root $\bar{P}_3$ satisfies that
$$\left|\frac{\partial^{i+j} f}{\partial x^i\,\partial y^j}(\bar{P}_3)\right| \leq 0.002, \qquad i+j = 0, \ldots, 3.$$
Then $\bar{P} = \bar{P}_3 = (0.1875000000\cdot 10^{-5},\; -0.50000002\cdot 10^{-3})$ is an $\varepsilon$-singularity of multiplicity 5, and therefore $C \in L^{0.001}_6$. Applying the second step of the algorithm Numerical Parametrization by Lines, we compute
$$\bar{f}(x,y) = f(x,y) - T(x,y),$$
where $T(x,y)$ is the Taylor expansion of $f(x,y)$ up to order 5 at $\bar{P}$:
$$T(x,y) = 0.001000000000x + 0.0010000000000y + 0.1000000173\cdot 10^{-8}yx + 0.1300000000\cdot 10^{-10}x^4 + 0.7500000034\cdot 10^{-8}x^3 - 0.2499999700\cdot 10^{-8}y^3 + 0.4000000160\cdot 10^{-2}xy^3 + 0.1500000000\cdot 10^{-4}x^3y - 0.2109375027\cdot 10^{-13}x^2 + 0.3000000000\cdot 10^{-12}y^4 - 0.2812500001\cdot 10^{-11}y^2 - 0.4218750000\cdot 10^{-10}yx^2 + 0.3000000240\cdot 10^{-5}y^2x + 0.2000000000\cdot 10^{-2}.$$
One gets the curve $\bar{C}$ defined by
$$\bar{f}(x,y) = -0.1250000464\cdot 10^{-12}x + 0.1125000100\cdot 10^{-14}y + 0.9999999873\cdot 10^{-3}x^4 + 2yx^4 - 2y^4x - 0.1000000173\cdot 10^{-8}yx + y^6 + x^6 - 0.7500000036\cdot 10^{-8}x^3 + 0.2499999700\cdot 10^{-8}y^3 + 0.2109375029\cdot 10^{-13}x^2 - 0.3000000180\cdot 10^{-12}y^4 + 0.2812500000\cdot 10^{-11}y^2 - 0.1500000000\cdot 10^{-4}x^3y - 0.4000000160\cdot 10^{-2}xy^3 - 0.3000000240\cdot 10^{-5}y^2x + 0.4218750000\cdot 10^{-10}yx^2 + 0.1562500311\cdot 10^{-18}.$$
Now, we apply step 3 of the symbolic algorithm to $\bar{f}$ and $\bar{P}$. Thus, we compute
$$A(x,y,t) = \frac{\dfrac{1}{5!}\dfrac{\partial^5 \bar{f}}{\partial x^5} + \dfrac{t}{4!}\dfrac{\partial^5 \bar{f}}{\partial x^4\,\partial y} + \cdots + \dfrac{t^5}{5!}\dfrac{\partial^5 \bar{f}}{\partial y^5}}{\dfrac{1}{6!}\dfrac{\partial^6 \bar{f}}{\partial x^6} + \dfrac{t}{5!}\dfrac{\partial^6 \bar{f}}{\partial x^5\,\partial y} + \cdots + \dfrac{t^6}{6!}\dfrac{\partial^6 \bar{f}}{\partial y^6}} = \frac{6x + 2.000000000t - 2.000000000t^4 + 6yt^5}{1 + t^6}$$
and we return
$$\bar{P}(t) = (-A(\bar{P},t) + 0.1875000000\cdot 10^{-5},\; -tA(\bar{P},t) - 0.50000002\cdot 10^{-3}) = (\bar{p}_1(t), \bar{p}_2(t)),$$
where
$$\bar{p}_1(t) = \frac{-2.000000000t + 0.3000000120\cdot 10^{-2}t^5 + 0.1875000000\cdot 10^{-5}t^6}{1+t^6} + \frac{2.000000000t^4 - 0.9375000000\cdot 10^{-5}}{1+t^6}$$
and
$$\bar{p}_2(t) = \frac{-0.4887500200\cdot 10^{-3} - 2.000000000t^4 - 0.3000000120\cdot 10^{-2}t^5}{1+t^6} + \frac{2.000000000t - 0.5000000200\cdot 10^{-3}t^6}{1+t^6}.$$
See Fig. 3 to compare the input curve and the rational output curve.
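The first component of the parametrization in Example 1 can be cross-checked with the formula of the Remark in Section 2, $\bar{p}_1(t) = -g_{d-1}(1,t)/g_d(1,t) + \bar{a}$, where $g_{d-1}$ and $g_d$ are the top two homogeneous components of $f(x + \bar{a}, y + \bar{b})$. The sketch below is an illustration in exact rational arithmetic; the dictionary representation and the function names are assumptions made here.

```python
from fractions import Fraction as F
from math import comb

def shift(p, a, b):
    """Coefficients of p(x + a, y + b), for p given as {(i, j): coeff of x^i y^j}."""
    out = {}
    for (i, j), c in p.items():
        for k in range(i + 1):
            for l in range(j + 1):
                term = c * comb(i, k) * a**(i - k) * comb(j, l) * b**(j - l)
                out[(k, l)] = out.get((k, l), F(0)) + term
    return out

def first_component(p, a, b):
    """Return (num, den): coefficient lists in t (lowest degree first) of p1 = num / den."""
    d = max(i + j for i, j in p)
    g = shift(p, a, b)
    den = [g.get((d - j, j), F(0)) for j in range(d + 1)]            # g_d(1, t)
    sub = [g.get((d - 1 - j, j), F(0)) for j in range(d)] + [F(0)]   # g_{d-1}(1, t), padded
    num = [-s + a * q for s, q in zip(sub, den)]                     # -g_{d-1} + a * g_d
    return num, den

def p1_value(num, den, t):
    """Evaluate p1(t) from its coefficient lists."""
    return sum(c * t**k for k, c in enumerate(num)) / sum(c * t**k for k, c in enumerate(den))

# Input sextic of Example 1 and the epsilon-singularity found there.
sextic = {(0, 6): F(1), (6, 0): F(1), (4, 1): F(2), (1, 4): F(-2),
          (1, 0): F("0.001"), (0, 1): F("0.001"), (0, 0): F("0.002"), (4, 0): F("0.001")}
a_bar, b_bar = F("0.000001875"), F("-0.00050000002")

num1, den1 = first_component(sextic, a_bar, b_bar)
```

The lists `num1` and `den1` hold the coefficients of $-0.9375\cdot 10^{-5} - 2t + 2t^4 + 0.300000012\cdot 10^{-2}t^5 + 0.1875\cdot 10^{-5}t^6$ and $1 + t^6$, which is the expression for $\bar{p}_1(t)$ given in Example 1.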
Fig. 3. Input curve $C$ (left) and output curve $\bar{C}$ (right).
Example 2.
Input curve $C$: $16.001 + 24.001x + 8y - 2y^2 + 12yx + 14.001x^2 + 2y^2x + x^2y + x^4 - y^3 + 6.001x^3$
Tolerance $\varepsilon$: $0.001$
$\varepsilon$-singularity: $(-2, -2)$
Output curve $\bar{C}$: $16.008 + 24.012x + 8y - 2y^2 + 12yx + 14.006x^2 + 2y^2x + x^2y + x^4 - y^3 + 6.001x^3$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$: $\bar{p}_1 = t^3 - 0.001 - t - 2t^2$, $\bar{p}_2 = t^4 + 1.999t - t^2 - 2t^3 - 2$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 3.
Input curve $C$: $x^4 + 2y^4 + 1.001x^3 + 3x^2y - y^2x - 3y^3 + 0.00001y^2 - 0.001x - 0.001y - 0.001$
Tolerance $\varepsilon$: $0.001$
$\varepsilon$-singularity: $(-0.1248595915\cdot 10^{-6},\; 0.1249844199\cdot 10^{-6})$
Output curve $\bar{C}$: $x^4 + 2y^4 + 1.001x^3 + 3x^2y - y^2x - 3y^3 + 10^{-6}y^2 - 0.6243761996\cdot 10^{-13}x - 0.6260915576\cdot 10^{-13}y + 0.9744187291\cdot 10^{-23} - 0.3522924910\cdot 10^{-16}x^2 + 0.9991263887\cdot 10^{-6}xy$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$:
$$\bar{p}_1 = -0.487671\cdot\frac{2.0526 - 2.05055t^2 + 6.15167t + 0.512063\cdot 10^{-6}t^4 - 6.15167t^3}{1 + 2t^4},$$
$$\bar{p}_2 = 0.487671\cdot\frac{-2.05260t + 2.05055t^3 - 6.15167t^2 + 6.15167t^4 + 0.256287\cdot 10^{-6}}{1 + 2t^4}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 4.
Input curve $C$: $y^5 + x^5 + x^4 + 0.001x + 0.001y + 0.002 + 0.001x^2 + 0.005y^2 + 0.001x^3$
Tolerance $\varepsilon$: $0.01$
$\varepsilon$-singularity: $(-0.0002501,\; 0)$
Output curve $\bar{C}$: $y^5 + x^5 + x^4 + 0.6255863298\cdot 10^{-10}x + 0.9999998183\cdot 10^{-3}x^3 + 0.3912115701\cdot 10^{-14} + 0.3751562603\cdot 10^{-6}x^2$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$:
$$\bar{p}_1 = -0.41902244\cdot 10^{-6}\cdot\frac{2384119 + 597t^5}{1 + t^5}, \qquad \bar{p}_2 = -0.9987492180\,\frac{t}{1 + t^5}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 5.
Input curve $C$: $-10x + 2y + xy^4 + 862x^4y - 359x^3y^2 + 3.099 - 859.967x^3y + 39x^2y^3 + 299.011x^2y^2 + 52x^2y - 3xy^3 + 5xy^2 - 7.901xy + 687x^4 - 642x^5 - 67.989x^3 + 14x^2 - 9.989y^4 + y^5 - 4y^3 - y^2$
Tolerance $\varepsilon$: $0.1$
$\varepsilon$-singularity: $(0.999067678,\; 1.99734)$
Output curve $\bar{C}$: $10.12701492x + 1.548607302y + xy^4 + 862x^4y - 359x^3y^2 - 859.9670000x^3y + 39x^2y^3 + 299.0110000x^2y^2 + 52.18519488x^2y - 3xy^3 + 4.626307400xy^2 - 7.063248589xy - 642x^5 - 67.98172465x^3 + 13.33333837x^2 - 9.989000000y^4 + y^5 - 3.999974822y^3 - 0.9012712980y^2 + 687x^4 + 3.247948193$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$, with common denominator $D(t) = t^5 + t^4 + 862t + 39t^3 - 642 - 359t^2$:
$$\bar{p}_1 = 0.22545229\cdot\frac{0.69592866\cdot 10^{3} - 0.128422685\cdot 10^{4}t + 0.0102t^4 + 4.4313t^5}{D(t)} + 0.22545229\cdot\frac{0.81893515\cdot 10^{3}t^2 - 0.19495476\cdot 10^{3}t^3}{D(t)},$$
$$\bar{p}_2 = 0.22545229\cdot\frac{0.111775629\cdot 10^{5}t - 0.82845609\cdot 10^{4}t^2 + 4.4380666t^5}{D(t)} + 0.22545229\cdot\frac{0.27553162\cdot 10^{4}t^3 - 0.35891982\cdot 10^{3}t^4 - 0.56876434\cdot 10^{4}}{D(t)}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 6.
Input curve $C$: $x^3 + x^2y + x^2 + xy^2 + y^3 + y^2 - 0.999990x - 0.999980y - 0.9999600$
Tolerance $\varepsilon$: $0.01$
$\varepsilon$-singularity: $(-0.99000000,\; 0)$
Output curve $\bar{C}$: $x^3 + x^2y + x^2 + xy^2 + y^3 + y^2 - 0.9603000x - 0.9801000y - 0.9604980$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$:
$$\bar{p}_1 = \frac{0.99t + 0.98 - t^2 - 0.99t^3}{1 + t + t^2 + t^3}, \qquad \bar{p}_2 = \frac{t\,(1.98t + 1.97 - 0.01t^2)}{1 + t + t^2 + t^3}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 7.
Input curve $C$: $y^5 + x^5 + x^4 - 2y^4 + 10^{-3}x + 10^{-3}y + 10^{-3} + 10^{-3}x^2 + 10^{-3}x^3 + 2\cdot 10^{-3}y^2x + 10^{-3}y^3$
Tolerance $\varepsilon$: $0.01$
$\varepsilon$-singularity: $(-0.2501564001\cdot 10^{-3},\; 0.1250195\cdot 10^{-3})$
Output curve $\bar{C}$: $0.6255863298\cdot 10^{-10}x + 0.1562864926\cdot 10^{-10}y + y^5 + x^5 + x^4 - 2y^4 + 0.9999998183\cdot 10^{-3}x^3 + 0.3751562603\cdot 10^{-6}x^2 + 0.9999997015\cdot 10^{-3}y^3 - 0.1875194239\cdot 10^{-6}y^2 + 0.3423651857\cdot 10^{-14}$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$:
$$\bar{p}_1 = -0.114881528\cdot\frac{8.695909548 - 0.1740379799\cdot 10^{2}t^4 + 0.2177516307\cdot 10^{-2}t^5}{1 + t^5},$$
$$\bar{p}_2 = 0.2297630556\cdot\frac{8.702443119t^5 - 4.346866016t + 0.544123596\cdot 10^{-3}}{1 + t^5}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
Example 8.
Input curve $C$: $291.9690000x - 17.00300000y - 100.9940000y^2 + 20y^4x - 511.9760000x^2 + x^7 - 14x^6 + 82x^5 - 259.9990000x^4 + 479.9920000x^3 + 29y^5 - 74.99900000y^4 - 40y^3x + 40y^2x - 160x^2y + 140xy + 2x^5y - 20x^4y + 80x^3y + y^7 - 7y^6 + 114.9960000y^3 - 72.98400000 - 4y^5x$
Tolerance $\varepsilon$: $0.001$
$\varepsilon$-singularity: $(2, 1)$
Output curve $\bar{C}$: $-73 + 292x - 17y - 101y^2 - 512x^2 + x^7 - 14x^6 - 260x^4 + 480x^3 + 29y^5 - 75y^4 - 40y^3x - 160x^2y + 140xy + 2x^5y - 20x^4y + 80x^3y + y^7 - 7y^6 + 115y^3 - 4y^5x + 20y^4x + 82x^5 + 40y^2x$
Parametrization $\bar{P}(t) = (\bar{p}_1(t), \bar{p}_2(t))$:
$$\bar{p}_1 = \frac{2\,(t^7 + 1 + 2t^5 - t)}{t^7 + 1}, \qquad \bar{p}_2 = \frac{4t^6 - 2t^2 + t^7 + 1}{t^7 + 1}.$$
Figures: curve $C$ (left), curve $\bar{C}$ (right).
4. Error analysis

Examples in Section 3 show that, in practice, the output curve of our algorithm is quite close to the input one. In this section we analyze how far apart these two affine curves are. To be more precise, let $C \in L^{\varepsilon}_d$ be defined by $f(x,y)$. In addition, we will denote by
$$\bar{P}(t) = \left(\frac{\bar{p}_1(t)}{\bar{q}(t)},\; \frac{\bar{p}_2(t)}{\bar{q}(t)}\right),$$
where $\gcd(\bar{p}_i, \bar{q}) = 1$, the generated parametrization of the output curve $\bar{C}$. Moreover, since we will measure distances, we may assume that the $\varepsilon$-singularity of $C$ is the
origin; otherwise one can apply a translation such that it is moved to the origin, and distances are preserved. Also, we assume that $\|f(x,y)\| = 1$; otherwise we consider $f(x,y)/\|f(x,y)\|$. If one does not normalize the input polynomial $f(x,y)$, a similar treatment with relative errors can be done.

In this situation, the general strategy we will follow is to show that almost any affine real point on $\bar{C}$ is at small distance from an affine real point on $C$. For this purpose, we observe that $\bar{P}(t)$ is an exact parametrization of $\bar{C}$ obtained by lines, and therefore all affine real points on $\bar{C}$ are obtained as the intersection of a line of the form $y = tx$, for $t$ real, with $\bar{C}$. Then, if one intersects the curve $C$ with the same line one gets $d$ points on $C$, counted properly, and we show that at least one of these intersection points on $C$ is close to the initial point on $\bar{C}$. Also, we observe that it is enough to reason with slope parameter values $t$ in the interval $[-1,1]$, because if $|t| > 1$ one may apply a similar strategy intersecting with lines of the form $x = ty$. Therefore, let $t_0 \in \mathbb{R}$ be such that $|t_0| \leq 1$ and $\bar{q}(t_0) \neq 0$. Then, the corresponding point $\bar{Q}$ on $\bar{C}$ is $\bar{Q} = \bar{P}(t_0)$. Let us express $\bar{Q}$ as
$$\bar{Q} = (\bar{a}, \bar{b}) = \left(\frac{\bar{a}_1}{\bar{c}},\; \frac{\bar{b}_1}{\bar{c}}\right),$$
where $\bar{a}_1 = \bar{p}_1(t_0)$, $\bar{b}_1 = \bar{p}_2(t_0)$ and $\bar{c} = \bar{q}(t_0)$. Observe that, since we are cutting with the line $y = t_0x$, it holds that $\bar{b} = t_0\bar{a}$. Thus, if we write the affine point $\bar{Q}$ projectively one has that $\bar{Q} = (\bar{a}_1 : t_0\bar{a}_1 : \bar{c})$. Now, observe that if $|\bar{a}_1|$ and $|\bar{c}|$ are simultaneously very small, i.e. very close to $\varepsilon$, this point is not well defined as an element in $\mathbb{P}^2(\mathbb{R})$. For this reason, we will assume that either $|\bar{a}_1|$ or $|\bar{c}|$ is bigger than a certain bound that depends on the tolerance. In fact, for our error analysis, we fix that
$$|\bar{a}_1| > \varepsilon^{1/d} \quad\text{or}\quad |\bar{c}| > \varepsilon^{1/d}.$$
Furthermore, we observe that the defining polynomials of $\bar{C}$ and $C$ have the same homogeneous form of maximum degree, and hence both curves have the same points at infinity.

Now, let $Q = (a,b)$ be any affine point in $C \cap \{y = t_0x\}$; note that here it also holds that $b = t_0a$. We want to compute the Euclidean distance between $\bar{Q}$ and $Q$. In order to do that, we observe that
$$\|\bar{Q} - Q\|_2 = \sqrt{(\bar{a}-a)^2 + (\bar{b}-b)^2} = \sqrt{(\bar{a}-a)^2(1+t_0^2)} \leq \sqrt{2}\,|\bar{a}-a|.$$
Therefore, we focus on the problem of computing a good bound for $|\bar{a}-a|$. For this purpose we first prove two different lemmas that will be used as general strategies in our reasoning.

Lemma 2. It holds that $|\bar{a}-a| \leq \varepsilon\cdot C$, where
$$C = \frac{\displaystyle\sum_{j_1+j_2=0}^{d-2} |\bar{a}|^{j_1+j_2}\,|t_0|^{j_2}\,\frac{1}{j_1!\,j_2!}}{|\bar{a}|^{d-1}\,|\bar{c}|}.$$

Proof. First of all, we note that $\bar{a}$ is a root of the univariate polynomial $\bar{f}(x, t_0x) = x^{d-1}(\bar{c}x - \bar{a}_1)$, and that $a$ is a root of the univariate polynomial
$$f(x, t_0x) = x^{d-1}(\bar{c}x - \bar{a}_1) + \sum_{j_1+j_2=0}^{d-2} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\,x^{j_1}(t_0x)^{j_2}.$$
Since $(0,0)$ is the $(d-1)$-fold $\varepsilon$-singularity of $C$, it holds that
$$\|f(x,t_0x) - \bar{f}(x,t_0x)\| = \max_{j_1+j_2=0,\ldots,d-2} \left|\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\right| \frac{|t_0|^{j_2}}{j_1!\,j_2!} \leq \max_{j_1+j_2=0,\ldots,d-2} \left|\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\right| < \varepsilon\|f(x,y)\| = \varepsilon,$$
and thus $\bar{f}(x,t_0x)$ can be written as
$$\bar{f}(x,t_0x) = f(x,t_0x) + R(x), \quad\text{where } R \in \mathbb{R}[x] \text{ and } \|R(x)\| < \varepsilon.$$
Therefore, by applying standard numerical techniques to measure $|\bar{a}-a|$ by means of the condition number (see for instance [7, p. 303]), one deduces that $|\bar{a}-a| \leq \varepsilon\cdot C$, where
$$C = \frac{\displaystyle\sum_{j_1+j_2=0}^{d-2} |\bar{a}|^{j_1+j_2}\,|t_0|^{j_2}\,\frac{1}{j_1!\,j_2!}}{\left|\dfrac{\partial \bar{f}}{\partial x}(\bar{a}, t_0\bar{a})\right|} = \frac{\displaystyle\sum_{j_1+j_2=0}^{d-2} |\bar{a}|^{j_1+j_2}\,|t_0|^{j_2}\,\frac{1}{j_1!\,j_2!}}{|\bar{a}|^{d-1}\,|\bar{c}|}.$$
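Lemma 3. Let
$$h(x) = c\prod_{i=1}^{n}(x - c_i) \in \mathbb{C}[x] \quad\text{with } \deg(h) = n,$$
and let $\alpha \in \mathbb{C}$ be such that $|h(\alpha)| \leq \delta$. Then, there exists a root $c_{i_0}$ of $h(x)$ such that
$$|\alpha - c_{i_0}| \leq \left(\frac{\delta}{|c|}\right)^{1/n}.$$

Proof. Let us assume that for $i = 1, \ldots, n$, $|\alpha - c_i| > (\delta/|c|)^{1/n}$. Then,
$$|h(\alpha)| = |c|\prod_{i=1}^{n}|\alpha - c_i| > \delta,$$
which contradicts that $|h(\alpha)| \leq \delta$.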
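The bound in Lemma 3 says that the distance from a point to the nearest root is at most the geometric mean of its distances to all roots. A minimal numerical check is sketched below; the polynomial, its roots and the test point are arbitrary values chosen for illustration.

```python
def nearest_root_and_bound(lead, roots, alpha):
    """For h(x) = lead * prod(x - c_i), return (min_i |alpha - c_i|, (|h(alpha)| / |lead|)^(1/n))."""
    n = len(roots)
    h_alpha = lead
    for c in roots:
        h_alpha *= (alpha - c)          # evaluate h at alpha from its factored form
    nearest = min(abs(alpha - c) for c in roots)
    bound = (abs(h_alpha) / abs(lead)) ** (1.0 / n)
    return nearest, bound

# Hypothetical data: a degree-4 polynomial with complex roots and a point near one of them.
roots = [1 + 2j, -0.5, 3j, 2.0]
nearest, bound = nearest_root_and_bound(2 - 1j, roots, 0.9 + 1.8j)
```

In the error analysis the lemma is applied with $h(x) = f(x, t_0x)$, $n = d$, $c = \bar{c}$ and $\alpha = \bar{a}$.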
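Now, we proceed to analyze $|\bar{a}-a|$ by using the previous lemmas. For this purpose, we distinguish different cases depending on the values of $|\bar{a}_1|$ and $|\bar{c}|$.

Lemma 4. Let $|\bar{c}| \geq 1$. Then, it holds that:
1. If $|\bar{a}| > 1$, then $|\bar{a}-a| \leq \varepsilon\exp(2)$.
2. If $|\bar{a}| \leq 1$, then $|\bar{a}-a| \leq (\varepsilon\exp(2))^{1/d}$.

Proof. 1. If $|\bar{a}| > 1$, we have that the constant $C$ in Lemma 2 can be bounded as
$$C = \frac{\displaystyle\sum_{j_1+j_2=0}^{d-2} |\bar{a}|^{j_1+j_2}\,|t_0|^{j_2}\,\frac{1}{j_1!\,j_2!}}{|\bar{a}|^{d-1}\,|\bar{c}|} = \frac{\displaystyle\sum_{k=0}^{d-2} \frac{(|\bar{a}| + |\bar{a}t_0|)^k}{k!}}{|\bar{a}|^{d-1}\,|\bar{c}|} \leq \sum_{k=0}^{d-2} \frac{(1+|t_0|)^k}{k!\,|\bar{a}|^{d-1-k}} \leq \sum_{k=0}^{d-2} \frac{(1+|t_0|)^k}{k!} \leq \exp(1+|t_0|) \leq \exp(2).$$
Therefore, by Lemma 2 we deduce that $|\bar{a}-a| \leq \varepsilon\exp(2)$.

2. If $|\bar{a}| \leq 1$, we have that
$$|f(\bar{a}, \bar{a}t_0)| = \left|\bar{f}(\bar{a}, \bar{a}t_0) + \sum_{j_1+j_2=0}^{d-2} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\,\bar{a}^{j_1}(t_0\bar{a})^{j_2}\right| = \left|\sum_{j_1+j_2=0}^{d-2} \frac{1}{j_1!\,j_2!}\,\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\,\bar{a}^{j_1}(t_0\bar{a})^{j_2}\right|$$
$$\leq \sum_{j_1+j_2=0}^{d-2} \left|\frac{\partial^{j_1+j_2} f}{\partial x^{j_1}\,\partial y^{j_2}}(0,0)\right| \frac{|\bar{a}|^{j_1}\,|t_0|^{j_2}\,|\bar{a}|^{j_2}}{j_1!\,j_2!} \leq \varepsilon\exp(|\bar{a}|(1+|t_0|)) \leq \varepsilon\exp(2).$$
In this situation, by Lemma 3 we deduce that there exists a root of the univariate polynomial $f(x, t_0x)$, that we can assume w.l.o.g. to be $a$, such that
$$|\bar{a}-a| \leq \left(\frac{\varepsilon\exp(2)}{|\bar{c}|}\right)^{1/d} \leq (\varepsilon\exp(2))^{1/d}.$$

Lemma 5. Let $|\bar{c}| < 1$ and $|\bar{a}_1| \geq 1$. Then, it holds that $|\bar{a}-a| \leq \varepsilon\exp(2)$.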
Proof. Since |c|¡1 J and |aJ1 |¿1, we have that the constant C in Lemma 2 can be bounded as d−2 d−2 k J j1 +j2 |t0 |j2 1=j1 !j2 ! J (d−2−k) =k! j1 +j2 =0 |a| k=0 (|aJ1 | + |aJ1 t0 |) |c| = C= |a| J d−1 |c| J |aJ1 |d−1 d−2 d−2 (1 + |t0 |)k (1 + |t0 |)k 6 6 6 exp(1 + |t0 |) 6 exp(2): d−1−k k! k=0 k!|aJ1 | k=0 Therefore, by Lemma 2 we deduce that |aJ − a| 6 exp(2): Finally, it only remains to analyze the case where |c|¡1 J and |aJ1 |¡1. In order to do that, we recall that we have assumed that either |aJ1 | or |c| J is bigger than 1=d . In the next lemma, we study these cases. Lemma 6. It holds that: 1. If |c|¡1 J and 1=d ¡|aJ1 |¡1, then |aJ − a|61=d exp(2). 2. If |aJ1 |¡1 and 1=d ¡|c|¡1, J then |aJ − a|6(1=2 exp(2))1=d . Proof. 1. If |c|¡1 J and |aJ1 |¿1=d , we have that the constant C in Lemma 2 can be bounded as d−2 d−2 j1 +j2 −d+1 J j1 +j2 |t0 |j2 1=j1 !j2 ! |t0 |j2 1=j1 !j2 ! j1 +j2 =0 |a| j1 +j2 =0 |aJ1 | C= = |a| J d−1 |c| J |c| J j1 +j2 −d+2 d−2 J d−j1 −j2 −2 |t0 |j2 1=j1 !j2 ! j1 +j2 =0 |c| = |aJ1 |d−j1 −j2 −1 d−2 j2 exp(2) j1 +j2 =0 |t0 | 1=j1 !j2 ! 6 6 6 exp(2) −1+1=d |aJ1 |d−1 |aJ1 |d−1 Therefore, by Lemma 2 we deduce that |aJ − a| 6 1=d exp(2): 2. Let 1=d ¡|c|¡1 J and |aJ1 |¡1. First we assume that |aJ1 |61=d . Otherwise we would J In these conditions, we deduce reason as in [1]. Thus, one has that |aJ1 |61=d ¡|c|¡1. that d−2 @j1 +j2 f 1 J j1 j2 |f(a; J at J 0 )| = f(a; J at J 0) + (t a) J (0; 0) a J 0 j1 x@j2 y @ j !j ! 1 2 j1 +j2 =0 d−2 1 @j1 +j2 f j1 j2 = J (0; 0)aJ (t0 a) j1 +j2 =0 @j1 x@j2 y j1 !j2 !
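\[ \le \sum_{j_1+j_2=0}^{d-2} \left| \frac{\partial^{j_1+j_2} f}{\partial^{j_1}x\,\partial^{j_2}y}(0,0) \right| \frac{1}{j_1!\,j_2!}\, |\bar a|^{j_1}\, |t_0|^{j_2}\, |\bar a|^{j_2} \le \varepsilon \exp(|\bar a|(1+|t_0|)) \le \varepsilon \exp(2). \]
Now, by Lemma 3 we deduce that there exists a root of the univariate polynomial $f(x, t_0 x)$, which we can assume w.l.o.g. to be $a$, such that
\[ |\bar a - a| \le \left( \frac{\varepsilon \exp(2)}{|\bar c|} \right)^{1/d} = (\varepsilon \exp(2))^{1/d}\, \frac{1}{|\bar c|^{1/d}} \le (\varepsilon \exp(2))^{1/d}\, \frac{1}{\varepsilon^{1/d^2}} \le (\varepsilon \cdot \exp(2))^{1/d}\, \frac{1}{\varepsilon^{1/(2d)}} = (\varepsilon^{1/2} \exp(2))^{1/d}. \]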
From the previous lemmas, one deduces the following theorem.

Theorem 2. For almost every affine real point $\bar Q \in \bar C$ there exists an affine real point $Q \in C$ such that
\[ \|\bar Q - Q\|_2 \le \sqrt{2}\, \varepsilon^{1/(2d)} \exp(2). \]

Proof. Applying Lemmas 4–6 one deduces that
\[ \|\bar Q - Q\|_2 = \sqrt{(\bar a - a)^2 + (\bar b - b)^2} = \sqrt{(\bar a - a)^2 (1 + t_0^2)} \le \sqrt{2}\, |\bar a - a| \le \sqrt{2}\, \varepsilon^{1/(2d)} \exp(2). \]

Now, let $\bar Q = (\bar a, \bar b)$ be a regular point on $\bar C$ such that there exists $Q = (a, b) \in C$ with $\|\bar Q - Q\|_2 \le \sqrt{2}\, \varepsilon^{1/(2d)} \exp(2)$ (see Theorem 2). In this situation, we consider the tangent line to $\bar C$ at $\bar Q$, i.e. $T(x, y) = n_x (x - \bar a) + n_y (y - \bar b)$, where $(n_x, n_y)$ is the unitary normal vector to $\bar C$ at $\bar Q$. Then, we bound the value $T(Q)$:
\[ |T(Q)| \le |n_x| \cdot |a - \bar a| + |n_y| \cdot |b - \bar b| \le \|\bar Q - Q\|_2\, (|n_x| + |n_y|) \le 2\sqrt{2}\, \varepsilon^{1/(2d)} \exp(2). \]
Therefore, reasoning as in Section 2.2 of [17] one deduces the following theorem.

Theorem 3. $C$ is contained in the offset region of $\bar C$ at distance $2\sqrt{2}\, \varepsilon^{1/(2d)} \exp(2)$.
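To get a feeling for the size of the distance in Theorem 3, the following minimal sketch evaluates $2\sqrt{2}\,\varepsilon^{1/(2d)}\exp(2)$ for a few tolerances and degrees. The function name `offset_distance_bound` and the sample values of $\varepsilon$ and $d$ are illustrative assumptions, not taken from the paper.

```python
import math


def offset_distance_bound(eps: float, d: int) -> float:
    """Evaluate 2*sqrt(2) * eps**(1/(2d)) * exp(2), the distance in Theorem 3.

    The name and the argument checks are illustrative choices.
    """
    if not (0.0 < eps < 1.0) or d < 2:
        raise ValueError("expected 0 < eps < 1 and degree d >= 2")
    return 2.0 * math.sqrt(2.0) * eps ** (1.0 / (2 * d)) * math.exp(2.0)


if __name__ == "__main__":
    # Hypothetical sample tolerances and degrees, chosen only for illustration.
    for eps in (1e-6, 1e-12, 1e-24):
        for d in (2, 4, 8):
            print(f"eps={eps:g}, d={d}: bound={offset_distance_bound(eps, d):.4f}")
```

Because $\varepsilon$ enters through the exponent $1/(2d)$, the bound shrinks slowly as $\varepsilon$ decreases, and more slowly for higher degrees.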
References
[1] S. Abhyankar, C. Bajaj, Automatic parametrization of rational curves and surfaces III: algebraic plane curves, Comput. Aided Geom. Des. 5 (1988) 321–390.
[2] E. Arrondo, J. Sendra, J.R. Sendra, Parametric generalized offsets to hypersurfaces, J. Symbolic Comput. 23 (1997) 267–285.
[3] C. Bajaj, C.M. Hoffmann, J.E. Hopcroft, R.E. Lynch, Tracing surface intersections, Comput. Aided Geom. Des. 5 (1988) 285–307.
[4] C. Bajaj, A. Royappa, Parameterization in finite precision, Algorithmica 27 (1) (2000) 100–114.
[5] C. Bajaj, G. Xu, Piecewise approximations of real algebraic curves, J. Comput. Math. 15 (1) (1997) 55–71.
[6] B. Beckermann, G. Labahn, When are two polynomials relatively prime? J. Symbolic Comput. 26 (1998) 677–689.
[7] R. Bulirsch, J. Stoer, Introduction to Numerical Analysis, Springer, New York, 1993.
[8] J. Canny, The complexity of robot motion planning, ACM Doctoral Dissertation Award, The MIT Press, Cambridge, MA, 1987.
[9] R.M. Corless, P.M. Gianni, B.M. Trager, S.M. Watt, The singular value decomposition for polynomial systems, Proceedings of the ISSAC 1995, ACM Press, New York, 1995, pp. 195–207.
[10] R.M. Corless, M.W. Giesbrecht, D.J. Jeffrey, S.M. Watt, Approximate polynomial decomposition, in: S.S. Dooley (Ed.), Proceedings of the ISSAC 1999, Vancouver, Canada, ACM Press, New York, 1999, pp. 213–220.
[11] R.M. Corless, M.W. Giesbrecht, I.S. Kotsireas, M. van Hoeij, S.M. Watt, Towards factoring bivariate approximate polynomials, Proceedings of the ISSAC 2001, Bernard Mourrain, London, 2001, pp. 85–92.
[12] R.M. Corless, M.W. Giesbrecht, I. Kotsireas, S.M. Watt, Numerical implicitization of parametric hypersurfaces with linear algebra, Proceedings of the Artificial Intelligence with Symbolic Computation (AISC 2000), Lecture Notes in Artificial Intelligence, Springer, Berlin, 1930, pp. 174–183.
[13] J. Demmel, D. Manocha, Algorithms for intersecting parametric and algebraic curves II: multiple intersections, Graphical Models and Image Processing: GMIP 57 (2) (1995) 81–100.
[14] T. Dokken, Approximate implicitization, in: T. Lyche, L.L. Schumaker (Eds.), Mathematical Methods for Curves and Surfaces in CAGD, Oslo 2000, Innovations in Applied Mathematical Series, Vanderbilt University Press, 2001, pp. 81–102.
[15] I.Z. Emiris, A. Galligo, H. Lombardi, Certified approximate univariate GCDs, J. Pure Appl. Algebra 117 and 118 (1997) 229–251.
[16] I.Z. Emiris, V.Y. Pan, Symbolic and numeric methods for exploiting structure in constructing resultant matrices, J. Symbolic Comput. 33 (4) (2002) 393–413.
[17] R.T. Farouki, V.T. Rajan, On the numerical condition of algebraic curves and surfaces. 1: Implicit equations, Comput. Aided Geom. Des. 5 (1988) 215–252.
[18] S. Fortune, Polynomial root-finding using iterated eigenvalue computation, Proceedings of the ISSAC 2001, ACM Press, New York, pp. 121–128.
[19] J. Gahleitner, B. Jüttler, J. Schicho, Approximate parameterization of planar cubic curve segments, Proceedings of the Fifth International Conference on Curves and Surfaces, Saint-Malo, 2002, Nashboro Press, Nashville, TN, 2002, pp. 1–13.
[20] A. Galligo, D. Rupprecht, Irreducible decomposition of curves, J. Symbolic Comput. 33 (2002) 661–677.
[21] A. Galligo, S.M. Watt, A numerical absolute primality test for bivariate polynomials, in: W. Kuchlin (Ed.), ISSAC 1997, Maui, USA, ACM, New York, 1997, pp. 217–224.
[22] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore and London, 1989.
[23] E. Hartmann, Numerical parameterization of curves and surfaces, Comput. Aided Geom. Des. 17 (2000) 251–266.
[24] M. van Hoeij, Computing parametrizations of rational algebraic curves, in: J. von zur Gathen (Ed.), Proceedings of the ISSAC94, ACM Press, New York, pp. 187–190.
[25] M. van Hoeij, Rational parametrizations of curves using canonical divisors, J. Symbolic Comput. 23 (1997) 209–227.
[26] C.M. Hoffmann, Geometric and Solid Modeling, Morgan Kaufmann Publ., Inc., Los Altos, CA, 1993.
[27] S. Krishnan, D. Manocha, Solving algebraic systems using matrix computations, Sigsam Bull. ACM. 30 (4) (1996) 4–21.
[28] H.M. Möller, Gröbner bases and numerical analysis, in: B. Buchberger, F. Winkler (Eds.), Gröbner Bases and Applications, Lecture Notes in Statistics, Vol. 251, Springer, Berlin, 1998, pp. 159–178.
[29] V.Y. Pan, Numerical computation of a polynomial GCD and extensions, Tech. report, N. 2969, Sophia-Antipolis, France, 1996.
[30] V.Y. Pan, Univariate polynomials: nearly optimal algorithms for factorization and rootfinding, ISSAC 2001, London, Ont., Canada, ACM Press, New York, NY, USA, 2001, pp. 253–267.
[31] T. Sasaki, Approximate multivariate polynomial factorization based on zero-sum relations, Proceedings of the ISSAC 2001, ACM Press, New York, NY, USA, 2001, pp. 284–291.
[32] J. Schicho, Rational parametrization of surfaces, J. Symbolic Comput. 26 (1998) 1–9.
[33] J.R. Sendra, F. Winkler, Symbolic parametrization of curves, J. Symbolic Comput. 12 (1991) 607–631.
[34] J.R. Sendra, F. Winkler, Parametrization of algebraic curves over optimal field extension, J. Symbolic Comput. 23 (1997) 191–207.
[35] J.R. Sendra, F. Winkler, Algorithms for rational real algebraic curves, Fund. Inform. 39 (1999) 211–228.
[36] H. Stetter, Stabilization of polynomial systems solving with Groebner bases, Proceedings of ISSAC 97, 1997, pp. 117–124.
Theoretical Computer Science 315 (2004) 651 – 669
www.elsevier.com/locate/tcs
Numerical factorization of multivariate complex polynomials
Andrew J. Sommese^a, Jan Verschelde^{b,∗}, Charles W. Wampler^c
a Department of Mathematics, University of Notre Dame, Notre Dame, IN 46556-4618, USA
b Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 851 South Morgan (M/C 249), Chicago, IL 60607-7045, USA
c General Motors Research & Development, Mail Code 480-106-359, 30500 Mound Road, Warren, MI 48090-9055, USA
Abstract
One can consider the problem of factoring multivariate complex polynomials as a special case of the decomposition of a pure dimensional solution set of a polynomial system into irreducible components. The importance and nature of this problem, however, justify a special treatment. We exploit the reduction to the univariate root finding problem as a way to sample the polynomial more efficiently, certify the decomposition with linear traces, and apply interpolation techniques to construct the irreducible factors. With a random combination of differentials we lower multiplicities and reduce to the regular case. Estimates on the location of the zeroes of the derivative of polynomials provide bounds on the required precision. We apply our software to study the singularities of Stewart–Gough platforms. © 2004 Elsevier B.V. All rights reserved.
MSC: Primary 13P05; 14Q99; Secondary 65H10; 68W30
Keywords: Approximate factorization; Divided differences; Generic points; Homotopy continuation; Irreducible decomposition; Newton interpolation; Numerical algebraic geometry; Monodromy; Multiple roots; Polynomial; Stewart–Gough platform; Symbolic-numeric computation; Traces; Witness points
∗ Corresponding author. Fax: +1-312-996-1491.
E-mail addresses: [email protected] (A.J. Sommese), [email protected], [email protected] (J. Verschelde), [email protected] (C.W. Wampler).
URLs: http://www.nd.edu/~sommese, http://www.math.uic.edu/~jan
0304-3975/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2004.01.011
1. Introduction

We consider a polynomial $f$ with complex coefficients in several variables. We wish to write $f$ as a product of irreducible polynomials:
\[ f(x) = \prod_{i=1}^{N} q_i(x)^{\mu_i}, \qquad x = (x_1, x_2, \ldots, x_n), \qquad \sum_{i=1}^{N} \mu_i \deg(q_i) = \deg(f). \tag{1} \]
Note that for each $i$, the factor $q_i$ occurs with multiplicity $\mu_i$. The problem of factoring multivariate polynomials occurs frequently in computer algebra. The case when the coefficients of $f$ are known only approximately is especially important for applications and is stated as a challenge problem to symbolic computation in [10]. Recent papers on this problem are [2,3,5,6,8,19,20]. These papers propose algorithms in hybrid symbolic-numeric computation [4]. We find our way of working very much related to the method of computing the approximate gcd of two polynomials using their zeros, as presented in [1]. Using homotopies theoretically, the complexity of factoring polynomials with rational coefficients was shown in [1] to be in NC.

The crux of our approach is the numerical computation of witness sets, which in the case of a single polynomial means finding the intersection of $f^{-1}(0)$ with a generic line in $\mathbb{C}^n$ and then partitioning these witness points according to their membership in the irreducible factors. In this way, each factor $q_i$ is witnessed by $\deg(q_i)$ distinct points of multiplicity $\mu_i$. The main contribution of this paper is a symbolic-numeric method for reducing multiplicities by differentiation, along with an analysis of the numerical stability of this step. Subsequent to the determination of the witness sets, one can numerically reconstruct the coefficients of each $q_i$ by tracking the witness points in a continuation as the generic line is moved and interpolating points on these paths. For many purposes, the interpolation step is not necessary; for example, using only the witness set for a component, a homotopy membership test can check if a given point lies on that component. In this sense, we may consider the polynomial to be numerically factored once the witness set for $f$ has been decomposed into the witness sets for its irreducible components $q_i$.
Because interpolation can be sensitive to errors in the sample points, it is preferable to work directly with the witness sets whenever possible. Even so, for the sake of completeness, we carry out the interpolation step in our test examples.

The approach just described is a specialization of the tools we built for computing an irreducible decomposition of the solution set of a polynomial system. These tools were developed to implement the research program Numerical Algebraic Geometry, outlined in [30]. In [22] we gave algorithms to decompose solution sets of polynomial systems into irreducible components of various degrees and dimensions, applying an embedding and sequence of homotopies [21] to find generic points efficiently. The homotopy test presented in [23] to determine whether a given point belongs to an irreducible component led to the use of monodromy [24] for which linear traces [25] provide an efficient verification and interpolation method. Applications of our software [28]
to design problems in mechanical engineering are described in [27]. A tutorial to our recent developments can be found in [29].

In [24] we gave an algorithm for using monodromy to decompose the zero set of a polynomial system into irreducible components. The main difficulty in the use of monodromy occurs when we track points on irreducible components of multiplicity at least two. In [26] we presented a method for tracking these singular paths, but it necessitates special care and usually requires higher precision arithmetic. In this article we specialize the algorithm of [25] to the case of a single polynomial $f(x)$ on $\mathbb{C}^n$ where it is possible to replace singular paths by nonsingular ones via differentiation.

Consider the example $f(x) = (x_1^2 - x_2)^3 (x_1^2 + x_2^2 + x_2^3)$. If this polynomial is represented exactly, we could symbolically differentiate twice with respect to $x_1$ to obtain a polynomial where $x_1^2 - x_2$ is a factor of multiplicity one. However, when $f$ is represented numerically in unfactored form, care must be taken to ensure that differentiation does not lead to numerical instability and erroneous results.

For simplicity, first assume that the polynomial is on $\mathbb{C}$. Data for polynomials arising in engineering and science is sometimes noisy. Also, zeroes of polynomials of multiplicity more than one are hard to compute exactly. Thus even if the actual polynomial has a factor with multiplicity $\mu \ge 2$, we must expect that if we solve the restriction of the polynomial to a general line, we will find not a single zero of multiplicity $\mu$, but a cluster of zeroes. Assume a polynomial has a zero $z$ of multiplicity $\mu$. Differentiating $\mu - 1$ times will yield a polynomial having a nonsingular factor $x - z$. Because of slightly perturbed coefficients or roundoff errors, we may have a nearby polynomial $f(x)$ containing a factor $\prod_{i=1}^{\mu} (x - z_i)$ with $z_i$ near $z$. In contrast to the case of an exact multiple root, it does not follow that $f^{(\mu-1)}(x)$ has a single zero near $z$.
Here a remarkable result of Marden and Walsh [11] (formulated as Corollary 3.3 below) gives mild numerical conditions guaranteeing the numerical stability of the symbolic operation of differentiation. It guarantees that if we have a cluster of $\mu$ roots in a disk $D$ of radius $r$, and no root outside of $D$ is a distance less than $R$ from the center of the disk $D$, then under mild conditions on the size of $R/r$, $f^{(\mu-1)}(x)$ has one root in $D$, and a lower bound is given for the distance of any root of $f^{(\mu-1)}(z)$ outside of $D$ to the center of $D$.

For polynomials of several variables, i.e., $x \in \mathbb{C}^n$, we find the roots of the restriction of $f(x)$ to a general line, i.e., a line $x(t) = x_0 + tv$ where we have chosen random vectors $x_0, v \in \mathbb{C}^n$. For roots of multiplicity one, we can use the monodromy technique of [24], or if the degree of $f(x)$ is low, use the trace theorem of [25] to justify the partial sum approach of [19]. To deal with the clusters of $\mu$ roots we can compute the $(\mu - 1)$th derivative of $f(x)$ in the direction $v$, i.e., with $v = (v_1, \ldots, v_n)$, we compute
\[ g(x) := \left( v_1 \frac{\partial}{\partial x_1} + \cdots + v_n \frac{\partial}{\partial x_n} \right)^{\mu-1} f(x) \tag{2} \]
and apply the techniques to the multiplicity one roots of $g(x)$ corresponding to the clusters. Since
\[ g(x_0 + tv) = \left( \frac{d}{dt} \right)^{\mu-1} f(x_0 + tv), \tag{3} \]
we can use Corollary 3.3 to check that $g(x_0 + tv)$ has multiplicity one roots corresponding to the multiplicity $\mu$ clusters. Now as we vary the line, we can use $g(x)$ to track the appropriate roots. As a numerical safety check we can verify that the continuations of these roots on the line intersection with $g^{-1}(0)$ have small residual when we evaluate $f(x)$ at them.

In this paper, we first outline the algorithms using pseudocode. Then we justify our use of differentiation to remove multiplicities and examine the implications of the results of Marden and Walsh for the behavior of root clusters under differentiation. After discussing some numerical aspects of our implementation, we apply our software to a problem from mechanical engineering concerning the singularities of Stewart–Gough platforms.
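The identity (3) between the directional derivative (2) and the derivative in $t$ along the line can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: it uses real coefficients (the paper works over $\mathbb{C}$), a dictionary-based polynomial representation, and the hypothetical helper names `partial`, `directional` and `evaluate`; the sample polynomial $(x_1^2 - x_2)^2$, the line and the step size are arbitrary choices.

```python
from typing import Dict, Sequence, Tuple

# A polynomial is stored as {exponent tuple: coefficient}; this is an assumed format.
Poly = Dict[Tuple[int, ...], float]


def partial(poly: Poly, i: int) -> Poly:
    """Partial derivative of poly with respect to variable number i."""
    out: Poly = {}
    for exps, c in poly.items():
        if exps[i] > 0:
            new = exps[:i] + (exps[i] - 1,) + exps[i + 1:]
            out[new] = out.get(new, 0.0) + c * exps[i]
    return out


def directional(poly: Poly, v: Sequence[float]) -> Poly:
    """Apply D = v_1 d/dx_1 + ... + v_n d/dx_n once, as in Eq. (2) with mu = 2."""
    out: Poly = {}
    for i, vi in enumerate(v):
        for exps, c in partial(poly, i).items():
            out[exps] = out.get(exps, 0.0) + vi * c
    return out


def evaluate(poly: Poly, x: Sequence[float]) -> float:
    """Evaluate the polynomial at the point x."""
    total = 0.0
    for exps, c in poly.items():
        term = c
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total


# f = (x1^2 - x2)^2 in expanded form: one factor of multiplicity mu = 2.
f: Poly = {(4, 0): 1.0, (2, 1): -2.0, (0, 2): 1.0}
x0, v = (0.3, -0.2), (1.0, 0.5)   # an arbitrary line x(t) = x0 + t*v
g = directional(f, v)             # mu - 1 = 1 differentiation


def on_line(t: float) -> Tuple[float, ...]:
    """Point x0 + t*v on the line."""
    return tuple(a + t * b for a, b in zip(x0, v))


t, h = 0.7, 1e-5
lhs = evaluate(g, on_line(t))
# Central difference approximation of d/dt f(x0 + t*v).
rhs = (evaluate(f, on_line(t + h)) - evaluate(f, on_line(t - h))) / (2 * h)
```

In this sketch `lhs` and `rhs` agree up to the finite-difference error, and `g` still vanishes at points of the curve $x_2 = x_1^2$, now with multiplicity one.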
2. Algorithms

Given a pure $k$-dimensional affine algebraic set $Z$, we use the term “witness point set” to designate the intersections of $Z$ with a generic linear space $L^{N-k}$ of complementary dimension $N - k$ (see for example, [22]). $L^{N-k}$ can be defined by $k$ linear equations on $\mathbb{C}^N$, each having random, complex coefficients, or equivalently, it can be given in parametric form as $x = x_0 + \sum_{i=1}^{N-k} v_i t_i$, where $x_0, v_i \in \mathbb{C}^N$ are random and complex. In the case at hand, $Z$ is a hypersurface given by a single polynomial equation $f(x) = 0$, so we intersect it with a one-dimensional linear space, $L^1$.

In the initial stage we compute a set of witness points on the hypersurface defined by $f(x) = 0$ and store clustered points according to the size of the cluster. More precisely, if $d = \deg(f)$, WitnessGenerate computes $d$ witness points and partitions the set of witness points into $W = \{W_1, W_2, \ldots, W_m\}$, where each $W_i$ is a set of clusters of size $i$.

Algorithm W = WitnessGenerate(f, x0, v)
Input: f(x) polynomial in n variables with complex coefficients; x0 and v represent a random line x(t) = x0 + tv.
Output: $W = \{W_1, W_2, \ldots, W_m\}$, $m = \max_{i=1}^{d} \mu_i$, for all $X \in W_i$: $\#X = i$.

The method to solve a polynomial in one variable is invoked once in WitnessGenerate. The algorithm RegularFactor assumes the witness points all have multiplicity one. It is invoked repeatedly in the main factorization algorithm.

Algorithm P = RegularFactor(f, x0, v, W1)
Input: f(x) polynomial in n variables with complex coefficients; x0 and v represent a random line x(t) = x0 + tv; W1 set of t-values for which f(x(t)) = 0, with multiplicity one.
Output: $P = \{p_1, p_2, \ldots, p_k\}$, $k$ irreducible factors of $f$ with $\sum_{i=1}^{k} \deg(p_i) = \#W_1$.

We have two different implementations of RegularFactor:
(1) Using the monodromy grouping algorithm [24], certified with linear traces [25].
(2) Applying linear traces to enumerated factors [5,6,19].
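The partition $W = \{W_1, \ldots, W_m\}$ produced by WitnessGenerate can be pictured with a small sketch. The greedy grouping rule, the tolerance `tol` and the function name `partition_by_cluster_size` are assumptions made for illustration; the paper does not specify how clusters are detected.

```python
from collections import defaultdict
from typing import Dict, List


def partition_by_cluster_size(points: List[complex], tol: float) -> Dict[int, List[List[complex]]]:
    """Group points lying within tol of a cluster's first member, then index clusters by size.

    Returns a dictionary mapping a cluster size i to the list of clusters of that size,
    mirroring the sets W_i. The greedy rule is an illustrative assumption.
    """
    clusters: List[List[complex]] = []
    for p in points:
        for cluster in clusters:
            if abs(p - cluster[0]) < tol:
                cluster.append(p)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([p])
    by_size: Dict[int, List[List[complex]]] = defaultdict(list)
    for cluster in clusters:
        by_size[len(cluster)].append(cluster)
    return dict(by_size)


# Hypothetical t-values on a line: two isolated roots and one cluster of three.
t_values = [2.0 + 0j, -1.0 + 1j, 0.5 + 1e-6j, 0.5 - 1e-6j, 0.5 + 1e-6 + 0j]
W = partition_by_cluster_size(t_values, tol=1e-3)
```

Here `W[1]` holds the two regular witness points and `W[3]` holds the single cluster of size three, which would be handed to the differentiation step.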
Fig. 1. Illustration of the combinatorial method to factor a cubic, given by witness points 1, 2, 3. Every question is answered by a linear trace test.
Both methods apply path-following techniques. In each step of the path tracker we slightly move the random line, predict the location of the solutions and feed the predicted solutions to Newton’s method for correction. The main difference between the two implementations lies in the number of computed samples. With monodromy we compute witness point sets on many random lines, and take the witness points connected by paths as belonging to the same irreducible factor. Linear traces are then applied to certify the factorization predicted by the monodromy. In the enumeration method, we also use linear traces, but plainly enumerate all possible factorizations. For three witness points, the enumeration method runs as in Fig. 1. Each test is answered by the computation of a linear trace and comparing the value at the linear trace with the sum directly computed from the samples. The largest number of tests in this algorithm occurs when the factor witnessed by $W_1$ is irreducible, and equals $2^{w-1} - 1$, where $w = \#W_1$.

After the grouping of the witness points along the irreducible factors, we can apply interpolation techniques to find symbolic expressions for the polynomials. In our implementation we postponed all interpolation to the end, because this stage is the most time consuming and sometimes also not really necessary.

After WitnessGenerate and RegularFactor we have all irreducible factors of multiplicity one. To build the higher multiplicity factors, we propose to take random combinations of all partial derivatives. Each differentiation cuts the multiplicity by one and the number of solutions in the cluster drops accordingly. To process $W_i$, $f$ is differentiated $i - 1$ times yielding $g := D^{(i-1)} f$. The routine RefineCluster takes the center of each clustered set of $W_i$ as an initial approximation for root refinement with $g$. Thereafter we can apply RegularFactor on $g$ and the reduced set $W_i$ as before.
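The refinement step described for RefineCluster can be sketched in one variable as follows. This is not the PHCpack routine; the names `poly_from_roots`, `derivative`, `horner` and `refine_cluster`, the fixed number of Newton steps, and the sample cluster are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def poly_from_roots(roots: List[complex]) -> List[complex]:
    """Coefficients (highest degree first) of the monic polynomial with the given roots."""
    coeffs: List[complex] = [1.0 + 0j]
    for r in roots:
        shifted = coeffs + [0j]              # multiply the current polynomial by t
        for k in range(len(coeffs)):
            shifted[k + 1] -= r * coeffs[k]  # subtract r times the current polynomial
        coeffs = shifted
    return coeffs


def derivative(coeffs: List[complex]) -> List[complex]:
    """Derivative of a polynomial given highest degree first."""
    n = len(coeffs) - 1
    return [c * (n - k) for k, c in enumerate(coeffs[:-1])]


def horner(coeffs: List[complex], z: complex) -> complex:
    """Evaluate the polynomial at z."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def refine_cluster(coeffs: List[complex], cluster: List[complex],
                   steps: int = 20) -> Tuple[complex, List[complex]]:
    """Differentiate len(cluster) - 1 times, then run Newton from the cluster center."""
    g = coeffs
    for _ in range(len(cluster) - 1):
        g = derivative(g)
    dg = derivative(g)
    z = sum(cluster) / len(cluster)
    for _ in range(steps):
        z -= horner(g, z) / horner(dg, z)
    return z, g


# Hypothetical data: a cluster of three nearby roots plus one distant root at 2.
cluster = [0.001 + 0j, -0.0005 + 0.0008j, -0.0002 - 0.0009j]
h = poly_from_roots(cluster + [2.0 + 0j])
z_refined, g = refine_cluster(h, cluster)
center = sum(cluster) / len(cluster)
```

In this example the twice-differentiated polynomial has a single simple root near the cluster center, so Newton's method converges quickly from that starting point.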
With the witness points grouped according to the factors, we can apply interpolation methods, e.g., using traces [20], to obtain symbolic representations of the factors.

3. How clusters of zeroes spread out under differentiation

At the innermost level of our routine, we have reduced the problem to following a multiple root of a complex polynomial in one variable. Recall that if $h(z)$ has a root
of multiplicity $\mu$ at a point $z_0$, then its derivative $h'(z)$ has a root of multiplicity $\mu - 1$ at $z_0$, and so $h^{(\mu-1)}(z)$ has a nonsingular root at $z_0$. While a nonsingular root is much easier to compute accurately than a multiple root, we must be concerned about the numerical stability of the differentiation. This problem manifests itself in the way that the roots away from $z_0$ move under differentiation. Ignoring the roots exactly at $z_0$, it may well happen that $h'(z)$ has at least one root closer to $z_0$ than any root of $h(z)$, and after $\mu - 1$ derivatives, some root may come so close to $z_0$ as to be numerically indistinguishable from $z_0$ itself.

For a simple example showing the problem, consider $h(z) = z^{\mu} (z - 1)^{d-\mu}$. The root $z = 0$ occurs with multiplicity $\mu$, with the $d - \mu$ remaining roots at $z = 1$. The derivative $h'(z)$ has $z = 0$ as a root of multiplicity $\mu - 1$, but it also has a root $\mu/d$, which for large $d$ can be near zero. If we need to differentiate $\mu - 1$ times, this can lead to a serious problem.

Algorithm Q = Factor(f)
Input: f(x) polynomial in n variables with complex coefficients.
Output: Q = { (q_i, μ_i) | i = 1, 2, ..., N }, irreducible factors q_i with multiplicities μ_i.
  x0 := Random(n, C);                          [choose n random numbers in C]
  v := Random(n, C);                           [v is direction of line x(t) = x0 + tv]
  W := WitnessGenerate(f, x0, v);              [find witness points]
  Q := ∅;                                      [Q will collect all factors]
  P := RegularFactor(f, x0, v, W_1);           [find regular factors]
  for all p_i ∈ P do Q := Q ∪ (p_i, 1); end for;   [collect multiplicity one factors]
  D := sum_{i=1..n} v_i ∂/∂x_i;                [differential along direction v]
  g := f;                                      [g is (μ−1)-th differential of f, μ = 1]
  for μ = 2, 3, ..., #W do                     [construct multiplicity μ factors]
    g := Dg;                                   [differentiate so that g = D^(μ−1) f]
    W_μ := RefineCluster(g, x0, v, W_μ);       [refine center of clusters]
    P := RegularFactor(g, x0, v, W_μ);         [find regular factors]
    for all p_i ∈ P do Q := Q ∪ (p_i, μ); end for;   [collect multiplicity μ factors]
  end for;
  return Q.
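The example $h(z) = z^{\mu}(z-1)^{d-\mu}$ can be checked in exact rational arithmetic. The sketch below, with the hypothetical helper names `h_prime` and `theorem_bound` and the sample values $d = 100$, $\mu = 3$, confirms that $\mu/d$ is a root of $h'$ and compares it with the radius $\frac{\mu}{d}(R + r) - r$ that appears in Theorem 3.2 below, taking $R = 1$ and $r = 0$.

```python
from fractions import Fraction


def h_prime(z: Fraction, d: int, mu: int) -> Fraction:
    """Derivative of h(z) = z**mu * (z - 1)**(d - mu), from the product rule.

    Assumes 1 <= mu < d so that all exponents are nonnegative.
    """
    return (mu * z ** (mu - 1) * (z - 1) ** (d - mu)
            + (d - mu) * z ** mu * (z - 1) ** (d - mu - 1))


def theorem_bound(d: int, mu: int, R: Fraction, r: Fraction) -> Fraction:
    """Radius mu*(R + r)/d - r beyond which the far roots of h' must lie."""
    return Fraction(mu, d) * (R + r) - r


d, mu = 100, 3                   # illustrative values, not from the paper
near_root = Fraction(mu, d)      # the extra root of h' close to the multiple root
value_at_root = h_prime(near_root, d, mu)
bound = theorem_bound(d, mu, Fraction(1), Fraction(0))
```

For these values the extra root $\mu/d = 0.03$ sits exactly at the radius given by the bound, which suggests that in this configuration the estimate cannot be improved.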
The problem is further exacerbated if, due to numerical roundoff in its representation, we begin with $\hat h(z)$, nearby to $h(z)$, having a cluster of $\mu$ roots near $z_0$. After differentiation, we would like to have an $\hat h'(z)$ with a cluster of $\mu - 1$ roots near $z_0$ and all other roots far away, but as seen from the above example, this may not always be the case.

The following result gives some bounds on the behavior of the roots under differentiation and helps in guiding the choice of how many digits of precision we should use in implementation of our algorithm. The result follows from a very
special case of a beautiful, but somewhat intricate, classical result of Marden and Walsh [11, Theorem 21.1] about the geometry of the zeroes of the derivative of a polynomial. For the convenience of the reader, we include a self-contained proof of the result we need. As a preliminary step, we derive a simple result on sums of complex numbers.

Lemma 3.1. Let $u_1, \ldots, u_\mu$ denote $\mu > 0$ complex numbers satisfying $|u_i - \rho| < r$ where $\rho$ and $r$ are real numbers satisfying $\rho > r > 0$. Then
\[ \frac{\mu}{\rho + r} < \left| \sum_{i=1}^{\mu} \frac{1}{u_i} \right|. \tag{4} \]

Proof. Write each $u_i$ in polar form, $r_i e^{\sqrt{-1}\theta_i}$. Since $\rho > r$, the real parts of each $1/u_i$ are positive, and we have from the triangle inequality that
\[ \left| \sum_{i=1}^{\mu} \frac{1}{u_i} \right| \ge \operatorname{Re} \sum_{i=1}^{\mu} \frac{1}{u_i} = \sum_{i=1}^{\mu} \frac{1}{r_i} \cos(-\theta_i). \tag{5} \]
Note that for a fixed $r_i$, the smallest value of the positive number $(1/r_i) \cos(-\theta_i)$ for $|u_i - \rho| < r$ occurs at the boundary $|u_i - \rho| = r$. It is a simple calculus problem that the minimum of the real part of $1/u$ for $u$ satisfying $|u - \rho| = r$ occurs when $u = \rho + r$.

Theorem 3.2 (Marden and Walsh [11]). Let $h(z)$ be a degree $d$ polynomial of one complex variable. Assume that $h(z)$ has $\mu$ zeroes in the disk $\Delta_r(z_0) := \{z \in \mathbb{C} \mid |z - z_0| \le r\}$ and $d - \mu$ zeroes in the region $\{z \in \mathbb{C} \mid |z - z_0| \ge R\}$, where $R > r$. Then, if $\mu(R + r) > 2dr$, it follows that $h'(z)$ has $\mu - 1$ roots in $\Delta_r(z_0)$ and all the remaining $d - \mu$ roots in the region
\[ \left\{ z \in \mathbb{C} \;\middle|\; |z - z_0| \ge \frac{\mu}{d}(R + r) - r \right\}. \tag{6} \]

Proof. By translation we can assume without loss of generality that $z_0 = 0$. In this case we abbreviate $\Delta_r(0)$ to $\Delta_r$. Without loss of generality we assume that $h(z)$ is monic, i.e., that the highest order term of $h(z)$ is $z^d$. Note that to prove the theorem, it is enough to prove the following assertion.

Claim 1. Given a real number $\rho$ satisfying $R > \rho > r$, it follows that if
\[ \rho < \frac{\mu}{d}(R + r) - r, \tag{7} \]
then $h'$ has exactly $\mu - 1$ zeroes in the disk $\Delta_\rho$.

The assumptions of Theorem 3.2 and Claim 1 imply that we may write $h(z) = p(z) q(z)$, where $p(z)$ is a monic polynomial of degree $\mu$, which has the same zeroes, with multiplicities, that $h(z)$ has in $\Delta_r$, and $q(z)$ is a monic polynomial with all roots at least distance $R > \rho > r$ from the origin. The polynomial $p'(z) q(z)$ has the same zeroes in
$\Delta_r$ as $p'(z)$. Therefore, it suffices to check that for $\rho > r$ satisfying Eq. (7), $p'(z) q(z)$ and $h'(z)$ have the same number of zeroes counting multiplicities in $\Delta_\rho$. By Rouché’s Theorem, e.g., [11, p. 2], we know that $p'(z) q(z)$ and $h'(z) = p'(z) q(z) + p(z) q'(z)$ have the same number of zeroes in $\Delta_\rho$ if
\[ |p(z) q'(z)| < |p'(z) q(z)| \tag{8} \]
for $z$ satisfying $|z| = \rho$. Therefore, to prove Claim 1, it suffices to show that Eq. (7) implies Eq. (8). Since $h(z)$ has no zeroes satisfying $|z| = \rho$, it suffices to show that Eq. (7) implies
\[ \left| \frac{q'(z)}{q(z)} \right| < \left| \frac{p'(z)}{p(z)} \right|. \tag{9} \]
Letting $z_1, \ldots, z_\mu$ denote the zeroes of $p(z)$, each listed a number of times equal to its multiplicity, and letting $w_1, \ldots, w_{d-\mu}$ denote the zeroes of $q(z)$, each listed a number of times equal to its multiplicity, we see that Eq. (9) is equivalent to
\[ \left| \sum_{j=1}^{d-\mu} \frac{1}{z - w_j} \right| < \left| \sum_{i=1}^{\mu} \frac{1}{z - z_i} \right|. \tag{10} \]
Consequently, we prove Claim 1, and hence the theorem, by showing that for $|z| = \rho$,
\[ \left| \sum_{j=1}^{d-\mu} \frac{1}{z - w_j} \right| < \frac{d - \mu}{R - \rho} < \frac{\mu}{\rho + r} < \left| \sum_{i=1}^{\mu} \frac{1}{z - z_i} \right|. \tag{11} \]
The leftmost inequality in expression (11) follows from the triangle inequality and the fact that for $z$ satisfying $|z| = \rho$ we have $|z - w_j| \ge R - \rho$. The middle inequality is a simple consequence of Eq. (7). To complete the proof, we proceed as follows. For $\rho$ fixed, let $z_* = \rho e^{\sqrt{-1}\theta_*}$ denote the point that minimizes the rightmost side of expression (11). Then,
\[ \left| \sum_{i=1}^{\mu} \frac{1}{z - z_i} \right| \ge \left| \sum_{i=1}^{\mu} \frac{1}{\rho e^{\sqrt{-1}\theta_*} - z_i} \right| = \left| \sum_{i=1}^{\mu} \frac{1}{\rho - z_i e^{-\sqrt{-1}\theta_*}} \right|. \tag{12} \]
Since $|z_i| \le r$, we see that Lemma 3.1 applies, and the result is shown.

We denote the $k$th derivative of $h$ by $h^{(k)}(z)$.

Corollary 3.3. Let $h(z)$ be a degree $d$ polynomial of one complex variable. Assume that $h(z)$ has $\mu$ zeroes in the disk $\Delta_r(z_0) := \{z \in \mathbb{C} \mid |z - z_0| \le r\}$ and $d - \mu$ zeroes in the region $\{z \in \mathbb{C} \mid |z - z_0| \ge R\}$, where $R > r$. Assume further that $k \le \mu - 1$. Then, if $R/r > \frac{2}{d - \mu + 1}\binom{d}{\mu} - 1$, it follows that $h^{(k)}(z)$ has $\mu - k$ roots in $\Delta_r(z_0)$ and all the remaining $d - \mu$ roots in the region
\[ \left\{ z \in \mathbb{C} \;\middle|\; |z - z_0| \ge \frac{\binom{\mu}{k}}{\binom{d}{k}} (R + r) - r \right\}. \tag{13} \]
Proof. Use Theorem 3.2 $k$ times.

Suppose that $h(z)$ is an approximation to an underlying exact polynomial having a root of multiplicity $\mu$. If we increase the precision used to evaluate $h(z)$ sufficiently, it will have a tight cluster of $\mu$ roots near the multiple root of the exact polynomial. Refining the precision until the radius $r$ of the cluster is within the bound given by Corollary 3.3 assures us that the roots of $h^{(k)}(z)$ remain clustered. In particular, letting $r$ denote the radius of a disk $\Delta_r(z_0)$ around $z_0$, the multiplicity weighted average of the roots, which contains the cluster of $\mu$ roots, and letting $R$ denote the distance from $z_0$ to the first root outside of $\Delta_r(z_0)$, we have that the conservative estimate
\[ \frac{R}{r} > \frac{2}{d - \mu + 1} \binom{d}{\mu} \tag{14} \]
guarantees that $h^{(k)}(z)$, for all $k \le \mu - 1$, has exactly $\mu - k$ zeroes in $\Delta_r(z_0)$. For example, for a polynomial of degree 75, with roots of at most multiplicity 10, $\log_{10}(R/r) \ge 10.41$ is sufficient. For a given $d$, the worst case happens when $\mu$ is approximately $d/2$. Thus $\log_{10}(R/r) \ge 57.3$ is sufficient for a degree 200 polynomial having a worst case root of multiplicity 100. We may also regard $\log_{10}(R/r)$ as a measure of the number of decimal places needed to accurately carry out the computations. Using Stirling’s approximation, we see that the number of decimal places needed in case $\mu \approx d/2$, and thus for all $\mu$ with the given $d$, is approximately:
\[ \log_{10}(R/r) \ge 0.3d. \tag{15} \]
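The figures quoted for estimate (14) can be reproduced with a few lines of exact arithmetic, and the repeated use of Theorem 3.2 in the proof of Corollary 3.3 can be compared with the closed form in (13). The function names `required_ratio` and `outer_radius` and the small test configuration are assumptions for illustration only.

```python
from fractions import Fraction
from math import comb, log10


def required_ratio(d: int, mu: int) -> Fraction:
    """Right-hand side of the conservative estimate (14): 2*C(d, mu)/(d - mu + 1)."""
    return Fraction(2 * comb(d, mu), d - mu + 1)


def outer_radius(d: int, mu: int, k: int, R, r) -> Fraction:
    """Apply Theorem 3.2 k times, tracking the radius beyond which the far roots stay."""
    R, r = Fraction(R), Fraction(r)
    for j in range(k):
        # After j differentiations the degree is d - j and the cluster has mu - j roots.
        if (mu - j) * (R + r) <= 2 * (d - j) * r:
            raise ValueError(f"hypothesis of Theorem 3.2 fails at step {j + 1}")
        R = Fraction(mu - j, d - j) * (R + r) - r
    return R


digits_75 = log10(float(required_ratio(75, 10)))     # degree 75, multiplicity 10
digits_200 = log10(float(required_ratio(200, 100)))  # degree 200, multiplicity 100

# Hypothetical small configuration: d = 10, mu = 3, k = 2, R = 1000, r = 1.
iterated = outer_radius(10, 3, 2, 1000, 1)
closed_form = Fraction(comb(3, 2), comb(10, 2)) * (1000 + 1) - 1
```

In this sketch `digits_75` and `digits_200` land close to the values 10.41 and 57.3 stated above, and the iterated radius coincides with the closed form of (13).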
To determine how to set the precision, we rearrange inequality (14) into
\[ r \le \frac{d - \mu + 1}{2 \binom{d}{\mu}}\, R. \tag{16} \]
We consider the quantities at the right of (16) as fixed, while the radius $r$ of the cluster is variable. The right-hand side of (16) imposes a bound on the accuracy of the cluster, i.e., the number of decimal places all points in the cluster need to agree on. In a straightforward but effective manner we can apply Newton’s method locally to all points in the cluster, adapting the precision until the radius $r$ is small enough.

4. Computational experiments

In this section, we report numerical results obtained with PHCpack [31]. This software package has recently been extended with facilities to handle positive dimensional solution components (see [28]). In particular, invoking the executable with the option -f gives access to the capability to factor multivariate polynomials. The black-box solver of PHCpack now applies the numerical factorization methods when given on input a single polynomial in several variables. All our experiments were done with standard double precision arithmetic.
4.1. A numerical implementation

Specialized versions of the path tracking routines in PHCpack have been built to deal with the case of homotopies between systems of one polynomial equation in several variables:
\[ f(z(t, \lambda)) = 0, \tag{17} \]
where
\[ z(t, \lambda) = (x_0 + tv)\lambda + (y_0 + tw)(1 - \lambda) \tag{18} \]
defines the movement from the line $x(t) = x_0 + tv$ to the line $y(t) = y_0 + tw$, as $\lambda$ moves from 1 to 0, i.e.: $z(t, \lambda) = x(t)\lambda + y(t)(1 - \lambda)$.

One motivation for this specialized code is to save linear algebra operations. Without the parametric representation of a general line $x(t) = x_0 + tv$, we have to consider polynomial systems of $n$ equations consisting of one polynomial (the polynomial we wish to factor) and $n - 1$ hyperplanes to cut out the line. Now we can use, like in [20], the method of Weierstrass$^1$ (also known as Durand–Kerner) to solve $f(x(t)) = 0$. See [32] for methods to locate zero clusters of univariate polynomials. The other motivation is that we hope to have a better understanding and control of the numerical stability of the algorithms.

To give an impression of the numerical difficulty of computing witness points, consider the substitution of $x(t) = x_0 + tv$ into the polynomial $f(x) = x^d$ (in one variable $x$). Application of the binomial theorem gives
\[ f(x) = (x_0 + tv)^d = \sum_{k=0}^{d} \binom{d}{k} x_0^{d-k} v^k t^k = 0 \tag{19} \]
which must be solved for $t$. Assuming that the magnitudes of $x_0$ and $v$ are approximately one, the leading coefficient is also of magnitude one, but for degree $d = 30$, the largest coefficient ($k = 15$) has a magnitude occupying nine decimal places. Such large ranges in coefficients are known to cause numerical sensitivity.

Extrapolating from this simple case to the general case of computing all witness points, solving $f(x(t)) = f(x_0 + tv) = 0$ for $t$, we warn that even if the original coefficients of $f$ are nice, and if we choose the entries of $x_0$ and $v$ on the complex unit circle, for large degrees, the univariate polynomial in $t$ may have coefficients that vary greatly in magnitude. To deal with such polynomials numerically, higher working precision may be required. This also implies that the path tracking in the monodromy phase of the algorithm may need higher precision.
In previous work on polynomial systems having positive dimensional solution sets of higher multiplicity [28], we already extended the path-tracking routines in PHCpack to multi-precision. For such general polynomial systems, multi-precision path tracking tends to be computationally expensive, but we expect more reasonable execution times when addressing homotopies of only one polynomial equation.

¹ In [15] this method is qualified as “quite effective and increasingly popular”. Convergence is global and quadratic in the limit [17].
A.J. Sommese et al. / Theoretical Computer Science 315 (2004) 651 – 669
Our working precision determines the accuracy of the algorithm RegularFactor. For instance, consider the polynomial $f(x, y) = xy + 10^{-16}$. Working with standard double precision floating point numbers, the constant in $f(x, y)$ will be ignored and a loop which shows $f$ is irreducible will not be found, and also the validation with linear traces will confirm the breakup into the factors $x$ and $y$. On the other hand, doubling the precision will show that $f$ is irreducible. In principle, the groupings of the monodromy algorithm can deal with approximate coefficients if we set the working precision according to the accuracy level of the coefficients. However, this scheme only works for sufficiently low degrees, because more precision is usually needed to evaluate polynomials of high degree.

To obtain symbolic expressions of the polynomial factors, we applied the interpolation methods using traces, developed and implemented for any degree and any number of variables using multi-precision arithmetic if needed, reported in [25]. Multivariate Newton interpolation with divided differences is outlined in [9]. Since the algorithms for multivariate Newton interpolation involve a recursive application of the classical one variable case, the error analysis in [7, pp. 110–111] shows the relation between the errors on the coefficients, the degree of the polynomial, and the working precision. In particular, the error $\Delta c$ on the coefficients is bounded by

\[ |\Delta c| \le \left( \frac{1}{(1 - 3u)^n} - 1 \right) |L|\,|f|, \tag{20} \]

where $u$ is an upper bound on the roundoff in one arithmetical operation (depends on the working precision), $n$ is the degree of the polynomial, $L$ is the product of $n$ bidiagonal matrices of dimension $n$ (expressing the computation of divided differences in matrix–vector form), and $f$ is the vector of function values at sample points. Note, however, that usually we do not need the symbolic representation of the factors to work with them.
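The two precision effects discussed above can be reproduced in a few lines. This is only a sketch under stated assumptions: Python's `decimal` module with 34 significant digits stands in for "doubled" precision, the sample point $(1.25, 1.0)$ and the helper names are hypothetical, and $u = 2^{-53}$ is taken as the unit roundoff of IEEE double precision.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 34  # stand-in for roughly twice the double precision digits


def f_double(x, y):
    """f(x, y) = x*y + 1e-16 evaluated in double precision."""
    return x * y + 1e-16


def f_extended(x, y):
    """The same polynomial evaluated with 34 significant decimal digits."""
    return Decimal(x) * Decimal(y) + Decimal("1e-16")


def amplification(n, u):
    """The factor (1 - 3u)**(-n) - 1 of bound (20), computed with log1p and
    expm1 so that the subtraction of 1 does not cancel all digits."""
    return math.expm1(-n * math.log1p(-3.0 * u))


x, y = 1.25, 1.0

# In double precision the constant term is absorbed by rounding ...
constant_lost = f_double(x, y) == x * y
# ... while the higher precision evaluation still sees it.
constant_kept = f_extended(x, y) != Decimal(x) * Decimal(y)
print(constant_lost, constant_kept)

u_double = 2.0 ** -53
factor_deg12 = amplification(12, u_double)
print(factor_deg12)  # close to 3 * 12 * u for small u
```

For small $u$ the factor in (20) behaves like $3nu$, so the bound grows linearly with the degree $n$ at a fixed working precision.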
For instance, the witness points alone suffice to determine whether a point satisfies a factor, via the homotopy membership test of [23].

4.2. Singularities of Stewart–Gough platforms

A mechanical device of considerable interest in mechanical engineering is the Stewart–Gough platform, consisting of a moving platform supported above a stationary base by six legs. One end of the $i$th leg connects to the base via a ball joint centered on point $b_i$ (given in the base coordinate system) and the other end connects to the platform via a ball joint at point $a_i$ (given in platform coordinates). The length of the leg, $L_i$, is controlled by a linear actuator. A good general reference discussing this device and its relatives is Merlet [14]. For fixed leg lengths, the device is generally rigid, but at singular configurations, the rigidity of the device is lost. That is, even though the leg lengths are held constant, the platform has at least one combination of velocity and angular velocity that, up to first order, is unconstrained. In the design of such a device, an understanding of its singularities is crucial; see [12,13] and their references for background.

The condition for singularity can be derived as follows. We represent the position of the platform by $p \in \mathbb{C}^3$ and the orientation by a quaternion $q \in \mathbb{P}^3$. Letting $v_i$ be the
vector, in base coordinates, from base point $b_i$ to the corresponding platform point $a_i$, we have

\[ v_i := -b_i + p + R(q) a_i = -b_i + p + q a_i q' / (q q'), \tag{21} \]

where $R(q)$ is the $3 \times 3$ rotation matrix giving the same rotation to a vector $w$ as the quaternion operation $q w q' / (q q')$, with $q'$ the conjugate of $q$. The squared length of leg $i$ is $L_i^2 = v_i \cdot v_i$, so

\[ L_i \dot{L}_i = v_i \cdot \dot{v}_i = v_i \cdot (\dot{p} + \omega \times (R a_i)), \tag{22} \]

where “$\cdot$” is the vector inner product, $\times$ is the vector cross product, and $\omega$ is the angular velocity vector. Rewriting $R = \hat{R}/Q$ with

\[ Q := q q' = q_0^2 + q_1^2 + q_2^2 + q_3^2 \tag{23} \]

and substituting from Eq. (21), we may transform Eq. (22) into

\[ Q L_i \dot{L}_i = (Q(p - b_i) + \hat{R} a_i) \cdot \dot{p} + ((\hat{R} a_i) \times (p - b_i)) \cdot \omega. \tag{24} \]

Letting $J$ be the matrix whose $i$th column is

\[ J_i = \begin{bmatrix} Q(p - b_i) + \hat{R} a_i \\ (\hat{R} a_i) \times (p - b_i) \end{bmatrix}, \quad i = 1, \ldots, 6, \tag{25} \]

the singularity condition is $0 = Jw$ for some $w \ne 0$, or equivalently, $\det J = 0$. Since the elements of $\hat{R}$ and $Q$ are quadratic polynomials in $q$, one sees that $\det J$ is a polynomial in $p, q, a_i, b_i$. Taking all of these as variables, the first three rows of $J$ are cubic and the last three are quadric, so $\det J$ is a homogeneous polynomial of degree 1728 in 42 variables. Not much understanding is likely to result from analyzing such a complicated object, nor could we begin to deal with it numerically. However, considerable insight can be gained by studying cases where some variables are taken as given, as has been done in [12,18]. In the next few paragraphs, we study such cases, some never before published, and use our numerical algorithm to factor $\det J$.

In all of the following examples, we expanded $\det J$ into monomials for convenient input to our computer code. This made automatic computation of derivatives very simple, but it is a very inefficient way to evaluate the polynomial. It would be much more efficient and accurate to evaluate the matrix entries numerically and then evaluate the determinant by reducing the matrix to triangular form. Therefore, the computation times reported here are far from the best that could be achieved. The examples serve to show how the algorithm works on fully expanded polynomials.

4.2.1. General platform, fixed position

For the general platform, we give $p$, $a_i$ and $b_i$, $i = 1, \ldots, 6$, as random, complex values; that is, we choose a generic Stewart–Gough platform at a generic position, and look for singularities arising from rotations of the platform. The factorization of one such example will indicate, with probability one, the form of the factorization for
Table 1
Cluster radius r versus distance R to the nearest root outside the cluster, for the first case of general platform, fixed position. There are three roots in every cluster

Cluster   r         R         R/r
One       1.7E-05   3.4E-01   2.0E+04
Two       4.9E-06   1.7E-01   3.6E+04
Table 2
Execution times for the first case of general platform, fixed position

Elapsed user CPU times on 2.4 GHz Windows XP
1. Monodromy grouping:            0 h    6 min   40 s   469 ms
2. Linear traces certification:   0 h    0 min   30 s   672 ms
3. Interpolation at factors:      1 h   41 min   53 s    78 ms
4. Multiplication validation:     0 h    0 min    8 s   156 ms
Total time for all 4 stages:      1 h   49 min   12 s   391 ms
almost all² Stewart–Gough platforms. In this case, $\det J$ is a homogeneous polynomial of degree 12 in $q = \{q_0, q_1, q_2, q_3\}$. We find numerically that a generic line hits $\det J$ in six regular points and two singular points of multiplicity 3. The regular points form one factor of degree six. Using differentiation to remove the multiplicity, we find that the two singular points form one quadratic factor, and interpolating that factor shows it to be precisely $Q$ from Eq. (23). That is,

\[ \det J = F_1(q)\, Q^3, \tag{26} \]

where $F_1(q)$ is a sextic. Since a quaternion with zero norm does not represent a valid rotation matrix, the factor of $Q^3 = 0$ is not of physical significance.

² The exceptions will be an algebraic subset of the space of all platform devices as parameterized by $p$, $a_i$, $b_i$, $i = 1, \ldots, 6$.

The computed factorization is certified with linear traces by comparing the value at the linear trace with the calculated sum of the witness points on each factor. The maximal difference in the comparison for the two factors is 2.049E-13. If we multiply the interpolated factors and take the difference with the original polynomial, then the largest norm of the coefficients of the difference polynomial is 1.919E-05, as explained by roundoff in the interpolation of high degree polynomials.

The data for the cluster analysis, comparing cluster radius $r$ and distance $R$ to the nearest other root outside the cluster, is given in Table 1. For degree $d = 12$ and multiplicity $\mu = 3$, the right-hand side of estimate (14) evaluates to 44. This bound is clearly smaller than $10^4$, so the initial approximations for the multiple root are accurate enough for the differentiation process.

Table 2 lists the execution times for each stage in the factorization: monodromy grouping, certification with linear traces, interpolation in the factors, and finally, the
Table 3
Cluster radius r versus distance R to the nearest root outside the cluster, for the second case of planar base and platform, fixed position. There are three roots in every cluster

Cluster   r         R         R/r
One       6.2E-05   2.4E-01   3.8E+04
Two       4.8E-05   6.0E-01   1.2E+04
Table 4
Execution times for the second case of planar base and platform, fixed position

Elapsed user CPU times on 2.4 GHz Windows XP
1. Monodromy grouping:            0 h   17 min   34 s   735 ms
2. Linear traces certification:   0 h    0 min   27 s   359 ms
3. Interpolation at factors:      1 h   24 min   45 s   766 ms
4. Multiplication validation:     0 h    0 min    8 s   172 ms
Total time for all 4 stages:      1 h   42 min   56 s    32 ms
comparison of the product of the factors with the original polynomial. The evaluation of a polynomial of degree 12 in four variables with 910 terms is responsible for the dominance of stages one and three in the overall execution time.

4.2.2. Planar base and platform, fixed position

This is the same as the former case, except that the third component of each of $a_i$, $b_i$, $i = 1, \ldots, 6$ is zero, meaning that the points of the base are all in a common plane, as are the points of the platform. Now $\det J$ is still homogeneous of degree 12, and it still factors in two pieces: one irreducible single factor of degree six, and the quadratic factor having multiplicity three.

The maximal difference in certifying with linear traces is 4.147E-11, i.e., the difference between the calculated sum in the roots and the value predicted by the linear trace of the factor is 4.147E-11, showing the influence of roundoff in evaluating a polynomial of degree 12 in four variables with 910 terms. The influence of roundoff in the comparison between the original and the product of the interpolated factors is even more obvious: we see 9.483E-05 as the maximal norm of the difference in the coefficients.

In Table 3 we summarize the results of the cluster analysis for two clusters which contain witness points of multiplicity three. As in Table 1 we can make the same observations, to conclude that $R/r \approx 10^4 > 44$ is safe for the differentiations. Table 4 shows the execution times. The algorithm RegularFactor takes so much time because of the cost of evaluating a polynomial of degree 12 in four variables with 910 terms.
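The remark in Section 4.2 that $\det J$ is better evaluated from the matrix entries than from its monomial expansion can be illustrated as follows. This is a hedged sketch, not the code used for the reported timings: the function names and the random data are hypothetical, $\hat{R}(q)$ is taken to be the standard unnormalized quaternion rotation matrix, and the determinant is computed by Gaussian elimination with partial pivoting. The two checks at the end probe the degree-12 homogeneity in $q$ and the vanishing of $\det J$ on $Q = 0$.

```python
import math
import random


def rhat(q):
    """Unnormalized rotation matrix R-hat(q) = Q * R(q); entries quadratic in q."""
    q0, q1, q2, q3 = q
    return [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]


def cross(u, w):
    return [u[1]*w[2] - u[2]*w[1], u[2]*w[0] - u[0]*w[2], u[0]*w[1] - u[1]*w[0]]


def singularity_matrix(p, q, a, b):
    """6x6 matrix J whose i-th column is J_i of Eq. (25)."""
    Q = sum(c * c for c in q)  # Eq. (23): no complex conjugation
    Rh = rhat(q)
    columns = []
    for ai, bi in zip(a, b):
        Ra = [sum(Rh[r][c] * ai[c] for c in range(3)) for r in range(3)]
        d = [p[k] - bi[k] for k in range(3)]
        columns.append([Q * d[k] + Ra[k] for k in range(3)] + cross(Ra, d))
    return [[columns[j][i] for j in range(6)] for i in range(6)]


def det(matrix):
    """Determinant via reduction to triangular form with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1 + 0j
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0:
            return 0j
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            result = -result
        result *= m[k][k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= factor * m[k][c]
    return result


def hadamard_bound(matrix):
    """Product of Euclidean row norms: an upper bound for |det|."""
    return math.prod(math.sqrt(sum(abs(x) ** 2 for x in row)) for row in matrix)


random.seed(0)


def random_point():
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]


p = random_point()
a = [random_point() for _ in range(6)]
b = [random_point() for _ in range(6)]

q = [0.3 + 0.2j, -0.5 + 0.1j, 0.7 - 0.4j, 0.2 + 0.6j]
d1 = det(singularity_matrix(p, q, a, b))
d2 = det(singularity_matrix(p, [2 * c for c in q], a, b))
print(abs(d2 / d1))  # homogeneity of degree 12 predicts 2**12

q_null = [1, 2, 2, 3j]  # Q = 1 + 4 + 4 - 9 = 0
J_null = singularity_matrix(p, q_null, a, b)
print(abs(det(J_null)) / hadamard_bound(J_null))  # roundoff-sized
```

Each evaluation costs a fixed number of operations for a $6 \times 6$ matrix, in contrast to summing the 910 monomials of the expanded polynomial.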
Table 5
Cluster radius r versus distance R to nearest root outside the cluster, for the third case of planar base and platform, parallel planes. There are three roots in the first cluster, and five roots in the other two clusters

Cluster   r         R         R/r
One       5.1E-07   1.0E+00   2.0E+06
Two       7.3E-04   3.4E-01   4.7E+02
Three     4.0E-03   7.2E-01   1.8E+02
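The quantities $r$ and $R$ reported in Tables 1, 3, and 5, and the differentiation step they are meant to justify, can be sketched on a univariate toy polynomial. Everything below is an illustrative assumption rather than the paper's procedure: the cluster radius is measured from the cluster centroid, the polynomial $(t-1)^3(t+2) + \varepsilon$ and its first-order root approximations are made up for the example, and plain Newton iteration is used on the derivative of order $\mu - 1$. Estimate (14), which is not reproduced here, supplies the threshold that $R/r$ has to exceed.

```python
import cmath


def evaluate(coeffs, t):
    """Horner evaluation; coeffs[k] multiplies t**k."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def cluster_separation(roots, members):
    """Return (r, R): radius of the cluster about its centroid (assumed
    definition) and distance from the centroid to the nearest other root."""
    inside = [roots[i] for i in members]
    outside = [z for i, z in enumerate(roots) if i not in members]
    center = sum(inside) / len(inside)
    r = max(abs(z - center) for z in inside)
    R = min(abs(z - center) for z in outside)
    return r, R


def newton(coeffs, t, steps=50, tol=1e-14):
    """Plain Newton iteration on a polynomial given by its coefficients."""
    d1 = derivative(coeffs)
    for _ in range(steps):
        step = evaluate(coeffs, t) / evaluate(d1, t)
        t -= step
        if abs(step) <= tol * max(1.0, abs(t)):
            break
    return t


# f(t) = (t - 1)**3 * (t + 2) + eps: the triple root at 1 becomes a cluster.
eps = 1e-9
f = [-2 + eps, 5, -3, -1, 1]
mu = 3

# First-order root approximations: 3*delta**3 = -eps near t = 1,
# and -27*(t + 2) + eps = 0 near t = -2.
delta = (eps / 3) ** (1 / 3)
cube_roots_of_minus_one = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 3) for k in range(3)]
roots = [1 + delta * w for w in cube_roots_of_minus_one] + [-2 + eps / 27]

r, R = cluster_separation(roots, [0, 1, 2])
print(r, R, R / r)

# Differentiate mu - 1 times: the cluster is replaced by one simple root.
g = f
for _ in range(mu - 1):
    g = derivative(g)
simple_root = newton(g, roots[0])
print(simple_root)
```

Starting Newton's method from any member of the cluster, the iteration on the second derivative converges to a single well-conditioned root, which is the role differentiation plays for the witness points of multiplicity three and five in the tables.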
4.2.3. Planar base and platform, parallel planes

In this case, which was studied in [12,18], we consider a device with planar base and platform in a configuration with the two planes parallel to each other. The condition of parallelism means that the platform is rotated only about the third axis, so $q_1 = q_2 = 0$. The position, $p$, is now left as variable, and $\det J$ becomes cubic in $p$, homogeneous of degree 12 in $(q_0, q_3)$, and degree 15 in $(p, q)$ together. One does not know a priori that the contribution of $p$ will factor out separately, but in fact it does. The computed factorization is

\[ \det J = a\, p_3^3\, (q_0 + b q_3)(q_0 + c q_3)(q_0 + i q_3)^5 (q_0 - i q_3)^5, \tag{27} \]

where $a, b, c$ are constants that depend on the choice of $a_i, b_i$. This result is in agreement with [18], when we consider that, over the complex numbers, any homogeneous polynomial in two variables breaks into linear factors. Notice that the multiplicity five factors are points on $Q$ from Eq. (23) and therefore are not of physical significance. The condition $p_3 = 0$ means that the two planes coincide, which is clearly singular since then the legs provide no support perpendicular to the plane. Otherwise, singularity does not depend on position at all, as the other factors depend only on orientation. This fact is used to advantage in [18] to characterize the singularities of the planar–planar Stewart–Gough platforms.

The numerical results give a maximal difference over all factors between the computed sum of roots and the value at the linear trace as 8.047E-08. When we interpolate the factors and compare the multiplied factors with the original polynomial, we find 3.599E-07 as the highest norm of the difference between the coefficients.

In Table 5 we summarize the results of the cluster analysis for the three factors occurring with multiplicities three, five, and five. Observe that the cluster radius grows as the multiplicity gets larger. The bound in the estimate of the right-hand side of (14) now evaluates to 546 using $d = 15$ and $\mu = 5$. While the approximation for $R/r$ lies now much closer to this bound, numerically we can still apply the differentiation procedure successfully. Table 6 shows the execution times.

Since there is a single cluster that contains three witness points, we may immediately conclude that it represents a linear factor of multiplicity three, namely $p_3^3$, without further testing. In contrast, there are two clusters of size 5, so we must apply monodromy or trace tests to determine whether they represent an irreducible quadratic or they factor into two linears. While the other polynomials of Sections 4.2.1 and 4.2.2 each have 910 terms, this polynomial is much
Table 6
Execution times for the third case of planar base and platform, parallel planes

Elapsed user CPU times on 2.4 GHz Windows XP
1. Monodromy grouping:            1 min   13 s   656 ms
2. Linear traces certification:   0 min    3 s   891 ms
3. Interpolation at factors:      0 min    4 s   734 ms
4. Multiplication validation:     0 min    1 s   657 ms
Total time for all 4 stages:      1 min   23 s   938 ms
Table 7
Execution times for the factorization using monodromy, compared to enumerating factors, for the three cases of the singularities of the Stewart–Gough platform

User CPU times on 2.4 GHz Windows XP
Case   Monodromy               Enumeration
1       6 min  40 s  460 ms    40 s  750 ms
2      17 min  34 s  735 ms    31 s  657 ms
3       1 min  13 s  656 ms     3 s    0 ms
Table 8
Execution times for the factorization using monodromy, compared to enumerating the irreducible factors, for three very sparse random polynomials of increasing degrees

User CPU times on 2.4 GHz Windows XP
Degree   Monodromy       Enumeration
10        5 s  484 ms          312 ms
15        8 s  187 ms    1 s   453 ms
16       16 s   63 ms    2 s   875 ms
sparser: only 24 terms. The sparsity reduces the cost of evaluation and explains why this polynomial is factored much faster than the other ones.

4.3. Monodromy compared to the enumeration method

In this section, we show that for the polynomials of modest degrees that we factored in this paper, the enumeration method outperforms the monodromy algorithm. In Table 7 we list execution times for the three cases treated above. We see the enumeration method as a clear winner. Recall that the highest degree of an irreducible factor is six, so only relatively few tests are needed in the enumeration method.

Irreducible polynomials are the most difficult for the enumeration method. In Table 8 we compare execution times again, but now for random irreducible polynomials of five
monomials, and for increasing degrees. We see that the ratio of monodromy time to enumeration time drops from 18 to about 6 as the degree increases.
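Why irreducible polynomials are the hard case for enumeration can be seen in a toy version of the subset search with a linear trace test. The sketch below is an assumption-laden illustration, not the method of [5,6,19] as implemented: the witness paths of the hypothetical polynomial $(y^2 - x)(y - 2x - 1)$ are given in closed form instead of being tracked numerically, three parallel slices $x = 1, 4, 9$ are used, and a subset is accepted when the sum of its points varies linearly across the slices.

```python
import cmath
from itertools import combinations

# Hypothetical witness paths y_i(x) of f(x, y) = (y**2 - x) * (y - 2*x - 1);
# in practice these values would come from path tracking between slices.
paths = [
    lambda x: cmath.sqrt(x),
    lambda x: -cmath.sqrt(x),
    lambda x: 2 * x + 1,
]


def trace_defect(subset, slices):
    """Deviation from linearity of the sum of the chosen points over 3 slices."""
    x0, x1, x2 = slices
    s0, s1, s2 = (sum(paths[i](x) for i in subset) for x in slices)
    predicted = s0 + (s1 - s0) * (x2 - x0) / (x1 - x0)
    return abs(s2 - predicted)


def enumerate_factors(n_points, slices, tol=1e-9):
    """Group witness points by testing subsets of increasing size."""
    remaining = list(range(n_points))
    factors, n_tests, size = [], 0, 1
    while remaining and size <= len(remaining):
        found = None
        for subset in combinations(remaining, size):
            n_tests += 1
            if trace_defect(subset, slices) < tol:
                found = subset
                break
        if found:
            factors.append(found)
            remaining = [i for i in remaining if i not in found]
        else:
            size += 1
    return factors, n_tests


groups, n_tests = enumerate_factors(len(paths), (1.0, 4.0, 9.0))
print(groups)   # [(2,), (0, 1)]: the line, then the two branches of y**2 = x
print(n_tests)  # 6
```

In this sketch an irreducible polynomial of degree $d$ forces every proper nonempty subset to be rejected before the full set is accepted, that is, $2^d - 1$ tests in total, which is the exponential growth referred to in the text.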
5. Conclusions

In this paper, we show how the numerical factorization of a single polynomial in several variables with approximate complex coefficients can be accomplished by specializing continuation methods for the numerical irreducible decomposition of a polynomial system. In the case of a system of polynomials, components with multiplicity $\mu > 1$ require the tracking of a singular path, but this difficult numerical task can be avoided in the specialization to a single polynomial. To do so, we symbolically differentiate the polynomial $\mu - 1$ times, thus replacing singular roots with nonsingular ones. In floating point calculations, a multiple root becomes a cluster of nearly singular roots. Via a result of Marden and Walsh, we can estimate the precision needed to successfully apply differentiation to replace such clusters with one nonsingular root.

To illustrate the methods, we applied the algorithms in a study of singularities of Stewart–Gough platforms. We experienced that a numerical factorization (i.e., partition of the witness set) is usually less expensive and more numerically stable than the subsequent interpolation to obtain coefficients for the factor written as a polynomial. Since most questions can be answered via continuation of the witness set, the interpolation step can usually be omitted.

As the algorithms require repeated evaluations of the polynomial and its derivatives, a major factor in the cost of the method is the sparsity of the polynomials. The sparser the polynomials, the faster the evaluation and interpolation algorithms run. Since the polynomials we can factor with standard arithmetic have modest degrees, the enumeration methods of Galligo and Rupprecht [5,6,19] proved to be faster than the monodromy method in our tests. If one were to factor polynomials of higher degree, the speed advantage might reverse, due to exponential growth of the number of cases the enumeration method may have to test.
Acknowledgements

We gratefully acknowledge the support of this work by Volkswagen-Stiftung (RiP program at Oberwolfach). The first author thanks the Duncan Chair of the University of Notre Dame and National Science Foundation. This material is based upon work supported by the National Science Foundation under Grant No. 0105653. The second author thanks the Department of Mathematics of the University of Illinois at Chicago and National Science Foundation. This material is based upon work supported by the National Science Foundation under Grant No. 0105739 and Grant No. 0134611. The third author thanks General Motors Research and Development for their support. The authors thank André Galligo for bringing the work of [1] to their attention. Last but not least, we are grateful to the editors and referees for their suggestions to improve the paper.
References

[1] C. Bajaj, J. Canny, T. Garrity, J. Warren, Factoring rational polynomials over the complex numbers, SIAM J. Comput. 22 (2) (1993) 318–331.
[2] R.M. Corless, A. Galligo, I.S. Kotsireas, S.M. Watt, A geometric-numeric algorithm for factoring multivariate polynomials, in: T. Mora (Ed.), Proc. 2002 Internat. Symp. on Symbolic and Algebraic Computation (ISSAC 2002), ACM, New York, 2002.
[3] R.M. Corless, M.W. Giesbrecht, M. van Hoeij, I.S. Kotsireas, S.M. Watt, Towards factoring bivariate approximate polynomials, in: B. Mourrain (Ed.), Proc. 2001 Internat. Symp. on Symbolic and Algebraic Computation (ISSAC 2001), ACM, New York, 2001, pp. 85–92.
[4] R.M. Corless, E. Kaltofen, S.M. Watt, Hybrid methods, in: J. Grabmeier, E. Kaltofen, V. Weispfenning (Eds.), Computer Algebra Handbook, Springer, Berlin, 2002, pp. 112–125.
[5] A. Galligo, D. Rupprecht, Semi-numerical determination of irreducible branches of a reduced space curve, in: B. Mourrain (Ed.), Proc. 2001 Internat. Symp. on Symbolic and Algebraic Computation (ISSAC 2001), ACM, New York, 2001, pp. 137–142.
[6] A. Galligo, D. Rupprecht, Irreducible decomposition of curves, J. Symbolic Comput. 33 (5) (2002) 661–677.
[7] N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, PA, 1996.
[8] Y. Huang, W. Wu, H.J. Stetter, L. Zhi, Pseudofactors of multivariate polynomials, in: C. Traverso (Ed.), Proc. 2000 Internat. Symp. on Symbolic and Algebraic Computation (ISSAC 2000), 2000, pp. 161–168.
[9] E. Isaacson, H.B. Keller, Analysis of Numerical Methods, Dover, New York, 1994 (Dover Reprint of the 1966 Wiley edition).
[10] E. Kaltofen, Challenges of symbolic computation: my favorite open problems, J. Symbolic Comput. 29 (6) (2000) 891–919.
[11] M. Marden, The geometry of the zeroes of a polynomial in a complex variable, in: Mathematical Surveys, Vol. 3, American Mathematical Society, New York, 1949.
[12] B. Mayer St-Onge, C.M. Gosselin, Singularity analysis and representation of the general Gough–Stewart platform, Internat. J. Robotics Res. 19 (3) (2000) 271–288.
[13] J.P. Merlet, Singular configurations of parallel manipulators and Grassmann geometry, Internat. J. Robotics Res. 8 (5) (1999) 45–56.
[14] J.P. Merlet, Parallel Robots, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[15] V. Pan, Solving a polynomial equation: some history and recent progress, SIAM Rev. 39 (2) (1997) 187–220.
[16] V. Pan, Computation of approximate polynomial GCDs and an extension, Inform. Comput. 167 (2001) 71–85.
[17] L. Pasquini, D. Trigiante, A globally convergent method for simultaneously finding polynomial roots, Math. Comput. 44 (169) (1985) 135–149.
[18] F. Pernkopf, M.L. Husty, Singularity analysis of spatial Stewart–Gough platforms with planar base and platform, Proc. ASME Design Eng. Tech. Conf. Montreal, Canada, September 30–October 2, 2002.
[19] D. Rupprecht, Semi-numerical absolute factorization of polynomials with integer coefficients, manuscript.
[20] T. Sasaki, Approximate multivariate polynomial factorization based on zero-sum relations, in: B. Mourrain (Ed.), Proc. 2001 Internat. Symp. on Symbolic and Algebraic Computation (ISSAC 2001), ACM, New York, 2001, pp. 284–291.
[21] A.J. Sommese, J. Verschelde, Numerical homotopies to compute generic points on positive dimensional algebraic sets, J. Complexity 16 (3) (2000) 572–602.
[22] A.J. Sommese, J. Verschelde, C.W. Wampler, Numerical decomposition of the solution sets of polynomial systems into irreducible components, SIAM J. Numer. Anal. 38 (6) (2001) 2022–2046.
[23] A.J. Sommese, J. Verschelde, C.W. Wampler, Numerical irreducible decomposition using projections from points on the components, in: E.L. Green, S. Hoşten, R.C. Laubenbacher, V. Powers (Eds.), Journal of Symbolic Computation: Solving Equations in Algebra, Geometry, and Engineering, Contemporary Mathematics, Vol. 286, American Mathematical Society, Providence, RI, 2001, pp. 37–51.
[24] A.J. Sommese, J. Verschelde, C.W. Wampler, Using monodromy to decompose solution sets of polynomial systems into irreducible components, in: C. Ciliberto, F. Hirzebruch, R. Miranda, M. Teicher (Eds.), Application of Algebraic Geometry to Coding Theory, Physics and Computation, Proc. NATO Conf., Eilat, Israel, Kluwer Academic Publishers, Dordrecht, February 25–March 1, 2001, pp. 297–315.
[25] A.J. Sommese, J. Verschelde, C.W. Wampler, Symmetric functions applied to decomposing solution sets of polynomial systems, SIAM J. Numer. Anal. 40 (6) (2002) 2026–2046.
[26] A.J. Sommese, J. Verschelde, C.W. Wampler, A method for tracking singular paths with application to the numerical irreducible decomposition, in: M.C. Beltrametti, F. Catanese, C. Ciliberto, A. Lanteri, C. Pedrini. W. de Gruyter (Eds.), Algebraic Geometry, a Volume in Memory of Paolo Francia, W. de Gruyter, Belmont, CA, 2002, pp. 329–345.
[27] A.J. Sommese, J. Verschelde, C.W. Wampler, Advances in polynomial continuation for solving problems in kinematics, in: Proc. ASME Design Engineering Tech. Conf. (CDROM), Paper DETC2002/MECH-34254, Montreal, Quebec, September 29–October 2, 2002 (A revised version will appear in the ASME Journal of Mechanical Design).
[28] A.J. Sommese, J. Verschelde, C.W. Wampler, Numerical irreducible decomposition using PHCpack, in: M. Joswig, N. Takayama (Eds.), Algebra, Geometry, and Software Systems, Springer, Berlin, 2003, pp. 109–130.
[29] A.J. Sommese, J. Verschelde, C.W. Wampler, Introduction to numerical algebraic geometry, in: Graduate School on Systems of Polynomial Equations: From Algebraic Geometry to Industrial Applications, 2003, pp. 229–247 (Organized by A. Dickenstein, I. Emiris, 14–25 July 2003, Buenos Aires, Argentina. Notes published by INRIA).
[30] A.J. Sommese, C.W. Wampler, Numerical algebraic geometry, in: J. Renegar, M. Shub, S. Smale (Eds.), The Mathematics of Numerical Analysis, Lectures in Applied Mathematics, Vol. 32, Proc. AMS-SIAM Summer Seminar in Applied Mathematics, Park City, Utah, July 17–August 11, 1995, Park City, UT, 1996, pp. 749–763.
[31] J. Verschelde, Algorithm 795: PHCpack: a general-purpose solver for polynomial systems by homotopy continuation, ACM Transactions on Mathematical Software 25 (2) (1999) 251–276, Software available at http://www.math.uic.edu/~jan.
[32] J.-C. Yakoubsohn, Finding a cluster of zeros of univariate polynomials, J. Complexity 16 (3) (2000) 603–638.