OPTIMAL SOLUTION OF NONLINEAR EQUATIONS
OPTIMAL SOLUTION OF NONLINEAR EQUATIONS Krzysztof A. Sikorski
OXFORD UNIVERSITY PRESS
2001
OXFORD UNIVERSITY PRESS
Oxford New York Athens Auckland Bangkok Bogota Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris Sao Paulo Shanghai Singapore Taipei Tokyo Toronto Warsaw and associated companies in Berlin Ibadan
Copyright © 2001 by Oxford University Press, Inc. Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016. Oxford is a registered trademark of Oxford University Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.
Library of Congress Cataloging-in-Publication Data
Sikorski, Krzysztof A., 1953-
Optimal solution of nonlinear equations / Krzysztof A. Sikorski.
p. cm. Includes bibliographical references and index.
ISBN 0-19-510690-3
1. Differential equations, Nonlinear—Numerical solutions. 2. Mathematical optimization. 3. Fixed point theory. 4. Topological degree. I. Title.
QA377 S4918 2000 515'.355—dc21 99-045246
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
To my wife Elizabeth and our son John
Preface
The purpose of this monograph is to provide an overview of optimal computational methods for the solution of nonlinear equations, fixed points of contractive and noncontractive mappings, and for the computation of the topological degree. We analyze the worst case scenario here. This means that for a given error criterion and a tolerance e, the methods guarantee the computation of an e-approximation to the solution for every function in a given class F. Optimal methods solve the problem in the smallest possible time. We study several classes of functions, with special emphasis on tight complexity bounds and methods that are close to or achieve these bounds. In addition, pseudocodes and numerical tests of several methods are exhibited. The monograph should be viewed as a report on work in progress. We provide several worst case results, list numerous open problems, and mention new work in average case analysis, as well as alternative models to be explored. The work on optimal complexity algorithms for nonlinear problems had its inception in the work of Kiefer in 1953 and Traub in 1961. This stream of research complements the classical numerical work on convergence of iterative techniques as summarized by Ortega and Rheinboldt in 1970. In the annotations to chapter 2 we give a brief history and list references to this work. In the 1980s Traub and Wozniakowski initiated a general complexity theory for solving continuous problems. Our work on nonlinear problems fits into this field, presently known as Information-Based Complexity. In the late 1980s a new stream of research in algebraic complexity originated with the work of Blum, Shub, and Smale. Several optimality results and new algorithms for approximating zeros of systems of real and complex polynomial equations were derived. In addition, a new algebraic topological complexity theory over the reals was established.
We stress that the results we present strongly depend on the assumptions on the classes of functions considered and confirm the worst case behavior observed in numerical experiments. It may surprise some practitioners that, for example, the bisection method is proven to be optimal, even in the class of infinitely smooth functions changing sign at the endpoints of an interval. We stress that this is a worst case result. It confirms the fact that all iterative techniques based on linear information, in the worst case, exhibit only linear convergence. This was also the conclusion of extensive numerical experiments. On the contrary, an at least quadratic rate of convergence of hybrid methods was observed in many tests. This is indeed confirmed by the average case analysis of hybrid bisection-Newton and bisection-secant type methods carried out by Novak, Ritter, and Wozniakowski in 1995 (see the annotations to chapter 2). They also proved that such hybrid methods are optimal on the average. We present a brief summary of all chapters. More detailed summaries are given in the introductions to the chapters. In chapter 1 the basic concepts are illustrated with a simple bisection example and then formalized in the language of Information-Based Complexity theory. Adversary arguments lead to lower bounds on errors of algorithms. The real number model with oracle calls is defined. Basic notions of optimal information and algorithms are specified. In chapter 2 we analyze optimal methods for solving scalar and multivariate nonlinear equations. We define various error criteria and then review optimality results in several classes of functions. An asymptotic analysis of one problem is also undertaken. In chapter 3 we discuss fixed point problems for the case of contractive functions. We first derive optimal algorithms for the univariate case under the relative and absolute error criteria. Then the multivariate case is analyzed. Several algorithms are derived, including the circumscribed ellipsoid, centroid, and ball iteration methods. In chapter 4 we deal with fixed point problems for noncontractive functions. We exhibit the optimal bisection-envelope method for the univariate case. We prove that the multivariate problem has infinite worst case complexity under the absolute error criterion. The complexity under the residual error criterion is finite but exponential in 1/e. In chapter 5 we derive lower and upper bounds on the complexity of computing the topological degree of Lipschitz functions.
In the two-dimensional case the derived algorithm enjoys almost optimal cost. We list some numerical tests. In the multivariate case the lower and upper bounds are exponential.
Form of the text
Each section of the text ends with exercises. Exercises vary in difficulty: more difficult exercises are marked with an asterisk (*), and open research problems are marked with two asterisks (**). Each chapter closes with annotations, which indicate the source of the material and include historical remarks. A bibliography follows at the end of each chapter. We have chosen a special format for numbering theorems, lemmas, examples, corollaries, figures, and formulas. Namely, they are numbered consecutively on each page and have the page number attached to their name. For example, Theorem 99.1 is the name of the first theorem on page 99, Corollary 44.2 is the name of the second corollary on page 44, etc. We believe that this format will best serve the reader by providing a more structured and self-contained text.
Acknowledgments
I want to acknowledge many debts. H. Wozniakowski and F. Stenger introduced me to the field of numerical computation and computational complexity, with special emphasis on optimal algorithms for solving nonlinear equations. Collaboration with Henryk and Frank was always stimulating and productive. Several sections of the monograph present my joint work with H. Wozniakowski, T. Boult, and Ch. W. Tsay. These are precisely indicated in the annotations. New work with Z. Huang and L. Khachiyan is also reported in the annotations. L. Plaskota, F. Stenger, J. F. Traub, G. Wasilkowski, and H. Wozniakowski carefully read the manuscript and suggested valuable improvements. Z. Huang helped me with drawing all figures and provided several constructive comments. Most of the research reported here was carried out in the splendid research environments of the Computer Science Departments at the University of Utah, University of Warsaw, Columbia University, and University of California at Berkeley. I would like to thank the National Science Foundation, IBM, and the Amoco Corporation for supporting the research reported here.
K. Sikorski
Salt Lake City and Warsaw, 2000
Contents

1 Introduction
  1.1 Basic Concepts
  1.2 Formulation of the Problem
    1.2.1 Computational Methods
    1.2.2 Optimal Complexity Methods
    1.2.3 Asymptotic Setting
    1.2.4 Exercises
  1.3 Annotations
  Bibliography

2 Nonlinear Equations
  2.1 Univariate Problems
    2.1.1 Optimality of the Bisection Method
    2.1.2 Root Criterion in C∞
    2.1.3 Residual Criterion in W
    2.1.4 General Error Criterion in C∞ and W
    2.1.5 Polynomial Equations
    2.1.6 Asymptotic Optimality of the Bisection Method
    2.1.7 Exercises
  2.2 Multivariate Problems
    2.2.1 Functions with Nonzero Topological Degree
    2.2.2 Lipschitz Functions
    2.2.3 Exercises
  2.3 Annotations
    2.3.1 Overview and Brief History
    2.3.2 Specific Comments
  Bibliography

3 Fixed Points - Contractive Functions
  3.1 Univariate Problems
    3.1.1 Relative Error Criterion
    3.1.2 Absolute Error Criterion
    3.1.3 Exercises
  3.2 Multivariate Problems
    3.2.1 A Constructive Lemma
    3.2.2 Ball Iteration
    3.2.3 Ellipsoid Iteration
    3.2.4 Centroid Method
    3.2.5 Numerical Tests
    3.2.6 Exercises
  3.3 Annotations
    3.3.1 Specific Comments
  Bibliography

4 Fixed Points - Noncontractive Functions
  4.1 Univariate Problems
    4.1.1 Minimal Cardinality Number
    4.1.2 The FPE-A Method
    4.1.3 Exercises
  4.2 Multivariate Problems
    4.2.1 Absolute Error Criterion
    4.2.2 Exercises
  4.3 Annotations
    4.3.1 General Comments
    4.3.2 Residual Error Criterion
    4.3.3 Specific Comments
  Bibliography

5 Topological Degree Computation
  5.1 Two-Dimensional Lipschitz Functions
    5.1.1 Basic Definitions
    5.1.2 Lower Bound on the Minimal Cardinality Number
    5.1.3 Minimal Cardinality Number
    5.1.4 Complexity of the Problem
    5.1.5 Numerical Experiments
    5.1.6 Exercises
  5.2 Lipschitz Functions in d Dimensions
    5.2.1 Basic Definitions
    5.2.2 Information N*
    5.2.3 Algorithm Using Information N*
    5.2.4 Lower Bound on the Minimal Cardinality Number
    5.2.5 Exercises
  5.3 Annotations
    5.3.1 Specific Comments
  Bibliography

Index
Chapter 1
Introduction
This monograph is devoted to studying worst case complexity results and optimal or nearly optimal methods for the approximation of solutions of nonlinear equations, the approximation of fixed points, and the computation of the topological degree. The methods are "global" in nature. They guarantee that the computed solution is within a specified error from the exact solution for every function in a given class. A common approach in numerical analysis is to study the rate of convergence and/or locally convergent methods, which require the initial points of the iteration to be "sufficiently" close to the actual solutions. This approach is briefly reviewed in the annotations to chapter 2, as well as in section 2.1.6, dealing with the asymptotic analysis of the bisection method. Extensive literature exists describing the iterative approach, with several monographs published over the last 30 years. We do not attempt a complete review of this work. The reader interested in this classical approach should consult the monographs listed in the annotations to chapter 2.
1.1 Basic Concepts
We motivate our analysis and introduce basic notions with a simple example of zero finding for continuous functions with different signs at the endpoints of an interval.
Example 3.1 We want to approximate a zero of a function f from the class
F = {f : [0,1] → R : f(0) < 0 < f(1), f continuous}.
By an approximate solution of this problem we understand any point x = x(f) such that the distance between x and some zero α of the function f, f(α) = 0, is at most equal to a given small positive number e, |x − α| ≤ e. To compute x we first gather some information on the function f by sampling f at n sequentially chosen points t_i in the interval [0,1]. Then, based on this information, we select x. To minimize the time complexity we must select the minimal number of sampling points that guarantees computing x(f) for any function f in the class F. This minimal number of samples (in the worst case) is called the information complexity of the problem. The cost of combining the samples is called the combinatory complexity of the algorithm constructing x(f). We observe that our problem can be easily solved by the standard bisection method. That is, we select the first evaluation point as the midpoint of [0,1], t_1 = 0.5. Then, if the function value f(t_1) is positive, the next interval containing a zero of f is [0, 0.5], and if negative then [0.5, 1]. We compute t_2 as the center of the next interval containing a zero, and so on. This is illustrated in figure 4.1. After n evaluations we have an interval of length 2^(-n) containing a zero of f. The best worst case approximation to a zero of f is then given by the midpoint of that interval.
Figure 4.1: The bisection method.
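A minimal sketch of the bisection procedure just described (ours, not the book's pseudocode); the function passed in is assumed to satisfy f(0) < 0 < f(1), and the names are illustrative only.

import math

def bisect(f, eps):
    """Return a point within eps of some zero of f on [0, 1].

    Assumes f(0) < 0 < f(1); uses ceil(log2(1/(2*eps))) evaluations of f.
    """
    a, b = 0.0, 1.0
    while (b - a) > 2.0 * eps:
        t = (a + b) / 2.0          # next evaluation point (oracle call)
        v = f(t)
        if v == 0.0:
            return t               # exact zero found
        if v > 0:
            b = t                  # a zero lies in [a, t]
        else:
            a = t                  # a zero lies in [t, b]
    return (a + b) / 2.0           # midpoint of the final interval

For example, bisect(lambda x: x**3 - 0.3, 1e-6) returns a point within 10^(-6) of the zero 0.3^(1/3).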
It is not difficult to carry out an induction showing that this approach is optimal in the worst case, i.e., that any other sampling procedure will produce an interval of length at least 2^(-n) containing a zero of f. Therefore, the minimal number of function samples required to compute x(f) is given by n = ⌈log2(1/(2e))⌉ in the worst case. This minimal number is achieved by the bisection method, i.e., the bisection method is optimal in this class of functions. •
In section 2.1.1 we outline a generalization of this example to the case of smooth functions and methods based on evaluations of arbitrary linear functionals as information. It turns out that the bisection method remains optimal in that setting. Therefore, smoothness of functions and the extra power of general linear information do not help here in the worst case. It is customary in numerical practice to approximate zeros of univariate functions by using hybrid methods involving bisection, Newton type, and higher order interpolatory methods. These methods do converge with an asymptotically at least quadratic rate for functions having zeros of bounded multiplicity. We show in section 2.1.6
that again the linear rate of convergence of the bisection method cannot essentially be improved whenever the functions are infinitely many times differentiable and have zeros of unbounded multiplicity. The results of section 2.1.1 indicate that the fast asymptotic convergence of hybrid methods may occur for some functions after an arbitrarily large number of bisection steps has been accomplished. This has been observed, but not too often, in numerical tests. For many test functions the hybrid methods were far superior to plain bisection. These tests indicated the need for an average case analysis of such methods. Recent results, as outlined in the annotations to chapter 2, show that indeed hybrid bisection-Newton type methods minimize the number of iterations in the average case. We stress that the above results hold with respect to the absolute error criterion, since we insisted on computing solutions close to the actual zeros in the absolute sense. One may wish to relax the absolute error criterion and decide to compute a point x = x(f) at which the magnitude of the function f is small, |f(x)| ≤ e. This is called the residual error criterion. In general the absolute and residual solutions are not related to each other. For example, in the above class it is impossible to satisfy the residual error criterion in the worst case, since the functions do not have a uniform upper bound on their derivatives. In some other
classes, like the Lipschitz class in section 2.2.2 or the class W in section 2.1.3, it is impossible to solve the problem under the absolute criterion, whereas a solution under the residual criterion is readily available. In the course of the text we will introduce various generalizations of these two basic criteria, including relative, relative residual, and arbitrary homogeneous bounds. In terms of the above example we informally introduce the basic concepts of our analysis. We denote the set of all zeros of a function f by S(f) and define the set of absolute e-approximations to be S(f, e) = {x ∈ [0,1] : inf{|x − α| : α ∈ S(f)} ≤ e}. Then the problem is to compute an e-approximation M(f) as an element of the set S(f, e). We assume that the error tolerance e < 0.5, since for e ≥ 0.5 the trivial solution is M(f) = 0.5. The element M(f) is computed under the following assumptions. We use the real number model of computation. This means that we can perform the basic arithmetic operations (+, −, *, /) on real numbers at a unit cost. The cost of comparing real numbers is also taken as unity. The unit information operations, called oracle calls, are assumed to be function evaluations (we later generalize these to arbitrary linear functional evaluations). The cost of computing f(t) for any argument t ∈ [0,1] is denoted by a constant c, and c is usually much larger than unity. By a method computing M(f) we understand any mapping that is composed of a finite number of oracle calls, arithmetic operations, and comparisons. We analyze methods that compute M(f) ∈ S(f, e) for any function f in our class F. In the next section we make this definition more precise by splitting the computation of M(f) into two stages: information gathering (oracle calls) and combinatorial evaluation of M(f) (the combinatory algorithmic stage). The worst case cost of a method M(·) is defined as the maximal cost of computing M(f) over all functions f ∈ F. Then, the worst case complexity of the problem is defined as the minimal cost among all methods that solve the problem. In section 2.1.1 we show that the worst case complexity of zero finding in the above class F with methods based on evaluations of arbitrary linear functionals is
(c + d) ⌈log2(1/(2e))⌉,
for some constant d ∈ [0, 3]. Since for typical functions f we have c >> 1, we essentially get comp(e) ≈ c ⌈log2(1/(2e))⌉.
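As a concrete illustration (the tolerance value is ours, not the book's): for e = 10^(-6) the bound gives n = ⌈log2(1/(2 · 10^(-6)))⌉ = ⌈log2(500000)⌉ = 19 oracle calls, so the total cost is about 19(c + d), which is roughly 19c when c >> 1.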
Why the real number model of computation? We carry out our analysis in the real number model since it is the most popular and practical model for scientific computation, numerical analysis, algebraic complexity, as well as computational geometry. The results in that model are essentially the same as in fixed precision floating point arithmetic, whenever the methods investigated are numerically stable and the relative error tolerance e is not too small compared with the product of the roundoff unit of the floating point arithmetic, the condition number, and the accumulation constant of the chosen method. Fortunately, many of the optimal methods are also numerically stable. The methods presented for the topological degree evaluation compute the correct result provided that the function evaluations have only correct signs! That is also the case with the bisection method in the above example. We do not attempt to give precise definitions of stability and floating point analysis of methods. In the annotations to this chapter we list several references devoted to such topics.
1.2 Formulation of the Problem
In this section we cast our problem in the formal language of computational complexity theory. We formalize the notion of computational methods and their cost. All cost analysis is carried out under the assumption of the real number model of computation. We include this section to make the presentation of the material complete and up to date, and to direct the reader's attention to Information-Based Complexity theory, which establishes a formal framework for studying optimal complexity methods for continuous problems. Our problem is formally defined as follows. We let F be a subset of a linear space of functions f : D ⊂ R^d → R, and let G be a subset of the real space R^d. We let S be a given operator,
S : F × R+ → 2^G,
where 2^G is the power set of G and R+ = [0, ∞) is the set of nonnegative real numbers. We assume that the set S(f) = S(f, 0) is not empty and that S(f, e) becomes smaller as we decrease e, i.e.,
S(f, e1) ⊆ S(f, e2) whenever 0 ≤ e1 ≤ e2,
for every element f ∈ F. We call S the solution operator. Now we formulate the problem as follows: for a given error tolerance e ≥ 0 and any element f ∈ F, we wish to compute an e-approximation M(f) that is an element of the set S(f, e),
M(f) ∈ S(f, e).
The above assumptions on S(f, e) guarantee that the problem has a solution and that the set of solution elements does not increase when we decrease the error tolerance. These can be viewed as natural requirements. The most basic solution operator studied by us is defined in terms of the absolute error criterion,
S(f, e) = {x ∈ G : dist(x, S(f)) ≤ e},
where dist(x, A) = inf{||x − a|| : a ∈ A} for A ⊂ G, with a given norm || · ||. Throughout the book the solution operators will be defined by
S(f) = {x : f(x) = 0}, which corresponds to the zero finding problem,
S(f) = {x : f(x) = x}, which corresponds to the fixed point problem,
S(f) = {deg(f)}, which corresponds to the topological degree problem.
An important example of a solution operator defined in terms of the residual error criterion for the zero finding problem is given by
S(f, e) = {x : ||f(x)|| ≤ e}.
Under this criterion we compute a point at which the magnitude of the function is at most equal to the prescribed error tolerance e.
1.2.1 Computational Methods
We describe the computation of the element M(f) ∈ S(f, e) as composed of the following two stages: gathering information on the element f (the oracle calls stage) and combining the information to compute M(f) (the combinatorial algorithmic stage). Below we describe each of these.
Information
To compute M(f) we need to gather some information on the element f, e.g., samples of the function f at a discrete set of arguments. We assume that in general we can compute unit information operations L(f), L : F → R, where L is a linear functional. Examples of such functionals L are function and/or derivative evaluations, scalar products, integrals, etc. We denote the class of all such L's by Λ. For each f we compute a finite number n(f) of unit information operations, which constitute the information N(f) about f. The information N is called parallel (nonadaptive) iff
N(f) = [L1(f), L2(f), ..., Ln(f)],
where Li ∈ Λ for every i and the Li are linearly independent. The number of evaluations n is fixed for every function f and is called the cardinality of N. In the case of parallel information the functionals Li are given a priori, before any computation takes place. Such information can be easily implemented on a parallel computer with up to n processors, yielding optimal speed-up over sequential one-processor computation. The class of sequential (adaptive) information does not enjoy this desirable property: every Li evaluation depends on all previously computed values, and the total number of evaluations n(f) depends on the particular function f selected. The sequential information can therefore be defined as
N(f) = [y1, y2, ..., y_{n(f)}],
where y1 = L1(f), yi = Li(f; y1, ..., y_{i−1}) for i = 2, ..., n(f), and each Li(·; y1, ..., y_{i−1}) belongs to Λ.
The number n(f) is determined as follows. We suppose that we have already computed y1 = L1(f), ..., yi = Li(f; y1, ..., y_{i−1}). Then we make a decision whether another evaluation is needed. The decision is made on the basis of the available knowledge about f. If the decision is "NO", then n(f) becomes i and the computation is terminated. If the decision is "YES", then we choose L_{i+1}, and the whole process is repeated. This termination procedure can be modeled by defining Boolean functions
ter_i : R^i → {0, 1},
called termination functions. If ter_i(y1, ..., yi) = 1 then we terminate the computation and set n(f) = i. Otherwise, if ter_i(y1, ..., yi)
= 0, then we choose the (i + 1)st functional L_{i+1} and compute L_{i+1}(f; y1, ..., yi). This process is then repeated. Thus, n(f) is defined as
n(f) = min{ i : ter_i(y1, ..., yi) = 1 }.
By convention, we define here min ∅ = +∞. The maximal number of functional evaluations is called the cardinality of N,
card(N) = sup_{f ∈ F} n(f).
In what follows, we assume that the cardinality card(N) is finite. This assumption is unrestrictive in practice, since if n(f) were unbounded then the cost of computing the information N(f) could be arbitrarily large. In some parts of our book we consider sequential information with the number of evaluations n = n(f) given a priori, i.e., independent of the selection of the function f. This is justified by a common computational practice in which the user restricts a priori the total number of steps (evaluations) in a method. This type of sequential information is sometimes easier (less technical) to analyze. We will call it information with predefined cardinality. In sequential information the next functional Li is defined with respect to all previously computed information operations given in the vector [y1, ..., y_{i−1}]. Such a structure does not permit easy implementation on a parallel computer and is naturally suited for a sequential one-processor implementation. Two important properties of the information are:
• It is partial, i.e., the operator N is many to one. This implies that knowing N(f) we cannot uniquely identify f.
• It is priced, i.e., each evaluation Li costs, say, c units of time, and the total N(f) therefore costs c · n(f).
We are interested in information that has minimal cost, i.e., the minimal number of evaluations L ∈ Λ that are necessary to compute an e-approximation M(f) for every f ∈ F. This minimal number of evaluations m(e) will be called the information complexity of the problem.
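A small sketch (ours, not the book's) of the two kinds of information just defined, using the bisection example of section 1.1: the adaptive variant chooses each evaluation point from the previously computed values and stops via a termination rule, while the nonadaptive variant fixes all points in advance.

def adaptive_information(f, eps):
    """Sequential (adaptive) information N(f) for the bisection example.

    Each oracle call f(t_i) is placed using the previously computed values;
    the termination rule fires once the subinterval known to contain a zero
    has length at most 2*eps.
    """
    a, b = 0.0, 1.0
    y = []                          # information vector [y_1, ..., y_n(f)]
    while (b - a) > 2.0 * eps:      # termination function returns 0
        t = (a + b) / 2.0           # adaptive choice of the next functional
        y.append(f(t))              # oracle call, cost c
        if y[-1] > 0:
            b = t
        else:
            a = t
    return y                        # n(f) = len(y)

def nonadaptive_information(f, points):
    """Parallel (nonadaptive) information: the points are fixed a priori."""
    return [f(t) for t in points]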
Algorithms
Knowing the information N(f) we compute an approximation M(f) ∈ S(f, e). This is accomplished by an algorithm φ, defined as any transformation of the information N(f) into the set G, i.e.,
φ : N(F) → G.
We are interested in algorithms that use minimal resources (information with minimal cardinality and a small number of arithmetic operations and comparisons) to solve our problem. Such algorithms compute the e-approximation M(f) = φ(N(f)) for every f ∈ F and have the smallest (or close to smallest) cost. The cost issues are further discussed below.
Remark We stress that classical numerical methods (algorithms) can be cast into our framework. The notions of information and algorithm are not separated there but rather are jointly used as a method or an algorithm. •
By a computational method M for solving the problem we understand combined information N and algorithm φ such that M(f) = φ(N(f)). In what follows we introduce the concepts of radius and diameter of information, define the error of an algorithm, and define optimal and central algorithms. These notions will enable us to characterize the quality of computational methods.
Radius and diameter of information
The radius and diameter characterize the quality of information: the smaller the radius and diameter, the better we can approximate our problem. We assume that the solution operator is given as in 8.1, so the error is measured according to the absolute error criterion. The notions of radius and diameter with different error criteria are analyzed in the following chapters. We also assume for simplicity that the set S(f) = {s(f)} is a singleton, with the general case left as an exercise for the reader. We let y = N(f) be the n-dimensional information vector for
some f ∈ F.
Figure 12.1: Local radius of information.
The set of elements of F indistinguishable from f by N will be called V(y) (see Figure 12.1), i.e.,
V(y) = { g ∈ F : N(g) = y }.
The local radius of information r(N, y) is defined as the radius of the set U(y) = { s(g) : g ∈ V(y) }, which is composed of all elements s(g) for g in V(y), i.e.,
r(N, y) = inf_{x ∈ G} sup_{g ∈ V(y)} ||x − s(g)||.
The local diameter of information d(N, y) is the diameter of the set U(y), i.e.,
d(N, y) = sup_{g1, g2 ∈ V(y)} ||s(g1) − s(g2)||.
The (global) radius r(N) and diameter d(N) of the information N are defined as the worst case local radius and diameter, i.e.,
r(N) = sup_{y ∈ N(F)} r(N, y)
and
d(N) = sup_{y ∈ N(F)} d(N, y).
It is not difficult to prove (but is left as an exercise) the following relationship between the radii and diameters of information.
Lemma 13.1 For every vector y = N(f), f ∈ F, and any information N we have
d(N, y)/2 ≤ r(N, y) ≤ d(N, y).
Moreover, if the elements s(f) of the sets S(f) are real numbers, then
r(N, y) = d(N, y)/2
and
r(N) = d(N)/2.
Errors of algorithms
It turns out that the local (global) radius of information is a sharp lower bound on the local (global) error of any algorithm using the information N. More precisely, we define the local error e(φ, y) and the global error e(φ) of any algorithm φ as
e(φ, y) = sup_{g ∈ V(y)} ||φ(y) − s(g)||
and
e(φ) = sup_{y ∈ N(F)} e(φ, y).
Theorem 13.1 We have
r(N, y) = inf_{φ ∈ Φ(N)} e(φ, y)    (13.1)
and
r(N) = inf_{φ ∈ Φ(N)} e(φ),    (13.2)
where Φ(N) is the class of all algorithms using the information N.
Proof We will only prove 13.1 since the proof of 13.2 is similar. We let R = inf_{φ ∈ Φ(N)} e(φ, y) and L = r(N, y). We prove (i) L ≤ R and (ii) L ≥ R, which combined imply 13.1. To this end we take any algorithm φ and observe that
e(φ, y) = sup_{g ∈ V(y)} ||φ(y) − s(g)|| ≥ inf_{x ∈ G} sup_{g ∈ V(y)} ||x − s(g)|| = r(N, y),
which yields (i). Now we take any δ > 0 and c_δ ∈ G such that
sup_{g ∈ V(y)} ||c_δ − s(g)|| ≤ r(N, y) + δ.
Then taking the algorithms φ_δ(y) = c_δ with δ → 0+ we obtain
R ≤ lim_{δ → 0+} e(φ_δ, y) ≤ r(N, y) = L,
which completes the proof. •
Theorem 13.1 indicates that optimal algorithms should have errors as small as possible, i.e., equal to the corresponding radii of information. The strongly (locally) and globally optimal error algorithms are therefore defined as follows. The algorithm φ* is a strongly (locally) optimal error algorithm iff
e(φ*, y) = r(N, y) for every y ∈ N(F).
The algorithm φ** is a (globally) optimal error algorithm iff
e(φ**) = r(N).
These definitions immediately imply that every strongly optimal algorithm is also globally optimal. It may happen, though, that the local error of a globally optimal algorithm is much larger than the local radius of information.
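As a concrete illustration (ours, continuing the bisection example of section 1.1): after n sign evaluations the information vector y determines a final subinterval [a, b] of length 2^(-n); V(y) consists of all functions in F whose zero lies in that subinterval, so U(y) is essentially [a, b], the local radius is r(N, y) = 2^(-(n+1)), the local diameter is d(N, y) = 2^(-n), and the midpoint (a + b)/2 is the center of U(y), i.e., the strongly optimal choice.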
Central algorithms
We suppose now that for every y ∈ N(F) there exists c = c(y) which is the center of a smallest ball containing the set U(y). We then define the central algorithm φ^c by
φ^c(y) = c(y).
Lemma 15.1 An algorithm is central iff it is a strongly optimal error algorithm.
The proof of this lemma is left to the reader as an exercise at the end of this chapter.
Interpolatory algorithms
It is important to analyze algorithms that are "nearly" strongly optimal, or, more precisely, that compute a solution belonging to the set U(y). Such algorithms will be called interpolatory, since they provide an exact solution for some element g that interpolates our unknown f with respect to the given information N. An algorithm φ^I is called interpolatory iff
φ^I(y) = s(g) for some g ∈ V(y),
for every y ∈ N(F). It turns out that the local error of interpolatory algorithms is at most twice the local radius of information. This property is formulated in the following theorem.
Theorem 15.1 For every interpolatory algorithm φ^I, and every y ∈ N(F), we have
(i) e(φ^I, y) ≤ 2 r(N, y)
and
(ii) e(φ^I) ≤ 2 r(N).
Proof We only show (i) since the proof of (ii) is similar. To show (i) it is enough to prove that e(φ^I, y) ≤ d(N, y) (see Lemma 13.1). We observe that
e(φ^I, y) = sup_{g1 ∈ V(y)} ||s(g) − s(g1)|| ≤ d(N, y),
which completes the proof. •
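In the bisection example this distinction is easy to see (our illustration, not the book's): any algorithm that returns a point of the final subinterval [a, b], for instance its left endpoint, is interpolatory, and its error is at most the interval length b − a = d(N, y) = 2 r(N, y), whereas the central (midpoint) algorithm achieves the radius r(N, y) itself.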
1.2.2 Optimal Complexity Methods
We stress that to compute an e-approximation M(f) ∈ S(f, e) for every f ∈ F, we need to utilize a method M = (φ, N) such that the (global) error of the algorithm is at most e, e(φ) ≤ e. We discuss here complexity issues of computational methods. We use the real number model of computation, which is defined by the following two postulates:
1. We assume that each information operation (oracle call) Li(·) costs c > 0 units of time.
2. We assume that multiplication by a scalar or addition (subtraction) of elements in the set G has unit cost. Since G ⊂ R^d, the unit cost corresponds to the cost of d arithmetic operations. We make this assumption to normalize our cost function.
In addition we assume that comparisons and evaluations of certain elementary functions also have unit cost. We define the worst case cost of a method M = (φ, N) as
cost(M) = sup_{f ∈ F} ( cost(N(f)) + cost(φ(N(f))) ),
where cost(N(f)) is the cost of computing the information vector y = N(f), and cost(φ(N(f))) is the cost of combining the information y to compute M(f) = φ(y) using the algorithm φ. A method M° = (φ°, N°) is called an optimal complexity method if it guarantees to compute an e-approximation M°(f) ∈ S(f, e) for every problem element f ∈ F with minimal worst case cost among all methods that belong to a specific class M, cost(M°) = comp(e), where
comp(e) = inf{ cost(M) : M ∈ M such that M(f) ∈ S(f, e) for every f ∈ F }.
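A minimal sketch (ours) of this cost accounting applied to the bisection method of section 1.1; the numerical value chosen for the oracle cost c is illustrative only.

class CountingOracle:
    """Wraps f so that the information cost c * n(f) can be tallied."""

    def __init__(self, f, c=100.0):     # c: illustrative cost of one oracle call
        self.f, self.c = f, c
        self.calls = 0

    def __call__(self, t):
        self.calls += 1                  # one more information operation
        return self.f(t)

    def information_cost(self):
        return self.c * self.calls       # cost(N(f)) = c * n(f)

# Charge one run of the bisection method for the input f(x) = x - 0.7.
oracle = CountingOracle(lambda x: x - 0.7)
a, b, combinatory_ops = 0.0, 1.0, 0
while (b - a) > 2.0 * 1e-3:
    t = (a + b) / 2.0
    combinatory_ops += 2                 # midpoint: one addition, one division
    if oracle(t) > 0:
        b = t
    else:
        a = t
    combinatory_ops += 1                 # one comparison
total_cost = oracle.information_cost() + combinatory_ops   # cost of M on this f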
This minimal cost comp(e) is called the e-complexity of the problem S in the class M. Examples of classes of methods include the following:
1. information consisting of parallel evaluations of linear functionals combined with an arbitrary algorithm;
2. information consisting of parallel function samples combined with any algorithm;
3. information consisting of sequential function samples combined with an arbitrary algorithm, etc.
on samples combined
1.2. FORMULATION OF THE PROBLEM 3. information consisting of sequential fun with an arbitrary algorithm, etc.
17 on samples combined
We recall that the information complexity m(e) in the class £ is the minimal number of samples ;(•) € , in the information Nn = [ (-), ... (•)], which is needed to compute an e-approximation for every / F. Formally for all N as above with
We now suppose that we can construct information N° consisting of n = m(e] samples, such that the radius of N° is at most e, r(N°) . To summarize: N° is such that: 1. The number of samples 2. The radius r(N°) 3. The cost(N°(f}} c m(e).
, equals m(e).
e. for a worst / 6 F is approximately equal
We further assume that there exist an algorithm mation N° such that: 4. The error of r(N°).
using the infor-
is equal to the radius of information N°, e( ) —
5. The cost of combining N ° ( f ) with the algorithm smaller that the cost of computing N°:
is much
for every
Under these assumptions the cost of the method M° = (φ, N°) is approximately equal to c · m(e),
cost(M°) ≈ c · m(e).
We observe that to compute an e-approximation M(f) ∈ S(f, e) for every f ∈ F we have to use information Nn with the number of samples n ≥ m(e). Therefore, the cost of any method M that solves our problem must be at least c · m(e),
cost(M) ≥ c · m(e).
These arguments imply that the method M° is an almost optimal complexity method in the class M of methods (φ, N), where N is any information consisting of evaluations Li(·) ∈ Λ and φ is an arbitrary algorithm. Furthermore, we conclude that the e-complexity of the problem is approximately
comp(e) ≈ c · m(e),
which establishes an important relationship between the information complexity and the e-complexity of the problem.
Conclusion Any method satisfying conditions (1)-(5) above is an almost optimal complexity method. The e-complexity of the problem is approximately equal to c · m(e). •
Fortunately, many important applications satisfy conditions (1)-(5). These include the zero finding problem in the case of opposite signs of a function at the endpoints of an interval, other zero finding problems as described in chapter 2, and many of the fixed point problems in chapters 3 and 4.
1.2.3 Asymptotic Setting
In classical numerical analysis the asymptotic analysis of methods is commonly undertaken. In this case we approximate an element s(f) ∈ S(f) by a sequence of methods Mn(f) = φn(Nn(f)), where Nn(f) consists of n, in general sequential, evaluations on f. In this setting we are interested in achieving the best possible speed of convergence of the sequence ||s(f) − Mn(f)|| for every function in the class F. Here the complexity analysis is of secondary importance. The problem is to find a sequence of algorithms φn and information Nn that guarantee the best speed of convergence. This setting will be analyzed in section 2.1.6 for the class of smooth univariate functions changing sign at the endpoints of an interval.
1.2.4 Exercises
1. Carry out the proof of Lemma 13.1.
2. Verify 13.2.
3. Verify formally Lemma 15.1.
4. Carry out the proof of part (ii) of Theorem 15.1.
5. Suppose that the set S(f) is not a singleton as assumed on page 11. Derive the general formula for the local radius of information with respect to the absolute error criterion, which is given by
r(N, y) = inf_{x ∈ G} sup_{g ∈ V(y)} dist(x, S(g)).
6. (*) Carry out the proofs of all major results in this chapter using the generalized definition of the radius of information from the previous exercise.
1.3 Annotations
This chapter is based on the theory outlined in several monographs: [4], [7]-[11], [6], [12]. In these the reader will find the history of research devoted to optimal information and algorithms for several important applications, as well as an exhaustive bibliography of the field presently known as Information-Based Complexity [8]-[11]. These monographs go well beyond the worst case model discussed in our book. They analyze extensions of the theory to different settings, including average, probabilistic, and asymptotic analysis [11]. Moreover, they fully develop the theory and applications for linear problems [11]. A formalization of the real number model is given by Blum, Shub, and Smale [1], [2]. In our analysis we use the extended real number model with inclusion of oracle calls. This formal extension can be found in the paper of Novak [5]. The notion of numerical stability for solving nonlinear equations was introduced by Wozniakowski [13]. In a recent paper [14], Wozniakowski gives a complete floating point analysis of the bisection method, together with an informal review of several computational models used in scientific computation. An excellent monograph devoted to stability of numerical algorithms was recently published by Higham [3]. The results of exercises 5 and 6 were further generalized in [10] to the case of sets without any norm.
Bibliography
[1] Blum, L., Shub, M., and Smale, S. On a theory of computation and complexity over the real numbers: NP completeness, recursive functions and universal machines. Bull. Amer. Math. Soc., 21: 1-46, 1989.
[2] Blum, L., Cucker, F., Shub, M., and Smale, S. Complexity and Real Computation. Springer-Verlag, New York, 1998.
[3] Higham, N. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
[4] Kowalski, M., Sikorski, K., and Stenger, F. Selected Topics in Approximation and Computation. Oxford University Press, New York, 1995.
[5] Novak, E. The real number model in numerical analysis. J. Complexity, 11: 57-73, 1995.
[6] Novak, E. Deterministic and Stochastic Error Bounds in Numerical Analysis. Vol. 1349 of Lecture Notes in Math. SpringerVerlag, Berlin, 1988. [7] Plaskota, L. Noisy Information and Computational Complexity. Cambridge University Press, New York, 1996. [8] Traub, J. F., and Werschulz, A. G. Complexity and Information. Cambridge University Press, New York, 1998. [9] Traub, J. F., and Wozniakowski, H. A General Theory of Optimal Algorithms. Academic Press, New York, 1980. [10] Traub, J. F., Wasilkowski, G., and Wozniakowski, H. Information, Uncertainty, Complexity. Addison-Wesley, Reading, MA, 1983. [11] Traub, J. F., Wasilkowski, G., and Wozniakowski, H. Information Based Complexity. Academic Press, New York, 1988. [12] Werschulz, A. G. The Computational Complexity of Differential and Integral Equations. Oxford University Press, New York, 1991.
[13] Wozniakowski, H. Numerical stability for solving nonlinear equations. Numer. Math., 27: 373-390, 1977. [14] Wozniakowski, H. Why does information-based complexity use the real number model? Theor. Comp. Sci., 219: 451-466, 1999.
Chapter 2
Nonlinear Equations
In this chapter we address the problem of approximating zeros α of nonlinear functions f, f(α) = 0, where f ∈ F ⊂ {f : D ⊂ R^d → R^d}. In order to define our solution operators, we first review several error criteria that are commonly used to measure the quality of approximations to zeros of nonlinear equations. This is done for univariate functions f : [a, b] → R. Straightforward generalizations to the multivariate case are based on replacing the magnitude function by a specific norm. These are considered in section 2.2 when we review multivariate problems.
Error criteria
A number of error criteria are used in practice for the approximation of a zero α of f. For instance, one may wish to find a number x = x(f, e) such that one of the following conditions is satisfied:
root (absolute) criterion: |x − α| ≤ e,
relative root criterion: |x − α| ≤ e |α|,
residual criterion: |f(x)| ≤ e,
relative residual criterion: |f(x)| ≤ e ||f||,
where e is a nonnegative real number specifying the error tolerance. A general formulation of the problem can be given in terms of a general error criterion as follows. We define E to be a nonnegative real function of x and f. Then the problem can be formulated by the solution operator S(f, e) given by
S(f, e) = { x ∈ [a, b] : E(x, f) ≤ e }.
We call the function E a general error criterion. Examples of E are as follows:
E(x, f) = |x − α|, which corresponds to the root criterion,
E(x, f) = |x − α| / |α|, which corresponds to the relative root criterion,
E(x, f) = |f(x)|, which corresponds to the residual criterion, and
E(x, f) = |f(x)| / ||f||, which defines the relative residual criterion.
As an analog to the result in section 1.2.1 we can prove that the radius of information is a tight lower bound on the error of any algorithm
The proof of this result is left as an exercise for the reader. •
2.1. UNIVARIATE PROBLEMS
2.1
25
Univariate Problems
In this section we review several classes of univariate function with emphasis on optimal methods and tight complexity bounds. In general we define the class of function F as a subset of
where some restrictions on smoothness, additional properties (like sign change) and bounds on the norm are analyzed in the following sections. In the following sections 2.1.1-2.1.4 we consider the class F C F of function, which have some number of continuous derivatives, nonempty set of zeros and a bounded norm (or seminorm). A particular selection of the class depends on the error criterion and is specified below. We start our analysis in section 2.1.1 with the class C of infinitely many times differentiable function changing sign at the end points of the interval [0,1]. We consider the absolute error criterion and analyze methods based on sequential evaluations of arbitrary linear functionals. We show that in this class the bisection method is optimal. In sections 2.1.2-2.1.4 we do not assume that the function change signs at the endpoints. In this case the zero finding problem under the absolute error criterion becomes unsolvable in the worst case; however, it can be solved in the residual sense. More precisely, in section 2.1.2 we analyze the absolute error criterion. We prove that there exists no method to find x, \x — a\ e, with e < (b — a)/2 for the class F1 of infinitely often differentiable function with simple zeros, whose arbitrary seminorm is bounded by 1. We stress that this result holds independently of which and how many linear functionals are evaluated. The same result holds for the relative root criterion with e < (b — a) / (b + a + 2 ) and a 0 (section 2.1.4). In section 2.1.3 we consider the residual error criterion, and deal with the class F2 of function having zeros whose (r — l)st derivative is bounded by one, r 1. We find almost optimal information and algorithm. The analysis makes extensive use of Gelfand n-widths. The almost optimal information consists of n parallel fun on evaluations and the algorithm is based on a pair of perfect splines interpolating /. This algorithm yields a point x such that f ( x ) = O ( n - r ) .
26
CHAPTER 2. NONLINEAR EQUATIONS
For small r we present a different algorithm and information that are also almost optimal and much easier to compute than the algorithm based on perfect splines. If n is large enough, n = , then the residual criterion is satisfied. By contrast we prove that the relative residual criterion is never satisfied (section 2.1.4). In section 2.1.4 we also discuss a general error criterion and find a lower bound on the error of optimal algorithms in terms of Gelfand widths. We compare the results for root and residual criteria. We consider the class F1 with the seminorm given by ||f|| = . Then F1 is a proper subset of F2. This shows that for the class F\ it is possible to satisfy the residual criterion while it is impossible to satisfy the root criterion. On the other hand, there exist classes of function for which one draws the opposite conclusion. For example, for the class of continuous function that have opposite signs at a and 6, one can easily show that it is possible to satisfy the root criterion and impossible to satisfy the residual criterion. This shows how essential is the specification of the error criterion for solving nonlinear algebraic equations. We finally summarize the results of sections 2.1.5-2.1.6. In section 2.1.5 we generalize the results of section 2.1.1 to the case of polynomials of (in general) unbounded degree that change signs at the endpoints of [a, b]. We show that even in this strongly restricted class the bisection method remains to be optimal. We also find optimal parallel method, which is based on evaluations of a polynomial at equidistant points in [a, b]. In section 2.1.6 we analyze the asymptotic rate of convergence of methods approximating zeros of infinitely many times differentiable function that change signs at the endpoints of [0,1]. We show that for function having zeros of infinite multiplicity we cannot essentially improve the linear rate of convergence of the bisection method. This result holds for methods based on arbitrary sequential evaluations of continuous linear functionals. When the multiplicity of zeros is bounded, then there exist methods that converge asymptotically with at least quadratic rate. The general setting of the iterative model of computation is reviewed in the annotations to this chapter. We give there several references to the classical literature devoted to this model. In the annotations we also briefly review the average case model of com-
2.1.
UNIVAPIATE PROBLEMS
27
putation and list references addressing univariate problems. The multivariate average case analysis is a widely open research area. Complexity results We briefly summarize complexity results for solving univariate problems. We recall that the worst case complexity is defined as the minimal (worst case) cost of a method that computes e-approximations for all function in the considered class. As in chapter 1 we assume that each fun on evaluation costs c units of time, and that the cost of arithmetic operations, comparisons, and the evaluations of elementary function is taken as unity (usually c >> 1). Our positive results for the bisection method in sections 2.1.1 and 2.1.5 show that the e-complexity in corresponding classes of function is proportional to clog(l/e). The negative results in sections 2.1.2-2.1.4 for the root, relative root, and relative residual criteria, show that the e-complexity for these problems is infinite. For the residual criterion, the results of section 2.1.3 imply that for small r, the e-complexity is proportional to ce~ 1 / r . For large r, we can conclude that the e-complexity is at least proportional to ce~ 1 / r . It may be much larger since the computation of the algorithms based on perfect splines may require much more than ce~l/r arithmetic operations.
2.1.1
Optimality of the Bisection Method
We consider here a zero finding problem for smooth function that change sign at the endpoints of an interval. The goal is to compute an e-approximation with respect to the root criterion. This can be accomplished by using bisection, hybrid bisection-secant, or bisection-Newton type methods. We show that in the worst case the bisection method is optimal in the class of all methods using arbitrary sequential linear information. This holds even for infinitely many times differentiable function having only simple zeros.
28
CHAPTER 2. NONLINEARrR EREQUATIO
Formulation of the problem
We let = C [0,1] to be the space of infinitely many times differentiable function / : [0,1] . We let F = f(1) 0) there exists exactly one a = a(f) such that /(a) = 0, and The solution operator S is defined as
with the singleton set S(f) = {a(f)}. We wish to compute an eapproximation M = M(f), which belongs to the set S( ),
for every fun on / 6 F. To accomplish this we use arbitrary sequential information with predifmed cardinality n, N = Nn as defined in hapter 1,
where y = LI(/), [y,-_i,//,-(/;y,-i)], and !,-,/(•) !,-(•;y,;-i) is a linear functional i = 1, ...,n. The bisection information JVbls is in particular given by the functionals
for Xi — (ez,-_i + 6,-_i)/2 with ao = 0, 60 = 1, and
and
Knowing A^(/) we approximate a(f) by an algorithm <j>,
As in chapter 1, the worst case error of an algorithm
is
2.1. UNIVARIATE PROBLEMS
29
The radius of information is given by
and as in general, it is a lower bound on the error of any algorithm
where is the class of all algorithms using N. The bisection algorithm is defined as
It is easy to verify that
Optimality theorem We show that the bisection method Mbis = (