COMPUTATIONAL THEORY OF ITERATIVE METHODS
STUDIES IN COMPUTATIONAL MATHEMATICS 15
Editors:
C.K. CHUI Stanford University Stanford, CA, USA
L. WUYTACK University of Antwerp Antwerp, Belgium
IOANNIS K. ARGYROS
Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands Linacre House, Jordan Hill, Oxford OX2 8DP, UK
First edition 2007

Copyright © 2007 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively, you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions and selecting Obtaining permission to use Elsevier material.

Notice: No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data: A catalog record for this book is available from the Library of Congress.

British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library.

ISBN: 978-0-444-53162-9
ISSN: 1570-579X
For information on all Elsevier publications visit our website at books.elsevier.com
Printed and bound in The Netherlands
DEDICATED TO MY MOTHER ANASTASIA
Introduction

Researchers in computational sciences are faced with the problem of solving a variety of equations. A large number of problems are solved by finding the solutions of certain equations. For example, dynamic systems are mathematically modelled by difference or differential equations, and their solutions usually represent the states of the systems. For the sake of simplicity, assume that a time-invariant system is driven by the equation ẋ = f(x), where x is the state; the equilibrium states are then determined by solving the equation f(x) = 0. Similar equations are used in the case of discrete systems.

The unknowns of engineering equations can be functions (difference, differential, and integral equations), vectors (systems of linear or nonlinear algebraic equations), or real or complex numbers (single algebraic equations with single unknowns). Except in special cases, the most commonly used solution methods are iterative: starting from one or several initial approximations, a sequence is constructed that converges to a solution of the equation. Iterative methods are also applied to optimization problems, in which case the iteration sequences converge to an optimal solution of the problem at hand. Since all of these methods have the same recursive structure, they can be introduced and discussed in a general framework.

To complicate matters further, many of these equations are nonlinear. However, all may be formulated in terms of operators mapping one linear space into another, the solutions being sought as points in the corresponding space. Consequently, computational methods that work in this general setting for the solution of equations apply to a large number of problems, and lead directly to the development of suitable computer programs to obtain accurate approximate solutions to equations in the appropriate space.
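As a concrete illustration of such an iterative method (a sketch of ours, not taken from the monograph): Newton's method for a single equation f(x) = 0, assuming f is smooth and the initial guess is close enough to a solution.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")

# Equilibrium states of the system x' = f(x) with f(x) = x**2 - 2
# are found by solving f(x) = 0 (the example data are ours).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Starting from x₀ = 1 the sequence 1, 1.5, 1.41667, ... converges rapidly to √2, illustrating the recursive structure shared by the methods studied in this monograph.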
This monograph is intended for researchers in computational sciences, and as a reference book for an advanced numerical-functional analysis or computer science course. The goal is to introduce these powerful concepts and techniques at the earliest possible stage. The reader is assumed to have had basic courses in numerical analysis, computer programming, computational linear algebra, and an introduction to real, complex, and functional analysis. We have divided the material into thirteen chapters. Although the monograph is of a theoretical nature, concerned with optimizing and weakening existing hypotheses, each chapter contains several new theoretical results and important
applications in engineering, in dynamic economic systems, in input-output systems, in the solution of nonlinear and linear differential equations, and in optimization problems. The applications appear in the form of examples, applications, exercises, or case studies, or they are implied, since our results improve earlier ones that have already been applied to concrete problems. Sections have been written as independently of each other as possible, so the interested reader can go directly to a certain section and understand the material without having to go back and forth in the whole textbook to find related material.

There are four basic problems connected with iterative methods.

Problem 1. Show that the iterates are well defined. For example, if the algorithm requires the evaluation of F at each xn, it has to be guaranteed that the iterates remain in the domain of F. It is, in general, impossible to find the exact set of all initial data for which a given process is well defined, and we restrict ourselves to giving conditions which guarantee that an iteration sequence is well defined for certain specific initial guesses.

Problem 2. Concerns the convergence of the sequences generated by a process and the question of whether their limit points are, in fact, solutions of the equation. There are several types of such convergence results. The first, which we call a local convergence theorem, begins with the assumption that a particular solution x∗ exists, and then asserts that there is a neighborhood U of x∗ such that for all initial vectors in U the iterates generated by the process are well defined and converge to x∗. The second type of convergence theorem, which we call semilocal, does not require knowledge of the existence of a solution, but states that, starting from initial vectors for which certain (usually stringent) conditions are satisfied, convergence to some (generally nearby) solution x∗ is guaranteed.
Moreover, theorems of this type usually include computable (at least in principle) estimates for the error ‖xn − x∗‖, a possibility not afforded by the local convergence theorems. Finally, the third and most elegant type of convergence result, the global theorem, asserts that starting anywhere in a linear space, or at least in a large part of it, convergence to a solution is assured.

Problem 3. Concerns the economy of the entire operation and, in particular, the question of how fast a given sequence will converge. Here there are two approaches, which correspond to the local and semilocal convergence theorems. As mentioned above, the analysis which leads to the semilocal type of theorem frequently produces error estimates, and these, in turn, may sometimes be reinterpreted as estimates of the rate of convergence of the sequence. Unfortunately, however, these are usually overly pessimistic. The second approach deals with the behavior of the sequence {xn} when n is large, and hence when xn is near the solution x∗. This behavior may then be determined, to a first approximation, by the properties of the iteration function near x∗, and leads to so-called asymptotic rates of convergence.
Problem 4. Concerns how best to choose a method, algorithm, or software program to solve a specific type of problem, and the description of when a given algorithm or method succeeds or fails.

We have included a variety of new results dealing with Problems 1-4. Results by other authors (or by us) are either explicitly stated, implicit in comparisons with the corresponding results of ours, or left as exercises. This monograph is an outgrowth of research work undertaken by us, and it complements and updates earlier works of ours focusing on an in-depth treatment of convergence theory for iterative methods [7]-[100]. Such a comprehensive study of optimal iterative procedures appears to be needed and should benefit not only those working in the field but also those interested in, or in need of, information about specific results or techniques.

We have endeavored to make the main text as self-contained as possible, to prove all results in full detail, and to include a number of exercises throughout the monograph. In order to make the study useful as a reference source, we have complemented each section with a set of "Remarks" in which literature citations are given, other related results are discussed, and various possible extensions of the results of the text are indicated. For completeness, the monograph ends with a comprehensive list of references and an appendix containing useful contemporary numerical algorithms. Because we believe our readers come from diverse backgrounds and have varied interests, we provide "recommended reading" throughout the textbook.

Often a long monograph summarizes knowledge in a field. This monograph, however, may be viewed as a report on work in progress. We provide a foundation for a scientific field that is rapidly changing, and we therefore list numerous conjectures and open problems, as well as alternative models which need to be explored.
In this monograph we do not attempt to cover all aspects of what is commonly understood to be computational methods. Although we make reference to, and briefly study, some of them (e.g., homotopy methods), our main concern is restricted to certain nongeometrical computational methods. For a geometrical approach to computational methods we refer the reader to the excellent book by Krasnosel'skii and Zabrejko [214].

The monograph is organized as follows. Chapter 1: The essentials on linear spaces are provided. Chapter 2: Some results on monotone convergence are reported. Chapter 3: Stationary and nonstationary Mann and Ishikawa-type iterations are studied. Recent results of ours are also compared favourably with earlier ones. Chapters 4-9: Newton-type methods and their implications/applications are covered here. The Newton-Kantorovich Theorem 4.2.4 is one of the most important tools in nonlinear analysis and in classical numerical analysis. This theorem has been successfully used by numerical analysts for obtaining optimal bounds for many iterative procedures. The original paper of Kantorovich [206] contains optimal a priori bounds for Newton's method, albeit not in explicit form. Explicit forms of those a priori bounds were obtained independently by Ostrowski [264] and by Gragg and Tapia [170] (see [284] for the equivalence of the bounds).
The paper of Gragg and Tapia [170] also contains sharp a posteriori bounds for Newton's method. By using different techniques and/or different a posteriori information, the a posteriori error bounds were refined by other authors [6], [121], [141], [143], [209]-[212], [274], [284] and by us [11]-[99]. Various extensions of the Kantorovich theorem have been used to obtain error bounds for Newton-like methods [16], [29], [34], [35], [48], [121], [274], [351], inexact Newton methods [42], [58], [359], the secant method [13], [21], [275], Halley's method [41], [160], etc. A survey of such results can be found in [68], [99], [350].

The Newton-Kantorovich theorem has also been used for proving existence and uniqueness of solutions of nonlinear equations arising in various fields. The spectrum of applications of this theorem is immense. One can sit in front of a computer, search for the "Newton-Kantorovich theorem", and find hundreds if not thousands of works related to or based on this theorem. Our list below is therefore very incomplete. However, we mention such diverse problems as the chaotic behavior of maps [314], optimal shape design [220], nonlinear Riemann-Hilbert problems [104], optimal control [172], nonlinear finite elements [324], integral equations [4], [141], [160], flow problems [226], nonlinear elliptic problems [249], semidefinite programming [251], LP methods [294], interior point methods [282], and multiparameter singularly perturbed systems [245]. We also mention that this theorem has been extended for analyzing Newton's method on manifolds [162], [265] (see also our Section 8.7). The relation between the Kantorovich theorem and other existence theorems for solutions of nonlinear equations, such as the Moore and Miranda theorems, has been investigated in [1], [369] (see also the improvements in our Chapter 8, Sections 8.1 and 8.2).
The foundation of the Newton-Kantorovich theorem is the Newton-Kantorovich hypothesis (4.2.51), famous for its simplicity and clarity. This condition is the sufficient hypothesis for the convergence of Newton's method. However, it is common knowledge, as numerical experiments indicate, that Newton's method can converge even when the initial guess violates this condition. Therefore weakening this condition is of extreme importance in computational mathematics, since the trust region will be enlarged. Recently [92] we showed that the Newton-Kantorovich hypothesis can always be replaced by the weaker hypothesis (5.1.56), which at most doubles (when l0 = 0) the applicability of this theorem. Note that the verification of condition (5.1.56) requires the same information and computational cost as (4.2.51), since in practice the computation of the Lipschitz constant l requires that of the center-Lipschitz constant l0. Moreover, the following advantages hold: in the semilocal case, finer error bounds on the distances involved and at least as precise information on the location of the solution; in the local case, finer error bounds and a larger trust region. These advantages carry over if our approach is extended to Newton-like methods (see Chapter 5).
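A hedged numerical sketch of this comparison. The precise form of (5.1.56) is derived in Chapter 5; here we use, as illustrative stand-ins only, the classical hypothesis in one common normalization, 2lη ≤ 1 (with η the norm of the first Newton step), and the simplified weaker variant (l0 + l)η ≤ 1 based on the center-Lipschitz constant l0 ≤ l. The concrete numbers are invented for illustration.

```python
# Illustrative stand-ins, NOT the book's exact conditions:
# classical Newton-Kantorovich hypothesis (one normalization): 2*l*eta <= 1,
# where l is a Lipschitz constant for F' and eta = ||F'(x0)^{-1} F(x0)||.
def kantorovich(l, eta):
    return 2.0 * l * eta <= 1.0

# Simplified weaker variant using the center-Lipschitz constant l0 <= l;
# with l0 = 0 the admissible range of eta doubles.
def weaker(l0, l, eta):
    return (l0 + l) * eta <= 1.0

l, l0, eta = 2.0, 1.0, 0.3            # invented data with l0 < l
classical_holds = kantorovich(l, eta)  # 2*2*0.3 = 1.2 > 1: fails
weaker_holds = weaker(l0, l, eta)      # (1+2)*0.3 = 0.9 <= 1: holds
```

Since l0 ≤ l always, the weaker predicate is satisfied whenever the classical one is, and, as above, it can hold when the classical one fails, which is exactly the enlargement of the trust region described in the text.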
For all the above reasons it is worthwhile to revisit all existing results that use the Newton-Kantorovich hypothesis and to replace it with our weaker condition (5.1.56). Such a task is obviously impossible for us to undertake, and we leave it to motivated readers to pick specific problems mentioned (or not) above. Here we have simply attempted (Chapters 5-9) to provide a useful, but by no means complete, list of such applications/implications. We report some results on the study of variational inequalities in Chapter 10. Chapters 11 and 12 deal with the convergence of point-to-set mappings and their applications. Finally, in Chapter 13 we improve the mesh independence principle [145] and also provide a more efficient conjugate gradient solver for the normal equations [194].
Contents

1 Linear Spaces . . . 1
  1.1 Linear Operators . . . 1
  1.2 Continuous Linear Operators . . . 2
  1.3 Equations . . . 3
  1.4 The Inverse of a Linear Operator . . . 5
  1.5 Fréchet Derivatives . . . 6
  1.6 Integration . . . 9
  1.7 Exercises . . . 10

2 Monotone convergence . . . 17
  2.1 Partially Ordered Topological Spaces . . . 17
  2.2 Divided Differences in a Linear Space . . . 20
  2.3 Divided Differences in a Banach Space . . . 21
  2.4 Divided Differences and Monotone Convergence . . . 25
  2.5 Divided Differences and Fréchet-derivatives . . . 30
  2.6 Enclosing the Root of an Equation . . . 34
  2.7 Applications . . . 38
  2.8 Exercises . . . 40

3 Contractive Fixed Point Theory . . . 47
  3.1 Fixed Points . . . 47
  3.2 Applications . . . 49
  3.3 Extending the contraction mapping theorem . . . 50
  3.4 Stationary and nonstationary projection-iteration methods . . . 54
  3.5 A Special iteration method . . . 64
  3.6 Stability of the Ishikawa iteration . . . 68
  3.7 Exercises . . . 79

4 Solving Smooth Equations . . . 87
  4.1 Linearization of Equations . . . 87
  4.2 The Newton-Kantorovich Method . . . 88
  4.3 Local Convergence . . . 95
  4.4 Convergence under Twice-Fréchet Differentiability Only at a Point . . . 97
  4.5 Exercises . . . 104

5 Newton-like methods . . . 121
  5.1 Two-point methods . . . 121
  5.2 Relaxing the convergence conditions . . . 144
  5.3 A fast method . . . 151
  5.4 Exercises . . . 166

6 More Results on Newton's Method . . . 187
  6.1 Convergence radii for the Newton-Kantorovich method . . . 187
  6.2 Convergence under weak continuous derivative . . . 198
  6.3 Convergence and universal constants . . . 204
  6.4 A Deformed Method . . . 213
  6.5 The Inverse Function Theorem . . . 217
  6.6 Exercises . . . 227

7 Equations with Nonsmooth Operators . . . 245
  7.1 Convergence of Newton-Kantorovich Method. Case 1: Lipschitzian Assumptions . . . 246
  7.2 Convergence of Newton-Kantorovich Method. Case 2: Hölderian Assumptions . . . 256
  7.3 Convergence of Newton-Kantorovich Method. Case 3: Local Lipschitzian Assumptions . . . 263
  7.4 Convergence of the Secant Method Using Majorizing Sequences . . . 270
  7.5 Exercises . . . 279

8 Applications of the weaker version of the Newton-Kantorovich theorem . . . 287
  8.1 Comparison with Moore Theorem . . . 287
  8.2 Miranda Theorem . . . 291
  8.3 Computation of Orbits . . . 297
  8.4 Continuation Methods . . . 300
  8.5 Trust Regions . . . 305
  8.6 Convergence on a Cone . . . 316
  8.7 Convergence on Lie Groups . . . 329
  8.8 Finite Element Analysis and Discretizations . . . 336
  8.9 Exercises . . . 341

9 The Newton-Kantorovich Theorem and Mathematical Programming . . . 353
  9.1 Case 1: Interior Point Methods . . . 353
  9.2 Case 2: LP Methods . . . 360
  9.3 Variational inequalities on finite dimensional spaces . . . 368
  9.4 Exercises . . . 377

10 Generalized equations . . . 379
  10.1 Strongly regular solutions . . . 379
  10.2 Quasi Methods . . . 387
  10.3 Variational Inequalities under Weak Lipschitz Conditions . . . 393
  10.4 Exercises . . . 400

11 Monotone convergence of point-to-set mappings . . . 409
  11.1 General Theorems . . . 409
  11.2 Convergence in linear spaces . . . 410
  11.3 Applications . . . 415
  11.4 Exercises . . . 417

12 Fixed points of point-to-set mappings . . . 419
  12.1 General theorems . . . 419
  12.2 Fixed points . . . 421
  12.3 Applications . . . 422
  12.4 Exercises . . . 426

13 Special topics . . . 429
  13.1 Newton-Mysovskikh conditions . . . 429
  13.2 A conjugate gradient solver for the normal equations . . . 437
  13.3 A finer mesh independence principle for Newton's method . . . 449
  13.4 Exercises . . . 456

Bibliography . . . 457

A Glossary of Symbols . . . 483

Index . . . 485
Chapter 1
Linear Spaces

The basic background for solving equations is introduced here.
1.1
Linear Operators
Some mathematical operations have certain properties in common. These properties are given in the following definition. Definition 1.1.1 An operator T which maps a linear space X into a linear space Y over the same scalar field S is said to be additive if T (x + y) = T (x) + T (y),
for all x, y ∈ X,
and homogeneous if T (sx) = sT (x),
for all x ∈ X, s ∈ S.
An operator that is additive and homogeneous is called a linear operator. Many examples of linear operators exist.

Example 1.1.1 Define an operator T from a linear space X into itself by T(x) = sx, s ∈ S. Then T is a linear operator.

Example 1.1.2 The operator D = d/dt mapping X = C¹[0, 1] into Y = C[0, 1], given by

D(x) = dx/dt = y(t), 0 ≤ t ≤ 1,

is linear.
If X and Y are linear spaces over the same scalar field S, then the set L(X, Y ) containing all linear operators from X into Y is a linear space over S if addition is defined by (T1 + T2 )(x) = T1 (x) + T2 (x),
for all x ∈ X,
and scalar multiplication by (sT )(x) = s(T (x)),
for all x ∈ X, s ∈ S.
We may also consider linear operators B mapping X into L(X, Y). For an x ∈ X we have B(x) = T, a linear operator from X into Y. Hence, we have B(x1, x2) = (B(x1))(x2) = y ∈ Y. B is called a bilinear operator from X into Y. The linear operators B from X into L(X, Y) form a linear space L(X, L(X, Y)). This process can be repeated to generate j-linear operators (j > 1 an integer).

Definition 1.1.2 A linear operator mapping a linear space X into its scalar field S is called a linear functional on X.

Definition 1.1.3 An operator Q mapping a linear space X into a linear space Y is said to be nonlinear if it is not a linear operator from X into Y.
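A minimal sketch of ours (with an arbitrary 2 × 2 matrix, not an example from the text) checking the two defining properties of a linear operator, additivity and homogeneity, numerically:

```python
# A fixed matrix [[2, 1], [1, -3]] acting on R^2 is a linear operator;
# the matrix entries are arbitrary illustrative choices.
def T(x):
    a, b = x
    return (2.0 * a + b, a - 3.0 * b)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(s, x):
    return (s * x[0], s * x[1])

x, y, s = (1.0, -2.0), (0.5, 4.0), 3.0
additive = T(add(x, y)) == add(T(x), T(y))       # T(x + y) = T(x) + T(y)
homogeneous = T(scale(s, x)) == scale(s, T(x))   # T(sx) = sT(x)
```

For matrices over the reals both properties hold exactly (up to floating-point rounding, which is exact here because the entries are small).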
1.2
Continuous Linear Operators
Some metric concepts of importance are introduced here.

Definition 1.2.1 An operator F from a Banach space X into a Banach space Y is continuous at x = x∗ if

lim_{n→∞} ‖xn − x∗‖_X = 0  ⟹  lim_{n→∞} ‖F(xn) − F(x∗)‖_Y = 0.

Theorem 1.2.2 If a linear operator T from a Banach space X into a Banach space Y is continuous at x∗ = 0, then it is continuous at every point x of the space X.

Proof. We have T(0) = 0, and from lim_{n→∞} ‖xn‖ = 0 we get lim_{n→∞} ‖T(xn)‖ = 0. If a sequence {xn} (n ≥ 0) converges to x∗ in X, then setting yn = xn − x∗ gives lim_{n→∞} ‖yn‖ = 0. By hypothesis this implies that

lim_{n→∞} ‖T(yn)‖ = lim_{n→∞} ‖T(xn − x∗)‖ = lim_{n→∞} ‖T(xn) − T(x∗)‖ = 0.
Definition 1.2.3 An operator F from a Banach space X into a Banach space Y is bounded on the set A in X if there exists a constant c < ∞ such that

‖F(x)‖ ≤ c‖x‖, for all x ∈ A.

The greatest lower bound (infimum) of the numbers c satisfying the above inequality is called the bound of F on A. An operator which is bounded on an open ball U(z, r) = {x ∈ X | ‖x − z‖ < r} is continuous at z. It turns out that for linear operators the converse is also true.

Theorem 1.2.4 A continuous linear operator T from a Banach space X into a Banach space Y is bounded on X.

Proof. By the continuity of T at 0 there exists ε > 0 such that ‖T(z)‖ < 1 whenever ‖z‖ < ε. For 0 ≠ z ∈ X and any scalar c with |c| < ε/‖z‖ we have ‖cz‖ < ε, so ‖T(cz)‖ = |c| · ‖T(z)‖ < 1. Letting |c| → ε/‖z‖, we conclude that

‖T(z)‖ ≤ (1/ε)‖z‖,  (1.2.1)

that is, the operator T is bounded on X.

The bound on X of a linear operator T, denoted by ‖T‖_X or simply ‖T‖, is called the norm of T. As in Theorem 1.2.4 we get

‖T‖ = sup_{‖x‖=1} ‖T(x)‖.  (1.2.2)

Hence, for any bounded linear operator T,

‖T(x)‖ ≤ ‖T‖ · ‖x‖, for all x ∈ X.  (1.2.3)

From now on, L(X, Y) denotes the set of all bounded linear operators from a Banach space X into another Banach space Y. It also follows immediately that L(X, Y) is a linear space if equipped with the rules of addition and scalar multiplication introduced in Section 1.1. The proof of the following result is left as an exercise (see also [129]).

Theorem 1.2.5 The set L(X, Y) is a Banach space for the norm (1.2.2).
1.3
Equations
In a Banach space X, solving a linear equation can be stated as follows: given a bounded linear operator T mapping X into itself and some y ∈ X, find an x ∈ X such that

T(x) = y.  (1.3.1)

The point x (if it exists) is called a solution of Equation (1.3.1).
Definition 1.3.1 If T is a bounded linear operator in X and a bounded linear operator T₁ exists such that

T₁T = TT₁ = I,  (1.3.2)

where I is the identity operator in X (i.e., I(x) = x for all x ∈ X), then T₁ is called the inverse of T, and we write T₁ = T⁻¹. That is,

T⁻¹T = TT⁻¹ = I.  (1.3.3)

If T⁻¹ exists, then Equation (1.3.1) has the unique solution

x = T⁻¹(y).  (1.3.4)
The proof of the following result is left as an exercise (see also [147], [211], [214]).

Theorem 1.3.2 (Banach Lemma on Invertible Operators) If T is a bounded linear operator in X, then T⁻¹ exists if and only if there is a bounded linear operator P in X such that P⁻¹ exists and

‖I − PT‖ < 1.  (1.3.5)

If T⁻¹ exists, then

T⁻¹ = Σ_{n=0}^{∞} (I − PT)^n P  (Neumann series)  (1.3.6)

and

‖T⁻¹‖ ≤ ‖P‖ / (1 − ‖I − PT‖).  (1.3.7)
Based on Theorem 1.3.2 we can immediately introduce a computational theory for Equation (1.3.1) composed of three factors:

(A) Existence and Uniqueness. Under the hypotheses of Theorem 1.3.2, Equation (1.3.1) has a unique solution x∗.

(B) Approximation. The iteration

x_{n+1} = P(y) + (I − PT)(x_n)  (n ≥ 0)  (1.3.8)

gives a sequence {x_n} (n ≥ 0) of successive approximations, which converges to x∗ for any initial guess x₀ ∈ X.

(C) Error Bounds. The speed of convergence of the iteration {x_n} (n ≥ 0) to x∗ is governed by the estimate

‖x_n − x∗‖ ≤ (‖I − PT‖^n / (1 − ‖I − PT‖)) ‖P(y)‖ + ‖I − PT‖^n ‖x₀‖.  (1.3.9)
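The successive-approximation scheme (1.3.8) can be sketched concretely for a 2 × 2 system; the operator T, the preconditioner P (here the inverse of the diagonal part of T, chosen so that ‖I − PT‖ < 1), and all numbers are illustrative choices of ours:

```python
# Solve T x = y in R^2 via the successive approximations (1.3.8):
#   x_{n+1} = P(y) + (I - P T)(x_n),
# with P an approximate inverse of T satisfying ||I - P T|| < 1.

def T(x):
    return (2.0 * x[0] + 0.5 * x[1], 0.3 * x[0] + 3.0 * x[1])

def P(x):
    # Inverse of the diagonal part diag(2, 3) of T; then
    # I - P T = [[0, -0.25], [-0.1, 0]], whose norm is well below 1.
    return (x[0] / 2.0, x[1] / 3.0)

y = (1.0, 2.0)
x = (0.0, 0.0)                       # any initial guess x0 works (Theorem 1.3.2)
for _ in range(100):
    Py = P(y)
    PTx = P(T(x))
    x = (Py[0] + x[0] - PTx[0], Py[1] + x[1] - PTx[1])

residual = max(abs(T(x)[0] - y[0]), abs(T(x)[1] - y[1]))
```

Since ‖I − PT‖ ≈ 0.16 here, estimate (1.3.9) predicts the error shrinks by roughly that factor per step, and after 100 steps the residual is at machine precision.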
1.4

The Inverse of a Linear Operator
Let T be a bounded linear operator in X. One way to obtain an approximate inverse is to make use of an operator sufficiently close to T.

Theorem 1.4.1 If T is a bounded linear operator in X, then T⁻¹ exists if and only if there is a bounded linear operator P₁ in X such that P₁⁻¹ exists and

‖P₁ − T‖ < ‖P₁⁻¹‖⁻¹.  (1.4.1)

If T⁻¹ exists, then

T⁻¹ = Σ_{n=0}^{∞} (I − P₁⁻¹T)^n P₁⁻¹  (1.4.2)

and

‖T⁻¹‖ ≤ ‖P₁⁻¹‖ / (1 − ‖I − P₁⁻¹T‖) ≤ ‖P₁⁻¹‖ / (1 − ‖P₁⁻¹‖ · ‖P₁ − T‖).  (1.4.3)

Proof. Let P = P₁⁻¹ in Theorem 1.3.2 and note that by (1.4.1)

‖I − P₁⁻¹T‖ = ‖P₁⁻¹(P₁ − T)‖ ≤ ‖P₁⁻¹‖ · ‖P₁ − T‖ < 1.  (1.4.4)

That is, (1.3.5) is satisfied. The bounds (1.4.3) follow from (1.3.7) and (1.4.4). That proves the sufficiency. The necessity is proved by setting P₁ = T, if T⁻¹ exists.

The following result is equivalent to Theorem 1.3.2.
Theorem 1.4.2 A bounded linear operator T in a Banach space X has an inverse T⁻¹ if and only if linear operators P, P⁻¹ exist such that the series

Σ_{n=0}^{∞} (I − PT)^n P  (1.4.5)

converges. In this case we have

T⁻¹ = Σ_{n=0}^{∞} (I − PT)^n P.

Proof. If series (1.4.5) converges, then it converges to T⁻¹ (see Theorem 1.3.2). The existence of P, P⁻¹ and the convergence of series (1.4.5) are again established as in Theorem 1.3.2, by taking P = T⁻¹, when it exists.

Definition 1.4.3 A linear operator N in a Banach space X is said to be nilpotent if

N^m = 0, for some positive integer m.  (1.4.6)
Theorem 1.4.4 A bounded linear operator T in a Banach space X has an inverse T⁻¹ if and only if there exist linear operators P, P⁻¹ such that I − PT is nilpotent.

Proof. If P, P⁻¹ exist and I − PT is nilpotent, then the series

Σ_{n=0}^{∞} (I − PT)^n P = Σ_{n=0}^{m−1} (I − PT)^n P

converges to T⁻¹ by Theorem 1.4.2. Moreover, if T⁻¹ exists, then P = T⁻¹, P⁻¹ = T exists, and I − PT = I − T⁻¹T = 0 is nilpotent.
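When I − PT is nilpotent the Neumann series (1.4.5) truncates to a finite sum and yields T⁻¹ exactly; a small sketch of ours with P = I and an upper triangular 2 × 2 matrix:

```python
# Take P = I and T = [[1, 1], [0, 1]]. Then N = I - T = [[0, -1], [0, 0]]
# satisfies N^2 = 0 (nilpotent with m = 2 in (1.4.6)), so the Neumann
# series (1.4.5) reduces to the finite sum I + N, which is T^{-1} exactly.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1.0, 0.0], [0.0, 1.0]]
T = [[1.0, 1.0], [0.0, 1.0]]
N = [[I[i][j] - T[i][j] for j in range(2)] for i in range(2)]  # N = I - T

nilpotent = matmul(N, N) == [[0.0, 0.0], [0.0, 0.0]]

# T^{-1} = sum_{n=0}^{m-1} N^n P = I + N  (finite Neumann sum, P = I)
T_inv = [[I[i][j] + N[i][j] for j in range(2)] for i in range(2)]
```

Here T⁻¹ = [[1, −1], [0, 1]], and multiplying back gives the identity, with no truncation error because the series is finite.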
1.5
Fréchet Derivatives
The computational techniques to be considered later make use of the derivative in the sense of Fréchet [211], [212], [264].

Definition 1.5.1 Let F be an operator mapping a Banach space X into a Banach space Y. If there exists a bounded linear operator L from X into Y such that

lim_{‖Δx‖→0} ‖F(x₀ + Δx) − F(x₀) − L(Δx)‖ / ‖Δx‖ = 0,  (1.5.1)

then F is said to be Fréchet differentiable at x₀, and the bounded linear operator

F′(x₀) = L  (1.5.2)

is called the first Fréchet derivative of F at x₀. The limit in (1.5.1) is supposed to hold independently of the way Δx approaches 0. Moreover, the Fréchet differential

δF(x₀, Δx) = F′(x₀)Δx  (1.5.3)

is an arbitrarily close approximation to the difference F(x₀ + Δx) − F(x₀) relative to ‖Δx‖, for ‖Δx‖ small.

If F₁ and F₂ are differentiable at x₀, then

(F₁ + F₂)′(x₀) = F₁′(x₀) + F₂′(x₀).  (1.5.4)

Moreover, if F₂ is an operator from a Banach space X into a Banach space Z, and F₁ is an operator from Z into a Banach space Y, their composition F₁ ∘ F₂ is defined by

(F₁ ∘ F₂)(x) = F₁(F₂(x)), for all x ∈ X.  (1.5.5)
It follows from Definition 1.5.1 that F₁ ∘ F₂ is differentiable at x₀ if F₂ is differentiable at x₀ and F₁ is differentiable at the point F₂(x₀) of Z, with the chain rule

(F₁ ∘ F₂)′(x₀) = F₁′(F₂(x₀)) F₂′(x₀).  (1.5.6)
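The chain rule (1.5.6) can be checked numerically in the simplest one-dimensional setting, where the Fréchet derivative reduces to the ordinary derivative; the particular functions F₁, F₂ below are illustrative choices of ours:

```python
import math

# Numerical check of the chain rule (1.5.6) with F1(z) = z**2 and
# F2(x) = sin(x): compare the analytic product F1'(F2(x0)) * F2'(x0)
# against a central finite-difference quotient of the composition.

F2 = math.sin
F1 = lambda z: z * z
composition = lambda x: F1(F2(x))

x0, h = 0.7, 1e-6
analytic = 2.0 * F2(x0) * math.cos(x0)   # F1'(F2(x0)) * F2'(x0)
numeric = (composition(x0 + h) - composition(x0 - h)) / (2.0 * h)
```

The central difference agrees with the analytic value to within O(h²) truncation error plus rounding, i.e. far below 10⁻⁸ here.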
In order to differentiate an operator F we write

F(x₀ + Δx) − F(x₀) = L(x₀, Δx)Δx + η(x₀, Δx),  (1.5.7)

where L(x₀, Δx) is a bounded linear operator for given x₀, Δx with

lim_{‖Δx‖→0} L(x₀, Δx) = L,  (1.5.8)

and

lim_{‖Δx‖→0} ‖η(x₀, Δx)‖ / ‖Δx‖ = 0.  (1.5.9)

Estimates (1.5.8) and (1.5.9) give

lim_{‖Δx‖→0} L(x₀, Δx) = F′(x₀).  (1.5.10)

If L(x₀, Δx) is a continuous function of Δx in some ball U(0, R) (R > 0), then

L(x₀, 0) = F′(x₀).  (1.5.11)
Higher-order derivatives can be defined by induction.

Definition 1.5.2 If F is (m − 1)-times Fréchet differentiable (m ≥ 2 an integer), and an m-linear operator A from X into Y exists such that

lim_{‖Δx‖→0} ‖F^(m−1)(x₀ + Δx) − F^(m−1)(x₀) − A(Δx)‖ / ‖Δx‖ = 0,  (1.5.12)

then A is called the m-th Fréchet derivative of F at x₀, and we write

A = F^(m)(x₀).  (1.5.13)
Higher partial derivatives in product spaces can be defined as follows. Define

Xij = L(Xj, Xi),  (1.5.14)

where X1, X2, . . . are Banach spaces and L(Xj, Xi) is the space of bounded linear operators from Xj into Xi. The elements of Xij are denoted by Lij, etc. Similarly,

Xijm = L(Xm, Xij) = L(Xm, L(Xj, Xi))  (1.5.15)
denotes the space of bounded bilinear operators from Xm into Xij. Finally, we write

Xij1j2···jm = L(Xjm, Xij1j2···jm−1),  (1.5.16)

which denotes the space of bounded linear operators from Xjm into Xij1j2···jm−1. The elements A = Aij1j2···jm of Xij1j2···jm are a generalization of m-linear operators [10], [54].

Consider an operator Fi from the space

X = ∏_{p=1}^n Xjp  (1.5.17)

into Xi, and assume that Fi has partial derivatives of orders 1, 2, . . . , m − 1 in some ball U(x0, R), where R > 0 and

x0 = (xj1^(0), xj2^(0), . . . , xjn^(0)) ∈ X.  (1.5.18)

For simplicity and without loss of generality we renumber the original spaces so that

j1 = 1, j2 = 2, . . . , jn = n.  (1.5.19)

Hence, we write

x0 = (x1^(0), x2^(0), . . . , xn^(0)).  (1.5.20)

A partial derivative of order (m − 1) of Fi at x0 is an operator

Aiq1q2···qm−1 = ∂^(m−1) Fi(x0) / (∂xq1 ∂xq2 · · · ∂xqm−1)  (1.5.21)

(in Xiq1q2···qm−1), where

1 ≤ q1, q2, . . . , qm−1 ≤ n.  (1.5.22)

Let P denote the operator from Xqm into Xiq1q2···qm−1 obtained from (1.5.21) by fixing

xj = xj^(0), j ≠ qm,  (1.5.23)

for some qm, 1 ≤ qm ≤ n. Moreover, if

P′(xqm^(0)) = ∂/∂xqm ( ∂^(m−1) Fi(x0) / (∂xq1 ∂xq2 · · · ∂xqm−1) ) = ∂^m Fi(x0) / (∂xq1 · · · ∂xqm)  (1.5.24)

exists, it will be called the partial Fréchet-derivative of order m of Fi with respect to xq1, . . . , xqm at x0.

Furthermore, if Fi is m-times Fréchet-differentiable at x0, then

[∂^m Fi(x0) / (∂xq1 · · · ∂xqm)] xq1 · · · xqm = [∂^m Fi(x0) / (∂xs1 ∂xs2 · · · ∂xsm)] xs1 · · · xsm  (1.5.25)

for any permutation s1, s2, . . . , sm of the integers q1, q2, . . . , qm and any choice of points xq1, . . . , xqm from Xq1, . . . , Xqm, respectively. Hence, if F = (F1, . . . , Ft) is an operator from X = X1 × X2 × · · · × Xn into Y = Y1 × Y2 × · · · × Yt, then

F^(m)(x0) = ( ∂^m Fi / (∂xj1 · · · ∂xjm) |_{x=x0} ),  (1.5.26)

i = 1, 2, . . . , t, j1, j2, . . . , jm = 1, 2, . . . , n, is called the m-Fréchet derivative of F at x0 = (x1^(0), x2^(0), . . . , xn^(0)).
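The symmetry (1.5.25) of mixed partial derivatives can be observed numerically; the scalar field f below is a hypothetical example, with derivatives approximated by nested central differences.

```python
def f(x1, x2):
    # Smooth scalar field on R^2 (hypothetical example, not from the text);
    # its exact mixed partial is 2*x1 + 3*x2**2.
    return x1 ** 2 * x2 + x1 * x2 ** 3

def d2(g, x, i, j, h=1e-4):
    # Approximate d/dx_i ( dg/dx_j ) at x by nested central differences.
    def d1(g, x, k, h):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        return (g(*xp) - g(*xm)) / (2 * h)
    return d1(lambda *y: d1(g, list(y), j, h), x, i, h)

x = [0.3, 0.8]
a = d2(f, x, 0, 1)   # d^2 f / dx1 dx2
b = d2(f, x, 1, 0)   # d^2 f / dx2 dx1
print(abs(a - b))    # equal up to discretization error, as in (1.5.25)
```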
1.6 Integration
In this section we state results concerning the mean value theorem, Taylor's theorem, and Riemann integration. The proofs are left as exercises.

The mean value theorem for differentiable real functions f,

f(b) − f(a) = f′(c)(b − a),  (1.6.1)

where c ∈ (a, b), does not hold in a Banach space setting. However, if F is a differentiable operator between two Banach spaces X and Y, then

‖F(x) − F(y)‖ ≤ sup_{x̄∈L(x,y)} ‖F′(x̄)‖ · ‖x − y‖,  (1.6.2)

where

L(x, y) = {z : z = λy + (1 − λ)x, 0 ≤ λ ≤ 1}.  (1.6.3)
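A scalar sanity check of (1.6.2) (an illustration added here, with the hypothetical choice F = sin on R, where the segment L(x, y) is just the interval between x and y):

```python
import math

# Mean value inequality (1.6.2) for F(x) = sin x:
# |F(x) - F(y)| <= sup_{z in L(x,y)} |F'(z)| * |x - y|.
F, dF = math.sin, math.cos
x, y = 0.2, 1.5
# Approximate the sup of |F'| over the segment by sampling it.
sup_dF = max(abs(dF(x + k / 1000 * (y - x))) for k in range(1001))
lhs = abs(F(x) - F(y))
rhs = sup_dF * abs(x - y)
print(lhs <= rhs)  # True
```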
Set

z(λ) = λy + (1 − λ)x, 0 ≤ λ ≤ 1,  (1.6.4)

and

F(λ) = F(z(λ)) = F(λy + (1 − λ)x).  (1.6.5)
Divide the interval 0 ≤ λ ≤ 1 into n subintervals of lengths ∆λi, i = 1, 2, . . . , n, choose points λi inside the corresponding subintervals and, as for the real Riemann integral, consider the sums

Σ_σ F(λi)∆λi = Σ_{i=1}^n F(λi)∆λi,  (1.6.6)

where σ is the partition of the interval, and set

|σ| = max_i ∆λi.  (1.6.7)
Definition 1.6.1 If

S = lim_{|σ|→0} Σ_σ F(λi)∆λi  (1.6.8)

exists, then it is called the Riemann integral of F(λ) from 0 to 1, denoted by

S = ∫_0^1 F(λ) dλ = ∫_x^y F(λ) dλ.  (1.6.9)
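The sums (1.6.6) can be computed directly; the following sketch (with the hypothetical map F(u) = (e^u, u²) on R, x = 0, y = 1) shows them converging to the integral (1.6.8), which in this case is (e − 1, 1/3):

```python
import math

# Riemann sums (1.6.6) for F(lam) = F(z(lam)), z(lam) = lam*y + (1-lam)*x,
# with the illustrative choices F(u) = (e^u, u^2), x = 0, y = 1.
def riemann_sum(n):
    s = [0.0, 0.0]
    for i in range(n):
        lam = (i + 0.5) / n               # midpoints of a uniform partition
        u = lam * 1.0 + (1 - lam) * 0.0   # z(lam)
        s[0] += math.exp(u) / n
        s[1] += u ** 2 / n
    return s

S = riemann_sum(20000)
print(S)  # close to (e - 1, 1/3)
```

Refining the partition (larger n, so |σ| → 0) drives the sums toward S, as in Definition 1.6.1.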
Definition 1.6.2 A bounded operator P(λ) on [0, 1] such that its set of points of discontinuity has measure zero is said to be integrable on [0, 1].

We now state the famous Taylor theorem [171].

Theorem 1.6.3 If F is m-times Fréchet-differentiable in U(x0, R), R > 0, and F^(m)(x) is integrable from x to any y ∈ U(x0, R), then

F(y) = F(x) + Σ_{n=1}^{m−1} (1/n!) F^(n)(x)(y − x)^n + Rm(x, y),  (1.6.10)

‖F(y) − Σ_{n=0}^{m−1} (1/n!) F^(n)(x)(y − x)^n‖ ≤ (‖y − x‖^m / m!) sup_{x̄∈L(x,y)} ‖F^(m)(x̄)‖,  (1.6.11)

where

Rm(x, y) = ∫_0^1 [(1 − λ)^{m−1} / (m − 1)!] F^(m)(λy + (1 − λ)x)(y − x)^m dλ.  (1.6.12)
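In the scalar case Taylor's formula (1.6.10) with the integral remainder (1.6.12) can be verified directly; the example below (F(u) = e^u, x = 0, y = 1, m = 3, chosen for illustration) evaluates the remainder by a midpoint rule:

```python
import math

# F(y) = sum_{n=0}^{m-1} F^(n)(x)(y-x)^n / n! + R_m(x, y), with
# R_3(0, 1) = int_0^1 (1 - lam)^2 / 2! * e^lam dlam for F(u) = e^u.
m = 3
taylor = sum(1.0 / math.factorial(n) for n in range(m))   # partial sum = 2.5
N = 20000
R = 0.0
for i in range(N):
    lam = (i + 0.5) / N                                   # midpoint rule
    R += (1 - lam) ** (m - 1) / math.factorial(m - 1) * math.exp(lam) / N
print(abs(math.exp(1.0) - (taylor + R)))  # essentially zero
```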
1.7 Exercises
1.1. Show that the operators introduced in Examples 1.1.1 and 1.1.2 are indeed linear.

1.2. Show that the Laplace operator

∆ = ∂²/∂x1² + ∂²/∂x2² + ∂²/∂x3²

is a linear operator mapping the space of real functions x = x(x1, x2, x3) with continuous second derivatives on some subset D of R³ into the space of continuous real functions on D.

1.3. Define T : C′′[0, 1] × C′[0, 1] → C[0, 1] by

T(x, y) = α d²x/dt² + β dy/dt, 0 ≤ t ≤ 1.

Show that T is a linear operator.
1.4. In an inner product space with inner product ⟨·, ·⟩, show that for any fixed z in the space T(x) = ⟨x, z⟩ is a linear functional.

1.5. Show that an additive operator T from a real Banach space X into a real Banach space Y is homogeneous if it is continuous.

1.6. Show that the matrix A = {aij}, i, j = 1, 2, . . . , n, has an inverse if

|aii| > (1/2) Σ_{j=1}^n |aij| > 0, i = 1, 2, . . . , n.

1.7. Show that the linear integral equation of the second Fredholm kind in C[0, 1],

x(s) − λ ∫_0^1 K(s, t)x(t)dt = y(s), 0 ≤ s ≤ 1,

where K(s, t) is continuous on 0 ≤ s, t ≤ 1, has a unique solution x(s) for each y(s) ∈ C[0, 1] if

|λ| < ( max_{s∈[0,1]} ∫_0^1 |K(s, t)|dt )^{−1}.

1.8. Prove Theorem 1.2.5.

1.9. Prove Theorem 1.3.2.

1.10. Show that the operators defined below are all linear:

(a) Identity operator. The identity operator IX : X → X given by IX(x) = x, for all x ∈ X.

(b) Zero operator. The zero operator O : X → Y given by O(x) = 0, for all x ∈ X.

(c) Integration. T : C[a, b] → C[a, b] given by T(x(t)) = ∫_0^1 x(s)ds, t ∈ [a, b].

(d) Differentiation. Let X be the vector space of all polynomials on [a, b]. Define T on X by T(x(t)) = x′(t).

(e) Vector algebra. The cross product with one factor kept fixed defines T1 : R³ → R³. Similarly, the dot product with one fixed factor defines T2 : R³ → R.

(f) Matrices. A real matrix A = {aij} with m rows and n columns defines T : Rⁿ → Rᵐ given by y = Ax.
1.11. Let T be a linear operator. Show:
(a) the range R(T) of T is a vector space;
(b) if dim D(T) = n < ∞, then dim R(T) ≤ n;
(c) the null space N(T) is a vector space.

1.12. Let X, Y be vector spaces, both real or both complex, and let T : D(T) → Y be a linear operator with domain D(T) ⊆ X and R(T) ⊆ Y. Show:
(a) the inverse T⁻¹ : R(T) → D(T) exists if and only if T(x) = 0 ⇒ x = 0;
(b) if T⁻¹ exists, it is a linear operator;
(c) if dim D(T) = n < ∞ and T⁻¹ exists, then dim R(T) = dim D(T).

1.13. Let T : X → Y, S : Y → Z be bijective linear operators, where X, Y, Z are vector spaces. Show that the inverse (ST)⁻¹ : Z → X of the product ST exists, and (ST)⁻¹ = T⁻¹S⁻¹.

1.14. If the product (composite) of two linear operators exists, show that it is linear.

1.15. Let X be the vector space of all complex 2 × 2 matrices and define T : X → X by T(x) = cx, where c ∈ X is fixed and cx denotes the usual product of matrices. Show that T is linear. Under what conditions does T⁻¹ exist?

1.16. Let T : X → Y be a linear operator with dim X = dim Y = n < ∞. Show that R(T) = Y if and only if T⁻¹ exists.

1.17. Define the integral operator T : C[0, 1] → C[0, 1] by y = T(x), where y(t) = ∫_0^1 k(t, s)x(s)ds and k is continuous on [0, 1] × [0, 1]. Show that T is linear and bounded.

1.18. Show that the operator T defined in Exercise 1.10(f) is bounded.

1.19. If a normed space X is finite dimensional, show that every linear functional on X is bounded.

1.20. Let T : D(T) → Y be a linear operator, where D(T) ⊆ X and X, Y are normed spaces. Show:
(a) T is continuous if and only if it is bounded;
(b) if T is continuous at a single point, it is continuous.

1.21. Let T be a bounded linear operator. Show:
(a) xn → x (where xn, x ∈ D(T)) ⇒ T(xn) → T(x);
(b) the null space N(T) is closed.

1.22. If T ≠ 0 is a bounded linear operator, show that for any x ∈ D(T) such that ‖x‖ < 1, we have ‖T(x)‖ < ‖T‖.

1.23. Show that the operator T : ℓ∞ → ℓ∞ defined by y = (yi) = T(x), yi = xi/i, x = (xi), is linear and bounded.

1.24. Let T : C[0, 1] → C[0, 1] be defined by

y(t) = ∫_0^t x(s)ds.

Find R(T) and T⁻¹ : R(T) → C[0, 1]. Is T⁻¹ linear and bounded?

1.25. Show that the functionals defined on C[a, b] by

f1(x) = ∫_a^b x(t)y0(t)dt (y0 ∈ C[a, b]),
f2(x) = c1 x(a) + c2 x(b) (c1, c2 fixed)

are linear and bounded.

1.26. Find the norm of the linear functional f defined on C[−1, 1] by

f(x) = ∫_{−1}^0 x(t)dt − ∫_0^1 x(t)dt.

1.27. Show that

f1(x) = max_{t∈J} x(t), f2(x) = min_{t∈J} x(t), J = [a, b],

define functionals on C[a, b]. Are they linear? Bounded?

1.28. Show that a function can be additive and not homogeneous. For example, let z = x + iy denote a complex number, and let T : C → C be given by T(z) = z̄ = x − iy.
1.29. Show that a function can be homogeneous and not additive. For example, consider the operator T : R² → R given by T((x1, x2)) = x1²/x2.

1.30. Let F be an operator in C[0, 1] defined by

F(x)(s) = x(s) ∫_0^1 [s/(s + t)] x(t)dt, 0 ≤ s ≤ 1.

Show that for x0, z ∈ C[0, 1]

F′(x0)z = x0(s) ∫_0^1 [s/(s + t)] z(t)dt + z(s) ∫_0^1 [s/(s + t)] x0(t)dt.

1.31. Find the Fréchet-derivative of the operator F in R² given by

F(x, y) = (x² + 7x + 2xy − 3, x + y³)ᵀ.

1.32. Find the first and second Fréchet-derivatives of the Uryson operator

U(x)(s) = ∫_0^1 k(s, t, x(t))dt

in C[0, 1] at x0 = x0(s).

1.33. Find the Fréchet-derivative of the Riccati differential operator

R(z) = dz/dt + p(t)z² + q(t)z + r(t),

from C′[0, s] into C[0, s] at z0 = z0(t) in C′[0, s].

1.34. Find the first two Fréchet-derivatives of the operator

F(x, y) = (x² + y² − 3, x sin y)ᵀ in R².

1.35. Consider the partial differential operator

F(x) = ∆x − x²

from C²(I) into C(I), the space of all continuous functions on the square 0 ≤ α, β ≤ 1. Show that

F′(x0)z = ∆z(α, β) − 2x0(α, β)z(α, β),

where ∆ is the usual Laplace operator.

1.36. Let F(L) = L³ in L(X). Show:

F′(L0) = L0[·]L0 + L0²[·] + [·]L0².

1.37. Let F(L) = L⁻¹ in L(X). Show:

F′(L0) = −L0⁻¹[·]L0⁻¹,

provided that L0⁻¹ exists.

1.38. Show estimates (1.6.1) and (1.6.2).

1.39. Prove Taylor's Theorem 1.6.3.

1.40. Integrate the operator F(L) = L⁻¹ in L(X) from L0 = I to L1 = A, where ‖I − A‖ < 1.
Chapter 2

Monotone convergence

This chapter introduces the fundamentals of the theory of divided differences of a nonlinear operator. Several results are also provided involving divided differences as well as Fréchet derivatives satisfying Lipschitz or monotone-type conditions that will be used later. Results on monotone convergence are discussed here and also in Chapters 11 and 12.
2.1 Partially Ordered Topological Spaces
Let X be a linear space. We introduce the following definition:

Definition 2.1.1 A partially ordered topological linear space (POTL-space) is a locally convex topological linear space X which has a closed proper convex cone. A proper convex cone is a subset K such that K + K ⊂ K, αK ⊂ K for α > 0, and K ∩ (−K) = {0}. Thus the order relation ≤, defined by x ≤ y if and only if y − x ∈ K, gives a partial ordering which is compatible with the linear structure of the space. The cone K which defines the ordering is called the positive cone, since K = {x ∈ X | x ≥ 0}. The fact that K is closed implies also that intervals [a, b] = {z ∈ X | a ≤ z ≤ b} are closed sets.

Example 2.1.1 Some simple examples of POTL-spaces are:

(1) X = Eⁿ, n-dimensional Euclidean space, with K = {(x1, x2, ..., xn) ∈ Eⁿ | xi ≥ 0, i = 1, 2, ..., n};

(2) X = Eⁿ with K = {(x1, x2, ..., xn) ∈ Eⁿ | xi ≥ 0, i = 1, 2, ..., n − 1, xn = 0};
(3) X = C[0, 1], continuous functions, maximum norm topology, pointwise ordering;

(4) X = Cⁿ[0, 1], n-times continuously differentiable functions with

‖f‖ = Σ_{k=0}^n max |f^(k)(t)|,

and pointwise ordering;

(5) X = Lp[0, 1], 1 ≤ p ≤ ∞, usual topology, K = {f ∈ Lp[0, 1] | f(t) ≥ 0 a.e.}.

Remark 2.1.1 Using the above examples, it is easy to see that the closedness of the positive cone is not, in general, a strong enough connection between the ordering and the topology. Consider, for example, the following properties of sequences of real numbers:

(1) x1 ≤ x2 ≤ · · · and sup{xn} = x∗ imply lim_{n→∞} xn = x∗;

(2) lim_{n→∞} xn = 0 implies that there exists a sequence {yn} with y1 ≥ y2 ≥ · · · ≥ 0, inf{yn} = 0 and −yn ≤ xn ≤ yn;

(3) 0 ≤ xn ≤ yn and lim_{n→∞} yn = 0 imply lim_{n→∞} xn = 0.

Unfortunately, these statements are not true for all POTL-spaces:

(a) In X = C[0, 1] let xn(t) = −tⁿ. Then x1 ≤ x2 ≤ · · · ≤ 0 and sup{xn} = 0, but ‖xn‖ = 1 for all n, so lim_{n→∞} xn does not exist. Hence (1) does not hold.

(b) In X = L1[0, 1] let xn(t) = n for 1/(n + 1) ≤ t ≤ 1/n and zero elsewhere. Then lim_{n→∞} ‖xn‖ = 0, but clearly property (2) does not hold.

(c) In X = C¹[0, 1] let xn(t) = tⁿ/n and yn(t) = 1/n. Then 0 ≤ xn ≤ yn and lim_{n→∞} yn = 0, but ‖xn‖ = max |tⁿ/n| + max |tⁿ⁻¹| = 1/n + 1 > 1; hence xn does not converge to zero. Hence (3) does not hold.
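Counterexample (a) can be observed numerically (an illustrative sketch added here, evaluating the maximum norm on a grid):

```python
# In C[0,1], x_n(t) = -t^n is increasing with pointwise supremum 0,
# yet the maximum norm ||x_n|| stays equal to 1 for every n (the value
# at t = 1), so the sequence cannot converge to its supremum in norm.
def sup_norm(f, N=1000):
    # Grid approximation of the maximum norm on C[0,1].
    return max(abs(f(i / N)) for i in range(N + 1))

norms = [sup_norm(lambda t, n=n: -(t ** n)) for n in (1, 10, 100)]
print(norms)  # [1.0, 1.0, 1.0] -- no convergence to the supremum 0
```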
We now briefly discuss certain types of POTL-spaces in which some of the above statements are true.

Definition 2.1.2 A POTL-space is called regular if every order-bounded increasing sequence has a limit.

Remark 2.1.2 Examples of regular POTL-spaces are Eⁿ and Lp, 1 ≤ p < ∞, whereas C[0, 1], Cⁿ[0, 1] and L∞[0, 1] are not regular, as was shown in (a) of the above remark. If {xn}, n ≥ 0, is a monotone increasing sequence and lim_{n→∞} xn = x∗ exists, then for any k0, n ≥ k0 implies xn ≥ xk0. Hence x∗ = lim_{n→∞} xn ≥ xk0, i.e.,
x∗ is an upper bound of {xn}, n ≥ 0. Moreover, if y is any other upper bound, then xn ≤ y, and hence x∗ = lim_{n→∞} xn ≤ y, i.e., x∗ = sup{xn}. This shows that in any POTL-space, the closedness of the positive cone guarantees that, if a monotone increasing sequence has a limit, then this limit is also a supremum. In a regular space the converse is true; i.e., if a monotone increasing sequence has a supremum, then it also has a limit. It is important to note that the definition of regularity involves both an order concept (monotone boundedness) and a topological concept (limit).

Definition 2.1.3 A POTL-space is called normal if, given a local base U for the topology, there exists a positive number η so that if 0 ≤ x ∈ V ∈ U then [0, x] ⊆ ηV.

Remark 2.1.3 If the topology of a POTL-space is given by a norm, then this space is called a partially ordered normed space (PON-space). If a PON-space is complete with respect to its topology, then it is called a partially ordered Banach space (POB-space). According to Definition 2.1.3, a PON-space is normal if and only if there exists a positive number α such that

‖x‖ ≤ α‖y‖ for all x, y ∈ X with 0 ≤ x ≤ y.

Let us note that any regular POB-space is normal. The converse is not true. For example, the space C[0, 1], ordered by the cone of nonnegative functions, is normal but not regular. All finite dimensional POTL-spaces are both normal and regular.

Remark 2.1.4 Let us now define some special types of operators acting between two POTL-spaces. First we introduce some notation: if X and Y are two linear spaces, then we denote by (X, Y) the set of all operators from X into Y and by L(X, Y) the set of all linear operators from X into Y. If X and Y are topological linear spaces, then we denote by LB(X, Y) the set of all continuous linear operators from X into Y. For simplicity the spaces L(X, X) and LB(X, X) will be denoted by L(X) and LB(X).

Now let X and Y be two POTL-spaces and consider an operator G ∈ (X, Y). G is called isotone (resp. antitone) if x ≥ y implies G(x) ≥ G(y) (resp. G(x) ≤ G(y)). G is called nonnegative if x ≥ 0 implies G(x) ≥ 0. For linear operators nonnegativity is clearly equivalent to isotony. Also, a linear operator is inverse nonnegative if and only if it is invertible and its inverse is nonnegative. If G is a nonnegative operator then we write G ≥ 0. If G and H are two operators from X into Y such that H − G is nonnegative, then we write G ≤ H. If Z is a linear space, then we denote by I = I_Z the identity operator in Z (i.e., I(x) = x for all x ∈ Z). If Z is a POTL-space then obviously I ≥ 0.

Suppose that X and Y are two POTL-spaces and consider the operators T ∈ L(X, Y) and S ∈ L(Y, X). If ST ≤ I_X (resp. ST ≥ I_X) then S is called a left subinverse (resp. left superinverse) of T, and T is called a right subinverse (resp. right superinverse) of S. We say that S is a subinverse of T if S is both a left and a right subinverse of T.
We finally end this section by noting that for the theory of partially ordered linear spaces the reader may consult M.A. Krasnosel’skii [210]–[213], Vandergraft [332] or Argyros and Szidarovszky [99].
2.2 Divided Differences in a Linear Space
The concept of a divided difference of a nonlinear operator generalizes the usual notion of a divided difference of a scalar function, in the same way in which the Fréchet-derivative generalizes the notion of the derivative of a function.

Definition 2.2.1 Let F be a nonlinear operator defined on a subset D of a linear space X with values in a linear space Y, i.e., F ∈ (D, Y), and let x, y be two points of D. A linear operator from X into Y, denoted [x, y], which satisfies the condition

[x, y](x − y) = F(x) − F(y)  (2.2.1)

is called a divided difference of F at the points x and y.

Remark 2.2.1 If X and Y are topological linear spaces then we shall always assume the continuity of the linear operator [x, y]. (Generally, [x, y] ∈ L(X, Y); if X, Y are POTL-spaces then [x, y] ∈ LB(X, Y).) Obviously, condition (2.2.1) does not uniquely determine the divided difference, with the exception of the case when X is one-dimensional. An operator [·, ·] : D × D → L(X, Y) satisfying (2.2.1) is called a divided difference of F on D. If we fix the first variable, we get an operator

[x0, ·] : D → L(X, Y).  (2.2.2)

Let x1, x2 be two points of D. A divided difference of the operator (2.2.2) at the points x1, x2 will be called a divided difference of the second order of F at the points x0, x1, x2 and will be denoted by [x0, x1, x2]. We have by definition

[x0, x1, x2](x1 − x2) = [x0, x1] − [x0, x2].  (2.2.3)

Obviously, [x0, x1, x2] ∈ L(X, L(X, Y)).

Let us now state a well-known result due to Kantorovich concerning the location of fixed points, which will be used extensively later.

Theorem 2.2.2 Let X be a regular POTL-space and let x, y be two points of X such that x ≤ y. If H : [x, y] → X is a continuous isotone operator having the property that x ≤ H(x) and y ≥ H(y), then there exists a point z ∈ [x, y] such that H(z) = z.
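Theorem 2.2.2 can be illustrated in the simplest regular POTL-space X = R; the map H below is a hypothetical isotone operator on [0, 2] (not from the text), and iterating from both endpoints produces monotone sequences converging to a fixed point:

```python
# H(t) = (t + 2)/3 is isotone on [0, 2], with H(0) >= 0 and H(2) <= 2,
# so Theorem 2.2.2 guarantees a fixed point z in [0, 2] (here z = 1).
H = lambda t: (t + 2.0) / 3.0
lo, hi = 0.0, 2.0
for _ in range(50):
    lo, hi = H(lo), H(hi)   # lo increases, hi decreases
print(lo, hi)               # both approach the fixed point 1.0
```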
2.3 Divided Difference in a Banach Space
In this section we will assume that X and Y are Banach spaces. Accordingly we shall have [x, y] ∈ LB(X, Y), [x, y, z] ∈ LB(X, LB(X, Y)). As we will see in later chapters, most convergence theorems in a Banach space require that the divided differences of F satisfy Lipschitz conditions of the form

‖[x, y] − [x, z]‖ ≤ c0 ‖y − z‖,  (2.3.1)
‖[y, x] − [z, x]‖ ≤ c1 ‖y − z‖,  (2.3.2)
‖[x, y, z] − [u, y, z]‖ ≤ c2 ‖x − u‖,  (2.3.3)

for all x, y, z, u ∈ D. It is a simple exercise to show that if [·, ·] is a divided difference of F satisfying (2.3.1) or (2.3.2) then F is Fréchet-differentiable on D and we have

F′(x) = [x, x] for all x ∈ D.  (2.3.4)

Moreover, if (2.3.1) and (2.3.2) are both satisfied, then the Fréchet derivative F′ is Lipschitz continuous on D with Lipschitz constant c0 + c1.

At the end of this section we shall give an example of divided differences of the first and of the second order in the finite dimensional case. We shall consider the space R^q equipped with the Chebyshev norm, which is given by

‖x‖ = max {|xi| : 1 ≤ i ≤ q} for x = (x1, x2, ..., xq) ∈ R^q.  (2.3.5)

It follows that the norm of a linear operator L ∈ LB(R^q) represented by the matrix with entries lij is given by

‖L‖ = max { Σ_{j=1}^q |lij| : 1 ≤ i ≤ q }.  (2.3.6)

We cannot give a simple formula for the norm of a bilinear operator. However, if B is a bilinear operator with entries bijk, then we have the estimate

‖B‖ ≤ max { Σ_{j=1}^q Σ_{k=1}^q |bijk| : 1 ≤ i ≤ q }.  (2.3.7)
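A small numerical sketch of (2.3.5) and (2.3.6) (the matrix entries below are arbitrary illustrative values):

```python
# Chebyshev norm (2.3.5) and the induced matrix norm (2.3.6), which is
# the maximum absolute row sum, checked on an example in R^3.
def cheb_norm(x):
    return max(abs(xi) for xi in x)

def op_norm(L):
    return max(sum(abs(e) for e in row) for row in L)

L = [[1.0, -2.0, 0.5],
     [0.0,  3.0, -1.0],
     [0.25, 0.25, 0.25]]
x = [1.0, -1.0, 1.0]
Lx = [sum(L[i][j] * x[j] for j in range(3)) for i in range(3)]
print(op_norm(L))                                   # 4.0
print(cheb_norm(Lx) <= op_norm(L) * cheb_norm(x))   # True: ||Lx|| <= ||L|| ||x||
```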
Let U be an open ball of R^q and let F be an operator defined on U with values in R^q. We denote by f1, ..., fq the components of F. For each x ∈ U we have

F(x) = (f1(x), ..., fq(x))ᵀ.  (2.3.8)

Moreover, we introduce the notation

Dj fi(x) = ∂fi(x)/∂xj,  Dkj fi(x) = ∂²fi(x)/(∂xj ∂xk).  (2.3.9)
Let x, y be two points of U and let us denote by [x, y] the matrix with entries

[x, y]ij = (1/(xj − yj)) (fi(x1, ..., xj, yj+1, ..., yq) − fi(x1, ..., xj−1, yj, ..., yq)).  (2.3.10)

The linear operator [x, y] ∈ LB(R^q) defined in this way obviously satisfies condition (2.2.1). If the partial derivatives Dj fi satisfy Lipschitz conditions of the form

|Dj fi(x1, ..., xk + t, ..., xq) − Dj fi(x1, ..., xk, ..., xq)| ≤ p^i_{jk} |t|,  (2.3.11)

then conditions (2.3.1) and (2.3.2) will be satisfied with

c0 = max { Σ_{j=1}^q ( (1/2) p^i_{jj} + Σ_{k=j+1}^q p^i_{jk} ) : 1 ≤ i ≤ q }  (2.3.12)

and

c1 = max { Σ_{j=1}^q ( (1/2) p^i_{jj} + Σ_{k=1}^{j−1} p^i_{jk} ) : 1 ≤ i ≤ q }.  (2.3.13)

We shall prove (2.3.1) only, since (2.3.2) can be proved similarly. Let x, y, z be three points of U. We have in turn

[x, y]ij − [x, z]ij = Σ_{k=1}^q { [x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij }  (2.3.14)
by (2.3.10). If k < j then we have

[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij
= (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, ..., zq)}
− (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, ..., zq)} = 0.

For k = j we have

[x, (y1, ..., yj, zj+1, ..., zq)]ij − [x, (y1, ..., yj−1, zj, ..., zq)]ij
= (1/(xj − yj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, yj, zj+1, ..., zq)}
− (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, zj+1, ..., zq)}
= ∫_0^1 {Dj fi(x1, ..., xj−1, yj + t(xj − yj), zj+1, ..., zq) − Dj fi(x1, ..., xj−1, zj + t(xj − zj), zj+1, ..., zq)} dt,

so that by (2.3.11)

|[x, (y1, ..., yj, zj+1, ..., zq)]ij − [x, (y1, ..., yj−1, zj, ..., zq)]ij| ≤ p^i_{jj} ∫_0^1 (1 − t)|yj − zj| dt = (1/2)|yj − zj| p^i_{jj}.

Finally, for k > j, using (2.3.10) and (2.3.14) again, we have

[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij
= (1/(xj − yj)) {fi(x1, ..., xj, yj+1, ..., yk, zk+1, ..., zq) − fi(x1, ..., xj−1, yj, ..., yk, zk+1, ..., zq)
− fi(x1, ..., xj, yj+1, ..., yk−1, zk, ..., zq) + fi(x1, ..., xj−1, yj, ..., yk−1, zk, ..., zq)}
= ∫_0^1 {Dj fi(x1, ..., xj−1, yj + t(xj − yj), yj+1, ..., yk, zk+1, ..., zq)
− Dj fi(x1, ..., xj−1, yj + t(xj − yj), yj+1, ..., yk−1, zk, ..., zq)} dt,

whence by (2.3.11)

|[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij| ≤ |yk − zk| p^i_{jk}.

By adding all the above we get

|[x, y]ij − [x, z]ij| ≤ (1/2)|yj − zj| p^i_{jj} + Σ_{k=j+1}^q |yk − zk| p^i_{jk} ≤ ‖y − z‖ ( (1/2) p^i_{jj} + Σ_{k=j+1}^q p^i_{jk} ).
Consequently, condition (2.3.1) is satisfied with c0 given by (2.3.12). If each fi has continuous second order partial derivatives which are bounded on U, we can take

p^i_{jk} = sup {|Djk fi(x)| : x ∈ U}.

In this case p^i_{jk} = p^i_{kj}, so that c0 = c1.

Moreover, consider again three points x, y, z of U. Similarly to (2.3.14), the second divided difference of F at x, y, z is the bilinear operator defined by

[x, y, z]ijk = (1/(yk − zk)) { [x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij }.  (2.3.15)

It is easy to see, as before, that [x, y, z]ijk = 0 for k < j. For k = j we have

[x, y, z]ijj = [xj, yj, zj]t fi(x1, ..., xj−1, t, zj+1, ..., zq),  (2.3.16)

where the right-hand side of (2.3.16) represents the divided difference of fi(x1, ..., xj−1, t, zj+1, ..., zq), as a function of t, at the points xj, yj, zj. Using Genocchi's
integral representation of divided differences of scalar functions, we get

[x, y, z]ijj = ∫_0^1 ∫_0^1 t Djj fi(x1, ..., xj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq) ds dt.  (2.3.17)

Hence, for k > j we obtain in turn

[x, y, z]ijk = (1/((yk − zk)(xj − yj))) {fi(x1, ..., xj, yj+1, ..., yk, zk+1, ..., zq)
− fi(x1, ..., xj, yj+1, ..., yk−1, zk, ..., zq)
− fi(x1, ..., xj−1, yj, ..., yk, zk+1, ..., zq)
+ fi(x1, ..., xj−1, yj, ..., yk−1, zk, ..., zq)}
= (1/(xj − yj)) ∫_0^1 {Dk fi(x1, ..., xj, yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dk fi(x1, ..., xj−1, yj, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} dt
= ∫_0^1 ∫_0^1 Dkj fi(x1, ..., xj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq) ds dt.  (2.3.18)

We now want to show that if

|Dkj fi(v1, ..., vm + t, ..., vq) − Dkj fi(v1, ..., vm, ..., vq)| ≤ q^{ij}_{km} |t|
for all v = (v1, ..., vq) ∈ U, 1 ≤ i, j, k, m ≤ q,  (2.3.19)

then the divided difference of F of the second order defined by (2.3.15) satisfies condition (2.3.3) with the constant

c2 = max_{1≤i≤q} Σ_{j=1}^q ( (1/6) q^{ij}_{jj} + (1/2) Σ_{m=1}^{j−1} q^{ij}_{jm} + (1/2) Σ_{k=j+1}^q q^{ij}_{kj} + Σ_{k=j+1}^q Σ_{m=1}^{j−1} q^{ij}_{km} ).  (2.3.20)
Let u, x, y, z be four points of U. Then using (2.3.15) we easily have

[x, y, z]ijk − [u, y, z]ijk = Σ_{m=1}^q { [(x1, ..., xm, um+1, ..., uq), y, z]ijk − [(x1, ..., xm−1, um, ..., uq), y, z]ijk }.  (2.3.21)
If m > j the terms in (2.3.21) vanish, so that using (2.3.18) and (2.3.19) we deduce that for k > j

[x, y, z]ijk − [u, y, z]ijk
= Σ_{m=1}^{j−1} ∫_0^1 ∫_0^1 {Dkj fi(x1, ..., xm, um+1, ..., uj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dkj fi(x1, ..., xm−1, um, ..., uj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} ds dt
+ ∫_0^1 ∫_0^1 {Dkj fi(x1, ..., xj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dkj fi(x1, ..., xj−1, yj + s(uj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} ds dt,

whence

|[x, y, z]ijk − [u, y, z]ijk| ≤ (1/2)|xj − uj| q^{ij}_{kj} + Σ_{m=1}^{j−1} |xm − um| q^{ij}_{km}.

Similarly, for k = j we obtain in turn

[x, y, z]ijj − [u, y, z]ijj
= ∫_0^1 ∫_0^1 t {Djj fi(x1, ..., xj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)
− Djj fi(x1, ..., xj−1, uj + t(yj − uj) + ts(zj − yj), zj+1, ..., zq)} ds dt
+ Σ_{m=1}^{j−1} ∫_0^1 ∫_0^1 t {Djj fi(x1, ..., xm, um+1, ..., uj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)
− Djj fi(x1, ..., xm−1, um, ..., uj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)} ds dt,

whence

|[x, y, z]ijj − [u, y, z]ijj| ≤ (1/6)|xj − uj| q^{ij}_{jj} + (1/2) Σ_{m=1}^{j−1} |xm − um| q^{ij}_{jm}.
Finally, using the estimate (2.3.7) for the norm of a bilinear operator, we deduce that condition (2.3.3) holds with c2 given by (2.3.20).
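The first-order divided difference (2.3.10) is straightforward to implement; the following sketch (with a hypothetical map F on R²) also checks the defining identity (2.2.1), [x, y](x − y) = F(x) − F(y):

```python
# Divided difference matrix (2.3.10) for F = (f1, f2): R^2 -> R^2.
def F(v):
    # Hypothetical example map (not from the text).
    return [v[0] ** 2 + v[1], v[0] * v[1]]

def divdiff(x, y):
    q = len(x)
    A = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            hi = x[:j + 1] + y[j + 1:]   # (x1,...,xj, y_{j+1},...,yq)
            lo = x[:j] + y[j:]           # (x1,...,x_{j-1}, yj,...,yq)
            A[i][j] = (F(hi)[i] - F(lo)[i]) / (x[j] - y[j])
    return A

x, y = [2.0, 1.0], [0.5, -1.0]
A = divdiff(x, y)
lhs = [sum(A[i][j] * (x[j] - y[j]) for j in range(2)) for i in range(2)]
rhs = [F(x)[i] - F(y)[i] for i in range(2)]
print(lhs, rhs)  # the two sides coincide, as required by (2.2.1)
```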
2.4 Divided Differences and Monotone Convergence
In this section we introduce the problem of approximating a locally unique solution x∗ of the nonlinear operator equation F(x) = 0 in a POTL-space. In particular, consider an operator F : D ⊆ X → Y, where X is a POTL-space
with values in a POTL-space Y. Let x0, y0, y−1 be three points of D such that

x0 ≤ y0 ≤ y−1, [x0, y−1] ⊆ D,

and denote

D1 = {(x, y) ∈ X² | x0 ≤ x ≤ y ≤ y0}, D2 = {(y, y−1) ∈ X² | x0 ≤ y ≤ y0}, D3 = D1 ∪ D2.  (2.4.1)

Assume there exist operators A0 : D3 → LB(X, Y), A : D1 → L(X, Y) such that:

(a) F(y) − F(x) ≤ A0(w, z)(y − x) for all (x, y), (y, w) ∈ D1, (w, z) ∈ D3;  (2.4.2)

(b) the linear operator A0(u, v) has a continuous nonsingular nonnegative left subinverse;

(c) F(y) − F(x) ≥ A(x, y)(y − x) for all (x, y) ∈ D1;  (2.4.3)

(d) the linear operator A(x, y) has a nonnegative left superinverse for each (x, y) ∈ D1, and

F(y) − F(x) ≤ A0(y, z)(y − x) for all (x, y) ∈ D1, (y, z) ∈ D3.  (2.4.4)

Moreover, let us define the approximations

F(yn) + A0(yn, yn−1)(yn+1 − yn) = 0,  (2.4.5)
F(xn) + A0(yn, yn−1)(xn+1 − xn) = 0,  (2.4.6)
yn+1 = yn − Bn F(yn), n ≥ 0,  (2.4.7)
xn+1 = xn − Bn^1 F(xn), n ≥ 0,  (2.4.8)

where Bn and Bn^1 are nonnegative subinverses of A0(yn, yn−1), n ≥ 0.

Under very natural conditions, hypotheses of the form (2.4.2), (2.4.3) or (2.4.4) have been used extensively to show that the approximations (2.4.5) and (2.4.6), or (2.4.7) and (2.4.8), generate two sequences {xn}, {yn} such that

x0 ≤ x1 ≤ · · · ≤ xn ≤ xn+1 ≤ yn+1 ≤ yn ≤ · · · ≤ y1 ≤ y0,  (2.4.9)

lim_{n→∞} xn = x∗ = y∗ = lim_{n→∞} yn, and F(x∗) = 0.  (2.4.10)
For a complete survey of these results we refer to the works of [279] and Argyros and Szidarovszky [97]–[99]. Here we will use similar conditions (i.e., like (2.4.2), (2.4.3), (2.4.4)) for two-point approximations of the form (2.4.5) and (2.4.6) or (2.4.7) and (2.4.8). Consequently, a discussion must follow on the possible choices of the linear operators A0 and A.

Remark 2.4.1 Let us now consider an operator F : D ⊆ X → Y, where X, Y are both POTL-spaces. The operator F is called order-convex on an interval [x0, y0] ⊆ D if

F(λx + (1 − λ)y) ≤ λF(x) + (1 − λ)F(y)  (2.4.11)

for all comparable x, y ∈ [x0, y0] and λ ∈ [0, 1]. If F has a linear G-derivative F′(x) at each point of [x0, y0], then (2.4.11) holds if and only if

F′(x)(y − x) ≤ F(y) − F(x) ≤ F′(y)(y − x) for x0 ≤ x ≤ y ≤ y0.  (2.4.12)

(See, e.g., Ortega and Rheinboldt [263] or our Exercise 2.25 for the properties of the Gateaux-derivative.) Hence, for order-convex G-differentiable operators, conditions (2.4.3) and (2.4.4) are satisfied with A0(u, v) = A(u, v) = F′(u). In the unidimensional case (2.4.12) is equivalent to the isotony of the operator x → F′(x), but in general the latter property is stronger. Assuming the isotony of the operator x → F′(x), it follows that

F(y) − F(x) ≤ F′(w)(y − x) for x0 ≤ x ≤ y ≤ w ≤ y0.

Hence, in this case condition (2.4.2) is satisfied with A0(w, z) = F′(w). The above observations show how to choose A and A0 for single- or two-step Newton methods. We note that the iterative algorithm (2.4.5)–(2.4.8) with A0(u, v) = F′(u) is the algorithm proposed by Fourier in 1818 in the unidimensional case and extended by Baluev in 1952 to the general case. The idea of using an algorithm of the form (2.4.7)–(2.4.8) goes back to Slugin [311]. In Ortega and Rheinboldt [263] it is shown that with Bn properly chosen, (2.4.7) reduces to a general Newton-SOR algorithm. In particular, suppose (in the finite dimensional case) that F′(yn) is an M-matrix and let

F′(yn) = Dn − Ln − Un

be the partition of F′(yn) into its diagonal, strictly lower- and strictly upper-triangular parts, respectively, for all n ≥ 0. Consider an integer mn ≥ 1, a real parameter wn ∈ (0, 1], and denote

Pn = wn⁻¹(Dn − wn Ln), Qn = wn⁻¹[(1 − wn)Dn + wn Un], Hn = Pn⁻¹Qn,  (2.4.13)

Bn = (I + Hn + · · · + Hn^{mn−1}) Pn⁻¹.  (2.4.14)
It can easily be seen that Bn, n ≥ 0, is a nonnegative subinverse of F′(yn) (see also [99]).

If f : [a, b] → R is a real function of a real variable, then f is (order) convex if and only if

(f(x) − f(y))/(x − y) ≤ (f(u) − f(v))/(u − v)

for all x, y, u, v ∈ [a, b] such that x ≤ u, y ≤ v, x ≠ y and u ≠ v. This fact motivates the notion of convexity with respect to a divided difference, considered by J.W. Schmidt and H. Leonhardt [307]. Let F : D ⊆ X → Y be a nonlinear operator between the POTL-spaces X and Y, and assume that F has a divided difference [·, ·] on D. F is called convex with respect to the divided difference [·, ·] on D if

[x, y] ≤ [u, v] for all x, y, u, v ∈ D with x ≤ u and y ≤ v.  (2.4.15)

In the above quoted study, Schmidt and Leonhardt studied (2.4.5)–(2.4.6) with A0(u, v) = [u, v] in case the nonlinear operator F is convex with respect to [·, ·]. Their result was extended by N. Schneider [308], who assumed instead of (2.2.1) the milder condition

[u, v](u − v) ≥ F(u) − F(v) for all comparable u, v ∈ D.  (2.4.16)

An operator [·, ·] : D × D → L(X, Y) satisfying (2.4.16) is called a generalized divided difference of F on D. If both (2.4.15) and (2.4.16) are satisfied, then we say that F is convex with respect to the generalized divided difference [·, ·]. It is easily seen that if (2.4.15) and (2.4.16) are satisfied on D = [x0, y−1], then conditions (2.4.2) and (2.4.3) are satisfied with A = A0 = [·, ·]. Indeed, for x0 ≤ x ≤ y ≤ w ≤ z ≤ y−1 we have

[x, y](y − x) ≤ F(y) − F(x) ≤ [y, x](y − x) ≤ [w, z](y − x).

Moreover, general secant-SOR methods are obtained in case the generalized divided difference [yn, yn−1] is an M-matrix and Bn, n ≥ 0, is computed according to (2.4.13) and (2.4.14), where [yn, yn−1] = Dn − Ln − Un, n ≥ 0, is the partition of [yn, yn−1] into its diagonal, strictly lower- and strictly upper-triangular parts.

We remark that an operator which is convex with respect to a generalized divided difference is also order-convex. To see this, consider x, y ∈ D, x ≤ y, λ ∈ [0, 1], and set z = λx + (1 − λ)y. Observing that y − x = (1 − λ)⁻¹(z − x) = λ⁻¹(y − z) and applying (2.4.16), we have in turn:

(1 − λ)⁻¹(F(z) − F(x)) ≤ (1 − λ)⁻¹[z, x](z − x)
= [z, x](y − x) ≤ [z, y](y − x)
= λ⁻¹[z, y](y − z) ≤ λ⁻¹(F(y) − F(z)).

By the first and last terms we deduce that F(z) ≤ λF(x) + (1 − λ)F(y). Thus, Schneider's result can be applied only to order-convex operators, and its importance
resides in the fact that the use of a generalized divided difference instead of the G-derivative may be more advantageous from a numerical point of view. We note, however, that conditions (2.4.2) and (2.4.3) do not necessarily imply convexity. For example, f may be a real function of a real variable such that

inf_{x,y ∈ [x0,y0]} (f(x) − f(y))/(x − y) = m > 0,   sup_{x,y ∈ [x0,y0]} (f(x) − f(y))/(x − y) = M < ∞.

Assume there exists a constant c > 0 such that

‖[x, y] − [x1, y1]‖ ≤ c (‖x − x1‖ + ‖y − y1‖)   (2.5.4)
for all x, y, x1, y1 ∈ D with x ≠ y and x1 ≠ y1. We say in this case that F has a Lipschitz continuous divided difference on D. This condition allows us to extend by continuity the operator (x, y) → [x, y] to the whole Cartesian product D × D. From (1.2.1) and (2.5.4) it follows that F is Fréchet-differentiable on D and that [x, x] = F′(x). It also follows that

‖F′(x) − F′(y)‖ ≤ c1 ‖x − y‖ with c1 = 2c   (2.5.5)

and

‖[x, y] − F′(z)‖ ≤ c (‖x − z‖ + ‖y − z‖)   (2.5.6)

for all x, y, z ∈ D. Conversely, if we assume that F is Fréchet-differentiable on D and that its Fréchet derivative satisfies (2.5.5), then it follows that F has a Lipschitz continuous divided difference on D. We can certainly take

[x, y] = ∫_0^1 F′(x + t (y − x)) dt.   (2.5.7)
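In one dimension, (2.5.7) can be checked directly: the divided difference (F(y) − F(x))/(y − x) is the average of F′ along the segment. The choice F(x) = eˣ and the midpoint quadrature rule below are assumptions for illustration.

```python
import math

# [x, y] = (F(y) - F(x))/(y - x) should equal integral_0^1 F'(x + t(y - x)) dt.
# Here F = F' = exp; the integral is approximated by the composite midpoint rule.
def mean_derivative(fprime, x, y, n=20000):
    h = 1.0 / n
    return sum(fprime(x + (i + 0.5) * h * (y - x)) for i in range(n)) * h

x, y = 0.3, 1.1
dd = (math.exp(y) - math.exp(x)) / (y - x)
approx = mean_derivative(math.exp, x, y)
assert abs(dd - approx) < 1e-7
print(dd, approx)
```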
We now want to give the definition of the second Fréchet derivative of F. We must first introduce the definition of bounded multilinear operators (which will also be used later).

Definition 2.5.3 Let X and Y be two Banach spaces. An operator A : X^n → Y (n ∈ N) will be called an n-linear operator from X to Y if the following conditions are satisfied:
(a) The operator (x1, ..., xn) → A (x1, ..., xn) is linear in each variable xk, k = 1, 2, ..., n.
(b) There exists a constant c such that

‖A (x1, x2, ..., xn)‖ ≤ c ‖x1‖ · · · ‖xn‖.   (2.5.8)

The norm of a bounded n-linear operator can be defined by the formula

‖A‖ = sup {‖A (x1, ..., xn)‖ : ‖x1‖ = · · · = ‖xn‖ = 1}.   (2.5.9)

Set LB^(1) (X, Y) = LB (X, Y) and define recursively

LB^(k+1) (X, Y) = LB (X, LB^(k) (X, Y)), k ≥ 1.   (2.5.10)
In this way we obtain a sequence of Banach spaces LB^(n) (X, Y) (n ≥ 1). Every A ∈ LB^(n) (X, Y) can be viewed as a bounded n-linear operator if one takes

A (x1, ..., xn) = (... ((A (x1)) (x2)) ...) (xn).   (2.5.11)

In the right-hand side of (2.5.11) we have A (x1) ∈ LB^(n−1) (X, Y), (A (x1)) (x2) ∈ LB^(n−2) (X, Y), etc. Conversely, any bounded n-linear operator A from X to Y can be interpreted as an element of LB^(n) (X, Y). Moreover, the norm of A as a bounded n-linear operator coincides with its norm as an element of the space LB^(n) (X, Y). Thus we may identify this space with the space of all bounded n-linear operators from X to Y. In the sequel we will identify A (x, x, ..., x) = Ax^n, and A (x1) (x2) ... (xn) = A (x1, x2, ..., xn) = Ax1 x2 ... xn.

Let us now consider a nonlinear operator F : D ⊆ X → Y, where D is open. Suppose that F is Fréchet differentiable on D. Then we may consider the operator F′ : D → LB (X, Y) which associates to each point x the Fréchet derivative of F at x. If the operator F′ is Fréchet differentiable at a point x0 ∈ D, then we say that F is twice Fréchet differentiable at x0. The Fréchet derivative of F′ at x0 will be denoted by F″(x0) and will be called the second Fréchet derivative of F at x0. Note that F″(x0) ∈ LB^(2) (X, Y). Similarly we can define Fréchet derivatives of higher order. Finally, by analogy with (2.5.7),

[x0, ..., xk] = ∫_0^1 · · · ∫_0^1 t1^{k−1} t2^{k−2} · · · t_{k−1} F^{(k)} (x0 + t1 (x1 − x0) + t1 t2 (x2 − x1) + · · · + t1 t2 · · · tk (xk − xk−1)) dt1 dt2 ... dtk.   (2.5.12)
It is easy to see that the multilinear operators defined above verify

[x0, ..., xk−1, xk, xk+1] (xk − xk+1) = [x0, ..., xk−1, xk] − [x0, ..., xk−1, xk+1].   (2.5.13)
We note that throughout this sequel a 2-linear operator will also be called bilinear. Finally, we will also need the definition of an n-linear symmetric operator. Given an n-linear operator A : X^n → Y and a permutation i = (i1, i2, ..., in) of the integers 1, 2, ..., n, the notation A (i) (or An (i) if we want to emphasize the n-linearity of A) can be used for the n-linear operator A (i) = An (i) such that

A (i) (x1, x2, ..., xn) = An (i) (x1, x2, ..., xn) = An (xi1, xi2, ..., xin) = An xi1 xi2 ... xin   (2.5.14)

for all x1, x2, ..., xn ∈ X. Thus, there are n! n-linear operators A (i) = An (i) associated with a given n-linear operator A = An.
Definition 2.5.4 An n-linear operator A = An : X^n → Y is said to be symmetric if

A = An = An (i)   (2.5.15)

for all i belonging to Rn, which denotes the set of all permutations of the integers 1, 2, ..., n. The symmetric n-linear operator

Ā = Ān = (1/n!) Σ_{i∈Rn} An (i)   (2.5.16)

is called the mean of A = An.

Notation 2.5.6 The notation

An x^p = An x x ... x (p times),   (2.5.18)

p ≤ n, A = An : X^n → Y, will be used for the result of applying An to x ∈ X p times. If p < n, then (2.5.18) represents an (n − p)-linear operator. For p = n, note that

Ak x^k = Āk x^k = Ak (i) x^k   (2.5.19)

for all i ∈ Rk, x ∈ X. It follows from (2.5.19) that whenever we are dealing with an expression involving n-linear operators An, we may assume that they are symmetric without loss of generality, since each An may be replaced by Ān without changing the value of the expression at hand.
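For a concrete (assumed) illustration in R², a bilinear operator can be represented by a coefficient matrix; its mean over the 2! argument permutations is the symmetric part (A + Aᵀ)/2, and on the diagonal A x² = Ā x², which is the substance of the remark above.

```python
# A bilinear operator on R^d via coefficients: A(x, y) = sum_ij A[i][j] x_i y_j.
# Its mean over the two argument permutations is (A + A^T)/2, and replacing
# A by the mean does not change the value of A x^2 = A(x, x).
def apply_bilinear(A, x, y):
    d = len(A)
    return sum(A[i][j] * x[i] * y[j] for i in range(d) for j in range(d))

def mean_bilinear(A):
    d = len(A)
    return [[(A[i][j] + A[j][i]) / 2.0 for j in range(d)] for i in range(d)]

A = [[1.0, 2.0], [0.0, 3.0]]   # not symmetric: A(x, y) != A(y, x) in general
Abar = mean_bilinear(A)
x = [0.7, -1.3]
assert abs(apply_bilinear(A, x, x) - apply_bilinear(Abar, x, x)) < 1e-12
```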
2.6
Enclosing the Root of an Equation
We consider the equation

L (x) = T (x)   (2.6.1)
where L is a linear operator and T is a nonlinear operator defined on some convex subset D of a linear space E with values in a linear space Y. We study the convergence of the iterations

L (yn+1) = T (yn) + An (y, x) (yn+1 − yn)   (2.6.2)

and

L (xn+1) = T (xn) + An (y, x) (xn+1 − xn)   (2.6.3)
to a solution x∗ of equation (2.6.1), where An (y, x), n ≥ 0, is a linear operator. If P is a linear projection operator (P² = P) that projects the space Y into YP ⊆ Y, then the operator P T will be assumed to be Fréchet differentiable on D, and its derivative P T′ (x) corresponds to an operator P B (y, x), (y, x) ∈ D × D, with P T′ (x) = P B (x, x) for all x ∈ D. We will assume that

An (y, x) = A_{PB} (yn, xn), n ≥ 0.
Iterations (2.6.2) and (2.6.3) have been studied extensively under several assumptions [68], [99], [100], [263], [277], [279], when P = L = I, the identity operator on D. However, the iterates {xn} and {yn} can rarely be computed in infinite dimensional spaces in this case. But if the space YP is finite dimensional with dim (YP) = N, then iterations (2.6.2) and (2.6.3) reduce to systems of linear algebraic equations of order at most N. This case has been studied in [100], [205] and in particular in [225]. The assumptions in [230] involve the positivity of the operators P B (y, x) − A_{PB} (y, x), Q T′ (x) with Q = I − P, and L (y) − An (y, x) y on some interval [x0, y0], which is difficult to verify. In this section we simplify the above assumptions and provide some further conditions for the convergence of iterations (2.6.2) and (2.6.3) to a solution x∗ of equation (2.6.1). We finally illustrate our results with an example. We can now formulate our main result.

Theorem 2.6.1 Let L, T : D ⊂ X → Y, where X is a regular POTL-space and Y is a POTL-space. Assume:
(a) there exist points x0, y0, y−1 ∈ D with

x0 ≤ y0 ≤ y−1,   [x0, y−1] ⊂ D,   L (x0) − T (x0) ≤ 0 ≤ L (y0) − T (y0).

Set

S1 = {(x, y) ∈ X² : x0 ≤ x ≤ y ≤ y0},
S2 = {(u, y−1) ∈ X² : x0 ≤ u ≤ y0},
S3 = S1 ∪ S2.
(b) Assume that there exists an operator A : S3 → B (X, Y) such that

(L (y) − T (y)) − (L (x) − T (x)) ≤ A (w, z) (y − x)   (2.6.4)

for all (x, y), (y, w) ∈ S2, (w, z) ∈ S3.
(c) Suppose that for any (u, v) ∈ S3 the linear operator A (u, v) has a continuous nonsingular nonnegative left subinverse.
Then there exist two sequences {xn}, {yn}, n ≥ 1, satisfying (2.6.2), (2.6.3),

L (xn) − T (xn) ≤ 0 ≤ L (yn) − T (yn),   (2.6.5)

x0 ≤ x1 ≤ · · · ≤ xn ≤ xn+1 ≤ yn+1 ≤ yn ≤ · · · ≤ y1 ≤ y0,   (2.6.6)

and

lim_{n→∞} xn = x∗,   lim_{n→∞} yn = y∗.   (2.6.7)
Moreover, if the operators An = A (yn, yn−1) are inverse nonnegative, then any solution of equation (2.6.1) from the interval [x0, y0] belongs to the interval [x∗, y∗].

Proof. Let us define the operator M : [0, y0 − x0] → X by

M (x) = x − L0 (L (x0) − T (x0) + A0 (x)),

where L0 is a continuous nonsingular nonnegative left subinverse of A0. It can easily be seen that M is isotone and continuous, with

M (0) = −L0 (L (x0) − T (x0)) ≥ 0

and

M (y0 − x0) = y0 − x0 − L0 (L (y0) − T (y0))
 + L0 [(L (y0) − T (y0)) − (L (x0) − T (x0)) − A0 (y0 − x0)]
 ≤ y0 − x0 − L0 (L (y0) − T (y0)) ≤ y0 − x0.

It now follows from the fixed point theorem 2.2.2 of L.V. Kantorovich that the operator M has a fixed point w ∈ [0, y0 − x0]. Set x1 = x0 + w to get

L (x0) − T (x0) + A0 (x1 − x0) = 0,  x0 ≤ x1 ≤ y0.

By (2.6.4), we get

L (x1) − T (x1) = (L (x1) − T (x1)) − (L (x0) − T (x0)) + A0 (x0 − x1) ≤ 0.
Let us now define the operator M1 : [0, y0 − x1] → X by

M1 (x) = x + L0 (L (y0) − T (y0) − A0 (x)).

It can easily be seen that M1 is continuous and isotone, with

M1 (0) = L0 (L (y0) − T (y0)) ≥ 0

and

M1 (y0 − x1) = y0 − x1 + L0 (L (x1) − T (x1))
 + L0 [(L (y0) − T (y0)) − (L (x1) − T (x1)) − A0 (y0 − x1)]
 ≤ y0 − x1 + L0 (L (x1) − T (x1)) ≤ y0 − x1.

As before, there exists z ∈ [0, y0 − x1] such that M1 (z) = z. Set y1 = y0 − z to get

L (y0) − T (y0) + A0 (y1 − y0) = 0,  x1 ≤ y1 ≤ y0.

But from (2.6.4) and the above,

L (y1) − T (y1) = (L (y1) − T (y1)) − (L (y0) − T (y0)) − A0 (y1 − y0) ≥ 0.

Using induction on n we can now show, following the above technique, that there exist sequences {xn}, {yn}, n ≥ 1, satisfying (2.6.2), (2.6.3), (2.6.5) and (2.6.6). Since the space X is regular, by (2.6.6) there exist x∗, y∗ ∈ X satisfying (2.6.7), with x∗ ≤ y∗. Let x0 ≤ z ≤ y0 and L (z) − T (z) = 0; then we get

A0 (y1 − z) = A0 (y0) − (L (y0) − T (y0)) − A0 (z)
 = A0 (y0 − z) − [(L (y0) − T (y0)) − (L (z) − T (z))] ≥ 0

and

A0 (x1 − z) = A0 (x0) − (L (x0) − T (x0)) − A0 (z)
 = A0 (x0 − z) − [(L (x0) − T (x0)) − (L (z) − T (z))] ≤ 0.

If A0 is inverse isotone, then x1 ≤ z ≤ y1, and by induction xn ≤ z ≤ yn. Hence x∗ ≤ z ≤ y∗. That completes the proof of the theorem.

Using (2.6.1), (2.6.2), (2.6.5), (2.6.6) and (2.6.7) we can easily prove the following theorem, which gives us natural conditions under which the points x∗ and y∗ are solutions of the equation (2.6.1).

Theorem 2.6.2 Let L − T be continuous at x∗ and y∗, and let the hypotheses of Theorem 2.6.1 be true. Assume that one of the following conditions is satisfied:
(a) x∗ = y∗;
(b) X is normal and there exists an operator H : X → Y (H (0) = 0) which has an isotone inverse continuous at the origin, and An ≤ H for sufficiently large n;
(c) Y is normal and there exists an operator G : X → Y (G (0) = 0) continuous at the origin and such that An ≤ G for sufficiently large n;
(d) the operators Ln, n ≥ 0, are equicontinuous.
Then L (x∗) − T (x∗) = L (y∗) − T (y∗) = 0. Moreover, assume that there exists an operator G1 : S1 → L (X, Y) such that G1 (x, y) has a nonnegative left superinverse for each (x, y) ∈ S1 and

L (y) − T (y) − (L (x) − T (x)) ≥ G1 (x, y) (y − x) for all (x, y) ∈ S1.

Then if (x∗, y∗) ∈ S1 and L (x∗) − T (x∗) = L (y∗) − T (y∗) = 0, then x∗ = y∗. We now complete this section with an application.
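To make the two-sided scheme concrete, here is a one-dimensional sketch. All concrete choices — L(x) = x, T(x) = x − g(x) with g(x) = x² − 2, and A_n = 1 − g′(y_n) — are assumptions for illustration, not the text's setting. Both updates then collapse to a Newton-like step whose slope is frozen at the current upper point, and the iterates enclose the root √2 monotonically, mirroring (2.6.5)-(2.6.7).

```python
# Scalar model of (2.6.2)-(2.6.3): with L(x) = x, T(x) = x - g(x) and
# A_n = 1 - g'(y_n), both iterations reduce to w <- w - g(w)/g'(y_n).
# Starting from g(x0) <= 0 <= g(y0), the lower iterates increase and the
# upper iterates decrease toward the root of g (here sqrt(2)).
def g(x):
    return x * x - 2.0

def gprime(x):
    return 2.0 * x

x, y = 1.0, 2.0                      # x0 <= x* <= y0
for _ in range(4):
    slope = gprime(y)                # slope taken at the upper point y_n
    x, y = x - g(x) / slope, y - g(y) / slope
    assert x <= y + 1e-9             # the enclosure is preserved
    assert g(x) <= 1e-9 and g(y) >= -1e-9
print(x, y)                          # both ends approach sqrt(2)
```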
2.7
Applications
Let X = Y = R^k with k = 2N. We define a projection operator PN by

(PN (v))_i = v_i for i = 1, 2, ..., N,  (PN (v))_i = 0 for i = N + 1, ..., k,  v = (v1, v2, ..., vk) ∈ X.

We consider the system of equations

v_i = f_i (v1, ..., vk),  i = 1, 2, ..., k.   (2.7.1)

Set T (v) = {f_i (v1, ..., vk)}, i = 1, 2, ..., k; then

(PN T (v))_i = f_i (v1, ..., vk) for i = 1, ..., N, and = 0 for i = N + 1, ..., k,

(PN T′ (v) u)_i = Σ_{j=1}^k f′_{ij} (v1, ..., vk) u_j for i = 1, ..., N, and = 0 for i = N + 1, ..., k,  f′_{ij} = ∂f_i/∂v_j,

(PN B (w, z) u)_i = Σ_{j=1}^k F_{ij} (w1, ..., wk, z1, ..., zk) u_j for i = 1, ..., N, and = 0 for i = N + 1, ..., k;

we write PN B (w, z) u = C_N^i (w, z) u, where F_{ij} (v1, ..., vk, v1, ..., vk) = ∂f_i (v1, ..., vk)/∂v_j. Choose

An (y, x) = C_N^i (y, x);
then iterations (2.6.2) and (2.6.3) become

y_{i,n+1} = f_i (y_{1,n}, ..., y_{k,n}) + C_N^i (y_n, x_n) (y_{i,n+1} − y_{i,n})   (2.7.2)

and

x_{i,n+1} = f_i (x_{1,n}, ..., x_{k,n}) + C_N^i (y_n, x_n) (x_{i,n+1} − x_{i,n}).   (2.7.3)
Let us assume that the determinant Dn of the above N-th order linear systems is nonzero; then (2.7.2) and (2.7.3) become

y_{i,n+1} = (1/Dn) Σ_{m=1}^N D_{im} F¹_m (y_n, x_n),  i = 1, ..., N,   (2.7.4)

y_{i,n+1} = f_i (y_{1,n}, ..., y_{k,n}),  i = N + 1, ..., k,   (2.7.5)

and

x_{i,n+1} = (1/Dn) Σ_{m=1}^N D_{im} F²_m (y_n, x_n),  i = 1, ..., N,   (2.7.6)

x_{i,n+1} = f_i (x_{1,n}, ..., x_{k,n}),  i = N + 1, ..., k,   (2.7.7)

respectively. Here D_{im} is the cofactor of the element in the i-th column and m-th row of Dn, and F^i_m (y_n, x_n), i = 1, 2, are given by

F¹_m (y_n, x_n) = f_m (y_{1,n}, ..., y_{k,n}) + Σ_{j=N+1}^k α^n_{mj} f_j (y_{1,n}, ..., y_{k,n}) − Σ_{j=1}^k α^n_{mj} y_{j,n}

and

F²_m (y_n, x_n) = f_m (x_{1,n}, ..., x_{k,n}) + Σ_{j=N+1}^k α^n_{mj} f_j (x_{1,n}, ..., x_{k,n}) − Σ_{j=1}^k α^n_{mj} x_{j,n},
where α^n_{mj} = F_{mj} (y_n, x_n). If the hypotheses of Theorems 2.6.1 and 2.6.2 are now satisfied for equation (2.7.1), then the results apply to obtain a solution x∗ of equation (2.7.1) in [x0, y0]. In particular, consider the system of differential equations

q″_i = f_i (t, q1, q2),  i = 1, 2,  0 ≤ t ≤ 1   (2.7.8)

subject to the boundary conditions

q_i (0) = d_i,  q_i (1) = e_i,  i = 1, 2.   (2.7.9)
The functions f1 and f2 are assumed to be sufficiently smooth. For the discretization, a uniform mesh

t_j = jh,  j = 0, 1, ..., N + 1,  h = 1/(N + 1),

and the corresponding central-difference approximation of the second derivatives are used. Then the discretized problem is given by

x = T (x)   (2.7.10)

with

T (x) = (B + I) (x) + h² φ (x) − b,  x ∈ R^{2N},

where

B = [ A + I    0
        0    A + I ],   φ (x) = [ φ1 (x)
                                  φ2 (x) ],

A + I = [  2  −1          0
          −1   2  −1
               ·.   ·.  −1
           0       −1    2 ]  (an N × N tridiagonal matrix),

φ_i (x) = (f_i (t_j, x_j, x_{N+j})),  j = 1, 2, ..., N,  i = 1, 2,

x ∈ R^{2N}, and b ∈ R^{2N} is the vector of boundary values, which has zero components except for b_1 = d_1, b_N = e_1, b_{N+1} = d_2, b_{2N} = e_2. That is, (2.7.10) plays the role of (2.7.1) (in vector form). As a numerical example, consider the problem (2.7.8)-(2.7.9) with

f1 (t, q1, q2) = q1² + q1 + .1 q2² − 1.2,
f2 (t, q1, q2) = .2 q1² + q2² + 2 q2 − .6,
d1 = d2 = e1 = e2 = 0.
Choose N = 49 and starting points

x_{i,0} = 0,  y_{i,0} = t_j (1 − t_j),  j = 1, ..., N.

It is trivial to check that the hypotheses of Theorem 2.6.1 are satisfied with the above values. The components of the first two iterates corresponding to t = .1 (.1) .5, computed by the procedure described above and (2.7.4)-(2.7.5), (2.7.6)-(2.7.7), are listed in the table below. The computations were performed in double precision on a PRIME-850.
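The discretized problem can also be reproduced in a few lines. The sketch below is an assumption-laden illustration (a small N, a zero starting vector, and plain successive substitution instead of the enclosure scheme (2.7.4)-(2.7.7)) of the same boundary value problem.

```python
# Solve the central-difference discretization of q1'' = f1, q2'' = f2 with
# zero boundary data: tridiag(-1, 2, -1) q_i = -h^2 phi_i(q), iterated by
# simple successive substitution (not the two-sided scheme of the text).
N = 9
h = 1.0 / (N + 1)

def f1(q1, q2): return q1 * q1 + q1 + 0.1 * q2 * q2 - 1.2
def f2(q1, q2): return 0.2 * q1 * q1 + q2 * q2 + 2.0 * q2 - 0.6

def solve_tridiag(d):
    # Thomas algorithm for tridiag(-1, 2, -1) u = d
    n = len(d)
    c, r = [0.0] * n, [0.0] * n
    c[0], r[0] = -0.5, d[0] / 2.0
    for i in range(1, n):
        m = 2.0 + c[i - 1]
        c[i] = -1.0 / m
        r[i] = (d[i] + r[i - 1]) / m
    u = [0.0] * n
    u[-1] = r[-1]
    for i in range(n - 2, -1, -1):
        u[i] = r[i] - c[i] * u[i + 1]
    return u

q1, q2 = [0.0] * N, [0.0] * N
for _ in range(50):  # Picard iteration on the two coupled blocks
    rhs1 = [-h * h * f1(q1[j], q2[j]) for j in range(N)]
    rhs2 = [-h * h * f2(q1[j], q2[j]) for j in range(N)]
    q1, q2 = solve_tridiag(rhs1), solve_tridiag(rhs2)
print(q1[N // 2], q2[N // 2])  # mid-interval values of the two components
```

The mid-interval values land in the same range as the tabulated iterates at t = .5, which is a useful cross-check of the discretization.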
2.8
Exercises
2.1. Show that the spaces defined in Examples 2.1.1 are POTL.
2.2. Show that any regular POB-space is normal but the converse is not necessarily true.
2.3. Prove Theorem 2.2.2.
The components of the first two iterates:

            t     p = 1              p = 2
  x1,p     .1     .0478993317265     .0490944353538
           .2     .0843667040291     .0866974354188
           .3     .1099518629493     .1132355483832
           .4     .1251063839064     .1290273001442
           .5     .1301240123325     .1342691068706
  x2,p     .1     .0219768501208     .0227479238400
           .2     .0384462112803     .0399528292723
           .3     .0498537074028     .0519796383151
           .4     .0565496306877     .0590905187490
           .5     .0587562590344     .0614432572165
  y1,p     .1     .0494803602542     .0490951403091
           .2     .0874507511044     .0866988216544
           .3     .1142981809478     .1132375255317
           .4     .1302974325097     .1290296859551
           .5     .1356123753407     .1342716394060
  y2,p     .1     .0235492475283     .0227486289905
           .2     .0415200498433     .0399542200344
           .3     .0541939935471     .0519816281202
           .4     .0617399319012     .0590929252230
           .5     .0642461600398     .0614458137439
2.4. Show that if (2.3.1) or (2.3.2) are satisfied then F′ (x) = [x, x] for all x ∈ D. Moreover show that if both (2.3.1) and (2.3.2) are satisfied, then F′ is Lipschitz continuous with constant c0 + c1.
2.5. Find sufficient conditions so that estimates (2.4.9) and (2.4.10) are both satisfied.
2.6. Show that Bn (n ≥ 0) in (2.4.14) is a nonnegative subinverse of F′ (yn) (n ≥ 0).
2.7. Let x0, x1, ..., xn be distinct real numbers, and let f be a given real-valued function. Show that:

[x0, x1, ..., xn] = Σ_{j=0}^n f (x_j) / g′_n (x_j)

and

[x0, x1, ..., xn] (xn − x0) = [x1, ..., xn] − [x0, ..., xn−1],

where g_n (x) = (x − x0) · · · (x − xn).
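Both identities of Exercise 2.7 are easy to check numerically; the sample f(x) = x³ and the nodes below are assumptions for illustration.

```python
# Verify the Lagrange form and the recurrence for divided differences
# using f(x) = x**3 and the nodes 0, 1, 3.
def dd(f, nodes):
    """Recursive definition of the divided difference f[x0, ..., xn]."""
    if len(nodes) == 1:
        return f(nodes[0])
    return (dd(f, nodes[1:]) - dd(f, nodes[:-1])) / (nodes[-1] - nodes[0])

def gprime(nodes, j):
    """g_n'(x_j) where g_n(x) = (x - x0)...(x - xn)."""
    p = 1.0
    for i, xi in enumerate(nodes):
        if i != j:
            p *= nodes[j] - xi
    return p

f = lambda x: x ** 3
nodes = [0.0, 1.0, 3.0]

lagrange_form = sum(f(xj) / gprime(nodes, j) for j, xj in enumerate(nodes))
assert abs(dd(f, nodes) - lagrange_form) < 1e-12

lhs = dd(f, nodes) * (nodes[-1] - nodes[0])
rhs = dd(f, nodes[1:]) - dd(f, nodes[:-1])
assert abs(lhs - rhs) < 1e-12
```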
2.8. Let x0, x1, ..., xn be distinct real numbers, and let f be an n times continuously differentiable function on the interval I {x0, x1, ..., xn}. Then show that

[x0, x1, ..., xn] = ∫ · · · ∫_{τn} f^{(n)} (t0 x0 + · · · + tn xn) dt1 · · · dtn,

in which

τn = {(t1, ..., tn) : t1 ≥ 0, ..., tn ≥ 0, Σ_{i=1}^n t_i ≤ 1},  t0 = 1 − Σ_{i=1}^n t_i.
2.9. If f is a real polynomial of degree m, then show:

[x0, x1, ..., xn, x] = a polynomial in x of degree m − n − 1, if n < m − 1;
                     = a_m,                                  if n = m − 1;
                     = 0,                                    if n > m − 1,

where f (x) = a_m x^m + lower-degree terms.
2.10. The tensor product of two matrices M, N ∈ L (Rn) is defined as the n² × n² matrix M × N = (m_{ij} N | i, j = 1, ..., n), where M = (m_{ij}). Consider two F-differentiable operators H, K : L (Rn) → L (Rn) and set F (X) = H (X) K (X) for all X ∈ L (Rn). Show that

F′ (X) = [H (X) × I] K′ (X) + [I × K (X)^T] H′ (X) for all X ∈ L (Rn).
2.11. Let F : R² → R² be defined by f1 (x) = x1³, f2 (x) = x2². Set x = 0 and y = (1, 1)^T. Show that there is no z ∈ [x, y] such that F (y) − F (x) = F′ (z) (y − x).
2.12. Let F : D ⊂ Rn → Rm and assume that F is continuously differentiable on a convex set D0 ⊂ D. For any x, y ∈ D0, show that

‖F (y) − F (x) − F′ (x) (y − x)‖ ≤ ‖y − x‖ w (‖y − x‖),

where w is the modulus of continuity of F′ on [x, y], that is,

w (t) = sup {‖F′ (x) − F′ (y)‖ : x, y ∈ D0, ‖x − y‖ ≤ t}.

2.13. Let F : D ⊂ Rn → Rm. Show that F″ is continuous at z ∈ D if and only if all second partial derivatives of the components f1, ..., fm of F are continuous at z.
2.14. Let F : D ⊂ Rn → Rm. Show that F″ (z) is symmetric if and only if each Hessian matrix H1 (z), ..., Hm (z) is symmetric.
2.15. Let M ∈ L (Rn) be symmetric, and define f : Rn → R by f (x) = x^T M x. Show, directly from the definition, that f is convex if and only if M is positive semidefinite.
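The failure asserted in Exercise 2.11 can be seen by a direct scan (the residual function below is an illustrative construction): on the segment z = t (1, 1)ᵀ, the mean value equation would require 3t² = 1 and 2t = 1 simultaneously.

```python
# F(x) = (x1**3, x2**2), x = 0, y = (1, 1): F(y) - F(x) = (1, 1), while
# F'(z)(y - x) = (3 t^2, 2 t) for z = (t, t). Scan t in [0, 1] and check
# that the two residual components never vanish together.
def residual(t):
    return (3 * t ** 2 - 1.0, 2 * t - 1.0)

best = min(abs(r1) + abs(r2) for r1, r2 in (residual(i / 1000) for i in range(1001)))
assert best > 0.1
print("no mean value point on the segment; minimal residual sum:", best)
```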
2.16. Show that f : D ⊂ Rn → R is convex on the set D if and only if, for any x, y ∈ D, the function g : [0, 1] → R, g (t) = f (tx + (1 − t) y), is convex on [0, 1].
2.17. Show that if g_i : Rn → R is convex and c_i ≥ 0, i = 1, 2, ..., m, then g = Σ_{i=1}^m c_i g_i is convex.
2.18. Suppose that g : D ⊂ Rn → R is continuous on a convex set D0 ⊂ D and satisfies

(1/2) g (x) + (1/2) g (y) − g ((1/2) (x + y)) ≥ γ ‖x − y‖²

for all x, y ∈ D0. Show that g is convex on D0 if γ = 0.
2.19. Let M ∈ L (Rn). Show that M is a nonnegative matrix if and only if it is an isotone operator.
2.20. Let M ∈ L (Rn) be diagonal, nonsingular, and nonnegative. Show that ‖x‖ = ‖M (x)‖ is a monotonic norm on Rn.
2.21. Let M ∈ L (Rn). Show that M is invertible and M⁻¹ ≥ 0 if and only if there exist nonsingular, nonnegative matrices M1, M2 ∈ L (Rn) such that M1 M M2 = I.
2.22. Let [·, ·] : D × D → L (X, Y) be an operator satisfying conditions (2.2.1) and (2.5.3). Show that the following two assertions are equivalent:
(a) Equality (2.5.7) holds for all x, y ∈ D.
(b) For all points u, v ∈ D such that 2v − u ∈ D we have

[u, v] = 2 [u, 2v − u] − [v, 2v − u].

2.23. If δF is a consistent approximation of F′ on D, show that each of the following four expressions is an estimate for

‖F (x) − F (y) − δF (u, v) (x − y)‖:
c1 = h (‖x − u‖ + ‖y − u‖ + ‖u − v‖) ‖x − y‖,
c2 = h (‖x − v‖ + ‖y − v‖ + ‖u − v‖) ‖x − y‖,
c3 = h (‖x − y‖ + ‖y − u‖ + ‖y − v‖) ‖x − y‖,
and
c4 = h (‖x − y‖ + ‖x − u‖ + ‖x − v‖) ‖x − y‖.
2.24. Show that the integral representation of [x0, ..., xk] is indeed a divided difference of k-th order of F. Let us assume that all divided differences have such an integral representation. In this case, for x0 = x1 = ... = xk = x we shall have

[x, x, ..., x] (k + 1 times) = (1/k!) F^{(k)} (x).

Suppose now that the n-th Fréchet-derivative of F is Lipschitz continuous on D, i.e. there exists a constant c_{n+1} such that

‖F^{(n)} (u) − F^{(n)} (v)‖ ≤ c_{n+1} ‖u − v‖ for all u, v ∈ D.

In this case, set

R_n (y) = ([x0, ..., x_{n−1}, y] − [x0, ..., x_{n−1}, x_n]) (y − x_{n−1}) · · · (y − x0)

and show that

‖R_n (y)‖ ≤ (c_{n+1}/(n + 1)!) ‖y − x_n‖ · ‖y − x_{n−1}‖ · · · ‖y − x0‖

and

‖F (x + h) − [F (x) + F′ (x) h + (1/2) F″ (x) h² + · · · + (1/n!) F^{(n)} (x) h^n]‖ ≤ (c_{n+1}/(n + 1)!) ‖h‖^{n+1}.
2.25. We recall the definitions:
(a) An operator F : D ⊂ Rn → Rm is Gateaux- (or G-) differentiable at an interior point x of D if there exists a linear operator L ∈ L (Rn, Rm) such that, for any h ∈ Rn,

lim_{t→0} (1/t) ‖F (x + th) − F (x) − tL (h)‖ = 0.

L is denoted by F′ (x) and called the G-derivative of F at x.
(b) An operator F : D ⊂ Rn → Rm is hemicontinuous at x ∈ D if, for any h ∈ Rn and ε > 0, there is a δ = δ (ε, h) so that whenever |t| < δ and x + th ∈ D, then ‖F (x + th) − F (x)‖ < ε.
(c) If F : D ⊂ Rn → Rm and if for some interior point x of D and h ∈ Rn the limit

lim_{t→0} (1/t) [F (x + th) − F (x)] = A (x, h)

exists, then F is said to have a Gateaux-differential at x in the direction h.
(d) If the G-differential exists at x for all h and if, in addition,

lim_{h→0} (1/‖h‖) ‖F (x + h) − F (x) − A (x, h)‖ = 0,

then F has a Fréchet differential at x.
Show:
(i) The linear operator L is unique.
(ii) If F : D ⊂ Rn → Rm is G-differentiable at x ∈ D, then F is hemicontinuous at x.
(iii) G-differential and "uniform in h" implies F-differential.
(iv) F-differential and "linear in h" implies F-derivative.
(v) G-differential and "linear in h" implies G-derivative.
(vi) G-derivative and "uniform in h" implies F-derivative.
Here "uniform in h" indicates the validity of (d). "Linear in h" means that A (x, h) exists for all h ∈ Rn and A (x, h) = M (x) h, where M (x) ∈ L (Rn, Rm).
(vii) Define F : R² → R by F (x) = sgn (x2) min (|x1|, |x2|). Show that, for any h ∈ R², A (0, h) = F (h), but F does not have a G-derivative at 0.
(viii) Define F : R² → R by F (x) = 0 if x = 0 and

F (x) = x2 (x1² + x2²)^{3/2} / [(x1² + x2²)² + x2²], if x ≠ 0.

Show that F has a G-derivative at 0, but not an F-derivative. Show, moreover, that the G-derivative is hemicontinuous at 0.
(ix) If the G-differential A (x, h) exists for all x in an open neighborhood of an interior point x0 of D and for all h ∈ Rn, then F has an F-derivative at x0 provided that, for each fixed h, A (x, h) is continuous in x at x0.
(e) Assume that F : D ⊂ Rn → Rm has a G-derivative at each point of an open set D0 ⊂ D. If the operator F′ : D0 ⊂ Rn → L (Rn, Rm) has a G-derivative at x ∈ D0, then (F′)′ (x) is denoted by F″ (x) and called the second G-derivative of F at x.
Show:
(i) If F : Rn → Rm has a G-derivative at each point of an open neighborhood of x, then F′ is continuous at x if and only if all partial derivatives ∂_j f_i are continuous at x.
(ii) F″ is continuous at x0 ∈ D if and only if all second partial derivatives of the components f1, ..., fm of F are continuous at x0. F″ (x0) is symmetric if and only if each Hessian matrix H1 (x0), ..., Hm (x0) is symmetric.
Chapter 3
Contractive Fixed Point Theory

The ideas of a contraction operator and its fixed points are fundamental to many questions in applied mathematics. In Section 1 we outline the essential ideas, whereas in Section 2 some examples are provided. An extension of the contraction mapping theorem with an application to a boundary value problem is provided in Section 3. Stationary and nonstationary projection methods are studied in Section 4. A special iteration method whose convergence results improve earlier ones is analysed in Section 5. Finally, in Section 6 the stability of the Ishikawa iteration is discussed. Further results on fixed points are given in Chapter 12.
3.1
Fixed Points
Definition 3.1.1 Let F be an operator mapping a set X into itself. A point x ∈ X is called a fixed point of F if x = F (x).
(3.1.1)
Equation (3.1.1) leads naturally to the construction of the method of successive approximations or substitutions xn+1 = F (xn ) (n ≥ 0) x0 ∈ X.
(3.1.2)
If the sequence {xn} (n ≥ 0) converges to some point x∗ ∈ X for some initial guess x0 ∈ X, and F is a continuous operator in a Banach space X, we can have

x∗ = lim_{n→∞} xn+1 = lim_{n→∞} F (xn) = F (lim_{n→∞} xn) = F (x∗).

That is, x∗ is a fixed point of operator F. Hence, we showed:
Theorem 3.1.2 If F is a continuous operator in a Banach space X, and the sequence {xn} (n ≥ 0) generated by (3.1.2) converges to some point x∗ ∈ X for some initial guess x0 ∈ X, then x∗ is a fixed point of operator F.
We need information on the uniqueness of x∗ and the distances ‖xn − x∗‖, ‖xn+1 − xn‖ (n ≥ 0). That is why we introduce the concept:

Definition 3.1.3 Let (X, ‖·‖) be a normed space and F a mapping of X into itself. The operator F is said to be a contraction or a contraction mapping if there exists a real number c, 0 ≤ c < 1, such that

‖F (x) − F (y)‖ ≤ c ‖x − y‖ for all x, y ∈ X.   (3.1.3)
It follows immediately from (3.1.3) that every contraction mapping F is uniformly continuous. Indeed, F is Lipschitz continuous with Lipschitz constant c. The constant c is called the contraction constant for F. We now arrive at the Banach contraction mapping theorem [207].

Theorem 3.1.4 Let (X, ‖·‖) be a Banach space, and F : X → X be a contraction mapping. Then F has a unique fixed point.

Proof. Uniqueness: Suppose there are two distinct fixed points x, y of F. Since F is a contraction mapping,

‖x − y‖ = ‖F (x) − F (y)‖ ≤ c ‖x − y‖ < ‖x − y‖,

which is impossible. That shows uniqueness of the fixed point of F.
Existence: Using (3.1.2) we obtain

‖x2 − x1‖ ≤ c ‖x1 − x0‖,
‖x3 − x2‖ ≤ c ‖x2 − x1‖ ≤ c² ‖x1 − x0‖,
...
‖xn+1 − xn‖ ≤ c^n ‖x1 − x0‖,   (3.1.4)

and so

‖xn+m − xn‖ ≤ ‖xn+m − xn+m−1‖ + · · · + ‖xn+1 − xn‖
 ≤ (c^{m−1} + · · · + c + 1) c^n ‖x1 − x0‖
 ≤ (c^n/(1 − c)) ‖x1 − x0‖.

Hence, the sequence {xn} (n ≥ 0) is Cauchy in the Banach space X, and as such it converges to some x∗. The rest of the theorem follows from Theorem 3.1.2.

Remark 3.1.1 It follows from (3.1.1), (3.1.3) and x∗ = F (x∗) that

‖xn − x∗‖ = ‖F (xn−1) − F (x∗)‖ ≤ c ‖xn−1 − x∗‖ ≤ c^n ‖x0 − x∗‖ (n ≥ 1).   (3.1.5)
Inequality (3.1.5) describes the convergence rate. This is convenient only when an a priori estimate for ‖x0 − x∗‖ is available. Such an estimate can be derived from the inequality

‖x0 − x∗‖ ≤ ‖x0 − F (x0)‖ + ‖F (x0) − F (x∗)‖ ≤ ‖x0 − F (x0)‖ + c ‖x0 − x∗‖,

which leads to

‖x0 − x∗‖ ≤ (1/(1 − c)) ‖x0 − F (x0)‖.   (3.1.6)

By (3.1.5) and (3.1.6), we obtain

‖xn − x∗‖ ≤ (c^n/(1 − c)) ‖x0 − F (x0)‖ (n ≥ 1).   (3.1.7)
Estimates (3.1.5) and (3.1.7) can be used to determine the number of steps needed to solve Equation (3.1.1). For example, if the error tolerance is ε > 0, that is, we require ‖xn − x∗‖ < ε, then this will certainly hold if

n > (1/ln c) ln (ε (1 − c)/‖x0 − F (x0)‖).   (3.1.8)

3.2

Applications
These are examples of operators with fixed points that are not contraction mappings:

Example 3.2.1 Let F : R → R, q > 1, F (x) = qx + 1. Operator F is not a contraction, but it has a unique fixed point x = (1 − q)⁻¹.

Example 3.2.2 Let F : X → X, X = (0, 1/√6], F (x) = x³. We have

|F (x) − F (y)| = |x³ − y³| ≤ (|x|² + |x| · |y| + |y|²) |x − y| ≤ (1/2) |x − y|.

That is, F is a contraction with c = 1/2, with no fixed points in X. This does not violate Theorem 3.1.4, since X is not a Banach space.

Example 3.2.3 Let F : [a, b] → [a, b], F differentiable at every x ∈ (a, b) and |F′ (x)| ≤ c < 1. By the mean value theorem, if x, y ∈ [a, b] there exists a point z between x and y such that

F (x) − F (y) = F′ (z) (x − y),

from which it follows that F is a contraction with constant c.
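A short computation illustrates Example 3.2.3 together with the a priori bound (3.1.7). The choice F(x) = cos x on [0, 1] is an assumed illustration (|F′(x)| = |sin x| ≤ sin 1 < 1 there), and the "exact" fixed point is stood in for by a much longer run of the same iteration.

```python
import math

# Successive approximations (3.1.2) for F(x) = cos(x) on [0, 1], with
# contraction constant c = sin(1). The a priori estimate (3.1.7),
# |x_n - x*| <= c^n/(1-c) |x0 - F(x0)|, is checked at every step.
F = math.cos
c = math.sin(1.0)
x0 = 1.0

xstar = x0                 # long run as a stand-in for the exact fixed point
for _ in range(200):
    xstar = F(xstar)

x = x0
for n in range(1, 30):
    x = F(x)
    bound = c ** n / (1.0 - c) * abs(x0 - F(x0))
    assert abs(x - xstar) <= bound + 1e-12
print(x)
```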
Example 3.2.4 Let F : [a, b] → R. Assume there exist constants p1, p2 with p1 p2 > 0 and 0 < p1 < F′ (x) ≤ p2⁻¹, and assume that F (a) < 0 < F (b). How can we find the zero of F guaranteed to exist by the intermediate value theorem? Define a function P by P (x) = x − p2 F (x). Using the hypotheses, we obtain P (a) = a − p2 F (a) > a, P (b) = b − p2 F (b) < b, P′ (x) = 1 − p2 F′ (x) ≥ 0, and P′ (x) ≤ 1 − p1 p2 < 1. Hence P maps [a, b] into itself and |P′ (x)| < 1 for all x ∈ [a, b]. By Example 3.2.3, P is a contraction mapping. Hence, P has a fixed point, which is clearly a zero of F.

Example 3.2.5 (Fredholm integral equation). Let K (x, y) be a continuous function on [a, b] × [a, b], f0 (x) be continuous on [a, b], and consider the equation

f (x) = f0 (x) + λ ∫_a^b K (x, y) f (y) dy.
Define the operator P : C [a, b] → C [a, b] by P (f) = g, where

g (x) = f0 (x) + λ ∫_a^b K (x, y) f (y) dy.

Note that a fixed point of P is a solution of the integral equation. With δ = max_{x,y∈[a,b]} |K (x, y)| (finite by the Weierstrass theorem), we get

‖P (q1) − P (q2)‖ = sup_{x∈[a,b]} |P (q1) (x) − P (q2) (x)|
 = |λ| sup_{x∈[a,b]} |∫_a^b K (x, y) (q1 (y) − q2 (y)) dy|
 ≤ |λ| δ ∫_a^b |q1 (y) − q2 (y)| dy
 ≤ |λ| δ (b − a) sup_{y∈[a,b]} |q1 (y) − q2 (y)|
 = |λ| δ (b − a) ‖q1 − q2‖.

Hence, P is a contraction mapping if there exists a c < 1 such that |λ| δ (b − a) ≤ c.
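A discretized Picard iteration makes Example 3.2.5 concrete. The kernel K(x, y) = xy, f0(x) = 1, [a, b] = [0, 1], λ = 0.5 and the quadrature rule are assumptions chosen so that |λ| δ (b − a) = 0.5 < 1; for this separable kernel the exact solution f(x) = 1 + 0.3x is available for comparison.

```python
# Picard iteration f <- f0 + lam * integral K(.,y) f(y) dy, discretized with
# the midpoint rule on [0, 1].
n = 200
h = 1.0 / n
ys = [(i + 0.5) * h for i in range(n)]
lam = 0.5
K = lambda x, y: x * y
f0 = lambda x: 1.0

f = [f0(y) for y in ys]              # start the iteration at f0
for _ in range(60):
    f = [f0(x) + lam * h * sum(K(x, y) * fy for y, fy in zip(ys, f)) for x in ys]

# For K(x, y) = x*y the solution is f(x) = 1 + lam*m*x with
# m = integral y f(y) dy = (1/2)/(1 - lam/3) = 0.6.
exact = [1.0 + lam * 0.6 * x for x in ys]
assert max(abs(a - b) for a, b in zip(f, exact)) < 1e-3
```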
3.3
Extending the contraction mapping theorem
An extension of the contraction mapping theorem is provided in a Banach space setting. Our approach is justified by numerical examples where our results apply whereas the contraction mapping theorem 3.1.4 cannot. We provide the following fixed point theorem:
Theorem 3.3.1 Let Q : D ⊆ X → X be an operator, p, q scalars with p < 1, q ≥ 0, and x0 a point in D such that

‖Q (x) − Q (y)‖ ≤ q ‖x − y‖^p for all x, y ∈ D.   (3.3.1)

Set a = q^{1/(1−p)} and b = q^{1/p} ‖Q (x0) − x0‖. Moreover, assume

0 ≤ b < 1.   (3.3.2)

(a) Case p > 1. Under hypotheses (3.3.1), (3.3.2) and (3.3.3) the proof of the theorem goes through, since b^{p^n} + · · · + b^{p^{n+m−1}} still gives a Cauchy sequence, but estimate (3.3.6) only holds as

‖xn − x∗‖ ≤ c^n.   (3.3.11)

(b) Case p = 1. We simply have the contraction mapping theorem 3.1.4.
Next we provide a numerical example to show that our Theorem can apply in cases where p ≠ 1, that is, in cases where the contraction mapping principle cannot apply.
Example 3.3.1 Let X = Rm−1 with m ≥ 2 an integer. Define operator Q on a subset D of X by Q(w) = H + h2 (p + 1)M where, h is a real parameter, p1 ≥ 0, 2 −1 0 ··· 0 −1 2 −1 0 H = 0 −1 2 −1 0 ··· −1 2 p w1 ··· 0 w2p ··· 0 , M = 0 p 0 ··· ··· wm−1
(3.3.12)
,
and w = (w1 , w2 , . . . , wm−1 ). Finding fixed points of operator Q is very important in many discretization studies since e.g. many two point boundary value problems can be reduced to finding such points [68], [99]. Let w ∈ Rm−1 , M ∈ Rm−1 ×Rm−1 , and define the norms of w and M by kwk =
max
1≤j≤m−1
|wj |
and kM k =
max
1≤j≤m−1
m−1 X k=1
|mjk |.
For all $w, z \in \mathbb{R}^{m-1}$ for which $|w_j| > 0$, $|z_j| > 0$, $j = 1, 2, \ldots, m-1$, we obtain, for say $p = \frac12$,
$$\begin{aligned} \|Q(w) - Q(z)\| &= \Bigl\| \operatorname{diag}\Bigl( \tfrac32 h^2 \bigl( w_j^{1/2} - z_j^{1/2} \bigr) \Bigr) \Bigr\| = \tfrac32 h^2 \max_{1 \le j \le m-1} \bigl| w_j^{1/2} - z_j^{1/2} \bigr| \\ &\le \tfrac32 h^2 \Bigl[ \max_{1 \le j \le m-1} |w_j - z_j| \Bigr]^{1/2} = \tfrac32 h^2\, \|w - z\|^{1/2}. \end{aligned} \tag{3.3.13}$$
Set $q = \tfrac32 h^2$. Then $a = q^2$ and $b = q^2 \|Q(x_0) - x_0\|$. It follows from (3.3.13) that the contraction mapping principle cannot be used to obtain fixed points of the operator $Q$, since $p = \frac12 \ne 1$. However, our theorem applies provided that $x_0 \in D$ is such that
$$\|Q(x_0) - x_0\| \le q^{-2}. \tag{3.3.14}$$
That is, (3.3.2) is satisfied. Furthermore $r$ should be chosen to satisfy (3.3.3), and we can certainly set $U(x_0, r) = D$.
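To see a condition of type (3.3.1) with $p = \frac12$ in action, here is a minimal numerical sketch (my own illustration, not from the book): the scalar map $Q(x) = q\sqrt{x}$ satisfies $|Q(x) - Q(y)| \le q|x - y|^{1/2}$ for $x, y \ge 0$, since $|\sqrt{x} - \sqrt{y}| \le |x - y|^{1/2}$, yet $Q$ is not a contraction near the origin because its slope blows up there.

```python
import math

# Hypothetical scalar illustration of condition (3.3.1) with p = 1/2:
# Q(x) = q*sqrt(x) obeys |Q(x) - Q(y)| <= q*|x - y|**0.5 on [0, inf),
# so the contraction mapping principle does not apply near 0, but
# successive substitution still settles on the fixed point x* = q**2.
h = 0.5
q = 1.5 * h**2          # q = (3/2) h^2, as in Example 3.3.1

def Q(x):
    return q * math.sqrt(x)

x = 1.0                  # here |Q(x0) - x0| = 0.625 <= q**(-2), cf. (3.3.14)
for _ in range(80):
    x = Q(x)

print(x)                 # -> 0.140625 = q**2
```

Since $\log x_n$ halves its distance to $\log q^2$ at every step, 80 iterations are far more than enough for full floating-point accuracy.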
3.4 Stationary and nonstationary projection-iteration methods
The aim of this section is to study the existence of solutions of the equation
$$x = A(x, x) \tag{3.4.1}$$
and their approximation for a nonlinear operator $A$. It is clear that equation (3.4.1) contains the equation $x = Ax$ as a special case, and the approximations
$$x_n = A_n(x_n, x_{n-1}), \quad n = 1, 2, \ldots \tag{3.4.2}$$
generalize many approximate methods such as the Picard, Seidel and Newton-Kantorovich methods. In the sequel we study the strong and weak convergence of this type of projection-iteration method, stationary as well as nonstationary, for equation (3.4.1). A new generalized fixed point theorem of Edelstein's type [149], [150] will be stated and some examples will be shown.

Theorem 3.4.1 (Stationary) Let $X$ be a topological vector space, $D$ a compact set in $X$, $A$ a continuous mapping from $D \times D$ into $D$, and $F = \{x \in D : A(x, x) = x\}$. Let $g : D \times F \to [0, \infty)$. Assume:

1. $F \ne \emptyset$ and $g(x, p) = 0$, $x \in D$, $p \in F \Leftrightarrow x = p \in F$;

2. $g(A(x, y), p) < \max\{g(x, p), g(y, p)\}$ for all $x, y \in D$, $p \in F$, unless $x = y = p$;

3. the function $g(x, p)$ is continuous in $x$;

4. the system
$$u_n = A(u_n, v_n), \qquad v_n = A_k(u_n, u_{n-1}) \tag{3.4.3}$$
is solvable for an arbitrary element $u_0 \in D$ and for all $n \ge 1$, where $A_1(x, y) = A(x, y)$, $A_i(x, y) = A(x, A_{i-1}(x, y))$, $i = 2, 3, \ldots, k$.

Then the sequences $\{u_n\}, \{v_n\}$ constructed by (3.4.3) converge (in the topology of $X$) to the unique solution of equation (3.4.1).

Proof. By assumption 2 we have $g(u_n, p) = g(A(u_n, v_n), p) < \max\{g(u_n, p), g(v_n, p)\}$, i.e. $g(u_n, p) < g(v_n, p)$. However,
$$g(v_n, p) = g(A_k(u_n, u_{n-1}), p) < \max\{g(u_n, p), g(A_{k-1}(u_n, u_{n-1}), p)\} \le \cdots \le \max\{g(u_n, p), g(u_{n-1}, p)\},$$
hence
$$g(u_n, p) < g(v_n, p) < g(u_{n-1}, p).$$
Therefore $\lim g(u_n, p) = \lim g(v_n, p)$. By the compactness of $D$ we get subsequences $\{u_{n_j}\} \subset \{u_n\}$, $\{v_{n_j}\} \subset \{v_n\}$ such that $u_{n_j} \to u$, $v_{n_j} \to v$. Since $g$ is continuous, $g(u, p) = g(v, p)$. On the other hand $A(u_{n_j}, v_{n_j}) \to A(u, v)$ and
$$g(u, p) = g(A(u, v), p) < \max\{g(u, p), g(v, p)\}.$$
This is a contradiction, so $u = v = p$. Note that $F$ contains only one element. Indeed, if $p \ne q$, $p, q \in F$, then $g(p, q) = g(A(p, p), q) < \max\{g(p, q), g(p, q)\} = g(p, q)$, which is impossible. Consequently the sequences $\{u_n\}, \{v_n\}$ converge to the unique solution $p$ of equation (3.4.1).

Remark 3.4.1 Instead of condition 3 we can use the condition (R): if $\lim g(u_n, p) = \lim g(v_n, p)$, $p \in F$, and $v$ is a limit of the sequence $\{v_n\}$, then either $g(u, p) \le g(v, p)$ or $g(u, p) \ge g(v, p)$ for all limits $u$ of the sequence $\{u_n\}$.

To verify this remark it suffices to use conditions 2 and 4. Indeed, from the sequences $\{u_{n_j}\}, \{v_{n_j}\}$ we can choose subsequences $\{u_{n_{j_k}-1}\}$ such that $u_{n_{j_k}-1} \to \bar u$. Then we have
$$u_{n_{j_k}} = A(u_{n_{j_k}}, v_{n_{j_k}}), \quad u_{n_{j_k}} \to u = A(u, v), \qquad v_{n_{j_k}} = A_k(u_{n_{j_k}}, u_{n_{j_k}-1}), \quad v_{n_{j_k}} \to v = A_k(u, \bar u).$$
Hence $g(u, p) < \max\{g(u, p), g(v, p)\}$ implies $g(u, p) < g(v, p)$, and $g(v, p) < \max\{g(u, p), g(\bar u, p)\}$ implies $g(v, p) < g(\bar u, p)$, if neither $u = v = p$ nor $u = \bar u = p$ holds. However the last inequalities contradict condition (R). Therefore $u = v = p$ or $u = \bar u = p$, i.e. $v = A_k(u, u) = A_k(p, p) = p$.

Lemma 3.4.2 [332] Let $M$ be a bounded weakly closed subset of a reflexive Banach space, and $f$ a weakly lower semicontinuous functional on $M$. Then $f$ is bounded below and attains its infimum on $M$.

Theorem 3.4.3 Let $X, D, A, F, g$ be as in Theorem 3.4.1. Assume that conditions 1, 3, 4 of Theorem 3.4.1 hold, and that instead of condition 2 of that theorem the following conditions are satisfied:

5. $g(A(x, y), p) \begin{cases} < \max\{g(x, p), g(y, p)\}, & \text{if } g(x, p) \ne g(y, p),\\ \le \max\{g(x, p), g(y, p)\}, & \text{if } g(x, p) = g(y, p), \end{cases}$
for all $x, y \in D$, $p \in F$;

6. for all $y \notin F$ and $x \in D$ satisfying $x = A(x, y)$ there exists $q = q(x, y) \in F$ such that $g(A(x, y), q) < \max\{g(x, q), g(y, q)\}$.

Then the subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ of the sequences $\{u_n\}, \{v_n\}$ constructed by (3.4.3) converge simultaneously to the solution $p$ of equation (3.4.1).

Proof. As in the proof of Theorem 3.4.1 we have
$$\lim g(u_n, p) = \lim g(v_n, p). \tag{3.4.4}$$
Now from $\{u_n\}, \{v_n\}$ we take subsequences such that $u_{n_j} \to u$, $v_{n_j} \to v$. It will be shown that $v \in F$. If not, i.e. $v \notin F$, then there exists $q = q(u, v) \in F$ such that
$$g(u, q) = g(A(u, v), q) < \max\{g(u, q), g(v, q)\}, \quad \text{i.e.} \quad g(u, q) < g(v, q). \tag{3.4.5}$$
Since the function $g$ is continuous, and (3.4.4) holds with $q$ in place of $p$, we get $g(u, q) = g(v, q)$. But this contradicts (3.4.5). Hence $v \in F$. Moreover, if $g(u, v) \ne g(v, v) = 0$ then
$$g(u, v) = g(A(u, v), v) < \max\{g(u, v), g(v, v)\} = g(u, v),$$
which is impossible. So $g(u, v) = 0$, i.e. $u = v \in F$.

Lemma 3.4.4 Let $X$ be a strictly convex Banach space, $D$ a convex subset of $X$, and $A : D \times D \to D$ a continuous mapping satisfying the condition
$$\|A(x, y) - p\| \le \max\{\|x - p\|, \|y - p\|\}, \quad \forall x, y \in D,\ p \in F = \{x \in D : A(x, x) = x\}.$$
If $F \ne \emptyset$, then $F$ is convex. The proof of this lemma is similar to the one in a work by W.V. Petryshyn and T.E. Williamson (see [273]).

Lemma 3.4.5 [332] In a Banach space, if the functional $f$ is quasiconvex, i.e. for any $t \in (0, 1)$
$$f(tx + (1 - t)y) \le \max\{f(x), f(y)\},$$
and lower semicontinuous, then $f$ is weakly lower semicontinuous.

From now on a mapping $V$ is said to be quasiconvex if
$$\Bigl\| V\Bigl( \frac{x_1 + x_2}{2} \Bigr) \Bigr\| \le \max\{\|V(x_1)\|, \|V(x_2)\|\}.$$
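As a quick sanity check of this notion, here is a sketch with a hypothetical affine map (my own illustration, not from the text): if $A$ is affine then $I_x - A$ is affine, so the midpoint value is the average of the endpoint values, and the triangle inequality gives quasiconvexity in the above sense.

```python
import numpy as np

# Check the quasiconvexity inequality for V = I - A with a hypothetical
# affine A(x) = Bx + c; then V((x1+x2)/2) = (V(x1) + V(x2))/2 and the
# triangle inequality yields ||V(mid)|| <= max(||V(x1)||, ||V(x2)||).
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
c = rng.standard_normal(3)

def V(x):
    return x - (B @ x + c)   # V = I - A

violations = 0
for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    lhs = np.linalg.norm(V((x1 + x2) / 2))
    rhs = max(np.linalg.norm(V(x1)), np.linalg.norm(V(x2)))
    if lhs > rhs + 1e-12:
        violations += 1
```

For nonlinear $A$ the inequality is a genuine restriction, which is why it appears as an explicit hypothesis in the theorems below.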
Theorem 3.4.6 Let $X$ be a strictly convex and reflexive Banach space, $D$ a convex closed subset of $X$, $A$ a continuous mapping from $D \times D$ into $D$, and assume:

7. $F \ne \emptyset$;

8. $\|A(x, y) - p\| \begin{cases} < \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| \ne \|y - p\|,\\ \le \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| = \|y - p\|; \end{cases}$

9. the system (3.4.3) is solvable for all $n$ and $\|u_n - v_n\| \to 0$ as $n \to \infty$;

10. either the mapping $I_x - A$, where $(I_x - A)(x, y) = x - A(x, y)$, is quasiconvex, or the mapping $I_y - A$, where $(I_y - A)(x, y) = y - A(x, y)$, is quasiconvex;

11. the space $X$ has the property: if $\{y_n\} \subset X$, $y_n \rightharpoonup y_0 \in X$, then $\liminf \|y_n - y\| > \liminf \|y_n - y_0\|$ for all $y \ne y_0$, $y \in X$.

Then the sequences $\{u_n\}, \{v_n\}$ defined by (3.4.3) converge weakly and simultaneously to the solution $p \in F$.

Proof. Similarly to the proof of Theorem 3.4.1 we obtain
$$\|u_n - p\| \le \|v_n - p\| \le \|u_{n-1} - p\|,$$
hence $\lim \|u_n - p\| = \lim \|v_n - p\|$. Consequently $\{u_n\}, \{v_n\} \subset B(p, \|u_0 - p\|)$, the ball with center $p \in F$ and radius $\|u_0 - p\|$. Therefore from $\{u_n\}, \{v_n\}$ we can choose weakly convergent subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ such that $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$. However,
$$\|u - v\| \le \liminf \|u_{n_j} - v_{n_j}\| = \lim \|u_n - v_n\| = 0.$$
Hence $u = v$. By condition 10, if the mapping $I_y - A$ is quasiconvex (and continuous, of course) then it is weakly lower semicontinuous, and we thus have
$$\|v - A(u, v)\| \le \liminf \|u_{n_j} - A(u_{n_j}, v_{n_j})\| = 0.$$
Hence $u = A(u, v)$, i.e. $u = v = A(u, u) = A(v, v)$. Therefore in both cases the subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ converge weakly to a solution $p \in F$.

Now let $f(p) = \lim \|u_n - p\|$. This function is defined on the set $F_0 = F \cap B(p, \|u_0 - p\|)$, which is convex, closed and bounded by the convexity and closedness of $F$, and hence weakly compact. On the other hand it is easy to see that $f$ is convex and continuous, and thus weakly lower semicontinuous. Therefore $f$ attains its infimum on $F_0$, say at a point $p^{**}$. It will be shown that $p^{**}$ is the unique weak limit point of both sequences $\{u_n\}, \{v_n\}$. Indeed, if there exists a subsequence $\{u_{n_k}\}$ weakly converging to $u \ne p^{**}$, then by condition 11 we have
$$\liminf \|u_{n_k} - p^{**}\| > \liminf \|u_{n_k} - u\|.$$
However $\lim \|u_{n_k} - p^{**}\| = \lim \|u_n - p^{**}\| = f(p^{**})$ and $\lim \|u_{n_k} - u\| = \lim \|u_n - u\| = f(u)$. That is $f(p^{**}) > f(u)$, which is a contradiction.

Theorem 3.4.7 Let $X$ be a uniformly convex Banach space, $D$ a convex subset of $X$, and $A : D \times D \to D$ a continuous mapping satisfying conditions 7 and 8. Then:

(a) if for a point $c \in [0, 1)$ the mapping $J_c - A$, where $J_c(x, y) = cx + (1 - c)y$, is quasiconvex on $D \times D$, then for any $t \in (0, 1)$ the sequences
$$u_n = (tJ_c + (1 - t)A)(u_n, v_n), \qquad v_n = (tJ_c + (1 - t)A)_k(u_n, u_{n-1})$$
converge weakly and simultaneously to the solution $p \in F$;

(b) when the mapping $I_x - A$ is quasiconvex, we construct an algorithm as follows: set $\bar u_0 = u_0$; by means of the system
$$u_n = A(u_n, v_n), \quad n \ge 1, \qquad v_n = A_k(u_n, \bar u_{n-1}), \tag{3.4.6}$$
we obtain $(u_1, v_1)$; then we take $\bar u_1 = tu_1 + (1 - t)v_1$, and again using system (3.4.6) we get $(u_2, v_2)$, etc. We claim that the sequences $\{u_n\}, \{v_n\}$ so obtained converge weakly to the solution $p \in F$.

Proof. It suffices to show that the constructed sequences $\{u_n\}, \{v_n\}$ satisfy condition 9, i.e. $\|u_n - v_n\| \to 0$ as $n \to \infty$.

(i) First we study the case $c \ne 1$. We have
$$\begin{aligned} \|u_n - p\| &= \|(tJ_c + (1 - t)A)(u_n, v_n) - p\| \le t\|J_c(u_n, v_n) - p\| + (1 - t)\|A(u_n, v_n) - p\| \\ &\le t\bigl(c\|u_n - p\| + (1 - c)\|v_n - p\|\bigr) + (1 - t)\|A(u_n, v_n) - p\|. \end{aligned}$$
But
$$(1 - t)\|A(u_n, v_n) - p\| \begin{cases} < (1 - t)\max\{\|u_n - p\|, \|v_n - p\|\}, & \text{if } \|u_n - p\| \ne \|v_n - p\|,\\ \le (1 - t)\max\{\|u_n - p\|, \|v_n - p\|\}, & \text{if } \|u_n - p\| = \|v_n - p\|. \end{cases}$$
It follows that $\|u_n - p\| \le \|v_n - p\|$.
Analogously we obtain
$$\|v_n - p\| = \|(tJ_c + (1 - t)A)_k(u_n, u_{n-1}) - p\| \le \|u_{n-1} - p\|.$$
Therefore there exists $d$ such that $d = \lim \|u_n - p\| = \lim \|v_n - p\|$. If $d = 0$, then $u_n \to p$, $v_n \to p$ and, of course, $u_n \rightharpoonup p$, $v_n \rightharpoonup p$. If $d \ne 0$, then
$$\frac{\|u_n - p\|}{\|v_n - p\|} \to 1.$$
However,
$$\frac{u_n - p}{\|v_n - p\|} = t\,\frac{c(u_n - p) + (1 - c)(v_n - p)}{\|v_n - p\|} + (1 - t)\,\frac{A(u_n, v_n) - p}{\|v_n - p\|}.$$
Let
$$W_n = \frac{c(u_n - p) + (1 - c)(v_n - p)}{\|v_n - p\|}, \qquad Z_n = \frac{A(u_n, v_n) - p}{\|v_n - p\|}.$$
Obviously $\|W_n\| \le 1$, $\|Z_n\| \le 1$. Hence from $\|tW_n + (1 - t)Z_n\| \to 1$ as $n \to \infty$ and the uniform convexity of $X$ it follows that $\|W_n - Z_n\| \to 0$ as $n \to \infty$, i.e.
$$\|c(u_n - p) + (1 - c)(v_n - p) - (A(u_n, v_n) - p)\| \to 0 \quad \text{as } n \to \infty.$$
Taking into account that
$$\begin{aligned} \|cu_n + (1 - c)v_n - A(u_n, v_n)\| &= \frac{1}{1 - t}\,\|(1 - t)(cu_n + (1 - c)v_n) - (1 - t)A(u_n, v_n)\| \\ &= \frac{1}{1 - t}\,\|cu_n + (1 - c)v_n - [t(cu_n + (1 - c)v_n) + (1 - t)A(u_n, v_n)]\| \\ &= \frac{1}{1 - t}\,\|cu_n + (1 - c)v_n - u_n\| = \frac{1}{1 - t}\,\|(1 - c)(v_n - u_n)\|, \end{aligned}$$
we have $\|v_n - u_n\| \to 0$ as $n \to \infty$. At the same time it is easily seen that
$$(J_c - A)(u_n, v_n) = cu_n + (1 - c)v_n - A(u_n, v_n) = \frac{1 - c}{1 - t}\,(v_n - u_n).$$
Now as in the proof of Theorem 3.4.1 we obtain $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$, $u = v$. By the quasiconvexity of the mapping $J_c - A$ we get
$$\|cu + (1 - c)v - A(u, v)\| = \|u - A(u, u)\| \le \liminf \|(J_c - A)(u_{n_j}, v_{n_j})\| = \frac{1 - c}{1 - t} \lim \|v_n - u_n\| = 0,$$
i.e. $u = v = A(u, u) = A(v, v)$. It remains to apply the same arguments as in the proof of Theorem 3.4.1.

(ii) The case $c = 1$. By condition 8 we have
$$\|u_{n+1} - p\| \le \|v_{n+1} - p\| \le \|\bar u_n - p\| \quad \text{and} \quad \|u_n - p\| \le \|v_n - p\|.$$
But $\bar u_n = tu_n + (1 - t)v_n$, so $\|\bar u_n - p\| \le \|v_n - p\|$. Thus there exists $d$ such that $d = \lim \|\bar u_n - p\| = \lim \|v_n - p\|$. If $d = 0$, then $\|v_n - p\| \to 0$ as $n \to \infty$. If $d \ne 0$, then
$$\frac{\|\bar u_n - p\|}{\|v_n - p\|} \to 1.$$
Nevertheless,
$$\frac{\bar u_n - p}{\|v_n - p\|} = t\,\frac{u_n - p}{\|v_n - p\|} + (1 - t)\,\frac{v_n - p}{\|v_n - p\|}.$$
Setting
$$W_n = \frac{\bar u_n - p}{\|v_n - p\|}, \qquad Z_n = \frac{v_n - p}{\|v_n - p\|},$$
we have $\|W_n\| \le 1$, $\|Z_n\| = 1$.
Hence, by the uniform convexity of $X$, $\|W_n - Z_n\| \to 0$ as $n \to \infty$, i.e.
$$\Bigl\| \frac{\bar u_n - p}{\|v_n - p\|} - \frac{v_n - p}{\|v_n - p\|} \Bigr\| = \frac{\|\bar u_n - v_n\|}{\|v_n - p\|} \to 0 \quad \text{as } n \to \infty.$$
Since $\bar u_n - v_n = t(u_n - v_n)$, it follows that $\|u_n - v_n\| \to 0$ as $n \to \infty$. The proof of the theorem is complete.

Let us deal with the algorithm
$$u_n = A_n(u_n, v_n), \quad n \ge 1, \qquad v_n = A_{n,k}(u_n, u_{n-1}), \tag{3.4.7}$$
where $A_{n,1}(x, y) = A_n(x, y)$, $A_{n,i}(x, y) = A_n(x, A_{n,i-1}(x, y))$, $i = 2, 3, \ldots, k$.

Lemma 3.4.8 Let $(X, d)$ be a metric space and $M_n$, $n = 0, 1, 2, \ldots$, compact subsets of $X$ with $M_n \to M_0$ as $n \to \infty$, i.e.
$$\sup_{x \in M_n} \inf_{y \in M_0} d(x, y) \to 0 \quad \text{as } n \to \infty.$$
Then the set $\cup_{n=0}^{\infty} M_n$ is compact too.

Remark 3.4.2 Similarly to the proofs of the last theorem and of Theorems 3.4.6 and 3.4.7, one can show the weak convergence of the approximants defined by (3.4.7), or by
$$u_n = (tJ_c + (1 - t)A_n)(u_n, v_n), \quad n \ge 1, \qquad v_n = (tJ_c + (1 - t)A_n)_k(u_n, u_{n-1}),$$
for any $t \in (0, 1)$, in the cases when $X$ is a reflexive Banach space or a uniformly convex one and the $A_n$ are continuous mappings from $D \times D$ into $D_n$, where $D_n$, $n \ge 0$, are convex closed subsets of $D \subseteq X$.

Theorem 3.4.9 Let $X$ be a Banach space, $D$ a closed subset of $X$. Assume that $A_n$, $n \ge 0$, are continuous and weakly compact mappings from $D \times D$ into $D$. Moreover assume that the following conditions are satisfied:

12. $F = \{x : A(x, x) = x\} \ne \emptyset$;

13. $\|A_n(x, y) - p\| \begin{cases} < \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| \ne \|y - p\|,\\ \le \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| = \|y - p\|, \end{cases}$ for all $x, y \in D$, $p \in F$, $n \ge 0$;

14. for $c \in [0, 1]$ the mapping $J_c - A_0$ is quasiconvex on $D \times D$;

15. the space $X$ has the property: if $\{y_n\} \subset X$, $y_n \rightharpoonup y_0 \in X$, then $\liminf \|y_n - y\| > \liminf \|y_n - y_0\|$ for all $y \ne y_0$, $y \in X$;

16. the system (3.4.7) is solvable for $n \ge 1$ and $\|u_n - v_n\| \to 0$ as $n \to \infty$;

17. $\|A_n(x, y) - A_0(x, y)\| \le \varepsilon_n$, where $\varepsilon_n \to 0$ as $n \to \infty$, for all $x, y \in D$.

Then the sequences $\{u_n\}, \{v_n\}$ defined by formula (3.4.7) converge weakly to the solution of equation (3.4.1).
Proof. As in the proof of Theorem 3.4.1 we have $\lim \|v_n - p\| = \lim \|u_n - p\| \le \|u_0 - p\|$. Hence $\{u_n\}, \{v_n\} \subset D_0 = B(p, \|u_0 - p\|) \cap D$ (convex and closed in $D$). Let $M_n$ be the image of the restriction of $A_n$ to $D_0 \times D_0$, $n \ge 0$. By the weak compactness of the $A_n$, the sets $M_n$ are weakly compact. By the construction of method (3.4.7) it is clear that $u_n, v_n \in M_n$, $n \ge 1$. By condition 17 and Lemma 3.4.8, $M = \cup_{n \ge 0} M_n$ is weakly compact. Therefore from the sequences $\{u_n\}, \{v_n\}$ we can choose weakly convergent subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ such that $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$. But
$$\|u - v\| \le \liminf \|u_{n_j} - v_{n_j}\| = \lim \|u_n - v_n\| = 0,$$
that is, $u = v$. On the other hand,
$$\begin{aligned} \|(J_c - A_0)(u, v)\| &= \|cu + (1 - c)v - A_0(u, v)\| \\ &\le \liminf \|cu_{n_j} + (1 - c)v_{n_j} - A_0(u_{n_j}, v_{n_j})\| \\ &\le \liminf \bigl( \|cu_{n_j} + (1 - c)v_{n_j} - A_{n_j}(u_{n_j}, v_{n_j})\| + \varepsilon_{n_j} \bigr) \\ &\le \liminf \bigl( (1 - c)\|u_{n_j} - v_{n_j}\| + \varepsilon_{n_j} \bigr) \quad \bigl(\text{since } u_{n_j} = A_{n_j}(u_{n_j}, v_{n_j})\bigr) \\ &= (1 - c) \lim \|u_n - v_n\| = 0. \end{aligned}$$
Hence $cu + (1 - c)v = A_0(u, v)$, and since $u = v$, we get $u = v \in F$. It will be shown that $\{u_n\}, \{v_n\}$ have just one limit point (in the weak sense), say $\bar p$. To this end we consider the function $f(p) = \lim \|v_n - p\|$ defined on $F_0 = F \cap M_0$. Since the function $f$ is weakly lower semicontinuous and the set $F_0$ is weakly compact (as the intersection of $F$ with the image of a bounded closed set under a weakly compact mapping), the function $f$ attains its infimum on $F_0$ at a point, say $\bar p$. Suppose that a subsequence $\{u_{n_k}\}$ weakly converges to $u \ne \bar p$. Then by condition 15 we have
$$\liminf \|u_{n_k} - \bar p\| > \liminf \|u_{n_k} - u\|.$$
This is a contradiction. So $u_n \rightharpoonup \bar p$, $v_n \rightharpoonup \bar p$, and the theorem is proved.

By the use of Edelstein's theorem from [149], [150] for the operator $A_n(x) = A(x, v)$ it is easy to prove:
Theorem 3.4.10 Let $D$ be a closed bounded subset of a normed space $X$, and $A$ a mapping from $D \times D$ to $D$ satisfying the condition
$$\|A(x, y) - A(z, t)\| \begin{cases} < \max\{\|x - z\|, \|y - t\|\}, & \text{if } (x, y) \ne (z, t) \text{ and } x \ne y \text{ or } z \ne t,\\ \le \|x - z\| = \|y - t\|, & \text{if } x = y \text{ and } z = t, \end{cases} \tag{3.4.8}$$
for all $x, y, z, t \in D$. Assume that either $D$ is compact or $A$ maps $D \times D$ into a compact subset of $D$. Then the equation
$$A(x, x) = x \tag{3.4.9}$$
has a unique solution in $D$. Moreover every equation
$$x_n = A(x_n, x_{n-1}), \quad n \ge 0, \tag{3.4.10}$$
also has a unique solution $x_n$, and the sequence $\{x_n\}$ defined by (3.4.10) converges to the solution of equation (3.4.9) for any $x_0 \in D$.

Remark 3.4.3 If instead of (3.4.10), under the assumptions of Theorem 3.4.10, we use the Picard iteration for $A^*(x) = A(x, x)$ as in [105] by L.P. Belluce and W.A. Kirk, in [111] by F.E. Browder, in [149] by M. Edelstein, in [273] by W.V. Petryshyn and T.E. Williamson, etc., then obviously it is impossible to assert either the existence or the convergence of the approximate solutions of equation (3.4.10). In what follows some simple examples are shown.

Example 3.4.1 Consider the subset $D$ of $C(-\infty, +\infty)$ whose elements are defined as follows: $x(t) \in D$ if $|x(t)| \le M$, $M$ a constant, $x(k) = 0$ for all integers $k \in \mathbb{Z}$,
$$x(t) = \begin{cases} 2(t - k)\,x(k + 1/2), & \text{for } t \in [k, k + 1/2],\\ 2(k + 1 - t)\,x(k + 1/2), & \text{for } t \in [k + 1/2, k + 1], \end{cases}$$
and $|x(k + 1/2)| \le |x(k + 3/2)|$ for all $k \in \mathbb{Z}$. Let
$$A(x, y) = \frac{x(t + 1) + y(t + 1)}{2 + \|x - y\|/(\|x - y\| + M)}, \quad \forall x(t), y(t) \in D,$$
where
$$\|x - y\| = \sup_{t \in (-\infty, \infty)} |x(t) - y(t)|.$$
The mapping $A$ satisfies all conditions of Theorem 3.4.10. Hence for any $x_0(t) \in D$ the sequence $x_n(t) = A(x_n(t), x_{n-1}(t))$ must converge to the solution of (3.4.9), which satisfies $x(k + 1/2) = x(k + 3/2)$, $\forall k \in \mathbb{Z}$ (including $x(t) \equiv 0$). On the other hand the iteration $x_n = A(x_{n-1}, x_{n-1})$ may fail to have a limit point when, for instance, $x_0(t)$ is such that $x_0(k_0 + 1/2) < x_0(k_0 + 3/2)$ for some $k_0 \in \mathbb{Z}$.
Example 3.4.2 Let $D = l_p$, $1 < p < \infty$, and
$$A(x, y) = \frac{\tau y}{\exp(\|x - y\|)},$$
where $\tau : x = (x_1, x_2, \ldots) \mapsto \tau x = (0, x_1, x_2, \ldots)$, $x \in l_p$. Then for arbitrary $x_0 = (x_{01}, x_{02}, \ldots)$, $x_{01} \ne 0$, the sequence defined by (3.4.10) converges to $0$ as $n \to \infty$.
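A finite truncation of this example can be checked numerically. The sketch below is my own illustration: the implicit step (3.4.10) is solved by an inner fixed-point loop, which is a device added here for the computation, not part of the example.

```python
import numpy as np

# Truncated version of Example 3.4.2 in l^2: tau is the right shift and
# A(x, y) = tau(y) * exp(-||x - y||).  Each outer step solves the implicit
# equation x_n = A(x_n, x_{n-1}) by an inner fixed-point loop.
def tau(x):
    return np.concatenate(([0.0], x[:-1]))

def step(x_prev):
    u = tau(x_prev)                      # inner initial guess
    for _ in range(60):                  # inner loop contracts quickly here
        u = tau(x_prev) * np.exp(-np.linalg.norm(u - x_prev))
    return u

x = np.zeros(80)
x[0] = 1.0                               # x0 = (1, 0, 0, ...), x01 != 0
norms = [np.linalg.norm(x)]
for _ in range(40):
    x = step(x)
    norms.append(np.linalg.norm(x))
# ||x_n|| = ||x_{n-1}|| * exp(-||x_n - x_{n-1}||), so the norms decrease
# monotonically toward 0, as the example asserts
```

The vector length 80 is chosen so that the 40 shifts never push mass off the end of the truncation.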
3.5 A special iteration method
The convergence of the iteration algorithm based on the algorithmic function $x \to \alpha x + (1 - \alpha)A(x)$ is analyzed, where $A : \mathbb{R}^n \to \mathbb{R}^n$ is a point-to-point mapping and the coefficient $\alpha$ may change during the process. We consider iteration methods of the form
$$x_{k+1} = \alpha_k x_k + (1 - \alpha_k)A(x_k), \tag{3.5.1}$$
where $A : D \to D$ with $D \subseteq \mathbb{R}^n$ convex, and for all $k$, $0 \le \alpha_k < 1$. The convexity of $D$ implies that the iteration sequence exists for arbitrary $x_0 \in D$. In Section 3.1 the convergence of this procedure was studied in the special case when $A$ is a contraction and $\alpha_k \equiv \alpha$ for all $k$. This result was further generalized to the nonstationary case [99]. These earlier results guarantee the convergence of process (3.5.1) in cases when the original algorithm
$$z_{k+1} = A(z_k) \tag{3.5.2}$$
also converges to the unique fixed point of the mapping $A$. In this section we will find convergence conditions even in cases when the original process (3.5.2) diverges. In such cases the optimal selection of the weights $\alpha_k$ will also be determined. The following conditions will be assumed:

(A) $\|A(x) - z^*\| \le K(x)\,\|x - z^*\|$ for all $x \in D$, (3.5.3)

where $z^* \in D$ is a fixed point of $A$, $\|\cdot\|$ is the Euclidean norm, and $K : D \to \mathbb{R}$ is a real valued function;

(B) $(x - z^*)^T (A(x) - z^*) \le \beta(x)\,\|x - z^*\|^2$ for all $x \in D$, (3.5.4)

where $\beta : D \to \mathbb{R}$ is a real valued function.
Before analyzing the convergence of process (3.5.1), some comments are in order. If the mapping $A$ is Lipschitz continuous on $D$, that is, if
$$\|A(x) - A(y)\| \le K\,\|x - y\|,$$
then we may select the constant function $K(x) \equiv K$. If the mapping $-A$ is monotonic, that is, if
$$(x - y)^T (A(x) - A(y)) \le 0 \quad \text{for all } x, y \in D,$$
then we may select $\beta(x) \equiv 0$. The Cauchy-Schwarz inequality implies that for all $x \in D$,
$$(x - z^*)^T (A(x) - z^*) \le \|x - z^*\| \cdot \|A(x) - z^*\| \le K(x)\,\|x - z^*\|^2.$$
Therefore we may assume that $\beta(x) \le K(x)$; otherwise we may replace $\beta(x)$ by $K(x)$. Under assumptions (A) and (B),
$$\begin{aligned} \|x_{k+1} - z^*\|^2 &= \|\alpha_k x_k + (1 - \alpha_k)A(x_k) - \alpha_k z^* - (1 - \alpha_k)z^*\|^2 \\ &= \|\alpha_k (x_k - z^*) + (1 - \alpha_k)(A(x_k) - z^*)\|^2 \\ &= \alpha_k^2 \|x_k - z^*\|^2 + (1 - \alpha_k)^2 \|A(x_k) - z^*\|^2 + 2\alpha_k (1 - \alpha_k)(x_k - z^*)^T (A(x_k) - z^*) \\ &\le \bigl[ \alpha_k^2 + (1 - \alpha_k)^2 K_k^2 + 2\alpha_k (1 - \alpha_k)\beta_k \bigr]\, \|x_k - z^*\|^2, \end{aligned}$$
where $K_k = K(x_k)$ and $\beta_k = \beta(x_k)$. Let $p(\alpha_k)$ denote the multiplier of $\|x_k - z^*\|^2$; then
$$p(\alpha_k) = \alpha_k^2 (1 + K_k^2 - 2\beta_k) - 2\alpha_k (K_k^2 - \beta_k) + K_k^2.$$
Notice that $p(0) = K_k^2$ and $p(1) = 1$; furthermore $p$ is a convex quadratic polynomial unless $1 + K_k^2 - 2\beta_k = 0$. Since
$$1 + K_k^2 - 2\beta_k \ge 1 + K_k^2 - 2K_k = (1 - K_k)^2 \ge 0,$$
this is the case if and only if $K_k = \beta_k = 1$. In that special case $p(\alpha_k) \equiv 1$. Otherwise the stationary point of the function $p$ is
$$\alpha^* = \frac{K_k^2 - \beta_k}{1 + K_k^2 - 2\beta_k}, \tag{3.5.5}$$
which is in $[0, 1)$ if and only if $\beta_k \le K_k^2$ and $\beta_k < 1$. An easy calculation shows that
$$p(\alpha^*) = \frac{K_k^2 - \beta_k^2}{1 + K_k^2 - 2\beta_k},$$
which is below $1 - \varepsilon$ (where $\varepsilon$ is small) when
$$\frac{(1 - \beta_k)^2}{1 + K_k^2 - 2\beta_k} \ge \varepsilon.$$
If $\alpha^* \notin [0, 1)$, then
$$\min\{p(\alpha) : 0 \le \alpha \le 1\} = \begin{cases} K_k^2, & \text{if } K_k \le 1,\\ 1, & \text{if } K_k > 1. \end{cases}$$
The above discussion can be summarized as follows.

Theorem 3.5.1 Assume that for each $k \ge 0$ either $K_k \le 1 - \delta$ and $\beta_k > K_k^2$; or $K_k \le 1$, $\beta_k \le K_k^2$ and $\beta_k \le 1 - \delta$; or $1 < K_k \le Q$ and $\beta_k \le 1 - \delta$, where $\delta$ and $Q$ are fixed constants. Then with an appropriate selection of $\alpha_k$,
$$\|x_{k+1} - z^*\| \le (1 - \varepsilon)\,\|x_k - z^*\|$$
for all $k$ with some $\varepsilon > 0$; therefore the process converges to $z^*$.

Assume next that $K(x) \equiv K$ and $\beta(x) \equiv \beta$ for all $x \in D$. Then the theorem reduces to the following result.

Theorem 3.5.2 Assume that either $K < 1$, or $K \ge 1$ and $\beta < 1$. Then the process converges to $z^*$ with an appropriate selection of the coefficients $\alpha_k$.

Corollary 3.5.3 If the mapping $A$ is Lipschitz continuous and $-A$ is monotonic, then all conditions of Theorem 3.5.2 are satisfied. In this special case the best selection for $\alpha_k$ is
$$\alpha_k = \frac{K^2}{1 + K^2} \quad \text{for all } k.$$
This relation follows from (3.5.5) and the fact that in this case $K(x) \equiv K$ and $\beta(x) \equiv 0$.

Example 3.5.1 Consider the equation $x = -2x$; that is, select $D = \mathbb{R}$, $A(x) = -2x$, and $z^* = 0$. The mapping $A$ is Lipschitz continuous with $K = 2$. The original iteration process $x_{k+1} = A(x_k)$ does not converge for $x_0 \ne 0$. However, by selecting $\alpha_k \equiv \frac45$, the modified process has the form
$$x_{k+1} = \frac45 x_k + \frac15 (-2x_k) = \frac25 x_k,$$
which is obviously convergent, since the mapping $x \to \frac25 x$ is a contraction.

Consider now the generalized algorithm
$$x_{k+1} = \sum_{i=1}^{I} \alpha_{ik} A_i(x_k), \tag{3.5.6}$$
where $A_i : D \to D$ and the $\alpha_{ik}$ are given nonnegative constants such that $\sum_{i=1}^{I} \alpha_{ik} = 1$ for all $k \ge 0$. It is assumed that

(C) for all $i$ and $j$,
$$(A_i(x) - z^*)^T (A_j(x) - z^*) \le \beta_{ij}(x)\,\|x - z^*\|^2$$
for all $x \in D$, where $\beta_{ij} : D \to \mathbb{R}$ is a real valued function for all $i$ and $j$.

Notice that these conditions are straightforward generalizations of assumptions (A) and (B). An easy calculation shows that
$$\|x_{k+1} - z^*\|^2 = \Bigl\| \sum_{i=1}^{I} \alpha_{ik} (A_i(x_k) - z^*) \Bigr\|^2 \le \sum_{i=1}^{I} \sum_{j=1}^{I} \alpha_{ik}\alpha_{jk}\beta_{ijk}\,\|x_k - z^*\|^2,$$
where $\beta_{ijk} = \beta_{ij}(x_k)$. The best choice of the $\alpha_{ik}$ values can be obtained by solving the quadratic programming problem
$$\text{minimize } \sum_{i=1}^{I} \sum_{j=1}^{I} \alpha_{ik}\alpha_{jk}\beta_{ijk} \quad \text{subject to } \alpha_{ik} \ge 0\ (i = 1, 2, \ldots, I), \quad \sum_{i=1}^{I} \alpha_{ik} = 1. \tag{3.5.7}$$
The iteration procedure is convergent if for all $k$ the optimal objective function value is below $1 - \varepsilon$ for some fixed small $\varepsilon > 0$. Since for all $k$ the constraints are linear, the solution of problem (3.5.7) is a simple task. Process (3.5.6) can be further generalized as
$$x_{k+1} = F_k(A_1(x_k), \ldots, A_I(x_k)), \tag{3.5.8}$$
where $A_i : D \to D$ and $F_k : D^I \to D$. Let $z^*$ be a common fixed point of the mappings $A_i$, and assume that $z^* = F_k(z^*, \ldots, z^*)$; furthermore,

(D) for all $x \in D$ and all $i$, $\|A_i(x) - z^*\| \le K_i(x)\,\|x - z^*\|$, where $K_i : D \to \mathbb{R}$ is a real valued function;

(E) for all $x^{(i)} \in D$ $(i = 1, 2, \ldots, I)$ and $k \ge 0$,
$$\bigl\| F_k\bigl(x^{(1)}, \ldots, x^{(I)}\bigr) - z^* \bigr\| \le \sum_{i=1}^{I} \alpha_{ik}\,\bigl\|x^{(i)} - z^*\bigr\|.$$

Under the above conditions,
$$\|x_{k+1} - z^*\| = \|F_k(A_1(x_k), \ldots, A_I(x_k)) - z^*\| \le \sum_{i=1}^{I} \alpha_{ik}\,\|A_i(x_k) - z^*\| \le \Bigl( \sum_{i=1}^{I} \alpha_{ik}K_i \Bigr)\,\|x_k - z^*\|.$$
It is easy to see that there exist coefficients $\alpha_{ik}$ such that $\sum_{i=1}^{I} \alpha_{ik}K_i \le 1 - \varepsilon$ with some $\varepsilon > 0$ if and only if at least one $K_i < 1$. This observation generalizes the earlier results of Section 3.1. Finally we note that $D$ may be a subset of an infinite dimensional Euclidean space, and in the last result $\|\cdot\|$ is not necessarily the Euclidean norm.
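Example 3.5.1 and the optimal weight (3.5.5) can be checked numerically. The sketch below uses the section's own data: $A(x) = -2x$ on $D = \mathbb{R}$, so $K = 2$, $\beta = 0$ (since $-A$ is monotone) and $z^* = 0$.

```python
# Numerical check of Example 3.5.1 and of the optimal weight (3.5.5):
# A(x) = -2x on D = R, so K = 2, beta = 0 and z* = 0.
K, beta = 2.0, 0.0
alpha_star = (K**2 - beta) / (1 + K**2 - 2 * beta)   # (3.5.5): 4/5

def A(x):
    return -2.0 * x

x = 1.0
for _ in range(50):
    x = alpha_star * x + (1 - alpha_star) * A(x)     # process (3.5.1)
# each step multiplies x by 4/5 - (1/5)*2 = 2/5, which is well below the
# guaranteed squared-error factor p(alpha*) = (K**2 - beta**2)/(1 + K**2 - 2*beta) = 0.8
```

The bound $p(\alpha^*)$ is not tight here because Cauchy-Schwarz is loose for this particular $A$; the theory only promises a factor below 1, which the run confirms.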
3.6 Stability of the Ishikawa iteration
Let $X$ be a real uniformly smooth Banach space and $T : X \to X$ a $\Phi$-strongly pseudocontractive mapping with bounded range. It is shown that the Ishikawa iteration procedures are almost $T$-stable and half $T$-stable. Our results improve and extend corresponding results announced by others.

Let $X$ be a real Banach space and $J$ the normalized duality mapping from $X$ into $2^{X^*}$ given by
$$J(x) = \{ f \in X^* : \langle x, f \rangle = \|x\|^2 = \|f\|^2 \},$$
where $X^*$ denotes the dual space of $X$ and $\langle \cdot, \cdot \rangle$ denotes the generalized duality pairing. It is well known that if $X^*$ is strictly convex then $J$ is single-valued and satisfies $J(tx) = tJ(x)$ for all $t \ge 0$ and $x \in X$, and that if $X^*$ is uniformly convex, then $J$ is uniformly continuous on the bounded subsets of $X$. In the sequel we shall denote the single-valued normalized duality mapping by $j$.

Definition 3.6.1 A mapping $T$ with domain $D(T)$ and range $R(T)$ in $X$ is called $\Phi$-strongly pseudocontractive if for all $x, y \in D(T)$ there exist $j(x - y) \in J(x - y)$ and a strictly increasing function $\Phi : [0, \infty) \to [0, \infty)$ with $\Phi(0) = 0$ such that
$$\langle Tx - Ty, j(x - y) \rangle \le \|x - y\|^2 - \Phi(\|x - y\|)\,\|x - y\|. \tag{3.6.1}$$
$T$ is called strongly pseudocontractive if there exists $k > 0$ such that (3.6.1) is satisfied with $\Phi(t) = kt$, and it is called $\Phi$-hemicontractive if the nullspace of $T$, $N(T) = \{x \in D(T) : Tx = 0\}$, is nonempty and (3.6.1) is satisfied for all $x \in D(T)$ and $y \in N(T)$. The class of strongly pseudocontractive mappings is a proper subclass of the class of $\Phi$-strongly pseudocontractive mappings.

Recently, stability results for certain classes of nonlinear maps have been obtained by several authors; the topic of stability is both of theoretical and of numerical interest. In order to prove the main results of this section we need the following concepts. Let $K$ be a nonempty convex subset of a real Banach space $X$ and $T$ a self-mapping of $K$. Let $\{\alpha_n\}$ be a real sequence in $[0, \infty)$. Assume that $x_0 \in K$ and that $x_{n+1} = f(T, x_n, \alpha_n)$ defines an iteration procedure which yields a sequence of points $\{x_n\}$ in $K$. Suppose, furthermore, that $F(T) = \{x \in X : Tx = x\} \ne \emptyset$ and that $\{x_n\}$ converges strongly to $x^* \in F(T)$. Let $\{y_n\}_{n=0}^{\infty}$ be any sequence in $K$ and $\{\varepsilon_n\}_{n=0}^{\infty}$ a sequence in $\mathbb{R}^+ = [0, \infty)$ given by $\varepsilon_n = \|y_{n+1} - f(T, y_n, \alpha_n)\|$.

Definition 3.6.2
1) If $\lim_{n\to\infty} \varepsilon_n = 0$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ defined by $x_{n+1} = f(T, x_n, \alpha_n)$ is said to be $T$-stable.
2) If $\sum_{n=1}^{\infty} \varepsilon_n < \infty$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be almost $T$-stable.
3) If $\varepsilon_n = o(\alpha_n)$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be half $T$-stable.
4) If $\varepsilon_n = \varepsilon_n' + \varepsilon_n''$ with $\sum_{n=1}^{\infty} \varepsilon_n' < \infty$ and $\varepsilon_n'' = o(\alpha_n)$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be weakly $T$-stable.

We remark immediately that if an iterative sequence $\{x_n\}$ is $T$-stable, then it is weakly $T$-stable, and if it is weakly $T$-stable, then it is both almost and half $T$-stable. Conversely, an iterative sequence $\{x_n\}$ which is either almost or half $T$-stable may fail to be weakly $T$-stable. Accordingly, it is of theoretical interest to study weak stability.
Lemma 3.6.3 Let $\{a_n\}_{n=0}^{\infty}, \{b_n\}_{n=0}^{\infty}$ be two nonnegative real sequences satisfying
$$a_{n+1} \le a_n + b_n, \qquad \sum_{n=1}^{\infty} b_n < \infty.$$
Then $\lim_{n\to\infty} a_n$ exists.

Lemma 3.6.4 Let $X$ be a real Banach space. Then for all $x, y \in X$ and $j(x + y) \in J(x + y)$,
$$\|x + y\|^2 \le \|x\|^2 + 2\langle y, j(x + y) \rangle.$$

Theorem 3.6.5 Let $X$ be a real uniformly smooth Banach space and $T : X \to X$ a continuous and $\Phi$-strongly pseudocontractive mapping with bounded range. Suppose $q$ is the fixed point of the mapping $T$. Let $\{\alpha_n\}_{n=1}^{\infty}, \{\beta_n\}_{n=1}^{\infty}$ be two real sequences satisfying
$$0 \le \alpha_n, \beta_n < 1 \ \ \forall n \ge 1, \qquad \lim_{n\to\infty} \alpha_n = \lim_{n\to\infty} \beta_n = 0, \qquad \sum_{n=1}^{\infty} \alpha_n = \infty.$$
Let $\{x_n\}_{n=1}^{\infty}$ be the sequence of the Ishikawa iteration [198] generated from an arbitrary $x_1 \in X$ by
$$x_{n+1} = (1 - \alpha_n)x_n + \alpha_n T z_n, \qquad z_n = (1 - \beta_n)x_n + \beta_n T x_n, \quad n \ge 1.$$
Let $\{y_n\}_{n=1}^{\infty}$ be any sequence in $X$ and define $\{\varepsilon_n\}_{n=1}^{\infty} \subset \mathbb{R}^+$ by
$$\varepsilon_n = \|y_{n+1} - (1 - \alpha_n)y_n - \alpha_n T \tau_n\|, \qquad \tau_n = (1 - \beta_n)y_n + \beta_n T y_n, \quad n \ge 1.$$
Then the Ishikawa iteration procedure $\{x_n\}_{n=1}^{\infty}$ is half $T$-stable.

Proof. Let $q$ denote the unique fixed point of $T$ and suppose $\varepsilon_n = o(\alpha_n)$, so that $\varepsilon_n \to 0$ as $n \to \infty$. It suffices to prove that $y_n \to q$ as $n \to \infty$. Without loss of generality, we may assume that $\varepsilon_n \le \alpha_n$ for all $n \ge 1$. Now set
$$M_0 = \sup_{x \in X} \{\|Tx - q\|\} + \|y_1 - q\|, \qquad M = M_0 + 1.$$
For any $n \ge 1$ we get $\|y_n - q\| \le M$, by induction. Indeed, $\|y_1 - q\| \le M_0 \le M$, and if $\|y_n - q\| \le M$ then
$$\begin{aligned} \|y_{n+1} - q\| &\le \|y_{n+1} - (1 - \alpha_n)y_n - \alpha_n T\tau_n\| + (1 - \alpha_n)\|y_n - q\| + \alpha_n \|T\tau_n - q\| \\ &\le \varepsilon_n + (1 - \alpha_n)M + \alpha_n M_0 \le \alpha_n + (1 - \alpha_n)M + \alpha_n M_0 = M + \alpha_n (1 + M_0 - M) = M. \end{aligned}$$
Since $T$ is a $\Phi$-strongly pseudocontractive mapping, we have
$$\langle Tx - Ty, J(x - y) \rangle \le \|x - y\|^2 - \Phi(\|x - y\|)\,\|x - y\| \quad \text{for all } x, y \in X.$$
Let $v_n = (1 - \alpha_n)y_n + \alpha_n T\tau_n$ and $u_n = y_{n+1} - v_n$. Then $y_{n+1} = u_n + v_n$, $\|u_n\| = \varepsilon_n$, and
$$\|y_{n+1} - q\| \le \|v_n - q\| + \|u_n\| = \|v_n - q\| + \varepsilon_n. \tag{3.6.2}$$
Using Lemma 3.6.4, we have
$$\begin{aligned} \|v_n - q\|^2 &= \|(1 - \alpha_n)(y_n - q) + \alpha_n(T\tau_n - Tq)\|^2 \\ &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\langle T\tau_n - Tq, J(v_n - q) \rangle \\ &= (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\langle T\tau_n - Tq, J(\tau_n - q) \rangle + 2\alpha_n\langle T\tau_n - Tq, J(v_n - q) - J(\tau_n - q) \rangle \\ &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\|\tau_n - q\|^2 - 2\alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2\alpha_n d_n, \end{aligned} \tag{3.6.3}$$
where $d_n = \langle T\tau_n - Tq, J(v_n - q) - J(\tau_n - q) \rangle \to 0$ as $n \to \infty$ (using the fact that $J$ is uniformly continuous on the bounded subsets of $X$). Again applying Lemma 3.6.4, we have
$$\begin{aligned} \|\tau_n - q\|^2 &= \|(1 - \beta_n)(y_n - q) + \beta_n(Ty_n - Tq)\|^2 \\ &\le (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\langle Ty_n - Tq, J(\tau_n - q) \rangle \\ &= (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\langle Ty_n - Tq, J(y_n - q) \rangle + 2\beta_n\langle Ty_n - Tq, J(\tau_n - q) - J(y_n - q) \rangle \\ &\le (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\|y_n - q\|^2 - 2\beta_n\Phi(\|y_n - q\|)\|y_n - q\| + 2\beta_n e_n \\ &= (1 + \beta_n^2)\|y_n - q\|^2 - 2\beta_n\Phi(\|y_n - q\|)\|y_n - q\| + 2\beta_n e_n, \end{aligned} \tag{3.6.4}$$
where $e_n = \langle Ty_n - Tq, J(\tau_n - q) - J(y_n - q) \rangle \to 0$ as $n \to \infty$ (as above).
Substituting (3.6.4) into (3.6.3) yields
$$\begin{aligned} \|v_n - q\|^2 &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n(1 + \beta_n^2)\|y_n - q\|^2 - 4\alpha_n\beta_n\Phi(\|y_n - q\|)\|y_n - q\| \\ &\qquad + 4\alpha_n\beta_n e_n - 2\alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2\alpha_n d_n \\ &= (1 + \alpha_n^2 + 2\alpha_n\beta_n^2)\|y_n - q\|^2 + 4\alpha_n\beta_n[e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \\ &\le \|y_n - q\|^2 + (\alpha_n^2 + 2\alpha_n\beta_n^2)M^2 + 4\alpha_n\beta_n[e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \\ &= \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[M^2\alpha_n/2 + d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|]. \end{aligned} \tag{3.6.5}$$
We shall prove that $\inf_{n \ge 1}\{\|y_n - q\|\} = 0$. If not, there must exist $\delta > 0$ such that $\inf_{n \ge 1}\{\|y_n - q\|\} = \delta$. That is to say, $\|y_n - q\| \ge \delta$ for all $n \ge 1$, and hence $\Phi(\|y_n - q\|)\|y_n - q\| \ge \Phi(\delta)\delta$. From (3.6.4) we obtain
$$\|\tau_n - q\|^2 \le \|y_n - q\|^2 + M^2\beta_n^2 + 2e_n\beta_n - 2\beta_n\Phi(\delta)\delta = \|y_n - q\|^2 + \beta_n(M^2\beta_n + 2e_n - 2\Phi(\delta)\delta). \tag{3.6.6}$$
Since $\beta_n \to 0$ and $e_n \to 0$ as $n \to \infty$, there exists some integer $N_1$ such that $M^2\beta_n + 2e_n - 2\Phi(\delta)\delta < 0$ for all $n > N_1$. Thus we have
$$\|\tau_n - q\| \le \|y_n - q\| \quad \text{and} \quad -\Phi(\|\tau_n - q\|)\|\tau_n - q\| \ge -\Phi(\|y_n - q\|)\|y_n - q\| \tag{3.6.7}$$
for all $n > N_1$. Taking these facts into consideration in (3.6.5), we obtain
$$\|v_n - q\|^2 \le \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] + 2\alpha_n[M^2\alpha_n/2 + d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \tag{3.6.8}$$
for all $n > N_1$. Furthermore, we have the estimate
$$\|\tau_n - q\| = \|(1 - \beta_n)(y_n - q) + \beta_n(Ty_n - Tq)\| \ge \|y_n - q\| - \beta_n\|y_n - q\| - \beta_n\|Ty_n - Tq\| \ge \|y_n - q\| - \beta_n(M + M_0). \tag{3.6.9}$$
Again since $\beta_n \to 0$ as $n \to \infty$, there exists some integer $N_2 > N_1$ such that $\beta_n < \delta/(2(M + M_0))$ for all $n > N_2$. Hence, from (3.6.9) we obtain
$$\delta/2 \le \|y_n - q\| - \delta/2 \le \|\tau_n - q\| \tag{3.6.10}$$
for all $n > N_2$. Using (3.6.10) and (3.6.8), we have for all $n > N_2$
$$\|v_n - q\|^2 \le \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\delta/2)\delta/2] + \alpha_n[M^2\alpha_n + 2d_n - \Phi(\delta/2)\delta/2] - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\|. \tag{3.6.11}$$
Owing to $\alpha_n, \beta_n, e_n, d_n \to 0$ as $n \to \infty$, there exists an integer $N_3 > N_2$ such that
$$M^2\beta_n/2 + e_n - \Phi(\delta/2)\delta/2 < 0, \qquad M^2\alpha_n + 2d_n - \Phi(\delta/2)\delta/2 < 0$$
for all $n > N_3$. Thus from (3.6.11) we get
$$\|v_n - q\| \le \|y_n - q\|, \qquad \|v_n - q\|^2 \le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| \tag{3.6.12}$$
for all $n > N_3$. Since $\varepsilon_n = o(\alpha_n)$, we may choose a positive integer $N_4 > N_3$ such that
$$\frac12\,\Phi(\delta/2)\frac{\delta}{2} - 2M\,\frac{o(\alpha_n)}{\alpha_n} - \frac{[o(\alpha_n)]^2}{\alpha_n} > 0 \tag{3.6.13}$$
for all $n > N_4$. From (3.6.2), (3.6.10), (3.6.12) and (3.6.13) we obtain
$$\begin{aligned} \|y_{n+1} - q\|^2 &\le \|v_n - q\|^2 + 2\|v_n - q\|\varepsilon_n + \varepsilon_n^2 \\ &\le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2M\varepsilon_n + \varepsilon_n^2 \\ &\le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2M[o(\alpha_n)] + [o(\alpha_n)]^2 \\ &\le \|y_n - q\|^2 - \frac12\alpha_n\Phi(\delta/2)\frac{\delta}{2} - \alpha_n\Bigl( \frac12\Phi(\delta/2)\frac{\delta}{2} - \frac{2M[o(\alpha_n)]}{\alpha_n} - \frac{[o(\alpha_n)]^2}{\alpha_n} \Bigr) \\ &\le \|y_n - q\|^2 - \frac12\alpha_n\Phi(\delta/2)\frac{\delta}{2} \qquad (3.6.14) \\ &\le \|y_n - q\|^2 \qquad (3.6.15) \end{aligned}$$
for all $n > N_4$. From (3.6.15) we get that $\lim_{n\to\infty}\|y_n - q\|$ exists. Thus (3.6.14) implies that
$$\frac12\,\Phi(\delta/2)\,\frac{\delta}{2} \sum_{n=N_4+1}^{\infty} \alpha_n \le \|y_{N_4+1} - q\|^2 - \lim_{n\to\infty}\|y_{n+1} - q\|^2 < \infty.$$
3. Contractive Fixed Point Theory
This contradicts Σ_{n=1}^∞ α_n = ∞. Thus

inf_n ‖y_n − q‖ = 0.     (3.6.16)

It remains to prove that lim_{n→∞} ‖y_n − q‖ = 0. From (3.6.16) we get that there
exists a subsequence with ‖y_{n_j} − q‖ → 0 as j → ∞. Since
τ_{n_j} − y_{n_j} = −β_{n_j} y_{n_j} + β_{n_j} T y_{n_j} → 0 as j → ∞, we have
‖τ_{n_j} − q‖ ≤ ‖τ_{n_j} − y_{n_j}‖ + ‖y_{n_j} − q‖ → 0 as j → ∞, and thus

‖y_{n_j+1} − q‖ ≤ ‖v_{n_j} − q‖ + ε_{n_j}
  ≤ {‖y_{n_j} − q‖² + 4α_{n_j} β_{n_j} [(M²/2)β_{n_j} + e_{n_j} − Φ(‖y_{n_j} − q‖) ‖y_{n_j} − q‖]
      + 2α_{n_j} [(M²/2)α_{n_j} + d_{n_j} − Φ(‖τ_{n_j} − q‖) ‖τ_{n_j} − q‖]}^{1/2} + ε_{n_j} → 0

as j → ∞. Using the continuity of Φ and induction, we obtain ‖y_{n_j+m} − q‖ → 0
as j → ∞ for any natural number m. Hence lim_{n→∞} ‖y_n − q‖ = 0.
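As a concrete illustration, the Ishikawa scheme analyzed above can be sketched numerically on the real line. Everything below is an assumption made for this sketch only: the test map T, the parameters α_n = β_n = 1/(n+1) (so that α_n, β_n → 0 and Σ α_n = ∞), and the iteration count.

```python
def ishikawa(T, x0, steps):
    """Ishikawa iteration: z_n = (1-b_n) x_n + b_n T(x_n),
    x_{n+1} = (1-a_n) x_n + a_n T(z_n)."""
    x = x0
    for n in range(1, steps + 1):
        a = b = 1.0 / (n + 1)          # a_n, b_n -> 0, sum of a_n diverges
        z = (1 - b) * x + b * T(x)
        x = (1 - a) * x + a * T(z)
    return x

# Hypothetical test map on [-1, 1] with unique fixed point q = 0.
T = lambda x: 0.5 * x - 0.1 * x**3
q = ishikawa(T, x0=0.9, steps=5000)
print(abs(q))                          # distance to the fixed point 0
```

For this map the iterates drift toward 0 slowly, since the step sizes α_n decay; this is typical of Mann/Ishikawa-type schemes compared with plain Picard iteration.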
Theorem 3.6.6 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strongly accretive operator. Suppose (I − T) has a bounded range.
Define S : X → X by Sx = f + x − Tx, and suppose q is the fixed point of the
mapping S. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n S z_n,
z_n = (1 − β_n)x_n + β_n S x_n,  n ≥ 1.

Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = o(α_n) = ‖y_{n+1} − (1 − α_n)y_n − α_n S τ_n‖,
τ_n = (1 − β_n)y_n + β_n S y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is half S-stable.

Proof. Since T is a Φ-strongly accretive operator, S is a Φ-strong pseudocontraction.
The conclusion now follows from Theorem 3.6.5.

Theorem 3.6.7 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strong pseudocontraction mapping with bounded range.
Suppose q is the fixed point of the mapping T. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be
two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n T z_n,
z_n = (1 − β_n)x_n + β_n T x_n,  n ≥ 1.

Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = ‖y_{n+1} − (1 − α_n)y_n − α_n T τ_n‖,
τ_n = (1 − β_n)y_n + β_n T y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is almost T-stable.

Proof. Set M₀ = sup_{x∈X} {‖Tx − q‖} + ‖y₁ − q‖ and M = M₀ + Σ_{n=1}^∞ ε_n.
For any n ≥ 1, using induction, we get ‖y_n − q‖ ≤ M. Indeed we have
‖y₁ − q‖ ≤ M₀ ≤ M,

‖y₂ − q‖ ≤ ‖y₂ − (1 − α₁)y₁ − α₁ T τ₁‖ + (1 − α₁) ‖y₁ − q‖ + α₁ ‖T τ₁ − q‖
         ≤ ε₁ + (1 − α₁)M₀ + α₁ M₀ ≤ ε₁ + M₀ ≤ M,

‖y₃ − q‖ ≤ ε₂ + ε₁ + M₀ ≤ M, . . . , ‖y_n − q‖ ≤ Σ_{i=1}^{n−1} ε_i + M₀ ≤ M.

Since T is a Φ-strong pseudocontraction mapping, for all x, y ∈ X we have

⟨Tx − Ty, J(x − y)⟩ ≤ ‖x − y‖² − Φ(‖x − y‖) ‖x − y‖.

Let v_n = (1 − α_n)y_n + α_n T τ_n and u_n = y_{n+1} − v_n. Then y_{n+1} = u_n + v_n,
‖u_n‖ = ε_n, and

‖y_{n+1} − q‖ ≤ ‖v_n − q‖ + ‖u_n‖ = ‖v_n − q‖ + ε_n.     (3.6.17)
Using Lemma 3.6.4, we have

‖v_n − q‖² = ‖(1 − α_n)y_n + α_n T τ_n − q‖²
  = ‖(1 − α_n)(y_n − q) + α_n (T τ_n − Tq)‖²
  ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n ⟨T τ_n − Tq, J(v_n − q)⟩
  = (1 − α_n)² ‖y_n − q‖² + 2α_n ⟨T τ_n − Tq, J(τ_n − q)⟩
      + 2α_n ⟨T τ_n − Tq, J(v_n − q) − J(τ_n − q)⟩
  ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n ‖τ_n − q‖² − 2α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2α_n d_n,     (3.6.18)

where d_n = ⟨T τ_n − Tq, J(v_n − q) − J(τ_n − q)⟩ → 0 (n → ∞), since J is uniformly
continuous on the bounded subsets of X. Again applying Lemma 3.6.4, we have

‖τ_n − q‖² = ‖(1 − β_n)(y_n − q) + β_n (T y_n − Tq)‖²
  ≤ (1 − β_n)² ‖y_n − q‖² + 2β_n ⟨T y_n − Tq, J(τ_n − q)⟩
  = (1 − β_n)² ‖y_n − q‖² + 2β_n ⟨T y_n − Tq, J(y_n − q)⟩
      + 2β_n ⟨T y_n − Tq, J(τ_n − q) − J(y_n − q)⟩
  ≤ (1 − β_n)² ‖y_n − q‖² + 2β_n ‖y_n − q‖² − 2β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 2β_n e_n
  = (1 + β_n²) ‖y_n − q‖² − 2β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 2β_n e_n,     (3.6.19)

where e_n = ⟨T y_n − Tq, J(τ_n − q) − J(y_n − q)⟩ → 0 (n → ∞), for the same reason.
Substituting (3.6.19) into (3.6.18) yields

‖v_n − q‖² ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n (1 + β_n²) ‖y_n − q‖²
      − 4α_n β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 4α_n β_n e_n
      − 2α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2α_n d_n
  = (1 + α_n² + 2α_n β_n²) ‖y_n − q‖² + 4α_n β_n [e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
  ≤ ‖y_n − q‖² + (α_n² + 2α_n β_n²)M² + 4α_n β_n [e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
  = ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [(M²/2)α_n + d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖].     (3.6.20)

We shall prove that inf_{n≥1} {‖y_n − q‖} = 0. If not, there must exist δ > 0 such that
inf_{n≥1} {‖y_n − q‖} = δ. That is, ‖y_n − q‖ ≥ δ for all n ≥ 1, and hence
Φ(‖y_n − q‖) ‖y_n − q‖ ≥ Φ(δ)δ. From (3.6.19), we obtain

‖τ_n − q‖² ≤ ‖y_n − q‖² + M² β_n² + 2e_n β_n − 2β_n Φ(δ)δ
           = ‖y_n − q‖² + β_n (M² β_n + 2e_n − 2Φ(δ)δ).     (3.6.21)

Since β_n → 0 and e_n → 0 as n → ∞, there exists an integer N₁ such that
M² β_n + 2e_n − 2Φ(δ)δ < 0 for all n > N₁. Thus we have ‖τ_n − q‖ ≤ ‖y_n − q‖ and

−Φ(‖τ_n − q‖) ‖τ_n − q‖ ≥ −Φ(‖y_n − q‖) ‖y_n − q‖     (3.6.22)

for all n > N₁. Taking these facts into account in (3.6.20), we obtain

‖v_n − q‖² ≤ ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
            + 2α_n [(M²/2)α_n + d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]     (3.6.23)

for all n > N₁. Furthermore, we have the estimate

‖τ_n − q‖ = ‖(1 − β_n)(y_n − q) + β_n (T y_n − Tq)‖
          ≥ ‖y_n − q‖ − β_n ‖y_n − q‖ − β_n ‖T y_n − Tq‖
          ≥ ‖y_n − q‖ − β_n (M + M₀).     (3.6.24)

Again, since β_n → 0 as n → ∞, there exists an integer N₂ > N₁ such that
β_n < δ/(2(M + M₀)) for all n > N₂. Hence, from (3.6.23) and (3.6.24) we obtain

δ/2 ≤ ‖y_n − q‖ − δ/2 ≤ ‖τ_n − q‖     (3.6.25)

for all n > N₂ > N₁. Using (3.6.25) and (3.6.23) we have, for all n > N₃,

‖v_n − q‖² ≤ ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(δ/2)δ/2]
            + α_n [M² α_n + 2d_n − Φ(δ/2)δ/2] − α_n Φ(‖τ_n − q‖) ‖τ_n − q‖.     (3.6.26)

Owing to α_n, β_n, e_n, d_n → 0 as n → ∞, we may assume that there exists N₃ > N₂
such that

(M²/2)β_n + e_n − Φ(δ/2)δ/2 < 0,     M² α_n + 2d_n − Φ(δ/2)δ/2 < 0
for all n > N₃. Thus from (3.6.26) we have ‖v_n − q‖ ≤ ‖y_n − q‖ for all n > N₃.
It follows that ‖y_{n+1} − q‖ ≤ ‖y_n − q‖ + ε_n for all n > N₃. Using Lemma 3.6.3,
we obtain that lim_{n→∞} ‖y_n − q‖ exists. From (3.6.17) and (3.6.26) we get

‖y_{n+1} − q‖² ≤ ‖v_n − q‖² + 2‖v_n − q‖ ε_n + ε_n²
  ≤ ‖y_n − q‖² − α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2Mε_n + ε_n²

for all n > N₃, which implies that

(1/2) Σ_{n=N₃+1}^∞ α_n Φ(δ/2)δ/2 ≤ ‖y_{N₃+1} − q‖² − lim_{n→∞} ‖y_{n+1} − q‖²
      + Σ (2Mε_n + ε_n²) < ∞.

This contradicts Σ_{n=1}^∞ α_n = ∞. Thus

inf_n ‖y_n − q‖ = 0.     (3.6.27)

It remains to prove that lim_{n→∞} ‖y_n − q‖ = 0. From (3.6.27) we get that there
exists a subsequence with ‖y_{n_j} − q‖ → 0 as j → ∞. Since
τ_{n_j} − y_{n_j} = −β_{n_j} y_{n_j} + β_{n_j} T y_{n_j} → 0 as j → ∞, we have
‖τ_{n_j} − q‖ ≤ ‖τ_{n_j} − y_{n_j}‖ + ‖y_{n_j} − q‖ → 0 as j → ∞, and thus

‖y_{n_j+1} − q‖ ≤ ‖v_{n_j} − q‖ + ε_{n_j}
  ≤ {‖y_{n_j} − q‖² + 4α_{n_j} β_{n_j} [(M²/2)β_{n_j} + e_{n_j} − Φ(‖y_{n_j} − q‖) ‖y_{n_j} − q‖]
      + 2α_{n_j} [(M²/2)α_{n_j} + d_{n_j} − Φ(‖τ_{n_j} − q‖) ‖τ_{n_j} − q‖]}^{1/2} + ε_{n_j} → 0

as j → ∞. Using the continuity of Φ and induction, we obtain ‖y_{n_j+m} − q‖ → 0
as j → ∞ for any natural number m. Hence lim_{n→∞} ‖y_n − q‖ = 0.
Theorem 3.6.8 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strongly accretive operator. Suppose (I − T) has a bounded range.
Define S : X → X by Sx = f + x − Tx, and suppose q is the fixed point of the
mapping S. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n S z_n,
z_n = (1 − β_n)x_n + β_n S x_n,  n ≥ 1.
Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = ‖y_{n+1} − (1 − α_n)y_n − α_n S τ_n‖,
τ_n = (1 − β_n)y_n + β_n S y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is almost S-stable.

Proof. Since T is a Φ-strongly accretive operator, S is a Φ-strong pseudocontraction.
The conclusion now follows from Theorem 3.6.7.
3.7 Exercises
3.1. Consider the problem of approximating a solution y ∈ C′[0, t₀] of the nonlinear
ordinary differential equation

dy/dt = K(t, y(t)),  0 ≤ t ≤ t₀,  y(0) = y₀.

The above equation may be turned into a fixed point problem of the form

y(t) = y₀ + ∫₀ᵗ K(s, y(s)) ds,  0 ≤ t ≤ t₀.

Assume K(s, y) is continuous on [0, t₀] × [0, t₀] and satisfies the Lipschitz condition

max_{[0,t₀]} |K(s, q₁(s)) − K(s, q₂(s))| ≤ M ‖q₁ − q₂‖, for all q₁, q₂ ∈ C[0, t₀].

Note that the integral equation above defines an operator P from C[0, t₀] into itself.
Find a sufficient condition for P to be a contraction mapping.
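A numerical sketch of the operator P of Exercise 3.1 (the right-hand side K(t, y) = −y, the initial value, the interval, and the grid are assumptions for illustration; here the Lipschitz constant is M = 1 and Mt₀ = 0.5 < 1, one familiar sufficient condition for a contraction in the sup norm):

```python
import math

T0, N = 0.5, 200                       # interval [0, T0] and grid size
ts = [T0 * i / N for i in range(N + 1)]

def K(t, y):
    return -y                          # hypothetical right-hand side

def picard_step(y):
    """Apply P(y)(t) = y0 + int_0^t K(s, y(s)) ds via the trapezoid rule."""
    y0, out, acc = 1.0, [1.0], 0.0
    for i in range(1, N + 1):
        h = ts[i] - ts[i - 1]
        acc += 0.5 * h * (K(ts[i - 1], y[i - 1]) + K(ts[i], y[i]))
        out.append(y0 + acc)
    return out

y = [1.0] * (N + 1)                    # initial guess: the constant function 1
for _ in range(30):                    # iterate the contraction
    y = picard_step(y)

err = max(abs(y[i] - math.exp(-ts[i])) for i in range(N + 1))
print(err)                             # iterates approach the solution exp(-t)
```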
3.2. Let F be a contraction mapping (with constant c) on the ball Ū(x₀, r) in a
Banach space X, and let ‖F(x₀) − x₀‖ ≤ (1 − c)r. Show that F has a unique fixed
point in Ū(x₀, r).

3.3. Under the assumptions of Theorem 3.1.2, show that the sequence generated by
(3.1.2) minimizes the functional f(x) = ‖x − F(x)‖ for any x₀ belonging to a closed
set A such that F(A) ⊆ A.

3.4. Let the equation F(x) = x have a unique solution in a closed subset A of a
Banach space X. Assume that there exists an operator F₁ such that F₁(A) ⊆ A and
F₁ commutes with F on A. Show that the equation x = F₁(x) has at least one
solution in A.
3.5. Assume that the operator F maps a closed set A into a compact subset of itself
and satisfies

‖F(x) − F(y)‖ < ‖x − y‖  (x ≠ y), for all x, y ∈ A.

Show that F has a unique fixed point in A. Apply these results to the mapping
F(x) = x − x²/2 of the interval [0, 1] into itself.

3.6. Show that the operator F defined by F(x) = x + 1/x maps the half line [1, ∞)
into itself and satisfies ‖F(x) − F(y)‖ < ‖x − y‖ but has no fixed point in this set.

3.7. Consider the condition

‖F(x) − F(y)‖ ≤ ‖x − y‖  (x, y ∈ A).

Let A be either an interval [a, b] or a disk x² + y² ≤ r². Find conditions in both
cases under which F has a fixed point.

3.8. Consider the set c₀ of null sequences x = {x₁, x₂, . . .} (x_n → 0) equipped with
the norm ‖x‖ = max_n |x_n|. Define the operator F by

F(x) = {½(1 + ‖x‖), ¾x₁, ⅞x₂, . . . , (1 − 1/2^{n+1})x_n, . . .}.

Show that F : Ū(0, 1) → Ū(0, 1) satisfies ‖F(x) − F(y)‖ < ‖x − y‖, but has no fixed
points.

3.9. Repeat Exercise 3.8 for the operator F defined in c₀ by F(x) = {y₁, . . . , y_n, . . .},
where y_n = ((n − 1)/n)x_n + (1/n) sin(n) (n ≥ 1).

3.10. Repeat Exercise 3.8 for the operator F defined in C[0, 1] by
Ax(t) = (1 − t)x(t) + t sin(1/t).
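The contrast between Exercises 3.5 and 3.6 can be seen numerically (the iteration counts below are arbitrary choices): both maps strictly decrease distances between distinct points, but only the first acts on a compact set and therefore possesses a fixed point.

```python
def iterate(F, x, n):
    """Apply F to x a total of n times."""
    for _ in range(n):
        x = F(x)
    return x

F1 = lambda x: x - x**2 / 2      # maps [0, 1] into itself; fixed point 0
F2 = lambda x: x + 1.0 / x       # maps [1, inf) into itself; no fixed point

a = iterate(F1, 1.0, 10_000)
b = iterate(F2, 1.0, 10_000)
print(a)   # tends to the unique fixed point 0 (slowly, roughly like 2/n)
print(b)   # grows without bound, roughly like sqrt(2 n)
```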
3.11. Let F be a nonlinear operator on a Banach space X which satisfies (3.1.3) on
Ū(0, r). Let F(0) = 0. Define the resolvent R(x) of F by

R(x)f = x F(R(x)f) + f.

Show:

(a) R(x) is defined on the ball ‖f‖ ≤ (1 − |x|c)r if |x| < c⁻¹;

(b) (1/(1 + |x|c)) ‖f − g‖ ≤ ‖R(x)f − R(x)g‖ ≤ (1/(1 − |x|c)) ‖f − g‖;

(c) ‖R(x)f − R(y)f‖ ≤ c ‖f‖ |x − y| / ((1 − |x|c)(1 − |y|c)).
3.12. Let F be an operator mapping a closed set A of a Banach space X into itself.
Assume that there exists a positive integer m such that F^m is a contraction
operator. Prove that sequence (3.1.2) converges to the unique fixed point of F in A.

3.13. Let F be an operator mapping a compact set A ⊆ X into itself with
‖F(x) − F(y)‖ < ‖x − y‖ (x ≠ y, all x, y ∈ A). Show that sequence (3.1.2) converges
to a fixed point of (3.1.1).

3.14. Let F be a continuous function on [0, 1] with 0 ≤ F(x) ≤ 1 for all x ∈ [0, 1].
Define the sequence

x_{n+1} = x_n + (1/(n + 1))(F(x_n) − x_n).

Show that for any x₀ ∈ [0, 1] the sequence {x_n} (n ≥ 0) converges to a fixed point
of F.

3.15. Show:

(a) A system x = Ax + b of n linear equations in n unknowns x₁, x₂, . . . , x_n (the
components of x), with A = {a_{jk}}, j, k = 1, 2, . . . , n, and b given, has a unique
solution x* if

Σ_{k=1}^n |a_{jk}| < 1,  j = 1, 2, . . . , n.

(b) The solution x* can be obtained as the limit of the iteration {x⁽⁰⁾, x⁽¹⁾, x⁽²⁾, . . .},
where x⁽⁰⁾ is arbitrary and x⁽ᵐ⁺¹⁾ = Ax⁽ᵐ⁾ + b (m ≥ 0).

(c) The following error bounds hold:

‖x⁽ᵐ⁾ − x*‖ ≤ (c/(1 − c)) ‖x⁽ᵐ⁻¹⁾ − x⁽ᵐ⁾‖ ≤ (cᵐ/(1 − c)) ‖x⁽⁰⁾ − x⁽¹⁾‖,

where c = max_j Σ_{k=1}^n |a_{jk}| and ‖x − z‖ = max_j |x_j − z_j|, j = 1, 2, . . . , n.
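A small sketch of Exercise 3.15 (the 3x3 matrix and right-hand side are illustrative assumptions): since the row sums of |a_jk| are at most c = 0.5 < 1, the map x ↦ Ax + b is a contraction in the max norm, and the a-posteriori bound of part (c) can be checked directly.

```python
A = [[0.1, 0.2, 0.1],
     [0.0, 0.3, 0.2],
     [0.2, 0.1, 0.1]]
b = [1.0, 2.0, 3.0]

c = max(sum(abs(v) for v in row) for row in A)   # row-sum norm, here 0.5 < 1

def step(x):
    """One sweep of the iteration x^(m+1) = A x^(m) + b."""
    return [sum(A[j][k] * x[k] for k in range(3)) + b[j] for j in range(3)]

def dist(x, z):                                   # max norm
    return max(abs(xi - zi) for xi, zi in zip(x, z))

x = [0.0, 0.0, 0.0]
for _ in range(60):
    x_prev, x = x, step(x)

x_star = step(x)                                  # essentially converged
bound = c / (1 - c) * dist(x_prev, x)             # bound from part (c)
print(dist(x, x_star) <= bound + 1e-12)           # the bound holds
```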
3.16. (Gershgorin's theorem): If λ is an eigenvalue of a square matrix A = {a_{jk}},
then for some j, 1 ≤ j ≤ n,

|a_{jj} − λ| ≤ Σ_{k=1, k≠j}^n |a_{jk}|.

Show that x = Ax + b can be written Bx = b, where B = I − A, and that
Σ_{k=1}^n |a_{jk}| < 1 together with the theorem imply that 0 is not an eigenvalue of
B and that A has spectral radius less than 1.

3.17. Let (X, d), (X, d₁), (X, d₂) be metric spaces with

d(x, z) = max_j |x_j − z_j|,  d₁(x, z) = Σ_{j=1}^n |x_j − z_j|,
d₂(x, z) = (Σ_{j=1}^n (x_j − z_j)²)^{1/2},

respectively. Show that instead of the conditions Σ_{k=1}^n |a_{jk}| < 1,
j = 1, 2, . . . , n, we obtain

Σ_{j=1}^n |a_{jk}| < 1,  k = 1, 2, . . . , n,  and  Σ_{j=1}^n Σ_{k=1}^n a_{jk}² < 1.
3.18. Consider the first-order ordinary differential equation (ODE)

x′ = f(t, x),  x(t₀) = x₀,

where t₀ and x₀ are given real numbers. Assume:

|f(t, x)| ≤ c₀ on R = {(t, x) : |t − t₀| ≤ a, |x − x₀| ≤ b},
|f(t, x) − f(t, v)| ≤ c₁ |x − v| for all (t, x), (t, v) ∈ R.

Then show: the (ODE) has a unique solution on [t₀ − c₂, t₀ + c₂], where
c₂ < min{a, b/c₀, 1/c₁}.

3.19. Show that f defined by f(x, y) = |sin y| + x satisfies a Lipschitz condition
with respect to the second variable (on the whole xy-plane).

3.20. Does f defined by f(t, x) = |x|^{1/2} satisfy a Lipschitz condition?

3.21. Apply Picard's iteration x_{n+1}(t) = x₀ + ∫_{t₀}^t f(s, x_n(s)) ds to the (ODE)
x′ = 1 + x², x(0) = 0. Verify that for x₃ the terms involving t, t², . . . , t⁵ are the
same as those of the exact solution.
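A numerical sketch of Exercise 3.21 (the grid and the number of Picard sweeps are arbitrary choices): for x′ = 1 + x², x(0) = 0 the exact solution is x(t) = tan t, and the Picard iterates approach it on [0, 0.5].

```python
import math

N, T = 400, 0.5
ts = [T * i / N for i in range(N + 1)]

def picard(x):
    """x_{n+1}(t) = 0 + int_0^t (1 + x_n(s)^2) ds (trapezoid rule)."""
    out, acc = [0.0], 0.0
    for i in range(1, N + 1):
        h = ts[i] - ts[i - 1]
        f0, f1 = 1 + x[i - 1]**2, 1 + x[i]**2
        acc += 0.5 * h * (f0 + f1)
        out.append(acc)
    return out

x = [0.0] * (N + 1)
for _ in range(8):          # a few Picard sweeps
    x = picard(x)

err = max(abs(x[i] - math.tan(ts[i])) for i in range(N + 1))
print(err)                  # small on [0, 0.5]
```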
3.22. Show that x′ = 3x^{2/3}, x(0) = 0 has infinitely many solutions x.

3.23. Assume that the hypotheses of the contraction mapping principle hold. Show
that x* is accessible from any point of Ū(x₀, r₀).

3.24. Define the sequence {x̄_n} (n ≥ 0) by x̄₀ = x₀, x̄_{n+1} = F(x̄_n) + ε_n
(n ≥ 0). Assume: ‖ε_n‖ ≤ λⁿ ε (n ≥ 0) (0 ≤ λ < 1), and F is a c-contraction operator
on U(x₀, r). Then show that the sequence {x̄_n} (n ≥ 0) converges to the unique
fixed point x* of F in Ū(x₀, r) provided that

r ≥ r₀ + ε/(1 − c).
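A scalar sketch of Exercise 3.24 (the contraction F, its constant c = 1/2, and the perturbation sequence are illustrative assumptions): with ‖ε_n‖ ≤ λⁿ ε and 0 ≤ λ < 1, the perturbed iterates still converge to the exact fixed point.

```python
F = lambda x: 0.5 * x + 1.0        # c-contraction on R with c = 0.5
x_star = 2.0                       # fixed point: x = 0.5 x + 1

lam, eps = 0.8, 0.1
x = 0.0
for n in range(200):
    noise = (lam ** n) * eps       # plays the role of e_n, |e_n| <= lam^n eps
    x = F(x) + noise

print(abs(x - x_star))             # tiny: the perturbations die out
```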
3.25. (a) Let F : D ⊆ X → X be an analytic operator. Assume:

• there exists α ∈ [0, 1) such that ‖F′(x)‖ ≤ α (x ∈ D);

• γ = sup_{k>1, x∈D} ‖(1/k!) F⁽ᵏ⁾(x)‖^{1/(k−1)} is finite;     (1)

• there exists x₀ ∈ D such that ‖x₀ − F(x₀)‖ ≤ η ≤ (3 − α − 2√(2 − α))/γ, γ ≠ 0;

• Ū(x₀, r₁) ⊆ D, where r₁, r₂ with 0 ≤ r₁ ≤ r₂ are the two zeros of the function f
given by f(r) = γ(2 − α)r² − (1 + ηγ − α)r + η.

Show: the method of successive substitutions is well defined, remains in Ū(x₀, r₁)
for all n ≥ 0 and converges to a fixed point x* ∈ Ū(x₀, r₁) of the operator F.
Moreover, x* is the unique fixed point of F in Ū(x₀, r₂). Furthermore, the following
error bounds hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ β ‖x_{n+1} − x_n‖  and  ‖x_n − x*‖ ≤ (βⁿ/(1 − β)) η,
where

β = γη/(1 − γη) + α.

The above result is based on the assumption that the sequence

γ_k = ‖(1/k!) F⁽ᵏ⁾(x)‖^{1/(k−1)}  (x ∈ D, k > 1)

is bounded above by γ. This kind of assumption does not always hold. Let us then
not assume the sequence {γ_k} (k > 1) is bounded, and define the "function" f₁ by

f₁(r) = η − (1 − α)r + Σ_{k=2}^∞ γ_k^{k−1} rᵏ.

(b) Let F : D ⊆ X → X be an analytic operator. Assume for x₀ ∈ D the function f₁
has a minimal positive zero r₃ such that Ū(x₀, r₃) ⊆ D.

Show: the method of successive substitutions is well defined, remains in Ū(x₀, r₃)
for all n ≥ 0 and converges to a unique fixed point x* ∈ Ū(x₀, r₃) of the operator F.
Moreover, the following error bounds hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ β₁ ‖x_{n+1} − x_n‖  and  ‖x_n − x*‖ ≤ (β₁ⁿ/(1 − β₁)) η,

where

β₁ = Σ_{k=2}^∞ γ_k^{k−1} η^{k−1} + α.
3.26. (a) With F, D as in Exercise 3.25, assume that there exists α such that
‖F′(x*)‖ ≤ α, and Ū(x*, r*) ⊆ D,     (2)

where

r* = ∞ if γ = 0,  and  r* = (1/γ) · (1 − α)/(2 − α) if γ ≠ 0.

Then, if

β = α + γr*/(1 − γr*) < 1,

show: the method of successive substitutions remains in Ū(x*, r*) for all n ≥ 0 and
converges to x* for any x₀ ∈ U(x*, r*). Moreover, the following error bounds hold
for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ β_n ‖x_n − x*‖ ≤ β ‖x_n − x*‖,

where β₀ = 1 and

β_{n+1} = α + γr*β_n/(1 − γr*β_n)  (n ≥ 0).

The above result was based on the assumption that the sequence

γ_k = ‖(1/k!) F⁽ᵏ⁾(x*)‖^{1/(k−1)}  (k ≥ 2)

is bounded above by γ. In the case where the boundedness assumption does not
necessarily hold, we have the following local alternative.

(b) Let x* be a fixed point of the operator F. Moreover, assume:
max_{r>0} Σ_{k=2}^∞ (γ_k r)^{k−1} exists and is attained at some r₀ > 0. Set

p = Σ_{k=2}^∞ (γ_k r₀)^{k−1};

there exist α, δ with α ∈ [0, 1), δ ∈ (α, 1) such that p + α − δ ≤ 0, and
Ū(x*, r₀) ⊆ D.

Show: the method of successive substitutions {x_n} (n ≥ 0) remains in Ū(x*, r₀) for
all n ≥ 0 and converges to x* for any x₀ ∈ U(x*, r₀). Moreover, the following error
bounds hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ α ‖x_n − x*‖ + Σ_{k=2}^∞ γ_k^{k−1} ‖x_n − x*‖ᵏ ≤ δ ‖x_n − x*‖.
Chapter 4

Solving Smooth Equations

In this chapter we are concerned with the problem of approximating a locally unique
solution of an equation in a Banach space. The Newton-Kantorovich method is
undoubtedly the most popular method for solving such equations. The theoretical
importance of this method should also be emphasized, since using it we can draw
conclusions about the existence, uniqueness and domain of distribution of solutions
without finding the solution itself, this being sometimes of greater value than the
actual knowledge of the solution.

4.1 Linearization of Equations

Let F be a Fréchet-differentiable operator mapping an open and convex subset D of
a Banach space X into a Banach space Y. Consider the equation

F(x) = 0.     (4.1.1)

The principal method for constructing approximations x_n to a solution x* (if it
exists) of Equation (4.1.1) is based on successive linearization of the equation.
Assuming that an approximation x_n has been found, we compute the next iterate
x_{n+1} by replacing Equation (4.1.1) by the linearized equation

F(x_n) + F′(x_n)(x_{n+1} − x_n) = 0.     (4.1.2)

If F′(x_n)⁻¹ ∈ L(Y, X), then the approximation x_{n+1} is given by

x_{n+1} = x_n − F′(x_n)⁻¹ F(x_n)  (n ≥ 0).     (4.1.3)

The iterative procedure generated by (4.1.3) is the famous Newton or
Newton-Kantorovich method [205]. Iterates {x_n} (n ≥ 0) are not always easy to
compute. First, x_n may fall outside the set D for a certain n, and secondly, even if
this does not happen, F′(x_n)⁻¹ may not exist. If the sequence {x_n} converges to x*
and x₀ is chosen sufficiently close to x*, the operators F′(x_n) and F′(x₀) will differ
only by a small amount, due to the continuity of F′. Therefore the modified
Newton-Kantorovich method

y_{n+1} = y_n − F′(y₀)⁻¹ F(y_n)  (n ≥ 0)  (y₀ = x₀)     (4.1.4)

can also be used to approximate x*. Method (4.1.4) is easier to use than (4.1.3),
since the inverse is computed only once and for all. However, (4.1.4) generally
yields worse approximations.
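The trade-off between (4.1.3) and (4.1.4) can be seen in a minimal scalar sketch (the equation F(x) = x² − 2 and the starting point are illustrative assumptions, not an example from the text):

```python
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
root = 2.0 ** 0.5

def newton(x, steps):
    """Method (4.1.3): the derivative is re-evaluated at every step."""
    for _ in range(steps):
        x = x - F(x) / dF(x)
    return x

def modified_newton(x, steps):
    """Method (4.1.4): the derivative is computed once and for all."""
    d0 = dF(x)
    for _ in range(steps):
        x = x - F(x) / d0
    return x

e_newton = abs(newton(1.0, 6) - root)
e_modified = abs(modified_newton(1.0, 6) - root)
print(e_newton, e_modified)
```

Six steps of Newton's method already reach machine precision, while the modified method, with its frozen derivative, still carries a visible error: quadratic versus only linear convergence.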
4.2 The Newton-Kantorovich Method

Define the operator P on D by

P(x) = x − F′(x)⁻¹ F(x).     (4.2.1)

Then the Newton-Kantorovich method may be regarded as the usual iterative method

x_{n+1} = P(x_n)  (n ≥ 0)     (4.2.2)

for approximating the fixed point x* of the equation

x = P(x).     (4.2.3)

Consequently, all the results of the previous chapters involving Equation (4.2.3) are
applicable. Suppose that

lim_{n→∞} x_n = x*.     (4.2.4)

We would like to know under what conditions on F and F′ the point x* is a solution
of Equation (4.1.1).

Proposition 4.2.1 If F′ is continuous at x = x* ∈ D, then we have

F(x*) = 0.     (4.2.5)

Proof. The approximations x_n satisfy the equation

F(x_n) + F′(x_n)(x_{n+1} − x_n) = 0.     (4.2.6)

Since the continuity of F at x* follows from the continuity of F′, taking the limit as
n → ∞ in (4.2.6) we obtain (4.2.5).

Proposition 4.2.2 If

‖F′(x)‖ ≤ b     (4.2.7)

in some closed ball in D which contains {x_n}, then x* is a solution of F(x) = 0.
Proof. By (4.2.7) we get

lim_{n→∞} F(x_n) = F(x*),     (4.2.8)

and since, by (4.2.6), ‖F(x_n)‖ = ‖F′(x_n)(x_{n+1} − x_n)‖ ≤ b ‖x_{n+1} − x_n‖ → 0,
(4.2.5) is obtained by taking the limit as n → ∞ in (4.2.4).

Proposition 4.2.3 If

‖F′(x) − F′(y)‖ ≤ K ‖x − y‖     (4.2.10)

in some closed ball Ū(x₀, r) which contains {x_n}, then x* is a solution of the
equation F(x) = 0.

Proof. By (4.2.10), for all x ∈ U(x₀, r) we can write

‖F′(x)‖ ≤ ‖F′(x₀)‖ + K ‖x − x₀‖ ≤ ‖F′(x₀)‖ + Kr,

so the conditions of Proposition 4.2.2 hold with b = ‖F′(x₀)‖ + Kr.

As in Kantorovich (see, e.g., [207]), consider constants B₀, η₀ such that

‖F′(x₀)⁻¹‖ ≤ B₀  and  ‖x₁ − x₀‖ ≤ η₀,     (4.2.15), (4.2.16)

respectively. We state and give a slightly different and improved proof of the
celebrated Newton-Kantorovich theorem [206], [207, Th. 6, (I.XVII), p. 708]:

Theorem 4.2.4 If condition (4.2.10) holds in some closed ball Ū(x₀, r) in D and

h₀ = B₀ η₀ K ≤ 1/2,     (4.2.17)

then the Newton-Kantorovich sequence (4.1.3), starting from x₀, converges to a
solution x* of Equation (4.1.1) which exists in Ū(x₀, r), provided that

r ≥ r₀ = ((1 − √(1 − 2h₀))/h₀) η₀.
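A scalar sketch of how Theorem 4.2.4 can be applied (the function, the starting point, and the constants below are illustrative assumptions): compute B₀, η₀ and the Lipschitz constant K of F′, check h₀ = B₀η₀K ≤ 1/2, and form the radius r₀.

```python
import math

F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
x0 = 1.4

B0 = 1.0 / abs(dF(x0))          # bound for |F'(x0)^{-1}|
eta0 = abs(F(x0) / dF(x0))      # eta0 = |x1 - x0|
K = 2.0                         # Lipschitz constant of F' (exact here)

h0 = B0 * eta0 * K              # Kantorovich quantity
assert h0 <= 0.5                # convergence condition of Theorem 4.2.4
r0 = (1 - math.sqrt(1 - 2 * h0)) / h0 * eta0

x = x0
for _ in range(10):             # Newton-Kantorovich iterates (4.1.3)
    x = x - F(x) / dF(x)

print(abs(x - math.sqrt(2.0)))  # converges to the solution
print(abs(x - x0) <= r0)        # and the solution lies in the ball U(x0, r0)
```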
Proof. We first show that (4.1.3) is well defined. By (4.2.10) we get

‖F′(x₁) − F′(x₀)‖ ≤ K ‖x₁ − x₀‖ = Kη₀.     (4.2.19)

Using (4.2.17), we obtain ‖F′(x₁) − F′(x₀)‖ ≤ Kη₀ ≤ 1/(2B₀). By Theorem 1.3.2
(the Banach lemma), [F′(x₁)]⁻¹ exists and ‖[F′(x₁)]⁻¹‖ ≤ 2B₀. . . .
. . . for all ε₀ > 0, ε > 0, t ∈ [0, 1] there exist δ = δ(ε, ε₀, t) and b ≥ 0 such that
. . . and

‖F′(x₀)⁻¹ F″(x₀)‖ ≤ b;

(c) for . . . the following conditions hold
4.4 Convergence under Twice-Fréchet Differentiability Only at a Point
and . . .

Then, the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U(x₀, δ) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, δ) of the
equation F(x) = 0. Moreover, estimates (4.4.11) and (4.4.12) hold for all n ≥ 0.
Furthermore, the solution x* is unique in U(x₀, δ) if a₀δ < 1.

Proof. We first show that the operator F′(x)⁻¹ exists for all x ∈ U(x₀, δ). Indeed,
we can obtain the identity . . . By the hypotheses we get . . . It follows from (4.4.15)
and the Banach lemma on invertible operators (Theorem 1.3.2) that
F′(x)⁻¹ ∈ L(Y, X) and . . . Note that x + θ(y − x) − x₀ ∈ U(x₀, δ) for all
x, y ∈ U(x₀, δ). By (4.1.3) for n = 0, we get
That is, x₁ ∈ U(x₀, δ). Assume x_k ∈ U(x₀, δ) for all k = 0, 1, . . . , m. Using (4.1.3)
we can obtain the identity (4.4.19), which yields a bound for ‖F′(x₀)⁻¹F(x_{k+1})‖ in
terms of ‖x_{k+1} − x_k‖. Moreover, by (4.4.16) for x = x_k, (4.1.3) and (4.4.18) we
deduce

‖x_{k+1} − x_k‖ ≤ ‖F′(x_k)⁻¹F′(x₀)‖ · ‖F′(x₀)⁻¹F(x_k)‖
             ≤ a₀ ‖x_k − x_{k−1}‖² ≤ . . . ≤ (1/a₀) a^{2ᵏ},

which shows (4.4.11). For all m ≥ 0 we get (4.4.20). In particular, for k = 0 it
follows from (4.4.20) and (4.4.8) that x_n ∈ Ū(x₀, δ) for all n ≥ 0. The sequence
{x_n} is Cauchy in the Banach space X, and as such it converges to some
x* ∈ Ū(x₀, δ). Estimate (4.4.12) follows from (4.4.20) by letting m → ∞. Finally,
to show uniqueness, let y* ∈ Ū(x₀, δ) be a solution of the equation F(x) = 0. Since

x_{k+1} − y* = [F′(x_k)⁻¹F′(x₀)] F′(x₀)⁻¹ {F″[x_k + θ(x_k − y*)] . . .},
we obtain an estimate from which it follows that lim_{k→∞} x_k = y*. However, we
showed above that lim_{k→∞} x_k = x*. Hence, we deduce x* = y*.

Remark 4.4.1 (a) In general, (4.4.23) holds. We can certainly choose ε = ε₀ in
(4.4.3). However, in case strict inequality holds in (4.4.23), more precise error
bounds can be obtained. Conditions (4.4.3) and (4.4.4) certainly hold if the operator
F is twice continuously Fréchet-differentiable.

(b) If condition (4.4.3) is replaced by (4.4.24), then it can easily be seen by (4.4.15)
that the conclusions of Theorem 4.4.2 hold with ā₀, ā replacing a₀, a, where

ā₀ = (b + ε)/(2(1 − ε))  and  ā = ā₀ η.

This case was considered in [196] (see Theorem 1). However, there the stronger
condition that the operator F is actually twice continuously Fréchet-differentiable
was imposed, and the convergence of the sequence {x_n} is only linear.

(c) Note that even in the case of Theorem 4.4.2, using (4.4.3), (4.4.4) instead of
(4.4.22) and (4.4.4) essentially used there, the results can be weaker provided that
the corresponding condition holds.

We can prove the following local convergence theorem for Newton's method:
Theorem 4.4.3 Let F : D ⊆ X → Y be a twice-Fréchet-differentiable operator.
Assume:

(a) there exists a solution x* of the equation F(x) = 0 such that
F′(x*)⁻¹ ∈ L(Y, X);

(b) for all ε₀ > 0, ε > 0, t ∈ [0, 1] there exist δ = δ(ε, ε₀, t) and b > 0 such that

. . . < ε₀,  ‖F′(x*)⁻¹[F″(x + θ(x* − x)) − F″(x*)]‖ < ε,

and . . . , where r* is given by . . .

Then, the sequence {x_n} generated by method (4.1.3) is well defined, remains in
U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover,
the following estimates hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ a ‖x_n − x*‖²,     (4.4.32)

where a is given by . . .

Proof. It follows immediately from (4.4.21) by using x*, (4.4.26), (4.4.27) instead of
x₀, (4.4.3) and (4.4.4), respectively.

Remark 4.4.2 Local results were not given in [196]. Had they been, the assumptions
(4.4.33) and (4.4.34) would have been made, with the corresponding radius r₁*
given by (4.4.35). Moreover, according to the analog of the proof of Theorem 4.4.1,
the order of convergence would have been only linear.

We provide an example for Theorem 4.4.3.
Example 4.4.1 Let X = Y = R, D = U(0, 1), and define the function F on D by
. . . According to Theorem 4.4.3, for ε₀ = ε = 1 we can have, by (4.4.26)-(4.4.28)
and (4.4.30), b = 1 and the corresponding value of r*; choosing δ = r*, all the
hypotheses of Theorem 4.4.3 hold.
Moreover, if (4.4.33) and (4.4.34) are used instead of (4.4.26) and (4.4.27),
respectively, then for ε̄ = .999 we obtain, using (4.4.35), that r₁* = .0010005.
Choosing δ < .368, all conditions hold but

r₁* < r*.     (4.4.38)

Hence in this case our Theorem 4.4.3 provides a wider range of initial guesses x₀
than the corresponding one mentioned in Remark 4.4.2. Furthermore, the order of
convergence is quadratic instead of only linear.

Remark 4.4.3 If only (4.4.33) is used, it can easily be seen by (4.4.21) that . . . In
the case of Example 4.4.1, for ā = .999 we get r_a = .0011 (since η = 1). On the
other hand, for ε ∈ (0, . . .) we can get . . . Therefore (4.3.9) is violated. However,
the conditions of Theorem 4.4.1 are satisfied, since b = 0, η = 1, and ε and δ may be
chosen suitably. That is, Theorem 4.4.1 guarantees the convergence of method
(4.1.3) (starting at x₀ = 0) to the solution x* of the equation F(x) = 0.

Remark 4.4.4 The results obtained here can be extended to hold for m-times
Fréchet-differentiable operators (m > 2) [79].
4.5 Exercises

4.1. If h₀ < 1/2 holds in Ū(x₀, r*), where r* = ((1 + √(1 − 2h₀))/h₀) η₀, then show:
Equation (4.1.1) has a unique solution x* in U(x₀, r*).

4.2. If the hypotheses of Theorem 4.2.4 are satisfied and h₀ = 1/2, then show: there
exists a unique solution x* of Equation (4.1.1) in Ū(x₀, r₀) = Ū(x₀, 2η₀).
a bounded inverse, and limllX-,.ll,o [IF' ( x ) - F' (x*)ll = 0, then show Newton's method (4.1.3) converges to x* if xo is sufficiently close to x* and
where
E
is any positive number; d is a constant depending on xo and
E.
4.4. The above result cannot be strengthened, in the sense that for every sequence of positive numbers c, such that: limn,, -- 0 , there is an equation for which (4.1.3) converges less rapidly than c,. Define Sn
Show: s ,
=
Cnl2 , { Jc(,-l)l~c(,+l)/r,
+ 0,
+ 0 , and
if n is even if n is odd.
limn,,
$& = 0 ,
(k 2 I).
4.5. Assume operator F' ( x ) satisfies a Holder condition
with 0 < b < 1 and U ( x o ,R). Define ho = boar$ 5 co, where co is a root of
h and let R > = ro, where do = (l+b)(;-ho). Show that method (4.1.3) converges to a solution x* of Equation (4.1.1) in U ( x Oy, o ) .
4.6. Let K, B₀, η₀ be as in Theorem 4.2.4. If h₀ = B₀ η₀ K . . .

. . .

(a) for all ε₀ > 0 there exists δ₀ > 0 such that . . . ;

(b) for all ε₁ > 0 there exists δ₁ > 0 such that

‖(F′(x₀)⁻¹ − F′(x)⁻¹) F′(x₀)‖ < ε₁  for all x ∈ U(x₀, δ₁).

Set δ = min{δ₀, δ₁} and ε = max{ε₀, ε₁};

(c) for ε > 0 there exists δ > 0, as defined above, such that

‖F′(x₀)⁻¹ (F′(x₀) − F′(x))‖ < ε . . .

4.10. Let m > 2 be an integer and F an m-times Fréchet-differentiable operator
defined on a convex subset D of a Banach space X with values in a Banach space Y.
Assume:
(a) there exists x₀ ∈ D such that F′(x₀)⁻¹ ∈ L(Y, X);

(b) there exist parameters η > 0, a_i > 0, i = 2, . . . , m such that . . . Since F is
m-times Fréchet-differentiable, for all ε > 0 there exists δ₀ > 0 such that . . . for all
x ∈ U(x₀, δ₀);

(c) the positive zero of p′ is such that . . . , where . . . Then the polynomial p has
only two positive zeros, denoted by δ₁, δ₂ (δ₁ ≤ δ₂);

(d) U(x₀, δ) ⊆ D;

(e) δ₀ ∈ [δ₁, δ₂] or δ₀ > δ₂, where . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U(x₀, δ₁) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, δ₁) of the
equation F(x) = 0. The solution x* is unique in U(x₀, δ₀) if δ₀ ∈ [δ₁, δ₂], or x* is
unique in U(x₀, δ₂) if δ₀ > δ₂. Moreover, the following estimates hold for all n ≥ 0:

‖x_n − x*‖ ≤ t* − t_n,

where {t_n} (n ≥ 0) is a monotonically increasing sequence converging to t*,
generated by . . .
4.11. (a) Let F be a twice Fréchet-differentiable operator defined on a convex
subset D of a Banach space X with values in a Banach space Y. Assume that the
equation F(x) = 0 has a simple zero x* ∈ D, in the sense that F′(x*) has an inverse
F′(x*)⁻¹ ∈ L(Y, X). Then for all ε₁ > 0 there exists r* > 0 such that

‖F′(x*)⁻¹[F″(x) − F″(y)]‖ ≤ ε₁  for all x, y ∈ U(x*, r*).

Moreover, assume there exists b₁ > 0 such that

‖F′(x*)⁻¹ F″(x*)‖ ≤ b₁.

Then, if . . . , show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well
defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that
x₀ ∈ U(x*, r*). Furthermore, the following estimates hold for all n ≥ 0: . . .
(b) Assume:

(i) there exist η ≥ 0 and x₀ ∈ D such that F′(x₀)⁻¹ ∈ L(Y, X) and

‖F′(x₀)⁻¹ F(x₀)‖ ≤ η.

Then, for all ℓ₀ ≥ 0 there exists r > 0 such that

‖F′(x₀)⁻¹[F″(x) − F″(y)]‖ ≤ ℓ₀,  ∀x, y ∈ U(x₀, r) ⊆ D;

(ii) there exists b₀ ≥ 0 such that ‖F′(x₀)⁻¹ F″(x₀)‖ ≤ b₀;

(iii) . . . , c₀ = ½(ℓ₀ + b₀);

(iv) r is the smallest positive zero of the equation . . .

Show: the sequence {x_n} (n ≥ 0) generated by Newton's method is well defined,
remains in U(x₀, r) for all n ≥ 0 and converges to a unique solution x* ∈ Ū(x₀, r)
of the equation F(x) = 0. Moreover, the following error bounds hold for all
n ≥ 0: . . .
4.12. (a) Let F : D ⊆ X → Y be a continuously Fréchet-differentiable operator
defined on an open convex subset D of a Banach space X with values in a Banach
space Y. Assume:

— there exists x₀ ∈ D such that F(x₀) ≠ 0;

— F′(x)⁻¹ ∈ L(Y, X) (x ∈ D), and there exists b > 0 such that

‖F′(x)⁻¹ F′(x₀)‖ ≤ b  (x ∈ D);
— for each fixed p ∈ D with F(p) ≠ 0, there exists t ∈ (0, 1] such that

‖F′(x₀)⁻¹[F′(p) − F′(p_t)]‖ < t ‖F′(p)⁻¹ F(p)‖

for all p_t collinear to p;

— there exists η > 0 such that . . . , c = . . . ∈ (0, 1), and U = Ū(x₀, δ*) ⊆ D,
where . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U for all n ≥ 0 and converges to a unique solution x* ∈ Ū of the
equation F(x) = 0. Moreover, the following estimates hold: . . .
(b) Assume:

— there exists a simple zero x* ∈ D of F, in the sense that F′(x*)⁻¹ ∈ L(Y, X);

— F′(x)⁻¹ ∈ L(Y, X) (x ∈ D), and there exists q > 0 such that . . . ;

— for each fixed p ∈ D with F(p) ≠ 0 there exist δ > 0 and t ∈ (0, 1] such that

‖F′(x*)⁻¹[F′(p) − F′(p_t)]‖ < t ‖p − x*‖

for all p_t = x + t(x* − x) ∈ U(p, δ) ⊆ D;

— U* = U(x*, r*) ⊆ D, where r* > . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U* for all n ≥ 0 and converges to x* provided that x₀ ∈ U*. Moreover,
the following estimates hold for all n ≥ 0: . . .
4.13. (a) Suppose that F', F'' are uniformly bounded by non-negative constants a < 1, K respectively, on a convex subset D of X, and that the ball

U(x_0, r_0) ⊆ D,  r_0 = 2 ||x_0 - F(x_0)|| / (1 - a).

Moreover, if

h_S = [K(1 + 2a) / 2] · ||x_0 - F(x_0)|| / (1 - a)^2 < 1

holds, then Stirling's method

…

converges to the unique fixed point x* of F in U(x_0, r_0). Moreover, the following estimates hold for all n ≥ 0:

….
(b) Let F : D ⊆ X → Y be analytic. Assume:

…

and

0 ≠ γ = sup_{k > 1} ||F^{(k)}(x_0) / k!||^{1/(k-1)} < ∞,

where

….

Show: sequence {x_n} (n ≥ 0) generated by Stirling's method (4.5.1) is well defined, remains in U(x_0, r_0) for all n ≥ 0 and converges to a unique fixed point x* of operator F at the rate given by (4.5.2), with

….
(c) Let X = D = R and define function F on D by

….

Using Stirling's method for x_0 = 3 we obtain the fixed point x* = 0 of F in one iteration, since

x_1 = 3 - (1 - F'(3))^{-1}(3 - F(3)) = 0.

Show: method (4.1.3) fails to converge.
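Stirling's method (4.5.1) solves the fixed-point problem x = F(x) via x_{n+1} = x_n - (I - F'(x_n))^{-1}(x_n - F(x_n)), as in the one-step computation above. A scalar sketch (the contraction F below is our own choice):

```python
import math

def stirling(F, dF, x0, tol=1e-12, max_iter=50):
    """Stirling iteration x_{n+1} = x_n - (1 - F'(x_n))^{-1} (x_n - F(x_n))."""
    x = x0
    for _ in range(max_iter):
        step = (x - F(x)) / (1.0 - dF(x))
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")

# Fixed point of the contraction F(x) = (x + e^{-x}) / 3 on [0, 1],
# i.e. the solution of 2x = e^{-x} (about 0.3517).
xstar = stirling(lambda x: (x + math.exp(-x)) / 3.0,
                 lambda x: (1.0 - math.exp(-x)) / 3.0,
                 x0=0.5)
```

Because ||F'|| ≤ a < 1 here, the operator I - F'(x_n) is boundedly invertible at every iterate, which is precisely what hypothesis (a) of Exercise 4.13 guarantees.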
4.14. It is convenient for us to define certain parameters, sequences and functions. Let {f_n} (n ≥ 0) be a Fibonacci sequence given by

….

Let also c, l, η be non-negative parameters and define: the real function f by

…;

sequences {s_n} (n ≥ -1), {a_n} (n ≥ -1), {A_n} (n ≥ -1), for x_n ∈ X, by

…;

and parameters b, d, r_0 by

….

Let F : D ⊆ X → Y be a nonlinear operator. Assume there exist x_{-1}, x_0 ∈ D and non-negative parameters c, l, η such that:

A_0^{-1} exists,  ||x_0 - x_{-1}|| ≤ c,  ||A_0^{-1} F(x_0)|| ≤ η,

||A_0^{-1}([x, y; F] - [z, w; F])|| ≤ l (||x - z|| + ||y - w||)  for all x, y, z, w ∈ D, x ≠ y, w ≠ z,

together with

…

and

….

Show: sequence {x_n} (n ≥ -1) generated by the Secant method

x_{n+1} = x_n - [x_{n-1}, x_n; F]^{-1} F(x_n)  (n ≥ 0)

is well defined, remains in U(x_0, r_0) for all n ≥ 0 and converges to a unique solution x* ∈ U(x_0, r_0) of equation F(x) = 0. Moreover the following estimates hold for all n ≥ 1:

…

and

….

Furthermore, let

….
Then r* > r_0 and the solution x* is unique in U(x_0, r*).

4.15. (1) Let F be a twice Fréchet-differentiable operator defined on a convex subset D of a Hilbert space H with values in H, and let x_0 be a point of D. Assume:

(a) there exist constants a, b, c, d and η such that

… and …

for all x ∈ D, y ∈ H;

(b) p = … ∈ [0, 1); and

(c) U(x_0, r*) ⊆ D, where

….

Show: iteration {x_n} (n ≥ 0) generated by

…

is well defined, remains in U(x_0, r*) for all n ≥ 0 and converges to a unique solution x* of equation F(x) = 0 in U(x_0, r*). Moreover, the following estimates hold for all n ≥ 0:

… and ….

If we let a = 0, then (1) reduces to Theorem 13.2 in [213, p. 161].

(2) Under the hypotheses of (1), show that the same conclusions of (1) hold for iteration

…

but with p replaced by

….
(3) Under the hypotheses of (2), show that the conclusions of (1) hold for iteration

….

Results (2) and (3) reduce to the corresponding ones in [213, p. 163] for a = 0. We give a numerical example to show that our results apply, whereas the corresponding ones in [213, pp. 161-163] do not.

(4) Let H = R, D = [-1, 1], x_0 = 0 and consider equation

….

The convergence condition in [213, p. 161], using our notation, is

…  (i)

where L is the Lipschitz constant such that

….

Show: η = …, d = …, c = …, L = … and q = … > 1. Hence, (i) is violated. Moreover, show that the condition corresponding to the iterations in (2) and (3) in [213, p. 163] is also violated, since

q_0 = sqrt(c^2 (d^2 + L η)) … ∈ [0, 1)

gives

….

Furthermore show that our conditions hold, since for a = 1, b = …, we get

… and ….

Conclude: there is no guarantee that the iterations converge to a solution x* of equation F(x) = 0 under the conditions in [213], whereas our results (1)-(3) guarantee convergence for the same iterations.
(5) The results obtained here can easily be extended to m-times Fréchet-differentiable operators, where m > 2 is an integer. Indeed, assume there exist constants a_2, …, a_{m+1} such that

….

Show: the results (1)-(3) hold if we replace p, p_0 by

… and ….
Chapter 5
Newton-like methods

We provide a local as well as a semilocal convergence analysis for Newton-like methods in a Banach space setting under very relaxed conditions.
5.1
Two-point methods
In this section we are concerned with the problem of approximating a locally unique solution x* of the nonlinear equation

F(x) + G(x) = 0,   (5.1.1)

where F, G are operators defined on a closed ball Ū(w, R), centered at a point w and of radius R > 0, which is a subset of a Banach space X, with values in a Banach space Y. F is Fréchet-differentiable on Ū(w, R), while the differentiability of operator G is not assumed. However, we note that the actual condition (5.1.4) (see below) used here implies the continuity (but not necessarily the differentiability) of operator G on Ū(w, R). We use the two-point Newton-like method

y_{n+1} = y_n - A(y_{n-1}, y_n)^{-1} (F(y_n) + G(y_n))  (n ≥ 0)   (5.1.2)

to generate a sequence converging to x*. Here A(x, y) ∈ L(X, Y) for each fixed x, y ∈ Ū(w, R). Special cases of method (5.1.2) have already been studied. For example: if A(x, y) = F'(y) and G(x) = 0 we obtain the Newton-Kantorovich method (4.1.3); if A(x, y) = [x, y; F] and G(x) = 0, method (5.1.2) reduces to the Secant method; if A(x, y) = F'(y), we obtain the Zincenko iteration; and for A(x, y) = A(y) (A(x) ∈ L(X, Y)) we obtain the Newton-like method studied earlier. Our motivation originates from the following observation: consider the scalar equation F(x) = x^3 - p = 0 on the interval [p, 2 - p] (p ∈ [0, 1/2)). Then for the initial guess x_0 = 1, the famous Newton-Kantorovich hypothesis, which is the sufficient condition for the convergence of Newton's method to x*, is violated.
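In scalar terms, one step of method (5.1.2) uses an operator A built from the two latest iterates. A sketch with the choice A(x, y) = F'(y) + [x, y; G] (the form used later in (5.1.46)); the equation, with its nondifferentiable part G, is our own illustration:

```python
def two_point(F, dF, G, y_prev, y0, tol=1e-12, max_iter=100):
    """Two-point Newton-like step y_{n+1} = y_n - A(y_{n-1}, y_n)^{-1} (F + G)(y_n),
    with A(x, y) = F'(y) + [x, y; G]  (divided difference of the nonsmooth part G)."""
    x, y = y_prev, y0
    for _ in range(max_iter):
        dd = (G(y) - G(x)) / (y - x) if y != x else 0.0   # [x, y; G]
        A = dF(y) + dd
        y_new = y - (F(y) + G(y)) / A
        if abs(y_new - y) < tol:
            return y_new
        x, y = y, y_new
    raise RuntimeError("no convergence")

# Solve x**2 - 2 + 0.1*|x| = 0; the positive root solves t**2 + 0.1*t - 2 = 0.
root = two_point(lambda t: t**2 - 2, lambda t: 2 * t, lambda t: 0.1 * abs(t),
                 y_prev=1.0, y0=2.0)
```

Note that only F is differentiated; G enters through divided differences alone, which is the point of allowing A to depend on two iterates.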
Therefore, under the Newton-Kantorovich theorem 4.2.4, there is no guarantee that method (4.1.3) converges to x* in this case. However, one can easily see that Newton's method indeed converges for several values of p, say p ∈ [0, .46) (see also Example 5.1.3). Therefore one expects that the Newton-Kantorovich hypothesis (4.2.51) can be weakened. In order to cover a wider range of problems and present a finer unified convergence analysis for iterative methods, we decided to introduce method (5.1.2). We provide a local as well as a semilocal convergence analysis for method (5.1.2) under very relaxed hypotheses (see (5.1.3), (5.1.4)). Since we are further motivated by optimization considerations, our new idea is to use center-Lipschitz conditions instead of Lipschitz conditions for the upper bounds on the inverses of the linear operators involved. It turns out that this way we obtain more precise majorizing sequences [263]. Moreover, despite the fact that our conditions are more general than related ones already in the literature, we can provide weaker sufficient convergence conditions and finer estimates on the distances involved. Several applications are provided: e.g., in the semilocal case we show that the Newton-Kantorovich hypothesis, famous for its simplicity and transparency (see (4.2.51)), is weakened (see (5.1.57)), whereas in the local case we can provide a larger convergence radius using the same information (see (5.1.131) and (5.1.132)).

Assume there exists a fixed v ∈ X such that v ∈ Ū(w, R), A(v, w)^{-1} ∈ L(Y, X), and for any x, y, z ∈ Ū(w, r) ⊆ Ū(w, R), r ∈ [0, R], t ∈ [0, 1], the following hold:

||A(v, w)^{-1}[A(x, y) - A(v, w)]|| ≤ θ_0(||x - v||, ||y - w||)   (5.1.3)

and

||A(v, w)^{-1}{[F'(y + t(z - y)) - A(x, y)](z - y) + G(z) - G(y)}|| ≤ θ(||y - w||, ||x - w||, ||z - y||, t) ||z - y||,   (5.1.4)

where θ_0 : [0, +∞)^2 → [0, +∞) and θ : [0, +∞)^3 × [0, 1] → [0, +∞) are continuous, nondecreasing functions on [0, +∞)^2 and [0, +∞)^3 × [0, 1], respectively. Define parameters c_{-1}, c, c_1 by

….   (5.1.5)
With the above choices, we show the following result on majorizing sequences for method (5.1.2).

Lemma 5.1.1 Assume there exist parameters η ≥ 0, δ ∈ [0, 2), r_0 ∈ [0, r] such that:

h_δ = 2 ∫_0^1 θ(r_0, η, c + η, t) dt + δ θ_0(c + c_{-1}, η + r_0) ≤ δ,   (5.1.6)
and

…   (5.1.7)-(5.1.9)

for all n ≥ 0. Then iteration {t_n} (n ≥ -1) given by

…   (5.1.10)

is nondecreasing, bounded above by

…   (5.1.11)

and converges to some t* such that

0 ≤ t* ≤ t** ≤ r.   (5.1.12)

Moreover, the following estimates hold for all n ≥ 0:

…   (5.1.13)

Note that for δ = 0, by (5.1.6) we require h_0 = 0 provided θ = 0. However, (5.1.8) is really used to show (5.1.16).

Proof. Using induction we show:

…   (5.1.14)-(5.1.16)

for all k ≥ 0. Estimate (5.1.13) then follows from (5.1.14)-(5.1.16) and (5.1.10). For k = 0, estimates (5.1.14)-(5.1.16) follow from the initial conditions. By (5.1.10) we then get

0 ≤ t_2 - t_1 ≤ (δ/2)(t_1 - t_0).   (5.1.17)
Assume (5.1.14)-(5.1.16) hold for all k ≤ n + 1. We obtain in turn

… ≤ δ  by (5.1.9) and (5.1.13).   (5.1.18)

Hence we showed that (5.1.14) holds for k = n + 2. Moreover, we show

…   (5.1.19)

Indeed, we have:

….

Assume (5.1.19) holds for all k ≤ n + 1. It follows from (5.1.13)-(5.1.16):

t_{k+2} ≤ t_{k+1} + (δ/2)(t_{k+1} - t_k) ≤ t_k + (δ/2)(t_k - t_{k-1}) + (δ/2)(t_{k+1} - t_k) ≤ … ≤ t**.   (5.1.21)

Hence, sequence {t_n} (n ≥ -1) is bounded above by t**. Estimate (5.1.15) holds for k = n + 2 by (5.1.14) and (5.1.21). Furthermore, (5.1.16) holds for k = n + 2 by (5.1.14) and (5.1.21). Finally, sequence {t_n} (n ≥ -1) is monotonically increasing by (5.1.15), bounded above by (5.1.21), and as such it converges to some t* satisfying (5.1.12). ∎
Remark 5.1.1 It follows from (5.1.10) that t** can be replaced in (5.1.11) by the finer t_0** given by

…,

since p_0 ≤ δ.
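The recursion (5.1.10) itself did not survive the scan; for the Newton specialization treated in Application 5.1.4 below, a majorizing sequence of the kind used here is t_0 = 0, t_1 = η, t_{n+2} = t_{n+1} + l (t_{n+1} - t_n)^2 / (2(1 - l_0 t_{n+1})), which involves the center-Lipschitz constant l_0 rather than l alone. A sketch checking monotonicity and convergence for sample parameter values (the values of l_0, l, η are our own):

```python
def majorizing(l0, l, eta, n_terms=60):
    """Kantorovich-type majorizing sequence built with the center constant l0:
    t_{n+2} = t_{n+1} + l*(t_{n+1} - t_n)**2 / (2*(1 - l0*t_{n+1}))."""
    t = [0.0, eta]
    for _ in range(n_terms):
        denom = 2.0 * (1.0 - l0 * t[-1])
        if denom <= 0:
            raise RuntimeError("sequence left the admissible region")
        t.append(t[-1] + l * (t[-1] - t[-2])**2 / denom)
    return t

t = majorizing(l0=0.8, l=1.0, eta=0.4)
increasing = all(a <= b for a, b in zip(t, t[1:]))
```

Since l_0 ≤ l in general, this sequence is finer (smaller term by term) than the classical one built from l alone, which is the point of Remark 5.1.1.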
We provide the main result on the semilocal convergence of method (5.1.2), using majorizing sequence (5.1.10).

Theorem 5.1.2 Under the hypotheses of Lemma 5.1.1, assume further that for each fixed pair (r, r_0):

…   (5.1.22)

and

…   (5.1.23)

Note that the existence of A(y_{-1}, y_0)^{-1} follows from θ_0(c_{-1}, c_0) < 1. Then, sequence {y_n} (n ≥ -1) generated by the Newton-like method (5.1.2) is well defined, remains in U(w, t*) for all n ≥ -1, and converges to a solution x* of equation F(x) + G(x) = 0. Moreover the following estimates hold for all n ≥ -1:

…   (5.1.24)

and

…   (5.1.25)

Furthermore, the solution x* is unique in Ū(w, t*) if

…,

and in Ū(w, R_0) for R_0 ∈ (t*, r] if

∫_0^1 θ(t*, t* + R_0, t* + R_0, t) dt + θ_0(t* + c_{-1}, t*) < 1.

Proof. We first show estimate (5.1.24), and y_n ∈ Ū(w, t*), for all n ≥ -1. For n = -1, 0, (5.1.24) follows from (5.1.5), (5.1.10) and (5.1.23). Suppose (5.1.24) holds for all n = 0, 1, …, k + 1; this implies in particular (using (5.1.5), (5.1.22))

||y_{k+1} - w|| ≤ ||y_{k+1} - y_k|| + ||y_k - y_{k-1}|| + … + ||y_1 - y_0|| + ||y_0 - w||
≤ (t_{k+1} - t_k) + (t_k - t_{k-1}) + … + (t_1 - t_0) + r_0 = t_{k+1} - t_0 + r_0 ≤ t_{k+1} ≤ t*.

That is, y_{k+1} ∈ U(w, t*). We show that (5.1.24) holds for n = k + 2. By (5.1.3) and (5.1.10) we obtain, for all x, y ∈ Ū(w, t*),
….

In particular, for x = y_k and y = y_{k+1}, we get, using (5.1.3), (5.1.5),

…   (5.1.29)

It also follows from (5.1.29) and the Banach lemma on invertible operators 1.3.2 that A(y_k, y_{k+1})^{-1} exists, and

…   (5.1.30)

where

….

Using (5.1.2), (5.1.4), (5.1.10), (5.1.30) we obtain in turn

…   (5.1.31)

which shows (5.1.24) for all n ≥ 0. Note also that

||y_{k+2} - w|| ≤ ||y_{k+2} - y_{k+1}|| + ||y_{k+1} - w|| ≤ (t_{k+2} - t_{k+1}) + (t_{k+1} - t_0) + r_0 = t_{k+2} - t_0 + r_0 ≤ t_{k+2} ≤ t*.   (5.1.32)

That is, y_{k+2} ∈ U(w, t*). It follows from (5.1.24) that {y_n} (n ≥ -1) is a Cauchy sequence in a Banach space X, and as such it converges to some x* ∈ Ū(w, t*). We can have, as above,

…   (5.1.33)

where

…   (5.1.34)

and

b_1 = ∫_0^1 θ(R - c, η, 2η, t) dt.   (5.1.35)

By letting k → ∞ in (5.1.33), using (5.1.30), and the continuity of operators F, G, we get b_0 ||A(v, w)^{-1}(F(x*) + G(x*))|| = 0, from which we obtain F(x*) + G(x*) = 0 (since b_0 > 0). Estimate (5.1.25) follows from (5.1.24) by using standard majorization techniques [99], [263]. To show uniqueness in Ū(w, t*), let y* be a solution of equation (5.1.1) in Ū(w, t*). Then as in (5.1.31) we obtain the identity:

…   (5.1.36)

Using (5.1.36) we obtain in turn:

…   (5.1.37)

That is, x* = y*, since lim y_k = y*. If y* ∈ Ū(w, R_0), then as in (5.1.37) we get

||y* - y_{k+1}|| ≤ [∫_0^1 θ(t*, R_0 + t*, R_0 + t*, t) dt / (1 - θ_0(t* + c_{-1}, t*))] ||y* - y_k|| < ||y* - y_k||.   (5.1.38)

Hence, again we get x* = y*. ∎
Remark 5.1.2 Conditions (5.1.8), (5.1.9) can be replaced by the stronger but easier to check

…   (5.1.39)

and

…   (5.1.40)

respectively. Note also that conditions (5.1.6)-(5.1.9), (5.1.39), (5.1.40) are Newton-Kantorovich-type hypotheses (see also (5.1.58)), which are always present in the study of Newton-like methods.
Remark 5.1.3 (a) Conditions similar to (5.1.3), (5.1.4) but less flexible were considered by Chen and Yamamoto in [121] in the special case when A(x, y) = A(x) for all x, y ∈ U(w, R) (A(x) ∈ L(X, Y)) (see also Theorem 5.1.6). Operator A(x) is intended there to be an approximation to the Fréchet derivative F'(x) of F. However, we also want the choice of the operator A to be more flexible, and to be related to the difference G(z) - G(y) for all y, z ∈ U(w, R) (see Application 5.1.1 and Example 5.1.1).

(b) If we choose

…

for all x, y ∈ U(w, r), q, q_1, q_2, q_3 ∈ [0, r], t ∈ [0, 1], we obtain the Newton-Kantorovich method (4.1.3) (see also Application 5.1.4).   (5.1.41)

(c) Assume:

….

Define F = 0, A(x, y) = [y, x; G], G(z) - G(y) = [z, y; G](z - y), θ(q_1, q_2, q_3, t) = ψ(q_2, q_3) q_2, where [z, y; G] is a divided difference of order one for G (see Chapter 2), and ψ is a continuous nondecreasing function of two variables. Note that this function was used in [190] in the case of the Secant method. Note also that (5.1.4) obviously implies again the continuity of operator G for this special choice of function θ.

Application 5.1.3 Let us consider some special cases. Define

…

and set

…,

where F', [·, ·; G] denote the Fréchet derivative of F and the divided difference of order one for operator G (see Chapter 1). Hence, we consider the Newton-like method (5.1.2) in the form

y_{n+1} = y_n - (F'(y_n) + [y_{n-1}, y_n; G])^{-1} (F(y_n) + G(y_n))  (n ≥ 0).   (5.1.46)

This method was studied in [117]. It is shown to be of order 1.618… (the same as the order of the Chord, or Secant, method), but higher than the order of

…   (5.1.47)

and

w_{n+1} = w_n - A(w_n)^{-1}(F(w_n) + G(w_n))  (n ≥ 0),   (5.1.48)
where A(·) is an operator approximating F'. Assume:

… and …

for some non-negative parameters l_i, i = 1, 2, 3, 4, and all x, y ∈ U(y_0, r) ⊆ U(y_0, R). Then we can define:

… and ….

If the hypotheses of Theorem 5.1.2 hold for the above choices, the conclusions follow. Note that conditions (5.1.49)-(5.1.52) are weaker than the corresponding ones in [117, pp. 48-49], [99]. Indeed, the conditions

||F'(x) - F'(y)|| ≤ l_5 ||x - y||,  ||A(x, y)^{-1}|| ≤ l_6,  ||[x, y, z; G]|| ≤ l_7,

and

…

for all x, y, z, w ∈ U(y_0, r) are used there instead of (5.1.49)-(5.1.52), where [x, y, z; G] denotes a second-order divided difference of G at (x, y, z), and l_i, i = 5, 6, 7, 8, are non-negative parameters. Let us provide an example for this case:
Example 5.1.1 Let X = Y = (R^2, ||·||_∞), with ||x||_∞ = ||(x', x'')||_∞ = max(|x'|, |x''|), F = (F_1, F_2), G = (G_1, G_2). For x = (x', x'') ∈ R^2 we take

F_1(x', x'') = …,  F_2(x', x'') = …,  G_1(x', x'') = |x' - 1|,  G_2(x', x'') = |x''|.

We shall take [x, y; G] ∈ M_{2×2}(R) as

[x, y; G]_{i,1} = (G_i(y', y'') - G_i(x', y'')) / (y' - x'),
[x, y; G]_{i,2} = (G_i(x', y'') - G_i(x', x'')) / (y'' - x''),  i = 1, 2,

provided that y' ≠ x' and y'' ≠ x''. Otherwise define [x, y; G] to be the zero matrix in M_{2×2}(R). Using method (5.1.47) with z_0 = (1, 0) we obtain the results in Table 5.1. Using the method of chord with w_{-1} = (5, 5) and w_0 = (1, 0), we obtain the results in Table 5.2.
Table 5.1: Method (5.1.47) with z_0 = (1, 0).

Table 5.2: Method of chord with w_{-1} = (5, 5) and w_0 = (1, 0).

Using our method (5.1.46) with y_{-1} = (5, 5), y_0 = (1, 0), we obtain the results in Table 5.3.
Table 5.3: Our method (5.1.46) with y_{-1} = (5, 5), y_0 = (1, 0).

We did not verify the hypotheses of Theorem 5.1.2 for the above starting points. However, it is clear that the hypotheses of Theorem 5.1.2 are satisfied for all three methods for starting points closer to the solution, chosen from the lists of the tables displayed above. Hence method (5.1.2) (i.e. method (5.1.46) in this case) converges faster than method (5.1.47), suggested by Chen and Yamamoto [121] and Zabrejko and Nguen [363], and faster than the method of chord (Secant method) in this case.

In the application that follows we show that the famous Newton-Kantorovich hypothesis (see (5.1.58)) is weakened under the same hypotheses/information.

Application 5.1.4 Returning to Remark 5.1.3 and (5.1.41), iteration (5.1.2) reduces to the famous Newton-Kantorovich method (4.1.3). Condition (5.1.6) reduces to:

…   (5.1.54)

Case 1. Let us restrict δ ∈ [0, 1]. Hypothesis (5.1.9) now becomes

…,
which is true for all k ≥ 0 by the choice of δ. Furthermore, (5.1.8) gives

….

Hence in this case conditions (5.1.6), (5.1.8) and (5.1.9) reduce only to (5.1.54), provided δ ∈ [0, 1]. Condition (5.1.54) for, say, δ = 1 reduces to:

h_1 ≤ 1,   (5.1.56)

which weakens the famous Newton-Kantorovich hypothesis (4.2.51), since

l_0 ≤ l   (5.1.57)

in general.

Case 2. It follows from Case 1 that (5.1.6), (5.1.8) and (5.1.9) reduce to (5.1.54),

…

and

…,

respectively, provided δ ∈ [0, 2).

Case 3. It turns out that the range for δ can be extended (see also Example 5.1.3). Introduce the conditions:

l_0 η ≤ 1 - δ/2,  for δ ∈ [δ_0, 2),

where

….

Indeed, the proof of Lemma 5.1.1 goes through in this case if, instead of (5.1.18), we show the weaker condition

…,

or δ ≥ δ_0, which is true by the choice of δ_0.
In the example that follows we show that l/l_0 can be arbitrarily large. Indeed:

Example 5.1.2 Let X = Y = R, x_0 = 0, and define function F on R by

F(x) = a_0 x + a_1 + a_2 sin e^{a_3 x},   (5.1.60)

where a_i, i = 0, 1, 2, 3, are given parameters. Then, with the rest of the choices of operators and functions given by (5.1.41), we see from (5.1.60) that for a_3 large and a_2 sufficiently small, l/l_0 can be arbitrarily large.

Next we provide an example, also mentioned in the introduction, to show that (5.1.57) holds but not (5.1.58).

Example 5.1.3 Let X = Y = R, x_0 = 1, D = [p, 2 - p], p ∈ [0, 1/2), and define F on D by

F(x) = x^3 - p.   (5.1.61)

Using (5.1.23), (5.1.41) and (5.1.61) we get

η = (1 - p)/3,  l = 2(2 - p),

which imply

2 l η = 4(2 - p)(1 - p)/3 > 1  for all p ∈ [0, 1/2).

That is, there is no guarantee that method (5.1.42) converges, since (4.2.51) is violated. However, since l_0 = 3 - p, we find that (5.1.56) holds on a subinterval [p̄, 1/2) of [0, 1/2), which improves (4.2.51) under the same computational cost.

Finally, using Case 3 we can do better. Indeed, choose

p = .4505.

Then we get η = .183166…, l_0 = 2.5495, l = 3.099, and δ_0 = 1.0656867. Choose δ = δ_0. Then we get

l_0 η ≤ 1 - δ_0/2.

That is, the interval [p̄, 1/2) can be extended to [.4505, 1/2).
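The computation in Example 5.1.3 can be checked numerically: with η = (1 - p)/3, l = 2(2 - p) and l_0 = 3 - p, the Kantorovich quantity 2lη exceeds 1 for every p ∈ [0, 1/2), yet Newton's method still converges. A sketch for p = .4505, the value used above:

```python
p = 0.4505
eta = (1 - p) / 3          # 0.1831666...
ell = 2 * (2 - p)          # 3.099
ell0 = 3 - p               # 2.5495

# Classical Newton-Kantorovich quantity (4.2.51): 2*l*eta > 1, so it fails.
h = 2 * ell * eta

# Newton's method for F(x) = x**3 - p nevertheless converges from x0 = 1.
x = 1.0
for _ in range(30):
    x = x - (x**3 - p) / (3 * x**2)
```

This reproduces the constants l_0 = 2.5495 and l = 3.099 quoted in the example while exhibiting convergence of the iterates despite the violated classical hypothesis.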
In order to compare with earlier results, we consider the case of single-step methods by setting x = y and v = w. We can then prove, along the same lines as Lemma 5.1.1 and Theorem 5.1.2 respectively, the following results, by assuming: there exists w ∈ X such that A(w)^{-1} ∈ L(Y, X), and for any x, y ∈ U(w, r) ⊆ U(w, R), t ∈ [0, 1]:

…

and

…,

where m_0, m_1, m play the roles of θ_0, θ_0 (one variable), and θ (three variables), respectively. Then, as in Lemma 5.1.1, we can show the following result on majorizing sequences:

Lemma 5.1.5 Assume:

…

and

… + δ m_0 …

for all n ≥ 0. Then, iteration {s_n} (n ≥ 0) given by

…

is nondecreasing, bounded above by

s** = …,

and converges to some s* such that 0 ≤ s* ≤ s**. Moreover, the following estimates hold for all n ≥ 0:

….

Remark 5.1.4 Comments similar to Remark 5.1.2 can now follow for Lemma 5.1.5.
Using m_0, m(·, ·, ·), m_1, A(·), w instead of θ_0, θ(·, ·, ·, ·), A(·, ·), (v, w) respectively, and going through the proof of Theorem 5.1.2 step by step, we can show:

Theorem 5.1.6 Under the hypotheses of Lemma 5.1.5, assume further that

….

Then, sequence {w_n} (n ≥ 0) generated by the Newton-like method (5.1.48) is well defined, remains in Ū(w, s*) for all n ≥ 0, and converges to a solution x* of equation F(x) + G(x) = 0. Moreover, the following estimates hold for all n ≥ 0:

…

and

….

Furthermore, the solution x* is unique in Ū(w, s*) if

∫_0^1 m(s*, 2s*, t) dt + m_1(s*) + m_0(s*) < 1,   (5.1.77)

or in Ū(w, R_0), if s* < R_0 ≤ r, and

∫_0^1 m(s*, R_0 + s*, t) dt + m_1(s*) + m_0(s*) < 1.
5.3 A fast method
Then, sequence {x_n} (n ≥ 0) generated by method (5.3.2) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, provided that x_{-1}, x_0 belong to U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:

…   (5.3.11)

Proof. Let us denote by L = L(x, y) the linear operator

L(x, y) = [2y - x, x].

Assume x, y ∈ U(x*, r*). We shall show that L is invertible on U(x*, r*), and

||L^{-1} F'(x*)|| ≤ [1 - a ||y - x*|| - c ||x - y||^2]^{-1} ≤ [1 - a r* - 4c(r*)^2]^{-1}.   (5.3.13)

Using (5.3.2), (5.3.4)-(5.3.10), we obtain in turn:

||F'(x*)^{-1}[F'(x*) - L]|| = ||F'(x*)^{-1}[([x*, x*] - [y, y]) + ([y, y] - [2y - x, x])]||
≤ a ||y - x*|| + c ||y - x||^2
≤ a r* + c [||y - x*|| + ||x* - x||]^2 ≤ a r* + 4c(r*)^2.

…

Remark 5.3.1 Condition

||F'(x*)^{-1}([x, y] - [u, v])|| ≤ l (||x - u|| + ||y - v||)   (5.3.19)

can be used instead of (5.3.4) and (5.3.5). Note that (5.3.19) implies (5.3.4) and (5.3.5). Moreover, we have:

…

and

….

Therefore the stronger condition (5.3.19) can replace (5.3.4), (5.3.6) and (5.3.9) in Theorem 5.3.1. Assuming F has divided differences of order two, condition (5.3.6) can be replaced by the stronger

…

or the even stronger

….

Note also that

…

and we can set

…,

despite the fact that the new constant is more difficult to compute, since we use divided differences of order two (instead of one). Conditions (5.3.19) and (5.3.23) were used in [236] to show that the method

…

converges to x* with order 1.839…, which is the positive solution of the scalar equation

t^3 - t^2 - t - 1 = 0.

Potra in [236] has also shown how to compute the Lipschitz constants appearing here in some cases. It follows from (5.3.11) that there exist c_0 and N sufficiently large such that:

||x_{n+1} - x*|| ≤ c_0 ||x_n - x*||^2  for all n ≥ N.   (5.3.28)

Hence the order of convergence for method (5.3.2) is essentially at least two, which is higher than 1.839…. Note also that the radius of convergence r* given by (5.3.9) is larger than the corresponding one given in [236, estimate (5.3.22)]. This observation is very important since it allows a wider choice of initial guesses x_{-1} and x_0. It turns out that our convergence radius r* given by (5.3.9) can even be larger than the one given by Rheinboldt [295] (see, e.g., Remark 4.2 in [277]) for Newton's method. Indeed, under condition (5.3.19) the Rheinboldt radius is given by

r*_R = 2/(3l).   (5.3.29)

We showed in Section 5.1 that l/l_0 can be arbitrarily large. Hence we can have:

r*_R < r*.   (5.3.30)

In Section 5.1 we also showed that r*_R is enlarged under the same hypotheses and computational cost as in [247]. We note that condition (5.3.10) suffices to hold only for x, y being iterates of method (5.3.2) (see, e.g., Example 5.3.2). Condition (5.3.24) can be removed if D = X; in this case (5.3.7) is also satisfied. The delicate condition (5.3.9) can also be replaced by a stronger but more practical one, which we decided not to introduce originally in Theorem 5.3.1, so as to leave the result as uncluttered and general as possible. Indeed, define the ball U_1 by

U_1 = U(x*, R*)  with  R* = 3r*.

If x_{n-1}, x_n ∈ U* (n ≥ 0), then 2x_n - x_{n-1} ∈ U_1 (n ≥ 0). This is true since it follows from the estimates

||2x_n - x_{n-1} - x*|| ≤ 2 ||x_n - x*|| + ||x_{n-1} - x*|| < 3r* = R*.

Hence the proof of Theorem 5.3.1 goes through if both conditions (5.3.7) and (5.3.9) are replaced by

U_1 ⊆ D_0.   ((5.3.7'))

We provide an example to justify estimate (5.3.30).
Example 5.3.1 Returning to Example 5.1.5, we obtain b = e - 1 and l = e. In view of (5.3.8) and (5.3.29), we get

r*_R = 2/(3e) = .24525296 < .299040145 = r*.

We can also set R* = 3r* = .897120435.
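The arithmetic of Example 5.3.1 can be verified directly: with l = e, Rheinboldt's radius (5.3.29) is 2/(3e), which is indeed smaller than the reported r*:

```python
import math

r_rheinboldt = 2.0 / (3.0 * math.e)   # Rheinboldt radius 2/(3l) with l = e
r_star = 0.299040145                  # r* as reported in Example 5.3.1
R_star = 3.0 * r_star                 # enlarged ball radius R* = 3r*
```

The strict inequality r*_R < r* is the content of estimate (5.3.30).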
We can show the following result for the semilocal convergence of method (5.3.2).

Theorem 5.3.2 Let F be a nonlinear operator defined on an open set D of a Banach space X with values in a Banach space Y. Assume:

- operator F has divided differences of order one and two on D_0 ⊆ D;
- there exist points x_{-1}, x_0 in D_0 such that 2x_0 - x_{-1} ∈ D_0 and A_0 = [2x_0 - x_{-1}, x_{-1}] is invertible on D; set A_n = [2x_n - x_{n-1}, x_{n-1}] (n ≥ 0);
- there exist constants α, β such that:

…   (5.3.31)

and

…   (5.3.32)

for all x, y, u, v ∈ D, and condition (5.3.10) holds.

Define constants γ, δ by

…;

define θ, r, h by

θ = {(α + βγ) …}^{1/2},   (5.3.36)

and

…,

where r_0 ∈ (0, r] is the unique solution of equation

…

on the interval (0, r]. Then sequence {x_n} (n ≥ -1) generated by method (5.3.2) is well defined, remains in U(x_0, r_0) for all n ≥ -1, and converges to a solution x* of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ -1:

…   (5.3.42)

and

…   (5.3.43)

where

γ_0 = α + βr_0 + βγ,  γ_1 = 3βr_0^2 - 2γ_0 r_0 - βγ^2 + 1,   (5.3.44)-(5.3.45)

and, for n ≥ 0,

…   (5.3.46)

Furthermore, if r_0 ≤ r_1 and

…,

then x* is the unique solution of equation (5.3.1) in U(x_0, r_0).
Proof. Sequence {t_n} (n ≥ -1) generated by (5.3.44)-(5.3.46) can be obtained if method (5.3.2) is applied to the scalar polynomial

f(t) = -β t^3 + γ_0 t^2 + γ_1 t.   (5.3.48)

It is simple calculus to show that sequence {t_n} (n ≥ -1) converges monotonically (decreasingly) to zero. We can have:

….

We show that (5.3.42) holds for all n ≥ -1. Using (5.3.33)-(5.3.38) and

…,

we conclude that (5.3.42) holds for n = -1, 0. Assume (5.3.42) holds for all n ≤ k and x_k ∈ U(x_0, r_0). By (5.3.10) and (5.3.42), x_{k+1} ∈ U(x_0, r_0). By (5.3.31), (5.3.32)
and (5.3.42),

||A_0^{-1}(A_0 - A_{k+1})||
= ||A_0^{-1}([2x_0 - x_{-1}, x_{-1}] - [x_0, x_{-1}] + [x_0, x_{-1}] - [x_0, x_0] + [x_0, x_0] - [x_{k+1}, x_0] + [x_{k+1}, x_0] - [x_{k+1}, x_k] + [x_{k+1}, x_k] - [2x_{k+1} - x_k, x_k])||
= ||A_0^{-1}(([2x_0 - x_{-1}, x_{-1}, x_0] - [x_0, x_{-1}, x_0])(x_0 - x_{-1}) + ([x_0, x_0] - [x_{k+1}, x_0]) + ([x_{k+1}, x_0] - [x_{k+1}, x_k]) + ([x_{k+1}, x_k] - [2x_{k+1} - x_k, x_k]))||
≤ β γ^2 + (||x_0 - x_{k+1}|| + ||x_0 - x_k|| + ||x_k - x_{k+1}||) α
≤ β γ^2 + 2(t_0 - t_{k+1}) α < 2β γ^2 + 2α r ≤ 1.   (5.3.52)

It follows by the Banach lemma on invertible operators 1.3.2 and (5.3.52) that A_{k+1}^{-1} exists, so that

…   (5.3.53)

We can also obtain

…   (5.3.54)

Using (5.3.2), (5.3.53) and (5.3.54) we get

…,   (5.3.55)

which together with (5.3.41) completes the induction. It follows from (5.3.42) that sequence {x_n} (n ≥ -1) is Cauchy in a Banach space X, and as such it converges to some x* ∈ Ū(x_0, r_0). By letting k → ∞ in (5.3.55) we obtain F(x*) = 0. Finally, to show uniqueness, define the operator
where y* is a solution of equation (5.3.1) in Ū(x_0, r_1). We can have

||A_0^{-1}(A_0 - B_0)|| ≤ α [||y* - (2x_0 - x_{-1})|| + ||x* - x_{-1}||]
≤ α [||(y* - x_0) - (x_0 - x_{-1})|| + ||(x* - x_0) + (x_0 - x_{-1})||]
≤ α [||y* - x_0|| + 2 ||x_0 - x_{-1}|| + ||x* - x_0||]
≤ α (2γ + r_0 + r_1) < 1.   (5.3.57)

It follows from the Banach lemma on invertible operators and (5.3.57) that the linear operator B_0 is invertible. We deduce from (5.3.56) and the identity

…   (5.3.58)

that x* = y*.   (5.3.59)  ∎
Remark 5.3.2 (a) It follows from (5.3.42), (5.3.43), (5.3.50) and (5.3.55) that the order of convergence of the scalar sequence {t_n} and of the iteration {x_n} is quadratic.

(b) The conclusions of Theorem 5.3.2 hold in a weaker setting. Indeed, assume:

…   (5.3.60)-(5.3.65)

for all x, y ∈ D_0. It follows from (5.3.31), (5.3.32) and (5.3.60)-(5.3.65) that

…   (5.3.66)

and

…   (5.3.67)

For the derivation of (5.3.53), we can use (5.3.60)-(5.3.62) and (5.3.65) instead of (5.3.31) and (5.3.32), respectively; of (5.3.54), we can use (5.3.63) instead of (5.3.31); of (5.3.57), we can use (5.3.65) instead of (5.3.31). The resulting majorizing sequence, call it {s_n}, also converges to zero and is finer than {t_n} because of (5.3.66) and (5.3.67). Therefore, if (5.3.10), (5.3.60)-(5.3.65) are used in Theorem 5.3.2 instead of (5.3.31), we draw the same conclusions but under weaker conditions, and the corresponding estimates are such that:

… and …

for all n ≥ 0.

(c) Condition (5.3.32) can be replaced by the stronger (not really needed in the proof) but more popular

…

for all v, u, x, y ∈ D.

(d) As already noted at the end of Remark 5.3.1, conditions (5.3.10) and (5.3.40) can be replaced by

U_2 = U(x_0, R_0) ⊆ D_0  with  R_0 = 3r_0,   ((5.3.40'))

provided that x_{-1} ∈ U_2. Indeed, if x_{n-1}, x_n ∈ U_0 (n ≥ 0), then

||2x_n - x_{n-1} - x_0|| ≤ 2 ||x_n - x_0|| + ||x_{n-1} - x_0|| < 3r_0.

That is, 2x_n - x_{n-1} ∈ U_2 (n ≥ 0).

We can also provide a posteriori estimates for method (5.3.2):

Proposition 5.3.3 Assume the hypotheses of Theorem 5.3.2 hold. Define scalar sequences {p_n} and {q_n} for all n ≥ 1 by:

…
and

….

Then the following error bounds hold for all n ≥ 1:

…   (5.3.73)

where

….

Proof. As in (5.3.52) we can have in turn:

…   (5.3.71)

It follows from (5.3.71) and the Banach lemma on invertible operators 1.3.2 that the linear operator [x_n, x*] is invertible, and

…   (5.3.76)

Using (5.3.2) we obtain the approximation:

…   (5.3.77)

By (5.3.54), (5.3.76) and (5.3.77) we obtain the left-hand side estimate of (5.3.73). Moreover, we can have in turn:

… ≤ [α(t_{n-1} - t_n)^2 + β(t_{n-1} - t_n)(t_{n-1} - t_{n-2})^2] / [1 - βγ^2 - 2α(t_0 - t_n) - α t_n] ≤ … = t_n. ∎
A simple numerical example follows to show: (a) how to choose the divided difference in method (5.3.2); (b) that method (5.3.2) is faster than the Secant method [68]

x_{n+1} = x_n - [x_{n-1}, x_n]^{-1} F(x_n)  (n ≥ 0);   (5.3.79)

(c) that method (5.3.2) can be as fast as the Newton-Kantorovich method (4.1.3). Note that the analytical representation of F'(x_n) may be complicated, which makes the use of method (5.3.2) very attractive.
Example 5.3.2 Let X = Y = R, and define function F on D_0 = D = (.4, 1.5) by

….

Moreover, define the divided difference of order one appearing in method (5.3.2) by

[2x_n - x_{n-1}, x_{n-1}] = (F(2x_n - x_{n-1}) - F(x_{n-1})) / (2(x_n - x_{n-1})).

In this case method (5.3.2) becomes

…

and coincides with method (4.1.3) applied to F. Furthermore, the Secant method (5.3.79) becomes:

….

Choose x_{-1} = .6 and x_0 = .7. Then we obtain:

n    Method (5.3.2)    Secant method (5.3.83)
1    .980434783        .96875
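In scalar form, method (5.3.2) reads x_{n+1} = x_n - F(x_n)/[2x_n - x_{n-1}, x_{n-1}; F], against the Secant step x_{n+1} = x_n - F(x_n)/[x_{n-1}, x_n; F]. The function F of Example 5.3.2 did not survive the scan, so the comparison below uses F(x) = x^3 - 1 on a nearby interval (our own choice), with the same starting points x_{-1} = .6, x_0 = .7:

```python
def dd(F, u, v):
    """Divided difference of order one, [u, v; F]."""
    return (F(u) - F(v)) / (u - v)

def fast_method(F, x_prev, x0, steps):
    """Method (5.3.2): x_{n+1} = x_n - [2x_n - x_{n-1}, x_{n-1}; F]^{-1} F(x_n)."""
    xp, x = x_prev, x0
    for _ in range(steps):
        xp, x = x, x - F(x) / dd(F, 2 * x - xp, xp)
    return x

def secant(F, x_prev, x0, steps):
    """Secant method (5.3.79)."""
    xp, x = x_prev, x0
    for _ in range(steps):
        xp, x = x, x - F(x) / dd(F, xp, x)
    return x

F = lambda x: x**3 - 1
fast = fast_method(F, 0.6, 0.7, steps=6)
sec = secant(F, 0.6, 0.7, steps=6)
```

After the same number of steps the fast method, being essentially of order two, reaches the root x* = 1 to machine precision, while the Secant iterate (order 1.618…) still carries a visible error — the behavior the table above illustrates.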
We conclude this section with an example involving a nonlinear integral equation.

Example 5.3.3 Let H(x, t, x(t)) be a continuous function of its arguments which is sufficiently many times differentiable with respect to x. It can easily be seen that if operator F in (5.3.1) is given by

…,

then the divided difference of order one appearing in (5.3.2) can be defined as

h_n(s, t) = [H(s, t, 2x_n(t) - x_{n-1}(t)) - H(s, t, x_{n-1}(t))] / [2(x_n(t) - x_{n-1}(t))],

provided that if for t = t̄ we get x_n(t̄) = x_{n-1}(t̄), then the above function equals H'_x(s, t̄, x_n(t̄)). Note that in this way h_n(s, t) is continuous for all t ∈ [0, 1].
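The divided difference h_n(s, t), with its fallback to the partial derivative H'_x in the degenerate case, can be sketched as follows (the kernel H is our own choice):

```python
import math

def h_n(H, dH, s, t, xn, xnm1):
    """Divided difference (H(s,t,2*xn - xnm1) - H(s,t,xnm1)) / (2*(xn - xnm1)),
    falling back to the partial derivative H'_x when xn(t) = xnm1(t)."""
    if xn == xnm1:
        return dH(s, t, xn)
    return (H(s, t, 2 * xn - xnm1) - H(s, t, xnm1)) / (2 * (xn - xnm1))

# Illustrative kernel H(s, t, u) = s*sin(t*u), with H'_x(s, t, u) = s*t*cos(t*u).
H = lambda s, t, u: s * math.sin(t * u)
dH = lambda s, t, u: s * t * math.cos(t * u)

exact = dH(0.5, 0.3, 1.0)
approx = h_n(H, dH, 0.5, 0.3, 1.0 + 5e-7, 1.0 - 5e-7)
```

The assertion that nearby iterate values give a divided difference close to H'_x is exactly the continuity property claimed at the end of the example.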
We refer the reader to Chapter 2 for the concepts concerning partially ordered topological spaces (POTL-spaces). The monotone convergence of method (5.3.2) is examined in the next result.

Theorem 5.3.4 Let F be a nonlinear operator defined on an open subset D of a regular POTL-space X with values in a POTL-space Y. Let x_0, y_0, y_{-1} be points of D such that:

….

Moreover assume: there exists a divided difference [·, ·] : D_0^2 → L(X, Y) such that, for all (x, y) ∈ D_0^2 with x ≤ y and 2y - x ∈ D_0,

…   (5.3.87)

and

…   (5.3.88)

Furthermore, assume that for any (x, y) ∈ D_0^2 with x ≤ y and (x, 2y - x) ∈ D_0^2, the linear operator [x, 2y - x] has a continuous non-singular, non-negative left subinverse. Then there exist two sequences {x_n} (n ≥ 1), {y_n} (n ≥ 1), and two points x*, y* of X such that, for all n ≥ 0:

…   (5.3.89)-(5.3.92)

lim_{n→∞} x_n = x*,  lim_{n→∞} y_n = y*.

Finally, if the linear operators A_n = [y_{n-1}, 2y_n - y_{n-1}] are inverse non-negative, then any solution of the equation F(x) = 0 from the interval D_0 belongs to the interval ⟨x*, y*⟩ (i.e., x_0 ≤ v ≤ y_0 and F(v) = 0 imply x* ≤ v ≤ y*).

Proof. Let Ā_0 be a continuous non-singular, non-negative left subinverse of A_0. Define the operator Q : ⟨0, y_0 - x_0⟩ → X by

….

It is easy to see that Q is isotone and continuous. We also have:

….

According to Kantorovich's theorem 2.2.2 on POTL-spaces [99], [272], operator Q has a fixed point w ∈ ⟨0, y_0 - x_0⟩. Set x_1 = x_0 + w. Then we get

….

By (5.3.88) and (5.3.94) we deduce:

….

Consider the operator H : ⟨0, y_0 - x_1⟩ → X given by

….

Operator H is clearly continuous and isotone, and we have:

….

By Kantorovich's theorem there exists a point z ∈ ⟨0, y_0 - x_1⟩ such that H(z) = z. Set y_1 = y_0 - z to obtain

….

Using (5.3.88), (5.3.95) we get:

….

Proceeding by induction, we can show that there exist two sequences {x_n} (n ≥ 1), {y_n} (n ≥ 1) satisfying (5.3.89)-(5.3.92) in a regular space X, and as such they converge to points x*, y* ∈ X respectively. We obviously have x* ≤ y*. If x_0 ≤ u ≤ y_0 and F(u) = 0, then we can write

… and ….

If the operator A_0 is inverse non-negative, then it follows that x_1 ≤ u ≤ y_1. Proceeding by induction we deduce that x_n ≤ u ≤ y_n holds for all n ≥ 0. Hence we conclude that x* ≤ u ≤ y*. ∎

In what follows we give some natural conditions under which the points x* and y* are solutions of equation F(x) = 0.
5.3. A fast method
165
Proposition 5.3.5 Under the hypotheses of Theorem 5.3.4, assume that F is continuous at x* and y*, and that one of the following conditions is satisfied:

(a) x* = y*;

(b) X is normal, and there exists an operator T : X → Y (T(0) = 0) which has an isotone inverse continuous at the origin and such that Aₙ ≤ T for sufficiently large n;

(c) Y is normal and there exists an operator Q : X → Y (Q(0) = 0) continuous at the origin and such that Aₙ ≤ Q for sufficiently large n;

(d) the operators Aₙ (n ≥ 0) are equicontinuous.

Then we deduce
Proof. (a) Using the continuity of F and (5.3.92) we get

Hence, we conclude

(b) Using (5.3.90)–(5.3.93) we get

0 ≥ F(xₙ) = Aₙ(xₙ − xₙ₊₁) ≥ T(xₙ − xₙ₊₁),
0 ≤ F(yₙ) = Aₙ(yₙ − yₙ₊₁) ≤ T(yₙ − yₙ₊₁).

Therefore, it follows:

By the normality of X and lim_{n→∞}(xₙ − xₙ₊₁) = lim_{n→∞}(yₙ − yₙ₊₁) = 0, we get lim_{n→∞} T⁻¹(F(xₙ)) = lim_{n→∞} T⁻¹(F(yₙ)) = 0. Using the continuity of F we obtain (5.3.96).

(c) As before, for sufficiently large n

By the normality of Y and the continuity of F and Q we obtain (5.3.96).

(d) It follows from the equicontinuity of the operators Aₙ that lim_{n→∞} Aₙvₙ = 0 whenever lim_{n→∞} vₙ = 0. Therefore, we get lim_{n→∞} Aₙ(xₙ − xₙ₊₁) = lim_{n→∞} Aₙ(yₙ − yₙ₊₁) = 0. By (5.3.89), (5.3.90) and the continuity of F at x* and y* we obtain (5.3.96). That completes the proof of Proposition 5.3.5. ∎
Remark 5.3.3 The hypotheses of Theorem 5.3.4 can be weakened along the lines of Remarks 5.3.1 and 5.3.2 above and the works in [277, pp. 102–105], or Chapter 2 on the monotone convergence of Newton-like methods. However, we leave the details to the motivated reader.

Remark 5.3.4 We finally note that (5.3.2) is a special case of the class of methods of the form (5.3.97), where the λₙ are real numbers depending on xₙ₋₁ and xₙ, i.e. λₙ = λ(xₙ₋₁, xₙ), chosen so that in practice, e.g., for all x, y ∈ D,

(1 + λ(x, y))y − λ(x, y)x ∈ D.  (5.3.99)

Note that setting λ(x, y) = 1 for all x, y ∈ D in (5.3.97) we obtain (5.3.2). Using (5.3.97) instead of (5.3.2), all the results obtained here can immediately be reproduced in this more general setting.
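As a concrete illustration of the two-sided monotone convergence described in Theorem 5.3.4, the following scalar sketch runs a method of the form (5.3.2) with the divided differences Aₙ = [yₙ₋₁, 2yₙ − yₙ₋₁]. The test function f(t) = t² − 2 and the starting points x₀, y₀, y₋₁ are hypothetical choices for demonstration, not data from the text.

```python
# Illustrative scalar sketch of the two-sided method (5.3.2):
#   x_{n+1} = x_n - A_n^{-1} f(x_n),   y_{n+1} = y_n - A_n^{-1} f(y_n),
# with A_n = [y_{n-1}, 2 y_n - y_{n-1}] a first-order divided difference.
# f, x0, y0 and y_{-1} are hypothetical demonstration choices.

def divided_difference(f, u, v):
    """First-order divided difference [u, v] = (f(v) - f(u)) / (v - u)."""
    return (f(v) - f(u)) / (v - u)

def monotone_enclosure(f, x0, y0, y_prev, steps=8, tol=1e-13):
    """Return the lower sequence {x_n} and the upper sequence {y_n}."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        if abs(f(x)) < tol and abs(f(y)) < tol:
            break                      # both bounds have converged
        A = divided_difference(f, y_prev, 2 * y - y_prev)
        # update all three quantities simultaneously from the old values
        x, y_prev, y = x - f(x) / A, y, y - f(y) / A
        xs.append(x)
        ys.append(y)
    return xs, ys

if __name__ == "__main__":
    f = lambda t: t * t - 2.0          # zero at sqrt(2)
    xs, ys = monotone_enclosure(f, x0=1.0, y0=2.0, y_prev=2.5)
    for x, y in zip(xs, ys):
        print(f"{x:.10f} <= sqrt(2) <= {y:.10f}")
```

The printed rows show xₙ increasing and yₙ decreasing, so each step supplies a two-sided error bound on the solution — the practical appeal of the monotone theory above.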
5.4
Exercise
5.1. Let F be a nonlinear operator defined on an open convex subset D of a Banach space X with values in a Banach space Y, and let A(x) ∈ L(X, Y) (x ∈ D). Assume: there exists x₀ ∈ D such that A(x₀)⁻¹ ∈ L(Y, X); there exist non-decreasing, non-negative functions a, b such that

‖A(x₀)⁻¹[A(x) − A(x₀)]‖ ≤ a(‖x − x₀‖),
‖A(x₀)⁻¹[F(y) − F(x) − A(x)(y − x)]‖ ≤ b(‖x − y‖)‖x − y‖

for all x, y ∈ D; there exist η ≥ 0, r₀ > η such that

and

where

and r₀ is the minimum positive root of equation h(r) = 0 on (0, r₀], where

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of equation F(x) = 0.

5.2.
(a) Let F be a Fréchet-differentiable operator defined on some closed convex subset D of a Banach space X with values in a Banach space Y; let A(x) ∈ L(X, Y) (x ∈ D). Assume: there exists x₀ ∈ D such that A(x₀) ∈ L(X, Y), A(x₀)⁻¹ ∈ L(Y, X), and

Then, show: (1) for all ε₁ > 0 there exists δ₁ > 0 such that

‖[A(x)⁻¹ − A(x₀)⁻¹]A(x₀)‖ < ε₁

for all x ∈ U(x₀, δ₁); (2) for all ε > 0 there exists δ > 0, as defined above, such that

‖[A(x)⁻¹ − A(x₀)⁻¹]A(x₀)‖ < ε

and

for all x, y ∈ U(x₀, δ).

(b) Let the operators F, A, the point x₀ ∈ D, and the parameters ε, δ be as in (1). Assume there exist α ≥ 0, c ≥ 0 such that
and

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, δ) for all n ≥ 0, and converges to a solution x* ∈ U(x₀, δ) of equation F(x) = 0. Moreover, if the linear operator

is invertible for all x, y ∈ D, then x* is the unique solution of equation F(x) = 0 in U(x₀, δ). Furthermore, the following estimates hold for all n ≥ 0
(c) Let X = Y = ℝ, D ⊇ U(0, .3), x₀ = 0,

Set A(x) = F′(x) (x ∈ D), δ₃ = δ₄ = ε₃ = ε₄ = .3. Then we obtain c₃ < 1 and .28 < δ = δ₃. The conclusions of (b) hold and
(d) Let x* ∈ D be a simple zero of equation F(x) = 0. Assume:

‖A(x*)⁻¹[F′(x) − A(x)]‖ < ε₁₁ for all x ∈ U(x*, δ₁₁),
‖[A(x)⁻¹ − A(x*)⁻¹]A(x*)‖ < ε₁₂ for all x ∈ U(x*, δ₁₂).

Set

Further, assume:

and

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x*, δ₁₃) for all n ≥ 0 and converges to x* with

‖xₙ₊₁ − x*‖ ≤ c₉‖xₙ − x*‖

for all n ≥ 0.
5.3. Consider equation (5.1.1) and iteration (5.1.48). Assume:

and

for all xₙ, x, y ∈ U(x₀, r) ⊆ U(x₀, R), t ∈ [0, 1], where wₙ(r + t) − vₙ(r)t ≥ 0 and eₙ(r) (n ≥ 0) are nondecreasing non-negative functions with wₙ(0) = vₙ(0) = eₙ(0) = 0 (n ≥ 0), the vₙ(r) are differentiable, vₙ′(r) > 0 (n ≥ 0) for all r ∈ [0, R], and the constants bₙ, cₙ satisfy bₙ ≥ 0, cₙ ≥ 0 and bₙ + cₙ < 1 for all n ≥ 0. Introduce, for all n, i ≥ 0,

aₙ = ‖A(xₙ)⁻¹(F(xₙ) + G(xₙ))‖,  p_{n,i}(r) = aᵢ − r + c_{n,i} ∫₀ʳ wₙ(t) dt,  zₙ(r) = 1 − vₙ(r) − bₙ,
ψ_{n,i}(r) = c_{n,i} ∫₀ʳ eₙ(t) dt,  c_{n,i} = zₙ(rᵢ)⁻¹,  h_{n,i}(r) = p_{n,i}(r) + ψ_{n,i}(r),  rₙ = ‖xₙ − x₀‖,  αₙ = ‖xₙ₊₁ − xₙ‖,

the equations
αₙ = r − c_{n,n} (∫₀¹ (w₀(rₙ + t) + eₙ(rₙ + t)) dt + (bₙ + cₙ − 1) r)  (5.4.3)

and the scalar iterations

s_{k+1,n} = h_{n,n}(s_{k,n} + rₙ) + c_{n,n} z_{n,n}(s_{k,n} + rₙ)  (k ≥ n);

(b) The function h_{0,0}(r) has a unique zero s₀ in the interval [0, R], and h_{0,0}(R) ≤ 0;
(c) The following estimates are true:

for all r ∈ [0, R − rₙ] and for each fixed n ≥ 0.

Then show:

i. the scalar iterations {s*_{k+1,n}} and {s**_{k+1,n}} for k ≥ 0 are monotonically increasing and converge to s**ₙ and s*ₙ for each fixed n ≥ 0, which are the unique solutions of equations (5.4.1) and (5.4.2) in [0, R − sₙ] respectively, with s*ₙ ≤ s**ₙ (n ≥ 0).

ii. Iteration (5.1.48) is well defined, remains in U(x₀, s*) for all n ≥ 0 and converges to a solution of equation (5.1.1), which is unique in U(x₀, R).

iii. The following estimates are true:

and

l**ₙ ≤ l*ₙ (n ≥ 0),

where l*ₙ and l**ₙ are the solutions of the equations (5.4.3) and (5.4.4) respectively for all n ≥ 0. The above approach shows how to improve upon the results given in [121] by Yamamoto and Chen.

iv. Define the sequence {s′ₙ} (n ≥ 0) by

Then, under the hypotheses (5.4.1)–(5.4.3) above, show that:

and

where t* = lim_{n→∞} s′ₙ.
5.4. Consider the Newton-like method (5.2.1). Let A : D → L(X, Y), x₀ ∈ D, M₋₁ ∈ L(X, Y), X ⊆ Y, L₋₁ ∈ L(X, X). For n ≥ 0 choose Nₙ ∈ L(X, X) and define

Mₙ = Mₙ₋₁Nₙ + A(xₙ)Lₙ₋₁,  Lₙ = Lₙ₋₁ + Lₙ₋₁Nₙ,  xₙ₊₁ = xₙ + Lₙ(yₙ),

yₙ being a solution of Mₙ(yₙ) = −[F(xₙ) + zₙ] for a suitable zₙ ∈ Y.

Assume:

(a) F is Fréchet-differentiable on D.

(b) There exist non-negative numbers a, a₀ and nondecreasing functions w, w₀ : ℝ₊ → ℝ₊ with w(0) = w₀(0) = 0 such that

‖F(x₀)‖ ≤ a₀,  ‖R₀(y₀)‖ ≤ a,  ‖A(x) − A(x₀)‖ ≤ w₀(‖x − x₀‖)

and

for all x, y ∈ U(x₀, R) and t ∈ [0, 1].

(c) Let M₋₁ and L₋₁ be such that M₋₁ is invertible,

‖M₋₁⁻¹‖ ≤ β,  ‖L₋₁‖ ≤ γ  and  ‖M₋₁ − A(x₀)L₋₁‖ ≤ δ.

(d) There exist non-negative sequences {aₙ}, {āₙ}, {bₙ} and {cₙ} such that for all n ≥ 0

and
(e) The scalar sequence {tₙ} (n ≥ 0) given by

tₙ₊₂ = tₙ₊₁ + eₙ₊₁dₙ₊₁(1 + cₙ₊₁) [ Σᵢ₌₁ⁿ hᵢ w(tᵢ)(tᵢ − tᵢ₋₁) + w(tₙ₊₁)(tₙ₊₁ − tₙ) ]  (n ≥ 0),  t₀ = 0,  t₁ = a,

is bounded above by a t* with 0 < t* ≤ R, where
and

(f) The following estimate is true:

εₙ ≤ ε < 1 (n ≥ 0).

Then show:

i. the scalar sequence {tₙ} (n ≥ 0) is nondecreasing and converges to a t* with 0 < t* ≤ R, and
(b) Let F : D ⊆ X → Y be a Fréchet-differentiable operator and A(x) ∈ L(X, Y). Assume: there exist a simple zero x* of F and nonnegative continuous functions α, β, γ such that

for all x ∈ D; equation

has nonnegative solutions. Denote by r* the smallest, and assume U(x*, r*) ⊆ D.

Show: under the above stated hypotheses, sequence {xₙ} (n ≥ 0) generated by the Newton-like method (5.2.1) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, provided x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:

‖xₙ₊₁ − x*‖ ≤ δₙ‖xₙ − x*‖,

where
5.6. (a) Assume: there exist parameters K > 0, M > 0, L ≥ 0, ℓ > 0, p ≥ 0, η > 0, λ₁, λ₂, λ₃ ∈ [0, 1], δ ∈ [0, 2) such that

and

where

q = δ/(1 + λ₁).

Then, show: iteration {tₙ} (n ≥ 0) given by

is non-decreasing, bounded above by

and converges to some t* such that

Moreover, the following estimates hold for all n ≥ 0:

0 ≤ tₙ₊₂ − tₙ₊₁ ≤ q(tₙ₊₁ − tₙ) ≤ qⁿ⁺¹η.
(b) Let λ₁ = λ₂ = λ₃ = 1. Assume: there exist parameters K > 0, M > 0, L > 0, ℓ > 0, p > 0, η > 0, δ ∈ [0, 1] such that

and

Then, show: iteration {tₙ} (n ≥ 0) is non-decreasing, bounded above by

t** = 2η/(2 − δ),

and converges to some t* such that

Moreover, the following estimates hold for all n ≥ 0:

0 ≤ tₙ₊₂ − tₙ₊₁ ≤ (δ/2)(tₙ₊₁ − tₙ) ≤ (δ/2)ⁿ⁺¹η.
X + Y be a F'rkchet-differentiable operator. Assume:
(1) there exist an approximation A(x) E L(X, Y) of Ft(x), an open convex subset Do of D l xo E Do, parameters q L 0, K 2 0, M 2 0, L 0, p 2 0, 1 2 0, A 1 E [O,l], Xz E [O,l], X3 E [O,1] such that:
and
+
[IA(X~)-' [ ~ ( x-) A(xo)]11 I Lllx - xollx3 1 for all x, y E DO; (2) hypotheses of (a) or (b) hold; (3)
U(xo,t*)
Do.
Then, show sequence {xn) (n 2 0) generated by Newton-like method (5.2.1) is well defined, remains in U(xo,t*) for all n 2 0 and converges to a solution ) F(x) = 0. x* E ~ ( x ~ , tof* equation Moreover the following estimates hold for all n 2 0:
and 112,
- x*II
I t* - t,.
Furthermore, the solution x* is unique in U(x₀, t*) if

[L(t*)^(1+λ₃) + M(t*)^(1+λ₂) + p] < 1,

or in U(x₀, R₀) if R₀ > t*, U(x₀, R₀) ⊆ D₀, and

(d) Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:
(a) there exist an approximation A(x) ∈ L(X, Y) of F′(x), a simple solution x* ∈ D of equation F(x) = 0, a bounded inverse A(x*)⁻¹ and parameters K, M, L, ℓ ≥ 0, λ₄, λ₅, λ₆ ∈ [0, 1] such that

and

‖A(x*)⁻¹[A(x) − A(x*)]‖ ≤ L‖x − x*‖^λ₆ + ℓ

for all x, y ∈ D;

(b) equation

K r^(1+λ₄)/(1 + λ₄) + M r^λ₅ + L r^λ₆ + ℓ + p − 1 = 0

has a minimal positive zero r₀ which also satisfies

L r₀^λ₆ + ℓ < 1

and

Then, show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x*, r₀) for all n ≥ 0, and converges to x* provided that x₀ ∈ U(x*, r₀). Moreover, the following estimates hold for all n ≥ 0:

‖xₙ₊₁ − x*‖ ≤ [1 − L‖xₙ − x*‖^λ₆ − ℓ]⁻¹ [K/(1 + λ₄) ‖xₙ − x*‖^λ₄ + M‖xₙ − x*‖^λ₅ + p] ‖xₙ − x*‖.
5.7. Let F be a nonlinear operator defined on an open convex subset D of a Banach space X with values in a Banach space Y, and let A(x) ∈ L(X, Y) (x ∈ D). Assume:

(a) there exists x₀ ∈ D such that A(x₀)⁻¹ ∈ L(Y, X);

(b) there exist non-decreasing, non-negative functions a, b such that

‖A(x₀)⁻¹[A(x) − A(x₀)]‖ ≤ a(‖x − x₀‖),
‖A(x₀)⁻¹[F(y) − F(x) − A(x)(y − x)]‖ ≤ b(‖x − y‖)‖x − y‖

for all x, y ∈ D;

(c) there exist η ≥ 0, r₀ > η such that

and d(r) < 1 for all r ∈ (0, r₀], where

and

(d) r₀ is the minimum positive root of equation h(r) = 0 on (0, r₀], where

Then show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of equation F(x) = 0.
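The Newton-like iteration (5.2.1), xₙ₊₁ = xₙ − A(xₙ)⁻¹F(xₙ), can be sketched numerically as follows. The 2×2 test system, the choice of A as the Jacobian frozen at x₀ (the "modified Newton" approximation), and the tolerances are all illustrative assumptions, not data from the exercises.

```python
# Sketch of a Newton-like method x_{n+1} = x_n - A^{-1} F(x_n) in the
# spirit of (5.2.1), with A = F'(x0) frozen at the starting point.
# F, x0 and the tolerances are hypothetical demonstration choices.

def F(v):
    x, y = v
    return (x * x + y * y - 1.0, x - y)   # root at (sqrt(2)/2, sqrt(2)/2)

def jacobian(v):
    x, y = v
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))

def solve2(A, b):
    """Solve the 2x2 linear system A s = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det)

def newton_like(F, A, x0, tol=1e-12, max_iter=100):
    """Iterate x <- x - A^{-1} F(x) with a fixed operator A."""
    x = x0
    for _ in range(max_iter):
        fx = F(x)
        if max(abs(fx[0]), abs(fx[1])) < tol:
            break
        s = solve2(A, fx)
        x = (x[0] - s[0], x[1] - s[1])
    return x

if __name__ == "__main__":
    x0 = (0.8, 0.6)
    print(newton_like(F, jacobian(x0), x0))   # frozen A = F'(x0)
```

Freezing A trades the quadratic rate of full Newton for one Jacobian factorization, which is exactly the kind of approximation the conditions on a, b above are designed to control.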
5.8. (a) Let F : U(z, R) ⊆ X → Y be a Fréchet-differentiable operator for some z ∈ X, R > 0, and A(x) ∈ L(X, Y). Assume:

A(z)⁻¹ ∈ L(Y, X) and, for any x, y ∈ U(z, r) ⊆ U(z, R),
where w(r + t) − w₁(r) (t ≥ 0), w₁(r) and w₂(r) are non-decreasing, non-negative functions with w(0) = w₀(0) = w₁(0) = w₂(0) = 0, w₀ is differentiable, w₀′(r) > 0 for r ∈ [0, R],

and the parameters a, b satisfy

Define functions φ₁, φ₂, φ by

and iteration {rₙ} (n ≥ 0) by

where φ₀ is the minimal value of φ on [0, R];
Then show: iteration {xₙ} (n ≥ 0) generated by Newton-like method (5.1.48) is well defined, remains in U(z, r*) for any

and converges to a solution x* ∈ U(z, r₀*), which is unique in

where r* is the minimal point, r₀* is the unique zero on (0, r*], and r* = lim_{n→∞} rₙ. Moreover, sequence {rₙ} (n ≥ 0) is monotonically increasing and converges to r*. Furthermore, the following estimates hold for all n ≥ 0
provided that r₀ ∈ Rₘ, where for x ∈ D(r*)

and

In the next result we show how to improve on the error bounds.

(b) Under the hypotheses of (a), show the conclusions hold with {rₙ} and D(r*) replaced by {tₙ} (n ≥ 0), D(t*) given by

(n ≥ 1),

t* = lim_{n→∞} tₙ.

Moreover, iteration {tₙ} is monotonically increasing and converges to t*. Furthermore, the following hold for all n ≥ 1:

and

t* ≤ r*.

If strict inequality holds in (5.4.5), so does it in (5.4.6)–(5.4.8). If w₀(r) = w₁(r), r ∈ [0, R], our result reduces to Theorem 1 in [121]. If (5.4.5) holds, our error bounds are at least as fine as the ones given by Chen and Yamamoto (under the same hypotheses and a more general condition); moreover, our error bounds are finer.

5.9. (a) Let F : D ⊆ X → Y be differentiable. Assume:
There exist functions f₁ : [0, 1] × [0, ∞)³ → [0, ∞), f₂, f₃ : [0, ∞) → [0, ∞), nondecreasing in their arguments, such that

hold for all t ∈ [0, 1] and x, y ∈ D; equation

has non-negative solutions; denote by r₀ the smallest one. In addition, r₀ satisfies

U(x₀, r₀) ⊆ D, and

where

b₀ = [∫₀¹ f₁(t, η, 0, η) dt + f₂(0)] / [1 − f₃(η)],

b₁ = [∫₀¹ f₁(t, b₀η, η, η + b₀η) dt + f₂(η)] / [1 − f₃(η + b₀η)]

and

b = b(r) = [∫₀¹ f₁(t, b₁b₀η, r, r) dt + f₂(r)] / [1 − f₃(r)].

Then, show: iteration {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of the equation. Moreover, the following estimates hold:

‖x₂ − x₁‖ ≤ b₀η,
‖x₃ − x₂‖ ≤ b₁‖x₂ − x₁‖,
‖xₙ₊₁ − xₙ‖ ≤ b₂‖xₙ − xₙ₋₁‖  (n ≥ 3)

and

‖x* − xₙ‖ ≤ [b₀b₁b₂ⁿ⁻²/(1 − b₂)] η  (n ≥ 3),  ‖x* − xₙ‖ ≤ [bⁿ/(1 − b)] η  (n ≥ 0),

where b₂ = b(r₀).
5.4. Exercise Furthermore if r o satisfies rl
lo
$1
(t, 2 ~ 0TO, , TO)dt
+ f2 (TO)+ f3 (TO)< 1,
x* is the unique solution of equation F(x) = 0 in U (xo,ro) . Finally if there exists a minimum non-negative number R satisfying equation
1'
fl
(t, r
+ r ~ , r) dt +
such that U (xo,R)
TO,
+
fi (TO) fi (ro) =
1,
D, then the solution x* is unique in U (xo,R).
(b) There exist a simple zero x* of F and continuous functions f₄ : [0, 1] × [0, ∞) → [0, ∞), f₅, f₆ : [0, ∞) → [0, ∞), non-decreasing on [0, ∞), such that

and

hold for all t ∈ [0, 1] and x ∈ D; equation

∫₀¹ f₄(t, r) dt + f₅(r) + f₆(r) = 1

has a minimum positive zero r*.

Then, show: iteration {xₙ} (n ≥ 0) generated by Newton-like sequence (5.2.1) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0

where
(c) Let X = Y = ℝ, D = (−1, 1), x* = 0 and define function F on D by

For the case A = F′ show

where r_R is the convergence radius given by Rheinboldt [295] (see Section 4.1).

(d) Let X = Y = C[0, 1], the space of continuous functions defined on [0, 1] equipped with the max-norm. Let D = {φ ∈ C[0, 1] : ‖φ‖ ≤ 1} and F be defined on D by

with solution φ*(x) = 0 for all x ∈ [0, 1]. In this case, for each φ ∈ D, F′(φ) is a linear operator defined on D by the following expression:

In this case, and again considering A = F′,
5.10. Consider inexact Newton methods for solving equation F(x) = 0, F : D ⊆ ℝᴺ → ℝᴺ, of the general form

where sₙ ∈ ℝᴺ satisfies the equation

F′(xₙ) sₙ = −F(xₙ) + rₙ  (n ≥ 0)

for some sequence {rₙ} ⊆ ℝᴺ. Let F ∈ F_A(a) = {F : U(x*, a) → ℝᴺ with U(x*, a) ⊆ D, where F(x*) = 0, F is Fréchet-differentiable on U(x*, a), F′(x)⁻¹ exists for all x ∈ U(x*, a), F′ is continuous on U(x*, a), and there exists γ > 0 such that for all y, z ∈ U(x*, a)

Suppose the inexact Newton iterates {xₙ} (n ≥ 0) satisfy

for some {vₙ} ⊆ ℝ (n ≥ 0). If x₁ ∈ U(x*, a), then show:

(a) ‖xₙ₊₁ − x*‖ ≤ wₙ‖xₙ − x*‖;

(b) if vₙ ≤ v < 1 (n ≥ 0), then there exists w ∈ (0, 1), given by

such that ‖xₙ₊₁ − x*‖ ≤ w‖xₙ − x*‖ and lim_{n→∞} xₙ = x*;

(c) if {xₙ} (n ≥ 0) converges to x* and lim_{n→∞} vₙ = 0, then {xₙ} converges Q-superlinearly;

(d) if {xₙ} (n ≥ 0) converges to x* and lim_{n→∞} vₙ^{(1+λ)⁻ⁿ} < 1, then {xₙ} converges with R-order at least 1 + λ.
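A minimal sketch of the inexact Newton scheme of Exercise 5.10: the Newton equation F′(xₙ)sₙ = −F(xₙ) + rₙ is solved only up to a residual rₙ controlled by a forcing term vₙ. Here the inexact linear solve is emulated by setting rₙ = vₙF(xₙ) (exactly at the tolerance boundary); the scalar F and the forcing sequence are hypothetical illustrations.

```python
# Sketch of an inexact Newton method: each step solves
# F'(x_n) s_n = -F(x_n) + r_n only approximately, with the residual
# controlled by a forcing term ||r_n|| <= v_n ||F(x_n)||.
# F, dF and the forcing rule below are hypothetical examples.

def inexact_newton(F, dF, x0, forcing, tol=1e-12, max_iter=60):
    x, history = x0, [x0]
    for n in range(max_iter):
        fx = F(x)
        if abs(fx) < tol:
            break
        v_n = forcing(n, fx)          # v_n <= v < 1: linear convergence (part (b))
        r_n = v_n * fx                # emulated residual of the linear solve
        s_n = (-fx + r_n) / dF(x)     # approximate Newton correction
        x += s_n
        history.append(x)
    return x, history

if __name__ == "__main__":
    F = lambda t: t ** 3 - 2.0        # root at 2**(1/3)
    dF = lambda t: 3.0 * t ** 2
    # forcing term shrinking with the residual => superlinear (part (c))
    x, hist = inexact_newton(F, dF, 1.5,
                             forcing=lambda n, fx: min(0.5, abs(fx)))
    print(x, "in", len(hist) - 1, "steps")
```

Choosing vₙ → 0, as in the example, recovers the Q-superlinear behavior of part (c); a constant vₙ = v < 1 would only give the linear rate of part (b).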
5.11. Consider the two-point method [144]:

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice-Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;

(2) ‖Γ₀F(x₀)‖ ≤ η;

(3) ‖F″(x)‖ ≤ M (x ∈ Ω);

(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖, x, y ∈ Ω.

Denote a₀ = Mβη, b₀ = Kβη². Define the sequences

aₙ₊₁ = aₙ f(aₙ)² g_p(aₙ, bₙ),  bₙ₊₁ = bₙ f(aₙ)³ g_p(aₙ, bₙ)²;

then show: iteration {xₙ} (n ≥ 0) is well defined, remains in U(x₀, 2η/(2 − a₀)) for all n ≥ 0 and converges to a solution x* of equation F(x) = 0, which is unique in U(x₀, 2/(Mβ) − 2η/(2 − a₀)). Moreover, the following estimates hold for all n ≥ 0:

where Δ = f(a₀)⁻¹.
5.12. Consider the two-step method:

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;

(2) ‖Γ₀F(x₀)‖ ≤ η;

(3) ‖F′(x) − F′(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Denote a₀ = Kβη and define the sequence aₙ₊₁ = aₙ f(aₙ)² g(aₙ) (n ≥ 0), where f(x) = 2/(2 − 2x − x²) and g(x) = x²(x + 4)/8. If a₀ ∈ (0, ½), U(x₀, Rη) ⊆ Ω, and Δ = f(a₀)⁻¹, then show: iteration {xₙ} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of equation F(x) = 0, which is unique in U(x₀, 2/(Kβ) − Rη) ∩ Ω. Moreover, the following estimates hold for all n ≥ 0
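Since the displayed formulas of the two-step method did not survive reproduction, the sketch below uses one representative two-step scheme of this type, reusing the derivative F′(xₙ) in both substeps; the concrete scheme, the test function and the starting point are illustrative assumptions, not the exercise's exact method.

```python
# A representative two-step Newton scheme of the kind in Exercise 5.12,
# reusing F'(x_n) in both substeps:
#     y_n     = x_n - F'(x_n)^{-1} F(x_n)
#     x_{n+1} = y_n - F'(x_n)^{-1} F(y_n)
# F, dF and x0 below are hypothetical demonstration choices.
import math

def two_step_newton(F, dF, x0, tol=1e-14, max_iter=50):
    x = x0
    for _ in range(max_iter):
        d = dF(x)                 # one derivative evaluation per step
        y = x - F(x) / d          # first (Newton) substep
        x_new = y - F(y) / d      # second substep with the same derivative
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    F = lambda t: math.exp(t) - 2.0        # root at ln 2
    dF = lambda t: math.exp(t)
    print(two_step_newton(F, dF, 1.0))
```

The appeal of such schemes is that two corrections are obtained per derivative evaluation, which is why the majorizing sequences above track both substeps.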
5.13. Let X be a Banach space, and let Y be a closed subspace. Assume F is a completely continuous operator defined on D ⊆ X, D an open set, and assume the values F(x) ∈ Y for all x ∈ D. Let {Xₙ} be a sequence of finite-dimensional subspaces of X such that

inf_{x ∈ Xₙ} ‖y − x‖ → 0 as n → ∞ for all y ∈ Y.

Let {Pₙ} be a sequence of projections associated with the Xₙ:

Assume that, when restricted to Y, the projections are uniformly bounded:

Then the projection method for solving

x = F(x)

becomes

Suppose that x* ∈ D is a fixed point of nonzero index for F. Then show: for all sufficiently large n, the equation PₙF(x) = x has at least one solution xₙ ∈ Xₙ ∩ D such that

lim_{n→∞} xₙ = x*.
Let F : U(x₀, r) ⊆ X → X be differentiable and continuous on U(x₀, r), such that I − F′(x) is compact. Suppose L ∈ L(X) is such that

‖LF(x₀)‖ ≤ a,  ‖I − LF′(x₀)‖ ≤ b < 1,  ‖L[F′(x) − F′(y)]‖ ≤ c‖x − y‖ for all x, y ∈ U(x₀, r),

h = ac/(1 − b)² ≤ ¼,  r₀ = ((1 − b)/c) f(h) ≤ r, where f(h) = (1 − √(1 − 4h))/2.

Then show:

(a) equation F(x) = x has a solution x* in U(x₀, r₀); x* is unique in U(x₀, r) if, for r₁ = ((1 − b)/c) f₁(h), where f₁(h) = (1 + √(1 − 4h))/2,

r < r₁ for h < ¼;  r ≤ r₁ for h = ¼.

(b) Furthermore, Newton's method is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to x*.
5.14. Let the Fibonacci-type sequence {aₙ} be defined by a·aₙ + b·aₙ₋₁ + c·aₙ₋₂ = 0, where a₀ = 0 and a₁ = 1. If the characteristic polynomial p(x) = ax² + bx + c has zeros r₁ and r₂ with |r₁| > |r₂|, then show:

(a) aₙ ≠ 0 (n > 0);

(b) lim_{n→∞} aₙ₊₁/aₙ = r₁;

(c) Newton(aₙ₊₁/aₙ) = a₂ₙ₊₁/a₂ₙ and (d) secant(aₙ₊₁/aₙ, aₙ/aₙ₋₁) = a₂ₙ/a₂ₙ₋₁,

where

xₙ = Newton(xₙ₋₁) = xₙ₋₁ − p(xₙ₋₁)/p′(xₙ₋₁)  (n ≥ 1)

and

xₙ = secant(xₙ₋₁, xₙ₋₂) = xₙ₋₁ − p(xₙ₋₁)(xₙ₋₁ − xₙ₋₂)/(p(xₙ₋₁) − p(xₙ₋₂)).
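The identities in parts (c) and (d) can be checked in exact rational arithmetic for the classical Fibonacci case a = 1, b = −1, c = −1, p(x) = x² − x − 1: a Newton step doubles the index of the convergent ratio, and a secant step adds the indices of its two arguments.

```python
# Exact check of Exercise 5.14(c),(d) for the Fibonacci case
# p(x) = x^2 - x - 1 (a = 1, b = -1, c = -1), where {a_n} are the
# Fibonacci numbers a_0 = 0, a_1 = 1, a_n = a_{n-1} + a_{n-2}.
from fractions import Fraction

def p(x):  return x * x - x - 1
def dp(x): return 2 * x - 1

def newton(x):
    return x - p(x) / dp(x)

def secant(x1, x2):
    return x1 - p(x1) * (x1 - x2) / (p(x1) - p(x2))

a = [0, 1]
for _ in range(20):
    a.append(a[-1] + a[-2])

if __name__ == "__main__":
    for n in range(2, 8):
        r = Fraction(a[n + 1], a[n])
        # Newton doubles the index of the ratio ...
        assert newton(r) == Fraction(a[2 * n + 1], a[2 * n])
        # ... while secant adds the indices of its two arguments.
        assert secant(r, Fraction(a[n], a[n - 1])) == Fraction(a[2 * n], a[2 * n - 1])
    print("identities verified")
```

Using `Fraction` keeps every ratio exact, so the checks are genuine identities rather than floating-point coincidences; the same computation with general a, b, c would verify the statement of the exercise in full.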
Chapter 6
More Results on Newton's Method

We exploit the weaker conditions introduced in Chapter 5 to improve on recent results on the local and semilocal convergence analysis of Newton's method.
6.1
Convergence radii for the Newton-Kantorovich method
Motivated by the works of Wang [341], [343], and using the same type of weak Lipschitz conditions, we show that a larger convergence radius can be obtained for the Newton-Kantorovich method (4.1.3). Recently in [341], [343] a Lipschitz-type condition

(6.1.1)

for all x ∈ D, where L is an integrable, nondecreasing, nonnegative function, s(x) = ‖x − x*‖ and x_θ = x* + θ(x − x*), θ ∈ [0, 1], was used to obtain larger convergence radii than the ones given by the relevant works in [136], [340], [312]. Condition (6.1.1) is weaker than the famous Lipschitz condition

(6.1.2)

where w is also a nondecreasing, nonnegative function. Condition (6.1.2) has been used, e.g., in [295] (when w is a constant function). It turns out that (6.1.1), instead of the stronger (6.1.2), is really all that is needed to show the local convergence of method (4.1.3) to x* [341], [343]. Based on this observation it became possible to obtain convergence radii larger than the ones obtained in [136], [295], [340], [312]. Here we show that all the above-mentioned results can be further improved under weaker or the same hypotheses and at less or the same computational cost. A numerical example is also provided where our results compare favorably with all earlier ones. Several areas of application are also suggested.
We provide the following local convergence theorems:
Theorem 6.1.1 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:

(a) there exist a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and positive integrable nondecreasing functions L, L₀ such that the radius Lipschitz condition with L average (6.1.3) and the center radius Lipschitz condition with L₀ average (6.1.4) are satisfied for all x ∈ D, θ ∈ [0, 1];

(b) equation

∫₀ʳ L(u)u du + r ∫₀ʳ L₀(u) du − r = 0  (6.1.5)

has positive zeros. Denote by r* the smallest such zero; and

U(x*, r*) ⊆ D.  (6.1.6)

Then, sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 1:

‖xₙ − x*‖ ≤ aⁿ ‖x₀ − x*‖,  (6.1.7)

where

and

Proof. We first show a ∈ [0, 1). Indeed, for 0 < u₁ ≤ u₂, we can have in turn by the monotonicity of L:
Therefore (1/u) ∫₀ᵘ L(v) dv is a nondecreasing function of u. Then, we can have
Let x ∈ U(x*, r*). Using (6.1.4) and (6.1.6) we get
It follows from (6.1.10) and the Banach lemma on invertible operators 1.3.2 that F′(x)⁻¹ ∈ L(Y, X) and
In particular, x₀ ∈ U(x*, r*) by hypothesis. Let us assume xₙ ∈ U(x*, r*). We obtain the identity

By (6.1.3), (6.1.4) and (6.1.11) we get
which shows the convergence of xₙ to x* and xₙ₊₁ ∈ U(x*, r*). Hence, sequence {xₙ} (n ≥ 0) remains in U(x*, r*) by the principle of mathematical induction.
Moreover, s(xₙ) decreases monotonically. Furthermore, by (6.1.13) we obtain, more precisely,

from which estimate (6.1.7) follows. ∎
Remark 6.1.1 In general

L₀(u) ≤ L(u), u ∈ [0, R],  (6.1.15)

holds, and L/L₀ can be arbitrarily large (see Section 5.1). If equality holds in (6.1.15) then our theorem coincides with the corresponding Theorem 3.1 in [341]. However, if strict inequality holds in (6.1.15) (see Example 5.1.5), then our convergence radius r* is greater than the corresponding radius r in Theorem 3.1. Moreover, our convergence rate a is smaller than the corresponding rate q in Theorem 3.1. Summarizing, if

then

That is, our theorem provides a wider range of initial guesses x₀ and more precise error estimates on the distances ‖xₙ − x*‖ (n ≥ 0) than Theorem 3.1, under the same computational cost, since in practice the evaluation of the function L requires that of L₀. For the case where L₀, L are constants, see Section 5.1.
Theorem 6.1.2 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:

(a) there exists a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and a positive integrable function L₀ such that condition (6.1.4) holds;

(b) equation

has a minimum positive zero, denoted by r₀*; and

(c)

Then sequence {yₙ} (n ≥ 0) generated by

remains in U(x*, r₀*) for all n ≥ 0 and converges to the unique solution x* of equation (4.1.1), so that

where a₀ ∈ [0, 1) and

Proof. Simply use the function L₀ instead of L in the proof of Theorem 4.1 in [341]. ∎
Remark 6.1.2 If L₀ = L then Theorem 6.1.2 reduces to Theorem 4.1 in [341]. Moreover, if (6.1.19) holds, then

and

where by r₀, q₀ we denote the radius and ratio found in Theorem 4.1. In the case when L, L₀ are constants, as expected, our results improve Corollaries 6.3 and 6.4 in [342], which in turn compare favorably with the results obtained by Smale [312] and Dedieu [139] respectively, when F is analytic and satisfies

provided that

and
for some γ₀ > 0, γ > 0 with

γ₀ ≤ γ.  (6.1.29)

For example, our result corresponding to Corollary 6.4 gives

whereas the corresponding radius in [342] is given by

Therefore (6.1.24) and (6.1.25) hold (if strict inequality holds in (6.1.29)).

We can provide a local convergence result for twice Fréchet-differentiable operators as an alternative to Theorem 6.1.1.

Theorem 6.1.3 Let F : D ⊆ X → Y be a twice Fréchet-differentiable operator. Assume:

(a) there exist a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and nonnegative constants ℓ, b, ℓ₀ such that
‖F′(x*)⁻¹[F″(x* + θ(x − x*)) − F″(x*)]‖ ≤ ℓ(1 − θ)‖x − x*‖,  (6.1.32)

‖F′(x*)⁻¹F″(x*)‖ ≤ b,  (6.1.33)

and

(6.1.34)

for all x ∈ D, θ ∈ [0, 1];

(b) U(x*, R*) ⊆ D, where

d = 3b/2 and c = ℓ/3 + ℓ₀/2.

Then, sequence {xₙ} generated by method (4.1.3) is well defined, remains in U(x*, R*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R*). Moreover, the following error bounds hold for all n ≥ 0:

and
Proof. Let x ∈ U(x*, R*). Using (6.1.33), (6.1.34) and the definition of R* we obtain in turn
F′(x*)⁻¹[F′(x*) − F′(x)] = −F′(x*)⁻¹[F′(x) − F′(x*) − F″(x*)(x − x*)] − F′(x*)⁻¹F″(x*)(x − x*)

and

‖F′(x*)⁻¹[F′(x*) − F′(x)]‖ ≤ ∫₀¹ ℓθ‖x − x*‖² dθ + b‖x − x*‖.

It follows from (6.1.40) and the Banach lemma on invertible operators that F′(x)⁻¹F′(x*) exists and

In particular, by hypothesis, x₀ ∈ U(x*, R*). Assume xₙ ∈ U(x*, R*). Using F(x*) = 0 we obtain the identity

By (6.1.32), (6.1.33), (6.1.41), (6.1.42) and the triangle inequality we get
Estimate (6.1.44) implies xₙ₊₁ ∈ U(x*, R*) and lim_{n→∞} xₙ = x*. ∎

The analog of Theorem 6.1.2 is given by
Theorem 6.1.4 Under the hypotheses of Theorem 6.1.3 (excluding (6.1.46)), and with R*, h replaced by R₀*, h₀ respectively, given by

h₀ = c₀R₀* + d₀,  (6.1.47)

where

c₀ = ℓ₀/6 and d₀ = b/2,

the conclusions of Theorem 6.1.3 hold for iteration (6.1.21).

Proof. Using iteration (6.1.21) we obtain the identity

By (6.1.33), (6.1.34), (6.1.48) and the definition of R₀*, h₀ we obtain (as in (6.1.43))

which completes the proof of Theorem 6.1.4. ∎
Remark 6.1.3 (a) Returning to Example 5.1.5 and using (6.1.32)–(6.1.36), (6.1.45)–(6.1.47), we obtain

ℓ = ℓ₀ = e − 1,  b = 1,  d = 1.5,  c = (5/6)(e − 1),  c₀ = (e − 1)/6,  d₀ = .5,

and we can set

Comparing with the earlier results mentioned above (see Example 5.1.5), we see that the convergence radius is significantly enlarged under our new approach. Note also that the order of convergence of iteration (6.1.21) is quadratic, whereas it is only linear in the corresponding Theorem 4.1 in [341].

(b) The results can easily be extended along the same lines to the case when ℓ, ℓ₀ are not constants but functions.

(c) The results obtained here can also be extended to hold for m-Fréchet-differentiable operators (m ≥ 2) along the lines of our works in [79], [68]. However, we leave the details of (b) and (c) to the motivated reader.

We present a new convergence theorem in which the hypotheses (see Theorem 6.1.1) that the functions L, L₀ are nondecreasing are not assumed:

Theorem 6.1.5 Let F, x*, L, L₀ be as in Theorem 6.1.1, but without assuming that the functions L, L₀ are nondecreasing. Moreover, assume:

(a) equation

has positive solutions. Denote by R* the smallest one;

(b) U(x*, R*) ⊆ D.  (6.1.51)

Then sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in U(x*, R*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R*), so that:

where

and b ∈ [0, 1). Furthermore, if the function Lₐ given by

is nondecreasing for some a ∈ [0, 1], and R* satisfies
Then method (4.1.3) converges to x* for all x₀ ∈ U(x*, R*) and

where

Proof. Exactly as in the proof of Theorem 6.1.1 we arrive at

which shows that if xₙ ∈ U(x*, R*) then xₙ₊₁ ∈ U(x*, R*) and lim_{n→∞} xₙ = x*. Estimate (6.1.13) follows from (6.1.20). Furthermore, if the function Lₐ is given by (6.1.15) and R* satisfies (6.1.16), then by Lemma 2.2 in [343, p. 408], for the nondecreasing function

we get

which completes the proof of Theorem 6.1.5. ∎

Theorem 6.1.6 Let F, x*, L₀ be as in Theorem 6.1.2, but without assuming that the function L₀ is nondecreasing. Moreover, assume:

(a) equation

has positive solutions. Denote by R₀* the smallest one;

(b) U(x*, R₀*) ⊆ D.

Then, sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in
U(x*, R₀*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R₀*). Moreover, error bound (6.1.13) holds for all n ≥ 0, where
Proof. Simply use L₀ instead of L in the corresponding proof of Theorem 1.2 in [343, p. 409]. ∎

Remark 6.1.4 (a) In general (6.1.16) holds. If equality holds in (6.1.16) then Theorems 6.1.5 and 6.1.6 reduce to Theorems 1.1 and 1.2 in [343] respectively. However, if strict inequality holds in (6.1.16), then clearly: our convergence conditions (6.1.11), (6.1.24) are weaker than the corresponding ones (9) and (15) in [343]; the convergence radii R* and R₀* are larger than the corresponding ones in [343], denoted here by ρ and ρ₀; and finally, our convergence rates are smaller. For example, in the case where the functions L₀, L are constants we obtain

b = LR*/(1 − L₀R*),  ρ₀ = 1/(3L)  and  R₀* = 1/(3L₀).

Therefore we have:

and

where q is the corresponding ratio in Theorem 1.1 in [343]. Note that if strict inequality holds in (6.1.26), so does it in (6.1.28)–(6.1.30). Moreover, the computational cost of finding L₀, L in practice is the same as that of finding just L. As an example, as in Section 5.1, consider the real function F given by (5.1.40). Then, using (6.1.3) and (6.1.4), we obtain

L₀ = e − 1 < e = L,

and

Hence, we have:
and
(c) As already noted in [343, p. 411], and for the same reasons, Theorem 6.1.5 is an essential improvement over Theorem 6.1.4 if the radius of convergence is ignored (see Example 3.1 in [343]).
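A small numerical check of the radius comparison above for F(x) = eˣ − 1 with x* = 0, where L₀ = e − 1 and L = e. The closed forms below assume the constant-function specializations: r* = 2/(2L₀ + L) from equation (6.1.5), and R₀* = 1/(3L₀) versus ρ₀ = 1/(3L) as stated in Remark 6.1.4; the factor 0.99 and the iteration count are illustrative.

```python
# Radii comparison for F(x) = exp(x) - 1, x* = 0, with L = e, L0 = e - 1
# (the constant values obtained from (6.1.3) and (6.1.4) in the text).
# The constant-L closed forms below are assumptions derived from (6.1.5).
import math

L0 = math.e - 1.0
L = math.e

r_star = 2.0 / (2.0 * L0 + L)    # Theorem 6.1.1, constant L, L0
r_wang = 2.0 / (3.0 * L)         # corresponding radius of Theorem 3.1 in [341]
R0_star = 1.0 / (3.0 * L0)       # Theorem 6.1.6, constant L0
rho0 = 1.0 / (3.0 * L)           # corresponding radius in [343]

def newton(x, steps=25):
    """Newton's method for exp(x) - 1 = 0."""
    for _ in range(steps):
        x = x - (math.exp(x) - 1.0) / math.exp(x)
    return x

if __name__ == "__main__":
    print(f"r*  = {r_star:.6f} > {r_wang:.6f} (Wang)")
    print(f"R0* = {R0_star:.6f} > {rho0:.6f} (rho0)")
    # starting points just inside the enlarged radii still converge to 0:
    print(newton(0.99 * r_star), newton(-0.99 * r_star))
```

The run confirms both orderings and that Newton's method converges from starting points inside the enlarged balls — including points the smaller classical radii do not cover.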
6.2
Convergence under weak continuous derivative
We provide a semilocal convergence analysis of Newton's method (4.1.3) under weaker Lipschitz conditions for approximating a locally unique solution of equation (4.1.1) in a Banach space. For p E [O, 11 consider the following Holder condition:
In Rokne [303] and Argyros [11], the Hölder continuity condition replaced (6.2.1), or, in Argyros [93], the more general condition

was used, where v is a nondecreasing function on [0, R] for some given R. In order to combine the Kantorovich condition with Smale's α-criterion [296], [312] for the convergence of method (4.1.3) (when F is an analytic operator), Wang in [341], [343] used the weak Lipschitz condition

instead of (6.2.1). Huang in [195] noticed that (6.2.3) is a special case of the generalized Kantorovich condition

\| F'(x_0)^{-1}[F'(x) - F'(y)] \| \le \int_{s(\|x - x_0\|)}^{s(\|x - x_0\| + \|x - y\|)} L(u)\,du \qquad (6.2.4)

for all x ∈ U1, y ∈ U2, where U1 = Ū(x0, t*), U2 = Ū(x0, t* − ‖x − x0‖), and t* ∈ (0, R) is the minimum positive simple zero of
with the functions L, s continuous and increasing, satisfying L(u) ≥ 0 on [0, R], s(0) = 0, and s′(t) ≥ 0 on [0, R), respectively. Under condition (6.2.4) a semilocal convergence analysis was provided, and examples were given where the generalized
Kantorovich condition (6.2.4) can be used to show convergence of method (4.1.3) but (6.2.1) cannot. Here, we provide a semilocal convergence analysis based on a combination of condition (6.2.6) (which is weaker than (6.2.4)) and on the center generalized Kantorovich condition (6.2.7). The advantages of such an approach have already been explained in Section 5.1. We need the following definitions:

Definition 6.2.1 (Weak Generalized Kantorovich Condition). We say x0 satisfies the weak generalized Kantorovich condition if for all θ ∈ [0, 1], x ∈ U1, y ∈ U2

holds, where the functions s1, L1 are as s, L, respectively.

Definition 6.2.2 (Center Generalized Kantorovich Condition). We say x0 satisfies the center generalized Kantorovich condition if for all x ∈ U1

holds, where the functions s0, L0 are as s and L, respectively. Note that in general s0(u) ≤ s1(u) ≤ s(u) and
hold. In case strict inequality holds in (6.2.8) or (6.2.9), the results obtained in [195] can be improved under (6.2.6), or even more under the weaker condition (6.2.7). Indeed, we can show the following semilocal convergence theorem for method (4.1.3). We first define functions f0, f1 by

f_0(t) = \int_0^{s_0(t)} L_0(u)\,(t - s_0^{-1}(u))\,du - t + \eta,

f_1(t) = \int_0^{s_1(t)} L_1(u)\,(t - s_1^{-1}(u))\,du - t + \eta,

and sequences {a_n}, {t_n} by

a_{n+1} = a_n - \frac{f_1(a_n)}{f_0'(a_n)} \quad (n \ge 0), \quad (a_0 = 0), \qquad (6.2.12)

t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)} \quad (n \ge 0), \quad (t_0 = 0). \qquad (6.2.13)
6. More Results on Newton's Method
Theorem 6.2.3 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume: there exists x0 ∈ D such that F′(x0)^{−1} ∈ L(Y, X); and condition (6.2.5) holds. Then the sequence {x_n} generated by method (4.1.3) is well defined, remains in U(x0, a*) for all n ≥ 0 and converges to a unique solution x* of equation F(x) = 0 in U(x0, a*), where a* is the limit of the monotonically increasing sequence {a_n} given by (6.2.12). Moreover, the following estimates hold:

\|x_{n+1} - x_n\| \le a_{n+1} - a_n \le t_{n+1} - t_n,

\|x_n - x^*\| \le a^* - a_n \le t^* - t_n,

and

a_n \le t_n, \qquad a^* \le t^*,

where r ∈ [a*, t**] satisfies f(r) ≤ 0, t** = max{t : f(t) ≤ 0}, and δ0 = a*/r.

Theorem 6.2.5 Assume the center generalized Kantorovich condition (6.2.7) holds, with the sequence {b_n} and its limit b* given by (6.2.27). Then the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined, remains in Ū(x0, b*) for all n ≥ 0, and converges to a solution x* ∈ Ū(x0, b*) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
and

where the iteration {b_n} (n ≥ 0) is given by (6.2.27). The solution x* is unique in U(x0, b*) if

Furthermore, if there exists R0 such that

and

\int_0^1 \big[w(b^* + \theta(b^* + R_0)) - w(b^*)\big]\,d\theta + w_0(b^*) \le 1,

then the solution x* is unique in U(x0, R0).

Proof. Using induction we show that x_k ∈ Ū(x0, b*) and estimate (6.2.32) hold for all k ≥ 0. For n = 0, (6.2.32) holds, since

Suppose (6.2.32) holds for all n = 0, 1, ..., k + 1. We show (6.2.32) holds for n = k + 2. By (6.2.7) and (6.2.25) we get
It follows from (6.2.37) and the Banach Lemma on invertible operators 1.3.2 that F′(x_{k+1})^{−1} exists and

By (4.1.3), (6.2.6), and (6.2.23) we obtain

Using (4.1.3), (6.2.38) and (6.2.39) we get

Note also that:

That is, x_{k+2} ∈ Ū(x0, b*). It follows from (6.2.40) that {x_n} (n ≥ 0) is a Cauchy sequence in the Banach space X, and as such it converges to some x* ∈ Ū(x0, b*). By letting k → ∞ in (6.2.39) we obtain F(x*) = 0. Moreover, estimate (6.2.33) follows from (6.2.32) by using standard majorization techniques. Furthermore, to show uniqueness, let y* be a solution of equation F(x) = 0 in U(x0, R0). It follows from (4.1.3)
Similarly, if y* ∈ U(x0, b*), we arrive at
\|y^* - x_{k+1}\| \le \frac{\int_0^1 \big[w(b^* + \theta b^*) - w(b^*)\big]\,d\theta}{1 - w_0(b^*)}\;\|y^* - x_k\|.

In both cases above, \lim_{k\to\infty} x_k = y^*. Hence, we conclude
x* = y*. ∎

We can now compare the majorizing sequences {t_n}, {a_n}, {b_n} under the hypotheses of Theorem 1 in [195, p. 117] and of Theorems 6.2.3 and 6.2.5 above:
Lemma 6.2.6 Under the hypotheses stated above, if strict inequality holds in (6.2.7) or (6.2.8), then the following estimates

and

b^* \le a^* \le t^*

hold for all n ≥ 0.

Proof. It follows easily by using induction, the definitions of the sequences {t_n}, {a_n}, {b_n} given by (6.2.13), (6.2.12), (6.2.27), respectively, and hypotheses (6.2.8), (6.2.9). ∎
Remark 6.2.2 The above lemma justifies the advantages of our approach claimed at the beginning of the section.
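The majorization principle behind the estimates of this section ('the Newton step in X never exceeds the step of a scalar majorizing sequence') can be illustrated numerically. The sketch below uses the classical Kantorovich majorant for a sample scalar equation of our own choosing, not an example from the text:

```python
import math

# Sample equation F(x) = exp(x) - 2, x0 = 0.8 (our choice):
# eta bounds |F'(x0)^{-1} F(x0)| and K bounds the scaled Lipschitz
# constant of F' on U(x0, 0.5).
F = lambda x: math.exp(x) - 2.0
dF = lambda x: math.exp(x)
x0 = 0.8
eta = abs(F(x0) / dF(x0))
K = math.exp(0.5)                       # sup |F''(x)/F'(x0)| on U(x0, 0.5)
assert K * eta < 0.5                    # Kantorovich condition h = K*eta <= 1/2

f = lambda t: 0.5 * K * t * t - t + eta  # Kantorovich majorizing function
df = lambda t: K * t - 1.0
x, t = x0, 0.0
for _ in range(5):
    x_new = x - F(x) / dF(x)            # Newton step for F
    t_new = t - f(t) / df(t)            # Newton step for the majorant f
    # the real step never exceeds the majorizing step
    assert abs(x_new - x) <= (t_new - t) + 1e-12
    x, t = x_new, t_new
```

Here x converges to ln 2 while t increases monotonically to the smallest zero t* of f, mirroring the roles of {x_n} and the scalar sequences above.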
6.3
Convergence and universal constants
We improve the elegant work by Wang, Li and Lai in [342], where they gave a unified convergence theory for several Newton-type methods by using different Lipschitz conditions and a finer "universal constant". It is convenient to define the majorizing function f and majorizing sequence {t_n} by

and

respectively, where η, ℓ, L are nonnegative constants.
Proposition 6.3.1 [342, p. 207] Let
and
Then, (a) function f is monotonically decreasing in [0, r], while it is monotonically increasing in [r, +∞). Moreover, if η ≤ b then

Therefore f has a unique zero in each of the two intervals, denoted by t* and t** respectively, satisfying

when η < b, and t* = t** when η = b; (b) the iteration {t_n} is monotonically increasing and converges to t*.

From now on we set D = Ū(x0, r). We can define a more precise majorizing sequence {s_n} (n ≥ 0) by

s_0 = 0, s_1 = η, for some constant ℓ0 ∈ [0, ℓ]. The iteration {t_n} (n ≥ 0) can also be written as
Then, under the hypotheses of Proposition 6.3.1, using (6.3.6), (6.3.7) and induction on n, we can easily show:

Proposition 6.3.2 Under the hypotheses of Proposition 6.3.1, the sequence {s_n} is monotonically increasing and converges to some s* ∈ (η, t*]. Moreover, the following estimates hold:

and

s^* \le t^*,

provided that L > 0 if ℓ0 = ℓ.
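To make Proposition 6.3.2 concrete, here is a numerical sketch comparing the two majorizing sequences in the constant, first-derivative case (L = 0). The recurrences below are assumed from the Section 5.1 pattern: the classical sequence uses ℓ everywhere, while the finer one replaces ℓ by the center constant ℓ0 ≤ ℓ in the denominator; the numbers are ours:

```python
# Assumed constant-case recurrences (Section 5.1 pattern, L = 0):
#   t_{n+2} = t_{n+1} + l *(t_{n+1}-t_n)**2 / (2*(1 - l *t_{n+1}))   (l only)
#   s_{n+2} = s_{n+1} + l *(s_{n+1}-s_n)**2 / (2*(1 - l0*s_{n+1}))   (l0 <= l)
l, l0, eta = 1.0, 0.6, 0.4
t = [0.0, eta]
s = [0.0, eta]
for n in range(20):
    t.append(t[-1] + l * (t[-1] - t[-2]) ** 2 / (2.0 * (1.0 - l * t[-1])))
    s.append(s[-1] + l * (s[-1] - s[-2]) ** 2 / (2.0 * (1.0 - l0 * s[-1])))
# the center-Lipschitz sequence is at least as tight, term by term
assert all(sn <= tn + 1e-15 for sn, tn in zip(s, t))
assert s[-1] < t[-1]            # and its limit s* is strictly below t* here
```

With ℓη = 0.4 ≤ 1/2 the classical sequence converges to t* = 1 − √0.2, and the finer sequence stays below it at every index, which is the content of s_n ≤ t_n and s* ≤ t*.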
Let x0 ∈ D be such that F′(x0)^{−1} ∈ L(Y, X). We now need to introduce some Lipschitz conditions:

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \ell_0 \|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.12)

\|F'(x_0)^{-1}[F'(y) - F'(x)]\| \le \big[\ell + L(\|x - x_0\| + \tfrac{1}{2}\|y - x\|)\big]\|y - x\|, for all x, y with ‖y − x‖ + ‖x − x0‖ ≤ r, (6.3.13)

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \big[\ell + \tfrac{L}{2}\|x - x_0\|\big]\|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.14)

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \ell \|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.15)

\|F'(x_0)^{-1}F''(x_0)\| \le \ell, (6.3.16)

\|F'(x_0)^{-1}F''(x_0)\| \le \ell_0, (6.3.17)

\|F'(x_0)^{-1}[F''(y) - F''(x)]\| \le L\|y - x\|, for all x, y with ‖y − x‖ + ‖x − x0‖ ≤ r, (6.3.18)

\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le L\|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.19)

and

\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le L_0\|x - x_0\|, for all x ∈ Ū(x0, r). (6.3.20)

Note that we will use (6.3.12), (6.3.17), (6.3.20) instead of (6.3.15), (6.3.16) and (6.3.19), respectively, in our proofs. We define, as in [342]:

K^{(1)}(x0, ℓ) = {F ∈ C¹(x0, r) : (6.3.13) holds for L = 0},
K^{(1)}_{cen}(x0, ℓ) = {F ∈ C¹(x0, r) : (6.3.15) holds},
K^{(1)}(x0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.13) holds},
K^{(1)}_{cen}(x0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.14) holds},
K^{(2)}(x0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.16) and (6.3.18) hold},
K^{(2)}_{cen}(x0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.16) and (6.3.20) hold}.

Moreover, for our study we need to define:

K^{(1)}(x0, ℓ0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.12) and (6.3.13) hold},
K^{(1)}_{cen}(x0, ℓ0) = {F ∈ C¹(x0, r) : (6.3.12) holds},
K^{(2)}(x0, ℓ0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.12), (6.3.17) and (6.3.18) hold, or (6.3.12), (6.3.16) and (6.3.18) hold}.

We can now provide the main semilocal convergence result for method (4.1.3).
Theorem 6.3.3 Suppose that F ∈ K^{(2)}(x0, ℓ0, ℓ, L). If η ≤ b, then the sequence {x_n} generated by method (4.1.3) is well defined, remains in Ū(x0, s*) for all n ≥ 0, and converges to a solution x* ∈ U(x0, r) of equation F(x) = 0.

Proof. Simply use (6.3.12) instead of (6.3.15) in Lemma 2.3 in [342] to obtain that F′(x)^{−1} exists for all x ∈ Ū(x0, r) and satisfies

The rest follows exactly as in Theorem 3.1 in [342], but replacing {t_n}, t*, (6.3.11) by {s_n}, s*, (6.3.21), respectively. ∎

Theorem 6.3.4 Suppose that F ∈ K^{(1)}_{cen}(x0, ℓ0). If η ≤ b, then the equation F(x) = 0 has a unique solution x* in Ū(x0, s*_0), where s*_0 = s* if η = b, and s*_0 ∈ [s*, t*) if η < b.

Proof. As in Theorem 3.2 in [342], but using (6.3.12) instead of (6.3.14). ∎

By Proposition 6.3.2 we see that our results in Theorems 6.3.3 and 6.3.4 improve the corresponding ones in [342], as claimed in the introduction, in the case ℓ0 ≤ ℓ. At this point we wonder if the hypothesis η ≤ b can be weakened by finding a larger "universal constant" b. It turns out that this is possible. We first need the following lemma on majorizing sequences:

Lemma 6.3.5 Assume there exist nonnegative constants ℓ0, ℓ, L, η and a parameter δ ∈ [0, 2) such that for all n ≥ 0

Then, the iteration {s_n} given by (6.3.7) is monotonically increasing, bounded above by

s^{**} = \frac{2\eta}{2 - \delta},

and converges to some s* with s* ≤ s**. Moreover, the following estimates hold for all n ≥ 0:
Proof. The proof follows along the lines of Lemma 5.1.1. However, there are some differences. Using induction on the integer k ≥ 0, we must show:
0 \le s_{k+1} - s_k, \qquad (6.3.26)

and

0 < 1 - \ell_0 s_{k+1}. \qquad (6.3.27)
Estimates (6.3.25)-(6.3.27) hold for k = 0 by (6.3.22). But then (6.3.6) gives
Let us assume (6.3.24)–(6.3.27) hold for all k ≤ n. Then we have in turn:
We must also show

s_k \le s^{**}. \qquad (6.3.30)

For k = 0, 1, 2 we have s_0 = 0 ≤ s**, s_1 = η ≤ s**, and

s_2 = \eta + \frac{\delta}{2}\,\eta \le s^{**}.
Assume (6.3.30) holds for all k ≤ n. It follows from the induction hypothesis that
Moreover, (6.3.27) follows from (6.3.22) and (6.3.31). Inequality (6.3.26) follows from (6.3.6), (6.3.27) and (6.3.25). Hence, the sequence {s_n} is monotonically increasing, bounded above by s**, and as such it converges to some s* satisfying (6.3.23). ∎

Remark 6.3.1 Condition (6.3.22) can be replaced by a stronger but easier to verify condition
where,
Indeed, (6.3.22) certainly holds if

since

holds for all n ≥ 0 for δ in the indicated subinterval of [0, 2). However, (6.3.37) holds by (6.3.32). Hence, we have shown:
Lemma 6.3.6 If (6.3.32) replaces condition (6.3.22) in Lemma 6.3.5, then the conclusions for the sequence {s_n} hold.

Remark 6.3.2 In Lemma 6.3.5 we wanted to leave (6.3.22) as uncluttered as possible, since it can be shown to hold in some interesting cases. For example, consider Newton's method, i.e. set L = 0; see Application 5.1.4 in Section 5.1. Note also that b0 can be larger than b, which means that (6.3.22) may hold but not η ≤ b. Indeed, here are two such cases: choose (a) L = 1, ℓ0 = ℓ = .5 and η = .59; then b = .583 and b0 = .590001593; and (b) L = .8, ℓ = .4 and η = .7; then b = .682762423 and b0 = .724927596.

Below is the main semilocal convergence theorem for Newton's method involving Fréchet-differentiable operators and using center-Lipschitz conditions.
Theorem 6.3.7 Assume: (a) F ∈ K^{(1)}(x0, ℓ0, ℓ, L), and (b) the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined, remains in Ū(x0, s*) for all n ≥ 0, and converges to a solution x* ∈ Ū(x0, s*) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
and

where the sequence {s_n} (n ≥ 0) is given by (6.3.6). The solution x* is unique in Ū(x0, s*) if

\ell_0 s^* < 1. \qquad (6.3.41)

Furthermore, if there exists R > s* such that U(x0, R) ⊆ D, and
then the solution x* is unique in U(x0, R).

Proof. Let us prove that

and

hold for all k ≥ 0. For every x ∈ Ū(x1, s* − s1),

implies x ∈ Ū(x0, s* − s0). Since also

\|x_1 - x_0\| = \|F'(x_0)^{-1}F(x_0)\| \le \eta = s_1 - s_0,

(6.3.43) and (6.3.44) hold for k = 0. Given that they hold for n = 0, 1, ..., k, then
and
Using (4.1.3) we obtain the approximation

Then, we get by (6.3.12), (6.3.13) and (6.3.45)

Using (6.3.12) we get

\|F'(x_0)^{-1}[F'(x_{k+1}) - F'(x_0)]\| \le \ell_0\|x_{k+1} - x_0\| \le \ell_0 s_{k+1} \le \ell_0 s^* < 1 \quad (\text{by } (6.3.22)). \qquad (6.3.47)

It follows from the Banach Lemma on invertible operators 1.3.2 and (6.3.22) that F′(x_{k+1})^{−1} exists, so that:

\|F'(x_{k+1})^{-1}F'(x_0)\| \le \big[1 - \ell_0\|x_{k+1} - x_0\|\big]^{-1} \le (1 - \ell_0 s_{k+1})^{-1}. \qquad (6.3.48)

Therefore, by (6.3.2), (6.3.46), and (6.3.48) we obtain in turn

Thus for every z ∈ Ū(x_{k+2}, s* − s_{k+2}) we have

That is,
Estimates (6.3.49), (6.3.50) imply that (6.3.43), (6.3.44) hold for n = k + 1. The proof of (6.3.43) and (6.3.44) is now complete by induction. Lemma 6.3.5 implies {s_n} (n ≥ 0) is a Cauchy sequence. From (6.3.43), (6.3.44), {x_n} (n ≥ 0) becomes a Cauchy sequence too, and as such it converges to some x* ∈ Ū(x0, s*) such that (6.3.40) holds. The combination of (6.3.40) and (6.3.46) yields F(x*) = 0. Finally, to show uniqueness, let y* be a solution of equation F(x) = 0 in U(x0, R). It follows from (6.3.12) and the estimate

\le \frac{\ell_0}{2}(s^* + R) \le 1 \quad (\text{by } (6.3.42)),

and the Banach Lemma on invertible operators, that the linear operator

is invertible. Using the identity
we deduce x* = y*. Similarly, we show uniqueness in Ū(x0, s*) using (6.3.41). ∎

As in [342], we also study the semilocal convergence of King's iteration [208]

where [n/2] denotes the integer part of n/2, and of Werner's method [345]

Both methods improve the computational efficiency, with (6.3.53)–(6.3.55) being the more preferable because it raises the convergence order to 1 + √2, although the number of function evaluations doubles when compared with Newton's method. Using (6.3.12) instead of (6.3.14) or (6.3.19) to compute upper error bounds on ‖F′(x)^{−1}F′(x0)‖ (see e.g. (6.3.48)), we can show, as in Theorems 4.1 and 4.2 in [342], respectively:
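A scalar sketch of Werner's two-step scheme: one derivative evaluation at the midpoint of x_n and y_n is reused for both sub-steps. This concrete form is an assumption based on the description above (the display (6.3.53)–(6.3.55) itself is not reproduced here), and the cubic f is our own sample:

```python
# Werner-type scheme of order 1 + sqrt(2): per step, evaluate f' once at the
# midpoint (x + y)/2 and reuse it for the corrector of both x and y.
f = lambda x: x**3 - 2.0
df = lambda x: 3.0 * x**2
x = 1.5
y = x - f(x) / df(x)                 # y_0 via an ordinary Newton step
for _ in range(8):
    m = df((x + y) / 2.0)            # the single derivative evaluation
    x_new = x - f(x) / m
    y = x_new - f(x_new) / m         # reuse the same midpoint derivative
    x = x_new
assert abs(x - 2.0 ** (1.0 / 3.0)) < 1e-12
```

Despite only one derivative per step, the pair (x_n, y_n) converges superquadratically to the cube root of 2, which is the efficiency point made in the text.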
Theorem 6.3.8 Let F ∈ K^{(1)}(x0, ℓ0, ℓ, L) and assume the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} generated by King's method (6.3.52) converges to a unique solution x* of equation F(x) = 0 in Ū(x0, s*).

Theorem 6.3.9 Let F ∈ K^{(2)}(x0, ℓ0, ℓ, L) and assume the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} generated by Werner's method (6.3.53)–(6.3.55) converges to a unique solution x* of equation F(x) = 0 in Ū(x0, s*).
Note that finer error bounds and more precise information on the location of the solution x* are obtained again in the cases of Theorems 6.3.8 and 6.3.9, since more precise majorizing sequences are used.
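For intuition, here is a scalar sketch of a derivative-reuse iteration of King's type. The form below, reusing F'(x_{2[n/2]}) for two successive steps, is an assumption consistent with the [n/2] indexing of (6.3.52), not necessarily the exact display; f is our sample:

```python
# King-type derivative reuse: a fresh derivative every second step.
f = lambda x: x**2 - 2.0
df = lambda x: 2.0 * x
x = 2.0
for n in range(10):
    if n % 2 == 0:
        d = df(x)          # fresh derivative at x_{2[n/2]}
    x = x - f(x) / d       # reuse d on the following step
assert abs(x - 2.0 ** 0.5) < 1e-12
```

Each derivative serves two steps, trading a slightly lower order per step for fewer derivative evaluations, the efficiency trade-off discussed above.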
6.4
A Deformed Method
We assume that F is a twice continuously Fréchet-differentiable operator in this section. We use the following modification of King's iteration [208] proposed by Werner [345]:

Although the number of function evaluations increases by one when compared to Newton's method, the order of convergence is raised from 2 to 1 + √2 [345]. Here, in particular, using more precise majorizing sequences than before, we provide finer error bounds on the distances ‖x_{n+1} − x_n‖, ‖x_n − x*‖, and more precise information on the location of the solution x*. Finally, we suggest an approach for weakening even further the sufficient convergence condition (6.4.8) for this method, along the lines of the work in Section 5.1. Let β ≥ 0, γ > 0, and γ0 > 0 be given parameters. It is convenient to define the real function f by
and scalar sequences {t_n}, {r_n} (n ≥ 0) by

t_{n+1} = t_n - f'(s_n)^{-1} f(t_n) \quad (n \ge 0), \qquad t_0 = r_0 = 0, \qquad (6.4.4)
where,
and
We need the following lemma on majorizing sequences:
Lemma 6.4.1 Assume:

and

Then the iterations {t_n}, {t̄_n} are well defined, and the following estimates hold:

where

\bar t^* = \lim_{n\to\infty} \bar t_n \qquad \text{and} \qquad t^* = \lim_{n\to\infty} t_n.
Proof. Estimates (6.4.10) and (6.4.19) follow using only (6.4.8) (see Lemma 2 in [177, p. 69]). The rest of the proof follows immediately by using induction on n and noticing that

Indeed, (6.4.20) reduces to showing:

which is true by the choice of t̄. ∎

Remark 6.4.1 It follows from Lemma 6.4.1 that if {t̄_n} is shown to be a majorizing sequence for {x_n} instead of {t_n} (shown to be so; see Theorem 1 in [179]), then we can obtain finer estimates on the distances

and at least as precise information on the location of the solution x*. We also note, as can be seen in the main semilocal convergence theorem that follows, that all this is achieved under the same hypotheses and computational cost:

Theorem 6.4.2 Let F : D ⊆ X → Y be a thrice continuously Fréchet-differentiable operator. Assume there exist x0 ∈ D, β ≥ 0, γ > 0, γ0 > 0 such that F′(x0)^{−1} ∈ L(Y, X),
for all
\bar U\Big(x_0, \Big(1 - \frac{\sqrt 2}{2}\Big)\frac{1}{\gamma}\Big) \subseteq D,
and the hypotheses of Lemma 6.4.1 hold. Then the iterations {x_n}, {z_n} (n ≥ 0) generated by method (6.4.2) are well defined, remain in Ū(x0, t*) for all n ≥ 0, where t* is given by (6.4.19), and converge to a unique solution x* of equation F(x) = 0 in Ū(x0, (1 − √2/2)/γ). Moreover, the following estimates hold:

and
Proof. We only need to show that the left-hand side inequalities in (6.4.27) and
(6.4.28) hold. The rest follows from Lemma 2.1, standard majorization techniques and Theorem 1 in [179]. Let x ∈ Ū(x0, t*). It follows from (6.4.9), (6.4.19) and (6.4.23) that
Using the Banach Lemma on invertible operators [6] and (6.4.31), we conclude that F′(x)^{−1} exists and
By (6.4.2), (6.4.3)–(6.4.7), (6.4.20) above and (21)–(23) in Theorem 1 in [179, p. 70] we arrive at

(with r̄_n, t̄_n replacing r_n, t_n, respectively, in the estimate above (23) in [179]). Note also that (6.4.27) holds for n = 0, and by the induction hypothesis

Therefore the left-hand side of (6.4.27) holds for n + 1. We also have

Hence, the same is true for y_{n+1}. Moreover, by (24), (25) in [179], and (6.4.20) and (6.4.32) above, we have

= \bar t_{n+2} - \bar r_{n+1}.

Hence, the left-hand side of (6.4.28) holds for n + 1. The rest of the proof follows exactly as in Lemma 6.4.1, with r̄_n, t̄_n, t̄* replacing r_n, t_n and t*, respectively. ∎
Remark 6.4.2 (a) Let us denote by {t̃_n} (n ≥ 0) the scalar sequence generated as in (6.4.6), (6.4.7), with each quantity replaced by its tilde counterpart. It follows from the proof of Theorem 6.4.2 that {t̃_n} is also a majorizing sequence for {x_n}, so that

\tilde t_n \le t_n, \qquad \tilde r_n \le r_n \qquad \text{and} \qquad \tilde t^* \le t^*,

where

\tilde t^* = \lim_{n\to\infty} \tilde t_n.

(b) It is expected that, since the sequence {t̃_n} is finer than {t_n}, condition (6.4.8) can be dropped as a sufficient convergence condition for {t̃_n} and be replaced by a weaker one. We do not pursue this idea here. However, we suggest that such a condition can be found by working along the lines of the more general majorizing sequences studied in Section 5.1.
6.5
The Inverse Function Theorem
We provide a semilocal convergence analysis for Newton's method in a Banach space setting using the inverse function theorem. Using a combination of a Lipschitz as well as a center-Lipschitz type condition, we provide weaker sufficient convergence conditions, finer error bounds on the distances involved, and more precise information on the location of the solution than before [340]. Moreover, our results are obtained under the same or less computational cost, as already indicated in Section 5.1.

Let x0 ∈ D be such that U(x0, r) ⊆ D for some r > 0, and F′(x0)^{−1} ∈ L(Y, X). The inverse function theorem, under the above stated conditions, guarantees the existence of ε > 0 and of a local inverse F^{−1} : U(F(x0), ε) ⊆ Y → X such that F^{−1} is differentiable, and

F(F^{-1}(z)) = z \qquad \text{for all } z \in U(F(x_0), \varepsilon).

In particular, in [341] the center-Lipschitz condition (6.5.3)
was used, where s(x) = ‖x − x0‖ and L is a positive integrable function on (0, r], to first study the semilocal convergence of the modified Newton method. Then, using the same function L and the Lipschitz condition

for all x ∈ U(x0, r), y ∈ U(x, r − s(x)), where s(x, y) = s(x) + ‖y − x‖ ≤ r, the semilocal convergence of Newton's method (4.1.3) was studied in a unified way with the corresponding modified Newton method. We assume instead of (6.5.3):
By the Banach Lemma on invertible operators 1.3.2 and (6.5.5),

for all r0 ∈ (0, r] and x ∈ U(x0, r0), F′(x)^{−1} exists, and

with r0 satisfying

\int_0^{r_0} L_0(u)\,du < 1.
So
rO O
Lo(u). Assume condition (6.5.5) holds for all r
Ff(xo)-l exists and is differentiable on
and the radius of the ball is the best possible (under condition (6.5.5)).
2 r:.
Lemma 6.5.2 Let

where R0 satisfies

If β ∈ (0, b0), then f0 is monotonically decreasing on [0, r̄_0], while it is monotonically increasing on [r̄_0, R0], so that

Moreover, f0 has a unique root in each of the two intervals, denoted by r_1^0 and r_2^0, satisfying:

If (6.5.5) is extended to the boundary, i.e.

then, according to Lemma 6.5.2, Theorem 6.5.1 implies a more precise result:
Proposition 6.5.3 If

and

then the local inverse F^{−1} exists and is differentiable on the corresponding ball centered at F(x0);

F′(x)^{−1} satisfies (6.5.7), and the corresponding radius is as small as possible.
Proposition 6.5.4 Assume r ∈ [r_1^0, r_2^0], β ∈ (0, b0), and that condition (6.5.14) holds. Then

In particular, if γ = 0 and β = ‖F′(x0)^{−1}F(x0)‖, we get:

Theorem 6.5.5 Assume r ∈ [r_1^0, r_2^0] if β < b0, or r = r_1^0 if β = b0. Then equation F(x) = 0 has a unique solution x* satisfying (6.5.19) in the ball U(x0, r).

The error bounds obtained this way are finer than the ones in [340], which use the majorizing sequence {t_n} instead of the more precise {s_n}. We also note that our results are obtained under the same computational cost as the ones in [340], since in practice finding L requires finding L0. The rest of the error bounds in [340] can also be improved by simply using L0 in (6.5.7) instead of L; see Section 5.1. Instead, we wonder if the convergence of the scalar sequence {s_n} can be established under weaker conditions than the ones given in Theorem 6.5.6.
We showed in Section 5.1 that {s_n} is monotonically convergent to s* if:

(A) The functions L0, L are constants; then the sequence {s_n} given by (6.5.28) becomes

The advantages in this case have been explained in Section 5.1.

(B) There exists a minimal positive number R1 ∈ [β, R0] satisfying the equation

\beta + \frac{\int_0^{R_1} L(u)\,du}{1 - \int_0^{R_1} L_0(u)\,du}\, R_1 = R_1.

Moreover, R1 satisfies

\int_0^{R_1} L_0(u)\,du < 1.

Indeed, if s_{n+1} ≤ R1, then

s_{n+2} \le s_{n+1} + \frac{\int_{s_n}^{s_{n+1}} L(u)\,du}{1 - \int_0^{s_{n+1}} L_0(u)\,du}\,(s_{n+1} - s_n) \le \beta + \frac{\int_0^{R_1} L(u)\,du}{1 - \int_0^{R_1} L_0(u)\,du}\, R_1 = R_1.

Hence, {s_n} is monotonically increasing (by (6.5.41)) and bounded above by R1, and as such it converges to some s* ∈ (β, R1].

(C) Other sufficient convergence conditions for the sequence {s_n} can be found in Section 5.1, but for more general functions L0 and L.

Following the proof of Theorem 6.5.6 but using hypotheses (A), (B) or (C) above, we immediately obtain:

Theorem 6.5.7 Under hypotheses (A), (B) or (C) above, the conclusions of Theorem 6.5.6 for method (4.1.3) hold in the ball U(x1, s* − β), and s* ≤ r1. Note that s* replaces r1 in the left-hand side of estimate (6.5.30).

Next we examine the uniqueness of the solution x* in the ball U(x0, r1) or U(x0, s*).

Remark 6.5.3 Assume:
\int_0^1 L\big[\theta r_1 + (1 - \theta)R\big]\,\big[\theta r_1 + (1 - \theta)R\big]\,d\theta < 1. \qquad (6.5.43)
Then, under the hypotheses of Theorem 6.5.6, the solution x* is unique in U(x0, r1) if (6.5.42) holds, and in U(x0, R) for R ≥ r1 if (6.5.43) holds. Indeed, if y* is a solution of equation F(x) = 0 in Ū(x0, R), then by (6.5.5) we get, for L = \int_0^1 F'(y^* + \theta(x^* - y^*))\,d\theta,

(6.5.42) if y* ∈ U(x0, r1), or (6.5.43) if y* ∈ U(x0, R). It follows by the Banach Lemma on invertible operators and (6.5.44) that the linear operator L is invertible. Using the identity

0 = F(x^*) - F(y^*) = L(x^* - y^*),

we obtain in both cases x* = y*, which establishes the uniqueness of the solution x* in the corresponding balls U(x0, r1) and U(x0, R). Replace r1 by s* in (6.5.42) and (6.5.43) to obtain uniqueness of x* in U(x0, s*) or U(x0, R) under the hypotheses of Theorem 6.5.7. The corresponding problem of the uniqueness of the solution x* was not examined in Theorem 1.5 in [340].

Remark 6.5.4 Assume operator F is analytic and such that
Smale in [312] provided a semilocal convergence analysis for method (4.1.3) using (6.5.45). Later, in [177], Smale's results were improved by introducing a majorizing function of the form

f(t) = \beta - t + \frac{\gamma t^2}{1 - \gamma t}

(see also Section 6.4). Define function L by

L(u) = \frac{2\gamma}{(1 - \gamma u)^3}.

Then conditions (6.5.3) and (6.5.4) are satisfied, and the function f defined in Remark 6.5.1 coincides with (6.5.46).
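The Smale-type kernel L(u) = 2γ/(1 − γu)³ and the majorant β − t + γt²/(1 − γt) are linked by f(t) = β − t + ∫₀ᵗ L(u)(t − u) du; the t-dependent part of this identity can be verified numerically (a sketch, with γ chosen by us):

```python
# Verify int_0^t L(u)(t - u) du == g*t**2/(1 - g*t) for the kernel
# L(u) = 2*g/(1 - g*u)**3 (midpoint quadrature; g = 0.3 is arbitrary).
g = 0.3
L = lambda u: 2.0 * g / (1.0 - g * u) ** 3
for t in (0.2, 0.8, 1.5):
    n = 20000
    h = t / n
    quad = h * sum(L((k + 0.5) * h) * (t - (k + 0.5) * h) for k in range(n))
    assert abs(quad - g * t * t / (1.0 - g * t)) < 1e-6
```

Equivalently, L is exactly the second derivative of the γ-majorant, which is why the Kantorovich-type and Smale-type analyses coincide for analytic F.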
Concrete forms of Theorems 6.5.1 and 6.5.5 were given by Theorems 5.1 and 5.2, respectively, in [340]. Here we show how to improve on Theorems 5.1 and 5.2 along the lines of our Theorems 6.5.1 and 6.5.5 above. Indeed, simply define the functions f0, L0 corresponding to f, L by

f_0(t) = \beta - t + \frac{\gamma_0 t^2}{1 - \gamma_0 t}

and

L_0(u) = \frac{2\gamma_0}{(1 - \gamma_0 u)^3}.

Again, for this choice of L0, the function f0 given by (6.5.10) coincides with (6.5.48). Moreover, condition (6.5.5) is satisfied. Clearly,

\gamma_0 \le \gamma \qquad (6.5.50)

holds in general. If strict inequality holds in (6.5.50), then our results enjoy the benefits corresponding to (6.1.14) (see Remark 6.5.1). In particular,

r_1^0 = \Big(1 - \frac{\sqrt 2}{2}\Big)\frac{1}{\gamma_0}, \qquad R_0 = \frac{1}{2\gamma_0}, \qquad b_0 = (3 - 2\sqrt 2)\,\frac{1}{\gamma_0}.
Then the concretizations of e.g., Theorems 6.5.1 and 6.5.5 are given respectively by: Theorem 6.5.8 Let yo
> 0 be a given constant. Assume operator F satisfies
for all
[ (
X E U xo, I - -
Then F1(xo)-I
I:);
E L(Y,X ) ,
- =U.
and is diflerentiable in the ball
Moreover,

U0 ⊆ U, and the radius in U0 is the best possible.

Theorem 6.5.9 Let γ0 > 0 be a given constant, and

\alpha = \beta\gamma_0 \le 3 - 2\sqrt 2.

Assume operator F satisfies

for all x ∈ Ū(x0, r), where r_1^0 ≤ r < r_2^0 if α < 3 − 2√2, or r = r_1^0 if α = 3 − 2√2. Then equation F(x) = 0 has a unique solution x* satisfying (6.5.19) in the ball U(x0, r).

The rest of the results in [340] can be improved along the same lines. However, we leave the details as exercises to the motivated reader. Iterative methods of higher order using hypotheses on the second (or higher) Fréchet derivative, as well as results involving outer or generalized inverses, can be found in the Exercises that follow.
6.6

Exercises

6.1 Let F : S ⊆ X → Y be a three times Fréchet-differentiable operator defined on an open convex domain S of a Banach space X with values in a Banach space Y. Assume F′(x0)^{−1} exists for some x0 ∈ S, ‖F′(x0)^{−1}‖ ≤ β, ‖F′(x0)^{−1}F(x0)‖ ≤ η, ‖F″(x)‖ ≤ M, ‖F‴(x)‖ ≤ N, ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ for all x, y ∈ S, and U(x0, rη) ⊆ S, where
and r = \lim_{n\to\infty} \sum_{i=0}^{n} (c_i + d_i). If A ∈ [0, ½], B ∈ [0, P(A) − 17C] and C ∈ [0, P(A)/17], where

P(A) = 27(A - 1)(2A - 1)(A^2 + A + 2)(A^2 + 2A + 4).
Then show [186]: the Chebyshev–Halley method given by

is well defined, remains in U(x0, rη) and converges to a solution x* of equation F(x) = 0. Moreover, the solution x* is unique in Ū(x0, rη). Furthermore, the following estimates hold for all n ≥ 0

where γ = b/b_1.
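A scalar sketch of the Chebyshev–Halley family referred to in this exercise. One standard parametrization is assumed here (A = 0 gives Chebyshev's method, A = 1/2 Halley's, A = 1 super-Halley), with L_f = f f″/f′² the degree of logarithmic convexity; the cubic f is our own sample:

```python
# Chebyshev-Halley family: x <- x - (1 + Lf/(2*(1 - A*Lf))) * f/f'.
f = lambda x: x**3 - x - 1.0
df = lambda x: 3.0 * x**2 - 1.0
d2f = lambda x: 6.0 * x

def cheb_halley(x, A, iters=25):
    for _ in range(iters):
        Lf = f(x) * d2f(x) / df(x) ** 2
        x = x - (1.0 + 0.5 * Lf / (1.0 - A * Lf)) * f(x) / df(x)
    return x

for A in (0.0, 0.5, 1.0):
    r = cheb_halley(1.5, A)
    assert abs(f(r)) < 1e-12        # all members reach the real root
```

All three members converge cubically here; the exercise's parameters A, B, C control for which operators and balls this behavior is guaranteed in the Banach-space setting.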
6.2 Consider the scalar equation [185]

f(x) = 0

for some x0 ∈ ℝ. Using the degree of logarithmic convexity of f,

the convex acceleration of Newton's method is given by

Let k ≥ 1.754877, let the interval [a, b] and the starting point x0 ∈ [a, b] with f(x0) > 0 satisfy the stated conditions, and assume the bounds |L_f(x)| ≤ 1/2 and L_{f'}(x) ∈ [−1/2, 2(k − 1)² − 1/2) hold in [a, b]. Then show: Newton's method converges to a solution x* of equation f(x) = 0, and x_{2n} ≥ x*, x_{2n+1} ≤ x* for all n ≥ 0.
6.3 Consider the midpoint method [68], [158]:
for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).
Denote a_0 = Mβη and b_0 = Kβη². Define the sequences

a_{n+1} = a_n f(a_n)^2 g(a_n, b_n), \qquad b_{n+1} = b_n f(a_n)^3 g(a_n, b_n)^2,

for given real functions f(x), g(x, y). If 0 < a_0 < b_0 < h(a_0), where

then show: the midpoint method {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη), and converges at least R-cubically to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω, and
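A scalar sketch of the midpoint method of this exercise: a Newton predictor y_n, then a corrected step using the derivative at the midpoint of x_n and y_n (the sample equation is ours):

```python
import math

# Midpoint (third-order) method: y = Newton predictor,
# x <- x - f(x)/f'((x + y)/2).
f = lambda x: math.exp(x) - 3.0 * x    # sample equation, not from the text
df = lambda x: math.exp(x) - 3.0
x = 0.4
for _ in range(6):
    y = x - f(x) / df(x)               # Newton predictor
    x = x - f(x) / df((x + y) / 2.0)   # midpoint corrector
assert abs(f(x)) < 1e-12
```

One extra derivative evaluation per step raises the order from 2 to 3, which is the "R-cubic" convergence asserted in the exercise.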
6.4 Consider the multipoint method in the form [68], [158]:
for approximating a solution x* of equation F(x) = 0. Let F be a twice Fréchet-differentiable operator defined on some open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ0 ∈ L(Y, X) for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖^p (x, y ∈ Ω), K ≥ 0, p ∈ [0, 1].

Denote a_0 = Mβη and b_0 = Kβη^{1+p}, and define the sequence
Suppose a_0 ∈ (0, ½) and b_0 < h_p(a_0, θ), where

h_p(x, \theta) = \frac{p+1}{p+2}\,(1 - 2x)\,(8 - 4x^2 - x^3)\cdots

Then, if Ū(x0, Rη) ⊆ Ω, where R = (1 + γ)Δ and Δ = f(a_0)^{−1}, show: the iteration {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη) for all n ≥ 0, and converges with R-order at least 2 + p to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω. Moreover, the following estimates hold for all n ≥ 0

where γ = a_1/a_0.

6.5 Consider the multipoint iteration [191]:
for approximating a solution of equation F(x) = 0. Let F : Ω ⊆ X → Y be a three times Fréchet-differentiable operator defined on some convex subset Ω of a Banach space X with values in a Banach space Y. Assume F′(x0)^{−1} ∈ L(Y, X) (x0 ∈ Ω), ‖Γ0‖ ≤ α, ‖Γ0F(x0)‖ ≤ β, ‖F″(x)‖ ≤ M, ‖F‴(x)‖ ≤ N, and ‖F‴(x) − F‴(y)‖ ≤ k‖x − y‖ for all x, y ∈ Ω. Denote θ = Mαβ, ω = Nαβ², and δ = kαβ³. Define the sequences
and

Moreover, assume Ū(x0, Rβ) ⊆ Ω, where

R = \lim_{n\to\infty} \sum_{i=0}^{n} d_i, \qquad \theta \in \Big(0, \frac{1}{2}\Big),

0 \le \delta < \frac{27(2\theta - 1)(\theta^3 - 5\theta^2 + 16\theta - 8)}{17(1 - \theta)^4}, \qquad 0 \le \omega.

6.9 Consider the two-step method (n ≥ 0) in the form:
F(x_n) + F'(x_n)(y_n - x_n) = 0,

3F(x_n) + 3F'\big[x_n + \tfrac{2}{3}(y_n - x_n)\big](y_n - x_n) + 4F'(y_n)(x_{n+1} - y_n) = 0,

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a three times Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) for some x0 ∈ Ω, with ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ (x, y ∈ Ω), (L ≥ 0).

Denote a_0 = Mβη and c_0 = Lβη³, and define the sequences

where f(x) and g(x, y) are given real functions. Suppose a_0 ∈ (0, ½) and c_0 < h(a_0), where

h(x) = \frac{27(2x - 1)(x - 1)(x - 3 + \sqrt 2)(x - 3 - \sqrt 2)}{17(1 - x)^4},

and Δ = f(a_0)^{−1}. Then show: the iteration {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη) for all n ≥ 0, and converges to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω, and
&
where y = a. ao 6.10. Consider the multipoint iteration method [161]:
yn
= xn - F'
+
Gn = F' x n + ~= y n
(xn)-l F (x,)
-
[F' (xn $ (yn - x,)) - F' (x,)] i G n [ I - i G n ] (yn - xn) ( n 2 O ) ,
,
for approximating a solution x* of equation F ( x ) = 0. Let F : R c X -+ Y be a three times FrBchet-differentiable operator defined on an open convex subset R of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) exists for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F‴(x)‖ ≤ N (x ∈ Ω);
(5) ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ (x, y ∈ Ω), (L ≥ 0).
where
and
+ + 5 ) + 18xy + 1721 .
[27x3( x 2 22
g ( x ,y, z ) = If a0 E (0,;), 17co
+ l8aobo < p (ao),where
+ + 2) ( x 2+ 2%+ 4 ) ,
p ( x ) = 27 (1 - X ) ( 1 - 22) ( x 2 x
U(xo,~q)cRR , = [l+?(l+ao)]
,
1
7=
a0
A = f (ao)rl,
then show: iteration { x n ) ( n 2 0 ) is well defined, remains in U ( x o ,Rq) for all n 2 0 and converges to a solution x* of equation F ( x ) = 0, which is unique in U ( x o , - Rq) n R. Moreover the following estimates hold for all n 2 0:
&
6.11. Consider the Halley method [68], [131]

x_{n+1} = x_n − [I − ½L_F(x_n)]⁻¹F′(x_n)⁻¹F(x_n) (n ≥ 0),

where

L_F(x) = F′(x)⁻¹F″(x)F′(x)⁻¹F(x),

for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω;
(2) ‖F′(x₀)⁻¹F(x₀)‖ ≤ β;
(3) ‖F′(x₀)⁻¹F″(x₀)‖ ≤ γ;
(4) ‖F′(x₀)⁻¹[F″(x) − F″(y)]‖ ≤ L‖x − y‖ (x, y ∈ Ω).

If r₁ ≤ r₂, where r₁, r₂ are the positive roots of h(t) = β − t + (γ/2)t² + (L/6)t³, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, r₁) for all n ≥ 0 and converges to a unique solution x* of the equation F(x) = 0 in U(x₀, r₁). Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n, ‖x_n − x*‖ ≤ r₁ − t_n,

where −r₀ is the negative root of h, t₀ = 0 and t_{n+1} = H(t_n), where

H(t) = t − [h(t)/h′(t)]·[1 − ½L_h(t)]⁻¹, L_h(t) = h(t)h″(t)/h′(t)².
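For a scalar equation f(x) = 0 the method of Exercise 6.11 reduces to the classical Halley iteration, exactly the map H above with h replaced by f. A minimal sketch; the sample equation x³ − 2 = 0, starting point, and tolerances are illustrative assumptions, not part of the exercise:

```python
def halley(f, df, d2f, x, tol=1e-12, max_iter=50):
    """Halley's method: x <- x - [1 - L_f(x)/2]^{-1} f(x)/f'(x),
    with L_f(x) = f(x) f''(x) / f'(x)^2 (cubically convergent)."""
    for _ in range(max_iter):
        fx, dfx = f(x), df(x)
        Lf = fx * d2f(x) / dfx ** 2
        step = (fx / dfx) / (1.0 - 0.5 * Lf)
        x -= step
        if abs(step) < tol:
            break
    return x

root = halley(lambda x: x ** 3 - 2.0,
              lambda x: 3.0 * x ** 2,
              lambda x: 6.0 * x,
              x=1.0)
```

The extra factor [1 − ½L_f]⁻¹ is what lifts Newton's quadratic rate to cubic, at the price of one second-derivative evaluation per step.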
6.12. Consider the iteration [68], [158]

x_{n+1} = x_n − Γ_nF(x_n) − T(x_n) (n ≥ 0),

where Γ_n = F′(x_n)⁻¹ and T(x_n) = ½Γ_nA(Γ_nF(x_n), Γ_nF(x_n)) (n ≥ 0), for approximating a solution x* of the equation F(x) = 0. Here A : X × X → Y is a bilinear operator with ‖A‖ = a, and F : Ω ⊆ X → Y is a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) F′(x₀)⁻¹ = Γ₀ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F′(x) − F′(y)‖ ≤ k‖x − y‖ (x, y ∈ Ω).

Let a, b be real numbers with a ∈ [0, ½) and b ∈ (0, a). Set a₀ = 1, c₀ = 1, b₀ = b and d₀ = 1 + ½b, and define the associated scalar sequences {r_n}, {d_n}, with r = lim_{n→∞} r_n. If α = kβη ∈ [0, ½) and U(x₀, rη) ⊆ Ω, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, rη) and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(kβ) − rη). Moreover, the corresponding majorizing estimates hold for all n ≥ 0.
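When the bilinear operator A of Exercise 6.12 is taken to be the exact second derivative F″, the iteration becomes the classical Chebyshev method. A scalar sketch under that assumption; the sample equation x³ − 5 = 0 and starting point are illustrative:

```python
def chebyshev(f, df, d2f, x, n_steps=30):
    """x+ = x - u - A(u, u) / (2 f'(x)) with u = f(x)/f'(x);
    here the bilinear operator A is the exact second derivative f''(x)."""
    for _ in range(n_steps):
        u = f(x) / df(x)                      # Newton correction Gamma_n F(x_n)
        x = x - u - 0.5 * d2f(x) * u * u / df(x)  # extra term T(x_n)
    return x

root = chebyshev(lambda x: x ** 3 - 5.0,
                 lambda x: 3.0 * x ** 2,
                 lambda x: 6.0 * x,
                 x=2.0)
```

Replacing F″ by a cheaper bilinear approximation A with ‖A‖ = a is exactly the flexibility the exercise's hypotheses quantify.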
6.13. Consider the Halley method [68], [175] in the form given there, for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖F″(x)‖ ≤ M (x ∈ Ω);
(3) ‖F″(x) − F″(y)‖ ≤ N‖x − y‖ (x, y ∈ Ω);
(4) ‖Γ₀F(x₀)‖ ≤ η;
(5) the quadratic majorizing polynomial p determined by β, η, M and N has two positive roots r₁ and r₂ with r₁ < r₂.

Let the scalar sequences {s_n}, {t_n} (n ≥ 0) be defined by the corresponding majorizing recurrences. If U(x₀, r₁) ⊆ Ω, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, r₁) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0. Moreover, if r₁ < r₂, the solution x* is unique in U(x₀, r₂). Furthermore, the corresponding majorizing estimates hold for all n ≥ 0.
6.14. Consider the two-point method [68], [153] for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Set a₀ = Mβη and b₀ = Kβη², and define the scalar sequences generated by

f(x) = 1/(2(1 − x)) and g_p(x, y) = {3x³ + 2y(1 − x)[(1 − 6x)p + (2 + 3x)]}/(24(1 − x)).

If a₀ ∈ (0, ½) and b₀ < h_p(a₀), then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(Mβ) − Rη) ∩ Ω. Moreover, the corresponding majorizing estimates hold for all n ≥ 0, with γ determined by a₀ and Δ = f(a₀)⁻¹.
6.15. Consider the two-step method:

y_n = x_n − Γ_nF(x_n), x_{n+1} = y_n − Γ_nF(y_n), Γ_n = F′(x_n)⁻¹ (n ≥ 0),

for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F′(x) − F′(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Set a₀ = Kβη and define the sequence a_{n+1} = f(a_n)²g(a_n)a_n (n ≥ 0), where g(x) = x²(x + 4)/8 and f is the associated rational function. If a₀ ∈ (0, ½) and U(x₀, Rη) ⊆ Ω, with R, γ and Δ = f(a₀)⁻¹ defined as in the previous exercises, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(Kβ) − Rη) ∩ Ω. Moreover, the corresponding majorizing estimates hold for all n ≥ 0.
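A scalar sketch of the two-step pattern of Exercise 6.15: the derivative is evaluated once at x_n and reused for both substeps, so each sweep costs one derivative and two residuals. The test equation x² − 2 = 0 and the iteration count are illustrative assumptions:

```python
def two_step_newton(f, df, x, n_steps=10):
    """y = x - f(x)/f'(x); x+ = y - f(y)/f'(x): the derivative is
    frozen at x_n for both substeps (third-order convergence)."""
    for _ in range(n_steps):
        d = df(x)
        y = x - f(x) / d
        x = y - f(y) / d
    return x

root = two_step_newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x=1.0)
```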
6.16 (a) Assume there exist non-negative parameters K, M, L, ℓ, μ, η and δ ∈ [0, 1] satisfying the stated relations. Show: the iteration {t_n} (n ≥ 0) given by the corresponding recurrence is nondecreasing, bounded above by t**, and converges to some t* with t* ≤ t**. Moreover, the associated error estimates hold for all n ≥ 0.

(b) Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume: there exist an approximation A(x) ∈ L(X, Y) of F′(x), an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse A# of A(x₀), and parameters η > 0, K > 0, M ≥ 0, L ≥ 0, μ ≥ 0, ℓ ≥ 0 such that conditions (1)–(3) and the stated Lipschitz-type bounds hold for all x, y ∈ D₀. Show: the sequence {x_n} (n ≥ 0) generated by the Newton-like method with outer inverses is well defined, remains in U(x₀, t*) for all n ≥ 0 and converges to a unique solution x* of the equation A#F(x) = 0 in U(x₀, t*) ∩ D₀. Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x_n − x*‖ ≤ t* − t_n.

(c) Assume: there exist an approximation A(x) ∈ L(X, Y) of F′(x), a simple solution x* ∈ D of equation (1), a bounded outer inverse A*# of A(x*) and non-negative parameters K̄, M̄, L̄, μ̄, ℓ̄ such that the analogous conditions hold for all x, y ∈ D; the associated scalar equation has a minimal non-negative zero r* satisfying L̄r* + ℓ̄ < 1 and U(x*, r*) ⊆ D. Show: the sequence {x_n} (n ≥ 0) generated by the Newton-like method is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the corresponding error estimates hold for all n ≥ 0.
6.17 (a) Let F : D ⊆ X → Y be an m-times Fréchet-differentiable operator (m ≥ 2 an integer). Assume:

(a₁) there exist an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse F′(x₀)# of F′(x₀), and constants η > 0, αᵢ ≥ 0, i = 2, ..., m + 1, such that for all x, y ∈ D₀ the following conditions hold:

‖F′(x₀)#(F^{(m)}(x) − F^{(m)}(x₀))‖ ≤ ε, ε > 0, for all x ∈ U(x₀, δ₀) and some δ₀ > 0,    (6.6.1)

‖F′(x₀)#F(x₀)‖ ≤ η, ‖F′(x₀)#F^{(i)}(x₀)‖ ≤ αᵢ,

and the positive zero s of p′ satisfies p(s) ≤ 0, where p is the associated majorizing polynomial. Show: the polynomial p has only two positive zeros, denoted t*, t** (t* ≤ t**). Assume further:

(a₂) U(x₀, δ) ⊆ D₀, δ = max(δ₀, t*, t**).

Moreover show: the sequence {x_n} (n ≥ 0) generated by Newton's method with F′(x_n)# = [I + F′(x₀)#(F′(x_n) − F′(x₀))]⁻¹F′(x₀)# (n ≥ 0) is well defined, remains in U(x₀, t*) and converges to a solution x* ∈ U(x₀, t*) of the equation F′(x₀)#F(x) = 0; the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x_n − x*‖ ≤ t* − t_n,

where {t_n} (n ≥ 0) is a monotonically increasing sequence converging to t* and generated by the corresponding scalar recurrence.

(b) Let F : D ⊆ X → Y be an m-times Fréchet-differentiable operator (m ≥ 2 an integer). Assume:

(b₁) condition (1) holds;
(b₂) there exist an open convex subset D₀ of D, x₀ ∈ D₀, and constants α, β, γ ≥ 0 such that for any x ∈ D₀ there exists an outer inverse F′(x)# of F′(x) satisfying N(F′(x)#) = N(F′(x₀)#) together with a bound of the form

‖F′(y)# ∫₀¹ F″(x + t(y − x))(1 − t) dt (y − x)²‖ ≤ c‖y − x‖², for all x, y ∈ D₀,

with c = c(α, β, γ), and U(x₀, r) ⊆ D₀ with r = min{r̄, δ₀} for the associated radius r̄.

Show: the sequence {x_n} (n ≥ 0) generated by Newton's method is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to a solution x* of F′(x₀)#F(x) = 0, with the iterates satisfying N(F′(x_n)#) = N(F′(x₀)#) (n ≥ 0). Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ q‖x_n − x_{n−1}‖ and ‖x_n − x₀‖ ≤ [(1 − qⁿ)/(1 − q)]‖x₁ − x₀‖,

for an appropriate q ∈ (0, 1).
6.18 Let L be an m × n matrix with n ≤ m. Any outer inverse M of L will be an n × m matrix. Show:

(a) If rank(L) = n, then L can be written as

L = A [I; 0],

where I is the n × n identity matrix, [I; 0] denotes the m × n block matrix with I on top of an (m − n) × n zero block, and A is an m × m invertible matrix. The n × m matrix M = [I B]A⁻¹ is an outer inverse of L for any n × (m − n) matrix B.

(b) If rank(L) = r < n, then L can be written as

L = A [I 0; 0 0] C,

where A is an m × m invertible matrix, I is the r × r identity matrix, and C is an n × n invertible matrix. If E is an outer (inner) inverse of the matrix [I 0; 0 0], then the n × m matrix

M = C⁻¹EA⁻¹

is an outer (inner) inverse of L.

(c) E is both an inner and an outer inverse of [I 0; 0 0] if and only if E can be written in the form

E = [I B; C CB]

for some blocks B and C of the appropriate sizes.
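Part (a) of Exercise 6.18 can be checked on a concrete example in a few lines of pure Python; the particular 3 × 2 matrix L (with A the identity) and the block B below are illustrative assumptions:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# L = A [I; 0] with A the 3x3 identity, so rank(L) = n = 2.
L = [[1, 0],
     [0, 1],
     [0, 0]]

# M = [I  B] A^{-1} for an arbitrary n x (m - n) block B = [5, -3]^T.
M = [[1, 0, 5.0],
     [0, 1, -3.0]]

MLM = matmul(matmul(M, L), M)   # outer-inverse identity: M L M = M
```

For this full-column-rank example M happens to be an inner inverse as well (L M L = L), consistent with part (c).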
6.19 Let F : D ⊆ X → Y be a Fréchet-differentiable operator between two Banach spaces X and Y, and let A(x) ∈ L(X, Y) (x ∈ D) be an approximation to F′(x). Assume that there exist an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse A# of A (= A(x₀)) and constants η, k > 0, M, L, μ, ℓ ≥ 0 such that for all x, y ∈ D₀ the stated conditions hold. Assume

h = ση ≤ ½(1 − b)², σ := max(k, M + L), t* = [1 − b − √((1 − b)² − 2h)]/σ, and U = U(x₀, t*).

Then show:

(i) the sequence {x_n} (n ≥ 0) generated by x_{n+1} = x_n − A(x_n)#F(x_n) (n ≥ 0) with A(x_n)# = [I + A#(A(x_n) − A)]⁻¹A# remains in U and converges to a solution x* ∈ Ū of the equation A#F(x) = 0;

(ii) the equation A#F(x) = 0 has a unique solution in D̄ ∩ {R(A#) + x₀}, where

D̄ = U(x₀, t**) ∩ D₀ if h < ½(1 − b)², and D̄ = Ū ∩ D₀ if h = ½(1 − b)²;

(iii) ‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x* − x_n‖ ≤ t* − t_n, where t₀ = 0, t_{n+1} = t_n + f(t_n)/g(t_n), f(t) = (σ/2)t² − (1 − b)t + η, and g(t) = 1 − Lt − ℓ.
6.20 Let F : D ⊆ X → Y be a Fréchet-differentiable operator between two Banach spaces X and Y, and let A(x) ∈ L(X, Y) be an approximation of F′(x). Assume that there exist an open convex subset D₀ of D, a point x₀ ∈ D₀ and constants η, k > 0 such that for any x ∈ D₀ there exists an outer inverse A(x)# of A(x) satisfying N(A(x)#) = N(A#), where A = A(x₀) and A# is a bounded outer inverse of A, and for this outer inverse the stated conditions hold for all x, y ∈ D₀ and t ∈ [0, 1], with h = ½kη < 1 and U(x₀, r) ⊆ D₀, where r = η/(1 − h). Then show: the sequence {x_n} (n ≥ 0) generated by x_{n+1} = x_n − A(x_n)#F(x_n) (n ≥ 0), with A(x_n)# satisfying N(A(x_n)#) = N(A#), remains in U(x₀, r) and converges to a solution x* of the equation A#F(x) = 0.

6.21 Show that Newton's method with outer inverses,

x_{n+1} = x_n − F′(x_n)#F(x_n) (n ≥ 0),

converges quadratically to a solution x* ∈ D̄ ∩ {R(F′(x₀)#) + x₀} of the equation F′(x₀)#F(x) = 0 under the conditions of Exercise 6.4 with A(x) = F′(x) (x ∈ D₀).
6.22 Let F : D ⊆ X → Y be Fréchet-differentiable and assume that F′(x) satisfies a Lipschitz condition on D. Assume x* ∈ D exists with F(x*) = 0, and let ε > 0 be such that U(x*, ε) ⊆ D. Suppose there is an

F′(x*)# ∈ R(F′(x*)) = {B ∈ L(Y, X) : BF′(x*)B = B, B ≠ 0}

such that ‖F′(x*)#‖ ≤ α, and for any x ∈ U(x*, ε) the set R(F′(x)) contains an element of minimum norm. Then show there exists a ball U(x*, r) ⊆ D with r < ε such that for any x₀ ∈ U(x*, r) the sequence {x_n} (n ≥ 0),

x_{n+1} = x_n − F′(x_n)#F(x_n) (n ≥ 0),

with

F′(x₀)# ∈ argmin{‖B‖ : B ∈ R(F′(x₀))}

and with F′(x_n)# = (I + F′(x₀)#(F′(x_n) − F′(x₀)))⁻¹F′(x₀)#, converges quadratically to x̄* ∈ U(x₀, ε) ∩ {R(F′(x₀)#) + x₀}, which is a solution of the equation F′(x₀)#F(x) = 0. Here, R(F′(x₀)#) + x₀ = {x + x₀ : x ∈ R(F′(x₀)#)}.
Chapter 7
Equations with Nonsmooth Operators

In recent years, Newton-type methods have been generalized to solve many problems in mathematical programming. An excellent treatment of these generalizations in the form of a generalized equation was inaugurated by S. M. Robinson [297], [298]. Relevant work has been conducted by Josephy [200]–[203] and Argyros [76], [81]–[89]. When applied to an optimization problem, Newton-type methods constitute a type of sequential quadratic programming (SQP) algorithm. These algorithms are very efficient and among the most promising for solving large-scale problems; see Gill, Murray and Wright [168] for more discussion of their numerical aspects.

There have been several previous papers dealing with Newton's method for non-Fréchet-differentiable operators in normed spaces. Kojima and Shindo [209] analyzed Newton and quasi-Newton methods for systems of piecewise continuously differentiable operators, and Kummer [215] extended the Kojima–Shindo result to real Banach spaces and studied Newton's method defined by a large family of linear continuous operators. Catinas [117], Chen and Yamamoto [121], Nguyen and Zabrejko [363], and Argyros [18], [34], [68] have provided results on equations involving a Fréchet-differentiable term and another term whose differentiability is not assumed. Robinson [302] discusses a Newton method for nonsmooth operators which admits a certain kind of point-based approximating operator. He then provides a semilocal convergence analysis using the Newton-Kantorovich hypothesis and nondiscrete mathematical induction (see Potra and Ptak [284], [286]).

In this chapter we further exploit the weaker version of the Newton-Kantorovich theorem. The benefits of this approach have already been explained in Section 5.1.
In particular, in the first section we show that under weaker hypotheses and the same computational cost we can provide a finer convergence analysis and better information on the location of the solution than the corresponding results in [302].
Moreover, in Sections 2–4 we examine the semilocal convergence of Newton's method under even weaker hypotheses. Furthermore, in Sections 5 and 6 we do the same for the secant method under different hypotheses, whose advantages over each other have already been explained in Section 5.1. Finally, as future research we suggest that the motivated reader further exploit our work here to improve the results obtained by Robinson [302], Pang et al. [267], Qi [287], Han [178], Shapiro [309], and Ip et al. [197] concerning Bouligand and other types of derivatives in connection with Newton-type methods.
7.1
Convergence of Newton-Kantorovich Method. Case 1: Lipschitzian Assumptions
The classical Newton-Kantorovich method (4.1.3) replaces the operator whose solution is sought by an approximate operator that we hope can be solved more easily. A solution of the approximation (linearization) can be used to restart the process. Under certain conditions a sequence is generated that converges to x*. In particular, in the case of the Newton–Kantorovich theorem (see Section 4.1) the approximating operators are linearizations of the form F(x) + F′(x)(z − x), where given x we solve for z. Robinson [302] extended this theorem to nonsmooth operators, where the linearization is no longer available. This class includes smooth operators as well as nonsmooth reformulations of variational inequalities. He used as an alternative a point-based approximation which imitates the properties of the linearization in Newton's method.

Here we extend Robinson's work; in particular we show that it is possible, under weaker hypotheses and the same computational cost, to provide finer error bounds on the distances involved, and more precise information on the location of the solution x*.

We need the following definitions.

Definition 7.1.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. Operator F has a point-based approximation (PBA) on D if there exist an operator A : D × D → Y and a number η ≥ 0 such that for each u and v in D,

(a) ‖F(v) − A(u, v)‖ ≤ ½ηm(u, v)², and
(b) the operator A(u, ·) − A(v, ·) is Lipschitzian on D with modulus ηm(u, v).

We then say operator A is a PBA for F with modulus η. Moreover, if for a given point x₀ ∈ D

(c) the operator A(u, ·) − A(x₀, ·) is Lipschitzian on D with modulus η₀m(u, x₀),

then we say operator A is a PBA for F with moduli (η, η₀) at the point x₀.

Note that if F has a Fréchet derivative that is Lipschitzian on a closed, convex set D with modulus η, and X is also a normed linear space, then by choosing

A(u, v) = F(u) + F′(u)(v − u)    (7.1.1)
(used in method (4.1.3)), we get by (a)

‖F(v) − F(u) − F′(u)(v − u)‖ ≤ ½η‖v − u‖²,

whereas (b) and (c) are obtained from the Lipschitzian property of F′. As already noted in [300], the existence of a PBA does not necessarily imply differentiability. Method (4.1.3) is applicable to any operator having a PBA A, provided that solving the equation A(x, z) = 0 for z, given x, is possible.

We also need the following definition concerning bounds for the inverses of operators:

Definition 7.1.2 Let X, D, and Y be as in Definition 7.1.1, and F : D → Y. Then we define the number

δ(F, D) = inf{‖F(u) − F(v)‖/m(u, v) : u ≠ v, u, v ∈ D}.

Clearly, if δ(F, D) ≠ 0 then F is 1–1 on D. We also define the center-Lipschitzian number

δ₀(F, D) = inf{‖F(u) − F(x₀)‖/m(u, x₀) : u ≠ x₀, u ∈ D}.

Set d = δ(F, D) and d₁ = δ₀(F, D).

At this point we need two lemmas. The first one is the analog of the Banach lemma on invertible operators 1.3.2. The second one gives conditions under which operator F is 1–1.

Lemma 7.1.3 Let (X, d) be a Banach space, D a closed subset of X, and Y a normed linear space. Let F and G be functions from D into Y, G being Lipschitzian with modulus η and center-Lipschitzian with modulus η₀. Let x₀ ∈ D with F(x₀) = y₀. Assume:

(a) U(y₀, α) ⊆ F(D),
(b) 0 ≤ η < d,
(c) U(x₀, d₁⁻¹α) ⊆ D, and
(d) θ₀ = (1 − η₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.

Then the following hold:

U(y₀, θ₀) ⊆ (F + G)(U(x₀, d₁⁻¹α))

and

0 < d − η ≤ δ(F + G, D).
Proof. Let y ∈ U(y₀, θ₀), x ∈ U(x₀, d₁⁻¹α) and define Ty(x) = F⁻¹(y − G(x)). Then we can have in turn

‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ₀ + η₀d₁⁻¹α + ‖G(x₀)‖ = α.

That is, the set Ty(x) ≠ ∅, and in particular it contains a single point because d > 0. The rest follows exactly as in Lemma 3.1 in [300, p. 298] (see also the more detailed proof of Lemma 7.2.2).

Remark 7.1.1 In general

η₀ ≤ η and d ≤ d₁    (7.1.2)

hold, and η/η₀, d₁/d can be arbitrarily large (see Section 5.1). If η₀ is replaced by η and d₁ by d in Lemma 7.1.3, then our result reduces to the corresponding one in [Lemma 3.1, [300]]. Denote θ₀ by θ if η₀ is replaced by η and d₁ by d. Then in case strict inequality holds in any of the inequalities in (7.1.2), we get

θ < θ₀,    (7.1.3)
which improves the corresponding Lemma 3.1 in [302], and under the same computational cost, since in practice the computation of η requires the computation of η₀ or d₁, respectively.

Lemma 7.1.4 Let X and Y be normed linear spaces, and let D be a closed subset of X. For x₀ ∈ D and F : D → Y assume: the function A : D × D → Y is a PBA for F on D with moduli (η, η₀) at the point x₀; there exists ρ > 0 such that U(x₀, ρ) ⊆ D. Then the following holds:

δ(F, U(x₀, ρ)) ≥ d − η₁ρ    (7.1.4)

where

d = δ(A(x₀, ·), D), and η₁ = η₀ + ½η.

Moreover, if d − η₁ρ > 0 then F is 1–1 on U(x₀, ρ).

Proof. It follows as in the proof of Lemma 2.4 in [302, p. 295], but there are some differences. Set x = (x₁ + x₂)/2 for x₁ and x₂ in U(x₀, ρ). Then we can write:

F(x₁) − F(x₂) = [F(x₁) − A(x, x₁)] + [A(x, x₁) − A(x, x₂)] + [A(x, x₂) − F(x₂)].    (7.1.5)
We will find bounds on each of the quantities inside the brackets above. First, by (a) of Definition 7.1.1, we obtain

‖F(xᵢ) − A(x, xᵢ)‖ ≤ ½η‖x − xᵢ‖² = ⅛η‖x₁ − x₂‖².    (7.1.6)
Using the triangle inequality we get kA(x, u) − A(x, v)k ≥ kA(x0 , u) − A(x0 , v)k − k[A(x, u) − A(x0 , u)] − [A(x, v) − A(x0 , v)]k.
(7.1.7)
By the definition of d, and (7.1.7) we get in turn δ(A(x, ·), D) ≥ δ(A(x0 , ·), D) − sup{k[A(x, u) − A(x0 , u)]
− [A(x, v) − A(x0 , v)]k/ku − vk, u 6= v, u, v ∈ D} ≥ d − η0 kx − x0 k ≥ d − η0 ρ.
(7.1.8)
Hence by (7.1.5), (7.1.8) and (c) of Definition 7.1.1 we have

‖F(x₁) − F(x₂)‖ ≥ [(d − η₀ρ) − ¼η‖x₁ − x₂‖]‖x₁ − x₂‖

and for x₁ ≠ x₂,

‖F(x₁) − F(x₂)‖/‖x₁ − x₂‖ ≥ d − η₀ρ − ¼η‖x₁ − x₂‖ ≥ d − η₀ρ − ¼η‖(x₁ − x₀) + (x₀ − x₂)‖ ≥ d − η₀ρ − ½ηρ.
Remark 7.1.2 If equality holds in (7.1.2), then Lemma 7.1.4 reduces to Lemma 2.4 in [302]. Otherwise our result provides a wider range for δ(F, U(x₀, ρ)).

Let r₀ and d₀ be positive numbers. It is convenient to define scalar sequences {t_n}, {s_n} by

t_{n+2} = t_{n+1} + d₀⁻¹η(t_{n+1} − t_n)² / [2(1 − d₀⁻¹η₀t_{n+1})],  t₀ = 0, t₁ = r₀,    (7.1.9)

and

s_{n+2} = s_{n+1} + d₀⁻¹η(s_{n+1} − s_n)² / [2(1 − d₀⁻¹ηs_{n+1})],  s₀ = 0, s₁ = r₀.    (7.1.10)

We set

h_A = d₀⁻¹η̄r₀,  η̄ = (η₀ + η)/2    (7.1.11)
and

h_K = d₀⁻¹ηr₀.    (7.1.12)

Note that

½h_K ≤ h_A ≤ h_K.    (7.1.13)

If strict inequality holds in (7.1.2), then so does in (7.1.13). It was shown in Section 5.1 that the sequence {t_n} converges monotonically to some t* ∈ (0, 2r₀] provided that

h_A ≤ ½.    (7.1.14)

Moreover, by the Newton–Kantorovich theorem for nonsmooth equations 4.2.4 the sequence {s_n} converges monotonically to some s* with

0 < s* = (r₀/h_K)[1 − (1 − 2h_K)^{1/2}] ≤ 2r₀    (7.1.15)

provided that the famous Newton–Kantorovich hypothesis

h_K ≤ ½    (7.1.16)

holds. Note also that

h_K ≤ ½ ⇒ h_A ≤ ½    (7.1.17)

but not vice versa unless equality holds in (7.1.2). Under hypotheses (7.1.14) and (7.1.16) we showed in Section 5.1 (for η₀ < η):

t_n < s_n  (n ≥ 2)    (7.1.18)

t_{n+1} − t_n < s_{n+1} − s_n  (n ≥ 1)    (7.1.19)

t* − t_n ≤ s* − s_n    (7.1.20)

and

t* ≤ s* ≤ 2r₀.    (7.1.21)
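The comparison (7.1.18)–(7.1.21) is easy to observe numerically by iterating (7.1.9) and (7.1.10) directly. The sample values d₀ = 1, η₀ = 0.5, η = 1, r₀ = 0.4 below are illustrative assumptions, chosen so that h_K ≤ ½ (and hence h_A ≤ ½):

```python
d0, eta0, eta, r0 = 1.0, 0.5, 1.0, 0.4
hK = eta * r0 / d0                       # Kantorovich quantity (7.1.12)
hA = 0.5 * (eta0 + eta) * r0 / d0        # weaker quantity (7.1.11)

t = [0.0, r0]                            # sequence (7.1.9): center modulus eta0
s = [0.0, r0]                            # sequence (7.1.10): full modulus eta
for n in range(40):
    t.append(t[-1] + eta * (t[-1] - t[-2]) ** 2
             / (d0 * 2 * (1 - eta0 * t[-1] / d0)))
    s.append(s[-1] + eta * (s[-1] - s[-2]) ** 2
             / (d0 * 2 * (1 - eta * s[-1] / d0)))

s_star = r0 * (1 - (1 - 2 * hK) ** 0.5) / hK      # closed form (7.1.15)
```

With η₀ < η the smaller denominator correction makes {t_n} grow strictly more slowly than {s_n}, which is exactly the source of the finer error bounds of this section.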
Note that {s_n} was essentially used as a majorizing sequence for {x_n} in Theorem 3.2 in [302] for Newton's method. We can state the following semilocal convergence theorem for Newton's method involving nonsmooth operators:

Theorem 7.1.5 Let F : D ⊆ X → Y be a continuous operator, x₀ ∈ D, and r₀ > 0. Suppose that A : D × D → Y is a PBA for F with moduli (η, η₀) at the point x = x₀. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ > 0;
(b) 0 < h_A ≤ ½;
(c) for each y ∈ U(0, d₀(t* − r₀)) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ r₀; and
(e) U(x₀, t*) ⊆ D.

Then the Newton iteration defining x_{n+1} by

A(x_n, x_{n+1}) = 0  (n ≥ 0)

is well defined, remains in U(x₀, t*) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, t*) of the equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n    (7.1.22)

and

‖x_n − x*‖ ≤ t* − t_n.    (7.1.23)

Furthermore, if ρ given in Lemma 7.1.4 satisfies

t* < ρ < d₀η₁⁻¹    (7.1.24)

then x* is the unique solution of the equation F(x) = 0 in U(x₀, ρ).

Proof. We use Lemma 7.1.3 with the quantities F, G, α, x₀ and y₀ replaced by A(x₀, ·), A(x₁, ·) − A(x₀, ·), d₁(t* − r₀), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of the lemma follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) ≥ d₀ > 0). For hypothesis (b) we have

δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − η₀‖x₁ − x₀‖ ≥ d₀ − η₀r₀ > 0.    (7.1.25)

Condition (c) of Lemma 7.1.3 follows immediately from the above choices. The condition h_A ≤ ½ is equivalent to (η + η₀)r₀ ≤ d₀. We have d₁ = δ₀(F, D) ≥ δ(F, D) ≥ d₀, and ‖G(x₁)‖ ≤ ½η₀r₀² ≤ ½ηr₀². Then condition (d) of Lemma 7.1.3 follows from the estimate

θ₀ ≥ (1 − η₀r₀d₁⁻¹)d₁(t* − r₀) − ½ηr₀² = (d₁ − η₀r₀)(t* − r₀) − ½ηr₀²
 ≥ (d₀ − η₀r₀)(t* − r₀) − ½ηr₀² = (d₀ − η₀t₁)(t* − t₁) − ½η(t₁ − t₀)²
 = (d₀ − η₀t₁)(t* − t₂) + (d₀ − η₀t₁)(t₂ − t₁) − ½η(t₁ − t₀)²
 = (d₀ − η₀t₁)(t* − t₂) ≥ η₀r₀(t* − t₂) ≥ 0.

It follows from Lemma 7.1.3 that for each y ∈ U(0, t* − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in D, since δ(A(x₁, ·), D) > 0.
We also have A(x₀, x₁) = A(x₁, x₂) = 0 and A(x₁, x₁) = F(x₁). By Definition 7.1.2,

‖x₂ − x₁‖ ≤ δ(A(x₁, ·), D)⁻¹‖A(x₁, x₂) − A(x₁, x₁)‖
 ≤ d₀⁻¹(1 − d₀⁻¹η₀‖x₁ − x₀‖)⁻¹‖A(x₀, x₁) − F(x₁)‖
 ≤ d₀⁻¹(1 − d₀⁻¹η₀‖x₁ − x₀‖)⁻¹ · ½η‖x₁ − x₀‖² ≤ t₂ − t₁.
Hence, we showed:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n    (7.1.26)

and

U(x_{n+1}, t* − t_{n+1}) ⊆ U(x_n, t* − t_n)    (7.1.27)
hold for n = 0, 1. Moreover, for every z ∈ U(x₁, t* − t₁),

‖z − x₀‖ ≤ ‖z − x₁‖ + ‖x₁ − x₀‖ ≤ t* − t₁ + t₁ = t* − t₀,

implies z ∈ U(x₀, t* − t₀). Given they hold for n = 0, 1, ..., j, then

‖x_{j+1} − x₀‖ ≤ Σ_{i=1}^{j+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{j+1} (t_i − t_{i−1}) = t_{j+1} − t₀ = t_{j+1}.

The induction can be completed by simply replacing x₀, x₁ above by x_n, x_{n+1}, respectively, and noticing that ‖x_{n+1} − x_n‖ ≤ r₀. Indeed, for the crucial condition (d) of Lemma 7.1.3, assigning F as A(x_n, ·), G as A(x_{n+1}, ·) − A(x_n, ·), and α as d₁(t* − t_{n+1}), we have, since d₁ = δ₀(F, D) ≥ δ(F, D) ≥ d₀ − η₀t_n:

θ₀ ≥ [1 − η(t_{n+1} − t_n)d₁⁻¹]d₁(t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = [d₁ − η(t_{n+1} − t_n)](t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 ≥ (d₀ − ηt_{n+1})(t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = (d₀ − ηt_{n+1})(t* − t_{n+2}) + (d₀ − ηt_{n+1})(t_{n+2} − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = (d₀ − ηt_{n+1})(t* − t_{n+2}) ≥ 0.

It follows by hypothesis (b) that the scalar sequence {t_n} is Cauchy. From (7.1.26) and (7.1.27) it follows that {x_n} is Cauchy too in the Banach space X, and as such it converges to some x* ∈ U(x₀, t*). Furthermore, we have

‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ ½η‖x_{n+1} − x_n‖² ≤ ½η(t_{n+1} − t_n)².
That is, ‖F(x_{n+1})‖ converges to zero as n → ∞. Therefore, by the continuity of F we deduce F(x*) = 0. Estimate (7.1.23) follows from (7.1.22) by using standard majorization techniques. Moreover, for the uniqueness part, using Lemma 7.1.4 and hypothesis (7.1.24) we deduce that x* is an isolated zero of F, since F is 1–1 on U(x₀, t*).

Remark 7.1.3 (a) If equality holds in (7.1.2), then our theorem reduces to the corresponding Theorem 3.2 in [302]. Otherwise, according to (7.1.17)–(7.1.21), our theorem provides, under weaker conditions and the same computational cost, finer error bounds on the distances involved and more precise information on the location of the solution x*. Moreover, the uniqueness of the solution x* is shown in a larger ball (see (7.1.24) and compare it with (3.7) in [302], i.e. compare (7.1.24) with s* < ρ < d₀η⁻¹).

(b) Condition (c) of Theorem 7.1.5 can be replaced by the stronger but more practical:

(b)′ For each fixed x₀ and y in U(0, d₀(t* − r₀)) = U (or in U(0, d₀r₀)) the operator Q given by Q(x) = x − y + A(x₀, x) is a contraction on U and maps U into itself.

It then follows by the contraction mapping principle (see Chapter 3) that the equation A(x₀, x) = y has a solution x in U.

We can also provide a local convergence result for Newton's method:

Theorem 7.1.6 Let F : D ⊆ X → Y be as in Theorem 7.1.5. Assume: x* ∈ D is a simple solution of the equation F(x) = 0; the operator F has a point-based approximation A on D with moduli (ℓ, ℓ₀) at the point x*. Moreover, assume that the following hold:

(a) δ(A(x*, ·), D) ≥ d* > 0;
(b) d* > (ℓ₀ + ½ℓ)r*;
(c) for each y ∈ U(0, d*r*) and provided x₀ ∈ U⁰(x*, r*), the equation A(x₀, x) = y has a solution x satisfying ‖x − x*‖ ≤ r*; and
(d) U(x*, r*) ⊆ D.

Then Newton's method is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, so that

‖x_{n+1} − x*‖ ≤ ℓ‖x_n − x*‖² / [2(d* − ℓ₀‖x_n − x*‖)]  (n ≥ 0).    (7.1.28)
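When the PBA is the linearization (7.1.1), the subproblem A(x_n, x_{n+1}) = 0 is linear and the iteration is classical Newton's method; estimate (7.1.28) then predicts the familiar quadratic error decay. A scalar sketch of this smooth special case (the equation x² − 2 = 0 and starting point are illustrative assumptions):

```python
f = lambda x: x ** 2 - 2.0
df = lambda x: 2.0 * x
x_star = 2.0 ** 0.5

x, errors = 1.5, []
for _ in range(5):
    # A(x_n, x_{n+1}) = f(x_n) + f'(x_n)(x_{n+1} - x_n) = 0
    x = x - f(x) / df(x)
    errors.append(abs(x - x_star))
```

The ratios errors[n+1]/errors[n]² stay bounded, matching the coefficient ℓ/[2(d* − ℓ₀‖x_n − x*‖)] in (7.1.28).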
Proof. It follows immediately by replacing x₀ by x* in Lemma 7.1.4 and in the appropriate places in the proof of Theorem 7.1.5.

Remark 7.1.4 (a) The local convergence of Newton's method was not studied in [302]. However, if it had been, the result would have been as in Theorem 7.1.6 with ℓ₀ = ℓ. Note however that, as in (7.1.2), in general

ℓ₀ ≤ ℓ    (7.1.29)
holds, and ℓ/ℓ₀ can be arbitrarily large. Denote by r*_R the convergence radius that would have been obtained by Robinson. It then follows by (b) of Theorem 7.1.6 that

r*_R = 2d*/(3ℓ) ≤ r* = 2d*/(2ℓ₀ + ℓ).    (7.1.30)
Moreover, in case strict inequality holds in (7.1.29), then so does it in (7.1.30). This observation is very important in computational mathematics, since our approach allows a wider choice of initial guesses x₀. Moreover, the corresponding error bounds of Robinson are obtained if we set ℓ = ℓ₀ in (7.1.29). Therefore our error bounds are also finer (for ℓ₀ < ℓ), and under the same computational cost.

(b) If the operator A is given by the linearization (7.1.1), then A(x_n, x_{n+1}) = 0 is a linear equation, and we recover Newton's method for differentiable operators. Otherwise, the equation A(x_n, x_{n+1}) = 0 applies to a wide class of important real-life problems for which it is not a linear equation. Such problems are discussed below.

Application 7.1.7 (Application to Normal Maps.) There is an important class of nonsmooth functions that have tractable point-based approximations. This class was given in [302] and is a reformulation of the class of variational inequalities on a Hilbert space; in particular it includes the finite-dimensional generalized equations investigated by Josephy [200]–[203], as well as many other problems of practical importance. For many members of this class, computational methods are available to solve the Newton subproblems, and therefore the method we suggest here can be applied not only as a theoretical device, but also as a practical computational method for solving problems.

The functions we shall treat are the so-called normal maps. We define these, and discuss briefly their significance, before relating them to point-based approximations. Let C be a closed convex subset of a Hilbert space H, and denote by Π the metric projector from H onto C. Let Ω be an open subset of H meeting C, and let F be a function from Ω to H. The normal map F_C is defined from the set Π⁻¹(Ω) to H by

F_C(z) = F(Π(z)) + (z − Π(z)).
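A minimal sketch of the normal map for C = [0, ∞), so that Π(z) = max(z, 0) and F_C(z) = 0 encodes the complementarity problem x ≥ 0, F(x) ≥ 0, xF(x) = 0. The sample functions F below are illustrative assumptions:

```python
def proj(z):
    """Metric projector onto C = [0, +inf)."""
    return max(z, 0.0)

def normal_map(F, z):
    """F_C(z) = F(Pi(z)) + (z - Pi(z))."""
    x = proj(z)
    return F(x) + (z - x)

# Interior solution: F(x) = x^2 - 2 has x0 = sqrt(2) > 0, so z0 = x0 - F(x0) = x0.
F1 = lambda x: x ** 2 - 2.0
r1 = normal_map(F1, 2.0 ** 0.5)

# Boundary solution: F(x) = x + 1 has x0 = 0, so z0 = x0 - F(x0) = -1.
F2 = lambda x: x + 1.0
r2 = normal_map(F2, -1.0)
```

In both cases the point z₀ = x₀ − F(x₀) turns the variational solution x₀ into a zero of F_C, which is the equivalence discussed next.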
A more extensive discussion of these normal maps is in [301], where it is shown how the first-order necessary optimality conditions for nonlinear optimization, as well as linear and nonlinear complementarity problems and more general variational inequalities, can all be expressed compactly and conveniently in the form of an equation F_C(z) = 0 involving normal maps. This is true because, as is easily verified, the variational problem of finding an x₀ ∈ C such that for each c ∈ C, ⟨F(x₀), c − x₀⟩ ≥ 0, is completely equivalent to the normal-map equation F_C(z₀) = 0 through the transformation z₀ = x₀ − F(x₀). However, the use of normal maps sometimes enables one to gain insight into special properties of problem classes that might have remained obscure in the formalism of variational inequalities. A particular example of this is the characterization of the local and global homeomorphism properties of linear normal maps given in [301] and improved in [291].
As the functions F and the sets C used in the construction of these normal maps are those naturally appearing in the optimization or variational problems, the normal-map technique provides a simple and convenient device for formulating a wide variety of important problems from applications in a single, unified manner. Results established for such normal maps will then apply to the entire range of these applications, so that there is no need to derive them separately for each different type of problem. Results of this kind already established include an implicit-function theorem and related machinery in [300], and characterizations in [301] of those normal maps derived from linear transformations that are also global or local homeomorphisms.

As we shall now show, the results on Newton's method that we establish here will apply to normal maps, therefore encompassing in particular the problems treated in the work already cited on the application of Newton's method to certain optimization problems and variational inequalities. Our results on local convergence of Newton's method require, in essence, the existence of a PBA that is a local homeomorphism, together with some closeness conditions. To show that these apply to normal maps, we must indicate how one can construct PBAs for such maps.

Recalling our earlier demonstration that the linearization operation yields a PBA for the function being linearized, one might ask if one could linearize the function appearing in a normal map. This does not seem easy, as such functions are in general not differentiable. However, we shall show that a simple modification of the linearization procedure will work very well. The key observation is that although the function F_C will in general not be differentiable, the underlying function F often will be (indeed, often smooth), and therefore will have an easily constructible PBA. We now show that if such a PBA exists for F, then one can easily be constructed for F_C.
Proposition 7.1.8 Let H be a Hilbert space and C a nonempty closed convex subset of H, and denote by Π the metric projector on C. Let F be a function from C to H, and let A be a function from C × C to H. If A is a PBA for F on C, then the function G : H × H → H defined by G(w, z) = A(Π(w), ·)_C(z) is a PBA for F_C on H. Further, the constant in the definition of PBA can be taken to be the same for G as for A.
Before giving the proof we note that by the definition of normal map, G(w, z) is A(Π(w), Π(z)) + (z − Π(z)). Therefore in determining values of this function one only evaluates A on the set C × C, where it is defined. Proof. By hypothesis there is a constant φ for which A has the two properties given in Definition 7.1.1. We shall show that the same two properties hold for G,
256
7. Equations with Nonsmooth Operators
again with the constant φ. First, let w and z be any elements of H. Then
‖F_C(z) − G(w, z)‖ = ‖F(Π(z)) + (z − Π(z)) − [A(Π(w), Π(z)) + (z − Π(z))]‖
= ‖F(Π(z)) − A(Π(w), Π(z))‖
≤ (φ/2) ‖Π(w) − Π(z)‖²
≤ (φ/2) ‖w − z‖²,
where we used the facts that A is a PBA for F on C and that the metric projector is nonexpansive. Therefore G satisfies the first property of a PBA. For the second, suppose that z and z′ are two elements of H; we show that G(z, ·) − G(z′, ·) is Lipschitzian on H with modulus φ‖z − z′‖. Indeed, let x and y be elements of H. Then
‖[G(z, x) − G(z′, x)] − [G(z, y) − G(z′, y)]‖
= ‖[A(Π(z), Π(x)) + (x − Π(x)) − A(Π(z′), Π(x)) − (x − Π(x))] − [A(Π(z), Π(y)) + (y − Π(y)) − A(Π(z′), Π(y)) − (y − Π(y))]‖
= ‖[A(Π(z), Π(x)) − A(Π(z′), Π(x))] − [A(Π(z), Π(y)) − A(Π(z′), Π(y))]‖
≤ φ ‖Π(z) − Π(z′)‖ ‖Π(x) − Π(y)‖
≤ φ ‖z − z′‖ ‖x − y‖,
where we used the second property of a PBA (for A) along with the nonexpansivity of Π.
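The construction of Proposition 7.1.8 can be tried out numerically. The following Python sketch is only illustrative: the smooth map F, the box C = [0, 1]², and the constant φ below are assumptions chosen for the example, not data from the text. It builds the normal map F_C and the induced PBA G from the linearization of F, and spot-checks the first PBA property.

```python
import numpy as np

def F(x):
    # an illustrative smooth map on R^2 (assumption, not from the text)
    return np.array([x[0] ** 2 + x[1], x[0] + x[1] ** 3])

def dF(x):
    # Jacobian of F
    return np.array([[2 * x[0], 1.0], [1.0, 3 * x[1] ** 2]])

def proj(z):
    # metric projector onto the box C = [0, 1]^2 (nonexpansive)
    return np.clip(z, 0.0, 1.0)

def A(u, v):
    # linearization PBA for F on C: A(u, v) = F(u) + F'(u)(v - u)
    return F(u) + dF(u) @ (v - u)

def F_C(z):
    # normal map: F_C(z) = F(Pi(z)) + z - Pi(z)
    p = proj(z)
    return F(p) + z - p

def G(w, z):
    # induced PBA for the normal map: G(w, z) = A(Pi(w), Pi(z)) + z - Pi(z)
    pw, pz = proj(w), proj(z)
    return A(pw, pz) + z - pz

# spot-check the first PBA property ||F_C(z) - G(w, z)|| <= (phi/2)||w - z||^2
# with phi = 6, a Lipschitz constant for dF on C (assumed for this toy map)
phi = 6.0
rng = np.random.default_rng(0)
for _ in range(1000):
    w, z = rng.uniform(-1.0, 2.0, 2), rng.uniform(-1.0, 2.0, 2)
    lhs = np.linalg.norm(F_C(z) - G(w, z))
    assert lhs <= phi / 2 * np.linalg.norm(w - z) ** 2 + 1e-12
```

Because the projection is nonexpansive, the constant φ that works for A on C also works for G on all of H, which is exactly the content of the proposition.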
7.2
Convergence of Newton-Kantorovich Method. Case 2: Hölderian Assumptions
In this section we further weaken the assumptions made in Section 7.1 by considering Hölderian instead of Lipschitzian assumptions, in order to cover a wider range of problems than the ones discussed in Section 7.1 or in [302]. Numerical examples are also provided to show that our results apply where others fail. We need the definitions:
Definition 7.2.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. We say F has a point-based approximation p-(PBA) at some point x₀ ∈ D if there exist an operator A : D × D → Y and scalars k, k₀ such that for each u and v in D and some p ∈ (0, 1]:
‖F(v) − A(u, v)‖ ≤ (k/(1 + p)) m(u, v)^{1+p},   (7.2.1)
‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ (2k/(1 + p)) m(u, v)ᵖ m(x, y),   (7.2.2)
and
‖[A(u, x) − A(x₀, x)] − [A(u, y) − A(x₀, y)]‖ ≤ (2k₀/(1 + p)) m(u, x₀)ᵖ m(x, y)   (7.2.3)
for all x, y ∈ D. Let us assume that D is a convex set and X is also a normed linear space for both cases that follow. Here are some cases where the above hold:
(a) Let p = 1. Then see Section 7.1.
(b) If F has a Fréchet derivative that is p-Hölderian on D with modulus k and center-p-Hölderian (at x₀) with modulus k₀, then with the choice of A given by (7.1.1) we get
‖F(v) − F(u) − F′(u)(v − u)‖ ≤ (k/(1 + p)) ‖u − v‖^{1+p},
whereas again (7.2.2) and (7.2.3) are equivalent to the Hölderian and center-Hölderian property of operator F′ (see also Section 5.1).
We need the following generalization of the classical Banach Lemma 1.3.2 on invertible operators:
Lemma 7.2.2 Let (X, d) be a Banach space, D a closed subset of X, and Y a normed linear space. Let F, G be operators from D into Y, with G being Lipschitzian with modulus k and center-Lipschitzian with modulus k₀. Let x₀ ∈ D, and set F(x₀) = y₀. Assume that:
(a) U(y₀, α) ⊆ F(D);
(b) 0 ≤ k < d;
(c) U(x₀, d₁⁻¹α) ⊆ D, and
(d) θ = (1 − k₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.
Then the following hold:
U(y₀, θ) ⊆ (F + G)(U(x₀, d₁⁻¹α)),   (7.2.4)
and
δ(F + G, D) ≥ d − k > 0.   (7.2.5)
Proof. Define operator Ty(x) = F⁻¹(y − G(x)) for y ∈ U(y₀, θ) and x ∈ U(x₀, d₁⁻¹α). We can get:
‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ + k₀(d₁⁻¹α) + ‖G(x₀)‖ = α.
Therefore Ty(x) is a singleton set since δ > 0. That is, Ty is an operator on U(x₀, d₁⁻¹α). This operator maps U(x₀, d₁⁻¹α) into itself. Indeed, for x ∈ U(x₀, d₁⁻¹α),
d(Ty(x), x₀) = d(F⁻¹(y − G(x)), F⁻¹(y₀)) ≤ d₁⁻¹α,
so we get:
d(Ty(x), x₀) ≤ d₁⁻¹α.
Moreover, let u, v be in U(x₀, d₁⁻¹α); then
d(Ty(u), Ty(v)) = d(F⁻¹(y − G(u)), F⁻¹(y − G(v))) ≤ d⁻¹‖G(u) − G(v)‖ ≤ d⁻¹k d(u, v),
so
d(Ty(u), Ty(v)) ≤ d⁻¹k d(u, v).   (7.2.6)
It follows by hypothesis (b) and (7.2.6) that Ty is a contraction operator (see Chapter 3) mapping the closed ball U(x₀, d₁⁻¹α) into itself, and as such it has a unique fixed point x(y) in this ball. Furthermore,
δ(F + G, D) = inf{ ‖[F(u) − F(v)] + [G(u) − G(v)]‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ d − sup{ ‖G(u) − G(v)‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ d − k > 0.   (7.2.7)
Furthermore, (7.2.7) shows F + G is one-to-one on D. The following lemma can be used to show uniqueness of the solution in the semilocal case and convergence of Newton's method in the local case:
Lemma 7.2.3 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let F : D → Y, and let A be a p-(PBA) for F on D at the point x₀ ∈ D. Denote by d₀ the quantity δ(A(x₀, ·), D). If x₀ ∈ D and U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d₀ − (2k₀ + k) ρᵖ/(1 + p).
In particular, if d₀ > (2k₀ + k) ρᵖ/(1 + p), then F is one-to-one on U(x₀, ρ).
Proof. Let y, z be points in U(x₀, ρ). Set x = (y + z)/2. We can write
F(y) − F(z) = [F(y) − A(x, y)] + [A(x, y) − A(x, z)] + [A(x, z) − F(z)].
Then we get
‖F(y) − A(x, y)‖ ≤ (k/(1 + p)) ‖x − y‖^{1+p} = (k/(2^{1+p}(1 + p))) ‖y − z‖^{1+p},
and similarly
‖F(z) − A(x, z)‖ ≤ (k/(2^{1+p}(1 + p))) ‖y − z‖^{1+p}.
Using the estimate
‖A(x, u) − A(x, v)‖ ≥ ‖A(x₀, u) − A(x₀, v)‖ − ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖,
we get in turn:
δ(A(x, ·), D) = inf{ ‖A(x, u) − A(x, v)‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ δ(A(x₀, ·), D) − sup{ ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d₀ − (2k₀/(1 + p)) ‖x − x₀‖ᵖ ≥ d₀ − (2k₀/(1 + p)) ρᵖ.
Hence, we get
‖F(y) − F(z)‖ ≥ (d₀ − (2k₀/(1 + p)) ρᵖ) ‖y − z‖ − (k/(2ᵖ(1 + p))) ‖y − z‖^{1+p},
and for y ≠ z
‖F(y) − F(z)‖ / ‖y − z‖ ≥ d₀ − (2k₀/(1 + p)) ρᵖ − (k/(2ᵖ(1 + p))) ‖y − z‖ᵖ ≥ d₀ − (2k₀ + k) ρᵖ/(1 + p).
Remark 7.2.1 Lemmas 7.2.2 and 7.2.3 can be reduced to the corresponding ones in [302] if p = 1, d₁ = d = d₀, and k₀ = k. In the case p = 1 and k₀ < k or d₁ > d, our Lemma 7.2.2 improves the range for θ, whereas our Lemma 7.2.3 enlarges ρ given in [300]. Note that in general k₀ ≤ k and d ≤ d₁ hold, and k/k₀, d₁/d can be arbitrarily large (see Section 5.1).
It is convenient for us to define scalar iteration {t_n} (n ≥ 0) by
t₀ = 0, t₁ = η,
t_{n+2} = t_{n+1} + k(t_{n+1} − t_n)^{1+p} / [ d₀(1 + p)(1 − (2k₀/(1 + p)) d₀⁻¹ t_{n+1}ᵖ) ]   (7.2.8)
for some η ≥ 0. The convergence of sequence {t_n} was studied in Section 5.1 and in [90]. In particular, we showed that if
h = [ k + (2k₀/(1 + p)) ((p + 1)/p)ᵖ ] d₀⁻¹ ηᵖ ≤ 1 for p ≠ 1, or h = (k₀ + k)ηd₀⁻¹ ≤ 1 for p = 1,   (7.2.9)
then sequence {t_n} is monotonically convergent to its upper bound t* with
η ≤ t* ≤ ((p + 1)/p) η = r.   (7.2.10)
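The behaviour of iteration (7.2.8) is easy to observe numerically. In the Python sketch below the values of k, k₀, d₀, η, and p are illustrative choices (assumptions for this example, not taken from the text) that satisfy condition (7.2.9); the computed sequence is non-decreasing and stays below the bound r of (7.2.10).

```python
def majorizing_sequence(k, k0, d0, eta, p, n_terms=30):
    # scalar iteration (7.2.8): t0 = 0, t1 = eta
    t = [0.0, eta]
    for _ in range(n_terms):
        step = k * (t[-1] - t[-2]) ** (1 + p) / (
            d0 * (1 + p) * (1.0 - (2.0 * k0 / (1 + p)) * t[-1] ** p / d0))
        t.append(t[-1] + step)
    return t

# illustrative parameters (assumed) satisfying condition (7.2.9)
k, k0, d0, eta, p = 1.0, 0.5, 2.0, 0.1, 0.5
t = majorizing_sequence(k, k0, d0, eta, p)
r = (p + 1) / p * eta                          # upper bound from (7.2.10)
assert all(b >= a for a, b in zip(t, t[1:]))   # monotone non-decreasing
assert eta <= t[-1] <= r                       # converges below r
```

With these parameters the sequence settles quickly, reflecting the superlinear decay of the increments t_{n+1} − t_n under (7.2.9).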
Note also that these conditions are obtained as special cases from Lemma 5.1.1. Conditions (7.2.9) were compared favourably in [90] with corresponding ones already in the literature [303]. That is why we decided to use them here over the earlier ones. It will be shown that {t_n} is a majorizing sequence for Newton's iteration {x_n} (n ≥ 0) to be defined in the main semilocal convergence theorem that follows. We can state and prove the main semilocal convergence result for Newton's method involving a p-(PBA).
Theorem 7.2.4 Let X and Y be Banach spaces, D a closed convex subset of X, x₀ ∈ D, and F a continuous operator from D into Y. Suppose that F has a p-(PBA) at x₀. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ > 0;
(b) condition (7.2.9) holds;
(c) for each y ∈ U(0, d₀(t* − η)) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ η; and
(e) U(x₀, t*) ⊆ D.
Then the Newton iteration defining x_{n+1} by
A(x_n, x_{n+1}) = 0   (n ≥ 0)   (7.2.11)
remains in U(x₀, r), and converges to a solution x* ∈ U(x₀, r) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n   (7.2.12)
and
‖x_n − x*‖ ≤ t* − t_n,   (7.2.13)
where sequence {t_n} is defined by (7.2.8) and t* = lim_{n→∞} t_n.
Proof. We use Lemma 7.2.2 with quantities F, G, x₀, and y₀ replaced by A(x, ·), A(x₁, ·) − A(x, ·), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of Lemma 7.2.2 follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) > 0). For hypothesis (b) we have
δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − (2k₀/(1 + p)) ‖x₁ − x₀‖ᵖ ≥ d₀ − (2k₀/(1 + p)) ηᵖ > 0
(by (7.2.9)).
To show (c) we must have U(x₁, t* − η) ⊆ D. Instead, by hypothesis (e) it suffices to show U(x₁, t* − η) ⊂ U(x₀, t*), which is true since ‖x₁ − x₀‖ ≤ η and so (t* − η) + ‖x₁ − x₀‖ ≤ t*. We also have by (7.2.1)
‖A(x₀, x₁) − A(x₁, x₁)‖ ≤ (k/(1 + p)) ‖x₀ − x₁‖^{1+p}.
The quantity
d₀η[1 − (2k₀/(1 + p)) d₀⁻¹ ηᵖ] − (k/(1 + p)) η^{1+p}   (7.2.14)
is nonnegative by (7.2.9), and is also a lower bound on θ given by (d) in Lemma 7.2.2, since θ is bounded below by
d₀η[1 − (2k₀/(1 + p)) d₁⁻¹ ηᵖ] − ‖A(x₀, x₁) − A(x₁, x₁)‖.
It follows from Lemma 7.2.2 that for each y ∈ U(0, t* − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in D, since δ(A(x₁, ·), D) > 0. We also have A(x₀, x₁) = A(x₁, x₂) = 0 and A(x₁, x₁) = F(x₁). By Definition 7.2.1 we get in turn:
‖x₂ − x₁‖ ≤ δ(A(x₁, ·), D)⁻¹ ‖A(x₁, x₂) − F(x₁)‖
≤ d₀⁻¹ [1 − (2k₀/(1 + p)) d₀⁻¹ ‖x₁ − x₀‖ᵖ]⁻¹ (k/(1 + p)) ‖x₁ − x₀‖^{1+p}
≤ d₀⁻¹ [1 − (2k₀/(1 + p)) d₀⁻¹ t₁ᵖ]⁻¹ (k/(1 + p)) (t₁ − t₀)^{1+p} ≤ t₂ − t₁.
Hence we showed:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n   (7.2.15)
and
U(x_{n+1}, t* − t_{n+1}) ⊂ U(x_n, t* − t_n)   (7.2.16)
hold for n = 0, 1. Moreover, for every v ∈ U(x₁, t* − t₁),
‖v − x₀‖ ≤ ‖v − x₁‖ + ‖x₁ − x₀‖ ≤ t* − t₁ + t₁ = t* − t₀
implies v ∈ U(x₀, t* − t₀). Given that they hold for n = 0, 1, …, j, then
‖x_{j+1} − x₀‖ ≤ Σ_{i=1}^{j+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{j+1} (t_i − t_{i−1}) = t_{j+1} − t₀ = t_{j+1}.
The induction for (7.2.15), (7.2.16) can easily be completed by simply replacing above x₀, x₁ by x_n, x_{n+1}, respectively. Indeed, the crucial quantity corresponding to (7.2.14) will be
d₀η[1 − (2k₀/(1 + p)) ((p + 1)/p)ᵖ ηᵖ d₀⁻¹] − (k/(1 + p)) η^{1+p},
which is nonnegative by (7.2.9), or equivalently
[ (2k₀/(1 + p)) ((p + 1)/p)ᵖ + k/(1 + p) ] d₀⁻¹ ηᵖ ≤ 1.
Furthermore, by (7.2.9) scalar sequence {t_n} is Cauchy. From (7.2.15) and (7.2.16) it follows {x_n} is Cauchy too in a Banach space X, and as such it converges to some x* ∈ U(x₀, t*). Moreover, we have by (7.2.1) and (7.2.15)
‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ (k/(1 + p)) ‖x_{n+1} − x_n‖^{1+p} ≤ (k/(1 + p)) (t_{n+1} − t_n)^{1+p}.
Therefore ‖F(x_{n+1})‖ converges to zero as n → ∞. By the continuity of F we deduce F(x*) = 0. Estimate (7.2.13) now follows from (7.2.12) by using standard majorization techniques.
Remark 7.2.2 The uniqueness of the solution x* was not considered in Theorem 7.2.4. Indeed, we do not know if under the conditions stated above the solution x* is unique, say in U(x₀, t*). However, using Lemma 7.2.3 we can obtain a uniqueness result, so that if ρ satisfies
t* < ρ < [ (1 + p)d₀ / (2k₀ + k) ]^{1/p},
then operator F is one-to-one in a neighborhood of x*, since x* ∈ U(x₀, t*). Note that r can replace t* in Theorem 7.2.4 as well as in the above double inequality. That is, x* is an isolated zero of F in this case.
We can also provide a local convergence result for Newton's method:
Theorem 7.2.5 Let F : D ⊆ X → Y be as in Theorem 7.2.4. Assume: x* ∈ D is an isolated zero of F; operator F has a p-(PBA) on D with modulus (ℓ, ℓ₀) at the point x*. Moreover assume that the following hold:
(a) δ(A(x*, ·), D) ≥ d* > 0;
(b) d* ≥ ((2ℓ₀ + ℓ)/(1 + p)) (r*)ᵖ;
(c) for each y ∈ U(0, d*r*) and provided x₀ ∈ U⁰(x*, r*) the equation A(x₀, x) = y has a solution x satisfying ‖x − x*‖ ≤ r*, and U(x*, r*) ⊆ D.
Then Newton's method generated by (7.2.11) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x*.
Proof. Simply replace x₀ by x* in Lemma 7.2.3 and in the appropriate places in the proof of Theorem 7.2.4.
Example 7.2.1 Let X = Y = R, D = [0, b] for some b > 1. Define function F on D by
F(x) = x^{1+1/i}/(1 + 1/i) − b₀x + b₁
for some real parameters b₀ and b₁, and an integer i > 1. Then no matter how operator A is chosen corresponding to (7.2.2), (7.2.3), the conditions in [302] will not hold (i.e. set p = 1 in both (7.2.2) and (7.2.3)). However, if for example A is given by (7.1.1), our conditions (7.2.2)–(7.2.3) hold for, say, i = 2, p = 1/2, k₀ = k = 1, and x₀ = 1. A more interesting application of Theorem 7.2.4 is given by Example 3.3.1 (alternative approach).
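Example 7.2.1 can be run concretely. In the Python sketch below, i = 2 (so p = 1/2), and the values b₀ = 0.5, b₁ = −0.1 are illustrative parameter choices (assumptions for this sketch, not prescribed by the text). Newton's method started at x₀ = 1 converges even though F′ is only 1/2-Hölderian.

```python
# Example 7.2.1 with i = 2: F(x) = x**(3/2)/(3/2) - b0*x + b1 on [0, b];
# b0, b1 below are illustrative assumptions.
b0, b1 = 0.5, -0.1

def F(x):
    return x ** 1.5 / 1.5 - b0 * x + b1

def dF(x):
    # F'(x) = sqrt(x) - b0 is 1/2-Hoelder but not Lipschitz at 0
    return x ** 0.5 - b0

x = 1.0                          # x0 = 1 as in the example
for _ in range(30):
    x -= F(x) / dF(x)            # Newton step: solve A(x_n, w) = 0 for w
assert abs(F(x)) < 1e-12

# F' satisfies |F'(u) - F'(v)| <= |u - v|**(1/2), i.e. k = 1, p = 1/2
for u in (0.0, 0.01, 0.25, 1.0):
    for v in (0.0, 0.5, 0.9):
        assert abs(dF(u) - dF(v)) <= abs(u - v) ** 0.5 + 1e-15
```

Here the linearization A(u, v) = F(u) + F′(u)(v − u) of (7.1.1) is used, so each Newton step solves A(x_n, w) = 0 for w.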
7.3
Convergence of Newton-Kantorovich Method. Case 3: Local Lipschitzian Assumptions
We need the definitions suitable for further weakening the earlier hypotheses:
Definition 7.3.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. Let x₀ ∈ D, ε ≥ 0, and ε₀ ≥ 0. We say F has an (x₀, ε₀, ε) point-based approximation (PBA) on D if there exist an operator A : D × D → Y and R > 0 such that for all u, v, x, y ∈ U(x₀, R):
(a) U(x₀, R) ⊆ D,   (7.3.1)
(b) ‖F(v) − A(u, v)‖ ≤ (ε/2) m(u, v),   (7.3.2)
(c) ‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ ε m(u, v) m(x, y),   (7.3.3)
and
(d) ‖[A(u, x) − A(x₀, x)] − [A(u, y) − A(x₀, y)]‖ ≤ ε₀ m(u, x₀) m(x, y).   (7.3.4)
Let X be also a normed linear space. Then, if F has a Fréchet derivative F′ which is ε/2-continuous on D, and we set A(u, v) = F(u) + F′(u)(v − u), (b) gives
‖F(v) − F(u) − F′(u)(v − u)‖ ≤ (ε/2) ‖v − u‖   (7.3.5)
provided that D is also a convex set, whereas parts (c) and (d) are equivalent to the continuity property of F′. Conditions (7.3.1)–(7.3.4) are weaker than the ones considered in [302, p. 293] or in Section 7.1. The advantages of such an approach have already been explained in Section 5.1 for the classical case. Similar advantages can be found in the nonsmooth case. Let δ(F, D), δ₀(F, D), δ, δ₁ be as δ(F, D), δ₀(F, D), d, d₁ in Definition 7.1.2, respectively. We need the following generalization of the classical Banach Lemma 1.3.2 on invertible operators in order to examine both the semilocal as well as the local convergence of Newton's method:
Lemma 7.3.2 Let X be a Banach space, D a closed subset of X, and Y a normed linear space. Let F, G be operators from D into Y. Let x₀ ∈ D and set F(x₀) = y₀. Let also G be Lipschitzian with modulus ε and center-Lipschitzian at x₀ with modulus ε₀. Assume that:
(a) U(y₀, α) ⊆ F(D);
(b) 0 ≤ ε < δ;
(c) U(x₀, δ₁⁻¹α) ⊆ D, and
(d) θ = (1 − ε₀δ₁⁻¹)α − ‖G(x₀)‖ ≥ 0.
Then the following hold:
U(y₀, θ) ⊆ (F + G)(U(x₀, δ₁⁻¹α))
and
δ(F + G, D) ≥ δ − ε > 0.
Proof. Define operator Ty(x) = F⁻¹(y − G(x)) for each fixed y ∈ U(y₀, θ) and x ∈ U(x₀, δ₁⁻¹α). We can get
‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ + ε₀(δ₁⁻¹α) + ‖G(x₀)‖ = α.
Therefore Ty(x) is a singleton set since δ > 0. That is, Ty is an operator on U(x₀, δ₁⁻¹α). This operator maps U(x₀, δ₁⁻¹α) into itself. Indeed, for x ∈ U(x₀, δ₁⁻¹α), we get:
m(Ty(x), x₀) = m(F⁻¹(y − G(x)), F⁻¹(y₀)) ≤ δ₁⁻¹α.
Moreover, let u, v be in U(x₀, δ₁⁻¹α); then
m(Ty(u), Ty(v)) ≤ δ⁻¹ e[G(u), G(v)] ≤ δ⁻¹ ε m(u, v).
It follows by hypothesis (b) that Ty is a contraction operator mapping the closed ball U(x₀, δ₁⁻¹α) into itself, and as such it has a unique fixed point x(y) in this ball. Moreover,
δ(F + G, D) = inf{ ‖[F(u) − F(v)] + [G(u) − G(v)]‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ δ − sup{ ‖G(u) − G(v)‖ / m(u, v) : u ≠ v, u, v ∈ D } ≥ δ − ε > 0.
Furthermore, the above estimate shows F + G is one-to-one on D.
Remark 7.3.1 In general
ε₀ ≤ ε and δ ≤ δ₁   (7.3.6)
hold. If strict inequality holds in any of the inequalities above, then Lemma 7.3.2 improves the corresponding Lemma 3.1 in [300], since the range for θ is enlarged, and under the same computational cost, since the computation of ε (or δ) in practice requires the evaluation of ε₀ (or δ₁). Note also that ε/ε₀ and δ₁/δ can be arbitrarily large (see Section 5.1). The following lemma helps to establish the uniqueness of the solution x* in the semilocal case.
Lemma 7.3.3 Let X and Y be normed linear spaces, let D be a closed subset of X, and x₀ ∈ D, ε₀ ≥ 0 and ε ≥ 0. Let F have an (x₀, ε₀, ε) (PBA) on D. Denote by d the quantity δ(A(x₀, ·), D). If U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d − ε₀ρ − ε/2.
In particular, if d > ε₀ρ + ε/2, then F is one-to-one on U(x₀, ρ).
Proof. Let y, z be points in U(x₀, ρ). Set x = (y + z)/2. We can write
F(y) − F(z) = [F(y) − A(x, y)] + [A(x, y) − A(x, z)] + [A(x, z) − F(z)].
Then we get
‖F(y) − A(x, y)‖ ≤ (ε/2)‖x − y‖ = (ε/4)‖y − z‖,
and similarly,
‖F(z) − A(x, z)‖ ≤ (ε/2)‖x − z‖ = (ε/4)‖y − z‖.
Using the estimate
‖A(x, u) − A(x, v)‖ ≥ ‖A(x₀, u) − A(x₀, v)‖ − ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖,
we get in turn
δ(A(x, ·), D) = inf{ ‖A(x, u) − A(x, v)‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ δ(A(x₀, ·), D) − sup{ ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d − ε₀‖x − x₀‖ ≥ d − ε₀ρ.
Hence, we get
‖F(y) − F(z)‖ ≥ (d − ε₀ρ)‖y − z‖ − (ε/2)‖y − z‖,
and for y ≠ z
‖F(y) − F(z)‖ / ‖y − z‖ ≥ d − ε₀ρ − ε/2.
From now on we set D = U(x₀, R). We can now state and prove the main semilocal convergence theorem for Newton's method:
Theorem 7.3.4 Let X and Y be Banach spaces, D a closed convex subset of X, x₀ ∈ D, ε > 0, ε₀ ≥ 0, and F a continuous operator from D into Y. Suppose that F has an (x₀, ε₀, ε) (PBA) on D. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ ≥ ε > 0;
(b) 2ε₀η ≤ min{2(d₀ − ε), d₀ − ε/2};
(c) for each y ∈ U(0, d₀η) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ η; and
(e) U(x₀, 2η) ⊆ D.
Then the Newton iteration defining x_{n+1} by
A(x_n, x_{n+1}) = 0   (n ≥ 0)   (7.3.7)
remains in U(x₀, 2η), and converges to a solution x* ∈ U(x₀, 2η) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n,   (7.3.8)
and
‖x_n − x*‖ ≤ 2η − t_n,   (7.3.9)
where lim_{n→∞} t_n = 2η and
t₀ = 0, t₁ = η, t_{n+2} = t_{n+1} + (1/2)(t_{n+1} − t_n).
Proof. We use Lemma 7.3.2 with quantities F, G, x₀, and y₀ replaced by A(x, ·), A(x₁, ·) − A(x, ·), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of Lemma 7.3.2 follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) > 0). For hypothesis (b) we have
δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − ε₀‖x₁ − x₀‖ ≥ d₀ − ε₀η > 0.
To show (c) we must have U(x₁, η) ⊆ D. Instead, by hypothesis (e) of the theorem it suffices to show U(x₁, η) ⊂ U(x₀, 2η), which is true since ‖x₁ − x₀‖ + η ≤ 2η. Note that
‖A(x₀, x₁) − A(x₁, x₁)‖ ≤ (ε/2)‖x₀ − x₁‖ ≤ (ε/2)η,
and since d₀ ≤ d₁ = δ₀(A(x₀, ·), D), we get
θ ≥ (1 − 2ηε₀d₁⁻¹)d₀η − ‖A(x₀, x₁) − A(x₁, x₁)‖
≥ (1 − 2ηε₀d₀⁻¹)d₀η − (ε/2)η = (d₀ − 2ε₀η − ε/2)η ≥ 0,
by hypothesis (b). That is, condition (d) of the Lemma is also satisfied. It now follows from Lemma 7.3.2 that for each y ∈ U(0, 2η − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in U, since δ(A(x₁, ·), D) > 0. We also have A(x₀, x₁) = A(x₁, x₂) = 0, and A(x₁, x₁) = F(x₁). By Definition 7.1.2 we get in turn:
‖x₂ − x₁‖ ≤ δ(A(x₁, ·), U)⁻¹ ‖A(x₁, x₂) − A(x₁, x₁)‖
≤ d₀⁻¹ (1 − d₀⁻¹ε₀‖x₁ − x₀‖)⁻¹ ‖A(x₀, x₁) − F(x₁)‖
≤ d₀⁻¹ (1 − d₀⁻¹ε₀‖x₁ − x₀‖)⁻¹ (ε/2)‖x₁ − x₀‖ ≤ η/2 = t₂ − t₁,
by hypothesis (b). Hence, we showed:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n,   (7.3.10)
and
U(x_{n+1}, 2η − t_{n+1}) ⊆ U(x_n, 2η − t_n)   (7.3.11)
hold for n = 0, 1. Moreover, for every z ∈ U(x₁, 2η − t₁),
‖z − x₀‖ ≤ ‖z − x₁‖ + ‖x₁ − x₀‖ ≤ 2η − t₁ + t₁ = 2η − t₀
implies z ∈ U(x₀, 2η − t₀). Given that they hold for n = 0, 1, …, k, then
‖x_{k+1} − x₀‖ ≤ Σ_{i=1}^{k+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{k+1} (t_i − t_{i−1}) = t_{k+1} − t₀ = t_{k+1}.
The induction can easily be completed by simply replacing above x₀, x₁ by x_n, x_{n+1}, respectively. Furthermore, scalar sequence {t_n} is Cauchy. From (7.3.10) and (7.3.11) it follows {x_n} is Cauchy too in a Banach space X, and as such it converges to some x* ∈ U(x₀, 2η). Then we have
‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ (ε/2)‖x_{n+1} − x_n‖ ≤ (ε/2)(t_{n+1} − t_n).
That is, ‖F(x_{n+1})‖ converges to 0 as n → ∞. By the continuity of F we deduce F(x*) = 0. Estimate (7.3.9) follows from (7.3.8) by using standard majorization techniques. Finally, note
t_{n+2} = t_{n+1} + (1/2)(t_{n+1} − t_n) = ⋯ = η (1 − (1/2)^{n+2}) / (1 − 1/2),
and consequently lim_{n→∞} t_n = 2η.
Remark 7.3.2 The uniqueness of the solution x* was not studied in Theorem 7.3.4. Indeed, we do not know if under the conditions stated above the solution x* is unique, say in U(x₀, 2η). However, we can obtain a uniqueness result by using Lemma 7.3.3 at the point x₀, so that if ρ satisfies
2η < ρ < (d₀ − ε/2)ε₀⁻¹ = ε₁ for ε ∈ (0, 2d₀), ε₀ > 0, and U(x₀, ε₁) ⊆ D,
then operator F is one-to-one in a neighborhood of x*, since x* ∈ U(x₀, 2η). Therefore x* is an isolated zero of F in this case.
Example 7.3.1 Let us show how to choose operator A in some interesting cases. Let B(u) ∈ L(X, Y), the space of bounded linear operators from X into Y, and define A by
A(u, v) = F(u) + B(u)(v − u).
This is the class of the so-called Newton-like methods (see Section 5.1). Moreover, assume there exist nonnegative, nondecreasing functions βₜ(p, q) and γ(p) on [0, +∞) × [0, +∞) and [0, +∞), respectively, and F′(u) ∈ L(X, Y) such that:
‖F′(x + t(y − x)) − B(x₀)‖ ≤ βₜ(‖x − x₀‖, ‖y − x₀‖)
and
‖B(x) − B(x₀)‖ ≤ γ(‖x − x₀‖)
for all x, y ∈ U(x₀, R), t ∈ [0, 1]. Then we can have
F(v) − F(u) − B(u)(v − u) = ∫₀¹ [F′(u + t(v − u)) − B(x₀)](v − u) dt + (B(x₀) − B(u))(v − u).
Therefore we can define
βₜ(p, q) = ℓ[(1 − t)p + tq]   (ℓ > 0)
and γ(p) = ℓp. In the special case of Newton's method, i.e. when B(u) = F′(u), we can set
ε₀ = ε = 2[βₜ(R, R) + γ(R)] = 4ℓR.
In this case the crucial condition (b) of Theorem 7.3.4 becomes
R ≤ d₀/(4ℓ(1 + 2η)).
We must also have U(x₀, 2η) ⊆ U(x₀, R), i.e. 2η ≤ R. By combining all the above we deduce:
R ∈ [2η, d₀/(4ℓ(1 + 2η))],
provided that:
η ∈ [0, (−ℓ + √(ℓ(ℓ + d₀)))/(4ℓ)].
Consider function F defined on [0, ∞) and given by
F(x) = x^{1+1/n}/(1 + 1/n)   (n > 1).
Then F′ is center-Lipschitz at x₀ = 1 with modulus 1 (see Section 5.2). However, F′ is not Lipschitz on [0, ∞). Therefore the Newton–Kantorovich theorem in the form given in [302] cannot apply here, whereas our Theorem 7.3.4 can.
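This contrast can be checked numerically. In the following sketch n = 2 is an illustrative choice (so F′(x) = √x): the center-Lipschitz bound at x₀ = 1 holds with modulus 1, while the difference quotient of F′ blows up near 0, so no global Lipschitz constant exists.

```python
def dF(x):
    # F'(x) = x**(1/n) with n = 2, i.e. F(x) = x**(1 + 1/n)/(1 + 1/n)
    return x ** 0.5

x0 = 1.0
# center-Lipschitz at x0 = 1 with modulus 1:
# |sqrt(x) - 1| = |x - 1| / (sqrt(x) + 1) <= |x - 1|
for x in (0.0, 0.1, 0.5, 2.0, 10.0, 100.0):
    assert abs(dF(x) - dF(x0)) <= abs(x - x0)

# but no global Lipschitz constant exists: near 0 the difference
# quotient (dF(h) - dF(0)) / h = h**(-1/2) is unbounded
quotients = [(dF(h) - dF(0.0)) / h for h in (1e-2, 1e-4, 1e-6)]
assert quotients == sorted(quotients) and quotients[-1] > 1e2
```

The growing quotients (10, 100, 1000, …) make the failure of the classical Lipschitz hypothesis concrete, while the center-Lipschitz condition used by Theorem 7.3.4 is satisfied.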
7.4
Convergence of the Secant Method Using Majorizing Sequences
We noticed (see the numerical example at the end of this section) that approximations of the kind discussed in the previous sections may not exist. Therefore, in order to solve a wider range of problems, we introduce a more flexible and precise point-based approximation which is more suitable for Newton-like methods, and in particular for Secant-type iterative procedures (see Section 5.1). A local as well as a semilocal convergence analysis for the Secant method is provided, and our approach is justified through numerical examples. We need a definition of a point-based approximation (PBA) for operator F which is suitable for the Secant method.
Definition 7.4.1 Let F be an operator from a closed subset D of a metric space (X, m) into a normed linear space Y. Operator F has a (PBA) on D at the point x₀ ∈ D if there exist an operator A : D × D × D → Y and scalars ℓ₀, ℓ such that for all u, v, w, x, y, and z in D:
‖F(w) − A(u, v, w)‖ ≤ ℓ m(u, w) m(v, w),   (7.4.1)
‖[A(x, y, z) − A(x₀, x₀, z)] − [A(x, y, w) − A(x₀, x₀, w)]‖ ≤ ℓ₀[m(x, x₀) + m(y, x₀)] m(z, w),   (7.4.2)
and
‖[A(x, y, z) − A(u, v, z)] − [A(x, y, w) − A(u, v, w)]‖ ≤ ℓ[m(x, u) + m(y, v)] m(z, w).   (7.4.3)
We then say A is a (PBA) for F . This definition is suitable for the application of the Secant method. Indeed let X be also a normed linear space, D a convex set and F having a divided difference of order one on D × D denoted by [x, y; F ] and satisfying the standard condition [68], [99], [213], [217] (see also Chapter 2): k[u, v; F ] − [w, x; F ]k ≤ ℓ(ku − wk + kv − xk)
(7.4.4)
for all u, v, w and x in D. If we set A(u, v, w) = F (v) + [u, v; F ](w − v)
(7.4.5)
then (7.4.1) becomes kF (w) − F (v) − [u, v; F ](w − v)k ≤ ℓku − wk kv − wk,
(7.4.6)
whereas (7.4.2) and (7.4.3) are equivalent to property (7.4.4) of linear operator [·, ·; F ]. Note that a (PBA) does not imply differentiability. It follows by (7.4.1) that one way of finding a solution x∗ of equation F (x) = 0 is to solve for w the equation A(x, y, w) = 0
(7.4.7)
provided that x and y are given. We state the following generalization of the classical Banach Lemma 1.3.2 on invertible operators. The proof can be found in Lemma 7.1.3.
Lemma 7.4.2 Let X, D and Y be as in Definition 7.4.1. Assume further X is a Banach space. Let F and G be operators from D into Y with G being Lipschitzian with modulus ℓ and center-Lipschitzian with modulus ℓ₀. Let x₀ ∈ D with F(x₀) = y₀. Assume that:
U(y₀, α) ⊆ F(D);   (7.4.8)
0 ≤ ℓ < d;   (7.4.9)
U(x₀, d₁⁻¹α) ⊆ D,   (7.4.10)
and
θ₀ = (1 − ℓ₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.   (7.4.11)
Then the following hold:
U(y₀, θ₀) ⊆ (F + G)(U(x₀, d₁⁻¹α))   (7.4.12)
and
δ(F + G, D) ≥ d − ℓ > 0.   (7.4.13)
The advantages of this approach have been explained in Remark 7.1.1. The following lemma is used to show uniqueness of the solution in the semilocal case and convergence of the Secant method in the local case.
Lemma 7.4.3 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let F : D → Y, and let A be a (PBA) for operator F on D at the point x₀ ∈ D. Denote by d the quantity δ(A(x₀, x₀, ·), D). If U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d − (2ℓ₀ + ℓ)ρ.   (7.4.14)
In particular, if d − (2ℓ₀ + ℓ)ρ > 0, then F is one-to-one on U(x₀, ρ).
Proof. Let w, z be points in U(x₀, ρ), and set x = y = (w + z)/2. We can write
F(w) − F(z) = [F(w) − A(x, y, w)] + [A(x, y, w) − A(x, y, z)] + [A(x, y, z) − F(z)].
By (7.4.1) we can have
‖F(w) − A(x, y, w)‖ ≤ ℓ‖x − w‖ ‖y − w‖
and
‖F(z) − A(x, y, z)‖ ≤ ℓ‖x − z‖ ‖y − z‖.
Moreover, we can find
‖A(x, y, u) − A(x, y, v)‖ ≥ ‖A(x₀, x₀, u) − A(x₀, x₀, v)‖ − ‖[A(x, y, u) − A(x₀, x₀, u)] − [A(x, y, v) − A(x₀, x₀, v)]‖,
and therefore
δ(A(x, y, ·), D) ≥ δ(A(x₀, x₀, ·), D) − sup{ ‖[A(x, y, u) − A(x₀, x₀, u)] − [A(x, y, v) − A(x₀, x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d − ℓ₀(‖x − x₀‖ + ‖y − x₀‖) ≥ d − 2ℓ₀ρ.   (7.4.15)
Furthermore, we can now have
‖F(w) − F(z)‖ ≥ (d − 2ℓ₀ρ)‖w − z‖ − ℓ[‖x − w‖ ‖y − w‖ + ‖x − z‖ ‖y − z‖]
≥ (d − 2ℓ₀ρ)‖w − z‖ − (ℓ/2)‖w − z‖²,
and for w ≠ z,
‖F(w) − F(z)‖ / ‖w − z‖ ≥ d − (2ℓ₀ + ℓ)ρ.   (7.4.16)
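In the scalar case, choosing A by (7.4.5) with the usual divided difference makes solving (7.4.7) exactly one Secant step. A minimal Python sketch follows; the test equation F(x) = x³ − 2 and the starting points are illustrative assumptions, not taken from the text.

```python
def F(x):
    # illustrative test equation; F vanishes at x* = 2**(1/3)
    return x ** 3 - 2.0

def dd(u, v):
    # divided difference of order one, [u, v; F]
    return (F(u) - F(v)) / (u - v)

# Secant iteration: solve A(x_{n-1}, x_n, w) = 0 for w, where
# A(u, v, w) = F(v) + [u, v; F](w - v) is the PBA (7.4.5)
x_prev, x = 1.0, 1.5            # x_{-1} and x_0
for _ in range(20):
    if x == x_prev or F(x) == 0.0:   # converged in floating point
        break
    x_prev, x = x, x - F(x) / dd(x_prev, x)

assert abs(x - 2.0 ** (1.0 / 3.0)) < 1e-12
```

The superlinear convergence visible here is exactly what the majorizing sequence (7.4.17) models on the scalar level.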
Remark 7.4.1 In order for us to compare our result with the corresponding Lemma 2.4 in [302, p. 294], first note that if:
(a) equality holds in both inequalities in (7.3.6), u = v and x = y in (7.4.1)–(7.4.3), then our result reduces to Lemma 2.4 by setting k/2 = ℓ = ℓ₀;
(b) strict inequality holds in any of the inequalities in (7.3.6), u = v and x = y, then our Lemma 7.4.3 improves (enlarges) the range for ρ, and under the same computational cost. The implications of that are twofold (see Theorems 7.4.5 and 7.4.6 that follow): in the semilocal case the uniqueness ball is more precise, and in the local case the radius of convergence is enlarged.
We will need our result on majorizing sequences for the Secant method. The proof using conditions (C₁)–(C₃) can be found in [94], whereas for the well-known condition (C₄) see e.g. [306]. Detailed comparisons between conditions (C₁)–(C₄) were given in [93]. In particular, if strict inequality holds in (7.3.6) (first inequality), the error bounds under (C₁)–(C₃) are more precise and the limit of the majorizing sequence more accurate than under condition (C₄). Note also that these conditions are obtained as special cases of Lemma 5.1.1.
Lemma 7.4.4 Assume for η ≥ 0, c ≥ 0, d₀ > 0:
(C₁) for all n ≥ 0 there exists δ ∈ [0, 1) such that:
ℓδⁿ(1 + δ)η + (δℓ₀/(1 − δ))(2 − δ^{n+2} − δ^{n+1})η + δℓ₀c ≤ δd₀
and
(δℓ₀/(1 − δ))(2 − δ^{n+2} − δ^{n+1})η + δℓ₀c < d₀;
or
(C₂) there exists δ ∈ [0, 1) such that:
ℓ(1 + δ)η + 2δℓ₀η/(1 − δ) + δℓ₀c ≤ δd₀
and
2δℓ₀η/(1 − δ) + δℓ₀c < d₀;
or
(C₃) there exist a ∈ [0, 1] and
δ ∈ [0, (−1 + √(1 + 4a))/(2a)) if a ≠ 0,   δ ∈ [0, 1) if a = 0,
such that
(ℓ + δℓ₀)(c + η) ≤ d₀δ, η ≤ δc and ℓ₀ ≤ aℓ;
or
(C₄) d₀⁻¹ℓc + 2√(d₀⁻¹ℓη) ≤ 1, for ℓ₀ = ℓ.
Then,
(a) iteration {t_n} (n ≥ −1) given by
t₋₁ = 0, t₀ = c, t₁ = c + η,
t_{n+2} = t_{n+1} + d₀⁻¹ℓ(t_{n+1} − t_{n−1})(t_{n+1} − t_n) / (1 − d₀⁻¹ℓ₀[t_{n+1} − t₀ + t_n]),   (7.4.17)
is non-decreasing, bounded above by
r = η/(1 − δ) + c,
and converges to some t* such that 0 ≤ t* ≤ r. Moreover, the following estimates hold for all n ≥ 0:
0 ≤ t_{n+2} − t_{n+1} ≤ δ(t_{n+1} − t_n) ≤ δ^{n+1}η.
(b) Iteration {s_n} (n ≥ −1) given by
s₋₁ − s₀ = c, s₀ − s₁ = η,
s_{n+1} = s_{n+2} + d₀⁻¹ℓ(s_{n−1} − s_{n+1})(s_n − s_{n+1}) / (1 − d₀⁻¹ℓ₀[(s₀ + s₋₁) − (s_n + s_{n+1})]),   (7.4.18)
provided that s₋₁ ≥ 0, s₀ ≥ 0, s₁ ≥ 0, is non-increasing, bounded below by s given by
s = s₀ − η/(1 − δ),
and converges to some s*.
275
Moreover, the following estimates hold for all n ≥ 0: 0 ≤ sn+1 − sn ≤ δ(sn − sn+1 ) ≤ δ n+1 η. We denote by (C), Conditions (C1 ) or (C2 ) or (C3 ) or (C4 ). We can state and prove the main semilocal convergence result for the Secant method involving a (PBA). Theorem 7.4.5 Let X and Y be Banach spaces, D a closed convex subset of X, x−1 , and x0 ∈ D with kx−1 − x0 k ≤ c, and F a continuous operator from D into Y . Suppose operator F has a (PBA) on D at the point x0 . Moreover assume: δ(A(x−1 , x0 , ·), D) ≥ d0 > 0; Condition (C) holds; for each y ∈ U (x0 , d0 (t∗ − t1 ) the equation A(x−1 , x0 , x) = y has a solution x; the solution S(x−1 , x0 ) of A(x−1 , x0 , S(x−1 , x0 )) = 0 satisfies kS(x−1 , x0 ) − x0 k ≤ η; and U (x0 , t∗ ) ⊆ D. Then the Secant iteration defining xn+1 by A(xn−1 , xn , xn+1 ) = 0
(7.4.19)
remains in U (x0 , t∗ ), and converges to a solution x∗ ∈ U (x0 , t∗ ) of equation F (x) = 0. Moreover the following estimates hold for all n ≥ 0: kxn+1 − xn k ≤ tn+1 − tn
(7.4.20)
kxn − x∗ k ≤ t∗ − tn ,
(7.4.21)
and
where sequence {tn } is defined by (7.4.17) and t∗ = lim tn . n→∞
Proof. We use Lemma 7.4.2 with quantities F , G, x0 and y0 replaced by A(w, x, ·), A(x0 , x1 , ·) − A(v, x, ·), x1 = S(x−1 , x0 ), and 0 respectively. Hypothesis (7.4.8) of the Lemma follows from the fact that A(x−1 , x0 , x) = y has a unique solution x1 (since d0 = δ(A(x−1 , x0 , ·), D) > 0). For hypothesis (7.4.9) we have using (7.4.2) δ(A(x0 , x1 , ·), D) ≥ δ(A(x−1 , x0 , ·), D) − ℓ0 (kx0 − x−1 k + kx1 − x0 k) ≥ d0 − ℓ0 t1 > 0
by (C)).
(7.4.22)
276
7. Equations with Nonsmooth Operators
To show (7.4.10) we must have U (x1 , t∗ −t1 ) ⊆ D. Instead by hypothesis U (x0 , t∗ ) ⊆ D, it suffices to show: U (x1 , t∗ − t1 ) ⊂ U (x0 , t∗ ) which is true since kx1 − x0 k + t∗ − t1 ≤ t∗ . We also have by (7.4.1) kA(x−1 , x0 , x1 ) − A(x1 , x1 , x1 )k ≤ ℓ(t1 − t−1 )(t1 − t0 ).
(7.4.23)
It follows by (7.4.11) and (7.4.23) that θ0 ≥ 0 if
[1 − ℓ0 d1^{-1}(c + η)] d0 η − ℓ(t1 − t−1)(t1 − t0) ≥ 0,
(7.4.24)
which is true by (7.4.17) and since d0 ≤ d1 = δ0(A(x−1, x0, ·), D). It follows from Lemma 7.4.2 that for each y ∈ U(0, r − ‖x1 − x0‖) the equation A(x0, x1, z) = y has a unique solution, since δ(A(x0, x1, ·), D) > 0. We also have A(x0, x1, x2) = A(x−1, x0, x1) = 0. By Definition 7.1.2, the induction hypothesis, (7.4.1) and (7.4.17) we get in turn:
‖x2 − x1‖ ≤ δ(A(x0, x1, ·), D)^{-1} ‖A(x−1, x0, x1) − F(x1)‖
≤ ℓ‖x−1 − x0‖ ‖x0 − x1‖ / (d0 − ℓ0(‖x0 − x−1‖ + ‖x1 − x0‖))
≤ d0^{-1} ℓ(t1 − t−1)(t1 − t0) / (1 − d0^{-1} ℓ0 t1) = t2 − t1.
(7.4.25)
Hence we showed:
‖xn+1 − xn‖ ≤ tn+1 − tn,
(7.4.26)
U (xn+1 , t∗ − tn+1 ) ⊂ U (xn , t∗ − tn )
(7.4.27)
and
hold for n = 0, 1. Moreover, for every v ∈ U(x1, t∗ − t1),
‖v − x0‖ ≤ ‖v − x1‖ + ‖x1 − x0‖ ≤ t∗ − t1 + t1 − t0 = t∗ − t0,
which implies v ∈ U(x0, t∗ − t0). Given that they hold for n = 0, 1, . . . , j, then
‖xj+1 − x0‖ ≤ Σ_{i=1}^{j+1} ‖xi − xi−1‖ ≤ Σ_{i=1}^{j+1} (ti − ti−1) = tj+1 − t0.
The induction for (7.4.26) and (7.4.27) can easily be completed by simply replacing x−1, x0, x1 by xn−1, xn, xn+1, respectively. Indeed, corresponding to (7.4.24) we have:
[1 − ℓ0 d1^{-1}(tn+1 − t0 + tn)] d0 (tn+2 − tn+1) − ℓ(tn+1 − tn−1)(tn+1 − tn) ≥ 0, (7.4.28)
which is true by (7.4.17). The scalar sequence {tn} is Cauchy. From (7.4.26) and (7.4.27) it follows that {xn} is Cauchy too in the Banach space X, and as such it converges to some x∗ ∈ U(x0, t∗). Moreover we have by (7.4.1) and (7.4.26)
‖F(xn+1)‖ = ‖F(xn+1) − A(xn−1, xn, xn+1)‖
≤ ℓ‖xn − xn−1‖ ‖xn − xn+1‖ ≤ ℓ(tn − tn−1)(tn+1 − tn) → 0 as n → ∞.
By the continuity of F we deduce F(x∗) = 0. Finally, estimate (7.4.21) follows from (7.4.20) by using standard majorization techniques.
Remark 7.4.2 The uniqueness of the solution x∗ was not considered in Theorem 7.4.5. Indeed, we do not know if under the conditions stated above the solution x∗ is unique, say in U(x0, t∗). However, using Lemma 7.4.3 we can obtain a uniqueness result: if ρ satisfies t∗ < ρ and U(x0, ρ) ⊆ D, then x∗ is the unique solution of equation F(x) = 0 in U(x0, ρ).
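For the smooth scalar case the iteration (7.4.19) reduces to the classical secant method, with A(u, v, x) = F(v) + [u, v; F](x − v) built from a divided difference. The sketch below (the test equation, starting points and tolerance are illustrative choices, not taken from the text) shows the resulting recursion:

```python
def secant(F, x_prev, x0, tol=1e-12, max_iter=50):
    """Secant iteration: x_{n+1} solves A(x_{n-1}, x_n, x) = 0 with
    A(u, v, x) = F(v) + [u, v; F](x - v), [u, v; F] = (F(u) - F(v))/(u - v)."""
    xs = [x_prev, x0]
    for _ in range(max_iter):
        u, v = xs[-2], xs[-1]
        dd = (F(u) - F(v)) / (u - v)   # divided difference [u, v; F]
        x_new = v - F(v) / dd          # zero of the linear model A(u, v, .)
        xs.append(x_new)
        if abs(x_new - v) <= tol:
            break
    return xs

# illustrative equation: F(x) = x**3 - 0.25 with x_{-1} = 0.9, x_0 = 1.0
iterates = secant(lambda x: x**3 - 0.25, 0.9, 1.0)
```

Under the hypotheses of Theorem 7.4.5 the distances between consecutive iterates are dominated by the majorizing differences tn+1 − tn, as in (7.4.20).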
7.5.1
An Exercise
We need the following definition of a point-based approximation for p-Hölderian operators:
Definition 7.5.1 Let f be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y, and let x0 ∈ D, p ∈ (0, 1]. We say f has a point-based approximation (PBA) on D at x0 ∈ D if there exist an operator A : D × D → Y and scalars ℓ, ℓ0 such that for each u and v in D,
‖f(v) − A(u, v)‖ ≤ ℓ m(u, v)^p,
(7.5.1)
‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ 2ℓ m(u, v)^p,
(7.5.2)
‖[A(u, x) − A(x0, x)] − [A(u, y) − A(x0, y)]‖ ≤ 2ℓ0 m(u, x0)^p
(7.5.3)
and
for all x, y ∈ D. Assume X is also a normed linear space and D is a convex set. Then this definition is suitable for operators f whose Fréchet derivative f′ is (p − 1)-Hölderian with modulus pℓ and (p − 1)-center-Hölderian with modulus pℓ0. Indeed, if we set
A(u, v) = f(u) + f′(u)(v − u),
then (7.5.1) says
‖f(v) − f(u) − f′(u)(v − u)‖ ≤ ℓ‖v − u‖^p,
whereas parts (7.5.2) and (7.5.3) are equivalent to the Hölderian and center-Hölderian property of f′.
Definition 7.5.2 Let f be an operator from a metric space (X, m) into a normed linear space Y. We let
δ(f, X) = inf{ ‖f(u) − f(v)‖ / m(u, v)^p : u ≠ v, u, v ∈ X }. (7.5.4)
Note that by (7.5.4) it follows that f^{-1} (if it exists) is (1/p)-Hölderian with modulus δ(f, X)^{-1/p}. We also define
δ0(f, X) = inf{ ‖f(u) − f(x0)‖ / m(u, x0)^p : u ≠ x0, u ∈ D }.
Set δ = δ(f, D) and δ1 = δ0(f, D).
Lemma 7.5.3 Let X be a Banach space, D a closed subset of X, and Y a normed linear space. Let f, g be operators from D into Y, g being p-Hölderian with modulus ℓ and center p-Hölderian with modulus ℓ0. Let x0 ∈ D with f(x0) = y0. Assume that:
U(y0, α) ⊆ f(D);
(7.5.5)
0 ≤ ℓ < δ;
(7.5.6)
and
U(x0, (δ1^{-1}α)^{1/p}) ⊆ D; (7.5.7)
and
θ = (1 − ℓ0 δ1^{-1})α − ‖g(x0)‖ ≥ 0. (7.5.8)
Then show the following hold:
(f + g)(U(x0, (δ1^{-1}α)^{1/p})) ⊇ U(y0, θ) (7.5.9)
and
δ(f + g, D) ≥ δ − ℓ > 0.
(7.5.10)
Lemma 7.5.4 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let f : D → Y, and let A be a PBA for f on D at x0 ∈ D. Denote by δ0 the quantity δ(A(x0, ·), D). If U(x0, ρ) ⊆ D, then show
δ(f, U(x0, ρ)) ≥ δ0 − 2ℓ0 − ℓ/2^{p−1}.
In particular, if δ0 > 2ℓ0 + ℓ/2^{p−1}, then show f is one-to-one on U(x0, ρ).
7.5. Exercises
We need the following result on fixed points (see also Section 3.3):
Theorem 7.5.5 Let Q : D ⊂ X → X be a continuous operator, let p ∈ [0, 1), q ≥ 0 and x0 ∈ D be such that
‖Q(x) − Q(y)‖ ≤ q‖x − y‖^p for all x, y ∈ D, (7.5.11)
d ∈ [0, 1), (7.5.12)
and
U(x0, r) ⊆ D, (7.5.13)
where,
b = q^{1/(1−p)}, (7.5.14)
d = b^{-1}‖x0 − Q(x0)‖,
and
r = b / (1 − d^p). (7.5.15)
Then show: the sequence {xn} (n ≥ 0) generated by successive substitutions
xn+1 = Q(xn) (n ≥ 0) (7.5.16)
converges to a fixed point x∗ ∈ U(x0, r) of operator Q, so that for all n ≥ 0:
‖xn+1 − xn‖ ≤ d^{p^n} b, (7.5.17)
and
‖xn − x∗‖ ≤ d^{p^n} b / (1 − d^p). (7.5.18)
Moreover if
d1 = b^{-1}(‖x0 − Q(x0)‖ + qr) < 1, (7.5.19)
or
q^{1/p} r < 1, (7.5.20)
then show x∗ is the unique fixed point of Q in U(x0, r). If instead of (7.5.12) and (7.5.19) we assume
h = 2d^p < 1, (7.5.21)
and
2b ≤ q^{−1/p}, (7.5.22)
then the conclusions of Theorem 7.5.5 hold in the ball U (x0 , r0 ) where r0 = 2b.
(7.5.23)
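As an illustration of Theorem 7.5.5, the operator Q(x) = q x^p on the nonnegative reals is p-Hölderian with modulus q, and its positive fixed point is exactly b = q^{1/(1−p)} from (7.5.14). The sketch below (the values of p, q and x0 are illustrative choices) generates the successive substitutions (7.5.16):

```python
def successive_substitutions(Q, x0, n_steps=60):
    """x_{n+1} = Q(x_n): the method of successive substitutions (7.5.16)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(Q(xs[-1]))
    return xs

p, q = 0.5, 0.5
b = q ** (1.0 / (1.0 - p))   # b = q^{1/(1-p)} as in (7.5.14); here b = 0.25
Q = lambda x: q * x ** p     # p-Hoelder with modulus q on x >= 0
xs = successive_substitutions(Q, 0.2)
d = abs(0.2 - Q(0.2)) / b    # d = b^{-1} |x0 - Q(x0)|, cf. (7.5.12)
```

The iterates converge to the fixed point x∗ = b = 0.25, and the steps |x_{n+1} − x_n| obey the bound d^{p^n} b of (7.5.17).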
We set ‖x0 − Q(x0)‖ ≤ η.
(7.5.24)
We can state the main semilocal convergence result for Newton’s method involving a p-(PBA) approximation for f . Theorem 7.5.6 Let X and Y be Banach spaces, D a closed convex subset of X, x0 ∈ D, and F a continuous operator from D into Y . Suppose that F has a p-(PBA) approximation at x0 . Moreover assume: δ(A(x0 , ·), D) ≥ δ0 > 0, η ≤ b, 2ℓ0 < δ0 ,
(7.5.25)
(1 − 2ℓ0 δ1^{-1} r0) δ0 η − ℓ η^p ≥ 0
(7.5.26)
and conditions (7.5.8), (7.5.21), (7.5.22) are satisfied for α = δ0η and q = ℓ(δ0 − 2ℓ0)^{-1}, respectively, where δ1 = δ0(A(x0, ·), D); for each y ∈ U(0, δ0η) the equation A(x0, x) = y has a solution x; the solution T(x0) of A(x0, T(x0)) = 0 satisfies ‖x0 − T(x0)‖ ≤ η; and U(x0, r0) ⊆ D, where r0 is given by (7.5.23). Then show: the Newton iteration defining xn+1 by
A(xn, xn+1) = 0
(7.5.27)
remains in U(x0, r0), and converges to a solution x∗ ∈ U(x0, r0) of equation F(x) = 0, so that estimates (7.5.17) and (7.5.18) hold. The uniqueness of the solution x∗ was not considered in Theorem 7.5.6. Using Lemma 7.5.4 show: we can obtain a uniqueness result, so that if r0 < ρ and
δ0 > 2ℓ0 + ℓ/2^{p−1},
then operator F is one-to-one in a neighborhood of x∗, since x∗ ∈ U(x0, r0). That is, show: x∗ is an isolated zero of F in this case.
7.5.2
Another Exercise
With the notations introduced in the previous subsection, assume p > 1. We need the following result on fixed points (see also Section 3.3):
Theorem 7.5.7 Let Q : D ⊂ X → X be an operator, p, q scalars with p > 1, q ≥ 0, and x0 a point in D such that
‖Q(x) − Q(y)‖ ≤ q‖x − y‖^p for all x, y ∈ D; (7.5.28)
the equation
2^{p−1} q r^p − r + ‖x0 − Q(x0)‖ = 0 (7.5.29)
has a unique positive solution r in I = [‖x0 − Q(x0)‖, (1/2) q^{1/(1−p)}]; and U(x0, r) ⊆ D.
Then show: sequence {xn } (n ≥ 0) generated by successive substitutions xn+1 = Q(xn )
(n ≥ 0)
(7.5.30)
converges to a unique fixed point x∗ ∈ U(x0, r) of operator Q, so that for all n ≥ 1
‖xn+1 − xn‖ ≤ d‖xn − xn−1‖ ≤ d^n ‖x0 − Q(x0)‖ (7.5.31)
and
‖xn − x∗‖ ≤ d^n ‖x0 − Q(x0)‖ / (1 − d), (7.5.32)
where,
d = (2r)^{p−1} q.
(7.5.33)
Remark 7.5.1 Conditions can be given to guarantee the existence and uniqueness of r. Indeed, define the scalar function h by
h(t) = 2^{p−1} q t^p − t + η, ‖x0 − Q(x0)‖ ≤ η. (7.5.34)
By the intermediate value theorem show (7.5.29) has a solution r in I if
h((1/2) q^{1/(1−p)}) ≤ 0 (7.5.35)
or if
η ≤ (1/2) q^{1/(1−p)} (1 − q^{-1}) = η0
(7.5.36)
and q ≥ 1.
(7.5.37)
This solution is unique if η ≤ η1
(7.5.38)
where,
η1 = min{η0, (1/2)(pq)^{1/(1−p)}}.
(7.5.39)
Indeed, if (7.5.37) and (7.5.38) hold, it follows that h′(t) ≤ 0 on I1 = [η, η1] ⊂ I. Therefore h crosses the t-axis only once (since h is nonincreasing on I1). Set
η̄ = min{η1, η2}
(7.5.40)
where,
η2 = (2^p q)^{1/(1−p)}.
(7.5.41)
It follows from (7.5.34) and (7.5.40) that if
η ≤ η̄, (7.5.42)
then
r ≤ 2η = r0. (7.5.43)
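The radius r of (7.5.29), together with the bound r ≤ 2η = r0 of (7.5.43), can be checked numerically. The sketch below brackets the smallest positive root of h(t) = 2^{p−1} q t^p − t + η between η (where h > 0) and the stationary point of h, and then bisects; using the stationary point as the upper bracket, and the sample values of p, q, η, are our own illustrative choices:

```python
def radius_r(p, q, eta, tol=1e-14):
    """Smallest positive root of h(t) = 2**(p-1)*q*t**p - t + eta, cf. (7.5.29)."""
    h = lambda t: 2.0 ** (p - 1) * q * t ** p - t + eta
    lo = eta                                            # h(eta) > 0
    hi = (2.0 ** (p - 1) * q * p) ** (1.0 / (1.0 - p))  # stationary point of h
    if h(hi) > 0:
        raise ValueError("h has no sign change: existence condition fails")
    while hi - lo > tol:            # bisection; h decreases on [lo, hi]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

r = radius_r(p=2, q=1.0, eta=0.1)   # h(t) = 2 t**2 - t + 0.1, smallest root
```

Here r ≈ 0.138 ≤ 2η = 0.2, in agreement with (7.5.43).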
We can state the main semilocal convergence theorem for Newton’s method involving a p-(PBA) (p > 1) approximation for f . Theorem 7.5.8 Let X and Y be Banach spaces, D a closed convex subset of X, x0 ∈ D, and F a continuous operator from D into Y . Suppose that F has a p-(PBA) approximation at x0 . Moreover assume: δ(A(x0 , ·), D) ≥ δ0 > 0; 2ℓ0 < δ0 ,
(1 − 2ℓ0 δ1^{-1} r0) δ0 η − ℓ η^p ≥ 0
(7.5.44)
where δ1 = δ0(A(x0, ·), D); (7.5.32) in Lemma 7.1.3 and conditions (7.5.37), (7.5.42) in Remark 7.5.1 hold for α = δ0η and q = ℓ(δ0 − 2ℓ0)^{-1}; for each y ∈ U(0, δ0η) the equation A(x0, x) = y has a solution x; the solution T(x0) of A(x0, T(x0)) = 0 satisfies ‖x0 − T(x0)‖ ≤ η; and U(x0, r0) ⊆ D, where r0 is given by (7.5.43). Then show: the Newton iteration remains in U(x0, r0), and converges to a solution x∗ ∈ U(x0, r0) of equation F(x) = 0, so that estimates (7.5.31) and (7.5.32) hold.
Remark 7.5.2 Our Theorem 7.5.8 compares favorably with Theorem 3.2 in [302, p. 298]. First of all, the latter theorem cannot be used when e.g. p ∈ [1, 2) (see the example that follows). In the case p = 2 our condition (7.5.44) becomes, for δ0 = δ1,
h0 = δ0^{-1}(ℓ + 4ℓ0)η ≤ 1
(7.5.45)
whereas the corresponding one in [302] becomes
h = 4δ0^{-1}ℓη ≤ 1.
(7.5.46)
Clearly (7.5.45) is weaker than (7.5.46) if
ℓ/ℓ0 > 4/3. (7.5.47)
But ℓ/ℓ0 can be arbitrarily large. Therefore our Theorem 7.5.8 can be used in cases when Theorem 3.2 in [302] cannot when p = 2.
Example 7.5.1 Let X = ℝ^{m−1}, m ≥ 2 an integer, and define the matrix operator Q on X by
Q(z) = M + M1(z), z = (z1, z2, . . . , zm−1), (7.5.48)
where M is a real (m − 1) × (m − 1) matrix,
(M1(z))_{ij} = 0 for i ≠ j, and (M1(z))_{ii} = z_i^p,
and e.g. p ∈ [1, 2). Operators Q of the form (7.5.48) appear in many discretization studies in connection with the solution of two-point boundary value problems [68]. Clearly, no matter how the operator A is chosen, the conditions in Definition 2.1 in [302, p. 293] cannot hold, whereas our result can apply.
Chapter 8
Applications of the weaker version of the Newton-Kantorovich theorem Diverse further applications of the results in Chapter 5 are provided here.
8.1
Comparison with Moore Theorem
In this section we are concerned with the problem of approximating a locally unique solution x∗ of the equation
F(x) = 0, (8.1.1)
where F : D ⊆ ℝ^k → ℝ^k is continuously differentiable on an open convex set D, and k is a positive integer. Rall in [290] compared the theorems of Kantorovich and Moore. This comparison showed that the Kantorovich theorem has only a slight advantage over the Moore theorem with regard to sensitivity and precision, while the latter requires less computational cost. Later, Neumaier and Shen [255] showed that when the derivative in the Krawczyk operator is replaced with a suitable slope, the corresponding existence theorem is at least as effective as the Kantorovich theorem with respect to sensitivity and precision. At the same time Zuhe and Wolfe [369] showed that the hypotheses in the affine invariant form of the Moore theorem are always implied by those of the Newton-Kantorovich theorem 4.2.4, but not necessarily vice versa. Here we show that this implication is not true in general for a weaker version of Theorem 4.2.4. We will need the following semilocal convergence theorem for Newton's method, due to Deuflhard and Heindl [143]:
Theorem 8.1.1 Let F be a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in a Banach space Y. Suppose that x0 ∈ D is such that the Newton-Kantorovich hypothesis (4.2.51) holds and:
and
where,
Then the sequence {xn} generated by method (4.1.3) is well defined, remains in U(x0, r1) for all n ≥ 0, and converges to a solution x∗ of equation F(x) = 0, which is unique in Ū(x0, r1) ∪ (D ∩ U(x0, r2)), where,
Moreover, the following estimates hold for all n ≥ 0:
and
where
Remark 8.1.1 In the case of Moore's theorem [240], suppose that the hypotheses of Theorem 8.1.1 are valid with X = Y = ℝ^n, x0 = z = mid(z̄γ), where z̄γ ∈ I(D) is given by

and define the Krawczyk operator
where
w = z − F′(z)^{-1}F(z),
and
F[z, z̄γ] = ∫₀¹ F′(z + t(z̄γ − z)) dt,
in which integration is defined as in [240]. The following result relates a weaker version of Theorem 8.1.1 with Moore's theorem.
Theorem 8.1.2 Assume for x0 = z = mid(z̄γ): hypotheses (8.1.2) and (8.1.3) of Theorem 8.1.1 hold;
‖F′(x0)^{-1}[F′(x) − F′(x0)]‖ ≤ ℓ0‖x − x0‖ for all x ∈ D;
(8.1.15)
h̄ = 2ℓ0η ≤ 1;
(8.1.16)
Ū(x0, γ) ⊆ D,
(8.1.17)
and
where,
K(z̄γ) ⊆ z̄γ, (8.1.19)
where z̄γ and K are given by (8.1.11) and (8.1.12)-(8.1.14), respectively.
Proof. If u ∈ K(z̄γ), then u = w + v with
v = {I − F′(z)^{-1}F[z, z̄γ]}(z̄γ − z)
(by (8.1.13) and (8.1.14)). We have in turn
‖z − u‖∞ ≤ ‖z − w‖∞ + ‖v‖∞ ≤ η + γ‖F′(z)^{-1}{F[z, z̄γ] − F′(z)}‖∞ ≤ η + (1/2)ℓ0γ².
Hence (8.1.19) holds if
η + (1/2)ℓ0γ² ≤ γ, (8.1.21)
which is true by (8.1.16)-(8.1.18). ∎
Remark 8.1.2 (a) Note that Moore's theorem [240] guarantees the existence of a solution x∗ of equation (8.1.1) if (8.1.19) holds.
(b) Theorem 8.1.2 reduces to Theorem 2 in [369] if ℓ0 = ℓ. However, in general
ℓ0 ≤ ℓ
holds. Note also that the hypotheses of Theorem 2 in [369] imply those of Theorem 8.1.2,
but not vice versa, unless ℓ0 = ℓ. Hence our Theorem 8.1.2 weakens Theorem 2 in [369], and can be used in cases not covered by the latter. An example was given in [255] and [369] to show that the hypotheses of the affine invariant form of the Moore theorem may be satisfied even when those of Theorem 8.1.1 are not.
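In one dimension the Krawczyk/Moore test can be carried out with elementary interval arithmetic. The sketch below (the pair-based interval representation and the sample problem are our own illustrative choices) forms K(X) = w + (1 − f′(X)/f′(z))(X − z), with z = mid(X), and checks the inclusion K(X) ⊆ X that guarantees a zero of f in X:

```python
def i_add(a, b): return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(p), max(p))

def i_subset(a, b):          # is the interval a contained in b?
    return b[0] <= a[0] and a[1] <= b[1]

def krawczyk(f, fprime_range, X):
    """K(X) = w + (1 - f'(X)/f'(z)) (X - z), w = z - f(z)/f'(z), z = mid(X)."""
    z = 0.5 * (X[0] + X[1])
    dz = fprime_range((z, z))[0]          # f'(z); assumed positive below
    w = z - f(z) / dz                     # one Newton step from the midpoint
    s = fprime_range(X)                   # interval enclosure of f' over X
    bracket = (1.0 - s[1] / dz, 1.0 - s[0] / dz)
    return i_add((w, w), i_mul(bracket, (X[0] - z, X[1] - z)))

# f(x) = x**3 - 0.9 on X = [0.9, 1.1]; f'(x) = 3x**2 is monotone for x > 0.
f  = lambda x: x**3 - 0.9
fp = lambda X: (3 * X[0]**2, 3 * X[1]**2)
X  = (0.9, 1.1)
K  = krawczyk(f, fp, X)
```

Here i_subset(K, X) holds, so Moore's theorem yields a zero of f in X.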
Example 8.1.1 (used also in Section 5.1) Let k = 1, a ∈ [0, 1/2), x ∈ z̄γ = [a, 2 − a], and define the function F : z̄γ → ℝ by
F(x) = x³ − a.
Choose z = mid(z̄γ) = 1; then (4.2.51) becomes
h = (4/3)(2 − a)(1 − a) ≤ 1,
which fails for all a ∈ [0, 1/2). That is, there is no guarantee that the method generated by (4.1.3) converges to a solution of F(x) = 0, since the Newton-Kantorovich hypothesis (4.2.51) is violated. However, using (8.1.11)-(8.1.14) and (8.1.25) we get
and if
then (8.1.19) holds. That is, Moore's theorem guarantees convergence of method (4.1.3) to a solution x∗ of equation F(x) = 0, provided that (8.1.28) holds. However, we can do better. Indeed, since ℓ0 = 3 − a,
which improves (8.1.28).
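The improvement can be made concrete. Only ℓ0 = 3 − a is stated above; the values η = (1 − a)/3 and ℓ = 2(2 − a) used below are the constants customarily attached to this example in Section 5.1 and are assumptions of this sketch. For a = 0.45 the Kantorovich quantity exceeds 1 while the center-Lipschitz quantity of (8.1.16) does not:

```python
def conditions(a):
    """Kantorovich h = 2*ell*eta vs. the weaker hbar = 2*ell0*eta of (8.1.16)
    for F(x) = x**3 - a on [a, 2 - a], x0 = 1 (assumed constants, see text)."""
    eta  = (1.0 - a) / 3.0      # |F(1)| / F'(1)
    ell  = 2.0 * (2.0 - a)      # scaled Lipschitz constant of F'
    ell0 = 3.0 - a              # scaled center-Lipschitz constant (as in text)
    return 2.0 * ell * eta, 2.0 * ell0 * eta

h, hbar = conditions(0.45)      # h ~ 1.137 > 1, while hbar ~ 0.935 <= 1
```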
Remark 8.1.3 If (4.2.51) holds as a strict inequality, method (4.1.3) converges quadratically to x∗. However, (8.1.17) guarantees only the linear convergence to x∗ of the modified method
In practice the quadratic convergence of method (4.1.3) is desirable. In Section 5.1 we found a condition (see condition (5.1.56)) weaker than (4.2.51) but stronger than (8.1.16), so that the quadratic convergence of method (4.1.3) is guaranteed.
8.2
Miranda Theorem
In this section we are concerned with the problem of approximating a locally unique solution x∗ of an equation
F(x) = 0, (8.2.1)
where F is defined on an open convex subset S of ℝⁿ (n a positive integer) with values in ℝⁿ. We showed in Section 5.1 that the Newton-Kantorovich hypothesis (4.2.51) can always be replaced by the weaker (5.1.56). Here we first weaken the generalization of Miranda's theorem (Theorem 4.3 in [227]). Then we show that operators satisfying the weakened Newton-Kantorovich conditions satisfy those of the weakened Miranda theorem. In order to compare our results with earlier ones, we need to list the following theorems guaranteeing the existence of the solution x∗ of equation (8.2.1) (see Chapter 4).
Theorem 8.2.1 Let F : S → ℝⁿ be a Fréchet-differentiable operator. Assume: there exists x0 ∈ S such that F′(x0)^{-1} ∈ L(Y, X), and set

there exists an η ≥ 0 such that

there exists an ℓ > 0 such that

and

where,

Then there exists a unique solution x∗ ∈ Ū(x0, r∗) of equation F(x) = 0.
Remark 8.2.1 Theorem 8.2.1 is the existence portion of the Newton-Kantorovich theorem (4.2.4). The following theorem is due to Miranda [236]; it is a generalization of the intermediate value theorem:
Theorem 8.2.2 Let b > 0, x0 ∈ ℝⁿ, and define
Q = {x ∈ ℝⁿ : ‖x − x0‖∞ ≤ b},
Qk± = {x ∈ Q : xk = x0,k ± b}, k = 1, 2, . . . , n.
Let F = (Fk) (1 ≤ k ≤ n) : Q → ℝⁿ be a continuous operator satisfying, for all k = 1, 2, . . . , n,
Fk(x) ≥ 0 for all x ∈ Qk+ and Fk(x) ≤ 0 for all x ∈ Qk−.
Then there exists x∗ ∈ Q such that F(x∗) = 0.
The following result connects Theorems 8.2.1 and 8.2.2.
Theorem 8.2.3 Suppose F : S → ℝⁿ satisfies all the hypotheses of Theorem 8.2.1 in the maximum norm; then G satisfies the conditions of Theorem 8.2.2 on Ū(x0, r∗).
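Theorem 8.2.2 is easy to test numerically on a sample problem: evaluate Fk on a grid over each pair of opposite faces of the cube and check the sign conditions. The sketch below is a finite sampling (hence only a check, not a proof), and the two-dimensional system is an illustrative choice:

```python
import itertools

def miranda_check(F, x0, b, grid=9):
    """Sample each face Q_k^{+-} of the cube {x : max_k |x_k - x0_k| <= b}
    and test the sign conditions F_k >= 0 on Q_k^+, F_k <= 0 on Q_k^-."""
    n = len(x0)
    ticks = [[xk - b + 2 * b * i / (grid - 1) for i in range(grid)] for xk in x0]
    for k in range(n):
        others = [ticks[j] for j in range(n) if j != k]
        for point in itertools.product(*others):
            for sign in (+1, -1):
                x = list(point)
                x.insert(k, x0[k] + sign * b)   # place x on the face Q_k^{sign}
                if sign * F(x)[k] < 0:
                    return False
    return True

# Illustrative system: F(x, y) = (x + 0.1*y, y - 0.1*x) near x0 = (0, 0):
F = lambda x: (x[0] + 0.1 * x[1], x[1] - 0.1 * x[0])
ok = miranda_check(F, (0.0, 0.0), 1.0)   # True, so F has a zero in the cube
```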
In the elegant study [227] a generalization of Theorem 8.2.3 was given (Theorem 4.3). We first weaken this generalization. Let ℝⁿ be equipped with a norm denoted by ‖·‖ and ℝ^{n×n} with a norm ‖·‖ such that ‖M x‖ ≤ ‖M‖ ‖x‖ for all M ∈ ℝ^{n×n} and x ∈ ℝⁿ. Choose constants c0, c1 > 0 such that for all x ∈ ℝⁿ
c0‖x‖∞ ≤ ‖x‖ ≤ c1‖x‖∞,
since all norms on finite dimensional spaces are equivalent. Set
Definition 8.2.4 Let S ⊆ ℝⁿ be an open convex set, and let G : S → ℝⁿ be a differentiable operator on S. Let x0 ∈ S, and assume:
G′(x0) = I (the identity matrix); (8.2.14)
there exists η ≥ 0 such that
‖G(x0)‖ ≤ η; (8.2.15)
there exists an ℓ0 ≥ 0 such that
‖G′(x) − G′(x0)‖ ≤ ℓ0‖x − x0‖ for all x ∈ S. (8.2.16)
Define:

We say that G satisfies the weak center-Kantorovich conditions in x0 if

We also say that G satisfies the strong center-Kantorovich conditions in x0 if

Moreover define

and
R = [r1, r2] for ℓ0 ≠ 0. Furthermore, if ℓ0 = 0, define

and
Remark 8.2.2 The weak and strong center-Kantorovich conditions are equivalent only for the maximum norm.
As in [196] we need to define certain concepts. Let r > 0, x0 ∈ ℝⁿ, and define
U(r) = {z ∈ ℝⁿ : ‖z‖ ≤ r}, (8.2.25)
U(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ U(r)}, (8.2.26)
Uk+(r) = {z ∈ ℝⁿ : ‖z‖ = r, zk = ‖z‖∞}, (8.2.27)
Uk−(r) = {z ∈ ℝⁿ : ‖z‖ = r, zk = −‖z‖∞}, (8.2.28)
Uk+(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ Uk+(r)}, (8.2.29)
Uk−(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ Uk−(r)} (8.2.30)
for all k = 1, 2, . . . , n.
We show the main result:
Theorem 8.2.5 Let G : S → ℝⁿ be a differentiable operator defined on an open convex subset S of ℝⁿ. Assume G satisfies the strong center-Kantorovich conditions. Then, for any r ∈ R with U(x0, r) ⊆ S the following hold:
(a)
U = U(r) = U(x0, r) is a Miranda domain [227], (8.2.31)
and

(8.2.32)
is a Miranda partition [227] of the boundary ∂U. It is a canonical Miranda partition [227] for r > 0 and a trivial Miranda domain and partition for r = 0;
(b)
Gk(x) ≥ 0 for all x ∈ Uk+(x0, r), k = 1, . . . , n, (8.2.33)
and
Gk(x) ≤ 0 for all x ∈ Uk−(x0, r), k = 1, . . . , n; (8.2.34)
(c) G satisfies the Miranda conditions on (U, U1);
(d) if G(x0) = 0 and ℓ0 > 0, then G satisfies the Miranda conditions on (U, U1) for any r ∈ [0, 1/ℓ0] such that U(x0, r) ⊆ S.
Proof. The first point of the theorem follows exactly as in Theorem 4.3 in [227]. For the rest we follow the proof of Theorem 4.3 in [227] (which is essentially the reasoning of Theorem 8.2.3), but with some differences stressing the use of the center-Lipschitz condition (8.2.16) instead of the stronger Lipschitz condition (8.2.4), which is not really needed in the proof, although it was used in both proofs mentioned above. Using the intermediate value theorem for integration we first obtain the
identity
G(x0 + rz) − G(x0) = ∫₀¹ [G′(x0 + rtz) − G′(x0)] rz dt + rz ∫₀¹ dt (G′(x0) = I). (8.2.36)
Let ek denote the k-th unit vector. Then we have:
Gk(x0 + rz) = Gk(x0) + ∫₀¹ ekᵀ [G′(x0 + rtz) − G′(x0)] rz dt + r zk, (8.2.37)
and by (8.2.16)
|∫₀¹ ekᵀ [G′(x0 + rtz) − G′(x0)] rz dt| ≤ (1/2) ℓ0 r² ‖z‖². (8.2.38)
Let z ∈ Uk+(1). Using (8.2.38) and
zk ≥ 1/c2 for u ∈ Uk+(1), (8.2.40)
we get from (8.2.37)
for r ∈ R (by (8.2.19) and (8.2.22)). Similarly,
for r ∈ R. If G(x0) = 0, let η = 0, which implies h0 = 0. ∎
Remark 8.2.3 If ℓ = ℓ0, then our Theorem 8.2.5 becomes Theorem 4.3 in [227]. Moreover, if ‖·‖ is the maximum norm, then Theorem 8.2.5 becomes Theorem 8.2.3. However, in general
Hence the strong center-Kantorovich condition is such that
Similarly the Kantorovich condition (8.2.5) is such that
but not vice versa, unless ℓ0 = ℓ. If strict inequality holds in (8.2.44) and conditions (8.2.45) or (8.2.5) are not satisfied, then the conclusions of Theorem 4.3 in [227] do not necessarily hold. However, if (8.2.8) holds, the conclusions of our Theorem 8.2.5 hold.
Remark 8.2.4 Condition (8.2.5) guarantees the quadratic convergence of the Newton-Kantorovich method to x∗. However, this is not the case for condition (8.2.18). To rectify this, and still use a condition weaker than (8.2.5) (or (8.2.45)), define
We showed in Section 5.1 that if

and

replace (8.2.5), then the Newton-Kantorovich method converges to x∗ ∈ Ū(x0, r3) with r3 ≤ r∗. Moreover, finer error bounds on the distances between iterates or between
iterates and x∗ are obtained. If we restrict δ ∈ [0, 1], then (8.2.50) and (8.2.51) hold if only (8.2.49) is satisfied. Choose δ = 1 for simplicity in (8.2.49). Then again

but not vice versa, unless ℓ = ℓ0. Hence, if (8.2.18) is replaced by

then all conclusions of Theorem 8.2.5 hold.
8.3
Computation of Orbits
In this section we are concerned with the problem of approximating shadowing orbits for dynamical systems. It is well known in the theory of dynamical systems that actual computations of complicated orbits rarely produce good approximations to the trajectory. However, under certain conditions, the computed section of an orbit lies in the shadow of a true orbit. Hence, using product spaces and the results in Section 5.1, we show that the sufficient conditions for the convergence of Newton's method to a true orbit can be weakened under the same computational cost as in the elegant work by Hadeler in [177]. Moreover, the information on the location of the solutions is more precise and the corresponding error bounds are finer. Let f be a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in X. The operator f defines a local dynamical system as follows:

as long as xk ∈ D. A sequence {xi}, i = 0, . . . , N, in D with xi+1 = f(xi), i = 0, . . . , N − 1, is called an orbit. Any sequence {xm}, xm ∈ D, m = 0, . . . , N, is called a pseudo-orbit of length N. We can now pass to product spaces. Let Y = X^{N+1}, equipped with the maximum norm; the norm of x = (x0, . . . , xN) ∈ Y is given by

Set S = D^{N+1}.
Let F : S → Y be an operator associated with f:
F(x0, . . . , xN) = ( f(x0) − x1, . . . , f(xN) ).
Assume there exist constants ℓ0, ℓ, L0, L such that:

for all u, v ∈ D and u, v ∈ S. From now on we assume ℓ0 = L0 and ℓ = L. For y ∈ X define an operator
It follows that Fy′(x) = F′(x). As in [166] define the quantities α(·).

8.4

Continuation Methods

Lemma 8.4.1 Let D be an open subset of ℝⁿ. Assume H : J × D → ℝⁿ is an operator for which there exist sets I and U, with (t∗, x∗) ∈ I × U, on which the operator

for all (t, x) ∈ I × U is well defined, and G has a partial Fréchet derivative with respect to x at (t∗, x∗) given by
Moreover, if there exist positive constants ℓ0 and ℓ such that
and
for all (t, x), (t, y) ∈ I0 × U0, then the continuity of H at (t∗, x∗) and the strongness of Hx(t∗, x∗) imply the strongness of Gx(t∗, x∗).
Proof. Let us fix ε ∈ (0, 1) and set δ = ε/ℓ0, so that
and
‖A(t∗, x∗)^{-1}[A(t, x) − A(t∗, x∗)]‖ ≤ ℓ0 δ = ε < 1 for all (t, x) ∈ I × S. (8.4.8)
It follows by (8.4.8) and the Banach Lemma on invertible operators 1.3.2 that A(t, x)^{-1} exists, and
‖A(t, x)^{-1}A(t∗, x∗)‖ ≤ (1 − ε)^{-1} for all (t, x) ∈ I × S.
By restricting S if necessary we can have
(8.4.9)
and
‖A(t∗, x∗)^{-1}H(t, y)‖ ≤ ε for all (t, y) ∈ I × S.
(8.4.10)
Using (8.4.4)-(8.4.7), (8.4.9)-(8.4.10) and the strongness of Hx(t∗, x∗) we obtain in turn:
‖G(t, x) − G(t, y) − Gx(t∗, x∗)(x − y)‖
≤ ‖A(t∗, x∗)^{-1}[H(t, x) − H(t, y) − Hx(t∗, x∗)(x − y)]‖
+ ‖A(t∗, x∗)^{-1}[A(t, x) − A(t∗, x∗)] A(t, x)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}(H(t, x) − H(t, y))‖
+ ‖A(t, y)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}(A(t, x) − A(t, y)) A(t, x)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}H(t, y)‖
≤ ε a ‖x − y‖,
where,

which shows that Gx(t∗, x∗) is strong. The rest of the result follows by a classical lemma due to Ortega and Rheinboldt [263]. ∎
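The continuation idea analyzed in this section can be sketched numerically: march the parameter t over an equidistant grid and warm-start Newton's method at each new parameter value from the previous solution. The scalar model problem H(t, x) = x³ + x − t and the step counts below are hypothetical choices:

```python
def continuation_newton(H, Hx, x_start, N, newton_steps=4):
    """March t = k/N, k = 1..N; at each t run a few Newton corrector steps
    starting from the solution computed at the previous parameter value."""
    xs = [x_start]
    for k in range(1, N + 1):
        t = k / N
        x = xs[-1]                     # predictor: previous solution
        for _ in range(newton_steps):  # corrector: Newton at fixed t
            x -= H(t, x) / Hx(t, x)
        xs.append(x)
    return xs

H  = lambda t, x: x**3 + x - t
Hx = lambda t, x: 3 * x**2 + 1       # exact partial derivative H_x
path = continuation_newton(H, Hx, 0.0, N=10)
# path[-1] approximates the root of x**3 + x = 1 (about 0.6823)
```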
Remark 8.4.1 In general ℓ/ℓ0 can be arbitrarily large (for t sufficiently close to t∗). If ℓ = ℓ0 holds, Lemma 8.4.1 can be reduced to Lemma 3.1 in [103]. However, in that Lemma δ is chosen in an unspecified (in terms of ε) way, whereas in our proof it is specified (δ = ε/ℓ0). Moreover, if strict inequality holds, I, U are enlarged, a is smaller than the corresponding quantity, call it a1, given in [103, p. 109] by

where,

and the strongness of Gx(t∗, x∗) is established on a larger cross product I × S. The above benefits are achieved under the same hypotheses and computational cost, since in practice the computation of ℓ requires that of ℓ0. Furthermore, a wider choice of initial guesses to start Newton's method becomes available, since U is enlarged. We also note that our result is given in an affine invariant form. The advantages of such an approach have been explained in Section 5.1. Finally, we can see from the form of the Lemma that condition (8.4.7) can be replaced by the stronger
for all (t1, x), (t2, y) ∈ I0 × U0. Clearly ℓ ≤ ℓ1 and ℓ0 ≤ ℓ1, and ℓ1/ℓ0 can again be arbitrarily large. Under this modification t does not necessarily have to be close enough to t∗ for the conclusions of the Lemma to hold (with ℓ1 replacing ℓ throughout the statement and the proof of the Lemma). In the result that follows we provide a local convergence analysis for method (4.1.3):
Theorem 8.4.2 Let H : J × D ⊂ J × ℝⁿ → ℝⁿ, where D is an open subset of ℝⁿ and H has partial Fréchet derivatives with respect to both t and x. Assume x∗ : J → D is a continuous solution of the equation H(t, x) = 0, t ∈ J. It follows by the openness of D that there exists r0 > 0 such that
D0 = {v ∈ X : ‖x∗(t∗) − v‖ < r0 for some t∗ ∈ J} ⊆ D. (8.4.16)
Moreover assume: Ht is continuous on J × D0; L = Hx(t∗, x∗)^{-1} exists, and
‖LHt(t, v(t))‖ ≤ a for all t ∈ J;
‖L[Hx(t, x) − Hx(t, y)]‖ ≤ ℓ‖x − y‖
and

for all t ∈ J and for all x, y, v ∈ D0; and r0 satisfies r0 ≤ η. Moreover, let h1 be the largest value in [0, 1/(2aβ1²ℓ²)] such that

where,

Then, if the integer N is chosen so that
Newton's method
xk = xk−1 − Hx(tk, xk−1)^{-1} H(tk, xk−1), k = 1, . . . , N − 1, x0 = x(0);
xk = xk−1 − Hx(1, xk−1)^{-1} H(1, xk−1), k = N, N + 1, . . . (8.4.26)
is feasible.
Proof. By hypotheses and the Banach Lemma on invertible operators, Hx(t, x)^{-1} exists and

We then set ek = ‖xk − x(kΔt)‖, and exactly as in Theorem 4.3 in [103, p. 113] we get
Pie
a = - and ~ = @ ~ r r . 2
(8.4.29)
By hypotheses 1 - 4ac 2 0. Therefore it follows from Lemma 4.2 in [103, p. 1121 that
Remark 8.4.2 According to the hypotheses of Theorem 4.3 in [I 031 the parameter p in practice will be given b y
P= where
IIHZ* (t*,x*)-llI 1 - 11 Hz* (t*,x*)-l llezro '
e2 is the constant appearing in hypothesis (iii) in that theorem given by IIH,(t, X ) - H,(t, y)II 5 e211x - 911 for all t E J , and all x , y E Do.
Clearly
2 can be arbitrarily large. In particular if
then P1
< P,
(8.4.32)
which implies that

and h̄1 is the largest value in [0, 1/(2aβ1²ℓ²)] such that

where,
r1(h) = (1/(β1ℓ²)) (1 − (1 − 2aβ1²ℓ²h)^{1/2}).
Note also that in this case
That is, our approach, under the same hypotheses and computational cost, has the following advantages over the one given in Theorem 4.3 in [103]: a smaller number of steps; a larger convergence radius; finer upper bounds on the distances involved; and results given in affine invariant form. As a simple example where strict inequality holds in (8.4.12), consider
and define the operator H on J × D0 by
Then it can easily be seen using (8.4.18) and (8.4.19) that
The rest of the results in [103] can now be improved if rewritten with (β1, h1) replacing (β, h); see, e.g., Theorem 4.7 and Corollary 4.8.

8.5

Trust Regions
The approximate solution of an equation involves a complex problem: the choice of an initial approximation x0 sufficiently close to the true solution. The method of random choice is often successful. Another frequently used method is to replace
F ( x )= 0
(8.5.1)
by a "similar" equation and to regard the exact solution of the latter as the initial approximation x0. Of course, there are no general "prescriptions" for admissible
initial approximations. Nevertheless, one can describe various devices suitable for extensive classes of equations. As usual, let F map X into Y. To simplify the exposition, we shall assume that F is defined throughout X. Assume that G(x; λ) (x ∈ X; 0 ≤ λ ≤ 1) is an operator with values in Y such that
G(x; 1) = F x (x ∈ X), (8.5.2)
and the equation
G(x; 0) = 0 (8.5.3)
has an obvious solution x⁰. For example, the operator G(x; λ) might be defined by
G(x; λ) = F x − (1 − λ) F x⁰. (8.5.4)
Consider the equation
Suppose that equation (8.5.5) has a continuous solution x = x(λ), defined for 0 ≤ λ ≤ 1 and satisfying the condition
Were the solution x ( A ) known,
would be a solution of equation (8.5.1). One can thus find a point x0 close to x∗ by approximating x(λ). Our problem is thus to approximate the implicit function defined by (8.5.5) and the initial condition (8.5.6). Global propositions are relevant here: theorems on implicit functions defined on the entire interval [0, 1]. The theory of implicit functions of this type is at present insufficiently developed. The idea of extending solutions with respect to a parameter is due to S.N. Bernstein [322]; it has found extensive application in various theoretical and applied problems. Assume that the operator G(x; λ) is differentiable with respect to both x and λ, in the sense that there exist linear operators G′x(x; λ) mapping X into Y and elements G′λ(x; λ) ∈ Y such that
lim_{‖h‖+|Δλ|→0} ‖G(x + h; λ + Δλ) − G(x; λ) − G′x(x; λ)h − G′λ(x; λ)Δλ‖ / (‖h‖ + |Δλ|) = 0.
The implicit function x(λ) is then a solution of the differential equation
G′x(x; λ) (dx/dλ) + G′λ(x; λ) = 0, (8.5.8)
satisfying the initial condition (8.5.6). Conditions for the existence of a solution of this Cauchy problem defined on [0, 1] are precisely conditions for the existence of the implicit function. Assuming the existence of a continuous operator
Γ(x; λ) = [G′x(x; λ)]^{-1}, (8.5.9)
we can rewrite equation (8.5.8) as
dx/dλ = −Γ(x; λ) G′λ(x; λ). (8.5.10)
One must bear in mind that Peano's Theorem is false for ordinary differential equations in Banach spaces. Therefore, even in the local existence theorem for equation (8.5.10) with condition (8.5.6) one must assume that the right-hand side of the equation satisfies certain smoothness conditions. However, there are no sufficient smoothness conditions for the existence of a global extension of the solution to the entire interval 0 ≤ λ ≤ 1. We shall only mention a trivial fact: if the equation
dx/dλ = f(x; λ) (8.5.11)
in a Banach space satisfies the local existence theorem for some initial condition, and

then every solution of equation (8.5.11) can be extended to the entire interval 0 ≤ λ ≤ 1. Thus, if
then equation (8.5.5) defines an implicit function which satisfies (8.5.6) and is defined for 0 ≤ λ ≤ 1. Consequently, condition (8.5.13) guarantees that equation (8.5.1) is solvable and that its solution can be constructed by integrating the differential equation (8.5.10). To approximate a solution x(λ) of equation (8.5.10), one can use, for example, Euler's method. To this end, divide the interval [0, 1] into m subintervals by points
The approximate values x(λ_i) of the implicit function x(λ) are then determined by the equalities x(λ_0) = x_0 and
x(λ_{i+1}) = x(λ_i) − Γ[x(λ_i); λ_i] G'_λ[x(λ_i); λ_i] (λ_{i+1} − λ_i).   (8.5.15)
The element x(λ_m) is in general close to the solution x* of equation (8.5.1), and one may therefore expect it to fulfill the demands imposed on initial approximations for the iterative solution of equation (8.5.1). We emphasize that (8.5.15) does not describe an iterative process; it only yields a finite sequence of operations, whose result is
8. Applications of the weaker version of the Newton-Kantorovich theorem
308
an element which may be a suitable initial approximation for the iterative solution of equation (8.5.1). Other constructions may be used to approximate the implicit function x(λ). Partition the interval [0, 1] by the points (8.5.14). The point x(λ_1) is a solution of the equation G(x, λ_1) = 0. Now x(λ_0) = x_0 is a suitable initial approximation to x(λ_1). Approximate x(λ_1) by performing a fixed number of steps of some iterative process. The result is an element x_1 which should be fairly close to x(λ_1); x_1 is obtained from x_0 by a certain operator x_1 = W[x_0; G(x; λ_1)]. Now regard x_1 as an initial approximation to the solution x(λ_2) of the equation G(x; λ_2) = 0, and proceed as before. The result is an element x_2:
Continuing in this way, we obtain a finite set of points
the last of which, x_m, may be regarded as an initial approximation for the iterative solution of equation (8.5.1). If the operator W represents one iteration of the method, formula (8.5.16) is
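Both constructions above (the Euler predictor (8.5.15) and the fixed-number-of-iterations operator W) can be sketched for a scalar model problem. Everything concrete below is an illustrative assumption rather than part of the text: the operator F, the homotopy G(x; λ) = F(x) − (1 − λ)F(x_start), and the step counts.

```python
def F(x):                      # model equation F(x) = 0 with root x* = 2
    return x ** 3 - 8.0

def dF(x):                     # F'(x) = G'_x(x; lam) for this homotopy
    return 3.0 * x ** 2

def continuation(x_start, m=20, corrector_steps=2):
    """Trace x(lam) for G(x; lam) = F(x) - (1 - lam) * F(x_start)."""
    x, F0 = x_start, F(x_start)          # G'_lam = F0 is constant here
    for i in range(m):
        lam1 = (i + 1) / m
        x -= (F0 / dF(x)) * (1.0 / m)    # Euler predictor, as in (8.5.15)
        for _ in range(corrector_steps): # the operator W: a few Newton steps
            x -= (F(x) - (1.0 - lam1) * F0) / dF(x)
    return x                             # approximates x(1), a root of F

x0 = continuation(1.0)
print(abs(F(x0)))                        # tiny residual: x0 is near the root 2
```

For this homotopy the path x(λ) solves x³ = 1 + 7λ, so tracing it from x(0) = 1 lands near the root 2; the returned x0 then serves as the "suitable initial approximation" discussed above.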
Using the algorithm proposed by H.T. Kung [216] (see also [322]), we show that the number of steps required to compute a good starting point x_0 (to be made precise later) can be significantly reduced. This observation is very important in computational mathematics. In Chapter 5 we showed the following semilocal convergence theorem for Newton's method (4.1.3), which essentially states the following: If
F'(x_0)^{-1} exists, the Lipschitz conditions hold for all x, y ∈ U(x_0, r), where
‖F'(x_0)^{-1}‖ ≤ β_0,   (8.5.18)
and
then the sequence {x_n} (n ≥ 0) generated by Newton's method (4.1.3) is well defined, remains in U(x_0, r_0) for all n ≥ 0, and converges quadratically to a unique solution x* ∈ U(x_0, r) of the equation F(x) = 0. Moreover we have
Remark 8.5.1 In general
holds. If equality holds in (8.5.27) then the result stated above reduces to the famous Newton-Kantorovich theorem and (8.5.22) to the Newton-Kantorovich hypothesis (4.2.51). If strict inequality holds in (8.5.27) then (8.5.22) is weaker than the Newton-Kantorovich hypothesis
Moreover, the error bounds on the distances ‖x_{n+1} − x_n‖, ‖x_n − x*‖ (n ≥ 0) are finer and the information on the location of the solution more precise. Note also that the computational cost of obtaining (K_0, K) is the same as that for K, since in practice evaluating K requires finding K_0. Hence all results using (8.5.28) instead of (8.5.22) can now be challenged to obtain more information. That is exactly what we are doing here. In particular, motivated by the elegant work of H.T. Kung [216] on good starting points for Newton's method, we show how to improve on these results if we use our theorem stated above instead of the Newton-Kantorovich theorem.
Definition 8.5.1 We say x_0 is a good starting point for approximating x* by Newton's method (or a good starting point, for short) if conditions (8.5.18)-(8.5.25) hold.
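The relation K_0 ≤ K in (8.5.27) is easy to observe numerically. The following sketch uses our own toy derivative F'(x) = x² − 0.49 on the ball U(0, 1) (an assumption for illustration, not an example from the text) and estimates both constants on a grid.

```python
def dF(x):                       # derivative of the toy F(x) = x**3/3 - 0.49*x
    return x * x - 0.49

x0, r = 0.0, 1.0                 # ball U(x0, r) = [-1, 1]
grid = [x0 - r + 2.0 * r * i / 400 for i in range(401)]

# Lipschitz constant of F' on the ball: sup |F'(x) - F'(y)| / |x - y|
K = max(abs(dF(x) - dF(y)) / abs(x - y)
        for x in grid for y in grid if x != y)

# center-Lipschitz constant at x0: sup |F'(x) - F'(x0)| / |x - x0|
K0 = max(abs(dF(x) - dF(x0)) / abs(x - x0) for x in grid if x != x0)

print(K0, K)   # K0 = 1.0 while K is close to 2: (8.5.27) is strict here
```

Since estimating K in practice means scanning the same data needed for K_0, the weaker condition comes at no extra computational cost, as remarked above.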
Note that the existence of a good starting point implies the existence of a solution x* of the equation F(x) = 0 in U(x_0, 2r_0). We provide the following theorem/algorithm, which improves the corresponding one given in [216, Thm. 4.1], to obtain good starting points.
Theorem 8.5.2 Let F: D ⊆ X → Y be a Fréchet-differentiable operator. If F' satisfies the center-Lipschitz and Lipschitz conditions (8.5.20), (8.5.21), respectively, on U(x_0, 2r),
‖F'(x)^{-1}‖ ≤
for all x ∈ U(x_0, 2r),   (8.5.29)
and
βη_0 ≤ 1/2 − δ, and we want to prove that x̄_{i+1} is a good starting point for approximating x_{i+2}, a zero of f_{i+1}. We have by (8.5.49) and (8.5.46),
Let b = ‖[f'_{i+1}(x̄_{i+1})]^{-1} f_{i+1}(x̄_{i+1})‖. If x ∈ U(x̄_{i+1}, 2b), as in (8.5.50) we can prove that x ∈ U(x_0, 2r). Hence x̄_{i+1} is a good starting point for approximating x_{i+2}. By the same argument as used for obtaining x̄_0, and by (8.5.49), one can prove that if x̄_{i+2} is set to be the J(δ)-th Newton iterate starting from x̄_{i+1}, then
and
i.e. (8.5.45) and (8.5.46) hold with i replaced by i + 1. This shows that we need to perform at most J(δ) Newton steps at step 4 of Algorithm B to obtain each iterate. Therefore, for any δ ∈ (0, 1/2), to obtain a good starting point we need to perform at most N(δ) = I(δ) · J(δ) Newton steps.
Remark 8.5.3 As noted in [216], δ should not be chosen to minimize the complexity of Algorithm B alone. Instead, δ should be chosen to minimize the complexity of the combined algorithm:
1. Search phase: perform Algorithm B.
2. Iteration phase: perform Newton's method starting from the point obtained by Algorithm B.
An upper bound on the complexity of the iteration phase is the time needed to carry out T(δ, K_0, K, ε) Newton steps, where T(δ, K_0, K, ε) is the smallest integer k such that
Note also that if K_0 = K, our Theorem 8.5.3 reduces to Theorem 4.2 in [216, p. 15]. Hence we have shown the following result:
Theorem 8.5.4 Under the hypotheses of Theorem 8.5.3, the time needed to find a solution x* of the equation F(x) = 0 inside a ball of radius ε is bounded above by the time needed to carry out R(δ, K_0, K, ε) Newton steps, where
where N(δ, K_0, K, ε) and T(δ, K_0, K, ε) are given by (8.5.32) and (8.5.51), respectively.
Remark 8.5.4 If K_0 = K, Theorem 8.5.4 reduces to Theorem 4.3 in [216, p. 20]. In order to compare our results with the corresponding ones in [216], we computed the values of R(ε, K_0, K) for F satisfying the conditions of Theorem 4.3 in [216] and of Theorem 8.5.4 above, with the parameter choices in (8.5.53), and for ε equal to 10^{-i} r, 1 ≤ i ≤ 10.
The following table gives the results for ε = 10^{-6} r. Note that by I we mean I(δ_0, K, K); by I_{aK} we mean I(δ_0, K_0, K) with K_0 = aK, a ∈ [0, 1]; and similarly for J, N and T.
Remark 8.5.5 It follows from the table that our results significantly improve the corresponding ones in [216], and under the same computational cost. Suppose, for example, that K_0 = K and δ = .159. Kung found that the search phase can be done in 255 Newton steps and the iteration phase in 5 Newton steps; that is, a root can be located inside a ball of radius 10^{-6} r using 260 Newton steps. However, for K_0 = .9K, K_0 = .5K and K_0 = 0 the corresponding numbers of Newton steps are 245, 179 and 104 respectively, which constitutes a significant improvement. At the end of his paper Kung asked whether the number of Newton steps used by this procedure is close to the minimum. It is now clear from our approach that the answer is no (in general). Finally, Kung posed the open question: suppose that the conditions of the Newton-Kantorovich theorem hold; is Newton's method optimal, or close to optimal, in terms of the number of function and derivative evaluations required to approximate the solution x* of equation F(x) = 0 to within a given tolerance ε? Clearly, according to our approach, the answer is no (see also Remark 8.5.1).
8.6 Convergence on a Cone
In this section we are concerned with the problem of approximating a solution x* of equation (5.1.1). We propose the Newton-Kantorovich method (5.1.47) to generate a sequence approximating x*. This study is motivated by the elegant work in [116], where X is a real Banach space ordered by a closed convex cone K. We note that passing from scalar majorants to vector majorants enlarges the range of applications, since the latter use the spectral radius of an operator, which is usually smaller than the norm used by the former. Here, using finer vector majorants than before (see [116]), we show under the same hypotheses that: (a) sufficient convergence conditions can be obtained which are always weaker than before; (b) finer error bounds on the distances involved, and at least as precise information on the location of the solution x*, are provided.
Several applications are provided. In particular we show, as a special case, that the Newton-Kantorovich hypothesis (4.2.51) is weakened. Finally we study the local convergence of method (5.1.47). In order to make the study as self-contained as possible, we introduce some concepts involving K-normed spaces. Let X be a real Banach space ordered by a closed convex cone K. We say that the cone K is regular if every increasing sequence y_1 ≤ y_2 ≤ ··· ≤ y_n ≤ ··· which is bounded above converges in norm. If y_n^- ≤ y_n ≤ y_n^+ and lim_{n→∞} y_n^- = lim_{n→∞} y_n^+ = y*, the regularity of K implies lim_{n→∞} y_n = y*. For a, β ∈ X, the conic segment is ⟨a, β⟩ = {y | a ≤ y ≤ β}. An operator Q in X is called positive if Q(y) ∈ K for all y ∈ K. Denote by L(X, X) the space of all bounded linear operators in X, and by L_sym(X², X) the space of bilinear,
symmetric, bounded operators from X² to X. By the standard linear isometry between L(X², X) and L(X, L(X, X)), we consider the former embedded into the latter. Let D be a linearly connected subset of K, and let φ be a continuous operator from D into L(X, X) or L(X, L(X, X)). We say that the line integral of φ is independent of the path if, for every polygonal line L in D, the line integral depends only on the initial and final points of L. We define
We need the definition of K-normed space [365]:
Definition 8.6.1 Let X be a real linear space. Then X is said to be K-normed if the operator ]·[ : X → Y satisfies:
]x[ ≥ 0 (x ∈ X); ]x[ = 0 ⇔ x = 0; ]rx[ = |r| ]x[ (x ∈ X, r ∈ ℝ); and
Let x_0 ∈ X and r ∈ K. Then we denote
Using the K-norm we can define convergence on X. A sequence {y_n} (n ≥ 0) in X is said to be
(1) convergent to a limit y ∈ X if lim_{n→∞} ]y_n − y[ = 0, and we write (X)-lim_{n→∞} y_n = y;
(2) a Cauchy sequence if lim_{m,n→∞} ]y_m − y_n[ = 0 in X.
The space X is complete if every Cauchy sequence is convergent. We use the following conditions: F is differentiable on the K-ball U(x_0, R), and for every r ∈ S = ⟨0, R⟩ there exist positive operators w_0(r), m(r) ∈ L_sym(X², X) such that for all x ∈ X
for all x ∈ U(x_0, r),
for all x, y ∈ U(x_0, r), where the operators w_0, m: S → L_sym(X², X) are increasing. Moreover, the line integral of m (similarly for w_0) is independent of the path, and the same is true for the operator w: S → L(X, X) given by
Note that in general w_0(r) ≤ m(r) for all r ∈ S. The Newton-Leibniz formula holds for F on U(x_0, R):
for all segments [x, y] ⊆ U(x_0, R); for every r ∈ S there exists a positive operator w_1(r) ∈ L(X, X) such that
where w_1: S → L(X, X) is increasing and the line integral of w_1 is independent of the path; the operator F'(x_0) is invertible and satisfies:
for some positive operator b ∈ L(X, X). Let
and define the operator f: S → X by
f(r) = η + b ∫_0^r w(t) dt + b ∫_0^r w_1(t) dt.   (8.6.11)
By the monotonicity of the operators w and w_1 we see that f is order convex, i.e. for all r, r̄ ∈ S with r ≤ r̄,
We will use the following results whose proofs can be found in [116]:
Lemma 8.6.2 (a) If the Lipschitz condition (8.6.4) holds, then
](F'(x + y) − F'(x))(z)[ ≤ (w(r + ]y[) − w(r))(]z[)   (8.6.13)
for all r, r + ]y[ ∈ S, x ∈ U(x_0, r), z ∈ X.
(b) If the Lipschitz condition (8.6.8) holds, then
]G(x + y) − G(x)[ ≤ ∫_r^{r+]y[} w_1(t) dt   for all r, r + ]y[ ∈ S, x ∈ U(x_0, r).   (8.6.14)
Denote by Fix(f) the set of all fixed points of the operator f.
Lemma 8.6.3 Assume:
Then there is a minimal element r* in Fix(f), which can be found by applying the method of successive approximations
with 0 as the starting point. The set
B(f, r*) = {r ∈ S : lim_{n→∞} f^n(r) = r*}
is the attracting zone of r*.
Remark 8.6.1 [116] Let r ∈ S. If
f(r) ≤ r and ⟨0, r⟩ ∩ Fix(f) = {r*}, then ⟨0, r⟩ ⊆ B(f, r*). Note that the successive approximations
converge to a fixed point s* of f satisfying 0 ≤ s* ≤ r. Hence we conclude s* = r*, which implies r ∈ B(f, r*).
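In the scalar case X = ℝ the two iterations of this section can be run side by side. The majorant f(r) = η + (K/2)r² below is our own Kantorovich-style choice (an assumption for illustration): successive approximations from 0 converge to the minimal fixed point r*, as in Lemma 8.6.3, and a Newton-type acceleration in the spirit of (8.6.26) reaches the same r* in far fewer steps.

```python
import math

eta, K = 0.2, 1.0                     # f(r) = eta + (K/2) r^2; here 2*K*eta < 1
f  = lambda r: eta + 0.5 * K * r * r
df = lambda r: K * r                  # plays the role of b*w(r) in (8.6.26)

# minimal fixed point of f in closed form, for reference
r_star = (1.0 - math.sqrt(1.0 - 2.0 * K * eta)) / K

r = 0.0                               # successive approximations (Lemma 8.6.3)
for _ in range(200):
    r = f(r)

p = 0.0                               # Newton-type iteration on f(r) - r = 0
for _ in range(8):
    p -= (f(p) - p) / (df(p) - 1.0)

print(r - r_star, p - r_star)         # both differences are negligible
```

The successive approximations converge linearly with ratio roughly f'(r*) = K r* < 1, while the Newton-type variant converges quadratically; this is the scalar shadow of the comparison between (8.6.25) and (8.6.26).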
In particular, Remark 8.6.1 implies
⟨0, (1 − s)r* + s r⟩ ⊆ B(f, r*)   (8.6.22)
for every r ∈ Fix(f) with ⟨0, r⟩ ∩ Fix(f) = {r*, r}, and for all s ∈ [0, 1).
In the scalar case X = ℝ, we have
B(f, r*) = [0, r*] ∪ {r ∈ S : r* < r, f(q) < q (r* < q ≤ r)}.   (8.6.23)
We will also use the notation
Returning to method (5.1.47), we consider the sequences of approximations
and
r̄_{n+1} = r̄_n − (b w(r̄_n) − I)^{-1} (f(r̄_n) − r̄_n)   (r̄_0 = 0, n ≥ 0)   (8.6.26)
for the majorant equation (8.6.16).
Lemma 8.6.4 If the operators
are invertible with positive inverses, then the sequence {r̄_n} (n ≥ 0) given by (8.6.26) is well defined for all n ≥ 0, monotonically increasing, and convergent to r*.
Proof. We first show that if a_1 ≤ a_2, a_2 ≠ r*, then
(I − b w_0(a_1))^{-1} θ ≤ (I − b w_0(a_2))^{-1} θ
We have
(8.6.28)
for all θ ∈ K, where
θ_2 = (I − b w_0(a_2))^{-1} θ.
Using the monotonicity of w we get
which implies that the increasing sequence {(b w_0(a_1))^n θ_2} (n ≥ 0) is bounded in the regular cone K, and as such it converges to some
with
We also need to show that the operator
g(r) = r − (b w_0(r) − I)^{-1} (f(r) − r)   (8.6.33)
is increasing on ⟨0, r*⟩ − {r*}. Let b_1 ≤ b_2 with b_1, b_2 ∈ ⟨0, r*⟩; then using (8.6.6) and (8.6.33) we obtain in turn
since all the terms on the right-hand side of equality (8.6.35) are in the cone. Moreover, g leaves ⟨0, r*⟩ − {r*} invariant, since
Hence sequence
is well defined for all n ≥ 0, lies in the set ⟨0, r*⟩ − {r*}, and is increasing. Therefore the limit of this sequence exists; let us call it r_1^*. The point r_1^* is a fixed point of f in ⟨0, r*⟩. But r* is the unique fixed point of f in ⟨0, r*⟩. Hence we deduce
r_1^* = r*.   (8.6.38)
Remark 8.6.2 If equality holds in (8.6.6), then the sequence {r̄_n} becomes {r_n} (n ≥ 0) and Lemma 8.6.4 reduces to the corresponding lemma in [116, p. 555]. Moreover, as can easily be seen using induction on n,
and
for all n ≥ 0. Furthermore, if strict inequality holds in (8.6.6), so does it in (8.6.39) and (8.6.40). If {r_n} (n ≥ 0) is a majorizing sequence for method (5.1.47), then (8.6.39) shows that the error bounds on the distances ‖x_{n+1} − x_n‖ are improved. It turns out that this is indeed the case. We can now show the semilocal convergence theorem for method (5.1.47).
Theorem 8.6.5 Assume: hypotheses (8.6.4), (8.6.5), (8.6.7), (8.6.8), (8.6.9), (8.6.15) hold, and the operators (8.6.27) are invertible with positive inverses. Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.48) is well defined, remains in the K-ball U(x_0, r*) for all n ≥ 0 and converges to a solution x* of the equation F(x) + G(x) = 0 in E(r*), where E(r*) is given by (8.6.24). Moreover, the following estimates hold for all n ≥ 0:
and
where the sequence {r_n} is given by (8.6.25).
Proof. We first show (8.6.41), using induction on n.
Assume:
for k = 1,2, . . . ,n. Using (8.6.44) we get
Define operators Q,: X -+ X by
By (8.6.3) and (8.6.9) we get
≥ 0 (by (8.6.10)). For n = 0,
and
Hence, we have:
That is, the series Σ_{k=0}^∞ Q_k(z) is convergent in X. Hence the operator I − Q_n is invertible, and the operator F'(x_n) is invertible for all n ≥ 0, since F'(x_n) = F'(x_0)(I − Q_n); for all x ∈ Y we have:
Using (5.1.47) we obtain the approximation
It now follows from (8.6.3)-(8.6.9), (8.6.11), (8.6.25) and (8.6.52)
By Lemma 8.6.4, the sequence {r_n} (n ≥ 0) converges to r*. Hence {x_n} is a convergent sequence, and its limit is a solution of equation (5.1.1). Therefore x_n converges to x*. Finally, (8.6.42) follows from (8.6.41) by using standard majorization techniques. The uniqueness part is omitted, since it follows exactly as in Theorem 2 in [116].
Remark 8.6.3 It follows immediately from (8.6.53) and (8.6.54) that the sequence given by t_0 = 0, t_1 = η,
is also a finer majorizing sequence for {x_n} (n ≥ 0) and converges to some t* in ⟨0, r*⟩. Moreover, the following estimates hold for all n ≥ 0:
and
t* ≤ r*.   (8.6.61)
That is, {t_n} is a finer majorizing sequence than {r_n}, and the information on the location of the solution x* is more precise. Therefore we wonder if studying the convergence of {t_n} without assuming (8.6.15) can lead to weaker sufficient convergence conditions for method (5.1.47). In Theorem 8.6.7 we answer this question. But first we need the following lemma on majorizing sequences for method (5.1.47).
Lemma 8.6.6 If there exist parameters η > 0, δ ∈ [0, 2) such that: the operators
are positive, invertible and with positive inverses for all n ≥ 0;
and
2b ∫_0^1 w[2(2I − δI)^{-1}(I − (δI/2)^{n+1})η s] ds ≤ δI
for all n ≥ 0.
Then the iteration {t_n} (n ≥ 0) given by (8.6.56) is non-decreasing, bounded above by
t** = 2(2I − δI)^{-1} η,   (8.6.65)
and converges to some t* such that 0 ≤ t* ≤ t**.
Moreover the following estimates hold for all n 2 0:
Proof. We must show:
and that the operators in (8.6.64) are
positive, invertible and with positive inverses. Estimate (8.6.67) then follows immediately from (8.6.68)-(8.6.70). Using induction on the integer k, we get for k = 0
b w_0(t_1) < I, by the initial conditions. But (8.6.56) then gives
0 ≤ t_2 − t_1 ≤ (δ/2)(t_1 − t_0).   (8.6.71)
Assume (8.6.68)-(8.6.70) hold for all k ≤ n + 1. Using (8.6.62)-(8.6.64) we obtain in turn:
Moreover, we show:
We have:
Assume (8.6.73) holds for all k ≤ n + 1. It then follows from (8.6.56) and (8.6.68)-(8.6.70):
Hence the sequence {t_n} (n ≥ 0) converges to some t* satisfying (8.6.66). We can now show the main semilocal convergence theorem for method (5.1.47).
Theorem 8.6.7 Assume: hypotheses (8.6.3)-(8.6.5), (8.6.7)-(8.6.9), (8.6.62)-(8.6.64) hold, and
where t** is given by (8.6.65). Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.47) is well defined, remains in the K-ball U(x_0, t*) for all n ≥ 0, and converges to a solution x* of the equation F(x) + G(x) = 0, which is unique in E(t*). Moreover, the following error bounds hold for all n ≥ 0:
and
where the sequence {t_n} (n ≥ 0) and t* are given by (8.6.56) and (8.6.66), respectively.
Proof. The proof is identical to that of Theorem 8.6.5, with the sequence {t_n} replacing {r_n}, up to the derivation of (8.6.53). But then the right-hand side of (8.6.53), with these changes, becomes t_{n+1} − t_n. By Lemma 8.6.6, {t_n} converges to t*. Hence {x_n} is a convergent sequence, and its limit is a solution of the equation F(x) + G(x) = 0. Therefore {x_n} converges to x*. Estimate (8.6.77) follows from (8.6.76) by using standard majorization techniques. The uniqueness part is omitted, since it follows exactly as in Theorem 2 in [116].
Remark 8.6.4 Conditions (8.6.62) and (8.6.64) can be replaced by the stronger, but easier to check, conditions
and
respectively.
Application 8.6.8 Assume the operator ]·[ is given by a norm ‖·‖, and set G(x) = 0 for all x ∈ U(x_0, R). Choose, for all r ∈ S, b = 1 for simplicity,
and
With these choices our conditions reduce to the ones in Section 5.1, which have already been compared favourably with the Newton-Kantorovich theorem.
Remark 8.6.5 The results obtained here hold under even weaker conditions. Indeed, since (8.6.4) is not "directly" used in the proofs above, it can be replaced by the weaker condition (8.6.13) throughout this study. As we showed in Lemma 8.6.2,
but not necessarily vice versa, unless the operator w is convex [116, p. 674]. The local convergence of method (5.1.47) was not examined in [116]. Let x* be a simple solution of equation (5.1.1), and assume F'(x*)^{-1} ∈ L(Y, X). Moreover, assume, with x* replacing x_0, that hypotheses (8.6.3), (8.6.4), (8.6.5), (8.6.7), (8.6.8), (8.6.9) hold. Then, exactly as in (8.6.53), but using the local conditions and the approximation
we can show the following local result for method (5.1.47). Theorem 8.6.9 Assume there exists a minimal solution r* E S of equation
where,
Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.47) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x*, provided that x_0 ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:
where,
Application 8.6.10 Returning to the choices of Application 8.6.8 and using (8.6.85), we get
See also Section 5.1.
8.7 Convergence on Lie Groups
In this section we are concerned with the problem of approximating a solution x* of the equation
where F is an operator mapping a Lie group to its corresponding Lie algebra. Most numerical algorithms are formulated on vector spaces with some type of Banach space structure that allows the study of the convergence properties of the iteration. Recently, there has been interest in studying numerical algorithms on manifolds (see [265] and the references there). Note that a suitable choice of coordinates allows a manifold to be interpreted locally as a Euclidean space, which justifies the study of such algorithms on manifolds. In particular, Newton's method for solving equation (8.7.1) has already been introduced in the elegant work [265], where the quadratic convergence of the algorithm was established provided that the initial guess is chosen from a ball of a certain convergence radius. Here, by using more precise majorizing sequences, we show that under the same hypotheses and computational cost a larger radius of convergence and finer error bounds can be obtained than before [265]. This observation allows a wider choice of initial guesses for starting Newton's method, and shows that fewer iterations are required to find the solution than is suggested in [265]. Finally, we provide a numerical example to justify these claims. Our study is organized as follows: first we provide, in condensed form, the needed background; then we introduce Newton's method and state some results obtained in [265] that are needed for the main local convergence results appearing later in the study. A Lie group (G, ·) is understood to have the structure of a smooth manifold, i.e. the group product and the inversion are smooth operations in the differentiable structure embedded in the manifold. The dimension of the Lie group equals that of the corresponding manifold, and we assume that it is finite. We will need the following definitions [265]:
Definition 8.7.1 A Lie algebra g of a Lie group (G, ·) is the tangent space at i (i the identity element of G), TG|_i, equipped with the Lie bracket [·, ·]: g × g → g
defined for u, v ∈ g by
where y(t), z(s) are two smooth curves in G satisfying y(0) = z(0) = e, y'(0) = u and z'(0) = v.
Definition 8.7.2 The exponential map is an operator
exp: g → G : u ↦ exp(u).
Given u ∈ g, the left-invariant vector field X_u: y ↦ L'_y(u) (where L'_y(g) = TG|_y) determines an integral curve y_u(t) solving the initial value problem y'_u(t) = X_u(y_u(t)), y_u(0) = e. If N(i) = exp(N(0)), where N(0) is an open neighborhood of 0 ∈ g, then for any y ∈ N(i) we have y = exp(v) for some v ∈ N(0), and if exp(u) = exp(v) ∈ N(i) for some u, v ∈ N(0), then u = v. If G is Abelian, then exp(u + v) = exp(u) exp(v) = exp(v) exp(u) for all u, v ∈ g. In the non-Abelian case, (8.7.4) is replaced by exp(w) = exp(u) exp(v), where w is given by the (BCH) formula [265, p. 114]
w = u + v + (1/2)[u, v] + (1/12)([u, [u, v]] + [v, [v, u]]) + ···
for u, v ∈ N(0).
Definition 8.7.3 The differential of exp at u ∈ g is a map
exp'_u : g → TG|_{exp(u)},
expressed as a map d exp_u : g → g by
d exp_u = (L'_{exp(u)})^{-1} ∘ exp'_u = L'_{exp(−u)} ∘ exp'_u.
Definition 8.7.4 The differential of the operator F at a point y ∈ G is a map
F'_y : TG|_y → Tg|_{F(y)}
defined by
for any tangent vector Δy ∈ TG|_y to the manifold G at y. It follows that
F'_y(Δy) = lim_{t→0} (F(y · z(t)) − F(y)) / t.
As in (8.7.7), the differential F'_y can be expressed in terms of an operator dF_y : g → g given by
dF_y = (F ∘ L_y)' = F'_y ∘ L'_y,   (8.7.9)
and consequently
We can now define Newton's method on the Lie group (G, ·) using (8.7.8) as follows. Choose y_0 ∈ G and determine the differential F'_{y_0} by (8.7.8). Then find Δy_0 ∈ TG|_{y_0} satisfying the approximation
and then set y_1 = y_0 exp(L'^{-1}_{y_0}(Δy_0)). Hence it follows by (8.7.9) that Newton's method can be formally written as: given y_n ∈ G, determine dF_{y_n} according to (8.7.10); find u_n ∈ g such that
dF_{y_n}(u_n) = −F(y_n);   (8.7.11)
and compute
y_{n+1} = y_n exp(u_n).   (8.7.12)
It is well known that a Lie group (G, .) is a manifold and as such it is also a topological space. Therefore the notion of convergence is defined with respect to this topology. We find it convenient to introduce a Banach space structure on the Lie algebra g, and use it to analyze convergence.
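Iteration (8.7.11)-(8.7.12) can be sketched on the simplest Lie group available: G = (ℝ₊, ·) with Lie algebra g = ℝ and the ordinary exponential. The operator F(y) = y − a and the target a are our own illustrative assumptions; for this choice dF_y(u) = y·u, so each Newton step solves a one-dimensional linear equation in the algebra and then updates multiplicatively in the group.

```python
import math

a = 2.0
F = lambda y: y - a          # F maps the group (R_+, *) into its algebra R

def dF(y, u):
    # dF_y(u) = d/dt F(y * exp(t*u)) at t = 0, which equals y * u for this F
    return y * u

y = 1.0                      # initial guess y_0 in G
for _ in range(8):
    u = -F(y) / dF(y, 1.0)   # solve dF_y(u) = -F(y) by linearity, cf. (8.7.11)
    y *= math.exp(u)         # group update y_{n+1} = y_n * exp(u_n), cf. (8.7.12)

print(y)                     # converges quadratically to a = 2
```

Note that the iterate never leaves the group: y stays positive by construction, which is exactly the point of performing the update through exp rather than additively.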
As already mentioned, we only discuss finite-dimensional Lie algebras. Therefore every linear operator h: g → g is bounded. Consequently, we can define its norm subordinate to that on g by
‖h‖ = sup_{u≠0} ‖h(u)‖ / ‖u‖ = sup_{‖u‖=1} ‖h(u)‖ < ∞.
We also need the following definitions concerning convergence of sequences:
Definition 8.7.5 A sequence {y_n} (n ≥ 0) in N(i) converges to i if and only if the corresponding sequence implicitly defined by y_n = exp(u_n) (n ≥ 0) converges to 0, in the sense that lim_{n→∞} ‖u_n‖ = 0.
Definition 8.7.6 If there exists y ∈ G such that
and (8.7.13) holds, then we say that the sequence {y_n} converges to y. Moreover, the order of convergence is λ > 0 if
The continuity of the Lie bracket [·, ·] guarantees the existence of a constant ρ ≥ 0 such that
We need the following lemmas, whose proofs can be found in [265]:
Lemma 8.7.7 There exist a constant δ > 0 and a real function f = f(t), analytic in the disk |t| ≤ δ and non-decreasing for t > 0, with f(0) = 1, such that for all u, v ∈ g with ‖u‖ + ‖v‖ ≤ β,
‖w − u − v‖ ≤ ‖v‖ f(‖u‖)
for any β ≤ δ/2, where w is given by (8.7.6).
Lemma 8.7.8 For y ∈ G, u ∈ g and t ∈ ℝ we have
We also need the radius
r = max{t ∈ ℝ : su ∈ N(0) for all s ∈ [0, t] and all ‖u‖ = 1} > 0   (8.7.19)
of the maximal ball in g centered at 0 and contained in N(0). Moreover, let r* ∈ [0, r]. We can define the closed ball with center y ∈ G and radius r* by
With the notation introduced above, we can state and prove the following local convergence result for Newton's method (8.7.11)-(8.7.12):
Theorem 8.7.9 Assume: the operator dF_y is one-to-one, dF_y^{-1} exists and
for some positive constant a; there exist constants r_0 ∈ (0, r], ℓ_0 ≥ 0 and ℓ ≥ 0 such that for all z ∈ U(y, r_0), u ∈ g with ‖u‖ ≤ r_0 the following hold:
and
Then for each fixed q ∈ [0, 1] there exists r* > 0 such that, if y_0 ∈ U^0(y, r*), the sequence {y_n} (n ≥ 0) generated by Newton's method (8.7.11)-(8.7.12) is well defined, remains in U^0(y, r*) for all n ≥ 0 and converges quadratically to y. Moreover, the following relations hold for all n ≥ 0:
lim_{n→∞} v_n = 0,
and
where,
and the function f is given in Lemma 8.7.7.
Proof. Let δ > 0 and f(·) be as in Lemma 8.7.7. Set
b = sup_{t ∈ [0, δ]} f(t) = f(δ) > 1,
and define
r* = min(…),
where,
Let y_0 ∈ U^0(y, r*). It follows that y^{-1} y_0 = exp(v_0) for some v_0 ∈ g with ‖v_0‖ < r*. Using (8.7.22) we can get
by the choice of r*. By (8.7.30) and the Banach lemma on invertible operators 1.3.2, dF_{y_0}^{-1} exists, and
Define u_0 by (8.7.11). Then, using Lemma 8.7.8, we have in turn
u_0 + v_0 = dF_{y_0}^{-1} [F(y) − F(y_0) + dF_{y_0}(v_0)].
Moreover, using (8.7.23), (8.7.31) and (8.7.32), and since ‖t v_0‖ < r* for all t ∈ [0, 1], we get
Then, by the choice of r* and (8.7.33), we obtain:
which implies u_0 ∈ N(0). Therefore we have
Hence by (8.7.6) and (8.7.34)
By inequality (8.7.16) with β = 3r* ≤ δ/2 we get
by the choice of r*. That is, y_1 ∈ U^0(y, r*). The rest follows by induction, noticing that v_n → 0 as n → ∞, since ‖v_{n+1}‖ < ‖v_n‖ holds for all n ≥ 0.
Remark 8.7.1 In general
ℓ_0 ≤ ℓ   (8.7.35)
holds, and ℓ/ℓ_0 can be arbitrarily large.
(a) If ℓ_0 = ℓ and q = 1, our results reduce to the ones obtained in [265, Theorem 4.6, p. 137]. Denote the convergence radius obtained there by r_1^* and the convergence ratio by c_1. In this case we have r* = r_1^* and c = c_1.   (8.7.36)
(b) If ℓ_0 = ℓ and q ∈ [0, 1), then we get by (8.7.27) and (8.7.28)
and
(c) If ℓ_0 < ℓ and q ∈ [0, 1], then using (8.7.27) and (8.7.28) we deduce again (8.7.37) and (8.7.38).
That is, in cases (b) and (c) we obtain a larger convergence radius r* and a smaller convergence ratio c than the corresponding r_1^* and c_1 found in [265, p. 137]. Note that these advantages are obtained under the same computational cost, since in practice the evaluation of the Lipschitz constant ℓ requires the evaluation of the center-Lipschitz constant ℓ_0. We complete this study with a numerical example, but first we need some background on matrix groups. The set GL(k, ℝ) of all invertible k × k matrices with real coefficients forms a Lie group, called the general linear group, under standard matrix multiplication. This Lie group has some interesting subgroups, one
of which is the orthogonal group O(k, ℝ), consisting of the orthogonal matrices y such that y^T y = I_k, where I_k is the k × k identity matrix. The Lie algebra of GL(k, ℝ) is denoted by gl(k, ℝ); it is the linear space of all real k × k matrices with the bracket [u, v] = uv − vu. In particular, the Lie algebra of O(k, ℝ) is the set of all k × k skew-symmetric matrices
so(k, ℝ) = {v ∈ gl(k, ℝ) : v^T + v = 0}.
Example 8.7.1 We consider the initial value problem
for y ∈ G = O(N, ℝ), where g(y) ∈ so(N, ℝ) and y(0) is a random initial guess for Newton's method (8.7.11)-(8.7.12). For simplicity we take N = 1. Choose y(t) = e^t, g(t) = e^{−t}. Then by Lemma 8.7.7 we can get f(t) = 1 and a = b = 1. Note that (8.7.39) corresponds to the equation
where
We now restrict F to U(y, r*) = U(0, 1). By using (8.7.10), (8.7.22) and (8.7.23) we can get
ℓ = e and ℓ_0 = e − 1. That is, (8.7.35) holds as a strict inequality. In particular, we obtain
Moreover, (8.7.38) holds. Note that the results concerning Version 2 of Newton's method, also studied in [265], can be immediately improved along the same lines; we leave the details to the motivated reader.
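The constants ℓ = e and ℓ_0 = e − 1 in Example 8.7.1 can be recovered numerically under the reading that the relevant derivative behaves like F'(x) = eˣ on U(0, 1) = [−1, 1] (our assumption, chosen to be consistent with the constants reported): the Lipschitz quotient of eˣ peaks at e, while the quotient centered at 0 peaks at e − 1.

```python
import math

dF = math.exp                                   # assumed derivative on [-1, 1]
xs = [-1.0 + i / 200.0 for i in range(401)]     # grid on U(0, 1)

# full Lipschitz constant: sup |dF(x) - dF(y)| / |x - y|  ->  l = e
l_est = max(abs(dF(x) - dF(y)) / abs(x - y)
            for x in xs for y in xs if x != y)

# center-Lipschitz constant at 0: sup |dF(x) - dF(0)| / |x|  ->  l0 = e - 1
l0_est = max(abs(dF(x) - dF(0.0)) / abs(x) for x in xs if x != 0.0)

print(l0_est, l_est)   # l0 = e - 1 < l close to e: the inequality (8.7.35) is strict
```

This is the same one-line computation a practitioner would run to compare the two constants in any concrete problem, and it illustrates why the center constant comes for free once the full constant is estimated.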
8.8 Finite Element Analysis and Discretizations
In this Section we use our weaker version of the Newton-Kantorovich theorem introduced in Section 5.1 to show existence and uniqueness of solutions for strongly nonlinear elliptic boundary value problems [324]. Moreover, a priori estimates are
obtained as a direct consequence of this theorem. This study is motivated by the work in [324], where the Newton-Kantorovich theorem 4.2.4 was used. The advantages of our approach have been explained in detail in Section 5.1. Finally, examples of boundary value problems where our results apply are provided. We assume the following:
(A1) There are Banach spaces Z ⊆ X and U ⊆ Y such that the inclusions are continuous, and the restriction of F to Z, denoted again by F, is a Fréchet-differentiable operator from Z to U.
(A2) For any v ∈ Z the derivative F'(v) ∈ L(Z, U) can be extended to F'(v) ∈ L(X, Y), and it is:
– locally Lipschitz continuous on Z, i.e., for any bounded convex set T ⊆ Z there exists a positive constant c_1, depending on T, such that:
for all v, w E T - center locally Lipschitz continuous at a given uo E Z, i.e., for any bounded convex set T C Z, with uo E T, there exists a positive constant co depending on uo and T such that:
for all v E T. (As) There are Banach spaces V C Z and W C U such that the inclusions are continuous. We suppose that there exists a subset S C V for which the following holds: "if Ff(u) E L(V, W) is an isomorphism between V and W at u E S, then F f ( u ) E L(X, Y) is an isomorphism between X and Y as well." To define discretized solutions of F(u) = 0, we introduce the finite dimensional subspaces S d C Z and S d C U parametrized by d, 0 < d < 1 with the following properties: (Ad) There exists r L 0 and a positive constant ca independent of d such that
(A5) There exists a projection Π_d : X → S_d for each S_d such that, if u₀ ∈ S is a solution of F(u) = 0, then

lim_{d→0} d⁻ʳ ‖u₀ − Π_d u₀‖_X = 0    (8.8.4)

and

lim_{d→0} d⁻ʳ ‖u₀ − Π_d u₀‖_Z = 0.    (8.8.5)
We can show the following result concerning the existence of locally unique solutions of discretized equations:
8. Applications of the weaker version of the Newton-Kantorovich theorem
338
Theorem 8.8.1 Assume that conditions (A1)–(A5) hold. Suppose F′(u₀) ∈ L(V, W) is an isomorphism, and u₀ ∈ S. Moreover, assume F′(u₀) can be decomposed into F′(u₀) = Q + R, where Q ∈ L(X, Y) and R ∈ L(X, Y) is compact. The discretized nonlinear operator F_d : Z → U is defined by
where I is the identity of Y, and P_d : Y → Ŝ_d is a projection such that

lim_{d→0} ‖v − P_d v‖_Y = 0, for all v ∈ Y,    (8.8.7)

and

(I − P_d)Q(v_d) = 0, for all v_d ∈ S_d.    (8.8.8)
Then, for sufficiently small d > 0, there exists u_d ∈ S_d such that F_d(u_d) = 0, and u_d is locally unique. Moreover, the following estimate holds

where the constant involved is positive and independent of d.
Proof. The proof is similar to the corresponding one in [324, Th. 2.1, p. 126]. However, there are some crucial differences, where the weaker condition (8.8.2) is used (needed) instead of the stronger condition (8.8.1).
Step 1. We claim that there exists a positive constant c₃, independent of d, such that estimate (8.8.10) holds for sufficiently small d > 0.
From (A3) and u₀ ∈ S, F′(u₀) ∈ L(X, Y) is an isomorphism. Set B₀ = F′(u₀)⁻¹. We can have in turn

Since −Q + F′(u₀) ∈ L(X, Y) is compact, we get by (8.8.7) that

lim_{d→0} ‖(I − P_d)(−Q + F′(u₀))‖ = 0.
By (8.8.7) there exists a positive constant c₄ such that (8.8.12) holds. That is, using (8.8.2) we get
Hence, by (8.8.5) we can have

where lim_{d→0} δ(d) = 0, and (8.8.10) holds.
Step 2. We shall show:

lim_{d→0} d⁻ʳ ‖F_d′(Π_d(u₀))⁻¹ F_d(Π_d(u₀))‖ = 0.    (8.8.16)

Note that
where,
and we used
where c₅ is independent of d. The claim is proved.
Step 3. We use our modification of the Newton-Kantorovich theorem with the following choices: A = S_d ⊆ Z and B = Ŝ_d ⊆ U, each equipped with its norm scaled by d⁻ʳ; x₀ = Π_d(u₀), F = F_d. Notice that ‖S‖_{L(A,B)} = ‖S‖_{L(X,Y)} for any linear operator S ∈ L(S_d, Ŝ_d). By step 1, we know that F_d′(Π_d(u₀)) ∈ L(S_d, Ŝ_d) is an isomorphism. It follows from (8.8.1) and (A4) that for any w_d, u_d ∈ S_d,
Similarly, we get using (8.8.2) and (A4) that

Hence the assumptions on the Lipschitz constants are satisfied with

ℓ = c₁c₂c₃⁻¹c₄ and ℓ₀ = c₀c₂c₃⁻¹c₄.    (8.8.22)

From step 2, we may take sufficiently small d > 0 such that (ℓ₀ + ℓ)η ≤ 1, where
That is, assumption (5.1.56) is satisfied. Hence, for sufficiently small d > 0 there exists a locally unique u_d ∈ S_d such that F_d(u_d) = 0 and

It follows that (8.8.9) holds. ∎
Remark 8.8.1 In general, ℓ₀ ≤ ℓ holds and the ratio ℓ/ℓ₀ can be arbitrarily large (see Section 5.1), where ℓ and ℓ₀ are given by (8.8.22). If ℓ = ℓ₀, our Theorem 8.8.1 reduces to the corresponding Theorem 2.1 in [324, p. 126]. Otherwise our condition (5.1.56) is weaker than the corresponding one in [324] using the Newton-Kantorovich hypothesis (5.2.5). That is, (5.2.5) implies (5.1.56), but not vice versa. As already shown in Section 5.1, finer estimates on the distances ‖u_d − Π_d(u₀)‖ and more precise information on the location of the solution are provided here, under the same computational cost, since in practice the evaluation of c₁ requires that of c₀. Note also that our parameter d will be smaller than the corresponding one in [324], which in turn implies that fewer computations and smaller dimension subspaces S_d are used to approximate u_d. This observation is very important in computational mathematics [68]. The above observations suggest that all results obtained in [324] can be improved if rewritten using the weaker (5.1.56) instead of the stronger (5.2.5). However we do not attempt this here (leaving this task to the motivated reader). Instead we provide examples of nonlinear problems already reported in [324] where finite element methods apply along the lines of our Theorem above.

Example 8.8.1 Find u ∈ H₀¹(Γ), Γ = (b, c) ⊆ R, such that
⟨F(u), v⟩ = ∫_Γ [g₀(x, u, u′)v′ + g₁(x, u, u′)v] dx = 0, for all v ∈ H₀¹(Γ),    (8.8.27)
where g₀, g₁ are sufficiently smooth functions from Γ × R × R to R.

Example 8.8.2 For the N-dimensional case (N = 2, 3): let D ⊆ Rᴺ be a bounded domain with a Lipschitz boundary. Then consider the problem: find u ∈ H₀¹(D) such that

⟨F(u), v⟩ = ∫_D [q₀(x, u, ∇u)·∇v + q(x, u, ∇u)v] dx = 0, for all v ∈ H₀¹(D),    (8.8.28)

where q₀ : D × R × Rᴺ → Rᴺ and q : D × R × Rᴺ → R are sufficiently smooth functions. Since equations (8.8.27) and (8.8.28) are defined in divergence form, their finite element solutions are defined in a natural way.
Results on iterative methods defined on generalized Banach spaces with a convergence structure [68], [231], [232] can be found in the exercises.
8.9
Exercises
8.1 Let g be an algorithm for finding a solution x* of equation F(x) = 0, and x̄ the approximation to x* computed by g. Define the error d(g, F) for approximating x*. Consider the problem of approximating x* when F satisfies some conditions. Algorithms based on these conditions cannot differentiate between operators in the class C of all operators satisfying these conditions, so we confront the class C instead of specific operators from C. Define

d_i = inf_{g∈A} sup_{F∈C} d(g, F),

where A is the class of all algorithms using i units of time. The time t needed to approximate x* to within error tolerance ε > 0 is the smallest i such that d_i ≤ ε, and an algorithm is said to be optimal if sup_{F∈C} d(g, F) = d_t. If for any algorithm using i units of time there exist functions F₁, F₂ in C such that the minimum distance between any solution of F₁ and any solution of F₂ is greater than or equal to 2ε, then show that d_i ≥ ε.
8.2 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R is continuous; (2) F(a) < 0, F(b) > 0. Then, show:

d_i = (b − a)/2^{i+1}.

8.3 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, F′(x) ≥ α > 0 for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show:

d_i = (b − a)/2^{i+1}.
8.4 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, F′(x) ≤ β for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show: d_i ≥ (b − a)/2^{i+1}.

8.5 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, β ≥ F′(x) ≥ α > 0 for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show:
8.6 Assume: (1) the hypotheses of Exercise 8.5 hold; (2) ‖F″(x)‖ ≤ γ for all x ∈ [a, b]. Then show that the problem of finding a solution x* of equation F(x) = 0 can be solved superlinearly.
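Exercises 8.2–8.5 concern the optimal worst-case error dᵢ of bracketing methods; the bound (b − a)/2^{i+1} is attained by bisection, which halves the bracket at each function evaluation. A minimal sketch (illustrative, not from the text):

```python
def bisect(F, a, b, iters):
    # Assumes F continuous with F(a) < 0 < F(b).  After `iters` bisection
    # steps the midpoint is within (b - a) / 2**(iters + 1) of a root.
    for _ in range(iters):
        m = 0.5 * (a + b)
        if F(a) * F(m) <= 0.0:
            b = m
        else:
            a = m
    return 0.5 * (a + b), 0.5 * (b - a)  # approximation, error bound
```

For F(x) = x² − 2 on [1, 2], ten iterations guarantee an error of at most 1/2¹¹, matching the dᵢ of Exercise 8.2.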
8.7 Let L, M, M₁ be operators such that L ∈ C¹(V₁ → V), M₁ ∈ L₊(V), M ∈ C(V₁ → V), and xₙ be points in D. It is convenient for us to define the sequences cₙ, dₙ, aₙ, bₙ (n ≥ 0) by

cₙ = |xₙ₊₁ − xₙ|, aₙ = (L + M + M₁)ⁿ(a) for some a ∈ C, bₙ = (L + M + M₁)ⁿ(0),

and the point b by

b = limₙ→∞ bₙ.

Prove the result: Let X be a Banach space with convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(D → X) with D ⊆ X, an operator Q ∈ C(D → X), an operator L ∈ C¹(V₁ → V) with V₁ ⊆ V, an operator M ∈ C(V₁ → V), an operator M₁ ∈ L₊(V), a point a ∈ C, and a null sequence (zₙ) ∈ D such that the following conditions are satisfied:
(C6) the inclusions U(a) ⊆ D and [0, a] ⊆ V₁ are true;
(C7) L is order-convex on [0, a], and satisfies

for all x, y ∈ U(a) with |x| + |y| ≤ a;
(C8) M satisfies the conditions

for all x, y ∈ U(a) with |x| + |y| ≤ a, and

M(w₁) − M(w₂) ≤ M(w₃) − M(w₄) and M(w) ≥ 0

for all w, w₁, w₂, w₃, w₄ ∈ [0, a] with w₁ ≤ w₃, w₂ ≤ w₄, w₂ ≤ w₁, w₄ ≤ w₃;
(C9) M₁, xₙ, zₙ satisfy the inequality

for all n ≥ 1;
(C10) L′(0) ∈ B(I − F′(0)), and (−(F(0) + Q(0) + F′(0)(z₀)), L(0) + M(0) + M₁(0)) ∈ E;
(C11) (L + M + M₁)(a) ≤ a with 0 ≤ L + M + M₁; and
(C12) (M + M₁ + L′(a))ⁿ a → 0 as n → ∞.
Then,
(i) the sequence (xₙ, dₙ) ∈ (X × V)ᴺ is well defined, remains in Eᴺ, is monotone, and satisfies for all n ≥ 0

where b is the smallest fixed point of L + M + M₁ in [0, a];
(ii) moreover, the iteration {xₙ} (n ≥ 0) generated by

converges to a solution x* ∈ U(b) of the equation F(x) + Q(x) = 0, which is unique in U(a);
(iii) furthermore, the following estimates are true for all n ≥ 0:

bₙ ≤ dₙ ≤ b, bₙ ≤ aₙ, |xₙ₊₁ − xₙ| ≤ dₙ₊₁ − dₙ, |xₙ − x*| ≤ b − dₙ,

and

|xₙ − x*| ≤ aₙ − bₙ for M₁ = 0 and zₙ = 0 (n ≥ 0).
8.8 We will now introduce results on a posteriori estimates for the iteration introduced in Exercise 8.7. It is convenient to define the operator

where,

and the interval

Iₙ = [0, a − |xₙ|].

Show:
(a) the operators Sₙ are monotone on Iₙ;
(b) the operators Rₙ : [0, a − dₙ] → [0, a − dₙ] are well defined, and monotone. Hint: Verify the scheme:

(c) if q ∈ Iₙ satisfies Rₙ(q) ≤ q, then

cₙ ≤ Rₙ(q) ≤ q and Rₙ₊₁(q − cₙ) ≤ q − cₙ for all n ≥ 0;

(d) under the hypotheses of Exercise 8.7, let qₙ ∈ Iₙ be a solution of Rₙ(q) ≤ q; then

where

a₀ = q, and aₙ₊₁ = Rₙ(aₙ) − cₙ;

and
(e) under the hypotheses of Exercise 8.7, any solution q ∈ Iₙ of Rₙ(q) ≤ q is such that
8.9 Let A ∈ L(X → X) be a given operator. Define the operators P, T ∈ C(D → X) by

P(x) = AT(x + u), T(x) = G(x) + R(x), P(x) = F(x) + Q(x),

and

where A ∈ L(X → X) and G, R are as F, Q respectively. We deduce immediately that under the hypotheses of Exercise 8.7 the zero x* of P is also a zero of AT, if u = 0. We will now provide a monotonicity result to find a zero x* of AT. The space X is assumed to be partially ordered and to satisfy the conditions for V given in (C1)–(C5). Moreover, we set X = V, D = C², so that |·| turns out to be I.
Prove the result: Let V be a partially ordered Banach space satisfying conditions (C1)–(C5), Y a Banach space, G ∈ C¹(D → Y), R ∈ C(D → Y) with D ⊆ V, A ∈ L(X → V), M ∈ C(D → V), M₁ ∈ L₊(V) and u, v ∈ V such that:
(C13) [u, v] ⊆ D;
(C14) I − AG′(u) + M + M₁ ∈ L₊(V);
(C15) for all w₁, w₂ ∈ [u, v]: w₁ ≤ w₂ ⇒ AG′(w₁) ≤ AG′(w₂);
(C16) AT(u) + AG′(u)(z₀) ≤ 0, AT(v) + AG′(v)(z₀) ≥ 0 and AT(v) + M₁(v − u) ≥ 0;
(C17) condition (C8) is satisfied, and M(v − u) ≤ −Q(v − u);
(C18) condition (C9) is satisfied;
(C19) the following initial condition is satisfied

and
(C20) (I − AG′(v) + M + M₁)ⁿ (v − u) → 0 as n → ∞.
Then the Newton sequence

is well defined for all n ≥ 0, monotone, and converges to a unique zero x* of AT in [u, v].
8.10 Let X be a Banach space with convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(X_F → X) with X_F ⊆ X, an operator L ∈ C¹(V_L → V) with V_L ⊆ V and a point a of C such that
(a) U(a) ⊆ X_F, [0, a] ⊆ V_L;
(b) L is order convex on [0, a] and satisfies

for x, y ∈ U(a) with |x| + |y| ≤ a;
(c) L′(0) ∈ B(I − F′(0)), (−F(0), L(0)) ∈ E;
Then show: the Newton sequence

u₀ = u, uₙ₊₁ := uₙ + [AG′(uₙ)]*[−AG(uₙ)]  (n ≥ 0)

is well defined, monotone, and converges to the unique zero x of AG in [u, v].

8.15 Consider the two-point boundary value problem
−x″(s) = 4 sin(x(s)) + f(s), x(0) = x(1) = 0
as a possible application of Exercises 8.7–8.14 in the space X = C[0, 1] and V = X with the natural partial ordering; the operator |·| is defined by taking absolute values. Let G ∈ L₊(C[0, 1]) be given by
Gx(s) = (2 sin 2)⁻¹ [ ∫₀ˢ sin(2t) sin(2 − 2s) x(t) dt + ∫ₛ¹ sin(2 − 2t) sin(2s) x(t) dt ],

satisfying Gx = y with

−y″ − 4y = x, y(0) = y(1) = 0.

Define the operator
Let
For x, y, w ∈ C[0, 1],

|x|, |x| + |y| ≤ .5π ⇒ |[F′(x) − F′(x + y)]w| ≤ [L′(|x| + |y|) − L′(|x|)] |w|.

Further we have L(0) = |Gf| and L′(0) = 0. We have to determine a ∈ C₊[0, 1] with |a| ≤ .5π and s ∈ (0, 1) such that L(a) = 4G(a − sin(a)) + |Gf| ≤ sa. We seek a constant function as a solution. For e₀(s) = 1, we compute
Show that a = te₀ will be a suitable solution if

4p(t − sin(t)) + ‖Gf‖_∞ < t.
8.16 (a) Assume: given a Banach space X with a convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(X₀ ⊆ X → X), operators M, L₀, L ∈ C¹(V₀ ⊆ V → V), and a point p ∈ C such that the following conditions hold: M, L₀, L are order-convex on [0, p], and such that for x, y, z ∈ U(p) with |x| ≤ p, |y| + |z| ≤ p:

L₀′(|x|) − L₀′(0) ∈ B(F′(0) − F′(x)),
L′(|y| + |z|) − L′(|y|) ∈ B(F′(y) − F′(y + z)),
M(p) ≤ p,
L₀(p₀) ≤ M(p₀) for all p₀ ∈ [0, p],
L₀′(p₀) ≤ M′(p₀) for all p₀ ∈ [0, p],
L₀′(0) ∈ B(I − F′(0)), (−F(0), L₀(0)) ∈ E,
M′(p)ⁿ(p) → 0 as n → ∞,
M′(dₙ)(b − dₙ) + L(dₙ) ≤ M(b) for all n ≥ 0,

where

and

Show: the sequence {xₙ} (n ≥ 0) generated by Newton's method is well defined, remains in Eᴺ, is monotone, and converges to a unique zero x* in U(b), where b is the smallest fixed point of M in [0, p]. Moreover the following bounds hold for all n ≥ 0
and
(b) If r ∈ [0, p − |xₙ|] satisfies Rₙ(r) ≤ r, then show the following holds for all n ≥ 0:

and
(c) Assume the hypotheses of (a) hold and let rₙ ∈ [0, p − |xₙ|] be a solution of (8.9.1). Then show the following a posteriori estimates hold for all n ≥ 0

where

(d) Under the hypotheses of (a), show any solution r ∈ [0, p − |xₙ|] of Rₙ(r) ≤ r yields the a posteriori estimate

Let X be partially ordered and set X = V, D = C², and |·| = I.
(e) Let X be partially ordered, Y a Banach space, G ∈ C¹(X₀ ⊆ X → Y), A ∈ L(Y → X) and x, y ∈ X such that:
1. [x, y] ⊆ X₀;
2. I − AG′(x) ∈ L₊(X);
3. z₁ ≤ z₂ ⇒ AG′(z₁) ≤ AG′(z₂) for all z₁, z₂ ∈ [x, y];
4. AG(x) ≤ 0, AG(y) ≥ 0;
5. (I − AG′(y))ⁿ (y − x) → 0 as n → ∞.
Show: the sequence {yₙ} (n ≥ 0) generated by

y₀ = x, yₙ₊₁ = yₙ + AG′(yₙ)*[−AG(yₙ)]

is well defined for all n ≥ 0, monotone, and converges to a unique zero y* of AG in [x, y].
8.17 Let there be given a Banach space X with convergence structure (X, V, E), where V = (V, C, ‖·‖_V), an operator F ∈ C¹(X₀ → X) with X₀ ⊆ X, an operator M ∈ C¹(V₀ → V) with V₀ ⊆ V, and a point p ∈ C satisfying: M is order-convex on [0, p] and for all x, y ∈ U(p) with |x| + |y| ≤ p

and

Then, show the sequence (xₙ, tₙ) ∈ (X × V)ᴺ, where {xₙ} is generated by Newton's method and {tₙ} (n ≥ 0) is given by

t₀ = 0, tₙ₊₁ = M(tₙ) + M′(|xₙ|)(aₙ), aₙ = |xₙ₊₁ − xₙ|,

is well defined for all n ≥ 0, belongs in Eᴺ, and is monotone. Moreover the following hold for all n ≥ 0

where

b = limₙ→∞ Mⁿ(0) is the smallest fixed point of M in [0, p].
(b) Show the corresponding results as in Exercises 8.16 (b)–(c).

8.18 Assume the hypotheses of Exercise 8.16 hold for M = L.
Show: (a) conclusions (a)–(e) hold (under the revised hypothesis) and the last hypothesis in (a) can be dropped; (b) the error bounds |x* − xₙ| (n ≥ 0) obtained in this setting are finer and the information on the location of the solution more precise than the corresponding ones in Exercise 8.10, provided that L₀(p₀) < L(p₀) or L₀′(p₀) < L′(p₀) for all p₀ ∈ [0, p]. As in Exercise 8.11 assume there exists a monotone operator k₀ : [0, p] → R such that
and define the operator L₀ by
The sequence {dₙ} given by d₀ = 0, dₙ₊₁ = L(dₙ) + L₀′(|xₙ|)cₙ converges to some p* ∈ [0, p] provided that

Conclude that the above semilocal convergence condition is weaker than (d) in Exercise 8.10, or equivalently (for p = a)

Finally, conclude that in the setting of Exercise 8.7 and under the same computational cost we always obtain, under weaker conditions, finer error bounds on the distances |xₙ − x*| (n ≥ 0) and better information on the location of the solution x* than in Exercise 8.7.
Chapter 9
The Newton-Kantorovich Theorem and Mathematical Programming

The Newton-Kantorovich theorem 4.2.4 has, with a few notable exceptions [293], [294], [282], not been sufficiently utilized in the mathematical programming community. The purpose of this chapter is to provide a bridge between the two research communities by showing that the Newton-Kantorovich theorem can be used in analyzing LP and interior point methods, and can be used for obtaining optimal error bounds along the way. In Sections 9.1 and 9.2 we show how to improve on the elegant works of Renegar and Shub in [294] and Potra in [282]. To avoid repetitions, we simply refer the reader to the above excellent works for the development of these methods. We simply start from the point where the hypotheses made can be replaced by ours, which are weaker. The benefits of this approach have also been explained in the introduction and in Section 5.1. We also note that the work in [294] is motivated by a theorem of Smale in combination with the Newton-Kantorovich theorem, whereas the work in [282] differs from the above since it applies the latter theorem directly.
9.1
Case 1: Interior Point Methods
It has already been shown in [282] that the Newton-Kantorovich theorem can be used to construct and analyze optimal-complexity path following algorithms for linear complementarity problems. Potra has chosen to apply this theorem to linear complementarity problems because such problems provide a convenient framework for analyzing primal-dual interior point methods. Theoretical and experimental work conducted over the past decade has shown that primal-dual path following algorithms are among the best solution methods for linear programming (LP), quadratic programming (QP), and linear complementarity problems (LCP) (see for example [285], [346]). Primal-dual path following algorithms are the basis of the
best general-purpose practical methods, and they have important theoretical properties [304], [346], [354]. Potra, using the Newton-Kantorovich theorem 4.2.4, in particular showed how to construct path-following algorithms for LCP that have O(√n L) iteration complexity [282]. Given a point x that approximates a point x(τ) on the central path of the LCP with complementarity gap τ, the algorithms compute a parameter θ ∈ (0, 1) so that x satisfies the Newton-Kantorovich hypothesis (4.2.51) for the equation defining x((1 − θ)τ). It is proven that θ is bounded below by a multiple of n^{−1/2}. Since (4.2.51) is satisfied, the sequence generated by Newton's method (4.1.3) or by the modified Newton method (take F′(xₙ) = F′(x₀), n ≥ 0) with starting point x will converge to x((1 − θ)τ). He showed that the number of steps required to obtain an acceptable approximation of x((1 − θ)τ) is bounded above by a number independent of n. Therefore, a point with complementarity gap less than ε can be obtained in at most O(√n log(ε₀/ε)) steps (for both methods), where ε₀ is the complementarity gap of the starting point. For linear complementarity problems with rational input data of bit length L, this implies that an exact solution can be obtained in at most O(√n L) iterations plus a rounding procedure requiring O(n³) arithmetic operations [346]. The differences between Potra's work and the earlier works by Renegar [293] and Renegar and Shub [294] have been stated in the introduction of this chapter and further analyzed in [282]. We also refer the reader to the excellent monograph of Nesterov and Nemirovskii [254] for an analysis of the construction of interior point methods for a larger class of problems than that considered in [282]. Below one can find our contribution. Let ‖·‖ be a given norm on Rⁱ, i a natural integer, and x₀ be a point of D such that the closed ball of radius r centered at x₀,

U(x₀, r) = {x ∈ Rⁱ : ‖x − x₀‖ ≤ r},    (9.1.1)

is included in D ⊆ Rⁱ, i.e.

U(x₀, r) ⊆ D.    (9.1.2)
We assume that the Jacobian F′(x₀) of F : D ⊆ X → X is nonsingular and that the Lipschitz condition (4.2.50) is satisfied. The Newton-Kantorovich Theorem 4.2.4 states that if

k = ℓη ≤ 1/2,    (9.1.3)

then there exists x* ∈ U(x₀, r) with F(x*) = 0. Moreover the sequences produced by Newton's method (4.1.3) and by the modified Newton method (4.1.4) are well defined and converge to x*.
Define

k⁰ = ℓ̄η ≤ 1/2,    (9.1.4)

where

ℓ̄ = (ℓ₀ + ℓ)/2    (9.1.5)

and ℓ₀ is the constant appearing in the center Lipschitz condition. Note that

k ≤ 1/2 ⇒ k⁰ ≤ 1/2    (9.1.6)
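Conditions (9.1.4)–(9.1.6) compare the classical proximity quantity k = ℓη with the weaker k⁰ = ℓ̄η. Since ℓ₀ ≤ ℓ, we have ℓ̄ = (ℓ₀ + ℓ)/2 ≤ ℓ, so k ≤ 1/2 forces k⁰ ≤ 1/2 while the converse can fail. A small numeric sketch (the values of η, ℓ, ℓ₀ below are hypothetical):

```python
def proximity(eta, ell, ell0):
    # k  = ell * eta                      -- classical quantity in (9.1.3)
    # k0 = ell_bar * eta, ell_bar = (ell0 + ell) / 2   -- (9.1.4)-(9.1.5)
    k = ell * eta
    k0 = 0.5 * (ell0 + ell) * eta
    return k, k0
```

With ℓ₀ = 0.5 < ℓ = 1.5 and η = 0.4 one gets k = 0.6 > 1/2 but k⁰ = 0.4 ≤ 1/2: the weaker condition certifies convergence where the classical one fails.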
but not vice versa, unless ℓ₀ = ℓ. Similarly, by simply replacing ℓ with ℓ₀ (since the center-Lipschitz condition instead of (4.2.50) is actually needed in the proof) and condition (9.1.3) by the weaker

k¹ = ηℓ₀ ≤ 1/2    (9.1.7)

in the proof of Theorem 1 in [282], we show that method (4.1.4) also converges to x* and that the improved bounds

‖yₙ − x*‖ ≤ (2β₀ λ₀^{2n−1}/(1 − λ₀²)) ξ₀  (n ≥ 1)    (9.1.8)

hold, where

β₀ = √(1 − 2k¹)/ω₀,  λ₀ = (1 − √(1 − 2k¹))/k¹    (9.1.9)

and

ξ₀ = 1 − √(1 − 2k¹).    (9.1.10)

In case ℓ₀ = ℓ, (9.1.9) reduces to (12) in [282]. Otherwise our error bounds are finer. Note also that

k ≤ 1/2 ⇒ k¹ ≤ 1/2    (9.1.11)

but not vice versa, unless ℓ₀ = ℓ. We can now describe the linear complementarity problem as follows: Given two matrices Q, R ∈ Rⁿˣⁿ (n ≥ 2) and a vector b ∈ Rⁿ, the horizontal linear
complementarity problem (HLCP) consists of approximating a pair of vectors (w, s) such that

ws = 0, Q(w) + R(s) = b, w, s ≥ 0.    (9.1.12)

The monotone linear complementarity problem (LCP) is obtained by taking R = −I and Q positive semidefinite. Moreover the linear programming problem (LP) and the quadratic programming problem (QP) can be formulated as HLCPs. That is, the HLCP is a suitable framework for studying interior point methods. We assume HLCP (9.1.12) is monotone in the sense that:

Q(u) + R(v) = 0 implies uᵗv ≥ 0, for all u, v ∈ Rⁿ.    (9.1.13)

Condition (9.1.13) holds if the HLCP is a reformulation of a QP. If the HLCP is a reformulation of an LP then the following stronger condition holds:

Q(u) + R(v) = 0 implies uᵗv = 0, for all u, v ∈ Rⁿ.    (9.1.14)
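For the LCP case R = −I with Q positive semidefinite, (9.1.13) is immediate: Q(u) − v = 0 forces uᵗv = uᵗQu ≥ 0. A 2 × 2 numeric check (the matrix below is an arbitrary positive definite example, not from the text):

```python
def complementarity_product(q, u):
    # With R = -I the constraint Q(u) + R(v) = 0 gives v = Q u, so
    # u^t v = u^t Q u >= 0 whenever Q is positive semidefinite (2x2 case).
    v = [q[0][0] * u[0] + q[0][1] * u[1],
         q[1][0] * u[0] + q[1][1] * u[1]]
    return u[0] * v[0] + u[1] * v[1]
```

The product is nonnegative for every direction u, which is exactly the monotonicity (9.1.13).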
In this case we say that the HLCP is skew-symmetric. If the HLCP has an interior point, i.e. there is (w, s) ∈ Rⁿ₊₊ × Rⁿ₊₊ satisfying Q(w) + R(s) = b, then for any parameter τ > 0 the nonlinear system

ws = τe, Q(w) + R(s) = b, w, s ≥ 0    (9.1.15)

has a unique positive solution x(τ) = [w(τ)ᵗ, s(τ)ᵗ]ᵗ. The set of all such solutions defines the central path C of the HLCP. It can be proved that (w(τ), s(τ)) converges to a solution of the HLCP as τ → 0. Such an approach for solving the HLCP is called a path following algorithm. At a basic step of a path following algorithm, an approximation (w, s) of (w(τ), s(τ)) has already been computed for some τ > 0. The algorithm determines the smaller value of the central path parameter τ₊ = (1 − θ)τ, where the value θ ∈ (0, 1) is computed in some unspecified way. The approximation (w₊, s₊) of (w(τ₊), s(τ₊)) is then computed, and the procedure is repeated with (w₊, s₊, τ₊) in place of (w, s, τ). In order to relate the path following algorithm to the Newton-Kantorovich theorem we introduce the notations

x = [wᵗ, sᵗ]ᵗ, x(τ) = [w(τ)ᵗ, s(τ)ᵗ]ᵗ, x₊ = [w₊ᵗ, s₊ᵗ]ᵗ, x(τ₊) = [w(τ₊)ᵗ, s(τ₊)ᵗ]ᵗ, etc.
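For n = 1 the central-path system (9.1.15) reduces to two scalar equations, and the Newton step used below in (9.1.23) can be written out explicitly. The following sketch (with illustrative data q, r, b, τ; not part of the text) solves the 2 × 2 linearized system in closed form:

```python
def newton_step_hlcp(w, s, q, r, b, tau):
    # One Newton step for the n = 1 version of (9.1.15):
    #   w*s = tau,   q*w + r*s = b.
    # The Jacobian is [[s, w], [q, r]]; the 2x2 system is solved by Cramer.
    f1 = w * s - tau
    f2 = q * w + r * s - b
    det = s * r - w * q
    dw = (r * f1 - w * f2) / det
    ds = (s * f2 - q * f1) / det
    return w - dw, s - ds
```

Starting from w = s = 1.2 with q = 1, r = −1, b = 0, τ = 1, the iterates converge quadratically to the central-path point w(1) = s(1) = 1.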
Then for any σ > 0 we define the nonlinear operator

F_σ(x) = (ws − σe, Q(w) + R(s) − b)ᵗ.    (9.1.16)
Then the system (9.1.15) defining x(τ) becomes

F_τ(x) = 0,    (9.1.17)

whereas the system defining x(τ₊) is given by

F_{(1−θ)τ}(x) = 0.    (9.1.18)
We assume that the initial guess x belongs to the interior of the feasible set of the HLCP,

F⁰ = {x = (wᵗ, sᵗ)ᵗ ∈ R²ⁿ₊₊ : Q(w) + R(s) = b}.    (9.1.19)

In order to verify the Newton-Kantorovich hypothesis for equation (9.1.17) we introduce the quantity
η = η(x, τ) = ‖F′(x)⁻¹F_τ(x)‖,    (9.1.20)

the measures of proximity

k = k(x, τ) = ηℓ, ℓ = ℓ(x),
k⁰ = k⁰(x, τ) = ηℓ̄, ℓ̄ = ℓ̄(x),    (9.1.21)
k¹ = k¹(x, τ) = ηℓ₀, ℓ₀ = ℓ₀(x),

and the normalized primal-dual gap

μ = μ(x) = wᵗs/n.    (9.1.22)
If for a given interior point x and a given parameter τ we have k⁰(x, τ) ≤ .5 for the Newton-Kantorovich method, or k¹(x, τ) ≤ .5 for the modified Newton-Kantorovich method, then the corresponding sequence starting from x will converge to the point x(τ) on the central path. We can now describe our algorithm, which is a weaker version of the one given in [282]:

Algorithm 9.1.1 (using the Newton-Kantorovich method). Given 0 < k₁⁰ < k₂⁰ < .5, ε > 0, and x₀ ∈ F⁰ satisfying k⁰(x₀, μ(x₀)) ≤ k₁⁰;
Set k ← 0 and τ₀ ← μ(x₀);
repeat (outer iteration)
Set (x, τ) ← (xₖ, τₖ), x̄ ← xₖ;
Determine the largest θ ∈ (0, 1) such that k⁰(x, (1 − θ)τ) ≤ k₂⁰;
Set τ ← (1 − θ)τ;
repeat (inner iteration)
Set x ← x − F′(x)⁻¹F_τ(x)
(9.1.23)
until k⁰(x, μ) ≤ k₁⁰;
Set (xₖ₊₁, τₖ₊₁) ← (x, τ);
Set k ← k + 1;
until wₖᵗsₖ ≤ ε.
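The outer/inner structure of Algorithm 9.1.1 can be mimicked on the toy n = 1 monotone HLCP with Q = [1], R = [−1], b = 0, whose central path is w(τ) = s(τ) = √τ. The fixed reduction factor θ below replaces the algorithm's maximal choice of θ, and a residual tolerance stands in for the proximity test k⁰(x, μ) ≤ k₁⁰ — everything here is an illustrative sketch, not Potra's algorithm:

```python
def follow_path(w, s, tau, theta=0.3, eps=1e-10):
    # Outer loop: shrink the path parameter, tau <- (1 - theta) * tau.
    # Inner loop: Newton steps on F_tau for the toy HLCP
    #   w*s = tau,  w - s = 0   (Q = [1], R = [-1], b = 0),
    # whose central path is w(tau) = s(tau) = sqrt(tau).
    while w * s > eps:
        tau *= 1.0 - theta
        for _ in range(20):
            f1 = w * s - tau
            f2 = w - s
            det = -s - w               # det of the Jacobian [[s, w], [1, -1]]
            dw = (-f1 - w * f2) / det
            ds = (s * f2 - f1) / det
            w, s = w - dw, s - ds
            if abs(f1) < 1e-14 and abs(f2) < 1e-14:
                break
    return w, s
```

Starting on the central path at τ = 1, the iterates stay positive and drive the complementarity product w·s below the target tolerance.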
For the modified Newton-Kantorovich algorithm, k₁⁰, k₂⁰, k⁰ should be replaced by k₁¹, k₂¹, k¹, and (9.1.23) by

Set x ← x − F′(x̄)⁻¹F_τ(x),

respectively. In order to obtain Algorithm 1 in [282] we need to replace k₁⁰, k₂⁰, k⁰ by k₁, k₂, k respectively. The above suggest that all results on interior point methods obtained in [282] using (9.1.3) can now be rewritten using only the weaker (9.1.5) (or (9.1.8)). We only state those results for which we will provide applications. Let us introduce the notation

Ψᵢᵃ = 1 + θᵢᵃ + √(2θᵢᵃ + rᵢᵃ), if the HLCP is monotone,
Ψᵢᵃ = 1 + qᵢᵃ + √(2qᵢᵃ + (qᵢᵃ)²), if the HLCP is skew-symmetric,    (9.1.24)

where rᵢᵃ = (θᵢᵃ)², tᵢᵃ = √kᵢᵃ, a = 0, 1, and

θᵢᵃ = tᵢᵃ (1 + tᵢᵃ/(1 − tᵢᵃ)), qᵢᵃ = tᵢᵃ/2, i = 1, 2.    (9.1.25)
Then by simply replacing k, k₁, k₂ by k⁰, k₁⁰, k₂⁰ respectively in the corresponding results in [282] we obtain the following improvements:

Theorem 9.1.2 The parameter θ determined at each outer iteration of Algorithm 9.1.1 satisfies

θ ≥ χᵃ/√n = λᵃ,

where

χᵃ = √2 (k₂ᵃ − k₁ᵃ)/((√2 + p t₁ᵃ)√Ψ₁ᵃ), if the HLCP is skew-symmetric or if no simplified Newton-Kantorovich steps are performed,
χᵃ = √2 (k₂ᵃ − k₁ᵃ)/((√2 + p k₁ᵃ)√Ψ₁ᵃ), otherwise,    (9.1.26)

where

p = √2 if the HLCP is monotone, and p = 1 if the HLCP is skew-symmetric.    (9.1.27)
Clearly, the lower bound λᵃ on θ is an improvement over the corresponding one in [283, Corollary 4]. In the next result a bound on the number of steps of the inner iteration that depends only on k₁⁰ and k₂⁰ is provided.

Theorem 9.1.3 If the Newton-Kantorovich method is used in Algorithm 9.1.1 then each inner iteration terminates in at most N⁰(k₁⁰, k₂⁰) steps, where

N⁰(k₁⁰, k₂⁰) = integer part [ log₂ x_{N⁰} / log₂((1 − √(1 − 2k₂⁰) − k₂⁰)/k₂⁰) ]    (9.1.28)

and

x_{N⁰} = [ (1 − p k₂⁰/√2)(t₂⁰)² − (1 − √(1 − 2k₂⁰)) k₁⁰ ] / [ √2 (t₂⁰)² √(1 − 2k₂⁰) Ψ₂⁰ + (1 − √(1 − 2k₂⁰))² (1 + k₁⁰) ].    (9.1.29)

If the modified Newton-Kantorovich method is used in Algorithm 9.1.1 then each inner iteration terminates in at most S¹(k₁¹, k₂¹) steps, where

S¹(k₁¹, k₂¹) = integer part [ log₂ x_{S¹} / log₂(1 − √(1 − √(1 − 2k₂¹))) ] + 1    (9.1.30)

and

x_{S¹} = [ (1 − p k₂¹/√2)(t₂¹)² − (1 − √(1 − 2k₂¹) − k₂¹) k₁¹ ] / [ √2 (t₂¹)² √(1 − 2k₂¹)(1 − √(1 − 2k₂¹) − k₂¹) Ψ₂¹ + (1 − √(1 − 2k₂¹))² (1 + k₁¹) ].
Clearly, if k₁¹ = k₁⁰ = k₁, k₂¹ = k₂⁰ = k₂, k¹ = k⁰ = k, Theorem 9.1.2 reduces to the corresponding Theorem 2 in [282]. Otherwise the following improvements hold:

N⁰(k₁⁰, k₂⁰) < N(k₁, k₂), N⁰ < N, S¹(k₁¹, k₂¹) < S(k₁, k₂), and S¹ < S.

Since the ratios k₁/k₁⁰, k₂/k₂⁰, k₁/k₁¹, k₂/k₂¹ can be arbitrarily large for a given triplet η, ℓ and ℓ₀, the choices

k₁⁰ = k₁¹ = .12, k₂⁰ = k₂¹ = .24 when k₁ = .21 and k₂ = .42
and

k₁⁰ = k₁¹ = .24, k₂⁰ = k₂¹ = .48 when k₁ = .245 and k₂ = .49

are possible. Then using formulas (9.1.27), (9.1.28) and (9.1.30) for our results and (9)–(11) in [282], we obtain the following tables:

(a) If the HLCP is monotone and only Newton directions are performed, then:

Potra: χ(.21, .42) > .17       Argyros: χ⁰(.12, .24) > .1
Potra: χ(.245, .49) > .199     Argyros: χ⁰(.24, .48) > .196

Potra: N(.21, .42) = 2         Argyros: N⁰(.12, .24) = 1
Potra: N(.245, .49) = 4        Argyros: N⁰(.24, .48) = 3

(b) If the HLCP is monotone and modified Newton directions are performed:

Potra: χ(.21, .42) > .149      Argyros: χ¹(.12, .24) > .098
Potra: χ(.245, .49) > .164     Argyros: χ¹(.24, .48) > .162

Potra: S(.21, .42) = 5         Argyros: S¹(.12, .24) = 1
Potra: S(.245, .49) = 18       Argyros: S¹(.24, .48) = 12
All the above improvements are obtained under weaker hypotheses and the same computational cost (in the case of Newton’s method) or less computational cost (in the case of the modified Newton method) since in practice the computation of ℓ requires that of ℓ0 and in general the computation of ℓ0 is less expensive than that of ℓ.
9.2
Case 2: LP Methods
We are motivated by paper [294], where a unified complexity analysis for Newton LP methods was given. Here we show that it is possible under weaker hypotheses
to provide a finer semilocal convergence analysis, which in turn can reduce the computational time for interior point algorithms appearing in linear programming and convex quadratic programming. Finally an example is provided to show that our results apply where the ones in [294] cannot. Let us assume that F : D ⊆ X → Y is an analytic operator and that F′(x)⁻¹ exists at x = x₀ ∈ D. Define

η = η(x₀) = ‖F′(x₀)⁻¹F(x₀)‖    (9.2.1)

and

γ = γ(x₀) = sup_{k≥2} ‖(1/k!) F′(x₀)⁻¹F⁽ᵏ⁾(x₀)‖^{1/(k−1)}.    (9.2.2)
Note that η(x₀) is the step length at x₀ when method (4.1.3) is applied to approximate x*. Smale in [312] showed that if D = X and

γη ≤ 1/8 = .125,    (9.2.3)

then the Newton-Kantorovich method (4.1.3) converges to x*, so that

‖xₙ₊₁ − xₙ‖ ≤ (1/2)^{2ⁿ−1} ‖x₁ − x₀‖  (n ≥ 0).    (9.2.4)
Rheinboldt in [296], using the Newton–Kantorovich theorem showed convergence assuming D ⊆ X and γη ≤ .11909565.
(9.2.5)
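Condition (9.2.3) can be checked in closed form for simple analytic functions. For F(x) = eˣ − 2 every derivative equals eˣ, so the supremum in (9.2.2) is sup_{k≥2}(1/k!)^{1/(k−1)} = 1/2, attained at k = 2. The example below is ours, not Smale's:

```python
import math

def smale_quantity(x0):
    # gamma * eta for F(x) = exp(x) - 2 at x0:
    #   every derivative is exp(x), so gamma = sup_k (1/k!)**(1/(k-1)) = 1/2
    #   (attained at k = 2), and eta = |F(x0)/F'(x0)| = |1 - 2*exp(-x0)|.
    gamma = max((1.0 / math.factorial(k)) ** (1.0 / (k - 1))
                for k in range(2, 30))
    eta = abs(1.0 - 2.0 * math.exp(-x0))
    return gamma * eta
```

At x₀ = 0.6 one gets γη ≈ 0.049 ≤ 1/8, so Smale's criterion already guarantees convergence of Newton's method to x* = ln 2.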
We need the following semilocal convergence result for the Newton-Kantorovich method (4.1.3) and twice Fréchet-differentiable operators:

Theorem 9.2.1 Let F : D ⊆ X → Y be a twice Fréchet-differentiable operator. Assume there exist a point x₀ ∈ D and parameters η ≥ 0, ℓ₀ ≥ 0, ℓ ≥ 0, δ ∈ [0, 1] such that:

F′(x₀)⁻¹ ∈ L(Y, X),    (9.2.6)
‖F′(x₀)⁻¹[F′(x) − F′(x₀)]‖ ≤ ℓ₀‖x − x₀‖,    (9.2.7)

‖F′(x₀)⁻¹F″(x)‖ ≤ ℓ    (9.2.8)
for all x ∈ D, U (x0 , t∗ ) ⊆ D,
(9.2.9)
and

h_δ = (ℓ + δℓ₀)η ≤ δ,    (9.2.10)

where

t* = limₙ→∞ tₙ,    (9.2.11)

with

t₀ = 0, t₁ = η, tₙ₊₂ = tₙ₊₁ + ℓ(tₙ₊₁ − tₙ)²/(2(1 − ℓ₀tₙ₊₁))  (n ≥ 0).    (9.2.12)
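The majorizing sequence (9.2.12) is scalar, so it can be tabulated cheaply; its monotone convergence to t* is what drives the estimates of the theorem. A sketch with hypothetical values of η, ℓ, ℓ₀:

```python
def majorizing_sequence(eta, ell, ell0, n=25):
    # (9.2.12): t0 = 0, t1 = eta,
    # t_{k+2} = t_{k+1} + ell*(t_{k+1} - t_k)**2 / (2*(1 - ell0*t_{k+1})).
    t = [0.0, eta]
    for _ in range(n):
        t.append(t[-1] + ell * (t[-1] - t[-2]) ** 2
                 / (2.0 * (1.0 - ell0 * t[-1])))
    return t
```

For η = 0.2, ℓ = 1, ℓ₀ = 0.8 the sequence increases monotonically and stalls near t* ≈ 0.2245, mirroring the quadratic decay of the Newton steps it majorizes.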
Then
(a) the sequence {xₙ} (n ≥ 0) generated by the Newton-Kantorovich method (4.1.3) is well defined, remains in U(x₀, t*) for all n ≥ 0, and converges to a unique solution x* of the equation F(x) = 0 in U(x₀, t*). Moreover the following estimates hold for all n ≥ 0:

‖xₙ₊₂ − xₙ₊₁‖ ≤ ℓ‖xₙ₊₁ − xₙ‖²/(2(1 − ℓ₀‖xₙ₊₁ − x₀‖)) ≤ tₙ₊₂ − tₙ₊₁,    (9.2.13)

‖xₙ − x*‖ ≤ t* − tₙ ≤ t₀** − tₙ ≤ αη,    (9.2.14)

where

t₀** = [1 + (δ₀/2)/(1 − δ/2)] η, t** = αη, α = 2/(2 − δ),    (9.2.15)

and

δ₀ = ℓη/(1 − ℓ₀η).    (9.2.16)
Furthermore the solution x* is unique in U(x₀, R), R = 2/ℓ₀ − t*, provided that

U(x₀, R) ⊆ D, and ℓ₀ ≠ 0.    (9.2.17)

(b) If

H_α = (ℓ + 2ℓ₀α)η < 2,    (9.2.18)

then
ℓη 0, ℓ > 0 such that F ′ (x0 )−1 ∈ L(Y, X),
′
F (x0 )−1 F (x0 ) ≤ η,
′
F (x0 )−1 F ′′ (x) ≤ ℓ, for all x ∈ D, 4 3 γη ≤ and U (x0 , η) ⊆ D. 9 2 Then show that the the Newton-Kantorovich method is well defined, remains in U (x0 , 32 η) for all n ≥ 0, converges to a unique zero x∗ of F in D ∩ U (x0 , 1ℓ ) and satisfies 2n 1 ∗ kxn − x k ≤ 2 η, (n ≥ 1). 2 9.3. [294] Let F : D ⊆ X → Y be analytic. Assume there exist x0 ∈ D, δ ≥ 0 such that F ′ (x0 )−1 ∈ L(Y, X), 1 1 η≤ δ≤ , 2 40γ
and

Ū(x₀, 4δ) ⊆ D.

If ‖x̄ − x₀‖ ≤ δ then show that the Newton-Kantorovich method (4.1.3) starting at x̄ is well defined, and converges to a unique zero x* of F in Ū(x₀, (3/2)η), so that

‖xₙ − x*‖ ≤ (7/2)(1/2)^{2ⁿ} δ.
9.4. Consider the LP barrier method [294]. Let Int = {x : A(x) > b}, where A is a real matrix with aᵢ denoting the i-th row of A, and let h : Int × R₊ → R denote the map

h(x, t) = cᵗx − t Σᵢ ln(aᵢx − bᵢ).

For fixed t, the map x → h(x, t) is strictly convex and has a unique minimum. The sequence of minima as t → 0 converges to the optimal solution of the LP. The algorithm simply computes a Newton-Kantorovich sequence {xₙ} (n ≥ 0), where

x_{n+1} = xₙ − ∇²ₓh(xₙ, t_{n+1})⁻¹∇ₓh(xₙ, t_{n+1})

and tₙ → 0. Show:

(a) for suitable x₀ we can always choose t_{n+1} = (1 − 1/(41√m))tₙ and each xₙ will be a "good" approximation to the minimum of the map x → h(x, tₙ). Hint: Let y be the minimum of x → h(x, t), assume ‖ȳ − y‖ ≤ 1/20 and show ‖ȳ′ − y′‖′ ≤ 1/20, where ‖x‖′ = ‖Δ(y)⁻¹A(x)‖₂ and, for t′ > 0, y′ and ‖·‖′ are defined in an obvious way. Then use Exercise 9.3.

(b) An O(√m L) iteration bound.

(c) the choice of the sequence {tₙ} above can be improved if 41 is replaced by any a ∈ {22, 23, ..., 40}. Hint: Use Theorem 9.2.3. Note that this is an improvement over the choice in (a).

(d) An O(√m L) iteration bound using the choices of sequences {tₙ} given in (c).

(e) Show how to improve along the same lines the QP barrier method, the primal LP algorithm, the primal-dual LP algorithm and the primal-dual QP algorithm as introduced in [294].
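The short-step barrier iteration of Exercise 9.4(a) can be sketched numerically. The update tₙ₊₁ = (1 − 1/(41√m))tₙ and the single Newton step per barrier parameter follow the exercise; the LP data (the box constraints, the cost vector, and the starting point) are hypothetical choices for illustration.

```python
import numpy as np

# Sketch of the short-step LP barrier method of Exercise 9.4;
# the problem data A, b, c and the starting point are hypothetical.
def barrier_newton(A, b, c, x, t, n_steps):
    m = A.shape[0]
    shrink = 1.0 - 1.0 / (41.0 * np.sqrt(m))     # t_{n+1} = (1 - 1/(41*sqrt(m))) t_n
    for _ in range(n_steps):
        t *= shrink
        s = A @ x - b                            # slacks a_i x - b_i > 0
        grad = c - t * A.T @ (1.0 / s)           # grad_x h(x, t)
        hess = t * A.T @ np.diag(1.0 / s**2) @ A # Hessian of h(., t)
        x = x - np.linalg.solve(hess, grad)      # one Newton step per t
    return x, t

# minimize x1 + x2 over the box 0 <= x <= 1, encoded as A x >= b
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([0., 0., -1., -1.])
c = np.array([1., 1.])
x, t = barrier_newton(A, b, c, np.array([0.5, 0.5]), t=1.0, n_steps=400)
print(x)   # iterates track the central path toward the optimal corner (0, 0)
```

With the damped update the single Newton step per value of t suffices to stay close to the central path, which is the content of part (a).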
Chapter 10
Generalized equations Recent results on generalized equations that compare favourably with earlier ones are presented in this chapter.
10.1
Strongly regular solutions
In this section we are concerned with the problem of approximating a strongly regular solution x∗ of generalized equation 0 ∈ F (x) + NC (x)
(10.1.1)
where F is a continuously Fréchet-differentiable operator defined on an open, nonempty convex subset D of a Banach space X with values in a Banach space Y, and N_C(x) is the normal cone to C at x (C ⊆ X closed and convex). Many problems in Mathematical Programming/Economics can be formulated as generalized equations [68], [99], [298], [302] (see also Chapter 7). Here, motivated by [298], we show that it is possible, under the same hypotheses and computational cost, to provide a wider choice of initial guesses in the local case and, under weaker conditions, finer error bounds on the distances involved as well as more precise information on the location of the solution in the semilocal case. We achieve all this by using more precise majorizing sequences, along the lines of our work in Chapter 5. We first provide in condensed form the preliminaries on generalized equations and strongly regular solutions, and then we provide a local convergence analysis (see Theorem 10.1.6) and a semilocal convergence analysis (see Theorem 10.1.10) for method (4.1.3). Finally we show that our results compare favorably with the earlier ones. To make the section as self-contained as possible we recall the needed definitions from [298]. For more detailed information on strongly regular equations see [298], [302] and the references there.
Definition 10.1.1 Let F : D ⊆ X → Y be an operator with a continuous Fr´echetderivative F ′ . Let y be a fixed point in D. Then the linearization of F at y denoted by LFy is defined by LFy (x) = F (y) + F ′ (y)(x − y).
(10.1.2)
It follows from (10.1.2) that LF_y is an affine operator that agrees with F at y and has "slope" equal to that of F at y. Consequently the linearization of a generalized equation at y is obtained by replacing F by LF_y; formally:

Definition 10.1.2 Let F, D, X, Y be as in the previous definition. Then the linearization of the generalized equation

0 ∈ F(x) + N_K(x)    (10.1.3)

at y is the generalized equation 0 ∈ LF_y(x) + N_K(x), where K is a nonempty, closed convex cone in D, and N_K(x) is the normal cone to K at x.

Definition 10.1.3 Let T be a set-valued operator. We define

T⁻¹(y) = {x | y ∈ T(x)}.    (10.1.4)

It then follows that T⁻¹(0) = {x | 0 ∈ T(x)} is the solution set of the generalized equation 0 ∈ T(x). We pay special attention to T⁻¹ when T = LF_y + N_K. The following definition is due to Robinson [298] and describes the behavior of (LF_y + N_K)⁻¹ suitable for the convergence of Newton's method:

Definition 10.1.4 Let 0 ∈ F(x) + N_K(x) have a solution x*. Set T = LF_{x*} + N_K. Then x* is a strongly regular solution of 0 ∈ F(x) + N_K(x) if and only if there exist a neighborhood B₁ of the origin and a neighborhood B₂ of x*, such that the restriction of B₂ ∩ T⁻¹ to B₁ is a single-valued and Lipschitz continuous operator. Note that the operator B₂ ∩ T⁻¹ is defined by (B₂ ∩ T⁻¹)(y) = B₂ ∩ T⁻¹(y).
Finally we need a corollary of Theorem 2.4 in [298].

Corollary 10.1.5 Let F : D ⊆ X → Y have a Fréchet derivative F′ on D. Suppose x₀ ∈ D and the generalized equation 0 ∈ LF_{x₀}(x) + N_C(x) has a strongly regular solution with associated Lipschitz constant d. Then there exist positive constants ρ, r and R such that for all x ∈ U(x₀, ρ) the following holds: (LF_x + N_C)⁻¹ ∩ U(x₁, r) restricted to U(0, R) is single-valued and Lipschitz continuous with modulus

d/(1 − d‖F′(x) − F′(x₀)‖).
We can provide the following local convergence result for Newton's method involving generalized equations:

Theorem 10.1.6 Let C be a nonempty, closed, convex subset of X, and let D be a nonempty, open, convex subset of X. Moreover let F : D → Y have a Lipschitz continuous Fréchet derivative F′ with Lipschitz constant b. Suppose: the generalized equation

0 ∈ F(x) + N_C(x)    (10.1.5)

has a strongly regular solution x* with associated Lipschitz constant d; the operator F′ has a center-Lipschitz constant (at x = x*) b₀. Let ρ, r, and R be the positive constants introduced in Corollary 10.1.5. Let x₀ ∈ D and define the notation:

r₀ = ‖x* − x₀‖,
S_x = LF_x + N_C,
a₀ = d(1 − db₀r₀)⁻¹,
h₀ = (1/2)a₀br₀.

Moreover suppose:

d(b + 2b₀)r₀ < 2,    (10.1.6)
(1/2)br₀² < R,    (10.1.7)
r₀ < ρ,    (10.1.8)
U(x*, r₀) ⊂ D ∩ U(x*, ρ).    (10.1.9)

Then the Newton sequence {xₙ} (n ≥ 0) given by

x_{n+1} = (S_{xₙ})⁻¹(0) ∩ U(x*, r)    (10.1.10)

is well defined, remains in U(x*, r₀) for all n ≥ 0 and converges to x*. Moreover the following estimates hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ (1/2)a₀b‖xₙ − x*‖²    (10.1.11)

and

‖x_{n+1} − x*‖ ≤ 2(a₀b)⁻¹ h₀^{2^{n+1}}.    (10.1.12)

Note that b₀ ≤ b holds in general and that b/b₀ can be arbitrarily large.
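The quadratic recursion (10.1.11) and its closed form (10.1.12) can be checked numerically. In this sketch the constants a₀, b, r₀ are hypothetical values chosen so that h₀ = (1/2)a₀br₀ < 1, as the theorem requires.

```python
# Numerical check that iterating the quadratic bound (10.1.11) never
# exceeds the closed-form bound (10.1.12); a0, b, r0 are hypothetical.
a0, b, r0 = 2.0, 1.0, 0.5
h0 = 0.5 * a0 * b * r0          # here h0 = 0.5 < 1

e = r0                           # e_n bounds ||x_n - x*||
for n in range(5):
    e = 0.5 * a0 * b * e**2                          # recursion (10.1.11)
    closed = 2.0 / (a0 * b) * h0 ** (2 ** (n + 1))   # bound (10.1.12)
    assert e <= closed + 1e-15
print(e)   # doubly exponential decay
```

With e₀ = r₀ the two bounds coincide exactly, which is how (10.1.12) follows from (10.1.11) by induction.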
Proof. We introduce

T_x = U(x*, r) ∩ S_x⁻¹,  T_k = T_{x_k},
f(x) = d[1 − d‖F′(x) − F′(x*)‖]⁻¹,
J(x) = LF_x(x*) − LF_{x*}(x*).

We will show:

‖T_x(u) − T_x(v)‖ ≤ a₀‖u − v‖ for all x ∈ U(x*, r₀) and u, v ∈ U(0, R),    (10.1.13)
x* ∈ S_x⁻¹(J(x)) for all x ∈ D,    (10.1.14)
x* = T_x(J(x)) for x ∈ U(x*, r₀) and J(x) ∈ U(0, R),    (10.1.15)
‖J(x)‖ ≤ (b/2)‖x − x*‖²,    (10.1.16)
h₀ < 1.    (10.1.17)

By Corollary 10.1.5, T_x is single-valued and Lipschitz continuous with modulus f(x) for x ∈ U(x*, ρ). Since, for x ∈ U(x*, r₀) ⊂ U(x*, ρ) ∩ D,

‖F′(x) − F′(x*)‖ ≤ b₀‖x − x*‖ ≤ b₀r₀,

we get f(x) ≤ a₀, which shows (10.1.13) because f(x) is a Lipschitz constant for the operator T_x. Let x ∈ D; then we get

0 ∈ F(x*) + N_C(x*) = LF_{x*}(x*) + N_C(x*) = LF_x(x*) − J(x) + N_C(x*) = −J(x) + S_x(x*),

so J(x) ∈ S_x(x*), which shows (10.1.14). Equation (10.1.15) follows from (10.1.14) and the fact that the operator T_x is single-valued on U(x*, r₀). Estimate (10.1.16) is standard under the stated hypotheses, whereas (10.1.17) follows immediately from hypothesis (10.1.6).

Moreover, using induction on k ≥ 1 we show:

x_k = T_{k−1}(0) is well defined,    (10.1.18)
‖x_k − x*‖ ≤ (1/2)a₀b‖x_{k−1} − x*‖² ≤ h₀r₀,    (10.1.19)
J(x_k) ∈ U(0, R).    (10.1.20)

For k = 1: T₀(0) is single-valued by Corollary 10.1.5, which shows (10.1.18). By (10.1.7) and (10.1.16),

‖J(x₀)‖ ≤ (b/2)‖x₀ − x*‖² = (b/2)r₀² < R,

and hence by (10.1.15), x* = T₀(J(x₀)). Therefore we get

‖x₁ − x*‖ = ‖T₀(0) − T₀(J(x₀))‖ ≤ a₀‖J(x₀)‖  by (10.1.13)
≤ (1/2)ba₀‖x₀ − x*‖² = (1/2)a₀br₀²  by (10.1.16)
= h₀r₀ < r₀  by (10.1.17).

It follows by (10.1.8) and (10.1.9) that x₁ ∈ U(x*, r₀) ⊆ D. By (10.1.16) we obtain

‖J(x₁)‖ ≤ (b/2)‖x₁ − x*‖² ≤ (b/2)(h₀r₀)² = (1/2)br₀²h₀² < (1/2)br₀² < R

by (10.1.17) and (10.1.7). Therefore (10.1.18)–(10.1.20) hold for k = 1. Let us assume (10.1.18)–(10.1.20) hold for all k ≤ n; we must show they hold for k = n + 1. By (10.1.8), (10.1.17), and (10.1.19) we conclude that (10.1.13) holds for x = xₙ. Using (10.1.15) and (10.1.20), we get x* = Tₙ(J(xₙ)). That is, x_{n+1} = Tₙ(0) is well defined, and by (10.1.13)

‖x_{n+1} − x*‖ = ‖Tₙ(0) − Tₙ(J(xₙ))‖ ≤ a₀‖J(xₙ)‖
≤ (1/2)ba₀‖xₙ − x*‖²  by (10.1.16), (10.1.19) and U(x*, r₀) ⊆ D
≤ (1/2)ba₀(h₀r₀)²  by (10.1.19)
= h₀³r₀ < h₀r₀ < r₀.

Furthermore, by x_{n+1} ∈ U(x*, r₀) ⊆ D and (10.1.16) we get

‖J(x_{n+1})‖ ≤ (b/2)‖x_{n+1} − x*‖² ≤ (b/2)(h₀r₀)² ≤ (1/2)br₀² < R,

which completes the induction. Finally (10.1.11) follows from (10.1.19), and (10.1.12) from (10.1.11) using induction.

The following two results can be found in Section 5.1.

Lemma 10.1.7 Let b₀ > 0, b > 0, d > 0 and η > 0 with b₀ ≤ b. Set h = d((b₀ + b)/2)η and suppose h ≤ 1/2. Define t₀ = 0, t_{k+1} = G₀(t_k) (k ≥ 0), where

G₀(t) = ((1/2)dbt² − η)/(db₀t − 1).

Let t* = lim_{n→∞} tₙ and t** = 2η. Define also Q₀(s, t) = dbs²/(2(1 − db₀t)). Then the following hold:

(a) t_{k+1} = t_k + Q₀(t_k − t_{k−1}, t_k) (k ≥ 1);
(b) G₀ is monotone non-decreasing on [0, t*];
(c) t* is the smallest fixed point of G₀ on [0, t**];
(d) the sequence {t_k} is monotonically increasing, and lim_{n→∞} tₙ exists;
(e) t_{k+1} − t_k ≤ 2⁻ᵏη (k ≥ 0);
(f) t_{k+1} ≤ 2(1 − (1/2)^{k+1})η (k ≥ 0);
(g) |t* − t_k| ≤ (2h)^{2ᵏ} · (bd2ᵏ)⁻¹ (k ≥ 0).
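The majorizing sequence of Lemma 10.1.7 can be tabulated via the recursion of part (a). In this sketch the constants d, b₀, b, η are hypothetical, chosen so that h = d((b₀ + b)/2)η ≤ 1/2, and we verify the monotonicity and difference bounds (d)–(f) numerically.

```python
# Majorizing sequence of Lemma 10.1.7 via t_{k+1} = t_k + Q0(t_k - t_{k-1}, t_k);
# the constants d, b0, b, eta are hypothetical, with h <= 1/2.
d, b0, b, eta = 1.0, 0.5, 1.0, 0.5
h = d * (b0 + b) / 2 * eta
assert h <= 0.5

def Q0(s, t):
    return d * b * s**2 / (2.0 * (1.0 - d * b0 * t))

t = [0.0, eta]
for k in range(1, 30):
    t.append(t[k] + Q0(t[k] - t[k-1], t[k]))

# parts (d)-(f): monotone increase, difference bound 2^{-k} eta, cap t** = 2 eta
assert all(t[k] <= t[k+1] for k in range(30))
assert all(t[k+1] - t[k] <= 2.0**(-k) * eta + 1e-12 for k in range(30))
print(round(t[-1], 6))   # numerical approximation of t*
```

The differences t_{k+1} − t_k collapse quadratically, so the numerical limit stabilizes after a handful of terms, well below the cap t** = 2η.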
Lemma 10.1.8 Let D ⊆ X, and let {xₙ} (n ≥ 0) be a sequence in X. With the notation used in Lemma 10.1.7, assume:
(a) ‖x_{k+2} − x_{k+1}‖ ≤ Q₀(‖x_{k+1} − x_k‖, ‖x_{k+1} − x₀‖) whenever x_k, x_{k+1} ∈ D;
(b) x₀ ∈ D;
(c) t₁ ≥ η = ‖x₁ − x₀‖;
(d) x ∈ D whenever ‖x − x₀‖ ≤ t*;
(e) the hypotheses of Lemma 10.1.7 hold.
Then the following conclusions hold:
(f) ‖x_k − x₀‖ ≤ t* (k ≥ 0);
(g) the sequence {xₙ} converges to some x* with ‖x* − x₀‖ ≤ t*;
(h) ‖x* − x_{k+1}‖ ≤ t* − t_{k+1} (k ≥ 0);
(i) ‖x_{k+1} − x_k‖ ≤ t_{k+1} − t_k (k ≥ 0).

Lemma 10.1.9 Let D be an open, convex subset of X. Let F : D → Y be continuously differentiable. Suppose that for some ℓ > 0

‖F′(u) − F′(v)‖ ≤ ℓ‖u − v‖ whenever u, v ∈ D.

Then

‖F(u) − LF_v(u)‖ ≤ (1/2)ℓ‖u − v‖² for all u, v ∈ D.
We can show the following semilocal convergence theorem for Newton's method:

Theorem 10.1.10 Let C, D, F, b, b₀ and S_x be as in Theorem 10.1.6. Let x₀ ∈ D and assume that the generalized equation 0 ∈ S_{x₀}(x) is strongly regular at x₁. Let ρ, r, and R be the positive constants associated with strong regularity, and d the associated Lipschitz constant, as given in Corollary 10.1.5. Let η := ‖x₁ − x₀‖ and t* := lim_{n→∞} tₙ, where a := d((b₀ + b)/2)η. Suppose that the following relations hold: a ≤ 1/2, (1/2)bη² ≤ R, and U(x₀, t*) ⊂ D ∩ U(x₀, ρ). Then the sequence {xₙ} defined by

x_{n+1} := S_{xₙ}⁻¹(0) ∩ U(x₀, r)    (10.1.21)

is well defined, converges to some x* ∈ U(x₀, t*) which satisfies 0 ∈ F(x*) + N_C(x*), and ‖x* − xₙ‖ ≤ t* − tₙ for n = 1, 2, .... Note that by b₀ here we mean the center-Lipschitz constant of F′ at x = x₀.

Proof. We first define the following notation:

g(x) := d(1 − db₀‖x − x₀‖)⁻¹,
Tₙ := U(x₁, r) ∩ S_{xₙ}⁻¹,
LFₙ := LF_{xₙ},
Jₙ := F(xₙ) − LF_{n−1}(xₙ).
Before beginning the induction part of the proof, we prove that the following relation holds:

x_{n+1} = T_{n+1}(J_{n+1}) whenever x_n, x_{n+1} ∈ U(x₀, t*),    (10.1.22)

x_{n+1} := Tₙ(0), J_{n+1} ∈ U(0, R) and T_{n+1} restricted to U(0, R) is single-valued. By the definition of Tₙ, x_{n+1} ∈ U(x₁, r) and 0 ∈ LFₙ(x_{n+1}) + N_C(x_{n+1}). Thus

0 ∈ LF_{n+1}(x_{n+1}) − [LF_{n+1}(x_{n+1}) − LFₙ(x_{n+1})] + N_C(x_{n+1}) = LF_{n+1}(x_{n+1}) − J_{n+1} + N_C(x_{n+1}).

Rearranging gives J_{n+1} ∈ (LF_{n+1} + N_C)(x_{n+1}), that is, x_{n+1} ∈ (LF_{n+1} + N_C)⁻¹(J_{n+1}). Since J_{n+1} ∈ U(0, R) and x_{n+1} ∈ U(x₁, r), we have x_{n+1} = T_{n+1}(J_{n+1}), which shows (10.1.22). We now proceed by induction to prove the following for all n ≥ 1:

Tₙ restricted to U(0, R) is single-valued and Lipschitz continuous with Lipschitz constant g(xₙ).    (10.1.23)
‖x_{n+1} − xₙ‖ ≤ Q₀(‖xₙ − x_{n−1}‖, ‖xₙ − x₀‖), where x_{n+1} := Tₙ(0).    (10.1.24)
x_{n+1} ∈ U(x₀, t*).    (10.1.25)
‖J_{n+1}‖ ≤ R.    (10.1.26)

We first show that (10.1.23)–(10.1.26) hold for n = 1. By Lemma 10.1.7(d), t₁ ≤ t*. By definition, t₁ = G₀(t₀) = η = ‖x₁ − x₀‖. Hence, x₁ ∈ U(x₀, t₁) ⊂ U(x₀, t*) ⊂ D ∩ U(x₀, ρ). By Corollary 10.1.5, T₁ restricted to U(0, R) is a single-valued and Lipschitz continuous operator with modulus d(1 − d‖F′(x₁) − F′(x₀)‖)⁻¹. By hypothesis, ‖F′(x₁) − F′(x₀)‖ ≤ b₀‖x₁ − x₀‖, so that d(1 − d‖F′(x₁) − F′(x₀)‖)⁻¹ ≤ g(x₁). Hence, (10.1.23) holds for n = 1. Now define x₂ = T₁(0). By Lemma 10.1.9, ‖J₁‖ := ‖F(x₁) − LF₀(x₁)‖ ≤ (1/2)b‖x₁ − x₀‖² = (1/2)bη² ≤ R. Thus, J₁ ∈ U(0, R) and, by (10.1.23),

‖T₁(J₁) − T₁(0)‖ ≤ g(x₁)·‖J₁‖ ≤ (1/2)db‖x₁ − x₀‖²(1 − db₀‖x₁ − x₀‖)⁻¹ = Q₀(‖x₁ − x₀‖, ‖x₁ − x₀‖).

But x₂ = T₁(0) and, by (10.1.22), x₁ = T₁(J₁); hence ‖x₂ − x₁‖ = ‖T₁(0) − T₁(J₁)‖. This establishes (10.1.24) when n = 1. By Lemma 10.1.8(f), x₂ ∈ U(x₀, t*). Thus, (10.1.25) holds for n = 1. Also, x₂ ∈ U(x₀, t*) ⊂ D, and thus
by Lemmas 10.1.7(e), 10.1.8(i) and 10.1.9 we get:

‖J₂‖ := ‖F(x₂) − LF₁(x₂)‖ ≤ (1/2)b‖x₂ − x₁‖² ≤ (1/2)b(t₂ − t₁)² ≤ (1/2)b(2⁻¹η)² ≤ (1/2)bη² ≤ R.

Thus, (10.1.26) holds for n = 1, and we have completed the induction when n = 1. Suppose that (10.1.23)–(10.1.26) hold for all n ≤ k. We proceed to show that they also hold for n = k + 1. By (10.1.25), x_{k+1} ∈ U(x₀, t*) ⊂ U(x₀, ρ) ∩ D. By Corollary 10.1.5, T_{k+1} restricted to U(0, R) is single-valued and Lipschitz continuous with modulus d(1 − d‖F′(x_{k+1}) − F′(x₀)‖)⁻¹. By hypothesis, ‖F′(x_{k+1}) − F′(x₀)‖ ≤ b₀‖x_{k+1} − x₀‖. Thus, g(x_{k+1}) ≥ d(1 − d‖F′(x_{k+1}) − F′(x₀)‖)⁻¹. Hence (10.1.23) holds for n = k + 1. Let x_{k+2} := T_{k+1}(0). By (10.1.22)–(10.1.26), x_{k+1} = T_{k+1}(J_{k+1}). Thus,

‖x_{k+2} − x_{k+1}‖ = ‖T_{k+1}(0) − T_{k+1}(J_{k+1})‖ ≤ g(x_{k+1})‖J_{k+1}‖
≤ g(x_{k+1})·(1/2)b‖x_{k+1} − x_k‖² = Q₀(‖x_{k+1} − x_k‖, ‖x_{k+1} − x₀‖).

This establishes (10.1.24) for n = k + 1. It follows from Lemma 10.1.8(f) that x_{k+2} ∈ U(x₀, t*) ⊂ D, thus (10.1.25) holds for n = k + 1. Now, by Lemmas 10.1.7, 10.1.8(i) and 10.1.9 we get:

‖J_{k+2}‖ = ‖F(x_{k+2}) − LF_{k+1}(x_{k+2})‖ ≤ (1/2)b‖x_{k+2} − x_{k+1}‖² ≤ (1/2)b(t_{k+2} − t_{k+1})² ≤ (1/2)bη² ≤ R.

Thus, (10.1.26) holds for n = k + 1, which completes the induction. By Lemma 10.1.8, the sequence {xₙ} converges to x* ∈ U(x₀, t*) and ‖x* − xₙ‖ ≤ (2ⁿbd)⁻¹(2a)^{2ⁿ}. It remains to show that x* is a solution of the generalized equation 0 ∈ F(x) + N_C(x). Let graph N_C denote the graph of N_C, that is, graph N_C = {(u, v) | v ∈ N_C(u)}. The assumptions on C imply that graph N_C is a closed subset of X × X. By construction,

(x_{n+1}, −LFₙ(x_{n+1})) ∈ graph N_C.
It follows from the continuity properties of F and F′ on D that (x_{n+1}, −LFₙ(x_{n+1})) converges to (x*, −F(x*)). Since graph N_C is closed, (x*, −F(x*)) belongs to graph N_C, which implies 0 ∈ F(x*) + N_C(x*).

Remark 10.1.1 Denote by {sₙ} the sequence {tₙ} when b₀ = b. Then our results reduce to the ones in [200]. Otherwise we have (see Chapter 5)

tₙ < sₙ,
t_{n+1} − tₙ < s_{n+1} − sₙ,
t* − tₙ ≤ s* − sₙ,
t* ≤ s*,
h < h_K = 2dbη,
r_J = 2/(3bd) < r₀,

which justify the claims made in the introduction.
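The comparison tₙ ≤ sₙ of Remark 10.1.1 is easy to observe numerically: running the recursion of Lemma 10.1.7(a) once with a center-Lipschitz constant b₀ < b and once with b₀ = b (the classical Kantorovich case) gives termwise-smaller iterates. The constants below are hypothetical, chosen so that both sequences converge.

```python
# Comparison of Remark 10.1.1: {t_n} with b0 < b versus the Kantorovich
# sequence {s_n} with b0 = b (d, b, eta are hypothetical constants).
d, b, eta = 1.0, 1.0, 0.4

def majorizing(b0, n=25):
    t = [0.0, eta]
    for k in range(1, n):
        t.append(t[k] + d * b * (t[k] - t[k-1])**2 / (2 * (1 - d * b0 * t[k])))
    return t

t = majorizing(b0=0.5)   # center-Lipschitz constant b0 < b
s = majorizing(b0=1.0)   # classical case b0 = b
assert all(ti <= si for ti, si in zip(t, s))
print(t[-1], s[-1])      # numerical limits: t* <= s*
```

The finer sequence converges to a strictly smaller limit, which is the source of the sharper error bounds claimed in the introduction.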
10.2
Quasi Methods
In this section we are concerned with the problem of approximating a strongly regular solution x∗ of generalized equation 0 ∈ F (x) + NC (x)
(10.2.1)
where F is a continuously Fréchet-differentiable operator defined on an open, convex, nonempty subset D of Rᵏ (k ≥ 1 an integer) with values in Rᵏ, and

N_C(x) = {z | ⟨z, q − x⟩ ≤ 0 for all q ∈ C} if x ∈ C,  N_C(x) = ∅ if x ∉ C.

We say a point x* ∈ D satisfies the generalized equation (10.2.1) if and only if x* ∈ C and ⟨F(x*), q − x*⟩ ≥ 0 for all q ∈ C. Many problems in Mathematical Programming/Economics and engineering can be formulated as generalized equations of the form (10.2.1). A survey on generalized equations and their solutions can be found in [68], [99], [298], [301], [302] and the references there. Newton's method is undoubtedly the most popular method for approximating x*.
However Newton's method involves the evaluation of F′ at each step, which is in general an expensive task. That is why in such cases we replace the linearization F(xₙ) + F′(xₙ)(• − xₙ) with an affine operator F(xₙ) + B(• − xₙ) whose "derivative" B approximates F′(xₙ) and is easier to compute. The Secant approximations are one such example, and more generally the popular so-called quasi-Newton methods [140], [201], [263], [298]. Newton's method for approximating x* iteratively solves 0 = F(xₙ) + F′(xₙ)(x − xₙ) to find a solution x_{n+1}, whereas a quasi-Newton method iteratively solves 0 = F(xₙ) + Bₙ(x − xₙ), where Bₙ approximates F′(xₙ) in a sense to be made precise later. A survey on the convergence of quasi-Newton methods can be found in [140], [201], [263], [298] and the references there. Here, motivated by the work in Chapter 5, we show that a larger convergence radius and finer error bounds on the distances involved can be obtained than before [201], [298], and under the same computational cost. This observation is important in computational mathematics since it allows a wider choice of initial guesses x₀ and, given an error tolerance ε > 0, finds the approximate solution in fewer computational steps. We need a corollary to Theorem 2.4 of Robinson [298]:

Corollary 10.2.1 Let F, F′, D, C, Rᵏ be as in the introduction. Assume equation (10.2.1) has a strongly regular solution x* ∈ D [201], [298], with associated Lipschitz constant d. Then there exist positive constants r, R, b and r₀ such that the following holds for a k × k matrix A and x ∈ Rᵏ:

U(x*, r) ∩ (F(x) + A·(• − x) + N_C)⁻¹

restricted to U(0, R) is single-valued and Lipschitz continuous with modulus d(1 − d‖A − F′(x*)‖)⁻¹, provided x ∈ U(x*, r₀) and ‖A − F′(x*)‖ < b.
We can state and prove the following local convergence result for quasi-Newton methods:

Theorem 10.2.2 Let D, C, F, F′, N_C be as in the introduction. Assume: x* ∈ D is a strongly regular solution of the generalized equation (10.2.1) with Lipschitz constant d; there exist positive constants ℓ₀ and ℓ such that

‖F′(x) − F′(y)‖ ≤ ℓ‖x − y‖    (10.2.2)

and

‖F′(x) − F′(x*)‖ ≤ ℓ₀‖x − x*‖    (10.2.3)

for all x, y ∈ D. Let N = L(Rᵏ, Rᵏ) be the space of linear operators from Rᵏ into Rᵏ. Let ‖·‖_M denote a matrix norm and let a > 0 be such that ‖·‖ ≤ a‖·‖_M,
where ‖·‖ is the matrix norm induced by a given vector norm on Rᵏ. Moreover assume: for (x, B) ∈ D × N with B̄ ∈ S(x, B), where S is an update operator and x̄ is the vector closest to x in the set (F(x) + B(• − x) + N_C)⁻¹(0) (assumed nonempty), there exist positive constants a₁ and a₂ such that

‖B̄ − F′(x*)‖_M ≤ [1 + a₁ max{‖x − x*‖, ‖x̄ − x*‖}]‖B − F′(x*)‖_M + a₂ max{‖x − x*‖, ‖x̄ − x*‖}.

Let b, r₀, r and R be as in Corollary 10.2.1. Furthermore assume, for a fixed c ∈ (0, 1) and with r₀, b reduced if necessary, that the following hold:

[((1/2)ℓ + ℓ₀)r₀ + b]r₀ < R,
d(1 − db)⁻¹[((1/2)ℓ + ℓ₀)r₀ + b] < c,
2a(a⁻¹ba₁ + a₂)r₀(1 − p)⁻¹ < b,
‖B₀ − F′(x*)‖_M < ⋯

10.3

Variational Inequalities under Weak Lipschitz Conditions

⋯ there exists ℓ > 0 such that for all x, y ∈ D

‖F′(x) − F′(y)‖ ≤ ℓ‖x − y‖.    (10.3.4)
Here we replace (10.3.4) by the weaker center-Lipschitz condition

‖F′(x) − F′(x₀)‖ ≤ ℓ₀‖x − x₀‖ for all x ∈ D.    (10.3.5)

Note that in general

ℓ₀ ≤ ℓ    (10.3.6)

holds and ℓ/ℓ₀ can be arbitrarily large. From the computational point of view it is easier to evaluate ℓ₀ than ℓ, unless ℓ₀ = ℓ. Therefore sufficient convergence conditions based on ℓ₀ rather than ℓ are weaker and can be used in cases not covered before. This is our motivation for providing a semilocal convergence analysis for methods (10.3.2) and (10.3.3) based on (10.3.5) rather than (10.3.4). Throughout this section we assume ϕ is a convex function. We can now state and prove the semilocal convergence result for the modified Newton method (10.3.2):
Theorem 10.3.1 Assume: there exist x₀ ∈ H, a > 0, b > 0, and r ≥ 0 such that for all y ∈ D

∂ϕ(x₀) ∋ 0,    (10.3.7)

‖F(x₀)‖ ≤ a,    (10.3.8)

⟨F′(x₀)y, y⟩ ≥ b‖y‖²,    (10.3.9)

and

2aℓ₀ < b²,    (10.3.10)

r − b/ℓ₀ ≤ r₀,    (10.3.11)

U(x₀, r) ⊆ D,    (10.3.12)

where

r₀ = √(b(b − 2aℓ₀))/ℓ₀.    (10.3.13)

Then the sequence {xₙ} (n ≥ 0) generated by the modified Newton method (10.3.2) is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to a unique solution x* of inequality (10.3.1) in U(x₀, r). Moreover the following estimates hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ c‖x_{n+1} − xₙ‖,    (10.3.14)

where c ∈ [0, 1) is given by

c = 1 − √(b² − 2aℓ₀)/b.    (10.3.15)
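The constants of Theorem 10.3.1 are straightforward to compute. In this sketch a, b, ℓ₀ are hypothetical data satisfying (10.3.10); we check that the contraction factor c of (10.3.15) lies in [0, 1), so the step lengths in (10.3.14) decay geometrically.

```python
import math

# Contraction constants of Theorem 10.3.1 for hypothetical a, b, l0.
a, b, l0 = 0.1, 1.0, 2.0
assert 2 * a * l0 < b**2                      # condition (10.3.10)

c = 1.0 - math.sqrt(b**2 - 2 * a * l0) / b    # contraction factor (10.3.15)
assert 0.0 <= c < 1.0

step = a / b                                   # first step length ||x_1 - x_0||
for _ in range(10):
    step *= c                                  # (10.3.14): each step shrinks by c
print(step)
```

The closer 2aℓ₀ is to b², the closer c is to 1 and the slower the guaranteed linear convergence, which is why (10.3.10) is the critical smallness condition.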
Proof. Our proof is based on the contraction mapping Theorem 3.1.4. Let x ∈ U(x₀, r) and define the operator g(x) by the variational inequality

F′(x₀)g(x) + ∂ϕ(g(x)) ∋ F′(x₀)x − F(x).    (10.3.16)

It follows by Theorem 3.5 in [221] and (10.3.9) that the operator g is well defined on U(x₀, r) (which is a closed set). The convexity of ϕ implies the monotonicity of ∂ϕ in the sense that

h₁ ∈ ∂ϕ(y₁), h₂ ∈ ∂ϕ(y₂) ⇒ ⟨h₁ − h₂, y₁ − y₂⟩ ≥ 0.    (10.3.17)

Using (10.3.7), (10.3.16) and (10.3.17) we get

⟨F(x) − F′(x₀)(x − g(x)), g(x) − x₀⟩ ≤ 0,
10.3. Variational Inequalities under Weak Lipschitz Conditions
395
which can be rewritten as

⟨F′(x₀)(g(x) − x₀), g(x) − x₀⟩ ≤ ⟨F′(x₀)(x − x₀) − F(x), g(x) − x₀⟩.    (10.3.18)

Moreover, by the choice of r, (10.3.5), (10.3.8), (10.3.9) and (10.3.18) we obtain in turn

‖g(x) − x₀‖ ≤ (1/b)‖F(x) − F′(x₀)(x − x₀)‖
= (1/b)‖F(x₀) + F(x) − F(x₀) − F′(x₀)(x − x₀)‖
≤ (1/b)∫₀¹ ‖F′(x₀ + t(x − x₀)) − F′(x₀)‖ ‖x − x₀‖ dt + (1/b)‖F(x₀)‖
≤ a/b + (ℓ₀/2b)‖x − x₀‖² ≤ a/b + (ℓ₀/2b)r² ≤ r    (10.3.19)

(since x₀ + t(x − x₀) ∈ U(x₀, r) for all t ∈ [0, 1]). It follows from (10.3.19) that the operator g maps the closed set U(x₀, r) into itself. Furthermore, for any x, y ∈ U(x₀, r), the monotonicity of ∂ϕ and (10.3.16) give

⟨F′(x₀)(g(x) − g(y)) + F(x) − F(y) − F′(x₀)(x − y), g(x) − g(y)⟩ ≤ 0,

which can be rewritten as

⟨F′(x₀)(g(x) − g(y)), g(x) − g(y)⟩ ≤ ⟨F′(x₀)(x − y) − F(x) + F(y), g(x) − g(y)⟩.    (10.3.20)

Using (10.3.5), (10.3.8), (10.3.9), (10.3.11) and (10.3.20) we get in turn

‖g(x) − g(y)‖ ≤ (1/b)‖F(x) − F(y) − F′(x₀)(x − y)‖
≤ (ℓ₀/b) max{‖x − x₀‖, ‖y − x₀‖} ‖x − y‖
≤ (ℓ₀r/b)‖x − y‖ ≤ c‖x − y‖,    (10.3.21)

which implies g is a contraction from the closed set U(x₀, r) into itself. The result now follows from the contraction mapping principle. Note also that by setting x_{n+1} = g(xₙ) and using (10.3.21) we obtain (10.3.14).

We can now state and prove the semilocal convergence result for Newton's method (10.3.3):

Theorem 10.3.2 Assume: there exist x₀ ∈ H, a > 0, b > 0 such that for all x, y ∈ D conditions (10.3.7), (10.3.8), (10.3.12) hold, and

⟨F′(x)y, y⟩ ≥ b‖y‖²;    (10.3.22)
the equation

G(p) = [ℓ₀/(2(1 − 2ℓ₀p/b)) + 1](a/b) − p = 0    (10.3.23)

has a unique positive zero r ∈ (0, b/(2ℓ₀)). Then the sequence {xₙ} (n ≥ 0) generated by Newton's method (10.3.3) is well defined, remains in U(x₀, r) for all n ≥ 0, and converges to a unique solution x* of inequality (10.3.1) in U(x₀, r). Moreover the following estimates hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ q‖x_{n+1} − xₙ‖ ≤ q^{n+1}‖x₁ − x₀‖,    (10.3.24)

and

‖xₙ − x*‖ ≤ (qⁿ/(1 − q))(a/b),    (10.3.25)

where

q = 2ℓ₀r/b.
Proof. We first note that by the coercivity condition (10.3.22) and Theorem 3.5 in [221], the solutions x_{n+1} of (10.3.3) exist for all n ≥ 0. By the monotonicity of ∂ϕ, (10.3.7) and (10.3.3) for n = 0, we get

⟨F(x₀) − F′(x₀)(x₀ − x₁), x₁ − x₀⟩ ≤ 0,

or, rewriting,

⟨F′(x₀)(x₁ − x₀), x₁ − x₀⟩ ≤ ⟨F(x₀), x₁ − x₀⟩.    (10.3.26)

It follows from (10.3.8), (10.3.22) and (10.3.26) that

‖x₁ − x₀‖ ≤ (1/b)‖F(x₀)‖ ≤ a/b.    (10.3.27)
By the monotonicity of ∂ϕ, (10.3.3) and the corresponding inclusion for n − 1, we get

⟨F′(xₙ)(x_{n+1} − xₙ), x_{n+1} − xₙ⟩ ≤ ⟨F(xₙ) − F(x_{n−1}) − F′(x_{n−1})(xₙ − x_{n−1}), x_{n+1} − xₙ⟩.    (10.3.28)

Using (10.3.5), (10.3.22) and (10.3.28), and assuming x₀, x₁, ..., xₙ ∈ U(x₀, r), we obtain in turn

‖x_{n+1} − xₙ‖ ≤ (1/b)‖F(xₙ) − F(x_{n−1}) − F′(x_{n−1})(xₙ − x_{n−1})‖
≤ (1/b)‖∫₀¹ [F′(x_{n−1} + t(xₙ − x_{n−1})) − F′(x₀)](xₙ − x_{n−1}) dt‖
+ (1/b)‖(F′(x_{n−1}) − F′(x₀))(xₙ − x_{n−1})‖    (10.3.29)
≤ (ℓ₀/b)[∫₀¹ ((1 − t)‖x_{n−1} − x₀‖ + t‖xₙ − x₀‖) dt + ‖x_{n−1} − x₀‖] ‖xₙ − x_{n−1}‖
≤ (ℓ₀/b)[(1/2)r + (1/2)r + r] ‖xₙ − x_{n−1}‖ = (2ℓ₀r/b)‖xₙ − x_{n−1}‖ = q‖xₙ − x_{n−1}‖    (n ≥ 2)    (10.3.30)

(or ≤ (aℓ₀/2b)‖x₁ − x₀‖ for n = 1), which shows (10.3.24) for all n ≥ 0. It follows by (10.3.30) that
‖x_{n+1} − x₀‖ ≤ ‖x_{n+1} − xₙ‖ + ‖xₙ − x_{n−1}‖ + ··· + ‖x₂ − x₁‖ + ‖x₁ − x₀‖
≤ (q^{n−1} + q^{n−2} + ··· + 1)‖x₂ − x₁‖ + ‖x₁ − x₀‖
= ((1 − qⁿ)/(1 − q))‖x₂ − x₁‖ + ‖x₁ − x₀‖ ≤ ‖x₂ − x₁‖/(1 − q) + ‖x₁ − x₀‖
≤ aℓ₀/(2b(1 − q)) + a/b = r    (10.3.31)

by the choice of r. That is, x_{n+1} ∈ U(x₀, r). Furthermore, for all m > 0, n > 0 we have

‖x_{n+m} − xₙ‖ ≤ ‖x_{n+m} − x_{n+m−1}‖ + ‖x_{n+m−1} − x_{n+m−2}‖ + ··· + ‖x_{n+1} − xₙ‖
≤ (q^{n+m−1} + ··· + qⁿ)‖x₁ − x₀‖ = qⁿ(1 + ··· + q^{m−2} + q^{m−1})‖x₁ − x₀‖
≤ ((1 − q^m)/(1 − q)) qⁿ (a/b).    (10.3.32)

It follows by (10.3.32) that the sequence {xₙ} is Cauchy in the complete space H and as such it converges to some x* ∈ U(x₀, r). By letting m → ∞ we obtain (10.3.25). We must show x* solves (10.3.1). Let y ∈ S(ϕ). Then by (10.3.3) and the lower semicontinuity of ϕ we get in turn

ϕ(x*) ≤ lim inf_{n→∞} ϕ(x_{n+1}) ≤ ϕ(y) + lim inf_{n→∞} ⟨F′(xₙ)(xₙ − x_{n+1}) − F(xₙ), x_{n+1} − y⟩
= ϕ(y) − ⟨F(x*), x* − y⟩.
By the definition of ∂ϕ we get −F (x∗ ) ∈ ∂ϕ(x∗ ), which implies x∗ solves (10.3.1). Finally to show uniqueness let y ∗ ∈ U (x0 , r) be a solution of (10.3.1). It follows by the monotonicity of ∂ϕ that hF (x∗ ) − F (y ∗ ), x∗ − y ∗ i ≤ 0, and by the mean value theorem hF ′ (tx∗ + (1 − t)y ∗ )(x∗ − y ∗ ), x∗ − y ∗ i ≤ 0
(10.3.33)
for all t ∈ [0, 1]. It then follows by (10.3.22) and (10.3.33) that bkx∗ − y ∗ k2 ≤ hF ′ (tx∗ + (1 − t)y ∗ )(x∗ − y ∗ ), x∗ − y ∗ i ≤ 0, which implies x∗ = y ∗ . Remark 10.3.1 (a) If equality holds in (10.3.6) then our Theorem 10.3.1 reduces to the corresponding Theorem 1 in [326, p. 432]. Otherwise our condition (10.3.10) is weaker than corresponding condition (10.3.13) in [326, p. 432], which simply reads 2aℓ < b2 .
(10.3.34)
Note that the corresponding error estimates are also smaller since ℓ₀ < ℓ.

(b) Sufficient conditions for finding zeros of G can be found in [68]. For example, one such set of conditions is given by

r ≥ (1 + ℓ₀)(a/b)    (10.3.35)

provided that

a ≤ b²/(4ℓ₀(1 + ℓ₀)).    (10.3.36)

(c) It follows by (10.3.25) and (10.3.27) that the scalar sequence {tₙ} (n ≥ 0) defined by

t₀ = 0, t₁ = a/b, t_{n+2} = t_{n+1} + (ℓ₀/b)[2tₙ + (1/2)(t_{n+1} − tₙ)](t_{n+1} − tₙ)

is a majorizing sequence for {xₙ} (n ≥ 0). Sufficient convergence conditions for sequences more general than {tₙ} have already been given in Section 5.1 and can replace condition (10.3.23) in Theorem 10.3.2. One such condition is: there exists θ ∈ [0, 1) such that for all n ≥ 1

(a/b)[(1 − θⁿ)/(1 − θ) + (1/4)θ^{n−1}] ≤ bθ/(2ℓ₀),    (10.3.37)

or the stronger but easier to verify

(a/b)[1/(1 − θ) + 1/4] ≤ bθ/(2ℓ₀).    (10.3.38)
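The majorizing sequence of Remark 10.3.1(c) can be tabulated directly. The constants a, b, ℓ₀ below are hypothetical; we check that {tₙ} is increasing and settles to a finite limit, as a majorizing sequence must.

```python
# Majorizing sequence of Remark 10.3.1(c) for hypothetical a, b, l0.
a, b, l0 = 0.05, 1.0, 1.0

t = [0.0, a / b]
for n in range(30):
    tn, tn1 = t[-2], t[-1]
    t.append(tn1 + (l0 / b) * (2 * tn + 0.5 * (tn1 - tn)) * (tn1 - tn))

assert all(t[k] <= t[k+1] for k in range(len(t) - 1))
print(round(t[-1], 6))   # numerical approximation of t*
```

Since each increment is multiplied by a factor proportional to the previous increment, the sequence converges rapidly for small a/b, mirroring the q-geometric estimate (10.3.24).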
In this case we have

‖x_{n+1} − xₙ‖ ≤ θ(t_{n+1} − tₙ),    (10.3.39)

‖xₙ − x*‖ ≤ t* − tₙ,    (10.3.40)

and t* = lim_{n→∞} tₙ, replacing the (in general) less precise estimates (10.3.24) and (10.3.25), respectively.

(d) A direct comparison between Theorem 10.3.2 and the corresponding Theorem 2 in [326, p. 434] cannot be made, since we use only (10.3.5) instead of the stronger (10.3.4). Note however that the corresponding condition in [326] may be violated while (10.3.36) holds, if

ℓ/(ℓ₀(1 + ℓ₀)) > 8,    (10.3.42)

which can happen since, as stated in the introduction, ℓ/ℓ₀ can be arbitrarily large.

(e) It turns out that condition (10.3.5) can be weakened even further, and replaced as follows. Assume the operator F′(x) is simply continuous at x₀ ∈ D. It follows that for all ε > 0 there exists δ > 0 such that, if x ∈ U(x₀, δ),

‖F′(x) − F′(x₀)‖ < ε.    (10.3.43)

Then we can have:

Case of Theorem 10.3.1. Replace hypotheses (10.3.10) and (10.3.11) by r = min{δ, r₁}, where

r₁ = a/(b − ε),

provided that b > ε.
Set c = ε/b. With the above changes the conclusions of Theorem 10.3.1 hold, as can easily be seen from the proof of this theorem (see in particular (10.3.19), where (10.3.43) is used instead of (10.3.5)). Similarly, in the:

Case of Theorem 10.3.2. Replace condition (10.3.23) by r = min{δ, r₂}, where

r₂ = (a/b)(1 + ε/(1 − q)),  q = 2ε/b,

provided that b > 2ε. Then again, with the above changes, the conclusions of Theorem 10.3.2 hold, as can easily be seen from the proof of this theorem (see in particular (10.3.29), where (10.3.43) is used instead of (10.3.5)).

(f) The results obtained here can be extended from a Hilbert to a Banach space setting along the lines of our work in Section 5.1. However, we leave the details of the minor modifications to the motivated reader.
10.4
Exercises
10.1. Consider the problem of approximating a locally unique solution of the variational inequality F (x) + ∂ϕ(x) ∋ 0,
(10.4.1)
where F is a Gˆ ateaux differentiable operator defined on a Hilbert space H with values in H; ϕ : H → (−∞, ∞] is a lower semicontinuous convex function.
We approximate solutions x* of (10.4.1) using the generalized Newton method in the form

F′(xₙ)(x_{n+1}) + ∂ϕ(x_{n+1}) ∋ F′(xₙ)(xₙ) − F(xₙ)    (10.4.2)

to generate a sequence {xₙ} (n ≥ 0) converging to x*. Define: the set D(ϕ) = {x ∈ H : ϕ(x) < ∞}, and assume D(ϕ) ≠ ∅; the subgradient ∂ϕ(x) = {z ∈ H : ϕ(x) − ϕ(y) ≤ ⟨z, x − y⟩, y ∈ D(ϕ)}; and the set D(∂ϕ) = {x ∈ D(ϕ) : ∂ϕ(x) ≠ ∅}.
The function ∂ϕ is multivalued and, for any λ > 0, (I + λ∂ϕ)⁻¹ exists (as a single-valued operator) and satisfies

‖(I + λ∂ϕ)⁻¹(x) − (I + λ∂ϕ)⁻¹(y)‖ ≤ ‖x − y‖    (x, y ∈ H).

Moreover ∂ϕ is monotone:

f₁ ∈ ∂ϕ(x₁), f₂ ∈ ∂ϕ(x₂) ⇒ ⟨f₁ − f₂, x₁ − x₂⟩ ≥ 0.

Furthermore, we want the closures of D(ϕ) and D(∂ϕ) to coincide, so that D(∂ϕ) is sufficient for our purposes. We present the following local result for variational inequalities and twice Gâteaux differentiable operators:

(a) Let F : H → H be a twice Gâteaux differentiable operator. Assume:
(1) the variational inequality (10.4.1) has a solution x*;
(2) there exist parameters a ≥ 0, b > 0, c > 0 such that

‖F″(x) − F″(y)‖ ≤ a‖x − y‖,  ‖F″(x*)‖ ≤ b,

and

c‖y − z‖² ≤ ⟨F′(x)(y − z), y − z⟩

for all x, y, z ∈ H;
(3) x₀ ∈ D(ϕ) and x₀ ∈ U(x*, r), where

r = 4c[b + √(b² + 16ac/3)]⁻¹.

Then show: the generalized Newton method (10.4.2) is well defined, remains in U(x*, r) and converges to x* with

‖xₙ − x*‖ ≤ p·d^{2ⁿ}    (n ≥ 0),

where

p⁻¹ = (1/c)[(1/3)a‖x* − x₀‖ + b/2] and d = p⁻¹‖x* − x₀‖.
(b) We will approximate x* using the generalized Newton method in the form

f″(xₙ)(x_{n+1}) + ∂ϕ(x_{n+1}) ∋ f″(xₙ)(xₙ) − ∇f(xₙ).    (10.4.3)

We present the following semilocal convergence result for variational inequalities involving twice Gâteaux differentiable operators. Let f : H → R be twice Gâteaux differentiable. Assume:

(1) for x₀ ∈ D(ϕ) there exist parameters α ≥ 0, β > 0, c > 0 such that

⟨(f‴(x) − f‴(x₀))(y, y), z⟩ ≤ α‖x − x₀‖ ‖y‖²‖z‖,  ‖f‴(x₀)‖ ≤ β,

and

c‖y − z‖² ≤ ⟨f″(x)(y − z), y − z⟩

for all x, y, z ∈ H;
(2) the first two terms of (10.4.3), x₀ and x₁, are such that for η ≥ ‖x₁ − x₀‖

η ≤ c[β + 2√(αc)]⁻¹ if β² − 4αc ≠ 0,  η ≤ c(2β)⁻¹ if β² − 4αc = 0.

Then show: the generalized Newton method (10.4.3) is well defined, remains in U(x₀, r₀) for all n ≥ 0, where r₀ is the small zero of the function δ, δ(r) = αηr² − (c − βη)r + cη, and converges to a unique solution x* of the inclusion ∇f(x) + ∂ϕ(x) ∋ 0. In particular x* ∈ Ū(x₀, r₀). Moreover the following error bounds hold for all n ≥ 0:

‖xₙ − x*‖ ≤ γd^{2ⁿ},

where γ⁻¹ = (αr₀ + β)/c and d = ηγ⁻¹.
10.2. Let M, ⟨·, ·⟩, ‖·‖ denote the dual, inner product and norm of a Hilbert space H, respectively. Let C be a closed convex set in H. Consider an operator a : H × H → R. If a is continuous, bilinear and satisfies
a(y, y) ≥ c0‖y‖², y ∈ H,  (10.4.4)
and
a(x, y) ≤ c1‖x‖ · ‖y‖, x, y ∈ H,  (10.4.5)
for some constants c0 > 0, c1 > 0, then a is called a coercive operator. Given z ∈ M, there exists a unique solution x ∈ C such that
a(x, x − y) ≥ ⟨z, x − y⟩, y ∈ C.  (10.4.6)
Inequality (10.4.6) is called a variational inequality. It is well known that the solution x∗ can be obtained by the iterative procedure
xn+1 = PC(xn − ρF(G(xn) − z)),  (10.4.7)
where PC is the projection of H onto C, ρ > 0 is a constant, and F is the canonical isomorphism from M onto H, defined by
⟨z, y⟩ = ⟨F(z), y⟩, y ∈ H, z ∈ M,  (10.4.8)
and
a(x, y) = ⟨G(x), y⟩, y ∈ H.  (10.4.9)
Given a point-to-set operator C from H into the subsets of H, we define the quasivariational inequality problem: find x ∈ C(x) such that
a(x, y − x) ≥ ⟨z, y − x⟩, y ∈ C(x).  (10.4.10)
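As a concrete illustration of the projected iteration (10.4.7), the sketch below runs it on a hypothetical one-dimensional instance: H = R, C = [0, +∞), a(x, y) = xy (so G is the identity and c0 = c1 = 1), and F the identity. The step ρ and the starting point are arbitrary illustrative choices, not prescribed by the text.

```python
# Sketch of the projected iteration (10.4.7) on a toy 1-D instance.
# All concrete choices (C, a, rho, x0) are illustrative assumptions.

def project_C(x):
    """Projection of H = R onto the closed convex set C = [0, +inf)."""
    return max(0.0, x)

def solve_vi(z, rho=0.5, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = P_C(x_n - rho*(G(x_n) - z)) from x_0 = 0."""
    x = 0.0
    for _ in range(max_iter):
        x_new = project_C(x - rho * (x - z))
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# For this toy problem the solution is x* = max(z, 0).
print(solve_vi(2.0))   # ≈ 2.0
print(solve_vi(-1.0))  # 0.0
```

Since the map x ↦ P_C(x − ρ(x − z)) is a contraction for 0 < ρ < 2 here, convergence is geometric, in line with the fixed-point argument behind (10.4.7).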
Here, we consider C(x) to be of the form
C(x) = f(x) + C,  (10.4.11)
where f is a point-to-point operator satisfying
‖f(x∗) − f(y)‖ ≤ c2‖x∗ − y‖^λ  (10.4.12)
for some constants c2 ≥ 0, λ ≥ 1, all y ∈ H, and x∗ a solution of (10.4.10). We will compute the approximate solution to (10.4.10).
(a) Show: for fixed z ∈ H, x ∈ C satisfies
⟨x − z, y − x⟩ ≥ 0, y ∈ C,  (10.4.13)
if and only if
x = PC(z),  (10.4.14)
where PC is the projection of H onto C.
(b) Show: PC given by (10.4.14) is non-expansive, that is,
‖PC(x) − PC(y)‖ ≤ ‖x − y‖, x, y ∈ H.  (10.4.15)
(c) Show: for C(x) given by (10.4.11), x ∈ C(x) satisfies (10.4.10) if and only if
x = f(x) + PC(x − ρF(G(x) − z) − f(x)).  (10.4.16)
Result (c) suggests the iterative procedure
xn+1 = f(xn) + PC(xn − ρF(G(xn) − z) − f(xn))  (10.4.17)
for approximating solutions of (10.4.10). Let us define the expression
θ = θ(λ, ρ) = 2c2‖x0 − x‖^(λ−1) + √(1 − 2c0ρ + ρ²c1²).
It is simple algebra to show that θ ∈ [0, 1) in the following cases:
(1) λ = 1, c2 ≤ 1/2, c0 ≥ 2c1√(c2(1 − c2)), and 0 < ρ < [c0 + √(c0² − 4c1²c2(1 − c2))]/c1²;
(2) λ > 1, c0 ≤ c1, and ‖x0 − x‖ < [(1/(2c2))(1 − √(1 − (c0/c1)²))]^(1/(λ−1)) = c3;
(3) λ > 1, c0 > c1, ‖x0 − x‖ ≤ (1/(2c2))^(1/(λ−1)) = c4, and 0 < ρ < [c0 + √(c0² − c1²(1 − q²))]/c1², where q = 1 − 2c2‖x0 − x‖^(λ−1).
Denote by H0, H1 the sets
H0 = {y ∈ H | ‖y − x∗‖ ≤ c3}, H1 = {y ∈ H | ‖y − x∗‖ ≤ c4}.
(d) Show: in each of the above cases, the sequence {xn} generated by (10.4.17) converges to x∗.
10.3. Let x0 ∈ D and R > 0 be such that D ≡ U(x0, R). Suppose that f is m-times Fréchet-differentiable on D, and its mth derivative f^(m) is in a certain sense uniformly continuous:
‖f^(m)(x) − f^(m)(x0)‖ ≤ w(‖x − x0‖) for all x ∈ D,  (10.4.18)
for some monotonically increasing positive function w satisfying
lim_{t→0} w(t) = 0,  (10.4.19)
or, even more generally, that
‖f^(m)(x) − f^(m)(x0)‖ ≤ w(r, ‖x − x0‖) for all x ∈ D̄, r ∈ (0, R),  (10.4.20)
for some positive function w, monotonically increasing in both variables, satisfying
lim_{t→0} w(r, t) = 0, r ∈ [0, R].  (10.4.21)
Let us define the function θ on [0, R] by
θ(r) = (1/(c + α)) [η + (α2/2!) r² + ··· + (αm/m!) r^m + ∫₀^r ∫₀^{v1} ··· ∫₀^{v_{m−2}} w(v_{m−1}) dv_{m−1} ··· dv1] − r  (10.4.22)
for some constants α, c, η, αi (i = 2, ..., m); the equation
θ(r) = 0;  (10.4.23)
and the scalar iteration {rn} (n ≥ 0) by
r0 = 0, rn+1 = rn − θ(rn)/θ′(rn).  (10.4.24)
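The scalar majorizing iteration (10.4.24) can be tried numerically. The sketch below uses the hypothetical choices c + α = 1, η = 0.2 and w(t) = t (with m = 2), so that θ(r) = 0.2 + r²/2 − r has smallest positive zero r∗ = 1 − √0.6; the Newton sequence {rn} then increases monotonically to r∗.

```python
# Minimal sketch of the majorizing iteration (10.4.24).
# theta(r) = eta + r**2/2 - r with eta = 0.2 is an illustrative choice.
import math

def theta(r):
    return 0.2 + 0.5 * r * r - r

def theta_prime(r):
    return r - 1.0

def majorizing_sequence(n_steps):
    """r_0 = 0, r_{n+1} = r_n - theta(r_n)/theta'(r_n)."""
    r = 0.0
    seq = [r]
    for _ in range(n_steps):
        r = r - theta(r) / theta_prime(r)
        seq.append(r)
    return seq

rs = majorizing_sequence(8)
r_star = 1.0 - math.sqrt(0.6)        # smallest zero of theta
assert all(rs[i] <= rs[i + 1] for i in range(len(rs) - 1))  # monotone increase
print(rs[-1], r_star)                 # both ≈ 0.22540
```

The monotone increase of {rn} toward r∗ is exactly the behavior the error bounds (10.4.26)-(10.4.27) below rely on.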
Let g be a maximal monotone operator, and suppose: (10.4.18) holds; there exist αi (i = 2, ..., m) such that
‖f^(i)(x0)‖ ≤ αi;  (10.4.25)
and equation (10.4.23) has a unique solution r∗ ∈ [0, R] with θ(R) ≤ 0.
Then show: the generalized Newton method {xn} (n ≥ 0) generated by
f′(xn)(xn+1) + g(xn+1) ∋ f′(xn)(xn) − f(xn)  (n ≥ 0, x0 ∈ D)
is well defined, remains in U(x0, r∗) for all n ≥ 0, and converges to a solution x∗ of the inclusion f(x) + g(x) ∋ 0. Moreover, the following estimates hold for all n ≥ 0:
‖xn+1 − xn‖ ≤ rn+1 − rn  (10.4.26)
and
‖xn − x∗‖ ≤ r∗ − rn, r∗ = lim_{n→∞} rn.  (10.4.27)
10.4. Let x0 ∈ D and R > 0 be such that D ≡ U(x0, R). Suppose that f is Fréchet-differentiable on D, and its derivative f′ is in a certain sense uniformly continuous as an operator from D into L(H, H), the space of bounded linear operators from H into H. In particular we assume:
‖f′(x) − f′(y)‖ ≤ w(‖x − y‖), x, y ∈ D,  (10.4.28)
for some monotonically increasing positive function w satisfying lim_{t→0} w(t) = 0,
or, even more generally, that kf ′ (x) − f ′ (y)k ≤ w(r, kx − yk),
¯ r ∈ (0, R), x, y ∈ D,
(10.4.29)
for some monotonically increasing in both variables positive function w satisfying lim w(r, t) = 0,
t→0
r ∈ [0, R].
Conditions of this type have been studied in the special cases w(t) = dtλ , w(r, t) = d(r)tλ , λ ∈ [0, 1] for regular equations; and for w(t) = dt for generalized equations of the form f (x)+g(x) ∋ x. The advantages of using (10.4.28) or (10.4.29) have been explained in great detail in the excellent paper [6]. It is useful to pass from the function w to w(r) ¯ = sup{w(t) + w(s) : t + s = r}. The function may be calculated explicitly in some cases. For example if λ w(r) = drλ (0 < λ ≤ 1), then w(r) ¯ = 21−λ dr . More generally, if w is a r ¯ is increasing, convex concave function on [0, R], then w(r) ¯ = 2w 2 , and w and w(r) ¯ ≥ w(r), r ∈ [0, R]. Let us define the functions θ, θ¯ on [0, R] by Z r h i η+ w(t)dt − r, for some α > 0, η > 0, c > 0 Z0 r h i ¯ = 1 η+ θ(r) w(t)dt ¯ −r c+α
θ(r) =
1 c+α
0
and the equations θ(r) = 0, ¯ = 0. θ(r) Let g be a maximal monotone operator satisfying
there exists c > −α such that f ′ (z)(x), x ≥ ckxk2 ,
for all x ∈ H, z ∈ D,
10.4. Exercises
407
¯ and suppose: (10.4.28) holds and equation θ(r) = 0 has a unique solution ∗ r ∈ [0, R]. Then, show: the generalized Newton method {xn } (n ≥ 0) generated by f ′ (xn )xn+1 + g(xn+1 ) ∋ f ′ (xn )(xn ) − f (xn )
(n ≥ 0), (x0 ∈ D)
is well defined, remains in U (x0 , r∗ ) for all n ≥ 0, and converges to a solution x∗ of f (x) + g(x) ∋ x. Moreover, the following estimates hold for all n ≥ 0: kxn+1 − xn k ≤ rn+1 − rn kxn − x∗ k ≤ r∗ − rn , where, r0 = 0, rn+1 = rn − and lim rn = r∗ .
n→∞
¯ n) θ(r , θ′ (rn )
This page intentionally left blank
Chapter 11
Monotone convergence of point to set-mapping In this chapter we discuss the monotonicity of the iteration sequence generated by the algorithmic model xk+1 ∈ fk (x0 , x1 , ..., xk )
(k ≥ l − 1).
(11.0.1)
Here, we assume again that for k ≥ l − 1, fk : X k+1 → 2X , where X is a subset of a partially ordered space L, and let ≤ denote the order in L, (see also Chapter 2, [97]–[101])
11.1
General Theorems
Our main results will be based on a certain type of isotony of the algorithmic mappings. Definition 11.1.1 The sequence of point-to-set mapping gk : X k+1 → 2L is called increasingly isotone from the left on X, if for any sequence {x(k) } such that x(k) ∈ X (k ≥ 0) and x(0) ≤ x(1) ≤ x(2) ≤ ..., y1 ≤ y2 for all y1 ∈ gk (x(0) , x(1) , ..., x(k) ), y2 ∈ gk+1 (x(0) , x(1) , ..., x(k+1) ) and k ≥ l − 1. Consider the special case of single variable mappings gk . The above properties are now reduced to the following. Assume that x(0) , x(1) ∈ X, x(0) ≤ x(1) , y1 ∈ gk (x(0) ), and y2 ∈ gk+1 (x(1) ), then y1 ≤ y2 . Similarly, the sequence of point-to-set mappings gk : X k+1 → 2L is called increasingly isotone from the right on X, if for any sequence {x(k) } such that x(k) ∈ X (k ≥ 0) and x(0) ≤ x(1) ≤ x(2) ≤ ..., y1 ≥ y2 for all y1 ∈ gk (x(0) , x(1) , ..., x(k) ), y2 ∈ gk+1 (x(0) , x(1) , ..., x(k+1) ) and k ≥ l − 1. Our main result is the following: 409
410
11. Monotone convergence of point to set-mapping
Theorem 11.1.2 Assume that the sequence of mappings fk is increasingly isotone from the left; futhermore, x0 ≤ x1 ≤ ... ≤ xl−1 ≤ xl (xk ∈ X, 0 ≤ k ≤ l − 1). Then for all k ≥ 0, xk+1 ≥ xk . Proof. By induction, assume that for all indices i (i < k), xi+1 ≥ xi . The relations xk ∈ fk−1 (x0 ≤ x1 ≤ ... ≤ xk−1 )
and
xk+1 ∈ fk (x0 ≤ x1 ≤ ... ≤ xk )
and Definition 11.1.1 imply that xk+1 ≥ xk . Because this inequality holds for k ≤ l − 1, the proof is complete. Consider next the modified algorithmic model yk+1 ∈ fk (yk , yk−1 , ..., y1 , y0 ),
(11.1.1)
which differs from process (11.0.1) only in the order of the variables in the righthand side. Using finite induction again similarly to Theorem 11.1.2, one may easily prove the following theorem. Theorem 11.1.3 Assume that the sequence of mappings fk is increasingly isotone from the right; furthermore, y0 ≥ y1 ≥ ... ≥ yl−1 ≥ yl (yk ∈ X, 0 ≤ k ≤ l − 1). The for all k ≥ 0, yk+1 ≤ yk . Corollary 11.1.4 Assume that X ⊆ Rn ; furthermore, xk → s∗ and yk → s∗ as k → ∞. It is also assumed that ” ≤ ” is the usual partial order of vectors; i.e., a = (a(i) ) ≤ b = (b(i) ) if and only if for all i, a(i) ≤ b(i) . Under the conditions of Theorems 11.1.2 and 11.1.3, xk ≤ s∗ ≤ yk . This relation is very useful in constructing an error estimator and a stopping rule for methods (11.0.1) and (11.1.1), since for all coordinates of vectors xk , yk and s∗ , (i)
(i)
(i)
(i)
(i)
0 ≤ s∗(i) − xk ≤ yk − xk and (i)
0 ≤ yk − s∗(i) ≤ yk − xk . Hence, if each element of yk − xk is less than an error tolerance e, then all elements of xk and yk are closer than e to the corresponding elements of the limit vector.
11.2
Convergence in linear spaces
Assume now that X is a subset of a partially ordered linear space L. In addition, the partial order satisfies the following conditions:
11.2. Convergence in linear spaces
411
(F1). x ≤ y (x, y ∈ L) implies that −x ≥ y; (F2). If x(1) , x(2) , y (1) , y (2) ∈ L, x(1) ≤ y (1) and x(2) ≤ y (2) , then x(1) + x(2) ≤ y (1) + y (2) . (p)
(p)
Assume that for k ≥ l − 1 and p = 1, 2, mapping Kk , Hk are point-to-set (p) mapping defined on X k+1 , and for all (x(1) , x(2) , ..., x(k+1) ) ∈ X k+1 , Kk (x(1) , x(2) (p) (p) , ..., x(k+1) ), and Hk (x(1) , x(2) , ..., x(k+1) ) are nonempty subsets of L; i.e., Kk , (p) and Hk : X k+1 → 2L . Consider next the algorithmic model: (1)
(1)
xk+1 = tk+1 − sk+1
(2)
(2)
yk+1 = tk+1 − sk+1 ,
and
(11.2.1)
where (1)
(1)
(2)
(2)
(1)
(1)
tk+1 ∈ Kk (x0 , x1 , ..., xk ), sk+1 ∈ Hk (yk , yk−1 , ..., y0 ) and (1)
(2)
tk+1 ∈ Kk (yk , yk−1 , ..., y0 ), sk+1 ∈ Hk (x0 , x1 , ..., xk ). In the formulation of our next theorem we will need what follows. Definition 11.2.1 A point-to-set mapping g : X k+1 → 2L is called increasingly isotone on X, if for all x(1) ≤ x(2) ≤ ... ≤ x(k+2) (x(i) ∈ X, i = 1, 2, ..., k + 2), relations y1 ∈ g(x(1) , x(2) , ..., x(k+1) ) and y2 ∈ g(x(2) , x(3) , ..., x(k+2) imply that y1 ≤ y2 . Increasingly isotone functions have the following property. Lemma 11.2.2 Assume that mapping g : X k+1 → 2L is increasingly isotone. Then, for all x(i) and y (i) ∈ X (i = 1, 2, ..., k + 1) such that x(1) ≤ x(2) ≤ ..., x(k+1) ≤ y (1) ≤ y (2) ≤ ... ≤ y (k+1) and for any x ∈ g(x(1) , x(2) , ..., x(k+1) ) and y ∈ g(y (1) , y (2) , ..., y (k+1) ), x ≤ y. Proof. Let yi ∈ g(x(i+1) , ..., x(k+1) , y (1) , ..., y (i) ) be arbitrary for i = 1, 2, ..., k. Then x ≤ y1 ≤ y2 ≤ ... ≤ yk ≤ y, which completes the proof. Remark 11.2.1 In the literature a point-to-set mapping g : X k+1 → 2L is called isotone on X, if for all x(i) and y (i) such that x(i) ≤ y (i) (i = 1, 2, ..., k + 1) and for all x ∈ g(x(1) , ..., x(k+1) ) and y ∈ g(y (1) , ..., y (k+1) ), x ≤ y. It is obvious that an isotone mapping is increasingly isotone, but the reverse is not necessarily true as the following example illustrates.
412
11. Monotone convergence of point to set-mapping
Example 11.2.1 Define L = R1 , X = [0, 1], k = 1, ≤ to be the usual order of real numbers, and (2) if x(1) ≥ 2x(2) − 1 x , g(x(1) , x(2) ) = (1) (2) x − x + 1, if x(1) < 2x(2) − 1. We will now verify that g is increasingly isotone, but not isotone on X. Select arbitrary x(1) ≤ x(2) ≤ x(3) from the unit interval. Note first that g(x(1) , x(2) ) ≤ x(2) , since if x(1) ≥ 2x(2) − 1, then g(x(1) , x(2) ) = x(2) , and if x(1) < 2x(2) − 1, then g(x(1) , x(2) ) = x(1) − x(2) + 1 < 2x(2) − 1 − x(2) + 1 = x(2) . Note next that g(x(2) , x(3) ) ≥ x(2) , since if x(2) ≥ 2x(3) − 1, then g(x(2) , x(3) ) = x(3) ≥ x(2) , and if x(2) < 2x(3) − 1, then g(x(2) , x(3) ) = x(2) − x(3) + 1 ≥ x(2) . Consequently, g(x(1) , x(2) ) ≤ x(2) ≤ g(x(2) , x(3) ). Hence, g is increasingly isotone; however, it is not isotone on X, since for any points (t, 1) and (t, 1 − e) (where t, e > 0 and t + 2e < 1), g(t, 1) = t − 1 + 1 = t < g(t, 1 − e) = t − (1 − e) + 1 = t + e, but 1˙ > 1 − e. Consider now the following assumptions: (1)
(2)
(G1 ) Sequences of mapping Kk and Hk are increasingly isotone from the left. (1) (2) Futhermore, Kk and Hk are increasingly isotone from the right. (G2 ) For all k and t(0) ≤ t(1) ≤ ... ≤ t(k) ≤ t¯(k) ≤ ... ≤ t¯(1) ≤ t¯(0) (t(i) and t¯(i) ∈ X, i = 0, 1, ..., k), (2) Kk (t¯(k) , ..., t¯(0) )
and (1)
(2)
Hk (t¯(k) , ..., t¯(0) ) Hk (t(0) , t(1) , ..., t(k) ). (G3 ) The initial points are selected so that x0 ≤ x1 ≤ ... ≤ xl−1 ≤ xl ≤ yl ≤ yl−1 ≤ ... ≤ y1 ≤ y0 ; (G4 ) x0 , y0 = {x | x ∈ L, x0 ≤ x ≤ y0 } ⊆ X. Theorem 11.2.3 Under assumtions (F1 ), (F2 ) and (G1 ) through (G4 ), xk and yk are in X for all k ≥ 0. Furthermore, xk ≤ xk+1 ≤ yk+1 ≤ yk .
11.2. Convergence in linear spaces
413
Proof. By induction, assume that y0 xi+1 ≥ xi and x0 yi+1 ≤ yi for all i < k. Therefore, points xi and yi (i = 0, 1, ..., k) are in X. Note next that assumption (G1 ) implies that (1)
(1)
(2)
(2)
(1)
(1)
(2)
(2)
tk+1 ≥ tk , tk+1 ≤ tk , sk+1 ≤ sk and sk+1 ≥ sk , and hence, (1)
(1)
(1)
(1)
(2)
(2)
(2)
(2)
xk+1 = tk+1 − sk+1 ≥ tk − sk = xk and yk+1 = tk+1 − sk+1 ≤ tk − sk = yk . Assume next that xi ≤ yi (i ≤ k). Then, assumption (G2 ) implies that (1)
(2)
(1)
(2)
tk+1 ≤ tk+1 and sk+1 ≥ sk+1 . Therefore, (1)
(1)
(2)
(2)
xk+1 = tk+1 − sk+1 ≤ tk+1 − sk+1 = yk+1 . Thus, the proof is complete. Introduce next the point-to-set mappings (p)
(p)
(p)
fk (x(1) , ..., x(k+1) ) = {t−s | t ∈ Kk (x(1) , ..., x(k+1) ), s ∈ Hk (x(1) , ..., x(k+1) )}. (1)
(2)
Assume that z is a common fixed point of mappings fk and fk ; i.e., for p = 1, 2, (p) (1) (2) (1) (2) z ∈ fk (z, z, ..., z), furthermore (G5 ). Mappings Kk Kk Hk Hk are increasingly isotone. Theorem 11.2.4 Assume that x1 ≤ z ≤ y1 . Then, under assumptions (F1 ), (F2 ) and (G1 ) through (G5 ), xk ≤ z ≤ yk for all k ≥ 0. Proof. By induction, assume that xi ≤ z ≤ yi , for i ≤ k. Then, for p = 1, 2 and (p) (p) for all t(p) ∈ Kk (z, z, ..., z) and s(p) ∈ Hk (z, z, ..., z), (1)
(2)
(1)
(2)
tk+1 ≤ t(1) , t(2) ≤ tk+1 and sk+1 ≥ s(1) , s(2) ≥ sk+1 . Therefore, (1)
(1)
(2)
(2)
xk+1 = tk+1 − sk+1 ≤ t(p) − s(p) ≤ tk+1 − sk+1 = yk+1, and since z = t − s with some t and s, the proof is completed.
(p = 1, 2)
414
11. Monotone convergence of point to set-mapping
Remark 11.2.2 Two important special cases of algorithm (11.2.1) can be given: (1)
(2)
(1)
(2)
(1)
1 If Kk = Kk = Kk and Hk = Hk = Hk , then fk is a common fixed point of the mapping fk .
(2)
= fk
= fk , and so z
2 Assume next that Kk and Hk are l−variable functions. Then one may select (1)
Kk (t(0) , ..., t(k) ) = Kk (t(k−l+1) , ..., t(k) ), (2)
Kk (t(0) , ..., t(k) ) = Kk (t(0) , ..., t(l−1) ), (1)
Hk (t(0) , ..., t(k) ) = Hk (t(0) , ..., t(l−1) ) and (2)
Hk (t(0) , ..., t(k) ) = Hk (t(k−l+1) , ..., t(k) ). Note that the resulting process for l = 1 was discussed in Ortega and Rheinboldt ([263], Section 13.2). Corollary 11.2.5 Assume that in addition L is a Hausdorff topological space which satisfies the first axiom of countabi;ity, and (F3 ) If xk ∈ L (k ≥ 0) such that xk ≤ x with some x ∈ L and xk ≤ xk+1 for all k ≥ 0, then the sequence {xk } converges to a limit point x∗ ∈ L; furthermore, x∗ ≤ x.
(F4 ) If xk ∈ L (k 0) and xk → x, then −xk → −x.
Note first that assumptions (F1 ) through (F4 ) imply that any sequence {yk } with the properties yk ∈ L, yk ≥ yk+1, and yk ≥ y (k ≥ 0) is convergent to a limit point y ∗ ∈ L; furthermore, y ∗ ≥ y. Since under the conditions of the theorem {xk } is increasing with an upper bound y0 and {yk } decreases with a lower bound x0 , both sequences are convergent. If x∗ and y ∗ denote the limit points and z is a common fixed point of mappings fk such that xl ≤ z ≤ yl then x∗ ≤ z ≤ y ∗ . Consider the special l−step algorithm given as special case 2 above. Assume that for all k ≥ l − 1, mappings Kk and Hk are closed. Then, x∗ is the smallest fixed point and y ∗ is the largest fixed point in xl , yl . As a further special case we mention the following lemma. Lemma 11.2.6 Let B be a partially ordered topological space and let x. y be two points of B such that x ≤ y. If f : x, y → B is a continuous isotone mappings having the property that x ≤ f (x) and y ≥ f (y), then a point z ∈ x, y exists such that z = f (z). Proof. Select X = x, y , l = 1, Hk = 0 and Kk = f, and consider the iteration sequences {xk } and {yk } starting from x0 = x and y0 = y. Then, both sequences are convergent, and the limits are fixed points. Remark 11.2.3 This result is known as Theorem 2.2.2, (see, Chapter 2).
11.3. Applications
11.3
415
Applications
Consider the simple iteration method xk+! = f (xk ),
(11.3.1)
where f : [a, b] → R with the additional properties: f (a) > a, f (b) < b, 0 ≤ f (x) ≤ q < 1 for all × ∈ [a, b], where q is a given constant. Consider the iteration sequences xk+1 = f (xk ),
x0 = a
yk+1 = f (yk ),
y0 = b
and (11.3.2)
Since f is increasingly, Theorems 11.1.2 and 11.1.3 apply to both sequences {xk } and {yk } that converge to the unique fixed point s∗ ; furthermore, sequence {xk } is increasing and {yk } decreases. This situation is illustrated in Figure 11.3.1. Next we drop the assumption f (x) q < 1, but retain all other conditions. In other words, f is continuous, increasing in [a, b], f (a) > a
f (b) < b.
Notice that Theorems 11.1.2 and 11.1.3 still can be applied for the iteration sequence (11.3.2). However, the uniqueness of the fixed point is no longer true. This situation is shown in Figure 11.3.2. Becausef is continuous, there is at least one fixed point in the interval (a, b). (1) (2) Alternatively, Theorems 11.2.3 and 11.2.4 may also be applied with Kk = Kk ≡ (1) (2) f and Hk = Hk = 0, which has the following additional consequence. Let s∗ and s∗∗ denote the smallest and the largest fixed points between a and b. Because a < s∗ and b > s∗∗ , f (a) f (s∗ ) = s∗ and f (b) f (s∗∗ ) = s∗∗ , thus, s∗ and s∗∗ are between x1 and y1 . Therefore, for all k 1, xk s∗ s∗∗ yk ; futhermore, xk → s∗ and yk → s∗∗ . If the fixed point is unique, the uniqueness of the fixed point can be estabilished by showing that sequences {xk } and {yk } have the same limit. Finally, we mention that the above facts can be easily extended to the more general case when f : Rn → Rn . Example 11.3.1 We now illustrate procedure (11.3.2) in the case of the single variable nonlinear equation x=
1 x e + 1. 10
416
11. Monotone convergence of point to set-mapping y=x
f(x)
y=f(x)
x
x0
1
x2 s
∗
y2
y
y1
0
Figure 11.1: Monotone fixed point iterations.
y=x
f(x)
y=f(x)
x1 s∗
x0
∗∗
s
y1
y0
Figure 11.2: The case of multiple fixed points
It is easy to see that 1 1 e +1>1 10
and
1 2 e + 1 < 2. 10
Further, d dx
1 x e +1 10
=
1 x e ∈ (0, 0.75). 10
11.4. Exercises
417
Therefore, all conditions are satisfied. Select x0 = 1 and y0 = 2. Then, x1 = x2 = x3 = x4 = x3 =
1 1 1 2 e + 1 ≈ 1.27183; e + 1 ≈ 1.73891; y1 = 10 10 1 1.27183 1 1.73891 e e + 1 ≈ 1.35674; y2 = + 1 ≈ 1.56911; 10 10 1 1.35674 1 1.56911 e e + 1 ≈ 1.38835; y3 = + 1 ≈ 1.48024; 10 10 1 1.38835 1 1.48024 e e + 1 ≈ 1.40082; y4 = + 1 ≈ 1.43940; 10 10 1 1.40082 1 1.43940 e e + 1 ≈ 1.40585; y5 = + 1 ≈ 1.42182, 10 10
and so on. The monotonicity of both iteration sequences, as expected, is clearly visible.
11.4
Exercises
1. Consider the problem of approximating a locally unique zero s∗ of the equation g(x) = 0, where g is a given real function defined on a closed interval [a, b]. The Newton iterates {xn }, {yn }, n ≥ 0 are defined as g(xn ) + g (xn )(xn+1 − xn ) = 0
x0 = a
g(yn ) + g (yn )(yn+1 − yn ) = 0
y0 = b.
and
Use Theorems 11.1.2 and 11.1.3 to find sufficient conditions for the monotone convergence of the above sequences to s∗ ∈ [a, b]. 2. Examine the above problem, but using the secant iterations {xn }, {yn }, n ≥ −1 given by g(xn ) − g(xn−1 ) (xn+1 − xn ) = 0 xn − xn−1 g(yn ) − g(yn−1 ) g(yn ) + (yn+1 − yn ) = 0 yn − yn−1
g(xn ) +
x1 = a, x0 = given y1 = b, y0 = given.
3. Examine the above problem, but using the Newton-like iteration, where {xn }, {yn }, n ≥ 0 are given by g(xn ) + h(xn )(xn+1 − xn ) = 0
x0 = a
418
11. Monotone convergence of point to set-mapping and g(yn ) + h(yn )(yn+1 − yn ) = 0
y0 = b
with a suitably chosen function h. 4. Examine the above problem, but using the Newton-like iteration, where {xn }, {yn }, n ≥ 0 are given by g(xn ) + cn (xn+1 − xn ) = 0
x0 = a
g(yn ) + dn (yn+1 − yn ) = 0
y0 = b
and
with a suitably chosen real sequences {cn }, {dn }. 5. Generalize Problems 1 to 4 to Rn . 6. Generalize Problems 1 to 4 to Rn by choosing g to be defined on a given subset of a partially ordered topological space. (For a definition, see Chapter 2). 7. Compare the answers given in Problem 1 to 6 with the results of Chapter 2. 8. To find a zero s∗ for g(x) = 0, where g is a real function defined on [a, b], rewrite the equation as x = x + cg(x) = f (x) for some constant c = 0. Consider the sequences xn+1 = f (xn )
x0 = a
yn+1 = f (yn )
y0 = b.
and
Use Theorems 11.1.2 and 11.1.3 to find sufficient conditions for the monotone convergence of the above sequences to a locally unique fixed point s∗ ∈ [a, b] ot the equation x = f (x).
Chapter 12
Fixed points of point-to-set mappings In this chapter special monotonicity concepts are used to compare iteration sequences and common fixed points of point-to set maps. The results of this chapter are very useful in comparing the solutions of linear and nonlinear algebraic equations as well as the solutions of differential and integral equations (see also Chapter 3).
12.1
General theorems
Assume again that X is a subset of a partially ordered space L. Let P denote a partially ordered parameter set, and for all p ∈ P and k ≥ l − 1, let the point-to-set (p) mappings fk : X k+1 → 2X . For each p ∈ P, define the iteration algorithm (p)
(p)
(p)
(p)
(p)
xk+1 ∈ fk (x0 , x1 , ..., xk ).
(12.1.1)
Introduce the following conditions: (H1 ) If p1 , p2 ∈ P such that p1 ≤ p2 , then for all x(i) ∈ X (i ≥ 0) and yj ∈ (p) (p) (p) (k) fk (x0 , x1 , ..., xk ) (j = 1, 2), y1 ≤ y2 ; (H2 ) If p ∈ P and t(i) , s(i) ∈ X such that t(i) ≤ s(i) (i ≥ 0) then yt ≤ ys for all (p) (p) k ≥ l − 1, where yt ∈ fk (t(0) , t(1) , ..., t(k) ) and ys ∈ fk (s(0) , t(1) , ..., s(k) ) are arbitrary; (p )
(p )
(H3 ) For k = 0, 1, ..., l − 1 and all p1 , p2 ∈ P such that p1 ≤ p2 , xk 1 ≤ xk 2 . (p)
Assumption (H1 ) shows that for fixed k, mappings fk are increasing in p, and (p) assumption (H2 ) means that all mappings fk are isotone. Condition (H3 ) requires that the initial points are selected so that they are increasing in p. 419
420
12. Fixed points of point-to-set mappings
Our first result shows that under the above assumptions the entire iteration sequence (12.1.1) is increasing in p. Theorem 12.1.1 Assume that conditions (H1 ) through (H3 ) hold. Then p1 , p2 ∈ (p ) (p ) P and p1 ≤ p2 imply that for all k ≥ 0, xk 1 ≤ xk 2 . (p )
(p )
Proof. By induction, assume that xi 1 ≤ xi 2 , (i < k). Then condition (H2 ) (p ) (p1 ) (p2 ) (p2 ) implies that xk 1 ≤ y for all y ∈ fk−1 (x0 , ..., xk−1 ), and from assumption (H1 ) (p )
(p )
(p )
we conclude that y ≤ xk 2 . Thus, xk 1 ≤ xk 2 , which completes the proof.
Remark 12.1.1 The proof of the theorem remains valid in the slightly more general case, when it is assumed only that condition (H2 ) holds either for p1 or for p2 . Assume next that L is a metric space. Furthermore, (H4 ) If tk , sk ∈ X such that tk ≤ sk (k ≥ 0) and tk → t∗ , sk → s∗ as k → ∞ with some t∗ , s∗ ∈ L, then t∗ ≤ s∗ ; (p)
(H5 ) For all p ∈ P, iteration sequence xk converges to a limit x(p) ∈ L. ∗
Our next result compares the limit points x(p) . ∗
∗
Theorem 12.1.2 Under assumptions (H1 ) through (H5 ), x(p1 ) ≤ x(p2 ) if p1 ≤ p2 . (p )
(p )
Proof. From Theorem 12.1.1 we know that xk 1 xk 2 for all k ≥ 0, and therefore condition (H4 ) implies the assertion. Corollary 12.1.3 Assume that for all p ∈ P, algorithm (12.1.1) is an l−step pro∗ (p) cess, mappings fk are also closed. Then, x(p) is a common fixed point of mappings (p) fk (k ≥ l − 1), and it is increasing in p. As the conclusion of this section a different class of comparison theorems is (1) (2) introduced. Consider the iteration sequences xk , xk ∈ X (k ≥ 0), and assume that (1)
(I1 ) xi
(2)
≤ xi
for i = 0, 1, ..., l − 1;
(I2 ) For all k ≥ l there exist mappings φk : X k+1 → 2L such that • φk (x(0) , x(1) , ..., x(k) ) is decreasing in variables x(i) (0 ≤ i < k), and is strictly increasing in x(k) ; (1)
(1)
(1)
(2)
(2)
• y (1) ≤ y (2) ∀k ≥ l, y (1) ∈ φk (x0 , x1 , ..., xk ) and y (2) ∈ φk (x0 , x1 , ..., (2) xk ).
12.2. Fixed points
421 (1)
(2)
Theorem 12.1.4 Under conditions (I1 ) and (I2 ), xk ≤ xk (1)
for all k ≥ 0.
(2)
(1)
(2)
Proof. By induction, assume that xi ≤ xi for i < k, but xk > xk . Let (2) (2) (1) (1) yi ∈ φk (x0 , ..., xi−1 , xi , ..., xk ) be arbitrary for i = 0, 1, ..., k, and select any (2)
(2)
(2)
yk+1 ∈ φk (x0 , x1 , ..., xk ). Then the monotonicity of φk implies y0 ≥ y1 ≥ ... ≥ yk ≥ yk+1 ,
(1)
(2)
which contradicts the second bulleted section of assumption (I2 ). Hence, xk ≤ xk , (1) (2) and since xi ≤ xi for i = 0, 1, ..., l − 1, the proof is complete. Remark 12.1.2 Note that this theorem generalizes the main result of Redheffer and Walter [291].
12.2
Fixed points
Consider once again the general l−step algorithm xk+1 ∈ fk (xk+1−l , xk+2−l , ..., xk ),
(12.2.1)
and assume that for k ≥ l − 1, fk : X l → 2X , where X is a subset of a partially ordered metric space L. Assume that (J1 ) The algorithm converges to an s∗ ∈ L with arbitrary initial points x0 , x1 , ..., xl−1 ∈ X; (J2 ) For all k ≥ l − 1, mappings fk is isotone; (J3 ) If xk , x ∈ X, x ≤ xk (k ≥ 0) such that xk → x∗ (x∗ ∈ L) as k → ∞, then x ≤ x∗ . The main result of this section can be formulated as the following theorem. Theorem 12.2.1 Assume that conditions (J1 ) through (J3 ) hold. Furthermore, for an x ∈ X, x ≤ y with all k ≥ l − 1 and y ∈ fk (x, x, ..., x). Then, x ≤ s∗ . Proof. Consider the iteration sequence starting with initial vectors xi = x (i = 0, 1, ..., l − 1). First we show that x ≤ xk for all k ≥ 0. By induction, assume that x ≤ xi (i < k). Since xk ∈ fk−1 (xk−l , ..., xk−2 , xk−1 ), assumption (J2 ) implies that for all y ∈ fk−1 (x, x, ..., x), y ≤ xk . Then, from the assumption of the theorem we conclude that x ≤ xk . Because xk → s∗ , condition (J3 ) implies the assertion. Remark 12.2.1 The assertion of the theorem remains true if (J2 ) is weakened as follows. If x, x(i) ∈ X such that x ≤ x(i) (i = 1, 2, ..., l), then for all k ≥ l and y ∈ fk (x, x, ..., x) and y ∗ ∈ fk (x(1) , x(2) , ..., x(l) ), y ≤ y ∗ . The theorem can be easiy extended to the general algorithm (12.1.1). The details are omitted.
422
12.3
12. Fixed points of point-to-set mappings
Applications
Case Study 1 Consider the first order difference equations xk+1 = g(xk )
(12.3.1)
zk+1 = h(zk ), where g and h : D → D, with D being a subset of Rn . Assume that for all x ∈ D, g(x) h(x), and x, x ¯ ∈ D, x x ¯ imply that g(x) g(¯ x) and h(x) h(¯ x). Then Theorem 12.1.1 implies that if the initial terms x0 and z0 are selected in such a way that x0 z0 , then for the entire iteration sequence, xk zk (k ≥ 0). If xk → x∗ and zk → z∗ as k → ∞, then x∗ ≤ z∗ . This inequality is illustrated in Figure 12.1. y=x g(x), h(x) y=h(x)
y=g(x)
x x∗
z∗
Figure 12.1: Comparison of fixed points on the real line.
Case Study 2 Assume that the above equations are linear, i.e., they have the special form xk+1 = Axk + b
(12.3.2)
zk+1 = Bzk + d. If we assume that matrices A, B and vectors b and d are non-negative, and that A B and b d, then x0 z0 implies that xk zk for all k ≥ 0. This assertion is a simple consequence of Theorem 12.1.1 and the previous case study. Case Study 3 Consider the linear input-output system y = x − Ax,
(12.3.3)
12.3. Applications
423
where A ≥ 0 is an n × n constant matrix such that its spectral radius ρ(A) is less than one. Here, x and y denote the gross and net production vectors, respectively. It is known that for all y ∈ Rn , equation (12.3.3) has a unique solution x(y), which can be obtained as the limit of the iteration process xk+1 = Axk + y
(12.3.4) n
with arbitrary initial point x0 ∈ R . By selecting X = Rn , l = 1, f (x) = Ax + y, we obtain the following result. If for any vector u, u ≤ Au + y, then u ≤ x(y); and if for a vector v, v ≥ Av + y, then v ≥ x(y). Here, ” ≤ ” the partial order of vectors, or u = (ui ) ≤ v = (v i ) if for all i, ui ≤ v i . Case Study 4 Consider again the linear input-output system (12.3.3) with the same assumptions as before. As we have seen, the iteration process (12.3.4) converges to the unique solution x(y). Since A ≥ 0, function Ax + y is increasing in x and y. Therefore Theorem 12.1.1 with the selection p = y implies that x(y1 ) ≤ x(y2 ), if y1 ≤ y2 . That is, higher net production requires higher gross production. Case Study 5 As a simple consequence of Theorem 12.2.1, this case proves a useful result which can be applied in practical cases to determine wether a matrix has a nonnegative inverse. The assertion is the following. Theorem 12.3.1 Let A be an n × n real constant matrix. A is nonsingular and A−1 ≥ 0 if and only if an n × n real constant matrix D exists such that (i) D ≥ 0; (ii) I − DA ≥ 0; (iii) ρ(I − DA) < 1. Proof. 1. If A−1 exists and is nonnegative, then D = A−1 satisfies the above conditions. 2. From (iii) we know that DA is nonsingular; therefore, A−1 exists. In order to show that A−1 ≥ 0, we verify that equation Ax = b has a non-negative solution x for all b ≥ 0. Note first that this equation is equivalent to the following x = (I − DA)x + Db.
(12.3.5)
Condition (iii) implies that the iteration sequence generated by the mapping f (x) = (I − DA)x converges to s∗ = 0. Starting from arbitrary initial vector x0 . If the partial order is selected as ” ≥ ” (instead of ” ≤ ”), then for the solution x∗ of equation (12.3.5) x∗ ≥ f (x∗ ), and therefore Theorem 12.2.1 implies that x∗ ≥ s∗ = 0. Hence, the solution x∗ of equation (12.3.5) is non-negative, which completes the proof. In the matrix theoretical literature, a special class of matrices satisfying the conditions of this theorem, is often discussed.
424
12. Fixed points of point-to-set mappings
Definition 12.3.2 An n × n real matrix A = (aij ) is called an M -matrix, if D = −1 −1 diag(a−1 11 , a22 , . . . , ann ) satisfies conditions (i), (ii), (iii). Notice that assumptions (i) and (ii) imply that for all i, aii > 0 and for all i = j, aij ≤ 0. For the sake of simplicity, let NP denote the set of all n × n real matrice whose off-diagonal elements are all non-positive, and let MM denote the set of all n × n M -matrices. Obviously, MM ⊂ NP. A comprehensive summary of the mostly known characterizations of M -matrices is given as follows. Theorem 12.3.3 Let A ∈ NP. Then the following conditions are equivalent to each others: 1. there exists a vector x ≥ 0 such that Ax > 0; 2. there exists a vector x > 0 such that Ax > 0; 3. there exists a diagonal matrix D with positive diagonal such that AD1 > 0; 4. there exists a diagonal matrix D with positive diagonal such that AD is strictly diagonally dominant; 5. for each diagonal matrix R such that R ≥ A, R−1 exists and ρ(R−1 (Ad − A)) < 1, where Ad is the diagonal of A; 6. if B ≥ A and B ∈ NP, then B−1 exists; 7. each real eigenvalue of A is positive; 8. each principal minor of A is positive; 9. there exists a strictly increasing sequence ∅ = M1 ⊆ M2 ⊆ . . . ⊆ Mr = {1, 2, . . . , n} such that the principal minors A(Mi ) are positive, where A(Mi ) is obtained from A by eliminating all rows and columns with indices i ∈ / Mi ; 10. there exists a permutation matrix P such that PAP−1 can be decomposed as LU, where L ∈ NP is a lower triangular matrix with positive diagonal, and U ∈ NP is an upper triangular matrix with positive diagonal; 11. inverse A−1 exists and is non-negative. Proof. 1 ⇒ 2: The continuity of Ax implies that for sufficiently small ε > 0, A(x+ ε1) > 0. Here x + ε1 > 0. 2 ⇒ 3: Let Ax > 0, and define D = diag(x(1) , ..., x(n) ), where x(i) denotes the ith element of x. Then, AD1 = Ax > 0. 3 ⇒ 4: Let D be a diagonal matrix with positive diagonal such that AD1 > 0. Then, with th enotation AD = W = (wij ),
$$\sum_{j=1}^{n} w_{ij} > 0, \quad \text{i.e.,} \quad w_{ii} > \sum_{j \neq i} (-w_{ij}) = \sum_{j \neq i} |w_{ij}|,$$
since for $i \neq j$, $w_{ij} \le 0$.
4 ⇒ 5: Let $W_D$ denote the diagonal of W. Then the row norm of $I - W_D^{-1}W$ is less than one, therefore $\rho(I - W_D^{-1}W) < 1$, and
$$\rho(I - A_D^{-1}A) = \rho\bigl(D^{-1}(I - A_D^{-1}A)D\bigr) = \rho(I - D^{-1}A_D^{-1}AD) = \rho(I - W_D^{-1}W) < 1,$$
since $W_D = A_D D$. Let R ≥ A be a diagonal matrix. Then $R \ge A_D$, and therefore $R^{-1} \le A_D^{-1}$. The Perron–Frobenius theorem (see, for example, Luenberger, 1979) implies that
$$\rho(R^{-1}(A_D - A)) \le \rho(A_D^{-1}(A_D - A)) = \rho(I - A_D^{-1}A) < 1.$$
5 ⇒ 6: Let B ≥ A with B ∈ NP. Notice that $A_D > 0$, since otherwise a singular R could be selected with at least one zero diagonal element. The inequality $B_D \ge A_D$ implies that $B_D^{-1}$ exists and is positive. Therefore
$$0 \le B_D^{-1}(B_D - B) \le B_D^{-1}(A_D - A),$$
which implies that
$$\rho(I - B_D^{-1}B) = \rho(B_D^{-1}(B_D - B)) \le \rho(B_D^{-1}(A_D - A)) < 1,$$
since the selection $R = B_D$ satisfies condition 5. Hence $B_D^{-1}B$ is invertible, which implies that B⁻¹ exists.
6 ⇒ 7: Assume that 6 holds, and let w ≤ 0. Then the matrix B = A − wI satisfies the inequality B ≥ A. Since B⁻¹ exists, w is not an eigenvalue of A.
7 ⇒ 8: Let $M \subset \{1, 2, \ldots, n\}$, and define the matrix B by
$$b_{ij} = \begin{cases} a_{ij} & i, j \in M, \\ a_{ii} & i = j,\ i \notin M, \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, B ≥ A and B ∈ NP. It is easy to prove by finite induction that det(B) > 0. Notice that det(B) is the product of det A(M) and the $a_{ii}$'s with $i \notin M$, and it is also the product of all eigenvalues of B. We will show that all real eigenvalues of B are positive. Let β denote a real eigenvalue of B, and assume that β ≤ 0. Then $\bar{B} = B - \beta I \ge A$, and we will next prove that $\bar{B}$ is nonsingular, which implies that β is not an eigenvalue of B. There exists a σ > 0 such that $I - \sigma\bar{B} \ge 0$, and
$$V = I - \sigma A \ge I - \sigma\bar{B} = U \ge 0.$$
The Perron–Frobenius theory and assumption 7 imply that $0 \le \rho(U) \le \rho(V) < 1$; therefore $(I - U)^{-1}$ exists. Since $I - U = \sigma\bar{B}$, $\bar{B}$ is nonsingular.
8 ⇒ 9: This is trivial.
9 ⇒ 10: This assertion follows from the well-known conditions for the existence of the LU decomposition (see, for example, Szidarovszky and Yakowitz [314]).
10 ⇒ 11: Assume that A may be written as A = LU, where L and U satisfy the given properties. Then both L⁻¹ and U⁻¹ exist, and it is easy to show that L⁻¹ and U⁻¹ are nonnegative. Therefore $A^{-1} = (LU)^{-1} = U^{-1}L^{-1} \ge 0$.
11 ⇒ 1: Select $x = A^{-1}\mathbf{1} \ge 0$; then $Ax = \mathbf{1} > 0$. Thus, the proof is complete.
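As a concrete illustration of the theorem, several of the equivalent conditions can be checked numerically for a sample matrix in NP. The following sketch (Python with numpy; the matrix and the test vector are illustrative choices, not taken from the text) verifies conditions 2, 8 and 11 for a tridiagonal matrix:

```python
import numpy as np

# A tridiagonal matrix in NP (non-positive off-diagonal entries).
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# Condition 2: there exists x > 0 with Ax > 0 (try x = (2, 3, 2)).
x = np.array([2.0, 3.0, 2.0])
cond2 = bool(np.all(x > 0) and np.all(A @ x > 0))

# Condition 8 (checked here for the leading principal minors,
# which suffices via condition 9 with nested index sets).
cond8 = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 4))

# Condition 11: the inverse exists and is entrywise non-negative.
Ainv = np.linalg.inv(A)
cond11 = bool(np.all(Ainv >= -1e-12))  # small tolerance for rounding

print(cond2, cond8, cond11)
```

All three flags print `True`, in agreement with the equivalence asserted by Theorem 12.3.3.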
Corollary 12.3.4 An n × n real matrix is an M-matrix if and only if it satisfies any one of the conditions of the theorem.

Proof. If A is an M-matrix, then it satisfies 11, and if a matrix satisfies any one of the assumptions, then it satisfies 5. Selecting R as the diagonal of A, condition 5 is equivalent to the conditions of Theorem 12.3.1.

Case Study 6 A nice application of the theory of M-matrices is contained here. Consider the equations $g(x) = b^{(1)}$ and $g(x) = b^{(2)}$, where $g : D \to \mathbb{R}^n$ with $D \subseteq \mathbb{R}^n$ being convex, $b^{(1)}$ and $b^{(2)}$ are in R(g), and furthermore $b^{(1)} \le b^{(2)}$. Assume that g is differentiable, and for all x ∈ D and all i and j,
$$\frac{\partial g_i}{\partial x_i}(x) > 0, \qquad \frac{\partial g_i}{\partial x_j}(x) \le 0 \quad (j \neq i),$$
and
$$\frac{\partial g_i}{\partial x_i}(x) > \sum_{j \neq i} \left| \frac{\partial g_i}{\partial x_j}(x) \right|.$$

12.4 Exercises
12.1. Assume that conditions (H1), (H3) hold, and moreover that (H2) holds either for $p_1$ or for $p_2$. Then show that $p_1, p_2 \in P$ and $p_1 \le p_2$ imply that for all k ≥ 0, $x_k^{(p_1)} \le x_k^{(p_2)}$.

12.2. Let $u_n : \mathbb{N} \to \mathbb{R}$ and $u : \mathbb{R}_+ \to \mathbb{R}$ be given, and set $\delta u_n = u_{n+1} - u_n$, $\delta u(t) = u(t + 1) - u(t)$. If $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ has the property that $p \ge q \Longrightarrow f(t, p) \ge f(t, q)$, it is said that f(t, ↑) is increasing, and if −f has this property it is said that f(t, ↓) is decreasing. In either case, the condition is required for all $t \in \mathbb{R}_+$. Suppose that
(i) f(t, ↑) is increasing and $\delta u_n \le f(n, u_n)$, $\delta v_n \ge f(n, v_n)$, $n \in \mathbb{N}$,
or
(ii) f(t, ↓) is decreasing and $\delta u_n \le f(n, u_{n+1})$, $\delta v_n \ge f(n, v_{n+1})$, $n \in \mathbb{N}$.
Then show that $u_0 \le v_0 \Longrightarrow u_n \le v_n$ for all $n \in \mathbb{N}$.
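The comparison principle of Exercise 12.2 (case (i)) can be observed numerically; the function f(t, p) = p/2 and the perturbation −0.1 below are illustrative assumptions, not taken from the text:

```python
# Discrete comparison: with f(t, .) increasing in its second argument,
# du_n <= f(n, u_n) and dv_n >= f(n, v_n) together with u_0 <= v_0
# force u_n <= v_n for every n.
def f(t, p):               # increasing in p for each fixed t
    return 0.5 * p

u, v = 1.0, 1.5            # u_0 <= v_0
for n in range(50):
    assert u <= v          # the claimed invariant
    u = u + f(n, u) - 0.1  # du_n = f(n, u_n) - 0.1 <= f(n, u_n)
    v = v + f(n, v)        # dv_n = f(n, v_n)       >= f(n, v_n)
```

The inner assertion never fires: if $u_n \le v_n$, monotonicity of f gives $u_{n+1} \le u_n + f(n, u_n) \le v_n + f(n, v_n) = v_{n+1}$.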
12.3. Let u and v be functions $\mathbb{R}_+ \to \mathbb{R}$. Suppose
(i) f(t, ↑) is increasing and $\delta u(t) \le f(t, u(t))$, $\delta v(t) \ge f(t, v(t))$, $t \in \mathbb{R}_+$,
or
(ii) f(t, ↓) is decreasing and $\delta u(t) \le f(t, u(t + 1))$, $\delta v(t) \ge f(t, v(t + 1))$, $t \in \mathbb{R}_+$.
Then show that $u(t) \le v(t)$ for all $0 \le t < 1 \Longrightarrow u(t) \le v(t)$ for $t \in \mathbb{R}_+$.

12.4. Let $u_-(t) = \inf_{t \le q \le t+1} u(q)$, $u_+(t) = \sup_{t \le q \le t+1} u(q)$. Suppose that f and u satisfy hypothesis (i) or (ii) of the previous problem, and suppose further that $f(t, M) \le 0$ for $t \in \mathbb{R}_+$, where M is a constant. Then show that $u_+(0) \le M \Rightarrow u(t) \le M$, $t \in \mathbb{R}_+$.

12.5. Let u be a locally bounded solution of $h(t)\delta u(t) \le g(t) - u(t + 1)$, $t \in \mathbb{R}_+$, where g > 0 and h > 0 for large t. Suppose v is positive for large t and satisfies
$$\tfrac{1}{2}(h(t) + 1)\,v_+(t) + v_-(t) + v(t) \ge g(t), \qquad t \gg 1.$$
Then show that $u(t) = O(v(t))$ as $t \to \infty$.

12.6. It is said that $f : \mathbb{R}_+ \to \mathbb{R}_+$ is admissible if constants a, T exist such that f(t) > 0 and
$$\frac{f(s)}{f(t)} \le a \quad \text{for } s, t \ge T,\ |s - t| \le 1.$$
Let f and h be admissible functions such that
$$\limsup_{t \to \infty} h(t)f(t) < \infty,$$
and let u be a locally bounded, nonnegative solution of $f(t)\delta u(t) \le g(t) - u(t + 1)^{1+a}$, where a > 0. Then show that $g(t) = O(f(t)^{-1-1/a}) \Rightarrow u(t) = O(f(t)^{-1/a})$ as $t \to \infty$.
12.7. Let φ(t, p, q) be decreasing in p and strictly increasing in q. Then show that if $u_0 \le v_0$ and $\varphi(n, u_n, u_{n+1}) \le \varphi(n, v_n, v_{n+1})$ for all n ≥ 0, then $u_n \le v_n$ for all n ≥ 0.

12.8. Consider the inequality $c(1 + t)^r \delta u(t) \le g(t) - u(t + 1)^{1+a}$, $t \in \mathbb{R}_+$, where g is a positive function $\mathbb{R}_+ \to \mathbb{R}$ and c, a, r are real constants with c > 0 and a ≥ 0. Assume u is locally bounded and u ≥ 0. Let a = 0. If r < 1 (c, m arbitrary), or if r = 1 and $m \ge -\frac{1}{c}$, then show that $g(t) = O(t^m) \Rightarrow u(t) = O(t^m)$.

12.9. Extend Theorem 12.2.1 to the general algorithm (11.0.1).

12.10. Find results similar to those introduced in Case Study 8 for systems of ordinary differential equations.
Chapter 13
Special topics

We exploit further the conclusions of Section 5.1.
13.1 Newton–Mysovskikh conditions
The Newton–Mysovskikh theorem [248], [263] provides sufficient conditions for the semilocal convergence of Newton's method to a locally unique solution of an equation in a Banach space setting. In this section we show that, under weaker hypotheses and a more precise error analysis than before [248], [263], weaker sufficient conditions can be obtained for the local as well as the semilocal convergence of Newton's method. Error bounds on the distances involved, as well as a larger radius of convergence, are also obtained. Some numerical examples are also provided.

In particular we are motivated by the affine covariant Newton–Mysovskikh theorem (ACNMT) (see also Theorem 13.1.1 that follows), which basically states that if the initial guess $x_0$ is close enough to the solution $x^*$, then Newton's method (4.1.3) converges quadratically to $x^*$. The basic condition is given by
$$\|F'(z)^{-1}[F'(y) - F'(x)](y - x)\| \le w\,\|y - x\|^2 \quad \text{for all } x, y, z \in D. \tag{13.1.1}$$

Here we have noticed (see Theorem 13.1.1) that condition (13.1.1) is not used at full strength to prove convergence of method (4.1.3). Indeed, a much weaker hypothesis (see (13.1.4)) can replace (13.1.1) in the proof of the theorem. This observation has the following advantages (semilocal case): (a) the applicability of (ACNMT) is extended; (b) finer error bounds on the distances $\|x_{n+1} - x_n\|$, $\|x_n - x^*\|$ (n ≥ 0); (c) more precise information on the location of the solution $x^*$; (d) the Lipschitz constant is at least as small and easier to compute. Local case: (a) a larger radius of convergence; (b) finer error bounds on the distances $\|x_n - x^*\|$ (n ≥ 0).
The above advantages are obtained not only under weaker hypotheses but also under less or the same computational cost. Some numerical examples are also provided, for both the semilocal and the local case, where our results apply but other ones in [248], [263] fail.

We can show the following weaker semilocal convergence version of the Affine Covariant Newton–Mysovskikh Theorem (ACNMT):

Theorem 13.1.1 Let $F : D \subseteq X \to Y$ be a Fréchet continuously differentiable operator, and let $x_0 \in D$ be an initial guess. Assume further that the following conditions hold true:
$$F'(x)^{-1} \in L(Y, X) \quad (x \in D), \tag{13.1.2}$$
$$\|F'(x_0)^{-1}F(x_0)\| \le \eta, \tag{13.1.3}$$
$$\|F'(y)^{-1}[F'(x + t(y - x)) - F'(x)](y - x)\| \le w_1\,t\,\|y - x\|^2 \quad \text{for all } x, y \in D,\ t \in [0, 1], \tag{13.1.4}$$
$$h_1 := \frac{w_1\,\eta}{2} < 1, \tag{13.1.5}$$
and
$$\overline{U}(x_0, r_1) \subseteq D,$$
where $r_1$ is the radius determined by η and $h_1$. Then the sequence $\{x_n\}$ (n ≥ 0) generated by Newton's method (4.1.3) is well defined, remains in $\overline{U}(x_0, r_1)$ for all n ≥ 0, and there exists a unique $x^* \in \overline{U}(x_0, r_1)$ such that $F(x^*) = 0$ and $x_n \to x^*$ as $n \to \infty$, with
$$\|x^* - x_{n+1}\| \le \varepsilon_n\,\|x^* - x_n\|^2 \quad \text{for all } n \ge 1,$$
where the constants $\varepsilon_n$ are determined by $w_1$ and the iterates.
Proof. Using (4.1.3) and (13.1.4) we obtain in turn the estimates which show (13.1.8). We now prove that $\{x_n\}$ is a Cauchy sequence in $\overline{U}(x_0, r_1)$, by induction on k. For k = 0, we have by (4.1.3), (13.1.3) and (13.1.5)
$$\|x_1 - x_0\| = \|F'(x_0)^{-1}F(x_0)\| \le \eta = \frac{2}{w_1}\,h_1.$$
Assuming (13.1.13) to be true for some $k \in \mathbb{N}$, we obtain the corresponding estimates, and it follows from (13.1.15) that $x_{k+1} \in \overline{U}(x_0, r_1)$. Similarly it can be shown that $\{x_n\}$ is a Cauchy sequence in $\overline{U}(x_0, r_1)$, and as such it converges to some $x^* \in \overline{U}(x_0, r_1)$. Hence, we get
and thus $F(x^*) = 0$. To show (13.1.9), set $h_k = \tfrac{1}{2}w_1\|x_k - x_{k-1}\|$. Then we get in turn:
$$\|x_k - x^*\| = \lim_{\substack{m \to \infty \\ m > k}} \|x_k - x_m\| \le \lim_{\substack{m \to \infty \\ m > k}} \bigl[\|x_m - x_{m-1}\| + \cdots + \|x_{k+1} - x_k\|\bigr] \le \frac{2}{w_1}\,\lim_{\substack{m \to \infty \\ m > k}} \bigl[h_{m-1} + \cdots + h_k\bigr].$$
We can also have
$$h_{k+b} \le (h_k)^{2^b}, \qquad b \in \mathbb{N}_0,$$
which shows (13.1.9). Finally, to show uniqueness, let $y^* \in \overline{U}(x_0, r_1)$ be a solution of equation F(x) = 0. Then we have for all t ∈ [0, 1]:
$$x^* + t(y^* - x^*) - x_0 = (1 - t)(x^* - x_0) + t(y^* - x_0)$$
and
$$\|x^* + t(y^* - x^*) - x_0\| \le (1 - t)\|x^* - x_0\| + t\|y^* - x_0\| \le (1 - t)r_1 + t\,r_1 = r_1.$$
That is, $x^* + t(y^* - x^*) \in \overline{U}(x_0, r_1)$. Using the identity
$$F(y^*) - F(x^*) = M(y^* - x^*), \qquad \text{where } M = \int_0^1 F'\bigl(x^* + t(y^* - x^*)\bigr)\,dt,$$
and hypothesis (13.1.2), we deduce that the linear operator M is invertible. Hence, we conclude $x^* = y^*$. □
Remark 13.1.1 Clearly condition (13.1.1) implies (13.1.4), but not vice versa. That is, (13.1.4) is a weaker condition than (13.1.1). Note that
$$w_1 \le w$$
holds in general, and the ratio $\frac{w}{w_1}$ can be arbitrarily large. The classical Affine Covariant Newton–Mysovskikh Theorem has been shown using (13.1.1), h and r instead of (13.1.4), $h_1$ and $r_1$, respectively, where h and r are defined as $h_1$ and $r_1$ with w in place of $w_1$. Note that
$$h_1 \le h \quad \text{and} \quad r_1 \le r,$$
but not vice versa unless $w_1 = w$. If strict inequality holds in (13.1.23), then our error estimates (13.1.8)–(13.1.10) are finer than the corresponding ones in [248], [263], obtained by replacing $w_1$ by w in (13.1.8)–(13.1.10). Note that all the above advantages are obtained under less computational cost, since in practice the computation of w requires that of $w_1$.

Remark 13.1.2 Although the computation of the constant $w_1$ in (13.1.4) is in practice easier than the computation of w in (13.1.1), one may want to replace (13.1.4) by the pair of conditions (13.1.28)
and (13.1.29), which hold
for all x, y ∈ D, t ∈ [0, 1]. Using (13.1.28) and (13.1.29) we showed in Section 5.1 that Newton's sequence converges to a unique solution $x^*$ of the equation F(x) = 0 in a suitable ball centered at $x_0$, provided that (13.1.30) holds. Clearly these results will be weaker than the present ones provided that
$$\frac{w_2}{1 - w_0\,\|z - x_0\|} < w_1, \tag{13.1.31}$$
which follows by the Banach Lemma on invertible operators and the corresponding estimate. Condition (13.1.31) certainly holds for sufficiently small η, since the ratio $\frac{w_2}{w_1}$ can be arbitrarily small.
We can provide an example to show that (13.1.5) and (13.1.30) hold but (13.1.24) fails.

Example 13.1.1 Let $X = Y = \mathbb{R}$, $D = [a, 2 - a]$, $a \in [0, \frac12)$, $x_0 = 1$, and define the function F on D by
$$F(x) = x^3 - a.$$
Using (13.1.1), (13.1.3), (13.1.4), (13.1.28) and (13.1.29) we obtain the corresponding constants. Condition (13.1.24) does not hold, since the corresponding quantity exceeds 2 for all $a \in [0, \frac12)$ (compare Example 13.2.1). That is, there is no guarantee that Newton's method starting at $x_0$ converges to $x^*$. However, condition (13.1.5) holds, since
$$2h_1 = \frac{2(1 - a)}{3a} < 2 \quad \text{for all } a \in \left(\tfrac14, \tfrac12\right).$$
Finally, say for δ = 1, condition (13.1.30) holds as well.
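Taking F(x) = x³ − a (consistent with the constants computed for this family of examples), the claimed behaviour can be reproduced numerically; the value a = 0.4 ∈ (1/4, 1/2) below is an illustrative choice, and the expressions for η and h₁ follow the example's computed values:

```python
a = 0.4                      # illustrative value in (1/4, 1/2)
F = lambda x: x ** 3 - a
dF = lambda x: 3 * x ** 2

x = 1.0                      # initial guess x_0 = 1
for _ in range(10):
    x = x - F(x) / dF(x)     # Newton step (4.1.3)

eta = (1 - a) / 3            # |F'(x_0)^{-1} F(x_0)| = (1 - a)/3
h1 = (1 - a) / (3 * a)       # 2*h1 = 2(1-a)/(3a) < 2, i.e. (13.1.5) holds
print(abs(x - a ** (1 / 3)) < 1e-12, h1 < 1)
```

Both printed flags are `True`: the iterates converge rapidly to the root $x^* = a^{1/3}$, and $h_1 = 0.5 < 1$.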
In order to cover the local convergence of Newton's method under Newton–Mysovskikh-type conditions, let $x^*$ be a simple solution of equation F(x) = 0. Assume:
$$\|F'(x)^{-1}[F'(x^* + t(x - x^*)) - F'(x)](x - x^*)\| \le v\,(1 - t)\,\|x - x^*\|^2 \tag{13.1.40}$$
for all $x \in U(x^*, r^*)$, $t \in [0, 1]$, where the radius $r^*$ is defined by (13.1.41). Set also the radius $r_1^*$ given by (13.1.42). We can show the following local convergence theorem for Newton's method (4.1.3):

Theorem 13.1.2 Let $F : D \subseteq X \to Y$ be a Fréchet-differentiable operator. Assume that $F'(x)^{-1} \in L(Y, X)$ $(x \in D)$ and that there exists $x^* \in D$ such that $F(x^*) = 0$. If
(a) condition (13.1.40) holds, then the sequence $\{x_n\}$ generated by Newton's method (4.1.3) is well defined, remains in $U(x^*, r^*)$ for all n ≥ 0, and converges to $x^*$, provided that $x_0 \in U(x^*, r^*)$;
then, sequence {xn) generated by Newton's method (4.1.3) is well defined, remains i n U ( x * ,r ; ) for all n 2 0 and converges to x* provided that xo E U ( x * ,r ? ) with
Proof. (a) Using (13.1.42), (13.1.43) and induction on the integer Ic we get
which shows xk E U(x*,r * ) for all k and lim xk = x* . k-00
( b ) Similarly using ( 3 ) , (45) and induction on the integer Ic we get
Let us further introduce the conditions
$$\|F'(x)^{-1}[F'(x) - F'(x^*)]\| \le v_1\,\|x - x^*\|$$
and
$$\|F'(x^*)^{-1}[F'(x) - F'(x^*)]\| \le v_0\,\|x - x^*\|.$$
Then, if we set $r_2^*$ according to (13.1.52), we showed (see Section 5.1) that Newton's method converges to $x^*$ provided that $x_0 \in U(x^*, r_2^*)$ and (13.1.53) holds. In general
$$v_0 \le v \le w \tag{13.1.54}$$
holds, and the ratios $\frac{v}{v_0}$, $\frac{w}{v}$ can be arbitrarily large. Therefore, by (13.1.41) and (13.1.42),
$$r_1^* \le r^*.$$
We can now provide an example where we compare the radii $r^*$, $r_1^*$ and $r_2^*$.
Example 13.1.2 Let $X = Y = \mathbb{R}$, $D = U(0, 1)$, $x^* = 0$, and define the function F on D by
$$F(x) = e^x - 1.$$
Using (13.1.1), (13.1.40), (13.1.49), (13.1.50) we obtain
$$w = e, \quad v = e - 1, \quad v_0 = e - 1 \quad \text{and} \quad v_1 = e. \tag{13.1.56}$$
By (13.1.41), (13.1.42), (13.1.52), (13.1.56) we obtain the corresponding radii. We need to reset $r^* = 1$ so that (13.1.53) holds. It follows that, since $r_1^* < r^*$, our approach provides a wider choice of initial guesses $x_0$ than the local convergence results using the standard condition (13.1.1).
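Taking F(x) = eˣ − 1 (consistent with the constants w = e, v = e − 1), local convergence to x* = 0 is easy to observe numerically; the starting point x₀ = 0.9 inside U(0, 1) is an illustrative choice:

```python
import math

F = lambda x: math.exp(x) - 1.0
dF = lambda x: math.exp(x)

x = 0.9                       # x_0 in U(0, 1)
for _ in range(10):
    x = x - F(x) / dF(x)      # Newton step (4.1.3): x - 1 + exp(-x)
print(abs(x) < 1e-12)
```

The printed flag is `True`; the first iterates are roughly 0.31, 0.043, 0.0009, ..., exhibiting the quadratic contraction toward the simple zero x* = 0.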
13.2 A conjugate gradient solver for the normal equations
In this section a weaker version of the affine covariant Lipschitz condition appearing in connection with Newton–Mysovskikh-type theorems for the semilocal convergence of Newton methods is introduced. It turns out that, under weaker hypotheses and less computational cost than before [142], [194], [263], a semilocal convergence analysis can be provided for Newton methods. This approach, in particular, extends the usage of the conjugate gradient solver for the normal equations on the one hand, and provides finer error estimates on the distances involved on the other. A simple numerical example justifies our approach.

In this study we are concerned with the problem of approximating a locally unique solution $x^*$ of the equation
$$F(x) = 0, \tag{13.2.1}$$
where the operator F is defined on an open convex subset D of the n-dimensional Euclidean space $\mathbb{R}^n$ with values in itself. The most popular method for approximating $x^*$ is undoubtedly Newton's method, which computes iterates successively as the solutions of the linear algebraic systems
$$F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k \quad (k = 0, 1, \ldots). \tag{13.2.2}$$
The classical convergence theorems of Newton–Kantorovich and Newton–Mysovskikh type, and their affine covariant, affine contravariant, and affine conjugate versions, assume the exact solution of (13.2.2). In practice, however, in particular if the dimension is large, (13.2.2) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (13.2.2). It is important to tune the outer and inner iterations and to keep track of the iteration errors. With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the error norm in the case of affine covariance, the residual norm in the case of affine contravariance, and the energy norm in the case of affine conjugacy.

Except for convex optimization, we cannot expect F'(x), x ∈ D, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are CGNE (Conjugate Gradient for the Normal Equations) in the case of affine covariance, GMRES (Generalized Minimal Residual) in the case of affine contravariance, and PCG (Preconditioned Conjugate Gradient) in the case of affine conjugacy. Although our results can be extended to all solvers mentioned above, we only report on CGNE.

In this study we are motivated by the elegant work of R. Hoppe in [194] (see also related work by E. Cătinaş [118], [119]). Hoppe's work is based on the following assumption: Suppose that $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on D with invertible Fréchet derivative F'(x), $x \in \mathbb{R}^n$. Assume further that the following affine covariant Lipschitz condition is satisfied:
$$\|F'(z)^{-1}[F'(y) - F'(x)]v\| \le w\,\|y - x\|\,\|v\| \quad \text{for all } x, y, z \in D,\ v \in \mathbb{R}^n. \tag{13.2.3}$$
Condition (13.2.3), together with the sufficient convergence Newton–Mysovskikh hypothesis
$$H = w\,\eta \le 2, \tag{13.2.4}$$
forms the basic hypotheses of the main results (Theorems 3.1 and 3.2 in [194]). However, there are even simple scalar examples where hypothesis (13.2.4) does not hold, and consequently there is no guarantee that method (13.2.2) converges to $x^*$:
Example 13.2.1 Let n = 1, $x^0 = 1$, $D = [a, 2 - a]$, $a \in [0, \frac12)$, and define the function F on D by
$$F(x) = x^3 - a. \tag{13.2.5}$$
It follows by (13.2.2), (13.2.3) and (13.2.5) that
$$\eta = \frac{1}{3}(1 - a), \qquad w = \frac{2(2 - a)}{a^2}. \tag{13.2.6}$$
By (13.2.4) and (13.2.6) we obtain
$$H = \frac{2}{3a^2}(1 - a)(2 - a).$$
However, (13.2.4) does not hold, since H > 2 for all $a \in [0, \frac12)$. That is, the results obtained in [194] cannot be applied in this case.

We noticed that all the results obtained in [194] hold in a weaker setting if (13.2.3) and (13.2.4) are replaced by the weaker (13.2.10) and (13.2.25), respectively. Moreover, the error bounds $\|x^{k+1} - x^k\|$, $\|x^k - x^*\|$ and the relative error are finer. This information is important in computational mathematics [99], [142], [194], [263]. To make the section as self-contained as possible, we reintroduce some of the terminology and concepts in [263].
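The failure of (13.2.4) in Example 13.2.1 is easy to confirm numerically; the grid below is an illustrative sampling of a ∈ (0, 1/2):

```python
def H(a):
    # Kantorovich-type quantity H = w * eta for Example 13.2.1:
    # H(a) = 2 (1 - a)(2 - a) / (3 a^2)
    return 2 * (1 - a) * (2 - a) / (3 * a ** 2)

grid = [0.01 * k for k in range(1, 50)]   # a in (0, 0.5)
print(all(H(a) > 2 for a in grid))
```

This prints `True`: H decreases toward the limit value 2 as a → 1/2, but stays strictly above 2 on the whole interval, so the sufficient condition H ≤ 2 never holds.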
Affine Covariant Inexact Newton Methods

CGNE (Conjugate Gradient for the Normal Equations)

We assume $A \in \mathbb{R}^{n \times n}$ to be a regular, nonsymmetric matrix and $b \in \mathbb{R}^n$ to be given, and look for $y^* \in \mathbb{R}^n$ as the unique solution of the linear algebraic system
$$Ay = b.$$
As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: it solves the system
$$AA^T z = b$$
for z and then computes y according to
$$y = A^T z.$$
The implementation of CGNE is as follows.

CGNE Initialization: Given an initial guess $y^0 \in \mathbb{R}^n$, compute the residual $r^0 = b - Ay^0$ and set
$$p^0 = r^0, \qquad \bar{p}^0 = 0, \qquad \beta_0 = 0, \qquad \varrho_0 = \|r^0\|^2.$$
CGNE Iteration Loop: For $1 \le i \le i_{\max}$, compute the conjugate gradient updates for the system $AA^Tz = b$, carrying along the iterates $y^i = A^Tz^i$.

CGNE has the error minimizing property over the affine space $y^0 + K_i(A^Tr^0, A^TA)$, where $K_i(A^Tr^0, A^TA)$ stands for the Krylov subspace
$$K_i(A^Tr^0, A^TA) = \operatorname{span}\{A^Tr^0,\ (A^TA)A^Tr^0,\ \ldots,\ (A^TA)^{i-1}A^Tr^0\}.$$
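The CGNE scheme above can be sketched in a few lines (Python with numpy; the recursion is the standard conjugate gradient iteration for AAᵀz = b rewritten entirely in the y-variables, so the bookkeeping may differ in detail from the loop printed in [194]; the matrix below is an illustrative choice):

```python
import numpy as np

def cgne(A, b, y0=None, tol=1e-12, maxit=None):
    """CGNE sketch: CG on A A^T z = b with y = A^T z, which minimizes
    the error norm ||y* - y_i|| over the Krylov subspace K_i(A^T r0, A^T A)."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else y0.astype(float).copy()
    r = b - A @ y            # residual r_i = b - A y_i
    p = A.T @ r              # search direction in the y-variables
    rho = r @ r
    for _ in range(maxit or 2 * n):
        if np.sqrt(rho) < tol:
            break
        alpha = rho / (p @ p)
        y += alpha * p
        r -= alpha * (A @ p)
        rho_new = r @ r
        p = A.T @ r + (rho_new / rho) * p
        rho = rho_new
    return y

A = np.array([[3.0, 1.0, 0.0],
              [-1.0, 4.0, 2.0],
              [0.0, -1.0, 5.0]])   # regular, nonsymmetric
b = np.array([1.0, 2.0, 3.0])
y = cgne(A, b)
```

For this 3×3 example the iteration reproduces b = Ay to machine precision, since exact CG on a 3×3 symmetric positive definite system terminates in at most three steps.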
Lemma 13.2.1 [194]. (Representation of the iteration error.) Let $\varepsilon_i := \|y^* - y^i\|^2$ be the square of the CGNE iteration error with respect to the ith iterate. Then $\varepsilon_i$ can be represented in terms of the quantities computed by the CGNE loop.

Proof. CGNE satisfies a Galerkin orthogonality relation. Setting m = 1, this implies an orthogonal decomposition of the error, which readily gives the first part of the representation. On the other hand, observing that $y^n = y^*$, the Galerkin orthogonality with m = n − i yields the assertion. □
Computable lower bound for the iteration error

It follows readily from Lemma 13.2.1 that the computable quantity arising from the error representation provides a lower bound for the iteration error. In practice, we will test the relative error norm against a user-specified accuracy.

Convergence of Affine Covariant Inexact Newton Methods
We denote by $\delta x^k \in \mathbb{R}^n$ the result of an inner iteration, e.g., CGNE, for the solution of (13.2.2). Then it is easy to see that the iteration error $\delta x^k - \Delta x^k$ satisfies the error equation
$$F'(x^k)\,(\delta x^k - \Delta x^k) = F'(x^k)\,\delta x^k + F(x^k).$$
We will measure the impact of the inexact solution of (13.2.2) by the relative error
$$\delta_k := \frac{\|\delta x^k - \Delta x^k\|}{\|\delta x^k\|}.$$
Theorem 13.2.2 (Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence.) Suppose that $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on D with invertible Fréchet derivative F'(x), $x \in \mathbb{R}^n$. Assume further that the following affine covariant Lipschitz condition is satisfied:
$$\|F'(y)^{-1}[F'(x + t(y - x)) - F'(x)]v\| \le w_1\,t\,\|y - x\|\,\|v\|, \tag{13.2.10}$$
where $x, y \in D$, $v \in \mathbb{R}^n$, $t \in [0, 1]$. Assume that $x^0 \in D$ is an initial guess for the outer Newton iteration and that $\delta x^0 = 0$ is chosen as the start iterate for the inner iteration. Consider the Kantorovich quantities
associated with the outer and inner iterations. Assume that (13.2.12) holds, and control the inner iterations according to the contraction condition $\theta(h_k, \delta_k) \le \bar{\theta} < 1$, which implies linear convergence. Note that a necessary condition for $\theta(h_k, \delta_k) \le \bar{\theta}$ is that it holds true for $\delta_k = 0$, which is satisfied due to assumption (13.2.12). Then, there holds:

Theorem 13.2.3 The Newton–CGNE iterates $\{x^k\}_{k \in \mathbb{N}_0}$ stay in $\overline{B}(x^0, \rho)$ and converge linearly to some $x^* \in \overline{B}(x^0, \rho)$ with $F(x^*) = 0$. The exact Newton increments $\|\Delta x^k\|$ decrease monotonically, whereas a corresponding contraction estimate holds for the inexact Newton increments $\|\delta x^k\|$.
Proof. Using (13.2.2) we obtain in turn the decomposition (13.2.17) of the new Newton increment.
Using the affine covariant Lipschitz condition (13.2.10), the first term on the right-hand side in (13.2.17) can be estimated according to
$$I \le w_1\,\|\delta x^k\|^2 \int_0^1 t\,dt = \frac{1}{2}\,w_1\,\|\delta x^k\|^2. \tag{13.2.18}$$
For the second term we obtain by the same argument
$$II = \|F'(x^{k+1})^{-1}F'(x^k)(\delta x^k - \Delta x^k)\| \le \|F'(x^{k+1})^{-1}\bigl(F'(x^k) - F'(x^{k+1})\bigr)(\delta x^k - \Delta x^k)\| + \|\delta x^k - \Delta x^k\|. \tag{13.2.19}$$
Combining (13.2.18) and (13.2.19) yields
$$\frac{\|\Delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{1}{2}\,h_k + \delta_k\,(1 + h_k).$$
Observing (13.2.11), we finally get the contraction estimate
$$\frac{\|\Delta x^{k+1}\|}{\|\delta x^k\|} \le \theta(h_k, \delta_k) \le \bar{\theta} < 1,$$
which implies linear convergence. Note that a necessary condition for $\theta(h_k, \delta_k) \le \bar{\theta}$ is that it holds true for $\delta_k = 0$, which is satisfied due to assumption (13.2.12). For the contraction of the inexact Newton increments a corresponding estimate follows. It can easily be shown that $\{x^k\}_{k \in \mathbb{N}_0}$ is a Cauchy sequence in $\overline{B}(x^0, \rho)$. Consequently, there exists $x^* \in \overline{B}(x^0, \rho)$ such that $x^k \to x^*$ $(k \to \infty)$. Since
$$F(x^k) = -F'(x^k)\,\Delta x^k \to 0 \quad (k \to \infty),$$
we conclude $F(x^*) = 0$. □
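The outer/inner structure analysed in Theorem 13.2.3 can be sketched as follows (Python with numpy; the nonlinear test system, the starting point, and the fixed number of inner CGNE steps are illustrative assumptions, not taken from the text):

```python
import numpy as np

def cgne_steps(A, b, steps):
    """A few CG steps on A A^T z = b, returned as y = A^T z (inner solver)."""
    y = np.zeros_like(b)
    r = b - A @ y
    p = A.T @ r
    rho = r @ r
    for _ in range(steps):
        if rho == 0.0:
            break
        alpha = rho / (p @ p)
        y += alpha * p
        r -= alpha * (A @ p)
        rho_new = r @ r
        p = A.T @ r + (rho_new / rho) * p
        rho = rho_new
    return y

def F(x):   # illustrative nonlinear system with root (1, 1)
    return np.array([x[0] ** 2 + x[1] - 2.0, x[0] + x[1] ** 2 - 2.0])

def dF(x):  # Jacobian F'(x)
    return np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

x = np.array([2.0, 0.5])                     # outer initial guess x^0
for _ in range(25):
    dx = cgne_steps(dF(x), -F(x), steps=4)   # inexact Newton correction
    x = x + dx
```

The outer iterates approach the root (1, 1); because the inner system here is only 2×2, a handful of CGNE steps already solves it essentially exactly, so the sketch exhibits the exact-Newton limiting case δ_k ≈ 0 of the theorem.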
Remark 13.2.1 In general
$$w_1 \le w$$
holds, and the ratio $\frac{w}{w_1}$ can be arbitrarily large. If $w_1 = w$ and (13.2.10) is replaced by the stronger (13.2.3), then our Theorem 13.2.2 reduces to Theorem 3.1 in [194, p. 20]. Otherwise our Theorem 13.2.2 improves Theorem 3.1 in [194] under weaker hypotheses and less computational cost, since in practice the computation of $w_1$ is less expensive than that of w. Note also that (13.2.3) implies (13.2.10), but not vice versa unless $w_1 = w$. Returning back to Example 13.2.1 and using (13.2.10), we obtain
$$w_1 = \frac{2}{\cdots}$$