COMPUTATIONAL THEORY OF ITERATIVE METHODS
STUDIES IN COMPUTATIONAL MATHEMATICS 15
Editors:
C.K. CHUI Stanford University Stanford, CA, USA
L. WUYTACK University of Antwerp Antwerp, Belgium
IOANNIS K. ARGYROS
Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands Linacre House, Jordan Hill, Oxford OX2 8DP, UK
First edition 2007

Copyright © 2007 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively, you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions and selecting Obtaining permission to use Elsevier material.

Notice: No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data: A catalog record for this book is available from the Library of Congress.

British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library.

ISBN: 978-0-444-53162-9
ISSN: 1570-579X
For information on all Elsevier publications visit our website at books.elsevier.com
Printed and bound in The Netherlands
DEDICATED TO MY MOTHER ANASTASIA
Introduction

Researchers in computational sciences are faced with the problem of solving a variety of equations. A large number of problems are solved by finding the solutions of certain equations. For example, dynamic systems are mathematically modelled by difference or differential equations, and their solutions usually represent the states of the systems. For the sake of simplicity, assume that a time-invariant system is driven by the equation ẋ = f(x), where x is the state; the equilibrium states are then determined by solving the equation f(x) = 0. Similar equations are used in the case of discrete systems.

The unknowns of engineering equations can be functions (difference, differential, and integral equations), vectors (systems of linear or nonlinear algebraic equations), or real or complex numbers (single algebraic equations with single unknowns). Except in special cases, the most commonly used solution methods are iterative: starting from one or several initial approximations, a sequence is constructed that converges to a solution of the equation. Iterative methods are also applied to optimization problems, in which case the iteration sequences converge to an optimal solution of the problem at hand. Since all of these methods have the same recursive structure, they can be introduced and discussed in a general framework.

To complicate matters further, many of these equations are nonlinear. However, all may be formulated in terms of operators mapping one linear space into another, the solutions being sought as points in the corresponding space. Consequently, computational methods that work in this general setting for the solution of equations apply to a large number of problems, and lead directly to the development of suitable computer programs to obtain accurate approximate solutions to equations in the appropriate space.
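As a concrete illustration of such an iterative method (a sketch of ours, not taken from the monograph): Newton's method for a single equation f(x) = 0, assuming f is smooth and the initial guess is close enough to a solution.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")

# Equilibrium states of the system x' = f(x) with f(x) = x**2 - 2
# are found by solving f(x) = 0 (the example data are ours).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Starting from x₀ = 1 the sequence 1, 1.5, 1.41667, ... converges rapidly to √2, illustrating the recursive structure shared by the methods studied in this monograph.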
This monograph is intended for researchers in computational sciences, and as a reference book for an advanced numerical-functional analysis or computer science course. The goal is to introduce these powerful concepts and techniques at the earliest possible stage. The reader is assumed to have had basic courses in numerical analysis, computer programming, computational linear algebra, and an introduction to real, complex, and functional analysis. We have divided the material into thirteen chapters. Although the monograph is of a theoretical nature, concerned with optimizing and weakening existing hypotheses, each chapter contains several new theoretical results and important
applications in engineering, in dynamic economic systems, in input-output systems, in the solution of nonlinear and linear differential equations, and in optimization problems. The applications appear in the form of examples, applications, exercises, or case studies, or they are implied, since our results improve earlier ones that have already been applied to concrete problems. Sections have been written as independently of each other as possible, so the interested reader can go directly to a certain section and understand the material without having to go back and forth in the whole textbook to find related material.

There are four basic problems connected with iterative methods.

Problem 1. Show that the iterates are well defined. For example, if the algorithm requires the evaluation of F at each xn, it has to be guaranteed that the iterates remain in the domain of F. It is, in general, impossible to find the exact set of all initial data for which a given process is well defined, and we restrict ourselves to giving conditions which guarantee that an iteration sequence is well defined for certain specific initial guesses.

Problem 2. Concerns the convergence of the sequences generated by a process and the question of whether their limit points are, in fact, solutions of the equation. There are several types of such convergence results. The first, which we call a local convergence theorem, begins with the assumption that a particular solution x∗ exists, and then asserts that there is a neighborhood U of x∗ such that for all initial vectors in U the iterates generated by the process are well defined and converge to x∗. The second type of convergence theorem, which we call semilocal, does not require knowledge of the existence of a solution, but states that, starting from initial vectors for which certain (usually stringent) conditions are satisfied, convergence to some (generally nearby) solution x∗ is guaranteed.
Moreover, theorems of this type usually include computable (at least in principle) estimates for the error ‖xn − x∗‖, a possibility not afforded by the local convergence theorems. Finally, the third and most elegant type of convergence result, the global theorem, asserts that starting anywhere in a linear space, or at least in a large part of it, convergence to a solution is assured.

Problem 3. Concerns the economy of the entire operation and, in particular, the question of how fast a given sequence will converge. Here there are two approaches, which correspond to the local and semilocal convergence theorems. As mentioned above, the analysis which leads to the semilocal type of theorem frequently produces error estimates, and these, in turn, may sometimes be reinterpreted as estimates of the rate of convergence of the sequence. Unfortunately, however, these are usually overly pessimistic. The second approach deals with the behavior of the sequence {xn} when n is large, and hence when xn is near the solution x∗. This behavior may then be determined, to a first approximation, by the properties of the iteration function near x∗, and leads to so-called asymptotic rates of convergence.
Problem 4. Concerns how best to choose a method, algorithm, or software program to solve a specific type of problem, and the description of when a given algorithm or method succeeds or fails.

We have included a variety of new results dealing with Problems 1-4. Results by other authors (or by us) are either explicitly stated, implicit in comparisons with the corresponding results of ours, or left as exercises. This monograph is an outgrowth of research work undertaken by us, and it complements and updates earlier works of ours focusing on an in-depth treatment of convergence theory for iterative methods [7]-[100]. Such a comprehensive study of optimal iterative procedures appears to be needed and should benefit not only those working in the field but also those interested in, or in need of, information about specific results or techniques.

We have endeavored to make the main text as self-contained as possible, to prove all results in full detail, and to include a number of exercises throughout the monograph. In order to make the study useful as a reference source, we have complemented each section with a set of "Remarks" in which literature citations are given, other related results are discussed, and various possible extensions of the results of the text are indicated. For completeness, the monograph ends with a comprehensive list of references and an appendix containing useful contemporary numerical algorithms. Because we believe our readers come from diverse backgrounds and have varied interests, we provide "recommended reading" throughout the textbook.

Often a long monograph summarizes knowledge in a field. This monograph, however, may be viewed as a report on work in progress. We provide a foundation for a scientific field that is rapidly changing, and we therefore list numerous conjectures and open problems, as well as alternative models which need to be explored.
In this monograph we do not attempt to cover all aspects of what is commonly understood to be computational methods. Although we make reference to, and briefly study, some of them (e.g., homotopy methods), our main concern is restricted to certain nongeometrical computational methods. For a geometrical approach to computational methods we refer the reader to the excellent book by Krasnosel'skii and Zabrejko [214].

The monograph is organized as follows. Chapter 1: The essentials on linear spaces are provided. Chapter 2: Some results on monotone convergence are reported. Chapter 3: Stationary and nonstationary Mann and Ishikawa-type iterations are studied. Recent results of ours are also compared favourably with earlier ones. Chapters 4-9: Newton-type methods and their implications/applications are covered here. The Newton-Kantorovich Theorem 4.2.4 is one of the most important tools in nonlinear analysis and in classical numerical analysis. This theorem has been successfully used by numerical analysts for obtaining optimal bounds for many iterative procedures. The original paper of Kantorovich [206] contains optimal a priori bounds for Newton's method, albeit not in explicit form. Explicit forms of those a priori bounds were obtained independently by Ostrowski [264] and by Gragg and Tapia [170] (see [284] for the equivalence of the bounds).
The paper of Gragg and Tapia [170] also contains sharp a posteriori bounds for Newton's method. By using different techniques and/or different a posteriori information, the a posteriori error bounds were refined by other authors [6], [121], [141], [143], [209]-[212], [274], [284] and by us [11]-[99]. Various extensions of the Kantorovich theorem have been used to obtain error bounds for Newton-like methods [16], [29], [34], [35], [48], [121], [274], [351], inexact Newton methods [42], [58], [359], the secant method [13], [21], [275], Halley's method [41], [160], etc. A survey of such results can be found in [68], [99], [350].

The Newton-Kantorovich theorem has also been used for proving existence and uniqueness of solutions of nonlinear equations arising in various fields. The spectrum of applications of this theorem is immense. One can sit in front of a computer, search for the "Newton-Kantorovich theorem", and find hundreds if not thousands of works related to or based on this theorem. Our list below is therefore very incomplete. However, we mention such diverse problems as the chaotic behavior of maps [314], optimal shape design [220], nonlinear Riemann-Hilbert problems [104], optimal control [172], nonlinear finite elements [324], integral equations [4], [141], [160], flow problems [226], nonlinear elliptic problems [249], semidefinite programming [251], LP methods [294], interior point methods [282], and multiparameter singularly perturbed systems [245]. We also mention that this theorem has been extended for analyzing Newton's method on manifolds [162], [265] (see also our Section 8.7). The relation between the Kantorovich theorem and other existence theorems for solutions of nonlinear equations, such as the Moore and Miranda theorems, has been investigated in [1], [369] (see also the improvements in our Chapter 8, Sections 8.1 and 8.2).
The foundation of the Newton-Kantorovich theorem is the Newton-Kantorovich hypothesis (4.2.51), famous for its simplicity and clarity. This condition is the sufficient hypothesis for the convergence of Newton's method. However, it is common knowledge, as numerical experiments indicate, that Newton's method can converge even when the initial guess violates this condition. Therefore weakening this condition is of extreme importance in computational mathematics, since the trust region will be enlarged. Recently [92] we showed that the Newton-Kantorovich hypothesis can always be replaced by the weaker hypothesis (5.1.56), which at most doubles (when l0 = 0) the applicability of this theorem. Note that the verification of condition (5.1.56) requires the same information and computational cost as (4.2.51), since in practice the computation of the Lipschitz constant l requires that of the center-Lipschitz constant l0. Moreover, the following advantages hold: in the semilocal case, finer error bounds on the distances involved and at least as precise information on the location of the solution; in the local case, finer error bounds and a larger trust region. These advantages carry over if our approach is extended to Newton-like methods (see Chapter 5).
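A hedged numerical sketch of this comparison. The precise form of (5.1.56) is derived in Chapter 5; here we use, as illustrative stand-ins only, the classical hypothesis in one common normalization, 2lη ≤ 1 (with η the norm of the first Newton step), and the simplified weaker variant (l0 + l)η ≤ 1 based on the center-Lipschitz constant l0 ≤ l. The concrete numbers are invented for illustration.

```python
# Illustrative stand-ins, NOT the book's exact conditions:
# classical Newton-Kantorovich hypothesis (one normalization): 2*l*eta <= 1,
# where l is a Lipschitz constant for F' and eta = ||F'(x0)^{-1} F(x0)||.
def kantorovich(l, eta):
    return 2.0 * l * eta <= 1.0

# Simplified weaker variant using the center-Lipschitz constant l0 <= l;
# with l0 = 0 the admissible range of eta doubles.
def weaker(l0, l, eta):
    return (l0 + l) * eta <= 1.0

l, l0, eta = 2.0, 1.0, 0.3            # invented data with l0 < l
classical_holds = kantorovich(l, eta)  # 2*2*0.3 = 1.2 > 1: fails
weaker_holds = weaker(l0, l, eta)      # (1+2)*0.3 = 0.9 <= 1: holds
```

Since l0 ≤ l always, the weaker predicate is satisfied whenever the classical one is, and, as above, it can hold when the classical one fails, which is exactly the enlargement of the trust region described in the text.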
For all the above reasons it is worthwhile to revisit all existing results that use the Newton-Kantorovich hypothesis and to replace it with our weaker condition (5.1.56). Such a task is obviously impossible for us to undertake, and we leave it to motivated readers to pick specific problems mentioned (or not) above. Here we have simply attempted (Chapters 5-9) to provide a useful, but by no means complete, list of such applications/implications. We report some results on the study of variational inequalities in Chapter 10. Chapters 11 and 12 deal with the convergence of point-to-set mappings and their applications. Finally, in Chapter 13 we improve the mesh independence principle [145] and also provide a more efficient conjugate gradient solver for the normal equations [194].
Contents

1 Linear Spaces . . . 1
  1.1 Linear Operators . . . 1
  1.2 Continuous Linear Operators . . . 2
  1.3 Equations . . . 3
  1.4 The Inverse of a Linear Operator . . . 5
  1.5 Fréchet Derivatives . . . 6
  1.6 Integration . . . 9
  1.7 Exercises . . . 10

2 Monotone convergence . . . 17
  2.1 Partially Ordered Topological Spaces . . . 17
  2.2 Divided Differences in a Linear Space . . . 20
  2.3 Divided Differences in a Banach Space . . . 21
  2.4 Divided Differences and Monotone Convergence . . . 25
  2.5 Divided Differences and Fréchet-derivatives . . . 30
  2.6 Enclosing the Root of an Equation . . . 34
  2.7 Applications . . . 38
  2.8 Exercises . . . 40

3 Contractive Fixed Point Theory . . . 47
  3.1 Fixed Points . . . 47
  3.2 Applications . . . 49
  3.3 Extending the contraction mapping theorem . . . 50
  3.4 Stationary and nonstationary projection-iteration methods . . . 54
  3.5 A Special iteration method . . . 64
  3.6 Stability of the Ishikawa iteration . . . 68
  3.7 Exercises . . . 79

4 Solving Smooth Equations . . . 87
  4.1 Linearization of Equations . . . 87
  4.2 The Newton-Kantorovich Method . . . 88
  4.3 Local Convergence . . . 95
  4.4 Convergence under Twice-Fréchet Differentiability Only at a Point . . . 97
  4.5 Exercises . . . 104

5 Newton-like methods . . . 121
  5.1 Two-point methods . . . 121
  5.2 Relaxing the convergence conditions . . . 144
  5.3 A fast method . . . 151
  5.4 Exercises . . . 166

6 More Results on Newton's Method . . . 187
  6.1 Convergence radii for the Newton-Kantorovich method . . . 187
  6.2 Convergence under weak continuous derivative . . . 198
  6.3 Convergence and universal constants . . . 204
  6.4 A Deformed Method . . . 213
  6.5 The Inverse Function Theorem . . . 217
  6.6 Exercises . . . 227

7 Equations with Nonsmooth Operators . . . 245
  7.1 Convergence of Newton-Kantorovich Method. Case 1: Lipschitzian Assumptions . . . 246
  7.2 Convergence of Newton-Kantorovich Method. Case 2: Hölderian Assumptions . . . 256
  7.3 Convergence of Newton-Kantorovich Method. Case 3: Local Lipschitzian Assumptions . . . 263
  7.4 Convergence of the Secant Method Using Majorizing Sequences . . . 270
  7.5 Exercises . . . 279

8 Applications of the weaker version of the Newton-Kantorovich theorem . . . 287
  8.1 Comparison with Moore Theorem . . . 287
  8.2 Miranda Theorem . . . 291
  8.3 Computation of Orbits . . . 297
  8.4 Continuation Methods . . . 300
  8.5 Trust Regions . . . 305
  8.6 Convergence on a Cone . . . 316
  8.7 Convergence on Lie Groups . . . 329
  8.8 Finite Element Analysis and Discretizations . . . 336
  8.9 Exercises . . . 341

9 The Newton-Kantorovich Theorem and Mathematical Programming . . . 353
  9.1 Case 1: Interior Point Methods . . . 353
  9.2 Case 2: LP Methods . . . 360
  9.3 Variational inequalities on finite dimensional spaces . . . 368
  9.4 Exercises . . . 377

10 Generalized equations . . . 379
  10.1 Strongly regular solutions . . . 379
  10.2 Quasi Methods . . . 387
  10.3 Variational Inequalities under Weak Lipschitz Conditions . . . 393
  10.4 Exercises . . . 400

11 Monotone convergence of point-to-set mappings . . . 409
  11.1 General Theorems . . . 409
  11.2 Convergence in linear spaces . . . 410
  11.3 Applications . . . 415
  11.4 Exercises . . . 417

12 Fixed points of point-to-set mappings . . . 419
  12.1 General theorems . . . 419
  12.2 Fixed points . . . 421
  12.3 Applications . . . 422
  12.4 Exercises . . . 426

13 Special topics . . . 429
  13.1 Newton-Mysovskikh conditions . . . 429
  13.2 A conjugate gradient solver for the normal equations . . . 437
  13.3 A finer mesh independence principle for Newton's method . . . 449
  13.4 Exercises . . . 456

Bibliography . . . 457

A Glossary of Symbols . . . 483

Index . . . 485
Chapter 1
Linear Spaces

The basic background for solving equations is introduced here.
1.1
Linear Operators
Some mathematical operations have certain properties in common. These properties are given in the following definition. Definition 1.1.1 An operator T which maps a linear space X into a linear space Y over the same scalar field S is said to be additive if T (x + y) = T (x) + T (y),
for all x, y ∈ X,
and homogeneous if T (sx) = sT (x),
for all x ∈ X, s ∈ S.
An operator that is additive and homogeneous is called a linear operator. Many examples of linear operators exist.

Example 1.1.1 Define an operator T from a linear space X into itself by T(x) = sx, s ∈ S. Then T is a linear operator.

Example 1.1.2 The operator D = d/dt mapping X = C¹[0, 1] into Y = C[0, 1], given by

D(x) = dx/dt = y(t), 0 ≤ t ≤ 1,

is linear.
If X and Y are linear spaces over the same scalar field S, then the set L(X, Y ) containing all linear operators from X into Y is a linear space over S if addition is defined by (T1 + T2 )(x) = T1 (x) + T2 (x),
for all x ∈ X,
and scalar multiplication by (sT )(x) = s(T (x)),
for all x ∈ X, s ∈ S.
We may also consider linear operators B mapping X into L(X, Y). For an x ∈ X we have B(x) = T, a linear operator from X into Y. Hence, we have B(x1, x2) = (B(x1))(x2) = y ∈ Y. B is called a bilinear operator from X into Y. The linear operators B from X into L(X, Y) form a linear space L(X, L(X, Y)). This process can be repeated to generate j-linear operators (j > 1 an integer).

Definition 1.1.2 A linear operator mapping a linear space X into its scalar field S is called a linear functional on X.

Definition 1.1.3 An operator Q mapping a linear space X into a linear space Y is said to be nonlinear if it is not a linear operator from X into Y.
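A minimal sketch of ours (with an arbitrary 2 × 2 matrix, not an example from the text) checking the two defining properties of a linear operator, additivity and homogeneity, numerically:

```python
# A fixed matrix [[2, 1], [1, -3]] acting on R^2 is a linear operator;
# the matrix entries are arbitrary illustrative choices.
def T(x):
    a, b = x
    return (2.0 * a + b, a - 3.0 * b)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(s, x):
    return (s * x[0], s * x[1])

x, y, s = (1.0, -2.0), (0.5, 4.0), 3.0
additive = T(add(x, y)) == add(T(x), T(y))       # T(x + y) = T(x) + T(y)
homogeneous = T(scale(s, x)) == scale(s, T(x))   # T(sx) = sT(x)
```

For matrices over the reals both properties hold exactly (up to floating-point rounding, which is exact here because the entries are small).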
1.2
Continuous Linear Operators
Some metric concepts of importance are introduced here.

Definition 1.2.1 An operator F from a Banach space X into a Banach space Y is continuous at x = x∗ if

lim_{n→∞} ‖xn − x∗‖_X = 0  ⟹  lim_{n→∞} ‖F(xn) − F(x∗)‖_Y = 0.

Theorem 1.2.2 If a linear operator T from a Banach space X into a Banach space Y is continuous at x∗ = 0, then it is continuous at every point x of the space X.

Proof. We have T(0) = 0, and from lim_{n→∞} ‖xn‖ = 0 we get lim_{n→∞} ‖T(xn)‖ = 0. If a sequence {xn} (n ≥ 0) converges to x∗ in X, then setting yn = xn − x∗ gives lim_{n→∞} ‖yn‖ = 0. By hypothesis this implies that

lim_{n→∞} ‖T(yn)‖ = lim_{n→∞} ‖T(xn − x∗)‖ = lim_{n→∞} ‖T(xn) − T(x∗)‖ = 0.
Definition 1.2.3 An operator F from a Banach space X into a Banach space Y is bounded on the set A in X if there exists a constant c < ∞ such that

‖F(x)‖ ≤ c‖x‖, for all x ∈ A.

The greatest lower bound (infimum) of the numbers c satisfying the above inequality is called the bound of F on A. An operator which is bounded on an open ball U(z, r) = {x ∈ X | ‖x − z‖ < r} is continuous at z. It turns out that for linear operators the converse is also true.

Theorem 1.2.4 A continuous linear operator T from a Banach space X into a Banach space Y is bounded on X.

Proof. By the continuity of T at 0 there exists ε > 0 such that ‖T(z)‖ < 1 whenever ‖z‖ < ε. For 0 ≠ z ∈ X and any scalar c with |c| < ε/‖z‖ we have ‖cz‖ < ε, so ‖T(cz)‖ = |c| · ‖T(z)‖ < 1. Letting |c| → ε/‖z‖, we conclude that

‖T(z)‖ ≤ (1/ε)‖z‖,  (1.2.1)

that is, the operator T is bounded on X.

The bound on X of a linear operator T, denoted by ‖T‖_X or simply ‖T‖, is called the norm of T. As in Theorem 1.2.4 we get

‖T‖ = sup_{‖x‖=1} ‖T(x)‖.  (1.2.2)

Hence, for any bounded linear operator T,

‖T(x)‖ ≤ ‖T‖ · ‖x‖, for all x ∈ X.  (1.2.3)

From now on, L(X, Y) denotes the set of all bounded linear operators from a Banach space X into another Banach space Y. It also follows immediately that L(X, Y) is a linear space if equipped with the rules of addition and scalar multiplication introduced in Section 1.1. The proof of the following result is left as an exercise (see also [129]).

Theorem 1.2.5 The set L(X, Y) is a Banach space for the norm (1.2.2).
1.3
Equations
In a Banach space X, solving a linear equation can be stated as follows: given a bounded linear operator T mapping X into itself and some y ∈ X, find an x ∈ X such that

T(x) = y.  (1.3.1)

The point x (if it exists) is called a solution of Equation (1.3.1).
Definition 1.3.1 If T is a bounded linear operator in X and a bounded linear operator T₁ exists such that

T₁T = TT₁ = I,  (1.3.2)

where I is the identity operator in X (i.e., I(x) = x for all x ∈ X), then T₁ is called the inverse of T, and we write T₁ = T⁻¹. That is,

T⁻¹T = TT⁻¹ = I.  (1.3.3)

If T⁻¹ exists, then Equation (1.3.1) has the unique solution

x = T⁻¹(y).  (1.3.4)
The proof of the following result is left as an exercise (see also [147], [211], [214]).

Theorem 1.3.2 (Banach Lemma on Invertible Operators) If T is a bounded linear operator in X, then T⁻¹ exists if and only if there is a bounded linear operator P in X such that P⁻¹ exists and

‖I − PT‖ < 1.  (1.3.5)

If T⁻¹ exists, then

T⁻¹ = Σ_{n=0}^{∞} (I − PT)^n P  (Neumann series)  (1.3.6)

and

‖T⁻¹‖ ≤ ‖P‖ / (1 − ‖I − PT‖).  (1.3.7)
Based on Theorem 1.3.2 we can immediately introduce a computational theory for Equation (1.3.1) composed of three factors:

(A) Existence and Uniqueness. Under the hypotheses of Theorem 1.3.2, Equation (1.3.1) has a unique solution x∗.

(B) Approximation. The iteration

x_{n+1} = P(y) + (I − PT)(x_n)  (n ≥ 0)  (1.3.8)

gives a sequence {x_n} (n ≥ 0) of successive approximations, which converges to x∗ for any initial guess x₀ ∈ X.

(C) Error Bounds. The speed of convergence of the iteration {x_n} (n ≥ 0) to x∗ is governed by the estimate

‖x_n − x∗‖ ≤ (‖I − PT‖^n / (1 − ‖I − PT‖)) ‖P(y)‖ + ‖I − PT‖^n ‖x₀‖.  (1.3.9)
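The successive-approximation scheme (1.3.8) can be sketched concretely for a 2 × 2 system; the operator T, the preconditioner P (here the inverse of the diagonal part of T, chosen so that ‖I − PT‖ < 1), and all numbers are illustrative choices of ours:

```python
# Solve T x = y in R^2 via the successive approximations (1.3.8):
#   x_{n+1} = P(y) + (I - P T)(x_n),
# with P an approximate inverse of T satisfying ||I - P T|| < 1.

def T(x):
    return (2.0 * x[0] + 0.5 * x[1], 0.3 * x[0] + 3.0 * x[1])

def P(x):
    # Inverse of the diagonal part diag(2, 3) of T; then
    # I - P T = [[0, -0.25], [-0.1, 0]], whose norm is well below 1.
    return (x[0] / 2.0, x[1] / 3.0)

y = (1.0, 2.0)
x = (0.0, 0.0)                       # any initial guess x0 works (Theorem 1.3.2)
for _ in range(100):
    Py = P(y)
    PTx = P(T(x))
    x = (Py[0] + x[0] - PTx[0], Py[1] + x[1] - PTx[1])

residual = max(abs(T(x)[0] - y[0]), abs(T(x)[1] - y[1]))
```

Since ‖I − PT‖ ≈ 0.16 here, estimate (1.3.9) predicts the error shrinks by roughly that factor per step, and after 100 steps the residual is at machine precision.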
1.4

The Inverse of a Linear Operator
Let T be a bounded linear operator in X. One way to obtain an approximate inverse is to make use of an operator sufficiently close to T.

Theorem 1.4.1 If T is a bounded linear operator in X, then T⁻¹ exists if and only if there is a bounded linear operator P₁ in X such that P₁⁻¹ exists and

‖P₁ − T‖ < ‖P₁⁻¹‖⁻¹.  (1.4.1)

If T⁻¹ exists, then

T⁻¹ = Σ_{n=0}^{∞} (I − P₁⁻¹T)^n P₁⁻¹  (1.4.2)

and

‖T⁻¹‖ ≤ ‖P₁⁻¹‖ / (1 − ‖I − P₁⁻¹T‖) ≤ ‖P₁⁻¹‖ / (1 − ‖P₁⁻¹‖ · ‖P₁ − T‖).  (1.4.3)

Proof. Let P = P₁⁻¹ in Theorem 1.3.2 and note that by (1.4.1)

‖I − P₁⁻¹T‖ = ‖P₁⁻¹(P₁ − T)‖ ≤ ‖P₁⁻¹‖ · ‖P₁ − T‖ < 1.  (1.4.4)

That is, (1.3.5) is satisfied. The bounds (1.4.3) follow from (1.3.7) and (1.4.4). That proves the sufficiency. The necessity is proved by setting P₁ = T, if T⁻¹ exists.

The following result is equivalent to Theorem 1.3.2.
Theorem 1.4.2 A bounded linear operator T in a Banach space X has an inverse T⁻¹ if and only if linear operators P, P⁻¹ exist such that the series

Σ_{n=0}^{∞} (I − PT)^n P  (1.4.5)

converges. In this case we have

T⁻¹ = Σ_{n=0}^{∞} (I − PT)^n P.

Proof. If series (1.4.5) converges, then it converges to T⁻¹ (see Theorem 1.3.2). The existence of P, P⁻¹ and the convergence of series (1.4.5) are again established as in Theorem 1.3.2, by taking P = T⁻¹, when it exists.

Definition 1.4.3 A linear operator N in a Banach space X is said to be nilpotent if

N^m = 0, for some positive integer m.  (1.4.6)
Theorem 1.4.4 A bounded linear operator T in a Banach space X has an inverse T⁻¹ if and only if there exist linear operators P, P⁻¹ such that I − PT is nilpotent.

Proof. If P, P⁻¹ exist and I − PT is nilpotent, then the series

Σ_{n=0}^{∞} (I − PT)^n P = Σ_{n=0}^{m−1} (I − PT)^n P

converges to T⁻¹ by Theorem 1.4.2. Moreover, if T⁻¹ exists, then P = T⁻¹, P⁻¹ = T exists, and I − PT = I − T⁻¹T = 0 is nilpotent.
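When I − PT is nilpotent the Neumann series (1.4.5) truncates to a finite sum and yields T⁻¹ exactly; a small sketch of ours with P = I and an upper triangular 2 × 2 matrix:

```python
# Take P = I and T = [[1, 1], [0, 1]]. Then N = I - T = [[0, -1], [0, 0]]
# satisfies N^2 = 0 (nilpotent with m = 2 in (1.4.6)), so the Neumann
# series (1.4.5) reduces to the finite sum I + N, which is T^{-1} exactly.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1.0, 0.0], [0.0, 1.0]]
T = [[1.0, 1.0], [0.0, 1.0]]
N = [[I[i][j] - T[i][j] for j in range(2)] for i in range(2)]  # N = I - T

nilpotent = matmul(N, N) == [[0.0, 0.0], [0.0, 0.0]]

# T^{-1} = sum_{n=0}^{m-1} N^n P = I + N  (finite Neumann sum, P = I)
T_inv = [[I[i][j] + N[i][j] for j in range(2)] for i in range(2)]
```

Here T⁻¹ = [[1, −1], [0, 1]], and multiplying back gives the identity, with no truncation error because the series is finite.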
1.5
Fréchet Derivatives
The computational techniques to be considered later make use of the derivative in the sense of Fréchet [211], [212], [264].

Definition 1.5.1 Let F be an operator mapping a Banach space X into a Banach space Y. If there exists a bounded linear operator L from X into Y such that

lim_{‖Δx‖→0} ‖F(x₀ + Δx) − F(x₀) − L(Δx)‖ / ‖Δx‖ = 0,  (1.5.1)

then F is said to be Fréchet differentiable at x₀, and the bounded linear operator

F′(x₀) = L  (1.5.2)

is called the first Fréchet derivative of F at x₀. The limit in (1.5.1) is supposed to hold independently of the way Δx approaches 0. Moreover, the Fréchet differential

δF(x₀, Δx) = F′(x₀)Δx  (1.5.3)

is an arbitrarily close approximation to the difference F(x₀ + Δx) − F(x₀) relative to ‖Δx‖, for ‖Δx‖ small.

If F₁ and F₂ are differentiable at x₀, then

(F₁ + F₂)′(x₀) = F₁′(x₀) + F₂′(x₀).  (1.5.4)

Moreover, if F₂ is an operator from a Banach space X into a Banach space Z, and F₁ is an operator from Z into a Banach space Y, their composition F₁ ∘ F₂ is defined by

(F₁ ∘ F₂)(x) = F₁(F₂(x)), for all x ∈ X.  (1.5.5)
It follows from Definition 1.5.1 that F₁ ∘ F₂ is differentiable at x₀ if F₂ is differentiable at x₀ and F₁ is differentiable at the point F₂(x₀) of Z, with the chain rule

(F₁ ∘ F₂)′(x₀) = F₁′(F₂(x₀)) F₂′(x₀).  (1.5.6)
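The chain rule (1.5.6) can be checked numerically in the simplest one-dimensional setting, where the Fréchet derivative reduces to the ordinary derivative; the particular functions F₁, F₂ below are illustrative choices of ours:

```python
import math

# Numerical check of the chain rule (1.5.6) with F1(z) = z**2 and
# F2(x) = sin(x): compare the analytic product F1'(F2(x0)) * F2'(x0)
# against a central finite-difference quotient of the composition.

F2 = math.sin
F1 = lambda z: z * z
composition = lambda x: F1(F2(x))

x0, h = 0.7, 1e-6
analytic = 2.0 * F2(x0) * math.cos(x0)   # F1'(F2(x0)) * F2'(x0)
numeric = (composition(x0 + h) - composition(x0 - h)) / (2.0 * h)
```

The central difference agrees with the analytic value to within O(h²) truncation error plus rounding, i.e. far below 10⁻⁸ here.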
In order to differentiate an operator F we write

F(x₀ + Δx) − F(x₀) = L(x₀, Δx)Δx + η(x₀, Δx),  (1.5.7)

where L(x₀, Δx) is a bounded linear operator for given x₀, Δx with

lim_{‖Δx‖→0} L(x₀, Δx) = L,  (1.5.8)

and

lim_{‖Δx‖→0} ‖η(x₀, Δx)‖ / ‖Δx‖ = 0.  (1.5.9)

Estimates (1.5.8) and (1.5.9) give

lim_{‖Δx‖→0} L(x₀, Δx) = F′(x₀).  (1.5.10)

If L(x₀, Δx) is a continuous function of Δx in some ball U(0, R) (R > 0), then

L(x₀, 0) = F′(x₀).  (1.5.11)
Higher-order derivatives can be defined by induction.

Definition 1.5.2 If F is (m − 1)-times Fréchet differentiable (m ≥ 2 an integer), and an m-linear operator A from X into Y exists such that

lim_{‖Δx‖→0} ‖F^(m−1)(x₀ + Δx) − F^(m−1)(x₀) − A(Δx)‖ / ‖Δx‖ = 0,  (1.5.12)

then A is called the m-th Fréchet derivative of F at x₀, and we write

A = F^(m)(x₀).  (1.5.13)
Higher partial derivatives in product spaces can be defined as follows. Define

Xij = L(Xj, Xi),  (1.5.14)

where X1, X2, . . . are Banach spaces and L(Xj, Xi) is the space of bounded linear operators from Xj into Xi. The elements of Xij are denoted by Lij, etc. Similarly,

Xijm = L(Xm, Xij) = L(Xm, L(Xj, Xi))  (1.5.15)
denotes the space of bounded bilinear operators from Xm into Xij. Finally, we write

Xij1j2···jm = L(Xjm, Xij1j2···jm−1),  (1.5.16)

which denotes the space of bounded linear operators from Xjm into Xij1j2···jm−1. The elements A = Aij1j2···jm of Xij1j2···jm are a generalization of m-linear operators [10], [54].

Consider an operator Fi from the space

X = ∏_{p=1}^n Xjp  (1.5.17)

into Xi, and assume that Fi has partial derivatives of orders 1, 2, . . . , m − 1 in some ball U(x0, R), where R > 0 and

x0 = (xj1^(0), xj2^(0), . . . , xjn^(0)) ∈ X.  (1.5.18)

For simplicity and without loss of generality we renumber the original spaces so that

j1 = 1, j2 = 2, . . . , jn = n.  (1.5.19)

Hence, we write

x0 = (x1^(0), x2^(0), . . . , xn^(0)).  (1.5.20)

A partial derivative of order (m − 1) of Fi at x0 is an operator

Aiq1q2···qm−1 = ∂^(m−1) Fi(x0) / (∂xq1 ∂xq2 · · · ∂xqm−1)  (1.5.21)

(in Xiq1q2···qm−1), where

1 ≤ q1, q2, . . . , qm−1 ≤ n.  (1.5.22)

Let P denote the operator from Xqm into Xiq1q2···qm−1 obtained from (1.5.21) by fixing

xj = xj^(0), j ≠ qm,  (1.5.23)

for some qm, 1 ≤ qm ≤ n. Moreover, if

P′(xqm^(0)) = ∂/∂xqm ( ∂^(m−1) Fi(x0) / (∂xq1 ∂xq2 · · · ∂xqm−1) ) = ∂^m Fi(x0) / (∂xq1 · · · ∂xqm)  (1.5.24)

exists, it will be called the partial Fréchet-derivative of order m of Fi with respect to xq1, . . . , xqm at x0.

Furthermore, if Fi is m-times Fréchet-differentiable at x0, then

[∂^m Fi(x0) / (∂xq1 · · · ∂xqm)] xq1 · · · xqm = [∂^m Fi(x0) / (∂xs1 ∂xs2 · · · ∂xsm)] xs1 · · · xsm  (1.5.25)

for any permutation s1, s2, . . . , sm of the integers q1, q2, . . . , qm and any choice of points xq1, . . . , xqm from Xq1, . . . , Xqm, respectively. Hence, if F = (F1, . . . , Ft) is an operator from X = X1 × X2 × · · · × Xn into Y = Y1 × Y2 × · · · × Yt, then

F^(m)(x0) = ( ∂^m Fi / (∂xj1 · · · ∂xjm) |_{x=x0} ),  (1.5.26)

i = 1, 2, . . . , t, j1, j2, . . . , jm = 1, 2, . . . , n, is called the m-Fréchet derivative of F at x0 = (x1^(0), x2^(0), . . . , xn^(0)).
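The symmetry (1.5.25) of mixed partial derivatives can be observed numerically; the scalar field f below is a hypothetical example, with derivatives approximated by nested central differences.

```python
def f(x1, x2):
    # Smooth scalar field on R^2 (hypothetical example, not from the text);
    # its exact mixed partial is 2*x1 + 3*x2**2.
    return x1 ** 2 * x2 + x1 * x2 ** 3

def d2(g, x, i, j, h=1e-4):
    # Approximate d/dx_i ( dg/dx_j ) at x by nested central differences.
    def d1(g, x, k, h):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        return (g(*xp) - g(*xm)) / (2 * h)
    return d1(lambda *y: d1(g, list(y), j, h), x, i, h)

x = [0.3, 0.8]
a = d2(f, x, 0, 1)   # d^2 f / dx1 dx2
b = d2(f, x, 1, 0)   # d^2 f / dx2 dx1
print(abs(a - b))    # equal up to discretization error, as in (1.5.25)
```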
1.6 Integration
In this section we state results concerning the mean value theorem, Taylor's theorem, and Riemann integration. The proofs are left as exercises.

The mean value theorem for differentiable real functions f,

f(b) − f(a) = f′(c)(b − a),  (1.6.1)

where c ∈ (a, b), does not hold in a Banach space setting. However, if F is a differentiable operator between two Banach spaces X and Y, then

‖F(x) − F(y)‖ ≤ sup_{x̄∈L(x,y)} ‖F′(x̄)‖ · ‖x − y‖,  (1.6.2)

where

L(x, y) = {z : z = λy + (1 − λ)x, 0 ≤ λ ≤ 1}.  (1.6.3)
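A scalar sanity check of (1.6.2) (an illustration added here, with the hypothetical choice F = sin on R, where the segment L(x, y) is just the interval between x and y):

```python
import math

# Mean value inequality (1.6.2) for F(x) = sin x:
# |F(x) - F(y)| <= sup_{z in L(x,y)} |F'(z)| * |x - y|.
F, dF = math.sin, math.cos
x, y = 0.2, 1.5
# Approximate the sup of |F'| over the segment by sampling it.
sup_dF = max(abs(dF(x + k / 1000 * (y - x))) for k in range(1001))
lhs = abs(F(x) - F(y))
rhs = sup_dF * abs(x - y)
print(lhs <= rhs)  # True
```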
Set

z(λ) = λy + (1 − λ)x, 0 ≤ λ ≤ 1,  (1.6.4)

and

F(λ) = F(z(λ)) = F(λy + (1 − λ)x).  (1.6.5)
Divide the interval 0 ≤ λ ≤ 1 into n subintervals of lengths ∆λi, i = 1, 2, . . . , n, choose points λi inside the corresponding subintervals and, as for the real Riemann integral, consider the sums

Σ_σ F(λi)∆λi = Σ_{i=1}^n F(λi)∆λi,  (1.6.6)

where σ is the partition of the interval, and set

|σ| = max_i ∆λi.  (1.6.7)
Definition 1.6.1 If

S = lim_{|σ|→0} Σ_σ F(λi)∆λi  (1.6.8)

exists, then it is called the Riemann integral of F(λ) from 0 to 1, denoted by

S = ∫_0^1 F(λ) dλ = ∫_x^y F(λ) dλ.  (1.6.9)
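The sums (1.6.6) can be computed directly; the following sketch (with the hypothetical map F(u) = (e^u, u²) on R, x = 0, y = 1) shows them converging to the integral (1.6.8), which in this case is (e − 1, 1/3):

```python
import math

# Riemann sums (1.6.6) for F(lam) = F(z(lam)), z(lam) = lam*y + (1-lam)*x,
# with the illustrative choices F(u) = (e^u, u^2), x = 0, y = 1.
def riemann_sum(n):
    s = [0.0, 0.0]
    for i in range(n):
        lam = (i + 0.5) / n               # midpoints of a uniform partition
        u = lam * 1.0 + (1 - lam) * 0.0   # z(lam)
        s[0] += math.exp(u) / n
        s[1] += u ** 2 / n
    return s

S = riemann_sum(20000)
print(S)  # close to (e - 1, 1/3)
```

Refining the partition (larger n, so |σ| → 0) drives the sums toward S, as in Definition 1.6.1.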
Definition 1.6.2 A bounded operator P(λ) on [0, 1] such that its set of points of discontinuity has measure zero is said to be integrable on [0, 1].

We now state the famous Taylor theorem [171].

Theorem 1.6.3 If F is m-times Fréchet-differentiable in U(x0, R), R > 0, and F^(m)(x) is integrable from x to any y ∈ U(x0, R), then

F(y) = F(x) + Σ_{n=1}^{m−1} (1/n!) F^(n)(x)(y − x)^n + Rm(x, y),  (1.6.10)

‖F(y) − Σ_{n=0}^{m−1} (1/n!) F^(n)(x)(y − x)^n‖ ≤ (‖y − x‖^m / m!) sup_{x̄∈L(x,y)} ‖F^(m)(x̄)‖,  (1.6.11)

where

Rm(x, y) = ∫_0^1 [(1 − λ)^{m−1} / (m − 1)!] F^(m)(λy + (1 − λ)x)(y − x)^m dλ.  (1.6.12)
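In the scalar case Taylor's formula (1.6.10) with the integral remainder (1.6.12) can be verified directly; the example below (F(u) = e^u, x = 0, y = 1, m = 3, chosen for illustration) evaluates the remainder by a midpoint rule:

```python
import math

# F(y) = sum_{n=0}^{m-1} F^(n)(x)(y-x)^n / n! + R_m(x, y), with
# R_3(0, 1) = int_0^1 (1 - lam)^2 / 2! * e^lam dlam for F(u) = e^u.
m = 3
taylor = sum(1.0 / math.factorial(n) for n in range(m))   # partial sum = 2.5
N = 20000
R = 0.0
for i in range(N):
    lam = (i + 0.5) / N                                   # midpoint rule
    R += (1 - lam) ** (m - 1) / math.factorial(m - 1) * math.exp(lam) / N
print(abs(math.exp(1.0) - (taylor + R)))  # essentially zero
```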
1.7 Exercises
1.1. Show that the operators introduced in Examples 1.1.1 and 1.1.2 are indeed linear.

1.2. Show that the Laplace operator

∆ = ∂²/∂x1² + ∂²/∂x2² + ∂²/∂x3²

is a linear operator mapping the space of real functions x = x(x1, x2, x3) with continuous second derivatives on some subset D of R³ into the space of continuous real functions on D.

1.3. Define T : C′′[0, 1] × C′[0, 1] → C[0, 1] by

T(x, y) = α d²x/dt² + β dy/dt, 0 ≤ t ≤ 1.

Show that T is a linear operator.
1.4. In an inner product space with inner product ⟨·, ·⟩, show that for any fixed z in the space T(x) = ⟨x, z⟩ is a linear functional.

1.5. Show that an additive operator T from a real Banach space X into a real Banach space Y is homogeneous if it is continuous.

1.6. Show that the matrix A = {aij}, i, j = 1, 2, . . . , n, has an inverse if

|aii| > (1/2) Σ_{j=1}^n |aij| > 0, i = 1, 2, . . . , n.

1.7. Show that the linear integral equation of the second Fredholm kind in C[0, 1],

x(s) − λ ∫_0^1 K(s, t)x(t)dt = y(s), 0 ≤ s ≤ 1,

where K(s, t) is continuous on 0 ≤ s, t ≤ 1, has a unique solution x(s) for each y(s) ∈ C[0, 1] if

|λ| < ( max_{s∈[0,1]} ∫_0^1 |K(s, t)|dt )^{−1}.

1.8. Prove Theorem 1.2.5.

1.9. Prove Theorem 1.3.2.

1.10. Show that the operators defined below are all linear:

(a) Identity operator. The identity operator IX : X → X given by IX(x) = x, for all x ∈ X.

(b) Zero operator. The zero operator O : X → Y given by O(x) = 0, for all x ∈ X.

(c) Integration. T : C[a, b] → C[a, b] given by T(x(t)) = ∫_0^1 x(s)ds, t ∈ [a, b].

(d) Differentiation. Let X be the vector space of all polynomials on [a, b]. Define T on X by T(x(t)) = x′(t).

(e) Vector algebra. The cross product with one factor kept fixed defines T1 : R³ → R³. Similarly, the dot product with one fixed factor defines T2 : R³ → R.

(f) Matrices. A real matrix A = {aij} with m rows and n columns defines T : Rⁿ → Rᵐ given by y = Ax.
1.11. Let T be a linear operator. Show:
(a) the range R(T) of T is a vector space;
(b) if dim D(T) = n < ∞, then dim R(T) ≤ n;
(c) the null space N(T) is a vector space.

1.12. Let X, Y be vector spaces, both real or both complex, and let T : D(T) → Y be a linear operator with domain D(T) ⊆ X and R(T) ⊆ Y. Show:
(a) the inverse T⁻¹ : R(T) → D(T) exists if and only if T(x) = 0 ⇒ x = 0;
(b) if T⁻¹ exists, it is a linear operator;
(c) if dim D(T) = n < ∞ and T⁻¹ exists, then dim R(T) = dim D(T).

1.13. Let T : X → Y, S : Y → Z be bijective linear operators, where X, Y, Z are vector spaces. Show that the inverse (ST)⁻¹ : Z → X of the product ST exists, and (ST)⁻¹ = T⁻¹S⁻¹.

1.14. If the product (composite) of two linear operators exists, show that it is linear.

1.15. Let X be the vector space of all complex 2 × 2 matrices and define T : X → X by T(x) = cx, where c ∈ X is fixed and cx denotes the usual product of matrices. Show that T is linear. Under what conditions does T⁻¹ exist?

1.16. Let T : X → Y be a linear operator with dim X = dim Y = n < ∞. Show that R(T) = Y if and only if T⁻¹ exists.

1.17. Define the integral operator T : C[0, 1] → C[0, 1] by y = T(x), where y(t) = ∫_0^1 k(t, s)x(s)ds and k is continuous on [0, 1] × [0, 1]. Show that T is linear and bounded.

1.18. Show that the operator T defined in Exercise 1.10(f) is bounded.

1.19. If a normed space X is finite dimensional, show that every linear functional on X is bounded.

1.20. Let T : D(T) → Y be a linear operator, where D(T) ⊆ X and X, Y are normed spaces. Show:
(a) T is continuous if and only if it is bounded;
(b) if T is continuous at a single point, it is continuous.

1.21. Let T be a bounded linear operator. Show:
(a) xn → x (where xn, x ∈ D(T)) ⇒ T(xn) → T(x);
(b) the null space N(T) is closed.

1.22. If T ≠ 0 is a bounded linear operator, show that for any x ∈ D(T) such that ‖x‖ < 1, we have ‖T(x)‖ < ‖T‖.

1.23. Show that the operator T : ℓ∞ → ℓ∞ defined by y = (yi) = T(x), yi = xi/i, x = (xi), is linear and bounded.

1.24. Let T : C[0, 1] → C[0, 1] be defined by

y(t) = ∫_0^t x(s)ds.

Find R(T) and T⁻¹ : R(T) → C[0, 1]. Is T⁻¹ linear and bounded?

1.25. Show that the functionals defined on C[a, b] by

f1(x) = ∫_a^b x(t)y0(t)dt (y0 ∈ C[a, b]),
f2(x) = c1 x(a) + c2 x(b) (c1, c2 fixed)

are linear and bounded.

1.26. Find the norm of the linear functional f defined on C[−1, 1] by

f(x) = ∫_{−1}^0 x(t)dt − ∫_0^1 x(t)dt.

1.27. Show that

f1(x) = max_{t∈J} x(t), f2(x) = min_{t∈J} x(t), J = [a, b],

define functionals on C[a, b]. Are they linear? Bounded?

1.28. Show that a function can be additive and not homogeneous. For example, let z = x + iy denote a complex number, and let T : C → C be given by T(z) = z̄ = x − iy.
1.29. Show that a function can be homogeneous and not additive. For example, consider the operator T : R² → R given by T((x1, x2)) = x1²/x2.

1.30. Let F be an operator in C[0, 1] defined by

F(x)(s) = x(s) ∫_0^1 [s/(s + t)] x(t)dt, 0 ≤ s ≤ 1.

Show that for x0, z ∈ C[0, 1]

F′(x0)z = x0(s) ∫_0^1 [s/(s + t)] z(t)dt + z(s) ∫_0^1 [s/(s + t)] x0(t)dt.

1.31. Find the Fréchet-derivative of the operator F in R² given by

F(x, y) = (x² + 7x + 2xy − 3, x + y³)ᵀ.

1.32. Find the first and second Fréchet-derivatives of the Uryson operator

U(x)(s) = ∫_0^1 k(s, t, x(t))dt

in C[0, 1] at x0 = x0(s).

1.33. Find the Fréchet-derivative of the Riccati differential operator

R(z) = dz/dt + p(t)z² + q(t)z + r(t),

from C′[0, s] into C[0, s] at z0 = z0(t) in C′[0, s].

1.34. Find the first two Fréchet-derivatives of the operator

F(x, y) = (x² + y² − 3, x sin y)ᵀ in R².

1.35. Consider the partial differential operator

F(x) = ∆x − x²

from C²(I) into C(I), the space of all continuous functions on the square 0 ≤ α, β ≤ 1. Show that

F′(x0)z = ∆z(α, β) − 2x0(α, β)z(α, β),

where ∆ is the usual Laplace operator.

1.36. Let F(L) = L³ in L(X). Show:

F′(L0) = L0[·]L0 + L0²[·] + [·]L0².

1.37. Let F(L) = L⁻¹ in L(X). Show:

F′(L0) = −L0⁻¹[·]L0⁻¹,

provided that L0⁻¹ exists.

1.38. Show estimates (1.6.1) and (1.6.2).

1.39. Prove Taylor's Theorem 1.6.3.

1.40. Integrate the operator F(L) = L⁻¹ in L(X) from L0 = I to L1 = A, where ‖I − A‖ < 1.
Chapter 2

Monotone convergence

This chapter introduces the fundamentals of the theory of divided differences of a nonlinear operator. Several results are also provided involving divided differences as well as Fréchet derivatives satisfying Lipschitz or monotone-type conditions that will be used later. Results on monotone convergence are discussed here and also in Chapters 11 and 12.
2.1 Partially Ordered Topological Spaces
Let X be a linear space. We introduce the following definition:

Definition 2.1.1 A partially ordered topological linear space (POTL-space) is a locally convex topological linear space X which has a closed proper convex cone. A proper convex cone is a subset K such that K + K ⊂ K, αK ⊂ K for α > 0, and K ∩ (−K) = {0}. Thus the order relation ≤, defined by x ≤ y if and only if y − x ∈ K, gives a partial ordering which is compatible with the linear structure of the space. The cone K which defines the ordering is called the positive cone, since K = {x ∈ X | x ≥ 0}. The fact that K is closed implies also that intervals [a, b] = {z ∈ X | a ≤ z ≤ b} are closed sets.

Example 2.1.1 Some simple examples of POTL-spaces are:

(1) X = Eⁿ, n-dimensional Euclidean space, with K = {(x1, x2, ..., xn) ∈ Eⁿ | xi ≥ 0, i = 1, 2, ..., n};

(2) X = Eⁿ with K = {(x1, x2, ..., xn) ∈ Eⁿ | xi ≥ 0, i = 1, 2, ..., n − 1, xn = 0};
(3) X = C[0, 1], continuous functions, maximum norm topology, pointwise ordering;

(4) X = Cⁿ[0, 1], n-times continuously differentiable functions with

‖f‖ = Σ_{k=0}^n max |f^(k)(t)|,

and pointwise ordering;

(5) X = Lp[0, 1], 1 ≤ p ≤ ∞, usual topology, K = {f ∈ Lp[0, 1] | f(t) ≥ 0 a.e.}.

Remark 2.1.1 Using the above examples, it is easy to see that the closedness of the positive cone is not, in general, a strong enough connection between the ordering and the topology. Consider, for example, the following properties of sequences of real numbers:

(1) x1 ≤ x2 ≤ · · · and sup{xn} = x∗ imply lim_{n→∞} xn = x∗;

(2) lim_{n→∞} xn = 0 implies that there exists a sequence {yn} with y1 ≥ y2 ≥ · · · ≥ 0, inf{yn} = 0 and −yn ≤ xn ≤ yn;

(3) 0 ≤ xn ≤ yn and lim_{n→∞} yn = 0 imply lim_{n→∞} xn = 0.

Unfortunately, these statements are not true for all POTL-spaces:

(a) In X = C[0, 1] let xn(t) = −tⁿ. Then x1 ≤ x2 ≤ · · · ≤ 0 and sup{xn} = 0, but ‖xn‖ = 1 for all n, so lim_{n→∞} xn does not exist. Hence (1) does not hold.

(b) In X = L1[0, 1] let xn(t) = n for 1/(n + 1) ≤ t ≤ 1/n and zero elsewhere. Then lim_{n→∞} ‖xn‖ = 0, but clearly property (2) does not hold.

(c) In X = C¹[0, 1] let xn(t) = tⁿ/n and yn(t) = 1/n. Then 0 ≤ xn ≤ yn and lim_{n→∞} yn = 0, but ‖xn‖ = max |tⁿ/n| + max |tⁿ⁻¹| = 1/n + 1 > 1; hence xn does not converge to zero. Hence (3) does not hold.
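Counterexample (a) can be observed numerically (an illustrative sketch added here, evaluating the maximum norm on a grid):

```python
# In C[0,1], x_n(t) = -t^n is increasing with pointwise supremum 0,
# yet the maximum norm ||x_n|| stays equal to 1 for every n (the value
# at t = 1), so the sequence cannot converge to its supremum in norm.
def sup_norm(f, N=1000):
    # Grid approximation of the maximum norm on C[0,1].
    return max(abs(f(i / N)) for i in range(N + 1))

norms = [sup_norm(lambda t, n=n: -(t ** n)) for n in (1, 10, 100)]
print(norms)  # [1.0, 1.0, 1.0] -- no convergence to the supremum 0
```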
We now briefly discuss certain types of POTL-spaces in which some of the above statements are true.

Definition 2.1.2 A POTL-space is called regular if every order-bounded increasing sequence has a limit.

Remark 2.1.2 Examples of regular POTL-spaces are Eⁿ and Lp, 1 ≤ p < ∞, whereas C[0, 1], Cⁿ[0, 1] and L∞[0, 1] are not regular, as was shown in (a) of the above remark. If {xn}, n ≥ 0, is a monotone increasing sequence and lim_{n→∞} xn = x∗ exists, then for any k0, n ≥ k0 implies xn ≥ xk0. Hence x∗ = lim_{n→∞} xn ≥ xk0, i.e.,
x∗ is an upper bound of {xn}, n ≥ 0. Moreover, if y is any other upper bound, then xn ≤ y, and hence x∗ = lim_{n→∞} xn ≤ y, i.e., x∗ = sup{xn}. This shows that in any POTL-space, the closedness of the positive cone guarantees that, if a monotone increasing sequence has a limit, then this limit is also a supremum. In a regular space the converse is true; i.e., if a monotone increasing sequence has a supremum, then it also has a limit. It is important to note that the definition of regularity involves both an order concept (monotone boundedness) and a topological concept (limit).

Definition 2.1.3 A POTL-space is called normal if, given a local base U for the topology, there exists a positive number η so that if 0 ≤ x ∈ V ∈ U then [0, x] ⊆ ηV.

Remark 2.1.3 If the topology of a POTL-space is given by a norm, then this space is called a partially ordered normed space (PON-space). If a PON-space is complete with respect to its topology, then it is called a partially ordered Banach space (POB-space). According to Definition 2.1.3, a PON-space is normal if and only if there exists a positive number α such that

‖x‖ ≤ α‖y‖ for all x, y ∈ X with 0 ≤ x ≤ y.

Let us note that any regular POB-space is normal. The converse is not true. For example, the space C[0, 1], ordered by the cone of nonnegative functions, is normal but not regular. All finite dimensional POTL-spaces are both normal and regular.

Remark 2.1.4 Let us now define some special types of operators acting between two POTL-spaces. First we introduce some notation: if X and Y are two linear spaces, then we denote by (X, Y) the set of all operators from X into Y and by L(X, Y) the set of all linear operators from X into Y. If X and Y are topological linear spaces, then we denote by LB(X, Y) the set of all continuous linear operators from X into Y. For simplicity the spaces L(X, X) and LB(X, X) will be denoted by L(X) and LB(X).

Now let X and Y be two POTL-spaces and consider an operator G ∈ (X, Y). G is called isotone (resp. antitone) if x ≥ y implies G(x) ≥ G(y) (resp. G(x) ≤ G(y)). G is called nonnegative if x ≥ 0 implies G(x) ≥ 0. For linear operators nonnegativity is clearly equivalent to isotony. Also, a linear operator is inverse nonnegative if and only if it is invertible and its inverse is nonnegative. If G is a nonnegative operator then we write G ≥ 0. If G and H are two operators from X into Y such that H − G is nonnegative, then we write G ≤ H. If Z is a linear space, then we denote by I = I_Z the identity operator in Z (i.e., I(x) = x for all x ∈ Z). If Z is a POTL-space then obviously I ≥ 0.

Suppose that X and Y are two POTL-spaces and consider the operators T ∈ L(X, Y) and S ∈ L(Y, X). If ST ≤ I_X (resp. ST ≥ I_X) then S is called a left subinverse (resp. left superinverse) of T, and T is called a right subinverse (resp. right superinverse) of S. We say that S is a subinverse of T if S is both a left and a right subinverse of T.
We finally end this section by noting that for the theory of partially ordered linear spaces the reader may consult M.A. Krasnosel’skii [210]–[213], Vandergraft [332] or Argyros and Szidarovszky [99].
2.2 Divided Differences in a Linear Space
The concept of a divided difference of a nonlinear operator generalizes the usual notion of a divided difference of a scalar function, in the same way in which the Fréchet-derivative generalizes the notion of the derivative of a function.

Definition 2.2.1 Let F be a nonlinear operator defined on a subset D of a linear space X with values in a linear space Y, i.e., F ∈ (D, Y), and let x, y be two points of D. A linear operator from X into Y, denoted [x, y], which satisfies the condition

[x, y](x − y) = F(x) − F(y)  (2.2.1)

is called a divided difference of F at the points x and y.

Remark 2.2.1 If X and Y are topological linear spaces then we shall always assume the continuity of the linear operator [x, y]. (Generally, [x, y] ∈ L(X, Y); if X, Y are POTL-spaces then [x, y] ∈ LB(X, Y).) Obviously, condition (2.2.1) does not uniquely determine the divided difference, with the exception of the case when X is one-dimensional. An operator [·, ·] : D × D → L(X, Y) satisfying (2.2.1) is called a divided difference of F on D. If we fix the first variable, we get an operator

[x0, ·] : D → L(X, Y).  (2.2.2)

Let x1, x2 be two points of D. A divided difference of the operator (2.2.2) at the points x1, x2 will be called a divided difference of the second order of F at the points x0, x1, x2 and will be denoted by [x0, x1, x2]. We have by definition

[x0, x1, x2](x1 − x2) = [x0, x1] − [x0, x2].  (2.2.3)

Obviously, [x0, x1, x2] ∈ L(X, L(X, Y)).

Let us now state a well-known result due to Kantorovich concerning the location of fixed points, which will be used extensively later.

Theorem 2.2.2 Let X be a regular POTL-space and let x, y be two points of X such that x ≤ y. If H : [x, y] → X is a continuous isotone operator having the property that x ≤ H(x) and y ≥ H(y), then there exists a point z ∈ [x, y] such that H(z) = z.
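Theorem 2.2.2 can be illustrated in the simplest regular POTL-space X = R; the map H below is a hypothetical isotone operator on [0, 2] (not from the text), and iterating from both endpoints produces monotone sequences converging to a fixed point:

```python
# H(t) = (t + 2)/3 is isotone on [0, 2], with H(0) >= 0 and H(2) <= 2,
# so Theorem 2.2.2 guarantees a fixed point z in [0, 2] (here z = 1).
H = lambda t: (t + 2.0) / 3.0
lo, hi = 0.0, 2.0
for _ in range(50):
    lo, hi = H(lo), H(hi)   # lo increases, hi decreases
print(lo, hi)               # both approach the fixed point 1.0
```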
2.3 Divided Difference in a Banach Space
In this section we will assume that X and Y are Banach spaces. Accordingly we shall have [x, y] ∈ LB(X, Y), [x, y, z] ∈ LB(X, LB(X, Y)). As we will see in later chapters, most convergence theorems in a Banach space require that the divided differences of F satisfy Lipschitz conditions of the form

‖[x, y] − [x, z]‖ ≤ c0 ‖y − z‖,  (2.3.1)
‖[y, x] − [z, x]‖ ≤ c1 ‖y − z‖,  (2.3.2)
‖[x, y, z] − [u, y, z]‖ ≤ c2 ‖x − u‖,  (2.3.3)

for all x, y, z, u ∈ D. It is a simple exercise to show that if [·, ·] is a divided difference of F satisfying (2.3.1) or (2.3.2) then F is Fréchet-differentiable on D and we have

F′(x) = [x, x] for all x ∈ D.  (2.3.4)

Moreover, if (2.3.1) and (2.3.2) are both satisfied, then the Fréchet derivative F′ is Lipschitz continuous on D with Lipschitz constant c0 + c1.

At the end of this section we shall give an example of divided differences of the first and of the second order in the finite dimensional case. We shall consider the space R^q equipped with the Chebyshev norm, which is given by

‖x‖ = max {|xi| : 1 ≤ i ≤ q} for x = (x1, x2, ..., xq) ∈ R^q.  (2.3.5)

It follows that the norm of a linear operator L ∈ LB(R^q) represented by the matrix with entries lij is given by

‖L‖ = max { Σ_{j=1}^q |lij| : 1 ≤ i ≤ q }.  (2.3.6)

We cannot give a simple formula for the norm of a bilinear operator. However, if B is a bilinear operator with entries bijk, then we have the estimate

‖B‖ ≤ max { Σ_{j=1}^q Σ_{k=1}^q |bijk| : 1 ≤ i ≤ q }.  (2.3.7)
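A small numerical sketch of (2.3.5) and (2.3.6) (the matrix entries below are arbitrary illustrative values):

```python
# Chebyshev norm (2.3.5) and the induced matrix norm (2.3.6), which is
# the maximum absolute row sum, checked on an example in R^3.
def cheb_norm(x):
    return max(abs(xi) for xi in x)

def op_norm(L):
    return max(sum(abs(e) for e in row) for row in L)

L = [[1.0, -2.0, 0.5],
     [0.0,  3.0, -1.0],
     [0.25, 0.25, 0.25]]
x = [1.0, -1.0, 1.0]
Lx = [sum(L[i][j] * x[j] for j in range(3)) for i in range(3)]
print(op_norm(L))                                   # 4.0
print(cheb_norm(Lx) <= op_norm(L) * cheb_norm(x))   # True: ||Lx|| <= ||L|| ||x||
```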
Let U be an open ball of R^q and let F be an operator defined on U with values in R^q. We denote by f1, ..., fq the components of F. For each x ∈ U we have

F(x) = (f1(x), ..., fq(x))ᵀ.  (2.3.8)

Moreover, we introduce the notation

Dj fi(x) = ∂fi(x)/∂xj,  Dkj fi(x) = ∂²fi(x)/(∂xj ∂xk).  (2.3.9)
Let x, y be two points of U and let us denote by [x, y] the matrix with entries

[x, y]ij = (1/(xj − yj)) (fi(x1, ..., xj, yj+1, ..., yq) − fi(x1, ..., xj−1, yj, ..., yq)).  (2.3.10)

The linear operator [x, y] ∈ LB(R^q) defined in this way obviously satisfies condition (2.2.1). If the partial derivatives Dj fi satisfy Lipschitz conditions of the form

|Dj fi(x1, ..., xk + t, ..., xq) − Dj fi(x1, ..., xk, ..., xq)| ≤ p^i_{jk} |t|,  (2.3.11)

then conditions (2.3.1) and (2.3.2) will be satisfied with

c0 = max { Σ_{j=1}^q ( (1/2) p^i_{jj} + Σ_{k=j+1}^q p^i_{jk} ) : 1 ≤ i ≤ q }  (2.3.12)

and

c1 = max { Σ_{j=1}^q ( (1/2) p^i_{jj} + Σ_{k=1}^{j−1} p^i_{jk} ) : 1 ≤ i ≤ q }.  (2.3.13)

We shall prove (2.3.1) only, since (2.3.2) can be proved similarly. Let x, y, z be three points of U. We have in turn

[x, y]ij − [x, z]ij = Σ_{k=1}^q { [x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij }  (2.3.14)
by (2.3.10). If k < j then we have

[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij
= (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, ..., zq)}
− (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, ..., zq)} = 0.

For k = j we have

[x, (y1, ..., yj, zj+1, ..., zq)]ij − [x, (y1, ..., yj−1, zj, ..., zq)]ij
= (1/(xj − yj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, yj, zj+1, ..., zq)}
− (1/(xj − zj)) {fi(x1, ..., xj, zj+1, ..., zq) − fi(x1, ..., xj−1, zj, zj+1, ..., zq)}
= ∫_0^1 {Dj fi(x1, ..., xj−1, yj + t(xj − yj), zj+1, ..., zq) − Dj fi(x1, ..., xj−1, zj + t(xj − zj), zj+1, ..., zq)} dt,

so that by (2.3.11)

|[x, (y1, ..., yj, zj+1, ..., zq)]ij − [x, (y1, ..., yj−1, zj, ..., zq)]ij| ≤ p^i_{jj} ∫_0^1 (1 − t)|yj − zj| dt = (1/2)|yj − zj| p^i_{jj}.

Finally, for k > j, using (2.3.10) and (2.3.14) again, we have

[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij
= (1/(xj − yj)) {fi(x1, ..., xj, yj+1, ..., yk, zk+1, ..., zq) − fi(x1, ..., xj−1, yj, ..., yk, zk+1, ..., zq)
− fi(x1, ..., xj, yj+1, ..., yk−1, zk, ..., zq) + fi(x1, ..., xj−1, yj, ..., yk−1, zk, ..., zq)}
= ∫_0^1 {Dj fi(x1, ..., xj−1, yj + t(xj − yj), yj+1, ..., yk, zk+1, ..., zq)
− Dj fi(x1, ..., xj−1, yj + t(xj − yj), yj+1, ..., yk−1, zk, ..., zq)} dt,

whence by (2.3.11)

|[x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij| ≤ |yk − zk| p^i_{jk}.

By adding all the above we get

|[x, y]ij − [x, z]ij| ≤ (1/2)|yj − zj| p^i_{jj} + Σ_{k=j+1}^q |yk − zk| p^i_{jk} ≤ ‖y − z‖ ( (1/2) p^i_{jj} + Σ_{k=j+1}^q p^i_{jk} ).
Consequently, condition (2.3.1) is satisfied with c0 given by (2.3.12). If each fi has continuous second order partial derivatives which are bounded on U, we can take

p^i_{jk} = sup {|Djk fi(x)| : x ∈ U}.

In this case p^i_{jk} = p^i_{kj}, so that c0 = c1.

Moreover, consider again three points x, y, z of U. Similarly to (2.3.14), the second divided difference of F at x, y, z is the bilinear operator defined by

[x, y, z]ijk = (1/(yk − zk)) { [x, (y1, ..., yk, zk+1, ..., zq)]ij − [x, (y1, ..., yk−1, zk, ..., zq)]ij }.  (2.3.15)

It is easy to see, as before, that [x, y, z]ijk = 0 for k < j. For k = j we have

[x, y, z]ijj = [xj, yj, zj]t fi(x1, ..., xj−1, t, zj+1, ..., zq),  (2.3.16)

where the right-hand side of (2.3.16) represents the divided difference of fi(x1, ..., xj−1, t, zj+1, ..., zq), as a function of t, at the points xj, yj, zj. Using Genocchi's
integral representation of divided differences of scalar functions, we get

[x, y, z]ijj = ∫_0^1 ∫_0^1 t Djj fi(x1, ..., xj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq) ds dt.  (2.3.17)

Hence, for k > j we obtain in turn

[x, y, z]ijk = (1/((yk − zk)(xj − yj))) {fi(x1, ..., xj, yj+1, ..., yk, zk+1, ..., zq)
− fi(x1, ..., xj, yj+1, ..., yk−1, zk, ..., zq)
− fi(x1, ..., xj−1, yj, ..., yk, zk+1, ..., zq)
+ fi(x1, ..., xj−1, yj, ..., yk−1, zk, ..., zq)}
= (1/(xj − yj)) ∫_0^1 {Dk fi(x1, ..., xj, yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dk fi(x1, ..., xj−1, yj, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} dt
= ∫_0^1 ∫_0^1 Dkj fi(x1, ..., xj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq) ds dt.  (2.3.18)

We now want to show that if

|Dkj fi(v1, ..., vm + t, ..., vq) − Dkj fi(v1, ..., vm, ..., vq)| ≤ q^{ij}_{km} |t|
for all v = (v1, ..., vq) ∈ U, 1 ≤ i, j, k, m ≤ q,  (2.3.19)

then the divided difference of F of the second order defined by (2.3.15) satisfies condition (2.3.3) with the constant

c2 = max_{1≤i≤q} Σ_{j=1}^q ( (1/6) q^{ij}_{jj} + (1/2) Σ_{m=1}^{j−1} q^{ij}_{jm} + (1/2) Σ_{k=j+1}^q q^{ij}_{kj} + Σ_{k=j+1}^q Σ_{m=1}^{j−1} q^{ij}_{km} ).  (2.3.20)
Let u, x, y, z be four points of U. Then using (2.3.15) we easily have

[x, y, z]ijk − [u, y, z]ijk = Σ_{m=1}^q { [(x1, ..., xm, um+1, ..., uq), y, z]ijk − [(x1, ..., xm−1, um, ..., uq), y, z]ijk }.  (2.3.21)
If m > j the terms in (2.3.21) vanish, so that using (2.3.18) and (2.3.19) we deduce that for k > j

[x, y, z]ijk − [u, y, z]ijk
= Σ_{m=1}^{j−1} ∫_0^1 ∫_0^1 {Dkj fi(x1, ..., xm, um+1, ..., uj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dkj fi(x1, ..., xm−1, um, ..., uj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} ds dt
+ ∫_0^1 ∫_0^1 {Dkj fi(x1, ..., xj−1, yj + s(xj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)
− Dkj fi(x1, ..., xj−1, yj + s(uj − yj), yj+1, ..., yk−1, zk + t(yk − zk), zk+1, ..., zq)} ds dt,

whence

|[x, y, z]ijk − [u, y, z]ijk| ≤ (1/2)|xj − uj| q^{ij}_{kj} + Σ_{m=1}^{j−1} |xm − um| q^{ij}_{km}.

Similarly, for k = j we obtain in turn

[x, y, z]ijj − [u, y, z]ijj
= ∫_0^1 ∫_0^1 t {Djj fi(x1, ..., xj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)
− Djj fi(x1, ..., xj−1, uj + t(yj − uj) + ts(zj − yj), zj+1, ..., zq)} ds dt
+ Σ_{m=1}^{j−1} ∫_0^1 ∫_0^1 t {Djj fi(x1, ..., xm, um+1, ..., uj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)
− Djj fi(x1, ..., xm−1, um, ..., uj−1, xj + t(yj − xj) + ts(zj − yj), zj+1, ..., zq)} ds dt,

whence

|[x, y, z]ijj − [u, y, z]ijj| ≤ (1/6)|xj − uj| q^{ij}_{jj} + (1/2) Σ_{m=1}^{j−1} |xm − um| q^{ij}_{jm}.
Finally, using the estimate (2.3.7) for the norm of a bilinear operator, we deduce that condition (2.3.3) holds with c2 given by (2.3.20).
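The first-order divided difference (2.3.10) is straightforward to implement; the following sketch (with a hypothetical map F on R²) also checks the defining identity (2.2.1), [x, y](x − y) = F(x) − F(y):

```python
# Divided difference matrix (2.3.10) for F = (f1, f2): R^2 -> R^2.
def F(v):
    # Hypothetical example map (not from the text).
    return [v[0] ** 2 + v[1], v[0] * v[1]]

def divdiff(x, y):
    q = len(x)
    A = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            hi = x[:j + 1] + y[j + 1:]   # (x1,...,xj, y_{j+1},...,yq)
            lo = x[:j] + y[j:]           # (x1,...,x_{j-1}, yj,...,yq)
            A[i][j] = (F(hi)[i] - F(lo)[i]) / (x[j] - y[j])
    return A

x, y = [2.0, 1.0], [0.5, -1.0]
A = divdiff(x, y)
lhs = [sum(A[i][j] * (x[j] - y[j]) for j in range(2)) for i in range(2)]
rhs = [F(x)[i] - F(y)[i] for i in range(2)]
print(lhs, rhs)  # the two sides coincide, as required by (2.2.1)
```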
2.4 Divided Differences and Monotone Convergence
In this section we introduce the problem of approximating a locally unique solution x∗ of the nonlinear operator equation F(x) = 0 in a POTL-space. In particular, consider an operator F : D ⊆ X → Y, where X is a POTL-space
with values in a POTL-space Y. Let x0, y0, y−1 be three points of D such that

x0 ≤ y0 ≤ y−1, [x0, y−1] ⊆ D,

and denote

D1 = {(x, y) ∈ X² | x0 ≤ x ≤ y ≤ y0}, D2 = {(y, y−1) ∈ X² | x0 ≤ y ≤ y0}, D3 = D1 ∪ D2.  (2.4.1)

Assume there exist operators A0 : D3 → LB(X, Y), A : D1 → L(X, Y) such that:

(a) F(y) − F(x) ≤ A0(w, z)(y − x) for all (x, y), (y, w) ∈ D1, (w, z) ∈ D3;  (2.4.2)

(b) the linear operator A0(u, v) has a continuous nonsingular nonnegative left subinverse;

(c) F(y) − F(x) ≥ A(x, y)(y − x) for all (x, y) ∈ D1;  (2.4.3)

(d) the linear operator A(x, y) has a nonnegative left superinverse for each (x, y) ∈ D1, and

F(y) − F(x) ≤ A0(y, z)(y − x) for all (x, y) ∈ D1, (y, z) ∈ D3.  (2.4.4)

Moreover, let us define the approximations

F(yn) + A0(yn, yn−1)(yn+1 − yn) = 0,  (2.4.5)
F(xn) + A0(yn, yn−1)(xn+1 − xn) = 0,  (2.4.6)
yn+1 = yn − Bn F(yn), n ≥ 0,  (2.4.7)
xn+1 = xn − Bn^1 F(xn), n ≥ 0,  (2.4.8)

where Bn and Bn^1 are nonnegative subinverses of A0(yn, yn−1), n ≥ 0.

Under very natural conditions, hypotheses of the form (2.4.2), (2.4.3) or (2.4.4) have been used extensively to show that the approximations (2.4.5) and (2.4.6), or (2.4.7) and (2.4.8), generate two sequences {xn}, {yn} such that

x0 ≤ x1 ≤ · · · ≤ xn ≤ xn+1 ≤ yn+1 ≤ yn ≤ · · · ≤ y1 ≤ y0,  (2.4.9)

lim_{n→∞} xn = x∗ = y∗ = lim_{n→∞} yn, and F(x∗) = 0.  (2.4.10)
For a complete survey of these results we refer to the works of [279] and Argyros and Szidarovszky [97]–[99]. Here we will use similar conditions (i.e., like (2.4.2), (2.4.3), (2.4.4)) for two-point approximations of the form (2.4.5) and (2.4.6) or (2.4.7) and (2.4.8). Consequently, a discussion must follow on the possible choices of the linear operators A0 and A.

Remark 2.4.1 Let us now consider an operator F : D ⊆ X → Y, where X, Y are both POTL-spaces. The operator F is called order-convex on an interval [x0, y0] ⊆ D if

F(λx + (1 − λ)y) ≤ λF(x) + (1 − λ)F(y)  (2.4.11)

for all comparable x, y ∈ [x0, y0] and λ ∈ [0, 1]. If F has a linear G-derivative F′(x) at each point of [x0, y0], then (2.4.11) holds if and only if

F′(x)(y − x) ≤ F(y) − F(x) ≤ F′(y)(y − x) for x0 ≤ x ≤ y ≤ y0.  (2.4.12)

(See, e.g., Ortega and Rheinboldt [263] or our Exercise 2.25 for the properties of the Gateaux-derivative.) Hence, for order-convex G-differentiable operators, conditions (2.4.3) and (2.4.4) are satisfied with A0(u, v) = A(u, v) = F′(u). In the unidimensional case (2.4.12) is equivalent to the isotony of the operator x → F′(x), but in general the latter property is stronger. Assuming the isotony of the operator x → F′(x), it follows that

F(y) − F(x) ≤ F′(w)(y − x) for x0 ≤ x ≤ y ≤ w ≤ y0.

Hence, in this case condition (2.4.2) is satisfied with A0(w, z) = F′(w). The above observations show how to choose A and A0 for single- or two-step Newton methods. We note that the iterative algorithm (2.4.5)–(2.4.8) with A0(u, v) = F′(u) is the algorithm proposed by Fourier in 1818 in the unidimensional case and extended by Baluev in 1952 to the general case. The idea of using an algorithm of the form (2.4.7)–(2.4.8) goes back to Slugin [311]. In Ortega and Rheinboldt [263] it is shown that with Bn properly chosen, (2.4.7) reduces to a general Newton-SOR algorithm. In particular, suppose (in the finite dimensional case) that F′(yn) is an M-matrix and let

F′(yn) = Dn − Ln − Un

be the partition of F′(yn) into its diagonal, strictly lower- and strictly upper-triangular parts, respectively, for all n ≥ 0. Consider an integer mn ≥ 1, a real parameter wn ∈ (0, 1], and denote

Pn = wn⁻¹(Dn − wn Ln), Qn = wn⁻¹[(1 − wn)Dn + wn Un], Hn = Pn⁻¹Qn,  (2.4.13)

Bn = (I + Hn + · · · + Hn^{mn−1}) Pn⁻¹.  (2.4.14)
It can easily be seen that Bn, n ≥ 0, is a nonnegative subinverse of F′(yn) (see also [99]).

If f : [a, b] → R is a real function of a real variable, then f is (order) convex if and only if

(f(x) − f(y))/(x − y) ≤ (f(u) − f(v))/(u − v)

for all x, y, u, v ∈ [a, b] such that x ≤ u, y ≤ v, x ≠ y and u ≠ v. This fact motivates the notion of convexity with respect to a divided difference, considered by J.W. Schmidt and H. Leonhardt [307]. Let F : D ⊆ X → Y be a nonlinear operator between the POTL-spaces X and Y, and assume that F has a divided difference [·, ·] on D. F is called convex with respect to the divided difference [·, ·] on D if

[x, y] ≤ [u, v] for all x, y, u, v ∈ D with x ≤ u and y ≤ v.  (2.4.15)

In the above quoted study, Schmidt and Leonhardt studied (2.4.5)–(2.4.6) with A0(u, v) = [u, v] in case the nonlinear operator F is convex with respect to [·, ·]. Their result was extended by N. Schneider [308], who assumed instead of (2.2.1) the milder condition

[u, v](u − v) ≥ F(u) − F(v) for all comparable u, v ∈ D.  (2.4.16)

An operator [·, ·] : D × D → L(X, Y) satisfying (2.4.16) is called a generalized divided difference of F on D. If both (2.4.15) and (2.4.16) are satisfied, then we say that F is convex with respect to the generalized divided difference [·, ·]. It is easily seen that if (2.4.15) and (2.4.16) are satisfied on D = [x0, y−1], then conditions (2.4.2) and (2.4.3) are satisfied with A = A0 = [·, ·]. Indeed, for x0 ≤ x ≤ y ≤ w ≤ z ≤ y−1 we have

[x, y](y − x) ≤ F(y) − F(x) ≤ [y, x](y − x) ≤ [w, z](y − x).

Moreover, general secant-SOR methods are obtained in case the generalized divided difference [yn, yn−1] is an M-matrix and Bn, n ≥ 0, is computed according to (2.4.13) and (2.4.14), where [yn, yn−1] = Dn − Ln − Un, n ≥ 0, is the partition of [yn, yn−1] into its diagonal, strictly lower- and strictly upper-triangular parts.

We remark that an operator which is convex with respect to a generalized divided difference is also order-convex. To see this, consider x, y ∈ D, x ≤ y, λ ∈ [0, 1], and set z = λx + (1 − λ)y. Observing that y − x = (1 − λ)⁻¹(z − x) = λ⁻¹(y − z) and applying (2.4.16), we have in turn:

(1 − λ)⁻¹(F(z) − F(x)) ≤ (1 − λ)⁻¹[z, x](z − x)
= [z, x](y − x) ≤ [z, y](y − x)
= λ⁻¹[z, y](y − z) ≤ λ⁻¹(F(y) − F(z)).

By the first and last terms we deduce that F(z) ≤ λF(x) + (1 − λ)F(y). Thus, Schneider's result can be applied only to order-convex operators, and its importance
resides in the fact that the use of a generalized divided difference instead of the G-derivative may be more advantageous from a numerical point of view. We note, however, that conditions (2.4.2) and (2.4.3) do not necessarily imply convexity. For example, f may be a real function of a real variable such that

inf_{x,y ∈ [x0,y0]} (f(x) − f(y))/(x − y) = m > 0,   sup_{x,y ∈ [x0,y0]} (f(x) − f(y))/(x − y) = M < ∞.

Assume there exists a constant c > 0 such that

‖[x, y] − [x1, y1]‖ ≤ c (‖x − x1‖ + ‖y − y1‖)   (2.5.4)
for all x, y, x1, y1 ∈ D with x ≠ y and x1 ≠ y1. We say in this case that F has a Lipschitz continuous divided difference on D. This condition allows us to extend by continuity the operator (x, y) → [x, y] to the whole Cartesian product D × D. From (1.2.1) and (2.5.4) it follows that F is Fréchet-differentiable on D and that [x, x] = F′(x). It also follows that

‖F′(x) − F′(y)‖ ≤ c1 ‖x − y‖ with c1 = 2c   (2.5.5)

and

‖[x, y] − F′(z)‖ ≤ c (‖x − z‖ + ‖y − z‖)   (2.5.6)

for all x, y, z ∈ D. Conversely, if we assume that F is Fréchet-differentiable on D and that its Fréchet derivative satisfies (2.5.5), then it follows that F has a Lipschitz continuous divided difference on D. We can certainly take

[x, y] = ∫_0^1 F′(x + t (y − x)) dt.   (2.5.7)
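In one dimension, (2.5.7) can be checked directly: the divided difference (F(y) − F(x))/(y − x) is the average of F′ along the segment. The choice F(x) = eˣ and the midpoint quadrature rule below are assumptions for illustration.

```python
import math

# [x, y] = (F(y) - F(x))/(y - x) should equal integral_0^1 F'(x + t(y - x)) dt.
# Here F = F' = exp; the integral is approximated by the composite midpoint rule.
def mean_derivative(fprime, x, y, n=20000):
    h = 1.0 / n
    return sum(fprime(x + (i + 0.5) * h * (y - x)) for i in range(n)) * h

x, y = 0.3, 1.1
dd = (math.exp(y) - math.exp(x)) / (y - x)
approx = mean_derivative(math.exp, x, y)
assert abs(dd - approx) < 1e-7
print(dd, approx)
```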
We now want to give the definition of the second Fréchet derivative of F. We must first introduce the definition of bounded multilinear operators (which will also be used later).

Definition 2.5.3 Let X and Y be two Banach spaces. An operator A : X^n → Y (n ∈ N) will be called an n-linear operator from X to Y if the following conditions are satisfied:
(a) The operator (x1, ..., xn) → A (x1, ..., xn) is linear in each variable xk, k = 1, 2, ..., n.
(b) There exists a constant c such that

‖A (x1, x2, ..., xn)‖ ≤ c ‖x1‖ · · · ‖xn‖.   (2.5.8)

The norm of a bounded n-linear operator can be defined by the formula

‖A‖ = sup {‖A (x1, ..., xn)‖ : ‖x1‖ = · · · = ‖xn‖ = 1}.   (2.5.9)

Set LB^(1) (X, Y) = LB (X, Y) and define recursively

LB^(k+1) (X, Y) = LB (X, LB^(k) (X, Y)), k ≥ 1.   (2.5.10)
In this way we obtain a sequence of Banach spaces LB^(n) (X, Y) (n ≥ 1). Every A ∈ LB^(n) (X, Y) can be viewed as a bounded n-linear operator if one takes

A (x1, ..., xn) = (... ((A (x1)) (x2)) ...) (xn).   (2.5.11)

In the right-hand side of (2.5.11) we have A (x1) ∈ LB^(n−1) (X, Y), (A (x1)) (x2) ∈ LB^(n−2) (X, Y), etc. Conversely, any bounded n-linear operator A from X to Y can be interpreted as an element of LB^(n) (X, Y). Moreover, the norm of A as a bounded n-linear operator coincides with its norm as an element of the space LB^(n) (X, Y). Thus we may identify this space with the space of all bounded n-linear operators from X to Y. In the sequel we will identify A (x, x, ..., x) = Ax^n, and A (x1) (x2) ... (xn) = A (x1, x2, ..., xn) = Ax1 x2 ... xn.

Let us now consider a nonlinear operator F : D ⊆ X → Y, where D is open. Suppose that F is Fréchet differentiable on D. Then we may consider the operator F′ : D → LB (X, Y) which associates to each point x the Fréchet derivative of F at x. If the operator F′ is Fréchet differentiable at a point x0 ∈ D, then we say that F is twice Fréchet differentiable at x0. The Fréchet derivative of F′ at x0 will be denoted by F″(x0) and will be called the second Fréchet derivative of F at x0. Note that F″(x0) ∈ LB^(2) (X, Y). Similarly we can define Fréchet derivatives of higher order. Finally, by analogy with (2.5.7),

[x0, ..., xk] = ∫_0^1 · · · ∫_0^1 t1^{k−1} t2^{k−2} · · · t_{k−1} F^{(k)} (x0 + t1 (x1 − x0) + t1 t2 (x2 − x1) + · · · + t1 t2 · · · tk (xk − xk−1)) dt1 dt2 ... dtk.   (2.5.12)
It is easy to see that the multilinear operators defined above verify

[x0, ..., xk−1, xk, xk+1] (xk − xk+1) = [x0, ..., xk−1, xk] − [x0, ..., xk−1, xk+1].   (2.5.13)
We note that throughout this sequel a 2-linear operator will also be called bilinear. Finally, we will also need the definition of an n-linear symmetric operator. Given an n-linear operator A : X^n → Y and a permutation i = (i1, i2, ..., in) of the integers 1, 2, ..., n, the notation A (i) (or An (i) if we want to emphasize the n-linearity of A) can be used for the n-linear operator A (i) = An (i) such that

A (i) (x1, x2, ..., xn) = An (i) (x1, x2, ..., xn) = An (xi1, xi2, ..., xin) = An xi1 xi2 ... xin   (2.5.14)

for all x1, x2, ..., xn ∈ X. Thus, there are n! n-linear operators A (i) = An (i) associated with a given n-linear operator A = An.
Definition 2.5.4 An n-linear operator A = An : X^n → Y is said to be symmetric if

A = An = An (i)   (2.5.15)

for all i belonging to Rn, which denotes the set of all permutations of the integers 1, 2, ..., n. The symmetric n-linear operator

Ā = Ān = (1/n!) Σ_{i∈Rn} An (i)   (2.5.16)

is called the mean of A = An.

Notation 2.5.6 The notation

An x^p = An x x ... x (p times),   (2.5.18)

p ≤ n, A = An : X^n → Y, will be used for the result of applying An to x ∈ X p times. If p < n, then (2.5.18) represents an (n − p)-linear operator. For p = n, note that

Ak x^k = Āk x^k = Ak (i) x^k   (2.5.19)

for all i ∈ Rk, x ∈ X. It follows from (2.5.19) that whenever we are dealing with an expression involving n-linear operators An, we may assume that they are symmetric without loss of generality, since each An may be replaced by Ān without changing the value of the expression at hand.
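For a concrete (assumed) illustration in R², a bilinear operator can be represented by a coefficient matrix; its mean over the 2! argument permutations is the symmetric part (A + Aᵀ)/2, and on the diagonal A x² = Ā x², which is the substance of the remark above.

```python
# A bilinear operator on R^d via coefficients: A(x, y) = sum_ij A[i][j] x_i y_j.
# Its mean over the two argument permutations is (A + A^T)/2, and replacing
# A by the mean does not change the value of A x^2 = A(x, x).
def apply_bilinear(A, x, y):
    d = len(A)
    return sum(A[i][j] * x[i] * y[j] for i in range(d) for j in range(d))

def mean_bilinear(A):
    d = len(A)
    return [[(A[i][j] + A[j][i]) / 2.0 for j in range(d)] for i in range(d)]

A = [[1.0, 2.0], [0.0, 3.0]]   # not symmetric: A(x, y) != A(y, x) in general
Abar = mean_bilinear(A)
x = [0.7, -1.3]
assert abs(apply_bilinear(A, x, x) - apply_bilinear(Abar, x, x)) < 1e-12
```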
2.6
Enclosing the Root of an Equation
We consider the equation

L (x) = T (x)   (2.6.1)
where L is a linear operator and T is a nonlinear operator defined on some convex subset D of a linear space E with values in a linear space Y. We study the convergence of the iterations

L (yn+1) = T (yn) + An (y, x) (yn+1 − yn)   (2.6.2)

and

L (xn+1) = T (xn) + An (y, x) (xn+1 − xn)   (2.6.3)
to a solution x∗ of equation (2.6.1), where An (y, x), n ≥ 0, is a linear operator. If P is a linear projection operator (P² = P) that projects the space Y into YP ⊆ Y, then the operator P T will be assumed to be Fréchet differentiable on D, and its derivative P T′ (x) corresponds to an operator P B (y, x), (y, x) ∈ D × D, with P T′ (x) = P B (x, x) for all x ∈ D. We will assume that

An (y, x) = A_{PB} (yn, xn), n ≥ 0.
Iterations (2.6.2) and (2.6.3) have been studied extensively under several assumptions [68], [99], [100], [263], [277], [279], when P = L = I, the identity operator on D. However, the iterates {xn} and {yn} can rarely be computed in infinite dimensional spaces in this case. But if the space YP is finite dimensional with dim (YP) = N, then iterations (2.6.2) and (2.6.3) reduce to systems of linear algebraic equations of order at most N. This case has been studied in [100], [205] and in particular in [225]. The assumptions in [230] involve the positivity of the operators P B (y, x) − A_{PB} (y, x), Q T′ (x) with Q = I − P, and L (y) − An (y, x) y on some interval [x0, y0], which is difficult to verify. In this section we simplify the above assumptions and provide some further conditions for the convergence of iterations (2.6.2) and (2.6.3) to a solution x∗ of equation (2.6.1). We finally illustrate our results with an example. We can now formulate our main result.

Theorem 2.6.1 Let L, T : D ⊂ X → Y, where X is a regular POTL-space and Y is a POTL-space. Assume:
(a) there exist points x0, y0, y−1 ∈ D with

x0 ≤ y0 ≤ y−1,   [x0, y−1] ⊂ D,   L (x0) − T (x0) ≤ 0 ≤ L (y0) − T (y0).

Set

S1 = {(x, y) ∈ X² : x0 ≤ x ≤ y ≤ y0},
S2 = {(u, y−1) ∈ X² : x0 ≤ u ≤ y0},
S3 = S1 ∪ S2.
(b) Assume that there exists an operator A : S3 → B (X, Y) such that

(L (y) − T (y)) − (L (x) − T (x)) ≤ A (w, z) (y − x)   (2.6.4)

for all (x, y), (y, w) ∈ S2, (w, z) ∈ S3.
(c) Suppose that for any (u, v) ∈ S3 the linear operator A (u, v) has a continuous nonsingular nonnegative left subinverse.
Then there exist two sequences {xn}, {yn}, n ≥ 1, satisfying (2.6.2), (2.6.3),

L (xn) − T (xn) ≤ 0 ≤ L (yn) − T (yn),   (2.6.5)

x0 ≤ x1 ≤ · · · ≤ xn ≤ xn+1 ≤ yn+1 ≤ yn ≤ · · · ≤ y1 ≤ y0,   (2.6.6)

and

lim_{n→∞} xn = x∗,   lim_{n→∞} yn = y∗.   (2.6.7)
Moreover, if the operators An = A (yn, yn−1) are inverse nonnegative, then any solution of equation (2.6.1) from the interval [x0, y0] belongs to the interval [x∗, y∗].

Proof. Let us define the operator M : [0, y0 − x0] → X by

M (x) = x − L0 (L (x0) − T (x0) + A0 (x)),

where L0 is a continuous nonsingular nonnegative left subinverse of A0. It can easily be seen that M is isotone and continuous, with

M (0) = −L0 (L (x0) − T (x0)) ≥ 0

and

M (y0 − x0) = y0 − x0 − L0 (L (y0) − T (y0))
 + L0 [(L (y0) − T (y0)) − (L (x0) − T (x0)) − A0 (y0 − x0)]
 ≤ y0 − x0 − L0 (L (y0) − T (y0)) ≤ y0 − x0.

It now follows from the fixed point theorem 2.2.2 of L.V. Kantorovich that the operator M has a fixed point w ∈ [0, y0 − x0]. Set x1 = x0 + w to get

L (x0) − T (x0) + A0 (x1 − x0) = 0,  x0 ≤ x1 ≤ y0.

By (2.6.4), we get

L (x1) − T (x1) = (L (x1) − T (x1)) − (L (x0) − T (x0)) + A0 (x0 − x1) ≤ 0.
Let us now define the operator M1 : [0, y0 − x1] → X by

M1 (x) = x + L0 (L (y0) − T (y0) − A0 (x)).

It can easily be seen that M1 is continuous and isotone, with

M1 (0) = L0 (L (y0) − T (y0)) ≥ 0

and

M1 (y0 − x1) = y0 − x1 + L0 (L (x1) − T (x1))
 + L0 [(L (y0) − T (y0)) − (L (x1) − T (x1)) − A0 (y0 − x1)]
 ≤ y0 − x1 + L0 (L (x1) − T (x1)) ≤ y0 − x1.

As before, there exists z ∈ [0, y0 − x1] such that M1 (z) = z. Set y1 = y0 − z to get

L (y0) − T (y0) + A0 (y1 − y0) = 0,  x1 ≤ y1 ≤ y0.

But from (2.6.4) and the above,

L (y1) − T (y1) = (L (y1) − T (y1)) − (L (y0) − T (y0)) − A0 (y1 − y0) ≥ 0.

Using induction on n we can now show, following the above technique, that there exist sequences {xn}, {yn}, n ≥ 1, satisfying (2.6.2), (2.6.3), (2.6.5) and (2.6.6). Since the space X is regular, by (2.6.6) there exist x∗, y∗ ∈ X satisfying (2.6.7), with x∗ ≤ y∗. Let x0 ≤ z ≤ y0 and L (z) − T (z) = 0; then we get

A0 (y1 − z) = A0 (y0) − (L (y0) − T (y0)) − A0 (z)
 = A0 (y0 − z) − [(L (y0) − T (y0)) − (L (z) − T (z))] ≥ 0

and

A0 (x1 − z) = A0 (x0) − (L (x0) − T (x0)) − A0 (z)
 = A0 (x0 − z) − [(L (x0) − T (x0)) − (L (z) − T (z))] ≤ 0.

If A0 is inverse isotone, then x1 ≤ z ≤ y1, and by induction xn ≤ z ≤ yn. Hence x∗ ≤ z ≤ y∗. That completes the proof of the theorem.

Using (2.6.1), (2.6.2), (2.6.5), (2.6.6) and (2.6.7) we can easily prove the following theorem, which gives us natural conditions under which the points x∗ and y∗ are solutions of the equation (2.6.1).

Theorem 2.6.2 Let L − T be continuous at x∗ and y∗, and let the hypotheses of Theorem 2.6.1 be true. Assume that one of the following conditions is satisfied:
(a) x∗ = y∗;
(b) X is normal and there exists an operator H : X → Y (H (0) = 0) which has an isotone inverse continuous at the origin, and An ≤ H for sufficiently large n;
(c) Y is normal and there exists an operator G : X → Y (G (0) = 0) continuous at the origin and such that An ≤ G for sufficiently large n;
(d) the operators Ln, n ≥ 0, are equicontinuous.
Then L (x∗) − T (x∗) = L (y∗) − T (y∗) = 0. Moreover, assume that there exists an operator G1 : S1 → L (X, Y) such that G1 (x, y) has a nonnegative left superinverse for each (x, y) ∈ S1 and

L (y) − T (y) − (L (x) − T (x)) ≥ G1 (x, y) (y − x) for all (x, y) ∈ S1.

Then if (x∗, y∗) ∈ S1 and L (x∗) − T (x∗) = L (y∗) − T (y∗) = 0, then x∗ = y∗. We now complete this section with an application.
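To make the two-sided scheme concrete, here is a one-dimensional sketch. All concrete choices — L(x) = x, T(x) = x − g(x) with g(x) = x² − 2, and A_n = 1 − g′(y_n) — are assumptions for illustration, not the text's setting. Both updates then collapse to a Newton-like step whose slope is frozen at the current upper point, and the iterates enclose the root √2 monotonically, mirroring (2.6.5)-(2.6.7).

```python
# Scalar model of (2.6.2)-(2.6.3): with L(x) = x, T(x) = x - g(x) and
# A_n = 1 - g'(y_n), both iterations reduce to w <- w - g(w)/g'(y_n).
# Starting from g(x0) <= 0 <= g(y0), the lower iterates increase and the
# upper iterates decrease toward the root of g (here sqrt(2)).
def g(x):
    return x * x - 2.0

def gprime(x):
    return 2.0 * x

x, y = 1.0, 2.0                      # x0 <= x* <= y0
for _ in range(4):
    slope = gprime(y)                # slope taken at the upper point y_n
    x, y = x - g(x) / slope, y - g(y) / slope
    assert x <= y + 1e-9             # the enclosure is preserved
    assert g(x) <= 1e-9 and g(y) >= -1e-9
print(x, y)                          # both ends approach sqrt(2)
```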
2.7
Applications
Let X = Y = R^k with k = 2N. We define a projection operator PN by

(PN (v))_i = v_i for i = 1, 2, ..., N,  (PN (v))_i = 0 for i = N + 1, ..., k,  v = (v1, v2, ..., vk) ∈ X.

We consider the system of equations

v_i = f_i (v1, ..., vk),  i = 1, 2, ..., k.   (2.7.1)

Set T (v) = {f_i (v1, ..., vk)}, i = 1, 2, ..., k; then

(PN T (v))_i = f_i (v1, ..., vk) for i = 1, ..., N, and = 0 for i = N + 1, ..., k,

(PN T′ (v) u)_i = Σ_{j=1}^k f′_{ij} (v1, ..., vk) u_j for i = 1, ..., N, and = 0 for i = N + 1, ..., k,  f′_{ij} = ∂f_i/∂v_j,

(PN B (w, z) u)_i = Σ_{j=1}^k F_{ij} (w1, ..., wk, z1, ..., zk) u_j for i = 1, ..., N, and = 0 for i = N + 1, ..., k;

we write PN B (w, z) u = C_N^i (w, z) u, where F_{ij} (v1, ..., vk, v1, ..., vk) = ∂f_i (v1, ..., vk)/∂v_j. Choose

An (y, x) = C_N^i (y, x);
then iterations (2.6.2) and (2.6.3) become

y_{i,n+1} = f_i (y_{1,n}, ..., y_{k,n}) + C_N^i (y_n, x_n) (y_{i,n+1} − y_{i,n})   (2.7.2)

and

x_{i,n+1} = f_i (x_{1,n}, ..., x_{k,n}) + C_N^i (y_n, x_n) (x_{i,n+1} − x_{i,n}).   (2.7.3)
Let us assume that the determinant Dn of the above N-th order linear systems is nonzero; then (2.7.2) and (2.7.3) become

y_{i,n+1} = (1/Dn) Σ_{m=1}^N D_{im} F¹_m (y_n, x_n),  i = 1, ..., N,   (2.7.4)

y_{i,n+1} = f_i (y_{1,n}, ..., y_{k,n}),  i = N + 1, ..., k,   (2.7.5)

and

x_{i,n+1} = (1/Dn) Σ_{m=1}^N D_{im} F²_m (y_n, x_n),  i = 1, ..., N,   (2.7.6)

x_{i,n+1} = f_i (x_{1,n}, ..., x_{k,n}),  i = N + 1, ..., k,   (2.7.7)

respectively. Here D_{im} is the cofactor of the element in the i-th column and m-th row of Dn, and F^i_m (y_n, x_n), i = 1, 2, are given by

F¹_m (y_n, x_n) = f_m (y_{1,n}, ..., y_{k,n}) + Σ_{j=N+1}^k α^n_{mj} f_j (y_{1,n}, ..., y_{k,n}) − Σ_{j=1}^k α^n_{mj} y_{j,n}

and

F²_m (y_n, x_n) = f_m (x_{1,n}, ..., x_{k,n}) + Σ_{j=N+1}^k α^n_{mj} f_j (x_{1,n}, ..., x_{k,n}) − Σ_{j=1}^k α^n_{mj} x_{j,n},
where α^n_{mj} = F_{mj} (y_n, x_n). If the hypotheses of Theorems 2.6.1 and 2.6.2 are now satisfied for equation (2.7.1), then the results apply to obtain a solution x∗ of equation (2.7.1) in [x0, y0]. In particular, consider the system of differential equations

q″_i = f_i (t, q1, q2),  i = 1, 2,  0 ≤ t ≤ 1   (2.7.8)

subject to the boundary conditions

q_i (0) = d_i,  q_i (1) = e_i,  i = 1, 2.   (2.7.9)
The functions f1 and f2 are assumed to be sufficiently smooth. For the discretization, a uniform mesh

t_j = jh,  j = 0, 1, ..., N + 1,  h = 1/(N + 1),

and the corresponding central-difference approximation of the second derivatives are used. Then the discretized problem is given by

x = T (x)   (2.7.10)

with

T (x) = (B + I) (x) + h² φ (x) − b,  x ∈ R^{2N},

where

B = [ A + I    0
        0    A + I ],   φ (x) = [ φ1 (x)
                                  φ2 (x) ],

A + I = [  2  −1          0
          −1   2  −1
               ·.   ·.  −1
           0       −1    2 ]  (an N × N tridiagonal matrix),

φ_i (x) = (f_i (t_j, x_j, x_{N+j})),  j = 1, 2, ..., N,  i = 1, 2,

x ∈ R^{2N}, and b ∈ R^{2N} is the vector of boundary values, which has zero components except for b_1 = d_1, b_N = e_1, b_{N+1} = d_2, b_{2N} = e_2. That is, (2.7.10) plays the role of (2.7.1) (in vector form). As a numerical example, consider the problem (2.7.8)-(2.7.9) with

f1 (t, q1, q2) = q1² + q1 + .1 q2² − 1.2,
f2 (t, q1, q2) = .2 q1² + q2² + 2 q2 − .6,
d1 = d2 = e1 = e2 = 0.
Choose N = 49 and starting points

x_{i,0} = 0,  y_{i,0} = t_j (1 − t_j),  j = 1, ..., N.

It is trivial to check that the hypotheses of Theorem 2.6.1 are satisfied with the above values. The components of the first two iterates corresponding to t = .1 (.1) .5, computed by the procedure described above and (2.7.4)-(2.7.5), (2.7.6)-(2.7.7), are listed in the table below. The computations were performed in double precision on a PRIME-850.
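The discretized problem can also be reproduced in a few lines. The sketch below is an assumption-laden illustration (a small N, a zero starting vector, and plain successive substitution instead of the enclosure scheme (2.7.4)-(2.7.7)) of the same boundary value problem.

```python
# Solve the central-difference discretization of q1'' = f1, q2'' = f2 with
# zero boundary data: tridiag(-1, 2, -1) q_i = -h^2 phi_i(q), iterated by
# simple successive substitution (not the two-sided scheme of the text).
N = 9
h = 1.0 / (N + 1)

def f1(q1, q2): return q1 * q1 + q1 + 0.1 * q2 * q2 - 1.2
def f2(q1, q2): return 0.2 * q1 * q1 + q2 * q2 + 2.0 * q2 - 0.6

def solve_tridiag(d):
    # Thomas algorithm for tridiag(-1, 2, -1) u = d
    n = len(d)
    c, r = [0.0] * n, [0.0] * n
    c[0], r[0] = -0.5, d[0] / 2.0
    for i in range(1, n):
        m = 2.0 + c[i - 1]
        c[i] = -1.0 / m
        r[i] = (d[i] + r[i - 1]) / m
    u = [0.0] * n
    u[-1] = r[-1]
    for i in range(n - 2, -1, -1):
        u[i] = r[i] - c[i] * u[i + 1]
    return u

q1, q2 = [0.0] * N, [0.0] * N
for _ in range(50):  # Picard iteration on the two coupled blocks
    rhs1 = [-h * h * f1(q1[j], q2[j]) for j in range(N)]
    rhs2 = [-h * h * f2(q1[j], q2[j]) for j in range(N)]
    q1, q2 = solve_tridiag(rhs1), solve_tridiag(rhs2)
print(q1[N // 2], q2[N // 2])  # mid-interval values of the two components
```

The mid-interval values land in the same range as the tabulated iterates at t = .5, which is a useful cross-check of the discretization.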
2.8
Exercises
2.1. Show that the spaces defined in Examples 2.1.1 are POTL.
2.2. Show that any regular POB-space is normal but the converse is not necessarily true.
2.3. Prove Theorem 2.2.2.
The components of the first two iterates:

            t     p = 1              p = 2
  x1,p     .1     .0478993317265     .0490944353538
           .2     .0843667040291     .0866974354188
           .3     .1099518629493     .1132355483832
           .4     .1251063839064     .1290273001442
           .5     .1301240123325     .1342691068706
  x2,p     .1     .0219768501208     .0227479238400
           .2     .0384462112803     .0399528292723
           .3     .0498537074028     .0519796383151
           .4     .0565496306877     .0590905187490
           .5     .0587562590344     .0614432572165
  y1,p     .1     .0494803602542     .0490951403091
           .2     .0874507511044     .0866988216544
           .3     .1142981809478     .1132375255317
           .4     .1302974325097     .1290296859551
           .5     .1356123753407     .1342716394060
  y2,p     .1     .0235492475283     .0227486289905
           .2     .0415200498433     .0399542200344
           .3     .0541939935471     .0519816281202
           .4     .0617399319012     .0590929252230
           .5     .0642461600398     .0614458137439
2.4. Show that if (2.3.1) or (2.3.2) are satisfied then F′ (x) = [x, x] for all x ∈ D. Moreover show that if both (2.3.1) and (2.3.2) are satisfied, then F′ is Lipschitz continuous with constant c0 + c1.
2.5. Find sufficient conditions so that estimates (2.4.9) and (2.4.10) are both satisfied.
2.6. Show that Bn (n ≥ 0) in (2.4.14) is a nonnegative subinverse of F′ (yn) (n ≥ 0).
2.7. Let x0, x1, ..., xn be distinct real numbers, and let f be a given real-valued function. Show that:

[x0, x1, ..., xn] = Σ_{j=0}^n f (x_j) / g′_n (x_j)

and

[x0, x1, ..., xn] (xn − x0) = [x1, ..., xn] − [x0, ..., xn−1],

where g_n (x) = (x − x0) · · · (x − xn).
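Both identities of Exercise 2.7 are easy to check numerically; the sample f(x) = x³ and the nodes below are assumptions for illustration.

```python
# Verify the Lagrange form and the recurrence for divided differences
# using f(x) = x**3 and the nodes 0, 1, 3.
def dd(f, nodes):
    """Recursive definition of the divided difference f[x0, ..., xn]."""
    if len(nodes) == 1:
        return f(nodes[0])
    return (dd(f, nodes[1:]) - dd(f, nodes[:-1])) / (nodes[-1] - nodes[0])

def gprime(nodes, j):
    """g_n'(x_j) where g_n(x) = (x - x0)...(x - xn)."""
    p = 1.0
    for i, xi in enumerate(nodes):
        if i != j:
            p *= nodes[j] - xi
    return p

f = lambda x: x ** 3
nodes = [0.0, 1.0, 3.0]

lagrange_form = sum(f(xj) / gprime(nodes, j) for j, xj in enumerate(nodes))
assert abs(dd(f, nodes) - lagrange_form) < 1e-12

lhs = dd(f, nodes) * (nodes[-1] - nodes[0])
rhs = dd(f, nodes[1:]) - dd(f, nodes[:-1])
assert abs(lhs - rhs) < 1e-12
```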
2.8. Let x0, x1, ..., xn be distinct real numbers, and let f be an n times continuously differentiable function on the interval I {x0, x1, ..., xn}. Then show that

[x0, x1, ..., xn] = ∫ · · · ∫_{τn} f^{(n)} (t0 x0 + · · · + tn xn) dt1 · · · dtn,

in which

τn = {(t1, ..., tn) : t1 ≥ 0, ..., tn ≥ 0, Σ_{i=1}^n t_i ≤ 1},  t0 = 1 − Σ_{i=1}^n t_i.
2.9. If f is a real polynomial of degree m, then show:

[x0, x1, ..., xn, x] = a polynomial in x of degree m − n − 1, if n < m − 1;
                     = a_m,                                  if n = m − 1;
                     = 0,                                    if n > m − 1,

where f (x) = a_m x^m + lower-degree terms.
2.10. The tensor product of two matrices M, N ∈ L (Rn) is defined as the n² × n² matrix M × N = (m_{ij} N | i, j = 1, ..., n), where M = (m_{ij}). Consider two F-differentiable operators H, K : L (Rn) → L (Rn) and set F (X) = H (X) K (X) for all X ∈ L (Rn). Show that

F′ (X) = [H (X) × I] K′ (X) + [I × K (X)^T] H′ (X) for all X ∈ L (Rn).
2.11. Let F : R² → R² be defined by f1 (x) = x1³, f2 (x) = x2². Set x = 0 and y = (1, 1)^T. Show that there is no z ∈ [x, y] such that F (y) − F (x) = F′ (z) (y − x).
2.12. Let F : D ⊂ Rn → Rm and assume that F is continuously differentiable on a convex set D0 ⊂ D. For any x, y ∈ D0, show that

‖F (y) − F (x) − F′ (x) (y − x)‖ ≤ ‖y − x‖ w (‖y − x‖),

where w is the modulus of continuity of F′ on [x, y], that is,

w (t) = sup {‖F′ (x) − F′ (y)‖ : x, y ∈ D0, ‖x − y‖ ≤ t}.

2.13. Let F : D ⊂ Rn → Rm. Show that F″ is continuous at z ∈ D if and only if all second partial derivatives of the components f1, ..., fm of F are continuous at z.
2.14. Let F : D ⊂ Rn → Rm. Show that F″ (z) is symmetric if and only if each Hessian matrix H1 (z), ..., Hm (z) is symmetric.
2.15. Let M ∈ L (Rn) be symmetric, and define f : Rn → R by f (x) = x^T M x. Show, directly from the definition, that f is convex if and only if M is positive semidefinite.
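The failure asserted in Exercise 2.11 can be seen by a direct scan (the residual function below is an illustrative construction): on the segment z = t (1, 1)ᵀ, the mean value equation would require 3t² = 1 and 2t = 1 simultaneously.

```python
# F(x) = (x1**3, x2**2), x = 0, y = (1, 1): F(y) - F(x) = (1, 1), while
# F'(z)(y - x) = (3 t^2, 2 t) for z = (t, t). Scan t in [0, 1] and check
# that the two residual components never vanish together.
def residual(t):
    return (3 * t ** 2 - 1.0, 2 * t - 1.0)

best = min(abs(r1) + abs(r2) for r1, r2 in (residual(i / 1000) for i in range(1001)))
assert best > 0.1
print("no mean value point on the segment; minimal residual sum:", best)
```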
2.16. Show that f : D ⊂ Rn → R is convex on the set D if and only if, for any x, y ∈ D, the function g : [0, 1] → R, g (t) = f (tx + (1 − t) y), is convex on [0, 1].
2.17. Show that if g_i : Rn → R is convex and c_i ≥ 0, i = 1, 2, ..., m, then g = Σ_{i=1}^m c_i g_i is convex.
2.18. Suppose that g : D ⊂ Rn → R is continuous on a convex set D0 ⊂ D and satisfies

(1/2) g (x) + (1/2) g (y) − g ((1/2) (x + y)) ≥ γ ‖x − y‖²

for all x, y ∈ D0. Show that g is convex on D0 if γ = 0.
2.19. Let M ∈ L (Rn). Show that M is a nonnegative matrix if and only if it is an isotone operator.
2.20. Let M ∈ L (Rn) be diagonal, nonsingular, and nonnegative. Show that ‖x‖ = ‖M (x)‖ is a monotonic norm on Rn.
2.21. Let M ∈ L (Rn). Show that M is invertible and M⁻¹ ≥ 0 if and only if there exist nonsingular, nonnegative matrices M1, M2 ∈ L (Rn) such that M1 M M2 = I.
2.22. Let [·, ·] : D × D → L (X, Y) be an operator satisfying conditions (2.2.1) and (2.5.3). Show that the following two assertions are equivalent:
(a) Equality (2.5.7) holds for all x, y ∈ D.
(b) For all points u, v ∈ D such that 2v − u ∈ D we have

[u, v] = 2 [u, 2v − u] − [v, 2v − u].

2.23. If δF is a consistent approximation of F′ on D, show that each of the following four expressions is an estimate for

‖F (x) − F (y) − δF (u, v) (x − y)‖:
c1 = h (‖x − u‖ + ‖y − u‖ + ‖u − v‖) ‖x − y‖,
c2 = h (‖x − v‖ + ‖y − v‖ + ‖u − v‖) ‖x − y‖,
c3 = h (‖x − y‖ + ‖y − u‖ + ‖y − v‖) ‖x − y‖,
and
c4 = h (‖x − y‖ + ‖x − u‖ + ‖x − v‖) ‖x − y‖.
2.24. Show that the integral representation of [x0, ..., xk] is indeed a divided difference of k-th order of F. Let us assume that all divided differences have such an integral representation. In this case, for x0 = x1 = ... = xk = x we shall have

[x, x, ..., x] (k + 1 times) = (1/k!) F^{(k)} (x).

Suppose now that the n-th Fréchet-derivative of F is Lipschitz continuous on D, i.e. there exists a constant c_{n+1} such that

‖F^{(n)} (u) − F^{(n)} (v)‖ ≤ c_{n+1} ‖u − v‖ for all u, v ∈ D.

In this case, set

R_n (y) = ([x0, ..., x_{n−1}, y] − [x0, ..., x_{n−1}, x_n]) (y − x_{n−1}) · · · (y − x0)

and show that

‖R_n (y)‖ ≤ (c_{n+1}/(n + 1)!) ‖y − x_n‖ · ‖y − x_{n−1}‖ · · · ‖y − x0‖

and

‖F (x + h) − [F (x) + F′ (x) h + (1/2) F″ (x) h² + · · · + (1/n!) F^{(n)} (x) h^n]‖ ≤ (c_{n+1}/(n + 1)!) ‖h‖^{n+1}.
2.25. We recall the definitions:
(a) An operator F : D ⊂ Rn → Rm is Gateaux- (or G-) differentiable at an interior point x of D if there exists a linear operator L ∈ L (Rn, Rm) such that, for any h ∈ Rn,

lim_{t→0} (1/t) ‖F (x + th) − F (x) − tL (h)‖ = 0.

L is denoted by F′ (x) and called the G-derivative of F at x.
(b) An operator F : D ⊂ Rn → Rm is hemicontinuous at x ∈ D if, for any h ∈ Rn and ε > 0, there is a δ = δ (ε, h) so that whenever |t| < δ and x + th ∈ D, then ‖F (x + th) − F (x)‖ < ε.
(c) If F : D ⊂ Rn → Rm and if for some interior point x of D and h ∈ Rn the limit

lim_{t→0} (1/t) [F (x + th) − F (x)] = A (x, h)

exists, then F is said to have a Gateaux-differential at x in the direction h.
(d) If the G-differential exists at x for all h and if, in addition,

lim_{h→0} (1/‖h‖) ‖F (x + h) − F (x) − A (x, h)‖ = 0,

then F has a Fréchet differential at x.
Show:
(i) The linear operator L is unique.
(ii) If F : D ⊂ Rn → Rm is G-differentiable at x ∈ D, then F is hemicontinuous at x.
(iii) G-differential and "uniform in h" implies F-differential.
(iv) F-differential and "linear in h" implies F-derivative.
(v) G-differential and "linear in h" implies G-derivative.
(vi) G-derivative and "uniform in h" implies F-derivative.
Here "uniform in h" indicates the validity of (d). "Linear in h" means that A (x, h) exists for all h ∈ Rn and A (x, h) = M (x) h, where M (x) ∈ L (Rn, Rm).
(vii) Define F : R² → R by F (x) = sgn (x2) min (|x1|, |x2|). Show that, for any h ∈ R², A (0, h) = F (h), but F does not have a G-derivative at 0.
(viii) Define F : R² → R by F (x) = 0 if x = 0 and

F (x) = x2 (x1² + x2²)^{3/2} / [(x1² + x2²)² + x2²], if x ≠ 0.

Show that F has a G-derivative at 0, but not an F-derivative. Show, moreover, that the G-derivative is hemicontinuous at 0.
(ix) If the G-differential A (x, h) exists for all x in an open neighborhood of an interior point x0 of D and for all h ∈ Rn, then F has an F-derivative at x0 provided that, for each fixed h, A (x, h) is continuous in x at x0.
(e) Assume that F : D ⊂ Rn → Rm has a G-derivative at each point of an open set D0 ⊂ D. If the operator F′ : D0 ⊂ Rn → L (Rn, Rm) has a G-derivative at x ∈ D0, then (F′)′ (x) is denoted by F″ (x) and called the second G-derivative of F at x.
Show:
(i) If F : Rn → Rm has a G-derivative at each point of an open neighborhood of x, then F′ is continuous at x if and only if all partial derivatives ∂_j f_i are continuous at x.
(ii) F″ is continuous at x0 ∈ D if and only if all second partial derivatives of the components f1, ..., fm of F are continuous at x0. F″ (x0) is symmetric if and only if each Hessian matrix H1 (x0), ..., Hm (x0) is symmetric.
Chapter 3
Contractive Fixed Point Theory

The ideas of a contraction operator and its fixed points are fundamental to many questions in applied mathematics. In Section 1 we outline the essential ideas, whereas in Section 2 some examples are provided. An extension of the contraction mapping theorem with an application to a boundary value problem is provided in Section 3. Stationary and nonstationary projection methods are studied in Section 4. A special iteration method whose convergence results improve earlier ones is analysed in Section 5. Finally, in Section 6 the stability of the Ishikawa iteration is discussed. Further results on fixed points are given in Chapter 12.
3.1
Fixed Points
Definition 3.1.1 Let F be an operator mapping a set X into itself. A point x ∈ X is called a fixed point of F if x = F (x).
(3.1.1)
Equation (3.1.1) leads naturally to the construction of the method of successive approximations or substitutions xn+1 = F (xn ) (n ≥ 0) x0 ∈ X.
(3.1.2)
If the sequence {xn} (n ≥ 0) converges to some point x∗ ∈ X for some initial guess x0 ∈ X, and F is a continuous operator in a Banach space X, we can have

x∗ = lim_{n→∞} xn+1 = lim_{n→∞} F (xn) = F (lim_{n→∞} xn) = F (x∗).

That is, x∗ is a fixed point of operator F. Hence, we showed:
Theorem 3.1.2 If F is a continuous operator in a Banach space X, and the sequence {xn} (n ≥ 0) generated by (3.1.2) converges to some point x∗ ∈ X for some initial guess x0 ∈ X, then x∗ is a fixed point of operator F.
We need information on the uniqueness of x∗ and the distances ‖xn − x∗‖, ‖xn+1 − xn‖ (n ≥ 0). That is why we introduce the concept:

Definition 3.1.3 Let (X, ‖·‖) be a normed space and F a mapping of X into itself. The operator F is said to be a contraction or a contraction mapping if there exists a real number c, 0 ≤ c < 1, such that

‖F (x) − F (y)‖ ≤ c ‖x − y‖ for all x, y ∈ X.   (3.1.3)
It follows immediately from (3.1.3) that every contraction mapping F is uniformly continuous. Indeed, F is Lipschitz continuous with Lipschitz constant c. The constant c is called the contraction constant for F. We now arrive at the Banach contraction mapping theorem [207].

Theorem 3.1.4 Let (X, ‖·‖) be a Banach space, and F : X → X be a contraction mapping. Then F has a unique fixed point.

Proof. Uniqueness: Suppose there are two distinct fixed points x, y of F. Since F is a contraction mapping,

‖x − y‖ = ‖F (x) − F (y)‖ ≤ c ‖x − y‖ < ‖x − y‖,

which is impossible. That shows uniqueness of the fixed point of F.
Existence: Using (3.1.2) we obtain

‖x2 − x1‖ ≤ c ‖x1 − x0‖,
‖x3 − x2‖ ≤ c ‖x2 − x1‖ ≤ c² ‖x1 − x0‖,
...
‖xn+1 − xn‖ ≤ c^n ‖x1 − x0‖,   (3.1.4)

and so

‖xn+m − xn‖ ≤ ‖xn+m − xn+m−1‖ + · · · + ‖xn+1 − xn‖
 ≤ (c^{m−1} + · · · + c + 1) c^n ‖x1 − x0‖
 ≤ (c^n/(1 − c)) ‖x1 − x0‖.

Hence, the sequence {xn} (n ≥ 0) is Cauchy in the Banach space X, and as such it converges to some x∗. The rest of the theorem follows from Theorem 3.1.2.

Remark 3.1.1 It follows from (3.1.1), (3.1.3) and x∗ = F (x∗) that

‖xn − x∗‖ = ‖F (xn−1) − F (x∗)‖ ≤ c ‖xn−1 − x∗‖ ≤ c^n ‖x0 − x∗‖ (n ≥ 1).   (3.1.5)
Inequality (3.1.5) describes the convergence rate. This is convenient only when an a priori estimate for ‖x0 − x∗‖ is available. Such an estimate can be derived from the inequality

‖x0 − x∗‖ ≤ ‖x0 − F (x0)‖ + ‖F (x0) − F (x∗)‖ ≤ ‖x0 − F (x0)‖ + c ‖x0 − x∗‖,

which leads to

‖x0 − x∗‖ ≤ (1/(1 − c)) ‖x0 − F (x0)‖.   (3.1.6)

By (3.1.5) and (3.1.6), we obtain

‖xn − x∗‖ ≤ (c^n/(1 − c)) ‖x0 − F (x0)‖ (n ≥ 1).   (3.1.7)
Estimates (3.1.5) and (3.1.7) can be used to determine the number of steps needed to solve Equation (3.1.1). For example, if the error tolerance is ε > 0, that is, we require ‖xn − x∗‖ < ε, then this will certainly hold if

n > (1/ln c) ln (ε (1 − c)/‖x0 − F (x0)‖).   (3.1.8)

3.2

Applications
These are examples of operators with fixed points that are not contraction mappings:

Example 3.2.1 Let F : R → R, q > 1, F (x) = qx + 1. Operator F is not a contraction, but it has a unique fixed point x = (1 − q)⁻¹.

Example 3.2.2 Let F : X → X, X = (0, 1/√6], F (x) = x³. We have

|F (x) − F (y)| = |x³ − y³| ≤ (|x|² + |x| · |y| + |y|²) |x − y| ≤ (1/2) |x − y|.

That is, F is a contraction with c = 1/2, with no fixed points in X. This does not violate Theorem 3.1.4, since X is not a Banach space.

Example 3.2.3 Let F : [a, b] → [a, b], F differentiable at every x ∈ (a, b) and |F′ (x)| ≤ c < 1. By the mean value theorem, if x, y ∈ [a, b] there exists a point z between x and y such that

F (x) − F (y) = F′ (z) (x − y),

from which it follows that F is a contraction with constant c.
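A short computation illustrates Example 3.2.3 together with the a priori bound (3.1.7). The choice F(x) = cos x on [0, 1] is an assumed illustration (|F′(x)| = |sin x| ≤ sin 1 < 1 there), and the "exact" fixed point is stood in for by a much longer run of the same iteration.

```python
import math

# Successive approximations (3.1.2) for F(x) = cos(x) on [0, 1], with
# contraction constant c = sin(1). The a priori estimate (3.1.7),
# |x_n - x*| <= c^n/(1-c) |x0 - F(x0)|, is checked at every step.
F = math.cos
c = math.sin(1.0)
x0 = 1.0

xstar = x0                 # long run as a stand-in for the exact fixed point
for _ in range(200):
    xstar = F(xstar)

x = x0
for n in range(1, 30):
    x = F(x)
    bound = c ** n / (1.0 - c) * abs(x0 - F(x0))
    assert abs(x - xstar) <= bound + 1e-12
print(x)
```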
Example 3.2.4 Let F : [a, b] → R. Assume there exist constants p1, p2 with p1 p2 > 0 and 0 < p1 < F′ (x) ≤ p2⁻¹, and assume that F (a) < 0 < F (b). How can we find the zero of F guaranteed to exist by the intermediate value theorem? Define a function P by P (x) = x − p2 F (x). Using the hypotheses, we obtain P (a) = a − p2 F (a) > a, P (b) = b − p2 F (b) < b, P′ (x) = 1 − p2 F′ (x) ≥ 0, and P′ (x) ≤ 1 − p1 p2 < 1. Hence P maps [a, b] into itself and |P′ (x)| < 1 for all x ∈ [a, b]. By Example 3.2.3, P is a contraction mapping. Hence, P has a fixed point, which is clearly a zero of F.

Example 3.2.5 (Fredholm integral equation). Let K (x, y) be a continuous function on [a, b] × [a, b], f0 (x) be continuous on [a, b], and consider the equation

f (x) = f0 (x) + λ ∫_a^b K (x, y) f (y) dy.
Define the operator P : C [a, b] → C [a, b] by P (f) = g, where

g (x) = f0 (x) + λ ∫_a^b K (x, y) f (y) dy.

Note that a fixed point of P is a solution of the integral equation. With δ = max_{x,y∈[a,b]} |K (x, y)| (finite by the Weierstrass theorem), we get

‖P (q1) − P (q2)‖ = sup_{x∈[a,b]} |P (q1) (x) − P (q2) (x)|
 = |λ| sup_{x∈[a,b]} |∫_a^b K (x, y) (q1 (y) − q2 (y)) dy|
 ≤ |λ| δ ∫_a^b |q1 (y) − q2 (y)| dy
 ≤ |λ| δ (b − a) sup_{y∈[a,b]} |q1 (y) − q2 (y)|
 = |λ| δ (b − a) ‖q1 − q2‖.

Hence, P is a contraction mapping if there exists a c < 1 such that |λ| δ (b − a) ≤ c.
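A discretized Picard iteration makes Example 3.2.5 concrete. The kernel K(x, y) = xy, f0(x) = 1, [a, b] = [0, 1], λ = 0.5 and the quadrature rule are assumptions chosen so that |λ| δ (b − a) = 0.5 < 1; for this separable kernel the exact solution f(x) = 1 + 0.3x is available for comparison.

```python
# Picard iteration f <- f0 + lam * integral K(.,y) f(y) dy, discretized with
# the midpoint rule on [0, 1].
n = 200
h = 1.0 / n
ys = [(i + 0.5) * h for i in range(n)]
lam = 0.5
K = lambda x, y: x * y
f0 = lambda x: 1.0

f = [f0(y) for y in ys]              # start the iteration at f0
for _ in range(60):
    f = [f0(x) + lam * h * sum(K(x, y) * fy for y, fy in zip(ys, f)) for x in ys]

# For K(x, y) = x*y the solution is f(x) = 1 + lam*m*x with
# m = integral y f(y) dy = (1/2)/(1 - lam/3) = 0.6.
exact = [1.0 + lam * 0.6 * x for x in ys]
assert max(abs(a - b) for a, b in zip(f, exact)) < 1e-3
```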
3.3
Extending the contraction mapping theorem
An extension of the contraction mapping theorem is provided in a Banach space setting. Our approach is justified by numerical examples where our results apply whereas the contraction mapping theorem 3.1.4 cannot. We provide the following fixed point theorem:
Theorem 3.3.1 Let Q : D ⊆ X → X be an operator, p, q scalars with p < 1, q ≥ 0, and x0 a point in D such that

‖Q (x) − Q (y)‖ ≤ q ‖x − y‖^p for all x, y ∈ D.   (3.3.1)

Set a = q^{1/(1−p)} and b = q^{1/p} ‖Q (x0) − x0‖. Moreover, assume

0 ≤ b < 1.   (3.3.2)

(a) Case p > 1. Under hypotheses (3.3.1), (3.3.2) and (3.3.3) the proof of the theorem goes through, since b^{p^n} + · · · + b^{p^{n+m−1}} still gives a Cauchy sequence, but estimate (3.3.6) only holds as

‖xn − x∗‖ ≤ c^n.   (3.3.11)

(b) Case p = 1. We simply have the contraction mapping theorem 3.1.4.
Next we provide a numerical example to show that our Theorem can apply in cases where p ≠ 1, that is, in cases where the contraction mapping principle cannot apply.
Example 3.3.1 Let X = Rm−1 with m ≥ 2 an integer. Define operator Q on a subset D of X by Q(w) = H + h2 (p + 1)M where, h is a real parameter, p1 ≥ 0, 2 −1 0 ··· 0 −1 2 −1 0 H = 0 −1 2 −1 0 ··· −1 2 p w1 ··· 0 w2p ··· 0 , M = 0 p 0 ··· ··· wm−1
(3.3.12)
,
and w = (w1 , w2 , . . . , wm−1 ). Finding fixed points of operator Q is very important in many discretization studies since e.g. many two point boundary value problems can be reduced to finding such points [68], [99]. Let w ∈ Rm−1 , M ∈ Rm−1 ×Rm−1 , and define the norms of w and M by kwk =
max
1≤j≤m−1
|wj |
and kM k =
max
1≤j≤m−1
m−1 X k=1
|mjk |.
For all $w, z \in \mathbb{R}^{m-1}$ for which $|w_j| > 0$, $|z_j| > 0$, $j = 1, 2, \ldots, m-1$, we obtain, for say $p = \frac12$,
$$\begin{aligned} \|Q(w) - Q(z)\| &= \Bigl\| \operatorname{diag}\Bigl( \tfrac32 h^2 \bigl( w_j^{1/2} - z_j^{1/2} \bigr) \Bigr) \Bigr\| = \tfrac32 h^2 \max_{1 \le j \le m-1} \bigl| w_j^{1/2} - z_j^{1/2} \bigr| \\ &\le \tfrac32 h^2 \Bigl[ \max_{1 \le j \le m-1} |w_j - z_j| \Bigr]^{1/2} = \tfrac32 h^2\, \|w - z\|^{1/2}. \end{aligned} \tag{3.3.13}$$
Set $q = \tfrac32 h^2$. Then $a = q^2$ and $b = q^2 \|Q(x_0) - x_0\|$. It follows from (3.3.13) that the contraction mapping principle cannot be used to obtain fixed points of the operator $Q$, since $p = \frac12 \ne 1$. However, our theorem applies provided that $x_0 \in D$ is such that
$$\|Q(x_0) - x_0\| \le q^{-2}. \tag{3.3.14}$$
That is, (3.3.2) is satisfied. Furthermore $r$ should be chosen to satisfy (3.3.3), and we can certainly set $U(x_0, r) = D$.
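To see a condition of type (3.3.1) with $p = \frac12$ in action, here is a minimal numerical sketch (my own illustration, not from the book): the scalar map $Q(x) = q\sqrt{x}$ satisfies $|Q(x) - Q(y)| \le q|x - y|^{1/2}$ for $x, y \ge 0$, since $|\sqrt{x} - \sqrt{y}| \le |x - y|^{1/2}$, yet $Q$ is not a contraction near the origin because its slope blows up there.

```python
import math

# Hypothetical scalar illustration of condition (3.3.1) with p = 1/2:
# Q(x) = q*sqrt(x) obeys |Q(x) - Q(y)| <= q*|x - y|**0.5 on [0, inf),
# so the contraction mapping principle does not apply near 0, but
# successive substitution still settles on the fixed point x* = q**2.
h = 0.5
q = 1.5 * h**2          # q = (3/2) h^2, as in Example 3.3.1

def Q(x):
    return q * math.sqrt(x)

x = 1.0                  # here |Q(x0) - x0| = 0.625 <= q**(-2), cf. (3.3.14)
for _ in range(80):
    x = Q(x)

print(x)                 # -> 0.140625 = q**2
```

Since $\log x_n$ halves its distance to $\log q^2$ at every step, 80 iterations are far more than enough for full floating-point accuracy.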
3.4 Stationary and nonstationary projection-iteration methods
The aim of this section is to study the existence of solutions of the equation
$$x = A(x, x) \tag{3.4.1}$$
and their approximation for a nonlinear operator $A$. It is clear that equation (3.4.1) contains the equation $x = Ax$ as a special case, and the approximations
$$x_n = A_n(x_n, x_{n-1}), \quad n = 1, 2, \ldots \tag{3.4.2}$$
generalize many approximate methods such as the Picard, Seidel and Newton-Kantorovich methods. In the sequel we study the strong and weak convergence of this type of projection-iteration method, stationary as well as nonstationary, for equation (3.4.1). A new generalized fixed point theorem of Edelstein's type [149], [150] will be stated and some examples will be shown.

Theorem 3.4.1 (Stationary) Let $X$ be a topological vector space, $D$ a compact set in $X$, $A$ a continuous mapping from $D \times D$ into $D$, and $F = \{x \in D : A(x, x) = x\}$. Let $g : D \times F \to [0, \infty)$. Assume:

1. $F \ne \emptyset$ and $g(x, p) = 0$, $x \in D$, $p \in F \Leftrightarrow x = p \in F$;

2. $g(A(x, y), p) < \max\{g(x, p), g(y, p)\}$ for all $x, y \in D$, $p \in F$, unless $x = y = p$;

3. the function $g(x, p)$ is continuous in $x$;

4. the system
$$u_n = A(u_n, v_n), \qquad v_n = A_k(u_n, u_{n-1}) \tag{3.4.3}$$
is solvable for an arbitrary element $u_0 \in D$ and for all $n \ge 1$, where $A_1(x, y) = A(x, y)$, $A_i(x, y) = A(x, A_{i-1}(x, y))$, $i = 2, 3, \ldots, k$.

Then the sequences $\{u_n\}, \{v_n\}$ constructed by (3.4.3) converge (in the topology of $X$) to the unique solution of equation (3.4.1).

Proof. By assumption 2 we have $g(u_n, p) = g(A(u_n, v_n), p) < \max\{g(u_n, p), g(v_n, p)\}$, i.e. $g(u_n, p) < g(v_n, p)$. However,
$$g(v_n, p) = g(A_k(u_n, u_{n-1}), p) < \max\{g(u_n, p), g(A_{k-1}(u_n, u_{n-1}), p)\} \le \cdots \le \max\{g(u_n, p), g(u_{n-1}, p)\},$$
hence
$$g(u_n, p) < g(v_n, p) < g(u_{n-1}, p).$$
Therefore $\lim g(u_n, p) = \lim g(v_n, p)$. By the compactness of $D$ we get subsequences $\{u_{n_j}\} \subset \{u_n\}$, $\{v_{n_j}\} \subset \{v_n\}$ such that $u_{n_j} \to u$, $v_{n_j} \to v$. Since $g$ is continuous, $g(u, p) = g(v, p)$. On the other hand $A(u_{n_j}, v_{n_j}) \to A(u, v)$ and
$$g(u, p) = g(A(u, v), p) < \max\{g(u, p), g(v, p)\}.$$
This is a contradiction, so $u = v = p$. Note that $F$ contains only one element. Indeed, if $p \ne q$, $p, q \in F$, then $g(p, q) = g(A(p, p), q) < \max\{g(p, q), g(p, q)\} = g(p, q)$, which is impossible. Consequently the sequences $\{u_n\}, \{v_n\}$ converge to the unique solution $p$ of equation (3.4.1).

Remark 3.4.1 Instead of condition 3 we can use the condition (R): if $\lim g(u_n, p) = \lim g(v_n, p)$, $p \in F$, and $v$ is a limit of the sequence $\{v_n\}$, then either $g(u, p) \le g(v, p)$ or $g(u, p) \ge g(v, p)$ for all limits $u$ of the sequence $\{u_n\}$.

To verify this remark it suffices to use conditions 2 and 4. Indeed, from the sequences $\{u_{n_j}\}, \{v_{n_j}\}$ we can choose subsequences $\{u_{n_{j_k}-1}\}$ such that $u_{n_{j_k}-1} \to \bar u$. Then we have
$$u_{n_{j_k}} = A(u_{n_{j_k}}, v_{n_{j_k}}), \quad u_{n_{j_k}} \to u = A(u, v), \qquad v_{n_{j_k}} = A_k(u_{n_{j_k}}, u_{n_{j_k}-1}), \quad v_{n_{j_k}} \to v = A_k(u, \bar u).$$
Hence $g(u, p) < \max\{g(u, p), g(v, p)\}$ implies $g(u, p) < g(v, p)$, and $g(v, p) < \max\{g(u, p), g(\bar u, p)\}$ implies $g(v, p) < g(\bar u, p)$, if neither $u = v = p$ nor $u = \bar u = p$ holds. However the last inequalities contradict condition (R). Therefore $u = v = p$ or $u = \bar u = p$, i.e. $v = A_k(u, u) = A_k(p, p) = p$.

Lemma 3.4.2 [332] Let $M$ be a bounded weakly closed subset of a reflexive Banach space, and $f$ a weakly lower semicontinuous functional on $M$. Then $f$ is bounded below and attains its infimum on $M$.

Theorem 3.4.3 Let $X, D, A, F, g$ be as in Theorem 3.4.1. Assume that conditions 1, 3, 4 of Theorem 3.4.1 hold, and that instead of condition 2 of that theorem the following conditions are satisfied:

5. $g(A(x, y), p) \begin{cases} < \max\{g(x, p), g(y, p)\}, & \text{if } g(x, p) \ne g(y, p),\\ \le \max\{g(x, p), g(y, p)\}, & \text{if } g(x, p) = g(y, p), \end{cases}$
for all $x, y \in D$, $p \in F$;

6. for all $y \notin F$ and $x \in D$ satisfying $x = A(x, y)$ there exists $q = q(x, y) \in F$ such that $g(A(x, y), q) < \max\{g(x, q), g(y, q)\}$.

Then the subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ of the sequences $\{u_n\}, \{v_n\}$ constructed by (3.4.3) converge simultaneously to the solution $p$ of equation (3.4.1).

Proof. As in the proof of Theorem 3.4.1 we have
$$\lim g(u_n, p) = \lim g(v_n, p). \tag{3.4.4}$$
Now from $\{u_n\}, \{v_n\}$ we take subsequences such that $u_{n_j} \to u$, $v_{n_j} \to v$. It will be shown that $v \in F$. If not, i.e. $v \notin F$, then there exists $q = q(u, v) \in F$ such that
$$g(u, q) = g(A(u, v), q) < \max\{g(u, q), g(v, q)\}, \quad \text{i.e.} \quad g(u, q) < g(v, q). \tag{3.4.5}$$
Since the function $g$ is continuous, and (3.4.4) holds with $q$ in place of $p$, we get $g(u, q) = g(v, q)$. But this contradicts (3.4.5). Hence $v \in F$. Moreover, if $g(u, v) \ne g(v, v) = 0$ then
$$g(u, v) = g(A(u, v), v) < \max\{g(u, v), g(v, v)\} = g(u, v),$$
which is impossible. So $g(u, v) = 0$, i.e. $u = v \in F$.

Lemma 3.4.4 Let $X$ be a strictly convex Banach space, $D$ a convex subset of $X$, and $A : D \times D \to D$ a continuous mapping satisfying the condition
$$\|A(x, y) - p\| \le \max\{\|x - p\|, \|y - p\|\}, \quad \forall x, y \in D,\ p \in F = \{x \in D : A(x, x) = x\}.$$
If $F \ne \emptyset$, then $F$ is convex. The proof of this lemma is similar to the one in a work by W.V. Petryshyn and T.E. Williamson (see [273]).

Lemma 3.4.5 [332] In a Banach space, if the functional $f$ is quasiconvex, i.e. for any $t \in (0, 1)$
$$f(tx + (1 - t)y) \le \max\{f(x), f(y)\},$$
and lower semicontinuous, then $f$ is weakly lower semicontinuous.

From now on a mapping $V$ is said to be quasiconvex if
$$\Bigl\| V\Bigl( \frac{x_1 + x_2}{2} \Bigr) \Bigr\| \le \max\{\|V(x_1)\|, \|V(x_2)\|\}.$$
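As a quick sanity check of this notion, here is a sketch with a hypothetical affine map (my own illustration, not from the text): if $A$ is affine then $I_x - A$ is affine, so the midpoint value is the average of the endpoint values, and the triangle inequality gives quasiconvexity in the above sense.

```python
import numpy as np

# Check the quasiconvexity inequality for V = I - A with a hypothetical
# affine A(x) = Bx + c; then V((x1+x2)/2) = (V(x1) + V(x2))/2 and the
# triangle inequality yields ||V(mid)|| <= max(||V(x1)||, ||V(x2)||).
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
c = rng.standard_normal(3)

def V(x):
    return x - (B @ x + c)   # V = I - A

violations = 0
for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    lhs = np.linalg.norm(V((x1 + x2) / 2))
    rhs = max(np.linalg.norm(V(x1)), np.linalg.norm(V(x2)))
    if lhs > rhs + 1e-12:
        violations += 1
```

For nonlinear $A$ the inequality is a genuine restriction, which is why it appears as an explicit hypothesis in the theorems below.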
Theorem 3.4.6 Let $X$ be a strictly convex and reflexive Banach space, $D$ a convex closed subset of $X$, $A$ a continuous mapping from $D \times D$ into $D$, and assume:

7. $F \ne \emptyset$;

8. $\|A(x, y) - p\| \begin{cases} < \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| \ne \|y - p\|,\\ \le \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| = \|y - p\|; \end{cases}$

9. the system (3.4.3) is solvable for all $n$ and $\|u_n - v_n\| \to 0$ as $n \to \infty$;

10. either the mapping $I_x - A$, where $(I_x - A)(x, y) = x - A(x, y)$, is quasiconvex, or the mapping $I_y - A$, where $(I_y - A)(x, y) = y - A(x, y)$, is quasiconvex;

11. the space $X$ has the property: if $\{y_n\} \subset X$, $y_n \rightharpoonup y_0 \in X$, then $\liminf \|y_n - y\| > \liminf \|y_n - y_0\|$ for all $y \ne y_0$, $y \in X$.

Then the sequences $\{u_n\}, \{v_n\}$ defined by (3.4.3) converge weakly and simultaneously to the solution $p \in F$.

Proof. Similarly to the proof of Theorem 3.4.1 we obtain
$$\|u_n - p\| \le \|v_n - p\| \le \|u_{n-1} - p\|,$$
hence $\lim \|u_n - p\| = \lim \|v_n - p\|$. Consequently $\{u_n\}, \{v_n\} \subset B(p, \|u_0 - p\|)$, the ball with center $p \in F$ and radius $\|u_0 - p\|$. Therefore from $\{u_n\}, \{v_n\}$ we can choose weakly convergent subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ such that $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$. However,
$$\|u - v\| \le \liminf \|u_{n_j} - v_{n_j}\| = \lim \|u_n - v_n\| = 0.$$
Hence $u = v$. By condition 10, if the mapping $I_y - A$ is quasiconvex (and continuous, of course) then it is weakly lower semicontinuous, and we thus have
$$\|v - A(u, v)\| \le \liminf \|u_{n_j} - A(u_{n_j}, v_{n_j})\| = 0.$$
Hence $u = A(u, v)$, i.e. $u = v = A(u, u) = A(v, v)$. Therefore in both cases the subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ converge weakly to a solution $p \in F$.

Now let $f(p) = \lim \|u_n - p\|$. This function is defined on the set $F_0 = F \cap B(p, \|u_0 - p\|)$, which is convex, closed and bounded by the convexity and closedness of $F$, and hence weakly compact. On the other hand it is easy to see that $f$ is convex and continuous, and thus weakly lower semicontinuous. Therefore $f$ attains its infimum on $F_0$, say at a point $p^{**}$. It will be shown that $p^{**}$ is the unique weak limit point of both sequences $\{u_n\}, \{v_n\}$. Indeed, if there exists a subsequence $\{u_{n_k}\}$ weakly converging to $u \ne p^{**}$, then by condition 11 we have
$$\liminf \|u_{n_k} - p^{**}\| > \liminf \|u_{n_k} - u\|.$$
However $\lim \|u_{n_k} - p^{**}\| = \lim \|u_n - p^{**}\| = f(p^{**})$ and $\lim \|u_{n_k} - u\| = \lim \|u_n - u\| = f(u)$. That is $f(p^{**}) > f(u)$, which is a contradiction.

Theorem 3.4.7 Let $X$ be a uniformly convex Banach space, $D$ a convex subset of $X$, and $A : D \times D \to D$ a continuous mapping satisfying conditions 7 and 8. Then:

(a) if for a point $c \in [0, 1)$ the mapping $J_c - A$, where $J_c(x, y) = cx + (1 - c)y$, is quasiconvex on $D \times D$, then for any $t \in (0, 1)$ the sequences
$$u_n = (tJ_c + (1 - t)A)(u_n, v_n), \qquad v_n = (tJ_c + (1 - t)A)_k(u_n, u_{n-1})$$
converge weakly and simultaneously to the solution $p \in F$;

(b) when the mapping $I_x - A$ is quasiconvex, we construct an algorithm as follows: set $\bar u_0 = u_0$; by means of the system
$$u_n = A(u_n, v_n), \quad n \ge 1, \qquad v_n = A_k(u_n, \bar u_{n-1}), \tag{3.4.6}$$
we obtain $(u_1, v_1)$; then we take $\bar u_1 = tu_1 + (1 - t)v_1$, and again using system (3.4.6) we get $(u_2, v_2)$, etc. We claim that the sequences $\{u_n\}, \{v_n\}$ so obtained converge weakly to the solution $p \in F$.

Proof. It suffices to show that the constructed sequences $\{u_n\}, \{v_n\}$ satisfy condition 9, i.e. $\|u_n - v_n\| \to 0$ as $n \to \infty$.

(i) First we study the case $c \ne 1$. We have
$$\begin{aligned} \|u_n - p\| &= \|(tJ_c + (1 - t)A)(u_n, v_n) - p\| \le t\|J_c(u_n, v_n) - p\| + (1 - t)\|A(u_n, v_n) - p\| \\ &\le t\bigl(c\|u_n - p\| + (1 - c)\|v_n - p\|\bigr) + (1 - t)\|A(u_n, v_n) - p\|. \end{aligned}$$
But
$$(1 - t)\|A(u_n, v_n) - p\| \begin{cases} < (1 - t)\max\{\|u_n - p\|, \|v_n - p\|\}, & \text{if } \|u_n - p\| \ne \|v_n - p\|,\\ \le (1 - t)\max\{\|u_n - p\|, \|v_n - p\|\}, & \text{if } \|u_n - p\| = \|v_n - p\|. \end{cases}$$
It follows that $\|u_n - p\| \le \|v_n - p\|$.
Analogously we obtain
$$\|v_n - p\| = \|(tJ_c + (1 - t)A)_k(u_n, u_{n-1}) - p\| \le \|u_{n-1} - p\|.$$
Therefore there exists $d$ such that $d = \lim \|u_n - p\| = \lim \|v_n - p\|$. If $d = 0$, then $u_n \to p$, $v_n \to p$ and, of course, $u_n \rightharpoonup p$, $v_n \rightharpoonup p$. If $d \ne 0$, then
$$\frac{\|u_n - p\|}{\|v_n - p\|} \to 1.$$
However,
$$\frac{u_n - p}{\|v_n - p\|} = t\,\frac{c(u_n - p) + (1 - c)(v_n - p)}{\|v_n - p\|} + (1 - t)\,\frac{A(u_n, v_n) - p}{\|v_n - p\|}.$$
Let
$$W_n = \frac{c(u_n - p) + (1 - c)(v_n - p)}{\|v_n - p\|}, \qquad Z_n = \frac{A(u_n, v_n) - p}{\|v_n - p\|}.$$
Obviously $\|W_n\| \le 1$, $\|Z_n\| \le 1$. Hence from $\|tW_n + (1 - t)Z_n\| \to 1$ as $n \to \infty$ and the uniform convexity of $X$ it follows that $\|W_n - Z_n\| \to 0$ as $n \to \infty$, i.e.
$$\|c(u_n - p) + (1 - c)(v_n - p) - (A(u_n, v_n) - p)\| \to 0 \quad \text{as } n \to \infty.$$
Taking into account that
$$\begin{aligned} \|cu_n + (1 - c)v_n - A(u_n, v_n)\| &= \frac{1}{1 - t}\,\|(1 - t)(cu_n + (1 - c)v_n) - (1 - t)A(u_n, v_n)\| \\ &= \frac{1}{1 - t}\,\|cu_n + (1 - c)v_n - [t(cu_n + (1 - c)v_n) + (1 - t)A(u_n, v_n)]\| \\ &= \frac{1}{1 - t}\,\|cu_n + (1 - c)v_n - u_n\| = \frac{1}{1 - t}\,\|(1 - c)(v_n - u_n)\|, \end{aligned}$$
we have $\|v_n - u_n\| \to 0$ as $n \to \infty$. At the same time it is easily seen that
$$(J_c - A)(u_n, v_n) = cu_n + (1 - c)v_n - A(u_n, v_n) = \frac{1 - c}{1 - t}\,(v_n - u_n).$$
Now as in the proof of Theorem 3.4.1 we obtain $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$, $u = v$. By the quasiconvexity of the mapping $J_c - A$ we get
$$\|cu + (1 - c)v - A(u, v)\| = \|u - A(u, u)\| \le \liminf \|(J_c - A)(u_{n_j}, v_{n_j})\| = \frac{1 - c}{1 - t} \lim \|v_n - u_n\| = 0,$$
i.e. $u = v = A(u, u) = A(v, v)$. It remains to apply the same arguments as in the proof of Theorem 3.4.1.

(ii) The case $c = 1$. By condition 8 we have
$$\|u_{n+1} - p\| \le \|v_{n+1} - p\| \le \|\bar u_n - p\| \quad \text{and} \quad \|u_n - p\| \le \|v_n - p\|.$$
But $\bar u_n = tu_n + (1 - t)v_n$, so $\|\bar u_n - p\| \le \|v_n - p\|$. Thus there exists $d$ such that $d = \lim \|\bar u_n - p\| = \lim \|v_n - p\|$. If $d = 0$, then $\|v_n - p\| \to 0$ as $n \to \infty$. If $d \ne 0$, then
$$\frac{\|\bar u_n - p\|}{\|v_n - p\|} \to 1.$$
Nevertheless,
$$\frac{\bar u_n - p}{\|v_n - p\|} = t\,\frac{u_n - p}{\|v_n - p\|} + (1 - t)\,\frac{v_n - p}{\|v_n - p\|}.$$
Setting
$$W_n = \frac{\bar u_n - p}{\|v_n - p\|}, \qquad Z_n = \frac{v_n - p}{\|v_n - p\|},$$
we have $\|W_n\| \le 1$, $\|Z_n\| = 1$.
Hence, by the uniform convexity of $X$, $\|W_n - Z_n\| \to 0$ as $n \to \infty$, i.e.
$$\Bigl\| \frac{\bar u_n - p}{\|v_n - p\|} - \frac{v_n - p}{\|v_n - p\|} \Bigr\| = \frac{\|\bar u_n - v_n\|}{\|v_n - p\|} \to 0 \quad \text{as } n \to \infty.$$
Since $\bar u_n - v_n = t(u_n - v_n)$, it follows that $\|u_n - v_n\| \to 0$ as $n \to \infty$. The proof of the theorem is complete.

Let us deal with the algorithm
$$u_n = A_n(u_n, v_n), \quad n \ge 1, \qquad v_n = A_{n,k}(u_n, u_{n-1}), \tag{3.4.7}$$
where $A_{n,1}(x, y) = A_n(x, y)$, $A_{n,i}(x, y) = A_n(x, A_{n,i-1}(x, y))$, $i = 2, 3, \ldots, k$.

Lemma 3.4.8 Let $(X, d)$ be a metric space and $M_n$, $n = 0, 1, 2, \ldots$, compact subsets of $X$ with $M_n \to M_0$ as $n \to \infty$, i.e.
$$\sup_{x \in M_n} \inf_{y \in M_0} d(x, y) \to 0 \quad \text{as } n \to \infty.$$
Then the set $\cup_{n=0}^{\infty} M_n$ is compact too.

Remark 3.4.2 Similarly to the proofs of the last theorem and of Theorems 3.4.6 and 3.4.7, one can show the weak convergence of the approximants defined by (3.4.7), or by
$$u_n = (tJ_c + (1 - t)A_n)(u_n, v_n), \quad n \ge 1, \qquad v_n = (tJ_c + (1 - t)A_n)_k(u_n, u_{n-1}),$$
for any $t \in (0, 1)$, in the cases when $X$ is a reflexive Banach space or a uniformly convex one and the $A_n$ are continuous mappings from $D \times D$ into $D_n$, where $D_n$, $n \ge 0$, are convex closed subsets of $D \subseteq X$.

Theorem 3.4.9 Let $X$ be a Banach space, $D$ a closed subset of $X$. Assume that $A_n$, $n \ge 0$, are continuous and weakly compact mappings from $D \times D$ into $D$. Moreover assume that the following conditions are satisfied:

12. $F = \{x : A(x, x) = x\} \ne \emptyset$;

13. $\|A_n(x, y) - p\| \begin{cases} < \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| \ne \|y - p\|,\\ \le \max\{\|x - p\|, \|y - p\|\}, & \text{if } \|x - p\| = \|y - p\|, \end{cases}$ for all $x, y \in D$, $p \in F$, $n \ge 0$;

14. for $c \in [0, 1]$ the mapping $J_c - A_0$ is quasiconvex on $D \times D$;

15. the space $X$ has the property: if $\{y_n\} \subset X$, $y_n \rightharpoonup y_0 \in X$, then $\liminf \|y_n - y\| > \liminf \|y_n - y_0\|$ for all $y \ne y_0$, $y \in X$;

16. the system (3.4.7) is solvable for $n \ge 1$ and $\|u_n - v_n\| \to 0$ as $n \to \infty$;

17. $\|A_n(x, y) - A_0(x, y)\| \le \varepsilon_n$, where $\varepsilon_n \to 0$ as $n \to \infty$, for all $x, y \in D$.

Then the sequences $\{u_n\}, \{v_n\}$ defined by formula (3.4.7) converge weakly to the solution of equation (3.4.1).
Proof. As in the proof of Theorem 3.4.1 we have $\lim \|v_n - p\| = \lim \|u_n - p\| \le \|u_0 - p\|$. Hence $\{u_n\}, \{v_n\} \subset D_0 = B(p, \|u_0 - p\|) \cap D$ (convex and closed in $D$). Let $M_n$ be the image of the restriction of $A_n$ to $D_0 \times D_0$, $n \ge 0$. By the weak compactness of the $A_n$, the sets $M_n$ are weakly compact. By the construction of method (3.4.7) it is clear that $u_n, v_n \in M_n$, $n \ge 1$. By condition 17 and Lemma 3.4.8, $M = \cup_{n \ge 0} M_n$ is weakly compact. Therefore from the sequences $\{u_n\}, \{v_n\}$ we can choose weakly convergent subsequences $\{u_{n_j}\}, \{v_{n_j}\}$ such that $u_{n_j} \rightharpoonup u$, $v_{n_j} \rightharpoonup v$. But
$$\|u - v\| \le \liminf \|u_{n_j} - v_{n_j}\| = \lim \|u_n - v_n\| = 0,$$
that is, $u = v$. On the other hand,
$$\begin{aligned} \|(J_c - A_0)(u, v)\| &= \|cu + (1 - c)v - A_0(u, v)\| \\ &\le \liminf \|cu_{n_j} + (1 - c)v_{n_j} - A_0(u_{n_j}, v_{n_j})\| \\ &\le \liminf \bigl( \|cu_{n_j} + (1 - c)v_{n_j} - A_{n_j}(u_{n_j}, v_{n_j})\| + \varepsilon_{n_j} \bigr) \\ &\le \liminf \bigl( (1 - c)\|u_{n_j} - v_{n_j}\| + \varepsilon_{n_j} \bigr) \quad \bigl(\text{since } u_{n_j} = A_{n_j}(u_{n_j}, v_{n_j})\bigr) \\ &= (1 - c) \lim \|u_n - v_n\| = 0. \end{aligned}$$
Hence $cu + (1 - c)v = A_0(u, v)$, and since $u = v$, we get $u = v \in F$. It will be shown that $\{u_n\}, \{v_n\}$ have just one limit point (in the weak sense), say $\bar p$. To this end we consider the function $f(p) = \lim \|v_n - p\|$ defined on $F_0 = F \cap M_0$. Since the function $f$ is weakly lower semicontinuous and the set $F_0$ is weakly compact (as the intersection of $F$ with the image of a bounded closed set under a weakly compact mapping), the function $f$ attains its infimum on $F_0$ at a point, say $\bar p$. Suppose that a subsequence $\{u_{n_k}\}$ weakly converges to $u \ne \bar p$. Then by condition 15 we have
$$\liminf \|u_{n_k} - \bar p\| > \liminf \|u_{n_k} - u\|.$$
This is a contradiction. So $u_n \rightharpoonup \bar p$, $v_n \rightharpoonup \bar p$, and the theorem is proved.

By the use of Edelstein's theorem from [149], [150] for the operator $A_n(x) = A(x, v)$ it is easy to prove:
Theorem 3.4.10 Let $D$ be a closed bounded subset of a normed space $X$, and $A$ a mapping from $D \times D$ to $D$ satisfying the condition
$$\|A(x, y) - A(z, t)\| \begin{cases} < \max\{\|x - z\|, \|y - t\|\}, & \text{if } (x, y) \ne (z, t) \text{ and } x \ne y \text{ or } z \ne t,\\ \le \|x - z\| = \|y - t\|, & \text{if } x = y \text{ and } z = t, \end{cases} \tag{3.4.8}$$
for all $x, y, z, t \in D$. Assume that either $D$ is compact or $A$ maps $D \times D$ into a compact subset of $D$. Then the equation
$$A(x, x) = x \tag{3.4.9}$$
has a unique solution in $D$. Moreover every equation
$$x_n = A(x_n, x_{n-1}), \quad n \ge 0, \tag{3.4.10}$$
also has a unique solution $x_n$, and the sequence $\{x_n\}$ defined by (3.4.10) converges to the solution of equation (3.4.9) for any $x_0 \in D$.

Remark 3.4.3 If instead of (3.4.10), under the assumptions of Theorem 3.4.10, we use the Picard iteration for $A^*(x) = A(x, x)$ as in [105] by L.P. Belluce and W.A. Kirk, in [111] by F.E. Browder, in [149] by M. Edelstein, in [273] by W.V. Petryshyn and T.E. Williamson, etc., then obviously it is impossible to assert either the existence or the convergence of the approximate solutions of equation (3.4.10). In what follows some simple examples are shown.

Example 3.4.1 Consider the subset $D$ of $C(-\infty, +\infty)$ whose elements are defined as follows: $x(t) \in D$ if $|x(t)| \le M$, $M$ a constant, $x(k) = 0$ for all integers $k \in \mathbb{Z}$,
$$x(t) = \begin{cases} 2(t - k)\,x(k + 1/2), & \text{for } t \in [k, k + 1/2],\\ 2(k + 1 - t)\,x(k + 1/2), & \text{for } t \in [k + 1/2, k + 1], \end{cases}$$
and $|x(k + 1/2)| \le |x(k + 3/2)|$ for all $k \in \mathbb{Z}$. Let
$$A(x, y) = \frac{x(t + 1) + y(t + 1)}{2 + \|x - y\|/(\|x - y\| + M)}, \quad \forall x(t), y(t) \in D,$$
where
$$\|x - y\| = \sup_{t \in (-\infty, \infty)} |x(t) - y(t)|.$$
The mapping $A$ satisfies all conditions of Theorem 3.4.10. Hence for any $x_0(t) \in D$ the sequence $x_n(t) = A(x_n(t), x_{n-1}(t))$ must converge to the solution of (3.4.9), which satisfies $x(k + 1/2) = x(k + 3/2)$, $\forall k \in \mathbb{Z}$ (including $x(t) \equiv 0$). On the other hand the iteration $x_n = A(x_{n-1}, x_{n-1})$ may fail to have a limit point when, for instance, $x_0(t)$ is such that $x_0(k_0 + 1/2) < x_0(k_0 + 3/2)$ for some $k_0 \in \mathbb{Z}$.
Example 3.4.2 Let $D = l_p$, $1 < p < \infty$, and
$$A(x, y) = \frac{\tau y}{\exp(\|x - y\|)},$$
where $\tau : x = (x_1, x_2, \ldots) \mapsto \tau x = (0, x_1, x_2, \ldots)$, $x \in l_p$. Then for arbitrary $x_0 = (x_{01}, x_{02}, \ldots)$, $x_{01} \ne 0$, the sequence defined by (3.4.10) converges to $0$ as $n \to \infty$.
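A finite truncation of this example can be checked numerically. The sketch below is my own illustration: the implicit step (3.4.10) is solved by an inner fixed-point loop, which is a device added here for the computation, not part of the example.

```python
import numpy as np

# Truncated version of Example 3.4.2 in l^2: tau is the right shift and
# A(x, y) = tau(y) * exp(-||x - y||).  Each outer step solves the implicit
# equation x_n = A(x_n, x_{n-1}) by an inner fixed-point loop.
def tau(x):
    return np.concatenate(([0.0], x[:-1]))

def step(x_prev):
    u = tau(x_prev)                      # inner initial guess
    for _ in range(60):                  # inner loop contracts quickly here
        u = tau(x_prev) * np.exp(-np.linalg.norm(u - x_prev))
    return u

x = np.zeros(80)
x[0] = 1.0                               # x0 = (1, 0, 0, ...), x01 != 0
norms = [np.linalg.norm(x)]
for _ in range(40):
    x = step(x)
    norms.append(np.linalg.norm(x))
# ||x_n|| = ||x_{n-1}|| * exp(-||x_n - x_{n-1}||), so the norms decrease
# monotonically toward 0, as the example asserts
```

The vector length 80 is chosen so that the 40 shifts never push mass off the end of the truncation.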
3.5 A special iteration method
The convergence of the iteration algorithm based on the algorithmic function $x \to \alpha x + (1 - \alpha)A(x)$ is analyzed, where $A : \mathbb{R}^n \to \mathbb{R}^n$ is a point-to-point mapping and the coefficient $\alpha$ may change during the process. We consider iteration methods of the form
$$x_{k+1} = \alpha_k x_k + (1 - \alpha_k)A(x_k), \tag{3.5.1}$$
where $A : D \to D$ with $D \subseteq \mathbb{R}^n$ convex, and for all $k$, $0 \le \alpha_k < 1$. The convexity of $D$ implies that the iteration sequence exists for arbitrary $x_0 \in D$. In Section 3.1 the convergence of this procedure was studied in the special case when $A$ is a contraction and $\alpha_k \equiv \alpha$ for all $k$. This result was further generalized to the nonstationary case [99]. These earlier results guarantee the convergence of process (3.5.1) in cases when the original algorithm
$$z_{k+1} = A(z_k) \tag{3.5.2}$$
also converges to the unique fixed point of the mapping $A$. In this section we will find convergence conditions even in cases when the original process (3.5.2) diverges. In such cases the optimal selection of the weights $\alpha_k$ will also be determined. The following conditions will be assumed:

(A) $\|A(x) - z^*\| \le K(x)\,\|x - z^*\|$ for all $x \in D$, (3.5.3)

where $z^* \in D$ is a fixed point of $A$, $\|\cdot\|$ is the Euclidean norm, and $K : D \to \mathbb{R}$ is a real valued function;

(B) $(x - z^*)^T (A(x) - z^*) \le \beta(x)\,\|x - z^*\|^2$ for all $x \in D$, (3.5.4)

where $\beta : D \to \mathbb{R}$ is a real valued function.
Before analyzing the convergence of process (3.5.1), some comments are in order. If the mapping $A$ is Lipschitz continuous on $D$, that is, if
$$\|A(x) - A(y)\| \le K\,\|x - y\|,$$
then we may select the constant function $K(x) \equiv K$. If the mapping $-A$ is monotonic, that is, if
$$(x - y)^T (A(x) - A(y)) \le 0 \quad \text{for all } x, y \in D,$$
then we may select $\beta(x) \equiv 0$. The Cauchy-Schwarz inequality implies that for all $x \in D$,
$$(x - z^*)^T (A(x) - z^*) \le \|x - z^*\| \cdot \|A(x) - z^*\| \le K(x)\,\|x - z^*\|^2.$$
Therefore we may assume that $\beta(x) \le K(x)$; otherwise we may replace $\beta(x)$ by $K(x)$. Under assumptions (A) and (B),
$$\begin{aligned} \|x_{k+1} - z^*\|^2 &= \|\alpha_k x_k + (1 - \alpha_k)A(x_k) - \alpha_k z^* - (1 - \alpha_k)z^*\|^2 \\ &= \|\alpha_k (x_k - z^*) + (1 - \alpha_k)(A(x_k) - z^*)\|^2 \\ &= \alpha_k^2 \|x_k - z^*\|^2 + (1 - \alpha_k)^2 \|A(x_k) - z^*\|^2 + 2\alpha_k (1 - \alpha_k)(x_k - z^*)^T (A(x_k) - z^*) \\ &\le \bigl[ \alpha_k^2 + (1 - \alpha_k)^2 K_k^2 + 2\alpha_k (1 - \alpha_k)\beta_k \bigr]\, \|x_k - z^*\|^2, \end{aligned}$$
where $K_k = K(x_k)$ and $\beta_k = \beta(x_k)$. Let $p(\alpha_k)$ denote the multiplier of $\|x_k - z^*\|^2$; then
$$p(\alpha_k) = \alpha_k^2 (1 + K_k^2 - 2\beta_k) - 2\alpha_k (K_k^2 - \beta_k) + K_k^2.$$
Notice that $p(0) = K_k^2$ and $p(1) = 1$; furthermore $p$ is a convex quadratic polynomial unless $1 + K_k^2 - 2\beta_k = 0$. Since
$$1 + K_k^2 - 2\beta_k \ge 1 + K_k^2 - 2K_k = (1 - K_k)^2 \ge 0,$$
this is the case if and only if $K_k = \beta_k = 1$. In that special case $p(\alpha_k) \equiv 1$. Otherwise the stationary point of the function $p$ is
$$\alpha^* = \frac{K_k^2 - \beta_k}{1 + K_k^2 - 2\beta_k}, \tag{3.5.5}$$
which is in $[0, 1)$ if and only if $\beta_k \le K_k^2$ and $\beta_k < 1$. An easy calculation shows that
$$p(\alpha^*) = \frac{K_k^2 - \beta_k^2}{1 + K_k^2 - 2\beta_k},$$
which is below $1 - \varepsilon$ (where $\varepsilon$ is small) when
$$\frac{(1 - \beta_k)^2}{1 + K_k^2 - 2\beta_k} \ge \varepsilon.$$
If $\alpha^* \notin [0, 1)$, then
$$\min\{p(\alpha) : 0 \le \alpha \le 1\} = \begin{cases} K_k^2, & \text{if } K_k \le 1,\\ 1, & \text{if } K_k > 1. \end{cases}$$
The above discussion can be summarized as follows.

Theorem 3.5.1 Assume that for each $k \ge 0$ either $K_k \le 1 - \delta$ and $\beta_k > K_k^2$; or $K_k \le 1$, $\beta_k \le K_k^2$ and $\beta_k \le 1 - \delta$; or $1 < K_k \le Q$ and $\beta_k \le 1 - \delta$, where $\delta$ and $Q$ are fixed constants. Then with an appropriate selection of $\alpha_k$,
$$\|x_{k+1} - z^*\| \le (1 - \varepsilon)\,\|x_k - z^*\|$$
for all $k$ with some $\varepsilon > 0$; therefore the process converges to $z^*$.

Assume next that $K(x) \equiv K$ and $\beta(x) \equiv \beta$ for all $x \in D$. Then the theorem reduces to the following result.

Theorem 3.5.2 Assume that either $K < 1$, or $K \ge 1$ and $\beta < 1$. Then the process converges to $z^*$ with an appropriate selection of the coefficients $\alpha_k$.

Corollary 3.5.3 If the mapping $A$ is Lipschitz continuous and $-A$ is monotonic, then all conditions of Theorem 3.5.2 are satisfied. In this special case the best selection for $\alpha_k$ is
$$\alpha_k = \frac{K^2}{1 + K^2} \quad \text{for all } k.$$
This relation follows from (3.5.5) and the fact that in this case $K(x) \equiv K$ and $\beta(x) \equiv 0$.

Example 3.5.1 Consider the equation $x = -2x$; that is, select $D = \mathbb{R}$, $A(x) = -2x$, and $z^* = 0$. The mapping $A$ is Lipschitz continuous with $K = 2$. The original iteration process $x_{k+1} = A(x_k)$ does not converge for $x_0 \ne 0$. However, by selecting $\alpha_k \equiv \frac45$, the modified process has the form
$$x_{k+1} = \frac45 x_k + \frac15 (-2x_k) = \frac25 x_k,$$
which is obviously convergent, since the mapping $x \to \frac25 x$ is a contraction.

Consider now the generalized algorithm
$$x_{k+1} = \sum_{i=1}^{I} \alpha_{ik} A_i(x_k), \tag{3.5.6}$$
where $A_i : D \to D$ and the $\alpha_{ik}$ are given nonnegative constants such that $\sum_{i=1}^{I} \alpha_{ik} = 1$ for all $k \ge 0$. It is assumed that

(C) for all $i$ and $j$,
$$(A_i(x) - z^*)^T (A_j(x) - z^*) \le \beta_{ij}(x)\,\|x - z^*\|^2$$
for all $x \in D$, where $\beta_{ij} : D \to \mathbb{R}$ is a real valued function for all $i$ and $j$.

Notice that these conditions are straightforward generalizations of assumptions (A) and (B). An easy calculation shows that
$$\|x_{k+1} - z^*\|^2 = \Bigl\| \sum_{i=1}^{I} \alpha_{ik} (A_i(x_k) - z^*) \Bigr\|^2 \le \sum_{i=1}^{I} \sum_{j=1}^{I} \alpha_{ik}\alpha_{jk}\beta_{ijk}\,\|x_k - z^*\|^2,$$
where $\beta_{ijk} = \beta_{ij}(x_k)$. The best choice of the $\alpha_{ik}$ values can be obtained by solving the quadratic programming problem
$$\text{minimize } \sum_{i=1}^{I} \sum_{j=1}^{I} \alpha_{ik}\alpha_{jk}\beta_{ijk} \quad \text{subject to } \alpha_{ik} \ge 0\ (i = 1, 2, \ldots, I), \quad \sum_{i=1}^{I} \alpha_{ik} = 1. \tag{3.5.7}$$
The iteration procedure is convergent if for all $k$ the optimal objective function value is below $1 - \varepsilon$ for some fixed small $\varepsilon > 0$. Since for all $k$ the constraints are linear, the solution of problem (3.5.7) is a simple task. Process (3.5.6) can be further generalized as
$$x_{k+1} = F_k(A_1(x_k), \ldots, A_I(x_k)), \tag{3.5.8}$$
where $A_i : D \to D$ and $F_k : D^I \to D$. Let $z^*$ be a common fixed point of the mappings $A_i$, and assume that $z^* = F_k(z^*, \ldots, z^*)$; furthermore,

(D) for all $x \in D$ and all $i$, $\|A_i(x) - z^*\| \le K_i(x)\,\|x - z^*\|$, where $K_i : D \to \mathbb{R}$ is a real valued function;

(E) for all $x^{(i)} \in D$ $(i = 1, 2, \ldots, I)$ and $k \ge 0$,
$$\bigl\| F_k\bigl(x^{(1)}, \ldots, x^{(I)}\bigr) - z^* \bigr\| \le \sum_{i=1}^{I} \alpha_{ik}\,\bigl\|x^{(i)} - z^*\bigr\|.$$

Under the above conditions,
$$\|x_{k+1} - z^*\| = \|F_k(A_1(x_k), \ldots, A_I(x_k)) - z^*\| \le \sum_{i=1}^{I} \alpha_{ik}\,\|A_i(x_k) - z^*\| \le \Bigl( \sum_{i=1}^{I} \alpha_{ik}K_i \Bigr)\,\|x_k - z^*\|.$$
It is easy to see that there exist coefficients $\alpha_{ik}$ such that $\sum_{i=1}^{I} \alpha_{ik}K_i \le 1 - \varepsilon$ with some $\varepsilon > 0$ if and only if at least one $K_i < 1$. This observation generalizes the earlier results of Section 3.1. Finally we note that $D$ may be a subset of an infinite dimensional Euclidean space, and in the last result $\|\cdot\|$ is not necessarily the Euclidean norm.
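Example 3.5.1 and the optimal weight (3.5.5) can be checked numerically. The sketch below uses the section's own data: $A(x) = -2x$ on $D = \mathbb{R}$, so $K = 2$, $\beta = 0$ (since $-A$ is monotone) and $z^* = 0$.

```python
# Numerical check of Example 3.5.1 and of the optimal weight (3.5.5):
# A(x) = -2x on D = R, so K = 2, beta = 0 and z* = 0.
K, beta = 2.0, 0.0
alpha_star = (K**2 - beta) / (1 + K**2 - 2 * beta)   # (3.5.5): 4/5

def A(x):
    return -2.0 * x

x = 1.0
for _ in range(50):
    x = alpha_star * x + (1 - alpha_star) * A(x)     # process (3.5.1)
# each step multiplies x by 4/5 - (1/5)*2 = 2/5, which is well below the
# guaranteed squared-error factor p(alpha*) = (K**2 - beta**2)/(1 + K**2 - 2*beta) = 0.8
```

The bound $p(\alpha^*)$ is not tight here because Cauchy-Schwarz is loose for this particular $A$; the theory only promises a factor below 1, which the run confirms.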
3.6 Stability of the Ishikawa iteration
Let $X$ be a real uniformly smooth Banach space and $T : X \to X$ a $\Phi$-strongly pseudocontractive mapping with bounded range. It is shown that the Ishikawa iteration procedures are almost $T$-stable and half $T$-stable. Our results improve and extend corresponding results announced by others.

Let $X$ be a real Banach space and $J$ the normalized duality mapping from $X$ into $2^{X^*}$ given by
$$J(x) = \{ f \in X^* : \langle x, f \rangle = \|x\|^2 = \|f\|^2 \},$$
where $X^*$ denotes the dual space of $X$ and $\langle \cdot, \cdot \rangle$ denotes the generalized duality pairing. It is well known that if $X^*$ is strictly convex then $J$ is single-valued and satisfies $J(tx) = tJ(x)$ for all $t \ge 0$ and $x \in X$, and that if $X^*$ is uniformly convex, then $J$ is uniformly continuous on the bounded subsets of $X$. In the sequel we shall denote the single-valued normalized duality mapping by $j$.

Definition 3.6.1 A mapping $T$ with domain $D(T)$ and range $R(T)$ in $X$ is called $\Phi$-strongly pseudocontractive if for all $x, y \in D(T)$ there exist $j(x - y) \in J(x - y)$ and a strictly increasing function $\Phi : [0, \infty) \to [0, \infty)$ with $\Phi(0) = 0$ such that
$$\langle Tx - Ty, j(x - y) \rangle \le \|x - y\|^2 - \Phi(\|x - y\|)\,\|x - y\|. \tag{3.6.1}$$
$T$ is called strongly pseudocontractive if there exists $k > 0$ such that (3.6.1) is satisfied with $\Phi(t) = kt$, and it is called $\Phi$-hemicontractive if the nullspace of $T$, $N(T) = \{x \in D(T) : Tx = 0\}$, is nonempty and (3.6.1) is satisfied for all $x \in D(T)$ and $y \in N(T)$. The class of strongly pseudocontractive mappings is a proper subclass of the class of $\Phi$-strongly pseudocontractive mappings.

Recently, stability results for certain classes of nonlinear maps have been obtained by several authors; the topic of stability is both of theoretical and of numerical interest. In order to prove the main results of this section we need the following concepts. Let $K$ be a nonempty convex subset of a real Banach space $X$ and $T$ a self-mapping of $K$. Let $\{\alpha_n\}$ be a real sequence in $[0, \infty)$. Assume that $x_0 \in K$ and that $x_{n+1} = f(T, x_n, \alpha_n)$ defines an iteration procedure which yields a sequence of points $\{x_n\}$ in $K$. Suppose, furthermore, that $F(T) = \{x \in X : Tx = x\} \ne \emptyset$ and that $\{x_n\}$ converges strongly to $x^* \in F(T)$. Let $\{y_n\}_{n=0}^{\infty}$ be any sequence in $K$ and $\{\varepsilon_n\}_{n=0}^{\infty}$ a sequence in $\mathbb{R}^+ = [0, \infty)$ given by $\varepsilon_n = \|y_{n+1} - f(T, y_n, \alpha_n)\|$.

Definition 3.6.2
1) If $\lim_{n\to\infty} \varepsilon_n = 0$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ defined by $x_{n+1} = f(T, x_n, \alpha_n)$ is said to be $T$-stable.
2) If $\sum_{n=1}^{\infty} \varepsilon_n < \infty$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be almost $T$-stable.
3) If $\varepsilon_n = o(\alpha_n)$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be half $T$-stable.
4) If $\varepsilon_n = \varepsilon_n' + \varepsilon_n''$ with $\sum_{n=1}^{\infty} \varepsilon_n' < \infty$ and $\varepsilon_n'' = o(\alpha_n)$ implies that $\lim_{n\to\infty} y_n = x^*$, then the iteration procedure $\{x_n\}$ is said to be weakly $T$-stable.

We remark immediately that if an iterative sequence $\{x_n\}$ is $T$-stable, then it is weakly $T$-stable, and if it is weakly $T$-stable, then it is both almost and half $T$-stable. Conversely, an iterative sequence $\{x_n\}$ which is either almost or half $T$-stable may fail to be weakly $T$-stable. Accordingly, it is of theoretical interest to study weak stability.
Lemma 3.6.3 Let $\{a_n\}_{n=0}^{\infty}, \{b_n\}_{n=0}^{\infty}$ be two nonnegative real sequences satisfying
$$a_{n+1} \le a_n + b_n, \qquad \sum_{n=1}^{\infty} b_n < \infty.$$
Then $\lim_{n\to\infty} a_n$ exists.

Lemma 3.6.4 Let $X$ be a real Banach space. Then for all $x, y \in X$ and $j(x + y) \in J(x + y)$,
$$\|x + y\|^2 \le \|x\|^2 + 2\langle y, j(x + y) \rangle.$$

Theorem 3.6.5 Let $X$ be a real uniformly smooth Banach space and $T : X \to X$ a continuous and $\Phi$-strongly pseudocontractive mapping with bounded range. Suppose $q$ is the fixed point of the mapping $T$. Let $\{\alpha_n\}_{n=1}^{\infty}, \{\beta_n\}_{n=1}^{\infty}$ be two real sequences satisfying
$$0 \le \alpha_n, \beta_n < 1 \ \ \forall n \ge 1, \qquad \lim_{n\to\infty} \alpha_n = \lim_{n\to\infty} \beta_n = 0, \qquad \sum_{n=1}^{\infty} \alpha_n = \infty.$$
Let $\{x_n\}_{n=1}^{\infty}$ be the sequence of the Ishikawa iteration [198] generated from an arbitrary $x_1 \in X$ by
$$x_{n+1} = (1 - \alpha_n)x_n + \alpha_n T z_n, \qquad z_n = (1 - \beta_n)x_n + \beta_n T x_n, \quad n \ge 1.$$
Let $\{y_n\}_{n=1}^{\infty}$ be any sequence in $X$ and define $\{\varepsilon_n\}_{n=1}^{\infty} \subset \mathbb{R}^+$ by
$$\varepsilon_n = \|y_{n+1} - (1 - \alpha_n)y_n - \alpha_n T \tau_n\|, \qquad \tau_n = (1 - \beta_n)y_n + \beta_n T y_n, \quad n \ge 1.$$
Then the Ishikawa iteration procedure $\{x_n\}_{n=1}^{\infty}$ is half $T$-stable.

Proof. Let $q$ denote the unique fixed point of $T$ and suppose $\varepsilon_n = o(\alpha_n)$, so that $\varepsilon_n \to 0$ as $n \to \infty$. It suffices to prove that $y_n \to q$ as $n \to \infty$. Without loss of generality, we may assume that $\varepsilon_n \le \alpha_n$ for all $n \ge 1$. Now set
$$M_0 = \sup_{x \in X} \{\|Tx - q\|\} + \|y_1 - q\|, \qquad M = M_0 + 1.$$
For any $n \ge 1$ we get $\|y_n - q\| \le M$, by induction. Indeed, $\|y_1 - q\| \le M_0 \le M$, and if $\|y_n - q\| \le M$ then
$$\begin{aligned} \|y_{n+1} - q\| &\le \|y_{n+1} - (1 - \alpha_n)y_n - \alpha_n T\tau_n\| + (1 - \alpha_n)\|y_n - q\| + \alpha_n \|T\tau_n - q\| \\ &\le \varepsilon_n + (1 - \alpha_n)M + \alpha_n M_0 \le \alpha_n + (1 - \alpha_n)M + \alpha_n M_0 = M + \alpha_n (1 + M_0 - M) = M. \end{aligned}$$
Since $T$ is a $\Phi$-strongly pseudocontractive mapping, we have
$$\langle Tx - Ty, J(x - y) \rangle \le \|x - y\|^2 - \Phi(\|x - y\|)\,\|x - y\| \quad \text{for all } x, y \in X.$$
Let $v_n = (1 - \alpha_n)y_n + \alpha_n T\tau_n$ and $u_n = y_{n+1} - v_n$. Then $y_{n+1} = u_n + v_n$, $\|u_n\| = \varepsilon_n$, and
$$\|y_{n+1} - q\| \le \|v_n - q\| + \|u_n\| = \|v_n - q\| + \varepsilon_n. \tag{3.6.2}$$
Using Lemma 3.6.4, we have
$$\begin{aligned} \|v_n - q\|^2 &= \|(1 - \alpha_n)(y_n - q) + \alpha_n(T\tau_n - Tq)\|^2 \\ &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\langle T\tau_n - Tq, J(v_n - q) \rangle \\ &= (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\langle T\tau_n - Tq, J(\tau_n - q) \rangle + 2\alpha_n\langle T\tau_n - Tq, J(v_n - q) - J(\tau_n - q) \rangle \\ &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n\|\tau_n - q\|^2 - 2\alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2\alpha_n d_n, \end{aligned} \tag{3.6.3}$$
where $d_n = \langle T\tau_n - Tq, J(v_n - q) - J(\tau_n - q) \rangle \to 0$ as $n \to \infty$ (using the fact that $J$ is uniformly continuous on the bounded subsets of $X$). Again applying Lemma 3.6.4, we have
$$\begin{aligned} \|\tau_n - q\|^2 &= \|(1 - \beta_n)(y_n - q) + \beta_n(Ty_n - Tq)\|^2 \\ &\le (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\langle Ty_n - Tq, J(\tau_n - q) \rangle \\ &= (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\langle Ty_n - Tq, J(y_n - q) \rangle + 2\beta_n\langle Ty_n - Tq, J(\tau_n - q) - J(y_n - q) \rangle \\ &\le (1 - \beta_n)^2\|y_n - q\|^2 + 2\beta_n\|y_n - q\|^2 - 2\beta_n\Phi(\|y_n - q\|)\|y_n - q\| + 2\beta_n e_n \\ &= (1 + \beta_n^2)\|y_n - q\|^2 - 2\beta_n\Phi(\|y_n - q\|)\|y_n - q\| + 2\beta_n e_n, \end{aligned} \tag{3.6.4}$$
where $e_n = \langle Ty_n - Tq, J(\tau_n - q) - J(y_n - q) \rangle \to 0$ as $n \to \infty$ (as above).
Substituting (3.6.4) into (3.6.3) yields
$$\begin{aligned} \|v_n - q\|^2 &\le (1 - \alpha_n)^2\|y_n - q\|^2 + 2\alpha_n(1 + \beta_n^2)\|y_n - q\|^2 - 4\alpha_n\beta_n\Phi(\|y_n - q\|)\|y_n - q\| \\ &\qquad + 4\alpha_n\beta_n e_n - 2\alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2\alpha_n d_n \\ &= (1 + \alpha_n^2 + 2\alpha_n\beta_n^2)\|y_n - q\|^2 + 4\alpha_n\beta_n[e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \\ &\le \|y_n - q\|^2 + (\alpha_n^2 + 2\alpha_n\beta_n^2)M^2 + 4\alpha_n\beta_n[e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \\ &= \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\|y_n - q\|)\|y_n - q\|] + 2\alpha_n[M^2\alpha_n/2 + d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|]. \end{aligned} \tag{3.6.5}$$
We shall prove that $\inf_{n \ge 1}\{\|y_n - q\|\} = 0$. If not, there must exist $\delta > 0$ such that $\inf_{n \ge 1}\{\|y_n - q\|\} = \delta$. That is to say, $\|y_n - q\| \ge \delta$ for all $n \ge 1$, and hence $\Phi(\|y_n - q\|)\|y_n - q\| \ge \Phi(\delta)\delta$. From (3.6.4) we obtain
$$\|\tau_n - q\|^2 \le \|y_n - q\|^2 + M^2\beta_n^2 + 2e_n\beta_n - 2\beta_n\Phi(\delta)\delta = \|y_n - q\|^2 + \beta_n(M^2\beta_n + 2e_n - 2\Phi(\delta)\delta). \tag{3.6.6}$$
Since $\beta_n \to 0$ and $e_n \to 0$ as $n \to \infty$, there exists some integer $N_1$ such that $M^2\beta_n + 2e_n - 2\Phi(\delta)\delta < 0$ for all $n > N_1$. Thus we have
$$\|\tau_n - q\| \le \|y_n - q\| \quad \text{and} \quad -\Phi(\|\tau_n - q\|)\|\tau_n - q\| \ge -\Phi(\|y_n - q\|)\|y_n - q\| \tag{3.6.7}$$
for all $n > N_1$. Taking these facts into consideration in (3.6.5), we obtain
$$\|v_n - q\|^2 \le \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] + 2\alpha_n[M^2\alpha_n/2 + d_n - \Phi(\|\tau_n - q\|)\|\tau_n - q\|] \tag{3.6.8}$$
for all $n > N_1$. Furthermore, we have the estimate
$$\|\tau_n - q\| = \|(1 - \beta_n)(y_n - q) + \beta_n(Ty_n - Tq)\| \ge \|y_n - q\| - \beta_n\|y_n - q\| - \beta_n\|Ty_n - Tq\| \ge \|y_n - q\| - \beta_n(M + M_0). \tag{3.6.9}$$
Again since $\beta_n \to 0$ as $n \to \infty$, there exists some integer $N_2 > N_1$ such that $\beta_n < \delta/(2(M + M_0))$ for all $n > N_2$. Hence, from (3.6.9) we obtain
$$\delta/2 \le \|y_n - q\| - \delta/2 \le \|\tau_n - q\| \tag{3.6.10}$$
for all $n > N_2$. Using (3.6.10) and (3.6.8), we have for all $n > N_2$
$$\|v_n - q\|^2 \le \|y_n - q\|^2 + 4\alpha_n\beta_n[M^2\beta_n/2 + e_n - \Phi(\delta/2)\delta/2] + \alpha_n[M^2\alpha_n + 2d_n - \Phi(\delta/2)\delta/2] - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\|. \tag{3.6.11}$$
Owing to $\alpha_n, \beta_n, e_n, d_n \to 0$ as $n \to \infty$, there exists an integer $N_3 > N_2$ such that
$$M^2\beta_n/2 + e_n - \Phi(\delta/2)\delta/2 < 0, \qquad M^2\alpha_n + 2d_n - \Phi(\delta/2)\delta/2 < 0$$
for all $n > N_3$. Thus from (3.6.11) we get
$$\|v_n - q\| \le \|y_n - q\|, \qquad \|v_n - q\|^2 \le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| \tag{3.6.12}$$
for all $n > N_3$. Since $\varepsilon_n = o(\alpha_n)$, we may choose a positive integer $N_4 > N_3$ such that
$$\frac12\,\Phi(\delta/2)\frac{\delta}{2} - 2M\,\frac{o(\alpha_n)}{\alpha_n} - \frac{[o(\alpha_n)]^2}{\alpha_n} > 0 \tag{3.6.13}$$
for all $n > N_4$. From (3.6.2), (3.6.10), (3.6.12) and (3.6.13) we obtain
$$\begin{aligned} \|y_{n+1} - q\|^2 &\le \|v_n - q\|^2 + 2\|v_n - q\|\varepsilon_n + \varepsilon_n^2 \\ &\le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2M\varepsilon_n + \varepsilon_n^2 \\ &\le \|y_n - q\|^2 - \alpha_n\Phi(\|\tau_n - q\|)\|\tau_n - q\| + 2M[o(\alpha_n)] + [o(\alpha_n)]^2 \\ &\le \|y_n - q\|^2 - \frac12\alpha_n\Phi(\delta/2)\frac{\delta}{2} - \alpha_n\Bigl( \frac12\Phi(\delta/2)\frac{\delta}{2} - \frac{2M[o(\alpha_n)]}{\alpha_n} - \frac{[o(\alpha_n)]^2}{\alpha_n} \Bigr) \\ &\le \|y_n - q\|^2 - \frac12\alpha_n\Phi(\delta/2)\frac{\delta}{2} \qquad (3.6.14) \\ &\le \|y_n - q\|^2 \qquad (3.6.15) \end{aligned}$$
for all $n > N_4$. From (3.6.15) we get that $\lim_{n\to\infty}\|y_n - q\|$ exists. Thus (3.6.14) implies that
$$\frac12\,\Phi(\delta/2)\,\frac{\delta}{2} \sum_{n=N_4+1}^{\infty} \alpha_n \le \|y_{N_4+1} - q\|^2 - \lim_{n\to\infty}\|y_{n+1} - q\|^2 < \infty.$$
3. Contractive Fixed Point Theory
This contradicts Σ_{n=1}^∞ α_n = ∞. Thus

inf_n ‖y_n − q‖ = 0.     (3.6.16)

It remains to prove that lim_{n→∞} ‖y_n − q‖ = 0. From (3.6.16) we get that there
exists a subsequence with ‖y_{n_j} − q‖ → 0 as j → ∞. Since
τ_{n_j} − y_{n_j} = −β_{n_j} y_{n_j} + β_{n_j} T y_{n_j} → 0 as j → ∞, we have
‖τ_{n_j} − q‖ ≤ ‖τ_{n_j} − y_{n_j}‖ + ‖y_{n_j} − q‖ → 0 as j → ∞, and thus

‖y_{n_j+1} − q‖ ≤ ‖v_{n_j} − q‖ + ε_{n_j}
  ≤ {‖y_{n_j} − q‖² + 4α_{n_j} β_{n_j} [(M²/2)β_{n_j} + e_{n_j} − Φ(‖y_{n_j} − q‖) ‖y_{n_j} − q‖]
      + 2α_{n_j} [(M²/2)α_{n_j} + d_{n_j} − Φ(‖τ_{n_j} − q‖) ‖τ_{n_j} − q‖]}^{1/2} + ε_{n_j} → 0

as j → ∞. Using the continuity of Φ and induction, we obtain ‖y_{n_j+m} − q‖ → 0
as j → ∞ for any natural number m. Hence lim_{n→∞} ‖y_n − q‖ = 0.
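As a concrete illustration, the Ishikawa scheme analyzed above can be sketched numerically on the real line. Everything below is an assumption made for this sketch only: the test map T, the parameters α_n = β_n = 1/(n+1) (so that α_n, β_n → 0 and Σ α_n = ∞), and the iteration count.

```python
def ishikawa(T, x0, steps):
    """Ishikawa iteration: z_n = (1-b_n) x_n + b_n T(x_n),
    x_{n+1} = (1-a_n) x_n + a_n T(z_n)."""
    x = x0
    for n in range(1, steps + 1):
        a = b = 1.0 / (n + 1)          # a_n, b_n -> 0, sum of a_n diverges
        z = (1 - b) * x + b * T(x)
        x = (1 - a) * x + a * T(z)
    return x

# Hypothetical test map on [-1, 1] with unique fixed point q = 0.
T = lambda x: 0.5 * x - 0.1 * x**3
q = ishikawa(T, x0=0.9, steps=5000)
print(abs(q))                          # distance to the fixed point 0
```

For this map the iterates drift toward 0 slowly, since the step sizes α_n decay; this is typical of Mann/Ishikawa-type schemes compared with plain Picard iteration.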
Theorem 3.6.6 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strongly accretive operator. Suppose (I − T) has a bounded range.
Define S : X → X by Sx = f + x − Tx, and suppose q is the fixed point of the
mapping S. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n S z_n,
z_n = (1 − β_n)x_n + β_n S x_n,  n ≥ 1.

Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = o(α_n) = ‖y_{n+1} − (1 − α_n)y_n − α_n S τ_n‖,
τ_n = (1 − β_n)y_n + β_n S y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is half S-stable.

Proof. Since T is a Φ-strongly accretive operator, S is a Φ-strong pseudocontraction.
The conclusion now follows from Theorem 3.6.5.

Theorem 3.6.7 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strong pseudocontraction mapping with bounded range.
Suppose q is the fixed point of the mapping T. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be
two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n T z_n,
z_n = (1 − β_n)x_n + β_n T x_n,  n ≥ 1.

Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = ‖y_{n+1} − (1 − α_n)y_n − α_n T τ_n‖,
τ_n = (1 − β_n)y_n + β_n T y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is almost T-stable.

Proof. Set M₀ = sup_{x∈X} {‖Tx − q‖} + ‖y₁ − q‖ and M = M₀ + Σ_{n=1}^∞ ε_n.
For any n ≥ 1, using induction, we get ‖y_n − q‖ ≤ M. Indeed we have
‖y₁ − q‖ ≤ M₀ ≤ M,

‖y₂ − q‖ ≤ ‖y₂ − (1 − α₁)y₁ − α₁ T τ₁‖ + (1 − α₁) ‖y₁ − q‖ + α₁ ‖T τ₁ − q‖
         ≤ ε₁ + (1 − α₁)M₀ + α₁ M₀ ≤ ε₁ + M₀ ≤ M,

‖y₃ − q‖ ≤ ε₂ + ε₁ + M₀ ≤ M, . . . , ‖y_n − q‖ ≤ Σ_{i=1}^{n−1} ε_i + M₀ ≤ M.

Since T is a Φ-strong pseudocontraction mapping, for all x, y ∈ X we have

⟨Tx − Ty, J(x − y)⟩ ≤ ‖x − y‖² − Φ(‖x − y‖) ‖x − y‖.

Let v_n = (1 − α_n)y_n + α_n T τ_n and u_n = y_{n+1} − v_n. Then y_{n+1} = u_n + v_n,
‖u_n‖ = ε_n, and

‖y_{n+1} − q‖ ≤ ‖v_n − q‖ + ‖u_n‖ = ‖v_n − q‖ + ε_n.     (3.6.17)
Using Lemma 3.6.4, we have

‖v_n − q‖² = ‖(1 − α_n)y_n + α_n T τ_n − q‖²
  = ‖(1 − α_n)(y_n − q) + α_n (T τ_n − Tq)‖²
  ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n ⟨T τ_n − Tq, J(v_n − q)⟩
  = (1 − α_n)² ‖y_n − q‖² + 2α_n ⟨T τ_n − Tq, J(τ_n − q)⟩
      + 2α_n ⟨T τ_n − Tq, J(v_n − q) − J(τ_n − q)⟩
  ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n ‖τ_n − q‖² − 2α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2α_n d_n,     (3.6.18)

where d_n = ⟨T τ_n − Tq, J(v_n − q) − J(τ_n − q)⟩ → 0 (n → ∞), since J is uniformly
continuous on the bounded subsets of X. Again applying Lemma 3.6.4, we have

‖τ_n − q‖² = ‖(1 − β_n)(y_n − q) + β_n (T y_n − Tq)‖²
  ≤ (1 − β_n)² ‖y_n − q‖² + 2β_n ⟨T y_n − Tq, J(τ_n − q)⟩
  = (1 − β_n)² ‖y_n − q‖² + 2β_n ⟨T y_n − Tq, J(y_n − q)⟩
      + 2β_n ⟨T y_n − Tq, J(τ_n − q) − J(y_n − q)⟩
  ≤ (1 − β_n)² ‖y_n − q‖² + 2β_n ‖y_n − q‖² − 2β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 2β_n e_n
  = (1 + β_n²) ‖y_n − q‖² − 2β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 2β_n e_n,     (3.6.19)

where e_n = ⟨T y_n − Tq, J(τ_n − q) − J(y_n − q)⟩ → 0 (n → ∞), for the same reason.
Substituting (3.6.19) into (3.6.18) yields

‖v_n − q‖² ≤ (1 − α_n)² ‖y_n − q‖² + 2α_n (1 + β_n²) ‖y_n − q‖²
      − 4α_n β_n Φ(‖y_n − q‖) ‖y_n − q‖ + 4α_n β_n e_n
      − 2α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2α_n d_n
  = (1 + α_n² + 2α_n β_n²) ‖y_n − q‖² + 4α_n β_n [e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
  ≤ ‖y_n − q‖² + (α_n² + 2α_n β_n²)M² + 4α_n β_n [e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
  = ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(‖y_n − q‖) ‖y_n − q‖]
      + 2α_n [(M²/2)α_n + d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖].     (3.6.20)

We shall prove that inf_{n≥1} {‖y_n − q‖} = 0. If not, there must exist δ > 0 such that
inf_{n≥1} {‖y_n − q‖} = δ. That is, ‖y_n − q‖ ≥ δ for all n ≥ 1, and hence
Φ(‖y_n − q‖) ‖y_n − q‖ ≥ Φ(δ)δ. From (3.6.19), we obtain

‖τ_n − q‖² ≤ ‖y_n − q‖² + M² β_n² + 2e_n β_n − 2β_n Φ(δ)δ
           = ‖y_n − q‖² + β_n (M² β_n + 2e_n − 2Φ(δ)δ).     (3.6.21)

Since β_n → 0 and e_n → 0 as n → ∞, there exists an integer N₁ such that
M² β_n + 2e_n − 2Φ(δ)δ < 0 for all n > N₁. Thus we have ‖τ_n − q‖ ≤ ‖y_n − q‖ and

−Φ(‖τ_n − q‖) ‖τ_n − q‖ ≥ −Φ(‖y_n − q‖) ‖y_n − q‖     (3.6.22)

for all n > N₁. Taking these facts into account in (3.6.20), we obtain

‖v_n − q‖² ≤ ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]
            + 2α_n [(M²/2)α_n + d_n − Φ(‖τ_n − q‖) ‖τ_n − q‖]     (3.6.23)

for all n > N₁. Furthermore, we have the estimate

‖τ_n − q‖ = ‖(1 − β_n)(y_n − q) + β_n (T y_n − Tq)‖
          ≥ ‖y_n − q‖ − β_n ‖y_n − q‖ − β_n ‖T y_n − Tq‖
          ≥ ‖y_n − q‖ − β_n (M + M₀).     (3.6.24)

Again, since β_n → 0 as n → ∞, there exists an integer N₂ > N₁ such that
β_n < δ/(2(M + M₀)) for all n > N₂. Hence, from (3.6.23) and (3.6.24) we obtain

δ/2 ≤ ‖y_n − q‖ − δ/2 ≤ ‖τ_n − q‖     (3.6.25)

for all n > N₂ > N₁. Using (3.6.25) and (3.6.23) we have, for all n > N₃,

‖v_n − q‖² ≤ ‖y_n − q‖² + 4α_n β_n [(M²/2)β_n + e_n − Φ(δ/2)δ/2]
            + α_n [M² α_n + 2d_n − Φ(δ/2)δ/2] − α_n Φ(‖τ_n − q‖) ‖τ_n − q‖.     (3.6.26)

Owing to α_n, β_n, e_n, d_n → 0 as n → ∞, we may assume that there exists N₃ > N₂
such that

(M²/2)β_n + e_n − Φ(δ/2)δ/2 < 0,     M² α_n + 2d_n − Φ(δ/2)δ/2 < 0
for all n > N₃. Thus from (3.6.26) we have ‖v_n − q‖ ≤ ‖y_n − q‖ for all n > N₃.
It follows that ‖y_{n+1} − q‖ ≤ ‖y_n − q‖ + ε_n for all n > N₃. Using Lemma 3.6.3,
we obtain that lim_{n→∞} ‖y_n − q‖ exists. From (3.6.17) and (3.6.26) we get

‖y_{n+1} − q‖² ≤ ‖v_n − q‖² + 2‖v_n − q‖ ε_n + ε_n²
  ≤ ‖y_n − q‖² − α_n Φ(‖τ_n − q‖) ‖τ_n − q‖ + 2Mε_n + ε_n²

for all n > N₃, which implies that

(1/2) Σ_{n=N₃+1}^∞ α_n Φ(δ/2)δ/2 ≤ ‖y_{N₃+1} − q‖² − lim_{n→∞} ‖y_{n+1} − q‖²
      + Σ (2Mε_n + ε_n²) < ∞.

This contradicts Σ_{n=1}^∞ α_n = ∞. Thus

inf_n ‖y_n − q‖ = 0.     (3.6.27)

It remains to prove that lim_{n→∞} ‖y_n − q‖ = 0. From (3.6.27) we get that there
exists a subsequence with ‖y_{n_j} − q‖ → 0 as j → ∞. Since
τ_{n_j} − y_{n_j} = −β_{n_j} y_{n_j} + β_{n_j} T y_{n_j} → 0 as j → ∞, we have
‖τ_{n_j} − q‖ ≤ ‖τ_{n_j} − y_{n_j}‖ + ‖y_{n_j} − q‖ → 0 as j → ∞, and thus

‖y_{n_j+1} − q‖ ≤ ‖v_{n_j} − q‖ + ε_{n_j}
  ≤ {‖y_{n_j} − q‖² + 4α_{n_j} β_{n_j} [(M²/2)β_{n_j} + e_{n_j} − Φ(‖y_{n_j} − q‖) ‖y_{n_j} − q‖]
      + 2α_{n_j} [(M²/2)α_{n_j} + d_{n_j} − Φ(‖τ_{n_j} − q‖) ‖τ_{n_j} − q‖]}^{1/2} + ε_{n_j} → 0

as j → ∞. Using the continuity of Φ and induction, we obtain ‖y_{n_j+m} − q‖ → 0
as j → ∞ for any natural number m. Hence lim_{n→∞} ‖y_n − q‖ = 0.
Theorem 3.6.8 Let X be a real uniformly smooth Banach space and T : X → X a
continuous Φ-strongly accretive operator. Suppose (I − T) has a bounded range.
Define S : X → X by Sx = f + x − Tx, and suppose q is the fixed point of the
mapping S. Let {α_n}_{n=1}^∞, {β_n}_{n=1}^∞ be two real sequences satisfying:

(1) 0 ≤ α_n, β_n < 1, ∀n ≥ 1;

(2) lim_{n→∞} α_n = lim_{n→∞} β_n = 0;

(3) Σ_{n=1}^∞ α_n = ∞.

Let {x_n}_{n=1}^∞ be the Ishikawa iteration generated from an arbitrary x₁ ∈ X by

x_{n+1} = (1 − α_n)x_n + α_n S z_n,
z_n = (1 − β_n)x_n + β_n S x_n,  n ≥ 1.
Let {y_n}_{n=1}^∞ be any sequence in X and define {ε_n}_{n=1}^∞ ⊂ R⁺ by

ε_n = ‖y_{n+1} − (1 − α_n)y_n − α_n S τ_n‖,
τ_n = (1 − β_n)y_n + β_n S y_n,  n ≥ 1.

Then the Ishikawa iteration procedure {x_n}_{n=1}^∞ is almost S-stable.

Proof. Since T is a Φ-strongly accretive operator, S is a Φ-strong pseudocontraction.
The conclusion now follows from Theorem 3.6.7.
3.7 Exercises
3.1. Consider the problem of approximating a solution y ∈ C′[0, t₀] of the nonlinear
ordinary differential equation

dy/dt = K(t, y(t)),  0 ≤ t ≤ t₀,  y(0) = y₀.

The above equation may be turned into a fixed point problem of the form

y(t) = y₀ + ∫₀ᵗ K(s, y(s)) ds,  0 ≤ t ≤ t₀.

Assume K(s, y) is continuous on [0, t₀] × [0, t₀] and satisfies the Lipschitz condition

max_{[0,t₀]} |K(s, q₁(s)) − K(s, q₂(s))| ≤ M ‖q₁ − q₂‖, for all q₁, q₂ ∈ C[0, t₀].

Note that the integral equation above defines an operator P from C[0, t₀] into itself.
Find a sufficient condition for P to be a contraction mapping.
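A numerical sketch of the operator P of Exercise 3.1 (the right-hand side K(t, y) = −y, the initial value, the interval, and the grid are assumptions for illustration; here the Lipschitz constant is M = 1 and Mt₀ = 0.5 < 1, one familiar sufficient condition for a contraction in the sup norm):

```python
import math

T0, N = 0.5, 200                       # interval [0, T0] and grid size
ts = [T0 * i / N for i in range(N + 1)]

def K(t, y):
    return -y                          # hypothetical right-hand side

def picard_step(y):
    """Apply P(y)(t) = y0 + int_0^t K(s, y(s)) ds via the trapezoid rule."""
    y0, out, acc = 1.0, [1.0], 0.0
    for i in range(1, N + 1):
        h = ts[i] - ts[i - 1]
        acc += 0.5 * h * (K(ts[i - 1], y[i - 1]) + K(ts[i], y[i]))
        out.append(y0 + acc)
    return out

y = [1.0] * (N + 1)                    # initial guess: the constant function 1
for _ in range(30):                    # iterate the contraction
    y = picard_step(y)

err = max(abs(y[i] - math.exp(-ts[i])) for i in range(N + 1))
print(err)                             # iterates approach the solution exp(-t)
```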
3.2. Let F be a contraction mapping (with constant c) on the ball Ū(x₀, r) in a
Banach space X, and let ‖F(x₀) − x₀‖ ≤ (1 − c)r. Show that F has a unique fixed
point in Ū(x₀, r).

3.3. Under the assumptions of Theorem 3.1.2, show that the sequence generated by
(3.1.2) minimizes the functional f(x) = ‖x − F(x)‖ for any x₀ belonging to a closed
set A such that F(A) ⊆ A.

3.4. Let the equation F(x) = x have a unique solution in a closed subset A of a
Banach space X. Assume that there exists an operator F₁ such that F₁(A) ⊆ A and
F₁ commutes with F on A. Show that the equation x = F₁(x) has at least one
solution in A.
3.5. Assume that the operator F maps a closed set A into a compact subset of itself
and satisfies

‖F(x) − F(y)‖ < ‖x − y‖  (x ≠ y), for all x, y ∈ A.

Show that F has a unique fixed point in A. Apply these results to the mapping
F(x) = x − x²/2 of the interval [0, 1] into itself.

3.6. Show that the operator F defined by F(x) = x + 1/x maps the half line [1, ∞)
into itself and satisfies ‖F(x) − F(y)‖ < ‖x − y‖ but has no fixed point in this set.

3.7. Consider the condition

‖F(x) − F(y)‖ ≤ ‖x − y‖  (x, y ∈ A).

Let A be either an interval [a, b] or a disk x² + y² ≤ r². Find conditions in both
cases under which F has a fixed point.

3.8. Consider the set c₀ of null sequences x = {x₁, x₂, . . .} (x_n → 0) equipped with
the norm ‖x‖ = max_n |x_n|. Define the operator F by

F(x) = {½(1 + ‖x‖), ¾x₁, ⅞x₂, . . . , (1 − 1/2^{n+1})x_n, . . .}.

Show that F : Ū(0, 1) → Ū(0, 1) satisfies ‖F(x) − F(y)‖ < ‖x − y‖, but has no fixed
points.

3.9. Repeat Exercise 3.8 for the operator F defined in c₀ by F(x) = {y₁, . . . , y_n, . . .},
where y_n = ((n − 1)/n)x_n + (1/n) sin(n) (n ≥ 1).

3.10. Repeat Exercise 3.8 for the operator F defined in C[0, 1] by
Ax(t) = (1 − t)x(t) + t sin(1/t).
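The contrast between Exercises 3.5 and 3.6 can be seen numerically (the iteration counts below are arbitrary choices): both maps strictly decrease distances between distinct points, but only the first acts on a compact set and therefore possesses a fixed point.

```python
def iterate(F, x, n):
    """Apply F to x a total of n times."""
    for _ in range(n):
        x = F(x)
    return x

F1 = lambda x: x - x**2 / 2      # maps [0, 1] into itself; fixed point 0
F2 = lambda x: x + 1.0 / x       # maps [1, inf) into itself; no fixed point

a = iterate(F1, 1.0, 10_000)
b = iterate(F2, 1.0, 10_000)
print(a)   # tends to the unique fixed point 0 (slowly, roughly like 2/n)
print(b)   # grows without bound, roughly like sqrt(2 n)
```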
3.11. Let F be a nonlinear operator on a Banach space X which satisfies (3.1.3) on
Ū(0, r). Let F(0) = 0. Define the resolvent R(x) of F by

R(x)f = x F(R(x)f) + f.

Show:

(a) R(x) is defined on the ball ‖f‖ ≤ (1 − |x|c)r if |x| < c⁻¹;

(b) (1/(1 + |x|c)) ‖f − g‖ ≤ ‖R(x)f − R(x)g‖ ≤ (1/(1 − |x|c)) ‖f − g‖;

(c) ‖R(x)f − R(y)f‖ ≤ c ‖f‖ |x − y| / ((1 − |x|c)(1 − |y|c)).
3.12. Let F be an operator mapping a closed set A of a Banach space X into itself.
Assume that there exists a positive integer m such that F^m is a contraction
operator. Prove that sequence (3.1.2) converges to the unique fixed point of F in A.

3.13. Let F be an operator mapping a compact set A ⊆ X into itself with
‖F(x) − F(y)‖ < ‖x − y‖ (x ≠ y, all x, y ∈ A). Show that sequence (3.1.2) converges
to a fixed point of (3.1.1).

3.14. Let F be a continuous function on [0, 1] with 0 ≤ F(x) ≤ 1 for all x ∈ [0, 1].
Define the sequence

x_{n+1} = x_n + (1/(n + 1))(F(x_n) − x_n).

Show that for any x₀ ∈ [0, 1] the sequence {x_n} (n ≥ 0) converges to a fixed point
of F.

3.15. Show:

(a) A system x = Ax + b of n linear equations in n unknowns x₁, x₂, . . . , x_n (the
components of x), with A = {a_{jk}}, j, k = 1, 2, . . . , n, and b given, has a unique
solution x* if

Σ_{k=1}^n |a_{jk}| < 1,  j = 1, 2, . . . , n.

(b) The solution x* can be obtained as the limit of the iteration {x⁽⁰⁾, x⁽¹⁾, x⁽²⁾, . . .},
where x⁽⁰⁾ is arbitrary and x⁽ᵐ⁺¹⁾ = Ax⁽ᵐ⁾ + b (m ≥ 0).

(c) The following error bounds hold:

‖x⁽ᵐ⁾ − x*‖ ≤ (c/(1 − c)) ‖x⁽ᵐ⁻¹⁾ − x⁽ᵐ⁾‖ ≤ (cᵐ/(1 − c)) ‖x⁽⁰⁾ − x⁽¹⁾‖,

where c = max_j Σ_{k=1}^n |a_{jk}| and ‖x − z‖ = max_j |x_j − z_j|, j = 1, 2, . . . , n.
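A small sketch of Exercise 3.15 (the 3x3 matrix and right-hand side are illustrative assumptions): since the row sums of |a_jk| are at most c = 0.5 < 1, the map x ↦ Ax + b is a contraction in the max norm, and the a-posteriori bound of part (c) can be checked directly.

```python
A = [[0.1, 0.2, 0.1],
     [0.0, 0.3, 0.2],
     [0.2, 0.1, 0.1]]
b = [1.0, 2.0, 3.0]

c = max(sum(abs(v) for v in row) for row in A)   # row-sum norm, here 0.5 < 1

def step(x):
    """One sweep of the iteration x^(m+1) = A x^(m) + b."""
    return [sum(A[j][k] * x[k] for k in range(3)) + b[j] for j in range(3)]

def dist(x, z):                                   # max norm
    return max(abs(xi - zi) for xi, zi in zip(x, z))

x = [0.0, 0.0, 0.0]
for _ in range(60):
    x_prev, x = x, step(x)

x_star = step(x)                                  # essentially converged
bound = c / (1 - c) * dist(x_prev, x)             # bound from part (c)
print(dist(x, x_star) <= bound + 1e-12)           # the bound holds
```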
3.16. (Gershgorin's theorem): If λ is an eigenvalue of a square matrix A = {a_{jk}},
then for some j, 1 ≤ j ≤ n,

|a_{jj} − λ| ≤ Σ_{k=1, k≠j}^n |a_{jk}|.

Show that x = Ax + b can be written Bx = b, where B = I − A, and that
Σ_{k=1}^n |a_{jk}| < 1 together with the theorem imply that 0 is not an eigenvalue of
B and that A has spectral radius less than 1.

3.17. Let (X, d), (X, d₁), (X, d₂) be metric spaces with

d(x, z) = max_j |x_j − z_j|,  d₁(x, z) = Σ_{j=1}^n |x_j − z_j|,
d₂(x, z) = (Σ_{j=1}^n (x_j − z_j)²)^{1/2},

respectively. Show that instead of the conditions Σ_{k=1}^n |a_{jk}| < 1,
j = 1, 2, . . . , n, we obtain

Σ_{j=1}^n |a_{jk}| < 1,  k = 1, 2, . . . , n,  and  Σ_{j=1}^n Σ_{k=1}^n a_{jk}² < 1.
3.18. Consider the first-order ordinary differential equation (ODE)

x′ = f(t, x),  x(t₀) = x₀,

where t₀ and x₀ are given real numbers. Assume:

|f(t, x)| ≤ c₀ on R = {(t, x) : |t − t₀| ≤ a, |x − x₀| ≤ b},
|f(t, x) − f(t, v)| ≤ c₁ |x − v| for all (t, x), (t, v) ∈ R.

Then show: the (ODE) has a unique solution on [t₀ − c₂, t₀ + c₂], where
c₂ < min{a, b/c₀, 1/c₁}.

3.19. Show that f defined by f(x, y) = |sin y| + x satisfies a Lipschitz condition
with respect to the second variable (on the whole xy-plane).

3.20. Does f defined by f(t, x) = |x|^{1/2} satisfy a Lipschitz condition?

3.21. Apply Picard's iteration x_{n+1}(t) = x₀ + ∫_{t₀}^t f(s, x_n(s)) ds to the (ODE)
x′ = 1 + x², x(0) = 0. Verify that for x₃ the terms involving t, t², . . . , t⁵ are the
same as those of the exact solution.
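A numerical sketch of Exercise 3.21 (the grid and the number of Picard sweeps are arbitrary choices): for x′ = 1 + x², x(0) = 0 the exact solution is x(t) = tan t, and the Picard iterates approach it on [0, 0.5].

```python
import math

N, T = 400, 0.5
ts = [T * i / N for i in range(N + 1)]

def picard(x):
    """x_{n+1}(t) = 0 + int_0^t (1 + x_n(s)^2) ds (trapezoid rule)."""
    out, acc = [0.0], 0.0
    for i in range(1, N + 1):
        h = ts[i] - ts[i - 1]
        f0, f1 = 1 + x[i - 1]**2, 1 + x[i]**2
        acc += 0.5 * h * (f0 + f1)
        out.append(acc)
    return out

x = [0.0] * (N + 1)
for _ in range(8):          # a few Picard sweeps
    x = picard(x)

err = max(abs(x[i] - math.tan(ts[i])) for i in range(N + 1))
print(err)                  # small on [0, 0.5]
```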
3.22. Show that x′ = 3x^{2/3}, x(0) = 0 has infinitely many solutions x.

3.23. Assume that the hypotheses of the contraction mapping principle hold. Show
that x* is accessible from any point of Ū(x₀, r₀).

3.24. Define the sequence {x̄_n} (n ≥ 0) by x̄₀ = x₀, x̄_{n+1} = F(x̄_n) + ε_n
(n ≥ 0). Assume: ‖ε_n‖ ≤ λⁿ ε (n ≥ 0) (0 ≤ λ < 1), and F is a c-contraction operator
on U(x₀, r). Then show that the sequence {x̄_n} (n ≥ 0) converges to the unique
fixed point x* of F in Ū(x₀, r) provided that

r ≥ r₀ + ε/(1 − c).
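A scalar sketch of Exercise 3.24 (the contraction F, its constant c = 1/2, and the perturbation sequence are illustrative assumptions): with ‖ε_n‖ ≤ λⁿ ε and 0 ≤ λ < 1, the perturbed iterates still converge to the exact fixed point.

```python
F = lambda x: 0.5 * x + 1.0        # c-contraction on R with c = 0.5
x_star = 2.0                       # fixed point: x = 0.5 x + 1

lam, eps = 0.8, 0.1
x = 0.0
for n in range(200):
    noise = (lam ** n) * eps       # plays the role of e_n, |e_n| <= lam^n eps
    x = F(x) + noise

print(abs(x - x_star))             # tiny: the perturbations die out
```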
3.25. (a) Let F : D ⊆ X → X be an analytic operator. Assume:

• there exists α ∈ [0, 1) such that ‖F′(x)‖ ≤ α (x ∈ D);

• γ = sup_{k>1, x∈D} ‖(1/k!) F⁽ᵏ⁾(x)‖^{1/(k−1)} is finite;     (1)

• there exists x₀ ∈ D such that ‖x₀ − F(x₀)‖ ≤ η ≤ (3 − α − 2√(2 − α))/γ, γ ≠ 0;

• Ū(x₀, r₁) ⊆ D, where r₁, r₂ with 0 ≤ r₁ ≤ r₂ are the two zeros of the function f
given by f(r) = γ(2 − α)r² − (1 + ηγ − α)r + η.

Show: the method of successive substitutions is well defined, remains in Ū(x₀, r₁)
for all n ≥ 0 and converges to a fixed point x* ∈ Ū(x₀, r₁) of the operator F.
Moreover, x* is the unique fixed point of F in Ū(x₀, r₂). Furthermore, the following
error bounds hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ β ‖x_{n+1} − x_n‖  and  ‖x_n − x*‖ ≤ (βⁿ/(1 − β)) η,
where

β = γη/(1 − γη) + α.

The above result is based on the assumption that the sequence

γ_k = ‖(1/k!) F⁽ᵏ⁾(x)‖^{1/(k−1)}  (x ∈ D, k > 1)

is bounded above by γ. This kind of assumption does not always hold. Let us then
not assume the sequence {γ_k} (k > 1) is bounded, and define the "function" f₁ by

f₁(r) = η − (1 − α)r + Σ_{k=2}^∞ γ_k^{k−1} rᵏ.

(b) Let F : D ⊆ X → X be an analytic operator. Assume for x₀ ∈ D the function f₁
has a minimal positive zero r₃ such that Ū(x₀, r₃) ⊆ D.

Show: the method of successive substitutions is well defined, remains in Ū(x₀, r₃)
for all n ≥ 0 and converges to a unique fixed point x* ∈ Ū(x₀, r₃) of the operator F.
Moreover, the following error bounds hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ β₁ ‖x_{n+1} − x_n‖  and  ‖x_n − x*‖ ≤ (β₁ⁿ/(1 − β₁)) η,

where

β₁ = Σ_{k=2}^∞ γ_k^{k−1} η^{k−1} + α.
3.26. (a) With F, D as in Exercise 3.25, assume that there exists α such that
‖F′(x*)‖ ≤ α, and Ū(x*, r*) ⊆ D,     (2)

where

r* = ∞ if γ = 0,  and  r* = (1/γ) · (1 − α)/(2 − α) if γ ≠ 0.

Then, if

β = α + γr*/(1 − γr*) < 1,

show: the method of successive substitutions remains in Ū(x*, r*) for all n ≥ 0 and
converges to x* for any x₀ ∈ U(x*, r*). Moreover, the following error bounds hold
for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ β_n ‖x_n − x*‖ ≤ β ‖x_n − x*‖,

where β₀ = 1 and

β_{n+1} = α + γr*β_n/(1 − γr*β_n)  (n ≥ 0).

The above result was based on the assumption that the sequence

γ_k = ‖(1/k!) F⁽ᵏ⁾(x*)‖^{1/(k−1)}  (k ≥ 2)

is bounded above by γ. In the case where the boundedness assumption does not
necessarily hold, we have the following local alternative.

(b) Let x* be a fixed point of the operator F. Moreover, assume:
max_{r>0} Σ_{k=2}^∞ (γ_k r)^{k−1} exists and is attained at some r₀ > 0. Set

p = Σ_{k=2}^∞ (γ_k r₀)^{k−1};

there exist α, δ with α ∈ [0, 1), δ ∈ (α, 1) such that p + α − δ ≤ 0, and
Ū(x*, r₀) ⊆ D.

Show: the method of successive substitutions {x_n} (n ≥ 0) remains in Ū(x*, r₀) for
all n ≥ 0 and converges to x* for any x₀ ∈ U(x*, r₀). Moreover, the following error
bounds hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ α ‖x_n − x*‖ + Σ_{k=2}^∞ γ_k^{k−1} ‖x_n − x*‖ᵏ ≤ δ ‖x_n − x*‖.
Chapter 4

Solving Smooth Equations

In this chapter we are concerned with the problem of approximating a locally unique
solution of an equation in a Banach space. The Newton-Kantorovich method is
undoubtedly the most popular method for solving such equations. The theoretical
importance of this method should also be emphasized, since using it we can draw
conclusions about the existence, uniqueness and domain of distribution of solutions
without finding the solution itself, this being sometimes of greater value than the
actual knowledge of the solution.

4.1 Linearization of Equations

Let F be a Fréchet-differentiable operator mapping an open and convex subset D of
a Banach space X into a Banach space Y. Consider the equation

F(x) = 0.     (4.1.1)

The principal method for constructing approximations x_n to a solution x* (if it
exists) of Equation (4.1.1) is based on successive linearization of the equation.
Assuming that an approximation x_n has been found, we compute the next iterate
x_{n+1} by replacing Equation (4.1.1) by the linearized equation

F(x_n) + F′(x_n)(x_{n+1} − x_n) = 0.     (4.1.2)

If F′(x_n)⁻¹ ∈ L(Y, X), then the approximation x_{n+1} is given by

x_{n+1} = x_n − F′(x_n)⁻¹ F(x_n)  (n ≥ 0).     (4.1.3)

The iterative procedure generated by (4.1.3) is the famous Newton or
Newton-Kantorovich method [205]. Iterates {x_n} (n ≥ 0) are not always easy to
compute. First, x_n may fall outside the set D for a certain n, and secondly, even if
this does not happen, F′(x_n)⁻¹ may not exist. If the sequence {x_n} converges to x*
and x₀ is chosen sufficiently close to x*, the operators F′(x_n) and F′(x₀) will differ
only by a small amount, due to the continuity of F′. Therefore the modified
Newton-Kantorovich method

y_{n+1} = y_n − F′(y₀)⁻¹ F(y_n)  (n ≥ 0)  (y₀ = x₀)     (4.1.4)

can also be used to approximate x*. Method (4.1.4) is easier to use than (4.1.3),
since the inverse is computed only once and for all. However, (4.1.4) generally
yields worse approximations.
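The trade-off between (4.1.3) and (4.1.4) can be seen in a minimal scalar sketch (the equation F(x) = x² − 2 and the starting point are illustrative assumptions, not an example from the text):

```python
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
root = 2.0 ** 0.5

def newton(x, steps):
    """Method (4.1.3): the derivative is re-evaluated at every step."""
    for _ in range(steps):
        x = x - F(x) / dF(x)
    return x

def modified_newton(x, steps):
    """Method (4.1.4): the derivative is computed once and for all."""
    d0 = dF(x)
    for _ in range(steps):
        x = x - F(x) / d0
    return x

e_newton = abs(newton(1.0, 6) - root)
e_modified = abs(modified_newton(1.0, 6) - root)
print(e_newton, e_modified)
```

Six steps of Newton's method already reach machine precision, while the modified method, with its frozen derivative, still carries a visible error: quadratic versus only linear convergence.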
4.2 The Newton-Kantorovich Method

Define the operator P on D by

P(x) = x − F′(x)⁻¹ F(x).     (4.2.1)

Then the Newton-Kantorovich method may be regarded as the usual iterative method

x_{n+1} = P(x_n)  (n ≥ 0)     (4.2.2)

for approximating the fixed point x* of the equation

x = P(x).     (4.2.3)

Consequently, all the results of the previous chapters involving Equation (4.2.3) are
applicable. Suppose that

lim_{n→∞} x_n = x*.     (4.2.4)

We would like to know under what conditions on F and F′ the point x* is a solution
of Equation (4.1.1).

Proposition 4.2.1 If F′ is continuous at x = x* ∈ D, then we have

F(x*) = 0.     (4.2.5)

Proof. The approximations x_n satisfy the equation

F(x_n) + F′(x_n)(x_{n+1} − x_n) = 0.     (4.2.6)

Since the continuity of F at x* follows from the continuity of F′, taking the limit as
n → ∞ in (4.2.6) we obtain (4.2.5).

Proposition 4.2.2 If

‖F′(x)‖ ≤ b     (4.2.7)

in some closed ball in D which contains {x_n}, then x* is a solution of F(x) = 0.
Proof. By (4.2.7) we get

lim_{n→∞} F(x_n) = F(x*),     (4.2.8)

and since, by (4.2.6), ‖F(x_n)‖ = ‖F′(x_n)(x_{n+1} − x_n)‖ ≤ b ‖x_{n+1} − x_n‖ → 0,
(4.2.5) is obtained by taking the limit as n → ∞ in (4.2.4).

Proposition 4.2.3 If

‖F′(x) − F′(y)‖ ≤ K ‖x − y‖     (4.2.10)

in some closed ball Ū(x₀, r) which contains {x_n}, then x* is a solution of the
equation F(x) = 0.

Proof. By (4.2.10), for all x ∈ U(x₀, r) we can write

‖F′(x)‖ ≤ ‖F′(x₀)‖ + K ‖x − x₀‖ ≤ ‖F′(x₀)‖ + Kr,

so the conditions of Proposition 4.2.2 hold with b = ‖F′(x₀)‖ + Kr.

As in Kantorovich (see, e.g., [207]), consider constants B₀, η₀ such that

‖F′(x₀)⁻¹‖ ≤ B₀  and  ‖x₁ − x₀‖ ≤ η₀,     (4.2.15), (4.2.16)

respectively. We state and give a slightly different and improved proof of the
celebrated Newton-Kantorovich theorem [206], [207, Th. 6, (I.XVII), p. 708]:

Theorem 4.2.4 If condition (4.2.10) holds in some closed ball Ū(x₀, r) in D and

h₀ = B₀ η₀ K ≤ 1/2,     (4.2.17)

then the Newton-Kantorovich sequence (4.1.3), starting from x₀, converges to a
solution x* of Equation (4.1.1) which exists in Ū(x₀, r), provided that

r ≥ r₀ = ((1 − √(1 − 2h₀))/h₀) η₀.
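A scalar sketch of how Theorem 4.2.4 can be applied (the function, the starting point, and the constants below are illustrative assumptions): compute B₀, η₀ and the Lipschitz constant K of F′, check h₀ = B₀η₀K ≤ 1/2, and form the radius r₀.

```python
import math

F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
x0 = 1.4

B0 = 1.0 / abs(dF(x0))          # bound for |F'(x0)^{-1}|
eta0 = abs(F(x0) / dF(x0))      # eta0 = |x1 - x0|
K = 2.0                         # Lipschitz constant of F' (exact here)

h0 = B0 * eta0 * K              # Kantorovich quantity
assert h0 <= 0.5                # convergence condition of Theorem 4.2.4
r0 = (1 - math.sqrt(1 - 2 * h0)) / h0 * eta0

x = x0
for _ in range(10):             # Newton-Kantorovich iterates (4.1.3)
    x = x - F(x) / dF(x)

print(abs(x - math.sqrt(2.0)))  # converges to the solution
print(abs(x - x0) <= r0)        # and the solution lies in the ball U(x0, r0)
```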
Proof. We first show that (4.1.3) is well defined. By (4.2.10) we get

‖F′(x₁) − F′(x₀)‖ ≤ K ‖x₁ − x₀‖ = Kη₀.     (4.2.19)

Using (4.2.17), we obtain ‖F′(x₁) − F′(x₀)‖ ≤ Kη₀ ≤ 1/(2B₀). By Theorem 1.3.2
(the Banach lemma), [F′(x₁)]⁻¹ exists and ‖[F′(x₁)]⁻¹‖ ≤ 2B₀. . . .
. . . for all ε₀ > 0, ε > 0, t ∈ [0, 1] there exist δ = δ(ε, ε₀, t) and b ≥ 0 such that
. . . and

‖F′(x₀)⁻¹ F″(x₀)‖ ≤ b;

(c) for . . . the following conditions hold
4.4 Convergence under Twice-Fréchet Differentiability Only at a Point
and . . .

Then, the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U(x₀, δ) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, δ) of the
equation F(x) = 0. Moreover, estimates (4.4.11) and (4.4.12) hold for all n ≥ 0.
Furthermore, the solution x* is unique in U(x₀, δ) if a₀δ < 1.

Proof. We first show that the operator F′(x)⁻¹ exists for all x ∈ U(x₀, δ). Indeed,
we can obtain the identity . . . By the hypotheses we get . . . It follows from (4.4.15)
and the Banach lemma on invertible operators (Theorem 1.3.2) that
F′(x)⁻¹ ∈ L(Y, X) and . . . Note that x + θ(y − x) − x₀ ∈ U(x₀, δ) for all
x, y ∈ U(x₀, δ). By (4.1.3) for n = 0, we get
That is, x₁ ∈ U(x₀, δ). Assume x_k ∈ U(x₀, δ) for all k = 0, 1, . . . , m. Using (4.1.3)
we can obtain the identity (4.4.19), which yields a bound for ‖F′(x₀)⁻¹F(x_{k+1})‖ in
terms of ‖x_{k+1} − x_k‖. Moreover, by (4.4.16) for x = x_k, (4.1.3) and (4.4.18) we
deduce

‖x_{k+1} − x_k‖ ≤ ‖F′(x_k)⁻¹F′(x₀)‖ · ‖F′(x₀)⁻¹F(x_k)‖
             ≤ a₀ ‖x_k − x_{k−1}‖² ≤ . . . ≤ (1/a₀) a^{2ᵏ},

which shows (4.4.11). For all m ≥ 0 we get (4.4.20). In particular, for k = 0 it
follows from (4.4.20) and (4.4.8) that x_n ∈ Ū(x₀, δ) for all n ≥ 0. The sequence
{x_n} is Cauchy in the Banach space X, and as such it converges to some
x* ∈ Ū(x₀, δ). Estimate (4.4.12) follows from (4.4.20) by letting m → ∞. Finally,
to show uniqueness, let y* ∈ Ū(x₀, δ) be a solution of the equation F(x) = 0. Since

x_{k+1} − y* = [F′(x_k)⁻¹F′(x₀)] F′(x₀)⁻¹ {F″[x_k + θ(x_k − y*)] . . .},
we obtain an estimate from which it follows that lim_{k→∞} x_k = y*. However, we
showed above that lim_{k→∞} x_k = x*. Hence, we deduce x* = y*.

Remark 4.4.1 (a) In general, (4.4.23) holds. We can certainly choose ε = ε₀ in
(4.4.3). However, in case strict inequality holds in (4.4.23), more precise error
bounds can be obtained. Conditions (4.4.3) and (4.4.4) certainly hold if the operator
F is twice continuously Fréchet-differentiable.

(b) If condition (4.4.3) is replaced by (4.4.24), then it can easily be seen by (4.4.15)
that the conclusions of Theorem 4.4.2 hold with ā₀, ā replacing a₀, a, where

ā₀ = (b + ε)/(2(1 − ε))  and  ā = ā₀ η.

This case was considered in [196] (see Theorem 1). However, there the stronger
condition that the operator F is actually twice continuously Fréchet-differentiable
was imposed, and the convergence of the sequence {x_n} is only linear.

(c) Note that even in the case of Theorem 4.4.2, using (4.4.3), (4.4.4) instead of
(4.4.22) and (4.4.4) essentially used there, the results can be weaker provided that
the corresponding condition holds.

We can prove the following local convergence theorem for Newton's method:
Theorem 4.4.3 Let F : D ⊆ X → Y be a twice-Fréchet-differentiable operator.
Assume:

(a) there exists a solution x* of the equation F(x) = 0 such that
F′(x*)⁻¹ ∈ L(Y, X);

(b) for all ε₀ > 0, ε > 0, t ∈ [0, 1] there exist δ = δ(ε, ε₀, t) and b > 0 such that

. . . < ε₀,  ‖F′(x*)⁻¹[F″(x + θ(x* − x)) − F″(x*)]‖ < ε,

and . . . , where r* is given by . . .

Then, the sequence {x_n} generated by method (4.1.3) is well defined, remains in
U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover,
the following estimates hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ a ‖x_n − x*‖²,     (4.4.32)

where a is given by . . .

Proof. It follows immediately from (4.4.21) by using x*, (4.4.26), (4.4.27) instead of
x₀, (4.4.3) and (4.4.4), respectively.

Remark 4.4.2 Local results were not given in [196]. Had they been, the assumptions
(4.4.33) and (4.4.34) would have been made, with the corresponding radius r₁*
given by (4.4.35). Moreover, according to the analog of the proof of Theorem 4.4.1,
the order of convergence would have been only linear.

We provide an example for Theorem 4.4.3.
Example 4.4.1 Let X = Y = R, D = U(0, 1), and define the function F on D by
. . . According to Theorem 4.4.3, for ε₀ = ε = 1 we can have, by (4.4.26)-(4.4.28)
and (4.4.30), b = 1 and the corresponding value of r*; choosing δ = r*, all the
hypotheses of Theorem 4.4.3 hold.
Moreover, if (4.4.33) and (4.4.34) are used instead of (4.4.26) and (4.4.27),
respectively, then for ε̄ = .999 we obtain, using (4.4.35), that r₁* = .0010005.
Choosing δ < .368, all conditions hold but

r₁* < r*.     (4.4.38)

Hence in this case our Theorem 4.4.3 provides a wider range of initial guesses x₀
than the corresponding one mentioned in Remark 4.4.2. Furthermore, the order of
convergence is quadratic instead of only linear.

Remark 4.4.3 If only (4.4.33) is used, it can easily be seen by (4.4.21) that . . . In
the case of Example 4.4.1, for ā = .999 we get r_a = .0011 (since η = 1). On the
other hand, for ε ∈ (0, . . .) we can get . . . Therefore (4.3.9) is violated. However,
the conditions of Theorem 4.4.1 are satisfied, since b = 0, η = 1, and ε and δ may be
chosen suitably. That is, Theorem 4.4.1 guarantees the convergence of method
(4.1.3) (starting at x₀ = 0) to the solution x* of the equation F(x) = 0.

Remark 4.4.4 The results obtained here can be extended to hold for m-times
Fréchet-differentiable operators (m > 2) [79].
4.5 Exercises

4.1. If h₀ < 1/2 holds in Ū(x₀, r*), where r* = ((1 + √(1 − 2h₀))/h₀) η₀, then show:
Equation (4.1.1) has a unique solution x* in U(x₀, r*).

4.2. If the hypotheses of Theorem 4.2.4 are satisfied and h₀ = 1/2, then show: there
exists a unique solution x* of Equation (4.1.1) in Ū(x₀, r₀) = Ū(x₀, 2η₀).
a bounded inverse, and limllX-,.ll,o [IF' ( x ) - F' (x*)ll = 0, then show Newton's method (4.1.3) converges to x* if xo is sufficiently close to x* and
where
E
is any positive number; d is a constant depending on xo and
E.
4.4. The above result cannot be strengthened, in the sense that for every sequence of positive numbers c, such that: limn,, -- 0 , there is an equation for which (4.1.3) converges less rapidly than c,. Define Sn
Show: s ,
=
Cnl2 , { Jc(,-l)l~c(,+l)/r,
+ 0,
+ 0 , and
if n is even if n is odd.
limn,,
$& = 0 ,
(k 2 I).
4.5. Assume operator F' ( x ) satisfies a Holder condition
with 0 < b < 1 and U ( x o ,R). Define ho = boar$ 5 co, where co is a root of
h and let R > = ro, where do = (l+b)(;-ho). Show that method (4.1.3) converges to a solution x* of Equation (4.1.1) in U ( x Oy, o ) .
4.6. Let K, B₀, η₀ be as in Theorem 4.2.4. If h₀ = B₀ η₀ K . . .

. . .

(a) for all ε₀ > 0 there exists δ₀ > 0 such that . . . ;

(b) for all ε₁ > 0 there exists δ₁ > 0 such that

‖(F′(x₀)⁻¹ − F′(x)⁻¹) F′(x₀)‖ < ε₁  for all x ∈ U(x₀, δ₁).

Set δ = min{δ₀, δ₁} and ε = max{ε₀, ε₁};

(c) for ε > 0 there exists δ > 0, as defined above, such that

‖F′(x₀)⁻¹ (F′(x₀) − F′(x))‖ < ε . . .

4.10. Let m > 2 be an integer and F an m-times Fréchet-differentiable operator
defined on a convex subset D of a Banach space X with values in a Banach space Y.
Assume:
(a) there exists x₀ ∈ D such that F′(x₀)⁻¹ ∈ L(Y, X);

(b) there exist parameters η > 0, a_i > 0, i = 2, . . . , m such that . . . Since F is
m-times Fréchet-differentiable, for all ε > 0 there exists δ₀ > 0 such that . . . for all
x ∈ U(x₀, δ₀);

(c) the positive zero of p′ is such that . . . , where . . . Then the polynomial p has
only two positive zeros, denoted by δ₁, δ₂ (δ₁ ≤ δ₂);

(d) U(x₀, δ) ⊆ D;

(e) δ₀ ∈ [δ₁, δ₂] or δ₀ > δ₂, where . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U(x₀, δ₁) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, δ₁) of the
equation F(x) = 0. The solution x* is unique in U(x₀, δ₀) if δ₀ ∈ [δ₁, δ₂], or x* is
unique in U(x₀, δ₂) if δ₀ > δ₂. Moreover, the following estimates hold for all n ≥ 0:

‖x_n − x*‖ ≤ t* − t_n,

where {t_n} (n ≥ 0) is a monotonically increasing sequence converging to t*,
generated by . . .
4.11. (a) Let F be a twice Fréchet-differentiable operator defined on a convex
subset D of a Banach space X with values in a Banach space Y. Assume that the
equation F(x) = 0 has a simple zero x* ∈ D, in the sense that F′(x*) has an inverse
F′(x*)⁻¹ ∈ L(Y, X). Then for all ε₁ > 0 there exists r* > 0 such that

‖F′(x*)⁻¹[F″(x) − F″(y)]‖ ≤ ε₁  for all x, y ∈ U(x*, r*).

Moreover, assume there exists b₁ > 0 such that

‖F′(x*)⁻¹ F″(x*)‖ ≤ b₁.

Then, if . . . , show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well
defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that
x₀ ∈ U(x*, r*). Furthermore, the following estimates hold for all n ≥ 0: . . .
(b) Assume:

(i) there exist η ≥ 0 and x₀ ∈ D such that F′(x₀)⁻¹ ∈ L(Y, X) and

‖F′(x₀)⁻¹ F(x₀)‖ ≤ η.

Then, for all ℓ₀ ≥ 0 there exists r > 0 such that

‖F′(x₀)⁻¹[F″(x) − F″(y)]‖ ≤ ℓ₀,  ∀x, y ∈ U(x₀, r) ⊆ D;

(ii) there exists b₀ ≥ 0 such that ‖F′(x₀)⁻¹ F″(x₀)‖ ≤ b₀;

(iii) . . . , c₀ = ½(ℓ₀ + b₀);

(iv) r is the smallest positive zero of the equation . . .

Show: the sequence {x_n} (n ≥ 0) generated by Newton's method is well defined,
remains in U(x₀, r) for all n ≥ 0 and converges to a unique solution x* ∈ Ū(x₀, r)
of the equation F(x) = 0. Moreover, the following error bounds hold for all
n ≥ 0: . . .
4.12. (a) Let F : D ⊆ X → Y be a continuously Fréchet-differentiable operator
defined on an open convex subset D of a Banach space X with values in a Banach
space Y. Assume:

— there exists x₀ ∈ D such that F(x₀) ≠ 0;

— F′(x)⁻¹ ∈ L(Y, X) (x ∈ D), and there exists b > 0 such that

‖F′(x)⁻¹ F′(x₀)‖ ≤ b  (x ∈ D);
— for each fixed p ∈ D with F(p) ≠ 0, there exists t ∈ (0, 1] such that

‖F′(x₀)⁻¹[F′(p) − F′(p_t)]‖ < t ‖F′(p)⁻¹ F(p)‖

for all p_t collinear to p;

— there exists η > 0 such that . . . , c = . . . ∈ (0, 1), and U = Ū(x₀, δ*) ⊆ D,
where . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U for all n ≥ 0 and converges to a unique solution x* ∈ Ū of the
equation F(x) = 0. Moreover, the following estimates hold: . . .
(b) Assume:

— there exists a simple zero x* ∈ D of F, in the sense that F′(x*)⁻¹ ∈ L(Y, X);

— F′(x)⁻¹ ∈ L(Y, X) (x ∈ D), and there exists q > 0 such that . . . ;

— for each fixed p ∈ D with F(p) ≠ 0 there exist δ > 0 and t ∈ (0, 1] such that

‖F′(x*)⁻¹[F′(p) − F′(p_t)]‖ < t ‖p − x*‖

for all p_t = x + t(x* − x) ∈ U(p, δ) ⊆ D;

— U* = U(x*, r*) ⊆ D, where r* > . . .

Show: the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined,
remains in U* for all n ≥ 0 and converges to x* provided that x₀ ∈ U*. Moreover,
the following estimates hold for all n ≥ 0: . . .
4.13. (a) Suppose that F', F'' are uniformly bounded by non-negative constants a < 1, K respectively, on a convex subset D of X, and that the ball

U(x_0, r_0) ⊆ D,  r_0 = 2 ||x_0 - F(x_0)|| / (1 - a).

Moreover, if

h_S = [K(1 + 2a) / 2] · ||x_0 - F(x_0)|| / (1 - a)^2 < 1

holds, then Stirling's method

…

converges to the unique fixed point x* of F in U(x_0, r_0). Moreover, the following estimates hold for all n ≥ 0:

….
(b) Let F : D ⊆ X → Y be analytic. Assume:

…

and

0 ≠ γ = sup_{k > 1} ||F^{(k)}(x_0) / k!||^{1/(k-1)} < ∞,

where

….

Show: sequence {x_n} (n ≥ 0) generated by Stirling's method (4.5.1) is well defined, remains in U(x_0, r_0) for all n ≥ 0 and converges to a unique fixed point x* of operator F at the rate given by (4.5.2), with

….
(c) Let X = D = R and define function F on D by

….

Using Stirling's method for x_0 = 3 we obtain the fixed point x* = 0 of F in one iteration, since

x_1 = 3 - (1 - F'(3))^{-1}(3 - F(3)) = 0.

Show: method (4.1.3) fails to converge.
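Stirling's method (4.5.1) solves the fixed-point problem x = F(x) via x_{n+1} = x_n - (I - F'(x_n))^{-1}(x_n - F(x_n)), as in the one-step computation above. A scalar sketch (the contraction F below is our own choice):

```python
import math

def stirling(F, dF, x0, tol=1e-12, max_iter=50):
    """Stirling iteration x_{n+1} = x_n - (1 - F'(x_n))^{-1} (x_n - F(x_n))."""
    x = x0
    for _ in range(max_iter):
        step = (x - F(x)) / (1.0 - dF(x))
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")

# Fixed point of the contraction F(x) = (x + e^{-x}) / 3 on [0, 1],
# i.e. the solution of 2x = e^{-x} (about 0.3517).
xstar = stirling(lambda x: (x + math.exp(-x)) / 3.0,
                 lambda x: (1.0 - math.exp(-x)) / 3.0,
                 x0=0.5)
```

Because ||F'|| ≤ a < 1 here, the operator I - F'(x_n) is boundedly invertible at every iterate, which is precisely what hypothesis (a) of Exercise 4.13 guarantees.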
4.14. It is convenient for us to define certain parameters, sequences and functions. Let {f_n} (n ≥ 0) be a Fibonacci sequence given by

….

Let also c, l, η be non-negative parameters and define: the real function f by

…;

sequences {s_n} (n ≥ -1), {a_n} (n ≥ -1), {A_n} (n ≥ -1), for x_n ∈ X, by

…;

and parameters b, d, r_0 by

….

Let F : D ⊆ X → Y be a nonlinear operator. Assume there exist x_{-1}, x_0 ∈ D and non-negative parameters c, l, η such that:

A_0^{-1} exists,  ||x_0 - x_{-1}|| ≤ c,  ||A_0^{-1} F(x_0)|| ≤ η,

||A_0^{-1}([x, y; F] - [z, w; F])|| ≤ l (||x - z|| + ||y - w||)  for all x, y, z, w ∈ D, x ≠ y, w ≠ z,

together with

…

and

….

Show: sequence {x_n} (n ≥ -1) generated by the Secant method

x_{n+1} = x_n - [x_{n-1}, x_n; F]^{-1} F(x_n)  (n ≥ 0)

is well defined, remains in U(x_0, r_0) for all n ≥ 0 and converges to a unique solution x* ∈ U(x_0, r_0) of equation F(x) = 0. Moreover the following estimates hold for all n ≥ 1:

…

and

….

Furthermore, let

….
Then r* > r_0 and the solution x* is unique in U(x_0, r*).

4.15. (1) Let F be a twice Fréchet-differentiable operator defined on a convex subset D of a Hilbert space H with values in H, and let x_0 be a point of D. Assume:

(a) there exist constants a, b, c, d and η such that

… and …

for all x ∈ D, y ∈ H;

(b) p = … ∈ [0, 1); and

(c) U(x_0, r*) ⊆ D, where

….

Show: iteration {x_n} (n ≥ 0) generated by

…

is well defined, remains in U(x_0, r*) for all n ≥ 0 and converges to a unique solution x* of equation F(x) = 0 in U(x_0, r*). Moreover, the following estimates hold for all n ≥ 0:

… and ….

If we let a = 0, then (1) reduces to Theorem 13.2 in [213, p. 161].

(2) Under the hypotheses of (1), show that the same conclusions of (1) hold for iteration

…

but with p replaced by

….
(3) Under the hypotheses of (2), show that the conclusions of (1) hold for iteration

….

Results (2) and (3) reduce to the corresponding ones in [213, p. 163] for a = 0. We give a numerical example to show that our results apply, whereas the corresponding ones in [213, pp. 161-163] do not.

(4) Let H = R, D = [-1, 1], x_0 = 0 and consider equation

….

The convergence condition in [213, p. 161], using our notation, is

…  (i)

where L is the Lipschitz constant such that

….

Show: η = …, d = …, c = …, L = … and q = … > 1. Hence, (i) is violated. Moreover, show that the condition corresponding to the iterations in (2) and (3) in [213, p. 163] is also violated, since

q_0 = sqrt(c^2 (d^2 + L η)) … ∈ [0, 1)

gives

….

Furthermore show that our conditions hold, since for a = 1, b = …, we get

… and ….

Conclude: there is no guarantee that the iterations converge to a solution x* of equation F(x) = 0 under the conditions in [213], whereas our results (1)-(3) guarantee convergence for the same iterations.
(5) The results obtained here can easily be extended to m-times Fréchet-differentiable operators, where m > 2 is an integer. Indeed, assume there exist constants a_2, …, a_{m+1} such that

….

Show: the results (1)-(3) hold if we replace p, p_0 by

… and ….
Chapter 5
Newton-like methods

We provide a local as well as a semilocal convergence analysis for Newton-like methods in a Banach space setting under very relaxed conditions.
5.1
Two-point methods
In this section we are concerned with the problem of approximating a locally unique solution x* of the nonlinear equation

F(x) + G(x) = 0,   (5.1.1)

where F, G are operators defined on a closed ball Ū(w, R), centered at a point w and of radius R > 0, which is a subset of a Banach space X, with values in a Banach space Y. F is Fréchet-differentiable on Ū(w, R), while the differentiability of operator G is not assumed. However, we note that the actual condition (5.1.4) (see below) used here implies the continuity (but not necessarily the differentiability) of operator G on Ū(w, R). We use the two-point Newton-like method

y_{n+1} = y_n - A(y_{n-1}, y_n)^{-1} (F(y_n) + G(y_n))  (n ≥ 0)   (5.1.2)

to generate a sequence converging to x*. Here A(x, y) ∈ L(X, Y) for each fixed x, y ∈ Ū(w, R). Special cases of method (5.1.2) have already been studied. For example: if A(x, y) = F'(y) and G(x) = 0 we obtain the Newton-Kantorovich method (4.1.3); if A(x, y) = [x, y; F] and G(x) = 0, method (5.1.2) reduces to the Secant method; if A(x, y) = F'(y), we obtain the Zincenko iteration; and for A(x, y) = A(y) (A(x) ∈ L(X, Y)) we obtain the Newton-like method studied earlier. Our motivation originates from the following observation: consider the scalar equation F(x) = x^3 - p = 0 on the interval [p, 2 - p] (p ∈ [0, 1/2)). Then for the initial guess x_0 = 1, the famous Newton-Kantorovich hypothesis, which is the sufficient condition for the convergence of Newton's method to x*, is violated.
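In scalar terms, one step of method (5.1.2) uses an operator A built from the two latest iterates. A sketch with the choice A(x, y) = F'(y) + [x, y; G] (the form used later in (5.1.46)); the equation, with its nondifferentiable part G, is our own illustration:

```python
def two_point(F, dF, G, y_prev, y0, tol=1e-12, max_iter=100):
    """Two-point Newton-like step y_{n+1} = y_n - A(y_{n-1}, y_n)^{-1} (F + G)(y_n),
    with A(x, y) = F'(y) + [x, y; G]  (divided difference of the nonsmooth part G)."""
    x, y = y_prev, y0
    for _ in range(max_iter):
        dd = (G(y) - G(x)) / (y - x) if y != x else 0.0   # [x, y; G]
        A = dF(y) + dd
        y_new = y - (F(y) + G(y)) / A
        if abs(y_new - y) < tol:
            return y_new
        x, y = y, y_new
    raise RuntimeError("no convergence")

# Solve x**2 - 2 + 0.1*|x| = 0; the positive root solves t**2 + 0.1*t - 2 = 0.
root = two_point(lambda t: t**2 - 2, lambda t: 2 * t, lambda t: 0.1 * abs(t),
                 y_prev=1.0, y0=2.0)
```

Note that only F is differentiated; G enters through divided differences alone, which is the point of allowing A to depend on two iterates.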
Therefore, under the Newton-Kantorovich theorem 4.2.4, there is no guarantee that method (4.1.3) converges to x* in this case. However, one can easily see that Newton's method indeed converges for several values of p, say p ∈ [0, .46) (see also Example 5.1.3). Therefore one expects that the Newton-Kantorovich hypothesis (4.2.51) can be weakened. In order to cover a wider range of problems and present a finer unified convergence analysis for iterative methods, we decided to introduce method (5.1.2). We provide a local as well as a semilocal convergence analysis for method (5.1.2) under very relaxed hypotheses (see (5.1.3), (5.1.4)). Since we are further motivated by optimization considerations, our new idea is to use center-Lipschitz conditions instead of Lipschitz conditions for the upper bounds on the inverses of the linear operators involved. It turns out that this way we obtain more precise majorizing sequences [263]. Moreover, despite the fact that our conditions are more general than related ones already in the literature, we can provide weaker sufficient convergence conditions and finer estimates on the distances involved. Several applications are provided: e.g., in the semilocal case we show that the Newton-Kantorovich hypothesis, famous for its simplicity and transparency (see (4.2.51)), is weakened (see (5.1.57)), whereas in the local case we can provide a larger convergence radius using the same information (see (5.1.131) and (5.1.132)).

Assume there exists a fixed v ∈ X such that v ∈ Ū(w, R), A(v, w)^{-1} ∈ L(Y, X), and for any x, y, z ∈ Ū(w, r) ⊆ Ū(w, R), r ∈ [0, R], t ∈ [0, 1], the following hold:

||A(v, w)^{-1}[A(x, y) - A(v, w)]|| ≤ θ_0(||x - v||, ||y - w||)   (5.1.3)

and

||A(v, w)^{-1}{[F'(y + t(z - y)) - A(x, y)](z - y) + G(z) - G(y)}|| ≤ θ(||y - w||, ||x - w||, ||z - y||, t) ||z - y||,   (5.1.4)

where θ_0 : [0, +∞)^2 → [0, +∞) and θ : [0, +∞)^3 × [0, 1] → [0, +∞) are continuous, nondecreasing functions on [0, +∞)^2 and [0, +∞)^3 × [0, 1], respectively. Define parameters c_{-1}, c, c_1 by

….   (5.1.5)
With the above choices, we show the following result on majorizing sequences for method (5.1.2).

Lemma 5.1.1 Assume there exist parameters η ≥ 0, δ ∈ [0, 2), r_0 ∈ [0, r] such that:

h_δ = 2 ∫_0^1 θ(r_0, η, c + η, t) dt + δ θ_0(c + c_{-1}, η + r_0) ≤ δ,   (5.1.6)
and

…   (5.1.7)-(5.1.9)

for all n ≥ 0. Then iteration {t_n} (n ≥ -1) given by

…   (5.1.10)

is nondecreasing, bounded above by

…   (5.1.11)

and converges to some t* such that

0 ≤ t* ≤ t** ≤ r.   (5.1.12)

Moreover, the following estimates hold for all n ≥ 0:

…   (5.1.13)

Note that for δ = 0, by (5.1.6) we require h_0 = 0 provided θ = 0. However, (5.1.8) is really used to show (5.1.16).

Proof. Using induction we show:

…   (5.1.14)-(5.1.16)

for all k ≥ 0. Estimate (5.1.13) then follows from (5.1.14)-(5.1.16) and (5.1.10). For k = 0, estimates (5.1.14)-(5.1.16) follow from the initial conditions. By (5.1.10) we then get

0 ≤ t_2 - t_1 ≤ (δ/2)(t_1 - t_0).   (5.1.17)
Assume (5.1.14)-(5.1.16) hold for all k ≤ n + 1. We obtain in turn

… ≤ δ  by (5.1.9) and (5.1.13).   (5.1.18)

Hence we showed that (5.1.14) holds for k = n + 2. Moreover, we show

…   (5.1.19)

Indeed, we have:

….

Assume (5.1.19) holds for all k ≤ n + 1. It follows from (5.1.13)-(5.1.16):

t_{k+2} ≤ t_{k+1} + (δ/2)(t_{k+1} - t_k) ≤ t_k + (δ/2)(t_k - t_{k-1}) + (δ/2)(t_{k+1} - t_k) ≤ … ≤ t**.   (5.1.21)

Hence, sequence {t_n} (n ≥ -1) is bounded above by t**. Estimate (5.1.15) holds for k = n + 2 by (5.1.14) and (5.1.21). Furthermore, (5.1.16) holds for k = n + 2 by (5.1.14) and (5.1.21). Finally, sequence {t_n} (n ≥ -1) is monotonically increasing by (5.1.15), bounded above by (5.1.21), and as such it converges to some t* satisfying (5.1.12). ∎
Remark 5.1.1 It follows from (5.1.10) that t** can be replaced in (5.1.11) by the finer t_0** given by

…,

since p_0 ≤ δ.
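The recursion (5.1.10) itself did not survive the scan; for the Newton specialization treated in Application 5.1.4 below, a majorizing sequence of the kind used here is t_0 = 0, t_1 = η, t_{n+2} = t_{n+1} + l (t_{n+1} - t_n)^2 / (2(1 - l_0 t_{n+1})), which involves the center-Lipschitz constant l_0 rather than l alone. A sketch checking monotonicity and convergence for sample parameter values (the values of l_0, l, η are our own):

```python
def majorizing(l0, l, eta, n_terms=60):
    """Kantorovich-type majorizing sequence built with the center constant l0:
    t_{n+2} = t_{n+1} + l*(t_{n+1} - t_n)**2 / (2*(1 - l0*t_{n+1}))."""
    t = [0.0, eta]
    for _ in range(n_terms):
        denom = 2.0 * (1.0 - l0 * t[-1])
        if denom <= 0:
            raise RuntimeError("sequence left the admissible region")
        t.append(t[-1] + l * (t[-1] - t[-2])**2 / denom)
    return t

t = majorizing(l0=0.8, l=1.0, eta=0.4)
increasing = all(a <= b for a, b in zip(t, t[1:]))
```

Since l_0 ≤ l in general, this sequence is finer (smaller term by term) than the classical one built from l alone, which is the point of Remark 5.1.1.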
We provide the main result on the semilocal convergence of method (5.1.2), using majorizing sequence (5.1.10).

Theorem 5.1.2 Under the hypotheses of Lemma 5.1.1, assume further that for each fixed pair (r, r_0):

…   (5.1.22)

and

…   (5.1.23)

Note that the existence of A(y_{-1}, y_0)^{-1} follows from θ_0(c_{-1}, c_0) < 1. Then, sequence {y_n} (n ≥ -1) generated by the Newton-like method (5.1.2) is well defined, remains in U(w, t*) for all n ≥ -1, and converges to a solution x* of equation F(x) + G(x) = 0. Moreover the following estimates hold for all n ≥ -1:

…   (5.1.24)

and

…   (5.1.25)

Furthermore, the solution x* is unique in Ū(w, t*) if

…,

and in Ū(w, R_0) for R_0 ∈ (t*, r] if

∫_0^1 θ(t*, t* + R_0, t* + R_0, t) dt + θ_0(t* + c_{-1}, t*) < 1.

Proof. We first show estimate (5.1.24), and y_n ∈ Ū(w, t*), for all n ≥ -1. For n = -1, 0, (5.1.24) follows from (5.1.5), (5.1.10) and (5.1.23). Suppose (5.1.24) holds for all n = 0, 1, …, k + 1; this implies in particular (using (5.1.5), (5.1.22))

||y_{k+1} - w|| ≤ ||y_{k+1} - y_k|| + ||y_k - y_{k-1}|| + … + ||y_1 - y_0|| + ||y_0 - w||
≤ (t_{k+1} - t_k) + (t_k - t_{k-1}) + … + (t_1 - t_0) + r_0 = t_{k+1} - t_0 + r_0 ≤ t_{k+1} ≤ t*.

That is, y_{k+1} ∈ U(w, t*). We show that (5.1.24) holds for n = k + 2. By (5.1.3) and (5.1.10) we obtain, for all x, y ∈ Ū(w, t*),
….

In particular, for x = y_k and y = y_{k+1}, we get, using (5.1.3), (5.1.5),

…   (5.1.29)

It also follows from (5.1.29) and the Banach lemma on invertible operators 1.3.2 that A(y_k, y_{k+1})^{-1} exists, and

…   (5.1.30)

where

….

Using (5.1.2), (5.1.4), (5.1.10), (5.1.30) we obtain in turn

…   (5.1.31)

which shows (5.1.24) for all n ≥ 0. Note also that

||y_{k+2} - w|| ≤ ||y_{k+2} - y_{k+1}|| + ||y_{k+1} - w|| ≤ (t_{k+2} - t_{k+1}) + (t_{k+1} - t_0) + r_0 = t_{k+2} - t_0 + r_0 ≤ t_{k+2} ≤ t*.   (5.1.32)

That is, y_{k+2} ∈ U(w, t*). It follows from (5.1.24) that {y_n} (n ≥ -1) is a Cauchy sequence in a Banach space X, and as such it converges to some x* ∈ Ū(w, t*). We can have, as above,

…   (5.1.33)

where

…   (5.1.34)

and

b_1 = ∫_0^1 θ(R - c, η, 2η, t) dt.   (5.1.35)

By letting k → ∞ in (5.1.33), using (5.1.30), and the continuity of operators F, G, we get b_0 ||A(v, w)^{-1}(F(x*) + G(x*))|| = 0, from which we obtain F(x*) + G(x*) = 0 (since b_0 > 0). Estimate (5.1.25) follows from (5.1.24) by using standard majorization techniques [99], [263]. To show uniqueness in Ū(w, t*), let y* be a solution of equation (5.1.1) in Ū(w, t*). Then as in (5.1.31) we obtain the identity:

…   (5.1.36)

Using (5.1.36) we obtain in turn:

…   (5.1.37)

That is, x* = y*, since lim y_k = y*. If y* ∈ Ū(w, R_0), then as in (5.1.37) we get

||y* - y_{k+1}|| ≤ [∫_0^1 θ(t*, R_0 + t*, R_0 + t*, t) dt / (1 - θ_0(t* + c_{-1}, t*))] ||y* - y_k|| < ||y* - y_k||.   (5.1.38)

Hence, again we get x* = y*. ∎
Remark 5.1.2 Conditions (5.1.8), (5.1.9) can be replaced by the stronger but easier to check

…   (5.1.39)

and

…   (5.1.40)

respectively. Note also that conditions (5.1.6)-(5.1.9), (5.1.39), (5.1.40) are Newton-Kantorovich-type hypotheses (see also (5.1.58)), which are always present in the study of Newton-like methods.
Remark 5.1.3 (a) Conditions similar to (5.1.3), (5.1.4) but less flexible were considered by Chen and Yamamoto in [121] in the special case when A(x, y) = A(x) for all x, y ∈ U(w, R) (A(x) ∈ L(X, Y)) (see also Theorem 5.1.6). Operator A(x) is intended there to be an approximation to the Fréchet derivative F'(x) of F. However, we also want the choice of the operator A to be more flexible, and to be related to the difference G(z) - G(y) for all y, z ∈ U(w, R) (see Application 5.1.1 and Example 5.1.1).

(b) If we choose

…

for all x, y ∈ U(w, r), q, q_1, q_2, q_3 ∈ [0, r], t ∈ [0, 1], we obtain the Newton-Kantorovich method (4.1.3) (see also Application 5.1.4).   (5.1.41)

(c) Assume:

….

Define F = 0, A(x, y) = [y, x; G], G(z) - G(y) = [z, y; G](z - y), θ(q_1, q_2, q_3, t) = ψ(q_2, q_3) q_2, where [z, y; G] is a divided difference of order one for G (see Chapter 2), and ψ is a continuous nondecreasing function of two variables. Note that this function was used in [190] in the case of the Secant method. Note also that (5.1.4) obviously implies again the continuity of operator G for this special choice of function θ.

Application 5.1.3 Let us consider some special cases. Define

…

and set

…,

where F', [·, ·; G] denote the Fréchet derivative of F and the divided difference of order one for operator G (see Chapter 1). Hence, we consider the Newton-like method (5.1.2) in the form

y_{n+1} = y_n - (F'(y_n) + [y_{n-1}, y_n; G])^{-1} (F(y_n) + G(y_n))  (n ≥ 0).   (5.1.46)

This method was studied in [117]. It is shown to be of order 1.618… (the same as the order of the Chord, or Secant, method), but higher than the order of

…   (5.1.47)

and

w_{n+1} = w_n - A(w_n)^{-1}(F(w_n) + G(w_n))  (n ≥ 0),   (5.1.48)
where A(·) is an operator approximating F'. Assume:

… and …

for some non-negative parameters l_i, i = 1, 2, 3, 4, and all x, y ∈ U(y_0, r) ⊆ U(y_0, R). Then we can define:

… and ….

If the hypotheses of Theorem 5.1.2 hold for the above choices, the conclusions follow. Note that conditions (5.1.49)-(5.1.52) are weaker than the corresponding ones in [117, pp. 48-49], [99]. Indeed, the conditions

||F'(x) - F'(y)|| ≤ l_5 ||x - y||,  ||A(x, y)^{-1}|| ≤ l_6,  ||[x, y, z; G]|| ≤ l_7,

and

…

for all x, y, z, w ∈ U(y_0, r) are used there instead of (5.1.49)-(5.1.52), where [x, y, z; G] denotes a second-order divided difference of G at (x, y, z), and l_i, i = 5, 6, 7, 8, are non-negative parameters. Let us provide an example for this case:
Example 5.1.1 Let X = Y = (R^2, ||·||_∞), with ||x||_∞ = ||(x', x'')||_∞ = max(|x'|, |x''|), F = (F_1, F_2), G = (G_1, G_2). For x = (x', x'') ∈ R^2 we take

F_1(x', x'') = …,  F_2(x', x'') = …,  G_1(x', x'') = |x' - 1|,  G_2(x', x'') = |x''|.

We shall take [x, y; G] ∈ M_{2×2}(R) as

[x, y; G]_{i,1} = (G_i(y', y'') - G_i(x', y'')) / (y' - x'),
[x, y; G]_{i,2} = (G_i(x', y'') - G_i(x', x'')) / (y'' - x''),  i = 1, 2,

provided that y' ≠ x' and y'' ≠ x''. Otherwise define [x, y; G] to be the zero matrix in M_{2×2}(R). Using method (5.1.47) with z_0 = (1, 0) we obtain the results in Table 5.1. Using the method of chord with w_{-1} = (5, 5) and w_0 = (1, 0), we obtain the results in Table 5.2.
Table 5.1: Method (5.1.47) with z_0 = (1, 0).

Table 5.2: Method of chord with w_{-1} = (5, 5) and w_0 = (1, 0).

Using our method (5.1.46) with y_{-1} = (5, 5), y_0 = (1, 0), we obtain the results in Table 5.3.
Table 5.3: Our method (5.1.46) with y_{-1} = (5, 5), y_0 = (1, 0).

We did not verify the hypotheses of Theorem 5.1.2 for the above starting points. However, it is clear that the hypotheses of Theorem 5.1.2 are satisfied for all three methods for starting points closer to the solution, chosen from the lists of the tables displayed above. Hence method (5.1.2) (i.e. method (5.1.46) in this case) converges faster than method (5.1.47), suggested by Chen and Yamamoto [121] and Zabrejko and Nguen [363], and faster than the method of chord (Secant method) in this case.

In the application that follows we show that the famous Newton-Kantorovich hypothesis (see (5.1.58)) is weakened under the same hypotheses/information.

Application 5.1.4 Returning to Remark 5.1.3 and (5.1.41), iteration (5.1.2) reduces to the famous Newton-Kantorovich method (4.1.3). Condition (5.1.6) reduces to:

…   (5.1.54)

Case 1. Let us restrict δ ∈ [0, 1]. Hypothesis (5.1.9) now becomes

…,
which is true for all k ≥ 0 by the choice of δ. Furthermore, (5.1.8) gives

….

Hence in this case conditions (5.1.6), (5.1.8) and (5.1.9) reduce only to (5.1.54), provided δ ∈ [0, 1]. Condition (5.1.54) for, say, δ = 1 reduces to:

h_1 ≤ 1,   (5.1.56)

which weakens the famous Newton-Kantorovich hypothesis (4.2.51), since

l_0 ≤ l   (5.1.57)

in general.

Case 2. It follows from Case 1 that (5.1.6), (5.1.8) and (5.1.9) reduce to (5.1.54),

…

and

…,

respectively, provided δ ∈ [0, 2).

Case 3. It turns out that the range for δ can be extended (see also Example 5.1.3). Introduce the conditions:

l_0 η ≤ 1 - δ/2,  for δ ∈ [δ_0, 2),

where

….

Indeed, the proof of Lemma 5.1.1 goes through in this case if, instead of (5.1.18), we show the weaker condition

…,

or δ ≥ δ_0, which is true by the choice of δ_0.
In the example that follows we show that l/l_0 can be arbitrarily large. Indeed:

Example 5.1.2 Let X = Y = R, x_0 = 0, and define function F on R by

F(x) = a_0 x + a_1 + a_2 sin e^{a_3 x},   (5.1.60)

where a_i, i = 0, 1, 2, 3, are given parameters. Then, with the rest of the choices of operators and functions given by (5.1.41), we see from (5.1.60) that for a_3 large and a_2 sufficiently small, l/l_0 can be arbitrarily large.

Next we provide an example, also mentioned in the introduction, to show that (5.1.57) holds but not (5.1.58).

Example 5.1.3 Let X = Y = R, x_0 = 1, D = [p, 2 - p], p ∈ [0, 1/2), and define F on D by

F(x) = x^3 - p.   (5.1.61)

Using (5.1.23), (5.1.41) and (5.1.61) we get

η = (1 - p)/3,  l = 2(2 - p),

which imply

2 l η = 4(2 - p)(1 - p)/3 > 1  for all p ∈ [0, 1/2).

That is, there is no guarantee that method (5.1.42) converges, since (4.2.51) is violated. However, since l_0 = 3 - p, we find that (5.1.56) holds on a subinterval [p̄, 1/2) of [0, 1/2), which improves (4.2.51) under the same computational cost.

Finally, using Case 3 we can do better. Indeed, choose

p = .4505.

Then we get η = .183166…, l_0 = 2.5495, l = 3.099, and δ_0 = 1.0656867. Choose δ = δ_0. Then we get

l_0 η ≤ 1 - δ_0/2.

That is, the interval [p̄, 1/2) can be extended to [.4505, 1/2).
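The computation in Example 5.1.3 can be checked numerically: with η = (1 - p)/3, l = 2(2 - p) and l_0 = 3 - p, the Kantorovich quantity 2lη exceeds 1 for every p ∈ [0, 1/2), yet Newton's method still converges. A sketch for p = .4505, the value used above:

```python
p = 0.4505
eta = (1 - p) / 3          # 0.1831666...
ell = 2 * (2 - p)          # 3.099
ell0 = 3 - p               # 2.5495

# Classical Newton-Kantorovich quantity (4.2.51): 2*l*eta > 1, so it fails.
h = 2 * ell * eta

# Newton's method for F(x) = x**3 - p nevertheless converges from x0 = 1.
x = 1.0
for _ in range(30):
    x = x - (x**3 - p) / (3 * x**2)
```

This reproduces the constants l_0 = 2.5495 and l = 3.099 quoted in the example while exhibiting convergence of the iterates despite the violated classical hypothesis.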
In order to compare with earlier results, we consider the case of single-step methods by setting x = y and v = w. We can then prove, along the same lines as Lemma 5.1.1 and Theorem 5.1.2 respectively, the following results, by assuming: there exists w ∈ X such that A(w)^{-1} ∈ L(Y, X), and for any x, y ∈ U(w, r) ⊆ U(w, R), t ∈ [0, 1]:

…

and

…,

where m_0, m_1, m play the roles of θ_0, θ_0 (one variable), and θ (three variables), respectively. Then, as in Lemma 5.1.1, we can show the following result on majorizing sequences:

Lemma 5.1.5 Assume:

…

and

… + δ m_0 …

for all n ≥ 0. Then, iteration {s_n} (n ≥ 0) given by

…

is nondecreasing, bounded above by

s** = …,

and converges to some s* such that 0 ≤ s* ≤ s**. Moreover, the following estimates hold for all n ≥ 0:

….

Remark 5.1.4 Comments similar to Remark 5.1.2 can now follow for Lemma 5.1.5.
Using m_0, m(·, ·, ·), m_1, A(·), w instead of θ_0, θ(·, ·, ·, ·), A(·, ·), (v, w) respectively, and going through the proof of Theorem 5.1.2 step by step, we can show:

Theorem 5.1.6 Under the hypotheses of Lemma 5.1.5, assume further that

….

Then, sequence {w_n} (n ≥ 0) generated by the Newton-like method (5.1.48) is well defined, remains in Ū(w, s*) for all n ≥ 0, and converges to a solution x* of equation F(x) + G(x) = 0. Moreover, the following estimates hold for all n ≥ 0:

…

and

….

Furthermore, the solution x* is unique in Ū(w, s*) if

∫_0^1 m(s*, 2s*, t) dt + m_1(s*) + m_0(s*) < 1,   (5.1.77)

or in Ū(w, R_0), if s* < R_0 ≤ r, and

∫_0^1 m(s*, R_0 + s*, t) dt + m_1(s*) + m_0(s*) < 1.
5.3 A fast method
Then, sequence {x_n} (n ≥ 0) generated by method (5.3.2) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, provided that x_{-1}, x_0 belong to U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:

…   (5.3.11)

Proof. Let us denote by L = L(x, y) the linear operator

L(x, y) = [2y - x, x].

Assume x, y ∈ U(x*, r*). We shall show that L is invertible on U(x*, r*), and

||L^{-1} F'(x*)|| ≤ [1 - a ||y - x*|| - c ||x - y||^2]^{-1} ≤ [1 - a r* - 4c(r*)^2]^{-1}.   (5.3.13)

Using (5.3.2), (5.3.4)-(5.3.10), we obtain in turn:

||F'(x*)^{-1}[F'(x*) - L]|| = ||F'(x*)^{-1}[([x*, x*] - [y, y]) + ([y, y] - [2y - x, x])]||
≤ a ||y - x*|| + c ||y - x||^2
≤ a r* + c [||y - x*|| + ||x* - x||]^2 ≤ a r* + 4c(r*)^2.

…

Remark 5.3.1 Condition

||F'(x*)^{-1}([x, y] - [u, v])|| ≤ l (||x - u|| + ||y - v||)   (5.3.19)

can be used instead of (5.3.4) and (5.3.5). Note that (5.3.19) implies (5.3.4) and (5.3.5). Moreover, we have:

…

and

….

Therefore the stronger condition (5.3.19) can replace (5.3.4), (5.3.6) and (5.3.9) in Theorem 5.3.1. Assuming F has divided differences of order two, condition (5.3.6) can be replaced by the stronger

…

or the even stronger

….

Note also that

…

and we can set

…,

despite the fact that the new constant is more difficult to compute, since we use divided differences of order two (instead of one). Conditions (5.3.19) and (5.3.23) were used in [236] to show that the method

…

converges to x* with order 1.839…, which is the positive solution of the scalar equation

t^3 - t^2 - t - 1 = 0.

Potra in [236] has also shown how to compute the Lipschitz constants appearing here in some cases. It follows from (5.3.11) that there exist c_0 and N sufficiently large such that:

||x_{n+1} - x*|| ≤ c_0 ||x_n - x*||^2  for all n ≥ N.   (5.3.28)

Hence the order of convergence for method (5.3.2) is essentially at least two, which is higher than 1.839…. Note also that the radius of convergence r* given by (5.3.9) is larger than the corresponding one given in [236, estimate (5.3.22)]. This observation is very important since it allows a wider choice of initial guesses x_{-1} and x_0. It turns out that our convergence radius r* given by (5.3.9) can even be larger than the one given by Rheinboldt [295] (see, e.g., Remark 4.2 in [277]) for Newton's method. Indeed, under condition (5.3.19) the Rheinboldt radius is given by

r*_R = 2/(3l).   (5.3.29)

We showed in Section 5.1 that l/l_0 can be arbitrarily large. Hence we can have:

r*_R < r*.   (5.3.30)

In Section 5.1 we also showed that r*_R is enlarged under the same hypotheses and computational cost as in [247]. We note that condition (5.3.10) suffices to hold only for x, y being iterates of method (5.3.2) (see, e.g., Example 5.3.2). Condition (5.3.24) can be removed if D = X; in this case (5.3.7) is also satisfied. The delicate condition (5.3.9) can also be replaced by a stronger but more practical one, which we decided not to introduce originally in Theorem 5.3.1, so as to leave the result as uncluttered and general as possible. Indeed, define the ball U_1 by

U_1 = U(x*, R*)  with  R* = 3r*.

If x_{n-1}, x_n ∈ U* (n ≥ 0), then 2x_n - x_{n-1} ∈ U_1 (n ≥ 0). This is true since it follows from the estimates

||2x_n - x_{n-1} - x*|| ≤ 2 ||x_n - x*|| + ||x_{n-1} - x*|| < 3r* = R*.

Hence the proof of Theorem 5.3.1 goes through if both conditions (5.3.7) and (5.3.9) are replaced by

U_1 ⊆ D_0.   ((5.3.7'))

We provide an example to justify estimate (5.3.30).
Example 5.3.1 Returning to Example 5.1.5, we obtain b = e - 1 and l = e. In view of (5.3.8) and (5.3.29), we get

r*_R = 2/(3e) = .24525296 < .299040145 = r*.

We can also set R* = 3r* = .897120435.
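The arithmetic of Example 5.3.1 can be verified directly: with l = e, Rheinboldt's radius (5.3.29) is 2/(3e), which is indeed smaller than the reported r*:

```python
import math

r_rheinboldt = 2.0 / (3.0 * math.e)   # Rheinboldt radius 2/(3l) with l = e
r_star = 0.299040145                  # r* as reported in Example 5.3.1
R_star = 3.0 * r_star                 # enlarged ball radius R* = 3r*
```

The strict inequality r*_R < r* is the content of estimate (5.3.30).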
We can show the following result for the semilocal convergence of method (5.3.2).

Theorem 5.3.2 Let F be a nonlinear operator defined on an open set D of a Banach space X with values in a Banach space Y. Assume:

- operator F has divided differences of order one and two on D_0 ⊆ D;
- there exist points x_{-1}, x_0 in D_0 such that 2x_0 - x_{-1} ∈ D_0 and A_0 = [2x_0 - x_{-1}, x_{-1}] is invertible on D; set A_n = [2x_n - x_{n-1}, x_{n-1}] (n ≥ 0);
- there exist constants α, β such that:

…   (5.3.31)

and

…   (5.3.32)

for all x, y, u, v ∈ D, and condition (5.3.10) holds.

Define constants γ, δ by

…;

define θ, r, h by

θ = {(α + βγ) …}^{1/2},   (5.3.36)

and

…,

where r_0 ∈ (0, r] is the unique solution of equation

…

on the interval (0, r]. Then sequence {x_n} (n ≥ -1) generated by method (5.3.2) is well defined, remains in U(x_0, r_0) for all n ≥ -1, and converges to a solution x* of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ -1:

…   (5.3.42)

and

…   (5.3.43)

where

γ_0 = α + βr_0 + βγ,  γ_1 = 3βr_0^2 - 2γ_0 r_0 - βγ^2 + 1,   (5.3.44)-(5.3.45)

and, for n ≥ 0,

…   (5.3.46)

Furthermore, if r_0 ≤ r_1 and

…,

then x* is the unique solution of equation (5.3.1) in U(x_0, r_0).
Proof. Sequence {t_n} (n ≥ -1) generated by (5.3.44)-(5.3.46) can be obtained if method (5.3.2) is applied to the scalar polynomial

f(t) = -β t^3 + γ_0 t^2 + γ_1 t.   (5.3.48)

It is simple calculus to show that sequence {t_n} (n ≥ -1) converges monotonically (decreasingly) to zero. We can have:

….

We show that (5.3.42) holds for all n ≥ -1. Using (5.3.33)-(5.3.38) and

…,

we conclude that (5.3.42) holds for n = -1, 0. Assume (5.3.42) holds for all n ≤ k and x_k ∈ U(x_0, r_0). By (5.3.10) and (5.3.42), x_{k+1} ∈ U(x_0, r_0). By (5.3.31), (5.3.32)
and (5.3.42),

||A_0^{-1}(A_0 - A_{k+1})||
= ||A_0^{-1}([2x_0 - x_{-1}, x_{-1}] - [x_0, x_{-1}] + [x_0, x_{-1}] - [x_0, x_0] + [x_0, x_0] - [x_{k+1}, x_0] + [x_{k+1}, x_0] - [x_{k+1}, x_k] + [x_{k+1}, x_k] - [2x_{k+1} - x_k, x_k])||
= ||A_0^{-1}(([2x_0 - x_{-1}, x_{-1}, x_0] - [x_0, x_{-1}, x_0])(x_0 - x_{-1}) + ([x_0, x_0] - [x_{k+1}, x_0]) + ([x_{k+1}, x_0] - [x_{k+1}, x_k]) + ([x_{k+1}, x_k] - [2x_{k+1} - x_k, x_k]))||
≤ β γ^2 + (||x_0 - x_{k+1}|| + ||x_0 - x_k|| + ||x_k - x_{k+1}||) α
≤ β γ^2 + 2(t_0 - t_{k+1}) α < 2β γ^2 + 2α r ≤ 1.   (5.3.52)

It follows by the Banach lemma on invertible operators 1.3.2 and (5.3.52) that A_{k+1}^{-1} exists, so that

…   (5.3.53)

We can also obtain

…   (5.3.54)

Using (5.3.2), (5.3.53) and (5.3.54) we get

…,   (5.3.55)

which together with (5.3.41) completes the induction. It follows from (5.3.42) that sequence {x_n} (n ≥ -1) is Cauchy in a Banach space X, and as such it converges to some x* ∈ Ū(x_0, r_0). By letting k → ∞ in (5.3.55) we obtain F(x*) = 0. Finally, to show uniqueness, define the operator
where y* is a solution of equation (5.3.1) in Ū(x_0, r_1). We can have

||A_0^{-1}(A_0 - B_0)|| ≤ α [||y* - (2x_0 - x_{-1})|| + ||x* - x_{-1}||]
≤ α [||(y* - x_0) - (x_0 - x_{-1})|| + ||(x* - x_0) + (x_0 - x_{-1})||]
≤ α [||y* - x_0|| + 2 ||x_0 - x_{-1}|| + ||x* - x_0||]
≤ α (2γ + r_0 + r_1) < 1.   (5.3.57)

It follows from the Banach lemma on invertible operators and (5.3.57) that the linear operator B_0 is invertible. We deduce from (5.3.56) and the identity

…   (5.3.58)

that x* = y*.   (5.3.59)  ∎
Remark 5.3.2 (a) It follows from (5.3.42), (5.3.43), (5.3.50) and (5.3.55) that the order of convergence of the scalar sequence {t_n} and of the iteration {x_n} is quadratic.

(b) The conclusions of Theorem 5.3.2 hold in a weaker setting. Indeed, assume:

…   (5.3.60)-(5.3.65)

for all x, y ∈ D_0. It follows from (5.3.31), (5.3.32) and (5.3.60)-(5.3.65) that

…   (5.3.66)

and

…   (5.3.67)

For the derivation of (5.3.53), we can use (5.3.60)-(5.3.62) and (5.3.65) instead of (5.3.31) and (5.3.32), respectively; of (5.3.54), we can use (5.3.63) instead of (5.3.31); of (5.3.57), we can use (5.3.65) instead of (5.3.31). The resulting majorizing sequence, call it {s_n}, also converges to zero and is finer than {t_n} because of (5.3.66) and (5.3.67). Therefore, if (5.3.10), (5.3.60)-(5.3.65) are used in Theorem 5.3.2 instead of (5.3.31), we draw the same conclusions but under weaker conditions, and the corresponding estimates are such that:

… and …

for all n ≥ 0.

(c) Condition (5.3.32) can be replaced by the stronger (not really needed in the proof) but more popular

…

for all v, u, x, y ∈ D.

(d) As already noted at the end of Remark 5.3.1, conditions (5.3.10) and (5.3.40) can be replaced by

U_2 = U(x_0, R_0) ⊆ D_0  with  R_0 = 3r_0,   ((5.3.40'))

provided that x_{-1} ∈ U_2. Indeed, if x_{n-1}, x_n ∈ U_0 (n ≥ 0), then

||2x_n - x_{n-1} - x_0|| ≤ 2 ||x_n - x_0|| + ||x_{n-1} - x_0|| < 3r_0.

That is, 2x_n - x_{n-1} ∈ U_2 (n ≥ 0).

We can also provide a posteriori estimates for method (5.3.2):

Proposition 5.3.3 Assume the hypotheses of Theorem 5.3.2 hold. Define scalar sequences {p_n} and {q_n} for all n ≥ 1 by:

…
and

….

Then the following error bounds hold for all n ≥ 1:

…   (5.3.73)

where

….

Proof. As in (5.3.52) we can have in turn:

…   (5.3.71)

It follows from (5.3.71) and the Banach lemma on invertible operators 1.3.2 that the linear operator [x_n, x*] is invertible, and

…   (5.3.76)

Using (5.3.2) we obtain the approximation:

…   (5.3.77)

By (5.3.54), (5.3.76) and (5.3.77) we obtain the left-hand side estimate of (5.3.73). Moreover, we can have in turn:

… ≤ [α(t_{n-1} - t_n)^2 + β(t_{n-1} - t_n)(t_{n-1} - t_{n-2})^2] / [1 - βγ^2 - 2α(t_0 - t_n) - α t_n] ≤ … = t_n. ∎
A simple numerical example follows to show: (a) how to choose the divided difference in method (5.3.2); (b) that method (5.3.2) is faster than the Secant method [68]

x_{n+1} = x_n - [x_{n-1}, x_n]^{-1} F(x_n)  (n ≥ 0);   (5.3.79)

(c) that method (5.3.2) can be as fast as the Newton-Kantorovich method (4.1.3). Note that the analytical representation of F'(x_n) may be complicated, which makes the use of method (5.3.2) very attractive.
Example 5.3.2 Let X = Y = R, and define function F on D_0 = D = (.4, 1.5) by

….

Moreover, define the divided difference of order one appearing in method (5.3.2) by

[2x_n - x_{n-1}, x_{n-1}] = (F(2x_n - x_{n-1}) - F(x_{n-1})) / (2(x_n - x_{n-1})).

In this case method (5.3.2) becomes

…

and coincides with method (4.1.3) applied to F. Furthermore, the Secant method (5.3.79) becomes:

….

Choose x_{-1} = .6 and x_0 = .7. Then we obtain:

n    Method (5.3.2)    Secant method (5.3.83)
1    .980434783        .96875
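In scalar form, method (5.3.2) reads x_{n+1} = x_n - F(x_n)/[2x_n - x_{n-1}, x_{n-1}; F], against the Secant step x_{n+1} = x_n - F(x_n)/[x_{n-1}, x_n; F]. The function F of Example 5.3.2 did not survive the scan, so the comparison below uses F(x) = x^3 - 1 on a nearby interval (our own choice), with the same starting points x_{-1} = .6, x_0 = .7:

```python
def dd(F, u, v):
    """Divided difference of order one, [u, v; F]."""
    return (F(u) - F(v)) / (u - v)

def fast_method(F, x_prev, x0, steps):
    """Method (5.3.2): x_{n+1} = x_n - [2x_n - x_{n-1}, x_{n-1}; F]^{-1} F(x_n)."""
    xp, x = x_prev, x0
    for _ in range(steps):
        xp, x = x, x - F(x) / dd(F, 2 * x - xp, xp)
    return x

def secant(F, x_prev, x0, steps):
    """Secant method (5.3.79)."""
    xp, x = x_prev, x0
    for _ in range(steps):
        xp, x = x, x - F(x) / dd(F, xp, x)
    return x

F = lambda x: x**3 - 1
fast = fast_method(F, 0.6, 0.7, steps=6)
sec = secant(F, 0.6, 0.7, steps=6)
```

After the same number of steps the fast method, being essentially of order two, reaches the root x* = 1 to machine precision, while the Secant iterate (order 1.618…) still carries a visible error — the behavior the table above illustrates.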
We conclude this section with an example involving a nonlinear integral equation.

Example 5.3.3 Let H(x, t, x(t)) be a continuous function of its arguments which is sufficiently many times differentiable with respect to x. It can easily be seen that if operator F in (5.3.1) is given by

…,

then the divided difference of order one appearing in (5.3.2) can be defined as

h_n(s, t) = [H(s, t, 2x_n(t) - x_{n-1}(t)) - H(s, t, x_{n-1}(t))] / [2(x_n(t) - x_{n-1}(t))],

provided that if for t = t̄ we get x_n(t̄) = x_{n-1}(t̄), then the above function equals H'_x(s, t̄, x_n(t̄)). Note that in this way h_n(s, t) is continuous for all t ∈ [0, 1].
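The divided difference h_n(s, t), with its fallback to the partial derivative H'_x in the degenerate case, can be sketched as follows (the kernel H is our own choice):

```python
import math

def h_n(H, dH, s, t, xn, xnm1):
    """Divided difference (H(s,t,2*xn - xnm1) - H(s,t,xnm1)) / (2*(xn - xnm1)),
    falling back to the partial derivative H'_x when xn(t) = xnm1(t)."""
    if xn == xnm1:
        return dH(s, t, xn)
    return (H(s, t, 2 * xn - xnm1) - H(s, t, xnm1)) / (2 * (xn - xnm1))

# Illustrative kernel H(s, t, u) = s*sin(t*u), with H'_x(s, t, u) = s*t*cos(t*u).
H = lambda s, t, u: s * math.sin(t * u)
dH = lambda s, t, u: s * t * math.cos(t * u)

exact = dH(0.5, 0.3, 1.0)
approx = h_n(H, dH, 0.5, 0.3, 1.0 + 5e-7, 1.0 - 5e-7)
```

The assertion that nearby iterate values give a divided difference close to H'_x is exactly the continuity property claimed at the end of the example.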
We refer the reader to Chapter 2 for the concepts concerning partially ordered topological spaces (POTL-spaces). The monotone convergence of method (5.3.2) is examined in the next result.

Theorem 5.3.4 Let F be a nonlinear operator defined on an open subset D of a regular POTL-space X with values in a POTL-space Y. Let x_0, y_0, y_{-1} be points of D such that:

….

Moreover assume: there exists a divided difference [·, ·] : D_0^2 → L(X, Y) such that, for all (x, y) ∈ D_0^2 with x ≤ y and 2y - x ∈ D_0,

…   (5.3.87)

and

…   (5.3.88)

Furthermore, assume that for any (x, y) ∈ D_0^2 with x ≤ y and (x, 2y - x) ∈ D_0^2, the linear operator [x, 2y - x] has a continuous non-singular, non-negative left subinverse. Then there exist two sequences {x_n} (n ≥ 1), {y_n} (n ≥ 1), and two points x*, y* of X such that, for all n ≥ 0:

…   (5.3.89)-(5.3.92)

lim_{n→∞} x_n = x*,  lim_{n→∞} y_n = y*.

Finally, if the linear operators A_n = [y_{n-1}, 2y_n - y_{n-1}] are inverse non-negative, then any solution of the equation F(x) = 0 from the interval D_0 belongs to the interval ⟨x*, y*⟩ (i.e., x_0 ≤ v ≤ y_0 and F(v) = 0 imply x* ≤ v ≤ y*).

Proof. Let Ā_0 be a continuous non-singular, non-negative left subinverse of A_0. Define the operator Q : ⟨0, y_0 - x_0⟩ → X by

….

It is easy to see that Q is isotone and continuous. We also have:

….

According to Kantorovich's theorem 2.2.2 on POTL-spaces [99], [272], operator Q has a fixed point w ∈ ⟨0, y_0 - x_0⟩. Set x_1 = x_0 + w. Then we get

….

By (5.3.88) and (5.3.94) we deduce:

….

Consider the operator H : ⟨0, y_0 - x_1⟩ → X given by

….

Operator H is clearly continuous and isotone, and we have:

….

By Kantorovich's theorem there exists a point z ∈ ⟨0, y_0 - x_1⟩ such that H(z) = z. Set y_1 = y_0 - z to obtain

….

Using (5.3.88), (5.3.95) we get:

….

Proceeding by induction, we can show that there exist two sequences {x_n} (n ≥ 1), {y_n} (n ≥ 1) satisfying (5.3.89)-(5.3.92) in a regular space X, and as such they converge to points x*, y* ∈ X respectively. We obviously have x* ≤ y*. If x_0 ≤ u ≤ y_0 and F(u) = 0, then we can write

… and ….

If the operator A_0 is inverse non-negative, then it follows that x_1 ≤ u ≤ y_1. Proceeding by induction we deduce that x_n ≤ u ≤ y_n holds for all n ≥ 0. Hence we conclude that x* ≤ u ≤ y*. ∎

In what follows we give some natural conditions under which the points x* and y* are solutions of equation F(x) = 0.
5.3. A fast method
165
Proposition 5.3.5 Under the hypotheses of Theorem 5.3.4, assume that F is continuous at x* and y*, and that one of the following conditions is satisfied:

(a) x* = y*;

(b) X is normal, and there exists an operator T : X → Y (T(0) = 0) which has an isotone inverse continuous at the origin and such that Aₙ ≤ T for sufficiently large n;

(c) Y is normal and there exists an operator Q : X → Y (Q(0) = 0) continuous at the origin and such that Aₙ ≤ Q for sufficiently large n;

(d) the operators Aₙ (n ≥ 0) are equicontinuous.

Then we deduce
Proof. (a) Using the continuity of F and (5.3.92) we get

Hence, we conclude

(b) Using (5.3.90)–(5.3.93) we get

0 ≥ F(xₙ) = Aₙ(xₙ − xₙ₊₁) ≥ T(xₙ − xₙ₊₁),
0 ≤ F(yₙ) = Aₙ(yₙ − yₙ₊₁) ≤ T(yₙ − yₙ₊₁).

Therefore, it follows:

By the normality of X and lim_{n→∞}(xₙ − xₙ₊₁) = lim_{n→∞}(yₙ − yₙ₊₁) = 0, we get lim_{n→∞} T⁻¹(F(xₙ)) = lim_{n→∞} T⁻¹(F(yₙ)) = 0. Using the continuity of F we obtain (5.3.96).

(c) As before, for sufficiently large n

By the normality of Y and the continuity of F and Q we obtain (5.3.96).

(d) It follows from the equicontinuity of the operators Aₙ that lim_{n→∞} Aₙvₙ = 0 whenever lim_{n→∞} vₙ = 0. Therefore, we get lim_{n→∞} Aₙ(xₙ − xₙ₊₁) = lim_{n→∞} Aₙ(yₙ − yₙ₊₁) = 0. By (5.3.89), (5.3.90) and the continuity of F at x* and y* we obtain (5.3.96). That completes the proof of Proposition 5.3.5. ∎
Remark 5.3.3 The hypotheses of Theorem 5.3.4 can be weakened along the lines of Remarks 5.3.1 and 5.3.2 above and the works in [277, pp. 102–105], or Chapter 2 on the monotone convergence of Newton-like methods. However, we leave the details to the motivated reader.

Remark 5.3.4 We finally note that (5.3.2) is a special case of the class of methods of the form (5.3.97), where the λₙ are real numbers depending on xₙ₋₁ and xₙ, i.e. λₙ = λ(xₙ₋₁, xₙ), chosen so that in practice, e.g., for all x, y ∈ D,

(1 + λ(x, y))y − λ(x, y)x ∈ D.  (5.3.99)

Note that setting λ(x, y) = 1 for all x, y ∈ D in (5.3.97) we obtain (5.3.2). Using (5.3.97) instead of (5.3.2), all the results obtained here can immediately be reproduced in this more general setting.
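As a concrete illustration of the two-sided monotone convergence described in Theorem 5.3.4, the following scalar sketch runs a method of the form (5.3.2) with the divided differences Aₙ = [yₙ₋₁, 2yₙ − yₙ₋₁]. The test function f(t) = t² − 2 and the starting points x₀, y₀, y₋₁ are hypothetical choices for demonstration, not data from the text.

```python
# Illustrative scalar sketch of the two-sided method (5.3.2):
#   x_{n+1} = x_n - A_n^{-1} f(x_n),   y_{n+1} = y_n - A_n^{-1} f(y_n),
# with A_n = [y_{n-1}, 2 y_n - y_{n-1}] a first-order divided difference.
# f, x0, y0 and y_{-1} are hypothetical demonstration choices.

def divided_difference(f, u, v):
    """First-order divided difference [u, v] = (f(v) - f(u)) / (v - u)."""
    return (f(v) - f(u)) / (v - u)

def monotone_enclosure(f, x0, y0, y_prev, steps=8, tol=1e-13):
    """Return the lower sequence {x_n} and the upper sequence {y_n}."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        if abs(f(x)) < tol and abs(f(y)) < tol:
            break                      # both bounds have converged
        A = divided_difference(f, y_prev, 2 * y - y_prev)
        # update all three quantities simultaneously from the old values
        x, y_prev, y = x - f(x) / A, y, y - f(y) / A
        xs.append(x)
        ys.append(y)
    return xs, ys

if __name__ == "__main__":
    f = lambda t: t * t - 2.0          # zero at sqrt(2)
    xs, ys = monotone_enclosure(f, x0=1.0, y0=2.0, y_prev=2.5)
    for x, y in zip(xs, ys):
        print(f"{x:.10f} <= sqrt(2) <= {y:.10f}")
```

The printed rows show xₙ increasing and yₙ decreasing, so each step supplies a two-sided error bound on the solution — the practical appeal of the monotone theory above.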
5.4
Exercise
5.1. Let F be a nonlinear operator defined on an open convex subset D of a Banach space X with values in a Banach space Y, and let A(x) ∈ L(X, Y) (x ∈ D). Assume: there exists x₀ ∈ D such that A(x₀)⁻¹ ∈ L(Y, X); there exist non-decreasing, non-negative functions a, b such that

‖A(x₀)⁻¹[A(x) − A(x₀)]‖ ≤ a(‖x − x₀‖),
‖A(x₀)⁻¹[F(y) − F(x) − A(x)(y − x)]‖ ≤ b(‖x − y‖)‖x − y‖

for all x, y ∈ D; there exist η ≥ 0, r₀ > η such that

and

where

and r₀ is the minimum positive root of equation h(r) = 0 on (0, r₀], where

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of equation F(x) = 0.

5.2.
(a) Let F be a Fréchet-differentiable operator defined on some closed convex subset D of a Banach space X with values in a Banach space Y; let A(x) ∈ L(X, Y) (x ∈ D). Assume: there exists x₀ ∈ D such that A(x₀) ∈ L(X, Y), A(x₀)⁻¹ ∈ L(Y, X), and

Then, show: (1) for all ε₁ > 0 there exists δ₁ > 0 such that

‖[A(x)⁻¹ − A(x₀)⁻¹]A(x₀)‖ < ε₁

for all x ∈ U(x₀, δ₁); (2) for all ε > 0 there exists δ > 0, as defined above, such that

‖[A(x)⁻¹ − A(x₀)⁻¹]A(x₀)‖ < ε

and

for all x, y ∈ U(x₀, δ).

(b) Let the operators F, A, the point x₀ ∈ D, and the parameters ε, δ be as in (1). Assume there exist α ≥ 0, c ≥ 0 such that
and

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, δ) for all n ≥ 0, and converges to a solution x* ∈ U(x₀, δ) of equation F(x) = 0. Moreover, if the linear operator

is invertible for all x, y ∈ D, then x* is the unique solution of equation F(x) = 0 in U(x₀, δ). Furthermore, the following estimates hold for all n ≥ 0
(c) Let X = Y = ℝ, D ⊇ U(0, .3), x₀ = 0,

Set A(x) = F′(x) (x ∈ D), δ₃ = δ₄ = ε₃ = ε₄ = .3. Then we obtain c₃ < 1 and .28 < δ = δ₃. The conclusions of (b) hold and
(d) Let x* ∈ D be a simple zero of equation F(x) = 0. Assume:

‖A(x*)⁻¹[F′(x) − A(x)]‖ < ε₁₁ for all x ∈ U(x*, δ₁₁),
‖[A(x)⁻¹ − A(x*)⁻¹]A(x*)‖ < ε₁₂ for all x ∈ U(x*, δ₁₂).

Set

Further, assume:

and

Show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x*, δ₁₃) for all n ≥ 0 and converges to x* with

‖xₙ₊₁ − x*‖ ≤ c₉‖xₙ − x*‖

for all n ≥ 0.
5.3. Consider equation (5.1.1) and iteration (5.1.48). Assume:

and

for all xₙ, x, y ∈ U(x₀, r) ⊆ U(x₀, R), t ∈ [0, 1], where wₙ(r + t) − vₙ(r)t ≥ 0 and eₙ(r) (n ≥ 0) are nondecreasing non-negative functions with wₙ(0) = vₙ(0) = eₙ(0) = 0 (n ≥ 0), the vₙ(r) are differentiable, vₙ′(r) > 0 (n ≥ 0) for all r ∈ [0, R], and the constants bₙ, cₙ satisfy bₙ ≥ 0, cₙ ≥ 0 and bₙ + cₙ < 1 for all n ≥ 0. Introduce, for all n, i ≥ 0,

aₙ = ‖A(xₙ)⁻¹(F(xₙ) + G(xₙ))‖,  p_{n,i}(r) = aᵢ − r + c_{n,i} ∫₀ʳ wₙ(t) dt,  zₙ(r) = 1 − vₙ(r) − bₙ,
ψ_{n,i}(r) = c_{n,i} ∫₀ʳ eₙ(t) dt,  c_{n,i} = zₙ(rᵢ)⁻¹,  h_{n,i}(r) = p_{n,i}(r) + ψ_{n,i}(r),  rₙ = ‖xₙ − x₀‖,  αₙ = ‖xₙ₊₁ − xₙ‖,

the equations
αₙ = r − c_{n,n} (∫₀¹ (w₀(rₙ + t) + eₙ(rₙ + t)) dt + (bₙ + cₙ − 1) r)  (5.4.3)

and the scalar iterations

s_{k+1,n} = h_{n,n}(s_{k,n} + rₙ) + c_{n,n} z_{n,n}(s_{k,n} + rₙ)  (k ≥ n);

(b) The function h_{0,0}(r) has a unique zero s₀ in the interval [0, R], and h_{0,0}(R) ≤ 0;
(c) The following estimates are true:

for all r ∈ [0, R − rₙ] and for each fixed n ≥ 0.

Then show:

i. the scalar iterations {s*_{k+1,n}} and {s**_{k+1,n}} for k ≥ 0 are monotonically increasing and converge to s**ₙ and s*ₙ for each fixed n ≥ 0, which are the unique solutions of equations (5.4.1) and (5.4.2) in [0, R − sₙ] respectively, with s*ₙ ≤ s**ₙ (n ≥ 0).

ii. Iteration (5.1.48) is well defined, remains in U(x₀, s*) for all n ≥ 0 and converges to a solution of equation (5.1.1), which is unique in U(x₀, R).

iii. The following estimates are true:

and

l**ₙ ≤ l*ₙ (n ≥ 0),

where l*ₙ and l**ₙ are the solutions of the equations (5.4.3) and (5.4.4) respectively for all n ≥ 0. The above approach shows how to improve upon the results given in [121] by Yamamoto and Chen.

iv. Define the sequence {s′ₙ} (n ≥ 0) by

Then, under the hypotheses (5.4.1)–(5.4.3) above, show that:

and

where t* = lim_{n→∞} s′ₙ.
5.4. Consider the Newton-like method (5.2.1). Let A : D → L(X, Y), x₀ ∈ D, M₋₁ ∈ L(X, Y), X ⊆ Y, L₋₁ ∈ L(X, X). For n ≥ 0 choose Nₙ ∈ L(X, X) and define

Mₙ = Mₙ₋₁Nₙ + A(xₙ)Lₙ₋₁,  Lₙ = Lₙ₋₁ + Lₙ₋₁Nₙ,  xₙ₊₁ = xₙ + Lₙ(yₙ),

yₙ being a solution of Mₙ(yₙ) = −[F(xₙ) + zₙ] for a suitable zₙ ∈ Y.

Assume:

(a) F is Fréchet-differentiable on D.

(b) There exist non-negative numbers a, a₀ and nondecreasing functions w, w₀ : ℝ₊ → ℝ₊ with w(0) = w₀(0) = 0 such that

‖F(x₀)‖ ≤ a₀,  ‖R₀(y₀)‖ ≤ a,  ‖A(x) − A(x₀)‖ ≤ w₀(‖x − x₀‖)

and

for all x, y ∈ U(x₀, R) and t ∈ [0, 1].

(c) Let M₋₁ and L₋₁ be such that M₋₁ is invertible,

‖M₋₁⁻¹‖ ≤ β,  ‖L₋₁‖ ≤ γ  and  ‖M₋₁ − A(x₀)L₋₁‖ ≤ δ.

(d) There exist non-negative sequences {aₙ}, {āₙ}, {bₙ} and {cₙ} such that for all n ≥ 0

and
(e) The scalar sequence {tₙ} (n ≥ 0) given by

tₙ₊₂ = tₙ₊₁ + eₙ₊₁dₙ₊₁(1 + cₙ₊₁) [ Σᵢ₌₁ⁿ hᵢ w(tᵢ)(tᵢ − tᵢ₋₁) + w(tₙ₊₁)(tₙ₊₁ − tₙ) ]  (n ≥ 0),  t₀ = 0,  t₁ = a,

is bounded above by a t* with 0 < t* ≤ R, where
and

(f) The following estimate is true:

εₙ ≤ ε < 1 (n ≥ 0).

Then show:

i. the scalar sequence {tₙ} (n ≥ 0) is nondecreasing and converges to a t* with 0 < t* ≤ R, and
(b) Let F : D ⊆ X → Y be a Fréchet-differentiable operator and A(x) ∈ L(X, Y). Assume: there exist a simple zero x* of F and nonnegative continuous functions α, β, γ such that

for all x ∈ D; equation

has nonnegative solutions. Denote by r* the smallest, and assume U(x*, r*) ⊆ D.

Show: under the above stated hypotheses, sequence {xₙ} (n ≥ 0) generated by the Newton-like method (5.2.1) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, provided x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:

‖xₙ₊₁ − x*‖ ≤ δₙ‖xₙ − x*‖,

where
5.6. (a) Assume: there exist parameters K > 0, M > 0, L ≥ 0, ℓ > 0, p ≥ 0, η > 0, λ₁, λ₂, λ₃ ∈ [0, 1], δ ∈ [0, 2) such that

and

where

q = δ/(1 + λ₁).

Then, show: iteration {tₙ} (n ≥ 0) given by

is non-decreasing, bounded above by

and converges to some t* such that

Moreover, the following estimates hold for all n ≥ 0:

0 ≤ tₙ₊₂ − tₙ₊₁ ≤ q(tₙ₊₁ − tₙ) ≤ qⁿ⁺¹η.
(b) Let λ₁ = λ₂ = λ₃ = 1. Assume: there exist parameters K > 0, M > 0, L > 0, ℓ > 0, p > 0, η > 0, δ ∈ [0, 1] such that

and

Then, show: iteration {tₙ} (n ≥ 0) is non-decreasing, bounded above by

t** = 2η/(2 − δ),

and converges to some t* such that

Moreover, the following estimates hold for all n ≥ 0:

0 ≤ tₙ₊₂ − tₙ₊₁ ≤ (δ/2)(tₙ₊₁ − tₙ) ≤ (δ/2)ⁿ⁺¹η.
X + Y be a F'rkchet-differentiable operator. Assume:
(1) there exist an approximation A(x) E L(X, Y) of Ft(x), an open convex subset Do of D l xo E Do, parameters q L 0, K 2 0, M 2 0, L 0, p 2 0, 1 2 0, A 1 E [O,l], Xz E [O,l], X3 E [O,1] such that:
and
+
[IA(X~)-' [ ~ ( x-) A(xo)]11 I Lllx - xollx3 1 for all x, y E DO; (2) hypotheses of (a) or (b) hold; (3)
U(xo,t*)
Do.
Then, show sequence {xn) (n 2 0) generated by Newton-like method (5.2.1) is well defined, remains in U(xo,t*) for all n 2 0 and converges to a solution ) F(x) = 0. x* E ~ ( x ~ , tof* equation Moreover the following estimates hold for all n 2 0:
and 112,
- x*II
I t* - t,.
Furthermore, the solution x* is unique in U(x₀, t*) if

[L(t*)^(1+λ₃) + M(t*)^(1+λ₂) + p] < 1,

or in U(x₀, R₀) if R₀ > t*, U(x₀, R₀) ⊆ D₀, and

(d) Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:
(a) there exist an approximation A(x) ∈ L(X, Y) of F′(x), a simple solution x* ∈ D of equation F(x) = 0, a bounded inverse A(x*)⁻¹ and parameters K, M, L, ℓ ≥ 0, λ₄, λ₅, λ₆ ∈ [0, 1] such that

and

‖A(x*)⁻¹[A(x) − A(x*)]‖ ≤ L‖x − x*‖^λ₆ + ℓ

for all x, y ∈ D;

(b) equation

K r^(1+λ₄)/(1 + λ₄) + M r^λ₅ + L r^λ₆ + ℓ + p − 1 = 0

has a minimal positive zero r₀ which also satisfies

L r₀^λ₆ + ℓ < 1

and

Then, show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x*, r₀) for all n ≥ 0, and converges to x* provided that x₀ ∈ U(x*, r₀). Moreover, the following estimates hold for all n ≥ 0:

‖xₙ₊₁ − x*‖ ≤ [1 − L‖xₙ − x*‖^λ₆ − ℓ]⁻¹ [K/(1 + λ₄) ‖xₙ − x*‖^λ₄ + M‖xₙ − x*‖^λ₅ + p] ‖xₙ − x*‖.
5.7. Let F be a nonlinear operator defined on an open convex subset D of a Banach space X with values in a Banach space Y, and let A(x) ∈ L(X, Y) (x ∈ D). Assume:

(a) there exists x₀ ∈ D such that A(x₀)⁻¹ ∈ L(Y, X);

(b) there exist non-decreasing, non-negative functions a, b such that

‖A(x₀)⁻¹[A(x) − A(x₀)]‖ ≤ a(‖x − x₀‖),
‖A(x₀)⁻¹[F(y) − F(x) − A(x)(y − x)]‖ ≤ b(‖x − y‖)‖x − y‖

for all x, y ∈ D;

(c) there exist η ≥ 0, r₀ > η such that

and d(r) < 1 for all r ∈ (0, r₀], where

and

(d) r₀ is the minimum positive root of equation h(r) = 0 on (0, r₀], where

Then show: sequence {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of equation F(x) = 0.
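The Newton-like iteration (5.2.1), xₙ₊₁ = xₙ − A(xₙ)⁻¹F(xₙ), can be sketched numerically as follows. The 2×2 test system, the choice of A as the Jacobian frozen at x₀ (the "modified Newton" approximation), and the tolerances are all illustrative assumptions, not data from the exercises.

```python
# Sketch of a Newton-like method x_{n+1} = x_n - A^{-1} F(x_n) in the
# spirit of (5.2.1), with A = F'(x0) frozen at the starting point.
# F, x0 and the tolerances are hypothetical demonstration choices.

def F(v):
    x, y = v
    return (x * x + y * y - 1.0, x - y)   # root at (sqrt(2)/2, sqrt(2)/2)

def jacobian(v):
    x, y = v
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))

def solve2(A, b):
    """Solve the 2x2 linear system A s = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det)

def newton_like(F, A, x0, tol=1e-12, max_iter=100):
    """Iterate x <- x - A^{-1} F(x) with a fixed operator A."""
    x = x0
    for _ in range(max_iter):
        fx = F(x)
        if max(abs(fx[0]), abs(fx[1])) < tol:
            break
        s = solve2(A, fx)
        x = (x[0] - s[0], x[1] - s[1])
    return x

if __name__ == "__main__":
    x0 = (0.8, 0.6)
    print(newton_like(F, jacobian(x0), x0))   # frozen A = F'(x0)
```

Freezing A trades the quadratic rate of full Newton for one Jacobian factorization, which is exactly the kind of approximation the conditions on a, b above are designed to control.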
5.8. (a) Let F : U(z, R) ⊆ X → Y be a Fréchet-differentiable operator for some z ∈ X, R > 0, and A(x) ∈ L(X, Y). Assume:

A(z)⁻¹ ∈ L(Y, X) and, for any x, y ∈ U(z, r) ⊆ U(z, R),
where w(r + t) − w₁(r) (t ≥ 0), w₁(r) and w₂(r) are non-decreasing, non-negative functions with w(0) = w₀(0) = w₁(0) = w₂(0) = 0, w₀ is differentiable, w₀′(r) > 0 for r ∈ [0, R],

and the parameters a, b satisfy

Define functions φ₁, φ₂, φ by

and iteration {rₙ} (n ≥ 0) by

where φ₀ is the minimal value of φ on [0, R];
Then show: iteration {xₙ} (n ≥ 0) generated by Newton-like method (5.1.48) is well defined, remains in U(z, r*) for any

and converges to a solution x* ∈ U(z, r₀*), which is unique in

where r* is the minimal point, r₀* is the unique zero on (0, r*], and r* = lim_{n→∞} rₙ. Moreover, sequence {rₙ} (n ≥ 0) is monotonically increasing and converges to r*. Furthermore, the following estimates hold for all n ≥ 0
provided that r₀ ∈ Rₘ, where for x ∈ D(r*)

and

In the next result we show how to improve on the error bounds.

(b) Under the hypotheses of (a), show the conclusions hold with {rₙ} and D(r*) replaced by {tₙ} (n ≥ 0), D(t*) given by

(n ≥ 1),

t* = lim_{n→∞} tₙ.

Moreover, iteration {tₙ} is monotonically increasing and converges to t*. Furthermore, the following hold for all n ≥ 1:

and

t* ≤ r*.

If strict inequality holds in (5.4.5), so does it in (5.4.6)–(5.4.8). If w₀(r) = w₁(r), r ∈ [0, R], our result reduces to Theorem 1 in [121]. If (5.4.5) holds, our error bounds are at least as fine as the ones given by Chen and Yamamoto (under the same hypotheses and a more general condition); moreover, our error bounds are finer.

5.9. (a) Let F : D ⊆ X → Y be differentiable. Assume:
There exist functions f₁ : [0, 1] × [0, ∞)³ → [0, ∞), f₂, f₃ : [0, ∞) → [0, ∞), nondecreasing in their arguments, such that

hold for all t ∈ [0, 1] and x, y ∈ D; equation

has non-negative solutions; denote by r₀ the smallest one. In addition, r₀ satisfies

U(x₀, r₀) ⊆ D, and

where

b₀ = [∫₀¹ f₁(t, η, 0, η) dt + f₂(0)] / [1 − f₃(η)],

b₁ = [∫₀¹ f₁(t, b₀η, η, η + b₀η) dt + f₂(η)] / [1 − f₃(η + b₀η)]

and

b = b(r) = [∫₀¹ f₁(t, b₁b₀η, r, r) dt + f₂(r)] / [1 − f₃(r)].

Then, show: iteration {xₙ} (n ≥ 0) generated by Newton-like method (5.2.1) is well defined, remains in U(x₀, r₀) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, r₀) of the equation. Moreover, the following estimates hold:

‖x₂ − x₁‖ ≤ b₀η,
‖x₃ − x₂‖ ≤ b₁‖x₂ − x₁‖,
‖xₙ₊₁ − xₙ‖ ≤ b₂‖xₙ − xₙ₋₁‖  (n ≥ 3)

and

‖x* − xₙ‖ ≤ [b₀b₁b₂ⁿ⁻²/(1 − b₂)] η  (n ≥ 3),  ‖x* − xₙ‖ ≤ [bⁿ/(1 − b)] η  (n ≥ 0),

where b₂ = b(r₀).
5.4. Exercise Furthermore if r o satisfies rl
lo
$1
(t, 2 ~ 0TO, , TO)dt
+ f2 (TO)+ f3 (TO)< 1,
x* is the unique solution of equation F(x) = 0 in U (xo,ro) . Finally if there exists a minimum non-negative number R satisfying equation
1'
fl
(t, r
+ r ~ , r) dt +
such that U (xo,R)
TO,
+
fi (TO) fi (ro) =
1,
D, then the solution x* is unique in U (xo,R).
(b) There exist a simple zero x* of F and continuous functions f₄ : [0, 1] × [0, ∞) → [0, ∞), f₅, f₆ : [0, ∞) → [0, ∞), non-decreasing on [0, ∞), such that

and

hold for all t ∈ [0, 1] and x ∈ D; equation

∫₀¹ f₄(t, r) dt + f₅(r) + f₆(r) = 1

has a minimum positive zero r*.

Then, show: iteration {xₙ} (n ≥ 0) generated by Newton-like sequence (5.2.1) is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0

where
(c) Let X = Y = ℝ, D = (−1, 1), x* = 0 and define function F on D by

For the case A = F′ show

where r_R is the convergence radius given by Rheinboldt [295] (see Section 4.1).

(d) Let X = Y = C[0, 1], the space of continuous functions defined on [0, 1] equipped with the max-norm. Let D = {φ ∈ C[0, 1] : ‖φ‖ ≤ 1} and F be defined on D by

with solution φ*(x) = 0 for all x ∈ [0, 1]. In this case, for each φ ∈ D, F′(φ) is a linear operator defined on D by the following expression:

In this case, and again considering A = F′,
5.10. Consider inexact Newton methods for solving equation F(x) = 0, F : D ⊆ ℝᴺ → ℝᴺ, of the general form

where sₙ ∈ ℝᴺ satisfies the equation

F′(xₙ) sₙ = −F(xₙ) + rₙ  (n ≥ 0)

for some sequence {rₙ} ⊆ ℝᴺ. Let F ∈ F_A(a) = {F : U(x*, a) → ℝᴺ with U(x*, a) ⊆ D, where F(x*) = 0, F is Fréchet-differentiable on U(x*, a), F′(x)⁻¹ exists for all x ∈ U(x*, a), F′ is continuous on U(x*, a), and there exists γ > 0 such that for all y, z ∈ U(x*, a)

Suppose the inexact Newton iterates {xₙ} (n ≥ 0) satisfy

for some {vₙ} ⊆ ℝ (n ≥ 0). If x₁ ∈ U(x*, a), then show:

(a) ‖xₙ₊₁ − x*‖ ≤ wₙ‖xₙ − x*‖;

(b) if vₙ ≤ v < 1 (n ≥ 0), then there exists w ∈ (0, 1), given by

such that ‖xₙ₊₁ − x*‖ ≤ w‖xₙ − x*‖ and lim_{n→∞} xₙ = x*;

(c) if {xₙ} (n ≥ 0) converges to x* and lim_{n→∞} vₙ = 0, then {xₙ} converges Q-superlinearly;

(d) if {xₙ} (n ≥ 0) converges to x* and lim_{n→∞} vₙ^{(1+λ)⁻ⁿ} < 1, then {xₙ} converges with R-order at least 1 + λ.
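A minimal sketch of the inexact Newton scheme of Exercise 5.10: the Newton equation F′(xₙ)sₙ = −F(xₙ) + rₙ is solved only up to a residual rₙ controlled by a forcing term vₙ. Here the inexact linear solve is emulated by setting rₙ = vₙF(xₙ) (exactly at the tolerance boundary); the scalar F and the forcing sequence are hypothetical illustrations.

```python
# Sketch of an inexact Newton method: each step solves
# F'(x_n) s_n = -F(x_n) + r_n only approximately, with the residual
# controlled by a forcing term ||r_n|| <= v_n ||F(x_n)||.
# F, dF and the forcing rule below are hypothetical examples.

def inexact_newton(F, dF, x0, forcing, tol=1e-12, max_iter=60):
    x, history = x0, [x0]
    for n in range(max_iter):
        fx = F(x)
        if abs(fx) < tol:
            break
        v_n = forcing(n, fx)          # v_n <= v < 1: linear convergence (part (b))
        r_n = v_n * fx                # emulated residual of the linear solve
        s_n = (-fx + r_n) / dF(x)     # approximate Newton correction
        x += s_n
        history.append(x)
    return x, history

if __name__ == "__main__":
    F = lambda t: t ** 3 - 2.0        # root at 2**(1/3)
    dF = lambda t: 3.0 * t ** 2
    # forcing term shrinking with the residual => superlinear (part (c))
    x, hist = inexact_newton(F, dF, 1.5,
                             forcing=lambda n, fx: min(0.5, abs(fx)))
    print(x, "in", len(hist) - 1, "steps")
```

Choosing vₙ → 0, as in the example, recovers the Q-superlinear behavior of part (c); a constant vₙ = v < 1 would only give the linear rate of part (b).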
5.11. Consider the two-point method [144]:

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice-Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;

(2) ‖Γ₀F(x₀)‖ ≤ η;

(3) ‖F″(x)‖ ≤ M (x ∈ Ω);

(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖, x, y ∈ Ω.

Denote a₀ = Mβη, b₀ = Kβη². Define the sequences

aₙ₊₁ = aₙ f(aₙ)² g_p(aₙ, bₙ),  bₙ₊₁ = bₙ f(aₙ)³ g_p(aₙ, bₙ)²;

then show: iteration {xₙ} (n ≥ 0) is well defined, remains in U(x₀, 2η/(2 − a₀)) for all n ≥ 0 and converges to a solution x* of equation F(x) = 0, which is unique in U(x₀, 2/(Mβ) − 2η/(2 − a₀)). Moreover, the following estimates hold for all n ≥ 0:

where Δ = f(a₀)⁻¹.
5.12. Consider the two-step method:

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;

(2) ‖Γ₀F(x₀)‖ ≤ η;

(3) ‖F′(x) − F′(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Denote a₀ = Kβη and define the sequence aₙ₊₁ = aₙ f(aₙ)² g(aₙ) (n ≥ 0), where f(x) = 2/(2 − 2x − x²) and g(x) = x²(x + 4)/8. If a₀ ∈ (0, ½), U(x₀, Rη) ⊆ Ω, and Δ = f(a₀)⁻¹, then show: iteration {xₙ} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of equation F(x) = 0, which is unique in U(x₀, 2/(Kβ) − Rη) ∩ Ω. Moreover, the following estimates hold for all n ≥ 0
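Since the displayed formulas of the two-step method did not survive reproduction, the sketch below uses one representative two-step scheme of this type, reusing the derivative F′(xₙ) in both substeps; the concrete scheme, the test function and the starting point are illustrative assumptions, not the exercise's exact method.

```python
# A representative two-step Newton scheme of the kind in Exercise 5.12,
# reusing F'(x_n) in both substeps:
#     y_n     = x_n - F'(x_n)^{-1} F(x_n)
#     x_{n+1} = y_n - F'(x_n)^{-1} F(y_n)
# F, dF and x0 below are hypothetical demonstration choices.
import math

def two_step_newton(F, dF, x0, tol=1e-14, max_iter=50):
    x = x0
    for _ in range(max_iter):
        d = dF(x)                 # one derivative evaluation per step
        y = x - F(x) / d          # first (Newton) substep
        x_new = y - F(y) / d      # second substep with the same derivative
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    F = lambda t: math.exp(t) - 2.0        # root at ln 2
    dF = lambda t: math.exp(t)
    print(two_step_newton(F, dF, 1.0))
```

The appeal of such schemes is that two corrections are obtained per derivative evaluation, which is why the majorizing sequences above track both substeps.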
5.13. Let X be a Banach space, and let Y be a closed subspace. Assume F is a completely continuous operator defined on D ⊆ X, D an open set, and assume the values F(x) ∈ Y for all x ∈ D. Let {Xₙ} be a sequence of finite-dimensional subspaces of X such that

inf_{x ∈ Xₙ} ‖y − x‖ → 0 as n → ∞ for all y ∈ Y.

Let {Pₙ} be a sequence of projections associated with the Xₙ:

Assume that, when restricted to Y, the projections are uniformly bounded:

Then the projection method for solving

x = F(x)

becomes

Suppose that x* ∈ D is a fixed point of nonzero index for F. Then show: for all sufficiently large n, the equation PₙF(x) = x has at least one solution xₙ ∈ Xₙ ∩ D such that

lim_{n→∞} xₙ = x*.
Let F : U(x₀, r) ⊆ X → X be differentiable and continuous on U(x₀, r), such that I − F′(x) is compact. Suppose L ∈ L(X) is such that

‖LF(x₀)‖ ≤ a,  ‖I − LF′(x₀)‖ ≤ b < 1,  ‖L[F′(x) − F′(y)]‖ ≤ c‖x − y‖ for all x, y ∈ U(x₀, r),

h = ac/(1 − b)² ≤ ¼,  r₀ = ((1 − b)/c) f(h) ≤ r, where f(h) = (1 − √(1 − 4h))/2.

Then show:

(a) equation F(x) = x has a solution x* in U(x₀, r₀); x* is unique in U(x₀, r) if, for r₁ = ((1 − b)/c) f₁(h), where f₁(h) = (1 + √(1 − 4h))/2,

r < r₁ for h < ¼;  r ≤ r₁ for h = ¼.

(b) Furthermore, Newton's method is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to x*.
5.14. Let the Fibonacci-type sequence {aₙ} be defined by a·aₙ + b·aₙ₋₁ + c·aₙ₋₂ = 0, where a₀ = 0 and a₁ = 1. If the characteristic polynomial p(x) = ax² + bx + c has zeros r₁ and r₂ with |r₁| > |r₂|, then show:

(a) aₙ ≠ 0 (n > 0);

(b) lim_{n→∞} aₙ₊₁/aₙ = r₁;

(c) Newton(aₙ₊₁/aₙ) = a₂ₙ₊₁/a₂ₙ and (d) secant(aₙ₊₁/aₙ, aₙ/aₙ₋₁) = a₂ₙ/a₂ₙ₋₁,

where

xₙ = Newton(xₙ₋₁) = xₙ₋₁ − p(xₙ₋₁)/p′(xₙ₋₁)  (n ≥ 1)

and

xₙ = secant(xₙ₋₁, xₙ₋₂) = xₙ₋₁ − p(xₙ₋₁)(xₙ₋₁ − xₙ₋₂)/(p(xₙ₋₁) − p(xₙ₋₂)).
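The identities in parts (c) and (d) can be checked in exact rational arithmetic for the classical Fibonacci case a = 1, b = −1, c = −1, p(x) = x² − x − 1: a Newton step doubles the index of the convergent ratio, and a secant step adds the indices of its two arguments.

```python
# Exact check of Exercise 5.14(c),(d) for the Fibonacci case
# p(x) = x^2 - x - 1 (a = 1, b = -1, c = -1), where {a_n} are the
# Fibonacci numbers a_0 = 0, a_1 = 1, a_n = a_{n-1} + a_{n-2}.
from fractions import Fraction

def p(x):  return x * x - x - 1
def dp(x): return 2 * x - 1

def newton(x):
    return x - p(x) / dp(x)

def secant(x1, x2):
    return x1 - p(x1) * (x1 - x2) / (p(x1) - p(x2))

a = [0, 1]
for _ in range(20):
    a.append(a[-1] + a[-2])

if __name__ == "__main__":
    for n in range(2, 8):
        r = Fraction(a[n + 1], a[n])
        # Newton doubles the index of the ratio ...
        assert newton(r) == Fraction(a[2 * n + 1], a[2 * n])
        # ... while secant adds the indices of its two arguments.
        assert secant(r, Fraction(a[n], a[n - 1])) == Fraction(a[2 * n], a[2 * n - 1])
    print("identities verified")
```

Using `Fraction` keeps every ratio exact, so the checks are genuine identities rather than floating-point coincidences; the same computation with general a, b, c would verify the statement of the exercise in full.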
Chapter 6
More Results on Newton's Method

We exploit the weaker conditions introduced in Chapter 5 to improve on recent results on the local and semilocal convergence analysis of Newton's method.
6.1
Convergence radii for the Newton-Kantorovich method
Motivated by the works of Wang [341], [343], and using the same type of weak Lipschitz conditions, we show that a larger convergence radius can be obtained for the Newton-Kantorovich method (4.1.3). Recently in [341], [343] a Lipschitz-type condition

(6.1.1)

for all x ∈ D, where L is an integrable, nondecreasing, nonnegative function, s(x) = ‖x − x*‖ and x_θ = x* + θ(x − x*), θ ∈ [0, 1], was used to obtain larger convergence radii than the ones given by the relevant works in [136], [340], [312]. Condition (6.1.1) is weaker than the famous Lipschitz condition

(6.1.2)

where w is also a nondecreasing, nonnegative function. Condition (6.1.2) has been used, e.g., in [295] (when w is a constant function). It turns out that (6.1.1), instead of the stronger (6.1.2), is really all that is needed to show the local convergence of method (4.1.3) to x* [341], [343]. Based on this observation it became possible to obtain convergence radii larger than the ones obtained in [136], [295], [340], [312]. Here we show that all the above-mentioned results can be further improved under weaker or the same hypotheses and at less or the same computational cost. A numerical example is also provided where our results compare favorably with all earlier ones. Several areas of application are also suggested.
We provide the following local convergence theorems:
Theorem 6.1.1 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:

(a) there exist a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and positive integrable nondecreasing functions L, L₀ such that the radius Lipschitz condition with L average (6.1.3) and the center radius Lipschitz condition with L₀ average (6.1.4) are satisfied for all x ∈ D, θ ∈ [0, 1];

(b) equation

∫₀ʳ L(u)u du + r ∫₀ʳ L₀(u) du − r = 0  (6.1.5)

has positive zeros. Denote by r* the smallest such zero; and

U(x*, r*) ⊆ D.  (6.1.6)

Then, sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 1:

‖xₙ − x*‖ ≤ aⁿ ‖x₀ − x*‖,  (6.1.7)

where

and

Proof. We first show a ∈ [0, 1). Indeed, for 0 < u₁ ≤ u₂, we can have in turn by the monotonicity of L:
Therefore (1/u) ∫₀ᵘ L(v) dv is a nondecreasing function of u. Then, we can have
Let x ∈ U(x*, r*). Using (6.1.4) and (6.1.6) we get
It follows from (6.1.10) and the Banach lemma on invertible operators 1.3.2 that F′(x)⁻¹ ∈ L(Y, X) and
In particular, x₀ ∈ U(x*, r*) by hypothesis. Let us assume xₙ ∈ U(x*, r*). We obtain the identity

By (6.1.3), (6.1.4) and (6.1.11) we get
which shows the convergence of xₙ to x* and xₙ₊₁ ∈ U(x*, r*). Hence, sequence {xₙ} (n ≥ 0) remains in U(x*, r*) by the principle of mathematical induction.
Moreover, s(xₙ) decreases monotonically. Furthermore, by (6.1.13) we obtain, more precisely,

from which estimate (6.1.7) follows. ∎
Remark 6.1.1 In general

L₀(u) ≤ L(u), u ∈ [0, R],  (6.1.15)

holds, and L/L₀ can be arbitrarily large (see Section 5.1). If equality holds in (6.1.15) then our theorem coincides with the corresponding Theorem 3.1 in [341]. However, if strict inequality holds in (6.1.15) (see Example 5.1.5), then our convergence radius r* is greater than the corresponding radius r in Theorem 3.1. Moreover, our convergence rate a is smaller than the corresponding rate q in Theorem 3.1. Summarizing, if

then

That is, our theorem provides a wider range of initial guesses x₀ and more precise error estimates on the distances ‖xₙ − x*‖ (n ≥ 0) than Theorem 3.1, under the same computational cost, since in practice the evaluation of the function L requires that of L₀. For the case where L₀, L are constants, see Section 5.1.
Theorem 6.1.2 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume:

(a) there exists a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and a positive integrable function L₀ such that condition (6.1.4) holds;

(b) equation

has a minimum positive zero, denoted by r₀*; and

(c)

Then sequence {yₙ} (n ≥ 0) generated by

remains in U(x*, r₀*) for all n ≥ 0 and converges to the unique solution x* of equation (4.1.1), so that

where a₀ ∈ [0, 1) and

Proof. Simply use the function L₀ instead of L in the proof of Theorem 4.1 in [341]. ∎
Remark 6.1.2 If L₀ = L then Theorem 6.1.2 reduces to Theorem 4.1 in [341]. Moreover, if (6.1.19) holds, then

and

where by r₀, q₀ we denote the radius and ratio found in Theorem 4.1. In the case when L, L₀ are constants, as expected, our results improve Corollaries 6.3 and 6.4 in [342], which in turn compare favorably with the results obtained by Smale [312] and Dedieu [139] respectively, when F is analytic and satisfies

provided that

and
for some γ₀ > 0, γ > 0 with

γ₀ ≤ γ.  (6.1.29)

For example, our result corresponding to Corollary 6.4 gives

whereas the corresponding radius in [342] is given by

Therefore (6.1.24) and (6.1.25) hold (if strict inequality holds in (6.1.29)).

We can provide a local convergence result for twice Fréchet-differentiable operators as an alternative to Theorem 6.1.1.

Theorem 6.1.3 Let F : D ⊆ X → Y be a twice Fréchet-differentiable operator. Assume:

(a) there exist a simple zero x* ∈ D of equation F(x) = 0, so that F′(x*)⁻¹ ∈ L(Y, X), and nonnegative constants ℓ, b, ℓ₀ such that
‖F′(x*)⁻¹[F″(x* + θ(x − x*)) − F″(x*)]‖ ≤ ℓ(1 − θ)‖x − x*‖,  (6.1.32)

‖F′(x*)⁻¹F″(x*)‖ ≤ b,  (6.1.33)

and

(6.1.34)

for all x ∈ D, θ ∈ [0, 1];

(b) U(x*, R*) ⊆ D, where

d = 3b/2 and c = ℓ/3 + ℓ₀/2.

Then, sequence {xₙ} generated by method (4.1.3) is well defined, remains in U(x*, R*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R*). Moreover, the following error bounds hold for all n ≥ 0:

and
Proof. Let x ∈ U(x*, R*). Using (6.1.33), (6.1.34) and the definition of R* we obtain in turn
F′(x*)⁻¹[F′(x*) − F′(x)] = −F′(x*)⁻¹[F′(x) − F′(x*) − F″(x*)(x − x*)] − F′(x*)⁻¹F″(x*)(x − x*)

and

‖F′(x*)⁻¹[F′(x*) − F′(x)]‖ ≤ ∫₀¹ ℓθ‖x − x*‖² dθ + b‖x − x*‖.

It follows from (6.1.40) and the Banach lemma on invertible operators that F′(x)⁻¹F′(x*) exists and

In particular, by hypothesis, x₀ ∈ U(x*, R*). Assume xₙ ∈ U(x*, R*). Using F(x*) = 0 we obtain the identity

By (6.1.32), (6.1.33), (6.1.41), (6.1.42) and the triangle inequality we get
Estimate (6.1.44) implies xₙ₊₁ ∈ U(x*, R*) and lim_{n→∞} xₙ = x*. ∎

The analog of Theorem 6.1.2 is given by
Theorem 6.1.4 Under the hypotheses of Theorem 6.1.3 (excluding (6.1.46)), and with R*, h replaced by R₀*, h₀ respectively, given by

h₀ = c₀R₀* + d₀,  (6.1.47)

where

c₀ = ℓ₀/6 and d₀ = b/2,

the conclusions of Theorem 6.1.3 hold for iteration (6.1.21).

Proof. Using iteration (6.1.21) we obtain the identity

By (6.1.33), (6.1.34), (6.1.48) and the definition of R₀*, h₀ we obtain (as in (6.1.43))

which completes the proof of Theorem 6.1.4. ∎
Remark 6.1.3 (a) Returning to Example 5.1.5 and using (6.1.32)–(6.1.36), (6.1.45)–(6.1.47), we obtain

ℓ = ℓ₀ = e − 1,  b = 1,  d = 1.5,  c = (5/6)(e − 1),  c₀ = (e − 1)/6,  d₀ = .5,

and we can set

Comparing with the earlier results mentioned above (see Example 5.1.5), we see that the convergence radius is significantly enlarged under our new approach. Note also that the order of convergence of iteration (6.1.21) is quadratic, whereas it is only linear in the corresponding Theorem 4.1 in [341].

(b) The results can easily be extended along the same lines to the case when ℓ, ℓ₀ are not constants but functions.

(c) The results obtained here can also be extended to hold for m-Fréchet-differentiable operators (m ≥ 2) along the lines of our works in [79], [68]. However, we leave the details of (b) and (c) to the motivated reader.

We present a new convergence theorem in which the hypotheses (see Theorem 6.1.1) that the functions L, L₀ are nondecreasing are not assumed:

Theorem 6.1.5 Let F, x*, L, L₀ be as in Theorem 6.1.1, but without assuming that the functions L, L₀ are nondecreasing. Moreover, assume:

(a) equation

has positive solutions. Denote by R* the smallest one;

(b) U(x*, R*) ⊆ D.  (6.1.51)

Then sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in U(x*, R*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R*), so that:

where

and b ∈ [0, 1). Furthermore, if the function Lₐ given by

is nondecreasing for some a ∈ [0, 1], and R* satisfies
Then method (4.1.3) converges to x* for all x₀ ∈ U(x*, R*) and

where

Proof. Exactly as in the proof of Theorem 6.1.1 we arrive at

which shows that if xₙ ∈ U(x*, R*) then xₙ₊₁ ∈ U(x*, R*) and lim_{n→∞} xₙ = x*. Estimate (6.1.13) follows from (6.1.20). Furthermore, if the function Lₐ is given by (6.1.15) and R* satisfies (6.1.16), then by Lemma 2.2 in [343, p. 408], for the nondecreasing function

we get

which completes the proof of Theorem 6.1.5. ∎

Theorem 6.1.6 Let F, x*, L₀ be as in Theorem 6.1.2, but without assuming that the function L₀ is nondecreasing. Moreover, assume:

(a) equation

has positive solutions. Denote by R₀* the smallest one;

(b) U(x*, R₀*) ⊆ D.

Then, sequence {xₙ} (n ≥ 0) generated by method (4.1.3) is well defined, remains in
U(x*, R₀*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, R₀*). Moreover, error bound (6.1.13) holds for all n ≥ 0, where
Proof. Simply use L₀ instead of L in the corresponding proof of Theorem 1.2 in [343, p. 409]. ∎

Remark 6.1.4 (a) In general (6.1.16) holds. If equality holds in (6.1.16) then Theorems 6.1.5 and 6.1.6 reduce to Theorems 1.1 and 1.2 in [343] respectively. However, if strict inequality holds in (6.1.16), then clearly: our convergence conditions (6.1.11), (6.1.24) are weaker than the corresponding ones (9) and (15) in [343]; the convergence radii R* and R₀* are larger than the corresponding ones in [343], denoted here by ρ and ρ₀; and finally, our convergence rates are smaller. For example, in the case where the functions L₀, L are constants we obtain

b = LR*/(1 − L₀R*),  ρ₀ = 1/(3L)  and  R₀* = 1/(3L₀).

Therefore we have:

and

where q is the corresponding ratio in Theorem 1.1 in [343]. Note that if strict inequality holds in (6.1.26), so does it in (6.1.28)–(6.1.30). Moreover, the computational cost of finding L₀, L in practice is the same as that of finding just L. As an example, as in Section 5.1, consider the real function F given by (5.1.40). Then, using (6.1.3) and (6.1.4), we obtain

L₀ = e − 1 < e = L,

and

Hence, we have:
and
(c) As already noted in [343, p. 411], and for the same reasons, Theorem 6.1.5 is an essential improvement over Theorem 6.1.4 if the radius of convergence is ignored (see Example 3.1 in [343]).
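A small numerical check of the radius comparison above for F(x) = eˣ − 1 with x* = 0, where L₀ = e − 1 and L = e. The closed forms below assume the constant-function specializations: r* = 2/(2L₀ + L) from equation (6.1.5), and R₀* = 1/(3L₀) versus ρ₀ = 1/(3L) as stated in Remark 6.1.4; the factor 0.99 and the iteration count are illustrative.

```python
# Radii comparison for F(x) = exp(x) - 1, x* = 0, with L = e, L0 = e - 1
# (the constant values obtained from (6.1.3) and (6.1.4) in the text).
# The constant-L closed forms below are assumptions derived from (6.1.5).
import math

L0 = math.e - 1.0
L = math.e

r_star = 2.0 / (2.0 * L0 + L)    # Theorem 6.1.1, constant L, L0
r_wang = 2.0 / (3.0 * L)         # corresponding radius of Theorem 3.1 in [341]
R0_star = 1.0 / (3.0 * L0)       # Theorem 6.1.6, constant L0
rho0 = 1.0 / (3.0 * L)           # corresponding radius in [343]

def newton(x, steps=25):
    """Newton's method for exp(x) - 1 = 0."""
    for _ in range(steps):
        x = x - (math.exp(x) - 1.0) / math.exp(x)
    return x

if __name__ == "__main__":
    print(f"r*  = {r_star:.6f} > {r_wang:.6f} (Wang)")
    print(f"R0* = {R0_star:.6f} > {rho0:.6f} (rho0)")
    # starting points just inside the enlarged radii still converge to 0:
    print(newton(0.99 * r_star), newton(-0.99 * r_star))
```

The run confirms both orderings and that Newton's method converges from starting points inside the enlarged balls — including points the smaller classical radii do not cover.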
6.2
Convergence under weak continuous derivative
We provide a semilocal convergence analysis of Newton's method (4.1.3) under weaker Lipschitz conditions for approximating a locally unique solution of equation (4.1.1) in a Banach space. For p E [O, 11 consider the following Holder condition:
In Rokne [303] and Argyros [11], the Hölder continuity condition replaced (6.2.1), or, in Argyros [93], the more general condition

was used, where v is a nondecreasing function on [0, R] for some given R. In order to combine the Kantorovich condition with Smale's α-criterion [296], [312] for the convergence of method (4.1.3) (when F is an analytic operator), Wang in [341], [343] used the weak Lipschitz condition

instead of (6.2.1). Huang in [195] noticed that (6.2.3) is a special case of the generalized Kantorovich condition

\| F'(x_0)^{-1}[F'(x) - F'(y)] \| \le \int_{s(\|x - x_0\|)}^{s(\|x - x_0\| + \|x - y\|)} L(u)\,du \qquad (6.2.4)

for all x ∈ U1, y ∈ U2, where U1 = Ū(x0, t*), U2 = Ū(x0, t* − ‖x − x0‖), and t* ∈ (0, R) is the minimum positive simple zero of
with the functions L, s continuous and increasing, satisfying L(u) ≥ 0 on [0, R], s(0) = 0, and s′(t) ≥ 0 on [0, R), respectively. Under condition (6.2.4) a semilocal convergence analysis was provided, and examples were given where the generalized
Kantorovich condition (6.2.4) can be used to show convergence of method (4.1.3) but (6.2.1) cannot. Here, we provide a semilocal convergence analysis based on a combination of condition (6.2.6) (which is weaker than (6.2.4)) and on the center generalized Kantorovich condition (6.2.7). The advantages of such an approach have already been explained in Section 5.1. We need the following definitions:

Definition 6.2.1 (Weak Generalized Kantorovich Condition). We say x0 satisfies the weak generalized Kantorovich condition if for all θ ∈ [0, 1], x ∈ U1, y ∈ U2

holds, where the functions s1, L1 are as s, L, respectively.

Definition 6.2.2 (Center Generalized Kantorovich Condition). We say x0 satisfies the center generalized Kantorovich condition if for all x ∈ U1

holds, where the functions s0, L0 are as s and L, respectively. Note that in general s0(u) ≤ s1(u) ≤ s(u) and
hold. In case strict inequality holds in (6.2.8) or (6.2.9), the results obtained in [195] can be improved under (6.2.6), or even more under the weaker condition (6.2.7). Indeed, we can show the following semilocal convergence theorem for method (4.1.3). We first define functions f0, f1 by

f_0(t) = \int_0^{s_0(t)} L_0(u)\,(t - s_0^{-1}(u))\,du - t + \eta,

f_1(t) = \int_0^{s_1(t)} L_1(u)\,(t - s_1^{-1}(u))\,du - t + \eta,

and sequences {a_n}, {t_n} by

a_{n+1} = a_n - \frac{f_1(a_n)}{f_0'(a_n)} \quad (n \ge 0), \quad (a_0 = 0), \qquad (6.2.12)

t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)} \quad (n \ge 0), \quad (t_0 = 0). \qquad (6.2.13)
6. More Results on Newton's Method
Theorem 6.2.3 Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume: there exists x0 ∈ D such that F′(x0)^{−1} ∈ L(Y, X); and condition (6.2.5) holds. Then the sequence {x_n} generated by method (4.1.3) is well defined, remains in U(x0, a*) for all n ≥ 0 and converges to a unique solution x* of equation F(x) = 0 in U(x0, a*), where a* is the limit of the monotonically increasing sequence {a_n} given by (6.2.12). Moreover, the following estimates hold:

\|x_{n+1} - x_n\| \le a_{n+1} - a_n \le t_{n+1} - t_n,

\|x_n - x^*\| \le a^* - a_n \le t^* - t_n,

and

a_n \le t_n, \qquad a^* \le t^*,

where r ∈ [a*, t**] satisfies f(r) ≤ 0, t** = max{t : f(t) ≤ 0}, and δ0 = a*/r.

Theorem 6.2.5 Assume the center generalized Kantorovich condition (6.2.7) holds, with the sequence {b_n} and its limit b* given by (6.2.27). Then the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined, remains in Ū(x0, b*) for all n ≥ 0, and converges to a solution x* ∈ Ū(x0, b*) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
and

where the iteration {b_n} (n ≥ 0) is given by (6.2.27). The solution x* is unique in U(x0, b*) if

Furthermore, if there exists R0 such that

and

\int_0^1 \big[w(b^* + \theta(b^* + R_0)) - w(b^*)\big]\,d\theta + w_0(b^*) \le 1,

then the solution x* is unique in U(x0, R0).

Proof. Using induction we show that x_k ∈ Ū(x0, b*) and estimate (6.2.32) hold for all k ≥ 0. For n = 0, (6.2.32) holds, since

Suppose (6.2.32) holds for all n = 0, 1, ..., k + 1. We show (6.2.32) holds for n = k + 2. By (6.2.7) and (6.2.25) we get
It follows from (6.2.37) and the Banach Lemma on invertible operators 1.3.2 that F′(x_{k+1})^{−1} exists and

By (4.1.3), (6.2.6), and (6.2.23) we obtain

Using (4.1.3), (6.2.38) and (6.2.39) we get

Note also that:

That is, x_{k+2} ∈ Ū(x0, b*). It follows from (6.2.40) that {x_n} (n ≥ 0) is a Cauchy sequence in the Banach space X, and as such it converges to some x* ∈ Ū(x0, b*). By letting k → ∞ in (6.2.39) we obtain F(x*) = 0. Moreover, estimate (6.2.33) follows from (6.2.32) by using standard majorization techniques. Furthermore, to show uniqueness, let y* be a solution of equation F(x) = 0 in U(x0, R0). It follows from (4.1.3)
Similarly, if y* ∈ U(x0, b*), we arrive at
\|y^* - x_{k+1}\| \le \frac{\int_0^1 \big[w(b^* + \theta b^*) - w(b^*)\big]\,d\theta}{1 - w_0(b^*)}\;\|y^* - x_k\|.

In both cases above, \lim_{k\to\infty} x_k = y^*. Hence, we conclude
x* = y*. ∎

We can now compare the majorizing sequences {t_n}, {a_n}, {b_n} under the hypotheses of Theorem 1 in [195, p. 117] and of Theorems 6.2.3 and 6.2.5 above:
Lemma 6.2.6 Under the hypotheses stated above, if strict inequality holds in (6.2.7) or (6.2.8), then the following estimates

and

b^* \le a^* \le t^*

hold for all n ≥ 0.

Proof. It follows easily by using induction, the definitions of the sequences {t_n}, {a_n}, {b_n} given by (6.2.13), (6.2.12), (6.2.27), respectively, and hypotheses (6.2.8), (6.2.9). ∎
Remark 6.2.2 The above lemma justifies the advantages of our approach claimed at the beginning of the section.
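The majorization principle behind the estimates of this section ('the Newton step in X never exceeds the step of a scalar majorizing sequence') can be illustrated numerically. The sketch below uses the classical Kantorovich majorant for a sample scalar equation of our own choosing, not an example from the text:

```python
import math

# Sample equation F(x) = exp(x) - 2, x0 = 0.8 (our choice):
# eta bounds |F'(x0)^{-1} F(x0)| and K bounds the scaled Lipschitz
# constant of F' on U(x0, 0.5).
F = lambda x: math.exp(x) - 2.0
dF = lambda x: math.exp(x)
x0 = 0.8
eta = abs(F(x0) / dF(x0))
K = math.exp(0.5)                       # sup |F''(x)/F'(x0)| on U(x0, 0.5)
assert K * eta < 0.5                    # Kantorovich condition h = K*eta <= 1/2

f = lambda t: 0.5 * K * t * t - t + eta  # Kantorovich majorizing function
df = lambda t: K * t - 1.0
x, t = x0, 0.0
for _ in range(5):
    x_new = x - F(x) / dF(x)            # Newton step for F
    t_new = t - f(t) / df(t)            # Newton step for the majorant f
    # the real step never exceeds the majorizing step
    assert abs(x_new - x) <= (t_new - t) + 1e-12
    x, t = x_new, t_new
```

Here x converges to ln 2 while t increases monotonically to the smallest zero t* of f, mirroring the roles of {x_n} and the scalar sequences above.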
6.3
Convergence and universal constants
We improve the elegant work by Wang, Li and Lai in [342], where they gave a unified convergence theory for several Newton-type methods by using different Lipschitz conditions and a finer "universal constant". It is convenient to define the majorizing function f and majorizing sequence {t_n} by

and

respectively, where η, ℓ, L are nonnegative constants.
Proposition 6.3.1 [342, p. 207] Let
and
Then, (a) function f is monotonically decreasing in [0, r], while it is monotonically increasing in [r, +∞). Moreover, if η ≤ b then

Therefore f has a unique zero in each of the two intervals, denoted by t* and t** respectively, satisfying

when η < b, and t* = t** when η = b; (b) the iteration {t_n} is monotonically increasing and converges to t*.

From now on we set D = Ū(x0, r). We can define a more precise majorizing sequence {s_n} (n ≥ 0) by

s_0 = 0, s_1 = η, for some constant ℓ0 ∈ [0, ℓ]. The iteration {t_n} (n ≥ 0) can also be written as
Then, under the hypotheses of Proposition 6.3.1, using (6.3.6), (6.3.7) and induction on n, we can easily show:

Proposition 6.3.2 Under the hypotheses of Proposition 6.3.1, the sequence {s_n} is monotonically increasing and converges to some s* ∈ (η, t*]. Moreover, the following estimates hold:

and

s^* \le t^*,

provided that L > 0 if ℓ0 = ℓ.
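To make Proposition 6.3.2 concrete, here is a numerical sketch comparing the two majorizing sequences in the constant, first-derivative case (L = 0). The recurrences below are assumed from the Section 5.1 pattern: the classical sequence uses ℓ everywhere, while the finer one replaces ℓ by the center constant ℓ0 ≤ ℓ in the denominator; the numbers are ours:

```python
# Assumed constant-case recurrences (Section 5.1 pattern, L = 0):
#   t_{n+2} = t_{n+1} + l *(t_{n+1}-t_n)**2 / (2*(1 - l *t_{n+1}))   (l only)
#   s_{n+2} = s_{n+1} + l *(s_{n+1}-s_n)**2 / (2*(1 - l0*s_{n+1}))   (l0 <= l)
l, l0, eta = 1.0, 0.6, 0.4
t = [0.0, eta]
s = [0.0, eta]
for n in range(20):
    t.append(t[-1] + l * (t[-1] - t[-2]) ** 2 / (2.0 * (1.0 - l * t[-1])))
    s.append(s[-1] + l * (s[-1] - s[-2]) ** 2 / (2.0 * (1.0 - l0 * s[-1])))
# the center-Lipschitz sequence is at least as tight, term by term
assert all(sn <= tn + 1e-15 for sn, tn in zip(s, t))
assert s[-1] < t[-1]            # and its limit s* is strictly below t* here
```

With ℓη = 0.4 ≤ 1/2 the classical sequence converges to t* = 1 − √0.2, and the finer sequence stays below it at every index, which is the content of s_n ≤ t_n and s* ≤ t*.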
Let x0 ∈ D be such that F′(x0)^{−1} ∈ L(Y, X). We now need to introduce some Lipschitz conditions:

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \ell_0 \|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.12)

\|F'(x_0)^{-1}[F'(y) - F'(x)]\| \le \big[\ell + L(\|x - x_0\| + \tfrac{1}{2}\|y - x\|)\big]\|y - x\|, for all x, y with ‖y − x‖ + ‖x − x0‖ ≤ r, (6.3.13)

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \big[\ell + \tfrac{L}{2}\|x - x_0\|\big]\|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.14)

\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \ell \|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.15)

\|F'(x_0)^{-1}F''(x_0)\| \le \ell, (6.3.16)

\|F'(x_0)^{-1}F''(x_0)\| \le \ell_0, (6.3.17)

\|F'(x_0)^{-1}[F''(y) - F''(x)]\| \le L\|y - x\|, for all x, y with ‖y − x‖ + ‖x − x0‖ ≤ r, (6.3.18)

\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le L\|x - x_0\|, for all x ∈ Ū(x0, r), (6.3.19)

and

\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le L_0\|x - x_0\|, for all x ∈ Ū(x0, r). (6.3.20)

Note that we will use (6.3.12), (6.3.17), (6.3.20) instead of (6.3.15), (6.3.16) and (6.3.19), respectively, in our proofs. We define, as in [342]:

K^{(1)}(x0, ℓ) = {F ∈ C¹(x0, r) : (6.3.13) holds for L = 0},
K^{(1)}_{cen}(x0, ℓ) = {F ∈ C¹(x0, r) : (6.3.15) holds},
K^{(1)}(x0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.13) holds},
K^{(1)}_{cen}(x0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.14) holds},
K^{(2)}(x0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.16) and (6.3.18) hold},
K^{(2)}_{cen}(x0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.16) and (6.3.20) hold}.

Moreover, for our study we need to define:

K^{(1)}(x0, ℓ0, ℓ, L) = {F ∈ C¹(x0, r) : (6.3.12) and (6.3.13) hold},
K^{(1)}_{cen}(x0, ℓ0) = {F ∈ C¹(x0, r) : (6.3.12) holds},
K^{(2)}(x0, ℓ0, ℓ, L) = {F ∈ C²(x0, r) : (6.3.12), (6.3.17) and (6.3.18) hold, or (6.3.12), (6.3.16) and (6.3.18) hold}.

We can now provide the main semilocal convergence result for method (4.1.3).
Theorem 6.3.3 Suppose that F ∈ K^{(2)}(x0, ℓ0, ℓ, L). If η ≤ b, then the sequence {x_n} generated by method (4.1.3) is well defined, remains in Ū(x0, s*) for all n ≥ 0, and converges to a solution x* ∈ U(x0, r) of equation F(x) = 0.

Proof. Simply use (6.3.12) instead of (6.3.15) in Lemma 2.3 in [342] to obtain that F′(x)^{−1} exists for all x ∈ Ū(x0, r) and satisfies

The rest follows exactly as in Theorem 3.1 in [342], but replacing {t_n}, t*, (6.3.11) by {s_n}, s*, (6.3.21), respectively. ∎

Theorem 6.3.4 Suppose that F ∈ K^{(1)}_{cen}(x0, ℓ0). If η ≤ b, then the equation F(x) = 0 has a unique solution x* in Ū(x0, s*_0), where s*_0 = s* if η = b, and s*_0 ∈ [s*, t*) if η < b.

Proof. As in Theorem 3.2 in [342], but using (6.3.12) instead of (6.3.14). ∎

By Proposition 6.3.2 we see that our results in Theorems 6.3.3 and 6.3.4 improve the corresponding ones in [342], as claimed in the introduction, in the case ℓ0 ≤ ℓ. At this point we wonder if the hypothesis η ≤ b can be weakened by finding a larger "universal constant" b. It turns out that this is possible. We first need the following lemma on majorizing sequences:

Lemma 6.3.5 Assume there exist nonnegative constants ℓ0, ℓ, L, η and a parameter δ ∈ [0, 2) such that for all n ≥ 0

Then, the iteration {s_n} given by (6.3.7) is monotonically increasing, bounded above by

s^{**} = \frac{2\eta}{2 - \delta},

and converges to some s* with s* ≤ s**. Moreover, the following estimates hold for all n ≥ 0:
Proof. The proof follows along the lines of Lemma 5.1.1. However, there are some differences. Using induction on the integer k ≥ 0, we must show:
0 \le s_{k+1} - s_k, \qquad (6.3.26)

and

0 < 1 - \ell_0 s_{k+1}. \qquad (6.3.27)
Estimates (6.3.25)-(6.3.27) hold for k = 0 by (6.3.22). But then (6.3.6) gives
Let us assume (6.3.24)–(6.3.27) hold for all k ≤ n. Then we have in turn:
We must also show

s_k \le s^{**}. \qquad (6.3.30)

For k = 0, 1, 2 we have s_0 = 0 ≤ s**, s_1 = η ≤ s**, and

s_2 = \eta + \frac{\delta}{2}\,\eta \le s^{**}.
Assume (6.3.30) holds for all k ≤ n. It follows from the induction hypothesis that
Moreover, (6.3.27) follows from (6.3.22) and (6.3.31). Inequality (6.3.26) follows from (6.3.6), (6.3.27) and (6.3.25). Hence, the sequence {s_n} is monotonically increasing, bounded above by s**, and as such it converges to some s* satisfying (6.3.23). ∎

Remark 6.3.1 Condition (6.3.22) can be replaced by a stronger but easier to verify condition
where,
Indeed, (6.3.22) certainly holds if

since

holds for all n ≥ 0 for δ in the indicated subinterval of [0, 2). However, (6.3.37) holds by (6.3.32). Hence, we have shown:
Lemma 6.3.6 If (6.3.32) replaces condition (6.3.22) in Lemma 6.3.5, then the conclusions for the sequence {s_n} hold.

Remark 6.3.2 In Lemma 6.3.5 we wanted to leave (6.3.22) as uncluttered as possible, since it can be shown to hold in some interesting cases. For example, consider Newton's method, i.e. set L = 0; see Application 5.1.4 in Section 5.1. Note also that b0 can be larger than b, which means that (6.3.22) may hold but not η ≤ b. Indeed, here are two such cases: choose (a) L = 1, ℓ0 = ℓ = .5 and η = .59; then b = .583 and b0 = .590001593; and (b) L = .8, ℓ = .4 and η = .7; then b = .682762423 and b0 = .724927596.

Below is the main semilocal convergence theorem for Newton's method involving Fréchet-differentiable operators and using center-Lipschitz conditions.
Theorem 6.3.7 Assume: (a) F ∈ K^{(1)}(x0, ℓ0, ℓ, L), and (b) the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} (n ≥ 0) generated by method (4.1.3) is well defined, remains in Ū(x0, s*) for all n ≥ 0, and converges to a solution x* ∈ Ū(x0, s*) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
and

where the sequence {s_n} (n ≥ 0) is given by (6.3.6). The solution x* is unique in Ū(x0, s*) if

\ell_0 s^* < 1. \qquad (6.3.41)

Furthermore, if there exists R > s* such that U(x0, R) ⊆ D, and
then the solution x* is unique in U(x0, R).

Proof. Let us prove that

and

hold for all k ≥ 0. For every x ∈ Ū(x1, s* − s1),

implies x ∈ Ū(x0, s* − s0). Since also

\|x_1 - x_0\| = \|F'(x_0)^{-1}F(x_0)\| \le \eta = s_1 - s_0,

(6.3.43) and (6.3.44) hold for k = 0. Given that they hold for n = 0, 1, ..., k, then
and
Using (4.1.3) we obtain the approximation

Then, we get by (6.3.12), (6.3.13) and (6.3.45)

Using (6.3.12) we get

\|F'(x_0)^{-1}[F'(x_{k+1}) - F'(x_0)]\| \le \ell_0\|x_{k+1} - x_0\| \le \ell_0 s_{k+1} \le \ell_0 s^* < 1 \quad (\text{by } (6.3.22)). \qquad (6.3.47)

It follows from the Banach Lemma on invertible operators 1.3.2 and (6.3.22) that F′(x_{k+1})^{−1} exists, so that:

\|F'(x_{k+1})^{-1}F'(x_0)\| \le \big[1 - \ell_0\|x_{k+1} - x_0\|\big]^{-1} \le (1 - \ell_0 s_{k+1})^{-1}. \qquad (6.3.48)

Therefore, by (6.3.2), (6.3.46), and (6.3.48) we obtain in turn

Thus for every z ∈ Ū(x_{k+2}, s* − s_{k+2}) we have

That is,
Estimates (6.3.49), (6.3.50) imply that (6.3.43), (6.3.44) hold for n = k + 1. The proof of (6.3.43) and (6.3.44) is now complete by induction. Lemma 6.3.5 implies {s_n} (n ≥ 0) is a Cauchy sequence. From (6.3.43), (6.3.44), {x_n} (n ≥ 0) becomes a Cauchy sequence too, and as such it converges to some x* ∈ Ū(x0, s*) such that (6.3.40) holds. The combination of (6.3.40) and (6.3.46) yields F(x*) = 0. Finally, to show uniqueness, let y* be a solution of equation F(x) = 0 in U(x0, R). It follows from (6.3.12) and the estimate

\le \frac{\ell_0}{2}(s^* + R) \le 1 \quad (\text{by } (6.3.42)),

and the Banach Lemma on invertible operators, that the linear operator

is invertible. Using the identity
we deduce x* = y*. Similarly, we show uniqueness in Ū(x0, s*) using (6.3.41). ∎

As in [342], we also study the semilocal convergence of King's iteration [208]

where [n/2] denotes the integer part of n/2, and of Werner's method [345]

Both methods improve the computational efficiency, with (6.3.53)–(6.3.55) being the more preferable because it raises the convergence order to 1 + √2, although the number of function evaluations doubles when compared with Newton's method. Using (6.3.12) instead of (6.3.14) or (6.3.19) to compute upper error bounds on ‖F′(x)^{−1}F′(x0)‖ (see e.g. (6.3.48)), we can show, as in Theorems 4.1 and 4.2 in [342], respectively:
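A scalar sketch of Werner's two-step scheme: one derivative evaluation at the midpoint of x_n and y_n is reused for both sub-steps. This concrete form is an assumption based on the description above (the display (6.3.53)–(6.3.55) itself is not reproduced here), and the cubic f is our own sample:

```python
# Werner-type scheme of order 1 + sqrt(2): per step, evaluate f' once at the
# midpoint (x + y)/2 and reuse it for the corrector of both x and y.
f = lambda x: x**3 - 2.0
df = lambda x: 3.0 * x**2
x = 1.5
y = x - f(x) / df(x)                 # y_0 via an ordinary Newton step
for _ in range(8):
    m = df((x + y) / 2.0)            # the single derivative evaluation
    x_new = x - f(x) / m
    y = x_new - f(x_new) / m         # reuse the same midpoint derivative
    x = x_new
assert abs(x - 2.0 ** (1.0 / 3.0)) < 1e-12
```

Despite only one derivative per step, the pair (x_n, y_n) converges superquadratically to the cube root of 2, which is the efficiency point made in the text.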
Theorem 6.3.8 Let F ∈ K^{(1)}(x0, ℓ0, ℓ, L) and assume the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} generated by King's method (6.3.52) converges to a unique solution x* of equation F(x) = 0 in Ū(x0, s*).

Theorem 6.3.9 Let F ∈ K^{(2)}(x0, ℓ0, ℓ, L) and assume the hypotheses of Lemma 6.3.5 or 6.3.6 hold. Then the sequence {x_n} generated by Werner's method (6.3.53)–(6.3.55) converges to a unique solution x* of equation F(x) = 0 in Ū(x0, s*).
Note that finer error bounds and more precise information on the location of the solution x* are obtained again in the cases of Theorems 6.3.8 and 6.3.9, since more precise majorizing sequences are used.
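For intuition, here is a scalar sketch of a derivative-reuse iteration of King's type. The form below, reusing F'(x_{2[n/2]}) for two successive steps, is an assumption consistent with the [n/2] indexing of (6.3.52), not necessarily the exact display; f is our sample:

```python
# King-type derivative reuse: a fresh derivative every second step.
f = lambda x: x**2 - 2.0
df = lambda x: 2.0 * x
x = 2.0
for n in range(10):
    if n % 2 == 0:
        d = df(x)          # fresh derivative at x_{2[n/2]}
    x = x - f(x) / d       # reuse d on the following step
assert abs(x - 2.0 ** 0.5) < 1e-12
```

Each derivative serves two steps, trading a slightly lower order per step for fewer derivative evaluations, the efficiency trade-off discussed above.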
6.4
A Deformed Method
We assume that F is a twice continuously Fréchet-differentiable operator in this section. We use the following modification of King's iteration [208] proposed by Werner [345]:

Although the number of function evaluations increases by one when compared to Newton's method, the order of convergence is raised from 2 to 1 + √2 [345]. Here, in particular, using more precise majorizing sequences than before, we provide finer error bounds on the distances ‖x_{n+1} − x_n‖, ‖x_n − x*‖, and more precise information on the location of the solution x*. Finally, we suggest an approach for weakening even further the sufficient convergence condition (6.4.8) for this method, along the lines of the work in Section 5.1. Let β ≥ 0, γ > 0, and γ0 > 0 be given parameters. It is convenient to define the real function f by
and scalar sequences {t_n}, {r_n} (n ≥ 0) by

t_{n+1} = t_n - f'(s_n)^{-1} f(t_n) \quad (n \ge 0), \qquad t_0 = r_0 = 0, \qquad (6.4.4)
where,
and
We need the following lemma on majorizing sequences:
Lemma 6.4.1 Assume:

and

Then the iterations {t_n}, {t̄_n} are well defined, and the following estimates hold:

where

\bar t^* = \lim_{n\to\infty} \bar t_n \qquad \text{and} \qquad t^* = \lim_{n\to\infty} t_n.
Proof. Estimates (6.4.10) and (6.4.19) follow using only (6.4.8) (see Lemma 2 in [177, p. 69]). The rest of the proof follows immediately by using induction on n and noticing that

Indeed, (6.4.20) reduces to showing:

which is true by the choice of t̄. ∎

Remark 6.4.1 It follows from Lemma 6.4.1 that if {t̄_n} is shown to be a majorizing sequence for {x_n} instead of {t_n} (shown to be so; see Theorem 1 in [179]), then we can obtain finer estimates on the distances

and at least as precise information on the location of the solution x*. We also note, as can be seen in the main semilocal convergence theorem that follows, that all this is achieved under the same hypotheses and computational cost:

Theorem 6.4.2 Let F : D ⊆ X → Y be a thrice continuously Fréchet-differentiable operator. Assume there exist x0 ∈ D, β ≥ 0, γ > 0, γ0 > 0 such that F′(x0)^{−1} ∈ L(Y, X),
for all
\bar U\Big(x_0, \Big(1 - \frac{\sqrt 2}{2}\Big)\frac{1}{\gamma}\Big) \subseteq D,
and the hypotheses of Lemma 6.4.1 hold. Then the iterations {x_n}, {z_n} (n ≥ 0) generated by method (6.4.2) are well defined, remain in Ū(x0, t*) for all n ≥ 0, where t* is given by (6.4.19), and converge to a unique solution x* of equation F(x) = 0 in Ū(x0, (1 − √2/2)/γ). Moreover, the following estimates hold:

and
Proof. We only need to show that the left-hand side inequalities in (6.4.27) and
(6.4.28) hold. The rest follows from Lemma 2.1, standard majorization techniques and Theorem 1 in [179]. Let x ∈ Ū(x0, t*). It follows from (6.4.9), (6.4.19) and (6.4.23) that
Using the Banach Lemma on invertible operators [6] and (6.4.31), we conclude that F′(x)^{−1} exists and
By (6.4.2), (6.4.3)–(6.4.7), (6.4.20) above and (21)–(23) in Theorem 1 in [179, p. 70] we arrive at

(with r̄_n, t̄_n replacing r_n, t_n, respectively, in the estimate above (23) in [179]). Note also that (6.4.27) holds for n = 0, and by the induction hypothesis

Therefore the left-hand side of (6.4.27) holds for n + 1. We also have

Hence, the same is true for y_{n+1}. Moreover, by (24), (25) in [179], and (6.4.20) and (6.4.32) above, we have

= \bar t_{n+2} - \bar r_{n+1}.

Hence, the left-hand side of (6.4.28) holds for n + 1. The rest of the proof follows exactly as in Lemma 6.4.1, with r̄_n, t̄_n, t̄* replacing r_n, t_n and t*, respectively. ∎
Remark 6.4.2 (a) Let us denote by {t̃_n} (n ≥ 0) the scalar sequence generated as in (6.4.6), (6.4.7), with each quantity replaced by its tilde counterpart. It follows from the proof of Theorem 6.4.2 that {t̃_n} is also a majorizing sequence for {x_n}, so that

\tilde t_n \le t_n, \qquad \tilde r_n \le r_n \qquad \text{and} \qquad \tilde t^* \le t^*,

where

\tilde t^* = \lim_{n\to\infty} \tilde t_n.

(b) It is expected that, since the sequence {t̃_n} is finer than {t_n}, condition (6.4.8) can be dropped as a sufficient convergence condition for {t̃_n} and be replaced by a weaker one. We do not pursue this idea here. However, we suggest that such a condition can be found by working along the lines of the more general majorizing sequences studied in Section 5.1.
6.5
The Inverse Function Theorem
We provide a semilocal convergence analysis for Newton's method in a Banach space setting using the inverse function theorem. Using a combination of a Lipschitz as well as a center-Lipschitz type condition, we provide weaker sufficient convergence conditions, finer error bounds on the distances involved, and more precise information on the location of the solution than before [340]. Moreover, our results are obtained under the same or less computational cost, as already indicated in Section 5.1.

Let x0 ∈ D be such that U(x0, r) ⊆ D for some r > 0, and F′(x0)^{−1} ∈ L(Y, X). The inverse function theorem, under the above stated conditions, guarantees the existence of ε > 0 and of a local inverse F^{−1} : U(F(x0), ε) ⊆ Y → X such that F^{−1} is differentiable, and

F(F^{-1}(z)) = z \qquad \text{for all } z \in U(F(x_0), \varepsilon).

In particular, in [341] the center-Lipschitz condition (6.5.3)
was used, where s(x) = ‖x − x0‖ and L is a positive integrable function on (0, r], to first study the semilocal convergence of the modified Newton method. Then, using the same function L and the Lipschitz condition

for all x ∈ U(x0, r), y ∈ U(x, r − s(x)), where s(x, y) = s(x) + ‖y − x‖ ≤ r, the semilocal convergence of Newton's method (4.1.3) was studied in a unified way with the corresponding modified Newton method. We assume instead of (6.5.3):
By the Banach Lemma on invertible operators 1.3.2 and (6.5.5),

for all r0 ∈ (0, r] and x ∈ U(x0, r0), F′(x)^{−1} exists, and

with r0 satisfying

\int_0^{r_0} L_0(u)\,du < 1.
So
rO O
Lo(u). Assume condition (6.5.5) holds for all r
Ff(xo)-l exists and is differentiable on
and the radius of the ball is the best possible (under condition (6.5.5)).
2 r:.
Lemma 6.5.2 Let

where R0 satisfies

If β ∈ (0, b0), then f0 is monotonically decreasing on [0, r̄_0], while it is monotonically increasing on [r̄_0, R0], so that

Moreover, f0 has a unique root in each of the two intervals, denoted by r_1^0 and r_2^0, satisfying:

If (6.5.5) is extended to the boundary, i.e.

then, according to Lemma 6.5.2, Theorem 6.5.1 implies a more precise result:
Proposition 6.5.3 If

and

then the local inverse F^{−1} exists and is differentiable on the corresponding ball centered at F(x0);

F′(x)^{−1} satisfies (6.5.7), and the corresponding radius is as small as possible.
Proposition 6.5.4 Assume r ∈ [r_1^0, r_2^0], β ∈ (0, b0), and that condition (6.5.14) holds. Then

In particular, if γ = 0 and β = ‖F′(x0)^{−1}F(x0)‖, we get:

Theorem 6.5.5 Assume r ∈ [r_1^0, r_2^0] if β < b0, or r = r_1^0 if β = b0. Then equation F(x) = 0 has a unique solution x* satisfying (6.5.19) in the ball U(x0, r).

The error bounds obtained this way are finer than the ones in [340], which use the majorizing sequence {t_n} instead of the more precise {s_n}. We also note that our results are obtained under the same computational cost as the ones in [340], since in practice finding L requires finding L0. The rest of the error bounds in [340] can also be improved by simply using L0 in (6.5.7) instead of L; see Section 5.1. Instead, we wonder if the convergence of the scalar sequence {s_n} can be established under weaker conditions than the ones given in Theorem 6.5.6.
We showed in Section 5.1 that {s_n} is monotonically convergent to s* if:

(A) The functions L0, L are constants; then the sequence {s_n} given by (6.5.28) becomes

The advantages in this case have been explained in Section 5.1.

(B) There exists a minimal positive number R1 ∈ [β, R0] satisfying the equation

\beta + \frac{\int_0^{R_1} L(u)\,du}{1 - \int_0^{R_1} L_0(u)\,du}\, R_1 = R_1.

Moreover, R1 satisfies

\int_0^{R_1} L_0(u)\,du < 1.

Indeed, if s_{n+1} ≤ R1, then

s_{n+2} \le s_{n+1} + \frac{\int_{s_n}^{s_{n+1}} L(u)\,du}{1 - \int_0^{s_{n+1}} L_0(u)\,du}\,(s_{n+1} - s_n) \le \beta + \frac{\int_0^{R_1} L(u)\,du}{1 - \int_0^{R_1} L_0(u)\,du}\, R_1 = R_1.

Hence, {s_n} is monotonically increasing (by (6.5.41)) and bounded above by R1, and as such it converges to some s* ∈ (β, R1].

(C) Other sufficient convergence conditions for the sequence {s_n} can be found in Section 5.1, but for more general functions L0 and L.

Following the proof of Theorem 6.5.6 but using hypotheses (A), (B) or (C) above, we immediately obtain:

Theorem 6.5.7 Under hypotheses (A), (B) or (C) above, the conclusions of Theorem 6.5.6 for method (4.1.3) hold in the ball U(x1, s* − β), and s* ≤ r1. Note that s* replaces r1 in the left-hand side of estimate (6.5.30).

Next we examine the uniqueness of the solution x* in the ball U(x0, r1) or U(x0, s*).

Remark 6.5.3 Assume:
\int_0^1 L\big[\theta r_1 + (1 - \theta)R\big]\,\big[\theta r_1 + (1 - \theta)R\big]\,d\theta < 1. \qquad (6.5.43)
Then, under the hypotheses of Theorem 6.5.6, the solution x* is unique in U(x0, r1) if (6.5.42) holds, and in U(x0, R) for R ≥ r1 if (6.5.43) holds. Indeed, if y* is a solution of equation F(x) = 0 in Ū(x0, R), then by (6.5.5) we get, for L = \int_0^1 F'(y^* + \theta(x^* - y^*))\,d\theta,

(6.5.42) if y* ∈ U(x0, r1), or (6.5.43) if y* ∈ U(x0, R). It follows by the Banach Lemma on invertible operators and (6.5.44) that the linear operator L is invertible. Using the identity

0 = F(x^*) - F(y^*) = L(x^* - y^*),

we obtain in both cases x* = y*, which establishes the uniqueness of the solution x* in the corresponding balls U(x0, r1) and U(x0, R). Replace r1 by s* in (6.5.42) and (6.5.43) to obtain uniqueness of x* in U(x0, s*) or U(x0, R) under the hypotheses of Theorem 6.5.7. The corresponding problem of the uniqueness of the solution x* was not examined in Theorem 1.5 in [340].

Remark 6.5.4 Assume operator F is analytic and such that
Smale in [312] provided a semilocal convergence analysis for method (4.1.3) using (6.5.45). Later, in [177], Smale's results were improved by introducing a majorizing function of the form

f(t) = \beta - t + \frac{\gamma t^2}{1 - \gamma t}

(see also Section 6.4). Define function L by

L(u) = \frac{2\gamma}{(1 - \gamma u)^3}.

Then conditions (6.5.3) and (6.5.4) are satisfied, and the function f defined in Remark 6.5.1 coincides with (6.5.46).
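The Smale-type kernel L(u) = 2γ/(1 − γu)³ and the majorant β − t + γt²/(1 − γt) are linked by f(t) = β − t + ∫₀ᵗ L(u)(t − u) du; the t-dependent part of this identity can be verified numerically (a sketch, with γ chosen by us):

```python
# Verify int_0^t L(u)(t - u) du == g*t**2/(1 - g*t) for the kernel
# L(u) = 2*g/(1 - g*u)**3 (midpoint quadrature; g = 0.3 is arbitrary).
g = 0.3
L = lambda u: 2.0 * g / (1.0 - g * u) ** 3
for t in (0.2, 0.8, 1.5):
    n = 20000
    h = t / n
    quad = h * sum(L((k + 0.5) * h) * (t - (k + 0.5) * h) for k in range(n))
    assert abs(quad - g * t * t / (1.0 - g * t)) < 1e-6
```

Equivalently, L is exactly the second derivative of the γ-majorant, which is why the Kantorovich-type and Smale-type analyses coincide for analytic F.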
Concrete forms of Theorems 6.5.1 and 6.5.5 were given by Theorems 5.1 and 5.2, respectively, in [340]. Here we show how to improve on Theorems 5.1 and 5.2 along the lines of our Theorems 6.5.1 and 6.5.5 above. Indeed, simply define the functions f0, L0 corresponding to f, L by

f_0(t) = \beta - t + \frac{\gamma_0 t^2}{1 - \gamma_0 t}

and

L_0(u) = \frac{2\gamma_0}{(1 - \gamma_0 u)^3}.

Again, for this choice of L0, the function f0 given by (6.5.10) coincides with (6.5.48). Moreover, condition (6.5.5) is satisfied. Clearly,

\gamma_0 \le \gamma \qquad (6.5.50)

holds in general. If strict inequality holds in (6.5.50), then our results enjoy the benefits corresponding to (6.1.14) (see Remark 6.5.1). In particular,

r_1^0 = \Big(1 - \frac{\sqrt 2}{2}\Big)\frac{1}{\gamma_0}, \qquad R_0 = \frac{1}{2\gamma_0}, \qquad b_0 = (3 - 2\sqrt 2)\,\frac{1}{\gamma_0}.
Then the concretizations of e.g., Theorems 6.5.1 and 6.5.5 are given respectively by: Theorem 6.5.8 Let yo
> 0 be a given constant. Assume operator F satisfies
for all
[ (
X E U xo, I - -
Then F1(xo)-I
I:);
E L(Y,X ) ,
- =U.
and is diflerentiable in the ball
Moreover,

U0 ⊆ U, and the radius in U0 is the best possible.

Theorem 6.5.9 Let γ0 > 0 be a given constant, and

\alpha = \beta\gamma_0 \le 3 - 2\sqrt 2.

Assume operator F satisfies

for all x ∈ Ū(x0, r), where r_1^0 ≤ r < r_2^0 if α < 3 − 2√2, or r = r_1^0 if α = 3 − 2√2. Then equation F(x) = 0 has a unique solution x* satisfying (6.5.19) in the ball U(x0, r).

The rest of the results in [340] can be improved along the same lines. However, we leave the details as exercises to the motivated reader. Iterative methods of higher order using hypotheses on the second (or higher) Fréchet derivative, as well as results involving outer or generalized inverses, can be found in the Exercises that follow.
6.6

Exercises

6.1 Let F : S ⊆ X → Y be a three times Fréchet-differentiable operator defined on an open convex domain S of a Banach space X with values in a Banach space Y. Assume F′(x0)^{−1} exists for some x0 ∈ S, ‖F′(x0)^{−1}‖ ≤ β, ‖F′(x0)^{−1}F(x0)‖ ≤ η, ‖F″(x)‖ ≤ M, ‖F‴(x)‖ ≤ N, ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ for all x, y ∈ S, and U(x0, rη) ⊆ S, where
and r = \lim_{n\to\infty} \sum_{i=0}^{n} (c_i + d_i). If A ∈ [0, ½], B ∈ [0, P(A) − 17C] and C ∈ [0, P(A)/17], where

P(A) = 27(A - 1)(2A - 1)(A^2 + A + 2)(A^2 + 2A + 4).
Then show [186]: the Chebyshev–Halley method given by

is well defined, remains in U(x0, rη) and converges to a solution x* of equation F(x) = 0. Moreover, the solution x* is unique in Ū(x0, rη). Furthermore, the following estimates hold for all n ≥ 0

where γ = b/b_1.
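A scalar sketch of the Chebyshev–Halley family referred to in this exercise. One standard parametrization is assumed here (A = 0 gives Chebyshev's method, A = 1/2 Halley's, A = 1 super-Halley), with L_f = f f″/f′² the degree of logarithmic convexity; the cubic f is our own sample:

```python
# Chebyshev-Halley family: x <- x - (1 + Lf/(2*(1 - A*Lf))) * f/f'.
f = lambda x: x**3 - x - 1.0
df = lambda x: 3.0 * x**2 - 1.0
d2f = lambda x: 6.0 * x

def cheb_halley(x, A, iters=25):
    for _ in range(iters):
        Lf = f(x) * d2f(x) / df(x) ** 2
        x = x - (1.0 + 0.5 * Lf / (1.0 - A * Lf)) * f(x) / df(x)
    return x

for A in (0.0, 0.5, 1.0):
    r = cheb_halley(1.5, A)
    assert abs(f(r)) < 1e-12        # all members reach the real root
```

All three members converge cubically here; the exercise's parameters A, B, C control for which operators and balls this behavior is guaranteed in the Banach-space setting.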
6.2 Consider the scalar equation [185]

f(x) = 0

for some x0 ∈ ℝ. Using the degree of logarithmic convexity of f,

the convex acceleration of Newton's method is given by

Let k ≥ 1.754877, let the interval [a, b] and the starting point x0 ∈ [a, b] with f(x0) > 0 satisfy the stated conditions, and assume the bounds |L_f(x)| ≤ 1/2 and L_{f'}(x) ∈ [−1/2, 2(k − 1)² − 1/2) hold in [a, b]. Then show: Newton's method converges to a solution x* of equation f(x) = 0, and x_{2n} ≥ x*, x_{2n+1} ≤ x* for all n ≥ 0.
6.3 Consider the midpoint method [68], [158]:
for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).
Denote a_0 = Mβη and b_0 = Kβη². Define the sequences

a_{n+1} = a_n f(a_n)^2 g(a_n, b_n), \qquad b_{n+1} = b_n f(a_n)^3 g(a_n, b_n)^2,

for given real functions f(x), g(x, y). If 0 < a_0 < b_0 < h(a_0), where

then show: the midpoint method {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη), and converges at least R-cubically to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω, and
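A scalar sketch of the midpoint method of this exercise: a Newton predictor y_n, then a corrected step using the derivative at the midpoint of x_n and y_n (the sample equation is ours):

```python
import math

# Midpoint (third-order) method: y = Newton predictor,
# x <- x - f(x)/f'((x + y)/2).
f = lambda x: math.exp(x) - 3.0 * x    # sample equation, not from the text
df = lambda x: math.exp(x) - 3.0
x = 0.4
for _ in range(6):
    y = x - f(x) / df(x)               # Newton predictor
    x = x - f(x) / df((x + y) / 2.0)   # midpoint corrector
assert abs(f(x)) < 1e-12
```

One extra derivative evaluation per step raises the order from 2 to 3, which is the "R-cubic" convergence asserted in the exercise.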
6.4 Consider the multipoint method in the form [68], [158]:
for approximating a solution x* of equation F(x) = 0. Let F be a twice Fréchet-differentiable operator defined on some open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ0 ∈ L(Y, X) for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖^p (x, y ∈ Ω), K ≥ 0, p ∈ [0, 1].

Denote a_0 = Mβη and b_0 = Kβη^{1+p}, and define the sequence
Suppose a_0 ∈ (0, ½) and b_0 < h_p(a_0, θ), where

h_p(x, \theta) = \frac{p+1}{p+2}\,(1 - 2x)\,(8 - 4x^2 - x^3)\cdots

Then, if Ū(x0, Rη) ⊆ Ω, where R = (1 + γ)Δ and Δ = f(a_0)^{−1}, show: the iteration {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη) for all n ≥ 0, and converges with R-order at least 2 + p to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω. Moreover, the following estimates hold for all n ≥ 0

where γ = a_1/a_0.

6.5 Consider the multipoint iteration [191]:
for approximating a solution of equation F(x) = 0. Let F : Ω ⊆ X → Y be a three times Fréchet-differentiable operator defined on some convex subset Ω of a Banach space X with values in a Banach space Y. Assume F′(x0)^{−1} ∈ L(Y, X) (x0 ∈ Ω), ‖Γ0‖ ≤ α, ‖Γ0F(x0)‖ ≤ β, ‖F″(x)‖ ≤ M, ‖F‴(x)‖ ≤ N, and ‖F‴(x) − F‴(y)‖ ≤ k‖x − y‖ for all x, y ∈ Ω. Denote θ = Mαβ, ω = Nαβ², and δ = kαβ³. Define the sequences
and

Moreover, assume Ū(x0, Rβ) ⊆ Ω, where

R = \lim_{n\to\infty} \sum_{i=0}^{n} d_i, \qquad \theta \in \Big(0, \frac{1}{2}\Big),

0 \le \delta < \frac{27(2\theta - 1)(\theta^3 - 5\theta^2 + 16\theta - 8)}{17(1 - \theta)^4}, \qquad 0 \le \omega.

6.9 Consider the two-step method (n ≥ 0) in the form:
F(x_n) + F'(x_n)(y_n - x_n) = 0,

3F(x_n) + 3F'\big[x_n + \tfrac{2}{3}(y_n - x_n)\big](y_n - x_n) + 4F'(y_n)(x_{n+1} - y_n) = 0,

for approximating a solution x* of equation F(x) = 0. Let F : Ω ⊆ X → Y be a three times Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) for some x0 ∈ Ω, with ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ (x, y ∈ Ω), (L ≥ 0).

Denote a_0 = Mβη and c_0 = Lβη³, and define the sequences

where f(x) and g(x, y) are given real functions. Suppose a_0 ∈ (0, ½) and c_0 < h(a_0), where

h(x) = \frac{27(2x - 1)(x - 1)(x - 3 + \sqrt 2)(x - 3 - \sqrt 2)}{17(1 - x)^4},

and Δ = f(a_0)^{−1}. Then show: the iteration {x_n} (n ≥ 0) is well defined, remains in Ū(x0, Rη) for all n ≥ 0, and converges to a solution x* of equation F(x) = 0. The solution x* is unique in U(x0, 2/(Mβ) − Rη) ∩ Ω, and
&
where y = a. ao 6.10. Consider the multipoint iteration method [161]:
yn
= xn - F'
+
Gn = F' x n + ~= y n
(xn)-l F (x,)
-
[F' (xn $ (yn - x,)) - F' (x,)] i G n [ I - i G n ] (yn - xn) ( n 2 O ) ,
,
for approximating a solution x* of equation F ( x ) = 0. Let F : R c X -+ Y be a three times FrBchet-differentiable operator defined on an open convex subset R of a Banach space X with values in a Banach space Y. Assume:
(1) Γ0 = F′(x0)^{−1} ∈ L(Y, X) exists for some x0 ∈ Ω, and ‖Γ0‖ ≤ β;
(2) ‖Γ0F(x0)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F‴(x)‖ ≤ N (x ∈ Ω);
(5) ‖F‴(x) − F‴(y)‖ ≤ L‖x − y‖ (x, y ∈ Ω), (L ≥ 0).
where
and
+ + 5 ) + 18xy + 1721 .
[27x3( x 2 22
g ( x ,y, z ) = If a0 E (0,;), 17co
+ l8aobo < p (ao),where
+ + 2) ( x 2+ 2%+ 4 ) ,
p ( x ) = 27 (1 - X ) ( 1 - 22) ( x 2 x
U(xo,~q)cRR , = [l+?(l+ao)]
,
1
7=
a0
A = f (ao)rl,
then show: iteration { x n ) ( n 2 0 ) is well defined, remains in U ( x o ,Rq) for all n 2 0 and converges to a solution x* of equation F ( x ) = 0, which is unique in U ( x o , - Rq) n R. Moreover the following estimates hold for all n 2 0:
&
6.11. Consider the Halley method [68], [131]

x_{n+1} = x_n − [I − ½L_F(x_n)]⁻¹F′(x_n)⁻¹F(x_n) (n ≥ 0),

where

L_F(x) = F′(x)⁻¹F″(x)F′(x)⁻¹F(x),

for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω;
(2) ‖F′(x₀)⁻¹F(x₀)‖ ≤ β;
(3) ‖F′(x₀)⁻¹F″(x₀)‖ ≤ γ;
(4) ‖F′(x₀)⁻¹[F″(x) − F″(y)]‖ ≤ L‖x − y‖ (x, y ∈ Ω).

If r₁ ≤ r₂, where r₁, r₂ are the positive roots of h(t) = β − t + (γ/2)t² + (L/6)t³, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, r₁) for all n ≥ 0 and converges to a unique solution x* of the equation F(x) = 0 in U(x₀, r₁). Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n, ‖x_n − x*‖ ≤ r₁ − t_n,

where −r₀ is the negative root of h, t₀ = 0 and t_{n+1} = H(t_n), where

H(t) = t − [h(t)/h′(t)]·[1 − ½L_h(t)]⁻¹, L_h(t) = h(t)h″(t)/h′(t)².
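For a scalar equation f(x) = 0 the method of Exercise 6.11 reduces to the classical Halley iteration, exactly the map H above with h replaced by f. A minimal sketch; the sample equation x³ − 2 = 0, starting point, and tolerances are illustrative assumptions, not part of the exercise:

```python
def halley(f, df, d2f, x, tol=1e-12, max_iter=50):
    """Halley's method: x <- x - [1 - L_f(x)/2]^{-1} f(x)/f'(x),
    with L_f(x) = f(x) f''(x) / f'(x)^2 (cubically convergent)."""
    for _ in range(max_iter):
        fx, dfx = f(x), df(x)
        Lf = fx * d2f(x) / dfx ** 2
        step = (fx / dfx) / (1.0 - 0.5 * Lf)
        x -= step
        if abs(step) < tol:
            break
    return x

root = halley(lambda x: x ** 3 - 2.0,
              lambda x: 3.0 * x ** 2,
              lambda x: 6.0 * x,
              x=1.0)
```

The extra factor [1 − ½L_f]⁻¹ is what lifts Newton's quadratic rate to cubic, at the price of one second-derivative evaluation per step.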
6.12. Consider the iteration [68], [158]

x_{n+1} = x_n − Γ_nF(x_n) − T(x_n) (n ≥ 0),

where Γ_n = F′(x_n)⁻¹ and T(x_n) = ½Γ_nA(Γ_nF(x_n), Γ_nF(x_n)) (n ≥ 0), for approximating a solution x* of the equation F(x) = 0. Here A : X × X → Y is a bilinear operator with ‖A‖ = a, and F : Ω ⊆ X → Y is a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) F′(x₀)⁻¹ = Γ₀ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F′(x) − F′(y)‖ ≤ k‖x − y‖ (x, y ∈ Ω).

Let a, b be real numbers with a ∈ [0, ½) and b ∈ (0, a). Set a₀ = 1, c₀ = 1, b₀ = b and d₀ = 1 + ½b, and define the associated scalar sequences {r_n}, {d_n}, with r = lim_{n→∞} r_n. If α = kβη ∈ [0, ½) and U(x₀, rη) ⊆ Ω, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, rη) and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(kβ) − rη). Moreover, the corresponding majorizing estimates hold for all n ≥ 0.
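When the bilinear operator A of Exercise 6.12 is taken to be the exact second derivative F″, the iteration becomes the classical Chebyshev method. A scalar sketch under that assumption; the sample equation x³ − 5 = 0 and starting point are illustrative:

```python
def chebyshev(f, df, d2f, x, n_steps=30):
    """x+ = x - u - A(u, u) / (2 f'(x)) with u = f(x)/f'(x);
    here the bilinear operator A is the exact second derivative f''(x)."""
    for _ in range(n_steps):
        u = f(x) / df(x)                      # Newton correction Gamma_n F(x_n)
        x = x - u - 0.5 * d2f(x) * u * u / df(x)  # extra term T(x_n)
    return x

root = chebyshev(lambda x: x ** 3 - 5.0,
                 lambda x: 3.0 * x ** 2,
                 lambda x: 6.0 * x,
                 x=2.0)
```

Replacing F″ by a cheaper bilinear approximation A with ‖A‖ = a is exactly the flexibility the exercise's hypotheses quantify.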
6.13. Consider the Halley method [68], [175] in the form given there, for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖F″(x)‖ ≤ M (x ∈ Ω);
(3) ‖F″(x) − F″(y)‖ ≤ N‖x − y‖ (x, y ∈ Ω);
(4) ‖Γ₀F(x₀)‖ ≤ η;
(5) the quadratic majorizing polynomial p determined by β, η, M and N has two positive roots r₁ and r₂ with r₁ < r₂.

Let the scalar sequences {s_n}, {t_n} (n ≥ 0) be defined by the corresponding majorizing recurrences. If U(x₀, r₁) ⊆ Ω, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, r₁) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0. Moreover, if r₁ < r₂, the solution x* is unique in U(x₀, r₂). Furthermore, the corresponding majorizing estimates hold for all n ≥ 0.
6.14. Consider the two-point method [68], [153] for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a twice Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) exists for some x₀ ∈ Ω with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F″(x)‖ ≤ M (x ∈ Ω);
(4) ‖F″(x) − F″(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Set a₀ = Mβη and b₀ = Kβη², and define the scalar sequences generated by

f(x) = 1/(2(1 − x)) and g_p(x, y) = {3x³ + 2y(1 − x)[(1 − 6x)p + (2 + 3x)]}/(24(1 − x)).

If a₀ ∈ (0, ½) and b₀ < h_p(a₀), then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(Mβ) − Rη) ∩ Ω. Moreover, the corresponding majorizing estimates hold for all n ≥ 0, with γ determined by a₀ and Δ = f(a₀)⁻¹.
6.15. Consider the two-step method:

y_n = x_n − Γ_nF(x_n), x_{n+1} = y_n − Γ_nF(y_n), Γ_n = F′(x_n)⁻¹ (n ≥ 0),

for approximating a solution x* of the equation F(x) = 0. Let F : Ω ⊆ X → Y be a Fréchet-differentiable operator defined on an open convex subset Ω of a Banach space X with values in a Banach space Y. Assume:

(1) Γ₀ = F′(x₀)⁻¹ ∈ L(Y, X) for some x₀ ∈ Ω, with ‖Γ₀‖ ≤ β;
(2) ‖Γ₀F(x₀)‖ ≤ η;
(3) ‖F′(x) − F′(y)‖ ≤ K‖x − y‖ (x, y ∈ Ω).

Set a₀ = Kβη and define the sequence a_{n+1} = f(a_n)²g(a_n)a_n (n ≥ 0), where g(x) = x²(x + 4)/8 and f is the associated rational function. If a₀ ∈ (0, ½) and U(x₀, Rη) ⊆ Ω, with R, γ and Δ = f(a₀)⁻¹ defined as in the previous exercises, then show: the iteration {x_n} (n ≥ 0) is well defined, remains in U(x₀, Rη) for all n ≥ 0 and converges to a solution x* of the equation F(x) = 0, which is unique in U(x₀, 2/(Kβ) − Rη) ∩ Ω. Moreover, the corresponding majorizing estimates hold for all n ≥ 0.
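A scalar sketch of the two-step pattern of Exercise 6.15: the derivative is evaluated once at x_n and reused for both substeps, so each sweep costs one derivative and two residuals. The test equation x² − 2 = 0 and the iteration count are illustrative assumptions:

```python
def two_step_newton(f, df, x, n_steps=10):
    """y = x - f(x)/f'(x); x+ = y - f(y)/f'(x): the derivative is
    frozen at x_n for both substeps (third-order convergence)."""
    for _ in range(n_steps):
        d = df(x)
        y = x - f(x) / d
        x = y - f(y) / d
    return x

root = two_step_newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x=1.0)
```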
6.16 (a) Assume there exist non-negative parameters K, M, L, ℓ, μ, η and δ ∈ [0, 1] satisfying the stated relations. Show: the iteration {t_n} (n ≥ 0) given by the corresponding recurrence is nondecreasing, bounded above by t**, and converges to some t* with t* ≤ t**. Moreover, the associated error estimates hold for all n ≥ 0.

(b) Let F : D ⊆ X → Y be a Fréchet-differentiable operator. Assume: there exist an approximation A(x) ∈ L(X, Y) of F′(x), an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse A# of A(x₀), and parameters η > 0, K > 0, M ≥ 0, L ≥ 0, μ ≥ 0, ℓ ≥ 0 such that conditions (1)–(3) and the stated Lipschitz-type bounds hold for all x, y ∈ D₀. Show: the sequence {x_n} (n ≥ 0) generated by the Newton-like method with outer inverses is well defined, remains in U(x₀, t*) for all n ≥ 0 and converges to a unique solution x* of the equation A#F(x) = 0 in U(x₀, t*) ∩ D₀. Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x_n − x*‖ ≤ t* − t_n.

(c) Assume: there exist an approximation A(x) ∈ L(X, Y) of F′(x), a simple solution x* ∈ D of equation (1), a bounded outer inverse A*# of A(x*) and non-negative parameters K̄, M̄, L̄, μ̄, ℓ̄ such that the analogous conditions hold for all x, y ∈ D; the associated scalar equation has a minimal non-negative zero r* satisfying L̄r* + ℓ̄ < 1 and U(x*, r*) ⊆ D. Show: the sequence {x_n} (n ≥ 0) generated by the Newton-like method is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x* provided that x₀ ∈ U(x*, r*). Moreover, the corresponding error estimates hold for all n ≥ 0.
6.17 (a) Let F : D ⊆ X → Y be an m-times Fréchet-differentiable operator (m ≥ 2 an integer). Assume:

(a₁) there exist an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse F′(x₀)# of F′(x₀), and constants η > 0, αᵢ ≥ 0, i = 2, ..., m + 1, such that for all x, y ∈ D₀ the following conditions hold:

‖F′(x₀)#(F^{(m)}(x) − F^{(m)}(x₀))‖ ≤ ε, ε > 0, for all x ∈ U(x₀, δ₀) and some δ₀ > 0,    (6.6.1)

‖F′(x₀)#F(x₀)‖ ≤ η, ‖F′(x₀)#F^{(i)}(x₀)‖ ≤ αᵢ,

and the positive zero s of p′ satisfies p(s) ≤ 0, where p is the associated majorizing polynomial. Show: the polynomial p has only two positive zeros, denoted t*, t** (t* ≤ t**). Assume further:

(a₂) U(x₀, δ) ⊆ D₀, δ = max(δ₀, t*, t**).

Moreover show: the sequence {x_n} (n ≥ 0) generated by Newton's method with F′(x_n)# = [I + F′(x₀)#(F′(x_n) − F′(x₀))]⁻¹F′(x₀)# (n ≥ 0) is well defined, remains in U(x₀, t*) and converges to a solution x* ∈ U(x₀, t*) of the equation F′(x₀)#F(x) = 0; the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x_n − x*‖ ≤ t* − t_n,

where {t_n} (n ≥ 0) is a monotonically increasing sequence converging to t* and generated by the corresponding scalar recurrence.

(b) Let F : D ⊆ X → Y be an m-times Fréchet-differentiable operator (m ≥ 2 an integer). Assume:

(b₁) condition (1) holds;
(b₂) there exist an open convex subset D₀ of D, x₀ ∈ D₀, and constants α, β, γ ≥ 0 such that for any x ∈ D₀ there exists an outer inverse F′(x)# of F′(x) satisfying N(F′(x)#) = N(F′(x₀)#) together with a bound of the form

‖F′(y)# ∫₀¹ F″(x + t(y − x))(1 − t) dt (y − x)²‖ ≤ c‖y − x‖², for all x, y ∈ D₀,

with c = c(α, β, γ), and U(x₀, r) ⊆ D₀ with r = min{r̄, δ₀} for the associated radius r̄.

Show: the sequence {x_n} (n ≥ 0) generated by Newton's method is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to a solution x* of F′(x₀)#F(x) = 0, with the iterates satisfying N(F′(x_n)#) = N(F′(x₀)#) (n ≥ 0). Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ q‖x_n − x_{n−1}‖ and ‖x_n − x₀‖ ≤ [(1 − qⁿ)/(1 − q)]‖x₁ − x₀‖,

for an appropriate q ∈ (0, 1).
6.18 Let L be an m × n matrix with n ≤ m. Any outer inverse M of L will be an n × m matrix. Show:

(a) If rank(L) = n, then L can be written as

L = A [I; 0],

where I is the n × n identity matrix, [I; 0] denotes the m × n block matrix with I on top of an (m − n) × n zero block, and A is an m × m invertible matrix. The n × m matrix M = [I B]A⁻¹ is an outer inverse of L for any n × (m − n) matrix B.

(b) If rank(L) = r < n, then L can be written as

L = A [I 0; 0 0] C,

where A is an m × m invertible matrix, I is the r × r identity matrix, and C is an n × n invertible matrix. If E is an outer (inner) inverse of the matrix [I 0; 0 0], then the n × m matrix

M = C⁻¹EA⁻¹

is an outer (inner) inverse of L.

(c) E is both an inner and an outer inverse of [I 0; 0 0] if and only if E can be written in the form

E = [I B; C CB]

for some blocks B and C of the appropriate sizes.
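Part (a) of Exercise 6.18 can be checked on a concrete example in a few lines of pure Python; the particular 3 × 2 matrix L (with A the identity) and the block B below are illustrative assumptions:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# L = A [I; 0] with A the 3x3 identity, so rank(L) = n = 2.
L = [[1, 0],
     [0, 1],
     [0, 0]]

# M = [I  B] A^{-1} for an arbitrary n x (m - n) block B = [5, -3]^T.
M = [[1, 0, 5.0],
     [0, 1, -3.0]]

MLM = matmul(matmul(M, L), M)   # outer-inverse identity: M L M = M
```

For this full-column-rank example M happens to be an inner inverse as well (L M L = L), consistent with part (c).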
6.19 Let F : D ⊆ X → Y be a Fréchet-differentiable operator between two Banach spaces X and Y, and let A(x) ∈ L(X, Y) (x ∈ D) be an approximation to F′(x). Assume that there exist an open convex subset D₀ of D, x₀ ∈ D₀, a bounded outer inverse A# of A (= A(x₀)) and constants η, k > 0, M, L, μ, ℓ ≥ 0 such that for all x, y ∈ D₀ the stated conditions hold. Assume

h = ση ≤ ½(1 − b)², σ := max(k, M + L), t* = [1 − b − √((1 − b)² − 2h)]/σ, and U = U(x₀, t*).

Then show:

(i) the sequence {x_n} (n ≥ 0) generated by x_{n+1} = x_n − A(x_n)#F(x_n) (n ≥ 0) with A(x_n)# = [I + A#(A(x_n) − A)]⁻¹A# remains in U and converges to a solution x* ∈ Ū of the equation A#F(x) = 0;

(ii) the equation A#F(x) = 0 has a unique solution in D̄ ∩ {R(A#) + x₀}, where

D̄ = U(x₀, t**) ∩ D₀ if h < ½(1 − b)², and D̄ = Ū ∩ D₀ if h = ½(1 − b)²;

(iii) ‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n and ‖x* − x_n‖ ≤ t* − t_n, where t₀ = 0, t_{n+1} = t_n + f(t_n)/g(t_n), f(t) = (σ/2)t² − (1 − b)t + η, and g(t) = 1 − Lt − ℓ.
6.20 Let F : D ⊆ X → Y be a Fréchet-differentiable operator between two Banach spaces X and Y, and let A(x) ∈ L(X, Y) be an approximation of F′(x). Assume that there exist an open convex subset D₀ of D, a point x₀ ∈ D₀ and constants η, k > 0 such that for any x ∈ D₀ there exists an outer inverse A(x)# of A(x) satisfying N(A(x)#) = N(A#), where A = A(x₀) and A# is a bounded outer inverse of A, and for this outer inverse the stated conditions hold for all x, y ∈ D₀ and t ∈ [0, 1], with h = ½kη < 1 and U(x₀, r) ⊆ D₀, where r = η/(1 − h). Then show: the sequence {x_n} (n ≥ 0) generated by x_{n+1} = x_n − A(x_n)#F(x_n) (n ≥ 0), with A(x_n)# satisfying N(A(x_n)#) = N(A#), remains in U(x₀, r) and converges to a solution x* of the equation A#F(x) = 0.

6.21 Show that Newton's method with outer inverses,

x_{n+1} = x_n − F′(x_n)#F(x_n) (n ≥ 0),

converges quadratically to a solution x* ∈ D̄ ∩ {R(F′(x₀)#) + x₀} of the equation F′(x₀)#F(x) = 0 under the conditions of Exercise 6.4 with A(x) = F′(x) (x ∈ D₀).
6.22 Let F : D ⊆ X → Y be Fréchet-differentiable and assume that F′(x) satisfies a Lipschitz condition on D. Assume x* ∈ D exists with F(x*) = 0, and let ε > 0 be such that U(x*, ε) ⊆ D. Suppose there is an

F′(x*)# ∈ R(F′(x*)) = {B ∈ L(Y, X) : BF′(x*)B = B, B ≠ 0}

such that ‖F′(x*)#‖ ≤ α, and for any x ∈ U(x*, ε) the set R(F′(x)) contains an element of minimum norm. Then show there exists a ball U(x*, r) ⊆ D with r < ε such that for any x₀ ∈ U(x*, r) the sequence {x_n} (n ≥ 0),

x_{n+1} = x_n − F′(x_n)#F(x_n) (n ≥ 0),

with

F′(x₀)# ∈ argmin{‖B‖ : B ∈ R(F′(x₀))}

and with F′(x_n)# = (I + F′(x₀)#(F′(x_n) − F′(x₀)))⁻¹F′(x₀)#, converges quadratically to x̄* ∈ U(x₀, ε) ∩ {R(F′(x₀)#) + x₀}, which is a solution of the equation F′(x₀)#F(x) = 0. Here, R(F′(x₀)#) + x₀ = {x + x₀ : x ∈ R(F′(x₀)#)}.
Chapter 7
Equations with Nonsmooth Operators

In recent years, Newton-type methods have been generalized to solve many problems in mathematical programming. An excellent treatment of these generalizations in the form of a generalized equation was inaugurated by S. M. Robinson [297], [298]. Relevant work has been conducted by Josephy [200]–[203] and Argyros [76], [81]–[89]. When applied to an optimization problem, Newton-type methods constitute a type of sequential quadratic programming (SQP) algorithm. These algorithms are very efficient and among the most promising for solving large-scale problems; see Gill, Murray and Wright [168] for more discussion of their numerical aspects.

There have been several previous papers dealing with Newton's method for non-Fréchet-differentiable operators in normed spaces. Kojima and Shindo [209] analyzed Newton and quasi-Newton methods for systems of piecewise continuously differentiable operators, and Kummer [215] extended the Kojima–Shindo result to real Banach spaces and studied Newton's method defined by a large family of linear continuous operators. Catinas [117], Chen and Yamamoto [121], Nguyen and Zabrejko [363], and Argyros [18], [34], [68] have provided results on equations involving a Fréchet-differentiable term and another term whose differentiability is not assumed. Robinson [302] discusses a Newton method for nonsmooth operators which admits a certain kind of point-based approximating operator. He then provides a semilocal convergence analysis using the Newton-Kantorovich hypothesis and nondiscrete mathematical induction (see Potra and Ptak [284], [286]).

In this chapter we further exploit the weaker version of the Newton-Kantorovich theorem. The benefits of this approach have already been explained in Section 5.1.
In particular, in the first section we show that under weaker hypotheses and the same computational cost we can provide a finer convergence analysis and better information on the location of the solution than the corresponding results in [302].
Moreover, in Sections 2–4 we examine the semilocal convergence of Newton's method under even weaker hypotheses. Furthermore, in Sections 5 and 6 we do the same for the secant method under different hypotheses, whose advantages over each other have already been explained in Section 5.1. Finally, as future research we suggest that the motivated reader further exploit our work here to improve the results obtained by Robinson [302], Pang et al. [267], Qi [287], Han [178], Shapiro [309], and Ip et al. [197] concerning Bouligand and other types of derivatives in connection with Newton-type methods.
7.1
Convergence of Newton-Kantorovich Method. Case 1: Lipschitzian Assumptions
The classical Newton-Kantorovich method (4.1.3) replaces the operator whose solution is sought by an approximate operator that we hope can be solved more easily. A solution of the approximation (linearization) can be used to restart the process. Under certain conditions a sequence is generated that converges to x*. In particular, in the case of the Newton–Kantorovich theorem (see Section 4.1) the approximating operators are linearizations of the form F(x) + F′(x)(z − x), where given x we solve for z. Robinson [302] extended this theorem to nonsmooth operators, where the linearization is no longer available. This class includes smooth operators as well as nonsmooth reformulations of variational inequalities. He used as an alternative a point-based approximation which imitates the properties of the linearization in Newton's method.

Here we extend Robinson's work; in particular we show that it is possible, under weaker hypotheses and the same computational cost, to provide finer error bounds on the distances involved, and more precise information on the location of the solution x*.

We need the following definitions.

Definition 7.1.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. Operator F has a point-based approximation (PBA) on D if there exist an operator A : D × D → Y and a number η ≥ 0 such that for each u and v in D,

(a) ‖F(v) − A(u, v)‖ ≤ ½ηm(u, v)², and
(b) the operator A(u, ·) − A(v, ·) is Lipschitzian on D with modulus ηm(u, v).

We then say operator A is a PBA for F with modulus η. Moreover, if for a given point x₀ ∈ D

(c) the operator A(u, ·) − A(x₀, ·) is Lipschitzian on D with modulus η₀m(u, x₀),

then we say operator A is a PBA for F with moduli (η, η₀) at the point x₀.

Note that if F has a Fréchet derivative that is Lipschitzian on a closed, convex set D with modulus η, and X is also a normed linear space, then by choosing

A(u, v) = F(u) + F′(u)(v − u)    (7.1.1)
(used in method (4.1.3)), we get by (a)

‖F(v) − F(u) − F′(u)(v − u)‖ ≤ ½η‖v − u‖²,

whereas (b) and (c) are obtained from the Lipschitzian property of F′. As already noted in [300], the existence of a PBA does not necessarily imply differentiability. Method (4.1.3) is applicable to any operator having a PBA A, provided that solving the equation A(x, z) = 0 for z, given x, is possible.

We also need the following definition concerning bounds for the inverses of operators:

Definition 7.1.2 Let X, D, and Y be as in Definition 7.1.1, and F : D → Y. Then we define the number

δ(F, D) = inf{‖F(u) − F(v)‖/m(u, v) : u ≠ v, u, v ∈ D}.

Clearly, if δ(F, D) ≠ 0 then F is 1–1 on D. We also define the center-Lipschitzian number

δ₀(F, D) = inf{‖F(u) − F(x₀)‖/m(u, x₀) : u ≠ x₀, u ∈ D}.

Set d = δ(F, D) and d₁ = δ₀(F, D).

At this point we need two lemmas. The first one is the analog of the Banach lemma on invertible operators 1.3.2. The second one gives conditions under which operator F is 1–1.

Lemma 7.1.3 Let (X, d) be a Banach space, D a closed subset of X, and Y a normed linear space. Let F and G be functions from D into Y, G being Lipschitzian with modulus η and center-Lipschitzian with modulus η₀. Let x₀ ∈ D with F(x₀) = y₀. Assume:

(a) U(y₀, α) ⊆ F(D),
(b) 0 ≤ η < d,
(c) U(x₀, d₁⁻¹α) ⊆ D, and
(d) θ₀ = (1 − η₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.

Then the following hold:

U(y₀, θ₀) ⊆ (F + G)(U(x₀, d₁⁻¹α))

and

0 < d − η ≤ δ(F + G, D).
Proof. Let y ∈ U(y₀, θ₀), x ∈ U(x₀, d₁⁻¹α) and define Ty(x) = F⁻¹(y − G(x)). Then we can have in turn

‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ₀ + η₀d₁⁻¹α + ‖G(x₀)‖ = α.

That is, the set Ty(x) ≠ ∅, and in particular it contains a single point because d > 0. The rest follows exactly as in Lemma 3.1 in [300, p. 298] (see also the more detailed proof of Lemma 7.2.2).

Remark 7.1.1 In general

η₀ ≤ η and d ≤ d₁    (7.1.2)

hold, and η/η₀, d₁/d can be arbitrarily large (see Section 5.1). If η₀ is replaced by η and d₁ by d in Lemma 7.1.3, then our result reduces to the corresponding one in [Lemma 3.1, [300]]. Denote θ₀ by θ if η₀ is replaced by η and d₁ by d. Then in case strict inequality holds in any of the inequalities in (7.1.2), we get

θ < θ₀,    (7.1.3)
which improves the corresponding Lemma 3.1 in [302], and under the same computational cost, since in practice the computation of η requires the computation of η₀ or d₁, respectively.

Lemma 7.1.4 Let X and Y be normed linear spaces, and let D be a closed subset of X. For x₀ ∈ D and F : D → Y assume: the function A : D × D → Y is a PBA for F on D with moduli (η, η₀) at the point x₀; there exists ρ > 0 such that U(x₀, ρ) ⊆ D. Then the following holds:

δ(F, U(x₀, ρ)) ≥ d − η₁ρ    (7.1.4)

where

d = δ(A(x₀, ·), D), and η₁ = η₀ + ½η.

Moreover, if d − η₁ρ > 0 then F is 1–1 on U(x₀, ρ).

Proof. It follows as in the proof of Lemma 2.4 in [302, p. 295], but there are some differences. Set x = (x₁ + x₂)/2 for x₁ and x₂ in U(x₀, ρ). Then we can write:

F(x₁) − F(x₂) = [F(x₁) − A(x, x₁)] + [A(x, x₁) − A(x, x₂)] + [A(x, x₂) − F(x₂)].    (7.1.5)
We will find bounds on each of the quantities inside the brackets above. First, by (a) of Definition 7.1.1, we obtain

‖F(xᵢ) − A(x, xᵢ)‖ ≤ ½η‖x − xᵢ‖² = ⅛η‖x₁ − x₂‖².    (7.1.6)
Using the triangle inequality we get kA(x, u) − A(x, v)k ≥ kA(x0 , u) − A(x0 , v)k − k[A(x, u) − A(x0 , u)] − [A(x, v) − A(x0 , v)]k.
(7.1.7)
By the definition of d, and (7.1.7) we get in turn δ(A(x, ·), D) ≥ δ(A(x0 , ·), D) − sup{k[A(x, u) − A(x0 , u)]
− [A(x, v) − A(x0 , v)]k/ku − vk, u 6= v, u, v ∈ D} ≥ d − η0 kx − x0 k ≥ d − η0 ρ.
(7.1.8)
Hence by (7.1.5), (7.1.8) and (c) of Definition 7.1.1 we have

‖F(x₁) − F(x₂)‖ ≥ [(d − η₀ρ) − ¼η‖x₁ − x₂‖]‖x₁ − x₂‖

and for x₁ ≠ x₂,

‖F(x₁) − F(x₂)‖/‖x₁ − x₂‖ ≥ d − η₀ρ − ¼η‖x₁ − x₂‖ ≥ d − η₀ρ − ¼η‖(x₁ − x₀) + (x₀ − x₂)‖ ≥ d − η₀ρ − ½ηρ.
Remark 7.1.2 If equality holds in (7.1.2), then Lemma 7.1.4 reduces to Lemma 2.4 in [302]. Otherwise our result provides a wider range for δ(F, U(x₀, ρ)).

Let r₀ and d₀ be positive numbers. It is convenient to define scalar sequences {t_n}, {s_n} by

t_{n+2} = t_{n+1} + d₀⁻¹η(t_{n+1} − t_n)² / [2(1 − d₀⁻¹η₀t_{n+1})],  t₀ = 0, t₁ = r₀,    (7.1.9)

and

s_{n+2} = s_{n+1} + d₀⁻¹η(s_{n+1} − s_n)² / [2(1 − d₀⁻¹ηs_{n+1})],  s₀ = 0, s₁ = r₀.    (7.1.10)

We set

h_A = d₀⁻¹η̄r₀,  η̄ = (η₀ + η)/2    (7.1.11)
and

h_K = d₀⁻¹ηr₀.    (7.1.12)

Note that

½h_K ≤ h_A ≤ h_K.    (7.1.13)

If strict inequality holds in (7.1.2), then so does in (7.1.13). It was shown in Section 5.1 that the sequence {t_n} converges monotonically to some t* ∈ (0, 2r₀] provided that

h_A ≤ ½.    (7.1.14)

Moreover, by the Newton–Kantorovich theorem for nonsmooth equations 4.2.4 the sequence {s_n} converges monotonically to some s* with

0 < s* = (r₀/h_K)[1 − (1 − 2h_K)^{1/2}] ≤ 2r₀    (7.1.15)

provided that the famous Newton–Kantorovich hypothesis

h_K ≤ ½    (7.1.16)

holds. Note also that

h_K ≤ ½ ⇒ h_A ≤ ½    (7.1.17)

but not vice versa unless equality holds in (7.1.2). Under hypotheses (7.1.14) and (7.1.16) we showed in Section 5.1 (for η₀ < η):

t_n < s_n  (n ≥ 2)    (7.1.18)

t_{n+1} − t_n < s_{n+1} − s_n  (n ≥ 1)    (7.1.19)

t* − t_n ≤ s* − s_n    (7.1.20)

and

t* ≤ s* ≤ 2r₀.    (7.1.21)
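The comparison (7.1.18)–(7.1.21) is easy to observe numerically by iterating (7.1.9) and (7.1.10) directly. The sample values d₀ = 1, η₀ = 0.5, η = 1, r₀ = 0.4 below are illustrative assumptions, chosen so that h_K ≤ ½ (and hence h_A ≤ ½):

```python
d0, eta0, eta, r0 = 1.0, 0.5, 1.0, 0.4
hK = eta * r0 / d0                       # Kantorovich quantity (7.1.12)
hA = 0.5 * (eta0 + eta) * r0 / d0        # weaker quantity (7.1.11)

t = [0.0, r0]                            # sequence (7.1.9): center modulus eta0
s = [0.0, r0]                            # sequence (7.1.10): full modulus eta
for n in range(40):
    t.append(t[-1] + eta * (t[-1] - t[-2]) ** 2
             / (d0 * 2 * (1 - eta0 * t[-1] / d0)))
    s.append(s[-1] + eta * (s[-1] - s[-2]) ** 2
             / (d0 * 2 * (1 - eta * s[-1] / d0)))

s_star = r0 * (1 - (1 - 2 * hK) ** 0.5) / hK      # closed form (7.1.15)
```

With η₀ < η the smaller denominator correction makes {t_n} grow strictly more slowly than {s_n}, which is exactly the source of the finer error bounds of this section.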
Note that {s_n} was essentially used as a majorizing sequence for {x_n} in Theorem 3.2 in [302] for Newton's method. We can state the following semilocal convergence theorem for Newton's method involving nonsmooth operators:

Theorem 7.1.5 Let F : D ⊆ X → Y be a continuous operator, x₀ ∈ D, and r₀ > 0. Suppose that A : D × D → Y is a PBA for F with moduli (η, η₀) at the point x = x₀. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ > 0;
(b) 0 < h_A ≤ ½;
(c) for each y ∈ U(0, d₀(t* − r₀)) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ r₀; and
(e) U(x₀, t*) ⊆ D.

Then the Newton iteration defining x_{n+1} by

A(x_n, x_{n+1}) = 0  (n ≥ 0)

is well defined, remains in U(x₀, t*) for all n ≥ 0 and converges to a solution x* ∈ U(x₀, t*) of the equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n    (7.1.22)

and

‖x_n − x*‖ ≤ t* − t_n.    (7.1.23)

Furthermore, if ρ given in Lemma 7.1.4 satisfies

t* < ρ < d₀η₁⁻¹    (7.1.24)

then x* is the unique solution of the equation F(x) = 0 in U(x₀, ρ).

Proof. We use Lemma 7.1.3 with the quantities F, G, α, x₀ and y₀ replaced by A(x₀, ·), A(x₁, ·) − A(x₀, ·), d₁(t* − r₀), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of the lemma follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) ≥ d₀ > 0). For hypothesis (b) we have

δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − η₀‖x₁ − x₀‖ ≥ d₀ − η₀r₀ > 0.    (7.1.25)

Condition (c) of Lemma 7.1.3 follows immediately from the above choices. The condition h_A ≤ ½ is equivalent to (η + η₀)r₀ ≤ d₀. We have d₁ = δ₀(F, D) ≥ δ(F, D) ≥ d₀, and ‖G(x₁)‖ ≤ ½η₀r₀² ≤ ½ηr₀². Then condition (d) of Lemma 7.1.3 follows from the estimate

θ₀ ≥ (1 − η₀r₀d₁⁻¹)d₁(t* − r₀) − ½ηr₀² = (d₁ − η₀r₀)(t* − r₀) − ½ηr₀²
 ≥ (d₀ − η₀r₀)(t* − r₀) − ½ηr₀² = (d₀ − η₀t₁)(t* − t₁) − ½η(t₁ − t₀)²
 = (d₀ − η₀t₁)(t* − t₂) + (d₀ − η₀t₁)(t₂ − t₁) − ½η(t₁ − t₀)²
 = (d₀ − η₀t₁)(t* − t₂) ≥ η₀r₀(t* − t₂) ≥ 0.

It follows from Lemma 7.1.3 that for each y ∈ U(0, t* − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in D, since δ(A(x₁, ·), D) > 0.
We also have A(x₀, x₁) = A(x₁, x₂) = 0 and A(x₁, x₁) = F(x₁). By Definition 7.1.2,

‖x₂ − x₁‖ ≤ δ(A(x₁, ·), D)⁻¹‖A(x₁, x₂) − A(x₁, x₁)‖
 ≤ d₀⁻¹(1 − d₀⁻¹η₀‖x₁ − x₀‖)⁻¹‖A(x₀, x₁) − F(x₁)‖
 ≤ d₀⁻¹(1 − d₀⁻¹η₀‖x₁ − x₀‖)⁻¹ · ½η‖x₁ − x₀‖² ≤ t₂ − t₁.
Hence, we showed:

‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n    (7.1.26)

and

U(x_{n+1}, t* − t_{n+1}) ⊆ U(x_n, t* − t_n)    (7.1.27)
hold for n = 0, 1. Moreover, for every z ∈ U(x₁, t* − t₁),

‖z − x₀‖ ≤ ‖z − x₁‖ + ‖x₁ − x₀‖ ≤ t* − t₁ + t₁ = t* − t₀,

implies z ∈ U(x₀, t* − t₀). Given they hold for n = 0, 1, ..., j, then

‖x_{j+1} − x₀‖ ≤ Σ_{i=1}^{j+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{j+1} (t_i − t_{i−1}) = t_{j+1} − t₀ = t_{j+1}.

The induction can be completed by simply replacing x₀, x₁ above by x_n, x_{n+1}, respectively, and noticing that ‖x_{n+1} − x_n‖ ≤ r₀. Indeed, for the crucial condition (d) of Lemma 7.1.3, assigning F as A(x_n, ·), G as A(x_{n+1}, ·) − A(x_n, ·), and α as d₁(t* − t_{n+1}), we have, since d₁ = δ₀(F, D) ≥ δ(F, D) ≥ d₀ − η₀t_n:

θ₀ ≥ [1 − η(t_{n+1} − t_n)d₁⁻¹]d₁(t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = [d₁ − η(t_{n+1} − t_n)](t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 ≥ (d₀ − ηt_{n+1})(t* − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = (d₀ − ηt_{n+1})(t* − t_{n+2}) + (d₀ − ηt_{n+1})(t_{n+2} − t_{n+1}) − ½η(t_{n+1} − t_n)²
 = (d₀ − ηt_{n+1})(t* − t_{n+2}) ≥ 0.

It follows by hypothesis (b) that the scalar sequence {t_n} is Cauchy. From (7.1.26) and (7.1.27) it follows that {x_n} is Cauchy too in the Banach space X, and as such it converges to some x* ∈ U(x₀, t*). Furthermore, we have

‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ ½η‖x_{n+1} − x_n‖² ≤ ½η(t_{n+1} − t_n)².
That is, ‖F(x_{n+1})‖ converges to zero as n → ∞. Therefore, by the continuity of F we deduce F(x*) = 0. Estimate (7.1.23) follows from (7.1.22) by using standard majorization techniques. Moreover, for the uniqueness part, using Lemma 7.1.4 and hypothesis (7.1.24) we deduce that x* is an isolated zero of F, since F is 1–1 on U(x₀, t*).

Remark 7.1.3 (a) If equality holds in (7.1.2), then our theorem reduces to the corresponding Theorem 3.2 in [302]. Otherwise, according to (7.1.17)–(7.1.21), our theorem provides, under weaker conditions and the same computational cost, finer error bounds on the distances involved and more precise information on the location of the solution x*. Moreover, the uniqueness of the solution x* is shown in a larger ball (see (7.1.24) and compare it with (3.7) in [302], i.e. compare (7.1.24) with s* < ρ < d₀η⁻¹).

(b) Condition (c) of Theorem 7.1.5 can be replaced by the stronger but more practical:

(b)′ For each fixed x₀ and y in U(0, d₀(t* − r₀)) = U (or in U(0, d₀r₀)) the operator Q given by Q(x) = x − y + A(x₀, x) is a contraction on U and maps U into itself.

It then follows by the contraction mapping principle (see Chapter 3) that the equation A(x₀, x) = y has a solution x in U.

We can also provide a local convergence result for Newton's method:

Theorem 7.1.6 Let F : D ⊆ X → Y be as in Theorem 7.1.5. Assume: x* ∈ D is a simple solution of the equation F(x) = 0; the operator F has a point-based approximation A on D with moduli (ℓ, ℓ₀) at the point x*. Moreover, assume that the following hold:

(a) δ(A(x*, ·), D) ≥ d* > 0;
(b) d* > (ℓ₀ + ½ℓ)r*;
(c) for each y ∈ U(0, d*r*) and provided x₀ ∈ U⁰(x*, r*), the equation A(x₀, x) = y has a solution x satisfying ‖x − x*‖ ≤ r*; and
(d) U(x*, r*) ⊆ D.

Then Newton's method is well defined, remains in U(x*, r*) for all n ≥ 0 and converges to x*, so that

‖x_{n+1} − x*‖ ≤ ℓ‖x_n − x*‖² / [2(d* − ℓ₀‖x_n − x*‖)]  (n ≥ 0).    (7.1.28)
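When the PBA is the linearization (7.1.1), the subproblem A(x_n, x_{n+1}) = 0 is linear and the iteration is classical Newton's method; estimate (7.1.28) then predicts the familiar quadratic error decay. A scalar sketch of this smooth special case (the equation x² − 2 = 0 and starting point are illustrative assumptions):

```python
f = lambda x: x ** 2 - 2.0
df = lambda x: 2.0 * x
x_star = 2.0 ** 0.5

x, errors = 1.5, []
for _ in range(5):
    # A(x_n, x_{n+1}) = f(x_n) + f'(x_n)(x_{n+1} - x_n) = 0
    x = x - f(x) / df(x)
    errors.append(abs(x - x_star))
```

The ratios errors[n+1]/errors[n]² stay bounded, matching the coefficient ℓ/[2(d* − ℓ₀‖x_n − x*‖)] in (7.1.28).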
Proof. It follows immediately by replacing x₀ by x* in Lemma 7.1.4 and in the appropriate places in the proof of Theorem 7.1.5.

Remark 7.1.4 (a) The local convergence of Newton's method was not studied in [302]. However, if it had been, the result would have been as in Theorem 7.1.6 with ℓ₀ = ℓ. Note however that, as in (7.1.2), in general

ℓ₀ ≤ ℓ    (7.1.29)
holds, and ℓ/ℓ₀ can be arbitrarily large. Denote by r*_R the convergence radius that would have been obtained by Robinson. It then follows by (b) of Theorem 7.1.6 that

r*_R = 2d*/(3ℓ) ≤ r* = 2d*/(2ℓ₀ + ℓ).    (7.1.30)
Moreover, in case strict inequality holds in (7.1.29), then so does it in (7.1.30). This observation is very important in computational mathematics, since our approach allows a wider choice of initial guesses x₀. Moreover, the corresponding error bounds of Robinson are obtained if we set ℓ = ℓ₀ in (7.1.29). Therefore our error bounds are also finer (for ℓ₀ < ℓ), and under the same computational cost.

(b) If the operator A is given by the linearization (7.1.1), then A(x_n, x_{n+1}) = 0 is a linear equation, and we recover Newton's method for differentiable operators. Otherwise, the equation A(x_n, x_{n+1}) = 0 applies to a wide class of important real-life problems for which it is not a linear equation. Such problems are discussed below.

Application 7.1.7 (Application to Normal Maps.) There is an important class of nonsmooth functions that have tractable point-based approximations. This class was given in [302] and is a reformulation of the class of variational inequalities on a Hilbert space; in particular it includes the finite-dimensional generalized equations investigated by Josephy [200]–[203], as well as many other problems of practical importance. For many members of this class, computational methods are available to solve the Newton subproblems, and therefore the method we suggest here can be applied not only as a theoretical device, but also as a practical computational method for solving problems.

The functions we shall treat are the so-called normal maps. We define these, and discuss briefly their significance, before relating them to point-based approximations. Let C be a closed convex subset of a Hilbert space H, and denote by Π the metric projector from H onto C. Let Ω be an open subset of H meeting C, and let F be a function from Ω to H. The normal map F_C is defined from the set Π⁻¹(Ω) to H by

F_C(z) = F(Π(z)) + (z − Π(z)).
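A minimal sketch of the normal map for C = [0, ∞), so that Π(z) = max(z, 0) and F_C(z) = 0 encodes the complementarity problem x ≥ 0, F(x) ≥ 0, xF(x) = 0. The sample functions F below are illustrative assumptions:

```python
def proj(z):
    """Metric projector onto C = [0, +inf)."""
    return max(z, 0.0)

def normal_map(F, z):
    """F_C(z) = F(Pi(z)) + (z - Pi(z))."""
    x = proj(z)
    return F(x) + (z - x)

# Interior solution: F(x) = x^2 - 2 has x0 = sqrt(2) > 0, so z0 = x0 - F(x0) = x0.
F1 = lambda x: x ** 2 - 2.0
r1 = normal_map(F1, 2.0 ** 0.5)

# Boundary solution: F(x) = x + 1 has x0 = 0, so z0 = x0 - F(x0) = -1.
F2 = lambda x: x + 1.0
r2 = normal_map(F2, -1.0)
```

In both cases the point z₀ = x₀ − F(x₀) turns the variational solution x₀ into a zero of F_C, which is the equivalence discussed next.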
A more extensive discussion of these normal maps is in [301], where it is shown how the first-order necessary optimality conditions for nonlinear optimization, as well as linear and nonlinear complementarity problems and more general variational inequalities, can all be expressed compactly and conveniently in the form of an equation F_C(z) = 0 involving normal maps. This is true because, as is easily verified, the variational problem of finding an x₀ ∈ C such that for each c ∈ C, ⟨F(x₀), c − x₀⟩ ≥ 0, is completely equivalent to the normal-map equation F_C(z₀) = 0 through the transformation z₀ = x₀ − F(x₀). However, the use of normal maps sometimes enables one to gain insight into special properties of problem classes that might have remained obscure in the formalism of variational inequalities. A particular example of this is the characterization of the local and global homeomorphism properties of linear normal maps given in [301] and improved in [291].
As the functions F and the sets C used in the construction of these normal maps are those naturally appearing in the optimization or variational problems, the normal-map technique provides a simple and convenient device for formulating a wide variety of important problems from applications in a single, unified manner. Results established for such normal maps will then apply to the entire range of these applications, so that there is no need to derive them separately for each different type of problem. Results of this kind already established include an implicit-function theorem and related machinery in [300], and characterizations in [301] of those normal maps derived from linear transformations that are also global or local homeomorphisms.

As we shall now show, the results on Newton's method that we establish here will apply to normal maps, therefore encompassing in particular the problems treated in the work already cited on the application of Newton's method to certain optimization problems and variational inequalities. Our results on local convergence of Newton's method require, in essence, the existence of a PBA that is a local homeomorphism, together with some closeness conditions. To show that these apply to normal maps, we must indicate how one can construct PBAs for such maps.

Recalling our earlier demonstration that the linearization operation yields a PBA for the function being linearized, one might ask if one could linearize the function appearing in a normal map. This does not seem easy, as such functions are in general not differentiable. However, we shall show that a simple modification of the linearization procedure will work very well. The key observation is that although the function F_C will in general not be differentiable, the underlying function F often will be (indeed, often smooth), and therefore will have an easily constructible PBA. We now show that if such a PBA exists for F, then one can easily be constructed for F_C.
Proposition 7.1.8 Let H be a Hilbert space and C a nonempty closed convex subset of H, and denote by Π the metric projector on C. Let F be a function from C to H, and let A be a function from C × C to H. If A is a PBA for F on C, then the function G : H × H → H defined by G(w, z) = A(Π(w), ·)_C(z) is a PBA for F_C on H. Further, the constant in the definition of PBA can be taken to be the same for G as for A.
Before giving the proof we note that by the definition of normal map, G(w, z) is A(Π(w), Π(z)) + (z − Π(z)). Therefore in determining values of this function one only evaluates A on the set C × C, where it is defined. Proof. By hypothesis there is a constant φ for which A has the two properties given in Definition 7.1.1. We shall show that the same two properties hold for G,
256
7. Equations with Nonsmooth Operators
again with the constant φ. First, let w and z be any elements of H. Then
‖F_C(z) − G(w, z)‖ = ‖F(Π(z)) + (z − Π(z)) − [A(Π(w), Π(z)) + (z − Π(z))]‖
= ‖F(Π(z)) − A(Π(w), Π(z))‖
≤ (φ/2) ‖Π(w) − Π(z)‖²
≤ (φ/2) ‖w − z‖²,
where we used the facts that A is a PBA for F on C and that the metric projector is nonexpansive. Therefore G satisfies the first property of a PBA. For the second, suppose that z and z′ are two elements of H; we show that G(z, ·) − G(z′, ·) is Lipschitzian on H with modulus φ‖z − z′‖. Indeed, let x and y be elements of H. Then
‖[G(z, x) − G(z′, x)] − [G(z, y) − G(z′, y)]‖
= ‖[A(Π(z), Π(x)) + (x − Π(x)) − A(Π(z′), Π(x)) − (x − Π(x))] − [A(Π(z), Π(y)) + (y − Π(y)) − A(Π(z′), Π(y)) − (y − Π(y))]‖
= ‖[A(Π(z), Π(x)) − A(Π(z′), Π(x))] − [A(Π(z), Π(y)) − A(Π(z′), Π(y))]‖
≤ φ ‖Π(z) − Π(z′)‖ ‖Π(x) − Π(y)‖
≤ φ ‖z − z′‖ ‖x − y‖,
where we used the second property of a PBA (for A) along with the nonexpansivity of Π.
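The construction of Proposition 7.1.8 can be tried out numerically. The following Python sketch is only illustrative: the smooth map F, the box C = [0, 1]², and the constant φ below are assumptions chosen for the example, not data from the text. It builds the normal map F_C and the induced PBA G from the linearization of F, and spot-checks the first PBA property.

```python
import numpy as np

def F(x):
    # an illustrative smooth map on R^2 (assumption, not from the text)
    return np.array([x[0] ** 2 + x[1], x[0] + x[1] ** 3])

def dF(x):
    # Jacobian of F
    return np.array([[2 * x[0], 1.0], [1.0, 3 * x[1] ** 2]])

def proj(z):
    # metric projector onto the box C = [0, 1]^2 (nonexpansive)
    return np.clip(z, 0.0, 1.0)

def A(u, v):
    # linearization PBA for F on C: A(u, v) = F(u) + F'(u)(v - u)
    return F(u) + dF(u) @ (v - u)

def F_C(z):
    # normal map: F_C(z) = F(Pi(z)) + z - Pi(z)
    p = proj(z)
    return F(p) + z - p

def G(w, z):
    # induced PBA for the normal map: G(w, z) = A(Pi(w), Pi(z)) + z - Pi(z)
    pw, pz = proj(w), proj(z)
    return A(pw, pz) + z - pz

# spot-check the first PBA property ||F_C(z) - G(w, z)|| <= (phi/2)||w - z||^2
# with phi = 6, a Lipschitz constant for dF on C (assumed for this toy map)
phi = 6.0
rng = np.random.default_rng(0)
for _ in range(1000):
    w, z = rng.uniform(-1.0, 2.0, 2), rng.uniform(-1.0, 2.0, 2)
    lhs = np.linalg.norm(F_C(z) - G(w, z))
    assert lhs <= phi / 2 * np.linalg.norm(w - z) ** 2 + 1e-12
```

Because the projection is nonexpansive, the constant φ that works for A on C also works for G on all of H, which is exactly the content of the proposition.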
7.2
Convergence of Newton-Kantorovich Method. Case 2: Hölderian Assumptions
In this section we further weaken the assumptions made in Section 7.1 by considering Hölderian instead of Lipschitzian assumptions, in order to cover a wider range of problems than the ones discussed in Section 7.1 or in [302]. Numerical examples are also provided to show that our results apply where others fail. We need the definitions:
Definition 7.2.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. We say F has a point-based approximation p-(PBA) at some point x₀ ∈ D if there exist an operator A : D × D → Y and scalars k, k₀ such that for each u and v in D and some p ∈ (0, 1]:
‖F(v) − A(u, v)‖ ≤ (k/(1 + p)) m(u, v)^{1+p},   (7.2.1)
‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ (2k/(1 + p)) m(u, v)ᵖ m(x, y),   (7.2.2)
and
‖[A(u, x) − A(x₀, x)] − [A(u, y) − A(x₀, y)]‖ ≤ (2k₀/(1 + p)) m(u, x₀)ᵖ m(x, y)   (7.2.3)
for all x, y ∈ D. Let us assume that D is a convex set and X is also a normed linear space for both cases that follow. Here are some cases where the above hold:
(a) Let p = 1. Then see Section 7.1.
(b) If F has a Fréchet derivative that is p-Hölderian on D with modulus k and center-p-Hölderian (at x₀) with modulus k₀, then with the choice of A given by (7.1.1) we get
‖F(v) − F(u) − F′(u)(v − u)‖ ≤ (k/(1 + p)) ‖u − v‖^{1+p},
whereas again (7.2.2) and (7.2.3) are equivalent to the Hölderian and center-Hölderian property of operator F′ (see also Section 5.1).
We need the following generalization of the classical Banach Lemma 1.3.2 on invertible operators:
Lemma 7.2.2 Let (X, d) be a Banach space, D a closed subset of X, and Y a normed linear space. Let F, G be operators from D into Y, with G being Lipschitzian with modulus k and center-Lipschitzian with modulus k₀. Let x₀ ∈ D, and set F(x₀) = y₀. Assume that:
(a) U(y₀, α) ⊆ F(D);
(b) 0 ≤ k < d;
(c) U(x₀, d₁⁻¹α) ⊆ D, and
(d) θ = (1 − k₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.
Then the following hold:
U(y₀, θ) ⊆ (F + G)(U(x₀, d₁⁻¹α)),   (7.2.4)
and
δ(F + G, D) ≥ d − k > 0.   (7.2.5)
Proof. Define operator Ty(x) = F⁻¹(y − G(x)) for y ∈ U(y₀, θ) and x ∈ U(x₀, d₁⁻¹α). We can get:
‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ + k₀(d₁⁻¹α) + ‖G(x₀)‖ = α.
Therefore Ty(x) is a singleton set since δ > 0. That is, Ty is an operator on U(x₀, d₁⁻¹α). This operator maps U(x₀, d₁⁻¹α) into itself. Indeed, for x ∈ U(x₀, d₁⁻¹α),
d(Ty(x), x₀) = d(F⁻¹(y − G(x)), F⁻¹(y₀)) ≤ d₁⁻¹α,
so we get:
d(Ty(x), x₀) ≤ d₁⁻¹α.
Moreover, let u, v be in U(x₀, d₁⁻¹α); then
d(Ty(u), Ty(v)) = d(F⁻¹(y − G(u)), F⁻¹(y − G(v))) ≤ d⁻¹‖G(u) − G(v)‖ ≤ d⁻¹k d(u, v),
so
d(Ty(u), Ty(v)) ≤ d⁻¹k d(u, v).   (7.2.6)
It follows by hypothesis (b) and (7.2.6) that Ty is a contraction operator (see Chapter 3) mapping the closed ball U(x₀, d₁⁻¹α) into itself, and as such it has a unique fixed point x(y) in this ball. Furthermore,
δ(F + G, D) = inf{ ‖[F(u) − F(v)] + [G(u) − G(v)]‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ d − sup{ ‖G(u) − G(v)‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ d − k > 0.   (7.2.7)
Furthermore, (7.2.7) shows F + G is one-to-one on D. The following lemma can be used to show uniqueness of the solution in the semilocal case and convergence of Newton's method in the local case:
Lemma 7.2.3 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let F : D → Y, and let A be a p-(PBA) for F on D at the point x₀ ∈ D. Denote by d₀ the quantity δ(A(x₀, ·), D). If x₀ ∈ D and U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d₀ − (2k₀ + k) ρᵖ/(1 + p).
In particular, if d₀ > (2k₀ + k) ρᵖ/(1 + p), then F is one-to-one on U(x₀, ρ).
Proof. Let y, z be points in U(x₀, ρ). Set x = (y + z)/2. We can write
F(y) − F(z) = [F(y) − A(x, y)] + [A(x, y) − A(x, z)] + [A(x, z) − F(z)].
Then we get
‖F(y) − A(x, y)‖ ≤ (k/(1 + p)) ‖x − y‖^{1+p} = (k/(2^{1+p}(1 + p))) ‖y − z‖^{1+p},
and similarly
‖F(z) − A(x, z)‖ ≤ (k/(2^{1+p}(1 + p))) ‖y − z‖^{1+p}.
Using the estimate
‖A(x, u) − A(x, v)‖ ≥ ‖A(x₀, u) − A(x₀, v)‖ − ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖,
we get in turn:
δ(A(x, ·), D) = inf{ ‖A(x, u) − A(x, v)‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ δ(A(x₀, ·), D) − sup{ ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d₀ − (2k₀/(1 + p)) ‖x − x₀‖ᵖ ≥ d₀ − (2k₀/(1 + p)) ρᵖ.
Hence, we get
‖F(y) − F(z)‖ ≥ (d₀ − (2k₀/(1 + p)) ρᵖ) ‖y − z‖ − (k/(2ᵖ(1 + p))) ‖y − z‖^{1+p},
and for y ≠ z
‖F(y) − F(z)‖ / ‖y − z‖ ≥ d₀ − (2k₀/(1 + p)) ρᵖ − (k/(2ᵖ(1 + p))) ‖y − z‖ᵖ ≥ d₀ − (2k₀ + k) ρᵖ/(1 + p).
Remark 7.2.1 Lemmas 7.2.2 and 7.2.3 can be reduced to the corresponding ones in [302] if p = 1, d₁ = d = d₀, and k₀ = k. In the case p = 1 and k₀ < k or d₁ > d, our Lemma 7.2.2 improves the range for θ, whereas our Lemma 7.2.3 enlarges ρ given in [300]. Note that in general k₀ ≤ k and d ≤ d₁ hold, and k/k₀, d₁/d can be arbitrarily large (see Section 5.1).
It is convenient for us to define scalar iteration {t_n} (n ≥ 0) by
t₀ = 0, t₁ = η,
t_{n+2} = t_{n+1} + k(t_{n+1} − t_n)^{1+p} / [ d₀(1 + p)(1 − (2k₀/(1 + p)) d₀⁻¹ t_{n+1}ᵖ) ]   (7.2.8)
for some η ≥ 0. The convergence of sequence {t_n} was studied in Section 5.1 and in [90]. In particular, we showed that if
h = [ k + (2k₀/(1 + p)) ((p + 1)/p)ᵖ ] d₀⁻¹ ηᵖ ≤ 1 for p ≠ 1, or h = (k₀ + k)ηd₀⁻¹ ≤ 1 for p = 1,   (7.2.9)
then sequence {t_n} is monotonically convergent to its upper bound t* with
η ≤ t* ≤ ((p + 1)/p) η = r.   (7.2.10)
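The behaviour of iteration (7.2.8) is easy to observe numerically. In the Python sketch below the values of k, k₀, d₀, η, and p are illustrative choices (assumptions for this example, not taken from the text) that satisfy condition (7.2.9); the computed sequence is non-decreasing and stays below the bound r of (7.2.10).

```python
def majorizing_sequence(k, k0, d0, eta, p, n_terms=30):
    # scalar iteration (7.2.8): t0 = 0, t1 = eta
    t = [0.0, eta]
    for _ in range(n_terms):
        step = k * (t[-1] - t[-2]) ** (1 + p) / (
            d0 * (1 + p) * (1.0 - (2.0 * k0 / (1 + p)) * t[-1] ** p / d0))
        t.append(t[-1] + step)
    return t

# illustrative parameters (assumed) satisfying condition (7.2.9)
k, k0, d0, eta, p = 1.0, 0.5, 2.0, 0.1, 0.5
t = majorizing_sequence(k, k0, d0, eta, p)
r = (p + 1) / p * eta                          # upper bound from (7.2.10)
assert all(b >= a for a, b in zip(t, t[1:]))   # monotone non-decreasing
assert eta <= t[-1] <= r                       # converges below r
```

With these parameters the sequence settles quickly, reflecting the superlinear decay of the increments t_{n+1} − t_n under (7.2.9).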
Note also that these conditions are obtained as special cases from Lemma 5.1.1. Conditions (7.2.9) were compared favourably in [90] with corresponding ones already in the literature [303]. That is why we decided to use them here over the earlier ones. It will be shown that {t_n} is a majorizing sequence for Newton's iteration {x_n} (n ≥ 0) to be defined in the main semilocal convergence theorem that follows. We can state and prove the main semilocal convergence result for Newton's method involving a p-(PBA).
Theorem 7.2.4 Let X and Y be Banach spaces, D a closed convex subset of X, x₀ ∈ D, and F a continuous operator from D into Y. Suppose that F has a p-(PBA) at x₀. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ > 0;
(b) condition (7.2.9) holds;
(c) for each y ∈ U(0, d₀(t* − η)) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ η; and
(e) U(x₀, t*) ⊆ D.
Then the Newton iteration defining x_{n+1} by
A(x_n, x_{n+1}) = 0   (n ≥ 0)   (7.2.11)
remains in U(x₀, r), and converges to a solution x* ∈ U(x₀, r) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n   (7.2.12)
and
‖x_n − x*‖ ≤ t* − t_n,   (7.2.13)
where sequence {t_n} is defined by (7.2.8) and t* = lim_{n→∞} t_n.
Proof. We use Lemma 7.2.2 with quantities F, G, x₀, and y₀ replaced by A(x, ·), A(x₁, ·) − A(x, ·), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of Lemma 7.2.2 follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) > 0). For hypothesis (b) we have
δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − (2k₀/(1 + p)) ‖x₁ − x₀‖ᵖ ≥ d₀ − (2k₀/(1 + p)) ηᵖ > 0
(by (7.2.9)).
To show (c) we must have U(x₁, t* − η) ⊆ D. Instead, by hypothesis (e) it suffices to show U(x₁, t* − η) ⊂ U(x₀, t*), which is true since ‖x₁ − x₀‖ ≤ η and so (t* − η) + ‖x₁ − x₀‖ ≤ t*. We also have by (7.2.1)
‖A(x₀, x₁) − A(x₁, x₁)‖ ≤ (k/(1 + p)) ‖x₀ − x₁‖^{1+p}.
The quantity
d₀η[1 − (2k₀/(1 + p)) d₀⁻¹ ηᵖ] − (k/(1 + p)) η^{1+p}   (7.2.14)
is nonnegative by (7.2.9), and is also a lower bound on θ given by (d) in Lemma 7.2.2, since θ is bounded below by
d₀η[1 − (2k₀/(1 + p)) d₁⁻¹ ηᵖ] − ‖A(x₀, x₁) − A(x₁, x₁)‖.
It follows from Lemma 7.2.2 that for each y ∈ U(0, t* − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in D, since δ(A(x₁, ·), D) > 0. We also have A(x₀, x₁) = A(x₁, x₂) = 0 and A(x₁, x₁) = F(x₁). By Definition 7.2.1 we get in turn:
‖x₂ − x₁‖ ≤ δ(A(x₁, ·), D)⁻¹ ‖A(x₁, x₂) − F(x₁)‖
≤ d₀⁻¹ [1 − (2k₀/(1 + p)) d₀⁻¹ ‖x₁ − x₀‖ᵖ]⁻¹ (k/(1 + p)) ‖x₁ − x₀‖^{1+p}
≤ d₀⁻¹ [1 − (2k₀/(1 + p)) d₀⁻¹ t₁ᵖ]⁻¹ (k/(1 + p)) (t₁ − t₀)^{1+p} ≤ t₂ − t₁.
Hence we showed:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n   (7.2.15)
and
U(x_{n+1}, t* − t_{n+1}) ⊂ U(x_n, t* − t_n)   (7.2.16)
hold for n = 0, 1. Moreover, for every v ∈ U(x₁, t* − t₁),
‖v − x₀‖ ≤ ‖v − x₁‖ + ‖x₁ − x₀‖ ≤ t* − t₁ + t₁ = t* − t₀
implies v ∈ U(x₀, t* − t₀). Given that they hold for n = 0, 1, …, j, then
‖x_{j+1} − x₀‖ ≤ Σ_{i=1}^{j+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{j+1} (t_i − t_{i−1}) = t_{j+1} − t₀ = t_{j+1}.
The induction for (7.2.15), (7.2.16) can easily be completed by simply replacing above x₀, x₁ by x_n, x_{n+1}, respectively. Indeed, the crucial quantity corresponding to (7.2.14) will be
d₀η[1 − (2k₀/(1 + p)) ((p + 1)/p)ᵖ ηᵖ d₀⁻¹] − (k/(1 + p)) η^{1+p},
which is nonnegative by (7.2.9), or equivalently
[ (2k₀/(1 + p)) ((p + 1)/p)ᵖ + k/(1 + p) ] d₀⁻¹ ηᵖ ≤ 1.
Furthermore, by (7.2.9) scalar sequence {t_n} is Cauchy. From (7.2.15) and (7.2.16) it follows {x_n} is Cauchy too in a Banach space X, and as such it converges to some x* ∈ U(x₀, t*). Moreover, we have by (7.2.1) and (7.2.15)
‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ (k/(1 + p)) ‖x_{n+1} − x_n‖^{1+p} ≤ (k/(1 + p)) (t_{n+1} − t_n)^{1+p}.
Therefore ‖F(x_{n+1})‖ converges to zero as n → ∞. By the continuity of F we deduce F(x*) = 0. Estimate (7.2.13) now follows from (7.2.12) by using standard majorization techniques.
Remark 7.2.2 The uniqueness of the solution x* was not considered in Theorem 7.2.4. Indeed, we do not know if under the conditions stated above the solution x* is unique, say in U(x₀, t*). However, using Lemma 7.2.3 we can obtain a uniqueness result, so that if ρ satisfies
t* < ρ < [ (1 + p)d₀ / (2k₀ + k) ]^{1/p},
then operator F is one-to-one in a neighborhood of x*, since x* ∈ U(x₀, t*). Note that r can replace t* in Theorem 7.2.4 as well as in the above double inequality. That is, x* is an isolated zero of F in this case.
We can also provide a local convergence result for Newton's method:
Theorem 7.2.5 Let F : D ⊆ X → Y be as in Theorem 7.2.4. Assume: x* ∈ D is an isolated zero of F; operator F has a p-(PBA) on D with modulus (ℓ, ℓ₀) at the point x*. Moreover assume that the following hold:
(a) δ(A(x*, ·), D) ≥ d* > 0;
(b) d* ≥ ((2ℓ₀ + ℓ)/(1 + p)) (r*)ᵖ;
(c) for each y ∈ U(0, d*r*) and provided x₀ ∈ U⁰(x*, r*) the equation A(x₀, x) = y has a solution x satisfying ‖x − x*‖ ≤ r*, and U(x*, r*) ⊆ D.
Then Newton's method generated by (7.2.11) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x*.
Proof. Simply replace x₀ by x* in Lemma 7.2.3 and in the appropriate places in the proof of Theorem 7.2.4.
Example 7.2.1 Let X = Y = R, D = [0, b] for some b > 1. Define function F on D by
F(x) = x^{1+1/i}/(1 + 1/i) − b₀x + b₁
for some real parameters b₀ and b₁, and an integer i > 1. Then no matter how operator A is chosen corresponding to (7.2.2), (7.2.3), the conditions in [302] will not hold (i.e. set p = 1 in both (7.2.2) and (7.2.3)). However, if for example A is given by (7.1.1), our conditions (7.2.2)–(7.2.3) hold for, say, i = 2, p = 1/2, k₀ = k = 1, and x₀ = 1. A more interesting application of Theorem 7.2.4 is given by Example 3.3.1 (alternative approach).
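Example 7.2.1 can be run concretely. In the Python sketch below, i = 2 (so p = 1/2), and the values b₀ = 0.5, b₁ = −0.1 are illustrative parameter choices (assumptions for this sketch, not prescribed by the text). Newton's method started at x₀ = 1 converges even though F′ is only 1/2-Hölderian.

```python
# Example 7.2.1 with i = 2: F(x) = x**(3/2)/(3/2) - b0*x + b1 on [0, b];
# b0, b1 below are illustrative assumptions.
b0, b1 = 0.5, -0.1

def F(x):
    return x ** 1.5 / 1.5 - b0 * x + b1

def dF(x):
    # F'(x) = sqrt(x) - b0 is 1/2-Hoelder but not Lipschitz at 0
    return x ** 0.5 - b0

x = 1.0                          # x0 = 1 as in the example
for _ in range(30):
    x -= F(x) / dF(x)            # Newton step: solve A(x_n, w) = 0 for w
assert abs(F(x)) < 1e-12

# F' satisfies |F'(u) - F'(v)| <= |u - v|**(1/2), i.e. k = 1, p = 1/2
for u in (0.0, 0.01, 0.25, 1.0):
    for v in (0.0, 0.5, 0.9):
        assert abs(dF(u) - dF(v)) <= abs(u - v) ** 0.5 + 1e-15
```

Here the linearization A(u, v) = F(u) + F′(u)(v − u) of (7.1.1) is used, so each Newton step solves A(x_n, w) = 0 for w.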
7.3
Convergence of Newton-Kantorovich Method. Case 3: Local Lipschitzian Assumptions
We need the definitions suitable for further weakening the earlier hypotheses:
Definition 7.3.1 Let F be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y. Let x₀ ∈ D, ε ≥ 0, and ε₀ ≥ 0. We say F has an (x₀, ε₀, ε) point-based approximation (PBA) on D if there exist an operator A : D × D → Y and R > 0 such that for all u, v, x, y ∈ U(x₀, R):
(a) U(x₀, R) ⊆ D,   (7.3.1)
(b) ‖F(v) − A(u, v)‖ ≤ (ε/2) m(u, v),   (7.3.2)
(c) ‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ ε m(u, v) m(x, y),   (7.3.3)
and
(d) ‖[A(u, x) − A(x₀, x)] − [A(u, y) − A(x₀, y)]‖ ≤ ε₀ m(u, x₀) m(x, y).   (7.3.4)
Let X be also a normed linear space. Then, if F has a Fréchet derivative F′ which is ε/2-continuous on D, and we set A(u, v) = F(u) + F′(u)(v − u), (b) gives
‖F(v) − F(u) − F′(u)(v − u)‖ ≤ (ε/2) ‖v − u‖   (7.3.5)
provided that D is also a convex set, whereas parts (c) and (d) are equivalent to the continuity property of F′. Conditions (7.3.1)–(7.3.4) are weaker than the ones considered in [302, p. 293] or in Section 7.1. The advantages of such an approach have already been explained in Section 5.1 for the classical case. Similar advantages can be found in the nonsmooth case. Let δ(F, D), δ₀(F, D), δ, δ₁ be as δ(F, D), δ₀(F, D), d, d₁ in Definition 7.1.2, respectively. We need the following generalization of the classical Banach Lemma 1.3.2 on invertible operators in order to examine both the semilocal as well as the local convergence of Newton's method:
Lemma 7.3.2 Let X be a Banach space, D a closed subset of X, and Y a normed linear space. Let F, G be operators from D into Y. Let x₀ ∈ D and set F(x₀) = y₀. Let also G be Lipschitzian with modulus ε and center-Lipschitzian at x₀ with modulus ε₀. Assume that:
(a) U(y₀, α) ⊆ F(D);
(b) 0 ≤ ε < δ;
(c) U(x₀, δ₁⁻¹α) ⊆ D, and
(d) θ = (1 − ε₀δ₁⁻¹)α − ‖G(x₀)‖ ≥ 0.
Then the following hold:
U(y₀, θ) ⊆ (F + G)(U(x₀, δ₁⁻¹α))
and
δ(F + G, D) ≥ δ − ε > 0.
Proof. Define operator Ty(x) = F⁻¹(y − G(x)) for each fixed y ∈ U(y₀, θ) and x ∈ U(x₀, δ₁⁻¹α). We can get
‖y − G(x) − y₀‖ ≤ ‖y − y₀‖ + ‖G(x) − G(x₀)‖ + ‖G(x₀)‖ ≤ θ + ε₀(δ₁⁻¹α) + ‖G(x₀)‖ = α.
Therefore Ty(x) is a singleton set since δ > 0. That is, Ty is an operator on U(x₀, δ₁⁻¹α). This operator maps U(x₀, δ₁⁻¹α) into itself. Indeed, for x ∈ U(x₀, δ₁⁻¹α), we get:
m(Ty(x), x₀) = m(F⁻¹(y − G(x)), F⁻¹(y₀)) ≤ δ₁⁻¹α.
Moreover, let u, v be in U(x₀, δ₁⁻¹α); then
m(Ty(u), Ty(v)) ≤ δ⁻¹ e[G(u), G(v)] ≤ δ⁻¹ ε m(u, v).
It follows by hypothesis (b) that Ty is a contraction operator mapping the closed ball U(x₀, δ₁⁻¹α) into itself, and as such it has a unique fixed point x(y) in this ball. Moreover,
δ(F + G, D) = inf{ ‖[F(u) − F(v)] + [G(u) − G(v)]‖ / m(u, v) : u ≠ v, u, v ∈ D }
≥ δ − sup{ ‖G(u) − G(v)‖ / m(u, v) : u ≠ v, u, v ∈ D } ≥ δ − ε > 0.
Furthermore, the above estimate shows F + G is one-to-one on D.
Remark 7.3.1 In general
ε₀ ≤ ε and δ ≤ δ₁   (7.3.6)
hold. If strict inequality holds in any of the inequalities above, then Lemma 7.3.2 improves the corresponding Lemma 3.1 in [300], since the range for θ is enlarged, and under the same computational cost, since the computation of ε (or δ) in practice requires the evaluation of ε₀ (or δ₁). Note also that ε/ε₀ and δ₁/δ can be arbitrarily large (see Section 5.1). The following lemma helps to establish the uniqueness of the solution x* in the semilocal case.
Lemma 7.3.3 Let X and Y be normed linear spaces, let D be a closed subset of X, and x₀ ∈ D, ε₀ ≥ 0 and ε ≥ 0. Let F have an (x₀, ε₀, ε) (PBA) on D. Denote by d the quantity δ(A(x₀, ·), D). If U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d − ε₀ρ − ε/2.
In particular, if d > ε₀ρ + ε/2, then F is one-to-one on U(x₀, ρ).
Proof. Let y, z be points in U(x₀, ρ). Set x = (y + z)/2. We can write
F(y) − F(z) = [F(y) − A(x, y)] + [A(x, y) − A(x, z)] + [A(x, z) − F(z)].
Then we get
‖F(y) − A(x, y)‖ ≤ (ε/2)‖x − y‖ = (ε/4)‖y − z‖,
and similarly,
‖F(z) − A(x, z)‖ ≤ (ε/2)‖x − z‖ = (ε/4)‖y − z‖.
Using the estimate
‖A(x, u) − A(x, v)‖ ≥ ‖A(x₀, u) − A(x₀, v)‖ − ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖,
we get in turn
δ(A(x, ·), D) = inf{ ‖A(x, u) − A(x, v)‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ δ(A(x₀, ·), D) − sup{ ‖[A(x, u) − A(x₀, u)] − [A(x, v) − A(x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d − ε₀‖x − x₀‖ ≥ d − ε₀ρ.
Hence, we get
‖F(y) − F(z)‖ ≥ (d − ε₀ρ)‖y − z‖ − (ε/2)‖y − z‖,
and for y ≠ z
‖F(y) − F(z)‖ / ‖y − z‖ ≥ d − ε₀ρ − ε/2.
From now on we set D = U(x₀, R). We can now state and prove the main semilocal convergence theorem for Newton's method:
Theorem 7.3.4 Let X and Y be Banach spaces, D a closed convex subset of X, x₀ ∈ D, ε > 0, ε₀ ≥ 0, and F a continuous operator from D into Y. Suppose that F has an (x₀, ε₀, ε) (PBA) on D. Moreover assume:
(a) δ(A(x₀, ·), D) ≥ d₀ ≥ ε > 0;
(b) 2ε₀η ≤ min{2(d₀ − ε), d₀ − ε/2};
(c) for each y ∈ U(0, d₀η) the equation A(x₀, x) = y has a solution x;
(d) the solution S(x₀) of A(x₀, S(x₀)) = 0 satisfies ‖S(x₀) − x₀‖ ≤ η; and
(e) U(x₀, 2η) ⊆ D.
Then the Newton iteration defining x_{n+1} by
A(x_n, x_{n+1}) = 0   (n ≥ 0)   (7.3.7)
remains in U(x₀, 2η), and converges to a solution x* ∈ U(x₀, 2η) of equation F(x) = 0. Moreover, the following estimates hold for all n ≥ 0:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n,   (7.3.8)
and
‖x_n − x*‖ ≤ 2η − t_n,   (7.3.9)
where lim_{n→∞} t_n = 2η and
t₀ = 0, t₁ = η, t_{n+2} = t_{n+1} + (1/2)(t_{n+1} − t_n).
Proof. We use Lemma 7.3.2 with quantities F, G, x₀, and y₀ replaced by A(x, ·), A(x₁, ·) − A(x, ·), x₁ = S(x₀), and 0, respectively. Hypothesis (a) of Lemma 7.3.2 follows from the fact that A(x₀, x) = y has a unique solution x₁ (since δ(A(x₀, ·), D) > 0). For hypothesis (b) we have
δ(A(x₁, ·), D) ≥ δ(A(x₀, ·), D) − ε₀‖x₁ − x₀‖ ≥ d₀ − ε₀η > 0.
To show (c) we must have U(x₁, η) ⊆ D. Instead, by hypothesis (e) of the theorem it suffices to show U(x₁, η) ⊂ U(x₀, 2η), which is true since ‖x₁ − x₀‖ + η ≤ 2η. Note that
‖A(x₀, x₁) − A(x₁, x₁)‖ ≤ (ε/2)‖x₀ − x₁‖ ≤ (ε/2)η,
and since d₀ ≤ d₁ = δ₀(A(x₀, ·), D), we get
θ ≥ (1 − 2ηε₀d₁⁻¹)d₀η − ‖A(x₀, x₁) − A(x₁, x₁)‖
≥ (1 − 2ηε₀d₀⁻¹)d₀η − (ε/2)η = (d₀ − 2ε₀η − ε/2)η ≥ 0,
by hypothesis (b). That is, condition (d) of the Lemma is also satisfied. It now follows from Lemma 7.3.2 that for each y ∈ U(0, 2η − ‖x₁ − x₀‖), the equation A(x₁, z) = y has a unique solution z = x₂ in U, since δ(A(x₁, ·), D) > 0. We also have A(x₀, x₁) = A(x₁, x₂) = 0, and A(x₁, x₁) = F(x₁). By Definition 7.1.2 we get in turn:
‖x₂ − x₁‖ ≤ δ(A(x₁, ·), U)⁻¹ ‖A(x₁, x₂) − A(x₁, x₁)‖
≤ d₀⁻¹ (1 − d₀⁻¹ε₀‖x₁ − x₀‖)⁻¹ ‖A(x₀, x₁) − F(x₁)‖
≤ d₀⁻¹ (1 − d₀⁻¹ε₀‖x₁ − x₀‖)⁻¹ (ε/2)‖x₁ − x₀‖ ≤ η/2 = t₂ − t₁,
by hypothesis (b). Hence, we showed:
‖x_{n+1} − x_n‖ ≤ t_{n+1} − t_n,   (7.3.10)
and
U(x_{n+1}, 2η − t_{n+1}) ⊆ U(x_n, 2η − t_n)   (7.3.11)
hold for n = 0, 1. Moreover, for every z ∈ U(x₁, 2η − t₁),
‖z − x₀‖ ≤ ‖z − x₁‖ + ‖x₁ − x₀‖ ≤ 2η − t₁ + t₁ = 2η − t₀
implies z ∈ U(x₀, 2η − t₀). Given that they hold for n = 0, 1, …, k, then
‖x_{k+1} − x₀‖ ≤ Σ_{i=1}^{k+1} ‖x_i − x_{i−1}‖ ≤ Σ_{i=1}^{k+1} (t_i − t_{i−1}) = t_{k+1} − t₀ = t_{k+1}.
The induction can easily be completed by simply replacing above x₀, x₁ by x_n, x_{n+1}, respectively. Furthermore, scalar sequence {t_n} is Cauchy. From (7.3.10) and (7.3.11) it follows {x_n} is Cauchy too in a Banach space X, and as such it converges to some x* ∈ U(x₀, 2η). Then we have
‖F(x_{n+1})‖ = ‖F(x_{n+1}) − A(x_n, x_{n+1})‖ ≤ (ε/2)‖x_{n+1} − x_n‖ ≤ (ε/2)(t_{n+1} − t_n).
That is, ‖F(x_{n+1})‖ converges to 0 as n → ∞. By the continuity of F we deduce F(x*) = 0. Estimate (7.3.9) follows from (7.3.8) by using standard majorization techniques. Finally, note
t_{n+2} = t_{n+1} + (1/2)(t_{n+1} − t_n) = ⋯ = η (1 − (1/2)^{n+2}) / (1 − 1/2),
and consequently lim_{n→∞} t_n = 2η.
Remark 7.3.2 The uniqueness of the solution x* was not studied in Theorem 7.3.4. Indeed, we do not know if under the conditions stated above the solution x* is unique, say in U(x₀, 2η). However, we can obtain a uniqueness result by using Lemma 7.3.3 at the point x₀, so that if ρ satisfies
2η < ρ < (d₀ − ε/2)ε₀⁻¹ = ε₁ for ε ∈ (0, 2d₀), ε₀ > 0, and U(x₀, ε₁) ⊆ D,
then operator F is one-to-one in a neighborhood of x*, since x* ∈ U(x₀, 2η). Therefore x* is an isolated zero of F in this case.
Example 7.3.1 Let us show how to choose operator A in some interesting cases. Let B(u) ∈ L(X, Y), the space of bounded linear operators from X into Y, and define A by
A(u, v) = F(u) + B(u)(v − u).
This is the class of the so-called Newton-like methods (see Section 5.1). Moreover, assume there exist nonnegative, nondecreasing functions βₜ(p, q) and γ(p) on [0, +∞) × [0, +∞) and [0, +∞), respectively, and F′(u) ∈ L(X, Y) such that:
‖F′(x + t(y − x)) − B(x₀)‖ ≤ βₜ(‖x − x₀‖, ‖y − x₀‖)
and
‖B(x) − B(x₀)‖ ≤ γ(‖x − x₀‖)
for all x, y ∈ U(x₀, R), t ∈ [0, 1]. Then we can have
F(v) − F(u) − B(u)(v − u) = ∫₀¹ [F′(u + t(v − u)) − B(x₀)](v − u) dt + (B(x₀) − B(u))(v − u).
Therefore we can define
βₜ(p, q) = ℓ[(1 − t)p + tq]   (ℓ > 0)
and γ(p) = ℓp. In the special case of Newton's method, i.e. when B(u) = F′(u), we can set
ε₀ = ε = 2[βₜ(R, R) + γ(R)] = 4ℓR.
In this case the crucial condition (b) of Theorem 7.3.4 becomes
R ≤ d₀/(4ℓ(1 + 2η)).
We must also have U(x₀, 2η) ⊆ U(x₀, R), i.e. 2η ≤ R. By combining all the above we deduce:
R ∈ [2η, d₀/(4ℓ(1 + 2η))],
provided that:
η ∈ [0, (−ℓ + √(ℓ(ℓ + d₀)))/(4ℓ)].
Consider function F defined on [0, ∞) and given by
F(x) = x^{1+1/n}/(1 + 1/n)   (n > 1).
Then F′ is center-Lipschitz at x₀ = 1 with modulus 1 (see Section 5.2). However, F′ is not Lipschitz on [0, ∞). Therefore the Newton–Kantorovich theorem in the form given in [302] cannot apply here, whereas our Theorem 7.3.4 can.
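This contrast can be checked numerically. In the following sketch n = 2 is an illustrative choice (so F′(x) = √x): the center-Lipschitz bound at x₀ = 1 holds with modulus 1, while the difference quotient of F′ blows up near 0, so no global Lipschitz constant exists.

```python
def dF(x):
    # F'(x) = x**(1/n) with n = 2, i.e. F(x) = x**(1 + 1/n)/(1 + 1/n)
    return x ** 0.5

x0 = 1.0
# center-Lipschitz at x0 = 1 with modulus 1:
# |sqrt(x) - 1| = |x - 1| / (sqrt(x) + 1) <= |x - 1|
for x in (0.0, 0.1, 0.5, 2.0, 10.0, 100.0):
    assert abs(dF(x) - dF(x0)) <= abs(x - x0)

# but no global Lipschitz constant exists: near 0 the difference
# quotient (dF(h) - dF(0)) / h = h**(-1/2) is unbounded
quotients = [(dF(h) - dF(0.0)) / h for h in (1e-2, 1e-4, 1e-6)]
assert quotients == sorted(quotients) and quotients[-1] > 1e2
```

The growing quotients (10, 100, 1000, …) make the failure of the classical Lipschitz hypothesis concrete, while the center-Lipschitz condition used by Theorem 7.3.4 is satisfied.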
7.4
Convergence of the Secant Method Using Majorizing Sequences
We noticed (see the numerical example at the end of this section) that approximations of the kind discussed in the previous sections may not exist. Therefore, in order to solve a wider range of problems, we introduce a more flexible and precise point-based approximation which is more suitable for Newton-like methods, and in particular for Secant-type iterative procedures (see Section 5.1). A local as well as a semilocal convergence analysis for the Secant method is provided, and our approach is justified through numerical examples. We need a definition of a point-based approximation (PBA) for operator F which is suitable for the Secant method.
Definition 7.4.1 Let F be an operator from a closed subset D of a metric space (X, m) into a normed linear space Y. Operator F has a (PBA) on D at the point x₀ ∈ D if there exist an operator A : D × D × D → Y and scalars ℓ₀, ℓ such that for all u, v, w, x, y, and z in D:
‖F(w) − A(u, v, w)‖ ≤ ℓ m(u, w) m(v, w),   (7.4.1)
‖[A(x, y, z) − A(x₀, x₀, z)] − [A(x, y, w) − A(x₀, x₀, w)]‖ ≤ ℓ₀[m(x, x₀) + m(y, x₀)] m(z, w),   (7.4.2)
and
‖[A(x, y, z) − A(u, v, z)] − [A(x, y, w) − A(u, v, w)]‖ ≤ ℓ[m(x, u) + m(y, v)] m(z, w).   (7.4.3)
We then say A is a (PBA) for F . This definition is suitable for the application of the Secant method. Indeed let X be also a normed linear space, D a convex set and F having a divided difference of order one on D × D denoted by [x, y; F ] and satisfying the standard condition [68], [99], [213], [217] (see also Chapter 2): k[u, v; F ] − [w, x; F ]k ≤ ℓ(ku − wk + kv − xk)
(7.4.4)
for all u, v, w and x in D. If we set A(u, v, w) = F (v) + [u, v; F ](w − v)
(7.4.5)
then (7.4.1) becomes kF (w) − F (v) − [u, v; F ](w − v)k ≤ ℓku − wk kv − wk,
(7.4.6)
whereas (7.4.2) and (7.4.3) are equivalent to property (7.4.4) of linear operator [·, ·; F ]. Note that a (PBA) does not imply differentiability. It follows by (7.4.1) that one way of finding a solution x∗ of equation F (x) = 0 is to solve for w the equation A(x, y, w) = 0
(7.4.7)
provided that x and y are given. We state the following generalization of the classical Banach Lemma 1.3.2 on invertible operators. The proof can be found in Lemma 7.1.3.
Lemma 7.4.2 Let X, D and Y be as in Definition 7.4.1. Assume further X is a Banach space. Let F and G be operators from D into Y with G being Lipschitzian with modulus ℓ and center-Lipschitzian with modulus ℓ₀. Let x₀ ∈ D with F(x₀) = y₀. Assume that:
U(y₀, α) ⊆ F(D);   (7.4.8)
0 ≤ ℓ < d;   (7.4.9)
U(x₀, d₁⁻¹α) ⊆ D,   (7.4.10)
and
θ₀ = (1 − ℓ₀d₁⁻¹)α − ‖G(x₀)‖ ≥ 0.   (7.4.11)
Then the following hold:
U(y₀, θ₀) ⊆ (F + G)(U(x₀, d₁⁻¹α))   (7.4.12)
and
δ(F + G, D) ≥ d − ℓ > 0.   (7.4.13)
The advantages of this approach have been explained in Remark 7.1.1. The following lemma is used to show uniqueness of the solution in the semilocal case and convergence of the Secant method in the local case.
Lemma 7.4.3 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let F : D → Y, and let A be a (PBA) for operator F on D at the point x₀ ∈ D. Denote by d the quantity δ(A(x₀, x₀, ·), D). If U(x₀, ρ) ⊆ D, then
δ(F, U(x₀, ρ)) ≥ d − (2ℓ₀ + ℓ)ρ.   (7.4.14)
In particular, if d − (2ℓ₀ + ℓ)ρ > 0, then F is one-to-one on U(x₀, ρ).
Proof. Let w, z be points in U(x₀, ρ), and set x = y = (w + z)/2. We can write
F(w) − F(z) = [F(w) − A(x, y, w)] + [A(x, y, w) − A(x, y, z)] + [A(x, y, z) − F(z)].
By (7.4.1) we can have
‖F(w) − A(x, y, w)‖ ≤ ℓ‖x − w‖ ‖y − w‖
and
‖F(z) − A(x, y, z)‖ ≤ ℓ‖x − z‖ ‖y − z‖.
Moreover, we can find
‖A(x, y, u) − A(x, y, v)‖ ≥ ‖A(x₀, x₀, u) − A(x₀, x₀, v)‖ − ‖[A(x, y, u) − A(x₀, x₀, u)] − [A(x, y, v) − A(x₀, x₀, v)]‖,
and therefore
δ(A(x, y, ·), D) ≥ δ(A(x₀, x₀, ·), D) − sup{ ‖[A(x, y, u) − A(x₀, x₀, u)] − [A(x, y, v) − A(x₀, x₀, v)]‖ / ‖u − v‖ : u ≠ v, u, v ∈ D }
≥ d − ℓ₀(‖x − x₀‖ + ‖y − x₀‖) ≥ d − 2ℓ₀ρ.   (7.4.15)
Furthermore, we can now have
‖F(w) − F(z)‖ ≥ (d − 2ℓ₀ρ)‖w − z‖ − ℓ[‖x − w‖ ‖y − w‖ + ‖x − z‖ ‖y − z‖]
≥ (d − 2ℓ₀ρ)‖w − z‖ − (ℓ/2)‖w − z‖²,
and for w ≠ z,
‖F(w) − F(z)‖ / ‖w − z‖ ≥ d − (2ℓ₀ + ℓ)ρ.   (7.4.16)
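In the scalar case, choosing A by (7.4.5) with the usual divided difference makes solving (7.4.7) exactly one Secant step. A minimal Python sketch follows; the test equation F(x) = x³ − 2 and the starting points are illustrative assumptions, not taken from the text.

```python
def F(x):
    # illustrative test equation; F vanishes at x* = 2**(1/3)
    return x ** 3 - 2.0

def dd(u, v):
    # divided difference of order one, [u, v; F]
    return (F(u) - F(v)) / (u - v)

# Secant iteration: solve A(x_{n-1}, x_n, w) = 0 for w, where
# A(u, v, w) = F(v) + [u, v; F](w - v) is the PBA (7.4.5)
x_prev, x = 1.0, 1.5            # x_{-1} and x_0
for _ in range(20):
    if x == x_prev or F(x) == 0.0:   # converged in floating point
        break
    x_prev, x = x, x - F(x) / dd(x_prev, x)

assert abs(x - 2.0 ** (1.0 / 3.0)) < 1e-12
```

The superlinear convergence visible here is exactly what the majorizing sequence (7.4.17) models on the scalar level.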
Remark 7.4.1 In order for us to compare our result with the corresponding Lemma 2.4 in [302, p. 294], first note that if:
(a) equality holds in both inequalities in (7.3.6), u = v and x = y in (7.4.1)–(7.4.3), then our result reduces to Lemma 2.4 by setting k/2 = ℓ = ℓ₀;
(b) strict inequality holds in any of the inequalities in (7.3.6), u = v and x = y, then our Lemma 7.4.3 improves (enlarges) the range for ρ, and under the same computational cost. The implications of that are twofold (see Theorems 7.4.5 and 7.4.6 that follow): in the semilocal case the uniqueness ball is more precise, and in the local case the radius of convergence is enlarged.
We will need our result on majorizing sequences for the Secant method. The proof using conditions (C₁)–(C₃) can be found in [94], whereas for the well-known condition (C₄) see e.g. [306]. Detailed comparisons between conditions (C₁)–(C₄) were given in [93]. In particular, if strict inequality holds in (7.3.6) (first inequality), the error bounds under (C₁)–(C₃) are more precise and the limit of the majorizing sequence more accurate than under condition (C₄). Note also that these conditions are obtained as special cases of Lemma 5.1.1.
Lemma 7.4.4 Assume for η ≥ 0, c ≥ 0, d₀ > 0:
(C₁) for all n ≥ 0 there exists δ ∈ [0, 1) such that:
ℓδⁿ(1 + δ)η + (δℓ₀/(1 − δ))(2 − δ^{n+2} − δ^{n+1})η + δℓ₀c ≤ δd₀
and
(δℓ₀/(1 − δ))(2 − δ^{n+2} − δ^{n+1})η + δℓ₀c < d₀;
or
(C₂) there exists δ ∈ [0, 1) such that:
ℓ(1 + δ)η + 2δℓ₀η/(1 − δ) + δℓ₀c ≤ δd₀
and
2δℓ₀η/(1 − δ) + δℓ₀c < d₀;
or
(C₃) there exist a ∈ [0, 1] and
δ ∈ [0, (−1 + √(1 + 4a))/(2a)) if a ≠ 0,   δ ∈ [0, 1) if a = 0,
such that
(ℓ + δℓ₀)(c + η) ≤ d₀δ, η ≤ δc and ℓ₀ ≤ aℓ;
or
(C₄) d₀⁻¹ℓc + 2√(d₀⁻¹ℓη) ≤ 1, for ℓ₀ = ℓ.
Then,
(a) iteration {t_n} (n ≥ −1) given by
t₋₁ = 0, t₀ = c, t₁ = c + η,
t_{n+2} = t_{n+1} + d₀⁻¹ℓ(t_{n+1} − t_{n−1})(t_{n+1} − t_n) / (1 − d₀⁻¹ℓ₀[t_{n+1} − t₀ + t_n]),   (7.4.17)
is non-decreasing, bounded above by
r = η/(1 − δ) + c,
and converges to some t* such that 0 ≤ t* ≤ r. Moreover, the following estimates hold for all n ≥ 0:
0 ≤ t_{n+2} − t_{n+1} ≤ δ(t_{n+1} − t_n) ≤ δ^{n+1}η.
(b) Iteration {s_n} (n ≥ −1) given by
s₋₁ − s₀ = c, s₀ − s₁ = η,
s_{n+1} = s_{n+2} + d₀⁻¹ℓ(s_{n−1} − s_{n+1})(s_n − s_{n+1}) / (1 − d₀⁻¹ℓ₀[(s₀ + s₋₁) − (s_n + s_{n+1})]),   (7.4.18)
provided that s₋₁ ≥ 0, s₀ ≥ 0, s₁ ≥ 0, is non-increasing, bounded below by s given by
s = s₀ − η/(1 − δ),
and converges to some s*.
275
Moreover, the following estimates hold for all n ≥ 0: 0 ≤ sn+1 − sn ≤ δ(sn − sn+1 ) ≤ δ n+1 η. We denote by (C), Conditions (C1 ) or (C2 ) or (C3 ) or (C4 ). We can state and prove the main semilocal convergence result for the Secant method involving a (PBA). Theorem 7.4.5 Let X and Y be Banach spaces, D a closed convex subset of X, x−1 , and x0 ∈ D with kx−1 − x0 k ≤ c, and F a continuous operator from D into Y . Suppose operator F has a (PBA) on D at the point x0 . Moreover assume: δ(A(x−1 , x0 , ·), D) ≥ d0 > 0; Condition (C) holds; for each y ∈ U (x0 , d0 (t∗ − t1 ) the equation A(x−1 , x0 , x) = y has a solution x; the solution S(x−1 , x0 ) of A(x−1 , x0 , S(x−1 , x0 )) = 0 satisfies kS(x−1 , x0 ) − x0 k ≤ η; and U (x0 , t∗ ) ⊆ D. Then the Secant iteration defining xn+1 by A(xn−1 , xn , xn+1 ) = 0
(7.4.19)
remains in U (x0 , t∗ ), and converges to a solution x∗ ∈ U (x0 , t∗ ) of equation F (x) = 0. Moreover the following estimates hold for all n ≥ 0: kxn+1 − xn k ≤ tn+1 − tn
(7.4.20)
kxn − x∗ k ≤ t∗ − tn ,
(7.4.21)
and
where sequence {tn } is defined by (7.4.17) and t∗ = lim tn . n→∞
Proof. We use Lemma 7.4.2 with quantities F , G, x0 and y0 replaced by A(w, x, ·), A(x0 , x1 , ·) − A(v, x, ·), x1 = S(x−1 , x0 ), and 0 respectively. Hypothesis (7.4.8) of the Lemma follows from the fact that A(x−1 , x0 , x) = y has a unique solution x1 (since d0 = δ(A(x−1 , x0 , ·), D) > 0). For hypothesis (7.4.9) we have using (7.4.2) δ(A(x0 , x1 , ·), D) ≥ δ(A(x−1 , x0 , ·), D) − ℓ0 (kx0 − x−1 k + kx1 − x0 k) ≥ d0 − ℓ0 t1 > 0
by (C)).
(7.4.22)
276
7. Equations with Nonsmooth Operators
To show (7.4.10) we must have U (x1 , t∗ −t1 ) ⊆ D. Instead by hypothesis U (x0 , t∗ ) ⊆ D, it suffices to show: U (x1 , t∗ − t1 ) ⊂ U (x0 , t∗ ) which is true since kx1 − x0 k + t∗ − t1 ≤ t∗ . We also have by (7.4.1) kA(x−1 , x0 , x1 ) − A(x1 , x1 , x1 )k ≤ ℓ(t1 − t−1 )(t1 − t0 ).
(7.4.23)
It follows by (7.4.11) and (7.4.23) that θ0 ≥ 0 if
[1 − ℓ0 d1^{-1}(c + η)] d0 η − ℓ(t1 − t−1)(t1 − t0) ≥ 0,
(7.4.24)
which is true by (7.4.17) and since d0 ≤ d1 = δ0(A(x−1, x0, ·), D). It follows from Lemma 7.4.2 that for each y ∈ U(0, r − ‖x1 − x0‖) the equation A(x0, x1, z) = y has a unique solution, since δ(A(x0, x1, ·), D) > 0. We also have A(x0, x1, x2) = A(x−1, x0, x1) = 0. By Definition 7.1.2, the induction hypothesis, (7.4.1) and (7.4.17) we get in turn:
‖x2 − x1‖ ≤ δ(A(x0, x1, ·), D)^{-1} ‖A(x−1, x0, x1) − F(x1)‖
≤ ℓ‖x−1 − x0‖ ‖x0 − x1‖ / (d0 − ℓ0(‖x0 − x−1‖ + ‖x1 − x0‖))
≤ d0^{-1} ℓ(t1 − t−1)(t1 − t0) / (1 − d0^{-1} ℓ0 t1) = t2 − t1.
(7.4.25)
Hence we showed:
‖xn+1 − xn‖ ≤ tn+1 − tn,
(7.4.26)
U (xn+1 , t∗ − tn+1 ) ⊂ U (xn , t∗ − tn )
(7.4.27)
and
hold for n = 0, 1. Moreover, for every v ∈ U(x1, t∗ − t1),
‖v − x0‖ ≤ ‖v − x1‖ + ‖x1 − x0‖ ≤ t∗ − t1 + t1 − t0 = t∗ − t0,
which implies v ∈ U(x0, t∗ − t0). Given that they hold for n = 0, 1, . . . , j, then
‖xj+1 − x0‖ ≤ Σ_{i=1}^{j+1} ‖xi − xi−1‖ ≤ Σ_{i=1}^{j+1} (ti − ti−1) = tj+1 − t0.
The induction for (7.4.26) and (7.4.27) can easily be completed by simply replacing x−1, x0, x1 by xn−1, xn, xn+1, respectively. Indeed, corresponding to (7.4.24) we have:
[1 − ℓ0 d1^{-1}(tn+1 − t0 + tn)] d0 (tn+2 − tn+1) − ℓ(tn+1 − tn−1)(tn+1 − tn) ≥ 0, (7.4.28)
which is true by (7.4.17). The scalar sequence {tn} is Cauchy. From (7.4.26) and (7.4.27) it follows that {xn} is Cauchy too in the Banach space X, and as such it converges to some x∗ ∈ U(x0, t∗). Moreover we have by (7.4.1) and (7.4.26)
‖F(xn+1)‖ = ‖F(xn+1) − A(xn−1, xn, xn+1)‖
≤ ℓ‖xn − xn−1‖ ‖xn − xn+1‖ ≤ ℓ(tn − tn−1)(tn+1 − tn) → 0 as n → ∞.
By the continuity of F we deduce F(x∗) = 0. Finally, estimate (7.4.21) follows from (7.4.20) by using standard majorization techniques.
Remark 7.4.2 The uniqueness of the solution x∗ was not considered in Theorem 7.4.5. Indeed, we do not know if under the conditions stated above the solution x∗ is unique, say in U(x0, t∗). However, using Lemma 7.4.3 we can obtain a uniqueness result: if ρ satisfies t∗ < ρ and U(x0, ρ) ⊆ D, then x∗ is the unique solution of equation F(x) = 0 in U(x0, ρ).
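For the smooth scalar case the iteration (7.4.19) reduces to the classical secant method, with A(u, v, x) = F(v) + [u, v; F](x − v) built from a divided difference. The sketch below (the test equation, starting points and tolerance are illustrative choices, not taken from the text) shows the resulting recursion:

```python
def secant(F, x_prev, x0, tol=1e-12, max_iter=50):
    """Secant iteration: x_{n+1} solves A(x_{n-1}, x_n, x) = 0 with
    A(u, v, x) = F(v) + [u, v; F](x - v), [u, v; F] = (F(u) - F(v))/(u - v)."""
    xs = [x_prev, x0]
    for _ in range(max_iter):
        u, v = xs[-2], xs[-1]
        dd = (F(u) - F(v)) / (u - v)   # divided difference [u, v; F]
        x_new = v - F(v) / dd          # zero of the linear model A(u, v, .)
        xs.append(x_new)
        if abs(x_new - v) <= tol:
            break
    return xs

# illustrative equation: F(x) = x**3 - 0.25 with x_{-1} = 0.9, x_0 = 1.0
iterates = secant(lambda x: x**3 - 0.25, 0.9, 1.0)
```

Under the hypotheses of Theorem 7.4.5 the distances between consecutive iterates are dominated by the majorizing differences tn+1 − tn, as in (7.4.20).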
7.5.1
An Exercise
We need the following definition of a point-based approximation for p-Hölderian operators:
Definition 7.5.1 Let f be an operator from a closed subset D of a metric space (X, m) to a normed linear space Y, and let x0 ∈ D, p ∈ (0, 1]. We say f has a point-based approximation (PBA) on D at x0 ∈ D if there exist an operator A : D × D → Y and scalars ℓ, ℓ0 such that for each u and v in D,
‖f(v) − A(u, v)‖ ≤ ℓ m(u, v)^p,
(7.5.1)
‖[A(u, x) − A(v, x)] − [A(u, y) − A(v, y)]‖ ≤ 2ℓ m(u, v)^p,
(7.5.2)
‖[A(u, x) − A(x0, x)] − [A(u, y) − A(x0, y)]‖ ≤ 2ℓ0 m(u, x0)^p
(7.5.3)
and
for all x, y ∈ D. Assume X is also a normed linear space and D is a convex set. Then this definition is suitable for operators f whose Fréchet derivative f′ is (p − 1)-Hölderian with modulus pℓ and (p − 1)-center-Hölderian with modulus pℓ0. Indeed, if we set
A(u, v) = f(u) + f′(u)(v − u),
then (7.5.1) says
‖f(v) − f(u) − f′(u)(v − u)‖ ≤ ℓ‖v − u‖^p,
whereas parts (7.5.2) and (7.5.3) are equivalent to the Hölderian and center-Hölderian property of f′.
Definition 7.5.2 Let f be an operator from a metric space (X, m) into a normed linear space Y. We let
δ(f, X) = inf{ ‖f(u) − f(v)‖ / m(u, v)^p : u ≠ v, u, v ∈ X }. (7.5.4)
Note that by (7.5.4) it follows that f^{-1} (if it exists) is (1/p)-Hölderian with modulus δ(f, X)^{-1/p}. We also define
δ0(f, X) = inf{ ‖f(u) − f(x0)‖ / m(u, x0)^p : u ≠ x0, u ∈ D }.
Set δ = δ(f, D) and δ1 = δ0(f, D).
Lemma 7.5.3 Let X be a Banach space, D a closed subset of X, and Y a normed linear space. Let f, g be operators from D into Y, g being p-Hölderian with modulus ℓ and center p-Hölderian with modulus ℓ0. Let x0 ∈ D with f(x0) = y0. Assume that:
U(y0, α) ⊆ f(D);
(7.5.5)
0 ≤ ℓ < δ;
(7.5.6)
and
U(x0, (δ1^{-1}α)^{1/p}) ⊆ D; (7.5.7)
and
θ = (1 − ℓ0 δ1^{-1})α − ‖g(x0)‖ ≥ 0. (7.5.8)
Then show the following hold:
(f + g)(U(x0, (δ1^{-1}α)^{1/p})) ⊇ U(y0, θ) (7.5.9)
and
δ(f + g, D) ≥ δ − ℓ > 0.
(7.5.10)
Lemma 7.5.4 Let X and Y be normed linear spaces, and let D be a closed subset of X. Let f : D → Y, and let A be a PBA for f on D at x0 ∈ D. Denote by δ0 the quantity δ(A(x0, ·), D). If U(x0, ρ) ⊆ D, then show
δ(f, U(x0, ρ)) ≥ δ0 − 2ℓ0 − ℓ/2^{p−1}.
In particular, if δ0 > 2ℓ0 + ℓ/2^{p−1}, then show f is one-to-one on U(x0, ρ).
7.5. Exercises
We need the following result on fixed points (see also Section 3.3):
Theorem 7.5.5 Let Q : D ⊂ X → X be a continuous operator, let p ∈ [0, 1), q ≥ 0 and x0 ∈ D be such that
‖Q(x) − Q(y)‖ ≤ q‖x − y‖^p for all x, y ∈ D, (7.5.11)
d ∈ [0, 1), (7.5.12)
and
U(x0, r) ⊆ D, (7.5.13)
where,
b = q^{1/(1−p)}, (7.5.14)
d = b^{-1}‖x0 − Q(x0)‖,
and
r = b / (1 − d^p). (7.5.15)
Then show: the sequence {xn} (n ≥ 0) generated by successive substitutions
xn+1 = Q(xn) (n ≥ 0) (7.5.16)
converges to a fixed point x∗ ∈ U(x0, r) of operator Q, so that for all n ≥ 0:
‖xn+1 − xn‖ ≤ d^{p^n} b, (7.5.17)
and
‖xn − x∗‖ ≤ d^{p^n} b / (1 − d^p). (7.5.18)
Moreover if
d1 = b^{-1}(‖x0 − Q(x0)‖ + qr) < 1, (7.5.19)
or
q^{1/p} r < 1, (7.5.20)
then show x∗ is the unique fixed point of Q in U(x0, r). If instead of (7.5.12) and (7.5.19) we assume
h = 2d^p < 1, (7.5.21)
and
2b ≤ q^{−1/p}, (7.5.22)
then the conclusions of Theorem 7.5.5 hold in the ball U (x0 , r0 ) where r0 = 2b.
(7.5.23)
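As an illustration of Theorem 7.5.5, the operator Q(x) = q x^p on the nonnegative reals is p-Hölderian with modulus q, and its positive fixed point is exactly b = q^{1/(1−p)} from (7.5.14). The sketch below (the values of p, q and x0 are illustrative choices) generates the successive substitutions (7.5.16):

```python
def successive_substitutions(Q, x0, n_steps=60):
    """x_{n+1} = Q(x_n): the method of successive substitutions (7.5.16)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(Q(xs[-1]))
    return xs

p, q = 0.5, 0.5
b = q ** (1.0 / (1.0 - p))   # b = q^{1/(1-p)} as in (7.5.14); here b = 0.25
Q = lambda x: q * x ** p     # p-Hoelder with modulus q on x >= 0
xs = successive_substitutions(Q, 0.2)
d = abs(0.2 - Q(0.2)) / b    # d = b^{-1} |x0 - Q(x0)|, cf. (7.5.12)
```

The iterates converge to the fixed point x∗ = b = 0.25, and the steps |x_{n+1} − x_n| obey the bound d^{p^n} b of (7.5.17).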
We set ‖x0 − Q(x0)‖ ≤ η.
(7.5.24)
We can state the main semilocal convergence result for Newton’s method involving a p-(PBA) approximation for f . Theorem 7.5.6 Let X and Y be Banach spaces, D a closed convex subset of X, x0 ∈ D, and F a continuous operator from D into Y . Suppose that F has a p-(PBA) approximation at x0 . Moreover assume: δ(A(x0 , ·), D) ≥ δ0 > 0, η ≤ b, 2ℓ0 < δ0 ,
(7.5.25)
(1 − 2ℓ0 δ1^{-1} r0) δ0 η − ℓ η^p ≥ 0
(7.5.26)
and conditions (7.5.8), (7.5.21), (7.5.22) are satisfied for α = δ0η and q = ℓ(δ0 − 2ℓ0)^{-1}, respectively, where δ1 = δ0(A(x0, ·), D); for each y ∈ U(0, δ0η) the equation A(x0, x) = y has a solution x; the solution T(x0) of A(x0, T(x0)) = 0 satisfies ‖x0 − T(x0)‖ ≤ η; and U(x0, r0) ⊆ D, where r0 is given by (7.5.23). Then show: the Newton iteration defining xn+1 by
A(xn, xn+1) = 0
(7.5.27)
remains in U(x0, r0), and converges to a solution x∗ ∈ U(x0, r0) of equation F(x) = 0, so that estimates (7.5.17) and (7.5.18) hold. The uniqueness of the solution x∗ was not considered in Theorem 7.5.6. Using Lemma 7.5.4 show: we can obtain a uniqueness result, so that if r0 < ρ and
δ0 > 2ℓ0 + ℓ/2^{p−1},
then operator F is one-to-one in a neighborhood of x∗, since x∗ ∈ U(x0, r0). That is, show: x∗ is an isolated zero of F in this case.
7.5.2
Another Exercise
With the notations introduced in the previous subsection, assume p > 1. We need the following result on fixed points (see also Section 3.3):
Theorem 7.5.7 Let Q : D ⊂ X → X be an operator, p, q scalars with p > 1, q ≥ 0, and x0 a point in D such that
‖Q(x) − Q(y)‖ ≤ q‖x − y‖^p for all x, y ∈ D; (7.5.28)
the equation
2^{p−1} q r^p − r + ‖x0 − Q(x0)‖ = 0 (7.5.29)
has a unique positive solution r in I = [‖x0 − Q(x0)‖, (1/2) q^{1/(1−p)}]; and U(x0, r) ⊆ D.
Then show: sequence {xn } (n ≥ 0) generated by successive substitutions xn+1 = Q(xn )
(n ≥ 0)
(7.5.30)
converges to a unique fixed point x∗ ∈ U(x0, r) of operator Q, so that for all n ≥ 1
‖xn+1 − xn‖ ≤ d‖xn − xn−1‖ ≤ d^n ‖x0 − Q(x0)‖ (7.5.31)
and
‖xn − x∗‖ ≤ d^n ‖x0 − Q(x0)‖ / (1 − d), (7.5.32)
where,
d = (2r)^{p−1} q.
(7.5.33)
Remark 7.5.1 Conditions can be given to guarantee the existence and uniqueness of r. Indeed, define the scalar function h by
h(t) = 2^{p−1} q t^p − t + η, ‖x0 − Q(x0)‖ ≤ η. (7.5.34)
By the intermediate value theorem show (7.5.29) has a solution r in I if
h((1/2) q^{1/(1−p)}) ≤ 0 (7.5.35)
or if
η ≤ (1/2) q^{1/(1−p)} (1 − q^{-1}) = η0
(7.5.36)
and q ≥ 1.
(7.5.37)
This solution is unique if η ≤ η1
(7.5.38)
where,
η1 = min{η0, (1/2)(pq)^{1/(1−p)}}.
(7.5.39)
Indeed, if (7.5.37) and (7.5.38) hold, it follows that h′(t) ≤ 0 on I1 = [η, η1] ⊂ I. Therefore h crosses the t-axis only once (since h is nonincreasing on I1). Set
η̄ = min{η1, η2}
(7.5.40)
where,
η2 = (2^p q)^{1/(1−p)}.
(7.5.41)
It follows from (7.5.34) and (7.5.40) that if
η ≤ η̄, (7.5.42)
then
r ≤ 2η = r0. (7.5.43)
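The radius r of (7.5.29), together with the bound r ≤ 2η = r0 of (7.5.43), can be checked numerically. The sketch below brackets the smallest positive root of h(t) = 2^{p−1} q t^p − t + η between η (where h > 0) and the stationary point of h, and then bisects; using the stationary point as the upper bracket, and the sample values of p, q, η, are our own illustrative choices:

```python
def radius_r(p, q, eta, tol=1e-14):
    """Smallest positive root of h(t) = 2**(p-1)*q*t**p - t + eta, cf. (7.5.29)."""
    h = lambda t: 2.0 ** (p - 1) * q * t ** p - t + eta
    lo = eta                                            # h(eta) > 0
    hi = (2.0 ** (p - 1) * q * p) ** (1.0 / (1.0 - p))  # stationary point of h
    if h(hi) > 0:
        raise ValueError("h has no sign change: existence condition fails")
    while hi - lo > tol:            # bisection; h decreases on [lo, hi]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

r = radius_r(p=2, q=1.0, eta=0.1)   # h(t) = 2 t**2 - t + 0.1, smallest root
```

Here r ≈ 0.138 ≤ 2η = 0.2, in agreement with (7.5.43).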
We can state the main semilocal convergence theorem for Newton’s method involving a p-(PBA) (p > 1) approximation for f . Theorem 7.5.8 Let X and Y be Banach spaces, D a closed convex subset of X, x0 ∈ D, and F a continuous operator from D into Y . Suppose that F has a p-(PBA) approximation at x0 . Moreover assume: δ(A(x0 , ·), D) ≥ δ0 > 0; 2ℓ0 < δ0 ,
(1 − 2ℓ0 δ1^{-1} r0) δ0 η − ℓ η^p ≥ 0
(7.5.44)
where δ1 = δ0(A(x0, ·), D); (7.5.32) in Lemma 7.1.3 and conditions (7.5.37), (7.5.42) in Remark 7.5.1 hold for α = δ0η and q = ℓ(δ0 − 2ℓ0)^{-1}; for each y ∈ U(0, δ0η) the equation A(x0, x) = y has a solution x; the solution T(x0) of A(x0, T(x0)) = 0 satisfies ‖x0 − T(x0)‖ ≤ η; and U(x0, r0) ⊆ D, where r0 is given by (7.5.43). Then show: the Newton iteration remains in U(x0, r0), and converges to a solution x∗ ∈ U(x0, r0) of equation F(x) = 0, so that estimates (7.5.31) and (7.5.32) hold.
Remark 7.5.2 Our Theorem 7.5.8 compares favorably with Theorem 3.2 in [302, p. 298]. First of all, the latter theorem cannot be used when e.g. p ∈ [1, 2) (see the example that follows). In the case p = 2 our condition (7.5.44) becomes, for δ0 = δ1,
h0 = δ0^{-1}(ℓ + 4ℓ0)η ≤ 1
(7.5.45)
whereas the corresponding one in [302] becomes
h = 4δ0^{-1}ℓη ≤ 1.
(7.5.46)
Clearly (7.5.45) is weaker than (7.5.46) if
ℓ/ℓ0 > 4/3. (7.5.47)
But ℓ/ℓ0 can be arbitrarily large. Therefore our Theorem 7.5.8 can be used in cases when Theorem 3.2 in [302] cannot when p = 2.
Example 7.5.1 Let X = ℝ^{m−1}, m ≥ 2 an integer, and define the matrix operator Q on X by
Q(z) = M + M1(z), z = (z1, z2, . . . , zm−1), (7.5.48)
where M is a real (m − 1) × (m − 1) matrix,
(M1(z))_{ij} = 0 for i ≠ j, and (M1(z))_{ii} = z_i^p,
and e.g. p ∈ [1, 2). Operators Q of the form (7.5.48) appear in many discretization studies in connection with the solution of two-point boundary value problems [68]. Clearly, no matter how the operator A is chosen, the conditions in Definition 2.1 in [302, p. 293] cannot hold, whereas our result can apply.
Chapter 8
Applications of the weaker version of the Newton-Kantorovich theorem Diverse further applications of the results in Chapter 5 are provided here.
8.1
Comparison with Moore Theorem
In this section we are concerned with the problem of approximating a locally unique solution x∗ of the equation
F(x) = 0, (8.1.1)
where F : D ⊆ ℝ^k → ℝ^k is continuously differentiable on an open convex set D, and k is a positive integer. Rall in [290] compared the theorems of Kantorovich and Moore. This comparison showed that the Kantorovich theorem has only a slight advantage over the Moore theorem with regard to sensitivity and precision, while the latter requires less computational cost. Later, Neumaier and Shen [255] showed that when the derivative in the Krawczyk operator is replaced with a suitable slope, the corresponding existence theorem is at least as effective as the Kantorovich theorem with respect to sensitivity and precision. At the same time Zuhe and Wolfe [369] showed that the hypotheses in the affine invariant form of the Moore theorem are always implied by those of the Newton-Kantorovich theorem 4.2.4, but not necessarily vice versa. Here we show that this implication is not true in general for a weaker version of Theorem 4.2.4. We will need the following semilocal convergence theorem for Newton's method, due to Deuflhard and Heindl [143]:
Theorem 8.1.1 Let F be a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in a Banach space Y. Suppose that x0 ∈ D is such that the Newton-Kantorovich hypothesis (4.2.51) holds and:
and
where,
Then the sequence {xn} generated by method (4.1.3) is well defined, remains in U(x0, r1) for all n ≥ 0, and converges to a solution x∗ of equation F(x) = 0, which is unique in Ū(x0, r1) ∪ (D ∩ U(x0, r2)), where,
Moreover, the following estimates hold for all n ≥ 0:
and
where
Remark 8.1.1 In the case of Moore's theorem [240], suppose that the hypotheses of Theorem 8.1.1 are valid with X = Y = ℝ^n, x0 = z = mid(z̄γ), where z̄γ ∈ I(D) is given by

and define the Krawczyk operator
where
w = z − F′(z)^{-1}F(z),
and
F[z, z̄γ] = ∫₀¹ F′(z + t(z̄γ − z)) dt,
in which integration is defined as in [240]. The following result relates a weaker version of Theorem 8.1.1 with Moore's theorem.
Theorem 8.1.2 Assume for x0 = z = mid(z̄γ): hypotheses (8.1.2) and (8.1.3) of Theorem 8.1.1 hold;
‖F′(x0)^{-1}[F′(x) − F′(x0)]‖ ≤ ℓ0‖x − x0‖ for all x ∈ D;
(8.1.15)
h̄ = 2ℓ0η ≤ 1;
(8.1.16)
Ū(x0, γ) ⊆ D,
(8.1.17)
and
where,
K(z̄γ) ⊆ z̄γ, (8.1.19)
where z̄γ and K are given by (8.1.11) and (8.1.12)-(8.1.14), respectively.
Proof. If u ∈ K(z̄γ), then u = w + v with
v = {I − F′(z)^{-1}F[z, z̄γ]}(z̄γ − z)
(by (8.1.13) and (8.1.14)). We have in turn
‖z − u‖∞ ≤ ‖z − w‖∞ + ‖v‖∞ ≤ η + γ‖F′(z)^{-1}{F[z, z̄γ] − F′(z)}‖∞ ≤ η + (1/2)ℓ0γ².
Hence (8.1.19) holds if
η + (1/2)ℓ0γ² ≤ γ, (8.1.21)
which is true by (8.1.16)-(8.1.18). ∎
Remark 8.1.2 (a) Note that Moore's theorem [240] guarantees the existence of a solution x∗ of equation (8.1.1) if (8.1.19) holds.
(b) Theorem 8.1.2 reduces to Theorem 2 in [369] if ℓ0 = ℓ. However, in general
ℓ0 ≤ ℓ
holds. Note also that the hypotheses of Theorem 2 in [369] imply those of Theorem 8.1.2,
but not vice versa, unless ℓ0 = ℓ. Hence our Theorem 8.1.2 weakens Theorem 2 in [369], and can be used in cases not covered by the latter. An example was given in [255] and [369] to show that the hypotheses of the affine invariant form of the Moore theorem may be satisfied even when those of Theorem 8.1.1 are not.
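In one dimension the Krawczyk/Moore test can be carried out with elementary interval arithmetic. The sketch below (the pair-based interval representation and the sample problem are our own illustrative choices) forms K(X) = w + (1 − f′(X)/f′(z))(X − z), with z = mid(X), and checks the inclusion K(X) ⊆ X that guarantees a zero of f in X:

```python
def i_add(a, b): return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(p), max(p))

def i_subset(a, b):          # is the interval a contained in b?
    return b[0] <= a[0] and a[1] <= b[1]

def krawczyk(f, fprime_range, X):
    """K(X) = w + (1 - f'(X)/f'(z)) (X - z), w = z - f(z)/f'(z), z = mid(X)."""
    z = 0.5 * (X[0] + X[1])
    dz = fprime_range((z, z))[0]          # f'(z); assumed positive below
    w = z - f(z) / dz                     # one Newton step from the midpoint
    s = fprime_range(X)                   # interval enclosure of f' over X
    bracket = (1.0 - s[1] / dz, 1.0 - s[0] / dz)
    return i_add((w, w), i_mul(bracket, (X[0] - z, X[1] - z)))

# f(x) = x**3 - 0.9 on X = [0.9, 1.1]; f'(x) = 3x**2 is monotone for x > 0.
f  = lambda x: x**3 - 0.9
fp = lambda X: (3 * X[0]**2, 3 * X[1]**2)
X  = (0.9, 1.1)
K  = krawczyk(f, fp, X)
```

Here i_subset(K, X) holds, so Moore's theorem yields a zero of f in X.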
Example 8.1.1 (used also in Section 5.1) Let k = 1, a ∈ [0, 1/2), x ∈ z̄γ = [a, 2 − a], and define the function F : z̄γ → ℝ by
F(x) = x³ − a.
Choose z = mid(z̄γ) = 1; then (4.2.51) becomes
h = (4/3)(2 − a)(1 − a) ≤ 1,
which fails for all a ∈ [0, 1/2). That is, there is no guarantee that the method generated by (4.1.3) converges to a solution of F(x) = 0, since the Newton-Kantorovich hypothesis (4.2.51) is violated. However, using (8.1.11)-(8.1.14) and (8.1.25) we get
and if
then (8.1.19) holds. That is, Moore's theorem guarantees convergence of method (4.1.3) to a solution x∗ of equation F(x) = 0, provided that (8.1.28) holds. However, we can do better. Indeed, since ℓ0 = 3 − a,
which improves (8.1.28).
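The improvement can be made concrete. Only ℓ0 = 3 − a is stated above; the values η = (1 − a)/3 and ℓ = 2(2 − a) used below are the constants customarily attached to this example in Section 5.1 and are assumptions of this sketch. For a = 0.45 the Kantorovich quantity exceeds 1 while the center-Lipschitz quantity of (8.1.16) does not:

```python
def conditions(a):
    """Kantorovich h = 2*ell*eta vs. the weaker hbar = 2*ell0*eta of (8.1.16)
    for F(x) = x**3 - a on [a, 2 - a], x0 = 1 (assumed constants, see text)."""
    eta  = (1.0 - a) / 3.0      # |F(1)| / F'(1)
    ell  = 2.0 * (2.0 - a)      # scaled Lipschitz constant of F'
    ell0 = 3.0 - a              # scaled center-Lipschitz constant (as in text)
    return 2.0 * ell * eta, 2.0 * ell0 * eta

h, hbar = conditions(0.45)      # h ~ 1.137 > 1, while hbar ~ 0.935 <= 1
```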
Remark 8.1.3 If (4.2.51) holds as a strict inequality, method (4.1.3) converges quadratically to x∗. However, (8.1.17) guarantees only the linear convergence to x∗ of the modified method
In practice the quadratic convergence of method (4.1.3) is desirable. In Section 5.1 we found a condition (see condition (5.1.56)) weaker than (4.2.51) but stronger than (8.1.16), so that the quadratic convergence of method (4.1.3) is guaranteed.
8.2
Miranda Theorem
In this section we are concerned with the problem of approximating a locally unique solution x∗ of an equation
F(x) = 0, (8.2.1)
where F is defined on an open convex subset S of ℝⁿ (n a positive integer) with values in ℝⁿ. We showed in Section 5.1 that the Newton-Kantorovich hypothesis (4.2.51) can always be replaced by the weaker (5.1.56). Here we first weaken the generalization of Miranda's theorem (Theorem 4.3 in [227]). Then we show that operators satisfying the weakened Newton-Kantorovich conditions satisfy those of the weakened Miranda theorem. In order to compare our results with earlier ones, we need to list the following theorems guaranteeing the existence of the solution x∗ of equation (8.2.1) (see Chapter 4).
Theorem 8.2.1 Let F : S → ℝⁿ be a Fréchet-differentiable operator. Assume: there exists x0 ∈ S such that F′(x0)^{-1} ∈ L(Y, X), and set

there exists an η ≥ 0 such that

there exists an ℓ > 0 such that

and

where,

Then there exists a unique solution x∗ ∈ Ū(x0, r∗) of equation F(x) = 0.
Remark 8.2.1 Theorem 8.2.1 is the existence portion of the Newton-Kantorovich theorem (4.2.4). The following theorem is due to Miranda [236]; it is a generalization of the intermediate value theorem:
Theorem 8.2.2 Let b > 0, x0 ∈ ℝⁿ, and define
Q = {x ∈ ℝⁿ : ‖x − x0‖∞ ≤ b},
Qk± = {x ∈ Q : xk = x0,k ± b}, k = 1, 2, . . . , n.
Let F = (Fk) (1 ≤ k ≤ n) : Q → ℝⁿ be a continuous operator satisfying, for all k = 1, 2, . . . , n,
Fk(x) ≥ 0 for all x ∈ Qk+ and Fk(x) ≤ 0 for all x ∈ Qk−.
Then there exists x∗ ∈ Q such that F(x∗) = 0.
The following result connects Theorems 8.2.1 and 8.2.2.
Theorem 8.2.3 Suppose F : S → ℝⁿ satisfies all the hypotheses of Theorem 8.2.1 in the maximum norm; then G satisfies the conditions of Theorem 8.2.2 on Ū(x0, r∗).
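Theorem 8.2.2 is easy to test numerically on a sample problem: evaluate Fk on a grid over each pair of opposite faces of the cube and check the sign conditions. The sketch below is a finite sampling (hence only a check, not a proof), and the two-dimensional system is an illustrative choice:

```python
import itertools

def miranda_check(F, x0, b, grid=9):
    """Sample each face Q_k^{+-} of the cube {x : max_k |x_k - x0_k| <= b}
    and test the sign conditions F_k >= 0 on Q_k^+, F_k <= 0 on Q_k^-."""
    n = len(x0)
    ticks = [[xk - b + 2 * b * i / (grid - 1) for i in range(grid)] for xk in x0]
    for k in range(n):
        others = [ticks[j] for j in range(n) if j != k]
        for point in itertools.product(*others):
            for sign in (+1, -1):
                x = list(point)
                x.insert(k, x0[k] + sign * b)   # place x on the face Q_k^{sign}
                if sign * F(x)[k] < 0:
                    return False
    return True

# Illustrative system: F(x, y) = (x + 0.1*y, y - 0.1*x) near x0 = (0, 0):
F = lambda x: (x[0] + 0.1 * x[1], x[1] - 0.1 * x[0])
ok = miranda_check(F, (0.0, 0.0), 1.0)   # True, so F has a zero in the cube
```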
In the elegant study [227] a generalization of Theorem 8.2.3 was given (Theorem 4.3). We first weaken this generalization. Let ℝⁿ be equipped with a norm denoted by ‖·‖ and ℝ^{n×n} with a norm ‖·‖ such that ‖M x‖ ≤ ‖M‖ ‖x‖ for all M ∈ ℝ^{n×n} and x ∈ ℝⁿ. Choose constants c0, c1 > 0 such that for all x ∈ ℝⁿ
c0‖x‖∞ ≤ ‖x‖ ≤ c1‖x‖∞,
since all norms on finite dimensional spaces are equivalent. Set
Definition 8.2.4 Let S ⊆ ℝⁿ be an open convex set, and let G : S → ℝⁿ be a differentiable operator on S. Let x0 ∈ S, and assume:
G′(x0) = I (the identity matrix); (8.2.14)
there exists η ≥ 0 such that
‖G(x0)‖ ≤ η; (8.2.15)
there exists an ℓ0 ≥ 0 such that
‖G′(x) − G′(x0)‖ ≤ ℓ0‖x − x0‖ for all x ∈ S. (8.2.16)
Define:

We say that G satisfies the weak center-Kantorovich conditions in x0 if

We also say that G satisfies the strong center-Kantorovich conditions in x0 if

Moreover define

and
R = [r1, r2] for ℓ0 ≠ 0. Furthermore, if ℓ0 = 0, define

and
Remark 8.2.2 The weak and strong center-Kantorovich conditions are equivalent only for the maximum norm.
As in [196] we need to define certain concepts. Let r > 0, x0 ∈ ℝⁿ, and define
U(r) = {z ∈ ℝⁿ : ‖z‖ ≤ r}, (8.2.25)
U(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ U(r)}, (8.2.26)
Uk+(r) = {z ∈ ℝⁿ : ‖z‖ = r, zk = ‖z‖∞}, (8.2.27)
Uk−(r) = {z ∈ ℝⁿ : ‖z‖ = r, zk = −‖z‖∞}, (8.2.28)
Uk+(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ Uk+(r)}, (8.2.29)
Uk−(x0, r) = {x = x0 + z ∈ ℝⁿ : z ∈ Uk−(r)} (8.2.30)
for all k = 1, 2, . . . , n.
We show the main result:
Theorem 8.2.5 Let G : S → ℝⁿ be a differentiable operator defined on an open convex subset S of ℝⁿ. Assume G satisfies the strong center-Kantorovich conditions. Then, for any r ∈ R with U(x0, r) ⊆ S the following hold:
(a)
U = U(r) = U(x0, r) is a Miranda domain [227], (8.2.31)
and

(8.2.32)
is a Miranda partition [227] of the boundary ∂U. It is a canonical Miranda partition [227] for r > 0 and a trivial Miranda domain and partition for r = 0;
(b)
Gk(x) ≥ 0 for all x ∈ Uk+(x0, r), k = 1, . . . , n, (8.2.33)
and
Gk(x) ≤ 0 for all x ∈ Uk−(x0, r), k = 1, . . . , n; (8.2.34)
(c) G satisfies the Miranda conditions on (U, U1);
(d) if G(x0) = 0 and ℓ0 > 0, then G satisfies the Miranda conditions on (U, U1) for any r ∈ [0, 1/ℓ0] such that U(x0, r) ⊆ S.
Proof. The first point of the theorem follows exactly as in Theorem 4.3 in [227]. For the rest we follow the proof of Theorem 4.3 in [227] (which is essentially the reasoning of Theorem 8.2.3), but with some differences stressing the use of the center-Lipschitz condition (8.2.16) instead of the stronger Lipschitz condition (8.2.4), which is not really needed in the proof, although it was used in both proofs mentioned above. Using the intermediate value theorem for integration we first obtain the
identity
G(x0 + rz) − G(x0) = ∫₀¹ [G′(x0 + rtz) − G′(x0)] rz dt + rz ∫₀¹ dt (G′(x0) = I). (8.2.36)
Let ek denote the k-th unit vector. Then we have:
Gk(x0 + rz) = Gk(x0) + ∫₀¹ ekᵀ [G′(x0 + rtz) − G′(x0)] rz dt + r zk, (8.2.37)
and by (8.2.16)
|∫₀¹ ekᵀ [G′(x0 + rtz) − G′(x0)] rz dt| ≤ (1/2) ℓ0 r² ‖z‖². (8.2.38)
Let z ∈ Uk+(1). Using (8.2.38) and
zk ≥ 1/c2 for u ∈ Uk+(1), (8.2.40)
we get from (8.2.37)
for r ∈ R (by (8.2.19) and (8.2.22)). Similarly,
for r ∈ R. If G(x0) = 0, let η = 0, which implies h0 = 0. ∎
Remark 8.2.3 If ℓ = ℓ0, then our Theorem 8.2.5 becomes Theorem 4.3 in [227]. Moreover, if ‖·‖ is the maximum norm, then Theorem 8.2.5 becomes Theorem 8.2.3. However, in general
Hence the strong center-Kantorovich condition is such that
Similarly the Kantorovich condition (8.2.5) is such that
but not vice versa, unless ℓ0 = ℓ. If strict inequality holds in (8.2.44) and conditions (8.2.45) or (8.2.5) are not satisfied, then the conclusions of Theorem 4.3 in [227] do not necessarily hold. However, if (8.2.8) holds, the conclusions of our Theorem 8.2.5 hold.
Remark 8.2.4 Condition (8.2.5) guarantees the quadratic convergence of the Newton-Kantorovich method to x∗. However, this is not the case for condition (8.2.18). To rectify this, and still use a condition weaker than (8.2.5) (or (8.2.45)), define
We showed in Section 5.1 that if

and

replace (8.2.5), then the Newton-Kantorovich method converges to x∗ ∈ Ū(x0, r3) with r3 ≤ r∗. Moreover, finer error bounds on the distances between iterates or between
iterates and x∗ are obtained. If we restrict δ ∈ [0, 1], then (8.2.50) and (8.2.51) hold if only (8.2.49) is satisfied. Choose δ = 1 for simplicity in (8.2.49). Then again

but not vice versa, unless ℓ = ℓ0. Hence, if (8.2.18) is replaced by

then all conclusions of Theorem 8.2.5 hold.
8.3
Computation of Orbits
In this section we are concerned with the problem of approximating shadowing orbits for dynamical systems. It is well known in the theory of dynamical systems that actual computations of complicated orbits rarely produce good approximations to the trajectory. However, under certain conditions, the computed section of an orbit lies in the shadow of a true orbit. Hence, using product spaces and the results in Section 5.1, we show that the sufficient conditions for the convergence of Newton's method to a true orbit can be weakened under the same computational cost as in the elegant work by Hadeler in [177]. Moreover, the information on the location of the solutions is more precise and the corresponding error bounds are finer. Let f be a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in X. The operator f defines a local dynamical system as follows:

as long as xk ∈ D. A sequence {xi}, i = 0, . . . , N, in D with xi+1 = f(xi), i = 0, . . . , N − 1, is called an orbit. Any sequence {xm}, xm ∈ D, m = 0, . . . , N, is called a pseudo-orbit of length N. We can now pass to product spaces. Let Y = X^{N+1}, equipped with the maximum norm; the norm of x = (x0, . . . , xN) ∈ Y is given by

Set S = D^{N+1}.
Let F : S → Y be an operator associated with f:
F(x0, . . . , xN) = ( f(x0) − x1, . . . , f(xN) ).
Assume there exist constants ℓ0, ℓ, L0, L such that:

for all u, v ∈ D and u, v ∈ S. From now on we assume ℓ0 = L0 and ℓ = L. For y ∈ X define an operator
It follows that Fy′(x) = F′(x). As in [166] define the quantities α(·).

8.4

Continuation Methods

Lemma 8.4.1 Let D be an open subset of ℝⁿ. Assume H : J × D → ℝⁿ is an operator for which there exist sets I and U, with (t∗, x∗) ∈ I × U, on which the operator

for all (t, x) ∈ I × U is well defined, and G has a partial Fréchet derivative with respect to x at (t∗, x∗) given by
Moreover, if there exist positive constants ℓ0 and ℓ such that
and
for all (t, x), (t, y) ∈ I0 × U0, then the continuity of H at (t∗, x∗) and the strongness of Hx(t∗, x∗) imply the strongness of Gx(t∗, x∗).
Proof. Let us fix ε ∈ (0, 1) and set δ = ε/ℓ0, so that
and
‖A(t∗, x∗)^{-1}[A(t, x) − A(t∗, x∗)]‖ ≤ ℓ0 δ = ε < 1 for all (t, x) ∈ I × S. (8.4.8)
It follows by (8.4.8) and the Banach Lemma on invertible operators 1.3.2 that A(t, x)^{-1} exists, and
‖A(t, x)^{-1}A(t∗, x∗)‖ ≤ (1 − ε)^{-1} for all (t, x) ∈ I × S.
By restricting S if necessary we can have
(8.4.9)
and
‖A(t∗, x∗)^{-1}H(t, y)‖ ≤ ε for all (t, y) ∈ I × S.
(8.4.10)
Using (8.4.4)-(8.4.7), (8.4.9)-(8.4.10) and the strongness of Hx(t∗, x∗) we obtain in turn:
‖G(t, x) − G(t, y) − Gx(t∗, x∗)(x − y)‖
≤ ‖A(t∗, x∗)^{-1}[H(t, x) − H(t, y) − Hx(t∗, x∗)(x − y)]‖
+ ‖A(t∗, x∗)^{-1}[A(t, x) − A(t∗, x∗)] A(t, x)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}(H(t, x) − H(t, y))‖
+ ‖A(t, y)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}(A(t, x) − A(t, y)) A(t, x)^{-1}A(t∗, x∗) A(t∗, x∗)^{-1}H(t, y)‖
≤ ε a ‖x − y‖,
where,

which shows that Gx(t∗, x∗) is strong. The rest of the result follows by a classical lemma due to Ortega and Rheinboldt [263]. ∎
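The continuation idea analyzed in this section can be sketched numerically: march the parameter t over an equidistant grid and warm-start Newton's method at each new parameter value from the previous solution. The scalar model problem H(t, x) = x³ + x − t and the step counts below are hypothetical choices:

```python
def continuation_newton(H, Hx, x_start, N, newton_steps=4):
    """March t = k/N, k = 1..N; at each t run a few Newton corrector steps
    starting from the solution computed at the previous parameter value."""
    xs = [x_start]
    for k in range(1, N + 1):
        t = k / N
        x = xs[-1]                     # predictor: previous solution
        for _ in range(newton_steps):  # corrector: Newton at fixed t
            x -= H(t, x) / Hx(t, x)
        xs.append(x)
    return xs

H  = lambda t, x: x**3 + x - t
Hx = lambda t, x: 3 * x**2 + 1       # exact partial derivative H_x
path = continuation_newton(H, Hx, 0.0, N=10)
# path[-1] approximates the root of x**3 + x = 1 (about 0.6823)
```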
Remark 8.4.1 In general ℓ/ℓ0 can be arbitrarily large (for t sufficiently close to t∗). If ℓ = ℓ0 holds, Lemma 8.4.1 can be reduced to Lemma 3.1 in [103]. However, in that Lemma δ is chosen in an unspecified (in terms of ε) way, whereas in our proof it is specified (δ = ε/ℓ0). Moreover, if strict inequality holds, I, U are enlarged, a is smaller than the corresponding quantity, call it a1, given in [103, p. 109] by

where,

and the strongness of Gx(t∗, x∗) is established on a larger cross product I × S. The above benefits are achieved under the same hypotheses and computational cost, since in practice the computation of ℓ requires that of ℓ0. Furthermore, a wider choice of initial guesses to start Newton's method becomes available, since U is enlarged. We also note that our result is given in an affine invariant form. The advantages of such an approach have been explained in Section 5.1. Finally, we can see from the form of the Lemma that condition (8.4.7) can be replaced by the stronger
for all (t1, x), (t2, y) ∈ I0 × U0. Clearly ℓ ≤ ℓ1 and ℓ0 ≤ ℓ1, and ℓ1/ℓ0 can again be arbitrarily large. Under this modification t does not necessarily have to be close enough to t∗ for the conclusions of the Lemma to hold (with ℓ1 replacing ℓ throughout the statement and the proof of the Lemma). In the result that follows we provide a local convergence analysis for method (4.1.3):
Theorem 8.4.2 Let H : J × D ⊂ J × ℝⁿ → ℝⁿ, where D is an open subset of ℝⁿ and H has partial Fréchet derivatives with respect to both t and x. Assume x∗ : J → D is a continuous solution of the equation H(t, x) = 0, t ∈ J. It follows by the openness of D that there exists r0 > 0 such that
D0 = {v ∈ X : ‖x∗(t∗) − v‖ < r0 for some t∗ ∈ J} ⊆ D. (8.4.16)
Moreover assume: Ht is continuous on J × D0; L = Hx(t∗, x∗)^{-1} exists, and
‖LHt(t, v(t))‖ ≤ a for all t ∈ J;
‖L[Hx(t, x) − Hx(t, y)]‖ ≤ ℓ‖x − y‖
and

for all t ∈ J and for all x, y, v ∈ D0; and r0 satisfies r0 ≤ η. Moreover, let h1 be the largest value in [0, 1/(2aβ1²ℓ²)] such that

where,

Then, if the integer N is chosen so that
Newton's method
xk = xk−1 − Hx(tk, xk−1)^{-1} H(tk, xk−1), k = 1, . . . , N − 1, x0 = x(0);
xk = xk−1 − Hx(1, xk−1)^{-1} H(1, xk−1), k = N, N + 1, . . . (8.4.26)
is feasible.
Proof. By hypotheses and the Banach Lemma on invertible operators, Hx(t, x)^{-1} exists and

We then set ek = ‖xk − x(kΔt)‖, and exactly as in Theorem 4.3 in [103, p. 113] we get
Pie
a = - and ~ = @ ~ r r . 2
(8.4.29)
By hypotheses 1 - 4ac 2 0. Therefore it follows from Lemma 4.2 in [103, p. 1121 that
Remark 8.4.2 According to the hypotheses of Theorem 4.3 in [I 031 the parameter p in practice will be given b y
P= where
IIHZ* (t*,x*)-llI 1 - 11 Hz* (t*,x*)-l llezro '
e2 is the constant appearing in hypothesis (iii) in that theorem given by IIH,(t, X ) - H,(t, y)II 5 e211x - 911 for all t E J , and all x , y E Do.
Clearly
2 can be arbitrarily large. In particular if
then P1
< P,
(8.4.32)
which implies that

and h̄1 is the largest value in [0, 1/(2aβ1²ℓ²)] such that

where,
r1(h) = (1/(β1ℓ²)) (1 − (1 − 2aβ1²ℓ²h)^{1/2}).
Note also that in this case
That is, our approach, under the same hypotheses and computational cost, has the following advantages over the one given in Theorem 4.3 in [103]: a smaller number of steps; a larger convergence radius; finer upper bounds on the distances involved; and results given in affine invariant form. As a simple example where strict inequality holds in (8.4.12), consider
and define the operator H on J × D0 by
Then it can easily be seen using (8.4.18) and (8.4.19) that
The rest of the results in [103] can now be improved if rewritten with (β1, h1) replacing (β, h); see, e.g., Theorem 4.7 and Corollary 4.8.

8.5

Trust Regions
The approximate solution of an equation involves a complex problem: the choice of an initial approximation x0 sufficiently close to the true solution. The method of random choice is often successful. Another frequently used method is to replace
F ( x )= 0
(8.5.1)
by a "similar" equation and to regard the exact solution of the latter as the initial approximation x0. Of course, there are no general "prescriptions" for admissible
initial approximations. Nevertheless, one can describe various devices suitable for extensive classes of equations. As usual, let F map X into Y. To simplify the exposition, we shall assume that F is defined throughout X. Assume that G(x; λ) (x ∈ X; 0 ≤ λ ≤ 1) is an operator with values in Y such that
G(x; 1) = F x (x ∈ X), (8.5.2)
and the equation
G(x; 0) = 0 (8.5.3)
has an obvious solution x⁰. For example, the operator G(x; λ) might be defined by
G(x; λ) = F x − (1 − λ) F x⁰. (8.5.4)
Consider the equation
Suppose that equation (8.5.5) has a continuous solution x = x(λ), defined for 0 ≤ λ ≤ 1 and satisfying the condition
Were the solution x ( A ) known,
would be a solution of equation (8.5.1). One can thus find a point x0 close to x∗ by approximating x(λ). Our problem is thus to approximate the implicit function defined by (8.5.5) and the initial condition (8.5.6). Global propositions are relevant here: theorems on implicit functions defined on the entire interval [0, 1]. The theory of implicit functions of this type is at present insufficiently developed. The idea of extending solutions with respect to a parameter is due to S.N. Bernstein [322]; it has found extensive application in various theoretical and applied problems. Assume that the operator G(x; λ) is differentiable with respect to both x and λ, in the sense that there exist linear operators G′x(x; λ) mapping X into Y and elements G′λ(x; λ) ∈ Y such that
lim_{‖h‖+|Δλ|→0} ‖G(x + h; λ + Δλ) − G(x; λ) − G′x(x; λ)h − G′λ(x; λ)Δλ‖ / (‖h‖ + |Δλ|) = 0.
The implicit function x(λ) is then a solution of the differential equation
G′x(x; λ) (dx/dλ) + G′λ(x; λ) = 0, (8.5.8)
satisfying the initial condition (8.5.6). Conditions for the existence of a solution of this Cauchy problem defined on [0, 1] are precisely conditions for the existence of the implicit function. Assuming the existence of a continuous operator
Γ(x; λ) = [G′x(x; λ)]^{-1}, (8.5.9)
we can rewrite equation (8.5.8) as
dx/dλ = −Γ(x; λ) G′λ(x; λ). (8.5.10)
One must bear in mind that Peano's Theorem is false for ordinary differential equations in Banach spaces. Therefore, even in the local existence theorem for equation (8.5.10) with condition (8.5.6) one must assume that the right-hand side of the equation satisfies certain smoothness conditions. However, there are no sufficient smoothness conditions for the existence of a global extension of the solution to the entire interval 0 ≤ λ ≤ 1. We shall only mention a trivial fact: if the equation
dx/dλ = f(x; λ) (8.5.11)
in a Banach space satisfies the local existence theorem for some initial condition, and

then every solution of equation (8.5.11) can be extended to the entire interval 0 ≤ λ ≤ 1. Thus, if
then equation (8.5.5) defines an implicit function which satisfies (8.5.6) and is defined for 0 ≤ λ ≤ 1. Consequently, condition (8.5.13) guarantees that equation (8.5.1) is solvable and that its solution can be constructed by integrating the differential equation (8.5.10). To approximate a solution x(λ) of equation (8.5.10), one can use, for example, Euler's method. To this end, divide the interval [0, 1] into m subintervals by points
The approximate values x(λ_i) of the implicit function x(λ) are then determined by the equalities x(λ_0) = x_0 and
x(λ_{i+1}) = x(λ_i) − Γ[x(λ_i); λ_i] G'_λ[x(λ_i); λ_i] (λ_{i+1} − λ_i).   (8.5.15)
The element x(λ_m) is in general close to the solution x* of equation (8.5.1), and one may therefore expect it to fulfill the demands imposed on initial approximations for the iterative solution of equation (8.5.1). We emphasize that (8.5.15) does not describe an iterative process; it only yields a finite sequence of operations, whose result is
8. Applications of the weaker version of the Newton-Kantorovich theorem
308
an element which may be a suitable initial approximation for the iterative solution of equation (8.5.1). Other constructions may be used to approximate the implicit function x(λ). Partition the interval [0, 1] by the points (8.5.14). The point x(λ_1) is a solution of the equation G(x, λ_1) = 0. Now x(λ_0) = x_0 is a suitable initial approximation to x(λ_1). Approximate x(λ_1) by performing a fixed number of steps of some iterative process. The result is an element x_1 which should be fairly close to x(λ_1); x_1 is obtained from x_0 by a certain operator x_1 = W[x_0; G(x; λ_1)]. Now regard x_1 as an initial approximation to the solution x(λ_2) of the equation G(x; λ_2) = 0, and proceed as before. The result is an element x_2:
Continuing in this way, we obtain a finite set of points
the last of which, x_m, may be regarded as an initial approximation for the iterative solution of equation (8.5.1). If the operator W represents one iteration of the method, formula (8.5.16) is
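Both constructions above (the Euler predictor (8.5.15) and the fixed-number-of-iterations operator W) can be sketched for a scalar model problem. Everything concrete below is an illustrative assumption rather than part of the text: the operator F, the homotopy G(x; λ) = F(x) − (1 − λ)F(x_start), and the step counts.

```python
def F(x):                      # model equation F(x) = 0 with root x* = 2
    return x ** 3 - 8.0

def dF(x):                     # F'(x) = G'_x(x; lam) for this homotopy
    return 3.0 * x ** 2

def continuation(x_start, m=20, corrector_steps=2):
    """Trace x(lam) for G(x; lam) = F(x) - (1 - lam) * F(x_start)."""
    x, F0 = x_start, F(x_start)          # G'_lam = F0 is constant here
    for i in range(m):
        lam1 = (i + 1) / m
        x -= (F0 / dF(x)) * (1.0 / m)    # Euler predictor, as in (8.5.15)
        for _ in range(corrector_steps): # the operator W: a few Newton steps
            x -= (F(x) - (1.0 - lam1) * F0) / dF(x)
    return x                             # approximates x(1), a root of F

x0 = continuation(1.0)
print(abs(F(x0)))                        # tiny residual: x0 is near the root 2
```

For this homotopy the path x(λ) solves x³ = 1 + 7λ, so tracing it from x(0) = 1 lands near the root 2; the returned x0 then serves as the "suitable initial approximation" discussed above.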
Using the algorithm proposed by H.T. Kung [216] (see also [322]), we show that the number of steps required to compute a good starting point x_0 (to be made precise later) can be significantly reduced. This observation is very important in computational mathematics. In Chapter 5 we showed the following semilocal convergence theorem for Newton's method (4.1.3), which essentially states the following: If
F'(x_0)^{-1} exists, the Lipschitz conditions hold for all x, y ∈ U(x_0, r), where
‖F'(x_0)^{-1}‖ ≤ β_0,   (8.5.18)
and
then the sequence {x_n} (n ≥ 0) generated by Newton's method (4.1.3) is well defined, remains in U(x_0, r_0) for all n ≥ 0, and converges quadratically to a unique solution x* ∈ U(x_0, r) of the equation F(x) = 0. Moreover we have
Remark 8.5.1 In general
holds. If equality holds in (8.5.27) then the result stated above reduces to the famous Newton-Kantorovich theorem and (8.5.22) to the Newton-Kantorovich hypothesis (4.2.51). If strict inequality holds in (8.5.27) then (8.5.22) is weaker than the Newton-Kantorovich hypothesis
Moreover, the error bounds on the distances ‖x_{n+1} − x_n‖, ‖x_n − x*‖ (n ≥ 0) are finer and the information on the location of the solution more precise. Note also that the computational cost of obtaining (K_0, K) is the same as that for K, since in practice evaluating K requires finding K_0. Hence all results using (8.5.28) instead of (8.5.22) can now be challenged to obtain more information. That is exactly what we are doing here. In particular, motivated by the elegant work of H.T. Kung [216] on good starting points for Newton's method, we show how to improve on these results if we use our theorem stated above instead of the Newton-Kantorovich theorem.
Definition 8.5.1 We say x_0 is a good starting point for approximating x* by Newton's method (or a good starting point, for short) if conditions (8.5.18)-(8.5.25) hold.
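The relation K_0 ≤ K in (8.5.27) is easy to observe numerically. The following sketch uses our own toy derivative F'(x) = x² − 0.49 on the ball U(0, 1) (an assumption for illustration, not an example from the text) and estimates both constants on a grid.

```python
def dF(x):                       # derivative of the toy F(x) = x**3/3 - 0.49*x
    return x * x - 0.49

x0, r = 0.0, 1.0                 # ball U(x0, r) = [-1, 1]
grid = [x0 - r + 2.0 * r * i / 400 for i in range(401)]

# Lipschitz constant of F' on the ball: sup |F'(x) - F'(y)| / |x - y|
K = max(abs(dF(x) - dF(y)) / abs(x - y)
        for x in grid for y in grid if x != y)

# center-Lipschitz constant at x0: sup |F'(x) - F'(x0)| / |x - x0|
K0 = max(abs(dF(x) - dF(x0)) / abs(x - x0) for x in grid if x != x0)

print(K0, K)   # K0 = 1.0 while K is close to 2: (8.5.27) is strict here
```

Since estimating K in practice means scanning the same data needed for K_0, the weaker condition comes at no extra computational cost, as remarked above.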
Note that the existence of a good starting point implies the existence of a solution x* of the equation F(x) = 0 in U(x_0, 2r_0). We provide the following theorem/algorithm, which improves the corresponding one given in [216, Thm. 4.1], to obtain good starting points.
Theorem 8.5.2 Let F: D ⊆ X → Y be a Fréchet-differentiable operator. If F' satisfies the center-Lipschitz and Lipschitz conditions (8.5.20), (8.5.21), respectively, on U(x_0, 2r),
‖F'(x)^{-1}‖ ≤
for all x ∈ U(x_0, 2r),   (8.5.29)
and
βη_0 ≤ 1/2 − δ, and we want to prove that x̄_{i+1} is a good starting point for approximating x_{i+2}, a zero of f_{i+1}. We have by (8.5.49) and (8.5.46),
Let b = ‖[f'_{i+1}(x̄_{i+1})]^{-1} f_{i+1}(x̄_{i+1})‖. If x ∈ U(x̄_{i+1}, 2b), as in (8.5.50) we can prove that x ∈ U(x_0, 2r). Hence x̄_{i+1} is a good starting point for approximating x_{i+2}. By the same argument as used for obtaining x̄_0, and by (8.5.49), one can prove that if x̄_{i+2} is set to be the J(δ)-th Newton iterate starting from x̄_{i+1}, then
and
i.e. (8.5.45) and (8.5.46) hold with i replaced by i + 1. This shows that we need to perform at most J(δ) Newton steps at step 4 of Algorithm B to obtain each iterate. Therefore, for any δ ∈ (0, 1/2), to obtain a good starting point we need to perform at most N(δ) = I(δ) · J(δ) Newton steps.
Remark 8.5.3 As noted in [216], δ should not be chosen to minimize the complexity of Algorithm B alone. Instead, δ should be chosen to minimize the complexity of the combined algorithm:
1. Search phase: perform Algorithm B.
2. Iteration phase: perform Newton's method starting from the point obtained by Algorithm B.
An upper bound on the complexity of the iteration phase is the time needed to carry out T(δ, K_0, K, ε) Newton steps, where T(δ, K_0, K, ε) is the smallest integer k such that
Note also that if K_0 = K, our Theorem 8.5.3 reduces to Theorem 4.2 in [216, p. 15]. Hence we have shown the following result:
Theorem 8.5.4 Under the hypotheses of Theorem 8.5.3, the time needed to find a solution x* of the equation F(x) = 0 inside a ball of radius ε is bounded above by the time needed to carry out R(δ, K_0, K, ε) Newton steps, where
where N(δ, K_0, K, ε) and T(δ, K_0, K, ε) are given by (8.5.32) and (8.5.51), respectively.
Remark 8.5.4 If K_0 = K, Theorem 8.5.4 reduces to Theorem 4.3 in [216, p. 20]. In order to compare our results with the corresponding ones in [216], we computed the values of R(ε, K_0, K) for F satisfying the conditions of Theorem 4.3 in [216] and of Theorem 8.5.4 above, with the parameter choices in (8.5.53), and for ε equal to 10^{-i} r, 1 ≤ i ≤ 10.
The following table gives the results for ε = 10^{-6} r. Note that by I we mean I(δ_0, K, K); by I_{aK} we mean I(δ_0, K_0, K) with K_0 = aK, a ∈ [0, 1]; and similarly for J, N and T.
Remark 8.5.5 It follows from the table that our results significantly improve the corresponding ones in [216], and under the same computational cost. Suppose, for example, that K_0 = K and δ = .159. Kung found that the search phase can be done in 255 Newton steps and the iteration phase in 5 Newton steps; that is, a root can be located inside a ball of radius 10^{-6} r using 260 Newton steps. However, for K_0 = .9K, K_0 = .5K and K_0 = 0 the corresponding numbers of Newton steps are 245, 179 and 104 respectively, which constitutes a significant improvement. At the end of his paper Kung asked whether the number of Newton steps used by this procedure is close to the minimum. It is now clear from our approach that the answer is no (in general). Finally, Kung posed the open question: suppose that the conditions of the Newton-Kantorovich theorem hold; is Newton's method optimal, or close to optimal, in terms of the number of function and derivative evaluations required to approximate the solution x* of equation F(x) = 0 to within a given tolerance ε? Clearly, according to our approach, the answer is no (see also Remark 8.5.1).
8.6 Convergence on a Cone
In this section we are concerned with the problem of approximating a solution x* of equation (5.1.1). We propose the Newton-Kantorovich method (5.1.47) to generate a sequence approximating x*. This study is motivated by the elegant work in [116], where X is a real Banach space ordered by a closed convex cone K. We note that passing from scalar majorants to vector majorants enlarges the range of applications, since the latter use the spectral radius of an operator, which is usually smaller than the norm used by the former. Here, using finer vector majorants than before (see [116]), we show under the same hypotheses that: (a) sufficient convergence conditions can be obtained which are always weaker than before; (b) finer error bounds on the distances involved, and at least as precise information on the location of the solution x*, are provided.
Several applications are provided. In particular we show, as a special case, that the Newton-Kantorovich hypothesis (4.2.51) is weakened. Finally we study the local convergence of method (5.1.47). In order to make the study as self-contained as possible, we introduce some concepts involving K-normed spaces. Let X be a real Banach space ordered by a closed convex cone K. We say that the cone K is regular if every increasing sequence y_1 ≤ y_2 ≤ ··· ≤ y_n ≤ ··· which is bounded above converges in norm. If y_n^- ≤ y_n ≤ y_n^+ and lim_{n→∞} y_n^- = lim_{n→∞} y_n^+ = y*, the regularity of K implies lim_{n→∞} y_n = y*. For a, β ∈ X, the conic segment is ⟨a, β⟩ = {y | a ≤ y ≤ β}. An operator Q in X is called positive if Q(y) ∈ K for all y ∈ K. Denote by L(X, X) the space of all bounded linear operators in X, and by L_sym(X², X) the space of bilinear,
symmetric, bounded operators from X² to X. By the standard linear isometry between L(X², X) and L(X, L(X, X)), we consider the former embedded into the latter. Let D be a linearly connected subset of K, and let φ be a continuous operator from D into L(X, X) or L(X, L(X, X)). We say that the line integral of φ is independent of the path if, for every polygonal line L in D, the line integral depends only on the initial and final points of L. We define
We need the definition of K-normed space [365]:
Definition 8.6.1 Let X be a real linear space. Then X is said to be K-normed if the operator ]·[ : X → Y satisfies:
]x[ ≥ 0 (x ∈ X); ]x[ = 0 ⇔ x = 0; ]rx[ = |r| ]x[ (x ∈ X, r ∈ ℝ); and
Let x_0 ∈ X and r ∈ K. Then we denote
Using the K-norm we can define convergence on X. A sequence {y_n} (n ≥ 0) in X is said to be
(1) convergent to a limit y ∈ X if lim_{n→∞} ]y_n − y[ = 0, and we write (X)-lim_{n→∞} y_n = y;
(2) a Cauchy sequence if lim_{m,n→∞} ]y_m − y_n[ = 0 in X.
The space X is complete if every Cauchy sequence is convergent. We use the following conditions: F is differentiable on the K-ball U(x_0, R), and for every r ∈ S = ⟨0, R⟩ there exist positive operators w_0(r), m(r) ∈ L_sym(X², X) such that for all x ∈ X
for all x ∈ U(x_0, r),
for all x, y ∈ U(x_0, r), where the operators w_0, m: S → L_sym(X², X) are increasing. Moreover, the line integral of m (similarly for w_0) is independent of the path, and the same is true for the operator w: S → L(X, X) given by
Note that in general w_0(r) ≤ m(r) for all r ∈ S. The Newton-Leibniz formula holds for F on U(x_0, R):
for all segments [x, y] ⊆ U(x_0, R); for every r ∈ S there exists a positive operator w_1(r) ∈ L(X, X) such that
where w_1: S → L(X, X) is increasing and the line integral of w_1 is independent of the path; the operator F'(x_0) is invertible and satisfies:
for some positive operator b ∈ L(X, X). Let
and define the operator f: S → X by
f(r) = η + b ∫_0^r w(t) dt + b ∫_0^r w_1(t) dt.   (8.6.11)
By the monotonicity of the operators w and w_1 we see that f is order convex, i.e. for all r, r̄ ∈ S with r ≤ r̄,
We will use the following results whose proofs can be found in [116]:
Lemma 8.6.2 (a) If the Lipschitz condition (8.6.4) holds, then
](F'(x + y) − F'(x))(z)[ ≤ (w(r + ]y[) − w(r))(]z[)   (8.6.13)
for all r, r + ]y[ ∈ S, x ∈ U(x_0, r), z ∈ X.
(b) If the Lipschitz condition (8.6.8) holds, then
]G(x + y) − G(x)[ ≤ ∫_r^{r+]y[} w_1(t) dt   for all r, r + ]y[ ∈ S, x ∈ U(x_0, r).   (8.6.14)
Denote by Fix(f) the set of all fixed points of the operator f.
Lemma 8.6.3 Assume:
Then there is a minimal element r* in Fix(f), which can be found by applying the method of successive approximations
with 0 as the starting point. The set
B(f, r*) = {r ∈ S : lim_{n→∞} f^n(r) = r*}
is the attracting zone of r*.
Remark 8.6.1 [116] Let r ∈ S. If
f(r) ≤ r and ⟨0, r⟩ ∩ Fix(f) = {r*}, then ⟨0, r⟩ ⊆ B(f, r*). Note that the successive approximations
converge to a fixed point s* of f satisfying 0 ≤ s* ≤ r. Hence we conclude s* = r*, which implies r ∈ B(f, r*).
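In the scalar case X = ℝ the two iterations of this section can be run side by side. The majorant f(r) = η + (K/2)r² below is our own Kantorovich-style choice (an assumption for illustration): successive approximations from 0 converge to the minimal fixed point r*, as in Lemma 8.6.3, and a Newton-type acceleration in the spirit of (8.6.26) reaches the same r* in far fewer steps.

```python
import math

eta, K = 0.2, 1.0                     # f(r) = eta + (K/2) r^2; here 2*K*eta < 1
f  = lambda r: eta + 0.5 * K * r * r
df = lambda r: K * r                  # plays the role of b*w(r) in (8.6.26)

# minimal fixed point of f in closed form, for reference
r_star = (1.0 - math.sqrt(1.0 - 2.0 * K * eta)) / K

r = 0.0                               # successive approximations (Lemma 8.6.3)
for _ in range(200):
    r = f(r)

p = 0.0                               # Newton-type iteration on f(r) - r = 0
for _ in range(8):
    p -= (f(p) - p) / (df(p) - 1.0)

print(r - r_star, p - r_star)         # both differences are negligible
```

The successive approximations converge linearly with ratio roughly f'(r*) = K r* < 1, while the Newton-type variant converges quadratically; this is the scalar shadow of the comparison between (8.6.25) and (8.6.26).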
In particular, Remark 8.6.1 implies
⟨0, (1 − s)r* + s r⟩ ⊆ B(f, r*)   (8.6.22)
for every r ∈ Fix(f) with ⟨0, r⟩ ∩ Fix(f) = {r*, r}, and for all s ∈ [0, 1).
In the scalar case X = ℝ, we have
B(f, r*) = [0, r*] ∪ {r ∈ S : r* < r, f(q) < q (r* < q ≤ r)}.   (8.6.23)
We will also use the notation
Returning to method (5.1.47), we consider the sequences of approximations
and
r̄_{n+1} = r̄_n − (b w(r̄_n) − I)^{-1} (f(r̄_n) − r̄_n)   (r̄_0 = 0, n ≥ 0)   (8.6.26)
for the majorant equation (8.6.16).
Lemma 8.6.4 If the operators
are invertible with positive inverses, then the sequence {r̄_n} (n ≥ 0) given by (8.6.26) is well defined for all n ≥ 0, monotonically increasing, and convergent to r*.
Proof. We first show that if a_1 ≤ a_2, a_2 ≠ r*, then
(I − b w_0(a_1))^{-1} θ ≤ (I − b w_0(a_2))^{-1} θ
We have
(8.6.28)
for all θ ∈ K, where
θ_2 = (I − b w_0(a_2))^{-1} θ.
Using the monotonicity of w we get
which implies that the increasing sequence {(b w_0(a_1))^n θ_2} (n ≥ 0) is bounded in the regular cone K, and as such it converges to some
with
We also need to show that the operator
g(r) = r − (b w_0(r) − I)^{-1} (f(r) − r)   (8.6.33)
is increasing on ⟨0, r*⟩ − {r*}. Let b_1 ≤ b_2 with b_1, b_2 ∈ ⟨0, r*⟩; then using (8.6.6) and (8.6.33) we obtain in turn
since all the terms on the right-hand side of equality (8.6.35) are in the cone. Moreover, g leaves ⟨0, r*⟩ − {r*} invariant, since
Hence sequence
is well defined for all n ≥ 0, lies in the set ⟨0, r*⟩ − {r*}, and is increasing. Therefore the limit of this sequence exists; let us call it r_1^*. The point r_1^* is a fixed point of f in ⟨0, r*⟩. But r* is the unique fixed point of f in ⟨0, r*⟩. Hence we deduce
r_1^* = r*.   (8.6.38)
Remark 8.6.2 If equality holds in (8.6.6), then the sequence {r̄_n} becomes {r_n} (n ≥ 0) and Lemma 8.6.4 reduces to the corresponding lemma in [116, p. 555]. Moreover, as can easily be seen using induction on n,
and
for all n ≥ 0. Furthermore, if strict inequality holds in (8.6.6), so does it in (8.6.39) and (8.6.40). If {r_n} (n ≥ 0) is a majorizing sequence for method (5.1.47), then (8.6.39) shows that the error bounds on the distances ‖x_{n+1} − x_n‖ are improved. It turns out that this is indeed the case. We can now show the semilocal convergence theorem for method (5.1.47).
Theorem 8.6.5 Assume: hypotheses (8.6.4), (8.6.5), (8.6.7), (8.6.8), (8.6.9), (8.6.15) hold, and the operators (8.6.27) are invertible with positive inverses. Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.48) is well defined, remains in the K-ball U(x_0, r*) for all n ≥ 0 and converges to a solution x* of the equation F(x) + G(x) = 0 in E(r*), where E(r*) is given by (8.6.24). Moreover, the following estimates hold for all n ≥ 0:
and
where the sequence {r_n} is given by (8.6.25).
Proof. We first show (8.6.41), using induction on n.
Assume:
for k = 1,2, . . . ,n. Using (8.6.44) we get
Define operators Q,: X -+ X by
By (8.6.3) and (8.6.9) we get
≥ 0 (by (8.6.10)). For n = 0,
and
Hence, we have:
That is, the series Σ_{k=0}^∞ Q_k(z) is convergent in X. Hence the operator I − Q_n is invertible, and the operator F'(x_n) is invertible for all n ≥ 0, since F'(x_n) = F'(x_0)(I − Q_n); for all x ∈ Y we have:
Using (5.1.47) we obtain the approximation
It now follows from (8.6.3)-(8.6.9), (8.6.11), (8.6.25) and (8.6.52)
By Lemma 8.6.4, the sequence {r_n} (n ≥ 0) converges to r*. Hence {x_n} is a convergent sequence, and its limit is a solution of equation (5.1.1). Therefore x_n converges to x*. Finally, (8.6.42) follows from (8.6.41) by using standard majorization techniques. The uniqueness part is omitted, since it follows exactly as in Theorem 2 in [116].
Remark 8.6.3 It follows immediately from (8.6.53) and (8.6.54) that the sequence given by t_0 = 0, t_1 = η,
is also a finer majorizing sequence for {x_n} (n ≥ 0) and converges to some t* in ⟨0, r*⟩. Moreover, the following estimates hold for all n ≥ 0:
and
t* ≤ r*.   (8.6.61)
That is, {t_n} is a finer majorizing sequence than {r_n}, and the information on the location of the solution x* is more precise. Therefore we wonder if studying the convergence of {t_n} without assuming (8.6.15) can lead to weaker sufficient convergence conditions for method (5.1.47). In Theorem 8.6.7 we answer this question. But first we need the following lemma on majorizing sequences for method (5.1.47).
Lemma 8.6.6 If there exist parameters η > 0, δ ∈ [0, 2) such that: the operators
are positive, invertible and with positive inverses for all n ≥ 0;
and
2b ∫_0^1 w[2(2I − δI)^{-1}(I − (δI/2)^{n+1})η s] ds ≤ δI
for all n ≥ 0.
Then the iteration {t_n} (n ≥ 0) given by (8.6.56) is non-decreasing, bounded above by
t** = 2(2I − δI)^{-1} η,   (8.6.65)
and converges to some t* such that 0 ≤ t* ≤ t**.
Moreover the following estimates hold for all n 2 0:
Proof. We must show:
and that the operators in (8.6.64) are
positive, invertible and with positive inverses. Estimate (8.6.67) then follows immediately from (8.6.68)-(8.6.70). Using induction on the integer k, we get for k = 0
b w_0(t_1) < I, by the initial conditions. But (8.6.56) then gives
0 ≤ t_2 − t_1 ≤ (δ/2)(t_1 − t_0).   (8.6.71)
Assume (8.6.68)-(8.6.70) hold for all k ≤ n + 1. Using (8.6.62)-(8.6.64) we obtain in turn:
Moreover, we show:
We have:
Assume (8.6.73) holds for all k ≤ n + 1. It then follows from (8.6.56) and (8.6.68)-(8.6.70):
Hence the sequence {t_n} (n ≥ 0) converges to some t* satisfying (8.6.66). We can now show the main semilocal convergence theorem for method (5.1.47).
Theorem 8.6.7 Assume: hypotheses (8.6.3)-(8.6.5), (8.6.7)-(8.6.9), (8.6.62)-(8.6.64) hold, and
where t** is given by (8.6.65). Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.47) is well defined, remains in the K-ball U(x_0, t*) for all n ≥ 0, and converges to a solution x* of the equation F(x) + G(x) = 0, which is unique in E(t*). Moreover, the following error bounds hold for all n ≥ 0:
and
where the sequence {t_n} (n ≥ 0) and t* are given by (8.6.56) and (8.6.66), respectively.
Proof. The proof is identical to that of Theorem 8.6.5, with the sequence {t_n} replacing {r_n}, up to the derivation of (8.6.53). But then the right-hand side of (8.6.53), with these changes, becomes t_{n+1} − t_n. By Lemma 8.6.6, {t_n} converges to t*. Hence {x_n} is a convergent sequence, and its limit is a solution of the equation F(x) + G(x) = 0. Therefore {x_n} converges to x*. Estimate (8.6.77) follows from (8.6.76) by using standard majorization techniques. The uniqueness part is omitted, since it follows exactly as in Theorem 2 in [116].
Remark 8.6.4 Conditions (8.6.62) and (8.6.64) can be replaced by the stronger, but easier to check, conditions
and
respectively.
Application 8.6.8 Assume the operator ]·[ is given by a norm ‖·‖, and set G(x) = 0 for all x ∈ U(x_0, R). Choose, for all r ∈ S, b = 1 for simplicity,
and
With these choices our conditions reduce to the ones in Section 5.1, which have already been compared favourably with the Newton-Kantorovich theorem.
Remark 8.6.5 The results obtained here hold under even weaker conditions. Indeed, since (8.6.4) is not "directly" used in the proofs above, it can be replaced by the weaker condition (8.6.13) throughout this study. As we showed in Lemma 8.6.2,
but not necessarily vice versa, unless the operator w is convex [116, p. 674]. The local convergence of method (5.1.47) was not examined in [116]. Let x* be a simple solution of equation (5.1.1), and assume F'(x*)^{-1} ∈ L(Y, X). Moreover, assume, with x* replacing x_0, that hypotheses (8.6.3), (8.6.4), (8.6.5), (8.6.7), (8.6.8), (8.6.9) hold. Then, exactly as in (8.6.53), but using the local conditions and the approximation
we can show the following local result for method (5.1.47). Theorem 8.6.9 Assume there exists a minimal solution r* E S of equation
where,
Then the sequence {x_n} (n ≥ 0) generated by the Newton-Kantorovich method (5.1.47) is well defined, remains in U(x*, r*) for all n ≥ 0, and converges to x*, provided that x_0 ∈ U(x*, r*). Moreover, the following estimates hold for all n ≥ 0:
where,
Application 8.6.10 Returning to the choices of Application 8.6.8 and using (8.6.85), we get
See also Section 5.1.
8.7 Convergence on Lie Groups
In this section we are concerned with the problem of approximating a solution x* of the equation
where F is an operator mapping a Lie group to its corresponding Lie algebra. Most numerical algorithms are formulated on vector spaces with some type of Banach space structure that allows the study of the convergence properties of the iteration. Recently, there has been interest in studying numerical algorithms on manifolds (see [265] and the references there). Note that a suitable choice of coordinates allows a manifold to be interpreted locally as a Euclidean space, which justifies the study of such algorithms on manifolds. In particular, Newton's method for solving equation (8.7.1) has already been introduced in the elegant work [265], where the quadratic convergence of the algorithm was established provided that the initial guess is chosen from a ball of a certain convergence radius. Here, by using more precise majorizing sequences, we show that under the same hypotheses and computational cost a larger radius of convergence and finer error bounds can be obtained than before [265]. This observation allows a wider choice of initial guesses for starting Newton's method, and shows that fewer iterations are required to find the solution than is suggested in [265]. Finally, we provide a numerical example to justify these claims. Our study is organized as follows: first we provide, in condensed form, the needed background; then we introduce Newton's method and state some results obtained in [265] that are needed for the main local convergence results appearing later in the study. A Lie group (G, ·) is understood to have the structure of a smooth manifold, i.e. the group product and the inversion are smooth operations in the differentiable structure embedded in the manifold. The dimension of the Lie group equals that of the corresponding manifold, and we assume that it is finite. We will need the following definitions [265]:
Definition 8.7.1 A Lie algebra g of a Lie group (G, ·) is the tangent space at i (i the identity element of G), TG|_i, equipped with the Lie bracket [·, ·]: g × g → g
defined for u, v ∈ g by
where y(t), z(s) are two smooth curves in G satisfying y(0) = z(0) = e, y'(0) = u and z'(0) = v.
Definition 8.7.2 The exponential map is an operator
exp: g → G : u ↦ exp(u).
Given u ∈ g, the left-invariant vector field X_u: y ↦ L'_y(u) (where L'_y(g) = TG|_y) determines an integral curve y_u(t) solving the initial value problem y'_u(t) = X_u(y_u(t)), y_u(0) = e. If N(i) = exp(N(0)), where N(0) is an open neighborhood of 0 ∈ g, then for any y ∈ N(i) we have y = exp(v) for some v ∈ N(0), and if exp(u) = exp(v) ∈ N(i) for some u, v ∈ N(0), then u = v. If G is Abelian, then exp(u + v) = exp(u) exp(v) = exp(v) exp(u) for all u, v ∈ g. In the non-Abelian case, (8.7.4) is replaced by exp(w) = exp(u) exp(v), where w is given by the (BCH) formula [265, p. 114]
w = u + v + (1/2)[u, v] + (1/12)([u, [u, v]] + [v, [v, u]]) + ···
for u, v ∈ N(0).
Definition 8.7.3 The differential of exp at u ∈ g is a map
exp'_u : g → TG|_{exp(u)},
expressed as a map d exp_u : g → g by
d exp_u = (L'_{exp(u)})^{-1} ∘ exp'_u = L'_{exp(−u)} ∘ exp'_u.
Definition 8.7.4 The differential of the operator F at a point y ∈ G is a map
F'_y : TG|_y → Tg|_{F(y)}
defined by
for any tangent vector Δy ∈ TG|_y to the manifold G at y. It follows that
F'_y(Δy) = lim_{t→0} (F(y · z(t)) − F(y)) / t.
As in (8.7.7), the differential F'_y can be expressed in terms of an operator dF_y : g → g given by
dF_y = (F ∘ L_y)' = F'_y ∘ L'_y,   (8.7.9)
and consequently
We can now define Newton's method on the Lie group (G, ·) using (8.7.8) as follows. Choose y_0 ∈ G and determine the differential F'_{y_0} by (8.7.8). Then find Δy_0 ∈ TG|_{y_0} satisfying the approximation
and then set y_1 = y_0 exp(L'^{-1}_{y_0}(Δy_0)). Hence it follows by (8.7.9) that Newton's method can be formally written as: given y_n ∈ G, determine dF_{y_n} according to (8.7.10); find u_n ∈ g such that
dF_{y_n}(u_n) = −F(y_n);   (8.7.11)
and compute
y_{n+1} = y_n exp(u_n).   (8.7.12)
It is well known that a Lie group (G, .) is a manifold and as such it is also a topological space. Therefore the notion of convergence is defined with respect to this topology. We find it convenient to introduce a Banach space structure on the Lie algebra g, and use it to analyze convergence.
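Iteration (8.7.11)-(8.7.12) can be sketched on the simplest Lie group available: G = (ℝ₊, ·) with Lie algebra g = ℝ and the ordinary exponential. The operator F(y) = y − a and the target a are our own illustrative assumptions; for this choice dF_y(u) = y·u, so each Newton step solves a one-dimensional linear equation in the algebra and then updates multiplicatively in the group.

```python
import math

a = 2.0
F = lambda y: y - a          # F maps the group (R_+, *) into its algebra R

def dF(y, u):
    # dF_y(u) = d/dt F(y * exp(t*u)) at t = 0, which equals y * u for this F
    return y * u

y = 1.0                      # initial guess y_0 in G
for _ in range(8):
    u = -F(y) / dF(y, 1.0)   # solve dF_y(u) = -F(y) by linearity, cf. (8.7.11)
    y *= math.exp(u)         # group update y_{n+1} = y_n * exp(u_n), cf. (8.7.12)

print(y)                     # converges quadratically to a = 2
```

Note that the iterate never leaves the group: y stays positive by construction, which is exactly the point of performing the update through exp rather than additively.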
As already mentioned, we only discuss finite-dimensional Lie algebras. Therefore every linear operator h: g → g is bounded. Consequently, we can define its norm subordinate to that on g by
‖h‖ = sup_{u≠0} ‖h(u)‖ / ‖u‖ = sup_{‖u‖=1} ‖h(u)‖ < ∞.
We also need the following definitions concerning convergence of sequences:
Definition 8.7.5 A sequence {y_n} (n ≥ 0) in N(i) converges to i if and only if the corresponding sequence implicitly defined by y_n = exp(u_n) (n ≥ 0) converges to 0, in the sense that lim_{n→∞} ‖u_n‖ = 0.
Definition 8.7.6 If there exists y ∈ G such that
and (8.7.13) holds, then we say that the sequence {y_n} converges to y. Moreover, the order of convergence is λ > 0 if
The continuity of the Lie bracket [·, ·] guarantees the existence of a constant ρ ≥ 0 such that
We need the following lemmas, whose proofs can be found in [265]:
Lemma 8.7.7 There exist a constant δ > 0 and a real function f = f(t), analytic in the disk |t| ≤ δ and non-decreasing for t > 0, with f(0) = 1, such that for all u, v ∈ g with ‖u‖ + ‖v‖ ≤ β,
‖w − u − v‖ ≤ ‖v‖ f(‖u‖)
for any β ≤ δ/2, where w is given by (8.7.6).
Lemma 8.7.8 For y ∈ G, u ∈ g and t ∈ ℝ we have
We also need the radius
r = max{t ∈ ℝ : su ∈ N(0) for all s ∈ [0, t] and all ‖u‖ = 1} > 0   (8.7.19)
of the maximal ball in g centered at 0 and contained in N(0). Moreover, let r* ∈ [0, r]. We can define the closed ball with center y ∈ G and radius r* by
With the notation introduced above, we can state and prove the following local convergence result for Newton's method (8.7.11)-(8.7.12):
Theorem 8.7.9 Assume: the operator dF_y is one-to-one, dF_y^{-1} exists and
for some positive constant a; there exist constants r_0 ∈ (0, r], ℓ_0 ≥ 0 and ℓ ≥ 0 such that for all z ∈ U(y, r_0), u ∈ g with ‖u‖ ≤ r_0 the following hold:
and
Then for each fixed q ∈ [0, 1] there exists r* > 0 such that, if y_0 ∈ U^0(y, r*), the sequence {y_n} (n ≥ 0) generated by Newton's method (8.7.11)-(8.7.12) is well defined, remains in U^0(y, r*) for all n ≥ 0 and converges quadratically to y. Moreover, the following relations hold for all n ≥ 0:
lim_{n→∞} v_n = 0,
and
where,
and the function f is given in Lemma 8.7.7.
Proof. Let δ > 0 and f(·) be as in Lemma 8.7.7. Set
b = sup_{t ∈ [0, δ]} f(t) = f(δ) > 1,
and define
r* = min(…),
where,
Let y_0 ∈ U^0(y, r*). It follows that y^{-1} y_0 = exp(v_0) for some v_0 ∈ g with ‖v_0‖ < r*. Using (8.7.22) we can get
by the choice of r*. By (8.7.30) and the Banach lemma on invertible operators 1.3.2, dF_{y_0}^{-1} exists, and
Define u_0 by (8.7.11). Then, using Lemma 8.7.8, we have in turn
u_0 + v_0 = dF_{y_0}^{-1} [F(y) − F(y_0) + dF_{y_0}(v_0)].
Moreover, using (8.7.23), (8.7.31) and (8.7.32), and since ‖t v_0‖ < r* for all t ∈ [0, 1], we get
Then, by the choice of r* and (8.7.33), we obtain:
which implies u_0 ∈ N(0). Therefore we have
Hence by (8.7.6) and (8.7.34)
By inequality (8.7.16) with β = 3r* ≤ δ/2 we get
by the choice of r*. That is, y_1 ∈ U^0(y, r*). The rest follows by induction, noticing that v_n → 0 as n → ∞, since ‖v_{n+1}‖ < ‖v_n‖ holds for all n ≥ 0.
Remark 8.7.1 In general
ℓ_0 ≤ ℓ   (8.7.35)
holds, and ℓ/ℓ_0 can be arbitrarily large.
(a) If ℓ_0 = ℓ and q = 1, our results reduce to the ones obtained in [265, Theorem 4.6, p. 137]. Denote the convergence radius obtained there by r_1^* and the convergence ratio by c_1. In this case we have r* = r_1^* and c = c_1.   (8.7.36)
(b) If ℓ_0 = ℓ and q ∈ [0, 1), then we get by (8.7.27) and (8.7.28)
and
(c) If ℓ_0 < ℓ and q ∈ [0, 1], then using (8.7.27) and (8.7.28) we deduce again (8.7.37) and (8.7.38).
That is, in cases (b) and (c) we obtain a larger convergence radius r* and a smaller convergence ratio c than the corresponding r_1^* and c_1 found in [265, p. 137]. Note that these advantages are obtained under the same computational cost, since in practice the evaluation of the Lipschitz constant ℓ requires the evaluation of the center-Lipschitz constant ℓ_0. We complete this study with a numerical example, but first we need some background on matrix groups. The set GL(k, ℝ) of all invertible k × k matrices with real coefficients forms a Lie group, called the general linear group, under standard matrix multiplication. This Lie group has some interesting subgroups, one
of which is the orthogonal group O(k, ℝ), consisting of the orthogonal matrices y such that y^T y = I_k, where I_k is the k × k identity matrix. The Lie algebra of GL(k, ℝ) is denoted by gl(k, ℝ); it is the linear space of all real k × k matrices with the bracket [u, v] = uv − vu. In particular, the Lie algebra of O(k, ℝ) is the set of all k × k skew-symmetric matrices
so(k, ℝ) = {v ∈ gl(k, ℝ) : v^T + v = 0}.
Example 8.7.1 We consider the initial value problem
for y ∈ G = O(N, ℝ), where g(y) ∈ so(N, ℝ) and y(0) is a random initial guess for Newton's method (8.7.11)-(8.7.12). For simplicity we take N = 1. Choose y(t) = e^t, g(t) = e^{−t}. Then by Lemma 8.7.7 we can get f(t) = 1 and a = b = 1. Note that (8.7.39) corresponds to the equation
where
We now restrict F to U(y, r*) = U(0, 1). By using (8.7.10), (8.7.22) and (8.7.23) we can get
ℓ = e and ℓ_0 = e − 1. That is, (8.7.35) holds as a strict inequality. In particular, we obtain
Moreover, (8.7.38) holds. Note that the results concerning Version 2 of Newton's method, also studied in [265], can be immediately improved along the same lines; we leave the details to the motivated reader.
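The constants ℓ = e and ℓ_0 = e − 1 in Example 8.7.1 can be recovered numerically under the reading that the relevant derivative behaves like F'(x) = eˣ on U(0, 1) = [−1, 1] (our assumption, chosen to be consistent with the constants reported): the Lipschitz quotient of eˣ peaks at e, while the quotient centered at 0 peaks at e − 1.

```python
import math

dF = math.exp                                   # assumed derivative on [-1, 1]
xs = [-1.0 + i / 200.0 for i in range(401)]     # grid on U(0, 1)

# full Lipschitz constant: sup |dF(x) - dF(y)| / |x - y|  ->  l = e
l_est = max(abs(dF(x) - dF(y)) / abs(x - y)
            for x in xs for y in xs if x != y)

# center-Lipschitz constant at 0: sup |dF(x) - dF(0)| / |x|  ->  l0 = e - 1
l0_est = max(abs(dF(x) - dF(0.0)) / abs(x) for x in xs if x != 0.0)

print(l0_est, l_est)   # l0 = e - 1 < l close to e: the inequality (8.7.35) is strict
```

This is the same one-line computation a practitioner would run to compare the two constants in any concrete problem, and it illustrates why the center constant comes for free once the full constant is estimated.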
8.8 Finite Element Analysis and Discretizations
In this Section we use our weaker version of the Newton-Kantorovich theorem introduced in Section 5.1 to show existence and uniqueness of solutions for strongly nonlinear elliptic boundary value problems [324]. Moreover, a priori estimates are
obtained as a direct consequence of this theorem. This study is motivated by the work in [324], where the Newton-Kantorovich theorem 4.2.4 was used. The advantages of our approach have been explained in detail in Section 5.1. Finally, examples of boundary value problems where our results apply are provided. We assume the following:
(A1) There are Banach spaces Z ⊆ X and U ⊆ Y such that the inclusions are continuous, and the restriction of F to Z, denoted again by F, is a Fréchet-differentiable operator from Z to U.
(A2) For any v ∈ Z the derivative F'(v) ∈ L(Z, U) can be extended to F'(v) ∈ L(X, Y), and it is:
– locally Lipschitz continuous on Z, i.e., for any bounded convex set T ⊆ Z there exists a positive constant c_1, depending on T, such that:
for all v, w E T - center locally Lipschitz continuous at a given uo E Z, i.e., for any bounded convex set T C Z, with uo E T, there exists a positive constant co depending on uo and T such that:
for all v E T. (As) There are Banach spaces V C Z and W C U such that the inclusions are continuous. We suppose that there exists a subset S C V for which the following holds: "if Ff(u) E L(V, W) is an isomorphism between V and W at u E S, then F f ( u ) E L(X, Y) is an isomorphism between X and Y as well." To define discretized solutions of F(u) = 0, we introduce the finite dimensional subspaces S d C Z and S d C U parametrized by d, 0 < d < 1 with the following properties: (Ad) There exists r L 0 and a positive constant ca independent of d such that
(A5) There exists a projection Π_d : X → S_d for each S_d such that, if u₀ ∈ S is a solution of F(u) = 0, then

lim_{d→0} d⁻ʳ ‖u₀ − Π_d u₀‖_X = 0    (8.8.4)

and

lim_{d→0} d⁻ʳ ‖u₀ − Π_d u₀‖_Z = 0.    (8.8.5)
We can show the following result concerning the existence of locally unique solutions of discretized equations:
8. Applications of the weaker version of the Newton-Kantorovich theorem
338
Theorem 8.8.1 Assume that conditions (A1)–(A5) hold. Suppose F′(u₀) ∈ L(V, W) is an isomorphism, and u₀ ∈ S. Moreover, assume F′(u₀) can be decomposed into F′(u₀) = Q + R, where Q ∈ L(X, Y) and R ∈ L(X, Y) is compact. The discretized nonlinear operator F_d : Z → U is defined by
where I is the identity of Y, and P_d : Y → Ŝ_d is a projection such that

lim_{d→0} ‖v − P_d v‖_Y = 0, for all v ∈ Y,    (8.8.7)

and

(I − P_d)Q(v_d) = 0, for all v_d ∈ S_d.    (8.8.8)
Then, for sufficiently small d > 0, there exists u_d ∈ S_d such that F_d(u_d) = 0, and u_d is locally unique. Moreover, the following estimate holds

where the constant involved is positive and independent of d.
Proof. The proof is similar to the corresponding one in [324, Th. 2.1, p. 126]. However, there are some crucial differences, where the weaker condition (8.8.2) is used (needed) instead of the stronger condition (8.8.1).
Step 1. We claim that there exists a positive constant c₃, independent of d, such that estimate (8.8.10) holds for sufficiently small d > 0.
From (A3) and u₀ ∈ S, F′(u₀) ∈ L(X, Y) is an isomorphism. Set B₀ = F′(u₀)⁻¹. We can have in turn

Since −Q + F′(u₀) ∈ L(X, Y) is compact, we get by (8.8.7) that

lim_{d→0} ‖(I − P_d)(−Q + F′(u₀))‖ = 0.
By (8.8.7) there exists a positive constant c₄ such that (8.8.12) holds. That is, using (8.8.2) we get
Hence, by (8.8.5) we can have

where lim_{d→0} δ(d) = 0, and (8.8.10) holds.
Step 2. We shall show:

lim_{d→0} d⁻ʳ ‖F_d′(Π_d(u₀))⁻¹ F_d(Π_d(u₀))‖ = 0.    (8.8.16)

Note that
where,
and we used
where c₅ is independent of d. The claim is proved.
Step 3. We use our modification of the Newton-Kantorovich theorem with the following choices: A = S_d ⊆ Z and B = Ŝ_d ⊆ U, each equipped with its norm scaled by d⁻ʳ; x₀ = Π_d(u₀), F = F_d. Notice that ‖S‖_{L(A,B)} = ‖S‖_{L(X,Y)} for any linear operator S ∈ L(S_d, Ŝ_d). By step 1, we know that F_d′(Π_d(u₀)) ∈ L(S_d, Ŝ_d) is an isomorphism. It follows from (8.8.1) and (A4) that for any w_d, u_d ∈ S_d,
Similarly, we get using (8.8.2) and (A4) that

Hence the assumptions on the Lipschitz constants are satisfied with

ℓ = c₁c₂c₃⁻¹c₄ and ℓ₀ = c₀c₂c₃⁻¹c₄.    (8.8.22)

From step 2, we may take sufficiently small d > 0 such that (ℓ₀ + ℓ)η ≤ 1, where
That is, assumption (5.1.56) is satisfied. Hence, for sufficiently small d > 0 there exists a locally unique u_d ∈ S_d such that F_d(u_d) = 0 and

It follows that (8.8.9) holds. ∎
Remark 8.8.1 In general, ℓ₀ ≤ ℓ holds and the ratio ℓ/ℓ₀ can be arbitrarily large (see Section 5.1), where ℓ and ℓ₀ are given by (8.8.22). If ℓ = ℓ₀, our Theorem 8.8.1 reduces to the corresponding Theorem 2.1 in [324, p. 126]. Otherwise our condition (5.1.56) is weaker than the corresponding one in [324] using the Newton-Kantorovich hypothesis (5.2.5). That is, (5.2.5) implies (5.1.56), but not vice versa. As already shown in Section 5.1, finer estimates on the distances ‖u_d − Π_d(u₀)‖ and more precise information on the location of the solution are provided here, under the same computational cost, since in practice the evaluation of c₁ requires that of c₀. Note also that our parameter d will be smaller than the corresponding one in [324], which in turn implies that fewer computations and smaller dimension subspaces S_d are used to approximate u_d. This observation is very important in computational mathematics [68]. The above observations suggest that all results obtained in [324] can be improved if rewritten using the weaker (5.1.56) instead of the stronger (5.2.5). However we do not attempt this here (leaving this task to the motivated reader). Instead we provide examples of nonlinear problems already reported in [324] where finite element methods apply along the lines of our Theorem above.

Example 8.8.1 Find u ∈ H₀¹(Γ), Γ = (b, c) ⊆ R, such that
⟨F(u), v⟩ = ∫_Γ [g₀(x, u, u′)v′ + g₁(x, u, u′)v] dx = 0, for all v ∈ H₀¹(Γ),    (8.8.27)
where g₀, g₁ are sufficiently smooth functions from Γ × R × R to R.

Example 8.8.2 For the N-dimensional case (N = 2, 3): let D ⊆ Rᴺ be a bounded domain with a Lipschitz boundary. Then consider the problem: find u ∈ H₀¹(D) such that

⟨F(u), v⟩ = ∫_D [q₀(x, u, ∇u)·∇v + q(x, u, ∇u)v] dx = 0, for all v ∈ H₀¹(D),    (8.8.28)

where q₀ : D × R × Rᴺ → Rᴺ and q : D × R × Rᴺ → R are sufficiently smooth functions. Since equations (8.8.27) and (8.8.28) are defined in divergence form, their finite element solutions are defined in a natural way.
Results on iterative methods defined on generalized Banach spaces with a convergence structure [68], [231], [232] can be found in the exercises.
8.9
Exercises
8.1 Let g be an algorithm for finding a solution x* of equation F(x) = 0, and x̄ the approximation to x* computed by g. Define the error d(g, F) for approximating x*. Consider the problem of approximating x* when F satisfies some conditions. Algorithms based on these conditions cannot differentiate between operators in the class C of all operators satisfying these conditions, so we confront the class C instead of specific operators from C. Define

d_i = inf_{g∈A} sup_{F∈C} d(g, F),

where A is the class of all algorithms using i units of time. The time t needed to approximate x* to within error tolerance ε > 0 is the smallest i such that d_i ≤ ε, and an algorithm is said to be optimal if sup_{F∈C} d(g, F) = d_t. If for any algorithm using i units of time there exist functions F₁, F₂ in C such that the minimum distance between any solution of F₁ and any solution of F₂ is greater than or equal to 2ε, then show that d_i ≥ ε.
8.2 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R is continuous; (2) F(a) < 0, F(b) > 0. Then, show:

d_i = (b − a)/2^{i+1}.

8.3 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, F′(x) ≥ α > 0 for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show:

d_i = (b − a)/2^{i+1}.
8.4 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, F′(x) ≤ β for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show: d_i ≥ (b − a)/2^{i+1}.

8.5 With the notation introduced in Exercise 8.1, assume: (1) F : [a, b] → R, β ≥ F′(x) ≥ α > 0 for all x ∈ [a, b]; (2) F(a) < 0, F(b) > 0. Then, show:
8.6 Assume: (1) the hypotheses of Exercise 8.5 hold; (2) ‖F″(x)‖ ≤ γ for all x ∈ [a, b]. Then show that the problem of finding a solution x* of equation F(x) = 0 can be solved superlinearly.
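Exercises 8.2–8.5 concern the optimal worst-case error dᵢ of bracketing methods; the bound (b − a)/2^{i+1} is attained by bisection, which halves the bracket at each function evaluation. A minimal sketch (illustrative, not from the text):

```python
def bisect(F, a, b, iters):
    # Assumes F continuous with F(a) < 0 < F(b).  After `iters` bisection
    # steps the midpoint is within (b - a) / 2**(iters + 1) of a root.
    for _ in range(iters):
        m = 0.5 * (a + b)
        if F(a) * F(m) <= 0.0:
            b = m
        else:
            a = m
    return 0.5 * (a + b), 0.5 * (b - a)  # approximation, error bound
```

For F(x) = x² − 2 on [1, 2], ten iterations guarantee an error of at most 1/2¹¹, matching the dᵢ of Exercise 8.2.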
8.7 Let L, M, M₁ be operators such that L ∈ C¹(V₁ → V), M₁ ∈ L₊(V), M ∈ C(V₁ → V), and xₙ be points in D. It is convenient for us to define the sequences cₙ, dₙ, aₙ, bₙ (n ≥ 0) by

cₙ = |xₙ₊₁ − xₙ|, aₙ = (L + M + M₁)ⁿ(a) for some a ∈ C, bₙ = (L + M + M₁)ⁿ(0),

and the point b by

b = limₙ→∞ bₙ.

Prove the result: Let X be a Banach space with convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(D → X) with D ⊆ X, an operator Q ∈ C(D → X), an operator L ∈ C¹(V₁ → V) with V₁ ⊆ V, an operator M ∈ C(V₁ → V), an operator M₁ ∈ L₊(V), a point a ∈ C, and a null sequence (zₙ) ∈ D such that the following conditions are satisfied:
(C6) the inclusions U(a) ⊆ D and [0, a] ⊆ V₁ are true;
(C7) L is order-convex on [0, a], and satisfies

for all x, y ∈ U(a) with |x| + |y| ≤ a;
(C8) M satisfies the conditions

for all x, y ∈ U(a) with |x| + |y| ≤ a, and

M(w₁) − M(w₂) ≤ M(w₃) − M(w₄) and M(w) ≥ 0

for all w, w₁, w₂, w₃, w₄ ∈ [0, a] with w₁ ≤ w₃, w₂ ≤ w₄, w₂ ≤ w₁, w₄ ≤ w₃;
(C9) M₁, xₙ, zₙ satisfy the inequality

for all n ≥ 1;
(C10) L′(0) ∈ B(I − F′(0)), and (−(F(0) + Q(0) + F′(0)(z₀)), L(0) + M(0) + M₁(0)) ∈ E;
(C11) (L + M + M₁)(a) ≤ a with 0 ≤ L + M + M₁; and
(C12) (M + M₁ + L′(a))ⁿ a → 0 as n → ∞.
Then,
(i) the sequence (xₙ, dₙ) ∈ (X × V)ᴺ is well defined, remains in Eᴺ, is monotone, and satisfies for all n ≥ 0

where b is the smallest fixed point of L + M + M₁ in [0, a];
(ii) moreover, the iteration {xₙ} (n ≥ 0) generated by

converges to a solution x* ∈ U(b) of the equation F(x) + Q(x) = 0, which is unique in U(a);
(iii) furthermore, the following estimates are true for all n ≥ 0:

bₙ ≤ dₙ ≤ b, bₙ ≤ aₙ, |xₙ₊₁ − xₙ| ≤ dₙ₊₁ − dₙ, |xₙ − x*| ≤ b − dₙ,

and

|xₙ − x*| ≤ aₙ − bₙ for M₁ = 0 and zₙ = 0 (n ≥ 0).
8.8 We will now introduce results on a posteriori estimates for the iteration introduced in Exercise 8.7. It is convenient to define the operator

where,

and the interval

Iₙ = [0, a − |xₙ|].

Show:
(a) the operators Sₙ are monotone on Iₙ;
(b) the operators Rₙ : [0, a − dₙ] → [0, a − dₙ] are well defined, and monotone. Hint: Verify the scheme:

(c) if q ∈ Iₙ satisfies Rₙ(q) ≤ q, then

cₙ ≤ Rₙ(q) ≤ q and Rₙ₊₁(q − cₙ) ≤ q − cₙ for all n ≥ 0;

(d) under the hypotheses of Exercise 8.7, let qₙ ∈ Iₙ be a solution of Rₙ(q) ≤ q; then

where

a₀ = q, and aₙ₊₁ = Rₙ(aₙ) − cₙ;

and
(e) under the hypotheses of Exercise 8.7, any solution q ∈ Iₙ of Rₙ(q) ≤ q is such that
8.9 Let A ∈ L(X → X) be a given operator. Define the operators P, T ∈ C(D → X) by

P(x) = AT(x + u), T(x) = G(x) + R(x), P(x) = F(x) + Q(x),

and

where A ∈ L(X → X) and G, R are as F, Q respectively. We deduce immediately that under the hypotheses of Exercise 8.7 the zero x* of P is also a zero of AT, if u = 0. We will now provide a monotonicity result to find a zero x* of AT. The space X is assumed to be partially ordered and to satisfy the conditions for V given in (C1)–(C5). Moreover, we set X = V, D = C², so that |·| turns out to be I.
Prove the result: Let V be a partially ordered Banach space satisfying conditions (C1)–(C5), Y a Banach space, G ∈ C¹(D → Y), R ∈ C(D → Y) with D ⊆ V, A ∈ L(X → V), M ∈ C(D → V), M₁ ∈ L₊(V) and u, v ∈ V such that:
(C13) [u, v] ⊆ D;
(C14) I − AG′(u) + M + M₁ ∈ L₊(V);
(C15) for all w₁, w₂ ∈ [u, v]: w₁ ≤ w₂ ⇒ AG′(w₁) ≤ AG′(w₂);
(C16) AT(u) + AG′(u)(z₀) ≤ 0, AT(v) + AG′(v)(z₀) ≥ 0 and AT(v) + M₁(v − u) ≥ 0;
(C17) condition (C8) is satisfied, and M(v − u) ≤ −Q(v − u);
(C18) condition (C9) is satisfied;
(C19) the following initial condition is satisfied

and
(C20) (I − AG′(v) + M + M₁)ⁿ (v − u) → 0 as n → ∞.
Then the Newton sequence

is well defined for all n ≥ 0, monotone, and converges to a unique zero x* of AT in [u, v].
8.10 Let X be a Banach space with convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(X_F → X) with X_F ⊆ X, an operator L ∈ C¹(V_L → V) with V_L ⊆ V and a point a of C such that
(a) U(a) ⊆ X_F, [0, a] ⊆ V_L;
(b) L is order convex on [0, a] and satisfies

for x, y ∈ U(a) with |x| + |y| ≤ a;
(c) L′(0) ∈ B(I − F′(0)), (−F(0), L(0)) ∈ E;
Then show: the Newton sequence

u₀ = u, uₙ₊₁ := uₙ + [AG′(uₙ)]*[−AG(uₙ)]  (n ≥ 0)

is well defined, monotone, and converges to the unique zero x of AG in [u, v].

8.15 Consider the two-point boundary value problem
−x″(s) = 4 sin(x(s)) + f(s), x(0) = x(1) = 0
as a possible application of Exercises 8.7–8.14 in the space X = C[0, 1] and V = X with the natural partial ordering; the operator |·| is defined by taking absolute values. Let G ∈ L₊(C[0, 1]) be given by
Gx(s) = (2 sin 2)⁻¹ [ ∫₀ˢ sin(2t) sin(2 − 2s) x(t) dt + ∫ₛ¹ sin(2 − 2t) sin(2s) x(t) dt ],

satisfying Gx = y with

−y″ − 4y = x, y(0) = y(1) = 0.

Define the operator
Let
For x, y, w ∈ C[0, 1],

|x|, |x| + |y| ≤ .5π ⇒ |[F′(x) − F′(x + y)]w| ≤ [L′(|x| + |y|) − L′(|x|)] |w|.

Further we have L(0) = |Gf| and L′(0) = 0. We have to determine a ∈ C₊[0, 1] with |a| ≤ .5π and s ∈ (0, 1) such that L(a) = 4G(a − sin(a)) + |Gf| ≤ sa. We seek a constant function as a solution. For e₀(s) = 1, we compute
Show that a = te₀ will be a suitable solution if

4p(t − sin(t)) + ‖Gf‖_∞ < t.
8.16 (a) Assume: given a Banach space X with a convergence structure (X, V, E) with V = (V, C, ‖·‖_V), an operator F ∈ C¹(X₀ ⊆ X → X), operators M, L₀, L ∈ C¹(V₀ ⊆ V → V), and a point p ∈ C such that the following conditions hold: M, L₀, L are order-convex on [0, p], and such that for x, y, z ∈ U(p) with |x| ≤ p, |y| + |z| ≤ p:

L₀′(|x|) − L₀′(0) ∈ B(F′(0) − F′(x)),
L′(|y| + |z|) − L′(|y|) ∈ B(F′(y) − F′(y + z)),
M(p) ≤ p,
L₀(p₀) ≤ M(p₀) for all p₀ ∈ [0, p],
L₀′(p₀) ≤ M′(p₀) for all p₀ ∈ [0, p],
L₀′(0) ∈ B(I − F′(0)), (−F(0), L₀(0)) ∈ E,
M′(p)ⁿ(p) → 0 as n → ∞,
M′(dₙ)(b − dₙ) + L(dₙ) ≤ M(b) for all n ≥ 0,

where

and

Show: the sequence {xₙ} (n ≥ 0) generated by Newton's method is well defined, remains in Eᴺ, is monotone, and converges to a unique zero x* in U(b), where b is the smallest fixed point of M in [0, p]. Moreover the following bounds hold for all n ≥ 0
and
(b) If r ∈ [0, p − |xₙ|] satisfies Rₙ(r) ≤ r, then show the following holds for all n ≥ 0:

and
(c) Assume the hypotheses of (a) hold and let rₙ ∈ [0, p − |xₙ|] be a solution of (8.9.1). Then show the following a posteriori estimates hold for all n ≥ 0

where

(d) Under the hypotheses of (a), show any solution r ∈ [0, p − |xₙ|] of Rₙ(r) ≤ r yields the a posteriori estimate

Let X be partially ordered and set X = V, D = C², and |·| = I.
(e) Let X be partially ordered, Y a Banach space, G ∈ C¹(X₀ ⊆ X → Y), A ∈ L(Y → X) and x, y ∈ X such that:
1. [x, y] ⊆ X₀;
2. I − AG′(x) ∈ L₊(X);
3. z₁ ≤ z₂ ⇒ AG′(z₁) ≤ AG′(z₂) for all z₁, z₂ ∈ [x, y];
4. AG(x) ≤ 0, AG(y) ≥ 0;
5. (I − AG′(y))ⁿ (y − x) → 0 as n → ∞.
Show: the sequence {yₙ} (n ≥ 0) generated by

y₀ = x, yₙ₊₁ = yₙ + AG′(yₙ)*[−AG(yₙ)]

is well defined for all n ≥ 0, monotone, and converges to a unique zero y* of AG in [x, y].
8.17 Let there be given a Banach space X with convergence structure (X, V, E), where V = (V, C, ‖·‖_V), an operator F ∈ C¹(X₀ → X) with X₀ ⊆ X, an operator M ∈ C¹(V₀ → V) with V₀ ⊆ V, and a point p ∈ C satisfying: M is order-convex on [0, p] and for all x, y ∈ U(p) with |x| + |y| ≤ p

and

Then, show the sequence (xₙ, tₙ) ∈ (X × V)ᴺ, where {xₙ} is generated by Newton's method and {tₙ} (n ≥ 0) is given by

t₀ = 0, tₙ₊₁ = M(tₙ) + M′(|xₙ|)(aₙ), aₙ = |xₙ₊₁ − xₙ|,

is well defined for all n ≥ 0, belongs in Eᴺ, and is monotone. Moreover the following hold for all n ≥ 0

where

b = limₙ→∞ Mⁿ(0) is the smallest fixed point of M in [0, p].
(b) Show the corresponding results as in Exercises 8.16 (b)–(c).

8.18 Assume the hypotheses of Exercise 8.16 hold for M = L.
Show: (a) conclusions (a)–(e) hold (under the revised hypothesis) and the last hypothesis in (a) can be dropped; (b) the error bounds |x* − xₙ| (n ≥ 0) obtained in this setting are finer and the information on the location of the solution more precise than the corresponding ones in Exercise 8.10, provided that L₀(p₀) < L(p₀) or L₀′(p₀) < L′(p₀) for all p₀ ∈ [0, p]. As in Exercise 8.11 assume there exists a monotone operator k₀ : [0, p] → R such that
and define the operator L₀ by
The sequence {dₙ} given by d₀ = 0, dₙ₊₁ = L(dₙ) + L₀′(|xₙ|)cₙ converges to some p* ∈ [0, p] provided that

Conclude that the above semilocal convergence condition is weaker than (d) in Exercise 8.10, or equivalently (for p = a)

Finally, conclude that in the setting of Exercise 8.7 and under the same computational cost we always obtain, under weaker conditions, finer error bounds on the distances |xₙ − x*| (n ≥ 0) and better information on the location of the solution x* than in Exercise 8.7.
Chapter 9
The Newton-Kantorovich Theorem and Mathematical Programming

The Newton-Kantorovich theorem 4.2.4 has, with a few notable exceptions [293], [294], [282], not been sufficiently utilized in the mathematical programming community. The purpose of this chapter is to provide a bridge between the two research communities by showing that the Newton-Kantorovich theorem can be used in analyzing LP and interior point methods, and can be used for obtaining optimal error bounds along the way. In Sections 9.1 and 9.2 we show how to improve on the elegant works of Renegar and Shub in [294] and Potra in [282]. To avoid repetitions, we simply refer the reader to the above excellent works for the development of these methods. We simply start from the point where the hypotheses made can be replaced by ours, which are weaker. The benefits of this approach have also been explained in the introduction and in Section 5.1. We also note that the work in [294] is motivated by a theorem of Smale in combination with the Newton-Kantorovich theorem, whereas the work in [282] differs from the above since it applies the latter theorem directly.
9.1
Case 1: Interior Point Methods
It has already been shown in [282] that the Newton-Kantorovich theorem can be used to construct and analyze optimal-complexity path following algorithms for linear complementarity problems. Potra has chosen to apply this theorem to linear complementarity problems because such problems provide a convenient framework for analyzing primal-dual interior point methods. Theoretical and experimental work conducted over the past decade has shown that primal-dual path following algorithms are among the best solution methods for linear programming (LP), quadratic programming (QP), and linear complementarity problems (LCP) (see for example [285], [346]). Primal-dual path following algorithms are the basis of the
best general-purpose practical methods, and they have important theoretical properties [304], [346], [354]. Potra, using the Newton-Kantorovich theorem 4.2.4, in particular showed how to construct path-following algorithms for LCP that have O(√n L) iteration complexity [282]. Given a point x that approximates a point x(τ) on the central path of the LCP with complementarity gap τ, the algorithms compute a parameter θ ∈ (0, 1) so that x satisfies the Newton-Kantorovich hypothesis (4.2.51) for the equation defining x((1 − θ)τ). It is proven that θ is bounded below by a multiple of n^{−1/2}. Since (4.2.51) is satisfied, the sequence generated by Newton's method (4.1.3) or by the modified Newton method (take F′(xₙ) = F′(x₀), n ≥ 0) with starting point x will converge to x((1 − θ)τ). He showed that the number of steps required to obtain an acceptable approximation of x((1 − θ)τ) is bounded above by a number independent of n. Therefore, a point with complementarity gap less than ε can be obtained in at most O(√n log(ε₀/ε)) steps (for both methods), where ε₀ is the complementarity gap of the starting point. For linear complementarity problems with rational input data of bit length L, this implies that an exact solution can be obtained in at most O(√n L) iterations plus a rounding procedure requiring O(n³) arithmetic operations [346]. The differences between Potra's work and the earlier works by Renegar [293] and Renegar and Shub [294] have been stated in the introduction of this chapter and further analyzed in [282]. We also refer the reader to the excellent monograph of Nesterov and Nemirovskii [254] for an analysis of the construction of interior point methods for a larger class of problems than that considered in [282]. Below one can find our contribution. Let ‖·‖ be a given norm on Rⁱ, i a natural integer, and x₀ be a point of D such that the closed ball of radius r centered at x₀,

U(x₀, r) = {x ∈ Rⁱ : ‖x − x₀‖ ≤ r},    (9.1.1)

is included in D ⊆ Rⁱ, i.e.

U(x₀, r) ⊆ D.    (9.1.2)
We assume that the Jacobian F′(x₀) of F : D ⊆ X → X is nonsingular and that the Lipschitz condition (4.2.50) is satisfied. The Newton-Kantorovich Theorem 4.2.4 states that if

k = ℓη ≤ 1/2,    (9.1.3)

then there exists x* ∈ U(x₀, r) with F(x*) = 0. Moreover the sequences produced by Newton's method (4.1.3) and by the modified Newton method (4.1.4) are well defined and converge to x*.
Define

k⁰ = ℓ̄η ≤ 1/2,    (9.1.4)

where

ℓ̄ = (ℓ₀ + ℓ)/2    (9.1.5)

and ℓ₀ is the constant appearing in the center Lipschitz condition. Note that

k ≤ 1/2 ⇒ k⁰ ≤ 1/2    (9.1.6)
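Conditions (9.1.4)–(9.1.6) compare the classical proximity quantity k = ℓη with the weaker k⁰ = ℓ̄η. Since ℓ₀ ≤ ℓ, we have ℓ̄ = (ℓ₀ + ℓ)/2 ≤ ℓ, so k ≤ 1/2 forces k⁰ ≤ 1/2 while the converse can fail. A small numeric sketch (the values of η, ℓ, ℓ₀ below are hypothetical):

```python
def proximity(eta, ell, ell0):
    # k  = ell * eta                      -- classical quantity in (9.1.3)
    # k0 = ell_bar * eta, ell_bar = (ell0 + ell) / 2   -- (9.1.4)-(9.1.5)
    k = ell * eta
    k0 = 0.5 * (ell0 + ell) * eta
    return k, k0
```

With ℓ₀ = 0.5 < ℓ = 1.5 and η = 0.4 one gets k = 0.6 > 1/2 but k⁰ = 0.4 ≤ 1/2: the weaker condition certifies convergence where the classical one fails.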
but not vice versa, unless ℓ₀ = ℓ. Similarly, by simply replacing ℓ with ℓ₀ (since the center-Lipschitz condition instead of (4.2.50) is actually needed in the proof) and condition (9.1.3) by the weaker

k¹ = ηℓ₀ ≤ 1/2    (9.1.7)

in the proof of Theorem 1 in [282], we show that method (4.1.4) also converges to x* and that the improved bounds

‖yₙ − x*‖ ≤ (2β₀ λ₀^{2n−1}/(1 − λ₀²)) ξ₀  (n ≥ 1)    (9.1.8)

hold, where

β₀ = √(1 − 2k¹)/ω₀,  λ₀ = (1 − √(1 − 2k¹))/k¹    (9.1.9)

and

ξ₀ = 1 − √(1 − 2k¹).    (9.1.10)

In case ℓ₀ = ℓ, (9.1.9) reduces to (12) in [282]. Otherwise our error bounds are finer. Note also that

k ≤ 1/2 ⇒ k¹ ≤ 1/2    (9.1.11)

but not vice versa, unless ℓ₀ = ℓ. We can now describe the linear complementarity problem as follows: Given two matrices Q, R ∈ Rⁿˣⁿ (n ≥ 2) and a vector b ∈ Rⁿ, the horizontal linear
complementarity problem (HLCP) consists of approximating a pair of vectors (w, s) such that

ws = 0, Q(w) + R(s) = b, w, s ≥ 0.    (9.1.12)

The monotone linear complementarity problem (LCP) is obtained by taking R = −I and Q positive semidefinite. Moreover the linear programming problem (LP) and the quadratic programming problem (QP) can be formulated as HLCPs. That is, the HLCP is a suitable framework for studying interior point methods. We assume HLCP (9.1.12) is monotone in the sense that:

Q(u) + R(v) = 0 implies uᵗv ≥ 0, for all u, v ∈ Rⁿ.    (9.1.13)

Condition (9.1.13) holds if the HLCP is a reformulation of a QP. If the HLCP is a reformulation of an LP then the following stronger condition holds:

Q(u) + R(v) = 0 implies uᵗv = 0, for all u, v ∈ Rⁿ.    (9.1.14)
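For the LCP case R = −I with Q positive semidefinite, (9.1.13) is immediate: Q(u) − v = 0 forces uᵗv = uᵗQu ≥ 0. A 2 × 2 numeric check (the matrix below is an arbitrary positive definite example, not from the text):

```python
def complementarity_product(q, u):
    # With R = -I the constraint Q(u) + R(v) = 0 gives v = Q u, so
    # u^t v = u^t Q u >= 0 whenever Q is positive semidefinite (2x2 case).
    v = [q[0][0] * u[0] + q[0][1] * u[1],
         q[1][0] * u[0] + q[1][1] * u[1]]
    return u[0] * v[0] + u[1] * v[1]
```

The product is nonnegative for every direction u, which is exactly the monotonicity (9.1.13).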
In this case we say that the HLCP is skew-symmetric. If the HLCP has an interior point, i.e. there is (w, s) ∈ Rⁿ₊₊ × Rⁿ₊₊ satisfying Q(w) + R(s) = b, then for any parameter τ > 0 the nonlinear system

ws = τe, Q(w) + R(s) = b, w, s ≥ 0    (9.1.15)

has a unique positive solution x(τ) = [w(τ)ᵗ, s(τ)ᵗ]ᵗ. The set of all such solutions defines the central path C of the HLCP. It can be proved that (w(τ), s(τ)) converges to a solution of the HLCP as τ → 0. Such an approach for solving the HLCP is called a path following algorithm. At a basic step of a path following algorithm, an approximation (w, s) of (w(τ), s(τ)) has already been computed for some τ > 0. The algorithm determines the smaller value of the central path parameter τ₊ = (1 − θ)τ, where the value θ ∈ (0, 1) is computed in some unspecified way. The approximation (w₊, s₊) of (w(τ₊), s(τ₊)) is then computed, and the procedure is repeated with (w₊, s₊, τ₊) in place of (w, s, τ). In order to relate the path following algorithm to the Newton-Kantorovich theorem we introduce the notations

x = [wᵗ, sᵗ]ᵗ, x(τ) = [w(τ)ᵗ, s(τ)ᵗ]ᵗ, x₊ = [w₊ᵗ, s₊ᵗ]ᵗ, x(τ₊) = [w(τ₊)ᵗ, s(τ₊)ᵗ]ᵗ, etc.
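For n = 1 the central-path system (9.1.15) reduces to two scalar equations, and the Newton step used below in (9.1.23) can be written out explicitly. The following sketch (with illustrative data q, r, b, τ; not part of the text) solves the 2 × 2 linearized system in closed form:

```python
def newton_step_hlcp(w, s, q, r, b, tau):
    # One Newton step for the n = 1 version of (9.1.15):
    #   w*s = tau,   q*w + r*s = b.
    # The Jacobian is [[s, w], [q, r]]; the 2x2 system is solved by Cramer.
    f1 = w * s - tau
    f2 = q * w + r * s - b
    det = s * r - w * q
    dw = (r * f1 - w * f2) / det
    ds = (s * f2 - q * f1) / det
    return w - dw, s - ds
```

Starting from w = s = 1.2 with q = 1, r = −1, b = 0, τ = 1, the iterates converge quadratically to the central-path point w(1) = s(1) = 1.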
Then for any σ > 0 we define the nonlinear operator

F_σ(x) = (ws − σe, Q(w) + R(s) − b)ᵗ.    (9.1.16)
Then the system (9.1.15) defining x(τ) becomes

F_τ(x) = 0,    (9.1.17)

whereas the system defining x(τ₊) is given by

F_{(1−θ)τ}(x) = 0.    (9.1.18)
We assume that the initial guess x belongs to the interior of the feasible set of the HLCP,

F⁰ = {x = (wᵗ, sᵗ)ᵗ ∈ R²ⁿ₊₊ : Q(w) + R(s) = b}.    (9.1.19)

In order to verify the Newton-Kantorovich hypothesis for equation (9.1.17) we introduce the quantity
η = η(x, τ) = ‖F′(x)⁻¹F_τ(x)‖,    (9.1.20)

the measures of proximity

k = k(x, τ) = ηℓ, ℓ = ℓ(x),
k⁰ = k⁰(x, τ) = ηℓ̄, ℓ̄ = ℓ̄(x),    (9.1.21)
k¹ = k¹(x, τ) = ηℓ₀, ℓ₀ = ℓ₀(x),

and the normalized primal-dual gap

μ = μ(x) = wᵗs/n.    (9.1.22)
If for a given interior point x and a given parameter τ we have k⁰(x, τ) ≤ .5 for the Newton-Kantorovich method, or k¹(x, τ) ≤ .5 for the modified Newton-Kantorovich method, then the corresponding sequence starting from x will converge to the point x(τ) on the central path. We can now describe our algorithm, which is a weaker version of the one given in [282]:

Algorithm 9.1.1 (using the Newton-Kantorovich method). Given 0 < k₁⁰ < k₂⁰ < .5, ε > 0, and x₀ ∈ F⁰ satisfying k⁰(x₀, μ(x₀)) ≤ k₁⁰;
Set k ← 0 and τ₀ ← μ(x₀);
repeat (outer iteration)
Set (x, τ) ← (xₖ, τₖ), x̄ ← xₖ;
Determine the largest θ ∈ (0, 1) such that k⁰(x, (1 − θ)τ) ≤ k₂⁰;
Set τ ← (1 − θ)τ;
repeat (inner iteration)
Set x ← x − F′(x)⁻¹F_τ(x)
(9.1.23)
until k⁰(x, μ) ≤ k₁⁰;
Set (xₖ₊₁, τₖ₊₁) ← (x, τ);
Set k ← k + 1;
until wₖᵗsₖ ≤ ε.
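The outer/inner structure of Algorithm 9.1.1 can be mimicked on the toy n = 1 monotone HLCP with Q = [1], R = [−1], b = 0, whose central path is w(τ) = s(τ) = √τ. The fixed reduction factor θ below replaces the algorithm's maximal choice of θ, and a residual tolerance stands in for the proximity test k⁰(x, μ) ≤ k₁⁰ — everything here is an illustrative sketch, not Potra's algorithm:

```python
def follow_path(w, s, tau, theta=0.3, eps=1e-10):
    # Outer loop: shrink the path parameter, tau <- (1 - theta) * tau.
    # Inner loop: Newton steps on F_tau for the toy HLCP
    #   w*s = tau,  w - s = 0   (Q = [1], R = [-1], b = 0),
    # whose central path is w(tau) = s(tau) = sqrt(tau).
    while w * s > eps:
        tau *= 1.0 - theta
        for _ in range(20):
            f1 = w * s - tau
            f2 = w - s
            det = -s - w               # det of the Jacobian [[s, w], [1, -1]]
            dw = (-f1 - w * f2) / det
            ds = (s * f2 - f1) / det
            w, s = w - dw, s - ds
            if abs(f1) < 1e-14 and abs(f2) < 1e-14:
                break
    return w, s
```

Starting on the central path at τ = 1, the iterates stay positive and drive the complementarity product w·s below the target tolerance.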
For the modified Newton-Kantorovich algorithm, k₁⁰, k₂⁰, k⁰ should be replaced by k₁¹, k₂¹, k¹, and (9.1.23) by

Set x ← x − F′(x̄)⁻¹F_τ(x),

respectively. In order to obtain Algorithm 1 in [282] we need to replace k₁⁰, k₂⁰, k⁰ by k₁, k₂, k respectively. The above suggest that all results on interior point methods obtained in [282] using (9.1.3) can now be rewritten using only the weaker (9.1.5) (or (9.1.8)). We only state those results for which we will provide applications. Let us introduce the notation

Ψᵢᵃ = 1 + θᵢᵃ + √(2θᵢᵃ + rᵢᵃ), if the HLCP is monotone,
Ψᵢᵃ = 1 + qᵢᵃ + √(2qᵢᵃ + (qᵢᵃ)²), if the HLCP is skew-symmetric,    (9.1.24)

where rᵢᵃ = (θᵢᵃ)², tᵢᵃ = √kᵢᵃ, a = 0, 1, and

θᵢᵃ = tᵢᵃ (1 + tᵢᵃ/(1 − tᵢᵃ)), qᵢᵃ = tᵢᵃ/2, i = 1, 2.    (9.1.25)
Then by simply replacing k, k₁, k₂ by k⁰, k₁⁰, k₂⁰ respectively in the corresponding results in [282] we obtain the following improvements:

Theorem 9.1.2 The parameter θ determined at each outer iteration of Algorithm 9.1.1 satisfies

θ ≥ χᵃ/√n = λᵃ,

where

χᵃ = √2 (k₂ᵃ − k₁ᵃ)/((√2 + p t₁ᵃ)√Ψ₁ᵃ), if the HLCP is skew-symmetric or if no simplified Newton-Kantorovich steps are performed,
χᵃ = √2 (k₂ᵃ − k₁ᵃ)/((√2 + p k₁ᵃ)√Ψ₁ᵃ), otherwise,    (9.1.26)

where

p = √2 if the HLCP is monotone, and p = 1 if the HLCP is skew-symmetric.    (9.1.27)
Clearly, the lower bound λᵃ on θ is an improvement over the corresponding one in [283, Corollary 4]. In the next result a bound on the number of steps of the inner iteration that depends only on k₁⁰ and k₂⁰ is provided.

Theorem 9.1.3 If the Newton-Kantorovich method is used in Algorithm 9.1.1 then each inner iteration terminates in at most N⁰(k₁⁰, k₂⁰) steps, where

N⁰(k₁⁰, k₂⁰) = integer part [ log₂ x_{N⁰} / log₂((1 − √(1 − 2k₂⁰) − k₂⁰)/k₂⁰) ]    (9.1.28)

and

x_{N⁰} = [ (1 − p k₂⁰/√2)(t₂⁰)² − (1 − √(1 − 2k₂⁰)) k₁⁰ ] / [ √2 (t₂⁰)² √(1 − 2k₂⁰) Ψ₂⁰ + (1 − √(1 − 2k₂⁰))² (1 + k₁⁰) ].    (9.1.29)

If the modified Newton-Kantorovich method is used in Algorithm 9.1.1 then each inner iteration terminates in at most S¹(k₁¹, k₂¹) steps, where

S¹(k₁¹, k₂¹) = integer part [ log₂ x_{S¹} / log₂(1 − √(1 − √(1 − 2k₂¹))) ] + 1    (9.1.30)

and

x_{S¹} = [ (1 − p k₂¹/√2)(t₂¹)² − (1 − √(1 − 2k₂¹) − k₂¹) k₁¹ ] / [ √2 (t₂¹)² √(1 − 2k₂¹)(1 − √(1 − 2k₂¹) − k₂¹) Ψ₂¹ + (1 − √(1 − 2k₂¹))² (1 + k₁¹) ].
Clearly, if k₁¹ = k₁⁰ = k₁, k₂¹ = k₂⁰ = k₂, k¹ = k⁰ = k, Theorem 9.1.2 reduces to the corresponding Theorem 2 in [282]. Otherwise the following improvements hold:

N⁰(k₁⁰, k₂⁰) < N(k₁, k₂), N⁰ < N, S¹(k₁¹, k₂¹) < S(k₁, k₂), and S¹ < S.

Since the ratios k₁/k₁⁰, k₂/k₂⁰, k₁/k₁¹, k₂/k₂¹ can be arbitrarily large for a given triplet η, ℓ and ℓ₀, the choices

k₁⁰ = k₁¹ = .12, k₂⁰ = k₂¹ = .24 when k₁ = .21 and k₂ = .42
and

k₁⁰ = k₁¹ = .24, k₂⁰ = k₂¹ = .48 when k₁ = .245 and k₂ = .49

are possible. Then using formulas (9.1.27), (9.1.28) and (9.1.30) for our results and (9)–(11) in [282], we obtain the following tables:

(a) If the HLCP is monotone and only Newton directions are performed, then:

Potra: χ(.21, .42) > .17       Argyros: χ⁰(.12, .24) > .1
Potra: χ(.245, .49) > .199     Argyros: χ⁰(.24, .48) > .196

Potra: N(.21, .42) = 2         Argyros: N⁰(.12, .24) = 1
Potra: N(.245, .49) = 4        Argyros: N⁰(.24, .48) = 3

(b) If the HLCP is monotone and modified Newton directions are performed:

Potra: χ(.21, .42) > .149      Argyros: χ¹(.12, .24) > .098
Potra: χ(.245, .49) > .164     Argyros: χ¹(.24, .48) > .162

Potra: S(.21, .42) = 5         Argyros: S¹(.12, .24) = 1
Potra: S(.245, .49) = 18       Argyros: S¹(.24, .48) = 12
All the above improvements are obtained under weaker hypotheses and the same computational cost (in the case of Newton’s method) or less computational cost (in the case of the modified Newton method) since in practice the computation of ℓ requires that of ℓ0 and in general the computation of ℓ0 is less expensive than that of ℓ.
9.2
Case 2: LP Methods
We are motivated by paper [294], where a unified complexity analysis for Newton LP methods was given. Here we show that it is possible under weaker hypotheses
to provide a finer semilocal convergence analysis, which in turn can reduce the computational time for interior point algorithms appearing in linear programming and convex quadratic programming. Finally an example is provided to show that our results apply where the ones in [294] cannot. Let us assume that F : D ⊆ X → Y is an analytic operator and that F′(x)⁻¹ exists at x = x₀ ∈ D. Define

η = η(x₀) = ‖F′(x₀)⁻¹F(x₀)‖    (9.2.1)

and

γ = γ(x₀) = sup_{k≥2} ‖(1/k!) F′(x₀)⁻¹F⁽ᵏ⁾(x₀)‖^{1/(k−1)}.    (9.2.2)
Note that η(x₀) is the step length at x₀ when method (4.1.3) is applied to approximate x*. Smale in [312] showed that if D = X and

γη ≤ 1/8 = .125,    (9.2.3)

then the Newton-Kantorovich method (4.1.3) converges to x*, so that

‖xₙ₊₁ − xₙ‖ ≤ (1/2)^{2ⁿ−1} ‖x₁ − x₀‖  (n ≥ 0).    (9.2.4)
Rheinboldt in [296], using the Newton–Kantorovich theorem showed convergence assuming D ⊆ X and γη ≤ .11909565.
(9.2.5)
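Condition (9.2.3) can be checked in closed form for simple analytic functions. For F(x) = eˣ − 2 every derivative equals eˣ, so the supremum in (9.2.2) is sup_{k≥2}(1/k!)^{1/(k−1)} = 1/2, attained at k = 2. The example below is ours, not Smale's:

```python
import math

def smale_quantity(x0):
    # gamma * eta for F(x) = exp(x) - 2 at x0:
    #   every derivative is exp(x), so gamma = sup_k (1/k!)**(1/(k-1)) = 1/2
    #   (attained at k = 2), and eta = |F(x0)/F'(x0)| = |1 - 2*exp(-x0)|.
    gamma = max((1.0 / math.factorial(k)) ** (1.0 / (k - 1))
                for k in range(2, 30))
    eta = abs(1.0 - 2.0 * math.exp(-x0))
    return gamma * eta
```

At x₀ = 0.6 one gets γη ≈ 0.049 ≤ 1/8, so Smale's criterion already guarantees convergence of Newton's method to x* = ln 2.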
We need the following semilocal convergence result for the Newton-Kantorovich method (4.1.3) and twice Fréchet-differentiable operators:

Theorem 9.2.1 Let F : D ⊆ X → Y be a twice Fréchet-differentiable operator. Assume there exist a point x₀ ∈ D and parameters η ≥ 0, ℓ₀ ≥ 0, ℓ ≥ 0, δ ∈ [0, 1] such that:

F′(x₀)⁻¹ ∈ L(Y, X),    (9.2.6)
‖F′(x₀)⁻¹[F′(x) − F′(x₀)]‖ ≤ ℓ₀‖x − x₀‖,    (9.2.7)

‖F′(x₀)⁻¹F″(x)‖ ≤ ℓ    (9.2.8)
for all x ∈ D, U (x0 , t∗ ) ⊆ D,
(9.2.9)
and

h_δ = (ℓ + δℓ₀)η ≤ δ,    (9.2.10)

where

t* = limₙ→∞ tₙ,    (9.2.11)

with

t₀ = 0, t₁ = η, tₙ₊₂ = tₙ₊₁ + ℓ(tₙ₊₁ − tₙ)²/(2(1 − ℓ₀tₙ₊₁))  (n ≥ 0).    (9.2.12)
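The majorizing sequence (9.2.12) is scalar, so it can be tabulated cheaply; its monotone convergence to t* is what drives the estimates of the theorem. A sketch with hypothetical values of η, ℓ, ℓ₀:

```python
def majorizing_sequence(eta, ell, ell0, n=25):
    # (9.2.12): t0 = 0, t1 = eta,
    # t_{k+2} = t_{k+1} + ell*(t_{k+1} - t_k)**2 / (2*(1 - ell0*t_{k+1})).
    t = [0.0, eta]
    for _ in range(n):
        t.append(t[-1] + ell * (t[-1] - t[-2]) ** 2
                 / (2.0 * (1.0 - ell0 * t[-1])))
    return t
```

For η = 0.2, ℓ = 1, ℓ₀ = 0.8 the sequence increases monotonically and stalls near t* ≈ 0.2245, mirroring the quadratic decay of the Newton steps it majorizes.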
Then
(a) the sequence {xₙ} (n ≥ 0) generated by the Newton-Kantorovich method (4.1.3) is well defined, remains in U(x₀, t*) for all n ≥ 0, and converges to a unique solution x* of the equation F(x) = 0 in U(x₀, t*). Moreover the following estimates hold for all n ≥ 0:

‖xₙ₊₂ − xₙ₊₁‖ ≤ ℓ‖xₙ₊₁ − xₙ‖²/(2(1 − ℓ₀‖xₙ₊₁ − x₀‖)) ≤ tₙ₊₂ − tₙ₊₁,    (9.2.13)

‖xₙ − x*‖ ≤ t* − tₙ ≤ t₀** − tₙ ≤ αη,    (9.2.14)

where

t₀** = [1 + (δ₀/2)/(1 − δ/2)] η, t** = αη, α = 2/(2 − δ),    (9.2.15)

and

δ₀ = ℓη/(1 − ℓ₀η).    (9.2.16)
Furthermore the solution x* is unique in U(x₀, R), R = 2/ℓ₀ − t*, provided that

U(x₀, R) ⊆ D, and ℓ₀ ≠ 0.    (9.2.17)

(b) If

H_α = (ℓ + 2ℓ₀α)η < 2,    (9.2.18)

then
ℓη 0, ℓ > 0 such that F ′ (x0 )−1 ∈ L(Y, X),
′
F (x0 )−1 F (x0 ) ≤ η,
′
F (x0 )−1 F ′′ (x) ≤ ℓ, for all x ∈ D, 4 3 γη ≤ and U (x0 , η) ⊆ D. 9 2 Then show that the the Newton-Kantorovich method is well defined, remains in U (x0 , 32 η) for all n ≥ 0, converges to a unique zero x∗ of F in D ∩ U (x0 , 1ℓ ) and satisfies 2n 1 ∗ kxn − x k ≤ 2 η, (n ≥ 1). 2 9.3. [294] Let F : D ⊆ X → Y be analytic. Assume there exist x0 ∈ D, δ ≥ 0 such that F ′ (x0 )−1 ∈ L(Y, X), 1 1 η≤ δ≤ , 2 40γ
and

Ū(x₀, 4δ) ⊆ D.

If ‖x̄ − x₀‖ ≤ δ then show that the Newton-Kantorovich method (4.1.3) starting at x̄ is well defined, and converges to a unique zero x* of F in Ū(x₀, (3/2)η), so that

‖xₙ − x*‖ ≤ (7/2)(1/2)^{2ⁿ} δ.
9.4. Consider the LP barrier method [294]. Let Int = {x : A(x) > b}, where A is a real matrix with aᵢ denoting the i-th row of A, and let h : Int × R₊ → R denote the map

h(x, t) = cᵗx − t Σᵢ ln(aᵢx − bᵢ).

For fixed t, the map x → h(x, t) is strictly convex and has a unique minimum. The sequence of minima as t → 0 converges to the optimal solution of the LP. The algorithm simply computes a Newton-Kantorovich sequence {xₙ} (n ≥ 0), where

x_{n+1} = xₙ − ∇²ₓh(xₙ, t_{n+1})⁻¹∇ₓh(xₙ, t_{n+1})

and tₙ → 0. Show:

(a) for suitable x₀ we can always choose t_{n+1} = (1 − 1/(41√m))tₙ and each xₙ will be a "good" approximation to the minimum of the map x → h(x, tₙ). Hint: Let y be the minimum of x → h(x, t), assume ‖ȳ − y‖ ≤ 1/20 and show ‖ȳ′ − y′‖′ ≤ 1/20, where ‖x‖′ = ‖Δ(y)⁻¹A(x)‖₂ and, for t′ > 0, y′ and ‖·‖′ are defined in an obvious way. Then use Exercise 9.3.

(b) An O(√m L) iteration bound.

(c) the choice of the sequence {tₙ} above can be improved if 41 is replaced by any a ∈ {22, 23, ..., 40}. Hint: Use Theorem 9.2.3. Note that this is an improvement over the choice in (a).

(d) An O(√m L) iteration bound using the choices of sequences {tₙ} given in (c).

(e) Show how to improve along the same lines the QP barrier method, the primal LP algorithm, the primal-dual LP algorithm and the primal-dual QP algorithm as introduced in [294].
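The short-step barrier iteration of Exercise 9.4(a) can be sketched numerically. The update tₙ₊₁ = (1 − 1/(41√m))tₙ and the single Newton step per barrier parameter follow the exercise; the LP data (the box constraints, the cost vector, and the starting point) are hypothetical choices for illustration.

```python
import numpy as np

# Sketch of the short-step LP barrier method of Exercise 9.4;
# the problem data A, b, c and the starting point are hypothetical.
def barrier_newton(A, b, c, x, t, n_steps):
    m = A.shape[0]
    shrink = 1.0 - 1.0 / (41.0 * np.sqrt(m))     # t_{n+1} = (1 - 1/(41*sqrt(m))) t_n
    for _ in range(n_steps):
        t *= shrink
        s = A @ x - b                            # slacks a_i x - b_i > 0
        grad = c - t * A.T @ (1.0 / s)           # grad_x h(x, t)
        hess = t * A.T @ np.diag(1.0 / s**2) @ A # Hessian of h(., t)
        x = x - np.linalg.solve(hess, grad)      # one Newton step per t
    return x, t

# minimize x1 + x2 over the box 0 <= x <= 1, encoded as A x >= b
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([0., 0., -1., -1.])
c = np.array([1., 1.])
x, t = barrier_newton(A, b, c, np.array([0.5, 0.5]), t=1.0, n_steps=400)
print(x)   # iterates track the central path toward the optimal corner (0, 0)
```

With the damped update the single Newton step per value of t suffices to stay close to the central path, which is the content of part (a).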
Chapter 10
Generalized equations Recent results on generalized equations that compare favourably with earlier ones are presented in this chapter.
10.1
Strongly regular solutions
In this section we are concerned with the problem of approximating a strongly regular solution x∗ of generalized equation 0 ∈ F (x) + NC (x)
(10.1.1)
where F is a continuously Fréchet-differentiable operator defined on an open, nonempty convex subset D of a Banach space X with values in a Banach space Y, and N_C(x) is the normal cone to C at x (C ⊆ X closed and convex). Many problems in Mathematical Programming/Economics can be formulated as generalized equations [68], [99], [298], [302] (see also Chapter 7). Here, motivated by [298], we show that it is possible, under the same hypotheses and computational cost, to provide a wider choice of initial guesses in the local case and, under weaker conditions, finer error bounds on the distances involved as well as more precise information on the location of the solution in the semilocal case. We achieve all this by using more precise majorizing sequences, along the lines of our work in Chapter 5. We first provide in condensed form the preliminaries on generalized equations and strongly regular solutions, and then we provide a local convergence analysis (see Theorem 10.1.6) and a semilocal convergence analysis (see Theorem 10.1.10) for method (4.1.3). Finally we show that our results compare favorably with the earlier ones. To make the section as self-contained as possible we recall the needed definitions from [298]. For more detailed information on strongly regular equations see [298], [302] and the references there.
Definition 10.1.1 Let F : D ⊆ X → Y be an operator with a continuous Fr´echetderivative F ′ . Let y be a fixed point in D. Then the linearization of F at y denoted by LFy is defined by LFy (x) = F (y) + F ′ (y)(x − y).
(10.1.2)
It follows from (10.1.2) that LF_y is an affine operator that agrees with F at y and has "slope" equal to that of F at y. Consequently the linearization of a generalized equation at y is obtained by replacing F by LF_y; formally:

Definition 10.1.2 Let F, D, X, Y be as in the previous definition. Then the linearization of the generalized equation

0 ∈ F(x) + N_K(x)    (10.1.3)

at y is the generalized equation 0 ∈ LF_y(x) + N_K(x), where K is a nonempty, closed convex cone in D, and N_K(x) is the normal cone to K at x.

Definition 10.1.3 Let T be a set-valued operator. We define

T⁻¹(y) = {x | y ∈ T(x)}.    (10.1.4)

It then follows that T⁻¹(0) = {x | 0 ∈ T(x)} is the solution set of the generalized equation 0 ∈ T(x). We pay special attention to T⁻¹ when T = LF_y + N_K. The following definition is due to Robinson [298] and describes the behavior of (LF_y + N_K)⁻¹ suitable for the convergence of Newton's method:

Definition 10.1.4 Let 0 ∈ F(x) + N_K(x) have a solution x*. Set T = LF_{x*} + N_K. Then x* is a strongly regular solution of 0 ∈ F(x) + N_K(x) if and only if there exist a neighborhood B₁ of the origin and a neighborhood B₂ of x*, such that the restriction of B₂ ∩ T⁻¹ to B₁ is a single-valued and Lipschitz continuous operator. Note that the operator B₂ ∩ T⁻¹ is defined by (B₂ ∩ T⁻¹)(y) = B₂ ∩ T⁻¹(y).
Finally we need a corollary of Theorem 2.4 in [298].

Corollary 10.1.5 Let F : D ⊆ X → Y have a Fréchet derivative F′ on D. Suppose x₀ ∈ D and the generalized equation 0 ∈ LF_{x₀}(x) + N_C(x) has a strongly regular solution with associated Lipschitz constant d. Then there exist positive constants ρ, r and R such that for all x ∈ U(x₀, ρ) the following holds: (LF_x + N_C)⁻¹ ∩ U(x₁, r) restricted to U(0, R) is single-valued and Lipschitz continuous with modulus

d/(1 − d‖F′(x) − F′(x₀)‖).
We can provide the following local convergence result for Newton's method involving generalized equations:

Theorem 10.1.6 Let C be a nonempty, closed, convex subset of X, and let D be a nonempty, open, convex subset of X. Moreover let F : D → Y have a Lipschitz continuous Fréchet derivative F′ with Lipschitz constant b. Suppose: the generalized equation

0 ∈ F(x) + N_C(x)    (10.1.5)

has a strongly regular solution x* with associated Lipschitz constant d; the operator F′ has a center-Lipschitz constant (at x = x*) b₀. Let ρ, r, and R be the positive constants introduced in Corollary 10.1.5. Let x₀ ∈ D and define the notation:

r₀ = ‖x* − x₀‖,
S_x = LF_x + N_C,
a₀ = d(1 − db₀r₀)⁻¹,
h₀ = (1/2)a₀br₀.

Moreover suppose:

d(b + 2b₀)r₀ < 2,    (10.1.6)
(1/2)br₀² < R,    (10.1.7)
r₀ < ρ,    (10.1.8)
U(x*, r₀) ⊂ D ∩ U(x*, ρ).    (10.1.9)

Then the Newton sequence {xₙ} (n ≥ 0) given by

x_{n+1} = (S_{xₙ})⁻¹(0) ∩ U(x*, r)    (10.1.10)

is well defined, remains in U(x*, r₀) for all n ≥ 0 and converges to x*. Moreover the following estimates hold for all n ≥ 0:

‖x_{n+1} − x*‖ ≤ (1/2)a₀b‖xₙ − x*‖²    (10.1.11)

and

‖x_{n+1} − x*‖ ≤ 2(a₀b)⁻¹ h₀^{2^{n+1}}.    (10.1.12)

Note that b₀ ≤ b holds in general and that b/b₀ can be arbitrarily large.
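The quadratic recursion (10.1.11) and its closed form (10.1.12) can be checked numerically. In this sketch the constants a₀, b, r₀ are hypothetical values chosen so that h₀ = (1/2)a₀br₀ < 1, as the theorem requires.

```python
# Numerical check that iterating the quadratic bound (10.1.11) never
# exceeds the closed-form bound (10.1.12); a0, b, r0 are hypothetical.
a0, b, r0 = 2.0, 1.0, 0.5
h0 = 0.5 * a0 * b * r0          # here h0 = 0.5 < 1

e = r0                           # e_n bounds ||x_n - x*||
for n in range(5):
    e = 0.5 * a0 * b * e**2                          # recursion (10.1.11)
    closed = 2.0 / (a0 * b) * h0 ** (2 ** (n + 1))   # bound (10.1.12)
    assert e <= closed + 1e-15
print(e)   # doubly exponential decay
```

With e₀ = r₀ the two bounds coincide exactly, which is how (10.1.12) follows from (10.1.11) by induction.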
Proof. We introduce

T_x = U(x*, r) ∩ S_x⁻¹,  T_k = T_{x_k},
f(x) = d[1 − d‖F′(x) − F′(x*)‖]⁻¹,
J(x) = LF_x(x*) − LF_{x*}(x*).

We will show:

‖T_x(u) − T_x(v)‖ ≤ a₀‖u − v‖ for all x ∈ U(x*, r₀) and u, v ∈ U(0, R),    (10.1.13)
x* ∈ S_x⁻¹(J(x)) for all x ∈ D,    (10.1.14)
x* = T_x(J(x)) for x ∈ U(x*, r₀) and J(x) ∈ U(0, R),    (10.1.15)
‖J(x)‖ ≤ (b/2)‖x − x*‖²,    (10.1.16)
h₀ < 1.    (10.1.17)

By Corollary 10.1.5, T_x is single-valued and Lipschitz continuous with modulus f(x) for x ∈ U(x*, ρ). Since, for x ∈ U(x*, r₀) ⊂ U(x*, ρ) ∩ D,

‖F′(x) − F′(x*)‖ ≤ b₀‖x − x*‖ ≤ b₀r₀,

we get f(x) ≤ a₀, which shows (10.1.13) because f(x) is a Lipschitz constant for the operator T_x. Let x ∈ D; then we get

0 ∈ F(x*) + N_C(x*) = LF_{x*}(x*) + N_C(x*) = LF_x(x*) − J(x) + N_C(x*) = −J(x) + S_x(x*),

so J(x) ∈ S_x(x*), which shows (10.1.14). Equation (10.1.15) follows from (10.1.14) and the fact that the operator T_x is single-valued on U(x*, r₀). Estimate (10.1.16) is standard under the stated hypotheses, whereas (10.1.17) follows immediately from hypothesis (10.1.6).

Moreover, using induction on k ≥ 1 we show:

x_k = T_{k−1}(0) is well defined,    (10.1.18)
‖x_k − x*‖ ≤ (1/2)a₀b‖x_{k−1} − x*‖² ≤ h₀r₀,    (10.1.19)
J(x_k) ∈ U(0, R).    (10.1.20)

For k = 1: T₀(0) is single-valued by Corollary 10.1.5, which shows (10.1.18). By (10.1.7) and (10.1.16),

‖J(x₀)‖ ≤ (b/2)‖x₀ − x*‖² = (b/2)r₀² < R,

and hence by (10.1.15), x* = T₀(J(x₀)). Therefore we get

‖x₁ − x*‖ = ‖T₀(0) − T₀(J(x₀))‖ ≤ a₀‖J(x₀)‖  by (10.1.13)
≤ (1/2)ba₀‖x₀ − x*‖² = (1/2)a₀br₀²  by (10.1.16)
= h₀r₀ < r₀  by (10.1.17).

It follows by (10.1.8) and (10.1.9) that x₁ ∈ U(x*, r₀) ⊆ D. By (10.1.16) we obtain

‖J(x₁)‖ ≤ (b/2)‖x₁ − x*‖² ≤ (b/2)(h₀r₀)² = (1/2)br₀²h₀² < (1/2)br₀² < R

by (10.1.17) and (10.1.7). Therefore (10.1.18)–(10.1.20) hold for k = 1. Let us assume (10.1.18)–(10.1.20) hold for all k ≤ n; we must show they hold for k = n + 1. By (10.1.8), (10.1.17), and (10.1.19) we conclude that (10.1.13) holds for x = xₙ. Using (10.1.15) and (10.1.20), we get x* = Tₙ(J(xₙ)). That is, x_{n+1} = Tₙ(0) is well defined, and by (10.1.13)

‖x_{n+1} − x*‖ = ‖Tₙ(0) − Tₙ(J(xₙ))‖ ≤ a₀‖J(xₙ)‖
≤ (1/2)ba₀‖xₙ − x*‖²  by (10.1.16), (10.1.19) and U(x*, r₀) ⊆ D
≤ (1/2)ba₀(h₀r₀)²  by (10.1.19)
= h₀³r₀ < h₀r₀ < r₀.

Furthermore, by x_{n+1} ∈ U(x*, r₀) ⊆ D and (10.1.16) we get

‖J(x_{n+1})‖ ≤ (b/2)‖x_{n+1} − x*‖² ≤ (b/2)(h₀r₀)² ≤ (1/2)br₀² < R,

which completes the induction. Finally (10.1.11) follows from (10.1.19), and (10.1.12) from (10.1.11) using induction.

The following two results can be found in Section 5.1.

Lemma 10.1.7 Let b₀ > 0, b > 0, d > 0 and η > 0 with b₀ ≤ b. Set h = d((b₀ + b)/2)η and suppose h ≤ 1/2. Define t₀ = 0, t_{k+1} = G₀(t_k) (k ≥ 0), where

G₀(t) = ((1/2)dbt² − η)/(db₀t − 1).

Let t* = lim_{n→∞} tₙ and t** = 2η. Define also Q₀(s, t) = dbs²/(2(1 − db₀t)). Then the following hold:

(a) t_{k+1} = t_k + Q₀(t_k − t_{k−1}, t_k) (k ≥ 1);
(b) G₀ is monotone non-decreasing on [0, t*];
(c) t* is the smallest fixed point of G₀ on [0, t**];
(d) the sequence {t_k} is monotonically increasing, and lim_{n→∞} tₙ exists;
(e) t_{k+1} − t_k ≤ 2⁻ᵏη (k ≥ 0);
(f) t_{k+1} ≤ 2(1 − (1/2)^{k+1})η (k ≥ 0);
(g) |t* − t_k| ≤ (2h)^{2ᵏ} · (bd2ᵏ)⁻¹ (k ≥ 0).
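The majorizing sequence of Lemma 10.1.7 can be tabulated via the recursion of part (a). In this sketch the constants d, b₀, b, η are hypothetical, chosen so that h = d((b₀ + b)/2)η ≤ 1/2, and we verify the monotonicity and difference bounds (d)–(f) numerically.

```python
# Majorizing sequence of Lemma 10.1.7 via t_{k+1} = t_k + Q0(t_k - t_{k-1}, t_k);
# the constants d, b0, b, eta are hypothetical, with h <= 1/2.
d, b0, b, eta = 1.0, 0.5, 1.0, 0.5
h = d * (b0 + b) / 2 * eta
assert h <= 0.5

def Q0(s, t):
    return d * b * s**2 / (2.0 * (1.0 - d * b0 * t))

t = [0.0, eta]
for k in range(1, 30):
    t.append(t[k] + Q0(t[k] - t[k-1], t[k]))

# parts (d)-(f): monotone increase, difference bound 2^{-k} eta, cap t** = 2 eta
assert all(t[k] <= t[k+1] for k in range(30))
assert all(t[k+1] - t[k] <= 2.0**(-k) * eta + 1e-12 for k in range(30))
print(round(t[-1], 6))   # numerical approximation of t*
```

The differences t_{k+1} − t_k collapse quadratically, so the numerical limit stabilizes after a handful of terms, well below the cap t** = 2η.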
Lemma 10.1.8 Let D ⊆ X, and let {xₙ} (n ≥ 0) be a sequence in X. With the notation used in Lemma 10.1.7, assume:
(a) ‖x_{k+2} − x_{k+1}‖ ≤ Q₀(‖x_{k+1} − x_k‖, ‖x_{k+1} − x₀‖) whenever x_k, x_{k+1} ∈ D;
(b) x₀ ∈ D;
(c) t₁ ≥ η = ‖x₁ − x₀‖;
(d) x ∈ D whenever ‖x − x₀‖ ≤ t*;
(e) the hypotheses of Lemma 10.1.7 hold.
Then the following conclusions hold:
(f) ‖x_k − x₀‖ ≤ t* (k ≥ 0);
(g) the sequence {xₙ} converges to some x* with ‖x* − x₀‖ ≤ t*;
(h) ‖x* − x_{k+1}‖ ≤ t* − t_{k+1} (k ≥ 0);
(i) ‖x_{k+1} − x_k‖ ≤ t_{k+1} − t_k (k ≥ 0).

Lemma 10.1.9 Let D be an open, convex subset of X. Let F : D → Y be continuously differentiable. Suppose that for some ℓ > 0

‖F′(u) − F′(v)‖ ≤ ℓ‖u − v‖ whenever u, v ∈ D.

Then

‖F(u) − LF_v(u)‖ ≤ (1/2)ℓ‖u − v‖² for all u, v ∈ D.
We can show the following semilocal convergence theorem for Newton's method:

Theorem 10.1.10 Let C, D, F, b, b₀ and S_x be as in Theorem 10.1.6. Let x₀ ∈ D and assume that the generalized equation 0 ∈ S_{x₀}(x) is strongly regular at x₁. Let ρ, r, and R be the positive constants associated with strong regularity, and d the associated Lipschitz constant, as given in Corollary 10.1.5. Let η := ‖x₁ − x₀‖ and t* := lim_{n→∞} tₙ, where a := d((b₀ + b)/2)η. Suppose that the following relations hold: a ≤ 1/2, (1/2)bη² ≤ R, and U(x₀, t*) ⊂ D ∩ U(x₀, ρ). Then the sequence {xₙ} defined by

x_{n+1} := S_{xₙ}⁻¹(0) ∩ U(x₀, r)    (10.1.21)

is well defined, converges to some x* ∈ U(x₀, t*) which satisfies 0 ∈ F(x*) + N_C(x*), and ‖x* − xₙ‖ ≤ t* − tₙ for n = 1, 2, .... Note that by b₀ here we mean the center-Lipschitz constant of F′ at x = x₀.

Proof. We first define the following notation:

g(x) := d(1 − db₀‖x − x₀‖)⁻¹,
Tₙ := U(x₁, r) ∩ S_{xₙ}⁻¹,
LFₙ := LF_{xₙ},
Jₙ := F(xₙ) − LF_{n−1}(xₙ).
Before beginning the induction part of the proof, we prove that the following relation holds:

x_{n+1} = T_{n+1}(J_{n+1}) whenever x_n, x_{n+1} ∈ U(x₀, t*),    (10.1.22)

x_{n+1} := Tₙ(0), J_{n+1} ∈ U(0, R) and T_{n+1} restricted to U(0, R) is single-valued. By the definition of Tₙ, x_{n+1} ∈ U(x₁, r) and 0 ∈ LFₙ(x_{n+1}) + N_C(x_{n+1}). Thus

0 ∈ LF_{n+1}(x_{n+1}) − [LF_{n+1}(x_{n+1}) − LFₙ(x_{n+1})] + N_C(x_{n+1}) = LF_{n+1}(x_{n+1}) − J_{n+1} + N_C(x_{n+1}).

Rearranging gives J_{n+1} ∈ (LF_{n+1} + N_C)(x_{n+1}), that is, x_{n+1} ∈ (LF_{n+1} + N_C)⁻¹(J_{n+1}). Since J_{n+1} ∈ U(0, R) and x_{n+1} ∈ U(x₁, r), we have x_{n+1} = T_{n+1}(J_{n+1}), which shows (10.1.22). We now proceed by induction to prove the following for all n ≥ 1:

Tₙ restricted to U(0, R) is single-valued and Lipschitz continuous with Lipschitz constant g(xₙ).    (10.1.23)
‖x_{n+1} − xₙ‖ ≤ Q₀(‖xₙ − x_{n−1}‖, ‖xₙ − x₀‖), where x_{n+1} := Tₙ(0).    (10.1.24)
x_{n+1} ∈ U(x₀, t*).    (10.1.25)
‖J_{n+1}‖ ≤ R.    (10.1.26)

We first show that (10.1.23)–(10.1.26) hold for n = 1. By Lemma 10.1.7(d), t₁ ≤ t*. By definition, t₁ = G₀(t₀) = η = ‖x₁ − x₀‖. Hence, x₁ ∈ U(x₀, t₁) ⊂ U(x₀, t*) ⊂ D ∩ U(x₀, ρ). By Corollary 10.1.5, T₁ restricted to U(0, R) is a single-valued and Lipschitz continuous operator with modulus d(1 − d‖F′(x₁) − F′(x₀)‖)⁻¹. By hypothesis, ‖F′(x₁) − F′(x₀)‖ ≤ b₀‖x₁ − x₀‖, so that d(1 − d‖F′(x₁) − F′(x₀)‖)⁻¹ ≤ g(x₁). Hence, (10.1.23) holds for n = 1. Now define x₂ = T₁(0). By Lemma 10.1.9, ‖J₁‖ := ‖F(x₁) − LF₀(x₁)‖ ≤ (1/2)b‖x₁ − x₀‖² = (1/2)bη² ≤ R. Thus, J₁ ∈ U(0, R) and, by (10.1.23),

‖T₁(J₁) − T₁(0)‖ ≤ g(x₁)·‖J₁‖ ≤ (1/2)db‖x₁ − x₀‖²(1 − db₀‖x₁ − x₀‖)⁻¹ = Q₀(‖x₁ − x₀‖, ‖x₁ − x₀‖).

But x₂ = T₁(0) and, by (10.1.22), x₁ = T₁(J₁); hence ‖x₂ − x₁‖ = ‖T₁(0) − T₁(J₁)‖. This establishes (10.1.24) when n = 1. By Lemma 10.1.8(f), x₂ ∈ U(x₀, t*). Thus, (10.1.25) holds for n = 1. Also, x₂ ∈ U(x₀, t*) ⊂ D, and thus
by Lemmas 10.1.7(e), 10.1.8(i) and 10.1.9 we get:

‖J₂‖ := ‖F(x₂) − LF₁(x₂)‖ ≤ (1/2)b‖x₂ − x₁‖² ≤ (1/2)b(t₂ − t₁)² ≤ (1/2)b(2⁻¹η)² ≤ (1/2)bη² ≤ R.

Thus, (10.1.26) holds for n = 1, and we have completed the induction when n = 1. Suppose that (10.1.23)–(10.1.26) hold for all n ≤ k. We proceed to show that they also hold for n = k + 1. By (10.1.25), x_{k+1} ∈ U(x₀, t*) ⊂ U(x₀, ρ) ∩ D. By Corollary 10.1.5, T_{k+1} restricted to U(0, R) is single-valued and Lipschitz continuous with modulus d(1 − d‖F′(x_{k+1}) − F′(x₀)‖)⁻¹. By hypothesis, ‖F′(x_{k+1}) − F′(x₀)‖ ≤ b₀‖x_{k+1} − x₀‖. Thus, g(x_{k+1}) ≥ d(1 − d‖F′(x_{k+1}) − F′(x₀)‖)⁻¹. Hence (10.1.23) holds for n = k + 1. Let x_{k+2} := T_{k+1}(0). By (10.1.22)–(10.1.26), x_{k+1} = T_{k+1}(J_{k+1}). Thus,

‖x_{k+2} − x_{k+1}‖ = ‖T_{k+1}(0) − T_{k+1}(J_{k+1})‖ ≤ g(x_{k+1})‖J_{k+1}‖
≤ g(x_{k+1})·(1/2)b‖x_{k+1} − x_k‖² = Q₀(‖x_{k+1} − x_k‖, ‖x_{k+1} − x₀‖).

This establishes (10.1.24) for n = k + 1. It follows from Lemma 10.1.8(f) that x_{k+2} ∈ U(x₀, t*) ⊂ D, thus (10.1.25) holds for n = k + 1. Now, by Lemmas 10.1.7, 10.1.8(i) and 10.1.9 we get:

‖J_{k+2}‖ = ‖F(x_{k+2}) − LF_{k+1}(x_{k+2})‖ ≤ (1/2)b‖x_{k+2} − x_{k+1}‖² ≤ (1/2)b(t_{k+2} − t_{k+1})² ≤ (1/2)bη² ≤ R.

Thus, (10.1.26) holds for n = k + 1, which completes the induction. By Lemma 10.1.8, the sequence {xₙ} converges to x* ∈ U(x₀, t*) and ‖x* − xₙ‖ ≤ (2ⁿbd)⁻¹(2a)^{2ⁿ}. It remains to show that x* is a solution of the generalized equation 0 ∈ F(x) + N_C(x). Let graph N_C denote the graph of N_C, that is, graph N_C = {(u, v) | v ∈ N_C(u)}. The assumptions on C imply that graph N_C is a closed subset of X × X. By construction,

(x_{n+1}, −LFₙ(x_{n+1})) ∈ graph N_C.
It follows from the continuity properties of F and F′ on D that (x_{n+1}, −LFₙ(x_{n+1})) converges to (x*, −F(x*)). Since graph N_C is closed, (x*, −F(x*)) belongs to graph N_C, which implies 0 ∈ F(x*) + N_C(x*).

Remark 10.1.1 Denote by {sₙ} the sequence {tₙ} when b₀ = b. Then our results reduce to the ones in [200]. Otherwise we have (see Chapter 5)

tₙ < sₙ,
t_{n+1} − tₙ < s_{n+1} − sₙ,
t* − tₙ ≤ s* − sₙ,
t* ≤ s*,
h < h_K = 2dbη,
r_J = 2/(3bd) < r₀,

which justify the claims made in the introduction.
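The comparison tₙ ≤ sₙ of Remark 10.1.1 is easy to observe numerically: running the recursion of Lemma 10.1.7(a) once with a center-Lipschitz constant b₀ < b and once with b₀ = b (the classical Kantorovich case) gives termwise-smaller iterates. The constants below are hypothetical, chosen so that both sequences converge.

```python
# Comparison of Remark 10.1.1: {t_n} with b0 < b versus the Kantorovich
# sequence {s_n} with b0 = b (d, b, eta are hypothetical constants).
d, b, eta = 1.0, 1.0, 0.4

def majorizing(b0, n=25):
    t = [0.0, eta]
    for k in range(1, n):
        t.append(t[k] + d * b * (t[k] - t[k-1])**2 / (2 * (1 - d * b0 * t[k])))
    return t

t = majorizing(b0=0.5)   # center-Lipschitz constant b0 < b
s = majorizing(b0=1.0)   # classical case b0 = b
assert all(ti <= si for ti, si in zip(t, s))
print(t[-1], s[-1])      # numerical limits: t* <= s*
```

The finer sequence converges to a strictly smaller limit, which is the source of the sharper error bounds claimed in the introduction.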
10.2
Quasi Methods
In this section we are concerned with the problem of approximating a strongly regular solution x∗ of generalized equation 0 ∈ F (x) + NC (x)
(10.2.1)
where F is a continuously Fréchet-differentiable operator defined on an open, convex, nonempty subset D of Rᵏ (k ≥ 1 an integer) with values in Rᵏ, and

N_C(x) = {z | ⟨z, q − x⟩ ≤ 0 for all q ∈ C} if x ∈ C,  N_C(x) = ∅ if x ∉ C.

We say a point x* ∈ D satisfies the generalized equation (10.2.1) if and only if x* ∈ C and ⟨F(x*), q − x*⟩ ≥ 0 for all q ∈ C. Many problems in Mathematical Programming/Economics and engineering can be formulated as generalized equations of the form (10.2.1). A survey on generalized equations and their solutions can be found in [68], [99], [298], [301], [302] and the references there. Newton's method is undoubtedly the most popular method for approximating x*.
However Newton's method involves the evaluation of F′ at each step, which is in general an expensive task. That is why in such cases we replace the linearization F(xₙ) + F′(xₙ)(• − xₙ) with an affine operator F(xₙ) + B(• − xₙ) whose "derivative" B approximates F′(xₙ) and is easier to compute. The Secant approximations are one such example, and more generally the popular so-called quasi-Newton methods [140], [201], [263], [298]. Newton's method for approximating x* iteratively solves 0 = F(xₙ) + F′(xₙ)(x − xₙ) to find a solution x_{n+1}, whereas a quasi-Newton method iteratively solves 0 = F(xₙ) + Bₙ(x − xₙ), where Bₙ approximates F′(xₙ) in a sense to be made precise later. A survey on the convergence of quasi-Newton methods can be found in [140], [201], [263], [298] and the references there. Here, motivated by the work in Chapter 5, we show that a larger convergence radius and finer error bounds on the distances involved can be obtained than before [201], [298], and under the same computational cost. This observation is important in computational mathematics since it allows a wider choice of initial guesses x₀ and, given an error tolerance ε > 0, finds the approximate solution in fewer computational steps. We need a corollary to Theorem 2.4 of Robinson [298]:

Corollary 10.2.1 Let F, F′, D, C, Rᵏ be as in the introduction. Assume equation (10.2.1) has a strongly regular solution x* ∈ D [201], [298], with associated Lipschitz constant d. Then there exist positive constants r, R, b and r₀ such that the following holds for a k × k matrix A and x ∈ Rᵏ:

U(x*, r) ∩ (F(x) + A·(• − x) + N_C)⁻¹

restricted to U(0, R) is single-valued and Lipschitz continuous with modulus d(1 − d‖A − F′(x*)‖)⁻¹, provided x ∈ U(x*, r₀) and ‖A − F′(x*)‖ < b.
We can state and prove the following local convergence result for quasi-Newton methods:

Theorem 10.2.2 Let D, C, F, F′, N_C be as in the introduction. Assume: x* ∈ D is a strongly regular solution of the generalized equation (10.2.1) with Lipschitz constant d; there exist positive constants ℓ₀ and ℓ such that

‖F′(x) − F′(y)‖ ≤ ℓ‖x − y‖    (10.2.2)

and

‖F′(x) − F′(x*)‖ ≤ ℓ₀‖x − x*‖    (10.2.3)

for all x, y ∈ D. Let N = L(Rᵏ, Rᵏ) be the space of linear operators from Rᵏ into Rᵏ. Let ‖·‖_M denote a matrix norm and let a > 0 be such that ‖·‖ ≤ a‖·‖_M,
where ‖·‖ is the matrix norm induced by a given vector norm on Rᵏ. Moreover assume: for (x, B) ∈ D × N with B̄ ∈ S(x, B), where S is an update operator and x̄ is the vector closest to x in the set (F(x) + B(• − x) + N_C)⁻¹(0) (assumed nonempty), there exist positive constants a₁ and a₂ such that

‖B̄ − F′(x*)‖_M ≤ [1 + a₁ max{‖x − x*‖, ‖x̄ − x*‖}]‖B − F′(x*)‖_M + a₂ max{‖x − x*‖, ‖x̄ − x*‖}.

Let b, r₀, r and R be as in Corollary 10.2.1. Furthermore assume, for a fixed c ∈ (0, 1) and with r₀, b reduced if necessary, that the following hold:

[((1/2)ℓ + ℓ₀)r₀ + b]r₀ < R,
d(1 − db)⁻¹[((1/2)ℓ + ℓ₀)r₀ + b] < c,
2a(a⁻¹ba₁ + a₂)r₀(1 − p)⁻¹ < b,
‖B₀ − F′(x*)‖_M < ⋯

10.3

Variational Inequalities under Weak Lipschitz Conditions

⋯ there exists ℓ > 0 such that for all x, y ∈ D

‖F′(x) − F′(y)‖ ≤ ℓ‖x − y‖.    (10.3.4)
Here we replace (10.3.4) by the weaker center-Lipschitz condition

‖F′(x) − F′(x₀)‖ ≤ ℓ₀‖x − x₀‖ for all x ∈ D.    (10.3.5)

Note that in general

ℓ₀ ≤ ℓ    (10.3.6)

holds and ℓ/ℓ₀ can be arbitrarily large. From the computational point of view it is easier to evaluate ℓ₀ than ℓ, unless ℓ₀ = ℓ. Therefore sufficient convergence conditions based on ℓ₀ rather than ℓ are weaker and can be used in cases not covered before. This is our motivation for providing a semilocal convergence analysis for methods (10.3.2) and (10.3.3) based on (10.3.5) rather than (10.3.4). Throughout this section we assume ϕ is a convex function. We can now state and prove the semilocal convergence result for the modified Newton method (10.3.2):
Theorem 10.3.1 Assume: there exist x₀ ∈ H, a > 0, b > 0, and r ≥ 0 such that for all y ∈ D

∂ϕ(x₀) ∋ 0,    (10.3.7)

‖F(x₀)‖ ≤ a,    (10.3.8)

⟨F′(x₀)y, y⟩ ≥ b‖y‖²,    (10.3.9)

and

2aℓ₀ < b²,    (10.3.10)

r − b/ℓ₀ ≤ r₀,    (10.3.11)

U(x₀, r) ⊆ D,    (10.3.12)

where

r₀ = √(b(b − 2aℓ₀))/ℓ₀.    (10.3.13)

Then the sequence {xₙ} (n ≥ 0) generated by the modified Newton method (10.3.2) is well defined, remains in U(x₀, r) for all n ≥ 0 and converges to a unique solution x* of inequality (10.3.1) in U(x₀, r). Moreover the following estimates hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ c‖x_{n+1} − xₙ‖,    (10.3.14)

where c ∈ [0, 1) is given by

c = 1 − √(b² − 2aℓ₀)/b.    (10.3.15)
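The constants of Theorem 10.3.1 are straightforward to compute. In this sketch a, b, ℓ₀ are hypothetical data satisfying (10.3.10); we check that the contraction factor c of (10.3.15) lies in [0, 1), so the step lengths in (10.3.14) decay geometrically.

```python
import math

# Contraction constants of Theorem 10.3.1 for hypothetical a, b, l0.
a, b, l0 = 0.1, 1.0, 2.0
assert 2 * a * l0 < b**2                      # condition (10.3.10)

c = 1.0 - math.sqrt(b**2 - 2 * a * l0) / b    # contraction factor (10.3.15)
assert 0.0 <= c < 1.0

step = a / b                                   # first step length ||x_1 - x_0||
for _ in range(10):
    step *= c                                  # (10.3.14): each step shrinks by c
print(step)
```

The closer 2aℓ₀ is to b², the closer c is to 1 and the slower the guaranteed linear convergence, which is why (10.3.10) is the critical smallness condition.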
Proof. Our proof is based on the contraction mapping Theorem 3.1.4. Let x ∈ U(x₀, r) and define the operator g(x) by the variational inequality

F′(x₀)g(x) + ∂ϕ(g(x)) ∋ F′(x₀)x − F(x).    (10.3.16)

It follows by Theorem 3.5 in [221] and (10.3.9) that the operator g is well defined on U(x₀, r) (which is a closed set). The convexity of ϕ implies the monotonicity of ∂ϕ in the sense that

h₁ ∈ ∂ϕ(y₁), h₂ ∈ ∂ϕ(y₂) ⇒ ⟨h₁ − h₂, y₁ − y₂⟩ ≥ 0.    (10.3.17)

Using (10.3.7), (10.3.16) and (10.3.17) we get

⟨F(x) − F′(x₀)(x − g(x)), g(x) − x₀⟩ ≤ 0,
10.3. Variational Inequalities under Weak Lipschitz Conditions
395
which can be rewritten as

⟨F′(x₀)(g(x) − x₀), g(x) − x₀⟩ ≤ ⟨F′(x₀)(x − x₀) − F(x), g(x) − x₀⟩.    (10.3.18)

Moreover, by the choice of r, (10.3.5), (10.3.8), (10.3.9) and (10.3.18) we obtain in turn

‖g(x) − x₀‖ ≤ (1/b)‖F(x) − F′(x₀)(x − x₀)‖
= (1/b)‖F(x₀) + F(x) − F(x₀) − F′(x₀)(x − x₀)‖
≤ (1/b)∫₀¹ ‖F′(x₀ + t(x − x₀)) − F′(x₀)‖ ‖x − x₀‖ dt + (1/b)‖F(x₀)‖
≤ a/b + (ℓ₀/2b)‖x − x₀‖² ≤ a/b + (ℓ₀/2b)r² ≤ r    (10.3.19)

(since x₀ + t(x − x₀) ∈ U(x₀, r) for all t ∈ [0, 1]). It follows from (10.3.19) that the operator g maps the closed set U(x₀, r) into itself. Furthermore, for any x, y ∈ U(x₀, r), the monotonicity of ∂ϕ and (10.3.16) give

⟨F′(x₀)(g(x) − g(y)) + F(x) − F(y) − F′(x₀)(x − y), g(x) − g(y)⟩ ≤ 0,

which can be rewritten as

⟨F′(x₀)(g(x) − g(y)), g(x) − g(y)⟩ ≤ ⟨F′(x₀)(x − y) − F(x) + F(y), g(x) − g(y)⟩.    (10.3.20)

Using (10.3.5), (10.3.8), (10.3.9), (10.3.11) and (10.3.20) we get in turn

‖g(x) − g(y)‖ ≤ (1/b)‖F(x) − F(y) − F′(x₀)(x − y)‖
≤ (ℓ₀/b) max{‖x − x₀‖, ‖y − x₀‖} ‖x − y‖
≤ (ℓ₀r/b)‖x − y‖ ≤ c‖x − y‖,    (10.3.21)

which implies g is a contraction from the closed set U(x₀, r) into itself. The result now follows from the contraction mapping principle. Note also that by setting x_{n+1} = g(xₙ) and using (10.3.21) we obtain (10.3.14).

We can now state and prove the semilocal convergence result for Newton's method (10.3.3):

Theorem 10.3.2 Assume: there exist x₀ ∈ H, a > 0, b > 0 such that for all x, y ∈ D conditions (10.3.7), (10.3.8), (10.3.12) hold, and

⟨F′(x)y, y⟩ ≥ b‖y‖²;    (10.3.22)
the equation

G(p) = [ℓ₀/(2(1 − 2ℓ₀p/b)) + 1](a/b) − p = 0    (10.3.23)

has a unique positive zero r ∈ (0, b/(2ℓ₀)). Then the sequence {xₙ} (n ≥ 0) generated by Newton's method (10.3.3) is well defined, remains in U(x₀, r) for all n ≥ 0, and converges to a unique solution x* of inequality (10.3.1) in U(x₀, r). Moreover the following estimates hold for all n ≥ 0:

‖x_{n+2} − x_{n+1}‖ ≤ q‖x_{n+1} − xₙ‖ ≤ q^{n+1}‖x₁ − x₀‖,    (10.3.24)

and

‖xₙ − x*‖ ≤ (qⁿ/(1 − q))(a/b),    (10.3.25)

where

q = 2ℓ₀r/b.
Proof. We first note that by the coercivity condition (10.3.22) and Theorem 3.5 in [221], the solutions x_{n+1} of (10.3.3) exist for all n ≥ 0. By the monotonicity of ∂ϕ, (10.3.7) and (10.3.3) for n = 0, we get

⟨F(x₀) − F′(x₀)(x₀ − x₁), x₁ − x₀⟩ ≤ 0,

or, rewriting,

⟨F′(x₀)(x₁ − x₀), x₁ − x₀⟩ ≤ ⟨F(x₀), x₁ − x₀⟩.    (10.3.26)

It follows from (10.3.8), (10.3.22) and (10.3.26) that

‖x₁ − x₀‖ ≤ (1/b)‖F(x₀)‖ ≤ a/b.    (10.3.27)
By the monotonicity of ∂ϕ, (10.3.3) and the corresponding inclusion for n − 1, we get

⟨F′(xₙ)(x_{n+1} − xₙ), x_{n+1} − xₙ⟩ ≤ ⟨F(xₙ) − F(x_{n−1}) − F′(x_{n−1})(xₙ − x_{n−1}), x_{n+1} − xₙ⟩.    (10.3.28)

Using (10.3.5), (10.3.22) and (10.3.28), and assuming x₀, x₁, ..., xₙ ∈ U(x₀, r), we obtain in turn

‖x_{n+1} − xₙ‖ ≤ (1/b)‖F(xₙ) − F(x_{n−1}) − F′(x_{n−1})(xₙ − x_{n−1})‖
≤ (1/b)‖∫₀¹ [F′(x_{n−1} + t(xₙ − x_{n−1})) − F′(x₀)](xₙ − x_{n−1}) dt‖
+ (1/b)‖(F′(x_{n−1}) − F′(x₀))(xₙ − x_{n−1})‖    (10.3.29)
≤ (ℓ₀/b)[∫₀¹ ((1 − t)‖x_{n−1} − x₀‖ + t‖xₙ − x₀‖) dt + ‖x_{n−1} − x₀‖] ‖xₙ − x_{n−1}‖
≤ (ℓ₀/b)[(1/2)r + (1/2)r + r] ‖xₙ − x_{n−1}‖ = (2ℓ₀r/b)‖xₙ − x_{n−1}‖ = q‖xₙ − x_{n−1}‖    (n ≥ 2)    (10.3.30)

(or ≤ (aℓ₀/2b)‖x₁ − x₀‖ for n = 1), which shows (10.3.24) for all n ≥ 0. It follows by (10.3.30) that
‖x_{n+1} − x₀‖ ≤ ‖x_{n+1} − xₙ‖ + ‖xₙ − x_{n−1}‖ + ··· + ‖x₂ − x₁‖ + ‖x₁ − x₀‖
≤ (q^{n−1} + q^{n−2} + ··· + 1)‖x₂ − x₁‖ + ‖x₁ − x₀‖
= ((1 − qⁿ)/(1 − q))‖x₂ − x₁‖ + ‖x₁ − x₀‖ ≤ ‖x₂ − x₁‖/(1 − q) + ‖x₁ − x₀‖
≤ aℓ₀/(2b(1 − q)) + a/b = r    (10.3.31)

by the choice of r. That is, x_{n+1} ∈ U(x₀, r). Furthermore, for all m > 0, n > 0 we have

‖x_{n+m} − xₙ‖ ≤ ‖x_{n+m} − x_{n+m−1}‖ + ‖x_{n+m−1} − x_{n+m−2}‖ + ··· + ‖x_{n+1} − xₙ‖
≤ (q^{n+m−1} + ··· + qⁿ)‖x₁ − x₀‖ = qⁿ(1 + ··· + q^{m−2} + q^{m−1})‖x₁ − x₀‖
≤ ((1 − q^m)/(1 − q)) qⁿ (a/b).    (10.3.32)

It follows by (10.3.32) that the sequence {xₙ} is Cauchy in the complete space H and as such it converges to some x* ∈ U(x₀, r). By letting m → ∞ we obtain (10.3.25). We must show x* solves (10.3.1). Let y ∈ S(ϕ). Then by (10.3.3) and the lower semicontinuity of ϕ we get in turn

ϕ(x*) ≤ lim inf_{n→∞} ϕ(x_{n+1}) ≤ ϕ(y) + lim inf_{n→∞} ⟨F′(xₙ)(xₙ − x_{n+1}) − F(xₙ), x_{n+1} − y⟩
= ϕ(y) − ⟨F(x*), x* − y⟩.
By the definition of ∂ϕ we get −F (x∗ ) ∈ ∂ϕ(x∗ ), which implies x∗ solves (10.3.1). Finally to show uniqueness let y ∗ ∈ U (x0 , r) be a solution of (10.3.1). It follows by the monotonicity of ∂ϕ that hF (x∗ ) − F (y ∗ ), x∗ − y ∗ i ≤ 0, and by the mean value theorem hF ′ (tx∗ + (1 − t)y ∗ )(x∗ − y ∗ ), x∗ − y ∗ i ≤ 0
(10.3.33)
for all t ∈ [0, 1]. It then follows by (10.3.22) and (10.3.33) that bkx∗ − y ∗ k2 ≤ hF ′ (tx∗ + (1 − t)y ∗ )(x∗ − y ∗ ), x∗ − y ∗ i ≤ 0, which implies x∗ = y ∗ . Remark 10.3.1 (a) If equality holds in (10.3.6) then our Theorem 10.3.1 reduces to the corresponding Theorem 1 in [326, p. 432]. Otherwise our condition (10.3.10) is weaker than corresponding condition (10.3.13) in [326, p. 432], which simply reads 2aℓ < b2 .
(10.3.34)
Note that the corresponding error estimates are also smaller since ℓ₀ < ℓ.

(b) Sufficient conditions for finding zeros of G can be found in [68]. For example, one such set of conditions is given by

r ≥ (1 + ℓ₀)(a/b)    (10.3.35)

provided that

a ≤ b²/(4ℓ₀(1 + ℓ₀)).    (10.3.36)

(c) It follows by (10.3.25) and (10.3.27) that the scalar sequence {tₙ} (n ≥ 0) defined by

t₀ = 0, t₁ = a/b, t_{n+2} = t_{n+1} + (ℓ₀/b)[2tₙ + (1/2)(t_{n+1} − tₙ)](t_{n+1} − tₙ)

is a majorizing sequence for {xₙ} (n ≥ 0). Sufficient convergence conditions for sequences more general than {tₙ} have already been given in Section 5.1 and can replace condition (10.3.23) in Theorem 10.3.2. One such condition is: there exists θ ∈ [0, 1) such that for all n ≥ 1

(a/b)[(1 − θⁿ)/(1 − θ) + (1/4)θ^{n−1}] ≤ bθ/(2ℓ₀),    (10.3.37)

or the stronger but easier to verify

(a/b)[1/(1 − θ) + 1/4] ≤ bθ/(2ℓ₀).    (10.3.38)
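The majorizing sequence of Remark 10.3.1(c) can be tabulated directly. The constants a, b, ℓ₀ below are hypothetical; we check that {tₙ} is increasing and settles to a finite limit, as a majorizing sequence must.

```python
# Majorizing sequence of Remark 10.3.1(c) for hypothetical a, b, l0.
a, b, l0 = 0.05, 1.0, 1.0

t = [0.0, a / b]
for n in range(30):
    tn, tn1 = t[-2], t[-1]
    t.append(tn1 + (l0 / b) * (2 * tn + 0.5 * (tn1 - tn)) * (tn1 - tn))

assert all(t[k] <= t[k+1] for k in range(len(t) - 1))
print(round(t[-1], 6))   # numerical approximation of t*
```

Since each increment is multiplied by a factor proportional to the previous increment, the sequence converges rapidly for small a/b, mirroring the q-geometric estimate (10.3.24).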
In this case we have

‖x_{n+1} − xₙ‖ ≤ θ(t_{n+1} − tₙ),    (10.3.39)

‖xₙ − x*‖ ≤ t* − tₙ,    (10.3.40)

and t* = lim_{n→∞} tₙ, replacing the (in general) less precise estimates (10.3.24) and (10.3.25), respectively.

(d) A direct comparison between Theorem 10.3.2 and the corresponding Theorem 2 in [326, p. 434] cannot be made, since we use only (10.3.5) instead of the stronger (10.3.4). Note however that the corresponding condition in [326] may be violated while (10.3.36) holds, if

ℓ/(ℓ₀(1 + ℓ₀)) > 8,    (10.3.42)

which can happen since, as stated in the introduction, ℓ/ℓ₀ can be arbitrarily large.

(e) It turns out that condition (10.3.5) can be weakened even further, and replaced as follows. Assume the operator F′(x) is simply continuous at x₀ ∈ D. It follows that for all ε > 0 there exists δ > 0 such that, if x ∈ U(x₀, δ),

‖F′(x) − F′(x₀)‖ < ε.    (10.3.43)

Then we can have:

Case of Theorem 10.3.1. Replace hypotheses (10.3.10) and (10.3.11) by r = min{δ, r₁}, where

r₁ = a/(b − ε),

provided that b > ε.
Set c = ε/b. With the above changes the conclusions of Theorem 10.3.1 hold, as can easily be seen from the proof of this theorem (see in particular (10.3.19), where (10.3.43) is used instead of (10.3.5)). Similarly, in the:

Case of Theorem 10.3.2. Replace condition (10.3.23) by r = min{δ, r₂}, where

r₂ = (a/b)(1 + ε/(1 − q)),  q = 2ε/b,

provided that b > 2ε. Then again, with the above changes, the conclusions of Theorem 10.3.2 hold, as can easily be seen from the proof of this theorem (see in particular (10.3.29), where (10.3.43) is used instead of (10.3.5)).

(f) The results obtained here can be extended from a Hilbert to a Banach space setting along the lines of our work in Section 5.1. However, we leave the details of the minor modifications to the motivated reader.
10.4
Exercises
10.1. Consider the problem of approximating a locally unique solution of the variational inequality F (x) + ∂ϕ(x) ∋ 0,
(10.4.1)
where F is a Gˆ ateaux differentiable operator defined on a Hilbert space H with values in H; ϕ : H → (−∞, ∞] is a lower semicontinuous convex function.
We approximate solutions x* of (10.4.1) using the generalized Newton method in the form

F′(xₙ)(x_{n+1}) + ∂ϕ(x_{n+1}) ∋ F′(xₙ)(xₙ) − F(xₙ)    (10.4.2)

to generate a sequence {xₙ} (n ≥ 0) converging to x*. Define: the set D(ϕ) = {x ∈ H : ϕ(x) < ∞}, and assume D(ϕ) ≠ ∅; the subgradient ∂ϕ(x) = {z ∈ H : ϕ(x) − ϕ(y) ≤ ⟨z, x − y⟩, y ∈ D(ϕ)}; and the set D(∂ϕ) = {x ∈ D(ϕ) : ∂ϕ(x) ≠ ∅}.
The function ∂ϕ is multivalued and, for any λ > 0, (I + λ∂ϕ)⁻¹ exists (as a single-valued operator) and satisfies

‖(I + λ∂ϕ)⁻¹(x) − (I + λ∂ϕ)⁻¹(y)‖ ≤ ‖x − y‖    (x, y ∈ H).

Moreover ∂ϕ is monotone:

f₁ ∈ ∂ϕ(x₁), f₂ ∈ ∂ϕ(x₂) ⇒ ⟨f₁ − f₂, x₁ − x₂⟩ ≥ 0.

Furthermore, we want the closures of D(ϕ) and D(∂ϕ) to coincide, so that D(∂ϕ) is sufficient for our purposes. We present the following local result for variational inequalities and twice Gâteaux differentiable operators:

(a) Let F : H → H be a twice Gâteaux differentiable operator. Assume:
(1) the variational inequality (10.4.1) has a solution x*;
(2) there exist parameters a ≥ 0, b > 0, c > 0 such that

‖F″(x) − F″(y)‖ ≤ a‖x − y‖,  ‖F″(x*)‖ ≤ b,

and

c‖y − z‖² ≤ ⟨F′(x)(y − z), y − z⟩

for all x, y, z ∈ H;
(3) x₀ ∈ D(ϕ) and x₀ ∈ U(x*, r), where

r = 4c[b + √(b² + 16ac/3)]⁻¹.

Then show: the generalized Newton method (10.4.2) is well defined, remains in U(x*, r) and converges to x* with

‖xₙ − x*‖ ≤ p·d^{2ⁿ}    (n ≥ 0),

where

p⁻¹ = (1/c)[(1/3)a‖x* − x₀‖ + b/2] and d = p⁻¹‖x* − x₀‖.
(b) We will approximate x* using the generalized Newton method in the form

f″(xₙ)(x_{n+1}) + ∂ϕ(x_{n+1}) ∋ f″(xₙ)(xₙ) − ∇f(xₙ).    (10.4.3)

We present the following semilocal convergence result for variational inequalities involving twice Gâteaux differentiable operators. Let f : H → R be twice Gâteaux differentiable. Assume:

(1) for x₀ ∈ D(ϕ) there exist parameters α ≥ 0, β > 0, c > 0 such that

⟨(f‴(x) − f‴(x₀))(y, y), z⟩ ≤ α‖x − x₀‖ ‖y‖²‖z‖,  ‖f‴(x₀)‖ ≤ β,

and

c‖y − z‖² ≤ ⟨f″(x)(y − z), y − z⟩

for all x, y, z ∈ H;
(2) the first two terms of (10.4.3), x₀ and x₁, are such that for η ≥ ‖x₁ − x₀‖

η ≤ c[β + 2√(αc)]⁻¹ if β² − 4αc ≠ 0,  η ≤ c(2β)⁻¹ if β² − 4αc = 0.

Then show: the generalized Newton method (10.4.3) is well defined, remains in U(x₀, r₀) for all n ≥ 0, where r₀ is the small zero of the function δ, δ(r) = αηr² − (c − βη)r + cη, and converges to a unique solution x* of the inclusion ∇f(x) + ∂ϕ(x) ∋ 0. In particular x* ∈ Ū(x₀, r₀). Moreover the following error bounds hold for all n ≥ 0:

‖xₙ − x*‖ ≤ γd^{2ⁿ},

where γ⁻¹ = (αr₀ + β)/c and d = ηγ⁻¹.
10.2. Let M, ⟨·, ·⟩, ‖·‖ denote the dual, inner product and norm of a Hilbert space H, respectively. Let C be a closed convex set in H. Consider an operator a : H × H → R. If a is continuous, bilinear and satisfies
a(y, y) ≥ c0‖y‖², y ∈ H,  (10.4.4)
and
a(x, y) ≤ c1‖x‖ · ‖y‖, x, y ∈ H,  (10.4.5)
for some constants c0 > 0, c1 > 0, then a is called a coercive operator. Given z ∈ M, there exists a unique solution x ∈ C such that
a(x, x − y) ≥ ⟨z, x − y⟩, y ∈ C.  (10.4.6)
Inequality (10.4.6) is called a variational inequality. It is well known that the solution x∗ can be obtained by the iterative procedure
xn+1 = PC(xn − ρF(G(xn) − z)),  (10.4.7)
where PC is the projection of H onto C, ρ > 0 is a constant, and F is the canonical isomorphism from M onto H, defined by
⟨z, y⟩ = ⟨F(z), y⟩, y ∈ H, z ∈ M,  (10.4.8)
and
a(x, y) = ⟨G(x), y⟩, y ∈ H.  (10.4.9)
Given a point-to-set operator C from H into the subsets of H, we define the quasivariational inequality problem: find x ∈ C(x) such that
a(x, y − x) ≥ ⟨z, y − x⟩, y ∈ C(x).  (10.4.10)
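As a concrete illustration of the projected iteration (10.4.7), the sketch below runs it on a hypothetical one-dimensional instance: H = R, C = [0, +∞), a(x, y) = xy (so G is the identity and c0 = c1 = 1), and F the identity. The step ρ and the starting point are arbitrary illustrative choices, not prescribed by the text.

```python
# Sketch of the projected iteration (10.4.7) on a toy 1-D instance.
# All concrete choices (C, a, rho, x0) are illustrative assumptions.

def project_C(x):
    """Projection of H = R onto the closed convex set C = [0, +inf)."""
    return max(0.0, x)

def solve_vi(z, rho=0.5, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = P_C(x_n - rho*(G(x_n) - z)) from x_0 = 0."""
    x = 0.0
    for _ in range(max_iter):
        x_new = project_C(x - rho * (x - z))
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# For this toy problem the solution is x* = max(z, 0).
print(solve_vi(2.0))   # ≈ 2.0
print(solve_vi(-1.0))  # 0.0
```

Since the map x ↦ P_C(x − ρ(x − z)) is a contraction for 0 < ρ < 2 here, convergence is geometric, in line with the fixed-point argument behind (10.4.7).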
Here, we consider C(x) to be of the form
C(x) = f(x) + C,  (10.4.11)
where f is a point-to-point operator satisfying
‖f(x∗) − f(y)‖ ≤ c2‖x∗ − y‖^λ  (10.4.12)
for some constants c2 ≥ 0, λ ≥ 1, all y ∈ H, and x∗ a solution of (10.4.10). We will compute the approximate solution to (10.4.10).
(a) Show: for fixed z ∈ H, x ∈ C satisfies
⟨x − z, y − x⟩ ≥ 0, y ∈ C,  (10.4.13)
if and only if
x = PC(z),  (10.4.14)
where PC is the projection of H onto C.
(b) Show: PC given by (10.4.14) is non-expansive, that is,
‖PC(x) − PC(y)‖ ≤ ‖x − y‖, x, y ∈ H.  (10.4.15)
(c) Show: for C(x) given by (10.4.11), x ∈ C(x) satisfies (10.4.10) if and only if
x = f(x) + PC(x − ρF(G(x) − z) − f(x)).  (10.4.16)
Result (c) suggests the iterative procedure
xn+1 = f(xn) + PC(xn − ρF(G(xn) − z) − f(xn))  (10.4.17)
for approximating solutions of (10.4.10). Let us define the expression
θ = θ(λ, ρ) = 2c2‖x0 − x‖^(λ−1) + √(1 − 2c0ρ + ρ²c1²).
It is simple algebra to show that θ ∈ [0, 1) in the following cases:
(1) λ = 1, c2 ≤ 1/2, c0 ≥ 2c1√(c2(1 − c2)), and 0 < ρ < [c0 + √(c0² − 4c1²c2(1 − c2))]/c1²;
(2) λ > 1, c0 ≤ c1, and ‖x0 − x‖ < [(1/(2c2))(1 − √(1 − (c0/c1)²))]^(1/(λ−1)) = c3;
(3) λ > 1, c0 > c1, ‖x0 − x‖ ≤ (1/(2c2))^(1/(λ−1)) = c4, and 0 < ρ < [c0 + √(c0² − c1²(1 − q²))]/c1², where q = 1 − 2c2‖x0 − x‖^(λ−1).
Denote by H0, H1 the sets
H0 = {y ∈ H | ‖y − x∗‖ ≤ c3}, H1 = {y ∈ H | ‖y − x∗‖ ≤ c4}.
(d) Show: in each of the above cases, the sequence {xn} generated by (10.4.17) converges to x∗.
10.3. Let x0 ∈ D and R > 0 be such that D ≡ U(x0, R). Suppose that f is m-times Fréchet-differentiable on D, and its mth derivative f^(m) is in a certain sense uniformly continuous:
‖f^(m)(x) − f^(m)(x0)‖ ≤ w(‖x − x0‖) for all x ∈ D,  (10.4.18)
for some monotonically increasing positive function w satisfying
lim_{t→0} w(t) = 0,  (10.4.19)
or, even more generally, that
‖f^(m)(x) − f^(m)(x0)‖ ≤ w(r, ‖x − x0‖) for all x ∈ D̄, r ∈ (0, R),  (10.4.20)
for some positive function w, monotonically increasing in both variables, satisfying
lim_{t→0} w(r, t) = 0, r ∈ [0, R].  (10.4.21)
Let us define the function θ on [0, R] by
θ(r) = (1/(c + α)) [η + (α2/2!) r² + ··· + (αm/m!) r^m + ∫₀^r ∫₀^{v1} ··· ∫₀^{v_{m−2}} w(v_{m−1}) dv_{m−1} ··· dv1] − r  (10.4.22)
for some constants α, c, η, αi (i = 2, ..., m); the equation
θ(r) = 0;  (10.4.23)
and the scalar iteration {rn} (n ≥ 0) by
r0 = 0, rn+1 = rn − θ(rn)/θ′(rn).  (10.4.24)
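The scalar majorizing iteration (10.4.24) can be tried numerically. The sketch below uses the hypothetical choices c + α = 1, η = 0.2 and w(t) = t (with m = 2), so that θ(r) = 0.2 + r²/2 − r has smallest positive zero r∗ = 1 − √0.6; the Newton sequence {rn} then increases monotonically to r∗.

```python
# Minimal sketch of the majorizing iteration (10.4.24).
# theta(r) = eta + r**2/2 - r with eta = 0.2 is an illustrative choice.
import math

def theta(r):
    return 0.2 + 0.5 * r * r - r

def theta_prime(r):
    return r - 1.0

def majorizing_sequence(n_steps):
    """r_0 = 0, r_{n+1} = r_n - theta(r_n)/theta'(r_n)."""
    r = 0.0
    seq = [r]
    for _ in range(n_steps):
        r = r - theta(r) / theta_prime(r)
        seq.append(r)
    return seq

rs = majorizing_sequence(8)
r_star = 1.0 - math.sqrt(0.6)        # smallest zero of theta
assert all(rs[i] <= rs[i + 1] for i in range(len(rs) - 1))  # monotone increase
print(rs[-1], r_star)                 # both ≈ 0.22540
```

The monotone increase of {rn} toward r∗ is exactly the behavior the error bounds (10.4.26)-(10.4.27) below rely on.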
Let g be a maximal monotone operator, and suppose: (10.4.18) holds; there exist αi (i = 2, ..., m) such that
‖f^(i)(x0)‖ ≤ αi;  (10.4.25)
and equation (10.4.23) has a unique solution r∗ ∈ [0, R] with θ(R) ≤ 0.
Then show: the generalized Newton method {xn} (n ≥ 0) generated by
f′(xn)(xn+1) + g(xn+1) ∋ f′(xn)(xn) − f(xn)  (n ≥ 0, x0 ∈ D)
is well defined, remains in U(x0, r∗) for all n ≥ 0, and converges to a solution x∗ of the inclusion f(x) + g(x) ∋ 0. Moreover, the following estimates hold for all n ≥ 0:
‖xn+1 − xn‖ ≤ rn+1 − rn  (10.4.26)
and
‖xn − x∗‖ ≤ r∗ − rn, r∗ = lim_{n→∞} rn.  (10.4.27)
10.4. Let x0 ∈ D and R > 0 be such that D ≡ U(x0, R). Suppose that f is Fréchet-differentiable on D, and its derivative f′ is in a certain sense uniformly continuous as an operator from D into L(H, H), the space of bounded linear operators from H into H. In particular we assume:
‖f′(x) − f′(y)‖ ≤ w(‖x − y‖), x, y ∈ D,  (10.4.28)
for some monotonically increasing positive function w satisfying lim_{t→0} w(t) = 0,
or, even more generally, that kf ′ (x) − f ′ (y)k ≤ w(r, kx − yk),
¯ r ∈ (0, R), x, y ∈ D,
(10.4.29)
for some monotonically increasing in both variables positive function w satisfying lim w(r, t) = 0,
t→0
r ∈ [0, R].
Conditions of this type have been studied in the special cases w(t) = dtλ , w(r, t) = d(r)tλ , λ ∈ [0, 1] for regular equations; and for w(t) = dt for generalized equations of the form f (x)+g(x) ∋ x. The advantages of using (10.4.28) or (10.4.29) have been explained in great detail in the excellent paper [6]. It is useful to pass from the function w to w(r) ¯ = sup{w(t) + w(s) : t + s = r}. The function may be calculated explicitly in some cases. For example if λ w(r) = drλ (0 < λ ≤ 1), then w(r) ¯ = 21−λ dr . More generally, if w is a r ¯ is increasing, convex concave function on [0, R], then w(r) ¯ = 2w 2 , and w and w(r) ¯ ≥ w(r), r ∈ [0, R]. Let us define the functions θ, θ¯ on [0, R] by Z r h i η+ w(t)dt − r, for some α > 0, η > 0, c > 0 Z0 r h i ¯ = 1 η+ θ(r) w(t)dt ¯ −r c+α
θ(r) =
1 c+α
0
and the equations θ(r) = 0, ¯ = 0. θ(r) Let g be a maximal monotone operator satisfying
there exists c > −α such that f ′ (z)(x), x ≥ ckxk2 ,
for all x ∈ H, z ∈ D,
10.4. Exercises
407
¯ and suppose: (10.4.28) holds and equation θ(r) = 0 has a unique solution ∗ r ∈ [0, R]. Then, show: the generalized Newton method {xn } (n ≥ 0) generated by f ′ (xn )xn+1 + g(xn+1 ) ∋ f ′ (xn )(xn ) − f (xn )
(n ≥ 0), (x0 ∈ D)
is well defined, remains in U (x0 , r∗ ) for all n ≥ 0, and converges to a solution x∗ of f (x) + g(x) ∋ x. Moreover, the following estimates hold for all n ≥ 0: kxn+1 − xn k ≤ rn+1 − rn kxn − x∗ k ≤ r∗ − rn , where, r0 = 0, rn+1 = rn − and lim rn = r∗ .
n→∞
¯ n) θ(r , θ′ (rn )
This page intentionally left blank
Chapter 11
Monotone convergence of point to set-mapping In this chapter we discuss the monotonicity of the iteration sequence generated by the algorithmic model xk+1 ∈ fk (x0 , x1 , ..., xk )
(k ≥ l − 1).
(11.0.1)
Here, we assume again that for k ≥ l − 1, fk : X k+1 → 2X , where X is a subset of a partially ordered space L, and let ≤ denote the order in L, (see also Chapter 2, [97]–[101])
11.1
General Theorems
Our main results will be based on a certain type of isotony of the algorithmic mappings. Definition 11.1.1 The sequence of point-to-set mapping gk : X k+1 → 2L is called increasingly isotone from the left on X, if for any sequence {x(k) } such that x(k) ∈ X (k ≥ 0) and x(0) ≤ x(1) ≤ x(2) ≤ ..., y1 ≤ y2 for all y1 ∈ gk (x(0) , x(1) , ..., x(k) ), y2 ∈ gk+1 (x(0) , x(1) , ..., x(k+1) ) and k ≥ l − 1. Consider the special case of single variable mappings gk . The above properties are now reduced to the following. Assume that x(0) , x(1) ∈ X, x(0) ≤ x(1) , y1 ∈ gk (x(0) ), and y2 ∈ gk+1 (x(1) ), then y1 ≤ y2 . Similarly, the sequence of point-to-set mappings gk : X k+1 → 2L is called increasingly isotone from the right on X, if for any sequence {x(k) } such that x(k) ∈ X (k ≥ 0) and x(0) ≤ x(1) ≤ x(2) ≤ ..., y1 ≥ y2 for all y1 ∈ gk (x(0) , x(1) , ..., x(k) ), y2 ∈ gk+1 (x(0) , x(1) , ..., x(k+1) ) and k ≥ l − 1. Our main result is the following: 409
410
11. Monotone convergence of point to set-mapping
Theorem 11.1.2 Assume that the sequence of mappings fk is increasingly isotone from the left; futhermore, x0 ≤ x1 ≤ ... ≤ xl−1 ≤ xl (xk ∈ X, 0 ≤ k ≤ l − 1). Then for all k ≥ 0, xk+1 ≥ xk . Proof. By induction, assume that for all indices i (i < k), xi+1 ≥ xi . The relations xk ∈ fk−1 (x0 ≤ x1 ≤ ... ≤ xk−1 )
and
xk+1 ∈ fk (x0 ≤ x1 ≤ ... ≤ xk )
and Definition 11.1.1 imply that xk+1 ≥ xk . Because this inequality holds for k ≤ l − 1, the proof is complete. Consider next the modified algorithmic model yk+1 ∈ fk (yk , yk−1 , ..., y1 , y0 ),
(11.1.1)
which differs from process (11.0.1) only in the order of the variables in the righthand side. Using finite induction again similarly to Theorem 11.1.2, one may easily prove the following theorem. Theorem 11.1.3 Assume that the sequence of mappings fk is increasingly isotone from the right; furthermore, y0 ≥ y1 ≥ ... ≥ yl−1 ≥ yl (yk ∈ X, 0 ≤ k ≤ l − 1). The for all k ≥ 0, yk+1 ≤ yk . Corollary 11.1.4 Assume that X ⊆ Rn ; furthermore, xk → s∗ and yk → s∗ as k → ∞. It is also assumed that ” ≤ ” is the usual partial order of vectors; i.e., a = (a(i) ) ≤ b = (b(i) ) if and only if for all i, a(i) ≤ b(i) . Under the conditions of Theorems 11.1.2 and 11.1.3, xk ≤ s∗ ≤ yk . This relation is very useful in constructing an error estimator and a stopping rule for methods (11.0.1) and (11.1.1), since for all coordinates of vectors xk , yk and s∗ , (i)
(i)
(i)
(i)
(i)
0 ≤ s∗(i) − xk ≤ yk − xk and (i)
0 ≤ yk − s∗(i) ≤ yk − xk . Hence, if each element of yk − xk is less than an error tolerance e, then all elements of xk and yk are closer than e to the corresponding elements of the limit vector.
11.2
Convergence in linear spaces
Assume now that X is a subset of a partially ordered linear space L. In addition, the partial order satisfies the following conditions:
11.2. Convergence in linear spaces
411
(F1). x ≤ y (x, y ∈ L) implies that −x ≥ y; (F2). If x(1) , x(2) , y (1) , y (2) ∈ L, x(1) ≤ y (1) and x(2) ≤ y (2) , then x(1) + x(2) ≤ y (1) + y (2) . (p)
(p)
Assume that for k ≥ l − 1 and p = 1, 2, mapping Kk , Hk are point-to-set (p) mapping defined on X k+1 , and for all (x(1) , x(2) , ..., x(k+1) ) ∈ X k+1 , Kk (x(1) , x(2) (p) (p) , ..., x(k+1) ), and Hk (x(1) , x(2) , ..., x(k+1) ) are nonempty subsets of L; i.e., Kk , (p) and Hk : X k+1 → 2L . Consider next the algorithmic model: (1)
(1)
xk+1 = tk+1 − sk+1
(2)
(2)
yk+1 = tk+1 − sk+1 ,
and
(11.2.1)
where (1)
(1)
(2)
(2)
(1)
(1)
tk+1 ∈ Kk (x0 , x1 , ..., xk ), sk+1 ∈ Hk (yk , yk−1 , ..., y0 ) and (1)
(2)
tk+1 ∈ Kk (yk , yk−1 , ..., y0 ), sk+1 ∈ Hk (x0 , x1 , ..., xk ). In the formulation of our next theorem we will need what follows. Definition 11.2.1 A point-to-set mapping g : X k+1 → 2L is called increasingly isotone on X, if for all x(1) ≤ x(2) ≤ ... ≤ x(k+2) (x(i) ∈ X, i = 1, 2, ..., k + 2), relations y1 ∈ g(x(1) , x(2) , ..., x(k+1) ) and y2 ∈ g(x(2) , x(3) , ..., x(k+2) imply that y1 ≤ y2 . Increasingly isotone functions have the following property. Lemma 11.2.2 Assume that mapping g : X k+1 → 2L is increasingly isotone. Then, for all x(i) and y (i) ∈ X (i = 1, 2, ..., k + 1) such that x(1) ≤ x(2) ≤ ..., x(k+1) ≤ y (1) ≤ y (2) ≤ ... ≤ y (k+1) and for any x ∈ g(x(1) , x(2) , ..., x(k+1) ) and y ∈ g(y (1) , y (2) , ..., y (k+1) ), x ≤ y. Proof. Let yi ∈ g(x(i+1) , ..., x(k+1) , y (1) , ..., y (i) ) be arbitrary for i = 1, 2, ..., k. Then x ≤ y1 ≤ y2 ≤ ... ≤ yk ≤ y, which completes the proof. Remark 11.2.1 In the literature a point-to-set mapping g : X k+1 → 2L is called isotone on X, if for all x(i) and y (i) such that x(i) ≤ y (i) (i = 1, 2, ..., k + 1) and for all x ∈ g(x(1) , ..., x(k+1) ) and y ∈ g(y (1) , ..., y (k+1) ), x ≤ y. It is obvious that an isotone mapping is increasingly isotone, but the reverse is not necessarily true as the following example illustrates.
412
11. Monotone convergence of point to set-mapping
Example 11.2.1 Define L = R1 , X = [0, 1], k = 1, ≤ to be the usual order of real numbers, and (2) if x(1) ≥ 2x(2) − 1 x , g(x(1) , x(2) ) = (1) (2) x − x + 1, if x(1) < 2x(2) − 1. We will now verify that g is increasingly isotone, but not isotone on X. Select arbitrary x(1) ≤ x(2) ≤ x(3) from the unit interval. Note first that g(x(1) , x(2) ) ≤ x(2) , since if x(1) ≥ 2x(2) − 1, then g(x(1) , x(2) ) = x(2) , and if x(1) < 2x(2) − 1, then g(x(1) , x(2) ) = x(1) − x(2) + 1 < 2x(2) − 1 − x(2) + 1 = x(2) . Note next that g(x(2) , x(3) ) ≥ x(2) , since if x(2) ≥ 2x(3) − 1, then g(x(2) , x(3) ) = x(3) ≥ x(2) , and if x(2) < 2x(3) − 1, then g(x(2) , x(3) ) = x(2) − x(3) + 1 ≥ x(2) . Consequently, g(x(1) , x(2) ) ≤ x(2) ≤ g(x(2) , x(3) ). Hence, g is increasingly isotone; however, it is not isotone on X, since for any points (t, 1) and (t, 1 − e) (where t, e > 0 and t + 2e < 1), g(t, 1) = t − 1 + 1 = t < g(t, 1 − e) = t − (1 − e) + 1 = t + e, but 1˙ > 1 − e. Consider now the following assumptions: (1)
(2)
(G1 ) Sequences of mapping Kk and Hk are increasingly isotone from the left. (1) (2) Futhermore, Kk and Hk are increasingly isotone from the right. (G2 ) For all k and t(0) ≤ t(1) ≤ ... ≤ t(k) ≤ t¯(k) ≤ ... ≤ t¯(1) ≤ t¯(0) (t(i) and t¯(i) ∈ X, i = 0, 1, ..., k), (2) Kk (t¯(k) , ..., t¯(0) )
and (1)
(2)
Hk (t¯(k) , ..., t¯(0) ) Hk (t(0) , t(1) , ..., t(k) ). (G3 ) The initial points are selected so that x0 ≤ x1 ≤ ... ≤ xl−1 ≤ xl ≤ yl ≤ yl−1 ≤ ... ≤ y1 ≤ y0 ; (G4 ) x0 , y0 = {x | x ∈ L, x0 ≤ x ≤ y0 } ⊆ X. Theorem 11.2.3 Under assumtions (F1 ), (F2 ) and (G1 ) through (G4 ), xk and yk are in X for all k ≥ 0. Furthermore, xk ≤ xk+1 ≤ yk+1 ≤ yk .
11.2. Convergence in linear spaces
413
Proof. By induction, assume that y0 xi+1 ≥ xi and x0 yi+1 ≤ yi for all i < k. Therefore, points xi and yi (i = 0, 1, ..., k) are in X. Note next that assumption (G1 ) implies that (1)
(1)
(2)
(2)
(1)
(1)
(2)
(2)
tk+1 ≥ tk , tk+1 ≤ tk , sk+1 ≤ sk and sk+1 ≥ sk , and hence, (1)
(1)
(1)
(1)
(2)
(2)
(2)
(2)
xk+1 = tk+1 − sk+1 ≥ tk − sk = xk and yk+1 = tk+1 − sk+1 ≤ tk − sk = yk . Assume next that xi ≤ yi (i ≤ k). Then, assumption (G2 ) implies that (1)
(2)
(1)
(2)
tk+1 ≤ tk+1 and sk+1 ≥ sk+1 . Therefore, (1)
(1)
(2)
(2)
xk+1 = tk+1 − sk+1 ≤ tk+1 − sk+1 = yk+1 . Thus, the proof is complete. Introduce next the point-to-set mappings (p)
(p)
(p)
fk (x(1) , ..., x(k+1) ) = {t−s | t ∈ Kk (x(1) , ..., x(k+1) ), s ∈ Hk (x(1) , ..., x(k+1) )}. (1)
(2)
Assume that z is a common fixed point of mappings fk and fk ; i.e., for p = 1, 2, (p) (1) (2) (1) (2) z ∈ fk (z, z, ..., z), furthermore (G5 ). Mappings Kk Kk Hk Hk are increasingly isotone. Theorem 11.2.4 Assume that x1 ≤ z ≤ y1 . Then, under assumptions (F1 ), (F2 ) and (G1 ) through (G5 ), xk ≤ z ≤ yk for all k ≥ 0. Proof. By induction, assume that xi ≤ z ≤ yi , for i ≤ k. Then, for p = 1, 2 and (p) (p) for all t(p) ∈ Kk (z, z, ..., z) and s(p) ∈ Hk (z, z, ..., z), (1)
(2)
(1)
(2)
tk+1 ≤ t(1) , t(2) ≤ tk+1 and sk+1 ≥ s(1) , s(2) ≥ sk+1 . Therefore, (1)
(1)
(2)
(2)
xk+1 = tk+1 − sk+1 ≤ t(p) − s(p) ≤ tk+1 − sk+1 = yk+1, and since z = t − s with some t and s, the proof is completed.
(p = 1, 2)
414
11. Monotone convergence of point to set-mapping
Remark 11.2.2 Two important special cases of algorithm (11.2.1) can be given: (1)
(2)
(1)
(2)
(1)
1 If Kk = Kk = Kk and Hk = Hk = Hk , then fk is a common fixed point of the mapping fk .
(2)
= fk
= fk , and so z
2 Assume next that Kk and Hk are l−variable functions. Then one may select (1)
Kk (t(0) , ..., t(k) ) = Kk (t(k−l+1) , ..., t(k) ), (2)
Kk (t(0) , ..., t(k) ) = Kk (t(0) , ..., t(l−1) ), (1)
Hk (t(0) , ..., t(k) ) = Hk (t(0) , ..., t(l−1) ) and (2)
Hk (t(0) , ..., t(k) ) = Hk (t(k−l+1) , ..., t(k) ). Note that the resulting process for l = 1 was discussed in Ortega and Rheinboldt ([263], Section 13.2). Corollary 11.2.5 Assume that in addition L is a Hausdorff topological space which satisfies the first axiom of countabi;ity, and (F3 ) If xk ∈ L (k ≥ 0) such that xk ≤ x with some x ∈ L and xk ≤ xk+1 for all k ≥ 0, then the sequence {xk } converges to a limit point x∗ ∈ L; furthermore, x∗ ≤ x.
(F4 ) If xk ∈ L (k 0) and xk → x, then −xk → −x.
Note first that assumptions (F1 ) through (F4 ) imply that any sequence {yk } with the properties yk ∈ L, yk ≥ yk+1, and yk ≥ y (k ≥ 0) is convergent to a limit point y ∗ ∈ L; furthermore, y ∗ ≥ y. Since under the conditions of the theorem {xk } is increasing with an upper bound y0 and {yk } decreases with a lower bound x0 , both sequences are convergent. If x∗ and y ∗ denote the limit points and z is a common fixed point of mappings fk such that xl ≤ z ≤ yl then x∗ ≤ z ≤ y ∗ . Consider the special l−step algorithm given as special case 2 above. Assume that for all k ≥ l − 1, mappings Kk and Hk are closed. Then, x∗ is the smallest fixed point and y ∗ is the largest fixed point in xl , yl . As a further special case we mention the following lemma. Lemma 11.2.6 Let B be a partially ordered topological space and let x. y be two points of B such that x ≤ y. If f : x, y → B is a continuous isotone mappings having the property that x ≤ f (x) and y ≥ f (y), then a point z ∈ x, y exists such that z = f (z). Proof. Select X = x, y , l = 1, Hk = 0 and Kk = f, and consider the iteration sequences {xk } and {yk } starting from x0 = x and y0 = y. Then, both sequences are convergent, and the limits are fixed points. Remark 11.2.3 This result is known as Theorem 2.2.2, (see, Chapter 2).
11.3. Applications
11.3
415
Applications
Consider the simple iteration method xk+! = f (xk ),
(11.3.1)
where f : [a, b] → R with the additional properties: f (a) > a, f (b) < b, 0 ≤ f (x) ≤ q < 1 for all × ∈ [a, b], where q is a given constant. Consider the iteration sequences xk+1 = f (xk ),
x0 = a
yk+1 = f (yk ),
y0 = b
and (11.3.2)
Since f is increasingly, Theorems 11.1.2 and 11.1.3 apply to both sequences {xk } and {yk } that converge to the unique fixed point s∗ ; furthermore, sequence {xk } is increasing and {yk } decreases. This situation is illustrated in Figure 11.3.1. Next we drop the assumption f (x) q < 1, but retain all other conditions. In other words, f is continuous, increasing in [a, b], f (a) > a
f (b) < b.
Notice that Theorems 11.1.2 and 11.1.3 still can be applied for the iteration sequence (11.3.2). However, the uniqueness of the fixed point is no longer true. This situation is shown in Figure 11.3.2. Becausef is continuous, there is at least one fixed point in the interval (a, b). (1) (2) Alternatively, Theorems 11.2.3 and 11.2.4 may also be applied with Kk = Kk ≡ (1) (2) f and Hk = Hk = 0, which has the following additional consequence. Let s∗ and s∗∗ denote the smallest and the largest fixed points between a and b. Because a < s∗ and b > s∗∗ , f (a) f (s∗ ) = s∗ and f (b) f (s∗∗ ) = s∗∗ , thus, s∗ and s∗∗ are between x1 and y1 . Therefore, for all k 1, xk s∗ s∗∗ yk ; futhermore, xk → s∗ and yk → s∗∗ . If the fixed point is unique, the uniqueness of the fixed point can be estabilished by showing that sequences {xk } and {yk } have the same limit. Finally, we mention that the above facts can be easily extended to the more general case when f : Rn → Rn . Example 11.3.1 We now illustrate procedure (11.3.2) in the case of the single variable nonlinear equation x=
1 x e + 1. 10
416
11. Monotone convergence of point to set-mapping y=x
f(x)
y=f(x)
x
x0
1
x2 s
∗
y2
y
y1
0
Figure 11.1: Monotone fixed point iterations.
y=x
f(x)
y=f(x)
x1 s∗
x0
∗∗
s
y1
y0
Figure 11.2: The case of multiple fixed points
It is easy to see that 1 1 e +1>1 10
and
1 2 e + 1 < 2. 10
Further, d dx
1 x e +1 10
=
1 x e ∈ (0, 0.75). 10
11.4. Exercises
417
Therefore, all conditions are satisfied. Select x0 = 1 and y0 = 2. Then, x1 = x2 = x3 = x4 = x3 =
1 1 1 2 e + 1 ≈ 1.27183; e + 1 ≈ 1.73891; y1 = 10 10 1 1.27183 1 1.73891 e e + 1 ≈ 1.35674; y2 = + 1 ≈ 1.56911; 10 10 1 1.35674 1 1.56911 e e + 1 ≈ 1.38835; y3 = + 1 ≈ 1.48024; 10 10 1 1.38835 1 1.48024 e e + 1 ≈ 1.40082; y4 = + 1 ≈ 1.43940; 10 10 1 1.40082 1 1.43940 e e + 1 ≈ 1.40585; y5 = + 1 ≈ 1.42182, 10 10
and so on. The monotonicity of both iteration sequences, as expected, is clearly visible.
11.4
Exercises
1. Consider the problem of approximating a locally unique zero s∗ of the equation g(x) = 0, where g is a given real function defined on a closed interval [a, b]. The Newton iterates {xn }, {yn }, n ≥ 0 are defined as g(xn ) + g (xn )(xn+1 − xn ) = 0
x0 = a
g(yn ) + g (yn )(yn+1 − yn ) = 0
y0 = b.
and
Use Theorems 11.1.2 and 11.1.3 to find sufficient conditions for the monotone convergence of the above sequences to s∗ ∈ [a, b]. 2. Examine the above problem, but using the secant iterations {xn }, {yn }, n ≥ −1 given by g(xn ) − g(xn−1 ) (xn+1 − xn ) = 0 xn − xn−1 g(yn ) − g(yn−1 ) g(yn ) + (yn+1 − yn ) = 0 yn − yn−1
g(xn ) +
x1 = a, x0 = given y1 = b, y0 = given.
3. Examine the above problem, but using the Newton-like iteration, where {xn }, {yn }, n ≥ 0 are given by g(xn ) + h(xn )(xn+1 − xn ) = 0
x0 = a
418
11. Monotone convergence of point to set-mapping and g(yn ) + h(yn )(yn+1 − yn ) = 0
y0 = b
with a suitably chosen function h. 4. Examine the above problem, but using the Newton-like iteration, where {xn }, {yn }, n ≥ 0 are given by g(xn ) + cn (xn+1 − xn ) = 0
x0 = a
g(yn ) + dn (yn+1 − yn ) = 0
y0 = b
and
with a suitably chosen real sequences {cn }, {dn }. 5. Generalize Problems 1 to 4 to Rn . 6. Generalize Problems 1 to 4 to Rn by choosing g to be defined on a given subset of a partially ordered topological space. (For a definition, see Chapter 2). 7. Compare the answers given in Problem 1 to 6 with the results of Chapter 2. 8. To find a zero s∗ for g(x) = 0, where g is a real function defined on [a, b], rewrite the equation as x = x + cg(x) = f (x) for some constant c = 0. Consider the sequences xn+1 = f (xn )
x0 = a
yn+1 = f (yn )
y0 = b.
and
Use Theorems 11.1.2 and 11.1.3 to find sufficient conditions for the monotone convergence of the above sequences to a locally unique fixed point s∗ ∈ [a, b] ot the equation x = f (x).
Chapter 12
Fixed points of point-to-set mappings In this chapter special monotonicity concepts are used to compare iteration sequences and common fixed points of point-to set maps. The results of this chapter are very useful in comparing the solutions of linear and nonlinear algebraic equations as well as the solutions of differential and integral equations (see also Chapter 3).
12.1
General theorems
Assume again that X is a subset of a partially ordered space L. Let P denote a partially ordered parameter set, and for all p ∈ P and k ≥ l − 1, let the point-to-set (p) mappings fk : X k+1 → 2X . For each p ∈ P, define the iteration algorithm (p)
(p)
(p)
(p)
(p)
xk+1 ∈ fk (x0 , x1 , ..., xk ).
(12.1.1)
Introduce the following conditions: (H1 ) If p1 , p2 ∈ P such that p1 ≤ p2 , then for all x(i) ∈ X (i ≥ 0) and yj ∈ (p) (p) (p) (k) fk (x0 , x1 , ..., xk ) (j = 1, 2), y1 ≤ y2 ; (H2 ) If p ∈ P and t(i) , s(i) ∈ X such that t(i) ≤ s(i) (i ≥ 0) then yt ≤ ys for all (p) (p) k ≥ l − 1, where yt ∈ fk (t(0) , t(1) , ..., t(k) ) and ys ∈ fk (s(0) , t(1) , ..., s(k) ) are arbitrary; (p )
(p )
(H3 ) For k = 0, 1, ..., l − 1 and all p1 , p2 ∈ P such that p1 ≤ p2 , xk 1 ≤ xk 2 . (p)
Assumption (H1 ) shows that for fixed k, mappings fk are increasing in p, and (p) assumption (H2 ) means that all mappings fk are isotone. Condition (H3 ) requires that the initial points are selected so that they are increasing in p. 419
420
12. Fixed points of point-to-set mappings
Our first result shows that under the above assumptions the entire iteration sequence (12.1.1) is increasing in p. Theorem 12.1.1 Assume that conditions (H1 ) through (H3 ) hold. Then p1 , p2 ∈ (p ) (p ) P and p1 ≤ p2 imply that for all k ≥ 0, xk 1 ≤ xk 2 . (p )
(p )
Proof. By induction, assume that xi 1 ≤ xi 2 , (i < k). Then condition (H2 ) (p ) (p1 ) (p2 ) (p2 ) implies that xk 1 ≤ y for all y ∈ fk−1 (x0 , ..., xk−1 ), and from assumption (H1 ) (p )
(p )
(p )
we conclude that y ≤ xk 2 . Thus, xk 1 ≤ xk 2 , which completes the proof.
Remark 12.1.1 The proof of the theorem remains valid in the slightly more general case, when it is assumed only that condition (H2 ) holds either for p1 or for p2 . Assume next that L is a metric space. Furthermore, (H4 ) If tk , sk ∈ X such that tk ≤ sk (k ≥ 0) and tk → t∗ , sk → s∗ as k → ∞ with some t∗ , s∗ ∈ L, then t∗ ≤ s∗ ; (p)
(H5 ) For all p ∈ P, iteration sequence xk converges to a limit x(p) ∈ L. ∗
Our next result compares the limit points x(p) . ∗
∗
Theorem 12.1.2 Under assumptions (H1 ) through (H5 ), x(p1 ) ≤ x(p2 ) if p1 ≤ p2 . (p )
(p )
Proof. From Theorem 12.1.1 we know that xk 1 xk 2 for all k ≥ 0, and therefore condition (H4 ) implies the assertion. Corollary 12.1.3 Assume that for all p ∈ P, algorithm (12.1.1) is an l−step pro∗ (p) cess, mappings fk are also closed. Then, x(p) is a common fixed point of mappings (p) fk (k ≥ l − 1), and it is increasing in p. As the conclusion of this section a different class of comparison theorems is (1) (2) introduced. Consider the iteration sequences xk , xk ∈ X (k ≥ 0), and assume that (1)
(I1 ) xi
(2)
≤ xi
for i = 0, 1, ..., l − 1;
(I2 ) For all k ≥ l there exist mappings φk : X k+1 → 2L such that • φk (x(0) , x(1) , ..., x(k) ) is decreasing in variables x(i) (0 ≤ i < k), and is strictly increasing in x(k) ; (1)
(1)
(1)
(2)
(2)
• y (1) ≤ y (2) ∀k ≥ l, y (1) ∈ φk (x0 , x1 , ..., xk ) and y (2) ∈ φk (x0 , x1 , ..., (2) xk ).
12.2. Fixed points
421 (1)
(2)
Theorem 12.1.4 Under conditions (I1 ) and (I2 ), xk ≤ xk (1)
for all k ≥ 0.
(2)
(1)
(2)
Proof. By induction, assume that xi ≤ xi for i < k, but xk > xk . Let (2) (2) (1) (1) yi ∈ φk (x0 , ..., xi−1 , xi , ..., xk ) be arbitrary for i = 0, 1, ..., k, and select any (2)
(2)
(2)
yk+1 ∈ φk (x0 , x1 , ..., xk ). Then the monotonicity of φk implies y0 ≥ y1 ≥ ... ≥ yk ≥ yk+1 ,
(1)
(2)
which contradicts the second bulleted section of assumption (I2 ). Hence, xk ≤ xk , (1) (2) and since xi ≤ xi for i = 0, 1, ..., l − 1, the proof is complete. Remark 12.1.2 Note that this theorem generalizes the main result of Redheffer and Walter [291].
12.2
Fixed points
Consider once again the general l−step algorithm xk+1 ∈ fk (xk+1−l , xk+2−l , ..., xk ),
(12.2.1)
and assume that for k ≥ l − 1, fk : X l → 2X , where X is a subset of a partially ordered metric space L. Assume that (J1 ) The algorithm converges to an s∗ ∈ L with arbitrary initial points x0 , x1 , ..., xl−1 ∈ X; (J2 ) For all k ≥ l − 1, mappings fk is isotone; (J3 ) If xk , x ∈ X, x ≤ xk (k ≥ 0) such that xk → x∗ (x∗ ∈ L) as k → ∞, then x ≤ x∗ . The main result of this section can be formulated as the following theorem. Theorem 12.2.1 Assume that conditions (J1 ) through (J3 ) hold. Furthermore, for an x ∈ X, x ≤ y with all k ≥ l − 1 and y ∈ fk (x, x, ..., x). Then, x ≤ s∗ . Proof. Consider the iteration sequence starting with initial vectors xi = x (i = 0, 1, ..., l − 1). First we show that x ≤ xk for all k ≥ 0. By induction, assume that x ≤ xi (i < k). Since xk ∈ fk−1 (xk−l , ..., xk−2 , xk−1 ), assumption (J2 ) implies that for all y ∈ fk−1 (x, x, ..., x), y ≤ xk . Then, from the assumption of the theorem we conclude that x ≤ xk . Because xk → s∗ , condition (J3 ) implies the assertion. Remark 12.2.1 The assertion of the theorem remains true if (J2 ) is weakened as follows. If x, x(i) ∈ X such that x ≤ x(i) (i = 1, 2, ..., l), then for all k ≥ l and y ∈ fk (x, x, ..., x) and y ∗ ∈ fk (x(1) , x(2) , ..., x(l) ), y ≤ y ∗ . The theorem can be easiy extended to the general algorithm (12.1.1). The details are omitted.
422
12.3
12. Fixed points of point-to-set mappings
Applications
Case Study 1 Consider the first order difference equations xk+1 = g(xk )
(12.3.1)
zk+1 = h(zk ), where g and h : D → D, with D being a subset of Rn . Assume that for all x ∈ D, g(x) h(x), and x, x ¯ ∈ D, x x ¯ imply that g(x) g(¯ x) and h(x) h(¯ x). Then Theorem 12.1.1 implies that if the initial terms x0 and z0 are selected in such a way that x0 z0 , then for the entire iteration sequence, xk zk (k ≥ 0). If xk → x∗ and zk → z∗ as k → ∞, then x∗ ≤ z∗ . This inequality is illustrated in Figure 12.1. y=x g(x), h(x) y=h(x)
y=g(x)
x x∗
z∗
Figure 12.1: Comparison of fixed points on the real line.
Case Study 2 Assume that the above equations are linear, i.e., they have the special form xk+1 = Axk + b
(12.3.2)
zk+1 = Bzk + d. If we assume that matrices A, B and vectors b and d are non-negative, and that A B and b d, then x0 z0 implies that xk zk for all k ≥ 0. This assertion is a simple consequence of Theorem 12.1.1 and the previous case study. Case Study 3 Consider the linear input-output system y = x − Ax,
(12.3.3)
12.3. Applications
423
where A ≥ 0 is an n × n constant matrix such that its spectral radius ρ(A) is less than one. Here, x and y denote the gross and net production vectors, respectively. It is known that for all y ∈ Rn , equation (12.3.3) has a unique solution x(y), which can be obtained as the limit of the iteration process xk+1 = Axk + y
(12.3.4) n
with arbitrary initial point x0 ∈ R . By selecting X = Rn , l = 1, f (x) = Ax + y, we obtain the following result. If for any vector u, u ≤ Au + y, then u ≤ x(y); and if for a vector v, v ≥ Av + y, then v ≥ x(y). Here, ” ≤ ” the partial order of vectors, or u = (ui ) ≤ v = (v i ) if for all i, ui ≤ v i . Case Study 4 Consider again the linear input-output system (12.3.3) with the same assumptions as before. As we have seen, the iteration process (12.3.4) converges to the unique solution x(y). Since A ≥ 0, function Ax + y is increasing in x and y. Therefore Theorem 12.1.1 with the selection p = y implies that x(y1 ) ≤ x(y2 ), if y1 ≤ y2 . That is, higher net production requires higher gross production. Case Study 5 As a simple consequence of Theorem 12.2.1, this case proves a useful result which can be applied in practical cases to determine wether a matrix has a nonnegative inverse. The assertion is the following. Theorem 12.3.1 Let A be an n × n real constant matrix. A is nonsingular and A−1 ≥ 0 if and only if an n × n real constant matrix D exists such that (i) D ≥ 0; (ii) I − DA ≥ 0; (iii) ρ(I − DA) < 1. Proof. 1. If A−1 exists and is nonnegative, then D = A−1 satisfies the above conditions. 2. From (iii) we know that DA is nonsingular; therefore, A−1 exists. In order to show that A−1 ≥ 0, we verify that equation Ax = b has a non-negative solution x for all b ≥ 0. Note first that this equation is equivalent to the following x = (I − DA)x + Db.
(12.3.5)
Condition (iii) implies that the iteration sequence generated by the mapping f (x) = (I − DA)x converges to s∗ = 0. Starting from arbitrary initial vector x0 . If the partial order is selected as ” ≥ ” (instead of ” ≤ ”), then for the solution x∗ of equation (12.3.5) x∗ ≥ f (x∗ ), and therefore Theorem 12.2.1 implies that x∗ ≥ s∗ = 0. Hence, the solution x∗ of equation (12.3.5) is non-negative, which completes the proof. In the matrix theoretical literature, a special class of matrices satisfying the conditions of this theorem, is often discussed.
424
12. Fixed points of point-to-set mappings
Definition 12.3.2 An n × n real matrix A = (aij ) is called an M -matrix, if D = −1 −1 diag(a−1 11 , a22 , . . . , ann ) satisfies conditions (i), (ii), (iii). Notice that assumptions (i) and (ii) imply that for all i, aii > 0 and for all i = j, aij ≤ 0. For the sake of simplicity, let NP denote the set of all n × n real matrice whose off-diagonal elements are all non-positive, and let MM denote the set of all n × n M -matrices. Obviously, MM ⊂ NP. A comprehensive summary of the mostly known characterizations of M -matrices is given as follows. Theorem 12.3.3 Let A ∈ NP. Then the following conditions are equivalent to each others: 1. there exists a vector x ≥ 0 such that Ax > 0; 2. there exists a vector x > 0 such that Ax > 0; 3. there exists a diagonal matrix D with positive diagonal such that AD1 > 0; 4. there exists a diagonal matrix D with positive diagonal such that AD is strictly diagonally dominant; 5. for each diagonal matrix R such that R ≥ A, R−1 exists and ρ(R−1 (Ad − A)) < 1, where Ad is the diagonal of A; 6. if B ≥ A and B ∈ NP, then B−1 exists; 7. each real eigenvalue of A is positive; 8. each principal minor of A is positive; 9. there exists a strictly increasing sequence ∅ = M1 ⊆ M2 ⊆ . . . ⊆ Mr = {1, 2, . . . , n} such that the principal minors A(Mi ) are positive, where A(Mi ) is obtained from A by eliminating all rows and columns with indices i ∈ / Mi ; 10. there exists a permutation matrix P such that PAP−1 can be decomposed as LU, where L ∈ NP is a lower triangular matrix with positive diagonal, and U ∈ NP is an upper triangular matrix with positive diagonal; 11. inverse A−1 exists and is non-negative. Proof. 1 ⇒ 2: The continuity of Ax implies that for sufficiently small ε > 0, A(x+ ε1) > 0. Here x + ε1 > 0. 2 ⇒ 3: Let Ax > 0, and define D = diag(x(1) , ..., x(n) ), where x(i) denotes the ith element of x. Then, AD1 = Ax > 0. 3 ⇒ 4: Let D be a diagonal matrix with positive diagonal such that AD1 > 0. Then, with th enotation AD = W = (wij ),
$$\sum_{j=1}^{n} w_{ij} > 0, \quad \text{i.e.,} \quad w_{ii} > \sum_{j \neq i} (-w_{ij}) = \sum_{j \neq i} |w_{ij}|,$$
since for $i \neq j$, $w_{ij} \le 0$.
4 ⇒ 5: Let $W_D$ denote the diagonal of W. Then the row norm of $I - W_D^{-1}W$ is less than one, therefore $\rho(I - W_D^{-1}W) < 1$, and
$$\rho(I - A_D^{-1}A) = \rho\bigl(D^{-1}(I - A_D^{-1}A)D\bigr) = \rho(I - D^{-1}A_D^{-1}AD) = \rho(I - W_D^{-1}W) < 1,$$
since $W_D = A_D D$. Let R ≥ A be a diagonal matrix. Then $R \ge A_D$, and therefore $R^{-1} \le A_D^{-1}$. The Perron–Frobenius theorem (see, for example, Luenberger, 1979) implies that
$$\rho(R^{-1}(A_D - A)) \le \rho(A_D^{-1}(A_D - A)) = \rho(I - A_D^{-1}A) < 1.$$
5 ⇒ 6: Let B ≥ A with B ∈ NP. Notice that $A_D > 0$, since otherwise a singular R could be selected with at least one zero diagonal element. The inequality $B_D \ge A_D$ implies that $B_D^{-1}$ exists and is positive. Therefore
$$0 \le B_D^{-1}(B_D - B) \le B_D^{-1}(A_D - A),$$
which implies that
$$\rho(I - B_D^{-1}B) = \rho(B_D^{-1}(B_D - B)) \le \rho(B_D^{-1}(A_D - A)) < 1,$$
since the selection $R = B_D$ satisfies condition 5. Hence $B_D^{-1}B$ is invertible, which implies that B⁻¹ exists.
6 ⇒ 7: Assume that 6 holds, and let w ≤ 0. Then the matrix B = A − wI satisfies the inequality B ≥ A. Since B⁻¹ exists, w is not an eigenvalue of A.
7 ⇒ 8: Let $M \subset \{1, 2, \ldots, n\}$, and define the matrix B by
$$b_{ij} = \begin{cases} a_{ij} & i, j \in M, \\ a_{ii} & i = j,\ i \notin M, \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, B ≥ A and B ∈ NP. It is easy to prove by finite induction that det(B) > 0. Notice that det(B) is the product of det A(M) and the $a_{ii}$'s with $i \notin M$, and it is also the product of all eigenvalues of B. We will show that all real eigenvalues of B are positive. Let β denote a real eigenvalue of B, and assume that β ≤ 0. Then $\bar{B} = B - \beta I \ge A$, and we will next prove that $\bar{B}$ is nonsingular, which implies that β is not an eigenvalue of B. There exists a σ > 0 such that $I - \sigma\bar{B} \ge 0$, and
$$V = I - \sigma A \ge I - \sigma\bar{B} = U \ge 0.$$
The Perron–Frobenius theory and assumption 7 imply that $0 \le \rho(U) \le \rho(V) < 1$; therefore $(I - U)^{-1}$ exists. Since $I - U = \sigma\bar{B}$, $\bar{B}$ is nonsingular.
8 ⇒ 9: This is trivial.
9 ⇒ 10: This assertion follows from the well-known conditions for the existence of the LU decomposition (see, for example, Szidarovszky and Yakowitz [314]).
10 ⇒ 11: Assume that A may be written as A = LU, where L and U satisfy the given properties. Then both L⁻¹ and U⁻¹ exist, and it is easy to show that L⁻¹ and U⁻¹ are nonnegative. Therefore $A^{-1} = (LU)^{-1} = U^{-1}L^{-1} \ge 0$.
11 ⇒ 1: Select $x = A^{-1}\mathbf{1} \ge 0$; then $Ax = \mathbf{1} > 0$. Thus, the proof is complete.
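As a concrete illustration of the theorem, several of the equivalent conditions can be checked numerically for a sample matrix in NP. The following sketch (Python with numpy; the matrix and the test vector are illustrative choices, not taken from the text) verifies conditions 2, 8 and 11 for a tridiagonal matrix:

```python
import numpy as np

# A tridiagonal matrix in NP (non-positive off-diagonal entries).
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# Condition 2: there exists x > 0 with Ax > 0 (try x = (2, 3, 2)).
x = np.array([2.0, 3.0, 2.0])
cond2 = bool(np.all(x > 0) and np.all(A @ x > 0))

# Condition 8 (checked here for the leading principal minors,
# which suffices via condition 9 with nested index sets).
cond8 = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 4))

# Condition 11: the inverse exists and is entrywise non-negative.
Ainv = np.linalg.inv(A)
cond11 = bool(np.all(Ainv >= -1e-12))  # small tolerance for rounding

print(cond2, cond8, cond11)
```

All three flags print `True`, in agreement with the equivalence asserted by Theorem 12.3.3.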
Corollary 12.3.4 An n × n real matrix is an M-matrix if and only if it satisfies any one of the conditions of the theorem.

Proof. If A is an M-matrix, then it satisfies 11, and if a matrix satisfies any one of the assumptions, then it satisfies 5. Selecting R as the diagonal of A, condition 5 is equivalent to the conditions of Theorem 12.3.1.

Case Study 6 A nice application of the theory of M-matrices is contained here. Consider the equations $g(x) = b^{(1)}$ and $g(x) = b^{(2)}$, where $g : D \to \mathbb{R}^n$ with $D \subseteq \mathbb{R}^n$ being convex, $b^{(1)}$ and $b^{(2)}$ are in R(g), and furthermore $b^{(1)} \le b^{(2)}$. Assume that g is differentiable, and for all x ∈ D and all i and j,
$$\frac{\partial g_i}{\partial x_i}(x) > 0, \qquad \frac{\partial g_i}{\partial x_j}(x) \le 0 \quad (j \neq i),$$
and
$$\frac{\partial g_i}{\partial x_i}(x) > \sum_{j \neq i} \left| \frac{\partial g_i}{\partial x_j}(x) \right|.$$

12.4 Exercises
12.1. Assume that conditions (H1), (H3) hold, and moreover that (H2) holds either for $p_1$ or for $p_2$. Then show that $p_1, p_2 \in P$ and $p_1 \le p_2$ imply that for all k ≥ 0, $x_k^{(p_1)} \le x_k^{(p_2)}$.

12.2. Let $u_n : \mathbb{N} \to \mathbb{R}$ and $u : \mathbb{R}_+ \to \mathbb{R}$ be given, and set $\delta u_n = u_{n+1} - u_n$, $\delta u(t) = u(t + 1) - u(t)$. If $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ has the property that $p \ge q \Longrightarrow f(t, p) \ge f(t, q)$, it is said that f(t, ↑) is increasing, and if −f has this property it is said that f(t, ↓) is decreasing. In either case, the condition is required for all $t \in \mathbb{R}_+$. Suppose that
(i) f(t, ↑) is increasing and $\delta u_n \le f(n, u_n)$, $\delta v_n \ge f(n, v_n)$, $n \in \mathbb{N}$,
or
(ii) f(t, ↓) is decreasing and $\delta u_n \le f(n, u_{n+1})$, $\delta v_n \ge f(n, v_{n+1})$, $n \in \mathbb{N}$.
Then show that $u_0 \le v_0 \Longrightarrow u_n \le v_n$ for all $n \in \mathbb{N}$.
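The comparison principle of Exercise 12.2 (case (i)) can be observed numerically; the function f(t, p) = p/2 and the perturbation −0.1 below are illustrative assumptions, not taken from the text:

```python
# Discrete comparison: with f(t, .) increasing in its second argument,
# du_n <= f(n, u_n) and dv_n >= f(n, v_n) together with u_0 <= v_0
# force u_n <= v_n for every n.
def f(t, p):               # increasing in p for each fixed t
    return 0.5 * p

u, v = 1.0, 1.5            # u_0 <= v_0
for n in range(50):
    assert u <= v          # the claimed invariant
    u = u + f(n, u) - 0.1  # du_n = f(n, u_n) - 0.1 <= f(n, u_n)
    v = v + f(n, v)        # dv_n = f(n, v_n)       >= f(n, v_n)
```

The inner assertion never fires: if $u_n \le v_n$, monotonicity of f gives $u_{n+1} \le u_n + f(n, u_n) \le v_n + f(n, v_n) = v_{n+1}$.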
12.3. Let u and v be functions $\mathbb{R}_+ \to \mathbb{R}$. Suppose
(i) f(t, ↑) is increasing and $\delta u(t) \le f(t, u(t))$, $\delta v(t) \ge f(t, v(t))$, $t \in \mathbb{R}_+$,
or
(ii) f(t, ↓) is decreasing and $\delta u(t) \le f(t, u(t + 1))$, $\delta v(t) \ge f(t, v(t + 1))$, $t \in \mathbb{R}_+$.
Then show that $u(t) \le v(t)$ for all $0 \le t < 1 \Longrightarrow u(t) \le v(t)$ for $t \in \mathbb{R}_+$.

12.4. Let $u_-(t) = \inf_{t \le q \le t+1} u(q)$, $u_+(t) = \sup_{t \le q \le t+1} u(q)$. Suppose that f and u satisfy hypothesis (i) or (ii) of the previous problem, and suppose further that $f(t, M) \le 0$ for $t \in \mathbb{R}_+$, where M is a constant. Then show that $u_+(0) \le M \Rightarrow u(t) \le M$, $t \in \mathbb{R}_+$.

12.5. Let u be a locally bounded solution of $h(t)\delta u(t) \le g(t) - u(t + 1)$, $t \in \mathbb{R}_+$, where g > 0 and h > 0 for large t. Suppose v is positive for large t and satisfies
$$\tfrac{1}{2}(h(t) + 1)\,v_+(t) + v_-(t) + v(t) \ge g(t), \qquad t \gg 1.$$
Then show that $u(t) = O(v(t))$ as $t \to \infty$.

12.6. It is said that $f : \mathbb{R}_+ \to \mathbb{R}_+$ is admissible if constants a, T exist such that f(t) > 0 and
$$\frac{f(s)}{f(t)} \le a \quad \text{for } s, t \ge T,\ |s - t| \le 1.$$
Let f and h be admissible functions such that
$$\limsup_{t \to \infty} h(t)f(t) < \infty,$$
and let u be a locally bounded, nonnegative solution of $f(t)\delta u(t) \le g(t) - u(t + 1)^{1+a}$, where a > 0. Then show that $g(t) = O(f(t)^{-1-1/a}) \Rightarrow u(t) = O(f(t)^{-1/a})$ as $t \to \infty$.
12.7. Let φ(t, p, q) be decreasing in p and strictly increasing in q. Then show that if $u_0 \le v_0$ and $\varphi(n, u_n, u_{n+1}) \le \varphi(n, v_n, v_{n+1})$ for all n ≥ 0, then $u_n \le v_n$ for all n ≥ 0.

12.8. Consider the inequality $c(1 + t)^r \delta u(t) \le g(t) - u(t + 1)^{1+a}$, $t \in \mathbb{R}_+$, where g is a positive function $\mathbb{R}_+ \to \mathbb{R}$ and c, a, r are real constants with c > 0 and a ≥ 0. Assume u is locally bounded and u ≥ 0. Let a = 0. If r < 1 (c, m arbitrary), or if r = 1 and $m \ge -\frac{1}{c}$, then show that $g(t) = O(t^m) \Rightarrow u(t) = O(t^m)$.

12.9. Extend Theorem 12.2.1 to the general algorithm (11.0.1).

12.10. Find results similar to those introduced in Case Study 8 for systems of ordinary differential equations.
Chapter 13
Special topics

We exploit further the conclusions of Section 5.1.
13.1 Newton–Mysovskikh conditions
The Newton–Mysovskikh theorem [248], [263] provides sufficient conditions for the semilocal convergence of Newton's method to a locally unique solution of an equation in a Banach space setting. In this section we show that, under weaker hypotheses and a more precise error analysis than before [248], [263], weaker sufficient conditions can be obtained for the local as well as the semilocal convergence of Newton's method. Error bounds on the distances involved, as well as a larger radius of convergence, are also obtained. Some numerical examples are also provided.

In particular we are motivated by the affine covariant Newton–Mysovskikh theorem (ACNMT) (see also Theorem 13.1.1 that follows), which basically states that if the initial guess $x_0$ is close enough to the solution $x^*$, then Newton's method (4.1.3) converges quadratically to $x^*$. The basic condition is given by
$$\|F'(z)^{-1}[F'(y) - F'(x)](y - x)\| \le w\,\|y - x\|^2 \quad \text{for all } x, y, z \in D. \tag{13.1.1}$$

Here we have noticed (see Theorem 13.1.1) that condition (13.1.1) is not used at full strength to prove convergence of method (4.1.3). Indeed, a much weaker hypothesis (see (13.1.4)) can replace (13.1.1) in the proof of the theorem. This observation has the following advantages (semilocal case): (a) the applicability of (ACNMT) is extended; (b) finer error bounds on the distances $\|x_{n+1} - x_n\|$, $\|x_n - x^*\|$ (n ≥ 0); (c) more precise information on the location of the solution $x^*$; (d) the Lipschitz constant is at least as small and easier to compute. Local case: (a) a larger radius of convergence; (b) finer error bounds on the distances $\|x_n - x^*\|$ (n ≥ 0).
The above advantages are obtained not only under weaker hypotheses but also under less or the same computational cost. Some numerical examples are also provided, for both the semilocal and the local case, where our results apply but other ones in [248], [263] fail.

We can show the following weaker semilocal convergence version of the Affine Covariant Newton–Mysovskikh Theorem (ACNMT):

Theorem 13.1.1 Let $F : D \subseteq X \to Y$ be a Fréchet continuously differentiable operator, and let $x_0 \in D$ be an initial guess. Assume further that the following conditions hold true:
$$F'(x)^{-1} \in L(Y, X) \quad (x \in D), \tag{13.1.2}$$
$$\|F'(x_0)^{-1}F(x_0)\| \le \eta, \tag{13.1.3}$$
$$\|F'(y)^{-1}[F'(x + t(y - x)) - F'(x)](y - x)\| \le w_1\,t\,\|y - x\|^2 \quad \text{for all } x, y \in D,\ t \in [0, 1], \tag{13.1.4}$$
$$h_1 := \frac{w_1\,\eta}{2} < 1, \tag{13.1.5}$$
and
$$\overline{U}(x_0, r_1) \subseteq D,$$
where $r_1$ is the radius determined by η and $h_1$. Then the sequence $\{x_n\}$ (n ≥ 0) generated by Newton's method (4.1.3) is well defined, remains in $\overline{U}(x_0, r_1)$ for all n ≥ 0, and there exists a unique $x^* \in \overline{U}(x_0, r_1)$ such that $F(x^*) = 0$ and $x_n \to x^*$ as $n \to \infty$, with
$$\|x^* - x_{n+1}\| \le \varepsilon_n\,\|x^* - x_n\|^2 \quad \text{for all } n \ge 1,$$
where the constants $\varepsilon_n$ are determined by $w_1$ and the iterates.
Proof. Using (4.1.3) and (13.1.4) we obtain in turn the estimates which show (13.1.8). We now prove that $\{x_n\}$ is a Cauchy sequence in $\overline{U}(x_0, r_1)$, by induction on k. For k = 0, we have by (4.1.3), (13.1.3) and (13.1.5)
$$\|x_1 - x_0\| = \|F'(x_0)^{-1}F(x_0)\| \le \eta = \frac{2}{w_1}\,h_1.$$
Assuming (13.1.13) to be true for some $k \in \mathbb{N}$, we obtain the corresponding estimates, and it follows from (13.1.15) that $x_{k+1} \in \overline{U}(x_0, r_1)$. Similarly it can be shown that $\{x_n\}$ is a Cauchy sequence in $\overline{U}(x_0, r_1)$, and as such it converges to some $x^* \in \overline{U}(x_0, r_1)$. Hence, we get
and thus $F(x^*) = 0$. To show (13.1.9), set $h_k = \tfrac{1}{2}w_1\|x_k - x_{k-1}\|$. Then we get in turn:
$$\|x_k - x^*\| = \lim_{\substack{m \to \infty \\ m > k}} \|x_k - x_m\| \le \lim_{\substack{m \to \infty \\ m > k}} \bigl[\|x_m - x_{m-1}\| + \cdots + \|x_{k+1} - x_k\|\bigr] \le \frac{2}{w_1}\,\lim_{\substack{m \to \infty \\ m > k}} \bigl[h_{m-1} + \cdots + h_k\bigr].$$
We can also have
$$h_{k+b} \le (h_k)^{2^b}, \qquad b \in \mathbb{N}_0,$$
which shows (13.1.9). Finally, to show uniqueness, let $y^* \in \overline{U}(x_0, r_1)$ be a solution of equation F(x) = 0. Then we have for all t ∈ [0, 1]:
$$x^* + t(y^* - x^*) - x_0 = (1 - t)(x^* - x_0) + t(y^* - x_0)$$
and
$$\|x^* + t(y^* - x^*) - x_0\| \le (1 - t)\|x^* - x_0\| + t\|y^* - x_0\| \le (1 - t)r_1 + t\,r_1 = r_1.$$
That is, $x^* + t(y^* - x^*) \in \overline{U}(x_0, r_1)$. Using the identity
$$F(y^*) - F(x^*) = M(y^* - x^*), \qquad \text{where } M = \int_0^1 F'\bigl(x^* + t(y^* - x^*)\bigr)\,dt,$$
and hypothesis (13.1.2), we deduce that the linear operator M is invertible. Hence, we conclude $x^* = y^*$. □
Remark 13.1.1 Clearly condition (13.1.1) implies (13.1.4), but not vice versa. That is, (13.1.4) is a weaker condition than (13.1.1). Note that
$$w_1 \le w$$
holds in general, and the ratio $\frac{w}{w_1}$ can be arbitrarily large. The classical Affine Covariant Newton–Mysovskikh Theorem has been shown using (13.1.1), h and r instead of (13.1.4), $h_1$ and $r_1$, respectively, where h and r are defined as $h_1$ and $r_1$ with w in place of $w_1$. Note that
$$h_1 \le h \quad \text{and} \quad r_1 \le r,$$
but not vice versa unless $w_1 = w$. If strict inequality holds in (13.1.23), then our error estimates (13.1.8)–(13.1.10) are finer than the corresponding ones in [248], [263], obtained by replacing $w_1$ by w in (13.1.8)–(13.1.10). Note that all the above advantages are obtained under less computational cost, since in practice the computation of w requires that of $w_1$.

Remark 13.1.2 Although the computation of the constant $w_1$ in (13.1.4) is in practice easier than the computation of w in (13.1.1), one may want to replace (13.1.4) by the pair of conditions (13.1.28)
and (13.1.29), which hold
for all x, y ∈ D, t ∈ [0, 1]. Using (13.1.28) and (13.1.29) we showed in Section 5.1 that Newton's sequence converges to a unique solution $x^*$ of the equation F(x) = 0 in a suitable ball centered at $x_0$, provided that (13.1.30) holds. Clearly these results will be weaker than the present ones provided that
$$\frac{w_2}{1 - w_0\,\|z - x_0\|} < w_1, \tag{13.1.31}$$
which follows by the Banach Lemma on invertible operators and the corresponding estimate. Condition (13.1.31) certainly holds for sufficiently small η, since the ratio $\frac{w_2}{w_1}$ can be arbitrarily small.
We can provide an example to show that (13.1.5) and (13.1.30) hold but (13.1.24) fails.

Example 13.1.1 Let $X = Y = \mathbb{R}$, $D = [a, 2 - a]$, $a \in [0, \frac12)$, $x_0 = 1$, and define the function F on D by
$$F(x) = x^3 - a.$$
Using (13.1.1), (13.1.3), (13.1.4), (13.1.28) and (13.1.29) we obtain the corresponding constants. Condition (13.1.24) does not hold, since the corresponding quantity exceeds 2 for all $a \in [0, \frac12)$ (compare Example 13.2.1). That is, there is no guarantee that Newton's method starting at $x_0$ converges to $x^*$. However, condition (13.1.5) holds, since
$$2h_1 = \frac{2(1 - a)}{3a} < 2 \quad \text{for all } a \in \left(\tfrac14, \tfrac12\right).$$
Finally, say for δ = 1, condition (13.1.30) holds as well.
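Taking F(x) = x³ − a (consistent with the constants computed for this family of examples), the claimed behaviour can be reproduced numerically; the value a = 0.4 ∈ (1/4, 1/2) below is an illustrative choice, and the expressions for η and h₁ follow the example's computed values:

```python
a = 0.4                      # illustrative value in (1/4, 1/2)
F = lambda x: x ** 3 - a
dF = lambda x: 3 * x ** 2

x = 1.0                      # initial guess x_0 = 1
for _ in range(10):
    x = x - F(x) / dF(x)     # Newton step (4.1.3)

eta = (1 - a) / 3            # |F'(x_0)^{-1} F(x_0)| = (1 - a)/3
h1 = (1 - a) / (3 * a)       # 2*h1 = 2(1-a)/(3a) < 2, i.e. (13.1.5) holds
print(abs(x - a ** (1 / 3)) < 1e-12, h1 < 1)
```

Both printed flags are `True`: the iterates converge rapidly to the root $x^* = a^{1/3}$, and $h_1 = 0.5 < 1$.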
In order to cover the local convergence of Newton's method under Newton–Mysovskikh-type conditions, let $x^*$ be a simple solution of equation F(x) = 0. Assume:
$$\|F'(x)^{-1}[F'(x^* + t(x - x^*)) - F'(x)](x - x^*)\| \le v\,(1 - t)\,\|x - x^*\|^2 \tag{13.1.40}$$
for all $x \in U(x^*, r^*)$, $t \in [0, 1]$, where the radius $r^*$ is defined by (13.1.41). Set also the radius $r_1^*$ given by (13.1.42). We can show the following local convergence theorem for Newton's method (4.1.3):

Theorem 13.1.2 Let $F : D \subseteq X \to Y$ be a Fréchet-differentiable operator. Assume that $F'(x)^{-1} \in L(Y, X)$ $(x \in D)$ and that there exists $x^* \in D$ such that $F(x^*) = 0$. If
(a) condition (13.1.40) holds, then the sequence $\{x_n\}$ generated by Newton's method (4.1.3) is well defined, remains in $U(x^*, r^*)$ for all n ≥ 0, and converges to $x^*$, provided that $x_0 \in U(x^*, r^*)$;
then, sequence {xn) generated by Newton's method (4.1.3) is well defined, remains i n U ( x * ,r ; ) for all n 2 0 and converges to x* provided that xo E U ( x * ,r ? ) with
Proof. (a) Using (13.1.42), (13.1.43) and induction on the integer Ic we get
which shows xk E U(x*,r * ) for all k and lim xk = x* . k-00
( b ) Similarly using ( 3 ) , (45) and induction on the integer Ic we get
Let us further introduce the conditions
$$\|F'(x)^{-1}[F'(x) - F'(x^*)]\| \le v_1\,\|x - x^*\|$$
and
$$\|F'(x^*)^{-1}[F'(x) - F'(x^*)]\| \le v_0\,\|x - x^*\|.$$
Then, if we set $r_2^*$ according to (13.1.52), we showed (see Section 5.1) that Newton's method converges to $x^*$ provided that $x_0 \in U(x^*, r_2^*)$ and (13.1.53) holds. In general
$$v_0 \le v \le w \tag{13.1.54}$$
holds, and the ratios $\frac{v}{v_0}$, $\frac{w}{v}$ can be arbitrarily large. Therefore, by (13.1.41) and (13.1.42),
$$r_1^* \le r^*.$$
We can now provide an example where we compare the radii $r^*$, $r_1^*$ and $r_2^*$.
Example 13.1.2 Let $X = Y = \mathbb{R}$, $D = U(0, 1)$, $x^* = 0$, and define the function F on D by
$$F(x) = e^x - 1.$$
Using (13.1.1), (13.1.40), (13.1.49), (13.1.50) we obtain
$$w = e, \quad v = e - 1, \quad v_0 = e - 1 \quad \text{and} \quad v_1 = e. \tag{13.1.56}$$
By (13.1.41), (13.1.42), (13.1.52), (13.1.56) we obtain the corresponding radii. We need to reset $r^* = 1$ so that (13.1.53) holds. It follows that, since $r_1^* < r^*$, our approach provides a wider choice of initial guesses $x_0$ than the local convergence results using the standard condition (13.1.1).
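Taking F(x) = eˣ − 1 (consistent with the constants w = e, v = e − 1), local convergence to x* = 0 is easy to observe numerically; the starting point x₀ = 0.9 inside U(0, 1) is an illustrative choice:

```python
import math

F = lambda x: math.exp(x) - 1.0
dF = lambda x: math.exp(x)

x = 0.9                       # x_0 in U(0, 1)
for _ in range(10):
    x = x - F(x) / dF(x)      # Newton step (4.1.3): x - 1 + exp(-x)
print(abs(x) < 1e-12)
```

The printed flag is `True`; the first iterates are roughly 0.31, 0.043, 0.0009, ..., exhibiting the quadratic contraction toward the simple zero x* = 0.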
13.2 A conjugate gradient solver for the normal equations
In this section a weaker version of the affine covariant Lipschitz condition appearing in connection with Newton–Mysovskikh-type theorems for the semilocal convergence of Newton methods is introduced. It turns out that, under weaker hypotheses and less computational cost than before [142], [194], [263], a semilocal convergence analysis can be provided for Newton methods. This approach, in particular, extends the usage of the conjugate gradient solver for the normal equations on the one hand, and provides finer error estimates on the distances involved on the other. A simple numerical example justifies our approach.

In this study we are concerned with the problem of approximating a locally unique solution $x^*$ of the equation
$$F(x) = 0, \tag{13.2.1}$$
where the operator F is defined on an open convex subset D of the n-dimensional Euclidean space $\mathbb{R}^n$ with values in itself. The most popular method for approximating $x^*$ is undoubtedly Newton's method, which computes iterates successively as the solutions of the linear algebraic systems
$$F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k \quad (k = 0, 1, \ldots). \tag{13.2.2}$$
The classical convergence theorems of Newton–Kantorovich and Newton–Mysovskikh type, and their affine covariant, affine contravariant, and affine conjugate versions, assume the exact solution of (13.2.2). In practice, however, in particular if the dimension is large, (13.2.2) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (13.2.2). It is important to tune the outer and inner iterations and to keep track of the iteration errors. With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the error norm in the case of affine covariance, the residual norm in the case of affine contravariance, and the energy norm in the case of affine conjugacy.

Except for convex optimization, we cannot expect F'(x), x ∈ D, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are CGNE (Conjugate Gradient for the Normal Equations) in the case of affine covariance, GMRES (Generalized Minimal Residual) in the case of affine contravariance, and PCG (Preconditioned Conjugate Gradient) in the case of affine conjugacy. Although our results can be extended to all solvers mentioned above, we only report on CGNE.

In this study we are motivated by the elegant work of R. Hoppe in [194] (see also related work by E. Cătinaş [118], [119]). Hoppe's work is based on the following assumption: Suppose that $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on D with invertible Fréchet derivative F'(x), $x \in \mathbb{R}^n$. Assume further that the following affine covariant Lipschitz condition is satisfied:
$$\|F'(z)^{-1}[F'(y) - F'(x)]v\| \le w\,\|y - x\|\,\|v\| \quad \text{for all } x, y, z \in D,\ v \in \mathbb{R}^n. \tag{13.2.3}$$
Condition (13.2.3), together with the sufficient convergence Newton–Mysovskikh hypothesis
$$H = w\,\eta \le 2, \tag{13.2.4}$$
forms the basic hypotheses of the main results (Theorems 3.1 and 3.2 in [194]). However, there are even simple scalar examples where hypothesis (13.2.4) does not hold, and consequently there is no guarantee that method (13.2.2) converges to $x^*$:
Example 13.2.1 Let n = 1, $x^0 = 1$, $D = [a, 2 - a]$, $a \in [0, \frac12)$, and define the function F on D by
$$F(x) = x^3 - a. \tag{13.2.5}$$
It follows by (13.2.2), (13.2.3) and (13.2.5) that
$$\eta = \frac{1}{3}(1 - a), \qquad w = \frac{2(2 - a)}{a^2}. \tag{13.2.6}$$
By (13.2.4) and (13.2.6) we obtain
$$H = \frac{2}{3a^2}(1 - a)(2 - a).$$
However, (13.2.4) does not hold, since H > 2 for all $a \in [0, \frac12)$. That is, the results obtained in [194] cannot be applied in this case.

We noticed that all the results obtained in [194] hold in a weaker setting if (13.2.3) and (13.2.4) are replaced by the weaker (13.2.10) and (13.2.25), respectively. Moreover, the error bounds $\|x^{k+1} - x^k\|$, $\|x^k - x^*\|$ and the relative error are finer. This information is important in computational mathematics [99], [142], [194], [263]. To make the section as self-contained as possible, we reintroduce some of the terminology and concepts in [263].
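The failure of (13.2.4) in Example 13.2.1 is easy to confirm numerically; the grid below is an illustrative sampling of a ∈ (0, 1/2):

```python
def H(a):
    # Kantorovich-type quantity H = w * eta for Example 13.2.1:
    # H(a) = 2 (1 - a)(2 - a) / (3 a^2)
    return 2 * (1 - a) * (2 - a) / (3 * a ** 2)

grid = [0.01 * k for k in range(1, 50)]   # a in (0, 0.5)
print(all(H(a) > 2 for a in grid))
```

This prints `True`: H decreases toward the limit value 2 as a → 1/2, but stays strictly above 2 on the whole interval, so the sufficient condition H ≤ 2 never holds.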
Affine Covariant Inexact Newton Methods

CGNE (Conjugate Gradient for the Normal Equations)

We assume $A \in \mathbb{R}^{n \times n}$ to be a regular, nonsymmetric matrix and $b \in \mathbb{R}^n$ to be given, and look for $y^* \in \mathbb{R}^n$ as the unique solution of the linear algebraic system
$$Ay = b.$$
As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: it solves the system
$$AA^T z = b$$
for z and then computes y according to
$$y = A^T z.$$
The implementation of CGNE is as follows.

CGNE Initialization: Given an initial guess $y^0 \in \mathbb{R}^n$, compute the residual $r^0 = b - Ay^0$ and set
$$p^0 = r^0, \qquad \bar{p}^0 = 0, \qquad \beta_0 = 0, \qquad \varrho_0 = \|r^0\|^2.$$
CGNE Iteration Loop: For $1 \le i \le i_{\max}$, compute the conjugate gradient updates for the system $AA^Tz = b$, carrying along the iterates $y^i = A^Tz^i$.

CGNE has the error minimizing property over the affine space $y^0 + K_i(A^Tr^0, A^TA)$, where $K_i(A^Tr^0, A^TA)$ stands for the Krylov subspace
$$K_i(A^Tr^0, A^TA) = \operatorname{span}\{A^Tr^0,\ (A^TA)A^Tr^0,\ \ldots,\ (A^TA)^{i-1}A^Tr^0\}.$$
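The CGNE scheme above can be sketched in a few lines (Python with numpy; the recursion is the standard conjugate gradient iteration for AAᵀz = b rewritten entirely in the y-variables, so the bookkeeping may differ in detail from the loop printed in [194]; the matrix below is an illustrative choice):

```python
import numpy as np

def cgne(A, b, y0=None, tol=1e-12, maxit=None):
    """CGNE sketch: CG on A A^T z = b with y = A^T z, which minimizes
    the error norm ||y* - y_i|| over the Krylov subspace K_i(A^T r0, A^T A)."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else y0.astype(float).copy()
    r = b - A @ y            # residual r_i = b - A y_i
    p = A.T @ r              # search direction in the y-variables
    rho = r @ r
    for _ in range(maxit or 2 * n):
        if np.sqrt(rho) < tol:
            break
        alpha = rho / (p @ p)
        y += alpha * p
        r -= alpha * (A @ p)
        rho_new = r @ r
        p = A.T @ r + (rho_new / rho) * p
        rho = rho_new
    return y

A = np.array([[3.0, 1.0, 0.0],
              [-1.0, 4.0, 2.0],
              [0.0, -1.0, 5.0]])   # regular, nonsymmetric
b = np.array([1.0, 2.0, 3.0])
y = cgne(A, b)
```

For this 3×3 example the iteration reproduces b = Ay to machine precision, since exact CG on a 3×3 symmetric positive definite system terminates in at most three steps.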
Lemma 13.2.1 [194]. (Representation of the iteration error.) Let $\varepsilon_i := \|y^* - y^i\|^2$ be the square of the CGNE iteration error with respect to the ith iterate. Then $\varepsilon_i$ can be represented in terms of the quantities computed by the CGNE loop.

Proof. CGNE satisfies a Galerkin orthogonality relation. Setting m = 1, this implies an orthogonal decomposition of the error, which readily gives the first part of the representation. On the other hand, observing that $y^n = y^*$, the Galerkin orthogonality with m = n − i yields the assertion. □
Computable lower bound for the iteration error

It follows readily from Lemma 13.2.1 that the computable quantity arising from the error representation provides a lower bound for the iteration error. In practice, we will test the relative error norm against a user-specified accuracy.

Convergence of Affine Covariant Inexact Newton Methods
We denote by $\delta x^k \in \mathbb{R}^n$ the result of an inner iteration, e.g., CGNE, for the solution of (13.2.2). Then it is easy to see that the iteration error $\delta x^k - \Delta x^k$ satisfies the error equation
$$F'(x^k)\,(\delta x^k - \Delta x^k) = F'(x^k)\,\delta x^k + F(x^k).$$
We will measure the impact of the inexact solution of (13.2.2) by the relative error
$$\delta_k := \frac{\|\delta x^k - \Delta x^k\|}{\|\delta x^k\|}.$$
Theorem 13.2.2 (Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence.) Suppose that $F : D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on D with invertible Fréchet derivative F'(x), $x \in \mathbb{R}^n$. Assume further that the following affine covariant Lipschitz condition is satisfied:
$$\|F'(y)^{-1}[F'(x + t(y - x)) - F'(x)]v\| \le w_1\,t\,\|y - x\|\,\|v\|, \tag{13.2.10}$$
where $x, y \in D$, $v \in \mathbb{R}^n$, $t \in [0, 1]$. Assume that $x^0 \in D$ is an initial guess for the outer Newton iteration and that $\delta x^0 = 0$ is chosen as the start iterate for the inner iteration. Consider the Kantorovich quantities
associated with the outer and inner iterations. Assume that (13.2.12) holds, and control the inner iterations according to the contraction condition $\theta(h_k, \delta_k) \le \bar{\theta} < 1$, which implies linear convergence. Note that a necessary condition for $\theta(h_k, \delta_k) \le \bar{\theta}$ is that it holds true for $\delta_k = 0$, which is satisfied due to assumption (13.2.12). Then, there holds:

Theorem 13.2.3 The Newton–CGNE iterates $\{x^k\}_{k \in \mathbb{N}_0}$ stay in $\overline{B}(x^0, \rho)$ and converge linearly to some $x^* \in \overline{B}(x^0, \rho)$ with $F(x^*) = 0$. The exact Newton increments $\|\Delta x^k\|$ decrease monotonically, whereas a corresponding contraction estimate holds for the inexact Newton increments $\|\delta x^k\|$.
Proof. Using (13.2.2) we obtain in turn the decomposition (13.2.17) of the new Newton increment.
Using the affine covariant Lipschitz condition (13.2.10), the first term on the right-hand side in (13.2.17) can be estimated according to
$$I \le w_1\,\|\delta x^k\|^2 \int_0^1 t\,dt = \frac{1}{2}\,w_1\,\|\delta x^k\|^2. \tag{13.2.18}$$
For the second term we obtain by the same argument
$$II = \|F'(x^{k+1})^{-1}F'(x^k)(\delta x^k - \Delta x^k)\| \le \|F'(x^{k+1})^{-1}\bigl(F'(x^k) - F'(x^{k+1})\bigr)(\delta x^k - \Delta x^k)\| + \|\delta x^k - \Delta x^k\|. \tag{13.2.19}$$
Combining (13.2.18) and (13.2.19) yields
$$\frac{\|\Delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{1}{2}\,h_k + \delta_k\,(1 + h_k).$$
Observing (13.2.11), we finally get the contraction estimate
$$\frac{\|\Delta x^{k+1}\|}{\|\delta x^k\|} \le \theta(h_k, \delta_k) \le \bar{\theta} < 1,$$
which implies linear convergence. Note that a necessary condition for $\theta(h_k, \delta_k) \le \bar{\theta}$ is that it holds true for $\delta_k = 0$, which is satisfied due to assumption (13.2.12). For the contraction of the inexact Newton increments a corresponding estimate follows. It can easily be shown that $\{x^k\}_{k \in \mathbb{N}_0}$ is a Cauchy sequence in $\overline{B}(x^0, \rho)$. Consequently, there exists $x^* \in \overline{B}(x^0, \rho)$ such that $x^k \to x^*$ $(k \to \infty)$. Since
$$F(x^k) = -F'(x^k)\,\Delta x^k \to 0 \quad (k \to \infty),$$
we conclude $F(x^*) = 0$. □
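The outer/inner structure analysed in Theorem 13.2.3 can be sketched as follows (Python with numpy; the nonlinear test system, the starting point, and the fixed number of inner CGNE steps are illustrative assumptions, not taken from the text):

```python
import numpy as np

def cgne_steps(A, b, steps):
    """A few CG steps on A A^T z = b, returned as y = A^T z (inner solver)."""
    y = np.zeros_like(b)
    r = b - A @ y
    p = A.T @ r
    rho = r @ r
    for _ in range(steps):
        if rho == 0.0:
            break
        alpha = rho / (p @ p)
        y += alpha * p
        r -= alpha * (A @ p)
        rho_new = r @ r
        p = A.T @ r + (rho_new / rho) * p
        rho = rho_new
    return y

def F(x):   # illustrative nonlinear system with root (1, 1)
    return np.array([x[0] ** 2 + x[1] - 2.0, x[0] + x[1] ** 2 - 2.0])

def dF(x):  # Jacobian F'(x)
    return np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

x = np.array([2.0, 0.5])                     # outer initial guess x^0
for _ in range(25):
    dx = cgne_steps(dF(x), -F(x), steps=4)   # inexact Newton correction
    x = x + dx
```

The outer iterates approach the root (1, 1); because the inner system here is only 2×2, a handful of CGNE steps already solves it essentially exactly, so the sketch exhibits the exact-Newton limiting case δ_k ≈ 0 of the theorem.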
Remark 13.2.1 In general
$$w_1 \le w$$
holds, and the ratio $\frac{w}{w_1}$ can be arbitrarily large. If $w_1 = w$ and (13.2.10) is replaced by the stronger (13.2.3), then our Theorem 13.2.2 reduces to Theorem 3.1 in [194, p. 20]. Otherwise our Theorem 13.2.2 improves Theorem 3.1 in [194] under weaker hypotheses and less computational cost, since in practice the computation of $w_1$ is less expensive than that of w. Note also that (13.2.3) implies (13.2.10), but not vice versa unless $w_1 = w$. Returning back to Example 13.2.1 and using (13.2.10), we obtain
$$w_1 = \frac{2}{\cdots}$$