POINT-TO-SET MAPS AND MATHEMATICAL PROGRAMMING
MATHEMATICAL PROGRAMMING STUDIES

Editor-in-Chief
M.L. BALINSKI, Yale University, New Haven, CT, U.S.A.

Senior Editors
E.M.L. BEALE, Scientific Control Systems, Ltd., London, Great Britain
GEORGE B. DANTZIG, Stanford University, Stanford, CA, U.S.A.
L. KANTOROVICH, National Academy of Sciences, Moscow, U.S.S.R.
TJALLING C. KOOPMANS, Yale University, New Haven, CT, U.S.A.
A.W. TUCKER, Princeton University, Princeton, NJ, U.S.A.
PHILIP WOLFE, IBM Research, Yorktown Heights, NY, U.S.A.

Associate Editors
VACLAV CHVATAL, Stanford University, Stanford, CA, U.S.A.
RICHARD W. COTTLE, Stanford University, Stanford, CA, U.S.A.
H.P. CROWDER, IBM Research, Yorktown Heights, NY, U.S.A.
J.E. DENNIS, Jr., Cornell University, Ithaca, NY, U.S.A.
B. CURTIS EAVES, Stanford University, Stanford, CA, U.S.A.
R. FLETCHER, The University, Dundee, Scotland
B. KORTE, Universität Bonn, Bonn, West Germany
MASAO IRI, University of Tokyo, Tokyo, Japan
C. LEMARECHAL, IRIA-Laboria, Le Chesnay, Yvelines, France
C.E. LEMKE, Rensselaer Polytechnic Institute, Troy, NY, U.S.A.
GEORGE L. NEMHAUSER, Cornell University, Ithaca, NY, U.S.A.
WERNER OETTLI, Universität Mannheim, Mannheim, West Germany
MANFRED W. PADBERG, New York University, New York, U.S.A.
M.J.D. POWELL, University of Cambridge, Cambridge, England
JEREMY F. SHAPIRO, Massachusetts Institute of Technology, Cambridge, MA, U.S.A.
L.S. SHAPLEY, The RAND Corporation, Santa Monica, CA, U.S.A.
K. SPIELBERG, IBM Scientific Computing, White Plains, NY, U.S.A.
HOANG TUY, Institute of Mathematics, Hanoi, Socialist Republic of Vietnam
D.W. WALKUP, Washington University, Saint Louis, MO, U.S.A.
ROGER WETS, University of Kentucky, Lexington, KY, U.S.A.
C. WITZGALL, National Bureau of Standards, Washington, DC, U.S.A.
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM-NEW YORK-OXFORD
MATHEMATICAL PROGRAMMING STUDY 10

Point-to-Set Maps and Mathematical Programming

Edited by P. HUARD

A. Auslender, J.M. Borwein, J.P. Delahaye, J. Denel, J.Ch. Fiorot, E.G. Gol'shtein, P. Huard, D. Klatte, R. Klessig, B. Kummer, G.G.L. Meyer, E. Polak, S.M. Robinson, A. Ruszczynski, R. Saigal, J. Szymanowski, S. Tishyadhigama, N.V. Tret'yakov

1979
© THE MATHEMATICAL PROGRAMMING SOCIETY - 1979 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
This book is also available in journal format on subscription.
North-Holland ISBN for this series: 0 7204 8300 X
for this volume: 0 444 85243 3

Published by: NORTH-HOLLAND PUBLISHING COMPANY, AMSTERDAM · NEW YORK · OXFORD

Sole distributors for the U.S.A. and Canada: Elsevier North-Holland, Inc., 52 Vanderbilt Avenue, New York, N.Y. 10017

Library of Congress Cataloging in Publication Data

Main entry under title:
Point-to-set maps and mathematical programming.
(Mathematical programming study ; no. )
1. Programming (Mathematics)--Addresses, essays, lectures. 2. Mappings (Mathematics)--Addresses, essays, lectures. I. Huard, Pierre. II. Auslender, Alfred. III. Series.
QA402.5.P57 519.7 78-23304
ISBN 0-444-85243-3
PRINTED IN THE NETHERLANDS
PREFACE

The theory of point-to-set maps, and more exactly the notions of continuity connected with it, forms a most interesting mathematical tool for the study of algorithms. It has come into increasing use during the last twelve years in papers on optimization (convergence of algorithms, synthesis of existing methods, stability of parametrized mathematical programs, etc.). The object of this monograph is to give a sample of this literature, and to endeavour to take stock of the question.

The monograph includes a bibliographic survey drafted by the editor in collaboration with several colleagues. A list of references going back to the beginning of the century is given in an annex, as well as a short communication by Delahaye and Denel concerning the equivalences between the various notions of continuity of point-to-set maps used in classical papers. The other articles cover a great variety of subjects, which can however be grouped under a few headings: stability, optimality-duality, algorithms and fixed points.

Three papers deal with the stability of nonlinear programming, each with a different concern. Paper 1 by Auslender studies the directional derivatives of the optimal value of a program whose right-hand sides are parameters, in the case of nondifferentiable constraints. Paper 7 by Klatte lays down sufficient conditions for the lower semicontinuity of the set of optimal solutions of a program in which the objective function and the domain are defined by parametrized convex quadratic functions. It will be recalled that this semicontinuity is generally not assured, contrary to upper semicontinuity. Lastly, Paper 8 by Kummer deals with the same problem, but for the set of optimal solutions of the dual of a linearly constrained program in which only the convex objective function is parametrized: it is consequently a matter of continuity of the set of multipliers of the primal.
Stability is also studied in Paper 10 by Robinson, but in a broader setting: the subject is the stability of the set of solutions of multivalued equations, i.e., equations where the sign of equality is replaced by that of belonging to a set. This leads to applications to complementarity problems and to quadratic programming, where the constraints are cone-constraints.

Extensions of the theory of duality are given in two papers. Paper 2 by Borwein gives a generalization of the Farkas lemma, and illustrates the simplifications afforded by point-to-set maps. Paper 5 by Gol'shtein and Tret'yakov studies the conservation of duality properties for generalized Lagrangian functions, and the resulting applications to the convergence of subgradient methods for determining saddle-points.

Seven other papers deal mainly with algorithms. Paper 6 by Huard and Paper 13 by Tishyadhigama, Polak and Klessig extend the applications of Zangwill's general algorithm by weakening its hypotheses. Paper 4 by Fiorot and Huard studies the possibilities of combining the iterations of different algorithms in a cyclic way (composition of the ranges of point-to-set maps) or arbitrarily (union of the ranges), thus generalizing the conventional relaxation methods. The conventional continuity properties of point-to-set maps are hard to preserve in these operations, and new notions of continuity are proposed by Denel in Paper 3. This author is thus able to construct general algorithms having a great number of applications. Also in the field of composition of algorithms, Paper 12 by Szymanowski and Ruszczynski studies the convergence of two-level algorithms, and especially the influence of approximate calculations in the sub-optimizations.

All these algorithms lead to fixed points of point-to-set maps. Paper 11 by Saigal studies, at both theoretical and practical levels, a method for obtaining fixed points by simplicial decomposition of the space and piecewise linear approximation of the functions. Lastly, Paper 9 by Meyer studies the properties of cluster points of a sequence generated by an algorithm, taking into account the continuity properties of the point-to-set map that defines the algorithm.

P. Huard
CONTENTS

Preface
Contents
Background to point-to-set maps in mathematical programming
Annex 1: The continuities of the point-to-set maps, definitions and equivalences
Annex 2: Relaxation methods
General reference list
(1) Differentiable stability in non convex and non differentiable programming, A. Auslender
(2) A multivalued approach to the Farkas lemma, J. Borwein
(3) Extensions of the continuity of point-to-set maps: applications to fixed point algorithms, J. Denel
(4) Composition and union of general algorithms of optimization, J.Ch. Fiorot and P. Huard
(5) Modified Lagrangians in convex programming and their generalizations, E.G. Gol'shtein and N.V. Tret'yakov
(6) Extensions of Zangwill's theorem, P. Huard
(7) On the lower semicontinuity of optimal sets in convex parametric optimization, D. Klatte
(8) A note on the continuity of the solution set of special dual optimization problems, B. Kummer
(9) Asymptotic properties of sequences iteratively generated by point-to-set maps, G.G.L. Meyer
(10) Generalized equations and their solutions, Part I: Basic theory, S.M. Robinson
(11) The fixed point approach to nonlinear programming, R. Saigal
(12) Convergence analysis for two-level algorithms of mathematical programming, J. Szymanowski and A. Ruszczynski
(13) A comparative study of several general convergence conditions for algorithms modeled by point-to-set maps, S. Tishyadhigama, E. Polak and R. Klessig
Mathematical Programming Study 10 (1979) 1-7. North-Holland Publishing Company

BACKGROUND TO POINT-TO-SET MAPS IN MATHEMATICAL PROGRAMMING

1. Aim of the book
The articles of this monograph deal with the use of point-to-set maps in the theory of mathematical programming or, more generally, of optimization. We should recall that a point-to-set map is, generally speaking, a function defined on a space X whose values (ranges) are subsets of a space Y. This is expressed symbolically as $F : X \to \mathcal{P}(Y)$, or again $F : X \to 2^Y$.

Why a monograph on such a subject? The use is recent, and to our knowledge the first articles of this kind are those by Cesari (1966) and by Rosen (1966). In these papers the authors use continuity properties of point-to-set maps in optimal control problems (existence or convergence theorems). They do so with great modesty, apparently, for they do not stress the originality of their approach. We might also mention, in the book by Berge (1959, and 1966 for the second edition), the statement of a theorem known as the maximum value theorem (Ch. 6), regarding the stability or continuity properties of the optimal value of a parametrized program, as well as of the set of optimal solutions. But the use of point-to-set maps only really got into its stride after the publication of Zangwill's book in 1969. This book, whose theme is the representation and study of iterative methods by means of general schemas, makes most stimulating reading. One important idea set forth in it is the "macroscopic" analysis and synthesis of algorithms, in the study of their convergence, by means of point-to-set maps.

Notwithstanding the obvious advantage of this mathematical tool in the representation and study of optimization algorithms, its use has not spread rapidly among specialists. There are now hardly more than a few dozen users. One of the reasons for this slow spread is probably that point-to-set maps are hardly ever studied in university courses in mathematics.
Hence the absence of a common language, and especially the use of notions that are closely related but nevertheless vary from author to author: the notions of continuity are the main example. Authors are thus practically compelled, at the beginning of each article, to define the notions they are using, which weighs down the presentation of results. The same holds for the basic theorems applied: few readers know, for instance, the theorems on the preservation of continuity properties under the composition of two point-to-set maps. On the other hand, can one imagine a "classical" author reminding readers of a similar theorem on the composition of two single-valued ("univoque") functions, or quoting a reference thereto?

With this monograph we hope to point out the advantage of using point-to-set maps for optimization, and to make better known this mathematical tool, which is of great use and often simplifies theoretical schemas.

Before justifying the above statement, we would remind readers how the notion of a point-to-set map appeared in the literature of mathematics. The references quoted in the historical outline that follows show that these studies deal essentially with the notions of continuity of these functions. Most authors propose and study two types of continuity, originally called upper semi-continuity and lower semi-continuity. These names cover, as we have indicated above, notions that differ slightly from author to author. A comparative study of these notions is given in Annex 1.
2. Historical outline
The notion of point-to-set mapping, and more exactly the notions of continuity connected with it, made their appearance long before mathematical programming, in the study of the limit of a sequence of sets. We can cite a theorem on the upper limit of a family of subsets of the interval [0, 1] depending on a real variable, presented by Painlevé in his course at the École Normale Supérieure de Paris in 1902. This theorem was taken up again and developed by Zoretti in 1905. Then, spread over about 20 years, several articles came out on the limits of sets or on functions of that type with values in $\mathcal{P}(\mathbb{R})$: Janiszewski (1912), Hahn (1921), Moore (1924-1925), Vasilesco (1925), Wilson (1926), Hill (1927).

But it was during the thirties that studies were published on more general point-to-set maps as well as on the notions of continuity related thereto: Bouligand (1931, 1932, 1933), Kuratowski (1932), Hahn (1932), Blanc (1933). Among these more modern studies, published independently of one another (except for Blanc, who refers to Bouligand, Vasilesco and Kuratowski), the work of Hahn stands out. This most complete exposition makes use of the modern topological viewpoint. Hahn drew up definitions of continuity and many results that were often used subsequently but very rarely quoted, despite a second edition that came out in 1948.

The latter date appears to correspond, after a long interruption, to a revival in publications on the continuity properties of point-to-set maps: Brisac (1947), Choquet (1947, 1948), Fort (1949), these studies dealing with topological spaces. Then Michael (1951) studied the construction of topologies on the space of subsets. More recently, we should mention Cesari (1966, 1968, 1970), Lasota and Olech (1968), Ky-Fan (1970), Billera (1971), etc. This list does not claim to be exhaustive, especially for the later years. Increasing numbers of theoretical articles are being published on point-to-set maps, without direct relationship to
optimization: we are not including them in our reference list, except for some books of a general nature. We close, however, with a reminder of some well-known works that deal with and make use of these notions of continuity: Berge (1957), Debreu (1959), Kuratowski (1961) and, already referred to, Berge (1959 and 1966), Zangwill (1969).

3. Parametrization in mathematical programming

The notion of point-to-set mapping appears quite naturally in mathematical programming in the study of the stability of the optimal value of a program, or of its set of optimal solutions, when the problem data depend on a parameter: the domain and the set of optimal solutions appear as point-to-set maps of this parameter. This problem of parametrization may itself be the outcome of solving a program in two steps. The variables being denoted by x and y, we fix y and optimize with respect to x. The problem thus reduced is a problem parametrized by y, and its optimal value is a function of y, which has to be improved by modifying y.

Several dozen articles have been written dealing more or less closely with these questions, and they are marked with a cross in column 1 of the reference list. Among the first published are those by Berge (1966), Rosen (1966), Rockafellar (1967 and 1969), Dantzig, Folkman and Shapiro (1967), Dinkelbach (1969), Zangwill (1969).

4. General algorithms

Another fruitful field of application of point-to-set maps to mathematical programming is the representation of iterative solution methods, or algorithms. Most of these autonomous algorithms can be simply defined by the recurrence relation $x_{i+1} \in F(x_i)$, where F is a suitably chosen point-to-set map, possibly together with a termination rule. The determination of a point of F(x), x being given, then represents the set of calculations to be made at each iteration. The definition of F may be somewhat complex, and generally appears as the composition of several point-to-set maps.
For instance, in order to represent the unconstrained maximization of a function $f : \mathbb{R}^n \to \mathbb{R}$, presumed to be continuously differentiable, by means of the classical algorithm known as the "ascent method", we can first consider the point-to-set maps $D : \mathbb{R}^n \times \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$ and $M : \mathbb{R}^n \times \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$, which assign to a point x and a direction z the respective values

\[ D(x, z) = \{\, y \in \mathbb{R}^n \mid y = x + \theta z,\ \theta \ge 0 \,\}, \]

i.e. the half-line with extremity x and direction z, and

\[ M(x, z) = \{\, y \in D(x, z) \mid f(y) \ge f(t) \ \ \forall t \in D(x, z) \,\}, \]

i.e. the set of optimal solutions on the half-line D(x, z). The choice of the direction z, for a given x, offers some degree of freedom and can be defined by $z \in \Delta(x)$, with $\Delta : \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$ a point-to-set map such that

\[ \Delta(x) = \{\, z \in \mathbb{R}^n \mid z \cdot \nabla f(x) \ge \alpha \, \|\nabla f(x)\| \cdot \|z\| \,\}, \]

where $\alpha$ is a positive constant. Finally, writing $\Delta'(x) = \{x\} \times \Delta(x)$, the point-to-set map F defining the ascent method appears as the composition $F = M \circ \Delta'$, which means that, for any x,

\[ F(x) = \bigcup_{(x, z) \in \Delta'(x)} M(x, z). \]

Other examples can lead to compositions of a higher order.

To return to the general case: if the algorithm properly plays its part, it must generate, from an initial approximate solution $x_0$, a sequence of points converging towards an optimal solution of the problem posed (or any accumulation point of this sequence must be an optimal solution or, more modestly, a stationary point). The justification of this convergence generally consists of two steps. The first amounts to showing that the limit point (respectively any accumulation point) of the sequence is a fixed point of F, i.e. satisfies $x \in F(x)$. The second amounts to showing that this fixed point condition is a sufficient condition of optimality (or a necessary condition of optimality, in which case the algorithm yields only stationary points). In what follows, the word convergence will thus take on a very broad meaning, for the sake of simplification.

In such an approach to the convergence of algorithms, the proofs clearly show the respective parts played by the hypotheses, and especially by the continuity properties of F. These proofs are simple in outline, and easily allow partial changes (possibly prompted by plain common sense) to be made in any given special method, while ensuring that the theoretical framework that guarantees convergence is not overstepped. For example, the introduction of an approximation $\varepsilon$ in the calculations of iteration i does not alter the schema of the proof, where F becomes a function of $(x, \varepsilon)$ instead of x. If $(x, \varepsilon)$ is an accumulation value of the sequence of pairs $(x_i, \varepsilon_i)$, then $x \in F(x, \varepsilon)$ is obtained. A recurrence relation giving $\varepsilon = 0$ is, for instance, $(x_{i+1}, \varepsilon_{i+1}) \in F(x_i, \varepsilon_i) \times [0, \tfrac{1}{2}\varepsilon_i]$. To make things clear, in the example of the ascent method referred to above, the fixed point relation $x \in F(x, \varepsilon)$ means that there exists $z \in \Delta(x)$ such that $f(x) \ge f(t) - \varepsilon$ for all $t \in D(x, z)$. If $\varepsilon = 0$, and account is taken of the definition of $\Delta$, this leads to $\nabla f(x) = 0$.

This new approach to algorithms quite naturally lends itself to a synthesis of optimization methods. After 1960, the nonlinear programming methods proposed in the mathematical literature showed considerable development. Without prejudice to the practical value of these methods, many of them show but slight variations, unjustified from the theoretical standpoint. A natural need for classification has become evident, both for characterizing the methods and for unifying the convergence proofs. This is why general methods have appeared in the literature on the subject, providing a minimum of precise data about the hypotheses necessary for their theoretical justification, and affording a wide
choice for possible particularizations, thus permitting a rediscovery of many conventional methods.

A forerunner in general algorithms is undoubtedly Zoutendijk (1960) who, while proposing well-defined special methods, proved their convergence by means of theorems valid for an entire category of methods. His well-known theory of "feasible directions" did not, however, make use of point-to-set mapping. The same is true of some general algorithms proposed during the decade that followed: Levitin and Polyak (1966), Huard (1967), Fiacco (1967), etc. Then came the publication of Zangwill's book (1969), which presents, with the aid of point-to-set mapping, a general algorithm that goes beyond mathematical programming and makes it possible to determine a point of a privileged subset P in a given set E. A point-to-set map $F : E \to \mathcal{P}(E)$ is used iteratively following the rule:

\[ x_{i+1} \in F(x_i) \ \text{ if } x_i \notin P, \qquad x_{i+1} = x_i \ \text{ otherwise.} \]

On the basis of some general hypotheses, any accumulation point of the infinite sequence thus generated belongs to P. One important application in mathematical programming is to take as P the set of optimal (or stationary) solutions of a program.

Zangwill's book provided an incentive to work on a synthesis covering nonlinear programming methods. His general algorithm was extended by weakening the hypotheses: R.R. Meyer (1968, 1970, 1976(a), 1976(b)), Polak (1970, 1971 and this volume), Huard (1972, and this volume), Dubois (1973), G.G.L. Meyer (1977). At the same time there was a growing concern for the presentation of conventional methods in the most general form. Among the most "committed" articles, we should mention R.R. Meyer (1970, 1976(b), 1977(a)), G.G.L. Meyer and Polak (1971), Huard (1972, 1975), G.G.L. Meyer (1974, 1975) and Auslender and Dang Tran Dac (1976). We should not overlook the advanced article by an anonymous author (see Anonymous (1972)) which, if it amused all its readers, discouraged no-one.
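As an illustration only, the ascent-method map $F = M \circ \Delta'$ and Zangwill's rule described in this section can be sketched in a few lines of Python. Everything concrete here is an assumption made for the sketch and is not taken from the text: the objective `f`, the constant `ALPHA`, the choice of the gradient as the element selected in $\Delta(x)$, the search interval `theta_max`, and the use of a ternary search to approximate a point of $M(x, z)$. The set P is taken to be the set of approximately stationary points.

```python
import math

ALPHA = 0.5  # hypothetical value of the positive constant alpha


def f(x):
    # Illustrative concave objective (an assumption); its maximum is at (1, -0.5).
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2


def grad_f(x):
    return [-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 0.5)]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def in_delta(x, z, alpha=ALPHA):
    """Membership test for Delta(x): z . grad f(x) >= alpha * |grad f(x)| * |z|."""
    g = grad_f(x)
    dot = sum(zi * gi for zi, gi in zip(z, g))
    return dot >= alpha * norm(g) * norm(z)


def select_direction(x):
    """Pick one element of Delta(x); the gradient qualifies whenever alpha <= 1."""
    return grad_f(x)


def line_maximizer(x, z, theta_max=1.0, iters=80):
    """Approximate one point of M(x, z) by ternary search on theta in [0, theta_max].

    This is valid for the f above, which is concave along the half-line and
    whose maximizer along the gradient direction lies in [0, 1].
    """
    def phi(theta):
        return f([xi + theta * zi for xi, zi in zip(x, z)])

    lo, hi = 0.0, theta_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return [xi + theta * zi for xi, zi in zip(x, z)]


def in_P(x, tol=1e-6):
    """P is taken here as the set of approximately stationary points."""
    return norm(grad_f(x)) <= tol


def iterate(x0, max_iter=200):
    """Apply the rule: x_{i+1} in F(x_i) if x_i is not in P, else x_{i+1} = x_i."""
    x = list(x0)
    for _ in range(max_iter):
        if in_P(x):
            break  # from here on the rule keeps x unchanged
        z = select_direction(x)      # one element of Delta(x)
        x = line_maximizer(x, z)     # one (approximate) element of M(x, z)
    return x
```

Calling `iterate([0.0, 0.0])` returns a point close to (1, -0.5) for this particular `f`. Any other selection rule that stays inside $\Delta(x)$ could replace `select_direction` without changing the overall schema, which is the freedom the text attributes to the point-to-set formulation.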
5. Macroscopic study of the algorithms

The regrouping of a family of special methods in a single general algorithm reveals a common, simpler schema, which makes it easier to distinguish and separate the various basic operations forming an iteration. Hence the natural idea of considering the structure of an algorithm "macroscopically", in order to be able to alter some parts without touching the others. It is thus possible to consider setting up a new method "by blocks", taking these blocks from various existing methods. Alternatively, one can use complete iterations of different algorithms in succession. Here again point-to-set mapping lends itself to such constructions.

Let us consider, for example, several functions $F_j : X \to \mathcal{P}(X)$, $j = 1, 2, \ldots, p$, such that for any j, the relation $x_{i+1} \in F_j(x_i)$ represents the current iteration of an algorithm (j). If we use these relations each in turn, in the natural order $j = 1, 2, \ldots, p$, in cyclical form, then by regrouping p successive iterations we implicitly build up an algorithm defined by the function $F = F_p \circ F_{p-1} \circ \cdots \circ F_1$. Other types of composition can be envisaged. The problem is to establish under what conditions any accumulation point obtained by the composed algorithm is a fixed point of each of the functions $F_j$. This latter property often represents an optimality condition for some decomposable problems.

The most classical example of application of this procedure is relaxation. For instance, if a point (x, y) of $\mathbb{R}^2$ maximizes a differentiable concave function in both coordinate directions, it thus maximizes this function in the whole space. And this point can be obtained by an infinite sequence of sub-optimizations, carried out alternately with respect to x, with y fixed, and with respect to y, with x fixed. These relaxation procedures are old, and were developed independently of the use of point-to-set mapping. A short reference list on this work, which made use solely of sub-optimizations on lines of $\mathbb{R}^n$ or possibly on affine varieties, is given in an annex. It is only recently that the use of point-to-set mapping has extended relaxation to subsets of any kind, at the same time simplifying proofs by means of the idea of composing algorithms. This notion is the basic idea of Zangwill's book (1969). It has been taken over and developed by different authors, e.g. R.R. Meyer (1976(b)), G.G.L. Meyer (1975(a) and 1975(b)), Fiorot and Huard (1974 and this volume).
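The relaxation example above can be sketched in Python as a cyclic composition $F = F_2 \circ F_1$. The concave function `f`, the starting point and the number of cycles are assumptions chosen for illustration; each partial maximization happens to have a unique solution here, so every map returns a single point (a singleton value) rather than a general set.

```python
def f(p):
    """Illustrative differentiable concave function on R^2 (an assumption)."""
    x, y = p
    return -(x * x + x * y + y * y) + 3.0 * x


def F1(p):
    """Maximize f with respect to x, y fixed: df/dx = -2x - y + 3 = 0."""
    x, y = p
    return ((3.0 - y) / 2.0, y)


def F2(p):
    """Maximize f with respect to y, x fixed: df/dy = -x - 2y = 0."""
    x, y = p
    return (x, -x / 2.0)


def compose(*maps):
    """Return F = F_p o ... o F_1, where maps[0] is applied first."""
    def F(p):
        for m in maps:
            p = m(p)
        return p
    return F


def run(F, p0, cycles=60):
    """Iterate p_{i+1} = F(p_i) for a fixed number of cycles."""
    p = p0
    for _ in range(cycles):
        p = F(p)
    return p


F = compose(F1, F2)
limit = run(F, (0.0, 0.0))
```

For this `f`, the iterates approach (2, -1), which is left unchanged by both `F1` and `F2`: a fixed point of each component map, and hence a maximizer of the concave function over the whole plane.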
6. Extending the classical notions of continuity

The study of general algorithms is as yet only in its initial stages, and the results obtained do not always permit a satisfactory description of certain special methods. Obtaining a fixed point by means of an iterative algorithm rests largely on certain continuity properties of the associated point-to-set map. In building up a complex algorithm by means of operations on the ranges of various point-to-set maps, such as intersection, union, optimization, etc., the continuity properties of these functions are not always preserved. It is clear that the classical notions of continuity, originally introduced independently of optimization methods, are not always well adapted to recent studies. In particular, Zangwill's general algorithm, in its classical form, is based on hypotheses that are too strong to permit a great number of applications.

Apart from the articles already referred to, which generalize this algorithm, there is a new notion of continuity introduced by Denel (1977 and this volume). This continuity does not tie in with one point-to-set map but with a family of them, depending on a scalar parameter p, the values of which are decreasing functions of p, in the sense of inclusion. What is involved is an extension of the classical continuities that is better adapted to the construction of general algorithms and that generalizes fixed point theorems. Lastly, this notion of a "p-decreasing family" makes it possible to measure the discontinuity of point-to-set maps.
7. Various uses of point-to-set mapping
Apart from this important movement of research, various applications should be mentioned.

The classical extension of optimality conditions and of duality to problems having cone-constraints naturally makes use of point-to-set mapping: e.g., see Borwein (1977).

Zang, Choo and Avriel (1975, 1976, 1977) relate the notion of global optimality to the lower semi-continuity of a point-to-set map, defined by the truncation of the domain by an equi-level surface of the objective function.

The weakening of the notion of gradient, e.g. the sub-differential, replaces a single-valued function by a point-to-set map. For example, see Minty (1964), Moreau (1967), Valadier (1968), Dem'yanov (1968), Rockafellar (1970), Bertsekas and Mitter (1973), Clarke (1976).

The many procedures for solving convex programs by successive linearizations, proposed in the mathematical literature, have often been taken up again and analysed in the light of point-to-set mapping. We cite: R.R. Meyer (1968), Robinson and R.R. Meyer (1973), Fiorot and Huard (1974(a)), Fiorot (1977), Denel (1977(b)).

As already pointed out, taking approximate calculations into account during each iteration adds very little complication to the theoretical schema of a method when it is represented by a point-to-set map. Likewise, it is often possible to insert at each iteration a discretization of the domain where the sub-optimization is being carried out, in order to obtain finitely many calculations. If this discretization can be defined as a point-to-set mapping of the current solution, offering the requisite continuity properties, then the proof of convergence remains unchanged. Two examples of such a method of implementation are given in Huard (1977 and 1978), one for the maximization of an ascent direction, the other for adapting Rosen's gradient projection method to the case of nonlinear constraints.
Finally, we would point out that integer programming is also affected by point-to-set mapping although, a priori, it appears to be excluded from this field because of its combinatorial nature. The proof is given by R.R. Meyer (1975, 1977(c)).

In this rapid review we have neglected mathematical studies unrelated, a priori, to optimization, for instance articles concerning the differentiability of point-to-set maps. On this subject we might mention, however, Methlouthi (1977), who gives an application to the stability of inequalities.

Acknowledgments

This review has been written with the help of several authors, who sent us references and documents. The oldest references were discovered by Dr. J. Denel. We wish to thank them for their helpful information.

P. HUARD
Électricité de France, Clamart, France
Mathematical Programming Study 10 (1979) 8-12. North-Holland Publishing Company
ANNEX 1 T H E CONTINUITIES OF THE POINT-TO-SET MAPS, DEFINITIONS AND E Q U I V A L E N C E S O. Introduction
In the theory of set-valued mapping, two kinds of continuity have been developed. For each of them, very closely related definitions have been given using on one hand (Hill (1927), Kuratowsky and Hahn (1932), Bouligand and Blanc (1933)) ordering inclusion properties in terms of limit of sequences of sets and on the other hand (Hahn (1932), Choquet (1948), Berge (1959)) topological properties of the "inverse image". The connexions between these definitions are given in the following. All throughout the paper, we consider a map F from X into ~(Y), the set of subsets of Y, where X, Y are Hausdorff spaces. Particular assumptions (for example, first countability) will be specified when necessary; it is to be noticed that none of the assumptions that are given, can be deleted. The properties are presented without proofs, the majority of the results being stated in the literature (for complete proofs and counter examples see [2]). O. 1. Notation ~ the family of the neighbourhoods of x ~ X, N' C N will always denote an infinite subset of N, {x.}• a sequence of points in X ({x.}N, an extracted subsequence) [A] ~ " [B] means: A ::> B, and B ::> A if assumption H is verified. (for the meaning of such a specified assumption, see the footnotes to Diagrams 1 and 2).
1. Limits of sets (Hahn and Kuratowski) Let {A.}N be a given sequence of subsets of a topological space Y. 1. I. L o w e r limit o.f {A.}N limN A. denotes the lower limit of the sequence {A.}N, i.e. the subset of Y (possibly empty) that consists of points x satisfying (VV E T'(x))(3no)n > n o ~ A,, N V ~ 8
O,
Continuities of point-to-set maps
9
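1.2. Upper limit of $\{A_n\}_N$
$\overline{\lim}_N A_n$ denotes the upper limit of the sequence $\{A_n\}_N$, i.e. the subset of $Y$ (possibly empty) that consists of all points $x$ satisfying
$(\forall V \in \mathcal{V}(x))(\forall n)(\exists n' \geq n)\ A_{n'} \cap V \neq \emptyset.$

1.3. Properties
(a) $\underline{\lim}_N A_n \subset \overline{\lim}_N A_n$, and these subsets are closed in $Y$.
(b) $x \in \underline{\lim}_N A_n \overset{H_1}{\Longleftrightarrow} (\exists \{x_n\}_N \to x)(\exists n_0)(\forall n \geq n_0)\ x_n \in A_n.$
(c) $x \in \overline{\lim}_N A_n \overset{H_1}{\Longleftrightarrow} (\exists N' \subset N)(\exists \{x_n\}_{N'} \to x)(\forall n \in N')\ x_n \in A_n.$
(d) If $\mathcal{O}$ is open in $Y$, then
$(\exists N' \subset N)(\forall n \in N')\ A_n \cap \mathcal{O} = \emptyset \Rightarrow (\underline{\lim}_N A_n) \cap \mathcal{O} = \emptyset.$
(e) If $G$ is compact (or $G$ sequentially compact), then
$(\overline{\lim}_N A_n) \cap G = \emptyset \Rightarrow (\exists n_0)(\forall n \geq n_0)\ A_n \cap G = \emptyset.$
(f) If $Y$ is a metric space with compact closed balls and $G$ is closed, then
$\left.\begin{array}{l} A_n \cap G \neq \emptyset \ (\forall n) \\ A_n \text{ connected subset of } Y \ (\forall n) \\ \overline{\lim}_N A_n \neq \emptyset \text{ and compact} \end{array}\right\} \Rightarrow (\overline{\lim}_N A_n) \cap G \neq \emptyset.$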
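To make the two limits concrete, here is a small worked example in $Y = \mathbb{R}$ with its usual topology, writing $\underline{\lim}_N$ for the lower limit and $\overline{\lim}_N$ for the upper limit. The sequences $A_n$ and $B_n$ below are illustrative choices made here; they are not taken from the references.

```latex
% Example 1 (assumed for illustration): alternating singletons.
A_n = \{0\} \ \ (n \text{ even}), \qquad A_n = \{1\} \ \ (n \text{ odd}).
% The neighbourhood (-1/2, 1/2) of 0 misses A_n for every odd n, and the
% neighbourhood (1/2, 3/2) of 1 misses A_n for every even n; any other
% point has a neighbourhood missing both 0 and 1. Hence
\underline{\lim}_N A_n = \emptyset .
% Every neighbourhood of 0 (resp. of 1) meets A_n for infinitely many n, so
\overline{\lim}_N A_n = \{0, 1\} .

% Example 2 (assumed for illustration): shrinking intervals.
B_n = [0, 1/n], \qquad n \geq 1 .
% 0 belongs to every B_n. For x > 0, the neighbourhood (x/2, 3x/2) misses
% B_n as soon as 1/n < x/2. For x < 0, the neighbourhood (2x, x/2) misses
% every B_n. Hence
\underline{\lim}_N B_n = \overline{\lim}_N B_n = \{0\} .
```

In Example 1 the inclusion of the lower limit in the upper limit is strict, while in Example 2 the two limits coincide.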
2. First kind of continuity: the lower continuity
2.1. We present four definitions that have been introduced in the literature.

Definition 2.1 (Hill, Kuratowski, Hahn, Blanc). The map $F$ is said to be lower semi continuous by inclusion (L.S.C.) at $x \in X$ if and only if
$(\forall \{x_n\}_N \to x)\ F(x) \subset \underline{\lim}_N F(x_n).$

Definition 2.2 (Hahn, Choquet, Berge). The map $F$ is said to be lower semi continuous (l.s.c.) at $x \in X$ if and only if
$(\forall \mathcal{O} \subset Y, \text{ open})\ F(x) \cap \mathcal{O} \neq \emptyset \Rightarrow (\exists V \in \mathcal{V}(x))(\forall x' \in V)\ F(x') \cap \mathcal{O} \neq \emptyset.$

Definition 2.3 (Debreu, Hogan, Huard). The map $F$ is said to be open (or lower continuous) at $x \in X$ if and only if
$(\forall \{x_n \in X\}_N \to x)(\forall y \in F(x))(\exists \{y_n \in Y\}_N \to y)(\exists n_0)(\forall n \geq n_0)\ y_n \in F(x_n).$
Definition 2.4 (Brisac). The map $F$ is said to be lower semi continuous (l.s.c.) at $x \in X$ if and only if
$\overline{F(x)} = \{ y \in Y \mid (\forall V \in \mathcal{V}(y))\ \{x' \in X \mid F(x') \cap V \neq \emptyset\} \in \mathcal{V}(x) \},$
where $\overline{F(x)}$ denotes the closure of $F(x)$.
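The following worked example may help fix ideas. The maps $F$ and $G$ are hypothetical, chosen here for illustration only; they do not appear in the references. Take $X = Y = \mathbb{R}$ with the usual topology and test Definitions 2.1 and 2.2 at $x = 0$.

```latex
% Hypothetical maps, introduced only for illustration.
F(x) = \begin{cases} \{0\} & \text{if } x = 0, \\ [0,1] & \text{if } x \neq 0, \end{cases}
\qquad
G(x) = \begin{cases} [0,1] & \text{if } x = 0, \\ \{0\} & \text{if } x \neq 0. \end{cases}

% F satisfies Definition 2.2 at 0: an open set O with F(0) \cap O \neq \emptyset
% contains 0, and 0 \in F(x') for every x', so F(x') \cap O \neq \emptyset
% on the whole of X.
% F satisfies Definition 2.1 at 0: for any sequence x_n \to 0 we have
% 0 \in F(x_n) for all n, hence
F(0) = \{0\} \subset \underline{\lim}_N F(x_n) .

% G fails Definition 2.2 at 0: O = (1/2, 2) meets G(0) = [0,1], but
% G(x') \cap O = \emptyset for every x' \neq 0.
% G fails Definition 2.1 at 0: for x_n = 1/n,
\underline{\lim}_N G(x_n) = \{0\} \not\supset [0,1] = G(0) .
```

Roughly speaking, lower continuity at $x$ forbids $F(x)$ from being suddenly larger than the nearby values $F(x')$, which is exactly what happens for $G$ at $0$.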
2.2. Equivalence properties
Diagram 1 shows the connections between these definitions; the reader is referred to its footnote for the meaning of hypothesis $H_i$ ($i = 1, 2$). These equivalences are given for the definitions at $x \in X$.

Diagram 1. $H_1$: $Y$ satisfies the "first axiom of countability"; $H_2$: "there exists at $x \in X$ a countable base of neighbourhoods".
Remark 1. The notion of lower continuity is extended to the whole space $X$ by assuming the l.s.c. at every $x \in X$. A characterization of the lower semi continuity (l.s.c.) on $X$ is given by the following relation (Berge, Choquet): $F$ is l.s.c. on $X$ if and only if
$\{x \in X \mid F(x) \cap \mathcal{O} \neq \emptyset\}$ is open in $X$ for every open $\mathcal{O}$ in $Y$.

Remark 2. It is worth noticing that all the previous definitions are equivalent if $X$ and $Y$ are first countable Hausdorff spaces. This is the case for metric spaces.

3. Second kind of continuity: the upper continuity
3.1.
Definition 3.1 (Hill, Kuratowski, Hahn, Bouligand, Choquet). The map $F$ is said to be upper semi continuous by inclusion (U.S.C.) at $x \in X$ if and only if
$(\forall \{x_n \in X\}_N \to x)\ \overline{\lim}_N F(x_n) \subset F(x).$
Definition 3.2 (Hahn, Choquet, Berge). The map $F$ is said to be upper semi continuous (u.s.c.) at $x \in X$ if and only if
$(\forall \mathcal{O} \subset Y, \text{ open})\ F(x) \subset \mathcal{O} \Rightarrow (\exists V \in \mathcal{V}(x))\ x' \in V \Rightarrow F(x') \subset \mathcal{O}.$

Definition 3.3 (Debreu, Hogan, Huard). The map $F$ is said to be closed (or upper continuous) at $x \in X$ if and only if
$(\forall \{x_n \in X\}_N \to x)(\forall \{y_n \in Y\}_N \to y \text{ such that } y_n \in F(x_n)\ (\forall n)) \Rightarrow y \in F(x).$

Definition 3.4 (Choquet). The map $F$ is said to be weakly upper semi continuous (w.u.s.c.) at $x \in X$ if and only if
$(\forall y \notin F(x))(\exists U \in \mathcal{V}(x))(\exists V \in \mathcal{V}(y))\ x' \in U \Rightarrow F(x') \cap V = \emptyset.$

Remark 3. Choquet calls "strong upper semi continuity" the property of Definition 3.2.

Remark 4. Both Definitions 3.1 and 3.4 imply "$F(x)$ closed", but Definition 3.3 only implies "$F(x)$ sequentially closed".
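As an illustration of how the closedness of Definition 3.3 can hold while the upper semi continuity of Definition 3.2 fails, consider the following sketch. The map $H$ is a hypothetical example chosen here, with $X = Y = \mathbb{R}$ (so that $Y$ is not compact); it is not taken from the references.

```latex
% Hypothetical map, introduced only for illustration.
H(x) = \begin{cases} \{0\} & \text{if } x = 0, \\ \{1/x\} & \text{if } x \neq 0. \end{cases}

% Definition 3.3 holds at x = 0. Let x_n \to 0 and y_n \in H(x_n) with
% y_n \to y. If x_n \neq 0 for infinitely many n, then along those n
|y_n| = 1/|x_n| \to \infty ,
% which contradicts the convergence of y_n. Hence x_n = 0 for all large n,
% so y_n = 0 for all large n and y = 0 \in H(0).

% Definition 3.2 fails at x = 0. Take the open set O = (-1, 1) \supset H(0).
% Every neighbourhood of 0 contains some x' with 0 < |x'| < 1, and then
H(x') = \{1/x'\} \not\subset O .
```

The values $H(x')$ escape to infinity as $x' \to 0$, which Definition 3.3 cannot detect because it only constrains convergent sequences $\{y_n\}$.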
3.2. Equivalence properties
Diagram 2 shows the connections between these definitions (the meaning of assumptions $H_i$ ($i = 1, \ldots, 4$) is given by its footnote). Besides these equivalences, the following proposition gives a relation between U.S.C. and u.s.c. in a particularly interesting context.

Diagram 2. The following equivalences are given for the definitions at $x \in X$. $H_1$: $Y$ satisfies the "first axiom of countability"; $H_2$: "there exists at $x \in X$ a countable base of neighbourhoods"; $H_3$: $Y$ is a regular space and $F(x)$ is closed; $H_4$: $Y - F(x)$ is compact (in particular, if $Y$ is compact, $H_4$ is fulfilled).
Proposition 3. If $Y$ is a metric space with compact closed balls, then
$\left.\begin{array}{l} F(x) \neq \emptyset \text{ and compact} \\ (\exists C \text{ compact})(\exists V \in \mathcal{V}(x))\ x' \in V \Rightarrow \left\{\begin{array}{l} F(x') \text{ connected} \\ F(x') \cap C \neq \emptyset \end{array}\right. \end{array}\right\} \Rightarrow [\text{U.S.C. at } x \Rightarrow \text{u.s.c. at } x].$

Remark 5. It is to be noticed that Definitions 3.1, 3.3 and 3.4 are equivalent if the spaces $X$ and $Y$ are first countable Hausdorff spaces (in particular, if $X$, $Y$ are metric spaces).

Remark 6. It is worth noticing that Berge defines the u.s.c. of a map over the whole space $X$ by the two following conditions: $F$ is u.s.c. at every $x \in X$, and $F$ is compact-valued.
Bibliography
[1] The reader is referred to the general reference list of this Study (more precisely, papers with a mark in column 2).
[2] J.P. Delahaye and J. Denel, "Equivalences des continuités des applications multivoques dans des espaces topologiques", Publication no. 111, Laboratoire de Calcul, Université de Lille (1978).

J.P. DELAHAYE and J. DENEL
Université de Lille 1, France
Mathematical Programming Study 10 (1979) 13. North-Holland Publishing Company
ANNEX 2
RELAXATION METHODS

A. Auslender, "Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables", Numerische Mathematik 18 (1971) 213-223.
J. Cea and R. Glowinski, "Sur les méthodes d'optimisation par relaxation", Rairo No. R-3 (1973) 5-32.
D. Chazan and W. Miranker, "Chaotic relaxation", Linear Algebra and its Applications (1969) 199-222.
B. Martinet and A. Auslender, "Méthodes de décomposition pour la minimisation d'une fonction sur un espace produit", SIAM Journal on Control 12 (1974) 635-642.
J.C. Miellou, "Algorithmes de relaxation chaotique à retard", Rairo No. R-1 (1975) 55-82.
J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
F. Robert, M. Charnay and F. Musy, "Itérations chaotiques série parallèle pour des équations non-linéaires de point fixe", Aplikace Matematiky 20 (1975) 1-37.
S. Schechter, "Relaxation methods for linear equations", Communications on Pure and Applied Mathematics 12 (1959) 313-335.
S. Schechter, "Iterative methods for nonlinear problems", Transactions of the American Mathematical Society 104 (1962) 179-189.
S. Schechter, "Relaxation methods for convex problems", SIAM Journal on Numerical Analysis 5 (1968) 601-612.
S. Schechter, "Minimization of a convex function by relaxation", in: J.M. Abadie, ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) pp. 117-189.
Mathematical Programming Study 10 (1979) 14-28. North-Holland Publishing Company

GENERAL REFERENCE LIST

Each reference's main subjects are indicated by numbers:

Number  Main subject
1       Parametrization; stability
2       Theory of point-to-set maps, possibly not related to optimization: continuity, differentiability, integrability, existence of fixed points, etc.
3       General algorithms; synthesis of optimization methods; computation of fixed points
4       Use of point-to-set maps in particular problems or methods of optimization (except stability problems)
5       Book, survey
Anonymous, "A new algorithm for optimization", Mathematical Programming 3 (1972) 124-128.
(3)
J.P. Aubin, "Propriété de Perron-Frobenius pour des correspondances positives semi-continues supérieurement", Centre de recherches mathématiques, rapport No. 719, University of Montreal (1977).
(4)
J.P. Aubin, Mathematical methods of game and economic theory (North-Holland, Amsterdam, 1979).
(4, 5)
J.P. Aubin, Applied functional analysis (Wiley-Interscience, New York, 1978).
(5)
J.P. Aubin and F.H. Clarke, "Multiplicateurs de Lagrange en optimisation non convexe et applications", Comptes Rendus de l'Académie des Sciences 285-A (Paris, 1977) 451-454.
(4)
J.P. Aubin and F.H. Clarke, "Removal of linear constraints in minimizing nonconvex functions", Modelling research group report No. 7708, University Park, Los Angeles (1978).
(1, 4)
J.P. Aubin and J. Siegel, "Fixed points and stationary points of dissipative multivalued maps", Modelling research group report No. 7712, University Park, Los Angeles (1978).
(2)
R.J. Aumann, "Integrals of set valued functions", Journal of Mathematical Analysis and Applications 12 (1965) 1-12.
(2)
A. Auslender, "Résolution numérique d'inégalités variationnelles", Comptes Rendus de l'Académie des Sciences 276-A (Paris, 1973) 1063-1066.
(2, 4)
A. Auslender and Dang Tran Dac, "Méthodes de descente et analyse convexe", Cahiers du Centre d'Etudes et de Recherche Opérationnelle 18 (1976) 269-307.
(3, 4)
A. Auslender, Optimisation (méthodes numériques) (Masson, Paris, 1976).
(3, 5)
A. Auslender, "Minimisation de fonctions localement lipschitziennes: Applications à la programmation mi-convexe, mi-différentiable", in: Mangasarian, Meyer and Robinson, eds., Nonlinear Programming 3 (Academic Press, New York, 1978).
(4)
A. Auslender, "Differentiable stability in nonconvex and nondifferentiable programming", Mathematical Programming Study 10 (1979) (this volume).
(1,4)
H.T. Banks and M.Q. Jacobs, "A differential calculus for multifunctions", Journal of Mathematical Analysis and Applications 29 (1970) 246-272.
(2)
M.S. Bazaraa, "A theorem of the alternative with application to convex programming: optimality, duality and stability", Journal of Mathematical Analysis and Applications 41 (1973) 701-715.
(1)
B. Bereanu, "The continuity of the optimum in parametric programming and applications to stochastic programming", Journal of Optimization Theory and Applications 18 (1976) 319-334.
(1, 2)
C. Berge, "Théorie générale des jeux à n personnes", Mémorial des sciences mathématiques 138 (Gauthier-Villars, Paris, 1957).
(2, 5)
C. Berge, Espaces topologiques-Fonctions multivoques (Dunod, Paris, 1966).
(2, 5)
D.P. Bertsekas and S.K. Mitter, "A descent numerical method for optimization problems with nondifferentiable cost functionals", SIAM Journal on Control 11 (1973) 637-652.
(4)
L.J. Billera, "Topologies for $2^X$; set-valued functions and their graphs", Transactions of the American Mathematical Society 155 (1971) 137-147.
(2)
E. Blanc, "Sur une propriété différentielle des continus de Jordan", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 600-602.
(2)
E. Blanc, "Sur la structure de certaines lois générales régissant des correspondances multiformes", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 1769-1771.
(2)
F.H. Bohnenblust and S. Karlin, "On a theorem of Ville", in: H.W. Kuhn and A.W. Tucker eds., Contribution to the theory of games (Princeton University Press, Princeton, NJ, 1950). Vol. 1, pp. 155-160.
(4)
J. Borwein, "Multivalued convexity and optimization: a unified approach to inequality and equality constraints", Mathematical Programming 13 (1977) 183-199.
(2, 4)
G. Bouligand, "Sur la semi-continuité d'inclusion et quelques sujets connexes", Enseignement mathématique 31 (1932) 14-22, and 30 (1931) 240.
(2)
G. Bouligand, "Propriétés générales des correspondances multiformes", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 1767-1769.
(2)
R. Brisac, "Sur les fonctions multiformes", Comptes Rendus de l'Académie des Sciences 224 (Paris, 1947) 92-94.
(2)
F. Browder, "The fixed point theory of multivalued mappings in topological vector spaces", Mathematische Annalen 177 (1968) 283-301.
(2)
C. Castaing and M. Valadier, Convex analysis and measurable multifunctions (Springer-Verlag, Berlin, 1977).
(2, 5)
A. Cellina, "A further result on the approximation of set valued mappings", Rendiconti Accademia Nazionale dei Lincei 48 (1970) 230-234.
(2)
A. Cellina, "The role of approximation in the theory of set valued mappings", in: H.W. Kuhn and G.P. Szegö eds., Differential games and related topics (North-Holland, Amsterdam, 1971).
(2)
L. Cesari, "Existence theorems for weak and usual optimal solutions in Lagrange problems with unilateral constraints, I and II", Transactions of the American Mathematical Society 124 (1966) 369-430.
(2, 4)
L. Cesari, "Existence theorems for optimal solutions in Pontryagin and Lagrange problems", SIAM Journal on Control 3 (1966) 475-498.
(2, 4)
L. Cesari, "Existence theorems for optimal controls of the Mayer type", SIAM Journal on Control 6 (1968) 517-552.
(2, 4)
L. Cesari, "Seminormality and upper semicontinuity in optimal control", Journal of Optimization Theory and Applications 6 (1970) 114-137.
(2, 4)
G. Choquet, "Convergences", Annales de l'Université de Grenoble 23 (1947-1948) 55-112.
(2)
F.H. Clarke, "A new approach to Lagrange multipliers", Mathematics of Operations Research 1 (1976) 165-174.
(4)
F. Cordellier and J.C. Fiorot, "Trois algorithmes pour résoudre le problème de Fermat-Weber généralisé", Bulletin de la Direction des Etudes et Recherches (E.D.F.) Série C, supplément au no. 2 (1976) 35-54.
(4)
F. Cordellier and J.C. Fiorot, "Sur le problème de Fermat-Weber avec fonctions de coûts convexes", Laboratoire de Calcul, publication no. 74, University of Lille (1976).
(4)
F. Cordellier and J.C. Fiorot, "On the Fermat-Weber problem with convex cost functions", Mathematical Programming 14 (1978) 295-311.
(4)
D.E. Cowles, "Upper semi-continuity properties of variables sets in optimal control", Journal of Optimization Theory and Applications 10 (1972) 222-236.
(2, 4)
J.P. Crouzeix, "Continuité des applications linéaires multivoques", Revue Française d'Automatique, d'Informatique et de R.O. No. R1 (1973) 62-67.
(2)
Dang Tran Dac, "Décomposition en programmation convexe", Revue Française d'Automatique, d'Informatique et de R.O. No. R1 (1973) 68-75.
(3, 4)
J.W. Daniel, "Stability of the solution of definite quadratic programs", Mathematical Programming 5 (1973) 41-53.
(1)
G.B. Dantzig, J. Folkman and N. Shapiro, "On the continuity of the minimum of a continuous function", Journal of Mathematical Analysis and Applications 17 (1967) 519-548.
(1, 2)
R. Datko, "Measurability properties of set-valued mappings in a Banach space", SIAM Journal on Control 8 (1970) 226-238.
(2)
G. Debreu, Theory of value, Cowles Foundation monograph No. 17, (Wiley, New York, 1959).
(5)
J.P. Delahaye and J. Denel, "Equivalences des continuités des applications multivoques dans des espaces topologiques", Laboratoire de Calcul, publication no. 111, University of Lille (1978).
(2)
V.F. Dem'yanov, "Algorithms for some minimax problems", Journal on Computer and System Sciences 2 (1968) 342-380.
(4)
J. Denel, "Propriétés de continuité des familles p-décroissantes d'applications multivoques", Laboratoire de Calcul, publication no. 87, University of Lille (1977).
(2)
J. Denel, "On the continuity of point-to-set maps with applications to Optimization", Proceedings of the 2nd symposium on operations research, Aachen, 1977 (to appear).
(2, 3, 4)
W. Dinkelbach, Sensitivitäts Analysen und Parametrische programmierung (Springer-Verlag, Berlin, 1969).
(1, 5)
S. Dolecki, "Extremal measurable selections", Bulletin de l'Académie Polonaise des Sciences 25 (1977) 355-360.
(2)
S. Dolecki, "Semicontinuity in constrained optimization", Control and Cybernetics (to appear).
(1,2)
S. Dolecki and S. Rolewicz, "Metric characterizations of the upper semicontinuity", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(2)
S. Dolecki and S. Rolewicz, "Exact penalty for local minima", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(4)
S. Dolecki and R. Rolewicz, "A characterization of semicontinuity preserving multifunctions", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(2)
J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1973) 328-332.
(3)
B.C. Eaves, "Homotopies for computation of fixed points", Mathematical Programming 3 (1972) 1-22.
(3)
B.C. Eaves and R. Saigal, "Homotopies for computation of fixed points on unbounded regions", Mathematical Programming 3 (1972) 225-237.
(3)
B.C. Eaves and W.I. Zangwill, "Generalized cutting plane algorithms", SIAM Journal on Control 9 (1971) 529-542.
(4)
I. Ekeland and M. Valadier, "Representation of set-valued mappings", Journal of Mathematical Analysis and Applications 35 (1971) 621-629.
(2)
J.P. Evans and F.J. Gould, "Stability in nonlinear programming", Operations Research 18 (1970) 107-118.
(1)
J.P. Evans and F.J. Gould, "A nonlinear duality theorem without convexity", Econometrica 40 (1972) 487-496.
(4)
A.V. Fiacco, "Sequential unconstrained minimization methods for nonlinear programming", Thesis, Northwestern University, Evanston, Illinois (1967).
(3)
J.C. Fiorot, "Algorithmes de programmation convexe par linéarisation en format constant", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 11 (1977) 245-253.
(4)
J.C. Fiorot and P. Huard, "Une approche théorique du problème de linéarisation en programmation mathématique convexe", Laboratoire de Calcul, publication no. 42, University of Lille (1974).
(4)
J.C. Fiorot and P. Huard, "Composition et réunion d'algorithmes généraux", Laboratoire de Calcul, publication no. 43, University of Lille (1974).
(3)
M.K. Fort Jr., "A unified theory of semicontinuity", Duke Mathematical Journal 16 (1949) 237-246.
(2)
J. Gauvin and J.W. Tolle, "Differential stability", SIAM Journal on Control and Optimization 15 (1977) 294-311.
(1)
R.J. Gazik, "Convergence in spaces of subsets", Pacific Journal of Mathematics 43 (1972) 81-92.
(2)
A.M. Geoffrion, "Duality in nonlinear programming: a simplified applications oriented development", SIAM Review 13 (1974) 1-37.
(1)
J.H. George, V.M. Seghal and R.E. Smithson, "Application of Liapunov's direct method to fixed point theorem", Proceedings of the American Mathematical Society 18 (1971) 613-620.
(2)
I.L. Glicksberg, "A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points", Proceedings of the American Mathematical Society 3 (1952) 170-174.
(2, 3)
H.J. Greenberg and W.P. Pierskalla, "Extensions of the Evans and Gould stability theorems for mathematical programs", Operations Research 20 (1972) 143-153.
(1)
H.J. Greenberg and W.P. Pierskalla, "Stability theorems for infinitely constrained mathematical programs", Journal of Optimization Theory and Applications 16 (1975) 409-428.
(1)
J. Guddat, "Stability in convex quadratic parametric programming", Mathematische Operationsforschung und Statistik 7 (1976) 223-245.
(1)
J. Guddat and D. Klatte, "Stability in nonlinear parametric optimization", Proceeding of the IX Symposium on Mathematical Programming, Budapest (1976).
(1)
H. Hahn, "Über irreduzible Kontinua", Sitzungsberichte der Akademie der Wissenschaften Wien 130 (Vienna, 1921).
(2)
H. Hahn, Reelle Funktionen, 1. Tome: Punktfunktionen, copyright 1932 by Akademische Verlags Gesellschaft MBH Leipzig (Chelsea Publishing Co., New York, 1948).
(2, 5)
F. Hausdorff, Set theory (Chelsea Publishing Co., New York, 1962).
(5)
H. Hermes, "Calculus of set valued functions and control", Journal of Mathematics and Mechanics 18 (1968) 47-59.
(4)
L.S. Hill, "Properties of certain aggregate functions", American Journal of Mathematics 49 (1927) 419-432.
(2)
C.J. Himmelberg, "Fixed points of compact multifunctions", Journal of Mathematical Analysis and Applications 38 (1972) 205-207.
(2)
W.W. Hogan, "Directional derivatives for extremal-value functions with applications to the completely convex case", Operations Research 21 (1973) 188-209.
(1)
W.W. Hogan, "The continuity of the perturbation function of a convex program", Operations Research 21 (1973) 351-352.
(1)
W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
(1, 2, 5)
W.W. Hogan, "Applications of general convergence theory for outer approximation algorithms", Mathematical Programming 5 (1973) 151-168.
(3, 4)
P. Huard, "Resolution of mathematical programming problems with nonlinear constraints by the method of centres", in: J. Abadie, ed., Nonlinear programming (North-Holland, Amsterdam, 1967) pp. 206-219.
(3)
P. Huard, "Optimisation dans $R^n$. 2ème partie: Algorithmes généraux", Laboratoire de Calcul, University of Lille (1972).
(3, 4, 5)
P. Huard, "Tentative de synthèse dans les méthodes de programmation non-linéaire", Cahiers du Centre d'Études de R.O. 16 (1974) 347-367.
(3, 4)
P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
(3, 4)
P. Huard, "Implementation de méthodes de gradients par discrétisation tangentielle", Bulletin de la Direction des Etudes et Recherches (E.D.F.) Série C No. 2 (1977) 43-57.
(4)
P. Huard, "Implementation of gradient methods by tangential discretization", Journal of Optimization Theory and Applications 28 (1979).
(4)
M.Q. Jacobs, "Some existence theorems for linear optimal control problems", SIAM Journal on Control 5 (1967) 418-437.
(4)
M.Q. Jacobs, "On the approximation of integrals of multivalued functions", SIAM Journal on Control 7 (1969) 158-177.
(2, 4)
R. Janin, "Sensitivity for nonconvex optimization problems", in: A. Auslender ed., Convex analysis and its applications (Springer-Verlag, Berlin, 1977) pp. 115-119.
(1)
Z. Janiszewski, "Sur les continus irréductibles entre deux points", Journal de l'Ecole Polytechnique, Série II, 16 (1912).
(2)
J.L. Joly and P.J. Laurent, "Stability and duality in convex minimization problems", Revue Française d'Informatique et de R.O., No. R-2 (1971) 3-42.
(1)
S. Kakutani, "A generalization of Brouwer's fixed point theorem", Duke Mathematical Journal 8 (1941) 457-459.
(2)
P.R. Kleindorfer and M.R. Sertel, "Equilibrium existence results for simple dynamic games", Journal of Optimization Theory and Applications 14 (1974) 614-631.
(1, 4)
R. Klessig, "A general theory of convergence for constrained optimization algorithms that use antizigzagging provisions", SIAM Journal on Control 12 (1974) 598-608.
(3)
R. Klessig and E. Polak, "An adaptative precision gradient method for optimal control", SIAM Journal on Control 11 (1973) 80-93.
(4)
B. Kummer, "Global stability of optimization problems", Mathematische Operationsforschung und Statistik, Optimization 8 (1977) 367-383.
(1)
C. Kuratowski, "Les fonctions semi-continues dans l'espace des ensembles fermés", Fundamenta Mathematicae 18 (1932) 148-160.
(2)
C. Kuratowski, Topologie, third edition, volume II, Monografie matematyczne (Polska Akademia Nauk., Warszawa, 1961) Ch. 4, section 39.
(2, 5)
Ky Fan, "Fixed point and minimax theorems in locally convex topological linear spaces", Proceedings of the National Academy of Sciences of USA 38 (1952) 121-126.
(2)
Ky Fan, "Extensions of two fixed point theorems of F.E. Browder", in: W.M. Fleischman ed., Set-valued mappings, selections and topological properties of $2^X$, Lecture notes in Mathematics 171 (Springer-Verlag, Berlin, 1970).
(2, 5)
A. Lasota and C. Olech, "On Cesari's semicontinuity condition for set valued mappings", Bulletin de l'Académie Polonaise des Sciences 16 (1968).
(2)
J.M. Lasry and R. Robert, "Analyse non linéaire multivoque", Cahiers de mathématiques de la décision No. 11, University of Paris-Dauphine (1976).
(2)
E.S. Levitin and B.T. Polyak, "Constrained optimization methods", USSR Computational Mathematics and Mathematical Physics 6 (1966) 1-50.
(3, 4)
T.C. Lim, "A fixed point theorem for multivalued nonexpansive mappings in a uniformly convex Banach space", Bulletin of the American Mathematical Society 80 (1974) 1123-1126.
(2)
D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
(4, 5)
V.J. Manusco, "An Ascoli theorem for multivalued functions", Journal of the Australian Mathematical Society 12 (1971) 466-477.
(2)
M. Martelli and A. Vignoli, "On differentiability of multi-valued maps", Bollettino della Unione Matematica Italiana 10 (1974) 701-712.
(2)
D.H. Martin, "On the continuity of the maximum in parametric linear programming", Journal of Optimization Theory and Applications 17 (1975) 205.
(1)
B. Martinet, "Perturbation des méthodes d'optimisation. Applications", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 12 (1978) 153-171.
(1, 4)
O.H. Merrill, "Applications and extensions of an algorithm that computes fixed points of certain upper-semi-continuous point-to-set mappings", University of Michigan, Ph.D. Dissertation, Ann Arbor (1972).
(3)
H. Methlouthi, "Calcul différentiel multivoque", Centre de recherche de mathématiques de la décision, Cahier No. 7702, Université Paris-Dauphine (1977).
(2)
G.G.L. Meyer, "Algorithm model for penalty functions-type iterative procedures", Journal of Computer and System Sciences 9 (1974) 20-30.
(3, 4)
G.G.L. Meyer, "A canonical structure for iterative procedures", Journal of Mathematical Analysis and Applications 52 (1975) 120-128.
(3)
G.G.L. Meyer, "A systematic approach to the synthesis of algorithms", Numerische Mathematik 24 (1975) 277-289.
(3)
G.G.L. Meyer, "Conditions de convergence pour les algorithmes itératifs monotones, autonomes et non déterministes", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 11 (1977) 61-74.
(3)
G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
(3)
G.G.L. Meyer and E. Polak, "A decomposition algorithm for solving a class of optimal control problems", Journal of Mathematical Analysis and Applications 32 (1970) 118-140.
(4)
G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control and Optimization 9 (1971) 547-560.
(3)
G.G.L. Meyer and R.C. Raup, "On the structure of cluster point sets of iteratively generated sequences", Electrical engineering department, report No. 75-24, The Johns Hopkins University, Baltimore, MD (1975).
(3)
R.R. Meyer, "The solution of non-convex optimization problems by iterative convex programming", Ph.D. Thesis, University of Wisconsin-Madison (1968).
(4)
R.R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41-54.
(3, 4)
R.R. Meyer, "Integer and mixed-integer programming models: General properties", Journal of Optimization Theory and Applications 16 (1975) 191-206.
(4)
R.R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computation and System Sciences 12 (1976) 108-121.
(3)
R.R. Meyer, "On the convergence of algorithms with restart", SIAM Journal on Numerical Analysis 13 (1976) 696-704.
(3, 4)
R.R. Meyer, "A convergence theory for a class of anti-jamming strategies", Journal of Optimization Theory and Applications 21 (1977) 277-297.
(4)
R.R. Meyer, "A comparison of the forcing function and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control and Optimization 15 (1977) 699-715.
(3)
R.R. Meyer, "Equivalent constraints for discrete sets", Mathematics research center report No. 1748, University of Wisconsin-Madison (1977).
(4)
E. Michael, "Topologies on spaces of subsets", Transactions of the American Mathematical Society 71 (1951) 152-182.
(2)
E. Michael, "Continuous selections", Annals of Mathematics 63, 64, 65 (1956, 1957).
(2)
G.J. Minty, "On the monotonicity of the gradient of a convex function", Pacific Journal of Mathematics 14 (1964) 243-247.
(4)
R.L. Moore, "Concerning upper semicontinuous collections of continua", Proceedings of the National Academy of Sciences of USA 10 (1924) 356-360.
(2)
R.L. Moore, "Concerning upper semicontinuous collections of continua" Transactions of the American Mathematical Society 27 (1925) 416.
(2)
J.J. Moreau, "Séminaire sur les équations aux dérivées partielles, II, Fonctionnelles convexes", Collège de France (1966-1967).
(4)
S.B. Nadler, "Multivalued contraction mappings", Pacific Journal of Mathematics 30 (1969) 475-488.
(2)
F. Nožička, J. Guddat, H. Hollatz and B. Bank, Theorie der linearen parametrischen Optimierung (Akademie-Verlag, Berlin, 1974).
(1, 5)
J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
(5)
J.M. Ortega and W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1972) 40-43.
(3)
A.M. Ostrowski, Solution of equations and systems of equations (Academic Press, New York, 1966).
(5)
W.W. Petryshyn and P.M. Fitzpatrick, "Fixed point theorems for multivalued non compact inward maps", Journal of Mathematical Analysis and Applications 46 (1974) 756-767.
(2)
E. Polak, "On the convergence of optimization algorithms", Revue Française d'Informatique et de R.O. 16-R1 (1969) 17-34.
(3, 4)
E. Polak, "On the implementation of conceptual algorithms", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 275-291.
(3, 4)
E. Polak, Computational methods in optimization: A unified approach (Academic Press, New York 1971).
(3, 4, 5)
E. Polak, R.W.H. Sargent and D.J. Sebastian, "On the convergence of sequential minimization algorithms", Journal of Optimization Theory and Applications 14 (1974) 439-442.
(4)
B.T. Polyak, "Gradient methods for the minimization of functionals", U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1963) 864-878.
(3)
S.M. Robinson, "Stability theory for systems of inequalities, Part II: Differentiable nonlinear systems", Mathematics research center technical report No. 1338, University of Wisconsin-Madison (1974).
(1)
S.M. Robinson, "Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms", Mathematical Programming 7 (1974) 1-16.
(1)
S.M. Robinson, "Regularity and stability for convex multivalued functions", Mathematics of Operations Research 1 (1976) 130-143.
(1)
S.M. Robinson, "First-order conditions for general nonlinear optimization", SIAM Journal on Applied Mathematics 30 (1976) 597-607.
(4)
S.M. Robinson, "A characterization of stability in linear programming", Operations Research 25 (1977) 435-447.
(1)
S.M. Robinson and R.H. Day, "A sufficient condition for continuity of optimal sets in mathematical programming", Journal of Mathematical Analysis and Applications 45 (1974) 506-511.
(1)
S.M. Robinson and R.R. Meyer, "Lower semicontinuity of multivalued linearization mappings", SIAM Journal on Control 11 (1973) 525-533.
(2, 4)
R.T. Rockafellar, "Duality and stability in extremum problems involving convex functions", Pacific Journal of Mathematics 21 (1967).
(1)
R.T. Rockafellar, Convex functions and duality in optimization problems and dynamics, Lecture notes in operations research and mathematical economics 11 (Springer-Verlag, Berlin, 1969).
(5)
R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
(5)
R.T. Rockafellar, "Monotone operators and the proximal point algorithm", SIAM Journal on Control and Optimization 14 (1976) 877-892.
(3)
J.B. Rosen, "Iterative solution of nonlinear optimal control problems", SIAM Journal on Control 4 (1966) 223-244.
(4)
J.B. Rosen, "Two-phase algorithm for nonlinear constraint problems", Computer science department technical report No. 7%8, University of Minnesota (1977).
(4)
R. Saigal, "Extension of the generalized complementarity problem", Mathematics of Operations Research 1 (1976) 260-266.
(4)
J. Saint-Pierre, "Borel convex-valued multifunctions", in: A. Auslender ed., Convex analysis and its applications (Springer-Verlag, Berlin 1977) pp. 180-190.
(2)
M.R. Sertel, "The fundamental continuity theory of optimization on a compact space. 1", Journal of Optimization Theory and Applications 16 (1975) 549-558.
(1)
I. Shigeru and T. Wataru, "Single-valued mappings, multi-valued mappings and fixed point theorems", Journal of Mathematical Analysis and Applications 59 (1977) 514-521.
(2)
R.E. Smithson, "Sub continuity for multifunctions", Pacific Journal of Mathematics 61 (1975) 283-288.
(2)
S. Swaminathan, Fixed point theory and its applications (Academic Press, New York, 1977).
(2, 5)
Hoang Tuy, "On the convex approximation of nonlinear inequalities", Operationsforschung und Statistik 5 (1974) 451-466.
(4)
Hoang Tuy, "Stability property of a system of inequalities", Mathematische Operationsforschung und Statistik, Optimization 8 (1977) 27-39.
(1)
M. Valadier, "Quelques propriétés des sous-gradients", Rapport de l'I.R.I.A., Automatique 6833, Rocquencourt (1968).
(4)
F.L. Vasilesco, Thèse, Faculté des Sciences, Paris (1925).
(2)
R.J.B. Wets, "On the convergence of random convex sets", in: A. Auslender ed., Convex analysis and its applications (Springer-Verlag, Berlin, 1977) pp. 191-206.
(2)
W.A. Wilson, "On the structure of a continuum, limited and irreducible between two points", American Journal of Mathematics 48 (1926) 145-168.
(2)
I. Zang and M. Avriel, "On functions whose local minima are global", Journal of Optimization Theory and Applications 16 (1975) 183-190.
(4)
I. Zang, E.U. Choo and M. Avriel, "A note on functions whose local minima are global", Journal of Optimization Theory and Applications 18 (1976) 555-559.
(4)
I. Zang, E.U. Choo and M. Avriel, "On functions whose stationary points are global minima", Journal of Optimization Theory and Applications 22 (1977) 195-208.
(4)
W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
(3, 4, 5)
W.I. Zangwill, "Convergence conditions for nonlinear programming algorithms", Management Science 16 (1969) 1-13.
(3, 4)
L. Zoretti, "Sur les fonctions analytiques uniformes qui possèdent un ensemble parfait discontinu de points singuliers", Journal de Mathématiques Pures et Appliquées 1 (1905) 1-51.
(2)
G. Zoutendijk, Methods of feasible directions (Elsevier, Amsterdam, 1960).
(3, 5)
Mathematical Programming Study 10 (1979) 29-41. North-Holland Publishing Company
DIFFERENTIABLE STABILITY IN NON CONVEX AND NON DIFFERENTIABLE PROGRAMMING
A. AUSLENDER
Université de Clermont, Aubière, France
Received 24 October 1977
This paper consists of a study of differentiable stability in non convex and non differentiable mathematical programming. Vertically perturbed programs are studied and upper and lower bounds are estimated for the potential directional derivative of the perturbed objective function.
Key words: Mathematical Programming, Objective Function Sensibility, Locally Lipschitz Function, Generalized Gradients.
0. Introduction
This paper consists of a study of differentiable stability in non convex and non differentiable mathematical programming. For a survey of this and related work see Gauvin and Tolle [7]. The goal of this paper is to extend Gauvin and Tolle's results to the non differentiable case.
More precisely, in this paper $R^N$ is the usual vector space of real N-tuples with the usual inner product denoted by $(\cdot,\cdot)$; $m$, $p$, $N$ ($p < N$) are positive integers, and $f$, $f_i$, $i \in (1,m)$, $g_j$, $j \in (1,p)$, are real-valued locally Lipschitz functions defined on $R^N$; $(1,p)$ denotes the set of integers included in the closed interval $[1,p]$. The functions $g_i$, $i \in (1,p)$, are continuously differentiable. Let $(z,w) \in R^m \times R^p$; we shall consider the sets C = {x: f_i(x) -x lim .O(x,) is either empty or equal to D(x) and that ~ is continuous, then H is closed at x.
Theorem 1.3 ([6, corollary 2.3.4]). Let $g$, $\{g_r\}$ be affine functions from $R^n$ to $R^m$ with $g = \lim_{r\to\infty} g_r$. If we define $H(g) = \{x : g(x) = 0\}$ and suppose that
$\limsup_{r\to\infty} \operatorname{rank}(g_r) \le \operatorname{rank}(g)$,
then either $\lim_{r\to\infty} H(g_r) = H(g)$ or $H(g_r)$ is empty for infinitely many $r$.
1.2. Pseudoinverse maps
In this paper we shall have occasion to use the pseudoinverse of a $p \times N$ matrix $A$ with rank $p$ ($p < N$). For such a matrix, $AA^t$ is nonsingular and the matrix $A^t(AA^t)^{-1}$, denoted by $A^{+}$, is called the pseudoinverse of $A$. Moreover we have:
Theorem 1.4 ([8, theorem 8.1]). If $A$ is a $p \times N$ matrix with rank $p$, then $A^{+}b$ is a solution of $Ax = b$.
The next theorem will be fundamental in this paper.
Theorem 1.5. Let $g_i$, $i \in (1,p)$, be a set of $p$ real-valued continuously differentiable functions defined on $R^N$. Suppose further that the matrix of partial derivatives
$(\partial g_i/\partial x_j)$, $i \in (1,p)$, $j \in (1,N)$   (1.0)
has rank $p$ at $x = x_0$. Then for a given vector $h_0$, there exist a neighborhood $V \times X$ of $(h_0, x_0)$, $\delta > 0$, and a continuous function $\alpha$ with values in $R^N$ defined on $]-\delta,\delta[ \times V \times X$, which has with respect to the first component a continuous partial derivative $\partial\alpha/\partial\lambda$, such that
$\alpha(0,h,y) = 0$, $\frac{\partial\alpha}{\partial\lambda}(0,h,y) = 0$ $\forall (y,h) \in X \times V$   (1.1)
A. Auslender/ Differentiable stability
32
and such that if we set
$x(\lambda,h,y) = y + \alpha(\lambda,h,y) + \lambda h$,
then $x(\lambda,h,y)$ satisfies
$g_i(x(\lambda,h,y)) = g_i(y) + \lambda(\nabla g_i(y), h)$ $\forall \lambda \in ]-\delta,\delta[$, $\forall (y,h) \in X \times V$, $i \in (1,p)$.   (1.2)
Proof. Use the proof of [8, theorem 5.2], substituting the triplet $(\lambda,h,y)$ for the variable $\lambda$.
For the following we shall denote by $G(x_0)$ the pseudoinverse of the matrix defined by (1.0).
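As a small numerical illustration of the pseudoinverse of Section 1.2 (not part of the original paper), the sketch below forms the product of the transpose of A with the inverse of A times its transpose, for a hypothetical 2 x 3 matrix of rank 2, in exact rational arithmetic. It then checks that the resulting vector solves Ax = b, as Theorem 1.4 states. The matrix, the right-hand side and all function names are assumptions chosen for the example.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular square matrix."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        # Find a row with a nonzero pivot in column c and swap it up.
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def right_pseudoinverse(A):
    """Pseudoinverse A^t (A A^t)^{-1} of a full-row-rank matrix A."""
    At = transpose(A)
    return matmul(At, inverse(matmul(A, At)))


# Hypothetical data: a 2 x 3 matrix of rank 2 and a right-hand side b.
A = [[Fraction(1), Fraction(2), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(1)]]
b = [[Fraction(3)], [Fraction(-1)]]

x = matmul(right_pseudoinverse(A), b)   # candidate solution of A x = b
residual_free = matmul(A, x) == b       # exact check, no rounding involved
```

Exact fractions are used so that the identity can be verified without a tolerance; with floating-point data one would compare up to rounding error instead.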
1.3. Locally Lipschitz functions
For completeness we include some important concepts from the theory of locally Lipschitz functions. For a more detailed exposition the reader is referred to Clarke [2, 3, 4, 5]. Let $f$ be a real-valued locally Lipschitz function defined on $R^N$; then there exists for every $x \in R^N$ a nonempty convex compact set denoted by $\partial f(x)$ and called the generalized gradient such that, if we write
$f^0(x; v) = \limsup_{\lambda \to 0^+,\ h \to 0} \frac{f(x+h+\lambda v) - f(x+h)}{\lambda}$,   (1.3)
then we have
$f^0(x; v) = \delta^*(v \mid \partial f(x))$ $\forall v \in R^N$.   (1.4)
In this formula, $\delta^*(\cdot \mid \partial f(x))$ is the support functional of $\partial f(x)$. More generally, if $C$ is a convex compact set we write
$\delta^*(v \mid C) = \max\{(v,y) \mid y \in C\}$.   (1.5)
We shall use the two classical properties of support functionals ([12, theorem 13.1]):
$0 \in C \iff \delta^*(v \mid C) \ge 0$ $\forall v \in R^N$.   (1.6)
If $C_i$, $i \in (1,r)$, are convex compact sets and $\lambda_i$ positive reals, then
$\delta^*(v \mid \sum_{i=1}^r \lambda_i C_i) = \sum_{i=1}^r \lambda_i \delta^*(v \mid C_i)$.   (1.7)
We shall also use the following properties of $f^0$:
(1) $f^0$ is upper semicontinuous (u.s.c.),
(2) for any $x$, $f^0(x; \cdot)$ is convex and positively homogeneous,
(3) $f^0(x; v) = \limsup_{h \to 0,\ v' \to v,\ \lambda \to 0^+} \frac{f(x+h+\lambda v') - f(x+h)}{\lambda}$.   (1.8)
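To make formulas (1.3) and (1.4) concrete, here is a minimal numerical sketch (an illustration added here, not the author's method). It approximates the generalized directional derivative of the one-dimensional function f(x) = |x| at x = 0 by the largest difference quotient over a small grid of shifts h and step sizes, and compares it with the support functional of the interval [-1, 1], which is the generalized gradient of |x| at 0. The grid sizes and function names are assumptions; a finite grid only gives a lower estimate of the lim sup in general, although for this particular function it is attained at h = 0.

```python
def clarke_directional_derivative(f, x, v, radius=1e-3, steps=21):
    """Estimate f^0(x; v) by the largest difference quotient on a small grid.

    The grid of shifts h is symmetric around 0 and contains h = 0 when
    `steps` is odd; the step sizes lam lie in ]0, radius].
    """
    best = float("-inf")
    for i in range(steps):
        h = radius * (2 * i / (steps - 1) - 1)
        for k in range(1, steps + 1):
            lam = radius * k / steps
            quotient = (f(x + h + lam * v) - f(x + h)) / lam
            best = max(best, quotient)
    return best


def support_of_interval(v, lo, hi):
    """Support functional of the interval [lo, hi]: max of v*y over y in it."""
    return max(lo * v, hi * v)


# For f = abs at x = 0 the generalized gradient is the interval [-1, 1].
estimates = {v: clarke_directional_derivative(abs, 0.0, v) for v in (2.0, -1.0)}
supports = {v: support_of_interval(v, -1.0, 1.0) for v in (2.0, -1.0)}
```

In both directions the estimate agrees with the support functional up to rounding, which is what (1.4) predicts for this example.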
2. A fundamental theorem: first applications Theorem 2.1. Let hi i E (1, s) be a set o f p real-valued functions defined on R N locally Lipschitz. For any (z, w) E R s x R p set D(z, w) = {x: hi(x) 0 such that x(X) ~ D*(Xz, Xw) V )t E ]0, Xo[.
(2.3)
(B) More generally, f o r (L ~ ) there exists a neighborhood Z x W o f (L ~), )to > O, an affine injective transformation d defined f r o m R " into R N such that f o r any (z, w ) E Z • W, A ~ ] 0 , Ao[ (a) h~163d ( w ) ) < zi,
i E I(s
(Vg;(s
d(w)) = w~, i E (1, p),
(b) if a is the function associated to (,L d ( ~ ) ) by Theorem 1.5 and if we set x(a, w) = ~ + a (A, d(w), s + Xd(w), then x(a, w) E D*(Az, AwL
(2.4)
Proof. Obviously part (A) is a corollary of part (B), but for more clearness we shall prove first part (A).
(A(1)) Since Vg~($), i E (1, p) are i n d e p e n d e n t , G($)w satisfies (Vgi(~), G($)w) = w,,
i ~ (1, p)
( G ( $ ) is the p s e u d o i n v e r s e of the of the matrix defined in 2.0). T h u s f r o m 2.1 and since the functionals {(Vg,-($),.)} are linear and the functionals {h~ are h o m o g e n e o u s and c o n v e x there exists 6 --- 0 such that if we set ~ = O7)7+ G(Y,)w, then h~
}) < z,,
i E I($),
(Vgi(s
}) = w,,
i E (1, p).
(A)
(A2) L e t d satisfying 2.2, a the f u n c t i o n a s s o c i a t e d to (d, s b y T h e o r e m 1.5 and set x(X) = :~ + a(X, d, ~) + Ad, then: (a) f r o m T h e o r e m 1.5, there exists ;tl > 0 such that for ;t E ]0, ;t~[,
g~(x(A))=Aw,,
i @ (1,p).
(b) Since zi - h o(~; d ( ~ ) ) > 0 there exists A2 > 0 such that f o r i E I ( ~ ) , A E ]0, A2[, hi(x(A)) - h~(~) < lira sup hi(x(A)) - hi(~) A ,~o* A
d) + zi
h~
Set d' = d + a(A, d, .f)/X ; then f r o m (1.1) d ' ~ d if X ~ 0 +. T h u s f r o m 1.8, we h a v e lira sup hi(x(X)) - hi(S) _< lim sup hi($ + h + Xd') - hi('2 + h) x-.0 +
A
h-.O.x--.o
A
-< lim sup h~(:~ + h + Ad*) - h,-(.f + h) = h0($; d) h-~O,d*-~d ~~0 +
and c o n s e q u e n t l y w e obtain h/(x(X)) < Xzi.
(B)
(c) Since h,- is c o n t i n u o u s and since a(X, d, s such that f o r A E ]O, A3[, i~. I(g),
0 if A ~ 0 + t h e r e exists A3 > 0
hi(x(X)) < Xzi.
(C)
If we set ;t0 = min(At, ;t2, A3), then we obtain 2.3. (B(I)) F o r e v e r y b o u n d e d n e i g h b o r h o o d Z • W of (L if') t h e r e exists o7 - 0 such that if we set d ( w ) = o T ~ + G ( $ ) w , then d is an injective attine m a p satisfying for e v e r y (z, w) E Z • W,
h~
iEl(~),
(Vgi(~),d(w))=w~,
iE(I,p).
(B(2)) L e t a the f u n c t i o n a s s o c i a t e d to (~, d(ff)) b y . T h e o r e m 1.5 and set x(A, w) = $ + a(A, d ( w ) , ~) + Ad(w). Then: (a) F r o m T h e o r e m 1.5, since a(A, d(w), $ ) ~ 0 if A ~ 0 § and w --, ff there exist
At > 0 , W1C W neighborhood of ~ such that for A E ] 0 , At[, w E WI, z E Z we have gi(x(A, w)) =Awi,
i ~ (1, p),
hi(x(A, w)) < Azi,
i~_ I(Y`).
(b) Since z i - h~ d ( f f ) ) > 0, there exist A2E ]0, A,[, a neighborhood W2C W~ such that for A E ]0, A2[, w E W2, z ~ Z, i E 1(2) we have hi(x(A, w)) - hi(y`)
A
< lim sup A-.o%w~
hi(x(A, w)) - hi(y`)
A
+ zi - h~
d(ff))
and then for such $(\lambda, w)$, with the same arguments as in (A(b)) we can prove that $h_i(x(\lambda,w)) < \lambda z_i$.
Now return to problem P given in the introduction. First we want to give necessary conditions to characterize a local minimum $\bar{x}$. For that it is necessary to impose some type of constraint qualification condition on the functions $f_i$, $g_i$ at $\bar{x}$. We shall say that $\bar{x}$ satisfies the constraint qualification condition if:
(1) the matrix $(\partial g_i/\partial x_j)$, $i \in (1,p)$, $j \in (1,N)$, has rank $p$ at $\bar{x}$;
(2) there exists $\bar{y}$ such that
$f_i^0(\bar{x}; \bar{y}) < 0$, $i \in I(\bar{x})$, $(\nabla g_i(\bar{x}), \bar{y}) = 0$, $i \in (1,p)$,   (2.5)
where $I(\bar{x}) = \{i : f_i(\bar{x}) = 0\}$.
This condition generalizes the Mangasarian-Fromovitz condition [11] in the differentiable case. The next generalized Kuhn-Tucker theorem appears as a trivial corollary of more general theorems given by Hiriart-Urruty [9] and by Clarke [3]. Nevertheless this theorem is also a simple corollary of Theorem 2.1, and we thus obtain another proof of it.
Theorem 2.2. If $\bar{x} \in C$ satisfies the constraint qualification condition and if $\bar{x}$ is a local minimum for $f$ in $C$, then there exist real numbers $u_i$, $v_j$, $i \in I(\bar{x})$, $j \in (1,p)$, such that
$u_i \ge 0$, $i \in I(\bar{x})$,   (2.6)
$0 \in \partial f(\bar{x}) + \sum_{i \in I(\bar{x})} u_i \partial f_i(\bar{x}) + \sum_{j=1}^p v_j \nabla g_j(\bar{x})$.   (2.7)
Proof. Denote by $D$ the following set:
$D = \{d : f^0(\bar{x}; d) < 0,\ (\nabla g_j(\bar{x}), d) = 0,\ j \in (1,p),\ f_i^0(\bar{x}; d) < 0,\ i \in I(\bar{x})\}$.
$D$ is empty. In the contrary case, if we set $h_i = f_i$, $i \in (1,m)$, $h_{m+1}(\cdot) = f(\cdot) - f(\bar{x})$, $(z,w) = (0,0)$, then by part (A) of Theorem 2.1 we obtain a contradiction with the fact that $\bar{x}$ is a local minimum. Thus, from the fundamental theorem [1, p. 210] there exist $\lambda_0, \lambda_i \ge 0$, $i \in I(\bar{x})$, $\mu_j$, $j \in (1,p)$, such that
$\lambda_0 + \sum_{i \in I(\bar{x})} \lambda_i \ne 0$,
$\lambda_0 f^0(\bar{x}; d) + \sum_{i \in I(\bar{x})} \lambda_i f_i^0(\bar{x}; d) + \sum_{j=1}^p \mu_j (\nabla g_j(\bar{x}), d) \ge 0$ $\forall d$.
Next, the constraint qualification condition implies that $\lambda_0 \ne 0$; then from 1.6 and 1.7 we conclude that 2.7 holds.
Let $s$ be the number of elements of $I(\bar{x})$; without loss of generality we assume that $I(\bar{x}) = (1,s)$, and for the following we shall denote by $K(\bar{x})$ the set of Kuhn-Tucker vectors $(u,v) \in R^s \times R^p$ such that 2.6 and 2.7 hold.
3. Differentiable stability
In this section we investigate some differential properties of the perturbed program $P(z,w)$ given in the introduction. We denote by $M(z,w)$ the optimal set of $P(z,w)$, that is,
$M(z,w) = \{x \in C(z,w) : f(x) = h(z,w)\}$.
It will be assumed in this section that $\bar{x}$ is a local minimum for $f$ in $C$ and that the constraint qualification condition is satisfied at $\bar{x}$.
Theorem 3.1. F o r e v e r y (z, w ) E R s x R p, E > 0 there exists a v e c t o r y(e, z, w ) s u c h that
f0(~; y(E, z, w)) < zi,
i E I(~),
(Vgi(~), y(e, z, w)) = w. i ~ (l,p),
f0(~; y(e, z, w)) -< -min(((u, v), (z, w)) [ (u, v) E K(~)) + e. Proof. For (u, v, y) ~ R s x R p x R N we denote L(u, v, y) =
y) + E
y) +
iE/((~)
i"= l
Consider the minimization problem PI:
a = inf(f~
y) I ~ ( ~ ; y) -< zi,
i ~ I(~),
(Vgi(x), y) = wi,
i ~ (1, p))
y).
(3.1)
(3.2)
and its ordinary dual problem Q,:
fl : sup(inf-x ~ viwi y
i=l
~, uizi+ L(u, v, y)[ui >-O, i E I(,2,, v, E R) iUI(2)
Since from part (A(a)) of Theorem 2.1 there exists y* such that ff~(2;y*)0, v E RP). \
y
/
From 1.6 and 1.7 we have K ( 2 ) = {(u, v): u >- O, L(u, v, .) >- 0},
(A)
Let $u \ge 0$; if $(u,v) \notin K(\bar{x})$, it follows from (A) that there exists $y$ such that $L(u,v,y) < 0$, and since $L(u,v,\cdot)$ is positively homogeneous we have $\inf\{L(u,v,y) \mid y \in R^N\} = -\infty$. In the other case, if $(u,v) \in K(\bar{x})$, since $L(u,v,\cdot)$ is homogeneous and $\ge 0$ we have $\inf\{L(u,v,y) \mid y \in R^N\} = 0$. Thus finally we obtain
$\alpha = -\min\{((u,v),(z,w)) \mid (u,v) \in K(\bar{x})\}$.
Then if we take an $\varepsilon$-optimal solution of $P_1$ we obtain the desired result.
For every $z \in R^m$ we write $z = (z^1, z^2)$ with $z^1 = (z_1, \ldots, z_s)$, $z^2 = (z_{s+1}, \ldots, z_m)$. Now we assume for the sequel that $\bar{x} \in M(0,0)$.
Theorem 3.2. For any direction $(z,w) \in R^m \times R^p$,
(1) there exists $\lambda_0$ such that $C(\lambda z, \lambda w) \ne \emptyset$ $\forall \lambda \in ]0,\lambda_0[$,
(2) $\limsup_{\lambda \to 0^+} \frac{h(\lambda z,\lambda w) - h(0,0)}{\lambda} \le -\min\{((u,v),(z^1,w)) \mid (u,v) \in K(\bar{x})\}$.   (3.3)
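The bound (3.3) can be checked by hand on a one-dimensional example. The sketch below is an illustration under stated assumptions, not taken from the paper: it assumes that the perturbed feasible set has the form C(z) = {x : f_1(x) <= z} and that h(z) is the infimum of f over C(z), and it uses a hypothetical problem with no equality constraints. Take f(x) = max(-x, -2x) and f_1(x) = x. At the minimizer 0 of the unperturbed problem the generalized gradient of f is [-2, -1], so the Kuhn-Tucker multipliers u with 0 in [-2, -1] + u form the interval K = [1, 2].

```python
def f(x):
    """Nonsmooth objective: locally Lipschitz, with a kink at x = 0."""
    return max(-x, -2.0 * x)


def h(z, grid=1000):
    """Optimal value of: minimize f(x) subject to x <= z.

    The feasible half-line is truncated to [z - 1, z] and sampled on a grid
    that contains x = z; since f is decreasing the minimum sits at x = z.
    """
    return min(f(z - k / grid) for k in range(grid + 1))


def difference_quotient(z, lam=1e-3):
    """(h(lam * z) - h(0)) / lam for a small positive lam."""
    return (h(lam * z) - h(0.0)) / lam


def kuhn_tucker_bound(z):
    """Right-hand side of (3.3): -min of u * z over the multiplier set [1, 2].

    The expression is linear in u, so the minimum is reached at an endpoint.
    """
    return -min(1.0 * z, 2.0 * z)


quotients = {z: difference_quotient(z) for z in (1.0, -1.0)}
bounds = {z: kuhn_tucker_bound(z) for z in (1.0, -1.0)}
```

In this example the quotient equals the bound in both directions (-1 for z = 1 and 2 for z = -1), and the minimum over K is attained at different multipliers depending on the sign of z, which is why the whole set K appears in (3.3).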
Proof. Let e > 0 , / ~ > 0 , y ( e , z , w ) satisfying 3.1 and 3.2. Set d(e,p.) = y(~, z, w ) + / ~ (~ satisfies 2.5). Then from part (A(a)) of T h e o r e m 2.1, if x is the
curve associated to d in this theorem, there exists ;to > 0 such that for A ~ ]0, ;to[, x(x) ~ C(;tz, ;tw).
(A)
Moreover, from (A) we obtain
h(;tz, hw) -- 0, tz ~ 0 we obtain from (B) inequality 3.3. T h e o r e m 3.3. Assume that the point-to-set mapping C(.) is uniformly compact
near (0, 0), then h is continuous at (0, 0). Proof. Since C(-) is uniformly compact near 0 and closed at (0, 0), h is lower semi-continuous at (0, 0) by Theorem 1.1. We shall prove now that h is upper semi-continuous at (0, 0). Let {z,, w,} converging to (0, 0) such that = lim sup h(z, w) = lim h(z,, w,). ( w,z )--~ O,O)
n--,~
Set (~., ~ . ) _
(z., w.)
I[(z.,
w.)[l'
A. = II(z., w.)ll,
t h e n w i t h o u t Joss o f g e n e r a l i t y w e can a s s u m e that there e x i s t s (2, ~ ) s u c h that (5, ~ ) = lim (2., ~.),
0 = lim A..
n ---~r
n--~o
Thus from part B of T h e o r e m 2.1 if x is the curve associated to (:~, d(~)) there exists no such that for n -> no,
x(;t., ~,.) ~ C(;t.5., ;t.r Since h(;t.(2., ~.)) = h(z., w.) we obtain
h(z,,, w,,) nt we have gi(.f.(A)) = g i ( x . ) - AWl V A G ]0, )~[, a.(0) = 0,
lira
a ' ( A ) = 0.
T h u s there exists n2 > n~ such that for n > n2, A, e 10, ~7[, lira a . ( A . ) = 0,
lira a.(A.) = 0.
L e t y. = ~.(A.). T h e n y. = x. + A.d" with d = l i m . ~ d ' . Since x. E M(A.z, A.w) w e have g/(y.) = 0,
n -----n2.
(A)
T h e n there exists n3 -- n2 such that (a)
fi(y.) < 0,
i ~ IC~);
(b)
fi(y.)-fi(x.)_~
A.
(B)
lim sup f i ( y " ) - f i ( x " ) .-~ A.
/ ~ ( ~ ; 3~)
---O
VaEA,
(12)
and
Lx + [ ( - l ) x >- 0
Vx E R(S).
(13)
Since (13) says that L and [ coincide on R(S) we are done. If A is a polyhedral cone and X and Y are finite dimensional, (12) and (13) will still hold as a consequence of the linear duality theory [7], even if (9) fails. Thus, in Theorem 2 we have reestablished a generalization of all the classical Farkas lemmas.
Acknowledgment
This work was partially produced while the author was a D.Phil. student under the supervision of Dr. M.A.H. Dempster of Balliol College, Oxford, whose continued interest is much appreciated. Research was partially funded on N.R.C. Account A4493.
References
[1] C. Berge, Topological spaces (Oliver and Boyd, London, 1963).
[2] J.M. Borwein, "Multivalued convexity and optimization: a unified approach to equality and inequality constraints", Mathematical Programming 13 (1977) 163-180.
[3] B.D. Craven, "Nonlinear programming in locally convex spaces", Journal of Optimization Theory and Applications 10 (1972) 197-210.
[4] T.M. Flett, "On differentiation in normed spaces", Journal of the London Mathematical Society 42 (1967) 523-533.
[5] D.H. Fremlin, Topological Riesz spaces and measure theory (Cambridge University Press, Cambridge, 1974).
[6] K. Kuratowski, Topology, Volume I (revised) (Academic Press, New York, 1966).
[7] R. Lehmann and W. Oettli, "The theorem of the alternative, the key theorem and the vector maximization problem", Mathematical Programming 8 (1975) 332-344.
[8] O.L. Mangasarian, Nonlinear programming (McGraw-Hill, New York, 1969).
[9] V.J. Mancuso, "An Ascoli theorem for multivalued functions", Journal of the Australian Mathematical Society 12 (1971) 466-477.
[10] A.L. Peressini, Ordered topological vector spaces (Harper and Row, New York, 1967).
[11] K. Ritter, "Optimization in linear spaces, I", Mathematische Annalen 182 (1969) 189-206.
[12] K. Ritter, "Optimization in linear spaces, II", Mathematische Annalen 183 (1969) 169-180.
[13] K. Ritter, "Optimization in linear spaces, III", Mathematische Annalen 184 (1970) 133-154.
[14] A.P. Robertson and W.J. Robertson, Topological vector spaces (Cambridge University Press, Cambridge, 1964).
[15] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, 1970).
Mathematical Programming Study 10 (1979) 48-68. North-Holland Publishing Company
EXTENSIONS OF THE CONTINUITY OF POINT-TO-SET MAPS: APPLICATIONS TO FIXED POINT ALGORITHMS
J. DENEL
Université de Lille 1, France
Received 17 January 1978
Revised manuscript received 24 March 1978
A new approach for synthesizing optimization algorithms is presented. New concepts for the continuity of point-to-set maps are given in terms of families of maps. These concepts are well adapted to constructing fixed point theorems that are widely useful for synthesizing optimization methods. The general algorithms already published are shown to be particular applications, and illustrations in the field of mathematical programming are given.
Key words: Point-to-Set Maps, Continuity, General Algorithms, Fixed Point Theorems, Synthesis, Optimization.
1. Introduction
For about ten years the growing number of optimization methods has led some authors to develop a synthetic approach to such algorithms in order to recognize their common features and a global condition for convergence. Zangwill [19] seems to have been the first to fully exploit the concept of point-to-set maps in the optimization area. He showed that many methods can be viewed as applications of the fixed point method $x_{i+1} \in F(x_i)$, where $F$ is a point-to-set map that depends on the particular algorithm. This approach divides the convergence criteria into two kinds: the first ensures that any accumulation point of the sequence generated by the method is a fixed point of $F$; the second ensures that every fixed point has optimality properties for the problem. We are interested here in the description of fixed point algorithms which generalize those already published [15, 16, 17, 19] and enlarge their possible applications. The convergence conditions generally given for such general schemes are of two kinds. The first is the so-called strict monotonic property (which means the existence of a function $h$ such that $x' \in F(x)$ and $x \notin F(x)$ imply $h(x') > h(x)$). The second condition uses the continuity properties of point-to-set maps (about these notions see for example Berge [1], and for the terminology used here see [15] and the Appendix). Many papers are concerned with the study of the continuity of maps often used in the modelling of mathematical programs (see for example [1, 10, 15, 16]). Another topic that arises is the stability of
J. Denel/ Extensions of the continuity
49
optimization problems; it has been widely studied [3, 8, 9, 10, 15, 17]. These results show that the classical continuities of mappings are not well adapted, because when modelling one has to consider maps that are defined either by an operation (intersection, composition, ...) between maps or by the set of optimal solutions of a parametrized program. Unfortunately the stability of the classical continuities (in particular the lower one) is only verified in a few cases and always requires strong assumptions. This restricts the power of the already published schemes. In this paper we present a new synthetic approach to optimization methods that largely avoids this non-stability. The originality lies in the definition, in terms of families of maps, of properties like continuity, and in a different approach to algorithms (no maximization at each iteration, but something that preserves the strict monotonic property, which seems to be essential). The proposed approach is justified in the context of optimization by the new possible applications, and leads to general fixed point algorithms that generalize [15, 17]. To develop this approach we consider families of point-to-set maps that depend on a parameter. These families, which we call p-decreasing families of maps, are defined in Section 1. Some essential definitions and properties needed for the description of fixed point algorithms are given (for more details see [5]). These definitions can be considered as extensions of the classical continuities; they allow us to define a "measure" for maps that are not lower or upper continuous. Besides this generalization, the formalism used here allows us to exhibit, for convergence purposes, a "minimal property" (the uniform regularity) that is not a continuity property. In Sections 2 and 3, two general algorithms are presented. They generalize the one-stage and two-stage algorithms [15, 17]. Illustrations of the possible applications are shown in Section 4.
Notations
$R^n$, the n-dimensional Euclidean space;
$\mathcal{P}(X)$, the set of all subsets of $X \subset R^n$;
$A^\circ$ ($\mathrm{Fr}(A)$), the interior (the boundary) of $A \subset R^n$;
$\bar{A}$, the closure of $A \subset R^n$;
$(x,y)$, the Euclidean scalar product on $R^n$;
$[x,z]$, the convex hull of $x, z \in R^n$ (segment);
$\underline{\lim}_N x_n$ ($\overline{\lim}_N x_n$), the smallest (the greatest) accumulation point of the sequence $\{x_n \in R\}_N$.
1. p-Decreasing families of point-to-set maps: Definitions
In the following we shall have to consider families of point-to-set maps from $X$ ($\subset R^n$) into $\mathcal{P}(Y)$ ($Y \subset R^m$), which depend on a nonnegative parameter and
which are ordered by inclusion. Definitions and properties about such families are given in this section, for a complete study of the p-decreasing families the reader is referred to [5]. Definition 1 (p-decreasing family). A family {Fp[p >-0} of point-to-set maps from X into ~ ( Y ) is said to be a p-decreasing family if and only if (i) Vp >0, Fp : X ~ ( Y ) , (ii) Vx E X, Vp -> 0, Vp' -> 0 (p' --0} from X into ~ ( Y ) is uniformly regular at g E X if and only if, F0(~) # ~:=> =lp > 0, =IV(S) a neighbourhood of $: Vx ~ X tq V($), Fp(x)~ ~. The family is uniformly regular on X if it is regular at every x E X. Remarks. (1) A p-decreasing family {FpJp-O} will be called regular if the property of Definition 2 holds only at $, that is, F0(.~) # ~==~=lp > 0 : Fp(.~) # ~. (2) It is convenient to call dense at ~ E X a p-decreasing family such that
Fo(g) C U Fp(s p>0
it is obvious that density at ~ implies the regularity at ~. (3) In the Definition 2 it is equivalent to say, F0(g) # ~ 3p >0, V{x. E X}N-~ g, =In0: Vn -> no, Fp(x,) # ~. (4) Definition 2 is not concerned with a notion of continuity of the map F0 (in the classical meaning). But it will be seen (Sections 2, 3) that this notion is the one that is needed for convergence results in the fixed point theorems. To ensure the stability of the uniform regularity of p-decreasing families in elementary operations we have to introduce an extension of the classical lower continuity. Definition 3 (pseudo-lower-continuity (or p.l.c) at ~ E X). A p-decreasing family {Fp ] p -> 0} from X into ~ ( Y ) is pseudo-lower-continuous at ~ E X if and only if,
V{x. E X ~ , V p ,
p'>O(p'0} does not depend on the parameter p, Definition 3 is obviously the definition of the lower-continuity; but the p.l.c at of a p-decreasing family does not imply the lower continuity (in the classical understanding) of the map F0. Conversely even if F0 is lower continuous at • X, the p-decreasing family {Fp [ p -> 0} may not be p.l.c at g. But it is easy to
see that if a p-decreasing family is dense and p.l.c at $ E X then the map F0 is lower continuous at ~. (6) In the above definitions, the variable x and the parameter p play distinct parts. In fact to a p-decreasing family we can associate a map G defined on X x R . into ~ ( Y ) by G ( x , p ) = Fp(x). It is easy to verify that the lower continuity of the map G implies the pseudo-lower-continuity of the family; the converse is not true.
Definition 4 (pseudo-upper-continuity (or p.u.c) at $\bar{x} \in X$). A p-decreasing family $\{F_p \mid p \ge 0\}$ from $X$ into $\mathcal{P}(Y)$ is pseudo-upper-continuous at $\bar{x} \in X$ if and only if, for all sequences $\{x_n \in X\}_N \to \bar{x}$ and $\{y_n \in Y\}_N \to \bar{y}$ such that $\exists p > 0$, $\exists n_0$: $\forall n \ge n_0$, $y_n \in F_p(x_n)$, we have $\bar{y} \in F_0(\bar{x})$. The family is p.u.c on $X$ if it is p.u.c at every $x \in X$.
Remarks.
(7) If the family $\{F_p \mid p \ge 0\}$ does not depend on the parameter $p$, Definition 4 is obviously the classical definition of upper continuity. It is easy to construct families that are p.u.c while the map $F_0$ is not upper continuous.
(8) The previous definitions have been given in terms of sequences, which is well adapted to the study of algorithms. In [5], these definitions are given in topological spaces and the connections with the definitions given here are studied.
Let us now give some properties that will be used in the following sections.
Proposition 1. Let be given a p-decreasing family {Fp ] p > 0} from X into ~ ( Y ) and denote by M = {x E X [ Fo(x) = 0}. Then, {Fp I P >- 0} uniformly regular on X ~ M closed. Proof. If M = ft the result is correct; so assume M S 0 and consider a sequence {x. E M}~ converging to ~; by the converse, assume SK M:
{Fpl p ->0} uniformly regular at x /
::lp > 0 , 3 n o : n > n o , F p ( x . ) # 0 ]
Fo(g) ~ 0 Remark 3
Vn, Fo(x.) = 0
I
leading to a contradiction.
Proposition 2. If {Fp ] p >- O} is uniformly regular at ~ ~ X and if there exist
{x. e X ~ - , ~ e x ,
~o. e R+}~ -,0
such that Fp.(x,) = ~, then Fo(g) = ~. Proof. ( a ) { F p l p ~ O }
uniform regular at g ~ 3 p > O ,
3V(g):VxEXAV(g),
Fp(x) r 0; and (b) {x. E X } ~
~
3no: n >--no~ x. E X FI V(Y,) imply
3n0: Vn >--no, Fp(x.) r O.
(1)
But,
{p. -> 0}s ~ 0,
{Fp ] p - 0} p-decreasing
imply
::lnl:Vn>-n,,p. 0} a p-decreasing family from Y into ~ ( X ) . Proposition 3 below gives results about the family {Fp I P - 0} defined by Vy
Y, Vp > 0: Fp(y) = {x E Ap(y) If(x, y) -> ~b(y) + 3'(p)},
Vy E Y, p = 0: F0(y) = {x E Ao(y) If(x, y) > ~b(y)}. Proposition 3. With the notations and assumptions given above, {Fp [p-> 0} is a p-decreasing family and we have: f upper semi-cont, on X x {y}] (i) ~b lower semi-cont, at Y I ~ {Fp I P >- 0} p.u.c, at y, {Ap [ p -->0} p.u.c, at
(ii)
f lower semi-cont, on X x {9} ] {Fp [ p -> 0} dense, dp upper semi-cont, at Y I ~ uniformly regular {Ap ] P > 0} dense and p.l.c, at ~ and p.l.c, at Y.
Proof. (i) Consider {y. E Y}s ~ Y, {x. E X}s ~ ~ such that :lp0 > 0, 3n0: Vn -> no, x. E F~(y.). X closed ~ ~ E X ; furthermore because {Ap I P -> 0} is pseudo upper continuous at ~ we have ~ E A0(~) and for every n -> no we have
f(x., y.) >--,;b( y . ) + 3'(P0). Taking the limit (n ~ +oo) in this result and using the continuity properties of f and ,# we obtain: f(~, 9) - 4,(Y) + 3'(p0) > 4,(9)
which implies with ~ E A0(~) that ~ E F0(y).
(ii) We first have to show that Fo(9) C Up>oFp(9); if ~ E Fo(9) we have ~ ~ A0(~) and f(g, 9) > ~k(~). The density of the family {Apl p >-0} implies the existence of a sequence {x. E X}s converging to ~ with x. E Up>o Ap(9) for every n or equivalently Vn ~ N , ::lp. > 0 : x. E Ap.(9).
(1)
On the other hand, the assumption about 3' implies that there exists p0 > 0 such that 0 < 7(Po) < f ( 2 , 9 ) - $(Y), since f(2, 9) > 4~(Y), and then 3no: Vn > no, f(x., 9) > 4,(9)+ 3,(po). This result with (1) shows that for every n - no, there exists p" = min{p., po} such that x. ~ Fp;(9), hence
p>0
Let us now prove the pseudo-lower-continuity of {Fp ] p ~ 0} at Y. Consider p, p ' > 0 ( p ' < p ) , {y, E Y } ~ 9 and ~ E Fp(~). {Ap ] p -> 0} p.l.c at ~ l ::> :Ino, 3{x. E X}s ~ ~: Vn -> no, x. E Ap,(y. ). E Ap(y) J We will show that this sequence {x. E X}~ verifies f(x., y . ) > 4,(y.)+ 3,(p') for every n-> n~. By the converse, assume there exists a sub sequence {x, E X}~, such that Vn ~ N', f(x., y.) < 4'(Yo) + 3,(P')Using the continuity of f and ~b, and taking the limit (n E N') we obtain /(x, y) -< ~(y) + 3,(#) < ~ ( y ) + 3,(o) (because Finally Remarks family at
3, is strictly increasing) and hence a contradiction with ~ E Fp(9). to prove the uniform regularity at Y, we observe that the regularity (cf. 1, 2 in Section 1) and the pseudo-lower-continuity of a p-decreasing 9 imply its uniform regularity (see [5, P.I. 3, p. 10]).
2. A one stage algorithm
2.1. Principle, assumptions of algorithm $A_1$
Here is described a fixed point algorithm to construct a feasible sequence $\{x_n\}_N$ for solving the general problem
(P): maximize $f(x)$, subject to $x \in A \subset R^n$.
2.1.1. Intuitive description
A well known approach to solving problem (P) is to replace the direct solution by an infinite sequence of optimization sub-problems, associated with the current solution $x$, that are easier to solve than (P) because their domain $\Omega(x)$ and/or their objective function $g(\cdot, x)$ are simpler. More precisely, assume that with every point $x \in A$ is associated a subset $\Omega(x) \subset A$. Let us denote by $P_0(x)$ the subset of $\Omega(x)$ defined by
$P_0(x) = \{y \in \Omega(x) \mid g(y,x) > g(x,x)\}$,
where $g$ is the objective function of the sub-problems properly associated with $f$. In the classical approach, [15] or [17], the successor $x'$ of $x$ is chosen among the points which maximize $g(\cdot,x)$ over $P_0(x)$ and hence, to prove convergence, the continuity properties of the optimal solution set of a parametrized problem have to be used. Intuitively, the set $P_0(x)$ may be considered as the union of the sets $P_p(x)$, $p > 0$, where (see Fig. 1)
$P_p(x) = \{y \in \Omega(x) \mid g(y,x) \ge g(x,x) + p\}$.
[Fig. 1.]
In the proposed approach, a step consists, $x \in A$ and $p > 0$ being given, in arbitrarily choosing $x'$ in $P_p(x)$ if $P_p(x)$ is not empty. If $P_p(x)$ is empty, and only in this event, the step consists in setting $x' = x$ and in reducing the parameter $p$ (for example $p = \frac{1}{2}p$). The classical maximization of $g(\cdot,x)$ over $\Omega(x)$ is obviously a particular case. We shall present two versions of such a one stage algorithm: the first one when $x'$ is arbitrarily chosen (no maximizations), the second one when the point $x'$ maximizes $g(\cdot,x)$ over $P_p(x)$. The convergence results are of course stronger with this second version. It is to be noticed that recognizing that a set $P_p(x)$ is empty is as difficult as recognizing that a point $x$ maximizes the function $g(\cdot,x)$ over $\Omega(x)$ within some error $\varepsilon$, this knowledge being required in the classical schemes.
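The step just described can be sketched in a few lines. This is a minimal illustration under assumptions, not the algorithm as stated in the paper: the feasible set is a hypothetical finite grid of integers 0..100, the neighbourhood map `omega`, the sub-problem objective `g`, the reduction factor `beta` and the stopping tolerance `tol` are all choices made for the example, and a practical stopping rule is added because the scheme in the text generates an infinite sequence.

```python
def one_stage(x0, p0, beta, g, omega, tol=1e-6):
    """Sketch of a one stage step rule with a decreasing parameter p.

    At each step, any point y of omega(x) with g(y, x) >= g(x, x) + p is
    accepted as the successor of x. If no such point exists, x is kept and
    p is multiplied by beta (0 < beta < 1). The loop stops once p < tol.
    """
    x, p = x0, p0
    while p >= tol:
        candidates = [y for y in omega(x) if g(y, x) >= g(x, x) + p]
        if candidates:
            x = candidates[0]   # arbitrary choice, no maximization needed
        else:
            p *= beta           # P_p(x) is empty: keep x and reduce p
    return x, p


def g(y, x):
    """Hypothetical sub-problem objective; here it ignores the current x."""
    return -(y - 30) ** 2


def omega(x):
    """Hypothetical neighbourhood: grid points within distance 5 of x."""
    return [y for y in range(x - 5, x + 6) if 0 <= y <= 100]


x_final, p_final = one_stage(x0=90, p0=50.0, beta=0.5, g=g, omega=omega)
```

Starting from 90, the iterates move towards 30 in steps of 5 while each step improves the objective by at least p; once no such improvement is available, p is halved until it falls below the tolerance. Replacing `candidates[0]` by a maximizer of `g` over the candidates would give the second version mentioned in the text.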
2.1.2. Assumptions Let us consider: E C R", a compact subset;
{Fp [ p -> 0} a p-decreasing family from E into ~ ( E ) ; and assume HI: M = {x E E [ Fo(x) = 0} is not empty. H2: The family {Fp [ p -> 0} is uniformly regular on E. H3: There exist h : E ~ R and a : ]0, + o o [ x E ~ R such that a: 3K, V x E E - M : h ( x ) < - K , b: Vx E E - M, Vp > 0 : x ' E F p ( x ) ~ h ( x ' ) > - a ( p , x ) , c: V{x. E E - M}N such that {h(x.)}s has a limit/~ we have Vp>O,
h < limN a(p, x.).
Remark. (9) An example of function a is given, in a lot of applications, by a ( p , x ) = h ( x ) + p . In Section 2.4 it is shown that the classical relation [15] between the original objective function f of problem ( ~ ) and the related function g implies the existence of such a function. (10) V x E E - M , V p > O : h ( x ) < a ( p , x ) . To prove this, consider in H3c a sequence {x.}~ with x. = x Vn. With this remark, assumption H3 can be seen as a property like the strict monotonic property. (11) In terms of p-decreasing family, assumption H2 replaces the uppercontinuity in Zangwill theorem.
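2.2. Description of algorithm $A_1$
Let be given starting values $x_0 \in E$, $\rho_0 > 0$ and a scalar $\beta \in ]0, 1[$.
Step $n$: if $F_{\rho_n}(x_n) = \emptyset$, then set $x_{n+1} = x_n$ and choose $0 < \rho_{n+1} \le \beta\rho_n$; otherwise choose $x_{n+1} \in F_{\rho_n}(x_n)$ and set $\rho_{n+1} = \rho_n$.
The scalar $\beta$ ensures that the sequence $\{\rho_n\}_N$ converges to zero if the event $F_{\rho_n}(x_n) = \emptyset$ occurs infinitely many times.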
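The step of algorithm $A_1$ described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: $E$ is taken to be a finite grid of integers, $h(k) = -k^2$, the family is $F_\rho(x) = \{y \in E \mid h(y) \ge h(x) + \rho\}$ (so that $\alpha(\rho, x) = h(x) + \rho$), and the names `algorithm_a1`, `F` and `h` are hypothetical.

```python
def h(k):
    # Assumed objective on the integer grid E = {-100, ..., 100}.
    return -k * k

E = range(-100, 101)

def F(rho, x):
    # Assumed rho-decreasing family: points improving h by at least rho.
    return [y for y in E if h(y) >= h(x) + rho]

def algorithm_a1(F, x0, rho0, beta=0.5, steps=200):
    """Run a fixed number of steps of the one-stage scheme."""
    x, rho = x0, rho0
    for _ in range(steps):
        candidates = F(rho, x)
        if not candidates:
            # F_rho(x) is empty: keep x and shrink rho.
            rho *= beta
        else:
            # Arbitrary choice in F_rho(x): no maximization is performed.
            x = candidates[0]
    return x, rho

x_final, rho_final = algorithm_a1(F, x0=100, rho0=5000.0)
```

In this toy setting the only point where $F_0$ (taken with a strict inequality) is empty is $k = 0$, and the parameter $\rho$ is only reduced when no sufficiently improving point exists.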
2.3. Convergence results
Theorem 1. Under the assumptions H1, H2, H3 there exists a well-determined subsequence of the sequence $\{x_n\}_N$ constructed by $A_1$, having its accumulation points in $M$.
Proof. We may suppose that $x_n \notin M$ for every $n$; otherwise, if $x_{n_0} \in M$, then $F_0(x_{n_0}) = \emptyset$. Thus we would have $F_\rho(x_{n_0}) = \emptyset$ for every $\rho > 0$ and the sequence would be constant for $n \ge n_0$ and equal to $x_{n_0}$. With $\{x_n \in E - M\}_N$ let us show that:
(1) $\{h(x_n)\}_N$ has a limit $\bar h$,
(2) $\{\rho_n\}_N$ converges to zero.
(1) By construction, by assumption H3b completed by Remark 10, and by H3a, the sequence $\{h(x_n)\}_N$ is a nondecreasing sequence bounded from above; thus it converges to some $\bar h$.
(2) The sequence $\{\rho_n\}_N$ is a non-increasing positive sequence; it has a limit $\rho^*$. Assume that $\rho^* > 0$. From the construction of the sequence $\{\rho_n\}_N$ it is then obvious that there exists $n_0$ such that $\forall n \ge n_0$:
$\rho_n = \rho_{n_0} = \rho^*$, $F_{\rho_n}(x_n) = F_{\rho^*}(x_n) \ne \emptyset$.
So we have, using H3b: $\forall n \ge n_0$, $h(x_{n+1}) \ge \alpha(\rho^*, x_n)$. This is inconsistent with $\{h(x_n)\}_N \to \bar h$ (use H3c and $n \to +\infty$). Thus there exists an infinite well-determined subset $N_1 \subset N$ such that $\forall n \in N_1$: $F_{\rho_n}(x_n) = \emptyset$.
The result follows by applying Proposition 2 to any convergent subsequence of the (compact) sequence $\{x_n\}_{N_1}$.
With Theorems 2 and 3 we propose stronger results, obtained either under stronger assumptions (Theorem 2) or by modifying each step of $A_1$ (version $A_1'$, Theorem 3).
Theorem 2. Under the assumptions H1, H2, H3 and
(H4) $h$ is l.s.c. at any $x \in M$ and u.s.c. on $E$,
(H5) if $x \in M$, then $\forall y \in E$: $h(y) \le h(x)$,
algorithm $A_1$ constructs a sequence $\{x_n\}_N$ having all its accumulation points in $M$.
Remark. Assumption H5 implies that $M$ is the set of optimal solutions of the problem $\sup\{h(x) \mid x \in E\}$. This assumption is often satisfied in the applications (concave maximization) and is similar to the corresponding hypothesis in [15, p. 313].
Proof. The results of Theorem 1 are available. Since $E$ is compact there exists $\bar x \in M$, limit of a subsequence $N' \subset N_1$ (where $N_1$ is determined in Theorem 1). By H5, $\forall y \in E$: $h(y) \le h(\bar x)$, and since $h$ is l.s.c. at $\bar x$, $h(\bar x) \le \bar h$. Now let $x^* = \lim\{x_n\}_{N''}$ be any accumulation point and assume that $x^* \in E - M$. Then $F_0(x^*) \ne \emptyset$, so by H2 there exists $\rho_0 > 0$ such that $F_{\rho_0}(x^*) \ne \emptyset$; H3b and Remark 10 then yield a point $y \in E$ with $h(y) \ge \alpha(\rho_0, x^*) > h(x^*)$. But $h$ is u.s.c. on $E$, hence $h(x^*) \ge \bar h \ge h(\bar x) \ge h(y)$, a contradiction. Hence $x^* \in M$.
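Version $A_1'$. Let be given starting values $x_0 \in E$, $\rho_0 > 0$, $\beta \in ]0, 1[$ and $\{\gamma_n\}_N$ a sequence of nonnegative reals converging to zero.
Step $n$: if $F_{\rho_n}(x_n) = \emptyset$, then set $x_{n+1} = x_n$ and choose $0 < \rho_{n+1} \le \beta\rho_n$; otherwise choose $x_{n+1} \in F_{\rho_n}(x_n)$ such that $\forall y \in F_{\rho_n}(x_n)$, $h(y) \le h(x_{n+1}) + \gamma_n$, and set $\rho_{n+1} = \rho_n$.
End of step $n$.
Theorem 3. Under the assumptions H1, H2, H3, every accumulation point of the sequence $\{x_n\}_N$ constructed by $A_1'$ is in $M$.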
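The only difference with the basic step is that the successor must be a $\gamma_n$-approximate maximizer of $h$ over $F_{\rho_n}(x_n)$. A minimal sketch follows, with the same illustrative assumptions as for a toy instance of the basic scheme (integer grid, $h(k) = -k^2$, $F_\rho(x) = \{y \mid h(y) \ge h(x) + \rho\}$); the tolerance sequence $\gamma_n = 1/(n+1)$ and all function names are hypothetical choices, not taken from the text.

```python
def h(k):
    # Assumed objective on the integer grid E = {-100, ..., 100}.
    return -k * k

E = range(-100, 101)

def F(rho, x):
    # Assumed rho-decreasing family: points improving h by at least rho.
    return [y for y in E if h(y) >= h(x) + rho]

def algorithm_a1_prime(F, h, x0, rho0, gamma, beta=0.5, steps=200):
    """Variant where x_{n+1} maximizes h over F_rho(x_n) within gamma(n)."""
    x, rho = x0, rho0
    for n in range(steps):
        candidates = F(rho, x)
        if not candidates:
            rho *= beta
        else:
            best = max(h(y) for y in candidates)
            # Any candidate whose value is within gamma(n) of the best is acceptable.
            x = next(y for y in candidates if h(y) >= best - gamma(n))
    return x, rho

x_final, rho_final = algorithm_a1_prime(
    F, h, x0=100, rho0=5000.0, gamma=lambda n: 1.0 / (n + 1)
)
```

Because each accepted point is nearly the best one in $F_{\rho_n}(x_n)$, far fewer moves are needed here than with an arbitrary choice.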
Proof. Points 1 and 2 from Theorem 1 are available, that is $\{h(x_n)\}_N \to \bar h$ and $\{\rho_n\}_N \to 0$. Let us then consider any accumulation point $x^* = \lim\{x_n\}_{N'}$ of the sequence $\{x_n\}_N$ and assume that $x^* \notin M$. Then $F_0(x^*) \ne \emptyset$ and H2 imply:
$\exists \rho_0 > 0$, $\exists n_0$: $\forall n \ge n_0$ $(n \in N')$, $F_{\rho_0}(x_n) \ne \emptyset$.
Thus, for every $n \ge n_0$ $(n \in N')$ there exists $y_n \in F_{\rho_0}(x_n)$, and by H3b $h(y_n) \ge \alpha(\rho_0, x_n)$.
Since $\{\rho_n\}_N \to 0$, we can prove as in Theorem 1 the existence of $n_1$ such that $\forall n \ge n_1$ $(n \in N')$: $y_n \in F_{\rho_n}(x_n)$.
But the choice of $x_{n+1}$ in $F_{\rho_n}(x_n)$ implies that $\forall n \ge \max\{n_0, n_1\}$, $n \in N'$: $\alpha(\rho_0, x_n) \le h(y_n) \le h(x_{n+1}) + \gamma_n$. Taking the limit $(n \in N')$ we conclude $\lim_{N'} \alpha(\rho_0, x_n) \le \bar h$, which contradicts H3c.
[...] $\ge 0$: $\alpha(\rho, y) = f(y)$.
3. A two-stage algorithm $A_2$
The algorithm described in this section is similar to the one published by Huard [15, p. 317]. However, the use of $\rho$-decreasing families makes it possible to broaden the applications of this type of general algorithm in the field of mathematical programming. A two-stage algorithm is described as follows: to a feasible point $x$ is associated a set $\Delta(x)$ (for example the set of all the descent directions in unconstrained optimization). An arbitrary $z$ in $\Delta(x)$ being chosen, the successor $x'$ is picked in the set $F(x, z)$, where $F$ is a point-to-set map. The set $F(x, z)$ generally consists of the points of the segment $[x, z]$ which satisfy a given property (for example the points maximizing a function). In Section 3.1 assumptions and notations are given for the description of algorithm $A_2$ (Section 3.2). The convergence is proved in Section 3.3.
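3.1. Assumptions, notations for algorithm $A_2$
Let us consider $E_0 \subset \mathbb{R}^n$, $E_1 \subset \mathbb{R}^n$ two compact subsets;
$\{\Delta_\rho \mid \rho \ge 0\}$ and $\{F_\rho \mid \rho \ge 0\}$ two $\rho$-decreasing families from $E_0$ into $\mathcal{P}(E_1)$ and from $E_0 \times E_1$ into $\mathcal{P}(E_0)$.
Denote
$M_1 = \{x \in E_0 \mid \Delta_0(x) = \emptyset\}$,
$M_2 = \{x \in E_0 \mid \exists z \in \Delta_0(x), F_0(x, z) = \emptyset\}$,
$M = M_1 \cup M_2$.
And assume
H'1: (a) $M \ne \emptyset$, (b) the maps $\Delta_0$ and $F_0$ are such that $M_1 \ne \emptyset \Rightarrow M_2 = \emptyset$.
H'2: (a) The family $\{\Delta_\rho \mid \rho \ge 0\}$ is uniformly regular, (b) the family $\{F_\rho \mid \rho \ge 0\}$ is uniformly regular, (c) the family $\{\Delta_\rho \mid \rho \ge 0\}$ is pseudo-upper-continuous.
H'3: $\exists h : E_0 \to \mathbb{R}$ and $\exists \alpha : ]0, +\infty[ \times ]0, +\infty[ \times E_0 \to \mathbb{R}$ such that
(a) $\exists K < +\infty$: $\forall x \in E_0$, $h(x) \le K$,
(b) $\forall x \in E_0$, $\forall \rho > 0$, $\forall \rho' > 0$, we have: $z \in \Delta_\rho(x)$ and $x' \in F_{\rho'}(x, z)$ imply $h(x') \ge \alpha(\rho, \rho', x)$,
(c) for every sequence $\{x_n \in E_0\}_N$ such that $\{h(x_n)\}_N$ has a limit $\bar h$, we have $\forall \rho > 0$, $\forall \rho' > 0$: $\bar h < \lim_N \alpha(\rho, \rho', x_n)$.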
Remarks.
(1) H'3c implies: $\forall x \in E_0$, $\forall \rho, \rho' > 0$: $h(x) < \alpha(\rho, \rho', x)$.
(2) The function $\alpha$ can be $\alpha(\rho, \rho', x) = h(x) + \rho\rho'$; in many applications $\alpha$ does not depend on the parameter $\rho$.
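3.2. Description of algorithm $A_2$
Starting values: $x_0 \in E_0$, $\rho_0 > 0$, $\rho'_0 > 0$ and a scalar $\beta \in ]0, 1[$.
Step $n$. 1st stage: if $\Delta_{\rho_n}(x_n) = \emptyset$, then $x_{n+1} = x_n$ and $0 < \rho_{n+1}$ [...]
[...] and $\{x_n\}_{N'} \to \bar x$ ($E_0$ and $E_1$ compact).
Hence we have, using the p.u.c. of $\{\Delta_\rho \mid \rho \ge 0\}$: $\exists \bar z \in \Delta_0(\bar x)$, $F_0(\bar x, \bar z) = \emptyset$, that is $M_2 \ne \emptyset$.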
This is inconsistent with $M_1 \ne \emptyset$ (see H'1b). In conclusion:
(a) If $M_1 \ne \emptyset$, the sequence $N_1$ of indices such that $n \in N_1 \Rightarrow \Delta_{\rho_n}(x_n) = \emptyset$ is well-determined. Proposition 2 proves that the accumulation points of $\{x_n\}_{N_1}$ are in $M$.
(b) If $M_1 = \emptyset$, the sequence $N_2$ of indices such that $\forall n \in N_2$, $\exists z_n \in \Delta_{\rho_n}(x_n) \subset \Delta_{\rho^*}(x_n)$: $F_{\rho'_n}(x_n, z_n) = \emptyset$ ($\rho^*$ is the limit of $\{\rho_n\}_N$) is well-determined, and again Proposition 2 proves the theorem.
The following theorem (similar to Theorem 2 in Section 2) holds for algorithm $A_2$.
Theorem 5. If the function $h$ is l.s.c. at any $x \in M$ and u.s.c. on $E_0$, if $x \in M$ implies $\forall y \in E_0$, $h(y) \le h(x)$, and if the assumptions H'1, H'2, H'3 are satisfied, then every accumulation point of the sequence $\{x_n\}_N$ is in $M$.
The proof is omitted because it is similar to the proof of Theorem 2. (Remark: $x_0 \notin M \Rightarrow \Delta_0(x_0) \ne \emptyset$ and $\forall z \in \Delta_0(x_0)$, $F_0(x_0, z) \ne \emptyset$.)
As in Section 2.3 it is possible to derive particular versions from algorithm $A_2$ (denoted by $A_2'$ and $A_2''$). In version $A_2'$, inaccurate maximizations are performed in $F_{\rho'_n}(x_n, z_n)$ to determine $x_{n+1}$ (that is, if $F_{\rho'_n}(x_n, z_n) \ne \emptyset$, choose $x_{n+1} \in F_{\rho'_n}(x_n, z_n)$ such that $\forall y \in F_{\rho'_n}(x_n, z_n)$, $h(y)$ [...]
[...] $\bar h$, $\{\rho'_n\}_N \to 0$ and $\{\rho_n\}_N \to \rho^* > 0$.
Now let us consider $x^* = \lim\{x_n\}_{N'}$, any accumulation point. If there exists $N'' \subset N'$ such that
$\forall n \in N''$, $z_n \in \Delta_{\rho_n}(x_n) \subset \Delta_{\rho^*}(x_n)$ and $F_{\rho'_n}(x_n, z_n) = \emptyset$,
then Proposition 2, applied to a subsequence $N''' \subset N''$ such that $z_n \to \bar z$ $(n \in N''')$, implies: $\exists \bar z \in \Delta_0(x^*)$ (by H'2c), $F_0(x^*, \bar z) = \emptyset$. If such an $N''$ does not exist, that is:
$\exists n_0$: $\forall n \ge n_0$ $(n \in N')$, $z_n \in \Delta_{\rho_n}(x_n)$ and $F_{\rho'_n}(x_n, z_n) \ne \emptyset$,
the converse, $x^* \notin M$, leads to a contradiction (use H'2b, H'2c).
Case (b). The proof is similar and presents no particular difficulty; the use of assumption H'2c is replaced by the upper-continuity of $\Delta_0$.

Theorem 6'. If the assumptions H'1, H'2, H'3 are satisfied, then every accumulation point of the sequence constructed by $A_2''$ is in $M$.
Proof. If $M_1 = \emptyset$, or if $M_1 \ne \emptyset$ and $\Delta_0$ is upper-continuous, then Theorem 6 proves the result. In the other case, it is clear that $\{\rho_n\}_N \to 0$. Let us consider $x^* = \lim\{x_n\}_{N'}$, any accumulation point. If $\Delta_{\rho_n}(x_n) = \emptyset$ for infinitely many $n \in N'$, then Proposition 2 implies $x^* \in M$. Otherwise, if we assume $x^* \notin M$, the particular choice of $z_n$ and the uniform regularity of $\{\Delta_\rho \mid \rho \ge 0\}$ imply the existence of $\bar\rho > 0$ such that $n \in N' \Rightarrow z_n \in \Delta_{\bar\rho}(x_n)$. In both cases ($\rho'_n \to 0$ or not), a contradiction with H'3c follows.
4. Applications to optimization
Algorithm 5.2 (p. 318 in [15]) is a particular case of algorithm $A_2$. It corresponds to the version $A_2'$ and Theorem 6, case (a), where the map $\Delta_0$ is nevertheless assumed to be upper-continuous, because in that scheme the family $\{\Delta_\rho \mid \rho \ge 0\}$ is such that $\forall \rho \ge 0$, $\Delta_\rho = \Delta_0$. Many methods are particular cases of the algorithm of Huard [15]: for example gradient-related methods (conjugate gradient methods, ...), Frank and Wolfe's method, Rosen's method, the linearized method of centers, Zoutendijk's method, and so on. We shall now show, on two new examples only, that algorithm $A_2$ can model well-known methods that could not be modeled with the classical approach.
4.1. Linearized method of centers with partial linearization [11, 4]
The problem to be solved is:
maximize $f(x)$, subject to $g_i(x) \ge 0$, $i = 1, \ldots, m$, $x \in B$,
where the functions are concave and continuously differentiable and $B$ is a compact polyhedron. For a given $\varepsilon > 0$, we denote by $d'_\varepsilon : B \times B \to \mathbb{R}$ the function defined by
$d'_\varepsilon(z; x) = \min\{f'(z; x) - f(x),\ g'_i(z; x) \mid i \in I_\varepsilon(x)\}$,
with $f'$ and $g'_i$ being the tangential approximations of $f$ and $g_i$ at the point $x$, and with $I_\varepsilon(x) = \{i \in \{1, \ldots, m\} \mid g_i(x) < \varepsilon\}$. This function $d'_\varepsilon$ is actually the "partial linearized F-distance" related to the particular F-distance
$d(t, f(x)) = \min\{f(t) - f(x),\ g_i(t) \mid i \in \{1, \ldots, m\}\}$.
The method consists in: (1) maximizing $d'_\varepsilon(\cdot; x)$ over $B$ and choosing an optimal solution $z$, (2) maximizing $d(\cdot, f(x))$ on the segment $[x, z]$ and choosing as a successor of $x$ any solution of this one-dimensional optimization. Note that this method is a particular case of the "method of centers by upper bounding functions" [12]. We could verify that this latter method is an application of algorithm $A_2$. It is not possible to interpret this method with the two-stage scheme given in [15] because the function $d'_\varepsilon$ is only u.s.c. on $B$. Thus the map
$M'(x) = \{z \in B \mid d'_\varepsilon(z; x) \ge d'_\varepsilon(t; x)\ \forall t \in B\}$
is not upper-continuous. But, by setting:
$\forall \rho > 0$, $\Delta_\rho(x) = \{z \in B \mid d'_\varepsilon(z; x) \ge d'_\varepsilon(t; x)\ \forall t \in B\}$,
$\Delta_0(x) = \{z \in B \mid d'_\varepsilon(z; x) \ge d(t, f(x))\ \forall t \in B\}$,
and
$\forall \rho' > 0$, $F_{\rho'}(x, z) = \{y \in [x, z] \mid d(y, f(x)) \ge \rho'\}$,
$F_0(x, z) = \{y \in [x, z] \mid d(y, f(x)) > 0\}$,
we can verify that the linearized method of centers, when only the constraints active within $\varepsilon$ are linearized, is an application of $A_2$, by setting:
$E_0 = \{x \mid g_i(x) \ge 0,\ i = 1, \ldots, m\} \cap B$, $E_1 = B$,
$h(x) = f(x)$, $\alpha(\rho, \rho', x) = f(x) + \rho'$.
The family $\{\Delta_\rho \mid \rho \ge 0\}$ is of course $\rho$-decreasing, and it is uniformly regular because $\Delta_\rho(x) \ne \emptyset$ for every $x \in E_0$ and $\rho \ge 0$. Furthermore, $B$ being closed, $d'_\varepsilon$ being u.s.c. and $\max\{d(t, f(x)) \mid t \in B\}$ being continuous (classical result), we have the pseudo-upper-continuity of the family $\{\Delta_\rho \mid \rho \ge 0\}$. Besides, Proposition 3 implies the uniform regularity of the family $\{F_{\rho'} \mid \rho' \ge 0\}$ because the map $(x, z) \to [x, z]$ is lower-continuous. Finally, with the choices made for $h$ and $\alpha$, assumption H'3 is satisfied. This shows that this method is an application of $A_2$; Theorem 5 is available because the set $M$ ($= M_2$ in this case) is the set of all the optimal solutions of the considered problem (see [15, p. 325]).
4.2. Sub-differentiable optimization
Let us consider the algorithm of Dem'yanov [7] for solving the problem:
minimize $f(x)$, subject to $x \in \mathbb{R}^n$,
where $f$ is defined by $f(x) = \max\{f_i(x) \mid i = 1, \ldots, m\}$, the $f_i$ being continuously differentiable, and $A = \{x \in \mathbb{R}^n \mid f(x)$ [...]
[...] a $\rho$-decreasing family defined by $\forall x \in A$, $\forall \rho > 0$, $\Delta_\rho(x) = \{g \in E_1 \mid \max \langle \nabla f_i(x), g \rangle$ [...]
[...] the family defined by
$\forall \rho' > 0$, $F_{\rho'}(x, g) = \{y = x + \theta g,\ \theta \ge 0 \mid f(y) \le f(x) - \rho'\}$,
$\rho' = 0$, $F_0(x, g) = \{y = x + \theta g,\ \theta \ge 0 \mid f(y)$ [...]
[...] and every direction $g \in \Delta_0(x)$ being a descent direction, we have $\{x \in A \mid \exists g \in \Delta_0(x), F_0(x, g) = \emptyset\} = \emptyset$, hence $M_2 = \emptyset$;
Assumptions H'3a, b, c are obviously satisfied with the particular choice of $h$ and $\alpha$. Let us prove H'2. The family $\{\Delta_\rho \mid \rho \ge 0\}$ is $\rho$-decreasing because $\rho'$ [...]

[...] $h(x)$. Let ${}^{0}x \in E$ be given; $\Gamma$ defines the following sequence: if ${}^{k}x \in P$, stop; otherwise ${}^{k+1}x \in \Gamma({}^{k}x)$, $k \in N$. Under these conditions, every accumulation point ${}^{*}x$ of the sequence satisfies ${}^{*}x \in P$.
Corollary 2. Suppose: $E \subset \mathbb{R}^n$ a compact set, $P \subset E$, $I$ a finite set of indices, $h : E \to \mathbb{R}$ a continuous function on $E$ and $\Gamma_i : E \to \mathcal{P}(E)$, $i \in I$, a set of point-to-set maps such that $\forall i \in I$, $\forall x \in E - P$:
($\alpha'$) $\Gamma_i(x) \ne \emptyset$ and $\Gamma_i$ closed at $x$,
($\beta'$) $x' \in \Gamma_i(x) \Rightarrow h(x') > h(x)$.
Let us define $\Gamma = \bigcup_{i \in I} \Gamma_i$. We consider the following sequence: ${}^{0}x \in E$; if ${}^{k}x \in P$, stop; otherwise ${}^{k+1}x \in \Gamma({}^{k}x)$, $k \in N$. Then every accumulation point ${}^{*}x$ of the sequence satisfies ${}^{*}x \in P$.

Proof. The assumptions of Zangwill's theorem are satisfied: conditions ($\alpha$) and ($\beta$) follow directly from conditions ($\alpha'$) and ($\beta'$) respectively.
Remark 3. When only one $\Gamma_i$ is considered, we obtain Zangwill's statement.

Remark 4. Taking a successor ${}^{k+1}x \in \Gamma({}^{k}x)$ means choosing it arbitrarily in one of the ranges $\Gamma_i({}^{k}x)$, i.e. in a free manner. A particular realization consists, for instance, in using all the $\Gamma_i$'s one after the other in a determined order, i.e. in a cyclic way. In this case, if $I = \{1, 2, \ldots, p\}$ for instance, taking the natural order, we can define $\Gamma'$ by $\Gamma'(x) = \Gamma_p \circ \Gamma_{p-1} \circ \cdots \circ \Gamma_1(x)$, where $\Gamma'$ is the composition of $p$ point-to-set maps.
Now we give a second corollary (Dubois [5]).
Corollary 5. Suppose: $E$ a compact set in $\mathbb{R}^n$, $P \subset E$, $\Gamma : E \to \mathcal{P}(E)$ a point-to-set map, and $h : E \to \mathbb{R}$ a continuous function on $E$ such that $\forall x \in E - P$:
($\alpha$) $\Gamma(x) \ne \emptyset$ and $\Gamma$ closed at $x$,
($\beta$) $x' \in \Gamma(x) \Rightarrow h(x') > h(x)$.
We consider the following algorithm: ${}^{0}x \in E$; if ${}^{k}x \in P$, stop; otherwise choose ${}^{k+1}x \in E$ such that $h({}^{k+1}x) \ge h({}^{k}y)$ with ${}^{k}y \in \Gamma({}^{k}x)$. Then every accumulation point ${}^{*}x$ of the sequence satisfies ${}^{*}x \in P$.
J.Ch. Fiorot, P. Huard / Composition of algorithms

Other results related to the analysis of convergence of mathematical programming algorithms have been considered in the literature, for instance: G.G.L. Meyer [11, 12, 13], Huard [8], R.R. Meyer [16], Polak [23].
2. Notation and hypotheses
The problem is: to maximize a continuous function $f$ over a subset $A$ of $\mathbb{R}^n$. For this we give $p$ point-to-set maps $\Delta_i$, $i \in I = \{1, 2, \ldots, p\}$, from $A$ into $\mathcal{P}(A)$. We require the following hypotheses:
(H1) $x \in \Delta_i(x)$, $\forall i \in I$, $\forall x \in A$,
(H2) $\Delta_i$ is continuous on $A$ (closed and lower semi-continuous),
(H3) $\forall x \in A$, $f : A \to \mathbb{R}$ has a unique maximum over $\Delta_i(x)$,
(H4) there exists ${}^{0}x \in A$ such that $E_0 = \{x \in A \mid f(x) \ge f({}^{0}x)\}$ is a compact subset.

2.1. Standard examples of such $\Delta_i$
(a) Unconstrained case ($A = \mathbb{R}^n$). The range $\Delta_i(x)$ is an $r_i$-dimensional linear variety containing $x$, with $\sum_i r_i = n$ ($p \le n$) and $\mathbb{R}^n = \sum_{i \in I} \Delta_i(0)$. In particular for $p = n$, $r_i = 1$, $\Delta_i(x) = \{u \in \mathbb{R}^n \mid u = x + \theta e_i,\ \theta \in \mathbb{R}\}$ with $e_i$ the $i$-th vector of any basis of $\mathbb{R}^n$.
(b) Constrained case. Let be given $V_i$, $i \in I$, linear subspaces of $\mathbb{R}^n$ such that $\mathbb{R}^n = V = \prod_i V_i$, and $K = \prod_{i \in I} K_i$ with $K_i$ a nonempty closed subset of $V_i$. In Cea and Glowinski [2], $V$ and $V_i$ are reflexive Banach spaces. Define $A = K$ and for any $x \in K$ set:
$\Delta_i(x) = \{y \in K \mid y = (x_1, x_2, \ldots, x_{i-1}, y_i, x_{i+1}, \ldots, x_p),\ y_i \in K_i\}$
or
$\Delta_i(x) = (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_p) + K_i \subseteq x + V_i$.
Remark 6. In the previous examples all sets $\Delta_i(x)$ are convex; but this condition is not required by Assumptions (H1)-(H4).

2.2. Definition and properties
Definition 7. For any $x \in E_0$ and any $i \in I$, we define the point-to-set map $M_i$,
$M_i : x \to \{u \in \Delta_i(x) \mid f(u) \ge f(t),\ \forall t \in \Delta_i(x)\}$.
If the subset $M_i(x)$ is reduced to a point, which is the case when (H3) is used, we shall denote this point by $M_i(x)$ instead of $\{M_i(x)\}$. For any positive integer $m$ we also define $M = M_{i_m} \circ M_{i_{m-1}} \circ \cdots \circ M_{i_1}$ with $i_j \in I$ for $j = 1, 2, \ldots, m$.
Remark 8. The following results about convergence are established independently of the methods used to maximize $f$ over $\Delta_i(\cdot)$.
Property 9. Under Hypotheses (H1)-(H4), $M$ is a continuous function on $E_0$.
Proof. For any $i_j \in I$ and for any $x \in E_0$, from (H3), the set $M_{i_j}(x)$ is nonempty and $M_{i_j}$ is single-valued. Hypothesis (H2) and the continuity of $f$ give the closedness of $M_{i_j}$. Moreover, from (H1), $M_{i_j}(x)$ belongs to the compact $E_0$. Consequently $M_{i_j}$ is a continuous single-valued function on $E_0$, and so is $M$.

Lemma 10. Under Hypotheses (H1) and (H3): $x = M_{i_j}(x)$ for every $i_j \in I$, $j = 1, 2, \ldots, m$, is equivalent to $x = M(x)$.
Proof. (a) From $x = M_{i_j}(x)$ for every $i_j \in I$, $j = 1, 2, \ldots, m$, we obtain directly from the definition of $M$: $x = M(x)$.
(b) If $x = M(x)$, let ${}^{0}y = x, {}^{1}y, {}^{2}y, \ldots, {}^{m}y = x$, with ${}^{j}y = M_{i_j}({}^{j-1}y)$, be the sequence originating and arriving at $x$. From (H1) and the definition of $M_{i_j}$ we obtain $f({}^{j-1}y) \le f({}^{j}y)$. As ${}^{0}y = {}^{m}y = x$ it follows that $f({}^{j}y) = f(x)$, and from (H1) and (H3), for any $i_j \in I$, we have ${}^{j}y = x$, $j = 1, 2, \ldots, m$; consequently $x = M_{i_j}(x)$.
3. Study of free steering utilization of algorithms
3.1. Algorithm I
Let ${}^{0}x$ be a given starting point in $A$, and consider the sequence $({}^{k}x)$, $k \in N$, given by: if ${}^{k}x = M_i({}^{k}x)$ for every $i \in I$ then stop, else:
${}^{k+1}x \in \bigcup_{i \in I} M_i({}^{k}x)$.
The primitive problem is to maximize $f$ over $A$; in Section 3.2, with only Hypotheses (H1)-(H4), we are concerned with seeking the fixed points of the $M_i$'s. These fixed points are related to the optimum of $f$ over $A$. Additional hypotheses such as differentiability and strict concavity of $f$ then permit us to obtain the maximum. We always suppose in the sequel that the sequence $({}^{k}x)$, $k \in N$, given by the algorithm involves infinitely many distinct points; otherwise the last given point satisfies ${}^{k}x = M_i({}^{k}x)$ for every $i \in I$, i.e. ${}^{k}x$ is a fixed point for each $M_i$.
Property 11. For the sequence $({}^{k}x)$, $k \in N$, given by Algorithm I we have: $\|{}^{k+1}x - {}^{k}x\| \to 0$ when $k \to \infty$.
Proof. We suppose the converse, i.e. there exist a real $\delta > 0$ and a subsequence $({}^{k}x)$, $k \in N_1 \subset N$, such that $\|{}^{k+1}x - {}^{k}x\| > \delta$. As $({}^{k}x, {}^{k+1}x) \in E_0 \times E_0$ we can find a subsequence converging to $({}^{*}x, {}^{**}x)$ with $\|{}^{**}x - {}^{*}x\| \ge \delta$. But $I$ is finite, and there exists at least one function $M_i$ which is used infinitely many times, such that ${}^{k+1}x = M_i({}^{k}x)$. From the continuity of $M_i$ it follows that ${}^{**}x = M_i({}^{*}x)$. But $f({}^{*}x) = f({}^{**}x)$; this equality and Hypothesis (H3) give ${}^{*}x = {}^{**}x$, yielding a contradiction.
Remark 12. Following Ostrowski [21] we know that either the entire sequence converges or it possesses a set of accumulation points which constitutes a continuum. Consequently if, as in Meyer [14, 15], we add the following hypothesis: for any given $\lambda$, $\{x \in E_0 \mid x = M(x) \text{ and } f(x) = \lambda\}$ is finite, we obtain the convergence of the sequence $({}^{k}x)$, $k \in N$. In a forthcoming example (Example 17) we shall note that convergence may happen even if this last hypothesis is not satisfied.
3.2. Study of fixed points given by Algorithm I
We consider an arbitrary accumulation point ${}^{*}x$ of the sequence $({}^{k}x)$, $k \in N$, given by Algorithm I and the corresponding subsequence $({}^{k}x)$, $k \in \hat N \subset N$, converging to ${}^{*}x$. For a given but arbitrary integer $m$ we define a partition of the indices of the sequence (i.e. in fact a partition of the sequence) into subsets of $m$ successive terms. Then we consider the groups of $m$ successive iterates which give at least one point ${}^{k}x$, $k \in \hat N$, of the subsequence. Among these infinitely many subsets (or groups) there are infinitely many which use the "operations" $M_{i_1}, M_{i_2}, \ldots, M_{i_m}$ in this order. This follows from the fact that, given $m$ successive iterates taken in the partitioning, there is a finite number ($p^m$) of possible ways to maximize over the $\Delta_i$'s in a given order. Repetitions are allowed, particularly if $m > p$. Of course if $m < p$ only a part of the $\Delta_i$'s will be used. Let us set $M = M_{i_m} \circ M_{i_{m-1}} \circ \cdots \circ M_{i_1}$, the composition of these $m$ successive "operations", where $i_1, i_2, \ldots, i_m$ is an ordered sequence of indices taken in $I$.
Lemma 13. Under Hypotheses (H1) to (H4) and with the above definition of $M$, every accumulation point ${}^{*}x$ of the sequence given by Algorithm I satisfies ${}^{*}x = M({}^{*}x)$, and ${}^{*}x$ is also a fixed point for each $M_j$, $j \in \{i_1, i_2, \ldots, i_m\}$.
Proof. Let $N'' \subset N$ be the subset of indices of the points which give, by the point-to-set map $M_{i_1}$, the first point of each special group of $m$ successive iterates (recall that each special group of $m$ successive iterates contains at least one point ${}^{k}x$, $k \in \hat N$). We have for $k \in N''$: ${}^{k+m}x = M_{i_m} \circ M_{i_{m-1}} \circ \cdots \circ M_{i_1}({}^{k}x)$. If $k$ and $k'$ are two successive integers in $N''$ we may have $k' = k + m$ if there are two successive groups of "operations" $M_{i_1}, M_{i_2}, \ldots, M_{i_m}$, each of them giving at least one point ${}^{k}x$, $k \in \hat N$. We consider the subsequence $({}^{k}x)$, $k \in N''$. In particular, thanks to (H1) and (H3), it is possible to satisfy the requirements of Corollary 5. Instead of $E$, $\Gamma$, ${}^{k}y$, ${}^{k}x$, $P$ and $h$ we take respectively $E_0$, $M$, ${}^{k+m}x$, ${}^{k}x$ (with $k \in N''$), $\{x \in E_0 \mid x = M(x)\}$ and $f$. Property 9 gives point ($\alpha$). Condition ($\beta$) is written in the following way: $x' = M(x)$, $x \ne M(x)$ implies $f(x') > f(x)$. If we suppose the contrary, i.e. $f(x') = f(x)$, then from (H1) the sequence ${}^{0}y = x, {}^{1}y, \ldots, {}^{m}y = x'$ with ${}^{j}y = M_{i_j}({}^{j-1}y)$ for $j \in \{1, 2, \ldots, m\}$ is such that $f(x) = f({}^{1}y) = f({}^{2}y) = \cdots = f({}^{m}y) = f(x')$. And now (H3) gives $x = {}^{1}y = {}^{2}y = \cdots = {}^{m}y = x'$, which cannot hold. Then any accumulation point $\tilde x$ of the subsequence $({}^{k}x)$, $k \in N''$, satisfies $\tilde x = M(\tilde x)$. Returning to the subsequence $({}^{k}x)$, $k \in \hat N$, we can write it $({}^{k+p_k}x)$ with $k \in N''$ and $p_k \in \{1, 2, \ldots, m\}$. In a group of $m$ successive iterates the subsequence $({}^{k}x)$, $k \in \hat N$, may have several elements, i.e. for a given $k \in N''$ there may exist several $p_k \in \{1, 2, \ldots, m\}$, from which we just pick one, denoted $p_k$, and we still have ${}^{k+p_k}x \to {}^{*}x$ when $k \to \infty$ (with $k \in N''$). Property 11 gives:
$\|{}^{k+1}x - {}^{k}x\| \to 0$, $\|{}^{k+2}x - {}^{k+1}x\| \to 0$, ..., $\|{}^{k+m}x - {}^{k+m-1}x\| \to 0$ for $k \in N''$;
then $\|{}^{k+p_k}x - {}^{k}x\| \to 0$ for $k \in N''$ and consequently ${}^{*}x = \tilde x$, therefore ${}^{*}x = M({}^{*}x)$ and from Lemma 10: ${}^{*}x = M_{i_j}({}^{*}x)$ for $j \in \{1, 2, \ldots, m\}$.
Counterexample 14. If we drop Hypothesis (H3) we cannot apply Corollary 5, i.e. in fact Zangwill's Theorem 1, as shown by an example in $\mathbb{R}^2$ where the graph of a quasi-concave function $f$ is drawn (Fig. 1). Outside the square $abcd$ the value of $f$ is zero.
$\Delta_i(x) = \{u \mid u = x + \theta e_i,\ \theta \in \mathbb{R}\}$, $i = 1, 2$,
$M_1(b) = (\delta)$, $M(b) = M_2 \circ M_1(b)$, i.e. the segment $[c, a]$ union the shaded portions.
Fig. 1.
We have $b \notin M(b)$, $a$ or $c \in M(b)$, and however $f(c) = f(a) = f(b) = 0$.
Introduction of Hypothesis (H5).
(H5) $\exists m \in N$, $m \ge p$, such that $\forall j \in N$, $\forall i \in I$, $\exists k \in [jm + 1, (j + 1)m]$ satisfying ${}^{k+1}x = M_i({}^{k}x)$.
Algorithm I implicitly defines a sequence $(i_k)$, $k \in N$, satisfying ${}^{k+1}x = M_{i_k}({}^{k}x)$. Arbitrary sequences $(i_k)$ satisfying Hypothesis (H5) will be called "essentially periodic", by analogy with the property introduced in [19, p. 513] for a sequence of vectors. Effectively (H5) may be written
(H5') $\exists m'$, $m' \ge p$, such that $\forall j \in N$, $\forall i \in I$, $\exists k \in [j + 1, j + m']$ satisfying ${}^{k+1}x = M_i({}^{k}x)$.
By setting $m' = 2m$ for example, (H5) implies (H5') and conversely, setting $m = m'$, (H5') implies (H5).
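Hypothesis (H5) is a statement about the index sequence $(i_k)$ only, so it can be checked mechanically on any finite prefix. The following sketch is an illustration and not part of the original text: the function name is hypothetical, indices are numbered from 0 rather than from 1, and a finite prefix can of course only refute the hypothesis or fail to refute it.

```python
def is_essentially_periodic(indices, p, m):
    """Check (H5) on a finite prefix of the index sequence.

    indices: list of operation indices in {0, ..., p-1} (0-based, an assumption)
    m:       block length; every complete block of m consecutive indices
             must contain each of the p operations at least once.
    """
    if m < p:
        return False
    required = set(range(p))
    full_blocks = len(indices) // m
    return all(
        required <= set(indices[j * m:(j + 1) * m])
        for j in range(full_blocks)
    )

cyclic = [0, 1, 2] * 4                 # cyclic use of three operations
free = [0, 1, 1, 0, 0, 0, 1, 1]        # a free-steering choice of two operations
```

For the sequence `free`, the block length $m = 2$ fails because the third block uses only operation 0, whereas $m = 4$ succeeds on this prefix.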
Theorem 15. Under Hypotheses (H1) to (H5) we have:
(i) every accumulation point ${}^{*}x$ of the sequence generated by Algorithm I satisfies ${}^{*}x = M_i({}^{*}x)$ for any $i \in I$;
(ii) if $x = M_i(x)$ for every $i \in I$ is a sufficient condition of optimality, then ${}^{*}x$ is a maximum of $f$ over $A$;
(iii) if in addition this maximum is unique, then the sequence $({}^{k}x)$, $k \in N$, converges to this maximum.
Proof. (i) Hypothesis (H5) means that in any group of $m$ successive iterates every "operation" $i$ is used at least once. That is to say, all the functions $M_i$, $i \in I$, appear in $M$. Then by Lemma 13, any accumulation point ${}^{*}x$ of the sequence $({}^{k}x)$, $k \in N$, is a fixed point for each $M_i$, $i \in I$.
(ii) Evident.
(iii) Let $\tilde x$ be an arbitrary accumulation point of $({}^{k}x)$, $k \in N$. By (H1), the sequence of values $f({}^{k}x)$ is monotone nondecreasing; then we obtain $f(\tilde x) = f({}^{*}x)$. Hypothesis (ii) and the uniqueness of the maximum yield $\tilde x = {}^{*}x$.
3.3. Application to the cyclic case
Algorithm I used in a cyclic way is written in the following manner:
Algorithm I bis. Let ${}^{0}x$ be a starting point given in $A$; if ${}^{k}x = M_i({}^{k}x)$ for any $i \in I$, then stop, else:
${}^{k+1}x = M_i({}^{k}x)$ with $k \equiv i \pmod p$.
Gathering $p$ successive iterations and defining $M = M_p \circ M_{p-1} \circ \cdots \circ M_1$, this cyclic algorithm may also be written: let ${}^{0}x$ be a starting point given in $A$; if ${}^{k}x = M({}^{k}x)$, then stop, else ${}^{k+1}x = M({}^{k}x)$. This algorithm has been treated in [29, p. 111] and in [19, p. 515] under a less general form (called "univariate relaxation method" following the terminology of [19, p. 224]). If the sequence $({}^{k}x)$, $k \in N$, has infinitely many distinct points, Hypothesis (H5) is automatically satisfied, and we have the results of Theorem 15 under Hypotheses (H1) to (H4): any accumulation point of $({}^{k}x)$, $k \in N$, given by Algorithm I bis is a fixed point for each $M_i$, $i \in I$.
Let us give some examples relating to Algorithm I or I bis.
Example 16. The set $A$ is $\mathbb{R}^2$. Let us define
$\Delta_i(x) = \{y \in \mathbb{R}^2 \mid y = x + \theta e_i,\ \theta \in \mathbb{R}\}$, $i = 1, 2$,
$f : (x_1, x_2) \to -\left|(x_1 - 2)\cos\alpha + (x_2 - 2)\sin\alpha\right|^{1/2} - \left|-(x_1 - 2)\sin\alpha + (x_2 - 2)\cos\alpha\right|^{1/2}$
with $\alpha \not\equiv 0 \pmod{\pi/4}$. This function $f$ is not quasi-concave and its directional derivatives are not defined along $L_1$ and $L_2$ (Fig. 2), but $f$ satisfies (H3). The iterates ${}^{k}x$ given by Algorithm I bis are alternately on $L_1$ and $L_2$ and converge to $O$ (which is the unique maximum).
Fig. 2.
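The behaviour described in Example 16 can be reproduced numerically. The sketch below is an illustration under stated assumptions: the angle $\alpha = \pi/6$ and the starting point $(3, 3)$ are arbitrary sample values, the function names are hypothetical, and indices are 0-based. It uses the fact that along a coordinate line $f$ is convex between and beyond the two points where one of the absolute values vanishes, so the exact line maximum is attained at one of these two kinks.

```python
import math

ALPHA = math.pi / 6                      # assumed sample angle
C, S = math.cos(ALPHA), math.sin(ALPHA)

def rotated(x):
    # Rotated coordinates (u, v) centred at the maximum O = (2, 2).
    w1, w2 = x[0] - 2.0, x[1] - 2.0
    return C * w1 + S * w2, -S * w1 + C * w2

def f(x):
    u, v = rotated(x)
    return -math.sqrt(abs(u)) - math.sqrt(abs(v))

def line_max(x, i):
    """M_i(x): exact maximizer of f over the line x + theta * e_i."""
    u, v = rotated(x)
    # Derivatives of (u, v) with respect to theta along e_i.
    du, dv = (C, -S) if i == 0 else (S, C)
    best = None
    for theta in (-u / du, -v / dv):     # the two kinks u = 0 and v = 0
        y = list(x)
        y[i] += theta
        if best is None or f(y) > f(best):
            best = y
    return tuple(best)

def algorithm_i_bis(x0, steps=30):
    # Cyclic use of the two coordinate maximizations.
    x = tuple(x0)
    for k in range(steps):
        x = line_max(x, k % 2)
    return x

x_final = algorithm_i_bis((3.0, 3.0))
```

With this angle the distance to $O$ shrinks by the factor $\tan\alpha$ at each step, and the iterates lie alternately on the lines $u = 0$ and $v = 0$.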
Example 17. In Remark 12 we pointed out that the iterates can converge even if Meyer's hypothesis is not satisfied. We give such an example with $A = \mathbb{R}^2$ and
$f : (x_1, x_2) \to \min(1 - |x_1 + x_2|,\ -2|x_1 - x_2|)$.
The set of optimal solutions (Fig. 3) is the segment $[a, b]$ with $a = (-\tfrac12, -\tfrac12)$, $b = (\tfrac12, \tfrac12)$. The corresponding optimum value of $f$ is zero. The maximizations are done on $\Delta_i$ with $\Delta_i(x) = \{y \in \mathbb{R}^2 \mid y = x + \theta e_i,\ \theta \in \mathbb{R}\}$, $i = 1, 2$. Although $\{x \in E_0 \mid x = M(x),\ f(x) = 0\} = [a, b]$ is an infinite set, for an arbitrary starting point ${}^{0}x$ the sequence of iterates converges to $a$ or $b$ or reaches a point of $[a, b]$ in one step. In this example $f$ also satisfies (H3), but Theorem 15 does not guarantee the convergence.
Fig. 3.
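Example 17 is easy to check by direct computation, because the line maximizer has a closed form: if the fixed coordinate $c$ satisfies $|c| \le \tfrac12$ the maximizer is $\theta = c$, and otherwise it is the point where the two pieces of the minimum cross, $\theta = (1 + c)/3$ for $c > \tfrac12$ and $\theta = (c - 1)/3$ for $c < -\tfrac12$. The sketch below relies on this derivation; the function names, the starting points and the non-cyclic index pattern are illustrative assumptions.

```python
def f(x):
    x1, x2 = x
    return min(1 - abs(x1 + x2), -2 * abs(x1 - x2))

def line_max(x, i):
    """M_i(x): exact maximizer of f over the line x + theta * e_i."""
    c = x[1 - i]                 # coordinate that stays fixed
    if abs(c) <= 0.5:
        t = c                    # lands on the optimal segment [a, b]
    elif c > 0.5:
        t = (1 + c) / 3          # crossing point of the two pieces
    else:
        t = (c - 1) / 3          # symmetric case
    y = list(x)
    y[i] = t
    return tuple(y)

def free_steering(x0, pattern, cycles):
    # Apply the operations in the order given by a repeated index pattern.
    x = tuple(x0)
    for _ in range(cycles):
        for i in pattern:
            x = line_max(x, i)
    return x

# A non-cyclic but essentially periodic choice of operations (0-based indices).
x_far = free_steering((3.0, 2.0), pattern=(0, 0, 1, 1, 1, 0), cycles=20)
x_one_step = line_max((3.0, 0.2), 0)
```

Starting from $(3, 2)$ the iterates approach $b$ geometrically, while starting from $(3, 0.2)$ a single maximization already reaches the segment $[a, b]$.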
3.4. Differentiable case
Introduce (H6): $f$ is differentiable. Now recall the well-known notion of cone of tangents to $A$ at a point $x \in A$:
$T(A, x) = \{y \in \mathbb{R}^n \mid y = \lim_k {}^{k}\lambda({}^{k}x - x)$ with ${}^{k}x \in A$ and ${}^{k}x \to x$ $(k \to \infty)$, ${}^{k}\lambda \in \mathbb{R}_+\}$.
In the following, we shall simply write $T$ instead of $T(A, {}^{*}x)$, and $T_i$ instead of $T(\Delta_i({}^{*}x), {}^{*}x)$. We shall also use $\Gamma(T)$, the negative polar cone of the cone $T$, $\Gamma(T) = \{u \in (\mathbb{R}^n)^* \mid u \cdot x \le 0$ [...]
[...] $i = 1, 2, 3, 4\}$. We define a function $f$ inside $G$ explicitly, and outside $G$ we define it by its level sets. For $x \in G$,
$f(x) = \prod_{i=1}^{4} f_i(x)$ with $f_i(x) =$ [...] $(g_i \cdot x)^2$.
It is a regular F-distance, following Huard [9]. For $x \notin G$, $f$ is a function whose level sets are defined by four line segments respectively parallel to the sides of $G$, their lengths being equal to those of the sides, at a distance $\rho$ from them; these four line segments are linked together by arcs of circles with centers $a, b, c, d$ and radius $\rho$, so that the junction of the different curves and line segments is continuously differentiable.
Fig. 4.
Let us note that $f(x) = 0$ and $\nabla f(x) = 0$ for any $x$ belonging to the boundary $abcd$ of $G$; inside $G$, $f$ is a strictly positive quasi-concave function, and the level sets of $f$ are decreasing from zero when we move away from $G$. Fig. 5 shows the graph of $f$, but only for $x_3 \le 0$. For $x_3 \ge 0$ we obtain a bell-shaped surface which presents a top whose projection on $x_1 O x_2$ is the "centre", in the sense of [9], belonging to the interior of $G$. This surface, tangent to $x_1 O x_2$ along $abcd$, is connected with the above part $x_3 \le 0$ in such a way that it is continuously differentiable and quasi-concave on $\mathbb{R}^2$.
Let us define three directions $\Delta_1, \Delta_2, \Delta_3$ parallel to the sides $ab$, $dc$, $ad$ and maximize following $\Delta_1, \Delta_2, \Delta_3$ in this order and cyclically.
Fig. 5. (Annotations: a ruled line segment moving in a direction parallel to $bc$ and leaning on the edges $(\gamma)$ and $(\gamma')$ of the parts of paraboloid of vertices $b$ and $c$; a part of the paraboloid of revolution defined by the equation $(x_1 - c_1)^2 + (x_2 - c_2)^2 + x_3 = 0$.)
Starting from ${}^{0}x$ (Fig. 4), and choosing as a successor, when there are many possibilities, the point furthest away, we obtain four accumulation points $a, b, c, d$ which are stationary points, i.e. points having a gradient equal to zero. These points are not the maximum, this being the "centre" of $G$. For $b$ and $c$ we have $b \in M_i(b)$, $c \in M_i(c)$ for $i = 1, 2, 3$, i.e. $b \in M(b)$ and $c \in M(c)$. On the other hand points $a$ and $d$ satisfy the following conditions:
$a \in M_1(a)$, $a \in M_3(a)$ but $a \notin M_2(a)$: $M_2(a) = \{a'\} \in \operatorname{int} G$,
$d \in M_2(d)$, $d \in M_3(d)$ but $d \notin M_1(d)$: $M_1(d) = \{d'\} \in \operatorname{int} G$.
This is exactly what Corollary 24, i.e. Theorem 21, predicts, and no more can be said about these points.
Example in $\mathbb{R}^3$. Here the directions $\Delta_1, \Delta_2, \Delta_3$ are independent; they are the directions of the axes $e_1, e_2, e_3$. Fig. 6 represents incompletely three contours of $f$ with values 1, 0, $-1$. Fig. 7 represents a section in the horizontal plane containing the points $a, b, c, d$ and the starting point ${}^{0}x$. The maximization is made following $\Delta_1, \Delta_2, \Delta_3$ in this order and cyclically.
Fig. 6.
Fig. 7.
The infinite sequence of points ${}^{k}x$ stays in the section plane because in the vicinity of $d$ and $b$ we have ${}^{k}x \in M_3({}^{k}x)$. This sequence has four accumulation points $a, b, c, d$. For $c$ we have $c \in M_i(c)$ for $i = 1, 2$, but $c \notin M_3(c)$, because towards the lower part there are better values than $f(c)$ over $\Delta_3(c)$.
References
[1] A. Auslender, "Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables", Numerische Mathematik 18 (1971) 213-223.
[2] J. Cea et R. Glowinski, "Sur les méthodes d'optimisation par relaxation", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle (Décembre 1973) 5-32.
[3] R. Boyer, "Quelques algorithmes diagonaux en optimisation convexe", Thèse 3ème cycle, Université de Provence (1974).
[4] D. Chazan and W. Miranker, "Chaotic relaxation", Linear Algebra and its Applications 2 (1969) 199-222.
[5] J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1973) 328-332.
[6] J.C. Fiorot et P. Huard, "Composition et réunion d'algorithmes généraux", Compte-Rendus Académie des Sciences Paris, tome 280 (2 juin 1975), Série A, 1455-1458; Séminaire d'Analyse Numérique No. 229, Université de Grenoble (Mai 1975).
[7] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
[8] P. Huard, "Extensions of Zangwill's theorem", this volume.
[9] P. Huard, "A method of centers by upper-bounding functions with applications", in: J.B. Rosen, O.L. Mangasarian, K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) 1-30.
[10] B. Martinet et A. Auslender, "Méthodes de décomposition pour la minimisation d'une fonction sur un espace produit", SIAM Journal on Control 12 (1974) 635-643.
[11] G.G.L. Meyer, "Conditions de convergence pour les algorithmes itératifs monotones, autonomes et non déterministes", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle 11 (1977) 61-74.
[12] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
[13] G.G.L. Meyer, "A systematic approach to the synthesis of algorithms", Numerische Mathematik 24 (1975) 277-289.
[14] R. Meyer, "On the convergence of algorithms with restart", SIAM Journal on Numerical Analysis 13 (1976) 696-704.
[15] R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computer and System Sciences 12 (1976) 108-121.
[16] R. Meyer, "A comparison of the forcing function and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control and Optimization 15 (1977) 699-715.
[17] J.C. Miellou, "Méthode de Jacobi, Gauss-Seidel, sur-(sous) relaxation par blocs appliquée à une classe de problèmes non linéaires", Compte-Rendus Académie des Sciences Paris, tome 273 (20 Décembre 1971), Série A, 1257-1260.
[18] J.C. Miellou, "Algorithme de relaxation chaotique à retards", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle (Avril 1975) 55-82.
[19] J.M. Ortega et W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[20] J.M. Ortega et W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1972) 40-43.
[21] A.M. Ostrowski, Solution of equations and systems of equations (Academic Press, New York, 1966).
[22] B.T. Poljak, "Existence theorems and convergence of minimizing sequences in extremum problems with restrictions", Soviet Mathematics Doklady 7 (1966) 72-75.
[23] E. Polak, Computational methods in optimization, a unified approach (Academic Press, New York, 1971).
[24] F. Robert, "Contraction en norme vectorielle: convergence d'itérations chaotiques pour des équations de point fixe à plusieurs variables", Colloque d'Analyse Numérique (Gourette 1974).
[25] F. Robert, M. Charnay et F. Musy, "Itérations chaotiques série parallèle pour des équations non linéaires de point fixe", Aplikace Matematiky 20 (1975) 1-37.
[26] S. Schechter, "Relaxation methods for linear equations", Communications on Pure and Applied Mathematics 12 (1959) 313-335.
[27] S. Schechter, "Minimization of a convex function by relaxation", in: Abadie, ed., Integer and nonlinear programming (North Holland, Amsterdam, 1970) 177-189.
[28] R.S. Varga, Matrix iterative analysis (Prentice Hall, Englewood Cliffs, NJ, 1962).
[29] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice Hall, Englewood Cliffs, NJ, 1969).
Mathematical Programming Study 10 (1979) 86-97. North-Holland Publishing Company
MODIFIED LAGRANGIANS IN CONVEX PROGRAMMING AND THEIR GENERALIZATIONS

E.G. GOL'SHTEIN
Central Economics-Mathematical Institute, Moscow, U.S.S.R.

and

N.V. TRET'YAKOV
Central Economics-Mathematical Institute, Moscow, U.S.S.R.

Received 22 June 1977
In this paper a rather general class of modified Lagrangians is described for which the main results of the duality theory hold. Within this class two families of modified Lagrangians are taken into special consideration. The elements of the first family are characterized by so-called stability of saddle points and the elements of the second family generate smooth dual problems. The computational methods naturally connected with each of these two families are examined. Further a more general scheme is considered which exploits the idea of modification with respect to the problem of finding a root of a monotone operator. This scheme yields a unified approach to convex programming problems and to determination of saddle and equilibrium points as well as expands the class of modified Lagrangians.
Key words: Modified Lagrangians, Convex Programming, Stability of Saddle Points, Smooth Dual Problems, Monotone Operator.
1. A general class of modified Lagrangians in convex programming

Consider the convex programming problem

$$f(x) \to \sup, \qquad g(x) = (g_1(x), \dots, g_m(x)) \ge 0, \qquad x \in G, \tag{1}$$

where $G$ is a convex subset of the Euclidean space $E^n$ and the functions $f(x)$, $g_i(x)$ are finite-valued and concave on $G$. The standard approach to problem (1) (see, e.g. [4]) makes use of the Lagrangian function

$$F_0(x, y) = f(x) + (g(x), y), \tag{2}$$

where $x \in G$, $y \in E^m$, and $(\cdot, \cdot)$ denotes the inner product in the Euclidean space. One may briefly describe the role of function (2) in the following way. Denote

$$E_+^m = \{y \in E^m : y \ge 0\}, \qquad \varphi_0(x) = \inf_{y \in E_+^m} F_0(x, y), \qquad \psi_0(y) = \sup_{x \in G} F_0(x, y)$$

and consider problems

$$\varphi_0(x) \to \sup, \qquad x \in G, \tag{3}$$

and

$$\psi_0(y) \to \inf, \qquad y \in E_+^m. \tag{4}$$
It is easy to verify that (3) is equivalent to the original problem (1). More precisely, for the feasible set $\hat G = \{x \in G : g(x) \ge 0\}$ of problem (1) one has

$$\varphi_0(x) = \begin{cases} f(x), & x \in \hat G, \\ -\infty, & x \notin \hat G. \end{cases} \tag{5}$$

Thus, the equality $\sup_{x \in G} \varphi_0(x) = \sup_{x \in \hat G} f(x)$ holds together with $X_0^* = X^*$, where

$$X_0^* = \operatorname{Arg\,max}_{x \in G} \varphi_0(x), \qquad X^* = \operatorname{Arg\,max}_{x \in \hat G} f(x).$$

On the other hand under well-known conditions (see, e.g. [4]) the duality framework is valid, i.e.

$$\sup_{x \in G} \varphi_0(x) = \inf_{y \in E_+^m} \psi_0(y),$$
the saddle points of $F_0(x, y)$ with respect to $(x, y) \in G \times E_+^m$ forming the set $X_0^* \times Y_0^*$ with $Y_0^* = \operatorname{Arg\,min}_{y \in E_+^m} \psi_0(y)$. In other words, the determination of a solution $x^*$ of problem (1) may be replaced by that of a saddle point $(x^*, y_0^*)$ of function (2). Recall that the solutions $y_0^*$ of the dual problem (4) yield a certain characteristic of stability of the original problem (1).

The same approach to the convex programming problem turned out to be applicable with $F_0(x, y)$ replaced by some other functions. The latter functions are called modified Lagrangians. A function $F(x, y)$ concave in $x \in G$, convex in $y \in E_+^m$ is said to be a modified Lagrangian for problem (1) if a relation analogous to (5) holds, i.e. if

$$\inf_{y \in E_+^m} F(x, y) = \begin{cases} f(x), & x \in \hat G, \\ -\infty, & x \notin \hat G. \end{cases}$$

Let the saddle-point set $\tilde X^* \times \tilde Y^*$ of the modified Lagrangian $F(x, y)$ with respect to $(x, y) \in G \times E_+^m$ be non-empty. Then according to the given definition the first component $\tilde X^*$ of this set coincides with the solution set $X_0^*$ of problem (3) or, equivalently, with the solution set $X^*$ of problem (1).

Now we define a class of modified Lagrangians. Let $\lambda_i(\xi, \eta)$, $i = 1, \dots, m$, be finite-valued functions for all $\xi \in (-\infty, +\infty)$, $\eta \in [0, +\infty)$. For $y = (y_1, \dots, y_m)$ define a function

$$F_\lambda(x, y) = f(x) + (g(x), y) - \Lambda(g(x), y) \tag{6}$$

with

$$\Lambda(g(x), y) = \sum_{i=1}^{m} \lambda_i(g_i(x), y_i). \tag{7}$$
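Further, denote

$$\varphi_\lambda(x) = \inf_{y \in E_+^m} F_\lambda(x, y), \qquad \psi_\lambda(y) = \sup_{x \in G} F_\lambda(x, y)$$

and consider two problems

$$\varphi_\lambda(x) \to \sup, \qquad x \in G, \tag{8}$$

$$\psi_\lambda(y) \to \inf, \qquad y \in E_+^m. \tag{9}$$

Obviously, the ordinary Lagrangian (2) and problems (3), (4) are a particular case of (6)-(9) which corresponds to $\lambda_i(\xi, \eta) \equiv 0$, $i = 1, \dots, m$. Finally, denote

$$X_\lambda^* = \operatorname{Arg\,max}_{x \in G} \varphi_\lambda(x), \qquad Y_\lambda^* = \operatorname{Arg\,min}_{y \in E_+^m} \psi_\lambda(y).$$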
Lemma 1 (see [1]). Let the following assumptions hold for $i = 1, \dots, m$.
(a) $\lambda_i(\xi, \eta)$ is convex in $\xi$, concave in $\eta$.
(b) The function $\xi\eta - \lambda_i(\xi, \eta)$ is non-decreasing in $\xi$ for any $\eta \ge 0$.
(c) $\lim_{\eta \to +\infty} [\xi\eta - \lambda_i(\xi, \eta)] = -\infty$ for any $\xi < 0$.
(d) $\inf_{\eta \ge 0} [\xi\eta - \lambda_i(\xi, \eta)] = 0$ for any $\xi \ge 0$.
Then the function $F_\lambda(x, y)$ defined by (6), (7) is a modified Lagrangian. In particular, $X_\lambda^* = X_0^* = X^*$.
Under the assumptions of Lemma 1 the set $Y_\lambda^*$ depends upon the choice of $\lambda_i(\xi, \eta)$, $i = 1, \dots, m$. A simple extra condition implies that $Y_\lambda^*$ is independent of this choice, i.e. one has $Y_\lambda^* = Y_0^*$.

Lemma 2 (see [1]). Let, in addition to (a)-(d), the following condition be satisfied for $i = 1, \dots, m$.
(e) $\lambda_i(0, \eta) = (\partial\lambda_i/\partial\xi)(0, \eta) = 0$ for all $\eta \ge 0$.
Then the equality $Y_\lambda^* = Y_0^*$ holds.
The contraction of the class of modified Lagrangians due to the additional requirement (e) looks natural in view of the following circumstances. On the one hand this contraction maintains the usual interpretation of vectors $y^* \in Y_0^*$ in various applications (e.g., in mathematical economics). On the other hand it still gives one a chance to improve certain computational methods of convex programming as is shown below. The set of conditions (a)-(d) admits a more obvious form. Namely, Lemmas 1 and 2 imply the following proposition.

Theorem 1 (see [1]). Let the following assumptions hold for $i = 1, \dots, m$.
(1) $\lambda_i(\xi, \eta)$ is convex in $\xi \in (-\infty, +\infty)$, concave in $\eta \in [0, +\infty)$.
(2) $\lambda_i(0, \eta) = (\partial\lambda_i/\partial\xi)(0, \eta) = 0$ for all $\eta \ge 0$.
(3) $\lambda_i(\xi, \eta) \le \xi\eta$ for all $\xi \ge 0$, $\eta \ge 0$.
Then the function $F_\lambda(x, y)$ defined by (6), (7) is a modified Lagrangian for any problem of the form (1). Furthermore, the equality $Y_\lambda^* = Y_0^*$ is valid.

Throughout the rest of the paper we deal only with such modified Lagrangians for which $Y_\lambda^* = Y_0^*$. In view of that we shall use the notation $Y^*$ for this set.
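The conditions of Theorem 1 can be checked numerically for a concrete candidate. The following is a minimal sketch, assuming one particular quadratic-type function $\lambda(\xi, \eta)$ with a parameter `GAMMA`; this choice, the grid, and the names `lam` and `inner_inf` are illustrative assumptions and are not taken from the text. The sketch tests conditions (2) and (3) on a grid and the behaviour of $\inf_{\eta \ge 0}[\xi\eta - \lambda(\xi, \eta)]$ required of a modified Lagrangian.

```python
# Illustrative check of conditions (2) and (3) of Theorem 1 for one
# hypothetical choice of lambda (an assumption, not given in the text).
GAMMA = 1.0  # assumed positive parameter of the illustration


def lam(xi, eta, gamma=GAMMA):
    """Assumed modification term lambda(xi, eta), used only for eta >= 0."""
    if xi <= eta / gamma:
        return 0.5 * gamma * xi * xi
    return xi * eta - eta * eta / (2.0 * gamma)


def inner_inf(xi, etas):
    """Grid approximation of inf over eta >= 0 of xi*eta - lambda(xi, eta)."""
    return min(xi * eta - lam(xi, eta) for eta in etas)


grid = [0.01 * k for k in range(0, 2001)]  # eta in [0, 20]
xis_nonneg = [0.0, 0.3, 1.0, 2.5]
h = 1e-6  # step of the central difference in xi

# Condition (2): lambda(0, eta) = 0 and d(lambda)/d(xi) at xi = 0 vanishes.
cond2 = all(
    abs(lam(0.0, e)) < 1e-12
    and abs((lam(h, e) - lam(-h, e)) / (2 * h)) < 1e-5
    for e in grid
)
# Condition (3): lambda(xi, eta) <= xi * eta for xi >= 0, eta >= 0.
cond3 = all(lam(x, e) <= x * e + 1e-12 for x in xis_nonneg for e in grid)

# Modified-Lagrangian property: the infimum is 0 for xi >= 0 and keeps
# decreasing as the eta-grid grows when xi < 0.
inf_feasible = [inner_inf(x, grid) for x in xis_nonneg]
inf_infeasible = inner_inf(-0.5, grid)

print(cond2, cond3, inf_feasible, inf_infeasible)
```

A grid test of this kind can only refute a candidate, not prove that it satisfies the assumptions; it is meant as a quick sanity check before analysing a new $\lambda_i$ by hand.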
2. Modified Lagrangians and stability of saddle points

The following definition was first given in [5] (see also [1]). The saddle-point set $U^* \times V^*$ of a function $F(u, v)$, concave in $u \in U$, convex in $v \in V$, is said to be stable in $u$ (with respect to $F(u, v)$) iff

$$\operatorname{Arg\,max}_{u \in U} F(u, v^*) = U^* \quad \text{for any } v^* \in V^*.$$
Theorem 2 (see [1]). Let, in addition to (1)-(3), the following condition hold for $i = 1, \dots, m$.
(4) $\lambda_i(\xi, \eta) > 0$ for $\xi \ne 0$, $\eta > 0$ and for $\xi < 0$, $\eta = 0$.
Then the set $X^* \times Y^*$, which is the saddle-point set of the function $F_\lambda(x, y)$, is stable in $x$ with respect to this function.

Evidently the condition (4) is not satisfied for $\lambda_i(\xi, \eta) \equiv 0$. This agrees with the well-known fact that, in the absence of extra properties similar to strict concavity of $f(x)$, $g_i(x)$, the set $X^* \times Y^*$ is not stable in $x$ with respect to the ordinary Lagrangian $F_0(x, y)$ in the sense of the above definition. The stability of saddle points in $y$ may be defined in a similar way.

There is a connection between stability and the problem of convergence of the subgradient method for determining saddle points. The following proposition concerns arbitrary saddle functions which may have no relation to problem (1). Let $U$, $V$ be closed convex sets in Euclidean spaces and let $O$, $Q$ be open convex sets which contain $U$ and $V$ respectively. Consider a function $F(u, v)$ concave in $u \in O$ and convex in $v \in Q$ whose saddle-point set $U^* \times V^*$ with respect to $u \in U$, $v \in V$ is assumed to be non-empty. Let $\partial_u F$ and $\partial_v F$ denote the subdifferentials of $F$ in $u$ and in $v$ respectively. Further, let $\pi_U$ and $\pi_V$ be projections on $U$ and on $V$. The subgradient projection method may be written as follows:

$$u^{k+1} = \pi_U(u^k + \alpha_k l_u^k), \qquad v^{k+1} = \pi_V(v^k - \alpha_k l_v^k), \tag{10}$$

where

$$l_u^k \in \partial_u F(u^k, v^k), \qquad l_v^k \in \partial_v F(u^k, v^k).$$
Theorem 3 (see [1, 5]). If the saddle-point set $U^* \times V^*$ of the function $F(u, v)$ with respect to $u \in U$, $v \in V$ is bounded and stable both in $u$ and in $v$ and if

$$\lim_{k \to \infty} \alpha_k = 0,$$
no
where $\alpha = \alpha(u^0, v^0)$ is small enough, then method (10) converges in terms of the distance, i.e.

$$\lim_{k \to \infty} \operatorname{dist}(u^k, U^*) = \lim_{k \to \infty} \operatorname{dist}(v^k, V^*) = 0.$$

One may indicate explicit expressions of modified Lagrangians with the property of saddle points stability in both variables in the case of the linear programming problem. In the general case of problem (1) such explicit expressions are not known although it is possible to obtain modified Lagrangians with the desired property using the operation "sup" in $x$ (see below, Section 5). In view of that the following recent result of Maistrovskii [6] is of interest. Let the saddle function $F(u, v)$ considered in Theorem 3 be smooth. Then according to [6] the statement of Theorem 3 is valid if the stability of $U^* \times V^*$ either in $u$ or in $v$ is required instead of stability in both variables. Therefore if the functions involved in the convex programming problem are smooth then the subgradient projection method (10) with an infinitely decreasing step size converges to the set $X^* \times Y^*$ in terms of the distance, when being applied to a modified Lagrangian which satisfies the conditions of Theorem 2.
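The projection method (10) can be sketched on a small example. The code below is a minimal illustration, assuming the smooth saddle function $F(u, v) = -u^2/2 + uv + v^2/2$ on the box $U = V = [-1, 1]$ (its only saddle point is $(0, 0)$) and the step sizes $\alpha_k = \alpha_0/(k+1)$; the test function, the box and the step rule are assumptions chosen for the illustration, not taken from the text.

```python
def project(value, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, value))


def subgradient_projection(grad_u, grad_v, u, v, box, steps, alpha0):
    """Iterate method (10) with step sizes alpha_k = alpha0 / (k + 1)."""
    lo, hi = box
    for k in range(steps):
        alpha = alpha0 / (k + 1)
        # Both (sub)gradients are evaluated at the current point (u^k, v^k).
        lu, lv = grad_u(u, v), grad_v(u, v)
        u = project(u + alpha * lu, lo, hi)   # ascent step in u
        v = project(v - alpha * lv, lo, hi)   # descent step in v
    return u, v


# Assumed test function F(u, v) = -u^2/2 + u*v + v^2/2.
u_end, v_end = subgradient_projection(
    grad_u=lambda u, v: -u + v,
    grad_v=lambda u, v: u + v,
    u=1.0, v=-1.0, box=(-1.0, 1.0), steps=2000, alpha0=0.5,
)
print(u_end, v_end)
```

With a step size that tends to zero the approach to the saddle point is slow, which is one motivation for the finite-step methods discussed in the following sections.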
3. Modified Lagrangians generating smooth dual problems and the modified dual method of convex programming
The requirements (1)-(4) of Theorem 2 permit one to get smooth dual problem (9) with no restrictive assumptions concerning the original problem (1). This important property of modified Lagrangians was first discovered by several authors [7-9] for the "quadratic" modification (see below). It is this property that leads to certain generalizations of modified Lagrangians given in Section 4. Consider a family of modified Lagrangians for which the property holds. Let $\alpha(u)$ be a convex function which belongs to $C^1(E^m)$ and satisfies the conditions $\alpha(0) = 0$, $\nabla\alpha(0) = 0$ and

$$|\nabla\alpha(u') - \nabla\alpha(u'')| \ge \gamma |u' - u''|, \qquad \gamma > 0$$

for all $u', u'' \in E^m$. Consider the function

$$F^\alpha(x, y) = f(x) + \max_{t \in E_+^m} [(g(x) - t, y) - \alpha(g(x) - t)] \tag{11}$$

which is concave in $x \in G$ and convex in $y \in E^m$. Generally speaking, the function $F^\alpha(x, y)$ does not admit the form (6) with $\Lambda(g(x), y)$ of the separable type (7). But as a matter of fact the form (6), (7) was used mainly to simplify the notations, and it can be easily verified that the statements of Theorems 1-3 are valid for $F^\alpha(x, y)$. Furthermore, it looks quite natural to consider $F^\alpha(x, y)$ for any $y \in E^m$, since according to the following proposition the function

$$\psi^\alpha(y) = \sup_{x \in G} F^\alpha(x, y)$$

is well-defined (and smooth) everywhere in $E^m$.

Theorem 4 (see [1, 2]). The saddle-point set of $F^\alpha(x, y)$ with respect to $x \in G$, $y \in E^m$ coincides with $X^* \times Y^*$, this set being stable in $x$. The only assumption $\psi_0(y) \not\equiv +\infty$ implies that $\psi^\alpha(y) \in C^1(E^m)$ and

$$|\nabla\psi^\alpha(y') - \nabla\psi^\alpha(y'')| \le (1/\gamma)|y' - y''|; \qquad y', y'' \in E^m,$$

where $\gamma$ is the constant involved in the definition of $\alpha(u)$.

Note that the modulus $1/\gamma$ in the Lipschitz condition above depends only upon the choice of $\alpha(u)$ and consequently it may be treated as known. Therefore the dual problem

$$\psi^\alpha(y) \to \inf, \qquad y \in E^m,$$

the problem of unconstrained minimization of the smooth function $\psi^\alpha(y)$, may be solved by means of the finite-step gradient method. This is a way of solving problem (1) and it is natural to call it the modified dual method. When investigating the convergence of the method, one should take account of the fact that in general the calculation of $\nabla\psi^\alpha(y)$ can be carried out only approximately. In the following scheme the errors in calculating $\nabla\psi^\alpha$ are considered in terms of approximate maximization of $F^\alpha(x, y)$ over $x \in G$, which looks natural from the computational point of view. So, for any $y^0 \in E^m$ let the sequences $\{x^k\}$, $\{y^k\}$, $k = 0, 1, \dots$, be defined by the relations

$$x^k \in G, \qquad F^\alpha(x^k, y^k) \ge \sup_{x \in G} F^\alpha(x, y^k) - \delta_k, \qquad y^{k+1} = y^k - \gamma_k \nabla_y F^\alpha(x^k, y^k), \tag{12}$$

with

$$\delta_k \ge 0, \qquad 0 < \inf \gamma_k \le \sup \gamma_k < 2\gamma.$$
Denote by $v$ and $\tilde v$ respectively the optimal value and the asymptotic optimal value of problem (1).

Theorem 5 (see [2]). Assume that $\psi_0(y) \not\equiv +\infty$ and $Y^* \ne \emptyset$. Let $\{x^k\}$, $\{y^k\}$ satisfy (12) with an arbitrary starting point $y^0 \in E^m$. Then the condition $\sum_{k=0}^{\infty} \delta_k^{1/2} < +\infty$ implies that

$$\lim_{k \to \infty} y^k = y^* \in Y^*, \qquad \liminf_{k \to \infty} g_i(x^k) \ge 0, \qquad \lim_{k \to \infty} f(x^k) = \tilde v.$$

Moreover, in the case when $Y^*$ is bounded (e.g., when the Slater condition is satisfied), the only requirement $\lim_{k \to \infty} \delta_k = 0$ implies that

$$\lim_{k \to \infty} \operatorname{dist}(y^k, Y^*) = 0, \qquad \liminf_{k \to \infty} g_i(x^k) \ge 0, \qquad \lim_{k \to \infty} f(x^k) = v.$$
For the case of a closed bounded set $G$, some algorithms for solving problem (1), which are based on the scheme (12), are considered in [2]. For the "quadratic" modified Lagrangian, i.e. for $\alpha(u) = \tfrac{1}{2}\gamma|u|^2$, the modified dual method was investigated in [7-10]. On the other hand, in several papers by B.W. Kort and D.P. Bertsekas (see, e.g., [12, 13]) a version of the dual method for non-quadratic modified Lagrangians is examined which differs from the method presented here.
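Scheme (12) can be illustrated on a one-dimensional problem. The sketch below assumes the toy data $f(x) = -(x-2)^2$, $g(x) = 1 - x$, $G = E^1$ and the quadratic choice $\alpha(u) = \tfrac{1}{2}\gamma u^2$ with $\gamma = 1$; for this data the maximizer of $F^\alpha(\cdot, y)$ and the gradient $\nabla_y F^\alpha = \min(g(x), y/\gamma)$ were worked out by hand for the illustration and are not formulas from the text. The maximization is exact here, so $\delta_k = 0$.

```python
GAMMA = 1.0  # assumed constant of the quadratic alpha(u) = GAMMA * u^2 / 2


def g(x):
    """Constraint function of the assumed toy problem: g(x) = 1 - x >= 0."""
    return 1.0 - x


def argmax_x(y, gamma=GAMMA):
    """Exact maximizer of F^alpha(., y) over the real line for f(x) = -(x - 2)^2."""
    if y <= -gamma:
        return 2.0
    return (4.0 - y + gamma) / (2.0 + gamma)


def grad_y(x, y, gamma=GAMMA):
    """Derivative of F^alpha in y for the quadratic alpha: min(g(x), y / gamma)."""
    return min(g(x), y / gamma)


def modified_dual_method(y, step, iterations):
    """Scheme (12) with exact inner maximization and a constant step in (0, 2*gamma)."""
    for _ in range(iterations):
        x = argmax_x(y)
        y = y - step * grad_y(x, y)
    return argmax_x(y), y


x_end, y_end = modified_dual_method(y=0.0, step=1.0, iterations=80)
print(x_end, y_end)
```

For this toy problem the constrained maximizer is $x^* = 1$ with multiplier $y^* = 2$, which gives a reference point against which the printed iterates can be compared.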
4. The modification method for monotone operators
Let $Z$ be a convex set in the Euclidean space $E$. A point-to-set operator $T : Z \to 2^E$ is considered, the set $T(z)$ being non-empty for all $z \in Z$. The operator $T$ is supposed to be monotone, i.e. for any $z', z'' \in Z$ and for any $t' \in T(z')$, $t'' \in T(z'')$ one has

$$(t' - t'', z' - z'') \ge 0.$$

The problem under investigation consists in finding a root of $T$ on the set $Z$, i.e. such $z^* \in Z$ that $0 \in T(z^*)$. In the special case when $Z = E$ and $T$ is a single-valued operator satisfying the inverse strong monotonicity condition

$$(T(z') - T(z''), z' - z'') \ge \gamma |T(z') - T(z'')|^2, \qquad \gamma > 0 \tag{13}$$

for all $z', z'' \in E$, the problem may be solved by means of the following method:

$$z^{k+1} = z^k - \gamma_k l^k, \qquad |l^k - T(z^k)| \le \varepsilon_k, \qquad \varepsilon_k \ge 0, \qquad k = 0, 1, \dots.$$

Theorem 6 (see [3]). Suppose that (13) holds, the set $Z^* = \{z : T(z) = 0, z \in Z\}$ is non-empty, and the following inequalities are satisfied:

$$0 < \inf \gamma_k \le \sup \gamma_k < 2\gamma, \qquad \sum_{k=0}^{\infty} \varepsilon_k < +\infty.$$

Then $\lim_{k \to \infty} z^k = z^* \in Z^*$.

It is shown in [2] that if, in particular, $T(z)$ is the gradient of a convex differentiable function $f(z)$, then (13) is equivalent to the Lipschitz condition for $T(z)$. Obviously, in this case the method above coincides with the perturbed gradient method of minimization of $f(z)$.

The modification method presented here treats the general case in which (13) is not valid. In this case the problem of finding a root of a given arbitrary monotone operator is replaced by that of finding a root of a modified operator, the latter satisfying condition (13). The modification may be obtained by the following scheme. Let $R : E \to E$ be any operator satisfying the inverse strong monotonicity condition (13) as well as the strong monotonicity condition

$$(R(z') - R(z''), z' - z'') \ge \gamma_1 |z' - z''|^2, \qquad \gamma_1 > 0 \tag{14}$$

for all $z', z'' \in E$ with $\gamma\gamma_1 \le 1$. For each $w \in E$ consider the operator $T_{R,w} : Z \to 2^E$ defined by the equality

$$T_{R,w}(z) = T(z) - R(w - z)$$

and denote by $z(w)$ the root of $T_{R,w}$. (Evidently, if such $z(w)$ exists then it is the only root of $T_{R,w}$ on $Z$.) The modified operator $T_R : E \to E$ is then defined by the formula

$$T_R(w) = R(w - z(w)).$$
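The construction of $T_R$ can be made concrete in one dimension. The following minimal sketch assumes $E = Z = E^1$, the operator $T = \partial|z|$ (the subdifferential of the absolute value) and $R$ equal to the identity, for which (13) and (14) hold with $\gamma = \gamma_1 = 1$; these choices are assumptions made for the illustration. In this case the root $z(w)$ of $T_{R,w}$ is given by soft thresholding, and $T_R(w) = w - z(w)$.

```python
def soft_threshold(w, tau=1.0):
    """Shrink w towards zero by tau (the proximal map of tau * |z|)."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0


def z_of_w(w):
    """Root of T(z) - (w - z) for T = subdifferential of |z| and R = identity."""
    return soft_threshold(w, 1.0)


def T_R(w):
    """Modified operator T_R(w) = R(w - z(w)) with R the identity."""
    return w - z_of_w(w)


# Gradient-type iteration w^{k+1} = w^k - step * T_R(w^k) with step = 1 < 2*gamma.
w = 3.5
for _ in range(10):
    w = w - 1.0 * T_R(w)
print(w, T_R(3.0), T_R(-0.25))
```

Here $T_R$ is single-valued although $T$ is not, and its only root $w = 0$ is also the only root of $T$.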
Recall that a monotone operator $T : Z \to 2^E$ is said to be maximal monotone if $z' \in Z$, $t' \in T(z')$ whenever the inequality $(t' - t, z' - z) \ge 0$ holds for all $z \in Z$, $t \in T(z)$. An important example of a maximal monotone operator is given by

$$T = K + Q : Z \to 2^E,$$

where $Z$ is a closed convex set, $K(z)$ is the normal cone for $Z$ in the point $z$, i.e.

$$K(z) = \{l : (l, z' - z) \ge 0, \ \forall z' \in Z\},$$

and $Q$ is a monotone upper-semicontinuous point-to-set mapping, the set $Q(z)$ being compact in $E$ for each $z \in Z$.

Theorem 7 (see [3]). Suppose that $T : Z \to 2^E$ is a maximal monotone operator. Then the following properties hold:
(i) the modified operator $T_R$ is well-defined and single-valued for each $w \in E$;
(ii) $T_R$ satisfies condition (13);
(iii) the modified operator $T_R : E \to E$ has the same roots as the original operator $T : Z \to 2^E$.

Obviously, one may set $R(z) = \nabla\varphi(z)$, where the function $\varphi(z)$ is strongly convex in $E$, the gradient $\nabla\varphi$ being Lipschitz continuous. Note that the case of $R(z) = \nabla\varphi(z)$ with $\varphi(z) = \tfrac{1}{2}|z|^2$ was considered by R.T. Rockafellar in [13].
As to the root $z(w)$ of $T_{R,w}$, it can be found, with any accuracy, by means of a "gradient-like" method with infinitely decreasing step-size (see [3, Section 4]). Thus, in view of Theorems 6 and 7 the modification method yields, for practically any maximal monotone operator, a process converging to its root.
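The perturbed method of Theorem 6 can be sketched as follows. The example assumes the affine operator $T(z) = (2z_1 - 2, \ z_2 + 1)$ on $E^2$, which satisfies (13) with $\gamma = 1/2$, a constant step $\gamma_k = 0.5 < 2\gamma$, and error bounds $\varepsilon_k = 1/(k+1)^2$; the operator, the form of the perturbation and the starting point are assumptions chosen for the illustration.

```python
def T(z):
    """Assumed test operator: gradient of z1^2 - 2*z1 + z2^2/2 + z2."""
    z1, z2 = z
    return (2.0 * z1 - 2.0, z2 + 1.0)


def perturbed_iteration(z, step, iterations):
    """z^{k+1} = z^k - step * l^k, where |l^k - T(z^k)| <= eps_k = 1/(k+1)^2."""
    for k in range(iterations):
        eps = 1.0 / (k + 1) ** 2          # summable sequence of error bounds
        t1, t2 = T(z)
        l = (t1 + eps, t2)                # inexact value of T(z^k), error exactly eps
        z = (z[0] - step * l[0], z[1] - step * l[1])
    return z


z_end = perturbed_iteration(z=(5.0, 5.0), step=0.5, iterations=200)
print(z_end)
```

The unique root of this operator is $(1, -1)$; because the errors are summable rather than zero, the iterates approach it only as fast as $\varepsilon_k$ decays.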
5. Some applications of the modification method

(A) Consider again the convex programming problem (1). Using the notations of Sections 1-3, let us set

$$E = E^m, \qquad Z = Y = \{y \in E^m : \partial\psi_0(y) \ne \emptyset\}, \qquad T(y) = \partial\psi_0(y)$$

in the above scheme of the modification method. The operator $T$ is known to be maximal monotone under the assumption that $\psi_0(y) \not\equiv +\infty$. Further, set $R(y) = \nabla\varphi(y)$, the convex differentiable function $\varphi(y)$ satisfying the conditions

$$\gamma_1 |y' - y''| \le |\nabla\varphi(y') - \nabla\varphi(y'')| \le \frac{1}{\gamma} |y' - y''|, \qquad 0 < \gamma_1 \le \frac{1}{\gamma} \tag{15}$$

for all $y', y'' \in E^m$. The modified operator $T_R$, which under the assumption above is well-defined, takes the form

$$T_R(v) = \nabla\psi^\alpha(v), \qquad v \in E^m,$$

where $\psi^\alpha(v) = \sup_{x \in G} F^\alpha(x, v)$, the function $F^\alpha(x, v)$ being a modified Lagrangian of Section 3, and the function $\alpha(u) = \varphi^*(u)$ is conjugate to $\varphi$. Thus Theorems 4 and 5 may be obtained by implementation of Theorems 6 and 7.

(B) Consider now another application of the modification method to problem (1). Set

$$E = E^{n+m}, \qquad z = (x, y), \qquad T(z) = (-\partial_x F_0(x, y)) \times (\partial_y F_0(x, y)) = \tilde\partial F_0(z),$$

$$Z = \{z \in E^{n+m} : \tilde\partial F_0(z) \ne \emptyset\}, \qquad R(z) = (\nabla\varphi_1(x), \nabla\varphi_2(y)),$$

where the function $\varphi_2(y)$ satisfies (15), and the function $\varphi_1(x)$ satisfies similar conditions:

$$\gamma_1 |x' - x''| \le |\nabla\varphi_1(x') - \nabla\varphi_1(x'')| \le \frac{1}{\gamma} |x' - x''| \tag{16}$$

for all $x', x'' \in E^n$. If the set $G$ involved in problem (1) is closed and the functions $f(x)$, $g_i(x)$ are upper-semicontinuous over $G$ then the modified operator $T_R$ exists and

$$T_R(w) = \tilde\nabla_w P^\alpha(w) = (-\nabla_u P^\alpha(w), \nabla_v P^\alpha(w)), \qquad w = (u, v) \in E^{n+m},$$

with

$$P^\alpha(u, v) = \max_{x \in G} \Phi^\alpha(u, v, x),$$

$$\Phi^\alpha(u, v, x) = f(x) - \alpha_1(u - x) + \max_{t \in E_+^m} [(g(x) - t, v) - \alpha_2(g(x) - t)] = F^{\alpha_2}(x, v) - \alpha_1(u - x),$$

$$\alpha_1(u) = \varphi_1^*(u), \qquad \alpha_2(v) = \varphi_2^*(v).$$
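By Theorem 7 the function $P^\alpha$ is a modified Lagrangian for problem (1) with the following properties:
(i) the saddle-point set of $P^\alpha(u, v)$ with respect to $u \in E^n$, $v \in E^m$ coincides with $X^* \times Y^*$;
(ii) $P^\alpha(u, v)$ is differentiable in $(u, v) = w$ and there holds

$$(\tilde\nabla_w P^\alpha(w') - \tilde\nabla_w P^\alpha(w''), w' - w'') \ge \gamma |\tilde\nabla_w P^\alpha(w') - \tilde\nabla_w P^\alpha(w'')|^2 \tag{17}$$

with $w', w'' \in E^{n+m}$, $\tilde\nabla_w P^\alpha(w) = (-\nabla_u P^\alpha(w), \nabla_v P^\alpha(w))$.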
The inequality (17) being verified, a saddle point of $P^\alpha(u, v)$ may be found by means of the perturbed finite-step gradient method (see Section 2) which in this case takes the form

$$x^k \in G, \qquad \Phi^\alpha(u^k, v^k, x^k) \ge P^\alpha(u^k, v^k) - \delta_k, \qquad \delta_k \ge 0,$$

$$u^{k+1} = u^k - \gamma_k \nabla\alpha_1(u^k - x^k), \qquad v^{k+1} = v^k - \gamma_k \nabla_v \Phi^\alpha(u^k, v^k, x^k), \qquad k = 0, 1, \dots.$$

Theorem 8 (see [3]). Assume that $G$ is closed and the functions $f(x)$, $g_i(x)$, $i = 1, \dots, m$, are upper-semicontinuous over $G$ and let the saddle-point set $X^* \times Y^*$ of $F_0(x, y)$, $x \in G$, $y \in E_+^m$, be non-empty. Then under the conditions

$$0 < \inf \gamma_k \le \sup \gamma_k < 2\gamma, \qquad \sum_{k=0}^{\infty} \delta_k^{1/2} < +\infty$$

one has

$$\lim_{k \to \infty} u^k = u^* \in X^*, \qquad \lim_{k \to \infty} v^k = v^* \in Y^*$$

for any starting point $(u^0, v^0) \in E^{n+m}$.
Note that for the case of quadratic functions $\alpha_1(u)$, $\alpha_2(v)$ Theorem 8 was first proved by R.T. Rockafellar in [14].
6. One more property of the modified Lagrangian $F^\alpha(x, y)$

The inequality (17) is valid for the function $P^\alpha$ but it does not hold in the case of $F^\alpha$. The latter function satisfies the condition

$$(\tilde\nabla F^\alpha(z') - \tilde\nabla F^\alpha(z''), z' - z'') = -(\nabla_x F^\alpha(z') - \nabla_x F^\alpha(z''), x' - x'') + (\nabla_y F^\alpha(z') - \nabla_y F^\alpha(z''), y' - y'') \ge \gamma |\nabla_y F^\alpha(z') - \nabla_y F^\alpha(z'')|^2 \tag{18}$$

with $z = (x, y)$, $x', x'' \in G$, $y', y'' \in E^m$, which is not as strong as (17). Nevertheless, the condition (18) is sufficient for convergence of the finite-step gradient method to a saddle point of $F^\alpha(x, y)$ whenever the functions $f(x)$, $g_i(x)$ involved in problem (1) are smooth enough. More precisely, suppose that $G = E^n$ and that the gradients $\nabla f(x)$, $\nabla g_i(x)$ are Lipschitz continuous on each bounded subset of $E^n$. Consider the process

$$x^{k+1} = x^k + \beta \nabla_x F^\alpha(x^k, y^k), \qquad y^{k+1} = y^k - \beta \nabla_y F^\alpha(x^k, y^k), \qquad k = 0, 1, \dots.$$
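The finite-step process above can be tried on a small smooth problem. The sketch below assumes the toy data $f(x) = -(x-2)^2$, $g(x) = 1 - x$, $G = E^1$ and the quadratic $\alpha(u) = \tfrac{1}{2}\gamma u^2$ with $\gamma = 1$; the partial derivatives of $F^\alpha$ used in the code were derived by hand for this data, and the step $\beta = 0.05$ is an arbitrary small value, so all of these are assumptions of the illustration rather than formulas from the text.

```python
GAMMA = 1.0  # assumed constant of the quadratic alpha(u) = GAMMA * u^2 / 2


def grad_x(x, y, gamma=GAMMA):
    """d/dx of F^alpha for f(x) = -(x - 2)^2, g(x) = 1 - x."""
    xi = 1.0 - x                       # value of the constraint function g(x)
    df = -2.0 * (x - 2.0)              # f'(x)
    dg = -1.0                          # g'(x)
    return df + dg * max(y - gamma * xi, 0.0)


def grad_y(x, y, gamma=GAMMA):
    """d/dy of F^alpha for the same data: min(g(x), y / gamma)."""
    return min(1.0 - x, y / gamma)


def saddle_gradient_process(x, y, beta, iterations):
    """x^{k+1} = x^k + beta * grad_x, y^{k+1} = y^k - beta * grad_y."""
    for _ in range(iterations):
        gx, gy = grad_x(x, y), grad_y(x, y)   # both evaluated at (x^k, y^k)
        x, y = x + beta * gx, y - beta * gy
    return x, y


x_end, y_end = saddle_gradient_process(x=0.0, y=0.0, beta=0.05, iterations=4000)
print(x_end, y_end)
```

For this data the saddle point of $F^\alpha$ is $(x^*, y^*) = (1, 2)$, where both partial derivatives vanish.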
Theorem 9. If $X^* \times Y^* \ne \emptyset$ then for any starting point $(x^0, y^0)$ there exists such $\beta_0(x^0, y^0) > 0$ that under the condition $0 < \beta$ [...]

P. Huard / Zangwill's theorem

[...] $i_0$. Let us assume $x_* \notin P$. Then $x_{i+1} \in F_2(x_i)$, $\forall i \in N$, $i \ge i_0$, and consequently $x_* \in F_2(x_*)$. We may use ($\gamma_0$) with $x = x' = x'' = x_*$, and we obtain $x_* \notin F_1(x_*)$, and hence $x_* \notin F_2(x_*)$ because $F_1 \supset F_2$, hence a contradiction. The hypothesis $x_* \notin P$ is not possible, and the proposition is then satisfied.
(2) The sub-sequence corresponding to $N'$ is infinite. By definition of $N'$, $x_*$ is a cluster point of this sub-sequence. Denoting by $s(i)$ the successor of $i$ in $N'$, we have:

$$x_i \in E - Q, \qquad \forall i \in N', \tag{1}$$

$$x_{s(i)} \in F_2(x_i), \qquad \forall i \in N'. \tag{2}$$

Since $F_1 \supset F_2$ and from (2) we have $x_{s(i)} \in F_1(x_i)$, $\forall i \in N'$, and using ($\beta_0$) we get in succession:

$$F_1(x_{s(i)}) \subset F_1(x_i), \qquad \forall i \in N',$$

$$x_j \in F_1(x_i), \qquad \forall i, j \in N', \ j > i,$$

$$x_* \in F_1(x_i), \qquad \forall i \in N'. \tag{3}$$

Let $N''$ be a subset of $N'$ defining a sub-sequence converging towards $x_*$. This convergence implies with (1):

$$\exists i_0 \in N'' : \ x_i \in (E - Q) \cap V(x_*), \qquad \forall i \in N'', \ i \ge i_0. \tag{4}$$

(1), (2) and (4) allow us to use ($\gamma_0$) with $x = x_*$, $x' = x_i$ and $x'' = x_{s(i)}$, where $i \in N''$. We get:

$$x_* \notin F_1(x_{s(i)}), \qquad \forall i \in N'', \ i \ge i_0. \tag{5}$$

(3) and (5) are in contradiction because $s(i) \in N'$ if $i \in N''$ by definition. Then the hypothesis $x_* \notin P$ is impossible.

If following Remark 3 we modify Hypothesis ($\gamma_0$) we may still apply ($\gamma_0$) taking for $x''$ the $p$th successor of $x_i$ in the sub-sequence $N'$. Denoting it by $x_j$, it is evident that $x_j \in F_1(x_i)$, and relation (3) and relation (5) thus obtained are still in contradiction.
Corollary 1. Let $E$ be a closed subset of $R^n$, $P$ be a subset of $E$. Let $F_1, F_2 : (E - P) \to \mathcal{P}(E)$ be two point-to-set maps such that $F_1 \supset F_2$. We suppose moreover for all $x \in E - P$:

($\alpha_1$) $F_2(x) \ne \emptyset$.
($\beta_1$) $x' \in (E - P) \cap F_1(x) \Rightarrow F_1(x') \subset F_1(x)$.
($\gamma_1$) $\exists V(x)$, a neighbourhood of $x$, such that: $x' \in (E - P) \cap V(x)$, $x'' \in (E - P) \cap F_2(x') \Rightarrow x \notin F_1(x'')$.

An infinite sequence $\{x_i \mid i \in N\}$ is generated with the following rule:

($r_1$) $x_0 \in E$; $x_{i+1} \in F_2(x_i)$ if $x_i \notin P$; $x_{i+1} = x_i$ otherwise.

Under these conditions, for any cluster point $x_*$ of the sequence we have $x_* \in P$.

Proof. It is a direct application of Proposition 0, taking $Q = P$.
Remark 4. Originally in [3], instead of Hypothesis ($\gamma_1$), this proposition used the slightly stronger hypothesis ($\gamma_2$) of Corollary 2. This weakening, which does not alter the proof, was suggested by J. Denel.

Remark 5. Rule ($r_1$) for generating the sequence assumes that we are able to check whether or not a given point $x$ belongs to $P$. The next corollary uses a more flexible rule ($r_2$) which permits us to take for $x_{i+1}$ a point different from $x_i$ even if $x_i \in P$: that is to say using the rule $x_{i+1} \in F_2(x_i)$. This flexibility is obtained at the cost of a slight strengthening of the hypotheses.

Corollary 2. We use the same definitions and hypotheses as in Corollary 1, with the following modifications: $F_1$, $F_2$ are defined over the whole set $E$. ($\alpha_1$) and ($\beta_1$) are supposed valid for the whole set $E$. ($\gamma_1$) is replaced by:

($\gamma_2$) If $x \notin P$, $\exists V(x)$, a neighbourhood of $x$, such that: $x' \in E \cap V(x)$, $x'' \in F_2(x') \Rightarrow x \notin F_1(x'')$.

Here, the generating rule ($r_1$) becomes:

($r_2$) $x_0 \in E$; $x_{i+1} \in F_2(x_i)$ if $x_i \notin P$; $x_{i+1} \in F_2(x_i) \cup \{x_i\}$ otherwise.

Under these conditions, for any cluster point $x_*$ of the sequence we have $x_* \in P$.

Proof. It is a direct application of Proposition 0, taking $Q = \emptyset$.
Corollary 3 (Extension of Zangwill's and Polak's theorems [6, 5]). Let $E$ be a closed subset of $R^n$, $P$ be a subset of $E$, $F_2 : E \to \mathcal{P}(E)$, $f : E \to R$. We assume for all $x \in E - P$:

($\alpha_3$) $F_2(x) \ne \emptyset$.
($\beta_3$) $x' \in F_2(x) \Rightarrow f(x') > f(x)$.
($\gamma_3$) $f$ upper-semi-continuous at $x$. $\exists V(x)$, a neighbourhood of $x$, such that: $x' \in (E - P) \cap V(x)$, $x'' \in (E - P) \cap F_2(x') \Rightarrow f(x'') > f(x)$.

An infinite sequence $\{x_i \mid i \in N\}$ is obtained with the following rule:

($r_1$) $x_0 \in E$; $x_{i+1} \in F_2(x_i)$ if $x_i \notin P$; $x_{i+1} = x_i$ otherwise.

Under these conditions, for any cluster point $x_*$ of the sequence we have $x_* \in P$.
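The generating rule ($r_1$) of Corollary 3 can be written as a short loop. The sketch below assumes $E = [0, 4]$, $f(x) = -(x-1)^2$, $P = \{1\}$ and a single-valued selection of $F_2$ that moves half-way towards the maximizer; the function names and the data are illustrative assumptions only.

```python
def generate_sequence(x0, step_map, in_P, n_terms):
    """Rule (r1): x_{i+1} is taken from F_2(x_i) if x_i is not in P, else x_{i+1} = x_i."""
    xs = [x0]
    for _ in range(n_terms - 1):
        x = xs[-1]
        xs.append(x if in_P(x) else step_map(x))
    return xs


def f(x):
    """Assumed ascent function, maximal at x = 1."""
    return -(x - 1.0) ** 2


def step_map(x):
    """Assumed selection from F_2(x): move half-way towards 1, so f increases."""
    return x + 0.5 * (1.0 - x)


def in_P(x):
    """Membership test for the assumed solution set P = {1}."""
    return x == 1.0


xs = generate_sequence(4.0, step_map, in_P, 60)
print(xs[:4], xs[-1])
```

In this example the strict ascent condition ($\beta_3$) holds at every point outside $P$, and the only cluster point of the generated sequence is the point of $P$.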
Proof. This is an application of Corollary 1, taking for $F_1$ the point-to-set map defined by:

$$F_1(x) = \{y \in E \mid f(y) > f(x)\},$$

which satisfies ($\beta_1$). Furthermore, ($\beta_3$) implies $F_1 \supset F_2$. Lastly, ($\gamma_3$) implies ($\gamma_1$) in this particular context. In fact:

$$f(x'') > f(x) \Rightarrow x \notin \{y \in E \mid f(y) \ge f(x'')\},$$

$$f \text{ u.s.c. over } E - P, \ x'' \in E - P \Rightarrow \{y \in E \mid f(y) \ge f(x'')\} \supset F_1(x'') \cap (E - P),$$

and hence $x \notin F_1(x'')$.

Remark 6. With a slightly different formulation, Zangwill proposed in [6] the following stronger hypothesis: $E$ compact, $f$ continuous over $E$, $F_2$ closed over $E - P$. His hypothesis, with ($\beta_3$), implies ($\gamma_3$), as shown further on by Proposition 2. Polak proposed in [5] the following hypothesis. $\forall x \in E - P$, we have
(i) $f$ continuous at $x$;
(ii) $\exists V(x)$, a neighbourhood of $x$, and $\delta(x)$, a positive scalar, such that:

$$x' \in E \cap V(x), \ x'' \in F_2(x') \Rightarrow f(x'') > f(x') + \delta(x).$$

This hypothesis implies ($\gamma_3$) and ($\beta_3$).

Proposition 4. Using the notations and definitions of Corollary 3 we have the following relation. Suppose $E - P$ compact, $F_2$ closed at $x$, $f$ l.s.c. over $E - P$, $f(x') > f(x)$, $\forall x' \in F_2(x)$. Then there exists $V(x)$, a neighbourhood of $x$, such that

$$x' \in (E - P) \cap V(x), \ x'' \in (E - P) \cap F_2(x') \Rightarrow f(x'') > f(x).$$

Proof. Let us assume the negation of the conclusion, that is:

(Hyp.) $\forall V(x)$, $\exists x' \in (E - P) \cap V(x)$, $\exists x'' \in (E - P) \cap F_2(x')$ such that $f(x'') \le f(x)$

and let us show that this hypothesis leads to a contradiction. Under this hypothesis, there exist two sequences of points $x_i$ and $y_i$ such that:

$$\{x_i \in E - P \mid i \in N\} \to x, \qquad \{y_i \in (E - P) \cap F_2(x_i) \mid i \in N\} : \ f(y_i) \le f(x)$$

[...] $f(x)$, in contradiction with the preceding result.
References

[1] F. Cordellier and J.C. Fiorot, "On the Fermat-Weber problem with convex cost function", Mathematical Programming 14 (1978).
[2] J.C. Fiorot and P. Huard, "Une approche théorique du problème de linéarisation en programmation mathématique convexe", Publication No. 42 du Laboratoire de Calcul (Université de Lille, 1974).
[3] P. Huard, "Optimisation dans $R^n$", Cours de 3ème cycle, Laboratoire de Calcul (Université de Lille, 1971).
[4] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
[5] E. Polak, "On the implementation of conceptual algorithms", in: O.L. Mangasarian, K. Ritter and J.B. Rosen, eds., Nonlinear programming (Academic Press, New York, 1970) 275-291.
[6] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice Hall, Englewood Cliffs, NJ, 1969).
Mathematical Programming Study 10 (1979) 104-109. North-Holland Publishing Company
ON THE LOWER SEMICONTINUITY OF OPTIMAL SETS IN CONVEX PARAMETRIC OPTIMIZATION

D. KLATTE
Der Humboldt Universität, Berlin, G.D.R.

Received December 1977
Revised manuscript received March 1978

Regarding a special class of convex parametric problems, sufficient conditions for the lower semicontinuity of the optimal solution sets are developed.

Key words: Lower Semicontinuity, Convex Parametric Programs, Optimal Solutions Set, Point-to-Set Maps.
1. Introduction and notation
We consider a parametric programming problem given by

$$P(w): \quad \min\{f_0(x, w) \mid x \in M(w)\}, \qquad w \in W \text{ variable}, \tag{1}$$

where the parameter set $W$ is a metric space, $f_0$ is a real-valued function on $E^n \times W$ and for each $w \in W$, $M(w) \subseteq E^n$ represents the constraint set. $E^n$ is the Euclidean $n$-space.

Numerous authors have discussed the continuity of the optimal sets and/or the extreme values for various classes of linear and nonlinear mathematical programs; we refer to [1-15]. They have studied these questions by applying appropriate concepts of set convergence or of semicontinuity of point-to-set maps. Throughout this paper Berge's concepts of lower semicontinuous (l.s.c.), upper semicontinuous (u.s.c.) and closed point-to-set maps are used [1, Chapter VI]. In the second paragraph we will apply some results published by Hogan [8]. For point-to-set maps from a metric space $W$ into the Euclidean $n$-space Hogan's definitions of semicontinuity are equivalent to those given by Berge (see [8, pp. 592-595]).

The purpose of this note is to present a sufficient condition for the l.s.c. of the optimal set map $\psi : W \to E^n$ defined by

$$\psi(w) = \{z \in M(w) \mid f_0(z, w) = \inf_{x \in M(w)} f_0(x, w)\}. \tag{2}$$

Under rather general conditions the map $\psi$ is closed or u.s.c. on the solvability set $W_P$ given by

$$W_P = \{w \in W \mid \psi(w) \ne \emptyset\} \tag{3}$$
(see [1, 2, 3, 8, 12]), while the l.s.c. of $\psi$ on $W_P$ requires very strong assumptions. It is a well established fact that $\psi$ is not l.s.c. on $W_P$ in the simple case of linear programs parametrized in the objective function. Some classes of parametric programs which satisfy the l.s.c. property of $\psi$ have been discussed in [2, 5, 11, 14], for instance.

The following lemma will be used in the next paragraph. It can be proved in the same way as Theorem 13 [8]. (A straightforward proof will be found in [10].)

Lemma. Suppose that $W$ is a metric space, $w \in W$, $J$ is a finite index set, $f_j$ are real-valued, continuous functions on $E^n \times W$ for $j \in J$, and the functions $f_j(\cdot, w)$, $j \in J$, are convex on $E^n$. Let point-to-set maps $L$, $F^0$ and $F$ from $W$ into $E^n$ be given such that $F^0$
and F~
L(u)n{xEE"lfj(x,u)_s (x -q- s ) 2 i f x < _ - s , 0 if-s<x<s, ( z + 4 ) 2 if z - < - 4 , z2 if z -->0, 0 if - 4 < z < 0 .
O b v i o u s l y , 15 satisfies the suppositions A I - A 4 , and the constraint set m a p is l.s.c. on El+. It is easily s h o w n that
q,(t, s ) = {(x, y,
E 3 I [ - s ---x -< s, y = 1 - 8 8 2, z = - I t } fortO,
~O(t,s)={(x,y,z)EE 3[-s 21xl-', x2 = 1}
(x ~ 0),
~bv(0) = {x ~ E2 [ xl --- 0, x2 = 1}, $D(A) = {(0, O, 1)} (A ~ 0),
$D(O) = {(0, O, 0)}.
Here the third component of the dual variable is connected with the third primal condition x 2 - l - 0,
where $\langle \cdot, \cdot \rangle$ denotes the inner product, and maximal monotone if its graph is not properly contained in that of any other monotone operator). We use the term "generalized equation" because if $T$ is identically zero, then (1.1) reduces to the equation $f(x) = 0$, and because systems like (1.1) retain some of the analytic properties of nonlinear equations, as we shall show in what follows. We shall be particularly interested in conditions which, when imposed on $f$ and $T$, will ensure that the set of solutions to (1.1) remains nonempty and is well behaved (in a sense to be defined) when $f$ is subjected to small perturbations. To introduce these perturbations, we shall make use of a topological space $P$ and a
* Sponsored by the United States Army under Contract No. DAAG29-75-C-0024 and by the National Science Foundation under Grant No. MCS74-20584 A02.
S.M. Robinson / Generalized equations, part I
function $f : P \times \Omega \to R^n$, so that we can replace (1.1) by
$$0 \in f(p, x) + T(x), \tag{1.2}$$
and study the set of $x$ which solve (1.2) as $p$ varies near a base value $p_0$. A particular case of (1.2) of special interest for applications is that in which $T$ is taken to be the operator $\partial\psi_C$, where for a closed convex set $C \subset R^n$ one defines the indicator function $\psi_C$ of $C$ by
$$\psi_C(x) := \begin{cases} 0, & x \in C, \\ +\infty, & x \notin C, \end{cases}$$
and where $\partial$ denotes the subdifferential operator [13, Section 23]. This yields the special generalized equation
$$0 \in f(p, x) + \partial\psi_C(x), \tag{1.3}$$
which expresses analytically the geometric idea that $f(p, x)$ is an inward normal to $C$ at $x$. Many problems from mathematical programming, complementarity, mathematical economics and other fields can be represented in the form (1.3): for example, the nonlinear complementarity problem
$$F(x) \in K^*, \quad x \in K, \quad \langle x, F(x) \rangle = 0 \tag{1.4}$$
where $F : R^n \to R^n$, $K$ is a nonempty polyhedral convex cone in $R^n$, and $K^* := \{y \in R^n \mid \langle y, k \rangle \ge 0 \text{ for each } k \in K\}$, can be written as $0 \in F(x) + \partial\psi_K(x)$. Further information on nonlinear complementarity problems (often with $K = R^n_+$, the non-negative orthant) may be found in, e.g., [2, 4, 7, 8]. The Kuhn-Tucker necessary conditions for mathematical programming [6] form a special case of (1.4); e.g., for the problem
$$\text{minimize } \theta(y), \quad \text{subject to } g(y) \le 0, \quad h(y) = 0, \tag{1.5}$$
where $\theta$, $g$ and $h$ are differentiable functions from $R^m$ into $R$, $R^q$ and $R^r$ respectively, one has the Kuhn-Tucker conditions
$$\theta'(y) + u g'(y) + v h'(y) = 0, \quad h(y) = 0, \quad u \ge 0,$$
$g(y)$ 0 and $\eta > 0$ with $X_\gamma := X_0 + \gamma B \subset \Omega$, such that for each $x_0 \in X_0$:
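(i) $X_\gamma \cap (Lf_{x_0} + T)^{-1}(0) = X_0$;
(ii) $X_\gamma \cap (Lf_{x_0} + T)^{-1}$ is U.L.($\lambda$) at 0 with respect to $\eta B$;
(iii) for each $y \in \eta B$, $X_\gamma \cap (Lf_{x_0} + T)^{-1}(y)$ is convex and nonempty.
Then there exist a number $\delta \in (0, \gamma]$ and a neighborhood $U(p_0)$ such that with
$$\Sigma(p) := \begin{cases} \{x \in X_0 + \delta B \mid 0 \in f(p, x) + T(x)\}, & p \in U, \\ \emptyset, & p \notin U, \end{cases}$$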
one has:
(1) $\Sigma$ is upper semicontinuous from $U$ to $R^n$;
(2) $\Sigma(p_0) = X_0$; and
(3) For each $\epsilon > 0$, for some neighborhood $U_\epsilon(p_0)$ and for each $p \in U_\epsilon$, $\emptyset \neq \Sigma(p) \subset \Sigma(p_0) + (\lambda + \epsilon)\alpha_0(p)B$, where $\alpha_0(p) := \max\{\|f(p, x) - f(p_0, x)\| \mid x \in X_0\}$.
Note that if $P$ is actually a normed linear space and if $f(p, x)$ is Lipschitzian in $p$ uniformly over $x \in X_0$, then for some constant $\mu$ and each $p \in U_\epsilon$ we have
$$\Sigma(p) \subset \Sigma(p_0) + (\lambda + \epsilon)\mu\|p - p_0\|B,$$
so that $\Sigma$ is locally U.L.$[(\lambda + \epsilon)\mu]$ at $p_0$.
Proof. Choose $x_0 \in X_0$; denote $Lf_{x_0} + T$ by $Q(x_0)$. Let $\theta \in (0, \eta]$ with $\lambda\theta \le \gamma$ and let $y \in \theta B$; then $X_\gamma \cap Q(x_0)^{-1}(y) \subset X_0 + \lambda\|y\|B \subset X_\gamma$. Hypothesis (iii), together with closure of $Q(x_0)$, implies that for each $y \in \theta B$, $X_\gamma \cap Q(x_0)^{-1}(y)$ is non-empty, compact and convex. In particular, $X_0$ is a compact convex set. The basic idea of the proof is to approximate the inverse of the operator $f(p, x) + T(x)$ by the inverse of the operator
$$Q(\pi(x))(z) := Lf_{\pi(x)}(z) + T(z) = f(p_0, \pi(x)) + f_2(p_0, \pi(x))(z - \pi(x)) + T(z),$$
where $\pi(x)$ is the closest point to $x$ in $X_0$, just as one approximates the inverse of a function in the classical inverse-function theorem by the inverse of its linearization about some point. We then apply a fixed-point theorem; in proving the inverse-function theorem one usually uses the contraction principle, but here we have to use the Kakutani theorem. Observe that the "linearized" operator appearing here is of the type we discussed above in considering linear generalized equations; this illustrates our comment that these operators play a role in the analysis of generalized equations analogous to that of linear operators in classical analysis. Of course, during this approximation it will be necessary to be careful that we work with the correct component of the inverse image (i.e., that lying in $X_\gamma$), and this adds a certain amount of complexity to the notation.
Define, for two subsets $A$ and $C$ of $R^n$ and a point $x \in R^n$, $d[x, C] := \inf\{\|x - c\| \mid c \in C\}$ and $d[A, C] = \sup\{d[a, C] \mid a \in A\}$, where the supremum and infimum of $\emptyset$ are defined to be $-\infty$ and $+\infty$ respectively. Denote by $\pi$ the projection from $R^n$ onto $X_0$; $\pi$ is well known to be nonexpansive, hence a fortiori continuous. Using continuity and compactness, one can show that the function
$$\beta(\delta) := \max\{\|f_2(p_0, x) - f_2(p_0, \pi(x))\| \mid x \in X_0 + \delta B\}$$
is well defined for small $\delta$, and is continuous at 0 with $\beta(0) = 0$. Thus, we can choose a $\delta \in (0, \gamma]$ such that $\lambda\beta(\delta) \le \tfrac{1}{2}$ and $\delta\beta(\delta)$ 0 we obtain $[\lambda/(\lambda + \epsilon)]\,d[x, \Sigma(p_0)] \le \lambda\alpha_0(p)$ and thus
$$d[x, \Sigma(p_0)] \le (\lambda + \epsilon)\alpha_0(p). \tag{2.6}$$
On the other hand, if $\lambda = 0$, then (2.5) implies that $d[x, \Sigma(p_0)] = 0$, in which case (2.6) holds trivially. In either case, therefore,
$$\Sigma(p) \subset \Sigma(p_0) + (\lambda + \epsilon)\alpha_0(p)B,$$
which completes the proof.
Verification of the hypotheses of this theorem in a particular case may be difficult; this is particularly true of (ii) and (iii). It is therefore desirable to look for classes of problems for which this verification may be easier. In the next section we exhibit such a class for hypothesis (ii); we do so for (iii) in the following proposition.
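The point-to-set distance $d[x, C]$ and the one-sided set distance $d[A, C]$ used in the proof can be sketched for finite point sets as follows. This is only an illustration under the assumption that sets are given as finite lists of coordinate tuples; the names `d_point` and `d_set` are hypothetical. The conventions for the empty set follow the definition in the text (infimum $+\infty$, supremum $-\infty$).

```python
import math


def d_point(x, C):
    """d[x, C] = inf{||x - c|| : c in C}; +inf when C is empty."""
    return min((math.dist(x, c) for c in C), default=math.inf)


def d_set(A, C):
    """d[A, C] = sup{d[a, C] : a in A}; -inf when A is empty.

    The quantity is one-sided: d_set(A, C) and d_set(C, A) differ in general.
    """
    return max((d_point(a, C) for a in A), default=-math.inf)
```

For finite sets, `d_set(A, C) <= r` holds exactly when every point of `A` lies within distance `r` of `C`, which is the form of inclusion appearing in conclusion (3) of Theorem 1.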
Proposition 1. In Theorem 1, the hypothesis (iii) may be replaced by
(iii)' $f_2(p_0, x_0)$ is positive semidefinite and $T$ is maximal monotone.
Proof. We shall show that (iii)', together with the other hypotheses of Theorem 1, implies (iii). Choose any $x_0 \in X_0$; under (iii)' the function $Lf_{x_0}$ will be a maximal monotone operator. As $T$ is also maximal monotone and as $\mathrm{dom}\, Lf_{x_0}$ (the effective domain of $Lf_{x_0}$) is all of $R^n$, we have from [1, Corollary 2.7] that $Q(x_0)$ is maximal monotone; hence so is $Q(x_0)^{-1}$. The set $Q(x_0)^{-1}(0)$ is then convex, so that (i) implies that $Q(x_0)^{-1}(0) = X_0$. It follows that for $y \in \eta B$, $X_\gamma \cap Q(x_0)^{-1}(y) \subset X_0 + \lambda\|y\|B$ (by (ii)). Now let $\alpha \in (0, \eta]$ with $\lambda\alpha < \gamma$. If $y \in \alpha B$, the convexity of $Q(x_0)^{-1}(y)$ implies that $X_\gamma \cap Q(x_0)^{-1}(y) = Q(x_0)^{-1}(y)$, so $Q(x_0)^{-1}$ is locally U.L.($\lambda$) at 0. But this, together with the boundedness of $Q(x_0)^{-1}(0)$, shows that $Q(x_0)^{-1}$ is locally bounded at 0; in fact, it must be locally bounded at every point of $\mathrm{int}\,\alpha B$, since the image of some ball around such a point will be contained in the image of $\alpha B$, which in turn is contained in the bounded set $X_{\lambda\alpha} = X_0 + \lambda\alpha B$. But then from [12, Theorem 1] we have that $\mathrm{int}\,\alpha B$ cannot contain any boundary point of $\mathrm{dom}\, Q(x_0)^{-1}$; however, as $\mathrm{int}\,\alpha B$ meets $\mathrm{dom}\, Q(x_0)^{-1}$ (at 0) and is connected we finally conclude that $\mathrm{int}\,\alpha B \subset \mathrm{int}\,\mathrm{dom}\, Q(x_0)^{-1}$. Thus, for each $y$ with $\|y\| < \alpha$ the set $Q(x_0)^{-1}(y)$ is nonempty, convex and contained in $X_{\lambda\alpha} \subset X_\gamma$. Now let $\eta_0$ be any positive number smaller than $\alpha$. As hypothesis (ii) of Theorem 1 was true for $\eta$, and as $\alpha \le \eta$, that hypothesis will be satisfied also for $\eta_0$; as we have just seen, hypothesis (iii) also holds for $\eta_0$, and this proves Proposition 1.
The hypothesis (iii)' is certainly simpler than is (iii); however, (iii) covers a more general class of problems. For example, consider the linear generalized equation
$$0 \in -\alpha x + \beta + \partial\psi_{[-1,1]}(x),$$
where $\alpha > 0$. This does not satisfy (iii)'; however, if $|\beta| \neq \alpha$ then each of its solutions (one if $|\beta| > \alpha$, three if $|\beta| < \alpha$) can be analyzed under (iii). If $|\beta| = \alpha$ then the solution at $-\operatorname{sgn}\beta$ can be so analyzed, but the solution at $\operatorname{sgn}\beta$ cannot (indeed, the conclusions of Theorem 1 fail for that solution).
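The solution counts in this example can be checked by direct enumeration. The sketch below is an illustration only; it assumes the standard description of the normal cone to $[-1, 1]$ (zero in the interior, $[0, +\infty)$ at $1$, $(-\infty, 0]$ at $-1$), and the function name `solutions` is hypothetical.

```python
def solutions(alpha, beta):
    """Solutions of 0 in -alpha*x + beta + N_[-1,1](x), assuming alpha > 0.

    Equivalently, alpha*x - beta must belong to the normal cone N_[-1,1](x).
    """
    sols = set()
    # Interior candidate: the normal cone is {0}, so alpha*x - beta = 0.
    x = beta / alpha
    if -1.0 < x < 1.0:
        sols.add(x)
    # x = 1: the normal cone is [0, +inf), so alpha - beta >= 0 is required.
    if alpha - beta >= 0.0:
        sols.add(1.0)
    # x = -1: the normal cone is (-inf, 0], so -alpha - beta <= 0 is required.
    if -alpha - beta <= 0.0:
        sols.add(-1.0)
    return sorted(sols)
```

With `alpha = 1.0`, the call `solutions(1.0, 0.5)` returns three points, `solutions(1.0, 2.0)` returns one, and in the borderline case `solutions(1.0, 1.0)` the interior solution has merged with the solution at $\operatorname{sgn}\beta = 1$.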
3. Polyhedral multifunctions
In the last section, we exhibited a class of problems for which hypothesis (iii) of Theorem 1 always held. Here we do somewhat the same thing for hypothesis (ii): we show that for a class of multifunctions important in applications to optimization and equilibrium problems, local upper Lipschitz continuity holds at each point of the range space. The problem of verifying hypothesis (ii), in the case of such functions, then reduces to that of showing that the Lipschitz constants are uniformly bounded and that the continuity holds on a fixed neighborhood for each function in the family considered. For the application given in Section 4 this is trivial; some cases in which it is non-trivial are treated in Part II.
Definition 2. A multifunction $Q : R^n \to R^m$ is polyhedral if its graph is the union of a finite (possibly empty) collection of polyhedral convex sets (called components).
Here we use "polyhedral convex set" as in [13, Section 19]. It is clear that a polyhedral multifunction is always closed, and that its inverse is likewise polyhedral. Further, one can show without difficulty that the class of polyhedral multifunctions is closed under scalar multiplication, (finite) addition, and (finite) composition. The following proposition shows that they have good properties also with respect to upper Lipschitz continuity. For brevity, we omit the proofs of this proposition and the next; they may be found in [10].
Proposition 2. Let $F$ be a polyhedral multifunction from $R^n$ into $R^m$. Then there exists a constant $\lambda$ such that $F$ is locally U.L.($\lambda$) at each $x_0 \in R^n$.

It is worth pointing out that $\lambda$ depends only on $F$ and not on $x_0$, although of course the size of the neighborhood of $x_0$ within which the continuity holds will in general depend on $x_0$. The importance of polyhedral multifunctions for applications is illustrated by the following fact, in the statement of which we use the concepts of subdifferential and of a polyhedral convex function (one whose epigraph is a polyhedral convex set), which are discussed further in [13].

Proposition 3. Let $f$ be a polyhedral convex function from $R^n$ into $(-\infty, +\infty]$. Then the subdifferential $\partial f$ is a polyhedral multifunction.
It follows from this proposition that subdifferentials of polyhedral convex functions display the upper Lipschitz continuity required in Theorem 1. In view of our earlier remarks about polyhedral multifunctions, this behavior is not lost if we combine these subdifferentials in various ways with other polyhedral multifunctions. For example, let $C$ be a nonempty polyhedral convex set in $R^n$ and let $\psi_C : R^n \to (-\infty, +\infty]$ be its indicator function, defined by
$$\psi_C(x) := \begin{cases} 0, & x \in C, \\ +\infty, & x \notin C. \end{cases}$$
It is readily verified that $\psi_C$ is a polyhedral convex function. Now, if $A$ is a linear transformation from $R^n$ into itself and $a \in R^n$, then the operator $Ax + a + \partial\psi_C(x)$ and its inverse are, by Propositions 2 and 3, everywhere locally upper Lipschitzian. Hence, generalized linear equations have good continuity properties with respect to perturbations of the right-hand side; we shall exploit this fact in the next section. This discussion also shows that, if the operator $T$ in Theorem 1 is polyhedral, then the linearized operators $Lf_{x_0} + T$ have at least some of the continuity properties required in hypothesis (ii) of that theorem; it is still necessary to prove uniformity, but this is trivial if $X_0$ is a singleton, while in general it can often be done by using the structure of the problem (e.g., in nonlinear programming: see Part II of this paper).
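The continuity with respect to right-hand-side perturbations can be observed numerically in the simplest setting. The sketch below assumes $n = 1$, a positive scalar $A$ and an interval $C = [lo, hi]$; under these assumptions (which are made here for illustration and are not in the text) the solution of $0 \in Ax + a + \partial\psi_C(x)$ is unique and equals the projection of $-a/A$ onto $C$. The function name `solve_scalar_ge` is hypothetical.

```python
def solve_scalar_ge(A, a, lo, hi):
    """Unique solution of 0 in A*x + a + N_[lo,hi](x) for a scalar A > 0.

    For A > 0 the inclusion is equivalent to x = proj_[lo,hi](-a / A),
    so the solution is obtained by clipping -a / A to the interval.
    """
    if A <= 0:
        raise ValueError("this sketch assumes A > 0")
    return min(max(-a / A, lo), hi)
```

Because clipping is nonexpansive, the solution moves by at most $|a' - a| / A$ when $a$ is perturbed to $a'$, which is an instance of the Lipschitz behavior described above with modulus $1/A$.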
4. An application: stability of a linear generalized equation
To illustrate an application of Theorem 1, we specialize it to analyze the behavior of the solution set of the linear generalized equation
$$0 \in Ax + a + \partial\psi_C(x), \tag{4.1}$$
where $A$ is an $n \times n$ matrix, $a \in R^n$, and $C$ is a nonempty polyhedral convex set in $R^n$. Such problems include, as special cases, the problems of linear and quadratic programming and the linear complementarity problem. We shall characterize stability of the solution set of (4.1) when the matrix $A$ is positive semidefinite (but not necessarily symmetric); a more general (but more complicated) result could be obtained by dropping the assumption of positive semidefiniteness but assuming hypothesis (iii) of Theorem 1.
Theorem 2. Let $A$ be a positive semidefinite $n \times n$ matrix, $C$ be a nonempty polyhedral convex set in $R^n$ and $a \in R^n$. Then the following are equivalent:
(a) The solution set of (4.1) is nonempty and bounded.
(b) There exists $\epsilon_0 > 0$ such that for each $n \times n$ matrix $A'$ and each $a' \in R^n$ with $\epsilon' := \max\{\|A' - A\|, \|a' - a\|\} < \epsilon_0$,
0 such that for each $A'$, $a'$ with $\max\{\|A' - A\|, \|a' - a\|\} < \epsilon_1$ we have
$$\emptyset \neq S(A', a') \cap \Psi \subset S(A, a) + \lambda\epsilon'(1 - \lambda\epsilon')^{-1}(1 + \mu)B. \tag{4.3}$$
Finally, if $(A', a')$ are restricted to values for which $S(A', a')$ is known to be connected (in particular, if $A'$ is restricted to be positive semidefinite), then $\Psi$ can be replaced by $R^n$.
Proof. (b $\Rightarrow$ a) If (b) holds then in particular $S(A, a')$ is nonempty for all $a'$ in some ball about $a$. This means that 0 belongs to the interior of the range of the operator $A(\cdot) + a + \partial\psi_C(\cdot)$, which is maximal monotone by [1, Corollary 2.7]. Accordingly, the inverse of this operator is locally bounded at 0 [1, Proposition 2.9] and so in particular $S(A, a)$ is bounded.
(a $\Rightarrow$ b) We apply Theorem 1, taking $P$ to be the normed linear space of pairs $(A', a')$ of $n \times n$ matrices and points of $R^n$, with the distance from $(A', a')$ to $(A'', a'')$ given by $\max\{\|A' - A''\|, \|a' - a''\|\}$; we take $p_0 := (A, a)$, $T := \partial\psi_C$, and $f[(A', a'), x] := A'x + a'$. The set $X_0$ is then $S(A, a)$; we let $\Omega$ be any open bounded set containing $X_0$, and since $Lf_{x_0}(x) = Ax + a$ for any $x_0$, it is clear that the hypotheses are satisfied (note that Proposition 1 implies that (iii) holds). We then find that for some $\delta > 0$, $\epsilon_0 > 0$ and all $(A', a')$ with $\epsilon' < \epsilon_0$, we have $S(A', a') \cap [S(A, a) + \delta B]$ nonempty, which proves (b). Now choose $\Psi$; without loss of generality we can suppose that $\Omega$ was taken to be this $\Psi$. As $\Psi$ is bounded, we can find $\epsilon_1 \in (0, \epsilon_0]$ with $\lambda\epsilon_1 < 1$ and such that for each $x \in \Psi$, $\epsilon_1(1 + \|x\|) \le \eta$, where $\eta$ is the parameter appearing in Theorem 1. Now pick any $(A', a')$ with $\epsilon' < \epsilon_1$; by the above discussion $S(A', a') \cap \Psi$ is nonempty, and we take $x'$ to be any point of that intersection. We know that $0 \in A'x' + a' + \partial\psi_C(x')$, which is equivalent to
$$x' \in [A(\cdot) + a + \partial\psi_C(\cdot)]^{-1}[(A - A')x' + (a - a')].$$
But since $x' \in \Psi$,
$$\|(A - A')x' + (a - a')\| \le \max\{\|A - A'\|, \|a - a'\|\}(1 + \|x'\|) \le \epsilon_1(1 + \|x'\|) \le \eta,$$
and so by upper Lipschitz continuity, $d[x', S(A, a)] \le \lambda\|(A - A')x' + (a - a')\|$. Now let $x_0$ be the closest point to $x'$ in $S(A, a)$; then
$$\|(A - A')x' + (a - a')\| \le \|(A - A')x_0 + (a - a')\| + \|(A - A')(x' - x_0)\| \le \epsilon'(1 + \mu) + \epsilon'\|x' - x_0\|.$$
Accordingly, as $\|x' - x_0\| = d[x', S(A, a)]$ we have
$$d[x', S(A, a)] \le \lambda\epsilon'(1 + \mu) + \lambda\epsilon'\,d[x', S(A, a)],$$
yielding $d[x', S(A, a)] \le \lambda\epsilon'(1 - \lambda\epsilon')^{-1}(1 + \mu)$. Since $x'$ was arbitrary in $S(A', a') \cap \Psi$, we have (4.3). Finally, we observe that for all small $\epsilon'$, $S(A', a') \cap \Psi$ is contained in $S(A, a) + \delta B$ which is contained in $\Psi$. If $S(A', a')$ also met the complement of $\Psi$ then it would be disconnected; thus if $S(A', a')$ is connected it must lie entirely in $\Psi$, so that we may replace $\Psi$ by $R^n$ in (4.3). In particular, if $A'$ is positive semidefinite then $A'(\cdot) + a' + \partial\psi_C(\cdot)$ is maximal monotone, so that $S(A', a')$ is convex as the inverse image of 0 under this operator. This completes the proof.
One might wonder, since the boundedness of $\Psi$ is used at only one place in the proof, whether a refinement of the technique would permit replacement of $\Psi$ by $R^n$ in all cases. The following example shows that this cannot be done even for $n = 1$: take $C = R_+$, $A = [0]$ and $a = [1]$, so that the problem is
$$0 \in [0]x + [1] + \partial\psi_{R_+}(x),$$
whose solution set is $S([0], [1]) = \{0\}$. However, it is readily checked that for any $\epsilon > 0$, $S([-\epsilon], [1]) = \{0, \epsilon^{-1}\}$; thus we cannot take $\Psi = R$ in this case.
Theorem 2 provides, in particular, a complete stability theory for convex quadratic programming (including linear programming) and for linear complementarity problems with positive semidefinite matrices; this extends earlier work of Daniel [3] on strictly convex quadratic programming, and of the author [11] on linear programming. Stability results for more general nonlinear programming problems are developed in Part II of this paper.
It might be worth pointing out that the strong form of Theorem 2 (i.e., with $A'$ restricted to be positive semidefinite) can sometimes be shown to hold because of the form of the problem. For example, consider the quadratic programming problem minimize
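The one-dimensional counterexample can be verified by enumerating the solutions of $0 \in Ax + a + \partial\psi_{R_+}(x)$ for scalar data. The sketch below is an illustration under the assumption that $A$ and $a$ are real numbers; the name `solution_set` is hypothetical, and the degenerate case $A = a = 0$, where every $x \ge 0$ is a solution, is excluded.

```python
def solution_set(A, a):
    """Solutions of 0 in A*x + a + N_{R+}(x) for scalars A and a.

    At x = 0 the normal cone is (-inf, 0], so x = 0 solves the inclusion
    exactly when a >= 0.  At x > 0 the normal cone is {0}, so A*x + a = 0.
    """
    if A == 0 and a == 0:
        raise ValueError("every x >= 0 is a solution; not representable as a list")
    sols = set()
    if a >= 0:
        sols.add(0.0)
    if A != 0:
        x = -a / A
        if x > 0:
            sols.add(x)
    return sorted(sols)
```

For instance, `solution_set(0.0, 1.0)` gives the single solution 0, while `solution_set(-0.5, 1.0)` also contains the point 2; as the negative coefficient tends to zero the second solution escapes to infinity, which is why a bounded set is needed in (4.3).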
89
Qx) + ,
subject to
B x + Dy g o ( x ) - ~ , Ai(v', y*>,
(3.2)
I
and M ( g , E) -> go(x).
Proof. Let x = Y.Aivi. F r o m (3.1) go(x) > go(v i) + (x - v i, yi*).
(3.3)
R. Saigal/ Fixed point approach to nonlinear programming
Hence
g0(~) >-- Y, x,g0(v') - ~x,(v', y*) i-I
>=go(X) - ~ ~i--go(v i) + (s - x, y*) and so
$$\sum \lambda_i g_0(v^i - x + \bar{x}) \ge \sum \lambda_i g_0(v^i) \ge g_0(x).$$
But as $v^i - x + \bar{x} \in B(\bar{x}, \epsilon)$, we have our result.
Note that in Theorem 3.3, (3.2) gives a computable lower bound on the minimum value of $g(x)$ and can thus be used as a stopping rule. Also, (3.3) shows that the algorithm is converging to a minimum.
3.2. Constrained case
We now consider the problem (1.1-2) when the functions $g_i$ are convex functions. Define $s(x) = \max\{g_i(x) : 1$ 0, since the contrary implies that $0 \in \partial s(\bar{x})$ and so $s(\bar{x}) < 0$ is impossible. Let $I(\bar{x}) = \{i : g_i(\bar{x}) = 0\}$. Then, $\partial s(\bar{x}) = \mathrm{hull}\{\bigcup_{i \in I(\bar{x})} \partial g_i(\bar{x})\}$
Ai = 1, and yi* E agi(,~) such that
and so there are n u m b e r s A,>= 0 , i E I(.~), ~ y* = ~ Xiy'*,. H e n c e , pz* + ~, Air* = 0. N o w , let y satisfy (1.2). Then, f r o m (3.1),
go(Y) >=go(E) + (z*, y - ~)), g,(y) >=g~($) + (y~*, y - 2),
i E I(Y,).
Hence
pg0(y) ->-pg0(y) +~. x~(y) >=pgo(x)+(pz* + ~ Aiy~', y - ~ ) = pgo(~)
and hence $\bar{x}$ solves (1.1-2). We now show that the algorithm initiated with
$$r(x) = x - x_0$$
for arbitrary $x_0$ and $l(x)$ as in (3.4) will compute an $\bar{x}$ such that $0 \in l(\bar{x})$.

Theorem 3.5. Let $s$ satisfy the above assumptions, and let the algorithm implement $l$ and $r$ as above. Then, for arbitrary $\epsilon_0 > 0$, starting with a unique $r$-complete simplex of size $\epsilon_0$, the algorithm will generate a sequence whose cluster points $\bar{x}$ satisfy $0 \in l(\bar{x})$, and thus solve (1.1-2).
Proof. L e t
M(xo, ~) = max{sup{s(x): x C B(Xo, e)}, 0} and D = {x: s(x) 0 for all x, and thus (3.4) r e d u c e s to l(x) = as(x). A c o n s e q u e n c e of T h e o r e m 3.3 is the following. (This result also a p p e a r s in Merrill [7].)
3.6. L e t s ( x ) > 0 f o r all x, and that {x: s(x) 0 , the a l g o r i t h m s c o m p u t e a 1 U r - c o m p l e t e s i m p l e x in a finite n u m b e r o f i t e r a t i o n s . A l s o , s i n c e s ( x ) > 0, it will a t t e m p t to m i n i m i z e s(x). N o w , let o- = (v t . . . . . v "'~) be an / - c o m p l e t e s i m p l e x of size 9 > 0 f o u n d b y the a l g o r i t h m . T h e n , t h e r e a r e y* E Os(v ~) s u c h t h a t ~ . hiy* = 0, ~ Ai = 1, hi -> 0 has a solution. A l s o , f r o m T h e o r e m 3.3, if g m i n i m i z e s s,
s(g) >- s ( x ) - ~ x,(v ~, y*) = s(x)-
Y . xi O, f o r s o m e sufficiently s m a l l 9 > O, s ( x ) - ~, ,L(v i, y~) > O, and hence we are done.
In a d d i t i o n , we c a n o b t a i n a l o w e r b o u n d o n the f u n c t i o n in this c a s e as well. L e t ~ r = ( v ~. . . . . v ~. . . . . v r b e l a b e l e d b y y* E 8g0(v ~) a n d v "'~ . . . . . H e n c e s ( v ~ ) < O , i = 1 . . . . . r and s(v~)>-_6 f o r
optimal value of the objective v ~*~) be / - c o m p l e t e a n d let v "*f b e l a b e l e d b y y * E a s ( v i ) . i=r+l ..... n+l. A l s o , let
E Aiy~ = 0, E Ai = 1, Ai ~ 0, 0 = 2 1 A i a n d ~ -- ( 1 [ 0 ) ~ . ~ i Aivi" T h e n w e c a n prove:
Theorem
3.7. Let
v i. y*,
O, ~ be as
go(x) >=go(,r ) - 1 ,~, Ai(vi, y . ) . I:1
Proof. U s i n g (3.1) w e get, f o r i = 1. . . . .
go(g) >- go(v i) + (g - v i, Y*) and for i = r + 1.....
n + 1, w e get
s(g) >=s ( x ) + (g - v ~, y*).
r
above,
and
g solve
(1.1-2).
Then
H e n c e , as s ( s
~0
1
r
,,*I
1 ? I ~(v', y*) tl
->- go(~) -
1 ,,-I
*
and w e h a v e o u r result.
4. Piecewise linear functions and nonlinear programming
In this section we establish the notation and prove some basic results for nonlinear programs with piecewise linear functions.
Cells and manifolds
A cell is the convex hull of a finite number of points and half lines (half lines are sets of the type $\{x : x = a + tb,\ t \ge 0\}$ where $a$ and $b$ are fixed vectors in $R^n$). The dimension of a cell is the maximum number of linearly independent points in the cell. We will call an n dimensional cell an n-cell. Let $\tau$ be a subset of an n-cell $\sigma$. If $x, y \in \sigma$, $0 < \lambda < 1$, $(1 - \lambda)x + \lambda y \in \tau$ implies that $x, y$ are in $\tau$, then $\tau$ is called a face of the cell $\sigma$. A simple fact is that faces are cells. Also, faces that are (n-1)-cells are called facets of the cell, and faces that are 0-cells are called vertices of the cell. Let $\mathcal{M} \neq \emptyset$ be a collection of n-cells in $R^n$. Let $M = \bigcup_{\sigma \in \mathcal{M}} \sigma$. We call $(M, \mathcal{M})$ a subdivided n-manifold if
(4.1) Any two n-cells of $\mathcal{M}$ that meet, do so on a common face.
(4.2) Each (n-1)-face of a cell lies in at most two n-cells.
(4.3) Each $x$ in $M$ lies in a finite number of n-cells in $\mathcal{M}$.
If $(M, \mathcal{M})$ is a subdivided n-manifold for some $\mathcal{M}$, we call $M$ an n-manifold.
Piecewise linear functions
Let $M$ be an n-manifold; then the function
$$g : M \to R$$
is called piecewise linear on a subdivision $\mathcal{M}$ of $M$ if
(4.4) $g$ is continuous.
(4.5) Given a cell $\sigma$ in $\mathcal{M}$, there exists an affine function $g_\sigma : R^n \to R$ such that $g|_\sigma(x) = g_\sigma(x)$ (i.e., $g$ restricted to $\sigma$ is $g_\sigma$).
Generalized subdifferentials
Let $M$ be an n-manifold, and $\mathcal{M}$ be its subdivision. Let $g : M \to R$ be a piecewise linear function. Then, for each $x \in M$, we define a generalized subdifferential set $\partial g(x)$ as follows: From (4.3), $x$ lies in a finite number of n-cells, $\sigma_1, \sigma_2, \ldots, \sigma_r$ in $\mathcal{M}$ say. Let $\nabla g_{\sigma_i} = a_i$ (where $\nabla f$ is the gradient vector of $f$). Then, we define
$$\partial g(x) = \mathrm{hull}\{a_1, \ldots, a_r\}$$
and we note that if $g$, in addition, is convex, then $\partial g(x)$ is the subdifferential of $g$ at $x$, Rockafellar [10]; and, as $g$ is locally Lipschitz continuous, $\partial g(x)$ is the generalized gradient of Clarke [1]. In that case, the theorem below is known, [2], but we will use the piecewise linearity of $g$ to establish it.
Theorem 4.1. If $\bar{x}$ is a local minimum of $g_0$, then $0 \in \partial g_0(\bar{x})$.
Proof. Assume $\bar{x}$ is a local minimum but $0 \notin \partial g_0(\bar{x})$. Now, let $\bar{x} \in \sigma_1 \cap \sigma_2 \cap \cdots \cap \sigma_r$. Then $\partial g_0(\bar{x}) = \mathrm{hull}\{a_1, \ldots, a_r\}$. Hence, from Farkas' lemma, there is a $z \neq 0$ such that $\langle z, a_i \rangle < 0$ for $i = 1, \ldots, r$. Let $\epsilon$ be sufficiently small so that $B(\bar{x}, \epsilon) \subset \bigcup \sigma_j$. Then $\bar{x} + \theta z \in B(\bar{x}, \epsilon)$ for sufficiently small $\theta > 0$. Assume $\bar{x} + \theta z \in \sigma_j$ for some $j$. Hence
$$g_0(\bar{x} + \theta z) = \langle a_j, \bar{x} \rangle + \theta\langle a_j, z \rangle - \gamma_j = g_0(\bar{x}) + \theta\langle a_j, z \rangle < g_0(\bar{x})$$
and we have a contradiction to the fact that $\bar{x}$ is a local minimum.
Given a point-to-set mapping $F$ from $R^n$ to nonempty subsets of $R^n$, we say $F$ is weakly monotone at $\bar{x}$ with respect to $\bar{y}^* \in F(\bar{x})$ on $\Gamma$ if there is an $\epsilon > 0$ such that for all $x \in B(\bar{x}, \epsilon) \cap \Gamma$,
$$\langle x - \bar{x}, y^* - \bar{y}^* \rangle \ge 0 \quad \text{for all } y^* \text{ in } F(x).$$
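The generalized subdifferential defined above can be computed directly in one dimension, where the n-cells are intervals and the hull of the slopes is an interval. The following sketch is an illustration under stated assumptions: the function is described by its sorted breakpoints and the slope on each interval (continuity of the function is assumed, not checked), and the names `generalized_subdifferential` and `contains_zero` are hypothetical.

```python
import math


def generalized_subdifferential(breakpoints, slopes, x):
    """Hull of the slopes of all 1-cells containing x, as an interval (lo, hi).

    breakpoints: sorted list b_1 < ... < b_k.
    slopes: k + 1 slopes; cell i is the closed interval between consecutive
    breakpoints, with -inf and +inf as the outer end points.
    """
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need exactly one slope per cell")
    edges = [-math.inf] + list(breakpoints) + [math.inf]
    active = [s for i, s in enumerate(slopes) if edges[i] <= x <= edges[i + 1]]
    return (min(active), max(active))


def contains_zero(interval):
    """Check the necessary condition 0 in the subdifferential (Theorem 4.1)."""
    lo, hi = interval
    return lo <= 0.0 <= hi
```

For $g(x) = |x|$ (breakpoint 0, slopes $-1$ and $1$) the condition holds at the minimizer 0 and fails elsewhere. For $g(x) = -|x|$ (slopes $1$ and $-1$) it also holds at 0, which is a local maximum; this shows the condition is necessary but not sufficient without a further requirement such as weak monotonicity.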
We can then prove:
Theorem 4.2. $\bar{x}$ is a local minimum of $g_0$ if and only if $0 \in \partial g_0(\bar{x})$ and $\partial g_0$ is weakly monotone at $\bar{x}$ with respect to 0 on $R^n$.
Proof. Let $\bar{x}$ lie in the cells $\sigma_1, \sigma_2, \ldots, \sigma_r$. Then $\partial g_0(\bar{x}) = \mathrm{hull}\{a_1, \ldots, a_r\}$. For some sufficiently small $\epsilon > 0$, let $B(\bar{x}, \epsilon) \subset \bigcup \sigma_j$. To see the if part, let $0 \in \partial g_0(\bar{x})$ and let $\partial g_0$ be weakly monotone with respect to zero at $\bar{x}$. Hence, for some $\epsilon > 0$, for all $x \in B(\bar{x}, \epsilon)$ we have
$$\langle x - \bar{x}, a_i \rangle \ge 0 \quad \text{where } a_i \in \partial g_0(x) \subset \partial g_0(\bar{x}).$$
Hence,
$$g_0(x) - g_0(\bar{x}) = \langle a_i, x \rangle - \gamma_i - \langle a_i, \bar{x} \rangle + \gamma_i \ge 0,$$
and so $\bar{x}$ is a local minimum of $g_0$. To see the only if part, let $0 \in \partial g_0(\bar{x})$ and $\partial g_0$ not weakly monotone with respect to 0. Then, for a sufficiently small $\epsilon > 0$ such that $B(\bar{x}, \epsilon) \subset \bigcup \sigma_j$, there is an $x \in B(\bar{x}, \epsilon)$ and an $a_i \in \partial g_0(x)$ such that $\langle x - \bar{x}, a_i \rangle < 0$. Since $\partial g_0(x) \subset \partial g_0(\bar{x})$, we have
$$g_0(x) - g_0(\bar{x}) = \langle a_i, x \rangle - \gamma_i - \langle a_i, \bar{x} \rangle + \gamma_i = \langle x - \bar{x}, a_i \rangle$$
- 0 s u c h t h a t
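$\lambda_i g_i(\bar{x}) = 0$, $i = 1, \ldots, m$.
(ii) There exist $y^* \in \partial g_0(\bar{x})$, $z_i^* \in \partial g_i(\bar{x})$ such that
$$0 = y^* + \sum_{i=1} \lambda_i z_i^*.$$
Proof. Let $\bar{x}$ be a local minimum, $I(\bar{x}) = \{i : g_i(\bar{x}) = 0\}$,
$$C = \bigcup_{i \in I(\bar{x})} \partial g_i(\bar{x}),$$
and
$$\mathrm{cone}(C) = \{y : y = \sum_{i=1} \lambda_i y_i^*,\ y_i^* \in C,\ \lambda_i > 0\}.$$
Also, let $0 \notin \partial g_0(\bar{x}) + \mathrm{cone}(C)$. (It can readily be confirmed that (i) and (ii) hold if and only if $0 \in \partial g_0(\bar{x}) + \mathrm{cone}(C)$.) Then, from Farkas' lemma, since both $\partial g_0(\bar{x})$ and $C$ are convex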
c o m b i n a t i o n s of a finite n u m b e r of v e c t o r s , there exists a z such that (z, y*) < 0
f o r all y* ~ Ogo(~),
(z,y*)_-0 such that for Ogo($), a ~ Ogo(x). So,
o ~)+7O=O(aO, z) 0 for all i. N o w consider the point v i - x +x~ E / 3 . Also, by assumption, v ~ - x +x~ ~ D. H e n c e s ( v i - x + x l ) > s ( v ~ ) + ( x ~ - x, y *)
for a l l y * E a s ( v i)
and as v ~ b and v i - x + x t E / 3 we have ( x - x ~ , y * ) > 0 for all y* E Os(v~); for all i. Thus tr is not l U r-complete, a contradiction. The result now follows f r o m T h e o r e m 2.1.
6. Computational considerations
As is evident from Sections 3 and 5, the convergence of the fixed point algorithms can be established under some general conditions on the problem, and differentiability is not necessary. Computational experience indicates that the computation burden increases when the underlying mappings are not smooth. For smooth mappings, under the usual conditions, the fixed point algorithms can be made to converge quadratically, Saigal [15]. This can be observed by comparing the solution of four nondifferentiable nonlinear programming problems implementing the mapping (3.4) presented in the appendix, Tables A.1-4, with the solution of a smooth problem of eighty variables in Table A.5. On such a problem, reported in Netravali and Saigal [9], the growth of the number of function evaluations with the number of variables was tested. The results were as anticipated by the works of Saigal [12] and Todd [17]. It was predicted in these works that the function evaluations grow as $O(n^2)$, where $n$ is the number of variables. (See [9, 4.1].)

Table A.1
Constrained minimization problem with piecewise linear objective function and one piecewise linear constraint in seven variables.

Grid of search    Number of function evaluations    Number of simplexes searched
8.0               34                                121
4.0               111                               160
2.0               28                                54
1.0               286                               699
0.5               539                               1,034
0.25              487                               918

Table A.2
Unconstrained minimization of a piecewise linear convex function of 43 variables.

Grid of search    Number of function evaluations    Number of simplexes searched
6.55              951                               1,431
3.27              469                               469
1.63              400                               400
0.81              526                               526
0.41              623                               623
0.20              1,081                             1,081
0.10              885                               885

Table A.3

Grid of search    Number of function evaluations    Number of simplexes searched
1.93              1895                              3872
0.97              1398                              1398
0.48              608                               608
0.24              232                               232
0.24              218                               218
0.12              433                               433
0.12              189                               189
0.12              377                               377
0.06              679                               679
0.06              222                               222
0.06              259                               259
0.06              130                               130
0.03              346                               346
0.15              254                               254
0.07              90                                90
0.07              171                               171
0.07              59                                59
0.003             262                               262

Table A.4

Grid of search    Number of function evaluations    Number of simplexes searched
1.94              1140                              1759
0.96              42                                42
0.48              53                                53
0.24              52                                52
0.02              16                                16
0.06              16                                16
0.03              16                                16
0.015             16                                16
0.007             16                                16
0.004             16                                16

Table A.5
Zero finding problem for a smooth function of 80 variables. The accelerated algorithm has been used to solve this problem.

Grid of search    Number of function evaluations    Number of simplexes searched
8.74              1,573                             1,750
4.47              149                               149
2.24              106                               106
1.18              97                                97
0.56              88                                88
0.24              84                                84
0.14              81                                81
0.03              84                                84
0.002             81                                81
0.000009          82                                82
Appendix
We now give some computational experience with solving nondifferentiable optimization problems with a fairly large number of variables. For comparison purposes, we also give the results of solving an eighty-variable smooth problem (where the convergence has been accelerated).
Problem 1. This is a 7 variable problem. It is a version of the problem considered by Netravali and Saigal [9]. The value of entropy on the entropy constraint is 2.7, and this is the 19th run in the series of runs done on this problem.

Problem 2. This is a 43 variable problem considered in W.B. Elsner, "A descent algorithm for the multihour sizing of traffic networks," Bell System Technical Journal 56 (1977) 1405-1430. This run was made on a piecewise linear version, while the problem formulated by Elsner was piecewise smooth. The function is convex.
Problem 3. This is the following 15 variable convex piecewise s m o o t h problem: min x
max fi(x) I~j_ 0, then lira g(v k, ~ ( v k, ek)) = g(v ~, rfi(v|
(17)
k -~
Indeed, it follows from (H8) and (H3)(ii) that
Ig(v ~, ,~(v*, e~)) - g(v ~, ,~ (v~))l _< ~(lln~(v ~, ~ ) _ n~ (v~)ll) 0 such that v k = v k~ for k->k0, and Step 7 is carried out for k->k0. It follows from (H6) (ii) that for each ek (k -> k0) after finite number of iterations we find a point u such that g,k(U) > g,k(V *o) -- ~k.
u E O,*(~k),
(18)
Thus Step 8 is carried out an infinite number of times and $\varepsilon^k \to 0$. Moreover, by virtue of (H5)
g,*(u) k0
~k = r
g(v*+], r~(l)k+l, ~ ) ) < g(1)k, ~ ) ) _ i[~.
Hence, the sequence
$\{g(v^k, \hat m(v^k, \varepsilon^k))\}$ would be unbounded. It follows from (H8) and (H3)(ii) that for $k \to \infty$, $k \in K$,
$g(v^k, \hat m(v^k)) \to -\infty$, which contradicts (H7). Thus, Step 8 is carried out an infinite number of times. Consequently, $\varepsilon^k \to 0$. Let $l(k)$ be the greatest of the numbers $l < k$ such that Step 8 was executed in the $l$th iteration. We have already proved that $l(k) \to \infty$ and $\varepsilon^{l(k)} \to 0$ as $k \to \infty$. It follows from the definition of the number $l(k)$ that $\varepsilon^i = \varepsilon^{l(k)+1}$ for $l(k) < i \le k$. Hence
g(v k, r~(v k, ~,)) < g(v,k)+l. ~ (1) l(k)+l, ~l(k)+l))
(21)
Since Step 8 was carried out in iteration $l(k)$, it follows from (H5) and (H6)(ii) (in a similar way to that which led to inequality (20)) that
g(v"k~, ~ (v'(k), ~.k))) - g(v, tfi(v, E"k~))-- y(~b(E"k~)).
(26)
Thus, from (25) and (26) we obtain for any $v \in V$
g(v k, rfi(v k, Ek)) --o~, k ~ K in (27). with a view to (17), we obtain for any v ~ V
g(v ~, rh(v~)) 0, then existence of such optimal value is evident since:
J. Szymanowski, A. Ruszczynski/ Two-level algorithms
(1) At $\lambda$ close to 1 the stop criterion at the upper level will be fulfilled very often (Step 7). It will result in many additional computations aimed at rejection of sequence $\{u^j\}$ and improvement of accuracy at the same point $v^k$.
(2) At $\lambda$ close to 0 the stop criterion for the lower level will be tightened too fast. It will result in excessive cost of computation of the function $g_\varepsilon(v)$ at $v$ distant from the solution.
We shall illustrate the possibility of selection of optimal $\lambda$ for the simple case of linearly convergent algorithms. Let us assume that
(i) Calculation of $\hat m(v, \varepsilon)$ calls for $C_1 |\log \varepsilon|$ iterations of the lower-level algorithm.
(ii) The upper-level algorithm has linear rate of convergence, i.e. there exists $q < 1$ such that g,(u j+~) 0 and for $x = (0, 0)$, $\max_y f(x, y) = 0$. Thus a unique solution of (36) is $(x, y) = (0, 0, 0, 0)$.
Let us write the problem (36) in the general form introduced in Section 2. The lower-level problem is of the form
$\min_{m \in R^2} \, [\, q(v, m) = (m_1)^2 + (m_2)^2 + (m_1 - m_2)^2 - 2 v_1 m_1 - 2 v_2 m_2 \,]$. (29)
We denote, as before, solutions of (37) by $\hat m(v)$. The upper-level problem is of the form
$\min_{v \in R^2} \, [\, g(v, \hat m(v)) = 100\,(v_2 - (v_1)^2)^2 - q(v, \hat m(v)) \,]$. (30)
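The two levels of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponent on $(m_1 - m_2)$ in (29) is read here as 2 (it is illegible in the source), which makes $q$ a strictly convex quadratic in $m$ whose minimizer can be written in closed form. The function names `q`, `grad_q`, `m_hat` and `g` are hypothetical, and the closed form replaces Fletcher's method, which the paper actually uses at the lower level.

```python
def q(v, m):
    """Lower-level objective (29); the exponent on (m1 - m2) is assumed to be 2."""
    return (m[0] ** 2 + m[1] ** 2 + (m[0] - m[1]) ** 2
            - 2 * v[0] * m[0] - 2 * v[1] * m[1])


def grad_q(v, m):
    """Gradient of q with respect to m."""
    return (4 * m[0] - 2 * m[1] - 2 * v[0],
            4 * m[1] - 2 * m[0] - 2 * v[1])


def m_hat(v):
    """Exact lower-level solution: the point where grad_q vanishes."""
    return ((2 * v[0] + v[1]) / 3, (v[0] + 2 * v[1]) / 3)


def g(v):
    """Upper-level objective (30) evaluated with the exact lower-level solution."""
    return 100 * (v[1] - v[0] ** 2) ** 2 - q(v, m_hat(v))


v0 = (5.0, 24.0)             # starting point quoted in the text
print(m_hat(v0), g(v0))      # lower-level solution and upper-level value at v0
print(g((0.0, 0.0)))         # value at the solution v = (0, 0)
```

Under this reading $\min_m q(v, m) = -\tfrac{2}{3}\,(v_1^2 + v_1 v_2 + v_2^2) \le 0$, so $g$ is nonnegative and vanishes only at $v = (0, 0)$, in agreement with the unique solution stated above.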
For solving the lower-level problem (29) the well-known variable metric method of Fletcher was used. The stop criterion was based on the norm of the gradient. Let us note that if the norm of the gradient is of order $\varepsilon$, then the error of the function observed is of order $\varepsilon^2$. In the accuracy selection algorithm, described in Section 3, it is useful to have these two errors equilibrated (Step 5). Thus the stop test
$\|\nabla_m q(v, m)\| < \eta \sqrt{\varepsilon}$ (31)
was used in Fletcher's method. Parameter $\eta > 0$ in (31) will be chosen later. The starting point for this algorithm was fixed at $m^0 = (0.5, 1.0)$. Fletcher's method, starting from $m^0$, generated a sequence $\{m^k(v)\}$ convergent to $\hat m(v)$. The first point of this sequence satisfying stop test (31) was chosen as $\hat m(v, \varepsilon)$.
At the upper level a modified variable metric method of Wolfe, Broyden and Davidon (WBD) was used. The set $\Omega_\varepsilon(\delta)$ of approximate solutions of the upper-level problem was defined as follows
~,(8) = {rE R2:
II [(v "))l
(32)
In order to fulfill assumption (H6), parameter $\eta$ in (31) should be sufficiently small. In our case $\eta = 0.1$ is good enough. The starting point for the upper-level algorithm was fixed at $v^0 = (5, 24)$. More details about implementation of Fletcher's and WBD methods, together with description of line search procedures etc., may be found in [9].
The accuracy selection algorithm was written in such a way that any minimization method may be used at the upper or lower level. Besides the minimization procedures one should determine the following parameters:
$\varepsilon^0$: initial accuracy,
$\varepsilon_{\min}$: final accuracy,
$\lambda$: accuracy reduction coefficient,
$\alpha$: rate of convergence multiplier.
In Step 8 of the accuracy selection algorithm $\varepsilon^k$ is updated according to the formula $\varepsilon^{k+1} = \lambda_k \varepsilon^k$, where $\lambda_k = (\alpha)^k \lambda$.
In our example we want to find a point $v \in R^2$ in which $\|\nabla_v g(v, \hat m(v))\| < 10^{-7}$. Since in the accuracy selection algorithm, formulated in Section 3, it was required that $\delta$ in (32) be equal to $\varepsilon$ in (31), we choose $\varepsilon_{\min} = 10^{-14}$. It is reasonable to choose $\varepsilon^0$ of the same range as the square of the norm of the gradient in $v^0$. In the computations $\varepsilon^0$ was fixed at $10^4$.
Numerical results
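The update rule of Step 8 can be illustrated with a short Python sketch that counts how many reductions are needed to bring the accuracy from $\varepsilon^0 = 10^4$ down to $\varepsilon_{\min} = 10^{-14}$. It rests on assumptions that the text does not spell out: that $k$ starts at 0 in $\lambda_k = \alpha^k \lambda$, and that reductions stop as soon as $\varepsilon^k \le \varepsilon_{\min}$. The function name `count_reductions` is hypothetical, and `Decimal` is used only so that exact powers of ten compare exactly.

```python
from decimal import Decimal


def count_reductions(eps0, eps_min, lam, alpha=Decimal(1)):
    """Count the updates eps <- (alpha**k * lam) * eps needed to reach eps <= eps_min.

    Assumptions (not stated in the source): k starts at 0 and the
    reductions stop as soon as eps <= eps_min.
    """
    eps, k = eps0, 0
    while eps > eps_min:
        eps *= (alpha ** k) * lam
        k += 1
    return k


EPS0, EPS_MIN = Decimal("1e4"), Decimal("1e-14")

# Linear reduction (alpha = 1) for a few values of lambda.
for lam in ("1e-18", "1e-9", "1e-3", "0.25"):
    print(lam, count_reductions(EPS0, EPS_MIN, Decimal(lam)))

# Superlinear reduction: lambda = 0.01, alpha = 0.02.
print(count_reductions(EPS0, EPS_MIN, Decimal("0.01"), Decimal("0.02")))
```

With $\alpha = 1$ the accuracy decreases geometrically, so the count grows like $18 / |\log_{10} \lambda|$; with $\alpha < 1$ the factor $\lambda_k$ itself shrinks at every reduction and far fewer reductions are needed.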
In order to investigate properties of the accuracy selection algorithm, the influence of parameter $\lambda$ on the cost of computations was tested. Parameter $\alpha$ was fixed at 1, which corresponds to linear convergence of accuracy parameter $\varepsilon^k$. The
cost of computations was defined by the formula TC = NF + 2·NG,
where NF: number of function evaluations at the lower level, NG: number of gradient evaluations at the lower level. Results of computations for various $\lambda$ are collected in Table 1, where NE: number of reductions of $\varepsilon$,
Table 1
$\lambda$: 10^-18, 10^-15, 10^-12, 10^-9, 10^-7, 2.5 x 10^-5, 10^-3, 10^-2, 4 x 10^-2, 2.5 x 10^-1, superlinear
NE: 1, 2, 2, 2, 3, 4, 6, 9, 13, 30, 4
NI, x1, x2, y1, y2, f, TC (one column per line below, in this order):
194 195 196 209 206 229 239 259 269 337 207
- 0 . 4 9 x l 0 -t~ - 0 . 4 9 x l0 -[~ 0 . 4 2 x l0 -6 0 . 2 4 x 1 0 -tz 0.27 x 10 s 0.38 x 10 -~~ -0.71x10 s 0.27 x 10 -s - 0 . 1 3 x 10 -5 - 0 . 3 2 x l 0 -s - 0 . 1 0 x 10-9
-0.39xl0 u - 0 . 3 8 x l0 -ll - 0 . 5 4 x l0 -9 -0,12xl0 u - 0 . 2 4 x 10 io 0.15 x 10 -I~ 0 . 3 3 x l 0 l0 0.11 x 10 -9
0 . 5 8 x l 0 l0 - 0 . 3 6 x l0 ~o - 0 . 1 5 x l0 3 - 0 . 3 0 x l 0 -12 0.27 x 10-s 0.63 x 10 ~0 _0.71xl0-S 0.27 x 10 s
0.58xl0 -u - 0 . 2 8 x l0 T M 0.19• l0 -4 0 . 7 3 x l 0 ,12 - 0 . 2 3 x 10 to 0.17 x 10 -~0 0 . 3 4 x l 0 -I~ 0.11 X 10 -9
0 . 3 8 x l 0 zo 0 . 3 7 x I0 -Is 0 . 1 8 x l0 t: 0 . 1 4 x l 0 -zz 0.71 x 10-~7 0.23 x 10 19 0 . 5 1 x 1 0 t6 0.85 X 10 -17 0 . 1 8 x 10- u 0.10xl016 0.1l X 10 -19
7329 6752 5482 a 4823 4298 3894 3995 4019 4187 a 5001 3264
a Required accuracy not
- - 0 . 5 5 X 10 -9
- 0 . 2 0 x 1 0 12 - 0 . 2 8 x 10-~l
0.58X l0 5
0.30X l0 6
- 0 . 3 2 x l 0 -8 0.21 • 10 to
0 . 7 7 x l 0 -~z 0 . 1 6 x 10 i1
obtained.
[Figure 1: plot of the cost of computations (vertical axis, roughly 4000 to 7000) against $\lambda$ (horizontal axis, logarithmic scale).]
Fig. 1. Influence of parameter $\lambda$ on the cost of computations; markers: results of experiments; dashed line: theoretical curve $\pi(\lambda) = 2400 + 250\,|\log \lambda| + 2200/|\log \lambda|$.
NI: number of invocations of the lower-level procedure, $(x_1, x_2, y_1, y_2)$: solution obtained, f: value of the objective function, TC: total cost of computations.
Influence of parameter $\lambda$ on the cost of computations is illustrated in Fig. 1. Let us note that the value $\lambda = 10^{-18}$ corresponds to the method in which lower-level problems are solved with utmost accuracy. Cost of computations in this case is marked in Fig. 1 by a horizontal dotted line. Although the considerations of Section 5 were made under very simplifying assumptions, the curve of cost versus $\lambda$ is very similar to the theoretical one.
Additionally, one experiment was carried out in which parameter $\alpha$ was less than 1. Such values of $\alpha$ correspond to superlinear convergence of the accuracy parameters $\varepsilon^k$. Results of this experiment are shown in the last row of Table 1 ($\lambda = 0.01$, $\alpha = 0.02$). The cost of computations is much less than in the case of linear decrease of $\varepsilon^k$. Presumably, superlinear reduction of $\varepsilon^k$ is better for superlinearly convergent algorithms such as variable metric methods.
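The theoretical curve quoted in the caption of Fig. 1 can be evaluated directly. The sketch below is a hedged illustration: the source does not state the base of the logarithm, and base 10 is assumed here; the function name `theoretical_cost` is hypothetical. Writing $x = |\log_{10} \lambda|$, the curve $2400 + 250x + 2200/x$ is minimized where $250 = 2200/x^2$.

```python
import math


def theoretical_cost(lam):
    """Curve from the caption of Fig. 1, assuming a base-10 logarithm."""
    x = abs(math.log10(lam))
    return 2400 + 250 * x + 2200 / x


# Stationary point of 2400 + 250*x + 2200/x in x = |log10(lambda)|.
x_star = math.sqrt(2200 / 250)
lam_star = 10 ** (-x_star)

print(lam_star, theoretical_cost(lam_star))
for lam in (1e-18, 1e-9, 1e-3, 0.25):
    print(lam, round(theoretical_cost(lam)))
```

Under the base-10 assumption the minimizing $\lambda$ comes out close to $10^{-3}$, and the value of the curve at $\lambda = 10^{-18}$ is about 7000, the same order as the cost reported in Table 1 for that setting.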
7. Final conclusions
In the paper, an analysis of two-level methods was carried out under the assumption that lower-level problems are solved inaccurately. A general scheme of two-level methods covering a wide class of existing methods was used. This approach has a number of advantages.
(1) Convergence of these methods was proved under general assumptions about stop criteria in upper- and lower-level algorithms. Arbitrarily chosen minimization methods may be used in the scheme.
(2) An accuracy selection algorithm was described and investigated theoretically and numerically. It is not a new two-level method but an algorithm organizing cooperation of minimization methods within the frame of a two-level method.
(3) The proposed algorithm is numerically valid, i.e. each of its steps requires a finite number of computations. Moreover, proper equilibration of stop criteria results in substantial savings of computations.
Numerical experiments confirmed theoretical considerations. An example of application of the ideas suggested in this paper to the gradient projection method in minimax problems is presented in [7].
References
[1] A. Auslender, "Minimization of convex functions with errors", IX International Symposium on Mathematical Programming, Budapest, 1976.
[2] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
[3] J. Denel, "Nouvelles notions de continuité des applications multivoques et applications à l'optimisation", Publication No 83 (mars 1977), Laboratoire de Calcul de l'Université de Lille I.
[4] R.R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41-54.
[5] G. Pierra, "Crossing of algorithms in decomposition methods", IFAC Symposium on Large Scale Systems Theory and Applications, Udine, 1976.
[6] E. Polak, Computational methods in optimization (Academic Press, New York, 1971).
[7] J. Szymanowski and A. Ruszczyński, "An accuracy selection algorithm for the modified gradient projection method in minimax problems", 8th IFIP Conference on Optimization Techniques, Würzburg, 1977.
[8] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
[9] "Methods for unconstrained optimization", Technical Report 1.2.03.2, Institute of Automatic Control, Technical University of Warsaw (1976).
Mathematical Programming Study 10 (1979) 172-190. North-Holland Publishing Company
A COMPARATIVE STUDY OF SEVERAL GENERAL CONVERGENCE CONDITIONS FOR ALGORITHMS MODELED BY POINT-TO-SET MAPS*
S. TISHYADHIGAMA and E. POLAK
University of California, Berkeley, CA, U.S.A.
and R. KLESSIG
Bell Laboratories, Holmdel, NJ, U.S.A.
Received 29 November 1977 Revised manuscript received 7 April 1978 A general structure is established that allows the comparison of various conditions that are sufficient for convergence of algorithms that can be modeled as the recursive application of a point-to-set map. This structure is used to compare several earlier sufficient conditions as well as three new sets of sufficient conditions. One of the new sets of conditions is shown to be the most general in that all other sets of conditions imply this new set. This new set of conditions is also extended to the case where the point-to-set map can change from iteration to iteration. Key words: Optimization Algorithms, Convergence Conditions, Point-to-set Maps, Nonlinear Programming, Comparative Study.
1. Introduction
In recent years, the study of optimization algorithms has included a substantial effort to identify properties of algorithms that will guarantee their convergence in some sense, e.g. [1]-[29]. A number of these results have used an abstract algorithm model that consists of the recursive application of a point-to-set map. It is this type of result with which we are concerned in this paper and, in particular, with the results presented in [13], [16], [21], [24] and [29]. We have two purposes. First, we wish to introduce three new general convergence results. Second, we wish to identify the relationships among the general convergence results including both our new results and previously published results. In order to compare results, it is necessary to have a common framework. Unfortunately, different authors have used slightly different abstract algorithm models and have arrived at slightly different conclusions, partly because they have used somewhat different concepts of convergence. Thus, before a comparison can be made, it is necessary to establish a common framework and
* Research sponsored by the National Science Foundation Grant ENG73-O214-A01 and the National Science Foundation (RANN) Grant ENV76-05264.
S. Tishyadhigama, E. Polak, R. Klessigl Algorithms modeled by point-to-set maps
then to translate the various theories into this framework. Our approach to this task is as follows. In Section 2, we define an abstract algorithm model and formally define a concept of convergence for this model. Our new convergence results establish that certain conditions are sufficient for the algorithm model to be convergent in the sense of our concept of convergence. The earlier results use a similar approach, but occasionally differ from each other by the algorithm model and concept of convergence used. We take the essential features of these earlier sufficient conditions and use these to create analogous conditions that are sufficient in our present framework. We then establish relationships between the various sufficient conditions by showing which conditions imply other conditions. In view of our approach to the interpretation of earlier work, we make no claim that, and the reader should not infer that, the contents of this paper fully describe the various earlier results. When we associate an author's name with a set of sufficient conditions, we mean that the original conditions, from which we derived the conditions in question, were first proposed by that author. The interested reader can find all of the new results stated in this paper in [26]. [26] also shows how the sufficient conditions used in this paper are derived from the original sufficient conditions. Section 3 contains the main results of this paper. These results are summarized by Fig. 1. Each box represents a set of conditions and an arrow indicates that the conditions at the tail of the arrow imply the conditions at the head. We have included in Section 3 results that show that under special conditions, some sets of sufficient conditions are equivalent. The most important of these special
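[Figure 1: diagram of boxes, one per set of sufficient conditions, joined by arrows that denote implications. The box labels that remain legible appear to include G. Meyer, Polak, R. Meyer and Zangwill (3.47); the special cases are annotated "c is lower semicontinuous", "A is single valued" and "c is continuous".]
Fig. 1. Results of Section 3.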
cases is when the cost (or surrogate cost) function, c, is continuous. The special cases are noted in Fig. 1. In Section 4 we illustrate how the sufficient conditions presented in Section 3 can be modified to apply to an algorithm model which may use a different point-to-set map at each iteration. We do this by extending the most general sufficient conditions of Section 3. Finally, in the Appendix we present some counterexamples to show that there are meaningful differences between the sets of sufficient conditions.
2. Framework for comparison and preliminaries
In this section, we present an abstract algorithm model and define a concept of convergence. In addition, we present some results and notation that will be extensively used in the sequel.
(2.1) Definition. $\Omega$ is a Hausdorff topological space that satisfies the first axiom of countability. $\Delta \subset \Omega$ is called the set of desirable points.
(2.2) Remark. The set $\Delta$ consists of points that we will accept as "solutions" to the problem being solved by the algorithm. For example, it may consist of all points satisfying a necessary condition of optimality. $\Omega$ is sometimes taken as the set of feasible points for a problem. Thus, $\Omega$ may be a subset of a larger topological space. If such is the case, the relative topology on $\Omega$ is used.
(2.3) Algorithm model. Let $A : \Omega \to 2^{\Omega} - \emptyset$, where $2^{\Omega}$ denotes all subsets of $\Omega$.
Step 0: Set i = 0. Choose any $z_0 \in \Omega$.
Step 1: Choose any $z_{i+1} \in A(z_i)$.
Step 2: Set i = i + 1 and go to Step 1.
(2.4) Remark. Algorithm Model (2.3) has no provision for stopping and thus always generates an infinite sequence. However, many algorithms have stopping tests and stop only when $z_i \in \Delta$. This can be accounted for in (2.3) by defining $A(z_i) = \{z_i\}$ whenever $z_i$ satisfies the stopping condition. Thus our analyses are shortened because we do not have to consider the trivial finite sequence case.
We now state our concept of convergence.
(2.5) Definition. We say the Algorithm Model (2.3) is convergent¹ if the accumulation points of any sequence $\{z_i\}$ constructed by (2.3) are in $\Delta$.
¹ When we say that a sequence $\{y_i\}$ converges, we still mean it in the usual sense, i.e., for some $\hat y$, $y_i \to \hat y$ as $i \to \infty$.
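Algorithm Model (2.3) can be sketched as a short loop. The code below is only an illustration with hypothetical names (`run_model`, `A`, `cost`): the space is taken to be the rationals, the set of desirable points is assumed to be $\{0\}$, the map offers two successors at each point, and the stopping convention of Remark (2.4) is modeled by $A(0) = \{0\}$. The "choose any" in Step 1 is simulated with a seeded random choice.

```python
import random
from fractions import Fraction


def run_model(A, z0, n_iter, seed=0):
    """Iterate z_{i+1} in A(z_i), choosing an arbitrary successor at every step."""
    rng = random.Random(seed)
    z = z0
    trajectory = [z]
    for _ in range(n_iter):
        successors = sorted(A(z))          # A(z) must be a nonempty set
        if not successors:
            raise ValueError("A(z) must be nonempty")
        z = rng.choice(successors)         # Step 1: any element of A(z_i) is allowed
        trajectory.append(z)
    return trajectory


def A(z):
    """Toy point-to-set map: two candidate successors, and A(0) = {0} as in Remark (2.4)."""
    return {z} if z == 0 else {z / 2, z / 3}


def cost(z):
    """Cost function c used to monitor progress."""
    return abs(z)


traj = run_model(A, Fraction(1), 20)
print([float(z) for z in traj[:5]], float(traj[-1]))
```

Whatever successor is chosen, the cost is at least halved at every step, so every sequence produced by this toy map tends to 0, the only desirable point; the model is therefore convergent in the sense of Definition (2.5).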
(2.6) Remark. We make no assumption that $\{z_i\}$ will have accumulation points. Thus, it is possible for (2.3) to be convergent and for $\{z_i\}$ to have no accumulation points. For example, for an optimization problem with no solution, defining $\Delta$ as the set of solutions means that $\Delta = \emptyset$ and the application of a convergent algorithm results in a sequence $\{z_i\}$ that cannot have accumulation points.
The definitions (2.1), (2.3) and (2.5) constitute the common structure within which we shall carry out our analysis. To conclude this section, we establish some notation and state some results that will be useful later. All of the sufficient conditions in Section 3 assume the existence of a function $c : \Omega \to R^1$ and imply that c(z') - b Vz' E U.
(3.3) Conditions. (i) $c(\cdot)$ is locally bounded from below on $\Omega - \Delta$.
(ii) c(z') - c(yi) >- c*,
i = O, l ....
which implies that $\lim c(y_i) = c^*$ and this contradicts (3.8)(iii). The proof is now complete.
(3.10) Definition. The pair (c, A) is locally uniformly monotonic at z if there exists g(z) > 0 (possibly depending on z) and a neighborhood U(z) of z such that (3.11)
c(z")-c(z') 0 such that c(z") - c ( z ' ) 0 which establishes (3.38)(iii).
To show that (3.38)(iv) holds, we assume the contrary and establish a contradiction. Let $z \in \Omega - \Delta$ and let $\xi^* \in A(z)$ be as above. Let $\{U_i\}$ be a sequence of neighborhoods of z satisfying (2.9)(ii). If (3.38)(iv) does not hold, we can find $\varepsilon > 0$, $z_i' \in U_i$ and $z_i'' \in A(z_i')$ such that
(3.52) $c(z_i'') > c(\xi^*) + \varepsilon, \quad i = 0, 1, \ldots$
By construction, $z_i' \to z$. Condition (3.47)(v) then implies that $\{z_i''\}$ is compact. Hence, there exists a subsequence $\{z_i''\}_{i \in K_2}$ such that $z_i'' \to^{K_2} z''^{*}$ and, of course, $z_i' \to^{K_2} z$. From (3.52) we conclude that
(3.53) $c(z''^{*}) \ge c(\xi^*) + \varepsilon$.
Since $A(\cdot)$ is closed, (3.47)(iv), $z''^{*} \in A(z)$ and (3.50) combines with (3.53) to yield
(3.54) $c(z''^{*}) \ge c(\xi^*) + \varepsilon \ge c(z''^{*}) + \varepsilon > c(z''^{*})$
and we have a contradiction. Consequently, (3.38)(iv) must hold and the proof is complete.
4. Extension to the time varying case
In this section, we modify Conditions (3.3) to apply to the case where the point-to-set map depends upon the iteration number, i. The other sufficient conditions can be extended in a similar fashion. These extensions are relatively straightforward. Therefore, we extend only Conditions (3.3) (the most general conditions) as an example of what can be done.
(4.1) Time varying algorithm model: Let $A_i : \Omega \to 2^{\Omega}$ for i = 0, 1, ....
Step 0: Set i = 0. Choose any $z_0 \in \Omega$.
Step 1: Choose any $z_{i+1} \in A_i(z_i)$.
Step 2: Set i = i + 1 and go to Step 1.
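The time varying model (4.1) differs from (2.3) only in that the map applied at iteration i may depend on i. A minimal sketch follows, with hypothetical names (`run_time_varying`, `map_at`) and a toy family of maps chosen purely for illustration: the first three maps double the iterate, so the cost $c(z) = |z|$ increases at first, and all later maps halve it.

```python
def run_time_varying(map_at, z0, n_iter, choose=min):
    """Iterate z_{i+1} in A_i(z_i), where map_at(i) returns the map A_i."""
    z = z0
    trajectory = [z]
    for i in range(n_iter):
        successors = map_at(i)(z)
        if not successors:
            raise ValueError("A_i(z) must be nonempty")
        z = choose(successors)             # Step 1: any element of A_i(z_i)
        trajectory.append(z)
    return trajectory


def map_at(i):
    """Toy family of maps: the cost may increase for i < 3 and decreases afterwards."""
    if i < 3:
        return lambda z: {2.0 * z}
    return lambda z: {z / 2.0}


traj = run_time_varying(map_at, 1.0, 10)
print(traj)
```

In this toy family the cost is nonincreasing only from iteration 3 onward, which is the kind of behaviour that a monotonicity requirement imposed from some index onward is meant to allow; the sketch does not attempt to verify the remaining conditions.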
(4.2) Conditions.
(i) $c(\cdot)$ is locally bounded from below on $\Omega - \Delta$.
(ii) There exists an integer $N_1 \ge 0$ such that $c(z') \le c(z)$ $\forall z' \in A_i(z)$, $\forall z \in \Omega$, $i \ge N_1$.
(iii) For each $z \in \Omega - \Delta$, if $\{x_i\} \subset \Omega$ is such that $x_i \to z$ and $c(x_i) \to c^*$, then there exists an integer $N_2 \ge N_1$ such that $c(y) < c^*$ $\forall y \in A_{N_2}(x_{N_2})$.
(4.3) Theorem. If Conditions (4.2) hold, then Algorithm Model (4.1) is convergent (in the sense of Definition (2.5)).
Proof. Let $z^*$ be an accumulation point of $\{z_i\}$, the sequence constructed by (4.1). We assume that $z^* \in \Omega - \Delta$ and establish a contradiction. There exists a subsequence $\{z_i\}_{i \in K}$ such that $z_i \to^K z^*$. Without loss of generality, we can also assume that $\{c(z_i)\}_{i \in K}$ is monotonically decreasing because of (4.2)(ii). If $z^* \in \Omega - \Delta$, (4.2)(i) implies that $\{c(z_i)\}_{i \in K}$ is bounded from below and hence $c(z_i) \to^K c^*$. Lemma (2.8) can be applied to obtain that $c(z_i) \to c^*$ and
(4.4) $c(z_i) \ge c^* \quad \forall i \ge N_1$.
But, if $z^* \in \Omega - \Delta$, (4.2)(iii) implies that
(4.5) $c(z_{N_2+1}) < c^*$
which contradicts (4.4). Thus, we must have $z^* \in \Delta$ and the proof is complete.
(4.6) Remark. It is immediately obvious that Conditions (3.3) imply Conditions (4.2) when $A_i \equiv A$ for i = 0, 1, 2, ....
Appendix A. Selected counterexamples
The purpose of this Appendix is to show, by means of counterexamples, that certain of the implications not proved in Section 3 cannot, in fact, be proved. In
the first set of counterexamples c is continuous, A is closed and A is single valued. Under these restrictions, the sets of sufficient conditions aggregate into four equivalence classes (see Fig. 1). These are:
Class I: Conditions (3.3) and (3.31).
Class II: Conditions (3.8), (3.13), (3.18), (3.23) and (3.25).
Class III: Conditions (3.38) and (3.44).
Class IV: Condition (3.47).
It is immediately evident that IV implies III, III implies II, II implies I. We shall present counterexamples to show that the converse is false. The first two counterexamples will be constructed from the following optimization problem and algorithm.
(A.1) Problem. $\min\{c(z) \mid z \in R^1\}$ where $c : R^1 \to R^1$ is defined by -z-I (A.2)
c(z) =
forz2 a= c*,
(A.26)
c(z.i) = (I - 1 / i ) 2 ~ I ___a6*,
(A.27)
c(a(zi)) = (1 + l/i - ~9,: ) ~ i g,_5,
(A.28)
c ( a ( h ) ) = c(O) = O.
13
{z' I z ' r 1, z' ~ ~-, there exists 9 > 0 o f c o n t i n u i t y o f c, z = 1 o r -~. N o w
e,? I,
Thus, there exists an integer N such that (A.29)
c ( a ( z u ) ) < 2 = c*,
(A.30)
c(a(Y.N)) = 0 < 1 = ~*
a n d (3.3)(iii) holds at z = 1. A s i m i l a r a r g u m e n t can be used to s h o w that (3.3)(iii) h o l d s at z - 1~ 4. O n the o t h e r h a n d , C o n d i t i o n (3.31)(iii) d o e s not hold at z = 1. T o see this, c o n s i d e r a n y 9 E (0, 1). T h e n , a ( l + ~ ) = 1 + ~ - 9 = 9 (-~,-~). Therefore, (A.31)
2'g
T h u s , t h e r e e x i s t s an g > 0 (A.32)
25
c ( a ( l + E ) ) = ( ~ - ~ ) 2 = E " - - ~ E + % ~ iZ.
such that
c ( a ( 1 + ~)) > 1 = c ( l ) Ve E (0, ~].
Consequently, Condition (3.31)(iii) does not hold at z = 1.
References
[1] J.W. Daniel, "Convergent step sizes for gradient-like feasible direction algorithms for constrained optimization", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 245-274.
[2] J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1) (1973) 328-332.
[3] B.C. Eaves and W.I. Zangwill, "Generalized cutting plane algorithms", SIAM Journal on Control 9 (4) (1971) 529-542.
[4] R.M. Elkin, "Convergence theorems for Gauss-Seidel and other minimization algorithms", Computer Science Center Tech. Rept. 68-59, University of Maryland, College Park, MD (1968).
[5] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (3) (1973) 591-603.
[6] W.W. Hogan, "Applications of general convergence theory for outer approximation algorithms", Mathematical Programming 5 (1973) 151-168.
[7] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (3) (1975) 308-331.
[8] R. Klessig, "A general theory of convergence for constrained optimization algorithms that use antizigzagging provisions", SIAM Journal on Control 12 (4) (1974) 598-608.
[9] R. Klessig and E. Polak, "An adaptive precision gradient method for optimal control", SIAM Journal on Control 11 (1) (1973) 80-93.
[10] D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
[11] M.L. Lenard, "Practical convergence conditions for unconstrained optimization", Mathematical Programming 4 (1973) 309-323.
[12] G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control 9 (1971) 547-560.
[13] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", Tech. Rept. 75-14, The Johns Hopkins University, Baltimore, MD (1975).
[14] G.G.L. Meyer, "Algorithm model for penalty-type iterative procedures", Journal of Computer and Systems Sciences 9 (1974) 20-30.
[15] R.R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computer and Systems Sciences 12 (1976) 108-121.
[16] R.R. Meyer, "A comparison of the forcing functions and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control 15 (4) (1977) 699-715.
[17] R.R. Meyer, "A convergence theory for a class of anti-jamming strategies", Tech. Rept. 1481, Math. Research Center, University of Wisconsin, Madison, WI (1974).
[18] H. Mukai and E. Polak, "On the use of approximations in algorithms for optimization problems with equality and inequality constraints", SIAM Journal on Numerical Analysis 15 (4) (1978) 674-693.
[19] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[20] J.M. Ortega and W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1) (1972) 40-43.
[21] E. Polak, "On the convergence of optimization algorithms", Revue Française d'Automatique, Informatique et Recherche Opérationnelle 16(R1) (1969) 17-34.
[21a] E. Polak, Computational methods in optimization: a unified approach (Academic Press, New York, 1971).
[22] E. Polak, "On the implementation of conceptual algorithms", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 275-291.
[23] E. Polak, R.W.H. Sargent and D.J. Sebastian, "On the convergence of sequential minimization algorithms", Journal of Optimization Theory and Applications 14 (1974) 439-442.
[24] B.T. Polyak, "Gradient methods for the minimization of functionals", U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1963) 864-878.
[25] S.W. Rauch, "A convergence theory for a class of nonlinear programming problems", SIAM Journal on Numerical Analysis 10 (1) (1973) 207-228.
[26] S. Tishyadhigama, "General convergence theorems: Their relationships and applications", Ph.D. Thesis, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA (1977).
[27] D.M. Topkis and A.F. Veinott, "On the convergence of some feasible direction algorithms for nonlinear programming", SIAM Journal on Control 5 (2) (1967) 268-279.
[28] P. Wolfe, "Convergence conditions for ascent methods", SIAM Review 11 (2) (1969) 226-235.
[29] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
[30] W.I. Zangwill, "Convergence conditions for nonlinear programming algorithms", Management Science 16 (1) (1969) 1-13.
[31] J.L. Kelley, General topology (Van Nostrand, Princeton, NJ, 1955).
[32] L. Armijo, "Minimization of functions having continuous partial derivatives", Pacific Journal of Mathematics 16 (1966) 1-3.