PREFACE
The present book is primarily dedicated to the study of optimality conditions for a nonlinear programming problem, also known as a mathematical programming problem. Indeed, one of the main subjects of mathematical optimization (a relatively modern branch of applied mathematics which has grown at an exponential rate, both from a theoretical point of view and on the side of applications) is the study of a class of extremum problems where the objective function $f$ is to be optimized under some restrictions, usually in the form of equalities and/or inequalities. By a nonlinear programming problem in $\mathbb{R}^n$ we mean an extremum problem of the following type:
$$\min_{x \in S} f(x) \qquad (P)$$
where
$$S = \{x \in X \mid g_i(x) \le 0,\ i = 1,\dots,m;\ h_k(x) = 0,\ k = 1,\dots,p\},$$
$X \subset \mathbb{R}^n$, and $f, g_i, h_k : X \to \mathbb{R}$.
If all the functions involved in (P) are linear, we speak of linear programming problems. A more general version of (P) is one where the objective function $f$ is vector-valued. This latter case (especially important from an economic point of view) will be treated in the last chapter of the book. Problems similar to (P) arise in several contexts: the building and interpreting of economic models; the study of various technological processes; the development of optimal choices in finance; operations research; management science; production processes; transportation models; statistical decisions, etc. It is therefore of the utmost importance to study the existence of solutions for (P) and effective methods (i.e. numerical algorithms) for finding them. This second aspect will not be treated here; we shall be concerned only with the optimality conditions for (P), i.e. the necessary and sufficient conditions for a point $x^* \in S$ to be a solution (local or global) of (P). We have decided to conduct our analysis in the Euclidean space $\mathbb{R}^n$ (the only exception is Section 13 of Chapter III, where problems defined in infinite-dimensional spaces are briefly mentioned).
We have chosen not to adopt a more general mathematical setting, for two main reasons:
a) Mathematical programming in $\mathbb{R}^n$ is a relatively self-contained field within the larger class of optimization problems (which includes static and dynamic optimization problems, calculus of variations, optimal control problems, etc.). The reader can quickly become familiar with the main topics; only some previous knowledge of linear algebra, mathematical analysis and convex analysis is needed to understand the subject.
b) Many optimization problems in practice are defined on $\mathbb{R}^n$ or can be approximated by a problem defined on $\mathbb{R}^n$; moreover, the numerical solution of optimization problems in general spaces by means of computers usually requires a preliminary embedding of the problem in a finite-dimensional space.
Besides the classical optimality conditions for a problem (P), where all the functions involved are assumed differentiable or even continuously differentiable (i.e. "smooth"), special attention has been given to the study of optimality conditions when the functions of (P) are "nonsmooth". The title of this book itself points out this aspect. Moreover, we have dedicated an entire chapter (the last one) to the study of so-called vector optimization problems, which are rarely considered in books treating mathematical programming problems. Indeed, in several practical problems we have to optimize simultaneously, on the basis of certain criteria, several objective functions (e.g. production costs, production times, degrees of mechanization and automation, etc.). Such problems are obviously more complex than problem (P) and, accordingly, their mathematical handling is more complex.
In preparing this book a special effort has been made to obtain a self-contained treatment of the subjects; so we hope that it may be suitable both as a textbook and as a reference (every chapter ends with an extensive bibliography that should be useful to the reader for further investigations). We do not claim to have produced an "advanced" book on mathematical programming; this book is addressed to those researchers, graduate students and post-graduates wishing to begin the study of nonlinear programming problems, with a view to deeper study later. We think and hope that this book may also be useful to theoretical economists, engineers and applied researchers involved in such problems.
The book is divided into six chapters:
Chapter I briefly deals with the notion of nonlinear programming problems and with basic notations, conventions, definitions and results pertaining to set theory, mathematical analysis and linear algebra.
Chapter II deals with convex sets, convex functions and generalized convex functions; indeed, in the study of optimality conditions for (P) and of the so-called "dual problems" of (P), a fundamental role is played by the convexity or generalized convexity of the functions involved. This explains why we have dedicated an entire chapter to the following topics: topological properties of convex sets, separation of convex sets, theorems of the alternative, properties of convex functions and generalized convex functions.
Chapter III covers the classical optimality conditions for a nonlinear programming problem in the smooth case, i.e. when all the functions involved are differentiable or continuously differentiable. The Fritz John and Karush-Kuhn-Tucker optimality conditions are developed for equality and/or inequality constrained problems. Both first-order and second-order optimality conditions are developed; some fundamental material on (first-order) constraint qualifications is presented.
Sufficient optimality conditions are obtained by means of various generalizations of convex functions. The last section of the chapter deals with so-called "saddle point problems".
Chapter IV deals with the study of optimality conditions for (P) in the nonsmooth case, i.e. when the functions involved in (P) are not differentiable in the classical sense. Thus, different kinds of generalized differentiability are introduced and compared. After some general remarks about classical differentiability notions, we present some generalized directional derivatives and generalized subdifferential mappings and discuss these notions in connection with their application in mathematical optimization.
In this context, the notion of "local cone approximation" of sets, introduced in Chapter II, is of great importance. Using this notion, we develop the so-called K-directional derivatives and K-subdifferentials and give an abstract approach for the development of general necessary optimality conditions for the nonsmooth case.
In Chapter V we give a survey of the most important duality concepts in mathematical optimization.
For different kinds of optimization problems we discuss properties of their dual problems and develop different duality assertions. Starting with the well-known duality in linear optimization, we describe the Wolfe duality for convex optimization problems, the Lagrange duality in connection with saddle-point assertions, and the perturbation concept, using the notion of conjugate functions. In this manner we shall see that the concepts are generalized step by step.
Chapter VI is concerned with vector nonlinear programming problems. The presentation of the main optimality notions (weakly efficient points, efficient points, properly efficient points) is followed by theorems giving conditions for the existence of such points.
A classical problem of vector optimization is then discussed by means of the Fritz John and Kuhn-Tucker conditions, both for the smooth and the nonsmooth (Lipschitzian) case. Links with scalar optimization problems are emphasized (especially by means of some generalized convex functions, described in Chapter II) through various scalarization techniques and through a vector version of the Wolfe dual problem presented in Chapter V.
The idea of writing the present book emerged several years ago, during a trip of the two Italian authors to the Department of Mathematics of the Technical University of Ilmenau (Germany), in those days a relatively small university department among the woods of Thuringia ("Das grüne Thüringen"), where Prof. Karl-Heinz Elster had founded the now renowned journal "Optimization" and created an important center for studies in mathematical programming. We recognize our intellectual debt toward many people from whom we have borrowed ideas and results; indeed, we owe much to the authors cited in the bibliographical list at the end of each chapter.
We want, however, to pay particular respect to the memory of Prof. Karl-Heinz Elster. We want to express our gratitude to the Italian Ministry for University, Scientific and Technological Research for the financial support provided. Finally, we want to thank Drs. A. Sevenster of Elsevier BV (Amsterdam) for his patience and precious collaboration, and to express our deepest and most sincere thanks to Mrs. Anita Klooster for her diligent and hard work in typing our almost unreadable manuscript.
The book has been planned and discussed throughout the various phases of its development by all three authors. However, Chapters I, II and III have been written by G. Giorgi; Chapters IV and V by J. Thierfelder; Chapter VI by A. Guerraggio.
The authors.
CHAPTER I. INTRODUCTION
1.1. Optimization Problems
In the analysis of numerous mathematical problems (arising in Economics, Statistics, Operations Research, Engineering, Physics, etc.) situations often occur where a decision maker must make a decision in order to rule a system in an optimal way (on the basis of specified criteria), the system being described by a model that contains several alternative possibilities and is, at least partially, under the control of the decision maker. Such a problem can be considered an optimization problem; "optimization" is a catch-all term for maximization or minimization and lies at the heart of several applied sciences, but perhaps the most natural application of optimization theory is in the field of economic sciences. Indeed, optimization (or better, optimization subject to constraints) has been considered by many authors as defining the essential nature of Economics. We quote only the famous classical statement of Robbins (1932, p. 16): "Economics is the science which studies human behaviour as a relationship between ends and scarce means which have alternative uses".
The first mathematicians dealing with optimization problems were Fermat, Euler, the Bernoullis, Lagrange and others, in connection with the development of Calculus in the 17th and 18th centuries. However, the first results in the modern theory of mathematical optimization were presented by the Russian Nobel prize winner L.V. Kantorovich and the American mathematician G.B. Dantzig, in 1939 and 1947 respectively. In 1951 the American mathematicians H.W. Kuhn and A.W. Tucker published important theoretical results in the theory of mathematical optimization, as extensions of the classical Euler-Lagrange methods for the solution of optimization problems with equality constraints under differentiability assumptions. Prior to the work of Kuhn and Tucker and in the same direction, we mention the contributions of Fritz John in 1948 and the unpublished Master's thesis of W. Karush in 1939. Since then, studies in mathematical optimization have developed rapidly, from both a theoretical and a practical point of view, stimulated by the need to deal with various organizational and planning problems arising in Economics, Engineering and Natural Science. Such studies often made use of tools from several modern mathematical fields, e.g. functional analysis, topology, linear algebra and convex analysis. Conversely, the development of optimization theory has considerably stimulated the development of these same fields. So we can affirm that Mathematics has received many impulses from the study of extremum problems. As we mentioned in the Preface, almost all the problems we handle in the following chapters will be described by real-valued functions defined on $\mathbb{R}^n$.
The optimization problems with which we shall be concerned may often be formalized as follows:
a) The behaviour of a system (in the most general meaning) depends on some variables, some of them beyond the control of the decision maker (these are the "data" of the problem) and the others under his control (these latter are the true variables of the problem, usually described by a vector $x \in \mathbb{R}^n$).
b) The various alternative possibilities for the decision maker are described by a set $S \subset \mathbb{R}^n$: so one has to choose, in an optimal way, a vector $x^0 \in S$, or more than one vector in case of a problem with several solutions.
c) The "advantage" of this choice is measured by the value assumed by a function $f$, usually defined on a set containing $S$.
d) Let us consider for the moment the case of scalar functions, i.e. $f : D \subset \mathbb{R}^n \to \mathbb{R}$ (only in the last chapter of this book shall we face the vector optimization problem, where $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$, $m > 1$). Then the choice of the vector $x^0 \in S$ is considered an optimal one, and we say that $x^0$ solves our optimization problem, when
$$f(x^0) \le f(x) \quad \text{for each } x \in S, \qquad (1)$$
in case of a minimization problem;
$$f(x^0) \ge f(x) \quad \text{for each } x \in S, \qquad (2)$$
in case of a maximization problem.
When the previous inequalities hold strictly for each $x \in S$, $x \ne x^0$, we say that $x^0$ is a strict solution of the related optimization problem. More precisely: we have an optimization problem or a mathematical programming problem when we are given a set $S \subset \mathbb{R}^n$ and a real-valued scalar function $f$ (some authors also consider extended real-valued scalar functions, with values in $\mathbb{R} \cup \{\pm\infty\}$), defined on a set $D \subset \mathbb{R}^n$ containing $S$, and when we are looking for some element $x^0 \in S$ such that (1) holds. In this case we shall write the problem as
$$\min_{x \in S} f(x). \qquad (3)$$
If we are looking for some element $x^0 \in S$ such that (2) holds, we shall write the problem as
$$\max_{x \in S} f(x). \qquad (4)$$
It is better to distinguish between minimization and maximization problems and the corresponding optimal values:
$$f(x^0) = \min_{x \in S} f(x) = \min\{f(x) \mid x \in S\}; \qquad f(x^0) = \max_{x \in S} f(x) = \max\{f(x) \mid x \in S\}.$$
Obviously, it always holds that
$$\min_{x \in S} f(x) = -\max_{x \in S}\{-f(x)\}.$$
Therefore we can study, without loss of generality, only one type of optimization problem.
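The identity $\min_{x \in S} f(x) = -\max_{x \in S}\{-f(x)\}$ can be illustrated numerically. The following is a minimal sketch, assuming a finite sample of points as a stand-in for $S$ and a hypothetical objective `f` that does not come from the text; it shows that both formulations return the same optimal value and the same optimal point.

```python
# Sketch: min_{x in S} f(x) = -max_{x in S} {-f(x)} on a finite sample of S.
# The objective f and the sample S are hypothetical (not taken from the text).

def f(x: float) -> float:
    """Illustrative objective with its minimum 0.5 at x = 1."""
    return (x - 1.0) ** 2 + 0.5

# Finite stand-in for the feasible set S: the points -2.0, -1.9, ..., 3.0.
S = [i / 10 for i in range(-20, 31)]

min_f = min(f(x) for x in S)             # optimal value of the minimization
neg_max_neg_f = -max(-f(x) for x in S)   # same value via the maximization of -f

# The optimal point is the same for both formulations.
x0 = min(S, key=f)
x0_from_max = max(S, key=lambda x: -f(x))

print(min_f, neg_max_neg_f, x0, x0_from_max)
```

This is why a solver or a theory written only for minimization also covers maximization: one simply passes $-f$.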
In the present book we shall study minimization problems, i.e. problems expressed by relation (3). The set $S$ is called the set of feasible solutions or simply the feasible set or constraint set. A point $x \in S$ is a feasible point, also called a decision vector or a program. The function $f$ is named the objective function. A point $x^0$ such that (1) holds (i.e. a global minimum point of $f$ on $S$) is called an optimal solution, optimal point or optimal vector for problem (3), or simply a solution (perhaps not unique); $f(x^0)$ is the corresponding optimal value of the problem. If $S = \emptyset$ the optimization problem is not feasible or not consistent; if $S \ne \emptyset$ the problem is feasible or consistent, even if in this case the problem may admit no solution.
Many authors term (3) an unconstrained or free minimization problem when $S$ coincides with the domain $D$ of $f$, $D$ an open set, or also when $S$ is an open subset of $D$. More generally, a free minimization problem consists in the search for minimizers which are interior points of $S$. Otherwise we have a constrained minimization problem. In the case where $S$ (not open) is not given by explicit functional constraints, it may be viewed as a true set constraint.
We shall treat separately the cases where $S$ is also given by specific functional constraints; these may be expressed by a system of equalities and/or a system of inequalities. The following terminology is well known. Consider a point $x^0 \in S$; then $f$ is said to have a local minimum at $x^0$ if
$$f(x^0) \le f(x), \quad \forall x \in S \cap N(x^0),$$
where $N(x^0)$ is some neighbourhood of $x^0$. If
$$f(x^0) < f(x), \quad \forall x \in S \cap N(x^0),\ x \ne x^0,$$
then we have a strict local minimum.
Of course each global minimum is a local minimum, but not conversely. If the feasible set $S \subset \mathbb{R}^n$ is nonempty, bounded and closed (i.e. compact) and the objective function $f$ is continuous on $S$, the well-known Weierstrass theorem assures us that problems (3) and (4) admit a solution. If we are concerned only with problem (3) or only with problem (4), the above-mentioned assumptions of the Weierstrass theorem can be weakened by means of the semicontinuity notions.
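The remark that a local minimum need not be global can be made concrete on a grid. The sketch below is our own illustration, assuming the hypothetical function $f(x) = x^4 - 3x^2 + x$ on $S = [-2, 2]$ and a grid step of $0.001$; a grid point counts as a discrete local minimizer when its value does not exceed the values at its neighbouring grid points, a discrete stand-in for $f(x^0) \le f(x)$ on $S \cap N(x^0)$.

```python
# Sketch: a function on S = [-2, 2] with one global and one non-global local
# minimum. The function and the grid step are illustrative assumptions.

def f(x: float) -> float:
    return x**4 - 3 * x**2 + x

h = 0.001
grid = [-2 + k * h for k in range(4001)]   # discretization of S = [-2, 2]
values = [f(x) for x in grid]

def discrete_local_minima(values):
    """Indices k with values[k] <= values at the neighbouring grid points:
    a grid analogue of f(x0) <= f(x) for all x in S ∩ N(x0)."""
    n = len(values)
    found = []
    for k in range(n):
        left = values[k - 1] if k > 0 else float("inf")
        right = values[k + 1] if k < n - 1 else float("inf")
        if values[k] <= left and values[k] <= right:
            found.append(k)
    return found

local_minimizers = [grid[k] for k in discrete_local_minima(values)]
global_minimizer = grid[min(range(len(grid)), key=values.__getitem__)]

print(local_minimizers, global_minimizer)
```

Two local minimizers are found, near $x \approx -1.30$ and $x \approx 1.13$; only the first is global (its value is roughly $-3.51$, against roughly $-1.07$ for the second). A grid search of this kind only suggests, and does not prove, where the minima are.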
Definition 1.1. A function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is called lower semicontinuous at $x^0 \in D$ if for each $\varepsilon > 0$ there exists a neighbourhood $N(x^0)$ of $x^0$ such that
$$f(x) > f(x^0) - \varepsilon, \quad \forall x \in D \cap N(x^0).$$
If, for each $\varepsilon > 0$, there exists a neighbourhood $N(x^0)$ such that
$$f(x) < f(x^0) + \varepsilon, \quad \forall x \in D \cap N(x^0),$$
$f$ is called upper semicontinuous at $x^0 \in D$. The function $f$ is said to be lower (upper) semicontinuous on $D$ (with respect to $D$) if it is lower (upper) semicontinuous (with respect to $D$) at each point $x^0 \in D$ or, equivalently, if one of the following conditions holds:
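a) The set $L(f,\alpha) = \{x \mid x \in D,\ f(x) \le \alpha\}$ (resp. $U(f,\alpha) = \{x \mid x \in D,\ f(x) \ge \alpha\}$) is closed relative to $D$ for each $\alpha \in \mathbb{R}$.
b) The set $SU(f,\alpha) = \{x \mid x \in D,\ f(x) > \alpha\}$ (resp. $SL(f,\alpha) = \{x \mid x \in D,\ f(x) < \alpha\}$) is open relative to $D$ for each $\alpha \in \mathbb{R}$.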
We recall that, given two sets $T$ and $A$ such that $T \subset A \subset \mathbb{R}^n$, $T$ is said to be open (closed) relative to $A$ if $T = A \cap \Omega$, where $\Omega$ is some open (closed) set in $\mathbb{R}^n$. Then we have the following generalized Weierstrass theorem: if $S \subset \mathbb{R}^n$ is nonempty and compact and $f$ is lower semicontinuous on $S$, then $f$ admits a global minimum over $S$; if $f$ is upper semicontinuous on $S$, then it admits a global maximum over $S$. If we also take into consideration cases where $f$ does not admit a minimum or a maximum over $S$, we shall write, instead of (3) or (4), respectively
$$\inf_{x \in S} f(x); \qquad \sup_{x \in S} f(x).$$
In this case, when $S = \emptyset$, the following convention is used:
$$\inf_{x \in S} f(x) = +\infty; \qquad \sup_{x \in S} f(x) = -\infty.$$
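A small worked example (our own illustration, not taken from the text) shows that, in the generalized Weierstrass theorem, upper semicontinuity guarantees a maximum but not a minimum, which is exactly the situation where the notation $\inf$ is needed. Take
$$S = [0,1], \qquad f(x) = \begin{cases} 1, & x = 0,\\ x, & 0 < x \le 1. \end{cases}$$
The function $f$ is continuous on $(0,1]$; at $x^0 = 0$ we have $f(x) \le 1 < f(0) + \varepsilon$ for every $x \in S$ and every $\varepsilon > 0$, so $f$ is upper semicontinuous on $S$. It is not lower semicontinuous at $0$: for $0 < \varepsilon < 1$, every neighbourhood of $0$ contains points $0 < x < 1 - \varepsilon$, where $f(x) = x < f(0) - \varepsilon$. Consequently
$$\max_{x \in S} f(x) = f(0) = f(1) = 1, \qquad \inf_{x \in S} f(x) = 0,$$
and the infimum is not attained, since $f(x) > 0$ for every $x \in S$.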
We shall be mainly concerned with those constrained minimization problems where the feasible set $S$ has a specified structure, i.e. with the so-called nonlinear programming problems (NLP). These problems can be formalized as follows:
$$\min_{x \in S} f(x), \qquad (5)$$
$$S = \{x \mid x \in X,\ g_i(x) \le 0,\ i = 1,\dots,m,\ h_j(x) = 0,\ j = 1,\dots,r\},$$
where all functions $g_i$, $h_j$ are defined on a set $D \subset \mathbb{R}^n$ and $X \subset \mathbb{R}^n$ is any set contained in $D$. In vector notation:
$$S = \{x \mid x \in X,\ g(x) \leqq 0,\ h(x) = 0\},$$
where $g = [g_1,\dots,g_m]$ and $h = [h_1,\dots,h_r]$. The functions $g$, $h$ are the constraints or constraint functions, whereas $X$, when it is not open, may be considered a nonfunctional constraint, i.e. a set constraint.
The set $X$ might typically include lower and upper bounds on the variables (in this case $X$ is often called a "box") or might represent a specially structured abstract constraint. If the optimal point $x^0 \in S$ is not interior to $X$, some complications may arise in the study of optimality conditions for problem (5) (see Chapter III). Each of the constraints $g_i(x) \le 0$ is called an inequality constraint and each of the constraints $h_j(x) = 0$ is called an equality constraint.
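Membership in the feasible set $S$ of (5) can be checked directly from the definition. The following is a minimal sketch, assuming constraints given as Python callables, a hypothetical membership test `in_X` for the set constraint $X$, and a tolerance `tol` (our assumption), since an equality $h_j(x) = 0$ can rarely be verified exactly in floating-point arithmetic. The example data (one inequality, one equality, and the box $X = [0,1]^2$) are illustrative only.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def is_feasible(
    x: Vector,
    g: Sequence[Callable[[Vector], float]],
    h: Sequence[Callable[[Vector], float]],
    in_X: Callable[[Vector], bool],
    tol: float = 1e-9,
) -> bool:
    """Check x in S = {x in X | g_i(x) <= 0, h_j(x) = 0}, up to tolerance tol."""
    if not in_X(x):                          # set constraint x in X
        return False
    if any(gi(x) > tol for gi in g):         # some inequality constraint violated
        return False
    return all(abs(hj(x)) <= tol for hj in h)  # equalities hold up to tol

# Hypothetical data: g_1(x) = x1 + x2 - 1 <= 0, h_1(x) = x1 - x2 = 0, X = [0, 1]^2.
g = [lambda x: x[0] + x[1] - 1.0]
h = [lambda x: x[0] - x[1]]

def box(x: Vector) -> bool:
    """X is a 'box': lower and upper bounds on each variable."""
    return all(0.0 <= xi <= 1.0 for xi in x)

print(is_feasible((0.5, 0.5), g, h, box), is_feasible((0.6, 0.6), g, h, box))
```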
NLP problems containing only equality constraints and where $X$ is open are also called "classical constrained optimization problems", as they were first considered by J.L. Lagrange more than two centuries ago. In the following we shall assume that all functions involved in (NLP) are continuous, at least at a feasible point $x^0$. Thus, if we want to check whether a feasible point $x^0$ is a local solution or not, only those constraints among $g_i(x) \le 0$ for which $g_i(x^0) = 0$ holds are relevant. Indeed, if $g_i(x^0) < 0$, by continuity there exists a suitable neighbourhood of $x^0$ contained in $\{x \mid g_i(x) < 0\}$, i.e. a neighbourhood on which this constraint is not violated. So, given a point $x^0 \in S$, the set
$$I(x^0) = \{i \mid g_i(x^0) = 0\}$$
is called the set of active (or effective, or binding) constraints for problem (5) at $x^0$, whereas the set
$$J(x^0) = \{i \mid g_i(x^0) < 0\}$$
is the set of nonactive constraints at $x^0 \in S$. If in (5) all functions are differentiable, we speak of smooth optimization problems; otherwise we speak of nonsmooth optimization problems. (More precisely: a function is called "smooth of order $k$" if it is $k$ times differentiable and its partial derivatives of order $k$ are continuous.) We also note that the term "nonlinear programming" is a little confusing; indeed, here "nonlinear" means "not necessarily linear", and therefore formulation (5) also includes the case where all the functions involved are linear, i.e. the linear programming problem.
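The index sets $I(x^0)$ and $J(x^0)$ can be computed directly from their definitions. The sketch below assumes inequality constraints given as Python callables and a tolerance `tol` (our assumption) for deciding when $g_i(x^0) = 0$ in floating point; indices are 1-based to match $g_1, \dots, g_m$. The three constraints used are hypothetical, describing the triangle $x_1 + x_2 \le 1$, $x_1 \ge 0$, $x_2 \ge 0$.

```python
def active_sets(x0, g, tol=1e-9):
    """Return (I, J): 1-based indices of active (g_i(x0) = 0) and nonactive
    (g_i(x0) < 0) inequality constraints at a feasible point x0."""
    I, J = [], []
    for i, gi in enumerate(g, start=1):
        value = gi(x0)
        if abs(value) <= tol:      # treated as g_i(x0) = 0
            I.append(i)
        elif value < 0:
            J.append(i)
        else:                      # x0 is not feasible for g_i
            raise ValueError(f"x0 violates g_{i}: g_{i}(x0) = {value}")
    return I, J

# Hypothetical constraints: g_1 = x1 + x2 - 1, g_2 = -x1, g_3 = -x2 (all <= 0).
g = [
    lambda x: x[0] + x[1] - 1.0,
    lambda x: -x[0],
    lambda x: -x[1],
]

print(active_sets((1.0, 0.0), g))    # vertex of the triangle
print(active_sets((0.25, 0.25), g))  # interior point
```

At the vertex $(1, 0)$ the constraints $g_1$ and $g_3$ are active, while at an interior point no constraint is active, so locally the problem behaves like a free one.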
The scheme used to describe a nonlinear programming problem is, in spite of its appearance, quite general, as it can include variants and generalizations such as the following:
1) If some constraint is in the form $g_i(x) \le b_i$, $b_i \ne 0$, it is sufficient to put $\hat{g}_i(x) = g_i(x) - b_i$ to obtain the standard formulation (5).
2) If some constraint is in the form $g_i(x) \ge 0$, it is sufficient to put $\hat{g}_i(x) = -g_i(x)$ to obtain the standard formulation (5).
3) The nonnegativity conditions imposed on the vector $x$, i.e. $x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0$, may also be included in formulation (5) by setting $g_{m+1}(x) = -x_1 \le 0,\ \dots,\ g_{m+n}(x) = -x_n \le 0$.
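The three rewriting rules above are simple function transformations. The following sketch assumes constraints represented as Python callables; the helper names `from_upper_bound`, `from_geq_zero` and `nonnegativity` are hypothetical, chosen only to mirror rules 1), 2) and 3), and the sample constraints are illustrative.

```python
from typing import Callable, List, Sequence

Constraint = Callable[[Sequence[float]], float]

def from_upper_bound(g: Constraint, b: float) -> Constraint:
    """Rule 1): g(x) <= b becomes g(x) - b <= 0."""
    return lambda x: g(x) - b

def from_geq_zero(g: Constraint) -> Constraint:
    """Rule 2): g(x) >= 0 becomes -g(x) <= 0."""
    return lambda x: -g(x)

def nonnegativity(n: int) -> List[Constraint]:
    """Rule 3): x_1 >= 0, ..., x_n >= 0 become -x_k <= 0, k = 1, ..., n."""
    # The default argument k=k binds the current index (avoids late binding).
    return [lambda x, k=k: -x[k] for k in range(n)]

def satisfies_standard_form(x, constraints) -> bool:
    """All constraints are in the standard form c(x) <= 0."""
    return all(c(x) <= 0 for c in constraints)

# Illustrative constraints: x1^2 + x2^2 <= 4, x1 - 1 >= 0, x1 >= 0, x2 >= 0.
constraints = [
    from_upper_bound(lambda x: x[0] ** 2 + x[1] ** 2, 4.0),
    from_geq_zero(lambda x: x[0] - 1.0),
] + nonnegativity(2)

print(satisfies_standard_form((1.5, 0.5), constraints))
```

Keeping every constraint in the single form $c(x) \le 0$ is what lets the later optimality conditions be stated once, for (5) only.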
Moreover, we note that it is sometimes useful to replace an equality constraint $h_j(x) = 0$ by the two opposite inequality constraints $h_j(x) \le 0$ and $h_j(x) \ge 0$. Similarly, it is sometimes convenient to replace an inequality constraint $g_i(x) \le 0$ by an equality constraint of the type
$$g_i(x) + z_i = 0,$$
where $z_i \ge 0$ is a so-called slack variable (see also Chapter III). We give only the following simple example of an NLP problem.
Figure 1.
Example 1.1.
$$\min_{x \in S} f(x) = |x_1 - 2| + |x_2 - 2|,$$
$$S = \{x \mid x \in \mathbb{R}^2,\ g_1(x) = (x_2)^2 - x_1 \le 0;\ h_1(x) = (x_1)^2 + (x_2)^2 - 1 = 0\}.$$
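The solutions claimed for Example 1.1 (with both constraints, without the equality constraint, and without any constraint) can be checked numerically. This is only a crude sketch, assuming a brute-force search: the unit circle $h_1(x) = 0$ is enforced exactly through the parametrization $x = (\cos t, \sin t)$, and the other two cases use a grid of step $0.01$ on the box $[0,4] \times [-2,2]$, which we assume (and one can verify by hand) contains the relevant minimizers. A grid search only approximates the solutions; it does not prove optimality.

```python
import math

def f(x1: float, x2: float) -> float:
    """Objective of Example 1.1."""
    return abs(x1 - 2) + abs(x2 - 2)

def g1(x1: float, x2: float) -> float:
    """Inequality constraint of Example 1.1: g1(x) <= 0."""
    return x2**2 - x1

# (i) Both constraints. The equality h1(x) = 0 (unit circle) is enforced exactly
# by the parametrization x = (cos t, sin t); g1 is then checked on each point.
N = 100_000
best_circle = None
for k in range(N):
    t = 2 * math.pi * k / N
    p = (math.cos(t), math.sin(t))
    if g1(*p) <= 0 and (best_circle is None or f(*p) < f(*best_circle)):
        best_circle = p

# (ii) Equality constraint removed: grid with step 0.01 on the box [0, 4] x [-2, 2].
grid = [(i / 100, j / 100 - 2) for i in range(401) for j in range(401)]
best_parabola = min((p for p in grid if g1(*p) <= 0), key=lambda p: f(*p))

# (iii) Both constraints removed.
best_free = min(grid, key=lambda p: f(*p))

print(best_circle, best_parabola, best_free)
```

The three approximate minimizers agree with $(\sqrt{2}/2, \sqrt{2}/2)$, $(2, \sqrt{2})$ and $(2, 2)$ up to the resolution of the search.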
The dashed lines in Figure 1 represent two level sets of the objective function, i.e. sets of points at which the objective function $f$ has constant value. The feasible set is the arc of the circumference lying within the parabola. It is easy to see that the solution of this problem is the point $x^0 = (\sqrt{2}/2, \sqrt{2}/2)$. If the equality constraint is removed, the solution is seen to be at $x^0 = (2, \sqrt{2})$. If both constraints are removed, i.e. if we have a free or unconstrained minimization problem, the solution is $x^0 = (2, 2)$.
Several classes of NLP problems have been dealt with in past years; we briefly mention some of them.
a) The case where in (5) all functions are linear (i.e. we have a linear programming problem) has been treated extensively. There is an enormous literature on this subject, from both a theoretical and a computational point of view. We shall not be particularly concerned with this kind of mathematical programming problem.
b) The case where $f$ and every $g_i$ are convex functions and all $h_j$ are linear functions has received particular attention. In this case we have a convex programming problem. When $f$, $g_i$, $h_j$ are generalized convex functions, we have a generalized convex programming problem. These assumptions are usually made in order to make the computation of the solutions easier and to ensure that any local solution is also global. Many mathematical programming models arising from practical problems exhibit convex or generalized convex functions, and numerous computational algorithms are "well behaved" under convexity or generalized convexity assumptions.
c) When in (5) $X \subset \mathbb{R}^n$ is open and there are no inequality constraints, we have a "classical optimization problem". These problems are indeed
the oldest mathematical programming problems, since their analysis goes back to the eighteenth century (see Chapter III).
d) When in (5) $f$ is a quadratic form and the constraints are linear, we have a quadratic programming problem. For this case, too, particular results and algorithms have been obtained.
e) When $f$, $g$ and $h$ are separable functions, i.e. of the form
$$f(x) = \sum_{k=1}^n f_k(x_k); \qquad g_i(x) = \sum_{k=1}^n g_{i,k}(x_k); \qquad h_j(x) = \sum_{k=1}^n h_{j,k}(x_k),$$
we have a separable nonlinear programming problem. For this case, too, which is important for some applications, special algorithms have been developed.
f) When in (5) the objective function is of the type
$$f(x) = \frac{u(x)}{v(x)},$$
we have a fractional or hyperbolic programming problem. This is another important case of nonlinear programming problems arising from practical problems.
If $f$ is a vector-valued function, it is again possible to define optimization problems, but in this case the concept of optimal point has to be redefined; indeed, in this case we have several definitions of an optimal point. When $f$ is a vector-valued function, we speak of vector optimization. Vector optimization is treated in Chapter VI.
We cannot overlook the limits of our formulation of an optimization problem; in effect, we shall focus our analysis only on nonlinear programming problems and on vector programming problems, mostly defined on $\mathbb{R}^n$, whereas some other important mathematical programming problems are excluded from our analysis. These include stochastic programming problems, i.e. those optimization problems where random variables are to be taken into account (we shall treat only deterministic problems), and integer programming problems, i.e. those optimization problems where some or all components of $x \in S$ must be integers, a condition typical of several problems of Management Science or of Operations Research. Finally, it is convenient to distinguish between static optimization problems and dynamic optimization problems. The former concern situations at a certain moment, so that no variation in time is taken into account; in the latter, the time evolution of the quantities considered is of fundamental importance. In this class of problems we may include those typical of Calculus of Variations, Control Theory and Dynamic Programming. We shall not be concerned with these types of optimization problems either, as they form independent (and very wide) subjects of study. The interested reader may consult the classical works listed in the bibliographical references of the present chapter.
1.2. Basic Mathematical Preliminaries and Notations
We review in this section, for the reader's convenience, some basic concepts from set theory, linear algebra and the theory of real functions. We also list some notations and conventions used throughout the book (some of them were already introduced in the previous section). For more details, the reader is referred to the books cited at the end of this chapter.
A) SET THEORY AND LOGICAL NOTATIONS
• The empty set is denoted by $\emptyset$.
• If $X$ and $Y$ are two sets and $X$ is contained in $Y$, we shall write $X \subset Y$ (or $Y \supset X$); if $X$ is properly contained in $Y$, we shall write $X \subsetneq Y$.
• $y \in X$ means that $y$ is an element of $X$; $y \notin Y$ means that $y$ is not an element of $Y$.
• The difference of two sets $X$ and $Y$ is denoted by $X \setminus Y$: $X \setminus Y = \{x \mid x \in X,\ x \notin Y\}$.
• If $Y \subset X$, then $X \setminus Y$ is called the complement of $Y$ (relative to $X$). When $X$ is the whole space we also use the notation $Y^c$ or $\sim Y$. Do not confuse the difference $X \setminus Y$ with the algebraic difference of two sets, $X - Y$.
• The algebraic sum (difference) of two sets $X$, $Y$ is denoted by $X \pm Y$ and is $X \pm Y = \{z \mid z = x \pm y,\ x \in X,\ y \in Y\}$.
• In the same manner, $\lambda X = \{z \mid z = \lambda x,\ x \in X\}$ for an arbitrary real number $\lambda$, and $\sum_{i=1}^p X_i = \{z \mid z = x^1 + \dots + x^p,\ x^i \in X_i,\ i = 1,\dots,p\}$ for the given sets $X_1,\dots,X_p$.
• The Cartesian product of $p$ sets $S_1, S_2, \dots, S_p$ is denoted by $S_1 \times S_2 \times \dots \times S_p$ or also by $\prod_{i=1}^p S_i$; the $p$-times repeated Cartesian product of a set $S$ by itself is denoted by $S^p$.
• $\mathbb{R}$ denotes the set of real numbers; $\mathbb{Q}$ denotes the set of rational numbers; $\mathbb{Z}$ denotes the set of integers; $\mathbb{N}$ denotes the set of natural numbers; $\mathbb{N}_+$ is the set of positive natural numbers.
• The extended real number system, i.e. $\mathbb{R} \cup \{\pm\infty\}$, is also denoted by $\overline{\mathbb{R}}$.
• The closed real interval is denoted by $[a,b] = \{x \in \mathbb{R} \mid a \le x \le b\}$, with $a, b \in \mathbb{R}$; the open real interval is denoted by $(a,b) = \{x \in \mathbb{R} \mid a < x < b\}$, with $a, b \in \mathbb{R}$. The meaning of $(a,b]$ and $[a,b)$ is obvious.
• $\mathbb{R}^n$ denotes the $n$-dimensional real Euclidean space. $\mathbb{R}^n_+$ and $\mathbb{R}^n_-$ denote the sets of elements of $\mathbb{R}^n$ having, respectively, nonnegative and nonpositive components. $\mathring{\mathbb{R}}^n_+$ and $\mathring{\mathbb{R}}^n_-$ are, respectively, the positive and the negative orthant of $\mathbb{R}^n$.
• The origin of $\mathbb{R}^n$ is simply denoted by $0$; from the context it will be clear whether it is a scalar or a vector.
• Given a point $x^0 \in \mathbb{R}^n$, an (open) neighbourhood of $x^0$ or (open) ball of centre $x^0$ is denoted by $N(x^0)$ or $B(x^0)$ or $V(x^0)$, etc. If we need to specify the radius $\delta > 0$ of the ball we shall write $B_\delta(x^0)$, $V_\delta(x^0)$, $N_\delta(x^0)$ or also $N(x^0,\delta)$, $B(x^0,\delta)$, etc.
• $\operatorname{card}(S)$ is the cardinality of $S$.
• $\operatorname{int}(S)$ denotes the interior of $S$, i.e. the set of all its interior points.
• $\operatorname{relint}(S)$ denotes the relative interior of $S$ (see Section 2.1 of Chapter II).
• $\operatorname{bd}(S)$ denotes the boundary of $S$.
• $\operatorname{cl}(S)$ or $\overline{S}$ denotes the closure of $S$.
• $\operatorname{ext}(S)$ denotes the exterior of $S$, i.e. the set of all its exterior points.
• If $A$ and $B$ are two logical propositions, then $A \Rightarrow B$ denotes the implication (if $A$ holds then $B$ holds; $A$ implies $B$). The reverse implication is denoted by $A \Leftarrow B$.
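The set operations above, in particular the distinction between the difference $X \setminus Y$ and the algebraic difference $X - Y$, can be tried out on small finite sets. This is a minimal sketch, assuming finite sets of real numbers (for subsets of $\mathbb{R}^n$ one would use tuples and componentwise addition); the helper names are hypothetical.

```python
from itertools import product

def algebraic_sum(X, Y):
    """X + Y = {x + y | x in X, y in Y} (finite sets of numbers)."""
    return {x + y for x, y in product(X, Y)}

def algebraic_difference(X, Y):
    """X - Y = {x - y | x in X, y in Y}; not the set difference."""
    return {x - y for x, y in product(X, Y)}

def scalar_multiple(lam, X):
    """lam X = {lam x | x in X}."""
    return {lam * x for x in X}

def sum_of_sets(*sets):
    """X_1 + ... + X_p, built by repeated algebraic sums starting from {0}."""
    result = {0}
    for Xi in sets:
        result = algebraic_sum(result, Xi)
    return result

X, Y = {0, 1}, {0, 2}
print(algebraic_sum(X, Y), algebraic_difference(X, Y), X - Y)
```

Note that Python's operator `X - Y` on sets is the set-theoretic difference $X \setminus Y$ (here $\{1\}$), whereas `algebraic_difference(X, Y)` gives $\{-2, -1, 0, 1\}$. The example also shows that $X + X$ and $2X$ differ in general: $\{0,1\} + \{0,1\} = \{0,1,2\}$, while $2\{0,1\} = \{0,2\}$.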
• A (real) matrix $A$ of order or dimension $m, n$ (or $m \times n$) is an array of $m \cdot n$ real numbers; matrices are usually denoted by capital Latin letters. A matrix $A$ is also denoted by $A = [a_{ij}]$, $i = 1,\dots,m$; $j = 1,\dots,n$. Its $j$-th column is denoted by $A^j$, whereas its $i$-th row is denoted by $A_i$. A matrix is square if $m = n$; in this case $n$ is the order (or dimension) of the square matrix.
• If $A$ is an $m$ by $n$ matrix, its transpose is denoted by $A^T$. We recall that a square matrix $A$ is symmetric if $A = A^T$ and is skew-symmetric if $A = -A^T$.
• $I_n$ is the identity matrix of order $n$; if there is no need to specify its dimension, we shall write $I$. The zero (or null) matrix will be denoted simply by $0$.
• If $A$ is a square matrix, $\det(A)$ or $|A|$ denotes its determinant.
• Let $A$ and $B$ be two $m$ by $n$ matrices; we shall write:
$A \geqq B$, if $a_{ij} \ge b_{ij}$, $\forall i, j$;
$A \ge B$, if $A \geqq B$ but $A \ne B$ (i.e. there exists at least one element with $a_{ij} > b_{ij}$);
$A > B$, if $a_{ij} > b_{ij}$, $\forall i, j$.
If in the above inequalities $B$ is the $m$ by $n$ zero matrix, we say that
$A$ is nonnegative if $A \geqq 0$;
$A$ is semipositive if $A \ge 0$;
$A$ is positive if $A > 0$.
The same convention and terminology apply also when comparing two vectors of the same dimension. The meaning of $A \leqq B$, $A \le B$, $A < B$ is then obvious. Thus the following (already introduced) subsets of $\mathbb{R}^n$:
$$\mathbb{R}^n_+ = \{x \mid x \in \mathbb{R}^n,\ x \geqq 0\}, \qquad \mathring{\mathbb{R}}^n_+ = \{x \mid x \in \mathbb{R}^n,\ x > 0\}$$
are, respectively, the nonnegative and the positive orthant of $\mathbb{R}^n$. The set $\mathbb{R}^n_+ \setminus \{0\} = \{x \mid x \in \mathbb{R}^n,\ x \ge 0\}$ is the semipositive orthant of $\mathbb{R}^n$.
• The inverse of a square matrix $A$ (when it exists) is denoted by $A^{-1}$; the rank of a matrix $A$ of order $m, n$ is denoted by $\operatorname{rank}(A)$.
• We recall that a quadratic form $Q(x) = x^T A x$ ($A$ symmetric) is positive (negative) definite if $Q(x) > 0$ ($< 0$), $\forall x \in \mathbb{R}^n$, $x \ne 0$. It is positive (negative) semidefinite if $Q(x) \ge 0$ ($\le 0$), $\forall x \in \mathbb{R}^n$. It is indefinite if there exist $x$, $\bar{x}$ such that $Q(x) > 0$, $Q(\bar{x}) < 0$.
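The componentwise orderings $\geqq$, $\ge$, $>$ and the classification of quadratic forms can be sketched in a few lines. Since the definitions of definiteness quantify over all $x \in \mathbb{R}^n$, the sketch relies on the standard fact that, for symmetric $A$, the sign pattern of the eigenvalues of $A$ decides the class of $Q(x) = x^T A x$. The eigenvalues are computed here with the cyclic Jacobi rotation method in pure Python (our choice, to stay dependency-free; in practice one would call a library routine such as `numpy.linalg.eigvalsh`). The tolerance `eps` and all helper names are assumptions, and the classifier returns the strongest applicable label (a positive definite form is of course also positive semidefinite).

```python
import math

# Componentwise orderings of Section 1.2. Matrices are lists of rows; a vector
# may be passed as a flat list. A and B are assumed to have the same dimensions.

def _entries(M):
    out = []
    for row in M:
        out.extend(row if isinstance(row, (list, tuple)) else [row])
    return out

def geqq(A, B):
    """A ≧ B: a_ij >= b_ij for all i, j."""
    return all(a >= b for a, b in zip(_entries(A), _entries(B)))

def geq(A, B):
    """A ≥ B: A ≧ B but A ≠ B."""
    return geqq(A, B) and _entries(A) != _entries(B)

def gt(A, B):
    """A > B: a_ij > b_ij for all i, j."""
    return all(a > b for a, b in zip(_entries(A), _entries(B)))

def symmetric_eigenvalues(A, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by the cyclic Jacobi rotation method."""
    n = len(A)
    a = [[float(v) for v in row] for row in A]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the updated a[p][q] vanishes.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # a <- a P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- P^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def classify_quadratic_form(A, eps=1e-9):
    """Classify Q(x) = x^T A x (A symmetric) by the signs of its eigenvalues."""
    ev = symmetric_eigenvalues(A)
    if all(lam > eps for lam in ev):
        return "positive definite"
    if all(lam < -eps for lam in ev):
        return "negative definite"
    if all(lam >= -eps for lam in ev):
        return "positive semidefinite"
    if all(lam <= eps for lam in ev):
        return "negative semidefinite"
    return "indefinite"

print(classify_quadratic_form([[1, 2], [2, 1]]))
```

For instance, $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has eigenvalues $3$ and $-1$, so $Q$ is indefinite: $Q(1,1) = 6 > 0$ while $Q(1,-1) = -2 < 0$.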
FUNCTIONS
• $f : X \to Y$, with $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^m$, denotes a function defined on $X$ and with range $Y$. The set $X$ is also called its domain and is sometimes denoted by $\operatorname{dom}(f)$. If $m > 1$ we also write $f = [f_1, f_2, \dots, f_m]$ and speak of vector functions or vector-valued functions.
• $\operatorname{epi} f = \{(x, y) \in \mathbb{R}^{n+1} \mid x \in X,\ y \ge f(x)\}$ is the epigraph of $f : X \to \mathbb{R}$ and the set $\operatorname{hypo} f = \{(x, y) \in \mathbb{R}^{n+1} \mid x \in X,\ y \le f(x)\}$ is the hypograph of $f$. The strict epigraph and strict hypograph of $f$ are, respectively, the sets
$$\operatorname{epi}^{s} f = \{(x, y) \in \mathbb{R}^{n+1} \mid x \in X,\ y > f(x)\}; \qquad \operatorname{hypo}^{s} f = \{(x, y) \in \mathbb{R}^{n+1} \mid x \in X,\ y < f(x)\}.$$
• The $\alpha$-lower level set of $f : X \to \mathbb{R}$ is denoted by $L(f, \alpha)$ and is defined as $L(f, \alpha) = \{x \in X \mid f(x) \le \alpha\}$, with $\alpha \in \mathbb{R}$. The $\alpha$-upper level set of $f : X \to \mathbb{R}$ is denoted by $U(f, \alpha)$ and is defined as $U(f, \alpha) = \{x \in X \mid f(x) \ge \alpha\}$. The $\alpha$-level set (or $\alpha$-isocontour set) of $f : X \to \mathbb{R}$ is $Y(f, \alpha) = L(f, \alpha) \cap U(f, \alpha)$, i.e. $Y(f, \alpha) = \{x \in X \mid f(x) = \alpha\}$. $SL(f, \alpha) = \{x \in X \mid f(x) < \alpha\}$ is the $\alpha$-strict lower level set of $f$ and $SU(f, \alpha) = \{x \in X \mid f(x) > \alpha\}$ is the $\alpha$-strict upper level set of $f$. We have
$$L(f, -\infty) = SL(f, -\infty) = U(f, +\infty) = SU(f, +\infty) = \emptyset; \qquad U(f, -\infty) = SU(f, -\infty) = L(f, +\infty) = SL(f, +\infty) = X.$$
Sometimes we shall use the "small o" of Landau: let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ be defined in a neighbourhood $N(x^0)$ of $x^0$, with $g(x) \ne 0$ in $N(x^0)$. Then we may write $f = o(g)$ for $x \to x^0$ when
$$\lim_{x \to x^0} \frac{f(x)}{g(x)} = 0.$$
So $f = o(1)$, for $x \to x^0$, is equivalent to $\lim_{x \to x^0} f(x) = 0$, while $f = o(\|x\|)$ and $f = o(\|x\|^2)$, for $x \to x^0$, mean respectively that
$$\lim_{x \to x^0} \frac{f(x)}{\|x\|} = 0 \quad \text{and} \quad \lim_{x \to x^0} \frac{f(x)}{\|x\|^2} = 0.$$
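Let $f : \mathbb{R}^n \to \mathbb{R}$; then the non-deleted limit superior of $f$ at $x^0 \in \mathbb{R}^n$, denoted by $\limsup_{x \to x^0} f(x)$ (also $\overline{\lim}_{x \to x^0} f(x)$), is defined by
$$A = \limsup_{x \to x^0} f(x) = \inf_{N(x^0)} \ \sup_{x \in N(x^0)} f(x).$$
The value $A \in \overline{\mathbb{R}}$ may be equivalently characterized as the maximum value of the so-called limit class $\Omega$, where $\omega \in \Omega$ when there exists a sequence $\{x^k\} \subset \mathbb{R}^n$, $x^k \to x^0$, such that $f(x^k) \to \omega$, i.e. when, for each $\varepsilon > 0$ and each $N(x^0)$, there exists $x \in N(x^0)$ such that $|f(x) - \omega| < \varepsilon$. It is easy to verify that, when $A$ is finite, the limit superior $A$ is characterized by the proposition: $\forall \varepsilon > 0$, $\exists N(x^0)$ such that $f(x) < A + \varepsilon$, $\forall x \in N(x^0)$, and for each $N(x^0)$ there exists $x \in N(x^0)$ with $f(x) > A - \varepsilon$. Similarly, the non-deleted limit inferior of $f$ at $x^0$ is
$$\liminf_{x \to x^0} f(x) = \sup_{N(x^0)} \ \inf_{x \in N(x^0)} f(x).$$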
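We have:
i) $\liminf_{x \to x^0} (-f(x)) = -\limsup_{x \to x^0} f(x)$; $\quad \limsup_{x \to x^0} (-f(x)) = -\liminf_{x \to x^0} f(x)$; $\quad \liminf_{x \to x^0} f(x) \le \limsup_{x \to x^0} f(x)$;
ii) $\limsup_{x \to x^0} (f(x) + g(x)) \le \limsup_{x \to x^0} f(x) + \limsup_{x \to x^0} g(x)$; $\quad \liminf_{x \to x^0} (f(x) + g(x)) \ge \liminf_{x \to x^0} f(x) + \liminf_{x \to x^0} g(x)$ (whenever the right-hand sides are defined);
iii) if $\lim_{x \to x^0} g(x)$ exists, then $\limsup_{x \to x^0} (f(x) + g(x)) = \limsup_{x \to x^0} f(x) + \lim_{x \to x^0} g(x)$ (similarly for the liminf operation).
It is not difficult to see that $f : \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous at $x^0 \in \mathbb{R}^n$ if and only if $\limsup_{x \to x^0} f(x) = f(x^0)$ (but if we adopted the deleted definition we would have to write $\limsup_{x \to x^0} f(x) \le f(x^0)$). Similarly, $f$ is lower semicontinuous at $x^0 \in \mathbb{R}^n$ if and only if $\liminf_{x \to x^0} f(x) = f(x^0)$ ($\liminf_{x \to x^0} f(x) \ge f(x^0)$ if we adopt the deleted definition).
Another notion of generalized "mixed" limits has been introduced in optimization theory by Rockafellar. If $g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, with $x^0 \in \mathbb{R}^n$, $y^0 \in \mathbb{R}^m$, then
$$\limsup_{x \to x^0} \ \inf_{y \to y^0} g(x, y) = \sup_{U(y^0)} \ \inf_{V(x^0)} \ \sup_{x \in V(x^0)} \ \inf_{y \in U(y^0)} g(x, y),$$
$$\liminf_{x \to x^0} \ \sup_{y \to y^0} g(x, y) = \inf_{U(y^0)} \ \sup_{V(x^0)} \ \inf_{x \in V(x^0)} \ \sup_{y \in U(y^0)} g(x, y),$$
where $U(y^0)$ and $V(x^0)$ range over the neighbourhoods of $y^0$ and $x^0$, respectively.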
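To make the non-deleted limits concrete, the sketch below samples $f$ on shrinking neighbourhoods $[x^0 - r, x^0 + r]$ of a point of the real line and records the sampled supremum and infimum, which approximate $\sup_{x \in N(x^0)} f(x)$ and $\inf_{x \in N(x^0)} f(x)$. It assumes NumPy and a vectorized `f`; the grid size, the radii and the two test functions are illustrative choices. For $f(x) = \sin(1/x)$, $f(0) = 0$, one expects $\limsup = 1$ and $\liminf = -1$ at $0$; for the "spike" function, equal to $1$ at $0$ and to $0$ elsewhere, the non-deleted $\limsup$ at $0$ is $1 = f(0)$ (upper semicontinuity), while $\liminf = 0 < f(0)$ (no lower semicontinuity).

```python
import numpy as np


def neighbourhood_sup_inf(f, x0, radii=(1e-1, 1e-2, 1e-3), n=200_001):
    """Sampled sup and inf of f over [x0 - r, x0 + r] for each radius r.

    x0 itself is included (non-deleted definition). For exact suprema the sup values
    are non-increasing as r shrinks and converge to limsup; the inf values likewise
    converge to liminf. Sampling only approximates these quantities.
    """
    results = []
    for r in radii:
        xs = np.append(np.linspace(x0 - r, x0 + r, n), x0)
        vals = f(xs)
        results.append((r, float(np.max(vals)), float(np.min(vals))))
    return results


def sin_inv(x):
    """f(x) = sin(1/x) for x != 0 and f(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    out[nz] = np.sin(1.0 / x[nz])
    return out


def spike(x):
    """f(0) = 1 and f(x) = 0 elsewhere."""
    return np.where(np.asarray(x, dtype=float) == 0.0, 1.0, 0.0)
```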
• Let $f : X \to \mathbb{R}$, with $X \subset \mathbb{R}^n$; then $f$ is said to satisfy a Lipschitz condition on $X$ if there exists a number $L > 0$ such that
$$|f(x^1) - f(x^2)| \le L\,\|x^1 - x^2\|, \quad \forall x^1, x^2 \in X.$$
We shall say that $f$ is Lipschitzian around $x^0$ (or near $x^0$, or locally Lipschitzian at $x^0$) if there exist a neighbourhood $N(x^0)$ of $x^0$ and a number $L > 0$ such that
$$|f(x^1) - f(x^2)| \le L\,\|x^1 - x^2\|, \quad \forall x^1, x^2 \in N(x^0).$$
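The Lipschitz condition can be probed numerically: every difference quotient $|f(x^1) - f(x^2)| / \|x^1 - x^2\|$ is a lower bound for any admissible $L$. The sketch below (NumPy assumed, illustrative name) takes the largest quotient over all pairs of sample points. On a grid of $[-1, 1]$ it returns $1$ for $f(x) = |x|$, whereas for $f(x) = \sqrt{|x|}$ the quotient $\sqrt{h}/h = 1/\sqrt{h}$ grows as sample points approach $0$, consistent with $\sqrt{|x|}$ not being Lipschitzian around $0$.

```python
import numpy as np


def lipschitz_lower_bound(f, points):
    """Largest quotient |f(x1) - f(x2)| / ||x1 - x2|| over pairs of distinct sample points.

    Any constant L in the Lipschitz condition must be at least this value; the value
    itself is only a lower estimate of the best constant on the whole set.
    """
    P = np.asarray(points, dtype=float)
    if P.ndim == 1:                      # scalar samples -> column of 1-D points
        P = P[:, None]
    vals = np.array([f(p) for p in P], dtype=float)
    best = 0.0
    for i in range(len(P) - 1):
        dist = np.linalg.norm(P[i + 1:] - P[i], axis=1)
        keep = dist > 0
        if np.any(keep):
            quot = np.abs(vals[i + 1:][keep] - vals[i]) / dist[keep]
            best = max(best, float(quot.max()))
    return best
```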
• The terms "non decreasing" and "increasing" will be considered as equivalent; similarly for the terms "non increasing" and "decreasing". We shall write "strictly increasing" and "strictly decreasing" for the strict relations.
• Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ and $x^0 \in \operatorname{int}(D)$. We say that $f$ is differentiable at $x^0$ if there exists a vector $y^0 \in \mathbb{R}^n$ such that, for each $x \in D$,
$$f(x) - f(x^0) = y^0 (x - x^0) + o(\|x - x^0\|), \quad \text{for } x \to x^0.$$
Then the vector $y^0$ (which is unique) is formed by the $n$ partial derivatives of $f$ evaluated at $x^0$:
$$y^0 = \left(\frac{\partial f(x^0)}{\partial x_1}, \dots, \frac{\partial f(x^0)}{\partial x_n}\right)$$
(sometimes the notation $f_{x_1}(x^0), \dots, f_{x_n}(x^0)$ is also used). The gradient vector, or gradient, of $f$ at $x^0$ consists of all $n$ partial derivatives evaluated at $x^0$ and is denoted by $\nabla f(x^0)$.
• We say that $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable at $x^0$, or that it is of class $C^1$ at $x^0$ (and we shall write $f \in C^1(x^0)$), when $\nabla f(x)$ is continuous at $x^0$. The following implications hold:
$$f \in C^1(x^0) \ \Rightarrow\ f \text{ is differentiable at } x^0 \ \Rightarrow\ \begin{cases} \nabla f(x^0) \text{ exists}, \\ f \text{ is continuous at } x^0. \end{cases}$$
• $f : \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable at $x^0$ (to admit partial derivatives at $x^0$) if all its components are differentiable at $x^0$ (admit partial derivatives at $x^0$). The $m$ by $n$ matrix of the partial derivatives is called the Jacobian matrix of $f$ and is denoted by $\nabla f(x)$ or $Jf(x)$. Thus in this case
$$Jf(x^0) = \nabla f(x^0) = \left[\frac{\partial f_i(x^0)}{\partial x_j}\right], \quad i = 1, \dots, m;\ j = 1, \dots, n.$$
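Partial derivatives, and hence the gradient and the Jacobian matrix, can be approximated by central differences, $\partial f_i(x^0)/\partial x_j \approx [f_i(x^0 + h e^j) - f_i(x^0 - h e^j)] / (2h)$, where $e^j$ is the $j$-th unit vector. A minimal NumPy sketch; the name `jacobian_fd` and the step `h` are illustrative assumptions, and for quadratic functions, such as the examples below, the central difference is exact up to rounding.

```python
import numpy as np


def jacobian_fd(F, x0, h=1e-6):
    """Central-difference approximation of the m-by-n matrix [dF_i(x0)/dx_j].

    For a scalar function (m = 1) the single row approximates the gradient.
    """
    x0 = np.asarray(x0, dtype=float)
    m = np.atleast_1d(F(x0)).size
    J = np.empty((m, x0.size))
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = h
        J[:, j] = (np.atleast_1d(F(x0 + e)) - np.atleast_1d(F(x0 - e))) / (2.0 * h)
    return J
```

For $f(x) = x_1^2 + 3x_1x_2$ at $x^0 = (1, 2)$ the gradient is $(8, 3)$; for $F(x) = (x_1 x_2,\ x_1 + x_2^2)$ the Jacobian at $(1, 2)$ has rows $(2, 1)$ and $(1, 4)$.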
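• We say that $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is twice differentiable at $x^0 \in \operatorname{int}(D)$ if, for each $x \in D$, it is
$$f(x) - f(x^0) = (x - x^0)\nabla f(x^0) + \tfrac{1}{2}(x - x^0)^T Hf(x^0)(x - x^0) + o(\|x - x^0\|^2), \quad \text{for } x \to x^0.$$
The $n$ by $n$ matrix $Hf(x^0)$ is called the Hessian matrix of $f$ at $x^0$; it is unique and its elements are the second-order partial derivatives of $f$, evaluated at $x^0$:
$$Hf(x^0) = \left[\frac{\partial^2 f(x^0)}{\partial x_i \, \partial x_j}\right], \quad i, j = 1, \dots, n.$$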
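The second-order expansion can be checked numerically. The sketch below (NumPy assumed; the names and the step `h` are illustrative) approximates $Hf(x^0)$ by central differences and evaluates the normalized remainder
$$\frac{f(x) - f(x^0) - (x - x^0)\nabla f(x^0) - \tfrac{1}{2}(x - x^0)^T Hf(x^0)(x - x^0)}{\|x - x^0\|^2},$$
which must tend to $0$ as $x \to x^0$.

```python
import numpy as np


def hessian_fd(f, x0, h=1e-4):
    """Central-difference approximation of the Hessian [d^2 f(x0) / dx_i dx_j]."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    E = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                       - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])) / (4.0 * h * h)
    return H


def remainder_ratio(f, grad0, H0, x0, d):
    """[f(x0+d) - f(x0) - grad0.d - d.H0.d / 2] / ||d||^2, the normalized Taylor remainder."""
    x0, d = np.asarray(x0, dtype=float), np.asarray(d, dtype=float)
    r = f(x0 + d) - f(x0) - grad0 @ d - 0.5 * d @ H0 @ d
    return r / (d @ d)
```

For $f(x) = x_1^2 x_2 + x_2^3$ at $x^0 = (1, 1)$ one has $\nabla f(x^0) = (2, 4)$ and $Hf(x^0)$ with rows $(2, 2)$ and $(2, 6)$; along $x - x^0 = (t, t)$ the remainder equals $2t^3$, so the normalized remainder is exactly $t$, which indeed tends to $0$.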
• We say that $f$ is twice continuously differentiable at $x^0$, or that $f$ is of class $C^2$ at $x^0$, when $f$ admits second-order partial derivatives continuous at $x^0$.
• The following implications hold: a) $\nabla f$ is differentiable at $x^0$ …
Note that this result is equivalent to the fact that at each boundary point of a convex set there is a supporting hyperplane passing through it.
ii) Let $x^1, x^2 \in \operatorname{int}(X)$; let us suppose by contradiction that the set $\Lambda = \{\lambda \mid 0 < \lambda < 1,\ (1 - \lambda)x^1 + \lambda x^2 \notin \operatorname{int}(X)\}$ is nonempty. Denote $\lambda_0 = \inf \Lambda$; we have $0 < \lambda_0 < 1$ and $x^0 = (1 - \lambda_0)x^1 + \lambda_0 x^2 \in \operatorname{bd}(X)$, as $x^0 \in \operatorname{int}(X)$ or $x^0 \in \operatorname{ext}(X)$ would contradict the definition of $\lambda_0$. Let $H = \{x \mid ax = ax^0\}$ be a supporting hyperplane at $x^0$ ($a \ne 0$); then $x^1 \in H$, i.e. $ax^1 = ax^0$, since if we suppose, e.g., $ax^1 > ax^0$, it results $ax^2 < ax^0$, in contradiction to the definition of $H$. Therefore $x^1$ must be a boundary point of $X$, in contradiction to the assumption. Therefore $\operatorname{int}(X)$ is convex. □
As a consequence of the above theorem we deduce that an open set $X \subset \mathbb{R}^n$, or also a closed set $X \subset \mathbb{R}^n$ with $\operatorname{int}(X) \ne \emptyset$, is convex if and only if it admits a supporting hyperplane at each point $x \in \operatorname{bd}(X)$. If $X$ is open the assertion is immediate. If $X$ is closed, with $\operatorname{int}(X) \ne \emptyset$, under the assumption of Theorem 2.2.4 we have already shown that $\operatorname{int}(X)$ is a convex set; since $\operatorname{int}(X) \subset X = \bar{X}$, we now have to prove that $X = \bar{X} \subset \overline{\operatorname{int}(X)}$, which implies that $X = \bar{X} = \overline{\operatorname{int}(X)}$ is convex by Theorem 2.1.7. Suppose by contradiction the existence of $x^1 \in X$,
Convex sets
$x^1 \notin \overline{\operatorname{int}(X)}$; choose a point $x^2 \in \operatorname{int}(X)$ ($\operatorname{int}(X) \ne \emptyset$ by assumption) and let
$$\lambda_0 = \inf\{\lambda \mid 0 \le \lambda \le 1,\ \lambda x^1 + (1 - \lambda)x^2 \notin \operatorname{int}(X)\}.$$
Clearly $0 < \lambda_0 \le 1$. Moreover, $\lambda_0 = 1$ would imply $x^1 \in \overline{\operatorname{int}(X)}$, which is a contradiction; let $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$; it is
$$x^0 \in \operatorname{bd}(\operatorname{int}(X)) \subset \operatorname{bd}(X).$$
So we can find a supporting hyperplane $H = \{x \mid ax = ax^0\}$, with $a \ne 0$ and $ax \ge ax^0$, $\forall x \in X$. Since $x^2 \in \operatorname{int}(X)$ it is $ax^2 > ax^0$, and since $x^1 \in X$ it is $ax^1 \ge ax^0$. Now
$$x^2 = \frac{1}{1 - \lambda_0}\,x^0 - \frac{\lambda_0}{1 - \lambda_0}\,x^1$$
and so we have
$$ax^2 = \frac{1}{1 - \lambda_0}\,ax^0 - \frac{\lambda_0}{1 - \lambda_0}\,ax^1 \le \frac{1}{1 - \lambda_0}\,ax^0 - \frac{\lambda_0}{1 - \lambda_0}\,ax^0 = ax^0,$$
i.e. $ax^2 \le ax^0$, a contradiction.
We can sharpen the above assertion, avoiding the assumption $\operatorname{int}(X) \ne \emptyset$. Indeed, from Theorem 2.2.3 we can deduce that a set $X \subset \mathbb{R}^n$ is closed and convex if and only if it coincides with the intersection of all its (closed) supporting halfspaces; then every boundary point of $X$ lies on a supporting hyperplane.
The study of optimality conditions often makes use of the concept of separation between two sets. Let $S_1$ and $S_2$ be arbitrary sets of $\mathbb{R}^n$; a hyperplane $H$ is said to separate $S_1$ and $S_2$ if $S_1 \subset H^{\le}$ and $S_2 \subset H^{\ge}$ (or else $S_1 \subset H^{\ge}$ and $S_2 \subset H^{\le}$). The hyperplane $H$ is then called a separation hyperplane, and the sets $S_1$ and $S_2$ are said to be separable if at least one separation hyperplane exists. $H$ is called a proper separation hyperplane if, in addition, $S_1 \cup S_2 \not\subset H$. $H$ is said to be a strict separation hyperplane of $S_1$ and $S_2$ if it is a separation hyperplane and, in addition, at least one of the following relations holds:
$$S_1 \cap H = \emptyset; \qquad S_2 \cap H = \emptyset.$$
Separation
theorems
$H$ is said to be a strong separation hyperplane for $S_1$ and $S_2$ if there exists a ball $B$ of radius $\varepsilon > 0$ such that the sets
$$S_1 + B, \qquad S_2 + B$$
are separable by $H$. It appears that strong separability implies strict separability and that strict separability implies proper separability, but not conversely. If $S_1$ and $S_2$ are open, then separability implies strict separability. Moreover, the empty intersection of $S_1$ and $S_2$ is neither necessary nor sufficient for separability, nor even for proper separability. The empty intersection of $S_1$ and $S_2$ is necessary but not sufficient for strict separability and, a fortiori, for strong separability. Figure 1 shows different types of separation of two sets in $\mathbb{R}^2$.
Figure 1. (Panels: strong separation; strict separation; proper separation.)
From the above definitions it is quite immediate to prove that they are equivalent to the following ones:
- Separability between $S_1$ and $S_2$: there exists a hyperplane $H = \{x \mid ax = \alpha\}$ such that
$$\sup_{x \in S_1}\{ax\} \le \inf_{x \in S_2}\{ax\}.$$
- Proper separability between $S_1$ and $S_2$: there exists a hyperplane $H = \{x \mid ax = \alpha\}$ such that
$$\sup_{x \in S_1}\{ax\} \le \inf_{x \in S_2}\{ax\} \quad \text{and} \quad \inf_{x \in S_1}\{ax\} < \sup_{x \in S_2}\{ax\}.$$
- Strict separability between $S_1$ and $S_2$: there exists a hyperplane $H = \{x \mid ax = \alpha\}$ such that …
- Strong separability between $S_1$ and $S_2$: there exists a hyperplane $H = \{x \mid ax = \alpha\}$ such that
$$\sup_{x \in S_1}\{ax\} < \inf_{x \in S_2}\{ax\}.$$
The notion of separability is of utmost importance in the case of convex sets; indeed we have the following fundamental results.
Theorem 2.2.5 (Theorem of separation). If $X$, $Y$ are nonempty convex sets of $\mathbb{R}^n$ and $X \cap Y = \emptyset$, then there exists a hyperplane that separates $X$ and $Y$, i.e. there exists a vector $a \in \mathbb{R}^n$, $\|a\| = 1$, such that for every $x \in X$ and for every $y \in Y$, $ax \ge ay$, i.e.
$$\inf_{x \in X}\{ax\} \ge \sup_{y \in Y}\{ay\}.$$
Proof. By assumption the set $X - Y$ is convex and the point $x^0 = 0 \notin X - Y$. We have two cases: a) $0 \in \overline{X - Y}$; then $0$ is a boundary point of $\overline{X - Y}$ and we can use Theorem 2.2.4. b) $0 \notin \overline{X - Y}$; then we can use Theorem 2.2.3. In both cases, the result is that there exists a vector $a \ne 0$ such that $a(x - y) \ge a \cdot 0 = 0$, $\forall x \in X$, $\forall y \in Y$. □
The same result holds with the assumption $0 \notin \operatorname{int}(X - Y)$, or also with the assumption $0 \notin \operatorname{relint}(X - Y)$, instead of $X \cap Y = \emptyset$.
Theorem 2.2.6 (Theorem of strong separation). If $X$, $Y$ are nonempty closed convex sets in $\mathbb{R}^n$, with $X \cap Y = \emptyset$, and at least one of them is bounded (i.e. is a convex compact set), then there exists a hyperplane that strongly separates $X$ and $Y$, i.e. there exists a vector $a \in \mathbb{R}^n$, $\|a\| = 1$, such that
$$\inf_{x \in X}\{ax\} > \sup_{y \in Y}\{ay\}.$$
Proof. By assumption the set $X - Y$ is convex; moreover, it is closed: indeed, consider a sequence $\{x^k - y^k\} \to z$, with $\{x^k\} \subset X$, $\{y^k\} \subset Y$; we prove that $z = x^0 - y^0$, with $x^0 \in X$, $y^0 \in Y$. If $X$ is compact, there exists a convergent subsequence $x^{k_i} \to x^0 \in X$; then also the subsequence $\{y^{k_i}\}$ is convergent and $y^{k_i} = x^{k_i} - (x^{k_i} - y^{k_i}) \to x^0 - z = y^0 \in Y$, as $Y$ is closed. Moreover, $0 \notin X - Y$. Apply then Theorem 2.2.3: there exists a nonzero vector $a \in \mathbb{R}^n$ such that $\inf_{z \in X - Y}\{az\} > a \cdot 0 = 0$. This implies that $\inf_{x \in X}\{ax\} > \sup_{y \in Y}\{ay\}$. □
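The proof of Theorem 2.2.6 becomes constructive when the projection onto $X - Y$ is available in closed form. For two axis-parallel boxes $X = [l^X, u^X]$ and $Y = [l^Y, u^Y]$ (an assumption made only to keep the example short), $X - Y$ is the box $[l^X - u^Y,\ u^X - l^Y]$, the point $p$ of $X - Y$ nearest to $0$ is obtained by componentwise clipping, and $a = p / \|p\|$ satisfies $\inf_{x \in X}\{ax\} - \sup_{y \in Y}\{ay\} = \|p\| > 0$. A NumPy sketch with an illustrative function name:

```python
import numpy as np


def separate_boxes(lx, ux, ly, uy):
    """Strongly separate the disjoint boxes X = [lx, ux] and Y = [ly, uy] (componentwise).

    X - Y is again a box; 0 is projected onto it and the normalized projection
    p / ||p|| gives a with inf_X ax - sup_Y ay = ||p|| > 0.
    Returns (a, inf_X ax, sup_Y ay).
    """
    lx, ux, ly, uy = (np.asarray(v, dtype=float) for v in (lx, ux, ly, uy))
    lo, hi = lx - uy, ux - ly                  # X - Y = [lx - uy, ux - ly]
    p = np.clip(0.0, lo, hi)                   # Euclidean projection of 0 onto the box X - Y
    dist = float(np.linalg.norm(p))
    if dist == 0.0:
        raise ValueError("0 lies in X - Y: the boxes intersect")
    a = p / dist
    # the minimum (maximum) of a linear function over a box is attained at a vertex
    inf_X = float(np.sum(np.where(a >= 0, a * lx, a * ux)))
    sup_Y = float(np.sum(np.where(a >= 0, a * uy, a * ly)))
    return a, inf_X, sup_Y
```

For general compact convex sets the same recipe needs a projection routine onto $X - Y$, which is usually computed iteratively.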
Corollary 2.2.1. If the nonempty convex set $X \subset \mathbb{R}^n$ does not intersect the nonnegative orthant $\mathbb{R}^n_+ = \{x \mid x \in \mathbb{R}^n,\ x \geqq 0\}$, then there exists a semipositive vector $a \geq 0$ (i.e. $a \geqq 0$, $a \ne 0$) such that $ax \le 0$, $\forall x \in X$, i.e. there exists a hyperplane separating $X$ and $\mathbb{R}^n_+$.
Proof. Apply Theorem 2.2.5 to the sets $X$ and $Y = \mathbb{R}^n_+$. □
The following theorem gives necessary and sufficient conditions for proper separation between two sets.
Theorem 2.2.7. The nonempty convex sets $X$ and $Y$ of $\mathbb{R}^n$ are properly separable if and only if $\operatorname{relint}(X) \cap \operatorname{relint}(Y) = \emptyset$.
Proof. See Rockafellar (1970), Theorem 11.3. □
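For finite sets $S_1$, $S_2$ and a given hyperplane $H = \{x \mid ax = \alpha\}$, the definitions of separation, proper, strict and strong separation hyperplane given earlier in this section can be checked directly. The sketch below (NumPy assumed; illustrative name, with a tolerance `tol` added for floating-point input) returns the strongest type. For finite, hence compact, sets a separating $H$ is a strong separation hyperplane exactly when no point of $S_1 \cup S_2$ lies on $H$, because then a sufficiently small ball can be added to both sets.

```python
import numpy as np


def separation_type(a, alpha, S1, S2, tol=1e-12):
    """Classify H = {x | ax = alpha} for finite point sets S1, S2 (one point per row).

    Returns the strongest of 'strong', 'strict', 'proper', 'separating' that H achieves,
    or None if H does not separate S1 and S2.
    """
    a = np.asarray(a, dtype=float)
    v1 = np.asarray(S1, dtype=float) @ a
    v2 = np.asarray(S2, dtype=float) @ a
    if v1.max() > alpha + tol or v2.min() < alpha - tol:
        v1, v2 = v2, v1                       # try S1 in H^>= and S2 in H^<= instead
    if v1.max() > alpha + tol or v2.min() < alpha - tol:
        return None
    on1 = np.abs(v1 - alpha) <= tol           # points of the first set lying on H
    on2 = np.abs(v2 - alpha) <= tol
    if not on1.any() and not on2.any():
        return "strong"                       # both sets strictly on their own side of H
    if not on1.any() or not on2.any():
        return "strict"                       # S1 ∩ H = ∅ or S2 ∩ H = ∅
    if not (on1.all() and on2.all()):
        return "proper"                       # S1 ∪ S2 not contained in H
    return "separating"
```

The last case, two disjoint points both lying on $H$, illustrates that an empty intersection does not guarantee proper separation by a given hyperplane.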
We note that the notion of separability also holds in infinite-dimensional spaces; consider, e.g., the following version of the Hahn-Banach separation theorem: let $X$ be a Banach space with disjoint nonempty convex subsets $A$ and $B$; if $A$ is open and $X^*$ is the space of all linear continuous functionals on $X$ (dual space), then there is an $x^* \in X^*$, $x^* \ne 0$, such that
$$\sup_{a \in A}\{x^*(a)\} \le \inf_{b \in B}\{x^*(b)\}.$$
See, e.g., Schaefer (1966). From the above theorems we can deduce several important results about the intersection of convex sets: Theorems 2.2.8 and 2.2.9 are due to Berge, whereas Theorem 2.2.10 is due to Helly (see, e.g., Berge and Ghouila-Houri (1965)).
Theorem 2.2.8. Let $X_1, X_2, \dots, X_m$ ($m \ge 2$) be nonempty closed convex sets of $\mathbb{R}^n$, with $X = \bigcup_{i=1}^m X_i$ a convex set. If the intersection of any $m - 1$ of the above sets is nonempty, then $\bigcap_{i=1}^m X_i$ is nonempty.
Proof. We can, without loss of generality, suppose that the sets $X_1, X_2, \dots, X_m$ are compact; otherwise it suffices to take points $a^1, a^2, \dots, a^m$ with $a^i \in \bigcap_{j \ne i} X_j$ and to write
$$A = \{a^1, \dots, a^m\} \quad \text{and} \quad X_i' = X_i \cap \operatorname{conv}(A), \quad i = 1, 2, \dots, m.$$
We shall therefore prove the theorem for compact convex sets $X_i$, by induction over $m$.
1) Let $m = 2$; let $X_1$ and $X_2$ be convex compact sets, with $X_1 \ne \emptyset$, $X_2 \ne \emptyset$ and $X_1 \cup X_2$ convex. If $X_1 \cap X_2 = \emptyset$, there would exist a hyperplane that strongly separates them. There would then be points of $X_1 \cup X_2$ on both sides of the hyperplane. Hence there would be points on this hyperplane, since $X_1 \cup X_2$ is convex. This cannot be, since the hyperplane intersects neither $X_1$ nor $X_2$. Therefore the property is true for $m = 2$.
2) Let us suppose that the theorem is true for $m = p$ ($p \ge 2$) and let us prove that it is true also for $m = p + 1$. Let $X_1, X_2, \dots, X_{p+1}$ be convex and compact and let $\bigcup_{i=1}^{p+1} X_i$ be convex. Let every intersection of $p$ of the above sets be nonempty. Put $X = \bigcap_{i=1}^{p} X_i$; by assumption $X \ne \emptyset$ and $X_{p+1} \ne \emptyset$: if the two sets are disjoint there exists a hyperplane $H$ that strongly separates them. Apply the induction hypothesis to the sets $X_i' = X_i \cap H$, $i = 1, 2, \dots, p$, convex, closed, with convex union given by
$$\bigcup_{i=1}^{p} X_i' = H \cap \Big(\bigcup_{i=1}^{p} X_i\Big) = H \cap \Big(\bigcup_{i=1}^{p+1} X_i\Big),$$
since $H \cap X_{p+1} = \emptyset$. By assumption, the intersection of any $p - 1$ of the sets $X_1, X_2, \dots, X_p$ contains $X$ and intersects $X_{p+1}$, and consequently intersects $H$. It follows that any $p - 1$ of the sets $X_1', X_2', \dots, X_p'$ have a nonempty intersection, and hence, from the induction hypothesis, their intersection is nonempty. Thus $\bigcap_{i=1}^{p} X_i' = X \cap H \ne \emptyset$, which is absurd. □
Theorem 2.2.9. Let $X$ be a nonempty convex set and $X_1, X_2, \dots, X_m$ ($m \ge 1$) be nonempty closed convex sets in $\mathbb{R}^n$. If $A_i = X \cap \big(\bigcap_{j \ne i} X_j\big) \ne \emptyset$, $i = 1, 2, \dots, m$, but $X \cap \big(\bigcap_{i=1}^m X_i\big) = \emptyset$, then $X \not\subset \bigcup_{i=1}^m X_i$.
Proof. The result is trivial for $m = 1$, as the intersection of any $m - 1$ of these convex sets is identical with the intersection of an empty family of sets, namely with $\mathbb{R}^n$. Then $A_1 = X \cap \mathbb{R}^n = X \ne \emptyset$, and if $X \cap X_1 = \emptyset$, clearly we cannot have $X \subset X_1$. In the case where $m \ge 2$, take points $a^i \in A_i$ and let $A = \{a^1, \dots, a^m\}$; we have $\operatorname{conv}(A) \subset X$. The sets $X_i' = \operatorname{conv}(A) \cap X_i$, $i = 1, 2, \dots, m$, are closed and convex; every $m - 1$ of them have a nonempty intersection, namely
$$a^i \in \bigcap_{j \ne i} X_j' = \operatorname{conv}(A) \cap \Big(\bigcap_{j \ne i} X_j\Big),$$
but their intersection is empty, otherwise $\operatorname{conv}(A)$, and hence $X$, would intersect $\bigcap_{i=1}^m X_i$. Their union $\bigcup_{i=1}^m X_i' = \operatorname{conv}(A) \cap \big(\bigcup_{i=1}^m X_i\big)$ is not convex (see Theorem 2.2.8) and therefore $\operatorname{conv}(A)$ … $n + 1$, be convex sets of $\mathbb{R}^n$. If the intersection of any $n + 1$ of these sets is nonempty, the intersection $\bigcap_{i=1}^m X_i$ is nonempty.
Proof. Let us suppose that the intersection of any $p$ ($n + 1 \le p < m$) of the sets $X_1, X_2, \dots, X_m$ is nonempty; we shall show that the intersection
Some particular convex sets. Convex cones
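of any $p + 1$ of these, for example $X_1, X_2, \dots, X_{p+1}$, is nonempty. Let $a^1, a^2, \dots, a^{p+1}$ be vectors such that
$$a^j \in \bigcap_{\substack{1 \le k \le p+1 \\ k \ne j}} X_k, \quad j = 1, 2, \dots, p + 1,$$
and let
$$A_i = \{a^j \mid j \ne i\}, \quad i = 1, 2, \dots, p + 1.$$
The closed convex sets $\operatorname{conv}(A_i) \subset X_i$ have a convex union, as $p + 1 > n + 1$ (see property v) of the convex hull of a set); we have $\bigcup_{i=1}^{p+1} \operatorname{conv}(A_i) = \operatorname{conv}(\{a^1, a^2, \dots, a^{p+1}\})$ and every $p$ of these sets have a nonempty intersection; indeed $a^i \in \bigcap_{j \ne i} \operatorname{conv}(A_j)$. Therefore, from Theorem 2.2.8, we have $\bigcap_{i=1}^{p+1} \operatorname{conv}(A_i) \ne \emptyset$. But $\operatorname{conv}(A_i) \subset X_i$, $i = 1, 2, \dots, p + 1$, and hence
$$\emptyset \ne \bigcap_{i=1}^{p+1} \operatorname{conv}(A_i) \subset \bigcap_{i=1}^{p+1} X_i. \qquad \square$$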
We note that, as a consequence of Helly's theorem, the assumptions of Theorem 2.2.9 are consistent only for $m \le n$ if $\bigcap_{i=1}^m X_i \ne \emptyset$.
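In $\mathbb{R}^1$ ($n = 1$) Helly's theorem says that closed intervals which intersect two at a time have a common point; here the proof is immediate, since the interval with the largest left endpoint and the one with the smallest right endpoint intersect, and $\max_i l_i$ then belongs to every interval. A pure-Python sketch with illustrative names, representing each interval as a pair `(lo, hi)`:

```python
from itertools import combinations


def common_point(intervals):
    """A point common to all closed intervals (lo, hi), or None if they have no common point."""
    lo = max(l for l, _ in intervals)
    hi = min(h for _, h in intervals)
    return lo if lo <= hi else None


def helly_1d(intervals):
    """Return (every pair intersects, all intersect) for closed intervals of R^1 (n + 1 = 2)."""
    pairwise = all(common_point([I, J]) is not None for I, J in combinations(intervals, 2))
    return pairwise, common_point(intervals) is not None
```

By the theorem, the first component of the result can be `True` only if the second is `True` as well.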
2.3. Some Particular Convex Sets. Convex Cones
The notion of extreme point of a convex set plays an important role in mathematical programming, especially in linear programming. Let $X$ be a convex set in $\mathbb{R}^n$; each point $x \in X$ for which there do not exist two distinct points $x^1, x^2 \in X$, different from $x$, such that $x \in (x^1, x^2)$ is called an extreme point of $X$, i.e. $x$ cannot be represented as a strict convex combination of two distinct points of $X$. In other words, if $x = \lambda x^1 + (1 - \lambda)x^2$, $\lambda \in (0, 1)$, $x^1, x^2 \in X$, then $x = x^1 = x^2$. Clearly the extreme points of a convex set $X$ are those points of $X$ that do not lie in the interior of any line segment connecting any other pair of points of $X$. A convex set $X \subset \mathbb{R}^n$ may have no extreme points (for example, the hyperplane $H = \{x \mid x \in \mathbb{R}^n,\ ax = \alpha\}$, $a \in \mathbb{R}^n$, $a \ne 0$, and any open ball $B(x^0, \varepsilon)$ have no extreme points; more generally, no open set has extreme points), a finite number of extreme points (for example, the set $A = \{x \mid x \in \mathbb{R}^n,\ x \geqq 0,\ ex = 1\}$, where $e = [1, 1, \dots, 1]$), or an infinite number of extreme points (for example, any closed ball $\bar{B}(x^0, \varepsilon)$ has an infinite number of extreme points, given by $\{x \mid x \in \mathbb{R}^n,\ \|x - x^0\| = \varepsilon\}$). An important result concerning extreme points is the following.
Theorem of Krein-Milman. A closed bounded convex set $S \subset \mathbb{R}^n$ is the convex hull of its extreme points.
A set in $\mathbb{R}^n$ which is given by the intersection of a finite number of closed halfspaces in $\mathbb{R}^n$ is called a (convex) polyhedron or polyhedral set. The term "convex" is in fact superfluous, as it follows from the convexity of the halfspaces that polyhedra are (closed) convex sets. A polyhedron can be identified by means of its algebraic representation which, on the ground of the definition, is given by the set $X = \{x \mid x \in \mathbb{R}^n,\ Ax \leqq b\}$, where $A$ is a real $(m, n)$ matrix and $b$ is a vector of $\mathbb{R}^m$. If a polyhedron is bounded, it is often called a polytope. Beware! Some authors call polyhedron what we have called polytope and vice versa (see, e.g., Mangasarian (1969), Martos (1975)). The definition we have adopted seems to be the standard one in books and papers specifically concerned with convex analysis and linear programming (see, e.g., Stoer and Witzgall (1970), Rockafellar (1970), Bazaraa, Jarvis and Sherali (1990), Gale (1960)). As a consequence of the Krein-Milman theorem, a nonempty polytope is the convex hull of its extreme points, i.e. if $X \subset \mathbb{R}^n$ is a polytope and $x^1, x^2, \dots, x^p$ are its extreme points (the number of extreme points of a nonempty polytope is always finite and nonzero), then
$$X = \Big\{x \mid x \in \mathbb{R}^n,\ x = \sum_{i=1}^{p} \lambda_i x^i,\ \sum_{i=1}^{p} \lambda_i = 1,\ \lambda_i \ge 0,\ i = 1, 2, \dots, p\Big\}.$$
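For a polytope given as the convex hull of finitely many points of $\mathbb{R}^2$, the extreme points can be extracted with Andrew's monotone chain algorithm (a standard computational-geometry method, not discussed in the text). The pure-Python sketch below, with an illustrative function name, keeps only strict turns, so points lying inside an edge, which are not extreme, are discarded; by the Krein-Milman theorem the convex hull of the returned points is the whole polytope.

```python
def extreme_points_2d(points):
    """Extreme points of conv(points) for a finite set of points of R^2.

    Andrew's monotone chain: collinear boundary points are dropped, since they lie
    inside a segment. The result is counter-clockwise, starting from the
    lexicographically smallest point.
    """
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

For the square with vertices $(0,0)$, $(2,0)$, $(2,2)$, $(0,2)$, adding the centre $(1,1)$ and the edge midpoint $(1,0)$ does not change the list of extreme points.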
Let $x^0, x^1, \dots, x^m$ be $m + 1$ distinct vectors of $\mathbb{R}^n$, with $m \le n$. If the vectors $x^1 - x^0, \dots, x^m - x^0$ are linearly independent, then the convex hull of $x^0, x^1, \dots, x^m$ is called an $m$-simplex in $\mathbb{R}^n$ with extreme points (also called vertices) $x^0, x^1, \dots, x^m$:
$$\Big\{x \mid x = \sum_{i=0}^{m} \lambda_i x^i,\ \sum_{i=0}^{m} \lambda_i = 1,\ \lambda_i \ge 0,\ i = 0, 1, \dots, m\Big\}.$$
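Because the vectors $x^1 - x^0, \dots, x^m - x^0$ are linearly independent, the weights $\lambda_0, \dots, \lambda_m$ representing a point $x$ of the affine hull of the vertices are unique and can be found by solving a linear system; $x$ belongs to the simplex exactly when all of them are nonnegative. A NumPy sketch with an illustrative name; least squares is used so that $m < n$ is allowed, together with an explicit check that $x$ lies in the affine hull.

```python
import numpy as np


def barycentric(vertices, x, tol=1e-10):
    """Weights (lam_0, ..., lam_m) with sum(lam) = 1 and sum(lam_i x^i) = x.

    vertices holds x^0, ..., x^m as rows. Solves sum_{i>=1} mu_i (x^i - x^0) = x - x^0
    and sets lam_0 = 1 - sum(mu).
    """
    V = np.asarray(vertices, dtype=float)
    D = (V[1:] - V[0]).T                       # columns x^i - x^0, i = 1, ..., m
    if np.linalg.matrix_rank(D) != D.shape[1]:
        raise ValueError("x^1 - x^0, ..., x^m - x^0 must be linearly independent")
    rhs = np.asarray(x, dtype=float) - V[0]
    mu, *_ = np.linalg.lstsq(D, rhs, rcond=None)
    if not np.allclose(D @ mu, rhs, atol=tol):
        raise ValueError("x does not lie in the affine hull of the vertices")
    return np.concatenate(([1.0 - mu.sum()], mu))
```

For the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$, the point $(0.25, 0.25)$ has weights $(0.5, 0.25, 0.25)$ and lies in the 2-simplex, while $(1, 1)$ has weights $(-1, 1, 1)$ and lies outside it.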
The numbers $\lambda_0, \dots, \lambda_m$ are called the barycentric coordinates of $x$; note that, since in $\mathbb{R}^n$ the maximum number of linearly independent vectors is $n$, there can be no simplex in $\mathbb{R}^n$ with more than $n + 1$ vertices. A 0-simplex is a point; a 1-simplex is a closed line segment; a 2-simplex is a triangle; a 3-simplex is a tetrahedron.
We now briefly discuss the notion of cone and especially of convex cone; indeed convex cones are of utmost importance in several questions of optimization theory.
Definition 2.3.1. A nonempty set $K \subset \mathbb{R}^n$ is called a cone with vertex at $\bar{x} \in \mathbb{R}^n$ if $\bar{x} + \alpha(x - \bar{x}) \in K$, $\forall x \in K$, $\forall \alpha > 0$. In the special case where $\bar{x} = 0$, the cone with vertex at zero is called simply a cone (i.e. $\alpha x \in K$, $\forall x \in K$, $\forall \alpha > 0$). Such a set is a union of halflines emanating from the origin. From now on, unless otherwise specified, the vertex of the cones considered is the origin. From the previous definition it appears that the vertex of a cone may or may not belong to the cone (however, it belongs to the closure of the cone). Many authors, however, do include the vertex in the cone by letting $\alpha \ge 0$ in the definition. Note that our definition implies that the interior of a cone is again a cone. Anyhow, we shall specify the cases where the origin is required to belong to the cone (when it is not clear from the context). Again, it may be worth noting that a cone $K$ may or may not be convex and that a cone $K$ may be open, closed, or neither open nor closed. If in addition the cone $K$ is convex, then it is called a convex cone. A cone $K$ is said to be pointed if, whenever $x \ne 0$ is in the cone, $-x$ is not in the cone, i.e., in case $K$ contains the origin, if $K \cap (-K) = \{0\}$. The following result is often used to characterize convex cones and is a rather immediate consequence of the above definition.
Theorem 2.3.1. $K \subset \mathbb{R}^n$ is a convex cone if and only if:
a) $\alpha x \in K$, $\forall x \in K$, $\forall \alpha > 0$;
b) $x^1 + x^2 \in K$, $\forall x^1, x^2 \in K$.
Proof. Suppose that $K$ is a convex cone; then $x^1, x^2 \in K$ implies that $\lambda x^1 + (1 - \lambda)x^2 \in K$, $\forall \lambda \in (0, 1)$. Letting $\lambda = \frac{1}{2}$, we get $\frac{1}{2}x^1 + \frac{1}{2}x^2 \in K$ and hence $x^1 + x^2 \in K$. Conversely, assume a) and b); if $x^1, x^2 \in K$, then from a) we get $\lambda x^1 \in K$ and $(1 - \lambda)x^2 \in K$ for each $\lambda \in (0, 1)$. From b) it follows that also $\lambda x^1 + (1 - \lambda)x^2 \in K$ and hence $K$ is a convex cone. □
Some examples of convex cones are:
i) Hyperplanes through a point $x^0$, i.e. $H = \{x \mid a(x - x^0) = 0\}$, where $a \in \mathbb{R}^n$, $a \ne 0$. Here the cone has vertex at $x^0$.
ii) Closed halfspaces, for example $H^{\ge} = \{x \mid a(x - x^0) \ge 0\}$. Here again the cone has vertex at $x^0$.
iii) The set $C = \{x \mid Ax \leqq 0\}$, where $A$ is a real $m, n$ matrix.
It results that if $K_1$, $K_2$ are convex cones, then $K_1 \cap K_2$ and $K_1 + K_2$ are again convex cones; if $K$ is a cone, $\operatorname{conv}(K)$ is a convex cone. If $K_1$ and $K_2$ are convex cones, then $K_1 + K_2 = \operatorname{conv}(K_1 \cup K_2)$. We note that, given a set $S \subset \mathbb{R}^n$, we can associate with it a cone, called the cone generated (or spanned) by $S$, or conical hull of $S$, or projection cone of $S$, defined as:
$$K(S) = \operatorname{cone}(S) = \{x \mid x = \lambda y,\ \lambda > 0,\ y \in S\}.$$
It is easy to see that $\operatorname{cone}(S)$ is given by the intersection of all cones containing $S$. The convex conical (or positive) hull of $S$, or convex cone generated (or spanned) by $S$, is defined as:
$$C(S) = \Big\{x \mid x = \sum_{i=1}^{k} \lambda_i x^i,\ k \in \mathbb{N},\ x^i \in S,\ \lambda_i > 0,\ i = 1, \dots, k\Big\}.$$
Evidently it is $\operatorname{cone}(S) \subset C(S)$ and $\{0\} \cup C(S) = \big\{\sum_i \lambda_i x^i \mid x^i \in S,\ \lambda_i \ge 0\big\}$.
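It is easy to see that $C(S)$ is given by the intersection of all convex cones containing $S$ (note the parallelism with the definition of the convex hull of a set).
Theorem 2.3.2. i) Given $S \subset \mathbb{R}^n$, it results $C(S) = \operatorname{cone}(\operatorname{conv}(S)) = \operatorname{conv}(\operatorname{cone}(S))$. Therefore, if $S$ is a convex set, it is $C(S) = \operatorname{cone}(S)$. If $S$ is a cone, it is $C(S) = \operatorname{conv}(S)$.
ii) If $S \subset \mathbb{R}^n$ is a convex compact set, with $0 \notin S$, then $\{0\} \cup \operatorname{cone}(S)$ is closed.
Proof. i) By definition it is $\operatorname{conv}(S) \subset C(S)$ and $\operatorname{cone}(\operatorname{conv}(S)) \subset C(S)$, as $C(S)$ is a cone. Analogously, we have $\operatorname{cone}(S) \subset C(S)$ and hence $\operatorname{conv}(\operatorname{cone}(S)) \subset C(S)$, as $C(S)$ is a convex set. For the reverse inclusions, let $x \in C(S)$; then $x = \sum_{i=1}^{k} \lambda_i x^i$, with $\lambda_i > 0$ and $x^i \in S$. We set $\mu = \sum_i \lambda_i > 0$ and $\mu_i = \lambda_i / \sum_j \lambda_j = \lambda_i / \mu > 0$. Of course we have $\sum_i \mu_i = 1$ and we get:
$$x = \mu \Big(\sum_i \mu_i x^i\Big) \in \operatorname{cone}(\operatorname{conv}(S)); \qquad x = \sum_i \mu_i (\mu x^i) \in \operatorname{conv}(\operatorname{cone}(S)).$$
The other assertions of i) are straightforward.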
ii) Let $\{y^k\} \subset \operatorname{cone}(S)$ be a sequence with $y^k \to y^0$; we have to show that $y^0 \in \{0\} \cup \operatorname{cone}(S)$. By definition there exist sequences $\{\lambda_k\} \subset \mathbb{R}_+$ and $\{x^k\} \subset S$ with $y^k = \lambda_k x^k$, $\forall k$. Because of the compactness of $S$ and $0 \notin S$, it is $\|x^k\| \ge \alpha > 0$, $\forall k$. Therefore the sequence $\{\lambda_k\}$ is bounded; otherwise, in case $\lambda_k \to +\infty$, we would get $\|y^k\| = \lambda_k \|x^k\| \ge \lambda_k \alpha \to +\infty$, in contradiction to the convergence of $\{y^k\}$. Again because of the compactness of $S$ and the boundedness of $\{\lambda_k\}$ we can assume (without loss of generality) that both sequences are convergent, i.e. $x^k \to x^0 \in S$, $\lambda_k \to \lambda_0 \ge 0$. Now, of course, it is
$$y^k = \lambda_k x^k \to \lambda_0 x^0 \in \{0\} \cup \operatorname{cone}(S). \qquad \square$$
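Summing up: a set $X \subset \mathbb{R}^n$ is (1) a linear subspace, (2) an affine set, (3) a convex set, (4) a convex cone, if for any $x^1, x^2 \in X$ also $\lambda_1 x^1 + \lambda_2 x^2 \in X$ for every
(1) $\lambda_1, \lambda_2 \in \mathbb{R}$;
(2) $\lambda_1, \lambda_2 \in \mathbb{R}$, $\lambda_1 + \lambda_2 = 1$;
(3) $\lambda_1, \lambda_2 \in \mathbb{R}_+$, $\lambda_1 + \lambda_2 = 1$;
(4) $\lambda_1, \lambda_2 \in \mathbb{R}_+$.
Moreover, $X \subset \mathbb{R}^n$ is respectively (1), (2), (3), (4) if for any $x^1, \dots, x^m \in X$ also $\sum_{i=1}^{m} \lambda_i x^i \in X$ for every
(1) $\lambda_i \in \mathbb{R}$;
(2) $\lambda_i \in \mathbb{R}$, $\sum_i \lambda_i = 1$;
(3) $\lambda_i \in \mathbb{R}_+$, $\sum_i \lambda_i = 1$;
(4) $\lambda_i \in \mathbb{R}_+$.
The linear hull of $X$ (1'), denoted by $\operatorname{span}(X)$, the affine hull of $X$ (2'), denoted by $\operatorname{aff}(X)$, the convex hull of $X$ (3'), denoted by $\operatorname{conv}(X)$, and the convex conical hull of $X$ (4'), denoted by $C(X)$, are respectively:
- the smallest linear subspace containing $X$ (1');
- the smallest affine set containing $X$ (2');
- the smallest convex set containing $X$ (3');
- the smallest convex cone containing $X$ (4').
Moreover,
$$\operatorname{span}(X) = \Big\{x \mid x = \sum_i \lambda_i x^i,\ \lambda_i \in \mathbb{R},\ x^i \in X\Big\};$$
$$\operatorname{aff}(X) = \Big\{x \mid x = \sum_i \lambda_i x^i,\ \lambda_i \in \mathbb{R},\ \sum_i \lambda_i = 1,\ x^i \in X\Big\};$$
$$\operatorname{conv}(X) = \Big\{x \mid x = \sum_i \lambda_i x^i,\ \lambda_i \ge 0,\ \sum_i \lambda_i = 1,\ x^i \in X\Big\};$$
$$C(X) = \Big\{x \mid x = \sum_i \lambda_i x^i,\ \lambda_i > 0,\ x^i \in X\Big\}.$$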
Now let $S$ be an arbitrary set in $\mathbb{R}^n$; the (negative) polar cone of $S$, denoted by $S^*$, is the set $S^* = \{y \mid y \in \mathbb{R}^n,\ yx \le 0,\ \forall x \in S\}$. If $S$ is empty we will interpret $S^*$ as the whole space $\mathbb{R}^n$. It is immediate to note that the polar cone of a set contains the origin. Some of the following results are a direct consequence of the definition of polar cone; see Bazaraa and Shetty (1976), Ben-Israel (1969), Fenchel (1953) for the less obvious proofs. Let $S$, $S_1$ and $S_2$ be nonempty sets in $\mathbb{R}^n$; then it holds:
i) $S^*$ is a closed convex cone with vertex at the origin; therefore this cone is called the polar cone (sometimes also the dual cone) of $S$.
ii) $S^* = (\bar{S})^* = (\operatorname{conv}(S))^* = (\operatorname{cl}(\operatorname{conv}(S)))^* = (\operatorname{cone}(S))^* = (\operatorname{cl}(\operatorname{cone}(S)))^* = (C(S))^* = (\operatorname{cl}(C(S)))^*$.
iii) $S_1 \subset S_2$ implies $S_2^* \subset S_1^*$.
iv) $S \subset S^{**}$, where $S^{**} = (S^*)^*$.
v) $S^* = S^{***}$, where $S^{***} = (S^{**})^*$.
vi) $S_1^* \cup S_2^* \subset (S_1 \cap S_2)^*$.
vii) $S_1^* \cap S_2^* = (S_1 \cup S_2)^*$.
viii) $S_1^* \cap S_2^* \subset (S_1 + S_2)^*$; $(S_1 + S_2)^* \subset S_1^* \cap S_2^*$ if $0 \in S_1 \cap S_2$. Therefore, if $0 \in S_1 \cap S_2$, it is $(S_1 + S_2)^* = S_1^* \cap S_2^*$. Indeed, if $0 \in S_1 \cap S_2$ then $C(S_1 \cup S_2) = C(S_1 + S_2)$; hence, by ii) and vii),
$$(S_1 + S_2)^* = (C(S_1 + S_2))^* = (C(S_1 \cup S_2))^* = (S_1 \cup S_2)^* = S_1^* \cap S_2^*.$$
ix) $S_1^* + S_2^* = C(S_1^* \cup S_2^*) = \operatorname{conv}(S_1^* \cup S_2^*) \subset (S_1 \cap S_2)^*$.
E.g., in $\mathbb{R}^2$ the polar of a convex cone $K$ is seen to consist of all vectors making a non-acute angle with all vectors of the cone $K$ (see Figure 2).
Figure 2.
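For a finite set $S$ the definition of $S^*$ is a finite list of linear inequalities, and by property ii) the same test describes the polar of the convex cone $C(S)$. The sketch below assumes NumPy; the names are illustrative. With $S = \{(1, 0), (0, 1)\}$, $C(S)$ is the nonnegative quadrant (without the origin) and its polar is the nonpositive quadrant, i.e. the set of vectors making a non-acute angle with both $(1, 0)$ and $(0, 1)$, which `angle_deg` lets one inspect directly.

```python
import numpy as np


def in_polar(y, S, tol=1e-12):
    """Test y in S* = {y | yx <= 0 for all x in S} for a finite set S (one point per row).

    By property ii), S* = (C(S))*, so this also decides membership in the polar of
    the convex cone generated by S.
    """
    return bool(np.all(np.asarray(S, dtype=float) @ np.asarray(y, dtype=float) <= tol))


def angle_deg(u, v):
    """Angle between the nonzero vectors u and v, in degrees."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```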
We will now focus further attention on convex cones and will prove an important result for closed convex cones. This result is also known as the polarity property or duality property for closed convex cones.
Theorem 2.3.3. Let $K$ be a nonempty convex cone in $\mathbb{R}^n$; then $K^{**} = \bar{K}$.
Proof. Let $x \in \bar{K}$; then $ax \le 0$, $\forall a \in K^*$ (note that $K^* = (\bar{K})^*$), and hence $x \in K^{**}$. To prove that $K^{**} \subset \bar{K}$, let $x \in K^{**}$ and suppose that $x \notin \bar{K}$. By Theorem 2.2.3 there exists a nonzero vector $a$ such that $ay \le \alpha$, $\forall y \in \bar{K}$, and $ax > \alpha$ for some $\alpha$. But since $y = 0 \in \bar{K}$, then $\alpha \ge 0$ and so $ax > 0$. We will show that this is impossible by showing that $a \in K^*$ (note that $x \in K^{**}$ by hypothesis). Suppose by contradiction that $a \notin K^*$; then there exists a vector $y \in K$ with $ay > 0$. But then $a \cdot \lambda y$ can be made arbitrarily large by choosing $\lambda$ sufficiently large, which violates the fact that $a \cdot \lambda y \le \alpha$ for each $\lambda > 0$. This completes the proof. □
Corollary 2.3.1. The set $K \subset \mathbb{R}^n$, $K \ne \emptyset$, is a closed convex cone if and only if $K = K^{**}$ (polarity property).
As a consequence of the previous theorems and properties we get the following other useful results.
a) If $K \subset \mathbb{R}^n$, $K \ne \emptyset$, is any set, then $K^{**}$ is the closed convex conical hull of $K$ ($K^{**}$ is also called the bipolar cone of $K$).
b) If $K_1$ and $K_2$ are nonempty closed convex cones in $\mathbb{R}^n$ we have the following modularity properties:
$$K_1^* \cap K_2^* = (K_1 + K_2)^*; \qquad (K_1 \cap K_2)^* = \overline{K_1^* + K_2^*}.$$
The first one is a direct consequence of the previous property viii). To prove the second one, note that it results
$$(K_1 \cap K_2)^* = (K_1^{**} \cap K_2^{**})^* = (K_1^* \cup K_2^*)^{**} = \operatorname{cl}(\operatorname{conv}(K_1^* \cup K_2^*)) = \overline{K_1^* + K_2^*}.$$
c) If $K_1$, $K_2$ are closed convex cones with $\operatorname{int}(K_1) \cap K_2 \ne \emptyset$, then
$$K_1^* + K_2^* = (K_1 \cap K_2)^*.$$
Indeed, let $a \in (K_1 \cap K_2)^*$ and
$$X = \{(x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid x \in K_1,\ \alpha \le ax\}; \qquad Y = \{(y, \beta) \in \mathbb{R}^n \times \mathbb{R} \mid y \in K_2,\ \beta \ge 0\}.$$
Then $\operatorname{int}(X) \cap Y = \emptyset$ and by the separation theorem we can therefore find a vector $(u, \xi) \in \mathbb{R}^n \times \mathbb{R}$, $(u, \xi) \ne (0, 0)$, such that
$$ux + \alpha\xi \le uy + \beta\xi, \quad \forall (x, \alpha) \in X,\ \forall (y, \beta) \in Y.$$
Obviously, it is $\xi \ge 0$; assuming $\xi = 0$ we get $ux \le uy$, $\forall x \in K_1$, $\forall y \in K_2$, which is impossible since $\operatorname{int}(K_1) \cap K_2 \ne \emptyset$ and $u \ne 0$. Without loss of generality we can thus set … Hence $Ax \leqq 0$ for all $x \in C$. □
From this theorem it follows that every polyhedral cone is convex and closed (therefore it contains the origin); moreover, the following fundamental result of Weyl can be proved (see Gale (1951)): every convex cone generated by a finite number of points is a polyhedral cone and vice versa, i.e. the polyhedral cone $C$ can be described by the set
$$C = \Big\{x \mid x = \sum_{i=1}^{k} \lambda_i x^i,\ \lambda_i \ge 0,\ i = 1, 2, \dots, k,\ k \in \mathbb{N}\Big\}.$$
Therefore the polyhedral cone $C$ is the convex cone generated by a finite set $X = \{x^1, x^2, \dots, x^k\}$. For instance, $\mathbb{R}^n_+$ is the convex polyhedral cone generated by $(e^1, e^2, \dots, e^n)$, where $e^i$ is the $i$-th unit vector of $\mathbb{R}^n$. Stated differently, a polyhedral convex cone is given by the set
$$C = \{x \mid x = By,\ y \geqq 0\},$$
where $B$ is a given real matrix. Polyhedral cones have been extensively treated, e.g., by Gale (1951, 1960); for the proof of the theorem of Weyl, see also Stoer and Witzgall (1970). See the same authors also for the following
interesting result: the solution set of a linear inequality system $Ax \leqq b$ (i.e. a polyhedron) is given by the sum of a polytope and a polyhedral cone (this result appears originally in Goldman (1956)). The following results hold with respect to polyhedral cones (recall Corollary 2.3.1):
i) If $K$ is a polyhedral cone, then $K = K^{**}$.
ii) The set $K = \{x \mid Ax = 0,\ x \geqq 0\}$, where $A$ is a given real matrix, is a polyhedral cone.
iii) If $K_1$ and $K_2$ are polyhedral cones, then: $K_1 + K_2$ is a polyhedral cone; $K_1^*$ and $K_2^*$ are polyhedral cones; $K_1 \cap K_2$ is a polyhedral cone; $(K_1 \cap K_2)^* = K_1^* + K_2^*$ and hence …
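Under the simplifying assumption that $B$ is square and nonsingular (for a general $B$ one would instead solve a linear program), membership in $C = \{x \mid x = By,\ y \geqq 0\}$ reduces to checking the sign of $B^{-1}x$, and the same computation exhibits $C$ as $\{x \mid Ax \leqq 0\}$ with $A = -B^{-1}$, a small instance of the equivalence in Weyl's theorem. A NumPy sketch with illustrative names:

```python
import numpy as np


def cone_membership(B, x, tol=1e-12):
    """Test x in C = {By | y ≧ 0} for a square nonsingular B (simplifying assumption).

    Then x = By forces y = B^{-1} x, so x is in C iff B^{-1} x ≧ 0. Returns (membership, y).
    """
    B = np.asarray(B, dtype=float)
    y = np.linalg.solve(B, np.asarray(x, dtype=float))
    return bool(np.all(y >= -tol)), y


def inequality_form(B):
    """Matrix A with {By | y ≧ 0} = {x | Ax ≦ 0}, namely A = -B^{-1}, for nonsingular B."""
    return -np.linalg.inv(np.asarray(B, dtype=float))
```

With $B$ having columns $(1, 0)$ and $(1, 1)$, the vector $(2, 1) = 1\cdot(1,0) + 1\cdot(1,1)$ lies in $C$, while $(0, 1) = -1\cdot(1,0) + 1\cdot(1,1)$ does not.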
We have mentioned the notion of cone and of convex cone generated by an arbitrary set $S \subset \mathbb{R}^n$ ("conical hull" and "convex conical hull" of that set). If we make a suitable translation of the set $S$ we obtain the following two cones.
I) Let $S \subset \mathbb{R}^n$ and $x^0 \in \mathbb{R}^n$; the (projection) cone of $S$ at $x^0$, denoted by $K(S, x^0)$, is the cone generated by $S - x^0$, i.e. the set
$$\operatorname{cone}(S - x^0) = \{z \mid z = \lambda(x - x^0),\ x \in S,\ \lambda > 0\}.$$
$K(S, x^0)$ is therefore given by the intersection of all cones containing the set $S - x^0$; moreover, it is clear that, by changing the point $x^0$, we can define different cones of this type. From the above definition it is also clear that if $x^0$ is an interior point of $S$, then $K(S, x^0)$ is the whole space $\mathbb{R}^n$. This follows from the fact that $S - x^0$ has $0$ as an interior point and any cone with vertex zero containing $S - x^0$ is $\mathbb{R}^n$.
Theorems of the alternative for linear systems
II) Let $S \subset \mathbb R^n$ and $x^0 \in \mathbb R^n$; the convex cone of S at $x^0$, denoted by $C(S, x^0)$, is the convex cone generated by $S - x^0$, i.e. the set we have also denoted by $C(S - x^0)$. In other words, $C(S, x^0)$ is given by the intersection of all convex cones containing the set $S - x^0$. From Theorem 2.3.2 we have that if S is convex, then $C(S, x^0) = K(S, x^0)$. A vector $y \in \mathbb R^n$ is said to be normal to a convex set S at a point $x^0 \in S$ when $y \cdot (x - x^0) \leqq 0$ for all $x \in S$. The set of normal vectors to the convex set S at a point $x^0 \in S$ is called the normal cone to S at $x^0$ and is denoted by $N(S, x^0)$. Therefore
$$N(S, x^0) = \{y \in \mathbb R^n \mid y \cdot (x - x^0) \leqq 0,\ \forall x \in S\}.$$
The reader can easily verify that this cone is always convex and closed; moreover, from the definition it is
$$N(S, x^0) = (S - x^0)^* = (K(S, x^0))^* = (C(S, x^0))^*.$$
Other cones associated with a set will be discussed in Chapter III; these cones are particularly important in obtaining general optimality conditions for smooth and nonsmooth optimization problems.
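For a polytope the normal-cone condition only has to be checked at finitely many points, because a linear function attains its maximum over the convex hull of finitely many points at one of those points. A minimal Python sketch of this test, assuming S is given as the convex hull of a list of vertices and $x^0 \in S$; the helper names `dot` and `in_normal_cone` are hypothetical, not notation from the text:

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product a . b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def in_normal_cone(y: Vector, x0: Vector, vertices: Sequence[Vector],
                   tol: float = 1e-12) -> bool:
    """Test whether y belongs to N(S, x0) for S = conv(vertices), x0 in S.

    The condition y . (x - x0) <= 0 for all x in S reduces to the vertices,
    since a linear function is maximized over a polytope at a vertex.
    """
    return all(dot(y, [vi - xi for vi, xi in zip(v, x0)]) <= tol
               for v in vertices)


# S = [0, 1]^2, the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(in_normal_cone((1.0, 2.0), (1.0, 1.0), square))   # True: outward at a corner
print(in_normal_cone((1.0, -1.0), (1.0, 1.0), square))  # False
print(in_normal_cone((0.0, 0.0), (0.5, 0.5), square))   # True: only 0 at interior points
```

At an interior point only $y = 0$ passes, in agreement with $K(S, x^0) = \mathbb R^n$ and hence $N(S, x^0) = (\mathbb R^n)^* = \{0\}$.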
2.4. Theorems of the Alternative for Linear Systems
In this section we shall be concerned with a series of theorems related to the occurrence of one of two mutually exclusive events: the existence of solutions of two systems of linear relations. These theorems, known as theorems of the alternative (for linear systems), are numerous, but we shall prove that all known theorems of this kind can be deduced from a unique generator: the so-called theorem (or lemma) of Farkas-Minkowski. The general form of a theorem of the alternative (in the present case, for linear systems) is the following: given two linear systems (of inequalities and/or equalities) S and $S^*$ ($S^*$ may be called "the dual of S"), S admits a solution if and only if $S^*$ is impossible. In other words: either S or $S^*$ admits a solution, but never both, and never neither. If we denote by $\exists S$ ($\exists S^*$) the existence of solutions of S (of $S^*$) and by $\nexists S$ ($\nexists S^*$) the impossibility of S (of $S^*$), a typical theorem of the
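alternative can thus be stated as follows:
$$\exists S \iff \nexists S^* \quad\text{or, equivalently,}\quad \nexists S \iff \exists S^*.$$
It follows that a typical proof of such a theorem consists in showing that
$$\exists S \Rightarrow \nexists S^* \quad\text{and}\quad \nexists S \Rightarrow \exists S^*;$$
or, equivalently, $\exists S^* \Rightarrow \nexists S$ and $\nexists S^* \Rightarrow \exists S$.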
Theorem 2.4.1 (Theorem of Farkas-Minkowski). For each given matrix A of order (m, n) and each given vector $b \in \mathbb R^m$, either the system
$$S_1:\ Ax = b,\quad x \geqq 0$$
has a solution $x \in \mathbb R^n$, or the system
$$S_1^*:\ yA \geqq 0,$$
yb 0 or y^ > 0... or ...yP > 0
or E^ > 0 or
E"^ > 0... or ...E^ > 0 . The proof of this theorem will be performed by means of the following three theorems of the alternative described in 8), 9), 10), all obtained from the first series of results.
8) $S_9:\ Ax + Bv + Cz = 0,\ v \geqq 0,\ z > 0$; $\quad S_9^*:\ yA = 0,\ yB \geqq 0,\ yC \geq 0$.
Proof. If $S_9^*$ is impossible, the system
$$\tilde S_9^*:\ yA = 0,\ yB \geqq 0,\ yC \geqq 0,\ yCu > 0,$$
where $u = [1, 1, \dots, 1]^T$, is impossible too, and conversely. But if $\tilde S_9^*$ is impossible, by means of the result described in 6), the system
$$\tilde S_9:\ Ax + Bv + Cz = -Cu,\quad v \geqq 0,\quad z \geqq 0$$
is possible. Hence $S_9$ is possible as well, as soon as we put $\bar z = z + u > 0$. It is then easy to obtain the converse result: if $S_9$ admits a solution, then $S_9^*$ is impossible. □
9) $S_{10}:\ A_1x^1 + A_2x^2 + A_3x^3 + \sum_{j=4}^{q} A_jx^j = 0;\ x^2 \geqq 0;\ x^3 > 0;\ x^j \geq 0,\ j = 4, \dots, q.$
(Here $A_j$ denotes the j-th matrix, not the j-th row of A.)
$S_{10}^*:\ yA_1 = 0;\ yA_j \geqq 0,\ j = 2, \dots, q;\ yA_3 \geq 0$ or $yA_j > 0$ for at least one j, $4 \leqq j \leqq q$.
Proof. Note first that $x^j \geq 0$ is equivalent to $\{x^j \geqq 0,\ u^jx^j - w_j = 0,\ w_j > 0\}$, where $w_j \in \mathbb R$ and $u^j = [1, 1, \dots, 1]$ has, for each $j = 4, \dots, q$, the same dimension as $x^j$. Then system $S_{10}$ may be rewritten in the equivalent form
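$$\begin{bmatrix} A_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} x^1 + \begin{bmatrix} A_2 & A_4 & \cdots & A_q \\ 0 & u^4 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u^q \end{bmatrix} \begin{bmatrix} x^2 \\ x^4 \\ \vdots \\ x^q \end{bmatrix} + \begin{bmatrix} A_3 & 0 & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{bmatrix} \begin{bmatrix} x^3 \\ w_4 \\ \vdots \\ w_q \end{bmatrix} = 0;$$
$$(x^2, x^4, \dots, x^q)^T \geqq 0; \qquad (x^3, w)^T > 0.$$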
If the said system is impossible, then, by means of the result described in 8), we can affirm the existence of vectors y and $v = [v_j]$, $j = 4, \dots, q$, such that:
a) $yA_1 = 0$;
b) $yA_2 \geqq 0$;
c) $yA_j + u^jv_j \geqq 0$, $j = 4, \dots, q$;
d) $[yA_3;\ -v] \geq 0$.
From d) we have $yA_3 \geqq 0$ and $v \leqq 0$, and from c) it results $yA_j \geqq -u^jv_j \geqq 0$. Moreover, if $v = 0$, then $yA_3 \geq 0$, but if there exists a $j_0$ such that $v_{j_0} < 0$, then $yA_{j_0} \geqq -u^{j_0}v_{j_0} > 0$. Again by means of the result in 8) we can assert that if $S_{10}$ admits a solution, then $S_{10}^*$ is impossible. □
10) $S_{11}:\ A_1x^1 + A_2x^2 + A_3x^3 + \sum_{j=4}^{q} A_jx^j = b;\ x^2 \geqq 0;\ x^3 > 0;\ x^j \geq 0,\ j = 4, \dots, q.$ (Again $A_k$ is the k-th matrix.)
$S_{11}^*:\ yA_1 = 0;\ yA_j \geqq 0,\ j = 2, \dots, q;\ yb \leqq 0$, and $yb < 0$ or $yA_3 \geq 0$ or $yA_j > 0$ for at least one j, $4 \leqq j \leqq q$.
Proof. $S_{11}$ may be rewritten in the equivalent form:
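$$A_1x^1 + A_2x^2 + [A_3 \mid -b]\begin{bmatrix} x^3 \\ t \end{bmatrix} + \sum_{j=4}^{q} A_jx^j = 0;$$
$x^2 \geqq 0$; $[x^3 \mid t]^T > 0$; $x^j \geq 0$, $j = 4, \dots, q$, where t is an additional scalar variable. The present result is obtained at once by applying to this form the previous theorem of the alternative.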
□
Proof of Theorem 2.4.3. Let us introduce in the inequalities appearing in system (3) the following "slack" vectors: $w^2 \geqq 0$, $w^3 > 0$, $w^i \geq 0$, $i = 4, \dots, p$. Then (3) may be rewritten in the form:
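$$\begin{bmatrix} A_{11} \\ A_{21} \\ A_{31} \\ \vdots \\ A_{p1} \end{bmatrix} x^1 + \begin{bmatrix} A_{12} & 0 \\ A_{22} & I \\ A_{32} & 0 \\ \vdots & \vdots \\ A_{p2} & 0 \end{bmatrix} \begin{bmatrix} x^2 \\ w^2 \end{bmatrix} + \begin{bmatrix} A_{13} & 0 \\ A_{23} & 0 \\ A_{33} & I \\ \vdots & \vdots \\ A_{p3} & 0 \end{bmatrix} \begin{bmatrix} x^3 \\ w^3 \end{bmatrix} + \sum_{j=4}^{q} \begin{bmatrix} A_{1j} \\ A_{2j} \\ A_{3j} \\ \vdots \\ A_{pj} \end{bmatrix} x^j + \sum_{i=4}^{p} E_i w^i = b,$$
where $E_i$ is the block column with an identity matrix in the i-th block row and zero blocks elsewhere, with $x^1$ arbitrary, $[x^2 \mid w^2]^T \geqq 0$, $[x^3 \mid w^3]^T > 0$, $x^j \geq 0$, $j = 4, \dots, q$; $w^i \geq 0$, $i = 4, \dots, p$. Then apply the result described in 10) in order to obtain that (3) is possible if and only if (4) is impossible. □
From Theorem 2.4.3 we obtain a second, large series of theorems of the alternative; here we list only those most quoted in the literature.
11) (Theorem of the alternative of Gordan.) $S_{12}:\ Ax = 0,\ x \geq 0$; $\quad S_{12}^*:\ yA > 0$.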
12) (Theorem of the alternative of Stiemke.) $S_{13}:\ Ax = 0,\ x > 0$; $\quad S_{13}^*:\ yA \geq 0$.
13) (Theorem of the alternative of Ville.) $S_{14}:\ Ax \leqq 0,\ x \geq 0$; $\quad S_{14}^*:\ yA > 0,\ y \geqq 0$.
14) $S_{15}:\ Ax \leqq 0,\ x > 0$; $\quad S_{15}^*:\ yA \geq 0,\ y \geqq 0$. (This theorem is due to Gale (1960).)
15) $S_{16}$: Ax0; $\quad S_{16}^*:\ yA \geqq 0,\ y > 0$ or $yA > 0,\ y \geqq 0$.
16) $S_{17}:\ Ax < b$; $\quad S_{17}^*:\ yA = 0,\ y[I \mid -b] \geq 0$.
17) $S_{18}:\ Ax \leqq b,\ x > 0$; $\quad S_{18}^*:\ y[A \mid -b] \geq 0,\ y \geqq 0$.
18) $S_{19}:\ Ax = b,\ x > 0$; $\quad S_{19}^*:\ y[A \mid -b] \geq 0$.
19) (Theorem of the alternative of Motzkin, or transposition theorem of Motzkin.) $S_{20}:\ Ax = 0;\ Bx \geqq 0;\ Dx > 0$; $\quad S_{20}^*:\ yA + vB + wD = 0;\ v \geqq 0;\ w \geq 0$.
20) (Theorem of the alternative of Tucker.) $S_{21}:\ Ax = 0;\ Bx \geqq 0;\ Cx \geq 0$; $\quad S_{21}^*:\ yA + vB + wC = 0;\ v \geqq 0;\ w > 0$.
21) (Theorem of the alternative of Slater.) $S_{22}:\ Ax = 0;\ Bx \geqq 0;\ Cx \geq 0;\ Dx > 0$; $\quad S_{22}^*:\ \{yA + vB + w^1C + w^2D = 0;\ v \geqq 0;\ w^1 \geqq 0;\ w^2 \geq 0\}$ or $\{yA + vB + w^1C + w^2D = 0;\ v \geqq 0;\ w^1 > 0;\ w^2 \geqq 0\}$.
22) (First theorem of the alternative of Mangasarian.) $S_{23}:\ \{Ax \geq 0;\ Bx \geqq 0;\ Cx \geqq 0;\ Dx = 0\}$ or $\{Ax \geqq 0;\ Bx > 0;\ Cx \geqq 0;\ Dx = 0\}$; $\quad S_{23}^*:\ y^1A + y^2B + y^3C + y^4D = 0;\ y^1 > 0;\ y^2 \geq 0;\ y^3 \geqq 0$.
23) (Second theorem of the alternative of Mangasarian.) $S_{24}:\ Ax \leq b$; $\quad S_{24}^*:\ \{yA = 0;\ y \geqq 0;\ by = -1\}$ or $\{yA = 0;\ y > 0;\ by \leqq 0\}$.
24) (First theorem of the alternative of Fenchel.) $S_{25}:\ Ax + Bz = 0;\ x \geqq 0;\ z \geq 0$; $\quad S_{25}^*:\ yA \geqq 0;\ yB > 0$.
25) (Second theorem of the alternative of Fenchel.) $S_{26}:\ Ax + Bz = 0;\ x \geqq 0;\ z > 0$; $\quad S_{26}^*:\ yA \geqq 0;\ yB \geq 0$.
26) (Theorem of the alternative of Duffin, or nonhomogeneous theorem of the alternative of Farkas-Minkowski.) $S_{27}:\ Ax \leqq b;\ cx > \gamma$; $\quad S_{27}^*:\ \{yA = 0;\ yb < 0;\ y \geqq 0\}$ or $\{yA = c;\ yb \leqq \gamma;\ y \geqq 0\}$.
We also report another theorem, due to A.W. Tucker (1956), which is easily obtained from the previous results.
Theorem 2.4.4. For any given matrix A of order (m, n), the systems
$$\{Ax \geqq 0\}, \qquad \{yA = 0;\ y \geqq 0\}$$
admit, respectively, solutions $x^0$, $y^0$ such that $Ax^0 + y^0 > 0$. Proof. In order to prove that the system
{ A^y = 0 -Ax ^ 0 -Ax - ly 0
is impossible. This is easily proved by means of Theorem 2.4.3. Conversely, suppose that the dual system admits solution: we obtain
0 = {z'^ + z^) Az^ Z [z^ + z^)z^ ^z'^ 'z'^ >Q , absurd.
□
Since Theorem 2.4.4 is in turn a basic result from which many theorems of the alternative can be derived (see Tucker (1956)), among them the theorem of Farkas-Minkowski, we have thus obtained a kind of "loop".
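The "never both" half of each alternative is a one-line computation; for Farkas-Minkowski in its standard form ($Ax = b$, $x \geqq 0$ versus $yA \geqq 0$, $yb < 0$) one has $0 \leqq (yA)x = y(Ax) = yb < 0$. The Python sketch below, with exact rational arithmetic and hypothetical helper names, only checks candidate solutions of the two Farkas systems and of Gordan's pair 11) as reconstructed above; actually finding a solution or a certificate would require a linear-programming solver, which is not shown here.

```python
from fractions import Fraction as F
from typing import List, Sequence

Matrix = Sequence[Sequence[F]]


def row_times(y: Sequence[F], A: Matrix) -> List[F]:
    """Row vector yA."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def times_col(A: Matrix, x: Sequence[F]) -> List[F]:
    """Column vector Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def farkas_primal(A: Matrix, b: Sequence[F], x: Sequence[F]) -> bool:
    """x solves S1: Ax = b, x >= 0."""
    return all(xj >= 0 for xj in x) and times_col(A, x) == list(b)


def farkas_dual(A: Matrix, b: Sequence[F], y: Sequence[F]) -> bool:
    """y solves S1*: yA >= 0, yb < 0."""
    return (all(v >= 0 for v in row_times(y, A))
            and sum(yi * bi for yi, bi in zip(y, b)) < 0)


def gordan_primal(A: Matrix, x: Sequence[F]) -> bool:
    """x solves S12: Ax = 0 with x semipositive (x >= 0, x != 0)."""
    return (all(xj >= 0 for xj in x) and any(xj > 0 for xj in x)
            and all(v == 0 for v in times_col(A, x)))


def gordan_dual(A: Matrix, y: Sequence[F]) -> bool:
    """y solves S12*: yA > 0 componentwise."""
    return all(v > 0 for v in row_times(y, A))


I2 = [[F(1), F(0)], [F(0), F(1)]]
# b = (1, -1): no x >= 0 has x = b, and y = (0, 1) certifies this.
print(farkas_dual(I2, [F(1), F(-1)], [F(0), F(1)]))   # True
# b = (1, 1): x = b solves S1, so no y can solve S1*.
print(farkas_primal(I2, [F(1), F(1)], [F(1), F(1)]))  # True

G = [[F(1), F(-1)]]
print(gordan_primal(G, [F(1), F(1)]))  # True: x = (1, 1)
print(gordan_dual(G, [F(1)]))          # False: yA = (1, -1)
```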
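2.5. Convex Functions
Let $f : X \subset \mathbb R^n \to \mathbb R$, where X is a convex set; the function f is said to be convex on X if
$$f(\lambda x^1 + (1-\lambda)x^2) \leqq \lambda f(x^1) + (1-\lambda)f(x^2),\quad \forall x^1, x^2 \in X,\ \forall \lambda \in [0, 1]. \qquad (1)$$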
The function $f : X \to \mathbb R$ is called concave on X if and only if $-f$ is convex on X. Convex (and concave) functions of one real variable have the familiar geometric interpretation found in almost all textbooks on elementary mathematical analysis. See Figures 4 and 5 below.
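Inequality (1) can be probed numerically: a single triple $(x^1, x^2, \lambda)$ violating it proves that f is not convex, while the absence of violations on finitely many samples proves nothing. A minimal Python sketch of such a random search, assuming f is given as a Python callable on lists of floats; the function names and the sampling box $[-5, 5]^n$ are hypothetical choices:

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = List[float]
Func = Callable[[Sequence[float]], float]


def violates_convexity(f: Func, x1: Sequence[float], x2: Sequence[float],
                       lam: float, tol: float = 1e-9) -> bool:
    """True if inequality (1) fails for the given x1, x2 and lambda."""
    z = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    return f(z) > lam * f(x1) + (1.0 - lam) * f(x2) + tol


def find_violation(f: Func, dim: int, trials: int = 10_000, scale: float = 5.0,
                   seed: int = 0) -> Optional[Tuple[Point, Point, float]]:
    """Random search for (x1, x2, lambda) violating (1) on [-scale, scale]^dim."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = [rng.uniform(-scale, scale) for _ in range(dim)]
        x2 = [rng.uniform(-scale, scale) for _ in range(dim)]
        lam = rng.random()
        if violates_convexity(f, x1, x2, lam):
            return x1, x2, lam
    return None


def sq_norm(x: Sequence[float]) -> float:
    """||x||^2, a convex function."""
    return sum(t * t for t in x)


print(find_violation(sq_norm, dim=3))                               # None
print(find_violation(lambda x: math.sin(x[0]), dim=1) is not None)  # True
```

The tolerance only absorbs floating-point rounding; it does not turn the search into a proof of convexity.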
Figure 4. Example of convex function of one variable.
Convex functions
Figure 5. Example of concave function of one variable.
We recall here the basic characterizations and properties of convex functions of one real variable. If φ is a function defined on the interval $(a, b) \subset \mathbb R$, i.e. $\varphi : (a, b) \to \mathbb R$, then the following conditions are equivalent:
i) $\varphi$ is convex on $(a, b)$;
ii)
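vii). Let $(x^1, \alpha_1)$ and $(x^2, \alpha_2)$ belong to epi f. We then have $f(x^1) \leqq \alpha_1$, $f(x^2) \leqq \alpha_2$. Then
$$f(\lambda x^1 + (1-\lambda)x^2) = \psi_{x^1,x^2}(\lambda) \leqq \lambda\psi_{x^1,x^2}(1) + (1-\lambda)\psi_{x^1,x^2}(0) = \lambda f(x^1) + (1-\lambda)f(x^2) \leqq \lambda\alpha_1 + (1-\lambda)\alpha_2,$$
owing to the convexity of $\psi_{x^1,x^2}$. Therefore $\lambda(x^1, \alpha_1) + (1-\lambda)(x^2, \alpha_2) \in \operatorname{epi} f$.
vii) ⇒ i). Let $x^1, x^2 \in X$; then
$$(x^1, f(x^1)) \in \operatorname{epi} f, \qquad (x^2, f(x^2)) \in \operatorname{epi} f.$$
By the convexity of epi f we have
$$[\lambda x^1 + (1-\lambda)x^2,\ \lambda f(x^1) + (1-\lambda)f(x^2)] \in \operatorname{epi} f$$
for $0 \leqq \lambda \leqq 1$, i.e.
$$f[\lambda x^1 + (1-\lambda)x^2] \leqq \lambda f(x^1) + (1-\lambda)f(x^2)$$
for $0 \leqq \lambda \leqq 1$, and hence f is convex on X. □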
If $X \subset \mathbb R^n$ is open (and convex), we have the following characterization.
Theorem 2.5.2. Let $f : X \to \mathbb R$, where $X \subset \mathbb R^n$ is an open convex set; then f is convex on X if and only if:
viii) for each $x^0 \in X$ there exists a vector $u^0 \in \mathbb R^n$ such that
$$f(x) - f(x^0) \geqq u^0(x - x^0),\quad \forall x \in X. \qquad (2)$$
Proof. Let $X \subset \mathbb R^n$ be convex and open and let $x^0 \in X$; we first prove the implication epi f convex ⇒ viii). The point $(x^0, f(x^0))$ is a boundary point of the convex set epi f. Therefore, on the ground of Theorem 2.2.4, i), there exists a vector $(v, v_0) \neq 0$ such that
$$vx + v_0\alpha \geqq vx^0 + v_0f(x^0) \qquad (3)$$
for each $(x, \alpha) \in \operatorname{epi} f$. If $v_0 = 0$, then $v \cdot (x - x^0) \geqq 0$, $\forall x \in X$, which implies $v = 0$, X being open; therefore we have the absurd result $(v, v_0) = 0$. If $v_0 < 0$, then it is possible to take α sufficiently large in order to have
$$vx + v_0\alpha < vx^0 + v_0f(x^0),$$
in contradiction to (3). Therefore $v_0 > 0$. Choosing $\alpha = f(x)$ and $u^0 = -v/v_0$, from relation (3) we obtain viii). Now let us prove the reverse implication viii) ⇒ f convex on X. Let $x^1, x^2 \in X$ and $\lambda \in [0, 1]$; as viii) holds, for each $x^0 \in X$ there exists $u^0 \in \mathbb R^n$ such that $f(x^1) - f(x^0) \geqq u^0(x^1 - x^0)$ and $f(x^2) - f(x^0) \geqq u^0(x^2 - x^0)$. Multiplying these inequalities respectively by λ and $(1 - \lambda)$ and adding, we obtain
$$\lambda f(x^1) + (1-\lambda)f(x^2) - f(x^0) \geqq u^0[\lambda x^1 + (1-\lambda)x^2 - x^0].$$
Taking $x^0 = \lambda x^1 + (1-\lambda)x^2$, relation i) of Theorem 2.5.1 follows, i.e. f is convex on X. □
The vector $u^0$ in relation (2) is called a subgradient of f at $x^0$ (see the subsequent Section 2.6). If X is not open and relation (2) holds in int(X), f is convex in int(X), but in general not on X: consider for example $n = 2$, $X = \{x \in \mathbb R^2 \mid 0 \leqq x_i \leqq 1,\ i = 1, 2\}$,
$$f(x) = \begin{cases} 0, & \text{if } 0 \leqq x_1 \leqq 1 \text{ and } 0 < x_2 \leqq 1, \\ 1 - (x_1)^2, & \text{if } 0 \leqq x_1 \leqq 1 \text{ and } x_2 = 0. \end{cases}$$
Relation (2) is satisfied with $u^0 = 0$ for each $x^0$ belonging to int(X), but f is not convex on X. From Theorem 2.5.2 we also have that every convex function f on an open convex set $X \subset \mathbb R^n$ is the supremum of a family of affine linear functions. More precisely: for each $x^0 \in X$ there exists an affine linear function $l(x) = u^0x + \alpha$ such that $l(x^0) = f(x^0)$ and $l(x) \leqq f(x)$, $\forall x \in X$.
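The counterexample above can be checked directly: every value of f is nonnegative and f vanishes on int(X), so relation (2) holds there with $u^0 = 0$, while the bottom edge $x_2 = 0$ carries the concave piece $1 - (x_1)^2$. A small Python sketch (the grid check is only a finite sample of the "for all x" in (2)):

```python
def f(x1: float, x2: float) -> float:
    """The function of the example on X = [0, 1]^2."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("point outside X")
    return 0.0 if x2 > 0.0 else 1.0 - x1 ** 2


# Relation (2) with u0 = 0 at interior points: f(x) - f(x0) >= 0,
# checked here on a grid only.
n = 20
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
interior = [(a, b) for a, b in grid if 0.0 < a < 1.0 and 0.0 < b < 1.0]
holds = all(f(*x) - f(*x0) >= 0.0 for x0 in interior for x in grid)

# Convexity fails on the bottom edge: midpoint of (0, 0) and (1, 0).
lhs = f(0.5, 0.0)                            # 0.75
rhs = 0.5 * f(0.0, 0.0) + 0.5 * f(1.0, 0.0)  # 0.5
print(holds, lhs > rhs)                      # True True
```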
We also have that the nonempty set (support set)
$$U = \{(u, \alpha) \in \mathbb R^{n+1} \mid ux + \alpha \leqq f(x),\ \forall x \in X\}$$
is convex and
$$f(x) = \sup_{(u, \alpha) \in U}\{ux + \alpha\},\quad \forall x \in X.$$
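For $f(x) = x^2$ on $\mathbb R$ the tangent line at a point a is $l_a(x) = 2ax - a^2$, i.e. $(u, \alpha) = (2a, -a^2) \in U$, since $x^2 - l_a(x) = (x - a)^2 \geqq 0$. Taking the maximum over finitely many tangents already approximates f from below; the Python sketch below illustrates the supremum formula with a hypothetical grid of anchor points:

```python
from typing import Iterable


def tangent_envelope(x: float, anchors: Iterable[float]) -> float:
    """Max over the tangent lines of f(x) = x**2 at the anchor points a.

    The tangent at a is l_a(x) = 2*a*x - a*a, i.e. (u, alpha) = (2a, -a^2),
    which lies in the support set U because x**2 - l_a(x) = (x - a)**2 >= 0.
    """
    return max(2.0 * a * x - a * a for a in anchors)


anchors = [k / 10 for k in range(-30, 31)]   # a in [-3, 3], step 0.1
for x in (-2.37, 0.0, 1.23, 2.95):
    gap = x * x - tangent_envelope(x, anchors)
    # gap = min over a of (x - a)**2, at most 0.05**2 inside [-3, 3]
    print(x, round(gap, 6))
```

Refining the anchor grid drives the gap to zero on $[-3, 3]$, which is the finite-dimensional shadow of $f(x) = \sup_{(u, \alpha) \in U}\{ux + \alpha\}$.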
For concave functions the above characterizations hold, with suitable modifications. In particular, if the hypograph of f is introduced as the set
$$\operatorname{hypo} f = \{(x, \alpha) \mid (x, \alpha) \in X \times \mathbb R,\ f(x) \geqq \alpha\},$$
then f is concave on X if and only if hypo f is a convex set.
Theorem 2.5.3. Let f be defined on the convex set $X \subset \mathbb R^n$. A necessary condition for f to be convex on X is that the lower level set of f
$$L(f, \alpha) = \{x \mid x \in X,\ f(x) \leqq \alpha\},\quad \alpha \in \mathbb R,$$
is a convex set for each $\alpha \in \mathbb R$. Proof. Let f be convex on X and let $x^1, x^2 \in L(f, \alpha)$. Then
$$f[(1-\lambda)x^1 + \lambda x^2] \leqq (1-\lambda)f(x^1) + \lambda f(x^2) \leqq (1-\lambda)\alpha + \lambda\alpha = \alpha.$$
Hence $(1-\lambda)x^1 + \lambda x^2 \in L(f, \alpha)$, which is therefore convex. □
We now show that if $L(f, \alpha)$ is convex for each $\alpha \in \mathbb R$, it does not follow that f is convex on X. Consider the function f on $\mathbb R$ defined by $f(x) = x^3$; f is not convex on $\mathbb R$, however the set
$$L(f, \alpha) = \{x \mid x \in \mathbb R,\ x^3 \leqq \alpha\} = \{x \mid x \in \mathbb R,\ x \leqq \alpha^{1/3}\}$$
is obviously convex for any $\alpha \in \mathbb R$.
Theorem 2.5.4 (Equivalent definitions for differentiable convex functions). Let $X \subset \mathbb R^n$ be an open convex set and let $f : X \to \mathbb R$ be differentiable on X. Then the following assertions are equivalent:
a) f is convex on X.
b) For each $x \in X$ and each $y \in \mathbb R^n$ the function $\psi_{x,y}(t) = y\nabla f(x + ty)$ is nondecreasing with respect to t on the interval $T_{x,y} = \{t \mid x + ty \in X\}$.
c) For each $x^1, x^2 \in X$ the function $\psi_{x^1,x^2}(\lambda) = (x^1 - x^2)\nabla f(\lambda x^1 + (1-\lambda)x^2)$ is nondecreasing with respect to λ on [0, 1].
d) For each $x^1, x^2 \in X$: $f(x^1) - f(x^2) \geqq (x^1 - x^2)\nabla f(x^2)$.
e) For each $x^1, x^2 \in X$: $f(x^1) - f(x^2) \leqq (x^1 - x^2)\nabla f(x^1)$.
f) For each $x^1, x^2 \in X$: $(x^1 - x^2) \cdot [\nabla f(x^1) - \nabla f(x^2)] \geqq 0$.
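Conditions d), e) and f) of Theorem 2.5.4 are easy to test by sampling when the gradient is known in closed form. The Python sketch below assumes, for illustration only, the convex quadratic $f(x) = x_1^2 + x_1x_2 + x_2^2$ and the nonconvex $g(x) = x_1^2 - x_2^2$; for a quadratic form $x^TQx$ each of the three conditions reduces to the sign of $d^TQd$ with $d = x^1 - x^2$, so all three hold for f and all three fail somewhere for g.

```python
import random
from typing import Callable, Sequence, Set, Tuple

Vec = Tuple[float, float]


def f(x: Vec) -> float:
    # Assumed example: positive definite quadratic form, hence convex.
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2


def grad_f(x: Vec) -> Vec:
    return (2 * x[0] + x[1], x[0] + 2 * x[1])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(s * t for s, t in zip(a, b))


def check_conditions(func: Callable[[Vec], float], grad: Callable[[Vec], Vec],
                     trials: int = 1000, seed: int = 1,
                     tol: float = 1e-9) -> Set[str]:
    """Sample d), e), f) of Theorem 2.5.4; return those never violated."""
    rng = random.Random(seed)
    ok = {"d": True, "e": True, "f": True}
    for _ in range(trials):
        x1 = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        x2 = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        d = (x1[0] - x2[0], x1[1] - x2[1])
        g1, g2 = grad(x1), grad(x2)
        diff = func(x1) - func(x2)
        ok["d"] &= diff >= dot(d, g2) - tol
        ok["e"] &= diff <= dot(d, g1) + tol
        ok["f"] &= dot(d, (g1[0] - g2[0], g1[1] - g2[1])) >= -tol
    return {k for k, v in ok.items() if v}


print(sorted(check_conditions(f, grad_f)))  # ['d', 'e', 'f']
print(sorted(check_conditions(lambda x: x[0] ** 2 - x[1] ** 2,
                              lambda x: (2 * x[0], -2 * x[1]))))  # []
```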
Proof. … d) ⇒ e). For each $x^1, x^2 \in X$ it is
$$f(x^1) - f(x^2) = -[f(x^2) - f(x^1)] \leqq -(x^2 - x^1)\nabla f(x^1) = (x^1 - x^2)\nabla f(x^1).$$
e) ⇒ f). Summing up the two inequalities
$$f(x^1) - f(x^2) \leqq (x^1 - x^2)\nabla f(x^1); \qquad f(x^2) - f(x^1) \leqq -(x^1 - x^2)\nabla f(x^2),$$
we obtain result f). f) ⇒ c). Let $x^1, x^2 \in X$, $\lambda_1, \lambda_2 \in [0, 1]$, with $\lambda_1 < \lambda_2$. Since X is convex, $y^1 = \lambda_1x^1 + (1-\lambda_1)x^2 \in X$ and $y^2 = \lambda_2x^1 + (1-\lambda_2)x^2 \in X$. Moreover, $y^1 - y^2 = (\lambda_1 - \lambda_2)(x^1 - x^2)$. We have
$$\psi_{x^1,x^2}(\lambda_2) - \psi_{x^1,x^2}(\lambda_1) = (x^1 - x^2)\nabla f(\lambda_2x^1 + (1-\lambda_2)x^2) - (x^1 - x^2)\nabla f(\lambda_1x^1 + (1-\lambda_1)x^2) = \frac{1}{\lambda_1 - \lambda_2}(y^1 - y^2)\cdot(\nabla f(y^2) - \nabla f(y^1)) = \frac{1}{\lambda_2 - \lambda_1}(y^2 - y^1)\cdot(\nabla f(y^2) - \nabla f(y^1)) \geqq 0.$$
Hence $\psi_{x^1,x^2}(\lambda_2) \geqq$ … $\mathbb R$ is differentiable on the open convex set $X \subset \mathbb R^n$, the following assertions are equivalent:
i) f is strictly convex on X.
iv) For each $x^1, x^2 \in X$, $x^1 \neq x^2$: $f(x^1) - f(x^2) > (x^1 - x^2)\nabla f(x^2)$.
v) For each $x^1, x^2 \in X$, $x^1 \neq x^2$: $f(x^1) - f(x^2)$ … $\mathbb R$, with $(a, b) \subset \mathbb R$, the following characterization holds: if $f''(x)$ exists on $(a, b)$, then $f : (a, b) \to \mathbb R$ is strictly convex on $(a, b)$ if and only if $f''(x)$ is nonnegative on $(a, b)$ and not identically zero on any nontrivial subinterval of $(a, b)$.
The characterizations of concave and strictly concave functions are easily obtained from the previous results. A function $f : X \to \mathbb R$ is linear (affine) on the convex set $X \subset \mathbb R^n$ if and only if it is both convex and concave on X, i.e. if and only if
$$f[(1-\lambda)x^1 + \lambda x^2] = (1-\lambda)f(x^1) + \lambda f(x^2),\quad \forall x^1, x^2 \in X,\ \forall \lambda \in [0, 1].$$
Let us prove that the above definition is equivalent to the more usual definition of a linear (affine) function on $X = \mathbb R^n$.
Theorem 2.5.7. The function $f : \mathbb R^n \to \mathbb R$ is linear affine if and only if there exist a vector $c \in \mathbb R^n$ and a scalar γ such that $f(x) = cx + \gamma$.
Proof. According to Theorem 2.5.1, i), ii), iii), a function which is convex and concave is characterized by the following relation:
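$$f((1-\lambda)x^1 + \lambda x^2) = (1-\lambda)f(x^1) + \lambda f(x^2),\quad \forall x^1, x^2 \in \mathbb R^n,\ \forall \lambda \in \mathbb R.$$
Set $\gamma = f(0)$ and $\psi(x) = f(x) - f(0) = f(x) - \gamma$. Since ψ fulfills the same equality and $\psi(0) = 0$, we have
$$\psi(\lambda x) = \lambda\psi(x),\quad \forall x \in \mathbb R^n,\ \forall \lambda \in \mathbb R;$$
$$\psi(x^1 + x^2) = \psi(x^1) + \psi(x^2),\quad \forall x^1, x^2 \in \mathbb R^n.$$
In particular, for $x = \sum_{i=1}^n x_ie^i$ we get $\psi(x) = \sum_{i=1}^n x_i\psi(e^i)$. Setting $c_i = \psi(e^i)$, we have $\psi(x) = cx$ and $f(x) = cx + \gamma$.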
□
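The proof of Theorem 2.5.7 is constructive: $\gamma = f(0)$ and $c_i = \psi(e^i) = f(e^i) - \gamma$. A Python sketch of that recipe for a black-box f, with a hypothetical example function:

```python
from typing import Callable, List, Sequence, Tuple


def affine_coefficients(f: Callable[[Sequence[float]], float],
                        n: int) -> Tuple[List[float], float]:
    """Follow the proof: gamma = f(0) and c_i = psi(e^i) = f(e^i) - gamma."""
    gamma = f([0.0] * n)
    c = []
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]
        c.append(f(e_i) - gamma)
    return c, gamma


def f(x: Sequence[float]) -> float:
    # An example affine function (assumed for illustration).
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2] + 5.0


c, gamma = affine_coefficients(f, 3)
print(c, gamma)                                               # [3.0, -2.0, 0.5] 5.0
x = [1.5, -4.0, 2.0]
print(f(x) == sum(ci * xi for ci, xi in zip(c, x)) + gamma)   # True
```

For a function that is not affine the recipe still returns numbers, but the identity $f(x) = cx + \gamma$ then fails away from 0 and the unit vectors.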
We note that if $f_1(x), f_2(x), \dots, f_n(x)$ are convex functions, all defined on the convex set $X \subset \mathbb R^n$, then each nonnegative linear combination of these is also a convex function on X. We shall see that this is no longer true for the main generalizations of convex functions we shall introduce later (Section 2.10).
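Moreover, if $\{f_i(x)\}$, $i \in I \subset \mathbb N$, is a family (finite or not) of functions which are all convex on a convex set $X \subset \mathbb R^n$, then the function $\vartheta(x) = \sup_{i \in I} f_i(x)$ is a convex function on X, since $\operatorname{epi}\vartheta = \bigcap_{i \in I}\operatorname{epi} f_i$ is a convex set.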
Regarding the convexity of maps $f : X \to \mathbb R^m$, we say that a map (or vector-valued function) f is convex on the convex set $X \subset \mathbb R^n$ if every component $f_i$, $i = 1, 2, \dots, m$, is convex on X. The following theorem gives results on the composition of convex functions; these results are particularly useful when one has to prove the convexity of a given non-elementary function.
Theorem 2.5.8.
i) If f is convex on $\mathbb R^m$, then $f(Ax + b)$ is convex on $\mathbb R^n$, for any matrix A of order (m, n) and any vector $b \in \mathbb R^m$.
ii) Let $X \subset \mathbb R^n$ and $F \subset \mathbb R^m$ be convex sets; if $u : F \to \mathbb R$ is convex and non-decreasing on F, then for each map $f : X \to F$, convex on X, the composed function $\varphi = u[f(x)]$ is convex on X.
iii) Let $X \subset \mathbb R^n$ and $F \subset \mathbb R^m$ be convex sets and $J \subset \{1, 2, \dots, m\}$. If $u : F \to \mathbb R$ is convex [concave] on F, monotone non-decreasing on F with respect to each component $f_i$, $i \in J$, and monotone non-increasing on F with respect to each component $f_i$, $i \notin J$, then for each map $f = [f_1, f_2, \dots, f_m] : X \to F$ such that $f_i$ is convex [concave] on X for each $i \in J$ and $f_i$ is concave [convex] on X for each $i \notin J$, the composed function $\varphi = u[f(x)]$ is convex [concave] on X.
Proof. See Bereanu (1969). We shall give the proof of a more general result when we introduce the generalized convex functions (see Section 2.15 of the present chapter). □
We have seen that $f : X \to \mathbb R$ is convex if and only if epi f is a convex set; therefore, if we consider the function on $\mathbb R^n$ defined as $f \equiv +\infty$, it is $\operatorname{epi} f = \emptyset$ and, as the empty set is convex, the function considered is convex too. Similarly, the function on $\mathbb R^n$ defined as $f \equiv -\infty$ is also convex, since $\operatorname{epi} f = \mathbb R^{n+1}$. Again: consider a convex function f defined on a proper subset D of $\mathbb R^n$ and let
$$f_1(x) = \begin{cases} f(x), & \text{if } x \in D, \\ +\infty, & \text{if } x \notin D. \end{cases}$$
The epigraph of f is identical to the epigraph of $f_1$, where $f_1$ is defined on all $\mathbb R^n$. In this way we can always construct convex functions defined throughout $\mathbb R^n$. We shall then speak of extended convex functions. So $f : \mathbb R^n \to \mathbb R \cup \{\pm\infty\}$ is an extended convex function if epi f is convex; the effective domain of an extended convex function f is the set
$$\operatorname{dom}(f) = \{x \mid x \in \mathbb R^n,\ f(x) < +\infty\}.$$
It is actually the projection of epi f on $\mathbb R^n$. It is easy to see that f is an extended convex function if and only if the inequality
$$f(\lambda x^1 + (1-\lambda)x^2) \leqq \lambda f(x^1) + (1-\lambda)f(x^2)$$
is true for all $x^1, x^2 \in \operatorname{dom}(f)$ and for all $\lambda \in (0, 1)$. The effective domain of an extended convex function is a convex set in $\mathbb R^n$. The converse statement generally does not hold, i.e. if dom(f) is a convex set, f is not necessarily an extended convex function. Allowing convex functions to take on infinite values requires some caution in arithmetic operations, such as the undefined operation $+\infty + (-\infty)$: see Rockafellar (1970), Section 4. Convex functions that take the values $+\infty$ and $-\infty$ do not arise frequently in applications; however, the proper convex functions, i.e. those (extended) convex functions such that
$$f(x) < +\infty \text{ for at least one } x \quad\text{and}\quad f(x) > -\infty \text{ for all } x,$$
have a considerable theoretical importance. Extended convex functions that are not proper are also called improper. An extended (and also a real-valued) convex function can be discontinuous only at some boundary points of its effective domain (see also
Theorem 2.5.16).
Example 2.5.1. Consider the function $f : \mathbb R \to \mathbb R \cup \{+\infty\}$ given by
fix)
2
,
x=.\
+00
,
X> \
which is discontinuous at the boundary point x = 1 of its effective domain. Some of these discontinuities can be eliminated by means of the closure operation for extended convex functions. We recall that an extended real-valued function f on $\mathbb R^n$ is closed (or lower semicontinuous) if one of the following equivalent conditions holds:
a) the epigraph of f is a closed set in $\mathbb R^{n+1}$;
b) all lower level sets of f are closed;
c) $\liminf_{y \to x} f(y) = f(x)$, $\forall x$.
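The supremum of a family of closed functions is closed (its epigraph is the intersection of closed sets), so we can introduce the closure of f:
$$\operatorname{cl}(f(x)) = \sup\{g(x) \mid g(y) \leqq f(y),\ \forall y,\ g \text{ closed}\}.$$
This definition is equivalent to $\operatorname{epi}(\operatorname{cl} f) = \operatorname{cl}(\operatorname{epi} f)$, or also to $\operatorname{cl}(f(x)) = \liminf_{y \to x} f(y)$, $\forall x$. Now, by definition, we get: f closed ⇔ $\operatorname{cl}(f) = f$. The following result holds.
Theorem 2.5.9. If f is a convex function on $\mathbb R^n$, with $f(x) > -\infty$, $\forall x$, then
$$\operatorname{cl}(f(x)) = \sup\{g(x) \mid g(y) \leqq f(y),\ \forall y,\ g \text{ linear affine}\} = \sup\{ux + \gamma \mid uy + \gamma \leqq f(y),\ \forall y\},\quad \forall x.$$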
Since this result is closely connected with the theory of conjugate functions, we shall give its proof later, in Section 2.7 (see the second remark to Theorem 2.7.4). If $f(x) = -\infty$ for some x, then of course the above equality fails: since we cannot find a linear affine function which is smaller than f, we have
$$\sup\{ux + \gamma \mid uy + \gamma \leqq f(y),\ \forall y\} = -\infty,\quad \forall x,$$
but cl(f(x)) is characterized by the closure of the epigraph of f.
Example 2.5.2. Consider the following (extended) convex function f defined on $\mathbb R$:
$$f(x) = \begin{cases} 0, & \text{if } x > 0, \\ K, & \text{if } x = 0, \text{ with } K \in [0, +\infty], \\ +\infty, & \text{if } x < 0. \end{cases}$$
The closure of f coincides with f at each point except the origin, where $\operatorname{cl} f = 0$.
Example 2.5.3. Consider the (extended) convex function defined as
$f(x) = 0$, if $\|x\|$ … 0 such that, for each $\delta \in [0, \dots)$, we have $C(\delta) \subset X$, and therefore $B \subset C(\delta) \subset X$. Let $x^i$ be an extreme point of $C(\delta)$ and let $\beta = \max_i\{f(x^i)\}$. On the ground of Theorem 2.5.3, $L(f, \beta)$ is convex. We observe moreover that every point $x \in C(\delta)$ can be expressed as a convex combination of the extreme points of $C(\delta)$, whose number is $2^K$, where K is the dimension of aff(X). The convexity of f implies
$$f(x) = f\Big(\sum_{i=1}^{2^K}\lambda_ix^i\Big) \leqq \sum_{i=1}^{2^K}\lambda_if(x^i) \leqq \sum_{i=1}^{2^K}\lambda_i\beta = \beta.$$
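It follows that $C(\delta) \subset L(f, \beta)$. Let now $x \in B$, $x \neq \bar x$, and let $x'$, $x''$ be the intersections of the line of $\mathbb R^n$ defined by x and $\bar x$ with the boundary of B. Let $\lambda \in [0, 1]$ be such that, denoting $x(\lambda) = (1-\lambda)\bar x + \lambda x''$, it results $x(\lambda) = x$. Then, as B is centered at $\bar x$, $x' = 2\bar x - x''$, and hence
$$\bar x = \frac{1}{1+\lambda}x + \frac{\lambda}{1+\lambda}x'.$$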
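The convexity of f then gives
$$f(x) = f(x(\lambda)) \leqq (1-\lambda)f(\bar x) + \lambda f(x'') \leqq (1-\lambda)f(\bar x) + \lambda\beta;$$
$$f(\bar x) \leqq \frac{1}{1+\lambda}f(x) + \frac{\lambda}{1+\lambda}f(x') \leqq \frac{1}{1+\lambda}f(x) + \frac{\lambda}{1+\lambda}\beta.$$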
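Subtracting $f(\bar x)$ from both members of the first inequality and taking into account also the second inequality, we have
$$-\lambda[\beta - f(\bar x)] \leqq f(x) - f(\bar x) \leqq \lambda[\beta - f(\bar x)]$$
and therefore
$$|f(x) - f(\bar x)| \leqq \frac{1}{\delta}[\beta - f(\bar x)] \cdot \|x - \bar x\|,$$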
as
$$\|x - \bar x\| = \|(1-\lambda)\bar x + \lambda x'' - \bar x\| = \|\lambda(x'' - \bar x)\| = \lambda\delta.$$
It follows that, for each x such that $[\beta - f(\bar x)] \cdot \|x - \bar x\| < \delta\varepsilon$, we have $|f(x) - f(\bar x)| < \varepsilon$, and therefore f is continuous at $\bar x$.
□
If f is convex on an open convex set $X \subset \mathbb R^n$, it is therefore continuous on X. It can be proved (see, e.g., Rockafellar (1970)) that a convex function $f : X \to \mathbb R$, where $X \subset \mathbb R^n$ is an open convex set, is locally Lipschitzian at each $x^0 \in X$ and therefore continuous at $x^0$. If $f : X \to \mathbb R$ is continuous on the convex set $X \subset \mathbb R^n$, we have another characterization of the convexity of f, due to Jensen.
Theorem 2.5.17. Let f be continuous on the convex set $X \subset \mathbb R^n$; then f is convex on X if and only if
$$f\Big[\tfrac12(x^1 + x^2)\Big] \leqq \tfrac12[f(x^1) + f(x^2)],\quad \forall x^1, x^2 \in X. \qquad (4)$$
Proof. Necessity: obvious. Sufficiency: we shall prove that the relation
$$f(\lambda x^1 + (1-\lambda)x^2) \leqq \lambda f(x^1) + (1-\lambda)f(x^2),\quad \forall \lambda \in (0, 1),\ \forall x^1, x^2 \in X \qquad (5)$$
holds if it is verified for λ = 1/2. We perform the proof in two stages.
a) By induction on n we prove that (5) is verified for each number $\lambda = m/2^n$, with $m, n \in \mathbb N$, $m < 2^n$. Let us suppose that (5) is verified for $\lambda = m/2^n$, with $m = 1, \dots, 2^n - 1$; let $\lambda = p/2^{n+1}$ with $p < 2^{n+1}$. If $p = 2k$, then (5) is verified by the induction hypothesis; if $p = 2k + 1$, letting $\lambda_1 = k/2^n$, $\lambda_2 = (k+1)/2^n$ and $\bar x^i = \lambda_ix^1 + (1-\lambda_i)x^2$, $i = 1, 2$, we have $\lambda = \frac12(\lambda_1 + \lambda_2)$ and therefore
$$f(\lambda x^1 + (1-\lambda)x^2) = f\Big(\tfrac12(\bar x^1 + \bar x^2)\Big) \leqq \tfrac12[f(\bar x^1) + f(\bar x^2)] \leqq \tfrac12[(\lambda_1 + \lambda_2)f(x^1) + (2 - \lambda_1 - \lambda_2)f(x^2)] = \lambda f(x^1) + (1-\lambda)f(x^2).$$
b) Using dyadic fractions, every $\lambda \in (0, 1)$ can be written as the limit of a sequence $\{\lambda_n\}$, with $\lambda_n$ of the form that appears in a). Using then the continuity of f, we get the thesis. □
The functions satisfying inequality (4) are also called midconvex functions. Of course the class of midconvex functions is larger than the class of convex functions. Let us now state the boundedness properties of convex functions.
Theorem 2.5.18. If $f : X \to \mathbb R$ is convex on the convex set $X \subset \mathbb R^n$, then:
i)
f is bounded on every compact subset $Y \subset \operatorname{relint}(X)$;
ii) f is bounded from below on every bounded set $Y \subset X$. Proof. i)
As f is continuous on relint(X), it is bounded on every compact subset of relint(X).
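ii) Let $x^0 \in \operatorname{relint}(X)$, with $x^0$ fixed. Then there exists δ > 0 such that the closed ball $B = \{y \in \operatorname{aff}(X) \mid \|y - x^0\| \leqq \delta\} \subset \operatorname{relint}(X)$. As B is a compact set, then, on the grounds of i), $f(y) \leqq M$, $\forall y \in B$. Now let $x \in X$, $x \neq x^0$, be arbitrary and let $y(x) = x^0 + (x^0 - x)\delta/\rho$, with $\rho = \|x - x^0\|$; as $\|y(x) - x^0\| = \delta$, we have $y(x) \in B$, therefore $f(y(x)) \leqq M$. From the convexity of f we have
$$f(x^0) = f\Big(\frac{\rho}{\rho + \delta}y(x) + \frac{\delta}{\rho + \delta}x\Big) \leqq \frac{\rho}{\rho + \delta}f(y(x)) + \frac{\delta}{\rho + \delta}f(x) \leqq \frac{\rho}{\rho + \delta}M + \frac{\delta}{\rho + \delta}f(x),$$
hence, recalling that $f(x^0) - M \leqq 0$ because $x^0 \in B$,
$$\delta f(x) \geqq \rho[f(x^0) - M] + \delta f(x^0) \geqq k[f(x^0) - M] + \delta f(x^0),\quad \text{if } \rho = \|x - x^0\| \leqq k.$$
□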
We conclude this section with a result that relates positively homogeneous functions and convex functions. We recall that a function $f : X \to \mathbb R$, X a cone of $\mathbb R^n$, is said to be positively homogeneous (of the first degree) if, for any number $\alpha \geqq 0$, $f(\alpha x) = \alpha f(x)$, $\forall x \in X$; f is said to be subadditive on $X \subset \mathbb R^n$ if, for any $x, y \in X$, $f(x + y) \leqq f(x) + f(y)$. It is easy to verify the following result: if $f : K \to \mathbb R$ is defined on a convex cone $K \subset \mathbb R^n$ (with $0 \in K$) and f is positively homogeneous on K, then f is convex on K if and only if it is subadditive on K. This means that epi f is in this case a convex cone in $\mathbb R^{n+1}$. For other results concerning convex functions and homogeneous functions, see Newman (1969).
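The equivalence rests on one line: for a positively homogeneous subadditive f, $f(\lambda x + (1-\lambda)y) \leqq f(\lambda x) + f((1-\lambda)y) = \lambda f(x) + (1-\lambda)f(y)$. The Python sketch below samples the three properties for an assumed example, the weighted max-norm $p(x) = \max(|x_1|, 2|x_2|)$, and for $|x_1| - |x_2|$, which is positively homogeneous but neither subadditive nor convex; sampling can only expose failures, not prove the properties.

```python
import random
from typing import Callable, Sequence, Tuple


def p(x: Sequence[float]) -> float:
    # Assumed example: a weighted max-norm on R^2.
    return max(abs(x[0]), 2.0 * abs(x[1]))


def sample_checks(func: Callable[[Sequence[float]], float], trials: int = 2000,
                  seed: int = 3, tol: float = 1e-9) -> Tuple[bool, bool, bool]:
    """Sample (positive homogeneity, subadditivity, convexity) on R^2."""
    rng = random.Random(seed)
    homog = subadd = convex = True
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a, lam = rng.uniform(0, 5), rng.random()
        homog &= abs(func((a * x[0], a * x[1])) - a * func(x)) <= tol
        subadd &= func((x[0] + y[0], x[1] + y[1])) <= func(x) + func(y) + tol
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        convex &= func(z) <= lam * func(x) + (1 - lam) * func(y) + tol
    return homog, subadd, convex


print(sample_checks(p))                                  # (True, True, True)
print(sample_checks(lambda x: abs(x[0]) - abs(x[1])))   # (True, False, False)
```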
2.6. Directional Derivatives and Subgradients of Convex Functions
We will now discuss the notion of directional differentiability of a convex function. We first recall the notion of directional derivative. Let $f : \mathbb R^n \to \mathbb R \cup \{\pm\infty\}$ and let x be a point where f is finite; let y be a direction of $\mathbb R^n$. The right-sided directional derivative of f at x in the direction y is defined as
$$f'_+(x; y) = \lim_{t \to 0^+}\frac{f(x + ty) - f(x)}{t},$$
if the said limit exists, finite or not. Similarly, the left-sided directional derivative of f at x in the direction y is defined as
$$f'_-(x; y) = \lim_{t \to 0^-}\frac{f(x + ty) - f(x)}{t}.$$
For y = 0, both $f'_+$ and $f'_-$ are defined to be zero. The reader can easily verify that
$$-f'_+(x; -y) = f'_-(x; y),$$
so that only the right-sided directional derivative needs to be considered (indeed in Chapter IV only right-sided generalized directional derivatives will be treated and the symbol "+" will be omitted in the related notations). If $f'_+(x; y) = f'_-(x; y)$, then f admits the directional derivative $f'(x; y)$ at x; so the unilateral derivative $f'_+(x; y)$ is bilateral if and only if $f'_+(x; -y)$ exists and $f'_+(x; -y) = -f'_+(x; y)$.
Directional derivatives and subgradients of convex functions
In case f is a convex function, the existence of $f'_+(x; y)$ is assured.
Indeed we have the following
Theorem 2.6.1. Let f be an extended convex function and let x be a point where f is finite. Then, for each direction y, the ratio $[f(x + ty) - f(x)]/t$ is a nondecreasing function of $t > 0$, so that $f'_+(x; y)$ exists for every direction y and
$$f'_+(x; y) = \inf_{t > 0}\frac{f(x + ty) - f(x)}{t}.$$
Moreover, $f'_+(x; y)$ is a convex and positively homogeneous function of y, with $f'_-(x; y) \leqq f'_+(x; y)$.
Proof. Let $x \in \mathbb R^n$ be any point such that f(x) is finite. Define $h(y) = f(x + y) - f(x)$; the function h is convex, since epi h is obtained by translating the convex set epi f, and $h(0) = 0$.
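Let $\lambda_2 > \lambda_1 > 0$ and $q_1 = \lambda_1/\lambda_2$, $q_2 = (\lambda_2 - \lambda_1)/\lambda_2$. Then $q_1 > 0$, $q_2 > 0$ and $q_1 + q_2 = 1$. Hence
$$h(\lambda_1y) = h(q_1\lambda_2y + q_2 \cdot 0) \leqq q_1h(\lambda_2y) + q_2h(0)$$
and
$$\frac{h(\lambda_1y)}{\lambda_1} \leqq \frac{h(\lambda_2y)}{\lambda_2}.$$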
The last inequality indicates that $h(ty)/t$ is a nondecreasing function of $t > 0$. Thus for a convex function f we can write, for every direction y,
$$f'_+(x; y) = \inf_{t > 0}\frac{f(x + ty) - f(x)}{t},$$
and $f'_+(x; y)$ exists, although it may not be finite. Then, for $\alpha > 0$,
$$f'_+(x; \alpha y) = \lim_{t \to 0^+}\alpha\,\frac{f(x + t\alpha y) - f(x)}{\alpha t} = \alpha f'_+(x; y),$$
hence $f'_+$ is positively homogeneous. Next we show that $f'_+$ is a convex function of y by proving that $f'_+$ is a subadditive function with respect to the direction $y \in \operatorname{dom}(f'_+(x; \cdot))$. Let $y^1, y^2 \in \operatorname{dom}(f'_+(x; \cdot))$. Then $x + ty^1$, $x + ty^2 \in \operatorname{dom}(f)$ for $t > 0$ sufficiently small. Now we get
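$$f'_+(x; y^1 + y^2) = \lim_{t \to 0^+}\frac{f\big(\tfrac12(x + 2ty^1) + \tfrac12(x + 2ty^2)\big) - f(x)}{t} \leqq \lim_{t \to 0^+}\frac{[f(x + 2ty^1) - f(x)] + [f(x + 2ty^2) - f(x)]}{2t} = f'_+(x; y^1) + f'_+(x; y^2).$$
Finally, by subadditivity, for all y with $f'_+(x; y) < +\infty$ and $f'_+(x; -y) < +\infty$, we get $0 = f'_+(x; 0) \leqq f'_+(x; y) + f'_+(x; -y)$, i.e. $f'_-(x; y) = -f'_+(x; -y) \leqq f'_+(x; y)$.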
If $f : X \to \mathbb R$ is convex on an open set $X \subset \mathbb R^n$ and all its partial derivatives exist at $x^0 \in X$, then f is differentiable at $x^0$. Proof. See, e.g., Roberts and Varberg (1973).
□
Theorem 2.6.3 (Rademacher's theorem). A locally Lipschitzian function f on an open set $X \subset \mathbb R^n$ is differentiable almost everywhere on X, i.e. except on a set of Lebesgue measure zero. It follows that a convex function $f : X \to \mathbb R$, where $X \subset \mathbb R^n$ is open (and convex), is differentiable almost everywhere on X. Proof. See, e.g., Saks (1937), Roberts and Varberg (1973) and Rockafellar (1970).
□
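Theorem 2.6.1 can be watched at work numerically: along $t = 1, 1/2, 1/4, \dots$ the difference quotient of a convex function can only decrease, and its limit is $f'_+(x; y)$. A Python sketch with the assumed example $f(s) = s^2 + |s|$ at $x = 0$, where the quotient equals $|y| + t y^2$, so $f'_+(0; y) = |y|$ while $f'_-(0; y) = -|y|$:

```python
from typing import Callable


def quotient(f: Callable[[float], float], x: float, y: float, t: float) -> float:
    """Difference quotient [f(x + t y) - f(x)] / t for t > 0."""
    return (f(x + t * y) - f(x)) / t


def f(s: float) -> float:
    # Assumed example: convex, not differentiable at 0.
    return s * s + abs(s)


ts = [2.0 ** -k for k in range(30)]            # t = 1, 1/2, 1/4, ... toward 0+
qs = [quotient(f, 0.0, 1.5, t) for t in ts]    # equals 1.5 + 2.25 t here

# Nondecreasing in t, hence nonincreasing along this decreasing sequence of t.
print(all(q_big >= q_small for q_big, q_small in zip(qs, qs[1:])))  # True
print(qs[-1])                                  # close to f'_+(0; 1.5) = 1.5
left = -quotient(f, 0.0, -1.5, ts[-1])         # f'_-(0; 1.5) = -f'_+(0; -1.5)
print(left <= qs[-1])                          # True
```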
Another important concept to be introduced is the subgradient of an extended convex function, a concept related to the ordinary gradient in the case of differentiable convex functions. We have seen in Theorem 2.5.2 that $f : X \to \mathbb R$, X an open convex set of $\mathbb R^n$, is convex on X if and only if for each $x^0 \in X$ there exists a vector $u^0 \in \mathbb R^n$ such that
$$f(x) - f(x^0) \geqq u^0(x - x^0),\quad \forall x \in X.$$
More generally, a vector ξ is called a subgradient of a convex function f at a point x, with f(x) finite, if
$$f(y) \geqq f(x) + \xi(y - x),\quad \forall y \in \mathbb R^n. \qquad (1)$$
The set of all subgradients of f at x is called the subdifferential of f at x and is denoted by $\partial f(x)$. Relation (1), which we refer to as the subgradient inequality, has a simple geometric meaning: it says that the graph of the affine linear function $h(y) = f(x) + \xi(y - x)$ is a non-vertical supporting hyperplane to the convex set epi f at the point $(x, f(x))$. Clearly $\partial f(x)$ is a closed convex set (possibly empty or a singleton), since $\xi \in \partial f(x)$ if and only if a system of (infinitely many) weak linear inequalities is satisfied. If $\partial f(x) \neq \emptyset$, f is said to be subdifferentiable at x. For example, the Euclidean norm $f(x) = \|x\|$, $x \in \mathbb R^n$, is not differentiable at x = 0, but it is subdifferentiable there, and $\partial f(0)$ is given by all vectors ξ such that $\|y\| \geqq \xi \cdot y$, $\forall y$, i.e. by the closed unit ball. From the definition of subgradient we have the following important case.
Proposition 2.6.4. If X is a nonempty convex set of $\mathbb R^n$, then $\partial\delta(x, X)$ is the normal cone to X at x, where $\delta(\cdot, X)$ is the indicator function of X.
Proof. By definition $\xi \in \partial\delta(x, X)$ if and only if
$$\delta(y, X) \geqq \delta(x, X) + \xi \cdot (y - x),\quad \forall y.$$
This condition means that $x \in X$ and $0 \geqq \xi \cdot (y - x)$, $\forall y \in X$, i.e. that ξ is normal to X at x.
□
We must note that ξ is a subgradient of f at x if and only if $(\xi, -1) \in \mathbb R^{n+1}$ is a normal vector to a supporting hyperplane for epi f at $(x, f(x))$.
In fact ξ is a subgradient of f at x if and only if (1) holds, i.e. if and only if
$$(\xi, -1) \cdot (y - x, \alpha - f(x)) \leqq 0,\quad \forall \alpha \geqq f(y),\ \forall y,$$
i.e. if and only if
$$(\xi, -1) \in N(\operatorname{epi} f, (x, f(x))).$$
Subgradients can be characterized by means of directional derivatives, as shown by the next theorem.
Theorem 2.6.5. Let f be an extended convex function and let x be a point where f is finite. Then ξ is a subgradient of f at x if and only if
$$f'_+(x; y) \geqq \xi y,\quad \forall y \in \mathbb R^n.$$
In fact the closure of $f'_+(x; y)$, as a convex function of y, is the support function of the closed convex set $\partial f(x)$.
Proof. Setting z = x + ty, we can write the subgradient inequality as
$$\frac{f(x + ty) - f(x)}{t} \geqq \xi y,\quad \forall y \in \mathbb R^n,\ \forall t > 0.$$
Since the difference quotient decreases to $f'_+(x; y)$ as $t \to 0^+$, the first part of the theorem is proved. For the second part we have
cl(/+(x, y)) = sup {cy + 7 I cy + 7 ^ / + ( ^ ' V)^ ^v) = = sup {cy I cy ^ f^{x,y),
\/y} =
= sup {cy I c € df{x)} = 5%y, df{x))
.
D
Example 2.6.1. The function $f(x_1, x_2, \dots, x_n) = \max\{x_i\}$, $i = 1, 2, \dots, n$, is a convex function. Let $I(x)$ be the set of indices $i$ such that $x_i = f(x)$. Then
$$f'_+(x; y) = \lim_{t \to 0^+} \frac{\max_i (x_i + t y_i) - f(x)}{t},$$
and since for $t$ sufficiently small every $i \notin I(x)$ can be omitted, we can write
$$f'_+(x; y) = \lim_{t \to 0^+} \max_{i \in I(x)} \frac{x_i + t y_i - x_i}{t} = \max_{i \in I(x)} y_i.$$
Hence $\partial f(x)$ consists of all vectors $\xi \in \mathbb{R}^n$ such that $\max_{i \in I(x)} y_i \ge \xi \cdot y$, $\forall y \in \mathbb{R}^n$, i.e. of all vectors with components $\xi_1, \xi_2, \dots, \xi_n$ such that $\xi_i \ge 0$, $\sum_i \xi_i = 1$, $\xi_i = 0$ if $i \notin I(x)$.

It must be noted that the assertion of Theorem 2.6.5 is equivalent to $\xi \in \partial f'_+(x; 0)$, where $f'_+$ is now considered as a function of the direction only (i.e. $x$ is fixed).

Theorem 2.6.6. Let $f$ be a proper convex function; then
i) If $x \notin \operatorname{dom}(f)$, $\partial f(x) = \emptyset$.
ii) If $x \in \operatorname{relint}(\operatorname{dom}(f))$, $\partial f(x) \neq \emptyset$ and $f'_+(x; y)$ is closed and proper as a function of $y$, with
$$f'_+(x; y) = \sup\{\xi \cdot y \mid \xi \in \partial f(x)\}.$$
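For the function $f(x) = \max_i x_i$ of Example 2.6.1, the closed form $f'_+(x; y) = \max_{i \in I(x)} y_i$ and the description of $\partial f(x)$ as a simplex supported on $I(x)$ can be compared with direct computations. The sketch below is ours: it assumes a tolerance `tol` for deciding which indices are active, estimates $f'_+(x; y)$ by a one-sided difference quotient with a small fixed $t$, and tests inequality (1) for two candidate vectors, one supported on $I(x)$ and one putting weight on an inactive index.

```python
import random

def f_max(x):
    """f(x) = max_i x_i, the convex function of Example 2.6.1."""
    return max(x)

def active_set(x, tol=1e-12):
    """I(x): indices i with x_i = f(x), up to an (assumed) tolerance tol."""
    m = f_max(x)
    return [i for i, xi in enumerate(x) if m - xi <= tol]

def dir_derivative(x, y):
    """Closed form f'_+(x; y) = max over i in I(x) of y_i."""
    return max(y[i] for i in active_set(x))

def difference_quotient(x, y, t=1e-7):
    """(f(x + t y) - f(x)) / t for a small t > 0; tends to f'_+(x; y)."""
    return (f_max([a + t * b for a, b in zip(x, y)]) - f_max(x)) / t

def violates(x, xi, y, tol=1e-12):
    """True if f(y) < f(x) + xi.(y - x) - tol, i.e. xi fails inequality (1) at y."""
    rhs = f_max(x) + sum(w * (b - a) for w, a, b in zip(xi, x, y))
    return f_max(y) < rhs - tol

x = [1.0, 1.0, 0.0]          # I(x) = {0, 1} with 0-based indices
y = [0.3, -2.0, 5.0]
print(dir_derivative(x, y))                          # 0.3
print(abs(difference_quotient(x, y) - 0.3) < 1e-6)   # True

good = [0.5, 0.5, 0.0]       # in the simplex, zero outside I(x)
bad = [0.5, 0.0, 0.5]        # puts weight on the inactive index 2
random.seed(1)
tests = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(1000)]
print(any(violates(x, good, z) for z in tests))      # False
print(violates(x, bad, [1.0, 1.0, 1.0]))             # True
```

In line with Theorem 2.6.6 ii), the maximum of $\xi \cdot y$ over this simplex is attained at a vertex $e_i$, $i \in I(x)$, which is exactly $\max_{i \in I(x)} y_i$.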
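… $= \mathbb{R} \cup \{\pm\infty\}$ is a proper convex function, then $f^*$ is proper too.

Proof. i) Since $\operatorname{dom}(f) \neq \emptyset$, we have
$$f^*(u) = \sup_x \{ux - f(x)\} > -\infty, \qquad \forall u.$$
ii) Let $x^0 \in \operatorname{relint}(\operatorname{dom} f) \neq \emptyset$. Since $f$ is a proper convex function, according to Theorem 2.6.6 we have $\partial f(x^0) \neq \emptyset$. For $u \in \partial f(x^0)$ we have
$$f(x) \ge f(x^0) + u(x - x^0), \qquad \forall x \in \mathbb{R}^n,$$
and
$$ux^0 - f(x^0) \ge ux - f(x), \qquad \forall x \in \mathbb{R}^n.$$
This means
$$f^*(u) = \sup_x \{ux - f(x)\} \le ux^0 - f(x^0) < +\infty$$
and $\operatorname{dom}(f^*) \neq \emptyset$. □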
In the proof of the previous theorem we have seen that $\bigcup_x \partial f(x) \subset \operatorname{dom}(f^*)$. In general this inclusion is strict, as shown by the following example.

Example 2.7.1. Let
$$f(x) = \begin{cases} 1/x, & \text{if } x > 0, \\ +\infty, & \text{if } x \le 0. \end{cases}$$
Then
$$(-\infty, 0) = \bigcup_x \partial f(x) \subset (-\infty, 0] = \operatorname{dom}(f^*).$$
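For Example 2.7.1, taken here as $f(x) = 1/x$ for $x > 0$ and $f(x) = +\infty$ otherwise, the conjugate can be computed in closed form: for $u < 0$ the supremum of $ux - 1/x$ over $x > 0$ is attained at $x = 1/\sqrt{-u}$, giving $f^*(u) = -2\sqrt{-u}$; at $u = 0$ the supremum $\sup_{x > 0}(-1/x) = 0$ is finite but not attained, which is why $0 \in \operatorname{dom}(f^*)$ while $0 \notin \bigcup_x \partial f(x)$; for $u > 0$, $f^*(u) = +\infty$. The sketch below (plain Python; the grid bounds are arbitrary assumptions) approximates $f^*$ by a maximum over a finite grid, compares it with this closed form, and checks the equality case of the Young-Fenchel inequality at $u = f'(x) = -1/x^2$.

```python
import math

def f(x):
    """Example 2.7.1: f(x) = 1/x for x > 0, +inf otherwise."""
    return 1.0 / x if x > 0 else math.inf

def conj_on_grid(u, grid):
    """Approximate f*(u) = sup_x {u x - f(x)} by a maximum over a finite grid of x > 0."""
    return max(u * x - f(x) for x in grid)

def conj_exact(u):
    """Closed form derived above: -2 sqrt(-u) for u <= 0, +inf for u > 0."""
    return -2.0 * math.sqrt(-u) if u <= 0 else math.inf

grid = [k / 1000.0 for k in range(1, 200001)]   # x in (0, 200]

for u in (-4.0, -1.0, -0.25):
    print(u, conj_on_grid(u, grid), conj_exact(u))

# u = 0: the grid values -1/x creep up towards 0 but never reach it.
print(conj_on_grid(0.0, grid))

# Equality case f*(u0) + f(x0) = u0 x0 at u0 = f'(x0) = -1/x0**2.
x0 = 2.0
u0 = -1.0 / x0 ** 2
print(conj_exact(u0) + f(x0) - u0 * x0)
```

The grid maximum at $u = 0$ equals $-1/200$ here and tends to $0$ only as the grid is extended, mirroring the fact that the supremum is not attained.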
There are other connections between conjugate functions and subdifferentials; for this subject see, e.g., Rockafellar (1970). As a consequence of Definition 2.7.1 we have the so-called Young-Fenchel inequality:
$$f^*(u) + f(x) \ge ux, \qquad \forall x \in \mathbb{R}^n,\ \forall u \in \mathbb{R}^n.$$
Theorem 2.7.2. The equality $f^*(u^0) + f(x^0) = u^0 \cdot x^0$ holds if and only if $u^0 \in \partial f(x^0)$.

Proof. The relation $u^0 \in \partial f(x^0)$ is equivalent to
$$u^0(x - x^0) \le f(x) - f(x^0), \qquad \forall x \in \mathbb{R}^n.$$
This means $u^0 x - f(x) \le u^0 x^0 - f(x^0)$, $\forall x \in \mathbb{R}^n$, or equivalently
$$f^*(u^0) = \sup_x \{u^0 x - f(x)\} \le u^0 x^0 - f(x^0),$$
i.e. $f^*(u^0) + f(x^0) \le u^0 x^0$. Taking the Young-Fenchel inequality into account, the thesis is proved. □

We can also introduce the conjugate of $f^*$, called the bi-conjugate of $f$ and denoted $f^{**}$, according to $f^{**} = (f^*)^*$, i.e.
$$f^{**}(x) = \sup_u \{x \cdot u - f^*(u)\}.$$
Obviously $f^{**}$ is a convex lower semi-continuous function, with $f^{**} \le f$. The question arises about the equality between $f^{**}$ and $f$, for which we have the following result.

Theorem 2.7.3. If $\partial f(x^0) \neq \emptyset$, then $f^{**}(x^0) = f(x^0)$.

Proof. Let $u^0 \in \partial f(x^0)$; then according to Theorem 2.7.2 we have $f^*(u^0) + f(x^0) = u^0 x^0$ and
$$f(x^0) \ge f^{**}(x^0) = \sup_u \{x^0 \cdot u - f^*(u)\} \ge x^0 \cdot u^0 - f^*(u^0) = f(x^0).$$
□
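Theorem 2.7.3, together with remark c) on the closed convex hull, can be observed numerically with a discrete Legendre transform. The sketch below works under stated assumptions: both $x$ and $u$ are restricted to finite grids, so every supremum becomes a maximum, and the non-convex test function $f(x) = (x^2 - 1)^2$ is our choice. For $|x| \ge 1$ this $f$ admits a (global) subgradient, so $f^{**}(x) = f(x)$ there; on $(-1, 1)$ it has none, and $f^{**}$ drops to the closed convex hull value $0$.

```python
def f(x):
    """Non-convex test function (our choice): the double well (x^2 - 1)^2."""
    return (x * x - 1.0) ** 2

def conjugate(values, points, duals):
    """Discrete conjugate: for each u in duals, the max over points x of u*x - g(x),
    where values[i] = g(points[i])."""
    return [max(u * x - gx for x, gx in zip(points, values)) for u in duals]

# Assumed grids: x in [-3, 3] (step 0.02) and u in [-100, 100] (step 0.1);
# the u-range must cover the slopes of f on the x-grid (|f'| <= 96 there).
xs = [-3.0 + 0.02 * i for i in range(301)]
us = [-100.0 + 0.1 * j for j in range(2001)]

fx = [f(x) for x in xs]
f_star = conjugate(fx, xs, us)            # samples of f* on us
f_star_star = conjugate(f_star, us, xs)   # samples of f** = (f*)* back on xs

for i in (150, 225):                      # grid points x = 0 and x = 1.5
    print(round(xs[i], 2), fx[i], round(f_star_star[i], 4))
```

At $x = 0$ one should see $f = 1$ but $f^{**} \approx 0$, while at $x = 1.5$ both values agree ($f(1.5) = 1.5625$). Restricting to grids can only lower $f^{*}$ and hence never pushes the computed $f^{**}$ above $f$ at grid points, apart from rounding.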
In particular we have the following implications: $u^0 \in \partial f(x^0)$ … $-\infty$; then $f = f^{**}$ if and only if $f$ is convex and lower semi-continuous.

Proof. Since the "only if" part of the theorem is obvious, let us assume $f$ convex and lower semi-continuous. If $f \equiv +\infty$, there is nothing to prove. In the other cases $\operatorname{epi} f \neq \emptyset$ is a closed convex set. Assuming the existence of a point $x^0 \in \mathbb{R}^n$, $x^0 \in \operatorname{dom}(f)$, with $f^{**}(x^0) < f(x^0)$, we have $(x^0, f^{**}(x^0)) \notin \operatorname{epi} f$. Hence, using the strong separation theorem, we can find a vector $(v, v_0) \in \mathbb{R}^{n+1}$, $(v, v_0) \neq (0, 0)$, such that
$$\sup_{(x, \alpha) \in \operatorname{epi} f} \{vx + v_0 \alpha\} < v \cdot x^0 + v_0 f^{**}(x^0). \tag{1}$$
If $v_0 > 0$, then with $\alpha$ sufficiently large we get a contradiction. If $v_0 = 0$ we get
$$\sup_{x \in \operatorname{dom}(f)} vx < vx^0.$$
Let $u \in \operatorname{dom}(f^*)$ and $t > 0$. Then
$$f^*(u + tv) = \sup_x \{(u + tv)x - f(x)\} \le \sup_{x \in \operatorname{dom}(f)} \{ux - f(x)\} + t \sup_{x \in \operatorname{dom}(f)} vx = f^*(u) + t \sup_{x \in \operatorname{dom}(f)} vx$$
and
$$f^{**}(x^0) \ge (u + tv)x^0 - \Big(f^*(u) + t \sup_{x \in \operatorname{dom}(f)} vx\Big) = (ux^0 - f^*(u)) + t\Big(vx^0 - \sup_{x \in \operatorname{dom}(f)} vx\Big).$$
Since this inequality is true for all $t > 0$, we get $f^{**}(x^0) = +\infty$, which is a contradiction. Thus $v_0 < 0$. Without loss of generality we can set $v_0 = -1$; then (1) gives $f^*(v) = \sup_x \{vx - f(x)\} < vx^0 - f^{**}(x^0)$, in contradiction to the Young-Fenchel inequality (applied to $f^*$ and its conjugate $f^{**}$). □

About the previous theorem the following remarks are useful.

a) Because of
$$f^{**}(x) = \sup_u \{ux - f^*(u)\} = \sup\{ux + \gamma \mid -\gamma \ge f^*(u)\} = \sup\{ux + \gamma \mid -\gamma \ge uy - f(y),\ \forall y\} = \sup\{ux + \gamma \mid uy + \gamma \le f(y),\ \forall y\}, \qquad \forall x,$$
the equality $f = f^{**}$ means that $f$ can be represented as a supremum of affine functions which are smaller than $f$.

b) If $f$ is convex, but not lower semi-continuous (i.e. not closed), then in case of $f > -\infty$ we have $f^{**}(x) = \operatorname{cl}(f(x))$. Together with the former remark we get
$$\operatorname{cl}(f(x)) = \sup\{ux + \gamma \mid uy + \gamma \le f(y),\ \forall y\}, \qquad \forall x,$$
hence the assertion of Theorem 2.5.9 is proved.

c) Finally, if $f$ is an arbitrary function which admits an affine function smaller than $f$, then $f^{**}$ provides the so-called closed convex hull of $f$, i.e. the greatest closed convex function which is smaller than $f$.

In order to give some calculus rules on the conjugate of the sum of functions, we must introduce the notion of infimal convolution.

Definition 2.7.2. Let $f_1, f_2, \dots, f_m : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$; the function, denoted by $f_1 \oplus \dots \oplus f_m$ ($f_1 \oplus \dots \oplus f_m : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$) and defined as
$$(f_1 \oplus \dots \oplus f_m)(x) = \inf_{x^1 + \dots + x^m = x} \{f_1(x^1) + \dots + f_m(x^m)\},$$
is called the infimal convolution of $f_1, \dots, f_m$.

Theorem 2.7.5. Let $f_1, \dots, f_m : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$; then the following properties hold:
1) $(f_1 \oplus \dots \oplus f_m)^* = f_1^* + \dots + f_m^*$;
2) $(f_1 + \dots + f_m)^* \le f_1^* \oplus \dots \oplus f_m^*$;
3) If $f_1, \dots, f_m$ are proper convex functions with $\bigcap_i \operatorname{relint}(\operatorname{dom}(f_i)) \neq \emptyset$, then
$$(f_1 + \dots + f_m)^* = f_1^* \oplus \dots \oplus f_m^*.$$

Proof. The first and second assertions can be derived directly from the definitions of conjugation and infimal convolution. For the proof of assertion 3) see, e.g., Rockafellar (1970). □
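Property 1) of Theorem 2.7.5 can be checked on a simple pair. Taking (our choice) $f_1(x) = f_2(x) = x^2/2$ on $\mathbb{R}$, one finds $(f_1 \oplus f_2)(x) = x^2/4$ (the infimum is attained at $x^1 = x^2 = x/2$), whose conjugate is $u^2$, while $f_1^*(u) + f_2^*(u) = u^2/2 + u^2/2 = u^2$. The sketch below reproduces this with a grid-based infimal convolution and a grid-based conjugate; the grid bounds and step are assumptions, and the agreement is only approximate in general (it is exact here because the optimal points lie on the grid).

```python
def inf_conv(g1, g2, xs):
    """Grid infimal convolution: for each sum s = x1 + x2 with x1, x2 in xs,
    the minimum of g1(x1) + g2(x2)."""
    best = {}
    for x1 in xs:
        for x2 in xs:
            s = round(x1 + x2, 9)           # merge sums that differ only by rounding
            v = g1(x1) + g2(x2)
            if s not in best or v < best[s]:
                best[s] = v
    return best

def conj(samples, u):
    """Grid conjugate: max over sampled x of u*x - g(x); samples maps x -> g(x)."""
    return max(u * x - gx for x, gx in samples.items())

def f1(x):
    return 0.5 * x * x

def f2(x):
    return 0.5 * x * x

xs = [0.05 * k for k in range(-80, 81)]     # assumed grid on [-4, 4]
h = inf_conv(f1, f2, xs)                    # samples of f1 (+) f2 on [-8, 8]
g1 = {x: f1(x) for x in xs}
g2 = {x: f2(x) for x in xs}

for u in (-1.0, 0.5, 1.5):
    lhs = conj(h, u)                        # (f1 (+) f2)*(u)
    rhs = conj(g1, u) + conj(g2, u)         # f1*(u) + f2*(u)
    print(u, round(lhs, 6), round(rhs, 6))  # both should be close to u**2
```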
2.8. Extrema of Convex Functions

Convex functions (and their generalizations) play a central role in the analysis of extremum problems. The importance of convex functions in problems of this kind lies mainly in some basic properties, described in the following theorems.

Theorem 2.8.1. Let $f$ be a proper convex function on $\mathbb{R}^n$. Then every local minimum of $f$ in $\operatorname{dom}(f)$ is a global minimum of $f$ on $\mathbb{R}^n$.

Proof. If $x^* \in \operatorname{dom}(f)$ is a local minimum, then $f(x) \ge f(x^*)$ for all $x$ in a sufficiently small neighbourhood $N(x^*)$. Let $z$ be any point in $\mathbb{R}^n$. Then $(1 - \lambda)x^* + \lambda z \in N(x^*)$ for sufficiently small $\lambda \in (0, 1)$ and
$$f((1 - \lambda)x^* + \lambda z) \ge f(x^*).$$
Since $f$ is a proper convex function,
$$(1 - \lambda) f(x^*) + \lambda f(z) \ge f((1 - \lambda)x^* + \lambda z).$$
Adding the last two inequalities and dividing the result by $\lambda$, we obtain
$$f(z) \ge f(x^*),$$
that is, $x^*$ is a global minimum point. □
Corollary 2.8.1. Let $f$ be a proper convex function on $\mathbb{R}^n$ and let $X \subset \operatorname{dom}(f)$ be a convex set; then every local minimum point $x \in X$ of $f$ over $X$ is a global minimum point of $f$ over $X$.

Note that in general the minimal value of a convex function can be attained at more than one point. We will now show that the set of the minimizing points of a proper convex function is a convex set.

Theorem 2.8.2. Let $f$ be an extended convex function on $\mathbb{R}^n$. The set of points at which $f$ attains its minimum is convex.

Proof. Let $\alpha^*$ be the value of $f$ at the minimizing points. Then the set $L(f, \alpha^*) = \{x \mid x \in \mathbb{R}^n,\ f(x) \le \alpha^*\}$ is precisely the set of points at which $f$ attains its minimum, and by Theorem 2.5.3 this set is convex. □
Another result, quite important in some applications, is worth mentioning.

Theorem 2.8.3. Let $f : X \to \mathbb{R}$ be a strictly convex function on the convex set $X \subset \mathbb{R}^n$; if $f$ attains its minimum on $X$, this minimum is attained at a unique point of $X$.

Proof. Suppose that the minimum is attained at two distinct points $x^1 \in X$, $x^2 \in X$ and let $f(x^1) = f(x^2) = \alpha$. It follows from Theorem 2.8.2 that for every $\lambda \in [0, 1]$ we have $f(\lambda x^1 + (1 - \lambda)x^2) = \alpha$, contradicting the fact that $f$ is strictly convex. □
The following proposition generalizes a well-known property of differentiable convex functions.

Theorem 2.8.4. Let $f$ be an extended convex function and let $f(x^*)$ be finite; a necessary and sufficient condition for $x^*$ to be a minimum point for $f$ is that $0 \in \partial f(x^*)$.

Proof. By the definition of subgradient, $0 \in \partial f(x^*)$ if and only if $f(y) \ge f(x^*)$ for every $y \in \mathbb{R}^n$; that is, $x^*$ is a minimum point for $f$. According to Theorem 2.6.5, one has $0 \in \partial f(x^*)$ if and only if $f$ is finite at $x^*$ and $f'_+(x^*; y) \ge 0$, $\forall y$. □

Therefore, if $f$ is a differentiable (extended) convex function on $\mathbb{R}^n$, then $\nabla f(x^*) = 0$ if and only if $f$ attains its minimum at $x^*$. This result remains valid also if we replace $\mathbb{R}^n$ by some open convex subset $X$ of $\mathbb{R}^n$ such that $x^* \in X$. It also indicates the familiar fact that, in seeking the unconstrained minimum of a (twice differentiable) convex function, no second-order conditions need to be checked at the stationary points. Other remarkable theorems on the extrema of (extended) convex functions are given in Section 27 of Rockafellar's book on convex analysis (Rockafellar (1970)).
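Theorem 2.8.4 is easy to apply whenever $\partial f$ is explicit. As an illustration (our example, not from the text), for $f(x) = \sum_i |x - a_i|$ on $\mathbb{R}$ each term contributes $\{+1\}$, $\{-1\}$ or $[-1, 1]$ to $\partial f(x)$ according as $x > a_i$, $x < a_i$ or $x = a_i$, so
$$\partial f(x) = \big[\,n_<(x) - n_>(x) - n_=(x),\ n_<(x) - n_>(x) + n_=(x)\,\big],$$
where $n_<(x)$, $n_>(x)$, $n_=(x)$ count the data points below, above and equal to $x$. The sketch below computes this interval and tests $0 \in \partial f(x)$; the minimizers it finds are exactly the medians of the data.

```python
def subdifferential_abs_sum(x, a):
    """Interval (lo, hi) = subdifferential of f(t) = sum_i |t - a_i| at t = x."""
    below = sum(1 for ai in a if ai < x)   # terms with slope +1
    above = sum(1 for ai in a if ai > x)   # terms with slope -1
    ties = len(a) - below - above          # terms contributing [-1, 1]
    return (below - above - ties, below - above + ties)

def is_minimizer(x, a):
    """Theorem 2.8.4: x minimizes f if and only if 0 lies in the subdifferential at x."""
    lo, hi = subdifferential_abs_sum(x, a)
    return lo <= 0 <= hi

data = [1.0, 2.0, 7.0]       # odd count: the unique minimizer is the median 2
print([x for x in (0.0, 1.0, 2.0, 3.0, 7.0) if is_minimizer(x, data)])   # [2.0]

even = [1.0, 2.0, 7.0, 9.0]  # even count: every x in [2, 7] is a minimizer
print(is_minimizer(4.5, even), is_minimizer(1.5, even))                  # True False
```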
2.9. Systems of Convex Functions and Nonlinear Theorems of the Alternative

We conclude the topics related to convex functions with some results concerning systems of inequalities of convex functions which generate theorems of the alternative for the nonlinear case. The first result is due to Fan, Glicksberg and Hoffman (1957).

Theorem 2.9.1. Let $f_1, f_2, \dots, f_m$ be real-valued convex functions, all defined on the convex set $X \subset \mathbb{R}^n$, and let $g_1, g_2, \dots, g_k$ be affine functions on $\mathbb{R}^n$. If the system
xeX,
fi{x) 0). Indeed, because of the assumptions, also the system
/(x) 0, then it is JC{X^)
0 and from the fact that ui ^ 0, ...,t^m ^ 0, 2z 7^ 0, it results C{x) > 0, which is absurd. D
Theorem 2.9.2 holds also if the functions involved are defined on a solid (i.e. with a nonempty interior) convex set $X \subset \mathbb{R}^n$ and $x^0 \in \operatorname{int}(X)$. Let us now deduce the theorem of the alternative of Farkas-Minkowski from Theorem 2.9.2. It is quite evident that the systems
$$Ax = b, \quad x \ge 0 \tag{S_1}$$
and
$A^T y \ge 0$, $by$ … $k$ of $K$, there exist vectors $p$ and $q$ such that
$$p \ge 0, \quad (p, q) \neq 0, \quad pf(x) + qh(x) \ge 0, \qquad \forall x \in X.$$
If $K$ is empty, the last inequality becomes strict.

Proof. See Berge and Ghouila-Houri (1965) or Mangasarian (1969). □
A more general approach to the theorems of the alternative involving nonlinear systems has been proposed by Giannessi (1984) and Cambini (1986). Assume we are given the positive integers $n$ and $\nu$, the nonempty sets $\mathcal{H} \subset \mathbb{R}^\nu$, $X \subset \mathbb{R}^n$, $Z \subset \mathbb{R}$ and the function $F : X \to \mathbb{R}^\nu$. We want to study the conditions for the generalized system
$$F(x) \in \mathcal{H}, \qquad x \in X$$
to have (or not to have) a solution.

Definition 2.9.1. The function $w : \mathbb{R}^\nu \to \mathbb{R}$ is called a weak separation function if
$$\{h \in \mathbb{R}^\nu \mid w(h) \notin Z\} \supseteq \mathcal{H};$$
$s : \mathbb{R}^\nu \to \mathbb{R}$ is called a strong separation function if
$$\{h \in \mathbb{R}^\nu \mid s(h) \notin Z\} \subseteq \mathcal{H}.$$
The following theorems hold.

Theorem 2.9.4. Let the sets $\mathcal{H}$, $X$ and the function $F$ be given.
i) The systems
$$F(x) \in \mathcal{H},\ x \in X \qquad \text{and} \qquad w(F(x)) \in Z,\ \forall x \in X$$
are not simultaneously possible, whatever the weak separation function $w$ might be.
ii) The systems
$$F(x) \in \mathcal{H},\ x \in X \qquad \text{and} \qquad s(F(x)) \in Z,\ \forall x \in X$$
are not simultaneously impossible, whatever the strong separation function $s$ might be.

Proof. i) If $F(x) \in \mathcal{H}$, $x \in X$, is possible, i.e. if $\exists \bar{x} \in X$ such that $\bar{h} = F(\bar{x}) \in \mathcal{H}$, then we have $w(F(\bar{x})) = w(\bar{h}) \notin Z$, so that the second system of i) is false.
ii) If $F(x) \in \mathcal{H}$, $x \in X$, admits no solution, i.e. if $h = F(x) \notin \mathcal{H}$, $\forall x \in X$, then we have $s(F(x)) = s(h) \in Z$, $\forall x \in X$, so that the second system of ii) is true. This completes the proof. □

Many applications of the previous general theorem to nonlinear theorems of the alternative, optimality, regularity and duality conditions for mathematical programming problems are given in Cambini (1986), Giannessi (1984, 1987), Martein (1985), Pellegrini (1991), Tardella (1989). Here we report only the following one: consider the particular case where
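$$\nu = \ell + m; \quad \mathcal{H} = \{(u, w) \in \mathbb{R}^\ell \times \mathbb{R}^m : u \in \operatorname{int}(U),\ w \in V\}; \quad Z = (-\infty, 0]; \quad h = (u, w);$$
$$f : X \to \mathbb{R}^\ell; \quad g : X \to \mathbb{R}^m; \quad F(x) = (f(x), g(x)),$$
where the positive integers $\ell$ and $m$, the closed convex cones $U \subset \mathbb{R}^\ell$, $V \subset \mathbb{R}^m$, with $\operatorname{int}(U) \neq \emptyset$ (otherwise $\mathcal{H} = \emptyset$), and the functions $f$, $g$ are given.

Definition 2.9.2. Let $C$ be a convex cone. A function $F : X \to \mathbb{R}^\nu$ is said to be $C$-convex-like on the convex set $X \subset \mathbb{R}^n$ if $\forall x, y \in X$ there exists $z \in X$ such that $F(z) - (1 - \alpha)F(x) - \alpha F(y) \in C$, $\forall \alpha \in [0, 1]$.

Theorem 2.9.5. Let $F(x) = (f(x), g(x))$ be $\operatorname{cl}(\mathcal{H})$-convex-like.
i) If the system
$$f(x) \in \operatorname{int}(U); \quad g(x) \in V; \quad x \in X \tag{1}$$
is impossible, then there exist $\vartheta \in U^*$ and $\lambda \in V^*$, with $(\vartheta, \lambda) \neq 0$, such that
$$\vartheta f(x) + \lambda g(x) \le 0, \qquad \forall x \in X.$$
ii) If the previous inequality holds and moreover
$$\{x \in X \mid f(x) \in \operatorname{int}(U),\ g(x) \in V,\ \lambda g(x) = 0\} = \emptyset \quad \text{when } \vartheta = 0,$$
then system (1) is impossible.

Proof. See Cambini (1986), Giannessi (1984). □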
2.10. Generalized Convex Functions

Generalized convex functions play a very important role in optimization theory; historically the first type of generalized convex function was considered by De Finetti (1949), who first introduced the quasiconvex functions (a name given later by Fenchel (1953)). We have seen in Theorem 2.5.3 that a necessary, but not sufficient, condition for a function $f : X \to \mathbb{R}$ to be convex on the convex set $X \subset \mathbb{R}^n$ is that its lower-level sets
$$L(f, \alpha) = \{x \mid x \in X,\ f(x) \le \alpha\}, \qquad \alpha \in \mathbb{R},$$
are convex for every real number $\alpha$. A quasiconvex function on a convex set $X \subset \mathbb{R}^n$ is precisely a function characterized by the convexity of its lower-level sets. We therefore have the following definition.

Definition 2.10.1. The function $f : X \to \mathbb{R}$ is quasiconvex on the convex set $X \subset \mathbb{R}^n$ if $L(f, \alpha)$ is convex for every $\alpha \in \mathbb{R}$.

We shall see that an equivalent definition of a quasiconvex function $f : X \to \mathbb{R}$, where $X \subset \mathbb{R}^n$ is convex, is the following one:
$$x^1, x^2 \in X,\ f(x^1) \le f(x^2) \ \Longrightarrow\ f(\lambda x^1 + (1 - \lambda)x^2) \le f(x^2), \qquad \forall \lambda \in [0, 1],$$
or, equivalently, in a symmetric form:
$$f(\lambda x^1 + (1 - \lambda)x^2) \le \max\{f(x^1), f(x^2)\}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0, 1].$$
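The symmetric form suggests a simple sampled test: draw pairs $x^1, x^2$ and values $\lambda \in [0, 1]$ and look for a violation of $f(\lambda x^1 + (1 - \lambda)x^2) \le \max\{f(x^1), f(x^2)\}$. A violation proves that $f$ is not quasiconvex; the absence of violations proves nothing, since the definition involves infinitely many inequalities. The test functions in the sketch are our choices: $\sqrt{|x|}$ is quasiconvex on $\mathbb{R}$ (its lower-level sets are intervals) but not convex, whereas $\sin x$ is not quasiconvex on $[0, 2\pi]$ (e.g. $\sin(\pi/2) = 1 > \max\{\sin 0, \sin \pi\}$).

```python
import math
import random

def find_qc_violation(f, lo, hi, trials=20000, seed=0, tol=1e-12):
    """Search for x1, x2 in [lo, hi] and lam in [0, 1] violating
    f(lam*x1 + (1-lam)*x2) <= max(f(x1), f(x2)); returns a triple or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        if f(lam * x1 + (1 - lam) * x2) > max(f(x1), f(x2)) + tol:
            return x1, x2, lam
    return None

print(find_qc_violation(lambda x: math.sqrt(abs(x)), -4.0, 4.0))   # None
print(find_qc_violation(math.sin, 0.0, 2 * math.pi) is not None)   # True
```

The tolerance `tol` only guards against floating-point rounding of the convex combination; it is not part of the definition.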
Theorem 2.10.1. If $X \subset \mathbb{R}^n$ is a convex set and $f : X \to \mathbb{R}$, then the following conditions are equivalent:
i) $f$ is quasiconvex on $X$, i.e.
$$f(\lambda x^1 + (1 - \lambda)x^2) \le \max\{f(x^1), f(x^2)\}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0, 1].$$
ii) For each $x \in X$ and each $y \in \mathbb{R}^n$ the function $g_{x,y}(t) = f(x + ty)$ is quasiconvex on the interval $T_{x,y} = \{t \mid t \in \mathbb{R},\ x + ty \in X\}$.
iii) For each $x^1, x^2 \in X$ the function $h_{x^1,x^2}(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2)$ is quasiconvex on the interval $[0, 1]$.
iv) For each $\alpha \in \mathbb{R}$ the lower-level set $L(f, \alpha) = \{x \mid x \in X,\ f(x) \le \alpha\}$ is convex (recall that the empty set $\emptyset$ is convex by definition).
v) For each $\alpha \in \mathbb{R}$ the strict lower-level set $SL(f, \alpha) = \{x \mid x \in X,\ f(x) < \alpha\}$ is convex.
… $0$ and $\lambda x^1 + (1 - \lambda)x^2 \in SL(f, \alpha + \varepsilon)$, $\forall \varepsilon > 0$. Hence $f(\lambda x^1 + (1 - \lambda)x^2) < \alpha + \varepsilon$, $\forall \varepsilon > 0$, that is, $f(\lambda x^1 + (1 - \lambda)x^2) \le \alpha$. □

Another characterization of quasiconvexity for functions of one variable, in terms of unimodal functions, is given in Avriel, Diewert, Schaible and Zang (1987) and in Martos (1975). The following theorem gives the characterization of differentiable quasiconvex functions defined on an open convex set. This result is due to Arrow and Enthoven (1961).

Theorem 2.10.2. Let $X \subset \mathbb{R}^n$ be an open convex set and $f : X \to \mathbb{R}$ be differentiable on $X$; then $f$ is quasiconvex on $X$ if and only if
$$x^1, x^2 \in X,\ f(x^1) \le f(x^2) \ \Longrightarrow\ (x^1 - x^2)\nabla f(x^2) \le 0.$$

Proof. Let $f$ be quasiconvex and $f(x^1) \le f(x^2)$; then
$$f(x^2 + \lambda(x^1 - x^2)) = f(\lambda x^1 + (1 - \lambda)x^2) \le f(x^2), \qquad \forall \lambda \in [0, 1],$$
and thus
$$\lim_{\lambda \downarrow 0} \frac{f(x^2 + \lambda(x^1 - x^2)) - f(x^2)}{\lambda} = (x^1 - x^2)\nabla f(x^2) \le 0.$$
Conversely, let the condition of the theorem be fulfilled and $f(x^1) \le f(x^2)$. Assume, by contradiction, that there exists $\bar\lambda \in (0, 1)$ with $f(\bar\lambda x^1 + (1 - \bar\lambda)x^2) > f(x^2)$. This means that for the function
$$h(\lambda) = h_{x^1,x^2}(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2)$$
it holds
$$h(\bar\lambda) > h(0).$$
Then, because of the differentiability of $h$, it follows that $h$ is continuous and that the set $\{\lambda \in [0, \bar\lambda] \mid h(\lambda) = h(0)\}$ is closed and admits a maximum. Therefore there exists $\hat\lambda \in [0, \bar\lambda)$ such that $h(\hat\lambda) = h(0)$ and $h(\lambda) > h(0)$, $\forall \lambda \in (\hat\lambda, \bar\lambda]$. By the mean value theorem we have
$$h'(\lambda_0) > 0, \qquad \lambda_0 \in (\hat\lambda, \bar\lambda),$$
and $h(0) < h(\lambda_0)$. But this means that there exists a point $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$ between $x^1$ and $x^2$ with $f(x^0) = h(\lambda_0) > h(0) \ge h(1) = f(x^1)$. From the assumptions of the theorem, from $f(x^1) < f(x^0)$ it follows that
$$(x^1 - x^0)\nabla f(x^0) \le 0, \quad \text{i.e.} \quad (1 - \lambda_0)(x^1 - x^2)\nabla f(x^0) \le 0, \quad \text{i.e.} \quad h'(\lambda_0) \le 0,$$
in contradiction with the previous result. □
By the contraposition rule, the implication of Theorem 2.10.2 can be rewritten as
$$x^1, x^2 \in X,\ (x^1 - x^2)\nabla f(x^2) > 0 \ \Longrightarrow\ f(x^1) > f(x^2).$$
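Theorem 2.10.2 replaces the segment condition by a first-order one, which can also be sampled. For illustration we take (our choice) $f(x_1, x_2) = -x_1 x_2$ on the open convex set $X = \{x : x_1 > 0, x_2 > 0\}$, which is quasiconvex there because its lower-level sets $\{x \in X : x_1 x_2 \ge -\alpha\}$ are convex. The sketch searches a sampled box inside $X$ for pairs with $f(x^1) \le f(x^2)$ but $(x^1 - x^2)\nabla f(x^2) > 0$. As before, sampling can only refute quasiconvexity; for comparison, $f(x_1, x_2) = x_1 x_2$ is not quasiconvex on $X$ and a violation is found.

```python
import random

def find_first_order_violation(f, grad, trials=20000, seed=0, tol=1e-12):
    """Search (0.1, 5)^2 for points p, q with f(p) <= f(q) but
    (p - q) . grad f(q) > 0, which Theorem 2.10.2 forbids for quasiconvex f."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))
        q = (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))
        g = grad(q)
        slope = (p[0] - q[0]) * g[0] + (p[1] - q[1]) * g[1]
        if f(p) <= f(q) and slope > tol:
            return p, q
    return None

def neg_prod(x):
    return -x[0] * x[1]

def grad_neg_prod(x):
    return (-x[1], -x[0])

def prod(x):
    return x[0] * x[1]

def grad_prod(x):
    return (x[1], x[0])

print(find_first_order_violation(neg_prod, grad_neg_prod))          # None
print(find_first_order_violation(prod, grad_prod) is not None)      # True
```

For $-x_1 x_2$ no violation can exist: if $p_1 p_2 \ge q_1 q_2$, then $p_1 q_2 + p_2 q_1 \ge 2\sqrt{p_1 p_2 q_1 q_2} \ge 2 q_1 q_2$, which is exactly $(p - q)\nabla f(q) \le 0$.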
A function $f : X \to \mathbb{R}$ is defined to be quasiconcave on the convex set $X \subset \mathbb{R}^n$ if and only if $-f$ is quasiconvex on the same set $X$ (this kind of definition also holds for the other classes of generalized convex functions which will be introduced later). Another result on the characterization of continuous and, respectively, of differentiable quasiconvex functions is contained in the following theorem.

Theorem 2.10.3. a) Let $f : X \to \mathbb{R}$ be continuous on the convex set $X \subset \mathbb{R}^n$; then $f$ is quasiconvex on $X$ if and only if
$$x^1, x^2 \in X,\ f(x^1) < f(x^2) \ \Longrightarrow\ f(\lambda x^1 + (1 - \lambda)x^2) \le f(x^2), \qquad \forall \lambda \in [0, 1].$$
b) Let $f : X \to \mathbb{R}$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is quasiconvex on $X$ if and only if
$$x^1, x^2 \in X,\ f(x^1) < f(x^2) \ \Longrightarrow\ (x^1 - x^2)\nabla f(x^2) \le 0.$$

Proof. Obviously a) and b) are necessary for quasiconvexity; moreover, with a reasoning analogous to the proof of Theorem 2.10.2, we see that, under differentiability, a) and b) are equivalent. Now we show that a) is sufficient for the quasiconvexity of $f$; this means that the implication of a) holds even if $f(x^1) = f(x^2)$. Assume there exists $\lambda \in (0, 1)$ with $f(\lambda x^1 + (1 - \lambda)x^2) > f(x^1) = f(x^2)$. Because of the continuity of $f$, we can also find a point $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$ between $x^1$ and $\lambda x^1 + (1 - \lambda)x^2$, i.e. $\lambda_0 \in (\lambda, 1)$, such that
$$f(\lambda x^1 + (1 - \lambda)x^2) > f(x^0) > f(x^1) = f(x^2).$$
But this contradicts a). In fact, since $\lambda x^1 + (1 - \lambda)x^2$ lies between $x^0$ and $x^2$, according to
$$\lambda x^1 + (1 - \lambda)x^2 = \frac{\lambda}{\lambda_0} x^0 + \Big(1 - \frac{\lambda}{\lambda_0}\Big)x^2, \qquad \frac{\lambda}{\lambda_0} \in (0, 1),$$
and $f(x^2) < f(x^0)$, we should have $f(\lambda x^1 + (1 - \lambda)x^2) \le f(x^0)$. □
All the given definitions of quasiconvexity are in general rather difficult to verify, as they involve infinitely many inequalities. However, if the function is twice continuously differentiable on $X$, we have necessary and sufficient conditions for quasiconvexity which are easier to verify. We give separately the necessary conditions and the sufficient conditions for the quasiconvexity of a twice continuously differentiable function.

Theorem 2.10.4. Let $f : X \to \mathbb{R}$ be twice continuously differentiable on the open convex set $X \subset \mathbb{R}^n$. The following condition is necessary for $f$ to be quasiconvex on $X$:
$$y \in \mathbb{R}^n,\ x \in X,\ y\nabla f(x) = 0 \ \Longrightarrow\ yHf(x)y \ge 0.$$
Proof. Let $f$ be quasiconvex and assume that there exist $x^0$, $y^0$ with $y^0 \nabla f(x^0) = 0$ and $y^0 Hf(x^0) y^0 < 0$. Without loss of generality we can set $\|y^0\| = 1$. Because of the continuity of the elements of the Hessian, there exists a neighbourhood $U_\delta(x^0)$ such that $y^0 Hf(x) y^0 < 0$ for each $x \in U_\delta(x^0)$. In particular, $y^0 Hf(x^0 \pm \lambda y^0) y^0 < 0$, $\forall \lambda \in [0, \delta]$. Using Taylor's expansion formula, this allows us to obtain $f(x^0 +{}$ … $0$, when $|\alpha|$ is close to zero, we can see that $\beta''(\alpha) \ge 0$ for $|\alpha|$ sufficiently small and therefore (since $\beta'(0) = 0$) $\beta(\alpha) \ge \beta(0) = 0$ for $|\alpha|$ sufficiently small. Now, since $[\nabla f(x^0)]^T \nabla f(x^0) > 0$, for $\lambda \in [0, 1]$ with $|\lambda - \lambda_0|$ sufficiently small we have $[\nabla f(x^0 + (\lambda - \lambda_0)(x^1 - x^2))](-\nabla f(x^0)) < 0$ and, taking (2) and (4) into account and with $\beta(\lambda - \lambda_0) \ge 0$,
$$h(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2) = f(x^0 + (\lambda - \lambda_0)(x^1 - x^2)) \le f\big(x^0 + (\lambda - \lambda_0)(x^1 - x^2) + \beta(\lambda - \lambda_0)\nabla f(x^0)\big) = f(x^0) = h(\lambda_0).$$
However, this holds for each maximum point $\lambda_0$ of $h$; thus $h$ is constant on $[0, 1]$ and $h(\lambda) = h(\lambda_0)$ for each $\lambda \in [0, 1]$, which is in contradiction to (1). Thus $f$ is quasiconvex (see also Figure 7). □

We note that the conditions expressed by Theorem 2.10.5 only require the positive semidefiniteness of the Hessian matrix of $f(x)$ on the hyperplane $\{y \mid y\nabla f(x) = 0\}$. The first results for the quasiconvexity of twice continuously differentiable functions are due to Arrow and Enthoven (1961); Theorem 2.10.5 generalizes a result of Katzner (1970). For other papers on quadratic forms subject to a system of linear equalities see, e.g., Debreu (1952), Bellman (1960), Mann (1943), Diewert, Avriel and Zang (1981), Crouzeix and Ferland (1982). We list the main results. The following theorem gives an equivalent condition for the positive semidefiniteness of a quadratic form on a hyperplane.

[Figure 7: the curve $x(\alpha) = x^0 + \alpha(x^1 - x^2) + \beta(\alpha)\nabla f(x^0)$, along which $f(x(\alpha)) = f(x^0)$.]
Theorem 2.10.6. Let $A$ be a symmetric matrix and $a \neq 0$ a vector; then the following conditions are equivalent:
(1) $yAy \ge 0$ for each $y$ such that $ya = 0$.
(2) The equation (of degree $n - 1$ in $\lambda$)
$$\det \begin{pmatrix} A - \lambda I & a \\ a^T & 0 \end{pmatrix} = 0$$
admits only nonnegative roots.

Proof. See, e.g., Hancock (1960) or the mathematical appendix of Samuelson (1947). □
So we have the following

Corollary 2.10.1. Let $f : X \to \mathbb{R}$ be twice continuously differentiable on the open convex set $X \subset \mathbb{R}^n$ and let $\nabla f(x) \neq 0$, $\forall x \in X$; then $f$ is quasiconvex on $X$ if and only if the equation (of degree $n - 1$ in $\lambda$)
$$\det \begin{pmatrix} Hf(x) - \lambda I & \nabla f(x) \\ [\nabla f(x)]^T & 0 \end{pmatrix} = 0$$
admits only nonnegative roots for each $x \in X$.

Other interesting results on the positive semidefiniteness of a symmetric matrix $A$ of order $n$ on the hyperplane $\{y \mid ya = 0\}$, $a, y \in \mathbb{R}^n$, $a \neq 0$, are given in the quoted paper of Crouzeix and Ferland, who prove that the following conditions are equivalent:
C1. $ya = 0$ implies $yAy \ge 0$.
C2. Either $A$ is positive semidefinite, or $A$ has one simple negative eigenvalue and there exists a vector $b \in \mathbb{R}^n$ such that $Ab = a$ and $ab \le 0$.
C3. The bordered matrix
$$A_a = \begin{pmatrix} A & a \\ a^T & 0 \end{pmatrix}$$
has one simple negative eigenvalue.
C4. For all nonempty subsets $Q \subset \{1, 2, \dots, n\}$
$$\det \begin{pmatrix} A_Q & a_Q \\ a_Q^T & 0 \end{pmatrix} \le 0,$$
where $A_Q$ is obtained from $A$ by deleting rows and columns whose indices are not in $Q$ and $a_Q$ is obtained analogously from $a$ (i.e. the so-called "bordered principal minors" of the matrix $A$ are all nonpositive).

The same authors show that the following conditions are equivalent.
D1. There exists a scalar $\lambda$, $0 \le \lambda < +\infty$, such that $A + \lambda aa^T$ is positive semidefinite.
D2. Either $A$ is positive semidefinite, or $A$ has one simple negative eigenvalue and there exists a vector $b \in \mathbb{R}^n$ such that $Ab = a$ and $ab < 0$.
D3. For all nonempty subsets $Q \subset \{1, 2, \dots, n\}$ we have
$$\det \begin{pmatrix} A_Q & a_Q \\ a_Q^T & 0 \end{pmatrix} \le 0,$$
and if equality holds, then $\det(A_Q) \ge 0$.

Obviously the D-conditions are sufficient for the C-conditions. We shall be more precise on the result of Theorem 2.10.5 when we introduce convex and generalized convex functions at a point $x^0$ (Section 2.14). Under the assumption $\nabla f(x) \neq 0$, $\forall x \in X$, $X \subset \mathbb{R}^n$ open and convex, the conditions imposed by Theorem 2.10.5 are therefore necessary and sufficient for the quasiconvexity of $f \in C^2(X)$. Obviously we may have quasiconvex functions with a zero gradient at some points. Consider, e.g., the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as $f(x_1, x_2) = (x_1)^{\dots} + (x_2)^{\dots}$, which is quasiconvex on $\mathbb{R}^2$ but with $\nabla f(x^0) = 0$ at $x^0 = (0, 0)$. A rather strong sufficient condition (including also the case $\nabla f(x) = 0$) for the quasiconvexity of a twice continuously differentiable function is contained in the following result.

Theorem 2.10.7. If $f : X \to \mathbb{R}$ is twice continuously differentiable on the open convex set $X \subset \mathbb{R}^n$, then $f$ is quasiconvex on $X$ if
$$x \in X,\ y \neq 0,\ y\nabla f(x) = 0 \ \Longrightarrow\ yHf(x)y > 0. \tag{5}$$

Proof. Let $f(x^1) \le f(x^2)$ and assume on the contrary that $(x^1 - x^2)\nabla f(x^2) > 0$. Again we consider the function
$$h(\lambda) = h_{x^1,x^2}(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2).$$
By the Weierstrass theorem, there exists $\lambda_0 \in [0, 1]$ such that $h(\lambda_0)$ is the maximum of $h$ on $[0, 1]$. We cannot have $\lambda_0 = 0$, as this would give $(x^1 - x^2)\nabla f(x^2) = h'(0) \le 0$, in contrast with the assumption, and we cannot have $\lambda_0 = 1$, since $h(1) = f(x^1) \le f(x^2) = h(0)$ and $\lambda_0 = 0$ cannot be the maximum point. So we have proved that there exists $\lambda_0 \in (0, 1)$ for which, setting $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$, we have $(x^1 - x^2)\nabla f(x^0) = h'(\lambda_0) = 0$. Taking the condition of the theorem into account, we get
$$(x^1 - x^2) Hf(x^0) (x^1 - x^2) > 0.$$
Obviously, there exists a neighbourhood $U(x^0)$ such that
$$(x^1 - x^2) Hf(x) (x^1 - x^2) > 0 \qquad \text{for each } x \in U(x^0).$$
In particular we have, for $t \neq 0$ with $|t|$ sufficiently small,
$$h(\lambda_0 + t) - h(\lambda_0) = f(x^0 + t(x^1 - x^2)) - f(x^0) = \tfrac{1}{2} t^2 (x^1 - x^2) Hf(x^0 + \delta t(x^1 - x^2))(x^1 - x^2) > 0,$$
where $\delta \in (0, 1)$. This is a contradiction, since $\lambda_0$ was assumed to be a maximum point of $h$ on $[0, 1]$. □
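Condition C4 of Crouzeix and Ferland, quoted above, is finite and purely algebraic, so it lends itself to a direct check. The sketch below (our code) enumerates all nonempty index sets $Q$, builds the bordered principal submatrix and tests the sign of its determinant in exact rational arithmetic (`fractions.Fraction`), so that the sign test is not affected by rounding. The two test pairs are our choices: for $A = \operatorname{diag}(1, -1)$, $a = (1, 1)$ the form vanishes on $a^\perp = \operatorname{span}\{(1, -1)\}$, so C1 holds; for $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $a = (1, 1)$ one has $yAy = -2$ at $y = (1, -1)$, so C1 fails. Enumerating all $2^n - 1$ subsets is only practical for small $n$.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result

def bordered_minor(A, a, Q):
    """det [[A_Q, a_Q], [a_Q^T, 0]] for the index set Q."""
    rows = [[A[i][j] for j in Q] + [a[i]] for i in Q]
    rows.append([a[j] for j in Q] + [0])
    return det(rows)

def condition_c4(A, a):
    """C4: every bordered principal minor is nonpositive."""
    n = len(a)
    return all(bordered_minor(A, a, Q) <= 0
               for k in range(1, n + 1) for Q in combinations(range(n), k))

print(condition_c4([[1, 0], [0, -1]], [1, 1]))   # True  (C1 holds)
print(condition_c4([[0, 1], [1, 0]], [1, 1]))    # False (C1 fails at y = (1, -1))
```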
It must be noted that implication (5) is taken by Diewert, Avriel and Zang (1981) as a characterization of another class of generalized convex functions: the (twice continuously differentiable) strongly pseudoconvex functions; we shall see later that implication (5) is also sufficient for the strict pseudoconvexity of $f \in C^2$. So any condition ensuring the positive definiteness of $Hf(x)$ on the subspace orthogonal to $\nabla f(x) \neq 0$ is a sufficient condition not only for the quasiconvexity of $f$, but also for other classes of generalized convex functions contained in the class of quasiconvex functions, such as the pseudoconvex functions and the strictly pseudoconvex functions. Among these conditions we quote the following, expressed in terms of bordered determinants and due to Arrow and Enthoven (1961), perhaps historically the first sufficient conditions for the quasiconvexity of a twice continuously differentiable function: for each $x \in X$ it is $\Delta_k(x) < 0$, $k = 1, 2, \dots, n$, where
$$\Delta_k(x) = \det \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_k} & \dfrac{\partial f}{\partial x_1} \\ \vdots & \vdots & & \vdots & \vdots \\ \dfrac{\partial^2 f}{\partial x_k \partial x_1} & \dfrac{\partial^2 f}{\partial x_k \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_k \partial x_k} & \dfrac{\partial f}{\partial x_k} \\ \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_k} & 0 \end{pmatrix}.$$
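The Arrow-Enthoven test only needs the leading bordered minors $\Delta_1(x), \dots, \Delta_n(x)$. The sketch below (our code) evaluates them from a gradient and Hessian supplied in closed form and checks $\Delta_k(x) < 0$ for all $k$; since the condition must hold for every $x \in X$, the code can only check it at the sample points given. For $f(x_1, x_2) = -x_1 x_2$ (our example) on $x_1, x_2 > 0$ one finds by hand $\Delta_1 = -x_2^2$ and $\Delta_2 = -2x_1x_2$, both negative, so the sufficient condition holds there; for $f = x_1 x_2$ one gets $\Delta_2 = 2x_1x_2 > 0$ and the test fails.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small k)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def arrow_enthoven_minors(grad, hess):
    """Delta_k = det [[H_k, g_k], [g_k^T, 0]], k = 1..n, with H_k, g_k leading blocks."""
    n = len(grad)
    minors = []
    for k in range(1, n + 1):
        rows = [list(hess[i][:k]) + [grad[i]] for i in range(k)]
        rows.append(list(grad[:k]) + [0.0])
        minors.append(det(rows))
    return minors

def neg_product_data(x1, x2):
    """Gradient and Hessian of f(x1, x2) = -x1*x2."""
    return [-x2, -x1], [[0.0, -1.0], [-1.0, 0.0]]

for point in [(1.0, 2.0), (0.5, 3.0), (4.0, 0.25)]:
    g, H = neg_product_data(*point)
    d = arrow_enthoven_minors(g, H)
    print(point, d, all(v < 0 for v in d))
```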
See also Ferland (1972b). Another characterization of twice continuously differentiable quasiconvex functions is given by Diewert, Avriel and Zang (1981), who introduce the concept of semistrict local maximum. Following Martos (1967, 1975) we say that a function $f : X \to \mathbb{R}$ is quasimonotone or quasilinear on the convex set $X \subset \mathbb{R}^n$ if $f$ is both quasiconvex and quasiconcave on $X$. The term "quasilinear" is preferable, in order to avoid confusion with another concept (see Section 2.12 of this chapter). Equivalently, $f$ is quasilinear on $X$ if for any points $x^1, x^2 \in X$ and $x^0 = \lambda x^1 + (1 - \lambda)x^2$, $0 < \lambda < 1$, we have
$$\min\{f(x^1), f(x^2)\} \le f(x^0) \le \max\{f(x^1), f(x^2)\}.$$
Again, $f : X \to \mathbb{R}$ is quasilinear on the convex set $X \subset \mathbb{R}^n$ if and only if all its lower and upper level sets $L(f, \alpha)$ and $U(f, \alpha)$ (equivalently: all its strict lower and upper level sets $SL(f, \alpha)$, $SU(f, \alpha)$) are convex for every $\alpha \in \mathbb{R}$. This result, which follows directly from the definition of quasilinear functions, implies that the level sets, i.e. the sets
$$Y(f, \alpha) = \{x \mid x \in X,\ f(x) = \alpha\},$$
of a quasilinear function are convex for all $\alpha \in \mathbb{R}$. The converse of this statement, however, is not true: a function with convex level sets need not be quasilinear. Consider the following example:
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x \le 1, \\ 2, & \text{if } 1 < x \le 2, \\ 1, & \text{if } 2 < x \le 3. \end{cases}$$
The set $Y(f, \alpha)$ is empty for $\alpha \neq 0, 1, 2$; for $\alpha = 0$ it is $Y(f, 0) = [0, 1]$; for $\alpha = 1$ we have $Y(f, 1) = (2, 3]$; for $\alpha = 2$ we have $Y(f, 2) = (1, 2]$. Thus $Y(f, \alpha)$ is convex for all $\alpha$, but $f$ is not quasiconvex (e.g. $f(1.5) = 2 > \max\{f(1), f(3)\} = 1$) and hence not quasilinear. Note that $f$ is not continuous. In fact, continuity ensures the converse statement, as shown by the following result, due to Greenberg and Pierskalla (1971).

Theorem 2.10.8. If the level sets $Y(f, \alpha)$ are convex for every $\alpha \in \mathbb{R}$ and $f$ is continuous on the convex set $X \subset \mathbb{R}^n$, then $f$ is quasilinear on $X$.

Proof. We show the quasiconvexity of $f$. Let $f(x^1) \le f(x^2)$ and assume, by contradiction, that there exists a number $\lambda \in (0, 1)$ such that
$$f(\lambda x^1 + (1 - \lambda)x^2) > f(x^2).$$
Because of the continuity of $f$ we can find a point $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$ between $x^1$ and $\lambda x^1 + (1 - \lambda)x^2$, i.e. a number $\lambda_0 \in (\lambda, 1]$, such that
$$f(x^0) = f(x^2) < f(\lambda x^1 + (1 - \lambda)x^2).$$
But this contradicts the condition of the theorem. In fact, since $\lambda x^1 + (1 - \lambda)x^2$ lies between $x^0$ and $x^2$, according to
$$\lambda x^1 + (1 - \lambda)x^2 = \frac{\lambda}{\lambda_0} x^0 + \Big(1 - \frac{\lambda}{\lambda_0}\Big)x^2, \qquad \frac{\lambda}{\lambda_0} \in [0, 1],$$
and $f(x^0) = f(x^2)$, taking into account the convexity of the level sets of $f$ we should have
$$f(\lambda x^1 + (1 - \lambda)x^2) = f(x^0) = f(x^2).$$
Thus $f$ is quasiconvex. Since the condition is fulfilled also for $-f$, we can state that $-f$ is quasiconvex too, i.e. $f$ is quasilinear. □
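The claims made about the step-function example preceding Theorem 2.10.8 can be verified mechanically. The sketch below (our code) encodes $f$, records each level set as an interval (hence a convex set) together with the closedness of its endpoints, confirms on a fine grid that $f(x) = \alpha$ exactly on the recorded interval, and exhibits the violation of quasiconvexity at $x^1 = 1$, $x^2 = 3$, $\lambda = 3/4$, where $\lambda x^1 + (1 - \lambda)x^2 = 1.5$.

```python
def f(x):
    """Piecewise-constant example on [0, 3] whose level sets are all intervals."""
    if 0 <= x <= 1:
        return 0
    if 1 < x <= 2:
        return 2
    if 2 < x <= 3:
        return 1
    raise ValueError("x outside [0, 3]")

# Level sets Y(f, alpha) as (left, right, left_closed, right_closed).
level_sets = {0: (0, 1, True, True), 2: (1, 2, False, True), 1: (2, 3, False, True)}

def in_interval(x, iv):
    left, right, left_closed, right_closed = iv
    after_left = left < x or (left_closed and x == left)
    before_right = x < right or (right_closed and x == right)
    return after_left and before_right

grid = [k / 100 for k in range(301)]
consistent = all((f(x) == alpha) == in_interval(x, iv)
                 for x in grid for alpha, iv in level_sets.items())
print(consistent)                          # True: each Y(f, alpha) is the interval listed

x1, x2, lam = 1.0, 3.0, 0.75
mid = lam * x1 + (1 - lam) * x2            # 1.5
print(mid, f(mid), max(f(x1), f(x2)))      # 1.5 2 1: quasiconvexity fails
```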
If $f$ is differentiable, quasilinearity is characterized as follows.

Theorem 2.10.9. Let $f$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is quasilinear on $X$ if and only if
$$x^1, x^2 \in X,\ f(x^1) = f(x^2) \ \Longrightarrow\ (x^1 - x^2)\nabla f(x^2) = 0.$$

Proof. 1) Let $f$ be quasilinear and $f(x^1) = f(x^2)$; then by Theorem 2.10.2, applied to $f$ and to $-f$, we have $(x^1 - x^2)\nabla f(x^2) = 0$.
2) We now show quasiconvexity. Let $f(x^1) \le f(x^2)$ and assume $(x^1 - x^2)\nabla f(x^2) > 0$. Even if $f(x^1) < f(x^2)$, thanks to Darboux's theorem applied to the function $h(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2)$ on the interval $[0, 1]$, we can find $x^0 \in [x^1, x^2)$ such that $f(x^0) = f(x^2)$. By the assumption of the theorem, we then have $(x^0 - x^2)\nabla f(x^2) = 0$. It follows that, writing $x^0 = \lambda_0 x^1 + (1 - \lambda_0)x^2$ with $\lambda_0 \in (0, 1]$,
$$0 = [\lambda_0 x^1 + (1 - \lambda_0)x^2 - x^2]\nabla f(x^2) = \lambda_0 (x^1 - x^2)\nabla f(x^2),$$
in contradiction with the assumption. This means that $f$ is quasiconvex; since the condition of the theorem is also satisfied by $-f$, the function $-f$ is quasiconvex too, and therefore $f$ is quasilinear. □

For other insights on quasilinear functions, see Martos (1975), Thompson and Parke (1973). Other types of generalized convex functions we shall examine are the strictly quasiconvex functions, the semistrictly quasiconvex functions, the pseudoconvex functions and the strictly pseudoconvex functions.

Definition 2.10.2 (Ponstein (1967), Katzner (1970)). A function $f : X \to \mathbb{R}$, defined on the convex set $X \subset \mathbb{R}^n$, is called strictly quasiconvex on $X$ if
$$x^1, x^2 \in X,\ x^1 \neq x^2,\ f(x^1) \le f(x^2) \ \Longrightarrow\ f(\lambda x^1 + (1 - \lambda)x^2) < f(x^2), \qquad \forall \lambda \in (0, 1),$$
or, equivalently:
$$x^1, x^2 \in X,\ x^1 \neq x^2,\ \lambda \in (0, 1): \quad f(\lambda x^1 + (1 - \lambda)x^2) < \max\{f(x^1), f(x^2)\}.$$
Strictly quasiconvex functions are also called "strongly quasiconvex functions" by Avriel (1976), "unnamed convex functions" by Ponstein (1967) and "X-convex functions" by Thompson and Parke (1973). It is easy to see that, as a consequence of the definitions, a strictly quasiconvex function is quasiconvex and that a strictly convex function is strictly quasiconvex. However, not every convex function is strictly quasiconvex; for example, a convex function with a "flat region" cannot be strictly quasiconvex. For differentiable functions we have the following characterization, due to Diewert, Avriel and Zang (1981).

Theorem 2.10.10. Let $f : X \to \mathbb{R}$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is strictly quasiconvex on $X$ if and only if
$$x \in X,\ y \in \mathbb{R}^n,\ y \neq 0,\ y\nabla f(x) = 0 \ \Longrightarrow\ g_{x,y}(t) = f(x + ty),$$
defined for $t \in T_{x,y}$, does not attain a local maximum at $t = 0$.

Proof. i) Let $f$ be strictly quasiconvex, $y \neq 0$ and $y\nabla f(x) = 0$, and assume that the conditions of the theorem are not fulfilled, i.e. that $g_{x,y}$ attains a local maximum at $t = 0$. Then, with $t > 0$ sufficiently small, we have $x \pm 2ty \in X$, $f(x \pm 2ty) \le f(x)$ and, according to the definition of strict quasiconvexity, $f(x \pm ty) < f(x)$; but then
$$f(x) < \max\{f(x - ty), f(x + ty)\} < f(x),$$
which is absurd.
ii) Let $x^1, x^2 \in X$, $x^1 \neq x^2$, let the conditions of the theorem be fulfilled and let $f(x^1) \le f(x^2)$. Assume there exists a number $\lambda \in (0, 1)$ with $f(\lambda x^1 + (1 - \lambda)x^2) \ge f(x^2)$; then, thanks to the Weierstrass theorem, we can also find a number $\lambda_0 \in (0, 1)$ at which the function
$$g(\lambda) = f(\lambda x^1 + (1 - \lambda)x^2) = f(x^2 + \lambda(x^1 - x^2))$$
attains its maximum on the interval $[0, 1]$. Hence
$$g'(\lambda_0) = (x^1 - x^2)\nabla f(x^2 + \lambda_0(x^1 - x^2)) = 0,$$
and so we get a contradiction to the conditions of the theorem. □
Definition 2.10.3 (Elkin (1968), Ginsberg (1973), Diewert, Avriel and Zang (1981)). A function $f : X \to \mathbb{R}$, defined on the convex set $X \subset \mathbb{R}^n$, is called semistrictly quasiconvex on $X$ if
$$x^1, x^2 \in X,\ f(x^1) < f(x^2) \ \Rightarrow\ f(\lambda x^1 + (1-\lambda)x^2) < f(x^2), \quad \forall \lambda \in (0,1),$$
or equivalently if
$$x^1, x^2 \in X,\ f(x^1) \neq f(x^2):\quad f(\lambda x^1 + (1-\lambda)x^2) < \max\{f(x^1), f(x^2)\}, \quad \forall \lambda \in (0,1).$$
Beware! Semistrictly quasiconvex functions are called "strictly quasiconvex" by Mangasarian (1969), Karamardian (1967), Thompson and Parke (1973) and Avriel (1976). They are called "functionally convex" by Hanson (1964). Moreover, Martos (1965, 1975) calls "explicitly quasiconvex" the quasiconvex functions satisfying Definition 2.10.3. We have adopted a widely used terminology. It is motivated by the fact that semistrictly quasiconvex functions satisfying certain continuity properties occupy an intermediate position between strictly quasiconvex and quasiconvex functions (see Theorem 2.10.11 below). It follows immediately from the definitions that a strictly quasiconvex function is also semistrictly quasiconvex; similarly, a convex function is semistrictly quasiconvex. For other characterizations of differentiable semistrictly quasiconvex functions, see Diewert, Avriel and Zang (1981).
We have already observed that, under certain continuity properties, semistrictly quasiconvex functions occupy an intermediate position between strictly quasiconvex and quasiconvex functions. Without these continuity properties, semistrictly quasiconvex functions need not be quasiconvex. Consider, for example, the following function, defined on $\mathbb{R}$:
$$f(x) = \begin{cases} 1, & x = 0, \\ 0, & x \neq 0. \end{cases}$$
This function is semistrictly quasiconvex on $\mathbb{R}$, but not quasiconvex (take $x^1 = -1$, $x^2 = 1$, $\lambda = \tfrac12$: $f(0) = 1 > 0 = \max\{f(-1), f(1)\}$). If we require $f$ to be lower semicontinuous, then we have the following result, established by Karamardian (1967).

Theorem 2.10.11. Let $f : X \to \mathbb{R}$ be a lower semicontinuous function on the convex set $X \subset \mathbb{R}^n$; if $f$ is semistrictly quasiconvex on $X$, then $f$ is quasiconvex on $X$, but not conversely.

Proof. By Definition 2.10.3 we have
$$f(x^1) < f(x^2),\ \lambda \in (0,1) \ \Rightarrow\ f(\lambda x^1 + (1-\lambda)x^2) < f(x^2).$$
Hence, if $f(x^1) < f(x^2)$, there is nothing to prove. Now assume that $f(x^1) = f(x^2)$. We will show, by contradiction, that there exists no $\bar x = \lambda x^1 + (1-\lambda)x^2$, $\lambda \in (0,1)$, such that $f(x^2) < f(\bar x)$; this will establish the quasiconvexity of $f$.

Let $\bar x$ belong to the open line segment $(x^1, x^2)$ and satisfy $f(x^2) < f(\bar x)$. Then $\bar x \in \Omega = \{x \mid f(x^2) < f(x),\ x \in (x^1, x^2)\}$. Since $f$ is lower semicontinuous on $X$, $\Omega$ is open relative to $(x^1, x^2)$; hence there exists an $\hat x \in (\bar x, x^1) \cap \Omega$. We have $\bar x, \hat x \in \Omega$, $\bar x$ lies between $\hat x$ and $x^2$, and $\hat x$ lies between $x^1$ and $\bar x$. Therefore, by the semistrict quasiconvexity of $f$,
$$f(x^2) < f(\hat x) \ \Rightarrow\ f(\bar x) < f(\hat x) \qquad \text{and} \qquad f(x^1) = f(x^2) < f(\bar x) \ \Rightarrow\ f(\hat x) < f(\bar x),$$
a contradiction. Hence no such $\bar x$ exists, and $f$ is therefore quasiconvex on $X$.
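That the converse is not true follows from the following example:
$$f(x) = \begin{cases} x, & x \le 0, \\ 0, & 0 < x < 1, \\ x - 1, & x \ge 1. \end{cases}$$
This function is quasiconvex on $\mathbb{R}$. However, if we take $x^1 = -\tfrac12$, $x^2 = \tfrac12$, $\lambda = \tfrac12$, then $f(x^1) < f(x^2)$ but $f(\lambda x^1 + (1-\lambda)x^2) = f(0) = f(x^2)$, so $f$ is not semistrictly quasiconvex. □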
An important property of a differentiable convex function is that a stationary point is also a global minimum point; however, this useful property is not restricted to (differentiable) convex functions. The family of pseudoconvex functions has this property and strictly includes the family of differentiable convex functions. It was introduced by Mangasarian (1965) and, under the name of semiconvex functions, by Tuy (1964).

Definition 2.10.4. Let $f : X \to \mathbb{R}$ be differentiable on the open set $X \subset \mathbb{R}^n$; then $f$ is pseudoconvex on $X$ if
$$x^1, x^2 \in X,\ f(x^1) < f(x^2) \ \Rightarrow\ (x^1 - x^2)\nabla f(x^2) < 0,$$
or equivalently if
$$x^1, x^2 \in X,\ (x^1 - x^2)\nabla f(x^2) \ge 0 \ \Rightarrow\ f(x^1) \ge f(x^2).$$
From this definition it is clear that, if $f$ is pseudoconvex and $\nabla f(x^0) = 0$, then $x^0$ is a global minimum point of $f$ over $X$. Pseudoconvexity plays a key role in obtaining sufficient optimality conditions for a nonlinear programming problem. If a differentiable objective function can be shown or assumed to be pseudoconvex, then the usual first-order stationarity conditions yield a global minimum point. Other characterizations of pseudoconvex functions on open convex sets are due to Diewert, Avriel and Zang (1981) and to Crouzeix and Ferland (1982).

Theorem 2.10.12. Let $f : X \to \mathbb{R}$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is pseudoconvex on $X$ if and only if
$$x \in X,\ y \neq 0,\ y\nabla f(x) = 0 \ \Rightarrow\ g(t) = f(x + ty),$$
defined for $t$ such that $x + ty \in X$, attains a local minimum at $t = 0$.

Proof. i) Let $f$ be pseudoconvex and $y\nabla f(x) = 0$. Then $ty\nabla f(x) = 0 \ge 0$ for every $t$, and pseudoconvexity gives $f(x + ty) \ge f(x)$, i.e. $g(t) = f(x + ty)$ attains a minimum at $t = 0$.
ii) Let the condition of the theorem be fulfilled and let $(x^1 - x^2)\nabla f(x^2) \ge 0$. We have to show that $f(x^1) \ge f(x^2)$. Consider the function $g(t) = f(x^2 + t(x^1 - x^2))$, $t \in [0,1]$, and suppose, by contradiction, that $f(x^1) < f(x^2)$, i.e. $g(1) < g(0)$.

If $g'(0) > 0$, then $g$ has a local maximum point $t_0 \in (0,1)$, and therefore $g'(t_0) = 0$, i.e. $(x^1 - x^2)\nabla f(x^2 + t_0(x^1 - x^2)) = 0$. However, $t_0$ is not a local minimizer for $g$, which contradicts the assumptions.

If $g'(0) = 0$, i.e. $(x^1 - x^2)\nabla f(x^2) = 0$, then $t = 0$ is, by assumption, a local minimum point for $g$, i.e. $x^2$ is a local minimum point of $f$ along the segment $[x^2, x^1]$. This, together with the assumption $g(1) < g(0)$, implies that $g$ has a local maximum point $t_0 \in (0,1)$. Therefore $g'(t_0) = 0$, and again we have a contradiction with the assumptions of the theorem. □
Theorem 2.10.13. Let $f : X \to \mathbb{R}$ be twice-continuously differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is pseudoconvex on $X$ if and only if, for all $x \in X$:
i) $y\nabla f(x) = 0 \Rightarrow yHf(x)y \ge 0$, and
ii) whenever $\nabla f(x) = 0$, $f$ has a local minimum at $x$.
Proof. See Crouzeix and Ferland (1982). □
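Definition 2.10.4 can be spot-checked numerically in one dimension. The sketch below is our own illustration, and the helper name `violates_pseudoconvexity` is hypothetical. It samples pairs on an exact rational grid and reports the first pair that violates the implication. It is applied to $x^3 + x$, whose derivative never vanishes, and to $x^3$, whose stationary point $0$ is not a minimum. A finite sample can only refute pseudoconvexity, never prove it.

```python
from fractions import Fraction
from itertools import product

def violates_pseudoconvexity(f, df, points):
    """Return the first pair (x1, x2) violating Definition 2.10.4, or None.

    One-dimensional form: (x1 - x2) * f'(x2) >= 0 must imply f(x1) >= f(x2).
    Sampling can only refute pseudoconvexity, never prove it.
    """
    for x1, x2 in product(points, repeat=2):
        if (x1 - x2) * df(x2) >= 0 and f(x1) < f(x2):
            return (x1, x2)
    return None

# Exact rational grid on [-2, 2], so no floating-point ties are misjudged.
grid = [Fraction(k, 8) for k in range(-16, 17)]

f1, df1 = (lambda x: x**3 + x), (lambda x: 3 * x**2 + 1)   # pseudoconvex
f2, df2 = (lambda x: x**3), (lambda x: 3 * x**2)           # not pseudoconvex

print(violates_pseudoconvexity(f1, df1, grid))  # None
print(violates_pseudoconvexity(f2, df2, grid))  # (Fraction(-2, 1), Fraction(0, 1))
```

For $x^3$ the reported pair is $x^1 = -2$, $x^2 = 0$. There $(x^1 - x^2)f'(x^2) = 0$ but $f(x^1) < f(x^2)$, which is exactly the failure at a non-minimizing stationary point.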
We shall see that differentiable convex functions are pseudoconvex, and pseudoconvex functions are, in turn, semistrictly quasiconvex and therefore quasiconvex. We shall return to the notion of pseudoconvexity after introducing generalized convex functions at a point. There we shall prove that a differentiable quasiconvex function (defined on the open convex set $X \subset \mathbb{R}^n$) is pseudoconvex at any point $x^0 \in X$ where $\nabla f(x^0) \neq 0$. Ortega and Rheinboldt (1970) and Thompson and Parke (1973) have introduced a definition of pseudoconvexity for nondifferentiable functions.
Definition 2.10.5. An arbitrary function $f : X \to \mathbb{R}$ is pseudoconvex on $X$ if
$$x^1, x^2 \in X,\ \lambda \in (0,1),\ f(x^1) \ldots$$
… $A \Rightarrow B$ means that $A$ is sufficient for $B$ but not conversely.
Figure 8.

Before proving the nontrivial implications of the above diagram, it is useful to point out, for functions $f : \mathbb{R} \to \mathbb{R}$ and from a geometric point of view, the differences between the several types of generalized convex functions. Consider the following functions:

$f_1(x) = x^3 + x$ (Figure 9);

$f_2(x) = x^3$ (Figure 10);

$f_3(x) = \begin{cases} x^2, & x < 0, \\ 0, & x \ge 0 \end{cases}$ (Figure 11);

$f_4(x) = \begin{cases} -x^2, & x < 0, \\ 0, & x \in [0,1], \\ (x-1)^2, & x > 1 \end{cases}$ (Figure 12).
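The properties of $f_2$, $f_3$, $f_4$ asserted next, and those of the function equal to $1$ at $0$ and $0$ elsewhere (the example before Theorem 2.10.11), can be spot-checked on a grid. The sketch below is our own; the helper `check`, the grid and the sampled values of $\lambda$ are assumptions, not part of the text. It tests the quasiconvexity inequality, Definition 2.10.3 (semistrict) and Definition 2.10.2 (strict) with exact rational arithmetic. A `False` result is a genuine counterexample; `True` only means that no violation was found on the sample.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(k, 4) for k in range(-8, 9)]   # sample points in [-2, 2]
LAMBDAS = [Fraction(j, 4) for j in (1, 2, 3)]    # sample values of lambda in (0, 1)

def check(f, kind, pts=GRID, lams=LAMBDAS):
    """Test one generalized-convexity inequality at z = lam*a + (1 - lam)*b.

    kind == "quasi":      f(z) <= max(f(a), f(b))
    kind == "semistrict": f(z) <  max(f(a), f(b)) whenever f(a) != f(b)
    kind == "strict":     f(z) <  max(f(a), f(b)) whenever a != b
    """
    for a, b in product(pts, repeat=2):
        m = max(f(a), f(b))
        for lam in lams:
            fz = f(lam * a + (1 - lam) * b)
            if kind == "quasi" and fz > m:
                return False
            if kind == "semistrict" and f(a) != f(b) and fz >= m:
                return False
            if kind == "strict" and a != b and fz >= m:
                return False
    return True

def f2(x):                       # x^3
    return x ** 3

def f3(x):                       # x^2 for x < 0, 0 for x >= 0
    return x * x if x < 0 else Fraction(0)

def f4(x):                       # -x^2, then flat on [0, 1], then (x - 1)^2
    if x < 0:
        return -x * x
    return Fraction(0) if x <= 1 else (x - 1) ** 2

def spike(x):                    # 1 at x = 0, 0 elsewhere
    return Fraction(1) if x == 0 else Fraction(0)

print(check(f2, "strict"))                                # True
print(check(f3, "semistrict"), check(f3, "strict"))       # True False
print(check(f4, "quasi"), check(f4, "semistrict"))        # True False
print(check(spike, "semistrict"), check(spike, "quasi"))  # True False
```

The flat piece of $f_3$ on $[0, \infty)$ is what breaks strict quasiconvexity. The flat piece of $f_4$ sits above its infimum, and that is what breaks semistrict quasiconvexity.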
Then the following hold:
- $f_1$ is pseudoconvex (even strictly pseudoconvex), but not convex (or strictly convex);
- $f_2$ is semistrictly quasiconvex (even strictly quasiconvex), but not pseudoconvex (or strictly pseudoconvex);
- $f_3$ is semistrictly quasiconvex (even pseudoconvex and convex), but not strictly quasiconvex (or strictly pseudoconvex or strictly convex);
- $f_4$ is quasiconvex but not semistrictly quasiconvex.

The nontrivial implications shown in Figure 8 are the following:
1) $f$ pseudoconvex $\Rightarrow$ $f$ semistrictly quasiconvex;
2) $f$ semistrictly quasiconvex and lower semicontinuous $\Rightarrow$ $f$ quasiconvex.

Let us prove implication 1). Let $f$ be pseudoconvex on $X$, where $X$ is an open convex subset of $\mathbb{R}^n$, and let $x^1, x^2 \in X$ with $f(x^1) < f(x^2)$. Assume that there exists a number $\lambda \in (0,1)$ with $f(\lambda x^1 + (1-\lambda)x^2) \ge f(x^2)$. Let us denote by $\lambda_0$ the maximum point of $h(\lambda) = f(\lambda x^1 + (1-\lambda)x^2)$, $\lambda \in (0,1)$. Then, with $x^0 = \lambda_0 x^1 + (1-\lambda_0)x^2$, we have
$$h'(\lambda_0) = (x^1 - x^2)\nabla f(x^0) = 0.$$
However, since $f(x^0) \ge f(x^2) > f(x^1)$, the definition of pseudoconvexity gives
$$(x^1 - x^0)\nabla f(x^0) < 0,$$
and thus, since $x^1 - x^0 = (1-\lambda_0)(x^1 - x^2)$ with $\lambda_0 \in (0,1)$,
$$(x^1 - x^2)\nabla f(x^0) < 0 \ \ldots$$
[Table: … ⇒ strict global min; … ⇒ global min; local min ⇒ unique global min; stationary point ⇒ global min; stationary point ⇒ unique global min, compared for quasiconvex, semistrictly quasiconvex, strictly quasiconvex, pseudoconvex and strictly pseudoconvex functions.]
condition that $x^0$ be a global minimum point of $f$ over a set $X \subset \mathbb{R}^n$ is that $x^0$ be a global radial minimum point of $f$ over $X$. That is, for every vector $y$, $x^0$ must be a global minimum point of $f$ on the set $X \cap \{x \mid x = x^0 + \lambda y,\ \lambda \ge 0\}$.

For a local minimum the situation is different. A necessary condition for $x^0$ to be a local minimum point of $f$ over $X$ is that $x^0$ be a local radial minimum point of $f$. That is, for every vector $y$ there exists a scalar $\lambda_0(y) > 0$ such that $f(x^0) \le f(x^0 + \lambda y)$ whenever $\lambda \in (0, \lambda_0(y))$ and $x^0 + \lambda y \in X$. But a local radial minimum point need not be a local minimum point, as the following famous example, due to G. Peano, shows. The function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as $f(x, y) = (y - x^2)(y - 2x^2)$ has a local radial minimum at $x^0 = (0,0)$, but $x^0 = (0,0)$ is not a local minimum point of $f$. See also the related comments in Section 3.2 of the next chapter.

However, for a quasiconvex function defined on a convex set $X \subset \mathbb{R}^n$, the above necessary condition is also sufficient. We therefore have the following results.

Theorem 2.11.8. If $f$ is quasiconvex on the convex set $X \subset \mathbb{R}^n$, then any local radial minimum point of $f$ over $X$ is also a local minimum point of $f$ over $X$.
Proof. See Thompson and Parke (1973). □
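Peano's example can be checked with exact arithmetic; the helper names below are ours. Along a ray from the origin with integer direction $(d_x, d_y)$ one has
$$f(td_x, td_y) = t^2(d_y - td_x^2)(d_y - 2td_x^2).$$
This is nonnegative for $0 < t < d_y/(2d_x^2)$ when $d_y > 0$ and $d_x \neq 0$, and for every $t > 0$ otherwise. On the parabola $y = \tfrac32 x^2$, however, $f = -x^4/4 < 0$ at points arbitrarily close to the origin. The sketch samples finitely many directions and radii, so it illustrates rather than proves the radial property; the exact argument is the factorization above.

```python
from fractions import Fraction

def peano(x, y):
    """Peano's function f(x, y) = (y - x^2)(y - 2x^2)."""
    return (y - x * x) * (y - 2 * x * x)

def nonnegative_on_ray(dx, dy, samples=100):
    """Check f(t*dx, t*dy) >= 0 at sampled t in (0, t_max].

    t_max = dy / (4 dx^2) keeps both factors positive when dy > 0 and dx != 0;
    in every other case f >= 0 for all t > 0, so t_max = 1 is used.
    """
    if dy > 0 and dx != 0:
        t_max = Fraction(dy, 4 * dx * dx)
    else:
        t_max = Fraction(1)
    return all(peano(t_max * k / samples * dx, t_max * k / samples * dy) >= 0
               for k in range(1, samples + 1))

directions = [(dx, dy) for dx in range(-3, 4) for dy in range(-3, 4)
              if (dx, dy) != (0, 0)]
print(all(nonnegative_on_ray(dx, dy) for dx, dy in directions))  # True

# Points on the parabola y = 1.5 x^2 approach the origin with f < 0.
for k in range(1, 4):
    x = Fraction(1, 10 ** k)
    print(x, peano(x, Fraction(3, 2) * x * x))   # the value equals -x**4 / 4
```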
Theorem 2.11.9. If $f$ is semistrictly quasiconvex on the convex set $X \subset \mathbb{R}^n$, then $x^0 \in X$ is a global minimum point of $f$ over $X$ if and only if $x^0$ is a local radial minimum point of $f$ over $X$.
Proof. See Thompson and Parke (1973). □
2.12. Generalized Monotonicity and Generalized Convexity

We have seen in Theorem 2.5.4, point f), the characterization, due to Minty (1964), of a differentiable convex function by means of the monotonicity of its gradient. In general, a vector-valued function (or map) $F : X \to \mathbb{R}^n$, $X \subset \mathbb{R}^n$, is monotone on $X$ if for every pair of distinct points $x^1, x^2 \in X$ we have
$$(x^1 - x^2)[F(x^1) - F(x^2)] \ge 0.$$
This notion has been generalized by Karamardian (1976) and Karamardian and Schaible (1990) to other notions of generalized monotone maps.

Definition 2.12.1. Let $X \subset \mathbb{R}^n$; a map $F : X \to \mathbb{R}^n$ is pseudomonotone on $X$ if, for every pair of distinct points $x^1, x^2 \in X$, we have
$$(x^1 - x^2)F(x^2) \ge 0 \ \Rightarrow\ (x^1 - x^2)F(x^1) \ge 0.$$
Obviously every monotone map is pseudomonotone; the converse is not true in general. It is now clear why it is better to adopt the term "pseudolinear" (instead of "pseudomonotone") when dealing with functions that are both pseudoconvex and pseudoconcave. The following lemma is proved by Karamardian and Schaible (1990).

Lemma 2.12.1. A map $F : X \to \mathbb{R}^n$ is pseudomonotone on $X$ if and only if for every pair of distinct points $x^1, x^2 \in X$ we have
$$(x^1 - x^2)F(x^2) > 0 \ \Rightarrow\ (x^1 - x^2)F(x^1) > 0.$$
Proof. First note that, by contraposition, the pseudomonotonicity of $F$ is equivalent to
$$(x^1 - x^2)F(x^1) < 0 \ \Rightarrow\ (x^1 - x^2)F(x^2) < 0.$$
Therefore it is equivalent to
$$(x^2 - x^1)F(x^1) > 0 \ \Rightarrow\ (x^2 - x^1)F(x^2) > 0,$$
which, after interchanging $x^1$ and $x^2$, is the stated condition. □
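As a worked example of a pseudomonotone map that is not monotone, take $F(x) = 3x^2 + 1$ on $X = \mathbb{R}$, the derivative of $x^3 + x$. This choice is ours, not the source's. It is consistent with Theorem 2.12.1, stated next, since $x^3 + x$ is pseudoconvex but not convex. In one dimension the products are ordinary products:

```latex
\begin{aligned}
(x^1 - x^2)\,[F(x^1) - F(x^2)] &= 3(x^1 - x^2)^2 (x^1 + x^2) < 0
  \quad \text{whenever } x^1 \neq x^2,\ x^1 + x^2 < 0, \\
F > 0 \ \Rightarrow\ \big[(x^1 - x^2)F(x^2) \ge 0 &\Rightarrow x^1 \ge x^2 \Rightarrow (x^1 - x^2)F(x^1) \ge 0\big].
\end{aligned}
```

For instance, $x^1 = -2$, $x^2 = -1$ gives $(-1)(13 - 4) = -9 < 0$, so $F$ is not monotone. The second line shows that $F$ nevertheless satisfies Definition 2.12.1.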
The following result shows that pseudoconvex functions on a convex set are characterized by the pseudomonotonicity of their gradients, just as differentiable convex functions are characterized by the monotonicity of their gradients.

Theorem 2.12.1. Let $f : X \to \mathbb{R}$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is pseudoconvex on $X$ if and only if its gradient $\nabla f$ is pseudomonotone on $X$.
Proof (Karamardian (1976)). Suppose that $f$ is pseudoconvex on $X$; let $x^1$ and $x^2$ be two arbitrary distinct points in $X$ with
$$(x^1 - x^2)\nabla f(x^2) \ge 0. \qquad (1)$$
We want to show that
$$(x^1 - x^2)\nabla f(x^1) \ge 0.$$
Assume the contrary, i.e.
$$(x^1 - x^2)\nabla f(x^1) < 0 \ \ldots$$
… $> 0$. Since $\nabla f$ is pseudomonotone, it follows from the last inequality and Lemma 2.12.1 that
$$(x^2 - x)\nabla f(x^2) > 0.$$
But then we have $(x^1 - x^2)\nabla f(x^2) < 0$, which contradicts (1). Hence $f$ is pseudoconvex. □
Definition 2.12.2. A map $F : X \to \mathbb{R}^n$ is said to be quasimonotone on $X \subset \mathbb{R}^n$ if, for every pair of distinct points $x^1, x^2 \in X$, we have
$$(x^1 - x^2)F(x^2) > 0 \ \Rightarrow\ (x^1 - x^2)F(x^1) \ge 0.$$
Obviously, by Lemma 2.12.1, every pseudomonotone map is quasimonotone, but the converse is not true. (Take, for example, $F = f'$ with $f(x) = x^3$, i.e. $F(x) = 3x^2$, on $X = \mathbb{R}$.) Quasimonotonicity, too, is linked to quasiconvexity. More precisely, we have the following result, due to Karamardian and Schaible (1990).

Theorem 2.12.2. Let $f : X \to \mathbb{R}$ be differentiable on the open convex set $X \subset \mathbb{R}^n$; then $f$ is quasiconvex on $X$ if and only if $\nabla f$ is quasimonotone on $X$.

Proof. This is quite similar to the proof of Theorem 2.12.1. In fact:

a) If $f$ is quasiconvex, then
$$(x^1 - x^2)\nabla f(x^2) > 0 \Rightarrow f(x^1) > f(x^2) \Rightarrow (x^2 - x^1)\nabla f(x^1) \le 0 \Rightarrow (x^1 - x^2)\nabla f(x^1) \ge 0,$$
i.e. $\nabla f$ is quasimonotone.

b) Let $\nabla f$ be quasimonotone and assume that $f$ is not quasiconvex. Then there exist $x^1, x^2 \in X$ with $f(x^1) \le f(x^2)$, and $\lambda \in (0,1)$, such that for $x = x^2 + \lambda(x^1 - x^2)$ we have
$$f(x) > f(x^2) \ge f(x^1). \qquad (5)$$
The mean-value theorem implies the existence of $\bar x$ and $x^*$ such that
$$f(x) - f(x^1) = (x - x^1)\nabla f(\bar x), \qquad f(x) - f(x^2) = (x - x^2)\nabla f(x^*),$$
where
$$\bar x = x^2 + \bar\lambda(x^1 - x^2), \quad x^* = x^2 + \lambda^*(x^1 - x^2), \quad 0 < \lambda^* < \lambda < \bar\lambda < 1. \qquad (6)$$
Then (5) implies that
$$(x - x^1)\nabla f(\bar x) > 0, \qquad (x - x^2)\nabla f(x^*) > 0.$$
By (6), $x - x^1$ and $x^* - \bar x$ are positive multiples of $x^2 - x^1$, while $x - x^2$ and $\bar x - x^*$ are positive multiples of $x^1 - x^2$. This yields
$$(x^* - \bar x)\nabla f(\bar x) > 0, \qquad (7)$$
$$(\bar x - x^*)\nabla f(x^*) > 0. \qquad (8)$$
From (8) we obtain $(x^* - \bar x)\nabla f(x^*) < 0$, which together with (7) contradicts the quasimonotonicity of $\nabla f$. □
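The remark after Definition 2.12.2 and Theorem 2.12.2 can be illustrated in one dimension, where $(x^1 - x^2)F(x)$ is an ordinary product. The sketch below is our own; the helper names are hypothetical, and the check uses a finite sample, so only a `False` is conclusive. It shows that $F(x) = 3x^2$, the derivative of the quasiconvex but not pseudoconvex $x^3$, is quasimonotone but not pseudomonotone. By contrast, $3x^2 + 1$, the derivative of the pseudoconvex $x^3 + x$, passes the pseudomonotonicity test.

```python
from fractions import Fraction
from itertools import product

def is_quasimonotone(F, pts):
    # Definition 2.12.2 for n = 1:
    # (x1 - x2) F(x2) > 0  implies  (x1 - x2) F(x1) >= 0
    return all((x1 - x2) * F(x1) >= 0
               for x1, x2 in product(pts, repeat=2)
               if x1 != x2 and (x1 - x2) * F(x2) > 0)

def is_pseudomonotone(F, pts):
    # Definition 2.12.1 for n = 1:
    # (x1 - x2) F(x2) >= 0  implies  (x1 - x2) F(x1) >= 0
    return all((x1 - x2) * F(x1) >= 0
               for x1, x2 in product(pts, repeat=2)
               if x1 != x2 and (x1 - x2) * F(x2) >= 0)

grid = [Fraction(k, 4) for k in range(-8, 9)]
F = lambda x: 3 * x**2          # derivative of the quasiconvex f(x) = x^3

print(is_quasimonotone(F, grid))                          # True
print(is_pseudomonotone(F, grid))                         # False (x2 = 0, x1 < 0)
print(is_pseudomonotone(lambda x: 3 * x**2 + 1, grid))    # True
```

The failing pair for $3x^2$ uses $x^2 = 0$. There $F(x^2) = 0$ makes the premise of Definition 2.12.1 hold trivially, while $(x^1 - 0)F(x^1) < 0$ for any $x^1 < 0$.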
It must be noted that Theorems 2.12.1 and 2.12.2 supply purely first-order characterizations of pseudoconvex and quasiconvex functions. Karamardian and Schaible (1990) discuss seven kinds of monotone and generalized monotone maps. They characterize the corresponding generalized convex functions in terms of generalized monotonicity of their gradients. See also the survey paper of Schaible (1994).

2.13. Comparison Between Convex and Generalized Convex Functions

We have seen that quasiconvex functions form the widest class of generalized convex functions introduced in Section 2.10. It may be interesting to compare the most relevant properties of convex functions and quasiconvex functions. The following tables are taken from Greenberg and Pierskalla (1971). As usual, $X$ is a convex subset of $\mathbb{R}^n$.
Table 2. Analogous (or generalized) properties.

| Convex functions | Quasiconvex functions |
|---|---|
| 1a) $f$ is convex if and only if $\operatorname{epi} f$ is a convex set. | 1b) $f$ is quasiconvex if and only if its lower level set $L(f,\alpha)$ is convex for every $\alpha \in \mathbb{R}$. |
| 2a) $f$ is linear if and only if $\operatorname{epi} f$ and $\operatorname{hypo} f$ are convex sets. | 2b) $f$ is quasilinear if and only if its lower level set $L(f,\alpha)$ and its upper level set $U(f,\alpha)$ are convex for every $\alpha \in \mathbb{R}$. |
| 3a) $f : X \to \mathbb{R}$ is continuous on $\operatorname{relint}(X)$. | 3b) $f : X \to \mathbb{R}$ is continuous almost everywhere on $\operatorname{relint}(X)$ (Deák (1962)). |
| 4a) One-sided partial derivatives exist everywhere on $\operatorname{relint}(X)$. | 4b) One-sided partial derivatives exist almost everywhere on $\operatorname{relint}(X)$ (Deák (1962)). |
| 5a) If $f \in C^2(X)$, then $f$ is convex on $X$ if and only if $Hf(x)$ is positive semidefinite on $X$. | 5b) If $f \in C^2(X)$ and $\nabla f(x) \neq 0$, $\forall x \in X$, then $f$ is quasiconvex on $X$ if and only if all the principal minors of the bordered Hessian of $f$ are nonpositive for each $x \in X$. |
| 6a) If $f$ is differentiable on $X$, then $f$ is convex on $X$ if and only if $x^1, x^2 \in X \Rightarrow f(x^1) - f(x^2) \ge (x^1 - x^2)\nabla f(x^2)$. | 6b) If $f$ is differentiable on $X$, then $f$ is quasiconvex on $X$ if and only if $x^1, x^2 \in X,\ f(x^1) \le f(x^2) \Rightarrow (x^1 - x^2)\nabla f(x^2) \le 0$. |
| 7a) $\sup_{x \in \operatorname{relint}(X)} f(x) < +\infty$ if $X$ is compact (Fenchel (1953)). | 7b) $\sup_{x \in \operatorname{relint}(X)} f(x) < +\infty$ if $X$ is compact (Greenberg and Pierskalla (1971)). |
| 8a) Every local minimum point is a global minimum point. | 8b) A strict local minimum is also a strict global minimum. |
| 9a) The set of global minimum points $X_f$ is a convex set. | 9b) The set of global minimum points $X_f$ is a convex set. |
| 10a) $f(\lambda x) \le \lambda f(x)$, $\forall \lambda \in [0,1]$, if $f(0) \le 0$. | 10b) $f(\lambda x) \le f(x)$, $\forall \lambda \in [0,1]$, if $f(x) \ge f(0)$. |
| 11a) $\varphi(\lambda) = f[\lambda x^1 + (1-\lambda)x^2]$ is convex on $[0,1]$, $\forall x^1, x^2 \in X$, if and only if $f$ is convex on $X$. | 11b) $\varphi(\lambda) = f[\lambda x^1 + (1-\lambda)x^2]$ is quasiconvex on $[0,1]$, $\forall x^1, x^2 \in X$, if and only if $f$ is quasiconvex on $X$. |
| 12a) $g(x) = \sup_{i \in I} f_i(x)$, with each $f_i$ convex, is convex, where $I$ is any index set. | 12b) $g(x) = \sup_{i \in I} f_i(x)$, with each $f_i$ quasiconvex, is quasiconvex, where $I$ is any index set. |
| 13a) $g(x) = F[f(x)]$ is convex if $F$ is convex and nondecreasing. | 13b) $g(x) = F[f(x)]$ is quasiconvex if $F$ is nondecreasing (see also Section 2.15). |
Table 3. Properties of convex functions with no analogue for quasiconvex functions.

1) $g(x) = \sum_{i=1}^{n} \lambda_i g_i(x)$, $\lambda_i \ge 0$, $i = 1, \ldots, n$, is a convex function if each $g_i$, $i = 1, \ldots, n$, is convex. If the $g_i$ are only quasiconvex, the property no longer holds. Take, e.g., the functions
$$f_1(x) = \begin{cases} 0, & x \le 0, \\ -x^2, & x > 0, \end{cases} \qquad f_2(x) = \begin{cases} -x^2, & x \le 0, \\ 0, & x > 0. \end{cases}$$
Then $f_1(x) + f_2(x) = -x^2$, which is not quasiconvex.
2) $f^{**}$, the biconjugate of $f$, equals $f$ if $f$ is convex and closed ($f$ proper). If $f$ is quasiconvex, the property no longer holds. Take, e.g., the function $f(x) = -e^{-x^2}$, for which we have $f^{**}(x) = -1$.

3) If $f$ is convex on the bounded set $X \subset \mathbb{R}^n$, then $\inf_{x \in \operatorname{relint}(X)} f(x) > -\infty$ (see Fenchel (1953)). This property does not hold for quasiconvex functions, as shown by the following example:
$$f(x) = \begin{cases} 1/(x-1), & x \in [0,1), \\ 0, & x = 1. \end{cases}$$
$f$ is quasiconvex on $X = [0,1]$, but $\inf_{x \in \operatorname{relint}(X)} f(x) = -\infty$.

4) The theorems of Berge (Theorem 2.9.2), Fan-Glicksberg-Hoffman (Theorem 2.9.1) and Bohnenblust-Karlin-Shapley (Theorem 2.9.3) cannot be extended, in their original forms, by replacing convexity with quasiconvexity.
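For item 1) of Table 3, the failure can be made explicit at the midpoint of $-1$ and $1$; this one-line check is added for illustration. $f_1$ is nonincreasing and $f_2$ is nondecreasing, so both have interval lower level sets and are quasiconvex. Their sum, however, violates the quasiconvexity inequality:

```latex
\psi(x) = f_1(x) + f_2(x) = -x^2, \qquad
\psi\!\left(\tfrac12(-1) + \tfrac12(1)\right) = \psi(0) = 0 > -1 = \max\{\psi(-1), \psi(1)\}.
```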
2.14. Generalized Convexity at a Point

We now present another generalization of convex sets and convex functions, introduced by Mangasarian (1969): the convexity of a set at a point $x^0$ and the (generalized) convexity of a function at a point $x^0$. So far we have discussed convexity and generalized convexity of a function on a given (convex) set $X \subset \mathbb{R}^n$, and in those definitions two points of $X$, $x^1$ and $x^2$, vary over $X$. If we keep one point $x^0$ fixed and let only the second point $x$ vary over $X$, we obtain local definitions (or, better, "pointwise definitions"). These give, e.g., convex, pseudoconvex and quasiconvex functions at a point $x^0 \in X$, with respect to $X$. This concept is useful because in many cases "global" generalized convexity is not really needed, especially in obtaining sufficient optimality conditions in nonlinear programming.
Definition 2.14.1. Let $X \subset \mathbb{R}^n$ be a nonempty set and $x^0 \in X$; then $X$ is said to be convex at $x^0$, or star-shaped at $x^0$, if
$$\lambda x^0 + (1-\lambda)x \in X, \quad \forall x \in X,\ \forall \lambda \in [0,1].$$
Definition 2.14.2. Let $f : X \to \mathbb{R}$, where $X \subset \mathbb{R}^n$ is star-shaped at $x^0 \in X$. The function $f$ is said to be convex at $x^0$, with respect to $X$, if
$$f(\lambda x^0 + (1-\lambda)x) \le \lambda f(x^0) + (1-\lambda)f(x), \quad \forall x \in X,\ \forall \lambda \in [0,1];$$
$f$ is said to be strictly convex at $x^0$, with respect to $X$, if
$$x \in X,\ x \neq x^0 \ \Rightarrow\ f(\lambda x^0 + (1-\lambda)x) < \lambda f(x^0) + (1-\lambda)f(x), \quad \forall \lambda \in (0,1).$$

Definition 2.14.3. Let $f : X \to \mathbb{R}$ be differentiable at $x^0 \in X \subset \mathbb{R}^n$; the function $f$ is said to be pseudoconvex at $x^0$, with respect to $X$, if
$$x \in X,\ (x - x^0)\nabla f(x^0) \ge 0 \ \Rightarrow\ f(x) \ge f(x^0),$$
or equivalently if
$$x \in X,\ f(x) < f(x^0) \ \Rightarrow\ (x - x^0)\nabla f(x^0) < 0.$$

Definition 2.14.4. Let $f : X \to \mathbb{R}$, where $X \subset \mathbb{R}^n$ is star-shaped at $x^0 \in X$. The function $f$ is said to be quasiconvex at $x^0$, with respect to $X$, if
$$x \in X,\ f(x) \le f(x^0) \ \Rightarrow\ f(\lambda x^0 + (1-\lambda)x) \le f(x^0), \quad \forall \lambda \in [0,1].$$

Obviously, a function that is convex, pseudoconvex or quasiconvex at each point of a convex set $X \subset \mathbb{R}^n$ is, respectively, convex, pseudoconvex or quasiconvex on $X$. In the same manner we can introduce functions which are semistrictly quasiconvex at $x^0$, strictly quasiconvex at $x^0$ and strictly pseudoconvex at $x^0$. The following results are proved by Mangasarian (1969) and Martos (1975).
1) Let $f : X \to \mathbb{R}$, with $X \subset \mathbb{R}^n$ star-shaped at $x^0 \in X$. Then $f$ is convex at $x^0$ if and only if $\operatorname{epi} f$ is star-shaped at $(x^0, f(x^0))$.

2) Let $f : X \to \mathbb{R}$ be differentiable and convex at $x^0 \in X \subset \mathbb{R}^n$, $X$ open and star-shaped at $x^0$. Then we have
$$f(x) - f(x^0) \ge (x - x^0)\nabla f(x^0), \quad \forall x \in X.$$
If $f$ is strictly convex at $x^0 \in X$, then we have
$$f(x) - f(x^0) > (x - x^0)\nabla f(x^0), \quad \forall x \in X,\ x \neq x^0.$$

3) Let $f : X \to \mathbb{R}$ be twice-continuously differentiable and convex at $x^0 \in X \subset \mathbb{R}^n$, $X$ open and star-shaped at $x^0$; then $Hf(x^0)$ is positive semidefinite (i.e. $yHf(x^0)y \ge 0$, $\forall y \in \mathbb{R}^n$).

4) Let $f : X \to \mathbb{R}$ be differentiable and quasiconvex at $x^0 \in X \subset \mathbb{R}^n$, $X$ open and star-shaped at $x^0$; then we have
$$x \in X,\ f(x) \le f(x^0) \ \Rightarrow\ (x - x^0)\nabla f(x^0) \le 0,$$
or equivalently
$$x \in X,\ (x - x^0)\nabla f(x^0) > 0 \ \Rightarrow\ f(x) > f(x^0).$$
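The pointwise Definitions 2.14.3 and 2.14.4 can be tested in the same sampled way. The sketch below checks $f(x) = -\cos x$ at $x^0 = 0$, the example discussed in the next paragraph. The helper names are ours, and the grid is an assumption; it deliberately includes $2\pi$, where $-\cos$ returns to its minimum value. Pseudoconvexity at $0$ holds because $f(x) \ge f(0)$ everywhere. Quasiconvexity at $0$ fails on the segment from $0$ to $2\pi$.

```python
import math

def pseudoconvex_at(f, df, x0, pts):
    # Definition 2.14.3 for n = 1: (x - x0) f'(x0) >= 0 implies f(x) >= f(x0)
    return all(f(x) >= f(x0) for x in pts if (x - x0) * df(x0) >= 0)

def quasiconvex_at(f, x0, pts, lams):
    # Definition 2.14.4 for n = 1: f(x) <= f(x0) implies
    # f(lam*x0 + (1 - lam)*x) <= f(x0) for every sampled lam in [0, 1]
    return all(f(lam * x0 + (1 - lam) * x) <= f(x0)
               for x in pts if f(x) <= f(x0)
               for lam in lams)

f = lambda x: -math.cos(x)
df = math.sin                                   # derivative of -cos
pts = [k / 10 for k in range(-100, 101)] + [2 * math.pi]
lams = [j / 10 for j in range(11)]

print(pseudoconvex_at(f, df, 0.0, pts))   # True
print(quasiconvex_at(f, 0.0, pts, lams))  # False: lam = 0.5 gives f(pi) = 1 > -1
```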
We point out that the converse implications of 2), 3) and 4) do not hold (the reader is invited to build numerical or graphical examples). We must note that if $f : X \to \mathbb{R}$ is differentiable and convex at $x^0 \in X \subset \mathbb{R}^n$, $X$ open and star-shaped at $x^0$, then $f$ is pseudoconvex at $x^0$, but not conversely. However, if $f$ is pseudoconvex at $x^0$, it is not necessarily quasiconvex at $x^0$. Obviously, a function differentiable on the open set $X \subset \mathbb{R}^n$ with $f(x) \ge f(x^0)$, $\forall x \in X$, is pseudoconvex at $x^0$. Thus, e.g., $f(x) = -\cos x$ is pseudoconvex at $x^0 = 0$, with respect to $\mathbb{R}$, but not quasiconvex at $x^0 = 0$, with respect to $\mathbb{R}$. See also Mangasarian (1969), Martos (1975), Lopez Cerda and Valls Verdejo (1976), Giorgi (1987).

We point out also that some properties related to minimization problems continue to hold under local convexity or local generalized convexity. For example, if $f$ is strictly convex at $x^0 \in X$ and $x^0$ is a minimum point of $f$ over $X$, then it is the unique minimum point. If $x^0$ is a local minimum point of $f$ over $X$ and $f$ is convex at $x^0$, then $x^0$ is also a global minimum point; the same holds even if $f$ is only semistrictly quasiconvex at $x^0$. Theorem 2.11.5 and Corollary 2.11.1 also hold under the assumption that $f$ is pseudoconvex at $x^* \in X$, with respect to $X$. Similarly, Corollary 2.11.2 also holds under the assumption that $f$ is strictly pseudoconvex at $x^* \in X$, with respect to $X$. The following result has been proved independently by Ferland (1971) and Giorgi (1987); see also Crouzeix and Ferland (1982).

Theorem 2.14.1. Let $f : X \to \mathbb{R}$ be continuous on the open set $X \subset \mathbb{R}^n$ and differentiable at $x^0 \in X$. If $f$ is quasiconvex at $x^0$, with respect to $X$, and if $\nabla f(x^0) \neq 0$, then $f$ is pseudoconvex at $x^0$.

Proof. Consider a point $x^1 \in X$ such that
$$(x^1 - x^0)\nabla f(x^0) \ge 0 \qquad (1)$$
but for which
$$f(x^1) < f(x^0). \qquad (2)$$
Then $x^1$ belongs to the nonempty set
$$X_0 = \{x \mid x \in X,\ f(x) \le f(x^0)\},$$
whose elements, thanks to the quasiconvexity of $f$ at $x^0$, satisfy the relation
$$x \in X_0 \ \Rightarrow\ (x - x^0)\nabla f(x^0) \le 0. \qquad (3)$$
Let us now consider the sets, both nonempty,
$$W = \{x \mid x \in X,\ (x - x^0)\nabla f(x^0) \ge 0\}, \qquad X_{00} = X_0 \cap W.$$
The following implication holds:
$$x \in X_{00} \ \Rightarrow\ x \in H_0 = \{x \mid x \in X,\ (x - x^0)\nabla f(x^0) = 0\}.$$
It is therefore evident that $X_{00}$ is included in the hyperplane (recall that $\nabla f(x^0) \neq 0$)
$$H = \{x \mid x \in \mathbb{R}^n,\ (x - x^0)\nabla f(x^0) = 0\},$$
which supports $X_0$ owing to (3). Relations (1) and (2) show that $x^1$ belongs to $W$ and to $X_0$, and hence to $X_{00}$, $H_0$ and $H$. Moreover, thanks to the continuity of $f$ on $X$, relation (2) implies that $x^1$ lies in the interior of $X_0$. Therefore $x^1$ belongs at the same time to the interior of a set and to a hyperplane supporting that set, which is absurd. So relation (2) is false, and (1) implies $f(x^1) \ge f(x^0)$. □

Corollary 2.14.1. Let $f : X \to \mathbb{R}$ be differentiable and quasiconvex on the open convex set $X \subset \mathbb{R}^n$; then $f$ is pseudoconvex on $X$ if and only if $f$ has a minimum point at $x^0 \in X$ whenever $\nabla f(x^0) = 0$.

Proof. The necessity part follows at once from the definition of pseudoconvex functions. As for sufficiency, let $x^0 \in X$ with $\nabla f(x^0) = 0$. The point $x^0$ is then a (global) minimum point of $f$ over $X$, and we have
$$(x - x^0)\nabla f(x^0) = 0 \ \Rightarrow\ f(x) \ge f(x^0), \quad \forall x \in X.$$
Thus $f$ is pseudoconvex at $x^0$ with respect to $X$. On the other hand, by Theorem 2.14.1, the quasiconvex function $f$ is pseudoconvex at every point $x \in X$ where $\nabla f(x) \neq 0$. □

From Theorem 2.14.1 it also follows that certain criteria for the quasiconvexity of a differentiable or twice-continuously differentiable function $f : X \to \mathbb{R}$, $X$ an open subset of $\mathbb{R}^n$, in effect identify the class of pseudoconvex functions. These are the criteria that also imply $\nabla f(x) \neq 0$, $\forall x \in X$. This is the case for Theorem 2.10.5 and for the other criteria equivalent to the conditions expressed by that theorem. It is also the case for the classical conditions on the bordered Hessian, given by Arrow and Enthoven (1961) and generalized by Ferland (1971, 1972b).
2.15. Convexity, Pseudoconvexity and Quasiconvexity of Composite Functions

In spite of the numerous characterizations available for establishing the convexity or the generalized convexity of a function, it may be difficult to decide whether a given function belongs to a particular class of generalized convex functions. Under suitable conditions, however, it is possible to identify the class to which a composite function belongs, after having established the convexity or generalized convexity of its (simpler) components. The following results are due mainly to Mangasarian (1970); see also Bereanu (1969, 1972), Bector (1973), Martos (1975) and Schaible (1971, 1972).

Let $\Phi : A \to \mathbb{R}$, where $A \subset \mathbb{R}^m \times \mathbb{R}^k$. $\Phi$ is said to be increasing-decreasing (incr-decr) on $A$ if and only if for every $(y^1, z^1)$ and $(y^2, z^2)$ in $A$
$$y^1 \ge y^2 \text{ and } z^1 \le z^2 \ \Rightarrow\ \Phi(y^1, z^1) \ge \Phi(y^2, z^2).$$
$\Phi$ is said to be $y$-increasing ($y$-incr) on $A$ if and only if for each $(y^1, z)$ and $(y^2, z)$ in $A$
$$y^1 \ge y^2 \ \Rightarrow\ \Phi(y^1, z) \ge \Phi(y^2, z),$$
and $\Phi$ is said to be $y$-decreasing ($y$-decr) on $A$ if and only if for each $(y^1, z)$ and $(y^2, z)$ in $A$
$$y^1 \ge y^2 \ \Rightarrow\ \Phi(y^1, z) \le \Phi(y^2, z).$$
The following lemma follows directly from the definition of differentiability and the mean value theorem.

Lemma 2.15.1. Let $\Phi : A \to \mathbb{R}$ be differentiable on the open convex set $A \subset \mathbb{R}^m \times \mathbb{R}^k$. Then $\Phi$ is incr-decr on $A$ if and only if $\nabla_y \Phi(y, z) \ge 0$ and $\nabla_z \Phi(y, z) \le 0$ for all $(y, z) \in A$.
The following theorem represents the principal result on the generalized convexity of composite functions.

Theorem 2.15.1. Let $X \subset \mathbb{R}^n$ be a convex set, let $f(x) = (f_1(x), \ldots, f_m(x))$ and $g(x) = (g_1(x), \ldots, g_k(x))$ be defined on $X$, and let $\Phi$ be a real-valued function defined on $\mathbb{R}^m \times \mathbb{R}^k$. Let $\psi(x) = \Phi(f(x), g(x))$ and let any one of the following four assumptions hold:
i) $f$ is convex; $g$ is concave; $\Phi$ is incr-decr;
ii) $f$ is linear; $g$ is linear;
iii) $f$ is convex; $g$ is linear; $\Phi$ is $y$-incr;
iv) $f$ is concave; $g$ is linear; $\Phi$ is $y$-decr.
Then the following results hold:
I) If $\Phi$ is convex on $\mathbb{R}^m \times \mathbb{R}^k$, then $\psi$ is convex on $X$.
II) If $X$ is open, $f$ and $g$ are differentiable on $X$ and $\Phi$ is pseudoconvex on $\mathbb{R}^m \times \mathbb{R}^k$, then $\psi$ is pseudoconvex on $X$.
III) If $\Phi$ is quasiconvex, then $\psi$ is quasiconvex.

Proof. I) We shall first prove this part of the theorem under assumption i). Let $x^1, x^2 \in X$ and let $0 \le \lambda \le 1$. Then
$$\begin{aligned}
\psi((1-\lambda)x^1 + \lambda x^2) &= \Phi\big(f((1-\lambda)x^1 + \lambda x^2),\ g((1-\lambda)x^1 + \lambda x^2)\big) \\
&\le \Phi\big((1-\lambda)f(x^1) + \lambda f(x^2),\ (1-\lambda)g(x^1) + \lambda g(x^2)\big) \quad \text{(since $f$ is convex, $g$ is concave and $\Phi$ is incr-decr)} \\
&\le (1-\lambda)\Phi(f(x^1), g(x^1)) + \lambda\Phi(f(x^2), g(x^2)) \quad \text{(since $\Phi$ is convex)} \\
&= (1-\lambda)\psi(x^1) + \lambda\psi(x^2),
\end{aligned}$$
and hence $\psi$ is convex. Under assumption ii) the first inequality above is an equality, and under assumption iii) or iv) it remains an inequality.

II) Again we first prove this part of the theorem under assumption i). Let $x^1, x^2 \in X$; then
$$(x^2 - x^1)\nabla\psi(x^1) = (x^2 - x^1)\cdot\big[\nabla_f\Phi(f(x^1), g(x^1))\cdot\nabla f(x^1) + \nabla_g\Phi(f(x^1), g(x^1))\cdot\nabla g(x^1)\big].$$
By the convexity of $f$, the concavity of $g$, the incr-decr property of $\Phi$ and Lemma 2.15.1, we obtain
$$(x^2 - x^1)\nabla\psi(x^1) \le [f(x^2) - f(x^1)]\cdot\nabla_f\Phi(f(x^1), g(x^1)) + [g(x^2) - g(x^1)]\cdot\nabla_g\Phi(f(x^1), g(x^1)).$$
Hence, by the above inequality,
$$(x^2 - x^1)\nabla\psi(x^1) \ge 0 \ \Rightarrow\ [f(x^2) - f(x^1)]\cdot\nabla_f\Phi(f(x^1), g(x^1)) + [g(x^2) - g(x^1)]\cdot\nabla_g\Phi(f(x^1), g(x^1)) \ge 0,$$
and then, by the pseudoconvexity of $\Phi$,
$$\Phi(f(x^2), g(x^2)) \ge \Phi(f(x^1), g(x^1)),$$
i.e. $\psi(x^2) \ge \psi(x^1)$. Hence $\psi$ is pseudoconvex. Under assumption ii) the first inequality in the above proof becomes an equality, and under assumption iii) or iv) it remains an inequality.

III) Again we prove this part of the theorem under assumption i). Let $x^1, x^2 \in X$ and $0 \le \lambda \le 1$. If $\psi(x^2) \le \psi(x^1)$, then
$$\Phi(f(x^2), g(x^2)) \le \Phi(f(x^1), g(x^1)).$$
Since $\Phi$ is quasiconvex, we have
$$\Phi\big((1-\lambda)f(x^1) + \lambda f(x^2),\ (1-\lambda)g(x^1) + \lambda g(x^2)\big) \le \Phi(f(x^1), g(x^1)),$$
and by the assumptions it follows that
$$\Phi\big(f((1-\lambda)x^1 + \lambda x^2),\ g((1-\lambda)x^1 + \lambda x^2)\big) \le \Phi(f(x^1), g(x^1)),$$
i.e. $\psi((1-\lambda)x^1 + \lambda x^2) \le \psi(x^1)$, and hence $\psi$ is quasiconvex. The rest of the proof is similar. □

Theorem 2.15.1 can easily be reformulated for the concave and generalized concave case. Moreover, set $k = 0$, i.e. $\psi(x) = \Phi(f(x))$. Then result I) i) is a well-known result on the convexity of composite functions, and III) i) is a similar known result on their quasiconvexity. See Berge (1963), Bereanu (1969, 1972), Fenchel (1953), Greenberg and Pierskalla (1971) and Martos (1975); see, moreover, Theorem 2.5.8 and results 13a) and 13b) of Table 2 of Section 2.13.

Theorem 2.15.1 can be applied to a large class of functions. Particularly important, in view of applications to nonlinear programming problems, is the case of nonlinear fractional functions. Let $X$ be a convex set in $\mathbb{R}^n$, let $\rho : X \to \mathbb{R}$ and $\sigma : X \to \mathbb{R}$, and consider $\psi(x) = \rho(x)/\sigma(x)$. Suppose that any one of the following assumptions holds on $X$ (Table 4).
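Table 4.

| Case | $\rho$ | $\sigma$ |
|---|---|---|
| 1 | convex, $\ge 0$ | concave, $> 0$ |
| 2 | concave, $\le 0$ | convex, $< 0$ |
| 3 | convex, $\le 0$ | convex, $> 0$ |
| 4 | concave, $\ge 0$ | concave, $< 0$ |
| 5 | linear | linear, $\neq 0$ |
| 6 | linear, $\le 0$ | convex, $\neq 0$ |
| 7 | linear, $\ge 0$ | concave, $\neq 0$ |
| 8 | convex | linear, $> 0$ |
| 9 | concave | linear, $< 0$ |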
Then $\psi$ is pseudoconvex on $X$ if $X$ is open and $\rho$ and $\sigma$ are differentiable on $X$; otherwise $\psi$ is quasiconvex on $X$. Bector (1973) and Martos (1975) prove that, under the above assumptions, $\psi$ is in fact semistrictly quasiconvex on $X$. The above results follow from Theorem 2.15.1. The real-valued function $\Phi(y, z) = y/z$, $(y, z) \in \mathbb{R} \times \mathbb{R}$, is pseudoconvex, and hence also quasiconvex, on either of the convex sets
$$\{(y, z) \mid (y, z) \in \mathbb{R} \times \mathbb{R},\ z > 0\} \quad \text{or} \quad \{(y, z) \mid (y, z) \in \mathbb{R} \times \mathbb{R},\ z < 0\}.$$
It then suffices to make the following identifications:
- $f(x) = \rho(x)$, $g(x) = \sigma(x)$ for cases 1, 5, 8 and 9;
- $f(x) = \sigma(x)$, $g(x) = \rho(x)$ for cases 2, 6 and 7;
- $f(x) = [\rho(x); \sigma(x)]$ for case 3;
- $g(x) = [\rho(x); \sigma(x)]$ for case 4.

Note, moreover, that according to the above assertions the linear fractional function
$$\psi(x) = \frac{cx + \alpha}{dx + \beta}, \quad c, d \in \mathbb{R}^n;\ \alpha, \beta \in \mathbb{R},$$
is pseudolinear (and hence semistrictly quasilinear) on each convex set $X \subset \mathbb{R}^n$ on which $dx + \beta \neq 0$. Theorem 2.15.1 can also be applied to reciprocal functions, i.e. to functions of the type $h(x) = 1/f(x)$, where $f$ is defined on the convex set $X \subset \mathbb{R}^n$. From the above theorem we obtain that:
i) If $f$ is positive and concave on $X$, then $h$ is convex (and positive) on $X$.
ii) If $f$ is negative and convex on $X$, then $h$ is concave (and negative) on $X$.

Theorem 2.15.1 can also be applied to bi-nonlinear functions, i.e. to functions of the type $\varphi(x) = \rho(x)\cdot\mu(x)$, where $\rho$ and $\mu$ are differentiable on $X \subset \mathbb{R}^n$. Then the following implications hold on $X$:
1) $\rho$ convex $\le 0$, $\mu$ concave $> 0$ $\Rightarrow$ $\varphi$ pseudoconvex $\le 0$.
2) $\rho$ convex $< 0$, $\mu$ concave $\ge 0$ $\Rightarrow$ $\varphi$ pseudoconvex $\le 0$.
3) $\rho$ convex $< 0$, $\mu$ convex … $\Rightarrow$ $\varphi$ pseudoconcave ….

For other applications of Theorem 2.15.1 to special composite functions, see Avriel, Diewert, Schaible and Zang (1987), Mangasarian (1970), Martos (1975), Schaible (1971, 1972). Another class of convex composite functions of some importance, especially in geometric programming, is given by the logarithmic convex functions, or L-convex functions, examined in detail by Klinger and Mangasarian (1968).

Definition 2.15.1. A function $f : X \to \mathbb{R}$, defined and positive on the convex set $X \subset \mathbb{R}^n$, is logarithmic convex, or L-convex, on $X$ if $\log f$ is convex on $X$. It is L-concave on $X$ if $\log f$ is concave on $X$. From the previous definition it can be proved that the following properties are equivalent:
/ i s L-convex on X',
i)
1 / / is L-concave on X',
ii) / = e^, with h convex on X', v) / " is convex on X for each a e (0,1);
Convexity, pseudoconvexity
v)
and quasiconvexity
for each x' e X and for each t € R^, m
167
t > 0, Yl'iLi U = 1. we have
m
f ( E tix') s n {f{x')y^; vi) x\x'^
eX.
Xe [0,1]:
/(Axi + (1 - A) x2) ^ [/(rr^)]^ [/(^2)]'-^ • If / is also differentiable on the open convex set X, the following properties are equivalent: i)
/ is L-convex on X]
ii)
x^.x^eX:
f{x^)
f{x^)
f{x^)
iii) x ^ x ^ E X:
^
^ L /(ari)
/(x2) J
If / is twice-continuously differentiable on the open convex set X, the following properties are equivalent: i)
/ i s L-convex on X;
ii) for each x e X the matrix f{x)-Hf{x)-Vf{x)-[Vf{x)]T is positive semidefinite. For proofs of the above equivalences, see Klinger and Mangasarian (1968) and Stancu-Minasian (1992). Moreover, it can be shown that if / is L-convex on the convex set X, then it is convex on X, but not conversely and that if / is concave on X, then it is L-concave on X, but not conversely.
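For $n = 1$ the last criterion reads $f(x) f''(x) - [f'(x)]^2 \ge 0$. The sketch below (plain Python; the helper names and both test functions are our own illustrative choices, not taken from the cited papers) checks it on a grid for $f(x) = e^{x^2}$, and shows that the positive convex function $g(x) = x^2 + 1$ fails both this test and property vi), which illustrates why convexity does not imply L-convexity. A grid check is only a sanity test, not a proof.

```python
import math

def l_convexity_defect(f, df, d2f, x):
    """One-variable Hessian test: f(x) f''(x) - f'(x)^2.
    f is L-convex on an open interval iff this is >= 0 at every point."""
    return f(x) * d2f(x) - df(x) ** 2

def geometric_gap(f, a, b, lam):
    """Property vi): [f(a)]^lam [f(b)]^(1-lam) - f(lam a + (1-lam) b); must be >= 0."""
    return f(a) ** lam * f(b) ** (1 - lam) - f(lam * a + (1 - lam) * b)

# f(x) = exp(x^2): log f(x) = x^2 is convex, so f is L-convex.
f = lambda x: math.exp(x * x)
df = lambda x: 2 * x * math.exp(x * x)
d2f = lambda x: (2 + 4 * x * x) * math.exp(x * x)

# g(x) = x^2 + 1: positive and convex, but log g is concave for |x| > 1.
g = lambda x: x * x + 1
dg = lambda x: 2 * x
d2g = lambda x: 2.0

grid = [k / 10 for k in range(-30, 31)]
print(all(l_convexity_defect(f, df, d2f, x) >= 0 for x in grid))  # True
print(l_convexity_defect(g, dg, d2g, 2.0))                         # -6.0
print(geometric_gap(g, 2.0, 4.0, 0.5) < 0)                         # True: sqrt(85) < 10
```

The analytic value for the first function is $f f'' - (f')^2 = 2e^{2x^2} > 0$, so the grid result agrees with the theory.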
2.16. Convexity, Pseudoconvexity and Quasiconvexity of Quadratic Functions

Quadratic functions consist of the sum of a quadratic form and a linear function; they are generally expressed as
$$Q(x) = \tfrac{1}{2} x^T A x + bx,$$
where $A$ is a real symmetric matrix of order $n$ and $b \in \mathbb{R}^n$. First we note that if the quadratic form $F(x) = x^T A x$ is convex on some convex set $X \subset \mathbb{R}^n$, the above sum $Q(x)$ will also be convex on the same set; this does not occur if $x^T A x$ is pseudoconvex or quasiconvex on some convex set, as the sum of a pseudoconvex or quasiconvex function and a linear function is not necessarily pseudoconvex or quasiconvex. Generalized convexity of quadratic forms and functions has been studied by many authors with different techniques. One of the first approaches is due to Martos (1969, 1971, 1975), who characterized quasiconvex and pseudoconvex quadratic functions on the nonnegative orthant by means of the concept of positive subdefinite matrices. This approach was also followed by Cottle and Ferland (1972), who derived additional criteria. Ferland (1971) and Schaible (1971) independently obtained a characterization of quasiconvex and pseudoconvex quadratic functions on arbitrary solid (i.e. with a nonempty interior) convex sets. For a survey of the main results on this subject, see Avriel, Diewert, Schaible and Zang (1987), Schaible (1981). The following results are taken from the quoted references. We also note that if $X \subset \mathbb{R}^n$ is a convex set with a nonempty interior, then the quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$ is convex (concave) on $X$ if and only if $A$ is positive (negative) semidefinite (the proof is obtained by taking Theorem 2.5.5 into account). Similarly, $Q(x)$ is strictly convex (strictly concave) on $X$ if $A$ is positive (negative) definite. The following result is due to Cottle (1967).

Theorem 2.16.1. Let $X \subset \mathbb{R}^n$ be a nonempty convex set; then $Q(x)$ is convex on $X$ if and only if $Q(x)$ is convex on every translation $X + a$ of $X$.
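Since convexity of $Q$ on a solid convex set depends only on the sign pattern of $A$, it can be tested mechanically. A minimal sketch, relying on the classical fact that a symmetric matrix is positive semidefinite if and only if all its principal minors (not only the leading ones) are nonnegative; exact rational arithmetic avoids rounding problems, and the function names are illustrative. It confirms that $F(x) = -x_1 x_2 = x^T A x$ with $A = \begin{pmatrix} 0 & -1/2 \\ -1/2 & 0 \end{pmatrix}$ is neither convex nor concave on any solid convex set.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def is_psd(a):
    """Symmetric A is positive semidefinite iff every principal minor is >= 0."""
    n = len(a)
    return all(det([[a[i][j] for j in idx] for i in idx]) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))

half = Fraction(1, 2)
A_convex = [[2, 1], [1, 2]]            # Q(x) = 1/2 x^T A x + bx is convex for every b
A_form = [[0, -half], [-half, 0]]      # F(x) = x^T A x = -x1 * x2
print(is_psd(A_convex))                                                # True
print(is_psd(A_form), is_psd([[-v for v in row] for row in A_form]))  # False False
```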
The following result on the quasiconvexity of quadratic functions is perhaps more interesting and is due to Martos (1975).

Theorem 2.16.2. The quadratic function $Q(x)$ is quasiconvex on $\mathbb{R}^n$ if and only if it is convex on $\mathbb{R}^n$.
Proof. Let $y$ be any vector of $\mathbb{R}^n$ and $\alpha > 0$ a number such that
$$Q(\alpha y) \le Q(-\alpha y). \qquad (1)$$
(Change the sign of $y$, not of $\alpha$, if necessary.) Then
$$\tfrac{1}{2}\alpha^2 y^T A y + \alpha b y \le \tfrac{1}{2}\alpha^2 y^T A y - \alpha b y, \quad \text{i.e.} \quad 2\alpha b y \le 0.$$
If, however, this holds for some $\alpha > 0$, it also holds for any $\alpha > 0$; thus (1) also holds for any $\alpha > 0$. Now if $Q(x)$ is quasiconvex on $\mathbb{R}^n$, then (1) implies, by the first-order characterization of differentiable quasiconvex functions applied to the points $\alpha y$ and $-\alpha y$, that for all $\alpha > 0$
$$[\alpha y - (-\alpha y)]^T [A(-\alpha y) + b] = -2\alpha^2 y^T A y + 2\alpha b y \le 0,$$
i.e. $by \le \alpha\, y^T A y$. The last inequality holds for all $\alpha > 0$ only if $y^T A y \ge 0$ (or $(-y)^T A (-y) \ge 0$ if the sign of $y$ has been changed); as $y$ has been chosen arbitrarily, $Q(x)$ is convex on $\mathbb{R}^n$. The converse of the theorem is obvious. □

The previous theorem shows that there is no reason to study the generalized convexity of quadratic functions on $\mathbb{R}^n$. However, there may be quadratic functions or quadratic forms that are pseudoconvex or quasiconvex on a convex subset of $\mathbb{R}^n$ (e.g. $\mathbb{R}^n_+$), but not convex on that subset. For example, the quadratic form of two variables $F(x) = -x_1 x_2$ is quasiconvex on $\mathbb{R}^2_+$, but not convex there. However, Martos has observed that
for quadratic functions we do not have to distinguish between semistrict quasiconvexity and quasiconvexity, i.e. $Q(x)$ is semistrictly quasiconvex on the convex set $X \subset \mathbb{R}^n$ if and only if it is quasiconvex on $X$. When $X = \mathbb{R}^n_+$ we have several interesting results due to Martos (1969, 1975), Cottle and Ferland (1971, 1972), Ferland (1978, 1981). We need some definitions concerning certain classes of matrices (see Martos (1969)).

Definition 2.16.1. A real symmetric matrix $A$ of order $n$ and its corresponding quadratic form $x^T A x$ are called positive subdefinite if for all $x \in \mathbb{R}^n$
$$x^T A x < 0 \implies Ax \ge 0 \ \text{or} \ Ax \le 0,$$
and strictly positive subdefinite if for all $x \in \mathbb{R}^n$
$$x^T A x < 0 \implies Ax > 0 \ \text{or} \ Ax < 0.$$
We note that if $A$ is positive semidefinite, it is also strictly positive subdefinite and positive subdefinite, but not conversely.

Definition 2.16.2. A real symmetric matrix $A$ of order $n$ and its corresponding quadratic form $x^T A x$ are said to be (strictly) merely positive subdefinite if they are (strictly) positive subdefinite, but not positive semidefinite. Similarly, we call a function merely quasiconvex (merely pseudoconvex) on some convex set if it is quasiconvex (pseudoconvex) but not convex on that set. We have the following basic results.

Theorem 2.16.3. The symmetric matrix $A$ is merely positive subdefinite if and only if
i) $A$ has one (simple) negative eigenvalue, and
ii) $A \le 0$ (i.e. every entry of $A$ is nonpositive).
Proof. See Martos (1969, 1975). □
A more useful criterion, in view of its applications, is contained in the following theorem.

Theorem 2.16.4. The symmetric matrix $A$ is merely positive subdefinite if and only if
i) $A \le 0$, and
ii) all the principal minors of $A$ are nonpositive.
Proof. See Cottle and Ferland (1972). □
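Theorem 2.16.4 reduces the test to finitely many sign checks. The sketch below implements it in exact arithmetic, together with a brute-force grid search for vectors violating Definition 2.16.1 (such a search can only refute positive subdefiniteness, never prove it). Two assumptions are ours: we additionally require $A \neq 0$, because the zero matrix satisfies i) and ii) but is positive semidefinite; and the test matrices are illustrative. $A_1$ is the matrix of the form $-2x_1x_2$, $A_2$ has exactly one negative eigenvalue but entries of both signs, and $A_3 = -I$ has two negative eigenvalues.

```python
from fractions import Fraction
from itertools import combinations, product

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def merely_positive_subdefinite(a):
    """Theorem 2.16.4: A <= 0 entrywise and every principal minor <= 0.
    The extra condition A != 0 excludes the (positive semidefinite) zero matrix."""
    n = len(a)
    nonpositive = all(v <= 0 for row in a for v in row)
    nonzero = any(v != 0 for row in a for v in row)
    minors_ok = all(det([[a[i][j] for j in idx] for i in idx]) <= 0
                    for k in range(1, n + 1) for idx in combinations(range(n), k))
    return nonpositive and nonzero and minors_ok

def find_violation(a, steps=10):
    """Grid search in [-1, 1]^n for x with x^T A x < 0 but Ax neither >= 0 nor <= 0."""
    n = len(a)
    grid = [Fraction(i, steps) for i in range(-steps, steps + 1)]
    for x in product(grid, repeat=n):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        quad = sum(x[i] * ax[i] for i in range(n))
        if quad < 0 and not (all(v >= 0 for v in ax) or all(v <= 0 for v in ax)):
            return x
    return None

A1 = [[0, -1], [-1, 0]]    # form -2 x1 x2: merely positive subdefinite
A2 = [[1, -2], [-2, 1]]    # one negative eigenvalue, but entries of both signs
A3 = [[-1, 0], [0, -1]]    # two negative eigenvalues
for A in (A1, A2, A3):
    print(merely_positive_subdefinite(A), find_violation(A) is None)
# prints: True True / False False / False False
```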
Theorem 2.16.5. The quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$ is merely quasiconvex on the nonnegative orthant $\mathbb{R}^n_+$ if and only if the bordered matrix
$$\bar{A} = \begin{pmatrix} A & b \\ b^T & 0 \end{pmatrix}$$
is merely positive subdefinite. If $Q(x)$ is quasiconvex on $\mathbb{R}^n_+$ and $b \neq 0$, then $Q(x)$ is also pseudoconvex on $\mathbb{R}^n_+$.
Proof. See Cottle and Ferland (1972) and Ferland (1971). □
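A sketch of Theorem 2.16.5 on two illustrative instances of our own, reusing the Cottle-Ferland test (again with the extra condition $A \neq 0$). With $A = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$ and $b = (-1,-1)$ we get $Q(x) = 1 - (x_1+1)(x_2+1)$, whose bordered matrix is merely positive subdefinite; with $b = (1,1)$ the bordered matrix has positive entries, and a midpoint check exhibits the failure of quasiconvexity on $\mathbb{R}^2_+$.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def merely_positive_subdefinite(a):
    """Cottle-Ferland test of Theorem 2.16.4, plus the extra condition A != 0."""
    n = len(a)
    return (all(v <= 0 for row in a for v in row)
            and any(v != 0 for row in a for v in row)
            and all(det([[a[i][j] for j in idx] for i in idx]) <= 0
                    for k in range(1, n + 1) for idx in combinations(range(n), k)))

def bordered(A, b):
    """The matrix [[A, b], [b^T, 0]] of Theorem 2.16.5."""
    return [list(row) + [bi] for row, bi in zip(A, b)] + [list(b) + [0]]

def Q(A, b, x):
    """Q(x) = 1/2 x^T A x + b x, evaluated exactly for integer or rational x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return Fraction(1, 2) * quad + sum(b[i] * x[i] for i in range(n))

A = [[0, -1], [-1, 0]]
b_good, b_bad = [-1, -1], [1, 1]
print(merely_positive_subdefinite(bordered(A, b_good)))   # True
print(merely_positive_subdefinite(bordered(A, b_bad)))    # False
# For b_bad, the midpoint of (0, 0) and (2, 2) exceeds both endpoint values.
p, q, mid = (0, 0), (2, 2), (1, 1)
print(Q(A, b_bad, mid), Q(A, b_bad, p), Q(A, b_bad, q))   # 1 0 0
```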
For what concerns quadratic forms $F(x) = x^T A x$, we have the following results of Martos (1969).

Theorem 2.16.6. The quadratic form $x^T A x$ is quasiconvex on $\mathbb{R}^n_+$ if and only if it is positive subdefinite. It is pseudoconvex on $\mathbb{R}^n_+ \setminus \{0\}$ if and only if it is strictly positive subdefinite.

A simple way to test whether a merely quasiconvex quadratic form on $\mathbb{R}^n_+$ is also pseudoconvex is given in the following result, again due to Martos (1969).

Theorem 2.16.7. A merely quasiconvex quadratic form $x^T A x$ on $\mathbb{R}^n_+$ is merely pseudoconvex on $\mathbb{R}^n_+ \setminus \{0\}$ if and only if $A$ does not contain a row of zeros.
Generalizations of the above results to arbitrary solid convex sets were obtained by the authors cited at the beginning of this section. Here we give the following results, due to Ferland (1972a, 1981). Let $A$ be a real symmetric matrix having exactly one negative eigenvalue; let $v$ be either one of the two possible choices of normalized eigenvector associated with the unique negative eigenvalue. Let us consider the following two sets:
$$T_1 = \{y \in \mathbb{R}^n \mid y^T A y \le 0 \text{ and } vy \ge 0\}, \qquad T_2 = \{y \in \mathbb{R}^n \mid y^T A y \le 0 \text{ and } vy \le 0\}.$$
Consider the quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$, with $A$ a real symmetric matrix of order $n$ and $b \in \mathbb{R}^n$. Associate with $Q$ the set
$$M = \{x \in \mathbb{R}^n \mid Ax + b = 0\}.$$

Theorem 2.16.8. The quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$ is merely quasiconvex on the solid convex set $X \subset \mathbb{R}^n$ if and only if $M$ is nonempty, $A$ has exactly one (simple) negative eigenvalue, and $X \subset T_1 + M$ or $X \subset T_2 + M$.

If $M$ is nonempty and $A$ has one (simple) negative eigenvalue, the sets $T_1 + M$ and $T_2 + M$ are the maximal domains of quasiconvexity for the quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$ (Ferland (1972a)). Note that $M$ is nonempty if and only if $\operatorname{rank}(A, b) = \operatorname{rank}(A)$. Another criterion for the quasiconvexity of $Q(x)$ on a solid convex set is contained in the following theorem, again due to Ferland (1981).

Theorem 2.16.9. The quadratic function $Q(x) = \tfrac{1}{2} x^T A x + bx$ is merely quasiconvex on the solid convex set $X \subset \mathbb{R}^n$ if and only if $A$ and the bordered Hessian matrix
$$\begin{pmatrix} A & Ax + b \\ (Ax + b)^T & 0 \end{pmatrix}$$
each have exactly one negative eigenvalue for all $x \in X$.
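Ferland's conditions can be checked on the form $Q(x) = -x_1 x_2$, i.e. $A = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$, $b = 0$ (our illustrative choice). Here $M = \{0\}$, $v = (1,1)/\sqrt{2}$ and $T_1 = \mathbb{R}^2_+$, so Theorem 2.16.8 identifies $\mathbb{R}^2_+$ as a maximal domain of quasiconvexity, and Theorem 2.16.9 predicts exactly one negative eigenvalue of the bordered Hessian inside $\mathbb{R}^2_+$. The sketch counts eigenvalue signs with a small cyclic Jacobi routine written out in plain Python, so that no external linear-algebra library is assumed.

```python
import math
from itertools import product

def sym_eigvals(m, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a small real symmetric matrix by cyclic Jacobi rotations."""
    a = [[float(v) for v in row] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):          # A <- A J   (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def n_negative(m, eps=1e-9):
    return sum(1 for ev in sym_eigvals(m) if ev < -eps)

# Q(x) = 1/2 x^T A x + b x = -x1 * x2  (illustrative data)
A = [[0.0, -1.0], [-1.0, 0.0]]
b = [0.0, 0.0]
v = (1.0, 1.0)   # A v = -v; normalizing v would not change the sign test below

def in_T1(y):
    quad = sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
    return quad <= 0 and v[0] * y[0] + v[1] * y[1] >= 0

def bordered_hessian(x):
    g = [A[i][0] * x[0] + A[i][1] * x[1] + b[i] for i in range(2)]   # Ax + b
    return [A[0] + [g[0]], A[1] + [g[1]], g + [0.0]]

print(n_negative(A))                                # 1
print(n_negative(bordered_hessian((1.0, 1.0))))     # 1  (inside R^2_+)
print(n_negative(bordered_hessian((1.0, -1.0))))    # 2  (outside R^2_+)
print(all(in_T1(y) == (y[0] >= 0 and y[1] >= 0)
          for y in product(range(-3, 4), repeat=2)))  # True: T1 = R^2_+
```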
Other types of generalized convex functions
2.17. Other Types of Generalized Convex Functions

A generalization of the class of convex functions can be obtained via generalized means. Note that the right-hand side of the inequality
$$f(\lambda x^1 + (1-\lambda) x^2) \le \lambda f(x^1) + (1-\lambda) f(x^2), \qquad \lambda \in [0,1], \qquad (1)$$
which defines convex functions, is just the weighted arithmetic mean of $f(x^1)$ and $f(x^2)$; so a first generalization is obtained by substituting other weighted means in the right-hand side of (1). This approach has been investigated by Avriel (1972), Avriel and Zang (1974), Martos (1975), Ben-Tal (1977), Castagnoli and Mazzoleni (1989b), Mond (1983). Before stating this generalized type of convexity, we review some classical definitions and results on generalized means (see Hardy, Littlewood and Pólya (1934)). Let $\alpha$ and $\beta$ be positive numbers, $\lambda \in [0,1]$ and $r \neq 0$ any real number. Then the positive number
$$M_r(\alpha, \beta, \lambda) = \{\lambda \alpha^r + (1-\lambda) \beta^r\}^{1/r}$$
is called the generalized r-th mean of $\alpha$ and $\beta$. This notion of generalized mean easily extends to more than two positive numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$, keeping the sum of the nonnegative weights $\lambda_1, \lambda_2, \ldots, \lambda_n$ equal to one. The same remark extends to vectors. For $r = 0$ we define
$$M_0(\alpha, \beta, \lambda) = \lim_{r \to 0} M_r(\alpha, \beta, \lambda) = \alpha^\lambda \beta^{1-\lambda}.$$
This is the well-known geometric mean of $\alpha$ and $\beta$. For $r = +\infty$ and $r = -\infty$, respectively, we define $M_{+\infty}(\alpha, \beta, \lambda) = \max\{\alpha, \beta\}$ and $M_{-\infty}(\alpha, \beta, \lambda) = \min\{\alpha, \beta\}$. … b) If $\alpha > \beta$, then $\alpha > M_r(\alpha, \beta, \lambda) > \beta$ for any finite $r$ and $\lambda \in (0,1)$.
c) If $s > r$, then $M_s(\alpha, \beta, \lambda) \ge M_r(\alpha, \beta, \lambda)$, and the inequality is strict if and only if $\alpha \neq \beta$ and $\lambda \in (0,1)$.
Proof. The assertions a) and b) are easy consequences of the definitions. Assertion c) can be proved by differentiation of $M_r$. Since the derivative of $M_r$ with respect to $r$ is positive for $r \neq 0$ (when $\alpha \neq \beta$ and $\lambda \in (0,1)$) and $M_r$ is (by definition) continuous at $r = 0$, we can conclude that $M_r$ is strictly increasing with respect to $r$. □

Following the terminology of Avriel (1972), we now extend the definition of convex functions as follows.

Definition 2.17.1. Let $f: X \to \mathbb{R}$ be positive on the convex set $X \subset \mathbb{R}^n$; then $f$ is said to be $r^+$-convex on $X$ if
$$f(\lambda x^1 + (1-\lambda) x^2) \le M_r(f(x^1), f(x^2), \lambda), \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1]. \qquad (2)$$
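The generalized means used in (2), and the monotonicity stated in assertion c), are easy to explore numerically. A minimal sketch (the function name is ours), treating $r = \pm\infty$ and $r = 0$ as the limiting cases defined above:

```python
import math

def power_mean(a, b, lam, r):
    """Generalized r-th mean M_r(a, b, lam) of the positive numbers a and b."""
    if r == math.inf:
        return max(a, b)
    if r == -math.inf:
        return min(a, b)
    if r == 0:
        return a ** lam * b ** (1 - lam)          # geometric mean M_0
    return (lam * a ** r + (1 - lam) * b ** r) ** (1 / r)

rs = [-math.inf, -2, -1, 0, 1, 2, math.inf]
vals = [power_mean(1.0, 4.0, 0.5, r) for r in rs]
print(all(x < y for x, y in zip(vals, vals[1:])))                  # True: strictly increasing in r
print(power_mean(1.0, 4.0, 0.5, 0), power_mean(1.0, 4.0, 0.5, 1))  # 2.0 2.5
print(abs(power_mean(1.0, 4.0, 0.5, 1e-8) - 2.0) < 1e-6)           # True: continuity at r = 0
```

Because $M_r$ grows with $r$, inequality (2) becomes weaker as $r$ increases, which is why the classes defined by (2) are nested.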
Note that (2) gives the usual definition of (positive) convex functions for $r = 1$ and of (positive) quasiconvex functions for $r = +\infty$. In general, for $r > 1$ we obtain from (2) generalized convex functions and for $r < 1$ we obtain special convex functions. In particular, for $r = 0$ we get the logarithmic convex functions:
$$f(\lambda x^1 + (1-\lambda) x^2) \le [f(x^1)]^\lambda [f(x^2)]^{1-\lambda}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1].$$
Since $M_s(f(x^1), f(x^2), \lambda) \ge M_r(f(x^1), f(x^2), \lambda)$ for $s > r$, it follows that a function that is $r^+$-convex will also be $s^+$-convex for all $s > r$. Thus we have a continuous transition from the class of positive convex functions ($r = 1$) to the class of positive quasiconvex functions ($r = +\infty$), via the intermediate class of $r^+$-convex functions, with $1 < r < +\infty$. Recall that we restricted our definition of $M_r(f(x^1), f(x^2), \lambda)$ to a positive $f$ in order to allow zero and negative values of $r$. Avriel (1972)
and Martos (1975) independently define r-convex functions as follows.

Definition 2.17.2. The function $f: X \to \mathbb{R}$, defined on the convex set $X \subset \mathbb{R}^n$, is said to be r-convex on $X$ if
$$f(\lambda x^1 + (1-\lambda) x^2) \le \log M_r\left(e^{f(x^1)}, e^{f(x^2)}, \lambda\right), \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1].$$
By the definition of the generalized means, r-convex functions are thus also defined equivalently as those functions satisfying the inequalities (for all $x^1, x^2 \in X$ and for all $\lambda \in [0,1]$):
$$f(\lambda x^1 + (1-\lambda) x^2) \le \begin{cases} \log\left[\lambda e^{r f(x^1)} + (1-\lambda) e^{r f(x^2)}\right]^{1/r}, & \text{if } r \neq 0, \\ \lambda f(x^1) + (1-\lambda) f(x^2), & \text{if } r = 0, \\ \max\{f(x^1), f(x^2)\}, & \text{if } r = +\infty, \\ \min\{f(x^1), f(x^2)\}, & \text{if } r = -\infty. \end{cases}$$
Note that r-convexity, which is no longer restricted to positive values of $f$, reduces to ordinary convexity when $r = 0$ and to quasiconvexity when $r = +\infty$. As noted above, r-convexity implies s-convexity for all $s > r$ ($\log \xi$ being a strictly increasing function for $\xi \in (0, +\infty)$). So r-convex functions represent a continuous parametric transition from the class of convex functions ($r = 0$) to the class of quasiconvex functions ($r = +\infty$), via the intermediate class of r-convex functions, with $0 < r < +\infty$. Avriel (1972) calls the functions that satisfy Definition 2.17.2 with $r < 0$ superconvex, and those with $r > 0$ subconvex. Thus superconvexity implies convexity, which implies subconvexity, which in turn implies quasiconvexity. It is often difficult, from an algebraic point of view, to deal with r-convex functions. However, for r-convex functions with a finite $r$ we have the following useful results due to Avriel (1972).

Theorem 2.17.2. Let $f: X \to \mathbb{R}$, with $X \subset \mathbb{R}^n$ convex; then $f$ is r-convex on $X$, with $r \neq 0$, if and only if the function
$$\tilde{f} = \exp(r f(x))$$
is convex for $r > 0$ and concave for $r < 0$.
Proof. The proof follows easily from the definition of r-convex functions. □

Theorem 2.17.3. Let $f: X \to \mathbb{R}$ be a twice-continuously differentiable function on the open convex set $X \subset \mathbb{R}^n$. Then $f$ is r-convex on $X$ if and only if the matrix $Q$, given by
$$Q(x) = r \nabla f(x) (\nabla f(x))^T + Hf(x),$$
is positive semidefinite for all $x \in X$.

Theorem 2.17.4. Let $f: X \to \mathbb{R}$ be a twice-continuously differentiable quasiconvex function on the open convex set $X \subset \mathbb{R}^n$. If there exists a real number $r^*$ satisfying
$$r^* = \sup_{x \in X,\ z^T \nabla f(x) \neq 0} \frac{-z^T Hf(x) z}{(z^T \nabla f(x))^2},$$
then $f$ is $r^*$-convex.
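As an illustration of Theorems 2.17.2-2.17.4 (our example, not taken from the cited papers), take $f(x) = \log x$ on $X = (0, +\infty)$, which is quasiconvex but concave. Here $r f'(x)^2 + f''(x) = (r-1)/x^2$, so the one-variable matrix $Q$ of Theorem 2.17.3 is nonnegative exactly when $r \ge 1$, and the supremum in Theorem 2.17.4 equals $1$; correspondingly, $\exp(r \log x) = x^r$ is convex for $r \ge 1$. The sketch checks these facts on sample points and evaluates the defining inequality of Definition 2.17.2 directly.

```python
import math

def r_convex_gap(f, x1, x2, lam, r):
    """RHS minus LHS of Definition 2.17.2 for finite r; f is r-convex iff the gap is >= 0."""
    lhs = f(lam * x1 + (1 - lam) * x2)
    if r == 0:
        rhs = lam * f(x1) + (1 - lam) * f(x2)
    else:
        rhs = math.log(lam * math.exp(r * f(x1)) + (1 - lam) * math.exp(r * f(x2))) / r
    return rhs - lhs

def q_value(df, d2f, x, r):
    """One-variable matrix Q(x) of Theorem 2.17.3: r f'(x)^2 + f''(x)."""
    return r * df(x) ** 2 + d2f(x)

f, df, d2f = math.log, (lambda x: 1.0 / x), (lambda x: -1.0 / x ** 2)
xs = [0.1 * k for k in range(1, 51)]

# Theorem 2.17.4: r* = sup of -f''(x) / f'(x)^2; for log it equals 1 at every x.
r_star = max(-d2f(x) / df(x) ** 2 for x in xs)
print(round(r_star, 12))                                      # 1.0
print(min(q_value(df, d2f, x, 1.0) for x in xs) >= -1e-12)    # True
print(q_value(df, d2f, 2.0, 0.5))                             # -0.125
print(r_convex_gap(f, 1.0, 9.0, 0.5, 0.5) < 0)                # True: not 0.5-convex
```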
A further extension of convexity by the use of more general means is possible (see Avriel (1976), Ben-Tal (1977), Castagnoli and Mazzoleni (1989b), Mond (1983)).

Definition 2.17.3. Let $f: X \to \mathbb{R}$ be defined on the convex set $X \subset \mathbb{R}^n$ and let $\vartheta$ be a continuous strictly increasing scalar function that includes $f(x^1)$ and $f(x^2)$ in its domain, for any $x^1, x^2 \in X$. Then $f$ is said to be $\vartheta$-convex on $X$ if
$$f(\lambda x^1 + (1-\lambda) x^2) \le \vartheta^{-1}\{\lambda \vartheta[f(x^1)] + (1-\lambda) \vartheta[f(x^2)]\}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1]. \qquad (3)$$
Note that if $\vartheta(x) = x$, then (3) reduces to the usual definition of convexity. If $\vartheta(x) = x^r$ for $r \neq 0$ and $\vartheta(x) = \log x$ for $r = 0$, then (3) reduces to $r^+$-convexity. If $\vartheta(x) = e^{rx}$ for $r \neq 0$ and $\vartheta(x) = x$ for $r = 0$, then (3) reduces to r-convexity.
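The right-hand side of (3) is a quasi-arithmetic mean. The sketch below (names are ours) evaluates it for the choices of $\vartheta$ just listed and checks that $\vartheta(x) = e^{rx}$ reproduces the bound $\log M_r(e^{f(x^1)}, e^{f(x^2)}, \lambda)$ of Definition 2.17.2; the numbers $1$ and $4$ stand for $f(x^1)$ and $f(x^2)$.

```python
import math

def theta_mean(theta, theta_inv, a, b, lam):
    """Right-hand side of (3): theta^{-1}(lam * theta(a) + (1 - lam) * theta(b))."""
    return theta_inv(lam * theta(a) + (1 - lam) * theta(b))

a, b, lam, r = 1.0, 4.0, 0.5, 2.0
arith = theta_mean(lambda t: t, lambda t: t, a, b, lam)                               # convexity
power = theta_mean(lambda t: t ** r, lambda t: t ** (1 / r), a, b, lam)              # r^+-convexity
geom = theta_mean(math.log, math.exp, a, b, lam)                                     # r^+, r = 0
expo = theta_mean(lambda t: math.exp(r * t), lambda t: math.log(t) / r, a, b, lam)   # r-convexity
print(arith, round(power, 6), round(geom, 12))   # 2.5 2.915476 2.0

# log M_r(e^a, e^b, lam), the bound of Definition 2.17.2
mr = (lam * math.exp(a) ** r + (1 - lam) * math.exp(b) ** r) ** (1 / r)
print(math.isclose(expo, math.log(mr)))          # True
```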
So far we have extended the definition of convexity by generalizing the right-hand side of the usual definition of convex function. A further extension is possible by generalizing the left-hand side of (3) as well. This has been done by Avriel and Zang (1980), who introduced the arcwise connected functions: in essence, since $f(\lambda x^1 + (1-\lambda) x^2)$, $\lambda \in [0,1]$, consists of the values of $f$ at all points on the straight line segment between $x^1$ and $x^2$, we can consider, instead of line segments, a more general path from $x^1$ to $x^2$.

Definition 2.17.4. The set $X \subset \mathbb{R}^n$ is said to be arcwise connected if for every pair of points $x^1, x^2 \in X$ there exists a continuous vector-valued function $H_{x^1,x^2}(\lambda)$, called an arc or a path, defined on the unit interval $[0,1]$ and with values in $X$, such that
$$H_{x^1,x^2}(0) = x^1; \qquad H_{x^1,x^2}(1) = x^2.$$
Obviously convex sets are arcwise connected, and every arcwise connected set is also connected. On the other hand, a nonconvex set may be arcwise connected. Consider, e.g., the following example, taken from Avriel (1976). Let $X \subset \mathbb{R}^2$ be the set of points lying outside a circle centered at the origin and with radius $r$:
$$X = \{(x_1, x_2) \mid x_1, x_2 \in \mathbb{R},\ (x_1)^2 + (x_2)^2 - r^2 \ge 0\}.$$
$X$ is obviously not convex; now the function, expressed in polar coordinates,
$$H_{x^1,x^2}(\lambda) = \begin{pmatrix} [(1-\lambda) r_1 + \lambda r_2] \cos((1-\lambda)\alpha_1 + \lambda \alpha_2) \\ [(1-\lambda) r_1 + \lambda r_2] \sin((1-\lambda)\alpha_1 + \lambda \alpha_2) \end{pmatrix}, \qquad 0 \le \lambda \le 1,\ x^1, x^2 \in X,$$
where
$$x^i_1 = r_i \cos \alpha_i, \qquad x^i_2 = r_i \sin \alpha_i, \qquad i = 1, 2,$$
is an arc, and $X$ is an arcwise connected set.

Definition 2.17.5. A function $f: X \to \mathbb{R}$, defined on the arcwise connected set $X \subset \mathbb{R}^n$, is called $\vartheta$-arcwise convex on $X$ if there exists an arc $H_{x^1,x^2}(\lambda)$ with values in $X$ such that
$$f[H_{x^1,x^2}(\lambda)] \le \vartheta^{-1}\{\lambda \vartheta[f(x^2)] + (1-\lambda) \vartheta[f(x^1)]\}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1],$$
where $\vartheta$ is a continuous strictly increasing scalar function including $f(x^1)$ and $f(x^2)$ in its domain. Avriel and Zang (1980) study extensively the subclass of $\vartheta$-arcwise convex functions where $\vartheta(x) = x$ and call these functions arcwise connected functions. Similarly, these same authors introduce arcwise quasiconnected and arcwise pseudoconnected functions. These classes of functions have, under some mild regularity conditions, some interesting local/global minimum properties; for details the reader is referred to Avriel and Zang (1980) and to Avriel, Diewert, Schaible and Zang (1987). Another subclass of $\vartheta$-arcwise convex functions is obtained by assuming that in Definition 2.17.5 the arc $H_{x^1,x^2}(\lambda)$ is an h-mean value function, given by
$$H_{x^1,x^2}(\lambda) = h^{-1}[\lambda h(x^2) + (1-\lambda) h(x^1)], \qquad \lambda \in [0,1],$$
where $h$ is a continuous one-to-one and onto function defined on a subset of $\mathbb{R}^n$ including $X$ and with values in $\mathbb{R}^n$. This type of function was explored in a most elegant way by Ben-Tal (1977); see also Castagnoli and Mazzoleni (1989b).

Definition 2.17.6. Let $f$ be defined on the arcwise connected set $X \subset \mathbb{R}^n$ and let $h$ and $\vartheta$ be the functions previously defined; then $f$ is said to be $(h-\vartheta)$-convex on $X$ if
$$f\left(h^{-1}[\lambda h(x^1) + (1-\lambda) h(x^2)]\right) \le \vartheta^{-1}\{\lambda \vartheta[f(x^1)] + (1-\lambda) \vartheta[f(x^2)]\}, \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1].$$
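Definition 2.17.6 can be tested numerically once $h$ and $\vartheta$ are fixed. Purely as an illustration (these choices are ours, not the text's), take $h(x) = (\log x_1, \ldots, \log x_n)$ on the open positive orthant and $\vartheta$ the identity: the $h$-mean point is then a weighted geometric mean, and $(h-\vartheta)$-convexity amounts to convexity of $f(e^y)$. The posynomial $f(x_1, x_2) = x_1/x_2 + x_1 x_2$ passes the sampled test, although it is not convex.

```python
import math
import random

def f(x):
    """Illustrative posynomial f(x1, x2) = x1/x2 + x1*x2 on the open positive orthant."""
    return x[0] / x[1] + x[0] * x[1]

def h_mean_point(x1, x2, lam):
    """h^{-1}[lam h(x1) + (1-lam) h(x2)] for h = componentwise log (a weighted geometric mean)."""
    return [math.exp(lam * math.log(u) + (1 - lam) * math.log(w)) for u, w in zip(x1, x2)]

rng = random.Random(0)
worst = math.inf
for _ in range(2000):
    x1 = [rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)]
    x2 = [rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)]
    lam = rng.random()
    # Definition 2.17.6 with theta = identity: f(h-mean point) <= lam f(x1) + (1-lam) f(x2)
    gap = lam * f(x1) + (1 - lam) * f(x2) - f(h_mean_point(x1, x2, lam))
    worst = min(worst, gap)
print(worst >= -1e-9)                              # True on every sampled pair

# ...whereas f is not convex: the ordinary midpoint of a and b violates (1).
a, b = [1.5, 1.5], [0.5, 2.5]
print(f([1.0, 2.0]) > 0.5 * f(a) + 0.5 * f(b))     # True: 2.5 > 2.35
```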
Note that if in Definition 2.17.6 we take $h(x) = x$, $x \in \mathbb{R}^n$, and $\vartheta(x) = e^{rx}$, $r \neq 0$, we obtain as a special case the r-convex functions. Avriel (1976) and Avriel and Zang (1980) point out the existence of functions that are $(h-\vartheta)$-convex but that do not belong to any other class of
generalized convex functions previously described in the present chapter. This is the case, for example, of the Rosenbrock "curved valley" function, defined by
$$f(x_1, x_2) = 100\,[x_2 - (x_1)^2]^2 + (1 - x_1)^2,$$
which is a continuously differentiable nonconvex function on $\mathbb{R}^2$, having a unique minimum at $x^0 = (1, 1)$. Some of the results obtained for $(h-\vartheta)$-convex functions include the following theorems (see Avriel (1976), Ben-Tal (1977)).

Theorem 2.17.5. a)
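If $f$ and $g$ are $(h-\vartheta)$-convex, then $\vartheta^{-1}[\vartheta(f) + \vartheta(g)]$ is also $(h-\vartheta)$-convex.
b) If $f$, $h$ and $\vartheta$ are differentiable, then $f$ is $(h-\vartheta)$-convex if and only if
$$\vartheta[f(x^1)] - \vartheta[f(x^2)] \ge \vartheta'[f(x^2)] \sum_{i=1}^n \sum_{j=1}^n \frac{\partial f(x^2)}{\partial x_j} \, \frac{\partial h_j^{-1}(h(x^2))}{\partial h_i} \left(h_i(x^1) - h_i(x^2)\right), \qquad \forall x^1, x^2 \in X.$$
The result under b) reduces to the usual definition of differentiable convex functions if $\vartheta[f(x)] = f(x)$ and $h(x) = x$.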
The following result is an immediate consequence of the inequality under b) in the previous theorem.

Theorem 2.17.6. Let $f$ be a differentiable $(h-\vartheta)$-convex function, with $h$ differentiable on $X \subset \mathbb{R}^n$ and $\vartheta$ differentiable in the range of $f$, and let $x^0 \in X$ satisfy $\nabla f(x^0) = 0$. Then $x^0$ is a global minimum of $f$ over $X$.
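Theorem 2.17.6 can be illustrated with the Rosenbrock function mentioned above. The sketch does not reproduce the particular $h$ and $\vartheta$ for which that function is $(h-\vartheta)$-convex (they are not given here); with hand-coded derivatives it only verifies that $(1,1)$ is the unique stationary point and a global minimum, while the Hessian is indefinite at $(0,1)$, so the function is not convex.

```python
def rosenbrock(x1, x2):
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def gradient(x1, x2):
    return (-400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1), 200.0 * (x2 - x1 ** 2))

def hessian(x1, x2):
    return ((1200.0 * x1 ** 2 - 400.0 * x2 + 2.0, -400.0 * x1),
            (-400.0 * x1, 200.0))

# The second gradient component vanishes only on the parabola x2 = x1^2, where the
# first component reduces to -2 (1 - x1); hence (1, 1) is the only stationary point.
print(all(gradient(t, t * t)[0] == -2.0 * (1.0 - t) for t in [-2.0, -0.5, 0.0, 0.5, 3.0]))  # True
print(rosenbrock(1.0, 1.0))                  # 0.0, a global minimum since f >= 0
H = hessian(0.0, 1.0)
print(H[0][0] < 0 < H[1][1])                 # True: indefinite Hessian at (0, 1)
print(rosenbrock(0.0, 1.0) > 0.5 * (rosenbrock(-1.0, 1.0) + rosenbrock(1.0, 1.0)))  # True: 101 > 2
```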
Moreover, $(h-\vartheta)$-convexity is related to ordinary convexity by the following result, again due to Avriel (1976) and Ben-Tal (1977).
Theorem 2.17.7. The function $f$ is $(h-\vartheta)$-convex if and only if the function $\bar{f}$, given by
$$\bar{f}(y) = \vartheta[f(h^{-1}(y))],$$
is convex.

Applications of $(h-\vartheta)$-convexity to nonlinear programming can be found in Avriel (1976) and Ben-Tal (1977).

Another approach in extending the notion of convex function is concerned with differentiable functions. Hanson (1981) noted that the (generalized) convexity requirement, utilized to prove sufficient optimality conditions for a differentiable mathematical programming problem (see the next chapter), can be further weakened by substituting the linear term $(x^1 - x^2)$, appearing in the definitions of differentiable convex, pseudoconvex and quasiconvex functions, with an arbitrary vector-valued function.

Definition 2.17.6. The differentiable function $f: X \to \mathbb{R}$, $X$ an open subset of $\mathbb{R}^n$, is said to be invex on $X$ if there exists a vector-valued function $\eta: X \times X \to \mathbb{R}^n$ such that
$$f(x^1) - f(x^2) \ge \eta(x^1, x^2)\, \nabla f(x^2), \qquad \forall x^1, x^2 \in X.$$
The name "invex" was given by Craven (1981b) and stands for "invariant convex", since $f = g \circ \vartheta$ will be invex if the differentiable function $g: \mathbb{R}^m \to \mathbb{R}$ is convex and the differentiable surjective function $\vartheta: \mathbb{R}^n \to \mathbb{R}^m$ has a Jacobian matrix $\nabla\vartheta$ of full rank. Indeed, for any $y, z \in \mathbb{R}^n$ with $u = \vartheta(y)$ and $x = \vartheta(z)$, from the chain rule for differentiation we have
$$\nabla f(y)\, \eta(z, y) = \nabla g(u)\, \nabla\vartheta(y)\, \eta(z, y) = \nabla g(u)(x - u),$$
where $\eta(z, y)$ is now chosen in order to satisfy the equality $\nabla\vartheta(y)\, \eta(z, y) = x - u$ (the hypothesis on the rank of $\nabla\vartheta$ assures the existence of such a solution). Immediately we can deduce the invexity of $f$:
$$\nabla f(y)\, \eta(z, y) = \nabla g(u)(x - u) \le g(x) - g(u) = f(z) - f(y).$$
We can note that the same property does not hold for convex functions; take, e.g., the function $y = e^x$, $x \in \mathbb{R}$, which is convex. Its convexity is destroyed for the transformed function $y = \exp\sqrt[3]{u}$. Similarly, $f$ is said to be pseudo-invex on $X$ if for some vector-valued function $\eta(x^1, x^2)$ and all $x^1, x^2 \in X$ we have
$$\eta(x^1, x^2)\, \nabla f(x^2) \ge 0 \implies f(x^1) \ge f(x^2).$$
The function $f: X \to \mathbb{R}$ is said to be quasi-invex on $X$ if there exists a vector-valued function $\eta(x^1, x^2)$, not identically equal to zero, such that for all $x^1, x^2 \in X$ we have
$$f(x^1) \le f(x^2) \implies \eta(x^1, x^2)\, \nabla f(x^2) \le 0.$$
Local definitions of invex (pseudo-invex, quasi-invex) functions at a point $x^0 \in X$ have also been given: see Kaul and Kaur (1985). It follows that, by taking $\eta(x^1, x^2) = x^1 - x^2$, convex functions are invex, pseudoconvex functions are pseudoinvex and quasiconvex functions are quasiinvex. Moreover, invex functions are both pseudoinvex and quasiinvex, and the sum of two (or more) functions that are invex with respect to the same function $\eta(x^1, x^2)$ is also invex. Ben-Israel and Mond (1986) and Kaul and Kaur (1985) have studied the relationships between the various classes of (generalized) invex functions and (generalized) convex functions. Let us complete their results (here and also in the sequel, for similar comparisons, $X \subset \mathbb{R}^n$ is assumed to be open and convex):
I) A differentiable convex function is also invex, but not conversely.
II) A differentiable pseudoconvex function is also pseudoinvex, but not conversely.
III) A differentiable quasiconvex function is also quasiinvex, but not conversely.
IV) Every invex function is also pseudoinvex with respect to the same function $\eta(x^1, x^2)$, but not conversely.
V) Every pseudoinvex function is also quasiinvex, but not conversely.
Further insights into these relationships can be deduced by means of the following interesting characterization of invex functions (Craven and Glover (1985), Ben-Israel and Mond (1986)).

Theorem 2.17.8. Let $f: X \to \mathbb{R}$ be differentiable on the open set $X \subset \mathbb{R}^n$; then $f$ is invex on $X$ if and only if every stationary point of $f$ is a global minimum point over $X$.
Proof. Clearly, if $f$ is invex, then $\nabla f(x^0) = 0$ implies $f(x) \ge f(x^0)$, $\forall x \in X$. Assume now that
$$\nabla f(x^0) = 0 \implies f(x) \ge f(x^0), \qquad \forall x \in X.$$
If $\nabla f(x^0) = 0$, take any $\eta(x, x^0)$. If $\nabla f(x^0) \neq 0$, take
$$\eta(x, x^0) = \frac{f(x) - f(x^0)}{[\nabla f(x^0)]^T \nabla f(x^0)}\, \nabla f(x^0). \qquad □$$
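The proof of Theorem 2.17.8 is constructive. The sketch below (the helper `make_eta` and the test function $f(x_1, x_2) = x_1 + x_2^3$ are our illustrative choices) builds $\eta$ from the formula above and checks the invexity inequality on random pairs; since $\partial f / \partial x_1 = 1$, this $f$ has no stationary points, yet it is not convex.

```python
import random

def make_eta(f, grad):
    """eta(x, x0) from the proof of Theorem 2.17.8, for points with grad f(x0) != 0."""
    def eta(x, x0):
        g = grad(x0)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            raise ValueError("x0 is stationary: any eta works if x0 is a global minimizer")
        scale = (f(x) - f(x0)) / gg
        return [scale * gi for gi in g]
    return eta

f = lambda x: x[0] + x[1] ** 3          # df/dx1 = 1, so there are no stationary points
grad = lambda x: [1.0, 3.0 * x[1] ** 2]
eta = make_eta(f, grad)

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    x = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
    x0 = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
    lhs = f(x) - f(x0)
    rhs = sum(e * g for e, g in zip(eta(x, x0), grad(x0)))
    worst = min(worst, lhs - rhs)        # invexity needs lhs - rhs >= 0
print(worst > -1e-9)                     # True: equality up to rounding
print(f([0.0, -1.0]) > 0.5 * (f([0.0, -2.0]) + f([0.0, 0.0])))   # True: -1 > -4, f is not convex
```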
It follows from Theorem 2.17.8 that if $f$ has no stationary points, then $f$ is invex, and that both pseudoconvex and pseudoinvex functions are invex. Thus (unlike pseudoconvex and convex functions) there is no distinction between pseudoinvex and invex functions. This is not in contrast with the previous property IV), which is established with respect to the same function $\eta$. We note that some authors (see, e.g., Hanson and Mond (1987), Kim (1988)) still consider pseudoinvexity as a generalization of invexity. Another result, useful in detecting the relationships between the various classes of generalized convex functions considered in this chapter, is Corollary 2.14.1: a differentiable quasiconvex function on the open convex set $X \subset \mathbb{R}^n$ is pseudoconvex on $X$ if and only if $f(x)$ has a global minimum at $x \in X$ whenever $\nabla f(x) = 0$. Thus, under the assumption of quasiconvexity, invexity and pseudoconvexity coincide. So for an invex function not to be pseudoconvex, it must also not be quasiconvex. Another result concerning the above-mentioned relationships is given by the following theorem, due to Pini (1991).
Theorem 2.17.9. The class of pseudoconvex functions on $X \subset \mathbb{R}^n$ is strictly included in the class of invex functions if $n > 1$; if $n = 1$ (functions of one real variable) the two classes coincide.

We can therefore complete the previous results I)-V):
VI) The classes of invex and pseudoinvex functions coincide.
VII) The classes of quasiconvex and invex functions only partially overlap; for example, $f(x) = x^3$ is quasiconvex on $\mathbb{R}$, but not invex, since the stationary point $x_0 = 0$ is not a global minimum point for $f$. The function $f(x_1, x_2) = (x_1)^3 + x_1 - 10(x_2)^3 - x_2$ is invex, as it lacks stationary points, but not quasiconvex, as $x^0 = (0, 0)$ and $x = (2, 1)$ give $f(x) - f(x^0) < 0$ but $(x - x^0)^T \nabla f(x^0) > 0$.
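The two examples of VII) can be verified directly. A short sketch with hand-coded derivatives (the function names are ours):

```python
def f1(x):
    """x^3: increasing, hence quasiconvex on R, but not invex."""
    return x ** 3

def df1(x):
    return 3 * x ** 2

def f2(x1, x2):
    """Invex (no stationary points) but not quasiconvex."""
    return x1 ** 3 + x1 - 10 * x2 ** 3 - x2

def grad_f2(x1, x2):
    return (3 * x1 ** 2 + 1, -30 * x2 ** 2 - 1)   # first component is always >= 1

# x0 = 0 is stationary for f1 but not a global minimum, so f1 is not invex (Theorem 2.17.8).
print(df1(0) == 0 and f1(-1) < f1(0))              # True

# Quasiconvexity test for f2 at x0 = (0, 0), x = (2, 1):
x0, x = (0, 0), (2, 1)
g0 = grad_f2(*x0)
dot = (x[0] - x0[0]) * g0[0] + (x[1] - x0[1]) * g0[1]
print(f2(*x) - f2(*x0), dot)                        # -1 1
# A direct violation on the segment: f2 at the midpoint (1, 0.5) exceeds both endpoint values.
print(f2(1.0, 0.5) > max(f2(*x0), f2(*x)))          # True: 0.25 > 0
```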
The relationships between these notions of generalized convexity of differentiable functions, defined on an open convex set $X \subset \mathbb{R}^n$, can be represented in the diagram of Figure 13 (an incorrect diagram appears in Ben-Israel and Mond (1986); the flaw was corrected by Giorgi (1990)). Another approach to characterizing invex functions is through some associated sets; more precisely, Zang, Choo and Avriel (1977) characterized, by means of the lower level sets
$$L(f, \alpha) = \{x \mid x \in X,\ f(x) \le \alpha\}, \qquad \alpha \in \mathbb{R},$$
those functions whose stationary points are global minimum points, i.e. invex functions.

Definition 2.17.7. If $L(f, \alpha)$ is nonempty, then it is strictly lower semicontinuous if, for every $x \in L(f, \alpha)$ and every sequence $\{\alpha_i\}$ with $\alpha_i \to \alpha$ and $L(f, \alpha_i)$ nonempty, there exist $k \in \mathbb{N}$, a sequence $\{x^i\} \to x$ and $\beta(x) \in \mathbb{R}$, $\beta(x) > 0$, such that $x^i \in L(f, \alpha_i - \beta(x) \|x^i - x\|)$, $i = k, k+1, \ldots$.

The cited authors prove the following result.

Theorem 2.17.10. A function $f: X \to \mathbb{R}$, differentiable on the open set $X \subset \mathbb{R}^n$, is invex on $X$ if and only if $L(f, \alpha)$ is strictly lower semicontinuous for every $\alpha$ such that $L(f, \alpha)$ is nonempty.
Proof. See Zang, Choo and Avriel (1977). □

Figure 13.

In order to consider also some type of invexity for nondifferentiable functions, Ben-Israel and Mond (1986), Weir and Mond (1988) and Weir and Jeyakumar (1988) introduced the following definition.
Definition 2.17.8. A function $f: X \to \mathbb{R}$ is said to be pre-invex on $X$ if there exists a vector-valued function $\eta: X \times X \to \mathbb{R}^n$ such that
$$x^2 + \lambda \eta(x^1, x^2) \in X, \qquad \forall \lambda \in [0,1],\ \forall x^1, x^2 \in X,$$
and
$$f[x^2 + \lambda \eta(x^1, x^2)] \le \lambda f(x^1) + (1-\lambda) f(x^2), \qquad \forall x^1, x^2 \in X,\ \forall \lambda \in [0,1].$$
Of course pre-invexity is a generalization of convexity; Weir and Mond (1988) have given the following example of a pre-invex function which is not convex: $f(x) = -|x|$, $x \in \mathbb{R}$. Then $f$ is pre-invex with $\eta$ given by
$$\eta(x_1, x_2) = \begin{cases} x_1 - x_2, & \text{if } x_2 \ge 0 \text{ and } x_1 \ge 0, \\ x_1 - x_2, & \text{if } x_2 \le 0 \text{ and } x_1 \le 0, \\ x_2 - x_1, & \text{if } x_2 > 0 \text{ and } x_1 < 0, \\ x_2 - x_1, & \text{if } x_2 < 0 \text{ and } x_1 > 0. \end{cases}$$
… $0^+$, we have the definition of invex functions. □ For other characterizations of nondifferentiable invex functions by means of subgradients and directional derivatives, see Craven and Glover (1985), Reiland (1989, 1990), Giorgi and Guerraggio (1996), Jeyakumar (1987).
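Weir and Mond's example can be checked on a grid. The sketch (names are ours) evaluates the pre-invexity inequality of Definition 2.17.8 with the piecewise $\eta$ given above and shows that $f(x) = -|x|$ is nevertheless not convex. A grid check is only a sanity test; the inequality itself follows case by case, because $f$ is linear on each half-line.

```python
def f(x):
    return -abs(x)

def eta(x1, x2):
    """The piecewise eta of Weir and Mond's example."""
    if (x1 >= 0 and x2 >= 0) or (x1 <= 0 and x2 <= 0):
        return x1 - x2
    return x2 - x1

points = [k / 2 for k in range(-8, 9)]          # -4.0, -3.5, ..., 4.0
lams = [k / 10 for k in range(11)]
ok = all(f(x2 + lam * eta(x1, x2)) <= lam * f(x1) + (1 - lam) * f(x2) + 1e-12
         for x1 in points for x2 in points for lam in lams)
print(ok)                                        # True: pre-invexity inequality holds on the grid
print(f(0.0) > 0.5 * (f(-1.0) + f(1.0)))         # True: 0 > -1, so f is not convex
```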
References to Chapter II

K.J. ARROW and A.C. ENTHOVEN (1961), Quasiconcave programming, Econometrica, 29, 779-800.
K.J. ARROW, L. HURWICZ and H. UZAWA (1961), Constraint qualifications in maximization problems, Naval Res. Logistics Quart., 8, 175-191.
M. AVRIEL (1972), r-convex functions, Math. Programming, 2, 309-323.
M. AVRIEL (1973), Solutions of certain nonlinear programs involving r-convex functions, J.O.T.A., 11, 159-174.
M. AVRIEL (1976), Nonlinear Programming: Analysis and Methods, Prentice Hall, Englewood Cliffs, N.J.
M. AVRIEL, W.E. DIEWERT, S. SCHAIBLE and I. ZANG (1987), Generalized Concavity, Plenum Publ. Corp., New York.
M. AVRIEL, W.E. DIEWERT, S. SCHAIBLE and W.T. ZIEMBA (1981), Introduction to concave and generalized concave functions; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 21-50.
M. AVRIEL and S. SCHAIBLE (1978), Second order characterization of pseudoconvex functions, Math. Programming, 14, 170-185.
M. AVRIEL and I. ZANG (1974), Generalized convex functions with applications to nonlinear programming; in P.A. Van Moeseke (Ed.), Mathematical Programs for Activity Analysis, North Holland, Amsterdam, 23-33.
M. AVRIEL and I. ZANG (1980), Generalized arcwise connected functions and characterizations of local-global minimum properties, J.O.T.A., 32, 407-425.
M.S. BAZARAA, J.J. JARVIS and H.D. SHERALI (1990), Linear Programming and Network Flows, J. Wiley, New York.
M.S. BAZARAA and C.M. SHETTY (1976), Foundations of Optimization, Springer Verlag, Berlin.
E.F. BECKENBACH (1937), Generalized convex functions, Bull. Amer. Math. Soc., 43, 363-371.
C.R. BECTOR (1970), Some aspects of quasiconvex programming, Z. Angew. Math. Mech., 50, 495-497.
C.R. BECTOR (1973), On convexity, pseudo-convexity and quasi-convexity of composite functions, Cahiers Centre Etudes Res. Oper., 15, 411-428.
C.R. BECTOR and S. CHANDRA (1986), p-convexity and first order duality for a nonlinear programming problem, Congressus Numerantium, 52, 53-62.
R. BELLMAN (1960), Introduction to Matrix Analysis, McGraw-Hill, New York.
A. BEN-ISRAEL (1969), Linear equations and inequalities on finite dimensional, real or complex, vector spaces: a unified theory, J. Math. Anal. Appl., 27, 367-389.
A. BEN-ISRAEL and B. MOND (1986), What is invexity?, J. Austral. Math. Soc., 28 (B), 1-9.
A. BEN-TAL (1977), On generalized means and generalized convex functions, J.O.T.A., 21, 1-13.
A. BEN-TAL and A. BEN-ISRAEL (1976), A generalization of convex functions via support properties, J. Austral. Math. Soc., 21 (A), 341-361.
A. BEN-TAL and A. BEN-ISRAEL (1981), F-convex functions: properties and applications; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 301-314.
B. BEREANU (1969), On the composition of convex functions, Revue Roumaine de Math. Pures et Appl., 14, 1077-1084.
B. BEREANU (1972), Quasi-convexity, strict quasi-convexity and pseudoconvexity of composite objective functions, Revue Française d'Autom. Inform. Rech. Oper., 6 R, 15-26.
C. BERGE (1963), Topological Spaces, Oliver and Boyd, Edinburgh.
C. BERGE and A. GHOUILA-HOURI (1965), Programming, Games and Transportation Networks, Methuen, London.
B. BERNSTEIN and R.A. TOUPIN (1962), Some aspects of the Hessian matrix of a strictly convex function, J. Reine und Angew. Math., 210, 65-72.
H.F. BOHNENBLUST, S. KARLIN and L.S. SHAPLEY (1950), Solutions of discrete two-person games; in H.W. Kuhn and A.W. Tucker (Eds.), Contributions to the Theory of Games, Vol. I, Annals of Mathematics Studies N. 24, Princeton Univ. Press, Princeton, 51-72.
J.M. BORWEIN (1977), Multivalued convexity: a unified approach to equality and inequality constraints, Math. Programming, 13, 163-180.
A. BRØNDSTED (1964), Conjugate convex functions in topological vector spaces, Mat. Fys. Medd. Dan. Vid. Selsk., 34, 1-27.
A. CAMBINI (1986), Nonlinear separation theorems, duality and optimality conditions; in R. Conti, E. De Giorgi and F. Giannessi (Eds.), Optimization and Related Fields, Springer Verlag, Lecture Notes in Mathematics, N. 1190, Berlin, 57-93.
E. CASTAGNOLI and P. MAZZOLENI (1986), Generalized convexity for functions and multifunctions and optimality conditions, Technical Rep., Dep. Oper. Res., Univ. of Pisa, N. 134.
References to Chapter II
189
E. CASTAGNOLI and P. MAZZOLENI (1989a), About derivatives of some generalized concave functions; in C. Singh and B.K. Dass (Eds.), Continuous-time, Fractional and Multiobjective Programming, Analytic Publishing Co., Delhi, 53-64. E. CASTAGNOLI and P. MAZZOLENI (1989b), Towards a unified type of concavity; in C. Singh and B.K. Dass (Eds.), Continuous-time, Fractional and Multiobjective Programming, Analytic Publishing Co., Delhi, 225-240. K.L. CHEW and E.U. CHOO (1984), Pseudolinearity and efficiency, Math. Programming, 28, 226-239. F.H. CLARKE (1976), A new approach to Lagrange multipliers, Math. Oper. Res., 1, 165-174. F.H. CLARKE (1983), Optimization and Nonsmooth Analysis, J. Wiley, New York. L. COLLATZ and W. WETTERLING (1975), Optimization Problems, Springer Verlag, Berlin.
R.W. COTTLE (1967), On the convexity of quadratic forms over convex sets, Op. Res., 15, 170-172. R.W. COTTLE and J.A. FERLAND (1971), On pseudo-convex functions of nonnegative variables, Math. Programming, 1, 95-101. R.W. COTTLE and J.A. FERLAND (1972), Matrix-theoretic criteria for the quasi-convexity and pseudo-convexity of quadratic functions, Linear Algebra and Its Applications, 5, 123-136. B.D. CRAVEN (1981a), Invex functions and constrained local minima, Bull. Austral. Math. Soc., 24, 357-366. B.D. CRAVEN (1981b), Duality for generalized convex fractional problems; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 473-489. B.D. CRAVEN and B.M. GLOVER (1985), Invex functions and duality, J. Austral. Math. Soc., 39 (A), 1-20.
J.P. CROUZEIX (1980), A second order condition for quasiconvexity, Math. Programming, 18, 349-352. J.P. CROUZEIX (1981), Continuity and differentiability properties of quasiconvex functions on R^n; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 109-130. J.P. CROUZEIX and J.A. FERLAND (1982), Criteria for quasiconvexity and pseudoconvexity: relationships and comparisons, Math. Programming, 23, 193-205. J.P. CROUZEIX and P.O. LINDBERG (1986), Additively decomposed quasiconvex functions, Math. Programming, 35, 42-57. E. DEAK (1962), Ueber konvexe und interne Funktionen, sowie eine gemeinsame Verallgemeinerung von beiden, Ann. Univ. Sci. Budapest Sect. Math., 5, 109-154. G. DEBREU (1952), Definite and semidefinite quadratic forms, Econometrica, 20, 295-300. G. DEBREU and T.C. KOOPMANS (1982), Additively decomposed quasiconvex functions, Math. Programming, 24, 1-38. B. DE FINETTI (1949), Sulle stratificazioni convesse, Ann. Mat. Pura e Appl., 30, 173-183. W.E. DIEWERT (1981a), Generalized concavity and economics; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 511-541. W.E. DIEWERT (1981b), Alternative characterizations of six kinds of quasiconcavity in the nondifferentiable case with applications to nonsmooth programming; in S. Schaible and W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 51-93. W.E. DIEWERT, M. AVRIEL and I. ZANG (1981), Nine kinds of quasiconcavity and concavity, J. Econ. Theory, 25, 397-420.
P. DOMBI (1985), On extremal points of quasiconvex functions, Math. Programming, 33, 115-119. H.G. EGGLESTON (1958), Convexity, Cambridge Univ. Press, Cambridge. R.M. ELKIN (1968), Convergence theorems for the Gauss-Seidel and other minimization algorithms, Ph.D. Dissertation, Univ. of Maryland, College Park. K.-H. ELSTER and R. NEHSE (1980), Optimality conditions for some nonconvex problems; in K. Iracki, K. Malanowski and S. Walukiewicz (Eds.), Optimization Techniques, Part 2, Springer Verlag, Berlin, 1-9. G.M. EWING (1977), Sufficient conditions for global minima of suitably convex functionals from variational and control theory, SIAM Review, 19, 202-220. K. FAN, I. GLICKSBERG and A.J. HOFFMAN (1957), Systems of inequalities involving convex functions, Amer. Math. Soc. Proc., 8, 617-622. W. FENCHEL (1953), Convex Cones, Sets and Functions, Lecture Notes, Princeton Univ., Princeton. J.A. FERLAND (1971), Quasi-convex and pseudo-convex functions on solid convex sets, Technical Report N. 71-4, Dept. of Operations Research, Stanford Univ., Stanford. J.A. FERLAND (1972a), Maximal domains of quasiconvexity and pseudoconvexity for quadratic functions, Math. Programming, 2, 178-192. J.A. FERLAND (1972b), Mathematical programming problems with quasiconvex objective functions, Math. Programming, 3, 296-301. J.A. FERLAND (1978), Matrix criteria for pseudoconvex functions in the class C², Linear Algebra and Its Applications, 21, 47-57. J.A. FERLAND (1981), Matrix-theoretic criteria for the quasiconvexity of twice continuously differentiable functions, Linear Algebra and Its Appl., 38, 51-63.
D. GALE (1951), Convex polyhedral cones and linear inequalities; in T.C. Koopmans (Ed.), Activity Analysis of Production and Allocation, J. Wiley & Sons, New York, 287-297. D. GALE (1960), The Theory of Linear Economic Models, McGraw-Hill, New York. L. GERENCSER (1973), On a close relation between quasiconvex and convex functions and related investigations, Mathematische Operationsforsch. und Statistik, 4, 201-211. F. GIANNESSI (1982), Metodi Matematici della Programmazione. Problemi Lineari e non Lineari, Pitagora Editrice, Bologna. F. GIANNESSI (1984), Theorems of the alternative and optimality conditions, J.O.T.A., 42, 331-365. Errata corrige in J.O.T.A., 44, 1984, 363-364. F. GIANNESSI (1987), Theorems of the alternative for multifunctions with applications to optimization: general results, J.O.T.A., 55, 233-256. W. GINSBERG (1973), Concavity and quasiconcavity in economics, J. Ec. Theory, 6, 596-605. G. GIORGI (1984), Quasiconvex programming revisited, Calcolo, 21, 307-316. G. GIORGI (1987), A note on quasiconvex functions that are pseudoconvex, Trabajos de Investigacion Oper., 2, 80-83. G. GIORGI (1990), A note on the relationships between convexity and invexity, J. Austral. Math. Soc., 32 (B), 97-99. G. GIORGI and A. GUERRAGGIO (1996), Various types of nonsmooth invex functions, J. Inf. Optim. Sciences, 17, 137-150. G. GIORGI and E. MOLHO (1992), Generalized invexity: relationships with generalized convexity and applications to optimality and duality conditions; in P. Mazzoleni (Ed.), Generalized Concavity for Economic Applications, Proceedings of the Workshop held in Pisa, April 2, 1992, Tecnoprint, Bologna, 53-70.
B.M. GLOVER (1984), Generalized convexity in nondifferentiable programming, Bull. Austral. Math. Soc., 30, 193-218. A.J. GOLDMAN (1956), Resolution and separation theorems for polyhedral convex sets; in H.W. Kuhn and A.W. Tucker (Eds.), Linear Inequalities and Related Systems, Princeton Univ. Press, Princeton, 41-51. H.J. GREENBERG and W. PIERSKALLA (1971), A review of quasiconvex functions, Op. Res., 19, 1553-1570. M. GUIGNARD (1969), Generalized Kuhn-Tucker conditions for mathematical programming problems in a Banach space, SIAM J. on Control, 7, 232-241. H. HANCOCK (1960), Theory of Maxima and Minima, Dover Publications, New York (original publication: 1917). M.A. HANSON (1964), Bounds for functionally convex optimal control problems, J. Math. Anal. Appl., 8, 84-89. M.A. HANSON (1981), On sufficiency of the Kuhn-Tucker conditions, J. Math. Anal. Appl., 80, 545-550. M.A. HANSON and B. MOND (1987), Convex transformable programming problems and invexity, J. Inf. Optim. Sciences, 8, 201-207. M.A. HANSON and N.G. RUEDA (1989), A sufficient condition for invexity, J. Math. Anal. Appl., 138, 193-198. G.H. HARDY, J.E. LITTLEWOOD and G. PÓLYA (1934), Inequalities, Cambridge Univ. Press, Cambridge. H. HARTWIG (1983), On generalized convex functions, Optimization, 14, 49-60. L. HÖRMANDER (1954), Sur la fonction d'appui des ensembles convexes dans un espace localement convexe, Ark. Math., 3, 181-186. R. HORST (1984), On the convexification of nonlinear programming problems: an applications-oriented survey, European J. of Oper. Res., 15, 382-392.
A.D. IOFFE (1986), On the theory of subdifferentials; in J.B. Hiriart-Urruty (Ed.), Fermat Days 85: Mathematics for Optimization, North Holland, Amsterdam, 183-200. A.D. IOFFE and V.L. LEVIN (1972), Subdifferentials of convex functions, Trans. Moscow Math. Soc., 26, 1-72. A.D. IOFFE and V.M. TIHOMIROV (1979), Theory of Extremal Problems, North Holland, Amsterdam. J.L.W.V. JENSEN (1906), Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica, 30, 175-193. V. JEYAKUMAR (1985), Strong and weak invexity in mathematical programming, Methods Oper. Res., 55, 109-125. V. JEYAKUMAR (1986), p-convexity and second-order duality, Utilitas Math., 29, 71-85. V. JEYAKUMAR (1987), On optimality conditions in nonsmooth inequality constrained minimization, Numer. Funct. Anal. and Optim., 9, 535-546. Y. KANNAI (1977), Concavifiability and constructions of concave utility functions, J. Math. Econ., 4, 1-56. S. KARAMARDIAN (1967), Strictly quasi-convex (concave) functions and duality in mathematical programming, J. Math. Anal. Appl., 20, 344-358. S. KARAMARDIAN (1976), Complementarity over cones with monotone and pseudomonotone maps, J.O.T.A., 18, 445-454. S. KARAMARDIAN and S. SCHAIBLE (1990), Seven kinds of monotone maps, J.O.T.A., 66, 37-46. S. KARAMARDIAN, S. SCHAIBLE and J.P. CROUZEIX (1993), Characterizations of generalized monotone maps, J.O.T.A., 76, 399-413. S. KARLIN (1959), Mathematical Methods and Theory in Games, Programming and Economics, I and II, Addison-Wesley, Reading, Mass. D.M. KATZNER (1970), Static Demand Theory, The MacMillan Company, New York.
R.N. KAUL and S. KAUR (1982), Generalizations of convex and related functions, European J. of Oper. Research, 9, 369-377. R.N. KAUL and S. KAUR (1985), Optimality criteria in nonlinear programming involving nonconvex functions, J. Math. Anal. Appl., 105, 104-112. D.S. KIM (1988), Pseudo-invexity in mathematical programming, Atti Accademia Peloritana dei Pericolanti, Classe I di Scienze Fisiche, Mat. e Naturali, 66, 347-355. A.P. KIRMAN and L.M. TOMASINI (1986), A note on convexity, Metroeconomica, 20, 136-144. A. KLINGER and O.L. MANGASARIAN (1968), Logarithmic convexity and geometric programming, J. Math. Anal. Appl., 24, 388-408. S. KOMLOSI (1983), Some properties of nondifferentiable pseudoconvex functions, Math. Programming, 26, 232-237. S. KOMLOSI (1993), First and second order characterizations of pseudolinear functions, European J. of Oper. Research, 67, 278-286. K.O. KORTANEK and J.P. EVANS (1967), Pseudoconcave programming and Lagrange regularity, Op. Res., 15, 882-891. H.W. KUHN and A.W. TUCKER (Eds.) (1956), Linear Inequalities and Related Systems, Annals of Mathematics Studies N. 38, Princeton Univ. Press, Princeton. J. KYPARISIS and A.V. FIACCO (1987), Generalized convexity and concavity of the optimal value function in nonlinear programming, Math. Programming, 39, 285-304. A. LEROUX (1984), Other determinantal conditions for concavity and quasiconcavity, J. Math. Economics, 13, 43-49. M.A. LOPEZ CERDA and V. VALLS VERDEJO (1976), Propiedades de las funciones cuasiconvexas, Trabajos de Estat. y de Invest. Operativa, 27, 107-114. D.G. LUENBERGER (1968), Quasi-convex programming, SIAM J. Appl. Math., 16, 1090-1095.
O.L. MANGASARIAN (1965), Pseudo-convex functions, S.I.A.M. J. on Control, 3, 281-290. O.L. MANGASARIAN (1969), Nonlinear Programming, McGraw-Hill, New York. O.L. MANGASARIAN (1970), Convexity, pseudoconvexity and quasiconvexity of composite functions, Cahiers du Centre d'Etudes de Recherche Oper., 12, 114-122. H.B. MANN (1943), Quadratic forms with linear constraints, American Math. Monthly, 50, 430-433. L. MARTEIN (1985), Regularity conditions for constrained extremum problems, J.O.T.A., 47, 217-233. D.H. MARTIN (1985), The essence of invexity, J.O.T.A., 47, 65-76. B. MARTOS (1965), The direct power of adjacent vertex programming methods, Management Science, 12, 241-255. B. MARTOS (1967), Quasi-convexity and quasi-monotonicity in nonlinear programming, Studia Scientiarum Mathematicarum Hungarica, 2, 265-273. B. MARTOS (1969), Subdefinite matrices and quadratic forms, S.I.A.M. J. Appl. Math., 17, 1215-1223. B. MARTOS (1971), Quadratic programming with a quasiconvex objective function, Op. Res., 19, 87-97. B. MARTOS (1975), Nonlinear Programming. Theory and Methods, North Holland, Amsterdam. D. MCFADDEN (1978), Convex analysis; in M. Fuss and D. McFadden (Eds.), Production Economics: A Dual Approach to Theory and Applications, Vol. 1, North Holland, Amsterdam, 383-408. P. MEREAU and J.C. PAQUET (1974), Second order conditions for pseudoconvex functions, S.I.A.M. J. Appl. Math., 27, 131-137. G.J. MINTY (1964), On the monotonicity of the gradient of a convex function, Pacific J. of Mathematics, 14, 243-247.
B. MOND (1983), Generalized convexity in mathematical programming, Bull. Austral. Math. Soc., 27, 185-202. P. NEWMAN (1969), Some properties of concave functions, J. Econ. Theory, 1, 291-314. H. NIKAIDO (1954), On von Neumann's minimax theorem, Pacific J. of Mathematics, 4, 65-72. H. NIKAIDO (1968), Convex Structures and Economic Theory, Academic Press, New York. J.M. ORTEGA and W.C. RHEINBOLDT (1970), Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York. K. OTANI (1983), A characterization of quasi-convex functions, J. of Econ. Theory, 31, 194-196. L. PELLEGRINI (1991), On a Lagrangian sufficient optimality condition, J.O.T.A., 68, 19-33. A.L. PERESSINI, F.E. SULLIVAN and J.J. UHL (1988), The Mathematics of Nonlinear Programming, Springer Verlag, Berlin. R. PINI (1991), Invexity and generalized convexity, Optimization, 22, 513-525. J. PONSTEIN (1967), Seven kinds of convexity, S.I.A.M. Review, 9, 115-119. T. RADO (1935), On convex functions, Trans. Amer. Math. Soc., 37, 266-285. T. RAPCSAK (1991), On pseudolinear functions, European J. of Op. Res., 50, 353-360. T.W. REILAND (1989), Generalized invexity for nonsmooth vector-valued mappings, Numer. Funct. Anal. and Optim., 10, 1191-1202. T.W. REILAND (1990), Nonsmooth invexity, Bull. Austral. Math. Soc., 42, 437-446. A.W. ROBERTS and D.E. VARBERG (1973), Convex Functions, Academic Press, New York.
R.T. ROCKAFELLAR (1967), Convex programming and systems of elementary monotonic relations, J. Math. Anal. Appl., 19, 543-564. R.T. ROCKAFELLAR (1970), Convex Analysis, Princeton Univ. Press, Princeton. R.T. ROCKAFELLAR (1974), Conjugate Duality and Optimization, C.B.M.S. Series N. 16, S.I.A.M. Publications, Philadelphia. R.T. ROCKAFELLAR (1981), The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin. N.G. RUEDA (1989), Generalized convexity in nonlinear programming, J. of Information & Optimization Sciences, 10, 395-400. S. SAKS (1937), Theory of the Integral, Hafner Publ. Co., New York. P.A. SAMUELSON (1947), Foundations of Economic Analysis, Harvard Univ. Press, Cambridge, Mass. H.H. SCHAEFER (1966), Topological Vector Spaces, MacMillan, New York. S. SCHAIBLE (1971), Beiträge zur quasikonvexen Programmierung, Doctoral Dissertation, Universität Köln. S. SCHAIBLE (1972), Quasi-convex optimization in general real linear spaces, Zeitschrift für Operations Research, 16, 205-213. S. SCHAIBLE (1973a), Quasiconvexity and pseudoconvexity of cubic functions, Math. Programming, 5, 243-247. S. SCHAIBLE (1973b), Quasi-concave, strictly quasi-concave and pseudoconcave functions; in R. Henn, H.P. Kunzi and H. Schubert (Eds.), Methods of Op. Res., 17, 308-316. S. SCHAIBLE (1981), Quasiconvex, pseudoconvex and strictly pseudoconvex quadratic functions, J.O.T.A., 35, 303-338. S. SCHAIBLE (1994), Generalized monotonicity - A survey; in S. Komlosi, T. Rapcsak and S. Schaible (Eds.), Generalized Convexity, Proceedings, Pecs, Hungary, 1992, Springer Verlag, Berlin, 229-249. S. SCHAIBLE and W.T. ZIEMBA (Eds.) (1981), Generalized Concavity in Optimization and Economics, Academic Press, New York.
C. SINGH (1983), Elementary properties of arcwise connected sets and functions, J.O.T.A., 41, 377-387. I.M. STANCU-MINASIAN (1992), Metode de Rezolvare a Problemelor de Programare Fractionara, Editura Academiei Romane, Bucharest. J. STOER and C. WITZGALL (1970), Convexity and Optimization in Finite Dimensions - I, Springer Verlag, Berlin. Y. TANAKA (1990), Note on generalized convex functions, J.O.T.A., 66, 345-349. Y. TANAKA, M. FUKUSHIMA and T. IBARAKI (1989), On generalized pseudoconvex functions, J. Math. Anal. Appl., 144, 342-355. F. TARDELLA (1989), On the image of a constrained extremum problem and some applications to the existence of a minimum, J.O.T.A., 60, 93-104. W.A. THOMPSON and D.W. PARKE (1973), Some properties of generalized concave functions, Op. Res., 21, 305-313. A.W. TUCKER (1956), Dual systems of homogeneous linear relations; in H.W. Kuhn and A.W. Tucker (Eds.), Linear Inequalities and Related Systems, Princeton Univ. Press, Princeton, 3-18. H. TUY (1964), Sur les inégalités linéaires, Colloquium Math., 13, 107-123. F.A. VALENTINE (1964), Convex Sets, McGraw-Hill, New York. J.P. VIAL (1982), Strong convexity of sets and functions, J. Math. Economics, 9, 187-205. J.P. VIAL (1983), Strong and weak convexity of sets and functions, Math. Oper. Res., 8, 231-259. T. WEIR and V. JEYAKUMAR (1988), A class of nonconvex functions and mathematical programming, Bull. Austral. Math. Soc., 38, 177-189. T. WEIR and B. MOND (1988), Pre-invex functions in multiple objective optimization, J. Math. Anal. Appl., 136, 29-38.
H. WEYL (1935), Elementare Theorie der konvexen Polyeder, Comm. Math. Helv., 7, 290-306. Translated into English in H.W. Kuhn and A.W. Tucker (Eds.), Contributions to the Theory of Games, Vol. I, Annals of Mathematics Studies N. 24, Princeton Univ. Press, Princeton, 1950, 3-18. P. WOLFE (1967), Methods of nonlinear programming; in J. Abadie (Ed.), Nonlinear Programming, North Holland, Amsterdam, 99-131. I. ZANG, E.U. CHOO and M. AVRIEL (1977), On functions whose stationary points are global minima, J.O.T.A., 22, 195-208.
CHAPTER III. SMOOTH OPTIMIZATION PROBLEMS. SADDLE POINT CONDITIONS
3.1. Introduction
In this chapter we shall mainly analyse the optimality conditions for various types of extremum problems, under differentiability assumptions on the functions involved in these problems. An exception is found in the last section of the chapter. We shall treat necessary and sufficient optimality conditions separately. In some cases, e.g. for the unconstrained extremum problem or for the extremum problem with constraints expressed by equalities, these topics date back two or three centuries. Other cases have been treated more recently: the basic "modern" starting articles are the papers of Fritz John (1948) and Kuhn and Tucker (1951). These papers were preceded by the unpublished thesis of W. Karush (1939); for an interesting account of the history of optimization problems, see Lenstra, Rinnooy Kan and Schrijver (1991). See also Chapter 1 of Fiacco and McCormick (1968), Pourciau (1980), Prekopa (1980). We shall be mainly concerned with the following types of extremum problems (or mathematical programming problems):
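$$\min_{x \in X} f(x) \qquad (P_0)$$
$$\min_{x \in S} f(x), \qquad S = \{x \mid x \in X,\ g_i(x) \le 0,\ i = 1, \dots, m\} \qquad (P)$$
$$\min_{x \in S_1} f(x), \qquad S_1 = \{x \mid x \in X,\ g_i(x) \le 0,\ i = 1, \dots, m;\ h_j(x) = 0,\ j = 1, \dots, r\} \qquad (P_1)$$
where $X \subset \mathbb{R}^n$ is any set, $f, g_i$ $(i = 1, \dots, m)$ are real-valued functions, all defined and differentiable on an open set $D \subset \mathbb{R}^n$, with $X \subset D$; $h_j$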
202
Smooth optimization
problems
$(j = 1, \dots, r < n)$ are real-valued functions, all defined and continuously differentiable on $D$.
3.2. Unconstrained Extremum Problems and Extremum Problems with a Set Constraint
In this section we discuss necessary and sufficient conditions for the existence of extremum points of differentiable functions in the absence of explicit functional constraints. In other words, we shall be concerned with problem $(P_0)$. When $X \subset \mathbb{R}^n$ is open or, more generally, when the optimal point $x^0$ is interior to $X$, $(P_0)$ is a free or unconstrained minimization problem. This is, of course, the oldest among the various types of extremum problems. In the other cases we have a minimization problem with a set constraint.
Definition 3.2.1. Given $x^0 \in X$, the vector $y \in \mathbb{R}^n$ is said to be a feasible direction from $x^0$ for $(P_0)$ if $\exists\, \bar\alpha > 0$ such that $x^0 + \alpha y \in X$, $\forall \alpha \in [0, \bar\alpha]$.
From a geometric point of view, a feasible direction $y$ is a vector such that, if $x^0 \in X$, then every point $x^0 + \alpha y$ of the segment joining $x^0$ and $x^0 + \bar\alpha y$ belongs to $X$.
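When $X$ is a box $\{x : l \le x \le u\}$ (an assumption made here only for illustration; Definition 3.2.1 allows any $X$), feasible directions can be read off coordinatewise: the components of $y$ must be nonnegative where $x^0$ sits on a lower bound and nonpositive where it sits on an upper bound. The sketch below, with hypothetical helper names, uses this test and then spot-checks the necessary condition $y \nabla f(x^0) \ge 0$ of Theorem 3.2.1 for the assumed objective $f(x) = (x_1 - 2)^2 + (x_2 + 1)^2$ on $[0, 1]^2$, whose minimizer is $x^0 = (1, 0)$.

```python
import numpy as np

def is_feasible_direction_box(x0, y, lower, upper, tol=1e-12):
    """Definition 3.2.1 specialised to the box lower <= x <= upper (x0 in the box):
    y is feasible iff y_i >= 0 wherever x0_i is at its lower bound and
    y_i <= 0 wherever x0_i is at its upper bound."""
    x0, y = np.asarray(x0, float), np.asarray(y, float)
    at_lower = np.abs(x0 - lower) <= tol
    at_upper = np.abs(x0 - upper) <= tol
    return bool(np.all(y[at_lower] >= 0) and np.all(y[at_upper] <= 0))

def grad_f(x):
    # Gradient of the illustrative objective f(x) = (x1 - 2)^2 + (x2 + 1)^2.
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)])

lower, upper = np.zeros(2), np.ones(2)
x0 = np.array([1.0, 0.0])          # minimizer of f on the box [0, 1]^2

rng = np.random.default_rng(0)
directions = rng.normal(size=(1000, 2))
feasible = [y for y in directions if is_feasible_direction_box(x0, y, lower, upper)]
# Theorem 3.2.1: y . grad f(x0) >= 0 for every feasible direction y from x0.
worst = min(float(y @ grad_f(x0)) for y in feasible)
print(len(feasible), worst >= 0)
```

At $x^0 = (1, 0)$ the direction $(1, 0)$ is not feasible, because $x^0_1$ is already at its upper bound; sampling only illustrates the theorem and does not replace its proof.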
The following theorem gives a necessary condition for the existence of local solutions of $(P_0)$.
Theorem 3.2.1. Let $x^0 \in X$ be a point of local minimum for $(P_0)$; then $y \nabla f(x^0) \ge 0$ for any feasible direction $y$ from $x^0$.
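Proof. Since $y$ is a feasible direction, $x^0 + \alpha y \in X$, $\forall \alpha \in [0, \bar\alpha]$. As $f$ is differentiable on $D \supset X$, we have
$$f(x^0 + \alpha y) = f(x^0) + \alpha y \nabla f(x^0) + o(\|\alpha y\|), \qquad \text{where } \lim_{\alpha \to 0} \frac{o(\|\alpha y\|)}{\|\alpha y\|} = 0.$$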
Therefore, for $\alpha > 0$ suitably small, the sign of $y \nabla f(x^0)$, whenever it is nonzero, coincides with the sign of $f(x^0 + \alpha y) - f(x^0)$
… $> 0$ and
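$\|y^k\| = 1$. The sequence $\{(\vartheta_k, y^k)\}$ has a subsequence that converges to $(0, y)$, where $\|y\| = 1$. By the Mean Value Theorem we get, for each $k$ in this subsequence,
$$h_i(z^k) - h_i(x^0) = \vartheta_k\, y^k \nabla h_i(x^0 + \eta_{i,k} \vartheta_k y^k) = 0, \qquad i = 1, \dots, r,$$
where $\eta_{i,k} \in (0,1)$, and
$$f(z^k) - f(x^0) = \vartheta_k\, y^k \nabla f(x^0 + \xi_k \vartheta_k y^k) \le 0,$$
where $\xi_k \in (0,1)$. Dividing the last two expressions by $\vartheta_k$ and taking limits as $k \to +\infty$, we get $y \nabla h_i(x^0) = 0$, $i = 1, \dots, r$, and $y \nabla f(x^0) \le 0$. From Taylor's expansion formula we obtain
$$\psi(z^k, \lambda) = \psi(x^0, \lambda) + \vartheta_k\, y^k \nabla_x \psi(x^0, \lambda) + \tfrac{1}{2} (\vartheta_k)^2\, y^k H_\psi(x^0 + \eta_k \vartheta_k y^k, \lambda)\, y^k,$$
with $\eta_k \in (0,1)$. Dividing this expression by $\tfrac{1}{2} (\vartheta_k)^2$ and taking into account that $\nabla_x \psi(x^0, \lambda) = 0$ and
$$\psi(z^k, \lambda) - \psi(x^0, \lambda) = f(z^k) - f(x^0) \le 0,$$
we obtain
$$y^k H_\psi(x^0 + \eta_k \vartheta_k y^k, \lambda)\, y^k \le 0.$$
Letting $k \to +\infty$ we obtain
$$y\, H_\psi(x^0, \lambda)\, y \le 0$$
and, since $y \neq 0$ satisfies relation (1), this completes the proof of i).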
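ii) If (5) holds, i.e. $\nabla_x \psi(x^0, \lambda) = 0$, and $\psi(x, \lambda)$ is pseudoconvex at $x^0$, then $x^0$ is a point of global minimum of $\psi(x, \lambda)$ on $S_e$, i.e.
$$f(x^0) - \lambda h(x^0) \le f(x) - \lambda h(x), \qquad \forall x \in S_e,$$
i.e., since $h(x) = 0$ for every $x \in S_e$,
$$f(x^0) \le f(x), \qquad \forall x \in S_e.$$
□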
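Part i) above rests on the curvature of $\psi$ along directions $y$ with $y \nabla h_i(x^0) = 0$, $i = 1, \dots, r$. Assuming, as in ii), that $\psi(x, \lambda) = f(x) - \lambda h(x)$, and assuming that relation (1) is this tangent-space condition (which is what the proof derives for the limit direction $y$), a numerical check of the second-order condition can be sketched as follows: estimate $\lambda$ from $\nabla f(x^0) = \lambda \nabla h(x^0)$, form $H_\psi(x^0, \lambda)$, and test whether it is positive definite on the null space of $\nabla h(x^0)$. The helper name and the test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$ at $x^0 = (-1, -1)$) are illustrative assumptions, not taken from the text.

```python
import numpy as np

def reduced_hessian_check(grad_f, jac_h, hess_f, hess_h, tol=1e-10):
    """Return (lam, eigenvalues of H_psi restricted to {y : jac_h @ y = 0})
    for psi(x, lam) = f(x) - lam @ h(x), with all data evaluated at x0."""
    # Multipliers from grad f(x0) = sum_i lam_i grad h_i(x0), solved by least squares.
    lam, *_ = np.linalg.lstsq(jac_h.T, grad_f, rcond=None)
    if np.linalg.norm(jac_h.T @ lam - grad_f) > 1e-8:
        raise ValueError("x0 is not a stationary point of psi for any lambda")
    # Hessian of psi with respect to x.
    H = hess_f - sum(l * Hh for l, Hh in zip(lam, hess_h))
    # Orthonormal basis Z of the null space of the Jacobian, from the SVD.
    _, s, vt = np.linalg.svd(jac_h)
    rank = int(np.sum(s > tol))
    Z = vt[rank:].T
    return lam, np.linalg.eigvalsh(Z.T @ H @ Z)

x0 = np.array([-1.0, -1.0])
grad_f = np.array([1.0, 1.0])                 # f(x) = x1 + x2
jac_h = np.array([[2 * x0[0], 2 * x0[1]]])    # h(x) = x1^2 + x2^2 - 2
hess_f = np.zeros((2, 2))
hess_h = [2.0 * np.eye(2)]

lam, eig = reduced_hessian_check(grad_f, jac_h, hess_f, hess_h)
print(lam, eig)   # lam is approximately -0.5; the single reduced eigenvalue is positive
```

With the opposite sign convention $\psi = f + \lambda h$ the multiplier changes sign but the reduced Hessian, and hence the conclusion, is the same.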
In Theorem 3.3.1 we assumed that the Jacobian matrix $\nabla h(x^0)$ has full rank $r$ $(< n)$. We now give a generalization of that theorem, in the sense that no regularity condition on the rank of the Jacobian is required. The following theorem is a particular case of a more general result due to Fritz John (1948), and may be referred to as the Lagrange-Fritz John necessary conditions for optimality in $(P_e)$.
Theorem 3.3.3. In $(P_e)$ let $f$ be differentiable and $h$ be continuously differentiable in a neighbourhood of $x^0$; if $x^0$ is a local minimum of $f(x)$ on $S_e$, then there exist multipliers $\lambda_0, \lambda_1, \dots, \lambda_r$, not all zero, such that
$$\lambda_0 \nabla f(x^0) - \sum_{i=1}^{r} \lambda_i \nabla h_i(x^0) = 0. \tag{7}$$
As we have already proved the classical theorem of Lagrange for $(P_e)$, we omit the proof of the previous theorem, which is a particular case of a more general theorem that will be proved in Section 3.8 of this chapter. Obviously, if $\lambda_0 \neq 0$ in (7), then we obtain relation (5). It can be proved (see Hadley (1964)) that if the rank of the augmented Jacobian matrix
$$G = \begin{pmatrix} \nabla f(x^0) \\ \nabla h(x^0) \end{pmatrix},$$
which is less than or equal to $r + 1$, is equal to the rank of $\nabla h(x^0)$, then (7) holds with $\lambda_0 \neq 0$. If, however, $\mathrm{rank}(G) > \mathrm{rank}(\nabla h(x^0))$, then $\lambda_0 = 0$ in (7). Moreover, if
Equality constrained extremum problems
217
$\mathrm{rank}(G) = r + 1$, then $f$ does not take on a local minimum at $x^0$ on $S_e$. Hence, if $f(x)$ takes on a local minimum at $x^0 \in S_e$, then $\mathrm{rank}(G) < r + 1$.
Example 3.3.1. The point $x^0 = (0, 0)$ is a minimum point for the function $f(x) = x_1 + x_2$ subject to $h(x) = x_1^2 + x_2^2 = 0$. Condition (5) is not verified at $x^0$ and therefore the method of Lagrange multipliers is not applicable. Indeed, for $\psi(x_1, x_2, \lambda) = x_1 + x_2 + \lambda (x_1^2 + x_2^2)$, the system
$$1 + 2\lambda x_1 = 0, \qquad 1 + 2\lambda x_2 = 0, \qquad x_1^2 + x_2^2 = 0$$
has no solution.
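For Example 3.3.1 the rank criterion discussed just before it can be checked numerically. At $x^0 = (0, 0)$ we have $\nabla f(x^0) = (1, 1)$ and $\nabla h(x^0) = (0, 0)$, so $\mathrm{rank}(G) = 1 > \mathrm{rank}(\nabla h(x^0)) = 0$: no $\lambda$ solves $\nabla f(x^0) = \lambda \nabla h(x^0)$, and every Fritz John vector in (7) must have $\lambda_0 = 0$; the choice $(\lambda_0, \lambda_1) = (0, 1)$ works. The NumPy sketch below is only an illustration of these statements.

```python
import numpy as np

x0 = np.array([0.0, 0.0])
grad_f = np.array([1.0, 1.0])               # f(x) = x1 + x2
jac_h = np.array([[2 * x0[0], 2 * x0[1]]])  # h(x) = x1^2 + x2^2, gradient (2 x1, 2 x2)

G = np.vstack([grad_f, jac_h])              # augmented Jacobian matrix
rank_G = np.linalg.matrix_rank(G)
rank_h = np.linalg.matrix_rank(jac_h)

# Lagrange condition grad f = lambda * grad h: smallest achievable residual.
lam, *_ = np.linalg.lstsq(jac_h.T, grad_f, rcond=None)
lagrange_residual = np.linalg.norm(jac_h.T @ lam - grad_f)

# Fritz John multipliers (lambda0, lambda1) = (0, 1) satisfy relation (7).
lam0, lam1 = 0.0, 1.0
fj_residual = np.linalg.norm(lam0 * grad_f - lam1 * jac_h[0])

print(rank_G, rank_h, lagrange_residual, fj_residual)
```

The Lagrange residual stays at $\|(1, 1)\| = \sqrt{2}$ for every $\lambda$, while the Fritz John residual vanishes.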
… $x^k \to x^0 : y = \lim_{k \to +\infty} \lambda_k (x^k - x^0)\}$
is called the (Bouligand) tangent cone to $S$ at $x^0$, or contingent cone to $S$ at $x^0$. This cone was first used in optimization theory by Abadie (1967), Varaiya (1967) and Dubovitskij and Miljutin (1963, 1965). Many authors have given various equivalent definitions of such a cone. We consider only the following ones:
$$T_1(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \{x^k\} \subset S,\ \exists \{\mu_k\} \subset \mathbb{R}_+,\ \lim_{k \to +\infty} \mu_k = 0 : x^k = x^0 + \mu_k y + \mu_k \cdot o(1)\};$$
see Bazaraa, Goode and Nashed (1974).
$$T_2(S, x^0) = \{y \in \mathbb{R}^n \mid \forall N(y),\ \forall \lambda > 0,\ \exists t \in (0, \lambda),\ \exists \bar y \in N(y) : x^0 + t \bar y \in S\};$$
see Bazaraa, Goode and Nashed (1974).
$$T_3(S, x^0) = \bigcap_{N(x^0)} \mathrm{cl}\big(\mathrm{cone}((S \cap N(x^0)) - x^0)\big);$$
see Varaiya (1967).
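$$T_4(S, x^0) = \Big\{y \in \mathbb{R}^n \mid \exists \{t_k\} \subset \mathbb{R}_+,\ \exists \{x^k\} \subset S : \lim_{k \to +\infty} t_k = 0,\ \lim_{k \to +\infty} \frac{x^k - x^0}{t_k} = y\Big\};$$
see Hestenes (1966, 1975), Kurcyusz (1976).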
$$T_5(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \{t_k\} \subset \mathbb{R}_+,\ t_k \to 0,\ \exists \{y^k\} \to y : x^0 + t_k y^k \in S\};$$
see Rockafellar (1981), Saks (1937).
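$$T_6(S, x^0) = \Big\{y \in \mathbb{R}^n \mid \exists \lambda \ge 0,\ \exists \{x^k\} \subset S \setminus \{x^0\} : x^k \to x^0,\ \lambda \frac{x^k - x^0}{\|x^k - x^0\|} \to y\Big\};$$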
Local cone approximations
of sets
223
see Hestenes (1966, 1975), Rogak and Scott-Thomas (1973).
$$T_7(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \varphi : \mathbb{R}_+ \to \mathbb{R}^n,\ \lim_{t \to 0^+} \varphi(t) = y;\ \forall \lambda > 0\ \exists t \in (0, \lambda) : x^0 + t \varphi(t) \in S\};$$
see Elster and Thierfelder (1988a, 1988b). For a complete proof of the equivalence between $T(S, x^0)$ and each of the above cones $T_i(S, x^0)$, $i = 1, \dots, 7$, see Giorgi and Guerraggio (1992a, 1992b), Elster and Thierfelder (1985). Most of the above-mentioned equivalences are straightforward consequences of the definitions. Here we prove only the less immediate equivalence $T(S, x^0) = T_3(S, x^0)$.
Theorem 3.4.1. Let $S \subset \mathbb{R}^n$ and $x^0 \in S$; then $T(S, x^0) = T_3(S, x^0)$.
Proof. If $y \in T(S, x^0)$, then
$$y = \lim_{k \to +\infty} \lambda_k (x^k - x^0),$$
with $x^k \in S$, $\lambda_k > 0$ and $x^k \to x^0$. Since $x^k \to x^0$, given any open ball $N(x^0)$ there exists a positive integer $\bar k$ such that $x^k \in S \cap N(x^0)$ for all $k > \bar k$. Therefore $\lambda_k (x^k - x^0) \in \mathrm{cone}((S \cap N(x^0)) - x^0)$ for all $k > \bar k$, and hence
$$y = \lim_{k \to +\infty} \lambda_k (x^k - x^0) \in \mathrm{cl}\big(\mathrm{cone}((S \cap N(x^0)) - x^0)\big).$$
Since this is true for any $N(x^0)$, we have
$$y \in \bigcap_{N(x^0)} \mathrm{cl}\big(\mathrm{cone}((S \cap N(x^0)) - x^0)\big),$$
i.e. $T(S, x^0) \subset T_3(S, x^0)$. Conversely, let $y \in T_3(S, x^0)$; given any positive integer $k$, choose the open ball $N_{1/k}(x^0)$. Since $y \in T_3(S, x^0)$, $y$ is the limit of vectors of the form $\lambda_i (x^i - x^0)$, where $\lambda_i > 0$ and $x^i \in S \cap N_{1/k}(x^0)$. Now choose
$i_k$ such that $\|y - \lambda_{i_k} (x^{i_k} - x^0)\| < 1/k$. By varying $k$ we thus generate the sequences $\{\lambda_{i_k}\}$ and $\{x^{i_k}\}$. Note that $\lambda_{i_k} > 0$, $x^{i_k} \in S$, $x^{i_k} \to x^0$ and $\lambda_{i_k} (x^{i_k} - x^0) \to y$, i.e. $y \in T(S, x^0)$. Therefore $T_3(S, x^0) \subset T(S, x^0)$ and the proof is complete. □
The various definitions of the Bouligand tangent cone can be reformulated for a topological vector space, although for the equivalence of some of them we must suppose that the space is metrizable.
Theorem 3.4.2.
i) $T(S, x^0)$ is a nonempty closed cone with $0 \in T(S, x^0)$;
ii) if $x^0$ is an isolated point of $S$, then $T(S, x^0) = \{0\}$; if $x^0 \in \mathrm{int}(S)$, then $T(S, x^0) = \mathbb{R}^n$;
iii) $T(S, x^0) = T(\bar S, x^0)$;
iv) $S_1 \subset S_2 \Rightarrow T(S_1, x^0) \subset T(S_2, x^0)$ (isotony property);
v) $T(S_1 \cap S_2, x^0) \subset T(S_1, x^0) \cap T(S_2, x^0)$; $T(S_1 \cup S_2, x^0) = T(S_1, x^0) \cup T(S_2, x^0)$;
vi) $T(S_1 \times S_2, (x^1, x^2)) \subset T(S_1, x^1) \times T(S_2, x^2)$.
For the proof of this theorem (some assertions are trivial consequences of the definition), as well as of the other theorems of this section, we refer to Bazaraa and Shetty (1976), Giorgi and Guerraggio (1992a, 1992b), Elster and Thierfelder (1984, 1988a, 1988b), Ward (1988). We note that i) follows immediately from characterization $T_3(S, x^0)$. Some examples of a contingent cone to a set at a point $x^0 = 0$ are given in the figures below. In Figure 8, $S$ is the union of the edges of an infinite sequence of squares, each with a side twice that of the previous square; $T(S, x^0)$ is the acute angle with vertex at $x^0$ and edges $r'$ and $r''$. In Figure 9, $T(S, x^0) = \mathbb{R}^2$.
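Definition $T_6$ describes tangent vectors as scaled limits of normalized secants $(x^k - x^0)/\|x^k - x^0\|$. As a small illustration, not one of the book's figures, take $S = \{(t, t^2) : t \in \mathbb{R}\}$ and $x^0 = 0$: the secants along $t_k = \pm 1/k$ approach $(\pm 1, 0)$, consistent with $T(S, x^0)$ being the horizontal axis. The plain-Python sketch below only computes these secants; it does not by itself show that no other limits exist.

```python
import math

def normalized_secant(xk, x0):
    """Return (xk - x0) / ||xk - x0|| for points of R^2 with xk != x0."""
    d = (xk[0] - x0[0], xk[1] - x0[1])
    n = math.hypot(d[0], d[1])
    return (d[0] / n, d[1] / n)

x0 = (0.0, 0.0)
right, left = [], []
for k in (1, 10, 100, 1000):
    t = 1.0 / k
    right.append(normalized_secant((t, t * t), x0))   # x^k = (t, t^2) in S
    left.append(normalized_secant((-t, t * t), x0))   # x^k = (-t, t^2) in S

for r, l in zip(right, left):
    print(r, l)   # first components tend to +1 and -1, second components to 0
```

Multiplying the limits by $\lambda \ge 0$, as in $T_6$, produces the whole horizontal axis.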
[Figure 4: a set $S$ and its contingent cone $T(S, x^0)$ at $x^0 = 0$.]
[Figure 5: a set $S$ and its contingent cone $T(S, x^0)$ at $x^0 = 0$.]
[Figure 6: a set $S$ and its contingent cone $T(S, x^0)$ at $x^0 = 0$.]
[Figure 7: a set $S$ and its contingent cone $T(S, x^0)$ at $x^0 = 0$.]
[Figure 8: $S$ is the union of the edges of a sequence of squares; $T(S, x^0)$ is the acute angle with edges $r'$ and $r''$.]
[Figure 9: a set $S$ with $T(S, x^0) = \mathbb{R}^2$.]
We shall see that if $S$ is convex, then $S - x^0 \subset T(S, x^0)$ and $T(S, x^0)$ is a convex set (see Theorem 3.4.10).
Other examples of contingent cones are the following: $T(\mathbb{Q}^n, x^0) = \mathbb{R}^n$; $T(\mathbb{Z}^n, x^0) = \{0\}$ if $x^0 \in \mathbb{Z}^n$. Vectors $y \in T(S, x^0)$ are also called "sequential tangent vectors" to $S$ at $x^0$. M. Guignard (1969) introduced (for Banach spaces) the notion of pseudotangent cone, in order to generalize the optimality conditions for a smooth mathematical programming problem.
Definition 3.4.2. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$, $x^0 \in S$; the set
$$P(S, x^0) = \overline{\mathrm{conv}}\,(T(S, x^0))$$
is called the pseudotangent cone to $S$ at $x^0$. Obviously $P(S, x^0)$ is a convex cone and satisfies all the properties listed in Theorem 3.4.2, with the exception of the second relation of v). Clearly $T(S, x^0) \subset P(S, x^0)$. It may occur (especially if $S$ is not convex) that $T(S, x^0)$ is only a rough local approximation of $S$ around $x^0$. This is due to the fact that, on the ground of Definition 3.4.1, the requirement $y \in T(S, x^0)$ is quite weak. A stronger requirement is obtained by replacing the sequence $\{x^k\}$ with an "arc". This cone was introduced by Kuhn and Tucker (1951) and generalized by Ursescu (1982).
Definition 3.4.3. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$, $x^0 \in S$; the set
$$A(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \varphi : \mathbb{R}_+ \to \mathbb{R}^n,\ \varphi(0) = x^0,\ \varphi'(0) = y,\ \exists \lambda > 0,\ \forall t \in (0, \lambda) : \varphi(t) \in S\}$$
is called the cone of attainable directions to $S$ at $x^0$. Note that in the above definition the function $\varphi$ is not required to be continuous and that differentiability is required only at zero (Kuhn and Tucker require the arc to be differentiable, although their proof only uses differentiability at zero; the cone of Kuhn and Tucker is therefore not the same as $A(S, x^0)$; see Palata (1989) and Peterson (1973)). Other equivalent definitions of this cone are the following:
$$A_1(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \varphi : \mathbb{R}_+ \to \mathbb{R}^n,\ \lim_{t \to 0^+} \varphi(t) = y,\ \exists \lambda > 0,\ \forall t \in (0, \lambda) : x^0 + t \varphi(t) \in S\};$$
$$A_2(S, x^0) = \{y \in \mathbb{R}^n \mid \forall \{t_k\} \subset \mathbb{R}_+,\ t_k \to 0,\ \exists \{y^k\} \to y : x^0 + t_k y^k \in S \text{ for large } k\};$$
$$A_3(S, x^0) = \{y \in \mathbb{R}^n \mid \forall N(y),\ \exists \lambda > 0,\ \forall t \in (0, \lambda),\ \exists \bar y \in N(y) : x^0 + t \bar y \in S\}.$$
This last definition is due to Ursescu (1982). Vectors $y \in A(S, x^0)$ are said to be "curvilinear tangents"; it is immediate to note that $A(S, x^0) \subset T(S, x^0)$. In Figures 2, 3, 4, 5, 7 and 9 we have $A(S, x^0) = T(S, x^0)$; $A(S, x^0) = \{0\}$ in Figure 6 and $A(S, x^0) = r'$ in Figure 8. Moreover, $A(\mathbb{Q}^n, x^0) = \mathbb{R}^n$, $\forall x^0$; $A(\mathbb{Z}^n, x^0) = \{0\}$, $\forall x^0 \in \mathbb{Z}^n$.
Theorem 3.4.3.
i) $A(S, x^0)$ is a nonempty closed cone with $0 \in A(S, x^0)$.
ii) If $x^0$ is an isolated point of $S$, then $A(S, x^0) = \{0\}$; if $x^0 \in \mathrm{int}(S)$, then $A(S, x^0) = \mathbb{R}^n$.
iii) $A(S, x^0) = A(\bar S, x^0)$.
iv) $S_1 \subset S_2 \Rightarrow A(S_1, x^0) \subset A(S_2, x^0)$.
v) $A(S_1 \cap S_2, x^0) \subset A(S_1, x^0) \cap A(S_2, x^0)$; $A(S_1 \cup S_2, x^0) \supset A(S_1, x^0) \cup A(S_2, x^0)$.
vi) $A(S_1 \times S_2, (x^1, x^2)) = A(S_1, x^1) \times A(S_2, x^2)$.
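The gap between $A(S, x^0)$ and $T(S, x^0)$ can be seen on a one-dimensional set chosen here purely for illustration: $S = \{0\} \cup \{2^{-k} : k = 0, 1, 2, \dots\}$ and $x^0 = 0$. The direction $y = 1$ lies in $T(S, 0)$, since $x^k = 2^{-k} \to 0$ and $2^k (x^k - 0) = 1$. It does not lie in $A(S, 0)$: if $\varphi(t) \in S$ for small $t > 0$ and $\varphi'(0) = 1$, then $\mathrm{dist}(t, S)/t \le |\varphi(t) - t|/t \to 0$, whereas along $t = 3 \cdot 2^{-(k+2)}$ this ratio equals $1/3$. The sketch below verifies the ratio with exact rational arithmetic, using a truncated copy of $S$ (the truncation level `KMAX` is an assumption of the sketch).

```python
from fractions import Fraction

KMAX = 60  # truncation: S is approximated by {0} and {2^-k : 0 <= k <= KMAX}
S_TRUNC = [Fraction(0)] + [Fraction(1, 2**k) for k in range(KMAX + 1)]

def dist_to_S(x):
    """Exact distance from the rational x to the truncated set S."""
    return min(abs(x - s) for s in S_TRUNC)

y = Fraction(1)
ratios_on_S = []    # t = 2^-k: t*y lies in S, ratio 0 (this witnesses y in T(S, 0))
ratios_off_S = []   # t = 3 * 2^-(k+2): midpoint between two consecutive points of S
for k in range(1, 20):
    t = Fraction(1, 2**k)
    ratios_on_S.append(dist_to_S(t * y) / t)
    t = Fraction(3, 2**(k + 2))
    ratios_off_S.append(dist_to_S(t * y) / t)

print(set(ratios_on_S), set(ratios_off_S))
```

Since the second ratio does not tend to zero, no arc as in Definition 3.4.3 can have $\varphi'(0) = 1$, so here $A(S, 0) = \{0\}$ while $T(S, 0) = [0, +\infty)$.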
We note that in Arrow, Hurwicz and Uzawa (1961) and in Bazaraa, Goode and Shetty (1972) the cone $A(S, x^0)$ is not recognized as a closed set. A further strengthening of Definition 3.4.3 requires the arc contained in the set to be "linear". In this case we obtain the cone of feasible directions, specified in the following
Definition 3.4.4. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$, $x^0 \in S$; the set
$$Z(S, x^0) = \{y \in \mathbb{R}^n \mid \exists \lambda > 0,\ \forall t \in (0, \lambda) : x^0 + t y \in S\}$$
is called the cone of feasible directions to $S$ at $x^0$.
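Definition 3.4.4 quantifies over every $t \in (0, \lambda)$, so membership in $Z(S, x^0)$ cannot be decided by sampling; still, a sampled check is a useful sanity test. The sketch below (helper names are hypothetical) tries a few values of $\lambda$ and a geometric grid of $t$ values for the closed unit disk $S$ at the boundary point $x^0 = (1, 0)$. There $Z(S, x^0) = \{y : y_1 < 0\} \cup \{0\}$, while $T(S, x^0) = \{y : y_1 \le 0\}$, so $y = (0, 1)$ is tangent but not a feasible direction.

```python
import numpy as np

def in_unit_disk(p, tol=1e-12):
    """Membership test for S = closed unit disk in R^2."""
    return float(np.dot(p, p)) <= 1.0 + tol

def looks_like_feasible_direction(in_S, x0, y, lams=(1.0, 1e-1, 1e-2, 1e-3), num=200):
    """Heuristic test of y in Z(S, x0): look for some lam such that
    x0 + t*y lies in S for every sampled t in (0, lam)."""
    x0 = np.asarray(x0, dtype=float)
    y = np.asarray(y, dtype=float)
    for lam in lams:
        ts = np.geomspace(lam * 1e-6, lam, num=num, endpoint=False)
        if all(in_S(x0 + t * y) for t in ts):
            return True
    return False

x0 = np.array([1.0, 0.0])
for y in [(-1.0, 0.0), (-1.0, 5.0), (0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]:
    print(y, looks_like_feasible_direction(in_unit_disk, x0, y))
```

For $y = (-1, 5)$ the segment stays in the disk only for $t \le 1/13$, which is why a smaller $\lambda$ is needed before the test succeeds.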
Vectors $y \in Z(S, x^0)$ are called linear tangent vectors; we have $Z(S, x^0) \subset A(S, x^0)$. In Figures 3, 7 and 9, $Z(S, x^0) = A(S, x^0) = T(S, x^0)$; in Figure 2, $Z(S, x^0) = \mathrm{int}(A(S, x^0)) = \mathrm{int}(T(S, x^0))$; in Figures 4 and 5, $Z(S, x^0) = \{0\}$; in Figure 6, $Z(S, x^0) = A(S, x^0) = \{0\}$ and in Figure 8, $Z(S, x^0) =$ …. Moreover, $Z(\mathbb{Q}^n, x^0) = \{0\}$, $\forall x^0 \in \mathbb{Q}^n$; $Z(\mathbb{Q}^n, x^0) = \emptyset$, $\forall x^0 \notin \mathbb{Q}^n$; $Z(\mathbb{Z}^n, x^0) = \{0\}$, $\forall x^0 \in \mathbb{Z}^n$. Another cone, quite similar to $Z(S, x^0)$, is given by the following
Definition 3.4.5. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$, $x^0 \in S$; the set
$$F(S, x^0) = \{y \in \mathbb{R}^n \mid \forall \lambda > 0,\ \exists t \in (0, \lambda) : x^0 + t y \in S\}$$
is called the radial tangent cone to $S$ at $x^0$; see Vlach (1970, 1981).
Theorem 3.4.4. i)
0 6 Z{S, x°) if and only if 0 € F{S, x°) and if and only if x ° € 5.
ii)
If x° is an isolated point of S, it is F ( 5 , x ° ) = Z ( 5 , x ° ) = { 0 } ; if x° G int(5), it is F ( 5 , x ° ) = Z ( 5 , x ° ) = jR".
iii) 5i C 52 ^ ^ ( 5 i , x ° ) C Z(52,x0) and F{Sux^) iv) Z{Si Z(5i F(5i F{Si v)
n 52,x°) U52,x0) U 52, x°) n 52, x°)
= D = C
Z(5i,x°) Z(5i,xO) F ( 5 i , xO) F ( 5 i , x°)
C F(52,x0).
n Z{S2,x°y, UZ(52,xO); U F(52, x"); n F(52, x^).
Z{Si X 5 2 , ( x \ x 2 ) ) = Z ( 5 i , x i ) X Z(52,x2); F ( 5 i X 5 2 , ( x \ x 2 ) ) c F ( 5 i , x i ) x F(52,x2).
We have $F(\mathbb{Q}^n,x^0) = \mathbb{R}^n$, $\forall x^0 \in \mathbb{Q}^n$; $F(\mathbb{Q}^n,x^0) = \mathbb{R}^n \setminus \{0\}$, $\forall x^0 \notin \mathbb{Q}^n$; $F(\mathbb{Z}^n,x^0) = \{0\}$, $\forall x^0 \in \mathbb{Z}^n$. We remark also that in Figures 3, 7 and 9, $F(S,x^0) = Z(S,x^0) = A(S,x^0) = T(S,x^0)$; in Figure 2, $F(S,x^0) = Z(S,x^0) = \operatorname{int}(A(S,x^0)) = \operatorname{int}(T(S,x^0))$; in Figures 4 and 5, $F(S,x^0) = Z(S,x^0) = \{0\}$; in Figure 6, $F(S,x^0) = Z(S,x^0) = A(S,x^0) = \{0\}$ and in Figure 8, $F(S,x^0) = T(S,x^0)$. In any case $Z(S,x^0) \subset F(S,x^0)$, and if $x^0 \in \bar S \cap \overline{\mathbb{R}^n \setminus S}$, then the following "duality properties" hold between the cones $F(S,x^0)$ and $Z(S,x^0)$.

Theorem 3.4.5. Let $x^0 \in \bar S \cap \overline{\mathbb{R}^n \setminus S}$; then:
i) $Z(S,x^0) = \mathbb{R}^n \setminus F(\mathbb{R}^n \setminus S,x^0)$;
ii) $F(S,x^0) = \mathbb{R}^n \setminus Z(\mathbb{R}^n \setminus S,x^0)$.

A further restriction of the previous concepts is given by the cone of interior directions to $S$ at $x^0$, also called internal cone or cone of interior displacements; this cone was introduced by Dubovitskij and Miljutin (1963, 1965) under the name of "cone of permissible variations" and was subsequently used by Bazaraa and Goode (1972) to obtain general necessary optimality conditions for a smooth programming problem.

Definition 3.4.6. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$, $x^0 \in S$; the set
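$$I(S,x^0) = \{y \in \mathbb{R}^n \mid \exists N(y),\ \exists \lambda > 0,\ \forall t \in (0,\lambda),\ \forall \bar y \in N(y): x^0 + t\bar y \in S\}$$
is called the cone of interior directions to $S$ at $x^0$. Other equivalent characterizations of this cone are:
$$I_1(S,x^0) = \{y \in \mathbb{R}^n \mid \forall \varphi: \mathbb{R}_+ \to \mathbb{R}^n \text{ with } \lim_{t \to 0^+} \varphi(t) = y,\ \exists \lambda > 0,\ \forall t \in (0,\lambda): x^0 + t\varphi(t) \in S\};$$
$$I_2(S,x^0) = \Big\{y \in \mathbb{R}^n \mid \exists N(y),\ \exists N(0): x^0 + \Big(\bigcup_{t > 0} tN(y)\Big) \cap N(0) \subset S\Big\}$$
(this characterization is due to Laurent (1972); see also Sachs (1978));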
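$$I_3(S,x^0) = \{y \in \mathbb{R}^n \mid \forall \{y^k\} \to y,\ \forall \{\lambda_k\} \subset \mathbb{R}_+,\ \lim_{k \to +\infty} \lambda_k = 0: x^0 + \lambda_k y^k \in S \text{ for large } k\};$$
$$I_4(S,x^0) = \{y \in \mathbb{R}^n \mid \forall x \in \mathbb{R}^n,\ \forall \varphi: \mathbb{R}_+ \to \mathbb{R} \text{ with } \varphi(\lambda) = o(\lambda) \text{ for } \lambda \to 0^+: x^0 + \lambda y + \varphi(\lambda)x \in S \text{ for small } \lambda\}.$$
From the definitions it appears that
$$I(S,x^0) \subset Z(S,x^0) \subset A(S,x^0) \subset T(S,x^0) \subset P(S,x^0).$$
In Figures 2 and 3, $I(S,x^0) = \operatorname{int}(T(S,x^0))$; in Figures 4, 5, 6, 7 and 8, $I(S,x^0) = \emptyset$; in Figure 9, $I(S,x^0)$ is the whole space $\mathbb{R}^2$ with the exclusion of the line $r$. Moreover, $I(\mathbb{Q}^n,x^0) = I(\mathbb{Z}^n,x^0) = \emptyset$.

Theorem 3.4.6.
i) $I(S,x^0)$ is an open cone;
ii) if $x^0$ is an isolated point of $S$, then $I(S,x^0) = \emptyset$; if $x^0 \in \operatorname{int}(S)$, then $I(S,x^0) = \mathbb{R}^n$;
iii) $I(S,x^0) = I(\operatorname{int}(S),x^0)$;
iv) $S_1 \subset S_2 \Rightarrow I(S_1,x^0) \subset I(S_2,x^0)$;
v) $I(S_1 \cap S_2,x^0) = I(S_1,x^0) \cap I(S_2,x^0)$; $I(S_1 \cup S_2,x^0) \supset I(S_1,x^0) \cup I(S_2,x^0)$;
vi) $I(S_1 \times S_2,(x^1,x^2)) = I(S_1,x^1) \times I(S_2,x^2)$.

Moreover, the following coimplication holds: $0 \in I(S,x^0) \iff x^0 \in \operatorname{int}(S)$. … $\exists t \in (0,\lambda): x^0 + t\bar y \in S\}$.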
The cone $Q(S,x^0)$ has the properties specified by the following theorem (again, the proofs can be found in Elster and Thierfelder (1984) and Giorgi and Guerraggio (1992b); see also Ursescu (1982)).

Theorem 3.4.8.
i) $Q(S,x^0)$ is an open cone;
ii) if $x^0$ is an isolated point of $S$, then $Q(S,x^0) = \emptyset$; if $x^0 \in \operatorname{int}(S)$, then $Q(S,x^0) = \mathbb{R}^n$;
iii) $Q(\operatorname{int}(S),x^0) = Q(S,x^0)$;
iv) $S_1 \subset S_2 \Rightarrow Q(S_1,x^0) \subset Q(S_2,x^0)$;
v) $Q(S_1 \cap S_2,x^0) \subset Q(S_1,x^0) \cap Q(S_2,x^0)$; $Q(S_1 \cup S_2,x^0) \supset Q(S_1,x^0) \cup Q(S_2,x^0)$;
vi) $Q(S_1 \times S_2,(x^1,x^2)) \subset Q(S_1,x^1) \times Q(S_2,x^2)$.

We note that in Figures 2 to 9 of this section we have $I(S,x^0) = Q(S,x^0)$, with $x^0 = 0$. If we take $S$ as the set of points belonging to the sequence of circles in Figure 10, it is immediate to see that $I(S,x^0) = \emptyset$ and $Q(S,x^0) \neq \emptyset$ (it is given by the cone depicted in the figure; again, for simplicity, $x^0 = 0$).
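Figure 10.

Also for the cone of quasi-interior directions we have $0 \in Q(S,x^0) \iff x^0 \in \operatorname{int}(S)$. … Thus $x^0 + ty \in S$, $\forall t \in (0,\lambda)$, i.e. $y \in Z(S,x^0)$. Let us now prove iii). Let $y \in T(S,x^0)$; then for every $N(y)$ there exists a scalar $\lambda > 0$ such that $(x^0 + \lambda N(y)) \cap S \neq \emptyset$. Hence $T(S,x^0) \subset \operatorname{cl}(\operatorname{cone}(S - x^0))$. Moreover, from the convexity of $S$ (and of $\bar S$) it follows that $(x^0 + tN(y)) \cap \bar S \neq \emptyset$, $\forall t \in (0,\lambda)$. Therefore $y \in A(\bar S,x^0) = A(S,x^0)$ and $A(S,x^0) = T(S,x^0) \subset \operatorname{cl}(\operatorname{cone}(S - x^0))$. Finally, from ii) it follows that
$$\operatorname{cl}(\operatorname{cone}(S - x^0)) \subset \operatorname{cl}(\operatorname{cone}(\bar S - x^0)) = \operatorname{cl}(Z(\bar S,x^0)) \subset T(\bar S,x^0) = T(S,x^0),$$
and all the cones are equal. So, obviously, $T(S,x^0)$ and $A(S,x^0)$ are convex cones and we also have the equality $T(S,x^0) = P(S,x^0)$. To get relation iv), first recall that
$$T(S,x^0) = \operatorname{cl}(\operatorname{cone}(S - x^0)); \qquad I(S,x^0) = \operatorname{cone}(\operatorname{int}(S) - x^0).$$
By Corollary 2.1.3 we get
$$\operatorname{int}(T(S,x^0)) = \operatorname{int}(\operatorname{cl}(\operatorname{cone}(S - x^0))) = \operatorname{int}(\operatorname{cone}(S - x^0)).$$
So we have to show that $\operatorname{int}(\operatorname{cone}(S - x^0)) = \operatorname{cone}(\operatorname{int}(S) - x^0)$.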
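Without loss of generality we assume $x^0 = 0$. The inclusion $\operatorname{cone}(\operatorname{int}(S)) \subset \operatorname{int}(\operatorname{cone}(S))$ is trivial, since $\operatorname{cone}(\operatorname{int}(S)) \subset \operatorname{cone}(S)$ and $\operatorname{cone}(\operatorname{int}(S))$ is an open set. For the converse inclusion, let $y \in \operatorname{int}(\operatorname{cone}(S))$; then there exists a neighbourhood of $y$ which is contained in $\operatorname{cone}(S)$. Hence there exists an $n$-dimensional simplex, i.e. $n+1$ vectors $y^1,\dots,y^{n+1} \in \operatorname{cone}(S)$, with $y \in \operatorname{int}(\operatorname{conv}\{y^1,\dots,y^{n+1}\}) \subset \operatorname{cone}(S)$. Now we can find vectors $x^1,\dots,x^{n+1} \in S$ and numbers $\lambda_1,\dots,\lambda_{n+1} > 0$ such that $y^i = \lambda_i x^i$, $i = 1,\dots,n+1$. Moreover, with $\lambda = \max\{\lambda_1,\dots,\lambda_{n+1}\}$ we get (since $0 \in S$ and $S$ is convex)
$$\frac{1}{\lambda}y^i = \frac{\lambda_i}{\lambda}x^i \in S \quad \text{for } i = 1,\dots,n+1,$$
and hence
$$\frac{1}{\lambda}y \in \operatorname{int}\Big(\operatorname{conv}\Big\{\frac{1}{\lambda}y^1,\dots,\frac{1}{\lambda}y^{n+1}\Big\}\Big) \subset \operatorname{int}(\bar S) = \operatorname{int}(S),$$
i.e. $y \in \operatorname{cone}(\operatorname{int}(S))$. In the last relation we have used Corollary 2.1.3. Assertion v) is straightforward: if $\operatorname{int}(S) \neq \emptyset$, then also $I(S,x^0) = \operatorname{cone}(\operatorname{int}(S) - x^0) \neq \emptyset$, and by Corollary 2.1.3 again we have
$$T(S,x^0) = \overline{\operatorname{int}(T(S,x^0))} = \overline{I(S,x^0)}.$$
□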
From the previous results it appears that, even if $S$ is convex, there are three distinct levels of local cone approximations. The first approximation is an open cone; the second has no topological properties with respect to openness and closedness; the third is a closed cone. Still for the case of $S \subset \mathbb{R}^n$ convex, we formulate the following two properties of the contingent cone $T(S,x^0)$. Proofs can be found in Aubin and Frankowska (1990) and in Bazaraa and Shetty (1976).

Theorem 3.4.11. If $S_1$ and $S_2$ are closed convex sets with $0 \in \operatorname{int}(S_1 - S_2)$, or if $S_1$ and $S_2$ are convex with $\operatorname{relint}(S_1) \cap \operatorname{relint}(S_2) \neq \emptyset$, then
$$T(S_1 \cap S_2,x^0) = T(S_1,x^0) \cap T(S_2,x^0).$$

Theorem 3.4.12. Let $S \subset \mathbb{R}^n$ be convex; we have
i) $T(AS,Ax^0) = \overline{A\,T(S,x^0)}$, where $A$ is an $m \times n$ matrix;
ii) $T(S_1 + S_2,x^1 + x^2) = \overline{T(S_1,x^1) + T(S_2,x^2)}$.

II) Modified cone approximations

Some applications of the classical cone approximations introduced above need additional assumptions concerning the convexity of either $S$ or the cone. To this end, Clarke (1975) introduced a new cone approximation which attracted great interest owing to its convexity property. Clarke originally used a different representation of this cone from the one given below, which is due to Rockafellar (1980, 1981) and makes the connection with the classical cones introduced above more transparent. See also Aubin and Ekeland (1984), Aubin and Frankowska (1990), Elster and Thierfelder (1984, 1988a, 1988b), Giorgi and Guerraggio (1992a, 1992b), Hiriart-Urruty (1979), Ward (1987, 1988). In the following, the cones $T(\cdot,\cdot)$, $A(\cdot,\cdot)$, $Z(\cdot,\cdot)$, $F(\cdot,\cdot)$, $I(\cdot,\cdot)$, $Q(\cdot,\cdot)$ will be modified in such a way that the point $x^0$ can be varied, too. If we do so, the convexity behaviour of the cones is affected. With such a modification of the classical cones, however, the isotony property is lost.

Definition 3.4.8. Let $S \subset \mathbb{R}^n$, $S \neq \emptyset$ and $x^0 \in S$; then
$$T_m(S,x^0) = \{y \in \mathbb{R}^n \mid \forall N(y)\ \forall V(x^0)\ \forall \lambda > 0\ \exists t \in (0,\lambda)\ \exists x \in V(x^0) \cap ((\mathbb{R}^n \setminus S) \cup \{x^0\})\ \exists \bar y \in N(y): x + t\bar y \in S\}$$
is the modified contingent cone to $S$ at $x^0$.
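$$A_m(S,x^0) = \{y \in \mathbb{R}^n \mid \forall N(y)\ \exists V(x^0)\ \exists \lambda > 0\ \forall t \in (0,\lambda)\ \forall x \in V(x^0) \cap (S \cup \{x^0\})\ \exists \bar y \in N(y): x + t\bar y \in S\}$$
is the modified cone of attainable directions to $S$ at $x^0$, or Clarke tangent cone to $S$ at $x^0$. This cone will be denoted in the sequel by $TC(S,x^0)$.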
$$Z_m(S,x^0) = \{y \in \mathbb{R}^n \mid \exists V(x^0)\ \exists \lambda > 0\ \forall t \in (0,\lambda)\ \forall x \in V(x^0) \cap (S \cup \{x^0\}): x + ty \in S\}$$
is the modified cone of feasible directions to $S$ at $x^0$, or hypertangent cone to $S$ at $x^0$; this cone will be denoted in the sequel by $H(S,x^0)$.
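$$F_m(S,x^0) = \{y \in \mathbb{R}^n \mid \forall V(x^0)\ \forall \lambda > 0\ \exists x \in V(x^0) \cap ((\mathbb{R}^n \setminus S) \cup \{x^0\})\ \exists t \in (0,\lambda): x + ty \in S\}$$
is the modified radial tangent cone to $S$ at $x^0$.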
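$$I_m(S,x^0) = \{y \in \mathbb{R}^n \mid \exists N(y)\ \exists V(x^0)\ \exists \lambda > 0\ \forall t \in (0,\lambda)\ \forall x \in V(x^0) \cap (S \cup \{x^0\})\ \forall \bar y \in N(y): x + t\bar y \in S\}$$
is the modified cone of interior directions to $S$ at $x^0$, or cone of epi-Lipschitzian directions to $S$ at $x^0$. In the sequel this cone will be denoted by $E(S,x^0)$.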
$$Q_m(S,x^0) = \{y \in \mathbb{R}^n \mid \exists N(y)\ \forall V(x^0)\ \forall \lambda > 0\ \exists t \in (0,\lambda)\ \exists x \in V(x^0) \cap ((\mathbb{R}^n \setminus S) \cup \{x^0\})\ \forall \bar y \in N(y): x + t\bar y \in S\}$$
is the modified cone of quasi-interior directions to $S$ at $x^0$. The definitions of the cones $TC(\cdot,\cdot)$, $H(\cdot,\cdot)$ and $E(\cdot,\cdot)$ can be found in the literature (see especially Rockafellar (1980, 1981), Clarke (1975, 1976, 1983), Penot (1979)); the other cones $F_m(\cdot,\cdot)$, $T_m(\cdot,\cdot)$, $Q_m(\cdot,\cdot)$ have been given here for the sake of completeness. The cone $TC(S,x^0)$ was the first "modified" cone to be introduced, by Clarke. The most common definitions of $TC(S,x^0)$ consider only the set $V(x^0) \cap S$ and not, as in our definition, the union of the point $x^0$ with the set $V(x^0) \cap S$ (in other words, our definition allows the point $x$ to coincide with $x^0$). We have chosen this second possibility for the characterization of the modified cones in order to avoid some troubles with the cones $T_m(\cdot,\cdot)$, $F_m(\cdot,\cdot)$ and $Q_m(\cdot,\cdot)$. In Giorgi and Guerraggio (1992c) it is, however, proved that the two mentioned definitions of the Clarke tangent cone coincide when, as in our assumptions, $x^0 \in S$. None of the modified cones just introduced is isotonic, but each of them is either convex or the complement of a convex cone. As for the classical cones, we shall now formulate the basic properties of the modified cones. For the proofs, again we refer the reader to the quoted authors (see especially G. Giorgi and A. Guerraggio (1992a, 1992b), Elster and Thierfelder (1984, 1988a, 1988b),
Clarke (1983), Ward (1988)). We first note that the Clarke tangent cone to $S$ at $x^0$ can also be characterized as follows:
$$TC_1(S,x^0) = \{y \in \mathbb{R}^n \mid \forall \{x^k\} \subset S,\ \lim_{k \to +\infty} x^k = x^0,\ \forall \{\lambda_k\} \subset \mathbb{R}_+,\ \lim_{k \to +\infty} \lambda_k = 0,\ \exists \{y^k\} \to y: x^k + \lambda_k y^k \in S \text{ for large } k\};$$
$$TC_2(S,x^0) = \{y \in \mathbb{R}^n \mid \forall N(y),\ \exists V(x^0),\ \exists \lambda > 0,\ \forall x \in S \cap V(x^0),\ \forall t \in (0,\lambda): (x + tN(y)) \cap S \neq \emptyset\};$$
$$TC_3(S,x^0) = \bigcap_{N(0)} \ \bigcup_{\substack{N(x^0)\\ \lambda > 0}} \ \bigcap_{\substack{x \in S \cap N(x^0)\\ t \in (0,\lambda)}} \Big[\frac{1}{t}(S - x) + N(0)\Big].$$
The last characterization, in terms of the so-called Painlevé–Kuratowski limits of sets, was first given by Rockafellar (1980, 1981). Thus we have, by definition,
$$TC_3(S,x^0) = \liminf_{\substack{x \to x^0,\ x \in S\\ t \to 0^+}} t^{-1}(S - x).$$
The Clarke tangent cone can also be described in terms of the distance function. We then have the following characterization:
$$TC_4(S,x^0) = \Big\{y \in \mathbb{R}^n \;\Big|\; \limsup_{\substack{x \to x^0\\ \lambda \to 0^+}} \frac{d_S(x + \lambda y) - d_S(x)}{\lambda} = 0\Big\},$$
where $d_S(x) = \inf_{v \in S} \|x - v\|$. The above definition is the original one due to Clarke (1975).

Theorem 3.4.13.
i) $H(S,x^0)$ is a convex cone; $F_m(S,x^0)$ is the complement of a convex cone.
ii) $0 \in H(S,x^0) \iff 0 \in F_m(S,x^0) \iff x^0 \in S$.
iii) $H(S,x^0) \subset Z(S,x^0)$.
iv) $H(S_1 \cap S_2,x^0) \supset H(S_1,x^0) \cap H(S_2,x^0)$; $F_m(S_1 \cup S_2,x^0) \subset F_m(S_1,x^0) \cup F_m(S_2,x^0)$.
v) $H(S_1 \times S_2,(x^1,x^2)) = H(S_1,x^1) \times H(S_2,x^2)$; $F_m(S_1 \times S_2,(x^1,x^2)) \subset F_m(S_1,x^1) \times F_m(S_2,x^2)$.

Theorem 3.4.14.
i) $E(S,x^0)$ is a convex cone; $T_m(S,x^0)$ is the complement of a convex cone.
ii) $E(S,x^0)$ is an open convex cone and $x^0 \in \operatorname{int}(S)$ … $0\}$
is called the recession cone of $S$. If $S = \emptyset$, then by definition $0^+\emptyset = \mathbb{R}^n$. We note that $0^+S$ is a convex cone, since
$$S + t(\lambda_1 y^1 + \lambda_2 y^2) = (S + t\lambda_1 y^1) + t\lambda_2 y^2 \subset S + t\lambda_2 y^2 \subset S$$
for each $t > 0$, $\lambda_1, \lambda_2 \geq 0$ and $y^1, y^2 \in 0^+S$. Moreover, $0^+S$ is the largest (convex) cone $K \subset \mathbb{R}^n$ such that $S + K \subset S$. If $S$ is convex and nonempty, then $0^+S$ can be represented as
$$0^+S = \{y \in \mathbb{R}^n \mid S + y \subset S\}.$$
The recession cone is useful for studying the boundedness of a set $S \neq \emptyset$. Indeed, if $0^+S$ is not the singleton formed by the origin, then $S$ is unbounded, but the converse does not hold; e.g. $0^+\mathbb{Q}^n = \{0\}$. If $S$ is closed and convex, then $S$ is bounded if and only if $0^+S = \{0\}$. As the recession cone is not a local cone approximation, we do not insist further on its algebraic and topological properties. We only remark that $0^+S \subset 0^+\bar S$. In fact, let $y \in 0^+S$ and let $s \in \bar S$; then, $\forall t > 0$, we have
$$s + ty = \Big(\lim_{k \to +\infty} s^k\Big) + ty = \lim_{k \to +\infty}(s^k + ty) = \lim_{k \to +\infty} s'^k \in \bar S,$$
where $s^k, s'^k \in S$. The converse relation between $0^+S$ and $0^+\bar S$ does not hold, as shown by the following simple example: $S = \mathbb{R}^2 \setminus \{x^0\}$; we have $0^+S = \{0\}$, whereas $0^+\bar S = \mathbb{R}^2$. Moreover, the equality $0^+S = 0^+\bar S$ generally does not hold even if $S$ is convex (take, e.g., $S = \{(x_1,x_2) \mid x_1 \in (-1,1),\ x_2 \geq 0\} \cup \{(-1,0),(1,0)\}$; we have $\{0\} = 0^+S \subset 0^+\bar S = \{(0,x_2) \mid x_2 \geq 0\}$).

With $S + 0^+S \subset S$ we even get $\bar S + 0^+S \subset \overline{S + 0^+S} \subset \bar S$, i.e. $0^+S \subset 0^+\bar S$.

Example 3.4.1. Consider the following sets of $\mathbb{R}^2$:
$$S_1 = \{(x_1,x_2) \mid x_1 > 0,\ x_2 \geq x_1^{-\alpha},\ \alpha > 0\}; \qquad S_2 = \{(x_1,x_2) \mid x_2 \geq x_1^2\};$$
$$S_3 = \{(x_1,x_2) \mid x_1^2 + x_2^2 \leq 1\}; \qquad S_4 = \{(x_1,x_2) \mid x_1 > 0,\ x_2 > 0\} \cup \{(0,0)\};$$
$$S_5 = \{(x_1,x_2) \mid x_1 \geq 0,\ x_2 \geq 0\}.$$
The corresponding recession cones are:
$$0^+S_1 = \{(x_1,x_2) \mid x_1 \geq 0,\ x_2 \geq 0\}; \qquad 0^+S_2 = \{(x_1,x_2) \mid x_1 = 0,\ x_2 \geq 0\};$$
$$0^+S_3 = \{(x_1,x_2) \mid x_1 = 0 = x_2\} = \{(0,0)\};$$
$$0^+S_4 = \{(x_1,x_2) \mid x_1 > 0,\ x_2 > 0\} \cup \{(0,0)\} = S_4; \qquad 0^+S_5 = \{(x_1,x_2) \mid x_1 \geq 0,\ x_2 \geq 0\} = S_5.$$
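The defining property $S + ty \subset S$, $\forall t > 0$, can be probed numerically on the sets of Example 3.4.1. The Python sketch below is only a heuristic under stated assumptions: each set is given by a membership predicate, and finite lists of sample points of $S$ and of step lengths $t$ are chosen by hand (the helper `refutes_recession` and all samples are hypothetical). A returned witness proves $y \notin 0^+S$; finding none proves nothing, since only finitely many points and steps are checked.

```python
def refutes_recession(member, y, points, steps):
    """Look for s in points and t in steps with s + t*y outside S.

    member(x) -> bool decides membership in S. A returned pair (s, t) shows
    that y is not in 0^+S; None only means that no counterexample was found
    among the samples, which is evidence but not a proof that y is in 0^+S.
    """
    for s in points:
        if not member(s):
            raise ValueError(f"sample point {s} is not in S")
        for t in steps:
            x = tuple(si + t * yi for si, yi in zip(s, y))
            if not member(x):
                return s, t
    return None


def in_S2(x):  # x2 >= x1^2
    return x[1] >= x[0] ** 2


def in_S3(x):  # closed unit disc
    return x[0] ** 2 + x[1] ** 2 <= 1


def in_S4(x):  # open quadrant plus the origin
    return (x[0] > 0 and x[1] > 0) or (x[0] == 0 and x[1] == 0)


steps = [0.5, 1.0, 10.0, 1000.0]
print(refutes_recession(in_S2, (0, 1), [(0, 0), (1, 1), (-2, 4)], steps))  # None
print(refutes_recession(in_S2, (1, 0), [(0, 0), (1, 1), (-2, 4)], steps))  # ((0, 0), 0.5)
print(refutes_recession(in_S3, (1, 0), [(0, 0)], steps))                   # ((0, 0), 10.0)
print(refutes_recession(in_S4, (1, 0), [(0, 0), (1, 1)], steps))           # ((0, 0), 0.5)
```

The last call shows why $(1,0) \notin 0^+S_4$: the origin belongs to $S_4$, but moving from it along the $x_1$-axis leaves the set.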
3.5. Necessary Optimality Conditions for Problem (P) Where the Optimal Point is Interior to X

Theorem 3.3.1 gives a condition, necessary under a regularity assumption, for the local optimality of a point $x^0$ for problem $(P_e)$, i.e. for problem $(P_1)$ where there are no inequality constraints and $X$ is an open set. One could remark that it should not be restrictive to study only problem $(P_e)$, as an inequality of the type $g_i(x) \leq 0$ can be equivalently replaced by the equality
$$g_i(x_1,\dots,x_n) + x_{n+1}^2 = 0, \qquad (1)$$
where the variable $x_{n+1}$ is called a "slack variable". By means of transformation (1) one might think to have removed every difficulty due to the presence of inequalities, and therefore to reduce the study of problem (P) to that of problem $(P_e)$. However, transformation (1) does not remove the specific difficulties of problem (P), such as the validation of the regularity conditions on the new constraints. Moreover, the "transformed problem" may be more complicated, owing to the increase in the number of variables and to possible changes of "structure". For example, if $g_i$ is an affine function, the new transformed constraint (1) is no longer affine. Transformation (1), indeed advantageous
Necessary optimality conditions for problem (P)
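in some cases, can cause considerable complications in others (see, e.g., Mangasarian and Fromovitz (1967)). Let us therefore consider problem (P):
$$\min_{x \in S} f(x), \qquad S = \{x \mid x \in X,\ g_1(x) \leq 0, \dots, g_m(x) \leq 0\}, \qquad (P)$$
where the real-valued functions $f, g_1, \dots, g_m$ are defined on an open set $D \subset \mathbb{R}^n$ containing $X$. Let $x^0 \in S$ and let $I(x^0)$ be the set of effective (or active, or binding) constraints at $x^0$, i.e. the set
$$I(x^0) = \{i \mid g_i(x^0) = 0\}.$$
Denote by $I_P(x^0)$ the set of indices $i \in I(x^0)$ such that $g_i$ is pseudoconcave at $x^0$, and let $I_{NP}(x^0) = \{i \mid i \in I(x^0),\ i \notin I_P(x^0)\}$.

Theorem 3.5.1 (linearization theorem of Abadie–Mangasarian). Let $x^0 \in \operatorname{int}(X)$ be a solution of (P), or simply a point of local minimum of $f$ on $S$. Let $f$ and $g$ be differentiable at $x^0$; then the system
$$\begin{cases} z\nabla f(x^0) < 0 & \\ z\nabla g_i(x^0) < 0, & i \in I_{NP}(x^0) \\ z\nabla g_i(x^0) \leq 0, & i \in I_P(x^0) \end{cases} \qquad (2)$$
has no solution $z \in \mathbb{R}^n$.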
Proof. Let $x^0$ be a local solution of problem (P) and suppose, by contradiction, that system (2) has a solution $z$. We shall prove that there exists a $\bar\delta > 0$ such that:
i) $x^0 + \delta z \in X$, $\forall \delta \in (0,\bar\delta)$;
ii) $g(x^0 + \delta z) \leq 0$, $\forall \delta \in (0,\bar\delta)$;
iii) $f(x^0 + \delta z)$ … $0$ such that $x^0 + \delta z \in X$, $\forall \delta \in (0,\bar\delta)$.
b) As $f$ is differentiable at $x^0$, we have
$$f(x^0 + \delta z) - f(x^0) = \delta z\nabla f(x^0) + \delta\|z\|\,\varepsilon_0(x^0,\delta z),$$
with $\varepsilon_0(x^0,\delta z) \to 0$ for $\delta \to 0$. Then for $\delta$ small enough, say $0 < \delta < \delta_0$, we have $z\nabla f(x^0) + \|z\|\,\varepsilon_0(x^0,\delta z)$ … $0$. The first case occurs for lack of technical properties of the constraints, called constraint qualifications, which assure the positiveness of the first multiplier $u_0$. In the second case, where $u_0 > 0$, we have the third approach, due to Kuhn and Tucker (1951), for obtaining necessary optimality conditions for problem (P). The Kuhn–Tucker conditions are perhaps the best-known necessary optimality conditions for (P), as they are, under some convexity assumptions, also sufficient for optimality. It is therefore important to see which conditions assure the positiveness of the multiplier $u_0$ in relation (3). As we have already noted, conditions of this nature are usually referred to as constraint qualifications, since they involve only the constraints and are independent of the geometric structure of the feasible set $S$. However (see Mangasarian (1969)), the degenerate case $u_0 = 0$ may occur, for example, when the optimal point $x^0$ is at a cusp of $S$ or when $S$ is a singleton made of the point $x^0$. We now consider some simple constraint qualifications of "algebraic type". In the next section we shall introduce other constraint qualifications, of "topological type", for which the requirement $x^0 \in \operatorname{int}(X)$ is relaxed.
Let $x^0 \in \operatorname{int}(X)$; let us consider the following constraint qualifications for problem (P).

1) Slater's weak constraint qualification (Slater (1950), Uzawa (1958), Mangasarian (1969)). The constraints $g_i$, $i = 1,\dots,m$, are said to satisfy Slater's weak constraint qualification at $x^0 \in S$ if the $g_i$, $i \in I(x^0)$, are pseudoconvex at $x^0$ and there exists a vector $\bar x \in S$ such that $g_i(\bar x) < 0$, $i \in I(x^0)$.

2) The original Slater's constraint qualification requires that $D$ is a convex set in $\mathbb{R}^n$ and that $g$ is convex on $D$. Slater's constraint qualification is satisfied if there exists $x \in S$ such that $g(x) < 0$.

3) Karlin's constraint qualification (Karlin (1959)). Let $D$ be convex and $g$ be convex on $D$; Karlin's constraint qualification is satisfied if there exists no vector $p \in \mathbb{R}^m$, $p \geq 0$, $p \neq 0$, such that $p \cdot g(x) \geq 0$, $\forall x \in D$.

4) Modified strict constraint qualification (Mangasarian (1969), Bazaraa, Goode and Shetty (1972b)). It is expressed as: the functions $g_i$, $i \in I(x^0)$, are pseudoconvex at $x^0$ and the feasible set $S$ contains at least two distinct points $x^1$ and $x^2$ such that the $g_i$, $i \in I(x^0)$, are strictly quasiconvex at $x^2$ with respect to $x^1$, i.e. from $g_i(x^1) \leq g_i(x^2)$ it follows that $g_i(\lambda x^1 + (1-\lambda)x^2) < g_i(x^2)$, $\forall \lambda \in (0,1)$.

5) Weak reverse constraint qualification (Arrow, Hurwicz and Uzawa (1961), Mangasarian (1969)). It is expressed as: the functions $g_i$, $i \in I(x^0)$, are pseudoconcave at $x^0$.

6) Cottle–Dragomirescu constraint qualification (Cottle (1963), Dragomirescu (1967)). The vectors $\nabla g_i(x^0)$, $i \in I(x^0)$, are positively linearly independent, i.e. the system
$$\sum_{i \in I(x^0)} u_i \nabla g_i(x^0) = 0, \qquad u_i \geq 0,\ \forall i \in I(x^0),$$
only admits the zero solution.

7) Mangasarian–Fromovitz constraint qualification (Mangasarian and Fromovitz (1967)). The system
$$y\nabla g_i(x^0) < 0, \qquad i \in I(x^0),$$
admits a solution.

8) Nondegeneracy condition or rank condition (Arrow, Hurwicz and Uzawa (1961)). The gradients $\nabla g_i(x^0)$, $i \in I(x^0)$, are linearly independent.

9) Arrow–Hurwicz–Uzawa first constraint qualification (Arrow, Hurwicz and Uzawa (1961), Mangasarian (1969)). It is expressed as: there exists a vector $y \in \mathbb{R}^n$ solution of the system
$$\begin{cases} y\nabla g_i(x^0) \leq 0, & i \in I_P(x^0) \\ y\nabla g_i(x^0) < 0, & i \in I_{NP}(x^0). \end{cases}$$
We may note that this constraint qualification can also be expressed by the following condition: the system
$$\sum_{i \in I_{NP}(x^0)} u_i \nabla g_i(x^0) + \sum_{i \in I_P(x^0)} u_i \nabla g_i(x^0) = 0, \qquad u_i \geq 0,\ i \in I(x^0),$$
admits solutions only with $u_{NP} = 0$. Note that conditions 2) and 3) are global conditions, while the remaining ones are local conditions at the point $x^0 \in S$.

Theorem 3.5.3. Let $x^0 \in \operatorname{int}(X)$ be feasible and let C.Q.i denote the above $i$-th constraint qualification ($i = 1,\dots,9$). Then we have the following implications:
(Diagram of implications relating C.Q.1, C.Q.2, C.Q.3, C.Q.4, C.Q.6 and C.Q.9.)

… 0.
Theorem 3.5.4. Let the assumptions of Theorem 3.5.2 be verified and let C.Q.9 be verified. Then in relation (3) we have $u_0 > 0$.

Proof. Assume the contrary, i.e. that in relation (3) $u_0 = 0$; then (3) is given by
$$\sum_{i \in I_P(x^0)} u_i \nabla g_i(x^0) + \sum_{i \in I_{NP}(x^0)} u_i \nabla g_i(x^0) = 0, \qquad u_i \geq 0,\ \forall i \in I(x^0), \qquad u_i \text{ not all zero for } i \in I_{NP}(x^0).$$
By means of Theorem 19 of Section 2.4 (Motzkin transposition theorem) we obtain that the system
$$\begin{cases} y\nabla g_i(x^0) \leq 0, & i \in I_P(x^0) \\ y\nabla g_i(x^0) < 0, & i \in I_{NP}(x^0) \end{cases}$$
has no solution, in contrast with the assumed validity of C.Q.9. □
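Conditions such as C.Q.8 and C.Q.6 concern only the finitely many active gradients, so for a given point they can be checked mechanically. The Python sketch below (the helper names `rank`, `satisfies_cq8` and `violates_cq6` are hypothetical) computes the rank of the active gradients exactly over the rationals to test C.Q.8, and verifies a proposed vector $u \geq 0$, $u \neq 0$, with $\sum u_i \nabla g_i(x^0) = 0$, which certifies that C.Q.6 fails. As an illustrative choice, the gradients $(0,1)$ and $(0,-1)$ are those of $g_1(x) = x_2 - x_1^3$ and $g_2(x) = -x_2$ at the origin; by Gordan's theorem of the alternative, the same certificate also shows that the Mangasarian–Fromovitz system of C.Q.7 has no solution.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small matrix, by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


def satisfies_cq8(active_grads):
    """C.Q.8: the gradients of the active constraints are linearly independent."""
    return rank(active_grads) == len(active_grads)


def violates_cq6(active_grads, u):
    """True if u >= 0, u != 0 and sum_i u_i * grad_i = 0 (so C.Q.6 fails)."""
    if any(ui < 0 for ui in u) or all(ui == 0 for ui in u):
        return False
    n = len(active_grads[0])
    return all(
        sum(Fraction(ui) * Fraction(g[j]) for ui, g in zip(u, active_grads)) == 0
        for j in range(n)
    )


grads = [(0, 1), (0, -1)]           # active gradients at x0 = (0, 0)
print(rank(grads))                  # 1
print(satisfies_cq8(grads))         # False
print(violates_cq6(grads, [1, 1]))  # True
```

Exact rational arithmetic is used so that the rank test is not disturbed by rounding; with floating-point data a tolerance would be needed instead.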
Thus if any C.Q. is verified, it is possible to rewrite relations (3)–(5), dividing $u_0$ and each component of $u$ by $u_0$. Denoting by $\lambda$ the $m$-vector $[u_1/u_0,\dots,u_m/u_0]$, the following result is therefore apparent; we consider it here as the third approach for obtaining necessary optimality conditions for problem (P).

Theorem 3.5.5 (Kuhn–Tucker necessary optimality conditions). Let $x^0 \in \operatorname{int}(X)$ be a solution of problem (P), or simply a point of local minimum of $f$ on $S$; let $f$ and $g$ be differentiable at $x^0$. Let a constraint qualification be verified; then there exists a vector $\lambda \in \mathbb{R}^m$ such that
$$\nabla f(x^0) + \lambda\nabla g(x^0) = 0, \qquad (6)$$
$$\lambda \cdot g(x^0) = 0, \qquad (7)$$
$$\lambda \geq 0. \qquad (8)$$
The nonnegative numbers $\lambda_i$ are usually called Kuhn–Tucker multipliers for problem (P), and again relations (7) are the slack-complementarity conditions. Relations (6)–(7)–(8) can also be written as:
$$\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) = 0, \qquad (9)$$
$$\lambda_i \geq 0, \quad i \in I(x^0). \qquad (10)$$
Conditions (6)–(7)–(8) are also called quasi-saddle-point conditions in Arrow, Hurwicz and Uzawa (1961); if we denote by $\psi(x,\lambda) = f(x) + \lambda g(x)$ the Lagrangian function for problem (P), then (6)–(7)–(8) can be written in the form
$$\nabla_x \psi(x^0,\lambda) = 0; \qquad \lambda \cdot \nabla_\lambda \psi(x^0,\lambda) = 0; \qquad \lambda \geq 0,$$
also called local saddle point conditions. Note, again, that if the constraints $g_i$, $i \in I(x^0)$, are all pseudoconcave at $x^0$, then C.Q.9 is trivially satisfied (take $y = 0$); therefore if, e.g., the constraints $g_i$, $i = 1,\dots,m$, are all linear, C.Q.9 is automatically satisfied. In other words, programming problems with all linear constraints need no constraint qualification. From a "historic" point of view, we note that this result was previously proved in an early paper on linear programming theory: see Goldman and Tucker (1956). Regarding the geometric interpretation of the Kuhn–Tucker conditions, note that any vector of the form $\sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)$, where $\lambda_i \geq 0$ for all $i \in I(x^0)$, belongs to the cone generated by the gradients of the effective constraints. Thus conditions (9)–(10) can be geometrically interpreted as follows: the vector $-\nabla f(x^0)$ belongs to the cone generated by the gradients of the effective constraints at the optimal point $x^0$. Finally, we note that a point $x^0 \in S$ satisfies the Kuhn–Tucker conditions (6)–(8) if and only if the set
$$\Omega(x^0) = \{y \mid y\nabla g_i(x^0) \leq 0,\ i \in I(x^0);\ y\nabla f(x^0) < 0\}$$
is empty ($\Omega(x^0) = \emptyset$). This follows easily from applying the Farkas–Minkowski theorem to relations (6)–(8).
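Conditions (6)–(8) are easy to verify once a candidate point and candidate multipliers are available. The Python sketch below (the function names and the tolerance are hypothetical choices) computes the residuals of stationarity (6), complementary slackness (7), the sign condition (8) and primal feasibility, with gradients written as plain lists. The illustrative problem, not taken from the text, is to minimize $f(x) = x_1 + x_2$ subject to $g(x) = x_1^2 + x_2^2 - 2 \leq 0$: at $x^0 = (-1,-1)$ the constraint is active, $\nabla f(x^0) = (1,1)$, $\nabla g(x^0) = (-2,-2)$, and $\lambda = 1/2$ satisfies (6)–(8).

```python
import math


def kkt_residuals(grad_f, grads_g, g_vals, lam):
    """Residuals of (6)-(8), plus primal feasibility, at a candidate x0.

    grad_f : gradient of f at x0
    grads_g: list of gradients of g_1, ..., g_m at x0
    g_vals : values g_1(x0), ..., g_m(x0)
    lam    : candidate multipliers lambda_1, ..., lambda_m
    """
    n = len(grad_f)
    stat = [grad_f[j] + sum(l * g[j] for l, g in zip(lam, grads_g)) for j in range(n)]
    return {
        "stationarity (6)": math.sqrt(sum(s * s for s in stat)),
        # With lam >= 0 and g <= 0 every term is <= 0, so a zero sum
        # forces each product lambda_i * g_i(x0) to vanish.
        "complementarity (7)": abs(sum(l * gv for l, gv in zip(lam, g_vals))),
        "sign (8)": max([0.0] + [-l for l in lam]),
        "feasibility": max([0.0] + list(g_vals)),
    }


def is_kkt_point(grad_f, grads_g, g_vals, lam, tol=1e-9):
    """True if all residuals are within tol."""
    return all(v <= tol for v in kkt_residuals(grad_f, grads_g, g_vals, lam).values())


# min x1 + x2  s.t.  x1^2 + x2^2 - 2 <= 0, candidate x0 = (-1, -1)
print(is_kkt_point([1, 1], [[-2, -2]], [0.0], [0.5]))   # True
print(is_kkt_point([1, 1], [[-2, -2]], [0.0], [1.0]))   # False: (6) fails
# At (1, 1), (6) would need lambda = -1/2, which violates (8).
print(is_kkt_point([1, 1], [[2, 2]], [0.0], [-0.5]))    # False
```

The last call illustrates the geometric reading above: at $(1,1)$ the vector $-\nabla f$ points away from the cone generated by the active gradient, so no nonnegative multiplier exists.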
3.6. Necessary Optimality Conditions for Problems (Pe) and (P); The Case of a Set Constraint

Till now we have assumed that in problems $(P_e)$ and (P) the optimal point $x^0$ is interior to $X \subset D \subset \mathbb{R}^n$. If this assumption is relaxed
Necessary optimality conditions for problems (Pe) and (P)
we cannot in general obtain the Fritz John conditions. Let us consider, with reference to $(P_e)$, the following example, taken from Bazaraa and Shetty (1976). The problem is to minimize $f(x,y) = -x + y$ subject to $h(x,y) = (x-1)^2 + y^2 - 1 = 0$, where $x, y$ must also belong to the set $X = \{(x,y) \mid |x| + |y| \leq 1\}$. It is clear (see also Figure 11) that the optimal point is
$$x^0 = (x_0,y_0) = \Big(\frac{\sqrt 2 - 1}{\sqrt 2},\ -\frac{1}{\sqrt 2}\Big).$$

Figure 11.

Note that $x^0 \notin \operatorname{int}(X)$, since $|x_0| + |y_0| = 1$. Now $\nabla f(x_0,y_0) = (-1,1)$ and $\nabla h(x_0,y_0) = (-\sqrt 2,-\sqrt 2)$. The related Fritz John conditions are therefore
$$u_0\nabla f(x_0,y_0) + u\nabla h(x_0,y_0) = 0,$$
i.e.
$$u_0(-1,1) + u(-\sqrt 2,-\sqrt 2) = (0,0).$$
These two equations admit only the solution $u_0 = u = 0$ (adding them gives $-2\sqrt 2\,u = 0$), and therefore the Fritz John necessary optimality conditions do not hold at this point. In this section we shall modify the Fritz John conditions for (P) to take care also of the case where $x^0$ is not necessarily in $\operatorname{int}(X)$, i.e. where $X$ is a so-called set constraint (not necessarily functionally specified). The same considerations for problem $(P_1)$ will be made in Section 3.8. Moreover, we shall consider a constraint qualification which assures the validity of the Kuhn–Tucker conditions also in this case. We recall that if $S$ is a nonempty set in $\mathbb{R}^n$ and $x^0 \in S$, then $T(S,x^0)$ is the Bouligand tangent cone to $S$ at $x^0$ and $P(S,x^0)$ is the pseudotangent cone to $S$ at $x^0$ (see Section 3.4); $T^*(S,x^0)$ and $P^*(S,x^0)$ are the respective polar cones. For problem $(P_0)$ the following result, due to Varaiya (1967), is of fundamental importance, especially for its subsequent developments, and is a sharper version of Theorem 3.2.1.

Theorem 3.6.1. Let $x^0$ be a solution of problem $(P_0)$, or a point of local minimum of $f$ on $X$, and let $f$ be differentiable at $x^0$. Then we have
$$-\nabla f(x^0) \in T^*(X,x^0), \quad \text{i.e.} \quad y\nabla f(x^0) \geq 0, \ \forall y \in T(X,x^0).$$

Proof. Let $y \in T(X,x^0)$; then there exist a sequence $\{x^k\} \subset X$ converging to $x^0$ and a nonnegative sequence $\{\lambda_k\} \subset \mathbb{R}$ such that $y = \lim_{k \to +\infty} \lambda_k(x^k - x^0)$. Since $f$ is differentiable at $x^0$, we have for each $k$:
$$f(x^k) - f(x^0) = (x^k - x^0)\nabla f(x^0) + o(\|x^k - x^0\|),$$
which implies
$$\lambda_k\big(f(x^k) - f(x^0)\big) = \lambda_k(x^k - x^0)\nabla f(x^0) + \lambda_k\, o(\|x^k - x^0\|).$$
Letting $k \to \infty$ in the right-hand side of the previous equality, we get $y\nabla f(x^0)$ (the last term vanishes, since $\lambda_k\|x^k - x^0\| \to \|y\|$); thus also $\lambda_k(f(x^k) - f(x^0))$ has a finite limit, which must be nonnegative, since $\lambda_k \geq 0$ and $f(x^k) \geq f(x^0)$ for $k$ large enough. Consequently
$$y\nabla f(x^0) \geq 0, \quad \forall y \in T(X,x^0), \quad \text{i.e.} \quad -\nabla f(x^0) \in T^*(X,x^0).$$
□
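When $X$ is a box, the tangent cone and its polar are explicit, and the condition of Theorem 3.6.1 becomes a componentwise sign test. The Python sketch below assumes, for illustration only, that $X = \{x \mid l \leq x \leq u\}$ with $l_i \leq u_i$ and $x^0 \in X$. Then $T(X,x^0)$ consists of the $y$ with $y_i \geq 0$ where $x^0_i = l_i$, $y_i \leq 0$ where $x^0_i = u_i$, and $y_i = 0$ when $l_i = u_i$, so $y\nabla f(x^0) \geq 0$ for all such $y$ reduces to the sign conditions coded below. The function name `box_first_order_condition`, the tolerance and the objective $f(x) = (x_1 - 2)^2 + (x_2 + 1)^2$ are hypothetical choices.

```python
def box_first_order_condition(grad_f, x0, lower, upper, tol=1e-12):
    """Check -grad f(x0) in T*(X, x0) for the box X = [lower, upper] (x0 in X).

    Componentwise: df/dx_i >= 0 if x0_i sits only at its lower bound,
    df/dx_i <= 0 if it sits only at its upper bound, df/dx_i == 0 if it is
    strictly inside, and no condition if lower_i == upper_i.
    """
    for g, x, lo, hi in zip(grad_f, x0, lower, upper):
        at_lo = abs(x - lo) <= tol
        at_hi = abs(x - hi) <= tol
        if at_lo and at_hi:
            continue                  # T is {0} in this coordinate
        if at_lo and g < -tol:
            return False              # y_i > 0 is feasible and decreases f to first order
        if at_hi and g > tol:
            return False              # y_i < 0 is feasible and decreases f to first order
        if not (at_lo or at_hi) and abs(g) > tol:
            return False              # both signs of y_i are feasible
    return True


def grad(x):
    """Gradient of f(x) = (x1 - 2)^2 + (x2 + 1)^2."""
    return [2 * (x[0] - 2), 2 * (x[1] + 1)]


box = ((0, 0), (1, 1))                                            # X = [0, 1]^2
print(box_first_order_condition(grad((1, 0)), (1, 0), *box))      # True
print(box_first_order_condition(grad((0, 0)), (0, 0), *box))      # False
print(box_first_order_condition(grad((0.5, 0)), (0.5, 0), *box))  # False
```

Here $(1,0)$ is the point of $X$ closest to the unconstrained minimizer $(2,-1)$, so it is the constrained minimizer and passes the test; the other two points fail it and therefore cannot be local minima of $f$ on $X$.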
Theorem 3.6.1 is interesting in and of itself, as it provides a necessary condition for a differentiable function to attain a local minimum over any set $X \subset \mathbb{R}^n$. Note, moreover, that Theorem 3.6.1 provides a sharper result than Theorem 3.2.1. Guignard (1969) proved the following version of the above theorem (for differentiable functions defined on Banach spaces): a necessary optimality condition for problem $(P_0)$ is $-\nabla f(x^0) \in P^*(X,x^0)$. As, for any set $S$, we have $S^* = (\operatorname{conv}(S))^*$ (see Section 2.3), it follows that $T^*(X,x^0) = P^*(X,x^0)$, so the formulation of Guignard is equivalent to the one given in Theorem 3.6.1. Note again that if $x^0 \in \operatorname{int}(X)$, then $T(X,x^0) = \mathbb{R}^n$ and $T^*(X,x^0) = \{0\}$; therefore Theorem 3.6.1 also recovers the classical necessary conditions for a free extremum. Let us now apply Theorem 3.6.1 to the study of necessary optimality conditions for problem (P) with a set constraint $X \subset \mathbb{R}^n$ (similar arguments hold also for problem $(P_e)$; in Section 3.8 we shall study a nonlinear programming problem with both equality and inequality constraints and a set constraint).

Definition 3.6.1. In problem (P) let $x^0 \in S$; the set
$$C(x^0) = \{y \in \mathbb{R}^n \mid y\nabla g_i(x^0) \leq 0,\ \forall i \in I(x^0)\}$$
is called the linearizing cone at $x^0$ for problem (P). The linearizing cone (called cone of locally constrained directions in Arrow, Hurwicz and Uzawa (1961)) is a nonempty closed convex polyhedral cone determined by the active constraints of problem (P). The following lemma was proved by Abadie (1967).

Lemma 3.6.1. In problem (P) let $x^0 \in S$; then $P(S,x^0) \subset C(x^0)$, i.e. $C^*(x^0) \subset T^*(S,x^0)$.
Proof. It is sufficient to prove that $T(S,x^0) \subset C(x^0)$ since, $C(x^0)$ being a closed convex cone, we then have $P(S,x^0) = \operatorname{cl}(\operatorname{conv}(T(S,x^0))) \subset C(x^0)$. Let $y \in T(S,x^0)$. Then there exist a sequence $\{x^k\} \subset S$ converging to $x^0$ and a nonnegative sequence $\{\lambda_k\} \subset \mathbb{R}$ such that $y = \lim_{k \to +\infty} \lambda_k(x^k - x^0)$. We have, for $i \in I(x^0)$ (recall that $g_i(x^0) = 0$),
$$\lambda_k g_i(x^k) = \lambda_k(x^k - x^0)\nabla g_i(x^0) + \lambda_k\, o(\|x^k - x^0\|).$$
If $k$ is large enough and if $y\nabla g_i(x^0) > 0$ for some $i \in I(x^0)$, then the right-hand side of the last expression is positive, so that $g_i(x^k) > 0$, which contradicts $x^k \in S$. □

As a counterexample to the converse of Lemma 3.6.1, consider the following system in $\mathbb{R}^2$:
$$g_1(x) = x_2 - x_1^3 \leq 0, \qquad g_2(x) = -x_2 \leq 0,$$
and take $x^0 = (0,0)$. The cone of tangents is the half-line $x_2 = 0$, $x_1 \geq 0$, while the linearizing cone is the whole line $x_2 = 0$. It is worthwhile to emphasize that the cone of tangents is a geometrical concept, while the linearizing cone depends only upon the analytical description of the feasible set $S$. For example, if we add to the above two constraints the third inequality
$$g_3(x) = -x_1 - x_2 \leq 0,$$
the feasible set $S$ remains the same, as does the cone of tangents at $x^0 = (0,0)$, but now the linearizing cone coincides with the tangent cone.

Definition 3.6.2. The pair $(X,g)$ of problem (P) is said to be Gould–Tolle regular at $x^0 \in S$ if and only if for every objective function $f$, differentiable at $x^0$ and having at $x^0$ a constrained local minimum, the Kuhn–Tucker conditions (6)–(7)–(8) hold. The following is a basic result due to Gould and Tolle (1971).
Theorem 3.6.2. In problem (P) the pair $(X,g)$ is Gould–Tolle regular at $x^0 \in S$ if and only if $T^*(S,x^0) \subset C^*(x^0)$.

For the proof of this theorem we need a further lemma.

Lemma 3.6.2. The Kuhn–Tucker conditions (6)–(7)–(8) hold at $x^0 \in S$ if and only if $-\nabla f(x^0) \in C^*(x^0)$.

Proof. The result is a direct consequence of the Farkas–Minkowski theorem of the alternative (Section 2.4, Theorem 2.4.1): the system
$$-\nabla f(x^0) = \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0), \qquad \lambda_i \geq 0,\ i \in I(x^0),$$
has a solution if and only if $-\nabla f(x^0)\, y \leq 0$ whenever $y\nabla g_i(x^0) \leq 0$ for $i \in I(x^0)$, i.e. whenever $y \in C(x^0)$. Therefore the Kuhn–Tucker conditions are equivalent to $-\nabla f(x^0) \in C^*(x^0)$. □

We now turn to proving Theorem 3.6.2. Let $T^*(S,x^0) \subset C^*(x^0)$ and let $f$ be any objective function for problem (P), differentiable at $x^0 \in S$ and having at $x^0$ a constrained local minimum. By Theorem 3.6.1, $-\nabla f(x^0) \in T^*(S,x^0)$; since $T^*(S,x^0) \subset C^*(x^0)$, by Lemma 3.6.1 we have $T^*(S,x^0) = C^*(x^0)$ and, by Lemma 3.6.2, the Kuhn–Tucker conditions (6)–(7)–(8) of Section 3.5 hold; hence the pair $(X,g)$ is Gould–Tolle regular at $x^0$. It remains to show that if $(X,g)$ is Gould–Tolle regular at $x^0$, then $T^*(S,x^0) \subset C^*(x^0)$. In Gould and Tolle (1971) it is in effect proved that if $(X,g)$ is Gould–Tolle regular at $x^0$, then for every $y \in T^*(S,x^0)$ there exists an objective function $f$, differentiable at $x^0$, having at $x^0$ a constrained local minimum and for which $-\nabla f(x^0) = y$. The Gould–Tolle regularity assumption, together with Lemma 3.6.2, then yields $y \in C^*(x^0)$, and hence the result
$$T^*(S,x^0) \subset C^*(x^0).$$
□
See also Bazaraa and Shetty (1976) for another proof of the existence, for any $y \in T^*(S,x^0)$, of an objective function $f$ for which $-\nabla f(x^0) = y$ and such that $f$ has a minimum over $S$ at $x^0$. Both proofs (of Gould and
Tolle and of Bazaraa and Shetty) are rather long and intricate, so we prefer to give here only the conceptual aspects of the proof of Theorem 3.6.2. The condition $T^*(S,x^0) \subset C^*(x^0)$, i.e. $C(x^0) \subset P(S,x^0)$ or $C(x^0) \subset T^{**}(S,x^0)$, represents a constraint qualification (C.Q.) which enjoys the best situation with respect to problem (P), also in the case of a set constraint. This condition was introduced by Guignard (1969); see also Canon, Cullum and Polak (1970) and Evans (1970). We shall refer to the condition $T^*(S,x^0) \subset C^*(x^0)$ as the Guignard–Gould–Tolle C.Q. The concept of regularity of the pair $(X,g)$ was also analyzed by Arrow, Hurwicz and Uzawa (1961), but Theorem 3.6.2 was proved by Gould and Tolle (1971); the proof of Arrow, Hurwicz and Uzawa considered only the case of a convex feasible set $S$, and these authors worked with another constraint qualification. Even if the Guignard–Gould–Tolle C.Q. is the most general C.Q. for the feasible set of problem (P), it is obviously possible to have problems for which $C^*(x^0)$ is a proper subset of $T^*(S,x^0)$, and therefore the Kuhn–Tucker conditions do not hold at $x^0$ for some objective functions: we have already produced a counterexample to the converse of Lemma 3.6.1. For these cases the following considerations and results may be useful (see also Gould and Tolle (1972)); in particular, it will be shown that for any problem (P) a nontrivial optimality criterion can be found which is valid for all (differentiable) objective functions with a local constrained minimum at $x^0 \in S$, without imposing the validity of any C.Q. The following lemma will be used.

Lemma 3.6.3. Let us consider the nonempty sets $A$, $H$ and $K$, with $0 \in A$, $0 \in H$ and $K$ a convex cone, and let $A \cup H = K$. Then $A + H = K$.
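Proof. Since $0 \in A$ and $0 \in H$, we have $A \cup H \subset A + H$. Since $A, H \subset K$ and $K$ is a convex cone, $K + K = K$. The result then follows from the relations
$$K = A \cup H \subset A + H \subset K + K = K.$$
□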
To apply this result, observe that $T^*(S,x^0)\setminus C^*(x^0) \subset T^*(S,x^0)$. The set $T^*(S,x^0)\setminus C^*(x^0)$ does not contain the origin, so we consider instead the cone $T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}$. Since $T^*(S,x^0)$ is a convex cone and $T^*(S,x^0) = C^*(x^0) \cup [T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}]$, Lemma 3.6.3 gives
$$T^*(S,x^0) = C^*(x^0) + [T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}]. \tag{11}$$
Let $B(x^0)$ denote the cone of gradients, i.e. the closed convex cone
$$B(x^0) = \Big\{ z \in \mathbb{R}^n \;\Big|\; z = \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0),\ \lambda_i \ge 0 \Big\}.$$
By the previous results, $B(x^0) = C^*(x^0)$, so relation (11) becomes
$$T^*(S,x^0) = B(x^0) + [T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}].$$
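Because $B(x^0) = C^*(x^0)$, the Kuhn-Tucker conditions can be satisfied at $x^0$ exactly when $-\nabla f(x^0)$ lies in the finitely generated cone $B(x^0)$. The sketch below tests this numerically. It is only an illustration under our own assumptions:

- the active gradients are given explicitly;
- membership is decided by a nonnegative least-squares residual, computed with a simple projected coordinate-descent loop (any NNLS solver would do);
- the name `cone_membership` is hypothetical.

A residual close to zero indicates $-\nabla f(x^0) \in B(x^0)$ up to rounding. A clearly positive residual means the Kuhn-Tucker conditions fail for that $f$.

```python
from typing import List, Sequence, Tuple


def cone_membership(generators: Sequence[Sequence[float]],
                    target: Sequence[float],
                    sweeps: int = 500) -> Tuple[List[float], float]:
    """Approximately solve min_{lam >= 0} || sum_i lam_i * generators[i] - target ||.

    Returns (lam, residual_norm). The target lies in the cone generated by
    `generators` exactly when the optimal residual is zero.
    """
    n = len(target)
    lam = [0.0] * len(generators)
    # residual r = sum_i lam_i * generators[i] - target, kept up to date
    r = [-t for t in target]
    sq_norms = [sum(c * c for c in g) for g in generators]
    for _ in range(sweeps):
        for i, g in enumerate(generators):
            if sq_norms[i] == 0.0:
                continue  # a zero gradient generates nothing
            # exact minimiser of the quadratic along coordinate i, clipped at 0
            step = -sum(g[k] * r[k] for k in range(n)) / sq_norms[i]
            new_val = max(0.0, lam[i] + step)
            delta = new_val - lam[i]
            if delta != 0.0:
                lam[i] = new_val
                for k in range(n):
                    r[k] += delta * g[k]
    return lam, sum(c * c for c in r) ** 0.5


# Active gradients at x0 = (0, 0) for g1 = x2 - x1**3 <= 0, g2 = -x2 <= 0
active_grads = [(0.0, 1.0), (0.0, -1.0)]

# f = x1 has a local minimum on S at x0, since S lies in {x1 >= 0}
lam, res = cone_membership(active_grads, [-1.0, 0.0])   # target = -grad f
print(res)  # 1.0: -grad f is not in B(x0), so Kuhn-Tucker fails

# f = x2 also has a local minimum there, and Kuhn-Tucker does hold
lam, res = cone_membership(active_grads, [0.0, -1.0])
print(res)  # 0.0
```

The usage data are the constraints of the Gould and Tolle (1972) example discussed in this section. The two objective functions are our own choices.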
From Theorem 3.6.1 and the above results we obtain the following criterion, due to Gould and Tolle (1972).

Theorem 3.6.3. If $x^0$ is a local solution of (P), where $f$ and $g_i$ are differentiable at $x^0$, then there exist scalars $\lambda_i \ge 0$, $i \in I(x^0)$, such that
$$-\Big[\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)\Big] \in T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}. \tag{12}$$
In the special case where the Guignard-Gould-Tolle constraint qualification
$$T^*(S,x^0) = C^*(x^0) \tag{13}$$
is satisfied, the set $T^*(S,x^0)\setminus C^*(x^0)$ is empty and (12) reduces to the classical Kuhn-Tucker conditions. If (13) is not satisfied for a given problem (P), Theorem 3.6.3 still guarantees nonnegative multipliers such that
$$-\Big[\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)\Big] \in G,$$
where $G$ is a suitable subcone of $T^*(S,x^0)$. If no further assumption is made, $G$ coincides with $T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}$. Under suitable assumptions, $G$ may be a proper subcone of this set. Finally, $G$ shrinks to the singleton $\{0\}$ when (13) holds.

In general one cannot deduce from the equality
$$T^*(S,x^0) = C^*(x^0) + G$$
the other equality
$$T^*(S,x^0)\setminus C^*(x^0) \cup \{0\} = G.$$
The reason is that in certain cases numerous cones $G \subset T^*(S,x^0)$ satisfy the relation
$$T^*(S,x^0) = C^*(x^0) + [T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}] = C^*(x^0) + G. \tag{14}$$
Indeed, consider the following example, taken from Gould and Tolle (1972). Let $S$ be given by the constraints
$$g_1(x_1,x_2) = x_2 - x_1^3 \le 0, \qquad g_2(x_1,x_2) = -x_2 \le 0.$$
We have already seen that at $x^0 = (0,0)$ the constraint qualification (13) does not hold for these constraints. Consequently, the Kuhn-Tucker conditions fail for some objective function $f$ having at $x^0$ a local minimum. Here (see Figure 12) the cone $C(x^0)$ is the $x_1$ axis and $C^*(x^0)$ is the $x_2$ axis; $T(S,x^0)$ is the nonnegative $x_1$ axis, and $T^*(S,x^0)\setminus C^*(x^0) \cup \{0\}$ is the set $\{(x_1,x_2) \mid x_1$ [...] $0$, we would obtain $pAx \ge \lambda p k^0$ for each $\lambda > 0$, where $p k^0 > 0$, which is impossible. This completes the proof. □

Theorem 3.6.5. Let $x^0$ be a local solution of problem (P), where $f$ and $g_i$, $i \in I(x^0)$, are differentiable at $x^0$. Then for every convex subcone $T_1(X,x^0)$ of $T(X,x^0)$ there exist multipliers $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \in I(x^0)$, not all zero, such that
$$-\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)\Big] \in T_1^*(X,x^0).$$
Proof. For each $x \in X$ consider the vector-valued function $a(x) = (f(x) - f(x^0),\, g_i(x))$, $i \in I(x^0)$. Let
$$K = \operatorname{int}\mathbb{R}_- \times \operatorname{int}\mathbb{R}_- \times \cdots \times \operatorname{int}\mathbb{R}_-,$$
which is obviously an open convex cone. Since $x^0$ is a local solution, no vector $x \in X$ sufficiently close to $x^0$ satisfies
$$a(x) \in K \quad\text{and}\quad g_i(x) \le 0,\ \forall i \in J(x^0).$$
We show that no vector $\bar{x} \in T(X,x^0)$ satisfies $\nabla a(x^0)\,\bar{x} \in K$.

Ab absurdo, suppose such a vector $\bar{x}$ exists, with $\bar{x} = \lim_{k\to+\infty} \lambda_k (x^k - x^0)$, where $\{x^k\} \subset X$, $x^k \to x^0$ and $\{\lambda_k\} \subset \mathbb{R}_+$. Then
$$\lambda_k \big(a(x^k) - a(x^0)\big) = \lambda_k \nabla a(x^0)(x^k - x^0) + \lambda_k\, o(\|x^k - x^0\|).$$
Letting $k \to +\infty$ and using $a(x^0) = 0$, we get
$$\lim_{k\to+\infty} \lambda_k a(x^k) = \nabla a(x^0)\,\bar{x}.$$
Since $K$ is an open cone, $\lambda_k a(x^k) \in K$ for $k$ sufficiently large. This yields the absurd conclusion $a(x^k) \in K$ with $x^k \in X$ and, by continuity, $g_i(x^k) < 0$ for $i \in J(x^0)$.

Hence no vector $\bar{x} \in T(X,x^0)$ satisfies $\nabla a(x^0)\,\bar{x} \in K$, and a fortiori the same holds for $\bar{x} \in T_1(X,x^0)$. Since $T_1(X,x^0)$ is a convex cone, Lemma 3.6.4 gives a nonzero vector $\lambda \in K^*$ such that
$$\lambda \nabla a(x^0)\, x \ge 0, \qquad \forall x \in T_1(X,x^0).$$
Therefore $-\lambda \nabla a(x^0) \in T_1^*(X,x^0)$, from which it follows that
$$-\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)\Big] \in T_1^*(X,x^0),$$
with $\lambda_0 \ge 0$, $\lambda_i \ge 0$ ($i \in I(x^0)$) and $(\lambda_0, \lambda_i) \ne 0$. □
When $T(X,x^0)$ is a convex cone, Theorem 3.6.5 gives a sharper result, with $T_1 = T(X,x^0)$. The cone $T(X,x^0)$ is convex if, e.g., $X$ is a convex set (see Section 3.4), or even if $X$ is star-shaped at $x^0$ (equivalently: convex at $x^0$), i.e.
$$t x + (1-t) x^0 \in X, \qquad \forall x \in X,\ \forall t \in [0,1];$$
see Bazaraa and Shetty (1976). Penot (1991) obtained more general conditions on $X$ ensuring the convexity of $T(X,x^0)$. In his terminology, the set $X$ is then called tangentially regular at $x^0$.

If $T(X,x^0)$ is not a convex cone, some of its convex subcones can be chosen as $T_1(X,x^0)$ in Theorem 3.6.5. One of these subcones is the Clarke tangent cone. The larger the convex subcone of $T(X,x^0)$ we can choose, the sharper Theorem 3.6.5 becomes. A convex subcone of $T(X,x^0)$ larger than the Clarke tangent cone is the Michel-Penot prototangent cone (see Section 3.4). More recently, Treiman (1991) showed that there are infinitely many convex cones lying between the Clarke tangent cone and the Michel-Penot prototangent cone.

We conclude this section with a result in which generalized Kuhn-Tucker conditions of the type of (16) are obtained from Theorem 3.6.5 by imposing a constraint qualification.

Theorem 3.6.6. Let $x^0$ be a local solution of problem (P), where $f$ and $g_i$, $i \in I(x^0)$, are differentiable at $x^0$. Suppose the following constraint qualification is fulfilled:
$$\exists\, \bar{y} \in T_1(X,x^0):\ \bar{y}\,\nabla g_i(x^0) < 0, \qquad \forall i \in I(x^0),$$
where $T_1(X,x^0)$ is a convex subcone of $T(X,x^0)$. Then there exist multipliers $\lambda_i \ge 0$, $i \in I(x^0)$, such that
$$-\Big(\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)\Big) \in T_1^*(X,x^0).$$
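Proof. Suppose that $\lambda_0 = 0$ in Theorem 3.6.5. Then the multipliers $\lambda_i$, $i \in I(x^0)$, are not all zero, and we get
$$-\sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) \in T_1^*(X,x^0), \quad\text{i.e.}\quad \sum_{i \in I(x^0)} \lambda_i\, y\, \nabla g_i(x^0) \ge 0, \qquad \forall y \in T_1(X,x^0).$$
Taking $y = \bar{y}$ contradicts the assumed constraint qualification, because $\sum_{i} \lambda_i \bar{y}\nabla g_i(x^0) < 0$ (the $\lambda_i$ being nonnegative and not all zero). Hence $\lambda_0 > 0$, and dividing by $\lambda_0$ yields the thesis. □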
3.7. Again on Constraint Qualifications

In the previous section we mentioned the Guignard-Gould-Tolle constraint qualification, i.e. $C^*(x^0) = T^*(S,x^0)$. In Theorem 3.6.4 a generalization of it was imposed, i.e. [...]

These constraint qualifications are the weakest possible, in the sense that they are necessary and sufficient for the Gould-Tolle regularity of the related necessary optimality condition. In Section 3.5 we introduced other constraint qualifications for problem (P), in the case $x^0 \in \operatorname{int}(X)$. All the above constraint qualifications guarantee that the multiplier of the gradient of the objective function in the various Fritz John conditions is positive. In other words, when they hold and $x^0 \in S$ is a local solution of (P), the Kuhn-Tucker conditions hold. Besides the constraint qualifications already introduced, there are others, with varying degrees of generality. In the following conditions, $x^0 \in X$.

1) Zangwill constraint qualification (Zangwill (1969)). It is expressed as
$$C(x^0) \subset Z(S,x^0),$$
where $Z(S,x^0)$ is the cone of feasible directions to $S$ at $x^0$.

2) Kuhn-Tucker constraint qualification (Kuhn and Tucker (1951)). It is expressed as
$$C(x^0) \subset A(S,x^0),$$
where $A(S,x^0)$ is the cone of attainable directions to $S$ at $x^0$.

3) Arrow-Hurwicz-Uzawa second constraint qualification (Arrow, Hurwicz and Uzawa (1961)). It is expressed as
$$C(x^0) \subset \operatorname{cl}(\operatorname{conv}(A(S,x^0)))$$
or equivalently as
$$C(x^0) \subset A^{**}(S,x^0).$$
The cone $A^{**}(S,x^0)$ is also called the cone of weakly attainable directions.

4) Abadie constraint qualification (Abadie (1967)). It is expressed as
$$C(x^0) \subset T(S,x^0).$$

5) The already introduced Guignard-Gould-Tolle constraint qualification (Guignard (1969); Gould and Tolle (1971)). It is expressed as
$$C(x^0) \subset P(S,x^0),$$
i.e.
$$C(x^0) \subset \operatorname{cl}(\operatorname{conv}(T(S,x^0))),$$
or equivalently $C(x^0) \subset T^{**}(S,x^0)$, or equivalently $T^*(S,x^0) \subset C^*(x^0)$.
Before going on, we note that in general there is no inclusion between $A^{**}(S,x^0)$ and $T(S,x^0)$. The following examples, due to Abadie (1967) and Evans (1970), show that the Abadie C.Q. and the Arrow-Hurwicz-Uzawa second C.Q. are independent of each other.

Example 3.7.1. Let $X = \mathbb{R}^2$, $g_1(x_1,x_2) = x_1 x_2$, $g_2(x_1,x_2) = -x_1$, $g_3(x_1,x_2) = -x_2$. In other words,
$$S = \{(x_1,x_2) \mid x_1 = 0,\ x_2 \ge 0\} \cup \{(x_1,x_2) \mid x_2 = 0,\ x_1 \ge 0\}.$$
At the point $x^0 = (0,0)$ we have $T(S,(0,0)) = S$ and $A(S,(0,0)) = S$. However,
$$A^{**}(S,(0,0)) = \{(x_1,x_2) \mid x_1 \ge 0,\ x_2 \ge 0\}.$$
Note that here $C(x^0) = \{(y_1,y_2) \mid y_1 \ge 0,\ y_2 \ge 0\}$, because the active gradients at $x^0$ are $(0,0)$, $(-1,0)$ and $(0,-1)$. So the Arrow-Hurwicz-Uzawa second C.Q. holds at $x^0$ while the Abadie C.Q. does not.
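Clearly $T(S,(0,0)) \subset A^{**}(S,(0,0))$, but the two cones are not equal. Conversely, consider the functions
$$s(x_1) = \begin{cases} x_1^4 \sin\dfrac{1}{x_1}, & x_1 \ne 0,\\[4pt] 0, & x_1 = 0; \end{cases} \qquad c(x_1) = \begin{cases} x_1^4 \cos\dfrac{1}{x_1}, & x_1 \ne 0,\\[4pt] 0, & x_1 = 0. \end{cases}$$
These functions are continuously differentiable, and both the functions and their derivatives vanish at $x_1 = 0$. Now consider the constraint functions
$$g_1(x_1,x_2) = x_2 - x_1^3 - s(x_1), \quad g_2(x_1,x_2) = -x_2 + x_1^3 + c(x_1), \quad g_3(x_1,x_2) = x_1^3 - 1.$$
Again we take $X = \mathbb{R}^2$. It can be verified that
$$A(S,(0,0)) = A^{**}(S,(0,0)) = \{(0,0)\}, \qquad\text{whereas}\qquad T(S,(0,0)) = \{(x_1,x_2) \mid x_2 = 0\}.$$
So in this case $A^{**}(S,x^0) \subset T(S,x^0)$, and again the two cones are not equal. Since $C(x^0) = \{(y_1,y_2) \mid y_2 = 0\}$ here, the Abadie C.Q. holds at $x^0$ while the Arrow-Hurwicz-Uzawa second C.Q. does not.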
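Both parts of Example 3.7.1 can be probed numerically. The sketch below is only an illustration and gives numerical evidence, not a proof.

- First set: it checks that the direction $(1,1)$ belongs to $C(x^0)$, although no point $t(1,1)$ with $t > 0$ is feasible.
- Second set: it uses $s(t) = t^4\sin(1/t)$ and $c(t) = t^4\cos(1/t)$. A feasible $x_2$ exists above a given $x_1$ exactly when $c(x_1) \le s(x_1)$, because the term common to $g_1$ and $g_2$ cancels and $g_3$ is inactive near the origin. This test therefore does not depend on that common term. The check shows that admissible and non-admissible values of $x_1$ alternate arbitrarily close to $0$.

All function names are hypothetical.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Point = Tuple[float, float]


# First set: g = (x1*x2, -x1, -x2) <= 0, i.e. the two nonnegative half-axes
def g_first(x: Point) -> Tuple[float, float, float]:
    x1, x2 = x
    return (x1 * x2, -x1, -x2)


# Gradients of the three (active) constraints at x0 = (0, 0)
GRADS_AT_ORIGIN = ((0.0, 0.0), (-1.0, 0.0), (0.0, -1.0))


def in_linearizing_cone(grads: Sequence[Point], d: Point) -> bool:
    """d belongs to C(x0) iff d . grad g_i(x0) <= 0 for every active i."""
    return all(g[0] * d[0] + g[1] * d[1] <= 0.0 for g in grads)


def ray_feasible(g: Callable[[Point], Sequence[float]], d: Point,
                 ts: Iterable[float]) -> bool:
    """True if t*d satisfies every constraint for all sampled t > 0."""
    return all(max(g((t * d[0], t * d[1]))) <= 0.0 for t in ts)


# Second set: an x2 with g1 <= 0 and g2 <= 0 exists above x1 iff c(x1) <= s(x1)
def s(t: float) -> float:
    return t ** 4 * math.sin(1.0 / t) if t != 0.0 else 0.0


def c(t: float) -> float:
    return t ** 4 * math.cos(1.0 / t) if t != 0.0 else 0.0


def x1_admissible(t: float) -> bool:
    return c(t) <= s(t)


ts = [10.0 ** (-k) for k in range(1, 8)]
print(in_linearizing_cone(GRADS_AT_ORIGIN, (1.0, 1.0)))  # True
print(ray_feasible(g_first, (1.0, 1.0), ts))              # False
# 1/x1 = pi/2 + 2*pi*k gives sin = 1, cos = 0: admissible
print(all(x1_admissible(1.0 / (math.pi / 2 + 2 * math.pi * k)) for k in range(1, 50)))
# 1/x1 = 3*pi/2 + 2*pi*k gives sin = -1, cos = 0: not admissible
print(any(x1_admissible(1.0 / (1.5 * math.pi + 2 * math.pi * k)) for k in range(1, 50)))
```

The last two lines print `True` and `False`. No arc leaving the origin along $(1,0)$ can therefore stay feasible, which is consistent with $A(S,x^0) = \{(0,0)\}$.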
Theorem 3.7.1. If $x^0 \in \operatorname{int}(X)$, then the Arrow-Hurwicz-Uzawa first C.Q. implies the Zangwill C.Q.

Proof. If we denote by $C_P(x^0)$ the set $C_P(x^0) = \{y \in \mathbb{R}^n \mid y\nabla g_i(x^0) \le 0,\ \forall i \in I_P(x^0);\ y\nabla g_i(x^0)$ [...] $0$. Since $x^0 \in \operatorname{int}(X)$, we have $x^0 + \lambda y \in X$ for $\lambda > 0$ sufficiently small. Since each $g_i$, $i \notin I(x^0)$, is continuous with $g_i(x^0) < 0$, clearly $g_i(x^0 + \lambda y) < 0$ for all $i \notin I(x^0)$ and $\lambda > 0$ sufficiently small. For the pseudoconcave active constraints, $\lambda\,(y\nabla g_i(x^0)) \le 0$ implies $g_i(x^0 + \lambda y) \le g_i(x^0) = 0$ for all $i \in I_P(x^0)$ and $\lambda > 0$. Finally, for $i \in I_{NP}(x^0)$, $\lambda\,(y\nabla g_i(x^0)) < 0$ implies $g_i(x^0 + \lambda y) < 0$ for $\lambda > 0$ sufficiently small. Hence $x^0 + \lambda y \in S$ for $\lambda > 0$ sufficiently small, i.e. $y \in Z(S,x^0)$, and the proof is complete. □

Assume $x^0 \in \operatorname{int}(X)$. Theorem 3.5.3, the inclusions among the cones $Z(S,x^0)$, $A(S,x^0)$, $T(S,x^0)$, $P(S,x^0)$, and Example 3.7.1 then give the implications among the various constraint qualifications shown in the following diagram (Figure 13).

[Figure 13: diagram of the implications among the constraint qualifications for (P) when $x^0 \in \operatorname{int}(X)$. Nodes: modified strict C.Q., weak Slater C.Q., original Slater C.Q., Karlin C.Q., nondegeneracy C.Q., Cottle-Dragomirescu C.Q., Mangasarian-Fromovitz C.Q., Arrow-Hurwicz-Uzawa first C.Q., weak reverse C.Q., Zangwill C.Q., Kuhn-Tucker C.Q., Arrow-Hurwicz-Uzawa second C.Q., Abadie C.Q., Guignard-Gould-Tolle C.Q.]

If $x^0 \notin \operatorname{int}(X)$, the diagram is not entirely valid, although it is easy to verify that it still holds from the Zangwill C.Q. onwards. (In Lemma 6.1.1 of Bazaraa and Shetty (1976) the assumption $x^0 \in \operatorname{int}(X)$ is superfluous and is in fact never used in the proof.) In particular, if $x^0 \notin \operatorname{int}(X)$, the A.H.U. first C.Q. no longer implies the Zangwill C.Q. Giorgi and Guerraggio (1994) propose modifications of the Zangwill C.Q. and of the other constraint qualifications in the first half of the diagram, so that the whole diagram remains valid when $x^0 \notin \operatorname{int}(X)$. For example, the Zangwill C.Q. is modified as follows:
$$C(x^0) \subset Z_1(S,x^0),$$
where
$$Z_1(S,x^0) = \Big\{x \;\Big|\; x^0 + \tfrac{1}{n}x \in X,\ \forall n \in \mathbb{N};\ \exists\, \delta > 0:\ g_i(x^0 + \lambda x) \le 0,\ \forall i,\ \forall \lambda \in [0,\delta]\Big\}.$$
The other constraint qualifications which "precede" the Zangwill C.Q. in the diagram are modified in an analogous way.

Let $F_0$ denote the set of objective functions $f$ that are differentiable at $x^0 \in S$ and have a local minimum there. We already know that in some cases the classical Kuhn-Tucker conditions (6)-(8) of Section 3.5 do not hold for every $f \in F_0$. We have also mentioned a problem raised by Gould and Tolle (1972): regularizing the constraints of every problem of the form (P), i.e. forcing the Guignard-Gould-Tolle C.Q. to hold by adding redundant constraints. This problem was studied and solved by Agunwamba (1977).

Definition 3.7.1. For a given integer $k$, the function $q_k$ is an $x^0$-redundant constraint if and only if $q_k : \mathbb{R}^n \to \mathbb{R}$, $q_k(x^0) = 0$, $q_k(x) \le 0$ for all $x \in S$, and $\nabla q_k(x^0)$ exists.

Let $q$ be a vector of $x^0$-redundant constraints for (P) and consider the set
$$Q(x^0) = \{y \in \mathbb{R}^n \mid y\nabla q_k(x^0) \le 0,\ \forall k\},$$
i.e. the linearizing cone for $q(x)$ at $x^0 \in S$. The Gould-Tolle problem for (P) is as follows. Suppose that the Guignard-Gould-Tolle C.Q. does not hold at $x^0 \in S$. Find a function $q$ such that
$$P(S,x^0) = C(x^0) \cap Q(x^0).$$
Clearly this is the problem of regularizing the constraints of (P) with respect to the Guignard-Gould-Tolle C.Q. (In fact, Gould and Tolle (1972) require $q$ to have a finite number of components.) Agunwamba (1977) gives necessary and sufficient conditions for the existence of a finite set of $x^0$-redundant constraints solving the Gould-Tolle problem. He also constructs a general finite solution when certain conditions are satisfied. His main results are given by the following theorem.
Theorem 3.7.2.
i) The Gould-Tolle problem for (P) always has a solution.
ii) The Gould-Tolle problem has a finite solution if and only if there is a subset $B$ of $T^*(S,x^0)$ such that the set $H = B \cap (T^*(S,x^0)\setminus C^*(x^0))$ is finite and every vector $y \in T^*(S,x^0)\setminus C^*(x^0)$ is a nonnegative combination of some vectors in $B$.

Proof. See Agunwamba (1977). □
The following corollary gives sufficient conditions for the Gould-Tolle problem to have a finite solution.

Corollary 3.7.1. The Gould-Tolle problem has a finite solution if:
i) $\operatorname{cl}(\operatorname{conv}(T^*(S,x^0)\setminus C^*(x^0)))$ is a finite cone (i.e. convex polyhedral); or if
ii) $T^*(S,x^0)$ is a finite cone; or if
iii) $P(S,x^0)$ is a finite cone.
3.8. Necessary Optimality Conditions for $(P_1)$

We now consider a more general nonlinear programming problem with both equality and inequality constraints:
$$\text{Min } f(x), \quad x \in S_1, \tag{$P_1$}$$
$$S_1 = \{x \mid x \in X,\ g_i(x) \le 0,\ i = 1,\dots,m;\ h_j(x) = 0,\ j = 1,\dots,r\ (r < n)\}.$$
Here $X \subset \mathbb{R}^n$; the $g_i$, $i = 1,\dots,m$, are differentiable at least at $x^0 \in S_1$; and the $h_j$, $j = 1,\dots,r$, are continuously differentiable at least in a neighbourhood of $x^0 \in S_1$. Theorem 3.6.3 can immediately be adapted to $(P_1)$ if the latter is rewritten as
$$\text{Min } f(x), \quad x \in X,\ g(x) \le 0,\ h(x) \le 0,\ -h(x) \le 0.$$
We introduce the following sets:
$$D(x^0) = \{y \in \mathbb{R}^n \mid y\nabla h_j(x^0) = 0,\ \forall j = 1,\dots,r\}.$$
$D(x^0)$ is the orthogonal complement of the subspace of $\mathbb{R}^n$ spanned by the $\nabla h_j(x^0)$, $j = 1,\dots,r$.
$$E(x^0) = C(x^0) \cap D(x^0).$$
Using Lemma 3.6.1, it is immediate to show that $E^*(x^0) \subset T^*(S_1,x^0)$. Indeed, that lemma gives
$$T(S_1,x^0) \subset \{y \in \mathbb{R}^n \mid y\nabla g_i(x^0) \le 0,\ \forall i \in I(x^0);\ y\nabla h_j(x^0) \le 0,\ \forall j = 1,\dots,r;\ y\nabla h_j(x^0) \ge 0,\ \forall j = 1,\dots,r\} = E(x^0),$$
and taking polars yields the assertion. The following theorem now follows from the previous results.

Theorem 3.8.1. If $x^0$ is a local solution of $(P_1)$, then there exist scalars $\lambda_i \ge 0$, $i \in I(x^0)$, and $\mu_j \in \mathbb{R}$, $j = 1,\dots,r$, such that
$$-\Big[\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0)\Big] \in T^*(S_1,x^0)\setminus E^*(x^0) \cup \{0\}. \tag{1}$$
It is obvious that when
$$T^*(S_1,x^0) = E^*(x^0), \tag{2}$$
i.e. when the Guignard-Gould-Tolle C.Q. holds for $(P_1)$, condition (1) becomes the classical Kuhn-Tucker condition for $(P_1)$:
$$\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0) = 0.$$
Taking the complementary slackness conditions into account, this reads
$$\nabla f(x^0) + \sum_{i=1}^m \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0) = 0, \tag{3}$$
$$\lambda_i g_i(x^0) = 0, \quad i = 1,\dots,m, \tag{4}$$
$$\lambda_i \ge 0, \quad i = 1,\dots,m; \qquad \mu_j \in \mathbb{R}, \quad j = 1,\dots,r. \tag{5}$$
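Conditions (3)-(5) are easy to verify numerically once a candidate point and its multipliers are available. The following is a minimal sketch (our own helper, not from the text; all names are hypothetical). It reports the largest violation of each condition, and also checks primal feasibility, which (3)-(5) take for granted. The illustrative problem is also our own:
$$\min\ x_1^2 + x_2^2 \quad\text{subject to}\quad -x_1 \le 0,\quad x_1 + x_2 - 1 = 0.$$
At $x^0 = (1/2, 1/2)$ the multipliers $\lambda = 0$, $\mu = -1$ satisfy (3)-(5).

```python
from typing import Dict, Sequence

Vec = Sequence[float]


def kkt_violations(grad_f: Vec, grads_g: Sequence[Vec], g_vals: Vec,
                   grads_h: Sequence[Vec], h_vals: Vec,
                   lam: Vec, mu: Vec) -> Dict[str, float]:
    """Largest violation of each condition in (3)-(5), plus primal feasibility."""
    n = len(grad_f)
    stat = list(grad_f)
    for l_i, gg in zip(lam, grads_g):          # + sum_i lam_i grad g_i
        for k in range(n):
            stat[k] += l_i * gg[k]
    for m_j, gh in zip(mu, grads_h):           # + sum_j mu_j grad h_j
        for k in range(n):
            stat[k] += m_j * gh[k]
    return {
        "stationarity (3)": max(abs(v) for v in stat),
        "complementarity (4)": max((abs(l * v) for l, v in zip(lam, g_vals)), default=0.0),
        "sign of lambda (5)": max((max(0.0, -l) for l in lam), default=0.0),
        "g(x0) <= 0": max((max(0.0, v) for v in g_vals), default=0.0),
        "h(x0) = 0": max((abs(v) for v in h_vals), default=0.0),
    }


x0 = (0.5, 0.5)
report = kkt_violations(
    grad_f=(2 * x0[0], 2 * x0[1]),                      # f = x1^2 + x2^2
    grads_g=[(-1.0, 0.0)], g_vals=[-x0[0]],              # g1 = -x1 (inactive)
    grads_h=[(1.0, 1.0)], h_vals=[x0[0] + x0[1] - 1.0],  # h1 = x1 + x2 - 1
    lam=[0.0], mu=[-1.0],
)
print(max(report.values()))  # 0.0
```

Reporting each condition separately, instead of a single merit value, makes it easy to see which of (3), (4) or (5) fails for a wrong guess of the multipliers.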
Definition 3.6.2 extends to problem $(P_1)$. We say that the triple $(g,h,X)$ is Gould-Tolle regular at $x^0 \in S_1$ if, for every objective function $f$ with a local minimum at $x^0 \in S_1$, there exist vectors $\lambda \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^r$ such that (3)-(5) hold. One can prove (see Gould and Tolle (1971)) that $(g,h,X)$ is Gould-Tolle regular at $x^0 \in S_1$ if and only if condition (2) holds.

Transforming each equality constraint into two inequality constraints of opposite sign is useless for obtaining the Fritz John conditions for $(P_1)$. Indeed, consider the "transformed" problem
$$\text{Min } f(x) \quad\text{subject to}\quad x \in X,\ g(x) \le 0,\ h(x) \le 0,\ -h(x) \le 0.$$
Any feasible point $x^0$ satisfies the relation
$$u_0 \nabla f(x^0) + \sum_{i=1}^m u_i \nabla g_i(x^0) + \sum_{j=1}^r v_j \nabla h_j(x^0) - \sum_{j=1}^r w_j \nabla h_j(x^0) = 0$$
with $u_0 = u_i = 0$ for each $i = 1,\dots,m$ and $v_j = w_j = 1$ for each $j = 1,\dots,r$, so that the multiplier vector $(u_0,u,v,w)$ is nonzero. The complementary slackness conditions also hold:
$$\sum_{i=1}^m u_i g_i(x^0) = 0, \qquad \sum_{j=1}^r v_j h_j(x^0) = 0, \qquad \sum_{j=1}^r w_j \big(-h_j(x^0)\big) = 0.$$
Thus every feasible point trivially satisfies these conditions, which are therefore useless. In order to fit Theorem 3.6.5 to problem
$(P_1)$, we introduce the following lemmas (see Bazaraa and Goode (1972a) and Bazaraa and Shetty (1976)).

Lemma 3.8.1. Let $C$ be an open convex cone in $\mathbb{R}^m$, let $b \in \operatorname{cl} C$, and define the (convex) cone $C_b = \{c - \lambda b : c \in C,\ \lambda \ge 0\}$. If $a \in C_b$, then $b + \delta a + o(\delta) \in C$ for $\delta \to 0^+$, where $\|o(\delta)\|/\delta \to 0$ as $\delta \to 0^+$.

Proof. Let $a = c - \lambda b \in C_b$, where $c \in C$ and $\lambda \ge 0$. Then
$$b + \delta a + o(\delta) = (1 - \lambda\delta)\, b + \delta\Big(c + \frac{o(\delta)}{\delta}\Big).$$
Since $C$ is open and $c \in C$, we have $c + o(\delta)/\delta \in C$ for $\delta$ sufficiently small. Also, $1 - \lambda\delta = \mu \ge 0$ for $\delta$ sufficiently small. Hence, for $\delta > 0$ small enough,
$$b + \delta a + o(\delta) = (1 - \lambda\delta)\, b + \delta\Big(c + \frac{o(\delta)}{\delta}\Big) \in \operatorname{cl} C + C.$$
But $\operatorname{cl} C + C = C$, since $C$ is an open convex cone. Hence $b + \delta a + o(\delta) \in C$, and the proof is complete. □

Recall the definition of the cone of quasi-interior directions to $X$ at $x^0 \in X$ (see Section 3.4):
$$Q(X,x^0) = \{x \mid \exists N(x) \text{ such that, for each } \delta > 0,\ \exists t \in (0,\delta),\ \forall y \in N(x):\ x^0 + t y \in X\}.$$
$Q(X,x^0)$ is an open cone.
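In Lemma 3.8.1 the direction $a$ need not lie in $C$ itself; it only has to lie in $C_b$. The sketch below illustrates this with data chosen by us:

- $C$ is the open negative orthant of $\mathbb{R}^2$;
- $b = (0,-1) \in \operatorname{cl} C$;
- $a = c - \lambda b$ with $c = (-1,-1)$ and $\lambda = 3$, so that $a = (-1, 2) \notin C$;
- the perturbation $o(\delta) = \delta^2(1,1)$.

It checks that $b + \delta a + o(\delta) \in C$ over a range of small $\delta > 0$, as the lemma predicts. This is a sanity check, not a proof.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def in_open_negative_orthant(v: Vec2) -> bool:
    return v[0] < 0.0 and v[1] < 0.0


b: Vec2 = (0.0, -1.0)            # a point of cl C (on the boundary)
c: Vec2 = (-1.0, -1.0)           # a point of C
lam = 3.0
a: Vec2 = (c[0] - lam * b[0], c[1] - lam * b[1])   # a = c - lam*b = (-1, 2)


def perturbed(delta: float) -> Vec2:
    """b + delta*a + o(delta), with the (illustrative) o(delta) = delta**2 * (1, 1)."""
    return (b[0] + delta * a[0] + delta ** 2, b[1] + delta * a[1] + delta ** 2)


print(in_open_negative_orthant(a))  # False: a itself is not in C
deltas = [2.0 ** (-k) for k in range(2, 30)]
print(all(in_open_negative_orthant(perturbed(d)) for d in deltas))  # True
```

For $\delta = 1/2$ the point is $(-0.25, 0.25)$ and lies outside $C$. This is consistent with the lemma, which only makes a statement for $\delta \to 0^+$.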
Lemma 3.8.2. Let $X \subset \mathbb{R}^n$ be an arbitrary set and let $C$ be an open convex cone of $\mathbb{R}^m$. Let $a$ and $h$ be vector-valued functions defined on an open set containing $X$, taking values in $\mathbb{R}^m$ and $\mathbb{R}^r$ respectively. Suppose that $a$ is differentiable at $x^0 \in X$, and that $h$ is continuously differentiable in a neighbourhood of $x^0$ with Jacobian matrix $\nabla h(x^0)$ of full rank. Let $x^0 \in X$ be such that $a(x^0) \in \operatorname{cl} C$ and $h(x^0) = 0$. If the system
$$\begin{cases} a(x) \in C \\ h(x) = 0 \end{cases}$$
has no solution in $X$, then the system
$$\begin{cases} \nabla a(x^0)\, y \in C_{a(x^0)} \\ \nabla h(x^0)\, y = 0 \end{cases}$$
has no solution in $Q(X,x^0)$.

Proof. Suppose, ab absurdo, that some $y \in Q(X,x^0)$ solves the second system while the first system has no solution in $X$. By the definition of $Q(X,x^0)$, there exist a neighbourhood $N(0)$ and a sequence $\{\lambda_k\} \to 0^+$ such that $x^0 + \lambda_k y + \lambda_k\, o(1) \in X$ for every element $o(1) \in N(0)$.

Write $x = (x^1,x^2)$ and $x^0 = (x^{01},x^{02})$, partitioning the variables so that $\nabla_2 h(x^0)$ (the Jacobian with respect to $x^2$) is a nonsingular $r \times r$ matrix. Since $h(x^0) = 0$, the implicit function theorem gives a neighbourhood of $x^{01}$ on which exactly one function $x^2 = F(x^1)$ satisfies $h(x^1,F(x^1)) = 0$ and $x^{02} = F(x^{01})$. Hence
$$\nabla_1 h(x^0) + \nabla_2 h(x^0)\, \nabla F(x^{01}) = 0.$$
The assumption $\nabla h(x^0)\,y = 0$ reads $\nabla_1 h(x^0)\,y^1 + \nabla_2 h(x^0)\,y^2 = 0$ (with the obvious meaning of the notation). Comparing the two relations gives $y^2 = \nabla F(x^{01})\,y^1$. Now consider the point
$$\big(x^{01} + \lambda_k y^1,\ F(x^{01} + \lambda_k y^1)\big) = \big(x^{01} + \lambda_k y^1,\ x^{02} + \lambda_k y^2 + o_1(\lambda_k)\big) = x^0 + \lambda_k y + o_2(\lambda_k),$$
where $o_2(\lambda_k) = (0, o_1(\lambda_k))$. For $k$ large enough this point belongs to $X$ (since $y \in Q(X,x^0)$), and it obviously satisfies $h(x) = 0$. Moreover, by Lemma 3.8.1,
$$a\big(x^0 + \lambda_k y + o_2(\lambda_k)\big) = a(x^0) + \lambda_k \nabla a(x^0)\, y + o_3(\lambda_k) \in C.$$
This contradicts the assumption that the first system has no solution in $X$. □

Theorem 3.8.2. Let $x^0$ be a local solution of $(P_1)$ and let $Q(X,x^0)$ be convex. Then there exist scalars $\lambda_0 \ge 0$, $\lambda_i \ge 0$ ($i \in I(x^0)$) and $\mu_j$ ($j = 1,2,\dots,r$), with $(\lambda_0,\lambda,\mu)$ a nonzero vector, such that
$$-\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0)\Big] \in Q^*(X,x^0). \tag{6}$$
Proof. The theorem holds trivially if $Q(X,x^0)$ is empty. It also holds trivially if $\nabla h(x^0)$ does not have full rank, since then some nonzero $\mu$ satisfies $\sum_j \mu_j \nabla h_j(x^0) = 0$. So assume that $\nabla h(x^0)$ has full rank, and let $a(x) = (f(x) - f(x^0),\, g_i(x))$, $i \in I(x^0)$. As observed in the proof of Theorem 3.6.5, there is no $x \in X$ sufficiently close to $x^0$ such that $h(x) = 0$ and
$$a(x) \in \operatorname{int}\mathbb{R}_- \times \operatorname{int}\mathbb{R}_- \times \cdots \times \operatorname{int}\mathbb{R}_- = C,$$
where $C$ is an open convex cone. By Lemma 3.8.2, no $y \in Q(X,x^0)$ satisfies both $\nabla a(x^0)\,y \in C_{a(x^0)}$ and $\nabla h(x^0)\,y = 0$. By Lemma 3.6.4, there is then a nonzero vector $(q,\mu) \in C^*_{a(x^0)} \times \mathbb{R}^r$ such that
$$q\big(\nabla a(x^0)\,y\big) + \mu\big(\nabla h(x^0)\,y\big) \ge 0, \qquad \forall y \in Q(X,x^0).$$
Hence
$$-\big[q\,\nabla a(x^0) + \mu\,\nabla h(x^0)\big] \in Q^*(X,x^0).$$
Writing $q = (\lambda_0,\lambda_i)$, $i \in I(x^0)$, this becomes
$$-\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0)\Big] \in Q^*(X,x^0).$$
As to the signs of the multipliers, $C \subset C_{a(x^0)}$ implies $C^*_{a(x^0)} \subset C^*$. Since $q \in C^*$, it follows that $\lambda_0 \ge 0$ and $\lambda_i \ge 0$ for all $i \in I(x^0)$. □

These conditions may be viewed as generalized Fritz John necessary conditions for $(P_1)$. The following remarks may be useful.

1) Theorem 3.8.2 sharpens a result of Bazaraa and Goode (1972a), who use the cone of interior directions $I(X,x^0)$. Indeed $I(X,x^0) \subset Q(X,x^0)$, hence $Q^*(X,x^0) \subset I^*(X,x^0)$.
2) Theorem 3.6.5 gives a necessary Fritz John condition for (P) that is sharper than the one Theorem 3.8.2 gives for $(P_1)$. Indeed, we may have $Q \subset T_1$ and therefore $T_1^*(X,x^0) \subset Q^*(X,x^0)$. This is not surprising, since $(P_1)$ also contains the equality constraint vector $h(x) = 0$.

3) The theorem assumes only that $Q(X,x^0)$ is convex. This is weaker than assuming $X$ convex (as, e.g., in Robinson (1982)).

4) If $x^0 \in \operatorname{int}(X)$, then $Q(X,x^0) = \mathbb{R}^n$ and $Q^*(X,x^0) = \{0\}$. In this case Theorem 3.8.2 gives the classical Fritz John conditions for $(P_1)$:
$$\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0) = 0, \qquad (\lambda_0,\lambda_i) \ge 0,\ i \in I(x^0), \qquad (\lambda_0,\lambda_i,\mu_j) \ne 0.$$
Since the $g_i$, $i \notin I(x^0)$, are also differentiable at $x^0$, these conditions can equivalently be written as
$$\lambda_0 \nabla f(x^0) + \lambda \nabla g(x^0) + \mu \nabla h(x^0) = 0, \qquad \lambda g(x^0) = 0, \qquad (\lambda_0,\lambda) \ge 0, \qquad (\lambda_0,\lambda,\mu) \ne 0.$$
5) Replacing $Q(X,x^0)$ in Theorem 3.8.2 by the larger cone $T(X,x^0)$ would make the theorem sharper, since $T^*(X,x^0) \subset Q^*(X,x^0)$. However, this sharper result does not hold in general. Consider the following example, taken from Bazaraa and Goode (1972a):
$$\text{Min } f(x) \quad\text{subject to}\quad x \in X,\ h(x) = 0,$$
where
$$X = \{(x_1,x_2) \mid x_1 \text{ and } x_2 \text{ are rational}\}, \qquad f(x) = f(x_1,x_2) = x_2, \qquad h(x) = h(x_1,x_2) = x_2 - \sqrt{2}\,x_1.$$
Since $\sqrt{2}$ is irrational,
$$X \cap \{x \mid h(x) = 0\} = \{(0,0)\},$$
so the origin is the only admissible point and $x^0 = (0,0)$ solves the problem. Clearly $T(X,x^0) = \mathbb{R}^2$, hence $T^*(X,x^0) = \{(0,0)\}$. However, no nonzero vector $(\lambda_0,\mu)$ satisfies relation (6) of Theorem 3.8.2 with $T^*(X,x^0)$ in place of $Q^*(X,x^0)$: the equation $\lambda_0(0,1) + \mu(-\sqrt{2},1) = (0,0)$ forces $\lambda_0 = \mu = 0$. Note that in this example $Q(X,x^0)$ is empty, so Theorem 3.8.2 itself holds trivially.

6) Mangasarian (1969, p. 168) proves the following necessary optimality condition for $(P_1)$, called the "minimum principle" condition:
$$(x - x^0)\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0)\Big] \ge 0, \qquad \forall x \in X,$$
where $X$ is a convex set in $\mathbb{R}^n$ with nonempty interior, $(\lambda_0,\lambda_i,\mu_j) \ne 0$, $\lambda_0 \ge 0$ and $\lambda_i \ge 0$ for all $i \in I(x^0)$. Putting $Y = X - x^0$, we have
$$-\Big[\lambda_0 \nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^r \mu_j \nabla h_j(x^0)\Big] \in Y^*,$$
and Mangasarian's result may be viewed as a special case of Theorem 3.8.2. Indeed, $X$ (hence $Y$) is convex and $\operatorname{int}(X) \ne \emptyset$, and $Y \subset \operatorname{cone}(Y) \subset T(Y,0)$. It follows that
$$Q^*(Y,0) = T^*(Y,0) \subset Y^*.$$

In (6) of Theorem 3.8.2 the multiplier $\lambda_0$ may be zero. To ensure that $\lambda_0$ is positive, some constraint qualification for $(P_1)$ must be imposed. We have already seen that the Guignard-Gould-Tolle condition
$$T^*(S_1,x^0) = E^*(x^0)$$
ensures the validity of the Kuhn-Tucker conditions for $(P_1)$ (see Theorem 3.8.1).
Keeping the notation already introduced, we list some other constraint qualifications for $(P_1)$:

a) Kuhn-Tucker C.Q. It is expressed as $E(x^0) \subset A(S_1,x^0)$.

b) Arrow-Hurwicz-Uzawa second C.Q. It is expressed as $E(x^0) \subset \operatorname{conv}(A(S_1,x^0))$.

c) Abadie C.Q. It is expressed as $E(x^0) \subset T(S_1,x^0)$.

d) Mangasarian-Fromovitz C.Q. (Mangasarian and Fromovitz (1967)). It is expressed as: $x^0 \in \operatorname{int}(X)$, the vectors $\nabla h_j(x^0)$, $j = 1,\dots,r$, are linearly independent, and the system
$$\begin{cases} y\nabla g_i(x^0) < 0, & i \in I(x^0), \\ y\nabla h_j(x^0) = 0, & j = 1,\dots,r, \end{cases}$$
has a solution $y \in \mathbb{R}^n$.

e) Weak Slater C.Q. It is expressed as: $x^0 \in \operatorname{int}(X)$; the $g_i$, $i \in I(x^0)$, are pseudoconvex at $x^0$; the $h_j$, $j = 1,\dots,r$, are linear; and there exists $\bar{x} \in X$ such that $g_i(\bar{x}) < 0$ for all $i \in I(x^0)$ and $h_j(\bar{x}) = 0$ for all $j = 1,\dots,r$.

f) Weak reverse C.Q. It is expressed as: $x^0 \in \operatorname{int}(X)$; the $g_i$, $i \in I(x^0)$, are pseudoconcave at $x^0$; and the $h_j$, $j = 1,\dots,r$, are pseudolinear (i.e. both pseudoconcave and pseudoconvex) at $x^0$.

g) Independence C.Q. It is expressed as: $x^0 \in \operatorname{int}(X)$, and the vectors $\nabla g_i(x^0)$, $i \in I(x^0)$, and $\nabla h_j(x^0)$, $j = 1,\dots,r$, are linearly independent.
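Of the conditions listed above, the rank part of the independence C.Q. g) is the easiest to check mechanically. A minimal sketch, under these assumptions:

- the gradients at $x^0$ have rational entries, so exact `fractions.Fraction` arithmetic avoids any tolerance issues;
- the requirement $x^0 \in \operatorname{int}(X)$ is verified separately;
- the helper names are ours.

Since the independence C.Q. implies the Mangasarian-Fromovitz C.Q., a positive answer (together with $x^0 \in \operatorname{int}(X)$) also certifies d).

```python
from fractions import Fraction
from typing import List, Sequence


def matrix_rank(rows: Sequence[Sequence]) -> int:
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    mat: List[List[Fraction]] = [[Fraction(v) for v in row] for row in rows]
    if not mat:
        return 0
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((i for i in range(rank, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for i in range(len(mat)):
            if i != rank and mat[i][col] != 0:
                factor = mat[i][col] / mat[rank][col]
                mat[i] = [x - factor * y for x, y in zip(mat[i], mat[rank])]
        rank += 1
        if rank == len(mat):
            break
    return rank


def independence_cq(active_grads_g, grads_h) -> bool:
    """Rank condition of the independence C.Q.: the active gradients of g and
    all gradients of h at x0 are linearly independent."""
    rows = list(active_grads_g) + list(grads_h)
    return matrix_rank(rows) == len(rows)


print(independence_cq([(1, 0, 0)], [(0, 1, 1)]))   # True
print(independence_cq([(0, 1), (0, -1)], []))       # False
```

The second call uses the active gradients $(0,1)$ and $(0,-1)$ at the origin of the constraints $x_2 - x_1^3 \le 0$, $-x_2 \le 0$. At that point even the Guignard-Gould-Tolle C.Q. fails.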
[Figure 14: diagram of the implications among the constraint qualifications for $(P_1)$ when $x^0 \in \operatorname{int}(X)$. Nodes: independence C.Q., weak Slater C.Q., Mangasarian-Fromovitz C.Q., Kuhn-Tucker C.Q., weak reverse C.Q., Arrow-Hurwicz-Uzawa second C.Q., Abadie C.Q., Guignard-Gould-Tolle C.Q.]

By considerations analogous to those made for problem (P), in the case $x^0 \in \operatorname{int}(X)$ we obtain the diagram of Figure 14. It shows the relationships among the constraint qualifications considered for problem $(P_1)$ [see also Bazaraa and Shetty (1976), Theorem 6.2.3].

The Mangasarian-Fromovitz C.Q. also plays a role in sensitivity and stability results for a perturbed nonlinear programming problem with inequality and equality constraints. For example, Gauvin and Tolle (1977) and Gauvin (1977) show that this C.Q. is necessary and sufficient for the set of Kuhn-Tucker multipliers to be bounded. This is an important result in the stability analysis of perturbations of $(P_1)$. More precisely, let $x^0$ be a local minimum for $(P_1)$ and let $K(x^0)$ denote the set of Kuhn-Tucker multipliers at $x^0$, i.e. the set of vectors $(\lambda,\mu)$ such that (3)-(5) hold. Then $K(x^0)$ is nonempty and bounded if and only if the Mangasarian-Fromovitz C.Q. is satisfied at $x^0$.
Sensitivity and stability in optimization problems are also studied, e.g., in Evans and Gould (1970), Geoffrion (1971), Greenberg and Pierskalla (1972), Fiacco (1983, 1984) and Rockafellar (1993). See also Section 3.10.

3.9. Sufficient First-Order Optimality Conditions for (P) and $(P_1)$
We first present the following classical theorem of Mangasarian (1969). It gives first-order sufficient optimality criteria for problem (P) and subsumes the sufficiency results of Arrow and Enthoven (1961) and of Arrow, Hurwicz and Uzawa (1961).

Theorem 3.9.1. Let $x^0 \in S$, let $f : D \to \mathbb{R}$ be pseudoconvex at $x^0$ (with respect to $D$), and let $g_i : D \to \mathbb{R}$, $i \in I(x^0)$, be quasiconvex at $x^0$ (with respect to $D$). If there exists a vector $\lambda \in \mathbb{R}^m$ such that
$$(x - x^0)\big[\nabla f(x^0) + \lambda \nabla g(x^0)\big] \ge 0, \quad \forall x \in S; \qquad \lambda g(x^0) = 0; \qquad \lambda \ge 0,$$
then $x^0$ solves (P).

Proof. By the complementary slackness conditions $\lambda g(x^0) = 0$, we have $\lambda_i = 0$ for all $i \notin I(x^0)$. Since $g_i(x) \le 0 = g_i(x^0)$ for all $x \in S$ and all $i \in I(x^0)$, the quasiconvexity of $g_i$, $i \in I(x^0)$, gives
$$(x - x^0)\nabla g_i(x^0) \le 0, \qquad \forall i \in I(x^0),\ \forall x \in S.$$
Since $\lambda_i \ge 0$ for all $i \in I(x^0)$ and $\lambda_i = 0$ for all $i \notin I(x^0)$, we have
$$\sum_{i=1}^m \lambda_i (x - x^0)\nabla g_i(x^0) \le 0, \qquad \forall x \in S.$$
But $(x - x^0)[\nabla f(x^0) + \lambda \nabla g(x^0)] \ge 0$ for all $x \in S$, so
$$(x - x^0)\nabla f(x^0) \ge -\sum_{i=1}^m \lambda_i (x - x^0)\nabla g_i(x^0), \qquad \forall x \in S,$$
and therefore
$$(x - x^0)\nabla f(x^0) \ge 0, \qquad \forall x \in S.$$
By the pseudoconvexity of $f$ at $x^0$, this implies
$$f(x) \ge f(x^0), \qquad \forall x \in S.$$
□
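The sketch below is a small instance of Theorem 3.9.1, used with Kuhn-Tucker multipliers. The example is our own:

- $f(x) = (x_1-2)^2 + x_2^2$ is convex, hence pseudoconvex;
- $g(x) = x_1^2 + x_2^2 - 1$ is convex, hence quasiconvex;
- at $x^0 = (1,0)$ the Kuhn-Tucker conditions hold with $\lambda = 1$, so the theorem guarantees that $x^0$ solves (P).

The code verifies the Kuhn-Tucker conditions and then compares $f$ on a grid of feasible points with $f(x^0)$. The grid comparison is only a sanity test, not part of the argument.

```python
from typing import Tuple

Point = Tuple[float, float]


def f(x: Point) -> float:
    return (x[0] - 2.0) ** 2 + x[1] ** 2


def grad_f(x: Point) -> Point:
    return (2.0 * (x[0] - 2.0), 2.0 * x[1])


def g(x: Point) -> float:
    return x[0] ** 2 + x[1] ** 2 - 1.0


def grad_g(x: Point) -> Point:
    return (2.0 * x[0], 2.0 * x[1])


x0: Point = (1.0, 0.0)
lam = 1.0

# Kuhn-Tucker conditions: stationarity, complementary slackness, lam >= 0
stationarity = [a + lam * b for a, b in zip(grad_f(x0), grad_g(x0))]
print(stationarity, lam * g(x0), lam >= 0.0)  # [0.0, 0.0] 0.0 True

# Sanity check on a grid of feasible points: f(x) >= f(x0) = 1
grid = [(i / 20.0, j / 20.0) for i in range(-20, 21) for j in range(-20, 21)]
feasible = [x for x in grid if g(x) <= 0.0]
print(min(f(x) for x in feasible) >= f(x0))  # True
```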
The following Kuhn-Tucker sufficient optimality theorem follows directly from Theorem 3.9.1.

Theorem 3.9.2. Let $x^0 \in S$, let $f$ be pseudoconvex at $x^0$, and let $g_i$, $i \in I(x^0)$, be quasiconvex at $x^0$. If $(x^0,\lambda)$ satisfies the Kuhn-Tucker conditions (6)-(8) of Section 3.5, then $x^0$ solves (P).

In Theorems 3.9.1 and 3.9.2, the pseudoconvexity of $f$ and the quasiconvexity of the $g_i$, $i \in I(x^0)$, at $x^0$ are actually needed only with respect to the feasible set $S$. Another first-order sufficient optimality theorem for (P) is obtained by imposing generalized convexity conditions on the Lagrangian function
$$\psi(x,\lambda) = f(x) + \lambda g(x), \qquad \lambda \ge 0.$$
Theorem 3.9.3. Let $x^0 \in S$ and let the pair $(x^0,\lambda)$ satisfy the Kuhn-Tucker conditions (6)-(8) of Section 3.5. If $\psi(\cdot,\lambda)$ is pseudoconvex at $x^0$ with respect to $S$, then $x^0$ solves (P).

Proof. Since $\nabla_x \psi(x^0,\lambda) = 0$ and $\psi(\cdot,\lambda)$ is pseudoconvex at $x^0$, the function $\psi(\cdot,\lambda)$ has a minimum at $x^0$ on $S$. Therefore, for each $x \in S$,
$$f(x) + \lambda g(x) \ge f(x^0) + \lambda g(x^0),$$
i.e., since $\lambda g(x^0) = 0$,
$$f(x) + \lambda g(x) \ge f(x^0).$$
Since $g(x) \le 0$ for all $x \in S$ and $\lambda \ge 0$, we obtain
$$f(x) \ge f(x^0), \qquad \forall x \in S.$$
□
The assumptions of Theorems 3.9.2 and 3.9.3 are not comparable. Note also that in Theorem 3.9.1 the quasiconvexity of the $g_i$, $i \in I(x^0)$, was used only to establish that, for all $x \in S$,
$$(x - x^0)\nabla g_i(x^0) \le 0, \qquad \forall i \in I(x^0).$$
For this purpose it suffices that $S$ is convex, or even star-shaped at $x^0$. Then $(1-\alpha)x^0 + \alpha x \in S$ for all $\alpha \in [0,1]$ and all $x \in S$, whence
$$\varphi_i(\alpha) = g_i\big((1-\alpha)x^0 + \alpha x\big) \le 0, \qquad \forall \alpha \in [0,1],\ \forall x \in S.$$
Since $\varphi_i(0) = g_i(x^0) = 0$ for all $i \in I(x^0)$, it follows (with $\varphi_i'(\alpha)$ denoting $d\varphi_i(\alpha)/d\alpha$) that
$$\varphi_i'(0) = (x - x^0)\nabla g_i(x^0) \le 0, \qquad \forall i \in I(x^0),\ \forall x \in S.$$
We therefore obtain the following result (see also Arrow and Enthoven (1961), who credit this remark to Uzawa).

Theorem 3.9.4. Let $x^0 \in S$, let $f$ be pseudoconvex at $x^0$, let $S$ be convex, and let conditions (6)-(8) of Section 3.5 be satisfied. Then $x^0$ solves (P).

Hanson (1981) noted that the linear term $(x - x^0)$ in the definitions of (differentiable) pseudoconvex and quasiconvex functions plays no role in proving first-order sufficient optimality conditions for (P). This observation motivated the introduction of invex (and generalized invex) functions (see Section 2.17 of Chapter II). The following theorem is proved along the same lines as Theorem 3.9.1.

Theorem 3.9.5. Let $f$ and $g_i$, $i \in I(x^0)$, be invex at $x^0 \in S$ with respect to the same function $\eta(x,x^0) : D \times D \to \mathbb{R}^n$. If the Kuhn-Tucker conditions (6)-(8) of Section 3.5 are satisfied at $x^0$, then $x^0$ solves (P).

The assumptions of Theorem 3.9.5 can be weakened further. It suffices that $f$ is invex at $x^0 \in S$ and that the $g_i$, $i \in I(x^0)$, are quasi-invex at $x^0 \in S$, both with respect to the same function $\eta(x,x^0)$.
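Invexity is genuinely weaker than convexity. To see this, we use a fact about invex functions that is recalled here but not proved in this section. When $\nabla f(x^0) \ne 0$, the choice
$$\eta(x,x^0) = \frac{f(x) - f(x^0)}{\|\nabla f(x^0)\|^2}\,\nabla f(x^0)$$
satisfies the invexity inequality $f(x) - f(x^0) \ge \eta(x,x^0)\nabla f(x^0)$ with equality. Hence every differentiable function without stationary points is invex. The sketch below uses an example of our own: it checks this numerically for the nonconvex function $f(x) = x^3 + x$ on $\mathbb{R}$, whose derivative $3x^2 + 1$ never vanishes.

```python
from typing import Callable, List


def eta(f: Callable[[float], float], df: Callable[[float], float],
        x: float, x0: float) -> float:
    """Kernel eta(x, x0) = (f(x) - f(x0)) * f'(x0) / f'(x0)**2 (requires f'(x0) != 0)."""
    d0 = df(x0)
    if d0 == 0.0:
        raise ValueError("construction needs a nonzero derivative at x0")
    return (f(x) - f(x0)) * d0 / d0 ** 2


def f(x: float) -> float:
    return x ** 3 + x


def df(x: float) -> float:
    return 3.0 * x ** 2 + 1.0


pts: List[float] = [k / 10.0 for k in range(-30, 31)]

# Invexity inequality f(x) - f(x0) >= eta(x, x0) * f'(x0), up to rounding
ok = all(f(x) - f(x0) >= eta(f, df, x, x0) * df(x0)
         - 1e-9 * (1.0 + abs(f(x)) + abs(f(x0)))
         for x in pts for x0 in pts)
print(ok)  # True

# f is not convex: midpoint convexity fails on [-2, 0]
print(f(-1.0) > (f(-2.0) + f(0.0)) / 2.0)  # True
```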
Hanson and Mond (1987) make another interesting application of invex functions. They consider problem (P) with $X \subset \mathbb{R}^n$ open. They say that $f$ and $g$ are Type I objective and constraint functions for (P), with respect to a vector-valued function $\eta$ at $x^0$, if there exists an $n$-dimensional vector function $\eta$, defined for all $x \in S$, such that
$$f(x) - f(x^0) \ge \eta(x)\,\nabla f(x^0) \qquad\text{and}\qquad -g(x^0) \ge \eta(x)\,[\nabla g(x^0)]^T.$$
Hanson and Mond prove the following two results.

Theorem 3.9.6. For $x^0 \in S$ to be optimal for (P) it is sufficient that $f$ and $g_i$, $i \in I(x^0)$, are Type I functions with respect to a common function $\eta$ at $x^0$, and that the Kuhn-Tucker conditions (6)-(8) of Section 3.5 are satisfied at $x^0$.

Theorem 3.9.6 is simply Theorem 3.9.5 with $\eta$ depending only on $x$ and not also on $x^0$. Moreover, if $\eta$ is identically zero, Theorem 3.9.6 holds trivially: $x^0$ is then a solution of (P) regardless of the Kuhn-Tucker conditions.

Theorem 3.9.7. Let $x^0 \in S$ and let the number of active constraints at $x^0$ be $k$ (i.e. $\operatorname{card}(I(x^0)) = k$), with $k < n$. Then for $x^0$ to be a solution of (P) it is necessary that $f$ and $g$ are Type I functions with respect to a common vector $\eta$ at $x^0$, not identically zero for each $x \in S$.

Combining the last two theorems, we obtain the following. Suppose $X$ is open, $x^0 \in S$, the Kuhn-Tucker conditions (6)-(8) of Section 3.5 hold at $x^0$, and $\operatorname{card}(I(x^0)) < n$. Then $x^0$ is a solution of (P) if and only if $f$ and $g$ are Type I functions with respect to a common vector $\eta$ at $x^0$, not identically zero for all $x \in S$. Other sufficient first-order optimality conditions involving invex and generalized invex functions are given in Craven (1981), Craven and Glover
SufRcient first-order optimality conditions for (P) and (Pi)
291
(1985), Kaul and Kaur (1985), Jeyakumar (1985a), Martin (1985), Rueda and Hanson (1988), Weir and Jeyakumar (1988). See also Kaul and Kaur (1982), Mahajan and Vartak (1977), Giorgi (1984, 1995), Giorgi and Guerraggio (1994), Guignard (1969), Szlobec (1970) for other considerations on first-order sufficiency criteria of the Kuhn-Tucker type for (P). Mangasarian (1969) proved the following Fritz John-type sufficient criterion for (P): Theorem 3.9.8. Let x^ e S, f be convex at x^ and gi, i e I{x^), strictly convex at x^. If {x^,uo,u) solves the Fritz John conditions (3)-(5) of Section 3.5, then x^ solves (P). Note that the above theorem requires a rather stringent assumption on Qi, i G I{x^), but also allows for UQ = 0 (i.e. for a zero multiplier of the gradient of the objective function) in the Fritz John conditions. In fact, as shown in Theorem 3.9.14 for (Pi), the thesis of Theorem 3.9.8 also holds if / is pseudoconvex at x^ and g^, i G I{x^), are strictly pseudoconvex at or.0
X .
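Another Fritz John sufficient optimality criterion for (P), more general than Theorem 3.9.8, is given by Kaul and Kaur (1985).

Theorem 3.9.9. Let $x^0 \in S$, let $f$ be invex at $x^0$ and let $g_i$, $i \in I(x^0)$, be strictly invex at $x^0$, with respect to the same function $\eta(x, x^0) : D \times D \to \mathbb{R}^n$. If conditions (3)-(5) of Section 3.5 are satisfied, then $x^0$ solves (P).

Proof. Conditions (3)-(5) give
$$u_0 \nabla f(x^0) + \sum_{i \in I(x^0)} u_i \nabla g_i(x^0) = 0, \qquad (u_0, u_i) \geq 0, \quad (u_0, u_i) \neq 0, \quad i \in I(x^0).$$
By the Gordan theorem of the alternative, it follows that the system
$$\begin{cases} \eta(x, x^0) \nabla f(x^0) < 0 \\ \eta(x, x^0) [\nabla g_i(x^0)]^T < 0, \quad i \in I(x^0) \end{cases} \tag{1}$$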
has no solution $x \in D$. Suppose that $x^0$ is not a solution of (P). Then there exists $x^* \in S$ such that $f(x^*) < f(x^0)$ and $g_i(x^*) \leq 0 = g_i(x^0)$, $i \in I(x^0)$. From the invexity of $f$ at $x^0$ and the strict invexity of $g_i$ at $x^0$, we obtain the inequalities
$$0 > f(x^*) - f(x^0) \geq \eta(x^*, x^0) \nabla f(x^0),$$
$$0 \geq g_i(x^*) - g_i(x^0) > \eta(x^*, x^0) [\nabla g_i(x^0)]^T, \quad i \in I(x^0).$$
These inequalities show that $x^*$ is a solution of system (1), which contradicts the fact that (1) has no solution. Therefore $f(x) \geq f(x^0)$, $\forall x \in S$. □
Obviously, if $\eta(x, x^0) = x - x^0$, Theorem 3.9.9 becomes Theorem 3.9.8.

Sufficient criteria of the Kuhn-Tucker type for problem (P1) have been proved by Mangasarian (1969) in the following, rather general, formulation.

Theorem 3.9.10. Let $x^0 \in S_1$, let $f$ be pseudoconvex at $x^0$, let $g_i$, $i \in I(x^0)$, be quasiconvex at $x^0$ and let $h_j$, $j = 1, \dots, r$, be quasilinear at $x^0$ (i.e. quasiconvex and quasiconcave at $x^0$). If there exist $\lambda \in \mathbb{R}^m$, $\mu \in \mathbb{R}^r$ such that the Kuhn-Tucker conditions (3)-(5) of Section 3.8 are satisfied at $x^0$, then $x^0$ solves (P1).

Proof. This sufficient optimality criterion follows directly from Theorem 3.9.2. It suffices to observe that the equality constraint $h(x) = 0$ can be written as $h(x) \leq 0$ and $-h(x) \leq 0$, and that the negative of a quasiconcave function is quasiconvex. □

Singh (1977) gives the following sufficient optimality criterion for (P1). The author considers it a generalization of the previous result of Mangasarian, although the result is in fact almost trivial.

Theorem 3.9.11. Let $x^0 \in S_1$, let $f$ be pseudoconvex at $x^0$, let $g$ and $h$ be quasiconvex at $x^0$, and suppose that there exist $\lambda \in \mathbb{R}^m$, $\mu \in \mathbb{R}^r$ such that
$$\nabla f(x^0) + \lambda \nabla g(x^0) + \mu \nabla h(x^0) = 0, \qquad \lambda g(x^0) = 0, \qquad \lambda \geq 0, \qquad \mu \geq 0.$$
Then $x^0$ solves (P1).

Other first-order sufficient optimality theorems for (P1) are given by Bector and Gulati (1977), Bhatt and Misra (1975) and Mahajan and Vartak (1977). All these results are trivial consequences of results already established for (P). Indeed, consider the following problems:
$$\text{(P)} \qquad \min f(x), \quad x \in S = \{x \mid x \in X,\ g(x) \leq 0\};$$
$$\text{(P2)} \qquad \min f(x), \quad x \in S_2 = \{x \mid x \in X,\ g(x) \leq 0,\ h(x) \leq 0\}.$$
Clearly $S_1 \subset S_2 \subset S$. Now suppose that $C(x^0)$ and $C_2(x^0)$ are sufficient conditions for a point $x^0$ to be a solution of (P) and (P2), respectively. Then it follows trivially that
$$C(x^0), \ h(x^0) = 0 \qquad \text{or} \qquad C_2(x^0), \ h(x^0) = 0$$
are sufficient conditions for $x^0$ to be a solution of (P1). This is because a point of $S_1$ that minimizes $f$ over the larger set $S$ (or $S_2$) minimizes $f$ over $S_1$ a fortiori. These conditions are precisely the ones given in Theorem 3.9.11 and in the other papers cited above. Obviously, Theorems 3.9.10 and 3.9.11 can be reformulated under suitable assumptions of invexity and generalized invexity of the functions involved.

We now obtain, for problem (P1), a sufficient optimality condition that requires no (generalized) convexity assumption on $h(x)$. Let us consider the linearizing cone
$$C(x^0) = \{z \in \mathbb{R}^n \mid z \nabla g_i(x^0) \leq 0, \ \forall i \in I(x^0)\};$$
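the cone of gradients for the constraints $g_i$:
$$B(x^0) = \Big\{z \in \mathbb{R}^n \;\Big|\; z = \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0),\ \lambda_i \geq 0\Big\};$$
and the cone of gradients for the constraints $h_j$:
$$L(x^0) = \Big\{z \in \mathbb{R}^n \;\Big|\; z = \sum_{j=1}^{r} \mu_j \nabla h_j(x^0),\ \mu_j \in \mathbb{R}\Big\}.$$
In Section 3.6 we noted that $B(x^0) = C^*(x^0)$. We then have the following result.

Theorem 3.9.12. Let $x^0 \in S_1$, let $f$ be pseudoconvex at $x^0$, let $g_i$, $i \in I(x^0)$, be quasiconvex at $x^0$ and let $L(x^0) \subset C^*(x^0)$. Suppose there exist $\vartheta \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^r$ such that
$$\nabla f(x^0) + \vartheta \nabla g(x^0) + \mu \nabla h(x^0) = 0, \tag{2}$$
$$\vartheta g(x^0) = 0, \qquad \vartheta \geq 0.$$
Then $x^0$ solves (P1).

Proof. As $L(x^0) \subset C^*(x^0) = B(x^0)$, we have
$$\mu \nabla h(x^0) = \sum_{i \in I(x^0)} \alpha_i \nabla g_i(x^0), \qquad \alpha_i \geq 0, \ i \in I(x^0).$$
Let us now define $q \in \mathbb{R}^m$ as
$$q_i = \begin{cases} \alpha_i, & i \in I(x^0), \\ 0, & i \notin I(x^0). \end{cases}$$
From (2) we obtain
$$\nabla f(x^0) + \vartheta \nabla g(x^0) + q \nabla g(x^0) = 0.$$
Letting $w = \vartheta + q$, we obtain
$$\nabla f(x^0) + w \nabla g(x^0) = 0, \qquad w g(x^0) = \vartheta g(x^0) + q g(x^0) = 0,$$
$$g(x^0) \leq 0, \qquad h(x^0) = 0, \qquad w = \vartheta + q \geq 0.$$
From these relations, using the same arguments as in the proof of Theorem 3.9.2, we obtain
$$f(x^0) \leq f(x), \quad \forall x \in S_1. \qquad □$$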
Another quite general sufficient first-order optimality condition (of the Kuhn-Tucker type) for problem (P1) is obtained in Giorgi and Guerraggio (1994) and Giorgi (1995). We first need the following definition.

Definition 3.9.1. The set $D \subset \mathbb{R}^n$ is said to be invex at $x^0 \in D$, with respect to a function $\eta : D \times D \to \mathbb{R}^n$ with $\eta(x, x^0)$ different from the zero vector, when $\eta(x, x^0) \in P(D, x^0)$, $\forall x \in D$. Here $P(D, x^0)$ is the pseudotangent cone to $D$ at $x^0$.

This definition generalizes the concept of pseudoconvex set due to Guignard (1969). It extends both the convexity of a set and the convexity of a set at a point $x^0$ ("star-shaped set"). Indeed, if a set $S \subset \mathbb{R}^n$ is convex, or merely star-shaped at $x^0 \in S$, then it is also invex at $x^0$ with $\eta(x, x^0) = x - x^0$, i.e. it is pseudoconvex at $x^0$. To see this, recall that we proved (Theorem 3.4.10) that, if $S$ is convex, then $P(S, x^0) = T(S, x^0) = \overline{\operatorname{cone}}(S - x^0)$. These equalities also hold if $S$ is star-shaped at $x^0$ (see Bazaraa and Shetty (1976)). It is then obvious that $x - x^0 \in P(S, x^0)$: it suffices to write $x - x^0 = \lim_{k \to +\infty} \lambda_k (x^k - x^0)$ with $\lambda_k = 1$ and $x^k = x$, $k = 1, 2, \dots$.

An example of an invex set is $A = \{x \in \mathbb{R} \mid x = 0 \text{ or } x = 1/n,\ n \in \mathbb{N}_+\}$. This set is invex at $x^0 = 0$ for $\eta(x, x^0) = x - x^0$ and also, e.g., for $\eta(x, x^0) = (x - x^0)^2$.

Theorem 3.9.13. Let $x^0 \in S_1$, let $S_1$ be invex at $x^0$ and let $f$ be pseudo-invex at $x^0$, with respect to the same function $\eta$. If there exist scalars $\lambda_i \geq 0$,
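$i \in I(x^0)$, and $\mu_j \in \mathbb{R}$, $j = 1, \dots, r$, such that
$$\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^{r} \mu_j \nabla h_j(x^0) = 0,$$
then $x^0$ solves (P1).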
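Proof. Let $\varphi = (g_i, h_j)$, $i \in I(x^0)$, $j = 1, \dots, r$, and let $B = \varphi(S_1)$ be the image space of the constraints of (P1). From the invexity of $S_1$ at $x^0$, there exists a vector-valued function $\eta : S_1 \times S_1 \to \mathbb{R}^n$ such that
$$\eta(x, x^0) \in P(S_1, x^0), \quad \forall x \in S_1.$$
Given an arbitrary $y \in T(S_1, x^0)$, a well-known property of Bouligand tangent cones gives $\nabla \varphi(x^0) \cdot y \in T(B, \varphi(x^0))$. Since $\eta(x, x^0) \in P(S_1, x^0)$, we have $\eta(x, x^0) = \lim_{k \to +\infty} y^k$ with $y^k \in \operatorname{conv}(T(S_1, x^0))$, i.e.
$$\eta(x, x^0) = \lim_{k \to +\infty} \big(t_k y^{1k} + (1 - t_k) y^{2k}\big),$$
with $y^{1k}, y^{2k} \in T(S_1, x^0)$ and $0 \leq t_k \leq 1$. It then results that
$$\nabla \varphi(x^0) \cdot \eta(x, x^0) = \lim_{k \to +\infty} \big[t_k \nabla \varphi(x^0) \cdot y^{1k} + (1 - t_k) \nabla \varphi(x^0) \cdot y^{2k}\big] \ […]$$

[…] $f$ is said to be strictly pseudoconvex at $x^0$ if, for any $x \neq x^0$,
$$(x - x^0) \nabla f(x^0) \geq 0 \implies f(x) - f(x^0) > 0,$$
or, equivalently,
$$f(x) - f(x^0) \leq 0 \implies (x - x^0) \nabla f(x^0) < 0.$$
(See Chapter II, Section 2.10.)

Theorem 3.9.14. Let $x^0 \in S_1$, let $f$ be pseudoconvex at $x^0$, and let $g_i$, $i \in I(x^0)$, and $h_j$, $j = 1, \dots, r$, be strictly pseudoconvex at $x^0$. If there exist $\lambda_0 \in \mathbb{R}$, $\lambda \in \mathbb{R}^m$, $\mu \in \mathbb{R}^r$ such that
$$\lambda_0 \nabla f(x^0) + \lambda \nabla g(x^0) + \mu \nabla h(x^0) = 0, \tag{3}$$
$$\lambda g(x^0) = 0, \tag{4}$$
$$(\lambda_0, \lambda, \mu) \geq 0, \quad (\lambda_0, \lambda, \mu) \neq 0, \tag{5}$$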
then $x^0$ solves (P1).

Proof. Condition (3) can be written as follows ($\lambda_I$ and $g_I$ have an obvious meaning):
$$\lambda_0 \nabla f(x^0) + \lambda_I \nabla g_I(x^0) + \mu \nabla h(x^0) = 0.$$
Therefore, appealing to the Gordan theorem of the alternative (result 11 of Section 2.4), we can affirm that there does not exist any $z \in \mathbb{R}^n$ such that
$$\nabla f(x^0) z < 0, \ […]$$
[…], for an index $i \in I(x^0) \setminus I^+(x^0)$. By means of the Motzkin transposition theorem applied to the last equality, we obtain that the S.M.F.C.Q. does not hold. Therefore $\lambda_i - \bar{\lambda}_i = 0$, $\forall i \in I(x^0) \setminus I^+(x^0)$.
c) Finally, because of the S.M.F.C.Q., the vectors $\nabla g_i(x^0)$, $i \in I^+(x^0)$, and $\nabla h_j(x^0)$, $j = 1, \dots, r$, are linearly independent. Therefore $\lambda_i - \bar{\lambda}_i = 0$, $\forall i \in I^+(x^0)$, and $\mu_j - \bar{\mu}_j = 0$, $\forall j = 1, \dots, r$.

Conversely, assume that the S.M.F.C.Q. does not hold because ii) does not hold. Then, thanks to the Motzkin theorem of the alternative, it is possible to find scalars $s_i$ and $t_j$ such that
$$\sum_{i \in I(x^0) \setminus I^+(x^0)} s_i \nabla g_i(x^0) + \sum_{i \in I^+(x^0)} s_i \nabla g_i(x^0) + \sum_{j=1}^{r} t_j \nabla h_j(x^0) = 0.$$
Here $s_i \geq 0$, not all zero, for $i \in I(x^0) \setminus I^+(x^0)$, and the scalars can be chosen so that $\max_{i \in I^+(x^0)} |s_i| < \min_{i \in I^+(x^0)} \lambda_i$. As the Kuhn-Tucker conditions hold at $x^0$, we get
$$\nabla f(x^0) + \sum_{i \in I(x^0) \setminus I^+(x^0)} s_i \nabla g_i(x^0) + \sum_{i \in I^+(x^0)} (s_i + \lambda_i) \nabla g_i(x^0) + \sum_{j=1}^{r} (t_j + \mu_j) \nabla h_j(x^0) = 0.$$
The new multipliers are nonnegative, since $|s_i| < \lambda_i$ for $i \in I^+(x^0)$. We would thus obtain two different sets of multipliers. The same conclusion holds if the S.M.F.C.Q. does not hold because i) does not hold. □

We should remark that the S.M.F.C.Q. cannot properly be considered a constraint qualification, since the set $I^+(x^0)$ is not known before the Kuhn-Tucker conditions have been verified. Another remark concerns the S.O.C.Q. of McCormick: it requires the arc $\alpha(\vartheta)$ to be contained in the manifold
$$\{x \mid g_i(x) = 0,\ i \in I(x^0);\ h_j(x) = 0,\ j = 1, \dots, r\},$$
i.e. $g_i[\alpha(\vartheta)] = 0$, $i \in I(x^0)$, and $h_j[\alpha(\vartheta)] = 0$, $j = 1, \dots, r$.
However, Theorem 3.10.2 can be obtained under a weaker S.O.C.Q.:

(S.O.C.Q. II) For each $z \neq 0$, $z \in Z(x^0)$, there exists a twice differentiable feasible arc $\alpha(\vartheta) : [0, \bar{\vartheta}] \to \mathbb{R}^n$ such that
$$\alpha(0) = x^0, \qquad \alpha'(0) = z$$
and […]

The proof of Theorem 3.10.2 under S.O.C.Q. II is left to the reader. For second-order optimality conditions in topological vector spaces, the reader is referred to Ben-Tal and Zowe (1982) and to Maurer and Zowe (1979). See also Hettich and Jongen (1977) for another S.O.C.Q.

Let us now consider sufficient second-order conditions for optimality in problem (P1). Such conditions were essentially derived by Pennisi (1953), Hestenes (1966, 1975) and McCormick (1967); see also Pallu de la Barriere (1963) and the unpublished Master's thesis of Karush (1939). We first recall the definition of the cone $Z(x^0)$, introduced previously:
$$Z(x^0) = \{z \in \mathbb{R}^n \mid z \nabla g_i(x^0) = 0,\ i \in I^+(x^0);\ z \nabla g_i(x^0) \leq 0,\ i \in I(x^0) \setminus I^+(x^0);\ z \nabla h_j(x^0) = 0,\ j = 1, \dots, r\}.$$
We have the following theorem, whose proof is taken from Fiacco and McCormick (1968).

Theorem 3.10.5. Let $x^0 \in S_1$. Suppose there exist vectors $\lambda$, $\mu$ such that the Kuhn-Tucker conditions (3)-(5) of Section 3.8 are satisfied at $x^0$, and suppose that, for every $z \in Z(x^0)$, $z \neq 0$,
$$z \Big[Hf(x^0) + \sum_{i=1}^{m} \lambda_i Hg_i(x^0) + \sum_{j=1}^{r} \mu_j Hh_j(x^0)\Big] z > 0. \tag{6}$$
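Then $x^0$ is a strict local minimum for problem (P1).

Proof. Assume that the Kuhn-Tucker conditions (3)-(5) of Section 3.8 hold and that $x^0$ is not a strict local minimum. Then there exists a sequence $\{z^k\}$ of feasible points, $z^k \neq x^0$, converging to $x^0$ and such that, for each $k$,
$$f(x^0) \geq f(z^k). \tag{7}$$
Let $z^k = x^0 + \vartheta_k y^k$, where $\vartheta_k > 0$ for each $k$ and $\|y^k\| = 1$. Without loss of generality, assume that the sequence $\{\vartheta_k, y^k\}$ converges to $(0, y)$, where $\|y\| = 1$. Since the points $z^k$ are feasible,
$$g_i(z^k) - g_i(x^0) = \vartheta_k y^k \nabla g_i(x^0 + \eta_{i,k} \vartheta_k y^k) \leq 0, \quad i \in I(x^0), \tag{8}$$
$$h_j(z^k) - h_j(x^0) = \vartheta_k y^k \nabla h_j(x^0 + \bar{\eta}_{j,k} \vartheta_k y^k) = 0, \quad j = 1, \dots, r, \tag{9}$$
and from (7)
$$f(z^k) - f(x^0) = \vartheta_k y^k \nabla f(x^0 + \eta_k \vartheta_k y^k) \leq 0, \tag{10}$$
where $\eta_k$, $\eta_{i,k}$, $\bar{\eta}_{j,k}$ are numbers between 0 and 1. Dividing (8), (9) and (10) by $\vartheta_k$ and taking limits, we get
$$y \nabla g_i(x^0) \leq 0, \quad i \in I(x^0), \tag{11}$$
$$y \nabla h_j(x^0) = 0, \quad j = 1, \dots, r, \tag{12}$$
$$y \nabla f(x^0) \leq 0. \tag{13}$$
Suppose that (11) holds with strict inequality for some $i \in I^+(x^0)$. Then, combining (3)-(5) of Section 3.8, (11) and (12), we obtain
$$0 \geq y \nabla f(x^0) = -\sum_{i=1}^{m} \lambda_i y \nabla g_i(x^0) - \sum_{j=1}^{r} \mu_j y \nabla h_j(x^0) > 0, \tag{14}$$
which is a contradiction. Therefore $y \nabla g_i(x^0) = 0$, $\forall i \in I^+(x^0)$ (or $I^+(x^0)$ is empty), and so $y \in Z(x^0)$. From Taylor's expansion formula we obtain
$$g_i(z^k) = g_i(x^0) + \vartheta_k y^k \nabla g_i(x^0) + \tfrac{1}{2} (\vartheta_k)^2 \, y^k [Hg_i(x^0 + \xi_{i,k} \vartheta_k y^k)] y^k \leq 0, \quad i = 1, \dots, m, \tag{15}$$
$$h_j(z^k) = h_j(x^0) + \vartheta_k y^k \nabla h_j(x^0) + \tfrac{1}{2} (\vartheta_k)^2 \, y^k [Hh_j(x^0 + \bar{\xi}_{j,k} \vartheta_k y^k)] y^k = 0, \quad j = 1, \dots, r, \tag{16}$$
and
$$f(z^k) - f(x^0) = \vartheta_k y^k \nabla f(x^0) + \tfrac{1}{2} (\vartheta_k)^2 \, y^k [Hf(x^0 + \xi_k \vartheta_k y^k)] y^k \leq 0, \tag{17}$$
where $\xi_k$, $\xi_{i,k}$, $\bar{\xi}_{j,k}$ are again numbers between 0 and 1. Multiply (15) and (16) by the corresponding $\lambda_i$ and $\mu_j$ and add the results to (17), recalling that $\lambda_i g_i(x^0) = 0$ and $h_j(x^0) = 0$. This yields
$$\vartheta_k y^k \Big[\nabla f(x^0) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^{r} \mu_j \nabla h_j(x^0)\Big] + \tfrac{1}{2} (\vartheta_k)^2 \, y^k \Big[Hf(x^0 + \xi_k \vartheta_k y^k) + \sum_{i=1}^{m} \lambda_i Hg_i(x^0 + \xi_{i,k} \vartheta_k y^k) + \sum_{j=1}^{r} \mu_j Hh_j(x^0 + \bar{\xi}_{j,k} \vartheta_k y^k)\Big] y^k \leq 0.$$
The expression in the first brackets vanishes by the Kuhn-Tucker conditions. Dividing the remaining portion by $\tfrac{1}{2} (\vartheta_k)^2$ and taking limits, we obtain
$$y \Big[Hf(x^0) + \sum_{i=1}^{m} \lambda_i Hg_i(x^0) + \sum_{j=1}^{r} \mu_j Hh_j(x^0)\Big] y \leq 0.$$
Since $y$ is nonzero and belongs to $Z(x^0)$, this inequality contradicts (6). □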
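To see condition (6) at work, take the hypothetical problem $\min f(x) = -x_1 x_2$ subject to $h(x) = x_1 + x_2 - 2 = 0$ (not from the text). At $x^0 = (1, 1)$ the Kuhn-Tucker conditions hold with $\mu = 1$. Since there are no inequality constraints, $Z(x^0)$ is the line $z_1 + z_2 = 0$. The sketch assumes $n = 2$ and a single equality constraint, so that $Z(x^0)$ is spanned by one vector; in general one would need a basis of the null space of the active gradients. The function names are illustrative.

```python
def hess_lagrangian(x, mu):
    # Hf = [[0, -1], [-1, 0]] everywhere; h is affine, so mu * Hh = 0.
    return [[0.0, -1.0], [-1.0, 0.0]]

def quad_form(H, z):
    return sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))

def second_order_value(x, mu, grad_h):
    # For one nonzero gradient (a1, a2) in R^2, the vector (-a2, a1)
    # spans {z : z . grad_h = 0}, so condition (6) reduces to one number.
    z = (-grad_h[1], grad_h[0])
    return quad_form(hess_lagrangian(x, mu), z)

x0, mu0 = (1.0, 1.0), 1.0
# Stationarity: grad f + mu * grad h = (-x2, -x1) + mu * (1, 1) = 0 at x0.
assert (-x0[1] + mu0, -x0[0] + mu0) == (0.0, 0.0)
print(second_order_value(x0, mu0, (1.0, 1.0)))  # 2.0 > 0, so (6) holds

# Along the constraint, x = (1 + t, 1 - t) gives f = t**2 - 1 > f(x0) = -1.
print(all(-(1 + t) * (1 - t) > -1 for t in (-0.1, -0.01, 0.01, 0.1)))  # True
```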
Note that Theorem 3.10.5 contains the classical sufficient conditions for a point $x^0$ to be a strict local unconstrained minimum of a twice continuously differentiable function $f : X \to \mathbb{R}$. It also contains the conditions for $x^0$ to be a strict local minimum for problem (Pe). In other words, Theorems 3.2.5 and 3.3.2 are special cases of Theorem 3.10.5.

Hettich and Jongen (1977) also include the equation $z \nabla f(x^0) = 0$ in the definition of $Z(x^0)$. Although this is valid, it does not further restrict $Z(x^0)$. Indeed, when $x^0$ is a point satisfying the Kuhn-Tucker conditions (3)-(5) of Section 3.8 and $z \in Z(x^0)$, we automatically have
$$z \nabla f(x^0) = -\sum_{i=1}^{m} \lambda_i z \nabla g_i(x^0) - \sum_{j=1}^{r} \mu_j z \nabla h_j(x^0) = 0,$$
because $\lambda_i = 0$ for $i \notin I^+(x^0)$, while $z \nabla g_i(x^0) = 0$ for $i \in I^+(x^0)$ and $z \nabla h_j(x^0) = 0$ for all $j$.

Fiacco (1968) extends Theorem 3.10.5 to sufficient conditions for a not necessarily strict minimum for problem (P1).
Let us again consider the set $Z(x^0)$ and let $z \in Z(x^0)$, $z \neq 0$. Define the Lagrangian function for (P1):
$$\mathcal{L}(x, \lambda, \mu) = f(x) + \lambda g(x) + \mu h(x),$$
and, for $\varepsilon > 0$ and $\delta > 0$, define the set
$$Y(\varepsilon, \delta) = \{y \mid \|y - z\| \leq \varepsilon \text{ for some } z \in Z(x^0),\ x^0 + \delta_y y \text{ is feasible for some } \delta_y,\ 0 < \delta_y < \delta,\ \text{and } \|y\| = 1\}.$$

Theorem 3.10.6 (Neighbourhood sufficiency theorem). Let $x^0 \in S_1$. Suppose there exist vectors $\lambda$, $\mu$ satisfying the Kuhn-Tucker conditions (3)-(5) of Section 3.8, and suppose there exist $\varepsilon' > 0$, $\delta' > 0$ such that, for every $y \in Y(\varepsilon', \delta')$,
$$y \big[H_x \mathcal{L}(x^0 + t \delta_y y, \lambda, \mu)\big] y \geq 0, \quad \forall t \in (0, 1). \tag{18}$$
Then $x^0$ is a local minimum for (P1).

Proof. See Fiacco (1968) and Fiacco and McCormick (1968). □
If in (18) we have
$$y \big[H_x \mathcal{L}(x^0 + t \delta_y y, \lambda, \mu)\big] y > 0$$
for all $t \in (0, 1)$, then $x^0$ is a strict local minimum for problem (P1). Fiacco (1968) proves that Theorem 3.10.5 of McCormick can be obtained as a corollary of Theorem 3.10.6. Hestenes (1975) and Robinson (1982) prove that the sufficient optimality conditions of Theorem 3.10.5 actually provide a sharper result; Robinson considers a more general problem than (P1). Indeed, the following theorem holds.

Theorem 3.10.7. Let $x^0 \in S_1$. Suppose there exist vectors $\lambda$, $\mu$ such that the Kuhn-Tucker conditions (3)-(5) of Section 3.8 are satisfied at $x^0$, and suppose that, for every $z \in Z(x^0)$, $z \neq 0$,
$$z \Big[Hf(x^0) + \sum_{i=1}^{m} \lambda_i Hg_i(x^0) + \sum_{j=1}^{r} \mu_j Hh_j(x^0)\Big] z > 0.$$
Then there are a neighbourhood $N(x^0)$ and a constant $m > 0$ such that
$$f(x) \geq f(x^0) + m \|x - x^0\|^2, \quad \forall x \in S_1 \cap N(x^0).$$

Proof. See Hestenes (1975). □
Robinson (1982) asks whether the strict local minimizer obtained by means of Theorems 3.10.5 or 3.10.7 is also an isolated local minimum, that is, whether there is some neighbourhood of $x^0$ containing no other local minimizer for problem (P1). Robinson considers the following example:
$$\text{minimize } \tfrac{1}{2} x^2 \quad \text{subject to } x^6 \sin(1/x) = 0 \qquad [\sin(1/0) := 0].$$
The feasible region is $\{0\} \cup \{(n\pi)^{-1},\ n = \pm 1, \pm 2, \dots\}$. The second-order sufficient conditions of Theorem 3.10.5 are satisfied at the origin. However, the origin is a cluster point of the feasible region, and every feasible point is a local minimizer.

Admittedly this is a "bad" problem, but the anomaly can be excluded by means of some "regularity" conditions on the constraints. More precisely, Robinson shows the following. Suppose that, at the feasible point $x^0$, the Kuhn-Tucker conditions (3)-(5) of Section 3.8 hold for (P1) and the Mangasarian-Fromovitz constraint qualification is satisfied. Suppose, moreover, that the following General Second-Order Sufficient Conditions hold at $x^0$:

G.S.O.S.C.: relation (6) of Theorem 3.10.5 holds for every $z \in Z(x^0)$, $z \neq 0$, and for every $\lambda$, $\mu$ such that $(x^0, \lambda, \mu)$ satisfies the Kuhn-Tucker conditions.

Then $x^0$ is actually an isolated local minimum point for (P1).

Note that if the Independence C.Q. or the Strict Mangasarian-Fromovitz C.Q. holds (each of which implies the Mangasarian-Fromovitz C.Q.), then the multipliers $\lambda$, $\mu$ in the Kuhn-Tucker conditions are unique. In that case the General Second-Order Sufficient Condition is automatically verified, and Theorem 3.10.5 ensures that $x^0$ is an isolated local minimum point for (P1).
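The feasible set of Robinson's example can be explored numerically. The sketch below assumes the constraint $h(x) = x^6 \sin(1/x)$ with the convention $\sin(1/0) := 0$. It evaluates $h$ at the first few points $1/(n\pi)$, which are feasible up to floating-point rounding and decrease towards the origin. It also bounds the difference quotient $h(t)/t$, whose limit is $h'(0) = 0$. Since $\nabla h(0) = 0$, the Mangasarian-Fromovitz constraint qualification fails at the origin, which is consistent with Robinson's regularity result.

```python
import math

def h(x):
    """Robinson's constraint function, with the convention sin(1/0) := 0."""
    return 0.0 if x == 0 else x ** 6 * math.sin(1.0 / x)

# Nonzero feasible points x_n = 1/(n*pi); only n = 1, ..., 7 are listed here.
points = [1.0 / (n * math.pi) for n in range(1, 8)]

# h vanishes at these points up to floating-point rounding ...
print(max(abs(h(x)) for x in points) < 1e-12)          # True
# ... and the points decrease towards the origin, a cluster point of S1.
print(all(a > b for a, b in zip(points, points[1:])))  # True

# Difference quotient for h'(0): |h(t)/t| = |t**5 sin(1/t)| <= t**5 -> 0.
t = 1e-3
print(abs(h(t) / t) <= t ** 5)                         # True
```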
Another area of application of the second-order sufficient optimality conditions is sensitivity analysis in nonlinear programming. General results, based on Theorem 3.10.5, are given by Fiacco and McCormick (1968) and Fiacco (1983). We present only the following result.

Theorem 3.10.8. Let $f$, $g_i$, $i = 1, \dots, m$, and $h_j$, $j = 1, \dots, r$, be twice continuously differentiable on the open set $X \subset \mathbb{R}^n$, and consider the problem P(b, c):
$$\min f(x) \quad \text{subject to} \quad g_i(x) \leq b_i,\ i = 1, \dots, m; \qquad h_j(x) = c_j,\ j = 1, \dots, r.$$
Let $x^0$ be a local solution of this problem for $b_i = 0$, $i = 1, \dots, m$, and $c_j = 0$, $j = 1, \dots, r$. Let $x^0$ satisfy the following conditions:

i) The gradients $\nabla g_i(x^0)$, $i \in I(x^0)$, and $\nabla h_j(x^0)$, $j = 1, \dots, r$, are linearly independent.

ii) The second-order sufficient conditions of Theorem 3.10.5 are satisfied at $x^0$.

iii) In the Kuhn-Tucker conditions, $\lambda_i > 0$, $\forall i \in I(x^0)$ (i.e. strict complementary slackness holds).

Then there is a continuously differentiable vector-valued function $x^0(b, c)$, defined on a neighbourhood of $(0, 0)$ in $\mathbb{R}^m \times \mathbb{R}^r$, such that $x^0(0, 0) = x^0$. Moreover, for every $(b, c)$ in a neighbourhood of $(0, 0)$, $x^0(b, c)$ is a strict local solution of problem P(b, c), and
$$\nabla_b f(x^0(b, c))\big|_{(0,0)} = -\lambda, \qquad \nabla_c f(x^0(b, c))\big|_{(0,0)} = -\mu.$$

Proof. See Fiacco (1983), McCormick (1983) and Luenberger (1984). □
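The multiplier formula of Theorem 3.10.8 can be checked numerically on a hypothetical instance (not from the text). Let P(b) be $\min x_1^2 + x_2^2$ subject to $g(x) = 2 - x_1 - x_2 \leq b$, with no equality constraints. At $b = 0$ the solution is $x^0 = (1, 1)$ with $\lambda = 2$. The gradient of $g$ is nonzero, the Hessian of the Lagrangian is $2I$ and $\lambda > 0$, so hypotheses i)-iii) hold. The sketch uses the closed-form solution (the projection of the origin onto the half-plane), which is valid for $b < 2$. It then compares a central difference quotient of the optimal value with $-\lambda$.

```python
def optimal_value(b):
    """Optimal value of P(b): min x1**2 + x2**2  s.t.  2 - x1 - x2 <= b.

    For b < 2 the unconstrained minimizer (0, 0) is infeasible, so the
    constraint is active and the solution is the projection of the origin
    onto the half-plane x1 + x2 >= 2 - b, i.e. x1 = x2 = (2 - b) / 2.
    """
    if b >= 2:
        raise ValueError("the closed form below assumes b < 2")
    s = (2 - b) / 2
    return 2 * s * s

lam = 2.0   # grad f(1, 1) + lam * grad g = (2, 2) + 2 * (-1, -1) = (0, 0)
eps = 1e-6
derivative = (optimal_value(eps) - optimal_value(-eps)) / (2 * eps)
print(round(derivative, 6))  # -2.0, i.e. -lam
```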
The reader may note that the above theorem extends the results on the interpretation of Lagrange multipliers given in Theorem 3.3.5 to problems with both inequality and equality constraints.

3.11. Linearization Properties of a Nonlinear Programming Problem

In this section we study some interesting characterizations of the solutions of a smooth nonlinear programming problem in terms of various linearized forms of the same problem. For simplicity we shall consider problem (P), and suppose that $X \subset \mathbb{R}^n$ is open and that all functions are differentiable:
$$\text{(P)} \qquad \min f(x), \quad x \in S = \{x \mid x \in X,\ g_i(x) \leq 0,\ i = 1, \dots, m\}.$$
Consider then the following problems, where $x^0 \in S$:
$$\text{(L1)} \qquad \min \{(x - x^0) \nabla f(x^0) \mid x \in X,\ g_i(x) \leq 0,\ i = 1, \dots, m\};$$
$$\text{(L2)} \qquad \min \{(x - x^0) \nabla f(x^0) \mid x \in X,\ (x - x^0) \nabla g_i(x^0) \leq 0,\ i \in I(x^0)\};$$
$$\text{(L3)} \qquad \min \{f(x) \mid x \in X,\ (x - x^0) \nabla g_i(x^0) \leq 0,\ i \in I(x^0)\}.$$
(L1), (L2) and (L3) represent various "degrees of linearization" of problem (P). In this section we establish necessary and sufficient conditions under which a solution $x^0$ of one of the four problems above is also a solution of the remaining three. This points out a kind of invariance property of the solution of (P) with respect to the three linearizations considered (see also Kortanek and Evans (1967)). If the fact that $x^0$ solves, e.g., (P) implies that it also solves, e.g., (L1), we shall write for brevity (P) $\Rightarrow$ (L1).
Theorem 3.11.1. Let $x^0 \in S$.

i) If $x^0$ is a solution of (P), the constraints of (P) satisfy a C.Q. at $x^0$ and every $g_i$, $i \in I(x^0)$, is quasiconvex at $x^0$, then $x^0$ is also a solution of (L1). If $x^0$ is a solution of (P) and the constraints of (P) satisfy a C.Q. at $x^0$, then $x^0$ is also a solution of (L2). If $x^0$ is a solution of (P), the constraints of (P) satisfy a C.Q. at $x^0$ and $f$ is pseudoconvex at $x^0$, then $x^0$ is also a solution of (L3).

ii) If $x^0$ is a solution of (L1) and $f$ is pseudoconvex at $x^0$, then $x^0$ is also a solution of (P). If $x^0$ is a solution of (L2), $f$ is pseudoconvex at $x^0$ and every $g_i$, $i \in I(x^0)$, is quasiconvex at $x^0$, then $x^0$ is also a solution of (P). If $x^0$ is a solution of (L3) and every $g_i$, $i \in I(x^0)$, is quasiconvex at $x^0$, then $x^0$ is also a solution of (P).

Proof. We prove the following implications.

1) (P) $\Rightarrow$ (L1). Note first that $S$ is the feasible set of both (P) and (L1). Suppose that $x^0$ is optimal for (P). Thanks to the quasiconvexity of $g_i$, $i \in I(x^0)$, we have the following implication, for each $x \in S$:
$$g_i(x) \leq g_i(x^0) = 0,\ i \in I(x^0) \implies (x - x^0) \nabla g_i(x^0) \leq 0. \tag{1}$$
Since a constraint qualification is satisfied at $x^0$, the Kuhn-Tucker conditions hold, i.e. $\nabla f(x^0) = -\sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0)$ with $\lambda_i \geq 0$. Taking (1) into account, we have
$$(x - x^0) \nabla f(x^0) = -\sum_{i \in I(x^0)} \lambda_i (x - x^0) \nabla g_i(x^0) \geq 0, \quad \forall x \in S,$$
i.e. $x^0$ is also optimal for (L1).

2) (P) $\Rightarrow$ (L2). Denote by $D_2$ the feasible set of (L2). At $x^0$, a solution of (P), the constraints $g_i$, $i = 1, \dots, m$, satisfy a constraint qualification, so the Kuhn-Tucker conditions hold and we can write
$$(x - x^0) \nabla f(x^0) = -\sum_{i \in I(x^0)} \lambda_i (x - x^0) \nabla g_i(x^0) \geq 0, \quad \forall x \in D_2. \tag{2}$$
Hence $x^0$, being in $D_2$, is also an optimal solution of (L2).

3) (L1) $\Rightarrow$ (P). Suppose that $x^0$ is optimal for (L1), i.e.
$$(x - x^0) \nabla f(x^0) \geq 0, \quad \forall x \in S. \tag{3}$$
Since $f$ is pseudoconvex at $x^0$, from (3) we can draw the inequality $f(x) \geq f(x^0)$, $\forall x \in S$. Hence $x^0$ is also optimal for (P).

4) (L3) $\Rightarrow$ (P). The objective functions of the two problems are equal; suppose then that $x^0$ is optimal for (L3). Because of the quasiconvexity of $g_i$, $i \in I(x^0)$, we have
$$S \subset \{x \mid x \in X,\ g_i(x) \leq g_i(x^0) = 0,\ \forall i \in I(x^0)\} \subset \{x \mid x \in X,\ (x - x^0) \nabla g_i(x^0) \leq 0,\ \forall i \in I(x^0)\}.$$
So the assertion is trivial, since $x^0 \in S$ minimizes $f$ over a set containing $S$.

5) Under the pseudoconvexity of $f$ we have (L2) $\Rightarrow$ (L3). The proof parallels the one given for the implication (L1) $\Rightarrow$ (P).

6) For completeness, we also note that, under the quasiconvexity of $g_i$, $i \in I(x^0)$, we have (L2) $\Rightarrow$ (L1). The proof parallels the one given for the implication (L3) $\Rightarrow$ (P). □
The previous theorem shows the following. Given the validity of a C.Q. at $x^0 \in S$, the pseudoconvexity of $f$ and the quasiconvexity of every $g_i$, $i \in I(x^0)$, the four problems (P), (L1), (L2) and (L3) are equivalent, in the sense that $x^0$ solves any one of them if and only if it solves all four. In particular, (L2) is a linear program, both in the objective function and in the constraints. Under the above equivalence assumptions, it may therefore be viewed as a "linear test" for optimality in the nonlinear problem (P). The implications proved in Theorem 3.11.1 are depicted in the following diagram (Figure 15).

[Figure 15: diagram of the implications among (P), (L1), (L2) and (L3) proved in Theorem 3.11.1.]

Corollary 3.11.1. Let $f$ be pseudoconvex at $x^0 \in S$ and every $g_i$, $i \in I(x^0)$, quasiconvex at $x^0$. If a constraint qualification is satisfied at $x^0$, then a necessary and sufficient condition for $x^0$ to be a solution of (P) is that the system of inequalities
$$\begin{cases} y \nabla g_i(x^0) \leq 0, & i \in I(x^0), \\ y \nabla f(x^0) < 0 \end{cases} \tag{4}$$
admits no solution $y \in \mathbb{R}^n$.

Proof. Under our assumptions, thanks to Theorem 3.11.1, $x^0$ is a solution of (P) if and only if it is a solution of (L2), i.e.
$$(x - x^0) \nabla g_i(x^0) \leq 0,\ i \in I(x^0) \implies x \nabla f(x^0) \geq x^0 \nabla f(x^0). \tag{5}$$
Write $y = x - x^0$; then (5) becomes
$$y \nabla g_i(x^0) \leq 0,\ i \in I(x^0) \implies y \nabla f(x^0) \geq 0.$$
Since $X$ is open and (4) is positively homogeneous in $y$, it is no restriction to take $y$ small enough that $x^0 + y \in X$. The sufficiency part follows directly from the Farkas-Minkowski theorem of the alternative: the inconsistency of (4) yields multipliers $\lambda_i \geq 0$, $i \in I(x^0)$, such that
$$\nabla f(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) = 0.$$
Since $f$ is pseudoconvex at $x^0$ and every $g_i$, $i \in I(x^0)$, is quasiconvex at $x^0$, this last relation qualifies $x^0$ as a solution of problem (P). □

Historically, every constraint qualification for problem (P) or (P1) involves only the constraints of these problems. The literature, especially for the nondifferentiable convex case, also contains conditions that involve the constraints together with the objective function and ensure the validity of Kuhn-Tucker-type relations. It is more appropriate to call these latter conditions "regularity conditions". They are often investigated with reference to duality or saddle point theorems for a convex program (see Section 3.14 and Chapter V; see also Cambini (1986), Geoffrion (1971), Giannessi (1984), Martein (1985) and Rockafellar (1970, 1974)). A necessary and sufficient regularity condition for the differentiable problem (P) is easily obtained from the previous results. In fact, we have the following.

Corollary 3.11.2. The point $x^0 \in S$ is a solution of (L2) if and only if there exists a vector $\lambda$ such that the pair $(x^0, \lambda)$ fulfills the Kuhn-Tucker conditions for (P):
$$\nabla f(x^0) + \lambda \nabla g(x^0) = 0, \qquad \lambda g(x^0) = 0, \qquad \lambda \geq 0.$$
Corollary 3.11.2 has also been obtained by Martein (1985) in a wider context of regularity conditions.
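Corollaries 3.11.1 and 3.11.2 turn optimality into a linear question. Under the stated assumptions, $x^0$ solves (P) exactly when $\nabla f(x^0) + \sum_{i} \lambda_i \nabla g_i(x^0) = 0$ has a solution $\lambda \geq 0$. The Python sketch below assumes the simplest nontrivial situation: $n = 2$ and exactly two active constraints with linearly independent gradients. In that case the multipliers are unique and can be obtained by Cramer's rule. The example data are hypothetical: the constraints are $g_1 = x_1 - 1 \leq 0$ and $g_2 = x_2 - 1 \leq 0$ at $x^0 = (1, 1)$, with objective either $(x_1 - 2)^2 + (x_2 - 2)^2$ or $(x_1 - 2)^2 + x_2^2$.

```python
def kt_multipliers_2d(grad_f, grad_g1, grad_g2):
    """Solve grad_f + l1*grad_g1 + l2*grad_g2 = 0 for (l1, l2) by Cramer's rule.

    Assumes n = 2 and exactly two active constraints whose gradients are
    linearly independent, so that the multipliers are unique.
    """
    a, c = grad_g1          # first column of the 2x2 system
    b, d = grad_g2          # second column
    det = a * d - b * c
    if det == 0:
        raise ValueError("active gradients must be linearly independent")
    r1, r2 = -grad_f[0], -grad_f[1]
    return (r1 * d - b * r2) / det, (a * r2 - c * r1) / det

def passes_linear_test(grad_f, grad_g1, grad_g2):
    """True when the (unique) multipliers are nonnegative, i.e. (4) has no solution."""
    return all(l >= 0 for l in kt_multipliers_2d(grad_f, grad_g1, grad_g2))

G1, G2 = (1, 0), (0, 1)   # gradients of x1 - 1 and x2 - 1
# f = (x1-2)^2 + (x2-2)^2 at (1, 1): grad f = (-2, -2), multipliers (2, 2)
print(kt_multipliers_2d((-2, -2), G1, G2), passes_linear_test((-2, -2), G1, G2))
# f = (x1-2)^2 + x2^2 at (1, 1): grad f = (-2, 2); y = (0, -1) then solves (4)
print(kt_multipliers_2d((-2, 2), G1, G2), passes_linear_test((-2, 2), G1, G2))
```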
3.12. Some Specific Cases

In this section we briefly consider some particular, or, more generally, some quite important cases of problems (P) or (P1).

a) In many mathematical programming problems arising from economic-financial models, the independent variables are required to be nonnegative:
$$\min \{f(x) \mid x \in X \subset \mathbb{R}^n,\ g(x) \leq 0,\ x \geq 0\}. \tag{1}$$
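Let $x^0 \in \operatorname{int}(X)$ be a local solution of problem (1), and let some C.Q. be satisfied at $x^0$. It is easy to see that the Kuhn-Tucker conditions for (1) are expressed by
$$\nabla f(x^0) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^0) \geq 0, \qquad x^0 \Big[\nabla f(x^0) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^0)\Big] = 0,$$
$$\lambda g(x^0) = 0, \qquad \lambda \geq 0.$$
We also write the feasibility conditions for $x^0$, i.e.
$$x^0 \geq 0, \qquad g(x^0) \leq 0,$$
and set $\psi(x, \lambda) = f(x) + \lambda g(x)$. All the previous conditions may then be equivalently written in a "symmetric" manner:
$$x^0 \geq 0, \qquad \nabla_x \psi(x^0, \lambda) \geq 0, \qquad x^0 \nabla_x \psi(x^0, \lambda) = 0,$$
$$\lambda \geq 0, \qquad \nabla_\lambda \psi(x^0, \lambda) \leq 0, \qquad \lambda \nabla_\lambda \psi(x^0, \lambda) = 0.$$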
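The "symmetric" conditions are easy to check mechanically. The sketch below uses a hypothetical example with a single constraint: $\min (x_1 - 1)^2 + (x_2 + 1)^2$ subject to $g(x) = x_1 + x_2 - 3 \leq 0$ and $x \geq 0$. Its solution is $x^0 = (1, 0)$ with $\lambda = 0$: the constraint $g$ is inactive, while the sign constraint on $x_2$ is binding. The function names and the tolerance handling are illustrative choices.

```python
def grad_f(x):
    return [2 * (x[0] - 1), 2 * (x[1] + 1)]

def g(x):
    return x[0] + x[1] - 3

GRAD_G = [1.0, 1.0]

def symmetric_kt(x, lam, tol=1e-9):
    """Check the symmetric Kuhn-Tucker conditions for
    min f(x) s.t. g(x) <= 0, x >= 0, with psi(x, lam) = f(x) + lam * g(x)."""
    grad_x_psi = [gf + lam * gg for gf, gg in zip(grad_f(x), GRAD_G)]
    grad_lam_psi = g(x)
    return (all(xi >= -tol for xi in x)                                  # x >= 0
            and all(v >= -tol for v in grad_x_psi)                       # grad_x psi >= 0
            and abs(sum(xi * v for xi, v in zip(x, grad_x_psi))) <= tol  # x grad_x psi = 0
            and lam >= -tol                                              # lam >= 0
            and grad_lam_psi <= tol                                      # grad_lam psi <= 0
            and abs(lam * grad_lam_psi) <= tol)                          # lam grad_lam psi = 0

print(symmetric_kt([1.0, 0.0], 0.0))  # True
print(symmetric_kt([1.0, 1.0], 0.0))  # False: x * grad_x psi = 4 != 0
```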
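b) A more general problem than (1) has been considered by Mangasarian (1969). Let $X^0 \subset \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$, $X^0$ open, and consider the problem
$$\min \{f(x, y) \mid (x, y) \in X^0,\ g(x, y) \leq 0,\ h(x, y) = 0,\ y \geq 0\}. \tag{2}$$
If $(x^0, y^0)$ is a local solution of (2) and the constraints of (2) satisfy some C.Q. at $(x^0, y^0)$, it is easy to see that the Kuhn-Tucker conditions for this problem are expressed by
$$\nabla_x \mathcal{L}(x^0, y^0, \lambda, \mu) = 0,$$
$$\nabla_y \mathcal{L}(x^0, y^0, \lambda, \mu) \geq 0, \qquad y^0 \nabla_y \mathcal{L}(x^0, y^0, \lambda, \mu) = 0,$$
$$\nabla_\lambda \mathcal{L}(x^0, y^0, \lambda, \mu) \leq 0, \qquad \lambda \nabla_\lambda \mathcal{L}(x^0, y^0, \lambda, \mu) = 0, \qquad \lambda \geq 0,$$
$$\nabla_\mu \mathcal{L}(x^0, y^0, \lambda, \mu) = 0, \qquad \mu \in \mathbb{R}^r,$$
where $\mathcal{L}(x, y, \lambda, \mu) = f(x, y) + \lambda g(x, y) + \mu h(x, y)$. These conditions are also sufficient if, at $(x^0, y^0)$, the following hold: $f(x, y)$ is pseudoconvex; $g_i(x, y)$, $i \in \{i \mid g_i(x^0, y^0) = 0\}$, are quasiconvex; and $h_j(x, y)$, $j = 1, \dots, r$, are quasilinear (i.e. both quasiconvex and quasiconcave).

c) Let $X \subset \mathbb{R}^n$ be open, $f : X \to \mathbb{R}$ and $S = X \cap \mathbb{R}^n_+$. A necessary condition for $x^0 \in S$ to be optimal for $f$ over $S$ is: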
$$\nabla f(x^0) \geq 0, \qquad x^0 \nabla f(x^0) = 0.$$
These conditions are also sufficient for $x^0 \in S$ to be optimal for $f$ over $S$ if $f$ is pseudoconvex at $x^0$ (with respect to $S$).

d) A very important case of (P) and (P1) is when all the constraints are linear. Let $\Omega \subset \mathbb{R}^{n_1 + n_2 + n_3}$ be the set of solutions of the system
$$A_{11} x^1 + A_{12} x^2 + A_{13} x^3 \geq b^1,$$
$$A_{21} x^1 + A_{22} x^2 + A_{23} x^3 = b^2,$$
$$A_{31} x^1 + A_{32} x^2 + A_{33} x^3 \leq b^3,$$
$$x^1 \geq 0, \qquad x^2 \in \mathbb{R}^{n_2}, \qquad x^3 \leq 0,$$
with $x^i \in \mathbb{R}^{n_i}$, $b^i \in \mathbb{R}^{m_i}$ and $A_{ij}$ ($i, j = 1, 2, 3$) a real $m_i \times n_j$ matrix. Then a necessary condition for $\bar{x} = (x^1, x^2, x^3) \in \Omega$ to be a local minimum of the differentiable function $f(x^1, x^2, x^3) : \mathbb{R}^{n_1 + n_2 + n_3} \to \mathbb{R}$ is that there exist multipliers $\lambda^i \in \mathbb{R}^{m_i}$ ($i = 1, 2, 3$) such that
$$V_1 = \nabla_{x^1} f(\bar{x}) - (\lambda^1 A_{11} + \lambda^2 A_{21} + \lambda^3 A_{31}) \geq 0, \qquad x^1 V_1 = 0;$$
$$\nabla_{x^2} f(\bar{x}) - (\lambda^1 A_{12} + \lambda^2 A_{22} + \lambda^3 A_{32}) = 0;$$
$$V_3 = \nabla_{x^3} f(\bar{x}) - (\lambda^1 A_{13} + \lambda^2 A_{23} + \lambda^3 A_{33}) \leq 0, \qquad x^3 V_3 = 0;$$
$$\lambda^1 \geq 0, \qquad \lambda^1 (A_{11} x^1 + A_{12} x^2 + A_{13} x^3 - b^1) = 0;$$
$$\lambda^3 \leq 0, \qquad \lambda^3 (A_{31} x^1 + A_{32} x^2 + A_{33} x^3 - b^3) = 0.$$
If $f$ is pseudoconvex at $\bar{x}$, the above conditions are also sufficient for a (global) minimum of $f$ on $\Omega$.

e) A special case of d) arises when the objective function $f$ is linear; we then have the linear programming problem. The literature on this important class of mathematical programming problems is enormous. We quote only the book of Bazaraa, Jarvis and Sherali (1990) and, for a historical account, the paper of Dantzig (1983). Consider the following standard formulation:
$$\min_x \{cx \mid Ax \geq b,\ x \geq 0\},$$
where $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$ and $A$ is a real matrix of order $m \times n$. By means of the Kuhn-Tucker conditions it is immediate to prove the following: a feasible $\bar{x}$ solves the linear program above if and only if there exists a vector $\lambda \in \mathbb{R}^m$ such that
$$\lambda A \leq c, \qquad \lambda \geq 0, \qquad \lambda (A\bar{x} - b) = 0, \qquad (\lambda A - c) \bar{x} = 0.$$
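The four conditions above amount to dual feasibility plus complementary slackness, and they can be verified directly for a candidate pair $(\bar{x}, \lambda)$. The sketch uses hypothetical data: $c = (1, 2)$, $A = (1\ 1)$ and $b = 1$, i.e. $\min x_1 + 2x_2$ subject to $x_1 + x_2 \geq 1$, $x \geq 0$. The solution is $\bar{x} = (1, 0)$ with $\lambda = 1$. Plain Python lists stand in for vectors and matrices, and the function name is our own.

```python
def lp_optimality(A, b, c, x, lam, tol=1e-9):
    """Check, for min {c x | A x >= b, x >= 0}, that x is feasible and that
    lam A <= c, lam >= 0, lam (A x - b) = 0 and (lam A - c) x = 0."""
    m, n = len(A), len(c)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    lamA = [sum(lam[i] * A[i][j] for i in range(m)) for j in range(n)]
    return (all(Ax[i] >= b[i] - tol for i in range(m))               # A x >= b
            and all(xj >= -tol for xj in x)                          # x >= 0
            and all(lamA[j] <= c[j] + tol for j in range(n))         # lam A <= c
            and all(li >= -tol for li in lam)                        # lam >= 0
            and abs(sum(lam[i] * (Ax[i] - b[i]) for i in range(m))) <= tol
            and abs(sum((lamA[j] - c[j]) * x[j] for j in range(n))) <= tol)

A, b, c = [[1, 1]], [1], [1, 2]
print(lp_optimality(A, b, c, [1, 0], [1]))  # True: x = (1, 0) is optimal
print(lp_optimality(A, b, c, [0, 1], [2]))  # False: lam A = (2, 2) violates lam A <= c
```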
f) Another important specific case of mathematical programming problems is the quadratic programming problem. From an analytical and computational point of view it is rather close to linear programming. It reads
$$\min_x \{cx + \tfrac{1}{2} xCx \mid Ax \geq b,\ x \geq 0\},$$
where $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, $A$ is a matrix of order $m \times n$ and $C$ is a symmetric matrix of order $n$. A necessary condition for a feasible $\bar{x} \in \mathbb{R}^n$ to be optimal for this problem is that there exists a vector $\lambda \in \mathbb{R}^m$ such that
$$c + C\bar{x} - \lambda A \geq 0, \qquad \lambda \geq 0, \qquad \bar{x}(c + C\bar{x} - \lambda A) = 0, \qquad \lambda(-b + A\bar{x}) = 0.$$
If $C$ is positive semidefinite, then the objective function is convex (strictly convex if $C$ is positive definite). In this case the above conditions are also sufficient for the optimality of the feasible point $\bar{x}$.
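The same kind of check applies to the quadratic program. The sketch verifies the Kuhn-Tucker conditions above for hypothetical data: $c = (1, 1)$, $C = I$ (positive definite, so the conditions are also sufficient), $A = (1\ 1)$ and $b = 2$. This is the problem $\min x_1 + x_2 + \tfrac{1}{2}(x_1^2 + x_2^2)$ subject to $x_1 + x_2 \geq 2$, $x \geq 0$. The candidate $\bar{x} = (1, 1)$ with $\lambda = 2$ satisfies the conditions; the helper name is our own.

```python
def qp_kt(A, b, c, C, x, lam, tol=1e-9):
    """Kuhn-Tucker check for min {c x + 1/2 x C x | A x >= b, x >= 0}."""
    m, n = len(A), len(c)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    # r = c + C x - lam A, the gradient of the Lagrangian with respect to x
    r = [c[j] + sum(C[j][k] * x[k] for k in range(n))
         - sum(lam[i] * A[i][j] for i in range(m)) for j in range(n)]
    return (all(Ax[i] >= b[i] - tol for i in range(m))   # A x >= b
            and all(xj >= -tol for xj in x)              # x >= 0
            and all(rj >= -tol for rj in r)              # c + C x - lam A >= 0
            and all(li >= -tol for li in lam)            # lam >= 0
            and abs(sum(xj * rj for xj, rj in zip(x, r))) <= tol
            and abs(sum(lam[i] * (Ax[i] - b[i]) for i in range(m))) <= tol)

A, b, c, C = [[1, 1]], [2], [1, 1], [[1, 0], [0, 1]]
print(qp_kt(A, b, c, C, [1, 1], [2]))  # True
print(qp_kt(A, b, c, C, [2, 0], [3]))  # False: second component of r is -2
```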
A classical book on quadratic programming is Boot (1964). See also Avriel (1976), Bazaraa, Sherali and Shetty (1993) and the references quoted therein. g) Another problem that has recently received much attention is the fractional (or hyperbolic) programming problem (see, e.g.. Craven (1988), Schaible (1978), Singh and Dass (1989), Schaible (1981), Cambini, Castagnoli, Martein, Mazzoleni and Schaible (1990)). A programming problem, e.g. of the form
Min {/(x) \x e X, gi{x) ^ 0 , z = 1,..., m} is called a fractional programming problem, when the objective function / is given by
V\X)
We have seen, in Section 2.15 of Chapter II, many criteria for establishing the generalized convexity (concavity) of a ratio between two functions. These criteria may be applied for obtaining further information on the optimality conditions for a fractional programming problem. E.g. if
326
Smooth optimization
i.e. we have a linear fractional objective function, f{x)
problems
will be pseudo-
linear (i.e. both pseudoconvex and pseudoconcave). h) Another widely studied class of programming problems is geometric programming (see, e.g. Duffin, Peterson and Zener (1967) and Avriel, Rijckaert and Wilde (1973). First define a polynomial 5 as a real function consisting of a finite sum of terms
i
j=l
where Ci > 0 and a^j G JR are given constants and Xj > 0, V^' = l , . . . , n . (Note that, in general, polynomials are neither convex nor concave; however, they have some properties of generalized convexity.) A nonlinear program given by
Min {go{x) \ gk{x) ^ 1 , A; = 1,..., m, a: > 0} , where ^0, ^ 1 , •••, ^m ^^^ polynomials, is called a geometric programming problem. These problems have a variety of applications, especially in the engineering sciences. i)
The following problem, called discrete min-max problem, Maximum component minimum problem, goes back to the classical paper of Kuhn and Tucker (1951) and has been subsequently studied by many authors. See, e.g. Bram (1966), Dem'yanov and Malozemov (1974), Danskin (1966, 1967), Kapur (1973), Angrisani (1982) and Pallu de la Barriere (1963) who calls it the "Kantorovich problem":
Min
max
xeSi
I ^ i ^ q
Si = {x\xeX,
fi{x) , g{x) ^ 0, h{x) = 0} ,
(PK)
Some specific cases
327
where X C iR'^ is an open set. 5 : JR^ ~> ]R^, h'.ET' -^ 1R\ If all the functions involved in this problem are twice continuously differentiable at the point x^ £ Si, it is possible to prove the following necessary and sufficient optimality conditions for {PK) (see, e.g., Pallu de la Barriere (1963)). Let
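$$f(x) = \max_{1 \le i \le q} f_i(x), \qquad I(x^0) = \{i \mid g_i(x^0) = 0\}, \qquad H(x^0) = \{i \mid f(x^0) = f_i(x^0)\}.$$
a) Necessary conditions of optimality. If $x^0$ is a point of local minimum for $f$ on $S_1$ and if the vectors $\nabla g_i(x^0)$, $i \in I(x^0)$, and $\nabla h_j(x^0)$, $j = 1,\dots,r$, are linearly independent, then there exist multipliers
$$t_i \ge 0,\ i \in H(x^0), \text{ with } \sum_{i \in H(x^0)} t_i = 1; \qquad \lambda_i \ge 0,\ i \in I(x^0); \qquad v_j \in \mathbb{R},\ j = 1,\dots,r,$$
such that
$$\sum_{i \in H(x^0)} t_i \nabla f_i(x^0) + \sum_{i \in I(x^0)} \lambda_i \nabla g_i(x^0) + \sum_{j=1}^{r} v_j \nabla h_j(x^0) = 0. \qquad (1)$$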
b) Sufficient conditions for local optimality. Let $\mathcal{L}(x,t,\lambda,v) = \sum_{i \in H(x^0)} t_i f_i(x) + \lambda g(x) + v h(x)$. If there exist multipliers $t_i$ ($i \in H(x^0)$), $\lambda_i$ ($i \in I(x^0)$), $v_j$ ($j = 1,\dots,r$) satisfying the properties sub a), if relation (1) is satisfied at $x^0 \in S_1$ and if
$$y\, \nabla^2_x \mathcal{L}(x^0,t,\lambda,v)\, y > 0$$
for every $y \ne 0$ of the subspace defined by
$$y \nabla f_i(x^0) = 0,\ \forall i \in H(x^0) \text{ with } t_i > 0; \qquad y \nabla g_i(x^0) = 0,\ \forall i \in I(x^0) \text{ with } \lambda_i > 0; \qquad y \nabla h_j(x^0) = 0,\ \forall j = 1,\dots,r,$$
then $x^0$ is a point of strict local minimum of $f$ on $S_1$.

c) An extension of problem (P) (or $(P_1)$) may be obtained by requiring that the constraints belong to some convex cone (not necessarily the nonpositive orthant). This extension is studied, e.g., by Craven (1978), who considers the problem
$$\min\{f(x) \mid x \in X,\ g(x) \in V\}, \qquad (2)$$
where $V$ is a closed convex cone of $\mathbb{R}^m$. Assuming a suitable constraint qualification, the Kuhn-Tucker conditions necessary for a minimum of (2) at the feasible point $x^0 \in \operatorname{int}(X)$ are: there exists a vector of multipliers $\lambda \in V^*$ (the polar of $V$) such that
$$\nabla f(x^0) + \lambda \nabla g(x^0) = 0, \qquad \lambda g(x^0) = 0.$$
See also Bazaraa and Goode (1972), Glover (1983), Guignard (1969), Massam and Zlobec (1974, 1978), Varaiya (1967), Nagahisa and Sakawa (1969).
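As a small sketch of the necessary condition a) for the discrete min-max problem $(P_K)$, take the hypothetical unconstrained instance $f_1(x) = (x-1)^2$, $f_2(x) = (x+1)^2$ on $X = \mathbb{R}$ (an assumption made here for illustration, not an example from the text). At $x^0 = 0$ both components are active, so $H(x^0) = \{1,2\}$, and condition (1) reduces to $t_1 f_1'(0) + t_2 f_2'(0) = 0$ with $t_1 + t_2 = 1$. The code solves this $2 \times 2$ linear system and confirms on a grid that $x^0 = 0$ minimizes $\max\{f_1, f_2\}$.

```python
import numpy as np

# Hypothetical instance of (P_K): q = 2 components, X = R, no constraints.
def f1(x):
    return (x - 1.0) ** 2

def f2(x):
    return (x + 1.0) ** 2

def df1(x):
    return 2.0 * (x - 1.0)

def df2(x):
    return 2.0 * (x + 1.0)

x0 = 0.0
# Both components are active at x0: f1(0) = f2(0) = 1 = max{f1(0), f2(0)}.
# Condition (1) with H(x0) = {1, 2}: t1*df1(x0) + t2*df2(x0) = 0, t1 + t2 = 1.
A = np.array([[df1(x0), df2(x0)],
              [1.0, 1.0]])
b = np.array([0.0, 1.0])
t = np.linalg.solve(A, b)

# Sample the max-function on a grid to confirm that x0 minimizes it.
xs = np.linspace(-2.0, 2.0, 4001)
F = np.maximum(f1(xs), f2(xs))
x_best = xs[np.argmin(F)]

print("multipliers t =", t)           # both multipliers equal 1/2
print("sampled minimizer =", x_best)  # close to 0
```

The multipliers come out nonnegative and sum to one, as required in a); with constraints present, the columns for $\nabla g_i$ and $\nabla h_j$ would simply be appended to the same linear system.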
3.13. Extensions to Topological Spaces

(These hints require some basic notions of topology.) Since the early years of the development of the studies concerning mathematical programming problems, many articles have considered such problems defined on topological spaces, especially Banach spaces, and with various differentiability assumptions. See, e.g., Bazaraa and Goode (1973), Bender (1978), Borwein (1978), Das (1975), Girsanov (1972), Holmes (1972), Hurwicz (1958), Massam and Zlobec (1974, 1978), Nagahisa and Sakawa (1969), Neustadt (1976), Ritter (1969, 1970), Zaffaroni (1993), Ben-Tal and Zowe (1982), Maurer and Zowe (1979), Russel (1966), Luenberger (1969). An extension to Banach spaces of the approach given in the previous section for problems (P) and $(P_1)$ is presented in Guignard (1969) and Gould and Tolle (1972, 1975), under a Frechet differentiability assumption on the functions involved. We note, however, that if the problems are not defined in a normed linear space, even the assumption of Frechet differentiability may not be possible; in this case one must make use of more general notions of differentiability (see Chapter IV, in addition to the above cited papers). Here we briefly describe the results of Gould and Tolle (1972, 1975). Let $X$ and $Y$ be Banach spaces and let $g: X \to Y$ be (Frechet) differentiable; suppose $A_X \subset X$, $A_Y \subset Y$ and define the constraint set
$$S = A_X \cap g^{-1}(A_Y),$$
i.e.
$$S = \{x \mid x \in A_X,\ g(x) \in A_Y\}.$$
Suppose $f: X \to \mathbb{R}$, $f$ differentiable; then the optimization problem of interest is
$$\min_{x \in S} f(x).$$
The set of all objective functions $f$ which have a local constrained minimum at $x^0 \in S$ will be denoted by $F_0$, and the set of all derivatives at $x^0$ of elements in $F_0$ will be denoted by $DF_0$. Denote the topological duals of $X$, $Y$ by $X^*$, $Y^*$, respectively, and for any set $N^* \subset X^*$ let $\overline{N^*}$ denote the closure of $N^*$ in the weak* topology. For $B$ a nonempty subset of $X$, the (negative) polar cone of $B$, $B^-$, is the subset of $X^*$ given by
$$B^- = \{x^* \in X^* \mid x^*(b) \le 0,\ \forall b \in B\}.$$
The following properties of polar cones, already given for Euclidean spaces, are also true in Banach spaces:
i) if $B_1 \subset B_2$, then $B_2^- \subset B_1^-$;
ii) $B^- = (\operatorname{conv}(B))^-$;
iii) $B^-$ is a closed convex cone;
iv) $B \subset (B^-)^-$, with equality if and only if $B$ is a closed convex cone.
The Bouligand tangent cone to a subset $B$ of the Banach space $X$ at $x^0 \in B$ is defined as in Section 3.4, but here we also define the weak tangent cone as follows.

Definition 3.13.1. The weak tangent cone to $B$ at $x^0 \in B$ is the set
$$T_w(B,x^0) = \{x \in X \mid \exists \{\lambda_n\} \subset \mathbb{R},\ \lambda_n \ge 0\ \forall n,\ \exists \{x^n\} \subset B:\ x^n \to x^0,\ \lambda_n(x^n - x^0) \to x \text{ weakly, i.e. } \lambda_n x^*(x^n - x^0) \to x^*(x),\ \forall x^* \in X^*\}.$$
The weak pseudotangent cone to $B$ at $x^0 \in B$, denoted by $P_w(B,x^0)$, is the closure of the convex hull of $T_w(B,x^0)$.

Definition 3.13.2. Let $x^0 \in S$; the pseudolinearizing cone at $x^0$, $K(x^0)$, and the weak pseudolinearizing cone at $x^0$, $K_w(x^0)$, are defined by
$$K(x^0) = \{x \in X \mid Dg(x^0)(x) \in P(A_Y, g(x^0))\},$$
where $Dg(x^0)$ denotes the derivative of $g$ at $x^0$, and
$$K_w(x^0) = \{x \in X \mid Dg(x^0)(x) \in P_w(A_Y, g(x^0))\}.$$
By using the properties of the tangent cones, it can easily be verified that $K(x^0)$ and $K_w(x^0)$ are closed convex cones in $X$ and that $K(x^0) \subset K_w(x^0)$, with equality holding if $Y$ is finite dimensional or if $A_Y$ is convex.
Definition 3.13.3. The cone of gradients at $x^0 \in S$, $B^*(x^0)$, and the weak cone of gradients at $x^0 \in S$, $B^*_w(x^0)$, are defined by
$$B^*(x^0) = \{x^* \in X^* \mid x^* = y^* \cdot Dg(x^0), \text{ for some } y^* \in T^-(A_Y, g(x^0))\},$$
$$B^*_w(x^0) = \{x^* \in X^* \mid x^* = y^* \cdot Dg(x^0), \text{ for some } y^* \in T_w^-(A_Y, g(x^0))\}.$$
Then $B^*_w(x^0) \subset B^*(x^0)$, with equality holding if $A_Y$ is convex or $Y$ is finite dimensional.
Varaiya (1967), in the case where $A_Y$ is convex, and Guignard (1969), more generally, have shown that the following relations hold:
$$\overline{B^*(x^0)} = K^-(x^0) \subset T^-(S,x^0), \qquad -DF_0 \subset T^-(S,x^0).$$
Thus, if the constraint qualification
$$T^-(S,x^0) \subset K^-(x^0) \qquad (1)$$
holds, it follows that the optimality condition
$$-DF_0 \subset \overline{B^*(x^0)} \qquad (2)$$
is true. It should be noted that, in the case when $B^*(x^0)$ is closed and (1) is satisfied, it follows from (2) that, for any $f \in F_0$, there is a $y^* \in T^-(A_Y, g(x^0))$ such that $-Df(x^0) = y^* \cdot Dg(x^0)$. This is a direct extension of the Kuhn-Tucker conditions previously given for Euclidean spaces. Gould and Tolle (1972, 1975) prove the following, more general results.

Theorem 3.13.1. The following relations hold:
$$\ldots \qquad -DF_0 \subset T^-(S,x^0).$$

Corollary 3.13.1. The optimality condition
$$-DF_0 \subset \overline{B^*_w(x^0)} \qquad (3)$$
holds if the weak constraint qualification
$$T^-(S,x^0) \subset K_w^-(x^0) \qquad (4)$$
is satisfied.

Corollary 3.13.2. If $A_Y$ is convex and the constraint qualification $T^-(S,x^0) \subset K^-(x^0)$ is satisfied, then the optimality condition (2) holds.

It is natural to ask whether the constraint qualification (4) is the weakest one which ensures the validity of the optimality condition (3). The following theorem answers this question in the affirmative, under an additional assumption.

Theorem 3.13.2. If $X$ is reflexive, then
$$-DF_0 = T^-(S,x^0).$$
Therefore it follows from the previous relations that, for $X$ reflexive, the weak constraint qualification (4) and the optimality condition (3) are equivalent. For further considerations we refer the reader to the cited papers of Gould and Tolle (1972, 1975).
3.14. Optimality Criteria of the Saddle Point Type

In the last few years the words "nonsmooth optimization" have generally referred to nonlinear programming problems (or also to problems of calculus of variations or optimal control) where the functions involved are not differentiable (in the sense of Frechet), but rather satisfy weaker assumptions concerning various kinds of limits of various kinds of difference quotients, in order to obtain generalized gradients or generalized directional derivatives. After the classical work on Convex Analysis of R.T. Rockafellar, the theory of subgradients of convex functions is by now widely known, together with its numerous applications in mathematical programming. Another important step in this direction was achieved by F.H. Clarke, who extended the theory of subgradients to nonconvex (locally Lipschitz) functions. Other contributions towards an axiomatic theory of generalized directional derivatives are more recent and rely on topological and algebraic properties of various local conical approximations of sets (Section 3.4). All these approaches will be treated in the next chapter; in the present section we are concerned with the saddle point characterization of optimality conditions. Indeed, this was the first approach used for treating a nonlinear (convex) programming problem in the absence of differentiability of the functions involved. We first take into consideration the problem
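$$\min_{x \in S} f(x), \qquad S = \{x \mid x \in X,\ g(x) \le 0\}, \qquad (P)$$
where $X \subset \mathbb{R}^n$, $f: X \to \mathbb{R}$, $g: X \to \mathbb{R}^m$. With regard to (P) let us consider the Lagrangian function
$$\psi(x,\lambda) = f(x) + \lambda g(x), \qquad (1)$$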
defined for $x \in X$, $\lambda \ge 0$. The vector $\lambda$ in (1) is called the "Lagrange multipliers vector" or also the "dual variables vector". We say that $(x^0,\lambda^0)$, with $x^0 \in X$, $\lambda^0 \ge 0$, is a saddle point (more precisely: a Kuhn-Tucker saddle point) for $\psi$ (with respect to $X$) and for (P) if
$$\psi(x^0,\lambda) \le \psi(x^0,\lambda^0) \le \psi(x,\lambda^0), \qquad \forall x \in X,\ \forall \lambda \ge 0, \qquad (2)$$
i.e. if $\psi(x,\lambda^0)$ attains its minimum over $X$ at $x^0$ and $\psi(x^0,\lambda)$ attains its maximum over $\lambda \ge 0$ at $\lambda^0$:
$$\psi(x^0,\lambda^0) = \min_{x \in X} \psi(x,\lambda^0) = \max_{\lambda \ge 0} \psi(x^0,\lambda).$$
In most cases the surface generated by a Lagrangian function $\psi: \mathbb{R}^2 \to \mathbb{R}$ which admits a saddle point looks like a horse saddle (see Figure 1 of Section 3.2).
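A minimal numerical sketch of definition (2), assuming the illustrative problem $\min x^2$ subject to $g(x) = 1 - x \le 0$ on $X = \mathbb{R}$ (not taken from the text). Here $x^0 = 1$ and $\lambda^0 = 2$ (from $2x^0 - \lambda^0 = 0$), so that $\psi(x,\lambda^0) = (x-1)^2 + 1$ and $\psi(x^0,\lambda) = 1$ for every $\lambda$. Because $X$ and the set $\lambda \ge 0$ are only sampled on finite grids, the check supports, but does not prove, the two saddle point inequalities.

```python
import numpy as np

# Illustrative problem: minimize f(x) = x**2 subject to g(x) = 1 - x <= 0, X = R.
def f(x):
    return x ** 2

def g(x):
    return 1.0 - x

def psi(x, lam):
    """Lagrangian psi(x, lambda) = f(x) + lambda * g(x), as in (1)."""
    return f(x) + lam * g(x)

x0, lam0 = 1.0, 2.0                  # candidate saddle point
xs = np.linspace(-5.0, 5.0, 2001)    # finite sample of X
lams = np.linspace(0.0, 10.0, 2001)  # finite sample of lambda >= 0

center = psi(x0, lam0)
# First inequality of (2): psi(x0, lambda) <= psi(x0, lambda0) for lambda >= 0
left_ok = bool(np.all(psi(x0, lams) <= center + 1e-12))
# Second inequality of (2): psi(x0, lambda0) <= psi(x, lambda0) for x in X
right_ok = bool(np.all(center <= psi(xs, lam0) + 1e-12))
print(center, left_ok, right_ok)
```

The small tolerance `1e-12` only absorbs floating-point rounding; it is not part of the mathematical definition.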
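Lemma 3.14.1. A point $(x^0,\lambda^0)$, $x^0 \in X$, $\lambda^0 \ge 0$, is a saddle point of $\psi(x,\lambda)$ if and only if
a) $x^0$ minimizes $\psi(x,\lambda^0)$ over $X$;
b) $g(x^0) \le 0$;
c) $\lambda^0 g(x^0) = 0$,
i.e. $(x^0,\lambda^0)$, $x^0 \in S$, $\lambda^0 \ge 0$, is a saddle point of $\psi$ if and only if
$$f(x^0) \le f(x) + \lambda^0 g(x), \qquad \forall x \in X.$$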
Proof. Suppose that $(x^0,\lambda^0)$ is a saddle point for $\psi$. By the first inequality of (2) we get $(\lambda - \lambda^0) g(x^0) \le 0$, $\forall \lambda \ge 0$, which is possible only for $g(x^0) \le 0$; thus b) is fulfilled. In particular, with $\lambda = 0$ we get $\lambda^0 g(x^0) \ge 0$ and, since $\lambda^0 \ge 0$ and $g(x^0) \le 0$, also $\lambda^0 g(x^0) \le 0$. Hence $\lambda^0 g(x^0) = 0$, so c) holds. The second inequality of (2) just means a). Conversely, from b) and c) we get immediately, for every $\lambda \ge 0$,
$$\lambda g(x^0) \le 0 = \lambda^0 g(x^0).$$
By adding $f(x^0)$ we obtain the first inequality of (2). The second inequality of (2) is assured by a). □

Relation c) is known as the complementary slackness condition. A first remark concerning sufficient optimality criteria in terms of saddle points of the Lagrangian function is that no convexity assumption on (P) is required (and obviously no differentiability assumption either).

Theorem 3.14.1. If $(x^0,\lambda^0)$, $x^0 \in X$, $\lambda^0 \ge 0$, is a saddle point for $\psi$ with respect to $X$, then $x^0$ solves (P).

Proof. If $(x^0,\lambda^0)$ is a saddle point for $\psi$, then, thanks to Lemma 3.14.1, we have
$$g(x^0) \le 0, \quad \lambda^0 \ge 0, \quad \lambda^0 g(x^0) = 0 \quad \text{and} \quad f(x^0) \le f(x) + \lambda^0 g(x), \quad \forall x \in X.$$
Since $\lambda^0 g(x) \le 0$ for every $x \in S$, this gives
$$f(x^0) \le f(x), \qquad \forall x \in S.$$
□
It should be remarked that an analogous theorem also holds for a constrained minimization problem with both inequality and equality constraints. Indeed, as no convexity assumption was made in Theorem 3.14.1, equality constraints can be handled by replacing them with two inequality constraints of opposite sign. So if we define the problem
$$\min_{x \in S_1} f(x), \qquad S_1 = \{x \mid x \in X,\ g(x) \le 0,\ h(x) = 0\}, \qquad (P_1)$$
where $f: X \to \mathbb{R}$, $g: X \to \mathbb{R}^m$ and $h: X \to \mathbb{R}^p$, and define the Lagrangian function for $(P_1)$
$$\mathcal{L}(x,\lambda,\mu) = f(x) + \lambda g(x) + \mu h(x),$$
defined for $x \in X$, $\lambda \ge 0$, $\mu \in \mathbb{R}^p$, the triplet $(x^0,\lambda^0,\mu^0)$ is a saddle point for $\mathcal{L}$ (with respect to $X$) and for $(P_1)$ if
$$\mathcal{L}(x^0,\lambda,\mu) \le \mathcal{L}(x^0,\lambda^0,\mu^0) \le \mathcal{L}(x,\lambda^0,\mu^0), \qquad \forall x \in X,\ \forall \lambda \ge 0,\ \forall \mu \in \mathbb{R}^p.$$
It is then quite immediate to prove the following result.

Theorem 3.14.2. If $(x^0,\lambda^0,\mu^0)$ is a saddle point for $\mathcal{L}(x,\lambda,\mu)$ with respect to $X$, then $x^0$ solves $(P_1)$.

The necessary saddle point optimality conditions require first of all the convexity of the functions involved in (P). We therefore take into consideration the convex nonlinear programming problem:
$$\min_{x \in S} f(x), \qquad S = \{x \mid x \in X,\ g(x) \le 0\}, \qquad (P_C)$$
where $X \subset \mathbb{R}^n$ is a convex set and $f: X \to \mathbb{R}$, $g: X \to \mathbb{R}^m$ are convex functions on $X$. The first theorem we give is a Fritz John saddle point necessary optimality theorem. First define the Fritz John-Lagrange function $\psi(x,\vartheta,u) = \vartheta f(x) + u g(x)$, where $x \in X$, $\vartheta \in \mathbb{R}$, $u \in \mathbb{R}^m$, $(\vartheta,u) \ge 0$. The following result is due to Uzawa (1958), Karlin (1959) and Berge (1963).

Theorem 3.14.3. If $x^0$ solves $(P_C)$, then there exist $\vartheta_0 \in \mathbb{R}$, $u^0 \in \mathbb{R}^m$, with $(\vartheta_0,u^0) \ge 0$, $(\vartheta_0,u^0) \ne 0$, such that $(x^0,\vartheta_0,u^0)$ is a saddle point for $\psi$, i.e.
$$\psi(x^0,\vartheta_0,u) \le \psi(x^0,\vartheta_0,u^0) \le \psi(x,\vartheta_0,u^0), \qquad \forall x \in X,\ \forall u \ge 0. \qquad (3)$$
Moreover, $u^0 g(x^0) = 0$.

Proof. As $x^0$ is a solution of $(P_C)$, the system
$$f(x) - f(x^0) < 0, \qquad g(x) \le 0, \qquad x \in X$$
has no solution. Hence a fortiori the system
$$f(x) - f(x^0) < 0, \qquad g(x) < 0, \qquad x \in X$$
has no solution. Thus, by the Fan-Glicksberg-Hoffman theorem (Theorem 2.9.1), there exist $\vartheta_0 \ge 0$, $u^0 \ge 0$, with $(\vartheta_0,u^0) \ne 0$ (i.e. $(\vartheta_0,u^0)$ is a semipositive vector), such that
$$\vartheta_0[f(x) - f(x^0)] + u^0 g(x) \ge 0, \qquad \forall x \in X. \qquad (4)$$
By letting $x = x^0$ in the above relation, we get $u^0 g(x^0) \ge 0$. But since $u^0 \ge 0$ and $g(x^0) \le 0$, we also have $u^0 g(x^0) \le 0$, and hence we obtain the complementarity condition $u^0 g(x^0) = 0$. From (4) we also obtain
$$\psi(x^0,\vartheta_0,u^0) = \vartheta_0 f(x^0) \le \vartheta_0 f(x) + u^0 g(x) = \psi(x,\vartheta_0,u^0), \qquad \forall x \in X,$$
i.e. the second inequality of (3). The first inequality of the Fritz John-Lagrange saddle point characterization (3) is trivially obtained, since $u g(x^0) \le 0 = u^0 g(x^0)$ for every $u \ge 0$. □

Theorem 3.14.3 can obviously be extended to problem $(P_1)$, where $f$ and $g$ are convex functions on the convex set $X \subset \mathbb{R}^n$ and $h(x)$ is a linear affine function on $\mathbb{R}^n$. Moreover, it should be remarked that if in the same theorem we have $\vartheta_0 = 0$, then the objective function $f$ does not appear in the saddle point necessary optimality conditions and Theorem 3.14.3 is not, in this case, very expressive. In order to exclude $\vartheta_0 = 0$, i.e. to ensure $\vartheta_0 > 0$ in (3) (in which case it is obviously possible to choose $\vartheta_0 = 1$), we have to introduce a suitable constraint qualification (C.Q.) for problem $(P_C)$. Here we consider the following constraint qualifications:

I)
Slater's C.Q. (Slater (1950)). $(P_C)$ satisfies Slater's C.Q. if there exists $x \in X$ such that $g(x) < 0$.

II) Karlin's C.Q. (Karlin (1959)). $(P_C)$ satisfies Karlin's C.Q. if there exists no semipositive vector $p$ (i.e. $p \ge 0$, $p \ne 0$) such that $p g(x) \ge 0$, $\forall x \in X$.

III) Strict constraint qualification (Mangasarian (1969)). $(P_C)$ satisfies the strict C.Q. if there exist two vectors $x^1, x^2 \in S$, $x^1 \ne x^2$, and $g$ is strictly convex at $x^1$.

Theorem 3.14.4. Slater's C.Q. […] 0, the reader is referred to the paper of Martein (1985), where regularity conditions are taken into account, i.e. conditions involving both the objective function and the constraints.

Theorem 3.14.5 (Kuhn and Tucker (1951), Uzawa (1958), Karlin (1959)). Let $x^0$ be a solution of $(P_C)$ and let one of the constraint qualifications I), II), III) be satisfied; then there exists $\lambda^0 \ge 0$ such that $(x^0,\lambda^0)$ is a saddle point of the Lagrangian function $\psi(x,\lambda) = f(x) + \lambda g(x)$ (and consequently $\lambda^0 g(x^0) = 0$).

Proof. Let us assume that Karlin's C.Q. holds (i.e. that Slater's C.Q. holds); by Theorem 3.14.3 there exists a triplet $(x^0,\vartheta_0,u^0)$ such that (3) holds. If $\vartheta_0 > 0$, Theorem 3.14.5 is proved by dividing (3) by $\vartheta_0$ and setting $\lambda^0 = u^0/\vartheta_0$, since in any case $u^0 g(x^0) = 0$. If $\vartheta_0 = 0$, then $u^0$ is semipositive and from the second inequality of (3) it follows that $0 \le u^0 g(x)$, $\forall x \in X$ (since $\vartheta_0 = 0$ and $u^0 g(x^0) = 0$). But this contradicts Karlin's C.Q., and therefore $\vartheta_0 > 0$. □

It is easy to construct examples where none of the constraint qualifications I), II), III) holds and the Lagrangian function $\psi(x,\lambda)$ admits no saddle point, though $x^0$ is a solution of (P). Consider, e.g., the problem
$$\min f(x) = x \quad \text{subject to } x^2 \le 0,\ x \in \mathbb{R}.$$
The optimal solution is at $x^0 = 0$; the corresponding saddle point problem of the Lagrangian is to find $\lambda_0 \ge 0$ such that
$$x^0 + \lambda (x^0)^2 \le x^0 + \lambda_0 (x^0)^2 \le x + \lambda_0 x^2$$
for every $x \in \mathbb{R}$ and for every $\lambda \ge 0$, or equivalently
$$0 \le x + \lambda_0 x^2, \qquad \forall x \in \mathbb{R}.$$
Now, for any $\lambda_0 \ge 0$, the above inequality fails for some $x \in \mathbb{R}$ (any $x < 0$ if $\lambda_0 = 0$, and $x = -1/(2\lambda_0)$ if $\lambda_0 > 0$). Thus there exists no $\lambda_0$ such that $(x^0,\lambda_0)$ is a saddle point for our problem.
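The failure can also be checked numerically. The sketch evaluates the gap $\psi(x,\lambda_0) - \psi(x^0,\lambda_0)$ at a violating point for a few sample values of $\lambda_0$ (the sample values are arbitrary choices); a negative gap means the second saddle point inequality is violated. Note that Slater's C.Q. fails here, since no $x$ satisfies $x^2 < 0$.

```python
def psi(x, lam):
    """Lagrangian of the example: min x subject to x**2 <= 0."""
    return x + lam * x ** 2

x0 = 0.0
gaps = {}
for lam0 in (0.0, 0.1, 1.0, 10.0, 1000.0):
    # lam0 = 0: psi(x, 0) = x is negative for any x < 0.
    # lam0 > 0: x = -1/(2*lam0) minimizes x + lam0*x**2, with value -1/(4*lam0).
    x_test = -1.0 if lam0 == 0.0 else -1.0 / (2.0 * lam0)
    gaps[lam0] = psi(x_test, lam0) - psi(x0, lam0)
    print(f"lambda0 = {lam0:7.1f}: psi(x, lambda0) - psi(x0, lambda0) = {gaps[lam0]:.6f}")
```

The gap shrinks like $-1/(4\lambda_0)$ as $\lambda_0$ grows but never reaches zero, which is exactly why no finite multiplier works.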
Let us now consider the saddle point necessary optimality conditions for a constrained minimization problem with both convex inequality constraints and linear affine equality constraints:
$$\min_{x \in S_{1L}} f(x), \qquad S_{1L} = \{x \mid x \in \mathbb{R}^n,\ g(x) \le 0,\ h(x) = 0\}, \qquad (P_{1L})$$
where $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are convex functions and $h(x) = Ax - b$, with $A$ a real matrix of order $p \times n$ and $b \in \mathbb{R}^p$. We note that at present no necessary saddle point optimality condition is known for a programming problem with nonlinear equality constraints in which the usual Lagrangian function $\mathcal{L}(x,\lambda,\mu) = f(x) + \lambda g(x) + \mu h(x)$ is utilized.

Theorem 3.14.6.
Let $x^0$ be a solution of problem $(P_{1L})$ and let $g$ and $h$ satisfy any of the following constraint qualifications:
i) (Generalized Slater C.Q.) There exists $x \in \mathbb{R}^n$ such that $g(x) < 0$, $Ax = b$.
ii) (Generalized Karlin C.Q.) There exist no semipositive $p \in \mathbb{R}^m$ and no $v \in \mathbb{R}^p$ such that $p g(x) + v(Ax - b) \ge 0$, $\forall x \in \mathbb{R}^n$.
iii) (Generalized strict C.Q.) There exist $x^1, x^2 \in S_{1L}$, $x^1 \ne x^2$, such that $g$ is strictly convex at $x^1$.
Then there exist vectors $\lambda^0 \in \mathbb{R}^m$, $\lambda^0 \ge 0$, and $\mu^0 \in \mathbb{R}^p$ such that
$$\mathcal{L}(x^0,\lambda,\mu) \le \mathcal{L}(x^0,\lambda^0,\mu^0) \le \mathcal{L}(x,\lambda^0,\mu^0), \qquad \forall \lambda \in \mathbb{R}^m,\ \lambda \ge 0,\ \forall \mu \in \mathbb{R}^p,\ \forall x \in \mathbb{R}^n,$$
where $\mathcal{L}(x,\lambda,\mu) = f(x) + \lambda g(x) + \mu(Ax - b)$. Moreover, $\lambda^0 g(x^0) = 0$.

Proof. See Mangasarian (1969), Uzawa (1958). We note that the following implications hold for the constraint qualifications introduced here: iii) ⇒ i) ⇒ ii). Moreover, we point out that under assumption i) (Generalized Slater C.Q.), Theorem 3.14.6 is an immediate consequence of Berge's extension of the Farkas-Minkowski theorem (Theorem 2.9.2). □

For other, more general saddle point optimality conditions, involving more general Lagrangian functions and/or convexity assumptions, the reader is referred, e.g., to Arrow, Gould and Howe (1973), Bazaraa (1973), Cambini (1986), Elster and Nehse (1980), Giannessi (1984), Rockafellar (1974), Jeyakumar (1985b, 1988). The optimality conditions contained in the above theorem are also known as global Kuhn-Tucker optimality conditions, as opposed to the local Kuhn-Tucker optimality conditions given in the previous sections for the differentiable case. It is therefore interesting to relate, in the differentiable case, the saddle point characterizations to the local Kuhn-Tucker optimality conditions. We have the following results.

Theorem 3.14.7. Let problem (P) be given, where $f$ and $g$ are differentiable on the open set $X \subset \mathbb{R}^n$. If $(x^0,\lambda^0)$ is a saddle point for $\psi(x,\lambda)$ with respect to $X$, then $(x^0,\lambda^0)$ satisfies the local Kuhn-Tucker conditions:
$$\nabla_x \psi(x^0,\lambda^0) = 0, \qquad (5)$$
$$\nabla_\lambda \psi(x^0,\lambda^0) \le 0, \qquad (6)$$
$$\lambda^0 \nabla_\lambda \psi(x^0,\lambda^0) = 0, \qquad (7)$$
$$\lambda^0 \ge 0. \qquad (8)$$
Proof. By definition of saddle point, the function $\psi(x,\lambda^0)$ has a minimum at $x^0$; since $X$ is open, we then have relation (5). Thanks to Lemma 3.14.1 we have $\nabla_\lambda \psi(x^0,\lambda^0) = g(x^0) \le 0$, i.e. relation (6), and $\lambda^0 \nabla_\lambda \psi(x^0,\lambda^0) = \lambda^0 g(x^0) = 0$, i.e. relation (7); relation (8) holds by the definition of saddle point. □

Theorem 3.14.8. Let the functions $f$ and $g$ in problem $(P_C)$ be differentiable on the open convex set $X \subset \mathbb{R}^n$. If there exists a vector $(x^0,\lambda^0)$ satisfying the Kuhn-Tucker conditions (5)-(8), then $(x^0,\lambda^0)$ is a saddle point for the Lagrangian function $\psi(x,\lambda) = f(x) + \lambda g(x)$.

Proof. As $\psi$ is convex with respect to $x$, we have
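$$\psi(x,\lambda^0) - \psi(x^0,\lambda^0) \ge (x - x^0) \nabla_x \psi(x^0,\lambda^0) = 0, \qquad \forall x \in X,$$
and therefore
$$\psi(x^0,\lambda^0) \le \psi(x,\lambda^0), \qquad \forall x \in X.$$
Since $\lambda^0 g(x^0) = 0$ and $g(x^0) \le 0$, we have $\psi(x^0,\lambda) = f(x^0) + \lambda g(x^0) \le f(x^0) = \psi(x^0,\lambda^0)$, i.e.
$$\psi(x^0,\lambda) \le \psi(x^0,\lambda^0), \qquad \forall \lambda \ge 0.$$
□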
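Theorems 3.14.7 and 3.14.8 can be illustrated on a small convex problem chosen only for this sketch: minimize $(x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 - 2 \le 0$. The candidate $x^0 = (1.5, 0.5)$ with $\lambda^0 = 1$ is obtained by hand (projection of $(2,1)$ onto the half-plane). The code checks (5)-(8) directly and then spot-checks the saddle point inequalities (2) on random samples; the sampling is a sanity check, not a proof.

```python
import numpy as np

def f(x):
    return (x[..., 0] - 2.0) ** 2 + (x[..., 1] - 1.0) ** 2

def g(x):
    return x[..., 0] + x[..., 1] - 2.0        # single constraint g(x) <= 0

def grad_f(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])

grad_g = np.array([1.0, 1.0])

def psi(x, lam):
    """Lagrangian psi(x, lambda) = f(x) + lambda * g(x)."""
    return f(x) + lam * g(x)

x0 = np.array([1.5, 0.5])
lam0 = 1.0

# Kuhn-Tucker conditions (5)-(8)
kt5 = bool(np.allclose(grad_f(x0) + lam0 * grad_g, 0.0))  # grad_x psi = 0
kt6 = bool(g(x0) <= 1e-12)                                 # grad_lambda psi = g(x0) <= 0
kt7 = bool(abs(lam0 * g(x0)) <= 1e-12)                     # complementary slackness
kt8 = lam0 >= 0.0

# Saddle point inequalities (2), checked on random samples of X and lambda >= 0
rng = np.random.default_rng(0)
xs = rng.uniform(-5.0, 5.0, size=(10000, 2))
lams = rng.uniform(0.0, 10.0, size=10000)
center = psi(x0, lam0)
saddle = bool(np.all(psi(x0, lams) <= center + 1e-12) and
              np.all(center <= psi(xs, lam0) + 1e-12))
print(kt5, kt6, kt7, kt8, saddle)
```

Since the constraint is active at $x^0$, the first saddle inequality holds with equality for every $\lambda \ge 0$; the second one reflects the convexity of $\psi(\cdot,\lambda^0)$ used in the proof of Theorem 3.14.8.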
The assertions of Theorems 3.14.7 and 3.14.8 remain true also for problem $(P_1)$. In Theorem 3.14.8 we can also assume the pseudoconvexity of the Lagrangian function $\mathcal{L}(x,\lambda)$ with respect to $x$. Combining Theorems 3.14.7 and 3.14.8 we obtain at once the following classical result of Kuhn and Tucker (1951).

Theorem 3.14.9. Let us consider problem $(P_C)$, where $f: X \to \mathbb{R}$ and $g: X \to \mathbb{R}^m$ are convex differentiable functions on the open convex set $X \subset \mathbb{R}^n$, and let any of the constraint qualifications I), II), III) be satisfied. Then
$$\{x^0 \text{ solution of } (P_C)\} \iff \{\exists \lambda^0 \ge 0 \text{ such that } (x^0,\lambda^0) \text{ is a saddle point for } \psi(x,\lambda)\} \iff \ldots$$
$$\lim_{t \downarrow 0} \frac{f(x^0 + ty) - f(x^0)}{t} = y \nabla f(x^0) \qquad (1)$$
holds, which can be expressed equivalently by
$$f(x^0 + ty) = f(x^0) + t y \nabla f(x^0) + o(|t|) \qquad (2)$$
for small $t > 0$. A function $f$ with this property is called Gateaux differentiable or weakly differentiable at $x^0$. Naturally, each function which is Gateaux differentiable at $x^0$ is also partially differentiable at $x^0$, and therefore the components of the gradient $\nabla f(x^0)$ are again the partial derivatives $\frac{\partial f(x^0)}{\partial x_i}$, $i = 1,\dots,n$. We should remark that a Gateaux differentiable function is not necessarily continuous. This is demonstrated in the following example.

Example 4.2.1. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1,x_2) = \ldots ((x_1)^2 + (x_2)^2) \ldots$
$x^0$ and $z \to y$ (since $x + t z_2 e^2 \to x^0$) to
$$y_1 e^1 \nabla f(x^0) + y_2 e^2 \nabla f(x^0) = y \nabla f(x^0)$$
and we get (8). For higher dimensional spaces, i.e. in the case $n > 2$, the proof is the same. Here we have to decompose (10) as a sum of $n$ special difference quotients (one for each direction $e^1,\dots,e^n$), which converges to
$$y_1 e^1 \nabla f(x^0) + \dots + y_n e^n \nabla f(x^0) = y \nabla f(x^0).$$
□
For a comparison between Frechet differentiability and strict differentiability, in the following example we shall see that the two notions do not coincide, even under a Lipschitz assumption.

Example 4.2.3. Let the function $f: \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} x^2 \sin\frac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$
Obviously, $f$ is (Gateaux and Frechet) differentiable at $x^0 = 0$ with the classical derivative $\nabla f(x^0) = f'(x^0) = 0$. Also, $f$ is Lipschitzian around $x^0 = 0$, which can be stated by the mean value theorem (since the derivative $2x \sin\frac{1}{x} - \cos\frac{1}{x}$ is bounded near zero). But $f$ is not strictly differentiable at $x^0 = 0$. To see this, we consider the sequences $\{x^k\}$ and $\{t_k\}$ given by
$$x^k = \frac{1}{2k\pi + \pi/2}, \qquad t_k = \frac{1}{2k\pi - \pi/2} - \frac{1}{2k\pi + \pi/2}.$$
Both sequences tend to zero and, with $y = 1$, we get
$$\lim_{k \to \infty} \frac{f(x^k + t_k y) - f(x^k)}{t_k} = \lim_{k\to\infty} \frac{-\left(\frac{1}{2k\pi - \pi/2}\right)^2 - \left(\frac{1}{2k\pi + \pi/2}\right)^2}{\frac{1}{2k\pi - \pi/2} - \frac{1}{2k\pi + \pi/2}} = \lim_{k\to\infty} -\frac{8k^2\pi^2 + \pi^2/2}{(4k^2\pi^2 - \pi^2/4)\,\pi} = -\frac{2}{\pi},$$
which does not coincide with $y \nabla f(x^0) = 0$.
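The limit above can be reproduced numerically. The sketch evaluates the difference quotient of $f(x) = x^2 \sin(1/x)$ along the two sequences of the example, which approaches $-2/\pi \approx -0.6366$, and compares it with the ordinary quotient $(f(t) - f(0))/t = t \sin(1/t)$ at the fixed base point $0$, which tends to the derivative $0$.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, f(0) = 0 (Example 4.2.3)."""
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def strict_quotient(k):
    # x^k = 1/(2k*pi + pi/2), t_k = 1/(2k*pi - pi/2) - x^k, direction y = 1
    xk = 1.0 / (2 * k * math.pi + math.pi / 2)
    tk = 1.0 / (2 * k * math.pi - math.pi / 2) - xk
    return (f(xk + tk) - f(xk)) / tk

def frechet_quotient(t):
    # Ordinary difference quotient at x^0 = 0: (f(t) - f(0)) / t = t sin(1/t)
    return (f(t) - f(0.0)) / t

for k in (10, 100, 1000):
    print(k, strict_quotient(k), frechet_quotient(1.0 / k))
print("-2/pi =", -2 / math.pi)
```

For moderate $k$ double precision is ample; for very large $k$ the step $t_k$ is a difference of two nearly equal numbers and suffers cancellation, so the sketch stops at $k = 1000$.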
In practice it is not easy to check the strict differentiability of a function. Therefore it is useful to find a sufficient condition. For this purpose assume $f$ to be continuously Gateaux differentiable at $x^0$. This means that $f$ is Gateaux differentiable on a neighbourhood of $x^0$ and the gradient mapping $\nabla f(x)$ is continuous at $x^0$. The following holds.

Theorem 4.2.5. If $f$ is continuously Gateaux differentiable at $x^0$, then it is strictly differentiable at $x^0$.

Proof. Let $x \in \mathbb{R}^n$ with $\|x - x^0\|$ sufficiently small and $z \in \mathbb{R}^n$ be given. Then for small $t > 0$ the function $\varphi_{x,z}(t) = f(x + tz)$ is continuous and differentiable. Using the classical mean value theorem we get
$$\frac{f(x + tz) - f(x)}{t} = \frac{\varphi_{x,z}(t) - \varphi_{x,z}(0)}{t} = z \nabla f(x + \xi z)$$
with $\xi \in (0,t)$. Using the continuity of the gradient mapping at $x^0$ we obtain
$$\lim_{\substack{x \to x^0,\ z \to y \\ t \downarrow 0}} \frac{f(x + tz) - f(x)}{t} = y \nabla f(x^0).$$
Since this equality is true for all $y \in \mathbb{R}^n$, by Theorem 4.2.3 the function $f$ is strictly differentiable at $x^0$. □

It is useful to remark that one can weaken the hypothesis in Theorem 4.2.5 by assuming only the continuous partial differentiability of $f$ at $x^0$ (which is a well-known result of analysis). Here the result can be obtained by the repeated use of the mean value theorem for each direction $e^1,\dots,e^n$ of orthogonal unit vectors. In particular, for $n = 2$ we get from (10)
$$\frac{f(x + tz) - f(x)}{t} = z_1 \frac{\partial f}{\partial x_1}(x + t z_2 e^2 + \xi_1 z_1 e^1) + z_2 \frac{\partial f}{\partial x_2}(x + \xi_2 z_2 e^2) \longrightarrow y_1 \frac{\partial f}{\partial x_1}(x^0) + y_2 \frac{\partial f}{\partial x_2}(x^0) = y \nabla f(x^0)$$
with $\xi_1, \xi_2 \in (0,t)$, for $x \to x^0$, $z \to y$ and $t \downarrow 0$, because of the continuity of the partial derivatives. Thus the function $f$ is strictly differentiable at $x^0$. Naturally, if $f$ is continuously Gateaux differentiable not only at the point $x^0$ but on a whole neighbourhood of $x^0$, then $f$ is also continuously Frechet differentiable, and even continuously strictly differentiable, on this neighbourhood, and all these notions coincide (frequently denoted by $f \in C^1$). Now we are able to give the following scheme, in which the connections between the several kinds of differentiability, the Lipschitz property and the continuity of a function are shown:
Continuous differentiability ⇒ Strict differentiability;
Strict differentiability ⇒ Lipschitz property ⇒ Continuity;
Strict differentiability ⇒ Frechet differentiability ⇒ Gateaux differentiability.

In our previous remarks we pointed out that the converse implications do not hold in general. This is confirmed by the following example, in which the function $f$ of Example 4.2.3 is modified.

Example 4.2.4. Let $\alpha \ge 0$ and $f: \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} |x|^\alpha \sin\frac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$
We discuss the properties of the function at $x^0 = 0$. Obviously $f$ is continuous at $x^0 = 0$ for $\alpha > 0$ and Lipschitzian around $x^0 = 0$ for $\alpha \ge 2$ (as in the former example, this can be derived directly from the mean value theorem). For $\alpha < 2$ the function is not Lipschitzian around $x^0 = 0$. This can be demonstrated by the sequences $\{x^k\}$ and $\{z^k\}$ given by
$$x^k = \frac{1}{2k\pi + \pi/2}, \qquad z^k = \frac{1}{2k\pi - \pi/2}.$$
Here for the difference quotients we get
$$\frac{f(z^k) - f(x^k)}{z^k - x^k} = \frac{-\left(\frac{1}{2k\pi - \pi/2}\right)^\alpha - \left(\frac{1}{2k\pi + \pi/2}\right)^\alpha}{\frac{1}{2k\pi - \pi/2} - \frac{1}{2k\pi + \pi/2}} = -\frac{1}{\pi}\left(\frac{4k^2\pi^2 - \pi^2/4}{(2k\pi - \pi/2)^\alpha} + \frac{4k^2\pi^2 - \pi^2/4}{(2k\pi + \pi/2)^\alpha}\right),$$
which tends to $-\infty$ for $k \to \infty$ (i.e. $x^k \to 0$ and $z^k \to 0$). Therefore, no Lipschitz constant can be found. Regarding differentiability, we see that $f$ is (Gateaux and Frechet) differentiable at $x^0 = 0$ for $\alpha > 1$ and continuously differentiable (and hence strictly differentiable) at this point for $\alpha > 2$. In Example 4.2.3 we have seen that for $\alpha = 2$ the function $f$ is not strictly differentiable at $x^0 = 0$. We summarize the results in the following table.
α | Continuity | Differentiability | Lipschitz property | Strict differentiability
α = 0 | - | - | - | -
0 < α ≤ 1 | + | - | - | -
1 < α < 2 | + | + | - | -
α = 2 | + | + | + | -
α > 2 | + | + | + | +
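The Lipschitz column of the table can be observed numerically along the sequences $x^k$, $z^k$ used above. This sketch assumes $f(x) = |x|^\alpha \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$, as in Example 4.2.4: the difference quotient grows without bound for $\alpha < 2$ (here $\alpha = 1.5$), stays near $-2/\pi$ for $\alpha = 2$, and tends to $0$ for $\alpha > 2$.

```python
import math

def f(x, alpha):
    """|x|^alpha * sin(1/x) for x != 0 and f(0) = 0 (as in Example 4.2.4)."""
    return abs(x) ** alpha * math.sin(1.0 / x) if x != 0.0 else 0.0

def lipschitz_quotient(k, alpha):
    # x^k = 1/(2k*pi + pi/2), z^k = 1/(2k*pi - pi/2); both tend to 0
    xk = 1.0 / (2 * k * math.pi + math.pi / 2)
    zk = 1.0 / (2 * k * math.pi - math.pi / 2)
    return (f(zk, alpha) - f(xk, alpha)) / (zk - xk)

for alpha in (1.5, 2.0, 3.0):
    print(alpha, [round(lipschitz_quotient(k, alpha), 4) for k in (10, 100, 1000)])
```

For $\alpha = 1.5$ the quotient grows roughly like $\sqrt{k}$, in line with the factor $k^{2-\alpha}$ hidden in the closed-form expression above; a finite computation of course only suggests, and does not prove, unboundedness.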
4.3. Directional Derivatives and Subdifferentials for Convex Functions

If we want to extend the classical differentiability concepts mentioned in Section 4.2, it is useful to start with the discussion of convex functions. Here, working with the epigraphs of such functions, the well-known algebraic and topological properties of convex sets (especially separation theorems) provide an excellent approach for generalizing the notion of the directional derivative and the notion of the gradient of differentiable functions. Today these generalizations are basic tools in convex analysis and are well suited to the discussion of mathematical optimization problems. Convex functions and their properties were discussed earlier in Sections 2.5 and 2.6. In this section we summarize once more the most important results regarding the directional differentiability and the structure of the subdifferential of such functions, also in connection with their application to convex optimization problems. These results allow further generalizations to not necessarily convex functions in the next sections.
Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ be an extended real-valued convex function and $x^0 \in \mathbb{R}^n$ a point where $f$ is finite. As we have seen in Sections 2.5 and 2.6, the convexity of $f$ is equivalent to the monotonic increase of the difference quotient
$$\frac{f(x^0 + ty) - f(x^0)}{t}$$
with respect to the parameter $t > 0$ for all fixed vectors $y \in \mathbb{R}^n$. Thus the directional derivative of $f$ at $x^0$, given by
$$f'(x^0,y) = \lim_{t \downarrow 0} \frac{f(x^0 + ty) - f(x^0)}{t} = \inf_{t > 0} \frac{f(x^0 + ty) - f(x^0)}{t},$$
exists for each direction $y$ (possibly with values $\pm\infty$) and provides an extended real-valued function $f'(x^0,\cdot)$. From the convexity of $f$ we can derive directly that $f'(x^0,\cdot)$ is also convex (even sublinear, since it is positively homogeneous). Obviously the value $f'(x^0,y)$ is finite if $x^0 \pm ty \in \operatorname{dom}(f)$ for some $t > 0$. This is a simple consequence of the inequalities
$$-\infty < \frac{f(x^0) - f(x^0 - ty)}{t} \le f'(x^0,y) \le \frac{f(x^0 + ty) - f(x^0)}{t} < +\infty.$$
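The monotonicity of the difference quotient can be seen on a concrete convex function. The sketch uses log-sum-exp on $\mathbb{R}^3$ (a choice made here, not in the text), for which $f'(x^0,y) = p \cdot y$ with $p$ the softmax of $x^0$. The quotients decrease as $t \downarrow 0$ and approach this value, while the reversed quotient $(f(x^0) - f(x^0 - ty))/t$ bounds it from below, as in the inequalities above.

```python
import numpy as np

def f(x):
    """Log-sum-exp, a convex function on R^n (chosen for illustration)."""
    return np.log(np.sum(np.exp(x)))

def quotient(x0, y, t):
    """Difference quotient (f(x0 + t*y) - f(x0)) / t."""
    return (f(x0 + t * y) - f(x0)) / t

x0 = np.zeros(3)
y = np.array([1.0, -2.0, 0.5])
ts = np.array([1.0, 0.5, 0.1, 0.01, 0.001])   # decreasing step sizes
q = np.array([quotient(x0, y, t) for t in ts])

# For log-sum-exp, f'(x0, y) = p . y where p = softmax(x0).
p = np.exp(x0) / np.sum(np.exp(x0))
fprime = p @ y

# Reversed quotient (f(x0) - f(x0 - t*y)) / t, a lower bound for f'(x0, y).
lower = (f(x0) - f(x0 - 0.5 * y)) / 0.5

print("quotients:", q)           # nonincreasing as t decreases
print("f'(x0, y) =", fprime)     # equals -1/6 here
print("lower bound:", lower)
```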
[…] 0. Without loss of generality we can set $u_0 = 1$, and i) is fulfilled. The implication i) ⇒ ii) can be shown analogously to Theorem 4.3.3. Finally, let ii) and the complementary slackness condition be fulfilled. Then there exist subgradients $v^0 \in \partial f(x^0)$ and $v^i \in \partial g_i(x^0)$, $i \in I(x^0)$, with
$$v^0 + \sum_{i \in I(x^0)} u_i v^i = 0.$$
Since for arbitrary $x \in S$ the inequalities
$$v^i(x - x^0) \le g_i(x) - g_i(x^0) \le 0, \qquad \forall i \in I(x^0),$$
hold, we get
$$f(x) - f(x^0) \ge v^0(x - x^0) = \Big(-\sum_{i \in I(x^0)} u_i v^i\Big)(x - x^0) = -\sum_{i \in I(x^0)} u_i v^i(x - x^0) \ge 0,$$
which means the minimality of $x^0$. □
We remark that the proofs of the last assertions do not require the convexity of the functions directly, but only the convexity of the directional derivatives, which is a weaker assumption. In the next sections we present some approaches to the construction of convex directional derivatives for nonconvex functions.

4.4. Generalized Directional Derivatives

For nonconvex functions we cannot expect that, in general, the limits of the difference quotients discussed in Sections 4.2 and 4.3 exist, let alone that these derivatives are linear or convex. Therefore other approaches were developed to find generalized directional derivatives suitable for applications in the nonconvex case. The simplest way is to replace the limit operation by the upper and lower limits. So, given a function $f: \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ and a point $x^0 \in \mathbb{R}^n$ where $f$ is finite, the upper Dini directional derivative and the lower Dini directional derivative at the point $x^0$ in the direction $y \in \mathbb{R}^n$ are defined by
$$f_U(x^0,y) = \limsup_{t \downarrow 0} \frac{f(x^0 + ty) - f(x^0)}{t}, \qquad (1)$$
$$f_L(x^0,y) = \liminf_{t \downarrow 0} \frac{f(x^0 + ty) - f(x^0)}{t}. \qquad (2)$$
It is quite obvious that
$$f_L(x^0,y) \le f_U(x^0,y)$$
and that, in the case of equality of the above limits, we get the classical directional derivative $f'(x^0,y)$ discussed in Sections 2.6 and 4.3. Naturally, in general this equality does not ensure the convexity of the directional derivative, which would be very important for applications. In this connection Pshenichnyi (1972) and Ioffe/Tichomirov (1979) have proposed a concept in which the convexity of the directional derivative is assumed. To be exact, a function $f$ is called quasidifferentiable (according to Pshenichnyi) or locally convex (according to Ioffe/Tichomirov) at the point $x^0$ if
$$f_L(x^0,\cdot) = f_U(x^0,\cdot)$$
(i.e. $f'(x^0,y)$ exists for each $y \in \mathbb{R}^n$) and if this directional derivative is finite and sublinear in $y$. In the mentioned books of Pshenichnyi and Ioffe/Tichomirov it is pointed out that the class of such functions is relatively extensive. One can also find there calculus rules for this concept which extend the calculus rules for differentiable and convex functions and which make possible the application of this notion to (special) nonconvex optimization problems. Of course, in the case of equality and linearity of the above-introduced directional derivatives, the function $f$ is Gateaux differentiable according to relation (1) in Section 4.2. The importance of the directional derivative of convex functions is connected not only with its sublinearity but also with the uniform convergence with respect to the directional vector $y$ according to relation (2) of Section 4.3 in the case $x^0 \in \operatorname{int}(\operatorname{dom}(f))$. This property is important for the discussion of constrained optimization problems. Taking this into account, the uniform upper Dini directional derivative and the uniform lower Dini directional derivative (also called upper and lower Dini-Hadamard directional derivatives) of the function $f$ at the point $x^0$ in the direction $y \in \mathbb{R}^n$ are introduced by
$$f^H_U(x^0,y) = \limsup_{\substack{t \downarrow 0 \\ z \to y}} \frac{f(x^0 + tz) - f(x^0)}{t}, \qquad (3)$$
$$f^H_L(x^0,y) = \liminf_{\substack{t \downarrow 0 \\ z \to y}} \frac{f(x^0 + tz) - f(x^0)}{t}. \qquad (4)$$
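The Dini derivatives (1) and (2) can indeed differ. For the illustrative function $f(x) = x \sin(1/x)$, $f(0) = 0$ (not taken from the text), the difference quotient at $x^0 = 0$ in the direction $y = 1$ equals $\sin(1/t)$, so the upper Dini derivative is $1$ and the lower one is $-1$. The sketch approximates the supremum and infimum of the quotient over $t \in (0,\delta]$ by sampling; a finite sample can only approximate the limsup and liminf, and the sampling scheme is an assumption of this sketch.

```python
import numpy as np

def f(x):
    """f(x) = x*sin(1/x) for x != 0 and f(0) = 0 (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)          # avoid division by zero
    return np.where(x == 0.0, 0.0, x * np.sin(1.0 / safe))

def dini_estimates(x0, y, delta, n=200_000):
    """Crude estimates of the upper and lower Dini derivatives (1), (2):
    the max and min of the difference quotient over sampled t in (0, delta]."""
    # Sample t = 1/s with s in [1/delta, 1/delta + 50]; 1/t then sweeps
    # several periods of sin, which is where the quotient oscillates.
    s = np.linspace(1.0 / delta, 1.0 / delta + 50.0, n)
    t = 1.0 / s
    q = (f(x0 + t * y) - f(x0)) / t
    return q.max(), q.min()

results = {delta: dini_estimates(0.0, 1.0, delta) for delta in (1e-2, 1e-4, 1e-6)}
for delta, (hi, lo) in results.items():
    print(f"delta = {delta:g}: sup ~ {hi:.6f}, inf ~ {lo:.6f}")
```

Shrinking $\delta$ does not narrow the gap, which is what distinguishes a genuine limsup/liminf gap from slow convergence.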
Here we have used the topological structure of the space such that we can expect topological properties of these directional derivatives. So by definition, / ^ ( ^ ^ , . ) is upper semi continuous and / ^ ( x ^ , . ) is lower semi continuous with respect to the directional vector. In comparison with (1) and (2) we have
/ ^ ( ^ ° , •) ^ / L ( ^ ° , •) ^ fifi^', •) < /c^(^°, •) • We can state that the function / is uniformly directional differentiable at the point x^ similar to (2) in Section 4.3 if all these directional derivatives coincide, especially if
Then of course $f_U(x^0,0) = f^U(x^0,0) = 0$, and this directional derivative is finite and continuous on $\mathbb{R}^n$. In analogy to our former remarks, the function $f$ is called regular locally convex (see Ioffe/Tichomirov (1979)) if both directional derivatives are equal and form a sublinear function of $y$. Finally, using Theorem 4.2.1 we see that the function $f$ is Fréchet differentiable at $x^0$ iff this directional derivative is also linear. Regarding the limit (8) in Section 4.2 we can also define the following generalized directional derivatives at $x^0$ by
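$$f^*(x^0,y) = \limsup_{t\downarrow 0,\; z\to y,\; x\to x^0}\frac{f(x+tz)-f(x)}{t}, \qquad (5)$$
$$f_*(x^0,y) = \liminf_{t\downarrow 0,\; z\to y,\; x\to x^0}\frac{f(x+tz)-f(x)}{t}. \qquad (6)$$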
Also here, $f^*(x^0,\cdot)$ and $f_*(x^0,\cdot)$ are upper semi-continuous and lower semi-continuous, respectively, which can be derived directly from the definitions.
Nonsmooth optimization problems
Both directional derivatives, however, are very interesting because of their convexity properties. In Theorem 4.4.4 we will prove that $f^*(x^0,\cdot)$ is convex. Now because of
$$f_*(x^0,y) = -(-f)^*(x^0,y),\qquad y\in\mathbb{R}^n,$$
we can state immediately that $f_*(x^0,\cdot)$ is concave. Comparing all the directional derivatives introduced in this section we have the following inequalities
$$f_*(x^0,\cdot)\le f_U(x^0,\cdot)\le f_+(x^0,\cdot)\le f^+(x^0,\cdot)\le f^U(x^0,\cdot)\le f^*(x^0,\cdot).$$
In case of $f_*(x^0,\cdot) = f^*(x^0,\cdot)$ all directional derivatives coincide, are finite, and are convex as well as concave, i.e. linear. This is equivalent to the strict differentiability of $f$ at $x^0$ discussed in Section 4.2 (see Theorem 4.2.3). We remark that the description of some directional derivatives can be simplified if $f$ is assumed to be Lipschitzian around $x^0$. Then, analogous to our remarks in Section 4.2, the variation of the directional vector $y$ can be omitted in this case, since
$$\left|\frac{f(x+tz)-f(x)}{t}-\frac{f(x+ty)-f(x)}{t}\right| = \left|\frac{f(x+tz)-f(x+ty)}{t}\right|\le L\,\|z-y\|\;\longrightarrow\;0$$
for $x\to x^0$, $z\to y$ and $t\downarrow 0$ (here $L$ is the Lipschitz constant). Therefore in this case we get
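$$f^U(x^0,y) = f^+(x^0,y),\qquad\forall y\in\mathbb{R}^n,$$
$$f_U(x^0,y) = f_+(x^0,y),\qquad\forall y\in\mathbb{R}^n,$$
$$f_*(x^0,y) = \liminf_{t\downarrow 0,\; x\to x^0}\frac{f(x+ty)-f(x)}{t},$$
$$f^*(x^0,y) = \limsup_{t\downarrow 0,\; x\to x^0}\frac{f(x+ty)-f(x)}{t}.$$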
The last limit, generally denoted by
$$f^\circ(x^0,y) = \limsup_{t\downarrow 0,\; x\to x^0}\frac{f(x+ty)-f(x)}{t}, \qquad (7)$$
is known as the Clarke generalized derivative. It was introduced in the famous paper of Clarke (1975). In Theorem 4.4.4 we shall show that this directional derivative is convex too, even if $f$ is not assumed to be Lipschitzian around $x^0$. Analogously, because of
$$f_\circ(x^0,y) := \liminf_{t\downarrow 0,\; x\to x^0}\frac{f(x+ty)-f(x)}{t} = -(-f)^\circ(x^0,y),$$
this directional derivative is concave. Before we introduce and discuss further directional derivatives based on more complicated convergence concepts, we shall analyze which of the above-mentioned notions are suitable for the discussion of optimality conditions. For unconstrained optimization problems we have the following result.
Theorem 4.4.1. If $x^0\in\mathbb{R}^n$ is a local minimum point of the function $f$, then
$$f_U(x^0,y)\ge 0\qquad\forall y\in\mathbb{R}^n.$$
Proof. If $x^0\in\mathbb{R}^n$ is a local minimum point of $f$, then for each $y\in\mathbb{R}^n$, each $z$ in a small neighborhood of $y$ and each sufficiently small $t>0$ we have
$$\frac{f(x^0+tz)-f(x^0)}{t}\ge 0,$$
such that $f_U(x^0,y)\ge 0$. □
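The six directional derivatives of this section can be compared numerically on a simple example. The sketch below (hypothetical helpers `quotients` and `six_estimates`; the radius `h` and the grid sizes are arbitrary choices) takes maxima and minima of difference quotients over nested samples for $f(x)=|x|$ at its minimum point $x^0=0$ and $y=1$. Because the samples are nested, the estimates automatically respect the ordering $f_*\le f_U\le f_+\le f^+\le f^U\le f^*$; the estimate of $f_U$ is nonnegative, as Theorem 4.4.1 requires, while the estimate of $f_*$ is close to $-1$.

```python
def quotients(f, x0, y, ts, zs=None, xs=None):
    """Difference quotients (f(x + t*z) - f(x)) / t over sampled t, z and x.

    zs=None fixes z = y (Dini quotients); xs=None fixes x = x0.
    """
    zs = [y] if zs is None else zs
    xs = [x0] if xs is None else xs
    return [(f(x + t * z) - f(x)) / t for t in ts for z in zs for x in xs]

def six_estimates(f, x0, y, h=1e-3, n=41):
    ts = [h * 0.5 ** k for k in range(1, 20)]                # t in (0, h/2]
    zs = [y + h * (2 * i / (n - 1) - 1) for i in range(n)]   # z in [y-h, y+h], contains y
    xs = [x0 + h * (2 * i / (n - 1) - 1) for i in range(n)]  # x in [x0-h, x0+h], contains x0
    dini = quotients(f, x0, y, ts)
    hadamard = quotients(f, x0, y, ts, zs=zs)
    strict = quotients(f, x0, y, ts, zs=zs, xs=xs)
    return {
        "f_*": min(strict), "f_U": min(hadamard), "f_+": min(dini),
        "f^+": max(dini), "f^U": max(hadamard), "f^*": max(strict),
    }

est = six_estimates(abs, 0.0, 1.0)
for name, value in est.items():
    print(f"{name:4s} ~ {value:+.4f}")
```

Shrinking `h` moves the estimates towards the exact values $f_*(0,1)=-1$ and $f_U(0,1)=f_+(0,1)=f^+(0,1)=f^U(0,1)=f^*(0,1)=1$.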
Of course Theorem 4.4.1 remains true if $f_U(x^0,\cdot)$ is replaced by any of the other directional derivatives which are larger. Concerning the directional derivative $f_*(x^0,\cdot)$, however, the assertion would be false in general. For example, for the function $f(x)=\|x\|$ the point $x^0=0$ is the minimum point of $f$, but because of $f_*(0,y) = -\|y\|$ the mentioned condition is not satisfied. Now we consider constrained optimization problems of the form
$$\min_{x\in S} f(x). \qquad (P_0)$$
In analogy to the assertions stated in Theorem 3.6.1 and Theorem 4.3.2 we can formulate two geometrical optimality conditions using the directional derivatives $f^U(x^0,\cdot)$, $f_U(x^0,\cdot)$, the contingent cone $T(S,x^0)$ and the cone of interior directions $I(S,x^0)$.
Theorem 4.4.2. If $x^0\in S$ is a local minimum point of the problem $(P_0)$, then
i) $f^U(x^0,y)\ge 0$ for all $y\in T(S,x^0)$,
ii) $f_U(x^0,y)\ge 0$ for all $y\in I(S,x^0)$.
Proof. Let y e T ( 5 , x ^ ) . Then there exist a sequence {y^} C IRT' converging to y G IR^ and a positive sequence {tk} C M converging to zero such that x^ + tky^ e S for each k G IN. Since x^ is a local minimum of / on 5 we get
fui^^y) = iinisup -^ A:—•oo
j-^—^—-
^ 0.
^
For the second condition let y G / ( 5 , x^). Then for each sequence {y^} C IR^ converging to y G IR^ and for each positive sequence { t ^ } C JR converging to zero we have
x^ + tky^ G S
Generalized directional derivatives
385
for large k G IN. Since x^ is a local minimum of f on S we get for all these sequences
liminf ^—UlLJ—LL_2
^ 0 ,
^^
k-^oo
i.e.//(:^^y)^0.
D
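To see condition i) of Theorem 4.4.2 at work, consider an illustrative problem chosen for this sketch (it is not from the text): minimize $f(x_1,x_2)=x_1+|x_2|$ over $S=\{x\in\mathbb{R}^2\mid x_1\ge 0\}$, whose minimum point is $x^0=0$ with $T(S,x^0)=\{y\mid y_1\ge 0\}$. The sketch estimates $f^U(0,y)$ by a maximum of difference quotients over small $t$ and $z$ near $y$ (hypothetical helper `upper_hadamard`) and checks the sign on unit directions of the contingent cone and on one direction outside it.

```python
import math

def f(x1, x2):
    # Illustrative objective; minimized over S = {x : x1 >= 0} at x0 = 0.
    return x1 + abs(x2)

def upper_hadamard(f, y, h=1e-4, n=5, levels=10):
    """Max of (f(t*z) - f(0)) / t over t in (0, h] and z in a box of radius h around y."""
    ts = [h * 0.5 ** k for k in range(levels)]
    offs = [h * (2 * i / (n - 1) - 1) for i in range(n)]
    return max((f(t * (y[0] + a), t * (y[1] + b)) - f(0.0, 0.0)) / t
               for t in ts for a in offs for b in offs)

# Unit directions with y1 >= 0, i.e. the contingent cone T(S, 0) = {y : y1 >= 0}.
angles = [-math.pi / 2 + math.pi * k / 180 for k in range(181)]
values = [upper_hadamard(f, (math.cos(a), math.sin(a))) for a in angles]
print(f"min over T(S, 0): {min(values):.4f}")
print(f"direction (-1, 0) outside T(S, 0): {upper_hadamard(f, (-1.0, 0.0)):.4f}")
```

The estimates are nonnegative on $T(S,0)$, while the direction $(-1,0)$, which leaves $S$, gives a negative value; the necessary condition is therefore only meaningful on the cone.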
That the stronger condition
$$f_U(x^0,y)\ge 0\qquad\forall y\in T(S,x^0)$$
does not hold in general can be demonstrated by our former Example 4.2.2. For the objective function $f(x_1,x_2) =$
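$\ldots 0\} = \{y\in\mathbb{R}^n\mid S-ty\subseteq S\ \ \forall t>0\} = -0^+S$ we obtain
$$0^+S = -0^+(\mathbb{R}^n\setminus S)\subseteq -0^+\big(K(\mathbb{R}^n\setminus S,x)\big) = 0^+\big(\mathbb{R}^n\setminus K(\mathbb{R}^n\setminus S,x)\big) = 0^+L(S,x),$$
which is equivalent to property (6) in Definition 4.6.1. Thus we are able to give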
□
Definition 4.6.2. Let $K(\cdot,\cdot)$ be a local cone approximation. The local cone approximation $L(\cdot,\cdot)$, defined by
$$L(S,x) = \mathbb{R}^n\setminus K(\mathbb{R}^n\setminus S,x),$$
is called the dual local cone approximation of $K(\cdot,\cdot)$.
If $L(\cdot,\cdot)$ is dual to $K(\cdot,\cdot)$ then naturally, by double complementation, $K(\cdot,\cdot)$ is also dual to $L(\cdot,\cdot)$. This means that $\{K(\cdot,\cdot),L(\cdot,\cdot)\}$ forms a dual pair of local cone approximations. Regarding the special cone approximations introduced in Section 3.4 we can recognize the following dual pairs:
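$\{Z(\cdot,\cdot),F(\cdot,\cdot)\}$,
$\{Z_m(\cdot,\cdot),F_m(\cdot,\cdot)\}$,
$\{I(\cdot,\cdot),T(\cdot,\cdot)\}$,
$\{I_m(\cdot,\cdot),T_m(\cdot,\cdot)\}$,
$\{A(\cdot,\cdot),Q(\cdot,\cdot)\}$,
$\{A_m(\cdot,\cdot),Q_m(\cdot,\cdot)\}$.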
This duality relation is very interesting: each (algebraic or topological) property of one local cone approximation is associated with a dual property of the other. In this manner the discussion and the comparison of special local cone approximations can be simplified considerably, as we have seen in Section 3.4. We agree that in the following the statement "$K(\cdot,\cdot)$ has some property" (convexity, closedness and so on) means that the cones $K(S,x)$ have this property for all sets $S$ and all points $x$. Now the concept of local cone approximations shall be used for the construction of generalized differentiability notions. Let $f:\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$ be an extended real-valued function and $x^0\in\mathbb{R}^n$ be a point at which $f$ is finite. As mentioned above, we will introduce generalized directional derivatives of $f$ at $x^0$ by approximating the epigraph of $f$ locally at the point $(x^0,f(x^0))$.
Definition 4.6.3. Let $K(\cdot,\cdot)$ be a local cone approximation. Then the extended real-valued function $f^K(x^0,\cdot):\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$, defined by
$$f^K(x^0,y) := \inf\{\xi\in\mathbb{R}\mid (y,\xi)\in K(\operatorname{epi}f,(x^0,f(x^0)))\},\qquad y\in\mathbb{R}^n,$$
Abstract cone approximations of sets
is called the $K$-directional derivative of $f$ at $x^0$.
To avoid confusion we agree that $\inf\emptyset = +\infty$ (analogously $\sup\emptyset = -\infty$). Obviously $f^K(x^0,\cdot)$ is a positively homogeneous function. Moreover, we have the following relations between the epigraph and the strict epigraph of $f^K(x^0,\cdot)$:
$$\operatorname{epi}^\circ f^K(x^0,\cdot)\subseteq K(\operatorname{epi}f,(x^0,f(x^0)))\subseteq\operatorname{epi}f^K(x^0,\cdot). \qquad (7)$$
The second inclusion is obvious. The first inclusion is a consequence of property (6) in Definition 4.6.1. In fact, since $\{0\}\times\mathbb{R}_+\subseteq 0^+(\operatorname{epi}f)$ we also get $\{0\}\times\mathbb{R}_+\subseteq 0^+(K(\ldots$ … $\lambda>0$ it holds
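i) $(\lambda f)^K(x^0,y) = \lambda f^K(x^0,y)$ for all $y\in\mathbb{R}^n$;
ii) $\partial_K(\lambda f)(x^0) = \lambda\,\partial_K f(x^0)$.
Proof. For a fixed number $\lambda>0$ the mapping $\varphi_\lambda:\mathbb{R}^{n+1}\to\mathbb{R}^{n+1}$, given by $\varphi_\lambda(y,\xi) = (y,\lambda\xi)$, is linear and nonsingular. Using property (2) of Definition 4.6.1 we have
$$K(\operatorname{epi}(\lambda f),(x^0,\lambda f(x^0))) = K(\varphi_\lambda(\operatorname{epi}f),\varphi_\lambda(x^0,f(x^0))) = \varphi_\lambda\big(K(\operatorname{epi}f,(x^0,f(x^0)))\big) = \{(y,\lambda\xi)\mid (y,\xi)\in K(\operatorname{epi}f,(x^0,f(x^0)))\}.$$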
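From this we get
$$(\lambda f)^K(x^0,y) = \lambda f^K(x^0,y)\qquad\forall y\in\mathbb{R}^n$$
and
$$\partial_K(\lambda f)(x^0) = \{u\in\mathbb{R}^n\mid uy\le(\lambda f)^K(x^0,y)\ \ \forall y\in\mathbb{R}^n\} = \{u\in\mathbb{R}^n\mid uy\le\lambda f^K(x^0,y)\ \ \forall y\in\mathbb{R}^n\} = \lambda\,\partial_K f(x^0).$$
□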
We cannot expect that local cone approximations of this general structure permit rules for the representation of the $K$-directional derivatives and $K$-subdifferentials of a sum of functions, similar to Theorem 2.6.8. We can, however, give an assertion which compares conditions on the sum of $K$-directional derivatives with conditions on the sum of $K$-subdifferentials of different functions. Such assertions are very important for the construction of optimality conditions for constrained optimization problems. Naturally, from the definition we get
$$0\in\partial_K f(x^0)\iff f^K(x^0,y)\ge 0\qquad\forall y\in\mathbb{R}^n,$$
and this equivalence can be used for the discussion of unconstrained optimization problems. As an extension of this relation we have
Theorem 4.6.9. Let $f_1,\ldots,f_p:\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$ be extended real-valued functions which are finite at the point $x^0\in\mathbb{R}^n$, let $K_1(\cdot,\cdot),\ldots,K_p(\cdot,\cdot)$ be local cone approximations and $\lambda_1,\ldots,\lambda_p$ be positive numbers. Then it holds:
i) If $0\in\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)$, then $\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)\ge 0$ for all $y\in\mathbb{R}^n$.
ii) Let $K_i(\cdot,\cdot)$ be convex, $f_i^{K_i}(x^0,0)=0$, $i=1,\ldots,p$, and
$$\bigcap_{i=1}^p\operatorname{dom}\big(f_i^{\operatorname{int}(K_i)}(x^0,\cdot)\big)\ne\emptyset.$$
Then the converse implication is true, i.e. if $\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)\ge 0$ for all $y\in\mathbb{R}^n$, then $0\in\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)$.
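Theorem 4.6.9 can be checked by hand in the simplest convex setting. The sketch below assumes $n=1$, $p=2$, $\lambda_1=\lambda_2=1$, $f_1(x)=|x|$, $f_2(x)=ax$ and, for both functions, the classical directional derivative at $x^0=0$ in the role of $f_i^{K_i}(0,\cdot)$ (for convex functions this is $f^Z=f^F$, as shown in Section 4.7); both directional derivatives are finite. Then $\partial f_1(0)=[-1,1]$ and $\partial f_2(0)=\{a\}$, and the two sides of the theorem, "$f_1'(0,y)+f_2'(0,y)\ge 0$ for all $y$" and "$0\in\partial f_1(0)+\partial f_2(0)$", should agree for every $a$. The helper names are hypothetical.

```python
def dd_f1(y):
    # f1(x) = |x|: classical directional derivative at 0 is |y|.
    return abs(y)

def dd_f2(a, y):
    # f2(x) = a*x: classical directional derivative at 0 is a*y.
    return a * y

def primal_side(a, ys):
    """f1'(0, y) + f2'(0, y) >= 0 for all sampled y (y = -1 and y = 1 would suffice)."""
    return all(dd_f1(y) + dd_f2(a, y) >= 0.0 for y in ys)

def dual_side(a):
    """0 in [-1, 1] + {a}, the sum of the subdifferentials of |x| and a*x at 0."""
    return -1.0 <= -a <= 1.0

ys = [k / 10 for k in range(-10, 11)]
for a in (-1.5, -1.0, -0.3, 0.0, 0.7, 1.0, 1.2):
    print(f"a = {a:+.1f}: primal {primal_side(a, ys)}, dual {dual_side(a)}")
```

Both conditions hold exactly for $|a|\le 1$. Since the directional derivatives are positively homogeneous, checking finitely many $y$ of both signs is enough in one dimension.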
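Before proving the assertion, it is useful to remark that in Theorem 4.6.7 we have pointed out that the condition $f_i^{K_i}(x^0,0)=0$ implies $\partial_{K_i}f_i(x^0)\ne\emptyset$. This means in particular that $f_i^{K_i}(x^0,y)>-\infty$ for all $y\in\mathbb{R}^n$, and thus the terms $\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)$ and $\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)$ are well defined.
Proof of Theorem 4.6.9. 1) If $0\in\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)$, we get from the definition
$$\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)\ge\sum_{i=1}^p\lambda_i\sup\{u_iy\mid u_i\in\partial_{K_i}f_i(x^0)\} = \sup\Big\{\sum_{i=1}^p\lambda_iu_iy\;\Big|\;u_i\in\partial_{K_i}f_i(x^0),\ i=1,\ldots,p\Big\} = \sup\Big\{uy\;\Big|\;u\in\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)\Big\}\ge 0$$
for all $y\in\mathbb{R}^n$.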
2) Let $\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)\ge 0$ for all $y\in\mathbb{R}^n$. Then
$$0\in\Big\{u\in\mathbb{R}^n\;\Big|\;uy\le\sum_{i=1}^p\lambda_i f_i^{K_i}(x^0,y)\ \ \forall y\in\mathbb{R}^n\Big\},$$
and we have to show that this set is contained in $\sum_{i=1}^p\lambda_i\,\partial_{K_i}f_i(x^0)$. Because of Theorem 4.6.8, for simplicity we can assume that $\lambda_i=1$ for all $i=1,\ldots,p$. Moreover, it is sufficient to show this implication only for $p=2$; by induction the assertion extends to any finite number of functions. Now let $uy\le f_1^{K_1}(x^0,y)+f_2^{K_2}(x^0,y)$ for all $y\in\mathbb{R}^n$. With
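$$g(y) = f_1^{K_1}(x^0,y),\qquad h(y) = uy-f_2^{K_2}(x^0,y)$$
we get $g(y)\ge h(y)$ for all $y\in\mathbb{R}^n$, i.e. $\operatorname{int}(\operatorname{epi}g)\cap\operatorname{hypo}h=\emptyset$. By assumption both sets are nonempty convex cones. Using the separation theorem (Theorem 2.2.5) we can find a vector $(v,\alpha)\in\mathbb{R}^n\times\mathbb{R}$, $(v,\alpha)\ne(0,0)$, with
$$vy+\alpha\xi\le 0\qquad\forall(y,\xi)\in\operatorname{epi}g,$$
$$vy+\alpha\xi\ge 0\qquad\forall(y,\xi)\in\operatorname{hypo}h.$$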
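Obviously, we have $\alpha\le 0$. Assuming $\alpha=0$ we would get
$$vy\le 0\qquad\forall y\in\operatorname{dom}(f_1^{K_1}(x^0,\cdot)),$$
$$vy\ge 0\qquad\forall y\in\operatorname{dom}(f_2^{K_2}(x^0,\cdot)).$$
Since by assumption
$$\emptyset\ne\operatorname{dom}(f_1^{\operatorname{int}(K_1)}(x^0,\cdot))\cap\operatorname{dom}(f_2^{\operatorname{int}(K_2)}(x^0,\cdot))\subseteq\operatorname{int}(\operatorname{dom}(f_1^{K_1}(x^0,\cdot)))\cap\operatorname{int}(\operatorname{dom}(f_2^{K_2}(x^0,\cdot))), \qquad (18)$$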
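this is only possible for $v=0$, which would be a contradiction. Hence $\alpha<0$, and without loss of generality we can set $\alpha=-1$. This means in particular
$$vy\le f_1^{K_1}(x^0,y)\qquad\forall y\in\operatorname{dom}(f_1^{K_1}(x^0,\cdot)),$$
$$vy-\big(uy-f_2^{K_2}(x^0,y)\big)\ge 0\qquad\forall y\in\operatorname{dom}(f_2^{K_2}(x^0,\cdot)),$$
i.e.
$$v\in\partial_{K_1}f_1(x^0),\qquad u-v\in\partial_{K_2}f_2(x^0)$$
and
$$u = v+(u-v)\in\partial_{K_1}f_1(x^0)+\partial_{K_2}f_2(x^0).\qquad\square$$
Remark. For the proof it was sufficient to require only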
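$$\emptyset\ne\operatorname{dom}(f_1^{\operatorname{int}(K_1)}(x^0,\cdot))\cap\operatorname{dom}(f_2^{K_2}(x^0,\cdot))\subseteq\operatorname{int}(\operatorname{dom}(f_1^{K_1}(x^0,\cdot)))\cap\operatorname{dom}(f_2^{K_2}(x^0,\cdot))$$
instead of (18). Therefore, in the general case the interior point condition assumed in Theorem 4.6.9 can be weakened by requiring
$$\bigcap_{i=1,\,i\ne i_0}^{p}\operatorname{dom}(f_i^{\operatorname{int}(K_i)}(x^0,\cdot))\cap\operatorname{dom}(f_{i_0}^{K_{i_0}}(x^0,\cdot))\ne\emptyset$$
for some $i_0\in\{1,\ldots,p\}$.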
4.7. Special K-Directional Derivative
In this section we analyze the special $K$-directional derivatives which are associated with the special cone approximations discussed in Section 3.4. As in the other sections, let $f:\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$ be an extended real-valued function and $x^0\in\mathbb{R}^n$ be a point at which $f$ is finite. We start with the cone of feasible directions $Z(\cdot,\cdot)$ and the radial tangent cone $F(\cdot,\cdot)$, which are dual in the sense of Definition 4.6.2. Here we get the following result:
Theorem 4.7.1. For any $y\in\mathbb{R}^n$ it holds
i) $f^Z(x^0,y) = \limsup_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t}$,
ii) $f^F(x^0,y) = \liminf_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t}$,
iii) $f^Z(x^0,y) = -(-f)^F(x^0,y)$.
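Proof. 1)
$$\begin{aligned}
\operatorname{epi}f^Z(x^0,\cdot) &= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0: (y,\xi+\varepsilon)\in Z(\operatorname{epi}f,(x^0,f(x^0)))\}\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0\ \exists\lambda>0\ \forall t\in(0,\lambda): (x^0+ty,\,f(x^0)+t(\xi+\varepsilon))\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \exists\lambda>0\ \forall t\in(0,\lambda): \frac{f(x^0+ty)-f(x^0)}{t}\le\xi+\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\limsup_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t}\le\xi\Big\}.
\end{aligned}$$
2)
$$\begin{aligned}
\operatorname{epi}f^F(x^0,\cdot) &= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0: (y,\xi+\varepsilon)\in F(\operatorname{epi}f,(x^0,f(x^0)))\}\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0\ \forall\lambda>0\ \exists t\in(0,\lambda): (x^0+ty,\,f(x^0)+t(\xi+\varepsilon))\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall\lambda>0\ \exists t\in(0,\lambda): \frac{f(x^0+ty)-f(x^0)}{t}\le\xi+\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\liminf_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t}\le\xi\Big\}.
\end{aligned}$$
3)
$$f^Z(x^0,y) = \limsup_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t} = -\liminf_{t\downarrow 0}\frac{(-f)(x^0+ty)-(-f)(x^0)}{t} = -(-f)^F(x^0,y).$$
□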
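Relation iii) of Theorem 4.7.1 rests on the identity $\frac{f(x^0+ty)-f(x^0)}{t} = -\frac{(-f)(x^0+ty)-(-f)(x^0)}{t}$ between difference quotients, so it carries over exactly to sampled estimates. The sketch below uses an example chosen for illustration, $f(x)=|x|\,(1+\tfrac12\sin\log|x|)$ with $f(0)=0$, at $x^0=0$ and $y=1$, for which $f^Z(0,1)=\tfrac32$ and $f^F(0,1)=\tfrac12$. The helper `dini_sample` and the sampling grid are assumptions of the sketch.

```python
import math

def dini_sample(f, x0, y, ts):
    """Return (max, min) of (f(x0 + t*y) - f(x0)) / t over the sampled step sizes ts."""
    q = [(f(x0 + t * y) - f(x0)) / t for t in ts]
    return max(q), min(q)

def f(x):
    # |x| * (1 + sin(log|x|) / 2), f(0) = 0: the Dini quotient at 0 oscillates in [1/2, 3/2].
    return abs(x) * (1.0 + 0.5 * math.sin(math.log(abs(x)))) if x != 0.0 else 0.0

def neg_f(x):
    return -f(x)

# t = exp(-u) with u running through about three periods of sin in steps of 0.001.
ts = [math.exp(-(5.0 + 0.001 * k)) for k in range(int(6 * math.pi / 0.001))]
upper_f, lower_f = dini_sample(f, 0.0, 1.0, ts)           # estimates of f^Z(0,1), f^F(0,1)
upper_neg, lower_neg = dini_sample(neg_f, 0.0, 1.0, ts)   # estimates of (-f)^Z(0,1), (-f)^F(0,1)
print(upper_f, -lower_neg, lower_f, -upper_neg)
```

Negation and division are sign-symmetric in floating point, so the sampled versions of $f^Z(0,1)$ and $-(-f)^F(0,1)$ agree exactly, not just approximately.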
Thus we recognize the special directional derivatives (upper and lower Dini directional derivatives) $f^+(x^0,\cdot) = f^Z(x^0,\cdot)$ and $f_+(x^0,\cdot) = f^F(x^0,\cdot)$ already discussed in Section 4.4. The third relation also shows the dual character of both directional derivatives. In conclusion we obtain the following simple assertion, which we mentioned in Section 4.4.
Theorem 4.7.2. $f$ is Gâteaux differentiable at $x^0\in\mathbb{R}^n$ if and only if $f^Z(x^0,\cdot) = f^F(x^0,\cdot)$ and this directional derivative is finite and linear.
If $f$ is a convex function, then $\operatorname{epi}f$ is a convex set and because of Theorem 3.4.10 we get
$$Z(\operatorname{epi}f,(x^0,f(x^0))) = F(\operatorname{epi}f,(x^0,f(x^0))) = \operatorname{cone}\big(\operatorname{epi}f-(x^0,f(x^0))\big).$$
Therefore we can state once more the well-known relation
$$f^Z(x^0,y) = f^F(x^0,y) = \lim_{t\downarrow 0}\frac{f(x^0+ty)-f(x^0)}{t}$$
for all $y\in\mathbb{R}^n$, which we have derived in Sections 2.6 and 4.3. For the cone approximations $I(\cdot,\cdot)$ and $T(\cdot,\cdot)$ (i.e. the cone of interior directions and the contingent cone, respectively) we get the following result:
Theorem 4.7.3. For any $y\in\mathbb{R}^n$ it holds
i) $f^I(x^0,y) = \limsup_{t\downarrow 0,\;z\to y}\frac{f(x^0+tz)-f(x^0)}{t}$,
ii) $f^T(x^0,y) = \liminf_{t\downarrow 0,\;z\to y}\frac{f(x^0+tz)-f(x^0)}{t}$,
iii) $f^I(x^0,y) = -(-f)^T(x^0,y)$.
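Proof. 1) Since $I(\operatorname{epi}f,(x^0,f(x^0)))$ is open, we have
$$\begin{aligned}
\operatorname{epi}^\circ f^I(x^0,\cdot) &= I(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\exists\varepsilon>0\ \exists N(y)\ \exists\lambda>0\ \forall t\in(0,\lambda)\ \forall z\in N(y)\ \forall\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x^0+tz,\,f(x^0)+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\exists\varepsilon>0\ \exists N(y)\ \exists\lambda>0\ \forall t\in(0,\lambda)\ \forall z\in N(y): \frac{f(x^0+tz)-f(x^0)}{t}\le\xi-\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\limsup_{t\downarrow 0,\,z\to y}\frac{f(x^0+tz)-f(x^0)}{t}<\xi\Big\}.
\end{aligned}$$
2) Since $T(\operatorname{epi}f,(x^0,f(x^0)))$ is closed, we have
$$\begin{aligned}
\operatorname{epi}f^T(x^0,\cdot) &= T(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0\ \forall N(y)\ \forall\lambda>0\ \exists t\in(0,\lambda)\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x^0+tz,\,f(x^0)+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \forall\lambda>0\ \exists t\in(0,\lambda)\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: \frac{f(x^0+tz)-f(x^0)}{t}\le\zeta\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \forall\lambda>0\ \exists t\in(0,\lambda)\ \exists z\in N(y): \frac{f(x^0+tz)-f(x^0)}{t}\ \ldots
\end{aligned}$$
… $-\infty$), then $f$ is lower semi-continuous at $x^0$.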
iii) If $f^I(x^0,0) = f^T(x^0,0) = 0$ (this is equivalent to the finiteness of $f^I(x^0,\cdot)$ and $f^T(x^0,\cdot)$), then $f$ is continuous at $x^0$.
Proof. 1) Assuming $f$ is not upper semi-continuous at $x^0$, we could find a positive number $\varepsilon>0$ and a sequence $\{x^k\}\subset\mathbb{R}^n$, $x^k\to x^0$, such that
$$f(x^k)-f(x^0)>\varepsilon.$$
Setting $y^k = \dfrac{x^k-x^0}{\|x^k-x^0\|}$ and $t_k = \|x^k-x^0\|$, without loss of generality we can assume that $\{y^k\}$ is convergent (since it is contained in the compact unit sphere) and that $y^k\to y^0$. Then
$$f^I(x^0,y^0)\ge\limsup_{k\to\infty}\frac{f(x^k)-f(x^0)}{t_k}\ge\lim_{k\to\infty}\frac{\varepsilon}{t_k} = +\infty,$$
which is a contradiction. 2) Because of $(-f)^I(x^0,0) = -f^T(x^0,0) = 0$ we can conclude that $-f$ is upper semi-continuous at $x^0$; hence $f$ is lower semi-continuous at $x^0$. 3) The assertion is a consequence of i) and ii) and the fact that $f(x^0)$ is finite. □
Now let $f$ be a convex function and $x^0\in\operatorname{int}(\operatorname{dom}(f))$ (in this case $f$ is continuous at $x^0$). Obviously $(x^0,f(x^0)+1)\in\operatorname{int}(\operatorname{epi}f)$, and according to Theorem 3.4.10 we have
$$I(\operatorname{epi}f,(x^0,f(x^0))) = \operatorname{cone}\big(\operatorname{int}(\operatorname{epi}f)-(x^0,f(x^0))\big)\ne\emptyset$$
and
$$T(\operatorname{epi}f,(x^0,f(x^0))) = \overline{I(\operatorname{epi}f,(x^0,f(x^0)))}.$$
Because of $f^I(x^0,0)\ge 0$ (we even have $f^I(x^0,0)=0$), by Theorem 4.6.5 we see that
$$f^I(x^0,y) = f^T(x^0,y) = \lim_{t\downarrow 0,\;z\to y}\frac{f(x^0+tz)-f(x^0)}{t}$$
for all $y\in\mathbb{R}^n$. Thus we get once more the well-known result (discussed in Section 4.3) that a continuous convex function is uniformly directionally differentiable.
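Uniform directional differentiability means that the upper and lower Dini–Hadamard quotients squeeze together. The sketch below uses the convex function $f(x)=\max(x_1,x_2)+x_1^2+x_2^2$ (an assumption of this example) at $x^0=0$, where $f'(0,y)=\max(y_1,y_2)$, and reports the smallest and largest value of $\frac{f(tz)-f(0)}{t}$ over $t\in(0,h]$ and $z$ in a box of radius $h$ around $y$. The helper `hadamard_bounds` is hypothetical, and a finite grid only suggests the limit behaviour.

```python
def f(x1, x2):
    # Convex and finite on R^2, hence continuous; f'(0, y) = max(y1, y2).
    return max(x1, x2) + x1 ** 2 + x2 ** 2

def hadamard_bounds(f, y, h, n=11, levels=12):
    """(min, max) of (f(t*z) - f(0)) / t over t in (0, h] and z in a box of radius h around y."""
    ts = [h * 0.5 ** k for k in range(levels)]
    offs = [h * (2 * i / (n - 1) - 1) for i in range(n)]
    q = [(f(t * (y[0] + a), t * (y[1] + b)) - f(0.0, 0.0)) / t
         for t in ts for a in offs for b in offs]
    return min(q), max(q)

y = (0.3, -0.8)
bounds = {h: hadamard_bounds(f, y, h) for h in (1e-1, 1e-2, 1e-3)}
for h, (lo, hi) in bounds.items():
    print(f"h = {h:g}: [{lo:.5f}, {hi:.5f}], spread {hi - lo:.5f}")
```

The interval always contains $f'(0,y)=0.3$, and its width shrinks proportionally to $h$, as expected from $f^I(0,y)=f^T(0,y)$.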
We now investigate the $K$-directional derivatives related to the cone approximations $A(\cdot,\cdot)$ and $Q(\cdot,\cdot)$ (i.e. the cone of attainable directions and the cone of quasi-interior directions, respectively).
Theorem 4.7.6. For any $y\in\mathbb{R}^n$ it holds
i) $f^A(x^0,y) = \limsup_{t\downarrow 0}\,\inf_{z\to y}\frac{f(x^0+tz)-f(x^0)}{t}$,
ii) $f^Q(x^0,y) = \liminf_{t\downarrow 0}\,\sup_{z\to y}\frac{f(x^0+tz)-f(x^0)}{t}$,
iii) $f^A(x^0,y) = -(-f)^Q(x^0,y)$.
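Proof. 1) Since $A(\operatorname{epi}f,(x^0,f(x^0)))$ is closed, we have
$$\begin{aligned}
\operatorname{epi}f^A(x^0,\cdot) &= A(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0\ \forall N(y)\ \exists\lambda>0\ \forall t\in(0,\lambda)\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x^0+tz,\,f(x^0)+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \exists\lambda>0\ \forall t\in(0,\lambda)\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: \frac{f(x^0+tz)-f(x^0)}{t}\le\zeta\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \exists\lambda>0\ \forall t\in(0,\lambda)\ \exists z\in N(y): \frac{f(x^0+tz)-f(x^0)}{t}<\xi+\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\limsup_{t\downarrow 0}\,\inf_{z\to y}\frac{f(x^0+tz)-f(x^0)}{t}\le\xi\Big\}.
\end{aligned}$$
2) Since $Q(\operatorname{epi}f,(x^0,f(x^0)))$ is open, we have
$$\begin{aligned}
\operatorname{epi}^\circ f^Q(x^0,\cdot) &= Q(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\exists\varepsilon>0\ \exists N(y)\ \forall\lambda>0\ \exists t\in(0,\lambda)\ \forall z\in N(y)\ \forall\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x^0+tz,\,f(x^0)+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\exists\varepsilon>0\ \exists N(y)\ \forall\lambda>0\ \exists t\in(0,\lambda)\ \forall z\in N(y): \frac{f(x^0+tz)-f(x^0)}{t}\le\xi-\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\liminf_{t\downarrow 0}\,\sup_{z\to y}\frac{f(x^0+tz)-f(x^0)}{t}<\xi\Big\}.
\end{aligned}$$
3)
$$f^A(x^0,y) = \limsup_{t\downarrow 0}\,\inf_{z\to y}\frac{f(x^0+tz)-f(x^0)}{t} = -\liminf_{t\downarrow 0}\,\sup_{z\to y}\frac{(-f)(x^0+tz)-(-f)(x^0)}{t} = -(-f)^Q(x^0,y).$$
□
Because of $0\in A(\operatorname{epi}f,(x^0,f(x^0)))$ and the closedness of this cone,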
the directional derivative $f^A(x^0,\cdot)$ is lower semi-continuous with $f^A(x^0,0)\le 0$. In the same manner, since $0\notin Q(\operatorname{epi}f,(x^0,f(x^0)))$ and this cone is open, the directional derivative $f^Q(x^0,\cdot)$ is upper semi-continuous with $f^Q(x^0,0)\ge 0$. If $f$ is convex, i.e. if $\operatorname{epi}f$ is a convex set, then from Theorem 3.4.10 we know that
$$Q(\operatorname{epi}f,(x^0,f(x^0))) = I(\operatorname{epi}f,(x^0,f(x^0))),\qquad A(\operatorname{epi}f,(x^0,f(x^0))) = T(\operatorname{epi}f,(x^0,f(x^0))).$$
Hence it holds $f^Q(x^0,\cdot) = f^I(x^0,\cdot)$ and $f^A(x^0,\cdot) = f^T(x^0,\cdot)$. If the convex function $f$ is even assumed to be continuous at $x^0$ (i.e. if $x^0\in\operatorname{int}(\operatorname{dom}(f))$), then because of our previous remarks we have $f^I(x^0,\cdot) = f^T(x^0,\cdot)$, and of course all directional derivatives above are equal. By the definition of the $K$-directional derivatives, the inclusion
$$K_1(\operatorname{epi}f,(x^0,f(x^0)))\subseteq K_2(\operatorname{epi}f,(x^0,f(x^0)))$$
implies the relation
$$f^{K_1}(x^0,\cdot)\ge f^{K_2}(x^0,\cdot).$$
Regarding the inclusion graph for the classical cone approximations discussed in Section 3.4, we can give a corresponding graph for the associated directional derivatives.
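$$\begin{array}{ccccccc}
f^I(x^0,\cdot) & \ge & f^Z(x^0,\cdot) & \ge & f^A(x^0,\cdot) & & \\
\rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \\
f^Q(x^0,\cdot) & \ge & f^F(x^0,\cdot) & \ge & f^T(x^0,\cdot) & \ge & f^P(x^0,\cdot)
\end{array}$$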
Here the directional derivative $f^P(x^0,\cdot)$ is associated with the pseudotangent cone, which is the convex hull of the contingent cone. Therefore, $f^P(x^0,\cdot)$ is the convex envelope of $f^T(x^0,\cdot)$. Before investigating further properties of these special directional derivatives, we shall analyze the … $(x^0,f(x^0))$. … $x^0$ with $\mu\le f(x)$, (3)
Obviously, the following simple assertions are true.
Lemma 4.7.1. If $f$ is lower semi-continuous at $x^0$, then (1) is sufficient for (3). If $f$ is upper semi-continuous at $x^0$, then (2) is sufficient for (3). If $f$ is continuous at $x^0$, then (3) is equivalent to $x\to x^0$.
Now we get the following results regarding the dual local cone approximations $Z_m(\cdot,\cdot) = H(\cdot,\cdot)$ and $F_m(\cdot,\cdot)$, $I_m(\cdot,\cdot) = E(\cdot,\cdot)$ and $T_m(\cdot,\cdot)$, $A_m(\cdot,\cdot) = TC(\cdot,\cdot)$ and $Q_m(\cdot,\cdot)$, respectively.
Theorem 4.7.7. For any $y\in\mathbb{R}^n$ it holds
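i) $f^{Z_m}(x^0,y) = f^H(x^0,y) = \limsup_{t\downarrow 0,\;(x,\mu)\downarrow_f x^0}\frac{f(x+ty)-\mu}{t}$,
ii) $f^{F_m}(x^0,y) = \liminf_{t\downarrow 0,\;(x,\mu)\uparrow_f x^0}\frac{f(x+ty)-\mu}{t}$,
iii) $f^{Z_m}(x^0,y) = -(-f)^{F_m}(x^0,y)$.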
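Theorem 4.7.8. For any $y\in\mathbb{R}^n$ it holds
i) $f^{I_m}(x^0,y) = f^E(x^0,y) = \limsup_{t\downarrow 0,\;z\to y,\;(x,\mu)\downarrow_f x^0}\frac{f(x+tz)-\mu}{t}$,
ii) $f^{T_m}(x^0,y) = \liminf_{t\downarrow 0,\;z\to y,\;(x,\mu)\uparrow_f x^0}\frac{f(x+tz)-\mu}{t}$,
iii) $f^{I_m}(x^0,y) = -(-f)^{T_m}(x^0,y)$.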
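Theorem 4.7.9. For any $y\in\mathbb{R}^n$ it holds
i) $f^{A_m}(x^0,y) = f^{TC}(x^0,y) = \limsup_{t\downarrow 0,\;(x,\mu)\downarrow_f x^0}\,\inf_{z\to y}\frac{f(x+tz)-\mu}{t}$,
ii) $f^{Q_m}(x^0,y) = \liminf_{t\downarrow 0,\;(x,\mu)\uparrow_f x^0}\,\sup_{z\to y}\frac{f(x+tz)-\mu}{t}$,
iii) $f^{A_m}(x^0,y) = -(-f)^{Q_m}(x^0,y)$.
Since the proofs of all three theorems are similar, we prove only the last one.
Proof of Theorem 4.7.9. 1) Since $A_m(\operatorname{epi}f,(x^0,f(x^0)))$ is closed, we have
$$\begin{aligned}
\operatorname{epi}f^{A_m}(x^0,\cdot) &= A_m(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\forall\varepsilon>0\ \forall N(y)\ \exists\lambda>0\ \exists\delta>0\ \exists V(x^0)\ \forall(x,\mu)\in\operatorname{epi}f\text{ with }x\in V(x^0),\ |\mu-f(x^0)|<\delta,\ \forall t\in(0,\lambda)\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x+tz,\,\mu+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \ldots\ \exists z\in N(y)\ \exists\zeta\text{ with }|\zeta-\xi|<\varepsilon: \frac{f(x+tz)-\mu}{t}\le\zeta\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\forall\varepsilon>0\ \forall N(y)\ \ldots\ \exists z\in N(y): \frac{f(x+tz)-\mu}{t}<\xi+\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\limsup_{t\downarrow 0,\;(x,\mu)\downarrow_f x^0}\,\inf_{z\to y}\frac{f(x+tz)-\mu}{t}\le\xi\Big\}.
\end{aligned}$$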
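2) Since $Q_m(\operatorname{epi}f,(x^0,f(x^0)))$ is open, we have
$$\begin{aligned}
\operatorname{epi}^\circ f^{Q_m}(x^0,\cdot) &= Q_m(\operatorname{epi}f,(x^0,f(x^0)))\\
&= \{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\mid\exists\varepsilon>0\ \exists N(y)\ \forall\lambda>0\ \forall\delta>0\ \forall V(x^0)\ \exists(x,\mu)\in\operatorname{hypo}^\circ f\text{ with }x\in V(x^0),\ |\mu-f(x^0)|<\delta,\ \exists t\in(0,\lambda)\ \forall z\in N(y)\ \forall\zeta\text{ with }|\zeta-\xi|<\varepsilon: (x+tz,\,\mu+t\zeta)\in\operatorname{epi}f\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\exists\varepsilon>0\ \exists N(y)\ \ldots\ \forall z\in N(y)\ \forall\zeta\text{ with }|\zeta-\xi|<\varepsilon: \frac{f(x+tz)-\mu}{t}\le\zeta\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\exists\varepsilon>0\ \exists N(y)\ \ldots\ \forall z\in N(y): \frac{f(x+tz)-\mu}{t}\le\xi-\varepsilon\Big\}\\
&= \Big\{(y,\xi)\in\mathbb{R}^n\times\mathbb{R}\;\Big|\;\liminf_{t\downarrow 0,\;(x,\mu)\uparrow_f x^0}\,\sup_{z\to y}\frac{f(x+tz)-\mu}{t}<\xi\Big\}.
\end{aligned}$$
3)
$$f^{A_m}(x^0,y) = \limsup_{t\downarrow 0,\;(x,\mu)\downarrow_f x^0}\,\inf_{z\to y}\frac{f(x+tz)-\mu}{t} = -\liminf_{t\downarrow 0,\;(x,-\mu)\uparrow_{-f}x^0}\,\sup_{z\to y}\frac{(-f)(x+tz)-(-\mu)}{t} = -(-f)^{Q_m}(x^0,y).$$
□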
The representation of the directional derivatives can be simplified if semi-continuity is assumed. Indeed, if $f$ is lower semi-continuous at $x^0$, then by Lemma 4.7.1 the relation $(x,\mu)\downarrow_f x^0$ is equivalent to
$$x\to_f x^0,\qquad\mu\to f(x^0),\qquad\mu\ge f(x).$$
On the other hand, if $f$ is upper semi-continuous at $x^0$, then the relation $(x,\mu)\uparrow_f x^0$ is equivalent to
$$x\to_f x^0,\qquad\mu\to f(x^0),\qquad\mu<f(x).$$
Thus, in Theorems 4.7.7, 4.7.8 and 4.7.9 we can replace the number $\mu$ by $f(x)$. We get
Theorem 4.7.10. If $f$ is lower semi-continuous at $x^0$, then for any $y\in\mathbb{R}^n$ it holds
i) $f^{Z_m}(x^0,y) = f^H(x^0,y) = \limsup_{t\downarrow 0,\;x\to_f x^0}\frac{f(x+ty)-f(x)}{t}$,
ii) $f^{I_m}(x^0,y) = f^E(x^0,y) = \limsup_{t\downarrow 0,\;z\to y,\;x\to_f x^0}\frac{f(x+tz)-f(x)}{t}$,
iii) $f^{A_m}(x^0,y) = f^{TC}(x^0,y) = \limsup_{t\downarrow 0,\;x\to_f x^0}\,\inf_{z\to y}\frac{f(x+tz)-f(x)}{t}$.
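For a continuous function, part i) of Theorem 4.7.10 is exactly the Clarke generalized derivative $f^\circ(x^0,y)=\limsup_{t\downarrow0,\,x\to x^0}\frac{f(x+ty)-f(x)}{t}$. The sketch below, with the concave function $f(x)=-(|x_1|+|x_2|)$ at $x^0=0$ chosen for illustration, compares a sampled estimate of this limit with a sampled upper Dini derivative: for $y=(1,-2)$ one expects $f^H(0,y)=\|y\|_1=3$ but $f^Z(0,y)=-3$, so the Clarke derivative can lie far above the Dini derivatives. Grid sizes and helper names are assumptions of the sketch.

```python
def f(x1, x2):
    # Continuous (indeed Lipschitz) concave function f(x) = -(|x1| + |x2|).
    return -(abs(x1) + abs(x2))

def clarke_estimate(f, y, h=1e-3, n=21, levels=10):
    """Max of (f(x + t*y) - f(x)) / t over x in a box of radius h around 0 and t in (0, h/2]."""
    ts = [0.5 * h * 0.5 ** k for k in range(levels)]
    grid = [h * (2 * i / (n - 1) - 1) for i in range(n)]
    return max((f(a + t * y[0], b + t * y[1]) - f(a, b)) / t
               for t in ts for a in grid for b in grid)

def upper_dini(f, y, h=1e-3, levels=10):
    """Max of the Dini quotients (f(t*y) - f(0)) / t over t in (0, h/2]."""
    ts = [0.5 * h * 0.5 ** k for k in range(levels)]
    return max((f(t * y[0], t * y[1]) - f(0.0, 0.0)) / t for t in ts)

y = (1.0, -2.0)
clarke = clarke_estimate(f, y)
dini = upper_dini(f, y)
print(f"Clarke-type estimate ~ {clarke:.6f}, upper Dini estimate ~ {dini:.6f}")
```

The maximum $3$ is reached at base points $x$ on the "wrong" side of the kinks, which is precisely what the extra limit $x\to x^0$ in (7) adds compared with the Dini derivative.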
Theorem 4.7.11. If $f$ is upper semi-continuous at $x^0$, then for any $y\in\mathbb{R}^n$ it holds
i) $f^{F_m}(x^0,y) = \liminf_{t\downarrow 0,\;x\to_f x^0}\frac{f(x+ty)-f(x)}{t}$,
ii) $f^{T_m}(x^0,y) = \liminf_{t\downarrow 0,\;z\to y,\;x\to_f x^0}\frac{f(x+tz)-f(x)}{t}$,
iii) $f^{Q_m}(x^0,y) = \liminf_{t\downarrow 0,\;x\to_f x^0}\,\sup_{z\to y}\frac{f(x+tz)-f(x)}{t}$.
Naturally, if $f$ is continuous at $x^0$, then the requirement $x\to_f x^0$ can be replaced by the simple requirement $x\to x^0$. In this case we recognize some of the directional derivatives discussed in Section 4.4; e.g. it holds
$$f^{Z_m}(x^0,\cdot) = f^H(x^0,\cdot) = f^\circ(x^0,\cdot),$$
where in particular $f^H(x^0,\cdot)$ is the Clarke generalized derivative and $f^{TC}(x^0,\cdot)$ is the Rockafellar subderivative at $x^0$, which are convex functions of the directional vector $y$. We have stated in Theorems 3.4.13, 3.4.14 and 3.4.15 that the cone approximations $E(\cdot,\cdot)$, $H(\cdot,\cdot)$ and $TC(\cdot,\cdot)$ provide convex cones for each argument and that the cone approximations $T_m(\cdot,\cdot)$, $F_m(\cdot,\cdot)$ and $Q_m(\cdot,\cdot)$ provide complements of convex cones for each argument. Therefore, the convexity and concavity behaviour of the associated directional derivatives can be derived (also without continuity assumptions), which is formulated in the following assertion.
Theorem 4.7.12.
i) The directional derivatives $f^E(x^0,\cdot)$, $f^H(x^0,\cdot)$ and $f^{TC}(x^0,\cdot)$ are convex.
ii) The directional derivatives $f^{T_m}(x^0,\cdot)$, $f^{F_m}(x^0,\cdot)$ and $f^{Q_m}(x^0,\cdot)$ are concave.
In Theorems 3.4.16 and 3.4.18 some further interesting properties of the cone approximations $E(\cdot,\cdot)$ and $TC(\cdot,\cdot)$ were given. For the associated directional derivatives we get
Theorem 4.7.13. i) If $f^E(x^0,y)<\infty$, then
$$f^{TC}(x^0,y) = f^H(x^0,y) = f^E(x^0,y).$$
ii) If $f^E(x^0,0)<\infty$ (i.e. $f^E(x^0,\cdot)$ is finite), then
$$f^{TC}(x^0,y) = f^H(x^0,y) = f^E(x^0,y)\qquad\text{for all }y\in\mathbb{R}^n.$$
Proof. By assumption we have $E(\operatorname{epi}f,(x^0,f(x^0)))\ne\emptyset$, so that we can use Theorem 3.4.16. We get
$$E(\operatorname{epi}f,(x^0,f(x^0))) = \operatorname{int}\big(TC(\operatorname{epi}f,(x^0,f(x^0)))\big)$$
and
$$TC(\operatorname{epi}f,(x^0,f(x^0))) = \overline{E(\operatorname{epi}f,(x^0,f(x^0)))}.$$
Now the assertion is a consequence of Theorem 4.6.4 and Theorem 4.6.5, respectively. □
Theorem 4.7.14. For each $y\in\mathbb{R}^n$ it holds
i) $f^E(x^0,y) = (-f)^E(x^0,-y)$,
ii) $f^{TC}(x^0,y) = (-f)^{TC}(x^0,-y)$.
Proof. Using Theorem 3.4.18, the topological properties and the duality of the cone approximations $E(\cdot,\cdot)$ and $T_m(\cdot,\cdot)$ in the sense of Definition 4.6.2, we get
$$\operatorname{epi}^\circ f^E(x^0,\cdot) = E(\operatorname{epi}f,(x^0,f(x^0))) = -E\big(\mathbb{R}^{n+1}\setminus\operatorname{epi}f,(x^0,f(x^0))\big) = -\big(\mathbb{R}^{n+1}\setminus T_m(\operatorname{epi}f,(x^0,f(x^0)))\big).$$
Hence (using the third part of Theorem 4.7.8) for all $y\in\mathbb{R}^n$ we have
$$f^E(x^0,y) = -f^{T_m}(x^0,-y) = (-f)^E(x^0,-y).$$
The second assertion can be proved analogously using the duality of the cone approximations $TC(\cdot,\cdot)$ and $Q_m(\cdot,\cdot)$. □
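Now we are able to give an equivalent description of the strict differentiability of a function by means of the above-mentioned directional derivatives.
Theorem 4.7.15. The following assertions are equivalent:
i) $f$ is strictly differentiable at $x^0$,
ii) $f^E(x^0,\cdot) = f^{T_m}(x^0,\cdot)$,
iii) $f^E(x^0,\cdot)$ is linear,
iv) $f^{T_m}(x^0,\cdot)$ is linear.
Proof. i) ⇒ ii): If $f$ is strictly differentiable at $x^0$, then $f$ is continuous at $x^0$ and, by Theorems 4.7.10 and 4.7.11, we get
$$f^E(x^0,y) = \lim_{t\downarrow 0,\;z\to y,\;x\to x^0}\frac{f(x+tz)-f(x)}{t} = f^{T_m}(x^0,y)\qquad\forall y\in\mathbb{R}^n.$$
ii) ⇒ iii) and iv): Let $f^E(x^0,\cdot) = f^{T_m}(x^0,\cdot)$. As in our previous remarks, from the topological properties of $E(\cdot,\cdot)$ and $T_m(\cdot,\cdot)$ we get $f^{T_m}(x^0,0) = f^E(x^0,0) = 0$, and this directional derivative is finite on $\mathbb{R}^n$. Thus $f^{T_m}(x^0,\cdot)$ and $f^E(x^0,\cdot)$ are linear (convex and concave).
iii) or iv) ⇒ i): Because of the relation $f^E(x^0,y) = -f^{T_m}(x^0,-y)$ from the proof of Theorem 4.7.14, linearity of one of these directional derivatives implies linearity of the other and their equality. From the relations between the associated local cone approximations we get
$$f^{T_m}(x^0,\cdot)\le f^T(x^0,\cdot)\le f^I(x^0,\cdot)\le f^E(x^0,\cdot).$$
Since $f^{T_m}(x^0,\cdot)$ and $f^E(x^0,\cdot)$ are linear, all directional derivatives are equal and in particular $f^T(x^0,0) = f^I(x^0,0) = 0$. By Theorem 4.7.5 we see that $f$ is continuous at $x^0$ and therefore
$$\lim_{t\downarrow 0,\;z\to y,\;x\to x^0}\frac{f(x+tz)-f(x)}{t} = f^E(x^0,y)$$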
for all $y\in\mathbb{R}^n$. By Theorem 4.2.3 this is equivalent to the strict differentiability of $f$ at $x^0$, since the directional derivative is finite, convex and concave, i.e. linear. □
We remark that we can also describe the local Lipschitz property of a function by means of the directional derivatives $f^E(x^0,\cdot)$ and $f^{T_m}(x^0,\cdot)$. It holds
Theorem 4.7.16. The following assertions are equivalent:
i) $f^E(x^0,0) = 0$ (i.e. $f^E(x^0,\cdot)$ is finite),
ii) $f^{T_m}(x^0,0) = 0$ (i.e. $f^{T_m}(x^0,\cdot)$ is finite),
iii) $f$ is Lipschitzian around $x^0$.
Proof. i) ⇔ ii): The equivalence can be derived from Theorem 4.7.14 and the duality of the directional derivatives, according to
$$f^E(x^0,0) = -f^{T_m}(x^0,0).$$
i) and ii) ⇒ iii): From the relations between the associated cone approximations we get
$$0 = f^{T_m}(x^0,0)\le f^T(x^0,0)\le f^I(x^0,0)\le f^E(x^0,0) = 0,$$
so that by Theorem 4.7.5 $f$ is continuous at $x^0$. Assuming that $f$ is not Lipschitzian around $x^0$, we would find sequences $\{x^k\}$ and $\{z^k\}\subset\mathbb{R}^n$ tending to $x^0$ such that
$$\frac{|f(z^k)-f(x^k)|}{\|z^k-x^k\|}\to\infty.$$
We set $y^k = \dfrac{z^k-x^k}{\|z^k-x^k\|}$ and $t_k = \|z^k-x^k\|$. Without loss of generality we can assume that $\{y^k\}$ is convergent (since it is contained in the compact unit sphere) and that $y^k\to y^0$. Then
$$\limsup_{k\to\infty}\frac{|f(x^k+t_ky^k)-f(x^k)|}{t_k} = \infty.$$
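But this is a contradiction, since (we have shown that $f$ is continuous at $x^0$)
$$\limsup_{t\downarrow 0,\;z\to y,\;x\to x^0}\frac{f(x+tz)-f(x)}{t} = f^E(x^0,y)<\infty$$
and
$$\liminf_{t\downarrow 0,\;z\to y,\;x\to x^0}\frac{f(x+tz)-f(x)}{t} = f^{T_m}(x^0,y)>-\infty$$
for all $y\in\mathbb{R}^n$.
iii) ⇒ i): Let $f$ be Lipschitzian around $x^0$ with the Lipschitz constant $L>0$. Then
$$f^E(x^0,0) = \limsup_{t\downarrow 0,\;z\to 0,\;x\to x^0}\frac{f(x+tz)-f(x)}{t}\le\limsup_{z\to 0}L\|z\| = 0$$
and
$$f^E(x^0,0)\ge f^{T_m}(x^0,0) = \liminf_{t\downarrow 0,\;z\to 0,\;x\to x^0}\frac{f(x+tz)-f(x)}{t}\ge-\limsup_{z\to 0}L\|z\| = 0,$$
hence $f^E(x^0,0) = 0$. □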
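Theorem 4.7.16 ties the Lipschitz property to the finiteness of $f^E(x^0,0)$, which for continuous $f$ equals $\limsup_{t\downarrow0,\,z\to0,\,x\to x^0}\frac{f(x+tz)-f(x)}{t}$ by Theorem 4.7.10. The sketch below samples this quantity on shrinking boxes of radius $h$ around $x^0=0$, with step sizes $t$ between $h^3$ and $h$ (an arbitrary choice), for the Lipschitz function $|x|$ and for $\sqrt{|x|}$, which is continuous but not Lipschitzian around $0$. The helper name is hypothetical, and the sampled maxima only hint at the limit.

```python
import math

def strict_sup_at_zero(f, h, n=21, levels=30):
    """Max of (f(x + t*z) - f(x)) / t over x, z in [-h, h] and t from h down to h**3.

    For continuous f and h -> 0 such maxima approximate f^E(0, 0)
    (Theorem 4.7.10 ii) with y = 0); this is a heuristic, not a proof.
    """
    ratio = (h ** 2) ** (1.0 / (levels - 1))              # geometric steps from h to h**3
    ts = [h * ratio ** k for k in range(levels)]
    grid = [h * (2 * i / (n - 1) - 1) for i in range(n)]  # contains 0 and +-h
    return max((f(x + t * z) - f(x)) / t for t in ts for x in grid for z in grid)

def sqrt_abs(x):
    # Continuous at 0, but not Lipschitzian around 0.
    return math.sqrt(abs(x))

results = {h: (strict_sup_at_zero(abs, h), strict_sup_at_zero(sqrt_abs, h))
           for h in (1e-1, 1e-2)}
for h, (lip, non_lip) in results.items():
    print(f"h = {h:g}: |x| -> {lip:.4g}, sqrt|x| -> {non_lip:.4g}")
```

For $|x|$ the sampled values stay below $h$ and tend to $f^E(0,0)=0$, while for $\sqrt{|x|}$ they grow like $1/h$, matching $f^E(0,0)=+\infty$.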
Finally, we can use Theorem 3.4.17 to get a representation of the directional derivatives for convex functions.
Theorem 4.7.17. If $f$ is convex, then for each $y\in\mathbb{R}^n$ it holds
i) $f^E(x^0,y) = f^I(x^0,y)$,
ii) $f^{TC}(x^0,y) = f^T(x^0,y)$.
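Regarding the inclusion diagram for the cone approximations in Section 3.4 we can give the following table, which extends the table on p. 422 as well as the relations given in Section 4.4. It holds
$$\begin{array}{ccccccc}
f^E(x^0,\cdot) & \ge & f^I(x^0,\cdot) & \ge & f^Q(x^0,\cdot) & \ge & f^{Q_m}(x^0,\cdot)\\
\rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$}\\
f^H(x^0,\cdot) & \ge & f^Z(x^0,\cdot) & \ge & f^F(x^0,\cdot) & \ge & f^{F_m}(x^0,\cdot)\\
\rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$} & & \rotatebox[origin=c]{-90}{$\ge$}\\
f^{TC}(x^0,\cdot) & \ge & f^A(x^0,\cdot) & \ge & f^T(x^0,\cdot) & \ge & f^{T_m}(x^0,\cdot)
\end{array}$$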
The following assertions can be summarized:
1) The directional derivatives lying opposite to each other with respect to the fictitious midpoint of the scheme are dual according to the relation
$$f^{K_1}(x^0,\cdot) = -(-f)^{K_2}(x^0,\cdot).$$
2) The directional derivatives in the first row of the scheme are upper semi-continuous and it holds $f^K(x^0,0)\ge 0$. The directional derivatives in the third row of the scheme are lower semi-continuous and it holds $f^K(x^0,0)\le 0$. Concerning the directional derivatives in the second row of the scheme it holds $f^K(x^0,0) = 0$.
3) The directional derivatives in the first column of the scheme are convex. The directional derivatives in the last column of the scheme are concave. The directional derivatives lying in the second and in the third column of the scheme have the property:
4) The function $f$ is directionally differentiable in the classical sense at $x^0$ iff
$$f^Z(x^0,\cdot) = f^F(x^0,\cdot).$$
The function $f$ is Gâteaux differentiable at $x^0$ iff
$$f^Z(x^0,\cdot) = f^F(x^0,\cdot)$$
and this directional derivative is finite and linear. The function $f$ is uniformly directionally differentiable at $x^0$ iff
$$f^I(x^0,\cdot) = f^T(x^0,\cdot).$$
The function $f$ is Fréchet differentiable at $x^0$ iff
$$f^I(x^0,\cdot) = f^T(x^0,\cdot)$$
and this directional derivative is linear. The function $f$ is strictly differentiable at $x^0$ iff
$$f^E(x^0,\cdot) = f^{T_m}(x^0,\cdot).$$
We conclude this section by remarking that there are several other types of local cone approximations and therefore several other associated $K$-directional derivatives. For example, the directional derivatives … mentioned in Section 4.4 can be generated by the prototangent cone of Michel/Penot and related cones. We shall not pursue these special approaches, but we refer e.g. to the papers of Michel/Penot (1984), Jofre/Penot (1989), Ioffe (1986) and Ward (1988).
4.8. Generalized Optimality Conditions
In this section we shall use the concept of abstract local cone approximations given in Definition 4.6.1 and the associated abstract differentiability notions introduced in Definitions 4.6.3 and 4.6.4, respectively, in order to construct necessary optimality conditions for general optimization problems with inequality constraints, i.e. for the problem
$$\min_{x\in S} f(x)\qquad (P)$$
where the feasible set is described by
$$S = \{x\in\mathbb{R}^n\mid g_i(x)\le 0,\ i=1,\ldots,m\}.$$
All the functions $f:\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$ and $g_i:\mathbb{R}^n\to\mathbb{R}\cup\{\pm\infty\}$, $i=1,\ldots,m$, are assumed to be finite and continuous at a certain point $x^0\in S$. If $x^0\in\operatorname{int}(S)$, then of course, for the question whether $x^0$ is a local minimum point, the problem can be regarded as unconstrained. In this case we can give the following simple optimality condition.
Theorem 4.8.1. If $x^0\in\operatorname{int}(S)$ is a local minimum point of the function $f$, then for all local cone approximations $K(\cdot,\cdot)$ with $K(\cdot,\cdot)\subseteq T(\cdot,\cdot)$ (here $T(\cdot,\cdot)$ is the contingent cone approximation) it holds
$$0\in\partial_K f(x^0).$$
Proof. Let $x^0$ be a local minimum point of $f$. Then by Theorems 4.7.3 and 4.4.1 we get
$$f^K(x^0,y)\ge f^T(x^0,y) = f_U(x^0,y)\ge 0\qquad\forall y\in\mathbb{R}^n,$$
which is equivalent to $0\in\partial_K f(x^0)$ by the definition of the $K$-subdifferential. □
An approach for the discussion of the general constrained optimization problem is given in Theorem 4.4.2. Using the contingent cone and the cone of interior directions we can write
$$f^T(x^0,y) = f_U(x^0,y),\qquad f^I(x^0,y) = f^U(x^0,y)$$
(see Theorem 4.7.3), and we can immediately formulate the following necessary optimality criterion:
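Theorem 4.8.2. If $x^0\in S$ is a local minimum point of (P), then for all local cone approximations $K_1(\cdot,\cdot)$ and $K_2(\cdot,\cdot)$ with $K_1(\cdot,\cdot)\subseteq T(\cdot,\cdot)$ and $K_2(\cdot,\cdot)\subseteq I(\cdot,\cdot)$ it holds
i) $f^{K_2}(x^0,y)\ge 0$ for all $y\in K_1(S,x^0)$,
ii) $f^{K_1}(x^0,y)\ge 0$ for all $y\in K_2(S,x^0)$.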
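The assertion is interesting if we use only one local cone approximation $K(\cdot,\cdot)$ with $K(\cdot,\cdot)\subseteq T(\cdot,\cdot)$ and $\operatorname{int}(K(\cdot,\cdot))\subseteq I(\cdot,\cdot)$. Then of course for the local minimum point we have the necessary optimality conditions
$$f^{\operatorname{int}(K)}(x^0,y)\ge 0\qquad\forall y\in K(S,x^0), \qquad (1)$$
$$f^K(x^0,y)\ge 0\qquad\forall y\in\operatorname{int}(K(S,x^0)). \qquad (2)$$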
In general the conditions in Theorem 4.8.2, especially conditions (1) and (2), cannot be expressed in terms of $K$-subdifferentials. For this purpose, but also in order to construct sharper optimality conditions, we should assume the local cone approximation $K(\cdot,\cdot)$ to be convex. With this assumption we are able to formulate the following result.
Theorem 4.8.3. Let $K(\cdot,\cdot)$ be a convex local cone approximation with $K(\cdot,\cdot)\subseteq T(\cdot,\cdot)$ and $\operatorname{int}(K(\cdot,\cdot))\subseteq I(\cdot,\cdot)$. If $x^0\in S$ is a local minimum point of (P) and if one of the following conditions
(B1) $\operatorname{dom}(f^{\operatorname{int}(K)}(x^0,\cdot))\cap K(S,x^0)\ne\emptyset$,
(B2) $\operatorname{dom}(f^K(x^0,\cdot))\cap\operatorname{int}(K(S,x^0))\ne\emptyset$
is fulfilled, then
i) $f^K(x^0,y)\ge 0$ for all $y\in K(S,x^0)$,
ii) $0\in\partial_K f(x^0)+(K(S,x^0))^*$.
Proof. Let $x^0$ be a local minimum point of (P). If (B1) is fulfilled, then, using the above-mentioned optimality condition (1), according to
$$f^{\operatorname{int}(K)}(x^0,y)\ge 0\qquad\forall y\in K(S,x^0),$$
the nonempty convex cones
436
Nonsmooth
optimization
problems
$$A=\operatorname{epi}^{\circ}f^{\operatorname{int}(K)}(x^0,\cdot)=\operatorname{int}(K(\operatorname{epi}f,(x^0,f(x^0)))),\qquad B=K(S,x^0)\times\mathbb{R}_-$$
are disjoint. By the separation theorem (Theorem 2.2.5) we can find a vector $(u,\alpha)\in\mathbb{R}^{n+1}$, $(u,\alpha)\neq(0,0)$, with
$$uy+\alpha r\le 0\quad\forall(y,r)\in A,\qquad uy+\alpha r\ge 0\quad\forall(y,r)\in B.$$
Obviously $\alpha\le 0$, and for $\alpha=0$, because of assumption (B1), we would get $u=0$, which is a contradiction. Without loss of generality we set $\alpha=-1$ and we can conclude
$$uy\le f^{\operatorname{int}(K)}(x^0,y)\quad\forall y\in\operatorname{dom}(f^{\operatorname{int}(K)}(x^0,\cdot)),$$
$$uy\ge 0\quad\forall y\in K(S,x^0),$$
which means (see Theorem 4.6.6), since $\operatorname{dom}(f^{\operatorname{int}(K)}(x^0,\cdot))\neq\emptyset$, that $u\in\partial_{\operatorname{int}(K)}f(x^0)=\partial_K f(x^0)$ and $-u\in(K(S,x^0))^*$. Hence $0=u+(-u)\in\partial_K f(x^0)+(K(S,x^0))^*$.

If (B2) is fulfilled, then, because of the above-mentioned condition (2), according to which
$$f^K(x^0,y)\ge 0\quad\forall y\in\operatorname{int}(K(S,x^0)),$$
the nonempty convex cones
$$C=\operatorname{epi}f^K(x^0,\cdot)=K(\operatorname{epi}f,(x^0,f(x^0))),\qquad D=\operatorname{int}(K(S,x^0))\times\operatorname{int}(\mathbb{R}_-)$$
are disjoint. In the same manner as in the first part of the proof, by use of the separation theorem we can find a vector $u\in\mathbb{R}^n$ with
$$uy\le f^K(x^0,y)\quad\forall y\in\operatorname{dom}(f^K(x^0,\cdot)),$$
$$uy\ge 0\quad\forall y\in\operatorname{int}(K(S,x^0)).$$
Therefore (since $\operatorname{int}(K(S,x^0))\neq\emptyset$) $u\in\partial_K f(x^0)$ and $-u\in(\operatorname{int}(K(S,x^0)))^*=(K(S,x^0))^*$. Hence $0=u+(-u)\in\partial_K f(x^0)+(K(S,x^0))^*$. Thus ii) is fulfilled.

Finally, for the special vector $u\in\partial_K f(x^0)\cap(-K(S,x^0))^*$ it holds
$$uy\ge 0\quad\forall y\in K(S,x^0),$$
which is sufficient for i), since
$$f^K(x^0,y)\ge\sup\{uy\mid u\in\partial_K f(x^0)\}\ge 0\quad\forall y\in K(S,x^0).$$
□
The assertions formulated above have a geometrical character, since the feasible set $S$ is replaced locally by a cone which is determined only by the geometrical structure of $S$. Since the feasible set is described by level sets of several functions, one should also discuss other approximations of $S$ that use the increasing behaviour of the active constraints. Therefore, as an extension of the definitions of Chapter 3, we introduce the linearizing cone and the cone of decreasing directions by means of the $K$-directional derivatives. Let
$$S=\{x\in\mathbb{R}^n\mid g_i(x)\le 0,\ i=1,\dots,m\}$$
and let $x^0\in S$ be a feasible point. All functions are assumed to be continuous at $x^0$. Thus, for local considerations we have to regard only the active constraints. Their index set is denoted by $I(x^0)$, i.e. $I(x^0)=\{i\mid g_i(x^0)=0\}$.

Definition 4.8.1. Let $K(\cdot,\cdot)$ be a local cone approximation.
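1) The set
$$C^K(x^0)=\{y\in\mathbb{R}^n\mid g_i^K(x^0,y)\le 0\ \ \forall i\in I(x^0)\}$$
is called the linearizing cone to $S$ at $x^0$.
2) The set
$$D^K(x^0)=\{y\in\mathbb{R}^n\mid g_i^K(x^0,y)<0\ \ \forall i\in I(x^0)\}$$
is called the cone of decreasing directions to $S$ at $x^0$.
For $I(x^0)=\emptyset$ we set by definition $C^K(x^0)=D^K(x^0)=\mathbb{R}^n$. Obviously we have the simple inclusion $D^K(x^0)\subseteq C^K(x^0)$.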
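Regarding also the local cone approximations $\operatorname{int}(K(\cdot,\cdot))$ and $K(\cdot,\cdot)$, we can derive directly from the definitions (the vertical symbols $\cap$ denote inclusion of the upper cone in the lower one)
$$\begin{array}{ccc}
D^{\operatorname{int}(K)}(x^0) & \subseteq & D^K(x^0)\\
\cap & & \cap\\
C^{\operatorname{int}(K)}(x^0) & \subseteq & C^K(x^0)
\end{array}\qquad(3)$$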
Moreover, because of the upper semicontinuity of the directional derivatives $g_i^{\operatorname{int}(K)}(x^0,\cdot)$, $i\in I(x^0)$, the cone $D^{\operatorname{int}(K)}(x^0)$ is open. In the same manner, because of the lower semicontinuity of the directional derivatives $g_i^K(x^0,\cdot)$, $i\in I(x^0)$, the cone $C^K(x^0)$ is closed.
More results can be stated if the local cone approximation $K(\cdot,\cdot)$ is assumed to be convex. Then all cones contained in (3) are convex.

Theorem 4.8.4. Let $K(\cdot,\cdot)$ be a convex local cone approximation. Then it holds:
i) $D^{\operatorname{int}(K)}(x^0)=\operatorname{int}(D^K(x^0))$.
ii) $\operatorname{int}(C^{\operatorname{int}(K)}(x^0))=\operatorname{int}(C^K(x^0))$.
iii) If $D^K(x^0)\neq\emptyset$, then $\operatorname{int}(D^K(x^0))=\operatorname{int}(C^K(x^0))$.
iv) If $D^{\operatorname{int}(K)}(x^0)\neq\emptyset$, then $\operatorname{cl}(D^{\operatorname{int}(K)}(x^0))=C^K(x^0)$.
(Thus the interiors, but also the closures, of all cones contained in the table (3) coincide.)

Proof. i) Because of (3) and since $D^{\operatorname{int}(K)}(x^0)$ is open, it is sufficient to show that
$$\operatorname{int}(D^K(x^0))\subseteq D^{\operatorname{int}(K)}(x^0).$$
For this let $y\in\operatorname{int}(D^K(x^0))$ and $\varepsilon\ge 0$. Then, because of the convexity, for all $i\in I(x^0)$ we have $(y,\varepsilon)\in\operatorname{int}(K(\operatorname{epi}g_i,(x^0,g_i(x^0))))$ [...]
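(C.Q.3) [...] $\lambda_i\ge 0$, $i\in\{0\}\cup I(x^0)$, not all vanishing, such that
i) $\lambda_0 f^K(x^0,y)+\sum_{i\in I(x^0)}\lambda_i g_i^K(x^0,y)\ge 0\quad\forall y\in\mathbb{R}^n$,
ii) $0\in\lambda_0\partial_K f(x^0)+\sum_{i\in I(x^0)}\lambda_i\partial_K g_i(x^0)$.

Proof. Let $x^0$ be a local minimum point of (P). With respect to Theorem 4.8.10 and the Fan/Glicksberg/Hoffman alternative theorem we get the existence of multipliers $\lambda_i\ge 0$, $i\in\{0\}\cup I(x^0)$, not all vanishing, such that
$$\lambda_0 f^K(x^0,y)+\sum_{i\in I(x^0)}\lambda_i g_i^{\operatorname{int}(K)}(x^0,y)\ge 0$$
for all $y\in\operatorname{dom}(f^K(x^0,\cdot))\cap\bigcap_{i\in I(x^0)}\operatorname{dom}(g_i^{\operatorname{int}(K)}(x^0,\cdot))$. Naturally, this inequality can be extended to all $y\in\mathbb{R}^n$. By Theorem 4.6.6, Theorem 4.6.9 and the associated remark, the subdifferential of the left-hand side at $y=0$ can be derived as $\lambda_0\partial_K f(x^0)+\sum_{i\in I(x^0)}\lambda_i\partial_K g_i(x^0)$, and i) and ii) follow. □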
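If we want to derive a generalized Kuhn/Tucker assertion, we of course have to assume a constraint qualification which ensures that $\lambda_0\neq 0$ in the relations i) and ii) of Theorem 4.8.11. We can state immediately

Theorem 4.8.12. Let $K(\cdot,\cdot)$ be a convex local cone approximation with $K(\cdot,\cdot)\subseteq T(\cdot,\cdot)$ and $\operatorname{int}(K(\cdot,\cdot))\subseteq I(\cdot,\cdot)$. Further let the condition (B3) of Theorem 4.8.11 and the constraint qualification
$$D^K(x^0)\neq\emptyset\qquad\text{(C.Q.5)}$$
be fulfilled. If $x^0\in S$ is a local minimum point of (P), then there exist multipliers $\lambda_i\ge 0$, $i\in I(x^0)$, such that
i) $f^K(x^0,y)+\sum_{i\in I(x^0)}\lambda_i g_i^K(x^0,y)\ge 0\quad\forall y\in\mathbb{R}^n$,
ii) $0\in\partial_K f(x^0)+\sum_{i\in I(x^0)}\lambda_i\partial_K g_i(x^0)$.

Proof. We use the assertion of Theorem 4.8.11. Assuming $\lambda_0=0$, we would get multipliers $\lambda_i\ge 0$, $i\in I(x^0)$, not all vanishing, with
$$\sum_{i\in I(x^0)}\lambda_i g_i^K(x^0,y)\ge 0\quad\forall y\in\mathbb{R}^n,$$
which contradicts the condition (C.Q.5), since for $y\in D^K(x^0)$ the left-hand side is negative. □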
We shall now apply the approach formulated above to the Lipschitzian case, i.e. to problems of the kind
$$\min_{x\in S}f(x)\qquad(PL)$$
with $S=\{x\in\mathbb{R}^n\mid g_i(x)\le 0,\ i=1,\dots,m\}$, where all functions are assumed to be Lipschitzian around the point $x^0$.
If we choose for the local cone approximation $K(\cdot,\cdot)$ the Clarke tangent cone $TC(\cdot,\cdot)$, we have seen in Section 3.4 that in case of $E(S,x^0)\neq\emptyset$ the equality
$$E(S,x^0)=\operatorname{int}(H(S,x^0))=\operatorname{int}(TC(S,x^0))$$
holds; we recall that all three cones are convex. Moreover, regarding the associated $K$-directional derivatives, we have for a function $f$ which is Lipschitzian around $x^0$ the relation $f^{TC}(x^0,0)=0$ and
$$f^{TC}(x^0,y)=f^H(x^0,y)=f^E(x^0,y)=\limsup_{x\to x^0,\ t\downarrow 0}\frac{f(x+ty)-f(x)}{t}$$
for all $y\in\mathbb{R}^n$ (see Theorems 4.7.10, 4.7.13 and 4.7.16). We shall denote this directional derivative, which is finite and continuous on $\mathbb{R}^n$, by $f^\circ(x^0,\cdot)$ (the generalized Clarke derivative), as we did in Section 4.4. For the $K$-subdifferentials we have
$$\partial_{TC}f(x^0)=\partial_H f(x^0)=\partial_E f(x^0)=\{u\in\mathbb{R}^n\mid uy\le f^\circ(x^0,y)\ \ \forall y\in\mathbb{R}^n\}.$$
As in Section 4.5 we denote this subdifferential (the Clarke subdifferential) by $\partial_{Cl}f(x^0)$. By Theorem 4.6.7 this set is bounded and it holds
$$f^\circ(x^0,y)=\max\{uy\mid u\in\partial_{Cl}f(x^0)\}\quad\forall y\in\mathbb{R}^n.$$
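As a rough numerical illustration of the limsup formula for $f^\circ(x^0,y)$, the following Python sketch (our own, not taken from the text) approximates the Clarke derivative by the largest difference quotient over a small grid of base points $x$ near $x^0$ and step sizes $t$. The grid half-width `h`, the sample count `n` and the test functions $|x|$ and $-|x|$ are assumptions chosen for illustration, and the estimate is a heuristic, not a rigorous limit. For both functions the estimate of $f^\circ(0,y)$ is close to $|y|$, so $\partial_{Cl}f(0)=[-1,1]$ in both cases; in particular $0\in\partial_{Cl}f(0)$ also for $-|x|$, whose point $0$ is a maximum, which shows that such subdifferential conditions are only necessary.

```python
def clarke_derivative_estimate(f, x0, y, h=1e-3, n=200):
    """Crude estimate of f°(x0; y) = limsup_{x -> x0, t -> 0+} (f(x + t*y) - f(x)) / t.

    Returns the largest difference quotient over base points x in [x0 - h, x0 + h]
    and step sizes t in (0, h]; a heuristic sketch, not a rigorous limit.
    """
    best = float("-inf")
    for i in range(-n, n + 1):
        x = x0 + h * i / n
        for j in range(1, n + 1):
            t = h * j / n
            best = max(best, (f(x + t * y) - f(x)) / t)
    return best


def neg_abs(x):
    return -abs(x)


for name, func in (("|x|", abs), ("-|x|", neg_abs)):
    for y in (1.0, -1.0, 2.0):
        est = clarke_derivative_estimate(func, 0.0, y)
        print(f"{name}: f°(0; {y}) is approximately {est:.6f}")  # close to |y|
```

Since both test functions have Lipschitz constant 1, every difference quotient is bounded by $|y|$, and suitable base points on the correct side of $0$ attain it, which is why the grid maximum lands at $|y|$.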
Obviously, the condition (B1) of Theorem 4.8.3 and also the condition (B3) of Theorem 4.8.11 are fulfilled, since $\operatorname{dom}(f^\circ(x^0,\cdot))=\mathbb{R}^n$, $\operatorname{dom}(g_i^\circ(x^0,\cdot))=\mathbb{R}^n$, $i=1,\dots,m$, and $0\in TC(S,x^0)$. Thus for the Lipschitzian problem (PL), Theorem 4.8.3 can be formulated in the following simple manner:

Theorem 4.8.13. Let $f$ and $g_i$ be Lipschitzian around $x^0\in S$. If $x^0\in S$ is a local minimum point of (P), then
i) $f^\circ(x^0,y)\ge 0\quad\forall y\in TC(S,x^0)$,
ii) $0\in\partial_{Cl}f(x^0)+(TC(S,x^0))^*$.
To formulate multiplier rules for Lipschitzian problems as well, we shall discuss the linearizing cone and the cone of decreasing directions. Obviously we have
$$C^{TC}(x^0)=\{y\in\mathbb{R}^n\mid g_i^\circ(x^0,y)\le 0\ \ \forall i\in I(x^0)\},$$
which is a closed convex cone, and
$$D^{TC}(x^0)=\{y\in\mathbb{R}^n\mid g_i^\circ(x^0,y)<0\ \ \forall i\in I(x^0)\}.$$
[...]
The functions $f:\mathbb{R}^n\to\mathbb{R}$ and $g:\mathbb{R}^n\to\mathbb{R}^m$ are assumed to be convex and differentiable. The latter means that in the representation $g=(g_1,\dots,g_m)$ all components $g_i:\mathbb{R}^n\to\mathbb{R}$ are differentiable convex functions. Using the Lagrangian
$$L(x,u)=f(x)+ug(x)=f(x)+\sum_{i=1}^m u_ig_i(x)$$
we introduce the Wolfe dual problem according to
$$\max_{(x,u)\in T}L(x,u)\qquad\text{(Dc)}$$
where $T=\{(x,u)\in\mathbb{R}^n\times\mathbb{R}^m\mid\nabla_xL(x,u)=0,\ u\ge 0\}$. We remark that both $x$ and $u$ are dual variables. Let $x\in S$ and $(\bar x,\bar u)\in T$. Then, because of $g(x)\le 0$, $\bar u\ge 0$, $\nabla_xL(\bar x,\bar u)=0$ and the convexity of $L(\cdot,\bar u)$, we get
$$f(x)\ge f(x)+\bar ug(x)=L(x,\bar u)\ge L(\bar x,\bar u)+(x-\bar x)\nabla_xL(\bar x,\bar u)=L(\bar x,\bar u),\qquad(1)$$
i.e. weak duality holds. We remark that the convexity assumption on the functions $f$ and $g$ can be weakened. For weak duality it is sufficient that $L(\cdot,u)$ is pseudoconvex for all fixed $u\ge 0$. In fact, because of $\nabla_xL(\bar x,\bar u)=0$ we get in this case
$$L(x,\bar u)\ge L(\bar x,\bar u)\quad\forall x\in\mathbb{R}^n,$$
and the inequality (1) remains true. Weak duality can fail, however, if we assume only pseudoconvexity of $f$ and quasiconvexity of $g$.

Example 5.3.1 (Mangasarian). Let the objective function $f:\mathbb{R}\to\mathbb{R}$ and the constraint function $g:\mathbb{R}\to\mathbb{R}$ be given by
$$f(x)=-e^{-x^2},\qquad g(x)=-x+1.$$
Duality in convex optimization (Wolfe duality)
465
Then $f$ is pseudoconvex (even strictly pseudoconvex), $g$ is quasiconvex (even linear), and the point $x^0=1$ is the (global) minimum point of the problem
$$\min f(x)\quad\text{s.t. } g(x)\le 0$$
with the optimal value $f(x^0)=-e^{-1}$. The Wolfe dual problem has the representation
$$\max\{-e^{-x^2}-ux+u\}\quad\text{s.t. } 2xe^{-x^2}-u=0,\ u\ge 0.$$
Eliminating $u$, we see that this problem is equivalent to
$$\max\{-e^{-x^2}(2x^2-2x+1)\}\quad\text{s.t. } x\ge 0.$$
Obviously, the objective function is negative and tends to zero with increasing $x$. In particular, for $x=10$ (and therefore $u=20e^{-100}$) we have
$$-181e^{-100}>-e^{-1},$$
which contradicts weak duality.

The following strong duality assertion is a consequence of the Kuhn/Tucker theorem formulated in Chapter 3.

Theorem 5.3.1 (Wolfe). Let $x^0\in S$ be an optimal solution of (Pc) and let a constraint qualification be fulfilled. Then there exists a vector $u^0\ge 0$ such that $(x^0,u^0)$ is an optimal solution of (Dc) and the optimal values of (Pc) and (Dc) are equal.

Proof. From the Kuhn/Tucker theorem we get the existence of a multiplier vector $u^0\ge 0$ such that
$$\nabla_xL(x^0,u^0)=0\quad\text{and}\quad u^0g(x^0)=0.$$
Thus $(x^0,u^0)\in T$ and we have
$$f(x^0)=f(x^0)+u^0g(x^0)=L(x^0,u^0).$$
Because of weak duality, the point $(x^0,u^0)$ is a solution of (Dc). □
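Returning to Example 5.3.1 (Mangasarian), the following Python sketch double-checks its arithmetic. On the Wolfe-feasible set the condition $\nabla_xL(x,u)=0$ forces $u=2xe^{-x^2}$, so the dual objective can be evaluated as a function of $x$ alone; the helper names are our own. At $x=10$ the dual value $-181e^{-100}$ exceeds the primal optimal value $-e^{-1}$, confirming that weak duality fails for this pseudoconvex/quasiconvex pair.

```python
import math


def f(x):
    """Objective of Example 5.3.1: f(x) = -exp(-x^2) (pseudoconvex, not convex)."""
    return -math.exp(-x * x)


def g(x):
    """Constraint function g(x) = -x + 1; the constraint g(x) <= 0 means x >= 1."""
    return -x + 1.0


def wolfe_dual_point(x):
    """Return (u, L(x, u)) for the unique u with dL/dx = 2x exp(-x^2) - u = 0."""
    u = 2.0 * x * math.exp(-x * x)
    return u, f(x) + u * g(x)


primal_value = f(1.0)                 # optimal value -exp(-1), attained at x0 = 1
u10, dual_value = wolfe_dual_point(10.0)

print(primal_value)                   # about -0.3679
print(u10, dual_value)                # u = 20 exp(-100), L = -181 exp(-100)
print(dual_value > primal_value)      # True: weak duality is violated
```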
466
Duality
With respect to our previous remarks concerning generalized convexity, we can also weaken the convexity assumption in this theorem by demanding that the Lagrangian $L(\cdot,u)$ be pseudoconvex for each fixed $u\ge 0$. If we regard the linear optimization problem (PLIN) of the previous section, it is easy to show that the Wolfe dual problem (Dc) is equivalent to (DLIN). In fact, by definition we get
$$\max\{cx-u(Ax-b)-vx\}\quad\text{s.t. } c-uA-v=0,\ u\ge 0,\ v\ge 0.\qquad\text{(Dc)}$$
Eliminating $v$, this means
$$\max\ ub\quad\text{s.t. } c-uA\ge 0,\ u\ge 0,$$
which is the dual problem of linear optimization. Thus (regarding the fact that for polyhedral sets no constraint qualification is necessary), Theorem 5.2.1 is a special case of Theorem 5.3.1. To formulate a converse duality assertion, all functions are assumed to be twice differentiable.

Theorem 5.3.2 (Hanson, Huard). Let $(x^0,u^0)$ be an optimal solution of (Dc) and let the Hessian $\nabla_x^2L(x^0,u^0)$ be nonsingular. Then $x^0$ is an optimal solution of (Pc) and the optimal values of (Pc) and (Dc) are equal.

Proof. Since $\nabla_x^2L(x^0,u^0)$ is nonsingular, by the implicit function theorem there exists a neighborhood of $(x^0,u^0)$ in which the nonlinear equation $\nabla_xL(x,u)=0$ is solvable in terms of $x=x(u)$, i.e. $x(u^0)=x^0$ and $\nabla_xL(x(u),u)=0$. We have assumed that $(x^0,u^0)$ is a maximum point of the dual problem. Thus $u^0$ is also a maximum point of the problem
$$\max\ L(x(u),u)\quad\text{s.t. } u\ge 0.$$
Applying the Kuhn/Tucker theorem to this special problem, we get by differentiation of this composite function with respect to $u$
$$\nabla_xL(x(u^0),u^0)\,\nabla_ux(u^0)+\nabla_uL(x(u^0),u^0)\le 0,$$
$$u^0\big(\nabla_xL(x(u^0),u^0)\,\nabla_ux(u^0)+\nabla_uL(x(u^0),u^0)\big)=0,$$
which means (since $\nabla_xL(x(u^0),u^0)=\nabla_xL(x^0,u^0)=0$) that
$$\nabla_uL(x^0,u^0)=g(x^0)\le 0,\qquad u^0g(x^0)=0.$$
Thus $x^0$ is a feasible point of the primal problem, and by the complementary slackness condition we get the equality
$$f(x^0)=f(x^0)+u^0g(x^0)=L(x^0,u^0).$$
Regarding weak duality, the point $x^0$ is a solution of (Pc). □
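For the linear case discussed before Theorem 5.3.2, the following Python sketch checks weak and strong duality on a small hypothetical instance. We assume here that (PLIN) has the form $\min\{cx\mid Ax\ge b,\ x\ge 0\}$, which is the form consistent with the Lagrangian $cx-u(Ax-b)-vx$ used above. The data $A$, $b$, $c$ and the candidate solutions were chosen by hand for illustration; the code verifies feasibility, shows that the Wolfe dual objective with $v=c-uA$ does not depend on $x$ and equals $ub$, and compares the two objective values using exact rational arithmetic.

```python
from fractions import Fraction as Fr

# Hypothetical data: minimize c x subject to A x >= b, x >= 0.
A = [[Fr(1), Fr(1)], [Fr(1), Fr(2)]]
b = [Fr(4), Fr(6)]
c = [Fr(2), Fr(3)]

x0 = [Fr(2), Fr(2)]   # candidate primal solution (both constraints active)
u0 = [Fr(1), Fr(1)]   # candidate dual solution


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


Ax = [dot(row, x0) for row in A]
uA = [dot(u0, [A[i][j] for i in range(2)]) for j in range(2)]
v = [cj - uAj for cj, uAj in zip(c, uA)]   # v = c - uA from grad_x L = 0

assert all(axi >= bi for axi, bi in zip(Ax, b)) and all(xi >= 0 for xi in x0)
assert all(vj >= 0 for vj in v) and all(ui >= 0 for ui in u0)

# Wolfe dual objective c x - u(Ax - b) - v x is independent of x when v = c - uA.
for x in ([Fr(0), Fr(0)], x0, [Fr(5), Fr(1)]):
    wolfe = dot(c, x) - dot(u0, [dot(row, x) - bi for row, bi in zip(A, b)]) - dot(v, x)
    assert wolfe == dot(u0, b)

print(dot(c, x0), dot(u0, b))   # 10 10: equal values, so both candidates are optimal
```

By weak duality, equal primal and dual values certify optimality of both candidates without solving either problem.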
We can weaken the convexity assumption if we only assume that the objective function is pseudoconvex and the constraint function is quasiconvex. Even though weak duality may then fail, the proof is similar. Because of the relations
$$g(x^0)\le 0,\qquad u^0g(x^0)=0$$
and the assumptions
$$u^0\ge 0,\qquad\nabla_xL(x^0,u^0)=0,$$
the Kuhn/Tucker conditions of the primal problem are fulfilled. In Theorem 3.9.2 we have stated that under the mentioned assumptions these conditions are also sufficient optimality conditions. Thus $x^0$ is a solution of the primal problem (Pc). Similarly, the assertion remains true if the Lagrangian $L(\cdot,u^0)$ is assumed to be pseudoconvex. This is a consequence of Theorem 3.9.3. We can state that if the strong duality relation
$$f(x^0)=f(\bar x)+\bar ug(\bar x)$$
holds, then $x^0$ and $(\bar x,\bar u)$ are optimal solutions of (Pc) and (Dc), respectively. The question arises whether the converse assertion is true, in particular whether the equality $x^0=\bar x$ holds. For this we have

Theorem 5.3.3 (Mangasarian). Let $x^0$ be an optimal solution of (Pc) and $(\bar x,\bar u)$ an optimal solution of (Dc). If a constraint qualification is fulfilled at $x^0$ and if $L(\cdot,\bar u)$ is strictly convex in a neighborhood of $\bar x$, then $x^0=\bar x$ and the optimal values of (Pc) and (Dc) are equal.

Proof. Analogously to Theorem 5.3.1 we can find a vector $u^0\ge 0$ such that $(x^0,u^0)$ is also a solution of the dual problem and it holds
$$f(x^0)=L(x^0,u^0)=L(\bar x,\bar u).$$
We only have to show that $x^0=\bar x$. If $x^0\neq\bar x$, then by the strict convexity we would get (since $\nabla_xL(\bar x,\bar u)=0$)
$$L(x^0,\bar u)>L(\bar x,\bar u)+(x^0-\bar x)\nabla_xL(\bar x,\bar u)=L(\bar x,\bar u)=f(x^0),$$
i.e. $f(x^0)+\bar ug(x^0)>f(x^0)$, or equivalently
$$\bar ug(x^0)>0,$$
which is absurd since $\bar u\ge 0$ and $g(x^0)\le 0$. □
Also in this assertion the convexity assumptions can be weakened. Indeed, if the Lagrangian $L(\cdot,u)$ is pseudoconvex for each fixed $u\ge 0$, then the first part of the theorem remains true. Assuming strict pseudoconvexity of $L(\cdot,\bar u)$, because of $\nabla_xL(\bar x,\bar u)=0$ we get analogously, for $x^0\neq\bar x$,
$$L(x^0,\bar u)>L(\bar x,\bar u)=f(x^0),$$
which would be a contradiction to $\bar u\ge 0$ and $g(x^0)\le 0$.
Lagrange duality
469
5.4. Lagrange Duality
Let $X\subseteq\mathbb{R}^n$ and $U\subseteq\mathbb{R}^m$ be arbitrary sets and $L:X\times U\to\mathbb{R}$ be an arbitrary function. Then we can define two extended real-valued functions $\psi:X\to\bar{\mathbb{R}}$ and $\varphi:U\to\bar{\mathbb{R}}$ according to
$$\varphi(u)=\inf\{L(x,u)\mid x\in X\},\quad u\in U,$$
$$\psi(x)=\sup\{L(x,u)\mid u\in U\},\quad x\in X.$$
Obviously we have
$$\varphi(u)\le L(x,u)\le\psi(x)\quad\forall u\in U\ \forall x\in X,\qquad(1)$$
and in particular
$$\sup\{\varphi(u)\mid u\in U\}\le\inf\{\psi(x)\mid x\in X\}.\qquad(2)$$
In terms of the Lagrangian this means
$$\sup_{u\in U}\inf_{x\in X}L(x,u)\le\inf_{x\in X}\sup_{u\in U}L(x,u).\qquad(3)$$
Thus the last inequalities provide an approach for the construction of a pair of dual optimization problems. First of all we emphasize that the question of equality in (3) is closely connected with the existence of saddle points of the function $L$. We recall that $(x^0,u^0)$ is a saddle point of $L$ (more exactly, a saddle point of $L$ with respect to $X\times U$) iff
$$L(x^0,u)\le L(x^0,u^0)\le L(x,u^0)\quad\forall x\in X\ \forall u\in U.\qquad(4)$$
Lemma 5.4.1. For the points $x^0\in X$ and $u^0\in U$ it holds $\varphi(u^0)=\psi(x^0)$ if and only if $(x^0,u^0)$ is a saddle point of $L$.

Proof. Let $\varphi(u^0)=\psi(x^0)$. Then because of (1) we get
$$\varphi(u^0)=L(x^0,u^0)=\psi(x^0)$$
and
$$L(x^0,u)\le\psi(x^0)=L(x^0,u^0)=\varphi(u^0)\le L(x,u^0)$$
for all $x\in X$ and all $u\in U$. Hence $(x^0,u^0)$ is a saddle point of $L$. Conversely, let $(x^0,u^0)$ be a saddle point of $L$, i.e. let (4) be fulfilled. Then of course we have
$$\psi(x^0)=\sup\{L(x^0,u)\mid u\in U\}\le L(x^0,u^0)\le\inf\{L(x,u^0)\mid x\in X\}=\varphi(u^0),$$
and with (1) we get the equality. □
We remark that in case of $\varphi(u^0)=\psi(x^0)$, which means
$$\max_{u\in U}\inf_{x\in X}L(x,u)=\min_{x\in X}\sup_{u\in U}L(x,u),\qquad(5)$$
the points $x^0$ and $u^0$ are a minimum point of $\psi$ and a maximum point of $\varphi$, respectively. The converse implication, however, is not true: even if a minimum of $\psi$ and a maximum of $\varphi$ exist, in general (5) does not hold.

Example 5.4.1. Let $X=U=[0,2\pi]\subset\mathbb{R}$ and let the function $L$ be given by $L(x,u)=\sin(x+u)$. Then
$$\varphi(u)=\inf\{\sin(x+u)\mid x\in[0,2\pi]\}=\min\{\sin(x+u)\mid x\in[0,2\pi]\}=-1,$$
$$\psi(x)=\sup\{\sin(x+u)\mid u\in[0,2\pi]\}=\max\{\sin(x+u)\mid u\in[0,2\pi]\}=+1$$
for all $x\in X$ and all $u\in U$. Thus $\max\varphi(u)<\min\psi(x)$, or equivalently
$$\max_{u\in U}\min_{x\in X}L(x,u)<\min_{x\in X}\max_{u\in U}L(x,u),$$
i.e. the function $L$ does not admit any saddle point.
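The values in Example 5.4.1 can be reproduced on a grid. The Python sketch below (the grid resolution `N` is our choice) approximates $\varphi(u)=\min_x\sin(x+u)$ and $\psi(x)=\max_u\sin(x+u)$ on $[0,2\pi]$; the gap between $\max\varphi\approx-1$ and $\min\psi\approx+1$ reflects the absence of a saddle point. Note that here $L(\cdot,u)$ is not convex and $L(x,\cdot)$ is not concave, so Theorem 5.4.1 does not apply.

```python
import math

N = 720                                               # grid step 2*pi/N (our choice)
grid = [2.0 * math.pi * k / N for k in range(N + 1)]  # X = U = [0, 2*pi]


def L(x, u):
    return math.sin(x + u)


# phi(u) = min_x L(x, u) and psi(x) = max_u L(x, u), approximated on the grid
phi = [min(L(x, u) for x in grid) for u in grid]
psi = [max(L(x, u) for u in grid) for x in grid]

max_min = max(phi)   # close to -1
min_max = min(psi)   # close to +1
print(max_min, min_max)
```

With `N = 720` the grid contains, for every $u$, a point $x$ with $x+u\equiv 3\pi/2$ (and $\pi/2$) modulo $2\pi$, so the extreme values are hit up to rounding.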
To ensure the equality (5), which is equivalent to the existence of a saddle point of $L$, we have to assume special algebraic and topological properties. We give the following assertion regarding the structure of $\psi$ and $\varphi$.

Lemma 5.4.2.
i) If $L(\cdot,u)$ is lower semicontinuous for all fixed $u\in U$, then $\psi$ is a lower semicontinuous function. If $L(x,\cdot)$ is upper semicontinuous for all fixed $x\in X$, then $\varphi$ is an upper semicontinuous function.
ii) If $X$, $U$ are compact sets and $L(\cdot,\cdot)$ is continuous, then $\psi$ and $\varphi$ are even finite and continuous functions.
iii) Let $X$ and $U$ be convex sets. If $L(\cdot,u)$ is convex for all fixed $u\in U$, then $\psi$ is a convex function. If $L(x,\cdot)$ is concave for all fixed $x\in X$, then $\varphi$ is a concave function.

Proof. By the semicontinuity assumptions on $L(\cdot,\cdot)$ the sets
$$\operatorname{epi}\psi=\bigcap_{u\in U}\operatorname{epi}L(\cdot,u)\quad\text{and}\quad\operatorname{hypo}\varphi=\bigcap_{x\in X}\operatorname{hypo}L(x,\cdot)$$
are closed. Hence $\psi$ and $\varphi$ are lower semicontinuous and upper semicontinuous, respectively. In the same manner we get the convexity properties of these functions. Now let $X$ and $U$ be compact sets and $L(\cdot,\cdot)$ be continuous. Then by the Weierstrass theorem the functions $\varphi$ and $\psi$ are finite and it holds even
$$\psi(x)=\max\{L(x,u)\mid u\in U\},\qquad\varphi(u)=\min\{L(x,u)\mid x\in X\}\qquad(6)$$
for all $x\in X$ and all $u\in U$. We show the continuity of $\psi$. (The proof that $\varphi$ is continuous is the same.) Above we mentioned that $\psi$ is lower semicontinuous. So we have only to show that
$$\limsup_{x\to x^0}\psi(x)\le\psi(x^0)$$
for all $x^0\in X$. Assuming the opposite, we could find a sequence $\{x^k\}\subset X$ tending to $x^0$ such that
$$\lim_{k\to\infty}\psi(x^k)>\psi(x^0).$$
Now by (6) there exists also a sequence $\{u^k\}\subset U$ with
$$\psi(x^k)=\max\{L(x^k,u)\mid u\in U\}=L(x^k,u^k)$$
for all $k\in\mathbb{N}$. Moreover, since $U$ is compact, without loss of generality we can assume that this sequence is convergent with $u^k\to\bar u\in U$. Then, however, because of the continuity of $L$ we would get
$$L(x^0,\bar u)=\lim_{k\to\infty}L(x^k,u^k)=\lim_{k\to\infty}\psi(x^k)>\psi(x^0)=\sup\{L(x^0,u)\mid u\in U\},$$
which is absurd. □
Now we are able to formulate the following minimax theorem.

Theorem 5.4.1 (v. Neumann). Let $X$ and $U$ be convex compact sets, let $L(\cdot,u)$ be convex and lower semicontinuous for all fixed $u\in U$, and let $L(x,\cdot)$ be concave and upper semicontinuous for all fixed $x\in X$. Then there exist points $x^0\in X$ and $u^0\in U$ such that $\varphi(u^0)=\psi(x^0)$.

Proof. First of all we remark that, because of the compactness of the sets and the semicontinuity properties, the infima and suprema are attained, i.e.
$$\varphi(u)=\inf_{x\in X}L(x,u)=\min_{x\in X}L(x,u),\qquad\psi(x)=\sup_{u\in U}L(x,u)=\max_{u\in U}L(x,u)$$
for all $u\in U$, $x\in X$, and
$$\sup_{u\in U}\inf_{x\in X}L(x,u)=\max_{u\in U}\min_{x\in X}L(x,u)=\max_{u\in U}\varphi(u),$$
$$\inf_{x\in X}\sup_{u\in U}L(x,u)=\min_{x\in X}\max_{u\in U}L(x,u)=\min_{x\in X}\psi(x).$$
Now, because of
$$\max_{u\in U}\min_{x\in X}L(x,u)\le\min_{x\in X}\max_{u\in U}L(x,u),$$
we only have to show the converse relation. For this let $\gamma$ be a real number with
$$\gamma<\min_{x\in X}\max_{u\in U}L(x,u).\qquad(6)$$
Then for each point $x\in X$ there exists a point $u\in U$ such that $L(x,u)>\gamma$. This means that
$$\bigcap_{u\in U}\{x\in X\mid L(x,u)\le\gamma\}=\emptyset.$$
Since all these sets are compact, we can also choose finitely many of them whose intersection is empty, i.e. there are points $u^1,\dots,u^p\in U$ with
$$\bigcap_{i=1}^p\{x\in X\mid L(x,u^i)\le\gamma\}=\emptyset.$$
Thus the convex inequality system (with respect to $x$)
$$L(x,u^i)-\gamma<0,\quad i=1,\dots,p,$$
admits no solution. Using the Fan/Glicksberg/Hoffman theorem (Theorem 2.9.1) we can find numbers $\lambda_1,\dots,\lambda_p\ge 0$, not all vanishing, such that
$$\sum_{i=1}^p\lambda_i\big(L(x,u^i)-\gamma\big)\ge 0\quad\forall x\in X.$$
Without loss of generality we can assume $\sum_{i=1}^p\lambda_i=1$, and regarding the concavity of $L(x,\cdot)$ (note that $\sum_{i=1}^p\lambda_iu^i\in U$ since $U$ is convex) we get
$$\gamma=\gamma\sum_{i=1}^p\lambda_i\le\sum_{i=1}^p\lambda_iL(x,u^i)\le L\Big(x,\sum_{i=1}^p\lambda_iu^i\Big)\quad\forall x\in X,$$
which means that
$$\gamma\le\min_{x\in X}L\Big(x,\sum_{i=1}^p\lambda_iu^i\Big)\le\max_{u\in U}\min_{x\in X}L(x,u).$$
Since this relation is true for all numbers $\gamma$ fulfilling (6), it even holds
$$\min_{x\in X}\max_{u\in U}L(x,u)\le\max_{u\in U}\min_{x\in X}L(x,u).$$
□
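The role of the convexity assumptions in Theorem 5.4.1 can be seen on a small matrix game, used here purely as an illustration (the matching-pennies payoff matrix $A$ and the grid are our choices). With $L(x,u)=xAu$, restricting $x$ and $u$ to the two pure strategies (a nonconvex set) gives $\max\min=-1<1=\min\max$, while allowing mixed strategies $x=(p,1-p)$, $u=(q,1-q)$ with $p,q\in[0,1]$ (convex compact sets, $L$ bilinear and hence convex in $x$ and concave in $u$) closes the gap at the value $0$, attained at $p=q=1/2$.

```python
# Matching pennies (our illustrative choice): L(x, u) = x A u with x = (p, 1-p), u = (q, 1-q)
A = [[1.0, -1.0], [-1.0, 1.0]]


def L(p, q):
    x = (p, 1.0 - p)
    u = (q, 1.0 - q)
    return sum(x[i] * A[i][j] * u[j] for i in range(2) for j in range(2))


def max_min(P, Q):
    """max over q in Q of min over p in P of L(p, q)."""
    return max(min(L(p, q) for p in P) for q in Q)


def min_max(P, Q):
    """min over p in P of max over q in Q of L(p, q)."""
    return min(max(L(p, q) for q in Q) for p in P)


pure = [0.0, 1.0]                         # nonconvex strategy sets {0, 1}
mixed = [k / 100.0 for k in range(101)]   # grid on the convex set [0, 1]

print(max_min(pure, pure), min_max(pure, pure))      # -1.0 1.0
print(max_min(mixed, mixed), min_max(mixed, mixed))  # both approximately 0
```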
We remark that the assumptions of the theorem can be weakened. One can show that the assertion even holds if convexity and concavity are replaced by quasiconvexity and quasiconcavity, respectively. We shall not extend our considerations beyond this. For further results we refer e.g. to Sion (1957), (1958), Karlin (1959), Berge/Ghouila-Houri (1962) and Dem'yanov/Malozemov (1972). Now we regard the optimization problem
$$\min_{x\in S}f(x)\qquad(P)$$
where $S:=\{x\in X\mid g_i(x)\le 0,\ i=1,\dots,m\}$. Here $X\subseteq\mathbb{R}^n$ is an open set and $f:X\to\mathbb{R}$ and $g_i:X\to\mathbb{R}$, $i=1,\dots,m$, are arbitrary functions defined on $X$. Also here, for simplification, we define the vector-valued function $g:X\to\mathbb{R}^m$ by $g(x)=(g_1(x),\dots,g_m(x))$. To apply the results discussed above to the construction of a duality concept, we take for $L(\cdot,\cdot)$ the Lagrange function of problem (P) introduced in Chapter 3,
$$L(x,u)=f(x)+ug(x)=f(x)+\sum_{i=1}^m u_ig_i(x),$$
which is defined on the set $X\times U$ with $U=\mathbb{R}^m_+$. By definition we get for $x\in X$
$$\psi(x)=\sup_{u\ge 0}L(x,u)=\sup_{u\ge 0}\{f(x)+ug(x)\}=\begin{cases}f(x)&\text{if } g(x)\le 0,\\+\infty&\text{else.}\end{cases}$$
Thus, on the feasible set the objective function $f$ coincides with the function $\psi$. We get
$$\inf_{x\in X}\psi(x)=\inf_{x\in S}f(x),$$
and the optimization problem (P) can be described equivalently by
$$\min_{x\in X}\psi(x).\qquad(P_L)$$
In the same manner we can formulate a dual optimization problem (the Lagrange dual problem) by
$$\max_{u\ge 0}\varphi(u)\qquad(D_L)$$
with the dual objective function
$$\varphi(u)=\inf\{L(x,u)\mid x\in X\}.$$
Obviously, according to our former remarks, the weak duality relation holds, since
$$\sup_{u\ge 0}\varphi(u)\le\inf_{x\in X}\psi(x)=\inf_{x\in S}f(x),$$
or equivalently (in terms of the Lagrangian)
$$\sup_{u\ge 0}\inf_{x\in X}L(x,u)\le\inf_{x\in X}\sup_{u\ge 0}L(x,u)=\inf_{x\in S}f(x).$$
Moreover, since $L(x,\cdot)$ is linear, the dual objective function $\varphi$ is concave and upper semicontinuous (see Lemma 5.4.2). Thus the dual optimization problem is the maximization of a concave objective function over a convex (even polyhedral) feasible set, i.e. it is equivalent to a convex optimization problem. On the other hand, if (P) is a convex optimization problem (i.e. if all functions $f$ and $g_i$, $i=1,\dots,m$, are assumed to be convex and continuous), then the Lagrangian $L(\cdot,u)$ is a convex and continuous function of $x$, and therefore the function $\psi$ is convex and lower semicontinuous. To construct strong duality assertions, by Lemma 5.4.1 we have to ensure the existence of saddle points of the Lagrangian. In this connection, the minimax theorem of v. Neumann (Theorem 5.4.1) provides an approach to strong duality, but we cannot apply this assertion since the compactness assumptions fail. In Section 3.14, however, some global optimality conditions for convex optimization problems are described in the form of saddle point assertions by means of additional constraint qualifications. We can extend these results by discussing the solvability not only of the primal but also of the dual problem. First, as an extension of Theorem 3.14.1, we get

Theorem 5.4.2. Let $(x^0,u^0)$, $x^0\in X$, $u^0\ge 0$, be a saddle point of $L$ with respect to $X\times\mathbb{R}^m_+$. Then $x^0$ solves (P), $u^0$ solves $(D_L)$ and the optimal values are equal.

Proof. The assertion is a consequence of Lemma 5.4.1, weak duality and the above-mentioned equivalence of (P) and $(P_L)$. □
The proof can also be derived directly from the definition of the Lagrangian. Indeed, let $(x^0,u^0)$ be a saddle point of $L$. Then the saddle point inequalities in (4) can be written as
$$f(x^0)+ug(x^0)\le f(x^0)+u^0g(x^0)\le f(x)+u^0g(x)\quad\forall x\in X\ \forall u\ge 0.$$
Obviously the first inequality is equivalent to
$$ug(x^0)\le u^0g(x^0)\quad\forall u\ge 0,$$
which means that $g(x^0)\le 0$ and (setting $u=0$) $u^0g(x^0)\ge 0$, i.e. even (since $u^0\ge 0$)
$$g(x^0)\le 0,\qquad u^0g(x^0)=0.$$
The second inequality is equivalent to
$$L(x^0,u^0)=\min_{x\in X}L(x,u^0).$$
Thus $(x^0,u^0)$ is a saddle point of $L$ if and only if
$$g(x^0)\le 0,\qquad u^0g(x^0)=0,\qquad L(x^0,u^0)=\min_{x\in X}L(x,u^0).$$
Hence the global Kuhn/Tucker conditions (see Section 3.14) are fulfilled, and because of
$$f(x^0)=L(x^0,u^0)=\min_{x\in X}L(x,u^0)=\varphi(u^0)$$
the points $x^0$ and $u^0$ are solutions of (P) and $(D_L)$, respectively, and strong duality holds. With respect to the existence of saddle points we can use Theorem 3.14.5 to formulate the following direct duality assertion for convex optimization problems.

Theorem 5.4.3. Let $f$ and $g$ be convex functions. If $x^0\in S$ is an optimal solution of (P) and if the Slater condition is fulfilled (i.e. there is a point $\bar x\in X$ with $g_i(\bar x)<0$ for all $i=1,\dots,m$), then there exists a vector $u^0\in\mathbb{R}^m_+$ such that $u^0$ is an optimal solution of $(D_L)$ and the optimal values of (P) and $(D_L)$ are equal.

Proof. By Theorem 3.14.5 we get the existence of a vector $u^0\ge 0$ such that $(x^0,u^0)$ is a saddle point of $L$ with respect to $X\times\mathbb{R}^m_+$. Then of course $u^0$ is a solution of $(D_L)$ and the optimal values are equal. □
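A minimal sketch of Theorem 5.4.3 on a hypothetical one-dimensional convex problem of our own choosing, $\min x^2$ s.t. $g(x)=1-x\le 0$, for which $\bar x=2$ is a Slater point. Here $L(x,u)=x^2+u(1-x)$ is minimized at $x=u/2$, giving $\varphi(u)=u-u^2/4$, which is maximal at $u^0=2$ with $\varphi(u^0)=1=f(x^0)$, $x^0=1$. The Python code below checks these closed-form values and the saddle point inequalities (4) on a grid (the grid is also our choice).

```python
def f(x):
    return x * x


def g(x):
    return 1.0 - x          # constraint g(x) <= 0  <=>  x >= 1


def L(x, u):
    return f(x) + u * g(x)


def phi(u):
    # inf over x of L(x, u): stationary point x = u/2 of the convex quadratic in x
    return L(u / 2.0, u)


x0, u0 = 1.0, 2.0
grid = [k / 100.0 for k in range(-500, 501)]    # x, u in [-5, 5]

dual_value = max(phi(u) for u in grid if u >= 0)
primal_value = min(f(x) for x in grid if g(x) <= 0)

# Saddle point inequalities (4): L(x0, u) <= L(x0, u0) <= L(x, u0)
saddle_ok = all(L(x0, u) <= L(x0, u0) for u in grid if u >= 0) and \
    all(L(x0, u0) <= L(x, u0) for x in grid)

print(dual_value, primal_value, saddle_ok)   # 1.0 1.0 True
```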
In the convex case, Wolfe duality and Lagrange duality are closely connected. Let us compare the dual problems (Dc) and $(D_L)$. For this we have to assume that all functions $f$ and $g_i$, $i=1,\dots,m$, are convex and differentiable on $\mathbb{R}^n$. Let $(x,u)\in T$ be a feasible point of the Wolfe dual problem, i.e. let
$$\nabla_xL(x,u)=0,\qquad u\ge 0.$$
Because of the convexity this means that the Lagrangian $L(\cdot,u)$ (as a function of $x$) attains its minimum at $x$, so that
$$L(x,u)=\varphi(u).$$
Thus the Wolfe dual objective function and the Lagrange dual objective function coincide at such points. This means
$$\sup\{L(x,u)\mid(x,u)\in T\}\le\sup\{\varphi(u)\mid u\ge 0\}\le\inf\{f(x)\mid x\in S\},$$
and therefore strong Wolfe duality implies strong Lagrange duality. The converse assertion is not correct, even in the differentiable convex case. In fact, for the calculation of the Lagrange dual objective function we have to regard
$$\varphi(u)=\inf_{x\in\mathbb{R}^n}L(x,u),$$
and this infimum need not be attained, i.e. there need not be any $x\in\mathbb{R}^n$ with $\nabla_xL(x,u)=0$.

Example 5.4.2. Let $f:\mathbb{R}\to\mathbb{R}$ and $g:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=e^x$ and $g(x)=x$. Then the primal problem
$$\min e^x\quad\text{s.t. } x\le 0$$
has no solution. The optimal value, however, is finite, and it holds
$$\inf\{e^x\mid x\le 0\}=0.$$
Regarding the Lagrange dual problem we get
$$\varphi(u)=\inf_{x\in X}L(x,u)=\inf_{x\in X}(e^x+ux)=\begin{cases}0&\text{if } u=0,\\-\infty&\text{if } u>0,\end{cases}$$
and $\max_{u\ge 0}\varphi(u)=0$. Hence we have strong duality in the sense of
$$\max_{u\ge 0}\varphi(u)=\inf_{x\in S}f(x)=0.$$
For the Wolfe dual problem, however, the feasible set $T$ is empty, since $u\ge 0$ and $\nabla_xL(x,u)=e^x+u=0$ are incompatible. Hence
$$-\infty=\sup_{(x,u)\in T}L(x,u)<\inf_{x\in S}f(x)=0.$$
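The behaviour in Example 5.4.2 can be observed numerically. In the Python sketch below (the sampling grid is our own choice) the Lagrangian $L(x,u)=e^x+ux$ is increasing in $x$ for every $u\ge 0$, since $\partial L/\partial x=e^x+u>0$; hence the stationarity condition of the Wolfe dual is never met, while the infimum over $x$ is only approached as $x\to-\infty$: it tends to $0$ without being attained for $u=0$ and decreases without bound for $u>0$.

```python
import math


def L(x, u):
    # Lagrangian of  min e^x  s.t.  x <= 0
    return math.exp(x) + u * x


def dL_dx(x, u):
    return math.exp(x) + u


xs = [-float(k) for k in range(0, 701)]   # x = 0, -1, ..., -700 (exp(-700) is still > 0)
for u in (0.0, 0.1, 1.0):
    # L(., u) is increasing, so the grid minimum sits at the left end x = -700
    print(u, min(L(x, u) for x in xs))

# Wolfe feasibility needs dL/dx = e^x + u = 0 with u >= 0; on the grid the derivative stays positive
print(all(dL_dx(x, u) > 0 for x in xs for u in (0.0, 0.1, 1.0)))   # True
```

Extending the grid further to the left lowers the minimum for $u>0$ proportionally, whereas for $u=0$ it only moves closer to $0$.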
To obtain further duality assertions using special tools of convex analysis, we shall interpret the dual problem geometrically. For this we regard the so-called upper image set of problem (P),
$$\mathcal{M}(P)=\{(y,z)\in\mathbb{R}^m\times\mathbb{R}\mid\exists x\in\mathbb{R}^n:\ g(x)\le y,\ f(x)\le z\},$$
as a subset of the image space of the vector-valued function $(g,f):\mathbb{R}^n\to\mathbb{R}^{m+1}$ (see Figure 1).

Figure 1.

Obviously, the part of $\mathcal{M}(P)$ associated with the feasible set of the primal problem (P) is situated on the left-hand side, i.e. to the left of the $z$-axis. Denoting the optimal value of (P) by $\alpha$, this value can then be represented by
$$\alpha=\inf\{z\in\mathbb{R}\mid\exists y\le 0:(y,z)\in\mathcal{M}(P)\}=\inf\{z\in\mathbb{R}\mid(0,z)\in\mathcal{M}(P)\},$$
i.e. it is the smallest $z$-component of the left part of $\mathcal{M}(P)$. Now let $u\in\mathbb{R}^m$, $u\ge 0$, and $x\in X$. We have $(g(x),f(x))\in\mathcal{M}(P)$, and we can place a nonvertical hyperplane with the normal vector $(u,1)$ containing the point $(g(x),f(x))$. This hyperplane is described by the equation
$$z+uy=f(x)+ug(x)=L(x,u).$$
Obviously, the value $L(x,u)$ can be recognized as the $z$-component of the point $(0,z)$ of this hyperplane, i.e. of the intersection point of the hyperplane and the $z$-axis (see Figure 1).

Regarding the dual objective function, in the case $\varphi(u)>-\infty$ we can construct a parallel hyperplane described by the equation
$$z+uy=\varphi(u)=\inf_{x\in X}L(x,u),$$
which is a supporting hyperplane to the set $\mathcal{M}(P)$. Also here the value $\varphi(u)$ of the dual objective function is found as the intersection of this hyperplane with the $z$-axis (cf. Figure 1). Obviously $\varphi(u)\le\alpha$. If we want to find the optimal value $\beta$ of the dual problem $(D_L)$, we have to compare all "feasible" supporting hyperplanes to $\mathcal{M}(P)$, i.e. all (nonvertical) supporting hyperplanes with normal vectors of the form $(u,1)$, $u\ge 0$. In Figure 2 this value is illustrated by a special hyperplane with the normal vector $(u^0,1)$. Thus the vector $u^0$ is a solution of the dual problem. We see once more that $\beta\le\alpha$, which means weak duality.
L/
MiV)
,,l) ^(u,l)^^
F i g u r e 2. Moreover, strong duality is equivalent to the existence of a (nonvertical) hyperplane which supports the set M{P)
at the point (0, a ) where a
is the optimal value of the primal problem. Of course, if (P) is a convex optimization problem, then the set M.{P)
is a convex set (the proof is
left to the reader) which can be separated by a hyperplane from the point (0, a ) .
Obviously, this hyperplane is a support hyperplane. Assuming a
Slater condition, which means that there exists a point {y^z) G
M{P)
Lagrange duality
481
with y < 0 (i.e. left of the 2:-axis), we can ensure that this hyperplane is not vertical, i.e. the associated normal vector can be given in the form In this case, u^ is a solution of the dual problem ( D L ) and it holds strong duality in the sense of
max {(p{u) I 'a ^ 0} = inf {/(x) \xeS}
.
If also the primal problem admits a solution x^ E S, then we have even
max{(p{u) \u ^ 0} = min{f{x) \x e S} and {x^^u^) is a saddle point of L. It is useful to introduce the following function. Definition 5.4.1. The function q : IR^ -^ M, according to
q{y) =m({f{x)\x
e X : g{x) S y} ,
is called marginal function of the problem (P). Especially q(0) is the optimal value of the primal problem. We remember that in case of { x € X | g{x) ^ y } = 0 we have q{y) = +00. Further we can state that the epigraph of this function is closely connected with the set M{P). Really, it is easy to show that
epiq = {(y,z) e M"^ x IR\\/e
> 0 3z S z + s : {y,z) E
M{P)}
i.e. epig is the vertical closure of M(P). Thus, a supporting hyperplane to M.{P) is also a supporting hyperplane to epiq and we can emphasize that strong duality is closely connected with the subdifferentiability of the marginal function at the point y = 0. First we shall give some remarks about the structure of this function. Lemma 5.4.3. i)
g(.) is monotonically decreasing, i.e. y^ ^ y^ implies g(y^) ^ g(y^).
ii)
If (P) is a convex optimization problem, then q{.) is a convex function.
482
Duality
iii) If (P) is a convex optimization problem with finite optimal value and if the Slater condition is fulfilled, then g(.) is a properly convex function with 0 € int(dom(g)). (Thus q is even continuous at y =: 0.) Proof. i)
Let y^ S y^. Then q(y^) = M{f(x)\xeX:gix)£y'} ^
ii)
^
ini{fix)\xeX:g{x)Sy^}^q{y^)
.
Let y^^y'^ G dom(g), i.e. q{y^) < oo, g(y^) < oo. Then for each number 71, 72 with q{y^) < 71, qiy'^) < 72. by definition of the marginal function we can find points xi,a:2 G X such that 9{x') S yi ,
fix^)
£ 71 ,
5(^') ^ y ' ,
/ ( ^ ' ) ^ 72 .
Now because of the convexity of / and gi, i = l , . . . , m , for any A G (0,1) we get g^Xx' + (1 - A) x2) ^ A^(xi) + (1 - A) 5(x2) ^ Xy' + (1 - A) y^ and /(Axi + (1 - A) x2) ^ A/(xi) + (1 - A) /(x2) ^ A71 + (1 - A) 72 , i.e. q{Xy^ + (1 - A) y2) ^ inf {/(x) | x G X : ^(x) ^ Ay^ + (1 - A) y^} ^ /(Axi + ( 1 - A ) x 2 ) ^ A7i + ( 1 - A ) 7 2 . Since this inequality holds for all 71 > q{y^) and 72 > q{y'^) we get also q{Xx^ + (1 - A) x2) ^ Xq{x^) + (1 - A) ^(x^) , i.e. q is convex.
g
Lagrange duality
iii) If the Slater condition is fulfilled, then there exists a feasible point $\bar{x} \in X$ with $g(\bar{x}) < 0$. Then for $\bar{y} = g(\bar{x})$ we get
$$q(\bar{y}) = \inf\{f(x) \mid x \in X : g(x) \leqq \bar{y}\} \leqq f(\bar{x}) < \infty,$$
i.e. $\bar{y} \in \operatorname{dom}(q)$. Now by monotonicity, for all points $y > \bar{y}$ we have
$$q(y) \leqq q(\bar{y}) < \infty,$$
i.e. we also have $y \in \operatorname{dom}(q)$; hence $0 \in \operatorname{int}(\operatorname{dom}(q))$. It remains to show that $q$ is proper. By the finiteness of $q(0)$ and, again, the monotonicity of $q(\cdot)$, for all vectors $y \in [\bar{y}, 0] = \{y \in \mathbb{R}^m \mid \bar{y} \leqq y \leqq 0\}$ we get
$$-\infty < q(0) \leqq q(y) \leqq q(\bar{y}) < \infty,$$
so that $q$ is finite on this set. Now, assuming a point $y^1 \in \mathbb{R}^m$ with $q(y^1) = -\infty$ and taking a point $\hat{y}$ of the set $\operatorname{int}[\bar{y}, 0] = \{y \in \mathbb{R}^m \mid \bar{y} < y < 0\}$, we would get
$$q(\lambda y^1 + (1-\lambda)\hat{y}) \leqq \lambda q(y^1) + (1-\lambda) q(\hat{y}) = -\infty$$
for all numbers $\lambda \in (0,1)$. For sufficiently small numbers $\lambda$, however, we have $\lambda y^1 + (1-\lambda)\hat{y} \in [\bar{y}, 0]$, which contradicts the finiteness of $q$ on this set. Hence $q$ is proper. □
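Parts i) and ii) of Lemma 5.4.3 can be checked numerically on a small example. The following Python sketch is only an illustration under stated assumptions: the problem (minimize $f(x) = x^2$ subject to $g(x) = 1 - x \leqq 0$ on $X = \mathbb{R}$), the search interval $[-5, 5]$ and the grid size are hypothetical choices, not taken from the text. For this problem $q(y) = (\max\{1 - y, 0\})^2$; the sketch approximates $q$ by brute force and tests monotonicity and convexity on equally spaced sample points.

```python
# Hypothetical toy problem (assumption, not from the text):
#   minimize f(x) = x**2  subject to  g(x) = 1 - x <= 0,  X = R.
# Its marginal function is q(y) = inf{ f(x) : g(x) <= y }.

def f(x):
    return x * x

def g(x):
    return 1.0 - x

def marginal(y, lo=-5.0, hi=5.0, n=20001):
    """Approximate q(y) by brute force over a grid of x in [lo, hi] (an assumption)."""
    best = float("inf")  # the infimum over an empty feasible set is +infinity
    step = (hi - lo) / (n - 1)
    for k in range(n):
        x = lo + k * step
        if g(x) <= y:
            best = min(best, f(x))
    return best

def marginal_exact(y):
    """Closed form for this toy problem: (max{1 - y, 0})**2."""
    return max(1.0 - y, 0.0) ** 2

ys = [-1.0 + 0.25 * k for k in range(13)]  # y from -1 to 2
qs = [marginal(y) for y in ys]

# (i) monotonically decreasing: y1 <= y2 implies q(y1) >= q(y2)
decreasing = all(a >= b - 1e-9 for a, b in zip(qs, qs[1:]))
# (ii) convexity, tested through nonnegative second differences on the equally spaced ys
convex = all(qs[k - 1] - 2 * qs[k] + qs[k + 1] >= -1e-6 for k in range(1, len(qs) - 1))
print(decreasing, convex)
```

A grid search only approximates the infimum, so the tolerances are deliberately loose; the example also satisfies the Slater condition ($\bar{x} = 2$ gives $g(\bar{x}) = -1 < 0$), in line with part iii).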
There is an interesting relation between the dual objective function and the marginal function. In fact, using the Fenchel conjugation we can formulate

Theorem 5.4.4. $-q^*(-u) = \ldots$

$\ldots \{\ldots(u) \mid u \geqq 0\} = \min\{f(x) \mid x \in S\}$, or equivalently
$$\max_{u \geqq 0}\, \inf_{x \in X} L(x,u) = \min_{x \in X}\, \sup_{u \geqq 0} L(x,u).$$
Proof. Because of the assumption we can use Lemma 5.4.3, so that $q$ is a properly convex function with $0 \in \operatorname{int}(\operatorname{dom}(q))$. Now by Theorem 2.6.6, $\partial q(0) \neq \emptyset$. The rest is a consequence of Theorem 5.4.5. □
5.5. Perturbed Optimization Problems

The Lagrange duality concept is used mainly for convex optimization problems. In nonconvex optimization strong duality usually fails, and therefore other duality concepts have been developed. Most of these concepts are based on a modification of the classical Lagrangian. In this section we give a general approach for the construction of dual optimization problems by means of a perturbation of the given problem and the theory of conjugate functions. In this manner generalized Lagrangians can be associated with the problem, which permit the formulation of strong duality theorems in the form of saddle point assertions. We only give the basic ideas of this approach; for further details we refer to the references. Analogously to the previous section, let the optimization problem (P) be given by
$$\min_{x \in S} f(x) \qquad (P)$$
where $S = \{x \in X \mid g(x) \leqq 0\}$.
Also here, $X \subset \mathbb{R}^n$ is an open set, and $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}^m$ with $g(x) = (g_1(x), \ldots, g_m(x))$ are arbitrary functions defined on $X$. Now we assume that the problem is embedded in a family of perturbed optimization problems with the parameter $y \in Y \subset \mathbb{R}^m$ according to
$$\min_{x \in X} F(x,y) \qquad (P)(y)$$
where $F : X \times Y \to \overline{\mathbb{R}}$ is a suitable perturbation function with the properties
$$F(x,0) = f(x) \quad \forall x \in S \qquad (1)$$
and
$$\inf\{F(x,0) \mid x \in X\} = \inf\{f(x) \mid x \in S\}. \qquad (2)$$
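Thus, by definition the problem $(P)(0)$ is equivalent to (P). The function $F(\cdot,\cdot)$ is called the perturbation function, and the vector $y \in Y$ is called the perturbation parameter.

Example 5.5.1. Let $Y = \mathbb{R}^m$ and
$$F_0(x,y) = \begin{cases} f(x) & \text{if } g(x) \leqq y \\ +\infty & \text{else,} \end{cases} \qquad F_r(x,y) = \begin{cases} f(x) + r\,\|y\|^2 & \text{if } g(x) \leqq y \\ +\infty & \text{else} \end{cases} \quad (r > 0).$$
Then in both cases relations (1) and (2) are fulfilled. $F_0(\cdot,\cdot)$ is frequently called the standard perturbation function.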
Analogously to Lagrange duality, we introduce the marginal function by the following

Definition 5.5.1. The function $q_F : \mathbb{R}^m \to \overline{\mathbb{R}}$ given by
$$q_F(y) = \inf\{F(x,y) \mid x \in X\}$$
is called the generalized marginal function of the problem (P).
Also here $q_F(0)$ is the optimal value of the primal problem. Using the special perturbations of Example 5.5.1, we can state close connections with the marginal function of Lagrange duality discussed in the previous section.

Example 5.5.2. For the standard perturbation function $F_0(\cdot,\cdot)$ of Example 5.5.1 we have
$$q_{F_0}(y) = \inf\{F_0(x,y) \mid x \in X\} = \inf\{f(x) \mid x \in X : g(x) \leqq y\},$$
which is the marginal function of Lagrange duality. Using $F_r(\cdot,\cdot)$ we get
$$q_{F_r}(y) = \inf\{F_r(x,y) \mid x \in X\} = \inf\{f(x) + r\,\|y\|^2 \mid x \in X : g(x) \leqq y\} = q_{F_0}(y) + r\,\|y\|^2.$$
Hence, if (P) is a convex optimization problem, then not only $q_{F_0}$ but also $q_{F_r}$, $r > 0$, are convex functions (see Lemma 5.4.3). If convexity fails, one can try to generate a convex marginal function by adding the term $r\,\|y\|^2$ with $r$ sufficiently large. For the general case we can formulate

Lemma 5.5.1. If $F(\cdot,\cdot)$ is a convex function, then $q_F(\cdot)$ is also convex.

Proof. The proof is similar to that of Lemma 5.4.3. Let $y^1, y^2 \in \operatorname{dom}(q_F)$, i.e. $q_F(y^1) < \infty$, $q_F(y^2) < \infty$. Then for all numbers $\gamma_1, \gamma_2$ with $q_F(y^1) < \gamma_1$, $q_F(y^2) < \gamma_2$, by definition of the marginal function we can find points $x^1, x^2 \in X$ such that
$$q_F(y^1) \leqq F(x^1,y^1) \leqq \gamma_1, \qquad q_F(y^2) \leqq F(x^2,y^2) \leqq \gamma_2.$$
Now, because of the convexity, for any $\lambda \in (0,1)$ we get
$$\begin{aligned} q_F(\lambda y^1 + (1-\lambda)y^2) &= \inf\{F(x, \lambda y^1 + (1-\lambda)y^2) \mid x \in X\} \\ &\leqq F(\lambda x^1 + (1-\lambda)x^2,\ \lambda y^1 + (1-\lambda)y^2) \\ &\leqq \lambda F(x^1,y^1) + (1-\lambda) F(x^2,y^2) \\ &\leqq \lambda\gamma_1 + (1-\lambda)\gamma_2. \end{aligned}$$
Since this inequality holds for all $\gamma_1 > q_F(y^1)$ and $\gamma_2 > q_F(y^2)$, we even get
$$q_F(\lambda y^1 + (1-\lambda)y^2) \leqq \lambda q_F(y^1) + (1-\lambda) q_F(y^2),$$
i.e. $q_F$ is convex. □
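The remark after Example 5.5.2, that the term $r\,\|y\|^2$ can convexify a marginal function, can be seen on a hypothetical nonconvex problem chosen only for illustration: minimize $f(x) = -x^2$ subject to $g(x) = x - 1 \leqq 0$, with $x$ restricted to the interval $[0, 3]$ (a bounded interval is assumed so that the infimum is finite; it is not the open set $X$ of the text). Then $q_{F_0}(y) = -(1+y)^2$ for $-1 \leqq y \leqq 2$, which is concave, and $q_{F_r}(y) = -(1+y)^2 + r y^2$ is convex there exactly when $r \geqq 1$. The Python sketch approximates $q_{F_0}$ on a grid and tests convexity through second differences.

```python
# Hypothetical nonconvex toy problem (assumption, not from the text):
#   minimize f(x) = -x**2  subject to  g(x) = x - 1 <= 0,  x in [0, 3].

def q_F0(y, n=10001):
    """Grid approximation of q_F0(y) = inf{ -x**2 : x in [0, 3], x - 1 <= y }."""
    best = float("inf")  # the infimum over an empty set is +infinity
    for k in range(n):
        x = 3.0 * k / (n - 1)
        if x - 1.0 <= y:
            best = min(best, -x * x)
    return best

def q_Fr(y, r):
    """Marginal function of the perturbation F_r: q_F0(y) + r * y**2 (Example 5.5.2)."""
    return q_F0(y) + r * y * y

def is_convex_on(values, tol=1e-6):
    """Nonnegative second differences on an equally spaced grid (a necessary check only)."""
    return all(values[k - 1] - 2 * values[k] + values[k + 1] >= -tol
               for k in range(1, len(values) - 1))

ys = [-0.5 + 0.1 * k for k in range(11)]
for r in (0.0, 0.5, 2.0):
    print(r, is_convex_on([q_Fr(y, r) for y in ys]))
```

With these choices one expects the test to fail for $r = 0$ and $r = 0.5$ and to succeed for $r = 2$; a second-difference test on finitely many points can of course only refute convexity, not prove it.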
Now we construct a dual optimization problem using results on conjugate functions. By definition we have
$$F^*(v,u) = \sup\{vx + uy - F(x,y) \mid x \in X,\ y \in Y\}$$
and in particular
$$\begin{aligned} -F^*(0,u) &= -\sup\{uy - F(x,y) \mid x \in X,\ y \in Y\} = \inf\{F(x,y) - uy \mid x \in X,\ y \in Y\} \\ &= \inf\{q_F(y) - uy \mid y \in Y\} = -\sup\{uy - q_F(y) \mid y \in Y\} = -q_F^*(u). \end{aligned} \qquad (3)$$
Regarding the inequality $q_F^{**}(y) \leqq q_F(y)$ and taking into consideration that $q_F(0)$ is the optimal value of the primal problem (P), the relation
$$-q_F^*(u) \leqq q_F^{**}(0) \leqq q_F(0) \quad \forall u \in \mathbb{R}^m$$
can be regarded as a weak duality assertion by using the following dual problem
$$\max_{u \in \mathbb{R}^m} -q_F^*(u) = -F^*(0,u). \qquad (D_F)$$
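Since $(D_F)$ only uses the marginal function, a rough numerical sketch (an illustration under assumptions, not a method from the text) can replace the supremum in $q_F^*$ by a maximum over a finite grid. Below, $q$ is the closed-form marginal function $q(y) = (\max\{1 - y, 0\})^2$ of the hypothetical problem "minimize $x^2$ subject to $1 - x \leqq 0$"; both grids are arbitrary choices. Because $y = 0$ belongs to the grid, every computed dual value is bounded by $q(0)$, which is the discrete counterpart of weak duality; the best grid multiplier also lands near $u = -2$, the unique element of $\partial q(0)$ for this $q$.

```python
# Hypothetical toy problem (assumption): minimize x**2 subject to 1 - x <= 0,
# whose marginal function is q(y) = (max{1 - y, 0})**2 with q(0) = 1.
def q(y):
    return max(1.0 - y, 0.0) ** 2

YS = [k / 100 for k in range(-300, 301)]   # perturbation grid, contains y = 0
US = [k / 100 for k in range(-500, 501)]   # grid for the dual variable u

def conjugate(u):
    """Grid approximation of q*(u) = sup_y { u*y - q(y) }."""
    return max(u * y - q(y) for y in YS)

def dual_objective(u):
    """Objective -q*(u) of the dual problem (D_F)."""
    return -conjugate(u)

best_u = max(US, key=dual_objective)
print(best_u, dual_objective(best_u), q(0.0))
```

For this convex $q$ the grid dual value coincides with $q(0) = 1$; for nonconvex marginal functions the same computation can only be expected to give a lower bound.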
To allow a better comparison of this duality concept with the Lagrange duality concept discussed in the previous section, we replace the argument $u$ by $-u$. Naturally, the negative sign has no influence on the optimal value and therefore on the duality conditions. Thus we write
$$\max_{u \in \mathbb{R}^m} -q_F^*(-u) = -F^*(0,-u). \qquad (D_F)$$
In this manner, taking the standard perturbation function $F_0(\cdot,\cdot)$, the dual problem $(D_{F_0})$ coincides with the Lagrange dual problem (see Theorem 5.4.4). Also here the dual objective function is concave and upper semicontinuous, and it holds
$$q_F^{**}(0) = \sup\{0 \cdot u - q_F^*(u) \mid u \in \mathbb{R}^m\} = \sup\{-q_F^*(-u) \mid u \in \mathbb{R}^m\},$$
i.e. $q_F^{**}(0)$ is the optimal value of $(D_F)$. Summarizing these results, we have the pair of dual optimization problems
$$\min_{x \in X} F(x,0) \qquad (P)$$
$$\max_{u \in \mathbb{R}^m} -F^*(0,-u) \qquad (D_F)$$
with the optimal values $q_F(0)$ and $q_F^{**}(0)$ respectively, and the weak duality relation
$$q_F(0) \geqq q_F^{**}(0)$$
holds. Before we give some assertions regarding strong duality, we remark that the dual problem $(D_F)$ is also embedded in a family of perturbed problems, namely
$$\max_{u \in \mathbb{R}^m} -F^*(-v,-u) \qquad (D_F)(v)$$
which are equivalent to
$$\min_{u \in \mathbb{R}^m} F^*(-v,-u).$$
Thus, in the same manner we can construct the bidual problem $(DD_F)$ by
$$\max_{x \in \mathbb{R}^n} -F^{**}(x,0) \qquad (DD_F)$$
or equivalently
$$\min_{x \in \mathbb{R}^n} F^{**}(x,0).$$
If the perturbation function $F$ is properly convex and lower semicontinuous on $X$, then $F^{**} = F$ and the bidual problem is equivalent to the primal problem (P). In the following we discuss strong duality, i.e. we give necessary and sufficient conditions for the equality of the optimal values. Since this is equivalent to
$$q_F(0) = q_F^{**}(0),$$
we have to discuss the local behaviour of the marginal function $q_F$ in a neighborhood of the point $y = 0$. We introduce the following notions.

Definition 5.5.2. The problem (P) (precisely: the family of perturbed problems) is called
i) normal, if $q_F(0)$ is finite and $q_F$ is lower semicontinuous at $y = 0$;
ii) stable, if $q_F(0)$ is finite and $\partial q_F(0) \neq \emptyset$.

Obviously, (P) is normal if it is stable. In fact, in case of $\partial q_F(0) \neq \emptyset$ we have $q_F(0) = q_F^{**}(0)$. Hence, since $q_F^{**}$ is lower semicontinuous and $q_F^{**} \leqq q_F$, $q_F$ is also lower semicontinuous at $y = 0$. Moreover, analogously to Theorem 5.4.5, we can formulate the following assertion.
Theorem 5.5.1. Strong duality holds in the sense of
$$\max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \inf\{f(x) \mid x \in S\}$$
if and only if (P) is stable; in this case $-\partial q_F(0)$ is the set of optimal solutions of $(D_F)$.

Proof. The proof is similar to that of Theorem 5.4.5. Also here the condition $-u^0 \in \partial q_F(0)$ is equivalent to
$$q_F^*(-u^0) + q_F(0) = -u^0 \cdot 0 = 0, \quad \text{i.e.} \quad -q_F^*(-u^0) = q_F(0).$$
Hence $u^0$ is a solution of the dual problem and it holds
$$q_F^{**}(0) = \max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = -q_F^*(-u^0) = q_F(0) = \inf\{f(x) \mid x \in S\}. \qquad \square$$
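Theorem 5.5.1 identifies the solution set of $(D_F)$ with $-\partial q_F(0)$, which need not be a single point. A minimal Python sketch with a hypothetical degenerate problem (not from the text): minimize $f(x) = x$ on $X = \mathbb{R}$ subject to the duplicated constraint $g_1(x) = g_2(x) = -x \leqq 0$. With the standard perturbation, $q(y) = \inf\{x \mid x \geqq -y_1,\ x \geqq -y_2\} = \max\{-y_1, -y_2\}$, so $\partial q(0) = \operatorname{conv}\{(-1,0), (0,-1)\}$, and every $u \geqq 0$ with $u_1 + u_2 = 1$ should solve $(D_F)$. The dual objective $-q^*(-u) = \inf_y \{q(y) + uy\}$ is approximated by a minimum over a bounded grid of $y$; this truncation is an assumption that replaces the value $-\infty$ by a finite negative number.

```python
from itertools import product

def q(y1, y2):
    """Standard marginal function of the hypothetical problem: max(-y1, -y2)."""
    return max(-y1, -y2)

GRID = [k / 2 for k in range(-20, 21)]  # y_i in [-10, 10]; the truncation is an assumption

def dual_objective(u1, u2):
    """Grid approximation of -q*(-u) = inf_y { q(y) + u1*y1 + u2*y2 }."""
    return min(q(y1, y2) + u1 * y1 + u2 * y2 for y1, y2 in product(GRID, GRID))

q0 = q(0.0, 0.0)  # optimal value of (P)
on_segment = [(t, 1.0 - t) for t in (0.0, 0.25, 0.5, 1.0)]   # u >= 0 with u1 + u2 = 1
off_segment = [(0.5, 0.6), (0.2, 0.2), (1.5, 0.0)]

for u in on_segment + off_segment:
    print(u, dual_objective(*u))
```

Every multiplier on the segment reaches the primal value $0$, while the others stay strictly below it, which matches the non-uniqueness of multipliers one expects from duplicated constraints.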
The assertions can be sharpened if the perturbation function $F(\cdot,\cdot)$ is assumed to be convex. Then we have

Theorem 5.5.2. Let $F(\cdot,\cdot)$ be convex and $q_F(0)$ be finite. Then strong duality holds in the sense of
$$\sup\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \inf\{f(x) \mid x \in S\}$$
if and only if (P) is normal.

Proof. By Lemma 5.5.1 the marginal function $q_F$ is convex. Thus $q_F^{**}(0) = q_F(0)$, i.e.
$$\sup\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \inf\{f(x) \mid x \in S\},$$
if and only if $q_F$ is lower semicontinuous at $y = 0$. □
Combining Theorems 5.5.1 and 5.5.2 we get

Theorem 5.5.3. Let $F(\cdot,\cdot)$ be convex and $q_F(0)$ be finite. Then strong duality holds in the sense of
$$\max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \inf\{f(x) \mid x \in S\}$$
if and only if (P) is stable, and this is the case if and only if (P) is normal and $(D_F)$ admits a solution. In all cases $-\partial q_F(0)$ is the set of optimal solutions of $(D_F)$.

A sufficient criterion which guarantees that (P) is stable is given in the following assertion. For this we have to ensure the subdifferentiability of $q_F(\cdot)$ at the point $y = 0$. In Lagrange duality, i.e. by means of the standard perturbation function $F_0(\cdot,\cdot)$, this could be guaranteed by the Slater condition (see Theorem 5.4.6). Now we use a more general condition.

Theorem 5.5.4. Let $F(\cdot,\cdot)$ be convex and $q_F(0)$ be finite. Further, assume that there exists a point $\bar{x} \in X$ such that $F(\bar{x},\cdot)$ is continuous at $y = 0$. Then (P) is stable.

Proof. By assumption, $F(\bar{x},\cdot)$ is continuous at $y = 0$ and therefore it is bounded on a neighborhood $N(0)$ of $y = 0$. Thus there exists a number $K > 0$ such that
$$q_F(y) = \inf\{F(x,y) \mid x \in X\} \leqq F(\bar{x},y) \leqq K \quad \text{for all } y \in N(0).$$
Since $q_F$ is convex and $q_F(0)$ is finite, $q_F$ is finite on this neighborhood. Hence it is continuous at $y = 0$, and by Theorem 2.6.6 we have $\partial q_F(0) \neq \emptyset$. □
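The stability assumption cannot simply be dropped. Consider the hypothetical example (not from the text) "minimize $f(x) = x$ subject to $g(x) = x^2 \leqq 0$". Here $F_0$ is convex and $q(y) = -\sqrt{y}$ for $y \geqq 0$, $q(y) = +\infty$ for $y < 0$, so (P) is normal but $\partial q(0) = \emptyset$; moreover no $\bar{x}$ makes $F_0(\bar{x},\cdot)$ bounded above near $y = 0$, which is what the proof of Theorem 5.5.4 uses. By Theorems 5.5.2 and 5.5.3 one expects equal optimal values but no dual solution. Since $(D_{F_0})$ coincides with the Lagrange dual, the Python sketch uses the closed form $\varphi(u) = \inf_x (x + u x^2) = -1/(4u)$ for $u > 0$, derived by hand for this example.

```python
import math

# Hypothetical problem (assumption): minimize x subject to g(x) = x**2 <= 0.
# The only feasible point is x = 0, so the optimal value is q(0) = 0.

def q(y):
    """Standard marginal function: inf{ x : x**2 <= y }."""
    return -math.sqrt(y) if y >= 0 else math.inf

def lagrange_dual(u):
    """phi(u) = inf_x (x + u*x**2): equals -1/(4u) for u > 0 and -inf otherwise."""
    return -1.0 / (4.0 * u) if u > 0 else -math.inf

# The dual values increase towards q(0) = 0 but never reach it (no dual solution).
for u in (1.0, 10.0, 100.0, 1000.0):
    print(u, lagrange_dual(u))

# Right difference quotients of q at 0 diverge to -inf, so no subgradient exists at 0.
for h in (1e-2, 1e-4, 1e-6):
    print(h, (q(h) - q(0.0)) / h)
```

The supremum of the dual is $0$, equal to the primal value, but it is approached only as $u \to \infty$.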
Finally we introduce a generalization of the classical Lagrangian. In this way we are also able to formulate saddle point assertions for this duality concept by means of the perturbation function $F(\cdot,\cdot)$.

Definition 5.5.3. The function $L_F : X \times \mathbb{R}^m \to \overline{\mathbb{R}}$ given by
$$L_F(x,u) = \inf\{uy + F(x,y) \mid y \in Y\}$$
is called the generalized Lagrangian of (P).

Let us calculate the Lagrangians associated with the special perturbation functions discussed in Example 5.5.1.

Example 5.5.3. Using the standard perturbation function $F_0(\cdot,\cdot)$ we get
$$L_{F_0}(x,u) = \inf\{uy + F_0(x,y) \mid y \in \mathbb{R}^m\} = \inf\{uy + f(x) \mid y \in \mathbb{R}^m : y \geqq g(x)\} = \begin{cases} f(x) + u\,g(x) & \text{if } u \geqq 0 \\ -\infty & \text{else,} \end{cases}$$
which is the classical Lagrangian extended by $-\infty$ for $u \not\geqq 0$. For $F_r(\cdot,\cdot)$ $(r > 0)$ we get
$$\begin{aligned} L_{F_r}(x,u) &= \inf\{uy + F_r(x,y) \mid y \in \mathbb{R}^m\} = \inf\{uy + f(x) + r\,\|y\|^2 \mid y \in \mathbb{R}^m : y \geqq g(x)\} \\ &= f(x) + \inf\{uy + r\,\|y\|^2 \mid y \in \mathbb{R}^m : y \geqq g(x)\} \\ &= f(x) + \sum_{i=1}^m u_i \max\Big\{g_i(x), -\frac{u_i}{2r}\Big\} + \sum_{i=1}^m r\Big(\max\Big\{g_i(x), -\frac{u_i}{2r}\Big\}\Big)^2 = f(x) + \sum_{i=1}^m \alpha_i(x,u), \end{aligned}$$
where
$$\alpha_i(x,u) = \begin{cases} u_i\,g_i(x) + r\,(g_i(x))^2 & \text{if } g_i(x) \geqq -\dfrac{u_i}{2r} \\[4pt] -\dfrac{u_i^2}{4r} & \text{else.} \end{cases}$$
This function is frequently called the augmented Lagrangian.
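The closed form of $\alpha_i(x,u)$ comes from minimizing the one-dimensional quadratic $u_i y + r y^2$ over $y \geqq g_i(x)$: its unconstrained minimizer is $y = -u_i/(2r)$, so the constrained minimizer is $\max\{g_i(x), -u_i/(2r)\}$. A minimal Python sketch (the function names and the brute-force grid are illustrative assumptions) compares the closed form with a grid search and assembles $L_{F_r}(x,u) = f(x) + \sum_i \alpha_i(x,u)$ from given numbers $f(x)$ and $g(x)$.

```python
def alpha(g_i, u_i, r):
    """Closed form of inf{ u_i*y + r*y**2 : y >= g_i } (Example 5.5.3), r > 0."""
    if g_i >= -u_i / (2.0 * r):
        return u_i * g_i + r * g_i * g_i   # the bound y >= g_i is active
    return -u_i * u_i / (4.0 * r)          # the free minimizer y = -u_i/(2r) is feasible

def alpha_brute(g_i, u_i, r, span=10.0, n=20001):
    """Grid search of u_i*y + r*y**2 over y in [g_i, g_i + span] (an approximation)."""
    step = span / (n - 1)
    return min(u_i * (g_i + k * step) + r * (g_i + k * step) ** 2 for k in range(n))

def augmented_lagrangian(f_x, g_x, u, r):
    """L_{F_r}(x, u) = f(x) + sum_i alpha_i(x, u), from the numbers f(x) and g(x)."""
    return f_x + sum(alpha(g_i, u_i, r) for g_i, u_i in zip(g_x, u))

CASES = [(0.5, 1.0, 1.0), (-2.0, 1.0, 1.0), (-1.0, -2.0, 0.5), (0.0, 0.0, 3.0)]
for g_i, u_i, r in CASES:
    print(g_i, u_i, r, alpha(g_i, u_i, r), alpha_brute(g_i, u_i, r))

print(augmented_lagrangian(1.0, [0.5, -2.0], [1.0, 1.0], 1.0))
```

The grid search is only a sanity check; the search window $[g_i, g_i + 10]$ is assumed wide enough to contain the minimizer for the listed cases.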
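Special convexity properties of the generalized Lagrangian can be stated.

Lemma 5.5.2.
i) $L_F(x,\cdot)$ is concave and upper semicontinuous for fixed $x \in X$.
ii) If $F(\cdot,\cdot)$ is a convex function, then $L_F(\cdot,u)$ is also convex for fixed $u \in \mathbb{R}^m$.

Proof. i) It holds
$$-L_F(x,u) = \sup\{-uy - F(x,y) \mid y \in Y\} = [F(x,\cdot)]^*(-u).$$
Thus, for fixed $x \in X$ the function $-L_F(x,\cdot)$ is a special conjugate function. Therefore it is convex and lower semicontinuous as a function of $u$. Hence $L_F(x,\cdot)$ is concave and upper semicontinuous.
ii) The proof is similar to those of Lemmas 5.4.3 and 5.5.1. Let $u \in \mathbb{R}^m$ and $x^1, x^2 \in \operatorname{dom}(L_F(\cdot,u))$, i.e. $L_F(x^1,u) < \infty$, $L_F(x^2,u) < \infty$. Then for all numbers $\gamma_1, \gamma_2$ with $L_F(x^1,u) < \gamma_1$ and $L_F(x^2,u) < \gamma_2$ we can find points $y^1, y^2 \in Y$ with $uy^1 + F(x^1,y^1) \leqq \gamma_1$ and $uy^2 + F(x^2,y^2) \leqq \gamma_2$. By the convexity of $F$, for any $\lambda \in (0,1)$ we get
$$L_F(\lambda x^1 + (1-\lambda)x^2, u) \leqq u(\lambda y^1 + (1-\lambda)y^2) + F(\lambda x^1 + (1-\lambda)x^2,\ \lambda y^1 + (1-\lambda)y^2) \leqq \lambda\gamma_1 + (1-\lambda)\gamma_2.$$
Since this holds for all $\gamma_1 > L_F(x^1,u)$ and $\gamma_2 > L_F(x^2,u)$, we also get
$$L_F(\lambda x^1 + (1-\lambda)x^2, u) \leqq \lambda L_F(x^1,u) + (1-\lambda) L_F(x^2,u),$$
i.e. $L_F(\cdot,u)$ is convex. □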
Now we are able to represent the dual problem also by means of the generalized Lagrangian. In fact, taking the conjugate of the perturbation function $F(\cdot,\cdot)$ we get
$$\begin{aligned} -F^*(0,-u) &= -\sup\{-uy - F(x,y) \mid x \in X,\ y \in Y\} = \inf\{F(x,y) + uy \mid x \in X,\ y \in Y\} \\ &= \inf\{L_F(x,u) \mid x \in X\}. \end{aligned}$$
Thus the dual objective function has the representation
$$-q_F^*(-u) = -F^*(0,-u) = \inf\{L_F(x,u) \mid x \in X\},$$
and for the dual problem we can write
$$\max_{u \in \mathbb{R}^m} \inf\{L_F(x,u) \mid x \in X\}.$$
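To see why the augmented Lagrangian is attractive in nonconvex problems, consider a hypothetical example (not from the text): minimize $f(x) = -x^2$ subject to $g(x) = x - 1 \leqq 0$, with $x$ restricted to $[0, 2]$ (a bounded interval assumed for the illustration); its optimal value is $-1$. A rough grid computation of $\max_u \inf_x L_F(x,u)$ suggests that the classical Lagrangian $L_{F_0}$ leaves a duality gap (dual value $-2$), while $L_{F_r}$ with $r = 2$ closes it. This is consistent with Theorem 5.5.1: one can check by hand that $q_{F_r}(y) \geqq q_{F_r}(0) - 2y$ for $r \geqq 1$, i.e. the problem is stable for this perturbation. The grids and the value $r = 2$ are arbitrary choices of the sketch.

```python
# Hypothetical nonconvex toy problem (assumption, not from the text):
#   minimize f(x) = -x**2 subject to g(x) = x - 1 <= 0 on the interval [0, 2].

XS = [k / 100 for k in range(201)]   # grid on [0, 2], contains 0, 1 and 2
US = [k / 20 for k in range(81)]     # multipliers u in [0, 4]

def f(x):
    return -x * x

def g(x):
    return x - 1.0

def classical_lagrangian(x, u):
    # L_{F_0}(x, u) = f(x) + u*g(x) for u >= 0 (Example 5.5.3)
    return f(x) + u * g(x)

def augmented_lagrangian(x, u, r=2.0):
    # L_{F_r}(x, u) = f(x) + alpha(x, u) for a single constraint (Example 5.5.3)
    gx = g(x)
    if gx >= -u / (2.0 * r):
        return f(x) + u * gx + r * gx * gx
    return f(x) - u * u / (4.0 * r)

def dual_value(lagrangian):
    """max over u of inf over x of the given Lagrangian, both taken on the grids."""
    return max(min(lagrangian(x, u) for x in XS) for u in US)

primal = min(f(x) for x in XS if g(x) <= 0)
print(primal, dual_value(classical_lagrangian), dual_value(augmented_lagrangian))
```

Here $F_r$ is not convex, so Lemma 5.5.2 ii) and the saddle point results below do not apply; the comparison only illustrates weak duality and the stability criterion.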
On the other hand, we can calculate the biconjugate function of $F(x,\cdot)$ according to
$$[F(x,\cdot)]^{**}(y) = \sup\{uy - [F(x,\cdot)]^*(u) \mid u \in \mathbb{R}^m\} = \sup\{-uy - [F(x,\cdot)]^*(-u) \mid u \in \mathbb{R}^m\}.$$
Taking into consideration the equality
$$-L_F(x,u) = \sup\{-uy - F(x,y) \mid y \in Y\} = [F(x,\cdot)]^*(-u),$$
we even get
$$[F(x,\cdot)]^{**}(y) = \sup\{-uy + L_F(x,u) \mid u \in \mathbb{R}^m\},$$
which means in particular
$$[F(x,\cdot)]^{**}(0) = \sup\{L_F(x,u) \mid u \in \mathbb{R}^m\}.$$
If $F$ is properly convex and lower semicontinuous, then of course
$$F(x,0) = [F(x,\cdot)]^{**}(0) = \sup\{L_F(x,u) \mid u \in \mathbb{R}^m\},$$
and the primal problem can also be represented in the form
$$\min_{x \in X} \sup\{L_F(x,u) \mid u \in \mathbb{R}^m\}.$$
Thus, in this case strong duality is closely connected with the existence of saddle points of the generalized Lagrangian. We summarize this result in

Theorem 5.5.5. Let $F(\cdot,\cdot)$ be properly convex and lower semicontinuous. Then strong duality holds in the form
$$\max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \min\{f(x) \mid x \in S\}$$
if and only if $(x^0,u^0)$ is a saddle point of $L_F$.

Proof. By Lemma 5.4.1 the point $(x^0,u^0)$ is a saddle point of $L_F$ if and only if
$$\max_{u \in \mathbb{R}^m} \inf_{x \in X} L_F(x,u) = \min_{x \in X} \sup_{u \in \mathbb{R}^m} L_F(x,u).$$
By our above remarks, however, this is equivalent to
$$\max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \min\{F(x,0) \mid x \in X\} = \min\{f(x) \mid x \in S\}. \qquad \square$$
Finally we can formulate the following direct duality assertion.

Theorem 5.5.6. Let $F(\cdot,\cdot)$ be properly convex and lower semicontinuous. Further, assume that the optimization problem (P) is stable. If $x^0$ is a solution of (P), then there also exists a point $u^0 \in \mathbb{R}^m$ which is a solution of $(D_F)$, and the optimal values are equal.

Proof. Let $x^0$ be a solution of (P). Because of the assumption we can use Theorem 5.5.1 and we get
$$\max\{-q_F^*(-u) \mid u \in \mathbb{R}^m\} = \inf\{f(x) \mid x \in S\} = \min\{f(x) \mid x \in S\}.$$
By Theorem 5.5.5, $(x^0,u^0)$ is a saddle point of $L_F$. Hence $u^0$ is a solution of the dual problem and strong duality holds. □
References to Chapter V

K.J. ARROW, F.J. GOULD and S.M. HOWE (1973), A general saddle point result for constrained optimization, Math. Progr., 5, 225-234.
J.P. AUBIN (1980), Further properties of Lagrange multipliers in nonsmooth optimization, Appl. Math. and Opt., 6, 79-90.
J.P. AUBIN (1993), Optima and Equilibria: An Introduction to Nonlinear Analysis, Springer, Berlin, Heidelberg.
M.S. BAZARAA and C.M. SHETTY (1979), Nonlinear Programming: Theory and Algorithms, John Wiley, New York.
C. BERGE and A. GHOUILA-HOURI (1962), Programmes, Jeux et Réseaux de Transport, Dunod, Paris.
V.F. DEM'YANOV and V.N. MALOZEMOV (1974), Introduction in Minimax, John Wiley.
I. EKELAND and R. TEMAM (1976), Convex Analysis and Variational Problems, North-Holland, Amsterdam.
A.V. FIACCO and G.P. McCORMICK (1968), Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley, New York.
J. GAUVIN (1979), The generalized gradients of a marginal function in mathematical programming, Math. of Op. Res., 4, 458-463.
A.M. GEOFFRION (1971), Duality in nonlinear programming: A simplified applications-oriented development, SIAM Review, 13, 1-37.
J. GLICKSBERG (1952), A further generalization of the Kakutani fixed point theorem with applications to Nash equilibrium points, Proc. A.M.S., 3, 170-174.
E.G. GOL'STEIN (1971), The Theory of Duality in Mathematical Theory and its Applications, Nauka.
E.G. GOL'STEIN (1972), Theory of Convex Programming, A.M.S. Translation Series.
F.J. GOULD (1969), Extensions of Lagrange multipliers in nonlinear programming, SIAM J. Appl. Math., 17, 1280-1297.
M.R. HESTENES (1975), Optimization Theory: The Finite Dimensional Case, John Wiley, New York.
A.D. IOFFE (1979), Necessary and sufficient conditions for a local minimum. 1: A reduction theorem and first order conditions. 2: Conditions of Levitin-Miljutin-Osmolovskij type. 3: Second order conditions and augmented duality, SIAM J. Control Optim., 17, 245-288.
S. KAKUTANI (1941), A generalization of Brouwer's fixed point theorem, Duke Math. J., 8, 457-459.
S. KARLIN (1960), Mathematical Methods and Theory in Games, Programming and Economics, McGraw-Hill, New York.
P.J. LAURENT (1972), Approximation et Optimisation, Hermann, Paris.
D.G. MAHAJAN and M.N. VARTAK (1977), Generalization of some duality theorems in nonlinear programming, Math. Progr., 12, 293-317.
O.L. MANGASARIAN (1969), Nonlinear Programming, McGraw-Hill, New York.
G.P. McCORMICK (1976), Optimality criteria in nonlinear programming, SIAM-A.M.S. Proc., 9, 27-38.
J. v. NEUMANN (1928), Zur Theorie der Gesellschaftsspiele, Math. Ann., 100, 295-320.
B.N. PSHENICHNYI (1980), Convex Analysis and Extremum Problems, Nauka.
R.T. ROCKAFELLAR (1970), Convex Analysis, Princeton Univ. Press, Princeton.
R.T. ROCKAFELLAR (1973), A dual approach to solving nonlinear programming problems by unconstrained optimization, Math. Progr., 5, 354-373.
R.T. ROCKAFELLAR (1974), Conjugate Duality and Optimization, SIAM Publications.
R.T. ROCKAFELLAR (1974), Augmented Lagrange multiplier functions and duality in nonconvex programming, SIAM J. Control, 12, 268-285.
R.T. ROCKAFELLAR (1976), Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. of Op. Res., 1, 97-116.
R.T. ROCKAFELLAR (1976), Lagrange multipliers in optimization, SIAM-A.M.S. Proc., 9, 145-168.
R.T. ROCKAFELLAR (1982), Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming, Math. Progr. Study, 17, 28-66.
R.T. ROCKAFELLAR (1985), Extensions of subgradient calculus with applications to optimization, Nonlinear Analysis, Th. Meth. Appl., 9, 665-698.
M. SION (1957), Sur une généralisation du théorème minimax, C.R. Académie des Sci., 244, 2120-2123.
M. SION (1958), On general minimax theorems, Pac. J. Math., 1958.
J. TIND and L.A. WOLSEY (1981), An elementary survey on general duality theory in mathematical programming, Math. Progr., 21, 241-261.
CHAPTER VI. VECTOR OPTIMIZATION

6.1. Vector Optimization Problems

We say that an optimization problem is a vector (or multiobjective, or multicriteria) problem when the objective function is a vector function $f = (f_1, f_2, \ldots, f_p)$. The variable $x$ ranges over a set $X \subset \mathbb{R}^n$ and may be required to satisfy some further constraints. For the sake of brevity, these further conditions will be considered only through functional constraints and only through inequalities:
$$\min_{x \in S} f(x) \qquad (V.P.)$$
with $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ and $S = \{x \in X : g_j(x) \leqq 0,\ j = 1,\ldots,m\}$. In this chapter, $X$ will always denote an open set. The set $S$ will be called the feasible decision set. A vector optimization problem is implicit in any decision-making process in which the final choice depends on conflicting goals, namely different criteria which one wants to minimize (or to maximize) and which are not reducible to a single one. These criteria are conflicting, and none of them has priority over the others. Politics, business and, in general, group decision making are always concerned with satisfying many different viewpoints. It is not surprising that almost any real-world application of mathematics can offer conflicting multiple criteria. Mathematical economics, game theory, welfare theory, production theory, the theory of equilibria and many other frameworks deal with applications of vector optimization. A point $x \in S$ could be defined as an optimal solution for (V.P.) when it simultaneously minimizes all the objective functions $f_1, f_2, \ldots, f_p$ on the feasible decision set $S$; with this definition, a vector optimization problem would have no further mathematical interest with respect to a scalar one. But this approach would mean that the criteria have no conflicting
nature and are no more than one single index in different forms. So this definition is too narrow to be of practical use. Unlike mathematical programming problems (with a single objective function), in multiobjective optimization problems an optimal solution in the previous sense does not necessarily exist. This fact heavily depends on the nature of (V.P.) problems and not on the hypotheses that we can introduce on the function $f$ and/or on the feasible set $S$. We will also consider the points that simultaneously minimize all $p$ objectives; we will call them ideal or utopia points. Yet we need a broader notion than this extreme and fortunate case. The crucial point is to define an optimality notion that takes into account the different objectives $f_1, \ldots, f_p$.
The difficulty of a vector optimization problem arises from the incomparability of some pairs of alternatives. From a mathematical point of view, a vector optimization problem consists of the search for those values which will be defined as optimal in the partially ordered set $Z = f(S) \subset \mathbb{R}^p$. The set $Z$ is called the image space under the mapping $f$, or the outcome space. Historically, the first satisfactory definition of an optimal solution for (V.P.) was the Pareto optimum. This definition was formulated by the Italian mathematical economist Vilfredo Pareto in his Manuale di Economia Politica (1906) and in some previous papers by the same author at the beginning of the century. A point $x^0 \in S$ is said to be a Pareto optimum (here a Pareto minimum) or a Pareto efficient point when it is nondominated, i.e. there is no $x \in S$ with $f_i(x) \leqq f_i(x^0)$, $\forall i = 1,\ldots,p$, and $f_j(x) < f_j(x^0)$ for at least one index $j$. Hence for every $x \in S$ we cannot have $\Delta f = f(x) - f(x^0) \in \mathbb{R}^p_- \setminus \{0\}$. The theory of vector optimization is at the crossroads of many subjects, and its terminology reflects this position. The terms "minimum"/"maximum" and "optimum" are in line with a mathematical tradition, while words such as "efficient" or "nondominated" are more widely used in business-related topics. In any case we will use all these words synonymously. The notion of Paretian optimality is based on the componentwise
order. So we can extend this definition of efficiency by replacing the componentwise order with a general ordering relation, or by replacing $\mathbb{R}^p_-$ with some set $C$ (a cone, as we will see) in the requirement that there be no $x \in S$ with $\Delta f \in \mathbb{R}^p_- \setminus \{0\}$. In view of these remarks, the next section is devoted to some definitions and to some general properties of binary relations in their connection with (convex, closed and pointed) cones.
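The Pareto definition can be applied directly to a finite list of candidate decisions. The following Python sketch is illustrative only: the bicriteria objective $f(x) = (x^2, (x-2)^2)$ and the candidate points are hypothetical. It keeps exactly those candidates $x^0$ for which no other candidate $x$ satisfies $f(x) - f(x^0) \in \mathbb{R}^p_- \setminus \{0\}$.

```python
def dominates(a, b):
    """True if a - b lies in R^p_- minus {0}: a_i <= b_i for all i and a_j < b_j for some j."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def pareto_minima(candidates, f):
    """Candidates whose image under f is not dominated by the image of another candidate."""
    images = [(x, tuple(f(x))) for x in candidates]
    return [x for x, fx in images if not any(dominates(fy, fx) for _, fy in images)]

def f(x):
    # hypothetical bicriteria objective; its Pareto set on the real line is [0, 2]
    return (x ** 2, (x - 2) ** 2)

print(pareto_minima([-1, 0, 0.5, 1, 1.5, 2, 3], f))
```

The candidates $-1$ and $3$ are discarded because $0$ and $2$ dominate them; the quadratic cost of the pairwise comparison is acceptable for small lists.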
6.2. Conical Preference Orders

Definition 6.2.1. Let $R$ be a binary relation on an arbitrary set $Z \subset \mathbb{R}^p$. We say that $R$ is:
a) reflexive when $xRx$, $\forall x \in Z$;
b) irreflexive when $x \not R x$, $\forall x \in Z$ (the notation $x \not R y$ denotes the negation of $xRy$);
c) symmetric when $xRy$ implies $yRx$, $\forall x, y \in Z$;
d) asymmetric when $xRy$ implies $y \not R x$, $\forall x, y \in Z$;
e) antisymmetric when $xRy$ and $yRx$ imply $x = y$, $\forall x, y \in Z$;
f) transitive when $xRy$ and $yRz$ imply $xRz$, $\forall x, y, z \in Z$;
g) negatively transitive when $x \not R y$ and $y \not R z$ imply $x \not R z$, $\forall x, y, z \in Z$;
h) complete (or connected) when $\forall x, y \in Z$ we have $xRy$ or $yRx$ (possibly both);
i) weakly complete (or weakly connected) when $\forall x, y \in Z$, $x \neq y$, we have $xRy$ or $yRx$;
j) linear when $xRy$ implies $(cx + z)\,R\,(cy + z)$, $\forall c \in \mathbb{R}_+$ and $\forall x, y, z \in Z$ such that $cx + z,\ cy + z \in Z$.
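For finite sets, properties a) to i) of Definition 6.2.1 can be checked by exhaustive enumeration, which gives a quick way to experiment with Theorem 6.2.1. The Python sketch below is a hypothetical helper (not from the text): it represents a relation $R$ by a predicate `R(x, y)` and omits linearity j), which needs the linear structure of $Z$. The usage checks that $<$ on $\{0,1,2,3\}$ is a strict total order and a weak order, while the strict Pareto order on three points of $\mathbb{R}^2$ is a strict partial order that is not negatively transitive.

```python
from itertools import product

def relation_properties(Z, R):
    """Check properties a)-i) of Definition 6.2.1 for a predicate R on a finite set Z."""
    Z = list(Z)
    pairs = list(product(Z, repeat=2))
    triples = list(product(Z, repeat=3))
    return {
        "reflexive": all(R(x, x) for x in Z),
        "irreflexive": all(not R(x, x) for x in Z),
        "symmetric": all(R(y, x) for x, y in pairs if R(x, y)),
        "asymmetric": all(not R(y, x) for x, y in pairs if R(x, y)),
        "antisymmetric": all(x == y for x, y in pairs if R(x, y) and R(y, x)),
        "transitive": all(R(x, z) for x, y, z in triples if R(x, y) and R(y, z)),
        "negatively transitive": all(not R(x, z) for x, y, z in triples
                                     if not R(x, y) and not R(y, z)),
        "complete": all(R(x, y) or R(y, x) for x, y in pairs),
        "weakly complete": all(R(x, y) or R(y, x) for x, y in pairs if x != y),
    }

def pareto_strict(a, b):
    """a strictly below b in the Pareto sense: a <= b componentwise and a != b."""
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b

strict_less = relation_properties(range(4), lambda a, b: a < b)
pareto = relation_properties([(0, 0), (2, -1), (1, 1)], pareto_strict)
print(strict_less)
print(pareto)
```

In the Pareto case, $(0,0)$ and $(2,-1)$ are incomparable, as are $(2,-1)$ and $(1,1)$, yet $(0,0)$ is below $(1,1)$; this is exactly the failure of negative transitivity.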
Remark 6.2.1. Linearity is a compatibility condition between the order and the algebraic structure; it requires that $Z$ be a subset of a linear space (e.g. $Z \subset \mathbb{R}^p$). All the other previous definitions can be given for arbitrary sets $Z$ without linear structure.

Definition 6.2.2. A binary relation $R$ on a set $Z$ is said to be:
a) a preorder when it is reflexive and transitive;
b) a partial order when it is reflexive, antisymmetric and transitive;
c) a total order when it is reflexive, antisymmetric, transitive and complete.

Definition 6.2.3. A binary relation $R$ on a set $Z$ is said to be a weak order when it is asymmetric and negatively transitive.

Remark 6.2.2. A partial order becomes a strict partial order by replacing the hypothesis of reflexivity in b) with irreflexivity. Then, of course, it is asymmetric and hence antisymmetric, since $xRy$ and $yRx$ would imply $xRx$, while $R$ is assumed to be irreflexive. Analogously, a binary relation $R$ is said to be a strict total order when it is irreflexive, transitive and weakly complete. So a strict total order is a strict partial order which is weakly complete; strict total orders and strict partial orders are related in the same way as total orders and partial orders.

Theorem 6.2.1.
i) If $R$ is a strict total order, then it is a weak order.
ii) If $R$ is a weak order, then it is a strict partial order.

Proof. i) $R$ is asymmetric, since the existence of two elements $x, y \in Z$ such that $xRy$ and $yRx$ would lead, by transitivity, to $xRx$, which contradicts the hypothesis of irreflexivity. $R$ is negatively transitive too: for distinct $x, y, z$, the relations $x \not R y$ and $y \not R z$ mean, by weak completeness, that $yRx$ and $zRy$. Thus $zRx$ by transitivity, and hence $x \not R z$ as a consequence of the asymmetry of $R$ (the cases in which two of the elements coincide follow directly from irreflexivity or from the hypotheses).
ii) If the relation $R$ is a weak order, $R$ must be irreflexive: if there were an $x \in Z$ such that $xRx$, $R$ could not be asymmetric. $R$ is antisymmetric since it is asymmetric. $R$ is also transitive; indeed, $\forall x, y, z \in Z$, $xRy$ and $yRz$ imply (owing to the asymmetry hypothesis) $z \not R y$ and $y \not R x$. As $R$ is negatively transitive, we have $z \not R x$ and $xRz$; in fact, assuming $x \not R z$, by the negative transitivity and $z \not R y$ we would get $x \not R y$, which contradicts the assumption. □

We will denote an order relation (partial or total) by $\geqq$; a strict order relation will be indicated by $>$. It is easy to verify that an arbitrary partial order $\geqq$ induces a strict partial order by setting $x > y$ when $x \geqq y$ but not $y \geqq x$, or equivalently when $x \geqq y$ with $x \neq y$. We will write $x \leqq y$ or $x < y$ for $y \geqq x$ or $y > x$.

Example 6.2.1.
a) In $\mathbb{R}^p$ we obtain the lexicographic order when we put $x > y$ if and only if there is an index $k$ with $x_i = y_i$, $\forall i < k$, and $x_k > y_k$.
b) In $\mathbb{R}^p$ we obtain the Pareto order, or componentwise order, by setting $x \geqq y$ if and only if $x_i \geqq y_i$, $\forall i = 1,\ldots,p$. In particular, for the Paretian ordering we will write $x \geq y$ when $x \geqq y$ and $\exists i : x_i > y_i$, and $x > y$ when $x_i > y_i$, $\forall i = 1,\ldots,p$.
c) A partial (componentwise) order in the set of all real sequences is obtained by setting $x = \{x^k\} \geqq y = \{y^k\}$ if and only if $x^k \geqq y^k$, $\forall k \in \mathbb{N}$.

In the previous examples of componentwise ordering we have $x \geqq y$ if and only if the vector (or the sequence) $x - y$ has all nonnegative components, or equivalently if and only if $x - y$ belongs to the convex and pointed cone $\mathbb{R}^p_+$ (respectively: to the convex and pointed cone of the real nonnegative sequences). This connection between a partial order $x \geqq y$ and the membership of $x - y$ in a convex pointed cone is a general statement. We formulate it explicitly for Euclidean spaces, even though it holds for any linear space.
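The orders of Example 6.2.1 a) and b) are easy to state as predicates. The following Python sketch (the function names are hypothetical) contrasts them on a pair of vectors that the lexicographic order ranks but the Pareto order leaves incomparable. For tuples of equal length, Python's built-in comparison `x > y` is itself lexicographic, which gives an independent check.

```python
def lex_greater(x, y):
    """x > y lexicographically: at the first index k with x_k != y_k, x_k > y_k."""
    for xi, yi in zip(x, y):
        if xi != yi:
            return xi > yi
    return False  # equal vectors are not strictly greater

def pareto_geqq(x, y):
    """x >= y in every component (Example 6.2.1 b)."""
    return all(xi >= yi for xi, yi in zip(x, y))

def pareto_geq(x, y):
    """x >= y componentwise with at least one strict component."""
    return pareto_geqq(x, y) and any(xi > yi for xi, yi in zip(x, y))

def pareto_gt(x, y):
    """x > y in every component."""
    return all(xi > yi for xi, yi in zip(x, y))

a, b = (2, 0), (1, 5)
print(lex_greater(a, b), a > b)               # lexicographic comparison, two ways
print(pareto_geqq(a, b), pareto_geqq(b, a))   # Pareto: neither relation holds
print(pareto_geq((1, 2), (1, 1)), pareto_gt((1, 2), (1, 1)))
```

The lexicographic order compares every pair of distinct vectors, whereas the Pareto order does not; this incomparability is what makes vector optimization different from the scalar case.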
Theorem 6.2.2.
i) If $\mathbb{R}^p$ is partially ordered by a linear relation $\geqq$, then the set $C = \{y \in \mathbb{R}^p : y \geqq 0\}$ is a convex pointed cone (which includes the origin).
ii) If $C$ is a convex and pointed cone of $\mathbb{R}^p$ such that $0 \in C$, then the binary relation defined by $x \geqq y$ when $x - y \in C$ is a linear partial order in $\mathbb{R}^p$.

The proof of the theorem is immediate, and it shows a strong connection between the antisymmetry of the relation $\geqq$ and the pointedness of the ordering cone $C$. The same bi-implication exists between the transitive and linear properties and the convexity of $C$, and between the reflexive property and the assumption that $0 \in C$. From now on we will consider only partial orders generated by nontrivial convex and pointed cones (including the origin), and we will write:
$$x \geqq_C y \quad \text{for } x - y \in C; \qquad x \geq_C y \quad \text{for } x - y \in C \setminus \{0\}.$$
We will drop the subscript $C$ and more simply write $x \geqq y$ and $x \geq y$ when it is clear that $C$ is the ordering cone. It is also convenient to assume that $\operatorname{int} C \neq \emptyset$ and that $C$ is a closed cone. These requirements should not be understood merely as technical conditions. Indeed, if $\operatorname{int} C = \emptyset$, a pair of vectors may not admit a minorant, as one can see by considering $C \subset \mathbb{R}^2$, $C = \{(x,y) : x = 0,\ y \geqq 0\}$ (for instance, the vectors $(1,0)$ and $(0,0)$ have no common minorant). The closedness of $C$ is a sufficient condition which assures that the order relation satisfies the Archimedean property ($a \geqq nb$ for all $n \in \mathbb{N}$ implies $b \leqq 0$). We will also generalize the distinction already introduced for the Paretian order by using the notation $x > y$, which means $x - y \in \operatorname{int} C$. It is easy to verify that the relation $x > y$ is also irreflexive and transitive.
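Theorem 6.2.2 ii) says that any convex pointed cone $C$ containing the origin defines a linear partial order through $x \geqq_C y \iff x - y \in C$. A minimal Python sketch, assuming a polyhedral cone $C = \{c : Ac \geqq 0\}$ whose matrix $A$ has nonzero rows and full column rank (which makes $C$ pointed) and some $c$ with $Ac > 0$ (so that $\operatorname{int} C = \{c : Ac > 0\}$ is nonempty). The two matrices are hypothetical examples: one gives $C = \mathbb{R}^2_+$, the other a larger cone that makes more pairs comparable.

```python
def in_cone(A, c, strict=False):
    """Test A c >= 0 (or A c > 0 when strict=True) for the polyhedral cone {c : A c >= 0}."""
    values = [sum(a_ij * c_j for a_ij, c_j in zip(row, c)) for row in A]
    return all(v > 0 for v in values) if strict else all(v >= 0 for v in values)

def geqq_C(A, x, y):
    """x >=_C y  iff  x - y lies in C."""
    return in_cone(A, [xi - yi for xi, yi in zip(x, y)])

def geq_C(A, x, y):
    """x >=_C y and x != y, i.e. x - y lies in C minus {0}."""
    return geqq_C(A, x, y) and any(xi != yi for xi, yi in zip(x, y))

def gt_C(A, x, y):
    """x >_C y  iff  x - y lies in int C = {c : A c > 0} (under the stated assumptions)."""
    return in_cone(A, [xi - yi for xi, yi in zip(x, y)], strict=True)

PARETO = [[1, 0], [0, 1]]   # C = R^2_+
WIDER = [[1, 1], [1, 2]]    # hypothetical cone {c : c1 + c2 >= 0, c1 + 2*c2 >= 0}, contains R^2_+

x, y = (2.0, -0.5), (0.0, 0.0)
print(geqq_C(PARETO, x, y), geqq_C(WIDER, x, y), gt_C(WIDER, x, y))
print(geq_C(PARETO, (1, 0), (0, 0)), gt_C(PARETO, (1, 0), (0, 0)))
```

The pair $(2, -0.5)$, $(0, 0)$ is incomparable for the Pareto cone but strictly ordered for the larger cone, and $(1,0) \geq (0,0)$ holds without $(1,0) > (0,0)$, illustrating the three relations $\geqq$, $\geq$ and $>$.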
6.3. Optimality (or Efficiency) Notions

In the previous section we introduced the general notion of ordering relations, which we will express explicitly through a general closed convex pointed cone $C$ (with $\operatorname{int} C \neq \emptyset$). The componentwise order and the cone $\mathbb{R}^p_+$ are now only a particular case. In particular, in this section we give the definitions of ideal efficiency, efficiency and weak efficiency (or minimality) for a set $Z \subset \mathbb{R}^p$ when $\mathbb{R}^p$ is partially ordered by a closed convex pointed cone $C$ (with $\operatorname{int} C \neq \emptyset$). In Section 4 proper efficiency will be introduced. Naturally, maximality notions can be obtained analogously by exchanging the order relations $\geqq$ and $\leqq$.

Definition 6.3.1.
a) An element $z^0 \in Z$ is said to be an ideal (or utopia) minimal or efficient value of the set $Z$ when $z \geqq z^0$, $\forall z \in Z$; we will write $z^0 \in IE(Z)$;
b) $z^0 \in Z$ is said to be a minimal or efficient value of $Z$ ($z^0 \in E(Z)$) when $z^0 \geqq z$ for some $z \in Z$ implies $z \geqq z^0$ (i.e. $z = z^0$), or equivalently when there is no $z \in Z$ with $z^0 \geq z$;
c) $z^0 \in Z$ is said to be a weakly minimal or weakly efficient value of $Z$ ($z^0 \in WE(Z)$) when $z^0$ is a minimal value of the set $Z$ ordered by the cone $K = \{0\} \cup \operatorname{int} C$, i.e. when there is no $z \in Z$ with $z^0 > z$.

We will write $z^0 \in IE_C(Z)$, $z^0 \in E_C(Z)$, $z^0 \in WE_C(Z)$, respectively, when we wish to emphasize the ordering cone of the set $Z$. For the case $Z = f(S)$, we have the definitions of ideal (or utopia) points, of efficient points (or solutions of (V.P.)) and of weakly efficient points. For a point $x^0 \in S$ with $z^0 = f(x^0)$, we will write $x^0 \in IE(S)$, $x^0 \in E(S)$, $x^0 \in WE(S)$ if and only if $z^0 \in IE(Z)$, $z^0 \in E(Z)$, $z^0 \in WE(Z)$, respectively. When it is nonempty, the set $IE(Z)$ is a singleton, by the antisymmetry of the relation $\geqq$. An ideal minimal value dominates any other value of $Z$; for $Z \subset \mathbb{R}^p$ endowed, for example, with the componentwise order, $z^0$ is the value with the minimal components $z^0_i$ for all $i = 1,\ldots,p$. The notion of minimal value weakens the previous one: if $z^0 \in E(Z)$, there can be values of $Z$ which are not comparable with $z^0$, but $z^0$ is the best point with respect to those which admit comparability. In other words, $z^0$ is not dominated by any value of $Z$. Of course $IE(Z) \subset E(Z)$; Theorem 6.3.2 will make this relation more precise. The definition of weak minimality can easily be explained with the aid of the particular cone $C = \mathbb{R}^p_+$: then $z^0 \in WE(Z)$ when there does not exist any $z \in Z$ with $z_i < z^0_i$ ($\forall i = 1,\ldots,p$). Theorem 6.3.1 will show the general inclusion of $E(Z)$ in $WE(Z)$.
Remark 6.3.1. Definitions 6.3.1 can be given in local form by considering $z$ as ranging only over a neighborhood of $z^0 \in Z$. In particular, $z^0$ is called a local minimal value of $Z$ when there exists $\delta > 0$ such that $z^0 \in E(Z \cap N_\delta(z^0))$. If $Z$ is convex, then every local minimal value of $Z$ is also a minimal value: we have to show that $z^0 \in E(Z \cap N_\delta(z^0))$ implies $z^0 \in E(Z)$. Suppose that $z^0 \in E(Z \cap N_\delta(z^0)) \setminus E(Z)$. Then there exists $z^1 \in Z$ such that $z^0 - z^1 = c \in C \setminus \{0\}$. Since $Z$ is convex, for every $\delta > 0$ there exists $\lambda \in (0,1)$ for which $z^2 = z^0 - \lambda c \in Z \cap N_\delta(z^0)$. It follows that $z^0 - z^2 = \lambda c \in C \setminus \{0\}$. Thus $z^0 \notin E(Z \cap N_\delta(z^0))$, a contradiction. □
Remark 6.3.2. As a minimal value of Z is also a maximal w i t h respect t o the partial order induced by —C, it is sufficient t o limit our research t o minimality notions. T h e different efficiency definitions have been formulated through the ordering relation
^ . The same definitions can be given emphasizing the
geometrical features of the image set Z. So it will often be easier t o verify Definitions 6.3.1 since they are reduced t o an inclusion or t o an intersection between two sets and one of them is a prefixed cone.
Definition 6.3.1 a) can be rewritten in the form $z - z^0 \in C$, $\forall z \in Z$, or $Z - z^0 \subset C$. Equivalently, $z^0 \in E(Z)$ if and only if there is no element $z \in Z$ such that $z^0 - z \in C \setminus \{0\}$, i.e. $z - z^0 \in -C \setminus \{0\}$. This condition can be replaced by the set equation
$$(Z - z^0) \cap (-C) = \{0\} \qquad (\text{or } Z \cap (z^0 - C) = \{z^0\}),$$
which offers a useful geometric characterization of minimal values. They can be determined by sliding the cone $-C$ until it touches the boundary of $Z$; only those points that can be touched by the vertex of the cone alone are candidates. Analogously, $z^0 \in WE(Z) = E_K(Z)$ with $K = \{0\} \cup \operatorname{int} C$ if and only if $Z \cap (z^0 - K) = \{z^0\}$ or, equivalently, $Z \cap (z^0 - \operatorname{int} C) = \emptyset$.
Example 6.3.1. In $\mathbb{R}^2$ let us consider the ordering cone $C = \mathbb{R}^2_+$. For the set $Z = \{(z_1, z_2) : -1 \leqq z_2 \leqq 0,\ z_2 \geqq -z_1 - 1\}$ it is easy to verify that
$$IE(Z) = \emptyset; \quad E(Z) = \{(z_1, z_2) : z_2 = -z_1 - 1,\ -1 \leqq z_2 \leqq 0\}; \quad WE(Z) = E(Z) \cup \{(z_1, z_2) : z_2 = -1,\ z_1 \geqq 0\}.$$
Figure 1. [The set $Z$ of Example 6.3.1; axes $z_1$, $z_2$.]
Theorem 6.3.1. The following inclusions hold:
$$IE(Z) \subset E(Z) \subset WE(Z).$$
Proof. We have already remarked that $IE(Z)$ is contained in $E(Z)$, so we only have to prove the second inclusion. Let $z^0 \in E(Z)$, i.e. $Z \cap (z^0 - C) = \{z^0\}$. Then, for $K = \{0\} \cup \operatorname{int} C$, we have $z^0 \in Z \cap (z^0 - K) \subset Z \cap (z^0 - C) = \{z^0\}$, which ensures that $z^0 \in WE(Z)$. □
Theorem 6.3.2. If the set $IE(Z)$ is not empty, we have $IE(Z) = E(Z)$, and this set is reduced to a singleton.
Proof. We have to prove that $E(Z) \subset IE(Z)$. If $z \in E(Z)$ and $z^0 \in IE(Z)$, then $z^0 \leqq z$; since $z$ is efficient, we deduce $z \leqq z^0$, and $z = z^0$ by the antisymmetric property. □
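For a finite set $Z \subset \mathbb{R}^p$ ordered by $C = \mathbb{R}^p_+$, the three notions $IE(Z)$, $E(Z)$ and $WE(Z)$ can be checked by direct pairwise comparison. The following Python sketch is only an illustration under that assumption (finite $Z$ with distinct points, componentwise order); the function names are hypothetical. It implements literally the characterizations $Z - z^0 \subset C$, $Z \cap (z^0 - C) = \{z^0\}$ and $Z \cap (z^0 - \operatorname{int} C) = \emptyset$, so that Theorems 6.3.1 and 6.3.2 can be observed on small examples.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def leq(a: Sequence[float], b: Sequence[float]) -> bool:
    """a <= b componentwise, i.e. b - a lies in C = R^p_+."""
    return all(x <= y for x, y in zip(a, b))


def less_int(a: Sequence[float], b: Sequence[float]) -> bool:
    """a < b in every component, i.e. b - a lies in int C."""
    return all(x < y for x, y in zip(a, b))


def ideal_points(Z: List[Point]) -> List[Point]:
    """IE(Z): points z0 of Z with Z - z0 contained in C."""
    return [z0 for z0 in Z if all(leq(z0, z) for z in Z)]


def efficient_points(Z: List[Point]) -> List[Point]:
    """E(Z): points z0 of Z with Z cap (z0 - C) = {z0}."""
    return [z0 for z0 in Z
            if not any(leq(z, z0) and z != z0 for z in Z)]


def weakly_efficient_points(Z: List[Point]) -> List[Point]:
    """WE(Z): points z0 of Z with Z cap (z0 - int C) empty."""
    return [z0 for z0 in Z if not any(less_int(z, z0) for z in Z)]


if __name__ == "__main__":
    Z1 = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0), (2.0, 2.0), (3.0, 0.0)]
    print(ideal_points(Z1))             # []
    print(efficient_points(Z1))         # [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]
    print(weakly_efficient_points(Z1))  # the efficient points plus (3.0, 0.0)
```

In the sample set, $(3, 0)$ is weakly efficient but not efficient, much like the points of the half-line $z_2 = -1$, $z_1 > 0$ in Example 6.3.1.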
6.4. Proper Efficiency
The more restrictive notion of proper efficiency was first introduced by H.W. Kuhn and A.W. Tucker in their classical paper of 1951. In order to understand their aim, we have to go back to our vector optimization problem:
$$\min_{x \in S} f(x) \qquad (V.P.)$$
with $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$, $X$ open and $S = \{x \in X : g_j(x) \leqq 0,\ j = 1, \dots, m\}$. Kuhn and Tucker intended to get rid of those solutions that cannot be satisfactorily characterized by a scalar minimization problem (see Section 7 below) and, more generally, to avoid some undesirable situations. In Kuhn-Tucker's definition of proper efficiency the Paretian ordering is considered. $I(x^0)$ will denote the set of binding (active) constraints at $x^0 \in S$: $I(x^0) = \{j : g_j(x^0) = 0\}$.
Definition 6.4.1. Let $f$ and $g = (g_1, \dots, g_m)$ be differentiable functions. A point $x^0 \in S$ is said to be Kuhn-Tucker properly efficient ($x^0 \in PE(S)_{KT}$) when it is efficient and there is no vector $y \in \mathbb{R}^n$ such that
$$Jf(x^0)\, y \leqq 0,\ \ Jf(x^0)\, y \neq 0, \qquad Jg(x^0)\, y \leqq 0,$$
where $J$ denotes the Jacobian matrix and, in particular, $Jg$ only concerns the functions $g_k$ with $k \in I(x^0)$. We will also write $z^0 = f(x^0) \in PE(Z)_{KT}$.
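When $n = 1$ the system of Definition 6.4.1 is positively homogeneous in $y$, so it has a solution if and only if $y = 1$ or $y = -1$ solves it. The sketch below (hypothetical helper names; the restriction to $n = 1$ is an assumption of the sketch, not of the definition) tests those two directions. It also computes, for the data $f(x) = (-x, x^2 - 2x)$ used in Example 6.4.1, the ratio between the loss in $f_2$ and the gain in $f_1$ when moving from $x^0 = 1$ to $1 + t$: the ratio behaves like $t$, which is the "higher order" loss discussed in the text.

```python
from typing import Optional, Sequence


def kt_direction_1d(grad_f: Sequence[float],
                    grad_g_active: Sequence[float]) -> Optional[float]:
    """Search for y in R (case n = 1) with
         f_i'(x0) * y <= 0 for all i, strictly for at least one i,
         g_j'(x0) * y <= 0 for every active constraint j.
    The system is positively homogeneous in y, so for n = 1 it suffices
    to test y = +1 and y = -1.  Returns a solution y or None."""
    for y in (1.0, -1.0):
        df = [d * y for d in grad_f]
        dg = [d * y for d in grad_g_active]
        if (all(v <= 0 for v in df) and any(v < 0 for v in df)
                and all(v <= 0 for v in dg)):
            return y
    return None


def tradeoff_ratio(t: float, x0: float = 1.0) -> float:
    """For f(x) = (-x, x**2 - 2*x): loss in f_2 divided by gain in f_1
    when moving from x0 to x0 + t (t > 0)."""
    def f1(x: float) -> float:
        return -x

    def f2(x: float) -> float:
        return x ** 2 - 2 * x

    gain = f1(x0) - f1(x0 + t)   # equals t
    loss = f2(x0 + t) - f2(x0)   # equals t**2 when x0 = 1
    return loss / gain


if __name__ == "__main__":
    # At x0 = 1: f'(1) = (-1, 0) and g(1) = -1 < 0, so no active constraint.
    print(kt_direction_1d([-1.0, 0.0], []))  # 1.0
    for t in (1e-1, 1e-2, 1e-3):
        print(t, tradeoff_ratio(t))           # the ratio is about t
```

For a point $x^0 > 1$ one has $f'(x^0) = (-1, 2x^0 - 2)$ with a positive second component, and the helper returns `None`, in agreement with Definition 6.4.1.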
Kuhn-Tucker's definition leaves out those efficient points for which, in a neighborhood of the point and along directions $y$ such that $Jg(x^0)\, y \leqq 0$, there are points $x = x^0 + ty$ such that $\Delta f_i = f_i(x) - f_i(x^0) < 0$ for some $i$ (because $\nabla f_i(x^0)\, y < 0$), even if they make positive those quantities $\Delta f_j$ for which $\nabla f_j(x^0)\, y = 0$. According to Definition 6.4.1, $x^0$ is an improperly efficient point when we are able to move along some "feasible" directions and decrease some criteria with a marginal gain that is an infinitesimal of lower order, and therefore incomparably higher than the marginal loss in the remaining criteria. The following example of Kuhn and Tucker (with $p = 2$) shows a point $x^0$ which is efficient but not properly efficient in the sense of Kuhn-Tucker: we can move from $x^0$ and make $\Delta f_1 < 0$, $\Delta f_2 > 0$, but the latter variation is an infinitesimal of higher order. Example 6.4.1. Let $f(x) = (-x,\ x^2 - 2x)$, $f : \mathbb{R} \to \mathbb{R}^2$, and $g(x) = -x$. Any point $x \geqq 1$ is efficient for this (V.P.), but $x^0 = 1$ does not satisfy Definition 6.4.1: indeed $I(1) = \emptyset$ and $Jf(1) = (-1, 0)^T$, so any $y > 0$ is a solution of the previous system. Kuhn-Tucker's pioneering definition was followed by a number of other definitions of proper efficiency. In this section we wish to give a rather broad picture of this restriction of the optimality notion, but with some guiding principle which will avoid a mere list of definitions. So we have gathered these definitions into two groups. In the first we have inserted those approaches which follow Kuhn-Tucker's definition or which in any case produce similar features. The first slight change of Kuhn-Tucker's definition is due to [Klinger, 1967]. His definition does not require the differentiability of the function $g$. Definition 6.4.2. Let $f$ be a differentiable function. A point $x^0 \in S$ is said to be Klinger properly efficient ($x^0 \in PE(S)_K$ or $z^0 = f(x^0) \in PE(Z)_K$) when it is efficient and for no vector $y \in \mathbb{R}^n$ such that $Jf(x^0)\, y$ … 0 such that x^ ^ Mx^, V x ^ 0.
(On the contrary, we can notice that $x^0 = 0$ satisfies Definitions 6.4.1 and 6.4.2.) Geoffrion's definition was generalized by R. Hartley to any ordering cone $C$. In both definitions the explicit requirement that $x^0$ be an efficient point is in fact superfluous. With Hartley's definition we give up the explicit reference to (V.P.) and are again concerned with the efficient points of an arbitrary set $Z \subset \mathbb{R}^p$. We recall that $C^{\geqq}$ is the nonnegative polar cone of $C$: $C^{\geqq} = \{u : u \cdot y \geqq 0,\ \forall y \in C\}$; later we will use the symbol $C^{>}$ to denote the strict positive polar of $C$: $C^{>} = \{u : u \cdot y > 0,\ \forall y \in C \setminus \{0\}\}$.
Definition 6.4.4. A value $z^0 \in Z$ is said to be Hartley properly efficient ($z^0 \in PE(Z)_{Ha}$) when it is efficient and there exists $M > 0$ such that, for any $z \in Z$ and $\lambda \in C^{\geqq}$ with $\lambda (z - z^0) < 0$, there exists $\mu \in C^{\geqq}$ such that
$$\frac{\lambda (z^0 - z)}{\|\lambda\|} \leqq M\, \frac{\mu (z - z^0)}{\|\mu\|}.$$
The second group of proper efficiency definitions follows a geometrical approach. It focuses on the image set $Z$, without any further reference to the objective function $f$ and its components. The ordering cone is here a general cone. The first definition of this group is due to Hurwicz.
Definition 6.4.5. A value $z^0 \in Z$ is said to be a Hurwicz properly efficient point ($z^0 \in PE(Z)_{Hu}$) when it is efficient and
$$\operatorname{cl} \operatorname{conv} \operatorname{cone} [(Z - z^0) \cup C] \cap (-C) = \{0\}.$$
Example 6.4.3. If we refer Hurwicz's definition to the componentwise order, we obtain that the minimal closed convex cone containing $(Z - z^0) \cup \mathbb{R}^p_+$ (i.e. its closed convex conical hull) cannot intersect $-\mathbb{R}^p_+ \setminus \{0\}$. This condition is not satisfied, e.g., in the following figure (where the cone generated by $(Z - z^0) \cup \mathbb{R}^2_+$ is not convex), since $\operatorname{cl} \operatorname{conv} \operatorname{cone} [(Z - z^0) \cup \mathbb{R}^2_+] = \mathbb{R}^2$. Thus $z^0 \notin PE(Z)_{Hu}$, but $z^0 \in E(Z)$.
Figure 2.
The geometrical approach of Hurwicz's definition was developed by [Benson, 1979], [Borwein, 1977] and [Henig, 1982]. Roughly speaking, all these geometrical definitions call $z^0$ a properly efficient value when the sets $Z - z^0$, or $Z + C - z^0$, or $(Z - z^0) \cup C$ are "well" separated from $-C \setminus \{0\}$ (and not merely disjoint from it, as happens for efficient values). Namely, a cone containing the previous sets, or a local conical approximation of these sets, cannot intersect $-C \setminus \{0\}$.
Definition 6.4.6. A value $z^0 \in Z$ is said to be a Benson properly efficient point ($z^0 \in PE(Z)_{Be}$) when it is efficient and
$$\operatorname{cl} \operatorname{cone} [Z + C - z^0] \cap (-C) = \{0\}.$$
Definition 6.4.7. i) A value $z^0 \in Z$ is said to be Borwein properly efficient ($z^0 \in PE(Z)_{Bo}$) when it is efficient and
$$T(Z + C, z^0) \cap (-C) = \{0\},$$
where $T$ denotes the Bouligand tangent cone;
ii) $z^0 \in Z$ is said to be a global Borwein properly efficient value ($z^0 \in PE(Z)_{GBo}$) when it is efficient and
$$\operatorname{cl} \operatorname{cone} [Z - z^0] \cap (-C) = \{0\};$$
iii) $z^0 \in Z$ is said to be a local Borwein properly efficient value ($z^0 \in PE(Z)_{LBo}$) when it is efficient and
$$T(Z, z^0) \cap (-C) = \{0\}.$$
Definition 6.4.8. A value $z^0 \in Z$ is said to be Henig properly efficient ($z^0 \in PE(Z)_{He}$) when $z^0 \in E_{C'}(Z)$ for some (closed, pointed and convex) cone $C'$ with $C \setminus \{0\} \subset \operatorname{int} C'$.
Remark 6.4.3. In Definitions 6.4.5, 6.4.6 and 6.4.7 ii) the requirement that $z^0 \in E(Z)$ is superfluous, as we have $Z - z^0 \subset \operatorname{cl} \operatorname{conv} \operatorname{cone} [(Z - z^0) \cup C]$, $Z - z^0 \subset \operatorname{cl} \operatorname{cone} [Z + C - z^0]$ and $Z - z^0 \subset \operatorname{cl} \operatorname{cone} [Z - z^0]$. The same requirement is superfluous in Definition 6.4.7 i). Indeed, let us suppose that there exists a point $z \in Z$ such that $z - z^0 \in -C \setminus \{0\}$. Let $z^k = z - \lambda_k (z - z^0) \in Z + C$ (since $z^0 - z \in C$) with $0 < \lambda_k < 1$ and $\lim_{k \to +\infty} \lambda_k = 1$. Then $\lim_{k \to +\infty} z^k = z^0$. But
$$\lim_{k \to +\infty} \frac{1}{1 - \lambda_k} (z^k - z^0) = \lim_{k \to +\infty} \frac{1}{1 - \lambda_k} \left[ (1 - \lambda_k) z - (1 - \lambda_k) z^0 \right] = z - z^0 \in T(Z + C, z^0) \cap (-C \setminus \{0\}),$$
while we supposed $z^0 \in PE(Z)_{Bo}$. Again, the requirement that $z^0$ be efficient is superfluous in Definition 6.4.7 iii) when $Z$ is convex. Indeed, in this case we have $z - z^0 \in T(Z, z^0)$ for every $z \in Z$. Then, by proper efficiency, we get $z - z^0 \notin -C \setminus \{0\}$, i.e. $z^0 \in E(Z)$.
Remark 6.4.4. A value $z^0$ satisfying Definition 6.4.8 is called global by [Henig, 1982a] in order to distinguish it from a local version of the same definition.
By means of Definition 6.4.7 iii) we can give an interpretation of weakly minimal values in terms of proper efficiency.
Theorem 6.4.1. Let $K = \operatorname{int} C \cup \{0\}$. Then $WE_C(Z) = PE_K(Z)_{LBo}$.
Proof. Since $WE_C(Z) = E_K(Z)$, it suffices to show that $E_K(Z) \subset PE_K(Z)_{LBo}$. Let $z^0 \in E_K(Z) \setminus PE_K(Z)_{LBo}$. Then $(-\operatorname{int} C) \cap T(Z, z^0) \neq \emptyset$, i.e. there exists $y \in -\operatorname{int} C$ such that $y = \lim_{k \to +\infty} \lambda_k (z^k - z^0)$ with $\lambda_k > 0$, $z^k \in Z$ and $\lim_{k \to +\infty} z^k = z^0$. Since $-\operatorname{int} C$ is open, there exists an integer $N$ such that, for all $k > N$, $\lambda_k (z^k - z^0) \in -K \setminus \{0\}$, i.e. $z^k - z^0 \in -K \setminus \{0\}$. By definition $z^0 \notin E_K(Z)$, which is a contradiction. □
Figure 3.
The last definition of proper efficiency that we wish to quote was introduced more recently by J.M. Borwein and D. Zhuang.
Definition 6.4.9. A value $z^0 \in Z$ is said to be superefficient ($z^0 \in PE(Z)_{SE}$) when there exists $m > 0$ such that
$$\operatorname{cl} \operatorname{cone}(Z - z^0) \cap (B - C) \subset mB,$$
where $B$ is the closed unit ball of $\mathbb{R}^p$.
The above figure draws the inclusion relations among the previous definitions proven in Theorem 6.4.2; the abbreviations used in the picture are self-explanatory. A few cautionary remarks need to be made. The quoted definitions were stated as they appear in the references; slight variations were made only in the symbols. In order to compare the different notions of proper efficiency, the ordering cone in Theorem 6.4.2 will always be $\mathbb{R}^p_+$ whenever the definitions of Kuhn-Tucker, Klinger and Geoffrion are involved. The inclusion relations do not need any additional hypothesis, with only two exceptions. When we treat Kuhn-Tucker and Klinger properly efficient points, these definitions implicitly require some differentiability conditions. Moreover, the statement $PE(Z)_{LBo} \subset PE(Z)_{KT}$ holds under a suitable constraint qualification; thanks to Abadie's constraint qualification we may claim a coherence between a geometrical and an analytical description of proper efficiency. Kuhn-Tucker's definition, later improved by Klinger, turns out to be the most general one. The proof of b5) uses the important notion of a base for a cone.
Definition 6.4.10.
A set $\Lambda$ is said to be a base for the cone $C$ when any $y \in C \setminus \{0\}$ can be written as $y = t\lambda$ (with unique $\lambda \in \Lambda$, $t > 0$) and $0 \notin \Lambda$.
In finite-dimensional spaces, a convex cone $C$ has a convex compact base if and only if it is closed and pointed. For $C = \mathbb{R}^p_+$ a trivial base is given by $\Lambda = \operatorname{conv}\{e^1, \dots, e^p\}$, where $\{e^i\}$ is the canonical basis of $\mathbb{R}^p$.
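For $C = \mathbb{R}^p_+$ and the base $\Lambda = \operatorname{conv}\{e^1, \dots, e^p\}$, the decomposition $y = t\lambda$ of Definition 6.4.10 can be written down explicitly: every $\lambda \in \Lambda$ has components summing to $1$, so $t$ must be the sum of the components of $y$. A minimal sketch, with a hypothetical function name:

```python
from typing import List, Tuple


def simplex_base_decomposition(y: List[float]) -> Tuple[float, List[float]]:
    """Write y in R^p_+ \\ {0} as y = t * lam, with t > 0 and lam in the
    base Lambda = conv{e^1, ..., e^p} of R^p_+ (the standard simplex).
    Every lam in Lambda has components summing to 1, so t is forced to be
    sum(y); this is why the decomposition is unique."""
    if any(v < 0 for v in y) or all(v == 0 for v in y):
        raise ValueError("y must belong to R^p_+ \\ {0}")
    t = sum(y)
    lam = [v / t for v in y]
    return t, lam


if __name__ == "__main__":
    print(simplex_base_decomposition([1.0, 3.0]))  # (4.0, [0.25, 0.75])
```

The same computation shows why $0 \notin \Lambda$ is needed: the zero vector has no decomposition with $t > 0$.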
Theorem 6.4.2.
a) $PE(Z)_{Hu} \subset PE(Z)_{GBo}$;
b) $PE(Z)_{Ha} = PE(Z)_{G} = PE(Z)_{Be} = PE(Z)_{GBo} = PE(Z)_{He} = PE(Z)_{SE}$;
c) $PE(Z)_{Be} \subset PE(Z)_{Bo}$;
d) $PE(Z)_{Bo} \subset PE(Z)_{LBo}$; …
… since this is true for each $M > 0$. This conclusion contradicts $z^0 \in PE(Z)_{G}$.
b3) $PE(Z)_{Be} \subset PE(Z)_{GBo}$. It easily follows from
$$\operatorname{cl} \operatorname{cone}(Z - z^0) \cap (-C) \subset \operatorname{cl} \operatorname{cone}(Z + C - z^0) \cap (-C) = \{0\}.$$
b4) $PE(Z)_{GBo} \subset PE(Z)_{He}$. If $z^0 \in PE(Z)_{GBo}$, we have $\operatorname{cl} \operatorname{cone}(Z - z^0) \cap (-C) = \{0\}$. From a well-known theorem of separation between cones (see Lemma 6.5.5) it follows that there exists a (closed) convex and pointed cone $C'$ such that $(-C) \setminus \{0\} \subset \operatorname{int}(-C')$ and $\operatorname{cl} \operatorname{cone}(Z - z^0) \cap (-C') = \{0\}$. A fortiori $(Z - z^0) \cap (-C') = \{0\}$, i.e. $z^0 \in PE(Z)_{He}$.
b5) $PE(Z)_{He} \subset PE(Z)_{SE}$. If $z^0 \in PE(Z)_{He}$ and $C'$ is the cone of Definition 6.4.8, there exists $\varepsilon > 0$ such that $(Z - z^0) \cap \operatorname{cl} \operatorname{cone}(-\Lambda + \varepsilon B) = \{0\}$, where $B$ is the closed unit ball and $\Lambda$ is a (compact) base for $C$. So we are able to deduce $\operatorname{cone}(Z - z^0) \cap (-\Lambda + \varepsilon B) = \emptyset$. Now, for any fixed $z \in Z$, let us choose any $\sigma \in C + (z - z^0)$. From $z - z^0 = \sigma - t\lambda$ for some $t \geqq 0$ and some $\lambda \in \Lambda$, we deduce $\|z - z^0\| = \|\sigma\|$ when $t = 0$. For $t > 0$ we have
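$$\left\| \frac{z - z^0}{t} + \lambda \right\| = \frac{\|\sigma\|}{t} \geqq \varepsilon;$$
indeed $\|\sigma\|/t < \varepsilon$, with $(z - z^0)/t = \sigma/t - \lambda$, would imply $\operatorname{cone}(Z - z^0) \cap (\varepsilon B - \Lambda) \neq \emptyset$. Let us set $M = \sup_{\lambda \in \Lambda} \|\lambda\| < +\infty$. Then
$$\|z - z^0\| = \|\sigma - t\lambda\| \leqq \|\sigma\| + tM \leqq \left(1 + \frac{M}{\varepsilon}\right) \|\sigma\| = m \|\sigma\|, \qquad m = 1 + \frac{M}{\varepsilon}.$$
From this, for any $z \in Z$ and any $\sigma \in C + (z - z^0)$, we obtain $\|z - z^0\| \leqq m \|\sigma\|$. Hence $\operatorname{cone}(Z - z^0) \cap (B - C) \subset mB$. Indeed, for any $u = t(z - z^0) = b - y$ ($t > 0$, $z \in Z$, $b \in B$, $y \in C$) one has $b/t \in C + (z - z^0)$ and hence
$$\|z - z^0\| \leqq m \left\| \frac{b}{t} \right\|, \quad \text{i.e.} \quad \|t(z - z^0)\| \leqq m \|b\| \leqq m.$$
Lastly, $\operatorname{cl} \operatorname{cone}(Z - z^0) \cap (B - C) \subset mB$. By contradiction, let us suppose that $w \in \operatorname{cl} \operatorname{cone}(Z - z^0) \cap (B - C)$ with $\|w\| > m$. From $w = \omega_k + o(1) = b - y$, with $\omega_k \in \operatorname{cone}(Z - z^0)$, $\|\omega_k - w\| \leqq 1/k$, $b \in B$, $y \in C$, it follows that $\omega_k \in B - C + \frac{1}{k} B = \frac{k+1}{k} B - C$, i.e. $\frac{k}{k+1} \omega_k \in B - C$. So we would obtain
$$\frac{k}{k+1} \omega_k \in (B - C) \cap \operatorname{cone}(Z - z^0), \quad \text{i.e.} \quad \left\| \frac{k}{k+1} \omega_k \right\| \leqq m,$$
but $\lim_{k \to +\infty} \frac{k}{k+1} \omega_k = w$ with $\|w\| > m$.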
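b6) $PE(Z)_{SE} \subset PE(Z)_{Ha}$. By contradiction, let us suppose that $z^0 \in PE(Z)_{SE} \setminus PE(Z)_{Ha}$. Then, for every $\varepsilon > 0$, there exist $\lambda \in C^{\geqq}$ with $\|\lambda\| = 1$ and $z \in Z$ with $\lambda (z - z^0) < 0$ such that $\lambda (z - z^0) < -(m + \varepsilon)\, \mu (z - z^0)$ for any $\mu \in C^{\geqq}$, $\|\mu\| = 1$ (where $m$ is the positive number of Definition 6.4.9). Since $-\lambda (z - z^0) \leqq \|z - z^0\|$, from this inequality we can deduce the following one:
$$\frac{\mu (z - z^0)}{\|z - z^0\|} < \frac{1}{m + \varepsilon}, \qquad \forall \mu \in C^{\geqq},\ \|\mu\| = 1.$$
Then we have
$$\frac{z - z^0}{\|z - z^0\|} \in \frac{1}{m + \varepsilon} (B - C).$$
Indeed, if $\frac{z - z^0}{\|z - z^0\|} \notin \frac{1}{m + \varepsilon} (B - C)$, there exists $\mu$ with $\|\mu\| = 1$ such that
$$\mu \cdot \frac{z - z^0}{\|z - z^0\|} > \mu \cdot \frac{b - y}{m + \varepsilon}$$
for any $b \in B$ and any $y \in C$. It is easy to verify that $\mu \in C^{\geqq}$. As the previous inequality holds for all $b \in B$ and $y \in C$, taking $b = \mu$ and $y = 0$ we get the contradiction
$$\mu \cdot \frac{z - z^0}{\|z - z^0\|} > \frac{1}{m + \varepsilon}.$$
Therefore we obtain
$$\frac{m + \varepsilon}{\|z - z^0\|} (z - z^0) \in (B - C) \cap \operatorname{cone}(Z - z^0).$$
Then, as $z^0 \in PE(Z)_{SE}$, we have
$$\left\| \frac{m + \varepsilon}{\|z - z^0\|} (z - z^0) \right\| \leqq m,$$
and hence the absurdum $m + \varepsilon \leqq m$.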
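c) $PE(Z)_{Be} \subset PE(Z)_{Bo}$. This statement is true as
$$T(Z + C, z^0) \cap (-C) \subset \operatorname{cl} \operatorname{cone}(Z + C - z^0) \cap (-C) = \{0\}.$$
d) $PE(Z)_{Bo} \subset PE(Z)_{LBo}$. This inclusion also follows readily from the inclusion
$$T(Z, z^0) \cap (-C) \subset T(Z + C, z^0) \cap (-C).$$
e) $PE(Z)_{LBo} \subset PE(Z)_{KT}$. Since (the first equality being Abadie's constraint qualification $C(x^0) = T(S, x^0)$)
$$[Jf(x^0) \cdot C(x^0)] \cap \mathbb{R}^p_- = [Jf(x^0) \cdot T(S, x^0)] \cap \mathbb{R}^p_- \subset T(Z, z^0) \cap \mathbb{R}^p_- = \{0\},$$
it follows immediately that there is no $y \in \mathbb{R}^n$ with $\nabla g_j(x^0)\, y \leqq 0$ ($j \in I(x^0)$) and $Jf(x^0)\, y \in \mathbb{R}^p_- \setminus \{0\}$. So we have that $z^0 \in PE(Z)_{KT}$.
f) $PE(Z)_{KT} \subset PE(Z)_{K}$. By contradiction, suppose $z^0 \in PE(Z)_{KT} \setminus PE(Z)_{K}$. Then there exists a vector $y \in \mathbb{R}^n$ such that $Jf(x^0)\, y$ …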
We can improve Theorem 6.4.4 and arrive at the equality between $PE(Z)_{KT}$ (and all the intermediate sets) and $PE(Z)_{Hu}$ under convexity assumptions which, however, are not comparable with the hypothesis of the previous theorem. Of course, involving Definition 6.4.1 necessarily implies the choice of the particular ordering cone $\mathbb{R}^p_+$. Moreover, the equality $PE(Z)_{KT} = PE(Z)_{Hu}$ relies on the inclusion $PE(Z)_{Hu} \subset PE(Z)_{KT}$, which was proved under some assumptions, namely the differentiability of the functions $f$ and $g_j$ ($j \in I(x^0)$) and the Abadie constraint qualification.
Theorem 6.4.6. If the functions $f$ and $g$ are invex with respect to the same function $\eta$ and the Abadie constraint qualification holds, we have $PE(Z)_{KT} = PE(Z)_{Hu}$.
Proof. If $z^0 = f(x^0) \in PE(Z)_{KT}$, then none of the $p$ systems ($i = 1, \dots, p$)
$$\nabla f_i(x^0)\, y < 0, \qquad \nabla f_k(x^0)\, y \leqq 0 \ (k \neq i), \qquad Jg(x^0)\, y \leqq 0$$
(where $Jg$ only concerns the functions $g_j$ with $j \in I(x^0)$) has a solution $y \in \mathbb{R}^n$. By applying Motzkin's theorem of the alternative to each of them, we get numbers $u^i_k \geqq 0$ and $v^i_j \geqq 0$ such that
$$\nabla f_i(x^0) + \sum_{k \neq i} u^i_k \nabla f_k(x^0) + \sum_{j \in I(x^0)} v^i_j \nabla g_j(x^0) = 0.$$
Summing over $i$ yields
$$\sum_{k=1}^{p} \lambda_k \nabla f_k(x^0) + \sum_{j=1}^{m} \mu_j \nabla g_j(x^0) = 0, \quad \text{i.e.} \quad \nabla \big( \lambda f(x^0) + \mu g(x^0) \big) = 0,$$
where $\lambda_k = 1 + \sum_{i \neq k} u^i_k$, $\mu_j = \sum_{i=1}^{p} v^i_j$ for $j \in I(x^0)$ and $\mu_j = 0$ for $j \notin I(x^0)$. Since the function $L(x) = \lambda f(x) + \mu g(x)$ is invex with respect to $\eta$, it achieves an unconstrained minimum at $x^0$; moreover, $\lambda > 0$, $\mu \geqq 0$ and $\mu g(x^0) = 0$. From $L(x) \geqq L(x^0)$ and $\mu g(x) \leqq 0$ on $S$ we obtain $\lambda f(x) \geqq \lambda f(x^0)$ for all $x \in S$. Then the point $x^0$ is a solution of the scalar problem $(S.P_\lambda)$, which we will consider in Section 7, and Theorem 6.7.1 will ensure that $z^0 \in PE(Z)_{Hu}$. □
If we strengthen the convexity assumptions by also involving the function $g$, then $PE(S)_K$ coincides with $PE(S)_{KT}$. In this case all the sets previously introduced for describing proper efficiency are the same, and we may claim that convexity assumptions give a coherence between a local and a global description of this notion.
Theorem 6.4.7. If the functions $g_j$ ($j \in I(x^0)$) are pseudo-concave at $x^0$ and the functions $g_j$ ($j \notin I(x^0)$) are continuous at $x^0$, then $PE(S)_{KT} = PE(S)_K$.
Proof. By contradiction, suppose that there exists a point $x^0 \in PE(S)_K \setminus PE(S)_{KT}$. Then there exists a vector $y \in \mathbb{R}^n$ such that $Jf(x^0)\, y \leqq 0$, $Jf(x^0)\, y \neq 0$ and $\nabla g_j(x^0)\, y \leqq 0$ ($j \in I(x^0)$). From the hypotheses on $g$ we obtain $Jf(x^0)\, y \leqq 0$, $Jf(x^0)\, y \neq 0$ and $g(x^0 + ty) \leqq 0$ for any $t \in \mathbb{R}_+$ small enough. □
Remark 6.4.8. In the previous theorem it is not possible to weaken the pseudo-concavity assumption by requiring, e.g., the invexity of $-g_j$ or the quasi-concavity of $g_j$ at $x^0$. Consider the following (V.P.):
$$\min f(x) = \min \big( f_1(x_1, x_2), f_2(x_1, x_2) \big) = \min (x_1, x_2) \quad \text{subject to} \quad -x_1^3 - x_2 \leqq 0.$$
The point $(0,0) \in PE(S)_K$, but $(0,0) \notin PE(S)_{KT}$. It is easy to verify that $-g = x_1^3 + x_2$ is not pseudo-convex but is invex (its gradient $(3x_1^2, 1)$ never vanishes). The same conclusion can be obtained for the point $x^0 = 0$ and for the problem $\min f(x) = \min(f_1(x), f_2(x)) = \min(-x^{\dots})$ with the quasi-concave function $g(x) = -x^{\dots}$.
Any properly efficient solution set introduced in this section is a subset of $E(Z)$. The aim of these different notions of proper efficiency is to retain most of the efficient points and to eliminate some anomalous situations. Thus theorems ensuring that the closure of the set of properly efficient points contains the set $E(Z)$ are very desirable. Later on we will prove (see Theorem 6.5.13) that proper efficiency is in general a good approximation of the solutions of (V.P.). When the hypotheses of Theorem 6.4.6 and of the following Theorem 6.4.8 are fulfilled, we can state that $PE(Z)_{Hu} = E(Z)$, i.e. all the sets of proper efficiency are the same and coincide with $E(Z)$. This is the case when the problem is linear, that is, $X = \mathbb{R}^n$ and all the functions $f_i$ and $g_j$ are linear or affine: any optimal solution of a linear vector problem is proper with respect to all the definitions of this section.
Theorem 6.4.8. If the functions $f_i$ and $g_j$ ($j \in I(x^0)$) are pseudo-concave at $x^0$ and the functions $g_j$ ($j \notin I(x^0)$) are continuous at $x^0$, we have $PE(Z)_{KT} = E(Z)$.
Proof. It is quite similar to the proof of Theorem 6.4.7. □
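The proof of Theorem 6.4.6 ends by reducing Kuhn-Tucker proper efficiency to the scalar problem of minimizing $\lambda f$ with $\lambda > 0$. The elementary half of that link, namely that a minimizer of $\lambda \cdot z$ over $Z$ with all $\lambda_i > 0$ is efficient for $C = \mathbb{R}^p_+$, can be observed on a finite image set. The sketch below assumes a finite $Z$ and uses hypothetical helper names; it does not verify the stronger proper efficiency that Theorem 6.7.1 provides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_sum_minimizer(Z: List[Point], lam: Sequence[float]) -> Point:
    """Return a point of the finite set Z minimizing z -> lam . z.

    With all weights strictly positive the minimizer is efficient for the
    componentwise order: if z <= z0 and z != z0, then lam . z < lam . z0."""
    if any(w <= 0 for w in lam):
        raise ValueError("all weights must be strictly positive")
    return min(Z, key=lambda z: sum(w * v for w, v in zip(lam, z)))


def is_efficient(z0: Point, Z: List[Point]) -> bool:
    """True when Z cap (z0 - R^p_+) = {z0}."""
    return not any(all(a <= b for a, b in zip(z, z0)) and z != z0
                   for z in Z)


if __name__ == "__main__":
    Z = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0), (2.0, 2.0), (3.0, 0.0)]
    print(weighted_sum_minimizer(Z, (1.0, 3.0)))  # (2.0, 0.0)
    print(weighted_sum_minimizer(Z, (3.0, 1.0)))  # (0.0, 2.0)
```

The strict positivity of every weight is what excludes weakly efficient points such as $(3, 0)$: with $\lambda = (1, 0)$ the point $(0, 2)$ would tie with nothing, but with $\lambda = (0, 1)$ the dominated point $(3, 0)$ could be returned, which is why the helper rejects zero weights.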
The following theorem can be compared, in particular, with Theorem 6.4.4.
Theorem 6.4.9. If $Z$ is a convex set, then $PE(Z)_{LBo} = PE(Z)_{Hu}$.
Proof. By the convexity of $Z$ we have $T(Z, z^0) = \operatorname{cl} \operatorname{cone}(Z - z^0)$. Hence $PE(Z)_{LBo} = PE(Z)_{GBo} = PE(Z)_{Hu}$, where the last equality follows from Theorem 6.4.4. □
The analysis of the relation between $PE(Z)_{LBo}$ and $PE(Z)_{Be}$ needs the notion of asymptotic cone. We introduce this definition together with a few properties which will be used later on.
Definition 6.4.15. The asymptotic cone of the set $Z \subset \mathbb{R}^p$ is the cone
$$As(Z) = \{y \in \mathbb{R}^p : \exists \{\lambda_k\} \subset \mathbb{R}_+,\ \lim_{k \to +\infty} \lambda_k = 0;\ \exists \{z^k\} \subset Z : y = \lim_{k \to +\infty} \lambda_k z^k\}.$$
It is immediate to verify that $As(Z)$ is a closed cone and that $As(Z) = As(Z + a)$, $\forall a \in \mathbb{R}^p$. From $Z_1 \subset Z_2$ it follows that $As(Z_1) \subset As(Z_2)$. When $Z$ is a closed convex set, we have (see [Rockafellar, 1970], Theorem 8.2) the equality between the asymptotic cone and the recession cone (see Definition 3.4.9):
$$As(Z) = 0^+(Z) = \{y : z + \alpha y \in Z,\ \forall z \in Z,\ \forall \alpha \geqq 0\}.$$
Lemma 6.4.1. A set $Z \subset \mathbb{R}^p$ is bounded if and only if $As(Z) = \{0\}$.
Proof.
Let $Z$ be a bounded set with $\|z\| < K$, $\forall z \in Z$, and let $z \in As(Z)$: $z = \lim_{k \to +\infty} t_k z^k$ with $\{z^k\} \subset Z$ and $\{t_k\} \subset \mathbb{R}_+$ with $\lim_{k \to +\infty} t_k = 0$. From $\|t_k z^k\| < t_k K$ we obtain $z = 0$. Conversely, let $As(Z) = \{0\}$. If $Z$ were unbounded, there would exist a sequence $\{z^k\} \subset Z$ such that $\lim_{k \to +\infty} \|z^k\| = +\infty$. Hence we would have (possibly by choosing a subsequence)
$$0 \neq z = \lim_{k \to +\infty} \frac{z^k}{\|z^k\|} \in As(Z).$$
□
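The construction in the second half of this proof, $z = \lim z^k / \|z^k\|$ (that is, $\lambda_k = 1/\|z^k\|$ in Definition 6.4.15), can be tried numerically along a parametrized curve. The sketch below uses hypothetical helper names and takes as curve the branch $z_1 z_2 = 1$, $z_1 < 0$ (the set of Example 6.5.3). Evaluating at a few parameter values only suggests the limits; it does not compute $As(Z)$ exactly.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def normalized_point(curve: Callable[[float], Vec], s: float) -> Vec:
    """Return z / ||z|| for z = curve(s).

    If ||curve(s)|| -> +inf as s approaches some limit, the limits of these
    unit vectors are nonzero elements of As(Z) (take lambda_k = 1/||z^k||)."""
    z1, z2 = curve(s)
    norm = math.hypot(z1, z2)
    return (z1 / norm, z2 / norm)


def hyperbola_branch(s: float) -> Vec:
    """Parametrization s -> (-s, -1/s), s > 0, of {z : z1 * z2 = 1, z1 < 0}."""
    return (-s, -1.0 / s)


if __name__ == "__main__":
    for s in (1e2, 1e4, 1e6):      # ||z|| -> +inf as s -> +inf
        print(normalized_point(hyperbola_branch, s))   # tends to (-1, 0)
    for s in (1e-2, 1e-4, 1e-6):   # ||z|| -> +inf as s -> 0+
        print(normalized_point(hyperbola_branch, s))   # tends to (0, -1)
```

Both limiting directions $(-1, 0)$ and $(0, -1)$ lie in $-\mathbb{R}^2_+$, which is consistent with the fact that this set is not $\mathbb{R}^2_+$-bounded in the sense of Definition 6.5.2 b).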
Lemma 6.4.2. For $Z_1, Z_2 \subset \mathbb{R}^p$, we have $As(Z_1 \cap Z_2) \subset As(Z_1) \cap As(Z_2)$.
Proof. It is a straightforward application of the monotonicity of the asymptotic cone. □
Remark 6.4.9. In general the equality $As(Z_1 \cap Z_2) = As(Z_1) \cap As(Z_2)$ does not hold. A sufficient condition is obtained by requiring that $Z_1$ and $Z_2$ be closed convex sets with $Z_1 \cap Z_2$ nonempty. Indeed, in this case we have $As(Z_1 \cap Z_2) = 0^+(Z_1 \cap Z_2)$, and the same claim is true for $Z_1$ and $Z_2$.
Theorem 6.4.10. If $Z$ is a closed set and $As(Z) \cap (-C) = \{0\}$, then $PE(Z)_{LBo} = PE(Z)_{Be}$.
Proof. By contradiction, suppose $z^0 \in PE(Z)_{LBo} \setminus PE(Z)_{Be}$, i.e. $T(Z, z^0) \cap (-C) = \{0\}$ with $\operatorname{cl} \operatorname{cone}(Z + C - z^0) \cap (-C) \neq \{0\}$. Then there exist $\{\lambda_k\} \subset \mathbb{R}_+$ and $\{z^k\} \subset Z$ such that $\lim_{k \to +\infty} \lambda_k (z^k - z^0) = z \in -C \setminus \{0\}$. The sequence $\{z^k\}$ cannot converge to $z^0$, since otherwise we would get $z \in T(Z, z^0)$, which contradicts $z^0 \in PE(Z)_{LBo}$. Also, the sequence $\{\lambda_k\}$ cannot contain a subsequence converging to zero, since we would get $z \in As(Z - z^0) \cap (-C) = As(Z) \cap (-C)$, which contradicts our assumption. Hence $\{z^k\}$ is a bounded sequence. By choosing a subsequence, we can suppose that $\{z^k\}$ converges to a value $z^* \neq z^0$. Thus $\{\lambda_k\}$ converges to some $\lambda_0 > 0$. But then $\lim_{k \to +\infty} z^k = z^0 + z/\lambda_0 \in Z$, as $Z$ is a closed set, and this conclusion contradicts the hypothesis that $z^0$ be efficient. □
6.5. Theorems of Existence
In ordinary scalar optimization problems, the classical reference for existence theorems is the Weierstrass theorem, which guarantees the existence of extremal points for a function $f : S \subset \mathbb{R}^n \to \mathbb{R}$ if $f$ is continuous and $S$ is a compact set. A well-known generalization ensures the existence of a minimum point under the hypotheses that $f$ is lower semicontinuous and $S$ is compact. If we turn our attention to the image space $Z = f(S)$, then we can state that $Z$ has a minimal value when $Z + \mathbb{R}_+$ is closed and bounded below. A similar situation occurs in vector optimization problems. Since the compactness assumption can be too strict for applications, one seeks to weaken this hypothesis, possibly in favor of the kind of semi-compactness that we meet in the generalized Weierstrass theorem. This goal is generally achieved by introducing the so-called C-semicompact, C-compact, C-closed, C-bounded and C-quasibounded sets, by using Zorn's lemma and by imposing adequate (topological) conditions on the ordering cone. In our case these conditions are always satisfied. All the previous definitions rely, in different ways, upon the easy equality $E(Z) = E(Z + C)$ and upon the generalization of classical notions obtained by considering only a cone-dependent part of $Z$.
This section begins with the various generalizations of compactness that we have just mentioned and with their relationships. Later we will give the more general definition of C-complete set introduced by [Luc, 1989]; the main existence theorem will therefore be Theorem 6.5.6. Proofs and examples of this section draw on [Corley, 1980], who introduced the cone semicompactness condition as a slight generalization of a previous definition of [Wagner, 1977], on [Hartley, 1978], who defined C-compactness, on [Sawaragi-Nakayama-Tanino, 1985] and on [Cambini-Martein, 1994], where one can find the more general notion of C-quasibounded set. For readers interested in more recent developments in the infinite-dimensional case, we quote some existence results in particular spaces ordered by supernormal cones (see [Isac, 1983] and [Postolica, 1993]) and by cones introduced by [Ha, 1994] with the property that any bounded set which is contained in a complete subset of $Z$ has a limit.
Definition 6.5.1. A set $Z \subset \mathbb{R}^p$ is said to be:
a) C-compact when the set $(z - C) \cap Z$ is compact, $\forall z \in Z$;
b) C-semicompact when any cover of $Z$ of the form $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$, where $I$ is an index set and $^c$ denotes the complement, admits a finite subcover.
Theorem 6.5.1. If $Z$ is C-compact, then $Z$ is also C-semicompact.
Proof. Let us consider any open cover of $Z$ of the form $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$ and fix one of its points $z^{\alpha_0}$. The subfamily $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \neq z^{\alpha_0}\}$ is an open cover of $(z^{\alpha_0} - C) \cap Z$. As $Z$ is C-compact, the set $(z^{\alpha_0} - C) \cap Z$ is compact, and hence this cover has a finite subcover. If we add $(z^{\alpha_0} - C)^c$ to this subcover, we obtain a finite subcover of $Z$. □
Of course any compact set $Z$ satisfies Definition 6.5.1 a). Example 6.5.1 will show a set $Z$ which is C-compact but not compact. The C-semicompactness condition is still weaker: it only concerns the particular covers of the form $\{(z^\alpha - C)^c\}$. See Example 6.5.2 for a C-semicompact set which is not C-compact. Theorem 6.5.4 will give a sufficient condition for a C-semicompact set to be also C-compact.
Example 6.5.1. The statement that a compact set is also C-compact (and therefore C-semicompact) cannot be reversed. For example, take $C = \mathbb{R}^2_+$ and $Z = \{(z_1, z_2) : z_1 + z_2 \geqq 0\}$.
Example 6.5.2. The inclusion relation of Theorem 6.5.1 is proper. The set $Z = \{(z_1, z_2) : z_1^2 + z_2^2 < 1,\ z_1 > 0,\ z_2 > 0\} \cup \{(0, 0)\}$ is C-semicompact but not C-compact with respect to $C = \mathbb{R}^2_+$.
Remark 6.5.1. [Luc, 1989] gives another definition of C-compactness: a set $Z \subset \mathbb{R}^p$ is said to be C-compact when any cover of $Z$ of the form $\{U_\alpha + C;\ \alpha \in I,\ U_\alpha \text{ open}\}$ admits a finite subcover. The two definitions are not comparable. The set $Z = \{(z_1, z_2) : z_1 = -z_2\}$ satisfies Definition 6.5.1 a) with respect to $C = \mathbb{R}^2_+$ but is not $\mathbb{R}^2_+$-compact in Luc's sense. An example in the opposite direction is given by the set of Example 6.5.2. However, Luc's definition of C-compactness also extends that of compactness. Moreover, a set which is C-compact in Luc's sense is C-semicompact; indeed, let us consider any cover of $Z$ of the form $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$. Since the sets $(z^\alpha - C)^c + C$ form a cover of $Z$ which admits a finite subcover, say with indices $\alpha_1, \dots, \alpha_s$, and $(z^\alpha - C)^c + C = (z^\alpha - C)^c$, we also have
$$Z \subset \bigcup_{i=1}^{s} (z^{\alpha_i} - C)^c.$$
In existence theorems the hypothesis of C-semicompactness of $Z$ can be replaced by the condition that $Z$ is a C-closed and C-bounded set. So we arrive at a second generalization of the notion of compactness.
Definition 6.5.2. A set $Z \subset \mathbb{R}^p$ is said to be: a) C-closed when $Z + C$ is closed; b) C-bounded when $As(Z) \cap (-C) = \{0\}$.
Remark 6.5.2. It is of interest to note that there is no implication between the closedness of $Z$ and its C-closedness. The set $Z = \{(z_1, z_2) : z_1 z_2 = -1,\ z_1 > 0\}$ is closed but not $\mathbb{R}^2_+$-closed; the set $Z = \{(z_1, z_2) : z_1 + z_2$ …
such that $\lim_{k \to +\infty} t_k (z^k + c^k) = -\bar c \in (-C) \setminus \{0\}$. First let us suppose that $\{t_k c^k\}$ has a convergent subsequence. We may assume that $\lim_{k \to +\infty} t_k c^k = c^0 \in C$. Therefore $\{t_k z^k\}$ converges to $-\bar c - c^0$, which is a nonzero vector of $As(Z) \cap (-C)$ since $C$ is a pointed cone. So we have $As(Z) \cap (-C) \neq \{0\}$. If $\{t_k c^k\}$ has no convergent subsequence, then $\{t_k c^k\}$ is unbounded. From Lemma 6.4.1 we have $As(\{t_k c^k\}) \neq \{0\}$. Namely, taking a subsequence of $\{t_k c^k\}$, we may assume that there exists another sequence $\{r_k\} \subset \mathbb{R}_+$ with $\lim_{k \to +\infty} r_k = 0$ and $\lim_{k \to +\infty} r_k (t_k c^k) = c \neq 0$. Naturally $c \in C$. From
$$\|r_k t_k z^k + c\| = \|r_k [t_k (z^k + c^k) + \bar c] - (r_k t_k c^k - c) - r_k \bar c\| \leqq r_k \|t_k (z^k + c^k) + \bar c\| + \|r_k t_k c^k - c\| + r_k \|\bar c\|$$
it follows that $\lim_{k \to +\infty} r_k t_k z^k = -c$, i.e. $As(Z) \cap (-C) \neq \{0\}$. □
Lemma 6.5.2. If $Z + C$ is C-semicompact, then $Z$ is also C-semicompact.
Proof. Let $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$ be an open cover of $Z$. For any $z \in Z$, let $z \in (z^{\alpha_0} - C)^c$ with $z^{\alpha_0} \in Z$. Since $C$ is a convex cone, we have $z + C \subset (z^{\alpha_0} - C)^c$. In fact, if $z + c \in z^{\alpha_0} - C$ for some $c \in C$, then $z \in z^{\alpha_0} - C$, which is a contradiction. Hence $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$ is also an open cover of $Z + C$. Since this last set is C-semicompact, this cover has a finite subcover which is, of course, a subcover of $Z$. □
Theorem 6.5.2. If $Z$ is a C-closed and C-bounded set, then $Z$ is also C-semicompact.
Proof. For any $z \in Z + C$, let us consider the set $(z - C) \cap (Z + C)$. By virtue of Lemmas 6.4.2 and 6.5.1 we have
$$As[(z - C) \cap (Z + C)] \subset As(z - C) \cap As(Z + C) = (-C) \cap As(Z + C) = \{0\},$$
since $Z$ is C-bounded. Therefore $(z - C) \cap (Z + C)$ is a closed and bounded set, i.e. a compact set, $\forall z \in Z + C$. By Definition 6.5.1 a), $Z + C$ is C-compact and, by Theorem 6.5.1, also C-semicompact. Hence, by Lemma 6.5.2, $Z$ is C-semicompact. □
The C-boundedness notion has been generalized by [Cambini-Martein, 1994] by means of the definition of C-quasibounded sets.
Definition 6.5.3. A set $Z \subset \mathbb{R}^p$ is said to be C-quasibounded when the section $(z - C) \cap Z$ is a bounded set for any $z \in Z$.
Theorem 6.5.3. If $Z \subset \mathbb{R}^p$ is C-bounded, then $Z$ is C-quasibounded.
Proof. Contrary to what we wish to prove, suppose that $(z^0 - C) \cap Z$ is unbounded for some $z^0 \in Z$. Then there exists a sequence $\{z^k\} = \{z^0 - c^k\} \subset Z$, with $c^k \in C$ and $\lim_{k \to +\infty} \|z^k\| = +\infty$. Without any loss of generality we can assume that $\lim_{k \to +\infty} z^k / \|z^k\| = z^*$. Then $\{-c^k / \|z^k\|\} = \{(z^k - z^0) / \|z^k\|\}$ also converges to the same element. Contrary to the hypothesis, we have found a point $z^* \in -C \cap As(Z)$, $z^* \neq 0$. □
The following example shows that the class of C-bounded sets is strictly contained in the class of sets which satisfy Definition 6.5.3.
Example 6.5.3. We have already noticed that the set $Z = \{(z_1, z_2) : z_1 z_2 = 1,\ z_1 < 0\}$ is not $\mathbb{R}^2_+$-bounded. But this set is C-quasibounded: each section $(z - C) \cap Z$ reduces to $\{z\}$.
Under some additional algebraic and topological assumptions, all the previous concepts (C-compactness, C-semicompactness, C-boundedness together with C-closedness, C-quasiboundedness) coincide.
Theorem 6.5.4. Let $Z \subset \mathbb{R}^p$ be a closed and convex set. Then the following statements are equivalent:
a) $Z$ is C-quasibounded;
b) $Z$ is C-closed and C-bounded;
c) $Z$ is C-compact;
d) $Z$ is C-semicompact.
Proof. a) → b). Assume that there exists $c \in As(Z) \cap (-C \setminus \{0\})$. The hypotheses on the set $Z$ ensure $As(Z) = 0^+(Z)$, and therefore $z + tc \in Z$, $\forall z \in Z$ and $\forall t \geqq 0$. As $z + tc \in z - C$ too, the set $Z \cap (z - C)$ is unbounded, contradicting the C-quasiboundedness of $Z$. By Remark 6.5.2 the set $Z$ is also C-closed.
b) → c). In Theorem 6.5.3 we have already proved that the set $(z - C) \cap Z$ is bounded, $\forall z \in Z$. Therefore it is compact and $Z$ is C-compact.
c) → d). See Theorem 6.5.1.
d) → b). Suppose, to the contrary, that some $-c \neq 0$ belongs to the set $As(Z) \cap (-C)$. From $As(Z) = 0^+(Z)$ we have $z - c \in Z$ for each $z \in Z$, and so $E(Z) = \emptyset$, in contradiction with Theorem 6.5.8 b) below. As $Z$ is a closed set, its C-boundedness implies its C-closedness. □
The following notion of C-completeness is rather far from the previous generalizations of compactness but, in a more general way, it still guarantees the existence of efficient points.
Definition 6.5.4. A set $Z \subset \mathbb{R}^p$ is said to be C-complete when it has no covers of the form $\{(z^\alpha - C)^c;\ \alpha \in I\}$, where $I$ is an index set and $\{z^\alpha\} \subset Z$ is a decreasing net (i.e. $\alpha, \beta \in I$ and $\beta > \alpha$ imply $z^\alpha - z^\beta \in C \setminus \{0\}$).
Theorem 6.5.5. If $Z$ is a C-semicompact set, then it is also a C-complete set.
Proof. Suppose, to the contrary, that $Z$ has a cover of the form $\{(z^\alpha - C)^c\}$, where $\{z^\alpha\}$ is a decreasing net. As $Z$ is C-semicompact, we have $Z \subset \bigcup_{i=1}^{n} (z^{\alpha_i} - C)^c$ and hence $Z \subset (z^{\bar\alpha} - C)^c$, where $\bar\alpha \in I$ is an index with $\bar\alpha \geqq \alpha_i$ for $i = 1, \dots, n$ (as the net is decreasing, $z^{\bar\alpha} - C \subset z^{\alpha_i} - C$). So we arrive at the contradiction $z^{\bar\alpha} \notin z^{\bar\alpha} - C$. □
Example 6.5.4. There are sets which are C-complete but not C-semicompact. One is provided by the set $Z = \{(z_1, z_2) : z_2 < 0,\ z_1 z_2 = 1\} \cup \{(z_1, z_2) : z_2 \leqq 0,\ z_2 = -z_1\}$. The set $Z$ is $\mathbb{R}^2_+$-complete, but it does not satisfy Definition 6.5.1 b), since its cover $\{(z^n - \mathbb{R}^2_+)^c\}$, with $z^n = (n, -n)$, has no finite subcover.
Theorem 6.5.6. We have $E(Z) \neq \emptyset$ if $Z$ is a nonempty C-complete set.
Proof. Let us consider the set $P$ consisting of the decreasing nets in $Z$. For $a, b \in P$, let us write $aRb$ when the net $a$ is contained in the net $b$; the binary relation $R$ satisfies Definition 6.2.2 b), so $P$ is a partially ordered set. Since every chain in $P$ has an upper bound (the union of its nets), $P$ is inductively ordered. By Zorn's lemma we get a maximal element $\{z^\alpha\}$. Now we are able to prove that $E(Z) \neq \emptyset$. Suppose, to the contrary, that the set of efficient points is empty, and consider the net $\{(z^\alpha - C)^c\}$. It forms a cover of $Z$. Indeed, if this is not the case, there exists $z \in Z$ such that $z \in z^\alpha - C$ for every $\alpha$, i.e. $z^\alpha \geqq z$. As $z \notin E(Z)$, there is some $z_1 \in Z$ with $z > z_1$. We conclude that $z^\alpha > z_1$ for every $\alpha$, and we obtain a decreasing net $\{z^\alpha\} \cup \{z_1\}$ which contradicts the maximality of $\{z^\alpha\}$. From $Z \subset \bigcup_\alpha (z^\alpha - C)^c$ we arrive at the contradiction that $Z$ is not C-complete. □
Theorem 6.5.7. We have $E(Z) \neq \emptyset$ if and only if there is some $z \in Z$ such that its section $Z \cap (z - C)$ is C-complete. Proof. From Theorem 6.5.6, if $Z \cap (z - C)$ is C-complete for some $z \in Z$, we get that $E(Z \cap (z - C))$ is nonempty. It suffices to prove that $E(Z \cap (z - C)) \subset E(Z)$. Let $z^0 \in E(Z \cap (z - C))$. If there is some $z' \in Z$ with $z^0 \geqq z'$, then $z' \in Z \cap (z - C)$ and hence $z' \geqq z^0$, so that $z' = z^0$. Conversely, if $z \in E(Z)$, the section $Z \cap (z - C)$ is obviously a C-complete set because there are no decreasing nets in $Z \cap (z - C)$. □

Theorem 6.5.6 and the inclusions proven through Theorems 6.5.1, 6.5.2 and 6.5.5 furnish other sufficient conditions for $E(Z) \neq \emptyset$.

Theorem 6.5.8. We have $E(Z) \neq \emptyset$ if one of the following conditions is satisfied: a) $Z$ is a compact or a C-compact set; b) $Z$ is a C-semicompact set; c) $Z$ is a C-closed and C-bounded set.
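A finite set $Z$ is compact, so Theorem 6.5.8 a) guarantees $E(Z) \neq \emptyset$. The following minimal sketch assumes the componentwise cone $C = \mathbb{R}^p_+$ (an illustrative choice) and computes $E(Z)$ by pairwise comparison; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if b - a lies in R^p_+ minus {0}, i.e. a is better than b."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def efficient_points(Z: List[Point]) -> List[Point]:
    """E(Z) for C = R^p_+: the points of Z not dominated by any other point."""
    return [z for z in Z if not any(dominates(w, z) for w in Z)]


Z = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.0, 5.0)]
print(efficient_points(Z))  # prints [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Each dropped point is dominated by a retained one, so in this finite case every $z \in Z$ lies in $E(Z) + C$.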
Theorem 6.5.9. Also we have $E(Z) \neq \emptyset$ if either of the following conditions holds: i) $Z$ is a closed and C-quasibounded set; ii) $Z$ is a C-closed and C-quasibounded set. Proof. i) As $Z$ is closed, $Z$ is C-quasibounded if and only if it is C-compact. The thesis follows immediately from Theorem 6.5.8 a).
ii) Consider the set $(z - C) \cap (Z + C)$ for any $z \in Z$. First we will prove that this set is bounded. In fact, if it is unbounded, then there exists a sequence $\{z + c^k\} = \{z^k - \gamma^k\}$ with $c^k, \gamma^k \in -C$, $z^k \in Z$ and $\lim_{k \to +\infty} \|c^k\| = +\infty$. The hypothesis about the cone $C$ assures that $\exists a \in \mathbb{R}^p$ such that $ac < 0$, $\forall c \in -C \setminus \{0\}$. Let $B$ be the unit sphere and $\bar c^k = c^k / \|c^k\|$, $\bar\gamma^k = \gamma^k / \|\gamma^k\|$. As there exists $c^0 \in B \cap (-C)$ such that $ac^0 \geq ac$, $\forall c \in B \cap (-C)$ (and $ac^0 < 0$), it works out
$az^k = a(z + c^k + \gamma^k) = az + a\bar c^k \|c^k\| + a\bar\gamma^k \|\gamma^k\| \leq az + ac^0(\|c^k\| + \|\gamma^k\|)$,
which runs to $-\infty$. Consequently we have $\lim_{k \to +\infty} \|z^k\| = +\infty$. This is absurd because $z^k \in Z \cap (z - C)$ and $Z$ is a C-quasibounded set. Now the set $(z - C) \cap (Z + C)$ is closed and bounded and it is easy to verify that $E(Z + C) \neq \emptyset$. The thesis follows from the inclusion $E(Z + C) \subset Z$. □
In a particular case the conditions a), b), c), d) of Theorem 6.5.4 also become necessary for $E(Z) \neq \emptyset$.

Theorem 6.5.10. Let $Z \subset \mathbb{R}^p$ be a closed convex set. Then $E(Z) \neq \emptyset$ if and only if any one of the conditions in Theorem 6.5.4 is satisfied. Proof. It is enough to prove that $E(Z) \neq \emptyset$ implies that $Z$ is C-closed and C-bounded. To this purpose we can repeat the proof of the quoted theorem (in the implication d) → b)). □
Theorems 6.5.6, 6.5.8 and 6.5.9 give some sufficient conditions for $E(Z) \neq \emptyset$ under some hypotheses on the image space $Z = f(S)$. A further theorem of existence involves, instead of the image space, the set $S$ and the function $f$.

Definition 6.5.5. A function $f : \mathbb{R}^n \to \mathbb{R}^p$ is said to be C-semicontinuous when the set $f^{-1}(z - C)$ is closed $\forall z \in \mathbb{R}^p$.

Remark 6.5.5. For $p = 1$ and $C = \mathbb{R}_+$ (resp. $C = \mathbb{R}_-$) the C-semicontinuity definition collapses to the usual definition of lower (respectively upper) semicontinuity.

Theorem 6.5.11. If $S \subset \mathbb{R}^n$ is a compact set and $f$ is a C-semicontinuous function, then $E(Z) \neq \emptyset$. Proof. If the family $\{(z^\alpha - C)^c;\ \alpha \in I,\ z^\alpha \in Z\}$ is an open cover of $Z$, then Definition 6.5.5 assures that $\{f^{-1}[(z^\alpha - C)^c];\ \alpha \in I,\ z^\alpha \in Z\}$ is an open cover of $S$. But $S$ is compact; hence this cover admits a finite subcover, whose image forms a finite subcover of $Z$. Then $Z$ is C-semicompact. The thesis follows from Theorem 6.5.8 b). □

Up until now we have been concerned with existence theorems for the set $E(Z)$. A necessary condition for $WE(Z) \neq \emptyset$ and $PE(Z)_{He} \neq \emptyset$ can be given in terms of the asymptotic cone.

Lemma 6.5.3. A nonzero vector $z$ belongs to $As(Z)$ if and only if $\mathrm{cone}(z + U) \cap Z \cap V \neq \emptyset$ for every neighborhood $U$ of zero and for every neighborhood $V$ of $\infty$. Proof. As a simple consequence of the definition, we have $z \in As(Z) \setminus \{0\}$ if and only if $z \in \mathrm{cl\,cone}(Z \cap V)$ for every neighborhood $V$ of $\infty$, or $(z + U) \cap \mathrm{cone}(Z \cap V) \neq \emptyset$ for every $U(0)$ and for every $V$. □
Theorem 6.5.12. i) $WE(Z) \neq \emptyset$ implies $As(Z) \cap (-\mathrm{int}\,C) = \emptyset$;
ii) $PE(Z)_{He} \neq \emptyset$ implies $As(Z) \cap (-C) = \{0\}$.
Proof. i) Suppose to the contrary that there is some $z \in As(Z) \cap (-\mathrm{int}\,C)$. Of course $z \in As(Z - z^0)$ for each $z^0 \in Z$, and then $\mathrm{cone}(z + U) \cap (Z - z^0) \cap V \neq \emptyset$ for any neighborhood $V$ of $\infty$, by virtue of Lemma 6.5.3. Since $z \in -\mathrm{int}\,C$, we may choose a neighborhood $U(0)$ small enough such that $\mathrm{cone}(z + U) \setminus \{0\} \subset -\mathrm{int}\,C$. As $V$ does not contain zero, the above relations imply $(Z - z^0) \cap (-\mathrm{int}\,C) \neq \emptyset$. This shows that no $z^0 \in Z$ can be weakly efficient.
ii) If $z^0 \in PE(Z)_{He}$, there exists a cone $C'$ with $C \setminus \{0\} \subset \mathrm{int}\,C'$ such that $z^0 \in E_{C'}(Z)$. By i) we have $As(Z) \cap (-\mathrm{int}\,C') = \emptyset$ and consequently $As(Z) \cap (-C) = \{0\}$. □

We conclude this section by utilizing Definition 6.5.2 in order to prove the remark at the end of Section 6.4. In particular Theorem 6.5.13 will hold for compact sets.

Lemma 6.5.4.
If $Z \subset \mathbb{R}^p$ is C-bounded and C-closed, we have $Z \subset E(Z) + C$. Proof. The set $Z' = (z - C) \cap (Z + C)$ for any $z \in Z$ is compact (see the proof of Theorem 6.5.9 ii)). In view of Theorem 6.5.8 a), $E(Z') \neq \emptyset$; moreover, $E(Z') \subset E(Z)$. Indeed, if there were an element $\bar z \in E(Z') \setminus E(Z)$, we would have a value $z' \in Z$ such that $\bar z - z' \in C \setminus \{0\}$. We would also obtain $z' \in Z'$, which leads to a contradiction with respect to $\bar z \in E(Z')$. The conclusion $\emptyset \neq E(Z') \subset E(Z)$ is sufficient for proving $Z' \cap E(Z) \neq \emptyset$. Then, for any $z \in Z$, we have found a value $\bar z \in E(Z)$ such that $\bar z \in z - C$, or $Z \subset E(Z) + C$. □

Lemma 6.5.5. Let $A$ be a closed cone such that $A \cap C = \{0\}$. Then
there exists a sequence $\{C_k\}$ of closed convex pointed cones, $C_k \supset C$, such that:
i) $C_{k+1} \subset C_k$, $\forall k$;
ii) $\bigcap_{k=1}^{+\infty} C_k = C$;
iii) $A \cap C_k = \{0\}$, $\forall k$;
iv) $C \setminus \{0\} \subset \mathrm{int}\,C_k$, $\forall k$.
Proof. Let $D = \{d : d = c / \|c\|,\ c \in C,\ c \neq 0\}$. We have $D \cap A = \emptyset$, with $C = \mathrm{cone}\,D \cup \{0\}$ and $D$ a compact set. Then the distance $\varepsilon$ of $D$ from the closed set $A$ is positive. Set $C_k = \mathrm{cl\,cone}(D + U_{\varepsilon/k}(0))$. Since $A$ is a closed cone, for $k$ large enough, from $A \cap (D + U_{\varepsilon/k}(0)) = \emptyset$ it also works out $A \cap C_k = \{0\}$. It is easy to verify that the cones $C_k$ also fulfill the other requirements of our thesis. □

Theorem 6.5.13. Let $Z \subset \mathbb{R}^p$ be a C-bounded and C-closed set. Then $E(Z) \subset \mathrm{cl}\,PE(Z)_{He}$. Proof. From Definition 6.5.2 b) and from Lemma 6.5.1 we get $As(Z + C) \cap (-C) = \{0\}$. Then, in view of Lemma 6.5.5, there exists a sequence of pointed closed cones $\{C_k\}$ such that $\bigcap C_k = C$, $C_{k+1} \subset C_k$, $As(Z + C) \cap (-C_k) = \{0\}$ and $C \setminus \{0\} \subset \mathrm{int}\,C_k$. Since the set $Z + C + C_k$ is closed, the set $Z + C$ is $C_k$-closed and $C_k$-bounded; then Lemma 6.5.4 gives $Z \subset E_{C_k}(Z) + C_k$ for any $k$. Now let $z \in E(Z)$. From the previous inclusion we obtain a sequence $\{z^k\}$ such that $z^k \in E_{C_k}(Z) \cap (z - C_k)$. Then $z^k \in PE(Z)_{He}$ (see Definition 6.4.8). Now it suffices to prove that a subsequence of $\{z^k\}$ converges to $z$. Since $C_{k+1} \subset C_k$, $z^k \in (Z + C) \cap (z - C_1)$ for every $k$. This section is closed and bounded since $As[(Z + C) \cap (z - C_1)] \subset As(Z + C) \cap (-C_1) = \{0\}$, and we can assume without loss of generality that $\{z^k\}$ converges to some $\bar z \in (Z + C) \cap (z - C_1)$. Since $z - z^k \in C_k \subset C_m$ for $k \geq m$ and each $C_m$ is closed, we have $z - \bar z \in \bigcap_m C_m = C$, and this implies $\bar z = z$. □
6.6. Optimality Conditions

In the same spirit as scalar optimization, we are going to produce a few necessary and/or sufficient conditions related to several optimality notions. All these conditions will require the differentiability of the functions $f$ and $g$; the nondifferentiable case will be analyzed in Section 8 by taking into account weaker assumptions. All the conditions of this section are first-order conditions.
In the literature some second-order conditions are also known, both in the smooth and in the nonsmooth case. In general, they use the notions of Hessian matrix, of second-order tangent cone or of some other particular cones. The generalization to the nonsmooth case is often achieved with reference to the functional class $C^{1,1}$ or to functions whose derivatives are locally Lipschitz. With reference to these derivatives one can apply the approach we will introduce in Section 8. First consider the problem (V.P.) when no functional (inequality) constraint is considered. The reference for the scalar case is given here by
the well-known necessary condition: if $x^0$ is a local minimum point for $f : S \subset \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(x^0)\, y \geq 0$, $\forall y \in T(S, x^0)$.

Theorem 6.6.1. If $x^0$ is a weakly local efficient point for (V.P.) with $f$ differentiable at $x^0$, then $Jf(x^0)\, y \notin -\mathrm{int}\,C$, $\forall y \in T(S, x^0)$.
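Proof. For $y \in T(S, x^0)$ we may find a sequence $\{x^k\} \subset S$ with $\lim_{k \to +\infty} x^k = x^0$ and
$\lim_{k \to +\infty} \dfrac{x^k - x^0}{\|x^k - x^0\|} = y$.
By Taylor's expansion $f(x^k) - f(x^0) = Jf(x^0)(x^k - x^0) + o(\|x^k - x^0\|)$, we get
$\dfrac{f(x^k) - f(x^0)}{\|x^k - x^0\|} = Jf(x^0)\dfrac{x^k - x^0}{\|x^k - x^0\|} + \dfrac{o(\|x^k - x^0\|)}{\|x^k - x^0\|}$
and hence:
$\lim_{k \to +\infty} \dfrac{f(x^k) - f(x^0)}{\|x^k - x^0\|} = Jf(x^0)\, y$.
So we have proved the theorem, since $f(x^k) - f(x^0) \notin -\mathrm{int}\,C$ for $k$ large enough and the complement of the open cone $-\mathrm{int}\,C$ is a closed cone, which therefore contains the limit $Jf(x^0)\, y$. □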
Theorem 6.6.2. The condition $Jf(x^0)\, y \notin -C$, $\forall y \in T(S, x^0) \setminus \{0\}$, is sufficient for $x^0$ to be a local efficient point for (V.P.).
Proof. Ab absurdo suppose that $x^0$ is not a local efficient point. Then there exists a sequence $\{x^k\} \subset S$ converging to $x^0$ with $f(x^k) - f(x^0) \in -C \setminus \{0\}$. From
$\dfrac{f(x^k) - f(x^0)}{\|x^k - x^0\|} \in -C$,
by taking the limits for $k \to +\infty$ and possibly by choosing a subsequence of $\{x^k\}$, we get $Jf(x^0)\, y \in -C$ for an element $y \in T(S, x^0) \setminus \{0\}$, which leads to a contradiction. □

We can note some differences with the guideline of the scalar case. For $p > 1$ and $S$ an open set, Theorem 6.6.1 does not lead to a result like Fermat's proposition; a generalization will be given by Theorem 6.6.4. Still, in the case that $S$ is an open set, the condition expressed in Theorem 6.6.2 is never satisfied for $p = 1$; this is so because, if one has $\nabla f(x^0)\, y > 0$ for some $y \in \mathbb{R}^n$, then $\nabla f(x^0)(-y) < 0$. Before proceeding to the particular case of $S$ open, we give another sufficient condition for the same problem handled by Theorems 6.6.1 and 6.6.2. The proof can be obtained as in Theorem 6.6.2.

Theorem 6.6.3. The point $x^0$ is a local efficient solution of (V.P.) if $\sup_{\vartheta \in C^{\geqq}} \vartheta\, Jf(x^0)\, y > 0$ for every $y \in T(S, x^0) \setminus \{0\}$.
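Theorem 6.6.4. If $x^0$ is a weakly local efficient point for (V.P.) with $f$ differentiable at $x^0$ and $S$ an open set, then $\exists \vartheta^0 \in C^{\geqq} \setminus \{0\}$ such that $\vartheta^0 Jf(x^0) = 0$. Proof. By Theorem 6.6.1 we immediately get $Jf(x^0)\, y \notin -\mathrm{int}\,C$, $\forall y \in \mathbb{R}^n$. Since the sets $\{Jf(x^0)\, y : y \in \mathbb{R}^n\}$ and $-\mathrm{int}\,C$ can be separated by a hyperplane, there exists an element $\vartheta^0 \in C^{\geqq} \setminus \{0\}$ such that $\vartheta^0 Jf(x^0)\, y \geq 0$, $\forall y \in \mathbb{R}^n$; applying this to $y$ and $-y$ gives $\vartheta^0 Jf(x^0) = 0$. □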
The following example will show that, in general, the condition of Theorem 6.6.4 is not sufficient for $x^0$ to be a local minimal point. Sufficiency will be achieved under a suitable generalized convexity assumption.

Example 6.6.1. Consider the problem (V.P.) with $S = \mathbb{R}^2$, $C = \mathbb{R}^2_+$ and $f(x) = f(x_1, x_2) = (x_1 - x_2,\ 1 - x_1 - e^{-x_2})$. We have, for $\vartheta^0 = (1, 1) \in C^{\geqq} \setminus \{0\}$, $\vartheta^0 Jf(0,0) = 0$, but $(0,0)$ is not a local efficient point, as $f(x_1, x_1) \in f(0,0) - (C \setminus \{0\})$, $\forall x_1 \neq 0$.
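Both claims of Example 6.6.1 can be checked numerically. In this sketch the Jacobian is written out by hand (an assumption of the sketch rather than automatic differentiation); it verifies that $\vartheta^0 Jf(0,0) = 0$ and that moving along the diagonal $x_1 = x_2$ gives a point whose image lies in $f(0,0) - (\mathbb{R}^2_+ \setminus \{0\})$.

```python
import math


def f(x1, x2):
    """Objective of Example 6.6.1: f(x) = (x1 - x2, 1 - x1 - exp(-x2))."""
    return (x1 - x2, 1.0 - x1 - math.exp(-x2))


def jacobian(x1, x2):
    """Rows are the gradients of f1 and f2, computed by hand."""
    return [[1.0, -1.0], [-1.0, math.exp(-x2)]]


theta = (1.0, 1.0)
J = jacobian(0.0, 0.0)
# theta^0 Jf(0,0) as a row vector of length 2
stationarity = [theta[0] * J[0][j] + theta[1] * J[1][j] for j in range(2)]
print(stationarity)  # prints [0.0, 0.0]

for t in (-1.0, -0.5, 0.5, 1.0):
    diff = tuple(a - b for a, b in zip(f(t, t), f(0.0, 0.0)))
    print(t, diff)  # first component 0, second component negative
```

The second component of the difference is $1 - t - e^{-t} < 0$ for $t \neq 0$, so $(0,0)$ is dominated despite satisfying the first-order condition.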
Theorem 6.6.5. The conditions $\vartheta^0 Jf(x^0) = 0$, for some $\vartheta^0 \in C^{\geqq} \setminus \{0\}$, and $f$ C-pseudoconvex at $x^0$ are sufficient for $x^0$ to be an efficient point for (V.P.). Proof. Ab absurdo there exists a point $x \in S$ such that $f(x) - f(x^0) \in -C \setminus \{0\}$. The C-pseudoconvexity of $f$ at $x^0$ implies that $Jf(x^0)(x - x^0) \in -\mathrm{int}\,C$. So we get $\vartheta Jf(x^0)(x - x^0) < 0$, $\forall \vartheta \in C^{\geqq} \setminus \{0\}$, which contradicts $\vartheta^0 Jf(x^0) = 0$. □
In the remainder of the section we turn our attention to vector problems (V.P.) with the inequality constraints $g_j(x) \leq 0$. We will provide a few necessary conditions à la F. John or à la Kuhn-Tucker, satisfied, respectively, by an efficient, a properly efficient and a weakly efficient point. The necessary conditions will become sufficient under a suitable generalized convexity hypothesis. As in Theorem 6.6.5, these hypotheses are also sufficient for assuring the global character of efficient points. The necessary (and sufficient) conditions for properly efficient points will take into account the large definition by Kuhn-Tucker. So in this case the domination structure is given by $C = \mathbb{R}^p_+$.

Theorem 6.6.6. If $x^0$ is a weakly local efficient point for (V.P.), where $f$ and $g$ are differentiable at $x^0$, then there exist $\vartheta^0 \in C^{\geqq}$ and $\lambda^0 \in \mathbb{R}^m_+$ with $(\vartheta^0, \lambda^0) \neq 0$ such that $\vartheta^0 Jf(x^0) + \lambda^0 Jg(x^0) = 0$ and $\lambda^0 g(x^0) = 0$. Proof. The following system has no solution in $\mathbb{R}^n$:
$Jf(x^0)(x - x^0) \in -\mathrm{int}\,C$, $g(x^0) + Jg(x^0)(x - x^0) \in -\mathrm{int}\,\mathbb{R}^m_+$,
since, from $f(x^0 + t(x - x^0)) - f(x^0) \in -\mathrm{int}\,C$ and $g(x^0 + t(x - x^0))$
e - i n t C and g{x^+ t{x-x^))
. Remark 6.6.2. We still obtain the same result if the assumption of Cpseudoconvexity of / at x^ and of quasiconvexity of A^^ at x^ are replaced
550
Vector
optimization
with the condition that / and g are C-invex at x^ with respect to a same 77 and 7?^ E C>. Indeed, from f{x) - f{x^) - Jf{x^)
r]{x, x^) eC,\/xe
S,
it follows:
^ -X^9{x) + X^g{x^) = -X^g{x)
^ 0.
Then the point x^ is minimal for the scalar function 7?^/. From Theorem 6.7.1 it will follow that x^ e PE{S)HU
C E{S).
Theorem 6.6.8. If $x^0 \in S$ is a properly efficient solution to (V.P.) in the sense of Kuhn-Tucker, then there exist $\vartheta^0 \in \mathrm{int}\,\mathbb{R}^p_+$ and $\lambda^0 \in \mathbb{R}^m_+$ such that:
$\vartheta^0 Jf(x^0) + \lambda^0 Jg(x^0) = 0$, $\lambda^0 g(x^0) = 0$.
Proof. From Definition 6.4.1 there is no $y \in \mathbb{R}^n$ such that $Jf(x^0)\, y \in -\mathbb{R}^p_+ \setminus \{0\}$ and $\nabla g_j(x^0)\, y \leq 0$ ($j \in I(x^0)$). Then, from the theorem of the alternative of Tucker (see point 20) in Section 2.4), there exist $\vartheta^0 > 0$ and $\lambda^0_j \geq 0$ such that
$\sum_{i=1}^{p} \vartheta^0_i \nabla f_i(x^0) + \sum_{j \in I(x^0)} \lambda^0_j \nabla g_j(x^0) = 0$.
Letting $\lambda^0_j = 0$ for $j \notin I(x^0)$, we can immediately establish the theorem. □

Remark 6.6.3. From Theorem 6.4.2 it follows that in a Paretian framework the conditions of the above theorem are also necessary (without any other assumption) for all the other definitions of proper efficiency, the only exceptions being local proper efficiency in the sense of Borwein and proper efficiency in the sense of Klinger.

Theorem 6.6.9. If $f$ and $g$ are differentiable at $x^0$ and the functions $\vartheta^0 f$ and $\lambda^0 g$ are pseudoconvex and quasiconvex at $x^0$, respectively, the conditions in Theorem 6.6.8 are also sufficient for $x^0 \in S$ to be a Kuhn-Tucker properly efficient solution to (V.P.). Proof. By virtue of the quasiconvexity of the function $\lambda^0 g$, from $\lambda^0 g(x) \leq 0 = \lambda^0 g(x^0)$ we obtain $\lambda^0 Jg(x^0)(x - x^0) \leq 0$, $\forall x \in S$. Then $\vartheta^0 Jf(x^0)(x - x^0) \geq 0$ and hence $\vartheta^0 f(x) \geq \vartheta^0 f(x^0)$ in view of the pseudoconvexity of $\vartheta^0 f$. The conclusion is as in Remark 6.6.3. □

Remark 6.6.4. We obtain the same conclusion of Theorem 6.6.9 if the invexity of $f_i$ and $g_j$ ($i = 1, \dots, p$; $j = 1, \dots, m$) with respect to a same $\eta$ is required (instead of the pseudoconvexity of $\vartheta^0 f$ and of the quasiconvexity of $\lambda^0 g$). See Remark 6.6.2.

Example 6.6.2. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ with $f_1(x, y, z) = y - z$ and $f_2(x, y, z) = x + y + y^2 + z^2$; let $g_1(x, y, z) = -x - y \leq 0$ and $g_2(x, y, z) = -y + z^2 \leq 0$. The point $(0, 0, 0)$ satisfies the conditions expressed by Theorem 6.6.8 for $\vartheta^0 = \lambda^0 = (1, 1)$. Hence the origin can be a properly efficient point (according to the Kuhn-Tucker definition). This conjecture is confirmed by Theorem 6.6.9 since $\vartheta^0 f = f_1 + f_2 = x + 2y - z + y^2 + z^2$ and $\lambda^0 g = g_1 + g_2 = -x - 2y + z^2$ are convex functions.

If we also relate Theorem 6.6.6 to the componentwise order of $\mathbb{R}^p$, we may compare the signs of the components of the multiplier $\vartheta^0$. When $z^0 \in E(Z)$ we have $\vartheta^0 \geqq 0$, but the stronger hypothesis $z^0 \in PE(Z)_{KT}$ guarantees $\vartheta^0 > 0$. The same approach applied to the following theorem shows, for $z^0 \in WE(Z)$, that we have to add a constraint qualification in order to have $\vartheta^0 \neq 0$.

Theorem 6.6.10. Suppose that (V.P.) satisfies the Kuhn-Tucker constraint qualification at $x^0 \in S$. Then a necessary condition for $x^0$ to be a weakly local minimal point for (V.P.) is that there exist $\vartheta^0 \in \mathbb{R}^p$ and $\lambda^0 \in \mathbb{R}^m$ such that
$\vartheta^0 Jf(x^0) + \lambda^0 Jg(x^0) = 0$, $\lambda^0 g(x^0) = 0$, $\vartheta^0 \geq 0$, $\lambda^0 \geqq 0$.
Proof. Let $x^0$ be a weakly local minimum point of (V.P.). Now we will prove that there is no $y \in \mathbb{R}^n$ such that $\nabla f_i(x^0)\, y < 0$ ($i = 1, \dots, p$) and $\nabla g_j(x^0)\, y \leq 0$ ($j \in I(x^0)$). Then the theorem will follow from Motzkin's theorem of the alternative (see point 19) in Section 2.4). Suppose, to the contrary, that there exists a vector $y$ satisfying the above inequalities. From the Kuhn-Tucker constraint qualification there exists a function $\omega \in C^1[0, \bar t]$ such that $\omega(0) = x^0$, $g(\omega(t)) \leq 0$ for any $t \in [0, \bar t]$ and $\omega'(0) = \alpha y$ with $\alpha > 0$. From $f_i(\omega(t)) = f_i(x^0) + t \nabla f_i(x^0)\, \alpha y + o(t)$ and $\nabla f_i(x^0)\, y < 0$, $\forall i = 1, \dots, p$, it follows that $f_i(\omega(t)) < f_i(x^0)$ for $t$ sufficiently small. This contradicts the local weak optimality of $x^0$. □

Theorem 6.6.11. If the functions $f$ and $g$ are componentwise invex at $x^0$ with respect to a same $\eta$, the conditions in Theorem 6.6.10 are also sufficient for the weak minimality of the point $x^0$. Proof. Ab absurdo suppose that there is a feasible point $x$ such that $f(x) - f(x^0) \in -\mathrm{int}\,\mathbb{R}^p_+$, which implies $\vartheta^0 [f(x) - f(x^0)] < 0$. The proof goes on as in Remark 6.6.2. □

The optimality conditions we considered for (V.P.), with the explicit consideration of the inequality constraints $g_j(x) \leq 0$, have a rather common structure. In general a solution of our problem satisfies the equality $\vartheta Jf(x) + \lambda Jg(x) = 0$ for some $\vartheta$ and $\lambda$, or $JL(x) = 0$ for $L = \vartheta f + \lambda g$ (and some other conditions). Some further convexity requirements make these necessary conditions also sufficient. The fact that the solutions of (V.P.) are contained in the solution set of the equality $JL(x) = 0$ can be useful to introduce some duality theorems, as a conclusion of this section. In the vector case, too, duality theory is widely developed. Various approaches have been introduced and many duality problems have been
analyzed in order to solve the original problem or to obtain some further information about its solutions. Generally these approaches have a common structure composed of a weak duality assertion, which states an order relation between the optimal values of both problems, and a strong duality theorem, which assures the equality between these optimal values. In this section we will not treat the general duality theory in multiobjective programming. We limit ourselves to the so-called Mond-Weir vector duality, which has the advantage of using the same objective function as the primal problem. Besides, this duality naturally resorts to some aspects of generalized convexity. So in this section we associate a dual problem to the primal problem (V.P.):
$\max_{(u, \vartheta, \lambda) \in S'} f(u)$   (D.V.P.)
with
$S' = \{(u, \vartheta, \lambda) : \vartheta Jf(u) + \lambda Jg(u) = 0;\ \vartheta \in C^{\geqq} \setminus \{0\},\ \lambda \in \mathbb{R}^m_+;\ \lambda g(u) \geq 0\}$.
Of course the functions $f$ and $g$ are assumed to be differentiable.

Theorem 6.6.11 (weak duality). Let $x$ be any feasible point for (V.P.) and $(u, \vartheta, \lambda)$ be any feasible point for (D.V.P.). If the function $\vartheta f$ is pseudoconvex at $u$ and if $\lambda g$ is quasiconvex at $u$, then $f(x) - f(u) \notin -\mathrm{int}\,C$.
Proof. Since $x$ and $(u, \vartheta, \lambda)$ are feasible points, respectively, for (V.P.) and (D.V.P.), we have $\lambda g(x) - \lambda g(u) \leq 0$. From the quasiconvexity of $\lambda g$, we get:
$\nabla(\lambda g)(u)(x - u) = \lambda Jg(u)(x - u) \leq 0$
or:
$\vartheta Jf(u)(x - u) = \nabla(\vartheta f)(u)(x - u) \geq 0$.
The pseudoconvexity of the function $\vartheta f$ guarantees $\vartheta(f(x) - f(u)) \geq 0$. Since $\vartheta \in C^{\geqq}$, $\vartheta \neq 0$, $f(x) - f(u)$ cannot belong to $-\mathrm{int}\,C$. □
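Weak duality can be observed on a small instance. The data below are our own illustrative choice, not taken from the text: $n = 1$, $p = 2$, $C = \mathbb{R}^2_+$, $f(x) = (x^2, (x-2)^2)$ and $g(x) = -x$, so $S = \{x \geq 0\}$. Both components of $f$ are convex and $g$ is linear, so the assumptions of the weak duality theorem hold. With $\lambda = 0$, the dual constraint $\vartheta Jf(u) + \lambda Jg(u) = 0$ gives $u = 2\vartheta_2 / (\vartheta_1 + \vartheta_2)$. The sketch records, for a few weights, the constraint residual and whether any sampled primal point satisfies $f(x) - f(u) \in -\mathrm{int}\,\mathbb{R}^2_+$.

```python
def f(x):
    """Hypothetical bicriteria objective f(x) = (x^2, (x - 2)^2)."""
    return (x * x, (x - 2.0) ** 2)


def dual_point(t1, t2):
    """Dual feasible (u, theta, lam) with lam = 0, for theta = (t1, t2) >= 0, not both 0.

    theta Jf(u) = 2*t1*u + 2*t2*(u - 2) = 0 gives u = 2*t2 / (t1 + t2).
    """
    return 2.0 * t2 / (t1 + t2), (t1, t2), 0.0


def dual_residual(u, theta, lam):
    """theta Jf(u) + lam Jg(u), with Jg(u) = -1 for g(x) = -x."""
    return 2.0 * theta[0] * u + 2.0 * theta[1] * (u - 2.0) - lam


def strictly_better(a, b):
    """a - b in -int R^2_+: both components strictly smaller."""
    return a[0] < b[0] and a[1] < b[1]


primal = [k / 100.0 for k in range(0, 501)]  # sampled feasible points x in [0, 5]
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 1.0), (1.0, 4.0)]
results = []
for t1, t2 in weights:
    u, theta, lam = dual_point(t1, t2)
    residual = dual_residual(u, theta, lam)
    violated = any(strictly_better(f(x), f(u)) for x in primal)
    results.append((u, residual, violated))
    print(f"theta={theta}: u={u:.3f}, residual={residual:.1e}, violated={violated}")
```

A sampled grid can only fail to find a counterexample; the guarantee itself is the theorem.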
Remark 6.6.4. The generalized convexity assumptions of the previous theorem are satisfied if $f$ is C-convex at $u$ and $g_j$ is convex at $u$, $\forall j = 1, \dots, m$. Also appropriate invex-type hypotheses can be made. We have the same result as Theorem 6.6.11 if $f$ is C-invex at $u$ and $g_j$ is invex at $u$, $\forall j = 1, \dots, m$ (for the same function $\eta$), or if the Lagrangian function $\vartheta f + \lambda g$ is invex (or pseudoconvex) at $u$. The so-called strong duality theorems state that, if $x^0$ is a feasible point for (V.P.) which satisfies some optimality notion, then there exist $\vartheta^0$ and $\lambda^0$ such that $(x^0, \vartheta^0, \lambda^0)$ is a feasible point for (D.V.P.). An additional generalized convexity assumption implies that $(x^0, \vartheta^0, \lambda^0)$ also satisfies some optimality notion for (D.V.P.).

Theorem 6.6.12 (strong duality). If $x^0 \in WE(S)$, then there exist $\vartheta^0 \in C^{\geqq}$ and $\lambda^0 \in \mathbb{R}^m_+$ with $(\vartheta^0, \lambda^0) \neq 0$ such that $(x^0, \vartheta^0, \lambda^0)$ is feasible for (D.V.P.). If also $\vartheta^0 f$ and $\lambda^0 g$ are respectively pseudoconvex and quasiconvex functions, then $(x^0, \vartheta^0, \lambda^0)$ is a weakly efficient point for (D.V.P.) and the objective values of the primal and the dual problem are equal. Proof. From Theorem 6.6.6 we obtain the first part of the theorem. Now suppose that $(x^0, \vartheta^0, \lambda^0)$ is not weakly efficient for (D.V.P.); then there exists $(u, \vartheta, \lambda) \in S'$ such that $f(x^0) - f(u) \in -\mathrm{int}\,C$. This conclusion would contradict weak duality. Of course the objective values of the two problems are the same. □
6.7. Scalarization

In Section 6.1 we underlined the main differences between scalar and vector optimization problems. In the scalar case, where a complete order is given, we can always decide, for each pair of alternatives, which is preferred. This important feature is no longer valid in the vector case, because the preference order is only partial and a value function that can order the alternatives is lacking. To overcome this difficulty, techniques which convert vector problems into appropriate scalar problems can be applied. Since scalar optimization is widely developed, the importance of such a step is clear. Scalarization just means the replacement of a vector optimization problem by a suitable scalar problem with a real objective function which aggregates the criteria. This scalar function is called a scalarizing function; in Economics it is called a utility or welfare function. In this section different scalarization schemes will be presented and relations between the solutions of scalar problems and the different optimality notions that we introduced for the original (V.P.) will be investigated. The first scalarization technique goes back to Kuhn and Tucker. To our (V.P.) we now associate the following scalar problem:
$\min_{x \in S} \lambda f(x)$   (SP1)
with $\lambda \in C^{>}$ and $\|\lambda\| = 1$. In this manner, instead of the vector function $f$, we are led to minimize the scalar function $\sum_{i=1}^{p} \lambda_i f_i$ with a normalized weight vector $\lambda = (\lambda_1, \dots, \lambda_p)$ of the strict positive polar $C^{>}$. If $C = \mathbb{R}^p_+$, this means that we have $\lambda_i > 0$, $\forall i = 1, \dots, p$. The vector $\lambda$ can be considered as an index of the relative importance of the criteria. We will denote the set of points $x^0 \in S$ that solve (SP1) by $Pos^{>}(S)$; their values will compose the set $Pos^{>}(Z)$. Then a value $z^0 = f(x^0) \in Pos^{>}(Z)$ when there exists some $\lambda_0 \in C^{>}$ ($\|\lambda_0\| = 1$) such that $\lambda_0 z^0 \leq \lambda_0 z$, $\forall z \in Z$, or $\lambda_0 f(x^0) \leq \lambda_0 f(x)$, $\forall x \in S$.

Theorem 6.7.1. $Pos^{>}(Z) = PE(Z)_{Hu}$.
Proof. If $z^0 \in Pos^{>}(Z)$, then there exists $\lambda_0 \in C^{>}$ such that $\lambda_0(z^0 - z) \leq 0$, $\forall z \in Z$. Then we have $\lambda_0 z \geq 0$, $\forall z \in Z - z^0$, and $\lambda_0 c > 0$, $\forall c \in C \setminus \{0\}$. The set $\{z : \lambda_0 z \geq 0\}$ is a closed convex cone which contains $Z - z^0$ and $C$. Then the same set contains $\mathrm{cl\,conv\,cone}[(Z - z^0) \cup C]$. Consequently $\lambda_0 z \geq 0$, $\forall z \in \mathrm{cl\,conv\,cone}[(Z - z^0) \cup C]$. The existence of an element $z \in \mathrm{cl\,conv\,cone}[(Z - z^0) \cup C] \cap [-C \setminus \{0\}]$ is absurd because we would have $\lambda_0 z < 0$. On the other hand, let $z^0 \in PE(Z)_{Hu}$ and $\lambda_0$ an element of the nonnegative polar cone of $\mathrm{cl\,conv\,cone}[(Z - z^0) \cup C]$. Then we have $\lambda_0 z \geq 0$, $\forall z \in Z - z^0$, and $\lambda_0 c \geq 0$, $\forall c \in C$. In particular, if $c \in C \setminus \{0\}$, then $c \in \mathrm{cl\,conv\,cone}[(Z - z^0) \cup C]$ and the hypothesis $z^0 \in PE(Z)_{Hu}$ assures that $c \notin -\mathrm{cl\,conv\,cone}[(Z - z^0) \cup C]$. Then $\lambda_0 c > 0$. In this manner we have proved that $z^0 \in Pos^{>}(Z)$. □

Remark 6.7.1. The hypotheses of Theorems 6.4.4, 6.4.6 and 6.4.9, when they are fulfilled, assure that $Pos^{>}(Z)$ can also be identified, respectively, with $PE(Z)_{Be} = PE(Z)_{Ha} = PE(Z)_{He}$, with $PE(Z)_{SE} = PE(Z)_{G} = PE(Z)_{KT}$ and with $PE(Z)_{BO} = PE(Z)_{GBO} = PE(Z)_{LBO}$.
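For a finite $Z$ the scalar problem (SP1) can be solved by enumeration. The sketch below assumes $C = \mathbb{R}^2_+$, so that $C^{>}$ consists of the weight vectors with strictly positive entries, and uses hypothetical helper names. It shows that every minimizer is efficient, and also that an efficient point of a non-convex configuration may never be selected: for $(3, 3)$ we have $\min(4\lambda_1, 4\lambda_2) \leq 2(\lambda_1 + \lambda_2) < 3(\lambda_1 + \lambda_2)$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """b - a in R^p_+ minus {0}."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def efficient_points(Z: List[Point]) -> List[Point]:
    return [z for z in Z if not any(dominates(w, z) for w in Z)]


def sp1_minimizers(Z: List[Point], lam: Sequence[float]) -> List[Point]:
    """Minimizers of lam . z over a finite Z; lam must have positive entries.

    Normalizing lam to ||lam|| = 1 does not change the minimizers, so it is skipped.
    """
    if any(l <= 0 for l in lam):
        raise ValueError("(SP1) needs lam in the strict positive polar")
    values = [sum(l * c for l, c in zip(lam, z)) for z in Z]
    best = min(values)
    return [z for z, v in zip(Z, values) if v == best]


Z = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
for lam in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]:
    print(lam, sp1_minimizers(Z, lam))

# A non-convex configuration: (3, 3) is efficient but no positive weight selects it.
Z2 = [(0.0, 4.0), (3.0, 3.0), (4.0, 0.0)]
selected = {z for k in range(1, 100) for z in sp1_minimizers(Z2, (k, 100 - k))}
print(sorted(selected))
```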
Since we can identify the solutions of (SP1) with the set of Hurwicz's properly efficient values, it follows that $Pos^{>}(Z)$ is contained in $E(Z)$ and in $WE(Z)$. Theorem 6.7.2 is a density theorem of $Pos^{>}(Z)$ in $E(Z)$. The inclusion of $Pos^{>}(Z)$ in $E(Z)$ cannot be reversed: for the set $Z = \{(x, y) : y \geq x^2\}$ and the cone $C = \mathbb{R}^2_+$, the point $(0, 0) \in E(Z)$, but it does not minimize any linear function $\lambda z$ (with $\lambda \in C^{>}$, $\|\lambda\| = 1$ and $z \in Z$).

Theorem 6.7.2. If $Z$ is a C-closed and C-convex set, then $E(Z) \subset \mathrm{cl}\,Pos^{>}(Z)$.
Proof. From Theorem 6.4.4 it follows that $PE(Z)_{He} = PE(Z)_{Hu}$ and hence $\mathrm{cl}\,PE(Z)_{He} = \mathrm{cl}\,PE(Z)_{Hu} = \mathrm{cl}\,Pos^{>}(Z)$. We can suppose $E(Z) = E(Z + C) \neq \emptyset$. Then from Theorem 6.5.10 we obtain that $Z + C$ is C-closed and C-bounded. Theorem 6.5.13 guarantees the thesis $E(Z) = E(Z + C) \subset \mathrm{cl}\,PE(Z + C)_{He} = \mathrm{cl}\,PE(Z)_{He} = \mathrm{cl}\,Pos^{>}(Z)$. □ Roughly speaking, the scalarization (SP1) is too narrow. Any solution of (SP1) is also a solution of (V.P.) and is characterized as a proper value (in Hurwicz's sense). But with this scalarization we may lose some solutions of (V.P.); under the hypothesis of Theorem 6.7.2 the loss is to some extent compensated, as every $z^0 \in E(Z)$ can be obtained as $\lim_{k \to +\infty} z^k$, with $z^k \in Pos^{>}(Z)$.
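The parabola $Z = \{(x, y) : y \geq x^2\}$ mentioned before Theorem 6.7.2 illustrates both the gap and the density result. For $\lambda_2 > 0$ the minimum of $\lambda_1 x + \lambda_2 y$ lies on the boundary $y = x^2$ (raising $y$ only increases the objective), at $x = -\lambda_1 / (2\lambda_2)$. The origin is reached only for $\lambda_1 = 0$, a weight outside $C^{>}$, yet minimizers with $\lambda_1 \to 0$ converge to it. The closed form is derived here for the sketch, not quoted from the text.

```python
def sp1_minimizer_parabola(l1, l2):
    """Minimizer of l1*x + l2*y over Z = {(x, y) : y >= x**2}, for l1 >= 0, l2 > 0.

    On the boundary y = x**2, d/dx (l1*x + l2*x**2) = 0 gives x = -l1 / (2*l2).
    Normalizing (l1, l2) does not change the minimizer.
    """
    x = -l1 / (2.0 * l2)
    return (x, x * x)


# Strictly positive weights never return (0, 0) ...
for l1, l2 in [(1.0, 1.0), (0.1, 1.0), (1e-3, 1.0)]:
    print((l1, l2), sp1_minimizer_parabola(l1, l2))

# ... but letting l1 -> 0 yields points of Pos^>(Z) converging to (0, 0).
approx = [sp1_minimizer_parabola(10.0 ** -k, 1.0) for k in range(1, 8)]
print(approx[-1])
```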
We can associate another scalar problem to (V.P.) by allowing the multiplier $\lambda$ to be in the larger set $C^{\geqq}$ or, in the particular case of $Z \subset \mathbb{R}^p$ ordered in a componentwise sense, by allowing some of the weights of $f$ to be zero:
$\min_{x \in S} \lambda f(x)$   (SP2)
with $\lambda \in C^{\geqq}$, $\|\lambda\| = 1$. We will denote the point and the optimal value sets that solve (SP2) by $Pos^{\geqq}(S)$ and by $Pos^{\geqq}(Z)$. It is obvious that $Pos^{>}(Z) \subset Pos^{\geqq}(Z)$.

Theorem 6.7.3. If $Z$ is a closed set, then $\mathrm{cl}\,Pos^{>}(Z) \subset Pos^{\geqq}(Z)$.
Proof. If $\bar z \in \mathrm{cl}\,Pos^{>}(Z)$, there are two sequences $\{z^k\}$ and $\{\lambda_k\}$ with $z^k \in Pos^{>}(Z)$, $\lim_{k \to +\infty} z^k = \bar z$ and $\lambda_k \in C^{>}$, $\|\lambda_k\| = 1$, $\lambda_k z^k \leq \lambda_k z$ for any $z \in Z$. Without loss of generality, we may assume that $\{\lambda_k\}$ converges to some $\lambda \in C^{\geqq}$ with $\|\lambda\| = 1$. We have that $\bar z \in Pos^{\geqq}(Z)$ since $\lambda_k(z^k - z) \leq 0$, $\forall k$ and $\forall z \in Z$; taking the limit as $k \to +\infty$, we have $\lambda(\bar z - z) \leq 0$ for any $z \in Z$ (and $\bar z \in Z$ because $Z$ is closed). □

Theorem 6.7.4. $Pos^{\geqq}(Z) \subset WE(Z)$. Proof. If $z^0 \in Pos^{\geqq}(Z)$, then there exists a vector $\lambda_0 \in C^{\geqq}$ ($\|\lambda_0\| = 1$) such that $\lambda_0(z - z^0) \geq 0$, $\forall z \in Z$. Since we cannot have $\lambda_0 c = 0$ for $c \in -\mathrm{int}\,C$, we obtain $\lambda_0 c < 0$, $\forall c \in -\mathrm{int}\,C$. Hence it follows that $(Z - z^0) \cap (-\mathrm{int}\,C) = \emptyset$, i.e. $z^0 \in WE(Z)$. □

Without requiring further assumptions, the set $Pos^{\geqq}(Z)$ is placed between $Pos^{>}(Z) = PE(Z)_{Hu}$ and $WE(Z)$. This new result is more adequate when the ordering cone is open. Some further hypotheses allow us to specify the placement of $Pos^{\geqq}(Z)$ with regard to $WE(Z)$. When convexity assumptions are made, for any given weakly efficient point $x^0$ we will always be able to find a weight vector $\lambda_0 \in C^{\geqq}$ such that $z^0$ minimizes the linear function $\lambda_0 z$. This weight vector is not necessarily unique; there may be more than one weight producing a solution of (V.P.).

Theorem 6.7.5. Let $Z$ be a C-convex set. Then $Pos^{\geqq}(Z) = WE(Z)$. Proof. It is enough to prove that any $x^0 \in WE(S)$ is also a solution of (SP2) for some $\lambda \in C^{\geqq}$, $\|\lambda\| = 1$. If $x^0 \in WE(S)$, then $Z \cap (z^0 - \mathrm{int}\,C) = \emptyset$. Then also $(Z + C) \cap (z^0 - \mathrm{int}\,C) = \emptyset$; in fact, if there existed $z \in Z$ and $c \in C$ with $z + c \in z^0 - \mathrm{int}\,C$, we would have $z \in z^0 - \mathrm{int}\,C$, contradicting the hypothesis $x^0 \in WE(S)$. As the set $Z + C$ is convex, we can separate the two sets $z^0 - \mathrm{int}\,C$ and $Z + C$ by a nonzero vector $\lambda \in \mathbb{R}^p$, $\|\lambda\| = 1$. So we obtain $\lambda(z^0 - c') \leq \lambda(z + c'')$ for any $c' \in \mathrm{int}\,C$, $c'' \in C$, $z \in Z$. In particular this shows that $\lambda \in C^{\geqq}$. From $\lambda(z^0 - z - c' - c'') \leq 0$, we obtain $\lambda z^0 \leq \lambda z$, $\forall z \in Z$, or that $x^0 \in Pos^{\geqq}(S)$. □

Remark 6.7.2. From Theorem 6.7.5 it works out that $E(Z) = Pos^{\geqq}(Z)$ if $C$ is an open cone and $Z$ is a C-convex set.
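Allowing zero weights, as in (SP2), can select points that are only weakly efficient. The finite example below is our own illustration with $C = \mathbb{R}^2_+$: the weight $(1, 0)$ ignores the second criterion and returns the dominated point $(0, 2)$, which is still weakly efficient, as Theorem 6.7.4 predicts; a weight with a unique minimizer returns an efficient point, as in Theorem 6.7.6.

```python
def dominates(a, b):
    """b - a in R^p_+ minus {0}."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def strictly_dominates(a, b):
    """b - a in int R^p_+ (every component strictly smaller)."""
    return all(x < y for x, y in zip(a, b))


def efficient(Z):
    return [z for z in Z if not any(dominates(w, z) for w in Z)]


def weakly_efficient(Z):
    return [z for z in Z if not any(strictly_dominates(w, z) for w in Z)]


def sp2_minimizers(Z, lam):
    """Minimizers of lam . z over a finite Z, with lam >= 0 componentwise and lam != 0."""
    values = [sum(l * c for l, c in zip(lam, z)) for z in Z]
    best = min(values)
    return [z for z, v in zip(Z, values) if v == best]


Z = [(0.0, 1.0), (0.0, 2.0), (1.0, 0.0)]
print(sp2_minimizers(Z, (1.0, 0.0)))  # zero weight on the second criterion
print(efficient(Z), weakly_efficient(Z))
print(sp2_minimizers(Z, (2.0, 1.0)))  # a unique minimizer
```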
Theorem 6.7.6. If there exists $\lambda_0 \in C^{\geqq}$ such that $z^0$ uniquely (or strongly) minimizes $\lambda_0 z$ over $Z$, then $z^0 \in E(Z)$. Proof. If $z^0 \notin E(Z)$, then there would exist $z \in Z$ such that $z^0 - z \in C \setminus \{0\}$. Then $\lambda_0(z^0 - z) \geq 0$, which contradicts the hypothesis about $z^0$. □

The linear scalarization can be generalized by taking into account a general $C_1$-$C_2$-monotone transformation (not necessarily linear) of the objective function $f$ or of the image space $Z$. These definitions for functions $u : \mathbb{R}^p \to \mathbb{R}^s$ depend on the ordering of $\mathbb{R}^p$ and of $\mathbb{R}^s$. For $z^1, z^2 \in \mathbb{R}^p$ we will write $z^1 \geqq z^2$ or $u(z^1) \geqq u(z^2)$ when $z^1 - z^2 \in C_1$ or $u(z^1) - u(z^2) \in C_2$, where $C_1$ and $C_2$ are the ordering cones of $\mathbb{R}^p$ and $\mathbb{R}^s$, respectively. For $s = 1$ and $C_2 = \mathbb{R}_+$ we get the particular definitions of scalar C-monotone transformations ($C = C_1$).

Definition 6.7.1. We say that a function $u : Z \subset \mathbb{R}^p \to \mathbb{R}^s$ is:
a) $C_1$-$C_2$-increasing on $Z$ when, for any $z^1, z^2 \in Z$, $z^1 \geq z^2$ implies $u(z^1) > u(z^2)$;
b) strictly $C_1$-$C_2$-increasing on $Z$ when $z^1 \geq z^2$ implies $u(z^1) \geqq u(z^2)$ and $z^1 > z^2$ implies $u(z^1) > u(z^2)$;
c) weakly $C_1$-$C_2$-increasing on $Z$ when $z^1 > z^2$ implies $u(z^1) > u(z^2)$;
d) properly $C_1$-$C_2$-increasing on $Z$ when it is $C_1'$-$C_2$-increasing with respect to some (closed, pointed and convex) cone $C_1' \neq \mathbb{R}^p$ such that $C \setminus \{0\} \subset \mathrm{int}\,C_1'$.
The definitions of vector decreasing functions are analogous. The proof of the following theorem is immediate.

Theorem 6.7.7. For the Definitions 6.7.1, we get: d) ⇒ a) ⇒ b) ⇒ c).

Now let $u$ be a scalar function which satisfies some of Definitions 6.7.1. We introduce the set $Pos(u(Z))$ for the values of the solutions of the scalar problem:
$\min_{x \in S} u[f(x)]$   (SP3)
Theorem 6.7.8. The set $Pos(u(Z))$ is a subset of:
a) $PE(Z)_{He}$ if $u$ is a properly C-increasing function on $Z$;
b) $E(Z)$ if $u$ is C-increasing on $Z$;
c) $WE(Z)$ if $u$ is weakly C-increasing on $Z$.
Proof. It follows immediately from the definitions involved in the proposition. □ We are able to compare these results with the linear scalarization. Indeed, the function $u(z) = \lambda z$ with $\lambda \in C^{>}$ satisfies Definitions 6.7.1 a), b), c). The generalization of the statement $Pos^{>}(Z) = PE(Z)_{Hu}$ is only partial, and we can only assert the weaker inclusion $Pos(u(Z)) \subset E(Z)$ for all the C-increasing transformations. As for the scalar problem (SP2), the function $u(z) = \lambda z$ with $\lambda \in C^{\geqq}$ satisfies Definitions 6.7.1 b) and c). Then Theorem 6.7.8 c) generalizes Theorem 6.7.4 to all the weakly C-increasing transformations.
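Theorem 6.7.8 can be illustrated with two nonlinear scalarizing functions chosen by us for the purpose (they are not taken from the text), with $C = \mathbb{R}^2_+$. The function `u_exp` ($\sum_i e^{z_i}$) increases strictly in every coordinate, so it is C-increasing in the sense of Definition 6.7.1 a) and its minimizers are efficient. The function `u_max` ($\max_i z_i$) is only weakly C-increasing, and its minimizers may include a dominated, weakly efficient point.

```python
import math


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def strictly_dominates(a, b):
    return all(x < y for x, y in zip(a, b))


def minimizers(Z, u):
    """Pos(u(Z)) for a finite Z: the points of Z minimizing the scalar function u."""
    values = [u(z) for z in Z]
    best = min(values)
    return [z for z, v in zip(Z, values) if v == best]


def u_exp(z):
    # strictly increasing in every coordinate: C-increasing for C = R^p_+
    return sum(math.exp(c) for c in z)


def u_max(z):
    # only weakly C-increasing: equal maxima do not separate dominated points
    return max(z)


Z = [(1.0, 3.0), (2.0, 3.0), (3.0, 1.0)]
print(minimizers(Z, u_exp))  # efficient points only
print(minimizers(Z, u_max))  # includes the dominated but weakly efficient (2, 3)
```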
Now let us go back to the particular linear scalarization. The most expressive results were stated by Theorems 6.7.2 and 6.7.5, even if the solutions of the scalar problem only led to approximate solutions of (V.P.) or to weakly efficient solutions. However, both these theorems required a convexity assumption on the set $Z$. When this hypothesis is not satisfied, one can imagine convexifying the set of outcomes with a C-increasing function $H : Z \subset \mathbb{R}^p \to \mathbb{R}^p$ ($C_1 = C_2 = C$). We will follow [Henig, 1988] by considering, for $\lambda \in C^{>}$ or $\lambda \in C^{\geqq}$:
$\min_{x \in S} \lambda H[f(x)]$   (SP4)/(SP5)
Theorem 6.7.9. If the following hypotheses are satisfied:
i) $H$ is any C-increasing function on $Z$;
ii) there exists the inverse function $H^{-1}$, and also $H^{-1}$ is C-increasing on $H(Z)$;
iii) the set $H(Z)$ is C-convex and C-closed;
then the set of points that minimize $\lambda H(z)$ over $Z$ for some $\lambda \in C^{>}$ is dense in $E(Z)$. Proof. The hypotheses i) and ii) assure the following relations, $\forall z \in Z$:
$H(Z) \cap (H(z) - C) = H(Z) \cap H(z - C) = H(Z \cap (z - C))$.
Then $z^0 \in E(Z)$ if and only if $H(z^0) \in E(H(Z))$. Now the conclusion follows from Theorem 6.7.2. □
Theorem 6.7.9 generalizes the previous Theorem 6.7.2. In any case it is not necessarily true that properly efficient points (in Hurwicz's sense) can be obtained by minimizing the functions $\lambda H(z)$ with $\lambda \in C^{>}$.
Example 6.7.1. For $Z = \{(z_1, z_2) : z_1 \leq 0,\ z_2 \leq 0,\ z_1 + z_2 \geq -1\}$ we have $E(Z) = PE(Z)_{Hu} = \{(z_1, z_2) : z_1 \leq 0,\ z_2 \leq 0,\ z_1 + z_2 = -1\}$. Let us now consider the $\mathbb{R}^2_+$-increasing function $H = (H_1, H_2)$ with $H_1(z) = -\sqrt{-z_1}$ and $H_2(z) = -\sqrt{-z_2}$. If we minimize $\sum_{i=1}^{2} \lambda_i H_i(z)$ with $\lambda_i > 0$ ($i = 1, 2$), we are not able to find the proper solutions $(0, -1)$ and $(-1, 0)$.

Theorem 6.7.10. Let $H$ be a C-increasing function on $Z$, with the inverse function $H^{-1}$ also C-increasing on $H(Z)$, and let the set $H(Z)$ be C-convex. If $z^0 \in E(Z)$, then there exists $\lambda_0 \in C^{\geqq}$ such that $z^0$ minimizes $\lambda_0 H(z)$ over $Z$. Proof. We have already noticed (see the proof of Theorem 6.7.9) that, if $z^0 \in E(Z)$, then $H(z^0) \in E(H(Z))$. The claim is verified by Theorem 6.7.5. □

The second scalarization scheme (see [Lin, 1976c] and [Lin, 1977]) that we are going to introduce is also based upon the remark that linear scalarization is actually successful only under a certain directional convexity condition. This remark leads us to work out a different approach, where no convexity condition is required at all in order to obtain the entire desired set of quasi-efficient solutions. As we will see, this notion is different from, but practically as good as, the notion of efficient solutions. The main idea is the conversion of all but one of the multiple objectives to equality constraints and the determination of the optimal solutions of the resultant single-objective problem. This scalarization technique was called by [Lin, 1976c] the method of proper equality constraints. Its features make quite natural the reference to the componentwise ordering.

Definition 6.7.2. Let $\lambda \in \mathbb{R}^p_+$, $\|\lambda\| = 1$. We call a vector $z^0$ $\lambda$-lineally minimum for the set $Z \subset \mathbb{R}^p$ when it minimizes the $\lambda$-projection of the lineal subset containing $z^0$, given by the intersection of $\mathrm{cl}\,Z$ with the straight line $\{z^0 + \alpha\lambda;\ \alpha \in \mathbb{R}\}$, or $\lambda z^0 = \min\{\lambda z : z \in \mathrm{cl}\,Z,\ z = z^0 + \alpha\lambda,\ \alpha \in \mathbb{R}\}$. We will write $z^0 \in \mathcal{L}(\lambda)$.

Remark 6.7.3. It is easy to verify that $z^0$ satisfies the previous definition if and only if there is no negative number $\alpha$ such that $z^0 + \alpha\lambda \in \mathrm{cl}\,Z$. Geometrically $\mathcal{L}(\lambda)$ represents those points of $\mathrm{cl}\,Z$ that are exposed in the direction $\lambda$.
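The claim of Example 6.7.1 can be checked by a grid search. Since $\lambda H$ is $\mathbb{R}^2_+$-increasing for $\lambda_i > 0$, its minimizers lie in $E(Z)$ by Theorem 6.7.8 b), so the sketch only scans the segment $E(Z) = \{(-t, -(1-t)) : 0 \leq t \leq 1\}$. The closed form $t^* = \lambda_1^2 / (\lambda_1^2 + \lambda_2^2)$, obtained here by maximizing $\lambda_1\sqrt{t} + \lambda_2\sqrt{1-t}$, is our own derivation used as a cross-check; it always lies strictly inside $(0, 1)$, so the endpoints $(0, -1)$ ($t = 0$) and $(-1, 0)$ ($t = 1$) are never found.

```python
import math


def H(z):
    """H(z) = (-sqrt(-z1), -sqrt(-z2)) from Example 6.7.1 (defined for z <= 0)."""
    return (-math.sqrt(-z[0]), -math.sqrt(-z[1]))


def best_t(l1, l2, n=10001):
    """Grid search of l1*H1 + l2*H2 over E(Z) = {(-t, -(1 - t)) : 0 <= t <= 1}.

    Returns the parameter t of the best grid point; t = 0 is (0, -1), t = 1 is (-1, 0).
    """
    best, best_val = 0.0, math.inf
    for k in range(n):
        t = k / (n - 1)
        h = H((-t, -(1.0 - t)))
        val = l1 * h[0] + l2 * h[1]
        if val < best_val:
            best, best_val = t, val
    return best


for l1, l2 in [(1.0, 1.0), (1.0, 3.0), (5.0, 1.0), (0.1, 1.0)]:
    print((l1, l2), best_t(l1, l2), l1 ** 2 / (l1 ** 2 + l2 ** 2))
```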
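Definition 6.7.3. A vector $z^0$ is said to be quasi-minimum when it is $\lambda$-lineally minimum for every $\lambda \in \mathbb{R}^p_+$, $\|\lambda\| = 1$. We will write
$$z^0 \in Q(Z) = \bigcap_{\lambda \in \mathbb{R}^p_+,\ \|\lambda\| = 1} \mathcal{L}(\lambda).$$

Theorem 6.7.11. $z^0 \in Q(Z)$ if and only if $z^0 \in E(\operatorname{cl} Z)$.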
Proof. Assume that $z^0 \in Q(Z)$ but $z^0 \notin E(\operatorname{cl} Z)$. Then there exists $z' \in \operatorname{cl} Z$ such that $z' \le z^0$, $z' \ne z^0$, i.e. $z^0 - z' \in \mathbb{R}^p_+ \setminus \{0\}$, i.e. $z^0 = z' + \lambda$ with $\lambda \in \mathbb{R}^p_+ \setminus \{0\}$. Setting $\mu = \lambda/\|\lambda\|$, we have $z' = z^0 - \|\lambda\|\mu$ and $\mu z^0 = \mu z' + \|\lambda\| > \mu z'$; hence $z^0$ is not $\mu$-lineally minimum and so it is not an element of $Q(Z)$. On the other hand, $z^0 \notin Q(Z)$ implies that $z^0$ is not a $\lambda$-lineally minimum vector for some $\lambda \in \mathbb{R}^p_+$, $\|\lambda\| = 1$. Then, for some $\alpha \in \mathbb{R}$, there exists some $z' = z^0 + \alpha\lambda \in \operatorname{cl} Z$ such that $\lambda z' < \lambda z^0$. Then $\alpha$ must be negative. Consequently $z' \le z^0$, $z' \ne z^0$, and $z^0$ cannot belong to $E(\operatorname{cl} Z)$. □

This theorem makes clear our interest in the set $Q(Z)$. If $Z$ is a closed set, we have $Q(Z) = E(Z)$. In any case, a quasi-minimum vector is a solution of (V.P.) if it lies in $Z$; in general, it is only infinitesimally distant from $Z$ and it possesses all the characteristics of minimal values. So we can focus our attention on the set $Q(Z)$.
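Definition 6.7.3 and Remark 6.7.3 can be tested directly on a finite sample of points. A minimal sketch, assuming Python with NumPy, a finite point set standing in for $\operatorname{cl} Z$, and hypothetical function names: it checks $\lambda$-lineal minimality and quasi-minimality on eleven points of the diagonal segment $\{(t,t) : 0 \le t \le 1\}$. Every point turns out to be $e^1$-lineally minimum, while only the origin is quasi-minimum, in agreement with $Q(Z) = E(\operatorname{cl} Z)$.

```python
import numpy as np

def is_lineally_minimum(z0, lam, Z, tol=1e-12):
    # Remark 6.7.3: z0 is lam-lineally minimum iff no point of Z equals z0 + alpha*lam with alpha < 0.
    lam = np.asarray(lam, dtype=float)
    lam = lam / np.linalg.norm(lam)          # Definition 6.7.2 uses ||lam|| = 1
    for z in Z:
        d = z - z0
        alpha = float(lam @ d)               # signed length of d along lam
        if alpha < -tol and np.linalg.norm(d - alpha * lam) <= tol:
            return False                     # z lies on the line, strictly "below" z0
    return True

def is_quasi_minimum(z0, Z):
    # For a finite Z, a unit direction lam in R^p_+ can only fail if the line hits some z
    # with z0 - z in R^p_+ \ {0}; testing lam = (z0 - z)/||z0 - z|| for those z is enough.
    for z in Z:
        d = z0 - z
        if np.all(d >= 0) and np.any(d > 0) and not is_lineally_minimum(z0, d, Z):
            return False
    return True

# Hypothetical data: eleven points on the diagonal segment {(t, t) : 0 <= t <= 1}.
Z = np.array([[t, t] for t in np.linspace(0.0, 1.0, 11)])
print([is_lineally_minimum(z, [1.0, 0.0], Z) for z in Z])   # every point is e^1-lineally minimum
print([is_quasi_minimum(z, Z) for z in Z])                  # only (0, 0) is quasi-minimum
```

Restricting the tested directions to the differences $z^0 - z$ mirrors the first half of the proof of Theorem 6.7.11.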
To obtain $Q(Z)$ we should, by definition, find the intersection of the sets $\mathcal{L}(\lambda)$ with $\lambda$ ranging over an infinite number of vectors of $\mathbb{R}^p_+$. This is a nearly impossible task. We can then consider the fundamental vectors $e^i$ ($i = 1,\dots,p$) and the sets $\mathcal{L}(e^i)$. But again, even the determination of only a finite number of these sets can present some difficulties (and the inclusion
$$E(\operatorname{cl} Z) = Q(Z) \subset \bigcap_{i=1}^{p} \mathcal{L}(e^i)$$
is not satisfactory). Then we will consider a particular set, say $\mathcal{L}(e^k)$, supposing that just the set of $e^k$-lineally minimum vectors is of specific interest, and we will try to extract the set $Q(Z) \subset \mathcal{L}(e^k)$ under some conditions that assure the equality $Q(Z) = \mathcal{L}(e^k)$.

Remark 6.7.4. Generally the inclusion $Q(Z) \subset \mathcal{L}(e^k)$ cannot be reversed. For the set $Z = \{z \in \mathbb{R}^2 : z_1 = z_2,\ 0 \le z_1 \le 1\}$ we have $E(\operatorname{cl} Z) = Q(Z) = \{(0,0)\}$ but $\mathcal{L}(e^k) = Z$ ($k = 1,2$).

This process is included among the scalarization techniques since it is deeply linked with the following scalar optimization problem:
$$(\mathrm{SP}_k)\qquad \min z_k, \quad z \in Z_k(\alpha) = \{z \in \operatorname{cl} Z : z_i = \alpha_i\ (i = 1,\dots,p;\ i \ne k),\ \alpha_i \in \mathbb{R}\}.$$
If $\Omega_k$ denotes the set of all vectors $\alpha \in \mathbb{R}^{p-1}$ for which, up to an obvious reordering of its components, there is some $z \in \operatorname{cl} Z$ such that $z_i = \alpha_i$ ($i \ne k$), and $\varphi_k(\alpha)$ denotes $\inf\{z_k : z_i = \alpha_i\ (i = 1,\dots,p,\ i \ne k),\ z \in \operatorname{cl} Z\}$, we have for instance that $\mathcal{L}(e^1) = \{(\varphi_1(\alpha),\alpha) : \alpha \in \Omega_1\}$. So the determination of the set $\mathcal{L}(e^k)$ has been achieved by means of a scalar problem. Now we have to strengthen the relationships between $e^k$-lineally minimum vectors and quasi-minimum vectors.

Theorem 6.7.12. An element $(\varphi_k(\alpha^0),\alpha^0) \in \mathcal{L}(e^k)$ is a quasi-minimum vector for $Z$ if and only if the scalar function $\varphi_k$ is $\mathbb{R}^{p-1}_+$-left-decreasing at $\alpha^0$, i.e. $\varphi_k(\alpha) > \varphi_k(\alpha^0)$ for all $\alpha \in \Omega_k$ with $\alpha \le \alpha^0$, $\alpha \ne \alpha^0$.

Proof. Suppose $\varphi_k(\alpha') = \varphi_k(\alpha^0)$ for some $\alpha' \in \Omega_k$, 0 …

… $\|z\|_a = \inf\{t > 0 : t^{-1} z \in [-a,a]\}$.
It is easy to check that the Minkowski functional is well defined for every $z \in Z$ (it is indeed a norm for each $a \in \operatorname{int} C$) and that this parametric norm is chosen in such a way that its unit ball equals the order interval: $[-a,a] = \{z \in Z : \|z\|_a \le 1\}$. We will use this remark in the proof of the following theorem.

Theorem 6.7.16. Let the set $Z$ have a strict lower bound $\hat z$, i.e. $Z \subset \hat z + \operatorname{int} C$. Then $z^0 \in E(Z)$ if and only if $\|z^0 - \hat z\|_a < \|z - \hat z\|_a$ for some $a \in \operatorname{int} C$ and for every $z \in Z$, $z \ne z^0$. More generally, $z^0 \in WE(Z)$ if and only if $\|z^0 - \hat z\|_a \le \|z - \hat z\|_a$ for some $a \in \operatorname{int} C$ and for every $z \in Z$.

Proof. If $z^0 \in E(Z)$, one has $(z^0 - C) \cap Z = \{z^0\}$ and then $(z^0 - \hat z - C) \cap (Z - \hat z) = \{z^0 - \hat z\}$. As $\hat z - z^0 \in -\operatorname{int} C$, we obtain $Z - \hat z \subset \operatorname{int} C \subset (\hat z - z^0) + \operatorname{int} C \subset (\hat z - z^0) + C$. So we conclude that $(\hat z - z^0 + C) \cap (z^0 - \hat z - C) \cap (Z - \hat z) = \{z^0 - \hat z\}$, i.e. $[\hat z - z^0, z^0 - \hat z] \cap (Z - \hat z) = \{z^0 - \hat z\}$. Hence $z^0$ strictly minimizes $\|z - \hat z\|_a$ for $a = z^0 - \hat z$. Conversely, let us assume that $z^0 \notin E(Z)$. Then there exists some $\bar z \ne z^0$ with $\bar z \in Z \cap (z^0 - C)$. Consequently we have $\bar z - \hat z \in z^0 - \hat z - C$, which implies (by the definition of the Minkowski functional) the contradiction $\|\bar z - \hat z\|_a \le \|z^0 - \hat z\|_a$ for any $a \in \operatorname{int} C$.

Now assume that $z^0 \in WE(Z)$. In a similar way as for efficient points, we obtain $(\hat z - z^0 + \operatorname{int} C) \cap (z^0 - \hat z - \operatorname{int} C) \cap (Z - \hat z) = \emptyset$, i.e. $\operatorname{int}[\hat z - z^0, z^0 - \hat z] \cap (Z - \hat z) = \emptyset$. This equality implies $\{z : \|z\|_a < 1\} \cap (Z - \hat z) = \emptyset$ for $a = z^0 - \hat z$, i.e. $z^0$ minimizes the $a$-distance from $\hat z$. For the converse implication, let us suppose that $z^0 \in Z$ minimizes $\|z - \hat z\|_a$ for some $a \in \operatorname{int} C$ but $z^0 \notin WE(Z)$. Then there exists some $\bar z \in Z \cap (z^0 - \operatorname{int} C)$. We obtain $\bar z - \hat z \in z^0 - \hat z - \operatorname{int} C$ and the contradiction $\|\bar z - \hat z\|_a < \|z^0 - \hat z\|_a$. □
Theorem 6.7.16 allows us to characterize the efficient and the weakly efficient values of problem (V.P.) as minimal solutions of a scalar problem with a parametric norm. No convexity assumption is required, only that the set $Z$ has a strict lower bound. The Minkowski functional was more recently used by [Zheng, 2000] to characterize the Henig properly efficient values and the superefficient values. Jahn's approach was also generalized by other authors; for instance, [Luc, 1987a] does not consider a norm but a general function satisfying a monotonicity assumption.

Remark 6.7.6. When $\mathbb{R}^p$ is componentwise ordered, it is easy to verify that (for every vector $a \in \mathbb{R}^p$ with positive components) the Minkowski functional is given by $\|z\|_a = \max_i |z_i|/a_i$.
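Theorem 6.7.16, together with the formula of Remark 6.7.6, can be illustrated on a finite set in $\mathbb{R}^2$ with the componentwise order. A minimal sketch, assuming Python with NumPy and hypothetical data: $\hat z = (0,0)$ is a strict lower bound, and for each point the test uses the parameter $a = z^0 - \hat z$ taken from the first part of the proof, accepting $z^0$ when every other point has a strictly larger $a$-distance from $\hat z$.

```python
import numpy as np

def minkowski_norm(z, a):
    # ||z||_a = max_i |z_i| / a_i for the componentwise order (Remark 6.7.6); requires a > 0
    z = np.asarray(z, dtype=float)
    a = np.asarray(a, dtype=float)
    return float(np.max(np.abs(z) / a))

def is_efficient(z0, Z):
    # z0 is efficient if no z in Z satisfies z <= z0 componentwise with z != z0
    return not any(np.all(z <= z0) and np.any(z < z0) for z in Z)

def strictly_minimizes(z0, Z, z_hat):
    # Certificate from the proof of Theorem 6.7.16: with a = z0 - z_hat, every other
    # point of Z must have a-distance from z_hat strictly larger than that of z0.
    a = z0 - z_hat
    d0 = minkowski_norm(z0 - z_hat, a)
    return all(minkowski_norm(z - z_hat, a) > d0 for z in Z if not np.array_equal(z, z0))

Z = np.array([[1, 4], [2, 2], [4, 1], [3, 3], [4, 4]], dtype=float)
z_hat = np.array([0.0, 0.0])  # Z lies in z_hat + int R^2_+
for z0 in Z:
    print(z0, is_efficient(z0, Z), strictly_minimizes(z0, Z, z_hat))
```

The three efficient points $(1,4)$, $(2,2)$, $(4,1)$ pass the test; the dominated points $(3,3)$ and $(4,4)$ fail it because $(2,2)$ has a-distance below 1.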
6.8. The Nondifferentiable Case

In Chapter Four, nonsmooth calculus was introduced for scalar functions $f : X \subset \mathbb{R}^n \to \mathbb{R}$, together with its applications to optimization. In particular, we recall the definition of the Clarke generalized derivative for a function that is Lipschitzian around $x^0 \in \operatorname{int} X$:
$$f^0(x^0,y) = \limsup_{\substack{x \to x^0\\ t \to 0^+}} \frac{f(x+ty) - f(x)}{t} = \inf_{\substack{\varepsilon > 0\\ \delta > 0}}\ \sup_{\substack{\|x - x^0\| < \delta\\ 0 < t < \varepsilon}} \frac{f(x+ty) - f(x)}{t}. \qquad (*)$$
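… $\mathbb{R}^p$ at a point $x^0 \in \operatorname{cl} X$ is defined as
$$L_f(x^0) = \bigcap_{N(x^0) \in \mathcal{N}} \operatorname{cl}\{f(x) : x \in N(x^0)\},$$
where $\mathcal{N}$ denotes the family of the neighborhoods $N(x^0)$ of $x^0$. Then, if $L_f(x^0) \ne \emptyset$, the following inequalities hold:
$$\sup_{N(x^0) \in \mathcal{N}}\ \inf_{x \in N(x^0)} f(x) \le \inf L_f(x^0) \le \sup L_f(x^0) \le \inf_{N(x^0) \in \mathcal{N}}\ \sup_{x \in N(x^0)} f(x).$$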
We will confine ourselves to proving the last inequality. Fix $N \in \mathcal{N}$ and let $s_N = \sup f(N)$. It follows that
$$\sup f(N) = s_N = \sup \operatorname{cl} f(N) \ge \sup\Big\{\bigcap_{N(x^0) \in \mathcal{N}} \operatorname{cl} f(N(x^0))\Big\} = \sup L_f(x^0).$$
By taking the infimum with respect to $N(x^0)$, we obtain $\inf_{N(x^0) \in \mathcal{N}} \sup_{x \in N(x^0)} f(x) \ge \sup L_f(x^0)$.

Example 6.8.1. The inequality just proved can be strict. Consider the function $f : (-1,1) \subset \mathbb{R} \to \mathbb{R}^2$, with $\mathbb{R}^2$ componentwise ordered, and with images
$$f\left(\left(-\tfrac{1}{n}, \tfrac{1}{n}\right)\right) = \big(\{0\} \times (-\infty,-n)\big) \cup \big((-\infty,-n) \times \{0\}\big) \cup \{(-1,-1)\}, \quad n \in \mathbb{N}.$$
It works out that $\sup L_f(0) = (-1,-1)$ and
$$\sup_{x \in (-1/n,\,1/n)} f(x) = (0,0), \quad \forall n \in \mathbb{N}.$$
By the previous inequality we deduce that, in the vector case, the notion of maximum limit can in general be defined in different ways. In order to extend the definition of the Clarke generalized derivative (*), one needs to specify the approach. From now on we will understand the maximum limit as the supremum of the set of the limit points of $f$ at $x^0$. This choice has the merit of also involving the topological structure of the image space $Z$. Moreover, the previous inequality suggests that a finer analysis is obtained by means of $\sup L_f(x^0)$ rather than $\inf_{N(x^0) \in \mathcal{N}} \sup_{x \in N(x^0)} f(x)$.
Let $q(x,t,y) = t^{-1}[f(x+ty) - f(x)]$. Then the set
$$Df(x^0,y) = \bigcap_{\substack{N(x^0) \in \mathcal{N}\\ \varepsilon > 0}} \operatorname{cl}\{q(x,t,y) : x \in N(x^0),\ 0 < t < \varepsilon\}$$
represents the limit class of $q$, i.e. the set of limits of sequences such as $t_k^{-1}[f(x^k + t_k y) - f(x^k)]$ with $\lim x^k = x^0$ and $\lim t_k = 0^+$.
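The limit class $Df(x^0,y)$ can be approximated by evaluating $q(x,t,y)$ at many points $x$ near $x^0$ and small $t > 0$. A minimal sketch, assuming Python with NumPy and the hypothetical function $f(x) = (|x|, -|x|)$ from $\mathbb{R}$ to $\mathbb{R}^2$ at $x^0 = 0$ with $y = 1$: the samples fill the segment $\{(s,-s) : -1 \le s \le 1\}$, which is $Df(0,1)$ for this $f$, and their norms never exceed $L\|y\|$ with Lipschitz constant $L = \sqrt{2}$.

```python
import numpy as np

def f(x):
    # Hypothetical example f: R -> R^2, f(x) = (|x|, -|x|); Lipschitz with constant sqrt(2)
    return np.array([abs(x), -abs(x)])

def q(x, t, y):
    # Difference quotient q(x, t, y) = t^{-1} [f(x + t y) - f(x)]
    return (f(x + t * y) - f(x)) / t

rng = np.random.default_rng(0)
x0, y = 0.0, 1.0
xs = x0 + rng.uniform(-1e-3, 1e-3, 5000)     # points x close to x0
ts = rng.uniform(1e-6, 1e-3, 5000)           # small steps t > 0
samples = np.array([q(x, t, y) for x, t in zip(xs, ts)])
print(samples[:, 0].min(), samples[:, 0].max())   # approximately -1 and 1
```

Sampling only approximates the intersection of closures in the definition; it is meant as a picture of the limit class, not as a way to compute it exactly.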
If /
: X
+
C iR'^ -> ^ ^ is a locally Lipschitz function
at x^ G i n t X (with Lipschitz constant L) then there exist a neighborhood N{x^)
and £ > 0 such that q{x, t,y) e L \\y\\ B.Wx
e N{x^)
and
V t G (0, s), where B denotes the unit ball. Proof. Let N{x^) be a neighborhood where / is a Lipschitzian function. It is always possible to find a neighborhood N{x^) and e > 0 such that N{x^) + £^(0) C N{x^) for any neighborhood V of the origin. From the definition of Lipschitzian function, \/x 6 N{x^) and V t G (0,£:), we get ||g(x,t,y)||^L||yl|. D Theorem 6.8.I. let f : X C M^ -^ EP be a locally Lipschitz function at a ; ° € i n t X . Then, V y € iR":
a) $Df(x^0,y) \ne \emptyset$;
b) $Df(x^0,y)$ is a compact set;
c) $Df(x^0,\alpha y) = \alpha Df(x^0,y)$, $\forall \alpha > 0$;
d) $D(\alpha f)(x^0,y) = \alpha Df(x^0,y)$, $\forall \alpha > 0$;
e) $D(-f)(x^0,y) = Df(x^0,-y)$;
f) $Df(x^0,y^1 + y^2) \subset Df(x^0,y^1) + Df(x^0,y^2)$, $\forall y^1, y^2 \in \mathbb{R}^n$;
g) $D(f_1 + f_2)(x^0,y) \subset Df_1(x^0,y) + Df_2(x^0,y)$, with $f_1$ and $f_2$ locally Lipschitzian at $x^0$.

Proof. a) It follows immediately from Lemma 6.8.1. b) This statement also follows immediately from the definition of $Df(x^0,y)$ and from the previous lemma. c) and d) See the definition of $q(x,t,y)$.
e) Let $l \in D(-f)(x^0,y)$. Then there exist $\{x^k\}$ and $\{t_k\}$ with $\lim_{k\to+\infty} x^k = x^0$ and $\lim_{k\to+\infty} t_k = 0^+$ such that
$$\lim_{k\to+\infty} t_k^{-1}\big[-f(x^k + t_k y) + f(x^k)\big] = l.$$
It follows that
$$\lim_{k\to+\infty} t_k^{-1}\big[f(x^k + t_k y - t_k y) - f(x^k + t_k y)\big] = l,$$
i.e. $l \in Df(x^0,-y)$, since $x^k + t_k y \to x^0$. In the same manner one can prove $Df(x^0,-y) \subset D(-f)(x^0,y)$.

f) If $l \in Df(x^0,y^1 + y^2)$, there exist $\{x^k\}$ and $\{t_k\}$ with $\lim x^k = x^0$, $\lim t_k = 0^+$ and $\lim q(x^k,t_k,y^1 + y^2) = l$. On the other hand, we have
$$\begin{aligned} q(x^k,t_k,y^1 + y^2) &= t_k^{-1}\big[f(x^k + t_k y^1 + t_k y^2) - f(x^k)\big] \\ &= t_k^{-1}\big[f(x^k + t_k y^1 + t_k y^2) - f(x^k + t_k y^1)\big] + t_k^{-1}\big[f(x^k + t_k y^1) - f(x^k)\big] \\ &= q(x^k + t_k y^1, t_k, y^2) + q(x^k, t_k, y^1). \end{aligned}$$
Thanks to Lemma 6.8.1 this last sequence (or possibly a subsequence) converges to an element of $Df(x^0,y^2) + Df(x^0,y^1)$.

g) The proof is the same as in f). □
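The algebraic split used in part f) of the proof, $q(x,t,y^1+y^2) = q(x + t y^1, t, y^2) + q(x,t,y^1)$, and the bound of Lemma 6.8.1 can both be checked numerically. A minimal sketch, assuming Python with NumPy and a hypothetical Lipschitz map $\mathbb{R}^2 \to \mathbb{R}^2$ chosen only for illustration; its Lipschitz constant in the Euclidean norm is at most $\sqrt{3}$.

```python
import numpy as np

def f(x):
    # Hypothetical Lipschitz map R^2 -> R^2 (not taken from the text)
    return np.array([abs(x[0]) - x[1], max(x[0], x[1])])

def q(x, t, y):
    # Difference quotient q(x, t, y) = t^{-1} [f(x + t y) - f(x)]
    return (f(x + t * y) - f(x)) / t

rng = np.random.default_rng(2)
checks = []
for _ in range(100):
    x, y1, y2 = rng.normal(size=(3, 2))
    t = 1e-3
    lhs = q(x, t, y1 + y2)
    rhs = q(x + t * y1, t, y2) + q(x, t, y1)   # telescoping split used in part f)
    bound_ok = np.linalg.norm(q(x, t, y1)) <= np.sqrt(3.0) * np.linalg.norm(y1) + 1e-9
    checks.append(bool(np.allclose(lhs, rhs)) and bool(bound_ok))
print(all(checks))
```

The identity holds exactly in exact arithmetic; the tolerance in `np.allclose` only absorbs floating-point rounding.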
Definition 6.8.2. Let $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ be locally Lipschitz at $x^0 \in \operatorname{int} X$. The function $f^0(x^0,\cdot) : \mathbb{R}^n \to \mathbb{R}^p$,
$$f^0(x^0,y) = \sup Df(x^0,y),$$
is said to be the generalized directional derivative of $f$ at $x^0$.

Remark 6.8.1. The Lipschitz assumption implies (see also Theorem 6.8.1 a)) $f^0(x^0,y) > -\infty$. The usual hypothesis $\operatorname{int} C \ne \emptyset$ allows us to exclude the case $f^0(x^0,y) = +\infty$. Indeed, from Theorem 6.8.1 b) we have the boundedness of $Df(x^0,y)$, i.e. the existence of a neighborhood $U_n(0)$ such that $Df(x^0,y) \subset U_n(0)$. If $c \in \operatorname{int} C$, there exists $U_r(0)$ with $c - U_r(0) \subset C$, so that $U_n(0) = \frac{n}{r}U_r(0) \subset \frac{n}{r}c - C$. The set $Df(x^0,y)$ is therefore bounded above and the completeness axiom ensures $f^0(x^0,y) < +\infty$.
Theorem 6.8.2. Let $f, f_1, f_2 : X \subset \mathbb{R}^n \to \mathbb{R}^p$ be locally Lipschitz functions at $x^0 \in \operatorname{int} X$. The following properties hold (for any $y, y^1, y^2 \in \mathbb{R}^n$):
a) $f^0(x^0,\alpha y) = \alpha f^0(x^0,y)$, $\forall \alpha > 0$;
b) $(\alpha f)^0(x^0,y) = \alpha f^0(x^0,y)$, $\forall \alpha > 0$;
c) $(-f)^0(x^0,y) = f^0(x^0,-y)$;
d) $f^0(x^0,y^1 + y^2) \le f^0(x^0,y^1) + f^0(x^0,y^2)$;
e) $(f_1 + f_2)^0(x^0,y) \le f_1^0(x^0,y) + f_2^0(x^0,y)$.

Proof. One can proceed as in Theorem 6.8.1. □
In the same spirit as in the scalar case, we now introduce the notion of generalized subdifferential. It reduces to the definition of Chapter Four when $f$ is a scalar function. In Definition 6.8.3, $L(\mathbb{R}^n,\mathbb{R}^p)$ will denote the space of the linear maps from $\mathbb{R}^n$ to $\mathbb{R}^p$.

Definition 6.8.3. The set
$$\partial^0 f(x^0) = \{A \in L(\mathbb{R}^n,\mathbb{R}^p) : Ay \le f^0(x^0,y),\ \forall y \in \mathbb{R}^n\}$$
is said to be the generalized subdifferential at $x^0 \in \operatorname{int} X$ of the function $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$, which is locally Lipschitzian at $x^0$.

Theorem 6.8.3. The generalized subdifferential $\partial^0 f(x^0)$ is a nonempty convex set with
$$f^0(x^0,y) = \max_{A \in \partial^0 f(x^0)} Ay.$$
Proof. Fix any $y \ne 0$. We can define a map $T(\alpha y)$, linear with respect to $\alpha$, such that $T(0) = 0$ and $T(y) = f^0(x^0,y)$. By its linearity we have $T(\alpha y) = f^0(x^0,\alpha y)$, $\forall \alpha \ge 0$. From $0 \le f^0(x^0,y) + f^0(x^0,-y)$ it follows, for any $\alpha \ge 0$, that $T(-\alpha y) = -T(\alpha y) = -f^0(x^0,\alpha y) \le f^0(x^0,-\alpha y)$. The inequality $T(\alpha y) \le f^0(x^0,\alpha y)$, $\forall \alpha \in \mathbb{R}$, and the Hahn-Banach theorem (extended to functions with values in a complete ordered linear space) produce a linear map $A$ extending $T$ such that $Ay \le f^0(x^0,y)$, $\forall y \in \mathbb{R}^n$; in particular $Ay = T(y) = f^0(x^0,y)$ for the fixed $y$. We have $\partial^0 f(x^0) \ne \emptyset$, as $A \in \partial^0 f(x^0)$, and $f^0(x^0,y) = \max\{Ay : A \in \partial^0 f(x^0)\}$. The convexity of the set $\partial^0 f(x^0)$ is immediate. □
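For $C = \mathbb{R}^p_+$, Theorem 6.8.3 says that a single linear map attains the componentwise supremum $f^0(x^0,y)$. A minimal sketch, assuming Python with NumPy and the hypothetical $f(x) = (|x|, -|x|)$ at $x^0 = 0$, for which $f^0(0,y) = (|y|,|y|)$: Definition 6.8.3 is discretized on a grid of candidate maps and a finite set of directions, so the result is an approximation of $\partial^0 f(0)$, not the exact set.

```python
import itertools
import numpy as np

def f0(y):
    # f^0(0, y) = (|y|, |y|) for the hypothetical f(x) = (|x|, -|x|) with C = R^2_+
    return np.array([abs(y), abs(y)])

# Linear maps A: R -> R^2 are stored as vectors (a1, a2), so that A y = (a1*y, a2*y).
grid = np.linspace(-2.0, 2.0, 9)                 # step 0.5, exact in floating point
test_dirs = np.linspace(-3.0, 3.0, 13)
candidates = [np.array(a) for a in itertools.product(grid, repeat=2)]
# Discretized Definition 6.8.3: keep A with A y <= f^0(0, y) for every tested y.
subdiff = [A for A in candidates if all(np.all(A * y <= f0(y)) for y in test_dirs)]

for y in (-1.5, 0.5, 2.0):
    images = np.array([A * y for A in subdiff])
    best = images.max(axis=0)                                      # componentwise maximum
    attained = any(np.array_equal(A * y, f0(y)) for A in subdiff)  # one A reaches both components
    print(y, best, attained)
```

The surviving maps are exactly those with $|a_1|, |a_2| \le 1$ on the grid, and for each tested $y$ the map $A = (\operatorname{sign} y, \operatorname{sign} y)$ attains $f^0(0,y)$ in both components at once.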
Remark 6.8.2. [Thibault, 1980] has deepened the analogy with the scalar case by proving, under some additional mild hypotheses, that $\partial^0 f(x^0)$ is a compact set also in infinite-dimensional spaces. Another useful property of $\partial^0 f(x^0)$ is its operator convexity.

Definition 6.8.4. A set $U \subset L(\mathbb{R}^n,\mathbb{R}^p)$ is said to be operator convex when, $\forall A_1, A_2 \in U$ and $\forall \Lambda \in L_{\ge}(\mathbb{R}^p,\mathbb{R}^p)$ with $(I_p - \Lambda) \in L_{\ge}(\mathbb{R}^p,\mathbb{R}^p)$, we have
$$\Lambda A_1 + (I_p - \Lambda)A_2 \in U,$$
where $I_p$ is the identity map in $\mathbb{R}^p$ and $L_{\ge}(\mathbb{R}^p,\mathbb{R}^p)$ denotes the set of the linear maps $\Lambda : \mathbb{R}^p \to \mathbb{R}^p$ such that $\Lambda c \in C$, $\forall c \in C$.
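Definition 6.8.5. The set
$$\operatorname{co}_0 U = \Big\{\sum_{i=1}^{n} \Lambda_i A_i : A_i \in U,\ \Lambda_i \in L_{\ge}(\mathbb{R}^p,\mathbb{R}^p),\ \sum_{i=1}^{n} \Lambda_i = I_p;\ n \in \mathbb{N}\Big\}$$
is called the operator convex hull of the set $U \subset L(\mathbb{R}^n,\mathbb{R}^p)$.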
In the convex hull of a set $U \subset L(\mathbb{R}^n,\mathbb{R}^p)$ the weighting multipliers can always be considered as diagonal matrices $\Lambda_i$ whose diagonal elements are $\lambda^{(i)}_{jj} = \lambda^{(i)}$, $j = 1,\dots,p$, $0 \le \lambda^{(i)} \le 1$. An easy comparison then leads to the inclusion $\operatorname{conv} U \subset \operatorname{co}_0 U$. From Definition 6.8.4 we immediately arrive at the following proposition.
Theorem 6.8.4. The generalized subdifferential $\partial^0 f(x^0)$ is an operator convex set.
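With the componentwise order $C = \mathbb{R}^2_+$, both $\Lambda$ and $I_2 - \Lambda$ map $C$ into $C$ exactly when $\Lambda$ is diagonal with entries in $[0,1]$ (a positive off-diagonal entry of $\Lambda$ would make the corresponding entry of $I_2 - \Lambda$ negative). A minimal sketch, assuming Python with NumPy and the hypothetical set $[-1,1]^2$ (the generalized subdifferential at $0$ of $f(x) = (|x|,-|x|)$, with maps $\mathbb{R} \to \mathbb{R}^2$ stored as $2 \times 1$ columns): it checks random operator convex combinations and shows that such combinations can leave the ordinary segment between $A_1$ and $A_2$.

```python
import numpy as np

def in_box(A, tol=1e-12):
    # Membership in the hypothetical set [-1, 1]^2 (maps R -> R^2 stored as 2x1 columns)
    return bool(np.all(np.abs(A) <= 1.0 + tol))

rng = np.random.default_rng(3)
ok = True
for _ in range(1000):
    A1 = rng.uniform(-1.0, 1.0, size=(2, 1))
    A2 = rng.uniform(-1.0, 1.0, size=(2, 1))
    Lam = np.diag(rng.uniform(0.0, 1.0, size=2))        # Lambda and I - Lambda preserve R^2_+
    combo = Lam @ A1 + (np.eye(2) - Lam) @ A2            # operator convex combination
    ok = ok and in_box(combo)
print(ok)

# Mixing rows: Lambda = diag(1, 0) takes the first row of A1 and the second row of A2.
A1 = np.array([[1.0], [-1.0]])
A2 = np.array([[-1.0], [1.0]])
mixed = np.diag([1.0, 0.0]) @ A1 + np.diag([0.0, 1.0]) @ A2
print(mixed.ravel())   # (1, 1), which is not on the segment between A1 and A2
```

The second part illustrates why operator convexity is a stronger requirement than ordinary convexity: row-wise mixing produces $(1,1)$, whose components do not sum to zero as every point of the segment $[A_1, A_2]$ does.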
Remark 6.8.3. When $C = \mathbb{R}^p_+$, the generalized subdifferential of the vector function $f$ is given by the product of the subdifferentials $\partial^0 f_i(x^0)$ defined in Chapter Four for scalar functions. To this end, one has to prove that
$$\sup Df(x^0,y) = \prod_{i=1}^{p} \sup Df_i(x^0,y)$$
for any fixed $y$. Then $Ay \le f^0(x^0,y)$ if and only if $A_i y \le f_i^0(x^0,y)$ for every $i$ (where $A_i$ denotes the $i$-th row of $A$), i.e. $A \in \partial^0 f(x^0)$ if and only if $A_i \in \partial^0 f_i(x^0)$. Let $l = (l_1,\dots,l_p) \in Df(x^0,y)$. Then $l_i \in Df_i(x^0,y)$. Hence $Df(x^0,y) \subset \prod_{i=1}^{p} Df_i(x^0,y)$ and consequently we get
$$\sup Df(x^0,y) \le \sup \prod_{i=1}^{p} Df_i(x^0,y) = \prod_{i=1}^{p} \sup Df_i(x^0,y).$$
Consider the case $\sup Df(x^0,y) < \sup \prod_{i=1}^{p} Df_i(x^0,y)$. There exist $\omega \in \mathbb{R}^p$ with $\sup Df(x^0,y) < \omega < \sup \prod_{i=1}^{p} Df_i(x^0,y)$, an index $i$ and some sequences $\{x^k\}$ and $\{t_k\}$ converging to $x^0$ and $0^+$, respectively, such that
$$\omega_i < s_i = \lim_{k\to+\infty} t_k^{-1}\big[f_i(x^k + t_k y) - f_i(x^k)\big].$$
Possibly by taking some subsequences, one can get an element $s = (s_1,\dots,s_p) \in Df(x^0,y)$. It works out that $s \le \sup Df(x^0,y) < \omega$, i.e. $s_j \le \omega_j$ ($\forall j = 1,\dots,p$), but $\omega_i < s_i$, a contradiction.

For scalar functions, $f^0(x^0,y)$ was also defined as a particular case of $k$-directional derivative. We wish to show that this equivalence is still valid in the vector case. In the following definition, $H$ denotes the Clarke tangent cone; the epigraph of $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ (denoted by $\operatorname{epi} f$) is the set $\{(x,z) : x \in X,\ z - f(x) \in C\}$.
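Definition 6.8.6. The function $f^H(x^0,\cdot) : \mathbb{R}^n \to \mathbb{R}^p$,
$$f^H(x^0,y) = \inf\{\xi \in \mathbb{R}^p : (y,\xi) \in H(\operatorname{epi} f,(x^0,f(x^0)))\}, \quad y \in \mathbb{R}^n,$$
is called the directional derivative of $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ at $x^0 \in \operatorname{int} X$.

Theorem 6.8.5. Let $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ be a locally Lipschitz function at $x^0 \in \operatorname{int} X$. Then $f^0(x^0,y) = f^H(x^0,y)$, $\forall y \in \mathbb{R}^n$.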
Proof. Fix any $y \in \mathbb{R}^n$. Let $\xi$ be such that $(y,\xi) \in H(\operatorname{epi} f,(x^0,f(x^0)))$ and let $l \in Df(x^0,y)$. Then there exist $\{x^k\}$ and $\{t_k\}$ with $\lim x^k = x^0$ and $\lim t_k = 0^+$ such that $l = \lim_{k\to+\infty} t_k^{-1}[f(x^k + t_k y) - f(x^k)]$. By the continuity of $f$, $\{f(x^k)\}$ converges to $f(x^0)$. By the definition of Clarke's tangent cone there exists a sequence $\{(y^k,\xi^k)\}$ converging to $(y,\xi)$ with $(x^k,f(x^k)) + t_k(y^k,\xi^k) \in \operatorname{epi} f$. Then $\xi^k - t_k^{-1}[f(x^k + t_k y^k) - f(x^k)] \in C$ and, passing to the limit (the Lipschitz property allows us to replace $y^k$ by $y$), $\xi - l \in C$. So we have obtained $\inf \xi \ge \sup l$, i.e. $f^H(x^0,y) \ge f^0(x^0,y)$. Now we have to prove that any upper bound $\xi$ of $f^0(x^0,y)$ is also an upper bound of $f^H(x^0,y)$. Consider a sequence $\{(t_k,x^k,z^k)\}$ converging to $(0^+,x^0,f(x^0))$ with $(x^k,z^k) \in \operatorname{epi} f$. Possibly by taking some subsequences, $t_k^{-1}[f(x^k + t_k y) - f(x^k)]$ converges to some $l \in Df(x^0,y)$, and $\xi - l \in C$. Then
$$z^k + t_k(\xi + o(1)) \in f(x^k) + C + t_k(l + C + o(1)) \subset f(x^k + t_k y) + C.$$
It works out that $(x^k,z^k) + t_k(y,\xi + o(1)) \in \operatorname{epi} f$, i.e. $(y,\xi) \in H(\operatorname{epi} f,(x^0,f(x^0)))$, i.e. $\xi \ge f^H(x^0,y)$. □
As a consequence of this theorem, the geometrical approach of Definition 6.8.6 leads to the same definition and to the same properties for the generalized subdifferential $\partial^0 f(x^0)$. The equality of Theorem 6.8.5 was extended by [Thibault, 1982] to non-Lipschitzian functions. In the scalar case the generalized subdifferential can also be characterized through Rademacher's theorem. This theorem is still valid for vector functions. We are then able to introduce the definition of the so-called generalized Jacobian and to show its relationships with Definition 6.8.3. The definition of generalized Jacobian already appears in some papers of the early Seventies, and even today it is the most widespread and quoted one.
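Lemma 6.8.2 (Rademacher's theorem). A vector Lipschitzian function on an open set $A \subset \mathbb{R}^n$ is a.e. differentiable, i.e. it is differentiable on a subset $\Omega \subset A$ with $m(A \setminus \Omega) = 0$.

Definition 6.8.7. Let $f : X \subset \mathbb{R}^n \to \mathbb{R}^p$ be a locally Lipschitz function at $x^0 \in \operatorname{int} X$. The set
$$J^0 f(x^0) = \operatorname{conv} \mathcal{J} f(x^0) = \operatorname{conv}\Big\{A \in L(\mathbb{R}^n,\mathbb{R}^p) : A = \lim_{\substack{x^k \to x^0\\ x^k \in \Omega}} Jf(x^k)\Big\}$$
(where $Jf(x)$ denotes the Jacobian matrix of $f$ at a point $x \in \Omega$ of differentiability) is said to be the generalized Jacobian of $f$ at $x^0$.

Theorem 6.8.6. The following properties hold: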
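i) $J^0 f(x^0) \ne \emptyset$;
ii) $J^0 f(x^0)$ is a convex set;
iii) $J^0 f(x^0)$ is a compact subset of $L(\mathbb{R}^n,\mathbb{R}^p)$;
iv) $J^0 f(x^0) \subset \prod_{i=1}^{p} J^0 f_i(x^0)$.

Proof. i) It is enough to remark that $\lim Jf(x^k)$, $x^k \in \Omega$, $x^k \to x^0$, exists (possibly along a subsequence), since $f$ is locally Lipschitzian at $x^0$ and hence $Jf$ is norm-bounded.
ii) It is an immediate consequence of Definition 6.8.7.
iii) The convex hull of a compact set in the finite-dimensional space $L(\mathbb{R}^n,\mathbb{R}^p)$ is compact. Thus it is sufficient to show that $\mathcal{J} f(x^0)$ is a compact set, i.e. (being bounded) a closed set. Let $\{A_i\} \subset \mathcal{J} f(x^0)$ with $\lim A_i = A$. Then $A_i = \lim_k Jf(x_i^k)$ with $x_i^k \in \Omega$, $\lim_k x_i^k = x^0$. By a diagonal argument we obtain $A \in \mathcal{J} f(x^0)$.
iv) It is enough to prove the inclusion for the sets $\mathcal{J}$, and the inclusion $\mathcal{J} f(x^0) \subset \prod_{i=1}^{p} \mathcal{J} f_i(x^0)$ easily follows from their definition. □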
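Definition 6.8.7 suggests a simple numerical picture: sample points near $x^0$ (by Rademacher's theorem almost every point is a differentiability point), collect the Jacobians there, and take the convex hull of the limiting matrices. A minimal sketch, assuming Python with NumPy and the hypothetical map $f(x) = (|x_1| + x_2,\ \max\{x_1, x_2\})$ at $x^0 = 0$: near the origin its Jacobian takes exactly four values, which here form the set of limit Jacobians of Definition 6.8.7, and the generalized Jacobian is their convex hull.

```python
import numpy as np

def jacobian(x):
    # Jacobian of the hypothetical f(x) = (|x1| + x2, max(x1, x2)); valid where x1 != 0 and
    # x1 != x2, which holds with probability one for randomly sampled points.
    x1, x2 = x
    row1 = [np.sign(x1), 1.0]
    row2 = [1.0, 0.0] if x1 > x2 else [0.0, 1.0]
    return np.array([row1, row2])

rng = np.random.default_rng(1)
points = rng.uniform(-1e-6, 1e-6, size=(2000, 2))      # points near x0 = 0
limits = {tuple(jacobian(p).ravel()) for p in points}  # distinct Jacobians (rows flattened)
for m in sorted(limits):
    print(np.array(m).reshape(2, 2))

# Any convex combination belongs to the generalized Jacobian, e.g. the barycenter.
barycenter = np.mean([np.array(m).reshape(2, 2) for m in limits], axis=0)
print(barycenter)
```

Because the Jacobian is piecewise constant here, the sampled values coincide with their limits; for a general Lipschitz map this sampling only approximates the limiting set.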
Theorem 6.8.7.
i) $J^0(f + g)(x^0) \subset J^0 f(x^0) + J^0 g(x^0)$;

Proof. i) It is sufficient to prove the inclusion for the sets $\mathcal{J}$. Let $A \in \mathcal{J}(f + g)(x^0)$,
$$A = \lim_{k\to+\infty} J(f + g)(x^k) = \lim_{k\to+\infty}\big[Jf(x^k) + Jg(x^k)\big],$$
with $x^k$ converging to $x^0$ and $x^k \in \Omega$. By taking some possible subsequences, $Jf(x^k)$ and $Jg(x^k)$ converge to some elements of $\mathcal{J} f(x^0)$ and of $\mathcal{J} g(x^0)$, respectively. Then $A \in \mathcal{J} f(x^0) + \mathcal{J} g(x^0)$.
ii) This can be proved in an analogous manner. □
However, the generalized Jacobian is not equivalent to the generalized subdifferential. The inclusion relation between them, and its consequences for optimization problems, will be deduced from the following theorem.

Theorem 6.8.8. $\partial^0 f(x^0) = \operatorname{cl}\operatorname{co}_0 J^0 f(x^0)$.
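Proof. First we prove that $f^0(x^0,y) = \sup_{A \in \mathcal{J} f(x^0)} Ay$, $\forall y \in \mathbb{R}^n$. Let $A \in \mathcal{J} f(x^0)$, $A = \lim_{k\to+\infty} Jf(x^k)$, with $x^k$ converging to $x^0$ and $x^k \in \Omega$. For any $y \in \mathbb{R}^n$ we get
$$Ay = \lim_{k\to+\infty} Jf(x^k)\,y = \lim_{k\to+\infty}\ \lim_{i\to+\infty} q(x^k,t_i,y) = \lim_{k\to+\infty} q(x^k,t_k,y)$$
for a suitable sequence $t_k \to 0^+$. It works out that $Ay \in Df(x^0,y)$ and hence $\sup_{A \in \mathcal{J} f(x^0)} Ay \le f^0(x^0,y)$. For the opposite inequality, consider any $\lim_{k\to+\infty} q(x^k,t_k,y) \in Df(x^0,y)$. From
$$\lim_{k\to+\infty} q(x^k,t_k,y) \le \sup_{\substack{x^k \in \Omega\\ x^k \to x^0}}\ \lim_{k\to+\infty}\ \lim_{t_k \to 0^+} \frac{f(x^k + t_k y) - f(x^k)}{t_k} \le \sup_{A \in \mathcal{J} f(x^0)} Ay$$
we obtain $\lim_{k\to+\infty} q(x^k,t_k,y) \le \sup_{A \in \mathcal{J} f(x^0)} Ay$, and hence $f^0(x^0,y) \le \sup_{A \in \mathcal{J} f(x^0)} Ay$.
Now $J^0 f(x^0) = \operatorname{conv} \mathcal{J} f(x^0) \subset \operatorname{co}_0 \mathcal{J} f(x^0)$ implies $\operatorname{co}_0 J^0 f(x^0) \subset \operatorname{co}_0 \mathcal{J} f(x^0)$. As $\mathcal{J} f(x^0) \subset J^0 f(x^0)$ implies the opposite inclusion, we have $\operatorname{co}_0 J^0 f(x^0) = \operatorname{co}_0 \mathcal{J} f(x^0)$, i.e.
$$\operatorname{cl}\operatorname{co}_0 J^0 f(x^0) = \operatorname{cl}\operatorname{co}_0 \mathcal{J} f(x^0) = \partial^0 f(x^0).$$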
The last equality is a direct consequence of the theorem of Zaslavskii (see [Rubinov, 1977, p. 124]). □

Theorem 6.8.8 states that in general $J^0 f(x^0)$ is a subset of $\partial^0 f(x^0)$. This inclusion justifies the choice of stating the necessary conditions for (V.P.) in the sharper terms of the generalized Jacobian. The last part of this section proposes nonsmooth versions of some theorems which were proved in Section 6 for the differentiable case.

Theorem 6.8.9. If $f$ is a locally Lipschitz function at $x^0 \in \operatorname{int} X$ and $x^0$ is an unconstrained weakly local efficient point of $f$, then $J^0 f(x^0)y \cap (-\operatorname{int} C)^c \ne \emptyset$ for every $y \in \mathbb{R}^n$.

Proof. Suppose that we can find some $y \in \mathbb{R}^n$ such that $J^0 f(x^0)y \subset -\operatorname{int} C$. Since the set $J^0 f(x^0)y$ is convex and compact, there exists a closed convex neighborhood $V$ of it contained in $-\operatorname{int} C$. By the upper semicontinuity of the generalized Jacobian, there exists $\delta > 0$ such that $J^0 f(x)y \subset V$ for every $x \in \operatorname{conv}\{x^0, x^0 + \delta y\}$. As $V$ is convex and closed, we have $\operatorname{cl}\operatorname{conv}\{J^0 f(x)y : x \in \operatorname{conv}\{x^0, x^0 + \delta y\}\} \subset V$. By applying the mean value theorem, we obtain
$$f(x^0 + ty) - f(x^0) \in t\,\operatorname{cl}\operatorname{conv}\{J^0 f(x)y : x \in \operatorname{conv}\{x^0, x^0 + \delta y\}\} \subset tV \subset -\operatorname{int} C,$$
$\forall t \in (0,\delta)$, which contradicts the assumption of the theorem. □
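For $C = \mathbb{R}^p_+$ the condition of Theorem 6.8.9 can be checked from finitely many matrices: $J^0 f(x^0)y$ is the convex hull of the points $Ay$ with $A$ among the limit Jacobians, and $-\operatorname{int} C$ is convex, so the hull lies in $-\operatorname{int} C$ exactly when every such $Ay$ has all components negative. A minimal sketch, assuming Python with NumPy and two hypothetical maps $\mathbb{R} \to \mathbb{R}^2$: $f(x) = (|x|, -x)$, for which $x^0 = 0$ is weakly efficient and the limit Jacobians are $(1,-1)$ and $(-1,-1)$, and $g(x) = (x,x)$, for which $0$ is not weakly efficient.

```python
import numpy as np

def condition_holds(limit_jacobians, directions):
    # Theorem 6.8.9 with C = R^p_+: for every y, J^0 f(x0) y must NOT lie inside -int C.
    # J^0 f(x0) y = conv{A y}, and -int C is convex, so it suffices to test the points A y.
    for y in directions:
        images = [A @ y for A in limit_jacobians]
        if all(np.all(v < 0) for v in images):
            return False, y          # a direction y with J^0 f(x0) y inside -int C
    return True, None

dirs = [np.array([t]) for t in np.linspace(-2.0, 2.0, 9)]
J_f = [np.array([[1.0], [-1.0]]), np.array([[-1.0], [-1.0]])]   # f(x) = (|x|, -x) at 0
J_g = [np.array([[1.0], [1.0]])]                                # g(x) = (x, x) at 0
print(condition_holds(J_f, dirs))
print(condition_holds(J_g, dirs))
```

For each listed direction the test is exact, since only the extreme matrices matter; the finite grid of directions is the only approximation. For $g$ the direction $y = -2$ is reported, matching the fact that $g(-t) = (-t,-t)$ improves both objectives.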
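Remark 6.8.4. The condition $J^0 f(x^0)y \cap (-\operatorname{int} C)^c \ne \emptyset$, $\forall y \in \mathbb{R}^n$, is equivalent to $0 \in \vartheta^0 J^0 f(x^0)$ for some $\vartheta^0 \in C^* \setminus \{0\}$. Indeed, if there exists some $y \in \mathbb{R}^n$ with $J^0 f(x^0)y \subset -\operatorname{int} C$, then one has $\vartheta Ay < 0$, $\forall \vartheta \in C^* \setminus \{0\}$ and $\forall A \in J^0 f(x^0)$. Conversely, if $0 \notin \vartheta J^0 f(x^0)$ for every $\vartheta \in C^* \setminus \{0\}$, one can separate the origin and the convex compact set $\{\vartheta J^0 f(x^0) : \vartheta \in C^* \setminus \{0\}\}$. Thus, there is some $y \in \mathbb{R}^n$ such that $\vartheta A y$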