COMPUTATIONAL METHODS IN OPTIMIZATION A Unified Approach
This is Volume 77 in MATHEMATICS IN SCIENCE AND ENGINEERING ...
This is Volume 77 in MATHEMATICS IN SCIENCE AND ENGINEERING A series of monographs and textbooks Edited by RICHARD BELLMAN, University of Southern California A complete list of the books in this series appears at the end of this volume.
E. Polak
DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
UNIVERSITY OF CALIFORNIA
BERKELEY, CALIFORNIA
1971 ACADEMIC PRESS
New York and London
COPYRIGHT © 1971, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS. REPRODUCTION IN WHOLE OR IN PART FOR ANY PURPOSE OF THE UNITED STATES GOVERNMENT IS PERMITTED.
ACADEMIC PRESS, INC. 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. Berkeley Square House, London W1X 6BA
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 72-134540
AMS (MOS) 1970 SUBJECT CLASSIFICATIONS: 90C30, 90C50, 49D05, 49D10, 49D15, 49D30, 49D35, 49D40, 49D45, 49D99, 34B05, 34B15, 34B99, 3900, 65H10, 65K05, 65L10, 65Q05
PRINTED IN THE UNITED STATES OF AMERICA
TO OREN AND SHARON
CONTENTS

Preface
Note to the Reader
Conventions and Symbols

1 PRELIMINARY RESULTS
1.1 Nonlinear Programming and Optimal Control Problems
1.2 Optimality Conditions
1.3 Models and Convergence Conditions for Computational Methods

2 UNCONSTRAINED MINIMIZATION
2.1 Gradient and Quasi-Newton Methods in $\mathbb{R}^n$
2.2 Reduction of Derivative Calculations
2.3 Conjugate Gradient Methods in $\mathbb{R}^n$
2.4 Unconstrained Discrete Optimal Control Problems
2.5 Unconstrained Continuous Optimal Control Problems

3 EQUALITY CONSTRAINTS: ROOT AND BOUNDARY-VALUE PROBLEMS
3.1 Zeros of a Function and Problems with Equality Constraints in $\mathbb{R}^n$
3.2 Boundary-Value Problems and Discrete Optimal Control
3.3 Boundary-Value Problems and Continuous Optimal Control

4 EQUALITY AND INEQUALITY CONSTRAINTS
4.1 Penalty Function Methods
4.2 Methods of Centers
4.3 Methods of Feasible Directions
4.4 Second-Order Methods of Feasible Directions
4.5 Gradient Projection Methods

5 CONVEX OPTIMAL CONTROL PROBLEMS
5.1 Nonlinear Programming Algorithms Revisited
5.2 A Decomposition Algorithm of the Dual Type
5.3 A Decomposition Algorithm of the Primal Type

6 RATE OF CONVERGENCE
6.1 Linear Convergence
6.2 Superlinear Convergence: Quasi-Newton Methods
6.3 Superlinear Convergence: Conjugate Gradient Methods
6.4 Superlinear Convergence: the Variable Metric Algorithm

Appendices

A FURTHER MODELS FOR COMPUTATIONAL METHODS
A.1 A Model for the Implementation of Certain Conceptual Optimal Control Algorithms
A.2 An Open-Loop Model for the Implementation of Conceptual Algorithms

B PROPERTIES OF CONTINUOUS FUNCTIONS
B.1 Expansions of Continuous Functions
B.2 Convex Functions
B.3 A Few Miscellaneous Results

C A GUIDE TO IMPLEMENTABLE ALGORITHMS
C.1 General Considerations
C.2 Gradient Methods
C.3 Quasi-Newton Methods
C.4 Conjugate Gradient Algorithms
C.5 Penalty Function Methods
C.6 Methods of Feasible Directions with Linear Search
C.7 Methods of Feasible Directions with Quadratic Search

References
Index
PREFACE

Algorithms are inventions which very often appear to have little or nothing in common with one another. As a result, it was held for a long time that a coherent theory of algorithms could not be constructed. The last few years have shown that this belief was incorrect, that most convergent algorithms share certain basic properties, and hence that a unified approach to algorithms is possible. This book presents a large number of optimization, boundary-value and root-solving algorithms in the light of a theory developed by the author, which deals with algorithm convergence and implementation.

The theory of algorithms presented in this book rests on two pillars. One of these consists of a few, very simple, algorithm models (prototypes) with corresponding convergence theorems. The great importance of these algorithm models lies in the fact that they fit almost all existing algorithms for solving problems in optimal control and in nonlinear programming. Consequently, they provide us with a highly systematic approach to the study of convergence properties of algorithms. This systematic approach is valuable to us in three ways: It guides our inventive process toward algorithms whose convergence is assured, it simplifies considerably our work in showing that a specific algorithm is convergent, and it makes the teaching of algorithms considerably less time-consuming.

The second pillar of the theory of algorithms presented in this book consists of a methodology of implementation. Algorithms are usually invented in a form which is conceptually simple, but which is not necessarily implementable on a digital computer. Thus, an algorithm will usually construct a sequence of points $z_0, z_1, z_2, \ldots$ which converges to a solution point $\hat{z}$. In a “conceptual,” or theoretical, algorithm, the construction of the point $z_{i+1}$ from the point $z_i$ may require the use of a subalgorithm which sets $y_0 = z_i$ and then constructs an infinite sequence $y_0, y_1, y_2, \ldots$ which converges to $z_{i+1}$ (for example, as in the case of penalty function algorithms). Theoretically, therefore, in the case of such an algorithm, we can never compute $z_{i+1}$ in finite time. In practice, we truncate the construction of the sequence $y_0, y_1, \ldots$ after a finite number of elements have been constructed. This truncation must be done with care: If we let the construction of the sequence $y_0, y_1, y_2, \ldots$ run for too long at each iteration, we may be using up much more computer time than is really needed; if we truncate too soon, we may lose convergence. The methodology of algorithm implementation presented throughout this book enables us to set up efficient schemes for truncating the construction of these sequences $y_0, y_1, y_2, \ldots$, which are compatible with good convergence properties in the resulting implementation of the conceptual algorithm.

This book can be used either as a graduate level or reference text. It has been used in the Department of Electrical Engineering and Computer Sciences of the University of California, Berkeley, as a text for a first-year graduate level course in computational methods in optimization. The book deals with optimal control and nonlinear programming problems in a unified way, following the pattern set up in M. D. Canon, C. D. Cullum and E. Polak, “Theory of Optimal Control and Mathematical Programming,” McGraw-Hill, New York, 1970, where the reader will find all the required background material on conditions of optimality, linear and quadratic programming, and convexity. Otherwise, this book is self-contained, with Appendix B furnishing the reader with the few additional mathematical results that he will need.

To facilitate the use of this book as a text, the author has slightly modified a number of standard algorithms so as to fit them into a unified framework. These modifications do not seem to affect adversely the performance of the algorithms in question.
In selecting algorithms for presentation in this book, the author has given preference to algorithms which can be discussed easily within the theoretical framework of the book and to algorithms which can be used both for nonlinear programming and for optimal control problems. As a result, set approximation and cutting plane methods, the reduced gradient method, the convex simplex method, and dynamic programming were omitted. For those who will be using this book as a reference text, Appendix C was added, to help in the choice of an efficient implementable algorithm. This appendix sets down the author’s personal preferences in algorithms and offers the reader parameters for adjusting these algorithms to his own taste. In presenting algorithms, either in their original or modified form, the author has attempted to acknowledge their originators. However, since some algorithms have been discovered more than once, and since there is little agreement as to the extent to which one algorithm should differ from another
before a person has a “right” to put one’s name on it, the author wishes to apologize if, inadvertently, he has failed to give proper credit. Also, since the origins of some algorithms appear to be obscure, a number of them have been presented without acknowledgement, on the assumption that they are part of our technical folklore. The list of references at the end of the book includes only books and papers which the author consulted, directly or indirectly, in the preparation of the manuscript. They do not constitute an exhaustive bibliography. The author is grateful to Drs. E. Gilbert, H. Halkin, P. Huard, D. H. Jacobson, and E. J. Messerli for comments, criticisms and suggestions, and to the graduate students in the author’s course on algorithms, Messrs. G. Gross, F. Wu and O. Pironneau, in particular, for comments that have resulted in various improvements in the text. The new algorithms presented in this text have been tested to some extent, and the author is indebted to Messrs. K. Jeyarasasingam, J. Spoerl and J. Raphel for the many hours they have spent programming and computing with these algorithms. The author is particularly indebted to his former students, Drs. R. Klessig and G. Meyer, for their collaboration on algorithms and algorithm models; to Dr. R. Klessig for his invaluable assistance in proofreading the manuscript; to Dr. A. Cohen and Mr. M. J. D. Powell for supplying the author with their results on the rate of convergence of algorithms prior to the publication of these results in the technical literature; and to Dr. W. I. Zangwill for the pleasant discussions in 1967 which have strongly stimulated the author’s work in the area of algorithms.
The preparation of this volume involved a great amount of preliminary research which would have been impossible without the generous support received from the National Aeronautics and Space Administration under Grant NGL 05-003-016 and supplements 4, 5 and 6, from the Joint Services Electronics Program under Grant AFOSR 68-1488, and from the University of California. This support is gratefully acknowledged.
NOTE TO THE READER
The system of numbering and cross-referencing is described as follows. Within each section, definitions, theorems, equations, remarks, and so forth, are numbered consecutively by means of boldface numerals appearing in the left-hand margin. In reference to a section within the same chapter, the section number only is used; in reference to a section in another chapter, both the chapter number and the section number are used. For example, “it was shown in Section 3” refers to Section 3 of the same chapter, while “it was shown in Section 2.3” refers to Section 3 of Chapter 2. Similarly, “substituting from (3)” refers to item 3 in the same section, “substituting from (2.3)” refers to item 3 in Section 2 of the same chapter, and “substituting from (3.2.3)” refers to item 3 in Section 2 of Chapter 3.
CONVENTIONS AND SYMBOLS
1 Conventions

1. $\mathbb{R}^n$ denotes the euclidean space of ordered n-tuples of real numbers. Elements of $\mathbb{R}^n$ are denoted by lower-case letters, with the components of a vector $x$ in $\mathbb{R}^n$ shown as follows: $x = (x^1, x^2, \ldots, x^n)$. When an n-tuple is a vector in $\mathbb{R}^n$, it is always treated as a column vector in matrix multiplications, i.e., as an $n \times 1$ matrix, but with the transposition symbol omitted. The scalar product in $\mathbb{R}^n$ is denoted by $\langle \cdot, \cdot \rangle$ and is defined by $\langle x, y \rangle = \sum_{i=1}^{n} x^i y^i$. The norm in $\mathbb{R}^n$ is denoted by $\|\cdot\|$ and is defined by $\|x\| = \sqrt{\langle x, x \rangle}$.

2. $C^\nu[t_0, t_f]$ denotes the space of all piecewise continuously differentiable functions from $[t_0, t_f]$ into $\mathbb{R}^\nu$, with norm defined by $\|x\|_\infty = \sup_{t \in [t_0, t_f]} \|x(t)\|$.

3. $L_2^\mu[t_0, t_f]$ denotes the Hilbert space consisting of equivalence classes of square integrable functions from $[t_0, t_f]$ into $\mathbb{R}^\mu$, with norm denoted by $\|\cdot\|_2$ and defined by $\|u\|_2 = \left( \int_{t_0}^{t_f} \|u(t)\|^2 \, dt \right)^{1/2}$, and with scalar product denoted by $\langle \cdot, \cdot \rangle_2$ and defined by $\langle u_1, u_2 \rangle_2 = \int_{t_0}^{t_f} \langle u_1(t), u_2(t) \rangle \, dt$, where $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ denote the norm and scalar product in $\mathbb{R}^\mu$, respectively.

4. $L_\infty^\mu[t_0, t_f]$ denotes the Banach space consisting of equivalence classes of essentially bounded, measurable functions from $[t_0, t_f]$ into $\mathbb{R}^\mu$, with norm denoted by $\|\cdot\|_\infty$ and defined by $\|u\|_\infty = \operatorname{ess\,sup}_{t \in [t_0, t_f]} \|u(t)\|$.

5. $f(\cdot)$ or $f$ denotes a function, with the dot standing for the undesignated variable; $f(z)$ denotes the value of $f(\cdot)$ at the point $z$. To indicate that the domain of $f(\cdot)$ is $A$ and that its range is $B$, we write $f : A \to B$. Assuming that $f : A \to \mathbb{R}^n$, we write $f$ in expanded form as follows: $f = (f^1, f^2, \ldots, f^n)$, so that $f(z) = (f^1(z), f^2(z), \ldots, f^n(z))$.
6. Given a function $g : \mathbb{R}^n \to \mathbb{R}^m$, we denote its Jacobian matrix at $z$ by $\partial g(z)/\partial z$. This is an $m \times n$ matrix whose $ij$th element is $\partial g^i(z)/\partial z^j$.

7. Given a function $f^i : \mathbb{R}^n \to \mathbb{R}^1$, we denote by $\nabla f^i(z)$ its gradient at $z$. We always treat $\nabla f^i(z)$ as a column vector, and hence, its transpose is equal to $\partial f^i(z)/\partial z$, i.e., for any $y \in \mathbb{R}^n$, $\langle \nabla f^i(z), y \rangle = (\partial f^i(z)/\partial z)\, y$. We denote by $\partial^2 f^i(z)/\partial z^2$ the Hessian of $f^i(\cdot)$ at $z$. The Hessian is an $n \times n$ matrix whose $jk$th element is $\partial^2 f^i(z)/\partial z^j \partial z^k$.

8. Superscript $-1$ denotes the inverse of a matrix, e.g., $A^{-1}$.

9. Superscript $T$ denotes the transpose of a matrix, e.g., $A^T$.

10. To avoid the need for writing $\{z' \in T \mid \|z - z'\|_{\mathscr{B}} \le \varepsilon\}$, we abuse standard mathematical notation for balls and denote this set by $B(z, \varepsilon)$, where $\|\cdot\|_{\mathscr{B}}$ denotes the norm in the particular Banach space under discussion.
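subject to the constraints

6   $s_i(u_i) \le 0, \quad i = 0, 1, \ldots, k - 1,$

7   $g_i(x_i) = 0, \quad q_i(x_i) \le 0, \quad i = 0, 1, \ldots, k,$

where $f_i : \mathbb{R}^\nu \times \mathbb{R}^\mu \to \mathbb{R}^\nu$, $f_i^0 : \mathbb{R}^\nu \times \mathbb{R}^\mu \to \mathbb{R}^1$, $\varphi : \mathbb{R}^\nu \to \mathbb{R}^1$, $s_i : \mathbb{R}^\mu \to \mathbb{R}^{\mu_i}$, $g_i : \mathbb{R}^\nu \to \mathbb{R}^{l_i}$, and $q_i : \mathbb{R}^\nu \to \mathbb{R}^{m_i}$ are continuously differentiable functions. The integer $k$ is the duration of the control process.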
8 The Continuous Optimal Control Problem. Given a dynamical system described by the differential equation

9   $\frac{d}{dt} x(t) = f(x(t), u(t), t), \quad t \in [t_0, t_f],$

where $x(t)$ is the state of the system at time $t$, $u(t)$ is the control applied to the system at time $t$, $t_0$ is the given starting time, and $t_f$ is the final time and may or may not be specified in advance, find a measurable control function $\hat{u}(\cdot)$ defined on $[t_0, t_f]$, a corresponding trajectory $\hat{x}(\cdot)$, determined by (9), and the final time $t_f$, if it is not specified, which minimize the cost functional

10   $\int_{t_0}^{t_f} f^0(x(t), u(t), t) \, dt + \varphi(x(t_f))$

subject to the constraints

11   $s(u(t)) \le 0$ for $t \in [t_0, t_f],$

12   $g(x(t), t) = 0, \quad q(x(t), t) \le 0$ for $t \in [t_0, t_f],$

where $f : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^\nu$ and $f^0 : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^1$ are continuously differentiable in $x$ and in $u$, $\varphi : \mathbb{R}^\nu \to \mathbb{R}^1$, $g : \mathbb{R}^\nu \times \mathbb{R}^1 \to \mathbb{R}^l$ and $q : \mathbb{R}^\nu \times \mathbb{R}^1 \to \mathbb{R}^m$ are continuously differentiable in $x$, and $s : \mathbb{R}^\mu \to \mathbb{R}^{\mu'}$ is continuously differentiable in $u$. In addition, $f$, $f^0$, $\partial f/\partial x$, $\partial f/\partial u$, $\partial f^0/\partial x$, $\partial f^0/\partial u$, $g$, $q$, $\partial g/\partial x$, $\partial q/\partial x$ are piecewise continuous in $t$. The differentiability assumptions stated above are more stringent than is necessary for the purpose of deriving conditions of optimality, but are usually required in the application of the computational methods.
1.1 NONLINEAR PROGRAMMING AND OPTIMAL CONTROL PROBLEMS
The nonlinear programming problem (1) is the simplest of the three problems that we have introduced. Not surprisingly, therefore, the largest fraction of existing algorithms deals with this problem. We shall show that a number of nonlinear programming algorithms are applicable also to optimal control problems. In doing so we shall make constant use of the following transcriptions of discrete optimal control problems into the form of a nonlinear programming problem (see Section 1.5 in [C1]).
13 First Transcription. Consider the discrete optimal control problem (3). Let $z = (x_0, x_1, \ldots, x_k, u_0, u_1, \ldots, u_{k-1})$, let

14   $f^0(z) = \sum_{i=0}^{k-1} f_i^0(x_i, u_i) + \varphi(x_k),$

let

15   $r(z) = \bigl(x_1 - x_0 - f_0(x_0, u_0), \ldots, x_k - x_{k-1} - f_{k-1}(x_{k-1}, u_{k-1}), g_0(x_0), \ldots, g_k(x_k)\bigr),$

and let

16   $f(z) = \bigl(s_0(u_0), \ldots, s_{k-1}(u_{k-1}), q_0(x_0), \ldots, q_k(x_k)\bigr).$

Then we see that the discrete optimal control problem (3) assumes the form of problem (2) and hence, it becomes clear that, at least in principle, all the nonlinear programming algorithms are applicable to the discrete optimal control problem. The very high dimensionality of the vector $z$ in (13) makes this transcription unsatisfactory in many instances. For this reason, we introduce an alternative transcription which utilizes a certain amount of precomputing in order to yield a problem of the form (2) in which the dimension of the vector $z$ is not unreasonably high.
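As a rough illustration of the first transcription, the sketch below stacks all states and controls into one vector $z$ and evaluates the rows of $r(z)$ that encode the dynamics, assuming a difference equation of the form $x_{i+1} - x_i = f_i(x_i, u_i)$. The scalar toy dynamics and the names `f_i`, `pack`, `unpack`, `dynamics_residual` and `rollout` are assumptions introduced for this example only; they do not come from the text.

```python
# Sketch of the first transcription for a scalar system with horizon k.
# The dynamics f_i and all function names are illustrative assumptions.

def f_i(x, u):
    """Toy right-hand side of the difference equation x_{i+1} - x_i = f_i(x_i, u_i)."""
    return -0.5 * x + u

def pack(xs, us):
    """Stack z = (x_0, ..., x_k, u_0, ..., u_{k-1}) into one flat list."""
    assert len(xs) == len(us) + 1
    return list(xs) + list(us)

def unpack(z, k):
    """Split z back into the k+1 states and the k controls."""
    return z[:k + 1], z[k + 1:]

def dynamics_residual(z, k):
    """Rows x_{i+1} - x_i - f_i(x_i, u_i) of r(z); all zero iff z obeys the dynamics."""
    xs, us = unpack(z, k)
    return [xs[i + 1] - xs[i] - f_i(xs[i], us[i]) for i in range(k)]

def rollout(x0, us):
    """Generate a trajectory that satisfies the difference equation."""
    xs = [x0]
    for u in us:
        xs.append(xs[-1] + f_i(xs[-1], u))
    return xs

k = 3
us = [1.0, 0.0, -1.0]
z_feasible = pack(rollout(2.0, us), us)        # satisfies the dynamics rows of r(z) = 0
z_arbitrary = pack([2.0, 2.0, 2.0, 2.0], us)   # generally violates them
print(len(z_feasible), dynamics_residual(z_feasible, k))
print(dynamics_residual(z_arbitrary, k))
```

Even in this scalar case $z$ has $2k + 1$ components, which hints at why the dimension of $z$ grows quickly with the horizon and with the state dimension.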
17 Second Transcription. Consider the discrete optimal control problem (3). Let $z = (x_0, u_0, u_1, \ldots, u_{k-1})$ and let $x_i(x_0, \mathscr{U})$ denote the solution of (4) at time $i$ corresponding to the control sequence $\mathscr{U} = (u_0, u_1, \ldots, u_{k-1})$. We now set

18   $f^0(z) = \sum_{i=0}^{k-1} f_i^0(x_i(x_0, \mathscr{U}), u_i) + \varphi(x_k(x_0, \mathscr{U})),$

19   $r(z) = \bigl(g_0(x_0), g_1(x_1(x_0, \mathscr{U})), \ldots, g_k(x_k(x_0, \mathscr{U}))\bigr),$

and

20   $f(z) = \bigl(s_0(u_0), \ldots, s_{k-1}(u_{k-1}), q_0(x_0), q_1(x_1(x_0, \mathscr{U})), \ldots, q_k(x_k(x_0, \mathscr{U}))\bigr).$

With these definitions, the discrete optimal control problem again becomes reduced to the form of the nonlinear programming problem (2). However, now the vector $z$ is of smaller dimension than in (13) and, in addition, the number of equality constraints, which are governed by the dimensionality of the vector $r(z)$, has also been substantially reduced.

We shall now discuss the origin of the discrete optimal control problem and why in many instances the formulation of an optimal control problem in discrete form may be preferable to a formulation in continuous form. The continuous optimal control problem requires us to find a measurable function $u(\cdot)$. However, digital computers do not compute such functions; they can only compute a sequence of numbers. Consequently, any numerical method of solving continuous optimal control problems must involve some form of discretization of the problem. Discrete optimal control problems are frequently discretizations of continuous optimal control problems, with the discretization being carried out in such a way as to minimize the mathematical difficulties in dealing with the problem, as well as to minimize the digital computer memory requirements and to reduce the number of operations required per iteration. As a general rule, one may frequently be forced to resort to a discrete optimal control problem formulation whenever one is interested in on-line control of a dynamical system by means of a small digital computer, since in such situations the solution of a continuous optimal control problem may simply not be practicable.

A rather common manner of producing a discrete optimal control problem from a continuous optimal control problem is that given below. Note that it permits us to exercise careful control on the digital computer memory requirements in solving this problem through the choice of the integer $k$.
It may also result in a reduced number of numerical operations per iteration. Thus, suppose that we restrict $u(\cdot)$ to the class of piecewise constant functions with at most $k - 1$ equidistant discontinuities. Let $T = (t_f - t_0)/k$ and let (assuming that $t_f$ is given)

21   $u(t) = u_i$ for $t \in [t_0 + iT, t_0 + (i + 1)T), \quad i = 0, 1, \ldots, k - 1.$

Then the control constraints (11) become, after renaming functions,

22   $s_i(u_i) = s(u_i) \le 0$ for $i = 0, 1, \ldots, k - 1.$

Now, let $x_i(t)$ for $i = 0, 1, \ldots, k - 1$ be the solution of (9) for $t \in [t_0 + iT, t_0 + (i + 1)T]$, satisfying $x_i(t_0 + iT) = x_i$ and corresponding to $u(t) = u_i$ for $t \in [t_0 + iT, t_0 + (i + 1)T]$, with $x_0(t_0) = x_0$ and $x_{i+1} = x_i(t_0 + (i + 1)T)$ for $i = 0, 1, \ldots, k - 1$. Then we find that

23   $x_{i+1} = x_i + \int_{t_0 + iT}^{t_0 + (i+1)T} f(x_i(t), u_i, t) \, dt, \quad i = 0, 1, \ldots, k - 1.$

Note that (23) is of the same form as (4), i.e., it defines a discrete dynamical system. To compute $f_i(x_i, u_i) = \int_{t_0 + iT}^{t_0 + (i+1)T} f(x_i(t), u_i, t) \, dt$ we must solve (9) with the initial condition $x(t_0 + iT) = x_i$ and $u(t) = u_i$ for $t \in [t_0 + iT, t_0 + (i + 1)T)$. Thus, this form of discretization determines the number of $u_i$ and $x_i$ which will have to be stored at each iteration when the problem is solved on a digital computer, but it has nothing to say as to how (9) should be integrated. To complete the generation of a discrete optimal control problem, we set

24   $g_i(x_i) = g(x_i, t_0 + iT), \quad q_i(x_i) = q(x_i, t_0 + iT), \quad i = 0, 1, \ldots, k.$
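The discretization (21)-(23) can be sketched numerically as follows. This is only a sketch under stated assumptions: the scalar dynamics `f`, the classical fourth-order Runge-Kutta integrator, the number of substeps, and the names `step_increment` and `discrete_trajectory` are all choices made for this example. The text deliberately leaves open how (9) should be integrated.

```python
# Sketch: piecewise-constant discretization of dx/dt = f(x, u, t) as in (21)-(23).
# The scalar dynamics, the RK4 integrator and the step counts are assumptions.
import math

def f(x, u, t):
    """Toy continuous-time dynamics (scalar state and control)."""
    return -x + u

def step_increment(x_i, u_i, t_start, T, substeps=20):
    """Approximate f_i(x_i, u_i), the integral of f over [t_start, t_start + T]
    with u held constant at u_i, using classical fourth-order Runge-Kutta."""
    h = T / substeps
    x, t = x_i, t_start
    for _ in range(substeps):
        k1 = f(x, u_i, t)
        k2 = f(x + 0.5 * h * k1, u_i, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, u_i, t + 0.5 * h)
        k4 = f(x + h * k3, u_i, t + h)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x - x_i

def discrete_trajectory(x0, controls, t0, tf):
    """Apply x_{i+1} = x_i + f_i(x_i, u_i) for i = 0, ..., k-1 with T = (tf - t0)/k."""
    k = len(controls)
    T = (tf - t0) / k
    xs = [x0]
    for i, u_i in enumerate(controls):
        xs.append(xs[-1] + step_increment(xs[-1], u_i, t0 + i * T, T))
    return xs

xs = discrete_trajectory(x0=1.0, controls=[0.0, 0.0, 0.0, 0.0], t0=0.0, tf=1.0)
print(xs[-1], math.exp(-1.0))  # with u = 0 the exact state at t = 1 is exp(-1)
```

Only the $k + 1$ states and $k$ controls are stored between iterations; the inner integration is an approximation, which is the practical difficulty taken up in the paragraph that follows.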
The need for integrating differential equations in solving optimal control problems results in a number of practical difficulties. It also introduces theoretical difficulties whenever one is honest enough to admit that these integrations can only be performed approximately. We shall give a few results which bear on this subject later on. However, at the present stage of development, one mostly has no choice but to isolate these difficulties, deal with them heuristically, and hope that the future will bring us better understanding of this problem. 25 Exercise. Consider the continuous optimal control problem:
26   minimize $\int_0^{t_f} u(t)^2 \, dt,$

subject to the constraints

27   $\frac{d}{dt} x(t) = A x(t) + b u(t), \quad x(0) = x_0, \quad t \in [0, t_f],$

28   $|u(t)| \le 1$ for $t \in [0, t_f],$
where $x(t) \in \mathbb{R}^\nu$ and $u(t) \in \mathbb{R}^1$ for $t \in [0, t_f]$, $A$ is a constant matrix, and $b$ is a constant vector. (a) Show that when a discrete optimal control problem is obtained from this continuous optimal control problem in the manner discussed previously, i.e., as in (22)-(24), one can get explicit expressions for the functions $f_i(x_i, u_i)$ and $f_i^0(x_i, u_i)$. (b) After obtaining a discrete optimal control problem from the continuous optimal control problem above, transcribe the discrete optimal control problem into nonlinear programming problem form using transcriptions (13) and (17).

The discretization of the continuous optimal control problem can be performed not only in the manner indicated in (21), but also in many other ways. For example, provided one can still interpret the given constraints on the control, one could reduce the infinite dimensional problem (8) to the finite dimensional form (2) by requiring that

29   $u(t) = \sum_{i=0}^{k-1} u_i \sin \omega_i t$ for $t \in [t_0, t_f],$

where the $\omega_i$ are given and the $u_i$ are to be found. Or else one could require that

30   $u(t) = \sum_{j=0}^{l} u_{ij} t^j$ for $t \in [t_0 + iT, t_0 + (i + 1)T),$

in which case the control vector $u_i = (u_{i0}, u_{i1}, \ldots, u_{il})$ of the discrete optimal control problem has higher dimension than that of $u(t)$, the control vector for the continuous optimal control problem. When the discretizations (29) or (30) are used, it may be quite difficult to transcribe the constraint (11) into a suitable set of constraints for the $u_i$ in (29) or the $u_{ij}$ in (30).

31 Exercise. Suppose that $u(t) \in \mathbb{R}^1$ and that the constraint $s(u(t)) \le 0$ is of the form $|u(t)| \le 1$ (i.e., $s^1(u) = u - 1$, $s^2(u) = -u - 1$). Construct constraints on the $u_i$ in (29) and on the $u_{ij}$ in (30) which are sufficient to ensure that $u(t)$ satisfies (11).
1.2 Optimality Conditions
So far, there are no satisfactory tests for establishing whether a candidate solution to any one of the three general optimization problems presented in the preceding section is optimal or not. It is therefore not surprising that there are no general methods for solving such problems. The algorithms which we shall describe in the chapters to follow can only be used to construct a sequence whose limit satisfies a particular optimality condition. Generally, in the absence of convexity assumptions, such a condition is only a necessary condition of optimality. The subject of optimality conditions is rather vast and the reader is therefore referred to [C1] for a presentation in depth. Here we shall content ourselves with stating, mostly without proof, the optimality conditions used most frequently in computational methods. We shall also identify a few cases where these optimality conditions are satisfied trivially, since any algorithm depending on these conditions would become useless for such a problem.

1 Theorem. If $\hat{z}$ is optimal for the nonlinear programming problem (1.1), i.e., $f^0(\hat{z}) = \min\{f^0(z) \mid f(z) \le 0,\ r(z) = 0\}$, then there exist scalar multipliers $\mu^0 \le 0, \mu^1 \le 0, \ldots, \mu^m \le 0$, and $\psi^1, \psi^2, \ldots, \psi^l$, not all zero, such that

2   $\sum_{i=0}^{m} \mu^i \nabla f^i(\hat{z}) + \sum_{i=1}^{l} \psi^i \nabla r^i(\hat{z}) = 0,$

and

3   $\mu^i f^i(\hat{z}) = 0$ for $i = 1, 2, \ldots, m.$
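Conditions (2) and (3) can be checked numerically on a small example. The sketch below does this for a toy problem with one inequality constraint and no equality constraints; the problem data, the candidate point and the multiplier values are assumptions chosen for illustration, and the sign convention (nonpositive multipliers) is the one used in theorem (1).

```python
# Sketch: verify conditions (2) and (3) on a toy problem (all data are assumptions).
#   minimize f0(z) = z1^2 + z2^2   subject to   f1(z) = 1 - z1 - z2 <= 0.

def grad_f0(z):
    return [2.0 * z[0], 2.0 * z[1]]

def f1(z):
    return 1.0 - z[0] - z[1]

def grad_f1(z):
    return [-1.0, -1.0]

def stationarity_residual(z, mu0, mu1):
    """Left-hand side of (2): mu0 * grad f0(z) + mu1 * grad f1(z)."""
    g0, g1 = grad_f0(z), grad_f1(z)
    return [mu0 * a + mu1 * b for a, b in zip(g0, g1)]

z_hat = [0.5, 0.5]       # minimizer of the toy problem
mu0, mu1 = -1.0, -1.0    # nonpositive multipliers, not all zero

residual = stationarity_residual(z_hat, mu0, mu1)
slackness = mu1 * f1(z_hat)  # condition (3)
print(residual, slackness)
```

Both printed quantities vanish at the minimizer, which is what the theorem requires of an optimal point; the check says nothing about points that are not optimal.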
An early version of this theorem is due to John [J1]. In the form stated, this theorem is quite difficult to prove and the interested reader should consult either [C1] or [M4] if he wishes to see how this is done. The following two corollaries are important special cases of the Kuhn-Tucker conditions [K4]:

4 Corollary. If there exists a vector $h \in \mathbb{R}^n$ such that $\langle \nabla f^i(\hat{z}), h \rangle > 0$ for all $i \in \{1, 2, \ldots, m\}$ satisfying $f^i(\hat{z}) = 0$, then the multipliers satisfying (2) and (3) must also satisfy $(\mu^0, \psi^1, \psi^2, \ldots, \psi^l) \ne 0$.

5 Exercise. Prove corollary (4).

6 Corollary. Suppose that the vectors $\nabla r^i(\hat{z})$, $i = 1, 2, \ldots, l$ in (2) are linearly independent. If there exists a vector $h \in \mathbb{R}^n$ such that $\langle \nabla f^i(\hat{z}), h \rangle > 0$ for all $i \in \{1, 2, \ldots, m\}$ satisfying $f^i(\hat{z}) = 0$, and $\langle \nabla r^i(\hat{z}), h \rangle = 0$ for $i = 1, 2, \ldots, l$, then the multipliers satisfying (2) and (3) must also satisfy $\mu^0 < 0$.
7 Exercise. Prove corollary (6).

We now present a condition of optimality for a special case of the nonlinear programming problem (1.1), viz., the case when $r(\cdot) \equiv 0$. This condition was probably first published by Zoutendijk [Z4] and provides a starting point for constructing a number of important algorithms (the method of feasible directions, the method of centers, etc.). It differs from theorem (1) in one very important respect: it does not involve any multipliers.

8 Theorem. Suppose that $r(\cdot) \equiv 0$ in the nonlinear programming problem (1.1) and that $\hat{z}$ is an optimal solution to (1.1), i.e., $f^0(\hat{z}) = \min\{f^0(z) \mid f(z) \le 0\}$; then

9   $\min_{h \in S} \max_{i \in J_0(\hat{z})} \langle \nabla f^i(\hat{z}), h \rangle = 0,$

where $S$ is any subset of $\mathbb{R}^n$ containing the origin in its interior, and

10   $J_0(\hat{z}) = \{0\} \cup \{ i \mid f^i(\hat{z}) = 0,\ i \in \{1, 2, \ldots, m\} \}.$
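To see what condition (9) measures, the following sketch evaluates its left-hand side by brute force for a toy problem, taking $S$ to be the box $[-1, 1]^2$ sampled on a grid. The problem data, the choice of $S$, the grid search and the function names are assumptions made for illustration; when $S$ is a box, the same min-max could instead be posed as a small linear program.

```python
# Sketch: evaluate the left-hand side of (9) by brute force on a grid (an assumption).
# Toy problem: minimize f0(z) = z1^2 + z2^2 subject to f1(z) = 1 - z1 - z2 <= 0.

def gradients(z):
    """Gradients of f0 and f1 at z, indexed 0 and 1."""
    return {0: (2.0 * z[0], 2.0 * z[1]), 1: (-1.0, -1.0)}

def active_set(z, tol=1e-12):
    """J_0(z): index 0 plus the indices of the active inequality constraints."""
    f1 = 1.0 - z[0] - z[1]
    return [0] + ([1] if abs(f1) <= tol else [])

def min_max(z, n=21):
    """min over h in S = [-1, 1]^2 (sampled on an n-by-n grid) of
    max over i in J_0(z) of <grad f^i(z), h>."""
    grads, J0 = gradients(z), active_set(z)
    grid = [-1.0 + 2.0 * j / (n - 1) for j in range(n)]
    best = float("inf")
    for h1 in grid:
        for h2 in grid:
            worst = max(grads[i][0] * h1 + grads[i][1] * h2 for i in J0)
            best = min(best, worst)
    return best

print(min_max([0.5, 0.5]))  # optimal point of the toy problem
print(min_max([1.0, 1.0]))  # feasible but not optimal
```

At the optimal point no sampled direction makes all the relevant directional derivatives negative, so the value is zero; at the non-optimal feasible point the value is negative, and a minimizing $h$ is a direction along which the cost can be reduced.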
Proof. Suppose that (9) is false. Then there must exist a vector $h' \in S$ such that $\langle \nabla f^i(\hat{z}), h' \rangle \le \delta < 0$ for all $i \in J_0(\hat{z})$.

Now, setting $\nabla r^i(\hat{z}) = 0$ in (2), since $r(\cdot) \equiv 0$ by assumption, and taking the scalar product of both sides of (2) with $h'$, we get a contradiction of theorem (1).

11 Exercise. Give a proof of theorem (8) which does not require the use of theorem (1).
12 Exercise. For the special case $r(\cdot) \equiv 0$, theorems (1) and (8) are equivalent. Above, we have deduced theorem (8) from theorem (1). To complete the demonstration of equivalence, deduce theorem (1) from theorem (8) under the assumption that $r(\cdot) \equiv 0$.

13 Proposition. Suppose that the set $\Omega = \{z \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\}$ has no interior, and that $r(\cdot) \equiv 0$. Then (2) and (3) can be satisfied at every point $z \in \Omega$ (i.e., theorem (1) becomes trivial when the set $\Omega$ has no interior).

Proof. Let $z^*$ be any point in $\Omega$. Since $\Omega$ has no interior, it is clear that there is no vector $h \in \mathbb{R}^n$ such that

$\langle \nabla f^i(z^*), h \rangle < 0$ for all $i \in I(z^*),$

where $I(z^*) = \{ j \in \{1, 2, \ldots, m\} \mid f^j(z^*) = 0 \}$. Suppose that $I(z^*) = \{i_1, i_2, \ldots, i_\alpha\}$. Then, because of the above observation, the linear subspace

$L = \{ v = (\langle \nabla f^{i_1}(z^*), h \rangle, \ldots, \langle \nabla f^{i_\alpha}(z^*), h \rangle) \mid h \in \mathbb{R}^n \} \subset \mathbb{R}^\alpha$

has no ray in common with the cone $C = \{v \mid v < 0\} \subset \mathbb{R}^\alpha$, and hence $L$ and $C$ can be separated, i.e., there exists a nonzero $\xi \in \mathbb{R}^\alpha$ such that

14   $\langle \xi, v \rangle = 0$ for all $v \in L,$

15   $\langle \xi, v \rangle \ge 0$ for all $v \in C.$

It now follows from (15) that $\xi \le 0$ and from (14) that

$\sum_{j=1}^{\alpha} \xi^j \langle \nabla f^{i_j}(z^*), h \rangle = 0$ for all $h \in \mathbb{R}^n.$

But this implies that

16   $\sum_{j=1}^{\alpha} \xi^j \nabla f^{i_j}(z^*) = 0.$

Setting $\mu^i = 0$ for all $i \notin I(z^*)$ and $\mu^{i_j} = \xi^j$ for all $i_j \in I(z^*)$, we see that (2) and (3) are satisfied.
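As a small illustration of proposition (13), consider the following instance, which is an assumption made for this example and is not taken from the text: $n = 1$, $m = 2$, $f^1(z) = z$ and $f^2(z) = -z$, so that $\Omega = \{0\}$ has no interior. Whatever the cost $f^0$ is, the multipliers below satisfy (2) and (3) at $z^* = 0$:

```latex
\mu^0 = 0, \quad \mu^1 = \mu^2 = -1:
\qquad
\mu^0 \nabla f^0(0) + \mu^1 \nabla f^1(0) + \mu^2 \nabla f^2(0)
  = 0 + (-1)(1) + (-1)(-1) = 0,
\qquad
\mu^i f^i(0) = 0 \quad (i = 1, 2).
```

Here $I(z^*) = \{1, 2\}$, the subspace $L$ consists of the vectors $(h, -h)$, and $\xi = (-1, -1)$ plays the role of the separating vector in the proof. Since $\mu^0 = 0$, the conditions carry no information about $f^0$, which is the sense in which the theorem becomes trivial.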
17 Exercise. Show that under the assumptions of proposition (13), condition (9) can be satisfied at any point $z' \in \{z \mid f(z) \le 0\}$.

So far, we have only presented necessary conditions of optimality for problem (1.1). We shall now give a sufficient condition.

18 Theorem. Consider the problem (1.1). Suppose that the functions $f^i(\cdot)$, $i = 0, 1, \ldots, m$, are convex and that the function $r(\cdot)$ is affine. If $\hat{z}$ satisfies $r(\hat{z}) = 0$, $f^i(\hat{z}) \le 0$ for $i = 1, 2, \ldots, m$, and there exist scalar multipliers $\mu^i \le 0$ for $i = 1, 2, \ldots, m$ and $\psi^i$ for $i = 1, 2, \ldots, l$ such that

19   $-\nabla f^0(\hat{z}) + \sum_{i=1}^{m} \mu^i \nabla f^i(\hat{z}) + \sum_{i=1}^{l} \psi^i \nabla r^i(\hat{z}) = 0,$

20   $\mu^i f^i(\hat{z}) = 0$ for $i = 1, 2, \ldots, m,$

then $\hat{z}$ is optimal for (1.1). (This theorem was first presented in [K4].)

Proof. Let $\Omega' = \{z \mid f(z) \le 0,\ r(z) = 0\}$. Then, since the $r^i(\cdot)$ are affine, for any $z \in \Omega'$, we have $\langle \nabla r^i(\hat{z}), z - \hat{z} \rangle = 0$, and hence, by (19), for any $z \in \Omega'$,

21   $\langle \nabla f^0(\hat{z}), z - \hat{z} \rangle = \sum_{i=1}^{m} \mu^i \langle \nabla f^i(\hat{z}), z - \hat{z} \rangle.$

Now, since the functions $f^i(\cdot)$ are convex, we have, for any $z \in \Omega'$ and any $i \in \{ i \mid f^i(\hat{z}) = 0,\ i \in \{1, 2, \ldots, m\} \}$,
22
(Vlfi(S),z
-
a)
30
=
0
for i
= 0,
1, 2 ,..., k - 1
m (and the (ai ,tii satisfy (1.4), (1.6) and (1.7)). To obtain theorem (24) from theorem (I), we proceed as follows: First we note that (2) is equivalent to the statement aL(D)/az = 0, where W )= pY0(z) ( p , f ( z ) > ) 0 and a 6(z) < 0 such that 4
c(a(z')) - c(z')
< 6(z) < 0, )3 c ( 4 ,
5
then z is desirable. Now, suppose the sequence $\{z_i\}$ is finite, i.e., $\{z_i\} = \{z_0, z_1, \ldots, z_k, z_{k+1}\}$. Then by step 4, $c(z_{k+1}) \ge c(z_k)$, and hence from (5), $z_k$ is desirable. Now suppose that the sequence $\{z_i\}$ is infinite, and that it has a subsequence which converges to the point $z'$. We express this as $z_i \to z'$ for $i \in K \subset \{0, 1, 2, \ldots\}$. Assuming that $z'$ is not desirable, there exist an $\varepsilon' > 0$, a $\delta' < 0$, and a $k \in K$ such that for all $i \ge k$, $i \in K$,

6   $\|z_i - z'\|_{\mathscr{B}} \le \varepsilon'$

and

7   $c(z_{i+1}) - c(z_i) \le \delta'.$

Hence, for any two consecutive points $z_i$, $z_{i+j}$ of the subsequence, with $i \ge k$ (and $i, (i + j) \in K$), we must have

8   $c(z_{i+j}) - c(z_i) = [c(z_{i+j}) - c(z_{i+j-1})] + [c(z_{i+j-1}) - c(z_{i+j-2})] + \cdots + [c(z_{i+1}) - c(z_i)] \le c(z_{i+1}) - c(z_i) \le \delta'.$

Now, for $i \in K$, the monotonically decreasing sequence $c(z_i)$ must converge either because $c(\cdot)$ is continuous at $z'$ or else because $c(z)$ is bounded from below on $T$. But this is contradicted by (8), which shows that the sequence $c(z_i)$ is not a Cauchy sequence for $i \in K$, and hence the theorem must be true.

A somewhat more sophisticated and useful model is obtained by substituting for the search function $a : T \to T$ of algorithm (2) a set-valued search function $A$ mapping $T$ into the set of all nonempty subsets of $T$ (which we write as $A : T \to 2^T$). We then get the following procedure (which also uses a stop rule $c : T \to \mathbb{R}^1$):

9 Algorithm Model. $A : T \to 2^T$, $c : T \to \mathbb{R}^1$.
Step 0. Compute a $z_0 \in T$.
Step 1. Set $i = 0$.
Step 2. Compute a point $y \in A(z_i)$.
Step 3. Set $z_{i+1} = y$.
Step 4. If $c(z_{i+1}) \ge c(z_i)$, stop;* else, set $i = i + 1$ and go to step 2.
* See footnote to algorithm (2).

10 Theorem. Consider algorithm (9). Suppose that
(i) $c(\cdot)$ is either continuous at all nondesirable points $z \in T$, or else $c(z)$ is bounded from below for $z \in T$;
(ii) for every $z \in T$ which is not desirable, there exist an $\varepsilon(z) > 0$ and a $\delta(z) < 0$ such that

11   $c(z'') - c(z') \le \delta(z) < 0,$

for all $z' \in T$ such that $\|z' - z\|_{\mathscr{B}} \le \varepsilon(z)$, and for all $z'' \in A(z')$.
Then, either the sequence $\{z_i\}$ constructed by algorithm (9) is finite and its next to last element is desirable, or else it is infinite and every accumulation point of $\{z_i\}$ is desirable.

12 Exercise. Show that (ii) of (10) implies that if $c(z') \ge c(z)$ for at least one $z' \in A(z)$, then $z$ is desirable.
13 Exercise. Prove theorem (10).
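The steps of algorithm model (9) can be sketched as a short loop. Everything concrete below is an assumption made for illustration: the stop rule `c`, the set-valued map `A` (which returns a list of candidate points), the selection rule `pick`, and the iteration cap, which is a practical safeguard and not part of the model.

```python
# Sketch of algorithm model (9) with a set-valued search map A and stop rule c.
# The concrete A, c and the iteration cap are illustrative assumptions.

def c(z):
    """Stop rule (here also the cost): c(z) = (z - 3)^2."""
    return (z - 3.0) ** 2

def A(z):
    """Set-valued search map: two candidate points, both moving toward z = 3."""
    return [z - 0.5 * (z - 3.0), z - 0.25 * (z - 3.0)]

def run_model(z0, A, c, pick=lambda candidates: candidates[0], max_iter=10_000):
    """Steps 0-4 of the model: take any y in A(z_i); stop when c does not decrease.
    Returns the next-to-last point z_i, which theorem (10) identifies as desirable."""
    z = z0                          # steps 0 and 1
    for _ in range(max_iter):
        y = pick(A(z))              # step 2: any element of A(z_i)
        if c(y) >= c(z):            # step 4: stop test on z_{i+1} = y
            return z
        z = y                       # step 3, then i = i + 1
    return z                        # safeguard only; not part of the model

z_final = run_model(0.0, A, c)
print(z_final, c(z_final))
```

In exact arithmetic this particular sequence would be infinite with its only accumulation point at $z = 3$; in floating point the iterates eventually stall, the stop test fires, and the returned point is close to 3.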
14 Remark. The reader should be careful not to read more into the statements of the above convergence theorems than they actually say. Note that these theorems state only that if a convergent subsequence exists, then its limit point will be desirable. To ensure that accumulation points exist, it is necessary to make some additional assumptions. For example, one may assume that the set $T$ is compact, or that the set $\{z \in T \mid c(z) \le c(z_0)\}$ is compact, where $z_0$ is the starting point for the algorithm. The reason for not including such assumptions in the statement of theorems such as (3) and (10) is that it is usually better to determine whether an algorithm will produce compact sequences by examining the algorithm in the light of the specific problem to which one wishes to apply it. This point will become clear in the chapters to follow.

The assumptions (i) and (ii) of theorem (10) are not the only ones under which the conclusions of theorem (10) are valid. The following set of assumptions are due to Zangwill [Z1] and can be shown, though not very easily, to be stronger than the assumptions of theorem (10), i.e., whenever the assumptions below are satisfied, the assumptions of theorem (10) are also satisfied.

15 Exercise. Consider algorithm (9). Suppose that the set $T$ in (1) is compact, that the stop rule $c(\cdot)$ is continuous, and that the map $A(\cdot)$ has the following property: If $\{z_i\}$ is any sequence in $T$ converging to a point $z'$ and $\{y_i\}$ is any sequence in $T$ converging to a point $y'$ and satisfying $y_i \in A(z_i)$ for $i = 0, 1, 2, \ldots$, then $y' \in A(z')$. Show that the conclusions of theorem (10) remain valid under this assumption, provided $c(z') \ge c(z)$ for at least one $z' \in A(z)$ if and only if $z \in T$ is desirable.

16 Exercise. Show that whenever the search function $a(\cdot)$ (or $A(\cdot)$) and the stop rule $c(\cdot)$ are continuous, assumption (ii) of theorem (3) (theorem (10)) is satisfied. Show that this is also true for the assumptions stated in exercise (15).
The convergence theorems (3) and (10) can be thought of as being extensions of Lyapunov's second method for the study of the stability of
dynamical systems described by difference equations. Weaker versions of these theorems can be found in Polyak [P6] and in Zangwill [Z1] and [Z2]; related ideas appear in the work of Varaiya [V3], Levitin and Polyak [L2], Topkis and Veinott [T1], Hurt [H4], and, in somewhat less explicit form, in Arrow et al. [A3], Zoutendijk [Z4], and Zukhovitskii et al. [Z7]. The author first presented these theorems in [P1-P4] without being aware of the rather close result due to his namesake, Polyak [P6], which was buried in a seemingly unimportant paper on gradient methods.

Algorithms of the form (2) or (9) make sense only if one can compute the point $z_{i+1} = a(z_i)$, or a point $z_{i+1} \in A(z_i)$, in a practically acceptable manner. Now, one often invents algorithms of the form (2) or (9) in which the calculation of $a(z_i)$, or of points in the set $A(z_i)$, cannot be carried out in any practical manner. For example, the computation of such a point may require us to find the limit point of a sequence which we must first construct. As we have already mentioned, to resolve the practical difficulty this introduces, we must introduce suitable truncation, or approximation, procedures. We shall now describe a few approximation procedures which were first presented in [P3]. (An alternative approach is described and illustrated in Appendix A and in [M8].)

The most simple-minded approach to introducing a truncation procedure into algorithm (2) is to define an approximation set,

$$A_\varepsilon(z) = \{y \in T \mid \|y - a(z)\|_{\mathcal{B}} \leq \varepsilon\}, \quad \text{where } \varepsilon \geq 0,\ z \in T. \tag{17}$$
We can then modify (2) as follows:

18 Algorithm Model. Suppose that an $\varepsilon_0 > 0$ is given.
Step 0. Compute a $z_0 \in T$.
Step 1. Set $i = 0$.
Step 2. Set $\varepsilon = \varepsilon_0$.
Step 3. Compute a $y \in A_\varepsilon(z_i)$.
Step 4. If $c(y) - c(z_i) \leq -\varepsilon$, set $z_{i+1} = y$, set $i = i + 1$,* and go to step 2; else, set $\varepsilon = \varepsilon/2$ and go to step 3.

The above algorithm will never stop after a finite number of iterations, since no stop commands are included. One of two things will occur. Either algorithm (18) will construct an infinite sequence $\{z_i\}$, or else it will jam up at a point $z_k$ and cycle between steps 3 and 4, dividing $\varepsilon$ by 2 at each cycle. We shall now show what kind of points one can compute using algorithm (18).
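The interplay between the precision $\varepsilon$ and the acceptance test in step 4 may be easier to see in code. The following Python sketch follows algorithm model (18) under several assumptions made only for illustration: the function name, the caps `max_halvings` and `max_iter` (the model itself never stops), and the toy problem $c(z) = z^2$, $a(z) = z/2$ with a random perturbation of size at most $\varepsilon$ standing in for a point of $A_\varepsilon(z)$.

```python
import random

def algorithm_model_18(z0, approx_a, c, eps0, max_halvings=60, max_iter=200):
    """Sketch of algorithm model (18).

    approx_a(z, eps) must return some y with |y - a(z)| <= eps.
    The two caps are safeguards for the demo; a run that exhausts
    max_halvings is treated as having jammed up at z.
    """
    z = z0
    for _ in range(max_iter):
        eps = eps0                          # step 2
        for _ in range(max_halvings):
            y = approx_a(z, eps)            # step 3
            if c(y) - c(z) <= -eps:         # step 4: sufficient decrease
                z = y
                break
            eps /= 2.0                      # otherwise tighten the precision
        else:
            return z                        # jammed: no eps was accepted
    return z

# Hypothetical example: c(z) = z^2, a(z) = z/2, perturbed by at most eps.
rng = random.Random(1)
c = lambda z: z * z
approx_a = lambda z, eps: 0.5 * z + rng.uniform(-eps, eps)
z_final = algorithm_model_18(4.0, approx_a, c, eps0=1.0)
```

Far from the desirable point $z = 0$ a coarse approximation is accepted; close to it the test in step 4 forces $\varepsilon$ down, which is the behavior the model is designed to produce.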
19 Theorem. Suppose that the search function a(.) in (17) and the stop rule c(.) in step 4 of algorithm (18) satisfy condition (ii) of theorem (3) and
* Convention: In all algorithms, a sequence of commands such as “set,” appearing in the instructions of a step, must be executed in the order in which they occur in the instructions.
that c(.) is uniformly continuous on T. Then either algorithm (18) jams up at a desirable point after a finite number of iterations, or else it constructs an infinite sequence {zi} and every accumulation point of this sequence is desirable.
Proof. Suppose that the algorithm jammed up at the point $z_k$. Then it must be cycling between steps 3 and 4, producing in this process a sequence of vectors $y_j$, $j = 0, 1, 2, \ldots$, in $T$ such that $\|y_j - a(z_k)\|_{\mathcal{B}} \leq \varepsilon_0/2^j$ and $c(y_j) - c(z_k) > -\varepsilon_0/2^j$. Hence, as $j \to \infty$, $y_j \to a(z_k)$, and since $c(\cdot)$ is assumed to be continuous, we must have $c(a(z_k)) \geq c(z_k)$, i.e., $z_k$ is desirable. Now suppose that the sequence $\{z_i\}$ is infinite and that $z_i \to z'$ for $i \in K \subset \{0, 1, 2, \ldots\}$, with $z'$ a nondesirable point. Then there exists an integer $k \in K$ such that for $i \in K$, $i \geq k$,
20
II zi
-
… be an infinite sequence constructed by this algorithm model. If either $T$ or $C'(z_0) = \{z \in T \mid c(z) \leq c(z_0)\}$ is compact, and $(z_{i+1} - z_i) \to 0$ as $i \to \infty$, then $z_i \to \hat{z}$ as $i \to \infty$, where $\hat{z}$ is a desirable point.
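0. Since $\nabla f^0(\cdot)$ is continuous, we conclude from the mean-value theorem (B.1.1) that there exists an $\varepsilon'$ such that for all $0 < \lambda < \varepsilon'$,

$$f^0(z' + \lambda h(z')) - f^0(z') = \lambda \langle \nabla f^0(z' + \alpha h(z')), h(z') \rangle < -\lambda\delta'/2, \tag{7}$$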
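where $\alpha \in [0, \lambda]$. Consequently, whenever $z'$ is not desirable, we must have $\lambda(z') > 0$. Let $z' \in \mathbb{R}^n$ be such that $\nabla f^0(z') \neq 0$. Then $\lambda(z') > 0$, and we define the map $\theta : \mathbb{R}^n \to \mathbb{R}^1$ by

$$\theta(z) = f^0(z + \lambda(z') h(z)) - f^0(z). \tag{8}$$

By inspection, $\theta(\cdot)$ is a continuous function, and

$$\theta(z') = f^0(z' + \lambda(z') h(z')) - f^0(z') = \delta'' < 0. \tag{9}$$

Hence there exists an $\varepsilon' > 0$ such that $|\theta(z) - \theta(z')| < -\delta''/2$, i.e., such that

$$\theta(z) < \delta''/2 < 0 \tag{10}$$

for all $z \in \{z \mid \|z - z'\| < \varepsilon'\}$. But, according to (4),

$$f^0(z + \lambda(z) h(z)) - f^0(z) \leq \theta(z), \tag{11}$$

and hence, setting $\varepsilon(z') = \varepsilon'$, $\delta(z') = \delta''/2$, we see that condition (ii) of theorem (1.3.3) is satisfied.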
0 to be supplied;
Comment. The first six steps of the algorithm compute an interval $[a_0, b_0]$ containing a minimizing point $\lambda'$. The remaining steps narrow down the length of this interval to the preassigned value $\varepsilon$.
Step 1. Compute $\theta(\rho)$, $\theta(0)$.
Step 2. If $\theta(\rho) \geq \theta(0)$, set $a_0 = 0$, $b_0 = \rho$, and go to step 7; else, go to step 3.
Step 3. Set $i = 0$, $\mu_0 = 0$.
Step 4. Set $\mu_{i+1} = \mu_i + \rho$.
Step 5. Compute $\theta(\mu_{i+1})$.
Step 6. If $\theta(\mu_{i+1}) \geq \theta(\mu_i)$, set $a_0 = \mu_{i-1}$, $b_0 = \mu_{i+1}$, and go to step 7; else, set $i = i + 1$ and go to step 4.

Comment. Now $\lambda' \in [a_0, b_0]$. We proceed to reduce the length of the interval containing $\lambda'$.
Step 7. Set $j = 0$.
Step 8. Set $l_j = (b_j - a_j)$.
Step 9. If $l_j \leq \varepsilon$, go to step 12; else, go to step 10.
* The numbers $F_1$ and $F_2$ are called Fibonacci fractions; they have the property that $F_2 = 1 - F_1$ and that $F_1 = (F_2)^2$.
Step 10. Set $v_j = a_j + F_1 l_j$, $w_j = a_j + F_2 l_j$.
Step 11. If $\theta(v_j) < \theta(w_j)$, set $a_{j+1} = a_j$, set $b_{j+1} = w_j$, set $j = j + 1$, and go to step 8; else, set $a_{j+1} = v_j$, set $b_{j+1} = b_j$, set $j = j + 1$, and go to step 8.

Comment. Note that $l_j = F_2^{\,j} l_0 \approx (0.618)^j l_0$.
Step 12. Set $\bar{\mu} = (a_j + b_j)/2$ and stop.
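A compact Python sketch of the two phases of algorithm (14), bracketing and interval reduction, is given below. It is a sketch under assumptions rather than a definitive implementation: the function name is hypothetical, the values of $F_1$ and $F_2$ are taken to be $(3 - \sqrt{5})/2$ and $(\sqrt{5} - 1)/2$ (the pair fixed by the two properties in the footnote), and $\theta$ is assumed to be unimodal on $[0, \infty)$ with a finite minimizer, since otherwise the bracketing loop need not end.

```python
import math

F1 = (3.0 - math.sqrt(5.0)) / 2.0    # about 0.382
F2 = (math.sqrt(5.0) - 1.0) / 2.0    # about 0.618; F2 = 1 - F1 and F1 = F2**2

def golden_section(theta, rho, eps):
    """Sketch of algorithm (14) for a unimodal theta on [0, inf)."""
    # Steps 1-6: step forward by rho until theta stops decreasing.
    if theta(rho) >= theta(0.0):
        a, b = 0.0, rho
    else:
        mu_prev, mu = 0.0, rho
        while True:
            mu_next = mu + rho
            if theta(mu_next) >= theta(mu):
                a, b = mu_prev, mu_next
                break
            mu_prev, mu = mu, mu_next
    # Steps 7-12: shrink [a, b] by the factor F2 until its length is <= eps.
    while b - a > eps:
        length = b - a
        v, w = a + F1 * length, a + F2 * length
        if theta(v) < theta(w):
            b = w                    # keep [a, w]
        else:
            a = v                    # keep [v, b]
    return 0.5 * (a + b)

# Hypothetical test function with minimizer 1.3.
theta = lambda mu: (mu - 1.3) ** 2
mu_bar = golden_section(theta, rho=0.5, eps=1e-6)
```

For simplicity this version evaluates $\theta$ twice per reduction step; a production version would reuse one of the two interior points from the previous step.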
15 Exercise. Show that when $a_{j+1} = a_j$ and $b_{j+1} = w_j$, we shall have $w_{j+1} = v_j$, and that whenever $a_{j+1} = v_j$ and $b_{j+1} = b_j$, we must have $v_{j+1} = w_j$.

Hence, the remarkable property of the Golden section search is that at each iteration we only need to carry out one and not two evaluations of the function $\theta(\cdot)$. In practice, therefore, algorithm (14) would be modified to take this fact into account. We can now state an implementable modification of algorithm (3) which can be used for finding the minima of differentiable convex functions $f^0(\cdot)$, under the assumption that the set $\{z \mid f^0(z) \leq f^0(z_0)\}$ is bounded.

16 Algorithm (Polak [P3]). Let $D(z)$ be an $n \times n$ positive definite matrix whose elements are continuous functions of $z$; $f^0(\cdot)$ is assumed to be convex.
Step 0. Select a $z_0 \in \mathbb{R}^n$ such that the set defined in (2) is bounded; select an $\varepsilon_0 > 0$, and select a $\rho > 0$ for algorithm (14).
Step 1. Set $i = 0$.
Step 2. Set $\varepsilon = \varepsilon_0$.
Step 3. Compute $-D(z_i) \nabla f^0(z_i)$.
Step 4. Set $h(z_i) = -D(z_i) \nabla f^0(z_i)$. If $h(z_i) = 0$, stop; else, go to step 5.
Step 5. Define $\theta : \mathbb{R}^+ \to \mathbb{R}^1$ by

$$\theta(\mu) = f^0(z_i + \mu h(z_i)) - f^0(z_i). \tag{17}$$

Step 6. Use algorithm (14) to compute $\bar{\mu}$ (see step 12 of (14)), using the current value of $\varepsilon$.
Step 7. If $\theta(\bar{\mu}) \leq -\varepsilon$, set $z_{i+1} = z_i + \bar{\mu} h(z_i)$, set $i = i + 1$, and go to step 3; else, set $\varepsilon = \varepsilon/2$ and go to step 6.
18 Exercise. Show that algorithm (16) is of the form of algorithm (1.3.18) and that it satisfies the assumptions of theorem (1.3.19). Hence, show that either algorithm (16) stops* at a point $z_k$, in which case $z_k$ must minimize the function $f^0(\cdot)$, or else it generates an infinite sequence $\{z_i\}$ such that $f^0(z_i) \to \hat{f}^0 = \min\{f^0(z) \mid z \in \mathbb{R}^n\}$. Modify (16) to the form (1.3.33).

When the function $f^0(\cdot)$ is not necessarily convex, we can use the following modification of algorithm (3) which uses a step size rule probably first introduced by Goldstein [G3].
* Note that algorithm (16) cannot jam up because of the stop in step 4.
19 Algorithm (Goldstein [G3]). Let $D(z)$ be an $n \times n$ positive definite matrix whose elements are continuous functions of $z$.
Step 0. Select a $z_0 \in \mathbb{R}^n$ such that the set defined in (2) is bounded; select an $\alpha \in (0, \tfrac{1}{2})$.

Comment. Here, $\alpha = 0.4$ seems to be a good choice; see Section 6.1.
Step 1. Set $i = 0$.
Step 2. Compute $-D(z_i) \nabla f^0(z_i)$.
Step 3. Set $h(z_i) = -D(z_i) \nabla f^0(z_i)$. If $h(z_i) = 0$, stop; else, go to step 4.
Step 4. Compute a $\lambda_i > 0$ such that

20    …

where

$$\theta(\lambda_i; z_i) = f^0(z_i + \lambda_i h(z_i)) - f^0(z_i). \tag{21}$$

Step 5. Set $z_{i+1} = z_i + \lambda_i h(z_i)$, set $i = i + 1$, and go to step 2.
Fig. 1. Computation of $\lambda_i$ according to (20).
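Because condition (20) is not legible in this copy, the following Python sketch should be read with care: it assumes that (20) is the usual two-sided Goldstein test, $(1 - \alpha)\lambda \langle \nabla f^0(z), h(z) \rangle \leq \theta(\lambda; z) \leq \alpha\lambda \langle \nabla f^0(z), h(z) \rangle$, and it finds an acceptable $\lambda$ by doubling and bisection. The function name, the search strategy, and the test problem are all hypothetical.

```python
def goldstein_step(theta, slope, alpha=0.4, lam=1.0, max_iter=100):
    """Find lam > 0 passing an assumed two-sided Goldstein test.

    theta(lam) = f(z + lam*h) - f(z); slope = <grad f(z), h> must be < 0.
    The test used here is an assumption about the form of (20).
    """
    lo, hi = 0.0, None
    for _ in range(max_iter):
        t = theta(lam)
        if t > alpha * lam * slope:               # too little decrease: step too long
            hi = lam
        elif t < (1.0 - alpha) * lam * slope:     # step too short
            lo = lam
        else:
            return lam
        lam = 0.5 * (lo + hi) if hi is not None else 2.0 * lam
    raise RuntimeError("no acceptable step found")

# Hypothetical example: f(z) = 5 z^2 at z = 1 with h = -grad f(z) = -10.
theta = lambda lam: 5.0 * (1.0 - 10.0 * lam) ** 2 - 5.0
slope = -100.0
lam = goldstein_step(theta, slope)
```

For this quadratic the assumed test accepts exactly the steps in $[0.08, 0.12]$, an interval that is nonempty because $\alpha < \tfrac{1}{2}$.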
We shall now show that algorithm (19) is of the form (1.3.9) and that it satisfies the assumptions of theorem (1.3.10). Recall that we have defined a point $z' \in \mathbb{R}^n$ to be desirable if $\nabla f^0(z') = 0$.

22 Theorem. Suppose that $\{z_i\}$ is a sequence constructed by algorithm (19); then, either the sequence $\{z_i\}$ is finite, terminating at $z_k$, and $\nabla f^0(z_k) = 0$, or else it is infinite and every accumulation point $z'$ of $\{z_i\}$ satisfies $\nabla f^0(z') = 0$.

Proof. Referring to the model (1.3.9), we set $c(\cdot) = f^0(\cdot)$ and we define $A : \mathbb{R}^n \to 2^{\mathbb{R}^n}$ (with $T = \mathbb{R}^n$) as follows:
23
A(z) = {z' = z
+ Xh(z) I x 3 0, #(A; z ) 3 0,
z)
< 01,
where h(z) is defined as in step 3 of (19) (with the subscripts deleted), and
1
Construction for proof of theorem (22).
Now, since the algorithm stops constructing new points if and only if for some zk , h(zk)= 0 (see step 3), the first part of the theorem is trivial. To show that the second part of the theorem is true, we shall show that the mapsfo(.) and A ( . ) satisfy the assumptions (i) and (ii) of theorem (1.3.10). Obviously (i) of (1.3.10) is satisfied, sincefo(.) is continuous. Thus we are left with showing that (ii) is satisfied. Let #(.7 .), mapping R+ x Rn into R1,be defined by
#(k
26
= B(h;z)/4
and let* h = p(z) be the smallest positive root of the equation #(A, z ) = 0. Then for every z E Rn such that V'f0(z)# 0, we have #(O, z ) = a(Vfo(z),h(z)) < 0, and hence p(z) > 0 and #(A, z ) < 0 for all h E [0, p(z)). Consequently, for every z such that V f o ( z )# 0, 27
P(z) = max{#(h,
4I
E
[O, P(Z)/21)< 0.
Now, let z E 88" be such that V f o ( z )# 0 (i.e., z is not desirable). Then, since the interval [0,p(z)/2]is compact and since #(., .) is jointly continuous in both its arguments, there exists an E' > 0 such that for all z' E {z' 1 // z' - z I/ < E'} and for all h E [0, p(z)/2],
I #O, z') - #(A, z>l < -P(z)/2,
28
and hence, since +(A, z )
< P(z) for all X E [07p(z)/2], #(A, z') d P(z)/2,
29
for all z'
E
{z' I 11 z'
-
z 11
0, 01 E (0, i),
3
is bounded. Select an
e0'
> 0, an 01 E (0, +), an 01' > 0, and a t9 > 0.
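Comment. Try $\alpha = 0.4$, $\alpha' \in$ …, $\varepsilon_0 \in$ …, $\beta \in [5, 10]$.
Step 1. Set $i = 0$.
Step 2. Set $\varepsilon = \varepsilon_0$.
Step 3. Compute the vector $h(\varepsilon, z_i) \in \mathbb{R}^n$ whose $j$th component, $h^j(\varepsilon, z_i)$, is defined by

$$h^j(\varepsilon, z_i) = -\frac{1}{\varepsilon}\,[f^0(z_i + \varepsilon e_j) - f^0(z_i)], \quad j = 1, 2, \ldots, n, \tag{4}$$

where $e_j$ is the $j$th column of the $n \times n$ identity matrix, i.e., $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, etc.
Step 4. Compute $f^0(z_i + \beta\varepsilon h(\varepsilon, z_i)) - f^0(z_i) \triangleq \Delta(\varepsilon, z_i)$.
Step 5. If $\Delta(\varepsilon, z_i) < 0$, compute a $\bar{\lambda}$ such that

$$(1 - \alpha)\,\bar{\lambda}\,\Delta(\varepsilon, z_i)/\beta\varepsilon \leq \theta(\bar{\lambda}, z_i, h(\varepsilon, z_i)) \leq \alpha\,\bar{\lambda}\,\Delta(\varepsilon, z_i)/\beta\varepsilon,^* \tag{5}$$

where

$$\theta(\lambda, z_i, h(\varepsilon, z_i)) = f^0(z_i + \lambda h(\varepsilon, z_i)) - f^0(z_i), \tag{6}$$

and go to step 6; else, set $\varepsilon = \varepsilon/2$ and go to step 3.

Comment. Algorithm (1.33) is easily adapted to calculate a $\bar{\lambda}$ which satisfies (5).
Step 6. If $\theta(\bar{\lambda}, z_i, h(\varepsilon, z_i)) \leq -\alpha'\varepsilon$, set $z_{i+1} = z_i + \bar{\lambda} h(\varepsilon, z_i)$, set $i = i + 1$, and go to step 2; else, set $\varepsilon = \varepsilon/2$ and go to step 3.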
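The search direction of step 3 replaces the gradient by forward differences, one extra function evaluation per coordinate. A minimal Python sketch of formula (4) follows; the function name and the test function are hypothetical, and the sign convention (the direction is the negative of the difference quotient) is the one assumed in the reconstruction of (4).

```python
def fd_direction(f, z, eps):
    """Forward-difference direction: h_j = -(f(z + eps*e_j) - f(z)) / eps."""
    fz = f(z)
    h = []
    for j in range(len(z)):
        z_shift = list(z)
        z_shift[j] += eps            # z + eps * e_j
        h.append(-(f(z_shift) - fz) / eps)
    return h

# Hypothetical example: f(z) = z1^2 + 3 z2, whose gradient at (1, 2) is (2, 3).
f = lambda z: z[0] ** 2 + 3.0 * z[1]
h = fd_direction(f, [1.0, 2.0], 1e-6)
```

As $\varepsilon$ is halved by the algorithm, the direction approaches $-\nabla f^0(z_i)$, which is why the convergence analysis can treat it as an approximation to steepest descent.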
7 Exercise. Use theorem (1.3.27) to show that if $\{z_i\}$ is a sequence constructed by algorithm (2), then either $\{z_i\}$ is finite (i.e., the algorithm jams up after a finite number of iterations at a point $z_k$, cycling between steps 3 and 5 or between steps 3 and 6 while continuing to divide $\varepsilon$ by 2) and $\nabla f^0(z_k) = 0$, or else the sequence $\{z_i\}$ is infinite and every accumulation point $z'$ of $\{z_i\}$ satisfies $\nabla f^0(z') = 0$.

8 Exercise. Construct an algorithm of the form (2), but using a step size calculated by means of a simple modification of algorithm (1.36).

9 Exercise. Suppose that the function $f^0(\cdot)$ is convex. Construct an algorithm of the form (2), but using a step size calculated by means of algorithm (1.14).

10 Algorithm (modified quasi-Newton method, Polak [P3], compare [G2]). Suppose that $f^0(\cdot)$ is strictly convex and twice continuously differentiable.
* An adaptation of step size subprocedure (1.36) can be used instead of (5) and may require less computation time per iteration.
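Step 0. Select a $z_0 \in \mathbb{R}^n$ such that the set

$$\{z \mid f^0(z) \leq f^0(z_0)\} \tag{11}$$

is bounded. Select an $\varepsilon_0 > 0$, an $\alpha \in (0, \tfrac{1}{2})$, and an $\alpha' > 0$.

Comment. Try $\varepsilon_0 \in$ …, $\alpha = 0.4$, and $\alpha' \in$ … .
Step 1. Set $i = 0$.
Step 2. Set $\varepsilon = \varepsilon_0$.*
Step 3. Compute the $n \times n$ matrix $H(\varepsilon, z_i)$ whose $j$th column is

$$\frac{1}{\varepsilon}\,[\nabla f^0(z_i + \varepsilon e_j) - \nabla f^0(z_i)], \quad j = 1, 2, \ldots, n, \tag{12}$$

where $e_j$ is the $j$th column of the $n \times n$ identity matrix, i.e., $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, etc.
Step 4. If $H(\varepsilon, z_i)^{-1}$ exists and $\langle \nabla f^0(z_i), H(\varepsilon, z_i)^{-1} \nabla f^0(z_i) \rangle > 0$, set $h(\varepsilon, z_i) = -H(\varepsilon, z_i)^{-1} \nabla f^0(z_i)$ and go to step 5; else, set $\varepsilon = \varepsilon/2$ and go to step 3.
Step 5. Compute a $\bar{\lambda}$ such that

$$(1 - \alpha)\,\bar{\lambda}\,\langle \nabla f^0(z_i), h(\varepsilon, z_i) \rangle \leq \theta(\bar{\lambda}, z_i, h(\varepsilon, z_i)) \leq \alpha\,\bar{\lambda}\,\langle \nabla f^0(z_i), h(\varepsilon, z_i) \rangle, \tag{13}$$

where $\theta(\cdot, \cdot, \cdot)$ is defined as in (6). Set $\bar{\lambda} = 1$ if possible.

Comment. Use an adaptation of algorithm (1.33).
Step 6. If $\theta(\bar{\lambda}, z_i, h(\varepsilon, z_i)) \leq -\alpha'\varepsilon$, set $z_{i+1} = z_i + \bar{\lambda} h(\varepsilon, z_i)$, set $i = i + 1$, and go to step 2; else, set $\varepsilon = \varepsilon/2$ and go to step 3.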
14 Exercise. Use theorem (1.3.27) to show that if $\{z_i\}$ is a sequence constructed by algorithm (10), then either $\{z_i\}$ is finite, i.e., the algorithm jams up at a point $z_k$, cycling between steps 3 and 6 while continuing to divide $\varepsilon$ by 2, and $\nabla f^0(z_k) = 0$, or else $\{z_i\}$ is infinite and converges to a point $\hat{z}$ such that $\nabla f^0(\hat{z}) = 0$, i.e., $\hat{z}$ minimizes $f^0(z)$ over $z \in \mathbb{R}^n$ (see (1.3.65)).

We now present an algorithm for unconstrained minimization which does not require any derivative evaluations and which is not an obvious adaptation of an algorithm that does require derivatives. This algorithm is called the method of local variations in the Russian literature, and seems to have been known in one form or another for quite a long time. In particular, it is quite obviously related to the Gauss-Seidel algorithm (see [E2]). Recently, it has been described in [B3] and [C3]. It is particularly effective when the function $f^0(\cdot)$ is of the form

$$f^0(z) = \sum_{i=1}^{n} f_i^0(z^i). \tag{15}$$
* To cause the algorithm to stop at stationary points and to guarantee superlinear convergence, replace the instruction in step 2 by "If $\nabla f^0(z_i) = 0$, stop; else, set $\varepsilon = \min\{\varepsilon_0, \|\nabla f^0(z_i)\|\}$ and go to step 3". See Cohen [C4].
We shall need the following notation: For $i = 1, 2, \ldots, n$, let $e_i$ be the $i$th column of the $n \times n$ identity matrix, i.e., $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, etc., and let $d_1 = e_1$, $d_2 = -e_1$, $d_3 = e_2$, $d_4 = -e_2$, ..., $d_{2n-1} = e_n$, $d_{2n} = -e_n$.
Method of local variations.
16 Algorithm (method of local variations, Banitchouk et al. [BZ]). Step 0. Select a z, E R" such that the set
is bounded. Select a po > 0. Step 1. Set i = 0, set z = zo and computefo(z). Step 2. Set p = p i . Step 3. Set j = 1 . Step 4. Computefo(z pdj). Step 5 . If f o ( z pdJ 0 such that 7
$$f^0(z_i + \lambda_i h_i) = \min\{f^0(z_i + \lambda h_i) \mid \lambda \geq 0\}.$$

Step 4. Set $z_{i+1} = z_i + \lambda_i h_i$, set $i = i + 1$, and go to step 1.

Note that the minimization operation required in step 3 makes all conjugate gradient methods purely conceptual. Normally, some form of approximation must be introduced in the computation of $\lambda_i$. The various approximations (such as result from polynomial expansions, or the use of the Golden section search (1.14)) which are currently used in practice upset the relationships which ensure the convergence of the conceptual form of the algorithm. As a result, in practice, one is forced to reinitialize the algorithms every so many iterations. We shall return to this problem later. Before we give a detailed description of a few conjugate gradient methods, which differ only in the specific manner set up for calculating vectors $h_i \in F(z_i)$ in step 2 of (6), let us establish one sufficient condition for algorithm (6) to be convergent.
8 Theorem. Consider problem (3), with the assumptions stated, and algorithm (6). Suppose that in step 2 of (6), the vector $h_i$ is always chosen so that for a fixed $\rho > 0$,

$$-\langle \nabla f^0(z_i), h_i \rangle \geq \rho\, \|\nabla f^0(z_i)\|\, \|h_i\|. \tag{9}$$
* Note that since $m > 0$, it follows from (B.2.8) that the set $\{z \mid f^0(z) \leq f^0(z_0)\}$ is compact. Hence, since the Hessian $H(\cdot)$ is continuous, the bound $M$ always exists, and consequently its postulation is redundant.
Then, either algorithm (6) constructs a finite sequence $\{z_i\}$, whose last element, $z_k$, minimizes $f^0(z)$ over $z \in \mathbb{R}^n$, or else the algorithm constructs an infinite sequence $\{z_i\}$ which converges to a point $\hat{z}$ which minimizes $f^0(z)$ over $z \in \mathbb{R}^n$.

Proof. The first part of the theorem is trivial, since the algorithm stops at a point $z_k$ if and only if $\nabla f^0(z_k) = 0$, and, by assumption, $f^0(\cdot)$ is strictly convex, which ensures that such a point solves (3). To prove the second part of the theorem, we shall show that the maps $c(\cdot) = f^0(\cdot)$ and $A(\cdot)$, defined as below, satisfy assumptions (i) and (ii) of theorem (1.3.10).* Thus, for every $z \in \mathbb{R}^n$, let $A(z)$ be defined by

$$A(z) = \{y = z + \lambda(z, h)h \mid h \text{ satisfies (9)},\ \lambda(z, h) \text{ is determined by (7)}\}. \tag{10}$$
(We use (7) and (9) in (10) with the subscripts on $z$ and $h$ deleted, of course.) That assumption (i) is satisfied is clear. Hence we only need to establish that assumption (ii) of (1.3.10) is satisfied by the maps $f^0(\cdot)$ and $A(\cdot)$. Now, since $f^0(\cdot)$ is twice continuously differentiable, we have, by the Taylor expansion formula (B.1.12), that

$$f^0(z + \lambda h) - f^0(z) = \lambda \langle \nabla f^0(z), h \rangle + \lambda^2 \int_0^1 (1 - t) \langle h, H(z + t\lambda h)h \rangle\, dt, \tag{11}$$

where $H(z) = \partial^2 f^0(z)/\partial z^2$. Since, by assumption, (4) is satisfied and (9) holds for every $h$ such that $(z + \lambda h) \in A(z)$ (for some $\lambda > 0$), we must have, for such an $h$ and any $\lambda \geq 0$,†

$$f^0(z + \lambda h) - f^0(z) \leq -\lambda\rho\, \|\nabla f^0(z)\|\, \|h\| + \tfrac{1}{2}\lambda^2 M \|h\|^2. \tag{12}$$
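Now suppose that $\nabla f^0(z) \neq 0$. Then, for all $y \in A(z)$, we must have (by minimizing the right-hand side of (12) over $\lambda$, which gives $\lambda = \rho \|\nabla f^0(z)\| / (M \|h\|)$) that

$$f^0(y) - f^0(z) \leq -\frac{\rho^2}{2M}\, \|\nabla f^0(z)\|^2. \tag{13}$$

Since $\nabla f^0(\cdot)$ is continuous, there must exist an $\varepsilon(z) > 0$ such that for all $z'$ such that $\|z' - z\| \leq \varepsilon(z)$,

$$\|\nabla f^0(z')\|^2 \geq \tfrac{1}{2}\, \|\nabla f^0(z)\|^2. \tag{14}$$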
Next, since (13) must also be true for all $y' \in A(z')$, with $y'$ and $z'$ taking the place of $y$ and $z$, respectively, we conclude that for all $z' \in \{z' \mid \|z - z'\| \leq \varepsilon(z)\}$ and for all $y' \in A(z')$,

* For this purpose we set $T = \mathbb{R}^n$ and we define $z \in \mathbb{R}^n$ to be desirable if $\nabla f^0(z) = 0$.
† We assume, of course, that both $z$ and $z + \lambda h$ are in the set $\{z \mid f^0(z) \leq f^0(z_0)\}$, where $z_0$ is some given starting point.
0 such that
$$f^0(z_i + \lambda_i h_i) = \min\{f^0(z_i + \lambda h_i) \mid \lambda \geq 0\}. \tag{52}$$
Step 3. Set 53
~ i += l Z
( g i gt> 9
9
7
+
set $i = i + 1$, and go to step 2.

Note that the above algorithm differs from the Fletcher-Reeves algorithm only in the formula for $\gamma_i$ (compare (55) and (46)). Note also that when $f^0(z) = \langle d, z \rangle + \tfrac{1}{2}\langle z, Hz \rangle$, where $H$ is an $n \times n$ symmetric, positive definite matrix, the $\gamma_i$ as given by (46) and the $\gamma_i$ as given by (55) are identical according to (38), and hence in the case of such an $f^0(\cdot)$, the two methods become identical. Now let us see what happens in the general case of problem (3) under the assumptions stated.
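The claim that the two rules coincide on quadratics can be checked numerically. Since formulas (46) and (55) are not legible in this copy, the sketch below assumes their standard forms, $\gamma_i = \langle g_{i+1}, g_{i+1} \rangle / \langle g_i, g_i \rangle$ (Fletcher-Reeves) and $\gamma_i = \langle g_{i+1} - g_i, g_{i+1} \rangle / \langle g_i, g_i \rangle$ (Polak-Ribière), with $g_i = -\nabla f^0(z_i)$ and $h_{i+1} = g_{i+1} + \gamma_i h_i$. The function names and the $3 \times 3$ test problem are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

def conjugate_gradient(H, d, z, rule, iters):
    """Minimize <d,z> + 0.5<z,Hz> with exact line searches.

    rule is "FR" or "PR" (assumed standard forms of (46) and (55)).
    Returns the final point and the gamma values that were used.
    """
    g = [-(di + hi) for di, hi in zip(d, matvec(H, z))]    # g = -grad f(z)
    h = list(g)
    gammas = []
    for _ in range(iters):
        Hh = matvec(H, h)
        lam = dot(g, h) / dot(h, Hh)                        # exact minimizer along h
        z = [zi + lam * hi for zi, hi in zip(z, h)]
        g_new = [gi - lam * v for gi, v in zip(g, Hh)]      # -grad f at the new point
        if rule == "FR":
            gamma = dot(g_new, g_new) / dot(g, g)
        else:
            diff = [a - b for a, b in zip(g_new, g)]
            gamma = dot(diff, g_new) / dot(g, g)
        gammas.append(gamma)
        h = [gn + gamma * hi for gn, hi in zip(g_new, h)]
        g = g_new
    return z, gammas

# Hypothetical symmetric positive definite test problem with n = 3.
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
d = [-1.0, -2.0, -3.0]
z_fr, gam_fr = conjugate_gradient(H, d, [0.0, 0.0, 0.0], "FR", 3)
z_pr, gam_pr = conjugate_gradient(H, d, [0.0, 0.0, 0.0], "PR", 3)
residual = [di + v for di, v in zip(d, matvec(H, z_pr))]
```

With exact line searches on a quadratic, successive gradients are orthogonal, so the extra term $\langle g_i, g_{i+1} \rangle$ in the Polak-Ribière numerator vanishes and both runs reach the minimizer in $n = 3$ steps.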
56 Theorem. Consider problem (3) under the assumptions stated. If $z_0, z_1, z_2, \ldots$ and $h_0, h_1, h_2, \ldots$ are sequences constructed by algorithm (51), then there exists a $\rho > 0$ such that

$$-\langle \nabla f^0(z_i), h_i \rangle \geq \rho\, \|\nabla f^0(z_i)\|\, \|h_i\|. \tag{57}$$

Proof. For every $z \in \mathbb{R}^n$, let $H(z) = \partial^2 f^0(z)/\partial z^2$ and let $g(z) = -\nabla f^0(z)$. Then, making use of the Taylor formula (B.1.3), we obtain
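$$-g_{i+1} = -g(z_{i+1}) = -g(z_i + \lambda_i h_i) = -g_i + \lambda_i \int_0^1 H(z_i + t\lambda_i h_i)\, dt\; h_i. \tag{58}$$

Since by construction, $\langle h_i, g_{i+1} \rangle = 0$, we get from (58),

$$\lambda_i = \frac{\langle g_i, h_i \rangle}{\langle h_i, H_i h_i \rangle} = \frac{\langle g_i, g_i \rangle}{\langle h_i, H_i h_i \rangle}, \tag{59}$$

where

$$H_i = \int_0^1 H(z_i + t\lambda_i h_i)\, dt. \tag{60}$$

(Note that the second half of (59) is obtained from the fact that $\langle h_{i-1}, g_i \rangle = 0$ and that $h_i = g_i + \gamma_{i-1} h_{i-1}$.) Now, from (55) and (58), together with (59), we obtain that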
61
But from (4) and (60) we deduce that for all $h \in \mathbb{R}^n$, $m \|h\|^2 \leq \langle h, H_i h \rangle \leq M \|h\|^2$, and hence we must have
I/ hi+l II II gi+1II
+ yihi ,gi+J
=
( g t + l gi+d,
i
=
0, 1,2,... .
1, 2,...,
-
ll &+I 112 I/ hi+] /I II gi+1II
II gi+1 I1 --
/I hi+l II
8 which is the desired result. The following corollary is a direct consequence of theorems (8) and (56):
67 Corollary. Consider problem (3) under the assumptions stated. Then either algorithm (51) constructs a finite sequence $\{z_i\}$ whose last element, $z_k$, satisfies $\nabla f^0(z_k) = 0$, or else algorithm (51) constructs an infinite sequence $\{z_i\}$ which converges to a point $\hat{z}$ such that $\nabla f^0(\hat{z}) = 0$ (i.e., either $z_k$ or $\hat{z}$ minimizes $f^0(z)$ over all $z \in \mathbb{R}^n$).

Since it is impossible to calculate $\lambda_i$ exactly according to (52) (or (43)), there is an accumulation of errors in an actual computation which can affect the convergence properties of algorithms (42) and (51). Usually, to avoid such an accumulation of errors, these algorithms are reinitialized after each $k \geq n + 1$ iterations. In algorithms (42) and (51), reinitialization after $k$ iterations amounts to setting $h_{ik} = g_{ik}$, for $i = 0, 1, 2, \ldots$, rather than computing $h_{ik}$ according to (46) or according to (55). For a discussion of the rate of convergence of such conjugate gradient methods "with reinitialization," see Section 6.3. From a computational point of view, the algorithms (42) and (51) involve approximately the same amount of work per iteration. However, so far, a convergent implementation has been obtained only for (51); see (C.4.1) and
~41.
We conclude this section with the Fletcher-Powell version [F3] of the Davidon variable metric algorithm [D2]. The original description of this method was published by Davidon [D2] in 1959 and it included a number of empirical devices. Fletcher and Powell [F3] presented a simpler and more basic description in 1963. Their description assumes that all calculations are worked out exactly, i.e., their version is a purely conceptual algorithm. As with the other conjugate gradient methods, this may lead to a deleterious accumulation of errors in an actual computation, in which case it is dealt with by restarting the algorithm every so many iterations. The variable metric algorithm is very popular and has good computational stability properties. Its only drawback is that at each iteration the computer is required to store an $n \times n$ matrix $H_i$, which may cause difficulties on a small machine when $n \geq 100$. In discrete optimal control applications, $n$ is often as large as 1000, and in this case one would be more inclined to use one of the two preceding methods.

68 Algorithm (variable metric; Davidon [D2], Fletcher and Powell [F3]).
Step 0. Select a $z_0 \in \mathbb{R}^n$. If $\nabla f^0(z_0) = 0$, stop; else, go to step 1.
Step 1. Set $i = 0$, set $H_0 = I$ (the $n \times n$ identity matrix),* and set $g_0 = \nabla f^0(z_0)$.
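Comment. Note that both $g_i$ and $H_i$ are not defined in the same manner here as they are in (54) and (60), respectively. Note that $h_i$ is also defined in a manner different from the one of algorithms (42) and (51).
Step 2. Set

$$h_i = -H_i g_i. \tag{69}$$

Step 3. Compute $\lambda_i > 0$ such that

$$f^0(z_i + \lambda_i h_i) = \min\{f^0(z_i + \lambda h_i) \mid \lambda \geq 0\}. \tag{70}$$

Step 4. Compute $\nabla f^0(z_i + \lambda_i h_i)$.
Step 5. If $\nabla f^0(z_i + \lambda_i h_i) = 0$, stop; else, set

$$z_{i+1} = z_i + \lambda_i h_i, \tag{71}$$

$$g_{i+1} = \nabla f^0(z_{i+1}), \tag{72}$$

$$\Delta g_i = g_{i+1} - g_i, \tag{73}$$

$$\Delta z_i = z_{i+1} - z_i, \tag{74}$$

75    …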
* The choice $H_0 = I$ is not mandatory. We may choose $H_0$ to be any symmetric positive definite matrix.
and go to step 6.*
Step 6. Set $i = i + 1$ and go to step 2.

It has been shown by Meyer [M7] that when it is applied to a quadratic function $f^0(z) = \langle d, z \rangle + \tfrac{1}{2}\langle z, Hz \rangle$, the variable metric method constructs exactly the same sequences of vectors $z_i$ and $h_i$ as the methods (42) and (51) (see [M7]). Hence, this is a method of the same type as the two preceding ones. In order to show that algorithm (68) is of the type (6), we only need to show that the matrices $H_i$ are positive definite.
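The behavior on quadratics can be illustrated with a short Python sketch. The update formula (75) is not legible in this copy, so the code assumes the standard Davidon-Fletcher-Powell update, $H_{i+1} = H_i + \Delta z_i \Delta z_i^T / \langle \Delta z_i, \Delta g_i \rangle - (H_i \Delta g_i)(H_i \Delta g_i)^T / \langle \Delta g_i, H_i \Delta g_i \rangle$. The function names, the use of `Q` for the matrix of the quadratic (to avoid a clash with the metric $H_i$), and the $2 \times 2$ example are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

def dfp_update(Hi, dz, dg):
    """Standard DFP update, assumed here to be the content of (75)."""
    Hdg = matvec(Hi, dg)
    a, b = dot(dz, dg), dot(dg, Hdg)
    n = len(dz)
    return [[Hi[r][c] + dz[r] * dz[c] / a - Hdg[r] * Hdg[c] / b
             for c in range(n)] for r in range(n)]

def variable_metric_quadratic(Q, d, z, iters):
    """Steps 2-6 of algorithm (68) for f(z) = <d,z> + 0.5<z,Qz>."""
    n = len(z)
    Hm = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # H_0 = I
    g = [di + qi for di, qi in zip(d, matvec(Q, z))]       # g_0 = grad f(z_0)
    for _ in range(iters):
        h = [-v for v in matvec(Hm, g)]                    # (69)
        Qh = matvec(Q, h)
        lam = -dot(g, h) / dot(h, Qh)                      # exact line minimum
        z_new = [zi + lam * hi for zi, hi in zip(z, h)]    # (71)
        g_new = [gi + lam * qi for gi, qi in zip(g, Qh)]   # (72)
        dg = [a - b for a, b in zip(g_new, g)]             # (73)
        dz = [a - b for a, b in zip(z_new, z)]             # (74)
        Hm = dfp_update(Hm, dz, dg)
        z, g = z_new, g_new
    return z, Hm

Q = [[3.0, 1.0], [1.0, 2.0]]
d = [-1.0, -1.0]
z_min, H_final = variable_metric_quadratic(Q, d, [0.0, 0.0], iters=2)
```

In this example two iterations reach the minimizer $(0.2, 0.4)$, and the final matrix agrees with $Q^{-1}$, which illustrates why the method is called a variable metric method.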
76 Theorem (Fletcher-Powell [F3]). For i = 0, 1,2, ..., the matrices Hi constructed by algorithm (68) are symmetric and positive definite. Proof. For i = 0, Hi = I, a symmetric, positive definite matrix. By (75), is symmetric if Hi is symmetric; hence, we only need to prove that the matrices Hi are positive definite. We give a proof by induction. Suppose > 0). Then, for any nonzero vector z E R", that Hi is positive definite (Hi 77 Since Hi > 0, H:/' is a well-defined positive definite matrix. Now let p and let q = --H:/'Ag,. Then (77) becomes
= H;I2z
78 Applying the Schwartz inequality, we now obtain < p , p ) ( q ,q ) ( p , q ) 2 , and hence,
= IIp
1 ' II q 112 3
79
80
Hence, we conclude that
* For $x, y$ in $\mathbb{R}^n$ we denote by $x \rangle\langle y$ the $n \times n$ matrix $xy^T$, i.e., the $ij$th component of $x \rangle\langle y$ is $x^i y^j$. Note that $(x \rangle\langle y)z = x\langle y, z \rangle$.

… $> 0$, then $H_{i+1} > 0$, and since $H_0 > 0$, we must have $H_i > 0$ for $i = 0, 1, 2, \ldots$.

At the time the variable metric method was introduced, it was shown that it solved the problem $\min\{\langle z, d \rangle + \tfrac{1}{2}\langle z, Hz \rangle \mid z \in \mathbb{R}^n\}$ (where $H$ is symmetric and positive definite) in no more than $n$ iterations. Subsequently, as we have already pointed out, it was shown by Meyer [M7] that, when applied to the minimization of a quadratic form, the Fletcher-Reeves and the Davidon-Fletcher-Powell methods produce identical sequences of vectors $z_i$, $h_i$. However, in the ten years since its invention, no proof of convergence of the variable metric method for a more general case has been published. Quite recently, Powell has obtained both a proof of convergence and a rate of convergence for the variable metric method when applied to the minimization of strictly convex functions. Powell was kind enough to supply the author with a still unpublished manuscript [P9] and we shall now reproduce some of these new results. Powell's results on the rate of convergence will be presented in Section 6.4.
+
84 Theorem (Powell). Consider problem (3). If, in addition to the assumptions already stated on fo(.),there exists a Lipschitz constant L > 0 such that, for all z E {z Ifo(z) ( gi+1 Hi &+l> 2
+ -< gi
9
1 Hi gi>
Making use of (73), (74), (75) and the fact that ( g i + l ,d z i ) that
Next, substituting gi
( 4K,g , )
= 0, we
obtain
+ d g , for gi+l in ( 1 12), and making use of the fact that
= (gi+l - g i , H a i ) = - 0 such that 1) gi+l/I2' 2 v for all t h e j 1 in K. To show that the sequence ( gj+l, Higj+l) contains a subsequence which converges to zero, we examine the trace of the matrix Hi+1. From (107),
+
+
118
Solving the difference equation (1 18), we now obtain
i = O , 1 , 2 ,....
Because the matrix Hi+1is positive definite, Tr(Hi+l) > 0. Hence, because of lemma (89), there must exist a number w' > 0 such that 120
Now, by the Schwarz inequality,
and since Hi is positive definite, (1 14) gives
(4,Wkj>> Wg,+1)*
122
Y
Making use of these facts and (120), we obtain 123
Therefore, at least two-thirds of the terms in the sum (123) must satisfy
for otherwise the left side of the inequality (123) would be larger than the right side. Now, lemma (101) established that ELo// dgi (I2 < co and hence, for i = 0, 1, 2,..., at least two-thirds of the elements in {(gj+l,Hjgj+J};=,, belong to a subsequence which converges to zero. We now examine the numerators in (117). Continuing to assume that the sequence ( g , } does not converge to zero, we conclude that the monotonically decreasing sequence f o ( z i ) converges to f o >fo(2), and hence that 125
fO(Zi)
Setting u 126
= mu', we
-f"5)
> u' > 0,
i = O , 1 , 2 , 3 ,....
now obtain from (97) that i = O , 1 , 2 , 3 ,....
xi"=,
Now, since by lemma (IOl), 11 dgj < co, we must have (1 d g j [I2 -+0, as j -+ co, and hence there exists an integer k such that 3w' 11 dgj [I2 < v/3w for all j 2 k. Setting i = 3k $. 2 in (117), we find that 127
since by the preceding discussion, at least (k in (127) satisfy
Hence, from (1 17), for i 3 3k
+ 1) of the 3(k + 1) terms
+ 2,
129
We shall now show that this inequality leads to a contradiction. The trace of a symmetric matrix is the sum of its eigenvalues, and therefore the trace of the positive definite matrix Gi+l is an upper bound on its largest eigenvalue which is the inverse of the smallest eigenvalue of Hi+1.Thus, let pi+l be the smallest eigenvalue of Hi+1; then, because of (129), 130
However, since Hi+1is a positive definite matrix, we must also have 131
2 pi+1II gz+1I?, 7
which is contradicted by (130). We therefore conclude that gi + 0 as i -+ co, which completes our proof. Computational Aspects
To implement an algorithm such as (42), (51) or (68), Fletcher and Powell compute the step size $\lambda_i$ as follows (see (52) and (70)): First they compute $\theta(\lambda; z_i) = f^0(z_i + \lambda h_i) - f^0(z_i)$ and $\theta_\lambda(\lambda; z_i) = \langle \nabla f^0(z_i + \lambda h_i), h_i \rangle$ for some fixed value of $\lambda$ (say for $\lambda = 1$). Then they use the four values $\theta(0; z_i)$, $\theta(\lambda; z_i)$, $\theta_\lambda(0; z_i)$ and $\theta_\lambda(\lambda; z_i)$ to construct an interpolating cubic polynomial $p(\cdot)$ to the function $\theta(\cdot; z_i)$, and they compute a $\lambda'$ which minimizes $p(\mu)$ for $\mu \geq 0$. If $\theta(\lambda'; z_i) < 0$, they set $\lambda_i = \lambda'$; otherwise they set $\lambda = \beta\lambda$ ($\beta \in (0, 1)$) and repeat. With this type of approach to the implementation of the algorithms (42), (51) and (68), the Fletcher-Reeves (42) and Polak-Ribière (51) algorithms converge very slowly, while the variable metric algorithm (68) converges quite well. To restore the convergence rate of algorithms (42) and (51) (with $\lambda_i$ computed as above), one has to reinitialize these algorithms after every $n + 1$ iterations (see (6.3.4), (6.3.60)). Algorithm (1.16) suggests the following alternative implementation of algorithms (42), (51) and (68):
132 Algorithm (Polak). Step 0. Select an e0 > 0, an 01 > 0, a /3 E (0, 1) and a p > 0; select a = 0. Step 1. Set E = c0 . Step 2. Compute Vfo(zi). Step 3. If Vfo(zi)= 0, stop; else, compute hi (as in (42), (51) or (68),
zo E UP;and set i
but always with the same rule). Step 4. Set @) = f o ( z i phi) - f o ( z i ) and use algorithm (1.14) with the current value of E to compute ,IT. Step 5. If O(p) < -me, set X i = ,IT and go to step 6; else, set E = B E and go to step 4. Step 6. Set zi+l = zi hihi,set i = i 1, and go to step 3. The author has some experimental indications that when algorithm (51) (and presumably also (42)) is implemented as above, there is no need to reinitialize the algorithm as in (6.3.4). However, one probably does spend more time on function evaluations when using the implementation (132). This concludes the first part of our discussion of conjugate gradient methods. We shall discuss their rate of convergence in Sections 6.3 and 6.4, and give additional implementations in Section C.4.
2.4 Unconstrained Discrete Optimal Control Problems
As we have already seen in Chapter 1, discrete optimal control problems can be viewed as nonlinear programming problems with a special structure and, usually, large dimensions. Therefore, all the algorithms we have discussed so far in this chapter are, at least in principle, applicable to unconstrained discrete optimal control problems. In this section we shall discuss the use of algorithms requiring gradient evaluations for solving discrete optimal control problems. We recall that such algorithms require us to calculate the gradient of the cost function at each iteration, a task which can become prohibitive when the dimension of the space over which we are trying to minimize the cost function is very large. Fortunately, as we shall soon see, the structure of the optimal control problem comes to our aid and enables us to substitute a number of “low-dimensional” operations for one “high-dimensional’’ operation.
Most frequently, unconstrained discrete optimal control problems arise when penalty functions, to be discussed in the next chapter, are used to cope with constraints on the states and controls of the dynamical system. These unconstrained problems are usually encountered in the following form:
$$\text{minimize} \quad \sum_{i=0}^{k-1} f_i^0(x_i, u_i) + \varphi(x_k), \qquad x_i \in \mathbb{R}^\nu,\ u_i \in \mathbb{R}^\mu, \tag{1}$$
subject to
$$x_{i+1} - x_i = f_i(x_i, u_i), \quad i = 0, 1, \ldots, k - 1, \quad \text{with } x_0 = \hat{x}_0, \tag{2}$$
where the functions $f_i^0(\cdot, \cdot)$, $f_i(\cdot, \cdot)$ and $\varphi(\cdot)$ are continuously differentiable in all their arguments. Since the initial state $x_0 = \hat{x}_0$ is given, the states $x_i$ are uniquely determined through (2) by the control sequence $z = (u_0, u_1, \ldots, u_{k-1})$, which we shall treat as a column vector in our equations to follow. Thus, we may write $x_i = x_i(z)$, and understand by this that $x_i(z)$ is computed by solving the system (2) with initial state $x_0 = \hat{x}_0$ and input sequence $z = (u_0, u_1, \ldots, u_{k-1})$ up to time $i$. Because of this, we see that problem (1) is of the form
$$\min\{f^0(z) \mid z \in \mathbb{R}^n\}, \tag{3}$$
where $z = (u_0, u_1, \ldots, u_{k-1}) \in \mathbb{R}^{k\mu}$, so that $n = k\mu$, and
$$f^0(z) = \sum_{i=0}^{k-1} f_i^0(x_i(z), u_i) + \varphi(x_k(z)). \tag{4}$$
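Since $k$ may be quite large (say 1000), $n$ in (1) may be even larger, which makes a brute force computation of $\nabla f^0(z)$ highly cumbersome, at best. We shall now develop a method for computing $\nabla f^0(z)$ which at no time requires us to handle vectors of dimension any higher than $\nu$, which is usually orders of magnitude smaller than $n$. We begin by noting that*
$$\frac{\partial f^0(z)}{\partial z} = \left(\frac{\partial f^0(z)}{\partial u_0}, \frac{\partial f^0(z)}{\partial u_1}, \ldots, \frac{\partial f^0(z)}{\partial u_{k-1}}\right), \tag{5}$$
and that for $i = 0, 1, 2, \ldots, k - 1$,
$$\frac{\partial f^0(z)}{\partial u_i} = \frac{\partial f_i^0(x_i, u_i)}{\partial u_i} + \sum_{j=i+1}^{k-1} \frac{\partial f_j^0(x_j, u_j)}{\partial x_j}\frac{\partial x_j(z)}{\partial u_i} + \frac{\partial \varphi(x_k)}{\partial x_k}\frac{\partial x_k(z)}{\partial u_i}. \tag{6}$$
* Note that we always treat $\nabla f^0(z)$ as a column vector. However, $\partial f^0(z)/\partial z$ is a $1 \times n$ Jacobian matrix, i.e., a row vector.

Now, for $i = 0, 1, \ldots, k$ and $i \le j \le k$, let $\Phi_{j,i}$ be a $\nu \times \nu$ matrix which is determined by solving the matrix difference equation,
$$\Phi_{j+1,i} - \Phi_{j,i} = \frac{\partial f_j(x_j(z), u_j)}{\partial x_j}\Phi_{j,i}, \quad j = i, i + 1, \ldots, k - 1, \tag{7}$$
with the initial condition $\Phi_{i,i} = I$, the $\nu \times \nu$ identity matrix.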
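8 Exercise. Show that
$$\frac{\partial x_j(z)}{\partial u_i} = \Phi_{j,i+1}\frac{\partial f_i(x_i(z), u_i)}{\partial u_i} \quad \text{for } j = i + 1, i + 2, \ldots, k, \qquad \frac{\partial x_j(z)}{\partial u_i} = 0 \quad \text{for } j = 1, 2, \ldots, i. \tag{9}$$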
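Substituting from (9) into (6), we obtain
$$\frac{\partial f^0(z)}{\partial u_i} = \frac{\partial f_i^0(x_i, u_i)}{\partial u_i} + \left[\sum_{j=i+1}^{k-1} \frac{\partial f_j^0(x_j, u_j)}{\partial x_j}\Phi_{j,i+1} + \frac{\partial \varphi(x_k)}{\partial x_k}\Phi_{k,i+1}\right]\frac{\partial f_i(x_i, u_i)}{\partial u_i}. \tag{10}$$
Since we are in the habit of working with column vectors whenever that is possible, we transpose all the terms in (10) to obtain
$$\left(\frac{\partial f^0(z)}{\partial u_i}\right)^T = \left(\frac{\partial f_i^0(x_i, u_i)}{\partial u_i}\right)^T + \left(\frac{\partial f_i(x_i, u_i)}{\partial u_i}\right)^T\left[\sum_{j=i+1}^{k-1} \Phi_{j,i+1}^T\left(\frac{\partial f_j^0(x_j, u_j)}{\partial x_j}\right)^T + \Phi_{k,i+1}^T\left(\frac{\partial \varphi(x_k)}{\partial x_k}\right)^T\right]. \tag{11}$$
Now, for $i = 1, 2, \ldots, k$, let $p_i$ be the solution of (compare (1.2.25))
$$p_i - p_{i+1} = \left(\frac{\partial f_i(x_i, u_i)}{\partial x_i}\right)^T p_{i+1} - \left(\frac{\partial f_i^0(x_i, u_i)}{\partial x_i}\right)^T, \quad i = 1, 2, \ldots, k - 1, \tag{12}$$
with
$$p_k = -\left(\frac{\partial \varphi(x_k)}{\partial x_k}\right)^T. \tag{13}$$
Then
$$p_i = -\sum_{j=i}^{k-1} \Phi_{j,i}^T\left(\frac{\partial f_j^0(x_j, u_j)}{\partial x_j}\right)^T - \Phi_{k,i}^T\left(\frac{\partial \varphi(x_k)}{\partial x_k}\right)^T, \quad i = 1, 2, \ldots, k,$$
and hence, by inspection of (11), we get
$$\left(\frac{\partial f^0(z)}{\partial u_i}\right)^T = \left(\frac{\partial f_i^0(x_i, u_i)}{\partial u_i}\right)^T - \left(\frac{\partial f_i(x_i, u_i)}{\partial u_i}\right)^T p_{i+1}, \quad i = 0, 1, \ldots, k - 1. \tag{14}$$
Thus, to compute $\nabla f^0(z)$, we first solve (12) to obtain the vectors $p_1, p_2, \ldots, p_k$ and then use (14) to complete the evaluation of $\nabla f^0(z)$.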
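The two-sweep procedure just described (a forward solution of (2), then a backward solution of the adjoint equation) can be illustrated with a scalar example. The sketch below assumes the sign convention $p_k = -\partial\varphi(x_k)/\partial x_k$, which matches the costate convention used later for the continuous problem. The test dynamics, the cost functions and all function names are hypothetical choices made for illustration only.

```python
import math


def simulate(u, x0, f):
    """States x_0..x_k from x_{i+1} = x_i + f(x_i, u_i)."""
    x = [x0]
    for ui in u:
        x.append(x[-1] + f(x[-1], ui))
    return x


def cost(u, x0, f, f0, phi):
    """Sum of the stage costs f0(x_i, u_i) plus the terminal cost phi(x_k)."""
    x = simulate(u, x0, f)
    return sum(f0(xi, ui) for xi, ui in zip(x, u)) + phi(x[-1])


def adjoint_gradient(u, x0, f, fx, fu, f0x, f0u, dphi):
    """Gradient of the cost with respect to (u_0, ..., u_{k-1}), scalar case."""
    k = len(u)
    x = simulate(u, x0, f)
    p = [0.0] * (k + 1)              # p[0] is never used
    p[k] = -dphi(x[k])               # terminal condition
    for i in range(k - 1, 0, -1):    # backward sweep, i = k-1, ..., 1
        p[i] = p[i + 1] + fx(x[i], u[i]) * p[i + 1] - f0x(x[i], u[i])
    return [f0u(x[i], u[i]) - fu(x[i], u[i]) * p[i + 1] for i in range(k)]


def finite_difference_gradient(u, x0, f, f0, phi, eps=1e-6):
    """Central differences, used only to check the adjoint computation."""
    grad = []
    for i in range(len(u)):
        up, dn = list(u), list(u)
        up[i] += eps
        dn[i] -= eps
        grad.append((cost(up, x0, f, f0, phi) - cost(dn, x0, f, f0, phi)) / (2 * eps))
    return grad


# Hypothetical test problem (not from the text).
f = lambda x, u: 0.1 * math.sin(x) + 0.5 * u
fx = lambda x, u: 0.1 * math.cos(x)
fu = lambda x, u: 0.5
f0 = lambda x, u: 0.5 * (x * x + u * u)
f0x = lambda x, u: x
f0u = lambda x, u: u
phi = lambda x: 0.5 * x * x
dphi = lambda x: x
```

The adjoint sweep costs about as much as one extra simulation, whereas the finite-difference check needs two simulations per control component, which is the saving the text alludes to when $k$ is large.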
15 Exercise. Incorporate the procedure described above for calculating $\nabla f^0(z)$ into the algorithms (1.16), (1.19), (1.35), (3.42), (3.51) and (3.68). Would you consider using a nondiagonal matrix $D(z)$ in (1.16) and (1.19) when these algorithms are to be used for solving problem (1)? Can you foresee any difficulties with rapid access storage requirements when using (3.68) to solve problem (1) on a digital computer?

It is important to note that the formulas developed above for calculating $\nabla f^0(z)$ could also have been developed by expanding $f^0(z + \delta z) - f^0(z)$ to first-order terms about $z$ and then reading off $\nabla f^0(z)$ by inspection of the result. To carry out this computation we would proceed as follows:
$$f^0(z + \delta z) - f^0(z) = \sum_{i=0}^{k-1}\left[\frac{\partial f_i^0(x_i, u_i)}{\partial x_i}\delta x_i + \frac{\partial f_i^0(x_i, u_i)}{\partial u_i}\delta u_i\right] + \frac{\partial \varphi(x_k)}{\partial x_k}\delta x_k, \tag{16}$$
where, for $i = 0, 1, 2, \ldots, k$, $\delta x_i$ is calculated by solving
$$\delta x_{i+1} - \delta x_i = \frac{\partial f_i(x_i, u_i)}{\partial x_i}\delta x_i + \frac{\partial f_i(x_i, u_i)}{\partial u_i}\delta u_i, \quad i = 0, 1, \ldots, k - 1, \quad \text{with } \delta x_0 = 0, \tag{17}$$
i.e., we linearize both (1) and (2) about the nominal values $(x_0, x_1, \ldots, x_k)$ and $(u_0, u_1, \ldots, u_{k-1})$.
18 Exercise. Make use of (16), (17) together with (12), (13), and the fact that $f^0(z + \delta z) - f^0(z) = \langle \nabla f^0(z), \delta z \rangle$ to first-order terms (where $\delta z = (\delta u_0, \delta u_1, \ldots, \delta u_{k-1})$) to obtain (14).

Pursuing this idea of obtaining derivatives of $f^0(\cdot)$ by suitable expansions of (1) and (2), we might try to obtain formulas for use in the quasi-Newton-Raphson method (1.42) (or the version discussed in Section 2) by expanding (1) and (2) to second-order terms. However, this does not result in any obvious simplifications, because of the nonlinear relationship between the $\delta x_i$ and the $\delta u_i$. Because of this, algorithms of the form (1.16) or (1.19)
have been introduced in which the direction vector h(z) is computed by solving the problem*
* Probably the easiest way to justify all the terms in (19) is as follows: Set y = ( x , u), with x E (wy, u E (w", and treat it as a column vector. Then, to second-order terms, A
B
C
Identifying the four submatrices in (C), we obtain that
D
and (19) now follows by inspection. (Note that a2fp(y)/axfaui= azfi"(y)/aujax' .)
subject to
i=O,1,2
,..., k - 1 ,
with 6 x o = 0 .
Although such algorithms have been found to have practical value, there are no theoretical results available to support a claim that they are always convergent. We shall present an algorithm for solving (19) in the next section, where we shall also develop formulas which considerably simplify the application of quasi-Newton-Raphson methods to problem (1). Before proceeding any further, however, the reader should consolidate the above discussion by working the exercise below.

21 Exercise. Write out an algorithm of the form (1.19) for solving problem (1) in which the vector $h(z)$ is calculated by solving (19) and then setting $h(z) = (\delta u_0, \delta u_1, \ldots, \delta u_{k-1})$, with $z = (u_0, u_1, \ldots, u_{k-1})$, as was the case in the rest of this section.
2.5 Unconstrained Continuous Optimal Control Problems
We shall now show how some of the algorithms, discussed in Sections 1 and 3, can be applied to solving optimal control problems of the form
$$\text{minimize} \quad \int_{t_0}^{t_f} f^0(x(t), u(t), t)\, dt + \varphi(x(t_f)), \tag{1}$$
subject to
$$\frac{d}{dt}x(t) = f(x(t), u(t), t), \quad t \in [t_0, t_f], \quad x(t_0) = \hat{x}_0, \tag{2}$$
where $f^0 : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^1$, $f : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^\nu$, and $\varphi : \mathbb{R}^\nu \to \mathbb{R}^1$. We shall assume that the functions $f^0(\cdot, \cdot, \cdot)$ and $f(\cdot, \cdot, \cdot)$ are continuously differentiable in $x$ and in $u$ and that they are piecewise continuous in $t$. We shall also assume that for every $x \in \mathbb{R}^\nu$ and for every $u \in \mathbb{R}^\mu$, the elements of the matrix-valued functions $\partial f^0(x, u, \cdot)/\partial x$, $\partial f^0(x, u, \cdot)/\partial u$, $\partial f(x, u, \cdot)/\partial x$, and $\partial f(x, u, \cdot)/\partial u$ are piecewise continuous. Finally, we shall assume that $\varphi(\cdot)$ is a continuously differentiable function. In the problem (1), (2), the times $t_0$ and $t_f$ are assumed to be given. However, the methods we shall develop also apply to problems in which
the terminal time $t_f$ is free, but finite, because free-time problems can be transcribed into fixed-time problems as follows:

3 Exercise. Consider the free-time optimal control problem,
$$\text{minimize} \quad \int_{t_0}^{t_f} f^0(x(t), u(t), t)\, dt \tag{4}$$
subject to
$$\frac{d}{dt}x(t) = f(x(t), u(t), t), \quad t \in [t_0, t_f], \quad x(t_0) = \hat{x}_0, \tag{5}$$
where $f^0$, $f$ are as in (1) and (2) and $t_0$ is specified, but not $t_f$. Next, consider the fixed-time optimal control problem,
$$\text{minimize} \quad \int_0^1 f^0(\tilde{x}(s), \tilde{u}(s), \tilde{t}(s))\, \tilde{v}(s)^2\, ds \tag{6}$$
subject to
$$\frac{d}{ds}\tilde{x}(s) = \tilde{v}(s)^2 f(\tilde{x}(s), \tilde{u}(s), \tilde{t}(s)), \quad s \in [0, 1], \quad \tilde{x}(0) = \hat{x}_0, \tag{7}$$
$$\frac{d}{ds}\tilde{t}(s) = \tilde{v}(s)^2, \quad s \in [0, 1], \quad \tilde{t}(0) = t_0, \tag{8}$$
where the functions $f^0$ and $f$ are the same as in (4) and (5). Note that in the problem (6)-(8), the dynamical system is described by (7) and (8) and that its state at time $s$ is $(\tilde{x}(s), \tilde{t}(s))$, with $\tilde{t}(\cdot)$ real-valued, while its input at time $s$ is $(\tilde{u}(s), \tilde{v}(s))$, with $\tilde{v}(\cdot)$ real-valued. Thus, the dimension both of the state vector and of the control vector of (6)-(8) is larger (by 1) than the corresponding dimension in problem (4), (5). Show that if $(\tilde{u}^*(\cdot), \tilde{v}^*(\cdot))$ is an optimal control for (6)-(8) and $(\tilde{x}^*(\cdot), \tilde{t}^*(\cdot))$ is the corresponding optimal trajectory, then $\hat{t}_f$, $\hat{u}(\cdot)$ and $\hat{x}(\cdot)$, defined by
$$\hat{t}_f = t_0 + \int_0^1 \tilde{v}^*(s)^2\, ds, \tag{9}$$
$$\hat{u}\left(t_0 + \int_0^s \tilde{v}^*(\sigma)^2\, d\sigma\right) = \tilde{u}^*(s), \tag{10}$$
$$\hat{x}\left(t_0 + \int_0^s \tilde{v}^*(\sigma)^2\, d\sigma\right) = \tilde{x}^*(s), \tag{11}$$
are, respectively, an optimal time, an optimal control, and the corresponding optimal trajectory for the problem (4), (5).
For the problem (1), (2) to be well-defined, we must at least require that the controls $u(\cdot)$ be measurable functions. We shall make a somewhat stronger assumption; we shall suppose that the functions $u(\cdot)$ in (1), (2) belong to $L_\infty^\mu[t_0, t_f]$, the space of equivalence classes of measurable functions from $[t_0, t_f]$ into $\mathbb{R}^\mu$ which have the property that
$$\operatorname*{ess\,sup}_{t \in [t_0, t_f]} \|u(t)\| < \infty. \tag{12}$$
The space $L_\infty^\mu[t_0, t_f]$ is a normed linear space. We shall denote the norm of $u(\cdot) \in L_\infty^\mu[t_0, t_f]$ by $\|u\|_\infty$; we recall that
$$\|u\|_\infty = \operatorname*{ess\,sup}_{t \in [t_0, t_f]} \|u(t)\|. \tag{13}$$
We can also introduce a scalar product on $L_\infty^\mu[t_0, t_f]$, which we define as
$$\langle u_1, u_2 \rangle_2 = \int_{t_0}^{t_f} \langle u_1(t), u_2(t) \rangle\, dt. \tag{14}$$
(Note that this is the standard scalar product used in the Hilbert space $L_2^\mu[t_0, t_f]$.)

Let $x(t; u)$ denote the solution* of (2) at time $t$ from the given initial state $\hat{x}_0$ at $t = t_0$ and corresponding to the input $u(\cdot)$ (i.e., $x(t_0; u) = \hat{x}_0$). Then, if we define the function $F^0 : L_\infty^\mu[t_0, t_f] \to \mathbb{R}^1$ by†
$$F^0(u) = \int_{t_0}^{t_f} f^0(x(t; u), u(t), t)\, dt + \varphi(x(t_f; u)), \tag{15}$$
we see that problem (1), (2) is equivalent to the problem (compare (1.1)),
$$\min\{F^0(u) \mid u \in L_\infty^\mu[t_0, t_f]\}. \tag{16}$$
Before we can proceed any further, we must define the gradient of $F^0(\cdot)$. Suppose that $\operatorname{grad} F^0(\cdot)(\cdot)$ is a map from $L_\infty^\mu[t_0, t_f] \times [t_0, t_f]$ into $\mathbb{R}^\mu$ such that for every $u \in L_\infty^\mu[t_0, t_f]$, $\operatorname{grad} F^0(u) \in L_\infty^\mu[t_0, t_f]$, and‡
$$\lim_{\|v\|_\infty \to 0} \frac{F^0(u + v) - F^0(u) - \langle \operatorname{grad} F^0(u), v \rangle_2}{\|v\|_\infty} = 0. \tag{17}$$
* We assume that a solution exists. It then follows from the assumptions stated that the solution must be unique.
† In the calculations to follow, it is convenient to consider $x(\cdot\,; u)$ to be an element of the space of continuous functions from $[t_0, t_f]$ into $\mathbb{R}^\nu$, with the sup norm.
‡ For our purposes, this simplified approach to gradients is adequate. In general, gradients must be discussed in terms of differentials. For a proof of the existence of $\operatorname{grad} F^0(u)$, see [D3], theorem 10.7.3.
Then we say that $\operatorname{grad} F^0(\cdot)(\cdot)$ is the gradient of $F^0(\cdot)$. Note that if $\operatorname{grad} F^0(u)(\cdot) \ne 0$, then, because of (17), there must exist an $s' > 0$ such that for all $s \in (0, s']$,
$$F^0(u - s \operatorname{grad} F^0(u)) - F^0(u) < 0, \tag{18}$$
i.e., whenever $\operatorname{grad} F^0(u)(\cdot) \ne 0$, it defines a direction of descent (decrease), just as in the finite dimensional case, and it is not difficult to see that this is also the direction of steepest descent. It is also easy to see that, because of (18), if $\hat{u}(\cdot)$ minimizes $F^0(u)$ over $u \in L_\infty^\mu[t_0, t_f]$, then we must have $\operatorname{grad} F^0(\hat{u}) = 0$, exactly as in the finite dimensional case.

We shall now obtain formulas for computing $\operatorname{grad} F^0(u)(\cdot)$. We could proceed in exactly the same manner as in Section 4. However, to demonstrate an alternative approach, we shall follow the procedure suggested in exercise (4.18). Thus, we shall compute $F^0(u + \delta u) - F^0(u)$ to first-order terms, formally, by linearizing all the functions in (1) and (2) about a given control function $u(\cdot)$ and its corresponding trajectory $x(\cdot)$, and we shall then obtain $\operatorname{grad} F^0(u)(\cdot)$ by inspection, since to first-order terms,
$$F^0(u + \delta u) - F^0(u) = \langle \operatorname{grad} F^0(u), \delta u \rangle_2. \tag{19}$$
Linearizing all the functions in (1) and (2) about the nominal control u(.) and its corresponding trajectory x(., u), we obtain 20 FO(u
+ 6u) dt
+
where Sx(t), t E [to , ti], is computed by solving the linearized differential equation, 21
for t E [to , ti], with the initial condition 8x(to) = 0. Now, for s, t E [to , tf], let @(t,s ) be a v x v matrix determined by the differential equation, @(s, s) = z.
22
where {is the v x v identity matrix. Then we see that
23
and hence (20) becomes
Interchanging the order of integration in the first term in (24), we obtain,
withp(tf)T = -aV(x(tf ; u))/ax. Then it is easy to show that for any s E [ t o , ti], 27
p(~)' = - a9)(x(tf ;
ax
', qtf, + js afo(x(t; 1ax
u(t), ') @(ly ') dt.
tf
Comparing (27) with (24) and (25), we conclude that 28
F0(u
+ SU) - F0(u)
to first-order terms. Finally, comparing (28) and (19), we conclude that for t E [to , te], grad Fo(u)(*)is defined by
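We shall now show how some of the algorithms described in Sections 1 and 3 can be applied to problem (1).

30 Algorithm (gradient method, compare (1.19)). Let $\alpha \in (0, \tfrac{1}{2})$ be given.
Step 0. Select a $\overset{0}{u}(\cdot) \in L_\infty^\mu[t_0, t_f]$.
Step 1. Set $i = 0$.
Comment. To facilitate comparisons in subsequent sections, we write $\overset{i}{u}$, $\overset{i}{x}$, $\overset{i}{p}$, instead of $u_i$, $x_i$, $p_i$.
Step 2. Compute $\overset{i}{x}(t)$, $t \in [t_0, t_f]$, by solving (2) with $u(t) = \overset{i}{u}(t)$, $t \in [t_0, t_f]$.
Step 3. Compute $\overset{i}{p}(t)$, $t \in [t_0, t_f]$, by solving (26) with $u(t) = \overset{i}{u}(t)$ and $x(t) = \overset{i}{x}(t)$, $t \in [t_0, t_f]$.
Step 4. Compute, for $t \in [t_0, t_f]$, $\operatorname{grad} F^0(\overset{i}{u})(t)$ according to (29), with $u = \overset{i}{u}$, $x = \overset{i}{x}$ and $p = \overset{i}{p}$. (31)
Step 5. For $t \in [t_0, t_f]$, set $h(\overset{i}{u})(t) = -\operatorname{grad} F^0(\overset{i}{u})(t)$. If $h(\overset{i}{u})(\cdot) = 0$, stop; else, go to step 6.
Step 6. Compute a $\lambda_i > 0$ such that
$$-\lambda_i(1 - \alpha)\|\operatorname{grad} F^0(\overset{i}{u})\|_2^2 \le \theta(\lambda_i; \overset{i}{u}) \le -\lambda_i\alpha\|\operatorname{grad} F^0(\overset{i}{u})\|_2^2, \tag{32}$$
where
$$\theta(\lambda_i; \overset{i}{u}) = F^0(\overset{i}{u} + \lambda_i h(\overset{i}{u})) - F^0(\overset{i}{u}). \tag{33}$$
Comment. A $\lambda_i$ satisfying (32) can be computed by an obvious adaptation of algorithm (1.33), since we assume that $\min\{F^0(u) \mid u \in L_\infty^\mu[t_0, t_f]\}$ is finite.
Step 7. For $t \in [t_0, t_f]$, set $\overset{i+1}{u}(t) = \overset{i}{u}(t) + \lambda_i h(\overset{i}{u})(t)$, set $i = i + 1$, and go to step 2.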
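To make the structure of algorithm (30) concrete, here is a minimal discretized sketch in Python for a scalar linear-quadratic test problem, $\dot{x} = ax + bu$ with cost $\tfrac{1}{2}\int (r x^2 + q u^2)\,dt$ and no terminal cost. Everything specific to the sketch is an assumption rather than part of the text: the Euler time grid, the function names, the stopping tolerance that replaces the exact test in step 5, and the bracketing search in `step_size`, which merely stands in for algorithm (1.33).

```python
def simulate(u, x0, a, b, dt):
    """Euler integration of dx/dt = a*x + b*u from x(t0) = x0."""
    x = [x0]
    for uj in u:
        x.append(x[-1] + dt * (a * x[-1] + b * uj))
    return x


def cost(u, x0, a, b, r, q, dt):
    """Rectangle-rule value of the integral of (r*x^2 + q*u^2)/2."""
    x = simulate(u, x0, a, b, dt)
    return sum(0.5 * dt * (r * xj * xj + q * uj * uj) for xj, uj in zip(x, u))


def gradient(u, x0, a, b, r, q, dt):
    """Samples of grad F0(u)(t) = -b*p(t) + q*u(t) on the time grid."""
    x = simulate(u, x0, a, b, dt)
    n = len(u)
    p = [0.0] * (n + 1)                  # p(tf) = 0 because there is no terminal cost
    for j in range(n - 1, 0, -1):        # costate swept backwards in time
        p[j] = p[j + 1] + dt * (a * p[j + 1] - r * x[j])
    return [q * u[j] - b * p[j + 1] for j in range(n)]


def step_size(theta, g2, alpha, lam=1.0, max_iter=60):
    """Find lam with -lam*(1-alpha)*g2 <= theta(lam) <= -lam*alpha*g2."""
    lo, hi = 0.0, None
    for _ in range(max_iter):
        t = theta(lam)
        if t > -lam * alpha * g2:            # step too long
            hi = lam
        elif t < -lam * (1.0 - alpha) * g2:  # step too short
            lo = lam
        else:
            return lam
        lam = 2.0 * lam if hi is None else 0.5 * (lo + hi)
    raise RuntimeError("no step satisfying both inequalities was found")


def gradient_method(u, x0, a, b, r, q, dt, alpha=0.25, tol=1e-10, max_iter=2000):
    for _ in range(max_iter):
        g = gradient(u, x0, a, b, r, q, dt)
        g2 = dt * sum(gj * gj for gj in g)   # squared L2 norm of the gradient
        if g2 <= tol:                        # tolerance instead of an exact zero test
            break
        base = cost(u, x0, a, b, r, q, dt)
        theta = lambda lam: cost([uj - lam * gj for uj, gj in zip(u, g)],
                                 x0, a, b, r, q, dt) - base
        lam = step_size(theta, g2, alpha)
        u = [uj - lam * gj for uj, gj in zip(u, g)]
    return u
```

The costate recursion is chosen so that `gradient` returns the exact gradient of the discretized cost with respect to the discretized $L_2$ scalar product, which keeps the two inequalities in the step-size test consistent with the discrete problem.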
34 Exercise. Show that either the sequence of controls $\overset{i}{u}(\cdot)$ constructed by algorithm (30) is finite, terminating at $\overset{k}{u}(\cdot)$, and $\operatorname{grad} F^0(\overset{k}{u})(\cdot) = 0$, or else it is infinite and then every accumulation point $\hat{u}(\cdot)$ of $\{\overset{i}{u}(\cdot)\}$ satisfies $\operatorname{grad} F^0(\hat{u})(\cdot) = 0$. [Hint: Adapt the proof of theorem (1.22).]

35 Exercise. Suppose that problem (1) has the particular form
$$\text{minimize} \quad \frac{1}{2}\int_{t_0}^{t_f} \left[\langle x(t), R(t)x(t) \rangle + \langle u(t), Q(t)u(t) \rangle\right] dt, \tag{36}$$
subject to
$$\frac{d}{dt}x(t) = A(t)x(t) + B(t)u(t), \quad t \in [t_0, t_f], \quad x(t_0) = \hat{x}_0, \tag{37}$$
where $x(t) \in \mathbb{R}^\nu$, $u(t) \in \mathbb{R}^\mu$, $R(t)$, $Q(t)$ are symmetric positive definite matrices for all $t \in [t_0, t_f]$, and the matrix-valued maps $R(\cdot)$, $Q(\cdot)$, $A(\cdot)$, $B(\cdot)$ are all continuous. Develop an algorithm of the form (1.16) for this problem. Justify the use of your algorithm in solving (36), (37) and show that the algorithm will produce a sequence of controls that actually converges to the optimal control for this problem by showing that the conclusions indicated in (1.18) hold for this case also.

Problem (36), (37) can also be solved by conjugate gradient algorithms. We illustrate this by writing out an adaptation of the Fletcher-Reeves algorithm (3.42).
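38 Algorithm (Fletcher-Reeves for (36), (37)).
Step 0. Select a $\overset{0}{u}(\cdot) \in L_\infty^\mu[t_0, t_f]$.
Step 1. Compute $\overset{0}{x}(t)$ for $t \in [t_0, t_f]$ by solving (37) with $u(\cdot) = \overset{0}{u}(\cdot)$.
Step 2. Compute $\overset{0}{p}(t)$ for $t \in [t_0, t_f]$ by solving
$$\frac{d}{dt}\overset{0}{p}(t)^T = -\overset{0}{p}(t)^T A(t) + \overset{0}{x}(t)^T R(t), \quad \text{with } \overset{0}{p}(t_f)^T = 0. \tag{39}$$
Step 3. Compute
$$\operatorname{grad} F^0(\overset{0}{u})(t) = -B(t)^T\overset{0}{p}(t) + Q(t)\overset{0}{u}(t), \quad t \in [t_0, t_f]. \tag{40}$$
Step 4. For $t \in [t_0, t_f]$, set $\overset{0}{g}(t) = \overset{0}{h}(t) = -\operatorname{grad} F^0(\overset{0}{u})(t)$.
Comment. The algorithm is now initialized.
Step 5. If $\overset{0}{g}(\cdot) = 0$, stop; else, set $i = 0$ and go to step 6.
Step 6. Compute a $\lambda_i > 0$ such that
$$F^0(\overset{i}{u} + \lambda_i\overset{i}{h}) = \min\{F^0(\overset{i}{u} + \lambda\overset{i}{h}) \mid \lambda \ge 0\}. \tag{41}$$
Step 7. For $t \in [t_0, t_f]$, set $\overset{i+1}{u}(t) = \overset{i}{u}(t) + \lambda_i\overset{i}{h}(t)$.
Step 8. Compute $\overset{i+1}{x}(t)$ for $t \in [t_0, t_f]$ by solving (37) with $u(\cdot) = \overset{i+1}{u}(\cdot)$.
Step 9. Compute $\overset{i+1}{p}(t)$ for $t \in [t_0, t_f]$ by solving
$$\frac{d}{dt}\overset{i+1}{p}(t)^T = -\overset{i+1}{p}(t)^T A(t) + \overset{i+1}{x}(t)^T R(t), \quad \text{with } \overset{i+1}{p}(t_f)^T = 0. \tag{42}$$
Step 10. Compute
$$\operatorname{grad} F^0(\overset{i+1}{u})(t) = -B(t)^T\overset{i+1}{p}(t) + Q(t)\overset{i+1}{u}(t), \quad t \in [t_0, t_f]. \tag{43}$$
Step 11. If $\operatorname{grad} F^0(\overset{i+1}{u})(\cdot) = 0$, stop; else, set
$$\overset{i+1}{g}(t) = -\operatorname{grad} F^0(\overset{i+1}{u})(t), \quad t \in [t_0, t_f], \tag{44}$$
$$\overset{i+1}{h}(t) = \overset{i+1}{g}(t) + \gamma_i\overset{i}{h}(t), \quad t \in [t_0, t_f], \quad \text{with } \gamma_i = \|\overset{i+1}{g}\|_2^2 / \|\overset{i}{g}\|_2^2, \tag{45}$$
set $i = i + 1$, and go to step 6.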
46 Exercise. Obtain formulas which will enable you to apply the Fletcher-Reeves algorithm to the general unconstrained optimal control problem (1), (2).

47 Remark. To apply the Polak-Ribière algorithm (3.51) to the problem (36), (37), we only need to change the formula for $\gamma_i$ in (45) to
$$\gamma_i = \frac{1}{\|\overset{i}{g}\|_2^2}\int_{t_0}^{t_f} \langle \overset{i+1}{g}(t) - \overset{i}{g}(t), \overset{i+1}{g}(t) \rangle\, dt. \tag{48}$$
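Algorithm (38) and the modification in remark (47) can be tried out on a discretized scalar instance of (36), (37). The sketch below is an illustration under assumptions that are not part of the text: constant scalar coefficients `a`, `b`, `r`, `q`, an Euler time grid, a stopping tolerance in place of the exact zero tests, and all function names. Because the dynamics are linear, the cost along a search line is exactly quadratic in $\lambda$, so the minimizing $\lambda_i$ of step 6 is obtained in closed form from the cost of the direction $h$ with zero initial state.

```python
def simulate(u, x0, a, b, dt):
    """Euler integration of dx/dt = a*x + b*u from x(t0) = x0."""
    x = [x0]
    for uj in u:
        x.append(x[-1] + dt * (a * x[-1] + b * uj))
    return x


def cost(u, x0, a, b, r, q, dt):
    """Rectangle-rule value of the integral of (r*x^2 + q*u^2)/2."""
    x = simulate(u, x0, a, b, dt)
    return sum(0.5 * dt * (r * xj * xj + q * uj * uj) for xj, uj in zip(x, u))


def gradient(u, x0, a, b, r, q, dt):
    """Samples of -b*p(t) + q*u(t), with the costate p swept backwards."""
    x = simulate(u, x0, a, b, dt)
    n = len(u)
    p = [0.0] * (n + 1)                  # p(tf) = 0
    for j in range(n - 1, 0, -1):
        p[j] = p[j + 1] + dt * (a * p[j + 1] - r * x[j])
    return [q * u[j] - b * p[j + 1] for j in range(n)]


def conjugate_gradient(u, x0, a, b, r, q, dt, polak_ribiere=False,
                       tol=1e-12, max_iter=200):
    inner = lambda v, w: dt * sum(vj * wj for vj, wj in zip(v, w))
    g = [-gj for gj in gradient(u, x0, a, b, r, q, dt)]   # g = -grad F0(u)
    h = list(g)
    for _ in range(max_iter):
        g2 = inner(g, g)
        if g2 <= tol:
            break
        # Exact line minimization: F0(u + lam*h) - F0(u) = -lam*<g,h> + lam^2 * cost(h, 0).
        lam = inner(g, h) / (2.0 * cost(h, 0.0, a, b, r, q, dt))
        u = [uj + lam * hj for uj, hj in zip(u, h)]
        g_new = [-gj for gj in gradient(u, x0, a, b, r, q, dt)]
        if polak_ribiere:
            diff = [gn - go for gn, go in zip(g_new, g)]
            gamma = inner(diff, g_new) / g2          # formula of the remark
        else:
            gamma = inner(g_new, g_new) / g2         # Fletcher-Reeves formula
        h = [gn + gamma * hj for gn, hj in zip(g_new, h)]
        g = g_new
    return u
```

On a quadratic problem with exact line minimization the two choices of $\gamma_i$ coincide in exact arithmetic, so the flag only matters through rounding here; the difference between the two formulas shows up on non-quadratic problems.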
EQUALITY CONSTRAINTS: ROOT A N D BOUNDARY-VALUE PROBLEMS
3.1 Zeros of a Function and Problems with Equality Constraints in $\mathbb{R}^n$
We shall now consider briefly the problem of finding a vector $z \in \mathbb{R}^n$ such that $g(z) = 0$, where $g : \mathbb{R}^n \to \mathbb{R}^m$ is a continuously differentiable function. Obviously, we can convert the problem of finding the roots of the equation
$$g(z) = 0 \tag{1}$$
into the form
$$\min\{f^0(z) \triangleq \tfrac{1}{2}\|g(z)\|^2 \mid z \in \mathbb{R}^n\}, \tag{2}$$
and then apply any one of the minimization algorithms discussed in Section 2.1. When the function $f^0(\cdot)$ is defined as in (2), the "direction vector" $h(z)$, which appears in the algorithms discussed in Section 2.1, is given by the formula
$$h(z) = -D(z)\nabla f^0(z) = -D(z)\left(\frac{\partial g(z)}{\partial z}\right)^T g(z), \tag{3}$$
where $\partial g(z)/\partial z$ is an $m \times n$ matrix whose $ij$th element is $\partial g^i(z)/\partial z^j$. Now, all the algorithms we have discussed stop at points $z'$ such that $h(z') = 0$. Since by assumption, the matrix $D(z)$ is positive definite for all $z \in \{z \mid f^0(z)$
+
= ((1 AiT)Ps,i+l X i > = <Ws*i>
i = O , l , ..., k - 1 .
Rewriting (42), we obtain
We now see that (38) can be written in the alternative form
44
s = 1,2)...,v-a,
= Ws,o 7
since ps,i = Or,iPls, i = 0, 1,2,..., k. So far, we have merely rederived (38) by an alternative approach which consists of the following operations: (i) solve (40) v - a times, with the initial conditionsp,,, = P kT , s ,s = 1, 2,..., v - a;(ii) solve (43) v - a times, with the initial conditions w , , ~= ykS,s = 1,2,..., v - a; (iii) construct the system of equations (44).The labor in this is the same as in the previous derivation. Abramov’s device consists of not using the vectors p s , i to transfer the boundary conditions from time k to time 0, but the vectors qSai= ps,i/llps,i1 . The advantage of doing this lies in the fact that while all the elements of the vectors p s , o may be extremely small compared to the nonzero elements of the matrix Po (causing severe ill-conditioning in the system (37), (44)), the vectors qs,ocertainly do not have this undesirable property. We now show that the vectors qrSican indeed be used to construct a normalized system of equations to replace the possibly ill-conditioned system (44). First, note that 1
45 q s,z
-
I/ ps,i I/
1
ps*i =
ll(1
+ Air)
PS.i+l
I1
(1
+ Air)
PS,i+l
Next, we set 46
i = O , 1 , 2 ,...,k,
3.2
BOUNDARY-VALUE PROBLEMS AND DISCRETE OPTIMAL CONTROL
91
Rearranging terms, we now get
We therefore see that we can substitute for ,the set of equations (44) the set of equations 49
= Es.0
9
s = 1,2,..., v-a,
which is obtained by first solving the difference equation (45) v - a times, with initial conditions q8.k = Pl,s (where Pk.8 is the sth row of the matrix P, in (34)), and then by solving (48) v - a times, with initial conditions Usiis.k = yks/(Pk,sP~,s)1'2. Thus, to obtain an X o which satisfies (34) and which is such that the corresponding x k , computed by solving (33) from this xo , also satisfies (34), we may solve the system of equations (37), (49). Although in the system of equations (37), (49) we may find that the elements of the qs,oare not particularIy small or large as compared to the elements of the matrix Po,we may still be faced with an ill-conditioning effect produced by the fact that the vectors qs,omay have become close to being parallel even if the vectors qs,kwere orthogonal (this is particularly likely when k is large), , = 1 even if ( q i , k , q j , J = 0, i # j . At i.e., we may find that ( q i s 0 qi,o) present there seem to be no general methods for eliminating all causes of ill-conditioning effects in boundary-value problems with difference equations. As we shall see later, one can do somewhat better in the case of differential equations.
50 Exercise. Modify the quasi-linearization version of the Newton-Raphson algorithm (31) so as to make it applicable for solving (1) with boundary conditions of the form go(x,) = 0, gk(Xk) = 0, where go : 5P -+ R" and gk : [wy -+ LWu are continuously differentiable and the Jacobian matrices ago(z)/az, & k ( Z ) / a Z are of full rank for all z in a sufficiently large open set m in Rv.
92
3
R O O T A N D BOUNDARY-VALUE PROBLEMS
51 Exercise. Obtain a quasi-linearization version of the algorithm (2.1.42)
in a form applicable for solving boundary-value problems with difference I equations. 52 Exercise. Show how Abramov's method described above can also be used
to reduce ill-conditioning effects in the Goodman-Lance version of the Newton-Raphson method (18). [Hint: Set S X , = (8xk(d)/&)(?? - and show that Sx, is the solution at time k of the variational difference equation
i)
i
53 axi+l - ax, = -ax,,
= 0,
i
axi
j+l
i
1,..., k - 1, with Sx,, = z - z,
which must satisfy the terminal boundary condition 54
PI, s x ,
=
i
-PI,x,(z) - y,
.
Now apply Abramov's method to obtain a well-conditioned set of equations m to replace (lo)]. We now turn to the discrete optimal control problem, k-1
55
C fiO(xi , ui) + y(xk),
minimize
xi G
[wy,
ui E R",
i=O
subject to 56 57
Xi+l
xo
- xi
= $0
= f i ( x i , ui), g,(x,)
7
i = 0, 1, 2,..., k
- 1,
= 0,
where we shall assume all the functions A.o(.,.),A(.,.), p(.) and gk(.) (gk: IWY -+ UP) to be twice continuously differentiable. We include here the case gk(.) = 0, i.e., the unconstrained optimization problem considered in the previous section. When g k ( . )# 0, we shall assume that a < v and that the Jacobian matrix ag,(x)/& has full rank for all x in a sufficiently large open set in Ry. is an We recall (see (1.2.24)) that for the problem ( 5 9 , if ti,, zi, ,..., optimal control sequence and i0, ,..., 4, is the corresponding optimal trajectory, then there exist multiplier vectors ,j 2,...,$, , 73 such that 58 59
4i+l
-
ai
tii) = 0, gI,(&) = 0,
i = 0, 1,..., k - 1,
61 62
-
pi+l
= 0,
i = 0, 1 ,..., k - 1.
63 Remark. In writing out (58)-(62), we have assumed that problem (55) is "nondegenerate," i.e., that the multiplier $p^0$ stipulated in (1.2.24) can be assumed to be $-1$. When the problem is degenerate, one must proceed in a manner analogous to the one explained in Section 1 for the problem (1.16).

In a number of important cases, it is possible to eliminate the $\hat{u}_i$, $i = 0, 1, 2, \ldots, k - 1$, from the system of equations (58)-(62), by means of (62), to obtain a smaller system of equations involving the $\hat{x}_i$ and the $\hat{p}_i$ only. We shall consider these special cases later. For the time being, let us assume that there are no obvious simplifications to the set of equations that can be made, i.e., let us consider the general case first. In an attempt to solve the system (58)-(62) by means of the Newton-Raphson method, we are faced with two obvious choices in defining the vector $z$. The first choice is to set
$$z = (x_1, x_2, \ldots, x_k, p_1, p_2, \ldots, p_k, \pi, u_0, u_1, \ldots, u_{k-1}). \tag{64}$$
The second choice is to set
$$z = (u_0, u_1, \ldots, u_{k-1}, p_1, p_2, \ldots, p_k, \pi), \tag{65}$$
and to eliminate the $x_i$ from (58)-(62) by means of (58), deleting (58) from the system of equations to be solved in the process. We shall develop formulas for use in the Newton-Raphson method based on the definition (64). The reader will be given the opportunity to explore the possibilities presented by definition (65). Note that the specific algorithm we shall obtain is of the quasi-linearization type.

With $z$ defined as in (64), the function $g(\cdot)$ in (3) is defined by (58)-(62). From (64), $g(\cdot)$ has $k\nu + k\nu + \alpha + k\mu = k(2\nu + \mu) + \alpha$ arguments, while from (58)-(62), it has $k\nu + \alpha + (k - 1)\nu + \nu + k\mu = k(2\nu + \mu) + \alpha$ components, and hence it satisfies the requirement of the Newton-Raphson method that it map $z$ into the space of which it is an element (i.e., $g : \mathbb{R}^n \to \mathbb{R}^n$, with $n = k(2\nu + \mu) + \alpha$). Next, we turn to equation (8), for which we must obtain an expression in this particular case. Because of the great complexity of the system (58)-(62), it is probably easiest to obtain the detailed expressions for (8) by setting $\delta\overset{j}{z} = \overset{j+1}{z} - \overset{j}{z}$ and computing (8) in the rearranged form,
’t’ i
66
which can be done by expanding the equations (58)-(62) to first-order terms in about the point Performing this task, we obtain from (58),
Si
i.
i=O,l, i
where Sx,
=0
..., k - 1 ,
and 3
j
i
i
i
5. = xi+l - xi - & ( x i , us),
68
i = 0, 1,..., k - 1.
Next, from (59), we obtain 69
and from (60), we obtain
i=l,2
,..., k - 1 ,
where (a%.O(x, u)/auax)‘ is a v x p matrix whose jth row is (a/au) [(af;ro(x,u ) / a ~ ) ~and ]j,
i = l , 2 ,..., k - 1 . From (61), we obtain
where 73
Finally, from (62), we obtain
3.2
-
BOUNDARY-VALUE PROBLEMS A N D DISCRETE OPTIMAL CONTROL
i
--Ti
i=O,1,
9
95
..., k - 1 ,
where
The system of equations (67)-(75) is a mixture of difference equations ((67) and (70)) and of algebraic equations. It presents us with two eventualities: either one can solve, uniquely, for &, the system (74), or not. in When (74) cannot be solved to obtain a unique expression for the terms of the S i i and the Sji,we are faced with a horrendously large system of equations to solve and our chances of success are not too good, since it is not at all clear as to how one can efficiently decompose this task into a sequence of simpler ones. However, when one can obtain a unique expression for the in terms of the S i i and the Sji from (74), the situation becomes considerably inproved, as we shall now show. Thus, suppose that (74) can be solved for the S& in terms of the and the S j i , and that this solution is unique. Then, upon substituting for j+l j j j+l into (67) and (70) and replacing SJi by xi - xi and Sp, by p i the we obtain a system of equations of the form,
Sii
Sti
Sii
d,
Sii
76 77
j+l
j+l
xi+l
-
j+l
pi
xi
j
j+l
pitl
-
j+l
= Aixi j j+l
=
Ci xi
+
j j+l
j -
+ Dipi+l j j+l
-
= 0,
1 ,..., k
1,
ui,
i
i wi ,
i = 1, 2,..., k - 1,
-
78 where (78) is obtained from (57), (69) and (72). In developing a special method for solving the boundary-value problem (76)-(78), we shall assume that all the inverses used do indeed exist. When they do not exist, the implied simplifications obviously cannot be utilized and one has to revert to solving this boundary-value problem either as a system of algebraic equations which is defined by (76)-(78) or by other i
* Do not confuse the Qj,
i
j
i
, wi ,ui ,pi, and qi appearing here with similarly denoted quantities used earlier in this section.
means which may appear to be expedient in a specific situation. In order to simplify notation, we shall drop the superscripts $j$ and $j + 1$ from all the symbols appearing in (76)-(78). Thus, in the worst case, we have to solve the system
-Bo 0 0 --B, 0 0
- I
-A, 0
79
0 0
*.-
...
0 --B2
0 ...
0
...
...
0 -Cl 0
-D1
r
0
0
o
-B,
...
0
0 0
0 0
-Bk-1
0 0
0 0 0
0 0 - 0
where for $i = 0, 1, \ldots, k - 1$, $\bar{A}_i = (I + A_i)$, and for $i = 1, 2, \ldots, k - 1$, $\bar{D}_i = (I + D_i)$. Since the matrix in (79) is sparse (i.e., it has many zeros), the task of solving (79) need not be hopeless. However, we shall assume that the problem is such that one can utilize its dynamical structure to simplify computations, as shown below. We begin by observing that from (77),
$$p_{k-1} = C_{k-1}x_{k-1} + \bar{D}_{k-1}p_k - w_{k-1}, \tag{80}$$
where $\bar{D}_{k-1} = (I + D_{k-1})$. If we treat $p_k$ as a known constant, then we see that
$$p_{k-1} = K_{k-1}x_{k-1} + q_{k-1}, \tag{81}$$
where we have simply defined $K_{k-1} = C_{k-1}$, $q_{k-1} = \bar{D}_{k-1}p_k - w_{k-1}$. We shall now show that for $j = 1, 2, \ldots, k$,
$$p_j = K_j x_j + q_j, \tag{82}$$
where the matrices $K_j$ and the vectors $q_j$ will be defined later. Thus, suppose that for $j = i + 1$,
$$p_{i+1} = K_{i+1}x_{i+1} + q_{i+1}. \tag{83}$$
Then, from (76), with $\bar{A}_i = (I + A_i)$ for $i = 0, 1, 2, \ldots, k - 1$, we obtain
$$x_{i+1} = \bar{A}_i x_i + B_i K_{i+1}x_{i+1} + B_i q_{i+1} - v_i. \tag{84}$$
Solving (84) for $x_{i+1}$, we now obtain
$$x_{i+1} = (I - B_i K_{i+1})^{-1}(\bar{A}_i x_i + B_i q_{i+1} - v_i). \tag{85}$$
Substituting for $p_{i+1}$ into (77) from (83) and (85), we finally obtain
$$p_i = C_i x_i + \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}(\bar{A}_i x_i + B_i q_{i+1} - v_i) + \bar{D}_i q_{i+1} - w_i, \tag{86}$$
where we define $\bar{D}_i = (I + D_i)$ for $i = 1, 2, \ldots, k - 1$. Collecting terms, we now get from (86) that
$$p_i = [C_i + \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}\bar{A}_i]x_i + [\bar{D}_i + \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}B_i]q_{i+1} - \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}v_i - w_i, \tag{87}$$
i.e., we find that (82) is true for $j = i$, with $K_i$, $q_i$ defined by
$$K_i = C_i + \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}\bar{A}_i, \quad i = 1, 2, \ldots, k - 1, \tag{88}$$
$$q_i = \bar{D}_i[I + K_{i+1}(I - B_i K_{i+1})^{-1}B_i]q_{i+1} - \bar{D}_i K_{i+1}(I - B_i K_{i+1})^{-1}v_i - w_i, \quad i = 1, 2, \ldots, k - 1. \tag{89}$$
Setting $K_{k-1} = C_{k-1}$ and $q_{k-1} = \bar{D}_{k-1}p_k - w_{k-1}$, we see from (80) that (82) holds for $j = k - 1$. It now follows by induction that (82) holds for $j = k, k - 1, \ldots, 1$, provided the inverses appearing in (86)-(89) exist, as we shall assume that they do. To complete our demonstration, we note that we are consistent if we set
$$K_k = 0, \qquad q_k = p_k. \tag{90}$$

91 Exercise. Show that there may exist valid boundary conditions for (88) and (89) other than those in (90). In particular, consider the choice $K_k = -\Phi_k$, $q_k = G_k^T\pi + \gamma_k$. [Hint: See (120).]

We shall now summarize the procedure for solving (76)-(78) which is implied by the above development.
92 Algorithm (solves system (76)-(78)).* Step 1. For i = k , k - 1, k - 2,..., 1, compute the K, by solving (88)
with the boundary condition in (90). i
*When Gr = 0, the calculations below can compute q i directly from (89), for 1, 2,..., k - 1, by setting qk = yr . (We do not know pr in advance.)
=
3
98
Step 2. For i
ROOT A N D BOUNDARY-VALUE PROBLEMS
1, 2,..., k - 1, compute
=
93
Li
94
= Di[I
It =
+ Kt+i(I - BiKi+l)-' Bi],
-DiKC+I(I - BiKi+i)-'
~i
- wi .
Comment. Equation (89), with its boundary conditions, may now be written in the more compact form, 95
qz. = L z.q i + l f li Step 3.
i = 1,
3
29.e.9
k - 1, qk
=Pk
For i = 0, 1, 2 ,..., k - 1, compute
96
Mi
97
Ni= ( I - BiKi+&-' Bi ,
98
mi
=
=
( I - BiKi+I)-' A t ,
-(I - BiKi+l)-l vi .
Comment. Equation (76), with its boundary conditions from (78), can now be written in the more compact form,
+
+
99
xi+i = Mixi Niqi+l mi i = 0,1,...,k - 1, Xo 9
=$0,
GkXk
-gk-
Step 4. For i = 1, 2,..., k , compute
&
100
= LiLi+l ''. Lk-1
Ek = I
9
101 j=k-1
Comment. The qi can now be expressed as follows: 102
qi
Step 5.
For i
=
= eiPk
f
i = l , 2 ,..., k .
9
1, 2,..., k, compute
103
104
Comment. With the above definitions, the solution of (99) is seen to be 105
Xk
=
Mkio f
k-1
k-1
i=l
i=l
1 fliqi +
fiimi-1 f Nk-lqk
+ mk-1
i.e., 106
xk
= Ekpk
+ dk
9
where Ek and dk are defined to be the corresponding matrix and vector in (105) (dk does not depend on pk). Step 6. Set 107
108 i=l
i=l
Comment. Applying the boundary conditions (78) to (106),we obtain
109
xk
+
= Ekpk
dk
= Ek(GkTT - @ k x k = (1
110
GkXk
=
+ yk) + dk
+ Ek@k)-'[Ek(GkTn.+ Y k ) + dkl,
Gk(I
+ Ek@k)-' EkGkTn + Gk(I - Ek@k)-l(Ekyk + 4)
- -gk*
Step 7. Set 111
?7
=
112
xk =
113
Pk
=
+ Gk(I + Ek@k)-l(EkYk + ( I + Ek@k)-l[Ek(GkTT+ + dkl, -[Gk(z
-
Ek@k)-' EkGkT]-'[gk
dk)],
Yk)
GkTT - @ k X k
+yk.
Step 8. For i = 1, 2,..., k, compute qi by solving (95), xi by solving (85) with x o = 9, , and pi from (82). It should be clear by now that the solution of the discrete optimal control problem (55)-(57) could be extremely time-consuming, at least in the general case, with k and v both large. However, there are a number of very important cases where considerable simplifications take place. 114 Example. Suppose that (55)-(57) have the specific form k-1
115
minimize
1 ho(xi)+ 4 11 ui /I2,
i=O
subject to
In this case, $\varphi(\cdot) = 0$ and $g_k(x_k) = x_k - \hat{x}_k$, and hence (61) becomes $p_k = \pi$, i.e., (61) carries no information and can be dropped. At the same time, (62) becomes (under the nondegeneracy assumption stated, i.e., $p^0 = -1$),
$$-\hat{u}_i + B^T \hat{p}_{i+1} = 0, \quad i = 0, 1, 2, \dots, k-1. \tag{117}$$
Substituting for $\hat{u}_i$ into (116) (and dropping the hats), we see that an optimal trajectory $\hat{x}_0, \hat{x}_1, \dots, \hat{x}_k$ must satisfy the system of equations,
$$x_{i+1} - x_i = f(x_i) + B B^T p_{i+1}, \quad i = 0, 1, \dots, k-1, \qquad x_0 = \hat{x}_0, \quad x_k = \hat{x}_k, \tag{118}$$
i = 1 , 2,..., k -
119
1,
with the optimal control sequence then determined by (117), where the $\hat{p}_i$ satisfy (118), (119) with $x_i = \hat{x}_i$, $i = 0, 1, \dots, k$. Obviously, (118), (119) is a much simpler system of equations to solve than (58)-(62), which was encountered in the general case.

120 Exercise. Obtain the formulas for solving (118), (119) by means of the Newton-Raphson method.

121 Exercise. Consider the quadratic control problem,
$$\text{minimize} \quad \frac{1}{2} \sum_{i=1}^{k} \langle x_i, R_i x_i \rangle + \frac{1}{2} \sum_{i=0}^{k-1} \langle u_i, Q_i u_i \rangle, \tag{122}$$
subject to
$$x_{i+1} = A_i x_i + B_i u_i, \quad i = 0, 1, \dots, k-1, \qquad x_0 = \hat{x}_0, \tag{123}$$
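where the $\nu \times \nu$ matrices $R_i$ and the $\mu \times \mu$ matrices $Q_i$ are symmetric and positive definite. Show that under a nondegeneracy assumption, the optimal control sequence $\hat{u}_0, \hat{u}_1, \dots, \hat{u}_{k-1}$ and the optimal trajectory $\hat{x}_0, \hat{x}_1, \dots, \hat{x}_k$ (with $\hat{x}_0$ as specified in (123)) must satisfy
$$\hat{x}_{i+1} = A_i \hat{x}_i + B_i Q_i^{-1} B_i^T \hat{p}_{i+1}, \quad i = 0, 1, 2, \dots, k-1, \tag{124}$$
$$\hat{p}_i = -R_i \hat{x}_i + A_i^T \hat{p}_{i+1}, \quad i = 1, 2, \dots, k-1, \qquad \hat{p}_k = -R_k \hat{x}_k, \tag{125}$$
$$\hat{u}_i = Q_i^{-1} B_i^T \hat{p}_{i+1}, \quad i = 0, 1, 2, \dots, k-1, \tag{126}$$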
where the $\hat{p}_i$, $i = 1, 2, \dots, k$, satisfy (124), (125) together with the $\hat{x}_i$, $i = 0, 1, 2, \dots, k$. Also show that for this case,
$$\hat{p}_i = K_i \hat{x}_i, \quad i = 1, 2, \dots, k, \tag{127}$$
where $K_i$ is a $\nu \times \nu$ matrix determined by solving
$$K_i = -R_i + A_i^T K_{i+1}(I - B_i Q_i^{-1} B_i^T K_{i+1})^{-1} A_i, \quad i = 1, 2, \dots, k-1, \tag{128}$$
with the boundary condition $K_k = -R_k$. Note that because of (127) we obtain a feedback control law:
$$\hat{u}_i = Q_i^{-1} B_i^T K_{i+1}(I - B_i Q_i^{-1} B_i^T K_{i+1})^{-1} A_i \hat{x}_i, \quad i = 0, 1, 2, \dots, k-1. \tag{129}$$
Show that we could have set
$$\hat{p}_i = K_i \hat{x}_i + q_i, \quad i = 1, 2, \dots, k, \tag{130}$$
with $K_i$ determined by solving (128) with the boundary condition $K_k = 0$ and $q_i$ determined by solving
$$q_i = A_i^T [I + K_{i+1}(I - B_i Q_i^{-1} B_i^T K_{i+1})^{-1} B_i Q_i^{-1} B_i^T] q_{i+1}, \quad i = 1, 2, \dots, k-1, \tag{131}$$
with the boundary condition $q_k = -R_k \hat{x}_k$. Which of the two possibilities (127), (130) would you choose for carrying out calculations?

To conclude this section, we shall show that under the assumptions stated in exercise (121), the Riccati equation (128) always has a solution, independently of whether we set $K_k = -R_k$ or $K_k = 0$. This fact is a result of the two lemmas given below.

132 Lemma. Suppose that $D$ is a symmetric, positive semidefinite $\nu \times \nu$ matrix, and that $K$ is a $\nu \times \nu$ negative semidefinite matrix. Then the matrix $(I - DK)$ is nonsingular.
Proof. We begin by noting that since $D$ is symmetric and positive semidefinite, $Dx = 0$ if and only if $\langle x, Dx \rangle = 0$,* and that $(I - DK)$ is nonsingular if and only if $(I - K^T D)$ is nonsingular. Now, suppose that $(I - K^T D)$ is singular. Then there must exist a nonzero vector $x'$ such that
$$(I - K^T D) x' = 0, \tag{133}$$
which leads us to the conclusion that
$$K^T D x' = x', \tag{134}$$
and hence that $\langle Dx', K^T D x' \rangle = \langle Dx', x' \rangle$. The left-hand side is nonpositive because $K^T$ is negative semidefinite, and the right-hand side is nonnegative because $D$ is positive semidefinite, so that
$$\langle Dx', K^T D x' \rangle = \langle Dx', x' \rangle = 0, \tag{135}$$
which leads us to the conclusion that $Dx' = 0$, and hence that $K^T D x' = 0$. By (134) this gives $x' = 0$, which is a contradiction, and hence the proof is complete.

* This follows from the fact that since $D \geq 0$ is symmetric, there exist a diagonal matrix $\Lambda \geq 0$ and a $\nu \times \nu$ matrix $T$ such that $D = (T^T \Lambda^{1/2})(\Lambda^{1/2} T)$.
136 Lemma. Suppose that $K$, $D$ are as in lemma (132). Then the matrix $K(I - DK)^{-1}$ is negative semidefinite.

Proof. We have shown in lemma (132) that $(I - DK)^{-1}$ exists; hence the matrix in question is well-defined. Next, for any $x \in \mathbb{R}^\nu$,
$$\langle x, K(I - DK)^{-1} x \rangle = \langle (I - DK)^{-1} x, (I - DK)^T K (I - DK)^{-1} x \rangle = \langle y, Ky \rangle - \langle y, K^T D K y \rangle \leq 0, \tag{137}$$
where $y = (I - DK)^{-1} x$, since $K$ is negative semidefinite and $D$ is positive semidefinite.
138 Theorem. Consider the difference equation (128). Suppose that for $i = 1, 2, \dots, k$, the matrices $R_i$ are symmetric and positive semidefinite, and that for $i = 0, 1, 2, \dots, k-1$, the matrices $Q_i$ are symmetric and positive definite. Then the difference equation (128) has a well-defined solution $K_i$, $i = 1, 2, \dots, k$, for $K_k = -R_k$. (Note that we may choose $R_k = 0$.)

Proof. We give a proof by induction. We shall show that the $K_i$ not only exist, but that they are negative semidefinite. First, $K_k = -R_k$ is negative semidefinite, by definition. Suppose that $K_{i+1}$ is negative semidefinite. Then, setting $D_i = B_i Q_i^{-1} B_i^T$, we find that $D_i$ is positive semidefinite and hence, by lemma (132), we see that $K_i$ is well-defined by (128). By lemma (136), the matrix $K_{i+1}(I - D_i K_{i+1})^{-1}$ is negative semidefinite, and therefore so is $A_i^T K_{i+1}(I - D_i K_{i+1})^{-1} A_i$. Since $K_i$ is the sum of two negative semidefinite matrices, $K_i$ is negative semidefinite. Thus we have shown that if $K_{i+1}$ exists and is negative semidefinite, then $K_i$ exists and is also negative semidefinite. But $K_k = -R_k$ is negative semidefinite, and hence the $K_i$ are well-defined by (128) for $i = 1, 2, \dots, k$.
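As an illustration of the recursion (128) and the feedback law (129), the following is a minimal numerical sketch, not part of the original text. The function name `riccati_feedback`, the list-based indexing and the scalar test data are assumptions made only for this example; the matrices are taken as NumPy arrays and linear systems are solved instead of forming inverses explicitly.

```python
import numpy as np

def riccati_feedback(A, B, Q, R):
    """Backward recursion (128) with K_k = -R_k, plus the gains of (129).

    A, B, Q are lists indexed i = 0..k-1; R is a list indexed i = 0..k
    (R[0] is never used, because the cost (122) starts at i = 1).
    Returns K (K[i] for i = 1..k, K[0] = None) and F with u_i = F[i] @ x_i.
    """
    k = len(A)
    nu = A[0].shape[0]
    K = [None] * (k + 1)
    F = [None] * k
    K[k] = -R[k]
    for i in range(k - 1, -1, -1):
        QinvBT = np.linalg.solve(Q[i], B[i].T)       # Q_i^{-1} B_i^T
        D = B[i] @ QinvBT                            # D_i = B_i Q_i^{-1} B_i^T
        # T = K_{i+1} (I - D_i K_{i+1})^{-1} A_i
        T = K[i + 1] @ np.linalg.solve(np.eye(nu) - D @ K[i + 1], A[i])
        F[i] = QinvBT @ T                            # feedback gain in (129)
        if i >= 1:
            K[i] = -R[i] + A[i].T @ T                # recursion (128)
    return K, F

def demo():
    # Hypothetical scalar data: A_i = B_i = Q_i = R_i = 1, horizon k = 2.
    k = 2
    one = np.eye(1)
    return riccati_feedback([one] * k, [one] * k, [one] * k, [one] * (k + 1))
```

For the scalar data in `demo`, the gains can be checked by hand against a direct minimization of (122): minimizing over $u_1$ first gives $u_1 = -x_1/2$, and then minimizing over $u_0$ gives $u_0 = -0.6\,x_0$. The computed $K_i$ are also nonpositive, as theorem (138) predicts.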
139 Exercise. Use a Riccati equation approach to obtain the optimal control law for the problem
$$\text{minimize} \quad \sum_{i=1}^{k} \left[ \langle d_i, x_i \rangle + \tfrac{1}{2} \langle x_i, R_i x_i \rangle \right] + \sum_{i=0}^{k-1} \left[ \langle c_i, u_i \rangle + \tfrac{1}{2} \langle u_i, Q_i u_i \rangle + \langle x_i, S_i u_i \rangle \right],$$
subject to
$$x_{i+1} = A_i x_i + B_i u_i, \quad i = 0, 1, 2, \dots, k-1, \qquad x_0 = \hat{x}_0,$$
where the matrices $R_i$ and $Q_i$ are symmetric and positive definite.
3.3
Boundary-Value Problems and Continuous Optimal Control
The material to be presented in this section closely parallels the material in Section 2. In fact, a number of the results in this section can be obtained formally from those presented in Section 2 simply by replacing $x_i$ by $x(t)$, $u_i$ by $u(t)$, $x_{i+1} - x_i$ by $(d/dt)x(t)$, and so forth, since one can always consider the difference equation (2.1) to be a discretization of a differential equation. However, as we shall soon see, there are significant differences between the formulas one obtains for the discrete and the continuous case. Also, a number of results which are valid for the continuous case cannot be adapted to the discrete case without special assumptions, if at all. As we shall see, there are more things one can do and say in the case of boundary-value problems with continuous dynamics than in the case of boundary-value problems with discrete dynamics.

As in Section 2, we shall first show how the Newton-Raphson method (1.9) can be used to solve a general class of boundary-value problems with differential equations. Then we shall examine a special class of boundary-value problems which arise as necessary conditions of optimality for continuous optimal control problems, as a result of the Pontryagin maximum principle (1.2.35). To facilitate a comparison between the results in this section and those in Section 2, we shall retain the notation introduced in Section 2 for numbering iterates, viz., we shall write $\overset{j}{x}(t)$, $\overset{j}{u}(t)$, $\overset{j}{p}(t)$ and not $x_j(t)$, $u_j(t)$, $p_j(t)$, which would have been more consistent with the notation used in Section 1.

In Section 2, we saw that the Newton-Raphson method could be developed in at least two versions in application to discrete dynamics boundary-value problems. These two versions, the Goodman-Lance version [G4] and the quasi-linearization version [M2], [B6], also apply to continuous dynamics boundary-value problems. We begin with the Goodman-Lance version, since it is the easier of the two to understand.
Thus, suppose that we are given a continuous dynamical system described by the differential equation
$$\frac{d}{dt} x(t) = f(x(t), t), \quad t \in [t_0, t_f], \tag{1}$$
where $f : \mathbb{R}^\nu \times \mathbb{R}^1 \to \mathbb{R}^\nu$ is continuously differentiable in $x$ and piecewise continuous in $t$. Furthermore, we shall assume that for every $x \in \mathbb{R}^\nu$ the elements of the Jacobian $\partial f(x, \cdot)/\partial x$ are piecewise continuous. In addition, we are given the boundary conditions
$$P_0 x(t_0) = y_0, \qquad P_f x(t_f) = y_f, \tag{2}$$
where $y_0 \in \mathbb{R}^\alpha$, $y_f \in \mathbb{R}^{\nu - \alpha}$, $\alpha < \nu$, and $P_0$, $P_f$ are full-rank matrices of dimension $\alpha \times \nu$ and $(\nu - \alpha) \times \nu$, respectively. As in the discrete case, to solve the boundary-value problem (1), (2) by the Newton-Raphson method, we must first transcribe it into the form
$$g(z) = 0. \tag{3}$$
To do this for the Goodman-Lance version of the Newton-Raphson method, we define $x(t; t_0, z)$ to be the solution of (1) at time $t$ corresponding to the initial state $z$, i.e., $x(t_0; t_0, z) = z$. Then we define the function $x_f : \mathbb{R}^\nu \to \mathbb{R}^\nu$ by*
$$x_f(z) = x(t_f; t_0, z), \tag{4}$$
and, finally, we define the function $g : \mathbb{R}^\nu \to \mathbb{R}^\nu$ as
$$g(z) = \begin{pmatrix} P_f x_f(z) - y_f \\ P_0 z - y_0 \end{pmatrix}. \tag{5}$$
Obviously, the function $g(\cdot)$ in (5) is differentiable. With $g(\cdot)$ defined as in (5), we now see that the problem (1), (2) is equivalent to solving the equation $g(z) = 0$ for an initial state $z$ and then computing $x(t; t_0, z)$ for $t \in [t_0, t_f]$, with $P_f x(t_f; t_0, z) = y_f$ ensured by the fact that $g(z) = 0$. From here on we proceed on the assumption that $(\partial g(z)/\partial z)^{-1}$ exists for all $z \in \mathbb{R}^\nu$, or at least for all $z$ in a sufficiently large subset of $\mathbb{R}^\nu$. Obviously, it may not be possible to verify this assumption. In an actual situation, we would simply go ahead and use the Newton-Raphson method, and would guard against failure by inserting into the program a test which stops the computation whenever it appears that the sequence being constructed diverges or whenever $|\det \partial g(\overset{j}{z})/\partial z| < \varepsilon$, where $\varepsilon$ is very small. We recall from (1.9) that, given a point $\overset{j}{z} \in \mathbb{R}^\nu$, the Newton-Raphson method constructs its successor $\overset{j+1}{z}$ by solving the equation
$$\frac{\partial g(\overset{j}{z})}{\partial z} (\overset{j+1}{z} - \overset{j}{z}) = -g(\overset{j}{z}), \quad j = 0, 1, 2, \dots. \tag{6}$$
Substituting from (5) into (6), we obtain
$$P_f \frac{\partial x_f(\overset{j}{z})}{\partial z} (\overset{j+1}{z} - \overset{j}{z}) = -P_f x_f(\overset{j}{z}) + y_f, \tag{7}$$
$$P_0 (\overset{j+1}{z} - \overset{j}{z}) = -P_0 \overset{j}{z} + y_0, \tag{8}$$

* We assume that (1) has a solution on $[t_0, t_f]$ for every initial state $z \in \mathbb{R}^\nu$.
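which simplifies to
$$P_f \frac{\partial x_f(\overset{j}{z})}{\partial z} \overset{j+1}{z} = P_f \frac{\partial x_f(\overset{j}{z})}{\partial z} \overset{j}{z} - P_f x_f(\overset{j}{z}) + y_f, \tag{9}$$
$$P_0 \overset{j+1}{z} = y_0. \tag{10}$$
Now, it is not difficult to show that given a perturbation $\delta z$ about an initial state $\overset{j}{z}$ of (1) we can compute the perturbation $[x(t_f; t_0, \overset{j}{z} + \delta z) - x(t_f; t_0, \overset{j}{z})]$ to first-order terms by solving
$$\frac{d}{dt} \delta x(t) = \frac{\partial f(x(t; t_0, \overset{j}{z}), t)}{\partial x} \delta x(t), \quad t \in [t_0, t_f], \qquad \delta x(t_0) = \delta z, \tag{11}$$
for $\delta x(t_f)$. To do this, suppose that for $t, s \in [t_0, t_f]$, $\overset{j}{\Phi}(t, s)$ is a $\nu \times \nu$ matrix defined as the solution of the differential equation,
$$\frac{d}{dt} \overset{j}{\Phi}(t, s) = \frac{\partial f(x(t; t_0, \overset{j}{z}), t)}{\partial x} \overset{j}{\Phi}(t, s), \qquad \overset{j}{\Phi}(s, s) = I, \tag{12}$$
where $I$ is the $\nu \times \nu$ identity matrix. Then we see that
$$\delta x(t_f) = \overset{j}{\Phi}(t_f, t_0) \, \delta z, \tag{13}$$
and hence we conclude that
$$\frac{\partial x_f(\overset{j}{z})}{\partial z} = \overset{j}{\Phi}(t_f, t_0). \tag{14}$$
Consequently, (9) and (10) become
$$P_f \overset{j}{\Phi}(t_f, t_0) \overset{j+1}{z} = P_f \overset{j}{\Phi}(t_f, t_0) \overset{j}{z} - P_f x_f(\overset{j}{z}) + y_f, \tag{15}$$
$$P_0 \overset{j+1}{z} = y_0. \tag{16}$$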
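17 Exercise. Show that the matrix $\overset{j}{\Phi}(t, s)$ also satisfies the adjoint differential equation,
$$\frac{d}{ds} \overset{j}{\Phi}(t, s) = -\overset{j}{\Phi}(t, s) \frac{\partial f(x(s; t_0, \overset{j}{z}), s)}{\partial x}, \quad s \in [t_0, t_f], \; t \in [t_0, t_f]. \tag{18}$$
Hence, show that the matrix $P_f \overset{j}{\Phi}(t_f, t_0)$ can be computed by solving the differential equation,
$$\frac{d}{ds} \overset{j}{P}(s) = -\overset{j}{P}(s) \frac{\partial f(x(s; t_0, \overset{j}{z}), s)}{\partial x}, \quad s \in [t_0, t_f], \qquad \overset{j}{P}(t_f) = P_f, \tag{19}$$
to yield $P_f \overset{j}{\Phi}(t_f, t_0) = \overset{j}{P}(t_0)$.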
Note that just as in the discrete case (compare (2.17)), it is easier to compute $P_f \overset{j}{\Phi}(t_f, t_0)$ by solving (19) than by first solving (12) to obtain $\overset{j}{\Phi}(t_f, t_0)$ separately, since (19) is a differential equation in fewer variables than (12). We now summarize the above discussion by presenting it in the form of an algorithm.

20 Algorithm (Goodman-Lance version of Newton-Raphson method [G4]).
Step 0. Select a $\overset{0}{z} \in \mathbb{R}^\nu$.
Step 1. Set $j = 0$.
Step 2. For $t \in [t_0, t_f]$, compute $x(t; t_0, \overset{j}{z})$ by solving (1) with $x(t_0) = \overset{j}{z}$, and set $x_f(\overset{j}{z}) = x(t_f; t_0, \overset{j}{z})$.
Step 3. For $s \in [t_0, t_f]$, compute the Jacobian matrix $\partial f(x(s; t_0, \overset{j}{z}), s)/\partial x$.
Step 4. Compute $\overset{j}{P}(t_0)$ by solving (19).
Step 5. Compute $\overset{j+1}{z}$ by solving
$$\overset{j}{P}(t_0) \overset{j+1}{z} = \overset{j}{P}(t_0) \overset{j}{z} - P_f x_f(\overset{j}{z}) + y_f, \tag{21}$$
$$P_0 \overset{j+1}{z} = y_0. \tag{22}$$
Step 6. Set $j = j + 1$ and go to step 2.

Thus, just as in the application of the Goodman-Lance version to the discrete case, in Section 2, at each iteration we solve the nonlinear differential equation (1) in the forward direction from the initial state $x(t_0) = \overset{j}{z}$. Then we solve the linear, variational adjoint equation (19) backwards in time. Finally, we solve a linear algebraic set of equations (21), (22). Note that remark (2.21) is relevant to the present case as well.
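The iteration of algorithm (20) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the author's program: instead of integrating the adjoint equation (19) backwards, the sketch integrates the variational equation (12) forwards together with (1), which yields $\Phi(t_f, t_0)$ and hence the same linear system (15), (16). The fixed-step Runge-Kutta integrator, the function names and the test problem $\ddot{y} = \tfrac{3}{2} y^2$, $y(0) = 4$, $y(1) = 1$ (which has the solution $y(t) = 4/(1+t)^2$, so that $\dot{y}(0) = -8$) are all choices made for this example only.

```python
import numpy as np

def rk4(rhs, w, t0, tf, n):
    """Classical fixed-step Runge-Kutta integration of w' = rhs(t, w)."""
    h = (tf - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, w + h / 2 * k1)
        k3 = rhs(t + h / 2, w + h / 2 * k2)
        k4 = rhs(t + h, w + h * k3)
        w = w + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return w

def goodman_lance(f, dfdx, P0, y0, Pf, yf, z, t0, tf, n=200, iters=20, tol=1e-10):
    """Newton-Raphson iteration on the initial state z, using (15), (16)."""
    nu = len(z)

    def rhs(t, w):
        # w stacks the state x and the transition matrix Phi(t, t0) of (12).
        x, Phi = w[:nu], w[nu:].reshape(nu, nu)
        return np.concatenate([f(x, t), (dfdx(x, t) @ Phi).ravel()])

    z = np.asarray(z, dtype=float)
    for _ in range(iters):
        w = rk4(rhs, np.concatenate([z, np.eye(nu).ravel()]), t0, tf, n)
        xf, Phi = w[:nu], w[nu:].reshape(nu, nu)
        lhs = np.vstack([Pf @ Phi, P0])                       # rows of (15), (16)
        rhs_vec = np.concatenate([Pf @ Phi @ z - Pf @ xf + yf, y0])
        z_new = np.linalg.solve(lhs, rhs_vec)
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def example():
    # State x = (y, y'); boundary conditions y(0) = 4 and y(1) = 1.
    f = lambda x, t: np.array([x[1], 1.5 * x[0] ** 2])
    dfdx = lambda x, t: np.array([[0.0, 1.0], [3.0 * x[0], 0.0]])
    P = np.array([[1.0, 0.0]])
    return goodman_lance(f, dfdx, P, np.array([4.0]), P, np.array([1.0]),
                         [4.0, -7.0], 0.0, 1.0)
```

Calling `example()` starts from the guess $\dot{y}(0) = -7$ and returns an initial state close to $(4, -8)$. Integrating (12) costs $\nu^2$ extra equations per iteration, whereas the adjoint equation (19) needs only $(\nu - \alpha)\nu$; the forward form is used here only because it avoids storing the trajectory.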
23 Exercise. Adapt algorithm (20) for the solution of (1) with boundary conditions of the form $g_0(x(t_0)) = 0$, $g_f(x(t_f)) = 0$, where $g_0 : \mathbb{R}^\nu \to \mathbb{R}^\alpha$, $g_f : \mathbb{R}^\nu \to \mathbb{R}^{\nu - \alpha}$, $\alpha < \nu$, are continuously differentiable functions whose Jacobians are full-rank matrices for all $z \in \mathbb{R}^\nu$.
24 Exercise. Modify algorithm (20) to obtain a quasi-Newton method of the form of algorithm (2.1.42).

To define the quasi-linearization version of the Newton-Raphson method [M2], [B6] for the boundary-value problem (1), (2), we must first introduce the Newton-Raphson method for solving equations of the form
$$g(z) = 0, \tag{25}$$
when $g : L \to L$, with $L$ a Banach space. In this case, the derivative of $g(\cdot)$ at $z \in L$ is denoted by $(\partial g(z)/\partial z)(\cdot)$ and, if it exists, is defined to be the linear map from $L$ into $L$ with the property that
$$\lim_{\|\delta z\|_L \to 0} \frac{\| g(z + \delta z) - g(z) - (\partial g(z)/\partial z)(\delta z) \|_L}{\|\delta z\|_L} = 0, \tag{26}$$
where $\| \cdot \|_L$ denotes the norm in $L$. Assuming that the inverse map $(\partial g(\cdot)/\partial z)^{-1}(\cdot)$ of $(\partial g(\cdot)/\partial z)(\cdot)$ is well-defined for all $z \in L$ and is continuous on $L \times L$, the Newton-Raphson method for solving (25) is defined by
$$\frac{\partial g(\overset{j}{z})}{\partial z} (\overset{j+1}{z} - \overset{j}{z}) = -g(\overset{j}{z}), \quad j = 0, 1, 2, \dots, \tag{27}$$
i.e., to compute $\overset{j+1}{z}$ from $\overset{j}{z}$ we solve (27), exactly as in the finite dimensional case. Also, as in the finite dimensional case, if $\hat{z}$ is the limit point of the sequence constructed according to (27), then we must have $g(\hat{z}) = 0$ (assuming, of course, that this sequence converges and that $(\partial g(\cdot)/\partial z)^{-1}(\cdot)$ is a continuous map from $L \times L$ into $L$).

For our purposes, the preceding discussion of the Newton-Raphson method in Banach spaces is sufficient. For a discussion of a number of aspects of importance in perturbation theory, as well as of rate of convergence, the reader may wish to consult the very lucid paper by Antosiewicz [A2], or the original work of Kantorovich [K2].

We can now proceed with the description of the quasi-linearization version of the Newton-Raphson method. Note that in what follows, the choice of the variable $z$ is made in a slightly different manner than was done in Section 2. Let $L = \mathbb{R}^\nu \times C_\nu'[t_0, t_f]$, where $C_\nu'[t_0, t_f]$ is the space of all piecewise continuously differentiable functions from $[t_0, t_f]$ into $\mathbb{R}^\nu$, with the norm of $z = (x_0, x(\cdot))$ in $L$ defined by
$$\| z \|_L = \left( \| x_0 \|^2 + \sup_{t \in [t_0, t_f]} \| x(t) \|^2 \right)^{1/2}. \tag{28}$$
Given a vector $z \in L$, we shall always decompose it into two parts, as follows:
$$z = (x_0, x(\cdot)), \quad x_0 \in \mathbb{R}^\nu, \; x(\cdot) \in C_\nu'[t_0, t_f]. \tag{29}$$
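To define the map $g(\cdot)$ for the problem (1), (2), we introduce the maps $\mathcal{Z} : \mathbb{R}^\nu \times C_\nu'[t_0, t_f] \to C_\nu'[t_0, t_f]$, $\mathcal{P}_0 : \mathbb{R}^\nu \times C_\nu'[t_0, t_f] \to \mathbb{R}^\alpha$ and $\mathcal{P}_f : \mathbb{R}^\nu \times C_\nu'[t_0, t_f] \to \mathbb{R}^{\nu - \alpha}$, defined by
$$\mathcal{Z}(x_0, x(\cdot))(t) = x_0 + \int_{t_0}^{t} f(x(s), s) \, ds - x(t), \quad t \in [t_0, t_f], \tag{30}$$
$$\mathcal{P}_0(x_0, x(\cdot)) = P_0 x_0 - y_0, \tag{31}$$
$$\mathcal{P}_f(x_0, x(\cdot)) = P_f x(t_f) - y_f, \tag{32}$$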
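where all the quantities on the right of the equal signs are defined as in (1), (2). Now, let $g : L \to L$ be defined by
$$g(z) = \left( (\mathcal{P}_0(z), \mathcal{P}_f(z)), \; \mathcal{Z}(z) \right). \tag{33}$$
Then, from (30), (31) we see that $g(\cdot)$ is a continuously differentiable map, whose derivative we shall obtain shortly. We shall assume from now on, as we did for the Goodman-Lance version of the Newton-Raphson method, that $(\partial g(z)/\partial z)^{-1}$ exists for all $z$ in a sufficiently large subset of $L$.

34 Exercise. Show that with $g(\cdot)$ defined as in (33), (27) becomes
$$(\overset{j+1}{x}_0 - \overset{j}{x}_0) + \int_{t_0}^{t} \frac{\partial f(\overset{j}{x}(s), s)}{\partial x} [\overset{j+1}{x}(s) - \overset{j}{x}(s)] \, ds - [\overset{j+1}{x}(t) - \overset{j}{x}(t)] = -\overset{j}{x}_0 - \int_{t_0}^{t} f(\overset{j}{x}(s), s) \, ds + \overset{j}{x}(t), \tag{35}$$
$$P_0 (\overset{j+1}{x}_0 - \overset{j}{x}_0) = -P_0 \overset{j}{x}_0 + y_0, \tag{36}$$
$$P_f [\overset{j+1}{x}(t_f) - \overset{j}{x}(t_f)] = -P_f \overset{j}{x}(t_f) + y_f. \tag{37}$$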
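Simplifying (35)-(37), we now find that (27), with $g(\cdot)$ defined by (33), assumes the form
$$\overset{j+1}{x}(t) = \overset{j+1}{x}_0 + \int_{t_0}^{t} \frac{\partial f(\overset{j}{x}(s), s)}{\partial x} [\overset{j+1}{x}(s) - \overset{j}{x}(s)] \, ds + \int_{t_0}^{t} f(\overset{j}{x}(s), s) \, ds, \quad t \in [t_0, t_f], \tag{38}$$
$$P_0 \overset{j+1}{x}_0 = y_0, \tag{39}$$
$$P_f \overset{j+1}{x}(t_f) = y_f. \tag{40}$$
To solve the system of equations (38)-(40), we transcribe it into a linear boundary-value problem, as follows:
$$\frac{d}{dt} \overset{j+1}{x}(t) = \frac{\partial f(\overset{j}{x}(t), t)}{\partial x} [\overset{j+1}{x}(t) - \overset{j}{x}(t)] + f(\overset{j}{x}(t), t), \quad t \in [t_0, t_f], \tag{41}$$
$$P_0 \overset{j+1}{x}(t_0) = y_0, \qquad P_f \overset{j+1}{x}(t_f) = y_f. \tag{42}$$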
43 Exercise. Show that any solution $(\overset{j+1}{x}_0, \overset{j+1}{x}(\cdot))$ of (38)-(40) satisfies (41), (42) and that any solution $\overset{j+1}{x}(\cdot)$ of (41), (42) together with $\overset{j+1}{x}_0 = \overset{j+1}{x}(t_0)$ satisfies (38)-(40), i.e., show that the systems (38)-(40) and (41), (42) are equivalent.
We now summarize the preceding results by restating them in the form of an algorithm. Note that in the end we obtain an algorithm which could have been derived formally from algorithm (2.31), simply by replacing [xi+l - xi] by (d/dt)x ( t ) and xi by x ( t ) .
44 Algorithm (quasi-linearization version of Newton-Raphson method [M2], [B6]).
Step 0. Select an $\overset{0}{x}(\cdot) \in C_\nu'[t_0, t_f]$ such that $P_0 \overset{0}{x}(t_0) = y_0$, $P_f \overset{0}{x}(t_f) = y_f$.
Comment. Strictly speaking, it is not necessary to choose $\overset{0}{x}(\cdot)$ such that $P_0 \overset{0}{x}(t_0) = y_0$ and $P_f \overset{0}{x}(t_f) = y_f$; however, such a choice seems to lead to a better initial guess.
Comment. Suppose that $x_0$, $x_f$ are such that $P_0 x_0 = y_0$, $P_f x_f = y_f$; then we may set $\overset{0}{x}(t) = x_0 + [(t - t_0)/(t_f - t_0)](x_f - x_0)$, $t \in [t_0, t_f]$.
Step 1. Set $j = 0$.
Step 2. For $t \in [t_0, t_f]$, compute $f(\overset{j}{x}(t), t)$ and $\partial f(\overset{j}{x}(t), t)/\partial x$.
Step 3. For $t \in [t_0, t_f]$, compute $\overset{j+1}{x}(t)$ by solving (41), (42).
Step 4. Set $j = j + 1$ and go to step 2.
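A minimal sketch of algorithm (44) is given below, under assumptions that are not in the original text. The linear boundary-value problem (41), (42) is solved here by a trapezoidal finite-difference discretization on a fixed grid, which is a swapped-in technique chosen for brevity rather than one of the two methods described later in this section. The function names and the test problem $\ddot{y} = -e^{y}$, $y(0) = y(1) = 0$ (a Bratu-type problem) are likewise illustrative only.

```python
import numpy as np

def quasilinearize(f, dfdx, P0, y0, Pf, yf, t, X, iters=8):
    """Iterate (41), (42) on the grid t, starting from the guess X[i] ~ x(t[i])."""
    n, nu = len(t) - 1, X.shape[1]
    a = P0.shape[0]
    I = np.eye(nu)
    for _ in range(iters):
        # Linearization about the current iterate: x' = A(t) x + v(t).
        A = [dfdx(X[i], t[i]) for i in range(n + 1)]
        v = [f(X[i], t[i]) - A[i] @ X[i] for i in range(n + 1)]
        M = np.zeros(((n + 1) * nu, (n + 1) * nu))
        b = np.zeros((n + 1) * nu)
        for i in range(n):
            h = t[i + 1] - t[i]
            rows = slice(i * nu, (i + 1) * nu)
            # Trapezoidal rule applied to (41) on [t_i, t_{i+1}].
            M[rows, i * nu:(i + 1) * nu] = -I - 0.5 * h * A[i]
            M[rows, (i + 1) * nu:(i + 2) * nu] = I - 0.5 * h * A[i + 1]
            b[rows] = 0.5 * h * (v[i] + v[i + 1])
        # Boundary conditions (42).
        M[n * nu:n * nu + a, :nu] = P0
        b[n * nu:n * nu + a] = y0
        M[n * nu + a:, n * nu:] = Pf
        b[n * nu + a:] = yf
        X = np.linalg.solve(M, b).reshape(n + 1, nu)
    return X

def bratu_example():
    # State x = (y, y'); boundary conditions y(0) = 0 and y(1) = 0.
    f = lambda x, t: np.array([x[1], -np.exp(x[0])])
    dfdx = lambda x, t: np.array([[0.0, 1.0], [-np.exp(x[0]), 0.0]])
    P = np.array([[1.0, 0.0]])
    t = np.linspace(0.0, 1.0, 101)
    X0 = np.zeros((101, 2))          # initial guess satisfying (42)
    return t, quasilinearize(f, dfdx, P, np.array([0.0]), P, np.array([0.0]), t, X0)
```

Every iterate produced by this sketch satisfies the boundary conditions (42) exactly, while the differential equation (1) is satisfied only in the limit. For the test problem the midpoint value $y(0.5)$ settles near $0.14$, in agreement with the known closed-form solution of this problem.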
45 Exercise. Adapt algorithm (44) for the solution of (1) with boundary conditions of the form $g_0(x(t_0)) = 0$, $g_f(x(t_f)) = 0$, where $g_0 : \mathbb{R}^\nu \to \mathbb{R}^\alpha$, $g_f : \mathbb{R}^\nu \to \mathbb{R}^{\nu - \alpha}$, $\alpha < \nu$, are continuously differentiable functions whose Jacobian matrices are of full rank for all $z \in \mathbb{R}^\nu$.

46 Exercise. Modify algorithm (44) to obtain a quasi-Newton method of the form of algorithm (2.1.42).
47 Remark. Note that the first observation made in remark (2.32) also applies to the two versions of the Newton-Raphson method we have just discussed, i.e., in the Goodman-Lance version we always have an approximation $\overset{j}{x}(\cdot)$ to a solution of (1), (2) which satisfies (1), while in the quasi-linearization version we always have an approximation to a solution of (1), (2) which satisfies (2). This difference between the two versions may be decisive when one has to make a choice between the two. Among people dealing with trajectory optimization problems, the quasi-linearization version appears to be in favor, since it seems less costly in computer time to solve the linear differential equation boundary-value problem (41), (42) at each iteration than the nonlinear differential equation (1), the linear differential equation (19) and the algebraic system (21), (22).

Since the linear boundary-value problem (41), (42) must be solved at each iteration of the quasi-linearization version, it is important to perform this task efficiently. We shall now describe two methods for solving boundary-value problems of the form
$$\frac{d}{dt} x(t) = A(t) x(t) + v(t), \quad t \in [t_0, t_f], \tag{48}$$
$$P_0 x(t_0) = y_0, \qquad P_f x(t_f) = y_f, \tag{49}$$
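where $x(t), v(t) \in \mathbb{R}^\nu$, $A(\cdot)$ is a piecewise continuous map from $[t_0, t_f]$ into the space of all $\nu \times \nu$ matrices,* $y_0 \in \mathbb{R}^\alpha$, $y_f \in \mathbb{R}^{\nu - \alpha}$, and $P_0$, $P_f$ are maximum-rank matrices of dimension $\alpha \times \nu$ and $(\nu - \alpha) \times \nu$, respectively ($\alpha < \nu$). For $t, s \in [t_0, t_f]$, let $\Phi(t, s)$ be a $\nu \times \nu$ matrix defined as the solution of
$$\frac{d}{dt} \Phi(t, s) = A(t) \Phi(t, s), \qquad \Phi(s, s) = I, \tag{50}$$
where $I$ is the $\nu \times \nu$ identity matrix. Then we see that
$$x(t_f) = \Phi(t_f, t_0) x(t_0) + \int_{t_0}^{t_f} \Phi(t_f, s) v(s) \, ds, \tag{51}$$
$$x(t_0) = \Phi(t_0, t_f) x(t_f) + \int_{t_f}^{t_0} \Phi(t_0, s) v(s) \, ds. \tag{52}$$
The reader should recall that the matrix $\Phi(t, s)$ has the following two well-known properties:
$$\Phi(t, s)^{-1} = \Phi(s, t), \qquad \Phi(t, s') \Phi(s', s) = \Phi(t, s), \quad t, s, s' \in [t_0, t_f]. \tag{53}$$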
We can now use (49) together with either (51) or (52) to obtain a complete system of linear equations in $x(t_0)$ or in $x(t_f)$. Thus, to obtain the initial state $x(t_0)$ of a solution $x(\cdot)$ satisfying (48), (49) we may solve the system of equations,
$$P_0 x(t_0) = y_0, \tag{54}$$
$$P_f \Phi(t_f, t_0) x(t_0) = -\int_{t_0}^{t_f} P_f \Phi(t_f, s) v(s) \, ds + y_f. \tag{55}$$
Note that for this solution to be unique, the matrix
$$\begin{pmatrix} P_0 \\ P_f \Phi(t_f, t_0) \end{pmatrix} \tag{56}$$
must be nonsingular.

* In this space we can use the Frobenius norm defined by $\| A \|^2 = \sum_{i,j} a_{ij}^2$, where $A = (a_{ij})$.
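The system (54), (55) can be set up numerically as sketched below. This is an illustration under stated assumptions rather than a prescribed implementation: $\Phi(t_f, t_0)$ is obtained by integrating (50) forward, and the integral in (55) is obtained as the value at $t_f$ of the solution of (48) started from $x(t_0) = 0$. The integrator, the function names and the test problem ($\ddot{y} + y = t$, $y(0) = 0$, $y(\pi/2) = \pi/2 + 1$, whose solution is $y(t) = t + \sin t$) are choices made for this example.

```python
import numpy as np

def rk4(rhs, w, t0, tf, n):
    """Classical fixed-step Runge-Kutta integration of w' = rhs(t, w)."""
    h = (tf - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, w + h / 2 * k1)
        k3 = rhs(t + h / 2, w + h / 2 * k2)
        k4 = rhs(t + h, w + h * k3)
        w = w + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return w

def initial_state(A, v, P0, y0, Pf, yf, t0, tf, n=400):
    """Solve (54), (55) for x(t0) of the linear problem (48), (49)."""
    nu = P0.shape[1]

    def rhs(t, W):
        # Columns 0..nu-1 carry Phi(t, t0); the last column carries the
        # particular solution of (48) with zero initial state.
        out = A(t) @ W
        out[:, -1] += v(t)
        return out

    W = rk4(rhs, np.hstack([np.eye(nu), np.zeros((nu, 1))]), t0, tf, n)
    Phi, xp = W[:, :nu], W[:, -1]     # xp = integral of Phi(tf, s) v(s) ds
    lhs = np.vstack([P0, Pf @ Phi])   # the matrix (56)
    rhs_vec = np.concatenate([y0, yf - Pf @ xp])
    return np.linalg.solve(lhs, rhs_vec)

def example():
    A = lambda t: np.array([[0.0, 1.0], [-1.0, 0.0]])
    v = lambda t: np.array([0.0, t])
    P = np.array([[1.0, 0.0]])
    tf = np.pi / 2
    return initial_state(A, v, P, np.array([0.0]), P, np.array([tf + 1.0]), 0.0, tf)
```

For the test problem, `example()` returns an initial state close to $(0, 2)$, i.e., $y(0) = 0$ and $\dot{y}(0) = 2$, as expected from $y(t) = t + \sin t$.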
Similarly, to obtain the terminal state of a solution $x(\cdot)$ satisfying (48), (49), we may solve the system of equations,
$$P_0 \Phi(t_0, t_f) x(t_f) = -\int_{t_f}^{t_0} P_0 \Phi(t_0, s) v(s) \, ds + y_0, \tag{57}$$
$$P_f x(t_f) = y_f. \tag{58}$$
Note that in the discrete case we could only obtain equations in the initial state $x_0$ (see (2.37), (2.38)). To obtain a set of equations in the terminal state $x_k$ in the discrete case, we would have to require that the matrices $(I + A_i)$ be nonsingular for $i = 0, 1, \dots, k-1$. Even then, the need to invert all these matrices would make the construction of a system of equations in $x_k$ rather unattractive. We thus encounter an interesting difference between discrete and continuous systems: a difference equation is easily solved in the forward direction, but not in the reverse direction; a differential equation can be solved in either direction with equal ease (provided, of course, that the solution does not grow extremely large or small).

When (48) is stable (i.e., $\Phi(t, s) \to 0$ as $t \to \infty$) we would choose to solve (54), (55), while when (48) is unstable (i.e., $\Phi(t, s) \to 0$ as $t \to -\infty$), we would choose to solve (57), (58). In either event, however, when $t_f - t_0$ is large, both the system (54), (55) and the system (57), (58) may be badly ill-conditioned due to the fact that the elements of $P_f \Phi(t_f, t_0)$ appear to be zero compared to the elements of $P_0$, or vice versa, with a similar statement holding for (57), (58). Fortunately, (54), (55) is not the only set of equations one can set up to compute an $x(t_0)$ such that the solution $x(t; t_0, x(t_0))$ of (48) satisfies (49), and similarly for (57), (58). We shall now describe a method due to Abramov [A1] for constructing an alternative set of equations in $x(t_0)$. These equations are somewhat more difficult to set up than (54), (55); however, they are much better conditioned than (54), (55). We shall leave the adaptation of Abramov's method to obtain a substitute system for (57), (58) as an exercise for the reader. We begin by describing an efficient way for setting up (55) (compare (2.39)).
For $t \in [t_0, t_f]$, let $P(t)$ be a $(\nu - \alpha) \times \nu$ matrix which satisfies the adjoint equation,
$$\frac{d}{dt} P(t) = -P(t) A(t), \quad t \in [t_0, t_f], \qquad P(t_f) = P_f. \tag{59}$$
Then
$$P(t) = P_f \Phi(t_f, t), \quad t \in [t_0, t_f], \tag{60}$$
and hence (55) becomes
$$P(t_0) x(t_0) = -\int_{t_0}^{t_f} P(s) v(s) \, ds + y_f. \tag{61}$$
Note that (59) is a differential equation in fewer variables than (50), and hence (61) is easier to compute than (55) if one were to compute (55) by first solving (50).

62 Exercise. Obtain a formula analogous to (61) to replace (57). [Hint: Use the adjoint equation (59), but with a different boundary condition.]

Abramov's method consists in "normalizing" the matrix $P(t)$. Thus, for $t \in [t_0, t_f]$, let $Q(t)$ be a $(\nu - \alpha) \times \nu$ matrix defined by
$$Q(t) = M(t) P(t), \tag{63}$$
where $P(t)$ is the solution of (59) and $M(t)$ is a $(\nu - \alpha) \times (\nu - \alpha)$ nonsingular normalization matrix which we shall determine in such a way as to ensure that
64 65
The normalization we use is expressed by (65). To simplify notation, we shall use a dot over a letter to denote differentiation with respect to $t$, i.e., $\dot{Q}(t) = (d/dt) Q(t)$, etc. Then, from (65), we obtain
66
67
68
Clearly, (68) is satisfied if we choose $M(t)$ so that
$$\dot{M}(t) M(t)^{-1} Q(t) Q(t)^T = Q(t) A(t) Q(t)^T. \tag{69}$$
Now, from (67), (59) and (63),
$$\dot{Q}(t) = \dot{M}(t) M(t)^{-1} Q(t) - Q(t) A(t). \tag{70}$$
Substituting for $\dot{M}(t) M(t)^{-1}$ from (69) into (70), we obtain
$$\dot{Q}(t) = Q(t) A(t) Q(t)^T [Q(t) Q(t)^T]^{-1} Q(t) - Q(t) A(t), \quad t \in [t_0, t_f], \qquad Q(t_f) = P_f.^* \tag{71}$$
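Now, for $t \in [t_0, t_f]$, let $w(t) \in \mathbb{R}^{\nu - \alpha}$ be defined by
$$w(t) = Q(t) x(t), \tag{72}$$
where $Q(t)$ is the solution of (71) and $x(t)$ is a solution of (48), (49). Then we see that
$$\dot{w}(t) = Q(t) A(t) Q(t)^T [Q(t) Q(t)^T]^{-1} w(t) + Q(t) v(t), \quad t \in [t_0, t_f], \qquad w(t_f) = y_f. \tag{73}$$
Hence we see that we can replace the system of equations (54), (55) by the system of equations,
$$P_0 x(t_0) = y_0, \tag{74}$$
$$Q(t_0) x(t_0) = w(t_0), \tag{75}$$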
which is obviously better conditioned than (54), (55) and which is computed by first solving (71) and then (73).

76 Exercise. Adapt Abramov's method to obtain a substitute set of linear equations in $x(t_f)$. When would you prefer to use Abramov's method to obtain a system of equations in $x(t_f)$ rather than in $x(t_0)$?

Note that Abramov's method is much more powerful in the continuous case, where we can take all the constraints on $x(t_f)$ and transform them simultaneously into constraints on $x(t_0)$, than in the discrete case, where we could take the constraints on $x_k$ only, equation by equation, and transform them into constraints on $x_0$. Obviously, the amount of ill-conditioning that can be removed by means of Abramov's method in the continuous case is much larger than in the discrete case.

We now turn to boundary-value problems which arise as a necessary condition of optimality in certain continuous optimal control problems.
* Note that Q(t)Q(t)T = PrPfT.
In particular, let us consider the problem,
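where $f^0 : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^1$, $f : \mathbb{R}^\nu \times \mathbb{R}^\mu \times \mathbb{R}^1 \to \mathbb{R}^\nu$, $\varphi : \mathbb{R}^\nu \to \mathbb{R}^1$ and $g_f : \mathbb{R}^\nu \to \mathbb{R}^\alpha$ satisfy the assumptions in (1.14, i.e., $\partial f^0/\partial x$, $\partial f^0/\partial u$, $\partial f/\partial x$, $\partial f/\partial u$ exist and, together with $f^0$ and $f$, are piecewise continuous in $t$. Furthermore, we assume that $\partial g_f/\partial x$ exists and has maximum rank in a sufficiently large subset of $\mathbb{R}^\nu$. As in Section 2.5, we shall assume that the controls $u(\cdot)$ belong to $L_\infty^\mu[t_0, t_f]$. Applying the Pontryagin maximum principle (1.2.35) to the problem (77)-(79), we find that if $\hat{u}(\cdot)$ is an optimal control and $\hat{x}(\cdot)$ is the corresponding optimal trajectory, then there exist a vector $\hat{\pi} \in \mathbb{R}^\alpha$ and a multiplier function $\hat{p} : [t_0, t_f] \to \mathbb{R}^\nu$ (usually referred to as the co-state), such that (assuming that the problem is nondegenerate, i.e., that $\hat{p}^0 = -1$ in (1.2.35) for this case) $\hat{u}(\cdot)$, $\hat{x}(\cdot)$, $\hat{\pi}$ and $\hat{p}(\cdot)$ satisfy
80
$$\hat{x}(t_0) = \hat{x}_0, \qquad g_f(\hat{x}(t_f)) = 0, \tag{81}$$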
82 83
and for almost all t E [ t o , ti],
Hence, we see that $\hat{u}(\cdot)$, $\hat{x}(\cdot)$, $\hat{p}(\cdot)$, and $\hat{\pi}$ must also be a solution of the following system of equations:
85 86
Now, since we have assumed that the controls $u(\cdot)$ are in $L_\infty^\mu[t_0, t_f]$, solutions $x(\cdot)$ and $p(\cdot)$ of (85)-(88) must be continuous because of the nature of (85) and (87), i.e., $x(\cdot)$ and $p(\cdot)$ are in $C_\nu[t_0, t_f]$, the space of continuous functions from $[t_0, t_f]$ into $\mathbb{R}^\nu$. Let $L = \mathbb{R}^\alpha \times C_\nu[t_0, t_f] \times C_\nu[t_0, t_f] \times L_\infty^\mu[t_0, t_f]$. We shall always partition elements $z \in L$ as follows:
$$z = (\pi, x(\cdot), p(\cdot), u(\cdot)). \tag{89}$$
To make $L$ a Banach space we now introduce a norm which we denote by $\| \cdot \|_L$ and define by
90
Now, let $g_1 : L \to C_\nu[t_0, t_f]$, $g_2 : L \to C_\nu[t_0, t_f]$, $g_3 : L \to L_\infty^\mu[t_0, t_f]$ be defined by
If we now define $g : L \to L$ by
94
we find that solving (85)-(88) is equivalent to solving the equation $g(z) = 0$. It is not difficult to see that $g(\cdot)$ is continuously differentiable, and hence that (27) becomes, in this case, for $t \in [t_0, t_f]$,
=
-g&,
l(9, i(9, i
$$\frac{\partial g_f(\overset{j}{x}(t_f))}{\partial x} [\overset{j+1}{x}(t_f) - \overset{j}{x}(t_f)] = -g_f(\overset{j}{x}(t_f)). \tag{98}$$
(For a definition of $(\partial^2 f^0(x, u, t)/\partial u \, \partial x)^*$ and of $(\partial^2 f^0(x, u, t)/\partial x \, \partial u)^*$ see the footnote on p. 70.)
99 Exercise. Convert (95) and (96) into a system of differential equations and state the boundary values for this system. [Hint: Proceed as in (43).]

As in the discrete case (2.67)-(2.75), (97) may or may not be solvable uniquely for $\overset{j+1}{u}(t)$, $t \in [t_0, t_f]$ (compare (2.74)). When it is not solvable, the system of equations (95)-(98) becomes exceedingly difficult to solve. We shall therefore assume that (97) can be solved uniquely for $\overset{j+1}{u}(t)$, in which case $\overset{j+1}{u}(t)$ must be an affine function in $\overset{j+1}{x}(t)$ and $\overset{j+1}{p}(t)$, i.e., $\overset{j+1}{u}(t) = \overset{j}{N}(t) \overset{j+1}{x}(t) + \overset{j}{M}(t) \overset{j+1}{p}(t) + \overset{j}{\ell}(t)$, where $\overset{j}{N}(t)$, $\overset{j}{M}(t)$ are $\mu \times \nu$ matrices and $\overset{j}{\ell}(t) \in \mathbb{R}^\mu$. In this case, substitution for $\overset{j+1}{u}(t)$ into (95) and (96) reduces our problem to solving a boundary-value problem of the following form (compare (2.76)-(2.78)):
+
+
3
Y ( t ) = A ( t ) F ( t ) + B(t) dt
100
- i(t),
t
E [to
, tf],
101 102 103 3
+
+
j
104 Exercise. Assuming that $\overset{j+1}{u}(t) = \overset{j}{N}(t) \overset{j+1}{x}(t) + \overset{j}{M}(t) \overset{j+1}{p}(t) + \overset{j}{\ell}(t)$, $t \in [t_0, t_f]$, obtain expressions for all the quantities in (100)-(103) (i.e., for $A(t)$, $B(t)$, etc.).

To simplify notation, just as in the discrete case, we now drop the superscripts $j$ and $j + 1$ on all the symbols in (100)-(103). We now have (at least) two alternatives in choosing a method for solving (100)-(103). The first of these alternatives is favored in the U.S.S.R. and consists of utilizing the Abramov method discussed a little earlier in this section. To apply Abramov's method to (100)-(103), we must first eliminate $\pi$ (i.e., $\overset{j+1}{\pi}$) from (103). This can be done as follows: Let $P_f$ be any $(\nu - \alpha) \times \nu$ full-rank matrix satisfying
$$P_f G_f^T = 0. \tag{105}$$
Then (103) is obviously equivalent to the constraint
$$P_f p(t_f) - P_f \Phi_f x(t_f) = P_f \gamma_f. \tag{106}$$
To construct such a matrix $P_f$ we may proceed as follows: Without loss of generality, suppose that the first $\alpha$ columns of $G_f$ are linearly independent ($G_f$ is an $\alpha \times \nu$ matrix which was assumed to be of full rank, since $G_f = \partial g_f(x)/\partial x$ for $x = \overset{j}{x}(t_f)$). Let $G_f'$ be the $\alpha \times \alpha$ matrix consisting of the first $\alpha$ columns of $G_f$; then we may write $G_f = [G_f', G_f'']$ to express $G_f$ in partitioned form. Now consider the equation
$$[G_f', G_f''] \begin{pmatrix} P_f'^T \\ P_f''^T \end{pmatrix} = G_f' P_f'^T + G_f'' P_f''^T = 0, \tag{107}$$
where $P_f = [P_f', P_f'']$, with $P_f'$ a $(\nu - \alpha) \times \alpha$ submatrix and $P_f''$ a $(\nu - \alpha) \times (\nu - \alpha)$ submatrix. (We have simply expanded the transpose of (105).) Hence,
$$P_f' = -P_f'' G_f''^T (G_f'^T)^{-1}. \tag{108}$$
Now let $P_f''$ be the $(\nu - \alpha) \times (\nu - \alpha)$ identity matrix. Then $P_f = (P_f', P_f'')$, with $P_f'$ as defined by (108), has full rank and satisfies (105).

Abramov's method then yields the following set of equations in $x(t_0)$, $p(t_0)$ (compare (74), (75)):
$$x(t_0) = \hat{x}_0, \tag{109}$$
$$Q_1(t_0) x(t_0) + Q_2(t_0) p(t_0) = h(t_0), \tag{110}$$
where the $\nu \times \nu$ matrices $Q_1(t_0)$, $Q_2(t_0)$ are obtained by solving the differential equation,
$$\frac{d}{dt} Q(t) = Q(t) \tilde{A}(t) Q(t)^T [Q(t) Q(t)^T]^{-1} Q(t) - Q(t) \tilde{A}(t), \quad t \in [t_0, t_f], \tag{111}$$
where $Q(t)$ is a $\nu \times 2\nu$ matrix which partitions into $[Q_1(t), Q_2(t)] = Q(t)$ with $Q_1(t)$, $Q_2(t)$ $\nu \times \nu$ submatrices, $\tilde{A}(t)$ is a $2\nu \times 2\nu$ matrix defined by 112
and
(- G, -1- I -). -Pf@f
113
Q ( t f )=
-
Pf
-
The value of h(to) is computed by solving the differential equation, 114
d
dt h(t) = Q ( t ) A(t)Q(t)T [Q(t) Q(t)l-l N t )
+ Q(t) 4 t ) ,
t
E [to
7
tfI
where h(t) E R”,d(t) = (-u(t), -w(t)) (column vector) and
( ).
115
h(tf) = Yf -g
The second alternative in solving (100)-(103) is favored in the U.S. and consists of utilizing the possibility that the relation
$$p(t) = K(t) x(t) + q(t), \quad t \in [t_0, t_f], \tag{116}$$
is valid, with $K(t)$ a $\nu \times \nu$ matrix and $q(t) \in \mathbb{R}^\nu$. Assuming that (116) is true, we find from (116), (100) and (101) that
$$\dot{K}(t) x(t) + K(t)[A(t) x(t) + B(t)(K(t) x(t) + q(t)) + v(t)] + \dot{q}(t) = C(t) x(t) + D(t)(K(t) x(t) + q(t)) - w(t), \tag{117}$$
where we have used a super dot to denote differentiation with respect to $t$. Equating terms on both sides of (117), we find that we may set
$$\frac{d}{dt} K(t) = -K(t) A(t) - K(t) B(t) K(t) + D(t) K(t) + C(t), \quad t \in [t_0, t_f], \tag{118}$$
$$\frac{d}{dt} q(t) = [-K(t) B(t) + D(t)] q(t) - K(t) v(t) - w(t), \quad t \in [t_0, t_f], \tag{119}$$
and from (103), we may set
$$K(t_f) = \Phi_f, \qquad q(t_f) = G_f^T \pi + \gamma_f. \tag{120}$$
Alternatively, we may set
$$K(t_f) = 0, \qquad q(t_f) = p(t_f), \tag{121}$$
and retain (103) as a condition to be made use of later. Thus, (116) is possible provided the Riccati type of differential equation (118) has a solution with one of the possible boundary conditions given. Assuming that the Riccati equation (118) does have a solution for the boundary conditions indicated, we can summarize the procedure for solving (100)-(103), based on its use, as follows:

122 Algorithm (solves system (100)-(103)).*
Step 1. For $t \in [t_0, t_f]$, compute $K(t)$ by solving (118) with $K(t_f) = \Phi_f$.

* When $G_f = 0$ in (103), the calculations below can be simplified considerably.
Step 2. For $t \in [t_0, t_f]$, compute the $\nu \times \nu$ matrix $Y(t)$ by solving
$$\frac{d}{dt} Y(t) = [-K(t) B(t) + D(t)] Y(t), \quad t \in [t_0, t_f], \qquad Y(t_f) = I, \tag{123}$$
where $I$ is the $\nu \times \nu$ identity matrix.
Step 3. For $t \in [t_0, t_f]$, compute
$$Z(t) = \int_{t_f}^{t} Y(t) Y(s)^{-1} [-w(s) - K(s) v(s)] \, ds. \tag{124}$$
Comment. With the boundary conditions (120) (see (53)),*
$$q(t) = Y(t)(G_f^T \pi + \gamma_f) + Z(t), \quad t \in [t_0, t_f]. \tag{125}$$
Step 4. For $t \in [t_0, t_f]$, compute the $\nu \times \nu$ matrix $X(t)$ by solving
126
where l i s the v x Comment.
d
127
identity matrix.
With p(t) defined by (1 16), we have
-x(t)
dl
v
=
+
[A@) B(t) K(t)] x ( t )
+ B(t) q(t) + u(t),
t
E [to,
trl.
Hence, because of (125) and (126) (and because of (53)),
+ 1"X(t)X(s)-l B(s)Z(s)ds + [f X ( t ) X(s)-l Y(s)ds] GnTr. to
t0
Step 5. Compute

129    $x_f = X(t_f)\,\hat{x}_0,$

131    $M_f = \int_{t_0}^{t_f} X(t_f)X(s)^{-1}Y(s)\,ds.$

Comment. We now make use of (103) to compute $\pi$, ...

Step 7. For $t \in [t_0, t_f]$, compute $x(t)$ according to (128) and $q(t)$ according to (125).

Step 8. For $t \in [t_0, t_f]$, compute $p(t)$ according to (116). ■

* Since we do not know $\pi$, we cannot compute $q(t)$ by solving (119) directly. However, when $G_f = 0$, this difficulty disappears.
133 Exercise. Develop an algorithm for solving (100)-(103) which uses (118), (119) with the boundary conditions (121).

134 Remark. Both the Abramov method and (122) involve the solution of a Riccati type of differential equation. It has been claimed in the U.S.S.R. that it is much more likely that (111) will have a solution than that (118) will have a solution, and that the system (109), (110) is usually better conditioned than the system of equations implied by (132). In the U.S., critics of the Abramov method point out, however, that (111) is considerably harder to solve than (118) and that, in general, in the Abramov method one deals with differential equations of larger dimension than in (122). It may well be, however, that this geographically based difference of opinion is due mostly to a preference resulting from familiarity with one method or the other, rather than to extensive computational experience, since there seem to be no published comparisons of the two methods. The interested reader may wish to conduct an experiment or two for himself so as to decide which of the two approaches works better for the particular class of problems he is dealing with.

To conclude this section, we consider an important special case of the problem (77)-(79), the linear-dynamics, quadratic-cost regulator problem, which can be solved completely, as we shall soon see. In the process of solving this problem, we shall also establish a particular case of (118) for which a solution exists. We base our presentation on a paper by Bucy [B5]. Thus, consider the optimal control problem
135    minimize $\frac{1}{2}\left[\langle x(T), \Phi x(T)\rangle + \int_0^T \langle x(t), Cx(t)\rangle + \langle u(t), Ru(t)\rangle\,dt\right],$

subject to

136    $\frac{d}{dt}x(t) = Ax(t) + Fu(t), \quad t \in [0, T], \quad x(0) = \hat{x}_0,$

where, as before, $x(t) \in \mathbb{R}^\nu$, $u(t) \in \mathbb{R}^\mu$, and $A$, $F$, $\Phi$, $C$, and $R$ are constant matrices of dimension $\nu \times \nu$, $\nu \times \mu$, $\nu \times \nu$, $\nu \times \nu$, and $\mu \times \mu$, respectively. We shall assume that the matrix $R$ is symmetric and positive definite, and that the matrices $\Phi$ and $C$ are symmetric and positive semidefinite. (Note that since all the matrices in (135), (136) are time-invariant, we lose no generality by setting $t_0 = 0$, $t_f - t_0 = T$; compare (77)-(79).) For the problem (135), (136), the conditions (80)-(84) are not only necessary, but also sufficient, and they assume the specific form,
137    $\frac{d}{dt}\hat{x}(t) = A\hat{x}(t) + F\hat{u}(t), \quad t \in [0, T],$

138    $\frac{d}{dt}\hat{\psi}(t) = C\hat{x}(t) - A^T\hat{\psi}(t), \quad t \in [0, T],$

139    $\hat{x}(0) = \hat{x}_0, \quad \hat{\psi}(T) = -\Phi\hat{x}(T),$

140    $-R\hat{u}(t) + F^T\hat{\psi}(t) = 0, \quad t \in [0, T],$

with (140) being obtained by carrying out the maximization indicated in (84). Solving (140) for $\hat{u}(t)$ and substituting into (137), we find that $\hat{x}(\cdot)$, $\hat{\psi}(\cdot)$ must be a solution of the system of differential equations,
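141    $\frac{d}{dt}x(t) = Ax(t) + Bp(t), \quad t \in [0, T],$

142    $\frac{d}{dt}p(t) = Cx(t) - A^Tp(t), \quad t \in [0, T],$

143    $x(0) = \hat{x}_0, \quad p(T) = -\Phi x(T),$

where $B = FR^{-1}F^T$. Note that $B$ is symmetric and positive semidefinite. Now, for $t, s \in [0, T]$, let $X(t, s)$ be a $2\nu \times 2\nu$ matrix defined by

144    $\frac{d}{dt}X(t, s) = \begin{pmatrix} A & B \\ C & -A^T \end{pmatrix} X(t, s), \quad t, s \in [0, T], \quad X(s, s) = I,$

where $I$ is the $2\nu \times 2\nu$ identity matrix. We shall partition the matrix $X(t, s)$ into four $\nu \times \nu$ blocks as follows:

145    $X(t, s) = \begin{pmatrix} X_{11}(t, s) & X_{12}(t, s) \\ X_{21}(t, s) & X_{22}(t, s) \end{pmatrix}.$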
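We now see that for any $t \in [0, T]$,

146    $x(T) = X_{11}(T, t)\,x(t) + X_{12}(T, t)\,p(t),$

147    $p(T) = X_{21}(T, t)\,x(t) + X_{22}(T, t)\,p(t).$

Now because of (143), we obtain,

148    $X_{21}(T, t)\,x(t) + X_{22}(T, t)\,p(t) = -\Phi[X_{11}(T, t)\,x(t) + X_{12}(T, t)\,p(t)],$

and hence, for $t \in [0, T]$,

149    $[\Phi X_{12}(T, t) + X_{22}(T, t)]\,p(t) = -[\Phi X_{11}(T, t) + X_{21}(T, t)]\,x(t).$

Thus $p(t)$ is determined by $x(t)$ provided the inverse of the matrix

150    $E(t) = \Phi X_{12}(T, t) + X_{22}(T, t)$

exists for all $t \in [0, T]$. (Note that $E(T)$ is nonsingular, since $E(T) = I$, the $\nu \times \nu$ identity matrix.) Let us suppose for the moment that $E(t)$ is nonsingular for all $t \in [0, T]$. Then, for $t \in [0, T]$, defining the matrix $K(t)$ by

151    $K(t) = -E(t)^{-1}[\Phi X_{11}(T, t) + X_{21}(T, t)],$

we have

152    $p(t) = K(t)\,x(t),$

and hence (141) becomes

153    $\frac{d}{dt}x(t) = [A + BK(t)]\,x(t), \quad t \in [0, T], \quad x(0) = \hat{x}_0.$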
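Differentiating the relation $E(t)K(t) = -[\Phi X_{11}(T, t) + X_{21}(T, t)]$, which follows from (151), we obtain

154    $\dot{E}(t)K(t) + E(t)\dot{K}(t) = -[\Phi\dot{X}_{11}(T, t) + \dot{X}_{21}(T, t)],$

155    $\dot{E}(t) = \Phi\dot{X}_{12}(T, t) + \dot{X}_{22}(T, t),$

where the super dot denotes differentiation. Now, it is well known that

156    $\frac{d}{dt}X(T, t) = X(T, t)\begin{pmatrix} -A & -B \\ -C & A^T \end{pmatrix},$

which yields, upon rearranging terms and multiplying both sides by $E(t)^{-1}$ (see (150)),

158    $\frac{d}{dt}K(t) = -K(t)A - A^TK(t) - K(t)BK(t) + C, \quad t \in [0, T], \quad K(T) = -\Phi,$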
where the boundary condition $K(T) = -\Phi$ is obtained directly from (151) by inspection. Note that (158) is identical with (118) for the case $A(t) = A$, $B(t) = B$, $C(t) = C$, $D(t) = -A^T$. We shall now show that the matrix $E(t)$ in (150) is indeed nonsingular for all $t \in [0, T]$, and hence that (158) has a solution $K(t)$ for $t \in [0, T]$ which is given explicitly by (151).

159 Theorem. For $t \in [0, T]$, the matrix $E(t)$ defined by (150) is nonsingular.
Proof. We begin by observing that if $x'(t)$, $p'(t)$ satisfy (141), (142) for $t \in [t', T]$, with $t' \in [0, T]$, $x'(t') = x_0'$, $p'(t') = p_0'$, then

160    $\frac{d}{dt}\langle x'(t), p'(t)\rangle = \langle x'(t), Cx'(t)\rangle + \langle p'(t), Bp'(t)\rangle, \quad t \in [t', T].$

Hence, we must have

161    $\langle x'(T), p'(T)\rangle - \langle x_0', p_0'\rangle = \int_{t'}^{T} \langle x'(t), Cx'(t)\rangle + \langle p'(t), Bp'(t)\rangle\,dt.$

Also, we observe that $E(T) = I$, the $\nu \times \nu$ identity matrix, and hence is nonsingular. Now suppose that $E(t')$ is singular for some $t' \in [0, T)$; then there must exist a nonzero vector $p_0' \in \mathbb{R}^\nu$, such that

162    $[\Phi X_{12}(T, t') + X_{22}(T, t')]\,p_0' = 0.$

Now let $x'(t)$, $p'(t)$ satisfy (141), (142) for $t \in [t', T]$ with $x'(t') = 0$, $p'(t') = p_0'$. Then we must have

163    $x'(T) = X_{12}(T, t')\,p_0', \quad p'(T) = X_{22}(T, t')\,p_0',$

and hence, because of (162),

164    $\Phi x'(T) + p'(T) = 0.$

However, by (161), we must also have

165    $\langle x'(T), p'(T)\rangle = \int_{t'}^{T} \langle x'(t), Cx'(t)\rangle + \langle p'(t), Bp'(t)\rangle\,dt.$

Making use of (164), we now obtain

166    $0 = \langle x'(T), \Phi x'(T)\rangle + \int_{t'}^{T} \langle x'(t), Cx'(t)\rangle + \langle p'(t), Bp'(t)\rangle\,dt.$

By assumption, the matrices $B$, $C$, and $\Phi$ are all symmetric and positive semidefinite. Consequently, we must have

167    $\Phi x'(T) = 0, \quad Cx'(t) = 0, \quad Bp'(t) = 0 \quad \text{for } t \in [t', T],$

for (166) to hold. But $Cx'(t) = 0$ for $t \in [t', T]$ implies that

168    $\frac{d}{dt}p'(t) = -A^Tp'(t), \quad t \in [t', T],$

and hence,

169    $p'(T) = \exp[-(T - t')A^T]\,p_0' \ne 0.$

But by (164) and (167),

170    $p'(T) = -\Phi x'(T) = 0,$

which contradicts (169). Consequently, the only $p_0'$ for which (162) can hold is the zero vector, i.e., $E(t)$ is nonsingular for all $t \in [0, T]$. ■

171 Exercise. Show that the matrix $K(t)$ defined by (158) is symmetric and negative semidefinite for all $t \in [0, T]$. ■

This concludes our discussion of unconstrained optimization and boundary-value problems as they occur in continuous optimal control.
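As a numerical illustration of the regulator results above, the following minimal sketch treats the scalar case ($\nu = 1$). It integrates the Riccati equation (158) backward from $K(T) = -\Phi$ with a classical Runge-Kutta scheme and compares the result with the expression $K(t) = -E(t)^{-1}[\Phi X_{11}(T,t) + X_{21}(T,t)]$, $E(t) = \Phi X_{12}(T,t) + X_{22}(T,t)$, obtained by pushing the condition $p(T) = -\Phi x(T)$ back through the transition matrix of (141), (142). The numerical values of `a`, `b`, `c`, `phi`, `T`, the step count `n`, and all function names are assumptions chosen for illustration only; the closed-form transition matrix used here is valid only in the time-invariant scalar case with $a^2 + bc > 0$.

```python
import math

def riccati_rhs(K, a, b, c):
    # Scalar form of (158): dK/dt = -K a - a K - K b K + c
    return -2.0 * a * K - b * K * K + c

def integrate_riccati_backward(a, b, c, phi, T, n):
    """Classical RK4 marching from t = T down to t = 0, starting at K(T) = -phi."""
    h = T / n
    K = -phi
    values = [K]
    for _ in range(n):
        k1 = riccati_rhs(K, a, b, c)
        k2 = riccati_rhs(K - 0.5 * h * k1, a, b, c)
        k3 = riccati_rhs(K - 0.5 * h * k2, a, b, c)
        k4 = riccati_rhs(K - h * k3, a, b, c)
        K -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        values.append(K)
    values.reverse()  # values[j] approximates K(j * h)
    return values

def K_and_E_from_transition_matrix(t, a, b, c, phi, T):
    """K(t) and E(t) from X(T, t) = exp(M (T - t)), M = [[a, b], [c, -a]].

    Since M*M = (a*a + b*c) I, the exponential has the closed form
    cosh(w tau) I + (sinh(w tau) / w) M with w = sqrt(a*a + b*c) > 0.
    """
    tau = T - t
    w = math.sqrt(a * a + b * c)
    ch = math.cosh(w * tau)
    sh = math.sinh(w * tau) / w
    x11, x12 = ch + a * sh, b * sh
    x21, x22 = c * sh, ch - a * sh
    E = phi * x12 + x22
    return -(phi * x11 + x21) / E, E

# Illustrative data (assumed): b >= 0, c >= 0, phi >= 0 as in the text.
a, b, c, phi, T, n = 0.5, 2.0, 3.0, 0.5, 1.0, 2000
K_rk4 = integrate_riccati_backward(a, b, c, phi, T, n)
pairs = [K_and_E_from_transition_matrix(j * T / n, a, b, c, phi, T) for j in range(n + 1)]
max_gap = max(abs(K_rk4[j] - pairs[j][0]) for j in range(n + 1))
print("largest gap between the two computations of K(t):", max_gap)
print("K(0) =", K_rk4[0], " smallest E(t) on the grid =", min(E for _, E in pairs))
```

In this example $E(t)$ stays positive on the whole interval and $K(t)$ stays nonpositive, which is consistent with theorem (159) and exercise (171) in the scalar case.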
4
EQUALITY AND INEQUALITY CONSTRAINTS

4.1 Penalty Function Methods
We now return to the constrained minimization problems (1.1.1), (1.1.3) and (1.1.8). Most of our results in this section will be for the nonlinear programming problem (1.1.1), with the discrete optimal control problem (1.1.3) being treated as a special case of (1.1.1). The treatment of the continuous optimal control problem (1.1.8) is considerably more difficult and more involved than that of the preceding finite dimensional problems and we shall content ourselves with only a descriptive presentation of some of the more important results. For the purpose of presenting penalty function methods, it is convenient to write the nonlinear programming problem (1.1.1) in the following, more compact form:

1    (CP)    $\min\{f^0(z) \mid z \in C \subset \mathbb{R}^n\},$

where $f^0 : \mathbb{R}^n \to \mathbb{R}^1$ is a continuously differentiable function and $C$ is a closed subset of $\mathbb{R}^n$. The underlying idea behind penalty function methods is that of solving the problem (1), which we shall call (CP), by constructing a sequence of points $z_i \in \mathbb{R}^n$ which are optimal for a sequence of unconstrained minimization problems of the form

2    (UP)$_i$    $\min\{f^0(z) + p_i(z) \mid z \in \mathbb{R}^n\}, \quad i = 0, 1, 2, \ldots,$

which we shall call (UP)$_i$, the (UP)$_i$ being so constructed that the $z_i \to \hat{z} \in C$ as $i \to \infty$ and $\hat{z}$ is optimal for (CP). There are two major, basically different
approaches to constructing the problems (UP)$_i$, or to be more specific, the penalty functions $p_i(\cdot)$. The first approach, which is known as the exterior penalty function method, was proposed by Courant in 1943 [C5], at a time when almost all the well-known nonlinear programming algorithms of today were still to be invented. In the exterior penalty function method, the functions $p_i(\cdot)$ are chosen so as to make it progressively more and more expensive to pick a point not in $C$, and so that either $p_i(z) = 0$ for all $z \in C$ or so that $p_i(z) \to 0$ as $i \to \infty$ for all $z \in C$. In presenting exterior penalty function methods, we shall draw upon the relatively recent work of Zangwill [Z3], rather than upon the early work of Courant which was specifically directed towards the solution of problems in differential equations.

The second approach is called the interior penalty function method. In it, the penalty functions $p_i(\cdot)$ are chosen so that the $z_i$ which are optimal for the (UP)$_i$ must all belong to the interior of $C$, $f^0(z) + p_i(z) \to f^0(z)$ as $i \to \infty$ for all $z$ in the interior of $C$, and $f^0(z) + p_i(z) \to \infty$ as the point $z$ approaches the boundary of $C$ from within. Because of this last property of the $p_i(\cdot)$, interior penalty function methods are also known as barrier methods, since they repel the points $\overset{j}{z}$, $j = 0, 1, 2, \ldots$, constructed by any one of the algorithms discussed in the preceding chapter, in the process of solving (UP)$_i$, from the boundary of $C$. Because of this, the initial point $\overset{0}{z}$ used for solving (UP)$_i$ must be picked in the interior of $C$. In our exposition of interior penalty function methods we shall draw upon the work of the two best known contributors in this area, Fiacco and McCormick [F1], [F2]. Incidentally, their book [F2] is the most comprehensive exposition of penalty function methods presently available and the reader is referred to it for further reading.

As we shall also see, there are many situations where interior and exterior penalty function methods can be combined profitably to yield a mixed approach. For additional reading on this subject the reader is referred to [F2] and to [L3]. In our presentation, we shall proceed as follows: First we shall establish the properties with which penalty functions must be endowed in order to guarantee that the $z_i$ which are optimal for the (UP)$_i$ converge to a point which is optimal for (CP). After that we shall examine the computational aspects of penalty function methods. Finally, we shall discuss their application to optimal control problems.
Exterior Penalty Function Methods
The reason for switching to the plural, i.e., for saying methods rather than method, is that each choice of a family of penalty functions $p_i(\cdot)$ in (2) results in a different algorithm. Thus, there is a whole class of exterior penalty function methods.
Exterior penalty functions.

3 Definition. Let $C$ be a closed subset of $\mathbb{R}^n$. A sequence of continuous functions $p_i' : \mathbb{R}^n \to \mathbb{R}^1$, $i = 0, 1, 2, \ldots$, is called a sequence of exterior penalty functions for the set $C$ if

4    $p_i'(z) = 0$ for all $z \in C$, $i = 0, 1, 2, \ldots,$

5    $p_i'(z) > 0$ for all $z \notin C$, $i = 0, 1, 2, \ldots,$

6    $p_{i+1}'(z) > p_i'(z)$ for all $z \notin C$, $i = 0, 1, 2, \ldots,$

7    $p_i'(z) \to \infty$ as $i \to \infty$ for all $z \notin C$. ■
Now, consider the problem (CP) defined in (1) and let $p_i'(\cdot)$, $i = 0, 1, 2, \ldots$, be a sequence of exterior penalty functions for the set $C$ in (1). We introduce a sequence of unconstrained minimization problems (UP)$_i^e$, defined as follows:

8    (UP)$_i^e$    $\min\{f^0(z) + p_i'(z) \mid z \in \mathbb{R}^n\}, \quad i = 0, 1, 2, \ldots.$
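The sequence of problems (8) can be illustrated on a one-dimensional toy problem. The sketch below is only an illustration under stated assumptions: the cost $f^0(z) = (z - 2)^2$, the set $C = \{z \mid z - 1 \le 0\}$, the quadratic penalty $\alpha_i [\max\{z - 1, 0\}]^2$ (which vanishes on $C$, is positive off $C$, and grows with $\alpha_i$), the values of $\alpha_i$, and the helper names are all chosen here for the example; each unconstrained problem is solved by a plain golden section search on a bracket assumed to contain the minimizer.

```python
import math

def golden_section(phi, lo, hi, tol=1e-9):
    """Minimize a unimodal function phi on [lo, hi] by golden section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c = b - r * (b - a)
    d = a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            # The minimizer lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            # The minimizer lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def f0(z):
    # Illustrative cost; its unconstrained minimizer z = 2 is infeasible.
    return (z - 2.0) ** 2

def g(z):
    # Illustrative constraint function: C = {z | g(z) <= 0} = (-inf, 1].
    return z - 1.0

def exterior_penalty(z, alpha, beta=2.0):
    # Zero on C, positive outside C, increasing in alpha outside C.
    return alpha * max(g(z), 0.0) ** beta

alphas = [1.0, 10.0, 100.0, 1000.0]  # strictly increasing penalty weights
z_seq = [
    golden_section(lambda z, a=a: f0(z) + exterior_penalty(z, a), -5.0, 5.0)
    for a in alphas
]
for a, z in zip(alphas, z_seq):
    print("alpha =", a, " minimizer =", z, " constraint value =", g(z))
```

For this particular example the minimizer of the penalized cost can be worked out by hand as $(2 + \alpha_i)/(1 + \alpha_i)$, so the computed points approach the constrained optimum $z = 1$ from outside $C$, which is the typical behavior of an exterior method.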
To ensure that both the problem (CP) and the problems (UP)$_i^e$ have a solution, it suffices to make the following assumption:

9 Assumption. We shall suppose that there is a point $z'$
$Z' = \{z \mid f^0(z)$ ... 0. Hence, we obtain, upon letting $i \to \infty$, $f^0(\hat{z}) \le \hat{m}$. Consequently, we must have $f^0(\hat{z}) = \hat{m}$, i.e., $\hat{z}$ must be optimal for (CP). ■

24 Exercise. Suppose that the sequence $\{z_i\}_{i=0}^{\infty}$ is defined as in theorem (21). Show that if $z_j \in C$, then $z_i \in C$ for all $i \ge j$. ■
25 Exercise. Suppose that $f^j : \mathbb{R}^n \to \mathbb{R}^1$, $j = 1, 2, \ldots, m$, are continuous functions, and that

26    $C = \{z \mid f^j(z) \le 0,\ j = 1, 2, \ldots, m\}.$

Show that the functions $p_i' : \mathbb{R}^n \to \mathbb{R}^1$, $i = 0, 1, 2, \ldots$, defined by

27    $p_i'(z) = \alpha_i \sum_{j=1}^{m} [\max\{f^j(z), 0\}]^{\beta},$

where $\beta \ge 1$ and $\alpha_i$, $i = 0, 1, 2, \ldots$, is a strictly increasing sequence of positive numbers which tends to $\infty$ as $i \to \infty$, is a sequence of exterior penalty functions for the set $C$ in (26). ■
28 Exercise. Suppose that the functions $f^j(\cdot)$ in (26) are continuously differentiable and that $\beta \ge 2$. Show that under this assumption the functions $p_i'(\cdot)$ defined by (27) are also continuously differentiable.

29 Exercise. Suppose that $r : \mathbb{R}^n \to \mathbb{R}^l$ is continuous and that

30    $C = \{z \mid r(z) = 0\}.$

Show that the functions $p_i'(\cdot)$, $i = 0, 1, 2, \ldots$, defined by

31    $p_i'(z) = \alpha_i \|r(z)\|^{\beta},$

where $\beta \ge 1$ and $\alpha_i$, $i = 0, 1, 2, \ldots$, is a strictly increasing sequence of positive numbers which tends to $\infty$ as $i \to \infty$, is a sequence of exterior penalty functions for the set $C$ in (30). Also, show that $p_i'(\cdot)$ is continuously differentiable whenever $r(\cdot)$ is continuously differentiable and $\beta \ge 2$. ■
32 Exercise. Suppose that $\{p_i'(\cdot)\}_{i=0}^{\infty}$ is a sequence of exterior penalty functions for the set $C$ and that $\{\tilde{p}_i'(\cdot)\}_{i=0}^{\infty}$ is a sequence of exterior penalty functions for the set $\tilde{C}$. Show that $\{p_i'(\cdot) + \tilde{p}_i'(\cdot)\}_{i=0}^{\infty}$ is a sequence of exterior penalty functions for the set $C \cap \tilde{C}$ and that $\{\min\{p_i'(\cdot), \tilde{p}_i'(\cdot)\}\}_{i=0}^{\infty}$ is a sequence of exterior penalty functions for the set $C \cup \tilde{C}$.
33 Remark. Exterior penalty functions can be used not only to transform a constrained minimization problem into a sequence of unconstrained minimization problems, but also to remove constraints which certain algorithms will not accept. Suppose that we wish to solve the general nonlinear programming problem (1.1.1), i.e., $\min\{f^0(z) \mid r(z) = 0,\ f(z) \le 0\}$, and that the function $r(\cdot)$ is not affine. Virtually all the methods to be described later are inapplicable to this problem because of the nonlinear equality constraint $r(z) = 0$. However, suppose that $\{p_i'(\cdot)\}_{i=0}^{\infty}$ is a sequence of exterior penalty functions for the set $\{z \mid r(z) = 0\}$; then, under suitable assumptions, we can use a number of methods to solve the sequence of problems $\min\{f^0(z) + p_i'(z) \mid f(z) \le 0\}$, $i = 0, 1, 2, \ldots$, to obtain a solution for the original problem. By removing only some of the constraints with penalty functions rather than all, we may hope to obtain better numerical behavior (better "conditioning") in our computations. ■
34 Exercise. Consider the problem (1.1.1), i.e., $\min\{f^0(z) \mid r(z) = 0,\ f(z) \le 0\}$, where all the functions are continuously differentiable. Let $\{p_i'(\cdot)\}_{i=0}^{\infty}$ be a sequence of exterior penalty functions for the set $\{z \mid r(z) = 0\}$, and consider the sequence of problems (P$_i$): $\min\{f^0(z) + p_i'(z) \mid f(z) \le 0\}$, $i = 0, 1, 2, \ldots$. Give sufficient conditions for the problems (1.1.1) and (P$_i$) to have solutions. Show that if $z_i$ is an optimal point for the problem (P$_i$), then any accumulation point of the sequence $\{z_i\}_{i=0}^{\infty}$ is an optimal point for the problem (1.1.1). ■
35 Exercise. Consider the problem

36    (CP)$_L$    $\min\{f^0(z) \mid z \in C \subset L\},$

where $L$ is a Banach space, $C$ is a closed subset of $L$, and $f^0 : L \to \mathbb{R}^1$ is a continuous function. Show that all the results presented so far remain valid for the problem (CP)$_L$ provided we replace $\mathbb{R}^n$ by $L$ in (3), (8), (10), (11), (18), (21) and all the preceding exercises.
37 Exercise. Consider the continuous optimal control problem

38    minimize $\int_0^T \|x(t) - x'(t)\|^2\,dt,$

subject to

39    $\frac{d}{dt}x(t) = Ax(t) + Bu(t), \quad t \in [0, T], \quad x(t) \in \mathbb{R}^\nu, \quad u(t) \in \mathbb{R}^\mu, \quad x(0) = \hat{x}_0,$

40    $\int_0^T \|u(t)\|^2\,dt \le 1,$

where the matrices $A$, $B$ are constant, and $x'(\cdot)$ is a given nominal trajectory. Show that by setting $L = L_2^{\mu}[0, T]$,

41    $C = \{u(\cdot) \in L_2^{\mu}[0, T] \mid \|u\|_2^2 \le 1\},$

and by defining $p_i'(\cdot)$ as

42    $p_i'(u) = \alpha_i \max\{(\|u\|_2^2 - 1), 0\},$

where $\alpha_i$, $i = 0, 1, 2, \ldots$, is a sequence of strictly increasing positive numbers which tend to $\infty$ as $i \to \infty$, we obtain a sequence of unconstrained problems,

43    minimize $\int_0^T \|x(t; u) - x'(t)\|^2\,dt + p_i'(u),$

whose solutions $\hat{u}_i(\cdot)$ and corresponding trajectories $x(\cdot\,; \hat{u}_i)$ may converge to the optimal solution $\hat{u}(\cdot)$ and optimal trajectory $\hat{x}(\cdot)$ of (38)-(40) as $i \to \infty$. In (43) we denoted by $x(t; u)$ the solution of (39) at time $t$ corresponding to the given initial state $\hat{x}_0$ and the indicated input $u(\cdot)$. ■
Interior Penalty Function Methods

We now proceed to interior penalty function methods for solving the problem (CP) defined in (1). As in the case of exterior penalty function methods, the results we are about to present are trivially extendable to the infinite dimensional problem (CP)$_L$ defined in exercise (35) by replacing $\mathbb{R}^n$ with $L$ in all the statements to follow.
44 Assumption. We shall assume (i) that the set $C$ in (1) is closed, that it has an interior, and that the closure of the interior of $C$ is equal to $C$, i.e., that $\overline{\mathring{C}} = C \ne \varnothing$, the empty set; (ii) that (9) is satisfied. (We denote the interior of $C$ by $\mathring{C}$.) ■

The effect of assumption (44) (i) is to rule out constraint sets with "whiskers," such as the set in $\mathbb{R}^n$,

45    $\{z \mid (\|z - c\|^2 - \|r\|^2)(\langle z - c, r\rangle - \|r\|^2)^2 \le 0\},$

which consists of a ball of radius $\|r\|$ and center $c$, and of a tangent line which passes through the point $c + r$ and is orthogonal to $r$. The reason for this assumption is that interior penalty function methods construct sequences in the interior of $C$ and hence could never find an optimal point located on a "whisker."

Interior penalty functions.
46 Definition. Let $C$ be a subset of $\mathbb{R}^n$ which satisfies assumption (44). A sequence of continuous functions $p_i'' : \mathring{C} \to \mathbb{R}^1$, $i = 0, 1, 2, \ldots$ (where $\mathring{C}$ is the interior of $C$) is said to be a sequence of interior penalty functions for the set $C$ if

47    $0 < p_{i+1}''(z) < p_i''(z)$ for all $z \in \mathring{C}$ and $i = 0, 1, 2, 3, \ldots,$

48    $p_i''(z) \to 0$ as $i \to \infty$ for all $z \in \mathring{C},$

49    $p_i''(z_j) \to \infty$ as $j \to \infty$ for any sequence $\{z_j\} \subset \mathring{C}$ such that $z_j \to z^* \in \partial C$ as $j \to \infty$, and $i = 0, 1, 2, \ldots.$ ■

(In (49) $\partial C$ denotes the boundary of $C$.) Now consider the sequence of problems

50    (UP)$_j^i$    $\min\{f^0(z) + p_j''(z) \mid z \in \mathring{C}\}, \quad j = 0, 1, 2, 3, \ldots,$*
where the $p_j''(\cdot)$ are interior penalty functions for the set $C$ in (1).

51 Lemma. Consider the problems (UP)$_j^i$ above and suppose that there is a $z'' \in \mathring{C}$ such that the set $\{z \mid f^0(z) \le f^0(z'') + p_0''(z'')\}$ is compact. Then for $j = 0, 1, 2, \ldots$, there exists a $z_j \in \mathring{C}$ which minimizes $f^0(z) + p_j''(z)$ over $z \in \mathring{C}$ ... $f^0(z'') + p_j''(z'')$, and hence $z_j$ is optimal for (UP)$_j^i$. ■
53 Exercise. Suppose that we have a point $\overset{0}{z} \in \mathring{C}$. In practice, the $p_j''(\cdot)$ are defined for all $z \in \mathbb{R}^n$, $z \notin \partial C$, and to use any one of the algorithms described in Section 2.1 to solve the problems (UP)$_j^i$, $j = 0, 1, 2, \ldots$, it is necessary to modify slightly the step size subprocedures so as to ensure that the sequences which they construct remain in $C$. Devise such modifications for the subprocedures (2.1.14), (2.1.33) and (2.1.36) to be used in the algorithms (2.1.16), (2.1.19) and (2.1.39, respectively. Make sure that your modifications result in algorithms for which the conclusions of theorem (2.1.22) remain valid. ■
* Note that we use the $i$ in (UP)$_j^i$ to indicate "interior," while $j = 0, 1, 2, 3, \ldots$, is the running index.

54 Theorem. For $j = 0, 1, 2, \ldots$, let $z_j$ be optimal for the unconstrained problem (UP)$_j^i$. Then, assuming that (44) is satisfied, every accumulation point $\hat{z}$ of the sequence $\{z_j\}_{j=0}^{\infty}$ is optimal for the constrained problem (CP).*
Proof. For $j = 0, 1, 2, \ldots$, let

55    $b_j = \min\{f^0(z) + p_j''(z) \mid z \in \mathring{C}\},$

56    $b = \min\{f^0(z) \mid z \in C\}.$

Then, because of (47), we have

57    $b_j > b_{j+1} \ge b, \quad j = 0, 1, 2, \ldots.$

Since the $b_j$ form a bounded, monotonically decreasing sequence, they must converge to some $b' \ge b$. Suppose that $b' > b$. Let $\hat{z}$ be an optimal point for (CP); then, since $f^0(\cdot)$ is continuous and because of (i) in assumption (44), there must exist an open ball $B$ with center $\hat{z}$ such that $B \cap \mathring{C} \ne \varnothing$, the empty set, and for all $z'' \in B$,

58    $f^0(z'') < b' - \frac{(b' - b)}{2}.$

Now take any point $z''$ in $B \cap \mathring{C}$. Then, since by (48) $p_j''(z'') \to 0$ as $j \to \infty$, there exists an integer $N$ such that for all $j \ge N$,

59    $p_j''(z'') < \frac{(b' - b)}{4},$

and hence for all $j \ge N$,

60    $b_j \le f^0(z'') + p_j''(z'') < b' - \frac{(b' - b)}{4},$

which contradicts the fact that $b_j \to b'$. Hence we must have $b' = b$. Now, let $z^*$ be any accumulation point of the sequence $\{z_j\}_{j=0}^{\infty}$, i.e., $z_j \to z^*$ as $j \to \infty$, $j \in K \subset \{0, 1, 2, 3, \ldots\}$. Suppose that $z^*$ is not optimal for (CP). Then we must have $f^0(z^*) > b$, and hence the sequence $\{[f^0(z_j) - b] + p_j''(z_j)\}$, $j \in K$, cannot converge to zero, which contradicts the fact that $(b_j - b) \to 0$ as $j \to \infty$. Hence we must have $f^0(z^*) = b$, i.e., $z^*$ is optimal for (CP). ■
* We call the problems (UP)$_j^i$ unconstrained because they can be solved by simple modifications of the algorithms in Section 2.1; see exercise (53).

61 Exercise. Suppose that for $i = 1, 2, \ldots, m$, $f^i : \mathbb{R}^n \to \mathbb{R}^1$ is a continuous function, and consider the set

62    $C = \{z \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\}.$

Assuming that assumption (44) is satisfied, that for every $z \in \mathring{C}$, $f^i(z) < 0$, $i = 1, 2, \ldots, m$, and that $\alpha_j$, $j = 0, 1, 2, 3, \ldots$, is a sequence of strictly decreasing positive numbers which converge to 0 as $j \to \infty$ ($\alpha_j \downarrow 0$), show that interior penalty functions for the set $C$ can be defined at least in the following two ways:

63    $p_j''(z) = -\alpha_j \sum_{i=1}^{m} \frac{1}{f^i(z)}, \quad z \in \mathring{C}, \quad j = 0, 1, 2, \ldots,$

64    $p_j''(z) = -\alpha_j \sum_{i=1}^{m} \log[-f^i(z)] = -\alpha_j \log\left((-1)^m f^1(z) f^2(z) \cdots f^m(z)\right), \quad z \in \mathring{C}, \quad j = 0, 1, 2, 3, \ldots,$

with (64) defining a penalty function only if $M = \max_i(-\min_z f^i(z)) < \infty$.* ■
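The behavior of a logarithmic interior penalty function of the kind appearing in (64) can be seen on a one-dimensional toy problem. The sketch below is an illustration only: the cost $f^0(z) = z$, the single constraint function $f^1(z) = 1 - z$ (so that $C = [1, \infty)$), the values of $\alpha_j$, the search bracket, and the helper names are all assumptions made for this example. The search is confined to $(1, 2]$, where $-\log(z - 1)$ is positive, which mirrors the boundedness caveat attached to (64).

```python
import math

def golden_section(phi, lo, hi, tol=1e-9):
    """Minimize a unimodal function phi on [lo, hi] by golden section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c = b - r * (b - a)
    d = a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def f0(z):
    # Illustrative cost; on C = [1, inf) its minimizer is the boundary point z = 1.
    return z

def f1(z):
    # Illustrative constraint function: C = {z | f1(z) <= 0}, interior where f1 < 0.
    return 1.0 - z

def log_barrier(z, alpha):
    # Tends to +infinity as z approaches the boundary from inside C.
    return -alpha * math.log(-f1(z))

alphas = [0.5, 0.1, 0.02, 0.004]  # strictly decreasing to zero
z_seq = [
    # All probe points lie strictly inside (1, 2], so the logarithm is defined.
    golden_section(lambda z, a=a: f0(z) + log_barrier(z, a), 1.0 + 1e-12, 2.0)
    for a in alphas
]
for a, z in zip(alphas, z_seq):
    print("alpha =", a, " minimizer =", z, " f1(z) =", f1(z))
```

Here the minimizer of $z - \alpha_j \log(z - 1)$ can be found by hand as $1 + \alpha_j$, so every computed point is strictly feasible and the sequence approaches the constrained optimum $z = 1$ from within $C$.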
65 Exercise. Consider again the optimal control problem (38)-(40). Show that it can be solved by obtaining the limit point $\hat{u}(\cdot)$ of the $\hat{u}_j(\cdot) \in L_2^{\mu}[0, T]$, which are optimal for the unconstrained sequence of problems,

66    minimize $\int_0^T \|x(t; u) - x'(t)\|^2\,dt - \alpha_j \log\left[1 - \int_0^T \|u(t)\|^2\,dt\right],$

subject to $u(\cdot) \in L_2^{\mu}[0, T]$, $j = 0, 1, 2, \ldots$, where the $\alpha_j > 0$ decrease strictly to zero as $j \to \infty$. ■
* Strictly speaking, we should set $p_j''(z) = -\alpha_j \sum_{i=1}^{m} \log(-f^i(z)/M)$. However, the term $\alpha_j \log M$ has no effect on the optimal $z$, and hence may be omitted.

Exterior-Interior Penalty Function Methods

As we shall soon show, under suitable assumptions, exterior and interior penalty function methods can be combined to produce a mixed method for solving the problem (CP) defined in (1).

67 Assumption. We shall suppose (i) that the $C$ in (1) is of the form $C = C' \cap C''$, where $C'$ is closed and $C''$ is closed and satisfies (44), i.e., $\overline{\mathring{C}''} = C''$; (ii) that (9) is satisfied; (iii) that for at least one $\hat{z} \in C$ which is optimal for (CP) there exists an open ball $B$ such that $\hat{z} \in B$ and the set $B \cap C' \cap \mathring{C}''$ is not empty. ■

Now consider the sequence of unconstrained minimization problems,

68    (UP)$_i^m$    $\min\{f^0(z) + p_i'(z) + p_i''(z) \mid z \in \mathring{C}''\}, \quad i = 0, 1, 2, 3, \ldots,$
where the $p_i'(\cdot)$ are exterior penalty functions for the set $C'$ and the $p_i''(\cdot)$ are interior penalty functions for the set $C''$, with $C'$, $C''$ defined as in (67).

69 Exercise. Assuming that there is a $z'' \in \mathring{C}''$ such that the set

$\{z \mid f^0(z) \le f^0(z'') + p_0'(z'') + p_0''(z'')\}$

is compact, show that for $i = 0, 1, 2, \ldots$, there is a point $z_i \in \mathring{C}''$ which minimizes $f^0(z) + p_i'(z) + p_i''(z)$ over $z \in \mathring{C}''$. [Hint: See lemma (51).]

70 Theorem. For $i = 0, 1, 2, \ldots$, let $z_i$ be optimal for the unconstrained problem (UP)$_i^m$. Then, assuming that (67) is satisfied, every accumulation point of the sequence $\{z_i\}_{i=0}^{\infty}$ is optimal for the constrained problem (CP) defined in (1).
Proof. Without loss of generality, we may assume that $z_i \to \tilde{z}$ as $i \to \infty$. We shall show that $\tilde{z} \in C$. Since $C''$ is closed by assumption (67), $\tilde{z} \in C''$; hence suppose that $\tilde{z} \notin C'$. By assumption (67) (iii) there is a point $z'' \in C' \cap \mathring{C}''$. For this $z''$ we must have, by definitions (3) and (46),

71    $f^0(z'') + p_i'(z'') + p_i''(z'') \to f^0(z'')$ as $i \to \infty.$

(In fact, $p_i'(z'') = 0$ for $i = 0, 1, 2, \ldots$.) Let $\delta > 0$ be arbitrary; then there exists an integer $N' \ge 0$ such that

72    $f^0(z'') + p_i'(z'') + p_i''(z'') < f^0(z'') + \delta$ for all $i \ge N'.$

Now, since $\tilde{z} \notin C'$, there exists an integer $N''$ and an open ball $B$ with center $\tilde{z}$, such that

73    $f^0(z) + p_i'(z) > f^0(z'') + \delta$, for all $z \in B$, $i \ge N''$

(compare the proof of lemma (18)). Since $z_i \to \tilde{z}$, there is an integer $N''' \ge 0$ such that $z_i \in B$ for all $i \ge N'''$. Let $N = \max\{N', N'', N'''\}$; then

74    $f^0(z_i) + p_i'(z_i) + p_i''(z_i) > f^0(z'') + \delta > f^0(z'') + p_i'(z'') + p_i''(z'')$

for all $i \ge N$, which contradicts the optimality of the $z_i$, $i \ge N$. Hence we must have $\tilde{z} \in C$. Now suppose that $\tilde{z}$ is not optimal for (CP). Then by (67) (iii), there exists a $\hat{z} \in C$ which is optimal for (CP), with an associated open ball $B$ centered at $\hat{z}$, such that

75    $f^0(z^*)$ ... 0 and $f^0(z_i)$ is bounded, $i = 0, 1, 2, 3, \ldots$, there must exist a subsequence $f^0(z_i) + p_i'(z_i) + p_i''(z_i)$, $i \in K \subset \{0, 1, 2, 3, \ldots\}$, such that* (since $f^0(z_i) \to f^0(\tilde{z})$)

78    $f^0(z_i) + p_i'(z_i) + p_i''(z_i) \to \tilde{b} \ge f^0(\tilde{z})$, as $i \to \infty$, $i \in K.$

But then (76) contradicts the optimality of $z_i$ for $i$ sufficiently large, $i \in K$. Hence, $\tilde{z}$ must be optimal for (CP). ■
* Since $z_i \to \tilde{z}$ as $i \to \infty$ and $f^0(\cdot)$ is continuous, there exists a $\beta < \infty$ such that $|f^0(z_i)| < \beta$ for $i = 0, 1, 2, \ldots$.

Computational Aspects

The preceding theorems have established the essential property of penalty function methods: that under rather mild assumptions they will construct sequences which converge to points that are optimal for the problem (CP) defined in (1). All of the preceding results implicitly depend upon the gross idealization that we can solve the unconstrained minimization problems (UP)$_i$, $i = 0, 1, 2, \ldots$, where we drop the characterizing superscripts $e$ (exterior), $i$ (interior) and $m$ (mixed). Let us continue for a little longer to assume that we can solve the problems (UP)$_i$, and in addition, that this can be done by means of the algorithms presented in the first section of Chapter 2. In this context we shall now comment on two computational aspects. The first is that of how to start and how to end, or rather truncate, the process of constructing the "minimizing" sequence of points $z_i$. The second aspect concerns the relative merits of the various penalty function methods that we have presented.

Suppose that we are given an $\epsilon > 0$ and that we are willing to settle for a point $\hat{z}$ such that $|f^0(\hat{z}) - \hat{m}| < \epsilon$ and $d(\hat{z}, C) < \epsilon$, where $\hat{m} = \min\{f^0(z) \mid z \in C\}$, and $d(\cdot, \cdot)$ is a suitably defined distance function. All penalty function methods construct points $z_i$, which are optimal for (UP)$_i$, such that $f^0(z_i) \to \hat{m}$ and $z_i$ is either in $C$ or else $z_i \to \hat{z} \in C$. Hence, in principle, there is an integer $j$ such that $z_j$ satisfies our requirements. Suppose that we know this integer $j$, which in practice will be fairly large, and suppose that we proceed to minimize $f^0(z) + p_j(z)$, starting from some initial guess $\overset{0}{z}$, by means of any one of the algorithms in Chapter 2 which require the computation of a gradient. We are very likely to find that $f^0(\overset{0}{z})$ appears to be almost zero compared to $p_j(\overset{0}{z})$, if $p_j(\cdot)$ is an exterior penalty function, or vice versa, if $p_j(\cdot)$ is an interior penalty function.* Assuming that all calculations are carried out on a finite precision machine, this will result in an extreme loss of accuracy in the computations to follow. However, suppose that we started with a moderate amount of penalty, so that $\partial f^0(\overset{0}{z})/\partial z$ and $\partial p_i(\overset{0}{z})/\partial z$ are of comparable size. Then the process of calculating $z_i$, as the limit point of a sequence $\overset{0}{z}, \overset{1}{z}, \overset{2}{z}, \ldots$, will proceed with much better efficiency and accuracy. Supposing that $p_i(\cdot) = \alpha_i p(\cdot)$, as in (27), (31), (63) and (64), once we have computed $z_i$, so that $\partial f^0(z_i)/\partial z + \alpha_i\,\partial p(z_i)/\partial z = 0$, we can construct $p_{i+1}(\cdot)$ by making $\alpha_{i+1}$ only moderately different from $\alpha_i$, so that $\partial f^0(z_i)/\partial z$ and $\partial p_{i+1}(z_i)/\partial z$ are not too different in magnitude and hence, after setting $\overset{0}{z} = z_i$, ensure good numerical behavior in the calculations for $z_{i+1}$. Thus, the usual practice is to start with a moderate penalty and to let it progress as the calculations proceed. This practice is further indicated by the fact that usually there is no way to establish a priori the specific $j$ stipulated above.

However, once part of a minimizing sequence $z_0, z_1, z_2, \ldots, z_N$ has been constructed, we can, as is shown in [F2], attempt to find a $\hat{z}$ which is optimal for (CP) by extrapolation, as follows: Assuming that $p_i(\cdot) = \alpha_i p(\cdot)$, we can obtain a good estimate for $\hat{z}$ by minimizing $f^0(z(t))$ over $z(t) \in C$, where $z(t)$ is a curve which interpolates the $z_i$ constructed to date and is of the form
79    $z(t) = z_0 + t a_1 + t^2 a_2 + t^3 a_3 + \cdots + t^N a_N, \quad t \ge 0,$

where the vector coefficients $a_j \in \mathbb{R}^n$, $j = 1, 2, \ldots, N$, are determined by solving the system of linear equations

80    $z(\alpha_i) = z_i, \quad i = 1, 2, \ldots, N.$
The "optimal" $t$ will satisfy $t > \alpha_N$ and most probably will be such that $z(t)$ is on the boundary of $C$. It can be found by means of one-dimensional search techniques, such as the Golden section search (2.1.14), if applicable. Thus, both from the point of view of starting and of stopping a calculation using penalty functions, there seem to be advantages to constructing a moderate rather than small number of points $z_i$ which are optimal for the unconstrained problems (UP)$_i$.

* Referring to (27), (31), (63) and (64), we see that the same disparities are likely to occur in the magnitudes of the gradients of these functions.

In deciding which of the penalty function methods are most suitable for solving a specific problem, the reader may find himself guided by the following pros and cons: Exterior penalty function methods have the advantage that they can be started at any point $\overset{0}{z}$, as an initial guess, and that they can be used with both equality and inequality constraints. In addition, in penalty functions of the form of (27), whenever we find a point $\overset{i}{z}$ in the process of unconstrained minimization such that $f^j(\overset{i}{z}) < 0$, the corresponding term disappears from the sum in (27) and its derivatives do not enter into the calculation of the gradient of $f^0(z) + p_j'(z)$ at $z = \overset{i}{z}$. This fact tends to simplify calculations. On the negative side, exterior penalty function methods usually construct points which are not feasible, i.e., which are not in $C$, and penalty functions such as (27) have few derivatives, a fact which may have an adverse effect on speed of convergence according to Fiacco and McCormick [F2]. Also, should one start calculations at a point $\overset{0}{z}$ which is in the interior of a set such as in (26), the penalty function and all of its derivatives remain zero until $\overset{i}{z}$ crosses the boundary of $C$, thus providing the computational process with no guidance in the interior of $C$.

Interior penalty functions, on the other hand, construct sequences of feasible points only, usually have many derivatives, and do provide guidance to the computational process in the interior of $C$. Their disadvantages are that they can only be used with a restricted class of constraint sets $C$, those having an interior; that to start them out, one must have a point in the interior of $C$; and that in computing, in the case of a $C$ such as in (62), all the terms in a penalty function such as (63) or (64) contribute to the value of the gradient of the penalty function all the time, thus making the evaluation of the gradient of the penalty function more cumbersome than in the case of exterior penalty function methods.

Mixed, or exterior-interior, penalty function methods enjoy the best of all possible worlds to some extent. Given an initial guess $\overset{0}{z}$ for the problem (CP) and assuming that $C = \{z \mid f^j(z) \le 0,\ j = 1, 2, \ldots, m,\ r^j(z) = 0,\ j = 1, 2, \ldots, l\}$, one can assign exterior penalty functions to the constraints $r^j(z) = 0$, $j = 1, 2, \ldots, l$, and to the constraints $f^j(z) \le 0$ whenever $f^j(\overset{0}{z}) \ge 0$.
One can then assign interior penalty functions to those constraints $f^j(z) \le 0$ for which $f^j(z_0) < 0$. [Write $C = C_1 \cap C_2 \cap \cdots \cap C_m \cap C_1' \cap C_2' \cap \cdots \cap C_l'$, with $C_j' = \{z \mid r^j(z) = 0\}$, $C_j = \{z \mid f^j(z) \le 0\}$, and use the results in (32).]

So far, we have discussed penalty function methods as if we could solve the unconstrained minimization problems $(\mathrm{UP})_i$ exactly, and in finite time, on a digital computer. However, as we have seen in the preceding chapters, the various unconstrained minimization methods which are usually available to us compute only stationary points. Furthermore, they usually take an infinite number of iterations to compute these stationary points. Therefore, if we insisted on using penalty function methods in a literal sense, we could not even get past the minimization of $f^0(z) + p_0(z)$ over $z \in \mathbb{R}^n$ in finite time.* Consequently, we must use truncation procedures in approximating
* We denote a penalty function by $p_i(\cdot)$ (without superscripts) whenever it is not necessary to indicate whether it is an exterior or an interior penalty function.
4.1
PENALTY FUNCTION METHODS
141
the minimizing, or stationary, points of $f^0(\cdot) + p_i(\cdot)$. Also, if we are going to apply to $f^0(\cdot) + p_i(\cdot)$ methods which can only compute points $z_i$ such that $\nabla f^0(z_i) + \nabla p_i(z_i) = 0$, we must become concerned as to the nature of the accumulation points of such a sequence $\{z_i\}$. We shall now propose two truncation procedures, one for exterior and one for interior penalty functions, and we shall establish the properties of these procedures in a few special cases.

Thus consider again the problem (CP) in (1), suppose that $\{z \mid f^0(z) \le \alpha\}$ is compact for all $\alpha \in \mathbb{R}$, and let $(1/\varepsilon_i)\,p'(\cdot)$, $i = 0, 1, 2, \ldots$, be a sequence of exterior penalty functions for the set $C$ in (1). We shall suppose that $f^0(\cdot)$ and $p'(\cdot)$ are continuously differentiable functions and that $\varepsilon_i = \varepsilon/\beta^i$, where $\varepsilon > 0$ and $\beta > 1$. We now propose a first-order penalty function type of algorithm for “solving” the problem (CP).
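As a small illustration of what truncation means here, the following sketch may help. It is not the algorithm stated in the text: the one-dimensional test problem, the function name `truncated_exterior_penalty`, and the fixed-step inner iteration are all assumptions made for this example. It keeps only one idea from the discussion above, namely that the inner minimization is stopped as soon as the gradient norm falls below the current $\varepsilon_i$, after which $\varepsilon_i$ is divided by $\beta > 1$.

```python
def truncated_exterior_penalty(z0=0.0, eps=1.0, beta=2.0, outer_iters=12):
    """Toy problem (an assumption, not from the text):
    minimize f0(z) = (z - 2)^2 subject to f1(z) = z - 1 <= 0,
    with the exterior penalty p'(z) = max(0, z - 1)^2."""

    def grad(z, eps_i):
        # derivative of f0(z) + (1/eps_i) * p'(z)
        return 2.0 * (z - 2.0) + (2.0 / eps_i) * max(0.0, z - 1.0)

    z, eps_i, history = z0, eps, []
    for _ in range(outer_iters):
        # Truncated inner minimization: stop as soon as |gradient| <= eps_i.
        while abs(grad(z, eps_i)) > eps_i:
            # Fixed step 1/L, where L = 2 + 2/eps_i bounds the curvature.
            z -= grad(z, eps_i) / (2.0 + 2.0 / eps_i)
        history.append((eps_i, z))
        eps_i /= beta  # tighten the penalty: eps_{i+1} = eps_i / beta
    return history
```

On this toy problem the recorded points stay outside the feasible set $\{z \mid z \le 1\}$ and approach the constrained minimizer $z = 1$ from the infeasible side, which is the behavior of exterior penalty methods described earlier in this section.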
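81 Algorithm (modified exterior penalty function method, Polak [P3]).
Step 0. Select an $\varepsilon > 0$, an $\alpha \in (0, \tfrac{1}{2})$, a $\beta > 1$, a $\rho > 0$, and a $z_0 \in \mathbb{R}^n$.
Step 1. Set $\varepsilon_0 = \varepsilon$, set $j = 0$, and set $i = 0$.
Step 2. Compute
82   $h(z_j; \varepsilon_i) = -\left[\nabla f^0(z_j) + \frac{1}{\varepsilon_i}\nabla p'(z_j)\right].$
Step 3. If $\|h(z_j; \varepsilon_i)\| > \varepsilon_i$, go to step 4; else, set $\varepsilon_{i+1} = \varepsilon_i/\beta$, set $z_i = z_j$, set $i = i + 1$, and go to step 2.
Step 4. Use algorithm (2.1.33) to compute a $\lambda_j$ such that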
83
- a) 11
-X(l
h(&
€i)(12
+ Ah(;; j
0 and a $\delta^* < 0$ such that for all $z \in B(z^*, \varepsilon^*) = \{z \in C \mid \|z - z^*\| \le \varepsilon^*\}$,
33   $f^0(z') - f^0(z) \le \delta^*$   for all $z' \in A(z)$, for all $z \in B(z^*, \varepsilon^*)$.

* Assuming, of course, that the sequence $\{z_i\}$ being constructed is approaching a solution.
† Note that this theorem does not depend on assumption (5) being satisfied. However, when (5) is not satisfied, algorithm (27) may stop at $z_0$; i.e., it may be useless.
‡ To compute a vector $z'' \in A(z')$, $z' \in C$, set $z_i = z'$ in step 1 of (27) and compute $z_{i+1}$ in step 3 or in step 5, as may be appropriate. Then set $z'' = z_{i+1}$. Thus the set $A(z')$ is determined by the set of vectors $h(z') \in S^*$ which satisfy (21) for $z = z'$, together with the step size rule (28), when $h^0(z') < 0$, and $A(z') = \{z'\}$ when $h^0(z') = 0$.
4.2
157
METHODS OF CENTERS
(Recall that for any $z \in C$, $h^0(z) \le 0$ (see remark (26)).) Thus, suppose that $z^* \in C$ and that $h^0(z^*) = v^* < 0$. Since $h^0(\cdot)$ is continuous, there exists an $\varepsilon > 0$ such that 34
hO(z)
V* (.) and the set J,(z) have the properties,
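15   $h_\varepsilon^0(z) \le 0$   for any $z \in C$, for any $\varepsilon \ge 0$;
16   $J_\varepsilon(z) \supset J_{\varepsilon'}(z)$   whenever $\varepsilon > \varepsilon'$, for any $z \in C$;
17   $h_\varepsilon^0(z) \ge h_{\varepsilon'}^0(z)$   whenever $\varepsilon > \varepsilon'$, for any $z \in C$;
18   for any $\varepsilon \ge 0$, for any $z \in C$, there exists a $\rho > 0$ such that $J_{\varepsilon+\rho}(z) = J_\varepsilon(z)$;
19   for any $\varepsilon \ge 0$, for any $z \in C$, there exists a $\rho > 0$ such that $J_\varepsilon(z') \subset J_\varepsilon(z)$ for all $z' \in B(z, \rho) = \{z' \in C \mid \|z' - z\| \le \rho\}$.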
Zoutendijk's algorithm, below, can be used to compute points in $C$ satisfying $h_0^0(z) = 0$, under assumption (2.3).*
Zoutendijk method of feasible directions. *In principle, we do not need assumption (2.5) for the algorithms in this section. However, when (2.5) is not satisfied, they may stop at zo , or fail to compute a z,, in a finite number of iterations, Le., they may be useless (see (2.30)).
4.3
METHODS OF FEASIBLE DIRECTIONS
I63
20 Algorithm (method of feasible directions, Zoutendijk [Z4]).
Step 0. Compute a $z_0 \in C$; select an $\varepsilon' > 0$, an $\varepsilon'' \in (0, \varepsilon')$ and a $\beta \in (0, 1)$; set $i = 0$.
Comment. It is usual to set $\beta = 1/2$. See (2.30) for computation of $z_0$.
Step 1. Set $\varepsilon = \varepsilon'$.
Step 2. Set $z = z_i$.
Step 3. Compute $\bar h_\varepsilon(z) = (h_\varepsilon^0(z), h_\varepsilon(z))$ by solving (10)-(12).
Step 4. If $h_\varepsilon^0(z) \le -\varepsilon$, set $h(z) = h_\varepsilon(z)$ and go to step 7; else, go to step 5.
Step 5. If $\varepsilon \le \varepsilon''$, compute $h_0^0(z)$ by solving (10)-(12) (with $\varepsilon = 0$) and go to step 6; else, set $\varepsilon = \beta\varepsilon$ and go to step 3.
Step 6. If $h_0^0(z) = 0$, set $z_{i+1} = z$ and stop; else, set $\varepsilon = \beta\varepsilon$ and go to step 3.
Step 7. Compute $\lambda(z) \ge 0$ such that
21   $\lambda(z) = \max\{\lambda \mid f^i(z + \alpha h(z)) \le 0 \text{ for all } \alpha \in [0, \lambda] \text{ and } i = 1, 2, \ldots, m\}.$
Comment. Here, $\lambda(z)$ is the largest $\lambda$ for which $z + \alpha h(z) \in C$ for all $\alpha \in [0, \lambda]$.
Step 8. Compute $\mu(z) \in [0, \lambda(z)]$ to be the smallest value in that interval such that
22   $f^0(z + \mu(z) h(z)) = \min\{f^0(z + \mu h(z)) \mid \mu \in [0, \lambda(z)]\}.$
Comment. It can be seen that $\mu(z)$ will always exist if $z_0$ satisfies assumption (2.3).
Step 9. Set $z_{i+1} = z + \mu(z) h(z)$, set $i = i + 1$, and go to step 2.
< fo(zo)}is compact. Show that if the sequence {zi} generated by algorithm (20) is finite, then its last element, zk ,must satisfy hoo(z,) = 0, and that if {zi}is infinite, then every accumulation point 1 of {zi} must satisfy hoo(z) = 0. [Hint: Use theorem (1.3.42) as follows: Set S = S*, c(z) = fO(z), + ( E , z) = h,O(z), H(E,z) = {h E S* I h,O(z) = maxieJe(z) 0, for any E 3 0, and for any i E (0, 1,2,..., m), there exists an si(p) > 0 such that if for z E C(z,), with the property that iE
JmY
24
164
4 EQUALITY AND INEQUALITY CONSTRAINTS
then
2s
f”2
--SP + sh) - f”2) < 2 ’
for all s E [0, st&)], for all h E H(E,z).
Lemma 2. Given any $\varepsilon > 0$ and any $i \in \{1, 2, \ldots, m\}$, there exists a $t_i(\varepsilon) > 0$ such that $f^i(z + th) \le 0$ for all $z \in \{z \in C'(z_0) \mid f^i(z) \le -\varepsilon\}$, for all $h \in S^*$, and for all $t \in [0, t_i(\varepsilon)]$. (Construct a contradiction.)

Once these two lemmas are established, it is easy to see that the conditions (1.3.42)(i) and (1.3.42)(ii) are satisfied. Next, (1.3.42)(iii) is satisfied, since $f^0(\cdot)$ is continuous and $C'(z_0)$ is compact. That condition (1.3.42)(iv) is satisfied follows directly from lemma 1, while (1.3.42)(v) can be established by means of arguments similar to the ones used for (36)-(39). Condition (1.3.42)(vi) can be established by making use of lemmas 1 and 2. Finally, we see that (1.3.42)(vii) is true by definition.

To remove the dependence of the construction of $z_{i+1}$ on the current value of $\varepsilon$, we only need to change the instruction in step 9 of algorithm (20) from “go to step 2” to “go to step 1.” The resulting algorithm, stated below, can then be shown to be of the form of the model (1.3.9), as we shall soon see.
26 Algorithm (method of feasible directions, Polak [P1]).
Step 0. Compute a $z_0 \in C$, select an $\varepsilon' > 0$, an $\varepsilon'' \in (0, \varepsilon')$, a $\beta \in (0, 1)$, and set $i = 0$.
Comment. It is usual to set $\beta = 1/2$. See (2.30) for a method to compute a $z_0 \in C$, using algorithm (2.27), or (5), or (20) or (26).
Step 1. Set $z = z_i$.
Step 2. Set $\varepsilon_0 = \varepsilon'$, and set $j = 0$.
Step 3. Compute a vector $\bar h_{\varepsilon_j}(z) = (h^0_{\varepsilon_j}(z), h_{\varepsilon_j}(z))$ by solving (10)-(12) for $\varepsilon = \varepsilon_j$.
Step 4. If $h^0_{\varepsilon_j}(z) \le -\varepsilon_j$, set $\varepsilon(z) = \varepsilon_j$, set $h(z) = h_{\varepsilon_j}(z)$, and go to step 7; else, go to step 5.
Comment. Do not store $\varepsilon(z)$; it is defined only because it will be needed in proving convergence properties of algorithm (26) later.
Step 5. If $\varepsilon_j \le \varepsilon''$, compute $h_0^0(z)$ by solving (10)-(12) with $\varepsilon = 0$, and go to step 6; else, set $\varepsilon_{j+1} = \beta\varepsilon_j$, set $j = j + 1$, and go to step 3.
Step 6. If $h_0^0(z) = 0$, set $z_{i+1} = z$, and stop; else, set $\varepsilon_{j+1} = \beta\varepsilon_j$, set $j = j + 1$, and go to step 3.
Step 7. Compute $\lambda(z) \ge 0$ such that (21) is satisfied.
Step 8. Compute $\mu(z) \in [0, \lambda(z)]$ to be the smallest value in that interval satisfying (22).
Comment. $\mu(z)$ will always exist if $z_0$ satisfies assumption (2.3).
Step 9. Set $z_{i+1} = z + \mu(z) h(z)$, set $i = i + 1$, and go to step 1.

Note that in the version (20), the value of $\varepsilon$ is allowed to decrease continually, while in the version (26), the value of $\varepsilon$ ($\varepsilon_j$) is reset to its original value of $\varepsilon'$ at each iteration. Both of these approaches have their advantages and their disadvantages. For example, in the algorithm (20), for some reason, $\varepsilon$ may become quite small while $z_i$ is still quite far from a point $\hat z$ satisfying $h_0^0(\hat z) = 0$. As a result, some of the inactive constraints, satisfying $-k\varepsilon \le f^j(z_i) < -\varepsilon$, with $k$ small, may force the step size $\mu(z)$ to become unnecessarily small, causing a slowdown in the convergence process. This would not occur in the version (26). However, as $z_i$ approaches a point $\hat z$ which satisfies $h_0^0(\hat z) = 0$, algorithm (26) may spend too much time in decreasing $\varepsilon_j$ from the value $\varepsilon'$ to the much smaller value $\varepsilon(z)$ at each iteration. In practice, one might use some heuristic to switch from the version (20) to the version (26), and, if need be, back again, in the course of a calculation. This can obviously be done very easily, since the two algorithms differ in only one small detail.
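The structure of the version with resetting of $\varepsilon$ can be sketched in a few lines for problems in $\mathbb{R}^2$. This is only an illustration under stated assumptions: the direction-finding linear program is replaced by a crude search over sampled points on the boundary of the box $\{h \mid |h^1| \le 1, |h^2| \le 1\}$, the constraint set is assumed convex so that the largest feasible step can be found by bisection, the step length is chosen on a grid, and the name `feasible_directions_2d` and all its parameters are hypothetical.

```python
def feasible_directions_2d(f, grads, z0, eps1=1.0, eps2=1e-6, beta=0.5,
                           n_grid=20, max_iter=50, tol=1e-9):
    """f[0], grads[0]: cost and its gradient; f[i], grads[i] (i >= 1):
    constraints f_i(z) <= 0 and their gradients. Returns the iterates."""

    def feasible(z):
        return all(fi(z) <= 1e-12 for fi in f[1:])

    def direction(z, eps):
        # eps-active index set J_eps(z) = {0} U {i : f_i(z) + eps >= 0}
        J = [0] + [i for i in range(1, len(f)) if f[i](z) + eps >= 0.0]
        g = [grads[i](z) for i in J]
        best_val, best_h = 0.0, (0.0, 0.0)  # h = 0 always gives the value 0
        for k in range(-n_grid, n_grid + 1):
            t = k / n_grid
            # sampled candidates on the boundary of the box
            for h in ((t, 1.0), (t, -1.0), (1.0, t), (-1.0, t)):
                val = max(gi[0] * h[0] + gi[1] * h[1] for gi in g)
                if val < best_val:
                    best_val, best_h = val, h
        return best_val, best_h

    z = (float(z0[0]), float(z0[1]))
    assert feasible(z), "z0 must be feasible"
    iterates = [z]
    for _ in range(max_iter):
        eps, h = eps1, None          # eps is reset at every iteration
        for _ in range(200):         # eps reduction loop, with a safety cap
            val, cand = direction(z, eps)
            if val <= -eps:          # direction accepted
                h = cand
                break
            if eps <= eps2 and direction(z, 0.0)[0] >= -tol:
                return iterates      # stationarity test, up to tol
            eps *= beta
        if h is None:
            return iterates

        def point(s, z=z, h=h):
            return (z[0] + s * h[0], z[1] + s * h[1])

        # Largest feasible step in [0, 10] by bisection (assumes C convex).
        lo, hi = 0.0, 10.0
        if feasible(point(hi)):
            lo = hi
        else:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if feasible(point(mid)):
                    lo = mid
                else:
                    hi = mid
        # Smallest grid minimizer of the cost on [0, lo].
        mu = min((lo * k / 200 for k in range(201)),
                 key=lambda s: f[0](point(s)))
        z = point(mu)
        iterates.append(z)
    return iterates
```

Because the step $\mu = 0$ is always among the candidates, the cost cannot increase from one iterate to the next, and every iterate stays feasible up to the tolerance used in `feasible`. A production implementation would solve the direction-finding problem with a linear programming code rather than by sampling.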
27 Lemma. Algorithm (26) cannot cycle indefinitely between steps 3 and 6 while constructing a sequence $\varepsilon_j \to 0$ as $j \to \infty$.
Proof. Suppose that $z_i \in C$ is such that $h_0^0(z_i) = 0$. Then, after a finite number of reductions of $\varepsilon_j$, the algorithm will construct an $\varepsilon_j \le \varepsilon''$. It will then determine in step 6 that $h_0^0(z_i) = 0$, and it will stop after setting $z_{i+1} = z_i$. Next, suppose that $z_i \in C$ is such that $h_0^0(z_i) < 0$. Then, by (18), there exists a $\rho > 0$ such that for all $\varepsilon_j \in [0, \rho]$, $J_{\varepsilon_j}(z_i) = J_0(z_i)$, and hence, $h^0_{\varepsilon_j}(z_i) = h_0^0(z_i) < 0$ for all $\varepsilon_j \in [0, \rho]$. Let $j' \ge 0$ be the smallest integer such that $\beta^{j'}\varepsilon' \le \min\{\rho, -h_0^0(z_i)\}$; then we must have
28   $h^0_{\varepsilon_{j'}}(z_i) \le -\varepsilon_{j'}$,   $\varepsilon_{j'} = \beta^{j'}\varepsilon'$.
Hence, a new point, $z_{i+1}$, will be constructed after at most $j'$ reductions of $\varepsilon_j$, i.e., when $j = j'$. Consequently, algorithm (26) is well-defined.
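The bound on the number of reductions used in the proof is easy to compute. The helper below is a hypothetical illustration (the name `reductions_needed` and its arguments are not from the text); it returns the smallest integer $j' \ge 0$ with $\beta^{j'}\varepsilon' \le \min\{\rho, -h_0^0(z_i)\}$, together with the corresponding value of $\varepsilon_{j'}$.

```python
def reductions_needed(eps_prime, beta, rho, h00):
    """Smallest j' >= 0 such that beta**j' * eps_prime <= min(rho, -h00).

    eps_prime: initial value eps' > 0
    beta:      reduction factor in (0, 1)
    rho:       radius from property (18), assumed known here
    h00:       value of h_0^0(z_i), assumed strictly negative
    """
    assert eps_prime > 0 and 0 < beta < 1 and rho > 0 and h00 < 0
    bound = min(rho, -h00)
    j, eps_j = 0, eps_prime
    while eps_j > bound:
        eps_j *= beta   # one reduction: eps_{j+1} = beta * eps_j
        j += 1
    return j, eps_j
```

For instance, with $\varepsilon' = 1$, $\beta = 1/2$, $\rho = 0.3$ and $h_0^0(z_i) = -0.1$, four halvings are needed before the test in step 4 is guaranteed to succeed.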
29 Exercise. Show that lemma (27) is also true for algorithm (20).
30 Remark. Suppose that $\{z_i\}$ is a sequence constructed by algorithm (26) (or algorithm (20)); then $f^0(z_0) > f^0(z_1) > f^0(z_2) > \cdots$, as can be seen from the mean-value theorem (B.1.1) and the facts that $f^0(\cdot)$ is continuously differentiable and that $\langle \nabla f^0(z_i), h(z_i)\rangle \le -\varepsilon(z_i) < 0$. However, the sequence $h^0_{\varepsilon(z_i)}(z_i)$, $i = 0, 1, 2, \ldots$, has no demonstrable monotonicity properties ($h(z_i) = h(z)$, $\varepsilon(z_i) = \varepsilon(z)$ for $z = z_i$ in step 4 of (26)).

Although (26) is an algorithm with “$\varepsilon$ reduction,” it is not of the form of the model (1.3.33), but of the form of the model (1.3.9), with $c(\cdot) = f^0(\cdot)$, $T = C$, $A : C \to 2^C$ defined by the instructions in (26), and $z \in C$ defined to be desirable if $h_0^0(z) = 0$. The reason we can identify (26) with the form (1.3.9) and not with the form (1.3.33) is that the reduction of $\varepsilon_j$ in (26) is carried out on the basis of a test that does not involve the values of the function $f^0(\cdot)$. The instructions in (26) defining the map $A : C \to 2^C$ are rather complex, and it may help the reader if we now exhibit it explicitly. First, suppose that $z_i \in C$ is such that $h_0^0(z_i) = 0$. Then, as we have seen in lemma (27), after a finite number of reductions of $\varepsilon_j$, algorithm (26) sets $z_{i+1} = z_i$, i.e.,
31   $A(z) = \{z\}$,   when $h_0^0(z) = 0$.
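Now suppose that $z_i \in C$ is such that $h_0^0(z_i) < 0$. Then algorithm (26) computes a unique integer $j'$ such that $h^0_{\beta^{j'}\varepsilon'}(z_i) \le -\beta^{j'}\varepsilon'$ and $h^0_{\beta^{j}\varepsilon'}(z_i) > -\beta^{j}\varepsilon'$ for $0 \le j < j'$, $j$ an integer, and sets $\varepsilon(z_i) = \beta^{j'}\varepsilon'$. Thus, $\varepsilon(z_i)$ is uniquely determined by $z_i$. Next, let $S^*(z_i) \subset S^*$ be defined as follows: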
32
Then we see that the point $z_{i+1}$ constructed by algorithm (26) must belong to the set
33   $\{z' = z_i + \mu' h' \in C \mid h' \in S^*(z_i),\ \mu' \ge 0,\ f^0(z') = \min\{f^0(z_i + \mu h') \mid \mu \ge 0,\ z_i + \mu h' \in C\}\}.$
Consequently (under any assumption, such as (2.3), which ensures that the min in (33) exists),
34   $A(z) = \{z' = z + \mu' h' \in C \mid h' \in S^*(z),\ \mu' \ge 0,\ f^0(z') = \min\{f^0(z + \mu h') \mid \mu \ge 0,\ z + \mu h' \in C\}\}$,   when $h_0^0(z) < 0$.
Taken together, (31) and (34) define the set $A(z) \subset C$ for every $z \in C$. We can now establish the convergence properties of algorithm (26).

35 Theorem. Suppose that $A(z)$ is well-defined by (31) and (34) for all $z \in C$. Let $z_0, z_1, z_2, \ldots$ be a sequence constructed by algorithm (26) for problem (1) (where we had assumed all functions to be continuously differentiable), and suppose that (2.3) is satisfied by $z_0$. Then, either the sequence $\{z_i\}$ is finite and its last element, say $z_k$, satisfies $h_0^0(z_k) = 0$, or else $\{z_i\}$ is infinite and every accumulation point $\hat z$ of $\{z_i\}$ satisfies $h_0^0(\hat z) = 0$ (see also (1.3.65)).
Proof. We have shown above that algorithm (26) is of the form of the model (1.3.9). To prove theorem (35) we only need to show that the assumptions (i) and (ii) of theorem (1.3.10) are satisfied by the maps $c(\cdot) = f^0(\cdot)$ and $A(\cdot)$ defined by (31), (34), for $z \in C$ defined to be desirable if $h_0^0(z) = 0$ (recall, we set $T = C$). (Since algorithm (26) can stop constructing new points only when it encounters a $z_i$ such that $h_0^0(z_i) = 0$, the finite sequence case is trivial.) Since by assumption $f^0(\cdot)$ is continuously differentiable, (i) is satisfied, and hence we are left with establishing (ii). Suppose that $z \in C$ is such that $h_0^0(z) < 0$. Then we must have $\varepsilon(z) > 0$ and $h^0_{\varepsilon(z)}(z) \le -\varepsilon(z) < 0$. We shall now show that for points $z' \in C$ sufficiently close to $z$, we must have $\varepsilon(z') \ge \beta\varepsilon(z)$. From (19) it follows that there exists a $\rho > 0$ such that
36   $J_{\varepsilon(z)}(z') \subset J_{\varepsilon(z)}(z)$,   for all $z' \in B(z, \rho)$.
Let $\theta : \mathbb{R}^n \to \mathbb{R}^1$ be defined by
37   $\theta(z') = \min_{h \in S^*} \max_{i \in J_{\varepsilon(z)}(z)} \langle \nabla f^i(z'), h\rangle$;
then, by (B.3.20), $\theta(\cdot)$ is a continuous function, and hence there exists a $\rho' \in (0, \rho]$ such that
38   $|\theta(z') - h^0_{\varepsilon(z)}(z)| \le \varepsilon(z)(1 - \beta)$,   for all $z' \in B(z, \rho')$,
since $\theta(z) = h^0_{\varepsilon(z)}(z)$. Consequently, because of (36),
0 such that for $i = 1, 2, \ldots, m$,
42   $|f^i(z' + th) - f^i(z')| \le \beta\varepsilon(z)$,   for all $z' \in B(z, \rho'')$, $h \in S^*$, $t \in [0, t']$.
Again, since the functions $f^i(\cdot)$, $i = 0, 1, 2, \ldots, m$, are continuously differentiable and $S^*$ is compact, it follows from theorem (B.3.7) that there exists a $\bar\rho \in (0, \rho'']$ and a $t'' \in (0, t']$ such that for $i = 0, 1, 2, \ldots, m$,
43   $|\langle \nabla f^i(z' + th), h\rangle - \langle \nabla f^i(z'), h\rangle| \le \dfrac{\beta\varepsilon(z)}{2}$,   for all $z' \in B(z, \bar\rho)$, $h \in S^*$, $t \in [0, t'']$.
Now consider all the functions $f^i(\cdot)$, $i \in J_{\varepsilon(z')}(z')$, for any $z' \in B(z, \bar\rho)$. Then, by the mean-value theorem, for a given $h(z') \in S^*(z')$, we must have, for any $t \in [0, t'']$, and $i \in \{0, 1, 2, \ldots, m\}$,
44   $f^i(z' + th(z')) = f^i(z') + t\langle \nabla f^i(z' + \xi h(z')), h(z')\rangle$,
where $\xi \in [0, t]$. Since for any $h(z') \in S^*(z')$, we must have $\langle \nabla f^i(z'), h(z')\rangle \le -\varepsilon(z') \le -\beta\varepsilon(z)$, it follows from (43) and (44) that for all $i \in J_{\varepsilon(z')}(z')$,
45   $f^i(z' + th(z')) - f^i(z') \le -\dfrac{t\beta\varepsilon(z)}{2}$,   $t \in [0, t'']$.
Examining (45) and (42), we conclude that $\lambda(z')$, as computed in step 7 of (26), must satisfy $\lambda(z') \ge t''$, since because of (42) and (45),
46   $f^i(z' + th(z')) \le 0$   for all $t \in [0, t'']$, $i = 1, 2, \ldots, m$.
Consequently, because of (46) and (45), we must have
47   $f^0(z' + \mu(z') h(z')) - f^0(z')$

0 and of affine functions among the $f^i(\cdot)$, any $z \in C$, consider the linear programming problem,
70   minimize $\langle \nabla f^0(z), h\rangle$,
subject to
71   $\langle \nabla f^i(z), h\rangle + \varepsilon \le 0$,   $i \ne 0$, $i \in J_\varepsilon(z)$,
72   $|h^i| \le 1$,   $i = 1, 2, \ldots, n$.
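We shall denote a solution of (70)-(72) by $\tilde h_\varepsilon(z)$. Note that just as in the case of all the other linear programming problems we have seen in this section, $\tilde h_\varepsilon(z)$ may not be unique. Also, we shall use the notation $\tilde h_\varepsilon^0(z) = \langle \nabla f^0(z), \tilde h_\varepsilon(z)\rangle$, i.e.,
73   $\tilde h_\varepsilon^0(z) = \min\{\langle \nabla f^0(z), h\rangle \mid \langle \nabla f^i(z), h\rangle + \varepsilon \le 0,\ i \ne 0,\ i \in J_\varepsilon(z);\ |h^i| \le 1,\ i = 1, \ldots, n\}.$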
74 Proposition. Suppose that for every $z \in C$ there exists a vector $h \in \mathbb{R}^n$ such that $\langle \nabla f^i(z), h\rangle < 0$ for all $i \ne 0$, $i \in J_0(z)$. Then for any $z \in C$, $\tilde h_0^0(z) = 0$ if and only if $h_0^0(z) = 0$ ($h_\varepsilon^0(\cdot)$ was defined in (8)). (Note that the condition in this proposition ensures that the Kuhn-Tucker constraint qualification is satisfied at every point in $C$; see [C1], theorem (3.3.21).)
Proof. We make use of contraposition to establish (74). Thus, suppose that for some $z \in C$, $h_0^0(z) < 0$; then, by inspection, we find that $\tilde h_0^0(z) < 0$, i.e., if $\tilde h_0^0(z) = 0$, then we must also have $h_0^0(z) = 0$. Now suppose that for some $z \in C$, $\tilde h_0^0(z) < 0$. Then, for some $h' \in S^* = \{h \mid |h^i| \le 1,\ i = 1, 2, \ldots, n\}$, we must have $\langle \nabla f^0(z), h'\rangle < 0$ and $\langle \nabla f^i(z), h'\rangle \le 0$ for all $i \ne 0$, $i \in J_0(z)$. Let $h$ be a vector as in the hypothesis of (74); then, for some $\lambda_1 > 0$, we shall have $\langle \nabla f^i(z), h' + \lambda_1 h\rangle < 0$ for all $i \in J_0(z)$ and, for some $\lambda_2 > 0$, $\lambda_2(h' + \lambda_1 h) \in S^*$. Hence, we must also have $h_0^0(z) < 0$. Consequently, if $h_0^0(z) = 0$, then we must also have $\tilde h_0^0(z) = 0$.

Thus, under the assumption stated in (74), computing the zeros of the function $\tilde h_0^0(\cdot)$ seems to be as good an idea as computing the zeros of $h_0^0(\cdot)$, as far as the confidence one can have that the point that one has computed is optimal for (1). However, the linear programming problem (70)-(72) contains one variable less than the linear programming problem (10)-(12). In addition, the function $\tilde h_\varepsilon^0(\cdot)$ is generally more suited to optimal control problems than the function $h_\varepsilon^0(\cdot)$, as we shall see in the subsection on optimal control problems.
75 Theorem. Suppose that for every $z \in C$ there exists a vector $h \in \mathbb{R}^n$ such that $\langle \nabla f^i(z), h\rangle < 0$ for all $i \ne 0$, $i \in J_0(z)$. Then theorem (35) remains valid for the modification of algorithm (26) which results from the substitution of step 3' below for step 3 in (26).
Step 3'. Compute a vector $\bar h_{\varepsilon_j}(z) = (h^0_{\varepsilon_j}(z), h_{\varepsilon_j}(z))$ by solving (70)-(72), with $\varepsilon = \varepsilon_j$, for a vector $\tilde h_{\varepsilon_j}(z)$ $(= h_{\varepsilon_j}(z))$ and then setting $h^0_{\varepsilon_j}(z) =$

$(\cdot)$.) All of these are first-order methods, since the direction-finding process involves only the first derivative of the functions $f^i(\cdot)$, $i = 0, 1, 2, \ldots, m$, which define the problem to be solved, i.e., problem (3.1). We shall now show how the functions $h^0(\cdot)$, $h_\varepsilon^0(\cdot)$ and $\tilde h_\varepsilon^0(\cdot)$ can be modified by the addition of second-derivative terms in order to obtain algorithms with a Newton-Raphson-like appearance, and hopefully, with a better rate of convergence than that of the methods discussed in the preceding section. As in Section 3, we shall consider the problem,
$\min\{f^0(z) \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\},$

where the $f^i : \mathbb{R}^n \to \mathbb{R}^1$, $i = 0, 1, 2, \ldots, m$, are continuously differentiable functions. (At some point it will become necessary to assume that these functions are twice continuously differentiable.) As before, we shall denote by $C$ the constraint set,

$C = \{z \in \mathbb{R}^n \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\},$

and, in addition to the indicator set $J_\varepsilon(z)$, defined in (3.7), we shall make use of the indicator set $I_\varepsilon(z)$, which we define, for any $\varepsilon \ge 0$ and for any $z \in C$, by
ZE(z) = { i I f i ( z ) + E 2 0, i E (1, 2,..., m}}, so that JE(z)= {0} u ZE(z). We now define the functions h'O(.), hio(*),&O(*) and &O(-), mapping C into R1 as follows:
$h'^0(z) = \min_{h \in S^*} \max\{\langle \nabla f^0(z), h\rangle + \langle h, H_0(z) h\rangle;\ f^i(z) + \langle \nabla f^i(z), h\rangle + \langle h, H_i(z) h\rangle,\ i = 1, 2, \ldots, m\};$
8 Proposition. For any $z \in C$, $h'^0(z) = 0$ if and only if $h^0(z) = 0$.

Proof. Suppose that for some $z \in C$, $h'^0(z) < 0$. Then, by inspection, we must also have $h^0(z) < 0$, i.e., $h^0(z) = 0$ implies that $h'^0(z) = 0$.* Next, suppose that

$h^0(z) = \max\{\langle \nabla f^0(z), h'\rangle;\ f^i(z) + \langle \nabla f^i(z), h'\rangle,\ i = 1, 2, \ldots, m\} < 0,$

for some $h' \in S^*$. Then there exists a $\lambda' > 0$ such that (because the quadratic terms will be dominated by the linear and affine ones)

$\max\{\lambda\langle \nabla f^0(z), h'\rangle + \lambda^2\langle h', H_0(z) h'\rangle;\ f^i(z) + \lambda\langle \nabla f^i(z), h'\rangle + \lambda^2\langle h', H_i(z) h'\rangle,\ i = 1, 2, \ldots, m\} < 0$   for all $\lambda \in (0, \lambda']$.

Hence, $h'^0(z) < 0$, which implies that if $h'^0(z) = 0$, then $h^0(z) = 0$.
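The domination argument in this proof can be spelled out. The following worked bound is a sketch only; the abbreviations $a$, $b_i$, $q_i$ and $c_i$ are introduced here for convenience and do not appear in the text. It uses $f^i(z) \le 0$, which holds because $z \in C$.

```latex
\begin{align*}
&\text{Let } a = \langle \nabla f^0(z), h' \rangle < 0, \quad
  b_i = f^i(z) + \langle \nabla f^i(z), h' \rangle < 0, \quad
  q_i = \langle h', H_i(z) h' \rangle, \quad i = 0, 1, \ldots, m. \\
&\text{For } \lambda \in (0, 1]: \\
&\qquad \lambda a + \lambda^2 q_0 = \lambda\,(a + \lambda q_0), \\
&\qquad f^i(z) + \lambda \langle \nabla f^i(z), h' \rangle + \lambda^2 q_i
   = (1 - \lambda) f^i(z) + \lambda b_i + \lambda^2 q_i
   \le \lambda\,(b_i + \lambda q_i). \\
&\text{With } c_0 = a,\ c_i = b_i \ (i \ge 1), \text{ both expressions are negative for all }
  \lambda \in (0, \lambda'], \text{ where} \\
&\qquad \lambda' = \min\Bigl\{ 1,\ \tfrac{1}{2} \min_{i \,:\, q_i > 0} \frac{|c_i|}{q_i} \Bigr\}
  \quad (\text{and } \lambda' = 1 \text{ if no } q_i \text{ is positive}).
\end{align*}
```

Since $S^*$ is a box centered at the origin, $\lambda h' \in S^*$ for every $\lambda \in (0, 1]$, so $\lambda' h'$ is an admissible direction at which the maximum defining $h'^0(z)$ is negative.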
9 Proposition. For any z E C, hio(z) = 0 if and only if hoo(z)= 0. 10 Exercise. Prove proposition (9). [Hint:Repeat the arguments used in the proof of proposition (8).] 8 11 Proposition. Suppose that the assumptions of proposition (3.74) are satis8 fied; then, for any z E C, hio(z) = 0 if and only if hoo(z)= 0.
12 Exercise. Prove proposition (11). [Hint: Proceed as for (8).]
8
1 3 Proposition. Suppose that the assumptions of propositions (3.74) are satisfied and that for all z E C, the matrix Ho(z) is positive definite. Then, for any z E C , &O(z) = 0 if and only if hoo(z)= 0. Proof. When Ho(z) is positive definite, the set
is compact for every 01 E W. Hence, is well-defined for every E 2 0. Now, suppose that for some z E C, &O(z) < 0; then, by inspection, we must also have &O(z) < 0 (since @(z) < hAo(z) for all z E C). Consequently, &O(z) = 0 implies that &O(z) = 0.
* This is a proof by contraposition, as are those of the propositions to follow. They are based on the fact that if the falsehood of statement A implies the falsehood of statement B, then the truth of B implies the truth of A.
4.4
SECOND-ORDER METHODS OF FEASIBLE DIRECTIONS
183
Next, suppose that for some z E C , hgUo(z)< 0, with
+ (A’, Ho(z)A’). (0, l), we must have h(Vfi(z), h’) + X2(h’, H,(z) h’) < 0, h,“o(z)= (VfO(Z), h’)
Then, for all X E iEZo(z),and hence, since for some X in (0, 1) we must also have h h ’ ~ S * , we conclude that hio(z)< 0, so that hio(z) = 0 implies that hgUo(z)= 0. In rn view of proposition (1 l), we are obviously done. The relationship between the zeros of /zoo(.) and of hao(.) was established in proposition (3.74). To complete our demonstration of relationships between all the functions which we shall show to be usable in a method of feasible directions, we shall now work exercise (2.16) for the reader. 14 Proposition. For any z E C, ho(z)= 0 if and only if hoo(z)= 0.
+
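Proof. Suppose that $h^0(z) = \max\{\langle \nabla f^0(z), h'\rangle;\ f^i(z) + \langle \nabla f^i(z), h'\rangle,\ i = 1, 2, \ldots, m\} < 0$ for some $h' \in S^*$. Then $h_0^0(z) \le \max_{i \in J_0(z)} \langle \nabla f^i(z), h'\rangle < 0$, and hence $h_0^0(z) = 0$ implies that $h^0(z) = 0$. Now suppose that for some $z \in C$, $h_0^0(z) = \max_{i \in J_0(z)} \langle \nabla f^i(z), h''\rangle < 0$, for some $h'' \in S^*$. Then for some $\lambda'' \in (0, 1]$, we must have $\max\{\lambda''\langle \nabla f^0(z), h''\rangle;\ f^i(z) + \lambda''\langle \nabla f^i(z), h''\rangle,\ i = 1, 2, \ldots, m\} < 0$. Consequently, $h^0(z) < 0$. We therefore conclude that $h^0(z) = 0$ implies $h_0^0(z) = 0$.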
15 Exercise. The functions defined in (4)-(7) do not exhaust all the possibilities of modifying the functions ha(.),h,O(.),and A$(*) in such a way as to obtain a new function whose zeros coincide with those of the old function. Thus, consider the functions from C into R” defined by 16
h’O(Z) =
E&{(VjO(Z),h )
+ ( A , H o e ) h ) I f”z> + (.) in algorithm (3.26), we change the instruction in step 3 of (3.26) from “Compute a vector f ; J z )= (h! (z),hcj(z)) by solving (10)-(12) for E = F?’’ to “Compute a vector h&) = (h!:(z),h,,(z)) by solving (4.21)-(4.23) for E = E , ,” where (4.21)-(4.23) are as stated below. 21
minimize $h^0$,
subject to
22   $-h^0 + \langle \nabla f^i(z), h\rangle + \langle h, H_i(z) h\rangle \le 0$,   for all $i \in J_\varepsilon(z)$;
23   $|h^i| \le 1$,   $i = 1, 2, \ldots, n$.
Note that the problems (18)-(20), (21)-(23), as well as those defined by (6), (7) and (16), (17), require us to minimize either a linear or a quadratic form on a constraint set defined by linear and quadratic inequalities. So far, there are no methods for solving such problems in a finite number of iterations. Consequently, these functions are not usable in an implementable algorithm of the form of (3.26) unless we set H,(z) = 0 for all i appearing in the constraints in the feasible direction-finding subproblem. These considerations seem to rule out an immediate interest in the functions h’O(*) and h:O(.), and suggest that we set H,(z) = 0 for i = 1, 2, ..., m in (6), (7), and (16), (17). To give an algorithm using the functions h ’ O ( . ) , or k”’(.) a Newton-Raphson-like appearance, we would set Ho(z) = a2fo(z)/az2 for all z E C , and H,(z) = 0 for i = 1 , 2,..., m.* 24 Theorem. Suppose that in the instruction of step 2 of algorithm (3.5), the
words “Solve (2.22)-(2.25)” are replaced by the words “Solve (4.18)-(4.20).” Then the conclusions of theorem (2.32) remain valid for this modification of algorithm (3.5).

25 Exercise. Prove theorem (24). [Hint: Proceed essentially as in the proof of theorem (2.32).]

26 Theorem. Suppose that in the instruction of step 3 of algorithm (3.26), the words “by solving (10)-(12)” are replaced by the words “by solving (4.21)-(4.23).” Then the conclusions of theorem (3.35) remain valid for this modification of algorithm (3.26).
* Provided, of course, that H,(z) is positive semidefinite.
4.5
185
GRADIENT PROJECTION METHODS
27 Exercise. Prove theorem (26). [Hint: Proceed essentially as in the proof of theorem (3.35).]

28 Remark. Theorems (24) and (26) only make sense when the algorithms in question are well-defined, i.e., only when they are applied to problems in which they will not jam up for some reason, such as the nonexistence of a $\mu(z)$. An assumption such as the requirement that the set $\{z \mid f^0(z) \le f^0(z_0)\}$ be compact takes care of this difficulty provided one starts at $z_0$. Also see lemma (3.27).
0 is arbitrary and $z' \in C$ is not desirable.] The interested reader may also construct similar theorems involving the functions defined in (16) and (17). This concludes our discussion of methods of feasible directions.
4.5
Gradient Projection Methods
We conclude this chapter with two gradient projection methods for solving the problem,
1   $\min\{f^0(z) \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\},$
under the assumption that the $f^i : \mathbb{R}^n \to \mathbb{R}^1$, $i = 0, 1, \ldots, m$, are continuously differentiable convex functions. As before, we shall denote by $C$ the constraint set, i.e.,
2   $C = \{z \in \mathbb{R}^n \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m\}.$
Since we assume that the functions $f^i(\cdot)$, $i = 1, 2, \ldots, m$, are convex, it is clear that the set $C$ must also be convex. Let $z$ be any point in $C$. We define the projection of $z - \nabla f^0(z)$ onto $C$ to be the point $z_p \in C$ which satisfies
3   $\|z_p - (z - \nabla f^0(z))\| = \min\{\|z' - (z - \nabla f^0(z))\| \mid z' \in C\}.$
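For a concrete picture of this projection, consider the special case in which $C$ is a box, where the nearest point is obtained by clipping each coordinate. The sketch below is an illustration under that assumption; the function names and the quadratic cost used in the usage note are hypothetical and are not taken from the text.

```python
def project_box(y, lower, upper):
    """Euclidean projection of y onto the box {z : lower <= z <= upper}."""
    return tuple(min(max(yi, lo), hi) for yi, lo, hi in zip(y, lower, upper))


def projected_gradient_point(z, grad_f0, lower, upper):
    """Return the projection of z - grad f0(z) onto the box, and the
    unprojected point y = z - grad f0(z) itself."""
    g = grad_f0(z)
    y = tuple(zi - gi for zi, gi in zip(z, g))
    return project_box(y, lower, upper), y
```

With the cost $(z^1 - 2)^2 + (z^2 + 1)^2$, the unit box, and $z = (0.5, 0.5)$, the unprojected point is $(3.5, -2.5)$ and its projection is the corner $(1, 0)$. For every point $z'$ of the box, the inner product of $z'$ minus the projection with $(3.5, -2.5)$ minus the projection is nonpositive, which is the usual characterization of a projection onto a closed convex set.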
4 Exercise. Show that $z_p$ exists and is unique because $C$ is closed. Also, show that
5   $\langle z' - z_p,\ z - \nabla f^0(z) - z_p\rangle \le 0$   for all $z' \in C$.

0 such that for every $z \in C$ and for every $\varepsilon \in [0, \varepsilon']$, the vectors $f_i$, $i \in I_\varepsilon(z)$, are linearly independent. This assumption can be removed at the expense of an increase in the complexity of the gradient projection algorithm and we shall indicate later how this can be done. In the algorithm below, we shall use the notation introduced in (10), (16) and (18), setting $F_I = 0$, $P_I = 0$ when $I$ is empty, and in addition, for every $\varepsilon \in [0, \varepsilon']$ and for every $z \in C$, we define
36
37 Algorithm (gradient projection method for (32), Polak [P1]).
Step 0. Compute a $z_0 \in C$; select a $\beta \in (0, 1)$, an $\bar\varepsilon \in (0, \varepsilon']$, and an $\varepsilon'' \in (0, \bar\varepsilon)$; set $i = 0$.†
Comment. See (2.30) for a “bootstrap” method for computing a $z_0$. It is common to set $\beta = \tfrac{1}{2}$.
Step 1. Set $z = z_i$.
Step 2. Set $\varepsilon_0 = \bar\varepsilon$ and set $j = 0$.

* When $I_0(z)$ is empty, we set $F_{I_0(z)} = 0$, $P_{I_0(z)} = 0$.
† The $\varepsilon' > 0$ is assumed to be such that (35) holds.
Step 3. Compute the vector,
38
Step 4. If $\|h_{\varepsilon_j}(z)\|^2 > \varepsilon_j$, set $h(z) = -h_{\varepsilon_j}(z)$, set $\varepsilon(z) = \varepsilon_j$, and go to step 12; else, go to step 5.
Comment. Do not store the value of $\varepsilon(z)$. It is introduced here only because it will be needed in the convergence proofs to follow.
Step 5. If $\varepsilon_j \le \varepsilon''$, compute the vector $h_0(z)$ (as in (38), with $\varepsilon_j$ replaced by 0), compute the vector $y_0(z)$ (according to (36)), and go to step 6; else, go to step 7.
Step 6. If $\|h_0(z)\|^2 = 0$ and $y_0(z) \le 0$, set $z_{i+1} = z$, and stop; else, go to step 7.
Step 7. Compute the vector $y_{\varepsilon_j}(z)$ (according to (36)).
Step 8. If $y_{\varepsilon_j}(z) \le 0$, set $\varepsilon_{j+1} = \beta\varepsilon_j$, set $j = j + 1$, and go to step 3; else, go to step 9.
Step 9. Assuming that $I_{\varepsilon_j}(z) = \{k_1, k_2, \ldots, k_{m'}\}$ and that $k_1 < k_2 < \cdots < k_{m'}$, set $\mu^{k_\alpha}_{\varepsilon_j}(z) = y^\alpha_{\varepsilon_j}(z)$ for $\alpha = 1, 2, \ldots, m'$ (where $y^\alpha_{\varepsilon_j}(z)$ is the $\alpha$th component of the vector y

0 such that for any $z \in C$ and any $\varepsilon \in [0, \varepsilon']$, the vectors $\nabla f^i(z)$, $i \in I_\varepsilon(z)$, are linearly independent. We retain the notation introduced earlier in this section with the following, rather obvious, adaptation: For any $\varepsilon \in [0, \varepsilon']$ and any $z \in C$, we shall denote by
93   $F_{I_\varepsilon(z)} = (\nabla f^i(z))_{i \in I_\varepsilon(z)}$
a matrix whose columns are the vectors $\nabla f^i(z)$, $i \in I_\varepsilon(z)$, ordered linearly on $i$. The projection matrices $P_{I_\varepsilon(z)}$ and $P^\perp_{I_\varepsilon(z)}$ will still be defined by (16) and (18), but with the matrix $F_{I_\varepsilon(z)}$ now defined by (93).

94 Algorithm (hybrid gradient projection method, Polak [P1]).
Comment. The first eleven steps of this algorithm are the same as in (37).
Step 0. Compute a $z_0 \in C$; select a $\beta \in (0, 1)$, an $\bar\varepsilon \in (0, \varepsilon']$, and an $\varepsilon'' \in (0, \bar\varepsilon)$; set $i = 0$.
Comment. See (2.30) for a method of computing a $z_0$. It is common to set $\beta = \tfrac{1}{2}$.
Step 1. Set $z = z_i$.
Step 2. Set $\varepsilon_0 = \bar\varepsilon$ and set $j = 0$.
Step 3. Compute the vector
95
Step 4. If $\|h_{\varepsilon_j}(z)\|^2 > \varepsilon_j$, set $h(z) = -h_{\varepsilon_j}(z)$, set $\varepsilon(z) = \varepsilon_j$, and go to step 12; else, go to step 5.
Comment. Do not store $\varepsilon(z)$; it is only introduced for the sake of the proofs to follow.
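Step 5. If $\varepsilon_j \le \varepsilon''$, compute the vector $h_0(z)$ (using formula (95) with $\varepsilon_j$ replaced by 0), compute the vector $y_0(z)$ (using (36)), and go to step 6; else, go to step 7.
Step 6. If $\|h_0(z)\|^2 = 0$ and $y_0(z) \le 0$, set $z_{i+1} = z$, and stop ($z$ is optimal); else, go to step 7.
Step 7. Compute the vector $y_{\varepsilon_j}(z)$ (using formula (36)).
Step 8. If $y_{\varepsilon_j}(z) \le 0$, set $\varepsilon_{j+1} = \beta\varepsilon_j$, set $j = j + 1$, and go to step 3; else, go to step 9.
Step 9. Assuming that $I_{\varepsilon_j}(z) = \{k_1, k_2, \ldots, k_{m'}\}$ and that $k_1 < k_2 < \cdots < k_{m'}$, set $\mu^{k_\alpha}_{\varepsilon_j}(z) = y^\alpha_{\varepsilon_j}(z)$ for $\alpha = 1, 2, \ldots, m'$.
Step 10. Find the smallest $k \in I_{\varepsilon_j}(z)$ such that the vector
96   $\tilde h_{\varepsilon_j}(z) = P^\perp_{I_{\varepsilon_j}(z) - k}\,\nabla f^0(z)$
satisfies the relation
97   $\|\tilde h_{\varepsilon_j}(z)\| = \max\{\|P^\perp_{I_{\varepsilon_j}(z) - l}\,\nabla f^0(z)\| \mid l \in I_{\varepsilon_j}(z),\ \mu^l_{\varepsilon_j}(z) > 0\}$;
set $h(z) = -\tilde h_{\varepsilon_j}(z)$.
Step 11. If $\|h(z)\|^2 \le \varepsilon_j$, set $\varepsilon_{j+1} = \beta\varepsilon_j$, set $j = j + 1$, and go to step 3; else, set $\varepsilon(z) = \varepsilon_j$, and go to step 12.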
Comment. Generally, the half-line $\{z' = z + \lambda h(z) \mid \lambda \ge 0\}$ intersects the set $C$ at the point $z$ only, i.e., generally, $h(z)$ does not define a feasible direction. We now construct a vector $v(z)$ which does define a feasible direction.
Step 12. If $h(z) = -h_{\varepsilon(z)}(z)$, set $K_{\varepsilon(z)}(z) = I_{\varepsilon(z)}(z)$, and go to step 13; else, set $K_{\varepsilon(z)}(z) = I_{\varepsilon(z)}(z) - k$ and go to step 13.
Step 13. Compute the vector
98
where t = -e(z) (1, 1, ..., 1) E such that 99
[w"
and p(z) 3 1 is the smallest scalar in [I, a)
W Z ( z ) ,4z)>
< -+),
for $l = 0$ when $K_{\varepsilon(z)}(z) = I_{\varepsilon(z)}(z)$, and for $l = 0, k$, when $K_{\varepsilon(z)}(z) = I_{\varepsilon(z)}(z) - k$.
Step 14. Compute $\lambda(z) \ge 0$ to be the smallest scalar satisfying
100   $f^0(z + \lambda(z) v(z)) = \min\{f^0(z + \lambda v(z)) \mid \lambda \ge 0,\ z + \lambda v(z) \in C\}.$
Step 15. Set $z_{i+1} = z + \lambda(z) v(z)$, set $i = i + 1$, and go to step 1.
101 Theorem. Suppose that problem (1) is such that algorithm (94) is well-defined for all $z \in \{z \in C \mid f^0(z) \le f^0(z_0)\}$, where $z_0$ is as determined in step 0
0 and of a $\delta < 0$ such that
102   $f^0(z' + \lambda(z') v(z')) - f^0(z') \le \delta$   for all $z' \in B(z, \varepsilon)$.
First, proceeding exactly as in the proof of theorem (61), and, in addition, making use of the fact that the functions $f^i(\cdot)$, $i = 0, 1, \ldots, m$, are continuously differentiable, we can show that if $z \in C$ is not optimal for (1), then there exists a $\rho > 0$ and an $\hat\varepsilon > 0$ such that algorithm (94) will set $\varepsilon(z') \ge \hat\varepsilon$ for all $z' \in B(z, \rho)$. Next, by construction in step 13 (see (99)),
< -E(z’) < -2; v(z’)) < -E(z’) < - P
103 (Vfo(~‘),u(z’))
104 (V’’(Z’),
for I
E J(z,)(Z’),
1 $ Kf(Z4z’);
105 = P(Z) O), then the optimality conditions (1.2.25)-(1.2.30) will be satisfied for our problem. In view of the above discussion, let us concern ourselves for a while with the following problem:
15 The Primal Problem. We are given a compact set C C R” which is either strictly convex, or else it consists of a unique point, and we are also given a map 93 : [ a m i n , CO) -+ 2’”, a m i n 3 0, such that (i) for every a 3 a m i n , @(a) is a compact convex set which has an interior for every a > a m i n ; (ii) the map @(.) is continuous in the Hausdorff metric;* (iii) for every a 3 a m i n , @(a) = b(a)n K, where K is either R” or else a convex polytope with interior, and &(a) is a strictly convex set, with the property that &(a’) is contained in the interior of &(a“) whenever a’ < a”. We are required to find an B 3 a m i n and a vector 4 E C , such that 6 = min{a I @(a) n C # 0 , a
16
amin},
{a} = 9 ( B ) n C.
17
In addition, we may wish t o find a unit vector i such that 18
(x
-
4, P)
a m i n such that , a’] is compact, there exists an B E [amin , a’] such that B = inf{a I 9 ( a ) n C # O , a 3 amin}. Let {ai}~=o be any sequence Prooj
@(a’) n C # O . Since [amin
* Given two compact sets, $A$, $B$, in $\mathbb{R}^\nu$, the Hausdorff distance between these sets is defined by $d(A, B) = \max\{d_1, d_2\}$, where $d_1 = \max_{x \in A} \min_{y \in B} \|x - y\|$ and $d_2 = \max_{y \in B} \min_{x \in A} \|x - y\|$.
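For finite point sets, which are compact, the Hausdorff distance in this footnote can be evaluated directly. The following sketch is only an illustration; the function name `hausdorff_distance` is hypothetical, and points are assumed to be given as tuples of coordinates.

```python
import math


def hausdorff_distance(A, B):
    """Hausdorff distance between two finite, nonempty point sets A and B."""
    # d1: how far the worst point of A is from its nearest point of B
    d1 = max(min(math.dist(x, y) for y in B) for x in A)
    # d2: how far the worst point of B is from its nearest point of A
    d2 = max(min(math.dist(x, y) for x in A) for y in B)
    return max(d1, d2)
```

For example, with $A = \{(0,0), (1,0)\}$ and $B = A \cup \{(1,2)\}$, every point of $A$ lies in $B$, so the first quantity is zero, while the point $(1,2)$ is at distance 2 from $A$, so the Hausdorff distance is 2.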
5.2
A DtCOMPOSITION ALGORITHM OF THE DUAL TYPE
215
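in $[0, \alpha']$ which decreases monotonically to $\hat\alpha$, satisfying, in addition, $\mathcal{R}(\alpha_i) \cap C \ne \emptyset$ for $i = 0, 1, 2, \ldots$. Then, by (ii) of (15), the compact sets $\mathcal{R}(\alpha_i) \cap C$ form a monotonically decreasing sequence, and satisfy $\mathcal{R}(\alpha_{i+1}) \cap C \subset \mathcal{R}(\alpha_i) \cap C$ for $i = 0, 1, 2, \ldots$. Consequently, the sequence of sets $\{\mathcal{R}(\alpha_i) \cap C\}_{i=0}^{\infty}$ converges to the set $\bigcap_{i=0}^{\infty} (\mathcal{R}(\alpha_i) \cap C) \ne \emptyset$. Now by assumption (15) (ii), $\mathcal{R}(\cdot)$ is continuous in the Hausdorff metric, and hence, $\mathcal{R}(\alpha_i) \to \mathcal{R}(\hat\alpha)$ as $i \to \infty$. But $\mathcal{R}(\hat\alpha)$ is compact by (15) (i), and hence,
$\bigcap_{i=0}^{\infty} (\mathcal{R}(\alpha_i) \cap C) = \left(\bigcap_{i=0}^{\infty} \mathcal{R}(\alpha_i)\right) \cap C = \mathcal{R}(\hat\alpha) \cap C$,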
and
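Next, suppose that $\mathcal{R}(\hat\alpha) \cap C$ consists of more than one point, i.e., suppose that $x' \ne x''$ are both in $\mathcal{R}(\hat\alpha) \cap C$. Then the linear segment $l = \{x = \lambda x' + (1 - \lambda) x'' \mid \lambda \in (0, 1)\} \subset \mathcal{R}(\hat\alpha) \cap C$ is in the interior of both $C$ and $\tilde{\mathcal{R}}(\hat\alpha)$, since both of these sets are strictly convex. Now $\tilde{\mathcal{R}}(\cdot)$ is continuous in the Hausdorff metric, and by (15) (iii), $\tilde{\mathcal{R}}(\alpha) \subset \tilde{\mathcal{R}}(\hat\alpha)$ for every $\alpha \in [\alpha_{\min}, \hat\alpha)$. We therefore conclude that there exists an $\alpha'' < \hat\alpha$ such that $\tilde{\mathcal{R}}(\alpha'') \cap l \ne \emptyset$. But $l \subset K$, and hence, $K \cap \tilde{\mathcal{R}}(\alpha'') \cap l = \mathcal{R}(\alpha'') \cap l \ne \emptyset$. However, this contradicts the optimality of $\hat\alpha$, and hence, $\mathcal{R}(\hat\alpha) \cap C$ must consist of a unique point $\hat x$. Finally, since both $\mathcal{R}(\hat\alpha)$ and $C$ are convex and since their intersection consists of a single point $\hat x$, there exists a hyperplane $H = \{x \mid \langle x - \hat x, \hat s\rangle = 0\}$ which separates $\mathcal{R}(\hat\alpha)$ from $C$, i.e., there exists a vector $\hat s$ of unit norm which satisfies (18) and (19). We shall need the following observation in justifying some of the definitions to follow: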
22 Proposition. Let $\hat\alpha$ be defined as in (16) and suppose that (20) is satisfied. Then for any $\alpha_{\min} \le \alpha' < \alpha'' \le \hat\alpha$, $\mathcal{R}(\alpha') \ne \mathcal{R}(\alpha'')$.
0,
which contradicts (24). Thus, $v(\cdot)$ must be continuous. Now, for every $s \in S$, let
30 $P(s) = \{x \in \mathbb{R}^\nu \mid \langle x - v(s), s\rangle = 0\}$,
i.e., $P(s)$ is the support hyperplane to $C$ at $v(s)$, with outward normal $s$. Next, let $T \subset S$ be defined by
31 $T = \{s \in S \mid \langle x - v(s), s\rangle \ge 0 \text{ for all } x \in \mathcal{R}(\alpha_{\min})\}$,
i.e., for every $s \in T$, the hyperplane $P(s)$ separates $\mathcal{R}(\alpha_{\min})$ from $C$. Now, suppose that $\hat x$, $\hat s$, $\hat\alpha$ satisfy (16)-(19). Then we must have $\hat x = v(\hat s)$, and, since $\mathcal{R}(\alpha_{\min}) \subset \mathcal{R}(\hat\alpha)$, $\hat s \in T$.*
See figure on p. 220.
† Note that (24) can also be written as $\langle v(s), s\rangle = \max\{\langle x, s\rangle \mid x \in C\}$.
Finally, let $d : T \to \mathbb{R}^1$ and $w : T \to \mathbb{R}^\nu$ be defined as follows:*
32 $d(s) = \min\{\alpha \mid \mathcal{R}(\alpha) \cap P(s) \ne \emptyset,\ \alpha \ge \alpha_{\min}\}$,
33 $w(s) = \mathcal{R}(d(s)) \cap P(s)$.
34 Proposition. The maps $d(\cdot)$ and $w(\cdot)$ are well-defined.
Proof. Let us consider $d(\cdot)$ first. Suppose that for some $s \in T$, $\mathcal{R}(\alpha) \cap P(s) = \emptyset$ for all $\alpha \ge \alpha_{\min}$. Then, since $\mathcal{R}(\alpha_{\min}) \subset \mathcal{R}(\alpha)$ for all $\alpha \ge \alpha_{\min}$, $P(s)$ must separate $\mathcal{R}(\alpha)$ from $C$ for all $\alpha \ge \alpha_{\min}$, in contradiction of our assumption that $C$ has points in $\mathcal{R}(\alpha)$ for some $\alpha \ge \alpha_{\min}$ (see (20)). Hence, there exists an $\alpha' \ge \alpha_{\min}$ such that $\mathcal{R}(\alpha') \cap P(s) \ne \emptyset$, and therefore, $\alpha'' = \inf\{\alpha \mid \mathcal{R}(\alpha) \cap P(s) \ne \emptyset,\ \alpha \ge \alpha_{\min}\} \le \alpha'$. Let $\{\alpha_i\}_{i=0}^{\infty}$ be a sequence in $[\alpha_{\min}, \alpha']$ which decreases monotonically to $\alpha''$ such that $\mathcal{R}(\alpha_i) \cap P(s) \ne \emptyset$ for $i = 0, 1, 2, \ldots$. Then the compact sets $\mathcal{R}(\alpha_i) \cap P(s)$ form a monotonically decreasing sequence, satisfying $\mathcal{R}(\alpha_{i+1}) \cap P(s) \subset \mathcal{R}(\alpha_i) \cap P(s)$ for $i = 0, 1, 2, \ldots$, and hence, $\mathcal{R}(\alpha_i) \cap P(s)$ converges in the Hausdorff metric to a set $\mathcal{R}'' \cap P(s) \ne \emptyset$. But by assumption, $\mathcal{R}(\cdot)$ is continuous in the Hausdorff metric; hence, we must have $\mathcal{R}(\alpha'') = \mathcal{R}''$, and so $\mathcal{R}(\alpha'') \cap P(s) \ne \emptyset$. We therefore conclude that $d(\cdot)$ is well-defined.
Now let us turn to the map $w(\cdot)$. We have just shown that the set $\mathcal{R}(d(s)) \cap P(s)$ is not empty for all $s \in T$. It remains to show that it consists of a unique point. So, suppose that for some $s \in T$, the set $\mathcal{R}(d(s)) \cap P(s)$ contains two distinct points, $w_1 \ne w_2$. Then, since that set is convex, it must also contain the linear segment $\{w \mid w = \lambda w_1 + (1 - \lambda) w_2,\ 0 \le \lambda \le 1\}$. Since by assumption, $\tilde{\mathcal{R}}(d(s))$ is strictly convex (see (15) (iii)), this leads us to the conclusion that $P(s)$ must be a support hyperplane to the set $K$. However, this is impossible, since by assumption (20), $C$ has points in the interior of $K$, and hence, $P(s)$ cannot separate $C$ from $K$ as it would if it were a support hyperplane to $K$. Hence, $w(\cdot)$ is well-defined. ∎
Now suppose that $\hat\alpha$, $\hat x$ and $\hat s$ satisfy (16)-(19). Then we must have
35
B E T,
8 = d(P),
<
$\hat\alpha$. Then, since $\hat\alpha$ is optimal for (15), since $\mathcal{R}(\alpha)$ is convex for every $\alpha \ge \alpha_{\min}$, since $\mathcal{R}(\hat\alpha) \subset \mathcal{R}(d(s))$, since $C$ is strictly convex, and because of (20), the convex set $\mathcal{R}(d(s))$ must have points in the interior of $C$. But this is impossible, since, by construction, the hyperplane $P(s)$ separates $\mathcal{R}(d(s))$ from $C$. Hence, $d(s) \le \hat\alpha$ for all $s \in T$. ∎ Consequently, problem (15) reduces to the following dual problem:
d(s).
Proof. Since $s$ is a normal to the hyperplane $P(s)$ which contains both $v(s)$ and $w(s)$, $s$ is orthogonal to $[w(s) - v(s)]$. Hence, the two vectors, $s$ and $w(s) - v(s)$, are linearly independent, and therefore the set
* By $\pi(x, y)C$, we denote the set $\{x' = \pi(x, y)x'' \mid x'' \in C\}$, etc.
† To show that $p(x, y)$ is well-defined, we essentially repeat the arguments in lemma (21).
§ The set $\pi(s, w(s) - v(s))T$ is a circular arc of unit radius (it is the intersection of $T$ with the plane $\{x \mid x = \lambda s + \mu(w(s) - v(s)),\ -\infty < \lambda, \mu < \infty\}$).
is a unit circle in $\mathbb{R}^\nu$. Constructing a dual for (39) in the same fashion as we have constructed (37), we conclude that
43 $p(s, w(s) - v(s)) = \max\{d(s') \mid s' \in \sigma(s)\}$,
where
44 $\sigma(s) = \{s' \in T \mid s' = \lambda s + \mu(w(s) - v(s)),\ \lambda, \mu \in (-\infty, \infty)\}$.
In obtaining (43), we make use of the fact that if $s' \in \pi(s, w(s) - v(s))\mathbb{R}^\nu$, $\|s'\| = 1$, is such that $\pi(s, w(s) - v(s))P(s')$, the one-dimensional projection of the hyperplane $P(s')$ onto the two-dimensional subspace spanned by $s$ and $[w(s) - v(s)]$, separates $\pi(s, w(s) - v(s))\mathcal{R}(\alpha_{\min})$ from $\pi(s, w(s) - v(s))C$, then $P(s')$ separates $\mathcal{R}(\alpha_{\min})$ from $C$, i.e., $s' \in T$, because
$P(s') \cap \pi(s, w(s) - v(s))\mathbb{R}^\nu = \pi(s, w(s) - v(s))P(s')$.
Conversely, if $s' \in \sigma(s)$, so that $s' = \pi(s, w(s) - v(s))s'$, and $\langle y, s'\rangle \ge \langle x, s'\rangle$ for all $y \in \mathcal{R}(\alpha_{\min})$, for all $x \in C$, then $\langle y, s'\rangle \ge \langle x, s'\rangle$ for all $y \in \pi(s, w(s) - v(s))\mathcal{R}(\alpha_{\min})$, for all $x \in \pi(s, w(s) - v(s))C$. Consequently, since $s \in \sigma(s)$, we have
45 $p(s, w(s) - v(s)) \ge d(s)$.
Suppose that
46 $p(s, w(s) - v(s)) = d(s)$.
Then we must have
47 $\pi(s, w(s) - v(s))\, v(s) = \pi(s, w(s) - v(s))\, w(s)$.
But this is impossible, since $v(s) \ne w(s)$ and the vector $w(s) - v(s)$ is in the range of the projection operator $\pi(s, w(s) - v(s))$. Hence, (42) must hold. In view of (43) and (44), we see that the map $A(\cdot)$ is well-defined, i.e., there is no $s \in T$ such that $A(s) = \emptyset$. We can now state our first algorithm for solving the dual problem (37). This algorithm has a rather interesting history, having evolved over a number of years through the work of Krassovskii [K3], Neustadt [N1], Eaton [E1], and Polak and Deparis [P4]. The version below was first presented by Polak in [P2].
48 Algorithm (for dual problem (37), Polak [P2]).
Step 0. Compute a point $s_0 \in T$, and set $i = 0$.
Step 1. Set $s = s_i$.
Step 2. Compute a point $s' \in A(s)$.
Step 3. If $d(s') = d(s)$, set $s_{i+1} = s$ and stop; else, set $s_{i+1} = s'$, set $i = i + 1$, and go to step 1.
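The control flow of algorithm (48) can be sketched as a short loop. This is only an illustration under stated assumptions: the callables `d` and `improve` are hypothetical oracles standing in for the map $d(\cdot)$ and for the selection of a point $s' \in A(s)$, the stopping test uses a small tolerance `tol` in place of the exact equality of step 3, and the toy problem at the end is a one-dimensional stand-in rather than a problem from the text.

```python
def dual_ascent(s0, d, improve, tol=1e-12, max_iter=1000):
    """Sketch of the loop in algorithm (48).

    s0      -- starting point (assumed to lie in T)
    d       -- callable returning d(s)               (hypothetical oracle)
    improve -- callable returning some s' in A(s)    (hypothetical oracle)
    tol     -- numerical stand-in for the test d(s') = d(s) in step 3
    """
    s = s0
    for _ in range(max_iter):
        s_new = improve(s)              # step 2
        if d(s_new) - d(s) <= tol:      # step 3: no further increase of d
            return s
        s = s_new                       # step 3 (else branch), back to step 1
    return s

# Toy stand-in: maximize a concave function of one real variable.
def toy_d(s):
    return -(s - 0.3) ** 2

def toy_improve(s):
    # Best of the current point and its two neighbors on a grid of width 0.1.
    return max((s - 0.1, s, s + 0.1), key=toy_d)

s_star = dual_ascent(0.0, toy_d, toy_improve)
print(s_star)
```

In an actual application `improve` would hide the difficult part, namely the maximization of $d$ over $\sigma(s)$.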
5 CONVEX OPTIMAL CONTROL PROBLEMS

[Figure: Sets and maps for algorithm (48).]
49 Theorem. Let $\{s_i\}$ be any sequence in $T$ constructed by algorithm (48). Then either $\{s_i\}$ is finite and its last element is optimal for the dual problem (37), or else $\{s_i\}$ is infinite and every accumulation point of $\{s_i\}$ is optimal for the dual problem (37).
Proof. Quite obviously, algorithm (48) is of the form of the model (1.3.9), except that it maximizes $d(s)$ instead of minimizing $c(s)$. We shall therefore need to put a minus sign in front of all the relations involving the map $d(\cdot)$ defined in (32), when showing that the assumptions of theorem (1.3.10) are satisfied by algorithm (48). For this purpose also, we define a point $s' \in T$ to be desirable if it is optimal for the dual problem (37). Suppose that $s \in T$ is optimal for (37). Then we must have
50 $d(s) = \max\{d(s') \mid s' \in T\} \ge p(s, w(s) - v(s)) = \max\{d(s') \mid s' \in \sigma(s)\}$,
and hence, since $s \in \sigma(s)$, $d(s) = p(s, w(s) - v(s))$. (Note that when $s$ is optimal, $v(s) = w(s)$, and therefore $\sigma(s) = \{s\}$.) Conversely, suppose that $d(s') = d(s)$ for any $s' \in A(s)$; then, by lemma (41), $s$ must be optimal for (37). Because of this, the case of a finite sequence $\{s_i\}$ is trivial. To establish the theorem for the case when $\{s_i\}$ is infinite, we must show that assumptions (i) and (ii) of theorem (1.3.10) are satisfied by the map $-d(\cdot)$ defined by (32) and the map $A(\cdot)$ defined by (40). Since $-d(s) \ge -\hat\alpha$ for all $s \in T$, where $\hat\alpha$ is defined as in (16), we see that (i) of (1.3.10) is satisfied
by $-d(\cdot)$, and hence we only need to show that for every nonoptimal $s \in T$ which could, conceivably, be an accumulation point of a sequence $\{s_i\}$ in $T$ constructed by algorithm (48), there exist an $\varepsilon(s) > 0$ and a $\delta(s) < 0$ such that
51 $-d(s'') + d(s') < \delta(s)$ for all $s' \in B(s, \varepsilon(s))$, for all $s'' \in A(s')$
$\alpha_{\min}$, we define $T(\alpha) = \{s \in T \mid \langle x - v(s), s\rangle > 0 \text{ for all } x \in \mathcal{R}(\alpha)\}$.
Since $d(s_{i+1}) > d(s_i)$ for $i = 0, 1, 2, \ldots$, we must have $\mathcal{R}(d(s_{i+1})) \supset \mathcal{R}(d(s_i))$, and hence, $T(d(s_{i+1})) \subset T(d(s_i)) \subset \cdots \subset T(d(s_0))$ for $i = 0, 1, 2, \ldots$. Now, let $s' \in T(d(s_1))$. Then we must have $d(s') \ge d(s_1)$, and since both $T(d(s_1))$ and $\mathcal{R}(\alpha_{\min})$ are compact, there must exist a $\beta > 0$ such that $\langle x - v(s'), s'\rangle \ge \beta$ for all $x \in \mathcal{R}(\alpha_{\min})$, for all $s' \in T(d(s_1))$. Otherwise, for some $s' \in T(d(s_1))$, there exists an $x' \in \mathcal{R}(\alpha_{\min})$ such that $\langle x' - v(s'), s'\rangle = 0$, implying that $d(s') = \alpha_{\min}$, in direct contradiction of the fact that $d(s') \ge d(s_1) > \alpha_{\min}$. Since $s_i \in T(d(s_1))$ for $i = 1, 2, \ldots$, every accumulation point $s^*$ of this sequence must satisfy $\langle x - v(s^*), s^*\rangle \ge \beta > 0$ for all $x \in \mathcal{R}(\alpha_{\min})$. Consequently, the only nonoptimal points $s \in T$ which can, conceivably, be limit points of a sequence $\{s_i\}$ constructed by (48), must satisfy
54 $\max\{\langle x - v(s), s\rangle \mid x \in \mathcal{R}(\alpha_{\min})\} > 0$,
i.e., they must belong to the relative interior of $T$. Thus, we only need to prove (52) for all nonoptimal $s$ in the relative interior of $T$. Obviously, it will suffice to show that $d(\cdot)$ and $\sigma(\cdot)$ are both continuous at any nonoptimal $s$ in the relative interior of $T$.
Continuity of $d(\cdot)$. Let $s$ be any point in the relative interior of $T$, i.e., $\langle x - v(s), s\rangle > \beta > 0$ for all $x \in \mathcal{R}(\alpha_{\min})$, and let $\delta' \in [\alpha_{\min}, d(s))$ be
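arbitrary. Then the sets $\mathcal{R}(d(s) - \delta')$ and $P(s)$ are strictly separated, since $\mathcal{R}(d(s) - \delta') \cap P(s) = \emptyset$, by the definition of $d(s)$. Let $w' \in P(s)$ and $w'' \in \mathcal{R}(d(s) - \delta')$ be such that
55 $\|w' - w''\| = \min\{\|x - y\| \mid x \in P(s),\ y \in \mathcal{R}(d(s) - \delta')\}$.
Let $w = \tfrac{1}{2}(w' + w'')$; then there exists a $\gamma > 0$ such that
56 $\max_{x \in C} \langle x - w, s\rangle \le -\gamma$, $\quad \min_{x \in \mathcal{R}(d(s) - \delta')} \langle x - w, s\rangle \ge \gamma$.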
Now, by theorem (B.3.20), the functions $\max_{x \in C} \langle x - w, \cdot\rangle$ and $\min_{x \in \mathcal{R}(d(s) - \delta')} \langle x - w, \cdot\rangle$ are continuous on $\mathbb{R}^\nu$, since both $C$ and $\mathcal{R}(d(s) - \delta')$ are compact, and the scalar product is jointly continuous in both of its arguments. Hence, there exists an $\varepsilon' > 0$ such that for all $s' \in T$, $\|s - s'\| < \varepsilon'$, $\max_{x \in C} \langle x - w, s'\rangle$
57
xsc
-Y d(s) - 6‘
for all s’ E T,
I/ s’
< E’.
- s 11
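Now, let $\delta'' \in (d(s), \hat\alpha]$ be arbitrary. Then we must have
59 $\min_{x \in \mathcal{R}(d(s) + \delta'')} \langle x - v(s), s\rangle = -\lambda < 0$.
Let $v' = v(s) - \gamma'\lambda s$, where $\gamma' \in (0, 1)$ is such that $v' \in C$. Then we have
60 $\min_{x \in \mathcal{R}(d(s) + \delta'')} \langle x - v', s\rangle \le -(1 - \gamma')\lambda$, $\quad \max_{x \in C} \langle x - v', s\rangle = \gamma'\lambda$.
Invoking once again theorem (B.3.20), we conclude that there exists an $\varepsilon'' > 0$ such that for all $s' \in T$ satisfying $\|s' - s\| < \varepsilon''$,
61 $\min_{x \in \mathcal{R}(d(s) + \delta'')} \langle x - v', s'\rangle \le -\dfrac{(1 - \gamma')\lambda}{2}$, $\quad \max_{x \in C} \langle x - v', s'\rangle \ge \dfrac{\gamma'\lambda}{2}$.
But (61) implies that
62 $d(s') < d(s) + \delta''$ for all $s' \in T$, $\|s' - s\| < \varepsilon''$.
Combining (58) and (62), we conclude that $d(\cdot)$ is continuous at $s$.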
Continuity of $\sigma(\cdot)$. First, by arguments similar to the ones used above, it can be shown that the map $p : \mathbb{R}^\nu \times \mathbb{R}^\nu \to \mathbb{R}^1$, defined by (39), is continuous at every pair of vectors $(x, y)$ which are linearly independent. Now, whenever $s$ is not optimal for the dual problem (37), $w(s) - v(s) \ne 0$ and is orthogonal to $s$. Hence, we conclude that $\sigma(\cdot)$ is continuous at every nonoptimal $s \in T$ if $w(\cdot)$ is continuous at every nonoptimal $s \in T$ (recall that we have shown in (25) that $v(\cdot)$ is continuous). Let $s \in T$ be nonoptimal and let $\{s_i\}_{i=0}^{\infty}$ be any sequence in $T$ converging to $s$. Then, since $d(\cdot)$ is continuous, $d(s_i) \to d(s)$ as $i \to \infty$. Then, since $\mathcal{R}(\cdot)$ is continuous, $\mathcal{R}(d(s_i)) \to \mathcal{R}(d(s))$ as $i \to \infty$. Now, let $w$ be any accumulation point of the sequence $\{w(s_i)\}_{i=0}^{\infty}$, i.e., $w(s_i) \to w$ for $i \in K \subset \{0, 1, 2, \ldots\}$. Then $w(s_i) \in \mathcal{R}(d(s_i))$, and therefore $w \in \mathcal{R}(d(s))$. Also, since $w(s_i) \in P(s_i)$, we must have
63 $\langle w(s_i) - v(s_i), s_i\rangle = 0$ for $i = 0, 1, 2, \ldots$.
But, for $i \in K$, $w(s_i) \to w$, $v(s_i) \to v(s)$, and $s_i \to s$, as $i \to \infty$. Consequently, since the scalar product is continuous,
64 $\langle w - v(s), s\rangle = 0$,
i.e., $w \in P(s)$. But by the nature of problem (37), $\mathcal{R}(d(s)) \cap P(s)$ consists of exactly one point, $w(s)$. Hence, we must have $w = w(s)$, and as a result, $w(s_i) \to w(s)$ as $i \to \infty$, which proves that $w(\cdot)$ is continuous, and completes our proof. ∎
We shall now show what is involved in applying algorithm (48) to the optimal control problem (1)-(5). First, to compute an $s_0 \in T$, we may proceed as follows: We solve the quadratic programming problem,
for its unique solution 5. Let 6 = dk R k f . Then B(olmin)= (G}, and by (201, q ( 6 ) > 0. We now compute X E 10, 11 by solving q(h5 (I - A) a,) = 0, i.e., by solving*
11 h6
65
+ (1 - h)
$k
- 2k
+
;1
f 2p = 0,
which yields 66
+
x)
The point P = JIG (1 - 4, is obviously on the boundary of C , and the tangent hyperplane to C passing through P must separate from dk Rk,f
* Note
that x
E
e, where C
=
+
{x 1 q(x) Q
O}.
and hence it must separate $\mathcal{R}(\alpha_{\min})$ from $C$. Consequently, the vector $s_0 = (1/\|\nabla q(\tilde x)\|)\nabla q(\tilde x) \in T$, where
67 $\nabla q(\tilde x) = Q(\tilde x - \hat x_k)$.
Next, suppose that we have an $s \in T$ and that we wish to calculate $v(s)$. Then, because of the particular form of $C$, we do not need to solve the nonlinear programming problem (24), i.e., $\max\{\langle x, s\rangle \mid x \in C\}$. Instead, we see that we can compute $v(s)$ from the fact that
68 $\lambda > 0$, $\quad \nabla q(v(s)) = \lambda s$, $\quad q(v(s)) = 0$.
From the first part of (68), we get
69 $v(s) - \hat x_k = \lambda Q^{-1}s$,
and from the second part of (68), we get
70 $\lambda^2 \langle Q^{-1}s, s\rangle - 2\beta = 0$,
so that $\lambda = \sqrt{2\beta/\langle Q^{-1}s, s\rangle}$. Substituting back into (69), we finally obtain
71 $v(s) = \hat x_k + \sqrt{2\beta/\langle Q^{-1}s, s\rangle}\; Q^{-1}s$.
Thus, there is no difficulty, in this case, in computing $v(s)$. To compute $d(s)$ and $w(s)$ we must solve the quadratic programming problem,
72
subject to
73 $\langle d_k + R_k z - v(s), s\rangle = 0$, $\quad |z^i| \le 1$, $\quad i = 1, 2, \ldots, k$.
Suppose that $z(s)$ is optimal for (72), (73). Then it is easy to see that
74
75 $w(s) = d_k + R_k z(s)$.
Thus, we can compute both $d(s)$ and $w(s)$ by means of finite procedures.
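The closed-form computation of $v(s)$ in (68)-(71) is easy to check numerically. The sketch below is only an illustration and rests on an assumption: it takes the target set to be $C = \{x \mid \tfrac{1}{2}\langle x - c, Q(x - c)\rangle - \beta \le 0\}$, which is the form suggested by (67) and (70), with `center` playing the role of the point written $\hat x_k$ above. The function name `support_point`, its argument names, and the requirement that the caller supply $Q^{-1}$ directly are choices made for this example.

```python
import math

def support_point(center, Q_inv, beta, s):
    """v(s) = center + sqrt(2*beta / <Q^{-1}s, s>) * Q^{-1}s.

    Assumes C = {x : 0.5*<x - center, Q(x - center)> - beta <= 0} with Q
    symmetric positive definite and beta > 0. Q_inv is the inverse of Q,
    given as a list of rows; s is a nonzero vector.
    """
    n = len(s)
    # Q^{-1} s, computed row by row.
    Qinv_s = [sum(Q_inv[i][j] * s[j] for j in range(n)) for i in range(n)]
    # lambda from (70): lambda^2 * <Q^{-1}s, s> = 2*beta.
    lam = math.sqrt(2.0 * beta / sum(a * b for a, b in zip(Qinv_s, s)))
    return [c + lam * g for c, g in zip(center, Qinv_s)]

# Unit disk centered at (1, 0): Q = I, beta = 0.5.
v_disk = support_point([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5, [0.0, 1.0])
# Ellipse with Q = diag(4, 1), beta = 0.5, direction along the first axis.
v_ell = support_point([1.0, 0.0], [[0.25, 0.0], [0.0, 1.0]], 0.5, [1.0, 0.0])
print(v_disk, v_ell)
```

In both examples the returned point lies on the boundary of $C$ and maximizes $\langle x, s\rangle$ over $C$, which is the defining property of $v(s)$ in (24).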
76 Remark. Suppose that the matrix $P$ is the zero matrix. Then, in (65), $\delta = d_k$ and the evaluation of $d(s)$ and of $w(s)$ becomes considerably simplified, since, from the optimality conditions (1.2.15), it now follows that
77 $z^i(s) = \operatorname{sat}(\psi \langle A^{k-i}B, s\rangle)$, $\quad i = 1, 2, \ldots, k$.*
To compute $\psi$, we substitute from (77) into (73), to obtain the piecewise linear equation in $\psi$
78 $\sum_{i=1}^{k} \operatorname{sat}(\psi \langle A^{k-i}B, s\rangle)\langle A^{k-i}B, s\rangle = -\langle d_k - v(s), s\rangle$.
Since $-\langle d_k - v(s), s\rangle < 0$, because $d_k$ is in $\mathcal{R}(\alpha_{\min})$ and $s \in T$, and since the sum in (78) is monotonically increasing in $\psi$, we conclude that $\psi < 0$. In any event, since the graph of the sum in (78) is piecewise linear, the computation of $\psi$ can be carried out quite simply and in a finite number of iterations. ∎
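One possible way to solve an equation of the form (78) numerically is bisection, which only uses the monotonicity of the sum in $\psi$. The sketch below is an approximation scheme, not the finite procedure that the piecewise linearity makes possible; the names `solve_psi`, `a` (standing for the numbers $\langle A^{k-i}B, s\rangle$) and `c` (standing for the right-hand side $-\langle d_k - v(s), s\rangle$) are assumptions of this example.

```python
def sat(y):
    """sat(y) = y on [-1, 1] and sgn(y) otherwise."""
    return max(-1.0, min(1.0, y))

def solve_psi(a, c, iterations=200):
    """Approximate psi <= 0 with sum_i sat(psi * a_i) * a_i = c, for c < 0.

    The sum is nondecreasing in psi, equals 0 at psi = 0, and cannot fall
    below -sum_i |a_i|, so a solution exists only if c >= -sum_i |a_i|.
    """
    def g(psi):
        return sum(sat(psi * ai) * ai for ai in a)

    if c < -sum(abs(ai) for ai in a):
        raise ValueError("c lies below the fully saturated value of the sum")
    lo, hi = -1.0, 0.0
    while g(lo) > c:            # enlarge the bracket until g(lo) <= c
        lo *= 2.0
    for _ in range(iterations): # halve the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > c:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical data: two terms, target value -2.75.
psi = solve_psi([2.0, 1.0], -2.75)
print(psi)
```

For this data the first term is saturated at the solution, so the equation reduces to $-2 + \psi = -2.75$ and the bisection settles near $\psi = -0.75$.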
79 Exercise. State a simple procedure for computing $\psi$ satisfying (78). ∎
So far, we have encountered no difficulties in applying algorithm (48) to problem (1)-(5). However, we are about to find one in the computation of $p(s, w(s) - v(s))$, and hence of a point $s' \in A(s)$, since this requires us to solve the problem,
80 $\max\{d(s') \mid s' \in T,\ s' = \lambda s + \mu(w(s) - v(s)),\ \lambda, \mu \in (-\infty, \infty)\}$.
There is no finite procedure for solving (80) and hence, in practice, some sort of approximation must be introduced. It has been found empirically that one obtains satisfactory results by picking an integer $M$ (usually $M = 3$ or $5$ will do) and by examining the points
$s_j = \dfrac{1}{\|s + (j/M)(w(s) - v(s))\|}\,[s + (j/M)(w(s) - v(s))]$ for $j = 1, 2, \ldots, M$,
and then setting $s' = s_{j_0}$, where $d(s_{j_0}) \ge d(s_j)$ for $j = 1, 2, \ldots, M$. This procedure can be refined further by first multiplying $(w(s) - v(s))$ by a suitably chosen scale factor $\gamma \in (0, 1)$. On the problem described, this approach results in an algorithm that is considerably faster than the method of feasible directions, which is an obvious alternative. This is partly due to the fact that an $s_0$ for (48) is much easier to compute than a $z_0$ for (4.3.26). Finally, suppose that we have an optimal $\hat s$ for the dual problem which
* Recall, $\operatorname{sat}(y) = y$ for all $y \in [-1, 1]$ and $\operatorname{sat}(y) = \operatorname{sgn} y$ otherwise.
corresponds to the optimal control problem (1)-(5). Then the optimal control sequence $\hat u_0, \hat u_1, \ldots, \hat u_{k-1}$ for (1)-(5) is given by
81 $\hat u_i = z^{i+1}(\hat s)$, $\quad i = 0, 1, 2, \ldots, k - 1$,
where the $z^i(\hat s)$ are determined from (72) and (73) (or from (77) and (78) when $P = 0$). While in practice the heuristic procedure outlined above is quite adequate, the theoretically minded may feel more comfortable with an algorithm of the form of model (1.3.33), which can be constructed as follows: For $\delta \in [0, 2\pi]$, let*
82 $s'(\delta, s) = \dfrac{1}{\|(\cos\delta)s + (\sin\delta)[w(s) - v(s)]\|} \times [(\cos\delta)s + (\sin\delta)[w(s) - v(s)]]$.
83 Exercise. Let $\bar\theta \in [0, 2\pi]$ be such that $\bar\theta = \max\{\delta \mid s'(\delta, s) \in T\}$. Show that $d(s'(\cdot, s))$ is quasi-concave on $[0, \bar\theta]$. (A function $d' : [0, \bar\theta] \to \mathbb{R}^1$ is said to be quasi-concave if the set $\{\delta \in [0, \bar\theta] \mid d'(\delta) \ge \gamma\}$ is convex for every real $\gamma$.) ∎
84 Exercise. Show that if $\hat\theta \in [0, 2\pi]$ is such that
$d(s'(\hat\theta, s)) = \max\{d(s'(\delta, s)) \mid \delta \in [0, 2\pi]\}$,
then $\hat\theta \in [0, \bar\theta]$, where $\bar\theta$ is as defined in (83). As a result, show that
85 $p(s, w(s) - v(s)) = \max\{d(s'(\delta, s)) \mid \delta \in [0, 2\pi]\} = \max\{d(s'(\delta, s)) \mid \delta \in [0, \bar\theta]\}$.
86 Exercise. Show that the Golden section search procedure (2.1.14) can be used to compute $p(s, w(s) - v(s))$ because $d(s'(\cdot, s))$ is quasi-concave on $[0, \bar\theta]$. ∎
Because of the facts which the reader was invited to prove for himself in the preceding three exercises, algorithm (48) can be extended to conform to the model (1.3.33), as follows:
87 Algorithm (for solving problem (37), Polak [P3]).
Step 0. Compute an $s_0 \in T$; select an $\bar\varepsilon > 0$, a $\beta \in (0, 1)$, and a $\rho \ge 1$; set $i = 0$.
Step 1. Set $\varepsilon = \bar\varepsilon$.
Step 2. Set $s = s_i$.
Step 3. Compute $v(s)$, $w(s)$ and $d(s)$.
Step 4. If $v(s) = w(s)$, set $s_{i+1} = s$ and stop; else, go to step 5.
Step 5. Set $\theta(\delta) = -d(s'(\delta, s))$ for $\delta \in [0, 2\pi]$.
Step 6. Use procedure (2.1.14) (with the current value of $\varepsilon$, and $\rho$ as in step 0) to compute a $\bar\mu$.*
Step 7. Compute $d(s'(\bar\mu, s))$.
Step 8. If $d(s'(\bar\mu, s)) - d(s) \ge \varepsilon$, set $s_{i+1} = s'(\bar\mu, s)$, set $i = i + 1$, and go to step 1; else, set $\varepsilon = \beta\varepsilon$, and go to step 6.
* Referring to (44), we see that an alternative description for $\sigma(s)$ (in spherical coordinates) is $\sigma(s) = \{s'(\delta, s) \in T \mid \delta \in [0, 2\pi]\}$.
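Step 6 of algorithm (87) relies on a one-dimensional search over the angle $\delta$. The sketch below shows a generic golden section search for a maximizer on an interval; it is not a transcription of procedure (2.1.14), whose details are not reproduced here, and it assumes that the function is unimodal on the interval (quasi-concave, with no flat stretch away from the maximizer). The name `golden_section_max` and the test function are assumptions of this example.

```python
import math

def golden_section_max(f, a, b, tol=1e-6):
    """Approximate a maximizer of a unimodal function f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0           # golden ratio conjugate, about 0.618
    x1, x2 = b - r * (b - a), a + r * (b - a)  # interior points with x1 < x2
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            # A maximizer lies in [x1, b]; reuse x2 as the new left interior point.
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:
            # A maximizer lies in [a, x2]; reuse x1 as the new right interior point.
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

# Toy stand-in for delta -> d(s'(delta, s)) on [0, theta_bar].
theta_hat = golden_section_max(lambda t: -(t - 1.0) ** 2, 0.0, 3.0)
print(theta_hat)
```

Each pass reuses one of the two previous function values, so only one new evaluation of $d(s'(\delta, s))$ is needed per iteration, which matters when each evaluation requires solving a quadratic program.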
88 Exercise. Show that if $\{s_i\}$ is a sequence in $T$ constructed by algorithm (87), then, either $\{s_i\}$ is finite and its last element is optimal for (37), or else $\{s_i\}$ is infinite, and then every accumulation point of $\{s_i\}$ is optimal for (37). [Hint: Show that the assumptions of theorem (1.3.27) are satisfied by algorithm (87).] ∎
89 Exercise. Show that if algorithm (87) is modified into a time-varying version by replacing the words "go to step 1" by the words "go to step 2" in the instruction in step 8, then the convergence properties stated in (88) remain unaffected. [Hint: Make use of an appropriate model in Section 1.3.] ∎
In problem (1)-(5), the constraint set $C$ was described by a single quadratic inequality. As a result, the computation of the point $v(s)$ presented no difficulty whatsoever. However, suppose that
90 $C = \{x \in \mathbb{R}^\nu \mid q^i(x) \le 0,\ i = 1, 2, \ldots, m\}$,
and the $q^i(\cdot)$ are strictly convex, but not quadratic functions, and that $C$ has an interior. Then there is no way for computing $v(s)$ in a finite number of iterations, and hence neither algorithm (48) nor algorithm (87) can be applied to solving the dual problem (37) without some additional modifications. To keep the discussion as simple as possible, we shall develop a heuristic elaboration of algorithm (48). The reader may wish to extend this elaboration to algorithm (87) for himself. We shall suppose that the functions $q^i(\cdot)$ are not only strictly convex, but also continuously differentiable. We begin by introducing an exterior penalty function for the set $C$; let $p : \mathbb{R}^\nu \to \mathbb{R}^1$ be defined by
91 $p(x) = \sum_{i=1}^{m} (\max\{0, q^i(x)\})^2$,
which we then use to define an approximation set to $v(s)$, as follows: Let $\gamma \ge 0$, $\beta > 0$ be given scale factors; then, for every $\varepsilon > 0$ and $s \in T$, we define
92
* It is necessary to modify (2.1.14) slightly so as to ensure that $[a_i, b_i] \subset [0, \bar\theta]$, where $\bar\theta$ is as in (83).
Note that if y is chosen to be zero, then V,(s) contains exactly one point, u,(s), which minimizes (the convex function) - ( x , s) (1//3~)p(x) over Ry (since (x, s) = 0 is not possible for all x E C because C has an interior). Furthermore, by referring to Section 4.1, we find that if y = 0, then (see theorem (4.1.21)) u,(s) + u(s) as E 0, E E [0, ;I, t‘ > 0,* and (see lemma (4.1.11)) 93 - < - for all E E [0, 21.
+
--f
Consequently, if y
= 0,
94
30
for all
E E
[0, C],
and therefore, $v_\varepsilon(s)$ is separated from $C$ by the hyperplane $P(s)$ passing through $v(s)$, with normal $s$. Now, let
95 $P(x, s) = \{x' \in \mathbb{R}^\nu \mid \langle x' - x, s\rangle = 0\}$, $\quad x \in \mathbb{R}^\nu$, $s \in S$,
i.e., $P(x, s)$ is a hyperplane through $x$ with normal $s$. Then the set $C$ must lie to one side of $P(v_\varepsilon(s), s)$, i.e.,
96 $\langle x - v_\varepsilon(s), s\rangle$
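0, a $\beta \in (0, 1)$, and scale factors $\gamma \ge 0$, $\beta > 0$; set $i = 0$.
Step 1. Set $\varepsilon = \bar\varepsilon$.
Step 2. Set $s = s_i$.
Step 3. Compute a point $v_\varepsilon(s) \in V_\varepsilon(s)$ (as in (92)).
Step 4. Compute $d(v_\varepsilon(s), s)$, $\tilde w(v_\varepsilon(s), s)$.
Step 5. Set $s'(\delta) = \dfrac{1}{\|(\cos\delta)s + (\sin\delta)(\tilde w(v_\varepsilon(s), s) - v_\varepsilon(s))\|}\,[(\cos\delta)s + (\sin\delta)(\tilde w(v_\varepsilon(s), s) - v_\varepsilon(s))]$ for $\delta \in [0, 2\pi]$.
Step 6. For each $\delta \in [0, 2\pi]$, compute a vector $v_\varepsilon(s'(\delta))$.
Step 7. Compute $\hat\theta \in [0, 2\pi]$ such that
100 $d(v_\varepsilon(s'(\hat\theta)), s'(\hat\theta)) = \max\{d(v_\varepsilon(s'(\delta)), s'(\delta)) \mid \delta \in [0, 2\pi]\}$.
Step 8. If $d(v_\varepsilon(s'(\hat\theta)), s'(\hat\theta)) - d(v_\varepsilon(s), s) \ge \varepsilon$, set $s_{i+1} = s'(\hat\theta)$, set $i = i + 1$, and go to step 1; else, set $\varepsilon = \beta\varepsilon$ and go to step 3. ∎
* Assuming, of course, that $v_\varepsilon(s)$ is well-defined and remains in a bounded set for all $\varepsilon \in [0, \bar\varepsilon]$, $\bar\varepsilon > 0$.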
101 Exercise. Justify algorithm (99) by showing that if $\gamma = 0$, and $\{s_i\}$ is any infinite sequence constructed by algorithm (99), then any accumulation point of $\{s_i\}$ is optimal for the dual problem (37). [Hint: Show that the assumptions of theorem (1.3.27) are satisfied by algorithm (99) with $\gamma = 0$.] ∎
In practice, of course, we cannot set $\gamma = 0$, since we would not be able to compute $v_\varepsilon(s)$ in a finite number of iterations by any of the methods discussed in Chapter 2. In addition, we cannot possibly compute a $v_\varepsilon(s'(\delta))$ for every $\delta \in [0, 2\pi]$, as specified in step 6 of (99). Thus, we would choose a $\gamma > 0$ and use a finite search over the circle $\{s'(\delta) \mid \delta \in [0, 2\pi]\}$. For example, we might restrict ourselves in steps 6 and 7 of (99) to the values $\delta = 0, 2\pi/M, 4\pi/M, 6\pi/M, \ldots, 2\pi(M - 1)/M$, where $M$ is a judiciously chosen integer. To further illustrate the applicability of algorithm (87) to optimal control problems, let us consider the two continuous optimal control problems given below.
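The finite search over the circle $\{s'(\delta)\}$ at the angles $\delta = 2\pi j/M$ mentioned above can be sketched as follows. The sketch is illustrative only: `r` stands for the second spanning vector (for instance $w(s) - v(s)$ or its approximation), `score` is a hypothetical callable standing in for the value being maximized in (100), and in a real application it would also have to reject candidates that do not belong to $T$.

```python
import math

def circle_candidates(s, r, M):
    """Unit vectors s'(delta) = normalize(cos(delta)*s + sin(delta)*r)
    for delta = 0, 2*pi/M, ..., 2*pi*(M - 1)/M."""
    out = []
    for j in range(M):
        delta = 2.0 * math.pi * j / M
        v = [math.cos(delta) * si + math.sin(delta) * ri for si, ri in zip(s, r)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # always true when s and r are linearly independent
            out.append([x / norm for x in v])
    return out

def best_on_circle(s, r, M, score):
    """Candidate with the largest score among the M sampled directions."""
    return max(circle_candidates(s, r, M), key=score)

# Toy data: s and r orthogonal in the plane, score = second coordinate.
cands = circle_candidates([1.0, 0.0], [0.0, 2.0], 4)
best = best_on_circle([1.0, 0.0], [0.0, 2.0], 4, lambda x: x[1])
print(best)
```

Increasing `M` refines the search at the cost of one evaluation of `score` per candidate, which is the trade-off behind the phrase "judiciously chosen".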
102 Example. Consider the minimum-time optimal control problem,
103 minimize $T$,
subject to
104 $\dfrac{d}{dt}x(t) = Ax(t) + Bu(t)$, $\quad t \in [0, T]$, $x(t) \in \mathbb{R}^\nu$,
105 $x(0) = 0$, $\quad q(x(T)) \le 0$, $\quad |u(t)| \le 1$, $\quad u(t) \in \mathbb{R}^1$, $t \in [0, T]$,
where $A$, $B$ are constant matrices, $q(\cdot)$ is as defined in (4), and $u(\cdot)$ is a piecewise continuous function. We shall assume that (104) is completely controllable. For this problem, we define, for $\alpha \ge 0$ ($\alpha_{\min} = 0$),
106 $\mathcal{R}(\alpha) = \left\{x(\alpha') = \int_0^{\alpha'} e^{(\alpha' - t)A}Bu(t)\,dt \;\middle|\; u(\cdot) \in \mathcal{U},\ \alpha' \in [0, \alpha]\right\}$,*
* That is, $x(\alpha')$ is the solution of (104) at $t = \alpha'$, with $x(0) = 0$, and corresponding to some admissible control $u(\cdot)$.
where $\mathcal{U}$ is the set of all real-valued, piecewise continuous functions $u(\cdot)$, defined on $[0, \infty)$ and satisfying $|u(t)| \le 1$ for all $t \in [0, \infty)$. It is easy to see that for every $\alpha \ge 0$, $\mathcal{R}(\alpha)$ is a compact, convex set, and that $\mathcal{R}(\cdot)$ is continuous in the Hausdorff metric. In addition, since (104) is completely controllable, it can be shown that the set $\mathcal{R}(\alpha)$ is compact and strictly convex for every $\alpha \ge 0$, and that $\mathcal{R}(\alpha')$ is contained in the interior of $\mathcal{R}(\alpha'')$ whenever $\alpha' < \alpha''$. (These facts can be found in most intermediate level texts on the theory of optimal control.) Thus, this problem satisfies the assumptions (15) (i)-(iii). To compute $d(s)$ and $w(s)$ for this problem, we must solve the subproblem,
107 $\min\{T' \mid \xi(T', u) \in P(s),\ u \in \mathcal{U}\}$,
where $\xi(T', u)$ is the solution at $T'$ of (104), corresponding to $x(0) = 0$ and to the indicated control $u(\cdot) \in \mathcal{U}$. Applying the maximum principle (1.2.35) to problem (107), we find that if $u(\cdot, s)$ is the optimal control, $T'(s)$ is the minimum time and $x(\cdot, s)$ is the corresponding optimal trajectory, then $\xi(T'(s), u(\cdot, s)) = x(T'(s), s)$,
108 $\dfrac{d}{dt}x(t, s) = Ax(t, s) + Bu(t, s)$, $\quad t \in [0, T'(s)]$, $\quad x(0, s) = 0$, $\quad u(\cdot, s) \in \mathcal{U}$,
the corresponding costate $p(\cdot, s)$ satisfies
109
and the maximum relation (1.2.38) is satisfied, i.e.,
110 $B^T p(t, s)\, u(t, s) \ge B^T p(t, s)\, v$ for all $v \in [-1, +1]$, and almost all $t \in [0, T'(s)]$.
From (110), we conclude that
111 $u(t, s) = \operatorname{sgn} B^T p(t, s)$ for almost all $t \in [0, T'(s)]$.
Now, from (109),
112 $p(t, s) = e^{-(t - T'(s))A^T} \psi s$ for $t \in [0, T'(s)]$,
and hence,
113 $u(t, s) = \operatorname{sgn}(\psi \langle s, e^{-(t - T'(s))A}B\rangle)$, for almost all $t \in [0, T'(s)]$.
To determine $\psi$ in (113), we make use of the boundary condition $x(T'(s), s) \in P(s)$, i.e., of the equation,
114 $\langle x(T'(s), s) - v(s), s\rangle = 0$.
Now, from (108),
115 $x(T'(s), s) = \int_0^{T'(s)} e^{(T'(s) - t)A} B \operatorname{sgn}(\psi \langle s, e^{(T'(s) - t)A}B\rangle)\,dt$.
Substituting into (114) and rearranging terms, we obtain,
116
Since we may set $\psi = -1$,* (116) must be solved for $T'(s)$ only. Obviously, this is much harder to do than to solve the piecewise linear equation (78) which we had encountered in the discrete optimal control problem (1)-(5). In fact, there is no procedure for solving (116) in a finite number of implementable operations (even on a digital computer with infinite word length), and hence, in practice, one must always use a heuristic method for truncating the search for a solution of (116) after a finite number of operations. However, assuming we can solve (116), then $d(s) = T'(s)$, and $w(s) = x(T'(s), s)$. This example clearly illustrates the fact that, as a rule, continuous optimal control problems, even simple ones, are much harder to solve than similar discrete optimal control problems. It also helps to point up the fact that available algorithms, when applied to continuous optimal control problems, must usually be considered as conceptual only, since any implementation of these algorithms requires some, usually heuristic, modification. ∎
117 Example. Consider the minimum energy optimal control problem,
118 minimize $\tfrac{1}{2}\int_0^T u(t)^2\,dt$,
subject to
119 $\dfrac{d}{dt}x(t) = Ax(t) + Bu(t)$, $\quad t \in [0, T]$, $x(t) \in \mathbb{R}^\nu$,
120 $x(0) = \hat x_0$, $\quad q(x(T)) \le 0$, $\quad |u(t)| \le 1$, $\quad u(t) \in \mathbb{R}^1$, $t \in [0, T]$,
* For all s E T such that d(s) = T'(s) > 0, we must have 0 in (116)) such that 130
$-\int_0^{T'} \langle s, e^{(T' - t)A}B\rangle \operatorname{sgn}(\langle s, e^{(T' - t)A}B\rangle)\,dt < \langle v(s), s\rangle < -\int_0^{T''} \langle s, e^{(T'' - t)A}B\rangle \operatorname{sgn}(\langle s, e^{(T'' - t)A}B\rangle)\,dt$,*
and then continue to subdivide this interval to obtain new values for $T'$, $T''$, satisfying (130) until $T'' - T'$ is adequately small. Finally, approximate $T'(s)$ by $T'$. To compute an adequate approximation to $\psi$ for (128), proceed in the same spirit as for $T'(s)$, above, i.e., find an interval $[\psi', \psi''] \subset (-\infty, 0]$ such that for $\psi = \psi'$, the left-hand side of (128) is smaller than $-\langle e^{TA}\hat x_0 - v(s), s\rangle$, and for $\psi = \psi''$, the left-hand side of (128) is larger than this quantity. Then reduce this interval to acceptable size by consecutive halving. ∎
131 Exercise. Devise an algorithm for solving the minimum-time problem (103)-(105) by adding to algorithm (87) yet one more approximation procedure for calculating $d(s)$ and $w(s)$, making use of the suggestion in the preceding remark. ∎
* See footnote on p. 231.
5.3 A Decomposition Algorithm of the Primal Type
In this section we shall present an algorithm for solving the same type of problem as the ones considered in the preceding section, but under somewhat less restrictive assumptions. In particular, referring to (2.15), we find that in the preceding section, we had assumed that both the target set $C$ and the sets $\tilde{\mathcal{R}}(\alpha)$, $\alpha \ge \alpha_{\min}$, were strictly convex. However, in many cases of interest, this is not true, as, for example, in problem (1.3)-(1.6), where, for $\alpha \ge 0$, $\mathcal{R}(\alpha) =$
1.
= dk
+
Rkz
1
k-1
Y
,dz 2=1 j=1
Iz"l
+2 z=1
< a,
kl
(see (1.10) for notation), which (when not empty) is a convex polytope and hence is not strictly convex. Note also that in (1) $\mathcal{R}(\alpha)$ may be empty for all $\alpha < \alpha_{\min}$, with $\alpha_{\min} \ge 0$. Consequently, we introduce the geometric problem below, which differs from the primal problem (2.15) only in the nature of the assumptions made, and into the form of which we can transcribe the optimal control problems considered in Section 2 as well as problems of the form (1.3)-(1.6).
2 The Geometric Problem. We are given a convex compact set $C \subset \mathbb{R}^\nu$, and a map $\mathcal{R} : [\alpha_{\min}, \infty) \to 2^{\mathbb{R}^\nu}$, $\alpha_{\min} \ge 0$, such that (i) for every $\alpha \ge \alpha_{\min}$, $\mathcal{R}(\alpha)$ is a compact convex set; (ii) $\mathcal{R}(\alpha') \subset \mathcal{R}(\alpha'')$ whenever $\alpha' < \alpha''$; (iii) for any $\alpha \ge \alpha_{\min}$, for any open set $O \supset \mathcal{R}(\alpha)$, there exists an $\varepsilon > 0$ such that for all $\alpha' \ge \alpha_{\min}$ satisfying $|\alpha' - \alpha| < \varepsilon$, $\mathcal{R}(\alpha') \subset O$. We are required to find an $\hat\alpha \ge \alpha_{\min}$ and an $\hat x \in C$ such that
3 $\hat\alpha = \min\{\alpha \mid \mathcal{R}(\alpha) \cap C \ne \emptyset,\ \alpha \ge \alpha_{\min}\}$,
4 $\hat x \in \mathcal{R}(\hat\alpha) \cap C$.
5 Remark. Note that problem (2) differs from problem (2.15) not only in that the requirement of strict convexity for $C$ and $\tilde{\mathcal{R}}(\alpha)$ in (2.15) has been relaxed to a requirement simply of convexity, but also in that the continuity specified in (iii) of (2) is more frequently satisfied than the continuity specified in (ii) of (2.15). In fact, if $\mathcal{R}(\cdot)$ is continuous in the Hausdorff metric, it is continuous in the sense defined in (iii) of (2); however, the converse is not true.
6 Assumption. To ensure that problem (2) has a solution, we shall assume that there exists an $\alpha_{\max} \ge \alpha_{\min}$ such that $\mathcal{R}(\alpha_{\max}) \cap C \ne \emptyset$.
7 Lemma. Assuming that (6) is satisfied, there exist an $\hat\alpha \in [\alpha_{\min}, \alpha_{\max}]$ and an $\hat x \in C$ which satisfy (3) and (4).
Proof. Since $\mathcal{A} = [\alpha_{\min}, \alpha_{\max}]$ is compact, there must exist an $\hat\alpha \in \mathcal{A}$ such that $\hat\alpha = \inf\{\alpha \mid \mathcal{R}(\alpha) \cap C \ne \emptyset,\ \alpha \in \mathcal{A}\}$. Let $\{\alpha_i\}_{i=0}^{\infty}$ be any sequence in $\mathcal{A}$ which decreases monotonically to $\hat\alpha$, satisfying, in addition, $\mathcal{R}(\alpha_i) \cap C \ne \emptyset$ for $i = 0, 1, 2, \ldots$. Then, by (ii) of (2), the compact sets $\mathcal{R}(\alpha_i) \cap C$ form a monotonically decreasing sequence, i.e., $\mathcal{R}(\alpha_{i+1}) \cap C \subset \mathcal{R}(\alpha_i) \cap C$ for $i = 0, 1, 2, \ldots$. Consequently, the sequence of sets $\{\mathcal{R}(\alpha_i) \cap C\}_{i=0}^{\infty}$ converges to the set $\bigcap_{i=0}^{\infty} (\mathcal{R}(\alpha_i) \cap C) \ne \emptyset$. But assumptions (2) (ii) and (iii) imply that $\mathcal{R}(\alpha_i) \to \mathcal{R}(\hat\alpha)$ as $i \to \infty$, and by (2) (i), $\mathcal{R}(\hat\alpha)$ is compact. Hence we must have $\mathcal{R}(\hat\alpha) \cap C = \bigcap_{i=0}^{\infty} (\mathcal{R}(\alpha_i) \cap C)$, and therefore, $\hat\alpha$ satisfies (3). Since $\mathcal{R}(\hat\alpha) \cap C \ne \emptyset$, there obviously must exist an $\hat x$ satisfying (4). ∎
To define an algorithm for solving the geometric problem (2), we shall make use of the following four maps which are simple extensions of the maps in the preceding section: For every nonzero $s \in \mathbb{R}^\nu$, let $V(s)$ be defined by
8 $V(s) = \{v \in C \mid \langle x - v, s\rangle \le 0 \text{ for all } x \in C\}$,
i.e., $V(s)$ consists of all the points $v$ on the boundary of $C$ which satisfy $\langle v, s\rangle = \max\{\langle x, s\rangle \mid x \in C\}$. When $C$ is strictly convex, $V(s)$ consists of one point only, that point being $v(s)$, as defined in (2.24).
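When $C$ is a polytope rather than a strictly convex set, $V(s)$ may be a whole face. As a small illustration, and under the assumption (made only for this example) that $C$ is given as the convex hull of a finite list of vertices, the sketch below returns the vertices at which $\langle x, s\rangle$ is maximal; $V(s)$ is then their convex hull. The name `support_vertices` and the tolerance `tol` are hypothetical.

```python
def support_vertices(vertices, s, tol=1e-12):
    """Vertices of conv(vertices) attaining max <x, s>.

    For a polytope C = conv(vertices), V(s) is the convex hull of the
    returned points; tol absorbs rounding when comparing inner products.
    """
    dots = [sum(xi * si for xi, si in zip(x, s)) for x in vertices]
    best = max(dots)
    return [x for x, d in zip(vertices, dots) if best - d <= tol]

# Unit square: an edge is returned for s = (1, 0), a single vertex for s = (1, 1).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edge = support_vertices(square, (1.0, 0.0))
corner = support_vertices(square, (1.0, 1.0))
print(edge, corner)
```

The first call shows why $V(\cdot)$ has to be treated as a set-valued map here: every point of the returned edge defines the same hyperplane.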
9 Exercise. Show that $V(s)$ is compact and not empty because the scalar product is continuous and $C$ is compact. (Note that $\{v \in C \mid \langle x - v, s\rangle < 0 \text{ for all } x \in C\}$ is empty.)
10 Exercise. Show that for every nonzero $s \in \mathbb{R}^\nu$, for any open set $O \supset V(s)$, there exists an $\varepsilon > 0$ such that $O \supset V(s')$ for all nonzero $s' \in \mathbb{R}^\nu$ satisfying $\|s' - s\| < \varepsilon$, i.e., show that $V(\cdot)$ is upper semicontinuous.
Note that if $v' \in V(s)$, then $\langle v - v', s\rangle = 0$ for all $v \in V(s)$, i.e., $V(s)$ is contained in the hyperplane $\{x \in \mathbb{R}^\nu \mid \langle x - v', s\rangle = 0\}$. In view of this observation, we can state the following definition: For every nonzero $s \in \mathbb{R}^\nu$, let $P(s)$ be the hyperplane,
11 $P(s) = \{x \in \mathbb{R}^\nu \mid \langle x - v, s\rangle = 0,\ v \in V(s)\}$.
(Note again that all the $v \in V(s)$ define exactly the same hyperplane $P(s)$, since, quite obviously, $V(s) = P(s) \cap C$.) Next, again let $\mathcal{A} = [\alpha_{\min}, \alpha_{\max}]$. Then, for any nonzero $s \in \mathbb{R}^\nu$ such that $P(s) \cap \mathcal{R}(\alpha_{\max}) \ne \emptyset$, we define $d(s)$ as follows:
12
if ( x - v, s) < 0 for some X E @ ( ~ ~ , , ) ,u E V(s), min{a I P(s) n %(a) # 0,a E d}, otherwise.
236
5 CONVEX OPTIMAL CONTROL PROBLEMS
Finally, for all s E Iwy such that P(s) n W(amax) # a , we define W(s) by W(s) = {w E W(d(s)>I (w - u, s>
13
< 0, u E V(s)>.
It is not difficult to see that, just as in the case of P(s), d(s) and W(s) do not depend on the particular $v \in V(s)$ used in (12) or (13), i.e., they are functions of s only. The reason for the difference between the d(·) defined in (2.32) and the d(·) above, which is expressed by the first line of (12), is the fact that the domains of definition of these two functions are different. With T defined as in (2.31), we find that the two functions coincide for all $s \in T$. We shall now present an algorithm for solving the geometric problem (2), which combines the geometric ideas that were used in the construction of algorithm (2.48) with those of the Frank and Wolfe method [F5]. The algorithm below does not exhaust all the possibilities of combining the geometric ideas developed in Section 2 with those of the Frank and Wolfe algorithm. For an alternative approach, see [B3]. However, the algorithm below seems to have a greater range of applicability.
Sets and maps for algorithm (14).
14 Algorithm (Meyer-Polak [M6]).
Step 0. Compute an $x_0 \in C$ and a $y_0 \in \mathcal{R}(\alpha_{\min})$; set i = 0.
Step 1. Set $x = x_i$, $y = y_i$.
Step 2. If $y \in C$, set $x_{i+1} = y$, $y_{i+1} = y$, and stop; else, set $s = y - x$, and go to step 3.
Step 3. Compute a $v \in V(s)$.
Step 4. Compute d(s).
Step 5. Compute a $w \in W(s)$.
Step 6. Compute* a $y' \in [y, w]$ and an $x' \in [x, v]$ such that
15    $\|y' - x'\| = \min\{\|y'' - x''\| \mid y'' \in [y, w],\ x'' \in [x, v]\}$.
Step 7. Set $x_{i+1} = x'$, $y_{i+1} = y'$; set i = i + 1, and go to step 1.
We shall now show that algorithm (14) is of the form of model (1.3.9) and hence that its convergence properties can be deduced from theorem (1.3.10). For this purpose, we define $T = C \times \mathcal{R}(\hat\alpha)$, where $\hat\alpha$ is as in (3), and we define $z = (x, y)$ ($x \in C$, $y \in \mathcal{R}(\hat\alpha)$) to be desirable if $y \in C$. Next, for every $z = (x, y) \in T$, we define c(z) by
16    $c(z) = \|x - y\|$.
Finally, we define $A : T \to 2^T$ by
17    $A(z) = \begin{cases} \{(y, y)\}, & \text{if } z = (x, y) \text{ is desirable}, \\ \bigcup_{v \in V(y - x),\ w \in W(y - x)} \{z' = (x', y') \mid x' \in [x, v],\ y' \in [y, w];\ c(z') = \min\{c(z'') \mid x'' \in [x, v],\ y'' \in [y, w]\}\}, & \text{otherwise.} \end{cases}$
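Step 6 of algorithm (14) asks for the closest pair of points on the two segments [x, v] and [y, w]. The following is a minimal sketch of one way to carry this out, not the procedure the text prescribes: it minimizes the convex quadratic $\|(x + \lambda(v - x)) - (y + \mu(w - y))\|^2$ over the unit square by comparing the interior stationary point (when it exists and is feasible) with the best point on each of the four edges. The function name and the parametrization by `lam` and `mu` are assumptions made for illustration.

```python
def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _clamp01(t):
    return max(0.0, min(1.0, t))


def closest_points_on_segments(x, v, y, w):
    """Return (x_new, y_new), x_new in [x, v], y_new in [y, w], at minimal distance."""
    p = [vi - xi for vi, xi in zip(v, x)]   # direction of [x, v]
    q = [wi - yi for wi, yi in zip(w, y)]   # direction of [y, w]
    r = [xi - yi for xi, yi in zip(x, y)]
    a, b, c = _dot(p, p), _dot(p, q), _dot(q, q)
    rp, rq = _dot(r, p), _dot(r, q)

    def gap2(lam, mu):
        d = [ri + lam * pi - mu * qi for ri, pi, qi in zip(r, p, q)]
        return _dot(d, d)

    candidates = []
    det = a * c - b * b
    if det > 1e-12 * max(a * c, 1e-300):
        # Unconstrained stationary point of the quadratic in (lam, mu).
        lam = (b * rq - c * rp) / det
        mu = (a * rq - b * rp) / det
        if 0.0 <= lam <= 1.0 and 0.0 <= mu <= 1.0:
            candidates.append((lam, mu))
    for lam in (0.0, 1.0):      # edges lam = 0 and lam = 1
        mu = _clamp01((rq + lam * b) / c) if c > 0.0 else 0.0
        candidates.append((lam, mu))
    for mu in (0.0, 1.0):       # edges mu = 0 and mu = 1
        lam = _clamp01((mu * b - rp) / a) if a > 0.0 else 0.0
        candidates.append((lam, mu))
    lam, mu = min(candidates, key=lambda t: gap2(*t))
    x_new = [xi + lam * pi for xi, pi in zip(x, p)]
    y_new = [yi + mu * qi for yi, qi in zip(y, q)]
    return x_new, y_new


# Hypothetical data: a horizontal segment and a vertical segment to its right.
x_new, y_new = closest_points_on_segments((0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 3.0))
print(x_new, y_new)
```

Because the objective is convex, its minimum over the square is either a feasible stationary point or lies on the boundary, which is why checking these five candidates suffices.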
18 Exercise. Suppose that x E C and that y E g(0z).Show that P ( y - x) n a($)#
o.
Hence, show that algorithm (14) and the corresponding map A(·) in (17) are well-defined.
19 Exercise. Suppose that $z = (x, y) \in T$, $x \neq y$. Show that if O is any open set containing W(y - x), then there exists an $\epsilon > 0$ such that $W(y' - x') \subset O$, for all $z' = (x', y') \in T$ such that $\|x' - x\| < \epsilon$, $\|y' - y\| < \epsilon$.
20 Exercise. Suppose that $z = (x, y) \in T$, $x \neq y$. Show that
21    $\bar c(z) \triangleq \min\{c(z') \mid z' = (x', y'),\ x' \in [x, v],\ y' \in [y, w],\ v \in V(y - x),\ w \in W(y - x)\} < c(z)$.
22 Exercise. Let $\bar c(\cdot)$ be defined as in (21), and let $z = (x, y)$ be such that $y \notin C$, i.e., suppose that z is not desirable. Show that for any $\delta > 0$ there exists an $\epsilon > 0$ such that
23    $\bar c(z') \le \bar c(z) + \delta$ for all $z' \in B(z, \epsilon)$,
where $B(z, \epsilon) = \{z' \in T \mid \|x' - x\| < \epsilon,\ \|y' - y\| < \epsilon\}$. [Hint: See the proof of theorem (2.49).]
* Given any $x, y \in \mathbb{R}^\nu$, we denote the line segment joining x and y by [x, y], i.e., $[x, y] = \{z = \lambda x + (1 - \lambda) y \mid \lambda \in [0, 1]\}$.
In view of (21), we see that if $z \in T$ is not desirable, then $\bar c(z) = \min\{c(z') \mid z' \in A(z)\} < c(z)$. Next, since c(·) is continuous and because of (23), we see that if $z \in T$ is not desirable, then there must exist an $\epsilon > 0$ such that, for all $z' \in B(z, \epsilon)$,
24    $\bar c(z') \le \bar c(z) - \dfrac{\bar c(z) - c(z)}{3}$,
25    $-c(z') \le -c(z) - \dfrac{\bar c(z) - c(z)}{3}$.
Consequently, we must have
26    $\bar c(z') - c(z') \le \dfrac{\bar c(z) - c(z)}{3} < 0$.
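0, for otherwise, d(s) = 0. This problem can also be solved in a finite number of implementable operations (as explained in remark (2.76)) to yield a control sequence $u_0(\overset{0}{s}), u_1(\overset{0}{s}), \ldots, u_{k-1}(\overset{0}{s})$ which is optimal for (36)-(38) and, in addition, satisfies
39
40    $\overset{0}{w} = \left[A^k \hat x_0 + \sum_{i=0}^{k-1} A^{k-i-1} B u_i(\overset{0}{s})\right] \in W(\overset{0}{s})$.
We now have a point $\overset{0}{x} \in C$, a point $\overset{0}{y} \in \mathcal{R}(0)$, a point $\overset{0}{v} \in V(\overset{0}{s})$ and a point $\overset{0}{w} \in W(\overset{0}{s})$. To compute $\overset{1}{x}, \overset{1}{y}$, we must solve the problem,
41    minimize $\{\|x - y\|^2 \mid x \in [\overset{0}{x}, \overset{0}{v}],\ y \in [\overset{0}{y}, \overset{0}{w}]\}$.
This is a simple quadratic programming problem whose solution results in two parameters, $\overset{0}{\lambda}$ and $\overset{0}{\mu}$, both contained in [0, 1], such that
42    $\overset{1}{x} = \overset{0}{\lambda}\,\overset{0}{x} + (1 - \overset{0}{\lambda})\,\overset{0}{v}$,    $\overset{1}{y} = \overset{0}{\mu}\,\overset{0}{y} + (1 - \overset{0}{\mu})\,\overset{0}{w}$.
Note that
43
Having obtained $\overset{1}{x}, \overset{1}{y}$, we now repeat our calculations to obtain $\overset{2}{x}, \overset{2}{y}$, etc. However, we are not interested in simply finding an $\overset{i}{x}$ in C and a $\overset{i}{y}$ in $\mathcal{R}(\hat\alpha)$ (where $\hat\alpha$ is the minimum cost for (36)-(38)) which are sufficiently close together; we also wish to find an admissible control sequence, $\overset{i}{u}_0, \overset{i}{u}_1, \ldots, \overset{i}{u}_{k-1}$, which takes the system (37) from the initial state $\hat x_0$ to the terminal state $x_k = \overset{i}{y}$, and which satisfies $\sum_{j=0}^{k-1} \overset{i}{u}_j^{\,2} \le \hat\alpha$, where $\hat\alpha$ is the minimum cost for (36)-(38). Because of this, we should organize our calculations as follows: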
44 Algorithm (solves problem (36)-(38)).
Step 0. Set $\overset{0}{x} = 0$, $\overset{0}{y} = A^k \hat x_0$; set $\overset{0}{u}_i = 0$, i = 0, 1, ..., k - 1; set i = 0.
Step 1. Set $x = \overset{i}{x}$, $y = \overset{i}{y}$.
Step 2. If $Qy \le q$, set $\overset{i+1}{x} = x$, $\overset{i+1}{y} = y$, and stop; else, set s = y - x, and go to step 3.
Step 3. Solve the linear programming problem, $\max\{\langle x, s \rangle \mid Qx \le q\}$, for a vector v.
Step 4. If $\langle A^k \hat x_0 - v, s \rangle \le 0$, set $w = A^k \hat x_0$; else, solve (36)-(38) (as explained in (2.76)) for a control sequence $u_0(s), u_1(s), \ldots, u_{k-1}(s)$ and set $w = A^k \hat x_0 + \sum_{j=0}^{k-1} A^{k-j-1} B u_j(s)$.
Step 5. Compute $\lambda, \mu \in [0, 1]$ such that
45    $\|[\lambda x + (1 - \lambda) v] - [\mu y + (1 - \mu) w]\| = \min\{\|[\lambda' x + (1 - \lambda') v] - [\mu' y + (1 - \mu') w]\| \mid \lambda', \mu' \in [0, 1]\}$.
Step 6. Set $\overset{i+1}{x} = \lambda x + (1 - \lambda) v$; set $\overset{i+1}{y} = \mu y + (1 - \mu) w$; set $\overset{i+1}{u}_j = \mu\,\overset{i}{u}_j + (1 - \mu)\,u_j(s)$, j = 0, 1, 2, ..., k - 1; set i = i + 1, and go to step 1.
46 Exercise. Show that $\sum_{j=0}^{k-1} \overset{i}{u}_j^{\,2} \le \hat\alpha$, for i = 0, 1, 2, ..., where $\hat\alpha$ is the minimum cost for (36)-(38). Also show that $\overset{i+1}{y} = A^k \hat x_0 + \sum_{j=0}^{k-1} A^{k-j-1} B\,\overset{i+1}{u}_j$.
In conclusion, we should note that while it may have been convenient, for the purpose of exposition, to characterize the algorithm in Section 2 as dual and the algorithm in this section as primal, this distinction is rather artificial. Their similarity, which results from the fact that they both emanate from the same geometric characterization of optimal problems, is much more significant than their differences, and leads us to classify them under the common and more sensible grouping of geometrically derived decomposition algorithms for optimal control problems.
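The bookkeeping in step 6 of algorithm (44) relies on the terminal state being an affine function of the control sequence, so that mixing two control sequences with weights μ and 1 - μ mixes their terminal states in the same way. The sketch below checks this numerically for a scalar system $x_{j+1} = a x_j + b u_j$. The scalar system, the numerical values, and the function names are assumptions made for illustration only; the system (37) of the text is not reproduced here.

```python
def terminal_state(a, b, x0, u):
    """Terminal state x_k of the scalar system x_{j+1} = a*x_j + b*u_j (illustrative)."""
    x = x0
    for uj in u:
        x = a * x + b * uj
    return x


def mix(mu, u, u_s):
    """Convex combination mu*u + (1 - mu)*u_s of two control sequences."""
    return [mu * p + (1.0 - mu) * q for p, q in zip(u, u_s)]


def cost(u):
    """Sum of squared controls."""
    return sum(uj * uj for uj in u)


# Hypothetical data.
a, b, x0 = 0.9, 0.5, 2.0
u_i = [0.0, 0.0, 0.0]         # controls currently associated with y
u_s = [-1.0, 0.5, -0.25]      # controls associated with w
mu = 0.3

y = terminal_state(a, b, x0, u_i)
w = terminal_state(a, b, x0, u_s)
u_next = mix(mu, u_i, u_s)
y_next = mu * y + (1.0 - mu) * w

# The mixed controls steer the system exactly to the mixed terminal state,
# and their cost does not exceed the larger of the two original costs.
print(abs(terminal_state(a, b, x0, u_next) - y_next) < 1e-12)
print(cost(u_next) <= max(cost(u_i), cost(u_s)))
```

The second check reflects the convexity of the quadratic cost, which is the property behind the first part of exercise (46).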
6 RATE OF CONVERGENCE
6.1 Linear Convergence
The subject of rate of convergence can be treated in a very general and very abstract manner. However, for our purposes, we may restrict ourselves to a few rather simple concepts. In this section we shall only need the concept of linear convergence, which we define as follows: Let $\{z_i\}_{i=0}^{\infty}$ be a sequence in a Banach space $\mathcal{B}$ which converges to a point z*. We shall say that $\{z_i\}$ converges to z* at least linearly (or that its rate of convergence to z* is at least linear) if there exists an integer $k \ge 0$, a constant E and a $\theta \in [0, 1)$ such that
1    $\|z_i - z^*\|_{\mathcal{B}} \le E\theta^i$ for all $i \ge k$,
i.e., we say that the convergence of $\{z_i\}$ is at least linear if $\|z_i - z^*\|_{\mathcal{B}} \to 0$ as $i \to \infty$ at least as fast as a geometric progression (as before, we denote by $\|\cdot\|_{\mathcal{B}}$ the norm in $\mathcal{B}$). In Chapter 2, we have presented a number of algorithms for solving the problem,
2    $\min\{f^0(z) \mid z \in \mathbb{R}^n\}$,
where $f^0(\cdot)$ was assumed to be a continuously differentiable function. With the exception of the algorithms in Section 2.2 and of the algorithms (2.3.42) and (2.3.68), all of these algorithms are characterized by the fact that the sequences $\{z_i\}$ that they construct satisfy the following relations:
3    $z_{i+1} = z_i + \lambda_i h_i$,  i = 0, 1, 2, 3, ...,
4    $-\langle \nabla f^0(z_i), h_i \rangle \ge \rho\,\|\nabla f^0(z_i)\|\,\|h_i\|$,  $\rho \in (0, 1]$.*
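The definition of at least linear convergence in (1) can be illustrated numerically. The sketch below tests the bound $\|z_i - z^*\| \le E\theta^i$ on a finite list of error norms; the function name and the two example sequences are hypothetical. A finite check of this kind can refute a proposed pair (E, θ) but cannot, of course, prove convergence.

```python
def satisfies_linear_bound(errors, E, theta, k=0):
    """True if errors[i] <= E * theta**i for every recorded index i >= k."""
    return all(err <= E * theta ** i for i, err in enumerate(errors) if i >= k)


# A geometric sequence of error norms meets the bound with E = 3, theta = 1/2.
geometric = [3.0 * 0.5 ** i for i in range(20)]
# The sequence 1/(i + 1) tends to zero, but more slowly than this geometric bound.
harmonic = [1.0 / (i + 1) for i in range(200)]

print(satisfies_linear_bound(geometric, E=3.0, theta=0.5))
print(satisfies_linear_bound(harmonic, E=3.0, theta=0.5))
```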
p
> 1 would violate the Schwarz inequality: I 0,Xi
+ Xihi) - f " q )
-f"Zi)
=fijp,
-
< Xia, 0,
57 d'(z, A)
=
Setting z
fA V f o ( z ) )V f o ( z ) )dt.
1 -
= zi
in (57) and invoking (8), we now obtain
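58    $\dfrac{\lambda m}{2} \le 1 - \Delta'(z_i, \lambda) \le \dfrac{\lambda M}{2}$.
Now, according to (6), we must choose $\lambda_i$ so that
59    $\alpha \le \Delta'(z_i, \lambda_i) \le 1 - \alpha$,
i.e., so that
60    $\alpha \le 1 - \Delta'(z_i, \lambda_i) \le 1 - \alpha$.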
Comparing (60) with (58), we find that if we choose $\lambda_i$ to satisfy (55), then, because of (54), both (60) and (58) will be satisfied. Thus theorem (53) indicates that at least in some cases there will be no need to construct a new step size $\lambda_i$ after a certain number of iterations which may be required to enter a set in which (8) is satisfied. Note that since normally the constants m and M would not be available, the best one can do is to check, as suggested in the footnote accompanying algorithm (2.1.37), whether the previous step size $\lambda_{i-1}$ is satisfactory for the present iteration, i.e., whether it may be possible to set $\lambda_i = \lambda_{i-1}$.
6.2 Superlinear Convergence: Quasi-Newton Methods
In this section, we shall obtain bounds on the rate of convergence of the Newton-Raphson method (2.1.39), which solves the problem, $\min\{f^0(z) \mid z \in \mathbb{R}^n\}$; of the Newton-Raphson method (3.1.9), which solves the problem of finding the roots of a continuously differentiable function $g : \mathbb{R}^n \to \mathbb{R}^n$ with nonsingular Jacobian $\partial g(z)/\partial z$; and of the quasi-Newton algorithm (2.1.42), which solves the problem, $\min\{f^0(z) \mid z \in \mathbb{R}^n\}$. We begin with the Newton-Raphson method (3.1.9), of which (2.1.39) is a special case for $g(\cdot) = \nabla f^0(\cdot)$.
1 Algorithm (Newton-Raphson). Finds zeros of $g : \mathbb{R}^n \to \mathbb{R}^n$, provided $(\partial g(z)/\partial z)^{-1}$ exists and is continuous.
Step 0. Select a $z_0 \in \mathbb{R}^n$.
Step 1. Set i = 0.
Step 2. Compute $g(z_i)$.
Step 3. If $g(z_i) = 0$, stop; else, compute $a(z_i)$ according to
2    $a(z_i) = z_i - \left(\dfrac{\partial g(z_i)}{\partial z}\right)^{-1} g(z_i)$,
and go to step 4.
Step 4. Set $z_{i+1} = a(z_i)$, set i = i + 1, and go to step 2.
To simplify notation, we shall denote the n x n Jacobian matrix $\partial g(z)/\partial z$ by g'(z), i.e., we define
3    $g'(z) = \dfrac{\partial g(z)}{\partial z}$ for all $z \in \mathbb{R}^n$.
We shall denote the second derivative of g(·), assuming that it exists, by g''(·)(·). We recall from definition (B.1.7) that for any $z \in \mathbb{R}^n$, for any $y \in \mathbb{R}^n$, g''(z)(y) is an n x n matrix, since $g : \mathbb{R}^n \to \mathbb{R}^n$.
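The iteration $z_{i+1} = a(z_i)$ of algorithm (1) can be sketched in the scalar case n = 1, where the Jacobian inverse reduces to division by g'(z). This is an illustrative sketch, not the text's implementation: the exact test $g(z_i) = 0$ of step 3 is replaced by a tolerance, and the names `newton_raphson`, `tol` and `max_iter`, as well as the example function, are assumptions.

```python
def newton_raphson(g, g_prime, z0, tol=1e-12, max_iter=50):
    """Scalar Newton-Raphson iteration z_{i+1} = z_i - g(z_i) / g'(z_i).

    Returns the list of iterates. The test |g(z_i)| <= tol is a practical
    stand-in for the exact test g(z_i) = 0.
    """
    z = z0
    iterates = [z]
    for _ in range(max_iter):
        gz = g(z)
        if abs(gz) <= tol:
            break
        z = z - gz / g_prime(z)
        iterates.append(z)
    return iterates


# Hypothetical example: the positive zero of g(z) = z^2 - 2.
iterates = newton_raphson(lambda z: z * z - 2.0, lambda z: 2.0 * z, 1.0)
errors = [abs(z - 2.0 ** 0.5) for z in iterates]
print(len(iterates), errors)
```

Printing the errors makes the fast decay near the zero visible: each error is roughly proportional to the square of the previous one until rounding takes over.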
Proposition. Suppose that $g'(\cdot)^{-1}$ is continuous.* If the sequence $\{z_i\}$ generated by the Newton-Raphson algorithm (1) is infinite and converges to a point $\hat z$, then $g(\hat z) = 0$.
Proof. Since $g'(\cdot)^{-1}$ is continuous, the map a(·) defined by (2) is also continuous. Now, $\{z_i\}$ satisfies
$z_{i+1} = a(z_i)$,  i = 0, 1, 2, ... .
Hence, letting $i \to \infty$, we obtain that $\hat z = a(\hat z)$, i.e., that
$\hat z = \hat z - g'(\hat z)^{-1} g(\hat z)$,
which implies that $g(\hat z) = 0$, since the matrix $g'(\hat z)^{-1}$ is nonsingular by assumption.
Let a'(z) denote the Jacobian matrix $\partial a(z)/\partial z$ for all $z \in \mathbb{R}^n$, where a(·) is defined as in (2). Then, we must have, for any $y, z \in \mathbb{R}^n$,
7    $a'(z) y = I y - \left[\dfrac{\partial}{\partial z} g'(z)^{-1}\right](y)\, g(z) - g'(z)^{-1} g'(z)\, y$,
where I is the n x n identity matrix, and $[(\partial/\partial z) g'(z)^{-1}](\cdot)$ is a linear operator defined by
8    $\left[\dfrac{\partial}{\partial z} g'(z)^{-1}\right](y) = \lim_{t \to 0} \dfrac{1}{t}\left[g'(z + t y)^{-1} - g'(z)^{-1}\right]$,
provided this limit exists (compare (B.1.7)). We shall assume that the limit in (8) exists, in which case, for every $y \in \mathbb{R}^n$, we see that $[(\partial/\partial z) g'(z)^{-1}](y)$ is an n x n matrix. Assuming that g''(·)(·) exists and is continuous in z,† we find, since
9    $\dfrac{\partial}{\partial z}\left[g'(z) g'(z)^{-1}\right] = \dfrac{\partial}{\partial z} I = 0$ for all $z \in \mathbb{R}^n$,
that
10    $\left[\dfrac{\partial}{\partial z} g'(z)^{-1}\right](y) = -g'(z)^{-1}\, g''(z)(y)\, g'(z)^{-1}$.
* We say that a matrix-valued function G(·) from $\mathbb{R}^n$ into the space of all n x n matrices is continuous if all the components of G(·) are continuous.
† We say that g''(·)(·) is continuous in z if $z_i \to z$ as $i \to \infty$ always implies that $g''(z_i)(y) \to g''(z)(y)$ as $i \to \infty$, for any $y \in \mathbb{R}^n$.
Hence, substituting from (10) into (7), we obtain, for all $y \in \mathbb{R}^n$,
11    $a'(z) y = g'(z)^{-1}\, g''(z)(y)\, g'(z)^{-1} g(z)$.
Suppose that $\hat z$ is a zero of g(·), i.e., that $g(\hat z) = 0$. Then, since $a'(\hat z) y = 0$ for all $y \in \mathbb{R}^n$ (by (7) or (11)), we find that
12    $a'(\hat z) = 0$ for all $\hat z$ such that $g(\hat z) = 0$.
Proceeding as above, it is not difficult to show that a''(·)(·), the second derivative of a(·) (as defined by (B.1.7)), exists and is continuous if the function g(·) is three times continuously differentiable, or, to state this in simpler language, if the elements $g^i(\cdot)$, i = 1, 2, ..., n, of g(·), are three times continuously differentiable. Assuming that a''(·)(·) exists and is continuous, and that the sequence $\{z_i\}$ constructed by algorithm (1) converges to a point $\hat z$, which as we have already shown must satisfy $g(\hat z) = 0$, we obtain from the Taylor formula for second-order expansions, that
13
where $\|a''(\xi)\|$, the norm of the operator $a''(\xi)(\cdot)$, is defined in (B.1.11) and $[z_i, \hat z] = \{\xi = \lambda z_i + (1 - \lambda)\hat z \mid \lambda \in [0, 1]\}$. In obtaining (13), we have made use of the inequality, $\|a''(z)(y)\| \le \|a''(z)\|\,\|y\|$, which follows directly from definition (B.1.11). (The reader will recall that $\|a''(z)(y)\| = \max\{\|a''(z)(y) y'\| \mid \|y'\| \le 1\}$.) Hence, since $a'(\hat z) = 0$, since $\hat z = a(\hat z)$, and since $z_{i+1} = a(z_i)$ for i = 0, 1, 2, ..., we obtain that
$\|g'(z)^{-1} g(z)\| \le \|g'(z)^{-1}\|\,\|g(z)\| \le (1/m)\|g(z)\|$,* because of (18), and since $g(\hat z) = 0$, it follows that there exists a $\rho \in (0, 1/2)$ such that $\|g(z)\|/m \le 1/2$ for all $z \in B(\hat z, \rho)$, and hence
20    $\|a(z) - \hat z\| = \|z - \hat z - g'(z)^{-1} g(z)\| \le \|z - \hat z\| + \|g'(z)^{-1} g(z)\| \le \rho + \tfrac{1}{2} < 1$,
i.e., $a(z) \in B(\hat z, 1)$ for all $z \in B(\hat z, \rho)$. Now, making use of the Taylor formula for second-order expansions (B.1.12), we find that
* Since g'(z) is a Hessian matrix, it is symmetric. It now follows from (18) that $\|g'(z)\| \le M$ and $\|g'(z)^{-1}\| \le 1/m$ for all $z \in \mathbb{R}^n$, where $\|g'(z)\| = \max\{\|g'(z) y\| \mid \|y\| \le 1\}$, $\|g'(z)^{-1}\| = \max\{\|g'(z)^{-1} y\| \mid \|y\| \le 1\}$.
and hence, for all $z \in B(\hat z, \rho)$,
22
where we have again made use of the inequality $\|g'(z)^{-1}\| \le 1/m$. Making use of Taylor's formula for first-order expansions (B.1.3), we obtain
23
Combining (22) with (23), we now obtain
24
Now, making use of Taylor's formula for first-order expansions (B.1.3), we obtain
25    $g(a(z)) = g(\hat z) + \int_0^1 g'(\hat z + t(a(z) - \hat z))\,dt\,(a(z) - \hat z)$.
Consequently (since $g(\hat z) = 0$), because of (18),
26    $\langle a(z) - \hat z,\ g(a(z)) \rangle$
0 such that (30) is satisfied for $\lambda = \lambda_i$, and go to step 6.
Comment. Use procedure (2.1.33) to compute $\lambda_i$.
Step 6. Set $z_{i+1} = z_i + \lambda_i h(z_i)$, set i = i + 1, and go to step 2.
We shall now show that, under suitable assumptions, the quasi-Newton algorithm (29) will set $\lambda_i = 1$ for all i greater than some integer k, and hence that it has the same rate of convergence as the Newton-Raphson algorithm (1) (or, to be more precise, (2.1.39)). Note that the theorem below, which is due to Goldstein [G3], requires $f^0(\cdot)$ to be strictly convex, but only twice continuously differentiable. In obtaining the quadratic rate of convergence (15) for the Newton-Raphson algorithm (2.1.39), we had to assume that $f^0(\cdot)$ was four times continuously differentiable, but we did not need to assume that $f^0(\cdot)$ was strictly convex.* As we shall now see, when we assume that $f^0(\cdot)$ is only twice continuously differentiable, we cannot show that (2.1.39) converges quadratically, though we can still show that it converges superlinearly.
31 Theorem (Goldstein [G3]). Suppose that $f^0 : \mathbb{R}^n \to \mathbb{R}^1$ is twice continuously differentiable and that there exist constants m and M, $0 < m \le M$, such that
32    $m\|y\|^2 \le \langle y, H(z) y \rangle \le M\|y\|^2$,
for all $z \in \mathbb{R}^n$ and for all $y \in \mathbb{R}^n$ (where, as before, $H(z) = \partial^2 f^0(z)/\partial z^2$). If $\{z_i\}_{i=0}^{\infty}$ is any sequence constructed by algorithm (29), then
(i) the sequence $\{z_i\}$ converges to a point $\hat z$ which minimizes $f^0(z)$ over $z \in \mathbb{R}^n$;
(ii) there exists an integer k such that for all $i \ge k$, $\lambda_i = 1$;
(iii) the convergence of $\{z_i\}$ to $\hat z$ is superlinear, i.e., for any $\theta \in (0, 1]$, $\|z_i - \hat z\|/\theta^i \to 0$ as $i \to \infty$.
* Alternatively, from (17), we could have assumed that $f^0(\cdot)$ is three times continuously differentiable and that (18) holds.
Proof. We begin with (i). First, it follows from theorem (B.2.8) that the function $f^0(\cdot)$ is strictly convex and that the set C(z_0) = {z | f^0(z) ,
, for some
c",
< co,
Proof. First, suppose that p = n and j = n - 1. Then hi+, = gi+, and , n n n hi = gi for all i E J , , and hence, I1 hi+, - hi 11 = II gi+, - gi 11. Therefore, l,orp=nandO < j < n - 2 . supposethateitherp > n a n d 0 < j < n Next, recall that j+l
j i
j+l
II hi+j+l - hi II
=
< II gi+i+l
-
II gi+i+1 - Yi+ihi+j - g z. - Yihi II j+ 1
gi /I
Now making use of (28), we obtain
+ II
j i
yi+jhi+i
- yihi
II.
6.3
SUPERLINEAR CONVERGENCE: CONJUGATE GRADIENT METHODS i
265
i
Let CHi = .
4 Remark. It follows from (B.2.8) that the set $C(z_0)$ is compact, because m > 0 in (3). Hence, since H(·) is continuously differentiable by (2), there exist constants $M \ge m$ and $L > 0$ such that
5    $m\|y\|^2 \le \langle y, H(z) y \rangle \le M\|y\|^2$ for all $y \in \mathbb{R}^n$, for all $z \in C(z_0)$,
and
6    $\|H(z) - H(\hat z)\| \le L\|z - \hat z\|$, for all $z \in C(z_0)$,
where $\hat z$ is such that $f^0(\hat z) = \min\{f^0(z) \mid z \in \mathbb{R}^n\}$. (Note that $\hat z$ is unique because $f^0(\cdot)$ is strictly convex on $C(z_0)$, which must contain $\hat z$.)
To reduce the need for leafing back and forth, we now restate the variable metric algorithm.
7 Algorithm (variable metric; Davidon [D2], Fletcher and Powell [F3]).
Step 0. Select a $z_0 \in \mathbb{R}^n$. If $\nabla f^0(z_0) = 0$, stop; else, go to step 1.
Step 1. Set i = 0, set $H_0 = I$ (the n x n identity matrix), and set $g_0 = \nabla f^0(z_0)$.*
Comment. Note that both $g_i$ and $H_i$ are not defined in the same way here as they are in Section 3.
Step 2. Set
8    $h_i = -H_i g_i$.
Step 3. Compute $\lambda_i \ge 0$ such that
9    $f^0(z_i + \lambda_i h_i) = \min\{f^0(z_i + \lambda h_i) \mid \lambda \ge 0\}$.
Step 4. Compute $\nabla f^0(z_i + \lambda_i h_i)$.
Step 5. If $\nabla f^0(z_i + \lambda_i h_i) = 0$, stop; else, set
10    $z_{i+1} = z_i + \lambda_i h_i$,
11    $g_{i+1} = \nabla f^0(z_{i+1})$,
12    $\Delta g_i = g_{i+1} - g_i$,
and go to step 6.*
Step 6. Set i = i + 1 and go to step 2.
* The choice $H_0 = I$ is not mandatory. We may choose $H_0$ to be any symmetric, positive definite matrix.
We recall from (2.3.76) that the matrices $H_i$ are symmetric and positive definite for all i = 0, 1, 2, ..., and we recall from (2.3.106) that if $\{z_i\}$ is an infinite sequence constructed by algorithm (7) for problem (1), under assumption (2), then $z_i \to \hat z$ as $i \to \infty$, where $f^0(\hat z) = \min\{f^0(z) \mid z \in \mathbb{R}^n\}$. We begin by showing that $f^0(z_i) \to f^0(\hat z)$ as $i \to \infty$ at least as fast as a geometric progression.
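The structure of algorithm (7) can be sketched on a strictly convex quadratic $f^0(z) = \tfrac12\langle z, Az\rangle - \langle b, z\rangle$, for which the line minimization of step 3 has a closed form. The matrix update used below is the standard Davidon-Fletcher-Powell formula, $H_{i+1} = H_i + \Delta z_i\rangle\langle\Delta z_i / \langle \Delta z_i, \Delta g_i\rangle - H_i\Delta g_i\rangle\langle H_i\Delta g_i / \langle \Delta g_i, H_i\Delta g_i\rangle$; that it coincides with the update of the algorithm as stated in the book is an assumption of this sketch, as are the function names, the tolerance, and the test problem.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(M, x):
    return [dot(row, x) for row in M]


def dfp_quadratic(A, b, z0, tol=1e-10, max_iter=50):
    """Variable metric (DFP) iteration for f(z) = 0.5*<z, A z> - <b, z>.

    Assumes A is symmetric positive definite, so the exact line search
    along h has the closed form lam = -<g, h> / <h, A h>.
    """
    n = len(z0)
    H = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # H_0 = I
    z = list(z0)
    g = [p - q for p, q in zip(matvec(A, z), b)]       # gradient A z - b
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:                    # stand-in for "gradient = 0"
            break
        h = [-v for v in matvec(H, g)]                 # h_i = -H_i g_i
        lam = -dot(g, h) / dot(h, matvec(A, h))        # exact step for a quadratic
        z_new = [zi + lam * hi for zi, hi in zip(z, h)]
        g_new = [p - q for p, q in zip(matvec(A, z_new), b)]
        dz = [p - q for p, q in zip(z_new, z)]
        dg = [p - q for p, q in zip(g_new, g)]
        Hdg = matvec(H, dg)
        s1, s2 = dot(dz, dg), dot(dg, Hdg)
        H = [[H[r][c] + dz[r] * dz[c] / s1 - Hdg[r] * Hdg[c] / s2
              for c in range(n)] for r in range(n)]
        z, g = z_new, g_new
    return z, H


# Hypothetical test problem; the minimizer solves A z = b, i.e. z = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
z_min, H_final = dfp_quadratic(A, b, [0.0, 0.0])
print(z_min, H_final)
```

On this two-dimensional quadratic the iteration stops after two line searches, and the final matrix agrees with the inverse of A up to rounding, which is the classical finite-termination behavior of the method with exact line searches.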
15 Theorem (Powell). Suppose that assumption (2) is satisfied. If $\{z_i\}$ is an infinite sequence constructed by algorithm (7) for problem (1), then there exists a constant $q(z_0) \in (0, 1)$ such that
16    $f^0(z_i) - f^0(\hat z) \le q(z_0)^i\,[f^0(z_0) - f^0(\hat z)]$,  i = 0, 1, 2, ...,
where $\hat z$ is the limit point of $\{z_i\}$.
Proof. We shall make use of some of the facts established in the proof of theorem (2.3.106). Since we have shown that (2.3.129) cannot be true, we conclude from (2.3.117) that
17
i = 0, 1, 2, ... .
Therefore, applying an argument similar to the one which we had used to obtain (2.3.124), we conclude that for at least two-thirds of the integers $j \in \{0, 1, 2, \ldots, i\}$, the inequality,
18    $\|g_{j+1}\|^2 \le 3\omega \langle g_{j+1}, H_j g_{j+1} \rangle$,
must be satisfied. Therefore, both the inequality (2.3.124) and (18) must be satisfied simultaneously for at least one-third of the integers $j \in \{0, 1, 2, \ldots, i\}$, and hence, for these integers j, we must have
19
Making use of (2.3.97), we now obtain, for these integers j,
20    $m[f^0(z_{j+1}) - f^0(\hat z)] \le 9\omega\omega'\,\|\Delta g_j\|^2$.
Since by lemma (2.3.89), $\|\Delta g_j\|^2/\|\Delta z_j\|^2$ is bounded for all j = 0, 1, 2, ..., and because of the bound on $\|\Delta z_j\|$ given by (2.3.104), we conclude that there must exist a constant q' such that
21
for all those $j \in \{0, 1, 2, \ldots, i\}$ for which (20) holds. Thus, for at least one-third of the integers $j \in \{0, 1, 2, \ldots, i\}$, we must have
22
Now, for the remaining values of j, we must have $[f^0(z_{j+1}) - f^0(\hat z)] \le [f^0(z_j) - f^0(\hat z)]$, and hence, we find that (16) is satisfied for $q(z_0) = (q'/(1 + q'))^{1/3}$.
The following result is an important consequence of theorem (15):
* See footnote on p. 56.
23 Corollary. There exists a $b < \infty$ such that
24
Proof. According to (2.3.104),
25
and since $\hat z$ minimizes $f^0(z)$, we must have $[f^0(z_i) - f^0(z_{i+1})] \le [f^0(z_i) - f^0(\hat z)]$. We now conclude from (16) that
26
which shows that (24) must hold.
The two theorems to follow will make frequent use of the above corollary for the following reason: Since $z_i \to \hat z$ as $i \to \infty$,
27    $\hat z = z_i + \sum_{j=i}^{\infty} \Delta z_j$.
Consequently, because of (24),
28
We shall use the notation,
29
i = 0, 1, 2, ... .
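30 Lemma. Suppose that assumption (2) is satisfied. If $\{z_i\}$ is an infinite sequence constructed by algorithm (7), converging to the point $\hat z$, then
31    $\|\Delta g_i - H(\hat z)\Delta z_i\| \le L\Delta_i\|\Delta z_i\|$,  i = 0, 1, 2, ...,
where, as before, $H(z) = \partial^2 f^0(z)/\partial z^2$ and L is the constant in (6).
Proof. By the Taylor formula for first-order expansions (B.1.3),
32    $\Delta g_i = \int_0^1 H(z_i + t\,\Delta z_i)\,dt\ \Delta z_i$.
Consequently,
33    $\|\Delta g_i - H(\hat z)\Delta z_i\| \le \int_0^1 \|H(z_i + t\,\Delta z_i) - H(\hat z)\|\,dt\ \|\Delta z_i\| \le L\Delta_i\|\Delta z_i\|$,
where we have made use of (6), (29), and the fact that for $t \in [0, 1]$, $\|z_i + t\,\Delta z_i - \hat z\| \le \max\{\|z_i - \hat z\|, \|z_{i+1} - \hat z\|\} \le \Delta_i$.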
34 Lemma. Let P be any nonsingular n x n matrix and let $\bar f^0(\cdot)$ be defined by
35    $\bar f^0(z) = f^0(P^{-1} z)$ for all $z \in \mathbb{R}^n$.
Suppose that $z_0, z_1, z_2, \ldots$ is a sequence constructed by algorithm (7) when applied to the solution of problem (1), and suppose that $\bar z_0, \bar z_1, \bar z_2, \ldots$ is a sequence constructed by algorithm (7) when applied to the problem, $\min\{\bar f^0(z) \mid z \in \mathbb{R}^n\}$ but with $\bar H_0 = P P^T$ in step 1. If $\bar z_0 = P z_0$, then, for i = 0, 1, 2, ...,
36    $\bar f^0(\bar z_i) = f^0(z_i)$,
37    $\bar z_i = P z_i$,
38    $\bar g_i = (P^{-1})^T g_i$,
and
39    $\bar H_i = P H_i P^T$,
where the bars over the letters indicate the quantities constructed by algorithm (7) in the process of solving the problem, $\min\{\bar f^0(z) \mid z \in \mathbb{R}^n\}$.
Proof. Suppose that (37) is true; then (36) follows from (35) and so does (38). Consequently, we only need to prove (37) and (39). Note that (37) and (39) are true for i = 0. Now suppose that (37) and (39) are true for some integer $i \ge 0$; then a direct application of algorithm (7) shows that (37) and (39) are also valid for i + 1. Consequently, since they are valid for i = 0, (37) and (39) must hold for all i = 0, 1, 2, ... .
40 Theorem (Powell). Suppose that assumption (2) is satisfied, and that H, , Hl , H, ,... is an infinite sequence of matrices constructed by algorithm (7) in the process of solving problem (1). Then there exist constants Ei and M, 0 < 5 R < 00, such that
1. Then we shall show that the product M o M l M* 2 - - M i - - .is finite. For, suppose that the product M i< @ < 00, and that M i 1 for i = 0, 1,2,... . Then, since II Hi+, Il/max{l, I1 Hi ll} d M i , for i = 0, 1, 2,...,
nzo
m w , II Hi Ill
47
and hence, we must have, for i 48
ll Hi+1(I
< Mi-1max{l, II H i 4 Ill,
=
i
=
1, 2,...,
0, 1, 2,...,
< M imax{l, II Hi Ill < max{l, I/ Ho II}
i
M i< max{l, /I Ho I l l @.
j=O
To establish the bounds $M_i$, we shall make use of the matrices $Q_i$, i = 0, 1, 2, ..., defined as follows:
49    $Q_i = I - \dfrac{1}{\|\Delta z_i\|^2}\,\Delta z_i\rangle\langle\Delta z_i + \dfrac{1}{\langle \Delta g_i, \Delta z_i \rangle}\,\Delta g_i\rangle\langle\Delta g_i$,  i = 0, 1, 2, ... .
It follows from theorem (2.3.76) (or rather from its proof) that the matrices $Q_i$, i = 0, 1, 2, ..., are symmetric and positive definite. Note also that by direct calculation, we obtain
50    $Q_i \Delta z_i = \Delta g_i$.
We shall now show that
51    $\|Q_i^{1/2} H_{i+1} Q_i^{1/2}\| \le \max\{1, \|Q_i^{1/2} H_i Q_i^{1/2}\|\}$,  i = 0, 1, 2, ... .
* From now on we assume that $H(\hat z) = I$.
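The identity $Q_i \Delta z_i = \Delta g_i$ in (50) follows by applying the definition (49) to $\Delta z_i$: the first two terms cancel and the dyad $\Delta g_i\rangle\langle\Delta g_i$ contributes $\Delta g_i$. The short sketch below checks this numerically; the function name and the sample vectors are hypothetical, chosen only so that $\langle \Delta g, \Delta z\rangle > 0$.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def q_matrix(dz, dg):
    """Q = I - dz dz^T / ||dz||^2 + dg dg^T / <dg, dz>, as in (49)."""
    n = len(dz)
    nz2 = dot(dz, dz)
    s = dot(dg, dz)
    return [[(1.0 if r == c else 0.0) - dz[r] * dz[c] / nz2 + dg[r] * dg[c] / s
             for c in range(n)] for r in range(n)]


# Hypothetical vectors with <dg, dz> > 0.
dz = [0.25, 0.5]
dg = [1.5, 1.75]
Q = q_matrix(dz, dg)
Q_dz = [dot(row, dz) for row in Q]
print(Q_dz, dg)   # the two vectors agree up to rounding
```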
To prove (51), we shall make use of the fact that the euclidean norm of a symmetric matrix is the largest of the absolute values of its eigenvalues. For i = 0, 1, 2, ..., we define
52    $x_i = Q_i^{1/2} \Delta z_i = Q_i^{-1/2} \Delta g_i$,
the second equality being a consequence of (50). We now express $\Delta z_i$ and $\Delta g_i$ in terms of $x_i$ and substitute for these quantities in (14). We then obtain the identity,
53    $Q_i^{1/2} H_{i+1} Q_i^{1/2} = R_i + \dfrac{1}{\|x_i\|^2}\,x_i\rangle\langle x_i$,
where $J_i = Q_i^{1/2} H_i Q_i^{1/2}$ and the definition of $R_i$ is obvious from (53). To establish (51), we have to show that the eigenvalues of the matrix $Q_i^{1/2} H_{i+1} Q_i^{1/2}$ are bounded by $\max\{1, \|J_i\|\}$. Since
54    $R_i = J_i - \dfrac{1}{\langle x_i, J_i x_i \rangle}\,J_i x_i\rangle\langle J_i x_i$,
we find that for all $y \in \mathbb{R}^n$,
55    $\langle y, R_i y \rangle = \langle y, J_i y \rangle - \dfrac{\langle y, J_i x_i \rangle^2}{\langle x_i, J_i x_i \rangle} \ge 0$,
by the Schwarz inequality, i.e., we find that $R_i$ is positive semidefinite. (Note that $R_i x_i = 0$.) Suppose that $y_i$ is an eigenvector of $R_i$ corresponding to the largest eigenvalue of $R_i$. Then $R_i y_i = \|R_i\| y_i$, and hence,
56    $\|R_i\|\,\|y_i\|^2 = \langle y_i, R_i y_i \rangle = \langle y_i, J_i y_i \rangle - \dfrac{\langle J_i x_i, y_i \rangle^2}{\langle x_i, J_i x_i \rangle} \le \langle y_i, J_i y_i \rangle \le \|J_i\|\,\|y_i\|^2$,
which shows that $\|R_i\| \le \|J_i\|$. Now, since $x_i$ is an eigenvector of $R_i$ corresponding to the zero eigenvalue, and $R_i$ is symmetric, $x_i$ is orthogonal to all the other eigenvectors of $R_i$. Consequently, the matrix,
57    $Q_i^{1/2} H_{i+1} Q_i^{1/2} = R_i + \dfrac{1}{\|x_i\|^2}\,x_i\rangle\langle x_i$,
has the same eigenvectors as $R_i$, with the same corresponding eigenvalues, with the exception of the eigenvalue corresponding to $x_i$, which is 1 (it was zero in $R_i$). Since the matrix in (57) is symmetric, we conclude that (51) is true.
Since the matrix $Q_i$ is symmetric, we deduce from (51) that
58    $\|H_{i+1}\| \le \|Q_i^{-1}\| \max\{1, \|Q_i\|\,\|H_i\|\}$.
We now define the $M_i$, i = 0, 1, 2, ..., as
59    $M_i = \max\{1, \|Q_i^{-1}\|\}\,\max\{1, \|Q_i\|\}$,
and note that $M_i \ge 1$. Next, observe that (58) yields
60    $\|H_{i+1}\| \le M_i \max\{1, \|H_i\|\}$,  i = 0, 1, 2, ...,
i.e., we have established the first half of the inequality in (48). We shall now show that $\prod_{i=0}^{\infty} M_i < \infty$, to complete the proof of (48). For this purpose, we calculate bounds on $\|Q_i^{-1}\|$ and $\|Q_i\|$, which will lead us to a bound on $M_i$. By (49),
61
Since for any dyad a ) @ , II a)(b [I = II a II I1 b 11, we find that 62
+
II Asi - Azi II II Azi II I 0. Now referring to lemma (2.3.89), we conclude that there exists a constant D, independent of i, such that 64
Consequently, we deduce from (46) that (see (29))
65    $\|Q_i - I\| \le DL\Delta_i$,  i = 0, 1, 2, ...,
and hence, that
66    $\|Q_i\| \le 1 + DL\Delta_i$,  i = 0, 1, 2, ... .
Since (24) implies that $\Delta_i \to 0$ as $i \to \infty$, we conclude from (65) that there exists a positive constant D' such that
67    $\|Q_i^{-1}\| \le 1 + D'\Delta_i$,  i = 0, 1, 2, ... .*
Substituting from (66) and (67) into the definition of the $M_i$, (59), we find that
68    $M_i \le (1 + D'\Delta_i)(1 + DL\Delta_i)$,  i = 0, 1, 2, 3, ... .
Hence, to establish (48), it is sufficient to show that the product $\prod_{i=0}^{\infty} (1 + D'\Delta_i)(1 + DL\Delta_i)$ is bounded. It is not difficult to show that this product is bounded if the sum $\sum_{i=0}^{\infty} \Delta_i$ is convergent. Now, from (26) and (29), we deduce that
69
- f0(z)l/m)”2 4 < dim{2[f0(z0) 1 - ddzo)
and since $q(z_0) \in (0, 1)$, we conclude that $\sum_{i=0}^{\infty} \Delta_i$ is convergent. Hence, we are done with the first part of theorem (40). We still have to prove the first half of (41), or equivalently, the first half of the inequality (42), i.e., that there exists an $\bar m > 0$ such that $0 < \bar m \le \mu_i^j$, j = 1, 2, ..., n, i = 0, 1, 2, 3, ..., where the $\mu_i^j$ are the eigenvalues of $H_i$. For this it is sufficient to show that the matrices $G_i = H_i^{-1}$ are uniformly bounded in norm. We shall show that we can replace $H_i$ by $G_i$ in the inequality (48). To obtain this result, we use (52) to express $\Delta z_i$ and $\Delta g_i$ in terms of $x_i$ in equation (2.3.108). We then obtain the identity,
* Note that $\|Q_i^{-1}\| - 1 \le \|Q_i^{-1} - I\| = \|Q_i^{-1}(I - Q_i)\| \le \|Q_i^{-1}\|\,\|I - Q_i\| \le \|Q_i^{-1}\|\,DL\Delta_i$. Hence, there must exist a $b < \infty$ such that $\|Q_i^{-1}\| \le 1/(1 - DL\Delta_i) \le b$ for i = 0, 1, 2, ..., and consequently, $\|Q_i^{-1} - I\| \le bDL\Delta_i$, i = 0, 1, 2, ... .
where the definition of Siis clear from an inspection of (70). Now, since the matrix ( I - (1/11 xi /Iz) xi>(xi) is a symmetric projection operator, we must have
II siit
71
< II
it1
Furthermore, $x_i$ is an eigenvector of $S_i$ corresponding to the eigenvalue zero. Since the eigenvectors of $S_i$ are orthogonal to each other, we conclude that the matrix $[S_i + (1/\|x_i\|^2)\,x_i\rangle\langle x_i]$ has the same eigenvectors and eigenvalues as $S_i$, with the exception of the eigenvalue corresponding to the eigenvector $x_i$, which was zero in $S_i$ and now becomes 1. Therefore, we are led to an inequality which is quite similar to (51), namely,
72    $\|Q_i^{-1/2} G_{i+1} Q_i^{-1/2}\| \le \max\{1, \|Q_i^{-1/2} G_i Q_i^{-1/2}\|\}$,  i = 0, 1, 2, ... .
We now obtain from (72) that
73
II Gi+l It
< I1 Qi I/ max(1, II Q;' II II GiII> < Mi maxu, /I G, I l l < Mi)maxu, II Go Ill < A maxu, II Go Ill?
(C
and hence we are done.
As the reader may recall, theorem (40) was of some importance in Section (2.3), where it was stated without proof (see (2.3.84)). However, its main value lies in the fact that we need it to prove the following theorem, which shows that the variable metric method converges superlinearly:
74 Theorem. Suppose that assumption (2) is satisfied, and that $\hat z \in \mathbb{R}^n$ is such that $f^0(\hat z) = \min\{f^0(z) \mid z \in \mathbb{R}^n\}$. If $\{z_i\}$ is an infinite sequence constructed by the variable metric algorithm (7), then $(\|z_{i+1} - \hat z\|/\|z_i - \hat z\|) \to 0$ as $i \to \infty$, i.e., $\{z_i\}$ converges to $\hat z$ superlinearly.
Proof. We begin by reusing the arguments appearing in the beginning of the proof of theorem (40), where we used the function $\bar f^0(\cdot)$, defined as in (35), with $P = H(\hat z)^{1/2}$, and constructed a sequence $\{\bar z_i\}$ in the process of solving problem (43), with algorithm (7) altered to initialize in step 1, so that $\bar H_0 = H(\hat z)$. Setting $\hat{\bar z} = H(\hat z)^{1/2} \hat z$, we find that $\hat{\bar z}$ minimizes $\bar f^0(z)$ over $z \in \mathbb{R}^n$, and hence, $\bar z_i \to \hat{\bar z}$ as $i \to \infty$. Now, making use of (37) for $P = H(\hat z)^{1/2}$, we obtain
75    $\bar z_i - \hat{\bar z} = H(\hat z)^{1/2}(z_i - \hat z)$,  i = 0, 1, 2, ... .
Hence, we find again that without loss of generality we may assume that $H(\hat z) = I$, the identity matrix, and thereby simplify the algebra of the proof.* By the Taylor formula for first-order expansions (B.1.3),
76    $g_i = g(z_i) = g(\hat z) + \int_0^1 H(\hat z + t(z_i - \hat z))\,dt\,(z_i - \hat z)$.
Since $g(\hat z) = 0$, and because of (5), we must have
77    $m\|z_i - \hat z\| \le \|g_i\|$
$\epsilon(z) > 0$, a $\delta(z) < 0$, and a $\gamma(z) > 0$ such that (compare (1.3.28))
4    $c(\gamma, z'') - c(\gamma, z') \le \delta(z)$
for all $z' \in B(z, \epsilon(z)) = \{z_0 \in T \mid \|z_0 - z\| \le \epsilon(z)\}$, for all $z'' \in A(\gamma, z')$, for all $\gamma \in [0, \gamma(z)]$;
(iii) there exists a sequence $\{\xi_s\}_{s=0}^{\infty}$ such that $\xi_s > 0$ for s = 0, 1, 2, ...,
5    $\sum_{s=0}^{\infty} \xi_s < \infty$
and
6    $|c(\beta^s \epsilon_0, z) - c(0, z)| \le \xi_s$, for all $z \in T$.
7 Lemma. Suppose that (3) (i) and (iii) are satisfied. If $\{z_i\}$ is an infinite sequence constructed by algorithm (1), and $\{z_i\}$ has at least one accumulation point, then the accompanying sequence $\{\epsilon_i\}_{i=0}^{\infty}$ converges to zero.
Proof. By construction, $\{\epsilon_i\}$ is a monotonically decreasing sequence which is bounded from below by zero, and consequently, it must converge. Suppose, therefore, that
8    $\epsilon_i \to \epsilon^* > 0$ as $i \to \infty$.
We shall show that the inequality in (8) leads to a contradiction. Relations (2) and (8) imply that there exists an integer $k \ge 0$ such that
9    $q(i) = q^*$ and $\epsilon_i = \beta^{q^*}\epsilon_0 = \epsilon^*$ for all $i \ge k$.
Also, because of the test in step 3 of (1), we must have
10    $c(\epsilon^*, z_{i+1}) - c(\epsilon^*, z_i) \le -\alpha\epsilon^*$ for all $i \ge k$.
Consequently, we must have $c(\epsilon_i, z_i) \to -\infty$ as $i \to \infty$. Now, by (6) (because of (2)),
11    $c(\epsilon_i, z_i) \ge c(0, z_i) - \xi_{q(i)}$,  i = 0, 1, 2, ... .
Hence, because of (9), we must have
12    $c(\epsilon_i, z_i) \ge c(0, z_i) - \xi_{q^*}$ for all $i \ge k$.
Now suppose that $z_i \to z^*$ as $i \to \infty$, $i \in K \subset \{0, 1, 2, \ldots\}$ (i.e., z* is an accumulation point of $\{z_i\}$). Then, by (3) (i), there exists an integer $k' \ge k$ such that
13    $|c(0, z_i) - c(0, z^*)| \le \xi_{q^*}$ for all $i \in K$, $i \ge k'$.
Combining (12) and (13), we obtain
14    $c(\epsilon_i, z_i) \ge c(0, z^*) - 2\xi_{q^*}$ for all $i \in K$, $i \ge k'$,
which contradicts our original conclusion that $c(\epsilon_i, z_i) \to -\infty$ as $i \to \infty$. Consequently, the inequality in (8) must be false, i.e., we must have $\epsilon^* = 0$.
* For use with algorithms for solving nondiscretized continuous optimal control problems, it is necessary to refine model (1) a little; see [=a].
15 Definition. Let K be any infinite subsequence of the integers. We define the index function $k : K \to K$ by
16    $k(i) = \min\{j \in K \mid j \ge i + 1\}$,
i.e., k(·) computes the successive points of the subsequence.
17 Lemma. Suppose that $\{z_i\}_{i=0}^{\infty}$ is a sequence constructed by algorithm (1) and that $K \subset \{0, 1, 2, 3, \ldots\}$. If assumption (3)(iii) is satisfied, then
18    $c(\epsilon_{k(i)}, z_{k(i)}) \le 2\sum_{j=q(i+1)}^{q(k(i))} \xi_j + c(\epsilon_{i+1}, z_{i+1})$ for all $i \in K$.
Proof. Let N = {0, 1, 2, 3, ...} and let $b : N \to \{0, 1\}$ be defined by
19    $b(i) = \begin{cases} 0 & \text{if } q(i) = q(i - 1), \\ 1 & \text{otherwise.} \end{cases}$
Now, because of the test in step 3 of (1),
20    $c(\epsilon_{i+1}, z_{i+1}) - c(\epsilon_{i+1}, z_i) \le -\alpha\epsilon_{i+1}$,  i = 0, 1, 2, ... .
Hence, if b(i + 1) = 0, $\epsilon_{i+1} = \epsilon_i$, and (20) yields
21    $c(\epsilon_{i+1}, z_{i+1}) \le c(\epsilon_i, z_i) - \alpha\epsilon_{i+1}$, for $i \in N$ if b(i + 1) = 0.
If b(i + 1) = 1, then making use of (6) twice, we obtain, from (20),
22    $c(\epsilon_{i+1}, z_{i+1}) \le c(\epsilon_{i+1}, z_i) - \alpha\epsilon_{i+1} \le c(0, z_i) + \xi_{q(i+1)} - \alpha\epsilon_{i+1} \le c(\epsilon_i, z_i) + \xi_{q(i+1)} + \xi_{q(i)} - \alpha\epsilon_{i+1}$, for $i \in N$ if b(i + 1) = 1.
Adding $\alpha\epsilon_{i+1}$ to the right-hand sides of (21) and (22), we see that
23    $c(\epsilon_{i+1}, z_{i+1}) \le c(\epsilon_i, z_i) + b(i + 1)[\xi_{q(i+1)} + \xi_{q(i)}]$,  i = 0, 1, 2, ... .
Making use of (23) recursively, we now obtain for all $i \in K$,
24    $c(\epsilon_{k(i)}, z_{k(i)}) \le \sum_{j=i+2}^{k(i)} b(j)[\xi_{q(j)} + \xi_{q(j-1)}] + c(\epsilon_{i+1}, z_{i+1})$.
Since b(j) = 1 implies that q(j - 1) < q(j), it follows that for any s such that $q(i + 1) \le s \le q(k(i))$, $\xi_s$ can be repeated at most twice in (24). Consequently, (24) yields
25    $\sum_{j=i+2}^{k(i)} b(j)[\xi_{q(j)} + \xi_{q(j-1)}] \le 2\sum_{j=q(i+1)}^{q(k(i))} \xi_j$.
Combining (24) and (25), we now obtain (18).
26 Theorem. Suppose that assumption (3) is satisfied. Then, either algorithm (1) jams up at a desirable point $z_i$ after a finite number of iterations, or else it constructs an infinite sequence $\{z_i\}$ such that every accumulation point of that sequence is desirable.
Proof. First, suppose that algorithm (1) jams up at a point $z_k$ which is not desirable. Then, by (3)(ii) there must exist a $\delta(z_k) < 0$ and a $\gamma(z_k) > 0$ such that
27    $c(\gamma, z'') - c(\gamma, z_k) \le \delta(z_k)$ for all $z'' \in A(\gamma, z_k)$, $\gamma \in [0, \gamma(z_k)]$.
Now, when the algorithm jams up, it cycles between steps 2 and 3. Hence, it must be generating a sequence $\{y_p\}_{p=0}^{\infty}$ such that $y_p \in A(\beta^{q(k)+p}\epsilon_0, z_k)$, $p = 0, 1, 2, \ldots$, and
28    $c(\beta^{q(k)+p}\epsilon_0, y_p) - c(\beta^{q(k)+p}\epsilon_0, z_k) > -\alpha\beta^{q(k)+p}\epsilon_0$, $p = 0, 1, 2, \ldots$.
However, for some integer $p' \ge 0$, we must have
29    $\max\{\alpha\beta^{q(k)+p'}\epsilon_0, \beta^{q(k)+p'}\epsilon_0\} \le \min\{-\delta(z_k), \gamma(z_k)\}$,
and hence, we see that (28) cannot hold if $z_k$ is not desirable. Thus, algorithm (1) cannot jam up at a nondesirable point $z_k$.
Now suppose that the sequence $\{z_i\}$ is infinite and that $z^*$ is an accumulation point of that sequence. Thus, suppose that $z_i \to z^*$ as $i \to \infty$, for $i \in K \subset \{0, 1, 2, \ldots\}$, and that $k(\cdot)$ is the index function for $K$. Suppose that $z^*$ is not desirable. We shall show that this leads to a contradiction. First, because of lemma (7), we note that $q(i) \to \infty$ as $i \to \infty$ and that $\epsilon_i \to 0$ as $i \to \infty$. Next, by (3)(iii),
30    $c(\epsilon_i, z_i) \ge c(0, z_i) - \xi_{q(i)}$, $i = 0, 1, 2, \ldots$.
From (3)(i) we conclude that
31    $c(0, z_i) \to c(0, z^*)$ as $i \to \infty$, $i \in K$.
Consequently, since $z_{i+1} \in A(\epsilon_{i+1}, z_i)$, since $\epsilon_i \to 0$ as $i \to \infty$, and since $\xi_{q(i)} \to 0$ as $i \to \infty$ because of (5), we obtain from (30) and (31) that
32    $\lim_{i \in K} c(\epsilon_i, z_i) \ge \lim_{i \in K} [c(0, z_i) - \xi_{q(i)}] \ge c(0, z^*)$.
Now, since $z^*$ is not desirable, by (3)(ii), there exist an $\epsilon^* > 0$, a $\delta^* < 0$ and a $\gamma^* > 0$ such that
33    $c(\gamma, z'') - c(\gamma, z') \le \delta^*$ for all $z' \in B(z^*, \epsilon^*)$, for all $z'' \in A(\gamma, z')$, for all $\gamma \in [0, \gamma^*]$.
Since $\epsilon_i \to 0$ and since $z_i \to z^*$ as $i \to \infty$, for $i \in K$, there must exist an integer $k'$ such that for all $i \ge k'$, $z_i \in B(z^*, \epsilon^*)$ and $\epsilon_i \le \gamma^*$. Hence, by (33),
34    $c(\epsilon_{i+1}, z_{i+1}) - c(\epsilon_{i+1}, z_i) \le \delta^*$ for all $i \in K$, $i \ge k'$.
Referring to (3)(iii), we define
35    $b_s = \sum_{j=0}^{s} \xi_j$, $s = 0, 1, 2, \ldots$.
Then $\{b_s\}_{s=0}^{\infty}$ is a monotonically increasing sequence which is bounded from above, and which therefore converges. Consequently, since $q(i) \to \infty$ as $i \to \infty$, there must exist an integer $k'' \ge k'$ such that
36    $b_{q(k(i))} - b_{q(i)}$
... k for all $i \ge k'$.
1 Algorithm Model.* $A : N \times T \to 2^T$, $c : T \to \mathbb{R}^1$, $t_1 : N \to N$, $t_2 : N \to N$.
Step 0. Compute a $z_0 \in T$ and set $i = 0$.
Step 1. Set $z = z_i$ and set $j = t_1(i)$.
Step 2. Compute a $y \in A(j, z)$.
Step 3. If $c(y) < c(z)$, set $z_{i+1} = y$, set $i = i + 1$, and go to step 1; else, set $z_{i+1} = z_i$, set $j = t_2(i)$, set $i = i + 1$, and go to step 2.
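The control flow of model (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loop is cut off after `max_iter` iterations (the model itself has no stopping rule), the truncation functions are assumed to be nondecreasing and unbounded (their definition is not part of this passage), the map `approx_descent` is a hypothetical stand-in for $A(j, z)$ built from a forward-difference slope with increment $2^{-j}$, and the update of $j$ in the rejection branch follows step 3 as printed above.

```python
def run_model(A, c, t1, t2, z0, max_iter=60):
    """Run the model for max_iter iterations; return [z_0, ..., z_max_iter]."""
    zs = [z0]                      # step 0
    i = 0
    z, j = zs[0], t1(0)            # step 1
    while i < max_iter:            # cut-off added for this sketch only
        y = A(j, z)                # step 2
        if c(y) < c(z):            # step 3: accept y, return to step 1
            zs.append(y)
            i += 1
            z, j = zs[i], t1(i)
        else:                      # step 3: keep z, change the precision index j
            zs.append(z)
            j = t2(i)
            i += 1
    return zs


def cost(z):
    """Hypothetical cost c(z) = z^2, minimized at z = 0."""
    return z * z


def approx_descent(j, z):
    """Hypothetical A(j, z): descent step using a slope estimate of precision 2^-j."""
    h = 2.0 ** (-j)
    slope = (cost(z + h) - cost(z)) / h
    return z - 0.25 * slope


zs = run_model(approx_descent, cost, t1=lambda i: i, t2=lambda i: i + 1, z0=0.9)
```

In this toy run a candidate is rejected whenever the slope estimate is too coarse to produce a decrease in cost, and the sequence then stays put until a larger $j$ yields an acceptable point.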
2 Theorem. Consider algorithm (1). Suppose that $t_1(\cdot)$ and $t_2(\cdot)$ are truncation functions and that $c(\cdot)$ and $A(\cdot, \cdot)$ have the following properties: (i) $c(\cdot)$ is either continuous at all nondesirable $z \in T$, or else $c(z)$ is bounded from below for $z \in T$; (ii) for every $z \in T$ which is not desirable, there exist an $\epsilon(z) > 0$, a $\delta(z) < 0$ and a $k(z) \in N$, such that
3    $c(z'') - c(z') \le \delta(z)$ for all $z' \in B(z, \epsilon(z))$, for all $z'' \in A(j, z')$, for all $j \ge k(z)$.
Then every accumulation point of an infinite sequence $\{z_i\}$ constructed by algorithm (1) is desirable.
Proof. First suppose that there is an integer $i' \in N$ such that $z_{i'} = z_{i'+1} = z_{i'+2} = \cdots$. Then we find that the algorithm keeps on constructing vectors $y \in A(t_1(i') + t_2(i' + q), z_{i'})$, $q = 1, 2, \ldots$, such that $c(y) \ge c(z_{i'})$. Since $t_1(\cdot)$ and $t_2(\cdot)$ are both truncation functions, it now follows from assumption (ii) above that $z_{i'}$ must be desirable. Note that in this case $z_{i'}$ is the limit point of the sequence constructed by the algorithm. Now suppose that there is no integer $i \in N$ such that $z_i = z_{i+1} = \cdots$ and that $\hat z$ is an accumulation point of $\{z_i\}$. Thus, suppose that $\hat z$ is not desirable
* This algorithm model and the accompanying convergence theorem were first presented in [P3a].
and that $z_i \to \hat z$ as $i \to \infty$ for $i \in K \subset \{0, 1, 2, \ldots\}$. Then, by assumption (ii) above, there exist an $\hat\epsilon > 0$, a $\hat\delta < 0$ and a $\hat k \in N$ such that
4    $c(z'') - c(z') \le \hat\delta$ for all $z' \in B(\hat z, \hat\epsilon)$, for all $z'' \in A(j, z')$, for all $j \ge \hat k$.
Now, since $z_i \to \hat z$ as $i \to \infty$, for $i \in K$, there exists an integer $k \in N$ such that $t_1(i) \ge \hat k$ and $z_i \in B(\hat z, \hat\epsilon)$ for all $i \in K$, $i \ge k$. Let $K'$ be any infinite subset of $K$ such that if $i$, $i + p$ are two consecutive indices in $K'$, then $z_{i+p} \ne z_i$. Consequently, $z_i \to \hat z$ as $i \to \infty$, for $i \in K'$, and in addition, because of (4), if $i$, $i + p$ are two consecutive indices in $K'$ and $i \ge k$, then
5    $c(z_{i+p}) - c(z_i) \le \hat\delta$.
But the sequence $\{c(z_i)\}_{i=0}^{\infty}$ must converge, because of assumption (i), and, since this is contradicted by (5), we conclude that $\hat z$ must have been desirable.
To illustrate the manner in which model (1) can be used, we shall modify once again the method of centers (4.2.27) (compare (4.2.47)). First, we modify the golden section search algorithm as follows:
6 Algorithm (golden section search). Integer $k \ge 0$ to be supplied; $F_1 = (3 - \sqrt{5})/2$, $F_2 = (\sqrt{5} - 1)/2$ (compare (2.1.14)). Computes an interval containing the minimizer of a convex function $\theta : \mathbb{R}^1 \to \mathbb{R}^1$ when this minimizer is in $[0, \infty)$.
Step 0. Select a $\rho > 0$.
Step 1. Compute $\theta(0)$, $\theta(\rho)$.
Step 2. If $\theta(\rho) \ge \theta(0)$, set $a_0 = 0$, $b_0 = \rho$, and go to step 7; else, go to step 3.
Step 3. Set $i = 0$ and set $\mu_0 = 0$.
Step 4. Set $\mu_{i+1} = \mu_i + \rho$.
Step 5. Compute $\theta(\mu_{i+1})$.
Step 6. If $\theta(\mu_{i+1}) \ge \theta(\mu_i)$, set $a_0 = \mu_{i-1}$, $b_0 = \mu_{i+1}$, and go to step 7; else, set $i = i + 1$ and go to step 4.
Comment. The desired minimizer is contained in the interval $[a_0, b_0]$.
Step 7. Set $j = 0$.
Step 8. Set $u_j = a_j + F_1(b_j - a_j)$, $w_j = a_j + F_2(b_j - a_j)$.
Step 9. If $\theta(u_j) < \theta(w_j)$, set $a_{j+1} = a_j$, set $b_{j+1} = w_j$, and go to step 10; else, set $a_{j+1} = u_j$, set $b_{j+1} = b_j$, and go to step 10.
Step 10. If $j < k$, set $j = j + 1$ and go to step 8; else, set $\bar\mu = (a_j + b_j)/2$ and stop.
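The steps of algorithm (6) translate almost directly into code. The following Python sketch is offered as an illustration rather than as a definitive implementation: the function name `golden_section`, the default values of `rho` and `k`, and the safeguard `max_expand` on the bracketing loop are assumptions made here, and the routine performs exactly `k` interval reductions before returning the midpoint.

```python
import math

F1 = (3.0 - math.sqrt(5.0)) / 2.0   # about 0.382
F2 = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618


def golden_section(theta, rho=1.0, k=40, max_expand=10000):
    """Approximate the minimizer of a convex function theta on [0, inf)."""
    # Steps 1-2: if theta does not decrease over [0, rho], the bracket is [0, rho].
    if theta(rho) >= theta(0.0):
        a, b = 0.0, rho
    else:
        # Steps 3-6: march forward in steps of rho until theta stops decreasing.
        mu_prev, mu = 0.0, rho
        for _ in range(max_expand):
            mu_next = mu + rho
            if theta(mu_next) >= theta(mu):
                a, b = mu_prev, mu_next
                break
            mu_prev, mu = mu, mu_next
        else:
            raise ValueError("no bracket found; theta may be unbounded below")
    # Steps 7-10: k golden section reductions of the bracket [a, b].
    for _ in range(k):
        u = a + F1 * (b - a)
        w = a + F2 * (b - a)
        if theta(u) < theta(w):
            b = w          # minimizer lies in [a, w]
        else:
            a = u          # minimizer lies in [u, b]
    return 0.5 * (a + b)


mu_bar = golden_section(lambda x: (x - 2.3) ** 2)
```

Each reduction shrinks the bracket by the factor $F_2 \approx 0.618$, so the integer $k$ directly controls the precision of the returned point, which is what makes it a natural quantity to drive with a truncation function.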
A.2 OPEN-LOOP IMPLEMENTATION
We now incorporate this into algorithm (4.2.27), in accordance with model (1), as follows (compare (4.2.47)), to obtain an algorithm applicable under the same convexity assumptions as for (4.2.47):
7 Algorithm (implementation of modified method of centers, Polak [P3a]).
Step 0. Compute a $z_0 \in C$, and set $i = 0$; select truncation functions $t_1(\cdot)$, $t_2(\cdot)$.
Step 1. Set $z = z_i$ and set $k = t_1(i)$.
Step 2. Solve (4.2.22)-(4.2.25) to obtain $(h^0(z), h(z))$.
Step 3. If $h^0(z) = 0$, set $z_{i+1} = z_i$ and stop; else, go to step 4.
Step 4. Set $\theta(\mu) = d(z + \mu h(z), z)$ and use (6) to compute $\bar\mu$.
Step 5. If ...

5    $f^0(z) - f^0(y) \le \langle \nabla f^0(z), z - y \rangle$.
When the inequality in (5) is strict for all $y \ne z$ in $\mathbb{R}^n$, the function $f^0(\cdot)$ is strictly convex, and vice versa.
6 Theorem. Suppose that $f^0 : \mathbb{R}^n \to \mathbb{R}^1$ is twice continuously differentiable. Then $f^0(\cdot)$ is convex if and only if the Hessian $\partial^2 f^0(z)/\partial z^2$ is positive semidefinite for all $z$ in $\mathbb{R}^n$.
7 Theorem. Suppose that $f^0 : \mathbb{R}^n \to \mathbb{R}^1$ is a continuously differentiable convex function. If $\hat z \in \mathbb{R}^n$ is such that $\nabla f^0(\hat z) = 0$, then $f^0(\hat z) \le f^0(z)$ for all $z \in \mathbb{R}^n$.
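The gradient inequality for differentiable convex functions, $f^0(z) - f^0(y) \le \langle \nabla f^0(z), z - y \rangle$, is easy to check numerically. The Python sketch below does so for a hypothetical convex quadratic chosen only for illustration (its Hessian is the constant positive definite matrix diag(2, 4)); the function names and the tolerance are assumptions of this example.

```python
import random


def f0(z):
    """Hypothetical convex quadratic f0(z) = z1^2 + 2 z2^2."""
    return z[0] ** 2 + 2.0 * z[1] ** 2


def grad_f0(z):
    """Gradient of the quadratic above."""
    return [2.0 * z[0], 4.0 * z[1]]


def gradient_inequality_holds(f, grad, z, y, tol=1e-9):
    """Check f(z) - f(y) <= <grad f(z), z - y> up to a rounding tolerance."""
    lhs = f(z) - f(y)
    rhs = sum(g * (zi - yi) for g, zi, yi in zip(grad(z), z, y))
    return lhs <= rhs + tol


rng = random.Random(0)
points = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)] for _ in range(200)]
all_hold = all(
    gradient_inequality_holds(f0, grad_f0, z, y)
    for z in points[:100]
    for y in points[100:]
)

# At the stationary point z_hat = (0, 0) the inequality reduces to f0(z_hat) <= f0(y).
z_hat = [0.0, 0.0]
stationary_is_minimizer = all(f0(z_hat) <= f0(y) for y in points)
```

For this quadratic the gap between the two sides equals $(z^1 - y^1)^2 + 2(z^2 - y^2)^2$, which is nonnegative, so both flags evaluate to true.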
10
is bounded.
B.3 A FEW MISCELLANEOUS RESULTS
Proof. By the Taylor formula for second-order expansions (1.12), for any $y \ne z$ in $\mathbb{R}^n$,
11    $f^0(z) - f^0(y) = \langle \nabla f^0(z), z - y \rangle - \int_0^1 (1 - t) \langle z - y, H(z - t(z - y))(z - y) \rangle \, dt$
< 0, there exist an E' > 0 and a h, > 0 such that for all z E B(z', E') = {z I 11 z - z' 11 d E'} and for all h E S,
2
I fyz
+ Ah)
-
f"z>l
0 be arbitrary, but finite. Then fi(.)is uniformly continuous on the compact ball B(z', E ' ) = {z I 11 z - z' 11 < E ' } , and hence, there exists an E" > 0 such that
3
1f"Z)
-fi(z")I
0 and a A, > 0 such that for all z E B(z', E ' ) = { z I 11 z - Z" 11 E'} and for all h E S,
0 which is compatible with the convergence of the algorithm. This test will always be satisfied after a finite number of iterations. Figure 5 shows a comparison between the steepest descent algorithm (2.1.16), the implementation (2.3.132) of the Polak-Ribière method, and the quasi-Newton method (3.1). While the quasi-Newton method has the best rate of convergence, it also requires more time per iteration than the other two methods. Based on this consideration, one would usually prefer algorithm (1) or (2.3.132) to a gradient or quasi-Newton method.† While implementations of the variable metric method (2.3.68), using a cubic interpolation in computing the step size, are known to be more numerically
* To be absolutely sure of superlinear convergence, use the test 8 < min {c', 11 g, Il} in step 14, and replace yt by w(i + l ) y t , in (7), as in (6.3.8).
† Note that in Fig. 5, algorithm (2.3.132) appears to converge superlinearly, satisfying (6.3.10) with p = n.
APPENDIX C A GUIDE TO IMPLEMENTABLE ALGORITHMS
stable than either of the conjugate gradient methods we have seen, with the same type of rule for selecting the step size, it is not at all clear that such implementations of the variable metric method are superior to algorithm (1). In addition, the variable metric algorithm requires a computer with a much larger fast-access memory because of the need to store the matrix $H_i$. For the reader who has a preference for the variable metric algorithm, we suggest, instead of using polynomial interpolation in computing step size, that he proceed as in algorithm (1), since we believe that this will result in a more stable implementation.
8 Algorithm (recommended implementation of variable metric algorithm).
Comment. Solves the problem, $\min\{f^0(z) \mid z \in \mathbb{R}^n\}$, for $f^0(\cdot)$ twice continuously differentiable. For relevant theory, see Sections 2.3, 6.4 and A.2.
Step 0. Select a $z_0 \in \mathbb{R}^n$; select an integer $k$ satisfying $1 < k < 10$; select a $\beta \in (0.5, 0.8)$; set $i = j = 0$. If $\nabla f^0(z_0) = 0$, stop; else, go to step 1.
Step 1. Set $H_0 = I$ (the $n \times n$ identity matrix), set $g_0 = \nabla f^0(z_0)$.
Step 2. Set
9    $h_i = -H_i g_i$.
Step 3. If $(i/k) = 0$ modulo $k$, set $j = j + 1$, $k = 3k$, and go to step 4; else, go to step 4 (see comment after step 3 in (1)).
Step 4. Set $q = 0$; set $z = z_i$; set $h = h_i$; and define $\theta(\cdot)$ as in (2).
Step 5. Set $x = 0$.
Step 6. Compute $\theta'(x)$ according to (3).
Step 7. If $\theta'(x) = 0$, set $\lambda_i = x$ and go to step 12; else, go to step 8.
Step 8. Set $\lambda = 1$.
Step 9. Compute $\Delta$ according to (4).
Step 10. If $\Delta \le 0$, set $q = q + 1$ and go to step 11; else, set $\lambda = \beta\lambda$ and go to step 9.
Step 11. If $q < j$, set $x = x - \lambda\theta'(x)$ and go to step 6; else, set $\lambda_i = x - \lambda\theta'(x)$, and go to step 12.
Step 12. Compute $\nabla f^0(z_i + \lambda_i h_i)$.
Step 13. If $\nabla f^0(z_i + \lambda_i h_i) = 0$, stop; else, set
...
set $i = i + 1$, and go to step 2.
C.5 Penalty Function Methods
There seems to be a feeling that penalty function methods cannot be used for determining a point satisfying optimality conditions with great accuracy, i.e., it is felt that they tend to suffer from numerical errors. As a result, they tend to become slow towards the end. Generally, exterior penalty function methods are considered to perform better than interior penalty function methods, as can be seen even in the very simple example in Fig. 6. However,
[Figure 6: plot against iterations. Legend: modified exterior penalty function method; modified interior penalty function method.]
Fig. 6. A comparison of the modified exterior penalty function method (4.1.81) with the modified interior penalty function method (4.1.95). For both algorithms the choice of parameters was E = 1, 01 = 0.4, p = 1, = 2. The problem: minimize exp[(zl))" 5(z2))"] (zl))" ~ O ( Z ~subject )~, to (zl) 2(za))"- 1 < 0, (z'))" + (2)"))" - 42' 1 < 0, (2'))" (2))" - z1 - 2)" < 0 ; 201 = 0.95, 20)" = 0.10.
a study by Lootsma [L3] indicates that mixed penalty function methods converge more frequently than either exterior or interior penalty function methods. Because of this, we recommend the mixed penalty function method given below, which is a cross between algorithms (4.1.81) and (4.1.95). We recommend this algorithm on the basis of the theoretical results in Section 4.1, which show that it can only converge to points satisfying the Kuhn-Tucker optimality conditions. For heuristically based implementations, see [F2].
1 Algorithm (recommended mixed penalty function method).
Comment. Solves the problem, $\min\{f^0(z) \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m,\ r^j(z) = 0,\ j = 1, 2, \ldots, l\}$, where the $f^i(\cdot)$ and $r^j(\cdot)$ are continuously differentiable. Requires the assumption that the set $C = \{z \in \mathbb{R}^n \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m,\ r^j(z) = 0,\ j = 1, 2, \ldots, l\}$ satisfies the Kuhn-Tucker constraint qualification (see [C1]). For relevant theory see Section 4.1.
Step 0. Select a $z_0 \in \mathbb{R}^n$; select $\alpha, \alpha', \alpha'' \in (0, 0.5)$; select a $\beta \in (0.5, 0.8)$.
Step 1. Set $z = z_0$ and set $i = 0$.
Step 2. Define the index sets,
2    $I' = \{j \in \{1, 2, \ldots, m\} \mid f^j(z) \ge 0\}$,
3    $I'' = \{j \in \{1, 2, \ldots, m\} \mid f^j(z) < 0\}$.
Step 3. Define the exterior and interior penalty functions,
4    $p'(z) = \sum_{j=1}^{l} [r^j(z)]^2 + \sum_{j \in I'} [\max\{0, f^j(z)\}]^2$,
5    $p''(z) = \ldots$
Step 4. If $i = 0$, go to step 5; else, go to step 8.
Step 5. Compute $\nabla f^0(z)$, $\nabla p'(z)$, $\nabla p''(z)$.
Step 6. Set $\epsilon' = \|\nabla p'(z)\| / \|\nabla f^0(z)\|$, $\epsilon'' = \|\nabla f^0(z)\| / \|\nabla p''(z)\|$.
Step 7. Select an $\epsilon_0 \in (0.1, 1)$.
Step 8. Compute
$h(z) = -[\nabla f^0(z) + \frac{1}{\epsilon'} \nabla p'(z) + \epsilon'' \nabla p''(z)]$.
Step 9. If $\|h(z)\| > \epsilon_i$, go to step 10; else, set $\epsilon_{i+1} = \alpha\epsilon_i$, $\epsilon' = \alpha'\epsilon'$, $\epsilon'' = \alpha''\epsilon''$, $z_i = z$, $i = i + 1$, and go to step 2.
Comment. We now compute step size as in (2.1).
Step 10. Set $\lambda = 1$.
Step 11. Compute
$\Delta = f^0(z + \lambda h(z)) + \frac{1}{\epsilon'} p'(z + \lambda h(z)) + \epsilon'' p''(z + \lambda h(z)) + \cdots$
Step 12. If $\Delta \le 0$, set $z = z + \lambda h(z)$ and go to step 8; else, set $\lambda = \beta\lambda$ and go to step 11.
A rather common approach to speeding up the convergence of penalty function methods is to use a superlinearly convergent algorithm for solving the sequence of problems. This approach can also be used to speed up algorithm (1). The following algorithm makes use of the Polak-Ribière conjugate gradient method with restart, because we cannot justify convexity assumptions when using penalty functions for equality constraints. (When the conjugate gradient algorithm is restarted every $n + 1$ iterations, its convergence to a stationary point can be established without assuming convexity of the objective function (see (6.3.9)).)
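To make the ingredients of the mixed penalty approach concrete, the Python sketch below builds the index sets $I'$, $I''$ and the exterior penalty (sum of squared equality residuals plus squared violations over $I'$) used in algorithm (1). The interior penalty is taken here to be the inverse barrier $-\sum_{j \in I''} 1/f^j(z)$; this is an assumption of the example, a common choice, and not necessarily the formula used in the text. The combined cost $f^0 + (1/\epsilon')p' + \epsilon'' p''$ is the function whose negative gradient is the direction $h(z)$ of step 8. All function names and the toy problem are hypothetical, and indices are 0-based.

```python
def index_sets(ineqs, z):
    """Return (I1, I2): violated-or-active and strictly satisfied inequality indices."""
    I1 = [j for j, f in enumerate(ineqs) if f(z) >= 0.0]
    I2 = [j for j, f in enumerate(ineqs) if f(z) < 0.0]
    return I1, I2


def exterior_penalty(ineqs, eqs, z):
    """Squared equality residuals plus squared violations over I1."""
    I1, _ = index_sets(ineqs, z)
    return (sum(r(z) ** 2 for r in eqs)
            + sum(max(0.0, ineqs[j](z)) ** 2 for j in I1))


def interior_penalty(ineqs, z):
    """ASSUMPTION: inverse barrier over the strictly satisfied constraints I2."""
    _, I2 = index_sets(ineqs, z)
    return -sum(1.0 / ineqs[j](z) for j in I2)


def penalized_cost(f0, ineqs, eqs, z, eps1, eps2):
    """f0 + (1/eps1) * exterior + eps2 * interior, with eps1, eps2 > 0."""
    return (f0(z)
            + exterior_penalty(ineqs, eqs, z) / eps1
            + eps2 * interior_penalty(ineqs, z))


# Hypothetical toy problem: minimize z1^2 + z2^2
# subject to z1 - 1 <= 0, -z2 <= 0, z1 + z2 - 1 = 0.
f0 = lambda z: z[0] ** 2 + z[1] ** 2
ineqs = [lambda z: z[0] - 1.0, lambda z: -z[1]]
eqs = [lambda z: z[0] + z[1] - 1.0]

z = [2.0, 0.5]
I1, I2 = index_sets(ineqs, z)
value = penalized_cost(f0, ineqs, eqs, z, eps1=0.5, eps2=0.1)
```

At the sample point the first inequality is violated and the second is strictly satisfied, so the first contributes to the exterior term and the second to the interior term. Driving `eps1` and `eps2` toward zero, as step 9 does with the factors $\alpha'$ and $\alpha''$, strengthens the exterior term and weakens the interior one.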
8 Algorithm (recommended mixed penalty function method with conjugate gradient subprocedure).
Comment. Solves the problem, $\min\{f^0(z) \mid f^i(z) \le 0,\ i = 1, 2, \ldots, m,\ r^j(z) = 0,\ j = 1, 2, \ldots, l\}$, where the $f^i(\cdot)$ and the $r^j(\cdot)$ are continuously differentiable. Requires the assumption that the Kuhn-Tucker constraint qualification is satisfied. For relevant theory see Section 4.1.
Step 0. Select a $z_0 \in \mathbb{R}^n$; select $\alpha, \alpha', \alpha'' \in (0, 0.5)$; select a $\beta \in (0.5, 0.8)$.
Step 1. Set $z = z_0$ and set $i = 0$.
Step 2. Define the index sets $I'$, $I''$ as in (2), (3); define the penalty functions $p'(z)$, $p''(z)$ as in (4), (5).
Step 3. If $i = 0$, go to step 4; else, go to step 7.
Step 4. Compute $\nabla f^0(z)$, $\nabla p'(z)$, $\nabla p''(z)$.
Step 5. Set $\epsilon' = \|\nabla p'(z)\| / \|\nabla f^0(z)\|$, $\epsilon'' = \|\nabla f^0(z)\| / \|\nabla p''(z)\|$.
Step 6. Select an $\epsilon_0 \in (0.1, 1)$.
Step 7. Define
9 Comment. We now apply a simplified form of algorithm (4.1) with reinitialization to the minimization of fO. Step Step Step Step
8. 9. 10. 11.
Compute Vfo(z). Set g = h = -Vfo(z); set j Set q = 0. Define 0 : R1 -+ R1 by
e(x) = p ( z
10
= 0.
+ xh) -J ' o ( ~ ) .
Step 12. Set x = 0. Step 13. Compute 11
eyx) = 0, an E" E (0, E'), an a' > 0, an a" > 0, a /3' E (0, 1), a p" E (0.5,O.Q and an integer k satisfying 5 < k < 10. Step 1. Compute a vector z, satisfyingfj(z,) < 0,j = 1, 2,..., m, Az, = 0; set i = 0. Comment. Step 2. Step 3. Step 4. Step 5.
For a method of computing z, , see steps 1-4 of algorithm (1). Set E = E'. Set z = zi. Define the index sets JEA,JENas in (3)-(5). Compute the vector (h,O(z), h,(z)) by solving the problem,
13
minimize
(Vfo(z),h),
subject to
+
< 0, ('CfW,h ) < 0, Ih'I < 1,
( V f W ,h )
14
15 16
Ah
j # 0, j E JEN(z); j
I
=
E
JCA(z);
1 , 2,..., n;
= 0.
set h(z) = h,(z), and go to step 9; else, go Step 6. If h,O(z)< -a", to step 7. Step 7. If E < E", set = E, solve (13)-(16) for E = 0, to obtain (hoo(z), h,(z)), and go to step 8; else, set E = and go to step 4. Step 8. If h,O(z) = 0, stop; else, set E = p';, and go to step 4. Step 9. Compute the smallest integer q such that (10) and (1 1) are satisfied. (/3")q h(z), and set i = i 1. Step 10. Set zi+l = z Step 11. If (i/k) = 0 modulo k , go to step 2; else, go to step 3. PIE,
+
C.7 Methods of Feasible Directions with Quadratic Search
Of the various possibilities that we discussed in Section 4.4, we feel that the algorithm given below should be the most efficient one, because it is the simplest one.
1 Algorithm (recommended method of feasible directions with quadratic search).