(a, c) ≤ x for all x = (y, t) ∈ H. Consider the point z^k = (u^k, s^k) ∈ H \ G chosen in Step 3. Let z̄^k = (u^k, t̄^k), where t̄^k = min{t | h(u^k, t) ≥ 0}, so that z̄^k ∈ H but (u^k, t) ∉ H for every t < t̄^k. If z̄^k ∈ G then z̄^k is feasible and, by the choice of z^k, it follows that z̄^k is an optimal solution. So let z̄^k ∉ G and denote by x^k = (y^k, t^k) the first point of G on the line segment joining z̄^k to (a, c). By removing (x^k, z^k] from the box [(a, c), z^k], we obtain a polyblock with vertices x^{k,1}, …, x^{k,n+1}.

Since in view of (2.10) x^k < z̄^k, we have t^k < s^k; hence x^{k,n+1} ∉ H and will be dropped. Thus only x^{k,1}, …, x^{k,n} remain for consideration, as if we were working in ℝ^n. This discussion suggests that, to solve problem (2.21) by Algorithm PA, Steps 3 and 4 should be modified as follows:
Step 3. If V_k ≠ ∅, select z^k = (u^k, s^k) ∈ argmax{f(x) | x ∈ V_k}. Compute t̄^k = min{t | h(u^k, t) ≥ 0}. If z̄^k := (u^k, t̄^k) ∈ G, terminate: an optimal solution has been obtained. Otherwise, go to Step 4.

Step 4. Compute x^k = z̄^k − λ_k(z̄^k − (a, c)), with λ_k = min{λ | z̄^k − λ(z̄^k − (a, c)) ∈ G}. Determine the new current best feasible solution and the new current best value CBV. Compute the proper vertex set V_{k+1} of P_{k+1} = P_k \ (x^k, b] according to Proposition 7.

In this manner, the algorithm will essentially work in the y-space and can be viewed as a polyblock outer approximation procedure in this space. The above method cannot be easily extended to the general case of (2.19) when t ∈ ℝ^m with m > 1. However, to take advantage of the fact that the objective function depends on a small number of variables, one
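The one-dimensional computation t̄^k = min{t | h(u^k, t) ≥ 0} in the modified Step 3 exploits the monotonicity of h in its last coordinate, so it can be carried out by bisection. The sketch below is an illustration under that assumption; the helper `t_bar` and the toy function h are hypothetical, not from the text.

```python
def t_bar(h, u, lo, hi, tol=1e-10):
    """Smallest t in [lo, hi] with h(u, t) >= 0, assuming h is increasing in t.

    Returns None when no such t exists in the interval."""
    if h(u, lo) >= 0:
        return lo
    if h(u, hi) < 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(u, mid) >= 0:
            hi = mid        # feasible: the minimum lies at or below mid
        else:
            lo = mid        # infeasible: the minimum lies above mid
    return hi

# toy increasing function h(u, t) = u + t - 1, so t_bar = 1 - u
t_star = t_bar(lambda u, t: u + t - 1.0, 0.25, 0.0, 2.0)
print(round(t_star, 6))   # prints 0.75
```

Because h is increasing, a single sign change separates {t | h < 0} from {t | h ≥ 0}, which is exactly what bisection needs.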
Monotonic optimization
Figure 2.1. Inadequate ε-approximate optimal solution.
can use a branch and bound procedure (see Sections 7 and 8 below), with branching performed on the y-space.
6. Successive incumbent transcending algorithm
The ε-approximate optimal solution, as computed by Algorithm PA for MO/A (or Algorithm QA for MO/B) in finitely many iterations, may not be an adequate approximate optimal solution. In fact it may be infeasible, and for a given ε > 0 it may sometimes give an objective function value quite far from the actual optimal value of the problem, as illustrated by the example depicted in Figure 2.1, where x* is almost feasible but not feasible. To overcome this drawback, in this section we propose a finite algorithm for computing a more adequate approximate optimal solution of MO/A.

Assume that {x ∈ [a, b] | g(x) < 0, h(x) ≥ 0} ≠ ∅ (this is a mild assumption that can often be made to hold by shifting a to a' < a). For ε > 0 satisfying {x ∈ [a, b] | g(x) ≤ −ε, h(x) ≥ 0} ≠ ∅, we say that a feasible solution x̄ is essentially ε-optimal if

f(x̄) ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]} − ε.

Clearly an infinite sequence {x̄(ε)}, ε ↘ 0, of essentially ε-optimal solutions will have a cluster point x* which is a nonisolated feasible solution satisfying f(x*) = max{f(x) | x ∈ S*},
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where S* = cl{x | g(x) < 0 ≤ h(x), x ∈ [a, b]}. Note that S* may be a proper subset of the feasible set {x | g(x) ≤ 0 ≤ h(x), x ∈ [a, b]} (which is closed, since g is l.s.c. and h is u.s.c.). Such a nonisolated feasible solution x* is referred to as an essential optimal solution. Basically, the proposed algorithm for finding an essentially ε-optimal solution of MO/A is a procedure for successively solving a sequence of incumbent transcending subproblems of the following form:

(*) Given a real number γ, find a feasible solution with an objective function value exceeding γ, or else prove that no such solution exists.

As will shortly be seen, each of these subproblems reduces to a MO/B problem.
6.1 Incumbent transcending subproblem
For any given γ ∈ ℝ ∪ {−∞} consider the problem

(B/γ)   min{g(x) | f(x) ≥ γ, h(x) ≥ 0, x ∈ [a, b]}.
Since this is a MO/B problem without normal constraint, an ε-optimal solution of it can be found by Algorithm QA in finitely many iterations (Remark 4). Denote the optimal values of MO/A and (B/γ) by max(MO/A) and min(B/γ), respectively.

PROPOSITION 2.8
(i) If min(B/γ) > 0 then any feasible solution x̄ of MO/A such that f(x̄) ≥ γ − ε is an ε-optimal solution of MO/A. Hence, if min(B/γ) > 0 for γ = −∞, then MO/A is infeasible.

(ii) If min(B/γ) < 0 then any feasible solution x̂ of (B/γ) such that g(x̂) < 0 is a feasible solution of MO/A with f(x̂) ≥ γ.

(iii) If min(B/γ) = 0 then any feasible solution x̄ of MO/A such that g(x̄) ≤ −ε and f(x̄) ≥ γ − ε is essentially ε-optimal.
Proof. (i) If min(B/γ) > 0 then, since every feasible solution x of MO/A satisfies g(x) ≤ 0, it cannot be feasible to (B/γ). But h(x) ≥ 0, x ∈ [a, b], hence f(x) < γ. Consequently, max(MO/A) < γ, i.e. f(x̄) ≥ γ − ε > max(MO/A) − ε, and hence x̄ is an ε-optimal solution of MO/A.

(ii) If x̂ ∈ [a, b] is a feasible solution of (B/γ) with g(x̂) < 0, then g(x̂) < 0, f(x̂) ≥ γ, h(x̂) ≥ 0, hence x̂ is a feasible solution of MO/A with f(x̂) ≥ γ.

(iii) If min(B/γ) = 0 then any x ∈ [a, b] such that g(x) ≤ −ε, h(x) ≥ 0, is infeasible to (B/γ), hence must satisfy f(x) < γ. This implies that γ ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]}, so if a feasible solution x̄ of MO/A satisfies g(x̄) ≤ −ε and f(x̄) ≥ γ − ε, then

f(x̄) + ε ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]},

and so x̄ is essentially ε-optimal. □
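On a one-dimensional toy instance the trichotomy of Proposition 2.8 can be observed directly. The brute-force routine below is only a grid search (not Algorithm QA), and the functions f, g, h are illustrative assumptions: it estimates min(B/γ) for two values of γ and exhibits cases (ii) and (i).

```python
def min_B_gamma(f, g, h, a, b, gamma, n=20001):
    """Brute-force estimate of min{g(x): f(x) >= gamma, h(x) >= 0, x in [a, b]}."""
    best = float("inf")
    for i in range(n):
        x = a + (b - a) * i / (n - 1)
        if f(x) >= gamma and h(x) >= 0:
            best = min(best, g(x))
    return best

f = lambda x: x          # increasing objective
g = lambda x: x - 1.0    # normal constraint: MO/A-feasible means g(x) <= 0
h = lambda x: x          # reverse constraint; h(x) >= 0 holds on all of [0, 2]

m_low = min_B_gamma(f, g, h, 0.0, 2.0, gamma=0.5)    # case (ii): a better feasible point exists
m_high = min_B_gamma(f, g, h, 0.0, 2.0, gamma=1.5)   # case (i): the target gamma is unreachable
print(m_low, m_high)   # prints -0.5 0.5
```

A negative value signals that the incumbent can be transcended; a positive value certifies that no feasible solution reaches the level γ.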
6.2 Successive incumbent transcending algorithm for MO/A
Proposition 2.8 can be used to devise a successive incumbent transcending procedure for finding an essentially ε-optimal solution of MO/A. Before stating the algorithm we need some definitions. A box [p, q] ⊂ [a, b] can be replaced by a smaller one without losing any x ∈ [p, q] satisfying g(x) ≤ 0, f(x) ≥ γ, h(x) ≥ 0, i.e. without losing any x ∈ [p, q] satisfying

g(x) ≤ 0 ≤ h_γ(x) := min{f(x) − γ, h(x)}.

This reduced box red_γ[p, q], called a valid reduction of [p, q], is defined by red_γ[p, q] = [p', q'], where

q' = p + Σ_{i=1}^n α_i (q_i − p_i) e^i,   α_i = max{α ∈ [0, 1] | g(p + α(q_i − p_i) e^i) ≤ 0},
p' = q' − Σ_{i=1}^n β_i (q'_i − p_i) e^i,   β_i = max{β ∈ [0, 1] | h_γ(q' − β(q'_i − p_i) e^i) ≥ 0},

with e^1, …, e^n denoting the unit vectors of ℝ^n.
As in Section 3, it can easily be proved that the box red_γ[p, q] still contains all feasible solutions x ∈ [p, q] of (B/γ) with g(x) ≤ 0. For any given copolyblock Q with proper vertex set V, denote by red_γ Q the copolyblock whose vertex set is obtained from V by deleting all z ∈ V satisfying red_γ[z, b] = ∅ and replacing every other z ∈ V with the lowest corner z' of the box red_γ[z, b]. Also, for any z ∈ [a, b] denote by π_γ(z) the first point where the line segment joining z to b intersects the surface h_γ(x) := min{f(x) − γ, h(x)} = 0.

If g(z^k) > 0, the inclusion {x | g(x) ≤ 0, f(x) ≥ γ} ⊂ Q_k shows that min(B/γ) > 0, and so, by Proposition 2.8, x̄ is an ε-optimal solution of MO/A. If −ε < g(z^k) ≤ 0, then x̄ is a feasible solution of MO/A with f(x̄) = γ − ε, while min{g(x) | f(x) ≥ γ, h(x) ≥ 0, x ∈ [a, b]} > −ε; hence f(x) < γ = f(x̄) + ε for all x ∈ [a, b] satisfying g(x) ≤ −ε, h(x) ≥ 0. This means that x̄ is essentially ε-optimal. It remains to show that the algorithm is finite. Since at every occurrence of Step 3 the current best value f(x̄) improves by at least ε > 0 while it is bounded above by f(b), Step 3 cannot occur infinitely many times. Therefore, there is k₀ such that for all k ≥ k₀ we have g(x^k) > 0 and also g(z^k) ≤ 0 < g(x^k). From this moment on the algorithm works exactly as procedure QA for solving the problem (B/γ). Since g(x^k) − g(z^k) → 0 as k → +∞ by the convergence of this procedure, the event −ε < g(z^k) ≤ 0 must occur for sufficiently large k. This completes the proof. □
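Under the stated monotonicity, the two corners of a reduced box can be located by bisections along the box edges. The sketch below is one possible rendering of that idea (the helper names and the particular increasing functions g and h_γ are illustrative assumptions): it first cuts the upper corner using g ≤ 0, then the lower corner using h_γ ≥ 0.

```python
def with_coord(x, i, t):
    y = list(x)
    y[i] = t
    return y

def largest_le(phi, lo, hi, tol=1e-9):
    """Largest t in [lo, hi] with phi(t) <= 0 (phi increasing); None if phi(lo) > 0."""
    if phi(lo) > 0:
        return None
    if phi(hi) <= 0:
        return hi
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        lo, hi = (m, hi) if phi(m) <= 0 else (lo, m)
    return lo

def smallest_ge(phi, lo, hi, tol=1e-9):
    """Smallest t in [lo, hi] with phi(t) >= 0 (phi increasing); None if phi(hi) < 0."""
    if phi(lo) >= 0:
        return lo
    if phi(hi) < 0:
        return None
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        lo, hi = (lo, m) if phi(m) >= 0 else (m, hi)
    return hi

def reduce_box(g, h_gamma, p, q):
    p, q = list(p), list(q)
    for i in range(len(p)):           # shrink the upper corner with g(x) <= 0
        t = largest_le(lambda t: g(with_coord(p, i, t)), p[i], q[i])
        if t is None:
            return None               # the box contains no point with g <= 0
        q[i] = t
    for i in range(len(p)):           # shrink the lower corner with h_gamma(x) >= 0
        t = smallest_ge(lambda t: h_gamma(with_coord(q, i, t)), p[i], q[i])
        if t is None:
            return None
        p[i] = t
    return p, q

g = lambda x: x[0] + 2 * x[1] - 1     # increasing in each coordinate
h = lambda x: x[0] + x[1] - 0.8       # increasing in each coordinate
box = reduce_box(g, h, [0.0, 0.0], [1.0, 1.0])
print(box)   # roughly ([0.3, 0.0], [1.0, 0.5])
```

Each bisection needs only function evaluations, so the reduction is cheap compared to solving the subproblem itself.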
0). Then
The last equation follows from the assumption in (i) by letting k → +∞. (ii) follows from the fact that d* = sup{d(λ): λ ∈ Λ} ≥ d(0) = inf{f(x): x ∈ C}. (iii) and (iv) follow immediately from the definition of duality bounds. □
REMARK 3.1 (a) The condition Σ_{i∈I₁∪I₂} λ_i⁰ g_i(x) > 0 for x ∈ C is fulfilled if, e.g., there is an index j such that {x: x ∈ C, g_j(x) ≤ 0} = ∅. In this case, one can choose λ⁰ = (0, …, 0, λ_j⁰, 0, …, 0) with λ_j⁰ = 1.

(b) The monotonicity property (iii) is useful in the application of duality bounds within a branch and bound scheme, which is considered in the next sections.

(c) Based on Property (iv), one can use redundant constraints to improve duality bounds in some interesting cases; see, e.g., Shor and Stetsyuk (2002).
3 Duality Bound Methods in Global Optimization
2.2 Convex envelopes and convexification bound

We recall the concept of the convex envelope of a nonconvex function, which is a basic tool in the theory and algorithms of nonconvex global optimization; see, e.g., Falk and Hoffman (1976), Horst and Tuy (1996), and Horst et al. (2000).
DEFINITION 3.1 Let C ⊂ ℝⁿ be nonempty convex, and let f: C → ℝ be l.s.c. on C. The function

φ_{C,f}: C → ℝ,   φ_{C,f}(x) := sup{h(x): h: C → ℝ convex, h ≤ f on C}

is said to be the convex envelope of f over C. Notice that it is often convenient to eliminate the set C formally by setting f(x) := +∞ for x ∉ C and replacing φ_{C,f} accordingly by its extension φ_{C,f}: ℝⁿ → ℝ ∪ {∞}. It is well known and easy to see that φ_{C,f} is l.s.c. on C, and hence is representable as the pointwise supremum of the affine minorants of f. Geometrically, φ_{C,f} is the function whose (closed) epigraph (i.e., the set of points on and above its graph) coincides with the convex hull of the epigraph of f. The following basic properties and their proofs can be found, e.g., in Horst and Tuy (1996) and Horst et al. (2000).
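For a concave function on an interval, the convex hull of the epigraph is bounded below by the chord through the endpoint values, so the envelope is simply that chord. The small check below uses the illustrative instance f(x) = −x² on [0, 2] (an assumption, not from the text) and confirms numerically that the chord underestimates f and attains the same minimum, in line with Proposition 3.2(i).

```python
a, b = 0.0, 2.0
f = lambda x: -x * x                                        # concave on [a, b]
phi = lambda x: f(a) + (f(b) - f(a)) * (x - a) / (b - a)    # chord = envelope of a concave f

xs = [a + (b - a) * i / 10000 for i in range(10001)]
assert all(phi(x) <= f(x) + 1e-12 for x in xs)   # phi is a convex underestimator of f
min_f = min(f(x) for x in xs)
min_phi = min(phi(x) for x in xs)
print(min_f, min_phi)   # prints -4.0 -4.0
```

The shared minimum value illustrates why envelope-based bounds are exact in value even when the envelope differs from f almost everywhere.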
PROPOSITION 3.2 Assume that in Definition 3.1, C is compact, let D ⊂ C be convex, and let g: ℝⁿ → ℝ be an affine function. Then

(i) m := min{f(x): x ∈ C} = min{φ_{C,f}(x): x ∈ C},

(ii) {y ∈ C: f(y) = m} ⊂ {y ∈ C: φ_{C,f}(y) = m},

(iii) φ_{C,f}(x) ≤ φ_{D,f}(x) for all x ∈ D,

(iv) φ_{C,f+g} = φ_{C,f} + g.
Notice that the result (ii) can be made more precise: it is easy to see that the set of global minimizers of φ_{C,f} over C is the convex hull of the set of global minimizers of f over C. For each nonconvex optimization problem of the type (3.1) with I₂ = ∅, we can construct the following convexified problem:
Obviously, φ* is also a lower bound of f*. We call φ* a convexification bound.
2.3 Relationship between duality bounds and convexification bounds
A very nice property of duality bounds is that in many interesting cases they are at least as good as convexification bounds. More precisely, this property is shown in the following.

PROPOSITION 3.3 Assume that in Problem (3.1) I₂ = ∅ and some constraint qualification is fulfilled for the convexified Problem (3.3). Then

(i) d* ≥ φ*;

(ii) d* = φ* for the case where g_i, i ∈ I₁, are all linear.

Proof. (i) Let

L̃(x, λ) := φ_{C,f}(x) + Σ_{i∈I₁} λ_i φ_{C,g_i}(x)

be the Lagrangian of Problem (3.3). Then it follows from the above assumption and the definition of convex envelopes that

d* = sup_{λ≥0} inf_{x∈C} L(x, λ) ≥ sup_{λ≥0} inf_{x∈C} L̃(x, λ) = φ*.

(ii) From Proposition 3.2(iv), it follows that on C it holds

L̃(x, λ) = φ_{C,f}(x) + Σ_{i∈I₁} λ_i g_i(x) = φ_{C,L(·,λ)}(x).

Thus, from Proposition 3.2(i) we have for each λ ≥ 0 that

inf_{x∈C} L(x, λ) = inf_{x∈C} φ_{C,L(·,λ)}(x) = inf_{x∈C} L̃(x, λ),

which implies that d* = φ*. □

In principle, duality bounds can be strictly better than convexification bounds. The following special case serves as an example (see Dür, 2002).
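Part (ii) can be checked numerically on a toy instance with a single linear constraint: minimize f(x) = −x² over C = [0, 1] subject to g(x) = x − 0.5 ≤ 0. The envelope of f over C is the chord φ_{C,f}(x) = −x, and both bounds are estimated on grids. The instance and the grids are illustrative assumptions.

```python
xs = [i / 2000 for i in range(2001)]     # grid on C = [0, 1]
lams = [i / 100 for i in range(201)]     # grid on lambda in [0, 2]

# duality bound: d* = sup_lam inf_x [ -x**2 + lam * (x - 0.5) ]
d_star = max(min(-x * x + lam * (x - 0.5) for x in xs) for lam in lams)

# convexification bound: replace -x**2 by its envelope -x and minimize over g <= 0
phi_star = min(-x for x in xs if x - 0.5 <= 0)

print(d_star, phi_star)   # prints -0.5 -0.5
```

The two bounds coincide, as Proposition 3.3(ii) predicts for linear g_i; with nonlinear g_i the duality bound can be strictly tighter, as in Proposition 3.4.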
PROPOSITION 3.4 Assume that in Problem (3.1) and Problem (3.3) the following conditions are fulfilled: (i) I₂ = ∅ and C is compact; (ii) f is strictly concave on C;
(iii) −g_i, i ∈ I₁, are strictly concave and continuously differentiable on C, and there is x̄ ∈ C such that g_i(x̄) < 0, i ∈ I₁; (iv) f* > φ*; (v) φ_{C,f} is not constant on any line segment contained in C.
Then d* > φ*, i.e. in this case the duality bound is strictly better than the convexification bound.
3. Branch and bound methods using duality bounds

The branch and bound scheme is one of the most promising methods developed for solving multiextremal global optimization problems. The main idea of this scheme consists of two basic operations: successively refined partitioning of the feasible set, and estimation of lower and upper bounds for the optimal value of the objective function over each subset generated by the partitions. In this section, we present a branch and bound scheme using duality bounds for solving global optimization problems of the form

min{f(x): x ∈ C, g_i(x) ≤ 0, i = 1, …, m},    (3.4)

where C is a simple n-dimensional compact convex set as used in the first relaxation step of a branch and bound procedure, e.g., a simplex or a rectangle, and f and g_i (i = 1, …, m) are lower semicontinuous functions on C. The additional assumption on C is made here for the implementability and convergence of the algorithm. Let

L := {x ∈ C: g_i(x) ≤ 0, i = 1, …, m}

denote the feasible set of Problem (3.4).
3.1 Branch and bound scheme
For each set R ⊂ C, we denote by μ(R) the duality bound computed for the problem

min{f(x): x ∈ R, g_i(x) ≤ 0, i = 1, …, m},

and by F(R) a finite set of feasible points of this problem, if any exist.
Branch and bound algorithm

Initialization: Set R¹ = C. Compute μ(R¹) and F(R¹) ⊂ R¹ ∩ L. Set μ₁ = μ(R¹). If F(R¹) ≠ ∅, then compute γ₁ = min{f(x): x ∈ F(R¹)} and choose x¹ such that f(x¹) = γ₁; otherwise, set γ₁ = +∞. If μ₁ = +∞, then set ℛ₁ = ∅; otherwise, set ℛ₁ = {R¹}, k = 1.
Iteration k:

(i) If ℛ_k = ∅, then stop: either Problem (3.4) has no feasible solution or x^k is an optimal solution.

(ii) If ℛ_k ≠ ∅, then perform a partition of R^k obtaining {R^k_1, …, R^k_r}, where the R^k_i, i = 1, …, r, are nonempty n-dimensional sets satisfying ⋃_{i=1}^r R^k_i = R^k and int R^k_i ∩ int R^k_j = ∅ for i ≠ j.

(iii) For each i = 1, …, r compute μ(R^k_i) and F(R^k_i) ⊂ R^k_i ∩ L.

(iv) Set γ_{k+1} = min{γ_k, min{f(x): x ∈ ⋃_{i=1}^r F(R^k_i)}}.

(v) Choose x^{k+1} such that f(x^{k+1}) = γ_{k+1}.

(vi) Set ℛ_{k+1} = (ℛ_k \ {R^k}) ∪ {R^k_i: μ(R^k_i) < γ_{k+1}, i = 1, …, r}.

(vii) If ℛ_{k+1} ≠ ∅, then set μ_{k+1} = min{μ(R): R ∈ ℛ_{k+1}} and choose R^{k+1} ∈ ℛ_{k+1} such that μ(R^{k+1}) = μ_{k+1}; otherwise, set μ_{k+1} = γ_{k+1}. Go to iteration k + 1.
3.2 Convergence

Whenever the above algorithm does not terminate after finitely many iterations, it generates at least one infinite nested sequence of compact partition sets {R^q} such that R^{q+1} ⊂ R^q for all q. We obtain the convergence of the algorithm in the following sense.
THEOREM 3.1 Assume that the algorithm generates an infinite nested sequence {R^q} of partition sets such that lim_{q→∞} R^q = ⋂_q R^q = R*. Then each optimal solution of the problem

min{f(x): x ∈ R*}    (3.5)

is an optimal solution of Problem (3.4).

Proof. For each q let x^q ∈ R^q be such that f(x^q) = min{f(x): x ∈ R^q}. Let x* be an accumulation point of the sequence {x^q}. Then x* ∈ R* and, by passing to a subsequence if necessary, we may assume that x^q → x* as q → ∞. Since R^{q+1} ⊂ R^q for all q and x* ∈ R^q for all q, it follows from the definition of x^q that f(x^q) ≤ f(x^{q+1}) ≤ f(x*). Hence lim_{q→∞} f(x^q) exists and satisfies lim_{q→∞} f(x^q) ≤ f(x*). On the other hand, lower semicontinuity implies lim_{q→∞} f(x^q) ≥ f(x*), so that lim_{q→∞} f(x^q) = f(x*).
it follows from Proposition 3.1(i) and Remark 3.1(a) that μ(R^{k₀}) = +∞, which implies that the partition set R^{k₀} has to be removed from further consideration. This contradiction implies that r* ∈ L. □

Notice that the results of Theorems 3.1 and 3.2 can also be derived from the approach given in Dür (2001).
4. Decomposition method using duality bounds

In this section, we discuss a decomposition method for solving a class of global optimization problems in which the variables can be divided into two groups in such a way that, whenever one group is fixed, all functions involved in the problem have the same structure with regard to the other group, e.g., linear, convex or concave. Based on the decomposition idea of Benders, a corresponding 'master problem' is defined on the space of one of the two variable groups. For solving the resulting master problem, the branch and bound algorithm using duality bounds presented in the previous section is applied. Convergence properties of the algorithm for this problem class are established in the next subsection. A special class of so-called partly convex programming problems is considered thereafter. The results presented in this section originate from Thoai (2002a,b).
4.1 Decomposition branch and bound algorithm

The class of nonconvex global optimization problems to be considered here can be formulated as follows:

min F(x, y)   s.t.   G_i(x, y) ≤ 0 (i = 1, …, m),   x ∈ C,   y ∈ Y,    (3.6)

where C is a compact convex subset of ℝⁿ, Y is a closed convex subset of ℝᵖ, and F and G_i (i = 1, …, m) are continuous functions defined on a suitable set containing C × Y. To apply the branch and bound algorithm presented in the previous section, we also assume in addition that C has a simple structure, e.g., a simplex or a rectangle. We denote by Z the feasible set of Problem (3.6), i.e., Z = {(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), x ∈ C, y ∈ Y}, and assume that Problem (3.6) has an optimal solution. Define a function φ: ℝⁿ → ℝ by

φ(x) := min{F(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), y ∈ Y},    (3.7)

where the minimization problem over y for fixed x is referred to as Problem (3.8),
and agree that φ(x) = +∞ whenever the feasible set of Problem (3.8) is empty. Then Problem (3.6) can be formulated equivalently as

min{φ(x): x ∈ C}.    (3.9)

More precisely, we state the equivalence between Problems (3.6) and (3.9) in the following proposition, whose proof is obvious.

PROPOSITION 3.5 A point (x*, y*) is optimal to Problem (3.6) if and only if x* is an optimal solution of Problem (3.9) and y* is an optimal solution of Problem (3.8) with x = x*.

In view of Proposition 3.5, instead of Problem (3.6) we consider Problem (3.9), which is usually called the 'master problem'. For solving the master problem in ℝⁿ, the branch and bound algorithm presented in Section 3.1 is applied. Notice that partitioning is applied only in the space of the x-variables. For each partition set R ⊂ C, a lower bound μ(R) of the optimal value of the problem

min{φ(x): x ∈ R} = min{F(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), x ∈ R, y ∈ Y}

is obtained by solving the dual problem, i.e.,

μ(R) = sup_{λ≥0} inf{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): x ∈ R, y ∈ Y}.    (3.10)
We now establish some convergence properties of the branch and bound algorithm applied to Problem (3.9). Let C⁰ be the subset of C consisting of all points x ∈ C such that Problem (3.8) has an optimal solution. Note that, since Problem (3.6) is solvable, it follows from Proposition 3.5 that C⁰ ≠ ∅. Further, let M: C⁰ → ℝᵖ be a point-to-set mapping defined by

M(x) = {y ∈ ℝᵖ: G_i(x, y) ≤ 0 (i = 1, …, m), y ∈ Y}.    (3.11)

The following definition, which is introduced based on well-known concepts from convex analysis and parametric optimization (see, e.g., Berge, 1963; Hogan, 1973; Bank et al., 1983), is used for establishing convergence of the algorithm.
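The master-problem viewpoint can be mimicked numerically: evaluate φ(x) by an inner minimization over y, then minimize the resulting function of x alone. The crude version below replaces both the inner solver and the branch and bound master solver by grids; the problem data, grids, and names are illustrative assumptions.

```python
def phi(F, G, x, ys):
    """phi(x) = min{F(x, y): G(x, y) <= 0, y in the grid}; +inf if M(x) is empty."""
    vals = [F(x, y) for y in ys if G(x, y) <= 0]
    return min(vals) if vals else float("inf")

F = lambda x, y: (x - 1) ** 2 + (y - x) ** 2    # jointly continuous objective
G = lambda x, y: y - 0.5                        # single coupling constraint: y <= 0.5

xs = [i / 200 for i in range(401)]              # C = [0, 2]
ys = [i / 200 for i in range(401)]              # Y = [0, 2]
x_best = min(xs, key=lambda x: phi(F, G, x, ys))
print(x_best, phi(F, G, x_best, ys))   # prints 0.75 0.125
```

For x ≤ 0.5 the inner problem returns (x − 1)², while for x > 0.5 the constraint becomes active and φ(x) = (x − 1)² + (x − 0.5)², minimized at x = 0.75; the grid search recovers exactly this master-problem optimum.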
DEFINITION 3.2 (i) We say that the function φ in Problem (3.9) is 'dual-proper' at a point x⁰ ∈ C⁰ if

φ(x⁰) = sup_{λ≥0} inf{F(x⁰, y) + Σ_{i=1}^m λ_i G_i(x⁰, y): y ∈ Y}.

(ii) The function φ is 'upper semicontinuous (u.s.c.)' at x⁰ ∈ C⁰ if for each sequence {x^q} ⊂ C⁰ with lim_{q→∞} x^q = x⁰ the inequality lim sup_{q→∞} φ(x^q) ≤ φ(x⁰) holds.

(iii) A point-to-set mapping M: C⁰ → ℝᵖ is called 'lower semicontinuous according to Berge (l.s.c.B.)' at x⁰ ∈ C⁰ if for each open set Ω satisfying Ω ∩ M(x⁰) ≠ ∅ there exists an open ball U around x⁰ such that Ω ∩ M(x) ≠ ∅ for all x ∈ U ∩ C⁰.
THEOREM 3.3 Assume that the decomposition branch and bound algorithm generates an infinite nested sequence {R^q} of partition sets such that

(i) R^{q+1} ⊂ R^q for all q and lim_{q→∞} R^q = ⋂_q R^q = {r*} ⊂ C⁰,

(ii) the function φ is dual-proper at r*,

(iii) there exists q₀ satisfying μ(R^{q₀}) > −∞, and

(iv) there exists a compact set Y⁰ ⊂ Y such that for each λ ≥ 0 and each x ∈ C, the set of optimal solutions of the problem min{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): y ∈ Y}, if it exists, has a nonempty subset in Y⁰.

Then r* is an optimal solution of Problem (3.9), i.e., the point r*, together with each optimal solution of the Subproblem (3.8) with x = r*, is an optimal solution of Problem (3.6).

Proof. From Assumption (iii) and the monotonicity property of duality bounds, it follows that μ(R^q) > −∞ for q ≥ q₀. For each q ≥ q₀, let

w_q(λ) := inf{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): x ∈ R^q, y ∈ Y},

and let λ^q be an optimal solution of the problem max{w_q(λ): λ ≥ 0}, i.e., μ(R^q) = w_q(λ^q). Moreover, let

w*(λ) := inf{F(r*, y) + Σ_{i=1}^m λ_i G_i(r*, y): y ∈ Y}.
First, we show that w*(λ) = sup_q w_q(λ) for each λ. By definition, it is obvious that w*(λ) ≥ sup_q w_q(λ). On the other hand, for each q, let x^q ∈ R^q, y^q ∈ Y⁰ ⊂ Y be such that

w_q(λ) = F(x^q, y^q) + Σ_{i=1}^m λ_i G_i(x^q, y^q).

Then lim_{q→∞}(x^q, y^q) = (r*, y*), where r* ∈ C⁰ as in Assumption (i) and y* ∈ Y⁰ ⊂ Y by Assumption (iv). This implies that

sup_q w_q(λ) = lim_{q→∞} w_q(λ) = F(r*, y*) + Σ_{i=1}^m λ_i G_i(r*, y*) ≥ w*(λ).

Thus, we have w*(λ) = sup_q w_q(λ). Since the sequence {μ(R^q)} of lower bounds is nondecreasing and bounded by the optimal value of Problem (3.6), its limit μ* exists, and we have

μ* = lim_{q→∞} μ(R^q) = lim_{q→∞} w_q(λ^q) = lim_{q→∞} max_{λ≥0} w_q(λ) = sup_q max_{λ≥0} w_q(λ) = max_{λ≥0} sup_q w_q(λ) = max_{λ≥0} w*(λ).

Since φ is dual-proper at r* (Assumption (ii)), it follows that μ* = φ(r*), which implies that r* is an optimal solution of Problem (3.9), and hence the point r*, together with each optimal solution of the Subproblem (3.8) with x = r*, forms an optimal solution of Problem (3.6). □

REMARK 3.2 If Y is a compact set, then Conditions (iii) and (iv) in Theorem 3.3 can obviously be removed.
THEOREM 3.4 Let the assumptions of Theorem 3.3 be fulfilled. Further, assume that throughout the algorithm one has F(R^q) ≠ ∅ for each q, and that the function φ is upper semicontinuous at r*. Then each accumulation point of the sequence {x^q} generated by the algorithm at which φ is upper semicontinuous is an optimal solution of Problem (3.9).

Proof. Let x* be an accumulation point of {x^q} (note that accumulation points exist because of the compactness of C). By passing to a subsequence if necessary, we assume that lim_{q→∞} x^q = x*. Since

lim_{q→∞} R^q = ⋂_q R^q = {r*} ⊂ C⁰

and φ is upper semicontinuous at r*, it follows that for each q there is a point r^q ∈ R^q such that φ(r^q) < +∞ and lim sup_{q→∞} φ(r^q) ≤ φ(r*). Since {φ(x^q)} is nonincreasing and bounded by the optimal value of Problem (3.6), and φ(x^q) ≤ φ(r^q) for each q, it follows from the upper semicontinuity of φ at x* that
∇f(x*) + λ₁∇g₁(x*) + λ₂∇g₂(x*) = 0,
4 General Quadratic Programming
λ₁g₁(x*) = λ₂g₂(x*) = 0,
A + λ₁Q₁ + λ₂Q₂ is positive semidefinite.
The above result can be extended to the case of more than two quadratic constraints. As an example, consider the following problem of minimizing a quadratic function over the intersection of ellipsoids:

min f(x) = ⟨Ax, x⟩ + 2⟨b, x⟩
s.t. g_i(x) = ⟨Q_i(x − a_i), (x − a_i)⟩ − r_i² ≤ 0,   i = 1, …, m,    (4.8)

where A is a symmetric matrix, Q_i (i = 1, …, m) are symmetric positive semidefinite matrices, and a_i ∈ ℝⁿ, r_i > 0 (i = 1, …, m) are given vectors and numbers, respectively. Obviously, each set {x: g_i(x) ≤ 0} is an ellipsoid with center a_i and radius r_i.
THEOREM 4.5 (CF. FLIPPO AND JANSEN, 1996) A feasible point x* of problem (4.8) is a global optimal solution if there exist multipliers λ_i ≥ 0, i = 1, …, m, such that

∇f(x*) + Σ_{i=1}^m λ_i ∇g_i(x*) = 0,   λ_i g_i(x*) = 0 (i = 1, …, m),

and A + Σ_{i=1}^m λ_i Q_i is positive semidefinite.
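In one dimension (n = m = 1, so all matrices are scalars) the sufficient condition is easy to verify both by hand and by machine. The instance below, with A = 1, b = 0, Q₁ = 1, a₁ = 2, r₁ = 1, is an illustrative assumption, not an example from the text.

```python
f = lambda x: x * x                  # <Ax, x> + 2<b, x> with A = 1, b = 0
g = lambda x: (x - 2) ** 2 - 1       # <Q1(x - a1), x - a1> - r1**2; feasible set is [1, 3]

x_star, lam = 1.0, 1.0
stationarity = 2 * x_star + lam * 2 * (x_star - 2)   # grad f + lam * grad g1 at x*
complementarity = lam * g(x_star)                    # lam * g1(x*)
psd = 1 + lam * 1                                    # A + lam * Q1, a 1x1 "matrix"
print(stationarity, complementarity, psd >= 0)       # prints 0.0 0.0 True

# brute-force confirmation of global optimality over the feasible set [1, 3]
feasible = [1 + 2 * i / 10000 for i in range(10001)]
assert all(f(x) >= f(x_star) - 1e-12 for x in feasible)
```

All three conditions hold with λ = 1, and the grid check confirms that x* = 1 is indeed globally optimal over the feasible ellipsoid (here, the interval [1, 3]).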
We now discuss necessary conditions. Let x* be a global optimal solution of problem (4.7). If no constraint is active at x*, i.e., g₁(x*) < 0, g₂(x*) < 0, then it is well known that ∇f(x*) = 0 and A is positive semidefinite. If only one constraint is active at x*, say g₁(x*) = 0, g₂(x*) < 0, and ∇g₁(x*) ≠ 0, then it follows from (local) second order necessary conditions that there exists λ₁ ≥ 0 such that ∇f(x*) + λ₁∇g₁(x*) = 0 and A + λ₁Q₁ has at most one negative eigenvalue. For the case that both constraints are active at x*, i.e. g₁(x*) = g₂(x*) = 0, necessary conditions are established based on the behavior of the gradients of g₁ and g₂ at x*.
THEOREM 4.6 (CF. PENG AND YUAN, 1997) Let x* be a global optimal solution of problem (4.7).

(a) If ∇g₁(x*) and ∇g₂(x*) are linearly independent, then there exist λ₁ ≥ 0, λ₂ ≥ 0 such that

(i) ∇f(x*) + λ₁∇g₁(x*) + λ₂∇g₂(x*) = 0 and
(ii) A + λ₁Q₁ + λ₂Q₂ has at most one negative eigenvalue.

(b) If ∇g₁(x*) = α∇g₂(x*) ≠ 0 for some α > 0, then there exist λ₁ ≥ 0, λ₂ ≥ 0 such that (i) holds and

(iii) A + λ₁Q₁ + λ₂Q₂ is positive semidefinite.
To get necessary and sufficient conditions for the global optimality of problem (4.7), one has to make some additional assumptions. In Hiriart-Urruty (2001), the following special case (4.9) of problem (4.7) is considered, in which f is convex nonlinear, g₁ and g₂ are convex, and there is x⁰ ∈ ℝⁿ such that g₁(x⁰) < 0, g₂(x⁰) < 0 (Slater's condition). Based on Condition (4.6), necessary and sufficient conditions for global optimality are obtained for problem (4.9). These results are extensions of Theorem 4.3.

THEOREM 4.7 (CF. HIRIART-URRUTY, 2001) (a) A point x* with g₁(x*) = g₂(x*) = 0 is a global optimal solution of problem (4.9) if and only if there exist λ₁ ≥ 0, λ₂ ≥ 0 such that

∀d ∈ Int T(C, x*),

where C is the convex feasible set of problem (4.9) and T(C, x*) is the tangent cone to C at x*.

(b) A point x* with g₁(x*) = 0, g₂(x*) < 0 is a global optimal solution of problem (4.9) if and only if there exists λ₁ ≥ 0 such that

Ax* + b = λ₁(Q₁x* + q₁),   ∀d ∈ Int T(C, x*).
Another special case of problem (4.7) is considered in Stern and Wolkowicz (1995). It is the problem

min f(x) = ⟨Ax, x⟩ − 2⟨b, x⟩
s.t. −∞ ≤ β ≤ ⟨Qx, x⟩ ≤ α ≤ +∞,    (4.10)

where A and Q are symmetric matrices.

THEOREM 4.8 (CF. STERN AND WOLKOWICZ, 1995) Let x* be a feasible point of problem (4.10) and assume that the following 'constraint qualification' holds at x*: Qx* = 0 implies β < 0 < α. Then x* is a global optimal solution if and only if there exists λ ∈ ℝ such that

(A − λQ)x* = b,
A − λQ is positive semidefinite,
λ(β − ⟨Qx*, x*⟩) ≥ 0 ≥ λ(⟨Qx*, x*⟩ − α).
2.2 Duality
General quadratic programming is a rare class of nonconvex optimization problems in which one can construct primal-dual problem pairs without any duality gap. A typical example of this is problem (4.10). For λ₁ ≥ 0, λ₂ ≥ 0 define the Lagrangian

L(x, λ₁, λ₂) = f(x) + λ₁(β − ⟨Qx, x⟩) + λ₂(⟨Qx, x⟩ − α)

of problem (4.10) and the dual function

d(λ₁, λ₂) = inf{L(x, λ₁, λ₂): x ∈ ℝⁿ}.

Then the Lagrangian dual problem of (4.10) is defined as the problem

sup{d(λ₁, λ₂): λ₁ ≥ 0, λ₂ ≥ 0}.    (4.11)

THEOREM 4.9 (CF. STERN AND WOLKOWICZ, 1995) Assume that problem (4.10) has an optimal solution. Then strong duality holds for the problem pair (4.10)–(4.11), i.e., f* = d*, where f* and d* denote the optimal values of problem (4.10) and problem (4.11), respectively.
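For a scalar instance of (4.10) the dual function can be written down in closed form and the absence of a duality gap checked numerically. Take A = −1, b = 0, Q = 1, β = 0, α = 1, i.e. min −x² subject to 0 ≤ x² ≤ 1; all data here are illustrative assumptions.

```python
def d(l1, l2):
    """Dual function of min -x**2 s.t. 0 <= x**2 <= 1 at multipliers l1, l2 >= 0."""
    a = -1.0 - l1 + l2          # coefficient of x**2 in the Lagrangian L(x, l1, l2)
    if a < 0:
        return float("-inf")    # Lagrangian unbounded below in x
    return l1 * 0.0 - l2 * 1.0  # inf over x attained at x = 0 since b = 0

xs = [i / 1000 - 1 for i in range(2001)]    # grid on [-1, 1]; every point is feasible
primal = min(-x * x for x in xs)
grid = [i / 100 for i in range(301)]        # multiplier grid, 0 .. 3
dual = max(d(l1, l2) for l1 in grid for l2 in grid)
print(primal, dual)   # prints -1.0 -1.0
```

The dual is finite only when λ₂ ≥ 1 + λ₁ (so that the quadratic coefficient is nonnegative), where it equals −λ₂; its supremum −1 is attained at (λ₁, λ₂) = (0, 1) and matches the primal optimal value, as Theorem 4.9 asserts.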
Two interesting special cases of problem (4.10) are, respectively, the problems with Q = I (the unit matrix), β < 0 < α, and with Q = I, β = α > 0. Notice that the problem of minimizing a quadratic function over an ellipsoid, the constrained eigenvalue problem and the quadratically constrained least squares problem can be converted into these special cases (cf. Pham and Le Thi, 1995; Flippo and Jansen, 1996). There is another way to construct a dual problem of (4.10). For λ₁ ≥ 0, λ₂ ≥ 0 such that the matrix (A − λ₁Q + λ₂Q) is regular, define the quadratic dual function h(λ₁, λ₂) as the unconstrained minimum of the Lagrangian L(x, λ₁, λ₂) over x ∈ ℝⁿ. Then the optimization problem

sup h(λ₁, λ₂)   s.t.   λ₁ ≥ 0, λ₂ ≥ 0, (A − λ₁Q + λ₂Q) is positive definite    (4.12)

can be considered as a dual of problem (4.10), and we have the following.
THEOREM 4.10 (CF. STERN AND WOLKOWICZ, 1995) Assume that problem (4.10) has an optimal solution and that there is λ ∈ ℝ such that the matrix A − λQ is positive definite. Then strong duality holds for the problem pair (4.10)–(4.12), i.e., f* = h*, where f* and h* denote the optimal values of problem (4.10) and problem (4.12), respectively.
3. Solution methods

The general quadratic programming problem is NP-hard (cf. Sahni, 1974; Pardalos and Vavasis, 1991). In this section, we present some main solution methods for the global optimization of this NP-hard problem. In general, these methods are developed based on three basic concepts which are successfully used in global optimization. We describe these concepts briefly before presenting different techniques for their realization in general quadratic programming. For details of the three basic concepts, see, e.g., Horst and Tuy (1996), Horst and Pardalos (1995), and Horst et al. (2000, 1991). It is worth noting that most techniques to be presented here can also be applied to integer and mixed integer quadratic programming problems (which do not belong to the subject of this overview).
3.1 Basic concepts

Outer approximation (OA). To establish this concept, we consider the problem of minimizing a linear function ⟨c, x⟩ over a closed subset F ⊂ ℝⁿ. This problem can be replaced by the problem of finding an extreme optimal solution of the problem min{⟨c, x⟩: x ∈ F̄}, where F̄ denotes the convex hull of F. Let C₁ be any closed convex set containing F and assume that x¹ is an optimal solution of the problem min{⟨c, x⟩: x ∈ C₁}. Then x¹ is also an optimal solution of the original problem whenever x¹ ∈ F. The basic idea of the outer approximation concept is to construct iteratively a sequence of convex subsets {C_k}, k = 1, 2, …, such that C₁ ⊃ C₂ ⊃ ⋯ ⊃ F, and a corresponding sequence {x^k} such that, for each k, x^k is an optimal solution of the relaxed problem min{⟨c, x⟩: x ∈ C_k}. This process is performed until some x^k ∈ F is found. An OA procedure is convergent if x^k → x* ∈ F for k → +∞.

Branch and bound scheme (BB). The BB scheme is developed for the global optimization of the problem f* = min{f(x): x ∈ F} with f a continuous function and F a compact subset of ℝⁿ. It begins with a convex compact set C₁ ⊃ F and proceeds as follows. Compute a lower bound μ₁ and an upper bound γ₁ for the optimal value of the problem min{f(x): x ∈ C₁ ∩ F} (γ₁ = f(x¹) if some feasible solution x¹ ∈ F is found; otherwise, γ₁ = +∞). At iteration k ≥ 1, if +∞ > μ_k ≥ γ_k or μ_k = +∞, then stop (in the first case, x^k with f(x^k) = γ_k is an optimal solution; in the second case, the underlying problem has no feasible solution). Otherwise, divide C_k into finitely many convex sets C_{k1}, …, C_{kr} satisfying ⋃_{i=1}^r C_{ki} = C_k and int C_{ki} ∩ int C_{kj} = ∅ for i ≠ j (the sets C_{ki} are called 'partition sets'). Compute for each partition set a lower bound and an upper bound. Update the lower bound by choosing the minimum of the lower bounds over all existing partition sets, and update the upper bound by using the feasible points found so far. Delete all partition sets whose lower bounds are greater than or equal to the current upper bound. If not all partition sets are deleted, let C_{k+1} be a partition set with the minimum lower bound, and go to iteration k + 1. A BB algorithm is convergent if γ_k ↘ f* and/or μ_k ↗ f* for k → +∞.

Combination of BB and OA. In many situations, using the BB scheme in combination with an OA procedure for bounding can lead to efficient algorithms. Such a combination is called a branch and cut algorithm if an OA procedure using convex polyhedral subsets C_k, k ≥ 1, is applied.
3.2 Reformulation-linearization techniques
Consider quadratic programming problems of the form

min f(x) = ⟨c, x⟩
s.t. g_i(x) ≤ 0,   i = 1, …, l,    (4.13)
⟨a_i, x⟩ − b_i ≤ 0,   i = 1, …, m,

where c ∈ ℝⁿ, a_i ∈ ℝⁿ, b_i ∈ ℝ for all i = 1, …, m, and for each i = 1, …, l the quadratic function g_i is given by

g_i(x) = d_i + Σ_{k=1}^n q_k^i x_k + Σ_{k=1}^n Q_{kk}^i x_k² + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl}^i x_k x_l,    (4.14)

with d_i, q_k^i, Q_{kl}^i being given real numbers for all i, k, l. It is assumed that the polyhedral set X = {x ∈ ℝⁿ: ⟨a_i, x⟩ − b_i ≤ 0, i = 1, …, m} is bounded and contained in ℝⁿ₊ = {x ∈ ℝⁿ: x ≥ 0}. The first linear relaxation of problem (4.13) is performed as follows. For each quadratic function of the form

q(x) = d + Σ_{k=1}^n q_k x_k + Σ_{k=1}^n Q_{kk} x_k² + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl} x_k x_l,    (4.15)

define additional variables

v_k = x_k²,   k = 1, …, n,   and   w_{kl} = x_k x_l,   k = 1, …, n − 1;  l = k + 1, …, n.
From (4.15), one obtains the following linear function in the variables x, v, w:

[q(x)]_ℓ = d + Σ_{k=1}^n q_k x_k + Σ_{k=1}^n Q_{kk} v_k + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl} w_{kl}.    (4.16)

The linear program (in variables x, v and w)

min f(x) = ⟨c, x⟩
s.t. [g_i(x)]_ℓ ≤ 0,   i = 1, …, l,    (4.17)
[(b_i − ⟨a_i, x⟩)(b_j − ⟨a_j, x⟩)]_ℓ ≥ 0,   ∀ 1 ≤ i ≤ j ≤ m,

is then a linear relaxation of (4.13) in the following sense (cf. Sherali and Tuncbilek, 1995; Audet et al., 2000): let f* and f̃ be the optimal values of problems (4.13) and (4.17), respectively, and let (x̃, ṽ, w̃) be an optimal solution of (4.17). Then
4 General Quadratic Programming
117
(a) f * 2 f and (b) if @k = 3; V k = 1 , . . . ,n, zEkl = % k 3 l V k = 1 , . . . ,n 1,. . . ,n , then Z is an optimal solution of (4.13).
-
1; 1 = k
+
Geometrically, the convex hull of the (nonconvex) feasible set of problem (4.13) is relaxed by the projection of the polyhedral feasible set of problem (4.17) on Rn. As well-known, this projection is polyhedral. In the case that the condition in (b) is not fulfilled, i.e., either flk # 3; for at least one index k or zEkl # 3kZl for at least one index pair ( k ,l), a family of linear inequalities have to be added to problem (4.17) to cut zE) off from the feasible set of (4.17) without cutting off the point (3,@, any feasible point of (4.13). To this purpose, several kinds of cuts are discussed in connection with branch and bound procedures. Resulting branch and cut algorithms can be found, e.g., in Al-Khayyal and Falk (1983) and Audet et al. (2000).
3.3 Lift-and-project techniques

The first ideas of lift-and-project techniques were proposed by Sherali and Adams (1990) and Lovász and Schrijver (1991) for zero-one optimization. These basic ideas can be applied to quadratic programming as follows. The quadratic programming problem to be considered is given in the form

min f(x) = (c, x)
s.t. g_i(x) ≤ 0, i = 1, ..., m,        (4.18)
x ∈ C,
where C is a compact convex subset of R^n, c ∈ R^n, and each function g_i is given by

g_i(x) = (Q_i x, x) + 2(q_i, x) + d_i        (4.19)

with Q_i an n × n symmetric matrix, q_i ∈ R^n, and d_i ∈ R. To each vector x = (x_1, ..., x_n)^T ∈ R^n, the symmetric matrix X = x x^T ∈ R^{n×n} with elements X_ij = x_i x_j (i, j = 1, ..., n) is assigned. Let S^n be the set of n × n symmetric matrices. Then each quadratic function

g(x) = (Q x, x) + 2(q, x) + d

on R^n is lifted to a linear function on R^n × S^n defined by

(Q, X) + 2(q, x) + d,

where (Q, X) = Σ_{i,j=1}^n Q_ij X_ij stands for the inner product of Q and X ∈ S^n.
Thus, the set {x ∈ R^n : (Q x, x) + 2(q, x) + d ≤ 0} can be approximated by the projection of the set

{(x, X) ∈ R^n × S^n : (Q, X) + 2(q, x) + d ≤ 0}

onto R^n. In this way, the feasible set of problem (4.18) is approximated by the set

{x ∈ R^n : (Q_i, X) + 2(q_i, x) + d_i ≤ 0 for some X ∈ S^n, i = 1, ..., m, x ∈ C},        (4.20)

and problem (4.18) is then relaxed by the problem

min f(x) = (c, x)
s.t. (Q_i, X) + 2(q_i, x) + d_i ≤ 0, i = 1, ..., m,        (4.21)
x ∈ C, X ∈ S^n.
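The lifting step can be verified numerically: on the rank-one matrix X = x xᵀ, the lifted linear function (Q, X) + 2(q, x) + d reproduces the quadratic (4.19), while a general X ∈ Sⁿ makes the constraint linear. The helper names below are our own illustrative sketch.

```python
def quad(Q, q, d, x):
    """g(x) = <Qx, x> + 2<q, x> + d, as in (4.19)."""
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + 2.0 * sum(q[i] * x[i] for i in range(n)) + d)

def lifted(Q, q, d, x, X):
    """The lifted linear function <Q, X> + 2<q, x> + d on R^n x S^n."""
    n = len(x)
    return (sum(Q[i][j] * X[i][j] for i in range(n) for j in range(n))
            + 2.0 * sum(q[i] * x[i] for i in range(n)) + d)

Q = [[2.0, 1.0], [1.0, 3.0]]
q = [0.5, -1.0]
x = [1.0, -2.0]
X = [[xi * xj for xj in x] for xi in x]      # the rank-one matrix x x^T
# On X = x x^T the linear lifting agrees with the quadratic exactly.
assert abs(quad(Q, q, -4.0, x) - lifted(Q, q, -4.0, x, X)) < 1e-12
```

The relaxation in (4.21) arises precisely because X is allowed to range over all of Sⁿ rather than only over the rank-one matrices x xᵀ.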
Next, notice that for each x E Rn, the matrix
is positive semidefinite. 'Therefore, problem (4.18) can also be relaxed by the problem min f (x) = (c, x) s.t. ( Q ~ , X ) + ~ ( ~ ~ , X ) +i~=~l , 0 it is N P - h a r d to find a feasible solution to the linear bzievel programming problem with n o more than E times the optimal value. Related results can also be found in Hansen et al. (1992). In bicriterial optimization problems two objective functions are minimized simultaneously over the feasible set (Pardalos et al., 1995). To formulate them, a vector valued objective function can be used: ((
"min"_x {a(x) : x ∈ X},        (6.11)
6 Bilevel Programming
where X ⊂ R^n and a : X → R². In such problems, a compromise between the two, in general competing, objective functions a_1(x) and a_2(x) is looked for. Roughly speaking, one approach for such problems is to call a point x* ∈ X a solution if it is not possible to improve both objective functions at x* simultaneously. Such points are clearly compromise points. More formally, x* ∈ X is Pareto optimal for problem (6.11) if there is no x ∈ X with a(x*) − a(x) ∈ R²_+ \ {0}.
In this definition, the first orthant R²_+ in R² is used as an ordering cone, i.e., to establish a partial ordering in the space of objective function values of problem (6.11), which is R². In a more general formulation, another ordering cone V ⊂ R² is used. The cone V is assumed to be convex and pointed. Then, x* ∈ X is Pareto optimal for problem (6.11) with respect to the ordering cone V if there is no x ∈ X with a(x*) − a(x) ∈ V \ {0}.
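The Pareto optimality definition above translates directly into a brute-force filter over a finite set of objective vectors; the function below is our own illustrative sketch using the ordering cone R²_+.

```python
def pareto_points(points):
    """Points that are Pareto optimal for componentwise minimization:
    no other point is <= in every component and different (i.e., strictly
    better in at least one component)."""
    def dominates(p, s):
        # p dominates s in the R^2_+ (componentwise) ordering.
        return all(pi <= si for pi, si in zip(p, s)) and p != s
    return [s for s in points if not any(dominates(p, s) for p in points)]

# Example: three non-dominated compromise points survive.
front = pareto_points([(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)])
```

A general convex pointed ordering cone V only changes the dominance test; the enumeration pattern stays the same.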
The relations of bilevel programming to bicriterial optimization have been investigated, e.g., in the papers Fliege and Vicente (2003); Haurie et al. (1990); Marcotte and Savard (1991). On the one hand, using R²_+ as the ordering cone, it is easy to see that at least one feasible point of the bilevel programming problem (6.1), (6.6) is Pareto optimal for the corresponding bicriterial problem. But this, in general, is not true for a (local) optimal solution of the bilevel problem. Hence, attempts to solve the bilevel programming problem via bicriterial optimization with the ordering cone R²_+ will in general not work. On the other hand, Fliege and Vicente (2003) show that bicriterial optimization can indeed be used to prove optimality for the bilevel programming problem. But, for doing so, another, more general ordering cone has to be used. Closely related to bilevel programming problems are also the problems of minimizing a function over the efficient set of some multicriterial optimization problem (see Fülöp, 1993; Muu, 2000). One tool often used to reformulate the optimistic bilevel programming problem as a one-level problem are the Karush-Kuhn-Tucker conditions. If a regularity condition is satisfied for the lower level problem (6.1), then the Karush-Kuhn-Tucker conditions are necessary optimality
conditions. They are also sufficient in the case when (6.1) is a convex optimization problem in the x-variables for fixed parameters y. This suggests to replace problem (6.1), (6.6) by

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0,
∇_x f(x, y) + λ^T ∇_x g(x, y) = 0,        (6.12)
λ ≥ 0, g(x, y) ≤ 0, λ^T g(x, y) = 0.
Problem (6.12) is called an (MPEC), i.e., a mathematical program with equilibrium constraints, in the literature (Luo et al., 1996). The relations between (6.1), (6.4), (6.5) and (6.12) are highlighted in the following theorem.

THEOREM 6.5 (Dempe, 2002) Consider the optimistic bilevel programming problem (6.1), (6.4), (6.5) and assume that, for each fixed y, the lower level problem (6.1) is a convex optimization problem for which (MFCQ) is satisfied at all feasible points. Then, each local optimal solution of the problem (6.1), (6.4), (6.5) corresponds to a local optimal solution of problem (6.12).

This implies that it is possible to solve the optimistic bilevel programming problem via an (MPEC), but only if the lower level problem has a unique optimal solution for all values of the parameter or if it is possible to avoid false stationary points of the (MPEC). The solution of a pessimistic bilevel programming problem via an (MPEC) is not possible. Note that the opposite implication is not true in general. This can be seen in the following example.
EXAMPLE 6.4 Consider the simple optimistic linear bilevel programming problem

min_{x,y} {y : x ∈ Ψ(y), −1 ≤ y ≤ 1},

where Ψ(y) := Argmin_x {xy : 0 ≤ x ≤ 1}, at the point (x, y) = (0, 0). Then

Ψ(y) = [0, 1] if y = 0, {1} if y < 0, {0} if y > 0.

Take 0 < ε < 1 and set W_ε(0, 0) = (−ε, ε) × (−ε, ε). Then,
Since the infimal value of the upper level objective function F(x, y) = y on this set is zero, the point (x, y) = (0, 0) is a local optimal solution of problem (6.12). Due to its definition, the value function of the optimistic problem equals y for every feasible y. Since this function has no local minimum at y = 0, this point is not a local optimistic optimal solution.

The essential reason for the behavior in this example is the lack of lower semicontinuity of the mapping Ψ(y), which makes it reproducible in a more general setting. A first implication of these considerations is that the problems (6.1), (6.4), (6.5) and (6.1), (6.6) are not equivalent if local optimal solutions are considered; a second one is that not all local optimal solutions of problem (6.12) correspond in general to local optimal solutions of problem (6.1), (6.4), (6.5). It should be noted that, under the assumptions of Theorem 6.5, and if the optimal solutions of the lower level problem are strongly stable in the sense of Kojima (1980) (cf. Theorem 6.6 below), then the optimistic bilevel programming problem (6.1), (6.2) is equivalent to the (MPEC) (6.12). The following example from Mirrlees (1999) shows that this result is no longer valid if the convexity assumption is dropped.
EXAMPLE 6.5 Consider the problem

where Ψ(y) is the set of optimal solutions of the following unconstrained optimization problem on the real axis:

Then, the necessary optimality condition for the lower level problem is

y (x + 1) exp{−(x + 1)²} + (x − 1) exp{−(x − 1)²} = 0.
≥ 0 for all r satisfying ∇G_i(y⁰) r ≤ 0, i : G_i(y⁰) = 0.

This property is usually called Bouligand stationarity (or B-stationarity) of the point y⁰.
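The stationarity equation of Example 6.5 reconstructed above can be examined numerically. The residual function below encodes it directly; the crude Newton iteration with a difference-quotient derivative is our own illustrative device, not part of the original text.

```python
import math

def stationarity_residual(x, y):
    """Left-hand side of the lower level optimality condition of Example 6.5:
    y*(x+1)*exp(-(x+1)^2) + (x-1)*exp(-(x-1)^2)."""
    return (y * (x + 1.0) * math.exp(-(x + 1.0) ** 2)
            + (x - 1.0) * math.exp(-(x - 1.0) ** 2))

def newton_root(x0, y, h=1e-7, iters=50):
    """Newton iteration on the residual, with a numerical derivative."""
    x = x0
    for _ in range(iters):
        r = stationarity_residual(x, y)
        dr = (stationarity_residual(x + h, y) - r) / h
        x -= r / dr
    return x
```

For y = 0 the condition is solved by x = 1; for small y > 0 the stationary point moves slightly, which the iteration recovers when started at x = 1.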
4.2 Using the KKT conditions

If the Karush-Kuhn-Tucker conditions are applied to replace the lower level problem by a system of equations and inequalities, problem (6.12) is obtained. Example 6.5 shows that it is possible to obtain necessary optimality conditions for the bilevel programming problem by this approach only in the case when the lower level problem is a convex parametric one, and also only using the optimistic position. But even in this case this is not easy, since the familiar regularity conditions are not satisfied for this problem.
THEOREM 6.9 (Scheel and Scholtes, 2000) For problem (6.12) the Mangasarian-Fromovitz constraint qualification (MFCQ) is violated at every feasible point.

To circumvent the resulting difficulties for the construction of Karush-Kuhn-Tucker type necessary optimality conditions for the bilevel programming problem, in Scheel and Scholtes (2000) a nonsmooth version of the KKT reformulation of the optimistic bilevel programming problem is constructed:

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0,
∇_x L(x, y, λ) = 0,        (6.14)
min{−g(x, y), λ} = 0.
Here, for a, b ∈ R^n, the equation min{a, b} = 0 is understood componentwise. For problem (6.14) the following generalized variant of the linear independence constraint qualification can be defined (Scholtes and Stöhr, 2001):
(PLICQ) The piecewise linear independence constraint qualification is satisfied for problem (6.14) at a point (x⁰, y⁰, λ⁰) if the gradients of all the vanishing components of the constraint functions G(y), ∇_x L(x, y, λ), g(x, y), λ are linearly independent.

Problem (6.14) can be investigated by considering the following patchwork of nonlinear programs for fixed index sets I:

min_{x,y,λ} F(x, y)        (6.15)

Then, the piecewise linear independence constraint qualification is valid for problem (6.14) at some point (x⁰, y⁰, λ⁰) if and only if it is satisfied for each of the problems (6.15) for all index sets I with J(λ⁰) ⊆ I ⊆ I(x⁰, y⁰). The following theorem says that the (PLICQ) is generically satisfied. For this, define the set

𝓡 = {(F, G, f, g) ∈ C^l(R^{m+n}, R^{1+s+1+p}) : (PLICQ) is satisfied at each feasible point of (6.14) with ‖λ‖_∞ ≤ B}

for an arbitrary constant 0 < B < ∞, where ‖λ‖_∞ = max{|λ_i| : 1 ≤ i ≤ p} is the L_∞-norm of a vector λ ∈ R^p and l ≥ 2. Roughly speaking, the zero neighborhood in the (Whitney) C^k topology is indexed by a positive continuous function ε : R^p → R_+ and contains all (vector-valued) functions h ∈ C^k(R^p, R^t) such that each component function, together with all its derivatives up to order k, is bounded by the function ε. For details the interested reader is referred to Hirsch (1994).
Moreover, the set 𝓡 is also dense in the C^k-topology for all 2 ≤ k ≤ l. Now, after this excursion to regularity, the description of necessary optimality conditions for the bilevel programming problem with convex lower level problems using the optimistic position is continued. For the origin of the following theorem for mathematical programs with equilibrium constraints see Scheel and Scholtes (2000). There, a relaxation of problem (6.12) is considered:
min_{x,y,λ,μ} F(x, y)
subject to ∇_x L(x, y, λ, μ) = 0,        (6.16)
G(y) ≤ 0,
In the following theorem, a more restrictive regularity condition than (MFCQ) is needed:

(SMFCQ) The strict Mangasarian-Fromovitz constraint qualification (SMFCQ) is satisfied at x⁰ for problem (6.7) if there exists a Lagrange multiplier (λ, μ) as well as a direction d satisfying

∇P_i(x⁰) d < 0 for each i with P_i(x⁰) = λ_i = 0,
∇P_i(x⁰) d = 0 for each i with λ_i > 0,
∇γ_j(x⁰) d = 0 for each j,

and {∇P_i(x⁰) : λ_i > 0} ∪ {∇γ_j(x⁰) : j = 1, ..., q} are linearly independent.

Note that this condition is implied by (PLICQ).
THEOREM 6.11 Let (x⁰, y⁰, λ⁰) be a local minimizer of problem (6.14) and set z⁰ = (x⁰, y⁰).

If the (MFCQ) is valid for problem (6.16) at (x⁰, y⁰, λ⁰), then there exist multipliers (κ, ω, ζ, ξ) satisfying

∇F(z⁰) + κ^T (0, ∇G(y⁰)) + ω^T ∇(∇_x L(z⁰, λ⁰)) + ζ^T ∇g(z⁰) = 0,
∇_x g(z⁰) ω − ξ = 0,
κ ≥ 0, κ^T G(y⁰) = 0,
ζ_i g_i(z⁰) = 0 ∀i, λ_i ξ_i = 0 ∀i,
ζ_i ξ_i ≥ 0, i ∈ K,

where K = {i : g_i(x⁰, y⁰) = λ_i⁰ = 0} and ∇ denotes the gradient with respect to (x, y).

If the (SMFCQ) is fulfilled for problem (6.16), then there exist unique multipliers (κ, ω, ζ, ξ) solving the last system of equations and inequalities with ζ_i ξ_i ≥ 0, i ∈ K, being replaced by ζ_i ≥ 0 and ξ_i ≥ 0 for all i ∈ K.
For related optimality conditions see, e.g., Flegel and Kanzow (2002, 2003).

5. Solution algorithms

5.1 Implicit function approach
To solve the bilevel programming problem, it is reformulated as a one-level problem. The first approach again uses the implicitly determined solution function x(y) of the convex lower level problem, provided this function is uniquely determined. If the assumptions (C), (MFCQ), (SSOC), and (CRCQ) are satisfied for (6.1) at every point y with G(y) ≤ 0, then the resulting problem
Let I ⊇ J(x(y⁰), y⁰) := {j : g_j(x(y⁰), y⁰) = 0}, and let λ⁰ be a Lagrange multiplier vector of the lower level problem corresponding to the optimal solution x(y⁰) for y = y⁰. If the constraints g_i(x, y) ≤ 0 in problem (6.1) are locally replaced by g_i(x, y) = 0, i ∈ I, the resulting lower level problems are

min_x {f(x, y) : g_i(x, y) = 0 ∀i ∈ I}.        (6.17)

If the gradients {∇_x g_i(x(y⁰), y⁰) : i ∈ I} are moreover linearly independent (which can be guaranteed for small sets I ⊇ J(λ⁰) with λ⁰ being a vertex of Λ(x(y⁰), y⁰)), then the optimal solution function x^I(·) of problem (6.17) is differentiable (Fiacco, 1983). Let 𝓘 denote the family of all index sets determined by the above two demands for all vertices λ⁰ ∈ Λ(x(y⁰), y⁰).
THEOREM 6.12 (Dempe and Pallaschke, 1997) Consider problem (6.1) at the point z⁰ := (x(y⁰), y⁰) and let (MFCQ), (SSOC) as well as (CRCQ) be satisfied there. If the condition

(FRR) For each vertex λ⁰ ∈ Λ(z⁰) the matrix has full row rank n + |I(x(y⁰), y⁰)|

is valid, then the generalized derivative of the function x(·) at the point y = y⁰ in the sense of Clarke (1983) is

∂x(y⁰) = conv ⋃_{I ∈ 𝓘} ∇x^I(y⁰).

Using this formula, a bundle algorithm (cf. Outrata et al., 1998) can be derived to solve problem (6.13). Since the full description of bundle algorithms is rather lengthy, the interested reader is referred, e.g., to Outrata et al. (1998). Repeating the results in Schramm (1989) (cf. also Outrata et al., 1998), the following result is obtained:

THEOREM 6.13 (Dempe, 2002) If the assumptions (C), (MFCQ), (CRCQ), (SSOC), and (FRR) are satisfied for the convex lower level problem (6.1) at all points (x, y), x ∈ Ψ(y), G(y) ≤ 0, and the sequence of iteration points {(x(y^k), y^k, λ^k)}_{k=1}^∞ in the bundle algorithm remains bounded, then this algorithm computes a sequence {(x(y^k), y^k, λ^k)}_{k=1}^∞ having at least one accumulation point (x(y⁰), y⁰, λ⁰) with

If assumption (FRR) is not satisfied, then the point (x(y⁰), y⁰) is pseudostationary in the sense of Mikhalevich et al. (1987). Hence, under suitable assumptions the bundle algorithm computes a Clarke stationary point. Such points are in general not Bouligand stationary.
5.2 A smoothing method

To solve problem (6.12), several authors (e.g., Fukushima and Pang, 1999) use an NCP function approach to replace the complementarity
constraints. This results in the nondifferentiable problem

min_{x,y,λ} F(x, y)        (6.18)
subject to G(y) ≤ 0, ∇_x L(x, y, λ) = 0,
Φ(−g_i(x, y), λ_i) = 0, i = 1, ..., p,

where a function Φ(·, ·) satisfying

Φ(a, b) = 0 ⟺ a ≥ 0, b ≥ 0, ab = 0

is called an NCP function. Examples and properties of NCP functions can be found in the book of Geiger and Kanzow (2003). NCP functions are inherently nondifferentiable, and algorithms solving problem (6.18) use smoothed NCP functions. Fukushima and Pang (1999) use a smoothed NCP function Φ_ε and solve the resulting problems

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0, ∇_x L(x, y, λ) = 0,        (6.19)
Φ_ε(−g_i(x, y), λ_i) = 0, i = 1, ..., p,

for ε → 0 with suitable standard algorithms. Hence, selecting an arbitrary sequence {ε_k}_{k=1}^∞, they compute a sequence {(x^k, y^k, λ^k)}_{k=1}^∞ of solutions and investigate the properties of the accumulation points of this sequence. To formulate their convergence result, the assumption of weak nondegeneracy is needed. To formulate this assumption, consider the Clarke derivative of the function Φ(−g_i(x, y), λ_i). This Clarke derivative exists and is contained in a set C_i(x̄, ȳ, λ̄).
Let the point (x̄, ȳ, λ̄) be an accumulation point of the sequence {(x^k, y^k, λ^k)}_{k=1}^∞. It is then easy to see that, for each i ∈ I(x̄, ȳ) \ J(λ̄), any accumulation point of the corresponding sequence of Clarke derivatives belongs to C_i(x̄, ȳ, λ̄), hence is of the form (ξ_i, λ_i) with (1 − ξ_i)² + (1 − λ_i)² ≤ 1. It is said that the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ is asymptotically weakly nondegenerate if, in this formula, neither ξ_i nor λ_i vanishes for any accumulation point of {(x^k, y^k, λ^k)}_{k=1}^∞. Roughly speaking, this means that both g_i(x^k, y^k) and λ_i^k approach zero in the same order of magnitude (see Fukushima and Pang, 1999).

THEOREM 6.14 (Fukushima and Pang, 1999) Let for each point (x^k, y^k, λ^k) the necessary optimality conditions of second order for problem (6.19) be satisfied. Suppose that the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ converges to some (x̄, ȳ, λ̄) for k → ∞. If the (PLICQ) holds at the limit point and the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ is asymptotically weakly nondegenerate, then (x̄, ȳ, λ̄) is a Bouligand stationary solution of problem (6.12).
5.3 SQP methods

Recently several authors have reported (in view of the violated regularity condition, rather surprisingly) good behavior of SQP methods for solving mathematical programs with equilibrium constraints (see Anitescu, 2002; Fletcher et al., 2002; Fletcher and Leyffer, 2002). To sketch these results, consider a bilevel programming problem (6.6) with a convex parametric lower level problem (6.1) and assume that a regularity assumption is satisfied for each fixed parameter value y with G(y) ≤ 0. Then, by Theorem 6.5, a locally optimal solution of the bilevel programming problem corresponds to a locally optimal solution of problem (6.12). Consequently, in order to compute local minima of the bilevel problem, problem (6.12) can be solved. In doing this, Anitescu (2002) uses the elastic mode approach in a sequential quadratic programming algorithm solving (6.12). This means that if a quadratic programming problem, minimizing a quadratic approximation of the objective function of problem (6.12) subject to a linear approximation of the constraints of this problem, has a feasible solution with bounded Lagrange multipliers, then the solution of this problem is used as a search direction. If not, a regularized quadratic programming problem is used to compute this search direction. For simplicity, this idea is described for problem (6.7). Then the following problem is used to compute the search direction:
subject to P_i(x) + ∇P_i(x) d ≤ 0, ∀i = 1, ..., p,
γ_j(x) + ∇γ_j(x) d = 0, ∀j = 1, ..., q.

Here, W can be the Hessian matrix of the Lagrange function of problem (6.7) or another positive definite matrix approximating this Hessian. If this problem has no feasible solution or unbounded Lagrange multipliers, the solution of problem (6.7) (or, accordingly, the solution process for problem (6.12)) with the sequential quadratic programming approach is replaced by the solution of the following problem by the same approach:
where c is a sufficiently large constant. This is the elastic mode SQP method. To implement the idea of Anitescu (2002), assume that problem (6.16) satisfies the (SMFCQ) and that the quadratic growth condition at a point x = x⁰,

(QGC) There exists α > 0 satisfying F(x) ≥ F(x⁰) + α ‖x − x⁰‖² for all feasible x in some open neighborhood of x⁰,

is valid for problem (6.12) at a locally optimal solution of this problem.
THEOREM 6.15 (Anitescu, 2002) If the above two assumptions are satisfied, then the elastic mode sequential quadratic programming algorithm computes a locally optimal solution of problem (6.12), provided it is started sufficiently close to that solution and the constant c is sufficiently large.

Using stronger assumptions, Fletcher et al. (2002) have even been able to prove local Q-quadratic convergence of sequential quadratic programming algorithms to solutions of (6.12).
6. Discrete bilevel programming

If integer variables appear in the lower or upper level of a bilevel programming problem, the investigation becomes more difficult and the number of references is rather small; see Dempe (2003). With respect to the existence of optimal solutions, the location of the discrete variables is important (Vicente et al., 1996). Most difficult is the situation when the lower level problem is a parametric discrete one and the upper level
problem is a continuous one. Then the graph of the solution set mapping Ψ(·) is in general neither closed nor open. The other cases can be treated more or less analogously to the continuous problems. One way to solve discrete optimization problems (and also bilevel programming problems) is branch-and-bound. If the integrality conditions in both levels are dropped at the beginning and are introduced via the branching procedure, then a global optimal solution of the relaxed problem, which occasionally proves to be feasible for the bilevel problem, is in general not an optimal solution of the bilevel programming problem. Moreover, the usual fathoming procedure is not valid; see Moore and Bard (1990). Fathoming is used in a branch-and-bound algorithm to decide that a node of the enumeration tree need not be explored further. This decision cannot be based on the comparison of the incumbent objective value with the optimal objective function value of the relaxed problem, even if an optimal solution of the latter problem proves to be feasible for the bilevel problem. Mixed-discrete linear bilevel programming problems with continuous lower level problems have been transformed into linear bilevel problems in Audet et al. (1997), which opens a second way of solving such problems. Other solution methods include one using explicitly the solution set mapping of a right-hand side parametrized Boolean knapsack problem in the lower level, and another one using cutting planes in the discrete lower level problem with parameters in the objective function only (see Dempe, 2002). To describe a further approach, consider a linear bilevel programming problem with integer variables in the upper level problem only:
subject to A₁x ≤ b₁, x ≥ 0, integer,        (6.20)

where y solves

Then, an idea of White and Anandalingam (1993) can be used to transform this problem into a mixed discrete optimization problem. For this, apply the Karush-Kuhn-Tucker conditions to the lower level problem.
This transforms problem (6.20) into

subject to A₁x ≤ b₁,
x ≥ 0, integer,        (6.21)
B₂ᵀλ ≥ d₂,

Now use a penalty function approach to get rid of the complementarity constraint, resulting in the problem

subject to A₁x ≤ b₁, x ≥ 0, integer,        (6.22)
A₂x + B₂y = b₂, y ≥ 0, B₂ᵀλ ≥ d₂.

By application of the results in White and Anandalingam (1993), the following is obtained:
THEOREM 6.16 Assume that problem (6.22) has an optimal solution for some positive K₀. Then, problem (6.22) describes an exact penalty function approach for problem (6.20), i.e., there is a number K* such that the optimal solutions of problems (6.22) and (6.20) coincide for all K ≥ K*.

This idea has been used in Dempe and Kalashnikov (2002) to solve an application problem in the gas industry. Moreover, the implications of moving the discreteness condition from the lower to the upper level problem have been touched upon there.
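The exact penalty mechanism of Theorem 6.16 can be illustrated on a toy problem with our own data (unrelated to the specific problems (6.20)-(6.22)): minimize (x−1)² + (y−1)² over x, y ≥ 0 subject to the complementarity condition x·y = 0, with the complementarity moved into the objective as a penalty term K·x·y.

```python
def solve_penalized(K, grid):
    """Brute-force minimum of (x-1)^2 + (y-1)^2 + K*x*y over a grid;
    the complementarity condition x*y = 0 is handled only by the penalty."""
    return min(((x - 1.0) ** 2 + (y - 1.0) ** 2 + K * x * y, x, y)
               for x in grid for y in grid)

grid = [i / 10.0 for i in range(11)]            # {0.0, 0.1, ..., 1.0}
val0, x0, y0 = solve_penalized(0.0, grid)       # K = 0: complementarity ignored
valK, xK, yK = solve_penalized(100.0, grid)     # large K: complementarity enforced
print((x0, y0))   # (1.0, 1.0) -- violates x*y = 0
print((xK, yK))   # an axis point with xK * yK == 0
```

For small K the penalized minimizer violates the complementarity constraint; beyond a finite threshold K* the penalized and constrained minimizers coincide, which is the exact penalty property stated in the theorem.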
7. Conclusion

In this paper a selective survey of results in bilevel programming has been given. It was not the intention of the author to give a detailed description of one or two results, but rather to give an overview of different directions of research and to describe some of the challenges of this topic. Since bilevel programming is a very active area, a huge number of questions remain open. Among others, these include optimality conditions as well as solution algorithms for problems with nonconvex lower level problems, discrete bilevel programming problems in every context, and many questions related to the investigation of pessimistic bilevel
programming problems. Also, one implication from NP-hardness often used in theory is that such problems should be solved with approximation algorithms which, if possible, should be complemented by a bound on the accuracy of the computed solution. One example of such an approximation algorithm can be found in Marcotte (1986), but in general the description of such algorithms is a challenging task for future research.
References

Anandalingam, G. and Friesz, T. (eds.). (1992). Hierarchical Optimization. Annals of Operations Research, vol. 24.

Anitescu, M. (2002). On solving mathematical programs with complementarity constraints as nonlinear programs. Technical Report No. ANL/MCS-P864-1200, Department of Mathematics, University of Pittsburgh.

Audet, C., Hansen, P., Jaumard, B., and Savard, G. (1997). Links between linear bilevel and mixed 0-1 programming problems. Journal of Optimization Theory and Applications, 93:273-300.

Bard, J.F. (1998). Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht.

Clarke, F.H. (1983). Optimization and Nonsmooth Analysis. John Wiley & Sons, New York.

Dempe, S. (1992). A necessary and a sufficient optimality condition for bilevel programming problems. Optimization, 25:341-354.

Dempe, S. (2002). Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht.

Dempe, S. (2003). Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization, 52:333-359.

Dempe, S. and Kalashnikov, V. (2002). Discrete bilevel programming: Application to a gas shipper's problem. Preprint No. 2002-02, TU Bergakademie Freiberg, Fakultät für Mathematik und Informatik.

Dempe, S. and Pallaschke, D. (1997). Quasidifferentiability of optimal solutions in parametric nonlinear optimization. Optimization, 40:1-24.

Deng, X. (1998). Complexity issues in bilevel linear programming. In: Multilevel Optimization: Algorithms and Applications (A. Migdalas,
P.M. Pardalos, and P. Värbrand, eds.), pp. 149-164, Kluwer Academic Publishers, Dordrecht.

Fiacco, A.V. (1983). Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York.

Flegel, M.L. and Kanzow, C. (2002). Optimality conditions for mathematical programs with equilibrium constraints: Fritz John and Abadie-type approaches. Report, Universität Würzburg, Germany.

Flegel, M.L. and Kanzow, C. (2003). A Fritz John approach to first order optimality conditions for mathematical programs with equilibrium constraints. Optimization, 52:277-286.

Fletcher, R. and Leyffer, S. (2002). Numerical experience with solving MPECs as NLPs. Numerical Analysis Report NA/210, Department of Mathematics, University of Dundee, UK.

Fletcher, R., Leyffer, S., Ralph, D., and Scholtes, S. (2002). Local Convergence of SQP Methods for Mathematical Programs with Equilibrium Constraints. Numerical Analysis Report NA/209, Department of Mathematics, University of Dundee, UK.

Fliege, J. and Vicente, L.N. (2003). A Bicriteria Approach to Bilevel Optimization. Technical Report, Fachbereich Mathematik, Universität Dortmund, Germany.

Frangioni, A. (1995). On a new class of bilevel programming problems and its use for reformulating mixed integer problems. European Journal of Operational Research, 82:615-646.

Fukushima, M. and Pang, J.-S. (1999). Convergence of a smoothing continuation method for mathematical programs with complementarity constraints. In: Ill-posed Variational Problems and Regularization Techniques (M. Théra and R. Tichatschke, eds.). Lecture Notes in Economics and Mathematical Systems, No. 477, Springer-Verlag, Berlin.

Fülöp, J. (1993). On the Equivalence between a Linear Bilevel Programming Problem and Linear Optimization over the Efficient Set. Working Paper No. WP 93-1, Laboratory of Operations Research and Decision Systems, Computer and Automation Institute, Hungarian Academy of Sciences.

Geiger, C. and Kanzow, C. (2003). Theorie und Numerik restringierter Optimierungsaufgaben. Springer-Verlag, Berlin.
Guddat, J., Guerra Vasquez, F., and Jongen, H.Th. (1990). Parametric Optimization: Singularities, Pathfollowing and Jumps. John Wiley & Sons, Chichester, and B.G. Teubner, Stuttgart.

Hansen, P., Jaumard, B., and Savard, G. (1992). New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing, 13:1194-1217.

Harker, P.T. and Pang, J.-S. (1988). Existence of optimal solutions to mathematical programs with equilibrium constraints. Operations Research Letters, 7:61-64.

Haurie, A., Savard, G., and White, D. (1990). A note on: An efficient point algorithm for a linear two-stage optimization problem. Operations Research, 38:553-555.

Hirsch, M.W. (1994). Differential Topology. Springer-Verlag, Berlin.

Klatte, D. and Kummer, B. (2002). Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Kluwer Academic Publishers, Dordrecht.

Kojima, M. (1980). Strongly stable stationary solutions in nonlinear programs. In: Analysis and Computation of Fixed Points (S.M. Robinson, ed.), pp. 93-138, Academic Press, New York.

Lignola, M.B. and Morgan, J. (1997). Stability of regularized bilevel programming problems. Journal of Optimization Theory and Applications, 93:575-596.

Loridan, P. and Morgan, J. (1989). New results on approximate solutions in two-level optimization. Optimization, 20:819-836.

Loridan, P. and Morgan, J. (1996). Weak via strong Stackelberg problem: New results. Journal of Global Optimization, 8:263-287.

Lucchetti, R., Mignanego, F., and Pieri, G. (1987). Existence theorem of equilibrium points in Stackelberg games with constraints. Optimization, 18:857-866.

Luo, Z.-Q., Pang, J.-S., and Ralph, D. (1996). Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge.

Macal, C.M. and Hurter, A.P. (1997). Dependence of bilevel mathematical programs on irrelevant constraints. Computers and Operations Research, 24:1129-1140.
Marcotte, P. (1986). Network design problem with congestion effects: A case of bilevel programming. Mathematical Programming, 34:142-162.

Marcotte, P. and Savard, G. (1991). A note on the Pareto optimality of solutions to the linear bilevel programming problem. Computers and Operations Research, 18:355-359.

Migdalas, A., Pardalos, P.M., and Värbrand, P. (eds.). Multilevel Optimization: Algorithms and Applications. Nonconvex Optimization and its Applications, vol. 20, Kluwer Academic Publishers, Dordrecht.

Mikhalevich, V.S., Gupal, A.M., and Norkin, V.I. (1987). Methods of Nonconvex Optimization. Nauka, Moscow (in Russian).

Mirrlees, J.A. (1999). The theory of moral hazard and unobservable behaviour: Part I. Review of Economic Studies, 66:3-21.

Moore, J. and Bard, J.F. (1990). The mixed integer linear bilevel programming problem. Operations Research, 38:911-921.

Muu, L.D. (2000). On the construction of initial polyhedral convex set for optimization problems over the efficient set and bilevel linear programs. Vietnam Journal of Mathematics, 28:177-182.

Outrata, J., Kočvara, M., and Zowe, J. (1998). Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Kluwer Academic Publishers, Dordrecht.

Pardalos, P.M., Siskos, Y., and Zopounidis, C. (eds.). (1995). Advances in Multicriteria Analysis. Kluwer Academic Publishers, Dordrecht.

Ralph, D. and Dempe, S. (1995). Directional derivatives of the solution of a parametric nonlinear program. Mathematical Programming, 70:159-172.

Scheel, H. and Scholtes, S. (2000). Mathematical programs with equilibrium constraints: stationarity, optimality, and sensitivity. Mathematics of Operations Research, 25:1-22.

Scholtes, S. and Stöhr, M. (2001). How stringent is the linear independence assumption for mathematical programs with complementarity constraints? Mathematics of Operations Research, 26:851-863.

Schramm, H. (1989). Eine Kombination von Bundle- und Trust-Region-Verfahren zur Lösung nichtdifferenzierbarer Optimierungsprobleme. No. 30, Bayreuther Mathematische Schriften, Bayreuth.
6 Bilevel Programming
193
Shapiro, A. (1988).Sensitivity analysis of nonlinear programs and differentiability properties of metric projections. SIAM Journal Control Optimization, 26:628-645. Vicente, L.N.,Savard, G., and Judice, J.J. (1996). The discrete linear bilevel programming problem. Journal of Optimization Theory and Applications, 89:597-614. Vogel, S. (2002). Zwei-Ebenen-Optimierungsaufgaben mit nichtkonvexer Zielfunktion in der unteren Ebene: Pfadverfolgung und Spriinge. Ph. D thesis, Technische Universitat Bergakademie Freiberg. White, D.J. and Anandalingam, G. (1993). A penalty function approach for solving bi-level linear programs. Journal of Global Optimization, 3:397-419.
Chapter 7
APPLICATIONS OF GLOBAL OPTIMIZATION TO PORTFOLIO ANALYSIS

Hiroshi Konno

Abstract
We survey some of the recent successful applications of deterministic global optimization methods to financial problems. The problems to be discussed are mean-risk models under nonconvex transaction costs, minimal transaction unit constraints and cardinality constraints. Also, we discuss several bond portfolio optimization problems, long term portfolio optimization problems and others. These lead to concave/d.c. minimization problems, minimization of a nonconvex fractional function and of a sum of several fractional functions over a polytope, optimization over a nonconvex efficient set, and so on. Readers will find that a number of difficult global optimization problems have been solved in practice and that there is considerable room for applications of global optimization methods in finance.

1. Introduction
The purpose of this paper is to review some of the recent successful applications of global optimization methodologies in portfolio theory. Portfolio theory was originated by H. Markowitz in 1952 and has since developed into a diverse field of quantitative finance including market risk analysis, credit risk analysis, pricing of derivative securities, structured finance, securitization, real options and so on. Mathematical programming is widely used in these areas, but applications in market risk analysis are by far the most important. Also, it is virtually the only area in finance where global optimization methodologies have been applied in a successful way.
The starting point of portfolio theory is the mean-variance (MV) model (Markowitz, 1959), in which the risk measured by the
variance of the rate of return of the portfolio is minimized subject to a constraint on the level of expected return. This problem is formulated as a convex quadratic programming problem. Though mathematically simple, it took more than 30 years before a large scale mean-variance model was solved in practice, due to the computational difficulty associated with handling a completely dense variance-covariance matrix. The breakthrough occurred in 1984, when Perold (1984) solved a large scale mean-variance problem using a factor model approach and sparse matrix technologies. Twenty years later, we are now able to solve a very large scale MV model consisting of over 10,000 variables on a personal computer. If we replace variance by absolute deviation as a measure of risk, then we can solve the resulting mean-absolute deviation (MAD) model (Konno and Yamazaki, 1991) even when there are more than a million variables, since the problem is reduced to a linear programming problem.
Both the MV model and the MAD model can be formulated as convex minimization problems, so they have little to do with "global" optimization. However, when we extend the model one step further, we need to introduce a variety of nonconvex terms. These include, among others, transaction cost, tax, market impact, minimal transaction unit constraints and cardinality constraints. Then we need to apply global optimization approaches to solve the resulting nonconvex problems.
By global optimization methods, we mean here deterministic algorithms as discussed in the textbook of Horst and Tuy (1996). Also, we concentrate on a class of exact algorithms, i.e., those which generate an optimal solution in the limit or an ε-optimal solution in finitely many steps.
There are still relatively few successful applications of global optimization to finance. The reasons are two-fold. First, deterministic global optimization is a rather new area.
In fact, deterministic and exact algorithms were neglected in the survey paper of Rinnooy-Kan and Timmer (1989). This means that solving a nonconvex problem in a deterministic way had been considered intractable until the mid-1980s unless the problem had some special structure, such as concave minimization on an acyclic network (Zangwill, 1968). Heuristic and multi-start local search methods were the only practical methods for handling nonconvex problems without special structures. Therefore, most financial engineers are not aware of recent progress in global optimization and thus try to formulate the problem within the framework of convex minimization or simply apply local search or heuristic approaches.
Second, global optimizers are more interested in applications to physical problems. It appears that there is still a psychological barrier for mathematical programmers to do research in the dual (monetary) space.
In the next two sections, we discuss applications of global optimization to mean-risk models. A variety of nonconvex problems have been solved successfully by employing the mean-absolute deviation framework. Section 4 will be devoted to applications of fractional programming methods to bond portfolio analysis. Here we discuss the minimization of the sum of linear fractional functions and of the ratio of two convex functions over a polytope. Section 5 will be devoted to miscellaneous applications of global optimization in finance such as minimization over an efficient set, the long-term constant proportion portfolio problem, long-short portfolios and problems including integer constraints. Readers are referred to a recent survey on the applications of mathematical programming to finance by Mulvey (2001), a leading expert in both mathematical programming and financial engineering.
2. Mean-risk models
In the following, we present some basics of mean-risk models. Let there be n assets S_j, j = 1, 2, ..., n, and let R_j be the random variable representing the rate of return of S_j. Let x_j ≥ 0 be the proportion of the fund to be invested in S_j. The vector x = (x_1, x_2, ..., x_n) is called a portfolio, which has to satisfy the condition

x_1 + x_2 + ... + x_n = 1, x_j ≥ 0, j = 1, 2, ..., n. (7.1)
Let R(x) be the rate of return of the portfolio:

R(x) = R_1 x_1 + R_2 x_2 + ... + R_n x_n, (7.2)
and let r(x) and v(x) be, respectively, the mean and the variance of R(x). Then the mean-variance (MV) model is represented as follows:

(MV)  minimize   v(x)
      subject to r(x) ≥ ρ, (7.3)
                 x ∈ X,

where X ⊆ R^n is the investable set defined by (7.1), which may also contain additional linear constraints, and ρ is a constant to be specified by the investor.
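As a concrete illustration (not part of the original text), the quantities r(x) and v(x) entering (MV) can be computed directly from scenario data; all function names and numbers below are hypothetical.

```python
# Sketch: computing the mean r(x) and the variance v(x) of the portfolio
# return R(x) = sum_j R_j x_j from finitely many scenarios.
# Scenario returns rs[t][j] and probabilities f[t] are made-up numbers.

def portfolio_mean_variance(x, rs, f):
    """Return (r(x), v(x)) for portfolio x under scenarios rs with probabilities f."""
    # Portfolio rate of return in each scenario t
    R = [sum(rtj * xj for rtj, xj in zip(rt, x)) for rt in rs]
    mean = sum(ft * Rt for ft, Rt in zip(f, R))
    var = sum(ft * (Rt - mean) ** 2 for ft, Rt in zip(f, R))
    return mean, var

# Two assets, three equally likely scenarios (hypothetical data)
rs = [[0.10, 0.02], [0.00, 0.03], [-0.04, 0.01]]
f = [1 / 3, 1 / 3, 1 / 3]
x = [0.5, 0.5]
r_x, v_x = portfolio_mean_variance(x, rs, f)
```

Feeding r(x) and v(x) into a quadratic programming solver would then give the (MV) solution; the point of the sketch is only the scenario-based evaluation of the two ingredients.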
198
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Let x(ρ) be an optimal solution of problem (7.3). Then the trajectory of (v(x(ρ)), r(x(ρ))) as ρ varies is called the efficient frontier.
There are two alternative representations of the mean-variance model, namely

(MV2)  maximize   r(x)
       subject to v(x) ≤ σ², (7.4)
                  x ∈ X,

(MV3)  maximize   r(x) - λ v(x)
       subject to x ∈ X. (7.5)
All three representations are used interchangeably since they generate the same efficient frontier as we vary ρ in (MV), σ in (MV2) and λ ≥ 0 in (MV3).
There are several measures of risk other than variance (standard deviation), such as absolute deviation, lower semi-variance, (lower-semi) partial moments, below-target risk, value-at-risk (VaR) and conditional value-at-risk (CVaR). Most of these, except VaR, are convex functions of x. Mean-risk models are obtained from (7.3)-(7.5) by replacing the variance v(x) with one of the risk measures introduced above. However, the following three risk measures are by far the most important from the computational point of view when we extend the model in the direction of global optimization:

Absolute deviation:            W(x) = E[ |R(x) - E[R(x)]| ]
Lower semi-absolute deviation: W-(x) = E[ |R(x) - E[R(x)]|- ]
Below-target risk of degree one: BT1(x) = E[ |R(x) - τ|- ]  (τ is a constant)

where |u|- = max{0, -u}. These measures are important since the associated mean-risk model can be formulated as a linear programming problem when (R_1, R_2, ..., R_n) is distributed over a set of finitely many points (r_1t, r_2t, ..., r_nt), t = 1, 2, ..., T, and the probabilities

f_t = Pr{ (R_1, R_2, ..., R_n) = (r_1t, r_2t, ..., r_nt) }, t = 1, 2, ..., T, (7.6)
are known. For example, the mean-absolute deviation model

minimize   W(x)
subject to r(x) ≥ ρ, (7.7)
           x ∈ X
can be represented as follows:

minimize   Σ_{t=1}^T f_t | Σ_{j=1}^n (r_jt - r_j) x_j |
subject to Σ_{j=1}^n r_j x_j ≥ ρ, (7.8)
           x ∈ X,

where r_j = Σ_{t=1}^T f_t r_jt. It is straightforward to see that this problem can be converted to a linear programming problem:

minimize   Σ_{t=1}^T f_t (φ_t + ψ_t)
subject to φ_t - ψ_t = Σ_{j=1}^n (r_jt - r_j) x_j, t = 1, 2, ..., T,
           φ_t ≥ 0, ψ_t ≥ 0, t = 1, 2, ..., T, (7.9)
           Σ_{j=1}^n r_j x_j ≥ ρ, x ∈ X.
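The split-variable device behind (7.9), writing each absolute deviation as φ_t + ψ_t with φ_t - ψ_t equal to the deviation and φ_t, ψ_t ≥ 0, can be checked numerically. The sketch below uses made-up scenario data and hypothetical function names.

```python
# Sketch: W(x) computed directly vs. via the split variables used in the LP (7.9).
# At an LP optimum, phi_t = max(d_t, 0) and psi_t = max(-d_t, 0), so that
# phi_t + psi_t = |d_t| while phi_t - psi_t = d_t.

def mad_direct(x, rs, f):
    """Absolute deviation W(x) = E|R(x) - E[R(x)]| from scenario data."""
    R = [sum(r * xj for r, xj in zip(rt, x)) for rt in rs]
    rbar = sum(ft * Rt for ft, Rt in zip(f, R))
    return sum(ft * abs(Rt - rbar) for ft, Rt in zip(f, R))

def mad_split(x, rs, f):
    """The same quantity via the nonnegative split variables of the LP."""
    R = [sum(r * xj for r, xj in zip(rt, x)) for rt in rs]
    rbar = sum(ft * Rt for ft, Rt in zip(f, R))
    d = [Rt - rbar for Rt in R]
    phi = [max(dt, 0.0) for dt in d]   # phi_t
    psi = [max(-dt, 0.0) for dt in d]  # psi_t
    # The LP constraint phi_t - psi_t = d_t holds by construction:
    assert all(abs((p - q) - dt) < 1e-12 for p, q, dt in zip(phi, psi, d))
    return sum(ft * (p + q) for ft, p, q in zip(f, phi, psi))

rs = [[0.10, 0.02], [0.00, 0.03], [-0.04, 0.01]]  # hypothetical scenarios
f = [1 / 3, 1 / 3, 1 / 3]
x = [0.5, 0.5]
w_direct, w_split = mad_direct(x, rs, f), mad_split(x, rs, f)
```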
Also CVaR_α(x), defined through the lower α-quantile VaR_α(x) of R(x) by

CVaR_α(x) = E[ -R(x) | R(x) ≤ VaR_α(x) ], (7.10)

shares the same property as the above three measures (Rockafellar and Uryasev, 2001).
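For equally likely scenarios, and when (1 - α)T is an integer, CVaR_α amounts to averaging the worst (1 - α)T losses. A minimal sketch with made-up data (the function name is hypothetical):

```python
# Sketch: scenario CVaR of a portfolio return, assuming T equally likely
# scenarios and that (1 - alpha) * T is an integer, so CVaR_alpha is the
# average of the worst (1 - alpha) * T losses.

def cvar(x, rs, alpha):
    T = len(rs)
    k = round((1 - alpha) * T)  # number of tail scenarios
    losses = sorted(
        (-sum(r * xj for r, xj in zip(rt, x)) for rt in rs),
        reverse=True,
    )
    return sum(losses[:k]) / k

rs = [[0.10], [0.05], [0.00], [-0.10]]  # one asset, four scenarios (made up)
x = [1.0]
c75 = cvar(x, rs, alpha=0.75)  # worst 25% = the single worst scenario
```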
3. Mean-risk models under market friction
Markowitz formulated the mean-variance model assuming that there is no friction in the market. However, nonlinear transaction fees and taxes are associated with selling and/or buying assets. Also, we experience the so-called market impact effect when we buy a large amount of an asset: the unit price of the asset may increase due to the supply-demand relation, and thus the actual return would be substantially smaller than in an ideal frictionless market.
Also, we often need to handle discrete variables. Among such examples are minimal transaction unit constraints and cardinality constraints. The former is associated with the existence of a minimal unit one can trade in the market, usually 1,000 shares on the Tokyo Stock Exchange. The latter is associated with investors who do not want to hold too many assets, in which case one has to impose a condition on the maximal number of assets in the portfolio.
3.1 Transaction cost
There are two common types of transaction cost, i.e., piecewise linear concave and piecewise constant, as depicted in Figure 7.1.
Figure 7.1. Transaction cost functions: (a) piecewise linear concave; (b) piecewise constant.
Transaction cost is usually relatively large when the amount of transaction is small, and it increases gradually at a small rate, hence is concave (Figure 7.1(a)). An alternative is a piecewise constant function as depicted in Figure 7.1(b), which is very popular in e-trading systems. It is well known that these types of transaction cost functions can be represented in linear form by introducing 0-1 variables. The number of 0-1 variables is equal to the number of linear pieces (or steps). Therefore, we need to introduce around 8 to 10 times n zero-one variables, so that the problem is out of the scope of state-of-the-art integer programming software when n is over 1,000.
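A minimal sketch (all break points, fees and names hypothetical) of a piecewise constant cost of the kind in Figure 7.1(b); in the 0-1 linearization mentioned above, each step would contribute one binary variable per asset:

```python
# Sketch: piecewise constant transaction cost as in Figure 7.1(b).
# Break points and fees are made-up numbers.

BREAKS = [0.0, 100.0, 500.0, 1000.0]  # trade-amount thresholds (hypothetical)
FEES = [5.0, 8.0, 12.0, 15.0]         # flat fee on each step (hypothetical)

def step_cost(amount):
    """Flat fee for a trade of the given nonnegative amount (0 if no trade)."""
    if amount == 0:
        return 0.0
    fee = FEES[0]
    for b, f in zip(BREAKS[1:], FEES[1:]):
        if amount > b:
            fee = f
    return fee

# One binary variable per step in the 0-1 representation:
n_binary_vars_per_asset = len(FEES)
```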
3.2 Market impact cost
The unit price of an asset will sharply increase when we purchase the asset beyond some bound, which induces an additional transaction cost. A typical cost function subject to market impact is depicted in Figure 7.2; it is a d.c. function (a difference of two convex functions).
Figure 7.2. Market impact.
The mean-absolute deviation model under a concave or d.c. transaction cost c(x),

maximize   r(x) - c(x)
subject to W(x) ≤ w, (7.11)
           x ∈ X,

has been successfully solved by a branch and bound algorithm proposed by Phong et al. (see Phong et al., 1995, for details). Under the finite scenario representation of Section 2, with a separable cost c(x) = Σ_{j=1}^n c_j(x_j), the mean-absolute deviation model under transaction cost (7.11) can be reformulated as a linearly constrained non-concave maximization problem:

maximize   Σ_{j=1}^n { r_j x_j - c_j(x_j) }
subject to Σ_{t=1}^T f_t (φ_t + ψ_t) ≤ w,
           φ_t - ψ_t = Σ_{j=1}^n (r_jt - r_j) x_j, t = 1, 2, ..., T, (7.12)
           φ_t ≥ 0, ψ_t ≥ 0, t = 1, 2, ..., T,
           x ∈ X.
As reported in Konno and Wijayanayake (1999), the problem can be solved in a few seconds on a personal computer when T ≤ 60 and n ≤ 500. In fact, the branch and bound algorithm below can generate an optimal solution much faster than the state-of-the-art integer programming software CPLEX applied to a 0-1 integer programming reformulation of the same problem (Konno and Yamamoto, 2003). Similar algorithms have been applied to a number of portfolio optimization problems under nonconvex transaction cost, including index tracking (Konno and Wijayanayake, 2001b), portfolio rebalancing (Konno and Yamamoto, 2001), and long-short portfolio optimization (Konno et al., 2005). Further, this algorithm has been extended to portfolio optimization under market impact (Konno and Wijayanayake, 2000), where the cost function becomes a d.c. function as depicted in Figure 7.2. Let us note that the MV model under nonconvex transaction cost still remains intractable from the computational point of view, since we would need to handle a large scale 0-1 quadratic programming problem.
3.3 Branch and bound algorithm
We present here the branch and bound algorithm that Konno and Wijayanayake (1999) used for solving the linearly constrained separable concave minimization problem introduced above.
Let F be the set of (x, φ, ψ) ∈ R^{n+2T} satisfying the constraints of the reformulation of problem (7.11), except the lower and upper bound constraints on the x_j's.
Branch and bound algorithm.

1° Set Γ = {(P_0)}, f* = -∞, k = 0, and go to 3°.

2° If Γ = ∅, then go to 9°; otherwise go to 3°.

3° Choose a problem (P_k) ∈ Γ:

   maximize   f(x) = Σ_{j=1}^n { r_j x_j - c_j(x_j) }
   subject to (x, φ, ψ) ∈ F,
              β^k ≤ x ≤ α^k.

4° Let c̄_j^k(x_j) be a linear underestimating function of c_j(x_j) over the interval β_j^k ≤ x_j ≤ α_j^k (j = 1, 2, ..., n), and define the linear programming problem

   (P̄_k)  maximize   ḡ_k(x) = Σ_{j=1}^n { r_j x_j - c̄_j^k(x_j) }
          subject to (x, φ, ψ) ∈ F,
                     β^k ≤ x ≤ α^k.

   If (P̄_k) is infeasible, remove (P_k) from Γ and go to 2°; otherwise let x^k be an optimal solution of (P̄_k) and let f_k = f(x^k).

5° If f_k < f*, then go to 7°; otherwise go to 6°.

6° Set f* = f_k, x̂ = x^k, and eliminate from Γ all subproblems (P_i) for which ḡ_i(x^i) ≤ f*.

7° If ḡ_k(x^k) ≤ f* + ε, then remove (P_k) from Γ and go to 2°; otherwise go to 8°.

8° Let j* be such that c_{j*}(x_{j*}^k) - c̄_{j*}^k(x_{j*}^k) = max{ c_j(x_j^k) - c̄_j^k(x_j^k) | j = 1, 2, ..., n }, and define two subproblems (P_{k+1}) and (P_{k+2}) by subdividing the interval [β_{j*}^k, α_{j*}^k] at the point x_{j*}^k. Set Γ = (Γ \ {(P_k)}) ∪ {(P_{k+1}), (P_{k+2})}, k = k + 1, and go to 3°.

9° Stop: x̂ is an ε-optimal solution of (P_0).

THEOREM 7.1 x̂ converges to an ε-optimal solution of (P_0) as k → ∞.

Proof. See Thach et al. (1996). □
REMARK 7.2 The branching strategy using x_{j*}^k as the subdivision point is called the ω-subdivision strategy. A number of numerical experiments show that this strategy is usually superior to standard bisection, where the midpoint of the interval is chosen as the subdivision point.
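For a concave cost c_j, a natural linear underestimating function over an interval is the chord of c_j, which agrees with c_j at the endpoints and lies below it in between (so that replacing c_j by the chord makes the linearized objective an upper bound). A sketch with a hypothetical cost function:

```python
# Sketch: chord underestimator of a concave cost c over [beta, alpha].
# Since c is concave, the chord lies below c on the interval, with
# equality at the endpoints; so r*x - chord(x) >= r*x - c(x) there.

import math

def chord(c, beta, alpha):
    """Return the linear function agreeing with c at beta and alpha."""
    slope = (c(alpha) - c(beta)) / (alpha - beta)
    return lambda x: c(beta) + slope * (x - beta)

c = math.sqrt                 # a concave cost (hypothetical choice)
beta, alpha = 0.25, 4.0
cbar = chord(c, beta, alpha)

# The chord underestimates c at interior points:
gaps = [c(x) - cbar(x) for x in (0.5, 1.0, 2.0, 3.0)]
```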
3.4 Integer constraints
Associated with portfolio construction is a minimal unit one can purchase, usually 1,000 shares on the Tokyo Stock Exchange. When the amount of the fund is larger, we can ignore this constraint and round the calculated portfolio to the nearest integer multiple of the minimal transaction unit; the resulting portfolio exhibits almost the same risk-return structure. When, however, the amount of the fund is smaller, as in the case of an individual investor, simple rounding may significantly distort the portfolio. It has been reported that we can properly handle these constraints by slightly modifying the branching strategy (Step 8°) of the branch and bound algorithm. Also, state-of-the-art integer programming software can handle these integer constraints if the problem is formulated in the framework of the mean-absolute deviation model (Konno and Yamamoto, 2003).
4. Applications of fractional programming
Fractional programming started in the early 1960s, when Charnes and Cooper (1962) showed that the ratio of two nonnegative affine functions is quasi-convex and thus can be minimized over linear constraints by a variant of the simplex method. Also, Dinkelbach (1967) showed that the ratio of a nonnegative convex function to a positive concave function can be minimized over a convex set by solving a series of convex minimization problems.
The sum of linear fractional functions is no longer quasi-convex, so it cannot be minimized by convex minimization methodologies. Also, the ratio of two convex functions cannot be minimized by Dinkelbach's method. Minimizing the sum of linear ratios and minimizing general fractional functions are therefore subjects of global optimization, an area now under intensive study.
Associated with bonds are several alternative measures of return and risk. Among the popular return measures are average direct yield, terminal yield and maturity, all of which are represented as linear fractional functions of a portfolio x. A typical bond portfolio optimization problem is to maximize one of these linear fractional functions over a linear system of equalities and inequalities, which can be solved by standard methods. Another problem associated with bond portfolios (Konno and Watanabe, 1996) is

maximize   (q_1'x + q_10)/(p_1'x + p_10) - (q_2'y + q_20)/(p_2'y + p_20)
subject to A_1 x + A_2 y ≤ b, (7.13)
           x ≥ 0, y ≥ 0,

where x ∈ R^{n_1} and y ∈ R^{n_2} are, respectively, the amounts of assets to be added to and subtracted from the portfolio. A number of algorithms have been proposed for this problem, among which the parametric simplex algorithm (Konno and Watanabe, 1996) seems to be the most efficient. The first step of this algorithm is to define

w = 1/(p_2'y + p_20), X = wx, Y = wy,

and convert problem (7.13) as follows:

maximize   (q_1'X + q_10 w)/(p_1'X + p_10 w) - (q_2'Y + q_20 w)
subject to A_1 X + A_2 Y - bw ≤ 0,
           p_2'Y + p_20 w = 1, (7.14)
           X ≥ 0, Y ≥ 0, w ≥ 0.

Let (X*, Y*, w*) be an optimal solution of (7.14). Then (x*, y*) = (X*/w*, Y*/w*) is an optimal solution of (7.13).
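The change of variables can be sanity-checked numerically: assuming it takes the form w = 1/(p_2'y + p_20), X = wx, Y = wy, the second ratio becomes the linear expression q_2'Y + q_20 w and the normalization constraint p_2'Y + p_20 w = 1 holds. All vectors and numbers below are made-up illustrations.

```python
# Sketch: sanity check of the Charnes-Cooper-type normalization
# w = 1/(p2'y + p20), X = w*x, Y = w*y. Data are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q2, q20 = [1.0, 2.0], 0.5
p2, p20 = [0.5, 1.5], 1.0
x = [3.0, 1.0]
y = [2.0, 4.0]

w = 1.0 / (dot(p2, y) + p20)
X = [w * xi for xi in x]
Y = [w * yi for yi in y]

ratio = (dot(q2, y) + q20) / (dot(p2, y) + p20)  # second ratio before the change
linearized = dot(q2, Y) + q20 * w                # its image after the change
normalization = dot(p2, Y) + p20 * w             # should equal 1
```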
Problem (7.14) is equivalent to

maximize   (1/ξ)(q_1'X + q_10 w) - (q_2'Y + q_20 w)
subject to A_1 X + A_2 Y - bw ≤ 0,
           p_2'Y + p_20 w = 1,
           p_1'X + p_10 w = ξ, (7.15)
           X ≥ 0, Y ≥ 0, w ≥ 0,
           ξ_min ≤ ξ ≤ ξ_max,

where ξ_max and ξ_min are, respectively, the maximal and minimal values of p_1'X + p_10 w over the feasible region. Let us note that this problem can be solved efficiently by a primal/dual parametric simplex algorithm.
Optimization of a weighted sum of such objectives leads to the maximization of a sum of ratios over a polytope. An efficient branch and bound algorithm using well designed convex underestimating functions can now solve problems with up to 15 fractional terms (Konno, 2001).
Another fractional problem is the maximal predictability portfolio problem proposed by Lo and MacKinlay (1997) and solved by Gotoh and Konno (2001):

maximize   x'Px / x'Qx
subject to x ∈ X, (7.16)
where both P and Q are positive definite and X is a polyhedral set. If P is negative semi-definite and Q is positive definite, then the problem can be solved by Dinkelbach's approach (Dinkelbach, 1967). Let us define the function

g(λ) = max{ x'Px - λ x'Qx | x ∈ X }

for λ > 0, and let x(λ) be a maximizer corresponding to g(λ). Let λ* be such that g(λ*) = 0. Then it is easy to see (Gotoh and Konno, 2001) that x(λ*) is an optimal solution of (7.16) for general P and Q. The problem defining g(λ) is a convex maximization problem when P is positive semi-definite, which can be solved by a branch and bound algorithm when n is small. Also, the zero point of g(λ) can be found by bisection or other search methods. It has been demonstrated in Gotoh and Konno (2001) that the problem can be solved quickly when n is less than 20.
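To make the bisection on g(λ) concrete, here is an illustrative sketch in which X is replaced by a small finite candidate set, so that the inner maximization is a plain max over a list (an assumption made purely for illustration; P, Q and the candidates are made up):

```python
# Sketch: finding the zero of g(lambda) = max_x {x'Px - lambda * x'Qx}
# by bisection. g is decreasing in lambda, and its zero equals the
# maximal ratio x'Px / x'Qx over the candidate set.

def quad(M, x):
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

P = [[2.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 2.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4], [0.5, 0.5]]

def g(lam):
    return max(quad(P, x) - lam * quad(Q, x) for x in candidates)

lo, hi = 0.0, 10.0        # here g(0) > 0 and g(10) < 0
for _ in range(200):      # bisection on the decreasing function g
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
lam_star = 0.5 * (lo + hi)

best_ratio = max(quad(P, x) / quad(Q, x) for x in candidates)
```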
5. Miscellaneous applications
In this section, we discuss additional important applications of global optimization in finance.
5.1 Optimization over an efficient set
Let us consider a class of multiple objective optimization problems (P_j), j = 1, 2, ..., k:

maximize   c_j'x
subject to x ∈ X. (7.17)
A feasible solution x* ∈ X is called efficient when there exists no x ∈ X such that

c_j'x ≥ c_j'x*, j = 1, 2, ..., k, (7.18)

with strict inequality for at least one j. The set X_E of efficient solutions is called an efficient set. Let us consider another objective function f_0(·) and the problem

maximize   f_0(x)
subject to x ∈ X_E. (7.19)
This is a typical global optimization problem since X_E is a nonconvex set. A number of algorithms have been proposed for the case when X is polyhedral and f_0 is convex (Yamamoto, 2002). In particular, when f_0 is linear there exists a finitely convergent algorithm.
Multiple objective optimization problems appear in bond portfolio analysis as explained in Section 4. Fortunately, the number of objectives is usually in the single digits, typically less than 5. It has been shown in Thach et al. (1996) that the problem of finding a portfolio on the efficient frontier such that the piecewise linear transaction cost associated with rebalancing from the current portfolio x^0 is minimal can be solved by a dual reformulation of the original problem. Problems with up to k = 5 objectives and up to 100 variables can be solved within a practical amount of computation time.
Also, a minimal cost rebalancing problem whose objective function is a piecewise linear concave cost c(·), where X_E is the mean-absolute deviation efficient frontier, can be solved by a branch and bound algorithm by noting that X_E consists of a number of linear pieces (Konno and Yamamoto, 2001). The problem can be reduced to a series of linearly constrained concave minimization problems, each of which can be solved by the branch and bound algorithm of Phong et al. (1995).
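The efficiency test (7.18) can be run mechanically on a small finite example; the objective values below are hypothetical:

```python
# Sketch: testing efficiency (7.18) on a small finite feasible set.
# The objective vectors (c_1'x, c_2'x) of each point are made-up numbers.

def dominates(u, v):
    """True if u weakly improves on v in every objective, strictly in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def efficient_points(values):
    return [u for u in values if not any(dominates(v, u) for v in values)]

values = [(4, 1), (3, 3), (1, 4), (2, 2)]  # four feasible points, two objectives
XE = efficient_points(values)
```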
5.2 Long term portfolio optimization by constant rebalance
Constant rebalancing is one of the most popular methods for long term portfolio management, where one sells those assets whose prices have risen and
purchases those assets whose prices have fallen, thus keeping the proportions of the portfolio weights constant. Given the expected returns in each period, the mean-variance model (7.5) over a planning horizon of T periods becomes the optimization of a highly nonconvex polynomial function over a polytope:

maximize   f_λ(x) = Σ_{s=1}^S f_s Π_{t=1}^T ( Σ_{j=1}^n (1 + r_jt^s) x_j )
                    - λ [ Σ_{s=1}^S f_s ( Π_{t=1}^T Σ_{j=1}^n (1 + r_jt^s) x_j )²
                          - ( Σ_{s=1}^S f_s Π_{t=1}^T Σ_{j=1}^n (1 + r_jt^s) x_j )² ]
subject to Σ_{j=1}^n x_j = 1, (7.20)
           0 ≤ x_j ≤ α_j, j = 1, 2, ..., n,

where

f_s is the probability of scenario s,
r_jt^s is the rate of return of asset j during period t under scenario s.

Maranas et al. (1997) applied a branch and bound algorithm similar to the one explained in Section 3.3, using

g_λ(x; γ) = f_λ(x) + γ Σ_{j=1}^n (x_j - β_j^k)(x_j - α_j^k)

as an underestimating function of f_λ(x) over the hyper-rectangle [β^k, α^k]. When γ is large enough, g_λ(x; γ) is a convex function of x. Also,

0 ≤ f_λ(x) - g_λ(x; γ) ≤ γnδ²/4, where δ = max{ α_j^k - β_j^k | j = 1, 2, ..., n }.

Hence g_λ(x; γ) is a good approximation of f_λ(x) when δ is small enough. It is shown in Maranas et al. (1997) that this algorithm can solve problems of size up to (n, T, S) = (9, 20, 100).
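The objective of (7.20), expected terminal wealth minus λ times its variance under constant rebalancing, can be evaluated scenario by scenario; the sketch below uses made-up scenarios and hypothetical function names:

```python
# Sketch: evaluating the constant-rebalance objective of (7.20) from
# scenario returns scenarios[s][t][j] with probabilities f[s].

def terminal_wealth(x, scenario):
    """Product over periods of the portfolio gross return under rebalancing."""
    wealth = 1.0
    for period in scenario:  # period holds the returns r_jt^s for one t
        wealth *= sum((1.0 + r) * xj for r, xj in zip(period, x))
    return wealth

def objective(x, scenarios, f, lam):
    W = [terminal_wealth(x, s) for s in scenarios]
    mean = sum(fs * Ws for fs, Ws in zip(f, W))
    var = sum(fs * Ws * Ws for fs, Ws in zip(f, W)) - mean * mean
    return mean - lam * var

# Two assets, two periods, two equally likely scenarios (hypothetical data)
scenarios = [
    [[0.10, 0.00], [0.10, 0.00]],   # scenario 1
    [[-0.10, 0.00], [-0.10, 0.00]], # scenario 2
]
f = [0.5, 0.5]
val = objective([0.5, 0.5], scenarios, f, lam=1.0)
```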
5.3 Optimization of a long-short portfolio
A long-short portfolio, where one is allowed to sell assets short, is a very popular fund management strategy among hedge funds. The resulting optimization problem appears to be an easy concave maximization problem without sign constraints on the portfolio weights. However, it really is not. First, the fund manager has to pay a deposit in addition to the transaction cost. Also, he is not supposed to leave cash unused, and the cash generated by a short sale is reserved at the third party who lends the asset.
As a result, the investable set of the mean-variance model becomes a non-convex set. Also, the objective function contains a non-convex transaction cost. Therefore the problem becomes the maximization of a non-concave objective function over a non-convex region. This seemingly very difficult problem has been successfully solved (Konno et al., 2005) by extending the branch and bound algorithm of Section 3.3.
Acknowledgements. This research was supported in part by the Grant-in-Aid for Scientific Research of the Ministry of Education, Science, Culture and Sports B(2) 15310122 and 15656025. Also, the author acknowledges the generous support of the Hitachi Corporation.
References

Charnes, A. and Cooper, W.W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9:181-186.

Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13:492-498.

Gotoh, J. and Konno, H. (2001). Maximization of the ratio of two convex quadratic functions over a polytope. Computational Optimization and Applications, 20:43-60.

Horst, R. and Tuy, H. (1996). Global Optimization: Deterministic Approaches, 3rd edition. Springer-Verlag.

Konno, H. (2001). Minimization of the sum of several linear fractional functions. In: N. Hadjisavvas (ed.), Advances in Global Optimization, pp. 3-20. Springer-Verlag.

Konno, H., Koshizuka, T., and Yamamoto, R. (2005). Optimization of a long-short portfolio under nonconvex transaction cost. Forthcoming in Dynamics of Continuous, Discrete and Impulsive Systems.

Konno, H., Thach, P.T., and Tuy, H. (1997). Optimization on Low Rank Nonconvex Structures. Kluwer Academic Publishers.
Konno, H. and Watanabe, H. (1996). Nonconvex bond portfolio optimization problems and their applications to index tracking. Journal of the Operations Research Society of Japan, 39:295-306.

Konno, H. and Wijayanayake, A. (1999). Mean-absolute deviation portfolio optimization model under transaction costs. Journal of the Operations Research Society of Japan, 42:422-435.

Konno, H. and Wijayanayake, A. (2000). Portfolio optimization problems under d.c. transaction costs and minimal transaction unit constraints. Journal of Global Optimization, 22:137-154.

Konno, H. and Wijayanayake, A. (2001a). Optimal rebalancing under concave transaction costs and minimal transaction unit constraints. Mathematical Programming, 89:233-250.

Konno, H. and Wijayanayake, A. (2001b). Minimal cost index tracking under concave transaction costs. International Journal of Theoretical and Applied Finance, 4:939-957.

Konno, H. and Yamamoto, R. (2001). Minimal concave cost rebalance to the efficient frontier. Mathematical Programming, B89:233-250.

Konno, H. and Yamamoto, R. (2003). Global Optimization vs. Integer Programming in Portfolio Optimization Under Nonconvex Transaction Cost. Working paper ISE 03-07, Department of Industrial and Systems Engineering, Chuo University.

Konno, H. and Yamazaki, H. (1991). Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science, 37:519-531.

Lo, A. and MacKinlay, C. (1997). Maximizing predictability in stock and bond markets. Macroeconomic Dynamics, 1:102-134.

Maranas, C., Androulakis, I., Berger, A., Floudas, C.A., and Mulvey, J.M. (1997). Solving stochastic control problems in finance via global optimization. Journal of Economic Dynamics and Control, 21:1405-1425.

Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. John Wiley & Sons.

Mulvey, J.M. (2001). Introduction to financial optimization: Mathematical programming special issue. Mathematical Programming, B89:205-216.
Perold, A. (1984). Large scale portfolio optimization. Management Science, 30:1143-1160.

Phong, T.Q., An, L.T.H., and Tao, P.D. (1995). On globally solving linearly constrained indefinite quadratic minimization problems by decomposition branch and bound method. Operations Research Letters, 17:215-220.

Rinnooy-Kan, A.H. and Timmer, G.T. (1989). Global optimization. In: Nemhauser, G.L. et al. (eds.), Handbooks in Operations Research and Management Science, vol. 1, Chapter 9. Elsevier Science Publishers, B.V.

Rockafellar, R.T. and Uryasev, S. (2001). Optimization of conditional value-at-risk. Journal of Risk, 2:21-41.

Thach, P.T., Konno, H., and Yokota, D. (1996). A dual approach to a minimization on the set of Pareto-optimal solutions. Journal of Optimization Theory and Applications, 88:689-707.

Tuy, H. (1998). Convex Analysis and Global Optimization. Kluwer Academic Publishers, Dordrecht.

Yamamoto, Y. (2002). Optimization over the efficient set: Overview. Journal of Global Optimization, 22:285-317.

Zangwill, W. (1968). Minimum concave cost flows in certain networks. Management Science, 14:429-450.
Chapter 8
OPTIMIZATION TECHNIQUES IN MEDICINE

Panos M. Pardalos, Vladimir L. Boginski, Oleg A. Prokopyev, Wichai Suharitdamrong, Paul R. Carney, Wanpracha Chaovalitwongse, Alkis Vazacopoulos

Abstract
We give a brief overview of a rapidly emerging interdisciplinary research area: optimization techniques in medicine. Applying optimization approaches has proved to be successful in various medical applications. We identify the main research directions and describe several important problems arising in this area, including disease diagnosis, risk prediction, treatment planning, etc.

1. Introduction
In recent years, there has been a dramatic increase in the application of optimization techniques to the study of medical problems and the delivery of health care. This is in large part due to contributions from three fields: the development of more efficient and effective methods for solving large-scale optimization problems (operations research), the increase in computing power (computer science), and the development of more sophisticated treatment methods (medicine). The contributions of the three fields come together since the full potential of the new treatment methods often cannot be realized without the help of quantitative models and ways to solve them.
Applying optimization techniques has proved to be effective in various medical applications, including disease diagnosis, risk prediction, treatment planning, imaging, etc. The success of these approaches is particularly motivated by the technological advances in the development of medical equipment, which have made it possible to obtain large datasets of various origin that can provide useful information in medical applications. Utilizing these datasets for the improvement of medical diagnosis and treatment is a task of crucial importance, and the fundamental problems arising here are to find appropriate models and algorithms to process these datasets, extract useful information from them, and use this information in medical practice.
One of the directions in this research field is associated with applying data mining techniques to medical data. This approach is especially useful in the diagnosis of disease cases utilizing datasets of historical observations of various characteristics of different patients. Standard mathematical programming approaches allow one to formulate the diagnosis problems as optimization models.
In addition to diagnosis, optimization techniques are successfully applied to treatment planning problems, which deal with the development of an optimal strategy for applying a certain therapy to a patient. An important aspect of these problems is the identification and efficient control of various risk factors arising in the treatment process. These risk management problems can be addressed using optimization methods. There are numerous other application areas of optimization techniques in medicine that are widely discussed in the literature (Pardalos and Principe, 2002; Sainfort et al., 2004; Du et al., 1999; Pardalos et al., 2004b; Cho et al., 1993).
This chapter reviews the main directions of optimization research in the medical domain. The remainder of the chapter is organized as follows. In Section 2 we present several examples of applying optimization techniques to diagnosis and prediction in medical applications: diagnosis of breast cancer, risk prediction by logical analysis of data, and human brain dynamics and epileptic seizure prediction.
Section 3 discusses treatment planning procedures using the example of radiotherapy planning. The next two sections give a brief review of optimization problems in medical imaging and health care applications. Finally, Section 6 concludes the discussion.
2. Diagnosis and prediction
Diagnosis and prediction are among the most fundamental problems in medicine, which play a crucial role in the successful treatment process. In this section, we present several illustrative examples of applying optimization techniques to these problems.
8 Optimization Techniques in Medicine

2.1 Disease diagnosis and prediction as data mining applications
In a common setup of the disease diagnosis problem, one possesses a historical dataset of disease cases (corresponding to different patients) represented by several known parameters (e.g., the patient's blood pressure, temperature, size of a tumor, etc.). For all elements (patients) in this dataset, the actual disease diagnosis outcome is known. A natural way to diagnose new patients is to utilize the available dataset with known diagnosis results (the so-called training dataset) for constructing a mathematical model that would classify disease cases with unknown diagnosis outcomes based on the known information. In the data mining framework, this problem is referred to as classification, which is one of the major types of problems in predictive modeling, i.e., predicting a certain attribute of an element in a dataset based on the known information about its other attributes (or features). Due to the availability of a training dataset, these problems are also associated with the term "supervised learning." To give a formal introduction to classification, suppose that we have a dataset of N elements, and each of these elements has a finite number of attributes. Denote the number of attributes by n. Then every element of the given dataset can be represented as a pair (x_i, y_i), i = 1, ..., N, where x_i ∈ R^n is an n-dimensional vector x_i = (x_{i1}, ..., x_{in})^T
and y_i is the class attribute. The value of y_i defines to which class a given element belongs, and this value is known a priori for each element of the initial dataset. It should also be mentioned that in this case y_i takes integer values, and the number of these values (i.e., the number of classes) is predefined. Now suppose that a new element with a known attribute vector x but unknown class attribute y is added to the dataset. As mentioned above, the essence of classification problems is to predict the unknown value of y. This is accomplished by identifying a criterion for placing the element into a certain class based on the information about the known attributes x of this element. The important question arising here is how to create a formal model that would take the available dataset as input and perform the classification procedure. The main idea of the approaches developed in this field is to adjust (or "train") the parameters of the classification model using the existing information about the elements in the available training dataset and
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
then apply this model to classifying new elements. This task can often be reduced to solving an optimization problem (in particular, a linear programming problem) of finding optimal values of the parameters of a classification model. One of the techniques widely used in practice is the geometrical approach. Since all the data elements can be represented as n-dimensional vectors (points in n-dimensional space), these elements can be separated geometrically by constructing surfaces that serve as "borders" between different groups of points. One common approach is to use linear surfaces (planes) for this purpose; however, different types of nonlinear (e.g., quadratic) separating surfaces can be considered in certain applications. It is also important to note that usually it is not possible to find a surface that would "perfectly" separate the points according to the value of some attribute, i.e., points with different values of the given attribute may not necessarily lie on different sides of the surface; in general, however, the number of such errors should be kept small. According to this approach, the classification problem is represented as the problem of finding the geometrical parameters of the separating surface(s). These parameters can be found by solving the optimization problem of minimizing the misclassification error for the elements in the training dataset (the so-called "in-sample error"). After determining these parameters, every new data element will be automatically assigned to a certain class according to its geometrical location in the element space. The procedure of using the existing dataset for classifying new elements is often called "training the classifier": the parameters of the separating surfaces are "tuned" (or "trained") to fit the attributes of the existing elements so as to minimize the number of errors in their classification.
However, a crucial issue in this procedure is not to "overtrain" the model, so that it retains enough flexibility to classify new elements, which is the primary purpose of constructing the classifier. As an illustrative example of applying optimization techniques to the classification of disease cases, we briefly describe one of the first practical applications of mathematical programming in classification problems, developed by Mangasarian et al. (1995). This study deals with the diagnosis of breast cancer cases. The essence of the breast cancer diagnosis system developed in Mangasarian et al. (1995) is as follows. The authors considered a dataset consisting of 569 cases, with a 30-dimensional feature vector corresponding to each patient. Each case could be classified as malignant or benign, and the actual diagnosis was known for all the elements in the dataset. These 569 elements were used for "training" the classifier, which was developed
based on linear programming (LP) techniques. The procedure of constructing this classifier is relatively simple. The vectors corresponding to malignant and benign cases are stored in two matrices: the matrix A (m × n) contains the m malignant vectors (n is the dimension of each vector), and the matrix B (k × n) represents the k benign cases. The goal of the constructed model is to find a plane that separates all the vectors (points in n-dimensional space) in A from the vectors in B. If a plane is defined by the standard equation

x^T w = γ,
where w = (w_1, ..., w_n)^T is an n-dimensional vector of real numbers and γ is a scalar, then this plane will separate all the elements of A from those of B if the following conditions are satisfied:

Aw ≥ eγ + e,   Bw ≤ eγ − e.   (8.1)
Here e = (1, 1, ..., 1)^T is the vector of ones of appropriate dimension (m for the matrix A and k for the matrix B). However, as pointed out above, in practice it is usually impossible to perfectly separate two sets of elements by a plane. So one should try to minimize an average measure of misclassification, i.e., when the constraints (8.1) are violated, the average sum of violations should be as small as possible. The violations of these constraints are modeled by introducing nonnegative vectors of variables u and v:

Aw + u ≥ eγ + e,   Bw − v ≤ eγ − e,   u ≥ 0, v ≥ 0.
Now we are ready to write down the optimization model that minimizes the total average measure of misclassification error:

min_{w, γ, u, v}  (1/m) Σ_{i=1}^{m} u_i + (1/k) Σ_{j=1}^{k} v_j

subject to
Aw + u ≥ eγ + e,
Bw − v ≤ eγ − e,
u ≥ 0,  v ≥ 0.

As one can see, this is a linear programming problem, and the decision variables here are the geometrical parameters of the separating plane, w and γ, as well as the variables representing the misclassification errors, u and v. Although in many cases these problems may involve high
dimensionality of the data, they can be efficiently solved by available LP solvers, for instance Xpress-MP or CPLEX. The misclassification error minimized here is usually referred to as the in-sample error, since it is measured on the training dataset. Note that if the in-sample error is unacceptably high, the classifying procedure can be repeated for each of the subsets of elements in the halfspaces generated by the separating plane. As a result of such a procedure, several planes dividing the element space into subspaces will be created, which is illustrated by Figure 8.1. Every new element is then classified according to its location in a certain subspace. If we consider the case of only one separating plane, then after solving the above problem, each new cancer case is automatically classified as either malignant or benign as follows: if the vector x corresponding to this case satisfies the condition x^T w > γ, it is considered malignant; otherwise, it is assumed to be benign. It is important to mention that although the approach described here is rather simple, its idea can be generalized to the case of multiple classes and multiple nonlinear separating surfaces. Another issue associated with the technique considered in this section is so-called overtraining of the classifier, which can happen if the training sample is too large. In this case, the model can adjust to the training dataset too much and lose the flexibility needed to classify unknown elements, which increases the generalization (or "out-of-sample") error. In Mangasarian et al. (1995), the authors indicate that even one separating plane can be an overtrained classifier if the number of attributes in each vector is too large. They point out that the best out-of-sample results were achieved when only three attributes of each vector were taken into account and one separating plane was used.
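To make the formulation concrete, here is a small sketch of this LP in Python with SciPy's `linprog`, trained on synthetic two-dimensional data standing in for the malignant (A) and benign (B) feature matrices; the data, dimensions, and class labels are illustrative assumptions, not the original study's setup.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
# Hypothetical 2-D data playing the role of the matrices A and B.
A = rng.normal(loc=2.0, scale=0.5, size=(30, 2))    # one class (e.g., "malignant")
B = rng.normal(loc=-2.0, scale=0.5, size=(40, 2))   # the other class ("benign")
m, n = A.shape
k = B.shape[0]

# Decision vector z = (w_1..w_n, gamma, u_1..u_m, v_1..v_k);
# objective is the average violation (1/m) sum u_i + (1/k) sum v_j.
c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])

# A w + u >= e*gamma + e   rewritten as   -A w + e*gamma - u <= -e
top = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
# B w - v <= e*gamma - e   rewritten as    B w - e*gamma - v <= -e
bot = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
A_ub = np.vstack([top, bot])
b_ub = -np.ones(m + k)

bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)  # w, gamma free; u, v >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, gamma = res.x[:n], res.x[n]

def classify(x):
    """Assign a new case by its side of the plane: x^T w > gamma -> class A."""
    return "A" if x @ w > gamma else "B"
```

Since the synthetic clusters are well separated, the optimal average violation is zero and the plane satisfies the separation conditions exactly on the training data.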
These arguments lead to the following concepts closely related to classification: feature selection and support vector machines (SVMs). A review of these and other optimization approaches in data mining is given in Bradley et al. (1999). The main idea of feature selection is to choose a minimal number of attributes (i.e., components of the vector x corresponding to a data element) to be used in the construction of the separating surfaces (Bradley et al., 1998). This procedure is often important in practice, since it may produce a better classification in the sense of the out-of-sample error. The essence of support vector machines is to construct separating surfaces that minimize an upper bound on the out-of-sample error. In the case of one linear surface (plane) separating the elements of two classes, this approach chooses the plane that maximizes the sum of the distances between the plane and the closest elements of each class,
Figure 8.1. An example of binary classification using linear separating surfaces
i.e., the "gap" between the elements from different classes (Burges, 1998; Vapnik, 1995). An application of support vector machines to breast cancer diagnosis is discussed in Lee et al. (2000).
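The margin-maximization idea can be illustrated in one dimension, where the maximum-margin "plane" reduces to the midpoint between the closest elements of two separable classes; this toy function is a sketch of the principle, not an SVM implementation, and its inputs are assumed separable.

```python
def max_margin_threshold(neg, pos):
    """1-D illustration of the SVM idea: among all thresholds separating two
    separable sets of numbers, pick the one that maximizes the gap to the
    closest element of each class (those closest elements play the role of
    the support vectors)."""
    hi_neg, lo_pos = max(neg), min(pos)
    if hi_neg >= lo_pos:
        raise ValueError("classes are not separable by a threshold")
    threshold = (hi_neg + lo_pos) / 2.0   # midpoint of the gap maximizes the margin
    margin = (lo_pos - hi_neg) / 2.0      # distance to the nearest point on each side
    return threshold, margin
```

For example, separating {0, 1, 2} from {6, 8} gives the threshold 4 with margin 2, determined only by the boundary points 2 and 6.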
2.2 Risk prediction by logical analysis
Risk stratification is very common in medical practice. It is defined as the ability to predict undesirable outcomes by assessing patients using the available data: age, gender, health history, specific measurements like EEG, ECG, heart rate, etc. (Califf et al., 1996). The usefulness of any risk-stratification scheme arises from how it links the data to a specific outcome. Risk-stratification systems are usually based on standard statistical models (Hosmer and Lemeshow, 1989). Recently, a new methodology for risk prediction in medical applications using Logical Analysis of Data (LAD) was proposed (Alexe et al., 2003). The LAD technique was first introduced in Hammer (1986). This methodology is based on combinatorial optimization and Boolean logic. It has been successfully applied to knowledge discovery and pattern recognition not only in medicine, but also in oil exploration, seismology, finance, etc. (Boros et al., 2000). Next, we briefly describe the main idea of LAD. More detailed information about this approach can be found in Boros et al. (1997), Ekin et al. (2000), and Alexe et al. (2003). Let Ω ⊂ R^n be a set of observations. By Ω+ and Ω− we denote the subsets of positive and negative observations, respectively. We also need to define the notion of a pattern P:

P = {x ∈ R^n : x_i ≥ α_i, i ∈ I;  x_j ≤ β_j, j ∈ J},   (8.7)
where α_i (i ∈ I) and β_j (j ∈ J) are real numbers (so-called cutpoints), and I and J are sets of indices. A pattern P is positive (negative) if P ∩ Ω+ ≠ ∅ (P ∩ Ω− ≠ ∅) and P ∩ Ω− = ∅ (P ∩ Ω+ = ∅). Obviously, in the general case of real-life applications, the number of detected patterns can be extremely large. Patterns are characterized by three parameters: degree, prevalence, and risk. The degree of a pattern is the number of inequalities identifying the pattern in (8.7). The total number of observations in the pattern P is called its absolute prevalence. The relative prevalence of a pattern P is defined as the ratio of its absolute prevalence to |Ω|. The risk ρ_P of a pattern P is the proportion of positive observations in the pattern:

ρ_P = |P ∩ Ω+| / |P ∩ Ω|.
By introducing thresholds on degree, prevalence, and risk, we can identify the high-risk and low-risk patterns in the given dataset. The set Σ = Σ+ ∪ Σ− is called a pandect, where Σ+ (Σ−) is the set of high-risk (low-risk) patterns. An application of LAD to coronary risk prediction is presented in Alexe et al. (2003), where the problem of constructing a methodology for distinguishing groups of patients at high and at low mortality risk is addressed. The size of the pandect Σ in the considered problem was about 4700 low- and high-risk patterns, which is obviously too large for practical applications. To overcome this difficulty, a nonredundant system of low- and high-risk patterns T = T+ ∪ T− satisfying certain properties was obtained.
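As a sketch of how a pattern's degree, prevalence, and risk could be computed, the following Python fragment represents a pattern by its lower cutpoints α_i and upper cutpoints β_j; the dictionary-based data structures are assumptions made for illustration, not the representation used in the LAD literature.

```python
def pattern_membership(x, lower, upper):
    """A point satisfies the pattern if it meets every interval condition.
    `lower` maps attribute index -> cutpoint alpha_i (requires x[i] >= alpha_i);
    `upper` maps attribute index -> cutpoint beta_j  (requires x[j] <= beta_j)."""
    return all(x[i] >= a for i, a in lower.items()) and \
           all(x[j] <= b for j, b in upper.items())

def pattern_stats(pattern, positives, negatives):
    """Degree, relative prevalence, and risk of a pattern over the data:
    degree = number of inequalities, prevalence = fraction of all
    observations covered, risk = fraction of covered observations that
    are positive."""
    lower, upper = pattern
    in_pos = [x for x in positives if pattern_membership(x, lower, upper)]
    in_neg = [x for x in negatives if pattern_membership(x, lower, upper)]
    covered = len(in_pos) + len(in_neg)
    degree = len(lower) + len(upper)
    prevalence = covered / (len(positives) + len(negatives))
    risk = len(in_pos) / covered if covered else 0.0
    return degree, prevalence, risk
```

For instance, the degree-1 pattern "x_0 ≥ 2" over two positive and two negative observations below covers only the positives, so its risk is 1.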
Using the system T defined above, the following classification tool, referred to as the prognostic index π(x), is defined:

π(x) = τ+(x)/τ+ − τ−(x)/τ−,

where τ+ (τ−) is the number of high-risk (low-risk) patterns in T, and τ+(x) (τ−(x)) is the number of high-risk (low-risk) patterns that are
satisfied by an observation x. Using π(x), an observation x is classified as low- or high-risk depending on the sign of π(x). The number of patients classified into the high- and low-risk groups was more than 97% of the size of the studied population. The proposed technique was shown to outperform standard methods used by cardiologists (Alexe et al., 2003).
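A minimal sketch of the prognostic index, assuming it is the difference between the fractions of high-risk and low-risk patterns satisfied by the observation; the pattern representation and the `satisfies` predicate here are hypothetical placeholders.

```python
def prognostic_index(x, high_risk_patterns, low_risk_patterns, satisfies):
    """pi(x) = (fraction of high-risk patterns x satisfies)
             - (fraction of low-risk patterns x satisfies);
    a positive sign classifies x as high-risk, a negative sign as low-risk."""
    tau_plus = sum(satisfies(x, p) for p in high_risk_patterns) / len(high_risk_patterns)
    tau_minus = sum(satisfies(x, p) for p in low_risk_patterns) / len(low_risk_patterns)
    return tau_plus - tau_minus
```

With patterns represented as simple predicates, an observation satisfying every high-risk pattern and no low-risk pattern receives the maximal index of 1.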
2.3 Brain dynamics and epileptic seizure prediction
The human brain is one of the most complex systems ever studied by scientists. The enormous number of neurons and the dynamic nature of the connections between them make the analysis of brain function especially challenging. Probably the most important direction in studying the brain is treating disorders of the central nervous system. For instance, epilepsy is a common form of such disorders, affecting approximately 1% of the human population. Essentially, epileptic seizures represent excessive and hypersynchronous activity of the neurons in the cerebral cortex. During the last several years, significant progress in the field of epileptic seizure prediction has been made. The advances are associated with the extensive use of electroencephalograms (EEG), which can be treated as a quantitative representation of brain functioning. Motivated by the fact that the complexity and variability of the epileptic seizure process in the human brain cannot be captured by traditional methods used to process physiological signals, in the late 1980s Iasemidis and coworkers pioneered the use of the theory of nonlinear dynamics to link neuroscience with an obscure branch of mathematics and try to understand the collective dynamics of billions of interconnected neurons in the brain (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001). In those studies, measures of the spatiotemporal dynamical properties of the EEG were shown to demonstrate patterns that correspond to specific clinical states (a diagram of electrode locations is provided in Figure 8.2). Since the brain is a nonstationary system, algorithms used to estimate measures of brain dynamics should be capable of automatically identifying and appropriately weighing existing transients in the data. In a chaotic system, orbits originating from similar initial conditions (nearby points in the state space) diverge exponentially (the expansion process).
The rate of divergence is an important aspect of the system dynamics and is reflected in the value of the Lyapunov exponents. The method developed for the estimation of the short-term maximum Lyapunov exponent (STLmax), an estimate of Lmax for nonstationary data, is explained in Iasemidis et al. (2000). Having estimated the STLmax temporal profiles
Figure 8.2. (A) Inferior transverse and (B) lateral views of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings. Subdural electrode strips are placed over the left orbitofrontal (AL), right orbitofrontal (AR), left subtemporal (BL), and right subtemporal (BR) cortex. Depth electrodes are placed in the left temporal depth (CL) and right temporal depth (CR) to record hippocampal activity.
at individual cortical sites, and as the brain proceeds towards the ictal state, the temporal evolution of the stability of each cortical site can be quantified. However, the system under consideration (the brain) has a spatial extent and, as such, information about the transition of the system towards the ictal state should also be included in the interactions of its spatial components. The spatial dynamics of this transition are captured by considering the relations of the STLmax between different cortical sites. For example, if a similar transition occurs at different cortical sites, the STLmax of the involved sites are expected to converge to similar values prior to the transition. Such participating sites are called "critical sites," and such a convergence "dynamical entrainment." More specifically, in order for the dynamical entrainment to have a statistical content, the T-index (from the well-known paired T-statistic for comparison of means) can be used as a measure of distance between the mean values of pairs of STLmax profiles over time. The T-index at time t between electrode sites i and j is defined as

T_{i,j}(t) = √N · |E{STLmax,i − STLmax,j}| / σ_{i,j}(t),

where E{·} is the sample average of the differences STLmax,i − STLmax,j estimated within a moving window w_t(λ) defined as

w_t(λ) = 1 if λ ∈ [t − N − 1, t],   w_t(λ) = 0 if λ ∉ [t − N − 1, t],
where N is the length of the moving window. Then σ_{i,j}(t) is the sample standard deviation of the STLmax differences between electrode sites i and j within the moving window w_t(λ). The T-index thus defined follows a t-distribution with N − 1 degrees of freedom. Therefore, a two-sided t-test with N − 1 degrees of freedom at a statistical significance level α should be used to test the null hypothesis H0: "brain sites i and j acquire identical STLmax values at time t." Not surprisingly, the interictal (before), ictal (during), and immediate postictal (after the seizure) states differ with respect to the spatiotemporal dynamical properties of intracranial EEG recordings. However, the most remarkable finding was the discovery of characteristic spatiotemporal patterns among critical electrode sites during the hour preceding seizures (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001; Sackellares et al., 2002; Pardalos et al., 2003b,a,c). Such critical electrode sites can be selected by applying quadratic optimization techniques, and the electrode selection problem can be formulated as a quadratically constrained quadratic 0-1 problem (Pardalos et al., 2004a):

min x^T A x   (8.9)
where the following definitions are used: A is an n × n matrix, each element a_{i,j} of which represents the T-index between electrodes i and j within a 10-minute window before the onset of a seizure; B is an n × n matrix, each element b_{i,j} of which represents the T-index between electrodes i and j within a 10-minute window after the onset of a seizure; k denotes the number of selected critical electrode sites; T_c is the critical value of the T-index for rejecting H0; and x = (x_1, ..., x_n) ∈ {0, 1}^n, where each x_i represents the cortical electrode site i. If the cortical site i is selected to be one of the critical electrode sites, then x_i = 1; otherwise, x_i = 0. The use of a quadratic constraint ensures that the selected electrode sites show dynamical resetting of the brain following seizures (Shiau et al., 2000; Pardalos et al., 2002a), that is, divergence of the STLmax profiles after seizures. The seizure prediction algorithm based on nonlinear dynamics and multi-quadratic 0-1 programming is described in more detail in Pardalos et al. (2004a). Other groups reported evidence in support of the existence of the preictal transition, detectable through quantitative analysis of the EEG, in Elger and Lehnertz (1998), Lehnertz and Elger (1998), Martinerie et al. (1998), Quyen et al. (1999), and Litt et al. (2001). The use of an algebraic-geometric approach to the study of dynamic processes in the brain is presented in Pardalos et al. (2003d). Quantum models are discussed in Pardalos et al. (2002b) and Jibu and Yasue (1995).
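For very small instances, the selection problem can be illustrated by brute force, assuming (as described in the text) that exactly k sites are selected and that a quadratic constraint on B enforces post-seizure divergence; the particular threshold form below is an assumption made for illustration, not the formulation of Pardalos et al. (2004a).

```python
import itertools
import numpy as np

def select_critical_sites(A, B, k, divergence_threshold):
    """Brute-force solver for a tiny quadratically constrained quadratic
    0-1 selection problem: minimize x^T A x over 0-1 vectors with exactly
    k ones, subject to x^T B x >= divergence_threshold.  Enumeration is
    only viable for small n; realistic instances need MIP/QP solvers."""
    n = A.shape[0]
    best_value, best_sites = None, None
    for sites in itertools.combinations(range(n), k):
        x = np.zeros(n)
        x[list(sites)] = 1.0
        if x @ B @ x >= divergence_threshold:   # post-seizure divergence constraint
            value = x @ A @ x                   # pre-seizure entrainment objective
            if best_value is None or value < best_value:
                best_value, best_sites = value, sites
    return best_sites, best_value
```

On a toy 4-site example where the pre-seizure T-indices between sites 0 and 1 are much smaller than all others, the procedure selects exactly that entrained pair.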
3. Treatment planning
In this section we discuss an application of optimization techniques in treatment planning. Probably the most developed and popular domain in medicine where optimization techniques are used for treatment planning is radiation therapy. Radiation therapy is a method of treating cancer with high-energy radiation that destroys the ability of cancerous cells to reproduce. There are two types of radiation therapy. The first is external beam radiation, with high-energy rays aimed at the cancerous tissues. A multileaf collimator shapes the beam by blocking out some of its parts. To shape the beam precisely, multileaf collimators consist of a small array
of metal leaves for each beam. Thus, each beam is specified by a set of evenly spaced strips (pencils), and the treatment plan is defined by a collection of beams together with the amount of radiation to be delivered along each pencil within each beam. The other radiation therapy method is called brachytherapy. In this type of treatment, radioactive sources (seeds) are placed in or near the tumors. Both types of therapy need to be planned so as to localize the radiation area with a minimum of destroyed healthy tissue. For external beam radiation therapy, radiation planning involves the specification of the beams: their direction, intensity, and shape. It is a difficult problem because one needs to optimize the dose to the tumor (cancerous area) and minimize the damage to healthy organs simultaneously. To reduce the difficulty of the treatment planning procedure, optimization techniques have been applied, and numerous optimization algorithms have been developed for treatment planning in radiation therapy. As one possible approach one can consider multi-objective optimization techniques (Lahanas et al., 2003a,b). Linear (Lodwick et al., 1998), mixed-integer (Lee and Zaider, 2003; Lee et al., 2003b), and nonlinear programming (Billups and Kennedy, 2003; Ferris et al., 2003) techniques are extensively used in therapy planning. The initial step in any radiotherapy planning is to obtain a set of tomography images of the patient's body around the tumor. The images are then discretized into sets of pixels: critical (the set of pixels with healthy tissue sensitive to radiotherapy), body (the set of pixels with healthy tissue not very sensitive to radiotherapy), and tumor (the set of pixels with cancer cells). The formulations of the treatment planning models (linear, quadratic, etc.) depend on the specified clinical constraints: dose homogeneity, target coverage, dose limits for different anatomical structures, etc.
As an example of a problem arising in this area, consider the work presented in Billups and Kennedy (2003), where the following formulation based on Lodwick et al. (1998) was discussed:

minimize    max dose to critical structures
subject to  required tumor dose ≤ tumor dose ≤ max tumor dose,
            normal tissue dose ≤ dose bound for normal tissue,
            dose ≥ 0.
Billups and Kennedy (2003) formulate the problem as follows:

min_{y, x}  y

s.t.  y − Σ_{p∈P} Σ_{b∈B} D(c, p, b) x(p, b) ≥ 0,   c ∈ critical,
      T_l ≤ Σ_{p∈P} Σ_{b∈B} D(τ, p, b) x(p, b) ≤ T_u,   τ ∈ tumor,
      x(p, b) ≥ 0,
where x(p, b) is the amount of radiation to be delivered along the p-th pencil of the b-th beam, D(i, p, b) is the fraction of x(p, b) that will be delivered to pixel i, T_l and T_u are the lower and upper bounds on the amount of radiation delivered to a tumor element, respectively, and y is a dummy variable. If we denote by S the feasible region, the problem above can be written as min_{y, x ∈ S} y.
In order to reduce the number of beams used, Billups and Kennedy (2003) penalized the objective function by a fixed penalty P for each beam used, so that the objective becomes y plus P times the number of beams in use.
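A toy instance of the min-max critical-dose model can be solved with SciPy's `linprog`; the dose-deposition matrices and tumor-dose bounds below are made-up numbers for illustration, not clinical data, and the beam/pencil structure is flattened into a single intensity vector.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical dose-deposition data: 2 pencils, 2 critical pixels, 2 tumor pixels.
D_crit = np.array([[0.1, 0.2],
                   [0.2, 0.1]])   # dose to each critical pixel per unit intensity
D_tum = np.array([[1.0, 0.0],
                  [0.0, 1.0]])    # dose to each tumor pixel per unit intensity
T_l, T_u = 10.0, 14.0             # required and maximum tumor dose

# Variables z = (x_1, x_2, y): minimize y, an upper bound on every critical dose.
n = D_crit.shape[1]
c = np.concatenate([np.zeros(n), [1.0]])
A_ub = np.vstack([
    np.hstack([D_crit, -np.ones((D_crit.shape[0], 1))]),   # D_crit x - y <= 0
    np.hstack([D_tum, np.zeros((D_tum.shape[0], 1))]),     # D_tum x <= T_u
    np.hstack([-D_tum, np.zeros((D_tum.shape[0], 1))]),    # -D_tum x <= -T_l
])
b_ub = np.concatenate([np.zeros(D_crit.shape[0]),
                       np.full(D_tum.shape[0], T_u),
                       np.full(D_tum.shape[0], -T_l)])
bounds = [(0, None)] * n + [(None, None)]                  # x >= 0, y free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
intensities, max_critical_dose = res.x[:n], res.fun
```

In this tiny example the optimum delivers exactly the required tumor dose with both pencil intensities at 10, since any extra intensity only raises the critical-pixel dose.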
In order to reduce the radiation exposure of healthy tissue, brachytherapy was developed as an alternative to external beam radiation. Nevertheless, the correct placement of seeds in tumors is a complicated problem. Lee and Zaider (2003) developed treatment planning for prostate cancer cases using a mixed-integer programming (MIP) optimization model. This algorithm uses 0-1 variables to indicate the placement or non-placement of seeds in a three-dimensional grid generated from an ultrasound image. Since each seed radiates a certain amount of dose, the radiation dose at a point P can be modeled from the locations of the seeds implanted in the tumor. Using this idea, the authors formulated the contribution of the seeds at point P as

Σ_j D(‖X_j − P‖) x_j,
where D(r) is the dose contribution function, X_j is the vector corresponding to grid point j, and x_j is the 0-1 seed placement variable at j. The constraints of the MIP model can be stated using lower and upper bounds on the dose at each point P:

L_P ≤ Σ_j D(‖X_j − P‖) x_j ≤ U_P,
where U_P and L_P are the upper and lower bounds on the radiation dose at point P, respectively. A review of optimization methods in radiation therapy is presented in Shepard et al. (1999). Some promising directions for future research in radiation therapy are also discussed in Lee et al. (2003a). For descriptions of some other treatment planning problems (besides radiation therapy), the reader is referred to Sainfort et al. (2004).
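A brute-force sketch of the 0-1 seed placement idea: choose the fewest seeds that keep every dose point within its bounds. The candidate sites, dose points, and dose-contribution function are hypothetical, and realistic grid sizes require the mixed-integer programming approach described above.

```python
import itertools
import numpy as np

def plan_seeds(candidate_sites, dose_points, dose_fn, lower, upper):
    """Choose the fewest seeds (0-1 placement over candidate grid sites) so
    that every dose point P receives sum_j D(||X_j - P||) x_j within
    [lower, upper].  Exhaustive search over subsets; illustrative only."""
    n = len(candidate_sites)
    # contrib[p][j]: dose contribution of a seed at site j to dose point p
    contrib = [[dose_fn(np.linalg.norm(np.asarray(X) - np.asarray(P)))
                for X in candidate_sites] for P in dose_points]
    for size in range(n + 1):                       # fewest seeds first
        for chosen in itertools.combinations(range(n), size):
            doses = [sum(contrib[p][j] for j in chosen)
                     for p in range(len(dose_points))]
            if all(lower <= d <= upper for d in doses):
                return chosen
    return None                                     # no feasible placement
```

With three candidate sites on a line, two dose points, and a simple 1/(1+r) falloff, the search returns the smallest feasible pair of seeds.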
4. Medical imaging
Recent advances in imaging technologies, combined with marked improvements in instrumentation and the development of computer systems, have resulted in increasingly large amounts of information. New therapies impose greater requirements on the quality and accuracy of image information. Therefore, medical imaging plays an ever-increasing role in diagnosis, prediction, planning, and decision-making. Many problems in this field are addressed using optimization and mathematical programming techniques. In particular, specialized mathematical programming techniques have been used in a variety of domains including object recognition, modeling and retrieval, image segmentation, registration, skeletonization, reconstruction, classification, etc. (Cho et al., 1993; Kuba et al., 1999; Udupa, 1999; Rangarajan et al., 2003; Du et al., 1999). Some recent publications on imaging using optimization address the following problems: reconstruction methods in electron paramagnetic resonance imaging (Johnson et al., 2003), skeletonization of vascular images in magnetic resonance angiography (Nystrom and Smedby, 2001), image reconstruction using multi-objective optimization (Li et al., 2000), etc. Discrete tomography extensively utilizes discrete mathematics and optimization theory. A nice overview of medical applications of discrete tomography is given in Kuba et al. (1999).
5. Health care applications
Optimization and operations research methods are extensively used in a variety of problems in health care, including economic analysis (optimal pricing, demand forecasting and planning), health care unit operations (scheduling and logistics planning, inventory management, supply chain management, quality management, facility location), etc.
Scheduling and logistics problems are among the most important and classical problems in optimization theory. One widely known application of these problems in medicine is the so-called nurse scheduling problem. Nurse scheduling is a non-trivial and important task, because it affects the efficiency and quality of health care (Giglio, 1991). The schedule has to determine the daily assignments of each nurse for a specified period of time while respecting certain constraints on hospital policy, personal preferences and qualifications, workload, etc. Because of the practical importance of these problems, many algorithms and methods for solving them have been proposed (Miller et al., 1976; Warner, 1976; Isken and Hancock, 1991; Siferd and Benton, 1992; Weil et al., 1995). There are basically two types of nurse scheduling: cyclical and non-cyclical. In cyclical scheduling, an individual nurse works in a pattern repeated in a cycle of N weeks. Non-cyclical scheduling, on the other hand, generates a new schedule for each scheduling period from the available resources and policies, attempting to satisfy a given set of constraints. Recently, the problem of rerostering nurse schedules was addressed in Moz and Pato (2003). This problem is common in hospitals where daily work is divided into shifts. It occurs in the case of a non-scheduled absence of one of the nurses, which violates one of the constraints for the given time shift. In Moz and Pato (2003), an integer multicommodity flow model was applied to the aforementioned problem and the corresponding integer linear programming problem was formulated. Computational results were reported for real instances from a Lisbon state hospital. In general, optimization techniques, and linear programming in particular, are very powerful tools that can be used for many diverse problems in health care applications.
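A toy version of the non-cyclical nurse scheduling task described above can be written as an exhaustive search; the constraints here (one shift per nurse per day, a total workload cap, fixed staffing per slot) are simplified assumptions, and practical rosters are produced by integer programming as in the cited studies.

```python
import itertools

def schedule_nurses(nurses, days, shifts, required, max_shifts):
    """Staff every (day, shift) slot with `required` nurses so that no
    nurse works two shifts on the same day or more than `max_shifts`
    shifts in total.  Exhaustive search; illustrative only."""
    slots = [(d, s) for d in days for s in shifts]
    crews = list(itertools.combinations(nurses, required))
    for assignment in itertools.product(crews, repeat=len(slots)):
        # workload cap: total shifts per nurse
        load = {n: 0 for n in nurses}
        for crew in assignment:
            for n in crew:
                load[n] += 1
        if any(v > max_shifts for v in load.values()):
            continue
        # a nurse may appear at most once among the shifts of each day
        if all(len({n for (dd, _), crew in zip(slots, assignment) if dd == d
                    for n in crew})
               == sum(len(crew) for (dd, _), crew in zip(slots, assignment)
                      if dd == d)
               for d in days):
            return dict(zip(slots, assignment))
    return None    # no feasible roster under these constraints
```

For one day with a day and a night shift, three nurses, and a one-shift cap, the first feasible roster assigns two distinct nurses to the two shifts.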
For example, we can refer to Sewell and Jacobson (2003), where the problem of pricing combination vaccines for childhood immunization was addressed using an integer programming formulation. Other important problems in health care applications (inventory and queueing management, workforce and workload models, pricing, forecasting, etc.) are reviewed in Sainfort et al. (2004).
6. Concluding remarks
In this chapter, we have identified and briefly summarized some of the promising research directions in the exciting interdisciplinary area of optimization in medicine. Although this review is certainly not exhaustive, we have described several important practical problems arising in various medical applications, as well as the methods and algorithms used
for solving these problems. As we have seen, applying optimization techniques in medicine can often significantly improve the quality of medical treatment. It is also important to note that this research area is constantly growing, since new techniques are needed to process and analyze the huge amounts of data arising in medical applications. Addressing these issues may require a higher level of interdisciplinary effort in order to develop efficient optimization models combining mathematical theory and medical practice.
Acknowledgements. This work was partially supported by a grant from the McKnight Brain Institute of the University of Florida and NIH.
References

Alexe, S., Blackstone, E., Hammer, P., Ishwaran, H., Lauer, M., and Snader, C.P. (2003). Coronary risk prediction by logical analysis of data. Annals of Operations Research, 119:15-42.
Billups, S. and Kennedy, J. (2003). Minimum-support solutions for radiotherapy. Annals of Operations Research, 119:229-245.
Boros, E., Hammer, P., Ibaraki, T., and Kogan, A. (1997). Logical analysis of numerical data. Mathematical Programming, 79:163-190.
Boros, E., Hammer, P., Ibaraki, T., Kogan, A., Mayoraz, E., and Muchnik, I. (2000). An implementation of logical analysis of data. IEEE Transactions on Knowledge and Data Engineering, 12:292-306.
Bradley, P., Fayyad, U., and Mangasarian, O. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11(3):217-238.
Bradley, P., Mangasarian, O., and Street, W. (1998). Feature selection via mathematical programming. INFORMS Journal on Computing, 10:209-217.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
Califf, R., Armstrong, P., Carver, J., D'Agostino, R., and Strauss, W. (1996). Stratification of patients into high, medium and low risk subgroups for purposes of risk factor management. Journal of the American College of Cardiology, 27(5):1007-1019.
Cho, Z., Jones, J., and Singh, M. (1993). Foundations of Medical Imaging. Wiley.
228
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Optimization Techniques in Medicine
Chapter 9
GLOBAL OPTIMIZATION IN GEOMETRY - CIRCLE PACKING INTO THE SQUARE

Péter Gábor Szabó, Mihály Csaba Markót, Tibor Csendes

Abstract: The present review paper summarizes the research work done, mostly by the authors, in recent years on packing equal circles in the unit square.

1. Introduction
The problem of finding the densest packing of n equal objects in a bounded space is a classical one which arises in many scientific and engineering fields. For the two-dimensional case, it is a well-known problem of discrete geometry. The Hungarian mathematician Farkas Bolyai (1775-1856) published in his principal work ('Tentamen', 1832-33; Bolyai, 1904) a dense regular packing of equal circles in an equilateral triangle (see Figure 9.1). He defined an infinite packing series and investigated the limit of vacuitas (the gap in the triangle outside the circles). It is interesting that these packings are not always optimal, in spite of the fact that they are based on hexagonal grid packings (Szabó, 2000a). Bolyai was probably the first author in the mathematical literature who studied the density of a series of circle packings in a bounded shape. Of course, the work of Bolyai was not the very first on packing circles. There were other interesting early packings in fine arts, relics of religions, and in nature (Tarnai, 1997), too. The old Japanese sangaku problems (Fukagawa and Pedoe, 1989; Szabó, 2001) contain many nice results related to the packing of circles. Figure 9.2 shows an example of packing 6 equal circles in a rectangle. The problem of finding the densest packing of n equal and non-overlapping circles has been studied for several shapes of the bounding
Figure 9.1. The example of Bolyai for packing 19 equal circles in an equilateral triangle.
Figure 9.2. Packing of 6 equal circles in a rectangle on a rock from Japan.
region, e.g., in a rectangle (Ruda, 1969), in a triangle (Graham and Lubachevsky, 1995) and in a circle (Graham et al., 1998). Our work focuses only on the 'Packing of Equal Circles in a Square' problem. The Hungarian mathematicians Dezső Lázár and László Fejes Tóth had already investigated the problem before 1940 (Staar, 1990; Szabó and Csendes, 2001). The problem first appeared in the literature in 1960, when Leo Moser (1960) guessed the optimal arrangement of 8 circles. Schaer and Meir (1965) proved this conjecture and Schaer (1965) solved the n = 9 case, too. Schaer also gave a proof for n = 7 in a letter to Leo Moser in 1964, but he never published it. There is a similar unpublished result by R. Graham in a private letter for n = 6. Later Schwartz (1970) and Melissen (1994a) gave proofs for this case (up to n = 5 circles the problem is trivial). The next challenge was the n = 10 case. de Groot et al. (1990) solved it after many authors had published new and improved packings: Goldberg (1970); Milano (1987); Mollard and Payan (1990); Schaer (1971); Schlüter (1979) and Valette (1989). Some unpublished results are known in this case as well: Grünbaum (1990); Grannell (1990); Petris and Hungerbühler (1990). The proof is based on a computer-aided method, and nobody has published a proof using only pure mathematical tools. There is an interesting mathematical approach to this case in Hujter (1999). Peikert et al. (1992) found and proved optimal packings up to n = 20 using a computer-aided method. Based on theoretical tools only, G. Wengerodt solved the problem for n = 14, 16 and 25 (Wengerodt, 1983, 1987a,b), and with K. Kirchner for n = 36 (Kirchner and Wengerodt, 1987). In the last decades, several deterministic (Locatelli and Raber, 2002; Markót, 2003a; Markót and Csendes, 2004; Nurmela and Östergård, 1999a; Peikert et al., 1992) and stochastic (Boll et al., 2000; Casado et al., 2001; Graham and Lubachevsky, 1996) methods were published.
Proven optimal packings are known up to n = 30 (Nurmela and Östergård, 1999a; Peikert et al., 1992; Markót, 2003a; Markót and Csendes, 2004) and for n = 36 (Kirchner and Wengerodt, 1987). Approximate packings (packings determined by computer-aided numerical computations without a rigorous proof) and candidate packings (best known arrangements with a proof of existence but without proof of optimality) were reported in the literature for up to n = 200: Boll et al. (2000); Casado et al. (2001); Graham and Lubachevsky (1996); Nurmela and Östergård (1997); Szabó and Specht (2005). At the same time, some other results (e.g., repeated patterns, properties of the optimal solutions and bounds, minimal polynomials of packings) were published as well (Graham and Lubachevsky, 1996; Locatelli and Raber, 2002; Nurmela
et al., 1999; Tarnai and Gáspár, 1995-96; Szabó, 2000b; Szabó et al., 2001; Szabó, 2004).
2. The packing circles in a square problem

The packing circles in a square problem can be described by the following equivalent problem settings:
PROBLEM 1 Find the value of the maximum circle radius, r_n, such that n equal non-overlapping circles can be placed in a unit square.

PROBLEM 2 Locate n points in a unit square, such that the minimum distance m_n between any two points is maximal.
PROBLEM 3 Give the smallest square of side p_n which contains n equal and non-overlapping circles, where the radius of the circles is 1.
PROBLEM 4 Determine the smallest square of side a_n that contains n points with mutual distance of at least 1.
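The four settings are equivalent up to scaling: once the optimal separation m_n of Problem 2 is known, the other three optimal values follow from it by elementary geometry. The sketch below records these standard conversions; the function names are ours, introduced only for illustration.

```python
import math

def radius_from_min_distance(m: float) -> float:
    """Problem 2 -> Problem 1: circle centres must lie in [r, 1-r]^2 and be
    at least 2r apart; rescaling that inner square to the unit square gives
    m = 2r/(1 - 2r), hence r = m / (2(1 + m))."""
    return m / (2.0 * (1.0 + m))

def side_for_unit_circles(m: float) -> float:
    """Problem 2 -> Problem 3: blowing circles of radius r_n up to radius 1
    scales the unit square to side p_n = 1/r_n."""
    return 1.0 / radius_from_min_distance(m)

def side_for_unit_distance(m: float) -> float:
    """Problem 2 -> Problem 4: scaling the minimum distance m_n up to 1
    scales the unit square to side a_n = 1/m_n."""
    return 1.0 / m

# For n = 2 the two points sit in opposite corners, so m_2 = sqrt(2)
# and r_2 = sqrt(2)/(2 + 2*sqrt(2)) = 1 - sqrt(2)/2.
r2 = radius_from_min_distance(math.sqrt(2.0))
```

The same conversions connect the numerical results quoted later: an enclosure of m_n immediately yields an enclosure of r_n, p_n and a_n.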
2.1 Optimization models

The problem is on the one hand a geometrical problem and on the other hand a continuous global optimization problem. Problem 2 can be written shortly as a (2n+1)-dimensional continuous nonlinear constrained (or max-min) global optimization problem in the following form:

maximize m_n
subject to m_n <= sqrt((x_i - x_j)^2 + (y_i - y_j)^2), 1 <= i < j <= n,
0 <= x_i, y_i <= 1, i = 1, ..., n,

where the 2n+1 variables are the point coordinates x_i, y_i and the distance bound m_n.
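As a concrete, non-rigorous illustration of this max-min model, the following sketch evaluates the objective for a candidate configuration and improves it by naive random restarts. This is only meant to show the shape of the search problem; the serious solvers discussed in this survey use local optimization or interval branch-and-bound instead, and all names here are ours.

```python
import itertools
import math
import random

def min_pairwise_distance(xs, ys):
    """The max-min objective: the smallest distance between any two of the
    n points (the quantity m_n that Problem 2 maximizes)."""
    return min(math.dist(p, q)
               for p, q in itertools.combinations(list(zip(xs, ys)), 2))

def random_search(n, trials=2000, seed=0):
    """Keep the best of `trials` random configurations - far too weak to
    find optima, but it illustrates the (2n+1)-dimensional search."""
    rng = random.Random(seed)
    best_val, best = -1.0, None
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        v = min_pairwise_distance(xs, ys)
        if v > best_val:
            best_val, best = v, (xs, ys)
    return best_val, best
```

For n = 4 the optimum is attained at the four corners of the square, with m_4 = 1.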
This problem can be considered in the following ways:
a) as a DC programming problem (Horst and Thoai, 1999). A DC (difference of convex functions) programming problem is a mathematical programming problem where the objective function can be described by a difference of two convex functions. The objective function of the problem can be stated as the difference of the following two convex functions g and h:
where
b) as an all-quadratic optimization problem. The general form of an all-quadratic optimization problem (Raber, 1999) is

min x^T Q^0 x + (d^0)^T x
subject to x^T Q^l x + (d^l)^T x + c^l <= 0, l = 1, ..., p.
n > 27, n ≠ 36 was the highly increasing number of initial tile combinations. For n = 28, a sequential process on those combinations would have required about 1000 times more processor time (about several decades) even with non-interval computations, compared to the case of n = 27. The idea behind the newly proposed method is that we can utilize the local relations (patterns) between the tiles and eliminate groups of tile combinations together. Let us denote a generalized point packing problem instance by P(n, X_1, ..., X_n, Y_1, ..., Y_n), where n is the number of points to be located, (X_i, Y_i) ∈ I², i = 1, ..., n are the components of the starting box, and the objective function of the problem is given by (9.3). The theorem below shows how to apply a result achieved on a 2m-dimensional packing problem to a 2n-dimensional problem with n ≥ m ≥ 2.
THEOREM 9.7 (Markót and Csendes, 2004) Assume that n ≥ m ≥ 2 are integers and let

P_m = P(m, Z_1, ..., Z_m, W_1, ..., W_m) = P(m, (Z, W))

and

P_n = P(n, X_1, ..., X_n, Y_1, ..., Y_n) = P(n, (X, Y))

be point packing problem instances (X_i, Y_i, Z_i, W_i ∈ I; X_i, Y_i, Z_i, W_i ⊆ [0, 1]). Run the B&B algorithm on P_m using an f~ cutoff value in the accelerating devices but skipping the step of improving f~. Stop after an arbitrary preset number of iteration steps. Let (Z'_1, ..., Z'_m, W'_1, ..., W'_m) := (Z', W') be the enclosure of all the elements placed on the WorkList and on the ResultList. Assume that there exists an invertible, distance-preserving geometric transformation φ satisfying φ(Z_i) = X_i and φ(W_i) = Y_i, i = 1, ..., m. Then for each point packing (x, y) with (x, y) ∈ (X, Y) and f_n(x, y) ≥ f~, the statement

(x, y) ∈ (φ(Z'_1), ..., φ(Z'_m), X_{m+1}, ..., X_n, φ(W'_1), ..., φ(W'_m), Y_{m+1}, ..., Y_n) := (X', Y')

also holds.
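The accelerating devices mentioned in the theorem rest on interval bounds of the objective over a box. The sketch below is our own simplified illustration of the cut-off test, with outward rounding omitted and the squared minimum pairwise distance standing in for the objective (9.3); a rigorous implementation would use directed rounding throughout.

```python
import itertools

def sq_diff_upper(I, J):
    """Upper bound of (u - v)^2 for u in I and v in J, with intervals given
    as (lo, hi) pairs: the difference lies in [I.lo - J.hi, I.hi - J.lo]."""
    lo, hi = I[0] - J[1], I[1] - J[0]
    return max(abs(lo), abs(hi)) ** 2

def objective_upper_bound(X, Y):
    """Upper bound of the squared minimum pairwise distance over the box
    (X, Y): the minimum over point pairs of the per-pair upper bounds."""
    n = len(X)
    return min(sq_diff_upper(X[i], X[j]) + sq_diff_upper(Y[i], Y[j])
               for i, j in itertools.combinations(range(n), 2))

def can_eliminate(X, Y, f_cutoff):
    """Cut-off test: if no configuration inside the box can reach the
    cutoff value f~, the whole box is discarded."""
    return objective_upper_bound(X, Y) < f_cutoff
```

If two components overlap heavily, no point configuration inside the box can keep the points far apart, so the whole box can be eliminated at once.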
Figure 9.7. The idea behind processing tile combinations.
The meaning of Theorem 9.7 is the following: assume that we are able to reduce some search regions on a tile set S'. When processing a higher-dimensional subproblem on a tile set S containing the image of the tile set of the smaller problem, it is enough to consider the image of the remaining regions of S' as the particular components of the latter problem. Figure 9.7 illustrates the application of this idea to handling sets of tile combinations: the remaining regions of the tile combinations S and S' are given by the shaded areas. The transformation φ is a reflection across the horizontal centerline of the rectangular region enclosing S'.
COROLLARY 9.1 (Markót and Csendes, 2004) Let φ be the identity transformation and assume that the B&B algorithm terminates with an empty WorkList and with an empty ResultList, i.e., the whole search region (Z, W) = (Z_1, ..., Z_m, W_1, ..., W_m) = (X_1, ..., X_m, Y_1, ..., Y_m) is eliminated by the accelerating devices using (the same) f~. Then (X, Y) does not contain any (x, y) ∈ R^{2n} vectors for which f_n(x, y) ≥ f~ holds.
6.8 Tile algorithms used in the optimality proofs

The method of the optimality proofs starts by finding feasible tile patterns and their remaining areas on some small subsets of the whole set of tiles. Then bigger and bigger subsets are processed while using the results of the previous steps. Thus, the whole method consists of several phases. The two basic procedures are:
Grow(): add tiles from a new column to each element of a set of tile combinations.
Join(): join the elements of two sets of tile combinations pairwise.

The detailed description of Join() and Grow() and the strategy of increasing the dimensionality of the subproblems can be found in Markót and Csendes (2004).
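In outline, the two procedures can be pictured on sets of tile combinations as follows. This is a deliberately simplified sketch: tile combinations appear as plain tuples and `feasible` is a placeholder for the interval-based elimination tests of the actual proofs in Markót and Csendes (2004).

```python
from itertools import product

def grow(combos, new_column_tiles, feasible):
    """Grow(): extend each combination with each tile of the next column,
    keeping only the combinations that pass the feasibility test."""
    return [c + (t,) for c, t in product(combos, new_column_tiles)
            if feasible(c + (t,))]

def join(left, right, feasible):
    """Join(): concatenate the elements of two sets of tile combinations
    pairwise, again filtering out infeasible results."""
    return [a + b for a, b in product(left, right) if feasible(a + b)]
```

Eliminating a combination early prunes every extension it would have produced, which is what keeps the number of combinations manageable as the subsets grow.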
6.9 Numerical results: optimal packings for n = 28, 29, 30

The results obtained with the multiphase interval arithmetic based optimality proofs are summarized below. Apart from symmetric cases, one initial tile combination (more precisely, the remaining areas of the particular combination) contains all the global optimal solutions of the packing problem of n points. The guaranteed enclosures of the global maximum values of Problem 2 are
F*_28 = [0.2305354936426673, 0.2305354936426743], w(F*_28) ≈ 7 · 10^-15,
F*_29 = [0.2268829007442089, 0.2268829007442240], w(F*_29) ≈ 2 · 10^-14,
F*_30 = [0.2245029645310881, 0.2245029645310903], w(F*_30) ≈ 2 · 10^-15.

The exact global maximum value differs from the currently best known function value by at most w(F*_n). Apart from symmetric cases, all the global optimizers of the problem of packing n points are located in an (X, Y) box (see Markót and Csendes, 2004). The components of the result boxes have widths of approximately 10^-12 (with the exception of the components enclosing possibly free points). The differences between the volume of the whole search space and the result boxes are more than 711, 764, and 872 orders of magnitude, respectively. The total computational time was approximately 53, 50, and 20 hours, respectively. The total time complexities are remarkably less than the forecasted execution times of the predecessor methods.
6.10 Optimality of the conjectured best structures
An optimal packing structure specifies which points are located on the sides of the square, which pairs have minimal distance, and which points of the packing can move while keeping optimality. The output of our methods serves only as a numerical approximation to the solution of the particular problems, but it says nothing about the structure of the optimal packing(s). Extending the ideas given in Nurmela and Östergård
(1999a) to an interval-based context, in a forthcoming paper we intend to prove also some structural properties of the global optimizers (for details see Markót, 2003b).
Acknowledgments The authors are grateful for all the help given by colleagues for the underlying research. This work was supported by the Grants OTKA T 016413, T 017241, OTKA T 034350, FKFP 0739/97, and by the Grants OMFB D-30/2000, OMFB E-24/2001.
References

Alefeld, G. and Herzberger, J. (1983). Introduction to Interval Computations. Academic Press, New York.
Althöfer, I. and Koschnick, K.U. (1991). On the convergence of threshold accepting. Applied Mathematics and Optimization, 24:183-195.
Ament, P. and Blind, G. (2000). Packing equal circles in a square. Studia Scientiarum Mathematicarum Hungarica, 36:313-316.
Boll, D.W., Donovan, J., Graham, R.L., and Lubachevsky, B.D. (2000). Improving dense packings of equal disks in a square. Electronic Journal of Combinatorics, 7:R46.
Bolyai, F. (1904). Tentamen Juventutem Studiosam in Elementa Matheseos Purae, Elementaris Ac Sublimioris, Methodo Intuitiva, Evidentiaque Huic Propria, Introducendi, Volume 2, Second edition, pp. 119-122.
Casado, L.G., Garcia, I., and Sergeyev, Ya.D. (2000). Interval branch and bound algorithm for finding the first-zero-crossing-point in one-dimensional functions. Reliable Computing, 6:179-191.
Casado, L.G., Garcia, I., Szabó, P.G., and Csendes, T. (2001). Packing equal circles in a square. II. New results for up to 100 circles using the TAMSASS-PECS stochastic algorithm. In: Optimization Theory: Recent Developments from Mátraháza, pp. 207-224. Kluwer, Dordrecht.
Croft, H.T., Falconer, K.J., and Guy, R.K. (1991). Unsolved Problems in Geometry, pp. 108-110. Springer, New York.
Csallner, A.E., Csendes, T., and Markót, M.Cs. (2000). Multisection in interval methods for global optimization. I. Theoretical results. Journal of Global Optimization, 16:371-392.
Csendes, T. (1988). Nonlinear parameter estimation by global optimization - Efficiency and reliability. Acta Cybernetica, 8:361-370.
Csendes, T. and Ratz, D. (1997). Subdivision direction selection in interval methods for global optimization. SIAM Journal on Numerical Analysis, 34:922-938.
Du, D.Z. and Pardalos, P.M. (1995). Minimax and Applications. Kluwer, Dordrecht.
Dueck, G. and Scheuer, T. (1990). Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing. Journal of Computational Physics, 90:161-175.
Fejes Tóth, G. (1997). Handbook of Discrete and Computational Geometry. CRC Press, Boca Raton.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Springer-Verlag, Berlin.
Fodor, F. (1999). The densest packing of 19 congruent circles in a circle. Geometriae Dedicata, 74:139-145.
Folkman, J.H. and Graham, R.L. (1969). A packing inequality for compact convex subsets of the plane. Canadian Mathematical Bulletin, 12:745-752.
Fukagawa, H. and Pedoe, D. (1989). Japanese Temple Geometry Problems. San Gaku. Charles Babbage Research Centre, Winnipeg.
Goldberg, M. (1970). The packing of equal circles in a square. Mathematics Magazine, 43:24-30.
Goldberg, M. (1971). Packing of 14, 16, 17 and 20 circles in a circle. Mathematics Magazine, 44:134-139.
Graham, R.L. and Lubachevsky, B.D. (1995). Dense packings of equal disks in an equilateral triangle from 22 to 34 and beyond. Electronic Journal of Combinatorics, 2:A1.
Graham, R.L. and Lubachevsky, B.D. (1996). Repeated patterns of dense packings of equal circles in a square. Electronic Journal of Combinatorics, 3:R17.
Graham, R.L., Lubachevsky, B.D., Nurmela, K.J., and Östergård, P.R.J. (1998). Dense packings of congruent circles in a circle. Discrete Mathematics, 181:139-154.
Grannell, M. (1990). An Even Better Packing of Ten Equal Circles in a Square. Manuscript.
de Groot, C., Monagan, M., Peikert, R., and Würtz, D. (1992). Packing circles in a square: Review and new results. In: System Modeling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180.
de Groot, C., Peikert, R., and Würtz, D. (1990). The Optimal Packing of Ten Equal Circles in a Square. IPS Research Report No. 90-12, Eidgenössische Technische Hochschule, Zürich.
Grünbaum, B. (1990). An Improved Packing of Ten Circles in a Square. Manuscript.
Hadwiger, H. (1944). Über extremale Punktverteilungen in ebenen Gebieten. Mathematische Zeitschrift, 49:370-373.
Hammer, R., Hocks, M., Kulisch, U., and Ratz, D. (1993). Numerical Toolbox for Verified Computing. I. Springer-Verlag, Berlin.
Hansen, E. (1992). Global Optimization Using Interval Analysis. Marcel Dekker, New York.
van Hentenryck, P., McAllester, D., and Kapur, D. (1997). Solving polynomial systems using a branch and prune approach. SIAM Journal on Numerical Analysis, 34:797-827.
Horst, R. and Thoai, N.V. (1999). D.C. programming: Overview. Journal of Optimization Theory and Applications, 103:1-43.
Hujter, M. (1999). Some numerical problems in discrete geometry. Computers and Mathematics with Applications, 38:175-178.
Karnopp, D.C. (1963). Random search techniques for optimization problems. Automatica, 1:111-121.
Kearfott, R.B. (1996). Test results for an interval branch and bound algorithm for equality-constrained optimization. In: Computational Methods and Applications, pp. 181-200. Kluwer, Dordrecht.
Kirchner, K. and Wengerodt, G. (1987). Die dichteste Packung von 36 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:147-159.
Knüppel, O. (1993a). PROFIL - Programmer's Runtime Optimized Fast Interval Library. Bericht 93.4, Technische Universität Hamburg-Harburg.
Knüppel, O. (1993b). A Multiple Precision Arithmetic for PROFIL. Bericht 93.6, Technische Universität Hamburg-Harburg.
Kravitz, S. (1967). Packing cylinders into cylindrical containers. Mathematics Magazine, 40:65-71.
Locatelli, M. and Raber, U. (1999). A deterministic global optimization approach for solving the problem of packing equal circles in a square. In: International Workshop on Global Optimization (GO.99), Firenze.
Locatelli, M. and Raber, U. (2002). Packing equal circles in a square: A deterministic global optimization approach. Discrete Applied Mathematics, 122:139-166.
Lubachevsky, B.D. (1991). How to simulate billiards and similar systems. Journal of Computational Physics, 94:255-283.
Lubachevsky, B.D. and Graham, R.L. (1997). Curved hexagonal packings of equal disks in a circle. Discrete and Computational Geometry, 18:179-194.
Lubachevsky, B.D., Graham, R.L., and Stillinger, F.H. (1997). Patterns and structures in disk packings. Periodica Mathematica Hungarica, 34:123-142.
Lubachevsky, B.D. and Stillinger, F.H. (1990). Geometric properties of random disk packings. Journal of Statistical Physics, 60:561-583.
Maranas, C.D., Floudas, C.A., and Pardalos, P.M. (1998). New results in the packing of equal circles in a square. Discrete Mathematics, 128:187-193.
Markót, M.Cs. (2000). An interval method to validate optimal solutions of the "packing circles in a unit square" problems. Central European Journal of Operational Research, 8:63-78.
Markót, M.Cs. (2003a). Optimal packing of 28 equal circles in a unit square - The first reliable solution. Numerical Algorithms, 37:253-261.
Markót, M.Cs. (2003b). Reliable Global Optimization Methods for Constrained Problems and Their Application for Solving Circle Packing Problems (in Hungarian). Ph.D. dissertation, Szeged. Available at http://www.inf.u-szeged.hu/~markot/phdmm.ps.gz
Markót, M.Cs. and Csendes, T. (2004). A new verified optimization technique for the "packing circles in a unit square" problems. Forthcoming in SIAM Journal on Optimization.
Markót, M.Cs., Csendes, T., and Csallner, A.E. (2000). Multisection in interval methods for global optimization. II. Numerical tests. Journal of Global Optimization, 16:219-228.
Matyas, J. (1965). Random optimization. Automation and Remote Control, 26:244-251.
McDonnell, J.R. and Waagen, D. (1994). Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks, 5:24-38.
Melissen, J.B.M. (1993). Densest packings for congruent circles in an equilateral triangle. American Mathematical Monthly, 100:916-925.
Melissen, J.B.M. (1994a). Densest packing of six equal circles in a square. Elemente der Mathematik, 49:27-31.
Melissen, J.B.M. (1994b). Densest packing of eleven congruent circles in a circle. Geometriae Dedicata, 50:15-25.
Melissen, J.B.M. (1994c). Optimal packings of eleven equal circles in an equilateral triangle. Acta Mathematica Hungarica, 65:389-393.
Melissen, J.B.M. and Schuur, P.C. (1995). Packing 16, 17 or 18 circles in an equilateral triangle. Discrete Mathematics, 145:333-342.
Milano, R. (1987). Configurations optimales de disques dans un polygone régulier. Mémoire de licence, Université Libre de Bruxelles.
Mollard, M. and Payan, C. (1990). Some progress in the packing of equal circles in a square. Discrete Mathematics, 84:303-307.
Moore, R.E. (1966). Interval Analysis. Prentice-Hall, Englewood Cliffs.
Moser, L. (1960). Problem 24 (corrected). Canadian Mathematical Bulletin, 8:78.
Neumaier, A. (2001). Introduction to Numerical Analysis. Cambridge University Press, Cambridge.
Nurmela, K.J. (1993). Constructing Combinatorial Designs by Local Search. Series A: Research Reports 27, Digital Systems Laboratory, Helsinki University of Technology.
Nurmela, K.J. and Östergård, P.R.J. (1997). Packing up to 50 equal circles in a square. Discrete and Computational Geometry, 18:111-120.
Nurmela, K.J. and Östergård, P.R.J. (1999a). More optimal packings of equal circles in a square. Discrete and Computational Geometry, 22:439-457.
Nurmela, K.J. and Östergård, P.R.J. (1999b). Optimal packings of equal circles in a square. In: Y. Alavi, D.R. Lick, and A. Schwenk (eds.), Combinatorics, Graph Theory, and Algorithms, pp. 671-680.
Nurmela, K.J., Östergård, P.R.J., and aus dem Spring, R. (1999). Asymptotic behaviour of optimal circle packings in a square. Canadian Mathematical Bulletin, 42:380-385.
Oler, N. (1961a). An inequality in the geometry of numbers. Acta Mathematica, 105:19-48.
Oler, N. (1961b). A finite packing problem. Canadian Mathematical Bulletin, 4:153-155.
Peikert, R. (1994). Dichteste Packungen von gleichen Kreisen in einem Quadrat. Elemente der Mathematik, 49:16-26.
Peikert, R., Würtz, D., Monagan, M., and de Groot, C. (1992). Packing circles in a square: A review and new results. In: P. Kall (ed.), System Modelling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180. Springer-Verlag, Berlin.
Petris, J. and Hungerbühler, N. (1990). Manuscript.
Pirl, U. (1969). Der Mindestabstand von n in der Einheitskreisscheibe gelegenen Punkten. Mathematische Nachrichten, 40:111-124.
Raber, U. (1999). Nonconvex All-Quadratic Global Optimization Problems: Solution Methods, Application and Related Topics. Ph.D. thesis, University of Trier.
Rao, S.S. (1978). Optimization Theory and Applications. John Wiley and Sons, New York.
Ratschek, H. and Rokne, J. (1988). New Computer Methods for Global Optimization. Ellis Horwood, Chichester.
Reis, G.E. (1975). Dense packings of equal circles within a circle. Mathematics Magazine, 48:33-37.
Ruda, M. (1969). Packing circles in a rectangle (in Hungarian). Magyar Tudományos Akadémia Matematikai és Fizikai Tudományok Osztályának Közleményei, 19:73-87.
264
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Schaer, J. (1965). The densest packing of nine circles in a square. Canadian Mathematical Bulletin, 8:273-277.
Schaer, J. (1971). On the densest packing of ten equal circles in a square. Mathematics Magazine, 44:139-140.
Schaer, J. and Meir, A. (1965). On a geometric extremum problem. Canadian Mathematical Bulletin, 8:21-27.
Schlüter, K. (1979). Kreispackung in Quadraten. Elemente der Mathematik, 34:12-14.
Schwartz, B.L. (1970). Separating points in a square. Journal of Recreational Mathematics, 3:195-204.
Solis, F.J. and Wets, R.J.-B. (1981). Minimization by random search techniques. Mathematics of Operations Research, 6:19-50.
Specht, E. Packing web site: http://www.packomania.com
Specht, E. and Szabó, P.G. (2004). Lattice and near-lattice packings of equal circles in a square. In preparation.
Staar, Gy. (1990). The Lived Mathematics (in Hungarian). Gondolat, Budapest.
Szabó, P.G. (2000a). Optimal packings of circles in a square (in Hungarian). Polygon, X:48-64.
Szabó, P.G. (2000b). Some new structures for the "equal circles packing in a square" problem. Central European Journal of Operations Research, 8:79-91.
Szabó, P.G. (2001). Sangaku - Wooden boards of mathematics in Japanese temples (in Hungarian). KöMaL, 7:386-388.
Szabó, P.G. (2004). Optimal substructures in optimal and approximate circle packings. Forthcoming in Beiträge zur Algebra und Geometrie.
Szabó, P.G. and Csendes, T. (2001). Dezső Lázár and the densest packing of equal circles in a square problem (in Hungarian). Magyar Tudomány, 8:984-985.
Szabó, P.G., Csendes, T., Casado, L.G., and García, I. (2001). Packing equal circles in a square. I. Problem setting and bounds for optimal solutions. In: Optimization Theory: Recent Developments from Mátraháza, pp. 191-206. Kluwer, Dordrecht.
Szabó, P.G. and Specht, E. (2005). Packing up to 200 equal circles in a square. Submitted for publication.
Tarnai, T. (1997). Packing of equal circles in a circle. In: Structural Morphology: Toward the New Millennium, pp. 217-224. The University of Nottingham, Nottingham.
Tarnai, T. and Gáspár, Zs. (1995-96). Packing of equal circles in a square. Acta Technica Academiae Scientiarum Hungaricae, 107(1-2):123-135.
Valette, G. (1989). A better packing of ten circles in a square. Discrete Mathematics, 76:57-59.
Wengerodt, G. (1983). Die dichteste Packung von 16 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 16:173-190.
Wengerodt, G. (1987a). Die dichteste Packung von 14 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:25-46.
Wengerodt, G. (1987b). Die dichteste Packung von 25 Kreisen in einem Quadrat. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio Mathematica, 30:3-15.
Würtz, D., Monagan, M., and Peikert, R. (1994). The history of packing circles in a square. Maple Technical Newsletter, 0:35-42.
Chapter 10
A DETERMINISTIC GLOBAL OPTIMIZATION ALGORITHM FOR DESIGN PROBLEMS

Frédéric Messine

Abstract
Complete extensions of standard deterministic Branch-and-Bound algorithms based on interval analysis are presented hereafter in order to solve design problems which can be formulated as non-homogeneous mixed-constrained global optimization problems. This involves the consideration of variables of different kinds: real, integer, logical or categorical. In order to solve interesting design problems with a large number of variables, some accelerating procedures must be introduced in these extended algorithms. They are based on constraint propagation techniques and are explained in this chapter. In order to validate the design methodology, rotating machines with permanent magnets are considered. The corresponding analytical model is recalled, and some globally optimal design solutions are presented and discussed.

1. Introduction
Design problems are generally very hard to solve and, furthermore, very difficult to formulate in a rational way. For instance, the design of electro-mechanical actuators is clearly understood as an inverse problem: from some characteristic values given by the designer, find the physical structures, components and dimensions which entirely describe the resulting actuator. This inverse problem is ill-posed in the Hadamard sense because, even if the existence of a solution can be guaranteed, most often there is a large, or even an infinite, number of solutions. Hence, only some solutions can be characterized, and it becomes natural to search for the optimal ones with respect to some criteria defined a priori. As explained in Fitan et al. (2004) and Messine et al. (2001), general inverse problems must consider not only the dimensions but also the structure and the components of a sort of actuator. Thus, an interesting formulation of the design problems of electro-mechanical actuators (or other similar design problems) consists in considering the associated non-homogeneous mixed constrained optimization problem:

    min  f(x, z, b, k),   over x ∈ R^{n_x}, z ∈ Z^{n_z}, b ∈ B^{n_b}, k ∈ ∏_{i=1}^{n_k} K_i,

    subject to                                                        (10.1)
      g_i(x, z, b, k) ≤ 0,  ∀i ∈ {1, ..., n_g},
      h_j(x, z, b, k) = 0,  ∀j ∈ {1, ..., n_h},
where f, g_i and h_j are real functions, K_i represents an enumerated set of categorical variables (for example, a type of material), and B = {0,1} is the logical set used to model different possible structures.
Interval analysis was introduced by Moore (1966) in order to control the numerical errors generated by floating point representations and operations. A real value x is then enclosed by an interval whose lower and upper bounds are the closest floating point numbers under and over x. The operations over intervals are then developed, defining interval arithmetic. Using this tool, reliable enclosures of functions are obtained. In global optimization, and more precisely in Branch-and-Bound techniques, interval analysis is used to compute reliable bounds of the global optimum for univariate or multivariate, non-linear or non-convex homogeneous analytical functions (Hansen, 1992; Kearfott, 1996; Messine, 1997; Moore, 1966; Ratschek and Rokne, 1988). This chapter focuses on design problems, which are generally non-homogeneous and mixed (with real, integer, logical and categorical variables). This requires extensions of the standard interval Branch-and-Bound algorithms. Furthermore, design problems are subject to strong (equality) constraints, so the implicit relations between the variables can be used to reduce a priori the part of the box where the constraints cannot be satisfied. These techniques are named constraint propagation or constraint pruning techniques (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997).
In Section 2, a deterministic (exact) global optimization algorithm is presented. It is an extension of an interval Branch-and-Bound algorithm developed in Ratschek and Rokne (1988) and Messine (1997) to deal with problem (10.1). An important part of this section is dedicated to the presentation of a propagation technique based on the computational tree.
This technique, inserted in interval Branch-and-Bound algorithms, has considerably improved the speed of convergence of such methods. In order to validate this approach and
in order to show the efficiency of such an algorithm, only one type of electro-mechanical actuator is considered: rotating machines with permanent magnets. This choice was determined by my personal commitment in the formulation of the analytical model of such actuators, and also by the fact that they represent difficult global optimization problems. Other related work on the design of piezo-electric actuators can be found in Messine et al. (2001). In Section 3, the analytical model of rotating machines with permanent magnets is presented in detail. The physical assumptions behind these analytical relations are not discussed here; see Fitan et al. (2003, 2004), Messine et al. (1998), Kone et al. (1993), Nogarede (2001), and Nogarede et al. (1995) for a thorough survey of this subject. Numerical optimal solutions for some machines are subsequently discussed.
2. Exact and rigorous global optimization algorithm
These kinds of algorithms, named Branch-and-Bound, work in two phases: the computation of bounds of a given function over a box, and the decomposition of the initial domain into smaller boxes. Thus, the initial problem is bisected into smaller ones and, for each sub-problem, one tries to prove that the global optimum cannot occur in it by comparing its bounds with the best solution found so far. Hence, only sub-problems which may contain the global optimum are kept and stored (a list is generated). For constrained problems, it is also possible to show, by computing bounds, that a constraint can never be satisfied over a given box; the corresponding sub-problems are discarded. Furthermore, the constraints reveal implicit relations between the variables, and some techniques, named constraint propagation, constraint pruning or constraint deduction, have been developed to reduce the domain where the problem is studied. These techniques are based on the calculus trees of the constraints (Messine, 1997, 2004; Van Hentenryck et al., 1997) or on linearizations of the constraint functions using a first-order Taylor expansion (Hansen, 1992).
The exact method developed for solving problem (10.1) is an extension of interval Branch-and-Bound algorithms (Hansen, 1992; Kearfott, 1996; Messine, 1997; Ratschek and Rokne, 1988). All these algorithms are based on interval analysis (Moore, 1966), which is the tool for computing the bounds of a continuous function over a box, i.e. an interval vector. Generally, these algorithms work with homogeneous real variables according to an exclusion principle: when a constraint cannot be satisfied
in a considered box, or when it is proved that the global optimum cannot occur in the box. In our case, it is necessary to extend an interval Branch-and-Bound algorithm to deal with non-homogeneous and mixed variables (real, integer, logical and categorical) arising from different physical quantities. Furthermore, in order to improve the convergence of this kind of algorithm, the introduction of some iterations of propagation techniques became unavoidable (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997). In the corresponding code, all the variables are represented by interval compact sets:
- real variables: one considers the interval compact set in which the global solution is searched;
- integer variables: the integer discrete set is relaxed to the closest continuous interval compact set; {x^L, ..., x^U} becomes [x^L, x^U];
- logical variables: {0,1} is relaxed into [0,1];
- categorical variables: one introduces intermediate univariate real functions, as explained later in this section. The categorical sets are in fact sets of numbers from 1 to the number of categories. Of course, these enumerated sets are not ordered.

Therefore, a distinction must be introduced between continuous and discrete variables. In the following algorithm, f denotes the function to be minimized, L represents the list where the sub-boxes are stored, x̃ and f̃ denote the current solution during the run of the program, ε_f is the desired accuracy for the global optimum value, and ε is a given vector of precisions for the corresponding solution points. The main steps of Algorithm 10.1 are defined and detailed in later subsections.

ALGORITHM 10.1 (INTERVAL BRANCH AND BOUND ALGORITHM)
Begin
1. Let X ∈ R^{n_x} × Z^{n_z} × B^{n_b} × ∏_{i=1}^{n_k} K_i be the initial domain in which the global minimum is sought.
2. Set f̃ := +∞.
3. Set L := {(+∞, X)}.
4. Extract from L the box for which the lowest lower bound has been computed.
5. Bisect the considered box, yielding V_1, V_2.
6. For j := 1 to 2 do
   6.1. Compute v_j := lower bound of f over V_j.
   6.2. Propagate the constraints over V_j (V_j can be reduced).
   6.3. Compute the lower and upper bounds of the interesting constraints over V_j.
   6.4. If f̃ ≥ v_j and no constraint is unsatisfiable then
        6.4.1. insert (v_j, V_j) in L;
        6.4.2. set f̃ := min(f̃, f(m)), where m is the midpoint of V_j, if and only if m satisfies all the constraints;
        6.4.3. if f̃ has changed then remove from L all (z, Z) where z > f̃, and set x̃ := m.
        end if
7. If f̃ < min_{(z,Z)∈L} z + ε_f and the largest box in L is smaller than ε, then STOP; else go to Step 4.
Result: f̃, x̃, L.
End
Because the algorithm stops when the global minimum is sufficiently accurate (within ε_f) and when all the sub-boxes Z are sufficiently small, all the global solutions are given by the minimizers belonging to the union of the sub-boxes remaining in L, and the minimal value is given by the current minimum f̃. In practice, only f̃ and its corresponding solution x̃ are considered.
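As a concrete illustration only (not the author's implementation), the skeleton of such an interval Branch-and-Bound loop for an unconstrained univariate problem can be sketched in Python; the bounding procedure `f_iv`, the test function and all names below are assumptions made for this example:

```python
import heapq

def natural_lb(f_iv, lo, hi):
    """Lower bound of f over [lo, hi] via an interval extension f_iv."""
    return f_iv(lo, hi)[0]

def branch_and_bound(f, f_iv, lo, hi, eps=1e-6):
    """Minimal interval branch-and-bound; returns (best value, best point)."""
    best_x = (lo + hi) / 2
    best_val = f(best_x)                      # incumbent from the midpoint
    heap = [(natural_lb(f_iv, lo, hi), lo, hi)]
    while heap:
        v, a, b = heapq.heappop(heap)
        if v > best_val - eps:                # no remaining box can improve
            break
        m = (a + b) / 2
        if f(m) < best_val:                   # midpoint test updates incumbent
            best_val, best_x = f(m), m
        for c, d in ((a, m), (m, b)):         # bisect and re-bound
            w = natural_lb(f_iv, c, d)
            if w < best_val - eps:            # keep only promising sub-boxes
                heapq.heappush(heap, (w, c, d))
    return best_val, best_x

# Example: f(x) = x^2 - 2x on [-1, 3], with a hand-written interval extension.
def f(x):
    return x * x - 2 * x

def f_iv(lo, hi):
    sq_lo = 0.0 if lo <= 0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    return (sq_lo - 2 * hi, sq_hi - 2 * lo)

val, x = branch_and_bound(f, f_iv, -1.0, 3.0)
```

The heap plays the role of the list L, ordered by lower bound, so Step 4 (extracting the box with the lowest lower bound) is a single pop; constraint handling and the mixed-variable extensions of the chapter are deliberately omitted.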
REMARK 10.1 In order to handle the non-homogeneous case, ε is in fact a real nonnegative vector representing the desired accuracy for the boxes remaining in the list L: ε_i > 0 if component i corresponds to a real variable, and ε_i = 0 for logical, integer and categorical variables.
Algorithm 10.1 relies on four phases: the bisection of a box, the computation of bounds over a box, the exclusion of a box, and propagation techniques to reduce the considered box a priori. These techniques are detailed in the following subsections.
2.1 Bisection rules
This phase is critical because it determines how efficiently the initial problem is decomposed into smaller ones. In our implementation, all the components of a box are represented by real interval vectors. Nevertheless, attention must be paid to whether a component represents a real, integer, logical or categorical variable.
The classical bisection principle, in the continuous homogeneous case, consists in choosing a coordinate direction parallel to which the box has an edge of maximum length; the box is then bisected normal to this direction (Ratschek and Rokne, 1988). For solving problem (10.1), the real variables are generally non-homogeneous (coming from different physical characteristics: the current density and the diameter of a machine, for example). Furthermore, Algorithm 10.1 must deal with discrete variables: logical, integer and categorical. Hence, the accuracy given by the designer is represented in Algorithm 10.1 by a real vector ε with one component per variable: it is the expected precision of the solution at the end of the algorithm. ε_k is fixed to 0 if component k represents a discrete (integer, logical or categorical) variable. Therefore, the bisection rule is modified to take continuous and discrete variables into account, and one uses two different ways to bisect a variable according to its type (continuous or discrete).
Let us denote by w_i^x, w_i^z, w_i^b and w_i^k the given weights for, respectively, the real variables x_i, the integer variables z_i, the logical variables b_i and the categorical variables k_i. First, the following real values are computed for all the variables:
where |.| denotes the cardinality (i.e. the number of elements) of the considered discrete set. The largest real value in this list determines the variable (k) which will be bisected, in the following way:
1. Z1 := Z and Z2 := Z.
2. If ε_k = 0 then (for discrete variables)
       (Z1)_k := [z_k^L, [(z_k^L + z_k^U)/2]_I] and (Z2)_k := [[(z_k^L + z_k^U)/2]_I + 1, z_k^U];
   else Z_k is divided at its midpoint, which directly produces Z1 and Z2.
Here Z_k = [z_k^L, z_k^U], respectively (Z1)_k and (Z2)_k, denotes the kth component of Z, respectively of Z1 and Z2, and [x]_I represents the integer part of the considered real value x.
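A sketch of this bisection rule in Python; the weighted selection score below is a plausible stand-in for the weighted values described above, and all helper names are illustrative:

```python
import math

def bisect(Z, eps, w):
    """Pick the direction with the largest weighted score and split the box there.

    Z   -- list of (lo, hi) components (intervals)
    eps -- per-variable accuracies (0 for discrete variables)
    w   -- per-variable weights
    """
    def score(k):
        lo, hi = Z[k]
        if eps[k] == 0:                      # discrete: cardinality minus one
            return w[k] * (hi - lo)
        return w[k] * (hi - lo) / eps[k]     # continuous: width relative to accuracy

    k = max(range(len(Z)), key=score)
    lo, hi = Z[k]
    Z1, Z2 = list(Z), list(Z)
    if eps[k] == 0:                          # discrete variable: integer split
        m = math.floor((lo + hi) / 2)
        Z1[k], Z2[k] = (lo, m), (m + 1, hi)
    else:                                    # continuous variable: midpoint split
        m = (lo + hi) / 2
        Z1[k], Z2[k] = (lo, m), (m, hi)
    return k, Z1, Z2
```

With a large weight on a discrete component, e.g. `bisect([(0.0, 1.0), (1, 8)], [0.1, 0], [1.0, 100.0])`, the discrete direction is chosen and split into [1, 4] and [5, 8], which matches the integer-part rule above.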
REMARK 10.2 It is more efficient to emphasize the bisection of the discrete variables k_i, because such bisections involve considerable modifications of the considered optimization problem (10.1). In the following numerical examples, and more generally, the weights for the discrete variables are fixed to w_i^z = w_i^b = w_i^k = 100 and, for the real variables, w_i^x =
2.2 Computation of the bounds
The computation of the bounds is the fundamental part of the algorithm, because all the exclusion and propagation techniques depend on them. An inclusion function is an interval function which encloses the range of a function over a box Y. For a given function f, a corresponding inclusion function is denoted by F, such that [min_{y∈Y} f(y), max_{y∈Y} f(y)] ⊆ F(Y); furthermore, Z ⊆ Y implies F(Z) ⊆ F(Y). The given functions must be explicitly detailed to make the construction of an inclusion function possible. Algorithm 10.1 works and converges even if f is not continuous (Kearfott, 1996; Moore, 1966; Ratschek and Rokne, 1988). The number of global minimum points can be unbounded, but f has to be bounded below in order for a global minimum to exist. Lipschitz conditions, differentiability or smoothness properties are not needed; nevertheless, the numerical computation is facilitated and the convergence speed may be improved if these properties are present. The following paragraph recalls the standard interval techniques used to construct inclusion functions (Moore, 1966).
Let I be the set of real compact intervals [a, b], where a, b are real (or floating point) numbers. The arithmetic operations for intervals are defined as follows:
    [a, b] + [c, d] = [a + c, b + d]
    [a, b] - [c, d] = [a - d, b - c]
    [a, b] × [c, d] = [min{a×c, a×d, b×c, b×d}, max{a×c, a×d, b×c, b×d}]        (10.2)
    [a, b] ÷ [c, d] = [a, b] × [1/d, 1/c], if 0 ∉ [c, d]
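The rules (10.2) translate directly into code. Below is a minimal sketch without outward rounding (a rigorous implementation would round lower bounds down and upper bounds up); it is an illustration, not the chapter's actual code:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, o):       # [a,b] + [c,d] = [a+c, b+d]
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):       # [a,b] - [c,d] = [a-d, b-c]
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):       # min/max over the four endpoint products
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def __truediv__(self, o):   # defined only when 0 is outside [c,d]
        if o.lo <= 0 <= o.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1 / o.hi, 1 / o.lo)
```

For instance, with a = Interval(1, 2), a - a gives Interval(-1, 1) rather than [0, 0], which already shows that subtraction is not the inverse of addition, as noted below.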
These operations can be extended to mixed computations between real values and intervals, because a real value is a degenerate interval whose two bounds are equal. When a real value is not representable by a floating point number, an interval can be generated from the two closest floating points enclosing it; for example, for π two floating points must be considered, one just under π and the other just over. Definitions (10.2) show that subtraction and division in I are not the inverse operations of addition and multiplication. Unfortunately, the
interval arithmetic does not conserve all the properties of standard arithmetic; for example, it is sub-distributive: ∀(A, B, C) ∈ I^3, A × (B + C) ⊆ A × B + A × C (Moore, 1966). Division by an interval containing zero is undefined, and an extended interval arithmetic has therefore been developed; refer to Ratschek and Rokne (1988) and Hansen (1992). The natural extension of an expression of f into intervals, which consists in replacing each occurrence of a variable by its corresponding interval (which encloses it) and then applying the above rules of interval arithmetic, is an inclusion function; special procedures for bounding trigonometric and transcendental functions allow the extension of this procedure to a great number of analytical functions. This is a fundamental theorem of interval analysis (Moore, 1966). The bounds so evaluated (by the natural extension of an expression of f) are not always accurate, in the sense that they may become too large and thus inefficient. Hence, several other techniques, based on Taylor expansions, are classically used; refer to Messine (1997), Moore (1966), and Ratschek and Rokne (1988) for a thorough discussion of this subject. For these inclusion functions, the given function must be continuous and at least once differentiable. For our design problems, the natural extension into intervals has generally been sufficient.
Interval arithmetic is well defined only for continuous real functions, so inclusion functions must be extended to deal with discrete variables. For logical and integer variables, one simply relaxes the fact that these variables are discrete: the discrete logical set {0,1} becomes the continuous interval compact set [0,1], and the discrete integer sets {0, ..., n}, {1, ..., n}, or more generally {x^L, x^L + 1, x^L + 2, ..., x^U}, are relaxed to the compact intervals [0, n], [1, n] and [x^L, x^U], respectively.
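The overestimation of the natural extension mentioned above can be observed on a small example (the tuple-based helpers are defined only for this illustration):

```python
def isub(a, b):
    # [a,b] - [c,d] = [a-d, b-c]
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    # min/max over the four endpoint products
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

X = (0.0, 2.0)
# Natural extension of f(x) = x*x - 2*x: each occurrence of x becomes X.
F = isub(imul(X, X), imul((2.0, 2.0), X))
# F encloses, but overestimates, the true range of f on [0, 2], which is [-1, 0]:
# the two occurrences of x are treated as independent (the dependency effect).
```

Here F evaluates to (-4.0, 4.0), a valid but loose enclosure of [-1, 0], which is exactly why tighter schemes such as Taylor-based inclusion functions are of interest.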
Hence, a new inclusion function for mixed logical, integer and real variables can be constructed. Categorical variables cannot be considered directly in the expression of a function, because they represent varieties of an object which induce certain effects. Generally, these effects are given by positive real values; for example, the magnetic polarization value depends on the kind of permanent magnet used. Therefore, each categorical variable (used to represent varieties of objects) must be associated with at least one real univariate function, denoted by a_i.
In this work, only univariate functions are considered because they are sufficient for our practical uses. Furthermore, in our code, all categorical variables σ_k are denoted by an integer number running from 1 to |K_k|. Each of these numbers corresponds to a precise category, which must be defined beforehand. Hence, for computing the bounds of a function f over a box X, Z, B, C depending on the univariate real functions a_i, enclosures of the intervals [min_{σ_j ∈ K_j} a_i(σ_j), max_{σ_j ∈ K_j} a_i(σ_j)] must be computed. Denoting by C_j an enumerated subset of the corresponding categorical set K_j, the following inclusion function for the corresponding real function a_i is defined:

    A_i(C_j) := [v_1, v_1], if C_j = [1, 1],
                [v_{|K_j|}, v_{|K_j|}], if C_j = [|K_j|, |K_j|],
                [min{v_i, i ∈ {1, ..., |K_j|}}, max{v_i, i ∈ {1, ..., |K_j|}}], in the general case,

where C_j = [C_j^L, C_j^U] ⊆ [1, |K_k|]. With this representation, a general inclusion function F(X, Z, B, C) can then be constructed for mixed (discrete and continuous) expressions.
REMARK 10.3 A more efficient inclusion function for the real univariate function a_i over an enumerated subset C_j ⊆ K_j is

    A_i(C_j) := [min{v_i, i ∈ {C_j^L, ..., C_j^U}}, max{v_i, i ∈ {C_j^L, ..., C_j^U}}].

However, this function needs an enumeration of the subset C_j for each computation. Other techniques are possible, and some of them are detailed in Messine et al. (2001).
Hence, lower and upper bounds can be generated. In order to produce logical, integer and categorical solutions with continuous relaxations of the corresponding discrete variables, only the particular bisection rules described above must be used.
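An enumeration-based enclosure in the spirit of Remark 10.3 can be sketched as follows (function and variable names are illustrative assumptions, not the chapter's code):

```python
def categorical_bounds(v, cL, cU):
    """Enclosure of a_i over the enumerated subset {cL, ..., cU} of categories.

    v  -- v[j] is the value a_i takes on category j+1 (categories run 1..|K|)
    cL, cU -- bounds of the enumerated subset C_j = [cL, cU]
    """
    vals = v[cL - 1:cU]          # values of the categories cL, ..., cU
    return min(vals), max(vals)
```

For instance, with four hypothetical magnet types whose polarization values are v = [1.2, 0.4, 1.0, 0.7], the subset [2, 4] yields the enclosure (0.4, 1.0), whereas the coarser general-case enclosure over all categories would give (0.4, 1.2).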
2.3 Exclusion principle
Exclusion techniques are based on proving that the global optimum cannot occur in a box. This leads to two main possibilities for a sub-box denoted by (X, Z, B, C):
1. The already found solution, denoted by f̃, cannot be improved in the considered box: F^L(X, Z, B, C) > f̃, i.e. the lower bound of the given function f over the sub-box (X, Z, B, C) is greater than a solution already found, so no point in the box can improve f̃; see Step 6.4 of Algorithm 10.1.
2. It can be proved that a constraint can never be satisfied in the sub-box: G_k^L(X, Z, B, C) > 0 or 0 ∉ H_k(X, Z, B, C). Equality constraints are hard to satisfy numerically; therefore, a tolerance is introduced for each equality constraint, and one verifies whether H_k(X, Z, B, C) ⊆ [-(ε_e)_k, (ε_e)_k] in place of H_k(X, Z, B, C) = [0, 0].
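Both exclusion tests reduce to simple comparisons on the computed bounds; a schematic check, with the tolerance handling described above for equality constraints (all names are illustrative):

```python
def can_discard(f_bounds, g_bounds, h_bounds, f_best, eps_h):
    """True if the box provably contains no feasible improving point.

    f_bounds -- (lower, upper) bounds of the objective over the box
    g_bounds -- list of (lower, upper) bounds of each g_k (constraint g_k <= 0)
    h_bounds -- list of (lower, upper) bounds of each h_k (h_k = 0, tolerance eps_h)
    f_best   -- value of the best feasible solution found so far
    """
    if f_bounds[0] > f_best:                  # test 1: cannot improve the incumbent
        return True
    if any(g[0] > 0 for g in g_bounds):       # test 2a: some g_k > 0 everywhere
        return True
    # test 2b: some H_k misses the tolerance interval [-eps_h, eps_h] entirely
    return any(h[0] > eps_h or h[1] < -eps_h for h in h_bounds)
```

Note that the function only ever *proves* infeasibility or non-optimality; when it returns False, the box is kept, which is what makes the overall algorithm rigorous.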
In our case, the computation of the bounds is exact and rigorous; thus the associated global optimization algorithm is said to be exact and rigorous, and the global optimum is perfectly enclosed with a given accuracy: x_i^U - x_i^L < ε_i, ∀i ∈ {1, ..., n_x}; z_i^L = z_i^U, ∀i ∈ {1, ..., n_z}; b_i^L = b_i^U, ∀i ∈ {1, ..., n_b}; and k_i^L = k_i^U, ∀i ∈ {1, ..., n_k}.
REMARK 10.4 It may happen that a logical or a categorical variable generates new additional constraints and variables. In this case, particular procedures must be inserted.
2.4 Constraint propagation techniques
Constraint propagation techniques based on interval analysis make it possible to reduce the bounds of an initial hypercube (interval vector) by using the implicit relations between the variables derived from the constraints. In this subsection, the constraints are written in the following general way:

    c(x) ∈ [a, b], with x ∈ X ⊂ R^n,        (10.3)

where c is a real function which represents the studied constraint, [a, b] is a fixed real interval, and X is a real interval compact vector. In order to consider an equality constraint, one fixes a = b. For an inequality constraint, a is fixed to -∞ (numerically, one uses the lowest representable floating point value).
REMARK 10.5 Only the continuous case is considered in this section. However, it is very simple to extend these techniques to deal with integer, logical and real variables (except the categorical case) by relaxing the discrete variables to their corresponding continuous sets, as explained above, and by taking the integer part of the upper bound and the integer part, plus one if it differs from the real value, of the lower bound of the resulting interval.
2.4.1 Classical interval propagation techniques.

The linear case. If the given constraint is linear, c(x) = Σ_{i=1}^n a_i x_i, the propagation is

    X_k := X_k ∩ (([a, b] - Σ_{i≠k} a_i X_i) / a_k),

where k is in {1, ..., n} and X_i is the ith interval component of X.
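A sketch of the classical propagation rule for a linear constraint, X_k := X_k ∩ (([a, b] - Σ_{i≠k} a_i X_i) / a_k), applied to each component in turn (helper names are illustrative):

```python
def propagate_linear(coeffs, X, a, b):
    """Tighten the box X subject to a <= sum_i coeffs[i] * x_i <= b.

    coeffs -- nonzero real coefficients a_i
    X      -- list of (lo, hi) interval components, tightened in place
    """
    for k, ck in enumerate(coeffs):
        # interval sum S of the other terms a_i * X_i
        s_lo = sum(min(c * X[i][0], c * X[i][1])
                   for i, c in enumerate(coeffs) if i != k)
        s_hi = sum(max(c * X[i][0], c * X[i][1])
                   for i, c in enumerate(coeffs) if i != k)
        # ([a, b] - S) / c_k, with endpoint swap for negative c_k
        lo, hi = (a - s_hi) / ck, (b - s_lo) / ck
        if ck < 0:
            lo, hi = hi, lo
        X[k] = (max(X[k][0], lo), min(X[k][1], hi))
    return X
```

For example, propagating x_1 + x_2 = 3 over the box [0, 10] × [0, 2] tightens the first component to [1, 3] while leaving the second unchanged. An empty intersection (lo > hi) would signal an infeasible box; that case is not handled in this sketch.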
The non-linear case: Hansen's method. If the constraint c is non-linear but continuous and at least once differentiable, Hansen (1992) uses a first-order Taylor expansion to produce a linear equation with interval coefficients. A first-order Taylor expansion can be written as follows:
    c(y) = c(x) + ∇c(ξ)^T (y - x),

where (x, y) ∈ X^2 and ξ ∈ X̊ (X̊ represents the interior of the compact hypercube X: each component has the form ]x_i^L, x_i^U[). An enclosure of ∇c(