(a, c) ≤ x for all x = (y, t) ∈ H. Consider the point z^k = (u^k, s^k) ∈ H \ G chosen in Step 3. Let z̄^k = (u^k, t̄^k), where t̄^k = min{t | h(u^k, t) ≥ 0}, so that z̄^k ∈ H but (u^k, t) ∉ H for every t < t̄^k. If z̄^k ∈ G then z̄^k is feasible and, by the choice of z^k, it follows that z̄^k is an optimal solution. So let z̄^k ∉ G and denote by x^k = (y^k, t^k) the first point of G on the line segment joining z̄^k to (a, c). By removing (x^k, z^k] from the box [(a, c), z^k], we obtain a polyblock with vertices x^{k,1}, …, x^{k,n+1}.

Since in view of (2.10) x^k < z̄^k, we have t^k < s^k; hence x^{k,n+1} ∉ H and will be dropped. Thus only x^{k,1}, …, x^{k,n} remain for consideration, as if we were working in ℝ^n. This discussion suggests that, to solve problem (2.21) by Algorithm PA, Steps 3 and 4 should be modified as follows:
Step 3. If V_k ≠ ∅, select z^k = (u^k, s^k) ∈ argmax{f(x) | x ∈ V_k}. Compute t̄^k = min{t | h(u^k, t) ≥ 0}. If z̄^k := (u^k, t̄^k) ∈ G, terminate: an optimal solution has been obtained. Otherwise, go to Step 4.

Step 4. Compute x^k = z̄^k − λ_k(z̄^k − (a, c)), with λ_k = min{λ | z̄^k − λ(z̄^k − (a, c)) ∈ G}. Determine the new current best feasible solution and the new current best value CBV. Compute the proper vertex set V_{k+1} of P_{k+1} = P_k \ (x^k, b] according to Proposition 7.

In this manner, the algorithm will essentially work in the y-space and can be viewed as a polyblock outer approximation procedure in this space. The above method cannot be easily extended to the general case of (2.19) when t ∈ ℝ^m with m > 1. However, to take advantage of the fact that the objective function depends on a small number of variables, one
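The one-dimensional computation t̄^k = min{t | h(u^k, t) ≥ 0} in the modified Step 3 exploits the monotonicity of h in its last coordinate, so it can be carried out by bisection. The sketch below is an illustration under that assumption; the helper `t_bar` and the toy function h are hypothetical, not from the text.

```python
def t_bar(h, u, lo, hi, tol=1e-10):
    """Smallest t in [lo, hi] with h(u, t) >= 0, assuming h is increasing in t.

    Returns None when no such t exists in the interval."""
    if h(u, lo) >= 0:
        return lo
    if h(u, hi) < 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(u, mid) >= 0:
            hi = mid        # feasible: the minimum lies at or below mid
        else:
            lo = mid        # infeasible: the minimum lies above mid
    return hi

# toy increasing function h(u, t) = u + t - 1, so t_bar = 1 - u
t_star = t_bar(lambda u, t: u + t - 1.0, 0.25, 0.0, 2.0)
print(round(t_star, 6))   # prints 0.75
```

Because h is increasing, a single sign change separates {t | h < 0} from {t | h ≥ 0}, which is exactly what bisection needs.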
Monotonic optimization
Figure 2.1. Inadequate ε-approximate optimal solution.
can use a branch and bound procedure (see Sections 7 and 8 below), with branching performed on the y-space.
6. Successive incumbent transcending algorithm
The ε-approximate optimal solution, as computed by Algorithm PA for MO/A (or Algorithm QA for MO/B) in finitely many iterations, may not be an adequate approximate optimal solution. In fact it may be infeasible, and for a given ε > 0 it may sometimes give an objective function value quite far from the actual optimal value of the problem, as illustrated by the example depicted in Figure 2.1, where x* is almost feasible but not feasible. To overcome this drawback, in this section we propose a finite algorithm for computing a more adequate approximate optimal solution of MO/A.

Assume that {x ∈ [a, b] | g(x) < 0, h(x) ≥ 0} ≠ ∅ (this is a mild assumption that can often be made to hold by shifting a to a' < a). For ε > 0 satisfying {x ∈ [a, b] | g(x) ≤ −ε, h(x) ≥ 0} ≠ ∅, we say that a feasible solution x̄ is essentially ε-optimal if

f(x̄) ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]} − ε.

Clearly an infinite sequence {x̄(ε)}, ε ↘ 0, of essentially ε-optimal solutions will have a cluster point x* which is a nonisolated feasible solution satisfying f(x*) = max{f(x) | x ∈ S*},
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where S* = cl{x | g(x) < 0 ≤ h(x), x ∈ [a, b]}. Note that S* may be a proper subset of the feasible set {x | g(x) ≤ 0 ≤ h(x), x ∈ [a, b]} (which is closed, since g is l.s.c. and h is u.s.c.). Such a nonisolated feasible solution x* is referred to as an essential optimal solution. Basically, the proposed algorithm for finding an essentially ε-optimal solution of MO/A is a procedure for successively solving a sequence of incumbent transcending subproblems of the following form:

(*) Given a real number γ, find a feasible solution with an objective function value exceeding γ, or else prove that no such solution exists.

As will shortly be seen, each of these subproblems reduces to a MO/B problem.
6.1 Incumbent transcending subproblem
For any given γ ∈ ℝ ∪ {−∞} consider the problem

(B/γ)   min{g(x) | f(x) ≥ γ, h(x) ≥ 0, x ∈ [a, b]}.
Since this is a MO/B problem without normal constraint, an ε-optimal solution of it can be found by Algorithm QA in finitely many iterations (Remark 4). Denote the optimal values of MO/A and (B/γ) by max(MO/A) and min(B/γ), respectively.

PROPOSITION 2.8
(i) If min(B/γ) > 0 then any feasible solution x̄ of MO/A such that f(x̄) ≥ γ − ε is an ε-optimal solution of MO/A. Hence, if min(B/γ) > 0 for γ = −∞, then MO/A is infeasible.

(ii) If min(B/γ) < 0 then any feasible solution x̂ of (B/γ) such that g(x̂) < 0 is a feasible solution of MO/A with f(x̂) ≥ γ.

(iii) If min(B/γ) = 0 then any feasible solution x̄ of MO/A such that g(x̄) ≤ −ε and f(x̄) ≥ γ − ε is essentially ε-optimal.
Proof. (i) If min(B/γ) > 0 then, since every feasible solution x of MO/A satisfies g(x) ≤ 0, it cannot be feasible to (B/γ). But h(x) ≥ 0, x ∈ [a, b], hence f(x) < γ. Consequently, max(MO/A) < γ, i.e. f(x̄) ≥ γ − ε > max(MO/A) − ε, and hence x̄ is an ε-optimal solution of MO/A.

(ii) If x̂ ∈ [a, b] is a feasible solution of (B/γ) with g(x̂) < 0, then g(x̂) < 0, f(x̂) ≥ γ, h(x̂) ≥ 0, hence x̂ is a feasible solution of MO/A with f(x̂) ≥ γ.

(iii) If min(B/γ) = 0 then any x ∈ [a, b] such that g(x) ≤ −ε, h(x) ≥ 0, is infeasible to (B/γ), hence must satisfy f(x) < γ. This implies that γ ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]}, so if a feasible solution x̄ of MO/A satisfies g(x̄) ≤ −ε and f(x̄) ≥ γ − ε, then

f(x̄) + ε ≥ sup{f(x) | g(x) ≤ −ε, h(x) ≥ 0, x ∈ [a, b]},

and so x̄ is essentially ε-optimal. □
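On a one-dimensional toy instance the trichotomy of Proposition 2.8 can be observed directly. The brute-force routine below is only a grid search (not Algorithm QA), and the functions f, g, h are illustrative assumptions: it estimates min(B/γ) for two values of γ and exhibits cases (ii) and (i).

```python
def min_B_gamma(f, g, h, a, b, gamma, n=20001):
    """Brute-force estimate of min{g(x): f(x) >= gamma, h(x) >= 0, x in [a, b]}."""
    best = float("inf")
    for i in range(n):
        x = a + (b - a) * i / (n - 1)
        if f(x) >= gamma and h(x) >= 0:
            best = min(best, g(x))
    return best

f = lambda x: x          # increasing objective
g = lambda x: x - 1.0    # normal constraint: MO/A-feasible means g(x) <= 0
h = lambda x: x          # reverse constraint; h(x) >= 0 holds on all of [0, 2]

m_low = min_B_gamma(f, g, h, 0.0, 2.0, gamma=0.5)    # case (ii): a better feasible point exists
m_high = min_B_gamma(f, g, h, 0.0, 2.0, gamma=1.5)   # case (i): the target gamma is unreachable
print(m_low, m_high)   # prints -0.5 0.5
```

A negative value signals that the incumbent can be transcended; a positive value certifies that no feasible solution reaches the level γ.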
6.2 Successive incumbent transcending algorithm for MO/A
Proposition 2.8 can be used to devise a successive incumbent transcending procedure for finding an essentially ε-optimal solution of MO/A. Before stating the algorithm we need some definitions. A box [p, q] ⊂ [a, b] can be replaced by a smaller one without losing any x ∈ [p, q] satisfying g(x) ≤ 0, f(x) ≥ γ, h(x) ≥ 0, i.e. without losing any x ∈ [p, q] satisfying

g(x) ≤ 0 ≤ h_γ(x) := min{f(x) − γ, h(x)}.

This reduced box red_γ[p, q], called a valid reduction of [p, q], is defined by red_γ[p, q] = [p', q'], where

q' = p + Σ_{i=1}^n α_i (q_i − p_i) e^i,   α_i = max{α ∈ [0, 1] | g(p + α(q_i − p_i) e^i) ≤ 0},
p' = q' − Σ_{i=1}^n β_i (q'_i − p_i) e^i,   β_i = max{β ∈ [0, 1] | h_γ(q' − β(q'_i − p_i) e^i) ≥ 0},

with e^1, …, e^n denoting the unit vectors of ℝ^n.
As in Section 3, it can easily be proved that the box red_γ[p, q] still contains all feasible solutions x ∈ [p, q] of (B/γ) with g(x) ≤ 0. For any given copolyblock Q with proper vertex set V, denote by red_γ Q the copolyblock whose vertex set is obtained from V by deleting all z ∈ V satisfying red_γ[z, b] = ∅ and replacing every other z ∈ V with the lowest corner z' of the box red_γ[z, b]. Also, for any z ∈ [a, b] denote by π_γ(z) the first point where the line segment joining z to b intersects the surface h_γ(x) := min{f(x) − γ, h(x)} = 0.

If g(z^k) > 0, the inclusion {x | g(x) ≤ 0, f(x) ≥ γ} ⊂ Q_k shows that min(B/γ) > 0, and so, by Proposition 2.8, x̄ is an ε-optimal solution of MO/A. If −ε < g(z^k) ≤ 0, then x̄ is a feasible solution of MO/A with f(x̄) = γ − ε, while min{g(x) | f(x) ≥ γ, h(x) ≥ 0, x ∈ [a, b]} > −ε; hence f(x) < γ = f(x̄) + ε for all x ∈ [a, b] satisfying g(x) ≤ −ε, h(x) ≥ 0. This means that x̄ is essentially ε-optimal. It remains to show that the algorithm is finite. Since at every occurrence of Step 3 the current best value f(x̄) improves by at least ε > 0 while it is bounded above by f(b), Step 3 cannot occur infinitely many times. Therefore, there is k₀ such that for all k ≥ k₀ we have g(x^k) > 0 and also g(z^k) ≤ 0 < g(x^k). From this moment on the algorithm works exactly as procedure QA for solving the problem (B/γ). Since g(x^k) − g(z^k) → 0 as k → +∞ by the convergence of this procedure, the event −ε < g(z^k) ≤ 0 must occur for sufficiently large k. This completes the proof. □
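Under the stated monotonicity, the two corners of a reduced box can be located by bisections along the box edges. The sketch below is one possible rendering of that idea (the helper names and the particular increasing functions g and h_γ are illustrative assumptions): it first cuts the upper corner using g ≤ 0, then the lower corner using h_γ ≥ 0.

```python
def with_coord(x, i, t):
    y = list(x)
    y[i] = t
    return y

def largest_le(phi, lo, hi, tol=1e-9):
    """Largest t in [lo, hi] with phi(t) <= 0 (phi increasing); None if phi(lo) > 0."""
    if phi(lo) > 0:
        return None
    if phi(hi) <= 0:
        return hi
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        lo, hi = (m, hi) if phi(m) <= 0 else (lo, m)
    return lo

def smallest_ge(phi, lo, hi, tol=1e-9):
    """Smallest t in [lo, hi] with phi(t) >= 0 (phi increasing); None if phi(hi) < 0."""
    if phi(lo) >= 0:
        return lo
    if phi(hi) < 0:
        return None
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        lo, hi = (lo, m) if phi(m) >= 0 else (m, hi)
    return hi

def reduce_box(g, h_gamma, p, q):
    p, q = list(p), list(q)
    for i in range(len(p)):           # shrink the upper corner with g(x) <= 0
        t = largest_le(lambda t: g(with_coord(p, i, t)), p[i], q[i])
        if t is None:
            return None               # the box contains no point with g <= 0
        q[i] = t
    for i in range(len(p)):           # shrink the lower corner with h_gamma(x) >= 0
        t = smallest_ge(lambda t: h_gamma(with_coord(q, i, t)), p[i], q[i])
        if t is None:
            return None
        p[i] = t
    return p, q

g = lambda x: x[0] + 2 * x[1] - 1     # increasing in each coordinate
h = lambda x: x[0] + x[1] - 0.8       # increasing in each coordinate
box = reduce_box(g, h, [0.0, 0.0], [1.0, 1.0])
print(box)   # roughly ([0.3, 0.0], [1.0, 0.5])
```

Each bisection needs only function evaluations, so the reduction is cheap compared to solving the subproblem itself.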
0). Then
The last equation follows from the assumption in (i) by letting k → +∞. (ii) follows from the fact that d* = sup{d(λ): λ ∈ Λ} ≥ d(0) = inf{f(x): x ∈ C}. (iii) and (iv) follow immediately from the definition of duality bounds. □
REMARK 3.1 (a) The condition Σ_{i∈I₁∪I₂} λ_i⁰ g_i(x) > 0 for x ∈ C is fulfilled if, e.g., there is an index j such that {x: x ∈ C, g_j(x) ≤ 0} = ∅. In this case, one can choose λ⁰ = (0, …, 0, λ_j⁰, 0, …, 0) with λ_j⁰ = 1.

(b) The monotonicity property (iii) is useful in the application of duality bounds within a branch and bound scheme, which is considered in the next sections.

(c) Based on Property (iv), one can use redundant constraints to improve duality bounds in some interesting cases; see, e.g., Shor and Stetsyuk (2002).
3 Duality Bound Methods in Global Optimization
2.2 Convex envelopes and convexification bound

We recall the concept of the convex envelope of a nonconvex function, which is a basic tool in the theory and algorithms of nonconvex global optimization; see, e.g., Falk and Hoffman (1976), Horst and Tuy (1996), and Horst et al. (2000).
DEFINITION 3.1 Let C ⊂ ℝⁿ be nonempty convex, and let f: C → ℝ be l.s.c. on C. The function

φ_{C,f}: C → ℝ,   φ_{C,f}(x) := sup{h(x): h: C → ℝ convex, h ≤ f on C}

is said to be the convex envelope of f over C. Notice that it is often convenient to eliminate the set C formally by setting f(x) := +∞ for x ∉ C and replacing φ_{C,f} accordingly by its extension φ_{C,f}: ℝⁿ → ℝ ∪ {∞}. It is well known and easy to see that φ_{C,f} is l.s.c. on C, and hence is representable as the pointwise supremum of the affine minorants of f. Geometrically, φ_{C,f} is the function whose (closed) epigraph (i.e., the set of points on and above its graph) coincides with the convex hull of the epigraph of f. The following basic properties and their proofs can be found, e.g., in Horst and Tuy (1996) and Horst et al. (2000).
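For a concave function on an interval, the convex hull of the epigraph is bounded below by the chord through the endpoint values, so the envelope is simply that chord. The small check below uses the illustrative instance f(x) = −x² on [0, 2] (an assumption, not from the text) and confirms numerically that the chord underestimates f and attains the same minimum, in line with Proposition 3.2(i).

```python
a, b = 0.0, 2.0
f = lambda x: -x * x                                        # concave on [a, b]
phi = lambda x: f(a) + (f(b) - f(a)) * (x - a) / (b - a)    # chord = envelope of a concave f

xs = [a + (b - a) * i / 10000 for i in range(10001)]
assert all(phi(x) <= f(x) + 1e-12 for x in xs)   # phi is a convex underestimator of f
min_f = min(f(x) for x in xs)
min_phi = min(phi(x) for x in xs)
print(min_f, min_phi)   # prints -4.0 -4.0
```

The shared minimum value illustrates why envelope-based bounds are exact in value even when the envelope differs from f almost everywhere.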
PROPOSITION 3.2 Assume that in Definition 3.1, C is compact, let D ⊂ C be convex, and let g: ℝⁿ → ℝ be an affine function. Then

(i) m := min{f(x): x ∈ C} = min{φ_{C,f}(x): x ∈ C},

(ii) {y ∈ C: f(y) = m} ⊂ {y ∈ C: φ_{C,f}(y) = m},

(iii) φ_{C,f}(x) ≤ φ_{D,f}(x) for all x ∈ D,

(iv) φ_{C,f+g} = φ_{C,f} + g.
Notice that the result (ii) can be made more precise: it is easy to see that the set of global minimizers of φ_{C,f} over C is the convex hull of the set of global minimizers of f over C. For each nonconvex optimization problem of the type (3.1) with I₂ = ∅, we can construct the following convexified problem:
Obviously, φ* is also a lower bound of f*. We call φ* a convexification bound.
2.3 Relationship between duality bounds and convexification bounds
A very nice property of duality bounds is that in many interesting cases they are at least as good as convexification bounds. More precisely, this property is shown in the following.

PROPOSITION 3.3 Assume that in Problem (3.1) I₂ = ∅ and some constraint qualification is fulfilled for the convexified Problem (3.3). Then

(i) d* ≥ φ*;

(ii) d* = φ* for the case where g_i, i ∈ I₁, are all linear.

Proof. (i) Let

L̃(x, λ) := φ_{C,f}(x) + Σ_{i∈I₁} λ_i φ_{C,g_i}(x)

be the Lagrangian of Problem (3.3). Then it follows from the above assumption and the definition of convex envelopes that

d* = sup_{λ≥0} inf_{x∈C} L(x, λ) ≥ sup_{λ≥0} inf_{x∈C} L̃(x, λ) = φ*.

(ii) From Proposition 3.2(iv), it follows that on C it holds

L̃(x, λ) = φ_{C,f}(x) + Σ_{i∈I₁} λ_i g_i(x) = φ_{C,L(·,λ)}(x).

Thus, from Proposition 3.2(i) we have for each λ ≥ 0 that

inf_{x∈C} L(x, λ) = inf_{x∈C} φ_{C,L(·,λ)}(x) = inf_{x∈C} L̃(x, λ),

which implies that d* = φ*. □

In principle, duality bounds can be strictly better than convexification bounds. The following special case serves as an example (see Dür, 2002).
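Part (ii) can be checked numerically on a toy instance with a single linear constraint: minimize f(x) = −x² over C = [0, 1] subject to g(x) = x − 0.5 ≤ 0. The envelope of f over C is the chord φ_{C,f}(x) = −x, and both bounds are estimated on grids. The instance and the grids are illustrative assumptions.

```python
xs = [i / 2000 for i in range(2001)]     # grid on C = [0, 1]
lams = [i / 100 for i in range(201)]     # grid on lambda in [0, 2]

# duality bound: d* = sup_lam inf_x [ -x**2 + lam * (x - 0.5) ]
d_star = max(min(-x * x + lam * (x - 0.5) for x in xs) for lam in lams)

# convexification bound: replace -x**2 by its envelope -x and minimize over g <= 0
phi_star = min(-x for x in xs if x - 0.5 <= 0)

print(d_star, phi_star)   # prints -0.5 -0.5
```

The two bounds coincide, as Proposition 3.3(ii) predicts for linear g_i; with nonlinear g_i the duality bound can be strictly tighter, as in Proposition 3.4.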
PROPOSITION 3.4 Assume that in Problem (3.1) and Problem (3.3) the following conditions are fulfilled: (i) I₂ = ∅ and C is compact; (ii) f is strictly concave on C;
(iii) −g_i, i ∈ I₁, are strictly concave and continuously differentiable on C, and there is x̄ ∈ C such that g_i(x̄) < 0, i ∈ I₁; (iv) f* > φ*; (v) φ_{C,f} is not constant on any line segment contained in C.
Then d* > φ*, i.e. in this case the duality bound is strictly better than the convexification bound.
3. Branch and bound methods using duality bounds

The branch and bound scheme is one of the most promising methods developed for solving multiextremal global optimization problems. The main idea of this scheme consists of two basic operations: successively refined partitioning of the feasible set, and estimation of lower and upper bounds for the optimal value of the objective function over each subset generated by the partitions. In this section, we present a branch and bound scheme using duality bounds for solving global optimization problems of the form

min{f(x): x ∈ C, g_i(x) ≤ 0, i = 1, …, m},    (3.4)

where C is a simple n-dimensional compact convex set as used in the first relaxation step of a branch and bound procedure, e.g., a simplex or a rectangle, and f and g_i (i = 1, …, m) are lower semicontinuous functions on C. The additional assumption on C is made here for the implementability and convergence of the algorithm. Let

L := {x ∈ C: g_i(x) ≤ 0, i = 1, …, m}

denote the feasible set of Problem (3.4).
3.1 Branch and bound scheme
For each set R ⊂ C, we denote by μ(R) the duality bound computed for the problem

min{f(x): x ∈ R, g_i(x) ≤ 0, i = 1, …, m},

and by F(R) a finite set of feasible points of this problem, if any exist.
Branch and bound algorithm

Initialization: Set R¹ = C. Compute μ(R¹) and F(R¹) ⊂ R¹ ∩ L. Set μ₁ = μ(R¹). If F(R¹) ≠ ∅, then compute γ₁ = min{f(x): x ∈ F(R¹)} and choose x¹ such that f(x¹) = γ₁; otherwise, set γ₁ = +∞. If μ₁ = +∞, then set ℛ₁ = ∅; otherwise, set ℛ₁ = {R¹}, k = 1.
Iteration k:

(i) If ℛ_k = ∅, then stop: either Problem (3.4) has no feasible solution or x^k is an optimal solution.

(ii) If ℛ_k ≠ ∅, then perform a partition of R^k obtaining {R^k_1, …, R^k_r}, where the R^k_i, i = 1, …, r, are nonempty n-dimensional sets satisfying ⋃_{i=1}^r R^k_i = R^k and int R^k_i ∩ int R^k_j = ∅ for i ≠ j.

(iii) For each i = 1, …, r compute μ(R^k_i) and F(R^k_i) ⊂ R^k_i ∩ L.

(iv) Set γ_{k+1} = min{γ_k, min{f(x): x ∈ ⋃_{i=1}^r F(R^k_i)}}.

(v) Choose x^{k+1} such that f(x^{k+1}) = γ_{k+1}.

(vi) Set ℛ_{k+1} = (ℛ_k \ {R^k}) ∪ {R^k_i: μ(R^k_i) < γ_{k+1}, i = 1, …, r}.

(vii) If ℛ_{k+1} ≠ ∅, then set μ_{k+1} = min{μ(R): R ∈ ℛ_{k+1}} and choose R^{k+1} ∈ ℛ_{k+1} such that μ(R^{k+1}) = μ_{k+1}; otherwise, set μ_{k+1} = γ_{k+1}. Go to iteration k + 1.
3.2 Convergence

Whenever the above algorithm does not terminate after finitely many iterations, it generates at least one infinite nested sequence of compact partition sets {R^q} such that R^{q+1} ⊂ R^q for all q. We obtain the convergence of the algorithm in the following sense.
THEOREM 3.1 Assume that the algorithm generates an infinite nested sequence {R^q} of partition sets such that lim_{q→∞} R^q = ⋂_q R^q = R*. Then each optimal solution of the problem

min{f(x): x ∈ R*}    (3.5)

is an optimal solution of Problem (3.4).

Proof. For each q let x^q ∈ R^q be such that f(x^q) = min{f(x): x ∈ R^q}. Let x* be an accumulation point of the sequence {x^q}. Then x* ∈ R* and, by passing to a subsequence if necessary, we may assume that x^q → x* as q → ∞. Since R^{q+1} ⊂ R^q for all q and x* ∈ R^q for all q, it follows from the definition of x^q that f(x^q) ≤ f(x^{q+1}) ≤ f(x*). Hence lim_{q→∞} f(x^q) exists and satisfies lim_{q→∞} f(x^q) ≤ f(x*). On the other hand, lower semicontinuity implies lim_{q→∞} f(x^q) ≥ f(x*), so that lim_{q→∞} f(x^q) = f(x*).
it follows from Proposition 3.1(i) and Remark 3.1(a) that μ(R^{k₀}) = +∞, which implies that the partition set R^{k₀} has to be removed from further consideration. This contradiction implies that r* ∈ L. □

Notice that the results of Theorems 3.1 and 3.2 can also be derived from the approach given in Dür (2001).
4. Decomposition method using duality bounds

In this section, we discuss a decomposition method for solving a class of global optimization problems in which the variables can be divided into two groups in such a way that, whenever one group is fixed, all functions involved in the problem have the same structure with regard to the other group, e.g., linear, convex or concave. Based on the decomposition idea of Benders, a corresponding 'master problem' is defined on the space of one of the two variable groups. For solving the resulting master problem, the branch and bound algorithm using duality bounds presented in the previous section is applied. Convergence properties of the algorithm for this problem class are established in the next subsection. A special class of so-called partly convex programming problems is considered thereafter. The results presented in this section originate from Thoai (2002a,b).
4.1 Decomposition branch and bound algorithm

The class of nonconvex global optimization problems to be considered here can be formulated as follows:

min F(x, y)   s.t.   G_i(x, y) ≤ 0 (i = 1, …, m),   x ∈ C,   y ∈ Y,    (3.6)

where C is a compact convex subset of ℝⁿ, Y is a closed convex subset of ℝᵖ, and F and G_i (i = 1, …, m) are continuous functions defined on a suitable set containing C × Y. To apply the branch and bound algorithm presented in the previous section, we also assume in addition that C has a simple structure, e.g., a simplex or a rectangle. We denote by Z the feasible set of Problem (3.6), i.e., Z = {(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), x ∈ C, y ∈ Y}, and assume that Problem (3.6) has an optimal solution. Define a function φ: ℝⁿ → ℝ by

φ(x) := min{F(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), y ∈ Y},    (3.7)

where the minimization problem over y for fixed x is referred to as Problem (3.8),
and agree that φ(x) = +∞ whenever the feasible set of Problem (3.8) is empty. Then Problem (3.6) can be formulated equivalently as

min{φ(x): x ∈ C}.    (3.9)

More precisely, we state the equivalence between Problems (3.6) and (3.9) in the following proposition, whose proof is obvious.

PROPOSITION 3.5 A point (x*, y*) is optimal to Problem (3.6) if and only if x* is an optimal solution of Problem (3.9) and y* is an optimal solution of Problem (3.8) with x = x*.

In view of Proposition 3.5, instead of Problem (3.6) we consider Problem (3.9), which is usually called the 'master problem'. For solving the master problem in ℝⁿ, the branch and bound algorithm presented in Section 3.1 is applied. Notice that partitioning is applied only in the space of the x-variables. For each partition set R ⊂ C, a lower bound μ(R) of the optimal value of the problem

min{φ(x): x ∈ R} = min{F(x, y): G_i(x, y) ≤ 0 (i = 1, …, m), x ∈ R, y ∈ Y}

is obtained by solving the dual problem, i.e.,

μ(R) = sup_{λ≥0} inf{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): x ∈ R, y ∈ Y}.    (3.10)
We now establish some convergence properties of the branch and bound algorithm applied to Problem (3.9). Let C⁰ be the subset of C consisting of all points x ∈ C such that Problem (3.8) has an optimal solution. Note that, since Problem (3.6) is solvable, it follows from Proposition 3.5 that C⁰ ≠ ∅. Further, let M: C⁰ → ℝᵖ be a point-to-set mapping defined by

M(x) = {y ∈ ℝᵖ: G_i(x, y) ≤ 0 (i = 1, …, m), y ∈ Y}.    (3.11)

The following definition, which is introduced based on well-known concepts from convex analysis and parametric optimization (see, e.g., Berge, 1963; Hogan, 1973; Bank et al., 1983), is used for establishing convergence of the algorithm.
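The master-problem viewpoint can be mimicked numerically: evaluate φ(x) by an inner minimization over y, then minimize the resulting function of x alone. The crude version below replaces both the inner solver and the branch and bound master solver by grids; the problem data, grids, and names are illustrative assumptions.

```python
def phi(F, G, x, ys):
    """phi(x) = min{F(x, y): G(x, y) <= 0, y in the grid}; +inf if M(x) is empty."""
    vals = [F(x, y) for y in ys if G(x, y) <= 0]
    return min(vals) if vals else float("inf")

F = lambda x, y: (x - 1) ** 2 + (y - x) ** 2    # jointly continuous objective
G = lambda x, y: y - 0.5                        # single coupling constraint: y <= 0.5

xs = [i / 200 for i in range(401)]              # C = [0, 2]
ys = [i / 200 for i in range(401)]              # Y = [0, 2]
x_best = min(xs, key=lambda x: phi(F, G, x, ys))
print(x_best, phi(F, G, x_best, ys))   # prints 0.75 0.125
```

For x ≤ 0.5 the inner problem returns (x − 1)², while for x > 0.5 the constraint becomes active and φ(x) = (x − 1)² + (x − 0.5)², minimized at x = 0.75; the grid search recovers exactly this master-problem optimum.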
DEFINITION 3.2 (i) We say that the function φ in Problem (3.9) is 'dual-proper' at a point x⁰ ∈ C⁰ if

φ(x⁰) = sup_{λ≥0} inf{F(x⁰, y) + Σ_{i=1}^m λ_i G_i(x⁰, y): y ∈ Y}.

(ii) The function φ is 'upper semicontinuous (u.s.c.)' at x⁰ ∈ C⁰ if for each sequence {x^q} ⊂ C⁰ with lim_{q→∞} x^q = x⁰ the inequality lim sup_{q→∞} φ(x^q) ≤ φ(x⁰) holds.

(iii) A point-to-set mapping M: C⁰ → ℝᵖ is called 'lower semicontinuous according to Berge (l.s.c.B.)' at x⁰ ∈ C⁰ if for each open set Ω satisfying Ω ∩ M(x⁰) ≠ ∅ there exists an open ball U around x⁰ such that Ω ∩ M(x) ≠ ∅ for all x ∈ U ∩ C⁰.
THEOREM 3.3 Assume that the decomposition branch and bound algorithm generates an infinite nested sequence {R^q} of partition sets such that

(i) R^{q+1} ⊂ R^q for all q and lim_{q→∞} R^q = ⋂_q R^q = {r*} ⊂ C⁰,

(ii) the function φ is dual-proper at r*,

(iii) there exists q₀ satisfying μ(R^{q₀}) > −∞, and

(iv) there exists a compact set Y⁰ ⊂ Y such that for each λ ≥ 0 and each x ∈ C, the set of optimal solutions of the problem min{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): y ∈ Y}, if it exists, has a nonempty subset in Y⁰.

Then r* is an optimal solution of Problem (3.9), i.e., the point r*, together with each optimal solution of the Subproblem (3.8) with x = r*, is an optimal solution of Problem (3.6).

Proof. From Assumption (iii) and the monotonicity property of duality bounds, it follows that μ(R^q) > −∞ for q ≥ q₀. For each q ≥ q₀, let

w_q(λ) := inf{F(x, y) + Σ_{i=1}^m λ_i G_i(x, y): x ∈ R^q, y ∈ Y},

and let λ^q be an optimal solution of the problem max{w_q(λ): λ ≥ 0}, i.e., μ(R^q) = w_q(λ^q). Moreover, let

w*(λ) := inf{F(r*, y) + Σ_{i=1}^m λ_i G_i(r*, y): y ∈ Y}.
First, we show that w*(λ) = sup_q w_q(λ) for each λ. By definition, it is obvious that w*(λ) ≥ sup_q w_q(λ). On the other hand, for each q, let x^q ∈ R^q, y^q ∈ Y⁰ ⊂ Y be such that

w_q(λ) = F(x^q, y^q) + Σ_{i=1}^m λ_i G_i(x^q, y^q).

Then lim_{q→∞}(x^q, y^q) = (r*, y*), where r* ∈ C⁰ as in Assumption (i) and y* ∈ Y⁰ ⊂ Y by Assumption (iv). This implies that

sup_q w_q(λ) = lim_{q→∞} w_q(λ) = F(r*, y*) + Σ_{i=1}^m λ_i G_i(r*, y*) ≥ w*(λ).

Thus, we have w*(λ) = sup_q w_q(λ). Since the sequence {μ(R^q)} of lower bounds is nondecreasing and bounded by the optimal value of Problem (3.6), its limit μ* exists, and we have

μ* = lim_{q→∞} μ(R^q) = lim_{q→∞} w_q(λ^q) = lim_{q→∞} max_{λ≥0} w_q(λ) = sup_q max_{λ≥0} w_q(λ) = max_{λ≥0} sup_q w_q(λ) = max_{λ≥0} w*(λ).

Since φ is dual-proper at r* (Assumption (ii)), it follows that μ* = φ(r*), which implies that r* is an optimal solution of Problem (3.9), and hence the point r*, together with each optimal solution of the Subproblem (3.8) with x = r*, forms an optimal solution of Problem (3.6). □

REMARK 3.2 If Y is a compact set, then Conditions (iii) and (iv) in Theorem 3.3 can obviously be removed.
THEOREM 3.4 Let the assumptions of Theorem 3.3 be fulfilled. Further, assume that throughout the algorithm one has F(R^q) ≠ ∅ for each q, and that the function φ is upper semicontinuous at r*. Then each accumulation point of the sequence {x^q} generated by the algorithm at which φ is upper semicontinuous is an optimal solution of Problem (3.9).

Proof. Let x* be an accumulation point of {x^q} (note that accumulation points exist because of the compactness of C). By passing to a subsequence if necessary, we assume that lim_{q→∞} x^q = x*. Since

lim_{q→∞} R^q = ⋂_q R^q = {r*} ⊂ C⁰

and φ is upper semicontinuous at r*, it follows that for each q there is a point r^q ∈ R^q such that φ(r^q) < +∞ and lim sup_{q→∞} φ(r^q) ≤ φ(r*). Since {φ(x^q)} is nonincreasing and bounded by the optimal value of Problem (3.6), and φ(x^q) ≤ φ(r^q) for each q, it follows from the upper semicontinuity of φ at x* that
∇f(x*) + λ₁∇g₁(x*) + λ₂∇g₂(x*) = 0,
4 General Quadratic Programming
λ₁g₁(x*) = λ₂g₂(x*) = 0,
A + λ₁Q₁ + λ₂Q₂ is positive semidefinite.
The above result can be extended to the case of more than two quadratic constraints. As an example, consider the following problem of minimizing a quadratic function over the intersection of ellipsoids:

min f(x) = ⟨Ax, x⟩ + 2⟨b, x⟩
s.t. g_i(x) = ⟨Q_i(x − a_i), (x − a_i)⟩ − r_i² ≤ 0,   i = 1, …, m,    (4.8)

where A is a symmetric matrix, Q_i (i = 1, …, m) are symmetric positive semidefinite matrices, and a_i ∈ ℝⁿ, r_i > 0 (i = 1, …, m) are given vectors and numbers, respectively. Obviously, each set {x: g_i(x) ≤ 0} is an ellipsoid with center a_i and radius r_i.
THEOREM 4.5 (CF. FLIPPO AND JANSEN, 1996) A feasible point x* of problem (4.8) is a global optimal solution if there exist multipliers λ_i ≥ 0, i = 1, …, m, such that

∇f(x*) + Σ_{i=1}^m λ_i ∇g_i(x*) = 0,   λ_i g_i(x*) = 0 (i = 1, …, m),

and A + Σ_{i=1}^m λ_i Q_i is positive semidefinite.
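In one dimension (n = m = 1, so all matrices are scalars) the sufficient condition is easy to verify both by hand and by machine. The instance below, with A = 1, b = 0, Q₁ = 1, a₁ = 2, r₁ = 1, is an illustrative assumption, not an example from the text.

```python
f = lambda x: x * x                  # <Ax, x> + 2<b, x> with A = 1, b = 0
g = lambda x: (x - 2) ** 2 - 1       # <Q1(x - a1), x - a1> - r1**2; feasible set is [1, 3]

x_star, lam = 1.0, 1.0
stationarity = 2 * x_star + lam * 2 * (x_star - 2)   # grad f + lam * grad g1 at x*
complementarity = lam * g(x_star)                    # lam * g1(x*)
psd = 1 + lam * 1                                    # A + lam * Q1, a 1x1 "matrix"
print(stationarity, complementarity, psd >= 0)       # prints 0.0 0.0 True

# brute-force confirmation of global optimality over the feasible set [1, 3]
feasible = [1 + 2 * i / 10000 for i in range(10001)]
assert all(f(x) >= f(x_star) - 1e-12 for x in feasible)
```

All three conditions hold with λ = 1, and the grid check confirms that x* = 1 is indeed globally optimal over the feasible ellipsoid (here, the interval [1, 3]).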
We now discuss necessary conditions. Let x* be a global optimal solution of problem (4.7). If no constraint is active at x*, i.e., g₁(x*) < 0, g₂(x*) < 0, then it is well known that ∇f(x*) = 0 and A is positive semidefinite. If only one constraint is active at x*, say g₁(x*) = 0, g₂(x*) < 0, and ∇g₁(x*) ≠ 0, then it follows from (local) second order necessary conditions that there exists λ₁ ≥ 0 such that ∇f(x*) + λ₁∇g₁(x*) = 0 and A + λ₁Q₁ has at most one negative eigenvalue. For the case that both constraints are active at x*, i.e. g₁(x*) = g₂(x*) = 0, necessary conditions are established based on the behavior of the gradients of g₁ and g₂ at x*.
THEOREM 4.6 (CF. PENG AND YUAN, 1997) Let x* be a global optimal solution of problem (4.7).

(a) If ∇g₁(x*) and ∇g₂(x*) are linearly independent, then there exist λ₁ ≥ 0, λ₂ ≥ 0 such that

(i) ∇f(x*) + λ₁∇g₁(x*) + λ₂∇g₂(x*) = 0 and
(ii) A + λ₁Q₁ + λ₂Q₂ has at most one negative eigenvalue.

(b) If ∇g₁(x*) = α∇g₂(x*) ≠ 0 for some α > 0, then there exist λ₁ ≥ 0, λ₂ ≥ 0 such that (i) holds and

(iii) A + λ₁Q₁ + λ₂Q₂ is positive semidefinite.
To get necessary and sufficient conditions for the global optimality of problem (4.7), one has to make some additional assumptions. In Hiriart-Urruty (2001), the following special case (4.9) of problem (4.7) is considered, in which f is convex nonlinear, g₁ and g₂ are convex, and there is x⁰ ∈ ℝⁿ such that g₁(x⁰) < 0, g₂(x⁰) < 0 (Slater's condition). Based on Condition (4.6), necessary and sufficient conditions for global optimality are obtained for problem (4.9). These results are extensions of Theorem 4.3.

THEOREM 4.7 (CF. HIRIART-URRUTY, 2001) (a) A point x* with g₁(x*) = g₂(x*) = 0 is a global optimal solution of problem (4.9) if and only if there exist λ₁ ≥ 0, λ₂ ≥ 0 such that

∀d ∈ Int T(C, x*),

where C is the convex feasible set of problem (4.9) and T(C, x*) is the tangent cone to C at x*.

(b) A point x* with g₁(x*) = 0, g₂(x*) < 0 is a global optimal solution of problem (4.9) if and only if there exists λ₁ ≥ 0 such that

Ax* + b = λ₁(Q₁x* + q₁),   ∀d ∈ Int T(C, x*).
Another special case of problem (4.7) is considered in Stern and Wolkowicz (1995). It is the problem

min f(x) = ⟨Ax, x⟩ − 2⟨b, x⟩
s.t. −∞ ≤ β ≤ ⟨Qx, x⟩ ≤ α ≤ +∞,    (4.10)

where A and Q are symmetric matrices.

THEOREM 4.8 (CF. STERN AND WOLKOWICZ, 1995) Let x* be a feasible point of problem (4.10) and assume that the following 'constraint qualification' holds at x*: Qx* = 0 implies β < 0 < α. Then x* is a global optimal solution if and only if there exists λ ∈ ℝ such that

(A − λQ)x* = b,
A − λQ is positive semidefinite,
λ(β − ⟨Qx*, x*⟩) ≥ 0 ≥ λ(⟨Qx*, x*⟩ − α).
2.2 Duality
General quadratic programming is a rare class of nonconvex optimization problems in which one can construct primal-dual problem pairs without any duality gap. A typical example of this is problem (4.10). For λ₁ ≥ 0, λ₂ ≥ 0 define the Lagrangian

L(x, λ₁, λ₂) = f(x) + λ₁(β − ⟨Qx, x⟩) + λ₂(⟨Qx, x⟩ − α)

of problem (4.10) and the dual function

d(λ₁, λ₂) = inf{L(x, λ₁, λ₂): x ∈ ℝⁿ}.

Then the Lagrangian dual problem of (4.10) is defined as the problem

sup{d(λ₁, λ₂): λ₁ ≥ 0, λ₂ ≥ 0}.    (4.11)

THEOREM 4.9 (CF. STERN AND WOLKOWICZ, 1995) Assume that problem (4.10) has an optimal solution. Then strong duality holds for the problem pair (4.10)–(4.11), i.e., f* = d*, where f* and d* denote the optimal values of problem (4.10) and problem (4.11), respectively.
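For a scalar instance of (4.10) the dual function can be written down in closed form and the absence of a duality gap checked numerically. Take A = −1, b = 0, Q = 1, β = 0, α = 1, i.e. min −x² subject to 0 ≤ x² ≤ 1; all data here are illustrative assumptions.

```python
def d(l1, l2):
    """Dual function of min -x**2 s.t. 0 <= x**2 <= 1 at multipliers l1, l2 >= 0."""
    a = -1.0 - l1 + l2          # coefficient of x**2 in the Lagrangian L(x, l1, l2)
    if a < 0:
        return float("-inf")    # Lagrangian unbounded below in x
    return l1 * 0.0 - l2 * 1.0  # inf over x attained at x = 0 since b = 0

xs = [i / 1000 - 1 for i in range(2001)]    # grid on [-1, 1]; every point is feasible
primal = min(-x * x for x in xs)
grid = [i / 100 for i in range(301)]        # multiplier grid, 0 .. 3
dual = max(d(l1, l2) for l1 in grid for l2 in grid)
print(primal, dual)   # prints -1.0 -1.0
```

The dual is finite only when λ₂ ≥ 1 + λ₁ (so that the quadratic coefficient is nonnegative), where it equals −λ₂; its supremum −1 is attained at (λ₁, λ₂) = (0, 1) and matches the primal optimal value, as Theorem 4.9 asserts.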
Two interesting special cases of problem (4.10) are, respectively, the problems with Q = I (the unit matrix), β < 0 < α, and with Q = I, β = α > 0. Notice that the problem of minimizing a quadratic function over an ellipsoid, the constrained eigenvalue problem and the quadratically constrained least squares problem can be converted into these special cases (cf. Pham and Le Thi, 1995; Flippo and Jansen, 1996). There is another way to construct a dual problem of (4.10). For λ₁ ≥ 0, λ₂ ≥ 0 such that the matrix (A − λ₁Q + λ₂Q) is regular, define the quadratic dual function h(λ₁, λ₂) as the unconstrained minimum of the Lagrangian L(x, λ₁, λ₂) over x ∈ ℝⁿ. Then the optimization problem

sup h(λ₁, λ₂)   s.t.   λ₁ ≥ 0, λ₂ ≥ 0, (A − λ₁Q + λ₂Q) is positive definite    (4.12)

can be considered as a dual of problem (4.10), and we have the following.
THEOREM 4.10 (CF. STERN AND WOLKOWICZ, 1995) Assume that problem (4.10) has an optimal solution and that there is λ ∈ ℝ such that the matrix A − λQ is positive definite. Then strong duality holds for the problem pair (4.10)–(4.12), i.e., f* = h*, where f* and h* denote the optimal values of problem (4.10) and problem (4.12), respectively.
3. Solution methods

The general quadratic programming problem is NP-hard (cf. Sahni, 1974; Pardalos and Vavasis, 1991). In this section, we present some main solution methods for the global optimization of this NP-hard problem. In general, these methods are developed based on three basic concepts which are successfully used in global optimization. We describe these concepts briefly before presenting different techniques for their realization in general quadratic programming. For details of the three basic concepts, see, e.g., Horst and Tuy (1996), Horst and Pardalos (1995), and Horst et al. (2000, 1991). It is worth noting that most techniques to be presented here can also be applied to integer and mixed integer quadratic programming problems (which do not belong to the subject of this overview).
3.1 Basic concepts

Outer approximation (OA). To establish this concept, we consider the problem of minimizing a linear function ⟨c, x⟩ over a closed subset F ⊂ ℝⁿ. This problem can be replaced by the problem of finding an extreme optimal solution of the problem min{⟨c, x⟩: x ∈ F̄}, where F̄ denotes the convex hull of F. Let C₁ be any closed convex set containing F and assume that x¹ is an optimal solution of the problem min{⟨c, x⟩: x ∈ C₁}. Then x¹ is also an optimal solution of the original problem whenever x¹ ∈ F. The basic idea of the outer approximation concept is to construct iteratively a sequence of convex subsets {C_k}, k = 1, 2, …, such that C₁ ⊃ C₂ ⊃ ⋯ ⊃ F, and a corresponding sequence {x^k} such that, for each k, x^k is an optimal solution of the relaxed problem min{⟨c, x⟩: x ∈ C_k}. This process is performed until some x^k ∈ F is found. An OA procedure is convergent if x^k → x* ∈ F for k → +∞.

Branch and bound scheme (BB). The BB scheme is developed for the global optimization of the problem f* = min{f(x): x ∈ F} with f a continuous function and F a compact subset of ℝⁿ. It begins with a convex compact set C₁ ⊃ F and proceeds as follows. Compute a lower bound μ₁ and an upper bound γ₁ for the optimal value of the problem min{f(x): x ∈ C₁ ∩ F} (γ₁ = f(x¹) if some feasible solution x¹ ∈ F is found; otherwise, γ₁ = +∞). At iteration k ≥ 1, if +∞ > μ_k ≥ γ_k or μ_k = +∞, then stop (in the first case, x^k with f(x^k) = γ_k is an optimal solution; in the second case, the underlying problem has no feasible solution). Otherwise, divide C_k into finitely many convex sets C_{k1}, …, C_{kr} satisfying ⋃_{i=1}^r C_{ki} = C_k and int C_{ki} ∩ int C_{kj} = ∅ for i ≠ j (the sets C_{ki} are called 'partition sets'). Compute for each partition set a lower bound and an upper bound. Update the lower bound by choosing the minimum of the lower bounds over all existing partition sets, and update the upper bound by using the feasible points found so far. Delete all partition sets whose lower bounds are greater than or equal to the current upper bound. If not all partition sets are deleted, let C_{k+1} be a partition set with the minimum lower bound, and go to iteration k + 1. A BB algorithm is convergent if γ_k ↘ f* and/or μ_k ↗ f* for k → +∞.

Combination of BB and OA. In many situations, using the BB scheme in combination with an OA procedure for bounding can lead to efficient algorithms. Such a combination is called a branch and cut algorithm if an OA procedure using convex polyhedral subsets C_k, k ≥ 1, is applied.
3.2 Reformulation-linearization techniques
Consider quadratic programming problems of the form

min f(x) = ⟨c, x⟩
s.t. g_i(x) ≤ 0,   i = 1, …, l,    (4.13)
⟨a_i, x⟩ − b_i ≤ 0,   i = 1, …, m,

where c ∈ ℝⁿ, a_i ∈ ℝⁿ, b_i ∈ ℝ for all i = 1, …, m, and for each i = 1, …, l the quadratic function g_i is given by

g_i(x) = d_i + Σ_{k=1}^n q_k^i x_k + Σ_{k=1}^n Q_{kk}^i x_k² + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl}^i x_k x_l,    (4.14)

with d_i, q_k^i, Q_{kl}^i being given real numbers for all i, k, l. It is assumed that the polyhedral set X = {x ∈ ℝⁿ: ⟨a_i, x⟩ − b_i ≤ 0, i = 1, …, m} is bounded and contained in ℝⁿ₊ = {x ∈ ℝⁿ: x ≥ 0}. The first linear relaxation of problem (4.13) is performed as follows. For each quadratic function of the form

q(x) = d + Σ_{k=1}^n q_k x_k + Σ_{k=1}^n Q_{kk} x_k² + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl} x_k x_l,    (4.15)

define additional variables

v_k = x_k²,   k = 1, …, n,   and   w_{kl} = x_k x_l,   k = 1, …, n − 1;  l = k + 1, …, n.
From (4.15), one obtains the following linear function in the variables x, v, w:

[q(x)]_ℓ = d + Σ_{k=1}^n q_k x_k + Σ_{k=1}^n Q_{kk} v_k + Σ_{k=1}^{n−1} Σ_{l=k+1}^n Q_{kl} w_{kl}.    (4.16)

The linear program (in variables x, v and w)

min f(x) = ⟨c, x⟩
s.t. [g_i(x)]_ℓ ≤ 0,   i = 1, …, l,    (4.17)
[(b_i − ⟨a_i, x⟩)(b_j − ⟨a_j, x⟩)]_ℓ ≥ 0,   ∀ 1 ≤ i ≤ j ≤ m,

is then a linear relaxation of (4.13) in the following sense (cf. Sherali and Tuncbilek, 1995; Audet et al., 2000): let f* and f̃ be the optimal values of problems (4.13) and (4.17), respectively, and let (x̃, ṽ, w̃) be an optimal solution of (4.17). Then
4 General Quadratic Programming
117
(a) f * 2 f and (b) if @k = 3; V k = 1 , . . . ,n, zEkl = % k 3 l V k = 1 , . . . ,n 1,. . . ,n , then Z is an optimal solution of (4.13).
-
1; 1 = k
+
Geometrically, the convex hull of the (nonconvex) feasible set of problem (4.13) is relaxed by the projection of the polyhedral feasible set of problem (4.17) on Rn. As well-known, this projection is polyhedral. In the case that the condition in (b) is not fulfilled, i.e., either flk # 3; for at least one index k or zEkl # 3kZl for at least one index pair ( k ,l), a family of linear inequalities have to be added to problem (4.17) to cut zE) off from the feasible set of (4.17) without cutting off the point (3,@, any feasible point of (4.13). To this purpose, several kinds of cuts are discussed in connection with branch and bound procedures. Resulting branch and cut algorithms can be found, e.g., in Al-Khayyal and Falk (1983) and Audet et al. (2000).
3.3 Lift-and-project techniques

The first ideas of lift-and-project techniques were proposed by Sherali and Adams (1990) and Lovász and Schrijver (1991) for zero-one optimization. These basic ideas can be applied to quadratic programming as follows. The quadratic programming problem to be considered is given in the form

min f(x) = (c, x)
s.t. g_i(x) ≤ 0, i = 1, ..., m,        (4.18)
x ∈ C,
where C is a compact convex subset of R^n, c ∈ R^n, and each function g_i is given by

g_i(x) = (Q_i x, x) + 2(q_i, x) + d_i        (4.19)

with Q_i an n × n symmetric matrix, q_i ∈ R^n, and d_i ∈ R. To each vector x = (x_1, ..., x_n)^T ∈ R^n, the symmetric matrix X = x x^T ∈ R^{n×n} with elements X_ij = x_i x_j (i, j = 1, ..., n) is assigned. Let S^n be the set of n × n symmetric matrices. Then each quadratic function

g(x) = (Q x, x) + 2(q, x) + d

on R^n is lifted to a linear function on R^n × S^n defined by

(Q, X) + 2(q, x) + d,

where (Q, X) = Σ_{i,j=1}^n Q_ij X_ij stands for the inner product of Q and X ∈ S^n.
Thus, the set {x ∈ R^n : (Q x, x) + 2(q, x) + d ≤ 0} can be approximated by the projection of the set

{(x, X) ∈ R^n × S^n : (Q, X) + 2(q, x) + d ≤ 0}

onto R^n. In this way, the feasible set of problem (4.18) is approximated by the set

{x ∈ R^n : (Q_i, X) + 2(q_i, x) + d_i ≤ 0 for some X ∈ S^n, i = 1, ..., m, x ∈ C},        (4.20)

and problem (4.18) is then relaxed by the problem

min f(x) = (c, x)
s.t. (Q_i, X) + 2(q_i, x) + d_i ≤ 0, i = 1, ..., m,        (4.21)
x ∈ C, X ∈ S^n.
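The lifting step can be verified numerically: on the rank-one matrix X = x xᵀ, the lifted linear function (Q, X) + 2(q, x) + d reproduces the quadratic (4.19), while a general X ∈ Sⁿ makes the constraint linear. The helper names below are our own illustrative sketch.

```python
def quad(Q, q, d, x):
    """g(x) = <Qx, x> + 2<q, x> + d, as in (4.19)."""
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + 2.0 * sum(q[i] * x[i] for i in range(n)) + d)

def lifted(Q, q, d, x, X):
    """The lifted linear function <Q, X> + 2<q, x> + d on R^n x S^n."""
    n = len(x)
    return (sum(Q[i][j] * X[i][j] for i in range(n) for j in range(n))
            + 2.0 * sum(q[i] * x[i] for i in range(n)) + d)

Q = [[2.0, 1.0], [1.0, 3.0]]
q = [0.5, -1.0]
x = [1.0, -2.0]
X = [[xi * xj for xj in x] for xi in x]      # the rank-one matrix x x^T
# On X = x x^T the linear lifting agrees with the quadratic exactly.
assert abs(quad(Q, q, -4.0, x) - lifted(Q, q, -4.0, x, X)) < 1e-12
```

The relaxation in (4.21) arises precisely because X is allowed to range over all of Sⁿ rather than only over the rank-one matrices x xᵀ.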
Next, notice that for each x E Rn, the matrix
is positive semidefinite. 'Therefore, problem (4.18) can also be relaxed by the problem min f (x) = (c, x) s.t. ( Q ~ , X ) + ~ ( ~ ~ , X ) +i~=~l , 0 it is N P - h a r d to find a feasible solution to the linear bzievel programming problem with n o more than E times the optimal value. Related results can also be found in Hansen et al. (1992). In bicriterial optimization problems two objective functions are minimized simultaneously over the feasible set (Pardalos et al., 1995). To formulate them, a vector valued objective function can be used: ((
"min"_x {a(x) : x ∈ X},        (6.11)
6 Bilevel Programming
where X ⊂ R^n and a : X → R². In such problems, a compromise between the two, in general competing, objective functions a_1(x) and a_2(x) is looked for. Roughly speaking, one approach for such problems is to call a point x* ∈ X a solution if it is not possible to improve both objective functions at x* simultaneously. Such points are clearly compromise points. More formally, x* ∈ X is Pareto optimal for problem (6.11) if there is no x ∈ X with a(x*) − a(x) ∈ R²_+ \ {0}.
In this definition, the first orthant R²_+ in R² is used as an ordering cone, i.e., to establish a partial ordering in the space of objective function values of problem (6.11), which is R². In a more general formulation, another ordering cone V ⊂ R² is used. The cone V is assumed to be convex and pointed. Then, x* ∈ X is Pareto optimal for problem (6.11) with respect to the ordering cone V if there is no x ∈ X with a(x*) − a(x) ∈ V \ {0}.
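The Pareto optimality definition above translates directly into a brute-force filter over a finite set of objective vectors; the function below is our own illustrative sketch using the ordering cone R²_+.

```python
def pareto_points(points):
    """Points that are Pareto optimal for componentwise minimization:
    no other point is <= in every component and different (i.e., strictly
    better in at least one component)."""
    def dominates(p, s):
        # p dominates s in the R^2_+ (componentwise) ordering.
        return all(pi <= si for pi, si in zip(p, s)) and p != s
    return [s for s in points if not any(dominates(p, s) for p in points)]

# Example: three non-dominated compromise points survive.
front = pareto_points([(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)])
```

A general convex pointed ordering cone V only changes the dominance test; the enumeration pattern stays the same.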
The relations of bilevel programming to bicriterial optimization have been investigated, e.g., in the papers Fliege and Vicente (2003); Haurie et al. (1990); Marcotte and Savard (1991). On the one hand, using R²_+ as the ordering cone, it is easy to see that at least one feasible point of the bilevel programming problem (6.1), (6.6) is Pareto optimal for the corresponding bicriterial problem. But this, in general, is not true for a (local) optimal solution of the bilevel problem. Hence, attempts to solve the bilevel programming problem via bicriterial optimization with the ordering cone R²_+ will in general not work. On the other hand, Fliege and Vicente (2003) show that bicriterial optimization can indeed be used to prove optimality for the bilevel programming problem. But, for doing so, another, more general ordering cone has to be used. Closely related to bilevel programming problems are also the problems of minimizing a function over the efficient set of some multicriterial optimization problem (see Fülöp, 1993; Muu, 2000). One tool often used to reformulate the optimistic bilevel programming problem as a one-level problem are the Karush-Kuhn-Tucker conditions. If a regularity condition is satisfied for the lower level problem (6.1), then the Karush-Kuhn-Tucker conditions are necessary optimality
conditions. They are also sufficient in the case when (6.1) is a convex optimization problem in the x-variables for fixed parameters y. This suggests to replace problem (6.1), (6.6) by

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0,
∇_x f(x, y) + λ^T ∇_x g(x, y) = 0,        (6.12)
λ ≥ 0, g(x, y) ≤ 0, λ^T g(x, y) = 0.
Problem (6.12) is called an (MPEC), i.e., a mathematical program with equilibrium constraints, in the literature (Luo et al., 1996). The relations between (6.1), (6.4), (6.5) and (6.12) are highlighted in the following theorem.

THEOREM 6.5 (Dempe, 2002) Consider the optimistic bilevel programming problem (6.1), (6.4), (6.5) and assume that, for each fixed y, the lower level problem (6.1) is a convex optimization problem for which (MFCQ) is satisfied at all feasible points. Then, each local optimal solution of the problem (6.1), (6.4), (6.5) corresponds to a local optimal solution of problem (6.12).

This implies that it is possible to solve the optimistic bilevel programming problem via an (MPEC), but only if the lower level problem has a unique optimal solution for all values of the parameter or if it is possible to avoid false stationary points of the (MPEC). The solution of a pessimistic bilevel programming problem via an (MPEC) is not possible. Note that the opposite implication is not true in general. This can be seen in the following example.
EXAMPLE 6.4 Consider the simple optimistic linear bilevel programming problem

min_{x,y} {y : x ∈ Ψ(y), −1 ≤ y ≤ 1},

where Ψ(y) := Argmin_x {xy : 0 ≤ x ≤ 1}, at the point (x, y) = (0, 0). Then

Ψ(y) = [0, 1] if y = 0, {1} if y < 0, {0} if y > 0.

Take 0 < ε < 1 and set W_ε(0, 0) = (−ε, ε) × (−ε, ε). Then,
Since the infimal value of the upper level objective function F(x, y) = y on this set is zero, the point (x, y) = (0, 0) is a local optimal solution of problem (6.12). Due to its definition, the value function of the optimistic problem equals y for every feasible y. Since this function has no local minimum at y = 0, this point is not a local optimistic optimal solution.

The essential reason for the behavior in this example is the lack of lower semicontinuity of the mapping Ψ(y), which makes it reproducible in a more general setting. A first implication of these considerations is that the problems (6.1), (6.4), (6.5) and (6.1), (6.6) are not equivalent if local optimal solutions are considered; a second one is that not all local optimal solutions of problem (6.12) correspond in general to local optimal solutions of problem (6.1), (6.4), (6.5). It should be noted that, under the assumptions of Theorem 6.5, and if the optimal solutions of the lower level problem are strongly stable in the sense of Kojima (1980) (cf. Theorem 6.6 below), then the optimistic bilevel programming problem (6.1), (6.2) is equivalent to the (MPEC) (6.12). The following example from Mirrlees (1999) shows that this result is no longer valid if the convexity assumption is dropped.
EXAMPLE 6.5 Consider the problem

where Ψ(y) is the set of optimal solutions of the following unconstrained optimization problem on the real axis:

Then, the necessary optimality condition for the lower level problem is

y (x + 1) exp{−(x + 1)²} + (x − 1) exp{−(x − 1)²} = 0.
≥ 0 for all r satisfying ∇G_i(y⁰) r ≤ 0, i : G_i(y⁰) = 0.

This property is usually called Bouligand stationarity (or B-stationarity) of the point y⁰.
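The stationarity equation of Example 6.5 reconstructed above can be examined numerically. The residual function below encodes it directly; the crude Newton iteration with a difference-quotient derivative is our own illustrative device, not part of the original text.

```python
import math

def stationarity_residual(x, y):
    """Left-hand side of the lower level optimality condition of Example 6.5:
    y*(x+1)*exp(-(x+1)^2) + (x-1)*exp(-(x-1)^2)."""
    return (y * (x + 1.0) * math.exp(-(x + 1.0) ** 2)
            + (x - 1.0) * math.exp(-(x - 1.0) ** 2))

def newton_root(x0, y, h=1e-7, iters=50):
    """Newton iteration on the residual, with a numerical derivative."""
    x = x0
    for _ in range(iters):
        r = stationarity_residual(x, y)
        dr = (stationarity_residual(x + h, y) - r) / h
        x -= r / dr
    return x
```

For y = 0 the condition is solved by x = 1; for small y > 0 the stationary point moves slightly, which the iteration recovers when started at x = 1.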
4.2 Using the KKT conditions

If the Karush-Kuhn-Tucker conditions are applied to replace the lower level problem by a system of equations and inequalities, problem (6.12) is obtained. Example 6.5 shows that it is possible to obtain necessary optimality conditions for the bilevel programming problem by this approach only in the case when the lower level problem is a convex parametric one, and also only using the optimistic position. But even in this case this is not easy, since the familiar regularity conditions are not satisfied for this problem.
THEOREM 6.9 (Scheel and Scholtes, 2000) For problem (6.12) the Mangasarian-Fromovitz constraint qualification (MFCQ) is violated at every feasible point.

To circumvent the resulting difficulties for the construction of Karush-Kuhn-Tucker type necessary optimality conditions for the bilevel programming problem, in Scheel and Scholtes (2000) a nonsmooth version of the KKT reformulation of the optimistic bilevel programming problem is constructed:

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0,
∇_x L(x, y, λ) = 0,        (6.14)
min{−g(x, y), λ} = 0.
Here, for a, b ∈ R^n, the equation min{a, b} = 0 is understood componentwise. For problem (6.14) the following generalized variant of the linear independence constraint qualification can be defined (Scholtes and Stöhr, 2001):
(PLICQ) The piecewise linear independence constraint qualification is satisfied for problem (6.14) at a point (x⁰, y⁰, λ⁰) if the gradients of all the vanishing components of the constraint functions G(y), ∇_x L(x, y, λ), g(x, y), λ are linearly independent.

Problem (6.14) can be investigated by considering the following patchwork of nonlinear programs for fixed index sets I:

min_{x,y,λ} F(x, y)        (6.15)

Then, the piecewise linear independence constraint qualification is valid for problem (6.14) at some point (x⁰, y⁰, λ⁰) if and only if it is satisfied for each of the problems (6.15) for all index sets I with J(λ⁰) ⊆ I ⊆ I(x⁰, y⁰). The following theorem says that the (PLICQ) is generically satisfied. For this, define the set

𝓡 = {(F, G, f, g) ∈ C^l(R^{m+n}, R^{1+s+1+p}) : (PLICQ) is satisfied at each feasible point of (6.14) with ‖λ‖_∞ ≤ B}

for an arbitrary constant 0 < B < ∞, where ‖λ‖_∞ = max{|λ_i| : 1 ≤ i ≤ p} is the L_∞-norm of a vector λ ∈ R^p and l ≥ 2. Roughly speaking, the zero neighborhood in the (Whitney) C^k topology is indexed by a positive continuous function ε : R^p → R_+ and contains all (vector-valued) functions h ∈ C^k(R^p, R^t) such that each component function, together with all its derivatives up to order k, is bounded by the function ε. For details the interested reader is referred to Hirsch (1994).
Moreover, the set 𝓡 is also dense in the C^k-topology for all 2 ≤ k ≤ l. Now, after this excursion to regularity, the description of necessary optimality conditions for the bilevel programming problem with convex lower level problems using the optimistic position is continued. For the origin of the following theorem for mathematical programs with equilibrium constraints see Scheel and Scholtes (2000). There, a relaxation of problem (6.12) is considered:
min_{x,y,λ,μ} F(x, y)
subject to ∇_x L(x, y, λ, μ) = 0,        (6.16)
G(y) ≤ 0,
In the following theorem, a more restrictive regularity condition than (MFCQ) is needed:

(SMFCQ) The strict Mangasarian-Fromovitz constraint qualification (SMFCQ) is satisfied at x⁰ for problem (6.7) if there exists a Lagrange multiplier (λ, μ) as well as a direction d satisfying

∇P_i(x⁰) d < 0 for each i with P_i(x⁰) = λ_i = 0,
∇P_i(x⁰) d = 0 for each i with λ_i > 0,
∇γ_j(x⁰) d = 0 for each j,

and {∇P_i(x⁰) : λ_i > 0} ∪ {∇γ_j(x⁰) : j = 1, ..., q} are linearly independent.

Note that this condition is implied by (PLICQ).
THEOREM 6.11 Let (x⁰, y⁰, λ⁰) be a local minimizer of problem (6.14) and set z⁰ = (x⁰, y⁰).

If the (MFCQ) is valid for problem (6.16) at (x⁰, y⁰, λ⁰), then there exist multipliers (κ, ω, ζ, ξ) satisfying

∇F(z⁰) + κ^T (0, ∇G(y⁰)) + ω^T ∇(∇_x L(z⁰, λ⁰)) + ζ^T ∇g(z⁰) = 0,
∇_x g(z⁰) ω − ξ = 0,
κ ≥ 0, κ^T G(y⁰) = 0,
ζ_i g_i(z⁰) = 0 ∀i, λ_i ξ_i = 0 ∀i,
ζ_i ξ_i ≥ 0, i ∈ K,

where K = {i : g_i(x⁰, y⁰) = λ_i⁰ = 0} and ∇ denotes the gradient with respect to (x, y).

If the (SMFCQ) is fulfilled for problem (6.16), then there exist unique multipliers (κ, ω, ζ, ξ) solving the last system of equations and inequalities with ζ_i ξ_i ≥ 0, i ∈ K, being replaced by ζ_i ≥ 0 and ξ_i ≥ 0 for all i ∈ K.
For related optimality conditions see, e.g., Flegel and Kanzow (2002, 2003).

5. Solution algorithms

5.1 Implicit function approach
To solve the bilevel programming problem, it is reformulated as a one-level problem. The first approach again uses the implicitly determined solution function x(y) of the convex lower level problem, provided this function is uniquely determined. If the assumptions (C), (MFCQ), (SSOC), and (CRCQ) are satisfied for (6.1) at every point y with G(y) ≤ 0, then the resulting problem
Let I ⊇ J(x(y⁰), y⁰) := {j : g_j(x(y⁰), y⁰) = 0}, and let λ⁰ be a Lagrange multiplier vector of the lower level problem corresponding to the optimal solution x(y⁰) for y = y⁰. If the constraints g_i(x, y) ≤ 0 in problem (6.1) are locally replaced by g_i(x, y) = 0, i ∈ I, the resulting lower level problems are

min_x {f(x, y) : g_i(x, y) = 0 ∀i ∈ I}.        (6.17)

If the gradients {∇_x g_i(x(y⁰), y⁰) : i ∈ I} are moreover linearly independent (which can be guaranteed for small sets I ⊇ J(λ⁰) with λ⁰ being a vertex of Λ(x(y⁰), y⁰)), then the optimal solution function x^I(·) of problem (6.17) is differentiable (Fiacco, 1983). Let 𝓘 denote the family of all index sets determined by the above two demands for all vertices λ⁰ ∈ Λ(x(y⁰), y⁰).
THEOREM 6.12 (Dempe and Pallaschke, 1997) Consider problem (6.1) at the point z⁰ := (x(y⁰), y⁰) and let (MFCQ), (SSOC) as well as (CRCQ) be satisfied there. If the condition

(FRR) For each vertex λ⁰ ∈ Λ(z⁰) the matrix has full row rank n + |I(x(y⁰), y⁰)|

is valid, then the generalized derivative of the function x(·) at the point y = y⁰ in the sense of Clarke (1983) is

∂x(y⁰) = conv ⋃_{I ∈ 𝓘} ∇x^I(y⁰).

Using this formula, a bundle algorithm (cf. Outrata et al., 1998) can be derived to solve problem (6.13). Since the full description of bundle algorithms is rather lengthy, the interested reader is referred, e.g., to Outrata et al. (1998). Repeating the results in Schramm (1989) (cf. also Outrata et al., 1998), the following result is obtained:

THEOREM 6.13 (Dempe, 2002) If the assumptions (C), (MFCQ), (CRCQ), (SSOC), and (FRR) are satisfied for the convex lower level problem (6.1) at all points (x, y), x ∈ Ψ(y), G(y) ≤ 0, and the sequence of iteration points {(x(y^k), y^k, λ^k)}_{k=1}^∞ in the bundle algorithm remains bounded, then this algorithm computes a sequence {(x(y^k), y^k, λ^k)}_{k=1}^∞ having at least one accumulation point (x(y⁰), y⁰, λ⁰) with

If assumption (FRR) is not satisfied, then the point (x(y⁰), y⁰) is pseudostationary in the sense of Mikhalevich et al. (1987). Hence, under suitable assumptions the bundle algorithm computes a Clarke stationary point. Such points are in general not Bouligand stationary.
5.2 A smoothing method

To solve problem (6.12), several authors (e.g., Fukushima and Pang, 1999) use an NCP function approach to replace the complementarity
constraints. This results in the nondifferentiable problem

min_{x,y,λ} F(x, y)        (6.18)
subject to G(y) ≤ 0, ∇_x L(x, y, λ) = 0,
Φ(−g_i(x, y), λ_i) = 0, i = 1, ..., p,

where a function Φ(·, ·) satisfying

Φ(a, b) = 0 ⟺ a ≥ 0, b ≥ 0, ab = 0

is called an NCP function. Examples and properties of NCP functions can be found in the book of Geiger and Kanzow (2003). NCP functions are inherently nondifferentiable, and algorithms solving problem (6.18) use smoothed NCP functions. Fukushima and Pang (1999) use a smoothed NCP function Φ_ε and solve the resulting problems

min_{x,y,λ} F(x, y)
subject to G(y) ≤ 0, ∇_x L(x, y, λ) = 0,        (6.19)
Φ_ε(−g_i(x, y), λ_i) = 0, i = 1, ..., p,

for ε → 0 with suitable standard algorithms. Hence, selecting an arbitrary sequence {ε_k}_{k=1}^∞, they compute a sequence {(x^k, y^k, λ^k)}_{k=1}^∞ of solutions and investigate the properties of the accumulation points of this sequence. To formulate their convergence result, the assumption of weak nondegeneracy is needed. To formulate this assumption, consider the Clarke derivative of the function Φ(−g_i(x, y), λ_i). This Clarke derivative exists and is contained in a set C_i(x̄, ȳ, λ̄).
Let the point (x̄, ȳ, λ̄) be an accumulation point of the sequence {(x^k, y^k, λ^k)}_{k=1}^∞. It is then easy to see that, for each i ∈ I(x̄, ȳ) \ J(λ̄), any accumulation point of the corresponding sequence of Clarke derivatives belongs to C_i(x̄, ȳ, λ̄), hence is of the form (ξ_i, λ_i) with (1 − ξ_i)² + (1 − λ_i)² ≤ 1. It is said that the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ is asymptotically weakly nondegenerate if, in this formula, neither ξ_i nor λ_i vanishes for any accumulation point of {(x^k, y^k, λ^k)}_{k=1}^∞. Roughly speaking, this means that both g_i(x^k, y^k) and λ_i^k approach zero in the same order of magnitude (see Fukushima and Pang, 1999).

THEOREM 6.14 (Fukushima and Pang, 1999) Let for each point (x^k, y^k, λ^k) the necessary optimality conditions of second order for problem (6.19) be satisfied. Suppose that the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ converges to some (x̄, ȳ, λ̄) for k → ∞. If the (PLICQ) holds at the limit point and the sequence {(x^k, y^k, λ^k)}_{k=1}^∞ is asymptotically weakly nondegenerate, then (x̄, ȳ, λ̄) is a Bouligand stationary solution of problem (6.12).
5.3 SQP methods

Recently several authors have reported (in view of the violated regularity condition, rather surprisingly) good behavior of SQP methods for solving mathematical programs with equilibrium constraints (see Anitescu, 2002; Fletcher et al., 2002; Fletcher and Leyffer, 2002). To sketch these results, consider a bilevel programming problem (6.6) with a convex parametric lower level problem (6.1) and assume that a regularity assumption is satisfied for each fixed parameter value y with G(y) ≤ 0. Then, by Theorem 6.5, a locally optimal solution of the bilevel programming problem corresponds to a locally optimal solution of problem (6.12). Consequently, in order to compute local minima of the bilevel problem, problem (6.12) can be solved. In doing this, Anitescu (2002) uses the elastic mode approach in a sequential quadratic programming algorithm solving (6.12). This means that if a quadratic programming problem, minimizing a quadratic approximation of the objective function of problem (6.12) subject to a linear approximation of the constraints of this problem, has a feasible solution with bounded Lagrange multipliers, then the solution of this problem is used as a search direction. If not, a regularized quadratic programming problem is used to compute this search direction. For simplicity, this idea is described for problem (6.7). Then the following problem is used to compute the search direction:
subject to P_i(x) + ∇P_i(x) d ≤ 0, ∀i = 1, ..., p,
γ_j(x) + ∇γ_j(x) d = 0, ∀j = 1, ..., q.

Here, W can be the Hessian matrix of the Lagrange function of problem (6.7) or another positive definite matrix approximating this Hessian. If this problem has no feasible solution or unbounded Lagrange multipliers, the solution of problem (6.7) (or, accordingly, the solution process for problem (6.12)) with the sequential quadratic programming approach is replaced by the solution of the following problem by the same approach:
where c is a sufficiently large constant. This is the elastic mode SQP method. To implement the idea of Anitescu (2002), assume that problem (6.16) satisfies the (SMFCQ) and that the quadratic growth condition at a point x = x⁰,

(QGC) There exists α > 0 satisfying F(x) ≥ F(x⁰) + α ‖x − x⁰‖² for all feasible x in some open neighborhood of x⁰,

is valid for problem (6.12) at a locally optimal solution of this problem.
THEOREM 6.15 (Anitescu, 2002) If the above two assumptions are satisfied, then the elastic mode sequential quadratic programming algorithm computes a locally optimal solution of problem (6.12), provided it is started sufficiently close to that solution and the constant c is sufficiently large.

Using stronger assumptions, Fletcher et al. (2002) have even been able to prove local Q-quadratic convergence of sequential quadratic programming algorithms to solutions of (6.12).
6. Discrete bilevel programming

If integer variables appear in the lower or upper level of a bilevel programming problem, the investigation becomes more difficult and the number of references is rather small; see Dempe (2003). With respect to the existence of optimal solutions, the location of the discrete variables is important (Vicente et al., 1996). Most difficult is the situation when the lower level problem is a parametric discrete one and the upper level
problem is a continuous one. Then the graph of the solution set mapping Ψ(·) is in general neither closed nor open. The other cases can be treated more or less analogously to the continuous problems. One way to solve discrete optimization problems (and also bilevel programming problems) is branch-and-bound. If the integrality conditions in both levels are dropped at the beginning and are introduced via the branching procedure, then a global optimal solution of the relaxed problem, which occasionally proves to be feasible for the bilevel problem, is in general not an optimal solution of the bilevel programming problem. Moreover, the usual fathoming procedure is not valid; see Moore and Bard (1990). Fathoming is used in a branch-and-bound algorithm to decide that a node of the enumeration tree need not be explored further. This decision cannot be based on the comparison of the incumbent objective value with the optimal objective function value of the relaxed problem, even if an optimal solution of the latter problem proves to be feasible for the bilevel problem. Mixed-discrete linear bilevel programming problems with continuous lower level problems have been transformed into linear bilevel problems in Audet et al. (1997), which opens a second way of solving such problems. Other solution methods include one using explicitly the solution set mapping of a right-hand side parametrized Boolean knapsack problem in the lower level, and another one using cutting planes in the discrete lower level problem with parameters in the objective function only (see Dempe, 2002). To describe a further approach, consider a linear bilevel programming problem with integer variables in the upper level problem only:
subject to A₁x ≤ b₁, x ≥ 0, integer,        (6.20)

where y solves

Then, an idea of White and Anandalingam (1993) can be used to transform this problem into a mixed discrete optimization problem. For this, apply the Karush-Kuhn-Tucker conditions to the lower level problem.
This transforms problem (6.20) into

subject to A₁x ≤ b₁,
x ≥ 0, integer,        (6.21)
B₂ᵀλ ≥ d₂,

Now use a penalty function approach to get rid of the complementarity constraint, resulting in the problem

subject to A₁x ≤ b₁, x ≥ 0, integer,        (6.22)
A₂x + B₂y = b₂, y ≥ 0, B₂ᵀλ ≥ d₂.

By application of the results in White and Anandalingam (1993), the following is obtained:
THEOREM 6.16 Assume that problem (6.22) has an optimal solution for some positive K₀. Then, problem (6.22) describes an exact penalty function approach for problem (6.20), i.e., there is a number K* such that the optimal solutions of problems (6.22) and (6.20) coincide for all K ≥ K*.

This idea has been used in Dempe and Kalashnikov (2002) to solve an application problem in the gas industry. Moreover, the implications of moving the discreteness condition from the lower to the upper level problem have been touched upon there.
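The exact penalty mechanism of Theorem 6.16 can be illustrated on a toy problem with our own data (unrelated to the specific problems (6.20)-(6.22)): minimize (x−1)² + (y−1)² over x, y ≥ 0 subject to the complementarity condition x·y = 0, with the complementarity moved into the objective as a penalty term K·x·y.

```python
def solve_penalized(K, grid):
    """Brute-force minimum of (x-1)^2 + (y-1)^2 + K*x*y over a grid;
    the complementarity condition x*y = 0 is handled only by the penalty."""
    return min(((x - 1.0) ** 2 + (y - 1.0) ** 2 + K * x * y, x, y)
               for x in grid for y in grid)

grid = [i / 10.0 for i in range(11)]            # {0.0, 0.1, ..., 1.0}
val0, x0, y0 = solve_penalized(0.0, grid)       # K = 0: complementarity ignored
valK, xK, yK = solve_penalized(100.0, grid)     # large K: complementarity enforced
print((x0, y0))   # (1.0, 1.0) -- violates x*y = 0
print((xK, yK))   # an axis point with xK * yK == 0
```

For small K the penalized minimizer violates the complementarity constraint; beyond a finite threshold K* the penalized and constrained minimizers coincide, which is the exact penalty property stated in the theorem.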
7. Conclusion

In this paper a selective survey of results in bilevel programming has been given. It was not the intention of the author to give a detailed description of one or two results, but rather to give an overview of different directions of research and to describe some of the challenges of this topic. Since bilevel programming is a very active area, a huge number of questions remain open. Among others, these include optimality conditions as well as solution algorithms for problems with nonconvex lower level problems, discrete bilevel programming problems in every context, and many questions related to the investigation of pessimistic bilevel
programming problems. Also, one implication from NP-hardness often used in theory is that such problems should be solved with approximation algorithms which, if possible, should be complemented by a bound on the accuracy of the computed solution. One example of such an approximation algorithm can be found in Marcotte (1986), but in general the description of such algorithms is a challenging task for future research.
References

Anandalingam, G. and Friesz, T. (eds.). (1992). Hierarchical Optimization. Annals of Operations Research, vol. 24.

Anitescu, M. (2002). On solving mathematical programs with complementarity constraints as nonlinear programs. Technical Report No. ANL/MCS-P864-1200, Department of Mathematics, University of Pittsburgh.

Audet, C., Hansen, P., Jaumard, B., and Savard, G. (1997). Links between linear bilevel and mixed 0-1 programming problems. Journal of Optimization Theory and Applications, 93:273-300.

Bard, J.F. (1998). Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht.

Clarke, F.H. (1983). Optimization and Nonsmooth Analysis. John Wiley & Sons, New York.

Dempe, S. (1992). A necessary and a sufficient optimality condition for bilevel programming problems. Optimization, 25:341-354.

Dempe, S. (2002). Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht.

Dempe, S. (2003). Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization, 52:333-359.

Dempe, S. and Kalashnikov, V. (2002). Discrete bilevel programming: Application to a gas shipper's problem. Preprint No. 2002-02, TU Bergakademie Freiberg, Fakultät für Mathematik und Informatik.

Dempe, S. and Pallaschke, D. (1997). Quasidifferentiability of optimal solutions in parametric nonlinear optimization. Optimization, 40:1-24.

Deng, X. (1998). Complexity issues in bilevel linear programming. In: Multilevel Optimization: Algorithms and Applications (A. Migdalas,
P.M. Pardalos, and P. Värbrand, eds.), pp. 149-164, Kluwer Academic Publishers, Dordrecht.

Fiacco, A.V. (1983). Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York.

Flegel, M.L. and Kanzow, C. (2002). Optimality conditions for mathematical programs with equilibrium constraints: Fritz John and Abadie-type approaches. Report, Universität Würzburg, Germany.

Flegel, M.L. and Kanzow, C. (2003). A Fritz John approach to first order optimality conditions for mathematical programs with equilibrium constraints. Optimization, 52:277-286.

Fletcher, R. and Leyffer, S. (2002). Numerical experience with solving MPECs as NLPs. Numerical Analysis Report NA/210, Department of Mathematics, University of Dundee, UK.

Fletcher, R., Leyffer, S., Ralph, D., and Scholtes, S. (2002). Local Convergence of SQP Methods for Mathematical Programs with Equilibrium Constraints. Numerical Analysis Report NA/209, Department of Mathematics, University of Dundee, UK.

Fliege, J. and Vicente, L.N. (2003). A Bicriteria Approach to Bilevel Optimization. Technical Report, Fachbereich Mathematik, Universität Dortmund, Germany.

Frangioni, A. (1995). On a new class of bilevel programming problems and its use for reformulating mixed integer problems. European Journal of Operational Research, 82:615-646.

Fukushima, M. and Pang, J.-S. (1999). Convergence of a smoothing continuation method for mathematical programs with complementarity constraints. In: Ill-posed Variational Problems and Regularization Techniques (M. Théra and R. Tichatschke, eds.). Lecture Notes in Economics and Mathematical Systems, No. 477, Springer-Verlag, Berlin.

Fülöp, J. (1993). On the Equivalence between a Linear Bilevel Programming Problem and Linear Optimization over the Efficient Set. Working Paper No. WP 93-1, Laboratory of Operations Research and Decision Systems, Computer and Automation Institute, Hungarian Academy of Sciences.

Geiger, C. and Kanzow, C. (2003). Theorie und Numerik restringierter Optimierungsaufgaben. Springer-Verlag, Berlin.
Guddat, J., Guerra Vasquez, F., and Jongen, H.Th. (1990). Parametric Optimization: Singularities, Pathfollowing and Jumps. John Wiley & Sons, Chichester, and B.G. Teubner, Stuttgart.

Hansen, P., Jaumard, B., and Savard, G. (1992). New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing, 13:1194-1217.

Harker, P.T. and Pang, J.-S. (1988). Existence of optimal solutions to mathematical programs with equilibrium constraints. Operations Research Letters, 7:61-64.

Haurie, A., Savard, G., and White, D. (1990). A note on: An efficient point algorithm for a linear two-stage optimization problem. Operations Research, 38:553-555.

Hirsch, M.W. (1994). Differential Topology. Springer-Verlag, Berlin.

Klatte, D. and Kummer, B. (2002). Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Kluwer Academic Publishers, Dordrecht.

Kojima, M. (1980). Strongly stable stationary solutions in nonlinear programs. In: Analysis and Computation of Fixed Points (S.M. Robinson, ed.), pp. 93-138, Academic Press, New York.

Lignola, M.B. and Morgan, J. (1997). Stability of regularized bilevel programming problems. Journal of Optimization Theory and Applications, 93:575-596.

Loridan, P. and Morgan, J. (1989). New results on approximate solutions in two-level optimization. Optimization, 20:819-836.

Loridan, P. and Morgan, J. (1996). Weak via strong Stackelberg problem: New results. Journal of Global Optimization, 8:263-287.

Lucchetti, R., Mignanego, F., and Pieri, G. (1987). Existence theorem of equilibrium points in Stackelberg games with constraints. Optimization, 18:857-866.

Luo, Z.-Q., Pang, J.-S., and Ralph, D. (1996). Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge.

Macal, C.M. and Hurter, A.P. (1997). Dependence of bilevel mathematical programs on irrelevant constraints. Computers and Operations Research, 24:1129-1140.
Marcotte, P. (1986). Network design problem with congestion effects: A case of bilevel programming. Mathematical Programming, 34:142-162.

Marcotte, P. and Savard, G. (1991). A note on the Pareto optimality of solutions to the linear bilevel programming problem. Computers and Operations Research, 18:355-359.

Migdalas, A., Pardalos, P.M., and Värbrand, P. (eds.). Multilevel Optimization: Algorithms and Applications. Nonconvex Optimization and its Applications, vol. 20, Kluwer Academic Publishers, Dordrecht.

Mikhalevich, V.S., Gupal, A.M., and Norkin, V.I. (1987). Methods of Nonconvex Optimization. Nauka, Moscow (in Russian).

Mirrlees, J.A. (1999). The theory of moral hazard and unobservable behaviour: Part I. Review of Economic Studies, 66:3-21.

Moore, J. and Bard, J.F. (1990). The mixed integer linear bilevel programming problem. Operations Research, 38:911-921.

Muu, L.D. (2000). On the construction of initial polyhedral convex set for optimization problems over the efficient set and bilevel linear programs. Vietnam Journal of Mathematics, 28:177-182.

Outrata, J., Kočvara, M., and Zowe, J. (1998). Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Kluwer Academic Publishers, Dordrecht.

Pardalos, P.M., Siskos, Y., and Zopounidis, C. (eds.). (1995). Advances in Multicriteria Analysis. Kluwer Academic Publishers, Dordrecht.

Ralph, D. and Dempe, S. (1995). Directional derivatives of the solution of a parametric nonlinear program. Mathematical Programming, 70:159-172.

Scheel, H. and Scholtes, S. (2000). Mathematical programs with equilibrium constraints: stationarity, optimality, and sensitivity. Mathematics of Operations Research, 25:1-22.

Scholtes, S. and Stöhr, M. (2001). How stringent is the linear independence assumption for mathematical programs with complementarity constraints? Mathematics of Operations Research, 26:851-863.

Schramm, H. (1989). Eine Kombination von Bundle- und Trust-Region-Verfahren zur Lösung nichtdifferenzierbarer Optimierungsprobleme. No. 30, Bayreuther Mathematische Schriften, Bayreuth.
6 Bilevel Programming
193
Shapiro, A. (1988).Sensitivity analysis of nonlinear programs and differentiability properties of metric projections. SIAM Journal Control Optimization, 26:628-645. Vicente, L.N.,Savard, G., and Judice, J.J. (1996). The discrete linear bilevel programming problem. Journal of Optimization Theory and Applications, 89:597-614. Vogel, S. (2002). Zwei-Ebenen-Optimierungsaufgaben mit nichtkonvexer Zielfunktion in der unteren Ebene: Pfadverfolgung und Spriinge. Ph. D thesis, Technische Universitat Bergakademie Freiberg. White, D.J. and Anandalingam, G. (1993). A penalty function approach for solving bi-level linear programs. Journal of Global Optimization, 3:397-419.
Chapter 7
APPLICATIONS OF GLOBAL OPTIMIZATION TO PORTFOLIO ANALYSIS

Hiroshi Konno

Abstract
We survey some of the recent successful applications of deterministic global optimization methods to financial problems. The problems to be discussed are mean-risk models under nonconvex transaction costs, minimal transaction unit constraints and cardinality constraints. Also, we discuss several bond portfolio optimization problems, long term portfolio optimization problems and others. These lead to concave/d.c. minimization problems, minimization of a nonconvex fractional function and of a sum of several fractional functions over a polytope, optimization over a nonconvex efficient set, and so on. Readers will find that a number of difficult global optimization problems have been solved in practice and that there is considerable room for applications of global optimization methods in finance.

1. Introduction
The purpose of this paper is to review some of the recent successful applications of global optimization methodologies in portfolio theory. Portfolio theory was originated by H. Markowitz in 1952 and has since developed into a diverse field of quantitative finance including market risk analysis, credit risk analysis, pricing of derivative securities, structured finance, securitization, real options and so on. Mathematical programming is widely used in these areas, but applications in market risk analysis are by far the most important. Also, it is virtually the only area in finance where global optimization methodologies have been applied in a successful way.
The starting point of portfolio theory is the mean-variance (MV) model (Markowitz, 1959), in which the risk measured by the
variance of the rate of return of the portfolio is minimized subject to a constraint on the level of expected return. This problem is formulated as a convex quadratic programming problem. Though mathematically simple, it took more than 30 years before a large scale mean-variance model was solved in practice, due to the computational difficulty associated with handling a completely dense variance-covariance matrix. The breakthrough occurred in 1984, when Perold (1984) solved a large scale mean-variance problem using a factor model approach and sparse matrix technologies. Twenty years later, we are now able to solve a very large scale MV model consisting of over 10,000 variables on a personal computer. If we replace variance by absolute deviation as a measure of risk, then we can solve the resulting mean-absolute deviation (MAD) model (Konno and Yamazaki, 1991) even when there are more than a million variables, since the problem is reduced to a linear programming problem.
Both the MV model and the MAD model can be formulated as convex minimization problems, so they have little to do with "global" optimization. However, when we extend the model one step further, we need to introduce a variety of nonconvex terms. These include, among others, transaction cost, tax, market impact, minimal transaction unit constraints and cardinality constraints. Then we need to apply global optimization approaches to solve the resulting nonconvex problems.
By global optimization methods, we mean here deterministic algorithms as discussed in the textbook of Horst and Tuy (1996). Also, we concentrate on a class of exact algorithms, i.e., those which generate an optimal solution in the limit or an ε-optimal solution in finitely many steps.
There are still relatively few successful applications of global optimization to finance. The reasons are two-fold. First, deterministic global optimization is a rather new area.
In fact, deterministic and exact algorithms were neglected in the survey paper of Rinnooy-Kan and Timmer (1989). This means that solving a nonconvex problem in a deterministic way had been considered intractable until the mid-1980s unless the problem had some special structure, such as concave minimization on an acyclic network (Zangwill, 1968). Heuristic and multi-start local search methods were the only practical methods for handling nonconvex problems without special structures. Therefore, most financial engineers are not aware of recent progress in global optimization and thus try to formulate the problem within the framework of convex minimization or simply apply local search or heuristic approaches.
Second, global optimizers are more interested in applications to physical problems. It appears that there is still a psychological barrier for mathematical programmers to do research in the dual (monetary) space.
In the next two sections, we discuss applications of global optimization to mean-risk models. A variety of nonconvex problems have been solved successfully by employing the mean-absolute deviation framework. Section 4 will be devoted to applications of fractional programming methods to bond portfolio analysis. Here we discuss the minimization of the sum of linear fractional functions and of the ratio of two convex functions over a polytope. Section 5 will be devoted to miscellaneous applications of global optimization in finance such as minimization over an efficient set, the long-term constant proportion portfolio problem, long-short portfolios and problems including integer constraints. Readers are referred to a recent survey on the applications of mathematical programming to finance by Mulvey (2001), a leading expert in both mathematical programming and financial engineering.
2. Mean-risk models
In the following, we present some basics of mean-risk models. Let there be n assets S_j, j = 1, 2, ..., n, and let R_j be the random variable representing the rate of return of S_j. Let x_j ≥ 0 be the proportion of the fund to be invested in S_j. The vector x = (x_1, x_2, ..., x_n) is called a portfolio, which has to satisfy the condition

x_1 + x_2 + ... + x_n = 1, x_j ≥ 0, j = 1, 2, ..., n. (7.1)
Let R(x) be the rate of return of the portfolio:

R(x) = R_1 x_1 + R_2 x_2 + ... + R_n x_n, (7.2)
and let r(x) and v(x) be, respectively, the mean and the variance of R(x). Then the mean-variance (MV) model is represented as follows:

(MV)  minimize   v(x)
      subject to r(x) ≥ ρ, (7.3)
                 x ∈ X,

where X ⊆ R^n is the investable set defined by (7.1), which may also contain additional linear constraints, and ρ is a constant to be specified by the investor.
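As a concrete illustration (not part of the original text), the quantities r(x) and v(x) entering (MV) can be computed directly from scenario data; all function names and numbers below are hypothetical.

```python
# Sketch: computing the mean r(x) and the variance v(x) of the portfolio
# return R(x) = sum_j R_j x_j from finitely many scenarios.
# Scenario returns rs[t][j] and probabilities f[t] are made-up numbers.

def portfolio_mean_variance(x, rs, f):
    """Return (r(x), v(x)) for portfolio x under scenarios rs with probabilities f."""
    # Portfolio rate of return in each scenario t
    R = [sum(rtj * xj for rtj, xj in zip(rt, x)) for rt in rs]
    mean = sum(ft * Rt for ft, Rt in zip(f, R))
    var = sum(ft * (Rt - mean) ** 2 for ft, Rt in zip(f, R))
    return mean, var

# Two assets, three equally likely scenarios (hypothetical data)
rs = [[0.10, 0.02], [0.00, 0.03], [-0.04, 0.01]]
f = [1 / 3, 1 / 3, 1 / 3]
x = [0.5, 0.5]
r_x, v_x = portfolio_mean_variance(x, rs, f)
```

Feeding r(x) and v(x) into a quadratic programming solver would then give the (MV) solution; the point of the sketch is only the scenario-based evaluation of the two ingredients.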
198
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Let x(ρ) be an optimal solution of problem (7.3). Then the trajectory of (v(x(ρ)), r(x(ρ))) as ρ varies is called the efficient frontier.
There are two alternative representations of the mean-variance model, namely

(MV2)  maximize   r(x)
       subject to v(x) ≤ σ², (7.4)
                  x ∈ X,

(MV3)  maximize   r(x) - λ v(x)
       subject to x ∈ X. (7.5)
All three representations are used interchangeably since they generate the same efficient frontier as we vary ρ in (MV), σ in (MV2) and λ ≥ 0 in (MV3).
There are several measures of risk other than variance (standard deviation), such as absolute deviation, lower semi-variance, (lower-semi) partial moments, below-target risk, value-at-risk (VaR) and conditional value-at-risk (CVaR). Most of these, except VaR, are convex functions of x. Mean-risk models are obtained from (7.3)-(7.5) by replacing the variance v(x) with one of the risk measures introduced above. However, the following three risk measures are by far the most important from the computational point of view when we extend the model in the direction of global optimization:

Absolute deviation:            W(x) = E[ |R(x) - E[R(x)]| ]
Lower semi-absolute deviation: W-(x) = E[ |R(x) - E[R(x)]|- ]
Below-target risk of degree one: BT1(x) = E[ |R(x) - τ|- ]  (τ is a constant)

where |u|- = max{0, -u}. These measures are important since the associated mean-risk model can be formulated as a linear programming problem when (R_1, R_2, ..., R_n) is distributed over a set of finitely many points (r_1t, r_2t, ..., r_nt), t = 1, 2, ..., T, and the probabilities

f_t = Pr{ (R_1, R_2, ..., R_n) = (r_1t, r_2t, ..., r_nt) }, t = 1, 2, ..., T, (7.6)
are known. For example, the mean-absolute deviation model

minimize   W(x)
subject to r(x) ≥ ρ, (7.7)
           x ∈ X
can be represented as follows:

minimize   Σ_{t=1}^T f_t | Σ_{j=1}^n (r_jt - r_j) x_j |
subject to Σ_{j=1}^n r_j x_j ≥ ρ, (7.8)
           x ∈ X,

where r_j = Σ_{t=1}^T f_t r_jt. It is straightforward to see that this problem can be converted to a linear programming problem:

minimize   Σ_{t=1}^T f_t (φ_t + ψ_t)
subject to φ_t - ψ_t = Σ_{j=1}^n (r_jt - r_j) x_j, t = 1, 2, ..., T,
           φ_t ≥ 0, ψ_t ≥ 0, t = 1, 2, ..., T, (7.9)
           Σ_{j=1}^n r_j x_j ≥ ρ, x ∈ X.
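The split-variable device behind (7.9), writing each absolute deviation as φ_t + ψ_t with φ_t - ψ_t equal to the deviation and φ_t, ψ_t ≥ 0, can be checked numerically. The sketch below uses made-up scenario data and hypothetical function names.

```python
# Sketch: W(x) computed directly vs. via the split variables used in the LP (7.9).
# At an LP optimum, phi_t = max(d_t, 0) and psi_t = max(-d_t, 0), so that
# phi_t + psi_t = |d_t| while phi_t - psi_t = d_t.

def mad_direct(x, rs, f):
    """Absolute deviation W(x) = E|R(x) - E[R(x)]| from scenario data."""
    R = [sum(r * xj for r, xj in zip(rt, x)) for rt in rs]
    rbar = sum(ft * Rt for ft, Rt in zip(f, R))
    return sum(ft * abs(Rt - rbar) for ft, Rt in zip(f, R))

def mad_split(x, rs, f):
    """The same quantity via the nonnegative split variables of the LP."""
    R = [sum(r * xj for r, xj in zip(rt, x)) for rt in rs]
    rbar = sum(ft * Rt for ft, Rt in zip(f, R))
    d = [Rt - rbar for Rt in R]
    phi = [max(dt, 0.0) for dt in d]   # phi_t
    psi = [max(-dt, 0.0) for dt in d]  # psi_t
    # The LP constraint phi_t - psi_t = d_t holds by construction:
    assert all(abs((p - q) - dt) < 1e-12 for p, q, dt in zip(phi, psi, d))
    return sum(ft * (p + q) for ft, p, q in zip(f, phi, psi))

rs = [[0.10, 0.02], [0.00, 0.03], [-0.04, 0.01]]  # hypothetical scenarios
f = [1 / 3, 1 / 3, 1 / 3]
x = [0.5, 0.5]
w_direct, w_split = mad_direct(x, rs, f), mad_split(x, rs, f)
```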
Also CVaR_α(x), defined through the lower α-quantile VaR_α(x) of R(x) by

CVaR_α(x) = E[ -R(x) | R(x) ≤ VaR_α(x) ], (7.10)

shares the same property as the above three measures (Rockafellar and Uryasev, 2001).
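For equally likely scenarios, and when (1 - α)T is an integer, CVaR_α amounts to averaging the worst (1 - α)T losses. A minimal sketch with made-up data (the function name is hypothetical):

```python
# Sketch: scenario CVaR of a portfolio return, assuming T equally likely
# scenarios and that (1 - alpha) * T is an integer, so CVaR_alpha is the
# average of the worst (1 - alpha) * T losses.

def cvar(x, rs, alpha):
    T = len(rs)
    k = round((1 - alpha) * T)  # number of tail scenarios
    losses = sorted(
        (-sum(r * xj for r, xj in zip(rt, x)) for rt in rs),
        reverse=True,
    )
    return sum(losses[:k]) / k

rs = [[0.10], [0.05], [0.00], [-0.10]]  # one asset, four scenarios (made up)
x = [1.0]
c75 = cvar(x, rs, alpha=0.75)  # worst 25% = the single worst scenario
```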
3. Mean-risk models under market friction
Markowitz formulated the mean-variance model assuming that there is no friction in the market. However, nonlinear transaction fees and taxes are associated with selling and/or buying assets. Also, we experience the so-called market impact effect when we buy a large amount of an asset: the unit price of the asset may increase due to the supply-demand relation, and thus the actual return would be substantially smaller than in an ideal frictionless market.
Also, we often need to handle discrete variables. Among such examples are minimal transaction unit constraints and cardinality constraints. The former is associated with the existence of a minimal unit one can trade in the market, usually 1,000 shares on the Tokyo Stock Exchange. The latter is associated with investors who do not want to hold too many assets, in which case one has to impose a condition on the maximal number of assets in the portfolio.
3.1 Transaction cost
There are two common types of transaction cost, i.e., piecewise linear concave and piecewise constant, as depicted in Figure 7.1.
Figure 7.1. Transaction cost functions: (a) piecewise linear concave; (b) piecewise constant.
Transaction cost is usually relatively large when the amount of transaction is small, and it increases gradually at a small rate, hence is concave (Figure 7.1(a)). An alternative is a piecewise constant function as depicted in Figure 7.1(b), which is very popular in e-trading systems. It is well known that these types of transaction cost functions can be represented in linear form by introducing 0-1 variables. The number of 0-1 variables is equal to the number of linear pieces (or steps). Therefore, we need to introduce around 8 to 10 times n zero-one variables, so that the problem is out of the scope of state-of-the-art integer programming software when n is over 1,000.
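A minimal sketch (all break points, fees and names hypothetical) of a piecewise constant cost of the kind in Figure 7.1(b); in the 0-1 linearization mentioned above, each step would contribute one binary variable per asset:

```python
# Sketch: piecewise constant transaction cost as in Figure 7.1(b).
# Break points and fees are made-up numbers.

BREAKS = [0.0, 100.0, 500.0, 1000.0]  # trade-amount thresholds (hypothetical)
FEES = [5.0, 8.0, 12.0, 15.0]         # flat fee on each step (hypothetical)

def step_cost(amount):
    """Flat fee for a trade of the given nonnegative amount (0 if no trade)."""
    if amount == 0:
        return 0.0
    fee = FEES[0]
    for b, f in zip(BREAKS[1:], FEES[1:]):
        if amount > b:
            fee = f
    return fee

# One binary variable per step in the 0-1 representation:
n_binary_vars_per_asset = len(FEES)
```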
3.2 Market impact cost
The unit price of an asset will sharply increase when we purchase the asset beyond some bound, which induces an additional transaction cost. A typical cost function subject to market impact is depicted in Figure 7.2; it is a d.c. function (a difference of two convex functions).
Figure 7.2. Market impact.
The mean-absolute deviation model under a concave or d.c. transaction cost c(x),

maximize   r(x) - c(x)
subject to W(x) ≤ w, (7.11)
           x ∈ X,

has been successfully solved by a branch and bound algorithm proposed by Phong et al. (see Phong et al., 1995, for details). Under the finite scenario representation of Section 2, with a separable cost c(x) = Σ_{j=1}^n c_j(x_j), the mean-absolute deviation model under transaction cost (7.11) can be reformulated as a linearly constrained non-concave maximization problem:

maximize   Σ_{j=1}^n { r_j x_j - c_j(x_j) }
subject to Σ_{t=1}^T f_t (φ_t + ψ_t) ≤ w,
           φ_t - ψ_t = Σ_{j=1}^n (r_jt - r_j) x_j, t = 1, 2, ..., T, (7.12)
           φ_t ≥ 0, ψ_t ≥ 0, t = 1, 2, ..., T,
           x ∈ X.
As reported in Konno and Wijayanayake (1999), the problem can be solved in a few seconds on a personal computer when T ≤ 60 and n ≤ 500. In fact, the branch and bound algorithm below can generate an optimal solution much faster than the state-of-the-art integer programming software CPLEX applied to a 0-1 integer programming reformulation of the same problem (Konno and Yamamoto, 2003). Similar algorithms have been applied to a number of portfolio optimization problems under nonconvex transaction cost, including index tracking (Konno and Wijayanayake, 2001b), portfolio rebalancing (Konno and Yamamoto, 2001), and long-short portfolio optimization (Konno et al., 2005). Further, this algorithm has been extended to portfolio optimization under market impact (Konno and Wijayanayake, 2000), where the cost function becomes a d.c. function as depicted in Figure 7.2. Let us note that the MV model under nonconvex transaction cost still remains intractable from the computational point of view, since we would need to handle a large scale 0-1 quadratic programming problem.
3.3 Branch and bound algorithm
We present here the branch and bound algorithm that Konno and Wijayanayake (1999) used for solving the linearly constrained separable concave minimization problem introduced above.
Let F be the set of (x, φ, ψ) ∈ R^{n+2T} satisfying the constraints of the reformulation of problem (7.11), except the lower and upper bound constraints on the x_j's.
Branch and bound algorithm.

1° Set Γ = {(P_0)}, f* = -∞, k = 0, and go to 3°.

2° If Γ = ∅, then go to 9°; otherwise go to 3°.

3° Choose a problem (P_k) ∈ Γ:

   maximize   f(x) = Σ_{j=1}^n { r_j x_j - c_j(x_j) }
   subject to (x, φ, ψ) ∈ F,
              β^k ≤ x ≤ α^k.

4° Let c̄_j^k(x_j) be a linear underestimating function of c_j(x_j) over the interval β_j^k ≤ x_j ≤ α_j^k (j = 1, 2, ..., n), and define the linear programming problem

   (P̄_k)  maximize   ḡ_k(x) = Σ_{j=1}^n { r_j x_j - c̄_j^k(x_j) }
          subject to (x, φ, ψ) ∈ F,
                     β^k ≤ x ≤ α^k.

   If (P̄_k) is infeasible, remove (P_k) from Γ and go to 2°; otherwise let x^k be an optimal solution of (P̄_k) and let f_k = f(x^k).

5° If f_k < f*, then go to 7°; otherwise go to 6°.

6° Set f* = f_k, x̂ = x^k, and eliminate from Γ all subproblems (P_i) for which ḡ_i(x^i) ≤ f*.

7° If ḡ_k(x^k) ≤ f* + ε, then remove (P_k) from Γ and go to 2°; otherwise go to 8°.

8° Let j* be such that c_{j*}(x_{j*}^k) - c̄_{j*}^k(x_{j*}^k) = max{ c_j(x_j^k) - c̄_j^k(x_j^k) | j = 1, 2, ..., n }, and define two subproblems (P_{k+1}) and (P_{k+2}) by subdividing the interval [β_{j*}^k, α_{j*}^k] at the point x_{j*}^k. Set Γ = (Γ \ {(P_k)}) ∪ {(P_{k+1}), (P_{k+2})}, k = k + 1, and go to 3°.

9° Stop: x̂ is an ε-optimal solution of (P_0).

THEOREM 7.1 x̂ converges to an ε-optimal solution of (P_0) as k → ∞.

Proof. See Thach et al. (1996). □
REMARK 7.2 The branching strategy using x_{j*}^k as the subdivision point is called the ω-subdivision strategy. A number of numerical experiments show that this strategy is usually superior to standard bisection, where the midpoint of the interval is chosen as the subdivision point.
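For a concave cost c_j, a natural linear underestimating function over an interval is the chord of c_j, which agrees with c_j at the endpoints and lies below it in between (so that replacing c_j by the chord makes the linearized objective an upper bound). A sketch with a hypothetical cost function:

```python
# Sketch: chord underestimator of a concave cost c over [beta, alpha].
# Since c is concave, the chord lies below c on the interval, with
# equality at the endpoints; so r*x - chord(x) >= r*x - c(x) there.

import math

def chord(c, beta, alpha):
    """Return the linear function agreeing with c at beta and alpha."""
    slope = (c(alpha) - c(beta)) / (alpha - beta)
    return lambda x: c(beta) + slope * (x - beta)

c = math.sqrt                 # a concave cost (hypothetical choice)
beta, alpha = 0.25, 4.0
cbar = chord(c, beta, alpha)

# The chord underestimates c at interior points:
gaps = [c(x) - cbar(x) for x in (0.5, 1.0, 2.0, 3.0)]
```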
3.4 Integer constraints
Associated with portfolio construction is a minimal unit one can purchase, usually 1,000 shares on the Tokyo Stock Exchange. When the amount of the fund is larger, we can ignore this constraint and round the calculated portfolio to the nearest integer multiple of the minimal transaction unit; the resulting portfolio exhibits almost the same risk-return structure. When, however, the amount of the fund is smaller, as in the case of an individual investor, simple rounding may significantly distort the portfolio. It has been reported that we can properly handle these constraints by slightly modifying the branching strategy (Step 8°) of the branch and bound algorithm. Also, state-of-the-art integer programming software can handle these integer constraints if the problem is formulated in the framework of the mean-absolute deviation model (Konno and Yamamoto, 2003).
4. Applications of fractional programming
Fractional programming started in the early 1960s, when Charnes and Cooper (1962) showed that the ratio of two nonnegative affine functions is quasi-convex and thus can be minimized over linear constraints by a variant of the simplex method. Also, Dinkelbach (1967) showed that the ratio of a nonnegative convex function to a positive concave function can be minimized over a convex set by solving a series of convex minimization problems.
The sum of linear fractional functions is no longer quasi-convex, so it cannot be minimized by convex minimization methodologies. Also, the ratio of two convex functions cannot be minimized by Dinkelbach's method. Minimizing the sum of linear ratios and minimizing general fractional functions are therefore subjects of global optimization, an area now under intensive study.
Associated with bonds are several alternative measures of return and risk. Among the popular return measures are average direct yield, terminal yield and maturity, all of which are represented as linear fractional functions of a portfolio x. A typical bond portfolio optimization problem is to maximize one of these linear fractional functions over a linear system of equalities and inequalities, which can be solved by standard methods. Another problem associated with bond portfolios (Konno and Watanabe, 1996) is

maximize   (q_1'x + q_10)/(p_1'x + p_10) - (q_2'y + q_20)/(p_2'y + p_20)
subject to A_1 x + A_2 y ≤ b, (7.13)
           x ≥ 0, y ≥ 0,

where x ∈ R^{n_1} and y ∈ R^{n_2} are, respectively, the amounts of assets to be added to and subtracted from the portfolio. A number of algorithms have been proposed for this problem, among which the parametric simplex algorithm (Konno and Watanabe, 1996) seems to be the most efficient. The first step of this algorithm is to define

w = 1/(p_2'y + p_20), X = wx, Y = wy,

and convert problem (7.13) as follows:

maximize   (q_1'X + q_10 w)/(p_1'X + p_10 w) - (q_2'Y + q_20 w)
subject to A_1 X + A_2 Y - bw ≤ 0,
           p_2'Y + p_20 w = 1, (7.14)
           X ≥ 0, Y ≥ 0, w ≥ 0.

Let (X*, Y*, w*) be an optimal solution of (7.14). Then (x*, y*) = (X*/w*, Y*/w*) is an optimal solution of (7.13).
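The change of variables can be sanity-checked numerically: assuming it takes the form w = 1/(p_2'y + p_20), X = wx, Y = wy, the second ratio becomes the linear expression q_2'Y + q_20 w and the normalization constraint p_2'Y + p_20 w = 1 holds. All vectors and numbers below are made-up illustrations.

```python
# Sketch: sanity check of the Charnes-Cooper-type normalization
# w = 1/(p2'y + p20), X = w*x, Y = w*y. Data are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q2, q20 = [1.0, 2.0], 0.5
p2, p20 = [0.5, 1.5], 1.0
x = [3.0, 1.0]
y = [2.0, 4.0]

w = 1.0 / (dot(p2, y) + p20)
X = [w * xi for xi in x]
Y = [w * yi for yi in y]

ratio = (dot(q2, y) + q20) / (dot(p2, y) + p20)  # second ratio before the change
linearized = dot(q2, Y) + q20 * w                # its image after the change
normalization = dot(p2, Y) + p20 * w             # should equal 1
```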
Problem (7.14) is equivalent to

maximize   (1/ξ)(q_1'X + q_10 w) - (q_2'Y + q_20 w)
subject to A_1 X + A_2 Y - bw ≤ 0,
           p_2'Y + p_20 w = 1,
           p_1'X + p_10 w = ξ, (7.15)
           X ≥ 0, Y ≥ 0, w ≥ 0,
           ξ_min ≤ ξ ≤ ξ_max,

where ξ_max and ξ_min are, respectively, the maximal and minimal values of p_1'X + p_10 w over the feasible region. Let us note that this problem can be solved efficiently by a primal/dual parametric simplex algorithm.
Optimization of a weighted sum of such objectives leads to the maximization of a sum of ratios over a polytope. An efficient branch and bound algorithm using well designed convex underestimating functions can now solve problems with up to 15 fractional terms (Konno, 2001).
Another fractional problem is the maximal predictability portfolio problem proposed by Lo and MacKinlay (1997) and solved by Gotoh and Konno (2001):

maximize   x'Px / x'Qx
subject to x ∈ X, (7.16)
where both P and Q are positive definite and X is a polyhedral set. If P is negative semi-definite and Q is positive definite, then the problem can be solved by Dinkelbach's approach (Dinkelbach, 1967). Let us define the function

g(λ) = max{ x'Px - λ x'Qx | x ∈ X }

for λ > 0, and let x(λ) be a maximizer corresponding to g(λ). Let λ* be such that g(λ*) = 0. Then it is easy to see (Gotoh and Konno, 2001) that x(λ*) is an optimal solution of (7.16) for general P and Q. The problem defining g(λ) is a convex maximization problem when P is positive semi-definite, which can be solved by a branch and bound algorithm when n is small. Also, the zero point of g(λ) can be found by bisection or other search methods. It has been demonstrated in Gotoh and Konno (2001) that the problem can be solved quickly when n is less than 20.
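To make the bisection on g(λ) concrete, here is an illustrative sketch in which X is replaced by a small finite candidate set, so that the inner maximization is a plain max over a list (an assumption made purely for illustration; P, Q and the candidates are made up):

```python
# Sketch: finding the zero of g(lambda) = max_x {x'Px - lambda * x'Qx}
# by bisection. g is decreasing in lambda, and its zero equals the
# maximal ratio x'Px / x'Qx over the candidate set.

def quad(M, x):
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

P = [[2.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 2.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4], [0.5, 0.5]]

def g(lam):
    return max(quad(P, x) - lam * quad(Q, x) for x in candidates)

lo, hi = 0.0, 10.0        # here g(0) > 0 and g(10) < 0
for _ in range(200):      # bisection on the decreasing function g
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
lam_star = 0.5 * (lo + hi)

best_ratio = max(quad(P, x) / quad(Q, x) for x in candidates)
```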
5. Miscellaneous applications
In this section, we discuss additional important applications of global optimization in finance.
5.1 Optimization over an efficient set
Let us consider a class of multiple objective optimization problems (P_j), j = 1, 2, ..., k:

maximize   c_j'x
subject to x ∈ X. (7.17)
A feasible solution x* ∈ X is called efficient when there exists no x ∈ X such that

c_j'x ≥ c_j'x*, j = 1, 2, ..., k, (7.18)

with strict inequality for at least one j. The set X_E of efficient solutions is called an efficient set. Let us consider another objective function f_0(·) and the problem

maximize   f_0(x)
subject to x ∈ X_E. (7.19)
This is a typical global optimization problem since X_E is a nonconvex set. A number of algorithms have been proposed for the case when X is polyhedral and f_0 is convex (Yamamoto, 2002). In particular, when f_0 is linear there exists a finitely convergent algorithm.
Multiple objective optimization problems appear in bond portfolio analysis as explained in Section 4. Fortunately, the number of objectives is usually in the single digits, typically less than 5. It has been shown in Thach et al. (1996) that the problem of finding a portfolio on the efficient frontier such that the piecewise linear transaction cost associated with rebalancing from the current portfolio x^0 is minimal can be solved by a dual reformulation of the original problem. Problems with up to k = 5 objectives and up to 100 variables can be solved within a practical amount of computation time.
Also, a minimal cost rebalancing problem whose objective function is a piecewise linear concave cost c(·), where X_E is the mean-absolute deviation efficient frontier, can be solved by a branch and bound algorithm by noting that X_E consists of a number of linear pieces (Konno and Yamamoto, 2001). The problem can be reduced to a series of linearly constrained concave minimization problems, each of which can be solved by the branch and bound algorithm of Phong et al. (1995).
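The efficiency test (7.18) can be run mechanically on a small finite example; the objective values below are hypothetical:

```python
# Sketch: testing efficiency (7.18) on a small finite feasible set.
# The objective vectors (c_1'x, c_2'x) of each point are made-up numbers.

def dominates(u, v):
    """True if u weakly improves on v in every objective, strictly in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def efficient_points(values):
    return [u for u in values if not any(dominates(v, u) for v in values)]

values = [(4, 1), (3, 3), (1, 4), (2, 2)]  # four feasible points, two objectives
XE = efficient_points(values)
```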
5.2 Long term portfolio optimization by constant rebalance
Constant rebalancing is one of the most popular methods for long term portfolio management, where one sells those assets whose prices have risen and
purchases those assets whose prices have fallen, thus keeping the proportions of the portfolio weights constant. Given the expected returns in each period, the mean-variance model (7.5) over a planning horizon of T periods becomes the optimization of a highly nonconvex polynomial function over a polytope:

maximize   f_λ(x) = Σ_{s=1}^S f_s Π_{t=1}^T ( Σ_{j=1}^n (1 + r_jt^s) x_j )
                    - λ [ Σ_{s=1}^S f_s ( Π_{t=1}^T Σ_{j=1}^n (1 + r_jt^s) x_j )²
                          - ( Σ_{s=1}^S f_s Π_{t=1}^T Σ_{j=1}^n (1 + r_jt^s) x_j )² ]
subject to Σ_{j=1}^n x_j = 1, (7.20)
           0 ≤ x_j ≤ α_j, j = 1, 2, ..., n,

where

f_s is the probability of scenario s,
r_jt^s is the rate of return of asset j during period t under scenario s.

Maranas et al. (1997) applied a branch and bound algorithm similar to the one explained in Section 3.3, using

g_λ(x; γ) = f_λ(x) + γ Σ_{j=1}^n (x_j - β_j^k)(x_j - α_j^k)

as an underestimating function of f_λ(x) over the hyper-rectangle [β^k, α^k]. When γ is large enough, g_λ(x; γ) is a convex function of x. Also,

0 ≤ f_λ(x) - g_λ(x; γ) ≤ γnδ²/4, where δ = max{ α_j^k - β_j^k | j = 1, 2, ..., n }.

Hence g_λ(x; γ) is a good approximation of f_λ(x) when δ is small enough. It is shown in Maranas et al. (1997) that this algorithm can solve problems of size up to (n, T, S) = (9, 20, 100).
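The objective of (7.20), expected terminal wealth minus λ times its variance under constant rebalancing, can be evaluated scenario by scenario; the sketch below uses made-up scenarios and hypothetical function names:

```python
# Sketch: evaluating the constant-rebalance objective of (7.20) from
# scenario returns scenarios[s][t][j] with probabilities f[s].

def terminal_wealth(x, scenario):
    """Product over periods of the portfolio gross return under rebalancing."""
    wealth = 1.0
    for period in scenario:  # period holds the returns r_jt^s for one t
        wealth *= sum((1.0 + r) * xj for r, xj in zip(period, x))
    return wealth

def objective(x, scenarios, f, lam):
    W = [terminal_wealth(x, s) for s in scenarios]
    mean = sum(fs * Ws for fs, Ws in zip(f, W))
    var = sum(fs * Ws * Ws for fs, Ws in zip(f, W)) - mean * mean
    return mean - lam * var

# Two assets, two periods, two equally likely scenarios (hypothetical data)
scenarios = [
    [[0.10, 0.00], [0.10, 0.00]],   # scenario 1
    [[-0.10, 0.00], [-0.10, 0.00]], # scenario 2
]
f = [0.5, 0.5]
val = objective([0.5, 0.5], scenarios, f, lam=1.0)
```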
5.3 Optimization of a long-short portfolio
A long-short portfolio, where one is allowed to sell assets short, is a very popular fund management strategy among hedge funds. The resulting optimization problem appears to be an easy concave maximization problem without sign constraints on the portfolio weights. However, it really is not. First, the fund manager has to pay a deposit in addition to the transaction cost. Also, he is not supposed to leave cash unused, and the cash generated by a short sale is reserved at the third party who lends the asset.
As a result, the investable set of the mean-variance model becomes a non-convex set. Also, the objective function contains a non-convex transaction cost. Therefore the problem becomes the maximization of a non-concave objective function over a non-convex region. This seemingly very difficult problem has been successfully solved (Konno et al., 2005) by extending the branch and bound algorithm of Section 3.3.
Acknowledgements. This research was supported in part by the Grant-in-Aid for Scientific Research of the Ministry of Education, Science, Culture and Sports B(2) 15310122 and 15656025. Also, the author acknowledges the generous support of the Hitachi Corporation.
References

Charnes, A. and Cooper, W.W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9:181-186.

Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13:492-498.

Gotoh, J. and Konno, H. (2001). Maximization of the ratio of two convex quadratic functions over a polytope. Computational Optimization and Applications, 20:43-60.

Horst, R. and Tuy, H. (1996). Global Optimization: Deterministic Approaches, 3rd edition. Springer-Verlag.

Konno, H. (2001). Minimization of the sum of several linear fractional functions. In: N. Hadjisavvas (ed.), Advances in Global Optimization, pp. 3-20. Springer-Verlag.

Konno, H., Koshizuka, T., and Yamamoto, R. (2005). Optimization of a long-short portfolio under nonconvex transaction cost. Forthcoming in Dynamics of Continuous, Discrete and Impulsive Systems.

Konno, H., Thach, P.T., and Tuy, H. (1997). Optimization on Low Rank Nonconvex Structures. Kluwer Academic Publishers.
Konno, H. and Watanabe, H. (1996). Nonconvex bond portfolio optimization problems and their applications to index tracking. Journal of the Operations Research Society of Japan, 39:295-306.

Konno, H. and Wijayanayake, A. (1999). Mean-absolute deviation portfolio optimization model under transaction costs. Journal of the Operations Research Society of Japan, 42:422-435.

Konno, H. and Wijayanayake, A. (2000). Portfolio optimization problems under d.c. transaction costs and minimal transaction unit constraints. Journal of Global Optimization, 22:137-154.

Konno, H. and Wijayanayake, A. (2001a). Optimal rebalancing under concave transaction costs and minimal transaction unit constraints. Mathematical Programming, 89:233-250.

Konno, H. and Wijayanayake, A. (2001b). Minimal cost index tracking under concave transaction costs. International Journal of Theoretical and Applied Finance, 4:939-957.

Konno, H. and Yamamoto, R. (2001). Minimal concave cost rebalance to the efficient frontier. Mathematical Programming, B89:233-250.

Konno, H. and Yamamoto, R. (2003). Global Optimization vs. Integer Programming in Portfolio Optimization Under Nonconvex Transaction Cost. Working paper ISE 03-07, Department of Industrial and Systems Engineering, Chuo University.

Konno, H. and Yamazaki, H. (1991). Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science, 37:519-531.

Lo, A. and MacKinlay, C. (1997). Maximizing predictability in stock and bond markets. Macroeconomic Dynamics, 1:102-134.

Maranas, C., Androulakis, I., Berger, A., Floudas, C.A., and Mulvey, J.M. (1997). Solving stochastic control problems in finance via global optimization. Journal of Economic Dynamics and Control, 21:1405-1425.

Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. John Wiley & Sons.

Mulvey, J.M. (2001). Introduction to financial optimization: Mathematical programming special issue. Mathematical Programming, B89:205-216.
Perold, A. (1984). Large scale portfolio optimization. Management Science, 30:1143-1160.

Phong, T.Q., An, L.T.H., and Tao, P.D. (1995). On globally solving linearly constrained indefinite quadratic minimization problems by decomposition branch and bound method. Operations Research Letters, 17:215-220.

Rinnooy-Kan, A.H. and Timmer, G.T. (1989). Global optimization. In: Nemhauser, G.L. et al. (eds.), Handbooks in Operations Research and Management Science, vol. 1, Chapter 9. Elsevier Science Publishers, B.V.

Rockafellar, R.T. and Uryasev, S. (2001). Optimization of conditional value-at-risk. Journal of Risk, 2:21-41.

Thach, P.T., Konno, H., and Yokota, D. (1996). A dual approach to a minimization on the set of Pareto-optimal solutions. Journal of Optimization Theory and Applications, 88:689-707.

Tuy, H. (1998). Convex Analysis and Global Optimization. Kluwer Academic Publishers, Dordrecht.

Yamamoto, Y. (2002). Optimization over the efficient set: Overview. Journal of Global Optimization, 22:285-317.

Zangwill, W. (1968). Minimum concave cost flows in certain networks. Management Science, 14:429-450.
Chapter 8
OPTIMIZATION TECHNIQUES IN MEDICINE

Panos M. Pardalos, Vladimir L. Boginski, Oleg A. Prokopyev, Wichai Suharitdamrong, Paul R. Carney, Wanpracha Chaovalitwongse, Alkis Vazacopoulos

Abstract
We give a brief overview of a rapidly emerging interdisciplinary research area: optimization techniques in medicine. Applying optimization approaches has proved to be successful in various medical applications. We identify the main research directions and describe several important problems arising in this area, including disease diagnosis, risk prediction, treatment planning, etc.

1. Introduction
In recent years, there has been a dramatic increase in the application of optimization techniques to the study of medical problems and the delivery of health care. This is in large part due to contributions from three fields: the development of more efficient and effective methods for solving large-scale optimization problems (operations research), the increase in computing power (computer science), and the development of more sophisticated treatment methods (medicine). The contributions of the three fields come together since the full potential of the new treatment methods often cannot be realized without the help of quantitative models and ways to solve them.
Applying optimization techniques has proved to be effective in various medical applications, including disease diagnosis, risk prediction, treatment planning, imaging, etc. The success of these approaches is particularly motivated by the technological advances in the development of medical equipment, which have made it possible to obtain large datasets of various origin that can provide useful information in medical applications. Utilizing these datasets for the improvement of medical diagnosis and treatment is a task of crucial importance, and the fundamental problems arising here are to find appropriate models and algorithms to process these datasets, extract useful information from them, and use this information in medical practice.
One of the directions in this research field is associated with applying data mining techniques to medical data. This approach is especially useful in the diagnosis of disease cases utilizing datasets of historical observations of various characteristics of different patients. Standard mathematical programming approaches allow one to formulate the diagnosis problems as optimization models.
In addition to diagnosis, optimization techniques are successfully applied to treatment planning problems, which deal with the development of an optimal strategy for applying a certain therapy to a patient. An important aspect of these problems is the identification and efficient control of various risk factors arising in the treatment process. These risk management problems can be addressed using optimization methods. There are numerous other application areas of optimization techniques in medicine that are widely discussed in the literature (Pardalos and Principe, 2002; Sainfort et al., 2004; Du et al., 1999; Pardalos et al., 2004b; Cho et al., 1993).
This chapter reviews the main directions of optimization research in the medical domain. The remainder of the chapter is organized as follows. In Section 2 we present several examples of applying optimization techniques to diagnosis and prediction in medical applications: diagnosis of breast cancer, risk prediction by logical analysis of data, and human brain dynamics and epileptic seizure prediction.
Section 3 discusses treatment planning procedures using the example of radiotherapy planning. The next two sections give a brief review of optimization problems in medical imaging and health care applications. Finally, Section 6 concludes the discussion.
2. Diagnosis and prediction
Diagnosis and prediction are among the most fundamental problems in medicine, which play a crucial role in the successful treatment process. In this section, we present several illustrative examples of applying optimization techniques to these problems.
8 Optimization Techniques in Medicine

2.1 Disease diagnosis and prediction as data mining applications
In a common setup of the disease diagnosis problem, one possesses a historical dataset of disease cases (corresponding to different patients) represented by several known parameters (e.g., the patient's blood pressure, temperature, size of a tumor, etc.). For all elements (patients) in this dataset, the actual disease diagnosis outcome is known. A natural way to diagnose new patients is to utilize the available dataset with known diagnosis results (the so-called training dataset) for constructing a mathematical model that would classify disease cases with unknown diagnosis outcomes based on the known information. In the data mining framework, this problem is referred to as classification, which is one of the major types of problems in predictive modeling, i.e., predicting a certain attribute of an element in a dataset based on the known information about its other attributes (or features). Due to the availability of a training dataset, these problems are also associated with the term "supervised learning." To give a formal introduction to classification, suppose that we have a dataset of N elements, and each of these elements has a finite number of attributes. Denote the number of attributes by n. Then every element of the given dataset can be represented as a pair (x_i, y_i), i = 1, ..., N, where x_i ∈ R^n is an n-dimensional vector x_i = (x_{i1}, ..., x_{in})^T
and y_i is the class attribute. The value of y_i defines to which class a given element belongs, and this value is known a priori for each element of the initial dataset. It should also be mentioned that in this case y_i takes integer values, and the number of these values (i.e., the number of classes) is predefined. Now suppose that a new element with a known attribute vector x but unknown class attribute y is added to the dataset. As mentioned above, the essence of classification problems is to predict the unknown value of y. This is accomplished by identifying a criterion for placing the element into a certain class based on the information about the known attributes x of this element. The important question arising here is how to create a formal model that would take the available dataset as input and perform the classification procedure. The main idea of the approaches developed in this field is to adjust (or "train") the parameters of the classification model using the existing information about the elements in the available training dataset and
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
then apply this model to classifying new elements. This task can often be reduced to solving an optimization problem (in particular, a linear programming problem) of finding optimal values of the parameters of a classification model. One of the techniques widely used in practice is the geometrical approach. Since all the data elements can be represented as n-dimensional vectors (points in n-dimensional space), these elements can be separated geometrically by constructing surfaces that serve as "borders" between different groups of points. One common approach is to use linear surfaces (planes) for this purpose; however, different types of nonlinear (e.g., quadratic) separating surfaces can be considered in certain applications. It is also important to note that usually it is not possible to find a surface that would "perfectly" separate the points according to the value of some attribute, i.e., points with different values of the given attribute may not necessarily lie on different sides of the surface; in general, however, the number of such errors should be kept small. According to this approach, the classification problem is represented as the problem of finding the geometrical parameters of the separating surface(s). These parameters can be found by solving the optimization problem of minimizing the misclassification error for the elements in the training dataset (the so-called "in-sample error"). After determining these parameters, every new data element will be automatically assigned to a certain class according to its geometrical location in the element space. The procedure of using the existing dataset for classifying new elements is often called "training the classifier": the parameters of the separating surfaces are "tuned" (or "trained") to fit the attributes of the existing elements so as to minimize the number of errors in their classification.
However, a crucial issue in this procedure is not to "overtrain" the model, so that it retains enough flexibility to classify new elements, which is the primary purpose of constructing the classifier. As an illustrative example of applying optimization techniques to the classification of disease cases, we briefly describe one of the first practical applications of mathematical programming in classification problems, developed by Mangasarian et al. (1995). This study deals with the diagnosis of breast cancer cases. The essence of the breast cancer diagnosis system developed in Mangasarian et al. (1995) is as follows. The authors considered a dataset consisting of 569 cases, with a 30-dimensional feature vector corresponding to each patient. Each case could be classified as malignant or benign, and the actual diagnosis was known for all the elements in the dataset. These 569 elements were used for "training" the classifier, which was developed
based on linear programming (LP) techniques. The procedure of constructing this classifier is relatively simple. The vectors corresponding to malignant and benign cases are stored in two matrices: the matrix A (m × n) contains the m malignant vectors (n is the dimension of each vector), and the matrix B (k × n) represents the k benign cases. The goal of the constructed model is to find a plane that separates all the vectors (points in n-dimensional space) in A from the vectors in B. If a plane is defined by the standard equation

x^T w = γ,
where w = (w_1, ..., w_n)^T is an n-dimensional vector of real numbers and γ is a scalar, then this plane will separate all the elements of A from those of B if the following conditions are satisfied:

Aw ≥ eγ + e,   Bw ≤ eγ − e.   (8.1)
Here e = (1, 1, ..., 1)^T is the vector of ones of appropriate dimension (m for the matrix A and k for the matrix B). However, as pointed out above, in practice it is usually impossible to perfectly separate two sets of elements by a plane. So one should try to minimize an average measure of misclassification, i.e., when the constraints (8.1) are violated, the average sum of violations should be as small as possible. The violations of these constraints are modeled by introducing nonnegative vectors of variables u and v:

Aw + u ≥ eγ + e,   Bw − v ≤ eγ − e,   u ≥ 0, v ≥ 0.
Now we are ready to write down the optimization model that minimizes the total average measure of misclassification error:

min_{w, γ, u, v}  (1/m) Σ_{i=1}^{m} u_i + (1/k) Σ_{j=1}^{k} v_j

subject to
Aw + u ≥ eγ + e,
Bw − v ≤ eγ − e,
u ≥ 0,  v ≥ 0.

As one can see, this is a linear programming problem, and the decision variables here are the geometrical parameters of the separating plane, w and γ, as well as the variables representing the misclassification errors, u and v. Although in many cases these problems may involve high
dimensionality of the data, they can be efficiently solved by available LP solvers, for instance Xpress-MP or CPLEX. The misclassification error minimized here is usually referred to as the in-sample error, since it is measured on the training dataset. Note that if the in-sample error is unacceptably high, the classifying procedure can be repeated for each of the subsets of elements in the halfspaces generated by the separating plane. As a result of such a procedure, several planes dividing the element space into subspaces will be created, which is illustrated by Figure 8.1. Every new element is then classified according to its location in a certain subspace. If we consider the case of only one separating plane, then after solving the above problem, each new cancer case is automatically classified as either malignant or benign as follows: if the vector x corresponding to this case satisfies the condition x^T w > γ, it is considered malignant; otherwise, it is assumed to be benign. It is important to mention that although the approach described here is rather simple, its idea can be generalized to the case of multiple classes and multiple nonlinear separating surfaces. Another issue associated with the technique considered in this section is so-called overtraining of the classifier, which can happen if the training sample is too large. In this case, the model can adjust to the training dataset too much and lose the flexibility needed to classify unknown elements, which increases the generalization (or "out-of-sample") error. In Mangasarian et al. (1995), the authors indicate that even one separating plane can be an overtrained classifier if the number of attributes in each vector is too large. They point out that the best out-of-sample results were achieved when only three attributes of each vector were taken into account and one separating plane was used.
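To make the formulation concrete, here is a small sketch of this LP in Python with SciPy's `linprog`, trained on synthetic two-dimensional data standing in for the malignant (A) and benign (B) feature matrices; the data, dimensions, and class labels are illustrative assumptions, not the original study's setup.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
# Hypothetical 2-D data playing the role of the matrices A and B.
A = rng.normal(loc=2.0, scale=0.5, size=(30, 2))    # one class (e.g., "malignant")
B = rng.normal(loc=-2.0, scale=0.5, size=(40, 2))   # the other class ("benign")
m, n = A.shape
k = B.shape[0]

# Decision vector z = (w_1..w_n, gamma, u_1..u_m, v_1..v_k);
# objective is the average violation (1/m) sum u_i + (1/k) sum v_j.
c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])

# A w + u >= e*gamma + e   rewritten as   -A w + e*gamma - u <= -e
top = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
# B w - v <= e*gamma - e   rewritten as    B w - e*gamma - v <= -e
bot = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
A_ub = np.vstack([top, bot])
b_ub = -np.ones(m + k)

bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)  # w, gamma free; u, v >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, gamma = res.x[:n], res.x[n]

def classify(x):
    """Assign a new case by its side of the plane: x^T w > gamma -> class A."""
    return "A" if x @ w > gamma else "B"
```

Since the synthetic clusters are well separated, the optimal average violation is zero and the plane satisfies the separation conditions exactly on the training data.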
These arguments lead to the following concepts closely related to classification: feature selection and support vector machines (SVMs). A review of these and other optimization approaches in data mining is given in Bradley et al. (1999). The main idea of feature selection is to choose a minimal number of attributes (i.e., components of the vector x corresponding to a data element) to be used in the construction of the separating surfaces (Bradley et al., 1998). This procedure is often important in practice, since it may produce a better classification in the sense of the out-of-sample error. The essence of support vector machines is to construct separating surfaces that minimize an upper bound on the out-of-sample error. In the case of one linear surface (plane) separating the elements of two classes, this approach chooses the plane that maximizes the sum of the distances between the plane and the closest elements of each class,
Figure 8.1. An example of binary classification using linear separating surfaces
i.e., the "gap" between the elements from different classes (Burges, 1998; Vapnik, 1995). An application of support vector machines to breast cancer diagnosis is discussed in Lee et al. (2000).
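The margin-maximization idea can be illustrated in one dimension, where the maximum-margin "plane" reduces to the midpoint between the closest elements of two separable classes; this toy function is a sketch of the principle, not an SVM implementation, and its inputs are assumed separable.

```python
def max_margin_threshold(neg, pos):
    """1-D illustration of the SVM idea: among all thresholds separating two
    separable sets of numbers, pick the one that maximizes the gap to the
    closest element of each class (those closest elements play the role of
    the support vectors)."""
    hi_neg, lo_pos = max(neg), min(pos)
    if hi_neg >= lo_pos:
        raise ValueError("classes are not separable by a threshold")
    threshold = (hi_neg + lo_pos) / 2.0   # midpoint of the gap maximizes the margin
    margin = (lo_pos - hi_neg) / 2.0      # distance to the nearest point on each side
    return threshold, margin
```

For example, separating {0, 1, 2} from {6, 8} gives the threshold 4 with margin 2, determined only by the boundary points 2 and 6.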
2.2 Risk prediction by logical analysis
Risk stratification is very common in medical practice. It is defined as the ability to predict undesirable outcomes by assessing patients using the available data: age, gender, health history, specific measurements like EEG, ECG, heart rate, etc. (Califf et al., 1996). The usefulness of any risk-stratification scheme arises from how it links the data to a specific outcome. Risk-stratification systems are usually based on standard statistical models (Hosmer and Lemeshow, 1989). Recently, a new methodology for risk prediction in medical applications using Logical Analysis of Data (LAD) was proposed (Alexe et al., 2003). The LAD technique was first introduced in Hammer (1986). This methodology is based on combinatorial optimization and Boolean logic. It has been successfully applied to knowledge discovery and pattern recognition not only in medicine, but also in oil exploration, seismology, finance, etc. (Boros et al., 2000). Next, we briefly describe the main idea of LAD. More detailed information about this approach can be found in Boros et al. (1997), Ekin et al. (2000), and Alexe et al. (2003). Let Ω ⊂ R^n be a set of observations. By Ω+ and Ω− we denote the subsets of positive and negative observations, respectively. We also need to define the notion of a pattern P:

P = {x ∈ R^n : x_i ≥ α_i, i ∈ I;  x_j ≤ β_j, j ∈ J},   (8.7)
where α_i (i ∈ I) and β_j (j ∈ J) are real numbers (so-called cutpoints), and I and J are sets of indices. A pattern P is positive (negative) if P ∩ Ω+ ≠ ∅ (P ∩ Ω− ≠ ∅) and P ∩ Ω− = ∅ (P ∩ Ω+ = ∅). Obviously, in the general case of real-life applications, the number of detected patterns can be extremely large. Patterns are characterized by three parameters: degree, prevalence, and risk. The degree of a pattern is the number of inequalities identifying the pattern in (8.7). The total number of observations in the pattern P is called its absolute prevalence. The relative prevalence of a pattern P is defined as the ratio of its absolute prevalence to |Ω|. The risk ρ_P of a pattern P is the proportion of positive observations in the pattern:

ρ_P = |P ∩ Ω+| / |P ∩ Ω|.
By introducing thresholds on degree, prevalence, and risk, we can identify the high-risk and low-risk patterns in the given dataset. The set Σ = Σ+ ∪ Σ− is called a pandect, where Σ+ (Σ−) is the set of high-risk (low-risk) patterns. An application of LAD to coronary risk prediction is presented in Alexe et al. (2003), where the problem of constructing a methodology for distinguishing groups of patients at high and at low mortality risk is addressed. The size of the pandect Σ in the considered problem was about 4700 low- and high-risk patterns, which is obviously too large for practical applications. To overcome this difficulty, a nonredundant system of low- and high-risk patterns T = T+ ∪ T− satisfying certain properties was obtained.
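As a sketch of how a pattern's degree, prevalence, and risk could be computed, the following Python fragment represents a pattern by its lower cutpoints α_i and upper cutpoints β_j; the dictionary-based data structures are assumptions made for illustration, not the representation used in the LAD literature.

```python
def pattern_membership(x, lower, upper):
    """A point satisfies the pattern if it meets every interval condition.
    `lower` maps attribute index -> cutpoint alpha_i (requires x[i] >= alpha_i);
    `upper` maps attribute index -> cutpoint beta_j  (requires x[j] <= beta_j)."""
    return all(x[i] >= a for i, a in lower.items()) and \
           all(x[j] <= b for j, b in upper.items())

def pattern_stats(pattern, positives, negatives):
    """Degree, relative prevalence, and risk of a pattern over the data:
    degree = number of inequalities, prevalence = fraction of all
    observations covered, risk = fraction of covered observations that
    are positive."""
    lower, upper = pattern
    in_pos = [x for x in positives if pattern_membership(x, lower, upper)]
    in_neg = [x for x in negatives if pattern_membership(x, lower, upper)]
    covered = len(in_pos) + len(in_neg)
    degree = len(lower) + len(upper)
    prevalence = covered / (len(positives) + len(negatives))
    risk = len(in_pos) / covered if covered else 0.0
    return degree, prevalence, risk
```

For instance, the degree-1 pattern "x_0 ≥ 2" over two positive and two negative observations below covers only the positives, so its risk is 1.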
Using the system T defined above, the following classification tool, referred to as the prognostic index π(x), is defined:

π(x) = τ+(x)/τ+ − τ−(x)/τ−,

where τ+ (τ−) is the number of high-risk (low-risk) patterns in T, and τ+(x) (τ−(x)) is the number of high-risk (low-risk) patterns that are
satisfied by an observation x. Using π(x), an observation x is classified as low- or high-risk depending on the sign of π(x). The number of patients classified into the high- and low-risk groups was more than 97% of the size of the studied population. The proposed technique was shown to outperform standard methods used by cardiologists (Alexe et al., 2003).
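A minimal sketch of the prognostic index, assuming it is the difference between the fractions of high-risk and low-risk patterns satisfied by the observation; the pattern representation and the `satisfies` predicate here are hypothetical placeholders.

```python
def prognostic_index(x, high_risk_patterns, low_risk_patterns, satisfies):
    """pi(x) = (fraction of high-risk patterns x satisfies)
             - (fraction of low-risk patterns x satisfies);
    a positive sign classifies x as high-risk, a negative sign as low-risk."""
    tau_plus = sum(satisfies(x, p) for p in high_risk_patterns) / len(high_risk_patterns)
    tau_minus = sum(satisfies(x, p) for p in low_risk_patterns) / len(low_risk_patterns)
    return tau_plus - tau_minus
```

With patterns represented as simple predicates, an observation satisfying every high-risk pattern and no low-risk pattern receives the maximal index of 1.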
2.3 Brain dynamics and epileptic seizure prediction
The human brain is one of the most complex systems ever studied by scientists. The enormous number of neurons and the dynamic nature of the connections between them make the analysis of brain function especially challenging. Probably the most important direction in studying the brain is treating disorders of the central nervous system. For instance, epilepsy is a common form of such disorders, affecting approximately 1% of the human population. Essentially, epileptic seizures represent excessive and hypersynchronous activity of the neurons in the cerebral cortex. During the last several years, significant progress in the field of epileptic seizure prediction has been made. The advances are associated with the extensive use of electroencephalograms (EEG), which can be treated as a quantitative representation of brain functioning. Motivated by the fact that the complexity and variability of the epileptic seizure process in the human brain cannot be captured by traditional methods used to process physiological signals, in the late 1980s Iasemidis and coworkers pioneered the use of the theory of nonlinear dynamics to link neuroscience with an obscure branch of mathematics and try to understand the collective dynamics of billions of interconnected neurons in the brain (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001). In those studies, measures of the spatiotemporal dynamical properties of the EEG were shown to demonstrate patterns that correspond to specific clinical states (a diagram of electrode locations is provided in Figure 8.2). Since the brain is a nonstationary system, algorithms used to estimate measures of brain dynamics should be capable of automatically identifying and appropriately weighing existing transients in the data. In a chaotic system, orbits originating from similar initial conditions (nearby points in the state space) diverge exponentially (the expansion process).
The rate of divergence is an important aspect of the system dynamics and is reflected in the value of the Lyapunov exponents. The method developed for the estimation of the short-term maximum Lyapunov exponent (STLmax), an estimate of Lmax for nonstationary data, is explained in Iasemidis et al. (2000). Having estimated the STLmax temporal profiles
Figure 8.2. (A) Inferior transverse and (B) lateral views of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings. Subdural electrode strips are placed over the left orbitofrontal (AL), right orbitofrontal (AR), left subtemporal (BL), and right subtemporal (BR) cortex. Depth electrodes are placed in the left temporal depth (CL) and right temporal depth (CR) to record hippocampal activity.
at individual cortical sites, and as the brain proceeds towards the ictal state, the temporal evolution of the stability of each cortical site can be quantified. However, the system under consideration (the brain) has a spatial extent and, as such, information about the transition of the system towards the ictal state should also be included in the interactions of its spatial components. The spatial dynamics of this transition are captured by considering the relations of the STLmax between different cortical sites. For example, if a similar transition occurs at different cortical sites, the STLmax of the involved sites are expected to converge to similar values prior to the transition. Such participating sites are called "critical sites," and such a convergence "dynamical entrainment." More specifically, in order for the dynamical entrainment to have a statistical content, the T-index (from the well-known paired T-statistic for comparison of means) can be used as a measure of distance between the mean values of pairs of STLmax profiles over time. The T-index at time t between electrode sites i and j is defined as

T_{i,j}(t) = √N · |E{STLmax,i − STLmax,j}| / σ_{i,j}(t),

where E{·} is the sample average of the differences STLmax,i − STLmax,j estimated within a moving window w_t(λ) defined as

w_t(λ) = 1 if λ ∈ [t − N − 1, t],   w_t(λ) = 0 if λ ∉ [t − N − 1, t],
where N is the length of the moving window. Then σ_{i,j}(t) is the sample standard deviation of the STLmax differences between electrode sites i and j within the moving window w_t(λ). The T-index thus defined follows a t-distribution with N − 1 degrees of freedom. Therefore, a two-sided t-test with N − 1 degrees of freedom at a statistical significance level α should be used to test the null hypothesis H0: "brain sites i and j acquire identical STLmax values at time t." Not surprisingly, the interictal (before), ictal (during), and immediate postictal (after the seizure) states differ with respect to the spatiotemporal dynamical properties of intracranial EEG recordings. However, the most remarkable finding was the discovery of characteristic spatiotemporal patterns among critical electrode sites during the hour preceding seizures (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001; Sackellares et al., 2002; Pardalos et al., 2003b,a,c). Such critical electrode sites can be selected by applying quadratic optimization techniques, and the electrode selection problem can be formulated as a quadratically constrained quadratic 0-1 problem (Pardalos et al., 2004a):

min x^T A x   (8.9)
where the following definitions are used: A is an n × n matrix, each element a_{i,j} of which represents the T-index between electrodes i and j within a 10-minute window before the onset of a seizure; B is an n × n matrix, each element b_{i,j} of which represents the T-index between electrodes i and j within a 10-minute window after the onset of a seizure; k denotes the number of selected critical electrode sites; T_c is the critical value of the T-index for rejecting H0; and x = (x_1, ..., x_n) ∈ {0, 1}^n, where each x_i represents the cortical electrode site i. If the cortical site i is selected to be one of the critical electrode sites, then x_i = 1; otherwise, x_i = 0. The use of a quadratic constraint ensures that the selected electrode sites show dynamical resetting of the brain following seizures (Shiau et al., 2000; Pardalos et al., 2002a), that is, divergence of the STLmax profiles after seizures. The seizure prediction algorithm based on nonlinear dynamics and multi-quadratic 0-1 programming is described in more detail in Pardalos et al. (2004a). Other groups reported evidence in support of the existence of the preictal transition, detectable through quantitative analysis of the EEG, in Elger and Lehnertz (1998), Lehnertz and Elger (1998), Martinerie et al. (1998), Quyen et al. (1999), and Litt et al. (2001). The use of an algebraic-geometric approach to the study of dynamic processes in the brain is presented in Pardalos et al. (2003d). Quantum models are discussed in Pardalos et al. (2002b) and Jibu and Yasue (1995).
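For very small instances, the selection problem can be illustrated by brute force, assuming (as described in the text) that exactly k sites are selected and that a quadratic constraint on B enforces post-seizure divergence; the particular threshold form below is an assumption made for illustration, not the formulation of Pardalos et al. (2004a).

```python
import itertools
import numpy as np

def select_critical_sites(A, B, k, divergence_threshold):
    """Brute-force solver for a tiny quadratically constrained quadratic
    0-1 selection problem: minimize x^T A x over 0-1 vectors with exactly
    k ones, subject to x^T B x >= divergence_threshold.  Enumeration is
    only viable for small n; realistic instances need MIP/QP solvers."""
    n = A.shape[0]
    best_value, best_sites = None, None
    for sites in itertools.combinations(range(n), k):
        x = np.zeros(n)
        x[list(sites)] = 1.0
        if x @ B @ x >= divergence_threshold:   # post-seizure divergence constraint
            value = x @ A @ x                   # pre-seizure entrainment objective
            if best_value is None or value < best_value:
                best_value, best_sites = value, sites
    return best_sites, best_value
```

On a toy 4-site example where the pre-seizure T-indices between sites 0 and 1 are much smaller than all others, the procedure selects exactly that entrained pair.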
3. Treatment planning
In this section we discuss an application of optimization techniques in treatment planning. Probably the most developed and popular domain in medicine where optimization techniques are used for treatment planning is radiation therapy. Radiation therapy is a method of treating cancer with high-energy radiation that destroys the ability of cancerous cells to reproduce. There are two types of radiation therapy. The first is external beam radiation, with high-energy rays aimed at the cancerous tissues. A multileaf collimator shapes the beam by blocking out some of its parts. To shape the beam precisely, multileaf collimators consist of a small array
of metal leaves for each beam. Thus, each beam is specified by a set of evenly spaced strips (pencils), and the treatment plan is defined by a collection of beams together with the amount of radiation to be delivered along each pencil within each beam. The other radiation therapy method is called brachytherapy. In this type of treatment, radioactive sources (seeds) are placed in or near the tumors. Both types of therapy need to be planned so as to localize the radiation area with a minimum of destroyed healthy tissue. For external beam radiation therapy, radiation planning involves the specification of the beams: their direction, intensity, and shape. It is a difficult problem because one needs to optimize the dose to the tumor (cancerous area) and minimize the damage to healthy organs simultaneously. To reduce the difficulty of the treatment planning procedure, optimization techniques have been applied, and numerous optimization algorithms have been developed for treatment planning in radiation therapy. As one possible approach one can consider multi-objective optimization techniques (Lahanas et al., 2003a,b). Linear (Lodwick et al., 1998), mixed-integer (Lee and Zaider, 2003; Lee et al., 2003b), and nonlinear programming (Billups and Kennedy, 2003; Ferris et al., 2003) techniques are extensively used in therapy planning. The initial step in any radiotherapy planning is to obtain a set of tomography images of the patient's body around the tumor. The images are then discretized into sets of pixels: critical (the set of pixels with healthy tissue sensitive to radiotherapy), body (the set of pixels with healthy tissue not very sensitive to radiotherapy), and tumor (the set of pixels with cancer cells). The formulations of the treatment planning models (linear, quadratic, etc.) depend on the specified clinical constraints: dose homogeneity, target coverage, dose limits for different anatomical structures, etc.
As an example of a problem arising in this area, consider the work presented in Billups and Kennedy (2003), where the following formulation based on Lodwick et al. (1998) was discussed:

minimize    max dose to critical structures
subject to  required tumor dose ≤ tumor dose ≤ max tumor dose,
            normal tissue dose ≤ dose bound for normal tissue,
            dose ≥ 0.
Billups and Kennedy (2003) formulate the problem as follows:

min_{y, x}  y

s.t.  y − Σ_{p∈P} Σ_{b∈B} D(c, p, b) x(p, b) ≥ 0,   c ∈ critical,
      T_l ≤ Σ_{p∈P} Σ_{b∈B} D(τ, p, b) x(p, b) ≤ T_u,   τ ∈ tumor,
      x(p, b) ≥ 0,
where x(p, b) is the amount of radiation to be delivered along the p-th pencil of the b-th beam, D(i, p, b) is the fraction of x(p, b) that will be delivered to pixel i, T_l and T_u are the lower and upper bounds on the amount of radiation delivered to a tumor element, respectively, and y is a dummy variable. If we denote by S the feasible region, the problem above can be written as min_{y, x ∈ S} y.
In order to reduce the number of beams used, Billups and Kennedy (2003) penalized the objective function by a fixed penalty P for each beam used, so that the objective becomes y plus P times the number of beams in use.
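A toy instance of the min-max critical-dose model can be solved with SciPy's `linprog`; the dose-deposition matrices and tumor-dose bounds below are made-up numbers for illustration, not clinical data, and the beam/pencil structure is flattened into a single intensity vector.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical dose-deposition data: 2 pencils, 2 critical pixels, 2 tumor pixels.
D_crit = np.array([[0.1, 0.2],
                   [0.2, 0.1]])   # dose to each critical pixel per unit intensity
D_tum = np.array([[1.0, 0.0],
                  [0.0, 1.0]])    # dose to each tumor pixel per unit intensity
T_l, T_u = 10.0, 14.0             # required and maximum tumor dose

# Variables z = (x_1, x_2, y): minimize y, an upper bound on every critical dose.
n = D_crit.shape[1]
c = np.concatenate([np.zeros(n), [1.0]])
A_ub = np.vstack([
    np.hstack([D_crit, -np.ones((D_crit.shape[0], 1))]),   # D_crit x - y <= 0
    np.hstack([D_tum, np.zeros((D_tum.shape[0], 1))]),     # D_tum x <= T_u
    np.hstack([-D_tum, np.zeros((D_tum.shape[0], 1))]),    # -D_tum x <= -T_l
])
b_ub = np.concatenate([np.zeros(D_crit.shape[0]),
                       np.full(D_tum.shape[0], T_u),
                       np.full(D_tum.shape[0], -T_l)])
bounds = [(0, None)] * n + [(None, None)]                  # x >= 0, y free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
intensities, max_critical_dose = res.x[:n], res.fun
```

In this tiny example the optimum delivers exactly the required tumor dose with both pencil intensities at 10, since any extra intensity only raises the critical-pixel dose.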
In order to reduce the radiation exposure of healthy tissue, brachytherapy was developed as an alternative to external beam radiation. Nevertheless, the correct placement of seeds in tumors is a complicated problem. Lee and Zaider (2003) developed treatment planning for prostate cancer cases using a mixed-integer programming (MIP) optimization model. This algorithm uses 0-1 variables to indicate the placement or non-placement of seeds in a three-dimensional grid generated from an ultrasound image. Since each seed radiates a certain amount of dose, the radiation dose at a point P can be modeled from the locations of the seeds implanted in the tumor. Using this idea, the authors formulated the contribution of the seeds at point P as

Σ_j D(‖X_j − P‖) x_j,
where D(r) is the dose contribution function, X_j is the vector corresponding to grid point j, and x_j is the 0-1 seed placement variable at j. The constraints of the MIP model can be stated using lower and upper bounds on the dose at each point P:

L_P ≤ Σ_j D(‖X_j − P‖) x_j ≤ U_P,
where U_P and L_P are the upper and lower bounds on the radiation dose at point P, respectively. A review of optimization methods in radiation therapy is presented in Shepard et al. (1999). Some promising directions for future research in radiation therapy are also discussed in Lee et al. (2003a). For descriptions of some other treatment planning problems (besides radiation therapy), the reader is referred to Sainfort et al. (2004).
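A brute-force sketch of the 0-1 seed placement idea: choose the fewest seeds that keep every dose point within its bounds. The candidate sites, dose points, and dose-contribution function are hypothetical, and realistic grid sizes require the mixed-integer programming approach described above.

```python
import itertools
import numpy as np

def plan_seeds(candidate_sites, dose_points, dose_fn, lower, upper):
    """Choose the fewest seeds (0-1 placement over candidate grid sites) so
    that every dose point P receives sum_j D(||X_j - P||) x_j within
    [lower, upper].  Exhaustive search over subsets; illustrative only."""
    n = len(candidate_sites)
    # contrib[p][j]: dose contribution of a seed at site j to dose point p
    contrib = [[dose_fn(np.linalg.norm(np.asarray(X) - np.asarray(P)))
                for X in candidate_sites] for P in dose_points]
    for size in range(n + 1):                       # fewest seeds first
        for chosen in itertools.combinations(range(n), size):
            doses = [sum(contrib[p][j] for j in chosen)
                     for p in range(len(dose_points))]
            if all(lower <= d <= upper for d in doses):
                return chosen
    return None                                     # no feasible placement
```

With three candidate sites on a line, two dose points, and a simple 1/(1+r) falloff, the search returns the smallest feasible pair of seeds.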
4. Medical imaging
Recent advances in imaging technologies, combined with marked improvements in instrumentation and the development of computer systems, have resulted in increasingly large amounts of information. New therapies impose greater requirements on the quality and accuracy of image information. Therefore, medical imaging plays an ever-increasing role in diagnosis, prediction, planning, and decision-making. Many problems in this field are addressed using optimization and mathematical programming techniques. In particular, specialized mathematical programming techniques have been used in a variety of domains including object recognition, modeling and retrieval, image segmentation, registration, skeletonization, reconstruction, classification, etc. (Cho et al., 1993; Kuba et al., 1999; Udupa, 1999; Rangarajan et al., 2003; Du et al., 1999). Some recent publications on imaging using optimization address the following problems: reconstruction methods in electron paramagnetic resonance imaging (Johnson et al., 2003), skeletonization of vascular images in magnetic resonance angiography (Nystrom and Smedby, 2001), image reconstruction using multi-objective optimization (Li et al., 2000), etc. Discrete tomography extensively utilizes discrete mathematics and optimization theory. A nice overview of medical applications of discrete tomography is given in Kuba et al. (1999).
5. Health care applications
Optimization and operations research methods are extensively used in a variety of problems in health care, including economic analysis (optimal pricing, demand forecasting and planning), health care unit operations (scheduling and logistics planning, inventory management, supply chain management, quality management, facility location), etc.
Scheduling and logistics problems are among the most important and classical problems in optimization theory. One widely known application of these problems in medicine is the so-called nurse scheduling problem. Nurse scheduling is a non-trivial and important task, because it affects the efficiency and quality of health care (Giglio, 1991). The schedule has to determine the daily assignments of each nurse for a specified period of time while respecting certain constraints on hospital policy, personal preferences and qualifications, workload, etc. Because of the practical importance of these problems, many algorithms and methods for solving them have been proposed (Miller et al., 1976; Warner, 1976; Isken and Hancock, 1991; Siferd and Benton, 1992; Weil et al., 1995). There are basically two types of nurse scheduling: cyclical and non-cyclical. In cyclical scheduling, an individual nurse works in a pattern repeated in a cycle of N weeks. Non-cyclical scheduling, on the other hand, generates a new schedule for each scheduling period from the available resources and policies, attempting to satisfy a given set of constraints. Recently, the problem of rerostering nurse schedules was addressed in Moz and Pato (2003). This problem is common in hospitals where daily work is divided into shifts. It occurs in the case of a non-scheduled absence of one of the nurses, which violates one of the constraints for the given time shift. In Moz and Pato (2003), an integer multicommodity flow model was applied to the aforementioned problem and the corresponding integer linear programming problem was formulated. Computational results were reported for real instances from a Lisbon state hospital. In general, optimization techniques, and linear programming in particular, are very powerful tools that can be used for many diverse problems in health care applications.
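A toy version of the non-cyclical nurse scheduling task described above can be written as an exhaustive search; the constraints here (one shift per nurse per day, a total workload cap, fixed staffing per slot) are simplified assumptions, and practical rosters are produced by integer programming as in the cited studies.

```python
import itertools

def schedule_nurses(nurses, days, shifts, required, max_shifts):
    """Staff every (day, shift) slot with `required` nurses so that no
    nurse works two shifts on the same day or more than `max_shifts`
    shifts in total.  Exhaustive search; illustrative only."""
    slots = [(d, s) for d in days for s in shifts]
    crews = list(itertools.combinations(nurses, required))
    for assignment in itertools.product(crews, repeat=len(slots)):
        # workload cap: total shifts per nurse
        load = {n: 0 for n in nurses}
        for crew in assignment:
            for n in crew:
                load[n] += 1
        if any(v > max_shifts for v in load.values()):
            continue
        # a nurse may appear at most once among the shifts of each day
        if all(len({n for (dd, _), crew in zip(slots, assignment) if dd == d
                    for n in crew})
               == sum(len(crew) for (dd, _), crew in zip(slots, assignment)
                      if dd == d)
               for d in days):
            return dict(zip(slots, assignment))
    return None    # no feasible roster under these constraints
```

For one day with a day and a night shift, three nurses, and a one-shift cap, the first feasible roster assigns two distinct nurses to the two shifts.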
For example, we can refer to Sewell and Jacobson (2003), where the problem of pricing combination vaccines for childhood immunization was addressed using an integer programming formulation. Other important problems in health care applications (inventory and queueing management, workforce and workload models, pricing, forecasting, etc.) are reviewed in Sainfort et al. (2004).
6. Concluding remarks
In this chapter, we have identified and briefly summarized some of the promising research directions in the exciting interdisciplinary area of optimization in medicine. Although this review is certainly not exhaustive, we have described several important practical problems arising in various medical applications, as well as the methods and algorithms used
for solving these problems. As we have seen, applying optimization techniques in medicine can often significantly improve the quality of medical treatment. It is also important to note that this research area is constantly growing, since new techniques are needed to process and analyze the huge amounts of data arising in medical applications. Addressing these issues may require a higher level of interdisciplinary effort in order to develop efficient optimization models combining mathematical theory and medical practice.
Acknowledgements. This work was partially supported by a grant from the McKnight Brain Institute of the University of Florida and NIH.
References

Alexe, S., Blackstone, E., Hammer, P., Ishwaran, H., Lauer, M., and Snader, C.P. (2003). Coronary risk prediction by logical analysis of data. Annals of Operations Research, 119:15-42.
Billups, S. and Kennedy, J. (2003). Minimum-support solutions for radiotherapy. Annals of Operations Research, 119:229-245.
Boros, E., Hammer, P., Ibaraki, T., and Kogan, A. (1997). Logical analysis of numerical data. Mathematical Programming, 79:163-190.
Boros, E., Hammer, P., Ibaraki, T., Kogan, A., Mayoraz, E., and Muchnik, I. (2000). An implementation of logical analysis of data. IEEE Transactions on Knowledge and Data Engineering, 12:292-306.
Bradley, P., Fayyad, U., and Mangasarian, O. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11(3):217-238.
Bradley, P., Mangasarian, O., and Street, W. (1998). Feature selection via mathematical programming. INFORMS Journal on Computing, 10:209-217.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
Califf, R., Armstrong, P., Carver, J., D'Agostino, R., and Strauss, W. (1996). Stratification of patients into high, medium and low risk subgroups for purposes of risk factor management. Journal of the American College of Cardiology, 27(5):1007-1019.
Cho, Z., Jones, J., and Singh, M. (1993). Foundations of Medical Imaging. Wiley.
228
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Optimization Techniques in Medicine
Chapter 9
GLOBAL OPTIMIZATION IN GEOMETRY - CIRCLE PACKING INTO THE SQUARE

Péter Gábor Szabó, Mihály Csaba Markót, Tibor Csendes

Abstract: The present review paper summarizes the research work done, mostly by the authors, in recent years on packing equal circles in the unit square.

1. Introduction
The problem of finding the densest packing of n equal objects in a bounded space is a classical one which arises in many scientific and engineering fields. For the two-dimensional case, it is a well-known problem of discrete geometry. The Hungarian mathematician Farkas Bolyai (1775-1856) published in his principal work ('Tentamen', 1832-33; Bolyai, 1904) a dense regular packing of equal circles in an equilateral triangle (see Figure 9.1). He defined an infinite packing series and investigated the limit of vacuitas (the gap in the triangle outside the circles). It is interesting that these packings are not always optimal, in spite of the fact that they are based on hexagonal grid packings (Szabó, 2000a). Bolyai was probably the first author in the mathematical literature who studied the density of a series of circle packings in a bounded shape. Of course, the work of Bolyai was not the very first on packing circles. There were other interesting early packings in fine arts, relics of religions, and in nature (Tarnai, 1997), too. The old Japanese sangaku problems (Fukagawa and Pedoe, 1989; Szabó, 2001) contain many nice results related to the packing of circles. Figure 9.2 shows an example of packing 6 equal circles in a rectangle. The problem of finding the densest packing of n equal and non-overlapping circles has been studied for several shapes of the bounding
Figure 9.1. The example of Bolyai for packing 19 equal circles in an equilateral triangle.
Figure 9.2. Packing of 6 equal circles in a rectangle on a rock from Japan.
region, e.g., in a rectangle (Ruda, 1969), in a triangle (Graham and Lubachevsky, 1995) and in a circle (Graham et al., 1998). Our work focuses only on the 'Packing of Equal Circles in a Square' problem. The Hungarian mathematicians Dezső Lázár and László Fejes Tóth had already investigated the problem before 1940 (Staar, 1990; Szabó and Csendes, 2001). The problem first appeared in the literature in 1960, when Leo Moser (1960) guessed the optimal arrangement of 8 circles. Schaer and Meir (1965) proved this conjecture and Schaer (1965) solved the n = 9 case, too. Schaer also gave a proof for n = 7 in a letter to Leo Moser in 1964, but he never published it. There is a similar unpublished result by R. Graham in a private letter for n = 6. Later Schwartz (1970) and Melissen (1994a) gave proofs for this case (up to n = 5 circles the problem is trivial). The next challenge was the n = 10 case. de Groot et al. (1990) solved it after many authors had published new and improved packings: Goldberg (1970); Milano (1987); Mollard and Payan (1990); Schaer (1971); Schlüter (1979) and Valette (1989). Some unpublished results are known in this case as well: Grünbaum (1990); Grannell (1990); Petris and Hungerbühler (1990). The proof is based on a computer-aided method, and nobody has published a proof using only pure mathematical tools. There is an interesting mathematical approach to this case in Hujter (1999). Peikert et al. (1992) found and proved optimal packings up to n = 20 using a computer-aided method. Based on theoretical tools only, G. Wengerodt solved the problem for n = 14, 16 and 25 (Wengerodt, 1983, 1987a,b), and with K. Kirchner for n = 36 (Kirchner and Wengerodt, 1987). In the last decades, several deterministic (Locatelli and Raber, 2002; Markót, 2003a; Markót and Csendes, 2004; Nurmela and Östergård, 1999a; Peikert et al., 1992) and stochastic (Boll et al., 2000; Casado et al., 2001; Graham and Lubachevsky, 1996) methods were published.
Proven optimal packings are known up to n = 30 (Nurmela and Östergård, 1999a; Peikert et al., 1992; Markót, 2003a; Markót and Csendes, 2004) and for n = 36 (Kirchner and Wengerodt, 1987). Approximate packings (packings determined by computer-aided numerical computations without a rigorous proof) and candidate packings (best known arrangements with a proof of existence but without proof of optimality) were reported in the literature for up to n = 200: Boll et al. (2000); Casado et al. (2001); Graham and Lubachevsky (1996); Nurmela and Östergård (1997); Szabó and Specht (2005). At the same time, some other results (e.g., repeated patterns, properties of the optimal solutions and bounds, minimal polynomials of packings) were published as well (Graham and Lubachevsky, 1996; Locatelli and Raber, 2002; Nurmela
et al., 1999; Tarnai and Gáspár, 1995-96; Szabó, 2000b; Szabó et al., 2001; Szabó, 2004).
2. The packing circles in a square problem

The packing circles in a square problem can be described by the following equivalent problem settings:
PROBLEM 1 Find the value of the maximum circle radius, r_n, such that n equal non-overlapping circles can be placed in a unit square.

PROBLEM 2 Locate n points in a unit square, such that the minimum distance m_n between any two points is maximal.
PROBLEM 3 Give the smallest square of side p_n which contains n equal and non-overlapping circles, where the radius of the circles is 1.
PROBLEM 4 Determine the smallest square of side a_n that contains n points with mutual distance of at least 1.
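The four settings are equivalent up to scaling: once the optimal separation m_n of Problem 2 is known, the other three optimal values follow from it by elementary geometry. The sketch below records these standard conversions; the function names are ours, introduced only for illustration.

```python
import math

def radius_from_min_distance(m: float) -> float:
    """Problem 2 -> Problem 1: circle centres must lie in [r, 1-r]^2 and be
    at least 2r apart; rescaling that inner square to the unit square gives
    m = 2r/(1 - 2r), hence r = m / (2(1 + m))."""
    return m / (2.0 * (1.0 + m))

def side_for_unit_circles(m: float) -> float:
    """Problem 2 -> Problem 3: blowing circles of radius r_n up to radius 1
    scales the unit square to side p_n = 1/r_n."""
    return 1.0 / radius_from_min_distance(m)

def side_for_unit_distance(m: float) -> float:
    """Problem 2 -> Problem 4: scaling the minimum distance m_n up to 1
    scales the unit square to side a_n = 1/m_n."""
    return 1.0 / m

# For n = 2 the two points sit in opposite corners, so m_2 = sqrt(2)
# and r_2 = sqrt(2)/(2 + 2*sqrt(2)) = 1 - sqrt(2)/2.
r2 = radius_from_min_distance(math.sqrt(2.0))
```

The same conversions connect the numerical results quoted later: an enclosure of m_n immediately yields an enclosure of r_n, p_n and a_n.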
2.1 Optimization models

The problem is on the one hand a geometrical problem and on the other hand a continuous global optimization problem. Problem 2 can be written shortly as a (2n+1)-dimensional continuous nonlinear constrained (or max-min) global optimization problem in the following form:

maximize m_n
subject to m_n <= sqrt((x_i - x_j)^2 + (y_i - y_j)^2), 1 <= i < j <= n,
0 <= x_i, y_i <= 1, i = 1, ..., n,

where the 2n+1 variables are the point coordinates x_i, y_i and the distance bound m_n.
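As a concrete, non-rigorous illustration of this max-min model, the following sketch evaluates the objective for a candidate configuration and improves it by naive random restarts. This is only meant to show the shape of the search problem; the serious solvers discussed in this survey use local optimization or interval branch-and-bound instead, and all names here are ours.

```python
import itertools
import math
import random

def min_pairwise_distance(xs, ys):
    """The max-min objective: the smallest distance between any two of the
    n points (the quantity m_n that Problem 2 maximizes)."""
    return min(math.dist(p, q)
               for p, q in itertools.combinations(list(zip(xs, ys)), 2))

def random_search(n, trials=2000, seed=0):
    """Keep the best of `trials` random configurations - far too weak to
    find optima, but it illustrates the (2n+1)-dimensional search."""
    rng = random.Random(seed)
    best_val, best = -1.0, None
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        v = min_pairwise_distance(xs, ys)
        if v > best_val:
            best_val, best = v, (xs, ys)
    return best_val, best
```

For n = 4 the optimum is attained at the four corners of the square, with m_4 = 1.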
This problem can be considered in the following ways:
a) as a DC programming problem (Horst and Thoai, 1999). A DC (difference of convex functions) programming problem is a mathematical programming problem where the objective function can be described by a difference of two convex functions. The objective function of the problem can be stated as the difference of the following two convex functions g and h:
where
b) as an all-quadratic optimization problem. The general form of an all-quadratic optimization problem (Raber, 1999) is

min x^T Q^0 x + (d^0)^T x
subject to x^T Q^l x + (d^l)^T x + c^l <= 0, l = 1, ..., p.
n > 27, n ≠ 36 was the highly increasing number of initial tile combinations. For n = 28, a sequential process on those combinations would have required about 1000 times more processor time (about several decades) even with non-interval computations, compared to the case of n = 27. The idea behind the newly proposed method is that we can utilize the local relations (patterns) between the tiles and eliminate groups of tile combinations together. Let us denote a generalized point packing problem instance by P(n, X_1, ..., X_n, Y_1, ..., Y_n), where n is the number of points to be located, (X_i, Y_i) ∈ I², i = 1, ..., n are the components of the starting box, and the objective function of the problem is given by (9.3). The theorem below shows how to apply a result achieved on a 2m-dimensional packing problem to a 2n-dimensional problem with n ≥ m ≥ 2.
THEOREM 9.7 (Markót and Csendes, 2004) Assume that n ≥ m ≥ 2 are integers and let

P_m = P(m, Z_1, ..., Z_m, W_1, ..., W_m) = P(m, (Z, W))

and

P_n = P(n, X_1, ..., X_n, Y_1, ..., Y_n) = P(n, (X, Y))

be point packing problem instances (X_i, Y_i, Z_i, W_i ∈ I; X_i, Y_i, Z_i, W_i ⊆ [0, 1]). Run the B&B algorithm on P_m using an f~ cutoff value in the accelerating devices but skipping the step of improving f~. Stop after an arbitrary preset number of iteration steps. Let (Z'_1, ..., Z'_m, W'_1, ..., W'_m) := (Z', W') be the enclosure of all the elements placed on the WorkList and on the ResultList. Assume that there exists an invertible, distance-preserving geometric transformation φ satisfying φ(Z_i) = X_i and φ(W_i) = Y_i, i = 1, ..., m. Then for each point packing (x, y) with (x, y) ∈ (X, Y) and f_n(x, y) ≥ f~, the statement

(x, y) ∈ (φ(Z'_1), ..., φ(Z'_m), X_{m+1}, ..., X_n, φ(W'_1), ..., φ(W'_m), Y_{m+1}, ..., Y_n) := (X', Y')

also holds.
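The accelerating devices mentioned in the theorem rest on interval bounds of the objective over a box. The sketch below is our own simplified illustration of the cut-off test, with outward rounding omitted and the squared minimum pairwise distance standing in for the objective (9.3); a rigorous implementation would use directed rounding throughout.

```python
import itertools

def sq_diff_upper(I, J):
    """Upper bound of (u - v)^2 for u in I and v in J, with intervals given
    as (lo, hi) pairs: the difference lies in [I.lo - J.hi, I.hi - J.lo]."""
    lo, hi = I[0] - J[1], I[1] - J[0]
    return max(abs(lo), abs(hi)) ** 2

def objective_upper_bound(X, Y):
    """Upper bound of the squared minimum pairwise distance over the box
    (X, Y): the minimum over point pairs of the per-pair upper bounds."""
    n = len(X)
    return min(sq_diff_upper(X[i], X[j]) + sq_diff_upper(Y[i], Y[j])
               for i, j in itertools.combinations(range(n), 2))

def can_eliminate(X, Y, f_cutoff):
    """Cut-off test: if no configuration inside the box can reach the
    cutoff value f~, the whole box is discarded."""
    return objective_upper_bound(X, Y) < f_cutoff
```

If two components overlap heavily, no point configuration inside the box can keep the points far apart, so the whole box can be eliminated at once.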
Figure 9.7. The idea behind processing tile combinations.
The meaning of Theorem 9.7 is the following: assume that we are able to reduce some search regions on a tile set S'. When processing a higher-dimensional subproblem on a tile set S containing the image of the tile set of the smaller problem, it is enough to consider the image of the remaining regions of S' as the particular components of the latter problem. Figure 9.7 illustrates the application of this idea to handling sets of tile combinations: the remaining regions of the tile combinations S and S' are given by the shaded areas. The transformation φ is a reflection across the horizontal centerline of the rectangular region enclosing S'.
COROLLARY 9.1 (Markót and Csendes, 2004) Let φ be the identity transformation and assume that the B&B algorithm terminates with an empty WorkList and with an empty ResultList, i.e., the whole search region (Z, W) = (Z_1, ..., Z_m, W_1, ..., W_m) = (X_1, ..., X_m, Y_1, ..., Y_m) is eliminated by the accelerating devices using (the same) f~. Then (X, Y) does not contain any (x, y) ∈ R^{2n} vectors for which f_n(x, y) ≥ f~ holds.
6.8 Tile algorithms used in the optimality proofs

The method of the optimality proofs starts by finding feasible tile patterns and their remaining areas on some small subsets of the whole set of tiles. Then bigger and bigger subsets are processed while using the results of the previous steps. Thus, the whole method consists of several phases. The two basic procedures are:
Grow(): add tiles from a new column to each element of a set of tile combinations.
Join(): join the elements of two sets of tile combinations pairwise.

The detailed description of Join() and Grow() and the strategy of increasing the dimensionality of the subproblems can be found in Markót and Csendes (2004).
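In outline, the two procedures can be pictured on sets of tile combinations as follows. This is a deliberately simplified sketch: tile combinations appear as plain tuples and `feasible` is a placeholder for the interval-based elimination tests of the actual proofs in Markót and Csendes (2004).

```python
from itertools import product

def grow(combos, new_column_tiles, feasible):
    """Grow(): extend each combination with each tile of the next column,
    keeping only the combinations that pass the feasibility test."""
    return [c + (t,) for c, t in product(combos, new_column_tiles)
            if feasible(c + (t,))]

def join(left, right, feasible):
    """Join(): concatenate the elements of two sets of tile combinations
    pairwise, again filtering out infeasible results."""
    return [a + b for a, b in product(left, right) if feasible(a + b)]
```

Eliminating a combination early prunes every extension it would have produced, which is what keeps the number of combinations manageable as the subsets grow.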
6.9 Numerical results: optimal packings for n = 28, 29, 30

The results obtained with the multiphase interval arithmetic based optimality proofs are summarized below. Apart from symmetric cases, one initial tile combination (more precisely, the remaining areas of the particular combination) contains all the global optimal solutions of the packing problem of n points. The guaranteed enclosures of the global maximum values of Problem 2 are
F*_28 = [0.2305354936426673, 0.2305354936426743], w(F*_28) ≈ 7 · 10^-15,
F*_29 = [0.2268829007442089, 0.2268829007442240], w(F*_29) ≈ 2 · 10^-14,
F*_30 = [0.2245029645310881, 0.2245029645310903], w(F*_30) ≈ 2 · 10^-15.

The exact global maximum value differs from the currently best known function value by at most w(F*_n). Apart from symmetric cases, all the global optimizers of the problem of packing n points are located in an (X, Y) box (see Markót and Csendes, 2004). The components of the result boxes have widths of approximately 10^-12 (with the exception of the components enclosing possibly free points). The differences between the volume of the whole search space and the result boxes are more than 711, 764, and 872 orders of magnitude, respectively. The total computational time was approximately 53, 50, and 20 hours, respectively. The total time complexities are remarkably less than the forecasted execution times of the predecessor methods.
6.10 Optimality of the conjectured best structures
An optimal packing structure specifies which points are located on the sides of the square, which pairs have minimal distance, and which points of the packing can move while keeping optimality. The output of our methods serves only as a numerical approximation to the solution of the particular problems, but it says nothing about the structure of the optimal packing(s). Extending the ideas given in Nurmela and Östergård
(1999a) to an interval-based context, in a forthcoming paper we intend to prove also some structural properties of the global optimizers (for details see Markót, 2003b).
Acknowledgments The authors are grateful for all the help given by colleagues for the underlying research. This work was supported by the Grants OTKA T 016413, T 017241, OTKA T 034350, FKFP 0739/97, and by the Grants OMFB D-30/2000, OMFB E-24/2001.
References

Alefeld, G. and Herzberger, J. (1983). Introduction to Interval Computations. Academic Press, New York.
Althöfer, I. and Koschnick, K.U. (1991). On the convergence of threshold accepting. Applied Mathematics and Optimization, 24:183-195.
Ament, P. and Blind, G. (2000). Packing equal circles in a square. Studia Scientiarum Mathematicarum Hungarica, 36:313-316.
Boll, D.W., Donovan, J., Graham, R.L., and Lubachevsky, B.D. (2000). Improving dense packings of equal disks in a square. Electronic Journal of Combinatorics, 7:R46.
Bolyai, F. (1904). Tentamen Juventutem Studiosam in Elementa Matheseos Purae, Elementaris Ac Sublimioris, Methodo Intuitiva, Evidentiaque Huic Propria, Introducendi, Volume 2, Second edition, pp. 119-122.
Casado, L.G., Garcia, I., and Sergeyev, Ya.D. (2000). Interval branch and bound algorithm for finding the first-zero-crossing-point in one-dimensional functions. Reliable Computing, 6:179-191.
Casado, L.G., Garcia, I., Szabó, P.G., and Csendes, T. (2001). Packing equal circles in a square. II. New results for up to 100 circles using the TAMSASS-PECS stochastic algorithm. In: Optimization Theory: Recent Developments from Mátraháza, pp. 207-224. Kluwer, Dordrecht.
Croft, H.T., Falconer, K.J., and Guy, R.K. (1991). Unsolved Problems in Geometry, pp. 108-110. Springer, New York.
Csallner, A.E., Csendes, T., and Markót, M.Cs. (2000). Multisection in interval methods for global optimization. I. Theoretical results. Journal of Global Optimization, 16:371-392.
Csendes, T. (1988). Nonlinear parameter estimation by global optimization - Efficiency and reliability. Acta Cybernetica, 8:361-370.
Csendes, T. and Ratz, D. (1997). Subdivision direction selection in interval methods for global optimization. SIAM Journal on Numerical Analysis, 34:922-938.
Du, D.Z. and Pardalos, P.M. (1995). Minimax and Applications. Kluwer, Dordrecht.
Dueck, G. and Scheuer, T. (1990). Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing. Journal of Computational Physics, 90:161-175.
Fejes Tóth, G. (1997). Handbook of Discrete and Computational Geometry. CRC Press, Boca Raton.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Springer-Verlag, Berlin.
Fodor, F. (1999). The densest packing of 19 congruent circles in a circle. Geometriae Dedicata, 74:139-145.
Folkman, J.H. and Graham, R.L. (1969). A packing inequality for compact convex subsets of the plane. Canadian Mathematical Bulletin, 12:745-752.
Fukagawa, H. and Pedoe, D. (1989). Japanese Temple Geometry Problems. San Gaku. Charles Babbage Research Centre, Winnipeg.
Goldberg, M. (1970). The packing of equal circles in a square. Mathematics Magazine, 43:24-30.
Goldberg, M. (1971). Packing of 14, 16, 17 and 20 circles in a circle. Mathematics Magazine, 44:134-139.
Graham, R.L. and Lubachevsky, B.D. (1995). Dense packings of equal disks in an equilateral triangle from 22 to 34 and beyond. Electronic Journal of Combinatorics, 2:A1.
Graham, R.L. and Lubachevsky, B.D. (1996). Repeated patterns of dense packings of equal circles in a square. Electronic Journal of Combinatorics, 3:R17.
Graham, R.L., Lubachevsky, B.D., Nurmela, K.J., and Östergård, P.R.J. (1998). Dense packings of congruent circles in a circle. Discrete Mathematics, 181:139-154.
Grannell, M. (1990). An Even Better Packing of Ten Equal Circles in a Square. Manuscript.
de Groot, C., Monagan, M., Peikert, R., and Würtz, D. (1992). Packing circles in a square: Review and new results. In: System Modeling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180.
de Groot, C., Peikert, R., and Würtz, D. (1990). The Optimal Packing of Ten Equal Circles in a Square. IPS Research Report No. 90-12, Eidgenössische Technische Hochschule, Zürich.
Grünbaum, B. (1990). An Improved Packing of Ten Circles in a Square. Manuscript.
Hadwiger, H. (1944). Über extremale Punktverteilungen in ebenen Gebieten. Mathematische Zeitschrift, 49:370-373.
Hammer, R., Hocks, M., Kulisch, U., and Ratz, D. (1993). Numerical Toolbox for Verified Computing. I. Springer-Verlag, Berlin.
Hansen, E. (1992). Global Optimization Using Interval Analysis. Marcel Dekker, New York.
van Hentenryck, P., McAllester, D., and Kapur, D. (1997). Solving polynomial systems using a branch and prune approach. SIAM Journal on Numerical Analysis, 34:797-827.
Horst, R. and Thoai, N.V. (1999). D.C. programming: Overview. Journal of Optimization Theory and Applications, 103:1-43.
Hujter, M. (1999). Some numerical problems in discrete geometry. Computers and Mathematics with Applications, 38:175-178.
Karnopp, D.C. (1963). Random search techniques for optimization problems. Automatica, 1:111-121.
Kearfott, R.B. (1996). Test results for an interval branch and bound algorithm for equality-constrained optimization. In: Computational Methods and Applications, pp. 181-200. Kluwer, Dordrecht.
Kirchner, K. and Wengerodt, G. (1987). Die dichteste Packung von 36 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:147-159.
Knüppel, O. (1993a). PROFIL - Programmer's Runtime Optimized Fast Interval Library. Bericht 93.4, Technische Universität Hamburg-Harburg.
Knüppel, O. (1993b). A Multiple Precision Arithmetic for PROFIL. Bericht 93.6, Technische Universität Hamburg-Harburg.
Kravitz, S. (1967). Packing cylinders into cylindrical containers. Mathematics Magazine, 40:65-71.
Locatelli, M. and Raber, U. (1999). A deterministic global optimization approach for solving the problem of packing equal circles in a square. In: International Workshop on Global Optimization (GO.99), Firenze.
Locatelli, M. and Raber, U. (2002). Packing equal circles in a square: A deterministic global optimization approach. Discrete Applied Mathematics, 122:139-166.
Lubachevsky, B.D. (1991). How to simulate billiards and similar systems. Journal of Computational Physics, 94:255-283.
Lubachevsky, B.D. and Graham, R.L. (1997). Curved hexagonal packings of equal disks in a circle. Discrete and Computational Geometry, 18:179-194.
Lubachevsky, B.D., Graham, R.L., and Stillinger, F.H. (1997). Patterns and structures in disk packings. Periodica Mathematica Hungarica, 34:123-142.
Lubachevsky, B.D. and Stillinger, F.H. (1990). Geometric properties of random disk packings. Journal of Statistical Physics, 60:561-583.
Maranas, C.D., Floudas, C.A., and Pardalos, P.M. (1998). New results in the packing of equal circles in a square. Discrete Mathematics, 128:187-193.
Markót, M.Cs. (2000). An interval method to validate optimal solutions of the "packing circles in a unit square" problems. Central European Journal of Operational Research, 8:63-78.
Markót, M.Cs. (2003a). Optimal packing of 28 equal circles in a unit square - The first reliable solution. Numerical Algorithms, 37:253-261.
Markót, M.Cs. (2003b). Reliable Global Optimization Methods for Constrained Problems and Their Application for Solving Circle Packing Problems (in Hungarian). Ph.D. dissertation, Szeged. Available at http://www.inf.u-szeged.hu/~markot/phdmm.ps.gz
Markót, M.Cs. and Csendes, T. (2004). A new verified optimization technique for the "packing circles in a unit square" problems. Forthcoming in SIAM Journal on Optimization.
Markót, M.Cs., Csendes, T., and Csallner, A.E. (2000). Multisection in interval methods for global optimization. II. Numerical tests. Journal of Global Optimization, 16:219-228.
Matyas, J. (1965). Random optimization. Automation and Remote Control, 26:244-251.
McDonnell, J.R. and Waagen, D. (1994). Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks, 5:24-38.
Melissen, J.B.M. (1993). Densest packings for congruent circles in an equilateral triangle. American Mathematical Monthly, 100:916-925.
Melissen, J.B.M. (1994a). Densest packing of six equal circles in a square. Elemente der Mathematik, 49:27-31.
Melissen, J.B.M. (1994b). Densest packing of eleven congruent circles in a circle. Geometriae Dedicata, 50:15-25.
Melissen, J.B.M. (1994c). Optimal packings of eleven equal circles in an equilateral triangle. Acta Mathematica Hungarica, 65:389-393.
Melissen, J.B.M. and Schuur, P.C. (1995). Packing 16, 17 or 18 circles in an equilateral triangle. Discrete Mathematics, 145:333-342.
Milano, R. (1987). Configurations optimales de disques dans un polygone régulier. Mémoire de licence, Université Libre de Bruxelles.
Mollard, M. and Payan, C. (1990). Some progress in the packing of equal circles in a square. Discrete Mathematics, 84:303-307.
Moore, R.E. (1966). Interval Analysis. Prentice-Hall, Englewood Cliffs.
Moser, L. (1960). Problem 24 (corrected). Canadian Mathematical Bulletin, 8:78.
Neumaier, A. (2001). Introduction to Numerical Analysis. Cambridge University Press, Cambridge.
Nurmela, K.J. (1993). Constructing Combinatorial Designs by Local Search. Series A: Research Reports 27, Digital Systems Laboratory, Helsinki University of Technology.
Nurmela, K.J. and Östergård, P.R.J. (1997). Packing up to 50 equal circles in a square. Discrete and Computational Geometry, 18:111-120.
Nurmela, K.J. and Östergård, P.R.J. (1999a). More optimal packings of equal circles in a square. Discrete and Computational Geometry, 22:439-457.
Nurmela, K.J. and Östergård, P.R.J. (1999b). Optimal packings of equal circles in a square. In: Y. Alavi, D.R. Lick, and A. Schwenk (eds.), Combinatorics, Graph Theory, and Algorithms, pp. 671-680.
Nurmela, K.J., Östergård, P.R.J., and aus dem Spring, R. (1999). Asymptotic behaviour of optimal circle packings in a square. Canadian Mathematical Bulletin, 42:380-385.
Oler, N. (1961a). An inequality in the geometry of numbers. Acta Mathematica, 105:19-48.
Oler, N. (1961b). A finite packing problem. Canadian Mathematical Bulletin, 4:153-155.
Peikert, R. (1994). Dichteste Packungen von gleichen Kreisen in einem Quadrat. Elemente der Mathematik, 49:16-26.
Peikert, R., Würtz, D., Monagan, M., and de Groot, C. (1992). Packing circles in a square: A review and new results. In: P. Kall (ed.), System Modelling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180. Springer-Verlag, Berlin.
Petris, J. and Hungerbühler, N. (1990). Manuscript.
Pirl, U. (1969). Der Mindestabstand von n in der Einheitskreisscheibe gelegenen Punkten. Mathematische Nachrichten, 40:111-124.
Raber, U. (1999). Nonconvex All-Quadratic Global Optimization Problems: Solution Methods, Application and Related Topics. Ph.D. thesis, University of Trier.
Rao, S.S. (1978). Optimization Theory and Applications. John Wiley and Sons, New York.
Ratschek, H. and Rokne, J. (1988). New Computer Methods for Global Optimization. Ellis Horwood, Chichester.
Reis, G.E. (1975). Dense packings of equal circles within a circle. Mathematics Magazine, 48:33-37.
Ruda, M. (1969). Packing circles in a rectangle (in Hungarian). Magyar Tudományos Akadémia Matematikai és Fizikai Tudományok Osztályának Közleményei, 19:73-87.
264
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Schaer, J. (1965). The densest packing of nine circles in a square. Canadian Mathematical Bulletin, 8:273-277.
Schaer, J. (1971). On the densest packing of ten equal circles in a square. Mathematics Magazine, 44:139-140.
Schaer, J. and Meir, A. (1965). On a geometric extremum problem. Canadian Mathematical Bulletin, 8:21-27.
Schlüter, K. (1979). Kreispackung in Quadraten. Elemente der Mathematik, 34:12-14.
Schwartz, B.L. (1970). Separating points in a square. Journal of Recreational Mathematics, 3:195-204.
Solis, F.J. and Wets, R.J.-B. (1981). Minimization by random search techniques. Mathematics of Operations Research, 6:19-50.
Specht, E. Packing web site: http://www.packomania.com
Specht, E. and Szabó, P.G. (2004). Lattice and near-lattice packings of equal circles in a square. In preparation.
Staar, Gy. (1990). The Lived Mathematics (in Hungarian). Gondolat, Budapest.
Szabó, P.G. (2000a). Optimal packings of circles in a square (in Hungarian). Polygon, X:48-64.
Szabó, P.G. (2000b). Some new structures for the "equal circles packing in a square" problem. Central European Journal of Operations Research, 8:79-91.
Szabó, P.G. (2001). Sangaku - Wooden boards of mathematics in Japanese temples (in Hungarian). KöMaL, 7:386-388.
Szabó, P.G. (2004). Optimal substructures in optimal and approximate circle packings. Forthcoming in Beiträge zur Algebra und Geometrie.
Szabó, P.G. and Csendes, T. (2001). Dezső Lázár and the densest packing of equal circles in a square problem (in Hungarian). Magyar Tudomány, 8:984-985.
Szabó, P.G., Csendes, T., Casado, L.G., and García, I. (2001). Packing equal circles in a square. I. Problem setting and bounds for optimal solutions. In: Optimization Theory: Recent Developments from Mátraháza, pp. 191-206. Kluwer, Dordrecht.
Szabó, P.G. and Specht, E. (2005). Packing up to 200 equal circles in a square. Submitted for publication.
Tarnai, T. (1997). Packing of equal circles in a circle. In: Structural Morphology: Toward the New Millennium, pp. 217-224. The University of Nottingham, Nottingham.
Tarnai, T. and Gáspár, Zs. (1995-96). Packing of equal circles in a square. Acta Technica Academiae Scientiarum Hungaricae, 107(1-2):123-135.
Valette, G. (1989). A better packing of ten circles in a square. Discrete Mathematics, 76:57-59.
Wengerodt, G. (1983). Die dichteste Packung von 16 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 16:173-190.
Wengerodt, G. (1987a). Die dichteste Packung von 14 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:25-46.
Wengerodt, G. (1987b). Die dichteste Packung von 25 Kreisen in einem Quadrat. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio Mathematica, 30:3-15.
Würtz, D., Monagan, M., and Peikert, R. (1994). The history of packing circles in a square. Maple Technical Newsletter, 0:35-42.
Chapter 10
A DETERMINISTIC GLOBAL OPTIMIZATION ALGORITHM FOR DESIGN PROBLEMS

Frédéric Messine

Abstract
Complete extensions of standard deterministic Branch-and-Bound algorithms based on interval analysis are presented hereafter in order to solve design problems which can be formulated as non-homogeneous mixed-constrained global optimization problems. This involves the consideration of variables of different kinds: real, integer, logical or categorical. In order to solve interesting design problems with a large number of variables, some accelerating procedures must be introduced in these extended algorithms. They are based on constraint propagation techniques and are explained in this chapter. In order to validate the design methodology, rotating machines with permanent magnets are considered. The corresponding analytical model is recalled, and some globally optimal design solutions are presented and discussed.

1. Introduction
Design problems are generally very hard to solve and, furthermore, very difficult to formulate in a rational way. For instance, the design of electro-mechanical actuators is clearly understood as an inverse problem: from some characteristic values given by the designer, find the physical structures, components and dimensions which entirely describe the resulting actuator. This inverse problem is ill-posed in the Hadamard sense because, even if the existence of a solution can be guaranteed, most often there is a large, or even an infinite, number of solutions. Hence, only some solutions can be characterized, and it becomes natural to search for the optimal ones with respect to some criteria defined a priori. As explained in Fitan et al. (2004) and Messine et al. (2001), general inverse problems must consider not only the dimensions but also the structure and the components of a sort of actuator. Thus, an interesting formulation of the design problems of electro-mechanical actuators (or other similar design problems) consists in considering the associated non-homogeneous mixed constrained optimization problem:

    min  f(x, z, b, k),   over x ∈ R^{n_x}, z ∈ Z^{n_z}, b ∈ B^{n_b}, k ∈ ∏_{i=1}^{n_k} K_i,

    subject to                                                        (10.1)
      g_i(x, z, b, k) ≤ 0,  ∀i ∈ {1, ..., n_g},
      h_j(x, z, b, k) = 0,  ∀j ∈ {1, ..., n_h},
where f, g_i and h_j are real functions, K_i represents an enumerated set of categorical variables (for example, a type of material), and B = {0,1} is the logical set used to model different possible structures.
Interval analysis was introduced by Moore (1966) in order to control the numerical errors generated by floating point representations and operations. A real value x is then enclosed by an interval whose lower and upper bounds are the closest floating point numbers under and over x. The operations over intervals are then developed, defining interval arithmetic. Using this tool, reliable enclosures of functions are obtained. In global optimization, and more precisely in Branch-and-Bound techniques, interval analysis is used to compute reliable bounds of the global optimum for univariate or multivariate, non-linear or non-convex homogeneous analytical functions (Hansen, 1992; Kearfott, 1996; Messine, 1997; Moore, 1966; Ratschek and Rokne, 1988). This chapter focuses on design problems, which are generally non-homogeneous and mixed (with real, integer, logical and categorical variables). This requires extensions of the standard interval Branch-and-Bound algorithms. Furthermore, design problems are subject to strong (equality) constraints, so the implicit relations between the variables can be used to reduce a priori the part of the box where the constraints cannot be satisfied. These techniques are named constraint propagation or constraint pruning techniques (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997).
In Section 2, a deterministic (exact) global optimization algorithm is presented. It is an extension of an interval Branch-and-Bound algorithm developed in Ratschek and Rokne (1988) and Messine (1997) to deal with problem (10.1). An important part of this section is dedicated to the presentation of a propagation technique based on the computational tree.
This technique, inserted in interval Branch-and-Bound algorithms, has considerably improved the speed of convergence of such methods. In order to validate this approach and
in order to show the efficiency of such an algorithm, only one type of electro-mechanical actuator is considered: rotating machines with permanent magnets. This choice was determined by my personal commitment in the formulation of the analytical model of such actuators, and also by the fact that they represent difficult global optimization problems. Other related work on the design of piezo-electric actuators can be found in Messine et al. (2001). In Section 3, the analytical model of rotating machines with permanent magnets is presented in detail. The physical assumptions behind these analytical relations are not discussed here; see Fitan et al. (2003, 2004), Messine et al. (1998), Kone et al. (1993), Nogarede (2001), and Nogarede et al. (1995) for a thorough survey of this subject. Numerical optimal solutions for some machines are subsequently discussed.
2. Exact and rigorous global optimization algorithm
These kinds of algorithms, named Branch-and-Bound, work in two phases: the computation of bounds of a given function over a box, and the decomposition of the initial domain into smaller boxes. Thus, the initial problem is bisected into smaller ones and, for each sub-problem, one tries to prove that the global optimum cannot occur in it by comparing its bounds with the best solution found so far. Hence, only sub-problems which may contain the global optimum are kept and stored (a list is generated). For constrained problems, it is also possible to show, by computing bounds, that a constraint can never be satisfied over a given box; the corresponding sub-problems are discarded. Furthermore, the constraints reveal implicit relations between the variables, and some techniques, named constraint propagation, constraint pruning or constraint deduction, have been developed to reduce the domain where the problem is studied. These techniques are based on the calculus trees of the constraints (Messine, 1997, 2004; Van Hentenryck et al., 1997) or on linearizations of the constraint functions using a first-order Taylor expansion (Hansen, 1992).
The exact method developed for solving problem (10.1) is an extension of interval Branch-and-Bound algorithms (Hansen, 1992; Kearfott, 1996; Messine, 1997; Ratschek and Rokne, 1988). All these algorithms are based on interval analysis (Moore, 1966), which is the tool for computing the bounds of a continuous function over a box, i.e. an interval vector. Generally, these algorithms work with homogeneous real variables according to an exclusion principle: when a constraint cannot be satisfied
in a considered box, or when it is proved that the global optimum cannot occur in the box. In our case, it is necessary to extend an interval Branch-and-Bound algorithm to deal with non-homogeneous and mixed variables (real, integer, logical and categorical) arising from different physical quantities. Furthermore, in order to improve the convergence of this kind of algorithm, the introduction of some iterations of propagation techniques became unavoidable (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997). In the corresponding code, all the variables are represented by interval compact sets:
- real variables: one considers the interval compact set in which the global solution is searched;
- integer variables: the integer discrete set is relaxed to the closest continuous interval compact set; {x^L, ..., x^U} becomes [x^L, x^U];
- logical variables: {0,1} is relaxed into [0,1];
- categorical variables: one introduces intermediate univariate real functions, as explained later in this section. The categorical sets are in fact sets of numbers from 1 to the number of categories. Of course, these enumerated sets are not ordered.

Therefore, a distinction must be introduced between continuous and discrete variables. In the following algorithm, f denotes the function to be minimized, L represents the list where the sub-boxes are stored, x̃ and f̃ denote the current solution during the run of the program, ε_f is the desired accuracy for the global optimum value, and ε is a given vector of precisions for the corresponding solution points. The main steps of Algorithm 10.1 are defined and detailed in later subsections.

ALGORITHM 10.1 (INTERVAL BRANCH AND BOUND ALGORITHM)
Begin
1. Let X ∈ R^{n_x} × Z^{n_z} × B^{n_b} × ∏_{i=1}^{n_k} K_i be the initial domain in which the global minimum is sought.
2. Set f̃ := +∞.
3. Set L := {(+∞, X)}.
4. Extract from L the box for which the lowest lower bound has been computed.
5. Bisect the considered box, yielding V_1, V_2.
6. For j := 1 to 2 do
   6.1. Compute v_j := lower bound of f over V_j.
   6.2. Propagate the constraints over V_j (V_j can be reduced).
   6.3. Compute the lower and upper bounds of the interesting constraints over V_j.
   6.4. If f̃ ≥ v_j and no constraint is unsatisfiable then
        6.4.1. insert (v_j, V_j) in L;
        6.4.2. set f̃ := min(f̃, f(m)), where m is the midpoint of V_j, if and only if m satisfies all the constraints;
        6.4.3. if f̃ has changed then remove from L all (z, Z) where z > f̃, and set x̃ := m.
        end if
7. If f̃ < min_{(z,Z)∈L} z + ε_f and the largest box in L is smaller than ε, then STOP; else go to Step 4.
Result: f̃, x̃, L.
End
Because the algorithm stops when the global minimum is sufficiently accurate (within ε_f) and when all the sub-boxes Z are sufficiently small, all the global solutions are given by the minimizers belonging to the union of the sub-boxes remaining in L, and the minimal value is given by the current minimum f̃. In practice, only f̃ and its corresponding solution x̃ are considered.
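As a concrete illustration only (not the author's implementation), the skeleton of such an interval Branch-and-Bound loop for an unconstrained univariate problem can be sketched in Python; the bounding procedure `f_iv`, the test function and all names below are assumptions made for this example:

```python
import heapq

def natural_lb(f_iv, lo, hi):
    """Lower bound of f over [lo, hi] via an interval extension f_iv."""
    return f_iv(lo, hi)[0]

def branch_and_bound(f, f_iv, lo, hi, eps=1e-6):
    """Minimal interval branch-and-bound; returns (best value, best point)."""
    best_x = (lo + hi) / 2
    best_val = f(best_x)                      # incumbent from the midpoint
    heap = [(natural_lb(f_iv, lo, hi), lo, hi)]
    while heap:
        v, a, b = heapq.heappop(heap)
        if v > best_val - eps:                # no remaining box can improve
            break
        m = (a + b) / 2
        if f(m) < best_val:                   # midpoint test updates incumbent
            best_val, best_x = f(m), m
        for c, d in ((a, m), (m, b)):         # bisect and re-bound
            w = natural_lb(f_iv, c, d)
            if w < best_val - eps:            # keep only promising sub-boxes
                heapq.heappush(heap, (w, c, d))
    return best_val, best_x

# Example: f(x) = x^2 - 2x on [-1, 3], with a hand-written interval extension.
def f(x):
    return x * x - 2 * x

def f_iv(lo, hi):
    sq_lo = 0.0 if lo <= 0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    return (sq_lo - 2 * hi, sq_hi - 2 * lo)

val, x = branch_and_bound(f, f_iv, -1.0, 3.0)
```

The heap plays the role of the list L, ordered by lower bound, so Step 4 (extracting the box with the lowest lower bound) is a single pop; constraint handling and the mixed-variable extensions of the chapter are deliberately omitted.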
REMARK 10.1 In order to handle the non-homogeneous case, ε is in fact a real nonnegative vector representing the desired accuracy for the boxes remaining in the list L: ε_i > 0 if component i corresponds to a real variable, and ε_i = 0 for logical, integer and categorical variables.
Algorithm 10.1 relies on four phases: the bisection of a box, the computation of bounds over a box, the exclusion of a box, and propagation techniques to reduce the considered box a priori. These techniques are detailed in the following subsections.
2.1 Bisection rules
This phase is critical because it determines how efficiently the initial problem is decomposed into smaller ones. In our implementation, all the components of a box are represented by real interval vectors. Nevertheless, attention must be paid to whether a component represents a real, integer, logical or categorical variable.
The classical bisection principle, in the continuous homogeneous case, consists in choosing a coordinate direction parallel to which the box has an edge of maximum length; the box is then bisected normal to this direction (Ratschek and Rokne, 1988). For solving problem (10.1), the real variables are generally non-homogeneous (coming from different physical characteristics: the current density and the diameter of a machine, for example). Furthermore, Algorithm 10.1 must deal with discrete variables: logical, integer and categorical. Hence, the accuracy given by the designer is represented in Algorithm 10.1 by a real vector ε with one component per variable: it is the expected precision of the solution at the end of the algorithm. ε_k is fixed to 0 if component k represents a discrete (integer, logical or categorical) variable. Therefore, the bisection rule is modified to take continuous and discrete variables into account, and one uses two different ways to bisect a variable according to its type (continuous or discrete).
Let us denote by w_i^x, w_i^z, w_i^b and w_i^k the given weights for, respectively, the real variables x_i, the integer variables z_i, the logical variables b_i and the categorical variables k_i. First, the following real values are computed for all the variables:
where |.| denotes the cardinality (i.e. the number of elements) of the considered discrete set. The largest real value in this list determines the variable (k) which will be bisected, in the following way:
1. Z1 := Z and Z2 := Z.
2. If ε_k = 0 then (for discrete variables)
       (Z1)_k := [z_k^L, [(z_k^L + z_k^U)/2]_I] and (Z2)_k := [[(z_k^L + z_k^U)/2]_I + 1, z_k^U];
   else Z_k is divided at its midpoint, which directly produces Z1 and Z2.
Here Z_k = [z_k^L, z_k^U], respectively (Z1)_k and (Z2)_k, denotes the kth component of Z, respectively of Z1 and Z2, and [x]_I represents the integer part of the considered real value x.
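A sketch of this bisection rule in Python; the weighted selection score below is a plausible stand-in for the weighted values described above, and all helper names are illustrative:

```python
import math

def bisect(Z, eps, w):
    """Pick the direction with the largest weighted score and split the box there.

    Z   -- list of (lo, hi) components (intervals)
    eps -- per-variable accuracies (0 for discrete variables)
    w   -- per-variable weights
    """
    def score(k):
        lo, hi = Z[k]
        if eps[k] == 0:                      # discrete: cardinality minus one
            return w[k] * (hi - lo)
        return w[k] * (hi - lo) / eps[k]     # continuous: width relative to accuracy

    k = max(range(len(Z)), key=score)
    lo, hi = Z[k]
    Z1, Z2 = list(Z), list(Z)
    if eps[k] == 0:                          # discrete variable: integer split
        m = math.floor((lo + hi) / 2)
        Z1[k], Z2[k] = (lo, m), (m + 1, hi)
    else:                                    # continuous variable: midpoint split
        m = (lo + hi) / 2
        Z1[k], Z2[k] = (lo, m), (m, hi)
    return k, Z1, Z2
```

With a large weight on a discrete component, e.g. `bisect([(0.0, 1.0), (1, 8)], [0.1, 0], [1.0, 100.0])`, the discrete direction is chosen and split into [1, 4] and [5, 8], which matches the integer-part rule above.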
REMARK 10.2 It is more efficient to emphasize the bisection of the discrete variables k_i, because such bisections involve considerable modifications of the considered optimization problem (10.1). In the following numerical examples, and more generally, the weights for the discrete variables are fixed to w_i^z = w_i^b = w_i^k = 100 and, for the real variables, w_i^x =
2.2 Computation of the bounds
The computation of the bounds is the fundamental part of the algorithm, because all the exclusion and propagation techniques depend on them. An inclusion function is an interval function which encloses the range of a function over a box Y. For a given function f, a corresponding inclusion function is denoted by F, such that [min_{y∈Y} f(y), max_{y∈Y} f(y)] ⊆ F(Y); furthermore, Z ⊆ Y implies F(Z) ⊆ F(Y). The given functions must be explicitly detailed to make the construction of an inclusion function possible. Algorithm 10.1 works and converges even if f is not continuous (Kearfott, 1996; Moore, 1966; Ratschek and Rokne, 1988). The number of global minimum points can be unbounded, but f has to be bounded below in order for a global minimum to exist. Lipschitz conditions, differentiability or smoothness properties are not needed; nevertheless, the numerical computation is facilitated and the convergence speed may be improved if these properties are present. The following paragraph recalls the standard interval techniques used to construct inclusion functions (Moore, 1966).
Let I be the set of real compact intervals [a, b], where a, b are real (or floating point) numbers. The arithmetic operations for intervals are defined as follows:
    [a, b] + [c, d] = [a + c, b + d]
    [a, b] - [c, d] = [a - d, b - c]
    [a, b] × [c, d] = [min{a×c, a×d, b×c, b×d}, max{a×c, a×d, b×c, b×d}]        (10.2)
    [a, b] ÷ [c, d] = [a, b] × [1/d, 1/c], if 0 ∉ [c, d]
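The rules (10.2) translate directly into code. Below is a minimal sketch without outward rounding (a rigorous implementation would round lower bounds down and upper bounds up); it is an illustration, not the chapter's actual code:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, o):       # [a,b] + [c,d] = [a+c, b+d]
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):       # [a,b] - [c,d] = [a-d, b-c]
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):       # min/max over the four endpoint products
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def __truediv__(self, o):   # defined only when 0 is outside [c,d]
        if o.lo <= 0 <= o.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1 / o.hi, 1 / o.lo)
```

For instance, with a = Interval(1, 2), a - a gives Interval(-1, 1) rather than [0, 0], which already shows that subtraction is not the inverse of addition, as noted below.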
These operations can be extended to mixed computations between real values and intervals, because a real value is a degenerate interval whose two bounds are equal. When a real value is not representable by a floating point number, an interval can be generated from the two closest floating points enclosing it; for example, for π two floating points must be considered, one just under π and the other just over. Definitions (10.2) show that subtraction and division in I are not the inverse operations of addition and multiplication. Unfortunately, the
interval arithmetic does not conserve all the properties of standard arithmetic; for example, it is sub-distributive: ∀(A, B, C) ∈ I^3, A × (B + C) ⊆ A × B + A × C (Moore, 1966). Division by an interval containing zero is undefined, and an extended interval arithmetic has therefore been developed; refer to Ratschek and Rokne (1988) and Hansen (1992). The natural extension of an expression of f into intervals, which consists in replacing each occurrence of a variable by its corresponding interval (which encloses it) and then applying the above rules of interval arithmetic, is an inclusion function; special procedures for bounding trigonometric and transcendental functions allow the extension of this procedure to a great number of analytical functions. This is a fundamental theorem of interval analysis (Moore, 1966). The bounds so evaluated (by the natural extension of an expression of f) are not always accurate, in the sense that they may become too large and thus inefficient. Hence, several other techniques, based on Taylor expansions, are classically used; refer to Messine (1997), Moore (1966), and Ratschek and Rokne (1988) for a thorough discussion of this subject. For these inclusion functions, the given function must be continuous and at least once differentiable. For our design problems, the natural extension into intervals has generally been sufficient.
Interval arithmetic is well defined only for continuous real functions, so inclusion functions must be extended to deal with discrete variables. For logical and integer variables, one simply relaxes the fact that these variables are discrete: the discrete logical set {0,1} becomes the continuous interval compact set [0,1], and the discrete integer sets {0, ..., n}, {1, ..., n}, or more generally {x^L, x^L + 1, x^L + 2, ..., x^U}, are relaxed to the compact intervals [0, n], [1, n] and [x^L, x^U], respectively.
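The overestimation of the natural extension mentioned above can be observed on a small example (the tuple-based helpers are defined only for this illustration):

```python
def isub(a, b):
    # [a,b] - [c,d] = [a-d, b-c]
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    # min/max over the four endpoint products
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

X = (0.0, 2.0)
# Natural extension of f(x) = x*x - 2*x: each occurrence of x becomes X.
F = isub(imul(X, X), imul((2.0, 2.0), X))
# F encloses, but overestimates, the true range of f on [0, 2], which is [-1, 0]:
# the two occurrences of x are treated as independent (the dependency effect).
```

Here F evaluates to (-4.0, 4.0), a valid but loose enclosure of [-1, 0], which is exactly why tighter schemes such as Taylor-based inclusion functions are of interest.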
Hence, a new inclusion function for mixed logical, integer and real variables can be constructed. Categorical variables cannot be considered directly in the expression of a function, because they represent varieties of an object which induce certain effects. Generally, these effects are given by positive real values; for example, the magnetic polarization value depends on the kind of permanent magnet used. Therefore, each categorical variable (used to represent varieties of objects) must be associated with at least one real univariate function, denoted by a_i.
In this work, only univariate functions are considered because they are sufficient for our practical uses. Furthermore, in our code, all categorical variables σ_k are denoted by an integer number running from 1 to |K_k|. Each of these numbers corresponds to a precise category, which must be defined beforehand. Hence, for computing the bounds of a function f over a box X, Z, B, C depending on the univariate real functions a_i, enclosures of the intervals [min_{σ_j ∈ K_j} a_i(σ_j), max_{σ_j ∈ K_j} a_i(σ_j)] must be computed. Denoting by C_j an enumerated subset of the corresponding categorical set K_j, the following inclusion function for the corresponding real function a_i is defined:

    A_i(C_j) := [v_1, v_1], if C_j = [1, 1],
                [v_{|K_j|}, v_{|K_j|}], if C_j = [|K_j|, |K_j|],
                [min{v_i, i ∈ {1, ..., |K_j|}}, max{v_i, i ∈ {1, ..., |K_j|}}], in the general case,

where C_j = [C_j^L, C_j^U] ⊆ [1, |K_k|]. With this representation, a general inclusion function F(X, Z, B, C) can then be constructed for mixed (discrete and continuous) expressions.
REMARK 10.3 A more efficient inclusion function for the real univariate function a_i over an enumerated subset C_j ⊆ K_j is

    A_i(C_j) := [min{v_i, i ∈ {C_j^L, ..., C_j^U}}, max{v_i, i ∈ {C_j^L, ..., C_j^U}}].

However, this function needs an enumeration of the subset C_j for each computation. Other techniques are possible, and some of them are detailed in Messine et al. (2001).
Hence, lower and upper bounds can be generated. In order to produce logical, integer and categorical solutions with continuous relaxations of the corresponding discrete variables, only the particular bisection rules described above must be used.
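An enumeration-based enclosure in the spirit of Remark 10.3 can be sketched as follows (function and variable names are illustrative assumptions, not the chapter's code):

```python
def categorical_bounds(v, cL, cU):
    """Enclosure of a_i over the enumerated subset {cL, ..., cU} of categories.

    v  -- v[j] is the value a_i takes on category j+1 (categories run 1..|K|)
    cL, cU -- bounds of the enumerated subset C_j = [cL, cU]
    """
    vals = v[cL - 1:cU]          # values of the categories cL, ..., cU
    return min(vals), max(vals)
```

For instance, with four hypothetical magnet types whose polarization values are v = [1.2, 0.4, 1.0, 0.7], the subset [2, 4] yields the enclosure (0.4, 1.0), whereas the coarser general-case enclosure over all categories would give (0.4, 1.2).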
2.3 Exclusion principle
Exclusion techniques are based on proving that the global optimum cannot occur in a box. This leads to two main possibilities for a sub-box denoted by (X, Z, B, C):
1. The already found solution, denoted by f̃, cannot be improved in the considered box: F^L(X, Z, B, C) > f̃, i.e. the lower bound of the given function f over the sub-box (X, Z, B, C) is greater than a solution already found, so no point in the box can improve f̃; see Step 6.4 of Algorithm 10.1.
2. It can be proved that a constraint can never be satisfied in the sub-box: G_k^L(X, Z, B, C) > 0 or 0 ∉ H_k(X, Z, B, C). Equality constraints are hard to satisfy numerically; therefore, a tolerance is introduced for each equality constraint, and one verifies whether H_k(X, Z, B, C) ⊆ [-(ε_e)_k, (ε_e)_k] in place of H_k(X, Z, B, C) = [0, 0].
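Both exclusion tests reduce to simple comparisons on the computed bounds; a schematic check, with the tolerance handling described above for equality constraints (all names are illustrative):

```python
def can_discard(f_bounds, g_bounds, h_bounds, f_best, eps_h):
    """True if the box provably contains no feasible improving point.

    f_bounds -- (lower, upper) bounds of the objective over the box
    g_bounds -- list of (lower, upper) bounds of each g_k (constraint g_k <= 0)
    h_bounds -- list of (lower, upper) bounds of each h_k (h_k = 0, tolerance eps_h)
    f_best   -- value of the best feasible solution found so far
    """
    if f_bounds[0] > f_best:                  # test 1: cannot improve the incumbent
        return True
    if any(g[0] > 0 for g in g_bounds):       # test 2a: some g_k > 0 everywhere
        return True
    # test 2b: some H_k misses the tolerance interval [-eps_h, eps_h] entirely
    return any(h[0] > eps_h or h[1] < -eps_h for h in h_bounds)
```

Note that the function only ever *proves* infeasibility or non-optimality; when it returns False, the box is kept, which is what makes the overall algorithm rigorous.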
In our case, the computation of the bounds is exact and rigorous; thus the associated global optimization algorithm is said to be exact and rigorous, and the global optimum is perfectly enclosed with a given accuracy: x_i^U - x_i^L < ε_i, ∀i ∈ {1, ..., n_x}; z_i^L = z_i^U, ∀i ∈ {1, ..., n_z}; b_i^L = b_i^U, ∀i ∈ {1, ..., n_b}; and k_i^L = k_i^U, ∀i ∈ {1, ..., n_k}.
REMARK 10.4 It may happen that a logical or a categorical variable generates new additional constraints and variables. In this case, particular procedures must be inserted.
2.4 Constraint propagation techniques
Constraint propagation techniques based on interval analysis make it possible to reduce the bounds of an initial hypercube (interval vector) by using the implicit relations between the variables derived from the constraints. In this subsection, the constraints are written in the following general way:

    c(x) ∈ [a, b], with x ∈ X ⊂ R^n,        (10.3)

where c is a real function which represents the studied constraint, [a, b] is a fixed real interval, and X is a real interval compact vector. In order to consider an equality constraint, one fixes a = b. For an inequality constraint, a is fixed to -∞ (numerically, one uses the lowest representable floating point value).
REMARK 10.5 Only the continuous case is considered in this section. However, it is very simple to extend these techniques to deal with integer, logical and real variables (except the categorical case) by relaxing the discrete variables to their corresponding continuous sets, as explained above, and by taking the integer part of the upper bound and the integer part, plus one if it differs from the real value, of the lower bound of the resulting interval.
2.4.1 Classical interval propagation techniques.

The linear case. If the given constraint is linear, c(x) = Σ_{i=1}^n a_i x_i, the propagation is

    X_k := X_k ∩ (([a, b] - Σ_{i≠k} a_i X_i) / a_k),

where k is in {1, ..., n} and X_i is the ith interval component of X.
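A sketch of the classical propagation rule for a linear constraint, X_k := X_k ∩ (([a, b] - Σ_{i≠k} a_i X_i) / a_k), applied to each component in turn (helper names are illustrative):

```python
def propagate_linear(coeffs, X, a, b):
    """Tighten the box X subject to a <= sum_i coeffs[i] * x_i <= b.

    coeffs -- nonzero real coefficients a_i
    X      -- list of (lo, hi) interval components, tightened in place
    """
    for k, ck in enumerate(coeffs):
        # interval sum S of the other terms a_i * X_i
        s_lo = sum(min(c * X[i][0], c * X[i][1])
                   for i, c in enumerate(coeffs) if i != k)
        s_hi = sum(max(c * X[i][0], c * X[i][1])
                   for i, c in enumerate(coeffs) if i != k)
        # ([a, b] - S) / c_k, with endpoint swap for negative c_k
        lo, hi = (a - s_hi) / ck, (b - s_lo) / ck
        if ck < 0:
            lo, hi = hi, lo
        X[k] = (max(X[k][0], lo), min(X[k][1], hi))
    return X
```

For example, propagating x_1 + x_2 = 3 over the box [0, 10] × [0, 2] tightens the first component to [1, 3] while leaving the second unchanged. An empty intersection (lo > hi) would signal an infeasible box; that case is not handled in this sketch.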
The non-linear case: Hansen's method. If the constraint c is non-linear but continuous and at least once differentiable, Hansen (1992) uses a first-order Taylor expansion to produce a linear equation with interval coefficients. A first-order Taylor expansion can be written as follows:
    c(y) = c(x) + ∇c(ξ)^T (y - x),

where (x, y) ∈ X^2 and ξ ∈ X̊ (X̊ represents the interior of the compact hypercube X: each component has the form ]x_i^L, x_i^U[). An enclosure of ∇c(