Moments, Positive Polynomials and Their Applications
Imperial College Press Optimization Series ISSN 2041-1677 Series...
77 downloads
913 Views
2MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Moments, Positive Polynomials and Their Applications
Imperial College Press Optimization Series ISSN 2041-1677 Series Editor: Jean Bernard Lasserre (LAAS-CNRS and Institute of Mathematics, University of Toulouse, France)
Vol. 1: Moments, Positive Polynomials and Their Applications by Jean Bernard Lasserre
Imperial College Press Optimization Series
Vol. 1
Moments, Positive Polynomials and Their Applications Jean Bernard Lasserre LAAS-CNRS and Institute of Mathematics University of Toulouse, France
ICP
Imperial College Press
Published by Imperial College Press 57 Shelton Street Covent Garden London WC2H 9HE Distributed by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
MOMENTS, POSITIVE POLYNOMIALS AND THEIR APPLICATIONS Imperial College Press Optimization Series — Vol. 1 Copyright © 2010 by Imperial College Press All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-1-84816-445-1 ISBN-10 1-84816-445-9
Printed in Singapore.
To my daughter Julia and to Carole
This page intentionally left blank
Preface
Consider the following list of problems: • • • • • • • • • •
Finding the global minimum of a function on a subset of Rn , Solving equations, Computing the Lebesgue volume of a subset S ⊂ Rn , Computing an upper bound on µ(S) over all measures µ satisfying some moment conditions, Pricing exotic options in Mathematical Finance, Computing the optimal value of an optimal control problem, Evaluating an ergodic criterion associated with a Markov chain, Evaluating a class of multivariate integrals, Computing Nash equilibria, With fb the convex envelope of a function f , evaluate fb(x) at some given point x.
The above seemingly different and unrelated problems share in fact a very important property: they all can be viewed as a particular instance of the Generalized Moment Problem (GMP in short)! And of course the above list is not exhaustive! It is known that the GMP has great modeling power with impact in several branches of Mathematics and also with important applications in various fields. However, and as illustrated by the above list, in its full generality the GMP cannot be solved numerically. According to Diaconis (1987), “the theory [of moment problems] is not up to the demands of applications”. One invoked reason is the high complexity of the problem: “numerical determination ... is feasible for a small number of moments, but appears to be quite difficult in general cases”, whereas Kemperman (1987) points out vii
viii
Preface
the lack of a general algorithmic approach. Indeed quoting Kemperman: “...a deep study of algorithms has been rare so far in the theory of moments, except for certain very specific practical applications, for instance, to crystallography, chemistry and tomography. No doubt, there is a considerable need for developing reasonably good numerical procedures for handling the great variety of moment problems which do arise in pure and applied mathematics and in the sciences in general...”. The main purpose of this book is to show that the situation becomes much nicer for the GMP with polynomial data (and sometimes one may even consider GMPs with some piecewise polynomial data or even rational functions). Indeed, results from real algebraic geometry and functional analysis have provided new characterizations of polynomials positive on a basic semi-algebraic set K ⊂ Rn , and dual results on moment sequences that can be represented by finite Borel measures supported on K. This beautiful duality is nicely captured by standard duality in convex analysis, applied to some appropriate convex cones. Moreover, these characterizations are not only very elegant and simple to state, but more importantly, it also turns out that they are amenable to practical computation as they can be checked by semidefinite programming (and sometimes linear programming), two well known powerful techniques of convex optimization. Conjunction of the above theoretical breakthrough with the development of semidefinite programming has allowed to define a numerical scheme based on semidefinite programming to approximate, and sometimes solve exactly, the original GMP. It merely consists of a hierarchy of semidefinite relaxations of the GMP where each semidefinite relaxation is a convex optimization problem for which efficient public softwares are available. As shown in the book, the beautiful duality between moments and positive polynomials is perfectly expressed by standard duality in convex optimization, when applied to these semidefinite relaxations. This book is an attempt to convince the reader that one may now consider solving (or at least approximating as closely as desired) some difficult problems that were thought to be out of reach a few years ago, or for which only heuristics were available. Of course, since such problems remain hard, this methodology has some practical limitations mainly due to the size of the original problem, especially in view of the present status of semidefinite programming solvers (used as a black box subroutine to solve each semidefinite relaxation). However, we also indicate how sparsity or symmetries (when present) can be exploited so as to handle problems of larger size. Hence much remains to be done but we hope that this methodology
Preface
ix
will open the door to a more systematic and efficient treatment of such problems. We introduce this methodology in a unified manner. To do so we first provide a general numerical scheme for solving (or approximately solving) the abstract GMP with polynomial data. Convergence and several other properties are proved in a very general context. Next, we illustrate in detail the above general methodology when applied to several instances of the GMP in various and very different applications in Global Optimization, Algebra, Probability and Markov Chains, Optimal Control, Mathematical Finance, Multivariate Integration, etc..., possibly with ad hoc adjustments if and when necessary. Depending on the problem on hand, additional insights are also provided. The book is divided into two main parts, and here is a brief chapterby-chapter description of its content. Part I is devoted to the theoretical basis that supports the proposed methodology to solve or approximate the abstract GMP. Part II is devoted to illustrate (and sometimes complement) the methodology for specific instances of the GMP in various domains. Part I Chapter 1 describes the abstract basic GMP and its dual with a few examples. We also provide some general results concerning the GMP with polynomial data and its dual, and show why the theory of moments and its dual theory of positive polynomials can be useful to solve the GMP. Chapter 2 reviews basic results of real algebraic geometry on the representation of positive polynomials, among which the fundamental Positivstellensatze of Krivine, Stengle, Schm¨ udgen, Putinar and Jacobi and Prestel. We also provide some additional representation results that take account of several specific cases like for instance, finite varieties, convex semi-algebraic sets, representations that preserve sparsity, etc. Chapter 3 is the dual of Chapter 2 as most results are the dual analogues of those described in Chapter 2. Indeed the problem of representing polynomials that are positive on a set K has a dual facet which is the problem of characterizing sequences of reals that are moment sequences of some finite Borel measure supported on K. Moreover, as we shall see, this beautiful duality is nicely captured by standard duality in convex analysis, applied to some appropriate convex cones. We review basic results in the moment problem and also particularize to some specific important cases like those
x
Preface
in Chapter 2. Chapter 4, now with the appropriate theoretical tools on hand, describes the general methodology to solve the abstract GMP. That is, we provide a hierarchy of semidefinite (or sometimes linear) relaxations of the basic GMP, whose associated sequence of optimal values converges to the optimal value of the GMP. Variants of this methodology are also provided to handle additional features such as countably many moment constraints, several measures, or GMP with sparsity properties. This chapter should allow the reader to see what is the basic idea underlying the methodology that permits to solve (or approximate) a problem formulated as a particular instance of the GMP. Part II Part II of the book is devoted to convince the reader about the power of the moment approach for solving the generalized moment problem with polynomial data. Each of the next chapters illustrates (and sometimes complements) the above methodology on some particular important instances of the GMP. For each chapter we detail the semidefinite relaxations of Chapter 4 in the specific context of the chapter. Moreover, depending on the specificity of the problem on hand, additional insights are also provided. Chapter 5 is about global optimization, probably the simplest instance of the generalized moment problem. We detail the semidefinite relaxations of Chapter 4 for minimizing a polynomial on Rn and on a compact basic semialgebraic set K ⊂ Rn . We also discuss the linear relaxations and several particular cases, e.g. when K is a polytope or a finite variety. In particular, this latter case encompasses all 0 − 1 discrete optimization problems. Chapter 6 is about solving systems of polynomial equations. Of course, if the goal is to search for just one solution, one may minimize any polynomial criterion and see the problem as a particular case of Chapter 5. But we also consider the case where one searches for all complex or all real solutions and show that the moment approach is well-suited to solve this problem and provides the first algorithm to compute all real solutions without computing all the complex solutions, in contrast with the usual algebraic approaches based on Gr¨ obner bases or homotopy. Chapter 7 covers some applications in probability. We first consider the problem of computing an upper bound on µ(S) for some subset S ⊂ Rn , over all measures µ that satisfy certain moment conditions. We then consider
Preface
xi
the difficult problem of computing (or at least approximating) the volume of a compact basic semi-algebraic set. We end up with the mass-transfer (or Monge-Kantorovich) problem. Chapter 8 is about Markov chains and invariant probabilities. We first address the problem of computing an upper bound on µ(S) for some S ⊂ Rn , over all invariant probability measures µ of a given Markov chain on Rn . We then consider the problem of approximating the value of an ergodic criterion, as an alternative to simulation which only provides a random estimate. Chapter 9 considers an important application in mathematical finance, namely the pricing of exotic options under a no-arbitrage assumption, first with only knowledge of some moments of the distribution of the underlying asset price, and then when one assumes that the asset price obeys some Ito stochastic differential equation. Chapter 10 considers an application in control. We apply the moment approach to the so-called weak formulation of optimal control problems in which the initial problem is viewed as an infinite linear-programming model over suitable occupation measures, an instance of the generalized moment problem. Chapter 11 considers the following problem: Given a rational function f on a basic semi-algebraic set K, evaluate fb(x) at a particular point x in the domain of the convex envelope fb of f . We then consider semidefinite representations for the convex hull co(K) of K. That is, finding a set defined by linear matrix inequalities in a lifted space such that co(K) is a suitable projection of that set. Chapter 12 is about approximating the multivariate integral of a rational function or an exponential of a multivariate polynomial. We then consider the moment approach as a tool for evaluating gradients or Hesssians in the maximum entropy approach for estimating an unknown density based on knowledge of some of its moments. Chapter 13 first considers the problem of minimizing the supremum of finitely many rational functions on a basic semi-algebraic set. Then this is used to compute (or approximate) the value of Nash equlibria for N-player finite games. We end up with applying the moment approach to 2-player zero-sum polynomial games. Chapter 14, our final chapter, applies the moment approach to provide bounds on functionals of the solution of linear partial differential equations with boundary conditions and polynomial coefficients.
xii
Preface
• The applications of the GMP described in this book come from various areas, e.g., Optimal Control, Optimization, Mathematical Finance, Probability and Operations Research. Therefore, in the Appendices at the end of the book, we have provided some brief basic background on results from optimization and probability that are used in some of the chapters. • At some places in the book, some theorems, lemmas or propositions are framed into a box to emphasize their importance, at least to the author’s taste. • Sometimes, proofs of theorems, lemmas and propositions are postponed for the sake of clarity of exposition and for the reader to avoid from being lost in technical details in the middle of a chapter. Sometimes, if short and/or important, a proof is provided just after the corresponding theorem, lemma or proposition. Finally, if it is too long, or too technical, or not crucial, a proof is simply omitted but a reference is provided in the Notes and Sources section at the end of the corresponding chapter.
Acknowledgments
This book has benefited from several stimulating discussions with (among others) M. Anjos, R.E Curto, E. de Klerk, L. Elghaoui, L.A. Fialkow, K. Fukuda, J.W. Helton, M. Kojima, S. Kuhlmann, M. Laurent, M. Marshall, Y. Nesterov, P. Parrilo, D. Pasechnich, V. Powers, A. Prestel, M. Putinar, J. Renegar, B. Reznick, M-F. Roy, K. Schm¨ udgen, M. Schweighofer, F. Sottile, B. Sturmfels, J. Sun, T. Theobald, K. Toh, L. Tuncel, V. Vinnikov, and H. Wolkowicz, during several visits and/or workshops in the period 2000-2008 at the Mathematical Sciences Research Institute (MSRI, Berkeley), The American Institute of Mathematics (AIM, Palo Alto), the Fields Institute for Research in Mathematical Sciences (Toronto), the Institute for Mathematics and its Applications (IMA, Minneapolis), the Oberwolfach Institute, and the Institute for Mathematical Sciences (IMS, Singapore). I gratefully acknowledge financial support from the AMS and the above institutes which made these visits possible. I also want to thank D. Bertsimas, B. Helton, R. Laraki, M. Laurent, T. Netzer, C. Prieur, M. Putinar, P. Rostalski, T. Pri´eto-Rumeau, E. Tr´elat, M. Zervos, Xuan Vinh Doan, as some of the results presented in this book have been obtained in collaboration with them, and in particular, I wish to thank my colleague Didier Henrion at LAAS (especially for our collaboration on the software GloptiPoly which he made user friendly with free access). Finally, special thanks are due to J-B. Hiriart-Urruty, J. Renegar and S. Sorin. For part of this research, financial support from the (French) ANR under grant ANR-NT05-3-41612 is gratefully acknowledged. At last, I want to thank the CNRS institution for providing a very nice and pleasant working environment at the LAAS-CNRS laboratory in Toulouse. Jean B. Lasserre
Toulouse, May 2009 xiii
This page intentionally left blank
Contents
Preface
vii
Acknowledgments
xiii
Part I Moments and Positive Polynomials
1
1. The Generalized Moment Problem
3
1.1 1.2 1.3 1.4 1.5 1.6
Formulations . . . . . . . . Duality Theory . . . . . . . Computational Complexity Summary . . . . . . . . . . Exercises . . . . . . . . . . Notes and Sources . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
2. Positive Polynomials 2.1 2.2 2.3 2.4 2.5
2.6
4 7 10 12 13 13 15
Sum of Squares Representations and Semi-definite Optimization . . . . . . . . . . . . . . . . . . . . . Nonnegative Versus s.o.s. Polynomials . . . . . . . Representation Theorems: Univariate Case . . . . Representation Theorems: Mutivariate Case . . . Polynomials Positive on a Compact Basic Semi-algebraic Set . . . . . . . . . . . . . . . . . . 2.5.1 Representations via sums of squares . . . . 2.5.2 A matrix version of Putinar’s Positivstellensatz . . . . . . . . . . . . . . 2.5.3 An alternative representation . . . . . . . Polynomials Nonnegative on Real Varieties . . . . xv
. . . .
. . . .
. . . .
16 20 22 24
. . . . . .
28 28
. . . . . . . . .
33 34 37
Contents
xvi
2.7 2.8 2.9 2.10 2.11
Representations with Sparsity Properties Representation of Convex Polynomials . . Summary . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
3. Moments 3.1
3.2
3.3 3.4
3.5 3.6 3.7
The One-dimensional Moment Problem . . . . 3.1.1 The full moment problem . . . . . . . 3.1.2 The truncated moment problem . . . . The Multi-dimensional Moment Problem . . . 3.2.1 Moment and localizing matrix . . . . . 3.2.2 Positive and flat extensions of moment matrices . . . . . . . . . . . . . . . . . The K-moment Problem . . . . . . . . . . . . . Moment Conditions for Bounded Density . . . 3.4.1 The compact case . . . . . . . . . . . . 3.4.2 The non compact case . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . . . .
51 . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
53 54 56 57 58
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
61 62 66 67 68 70 71 71
4. Algorithms for Moment Problems 4.1 4.2 4.3 4.4 4.5
The Overall Approach . . . . . . . . . . . . . . Semidefinite Relaxations . . . . . . . . . . . . . Extraction of Solutions . . . . . . . . . . . . . Linear Relaxations . . . . . . . . . . . . . . . . Extensions . . . . . . . . . . . . . . . . . . . . 4.5.1 Extensions to countably many moment constraints . . . . . . . . . . . . . . . . 4.5.2 Extension to several measures . . . . . 4.6 Exploiting Sparsity . . . . . . . . . . . . . . . . 4.6.1 Sparse semidefinite relaxations . . . . . 4.6.2 Computational complexity . . . . . . . 4.7 Summary . . . . . . . . . . . . . . . . . . . . . 4.8 Exercises . . . . . . . . . . . . . . . . . . . . . 4.9 Notes and Sources . . . . . . . . . . . . . . . . 4.10 Proofs . . . . . . . . . . . . . . . . . . . . . . .
39 42 47 48 49
73 . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
73 75 80 86 87
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
87 88 91 94 96 97 98 99 99
Contents
xvii
4.10.1 Proof of Theorem 4.3 . . . . . . . . . . . . . . . 99 4.10.2 Proof of Theorem 4.7 . . . . . . . . . . . . . . . 102 Part II Applications
107
5. Global Optimization over Polynomials 5.1 5.2 5.3
The Primal and Dual Perspectives . . . . . . . . . . Unconstrained Polynomial Optimization . . . . . . . Constrained Polynomial Optimization: Semidefinite Relaxations . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Obtaining global minimizers . . . . . . . . . 5.3.2 The univariate case . . . . . . . . . . . . . . 5.3.3 Numerical experiments . . . . . . . . . . . . 5.3.4 Exploiting sparsity . . . . . . . . . . . . . . 5.4 Linear Programming Relaxations . . . . . . . . . . . 5.4.1 The case of a convex polytope . . . . . . . . 5.4.2 Contrasting LP and semidefinite relaxations. 5.5 Global Optimality Conditions . . . . . . . . . . . . . 5.6 Convex Polynomial Programs . . . . . . . . . . . . . 5.6.1 An extension of Jensen’s inequality . . . . . 5.6.2 The s.o.s.-convex case . . . . . . . . . . . . . 5.6.3 The strictly convex case . . . . . . . . . . . 5.7 Discrete Optimization . . . . . . . . . . . . . . . . . 5.7.1 Boolean optimization . . . . . . . . . . . . . 5.7.2 Back to unconstrained optimization . . . . . 5.8 Global Minimization of a Rational Function . . . . . 5.9 Exploiting Symmetry . . . . . . . . . . . . . . . . . 5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . 5.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 5.12 Notes and Sources . . . . . . . . . . . . . . . . . . . 6. Systems of Polynomial Equations 6.1 6.2 6.3
109 . . 110 . . 111 . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . .
117 119 121 122 123 125 126 126 127 130 131 132 133 134 135 137 138 141 143 144 144 147
Introduction . . . . . . . . . . . . . . . . . . . . . . . . 147 Finding a Real Solution to Systems of Polynomial Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Finding All Complex and/or All Real Solutions: A Unified Treatment . . . . . . . . . . . . . . . . . . . . . 152
Contents
xviii
6.4 6.5 6.6
6.3.1 Basic underlying idea . . . . . 6.3.2 The moment-matrix algorithm Summary . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
7. Applications in Probability 7.1 7.2 7.3 7.4 7.5 7.6
163
Upper Bounds on Measures with Moment Conditions Measuring Basic Semi-algebraic Sets . . . . . . . . . . Measures with Given Marginals . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . . . . . . . .
. . . . . .
8. Markov Chains Applications 8.1
8.2 8.3 8.4 8.5
Bounds on Invariant Measures 8.1.1 The compact case . . . 8.1.2 The non compact case Evaluation of Ergodic Criteria Summary . . . . . . . . . . . . Exercises . . . . . . . . . . . . Notes and Sources . . . . . . .
9.3 9.4
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
Option Pricing with Moment Information Option Pricing with a Dynamic Model . . 9.2.1 Notation and definitions . . . . . 9.2.2 The martingale approach . . . . . 9.2.3 Semidefinite relaxations . . . . . . Summary . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . .
10.4
Introduction . . . . . . . . . . . . . . Weak Formulation of Optimal Control Semidefinite Relaxations for the OCP 10.3.1 Examples . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . .
183 183 185 187 189 190 191 193
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
10. Application in Control 10.1 10.2 10.3
163 168 175 177 177 178 181
9. Application in Mathematical Finance 9.1 9.2
155 155 160 161 161
193 196 197 198 200 202 203 205
. . . . . . Problems . . . . . . . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
205 206 210 212 215
Contents
10.5
xix
Notes and Sources . . . . . . . . . . . . . . . . . . . . . 216
11. Convex Envelope and Representation of Convex Sets 11.1
11.2
11.3 11.4 11.5 11.6
219
The Convex Envelope of a Rational Function . . . . . . 11.1.1 Convex envelope and the generalized moment problem . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Semidefinite relaxations . . . . . . . . . . . . . . Semidefinite Representation of Convex Sets . . . . . . . 11.2.1 Semidefinite representation of co(K) . . . . . . . 11.2.2 Semidefinite representation of convex basic semi-algebraic sets . . . . . . . . . . . . . . . . Algebraic Certificates of Convexity . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . . . . . . . . .
12. Multivariate Integration 12.1
12.2
12.3
12.4 12.5 12.6
Integration of a Rational Function . . . . . 12.1.1 The multivariable case . . . . . . . 12.1.2 The univariate case . . . . . . . . . Integration of Exponentials of Polynomials 12.2.1 The moment approach . . . . . . . 12.2.2 Semidefinite relaxations . . . . . . . 12.2.3 The univariate case . . . . . . . . . Maximum Entropy Estimation . . . . . . . 12.3.1 The entropy approach . . . . . . . 12.3.2 Gradient and Hessian computation Summary . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . .
13.2 13.3
220 223 225 226 229 234 239 239 240 243
. . . . . . . . . . . . .
. . . . . . . . . . . . .
13. Min-Max Problems and Nash Equilibria 13.1
219
Robust Polynomial Optimization . . . . . . . . 13.1.1 Robust Linear Programming . . . . . . 13.1.2 Robust Semidefinite Programming . . . Minimizing the Sup of Finitely Many Rational Functions . . . . . . . . . . . . . . . . . . . . . Application to Nash Equilibria . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
243 245 246 250 251 253 254 256 257 259 260 262 262 265
. . . . . 265 . . . . . 267 . . . . . 269 . . . . . 270 . . . . . 273
Contents
xx
13.4 13.5
13.3.1 N-player games . . . . . . . . . . . . . 13.3.2 Two-player zero-sum polynomial games 13.3.3 The univariate case . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
14. Bounds on Linear PDE 14.1 14.2
273 276 280 281 282 285
Linear Partial Differential Equations . . . . . . . . . . . 285 Notes and Sources . . . . . . . . . . . . . . . . . . . . . 288
Final Remarks
291
Appendix A
293
A.1 A.2 A.3 A.4 A.5 A.6
Background from Algebraic Geometry Fields and Cones . . . . . . . . . . . . . . . . . . . . . Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . Varieties . . . . . . . . . . . . . . . . . . . . . . . . . . Preordering . . . . . . . . . . . . . . . . . . . . . . . . Algebraic and Semi-algebraic Sets over a Real Closed Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . . . . . . . . .
. . . .
. 300 . 302
Appendix B Measures, Weak Convergence and Marginals B.1 B.2 B.3
305
Weak Convergence of Measures . . . . . . . . . . . . . . 305 Measures with Given Marginals . . . . . . . . . . . . . . 308 Notes and Sources . . . . . . . . . . . . . . . . . . . . . 310
Appendix C Some Basic Results in Optimization C.1 C.2 C.3 C.4 C.5 Appendix D D.1 D.2 D.3 D.4
293 295 296 298
Non Linear Programming . . . . . . . . . Semidefinite Programming . . . . . . . . Infinite-dimensional Linear Programming Proof of Theorem 1.3 . . . . . . . . . . . Notes and Sources . . . . . . . . . . . . .
311 . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
The GloptiPoly Software Presentation . . . . . . . . . . . Installation . . . . . . . . . . . . Getting started . . . . . . . . . . Description . . . . . . . . . . . . D.4.1 Multivariate polynomials
311 313 316 318 319 321
. . . . . . . . . . . . . . . . . . . . (mpol)
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
321 321 322 323 324
Contents
D.5
D.6
D.4.2 Measures (meas) . . . . . . . . . D.4.3 Moments (mom) . . . . . . . . . . D.4.4 Support constraints (supcon) . . D.4.5 Moment constraints (momcon) . . D.4.6 Floating point numbers (double) Solving Moment Problems (msdp) . . . . D.5.1 Unconstrained minimization . . . D.5.2 Constrained minimization . . . . D.5.3 Several measures . . . . . . . . . Notes and Sources . . . . . . . . . . . . .
xxi
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
325 325 327 327 328 329 329 331 335 337
Glossary
339
Bibliography
341
Index
359
Chapter 1
The Generalized Moment Problem
We describe the abstract basic Generalized Moment Problem (GMP) and its dual with a few examples. We also provide some general results concerning the GMP with polynomial data and its dual and show why the theory of moments and its dual theory of positive polynomials can be useful to solve the GMP.
Problems involving moments of measures arise naturally in many areas of applied mathematics, statistics and probability, economics, engineering, physics and operations research. For instance, how do we obtain optimal bounds on the probability that a random variable belongs to a given set, given some of its moments? How do we price derivative securities in a financial economics framework without assuming any model for the underlying price dynamics, given only moments of the price of the underlying asset? But as we will see throughout the book, some (and even many) other problems seemingly different and which a priori do not involve any moment of some measure, have equivalent reformulations which involve moments or generalized moments. In fact, these problems can be seen as particular instances of a linear infinite dimensional optimization problem, called the Generalized Moment Problem (in short, GMP). Let K be a Borel subset of Rn and let M (K) be the space of finite signed Borel measures on K, whose positive cone M (K)+ is the space of finite Borel measures µ on K. Given a set of indices Γ, a set of reals {γj : j ∈ Γ}, and functions f, hj : K → R, j ∈ Γ, that are integrable with respect to every measure µ ∈ M (K)+ , the GMP is defined as follows: ρmom = GMP :
sup µ∈M (K)+
s.t.
Z Z
f dµ K
(1.1) hj dµ 5 γj ,
K
3
j ∈ Γ.
1 The Generalized Moment Problem
4
(Recall that the symbol “5” stands for either an inequality “≤” or an equality “=”.) Note that we write “sup” instead of “max” to indicate that an optimal solution might not be attained. In this chapter, we present several examples that illustrate the modeling power of problem (1.1), develop a duality theory that forms the basis of future developments and briefly discuss the complexity of problem (1.1). 1.1
Formulations
In this section, our goal is to illustrate the modeling flexibility of formulation (1.1). Towards this goal, we present three examples from rather diverse areas such as probability theory, financial economics and optimization. Moment problems in probability Given vectors α = (α1 , . . . , αn )0 ∈ Nn and x = (x1 , . . . , xn )0 ∈ Rn , let αn 1 α2 xα := xα 1 x2 . . . xn .
Let S, K ⊂ Rn be given Borel sets with S ⊂ K. Then, the problem of finding an optimal bound on the probability that a K-valued random variable X belongs to S, given some of its moments γα for α ∈ Γ ⊂ Nn , can be formulated as solving the problem: ρmom =
µ(S) Z s.t. xα dµ = γα ,
sup
µ∈M (K)+
K
(1.2) ∀ α ∈ Γ,
a special case of formulation (1.1) with f (x) = 1S (x) and hα (x) = xα , α ∈ Γ. Moment problems in financial economics A central question in financial economics is to find the price of a so-called derivative security given information on the underlying asset. Let us take the example of the so-called European call option on an underlying security (this is why an option is called derivative security as its value is derived from another) with strike k and maturity T . It gives its holder the option (but not the obligation) to buy the underlying security at time T at price k. Clearly, if the price ST is more than k, then the holder will exercise
1.1 Formulations
5
the option and make a profit of ST − k, while if it is less than k, he will not exercise and does not make a profit. Thus, the payoff of this option is max(ST − k, 0). Clearly, as the payoff of this option is nonnegative, it has some value. A key problem in financial economics is to determine the price of such an option. This is exactly the area of the 1997 Nobel prize in economics awarded to Robert Merton and Myron Scholes (Fisher Black has passed away in 1995). Under the assumption that the price of the underlying asset follows a geometric Brownian motion and using the noarbitrage assumption, the Black-Scholes formula provides an explicit and insightful answer to this question. No arbitrage means that one cannot make money deterministically. For example, if a stock trades in two exchanges, it will trade at the same price, since otherwise there is an arbitrage opportunity. It turns out that the assumption of no-arbitrage is equivalent to the existence of a probability measure µ, such that the price of any European call option with strike k is given by Eµ [max(ST − k, 0)]. If we do not assume a particular stochastic process for the price dynamics St , but only moments of the price ST at time t = T , and under the no-arbitrage assumption, the problem of finding an optimal upper bound on the price of a European call option with strike k given the first m moments γj , j = 0, 1, . . . , m, (γ0 = 1) of the price of the underlying asset, is given by:
ρmom =
sup µ∈M (R+ )+
s.t.
Z Z
R+
max(x − k, 0) dµ
(1.3)
xj dµ = γj , j = 0, 1, . . . , m, R+
a special case of formulation (1.1) with K = R+ , f (x) = max(x − k, 0) and hj (x) = xj . As another example, if the prices pj of European call options of strike kj , j = 1, . . . , m are given, the problem of finding an optimal upper bound on the price of a European call option with strike k is a special case of formulation (1.1) with K = R+ , f (x) = max(x − k, 0), hj (x) = max(x − kj , 0), and γj = pj . Global optimization over polynomials With f : Rn →R and K ⊂ Rn , consider the constrained optimization problem:
6
1 The Generalized Moment Problem
f ∗ = sup f (x) s.t. x ∈ K, which we rewrite as ρmom =
sup µ∈M (K)+
s.t.
Z Z
(1.4)
f dµ K
(1.5) dµ = 1.
K
Theorem 1.1. Problems (1.4) and (1.5) are equivalent, that is f ∗ = ρmom . Proof. If f ∗ = +∞, let M be arbitrary large, and let x ∈ K be such that f (x) ≥ M . Then, Rwith µ := δx ∈ M (K)+ (with δx being the Dirac measure at x), we have f dµ = f (x) ≥ M , and so ρmom = +∞. We the case f ∗ < +∞. Since f (x) ≤ f ∗ for all x ∈ K, R next consider then K f dµ ≤ f ∗ and thus ρmom ≤ f ∗ . Conversely, with every x ∈ K, we associate the Dirac measure δx ∈ M (K)+ which is a feasible solution of problem (1.5) with value f (x), leading to ρmom ≥ f ∗ . This combined with f ∗ ≥ ρmom leads to f ∗ = ρmom , the desired result. Notice also that if x∗ ∈ K is a global minimizer of problem (1.4), then the probability measure µ∗ := δx∗ is an optimal solution of problem (1.5). In other words, we can formulate the general nonlinear optimization problem as a special case of the generalized moment problem (1.1), which underscores the modeling flexibility of problem (1.1). In contrast to problem (1.4), problem (1.5) is linear, and thus convex. It is, however, infinitedimensional. Given the diversity and generality of these examples, it is evident that the generalized moment problem is a problem of remarkable modeling power. In later chapters and in the exercises we explore several other examples. The goal of the book is to understand the complexity of problem (1.1) and its variations, to explore applications in a variety of applied contexts and to develop algorithms for providing bounds, and sometimes solutions. For the case f = 0, hα = xα , α ∈ Γ ⊂ Nn , problem (1.1) becomes a feasibility problem known as the K-moment problem. Definition 1.1. Given γα , α ∈ Γ ⊂ Nn and a set K ⊆ Rn , the K-moment problem asks whether there exists a finite Borel measure µ ∈ M (K)+ such that Z xα dµ = γα , α ∈ Γ. K
1.2 Duality Theory
7
In the next section, we develop a duality theory that forms the basis of algorithms and relaxations that we will utilize in later chapters.
1.2
Duality Theory
Problem (1.1) is linear and its dual problem is given by:
ρpop = inf λ
s.t.
X
γ j λj
j∈Γ
X j∈Γ
λj hj (x) ≥ f (x), ∀ x ∈ K.
λj ≥ 0,
(1.6)
j ∈ Γ+
where Γ+ ⊆ Γ stands for the set ofR indices j for which the generalized moment constraint is the inequality hj dµ ≤ γj . As we will apply general results from conic duality in convex optimization, we write problems (1.1) and (1.6) as conic optimization problems. So we introduce the convex cones: R R C(K) = {(γ, γ0 )0 : ∃ µ ∈ M (K)+ s.t. γ0 = K f dµ, γj = K hj dµ, ∀ j ∈ Γ }, ( ) X 0 P (K) = (λ, λ0 ) : λj ≥ 0, j ∈ Γ+ ; λj hj (x) + λ0 f (x) ≥ 0, ∀ x ∈ K , j∈Γ
and rewrite problems (1.1) and (1.6) as ρmom = sup {γ0 : (γ, γ0 )0 ∈ C(K) },
(1.7)
X ρpop = inf { γj λj : (λ, −1)0 ∈ P (K) }, λ j∈Γ
(1.8)
γ0
where R denotes the closure of the set R (whereas int R denotes its interior). The weak duality property holds for problems (1.7) and (1.8) if for any P two feasible solutions γ0 of (1.7) and λ of (1.8), one has γ0 ≤ j∈Γ λj γj (and so ρmom ≤ ρpop ). If ρmom < ρpop then one says that there is a duality gap for problems (1.7) and (1.8). Finally strong duality holds if there is no duality gap, i.e., ρpop = ρmom .
1 The Generalized Moment Problem
8
Using general results of conic duality in convex optimization, we obtain the following weak and strong duality.
Theorem 1.2. (a) (Weak duality) The optimal values of (1.7) and (1.8) satisfy ρmom ≤ ρpop . (b) (Strong duality) If (γ, γ0 ) ∈ C(K) for some γ0 , and there exists λ ∈ R|Γ| (with λj ≥ 0 for all j ∈ Γ+ ) such that (λ, −1)0 ∈ Int P (K), then ρmom = ρpop and problem (1.1) has an optimal solution, that is the sup is attained.
Proof. (a) Let γ0 and λ be arbitrary feasible solutions of (1.7) and (1.8) respectively. By definition of the cone C(K) there exists a finite Borel measure µ ∈ M (K)+ such that Z Z X X Z X γ0 = f dµ ≤ λj hj dµ = λj hj dµ ≤ λj γ j , K
K
j∈Γ
j∈Γ
K
j∈Γ
that is, weak duality holds (and so, ρmom ≤ ρpop ). (b) Strong duality follows from general results of conic duality in convex optimization that requires, however, that there exists a vector (λ, −1) 0 in the interior of the cone P (K), which is known as a Slater type condition. In important special cases we do not need to impose Slater type conditions for strong duality to hold, as the next theorem states.
Theorem 1.3. Suppose that K is compact, f is bounded and uppersemicontinuous on K, hj is continuous on K for every j ∈ Γ, and there exists k ∈ Γ such that hk > 0 on K. Then: (a) ρmom = ρpop and if problem (1.1) has a feasible solution then it has an optimal solution, that is, the sup is attained. (b) If the sup is attained, problem (1.1) has an optimal solution µ supported on finitely many points of K, i.e., µ is a finite linear combination of Dirac measures.
1.2 Duality Theory
9
A proof which uses results from infinite-dimensional linear optimization, can be found in Appendix C.4 where an appropriate summarized background is also provided. There are cases where one can relax the compactness of the set K and still maintain strong duality. One such case is when −f is lowersemicontinuous, bounded from below and inf-compact (or a moment function); see Definition B.11. In addition, for each α ∈ Γ, the moment conR straint is an inequality constraint K hα dµ ≤ γα , and hα is a nonnegative lower-semicontinuous function. Countably many constraints. We next consider the moment problem (1.1) and extend Theorem 1.3 when there are countably many constraints present in problem (1.1). Corollary 1.4. Suppose that the assumptions of Theorem 1.3 hold with Γ a countable set. If (1.1) has a feasible solution, then it is solvable (i.e., the sup is attained), and there is no duality gap, i.e., ρmom = ρpop . Proof. Let (Γm ) be a sequence of finite sets such that Γm ⊂ Γ for all m and Γm ↑ Γ as m→∞. We may assume without loss of generality that there exists an index, say j = 0 ∈ Γ1 (hence 0 ∈ Γm for all m) such that h0 > 0 on K. Suppose that (1.1) has a feasible solution and consider the moment problem (1.1) with finitely many constraints indexed in Γm ; let ρm mom be its optimal value. Similarly let ρm denote the optimal value of its dual. By pop m m Theorem 1.3, ρm = ρ , for all m. In addition, ρ ≤ ρ for each m mom mom pop mom m and the sequence (ρmom ) is monotone nonincreasing as more constraints are added as m increases. Moreover, let µm ∈ M (K)+ be an optimal solution of the primal. As h0 > 0 on K we have h0 ≥ δ on K for some δ > 0, and so µm (K) ≤ γ0 /δ for all m. Therefore, there is a subsequence {mi } and a finite Borel measure µ on K such that µmi ⇒ µ1 as i→∞. Fix j ∈ Γ arbitrary so that hj ∈ Γm for all m sufficiently large. As hj is continuous, K is compact and µmi ⇒ µ, we have Z Z hj dµ = lim hj dµmi 5 γj , K
i→∞
K
and so, as j was arbitrary, µ is feasible for the moment problem. Moreover, as f is upper semicontinuous, it is bounded above on K, and 1 The notation µ nj ⇒ µ (standard in probability) stands for the weak convergence of measures; see Definition B.1.
10
1 The Generalized Moment Problem
i ρmom ≤ lim sup ρm mom = lim sup
i→∞
i→∞
Z
K
f dµmi ≤
Z
f dµ K
where the last inequality on the right follows from Proposition 1.4.18 in Hern´ andez-Lerma and Lasserre (2003). This proves that µ is a primal i solution of the moment problem (with Γ countable), and so ρm mom ↓ ρmom , m m which also implies ρmom ↓ ρmom because the sequence (ρmom ) is monotone. m m Finally, as ρm pop = ρmom , we also get ρpop ↓ ρmom .
1.3
Computational Complexity
In this section, we consider the complexity of a variant of problem (1.1) with only inequality moment constraints:
ρ≤ mom =
sup µ∈M (K)+
s.t.
Z Z
f dµ K
K
(1.9) hj dµ ≤ γj j ∈ Γ,
(i.e., Γ = Γ+ ), and the corresponding dual problem becomes: X ρ≤ γ j λj pop = inf j∈Γ
s.t.
X j∈Γ
λj hj (x) ≥ f (x), ∀ x ∈ K,
(1.10)
λj ≥ 0, j ∈ Γ. All the results we presented earlier regarding strong duality continue to hold for the pair of primal and dual problems (1.9) and (1.10). In order to study the complexity of problem (1.9), we consider the separation problem associated with problem (1.10): Definition 1.2. The separation problem for problem (1.10): Given λj ≥ 0, j ∈ Γ, check whether X λj hj (x) ≥ f (x), ∀ x ∈ K, j∈Γ
and if not, find a violated inequality.
1.3 Computational Complexity
11
Given the equivalence of separation and optimization, as well as strong duality, we need to consider conditions under which the separation problem is solvable in polynomial time. Theorem 1.5. (a) If f is concave, hj is convex for every j ∈ Γ, and K is a convex set, then problem (1.9) is solvable in polynomial time. (b) If f and hj , j ∈ Γ are quadratic or piecewise linear functions over p polyhedra Ki , i = 1, . . . , p, that form a partition of K = Rn (with p being a polynomial in n and |Γ|), then problem (1.9) is solvable in polynomial time. Proof.
(a) The separation problem becomes inf
x∈K
X j∈Γ
λj hj (x) − f (x),
which in this case is a convex optimization problem, solvable efficiently using the ellipsoid method. (b) We present the case in which f (x) = x0 Qx + b0 x + c and hj (x) = x0 Qj x + b0j x + cj , j ∈ Γ. We have X j∈Γ
0
λj hj (x) − f (x) = x0 Qx + b x + c,
where Q=
X j∈Γ
b=
X j∈Γ
c=
X j∈Γ
λj Qj − Q, λj bj − b, λj cj − c.
To solve the separation problem, we check whether Q is positive semidefinite. If it is not, we decompose Q = U0 Θ U, where Θ is the diagonal matrix of the eigenvalues of Q. Let θi < 0 be a negative eigenvalue of Q. Let u be a vector with uj = 0, j 6= i, and ui large enough so that
1 The Generalized Moment Problem
12
θi u2i + (Ub)i ui + c < 0. Let x0 = U0 u. Then, X 0 λj hj (x0 ) − f (x0 ) = x00 Qx0 + b x0 + c j∈Γ
0
= u0 UU0 Θ UU0 u + b U0 u + c 0
= u0 Θu + b U0 u + c n n X X = θj u2j + (Ub)j uj + c j=1
j=1
= θi u2i + (Ub)i ui + c < 0,
which produces a violated inequality. If Q is positive semidefinite, then we solve the convex quadratic optimization problem 0
ρ0 = minn x0 Qx + b x + c, x∈R
which can be solved efficiently (e.g. as a semidefinite program after some transformation). If ρ0 < 0 and it is attained as some x0 , then x0 produces a violated inequality. Otherwise, the given dual solution is feasible. The case of piecewise linear functions is addressed in Exercise 1.3.
1.4
Summary
A key objective in this book is to provide algorithms for solving problem (1.1). In order to make progress we will restrict ourselves to cases where the functions f and hj , j ∈ Γ, are polynomials (or is some cases, rational functions or piecewise polynomials) and the set K is a basic semi-algebraic set, i.e., K = {x ∈ Rn : gi (x) ≥ 0, i = 1, . . . , m}, where gi ∈ R[x], for all i = 1, . . . , m. In particular, in the case where f , {hj }j∈Γ ∈ R[x], and K is a basic semi-algebraic set, then problem (1.6) asks for a polynomial to be nonnegative for all x ∈ K. This naturally leads us in Chapter 2 to study nonnegative polynomials (and polynomials nonnegative on a basic semialgebraic set) a topic of central importance in the development of 20th century mathematics. Similarly, when f , {hj }j∈Γ ∈ R[x], then in problem (1.1), only the moments of the unknown measure µ are involved and not µ
1.6 Notes and Sources
13
itself, which naturally leads us in Chapter 3 to the study of the K-moment problem. In fact, we will see that there is a nice and beautiful duality between the theory of moments and the theory of positive polynomials. 1.5
Exercises
Exercise 1.1. Given vectors xi ∈ Rn and scalars αi , i = 1, . . . , m, β and γ, formulate the problem of finding a polynomial f ∈ R[x] such that f (xi ) = αi , i = 1, . . . , m, β ≤ f (x) ≤ γ for all x ∈ Rn as a generalized moment problem (1.1). Exercise 1.2. When K is not compact, prove that Theorem 1.3 is also valid if the data satisfy the following: −f is lower-semicontinuous, bounded from below and inf-compact (or a moment function); see Definition B.11. In addition, for each α ∈ Γ, when the moment constraint is an inequality R hα dµ ≤ γα , then hα is a nonnegative lower-semicontinuous function, and hα is continuous otherwise. (Hint: Use Theorem B.9 and Proposition B.10.) Exercise 1.3. Prove Theorem 1.5(b) for the case that f and hj , j ∈ Γ, are piecewise linear functions over p convex sets Ki that form a partition of K = Rn .
1.6
Notes and Sources
1.1. The fact that no-arbitrage is equivalent to the existence of a martingale measure was originally proved by Harrison and Kreps (1979). For a derivation from linear optimization duality see Bertsimas and Tsitsiklis (1997). 1.2. For a survey on duality results for conic linear problems see Shapiro (2001). 1.3. The ellipsoid method was developed by Shor (1970), and Yudin and Nemirovski (1977). The polynomial time complexity of the method was shown by Khachian (1979). For the equivalence of separation and optimization see Gr¨ otschel et al. (1988). Theorem 1.5 is from Bertsimas and Sethuraman (2000, Theor. 16.4.4).
Chapter 2
Positive Polynomials
We review basic results of real algebraic geometry on the representation of positive polynomials, among which are the fundamental Positivstellensatz of Krivine, Stengle, Schm¨ udgen, Putinar and Jacobi and Prestel. We also provide some additional representation results that take into account several specific cases like finite varieties, convex semi-algebraic sets, representations that preserve sparsity, etc.
In the previous chapter, we have seen that the dual problem (1.6) of the generalized moment problem in the case where f , (hj )j∈Γ are polynomials and K is a basic semi-algebraic set, asks for a polynomial to be nonnegative for all x ∈ K. In one dimension, the ring R[x] of real polynomials of a single variable has the fundamental property (Theorem 2.5) that every nonnegative polynomial p ∈ R[x] is a sum of squares of polynomials, that is, p(x) ≥ 0, ∀x ∈ R
⇔
p(x) =
k X i=1
hi (x)2 , ∀x ∈ R,
for finitely many polynomials (hi ). In multiple dimensions, however, it is possible for a polynomial to be nonnegative without being a sum of squares. In fact, in his famous address in a Paris meeting of mathematicians in 1900, where he posed important problems for mathematics to be addressed in the 20th century, Hilbert in his 17th problem conjectured that every nonnegative polynomial can always be written as a sum of squares of rational functions. This conjecture was later proved to be correct by Emil Artin in 1926, using the Artin-Schreier theory of real closed fields. An immediate extension is to consider characterizations of polynomials p that are nonnegative on a basic semi-algebraic set K defined by polynomial inequalities gi (x) ≥ 0, i = 1, . . . , m. By characterization, we mean a 15
2 Positive Polynomials
16
representation of p in terms of the gi ’s such that the nonnegativity of p on K follows immediately from the latter representation in the same way that p is obviously nonnegative, when it is a sum of squares. In other words, this representation of p can be seen as a certificate of nonnegativity of p on K. In this chapter, we review the key representation theorems for nonnegative or positive polynomials. These theorems are the fundamendal building blocks for later chapters. This chapter uses some elementary results from algebraic geometry that we review in Appendix A, so that the book is self-contained. 2.1
Sum of Squares Representations and Semi-definite Optimization
In this section we show that if a nonnegative polynomial has a sum of squares representation then one can compute this representation by using semidefinite optimization methods. Given that semidefinite optimization problems are efficiently solved both from a theoretical and a practical point of view, it follows that we can compute a sum of squares decomposition of a nonnegative polynomial, if it exists, efficiently. Let R[x] denote the ring of real polynomials in the variables x = (x1 , . . . , xn ). A polynomial p ∈ R[x] is a sum of squares (in short s.o.s.) if p can be written as X x 7→ p(x) = pj (x)2 , x ∈ Rn , j∈J
for some finite family of polynomials (pj : j ∈ J) ⊂ R[x]. Notice that necessarily the degree of p must be even, and also, the degree of each pj is bounded by half of that of p. Denote by Σ[x] ⊂ R[x] the space of s.o.s. polynomials. For any two real symmetric matrices A, B, recall that hA, Bi stands for trace(AB). Finally, Pn for a multi-index α ∈ Nn , let |α| := i=1 αi . Consider the vector vd (x) = (xα )|α|≤d
= (1, x1 , . . . , xn , x21 , x1 x2 , . . . , xn−1 xn , x2n , . . . , xd1 , . . . , xdn )0 , of all the monomials xα of degree less than or equal to d, which has dimension s(d) := n+d d . Those monomials form the canonical basis of the vector space R[x]d of polynomials of degree at most d.
2.1 Sum of Squares Representations
17
Proposition 2.1. A polynomial g ∈ R[x]2d has a sum of squares decomposition (or is s.o.s.) if and only if there exists a real symmetric and positive semidefinite matrix Q ∈ Rs(d)×s(d) such that g(x) = vd (x)0 Qvd (x), for all x ∈ Rn . Proof. Suppose there exists a real symmetric s(d) × s(d) matrix Q 0 for which g(x) = vd (x)0 Qvd (x), for all x ∈ Rn . Then Q = HH0 for some s(d) × k matrix H, and thus, 0
0
g(x) = vd (x) HH vd (x) =
k X
(H0 vd (x))2i ,
i=1
∀x ∈ Rn .
Since x 7→ (H0 vd (x))i is a polynomial, then g is expressed as a sum of squares of the polynomials x 7→ (H0 vd (x))i , i = 1, . . . , k. Conversely, suppose that g (of degree 2d) has a s.o.s. decomposition P g(x) = ki=1 [hi (x)]2 for some family {hi : i = 1, . . . , k} ⊂ R[x]. Then necessarily, the degree of each hi is bounded by d. Let hi be the vector of coefficients of the polynomial hi , i.e., hi (x) = h0i vd (x), i = 1, . . . , k. Thus, g(x) =
k X
vd (x)0 hi h0i vd (x) = vd (x)0 Qvd (x),
i=1
with Q ∈ Rs(d)×s(d) , Q :=
Pk
i=1
∀x ∈ Rn ,
hi h0i 0, and the proposition follows.
Given a s.o.s. polynomial g ∈ R[x]2d , the identity g(x) = vd (x)0 Qvd (x) for all x, provides linear equations that the coefficients of the matrix Q must satisfy. Hence, writing X vd (x)vd (x)0 = Bα x α , α∈Nn
for appropriate s(d)×s(d) real symmetric matrices (Bα ), checking whether P α the polynomial x 7→ g(x) = is s.o.s. reduces to solving the α gα x 1 semidefinite optimization (feasibility) problem: Find Q ∈ Rs(d)×s(d) such that: 0
Q=Q,
Q 0,
hQ, Bα i = gα ,
(2.1) n
∀α ∈ N ,
1 For a brief background on Semidefinite Programming, the reader is referred to Section C.2.
2 Positive Polynomials
18
a tractable convex optimization problem for which efficient software packages are available. Indeed, up to arbitrary precision > 0 fixed, a semidefinite program can be solved in a computational time that is polynomial in n+d the input size of the problem. Observe that the size s(d) = n of the semidefinite program (2.1) is bounded by nd . On the other hand, nonnegativity of a polynomial g ∈ R[x] can be checked also by solving a single semidefinite program. Indeed, if g is nonnegative then it can be written as a sum of squares of rational functions, and so, clearing denominators, hg = f,
(2.2)
for some s.o.s. h, f ∈ Σ[x], and there exist bounds on the degree of h and f . Conversely, if (2.2) has a nontrivial s.o.s. solution h, f ∈ Σ[x], then g is obviously nonnegative. Therefore, using (2.1) for h and f in (2.2), finding a certificate h, f for nonnegativity of g reduces to solving a single semidefinite program. Unfortunately, the available bounds for the size of this semidefinite program are by far too large for practical implementation. This is what makes the sum of squares property very attractive computationally, in contrast with the weaker nonnegativity property which is much harder to check (if not impossible). Example 2.1. Consider the polynomial in R[x] = R[x1 , x2 ] f (x) = 2x41 + 2x31 x2 − x21 x22 + 5x42 . Suppose we want to check whether f is a sum of squares. As f is homogeneous, we attempt to write f in the form f (x1 , x2 ) = 2x41 + 2x31 x2 − x21 x22 + 5x42 2 0 2 x1 q11 q12 q13 x1 2 2 = x2 q12 q22 q23 x2 x1 x2
=
q11 x41
+
q13 q23 q33
q22 x42
x1 x2
+ (q33 + 2q12 ) x21 x22 + 2q13 x31 x2 + 2q23 x1 x32 ,
for some Q 0. In order to have an identity, we obtain
2.1 Sum of Squares Representations
Q 0;
19
q11 = 2, q22 = 5, q33 + 2q12 = −1, 2q13 = 2, q23 = 0.
In this case we easily find the particular solution
2 −3 1
0 Q = −3
1
5 0 = HH0 , 05
20
1 H = √ −3 1 , 2 13
and so f (x1 , x2 ) =
1 1 (2x21 − 3x22 + x1 x2 )2 + (x22 + 3x1 x2 )2 , 2 2
which is indeed a sum of squares.
Sufficient condition for being s.o.s. Observe that checking whether a polynomial g ∈ R[x]2d is s.o.s. via (2.1) requires introducing the auxiliary (symmetric matrix) variable Q ∈ Rs(d)×s(d) , i.e., we do not have “if and only if” conditions expressed directly in terms of the coefficients (gα ) of g. If fact such conditions on g exist. They define the set G := { g = (gα ) ∈ Rs(2d) : ∃ Q ∈ Rs(d)×s(d) such that (2.1) holds}, which is the orthogonal projection on Rs(2d) of the set of elements (g, Q) that satisfy (2.1), a basic semi-algebraic set of Rs(2d)+s(d)(s(d)+1)/2 . Indeed the semidefinite constraint Q 0 can be stated as polynomial inequality constraints on the entries of Q (use determinants of its principal minors). Hence, by the Projection Theorem of real algebraic geometry, the set G is a semi-algebraic set (but not a basic semi-algebraic set in general); see Theorem A.7. Hence, in general, G is not a basic semi-algebraic set but a finite union ∪i∈I Gi of basic semi-algebraic sets Gi ⊂ Rs and it is very hard to obtain the polynomials in the variables gα that define each Gi . The next result states a sufficient condition for g ∈ R[x] to be s.o.s., Pn directly in terms of the coefficients (gα ). Let Nnd := {α ∈ Nn : i=1 αi ≤ d} and let Γd := {2β : β ∈ Nnd }.
2 Positive Polynomials
20
Theorem 2.2. Let x 7→ g(x) = with d ≥ 1, and write g =
n X
P
α gα x
α
be a polynomial of degree 2d
gid x2d i + h + g0 ,
i=1
n where h ∈ R[x] contains none of the monomials (x2d i )i=1 . Then g is s.o.s. if X X g0 ≥ |gα | − min[0, gα ] (2.3) α6∈Γd
gid ≥
X
α6∈Γd
α∈Γd
|gα |
X |α| |α| − min[0, gα ] , 2d 2d α∈Γd
∀ i = 1, . . . , n.
(2.4)
On the one hand, the conditions (2.3)-(2.4) are only sufficient but on the other hand, they define a convex polyhedron in the space Rs(2d) of coefficients (gα ) of g ∈ R[x]. This latter property may be interesting if one has to optimize in the space of s.o.s. polynomials of degree at most d.
2.2
Nonnegative Versus s.o.s. Polynomials
It is important to compare nonnegative and s.o.s. polynomials because we have just seen that one knows how to check efficiently whether a polynomial is s.o.s. but not whether it is only nonnegative. There are two kinds of results for that comparison, depending on whether or not one keeps the degree fixed. The first result is rather negative as it shows that when the degree is fixed and the number of variables grows, then the gap between nonnegative and s.o.s. polynomials increases and is unbounded. Namely, let P[x]d (resp. H[x]d ) denote the cone of homogeneous and nonnegative (resp. homogeneous and s.o.s.) polynomials of degree 2d. (Recall that a polynomial p ∈ R[x] of degree d is homogeneous if p(λx) = λd p(x) for all λ ≥ 0 and all x ∈ Rn .) To compare both setsRwe need subsets of finite volume. So let H be the hyperplane {f ∈ R[x] : Sn−1 f dµ = 1}, where µ is the rotation invariant c d := H ∩ P[x]d measure on the unit sphere Sn−1 ⊂ Rn . Finally, let P[x] b d := H ∩ H[x]d . and H[x]
2.2 Nonnegative Versus s.o.s. Polynomials
21
Theorem 2.3. There exist constants C1 , C2 > 0 depending only on d and such that ! c d) vol(P[x] (d/2−1)/2 C1 n ≤ ≤ C2 n(d/2−1)/2 . (2.5) b d) vol(H[x]
c d and H[x] b d can Therefore, if d is fixed and n→∞, the gap between P[x] become as large as desired. On the other hand, while a nonnegative polynomial f ∈ R[x] may not be a s.o.s. (Exercise 2.4), we next show that we can perturb f to make it a sum of squares. The price to pay is consistent with Theorem 2.3 in that the approximation of f we need consider does not have same degree as f . Given r ∈ N arbitrary, let Θr , θr ∈ R[x] be the polynomials Θr (x) := 1 +
n X
x2r i ;
i=1
θr (x) :=
n X r X x2k i
i=1 k=0
k!
.
(2.6)
P Given f ∈ R[x] let kf k1 := α∈Nn |fα | if f = (fα ) is the vector of coefficients of f . Next, with > 0, we define 1 fr := f + Θr ;
2 fr := f + θr .
Theorem 2.4. (a) If f ∈ R[x] is nonnegative on [−1, 1]n , then for every > 0 there 1 1 exists r1 such that fr ∈ Σ[x] for all r ≥ r1 and kf − fr k1 →0 as ↓ 0 (and r ≥ r1 ). (b) If f ∈ R[x] is nonnegative, then for every > 0 there exists r2 such 2 2 that fr ∈ Σ[x] for all r ≥ r2 and kf − fr k1 →0 as ↓ 0 (and r ≥ r2 ). Theorem 2.4 is a denseness result with respect to the l1 -norm of coefficients. Indeed, it states that a polynomial f which is nonnegative on 1 [−1, 1]n (resp. on Rn ) can be perturbed to a s.o.s. polynomial fr (resp. 2 1 2 fr ) such that kf − fr k1 →0 (resp. kf − fr k1 →0). It also provides a certificate of nonnegativity of f on [−1, 1]n (resp. on Rn ). Indeed fix
22
2 Positive Polynomials
1 x ∈ [−1, 1]n (resp. x ∈ Rn ) and let →0 to obtain that 0 ≤ fr (x)→f (x) 2 (resp. 0 ≤ fr (x)→f (x)). Concerning Theorem 2.4(b), observe that in addition to the l1 -norm 2 convergence kf − fr k1 →0, the convergence is also uniform on compact sets. Notice that a polynomial f nonnegative on the whole Rn (hence on the box [−1, 1]n ) can also be approximated by the s.o.s. polynomial 1 2 fr of Theorem 2.4(a) which is simpler than the s.o.s. approximation fr . 1 However, in contrast to the latter, the s.o.s. approximation f ≈ fr is not uniform on compact sets, and is really more appropriate for polynomials 1 nonnegative on [−1, 1]n only (and indeed the approximation f ≈ fr is uniform on [−1, 1]n ). In addition, in Theorem 2.4(a) the integer r1 does not depend on the explicit choice of the polynomial f but only on: (a) and the dimension n, (b) the degree and the size of the coefficients of f . Therefore, if one fixes these four parameters, we find an r such that the statement of Theorem 2.4(a) holds for any f nonnegative on [−1, 1]n , whose degree and size of coefficients do not exceed the fixed parameters.
2.3
Representation Theorems: Univariate Case
In this section, we review the major representation theorems for nonnegative univariate polynomials, for which the results are quite complete. Let R[x] be the ring of real polynomials of the single variable x, and let Σ[x] ⊂ R[x] be its subset of polynomials that are sums of squares of elements of R[x]. We first prove that if p ∈ R[x] is nonnegative, then p ∈ Σ[x]. Theorem 2.5. A polynomial p ∈ R[x] of even degree is nonnegative if and only if it can be written as a sum of squares of other polynomials, i.e., Pk p(x) = i=1 [hi (x)]2 , with hi ∈ R[x], i = 1, . . . , k.
P Proof. Clearly, if p(x) = kj=1 [hj (x)]2 , then p(x) ≥ 0 for all x ∈ R. Conversely, suppose that a polynomial p ∈ R[x] of degree 2d (and with highest degree term p2d x2d ) is nonnegative on the real line R. Then, the real roots of p should have even multiplicity, otherwise p would alter its sign in a neighborhood of a root. Let λj , j = 1, . . . , r, be its real roots with corresponding multiplicity 2mj . Its complex roots can be arranged in conjugate pairs, al + ibl , al − ibl , l = 1 . . . , h (with i2 = −1). Then p can
2.3 Representation Theorems: Univariate Case
23
be written in the form: x 7→
p(x) = p2d
r Y
j=1
(x − λj )2mj
h Y l=1
(x − al )2 + b2l .
Note that the leading coefficient p2d needs to be positive. By expanding the terms in the products, we see that p can be written as a sum of squares of k = 2h polynomials. (In fact, one may also show that p is a sum of only two squares.) We next concentrate on polynomials p ∈ R[x] that are nonnegative on an interval I ⊂ R. Moreover, the three cases I = (−∞, b], I = [a, ∞) and I = [a, b] (with a, b ∈ R) all reduce to the basic cases I = [0, ∞) and I = [−1, +1] using the change of variable 2x − (a + b) f (x) := p(b − x), f (x) := p(x − a), and f (x) := p , b−a respectively. The representation results that one obtains depend on the particular choice of the polynomials used in the description of the interval. The main result in the one-dimensional case can be summarized in the next theorem.
Theorem 2.6. Let p ∈ R[x] be of degree n. (a) Let g ∈ R[x] be the polynomial x 7→ g(x) := 1 − x2 . Then p ≥ 0 on [−1, 1] if and only if p = f + g h,
f, h ∈ Σ[x],
and with both summands of degree less than 2n. (b) Let x 7→ g1 (x) := 1 − x, x 7→ g2 (x) := 1 + x, x 7→ g3 (x) := g1 (x)g2 (x). Then, p ≥ 0 on I = [−1, 1] if and only if p = f 0 + g 1 f1 + g 2 f2 + g 3 f3 ,
f0 , f1 , f2 , f3 ∈ Σ[x].
In addition, all the summands have degree at most n, and f1 , f2 = 0, if n is even, whereas f0 , f3 = 0, if n is odd. The case [0, ∞) reduces to the case [−1, 1] (see Exercise 2.1) and we have:
2 Positive Polynomials
24
Theorem 2.7. Let p ∈ R[x] be nonnegative on [0, +∞). Then p = f 0 + x f1 , for two s.o.s. polynomials f0 , f1 ∈ Σ[x] and the degree of both summands is bounded by deg p.
It is important to emphasize that Theorem 2.6 (resp. Theorem 2.7) explicitly use the specific representation of the interval [−1, 1] (resp. [0, +∞)) as the basic semi-algebraic set: {x : 1−x ≥ 0; 1+x ≥ 0} or {x : 1−x2 ≥ 0} (resp. {x : x ≥ 0}). In Exercise 2.2 we consider the case [−1, 1] = {x ∈ R : h(x) ≥ 0}, with h not equal to the polynomial x 7→ (1 + x)(1 − x), and a weaker result is obtained. The next result also considers the interval [−1, 1] but provides another representation that does not use s.o.s.
Theorem 2.8. Let p ∈ R[x]. Then p > 0 on [−1, 1] if only if X p= cij (1 − x)i (1 + x)j , cij ≥ 0,
(2.7)
i+j≤d
for some sufficiently large d.
Notice that Theorem 2.8 leads to a linear optimization feasibility problem to determine the coefficients cij in the representation (2.7) (Exercise 2.3).
2.4
Representation Theorems: Mutivariate Case
In this section we consider the multivariate case. As already mentioned, a nonnegative polynomial p ∈ R[x] does not necessarily have a sum of squares representation. In Exercise 2.4 we show that the polynomials p(x1 , x2 ) = x21 x22 (x21 + x22 − 1) + 1,
p(x1 , x2 , x3 ) = x21 x22 (x21 + x22 − 3x23 ) + 6x63
2.4 Representation Theorems: Mutivariate Case
25
are nonnegative but they do not have a s.o.s. representation. On the other hand, nonnegative quadratic polynomials, and nonnegative fourth degree homogeneous polynomials of three variables have a s.o.s. representation (Exercise 2.5). The next celebrated theorem due to Poly´ a provides a certificate of positivity for homogeneous polynomials that are positive on the simplex. Theorem 2.9 (Poly´ a). If p ∈ R[x] is homogeneous and p > 0 on Rn+ \ {0}, then for sufficienly large k ∈ N, all non zero coefficients of the polynomial x 7→ (x1 + . . . + xn )k p(x) are positive. As a consequence of Theorem 2.9 we next obtain the following representation result for nonhomogeneous polynomials that are strictly positive on Rn+ . If p ∈ R[x], we denote by p˜ ∈ R[x, x0 ] the homogeneous polynomial associated with p, that is, p˜(x, x0 ) := xd0 p(x/x0 ), with d being the degree of p, and denote by pd (x) = p˜(x, 0) for all x ∈ Rn , the homogeneous part of p, of degree d. Pn Theorem 2.10. Let x 7→ g(x) := 1 + j=1 xj and p ∈ R[x] with degree d. If p > 0 on Rn+ and pd > 0 on Rn+ \ {0}, then for sufficienly large k ∈ N, the polynomial g k p has positive coefficients. n+1 Proof. We first prove that p˜ > 0 on Rn+1 + \{0}. Let (x, x0 ) ∈ R+ \{0}. If x0 = 0, then x ∈ Rn+ \ {0} and p˜(x, 0) = pd (x) > 0. If x0 > 0, then y := x/x0 ∈ R+ ˜(x, x0 ) = xd0 p(y) > 0. n , and thus p(y) > 0 implies that p n+1 Therefore, p˜ > 0 on R+ \ {0}. Hence by Theorem 2.9, for all sufficiently large k ∈ N, the polynomial k n X (x, x0 ) 7→ q(x, x0 ) := xj + x0 p˜(x, x0 ), j=1
has positive coefficients. But then, as p˜(x, 1) = p(x), it follows that the polynomial k n X x 7→ q(x, 1) = xj + 1 p˜(x, 1) = g k · p, j=1
has positive coefficients, for all sufficiently large k ∈ N.
2 Positive Polynomials
26
We next characterize when a semi-algebraic set described by polynomial inequalities, equalities and non-equalities is empty. In order to achieve this, we need the following definition. Definition 2.1. For F := {f1 , . . . , fm } ⊂ R[x], and a set J ⊆ {1, . . . , m} Q we denote by fJ ∈ R[x] the polynomial x 7→ fJ (x) := j∈J fj (x), with the convention that f∅ = 1. The set
P (f1 , . . . , fm ) :=
X
qJ fJ
J⊆{1,...,m}
:
qJ ∈ Σ[x]
(2.8)
is called (by algebraic geometers) a preordering. It is also a convex cone of R[x]; see Appendix A. We first state a key result from Stengle: Theorem 2.11 (Stengle). Let k be a real closed field, and let F := (fi )i∈I1 , G := (gi )i∈I2 , H := (hi )i∈I3 ⊂ k[x] be finite families of polynomials. Let (a) P (F ) be the preordering generated by the family F , (b) M (G) be the set of all finite products of the gi ’s, i ∈ I2 (the empty product being the constant polynomial 1), and (c) I(H) be the ideal generated by H. Consider semi-algebraic set K = {x ∈ k n : fi (x) ≥ 0,
hi (x) = 0,
∀ i ∈ I1 ;
∀ i ∈ I3 }.
gi (x) 6= 0,
∀ i ∈ I2 ;
The set K is empty if and only if there exist f ∈ P (F ), g ∈ M (G) and h ∈ I(H) such that f + g 2 + h = 0.
(2.9)
The polynomials f, g, h ∈ k[x] in (2.9) provide a Stengle certificate of K = ∅. In (2.9), there is also a (very large) bound on the degree of g ∈ M (G), the degree of the s.o.s. weights qJ ∈ Σ[x] in the representation
2.4 Representation Theorems: Mutivariate Case
27
(2.8) of f ∈ P (F ), as well as the degree of the weights pj ∈ R[x] in the P representation of h = j pj hj . Therefore in view of Proposition 2.1, in principle checking existence of a certificate (f, g, h) requires solving a single semidefinite program. But unfortunately, the available degree bound being huge, the size of such a semidefinite program is by far too large for practical implementation.
Stengle and Farkas certificates The Stengle certificate f, g, h ∈ k[x] in (2.9) for K = ∅, is a nonlinear generalization of the celebrated Farkas Lemma (or Theorem of the alternative) in linear algebra which provides a certificate of emptyness for a polyhedral set K = {x : Ax ≤ b} (for some matrix A ∈ Rm×n and some vector b ∈ Rm ). In fact, a Farkas certificate is a particularly simple Stengle certificate for convex polyhedra. Indeed, a Farkas certificate of ∅ = {x : Ax ≤ b} 0 0 is a nonnegative vector u ∈ Rm + such that u A = 0 and u b < 0. So let x 7→ fj (x) = (b − Ax)j , j = 1, . . . , m, x 7→ g1 (x) = 1 (so that M (G) is identical to the constant polynomial 1) and x 7→ h1 (x) = 0 (so that I(H) is the 0 polynomial). The polynomial m
x 7→ f (x) :=
−1 X uj (b − Ax)j u0 b j=1
is an element of P (F ) because uj ≥ 0 for every j = 1, . . . , m and u0 b < 0. But then
1 + f (x) = 1 −
m 1 X u0 b − u0 Ax u (b − Ax) = 1 − = 0, j j u0 b j=1 u0 b
that is, (2.9) holds. We next consider basic semi-algebraic sets, that is, semi-algebraic sets defined by inequalities only. As a direct consequence of Theorem 2.11, one obtains:
28
2 Positive Polynomials
Theorem 2.12. (Stengle’s Positivstellensatz and Nullstellensatz). Let k be a real closed field, f ∈ k[x], and let K = { x ∈ kn :
fj (x) ≥ 0, j = 1, . . . , m }.
(a) Nichtnegativstellensatz. f ≥ 0 on K if and only if there exists ` ∈ N, and g, h ∈ P (f1 , . . . , fm ) such that f g = f 2` + h. (b) Positivstellensatz. f > 0 on K if and only if there exist g, h ∈ P (f1 , . . . , fm ) such that f g = 1 + h. (c) Nullstellensatz. f = 0 on K if and only if there exists ` ∈ N, and g ∈ P (f1 , . . . , fm ) such that f 2` + g = 0. Again, as for Theorem 2.11, there is also a bound on ` and the degree of the s.o.s. weights qJ ∈ Σ[x] in the representation (2.8) of g, h ∈ P (f1 , . . . , fm ). This bound depends only on the dimension n and on the degree of the polynomials (f, f1 , . . . , fm ). Therefore in principle, checking existence of a certificate (l, g, h) in Theorem 2.12(a)-(c) requires solving a single semidefinite program (but of huge size). In practice, one fixes an a priori (much smaller) degree bound and solves the corresponding semidefinite program. If the latter has a feasible solution then one obtains a certificate g, h ∈ P (f1 , . . . , fm ). However, such a certificate is numerical and so can be obtained only up to some machine precision, because of numerical inaccuracies inherent to semidefinite programming solvers.
2.5
Polynomials Positive on a Compact Basic Semi-algebraic Set
In this section, we restrict our attention to compact basic semi-algebraic sets K ⊂ Rn and obtain certificates of positivity on K that have certain algorithmic advantages. In fact, Putinar’s Positivstellensatz below is the key result that we will later use extensively to solve the GMP. 2.5.1
Representations via sums of squares
The first representation result, known as Schm¨ udgen’s Positivstellensatz, was an important breakthrough as it was the first to provide a simple characterization of polynomials positive on a compact basic semi-algebraic set K, and with no additional assumption on K (or on its description).
2.5 Polynomials Positive on a Compact Basic Semi-algebraic Set
29
Theorem 2.13 (Schm¨ udgen’s Positivstellensatz). Let (gj )m j=1 ⊂ R[x] be such that the basic semi-algebraic set K := {x ∈ Rn :
gj (x) ≥ 0, j = 1, . . . , m}
(2.10)
is compact. If f ∈ R[x] is strictly positive on K then f ∈ P (g1 , . . . , gm ), that is, X f = fJ gJ , for some s.o.s. fJ ∈ Σ[x], (2.11) J⊆{1,...,m}
and with gJ =
Q
j∈J
gj .
Theorem 2.13 is a very powerful result, but note that the number of terms in (2.11) is exponential in the number of polynomials that define the set K. However, a major improvement is possible under a relatively weak assumption on the polynomials that define the compact set K. Associated with the finite family (gj ) ⊂ R[x], the subset of P (g1 , . . . , gm ) defined by m X Q(g) = Q(g1 , . . . , gm ) := q0 + qj gj : (qj )m ⊂ Σ[x] (2.12) j=0 j=1
is a convex cone called the quadratic module generated by the family (g j ).
Assumption 2.1. With (gj )m j=1 ⊂ R[x], there exists u ∈ Q(g) such that n the level set {x ∈ R : u(x) ≥ 0} is compact. Theorem 2.14 (Putinar’s Positivstellensatz). Let K ⊂ Rn be as in (2.10) and let Assumption 2.1 hold. If f ∈ R[x] is strictly positive on K then f ∈ Q(g), that is, f = f0 +
m X
fj gj ,
(2.13)
j=1
for some s.o.s. polynomials fj ∈ Σ[x], j = 0, 1, . . . , m. In contrast to Theorem 2.13, the number of terms in the representation (2.13) is linear in the number of polynomials that define K, a crucial improvement from a computational point of view. The condition on K
30
2 Positive Polynomials
(Assumption 2.1) is not very restrictive. For instance, it is satisfied in the following cases: (a) All the gi ’s are affine and K is a polytope. (b) The set {x ∈ Rn : gj (x) ≥ 0} is compact for some j ∈ {1, . . . , m}. Also, suppose that we know some N > 0 such that K ⊂ {x ∈ Rn : kxk2 ≤ N }. Let x 7→ gm+1 (x) := N − kxk2 . Adding the quadratic constraint gm+1 (x) ≥ 0 in the definition (2.10) of K does not change K as this last constraint is redundant. But with this new representation, K satisfies the required condition in Theorem 2.14. The following theorem provides further understanding on the condition in Theorem 2.14.
n Theorem 2.15. Let (gj )m j=1 ⊂ R[x], assume that K ⊂ R defined in (2.10) is compact, and let Q(g) be as in (2.12). The following conditions are equivalent: (a) There exist finitely many p1 , . . . , ps ∈ Q(g) such that the level set {x ∈ Rn : pj (x) ≥ 0, j = 1, . . . , s} which contains K is compact and Q j∈J pj ∈ Q(g) for all J ⊆ {1, . . . , s}. (b) Assumption 2.1 holds. (c) There exists N ∈ N such that the polynomial x 7→ N − kxk2 ∈ Q(g). (d) For all p ∈ R[x], there is some N ∈ N such that both polynomials x 7→ N + p(x) and x 7→ N − p(x) are in Q(g).
Both Theorems 2.13 and 2.14 have significant computational advantages. Indeed, from Proposition 2.1, given a polynomial f > 0 on K, checking whether f has the representation (2.11) or (2.13), and assuming an a priori bound on the degree of the unknown s.o.s. polynomials, reduces to solving a semidefinite optimization problem, as we saw in Section 2.3. Example 2.2. Let x 7→ f (x) = x31 − x21 + 2x1 x2 − x22 + x32 and K = {x : g1 (x) = x1 ≥ 0, g2 (x) = x2 ≥ 0, g3 (x) = x1 +x2 −1 ≥ 0}. To check whether P3 f ≥ 0 on K we attempt to write f = f0 + i=1 fi gi , where fi ∈ Σ[x] and each has degree 2, that is, fi = (1, x1 , x2 )Qi (1, x1 , x2 )0 ,
Qi 0, i = 0, 1, 2, 3.
2.5 Polynomials Positive on a Compact Basic Semi-algebraic Set
31
Solving the semidefinite feasiblity problem, we find that 000 000 0 0 0 Q0 = 0, Q1 = 0 0 0 , Q2 = 0 1 0 , Q3 = 0 1 −1 , 001 000 0 −1 1
and so,
f (x) = x22 x1 + x21 x2 + (x1 − x2 )2 (x1 + x2 − 1) which proves that f ≥ 0 on K. Degree bound The following result provides a bound on the degree of the weights (fj ) ⊂ Σ[x] in the representation (2.13). For f ∈ R[x] written x 7→ f (x) = P α α fα x let |α| |α|! |fα | . := kf k0 := max |α| with α α 1 ! · · · αn ! α α
Theorem 2.16 (Degree bound). Let K ⊂ Rn in (2.10) satisfy Assumption 2.1 and assume that ∅ 6= K ⊂ (−1, 1)n . Then there is some c > 0 such that for all f ∈ R[x] of degree d, and positive on K (i.e. such that f ∗ := min{f (x) : x ∈ K} > 0), the representation (2.13) holds with c kf k0 deg fj gj ≤ c exp d 2 nd ∗ , ∀j = 1, . . . , m. f In view of the importance of Theorem 2.14 in the book, we next provide a proof due to M. Schweighofer. Proof of Theorem 2.14 We first need some preliminary results. A quadratic module Q ⊂ R[x] satisfies 1 ∈ Q, Q + Q ⊂ Q and Σ[x]Q ⊂ Q. In addition Q is proper if −1 6∈ Q, that is, if Q 6= R[x]. Finally Q is said to be Archimedean if for every f ∈ Q there is some N ∈ N such that N ± f ∈ Q. Lemma 2.17. If Q ⊂ R[x] is a quadratic module then I := Q ∩ −Q is an ideal. Proof. Let f ∈ R[x] and g ∈ I . Then f g = g (f + 1)2 /2 + (−g)(f − 1)2 /2 ∈ I.
32
2 Positive Polynomials
Lemma 2.18. If Q ⊂ R[x] is a maximal proper quadratic module then Q ∪ −Q = R[x]. Proof. Let f ∈ R[x] and assume that f 6∈ Q ∪ −Q. As Q is maximal, the quadratic modules Q + f Σ[x] and Q − f Σ[x] are not proper; that is there exists g1 , g2 ∈ Q and s1 , s2 ∈ Σ[x] such that −1 = g1 + s1 f and −1 = g2 − s2 f . Multiplying the first by s2 and the second by s1 yield s1 + s2 + s1 g1 + s2 g2 = 0. Therefore s1 , s2 ∈ I := Q ∩ −Q. But then s1 f ∈ I because I is an ideal and so we obtain the contradiction −1 = g1 +s1 f ∈ Q.
Lemma 2.19. Let Q ⊂ R[x] be an Archimedean maximal proper quadratic module, let I := Q ∩ −Q, and let f ∈ R[x]. Then there is a unique a ∈ R such that f − a ∈ I. Proof. As Q is Archimedean both sets A := {a ∈ R : f − a ∈ Q} and B := {b ∈ R : b − f ∈ Q} are not empty. It suffices to show that A ∩ B is a singleton. Being proper, Q does not contain any negative number and therefore a ≤ b for all a ∈ A, b ∈ B. Set a0 := sup{a : a ∈ A} and b0 := sup{b : b ∈ B} so that a0 ≤ b0 . In fact a0 = b0 because if a0 < c < b0 then f − c 6∈ Q ∪ −Q, which contradicts Lemma 2.18. Then it suffices to show that a0 ∈ A and b0 ∈ B in which case A ∩ B = a0 . Assume that a0 6∈ A, i.e., f − a0 6∈ Q. Then by maximality of Q, the quadratic module Q0 := Q + (f − a0 )Σ[x] cannot be proper and so −1 = g + (f − a0 )s for some g ∈ Q and s ∈ Σ[x]. As Q is Archimedean, choose N ∈ N such that N − s ∈ Q and ∈ R such that 0 < < N −1 . As a0 − ∈ A one has f − (a0 − ) ∈ Q and so −1 + s = g + (f − a0 + )s ∈ Q and (N − s) ∈ Q. Adding up yields the contradiction N − 1 ∈ Q and N − 1 < 0. Therefore a0 ∈ A and similar arguments also yield b0 ∈ B, the desired result. So now let f ∈ R[x] be (strictly) positive on K. We will prove that f ∈ Q(g). Proposition 2.20. There exists s ∈ Σ[x] such that sf ∈ 1 + Q(g). Proof. We shall prove that the quadratic module Q0 := Q(g) − f Σ[x] is not proper. Assume that Q0 is proper. By Zorn’s lemma one may extend Q0 to a maximal proper quadratic module Q ⊃ Q0 and Q is Archimedean because Q ⊃ Q(g). By Lemma 2.19 there exists a ∈ Rn such that xi − ai ∈ I := Q ∩ −Q for every i = 1, . . . , n. Since I is an ideal, f − f (a) ∈ I for
2.5 Polynomials Positive on a Compact Basic Semi-algebraic Set
33
any f ∈ R[x]. Choosing f := gj yields that gj (a) = gj − (gj − gj (a)) ∈ Q because gj ∈ Q(g) ⊂ Q and −(gj − gj (a)) ∈ Q. This implies gj (a) ≥ 0 and as j was arbitrary, a ∈ K. Finally, −f (a) = (f − f (a)) − f ∈ Q because f − f (a) ∈ I ⊂ Q and −f ∈ Q0 ⊂ Q; therefore −f (a) ≥ 0, the desired result. Proposition 2.21. There exist h ∈ Q(g) and N ∈ N such that N −h ∈ Σ[x] and hf ∈ 1 + Q(g). Proof. Choose s ∈ Σ[x] as in Proposition 2.20. Using Theorem 2.15(d) there exists k ∈ N such that 2k − s, 2k − s2 f − 1 ∈ Q(g). Let h := s(2k − s) and N := k 2 ; then h ∈ Q(g) and N − h = k 2 − 2sk + s2 = (k − s)2 ∈ Σ[x]. In addition, hf − 1 = s(2k − s)f − 1 = 2k(sf − 1) + (2k − s2 f − 1) ∈ Q(g) because sf − 1, 2k − s2 f − 1 ∈ Q(g). To conclude, choosing h, N as in Proposition 2.21 with N > 0 and k ∈ N such that k + f ∈ Q(g), one obtains (k −
1 1 )+f = ((N − h)(k + f ) + (hf − 1) + kh) ∈ Q(g). N N
Iterate this process with (k − N −1 ) in lieu of k to obtain that (k − 2N −1 ) + f ∈ Q(g). After kN iterations of this process one obtains the desired result f ∈ Q(g). 2.5.2
A matrix version of Putinar’s Positivstellensatz P For f ∈ R[x] written x 7→ f (x) = α fα xα , recall that kf k0 = max |fα | α
α1 ! · · · α n ! . |α|!
(2.14)
Definition 2.2. (a) The norm of a real symmetric matrix-polynomial F ∈ R[x]p×p is defined by kFk := maxkξk=1 kξ 0 F(x)ξk0 . (b) A real symmetric matrix-polynomial F ∈ R[x]p×p is said to be a sum of squares (in short s.o.s.) if F = L L0 for some q ∈ N and some real matrix-polynomial L ∈ R[x]p×q . Let I denote the p × p identity matrix.
2 Positive Polynomials
34
Theorem 2.22. Let K ⊂ Rn be the basic semi-algebraic set in (2.10) and let Assumption 2.1 hold. Let F ∈ R[x]p×p be a real symmetric matrix-polynomial of degree d. If for some δ > 0, F(x) δ I for all x ∈ K, then F(x) = F0 (x) +
m X
Fi (x) gj (x)
(2.15)
j=1
for some s.o.s. matrix-polynomials (Fj )m j=0 , and c 2 d kFk . deg F0 , deg F1 g1 , . . . deg Fm gm ≤ c d n δ Obviously, Theorem 2.22 is a matrix-polynomial analogue of Theorem 2.14. In fact, one might have characterized the property F(x) δ I on K by using Theorem 2.14 as follows. Let S := {ξ ∈ Rp : ξ 0 ξ = 1} and notice that the compact basic semi-algebraic set K × S ⊂ Rn × Rp satisfies Assumption 2.1 whenever K does. If f ∈ R[x, ξ] denotes the polynomial (x, ξ) 7→ ξ 0 F(x)ξ, then F δI on K
⇔
f ≥ δ
on K × S,
and so by Theorem 2.14, F δI on K implies ξ 0 F(x)ξ = σ0 (x, ξ) +
m X j=1
σj (x, ξ) gj (x) + σm+1 (x, ξ) (ξ 0 ξ − 1),
(2.16)
for some s.o.s. polynomials (σ0 )m 0 ⊂ Σ[x, ξ] and some polynomial σm+1 ∈ R[x, ξ]. However, in general, the degree of weights in the representation (2.15) and (2.16) is not the same. It is not clear whether one should be preferred. 2.5.3
An alternative representation
We next present an alternative representation not based on s.o.s. polynomials. We make the following assumption: Assumption 2.2. The set K in (2.10) is compact and the polynomials (1, gj )m j=1 generate the algebra R[x].
2.5 Polynomials Positive on a Compact Basic Semi-algebraic Set
35
Note that if (1, gj )m j=1 do not generate R[x], we add redundant inequalities as follows. Let xk ≤ min{xk : x ∈ K} for all k = 1, . . . , n. Then, with gm+k (x) := xk − xk , we introduce the additional (redundant) constraints gm+k (x) ≥ 0, k = 1, . . . , n, in the definition (2.10) of K, and reset m := m + n. With this new equivalent definition, Assumption 2.2 holds. For every j = 1, . . . , m, let g j := maxx∈K gj (x) (well-defined because K is compact), and let (b g j )m j=1 be the polynomials gj ’s, normalized with respect to K, that is, for j = 1, . . . , m ( gj /gj , if g j > 0, gbj := (2.17) gj , if g j = 0.
We next let G := (0, 1, b g1 , . . . , gbm , 1 − b g1 , . . . , 1 − b gm) ⊂ R[x], and let ∆G ⊂ R[x] be the set of all products of the form q1 · · · qk , for polynomials (qj )kj=1 ⊂ G, and integer k ≥ 1. Denote by CG the cone generated by ∆G , i.e., f ∈ CG if X αm f= cαβ gb1α1 · · · gbm (1 − gb1 )β1 · · · (1 − gbm )βm , α,β∈Nm
for finitely many nonnegative coefficients (cαβ ) ⊂ R+ (and with gbk0 = 1), or using the vector notation 1 − gb1 gb1 b = ... , b = ... , 1 − g g gbm
f ∈ CG if f has the compact form f =
X
α,β∈Nm
1 − gbm
bα (1 − g b )β . cαβ g
(2.18)
Equivalently, f ∈ CG if f is a polynomial of R[b g1 , . . . , gbm , 1−b g1, . . . , 1−b gm], with nonnegative coefficients. n Theorem 2.23. Let (gi )m i=1 ⊂ R[x], K ⊂ R be as in (2.10). Under Assumption 2.2, if f ∈ R[x] is strictly positive on K, then f ∈ CG . Equivalently, (2.18) holds for finitely many nonnegative coefficients (cαβ )α,β∈Nm ∈ R+ .
36
2 Positive Polynomials
In contrast to Theorems 2.13 and 2.14, the representation (2.18) involves some nonnegative scalar coefficients (cαβ ) rather than s.o.s. polynomials in Σ[x]. Determining if f > 0 on K using Theorem 2.23 leads to a linear optimization feasibility problem for which extremely efficient software packages are available. On the other hand, it involves products of arbitrary powers of the gi ’s and (1 − gj )’s, a highly undesirable feature. In particular, the presence of large binomial coefficients is source of ill-conditioning and numerical instability. The case of polytopes If K is a convex polytope then Theorem 2.23 simplifies and we obtain a generalization to a polytope in Rn of Theorem 2.8 for [−1, 1] ⊂ R. Theorem 2.24. Let gj ∈ R[x] be affine for every j = 1, . . . , m and assume that K in (2.10) is compact with a nonempty interior. If f ∈ R[x] is strictly positive on K then X αm f = cα g1α1 · · · gm , (2.19) α∈Nm
for finitely many nonnegative scalars (cα ). Notice that Theorem 2.24 is of the same flavor as Theorem 2.23, except it does not require to introduce the polynomials 1 − gj /gj , j = 1, . . . , m. Remarks There are three features that distinguish the case n > 1 from the case n = 1 treated in the previous section. (a) Theorems 2.13, 2.14, and 2.23, all deal with compact sets K, whereas Theorem 2.6 can handle the (non compact) interval [0, ∞). (b) In Theorems 2.13, 2.14, and 2.23, f is restricted to be strictly positive, instead of nonnegative in Theorem 2.6. (c) In Theorems 2.13, 2.14, and 2.23, nothing is said on the degree of the polynomials involved in (2.11), (2.13), or in (2.18), whereas in Theorem 2.6, the degree is bounded and known in advance. In fact, bounds exist for the representations (2.11) and (2.13); see e.g. Theorem 2.16. However they are not practical from a computational viewpoint. This is the reason why
2.6 Polynomials Nonnegative on Real Varieties
37
Theorems 2.13, 2.14, and 2.23 do not lead to a polynomial time algorithm to check whether a polynomial f is positive on K. 2.6
Polynomials Nonnegative on Real Varieties
In this section, we introduce representations of polynomials on a real variety. The first result considers an arbitrary real variety, whereas the second considers a finite variety associated with a zero-dimensional ideal I of R[x] which is radical. A brief background on basic definitions and results of algebraic geometry can be found in Appendix A. Let V ⊂ Rn be the real variety defined by: V := { x ∈ Rn :
gj (x) = 0,
j = 1, . . . , m },
(2.20)
for some family of real polynomials (gj ) ⊂ R[x], and given f ∈ R[x] and > 0, let fr ∈ R[x] be the polynomial: fr := f +
r X n X x2k i
k=0 i=1
k!
,
≥ 0,
r ∈ N.
(2.21)
Theorem 2.25. Let V ⊂ Rn be as in (2.20), and let f ∈ R[x] be nonnegative on V . Then, for every > 0, there exists r ∈ N and nonnegative scalars (λj )m j=1 , such that, for all r ≥ r , fr +
m X
λj gj2
is s.o.s.
(2.22)
j=1
In addition, kf − fr k1 →0, as ↓ 0 (and r ≥ r ). Theorem 2.25 is a denseness result, the analogue for varieties of Theorem 2.4(b) for V = Rn , and provides a certificate of nonnegativity of f on V . Notice that in contrast with Theorems 2.13 and 2.14 (letting an equality constraint being two reverse inequality constraints) Theorem 2.25 makes no assumption on the variety V ; in addition one has scalar multipliers (λj } in (2.22) instead of s.o.s. multipliers in (2.11) or (2.13). On the other hand, the former theorems state that if V is compact and f is nonnegative on V , then f + = f/n,0 has the sum of squares representation (2.11) (or (2.13)), and kf − f/n0 k∞ →0 as ↓ 0, instead of the weaker kf − fr k1 ↓ 0 in Theorem 2.25.
38
2 Positive Polynomials
Polynomials nonnegative on a finite variety We recall that a zero-dimensional ideal I ⊂ R[x] is an ideal2 such that the associated variety VC (I) := { x ∈ Cn : g(x) = 0 ∀g ∈ I } is finite. In such a case, the quotient ring R[x]/I is a finite-dimensional R-vector space whose dimension is larger than |VC (I)| and equal to |VC (I)| if and only if I is radical. This is an important special case when one deals with discrete sets as in discrete optimization; for instance when the set K ⊂ Rn consists of the grid points (xij ) that are solutions of the polynomial equations 2ri Y K = x ∈ Rn : (xi − xij ) = 0; i = 1, . . . , n . (2.23) j=1
Binary optimization deals with the case when ri = 1 for all i = 1, . . . , n and xij ∈ {0, 1}. Theorem 2.26. Let I = hg1 , . . . , gm i be a zero-dimensional ideal of R[x] with associated (finite) variety VC (I) ⊂ Cn . Assume that I is radical √ (I = I), and let S := VC (I) ∩ Rn . Let (hi )ri=1 ⊂ R[x] and K := {x ∈ S :
hi (x) ≥ 0,
i = 1, . . . , r}.
(2.24)
Then f ∈ R[x] is nonnegative on K if and only if f = f0 +
r X
f i hi +
i=1
m X
vi gi
(2.25)
j=1
where (fi )ri=0 ⊂ Σ[x] and (vi )m i=1 ⊂ R[x]. In the case where K is the set (2.23), the degree of the polynomials (fi ) Pn is bounded by ( i=1 ri ) − n. For finite varieties, observe that Theorem 2.26 provides a representation result stronger than that of Theorem 2.14. In particular, f is not required to be strictly positive and sometimes, a degree bound is also available. 2 For definitions and properties of ideal, radical ideal, etc. the reader is referred to Section A.2.
2.7 Representations with Sparsity Properties
2.7
39
Representations with Sparsity Properties
In this section, we introduce sparse representations for polynomials f nonnegative on a basic semi-algebraic set K, when there is weak coupling between some subsets of variables in the polynomials gj that define the set K, and f . By weak coupling (to be detailed later) we mean that (a) each polynomial in the definition of K contains a few variables only, and (b), the polynomial f is a sum of polynomials, each containing also a few variables only. This sparse representation is computationally important as it translates to smaller semidefinite programs for computing the s.o.s. polynomials that define the representation. In fact, given the current state of semidefinite optimization, it is absolutely critical to exploit sparsity in order to solve problems involving a large number of variables. We first consider the simple case of three sets of variables. Denote by R[x, y, z] the ring of real polynomial in the variables (x1 , . . . , xn ), (y1 , . . . , ym ) and (z1 , . . . , zp ). Let Kxy ⊂ Rn+m , Kyz ⊂ Rm+p , and K ⊂ Rn+m+p be the compact basic semi-algebraic sets defined by: Kxy = { (x, y) ∈ Rn+m : Kyz = { (y, z) ∈ R
m+p
K = { (x, y, z) ∈ R
gj (x, y) ≥ 0,
:
n+m+p
hk (y, z) ≥ 0,
j ∈ Ixy }
(2.26)
k ∈ Iyz }
: (x, y) ∈ Kxy ;
(2.27)
(y, z) ∈ Kyz } (2.28)
for some polynomials (gj ) ⊂ R[x, y], (hk ) ⊂ R[y, z], and some finite index sets Ixy , Iyz ⊂ N. Denote by Σ[x, y] (resp. Σ[y, z]) the set of sums of squares in R[x, y] (resp. R[y, z]). Let P (g) ⊂ R[x, y] and P (h) ⊂ R[y, z] be the preorderings generated by (gj )j∈Ixy and (hk )k∈Iyz , respectively, that is P (g) =
P (h) =
X
J⊆Ixy X
σJ
Y
j∈J
σJ
J⊆Iyz
Y
k∈J
gj : σJ ∈ Σ[x, y] hk
!
: σJ ∈ Σ[y, z]
.
Similarly, let Q(g) ⊂ R[x, y] and Q(h) ⊂ R[y, z] be the quadratic modules X Q(g) = σ0 + σj gj : σ0 , σj ∈ Σ[x, y] j∈Ixy
2 Positive Polynomials
40
Q(h) =
σ0 +
X
σk h k :
k∈Iyz
σ0 , σk ∈ Σ[y, z]
.
Theorem 2.27. Let Kxy ⊂ Rn , Kyz ⊂ Rm , and K ⊂ Rn+m+p be the compact basic semi-algebraic sets defined in (2.26)-(2.10), and assume that K has nonempty interior. Let f ∈ R[x, y] + R[y, z]. (a) If f is positive on K, then f ∈ P (g) + P (h). (b) If (x, y) 7→ N − k(x, y)k2 ∈ Q(g) and (y, z) 7→ N − k(y, z)k2 ∈ Q(h) for some scalar N , and if f is positive on K, then f ∈ Q(g) + Q(h). Let us contrast Theorem 2.14 with Theorem 2.27(b). Theorem 2.14 gives the representation: X X f = p0 + pj gj + q k hk j
k
with p0 , pj , qk ∈ Σ(x, y, z), while Theorem 2.27(b) gives the representation X X f = p0 + pj gj + q 0 + q k hk , j
k
with p0 , pj ∈ Σ(x, y), and q0 , qk ∈ Σ(y, z). This sparser representation leads to semidefinite optimization problems of significantlty lower dimension, which in turn lead to significant computational advantages. In other words, Theorem 2.27 implies that the absence of coupling between the two sets of variables x and z in the original data f, gj , hk , is also reflected in the specialized sparse representations of Theorem 2.27(a)-(b). Example 2.3. Let K = {(x, y, z)| 1 − x2 − y 2 ≥ 0, 1 − y 2 − z 2 ≥ 0} and f (x, y, z) = 1 + x2 y 2 − x2 y 4 + y 2 + z 2 − y 2 z 4 . Theorem 2.27(b) allows the sparse representation f = 1 + x4 y 2 + y 4 z 2 + x2 y 2 (1 − x2 − y 2 ) + y 2 z 2 (1 − y 2 − z 2 ), with p0 (x, y) = 1 + x4 y 2 , p1 (x, y) = x2 y 2 , q0 (y, z) = y 4 z 2 and q1 (y, z) = y 2 z 2 . Note that p0 , p1 ∈ Σ(x, y) and q0 , q1 ∈ Σ(y, z). We next extend Theorem 2.27 to more general sparsity patterns, when f and gj satisfy certain sparsity conditions themselves. With R[x] = R[x1 , . . . , xn ], let I0 := {1, . . . , n} be the union ∪pk=1 Ik of p subsets Ik , k = 1, . . . , p (with possible overlaps). Let nk = |Ik |. Let R[x(Ik )] denote
2.7 Representations with Sparsity Properties
41
the ring of polynomials in the nk variables x(Ik ) = {xi : i ∈ Ik }, and so R[x(I0 )] = R[x]. Assumption 2.3. Let K ⊂ Rn be as in (2.10). A scalar M > 0 is known and such that kxk∞ < M for all x ∈ K. P 2 2 Note that under Assumption 2.3, we have i∈Ik xi ≤ nk M , k = 1, . . . , p, and therefore, in the Definition (2.10) of K, we add the p redundant quadratic constraints X 0 ≤ gm+k (x) (:= nk M 2 − x2i ), k = 1, . . . , p, (2.29) i∈Ik
and set m0 = m + p, so that K is now defined by: K := {x ∈ Rn :
gj (x) ≥ 0,
j = 1, . . . , m0 }.
(2.30)
Note that gm+k ∈ R[x(Ik )], for all k = 1, . . . , p. Assumption 2.4. Let K ⊂ Rn be as in (2.30). The index set J = {1, . . . , m0 } is partitioned into p disjoint sets Jk , k = 1, . . . , p, and the collections {Ik } and {Jk } satisfy: (a) For every j ∈ Jk , gj ∈ R[x(Ik )], that is, for every j ∈ Jk , the constraint gj (x) ≥ 0 only involves the variables x(Ik ) = {xi : i ∈ Ik }. (b) The objective function f ∈ R[x] can be written as f =
p X
k=1
fk ,
with fk ∈ R[x(Ik )],
k = 1, . . . , p.
(2.31)
The main result about sparsity is as follows.
Theorem 2.28. Let K ⊂ Rn be as in (2.30) (i.e. K as in (2.10) with the additional redundant quadratic constraints (2.29)). Let Assumptions 2.3 and 2.4 hold and in addition, assume that for every k = 1, . . . , p − 1, Ik+1 ∩ ∪kj=1 Ij ⊆ Is for some s ≤ k. (2.32) If f ∈ R[x] is strictly positive on K, then p X X qk + f = qjk gj , k=1
(2.33)
j∈Jk
for some sums of squares polynomials (qk , qjk ) ⊂ R[x(Ik )], k = 1, . . . , p.
2 Positive Polynomials
42
The key property (2.32) that allows the sparse representation (2.33) is called the running intersection property. Under this property the absence of coupling of variables in the original data is preserved in the representation (2.33). So Theorem 2.28 provides a representation that is more specific than that of Theorem 2.14. Let us illustrate Theorem 2.28 with an example. Example 2.4. Let x = (x1 , . . . , x5 ) and x 7→ g1 (x) = 1 − x21 − x22 x 7→ g2 (x) = 1 − x42 x43 − x23
x 7→ g3 (x) = 1 − x23 − x24 − x25
x 7→ f (x) = 1 + x21 x22 + x22 x23 + x23 − x23 x24 − x23 x25 K = {x :
g1 (x1 , x2 ), g2 (x2 , x3 ), g3 (x3 , x4 , x5 ) ≥ 0}.
Then I1 = {1, 2} meaning that the polynomial g1 only involves variables x1 , x2 , I2 = {2, 3} and I3 = {3, 4, 5}. Moreover, f = f1 + f2 + f3 , with f1 = 1 + x21 x22 , f2 = x22 x23 and f3 = x23 − x23 x24 − x23 x25 . Let us check property (2.32). For k = 1, I2 ∩ I1 = {2} ⊂ I1 . For k = 2, I3 ∩ (I1 ∪ I2 ) = {3} ⊂ I2 and thus property (2.32) holds. Thus, Theorem 2.28 allows the sparse representation f (x) = 1 + x41 x22 + x21 x42 + x21 x22 g1 (x) +x62 x63 + x22 x43 + x22 x23 g2 (x) +x43 + x23 g3 (x).
2.8
Representation of Convex Polynomials
If C ⊆ Rn is a nonempty convex set, a function f : C→R is convex on C if f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y),
∀ λ ∈ (0, 1), x, y ∈ C.
Similarly, f is strictly convex on C if the above inequality is strict for every x, y ∈ C, x 6= y, and all λ ∈ (0, 1). If C ⊆ Rn is an open convex set and f is twice differentiable on C, then f is convex on C if and only if its Hessian3 ∇2 f is positive semidefinite on C (denoted ∇2 f 0 on C). Finally, if ∇2 f is positive definite on C (denoted ∇2 f 0 on C) then f is strictly convex on C. 3 Recall that the Hessian ∇2 f (x) is the n × n symmetric matrix whose entry (i, j) is ∂ 2 f /∂xi ∂xj evaluated at x.
2.8 Representation of Convex Polynomials
43
Definition 2.3. A polynomial f ∈ R[x]2d is said to be s.o.s.-convex if its Hessian ∇2 f is a s.o.s. matrix polynomial, that is, ∇2 f = LL0 for some real matrix polynomial L ∈ R[x]n×s (for some s ∈ N). Pn Example 2.5. A polynomial f ∈ R[x] is said to be separable if f = i=1 fi for some univariate polynomials fi ∈ R[xi ]. And so, every convex separable polynomial f is s.o.s.-convex because its Hessian ∇2 f (x) is a positive 00 semidefinite diagonal matrix, with all diagonal entries (fi (xi ))i , nonnega00 tive for all x ∈ Rn . Hence, being univariate and nonnegative, fi is s.o.s. for every i = 1, . . . , n, from which one easily concludes that f is s.o.s.-convex. An important feature of s.o.s.-convexity is that it can be can be checked numerically by solving a semidefinite program. Of course every s.o.s.-convex polynomial is convex. Also, the s.o.s.-convex polynomials have the following interesting property.
Lemma 2.29. If a symmetric matrix polynomial P ∈ R[x]r×r is s.o.s. then the double integral Z 1Z t (x, u) 7→ F(x, u) := P(u + s(x − u)) ds dt (2.34) 0
0
is also a symmetric s.o.s. matrix polynomial F ∈ R[x, u]r×r . Proof. Writing P = (pij )1≤i,j≤n with pij ∈ R[x] for every 1 ≤ i, j ≤ n, let d := maxi,j deg pij . With x, u fixed, introduce the univariate matrix polynomial s 7→ Q(s) := P(u + s(x − u)) so that Q = (qij ) where qij ∈ R[s] has degree at most d for every 1 ≤ i, j ≤ n. Observe that Z 1Z t Z 1Z t Z P(u + s(x − u)) dsdt = Q(s) dsdt = Q(s) dµ, 0
0
0
0
∆
where µ is uniformly distibuted on the set ∆ := {(s, t) : 0 ≤ t ≤ 1; 0 ≤ s ≤ t}. By Tchakaloff’s theorem (see Tchakaloff (1957) and Theorem B.12), there exists a measure ϕ finitely supported on, say m, points {(sk , tk )}m k=1 ⊂ ∆ whose moments up to degree d match exactly those of µ. That is, there exist positive weights hk , k = 1, . . . , m, such that Z m X f dµ = hk f (sk , tk ), ∀ f ∈ R[s, t], deg f ≤ d. ∆
k=1
2 Positive Polynomials
44
So let x, u ∈ Rn be fixed. If P = LL0 for some L ∈ R[x]r×q , one obtains: Z
1 0
Z
t 0
P(u + s(x − u)) dsdt = =
m X
k=1 m X k=1
hk Q(sk ) hk L(u + sk (x − u)) L0 (u + sk (x − u))
= AA0 for some A ∈ R[x, u]r×mq .
And as a consequence: Lemma 2.30. For a polynomial f ∈ R[x] and every x, u ∈ Rn : f (x) = f (u) + ∇f (u)0 (x − u) Z 1Z t + (x − u)0 ∇2 f (u + s(x − u))dsdt (x − u). 0 0 | {z } F(x,u)
And so if f is s.o.s.-convex and f (u) = 0, ∇f (u) = 0, then f is a s.o.s. polynomial. Let K be the basic semi-algebraic set in (2.10), Q(g) be as in (2.12) and define Qc (g) ⊂ Q(g) to be the set: Qc (g) :=
σ0 +
m X j=1
λj g j :
λ ∈ Rm + ; σ0 ∈ Σ[x], σ0 convex
. (2.35)
The set Qc (g) is a specialization of the quadratic module Q(g) to the convex case, in that the weights associated with the gj ’s are nonnegative scalars, i.e., s.o.s. polynomials of degree 0, and the s.o.s. polynomial σ0 is convex. In particular, every f ∈ Qc (g) is nonnegative on K. Let FK ⊂ R[x] be the convex cone of convex polynomials nonnegative on K. Recall that kf k 1 P denote the l1 -norm of the vector of coefficients, i.e., kf k1 = α |fα |.
2.8 Representation of Convex Polynomials
45
Theorem 2.31. Let K be as in (2.10), Slater’s condition4 hold and gj be concave for every j = 1, . . . , m. Then with Qc (g) as in (2.35), the set Qc (g) ∩ FK is dense in FK for the l1 -norm k · k1 . In particular, if K = Rn (so that FRn =: F is now the set of nonnegative convex polynomials), then Σ[x] ∩ F is dense in F. Theorem 2.31 states that if f is convex and nonnegative on K (including the case K ≡ Rn ) then one may approximate f by a sequence (fr ) ⊂ Qc (g) ∩ FK with kf − fr k1 →0 as →0 (and r→∞). For instance, with r0 := b(deg f )/2c + 1, the polynomial fr may be defined as x 7→ fr (x) := f +(θr0 (x)+θr (x)) with θr as in (2.6). Observe that Theorem 2.31 provides f with a certificate of nonnegativity on K. Indeed, let x ∈ K be fixed arbitrary. Then as fr ∈ Qc (g) one has fr (x) ≥ 0. Letting ↓ 0 yields 0 ≤ lim→0 fr (x) = f (x). And as x ∈ K was arbitray, f ≥ 0 on K. For the class of s.o.s.-convex polynomials, we have the more precise result: Theorem 2.32. Let K be as in (2.10) and Slater’s condition hold. Let f ∈ R[x] be such that f ∗ := inf x {f (x) : x ∈ K} = f (x∗ ) for some x∗ ∈ K. ∗ If f and (−gj )m j=1 are s.o.s.-convex then f − f ∈ Qc (g). Proof. As Slater’s condition holds, there exists a vector of LagrangeKKT multipliers λ ∈ Rm + such that the Lagrangian polynomial Lf := f − f ∗ −
m X
λj g j
(2.36)
j=1
is nonnegative and satisfies Lf (x∗ ) = 0 as well as ∇Lf (x∗ ) = 0; see Section C.1. Moreover, as f and (−gj ) are s.o.s.-convex then so is Lf because P ∇2 Lf = ∇2 f − j λj ∇2 gj with λ ≥ 0. Therefore, by Lemma 2.30, Lf ∈ Σ[x], i.e., f − f∗ −
m X
λj g j = f 0 ,
j=1
for some f0 ∈ Σ[x]. As Lf is convex then so is f0 and so f − f ∗ ∈ Qc (g). 4 Slater’s considion holds if there exists x ∈ K such that g (x ) > 0 for every 0 0 j j = 1, . . . , m.
2 Positive Polynomials
46
Hence the class of s.o.s.-convex polynomials is very interesting because a nice representation result is available. Notice that the above representation holds with f ≥ 0 on K and K is not required to be compact. Another interesting case is when the Hessian of f is positive definite on K.
Theorem 2.33. Let K be as in (2.10) and let Assumption 2.1 and Slater’s condition both hold. Let gj be concave, j = 1, . . . , m, and let f ∈ R[x] be convex and such that ∇2 f 0 on K. If f ≥ 0 on K then f ∈ Q(g), i.e., f = f0 +
m X
fj gj ,
(2.37)
j=
for some s.o.s. polynomials (fj )m j=0 ⊂ Σ[x].
Proof. Under Assumption 2.1 the set K is compact. Hence, let f ∗ := minx∈K f (x) and let x∗ ∈ K be a minimizer. As f, −gj are convex, j = 1, . . . , m, and Slater’s condition holds, there exists a vector of nonnegative multipliers λ ∈ Rm + such that the Lagrangian polynomial Lf in (2.36) is nonnegative on Rn with Lf (x∗ ) = 0 and ∇Lf (x∗ ) = 0. Next, as ∇2 f 0 and −∇2 gj 0 on K, ∇2 Lf 0 on K; this is because K is compact, ∇2 Lf ∇2 f , and the smallest eigenvalue of ∇2 Lf is a continuous function of x. Hence there is some δ > 0 such that ∇2 Lf (x) δIn for all x ∈ K (where In denotes the n × n identity matrix). Next, by Lemma 2.30, Lf (x) = h(x − x∗ ), F(x, x∗ )(x − x∗ )i where F(x, x∗ ) is the matrix polynomial defined by:
x 7→
F(x, x∗ ) :=
Z
1 0
Z
t 0
∇2 Lf (x∗ + s(x − x∗ )) ds dt .
As K is convex, x∗ +s(x−x∗ ) ∈ K for all s ∈ [0, 1], and so for every ξ ∈ Rn , 0
∗
ξ F(x, x )ξ ≥ δ
Z
1 0
Z
t 0
ξ 0 ξdsdt =
δ 0 ξ ξ. 2
2.9 Summary
Therefore, F(x, x∗ )
δ 2
47
In for every x ∈ K, and by Theorem 2.22,
F(x, x∗ ) = F0 (x) +
n X
Fj (x) gj (x),
j=1
for some real symmetric s.o.s. matrix polynomials (Fj ) ⊂ R[x]n×n . Therefore, Lf (x) = hx − x∗ , F(x, x∗ )(x − x∗ )i , n X = hx − x∗ , F0 (x)(x − x∗ )i + gj (x) hx − x∗ , Fj (x)(x − x∗ )i j=1
= σ0 (x) +
n X
σj (x) gj (x)
j=1
for some s.o.s. polynomials (σj )m j=0 ⊂ Σ[x]. Hence, recalling the definition of Lf , f (x) = f ∗ + Lf (x) +
m X
λj gj (x)
j=1 m X
= (f ∗ + σ0 (x)) +
(λj + σj (x)) gj (x),
j=1
that is, f ∈ Q(g) because f ∗ ≥ 0 and λ ≥ 0.
x ∈ Rn ,
So for the class of compact basic semi-algebraic sets K defined by concave polynomials, one obtains a refinement of Putinar’s Positivstellensatz Theorem 2.14, for the class of convex functions whose Hessian is positive definite on K, and that are only nonnegative on K.
2.9
Summary
In this chapter, we have presented representation results for positive polynomials, and polynomials positive on semi-algebraic sets or varieties. Once we assume a degree for the polynomials involved, these representations can be computed via semidefinite or linear optimization methods, and thus their computation can be done efficiently. Especially important is Theorem 2.28 that allows the use of sparse semidefinite relaxations of Chapter 5 for polynomial optimization problems with a large number of variables
2 Positive Polynomials
48
and a sparsity pattern that satisfies Assumption 2.4. Also a nice and simple representation result holds for the very interesting class of s.o.s.-convex polynomials. 2.10
Exercises
Exercise 2.1. (Goursat’s Lemma) With f ∈ R[x] and deg f = m, the polynomial fˆ 1−x m ˆ , x 7→ f (x) := (1 + x) f 1+x is called the Goursat transform of f . (a) What is the Goursat transform of fˆ? (b) Prove that f > 0 on [−1, 1] if and only if fˆ > 0 on [0, ∞) and degree fˆ = m. (c) f ≥ 0 on [−1, 1] if and only if fˆ ≥ 0 on [0, ∞) and degree fˆ ≤ m. Exercise 2.2. Consider a polynomial h ∈ R[x] with [−1, 1] = {x ∈ R : h(x) ≥ 0}, that is, h is not necessarily the polynomial x 7→ g(x) := (1 + x)(1 − x). Prove that if p > 0 on [−1, 1], then p = s0 + hs1 for some s0 , s1 ∈ Σ[x]. Exercise 2.3. Given a polynomial p ∈ R[x], use Theorem 2.8 to show that the problem of checking whether p > 0 on [−1, 1] reduces to a linear optimization feasibility problem. Exercise 2.4. Show that the following polynomials are nonnegative but not s.o.s. (a) p(x1 , x2 ) = x21 x22 (x21 + x22 − 1) + 1. (b) p(x1 , x2 , x3 ) = x21 x22 (x21 + x22 − 3x23 ) + 6x63 . Exercise 2.5. Show that the following classes of polynomials have the property that if they are nonnegative, then they are s.o.s. (a) Quadratic polynomials (b) Fourth degree homogeneous polynomials on 3 variables. Exercise 2.6. Let A ∈ Rm×n , b ∈ Rm . Show that a Farkas certificate of ∅ = {x : Ax = b; x ≥ 0} is a particularly simple Stengle certificate (2.9). Exercise 2.7. Prove Theorem 2.12 by using Theorem 2.11
2.11 Notes and Sources
49
P Exercise 2.8. Let f ∈ R[x] be separable, i.e., x 7→ f (x) = ni=1 fi (xi ) for some univariate polynomials fi ∈ R[xi ], i = 1, . . . , n. Show that if fi is convex for every i = 1, . . . , n, then f is s.o.s.-convex.
2.11
Notes and Sources
2.2. For a nice exposition on degree bounds for Hilbert’s 17th problem see Schmid (1998). Theorem 2.3 is from Blekherman (2006) whereas Theorem 2.4(a) is from Lasserre and Netzer (2007) and provides an explicit construction of an approximating sequence of sums of squares for the denseness result of Berg (1987). Theorem 2.4(b) is from Lasserre (2006e) and Theorem 2.2 is from Lasserre (2007b). 2.3. Most of the material for the one-dimensional case is taken from Powers and Reznick (2000). Theorem 2.6(a) is due to Fekete (1935) whereas Theorem 2.6(b) is attributed F. Luck` acs. Theorem 2.7 is from P´ olya and Szeg¨ o (1976). Theorem 2.8 is due to Hausdorff (1915) and Bernstein (1921). 2.4. The important Theorem 2.9 is due to P´ olya (1974),. The Positivstellensatz (Theorem 2.12) is credited to Stengle (1974) but was proved earlier by Krivine (1964a). For a complete and recent exposition of representation of positive polynomials, the reader is referred to Prestel and Delzell (2001); see also Kuhlmann et al. (2005) and the more recent Helton and Putinar (2007), Scheiderer (2008) and Marshall (2008). For a nice discussion on historical aspects on Hilbert’s 17th problem the reader is referred to Reznick (2000). 2.5. Theorem 2.13 is due to Schm¨ udgen (1991). The machinery uses the spectral theory of self-adjoint operators in Hilbert spaces. The important Theorem 2.14 is due to Putinar (1993) and Jacobi and Prestel (2001), whereas Theorem 2.15 is due to Schm¨ udgen (1991). Concerning degree bounds for s.o.s. terms that appear in those representation results, Theorem 2.16 is from Nie and Schweighofer (2007); see also Marshall (2009b) and Schweighofer (2005a). For the non compact case, negative results are provided in Scheiderer (2008). However, some nice representation results have been obtained in some specific cases; see e.g. Marshall (2009a). Theorem 2.24 for the case of polytopes is due to Cassier (1984) and Handelman (1988) and Theorem 2.23 for compact semi-algebraic sets follows
50
2 Positive Polynomials
from a result due to Krivine (1964a,b), and re-stated later in Becker and Schwartz (1983), and Vasilescu (2003). The proof of Theorem 2.14 due to Schweighofer (2005b) is taken from Laurent (2008) and Proposition 2.21 is from (Marshall, 2008, 5.4.4). Theorem 2.22, the matrix-polynomial version of Putinar’s Positivstellensatz, was first proved in Scherer and Hol (2004) and Kojima and Maramatsu (2007), independently (with no degree bound). The version with degree bound is from Helton and Nie (2010). 2.6. Theorem 2.25 is from Lasserre (2005) whereas Theorem 2.26 is from Parrilo (2002) who extended previous results in Lasserre (2002c) for the grid case. 2.7. Theorem 2.27 is from Lasserre (2006c). Theorem 2.28 was first proved in Lasserre (2006a) under the assumption that the feasible set K has a nonempty interior. This assumption was later removed in Kojima and Maramatsu (2009). For extensions of Theorem 2.28 to some non compact cases, see the recent Kuhlmann and Putinar (2007, 2009). 2.8. Lemma 2.29 and 2.30 are from Helton and Nie (2010) who introduced the very interesting class of s.o.s.-convex polynomials. Theorems 2.31 and 2.32 are from Lasserre (2008a). Some properties of convex polynomials with applications to polynomial programming have been investigated in Andronov et al. (1982) and Belousov and Klatte (2002). 2.9. Exercise 2.1 is from Goursat (1894). Exercise 2.2 is from Powers and Reznick (2000), who also provide a priori bounds on the degrees of s0 , s1 based on the degree of p, h and the smallest absolute value of the roots of p.
Chapter 3
Moments
Most results of this chapter are the dual analogues of those described in Chapter 2. Indeed the problem of representing polynomials that are positive on a set K has a dual facet which is the problem of characterizing sequences of reals that are moment sequences of some finite Borel measure supported on K. Moreover, as we shall see, this beautiful duality is nicely captured by standard duality in convex analysis, applied to some appropriate convex cones of R[x]. We review basic results in the moment problem and also particularize to some specific important cases like in Chapter 2.
Let z ∈ Cn denote the complex conjugate vector of z ∈ Cn . Let {zα zβ }z∈Cn ,α,β∈Nn be the basis of monomials for the ring of polynomials C[z, z] = C[z1 , . . . , zn , z 1 , . . . , z n ] of the 2n variables {zj , z j }, with coefficients in C. Recall that for every z ∈ Cn , α ∈ Nn , the notation zα stands for the monomial z1α1 · · · znαn of C[z]. The support of a Borel measure µ on Rn is a closed set, the complement in Rn of the largest open set O ⊂ Rn such that µ(O) = 0 (and recall that Cn may be identified with R2n ). Definition 3.1. (a) The full moment problem: Let (gi )m i=1 ⊂ C[z, z] be such such that each gi (z, z) is real, and let K ⊂ Cn be the set defined by: K := {z ∈ Cn : gi (z, z) ≥ 0, i = 1, . . . , m}. Given an infinite sequence y = (yαβ ) ⊂ C of complex numbers, with α, β ∈ Nn , is y a K-moment sequence, i.e., does there exist a measure µ supported on K, such that Z yαβ = zα zβ dµ, ∀ α, β ∈ Nn ? (3.1) K
(b) The truncated moment problem: Given (gi ), K ⊂ Cn as above, a 51
3 Moments
52
finite subset ∆ ⊂ Nn × Nn , and a finite sequence y = (yαβ )(α,β)∈∆ ⊂ C of complex numbers, is y a K-moment sequence, i.e., does there exist a measure µ supported on K, such that Z yαβ = zα zβ dµ, ∀ α, β ∈ ∆? (3.2) K
Definition 3.2. In both full and truncated cases, a measure µ as in (3.1) or (3.2), is said to be a representing measure of the sequence y. If the representing measure µ is unique then µ is said to be determinate (i.e., determined by its moments), and indeterminate otherwise.
Example 3.1. For instance, the probability measure µ on the real line R, with density with respect to the Lebesgue measure given by: √ (x 2π)−1 exp (− ln(x)2 /2) if x > 0, x 7→ f (x) := 0 otherwise, and called the log-normal distribution, is not determinate. Indeed, for each a with −1 ≤ a ≤ 1, the probability measure with density x 7→
fa (x) := f (x) [ 1 + a sin(2π ln x) ]
has exactly the same moments as µ. The above moment problem encompasses all the classical onedimensional moment problems of the 20th century. (a) The Hamburger problem refers to K = R and (yα )α∈N ⊂ R. (b) The Stieltjes problem refers to K = R+ , and (yα )α∈N ⊂ R+ . (c) The Hausdorff problem refers to K = [a, b], and (yα )α∈N ⊂ R. (d) The Toeplitz problem refers to K being the unit circle in C, and (yα )α∈Z ⊂ C. In all chapters of this book, we only consider real moment problems, that is, moment problems characterized with: • A set K ⊂ Rn , and • A sequence y = (yα ), α ∈ Nn .
3.1 The One-dimensional Moment Problem
53
The multi-dimensional moment problem is significantly more difficult than the one-dimensional case for which the results are fairly complete. This is because, in view of Theorem 3.1 below, obtaining conditions for a sequence to be moments of a representing measure with support on a given subset Ω ⊆ Rn , is related to characterizing polynomials that are nonnegative on Ω. When the latter characterization is available, it will translate into conditions on the sequence. But as we have seen in Chapter 2, and in contrast to the univariate case, polynomials that are nonnegative on a given set Ω ⊆ Rn have no simple characterization, except for compact basic semi-algebraic sets as detailed in Section 2.5. Thus, for instance, the full multi-dimensional K-moment problem is still unsolved for general sets K ⊂ Cn , including K = C. Before we proceed further, we first state the important Riesz-Haviland theorem. Let y = (yα ) ⊂ R be an infinite sequence, and let Ly : R[x]→R, be the linear functional
f (x) =
X
α∈Nn
f α xα
7→ Ly (f ) =
X
fα yα .
(3.3)
α∈Nn
Theorem 3.1. (Riesz-Haviland) Let y = (yα )α∈Nn ⊂ R and let K ⊂ Rn be closed. There exists a finite Borel measure µ on K such that Z xα dµ = yα , ∀α ∈ Nn , (3.4) K
if and only if Ly (f ) ≥ 0 for all polynomials f ∈ R[x] nonnegative on K. Note that Theorem 3.1 is not very practical as we do not have an explicit characterization of polynomials that are nonnegative on a general closed set K ⊂ Rn . However, we have seen in Chapter 2 some nice representations for the subclass of compact basic semi-algebraic sets K ⊂ Rn . Theorem 3.1 will serve as our primary proof tool in the next sections.
3.1
The One-dimensional Moment Problem
Given an infinite sequence y = (yj ) ⊂ R, introduce the Hankel matrices Hn (y), Bn (y) and Cn (y) ∈ R(n+1)×(n+1) , defined by
3 Moments
54
Hn (y)(i, j) := yi+j−2 ;
Bn (y)(i, j) := yi+j−1 ;
Cn (y)(i, j) := yi+j ,
for all i, j ∈ N, with 1 ≤ i, j ≤ n + 1. The Hankel matrix Hn (y) is the one-dimensional (or univariate) version of what we later call a moment matrix in Section 3.2.1. 3.1.1
The full moment problem
Recall that for any two real-valued square symmetric matrices A, B, recall that the notation A B (resp. A B) stands for A − B being positive semidefinite (resp. A − B being positive definite). For the full Hamburger, Stieltjes, and Hausdorff moment problems, we have:
Theorem 3.2. Let y = (yj )j∈N ⊂ R. Then (a) y has a representing Borel measure µ on R if and only if the quadratic form x 7→ sn (x) :=
n X
yi+j xi xj
(3.5)
i,j=0
is positive semidefinite for all n ∈ N. Equivalently, Hn (y) 0 for all n ∈ N. (b) y has a representing Borel measure µ on R+ if and only if the quadratic forms (3.5) and x 7→ un (x) :=
n X
yi+j+1 xi xj
(3.6)
i,j=0
are positive semidefinite for all n ∈ N. Equivalently, Hn (y) 0, and Bn (y) 0 for all n ∈ N. (c) y has a representing Borel measure µ on [a, b] if and only if the quadratic forms (3.5) and x 7→ vn (x) :=
n X
i,j=0
(−yi+j+2 + (b + a) yi+j+1 − ab yi+j ) xi xj
n ∈ N,
(3.7) are positive semidefinite for all n ∈ N. Equivalently, Hn (y) 0 and −Cn (y) + (a + b) Bn (y) − ab Hn (y) 0, for all n ∈ N.
3.1 The One-dimensional Moment Problem
Proof.
(a) If yn =
sn (x) =
n X
i,j=0
R
R
xi xj
55
z n dµ(z), then Z
z
i+j
dµ(z) =
R
Z
R
n X i=0
xi z
i
!2
dµ(z) ≥ 0.
Conversely, we assume that (3.5) holds, or, equivalently, Hn (y) 0, for all n ∈ N. Therefore, for every q ∈ Rn+1 we have q0 Hn (y)q ≥ 0. Let p ∈ R[x] Pr be nonnegative on R so that it is s.o.s. and can be written p = j=1 qj2 for some r ∈ N and some polynomials (qj )rj=1 ⊂ R[x]. But then 2n X
pk yk = Ly (p) = Ly (
r X j=1
k=0
qj2 )
=
r X j=1
q0j Hn (y)qj ≥ 0,
where qj is the vector of coefficients of the polynomial qj ∈ R[x]. As p ≥ 0 was arbitrary, by Theorem 3.1, (3.4) holds with K := R. (b) This is similar to part (a). (c) One direction is immediate. For the converse, we assume that both (3.5) and (3.7) hold. Then, for every q ∈ Rn+1 we have q0 [ −Cn (y) + (a + b) Bn (y) − ab Hn (y) ] q ≥ 0. Let p ∈ R[x] be nonnegative on [a, b] of even degree 2n and thus, by Theorem 2.6(b), it can be written x 7→ p(x) = u(x) + (b − x)(x − a)q(x) with both polynomials u, q being s.o.s. with deg q ≤ 2n − 2, deg u ≤ 2n. If deg p = 2n − 1 then again by Theorem 2.6(b), p can be written x 7→ p(x) = v(x)(x − a) + w(x)(b − x) for some s.o.s. polynomials v, w of degree less than 2n−2. But then p(x) = ((x−a)p(x)+(b−x)p(x))/(b−a) = u(x) + (b − x)(x − a)q(x) for some s.o.s. polynomials u, q with degree less than 2n. P P Thus, in both even and odd cases, writing u = j u2j and q = k qk2 for some polynomials (uj , qk ) of degree less than n, and with associated vectors of coefficients (uj , qk ) ⊂ Rn+1 , one obtains
3 Moments
56
Ly (p) =
X
u0j Hn (y)uj +
j
X k
(q0k [−Cn (y) + (a + b) Bn (y) − ab Hn (y)] qk ) ≥ 0.
Therefore Ly (p) ≥ 0 for every polynomial p nonnegative on [a, b]. By Theorem 3.1, Eq. (3.4) holds with K := [a, b]. Observe that Theorem 3.2 provides a criterion directly in terms of the sequence (yn ). 3.1.2
The truncated moment problem
We now state the analogue of Theorem 3.2 for the truncated moment prob2n+1 lem for a sequence y = (yk )2n k=0 (even case), and y = (yk )k=0 (odd case). We first introduce some notation. For an infinite sequence y = (yj )j∈N , write the Hankel moment matrix Hn (y) in the form Hn (y) = [v0 , v1 , . . . , vn ]
(3.8)
where (vj ) ⊂ Rn+1 denote the column vectors of Hn (y). The Hankel rank of y = (yj )2n j=0 , denoted by rank(y), is the smallest integer 1 ≤ i ≤ n, such that vi ∈ span {v0 , . . . , vi−1 }. If Hn (y) is nonsingular, then its rank is rank(y) = n + 1. Given an m × n matrix A, Range(A) denotes the image space of A, i.e., Range(A) = {Au, u ∈ Rn }. Theorem 3.3 (The even case). Let y = (yj )0≤j≤2n ⊂ R. (a) y has a representing Borel measure µ on R if and only if Hn (y) 0 and rank(Hn (y)) = rank(y). (b) y has a representing Borel measure µ on R+ if and only if Hn (y) 0, Bn−1 (y) 0, and the vector (yn+1 , . . . , y2n ) is in Range(Bn−1 (y)). (c) y has a representing Borel measure µ on [a, b] if and only if Hn (y) 0 and (a + b) Bn−1 (y) ab Hn−1 (y) + Cn−1 (y).
3.2 The Multi-dimensional Moment Problem
57
Theorem 3.4 (The odd case). Let y = (yj )0≤j≤2n+1 ⊂ R. (a) y has a representing Borel measure µ on R if and only if Hn (y) 0 and vn+1 ∈ Range Hn (y). (b) y has a representing Borel measure µ on R+ if and only if Hn (y) 0, Bn (y) 0, and the vector (yn+1 , . . . , y2n+1 ) is in Range Hn (y). (c) y has a representing Borel measure µ on [a, b] if and only if b Hn (y) Bn (y) and Bn (y) a Hn (y). Example 3.2. In the univariate case n = 1, let y ∈ R5 be the truncated sequence y = (1, 1, 1, 1, 2), hence with associated Hankel moment matrix 111 H2 (y) = 1 1 1 . 112
One may easily check that H2 (y) 0, but it turns out that y has no representing Borel measure µ on the real line R. However, observe that for sufficiently small > 0, the perturbed sequence y := (1, 1, 1 + , 1, 1, 2) satisfies H2 (y ) 0 and so, by Theorem 3.3(a), y has a finite Borel representing measure µ . But then, necessarily, there is no compact interval [a, b] such that µ is supported on [a, b] for every > 0; see Exercise 3.1. In truncated moment problems, i.e., given a finite sequence y = (yk )nk=0 , the basic issue is to find conditions under which we may extend the sequence y, so as to be able to build up positive semidefinite moment matrices of higher orders. These higher order moment matrices are called positive extensions or flat extensions when their rank does not increase with the size. The rank and range conditions in Theorems 3.3-3.4 are such conditions. 3.2
The Multi-dimensional Moment Problem
Most of the applications considered in later chapters of this book refer to real (and not complex) moment problems. Correspondingly, we introduce the basic concepts of moment and localizing matrices in the real case Rn .
3 Moments
58
However, these concepts also have their natural counterparts in Cn , with P the usual scalar product hu, vi = j uj vj . As already mentioned, the multi-dimensional case is significantly more difficult because of the lack of nice characterization of polynomials that are nonnegative on a given subset Ω ⊆ Rn . Fortunately, we have seen in Section 2.5 that such a characterization exists for the important case of compact basic semi-algebraic sets. For an integer r ∈ N, let Nnr := {α ∈ Nn : |α| ≤ r} with |α| := Pn i=1 αi ≤ r. Recall that vr (x) := (1, x1 , x2 , . . . xn , x21 , x1 x2 , . . . , x1 xn , . . . , xr1 , . . . , xrn )0
(3.9)
denotes the canonical basis of the real vector space R[x] r of real-valued polynomials of degree at most r (and let s(r) := n+r denote its dimenn sion). Then, a polynomial p ∈ R[x]r is written as x 7→ p(x) =
X
α∈Nn
pα xα = hp, vr (x)i,
where p = {pα } ∈ Rs(r) denotes its vector of coefficients in the basis (3.9). And so we may identify p ∈ R[x] with its vector of coefficients p ∈ Rs(r) . 3.2.1
Moment and localizing matrix
We next define the important notions of moment matrix and localizing matrix. Moment matrix Given a s(2r)-sequence y = (yα ), let Mr (y) be the moment matrix of dimension s(r), with rows and columns labeled by (3.9), and constructed as follows: Mr (y)(α, β) = Ly (xα xβ ) = yα+β ,
∀ α, β ∈ Nnr ,
(3.10)
with Ly defined in (3.3). Equivalently, Mr (y) = Ly (vr (x)vr (x)0 ) where the latter notation means that we apply Ly to each entry of the matrix vr (x)vr (x)0 .
3.2 The Multi-dimensional Moment Problem
59
Let us consider an example with n = r = 2. In this case, M2 (y) becomes y00 | y10 y01 | y20 y11 y02 − − − − − − − y | y y | y y y 20 11 30 21 12 10 y | y11 y02 | y21 y12 y03 M2 (y) = 01 . − − − − − − − y20 | y30 y21 | y40 y31 y22 y11 | y21 y12 | y31 y22 y13 y02 | y12 y03 | y22 y13 y04
In general, Mr (y) defines a bilinear form h., .iy on R[x]r as follows: hp, qiy := Ly (pq) = hp, Mr (y)qi = p0 Mr (y)q,
∀ p, q ∈ Rs(r) ,
where again, p, q ∈ R[x]r , and p, q ∈ Rs(r) denote their vector of coefficients. Recall from Definition 3.2 that if y is a sequence of moments for some measure µ, then µ is called a representing measure for y, and if unique, then µ is said to be determinate, and indeterminate otherwise, whereas y is called a determinate (resp. indeterminate) moment sequence. In addition, for every q ∈ R[x], Z hq, Mr (y)qi = Ly (q 2 ) = q 2 dµ ≥ 0, (3.11) so that Mr (y) 0. It is also immediate to check that if the polynomial q 2 P is expanded as q(x)2 = α∈Nn hα xα , then X hq, Mr (y)qi = Ly (q 2 ) = hα y α . α∈Nn
• Every measure with compact support (say K ⊂ Rn ) is determinate, because by the Stone-Weierstrass theorem, the space of polynomials is dense (for the sup-norm) in the space of continuous functions on K. • Not every sequence y that satisfies Mi (y) 0 for every i, has a representing measure µ on Rn . This is in contrast with the one-dimensional case where by Theorem 3.2(a), a sequence y such that Hn (y) 0 for all n = 0, 1, . . ., has a representing measure. (Recall that in the onedimensional case, the moment matrix is just the Hankel matrix Hn (y) in (3.8).) However, we have the following useful result:
3 Moments
60
Proposition 3.5. Let y be a sequence indexed in the basis v∞ (x), which satisfies Mi (y) 0, for all i = 0, 1, . . .. (a) If the sequence y satisfies ∞ X
Ly (x2k i )
k=1
−1/2k
i = 1, . . . , n,
= +∞,
(3.12)
then y has a determinate representing measure on Rn . (b) If there exist c, a > 0 such that ∀ α ∈ Nn ,
| yα | ≤ c a α ,
(3.13)
then y has a determinate representing measure with support contained in the box [−a, a]n . In the one-dimensonial case (3.12) is called Carleman’s condition. The moment matrix also has the following properties: Proposition 3.6. Let d ≥ 1, and let y = (yα ) ⊂ R be such that Md (y) 0. Then | yα | ≤ max
y0 , max Ly (x2d ) , i
In addition, rescaling y so that y0 maxi=1,...,n Ly (x2d i ), 1
∀α ∈ Nn2d .
i=1,...,n
1
| yα | α ≤ τd2d ,
=
∀α ∈ Nn2d ,
1,
and letting τd
:=
α 6= 0.
Localizing matrix Given a polynomial u ∈ R[x] with coefficient vector u = {uγ }, we define the localizing matrix with respect to y and u, to be the matrix Mr (u y) with rows and columns indexed by (3.9), and obtained from Mr (y) by : Mr (u y)(α, β) = Ly (u(x)xα xβ ) =
X
γ∈Nn
uγ yγ+α+β ,
∀ α, β ∈ Nnr . (3.14)
3.2 The Multi-dimensional Moment Problem
61
Equivalently, Mr (u y) = Ly (u vr (x)vr (x)0 ), where the previous notation means that Ly is applied entrywise. For instance, when n = 2, with y00 y10 y01 M1 (y) = y10 y20 y11 and x 7→ u(x) = a − x21 − x22 , y01 y11 y02
we obtain
ay00 − y20 − y02 ay10 − y30 − y12 ay01 − y21 − y03 M1 (u y) = ay10 − y30 − y12 ay20 − y40 − y22 ay11 − y31 − y13 . ay01 − y21 − y03 ay11 − y31 − y13 ay02 − y22 − y04
Similar to (3.11) we have
hp, Mr (u y)qi = Ly (u pq), for all polynomials p, q ∈ R[x]r with coefficient vectors p, q ∈ Rs(r) . In particular, if y has a representing measure µ, then Z hq, Mr (u y)qi = Ly (u q 2 ) = u q 2 dµ, (3.15) for every polynomial q ∈ R[x] with coefficient vector q ∈ Rs(r) . Therefore, Mr (u y) 0 whenever µ has its support contained in the set {x ∈ Rn : u(x) ≥ 0}. It is also immediate to check that if the polynomial uq 2 is expanded as P u(x) q(x)2 = α∈Nn hα xα then X hq, Mr (u y)qi = hα yα = Ly (u q 2 ). (3.16) α∈Nn
3.2.2
Positive and flat extensions of moment matrices
We next discuss the notion of positive extension for moment matrices. Definition 3.3. Given a finite sequence y = (yα )|α|≤2r with Mr (y) 0, the moment extention problem is defined as follows: extend the sequence y with new scalars yβ , 2r < |β| ≤ 2(r + 1), so as to obtain a new finite sequence (yα )|α|≤2(r+1) such that Mr+1 (y) 0. If such an extension Mr+1 (y) is possible, it is called a positive extension of Mr (y). If in addition, rank Mr+1 (y) = rank Mr (y), then Mr+1 (y) is called a flat extension of Mr (y).
3 Moments
62
For truncated moment problems, flat extensions play an important role. We first introduce the notion of an atomic measure. An s-atomic measure is a measure with s atoms, that is, a linear positive combination of s Dirac measures. Theorem 3.7 (Flat extension). Let y = (yα )|α|≤2r . Then the sequence y admits a rank Mr (y)-atomic representing measure µ on Rn if and only if Mr (y) 0 and Mr (y) admits a flat extension Mr+1 (y) 0. Theorem 3.7 is useful as it provides a simple numerical means to check whether a finite sequence has a representing measure. Example 3.3. Let µ be the measure on R, defined to be µ = δ0 + δ1 , that is, µ is the sum of two Dirac measures at the points {0} and {1}. Then
21 M1 (y) = , 11
211 M2 (y) = 1 1 1 , 111
and obvioulsy, rank M2 (y) = rank M1 (y) = 2. 3.3
The K-moment Problem
The (real) K-moment problem identifies those sequences y that are moment-sequences of a measure with support contained in a set K ⊂ Rn . Given m polynomials gi ∈ R[x], i = 1, . . . , m, let K ⊂ Rn be the basic semi-algebraic set K := {x ∈ Rn : gi (x) ≥ 0, i = 1, . . . , m}.
(3.17)
For notational convenience, we also define g0 ∈ R[x] to be the constant polynomial with value 1 (i.e., g0 = 1). Recall that given a family (gj )m j=1 ⊂QR[x], we denote by gJ , J ⊆ {1, . . . , m}, the polynomial x 7→ gJ (x) := j∈J gj (x). In particular, when J = ∅, g∅ = 1. Let y = (yα )α∈Nn be a given infinite sequence. For every r ∈ N and every J ⊆ {1, . . . , m}, let Mr (gJ y) be the localizing matrix of order r Q with respect to the polynomial gJ := j∈J gj ; in particular, with J := ∅, Mr (g∅ y) = Mr (y) is the moment matrix (of order r) associated with y.
3.3 The K-moment Problem
63
As we have already seen, there is a duality between the theory of moments and the representation of positive polynomials. The following important theorem, which is the dual facet of Theorems 2.13 and 2.14, makes this statement more precise.
Theorem 3.8. Let y = (yα )α∈Nn be a given infinite sequence in R, Ly : R[x]→R be the linear functional introduced in (3.3), and let K be as in (3.17), assumed to be compact. (a) The sequence y has a finite Borel representing measure with support contained in K, if and only if Ly (f 2 gJ ) ≥ 0,
∀ J ⊆ {1, . . . , m},
∀ f ∈ R[x],
(3.18)
or, equivalently, if and only if Mr (gJ y) 0,
∀ J ⊆ {1, . . . , m},
∀ r ∈ N.
(3.19)
(b) Assume that there exists u ∈ R[x] of the form u = u0 +
m X
ui g i ,
j=1
ui ∈ Σ[x],
i = 0, 1, . . . , m,
and such that the level set {x ∈ Rn : u(x) ≥ 0} is compact. Then, y has a finite Borel representing measure with support contained in K, if and only if Ly (f 2 gj ) ≥ 0,
∀ j = 0, 1, . . . , m,
∀ f ∈ R[x],
(3.20)
or, equivalently, if and only if Mr (gj y) 0,
∀ j = 0, 1, . . . , m,
∀ r ∈ N.
(3.21)
Proof. (a) For every J ⊆ {1, . . . , m} and f ∈ R[x]r , the polynomial f 2 gJ is nonnegative on K. Therefore, of a measure R 2 if y is the sequence of moments 2 µ supported on K, then f gJ dµ ≥ 0. Equivalently, Ly (f gJ ) ≥ 0, or, in view of (3.16), Mr (gJ y) 0. Hence (3.18)-(3.19) hold. Conversely, assume that (3.18), or equivalently, (3.19) holds. As K is compact, by Theorem 3.1, y is the moment sequence of a measure with P support contained in K if and only if α∈Nn fα yα ≥ 0 for all polynomials f ≥ 0 on K. Let f > 0 on K, so that by Theorem 2.13
3 Moments
64
f =
X
pJ gJ ,
(3.22)
J⊆{1,...,m}
for some polynomials {pJ } ⊂ R[x], all sums of squares. Hence, since pJ ∈ Σ[x], from (3.18) and from the linearity of Ly , we have Ly (f ) ≥ 0. Hence, P for all polynomials f > 0 on K, we have Ly (f ) = α fα yα ≥ 0. Next, let f ∈ R[x] be nonnegative on K. Then for arbitrary > 0, f + > 0 on K, and thus, Ly (f + ) = Ly (f ) + y0 ≥ 0. As > 0 was arbitrary, Ly (f ) ≥ 0 follows. Therefore, Ly (f ) ≥ 0 for all f ∈ R[x], nonnegative on K, which by Theorem 3.1, implies that y is the moment sequence of some measure with support contained in K. (b) The proof is similar to part (a) and is left as an exercise. Recall the definitions (2.8) (resp. (2.12)) of the preordering P (g1 , . . . , gm ) (resp. the quadratic module Q(g1 , . . . , gm )), which are both convex cones of R[x], and let R[x]∗ be the algebraic dual of R[x]. Then in the language of convex analysis, (3.18) (resp. (3.20)) states that Ly (or y) belongs to the dual cone P (g1 , . . . , gm )∗ (resp. Q(g1 , . . . , gm )∗ ) ⊂ R[x]∗ . Hence, the duality between polynomials positive on a basic semialgebraic set K ⊂ Rn and sequences y ∈ R[x]∗ that have a finite Borel representing measure supported on K, is nicely captured by standard duality of convex analysis, applied to the appropriate convex cones P (g1 , . . . , gm ) or Q(g1 , . . . , gm ) of R[x]. Note that the conditions (3.19) and (3.21) of Theorem 3.8 are stated in terms of positive semidefiniteness of the localizing matrices associated with the polynomials gJ and gj involved in the definition (3.17) of the compact set K. Alternatively, we also have:
Theorem 3.9. Let y = (yα )α∈Nn ⊂ R be a given infinite sequence, Ly : R[x]→R be the linear functional introduced in (3.3), and let K be as in (3.17), assumed to be compact. Let CG ⊂ R[x] be the convex cone defined in (2.18), and let Assumption 2.2 hold. Then, y has a representing measure µ with support contained in K if and only if Ly (f ) ≥ 0,
∀ f ∈ CG ,
(3.23)
or, equivalently, Ly (b g α (1 − gb)β ) ≥ 0,
with gb as in (2.17).
∀ α, β ∈ Nm
(3.24)
3.3 The K-moment Problem
65
Proof. If y is the moment sequence of some measure with support contained in K, then (3.24) follows directly from Theorem 3.1, because gbj ≥ 0 and 1 − gbj ≥ 0 on K, for all j = 1, . . . , m. Conversely, let (3.24) (and so (3.23)) hold, and let f ∈ R[x], with f > 0 on K. By Theorem 2.23, f ∈ CG , and so f can be written as in (2.18). Therefore, Ly (f ) ≥ 0 and (3.24) follows from the linearity of Ly . Finally, let f ≥ 0 on K, so that f + > 0 on K for every > 0. Therefore, 0 ≤ Ly (f + ) = Ly (f ) + y0 because f + ∈ CG . As > 0 was arbitrary, we obtain Ly (f ) ≥ 0. Therefore, Ly (f ) ≥ 0 for all f ∈ R[x], nonnegative on K, which by Theorem 3.1, implies that y is the moment sequence of some measure with support contained in K. Exactly as Theorem 3.8 was the dual facet of Theorems 2.13 and 2.14, Theorem 3.9 is the dual facet of Theorem 2.23. Again, in the language of convex analysis, (3.24) states that Ly (or y) belongs to the dual cone ∗ CG ⊂ R[x]∗ of CG ⊂ R[x]. Note that Eqs. (3.24) reduce to countably many linear conditions on the sequence y. Indeed, for fixed α, β ∈ Nm , we write X gbα (1 − gb)β = qγ (α, β) xγ , γ∈Nn
for finitely many coefficients (qγ (α, β)). Then, (3.24) becomes X
γ∈Nn
qγ (α, β) yγ ≥ 0,
∀ α, β ∈ Nm .
(3.25)
Eq. (3.25) is to be contrasted with the positive semidefiniteness conditions (3.20) of Theorem 3.8. In the case where all the gj ’s in (3.17) are affine (so that K is a convex polytope), we also have a specialized version of Theorem 3.9.
Theorem 3.10. Let y = (yα )α∈Nn ⊂ R∞ be a given infinite sequence, Ly : R[x]→R be the linear functional introduced in (3.3). Assume that K is compact with nonempty interior, and all the gj ’s in (3.17) are affine, so that K is a convex polytope. Then, y has a finite Borel representing measure with support contained in K if and only if Ly (g α ) ≥ 0,
∀ α ∈ Nm .
(3.26)
3 Moments
66
A sufficient condition for the truncated K-moment problem Finally, we present a very important sufficient condition for the truncated K-moment problem. That is, we provide a condition on a finite sequence y = (yα ) to admit a finite Borel representing measure supported on K. Moreover, this condition can be checked numerically by standard techniques from linear algebra.
Theorem 3.11. Let K ⊂ Rn be the basic semi-algebraic set K := {x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m}, for some polynomials gj ∈ R[x] of degree 2vj or 2vj − 1, for all j = 1, . . . , m. Let y = (yα ) be a finite sequence with |α| ≤ 2r, and let v := maxj vj . Then y has a rank Mr−v (y)-atomic representing measure µ with support contained in K if and only if: (a) Mr (y) 0, Mr−v (gj y) 0, j = 1, . . . , m, and (b) rank Mr (y) = rank Mr−v (y). In addition, µ has rank Mr (y) − rank Mr−v (gi y) atoms x ∈ Rn that satisfy gi (x) = 0, for all i = 1, . . . , m. Note that in Theorem 3.11, the set K is not required to be compact. The rank-condition can be checked by standard techniques from numerical linear algebra. However it is also important to remember that computing the rank is sensitive to numerical imprecisions. 3.4
Moment Conditions for Bounded Density
In this section, we consider the K-moment problem with bounded density. That is, given a finite Borel (reference) measure µ on K ⊆ Rn with moment sequence y = (yα ), α ∈ Nn , under what conditions on y do we have Z yα = xα h dµ, ∀α ∈ Nn , (3.27) K
for some bounded density 0 ≤ h ∈ L∞ (K, µ)? The measure dν := hdµ is said to be uniformly absolutely continuous with respect to µ (denoted ν µ) and h is called the Radom-Nikodym derivative of ν with respect to µ. This is a refinement of the general Kmoment problem where one only asks for existence of some finite Borel representing measure on K (not necessarily with a density with respect to
3.4 Moment Conditions for Bounded Density
67
some reference measure µ). Recall that for two finite measures µ, ν on a σ-algebra B, one has the natural partial order ν ≤ µ if and only if ν(B) ≤ µ(B) for every B ∈ B; and observe that ν ≤ µ obviously implies ν µ but the converse does not hold.
3.4.1
The compact case
We first consider the case where the support of the reference measure µ is a compact basic semi-algebraic set K ⊂ Rn . Theorem 3.12. Let K ⊂ Rn be compact and defined as in (3.17). Let z = (zα ) be the moment sequence of a finite Borel measure µ on K. (a) A sequence y = (yα ) has a finite Borel representing measure on K, uniformly absolutely continuous with respect to µ, if and only if there is some scalar κ such that: 0 ≤ Ly (f 2 gJ ) ≤ κ Lz (f 2 gJ )
∀ J ⊆ {1, . . . , m}. (3.28) (b) In addition, if the polynomial N − kxk2 belongs to the quadratic module Q(g) then one may replace (3.28) with the weaker condition 0 ≤ Ly (f 2 gj ) ≤ κ Lz (f 2 gj )
∀ f ∈ R[x],
∀ f ∈ R[x];
∀ j = 0, . . . , m
(3.29)
(with the convention g0 = 1). (c) Suppose that the gj ’s are normalized so that 0 ≤ gj ≤ 1
on
K,
∀ j = 1, . . . , m,
and that the family (1, {gj }) generates the algebra R[x]. Then a sequence y = (yα ) has a finite Borel representing measure on K, uniformly absolutely continuous with respect to µ, if and only if there is some scalar κ such that: 0 ≤ Ly (g α (1 − g)β ) ≤ κ Lz (g α (1 − g)β ),
∀ α, β ∈ Nm .
(3.30)
Proof. We only prove (a) as (b) and (c) can be proved with very similar arguments. The only if part: Let dν = hdµ for some 0 ≤ h ∈ L∞ (K, µ), and let κ := khk∞ . Observe that gJ ≥ 0 on K for all J ⊆ {1, . . . , m}.
3 Moments
68
Therefore, for every J ⊆ {1, . . . , m} and all f ∈ R[x]: Ly (f 2 gJ ) =
Z
f 2 gJ dν = K
Z
K
f 2 gJ h dµ ≤ κ
Z
f 2 gJ dµ = κ Lz (f 2 gJ ), K
and so (3.28) is satisfied. The if part: Let y and z be such that (3.28) holds true. Then by Theorem 3.8, y has a finite Borel representing measure ν on K. In addition, let γ := (γα ) with γα := κzα − yα for all α ∈ Nn . From (3.28), one has Lγ (f 2 gJ ) ≥ 0,
∀ f ∈ R[x];
∀J ⊆ {1, . . . , m},
and so, by Theorem 3.8 again, γ has a finite Borel representing measure ψ on K. Moreover, from the definition of the sequence γ Z
f d(ψ + ν) = K
Z
f κ dµ, K
∀ f ∈ R[x],
and therefore, as measures on compact sets are moment determinate, ψ + ν = κµ. Hence κµ ≥ ν which shows that ν µ and so one may write dν = h dµ for some 0 ≤ h ∈ L1 (K, µ). From ν ≤ κµ one obtains Z
A
(h − κ) dµ ≤ 0,
∀ A ∈ B(K)
(where B(K) is Borel σ-algebra associated with K). And so 0 ≤ h ≤ κ, µ-almost everywhere on K. Equivalently, khk∞ ≤ κ, the desired result. Notice that using moment and localizing matrices, (3.28) ⇔ 0 Mi (gJ y) κ Mi (gJ z),
(3.29) ⇔ 0 Mi (gj y) κ Mi (gj z), 3.4.2
i = 1, . . . ; J ⊆ {1, . . . , m}
i = 1, . . . ; j = 0, 1, . . . , m.
The non compact case
Let the reference measure µ be a finite Borel measure on Rn , not supported on a compact set. As one wishes to find moment conditions, it is natural to consider the case where all the moments z = (zα ) of µ are finite, and a simple sufficient condition is that µ satisfies the generalized Carleman condition (3.12) of Nussbaum.
3.4 Moment Conditions for Bounded Density
69
Theorem 3.13. Let z = (zα ) be the moment sequence of a finite Borel measure µ on Rn which satisfies the generalized Carleman condition, i.e.: ∞ X
k=1
−1/2k Lz (x2k = ∞, i )
∀ i = 1, . . . , n.
(3.31)
A sequence y = (yα ) has a finite Borel representing measure ν on Rn , uniformly absolutely continuous with respect to µ, if there exists a scalar 0 < κ such that for all i = 1, . . . , n: 0 ≤ Ly (f 2 ) ≤ κ Lz (f 2 )
Proof.
∀ f ∈ R[x].
(3.32)
For every i = 1, . . . , n, (3.32) with x 7→ f (x) = xki , yields −1/2k −1/2k Ly (x2k ≥ κ−1/2k Lz (x2k , i ) i )
∀ k = 0, 1, . . . ,
and so, using (3.31), one obtains ∞ X k=1
−1/2k Ly (x2k ≥ i )
∞ X
−1/2k κ−1/2k Lz (x2k = +∞, i )
k=1
for every i = 1, . . . , n, i.e., the generalized Carleman condition (3.12) holds for the sequence y. Combining this with the first inequality in (3.32) yields that y has a unique finite Borel representing measure ν on Rn . It remains to prove that ν µ and its density h is in L∞ (Rn , µ). Let γ = (γα ) with γα := κzα − yα for all α ∈ Nn . Then the second inequality in (3.32) yields Lγ (f 2 ) ≥ 0
∀f ∈ R[x].
(3.33)
Next, observe that from (3.32), for every i = 1, . . . , n, and every k = 0, 1, . . ., 2k Lγ (x2k i ) ≤ κ Lz (xi ),
which implies −1/2k −1/2k , Lγ (x2k ≥ κ−1/2k Lz (x2k i ) i )
(3.34)
3 Moments
70
and so, for every i = 1, . . . , n, ∞ X
−1/2k Lγ (x2k i )
k=1
≥
∞ X
−1/2k κ−1/2k Lz (x2k = +∞, i )
k=1
i.e., γ satisfies the generalized Carleman condition. In view of (3.33), γ has a (unique) finite Borel representing measure ψ on Rn . Next, from the definition of γ, one has Z Z f d(ψ + ν) = κ f dµ, ∀ f ∈ R[x]. Rn
Rn
But as µ (and so κµ) satisfies Carleman condition (3.31), κµ is moment determinate and therefore, κµ = ν + ψ. Hence ν µ follows from ν ≤ κµ. Finally, writing dν = hdµ for some nonnegative h ∈ L1 (Rn , µ), and using ν ≤ κµ, one obtains Z (h − κ) dµ ≤ 0 ∀ A ∈ B(Rn ), A
and so 0 ≤ h ≤ κ, µ-almost everywhere on Rn . Equivalently, khk∞ ≤ κ, the desired result. Observe that (3.32) is extremely simple as it is equivalent to stating that κ Mr (z) Mr (y) 0,
∀r = 0, 1, . . .
Finally, a sufficient condition for (3.31) to hold is Z exp |xi | dµ < ∞, ∀i = 1, . . . , n. 3.5
Summary
In this chapter, we have presented representation theorems for moment problems. The representations results involving moments are dual to the representation results of Chapter 2 involving positive polynomials, and illustrate the duality between moments and positive polynomials. From a practical perspective, Theorems 3.7 and 3.11 provide a computable sufficient condition to check whether a finite sequence of moments has a representing measure. As we will see in the next chapter, Algorithm 4.2 to extract measures given moment sequences, uses Theorem 3.11. Finally, we
3.7 Notes and Sources
71
have also considered moment conditions for existence of a measure with bounded density with respect to some given reference measure.
3.6
Exercises
Exercise 3.1. Consider Example 3.2. (a) Show that for sufficiently small > 0, the sequence y = (1, 1, 1+, 1, 1) has a finite Borel representing measure µ on R. (b) Could a compact interval [a, b] ⊂ R be such that the support of µ is contained in [a, b] for every ? Why? (Hint: One could extract a sequence µi that converges weakly to some probability measure ν supported on [a, b]; see tightness of probability measures in Section B.1.) Exercise 3.2. Show Theorem 3.8(b). Exercise 3.3. Recall that for a convex cone C ⊂ Rn , its dual cone C ∗ is defined by C ∗ = {z ∈ Rn : hz, yi ≥ 0, ∀y ∈ C}. Let P ⊂ R[x] and Σ[x] be the convex cones of nonnegative and s.o.s. polynomials, respectively. Next, define the two convex cones: M = {y ∈ RNn : y has a representing measure} ; M = {y ∈ RN
n
: M(y) 0}.
(a) Show that M ∗ = P and P ∗ = M . (b) Show that M∗ = Σ[x] and M = Σ[x]∗ .
3.7
Notes and Sources
Moment problems have a long and rich history. For historical remarks and details on various approaches for the moment problem, the interested reader is referred to Landau (1987b). See also Akhiezer (1965), Curto and Fialkow (2000), and Simon (1998). Example 3.1 is from (Feller, 1966, p. 227) whereas Example 3.2 is from Laurent (2008). Theorem 3.1 was first proved by M. Riesz for closed sets K ⊂ R, and subsequently generalized to closed sets K ⊂ Rn by Haviland (1935, 1936). 3.1. Most of this section is from Curto and Fialkow (1991). Theorem 3.3(c) and Theorem 3.4 were proved by Krein and Nudel’man (1977) who also gave the sufficient conditions Hn (γ), Bn−1 (γ) 0 for Theorem 3.3(b)
72
3 Moments
and the sufficient condition Hn (γ), Bn (γ) 0 for Theorem 3.4(b). 3.2. The localizing matrix was introduced in Curto and Fialkow (2000) and Berg (1987). The multivariate condition in Proposition 3.5 that generalizes an earlier result of Carleman (1926) in one dimension, is stated in Berg (1987), and was proved by Nussbaum (1966). Proposition 3.6 is taken from Lasserre (2007b). The infinite and truncated moment matrices (and in particular their kernel) have a lot of very interesting properties. For more details, the interested reader is referred to Laurent (2008). 3.3. Concerning the solution of the K-moment problem, Theorem 3.8(a) was proved by Schm¨ udgen (1991) with a nice interplay between real algebraic geometry and functional analysis. Indeed, the proof uses Stengle’s Positivstellensatz (Theorem 2.12) and the spectral theory of self-adjoint operators in Hilbert spaces. Its refinement (b) is due to Putinar (1993), and Jacobi and Prestel (2001). Incidently, in Schm¨ udgen (1991), the Positivstellensatz Theorem 2.13 appears as a Corollary of Theorem 3.8(a). Theorem 2.24 is due to Cassier (1984) and Handelman (1988), and appears prior to the more general Theorem 3.9 due to Vasilescu (2003). Theorems 3.7 and 3.11 are due to Curto and Fialkow (1991, 1996, 1998), where the results are stated for the the complex plane C, but generalize to Cn and Rn . An alternative proof of some of these results can be found in Laurent (2005); for instance Theorem 3.11 follows from (Laurent, 2005, Theor. 5.23). 3.4. This section is from Lasserre (2006b). The moment problem with bounded density was initially studied by Markov on the interval [0, 1] with µ the Lebesgue measure, a refinement of the Hausdorff moment problem where one only asks for existence of some finite Borel representing measure ν on [0, 1]. For an interesting discussion with historical details, the interested reader is referred to Diaconis and Freedman (2006) where, in particular, the authors have proposed a simplified proof as well as conditions for existence of density in Lp ([0, 1], µ) with a similar flavor.
Chapter 4
Algorithms for Moment Problems
We describe a general methodology to solve the abstract generalized moment problem (GMP). That is, we provide a hierarchy of semidefinite relaxations whose associated sequence of optimal values converges to the optimal value of the basic GMP. Variants of this methodology are also provided to handle additional features such as countably many moment constraints, several measures, or GMP wih sparsity properties. This chapter should allow the reader to understand the basic idea underlying the methodology that permits to solve (or approximate) a problem formulated as a particular instance of the GMP.
In this chapter, we return to the generalized moment problem (1.1), and we utilize the results of the two previous chapters in order to provide computable bounds and extract solutions for Problem (1.1). The goal of this chapter is to make the transition from the characterizations of the previous two chapters to effective and efficient computation. The approach taken in this chapter is representative of the overall approach of the book.
4.1
The Overall Approach
Let Γ be a finite set of indices, {f, (hj )j∈Γ } ⊂ R[x], with f (x) =
X
f α xα
and hj (x) =
α∈Nn
X
hjα xα ,
α∈Nn
and let K ⊂ Rn be the basic semi-algebraic set K = { x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m },
(4.1)
with gj ∈ R[x], j = 1, . . . , m. The generalized moment problem is given by 73
4 Algorithms for Moment Problems
74
ρmom =
sup µ∈M (K)+
s.t.
Z Z
f dµ K
K
(4.2) hj dµ 5 γj , j ∈ Γ.
and let Γ+ ⊆ Γ be the set of indices corresponding to inequality constraints ”≤”. R Let y = (yα ) with yα = K xα dµ, for some µ ∈ M (K)+ . With Ly as in (3.3), rewrite (4.2) in the equivalent form: X ρmom = sup Ly (f ) (= fα yα ) y
α∈Nn
s.t. Ly (hj ) = yα =
Z
X
α∈Nn
K
hjα yα 5 γj , j ∈ Γ
xα dµ, α ∈ Nn ,
(4.3)
for some µ ∈ M (K)+ .
Note that problem (4.3) is entirely described through the moments yα of µ only, and not µ itself. Therefore, we may replace the unknown measure µ by its sequence of moments y, and conditions on y that state that y should be the moment sequence of some measure µ, with support contained in K. We have seen in Chapter 3 that such conditions exist, and depending on the chosen type of conditions, are obtained with either semidefinite or linear constraints on y. We next develop the dual point of view via nonnegative polynomials. Recall that the dual of problem (4.2) is
ρpop = inf λ
s.t.
X
γ j λj
j∈Γ
X j∈Γ
λj hj (x) − f (x) ≥ 0,
λj ≥ 0,
∀x∈K
(4.4)
∀j ∈ Γ+ .
To compute ρpop , we need an efficient characterization of the nonnegativity P on K of the function f0 := j λj hj − f in (4.4). In general, no such charac-
4.2 Semidefinite Relaxations
75
terization exists. However, when K is the basic semi-algebraic set defined in (4.1), then the representations of polynomials positive on K described in Chapter 2, are available. Depending on the chosen representation, the nonnegativity constraint in (4.4) reduces to semidefinite or linear constraints on the coefficients of the polynomial f0 .
4.2
Semidefinite Relaxations
Recall from Chapter 3 that with a sequence y = (yα )α∈Nn , and gj ∈ R[x], we can associate the localizing matrix Mr (gj y), for all j = 1, . . . , m. Depending on parity, let 2vj or 2vj − 1 be the degree of gj ∈ R[x], and 2v0 or 2v0 − 1 be that of f ∈ R[x]. Similarly, let 2wj or 2wj − 1 be the degree of the polynomial hj ∈ R[x], for all j ∈ Γ. The primal relaxation For i ≥ i0 := max[maxj=0,...,m vj , maxj∈Γ wj ], consider the semidefinite optimization problem:
ρi = sup Ly (f ) y
s.t. Ly (hj ) 5 γj , Mi (y) 0,
j∈Γ
(4.5)
Mi−vj (gj y) 0, j = 1, . . . , m.
• The semidefinite constraints Mi (y), Mi−vj (gj y) 0 state necessary conditions on the variables (yα ) with |α| ≤ 2r, to be moments of some finite Borel measure supported on K. Therefore, it immediately follows that ρi ≥ ρmom for all i ≥ i0 and so, Problem (4.5) is a semidefinite relaxation of Problem (4.3) (or (4.2)) for all i ≥ i0 . In addition, (ρi )i forms a monotone nonincreasing sequence, because more constraints are added as i increases. • We call (4.5) a primal relaxation of (4.3) as the unknown variable y is of same ”flavor” as the variable µ in the moment problem (4.2).
4 Algorithms for Moment Problems
76
• The semidefinite relaxation (4.5) is a particular convex optimization problem called a semidefinite program, that can be solved efficiently. Indeed if one uses certain interior point algorithms, then up to arbitrary precision > 0 fixed, a semidefinite program can be solved in a computational time that is polynomial in the input size (i.e., the size needed to code its description); see Section C.2. This feature is essential to explain the power of the moment approach described in this book.
The dual relaxation P P Write Mi (y) = α Bα yα and Mi−vj (gj y) = α Cjα yα , for appropriate real symmetric matrices (Bα , Cjα )α . From standard duality in convex optimization (and more precisely, in semidefinite programming), the dual of the semidefinite program (4.5) reads: X ρ∗i = inf λj γ j λ,X,Zj
j∈Γ
s.t. −hX, Bα i −
m X j=1
hZj , Cjα i +
X, Zj 0, j = 1, . . . , m;
X j∈Γ
λj hjα = fα , |α| ≤ 2r,
λj ≥ 0, ∀j ∈ Γ+ .
(4.6) See Section C.2. The dual variable λj is associated with the constraint Ly (hj ) 5 γj of the primal, whereas the dual matrix variable X (resp. Zj ) is associated with the semidefinite constraint Mi (y) 0 (resp. Mi−vj (gj y) 0). We next interpret the constraints of Problem (4.6). So let (λ, X, {Zj }) be a feasible solution of (4.6). Mutliplying each side of the constraint by xα and summing up yields * + * + m X X X X α α λj hj (x) − f (x) = X, Bα x + Zj , Cjα x . (4.7) j∈Γ
α
j=1
α
Observe that with vi (x) as in (3.9), and recalling the definition of moment and localizing matrices, we have X X Bα xα = vi (x)vi (x)0 ; and Cjα xα = hj (x)vi−vj (x)vi−vj (x)0 , α
α
4.2 Semidefinite Relaxations
77
for all j = 1, . . . , m. From the spectral decomposition of the positive semidefinite matrices X and {Zj }, we obtain X X 0 0 X= f0k f0k ; Zj = fjk fjk , j = 1, . . . , m, k
k
for some vectors {fjk } of appropriate dimensions. Substituting in (4.7), we obtain X j∈Γ
λj hj (x) − f (x) =
X k
hf0k , vi (x)i2 +
m X X j=1 k
hfjk , vi−vj (x)i2 ,
that is, X j∈Γ
λj hj − f = f 0 +
m X
fj gj ,
(4.8)
j=1
with f0 (x) :=
X k
hf0k , vi (x)i2 ;
fj (x) :=
X k
hfjk , vi−vj (x)i2 ,
for all j = 1, . . . , m. Therefore, the fj ’s are all sum of squares, j = 0, 1, . . . , m, and in addition, deg fj gj ≤ 2r, for all j (with g0 = 1). The identity (4.8) is a certificate of nonnegativity on K of the polynomial P j∈Γ λj hj − f . In fact, we have just shown that the dual semidefinite relaxation (4.6) has the equivalent formulation
ρ∗i = inf
λ,fj
s.t.
X
λj γ j
j∈Γ
X j∈Γ
λj hj − f = f 0 +
m X
fj gj ,
(4.9)
j=1
fj ∈ Σ[x] and deg fj ≤ 2(i − vj ),
j = 0, . . . , m.
λj ≥ 0, ∀j ∈ Γ+ .
Therefore the dual semidefinite program (4.9) is a strengthening of the dual (4.4) because we have replaced the nonnegativity on K by a specific (sufficient) certificate of positivity.
78
4 Algorithms for Moment Problems
The above approach corresponds to a classical scheme in optimization. To a relaxation of constraints in the primal problem (i.e., replacing (4.3) with (4.5)) corresponds a strengthening of the initial dual (replacing (4.4) with (4.9)). Let v := maxm j=1 vj . The moment approach algorithm for solving (or at least approximate) the moment problem (1.1) is a follows:
Algorithm 4.1. (The moment problem (1.1)) Input: A set of polynomials {f, (hj )j∈Γ } ⊂ R[x] with degree 2v0 or 2v0 − 1, and 2wj or 2wj − 1, for every j ∈ Γ. A basic semi-algebraic set K = {x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m}, where the polynomials gj are of degree 2vj or 2vj − 1, j = 1, . . . , m; A number k, the index of the highest relaxation. Output: The optimal value ρmom and a finite set of points of K (the support of an optimal solution µ of (1.1)), or a lower bound ρk on ρmom . Algorithm: 1. Solve the semidefinite optimization problem (4.5) with optimal value ρi . 2. If no optimal solution y∗ is found and i < k, then increase i by one and go to Step 1; if no optimal solution is found and i = k stop and output ρk only provides a lower bound ρk ≤ ρmom . 3. Let y∗ be an optimal solution. If rank Ms−v (y∗ ) = rank Ms (y∗ )
for some i0 ≤ s ≤ i
(4.10)
then ρi = ρmom and rank Ms (y∗ ) points of K are extracted with Algorithm 4.2 (see Section 4.3). They form the (finite) support of a measure µ on K, an optimal solution of the moment problem (1.1). 4. If (4.10) does not hold and i < k, then increase i by one and go to Step 1; otherwise, stop and output ρk only provides a lower bound ρk ≤ ρmom . We next address the convergence of the semidefinite relaxations (4.5)-(4.9).
4.2 Semidefinite Relaxations
79
Theorem 4.1. Let {f, (hj )j∈Γ , (gj )m j=1 } ⊂ R[x], with h0 > 0 on K, and let M (K)+ be the set of finite Borel measures on K, with K ⊂ Rn as in (4.1). Let Assumption 2.1 hold, and let ρmom be the optimal value of the generalized moment problem defined in (4.2), assumed to be finite. Consider the sequence of primal and dual semidefinite relaxations defined in (4.5) and (4.6), with respective sequences of optimal values (ρ i )i and (ρ∗i )i . Then: (a) ρ∗i ↓ ρmom and ρi ↓ ρmom . (b) If (4.5) has an optimal solution y which satisfies rank Ms (y) = rank Ms−v (y)
for some i0 ≤ s ≤ i
(4.11)
(with v := maxj vj ) then ρi = ρmom and the generalized moment problem (4.2) has an optimal solution µ ∈ M (K)+ which is finitely supported on rank Ms (y) points of K.
Proof.
(a) By Theorem 1.3, we know that ρpop = ρmom and so ρpop = ρmom ≤ ρi ≤ ρ∗i ,
∀i ≥ i0 .
Let > 0 and λ be a feasible solution of (4.4) with associated value ρλ which satisfies X ρmom = ρpop ≤ ρλ = λj γj ≤ ρpop + = ρmom + . j∈Γ
In particular, we have X j∈Γ
λj hj − f ≥ 0,
on K.
Recall that h0 > 0 on K, and consider the new solution λ defined by λj = λj for all j 6= 0, and λ0 = λ0 + . The solution λ is feasible in (4.4) because X X λj hj − f = λj hj − f + h0 ≥ h0 > 0, on K, j∈Γ
j∈Γ
and with associated value ρλ = ρλ + γ0 . Invoking Theorem 2.14 we obtain that X j∈Γ
λj hj − f = f 0 +
m X j=1
fj gj ,
4 Algorithms for Moment Problems
80
for some s.o.s. polynomials (fj ) ⊂ Σ[x]. Hence, λ is feasible for the semidefinite relaxation (4.6) as soon as 2i ≥ maxj=0,...,m deg fj gj (with g0 = 1). But then ρ∗i ≤ ρλ ≤ ρmom + (1 + γ0 ), and as > 0 is arbirary, the result follows. (b) If (4.11) holds for an optimal solution y of ρi , then by Theorem 3.11, y is the sequence of moments up to order 2s of a Borel measure µ finitely supported on rank Ms (y) points of K. Hence Z Z ρi = Ly (f ) = f dµ; hj dµ = Ly (hj ) 5 γj , ∀j ∈ Γ, K
K
which shows that µ is a feasible solution of the generalized moment problem (4.2), with value ρi ≥ ρmom . Therefore µ is an optimal solution of the generalized moment problem, the desired result. If (4.11) holds at an optimal solution y of ρi then finite convergence takes place, that is the semidefinite relaxation (4.5) is exact, with optimal value ρi = ρmom . Moreover, in this case one may extract t (= rank Ms (y)) points of K, supports of a measure µ ∈ M (K)+ , optimal solution of the generalized moment problem (4.2). How to extract those points is the purpose of Algorithm 4.3 detailed in the next section. Notice that if K is compact but Assumption 2.1 does not hold, we may still define converging semidefinite relaxations with constraints Mi (gJ y) Q 0, for all J ⊆ {1, . . . , m}, where gj := j∈J gj , and g∅ = 1. The only change in the proof is to invoke Theorem 2.13 instead of Theorem 2.14. However, the corresponding semidefinite relaxations have now 2m linear matrix inequalities instead of m in (4.6), a serious drawback for practical purposes. 4.3
Extraction of Solutions
In this section we describe a procedure that extracts an optimal solution of the generalized moment problem (1.1) when the semidefinite relaxation (4.5) is exact at some step i of the hierarchy, i.e., with ρi = ρmom , and the rank condition (4.11) of Theorem 4.1 holds. In this case an optimal solution of (1.1) is a measure supported on rank Ms (y) points of K. Therefore, let y be an optimal solution of the semidefinite relaxation (4.5) for which (4.11) holds, so that ρi = ρmom . Then the main steps of the extraction algorithm can be sketched as follows.
4.3 Extraction of Solutions
81
• Cholesky factorization. As condition (4.11) holds, y is the vector of a rank Ms (y)-atomic Borel measure µ supported on K. That is, there are r (= rank Ms (y)) points (x(k))rk=1 ⊂ K such that µ =
r X
κ2j δx(j) ,
j=1
κj 6= 0, ∀j;
r X
κ2j = y0 ,
(4.12)
j=1
with δ• being the Dirac measure at •. Hence, by construction of the moment matrix Ms (y), Ms (y) =
r X
κ2j vs (x∗ (j))(vs (x∗ (j))0 = V∗ D(V∗ )0
(4.13)
j=1
where V∗ is written columnwise as V∗ = vs (x∗ (1)) vs (x∗ (2)) · · · vs (x∗ (r))
with vs (x) as in (3.9), and D is a r×r diagonal matrix with entries D(i, i) = d2i , i = 1, . . . , r. In fact, the weights (κj )rj=1 do not play any role in the sequel. As long as κj 6= 0 for all j, the rank of the moment matrix Ms (y) associated with the Borel measure µ defined in (4.12) does not depend on the weights κj , The extraction procedure with another matrix Ms (˜ y) written as in (4.13) but with different weights κ ˜ j , would yield the same global minimizers ˜ would also be an optimal (x∗ (j))rj=1 . Of course, the new associated vector y solution of the semidefinite relaxation with value ρi = ρmom . Extract a Cholesky factor V of the positive semidefinite moment matrix Ms (y), i.e. a matrix V with r columns satisfying Ms (y) = VV0 .
(4.14)
Such a Cholesky factor can be obtained via singular value decomposition, or any cheaper alternative; see e.g. Golub and Loan (1996). The matrices V and V∗ span the same linear subspace, so the solution extraction algorithm consists in transforming V into V ∗ by suitable column operations. This is described in the sequel. • Column echelon form. Reduce V to column echelon form
82
4 Algorithms for Moment Problems
1 ? 0 1 0 0 ? ? U = .. . 0 0 ? ? . ..
1 ? ..
. 0 ··· ? ···
1 ? .. .
? ? ? ··· ?
by Gaussian elimination with column pivoting. By construction of the moment matrix, each row in U is indexed by a monomial xα in the basis vs (x). Pivot elements in U (i.e. the first non-zero elements in each column) correspond to monomials xβj , j = 1, 2, . . . , r of the basis generating the r solutions. In other words, if w(x) = xβ1
xβ 2 . . . x β r
denotes this generating basis, then
0
vs (x) = Uw(x)
(4.15)
(4.16)
for all solutions x = x∗ (j), j = 1, 2, . . . , r. In summary, extracting the solutions amounts to solving the system of polynomial equations (4.16). • Solving the system of polynomial equations (4.16). Once a generating monomial basis w(x) is available, it turns out that extracting solutions of the system of polynomial equations (4.16) reduces to solving a linear algebra problem, as described below. 1. Multiplication matrices. For each degree one monomial xi , i = 1, 2, . . . , n, extract from U the r × r multiplication matrix Ni containing the coefficients of monomials xi xβj , j = 1, 2, . . . , r, in the generating basis (4.15), i.e. such that Ni w(x) = xi w(x),
i = 1, 2, . . . , n.
(4.17)
4.3 Extraction of Solutions
83
The entries of global minimizers x∗ (j), j = 1, 2, . . . , r are all eigenvalues of multiplication matrices Ni , i = 1, 2, . . . , n. That is, Ni w(x∗ (j)) = x∗i (j) w(x∗ (j)),
i = 1, . . . , n;
j = 1, . . . , r.
But how to reconstruct the solutions (x∗ (j)) from knowledge of the eigenvalues of the Ni s? Indeed, from the n r-uplets of eigenvalues one could build up rn possible vectors of Rn whereas we are looking for only r of them. 2. Common eigenspaces. Observe that for every j = 1, . . . , r, the vector w(x∗ (j)) is an eigenvector common to all matrices Ni , i = 1, . . . , n. Therefore, in order to compute (x∗ (j)), one builds up a random combination
N=
n X
λi N i
i=1
of multiplication matrices Ni s, where λi , i = 1, 2, . . . , n are nonnegative real numbers summing up to one. Then with probability 1, the eigenvalues of N are all distinct and so N is non-derogatory, i.e., all its eigenspaces are 1-dimensional (and spanned by the vectors w(x∗ (j)), j = 1, . . . , r). Then, compute the ordered Schur decomposition N = Q T Q0
(4.18)
where Q = q1 q2 · · · q r
is an orthogonal matrix (i.e. q0i qi = 1 and q0i qj = 0 for i 6= j) and T is upper-triangular with eigenvalues of N sorted in increasing order along the diagonal. Finally, the i-th entry x∗i (j) of x∗ (j) ∈ Rn is given by x∗i (j) = q0j Ni qj ,
i = 1, 2, . . . , n,
j = 1, 2, . . . , r.
In summary here is the extraction algorithm:
(4.19)
4 Algorithms for Moment Problems
84
Algorithm 4.2. (The extraction algorithm) Input: The moment matrix Ms (y) of rank r. Output: The r points x∗ (i) ∈ K, i = 1, . . . , r, support of an optimal solution of the moment problem. Algorithm: 1. Get the Cholesky factorization VV0 of Ms (y). 2. Reduce V to an echelon form U 3. Extract from U the multiplication matrices Ni , i = 1, . . . , n. Pn 4. Compute N := i=1 λi Ni with randomly generated coefficients λi , and the Schur decomposition N = QTQ0 . Compute Q = q1 q2 · · · qr , and x∗i (j) = q0j Ni qj ,
i = 1, 2, . . . , n,
j = 1, 2, . . . , r.
Example 4.1. Consider the moment problem (1.1) in R2 , where x 7→ f (x) = −(x1 − 1)2 − (x1 − x2 )2 − (x2 − 3)2 ; h1 = 1; γ1 = 1, and K = { x ∈ R2 : 1 − (x1 − 1)2 ≥ 0; 1 − (x1 − x2 )2 ≥ 0; 1 − (x2 − 3)2 ≥ 0}. The first (i = 1) semidefinite relaxation yields ρ1 = −3 and rank M1 (y) = 3, whereas the second (i = 2) semidefinite relaxation yields ρ2 = −2 and rank M1 (y) = rank M2 (y) = 3. Hence, the rank condition (4.11) is satisfied, which implies that −2 = ρ2 = ρmom . The moment matrix M2 (y) reads 1.0000 1.5868 2.2477 2.7603 3.6690 5.2387 1.5868 2.7603 3.6690 5.1073 6.5115 8.8245 2.2477 3.6690 5.2387 6.5115 8.8245 12.7072 M2 (y) = 2.7603 5.1073 6.5115 9.8013 12.1965 15.9960 3.6690 6.5115 8.8245 12.1965 15.9960 22.1084 5.2387 8.8245 12.7072 15.9960 22.1084 32.1036 and the monomial basis is v2 (x) = 1 x1
x2
x21
x1 x2
0 x22 .
4.3 Extraction of Solutions
85
The Cholesky factor (4.14) of M2 (y) is given by
−0.9384 −0.0247 0.3447 −1.6188 0.3036 0.2182 −2.2486 −0.1822 0.3864 V= −2.9796 0.9603 −0.0348 −3.9813 0.3417 −0.1697 −5.6128 −0.7627 −0.1365 whose column echelon form reads (after rounding)
1 01 00 U= −2 3 −4 2 −6 0
1 . 0 2 5
Pivot entries correspond to the following generating basis (4.15) w(x) = [1 x1
0
x2 ] .
From the subsequent rows in matrix U we deduce from (4.16) that all solutions x must satisfy the three polynomial equations x21 = −2 + 3x1 x1 x2 = −4 + 2x1 + 2x2 x22 = −6 + 5x2 . Multiplication matrices (4.17) (by x1 and x2 ) in the basis w(x) are readily extracted from rows in U: 010 001 N1 = −2 3 0 , N2 = −4 2 2 . −4 2 2 −6 0 5 A randomly chosen convex combination of N1 and N2 yields
0 0.6909 0.3091 N = 0.6909 N1 + 0.3091 N2 = −2.6183 2.6909 0.6183 −4.6183 1.3817 2.9274
4 Algorithms for Moment Problems
86
with orthogonal matrix in Schur decomposition (4.18) given by 0.4082 0.1826 −0.8944 Q = 0.4082 −0.9129 −0.0000 . 0.8165 0.3651 0.4472
From equations (4.19), we obtain the 3 optimal solutions 1 2 2 x∗ (1) = , x∗ (2) = , x∗ (3) = . 2 2 3 Numerical stability
All operations of the above solution extraction algorithm are numerically stable, except the Gaussian elimination step with column pivoting. However, practical experiments with GloptiPoly reveal that ill-conditioned problem instances leading to a failure of Gaussian elimination with column pivoting are very scarce. This experimental property of Gaussian elimination was already noticed in Golub and Loan (1996). Number of extracted solutions The number of solutions (x∗ (j)) extracted by the algorithm is equal to the rank of the moment matrix Ms (y). Up to our knowledge, when solving a semidefinite relaxation there is no easy way to control the rank of the moment matrix, hence the number of extracted solutions. 4.4
Linear Relaxations
In this section, we derive a hierarchy of linear programming (LP) relaxations whose associated sequence of optimal values converges to the desired value ρmom . We introduce the following linear optimization problem
Li = sup y
X
Ly (f )
α∈Nn
s.t. Ly (hj ) 5 γj ,
j ∈ Γ,
Ly (g α (1 − g)β ) ≥ 0, ∀α, β : deg g α (1 − g β ) ≤ 2i, and its dual:
(4.20)
4.5 Extensions
L∗i = inf c,λ s.t.
X
λj γ j
j∈Γ
X j∈Γ
87
λj hj − f =
X α,β
cαβ g α (1 − g)β , α
(4.21)
β
cαβ ≥ 0, ∀α, β : deg g (1 − g) ≤ 2i. λj ≥ 0, ∀j ∈ Γ+ .
The linear relaxation (4.20) is the linear programming (LP) analogue of the semidefinite relaxation (4.5), whereas its dual (4.21) is the LP analogue of the dual semidefinite program (4.6) (or its equivalent form (4.9)). We next address convergence.
Theorem 4.2. Let {f, (hj )j∈Γ , (gj )m j=1 } ⊂ R[x], with h0 = 1, and let K ⊂ Rn be as in (4.1). Let Assumption 2.2 hold and let ρmom be the optimal value of the generalized moment problem defined in (4.2), assumed to be finite. Consider the sequence of primal and dual linear relaxations defined in (4.20) and (4.21), with respective sequences of optimal values (Li )i and (L∗i )i . Then L∗i ↓ ρmom and Li ↓ ρmom . Proof. The proof mimics that of Theorem 4.1, the only change is that we P invoke Theorem 2.23 for the representation of the polynomial j∈Γ λj hj − f + , positive on K. 4.5
Extensions
In this section we consider a few extensions of the generalized moment problem. Namely, we consider the case where one has countably many moment constraints, the case of several unknown measures, and the case where some sparsity pattern is present. 4.5.1
Extensions to countably many moment constraints
We now consider the moment problem (4.2) with countably many moment constraints
88
4 Algorithms for Moment Problems
Z
hj dµ 5 γj ,
j = 0, 1, . . . ,
K
that is, when Γ = N. For every i ∈ N let Γi := {j ∈ {0, 1, . . . , i} : deg hj ≤ 2i} so that Γi is finite and Γi ↑ Γ as i→∞. With i ≥ maxj vj , consider the following semidefinite program: ρi = sup Ly (f ) y
s.t. Ly (hj ) 5 γj , Mi (y) 0,
j ∈ Γi
(4.22)
Mi−vj (gj y) 0, j = 1, . . . , m .
Theorem 4.3. Let Assumption 2.1 hold, and let Γ = N. Assume that h0 > 0 on K and let ρmom be the optimal value of the generalized moment problem defined in (4.2), assumed to be finite. Consider the sequence of semidefinite relaxations defined in (4.22). Then ρi ↓ ρmom as i→∞. The proof is postponed to Section 4.10.
4.5.2
Extension to several measures
So far, the moment problem (4.2) we have considered has only one measure as unknown. In some applications developed in later chapters, we need consider an extension of (1.1) that involves several (but finitely many) unknown measures µi , i = 1, . . . , p, not necessarily on the same space. Therefore, let Γ, ∆ be two finite sets of indices, and for every i ∈ ∆: - Let {fi , (hij )j∈Γ } ⊂ R[xi ] = R[xi1 , . . . , xini ], for some integers ni ∈ N, i ∈ ∆. - Let Ki ⊂ Rni be the basic closed semi-algebraic sets Ki := { xi ∈ Rni : g(xi ) ≥ 0, for some finite set of polynomials Gi ⊂ R[xi ].
g ∈ Gi },
(4.23)
4.5 Extensions
89
Then consider the following problem: ρmom =
XZ
sup
µi ∈M (Ki )+ i∈∆
s.t.
XZ
i∈∆
fi dµi Ki
(4.24) hij dµi 5 γj ,
Ki
j∈Γ
with dual ρpop = inf λ
s.t.
X
γ j λj
j∈Γ
X j∈Γ
λj hij (xi ) − fi (xi ) ≥ 0,
∀ xi ∈ Ki , i ∈ ∆.
(4.25)
λj ≥ 0, ∀j ∈ Γ+ .
Theorem 4.4 below is the analogue of Theorem 1.3 for the multimeasures moment problem (4.24).
Theorem 4.4. Assume that Ki is compact for each i ∈ ∆ and there exists j ∈ Γ such that hij > 0 on Ki for all i ∈ ∆. Assume that (4.24) has a feasible solution (µi )i∈∆ . Then ρmom = ρpop and the moment problem (4.24) has an optimal solution, that is, the sup is attained. In addition, (4.24) has an optimal solution (µi ) such that each µi is supported on finitely many points of Ki , i.e., µi a finite linear combination of Dirac measures, i ∈ ∆. As Theorem 1.3 for the single measure moment problem could be extended in Corollary 1.4 to handle countably many moment constraints, Theorem 4.4 can be also extended in a similar way to handle countably many moment constraints. Semidefinite relaxations We now describe the semidefinite relaxations associated with (4.24). Let vg := ddeg g/2e for every g ∈ Gi , i ∈ ∆, let di := max{vg : g ∈ Gi }, i ∈ ∆, and let D := max{deg hij : i ∈ ∆, j ∈ Γ}. Next, with k ≥ k0 := max[D, deg f, maxi∈∆ di ], the following semidefinite program
4 Algorithms for Moment Problems
90
ρk = sup
X
Lyi (fi )
yi i∈∆
s.t.
X
j∈Γ
Lyi (hij ) 5 γj ,
i∈∆
Mk (yi ) 0, i
i∈∆
Mk−vg (g y ) 0,
(4.26)
g ∈ Gi , i ∈ ∆
is the analogue for the multi-measures moment problem (4.24) of (4.5) for the moment problem (4.2). Its dual reads: ρ∗k = inf λ
s.t.
X
j∈Γ X j∈Γ
λj γ j λj hij − σi −
X
g∈Gi
σg g − fi = 0,
σi , σg ∈ Σ[xi ]; deg σg g ≤ 2k, λj ≥ 0, ∀j ∈ Γ+ .
on Ki , ∀ i ∈ ∆
(4.27)
∀ g ∈ Gi , ∀ i ∈ ∆
And we obtain the following multi-measures version of Theorem 4.1.
Theorem 4.5. Assume that the hypotheses of Theorem 4.4 hold true, and Assumption 2.1 holds for each Ki , i ∈ ∆. Let ρmom be the optimal value of the moment problem (4.24), assumed to be finite. Consider the primal and dual semidefinite relaxations (4.26) and (4.27), with respective optimal values ρk and ρ∗k . Then: (a) ρ∗k ↓ ρmom and ρk ↓ ρmom as k→∞. (b) If (4.26) has an optimal solution (y i )i∈∆ which satisfies rank Msi (yi ) = rank Msi −di (yi ),
∀i ∈ ∆,
(4.28)
for some k0 ≤ si ≤ k, then ρk = ρmom and the generalized moment problem (4.24) has an optimal solution (µi )i∈∆ with each µi being finitely supported on rank Msi (yi ) points of Ki , i ∈ ∆. The proof of Theorems 4.4 and 4.5 are verbatim copies of that of Theorems 1.3 and 4.1 with ad hoc adjustments.
4.6 Exploiting Sparsity
4.6
91
Exploiting Sparsity
Consider the moment problem (4.2). Despite their nice properties, the size of the semidefinite relaxations (4.5) grows rapidly with the dimension n. n+i Typically, the moment matrix M (y) is s(i) × s(i) with s(i) = i n , and n+2i there are n variables yα . This makes the applicability of Algorithm 4.1 limited to small to medium size problems only. Fortunately, in many practical applications of large size moment problems, some sparsity pattern is often present and may be exploited. For instance, under a condition called the Running Intersection Property, Theorem 2.27 of Chapter 2 is an example on how a sparsity pattern in the data (f, K) can be exploited to yield a specific ”sparse” representation of f , positive on K. In this section we also obtain a sparse version of Algorithm 4.1 when the moment problem (4.2) exhibits some sparsity pattern in its data {f, (hj )j∈Γ , K}. As we have seen in section 2.7, suppose that there is no coupling between some subsets of variables in the polynomials gk that define the set K, and the polynomials f and hj , j ∈ Γ. By no coupling between two sets of variables we mean that there is no monomial involving some variables of the two sets in any of the polynomials f , gk or hj . In other words, with R[x] = R[x1 , . . . , xn ], let I0 := {1, . . . , n} be the union (with possible overlaps) ∪pk=1 Ik of p subsets Ik , k = 1, . . . , p, with cardinal denoted nk . For an arbitrary V ⊆ I0 , let R[x(V )] denote the ring of polynomials in the variables x(V ) = {xi : i ∈ V }, and so R[x(I0 )] = R[x]. Let Assumption 2.3 and 2.4 hold, and also assume that for every j ∈ Γ, hj =
p X k=1
hjk ,
hjk ∈ R[x(Ik )].
(4.29)
So as in Section 2.7, let m0 = m + p and {1, . . . , m0 } = ∪pi=1 Ji and K be as in (2.30) after having added the p redundant quadratic constraints (2.29). For all i = 1, . . . , p, let Ki := {x ∈ Rni : gk (x) ≥ 0, so that the set K = {x ∈ Rn : gj (x) ≥ 0, K = { x ∈ Rn : x(Ii ) ∈ Ki ,
k ∈ Ji }
∀ j ∈ ∪pi=1 Ji } also reads i = 1, . . . , p }.
Let ∆ = {1, . . . , p}, and let Ijk := Ij ∩ Ik with cardinal njk whenever Ijk 6= ∅. For every j ∈ ∆, if Ij ∩ Ik 6= ∅, then denote by πjk µj the measure
4 Algorithms for Moment Problems
92
on Rnjk which is the projection of µj on the variables x(Ijk ). Define: Γj := { k : k > j
and Ij ∩ Ik 6= ∅} j ∈ ∆, j < p,
and consider the multi-measures generalized moment problem:
ρsparse mom =
sup
XZ
fi dµi
µi ∈M (Ki )+ i∈∆
s.t.
X
hki dµi 5 γk ,
i∈∆
(4.30)
∀k ∈ Γ ∀ k ∈ Γj , 1 ≤ j < p.
πjk µj = πkj µk ,
Notice that in view of Assumption 2.3 and 2.4, every Ki is compact, and so, the constraint πjk µj = πkj µk reduces to the countably many moment constraints Z Z x(Ijk )α dµj − x(Ijk )α dµk = 0, ∀α ∈ Nnjk . (4.31) Kj
Kk
Therefore, the moment problem (4.30) is an instance of the multi-measures moment problem (4.24) with the additional feature that it has countably many moment constraints instead of finitely many. Its dual reads: X ρsparse = inf λj γ j pop λ,ujk α
s.t.
X
j∈∆
λk ≥ 0, ∀k ∈ Γ+
λk hki +
k∈ΓX
−
X
X
(4.32)
α uik α x(Iik )
n
X k∈Γi α∈N ik α uli α x(Iil ) − fi ≥ 0
l:i∈Γl α∈Nnil
on Ki ,
i ∈ ∆.
Recall from Chapter 2 that the running intersection property holds if the collection of sets {Ik } satisfies the following condition:
For k = 1, . . . , p − 1,
Ik+1 ∩
k [
j=1
Ij ⊆ I s
for some s ≤ k. (4.33)
4.6 Exploiting Sparsity
93
Theorem 4.6. Let Assumptions 2.3 and 2.4 hold as well as (4.29), and suppose that for some k ∈ Γ, hki > 0 on Ki for every i = 1, . . . , p. Consider the multi-measures moment problem (4.30). Then ρsparse mom ≥ ρmom and (4.30) has an optimal solution, that is, the sup is attained. Moreover, (4.30) has an optimal solution (µi )i∈∆ such that each µi is supported on finitely many points of Ki , i.e., µi a finite linear combination of Dirac measures, i ∈ ∆. In addition, if the running intersection property (4.33) holds then ρsparse mom = ρmom .
Proof. By Theorem 1.3 the moment problem (4.2) is solvable with optimal value ρmom . Let µ be an optimal solution and for every i ∈ ∆, let µi := πi µ be the marginal of µ on Rni (projection on the variables x(Ii )). Then the family (µi )i∈∆ is feasible for the moment problem (4.30) with value Z XZ fi dµi = f dµ = ρmom , i∈∆
Ki
K
which shows that ρsparse mom ≥ ρmom . We next prove that (4.30) is solvable. As it has a feasible solution, let (µni )i∈∆ be a maximizing sequence of (4.30). Let k ∈ Γ be such that hki > 0 on Ki , for every i = 1, . . . , p. Then hki ≥ δi P on Ki for some δi > 0, and so i∈∆ δi µi (Ki ) ≤ γk , which shows that all feasible solutions (µi )i∈∆ are uniformly bounded. Therefore, as every Ki is compact, and the measures µni are bounded uniformly in n, there is a subsequence (nk ) and a measure µ∗i on Ki , such that for every i = 1, . . . , p, µni k ⇒ µ∗i as k→∞.1 As Ki is compact, for every j ∈ Γ, p Z p Z X X γj = hji dµni k → hji dµ∗i as k→∞. i=1
Similarly, R
Ki
πjl µnj k →πjl µ∗j
i=1
Ki
as k→∞.2 Hence, πjl µ∗j = πlj µ∗l for all (j, l) with
1 Recall that µnk i R nk ∗ Ki g dµi → Ki g dµi as 2 This is because for g
where
Z
∗ πjl
Kj ∩Kl
⇒ µ∗i is the weak convergence of measures, i.e., k→∞, for all functions g continuous on Ki . See Section B.1 ∈ R[x(Ijl )], Z Z Z n n ∗ ∗ πjl g dµj k → πjl g dµ∗j = g d(πjl µ∗j ), g d(πjl µj k ) = Kj
: R[x(Ijl )]→R[x(Ij )] is given by
Kj
∗ g)(x(I )) (πjl j
Kj
= g(x(Ijl )) for all x(Ij ) ∈ Rnj .
4 Algorithms for Moment Problems
94
Ij ∩ Il 6= ∅. And so, (µ∗i )i∈∆ is a feasible solution for (4.30) with value p Z X i=1
Ki
fi dµ∗i
= lim
k→∞
p Z X i=1
Ki
fi dµni k = ρsparse mom ,
the desired result. Finally, to prove that ρsparse mom = ρmom if (2.32) holds, and in view of sparse ρmom ≥ ρmom , it suffices to prove that given an optimal solution (µi )i∈∆ of (4.30), one may construct a measure µ feasible for (4.2), with same value as ρsparse mom . By Lemma B.13, there exists a measure µ on K such that its projection πk µ on Rnk (i.e., on the variables x(Ik )) is the measure µk , k ∈ ∆. Therefore, as hki ∈ R[x(Ii )] for every k ∈ Γ, i ∈ ∆, we obtain Z XZ XZ hk dµ = hki dµ = hki dµi 5 γk , ∀ k ∈ Γ, K
i∈∆
K
i∈∆
Ki
which proves that µ is feasible for the moment problem (4.2). Similarly, as P f = i fi with fi ∈ R[x(Ii )], we obtain ρsparse mom =
p Z X i=1
fi dµi = Ki
p Z X i=1
fi dµ = K
Z
f dµ, K
and so, µ is an optimal solution of (4.2) with value ρmom = ρsparse mom . 4.6.1
Sparse semidefinite relaxations
Recall that the moment and localizing matrices Mi (y) and Mi (gj y) used in the semidefinite relaxations (4.5) are indexed in the canonical basis of R[x]. For every α ∈ Nn , let supp (α) ∈ I be the support of α, i.e., supp (α) := { i ∈ {1, . . . , n} :
αi 6= 0 },
α ∈ Nn .
For instance, with n = 6 and α := (004020), supp (α) = {3, 5}. For each k = 1, . . . , p, let Ik be the set of all subsets of Ik . Next, define Sk := { α ∈ Nn :
supp (α) ∈ Ik },
k = 1, . . . , p.
(4.34)
A polynomial h ∈ R[x(Ik )] can be viewed as a member of R[x], and is written X h(x) = h(x(Ik )) = hα x α (4.35) α∈Sk
4.6 Exploiting Sparsity
95
i Instead of defining moments variables yi = (yα ) (with α ∈ Nni ) for each measure µi , i = 1, . . . , p, and relating them via the equality constraints (4.31), we find better to use a single y = (yα ) with α ∈ Nn but use only those yα with supp(α) ∈ ∪pi=1 Ik . With k ∈ {1, . . . , p} fixed, and g ∈ R[x(Ik )], let Mi (y, Ik ) (resp. Mi (g y, Ik )) be the moment (resp. localizing) submatrix obtained from Mi (y) (resp. Mi (g y)) by retaining only those rows (and columns) α ∈ Nn of Mi (y) (resp. Mi (g y)) such that supp(α) ∈ Ik . In doing so, Mi (y, Ik ) and Mi (g y, Ik ) can be viewed as moment and localizing matrices with rows and columns indexed in the canonical basis of R[x(Ik )]i . Indeed, Mi (y, Ik ) contain only variables yα with supp (α) ∈ Ik , and so does Mi (g y, Ik ) because g ∈ R[x(Ik )]. And for every polynomial u ∈ R[x(Ik )]i , with coefficient vector u in the canonical basis, we also have
hu, Mi (y, Ik )ui = Ly (u2 ), 2
hu, Mi (g y, Ik )ui = Ly (g u ),
∀ u ∈ R[x(Ik )]i
∀ u ∈ R[x(Ik )]i ,
and therefore, Mi (y, Ik ) 0 ⇔ Ly (u2 ) ≥ 0, 2
Mi (g y, Ik ) 0 ⇔ Ly (g u ) ≥ 0,
∀ u ∈ R[x(Ik )]i
∀ u ∈ R[x(Ik )]i .
The sparse analogue of the semidefinite relaxations (4.5) read:
ρsparse = sup Ly (f ) i y
s.t. Ly (hj ) 5 γj , Mi (y, Ik ) 0,
∀j ∈ Γ
k = 1, . . . , p
Mi−vj (gj y, Ik ) 0, ∀ j ∈ Jk , k = 1, . . . , p.
The dual of (4.36) is the semidefinite program:
(4.36)
4 Algorithms for Moment Problems
96
(ρsparse )∗ = i s.t.
inf
λ,σkj
X j∈Γ
X
λj γ j
j∈Γ
λj hj −
p X
σk0 +
k=1
λj ≥ 0, j ∈ Γ+
X
l∈Jk
σkl gl
!
−f = 0
(4.37)
σk0 , σkl ∈ Σ[x(Ik )], l ∈ Jk , k = 1, . . . , p deg σk0 , deg σkl gl ≤ 2i, l ∈ Jk , k = 1, . . . , p.
Theorem 4.7. Assume that the moment problem (4.2) has optimal value ρmom > −∞, let Assumption 2.3 and 2.4 hold as well as (4.29), and suppose that for some q ∈ Γ, hqi > 0 on Ki for every i = 1, . . . , p. Consider the sparse semidefinite relaxations (4.36) and (4.37). If the running intersection property (4.33) holds then ρsparse ↓ ρmom i as i→∞.
The proof of Theorem 4.7 being tedious, it is postponed to Section 4.10. 4.6.2
Computational complexity
The number of variables for the sparse semidefinite relaxation (4.36) (with Pp optimal value ρsparse ) is bounded by k=1 nkn+2i , and so, if all nk ’s are i k close to each other, say nk ≈ n/p for all k, then one has at most O(p( np )2i ) variables, a big saving when compared with O(n2i ) in the original semidefinite relaxation (4.5). In addition, one also has p moment matrices of size O(( np )i ) and m + p localizing matrices of size O(( np )i−v ) (where v = maxj ddeg gj /2e) to be compared with a single moment matrix of size O(ni ) and m localizing matrices of size at most O(ni−v ) in (4.5). When using an interior point method to solve (4.36), it is definitely better to handle p matrices, each of size (n/p)i , rather than a single one of size ni .
4.7 Summary
4.7
97
Summary
We have presented two general numerical approximation schemes for computing (or approximating) the optimal value ρmom of the generalized moment problem (4.2) with polynomial data. An important observation is that convexity of the underlying sets K does not influence the approach. The underlying (linear or semidefinite) relaxations are convex for arbitrary basic semi-algebraic sets K. In fact, the set K might have a very complicated geometry. This observation underlies the generality of the approach. Of course, the price to pay is that the size of the relaxations might be very large. The first numerical scheme consists of solving a hierarchy of semidefinite relaxations of increasing size, where the increase in size is due to allowing larger and larger (polynomial) degrees in some s.o.s. representation of polynomials, positive on K. Thanks to some powerful Positivstellensatz, convergence of the sequence of optimal values to the exact optimal value ρmom is guaranteed. In addition, if some rank condition holds at an optimal solution of some semidefinite relaxation in the hierarchy, finite convergence takes place and one may extract finitely many points which are the support of a measure, optimal solution of the generalized moment problem. The second numerical scheme consists of solving an analogous hierarchy of linear relaxations of increasing size. Again, the increase in size is due to allowing larger and larger (polynomial) degrees in some other representation of polynomials, positive on K. Thanks to some other ad hoc Positivstellensatz, convergence of the sequence of optimal values to the exact optimal value ρmom is also guaranteed. Importantly, the above numerical scheme can also be applied for variations of the generalized moment problem (4.2), provided the problem is described with polynomials, or to some extent, piecewise polynomials. Convergence is proved here for the special (but still relatively general) cases considered in this section, but sometimes, convergence in more complicated moment problems may also be proved using tailored arguments, depending on the problem on hand. The methodology we outlined leads to semidefinite and linear relaxations whose solution approximates (and sometimes computes exactly) the optimal value ρmom of problem (4.2). We have also considered variants the moment problem (4.2), namely: • (4.2) with countably many moment constraints.
4 Algorithms for Moment Problems
98
• (4.24) with several measures on (possibly) different spaces (as in e.g. Exercise 4.3), with their support contained on basic semialgebraic sets, explicitly defined by polynomial inequalities. For those variants, we have proved convergence of the corresponding semidefinite relaxations. Finally, we have also considered sparse semidefinite relaxations to treat the moment problem (4.2) when its data (f, K, hj ) exhibit a certain sparsity pattern (often present in many large scale problems). When the sparsity patterns satisfies some condition (well-known in graph theory), then convergence is also proved. This opens up the door to the possibility of solving moment problems of large dimensions. Of course, one may also think of several other variants of (4.2) for which the methodology we outlined can be applied but with no general convergence result available. However, the semidefinite relaxations provide tighter and tighter upper bounds on the optimal value of these variants of (4.2).
4.8
Exercises
Exercise 4.1. Suppose K = Rn . Provide an example such that the sequence of relaxations in Theorem 4.1 does not converge to ρmom . Exercise 4.2. Prove that Theorem 4.4 can be extended to handle countably many moment constraints, i.e., if the set Γ is countable instead of being finite. Exercise 4.3. Let Γ be a finite set of indices, f1 ∈ R[x], f2 ∈ R[z]. Let (h1j )j∈Γ ⊂ R[x] and (h2j )j∈Γ ⊂ R[z], and let K1 ⊂ Rn , K2 ⊂ R be basic semi-algebraic sets. Consider the problem Z Z ρmom = sup f1 dµ1 + f2 dµ2 µ1 ∈M (K1 ),µ2 ∈M (K2 ) K1 K Z Z2 (4.38) s.t. h1j dµ1 + h2j dµ2 = γj , j ∈ Γ K1
K2
involving two measures µ1 and µ2 . Let K2 ⊂ R be the interval [0, +∞). Let Assumption 2.1 hold for K1 and hki > 0, i = 1, 2 for some k ∈ Γ. (a) Can we extend Theorem 4.4? (b) Write the semidefinite relaxations (4.26) for Problem (4.38).
4.10 Proofs
99
(c) Can we prove that the sequence of relaxations converges to the value ρmom ? Exercise 4.4. Assume that Theorem 1.3 holds. Give a simple proof of Theorem 4.7. (Hint: Use Theorem 2.28.) 4.9
Notes and Sources
The approach in this chapter was outlined in Lasserre (2008b). Theorem 4.7 was proved in Lasserre (2006a) for polynomial optimization problems with sparsity, i.e., when Γ = {0} and h0 = 1. 4.10 4.10.1
Proofs Proof of Theorem 4.3
Call Qi the semidefinite program (4.22). We first prove that Qi has a feasible solution. Let µ be a feasible solution of the moment problem (4.2), and let y = (yα ) be the sequence of moments of µ (well-defined because µ is supported on K). Recalling the definition of Mi (y) and Mi−vj (gj y), one has Mi (y) 0 and Mi−vj (gj y) 0. In addition, Z Ly (hk ) = hk dµ 5 γk , ∀k ∈ Γ, K
which proves that y is a feasible solution of Qi . Therefore ρmom ≤ ρi for every i. We next prove that ρi < ∞ for all sufficiently large i. Let 2i0 ≥ max[deg h0 , deg f, maxj vj ], and let Nni := {α ∈ Nn : |α| ≤ i}, i ∈ N. As K is compact, there exists N such that N ± xα > 0 on K for all α ∈ Nni0 , and in view of Assumption 2.1, the polynomial x 7→ N ± xα belongs to the quadratic module Q ⊂ R[x] generated by {gj } ⊂ R[x], i.e., m X Q := σ0 + σj gj : σj ∈ Σ[x] ∀ j ∈ {0, . . . , m} . j=1
Similarly, as h0 > 0 on K and K is compact, h0 − δ > 0 on K for some δ > 0, and so h0 − δ ∈ Q. But there is even some integer l(i0 ) such that h0 − δ ∈ Ql(i0 ) and N ± xα ∈ Q(l(i0 )) for all α ∈ Ni0 , where Q(t) ⊂ Q is the set of elements Pm of Q which have a representation σ0 + j=1 σj gj for some {σj } ⊂ Σ[x]
100
4 Algorithms for Moment Problems
with deg σ0 ≤ 2t and deg σj gj ≤ 2t for all j = 1, . . . , m. Of course, we also have h0 − δ ∈ Q(l) and N ± xα ∈ Q(l) for all α ∈ Nni0 , whenever l ≥ l(i0 ). Therefore, consider i = l(i0 ) (≥ i0 ). For every feasible solution y of Ql(i0 ) , the constraint Ly (h0 ) 5 γ0 yields that y0 ≤ M := γ0 /δ. Indeed, as h0 − δ ∈ Q(l(i0 )), γ0 − δy0 = Ly (h0 − δ) = Ly (σ0 ) +
m X
Ly (σj gj )
j=1
for some (σj ) ⊂ Σ[x]. Therefore, the semidefinite constraints Ml(i0 ) (y) 0 and Ml(i0 )−vj (gj y) 0 yield Ly (h0 − δ) ≥ 0. Similarly, as N ± xα ∈ Q(l(i0 )), N y0 ± Ly (xα ) = Ly (N ± xα ) = Ly (σ0 ) +
m X
Ly (σj gj )
j=1
for some (σj ) ⊂ Σ[x]. Hence, for same reasons, Ly (N ± xα ) ≥ 0, which yields | Ly (xα ) | ≤ N M, α ∈ Nni0 . P Next, Ly (f ) ≤ N M α |fα | because 2i0 ≥ degf and so ρl(i0 ) < +∞. From what precedes, and with s ∈ N arbitrary, let l(s) ≥ s be such that Ns ± xα ∈ Q(l(s))
∀ α ∈ Nns ,
(4.39)
for some Ns . Next, let r ≥ l(i0 ) (so that ρr < +∞), and let yr be a nearly optimal solution of Qr with value 1 1 ρr ≥ Lyr (f ) ≥ ρr − ≥ ρmom − . (4.40) r r Fix s ∈ N. Notice that from (4.39), for all r ≥ l(s), one has | Lyr (xα ) | ≤ Ns M,
∀ α ∈ Nns .
Therefore, for all r ≥ l(i0 ), |yαr | = | Lyr (xα ) | ≤ Ns0 ,
∀ α ∈ Nns ,
where Ns0 = max[Ns M, Vs ], with r Vs := max {|yα | : α ∈ Ns ; l(i0 ) ≤ r < l(s)}.
(4.41)
4.10 Proofs
101
Complete each yr with zeros to make it an infinite vector in l∞ , indexed in the canonical basis of R[x]. In view of (4.41), one has r |yα | ≤ Ns0 ,
∀ α ∈ N;
2s − 1 ≤ |α| ≤ 2s,
for all s = 1, 2, . . .. br ∈ l∞ defined by yb0r := y0r /M , and Hence, define the new sequence y r ybα :=
r yα Ns0
∀ α ∈ Nn ,
2s − 1 ≤ |α| ≤ 2s
for all s = 1, 2, . . ., and in l∞ , consider the sequence (b yr )r as r→∞. Obr viously, the sequence (b y )r is in the unit ball B1 of l∞ , and so, by the Banach–Alaoglu theorem (see, e.g., (Ash, 1972, Theorem. 3.5.16)), there b ∈ B1 and a subsequence {ri }, such that y bri →b y as i→∞ for the exists y weak ? topology σ(l∞ , l1 ) of l∞ . In particular, pointwise convergence holds, that is, ri lim ybα → ybα ,
i→∞
Next, define y0 := M yb0 and yα := ybα × Ns0 ,
∀ α ∈ Nn .
2s − 1 ≤ |α| ≤ 2s,
s = 1, 2, . . . .
bri →b y implies the pointwise convergence The pointwise convergence y ri y →y, i.e., ri lim yα → yα ,
i→∞
∀ α ∈ Nn .
(4.42)
Let s ∈ N be fixed. From the pointwise convergence (4.42), Similarly, limi→∞ Ms (gj yri ) = limi→∞ Ms (yri ) = Ms (y) 0. Ms (gj y) 0, for all j = 1, . . . , m. And as s was arbitrary, we obtain Mr (y) 0;
Mr (gj y) 0,
j = 1, . . . , m;
r = 0, 1, 2, . . . ,
which, by Theorem 3.8(b), implies that y has a finite Borel representing measure µ with support contained in K. In addition, fix k ∈ Γ arbitrary. The pointwise convergence (4.42) yields limi→∞ Lyri (hk ) = Ly (hk ), and as k ∈ Γr for all sufficiently large r, we obtain Ly (hk ) 5 γk . And so, as k ∈ Γ was arbitrary, Z hk dµ = Ly (hk ) = lim Lyri (hk ) 5 γk , ∀k ∈ Γ, K
i→∞
which proves that µ is a feasible solution of the moment problem (4.2).
4 Algorithms for Moment Problems
102
On the other hand, (4.40) and (4.42) also yield ρmom ≤ lim ρri = lim Lyri (f ) = Ly (f ) = i→∞
i→∞
Z
f dµ, K
and so µ is an optimal solution of (4.2). 4.10.2
Proof of Theorem 4.7
With ρsparse mom being the optimal value of (4.30), it is sufficient to prove that sparse ρsparse ↓ ρsparse mom because by Theorem 4.6 we have ρmom = ρmom . Define i the sets: Θki := { α ∈ Nn : supp (α) ∈ Ik ; |α| ≤ 2i }, k = 1, . . . , p, ( ) p p [ [ Θi := Θki = α ∈ Nn : supp (α) ∈ Ik ; |α| ≤ 2r , k=1
Θ :=
[
i∈N
Θi =
(
k=1
α∈N
n
: supp (α) ∈
p [
k=1
Ik
)
,
and call Qi the semidefinite program (4.36). We first prove that Qi has a feasible solution. Let ν be a feasible solution of the moment problem (4.2), and let Z yα = xα dν ∀ α ∈ Θi . Recalling the definition of Mi (y, Ik ) and Mi−vj (gj y, Ik ), one has Mi (y, Ik ) 0 and Mi−vj (gj y, Ik ) 0; therefore, y is an obvious feasible solution of Qi . Next we prove that ρsparse < +∞ for all sufficiently i large i. Let 2i0 ≥ max[deg f, maxj deg gj , maxk deg hk ]. In view of Assumption 2.3 and as hqk > 0 on Kk for every k = 1, . . . , p, hqk −δk > 0 on Kk for some δk > 0, k = 1, . . . , p. Next, again by compactness of Kk , there exists N such that N ± xα > 0 on Kk for all α ∈ Θki0 , and all k = 1, . . . , p. Therefore, for every k = 1, . . . , p and α ∈ Θki0 , the polynomials x 7→ hqk (x) − δk and x 7→ N ± xα belong to the quadratic module Qk ⊂ R[x(Ik )] generated by {gj }j∈Jk ⊂ R[x(Ik )], i.e., X Qk := σ0 + σj gj : σj ∈ Σ[x(Ik )] ∀ j ∈ {0} ∪ Jk . j∈Jk
4.10 Proofs
103
But there is even some l(i0 ) such that hqk − δk and N ± xα belong to Qk (l(i0 )) for all α ∈ Θki0 and k = 1, . . . , p, where Qk (t) ⊂ Qk is the set P of elements of Qk which have a representation σ0 + j∈Jk σj gj for some {σj } ⊂ Σ[x(Ik )] with deg σ0 ≤ 2t and deg σj gj ≤ 2t for all j ∈ Jk . Of course we also have hqk − δk ∈ Qk (l) and N ± xα ∈ Qk (l) for all α ∈ Θki0 , whenever l ≥ l(i0 ). Therefore, consider i = l(i0 ) (≥ i0 ). For every feasible solution y of Ql(i0 ) , γq − y 0
p X
δk =
k=1
p X
k=1
Ly (hqk − δk ) ≥ 0.
Indeed, for every k = 1, . . . , p, Ly (hqk − δk ) = Ly (σ0 ) +
X
Ly (σj gj )
j∈Jk
for some (σj ) ⊂ Σ[x(Ik )], and so the semidefinite constraints Ml(i0 ) (y, Ik ) 0 and Ml(i0 )−vj (gj y, Ik ) 0, j ∈ Jk , yield Ly (hqk − δk ) ≥ 0, k = 1, . . . , p. P Therefore y0 ≤ M := γq ( k δk )−1 . Similarly, X N y0 ± Ly (xα ) = Ly (N ± xα ) = Ly (σ0 ) + Ly (σj gj ) j∈Jk
for some (σj ) ⊂ Σ[x(Ik )]. And so for same reasons, Ly (N ± xα ) ≥ 0 which in turn implies | Ly (xα ) | ≤ N M,
α ∈ Θki0 ; k = 1, . . . , p. P As 2i0 ≥ degf , it follows that Ly (f ) ≤ N M α |fα |. This is because by Assumption 2.4, fα 6= 0 ⇒ α ∈ Θi0 . Hence ρsparse l(i0 ) < +∞. From what precedes, and with s ∈ N arbitrary, let l(s) ≥ s be such that Ns ± xα ∈ Qk (l(s))
∀ α ∈ Θks ;
k = 1, . . . , p,
(4.43)
for some Ns . Next, let r ≥ l(i0 ) (so that ρsparse < +∞), and let yr be a r nearly optimal solution of Qr with value 1 1 sparse sparse ≥ ρmom − . (4.44) − ρr ≥ Lyr (f ) ≥ ρr r r Fix s ∈ N. Notice that from (4.43), for all r ≥ l(s), one has | Lyr (xα ) | ≤ Ns M
∀ α ∈ Θs .
4 Algorithms for Moment Problems
104
Therefore, for all r ≥ i0 , |yαr | = | Lyr (xα ) | ≤ Ns0
∀ α ∈ Θs ,
(4.45)
where Ns0 = max[Ns M, Vs ], with r Vs := max {|yα | : α ∈ Θs ; l(i0 ) ≤ r < l(s)}.
Complete each yr with zeros to make it an infinite vector in l∞ , indexed r in the canonical basis of R[x]. Notice that yα 6= 0 only if α ∈ Θ. In view of (4.45), for every s = 1, 2, . . ., one has: r |yα | ≤ Ns0
∀ α ∈ Θ;
2s − 1 ≤ |α| ≤ 2s.
(4.46)
br ∈ l∞ defined by yb0r := y0r /M , and Hence, define the new sequence y r ybα :=
r yα Ns0
∀ α ∈ Θ,
2s − 1 ≤ |α| ≤ 2s,
for all s = 1, 2, . . ., and in l∞ , consider the sequence (b yr )r as r→∞. Obr viously, the sequence (b y )r is in the unit ball B1 of l∞ , and so, by the Banach–Alaoglu theorem (see, e.g., (Ash, 1972, Theorem. 3.5.16)), there b ∈ B1 and a subsequence {ri }, such that y bri →b y as i→∞ for the exists y weak ? topology σ(l∞ , l1 ) of l∞ . In particular, pointwise convergence holds, that is, ri lim ybα → ybα ,
i→∞
∀ α ∈ Nn .
Notice that ybα 6= 0 only if α ∈ Θ. Next, define y0 := M yb0 and yα := Ns0 ybα ,
2s − 1 ≤ |α| ≤ 2s,
s = 1, 2, . . . .
bri →b y implies the pointwise convergence The pointwise convergence y ri y →y: ri lim yα → yα
i→∞
∀ α ∈ Θ.
(4.47)
Let s ∈ N be fixed. From the pointwise convergence (4.47), lim Ms (yri , Ik ) = Ms (y, Ik ) 0,
i→∞
k = 1, . . . , p.
Similarly lim Ms (gj yri , Ik ) = Ms (gj y, Ik ) 0,
i→∞
j ∈ Jk ,
k = 1, . . . , p.
4.10 Proofs
105
As s was arbitrary, we obtain that for all k = 1, . . . , p, Mr (y, Ik ) 0;
Mr (gj y, Ik ) 0,
j ∈ Jk ;
r = 0, 1, 2, . . . .
(4.48)
Introduce the subsequence yk obtained from y by yk := { yα : supp (α) ∈ Ik }
∀ k = 1, . . . , p.
(4.49)
Recall that Mr (y, Ik ) (resp., Mr (gj y, Ik )) is also the moment matrix Mr (yk ) (resp., the localizing matrix Mr (gj yk )) for the sequence yk indexed in the canonical basis of R[x(Ik )]. By Theorem 3.8(b), (4.48) implies that yk has a representing measure µk with support contained in Kk , k = 1, . . . , p. Next, let j, k be such that Ijk (= Ij ∩ Ik ) 6= ∅, and recall that Ijk is the set of all subsets of Ijk . Observe that from the definition (4.49) of yj and yk , one has j k yα = yα
∀ α with supp (α) ∈ Ijk ,
and as measures on compact sets are moment determinate, it follows that the marginal probability measures of µj and µk on the variables x(Ijk ) are the same probability measure, denoted µjk . That is, Z k j yα = yα = xα dµjk ∀ α with supp (α) ∈ Ijk , Kjk
for some measure µjk supported on the set Kjk = { x ∈ Rnjk :
x(Ij ) ∈ Kj ;
x(Ik ) ∈ Kk }.
But then, by Lemma B.13 (to be adapted to measures instead of probability measures), there exists a measure µ on Rn with support contained in K, such that its projection πk µ on Rnk (on the variables x(Ik )) is just µk , for every k = 1, . . . , p. Next, from the pointwise convergence (4.47), Z hk dµ = Ly (hk ) = lim Lyri (hk ) 5 γk ∀k ∈ Γ, i→∞
K
which proves that µ is a feasible solution of the moment problem (4.2). On the other hand, (4.44) and (4.47) again, also yield Z sparse r ρmom ≤ lim ρri = lim Ly i (f ) = Ly (f ) = f dµ, i→∞
i→∞
K
from which we conclude that µ is in fact an optimal solution of (4.2).
Chapter 5
Global Optimization over Polynomials
This chapter is on global optimization, probably the simplest instance of the generalized moment problem. We detail the semidefinite relaxations of Chapter 4 for minimizing a polynomial on Rn and on a compact basic semi-algebraic set. We also discuss the linear relaxations and several particular cases, e.g. when K is a polytope or a finite variety. In particular, this latter case encompasses all 0 − 1 discrete optimization problems.
In this chapter, we address the following two fundamendal optimization problems: (a) The unconstrained polynomial optimization problem: f ∗ := min { f (x) : x ∈ Rn }
(5.1)
(b) The constrained polynomial optimization problem: fK∗ := min { f (x) : x ∈ K }
(5.2)
where f ∈ R[x] is a real-valued polynomial and K = {x ∈ Rn : gi (x) ≥ 0, i = 1, . . . , m},
(5.3)
and f, gi ∈ R[x], i = 1, . . . , m. While the framework of this chapter is a special case of the previous chapter, because of the more special structure we will be able to develop a deeper theory. The basic semi-algebraic set K is assumed to be a compact subset of Rn . We do not assume that K is convex or even connected. As observed earlier, this is a rather rich modeling framework that includes linear, quadratic, 0/1, mixed 0/1 optimization problems as special cases. In 109
110
5 Global Optimization over Polynomials
particular, constraints of the type xi ∈ {0, 1} can be written as x2i − xi ≥ 0 and xi − x2i ≥ 0. When n = 1, we have seen in Chapter 2 that a nonnegative univariate polynomial can be written as a sum of squares of polynomials. We will see that this condition naturally leads to reformulating Problems (5.1)-(5.2) as semidefinite optimization problems, for which efficient algorithms and software packages are available. It is quite remarkable that a nonconvex problem can be reformulated as a convex one and underscores the importance of the representation theorems from Chapter 2. On the other hand, the multivariate case radically differs from the univariate case, because not every nonnegative polynomial can be written as a sum of squares of polynomials. Moreover, Problem (5.1) involving a polynomial f of degree greater than or equal to four on n-variables is N P hard. However, we will see that Problem (5.1) can be approximated as closely as desired (and often can be obtained exactly) by solving a finite sequence of semidefinite optimization problems of the same type as in the one-dimensional case. A similar conclusion also holds for Problem (5.2). 5.1
The Primal and Dual Perspectives
Let K ⊂ Rn be as in (5.3) and recall that M (Rn )+ and M (K)+ are the spaces of finite Borel measures on Rn and K, respectively. Following Chapter 1, we reformulate Problems (5.1), (5.2) as follows:
ρmom :=
min
µ∈M (Ω)+
Z
f dµ Ω
(5.4)
s.t. µ(Ω) = 1
i.e., (5.4) is just (1.1) (but now minimizing) with Γ = {1}, h1 = 1, γ1 = 1, and with Ω = Rn or Ω = K with K as in (5.3). Therefore by Theorem 1.1, Theorem 5.1. (a) Problem (5.1) is equivalent to problem (5.4) with Ω = Rn , and so f ∗ = ρmom . (b) Problem (5.2) is equivalent to problem (5.4) with Ω = K, and so, fK∗ = ρmom .
5.2 Unconstrained Polynomial Optimization
111
Taking the dual perspective developed in Chapter 1, we also rewrite Problem (5.1) (resp. Problem (5.2)) as
ρpop = sup λ s.t. f (x) − λ ≥ 0,
∀x ∈ Ω,
(5.5)
with Ω = Rn (resp. Ω = K). It is clear that ρpop = ρmom = f ∗ if Ω = Rn and ρpop = ρmom = fK∗ if Ω = K. We now consider linear and semidefinite relaxations for Problems (5.1) and (5.2), as special cases of the numerical scheme developed in Chapter 4 for the generalized moment problem. In doing so, for Problem (5.1), we obtain a single semidefinite relaxation, which is exact if and only if the polynomial x 7→ f (x) − f ∗ is a sum of squares. For Problem (5.2), we obtain a hierarchy of semidefinite relaxations whose sizes depend on the degree d permitted for the s.o.s. weights in the representation (2.13) in Theorem 2.14. The larger the degree the better the optimal value λ. We may also consider the alternative representation of f (x) − λ as an element of the cone CG in Eq. (2.18), and invoke Theorem 2.23. In doing so, we obtain a hierarchy of linear relaxations, instead of semidefinite relaxations.
5.2
Unconstrained Polynomial Optimization
In the unconstrained case Ω = Rn , the only interesting case is when deg f is even, otherwise necessarily f ∗ = −∞. Therefore, let 2r be the degree of f ∈ R[x], and let i ≥ r. The semidefinite relaxation (4.5) of Chapter 4 reads:
ρi = inf { Ly (f ) : Mi (y) 0, y0 = 1 } y
with associated dual:
(5.6)
5 Global Optimization over Polynomials
112
ρ∗i = sup { λ : X 0 ; λ,X
hX, Bα i = fα − λδα=0 ,
∀ |α| ≤ 2i }
(5.7)
where Mi (y), the moment matrix associated with the sequence y, is written P Mi (y) = |α|≤2i yα Bα for appropriate symmetric matrices {Bα }, and δα=0 is the Kronecker symbol. In the present context, the interpretation (4.9) reads:
ρ∗i = sup {λ : f − λ ∈ Σ(x) }.
(5.8)
λ
Therefore, one has to consider only one semidefinite relaxation, namely that with i = r, because obviously, if f has degree 2r then f − λ cannot be a sum of squares of polynomials with degree larger than r. That is, ρ∗i = ρ∗r for all i > r. Proposition 5.2. There is no duality gap, that is, ρr = ρ∗r . And, if ρr > −∞ then (5.8) has an optimal solution. Proof. The result follows from the duality theory of semidefinite optimization if we can prove that there is a strictly feasible solution y of Problem (5.6), i.e., such that Mr (y) 0 (Slater condition); see Theorem C.17. Let µ be a probability measure on Rn with a strictly positive density f with respect to the Lebesgue measure and with all its moments finite; that is, µ is such that Z Z α yα = x dµ = xα f (x) dx < ∞, ∀ α ∈ Nn . Rn
Rn
Then Mr (y), with y as above, is such that Mr (y) 0. To see this, recall that for every polynomial q ∈ R[x] of degree at most r, and vector of coefficients q ∈ Rs(r) , Z hq, qiy = hq, Mr (y)qi = q 2 dµ (from (3.11)) n ZR = q(x)2 f (x) dx Rn
> 0, whenever q 6= 0 (as f > 0).
Therefore, y is strictly feasible for Problem (5.6), i.e., Mr (y) 0, the desired result.
5.2 Unconstrained Polynomial Optimization
113
We next prove the central result of this section.
Theorem 5.3. Let f ∈ R[x] be a 2r-degree polynomial with global minimum f ∗ > −∞. (a) If the nonnegative polynomial f − f ∗ is s.o.s., then Problem (5.1) is equivalent to the semidefinite optimization problem (5.6), i.e., f ∗ = ρ∗r = ρr and if x∗ ∈ Rn is a global minimizer of (5.1), then the moment vector y∗ := (x∗1 , . . . , x∗n , (x∗1 )2 , x∗1 x∗2 , . . . , (x∗1 )2r , . . . , (x∗n )2r )
(5.9)
is a minimizer of Problem (5.6). (b) If Problem (5.7) has a feasible solution and f ∗ = ρ∗r , then f − f ∗ is s.o.s.
Proof.
(a) Let f − f ∗ be s.o.s., that is, f (x) − f ∗ =
k X
qi (x)2 ,
i=1
x ∈ Rn ,
(5.10)
for some polynomials {qi }ki=1 ⊂ R[x]r with coefficient vector qi ∈ Rs(r) , i = 1, 2, . . . , k, with s(r) = n+r n . Equivalently, with vr (x) as in (3.9), with X =
Pk
i=1
f (x) − f ∗ = hX, Mr (y)i,
x ∈ Rn ,
(5.11)
qi q0i and y = v2r (x). From Eq. (5.11) it follows that
hX, B0 i = f0 − f ∗ ;
hX, Bα i = fα , for all 0 6= α, |α| ≤ 2r,
so that (as X 0) X is feasible for problem (5.7) with value λ := f ∗ . Since y∗ in (5.9) is feasible for Problem (5.6) with value f ∗ and ρ∗r = ρr , it follows that y∗ and X are optimal solutions to Problems (5.6) and (5.7), respectively. (b) Suppose that Problem (5.7) has a feasible solution and f ∗ = ρ∗r . Then, from Proposition 5.2, Problem (5.7) has an optimal solution (X∗ , f ∗ ), with X∗ 0, and there is no duality gap, that is, ρr = ρ∗r . As X∗ 0, we use Pk its spectral decomposition to write X∗ = i=1 qi q0i for some vector {qi }
5 Global Optimization over Polynomials
114
in Rs(r) . Using the feasibility of (X∗ , f ∗ ) in (5.7), we obtain *
X∗ ,
which, using X∗ =
Pk
X
i=1
X
Bα x α
α
+
= f (x) − f ∗ ,
qi q0i and
Bα xα = Mr (v2r (x)) = vr (x)vr (x)0 ,
α
yields the desired sum of squares f (x) − f ∗ =
k X i=1
hqi q0i , vr (x)vr (x)0 i =
k X i=1
hqi , vr (x)i2 =
k X
qi2 (x),
i=1
with the polynomials x 7→ qi (x) := hqi , vr (x)i, for all i = 1, . . . , k.
From the proof of Theorem 5.3, it is obvious that if f ∗ = ρ∗r , then any global minimizer x∗ of f is a zero of each polynomial qi , where X∗ = Pk 0 ∗ ∗ i=1 qi qi at an optimal solution X of problem (5.7). When f −f is s.o.s., solving Problem (5.7) provides the polynomials qi of such a decomposition. As a corollary, we obtain the following. Corollary 5.4. Let f ∈ R[x] be of degree 2r. Assume that Problem (5.7) has a feasible solution. Then, f (x) − f ∗ =
k X i=1
qi (x)2 − [f ∗ − ρ∗r ],
x ∈ Rn ,
(5.12)
for some real-valued polynomials qi ∈ R[x] of degree at most r, i = 1, 2, . . . , k. The proof is the same as that of Theorem 5.3(b), except now we may not have f ∗ = ρ∗r , but instead ρ∗r ≤ f ∗ . Hence, ρ∗r always provides a lower bound on f ∗ . Corollary 5.4 states that, up to some constant, one may always write f − f ∗ as a s.o.s. whenever Problem (5.7) has a feasible solution. The previous development leads to the following algorithm for either solving Problem (5.1) or providing a lower bound on its optimal value f ∗ .
5.2 Unconstrained Polynomial Optimization
115
Algorithm 5.1. ( Unconstrained polynomial optimization) P Input: A polynomial x 7→ f (x) = α∈Nn fα xα of degree 2r.
Output: The value f ∗ = minx∈Rn f (x) or a lower bound ρr on f ∗ . Algorithm: 1. Solve the semidefinite optimization problem (5.6) with optimal value ρr and optimal solution y∗ (if y∗ exists). 2. If rank Mr−1 (y∗ ) = rank Mr (y∗ ), then f ∗ = ρr and there are at least rank Mr (y∗ ) global minimizers, which are extracted by applying Algorithm 4.2. 3. Otherwise, ρr only provides a lower bound ρr ≤ f ∗ . We next show that Algorithm 5.1 correctly determines whether ρr is the exact solution value f ∗ or a lower bound.
Theorem 5.5. Let f ∈ R[x] with degree 2r, and suppose that the optimal value ρr of Problem (5.6) is attained at some optimal solution y ∗ . If rank Mr−1 (y∗ ) = rank Mr (y∗ ), then f ∗ = ρr , and there are at least rank Mr (y∗ ) global minimizers.
Proof. We have already shown that ρr ≤ f ∗ . If rank Mr−1 (y∗ ) = rank Mr (y∗ ), we let s = rank Mr (y∗ ). Then Mr (y∗ ) is a flat extension of Mr−1 (y∗ ), and so, by Theorem 3.7, y∗ is the vector of moments up to order 2r, ofR some s-atomic probability measure µ∗ on Rn . Therefore, ρr = Ly (f ) = f dµ∗ , which proves that µ∗ is an optimal solution of (5.4), and ρr = ρmom = f ∗ , because we always have ρr ≤ ρmom . We next show that each of the s atoms of µ∗ is a global minimizer of f . Indeed, being s-atomic, there is a family (x(k))sk=1 ⊂ Rn and a family (βk )sk=1 ⊂ R, such that µ∗ =
s X k=1
Hence,
βk δx(k) ,
βk > 0,
∀k = 1, . . . , s;
s X k=1
βk = 1,
116
5 Global Optimization over Polynomials
f ∗ = ρr =
Z
f dµ∗ = Rn
s X
βk f (x(k)),
k=1
which, in view of f (x(k)) ≥ f ∗ , for all k = 1, . . . , s, implies the desired result f (x(k)) = f ∗ , for all k = 1, . . . , s. As an illustration, suppose that y∗ satisfies rank Mr (y∗ ) = 1. Therefore, Mr (y∗ ) = vr (x∗ )vr (x∗ )0 for some x∗ ∈ Rn , that is, y∗ is the vector of moments up to order 2r of the Dirac measure at x∗ , and one reads an ∗ optimal solution x∗ , from the subvector of first“moments” yα with |α| = 1. ∗ Note also that when n = 1, f − f is always a s.o.s., so that we expect that ρr = f ∗ , where 2r is the degree of f . In other words, the global minimization of a univariate polynomial is a convex optimization problem, the semidefinite program (5.6). Therefore, in view of Section 2.4, we also expect that ρr = f ∗ for quadratic polynomials, and bivariate polynomials of degree 4. Let us illustrate these properties with an example. Example 5.1. We consider the polynomial f on two variables: x 7→ f (x) = (x21 + 1)2 + (x22 + 1)2 + (x1 + x2 + 1)2 . Note that in this case r = 2 and f −f ∗ is a bivariate polynomial of degree 4, and therefore it is a sum of squares. We thus expect that ρ2 = f ∗ . Solving Problem (5.6) for i = 2, yields a minimum value of ρ2 = −0.4926. In this case, it turns out that M2 (y∗ ) has rank one, and from the optimal solution y∗ , we check that y = (1, x∗1 , x∗2 , (x∗1 )2 , x∗1 x∗2 , (x∗2 )2 , . . . , (x∗1 )4 , . . . , (x∗2 )4 ), with x∗1 = x∗2 ≈ −0.2428. We observe that the solution x∗ is a good approximation of a global minimizer of problem (5.1), since the gradient vector verifies ∂f ∂f = = 4 · 10−9 . ∂x1 x=x∗ ∂x2 x=x∗ In this example, it follows that the semidefinite relaxation is exact. The reason we have ∇f (x∗ ) ≈ 0, but not exactly 0, is likely due anavoidable numerical inaccuracies when using a SDP solver.
5.3 Constrained Polynomial Optimization
117
The next example discusses when the approach does not work and the effect of perturbations. Example 5.2. Consider the bivariate polynomial x 7→ f (x) = x21 x22 (x21 + x22 − 1). In Exercise 2.4(a) we have shown that f + 1 is positive, but not a sum of squares. Note that the global optimum is f ∗ = −1/27 and there are √ √ ∗ four global minimizers x = (± 3/3, ± 3/3). If we consider the problem f ∗ = min f (x1 , x2 ) and apply Algorithm 5.1 for r = 3, we obtain that ρ3 = −∞, that is the bound is uninformative. However consider the perturbed problem of minimizing the polynomial 10 f := f (x) + (x10 1 + x2 ) with = 0.001. Applying Algorithm 5.1 using the software GloptiPoly we find that ρ5 ≈ f ∗ and four optimal solutions extracted (x1 , x2 ) ≈ x∗ . In other words, while the approach fails when applied to the original polynomial f , it succeeds when applied to the perturbed polynomial f .
5.3
Constrained Polynomial Optimization: Semidefinite Relaxations
In this section, we address Problem (5.2) with f ∈ R[x] being a polynomial of degree 2v0 or 2v0 − 1. The set K defined in (5.3) is assumed to be a compact basic semi-algebraic subset of Rn . For all j = 1, . . . , m, and depending on its parity, let 2vj , or 2vj − 1, denote the degree of the polynomials gj in the definition (5.3) of the set K. We assume that Assumption 2.1 holds, which as we discussed in Section 2.5, is verified in many cases, for example if there is one polynomial gk such that {x ∈ Rn : gk (x) ≥ 0} is compact; see the discussion following Theorem 2.14 and Theorem 2.15. Under Assumption 2.1, from Theorem 2.14, every polynomial f ∈ R[x] which is strictly positive on K, can be written f (x) = σ0 (x) +
m X j=1
gj (x)σj (x),
∀ x ∈ Rn ,
(5.13)
for some polynomials σj ∈ Σ[x], for all j = 0, 1, . . . , m. From the general scheme described in Chapter 4, the semidefinite relaxations (4.5) are as follows: For i ≥ maxj=0,...,m vj ,
5 Global Optimization over Polynomials
118
ρi = inf Ly (f ) y
s.t. Mi (y) 0,
(5.14)
Mi−vj (gj y) 0, j = 1, . . . , m y0 = 1.
The dual of problem (5.14) is the semidefinite program: ρ∗i = sup λ λ,{σj }
s.t. f − λ = σ0 +
m X
σj g j
(5.15)
j=1
deg σ0 ≤ 2i; deg σj ≤ 2i − 2vj , j = 1, . . . , m. Note that if in the definition of K there is an equality constraint gj (x) = 0 (i.e. two opposite inequality constraints gj (x) ≥ 0 and −gi (x) ≥ 0), then one has the equality constraint Mi−vj (gj y) = 0 in (5.14) and accordingly, in (5.15) the weight σj ∈ R[x] is not required to be a s.o.s. The overall algorithm is as follows:
Algorithm 5.2. (Constrained polynomial optimization) Input: A polynomial x 7→ f (x) of degree 2v0 or 2v0 − 1; a set K = {x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m}, where the polynomials gj are of degree 2vj or 2vj − 1, j = 1, . . . , m; a number k of the highest relaxation. Output: The value fK∗ = minx∈K f (x) and a list of global minimizers or a lower bound ρk on f ∗ .
Algorithm: 1. Solve the semidefinite optimization problem (5.14) with optimal value ρi , and optimal solution y∗ (if it exists). 2. If there is no optimal solution y∗ then ρk only provides a lower bound ρk ≤ fK∗ . If i < k then increase i by one and go to Step 1; otherwise stop and output ρk . 3. If rank Mi−v (y∗ ) = rank Mi (y∗ ) (with v := maxj vj ), then ρi = fK∗ and there are at least rank Mi (y∗ ) global minimizers, which are extracted by applying Algorithm 4.2. 4. If rank Mi−v (y∗ ) 6= rank Mi (y∗ ), and i < k, then increase i by one and go to Step 1; otherwise, stop and output ρk only provides a lower bound ρk ≤ fK∗ .
5.3 Constrained Polynomial Optimization
119
And we have the following convergence result:
Theorem 5.6. Let Assumption 2.1 hold and consider the semidefinite relaxation (5.14) with optimal value ρi . (a) ρi ↑ fK∗ as i→∞. (b) Assume that (5.2) has a unique optimal solution x∗ ∈ K and let yi be a nearly optimal solution of (5.14) with value Ly (f ) ≤ ρi + 1/i. Then as i→∞, Lyi (xj )→x∗j for every j = 1, . . . , n. Theorem 5.6(a) is a direct consequence of Theorem 4.1. For Theorem 5.6(b) one uses same arguments as in the proof of Theorem 4.3. A subsequence yik converges pointwise to a sequence y which is shown to be the moment sequence of a measure µ supported on K, and with value Ly (f ) = limk→∞ Lyik (f ) ≤ fK∗ . Hence µ is the Dirac measure δx∗ at the unique global minimizer x∗ ∈ K. Therefore the whole sequence yi converges pointwise and in particular, for every j = 1, . . . , n, Lyi (xj )→x∗j as i→∞. 5.3.1
Obtaining global minimizers
As for the unconstrained case, after solving the semidefinite relaxation (5.14) for some value of i ∈ N, we are left with two issues: (a) How do we know that ρi < fK∗ , or ρi = fK∗ ? (b) If ρi = fK∗ , can we get at least one global minimizer x∗ ∈ K? Again, an easy case is when (5.14) has an optimal solution y ∗ which satisfies rank Mi (y∗ ) = 1, and so, necessarily, Mi (y∗ ) = vi (x∗ )vi (x∗ )0 for some x∗ ∈ Rn . In addition, the constraints Mi−vj (gj y∗ ) 0 imply that x∗ ∈ K. That is, y∗ is the vector of moments up to order 2i of the Dirac measure at x∗ ∈ K, and one reads the optimal solution x∗ , from the ∗ subvector of first “moments” yα with |α| = 1. For the case of multiple global minimizers, we have the following sufficient condition which is implemented at step 3 of Algorithm 5.2: Theorem 5.7. Let f ∈ R[x], and suppose that the optimal value ρi of Problem (5.14) is attained at some optimal solution y ∗ . Let v := maxj=1,...,m vj . If rank Mi−v (y∗ ) = rank Mi (y∗ ), then fK∗ = ρi and there are at least s := rank Mi (y∗ ) global minimizers, and they can be extracted by Algorithm 4.2.
5 Global Optimization over Polynomials
120
Proof. We already know that ρi ≤ fK∗ for all i. If s = rank Mi−v (y∗ ) = rank Mi (y∗ ), then by Theorem 3.11, y∗ is the moment vector of some satomic probability measure µ∗ on K. We then argue that each of the s atoms of µ∗ is a global minimizer of f on K. Indeed, being s-atomic, there is a family (x(k))sk=1 ⊂ K and a family (βk )sk=1 ⊂ R, such that µ∗ =
s X
βk δx(k) ,
βk > 0,
k=1
∀k = 1, . . . , s;
s X
βk = 1,
k=1
Hence, fK∗ ≥ ρi = Ly∗ (f ) =
Z
f dµ∗ = K
s X
k=1
βk f (x(k)) ≥ fK∗ ,
which clearly implies fK∗ = ρi , and f (x(k)) = fK∗ , for all k = 1, . . . , s, the desired result. The rank-test of Theorem 5.7 implies that if it is satisfied, we can conclude that there are rank Mi (y∗ ) global minimizers encoded in the optimal solution y∗ of (5.14). In order to extract these solutions from y ∗ , we apply Algorithm 4.2. This extraction algorithm is implemented in the GloptiPoly software described in Appendix D. Example 5.3. Consider the optimization problem fK∗ = min x21 x22 (x21 + x22 − 1) s.t. x21 + x22 ≤ 4,
which is the same as in Example 5.2, but with the additional ball constraint kxk2 ≤ 4. Therefore the optimal value is fK∗ = −1/27 with global minimiz√ √ ∗ ers x = (± 3/3, ± 3/3). Applying Algorithm 5.2 for k = 4 and using GloptyPoly, the optimal value is obtained with ρ4 ≈ −1/27 and the four optimal solutions (±0.5774, ±0.5774)0 ≈ x∗ are extracted. When we add the additional nonconvex constraint x1 x2 ≥ 1, we find that the optimal value is obtained with ρ3 ≈ 1 and the two optimal solutions (x∗1 , x∗2 )0 = (−1, −1)0 and (x∗1 , x∗2 )0 = (1, 1)0 are extracted. In both cases, the rank-test is satisfied for an optimal solution y∗ and a global optimality certificate is provided thanks to Theorem 5.7. Example 5.4. Let n = 2 and f ∈ R[x] be the concave polynomial f (x) := −(x1 − 1)2 − (x1 − x2 )2 − (x2 − 3)2
5.3 Constrained Polynomial Optimization
121
and let K ⊂ R2 be the set: K := {x ∈ R2 : 1 − (x1 − 1)2 ≥ 0, 1 − (x1 − x2 )2 ≥ 0, 1 − (x2 − 3)2 ≥ 0}. The optimal value is fK∗ = −2. Solving problem (5.14) for i = 1, yields ρ1 = −3 instead of the desired value −2. On the other hand, solving problem (5.14) for i = 2 yields ρ2 ≈ −2 and the three optimal solutions (x∗1 , x∗2 ) = (1, 2), (2, 2), (2, 3) are extracted. Hence, with polynomials of degree 4 instead of 2, we obtain (a good approximation of) the correct value. Note that there exist scalars λj = 1 ≥ 0 such that f (x) + 3 = 0 +
3 X
λj gj (x),
j=1
but f (x) − fK∗ (= f (x) + 2) cannot be written in this way (otherwise ρ1 would be the optimal value −2). For quadratically constrained nonconvex quadratic problems, the semidefinite program (5.14) with i = 1 is a well-known relaxation. But ρ1 which sometimes provides directly the exact global minimum value, is only a lower bound in general. 5.3.2
The univariate case
If we consider the univariate case n = 1, and with K = [a, b] or K = [0, ∞), the corresponding sequence of semidefinite relaxations (5.14) simply reduces to a single relaxation. In other words, the minimization of a univariate polynomial on an interval of the real line (bounded or not) is a convex optimization problem and reduces to solving a semidefinite program. Indeed, consider for instance, the case where f has degree 2r and K ⊂ R is the interval [−1, 1]. By Theorem 2.6(b) f (x) − fK∗ = f0 (x) + f3 (x)(1 − x2 ),
x ∈ R,
(5.16)
for some s.o.s. polynomials f0 , f3 ∈ Σ[x] such that the degree of the summands is less than 2r.
5 Global Optimization over Polynomials
122
Theorem 5.8. Let f ∈ R[x] be a univariate polynomial of degree 2r. The semidefinite relaxation (5.14) for i = r of the problem fK∗ = min{f (x) : 1 − x2 ≥ 0}, x
is exact, i.e., ρr = optimal solution.
ρ∗r
=
fK∗ .
In addition, both (5.14) and (5.15) have an
Proof. From (5.16), (fK∗ , f0 , f3 ) is a feasible solution of (5.15) with i = r, and so, optimal, because ρ∗r = fK∗ . Therefore, from ρ∗r ≤ ρr ≤ fK∗ , we also obtain ρr = fK∗ , which in turn implies that y∗ := v2r (x∗ ) is an optimal solution, for any global minimizer x∗ ∈ K. A similar argument holds if f ∈ R[x] has odd degree 2r − 1, in which case f (x) − fK∗ = f1 (x)(1 + x) + f2 (x)(1 − x),
x ∈ R,
for some s.o.s. polynomials f1 , f2 ∈ Σ[x] such that the degree of the summands is less than 2r − 1. Again, for the problem fK∗ = min{f (x) : 1 − x ≥ 0, 1 + x ≥ 0}, x
the relaxation (5.14) with i = r is exact. 5.3.3
Numerical experiments
In this section, we report on the performance of Algorithm 5.2 using its implementation in the software package GloptiPoly (see Chapter D) on a series of benchmark non-convex continuous optimization problems. In Table 5.1, we record the problem name, the source of the problem, the number of decision variables (var), the number of inequality or equality constraints (cstr), and the maximum degree arising in the polynomial expressions (deg), the CPU time in seconds (CPU) and the order of the relaxation (order). At the time of the experiment, Gloptipoly was using the semidefinite optimization solver SeDuMi; see Sturm (1999). As indicated by the label dim in the rightmost column, quadratic problems 2.8, 2.9 and 2.11 in Floudas et al. (1999) involve more than 19 variables and could not be handled by the current version of GloptiPoly. Except for problems 2.4 and
5.3 Constrained Polynomial Optimization
123
Table 5.1 Continuous optimization problems. CPU times and semidefinite relaxation orders required to reach global optimality. problem (Lasserre, 2001, Ex. 1) (Lasserre, 2001, Ex. 2) (Lasserre, 2001, Ex. 3) (Lasserre, 2001, Ex. 5) (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb. (Floudas et al., 1999, Pb.
2.2) 2.3) 2.4) 2.5) 2.6) 2.7) 2.8) 2.9) 2.10) 2.11) 3.2) 3.3) 3.4) 3.5) 4.2) 4.3) 4.4) 4.5) 4.6) 4.7) 4.8) 4.9) 4.10)
var 2 2 2 2 5 6 13 6 10 10 20 24 10 20 8 5 6 3 1 1 1 1 2 1 1 2 2
cstr 0 0 0 3 11 13 35 15 31 25 10 10 11 10 22 16 16 8 2 2 2 2 2 2 2 5 6
deg 4 4 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 50 5 4 6 6 4 4 4
CPU 0.13 0.13 1.13 0.22 11.8 1.86 1012 1.58 67.7 75.3 45.3 3032 1.20 1.50 2.42 0.17 0.94 0.25 0.14 0.41 0.20 0.16 0.31 0.58
order 2 2 8 2 3 2 2 2 2 2 dim dim 2 dim 3 2 2 4 3 25 3 2 3 3 2 2 4
3.2, the computational load is moderate. In almost all reported instances the global optimum was reached exactly by a semidefinite relaxation of small order. 5.3.4
Exploiting sparsity
As already mentioned in Chapter 4, despite their nice properties, the size of the semidefinite relaxations (5.14) grows rapidly with the dimension n. n+i Typically, the moment matrix M (y) is s(i) × s(i) with s(i) = i n , and there are n+2i variables y . This makes the applicability of Algorithm α n 5.2 limited to small to medium size problems only. Fortunately, in many practical applications of large size moment problems, some sparsity pattern is often present and may be exploited as outlined in Chapter 4. Suppose that there is no coupling between some subsets of variables
5 Global Optimization over Polynomials
124
in the polynomials gk that define the set K, and the polynomial f . That is, by no coupling between two sets of variables we mean that there is no monomial involving some variables of such subsets in any of the polynomials f , (gk ). Recalling the notation of Section 4.6, let I0 := {1, . . . , n} be the union ∪pk=1 Ik of p subsets Ik , k = 1, . . . , p, with cardinal denoted nk (with possible overlaps). For an arbitrary J ⊆ I0 , let R[x(J)] denote the ring of polynomials in the variables x(J) = {xi : i ∈ J}, and so R[x(I0 )] = R[x]. So let Assumption 2.3, 2.4 hold and as in section 2.7, let m0 = m + p and {1, . . . , m0 } = ∪pi=1 Ji and K be as in (2.30) after having added in its definition the p redundant quadratic constraints (2.29). With k ∈ {1, . . . , p} fixed, and g ∈ R[x(Ik )], let Mi (y, Ik ) (resp. Mi (g y, Ik )) be the moment (resp. localizing) submatrix obtained from Mi (y) (resp. Mi (g y)) by retaining only those rows (and columns) α ∈ Nn of Mi (y) (resp. Mi (g y)) such that supp(α) ∈ Ik . Hence in the present context, the sparse semidefinite relaxation (4.36) reads: ρsparse = inf Ly (f ) i y
s.t. Mi (y, Ik ) 0,
k = 1, . . . , p
(5.17)
Mi−vj (gj y, Ik ) 0, j ∈ Jk , k = 1, . . . , p y0 = 1
whereas its dual is the semidefinite program (ρsparse )∗ = i
sup λ λ,σkj
s.t. f − λ =
p X k=1
σk0 +
X
j∈Jk
σkj gj
(5.18)
σk0 , σkj ∈ Σ[x(Ik )], k = 1, . . . , p deg σk0 , deg σkj gj ≤ 2i, k = 1, . . . , p. Theorem 5.9. Let Assumptions 2.3 and 2.4 hold. Consider the sparse semidefinite relaxations (5.17) and (5.18). If the running intersection property (4.33) holds then ρsparse ↓ fK∗ i sparse ∗ ∗ and (ρi ) ↓ fK as i→∞. as i→∞.
5.4 Linear Programming Relaxations
125
Theorem 5.9 is just Theorem 4.7 adapted to the present context of global optimization. To see the gain in terms of number of variables and size of the moment and localizing matrices, the reader is referred to the comment just after Theorem 4.7. For instance, if τ := supk |I(k)| is relatively small (say e.g. 6, 7) then one may solve optimization problems with n = 1000 variables whereas with n = 1000, one may not even implement the first standard semidefinite relaxation (5.14) with i = 1! 5.4
Linear Programming Relaxations
In this section, and still with K as in (5.3), we derive linear programming relaxations, the specialized version for problem (5.2) of those described in Chapter 4 for the generalized moment problem with polynomial data. Let gbj be the normalized version associated with gj , j = 1, . . . , m, and defined in (2.17). Let i ∈ N be fixed and consider the following linear optimization problem: Li = inf {Ly (f ) : y0 = 1 ; y
Ly (b g α (1 − gb)β ) ≥ 0,
∀ |α + β| ≤ i }. (5.19)
with Ly being as in (3.3). Indeed, Problem (5.19) is a linear optimization problem, because since |α + β| ≤ i, the conditions Ly (b g α (1 − gb)β ) ≥ 0 yield finitely many linear inequality constraints on finitely many coefficients yα of the infinite sequence y. The dual of Problem (5.19) is X L∗i = sup { λ : f − λ = uα,β gbα (1 − gb)β }. (5.20) λ,u≥0
α,β∈Nm , |α+β|≤i
From Theorem 4.2 we obtain
Theorem 5.10. The sequence (Li )i is monotone nondecreasing, and under Assumption 2.2, lim L∗i = lim Li = fK∗ .
i→∞
i→∞
(5.21)
The proof which uses Theorem 4.2 is left as an exercise. So the relaxations (5.19)-(5.20) are linear programs, a good news because in principle, with current LP packages, one is able to solve linear programs with million variables and constraints! However, as we will see, the linear programs (5.19)-(5.20) suffer from some serious drawbacks.
5 Global Optimization over Polynomials
126
5.4.1
The case of a convex polytope
In the particular case where the basic semi-algebraic set K ⊂ Rn is compact and all the polynomials gi in the definition (5.3) of K are affine (and so K is a polytope), one may specialize the linear relaxation (5.19) to: Li = inf { Ly (f ) : y0 = 1; y
Ly (g α ) ≥ 0,
∀ |α| ≤ i },
(5.22)
uα g α }.
(5.23)
and its associated dual: L∗i = max { λ : f − λ = λ,u≥0
X
α∈Nm ; |α|≤i
Theorem 5.11. If all gj ’s in (5.3) are affine and K is a convex polytope with nonempty interior, then: lim L∗i = lim Li = fK∗ .
i→∞
i→∞
(5.24)
The proof is analogous to that of Theorem 5.10 and is left as an exercise. 5.4.2
Contrasting LP and semidefinite relaxations
Theorem 5.11 implies that we can approach the global optimal value fK∗ as closely as desired by solving linear optimization problems (5.19) of increasing size. This should be interesting because very powerful linear optimization software packages are available, in contrast with semidefinite optimization software packages that have not yet reached the level of maturity of the linear optimization packages. Unfortunately we next show that in general the LP-relaxations (5.19) cannot be exact, that is, the convergence in (5.21) (or in (5.24) when K is a convex polyope) is only asymptotic, not finite. Indeed, assume that the convergence is finite, i.e., L∗i = fK∗ for some i ∈ N. Suppose that the interior of K (int K) is given by int K = {x ∈ Rn : gj (x) > 0,
j = 1, . . . , m}.
Then, if there exists a global minimizer x∗ ∈ int K, and 1 − gbj (x∗ ) > 0 for all j = 1, . . . , m, we would get the contradiction X 0 = f (x∗ ) − L∗i = uα,β gb(x∗ )α (1 − gb(x∗ ))β > 0. α,β∈Nm
5.5 Global Optimality Conditions
127
Similarly, if x∗ ∈ ∂K, let J ∗ := {j : gj (x∗ ) = 0}. For the same reasons, if there exists x ∈ K with f (x) > f (x∗ ) and gj (x) = 0 for all j ∈ J ∗ , then finite convergence cannot take place. Example 5.5. Let K := {x ∈ R : 0 ≤ x ≤ 1} and x 7→ f (x) := x(x − 1). This is a convex optimization problem with global minimizer x∗ = 1/2 in the interior of K, and optimal value −1/4. The optimal values (Li )i of the linear relaxations (5.19) are reported in Table 5.2. The example shows the rather slow monotone convergence of Li → − 0.25, despite the original problem is convex. Table 5.2 The slow convergence of the linear relaxation (5.19). i Li
2 -1/3
4 -1/3
6 -0.3
10 -0.27
15 -0.2695
On the other hand, with −f instead of f , the problem becomes a harder concave minimization problem. But this time the second relaxation is exact! Indeed, fK∗ = 0 and we have f (x) − fK∗ = x − x2 = x(1 − x) = g1 (x)g2 (x), with g1 (x) = x and g2 (x) = 1 − x. Example 5.5 illustrates that the convergence Li ↑ fK∗ as i→∞, is in general asymptotic, not finite; as underlined, the global minimizer being in the interior of K, convergence cannot be finite. In addition, Example 5.5 exhibits an annoying paradox, namely that LP-relaxations may perform better for the concave minimization problem than for the a priori easier convex minimization problem. Finally, notice that for large values of i, the constraints of the LP-relaxations (5.20) should contain very large coefficients due to the presence of binomial coefficients in the terms (1 − gb)β , a source of numerical instability and ill-conditioned problems. 5.5
Global Optimality Conditions
In this section, we derive global optimality conditions for polynomial optimization generalizing the local optimality conditions due to Karush-KuhnTucker (KKT) for nonlinear optimization. A vector (x∗ , λ∗ ) ∈ Rn × Rm satisfies the KKT conditions associated with problem (5.2) (and is called a KKT pair) if
5 Global Optimization over Polynomials
128
∇f (x∗ ) =
m X j=1
λ∗j ∇gj (x∗ ),
λ∗j gj (x∗ ) = 0,
j = 1, . . . , m,
gj (x∗ ), λ∗j ≥ 0,
j = 1, . . . , m.
(5.25)
• The nonnegative dual variables λ∗ ∈ Rm are called the Lagrange KarushKuhn-Tucker multipliers; See Section C. • In fact, most local optimization algorithms try to find a pair of vectors (x∗ , λ∗ ) ∈ Rn × Rm + that satisfies (5.25). ∗ • In general, x is not a global minimizer of the Lagrangian polynomial x 7→
Lf (x) := f (x) − fK∗ −
m X
λ∗j gj (x).
(5.26)
j=1
• However, if f is convex, the gi ’s are concave, and the interior of K is nonempty (i.e., Slater’s condition holds), then (5.25) are necessary and sufficient optimality conditions for x∗ to be an optimal solution of (5.2). Moreover, x∗ is a global minimizer of the Lagrangian polynomial Lf which is nonnegative on Rn , with Lf (x∗ ) = 0. The developments in the earlier section lead to the following global optimality conditions.
Theorem 5.12. Let x∗ ∈ K be a global minimizer for problem (5.2), with global optimum fK∗ , and assume that f − fK∗ has the representation (5.13), i.e., f (x) − fK∗ = σ0 (x) +
m X j=1
σj (x) gj (x),
x ∈ Rn ,
(5.27)
for some s.o.s. polynomials {σj }m j=0 ⊂ Σ[x]. Then: (a) σj (x∗ ), gj (x∗ ) ≥ 0, for all j = 1, . . . , m. (b) σj (x∗ )gj (x∗ ) = 0, for all j = 1, . . . , m. Pm (c) ∇f (x∗ ) = j=1 σj (x∗ ) ∇gj (x∗ ), that is (x∗ , λ∗ ) is a KKT pair, with λ∗j := σj (x∗ ), ∀j = 1, . . . , m. Pm (d) x∗ is a global minimizer of the polynomial f − fK∗ − j=1 σj gj .
5.5 Global Optimality Conditions
129
Proof. (a) Part (a) is obvious from x∗ ∈ K, and the polynomials σj ’s are s.o.s. (b) From (5.27) and the fact that x∗ is a global minimizer, we obtain f (x∗ ) − fK∗ = 0 = σ0 (x∗ ) +
m X
σj (x∗ )gj (x∗ ),
j=1
which in turn implies part (b) because gj (x∗ ) ≥ 0 for all j = 1, . . . , m, and the polynomials (σj ) are all s.o.s., hence nonnegative. This also implies σ0 (x∗ ) = 0. (c) Differentiating and using the fact that the polynomials (σj ) are s.o.s., and using part (b), yields part (c). (d) From (5.27) we obtain, f−
fK∗
−
m X j=1
σj gj = σ0 ≥ 0,
because σ0 ∈ R[x] is s.o.s., and using property (b), f (x∗ ) − fK∗ −
m X
σj (x∗ ) gj (x∗ ) = 0,
j=1
which shows that x∗ is a global minimizer of f − fK∗ −
P
j
σj g j .
Theorem 5.12 implies that when f − fK∗ has the representation (5.27), then: (a) (5.27) should be interpreted as a global optimality condition. (b) The s.o.s. polynomial coefficients {σj } ⊂ R[x] should be interpreted as generalized Karush-Kuhn-Tucker Lagrange multipliers. P (c) The polynomial f − fK∗ − j σj gj is a generalized Lagrangian, with s.o.s. polynomial multipliers instead of nonnegative scalars. It is s.o.s. (hence nonnegative on Rn ), vanishes at every global minimizer x∗ ∈ K, and so x∗ is also a global minimizer of this Lagrangian.
130
5 Global Optimization over Polynomials
Note that in the local KKT optimality conditions (5.25), only the constraints gj (x) ≥ 0 binding at x∗ , have a possibly nontrivial associated Lagrange (scalar) multiplier λ∗j . In contrast, in the global optimality condition (5.27), every constraint gj (x) ≥ 0 has a possibly nontrivial s.o.s. polynomial Lagrange multiplier x 7→ σj (x). Note that if gj (x∗ ) > 0, then necessarily σj (x∗ ) = 0 = λ∗j , as in the local KKT optimality conditions. In non convex optimization, a constraint gj (x) ≥ 0 that is not binding at a global minimizer x∗ ∈ K may still be important, i.e., if removed from the definition of K, then the global minimum fK∗ may decrease strictly. In this case, and in contrast to local KKT optimality conditions (5.25), gj is necessarily involved in the representation (5.13) of f − fK∗ (when the latter exists), hence with a nontrivial s.o.s. multiplier σj which vanishes at x∗ . To see this consider the following trivial example. Example 5.6. Let n = 1 and consider the following problem: fK∗ = min {−x : 1/2 − x ≥ 0, x2 − 1 = 0}, x
fK∗
with optimal value = 1, and global minimizer x∗ = −1. The constraint 1/2 − x ≥ 0 is not binding at x∗ = −1, but if removed, the global minimum jumps to −1 with new global minimizer x∗ = 1. In fact, we have the representation f (x) − fK∗ = −(x + 1) = (x2 − 1)(x + 3/2) + (1/2 − x)(x + 1)2 , which shows the important role of the constraint 1/2 − x ≥ 0 in the representation of f − fK∗ , via its nontrivial multiplier x 7→ σ1 (x) := (x + 1)2 . Note also that σ1 (x∗ ) = 0 = λ1 and σ2 (x∗ ) = x∗ + 3/2 = −1/2 = λ2 are the KKT Lagrange multipliers (λ1 , λ2 ) ∈ R+ × R in the local optimality conditions (5.25).
5.6
Convex Polynomial Programs
If practice seems to reveal that convergence of the semidefinite relaxations is often fast and even finite, we have seen that their size grows rapidly with the rank i in the hierarchy. And so, if sparsity in the original problem
5.6 Convex Polynomial Programs
131
data is not exploited, the approach is limited to small to medium size problems only. On the other hand, it is well-known that a large class of convex optimization problems can be solved efficiently. Therefore, as the moment approach is dedicated to solving difficult non convex (most of the time N P-hard) problems, it should have the highly desirable feature to somehow recognize ”easy” problems like convex ones. That is, when applied to such easy problems it should show some significant improvement or a particular nice behavior not necessarily valid in the general case. This is the issue that we investigate in this section. 5.6.1
An extension of Jensen’s inequality
Recall that if µ is a probability measure on Rn with Eµ (x) < ∞, Jensen’s inequality states that if f ∈ L1 (µ) and f is convex, then Z Eµ (f ) (= f dµ) ≥ f (Eµ (x)), Rn
a very useful property in many applications. We next provide an extension of Jensen’s inequality when one restricts its application to the class of s.o.s.convex polynomials. Recall that Nn2d = {α ∈ Nn : |α| ≤ 2d}. Theorem 5.13. Let f ∈ R[x]2d be s.o.s.-convex; see Definition 2.3. Let y = (yα )α∈Nn2d satisfy y0 = 1 and Md (y) 0. Then: Ly (f ) ≥ f (Ly (x)),
(5.28)
where Ly (x) = (Ly (x1 ), . . . , Ly (xn )).
Proof. Let z ∈ Rn be fixed, arbitrary, and consider the polynomial x 7→ f (x) − f (z). Then from Lemma 2.30, f (x) − f (z) = h∇f (z), x − zi + h(x − z), F(x)(x − z)i,
(5.29)
with F : Rn →R[x]n×n being the matrix polynomial Z 1Z t x 7→ F(x) := ∇2 f (z + s(x − z)) ds dt. 0
0
As f is s.o.s.-convex, by Lemma 2.29, F is a s.o.s. matrix polynomial and so the polynomial x 7→ g(x) := h(x − z), F(x)(x − z) is s.o.s., i.e., g ∈ Σ[x].
5 Global Optimization over Polynomials
132
Then applying Ly to the polynomial x 7→ f (x) − f (z) and using (5.29) yields (recall that y0 = 1) Ly (f ) − f (z) = h∇f (z), Ly (x) − zi + Ly (g) ≥ h∇f (z), Ly (x) − zi
[because Ly (g) ≥ 0].
As z ∈ Rn was arbitrary, taking z := Ly (x) (= (Ly (x1 ), . . . , Ly (xn )) yields the desired result. Hence (5.28) is Jensen’s inequality extended to linear functionals Ly : R[x]2d →R in the dual cone of Σ[x]d , that is, vectors y = (yα ) such that Md (y) 0 and y0 = Ly (1) = 1; hence y is not necessarily the (truncated) moment sequence of some probability measure µ. As a consequence we also get: Corollary 5.14. Let f be a convex univariate polynomial, g ∈ R[x] (and so f ◦ g ∈ R[x]). Let d := d(deg f ◦ g)/2e, and let y = (yα )α∈Nn2d be such that y0 = 1 and Md (y) 0. Then: Ly (f ◦ g) ≥ f (Ly (g)). 5.6.2
(5.30)
The s.o.s.-convex case
Next, with f ∈ R[x]d and 2i ≥ max[deg f, maxj deg gj ], consider the semidefinite program: ρi = inf {Ly (f ) : Mi (y) 0; Ly (gj y) ≥ 0, j = 1, . . . , m; y0 = 1} (5.31) y
and its dual ρ∗i =
sup γ,λ∈Rm + ,σ0 ∈Σ[x]i
{γ : f − γ = σ0 +
m X j=1
λj gj }.
(5.32)
Theorem 5.15. Let K be as in (2.10) and Slater’s condition hold. Let f ∈ R[x] be such that fK∗ := inf x {f (x) : x ∈ K} = f (z) for some z ∈ K. Assume that f and −gj are s.o.s.-convex, j = 1, . . . , m. Then fK∗ = ρi = ρ∗i . Moreover, if y is an optimal solution of (5.31) then x∗ := (Ly (xi )) ∈ K is a global minimizer of f on K.
5.6 Convex Polynomial Programs
133
Proof. Recall the definition of Qc (g) in (2.35). By Theorem 2.32, f −fK∗ ∈ P Qc (g), i.e., f − fK∗ = σ0 + j λj gj for some λ ∈ Rm + and some σ0 ∈ Σ[x]. Therefore, (fK∗ , λ, σ0 ) is a feasible solution of (5.32), which yields ρ∗i ≥ fK∗ , and which combined with ρ∗i ≤ ρi ≤ fK∗ yields ρ∗i = ρi = fK∗ . Obviously (5.31) is solvable and so, let y be an optimal solution. By Theorem 5.13, fK∗ = ρ∗i = Ly (f ) ≥ f (Ly (x)), and similarly, 0 ≥ Ly (−gj ) ≥ −gj (Ly (x)), j = 1, . . . , m, which shows that x∗ := Ly (x) (= Ly (xi )) ∈ K is a global minimizer of f on K. Therefore, when the polynomials f and (−gj ) are s.o.s., the first semidefinite program in the hierarchy of semidefinite programs (5.14) is exact as it is either identical to (5.31) or more constrained, hence with optimal value ρi = fK∗ . In other words, the methodology recognizes s.o.s.-convexity. 5.6.3
The strictly convex case
If f or some of the −gj ’s is not s.o.s.-convex but ∇2 f 0 (so that f is strictly convex) and −gj is convex for every j = 1, . . . , m, then one obtains the following result.
Theorem 5.16. Let K be as in (2.10) and let Assumption 2.1 and Slater’s condition hold. Assume that for all x ∈ Rn , ∇2 f (x) 0 and −∇2 gj (x) 0, j = 1, . . . , m. Then the hierarchy of semidefinite relaxations (5.14) -(5.15) has finite convergence. That is, fK∗ = ρ∗i = ρi for some index i. In addition, both primal and dual relaxations are solvable.
Proof.
Let fK∗ := minx∈K f (x). By Theorem 2.33, f − fK∗ = σ0 +
m X
σj g j ,
j=1
for some s.o.s. polynomials (σj ) ⊂ Σ[x]. Let 2i0 ≥ maxj deg σj + deg gj (with g0 = 1). Then (fK∗ , (σj )) is a feasible solution of the semidefinite program (5.15). Hence fK∗ ≤ ρ∗i ≤ ρi ≤ fK∗ , which yields the desired result fK∗ = ρi = ρ∗i . When compared to Theorem 5.15 for the s.o.s.-convex case, in the strictly convex case the simplified semidefinite relaxation (5.31) is not guar-
134
5 Global Optimization over Polynomials
anteed to be exact. However, finite convergence still occurs for the standard semidefinite relaxations (5.14). Hence the hierarchy of semidefinite relaxations exhibits a particularly nice behavior for convex problems, a highly desirable property since convex optimization problems are easier to solve. In contrast, and using arguments from Section 5.4.2, such a nice behavior cannot be expected in general for the LP-relaxations (5.19).
5.7
Discrete Optimization
In this section, we consider problem (5.2) with K a finite variety. More precisely, in the definition (5.3) of K, the gi ’s are such that K can be rewritten K = {x ∈ Rn : gi (x) = 0, i = 1, . . . , m;
hj (x) ≥ 0, j = 1, . . . , s}. (5.33)
Indeed, (5.33) is a particular case of (5.3) where some inequality constraints gi (x) ≥ 0 and −gi (x) ≥ 0 are present. The polynomials (gi )m i=1 ⊂ R[x] define an ideal I := hg1 , . . . , gm i of R[x], and we will consider the case where I is a zero-dimensional radical ideal, that is, I is radical, and the variety VC (I) := {x ∈ Cn : gi (x) = 0, i = 1, . . . , m}, is a finite set (see Sections 2.6 and A.2). This is an important special case as it covers all basic 0/1 optimization problems. For instance, if we let gi (x) := x2i − xi , for all i = 1, . . . , n, then we recover boolean optimization, in which case, VC (I) = {0, 1}n and I = hx21 − x1 , . . . , x2n − xn i is radical. Similarly, given (ri )ni=1 ⊂ N, and a i finite set of points (xij )rj=1 ⊂ R, i = 1, . . . , n, let gi (x) :=
ri Y
j=1
(xi − xij ),
i = 1, . . . , n.
Then we recover (bounded) integer optimization problems, in which case VC (I) is the grid {(x1j1 , x2j2 , · · · , xnjn )}, where jk ∈ {1, . . . , rk } for all k = 1, . . . , n.
5.7 Discrete Optimization
135
Theorem 5.17. Let f ∈ R[x], K ⊂ Rn be as in (5.33), and let the ideal I = hg1 , . . . , gm i ⊂ R[x] be radical and zero-dimensional. Let (ρi ) be the optimal values of the semidefinite relaxations defined in (5.14). Then, there is some i0 ∈ N such that ρi = fK∗ for all i ≥ i0 .
Proof. The polynomial f − fK∗ is nonnegative on K. Therefore, by Theorem 2.26 there exist polynomials (σk )m k=1 ⊂ R[x], and s.o.s. polynomials (σ0 , (vj )sj=1 ) ⊂ Σ[x], such that: f − fK∗ = σ0 +
m X
k=1
σk g k +
s X
v j hj .
j=1
Let d1 , d2 , d3 be the degree of σ0 , the maximum degree of the polynomials s (σk gk )m i=1 , and (vj hj )j=1 respectively, and let 2i0 ≥ max[d1 , d2 , d3 ]. ∗ Then, (fK , (σk ), (vj )) is a feasible solution for of the relaxation (5.15) for i = i0 , and with value fK∗ , so that ρ∗i0 ≥ fK∗ . As we also have ρ∗i ≤ ρi ≤ fK∗ whenever the semidefinite relaxations are well-defined, we conclude that ρ∗i = ρi = fK∗ for all i ≥ i0 . Finally, let µ be the Dirac probability measure at some global minimizer x∗ ∈ K of problem (5.2), and let y ∈ R∞ be the vector of its moments. Then y is feasible for all the semidefinite relaxations (5.14), with value fK∗ , which completes the proof. In fact, more will be said in the next chapter. In particular, as soon as the real variety VR (I) is finite (and even if I is not real radical), the semidefinite relaxation (5.14) is exact for some index i in the hierarchy, and one may extract all optimal solutions. See Theorem 6.1.
5.7.1
Boolean optimization
It is worth noting that in the semidefinite relaxations (5.14), the constraints Mr (gi y) = 0 translate into simplifications via elimination of variables in the moment matrix Mr (y) and the localizing matrices Mr (hj y). Indeed, consider for instance the boolean optimization case, i.e., when gi (x) = x2i − xi for all i = 1, . . . , n. Then the constraints Mr (gi y) = 0 for all i = 1, . . . , n, simply state that whenever |α| ≤ 2r, one replaces every
136
5 Global Optimization over Polynomials
variable yα with the variable yβ , where for k = 1, . . . , n: ( 0, if αk = 0, βk = 1, otherwise. Indeed, with x2i = xi for all i = 1, . . . , n, one has xα = xβ , with β as above. For instance, with n = 2, we obtain y00 y10 y01 y10 y11 y01 y y y y y y 10 10 11 10 11 11 y y y y y y M2 (y) = 01 11 01 11 11 01 . y10 y10 y11 y10 y11 y11 y11 y11 y11 y11 y11 y11 y01 y11 y01 y11 y11 y01
In addition, every column (row) of Mr (y) corresponding to a monomial xα , with αk > 1 for some k ∈ {1, . . . , n}, is identical to the column corresponding to the monomial xβ , with β as above. Hence, the constraint Mr (y) 0 c r (y) 0, with the new simplified moment reduces to the new constraint M matrix y00 y10 y01 y11 c 2 (y) = y10 y10 y11 y11 . M y01 y11 y01 y11 y11 y11 y11 y11
Theorem 5.17 has little practical value. For instance, in the case of boolean optimization, we may easily show that ρi = ρi0 for all i ≥ i0 := n. c i0 (y) has size 2n × 2n , and solving But in this case the simplified matrix M problem (5.2) by simple enumeration would be as efficient! However, in general, one obtains the exact global optimum at some earlier relaxation with value ρi , i.e., with i i0 . Numerical experiments. One also reports the performance of GloptiPoly on a series of small-size combinatorial optimization problems (in particular, the MAXCUT problem). In Table 5.3 we first let GloptiPoly converge to the global optimum, in general extracting several solutions. The number of extracted solutions is reported in the column labelled sol. Then, we slightly perturbed the criterion to be optimized in order to destroy the problem symmetry. Proceeding this way, the optimum solution
5.7 Discrete Optimization
137
Table 5.3 Discrete optimization problems. CPU times and semidefinite relaxation order required to reach global optimum and extract several solutions. problem QP (Floudas et al., 1999, Pb. 13.2.1.1) QP (Floudas et al., 1999, Pb. 13.2.1.2) Max-Cut P1 Floudas et al. (1999) Max-Cut P2 Floudas et al. (1999) Max-Cut P3 Floudas et al. (1999) Max-Cut P4 Floudas et al. (1999) Max-Cut P5 Floudas et al. (1999) Max-Cut P6 Floudas et al. (1999) Max-Cut P7 Floudas et al. (1999) Max-Cut P8 Floudas et al. (1999) Max-Cut P9 Floudas et al. (1999) Max-Cut cycle C5 Anjos (2001) Max-Cut complete K5 Anjos (2001) Max-Cut 5-node Anjos (2001) Max-Cut antiweb AW92 Anjos (2001) Max-Cut 10-node Petersen Anjos (2001) Max-Cut 12-node Anjos (2001)
var 4 10 10 10 10 10 10 10 10 10 10 5 5 5 9 10 12
cstr 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
deg 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
CPU 0.10 3.61 38.1 2.7 2.6 2.6 2.6 44.3 2.6 49.3 0.19 0.19 0.24 39.6 -
order 1 2 3 2 2 2 4 2 3 2 3 3 4 3 4 3 3
sol 1 1 10 2 2 2 dim 2 4 2 6 10 20 6 dim 10 dim
is generically unique and convergence to the global optimum is easier. Of course, the size of the combinatorial problems is relatively small and GloptiPoly cannot compete with ad hoc heuristics, which may solve problems with many more variables. But these numerical experiments are reported only to show the potential of the method, as in most cases the global optimum is reached at the second semidefinite relaxation in the hierarchy. 5.7.2
Back to unconstrained optimization
We have seen in Section 5.2 that for the unconstrained optimization problem (5.1), the semidefinite relaxations (5.6) reduce to a single one, and with a 0-1 answer, depending on whether or not, the polynomial f − f ∗ is a sum of squares. Therefore, in general, according to Theorem 5.3, the SDP (5.6) provides only a lower bound on f ∗ . However, if one knows a priori some bound M on the euclidean norm kx∗ k of a global minimizer x∗ ∈ Rn , then it suffices to replace the original unconstrained problem (5.1) with the constrained problem (5.2), with K being the basic semi-algebraic set K := {x ∈ Rn :
M 2 − kxk2 ≥ 0}.
138
5 Global Optimization over Polynomials
It is immediate to verify that Assumption 2.1 holds, and therefore, the machinery described in Section 5.3 applies, and the semidefinite relaxations (5.14) with K as above, converge to f ∗ = fK∗ . Another approach which avoids the a priori knowledge of this bound M , consists of taking K := {x ∈ Rn :
∇f (x) = 0},
(5.34)
since if a global minimizer x∗ ∈ Rn exists, then necessarily ∇f (x∗ ) = 0, and in addition, x∗ is also a global minimizer of f on K defined in (5.34). However, convergence of the relaxations (5.14) has been proved for a compact set K. Fortunately, the set K in (5.34) has nice properties. Indeed: Proposition 5.18. For almost all polynomials f ∈ R[x]d , the ideal If is zero-dimensional and radical. And so we get the following result:
Theorem 5.19. With f ∈ R[x]d , and K ⊂ Rn as in (5.34), consider the semidefinite relaxation defined in (5.14) with optimal value ρi . Let Fd := {f ∈ R[x]d : ∃ x∗ ∈ Rn s.t. f (x∗ ) = f ∗ = min {f (x) : x ∈ Rn }}. Then for almost all f ∈ Fd , ρi = f ∗ = fK∗ , for some index i, i.e. finite convergence takes place. Theorem 5.19 is a direct consequence of Proposition 5.18 and Theorem 5.17. 5.8
Global Minimization of a Rational Function
In this section, we consider the problem:
rK := inf {
p(x) : x ∈ K }, q(x)
(5.35)
5.8 Global Minimization of a Rational Function
139
where p, q ∈ R[x] and K is defined as in (5.3). Note that if p, q have no common zero on K, then rK > −∞ only if q does not change sign on K. Therefore, we will assume that q is strictly positive on K.
Proposition 5.20. Let rK be the optimal value of Problem (5.35) and consider the following instance of the generalized problem of moments Z ρmom = inf p dµ µ∈M (K)+ K Z (5.36) s.t. q dµ = 1. K
Then rK = ρmom .
Proof. Assume first that rK > −∞, so that p(x) ≥ rK q(x) for all x ∈ K, R and let µR ∈ M (K)+ be a feasible measure for problem (5.36). Then K p dµ ≥ rk K q dµ, leading to ρmom ≥ rK . Conversely, let x ∈ K be fixed arbitrary, and let µ ∈ M (K)+ be the −1 Rmeasure q(x) δx , where δx is the Dirac measure at the point x ∈ K. Then, K q dµ = 1, so that R µ is a feasible measure for problem (5.36). Moreover, its value satisfies K p dµ = p(x)/q(x). As x ∈ K was arbitrary, ρmom ≤ rK , and the result follows. Finally if rK = −∞ then from what precedes we also have ρmom = −∞. One recognize in (5.36) a particular instance of the generalized moment problem (1.1). Let 2vk or 2vk − 1 be the degree of the polynomial gk ∈ R[x] in the definition (5.3) of K, for all k = 1, . . . , m. Proceeding as in Section 5.3, we obtain the semidefinite relaxation for i ≥ max{deg p, deg q, maxk vk }, which is analogous to (5.14): ρi = inf Ly (p) y
s.t. Mi (y), Mi−vk (gk y) 0, Ly (q) = 1.
k = 1, . . . , m
(5.37)
Note that in contrast to (5.14) where y0 = 1, in general y0 6= 1 in (5.37). In fact, the last constraint Ly (q) = 1 in (5.37) yields y0 = 1 whenever q = 1, that is, problem (5.36) reduces to problem (5.2). The dual of problem (5.37)
5 Global Optimization over Polynomials
140
reads: ρ∗i = sup λ σk ,λ
s.t. p − λq = σ0 + σk ∈ Σ[x];
m X
(5.38)
σk g k
k=1
deg σk ≤ i − vk , k = 0, 1, . . . , m.
(with v0 = 1 ). Recall that q > 0 on K.
Theorem 5.21. Let K be as in (5.3), and let Assumption 2.1 hold. Consider the semidefinite relaxations (5.37) and (5.38). Then, (a) rK is finite and ρ∗i ↑ rK , ρi ↑ rK as i→∞. (b) In addition, if K ⊂ Rn has nonempty interior then ρ∗i = ρi for every i. (c) Let x∗ ∈ K be a global minimizer of p/q on K. If the polynomial p − q rK ∈ R[x], nonnegative on K, has the representation (5.13), then both problems (5.37) and (5.38) have an optimal solution, and ρ ∗i = ρi = rK , for all i ≥ i0 , for some i0 ∈ N. Proof. (a) follows directly from Theorem 4.1 and rK = ρmom . (b) ρ∗i = ρi follows from Slater’s condition which is satisfied for problem (5.37). Indeed, let µ ∈ M (K) R + be a measure with uniform distribution on K, and scaled to ensure q dµ = 1. (As K is compact, this is always possible.) Then, the vector y of its moments, is a strictly feasible solution of problem (5.37) (Mi (y) 0, and K having a nonempty interior implies Mi−vk (gk y) 0 for all k = 1, . . . , m). Thus, there is no duality gap, i.e., ρ∗i = ρi , and the result follows from (a). (c) If p − qrK has the representation (5.13), then as we did for part (a), we can find a feasible solution to problem (5.38) with value rK , for all i ≥ i0 , for some i0 ∈ N. Hence, ρ∗i = ρi = rK ,
∀ i ≥ i0 .
Finally, (5.37) has an optimal solution y. It suffices to take for y the vector of moments of the measure q(x∗ )−1 δx∗ , i.e., yα = q(x∗ )−1 (x∗ )α ,
α ∈ Nn .
5.9 Exploiting Symmetry
5.9
141
Exploiting Symmetry
In this section we briefly describe how symmetry can be exploited to replace a semidefinite program invariant under the action of some group of permutations, with a much simpler one. In particular, it can be applied to the semidefinite relaxations (5.14) when f ∈ R[x] and the polynomials (gj ) ⊂ R[x] that define K are all invariant under some group of permutations. Let Sn be the space of n × n real symmetric matrices and let Aut(Sn ) be the group of automorphisms on Sn . Let G be a finite group acting on Rn via ρ0 : G → GL(Rn ), which in turn induces an action ρ : G → Aut(Sn ) on Sn by ρ(g)(X) = ρ0 (g)T X ρ0 (g) for every g ∈ G , X ∈ Sn . Assume that ρ0 (g) is orthonormal for every g ∈ G . A matrix X ∈ Rn×n is said to be invariant under the action of G if ρ(g)(X) = X for every g ∈ G , and X is invariant if and only if X is an element of the commutant algebra AG := { X ∈ Rn×n : ρ0 (g) X = X ρ0 (g),
∀ g ∈ G }.
(5.39)
Of particular interest is when G is a subgroup of the group Pn of permutations of {1, . . . , n}, in which case ρ0 (g)(x) = (xg(i) ) for every x ∈ Rn , and ρ(g)(X)ij = Xg(i),g(j) for every 1 ≤ i, j ≤ n. For every (i, j) ∈ {1, . . . , n} × {1, . . . , n}, the orbit OG (i, j) under action of G , is the set of couples {(g(i), g(j)) : g ∈ G }. With ω the number of ˜ l by (D ˜ l )ij := 1 if (i, j) orbits, and 1 ≤ l ≤ ω, define the n × n matrix D q ˜ l, D ˜ l i, for ˜ l / hD belongs to orbit l, and 0 otherwise. Normalize to Dl := D
every 1 ≤ l ≤ ω, and define:
• The multiplication table Di Dj =
ω X
l γij Dl ,
i, j = 1, . . . , ω.
l=1
• The ω × ω matrices L1 , . . . , Lω by: i (Lk )ij := γkj ,
i, j, k = 1, . . . , ω.
Then the commutant algebra (5.39) reads ( ω ) X G A = xl D l : xl ∈ R l=1
5 Global Optimization over Polynomials
142
with dimension dim AG = ω. Exploiting symmetry in semidefinite program is possible thank to the following crucial property of the matrices (Dl ): Theorem 5.22. The mapping Dl → Ll is a ?-isomorphism called the regular ?-representation of AG , and in particular: ω X j=1
xl D l 0
⇐⇒
ω X j=1
xl Ll 0.
(5.40)
Application to semidefinite programming Consider the semidefinite program sup { hC, Xi : hAk , Xi = bk ,
k = 1, . . . , p}
(5.41)
X0
and assume it is invariant under the action of G , that is, C ∈ AG and the feasible region is globally invariant, meaning that if X feasible in (5.41) then so is ρ(g)(X) for every g ∈ G . By convexity, for every feasible X of (5.41), the matrix X0 := P 1 g∈G ρ(g)(X) is feasible, invariant under action of G and with same |G | objectif value as X. Therefore, we can include in the semidefinite program (5.41) the additional linear constraint X ∈ AG without affecting the optimal value. Pω Therefore, writing X = l=1 xl Dl and setting cl := hC, Dl i,
akl := hAk , Dl i,
∀ l = 1, . . . , ω; k = 1, . . . , p,
the semidefinite program (5.41) has same optimal value as 0
sup { c x :
x∈Rω
a0k
x = bk ,
k = 1, . . . , p;
ω X l=1
xl Ll 0 }.
(5.42)
Observe that in (5.41) we have n variables and a n × n positive semidefinite matrix X, whereas in (5.42) we only have ω variables and a ω × ω positive semidefinite matrix.
5.10 Summary
5.10
143
Summary
In this chapter we have considered the general global optimization problem with polynomial data. We have provided a hierarchy of semidefinite relaxations to approximate as closely as desired the optimal value of the original problem. Sometimes the optimal value is obtained exactly by solving (few) finitely many semidefinite programs of the hierarchy. Like primal approaches in nonlinear programming that search for ˜ ∈ Rn , the moment approach should also be rea local minimizer x garded as a primal approach where one now searches not only for a global minimizer x∗ ∈ Rn but also for the sequence of its moments y∗ = ((x∗ )α ), α ∈ Nn , i.e., a search in a lifted space (of moments). Similarly, like dual approaches in nonlinear programming (e.g. Lagrangian and extended Lagrangian) that search for scalar Lagrange Karush-Kuhn-Tucker multipliers associated with the constraints, the s.o.s. approach should also be viewed as a dual approach as one also searches for multipliers of the constraints, but now s.o.s. (instead of scalar) multipliers, hence also in a lifted space (of polynomials). A nice feature of the hierarchy (5.14) is finite convergence in the s.o.s.convex case (as the first semidefinite program in the hierarchy is exact), and in the strict convex case as well. In other words, the moment-s.o.s. approach recognizes easier convex problems. Next, we have provided a convergent hierarchy of linear relaxations which unfortunately suffers from several drawbacks; in particular, and in contrast with the hierarchy of semidefinite relaxations, convergence is only asymptotic in general (except for discrete problems). We have also defined a hierarchy of sparse semidefinite relaxations that exploits sparsity in the problem definition, that is when each of the polynomials involved in the description of the set K involves a few variables only, and when the polynomial f to minimize is a sum of polynomials that also involve a few variables only. Convergence still holds if the sparsity pattern satisfies the so-called ”running intersection property” well-known in gaph theory. Finally, we have also extended the approach to find the global minimum of a rational function on a compact basic semi-algebraic set.
5 Global Optimization over Polynomials
144
5.11
Exercises
Exercise 5.1. Prove Theorem 5.10 and Theorem 5.11. Exercise 5.2. Let f ∈ R[x] be a univariate polynomial. Show that for solving the problem fK∗ = min{f (x) : x ≥ 0}, x
with K = [0, ∞), the relaxation (5.14) with i := d(deg f )/2)e is exact. Exercise 5.3. Show that for solving the problem inf { x : x2 = 0}, the semidefinite relaxation (5.14) with i = 1 is exact. What about the dual? Is it solvable? Why? Exercise 5.4. Consider the global minimization of the bivariate quartic polynomial x 7→ f (x) := x21 +(1−x1 x2 )2 . What can be said about the semidefinite relaxation (5.6)? Is there any optimal solution? Similarly, what can be said about the semidefinite relaxation (5.8)? Is there any optimal solution?
5.12
Notes and Sources
For a survey on semidefinite programming and its multiple applications, the interested reader is referred to Vandenberghe and Boyd (1996). Most of the material in this Chapter is from Lasserre (2000, 2001, 2002d,b,c, 2004, 2006d). Shor (1987, 1998) was the first to prove that the global minimization of a univariate polynomial is a convex optimization problem. Later, Nesterov (2000) defined exact semidefinite formulations for the univariate case, while converging semidefinite relaxations for the general multivariate case were treated in Lasserre (2000, 2001, 2002b,c) and Parrilo (2000, 2003). de Klerk et al. (2006) provided a Polynomial Time Approximation Scheme for minimizing polynomials of fixed degree on the simplex. Building on Shor’s ideas, Sherali and Adams (1990, 1999) and Sherali et al. (1992), proposed their RLT (Reformulation-Linearization Technique), the earliest hierarchy of LP relaxations for polynomial optimization, and proved finite convergence for 0/1 problems. Notice that the LP-relaxations of the RLT formulation, and especially their dual, can be interpreted in the light of the (representation) Theorem 2.23 which provide a rationale for their convergence. Other linear relaxations for 0/1 programs have also
5.12 Notes and Sources
145
been proposed by Balas et al. (1993), Lov´ asz and Schrijver (1991), while a hierarchy of semidefinite relaxations for 0/1 problems were first proposed by Lov´ asz and Schrijver (1991) who also proved finite convergence. A comparison between semidefinite and linear relaxations (in the spirit of Sherali and Adams) for general polynomial optimization problems is made in Lasserre (2002d), in the light of the results of Chapter 2 on the representation of positive polynomials. For 0/1 problems, Laurent (2003) compares the linear relaxations of Sherali and Adams (1990) and Lov´ asz and Schrijver (1991), and the semidefinite relaxations of Lov´ asz and Schrijver (1991) and Lasserre (2002b) within the common framework of the moment matrix, and proved that the latter semidefinite relaxations are the strongest. This has motivated research on integrality gaps for difficult combinatorial optimization problems. (The integrality gap measures the ratio between the optimal value of the relaxation and that of the problem to solve.) In particular Chlamtac (2007) and Chlamtac and Singh (2008) showed that the hierarchy of semidefinite relaxations provides improved approximations algorithms for finding independent sets in graphs, and for colouring problems. See also the related work of Schoenebeck (2008). Theorem 5.17 is from Parrilo (2003), an extension to the general setting (5.33) of the grid case studied in Lasserre (2002b,c). See also Laurent (2007a) for refinements. Recent approaches to unconstrained optimization via optimizing on the gradient ideal appears in Hanzon and Jibetean (2003) with matrix methods and in Jibetean and Laurent (2005) with semidefinite programming. In both approaches one slightly perturbates p (of say degree 2d) by adding monomials {x2d+2 }, with a small i coefficient , and obtain a sequence of polynomials f , with the property that V = {x : ∇f (x) = 0}, is finite and the minima f∗ converge to f ∗ . In particular, the polynomials {∂f /∂xi } form a Gr¨ obner basis of the ideal they generate. On the other hand, Proposition 5.18 is Proposition 1 in Nie et al. (2006). To handle the case where no global minimizer exists, Schweighofer (2006) uses s.o.s. and the concept of gradient tentacles and Vui and Son (2009) use the truncated tangency variety. Polynomials satisfying sparsity patterns were investigated in Kojima et al. (2005) and the sparse semidefinite relaxations (5.17) were first proposed in Waki et al. (2006) as a heuristic to solve global optimization problems with a large number of variables and which satisfy some sparsity pattern. Their convergence in Theorem 5.9 was proved in Lasserre
146
5 Global Optimization over Polynomials
(2006a) if the sparsity pattern satifies the running intersection property. The sparse relaxations have been implemented in the SparsePOP software of Waki et al. (2009) and numerical experiments show that one may then solve global optimization problem with n = 1000 variables for which even the first non sparse semidefinite relaxation of the hierarchy (5.14) cannot be implemented. Kim et al. (2009, to appear) provides a nice application for sensor network localization. Section 5.6 is from Lasserre (2009c) while Section 5.8 is inspired from Jibetean and de Klerk (2006), the first to prove Theorem 5.21. Section 5.9 is inspired from from Laurent (2008) and Theorem 5.22 is from de Klerk et al. (2007). For exploiting symmetry in the context of sums of squares and semidefinite programming see also the work of Gaterman and Parrilo (2004) and Vallentin (2009). For instance, these ideas have been used sucessfully in coding theroy for large error correcting codes based on computing the stability number of some related graph; see e.g. Laurent (2007b) and Schrijver (2005).
Chapter 6
Systems of Polynomial Equations
This chapter is about solving systems of polynomial equations. Of course if the goal is to search for just one solution, one may minimize any polynomial criterion and see the problem as the global optimization of Chapter 5. But we also consider the case where one searches for all complex and/or real solutions and show that the moment approach is well-suited to solve this problem as it provides the first algorithm to compute all real solutions without computing all the complex solutions, in contrast with the usual algebraic approaches based on Gr¨ obner bases or homotopy.
6.1
Introduction
Computing all complex and/or all real solutions of a system of polynomial equations is a fundamental problem in mathematics with many important practical applications. Namely, let J ⊆ R[x] be an ideal generated by a set of polynomials gj ∈ R[x], j = 1, . . . , m. Two basic important problems are: (I) The computation of the algebraic variety VC (J) := {z ∈ Cn : gj (z) = 0 ∀j = 1, . . . , n }
(6.1)
(II) The computation of the real variety VR (J) := {x ∈ Rn : gj (x) = 0 ∀j = 1, . . . , n } = VC (J) ∩ Rn (6.2) both associated with J, and possibly, a set of generators for the radical ideal I(VL (J)) for L = R or C, assuming VL (J) is finite. One way to solve problem (II) is to first compute all complex solutions and to sort out VR (J) = Rn ∩VC (J) from VC (J) afterwards. This is certainly 147
148
6 Systems of Polynomial Equations
possible when I is a zero-dimensional ideal, but even in this case one might perform many unnecessary computations, particularly if |VR (J)| |VC (J)|, i.e. in case there are many more complex roots than real roots. In addition, there are cases where VR (J) is finite whereas VC (J) is not! These two reasons alone provide a rationale for designing a method tailored to problem (II), that is, a method that takes into account right from the beginning the real algebraic nature of the problem. In this chapter we first consider obtaining (at least) one (real) solution instead of all solutions in VC (J) as do some algebraic approches based on Gr¨ obner bases when I is zero-dimensional (but at a high computational cost and with exact arithmetic). One may search for solutions x ∈ VR (J) that minimize some chosen polynomial criterion f ∈ R[x], which reduces to solving a particular case of the polynomial optimization problem studied in Chapter 5. We show on a significant sample of benchmark problems that in most cases, the semidefinite relaxations (5.14) deliver a solution in a few iterations, thus providing a viable complement and/or alternative to the more ambitious but computationally expensive, algebraic approaches with exact arithmetic. In addition, it also works even when the ideal I is not zero-dimensional. Then we provide a unified treatment of problems (I) and (II) to obtain the 0-dimensional variety VL (J) with L = R or L = C. Remarkably: (a) All information needed is contained in the moment matrix Mi (y) provided the order i is sufficiently large. (b) In the algorithm, what differentiates the search for all real solutions from the search of all real and complex solutions is the presence or absence of a semidefiniteness constraint on the moment matrix. 6.2
Finding a Real Solution to Systems of Polynomial Equations
If one is interested in finding only at least one point x ∈ K with good (but not infinite) precision, then an alternative to algebraic methods is to specify an arbitrary polynomial criterion f ∈ R[x] to minimize, and solve the associated optimization problem (5.2) of Chapter 5 with K as in (6.3), and with optimal value fK∗ . More specifically, the basic closed semi-algebraic feasible set K ⊂ Rn is now of the form: K := { x ∈ Rn : gj (x) = 0,
j = 1, . . . , m;
hk (x) ≥ 0, k = 1, . . . , p} ,
(6.3)
6.2 Finding a Real Solution to Systems of Polynomial Equations
149
for some polynomials gj , hk ∈ R[x], for all j = 1, . . . , m, and all k = 1, . . . , p. That is, K is the intersection of a real variety with a basic closed semialgebraic set. It is clearly a special case of the set K described in (5.3), as an inequality can be written as two reverse inequalities. Therefore the semidefinite relaxations (5.14) specialize to ρi = inf Ly (f ) y
s.t. y0 Mi (y)
= 1 0,
(6.4)
Mi−vj (gj y) = 0, j = 1, . . . , m Mi−wk (hk y) 0, k = 1, . . . , p with 2i ≥ s := max[deg f, maxj deg gj , maxk deg hk ]. Let J := hg1 , . . . , gm i ⊂ R[x] be the ideal generated by the gj ’s. If VR (J) is finite then the sequence of semidefinite relaxations (6.4) has finite convergence, that is, there is some i1 ∈ N, such that ρi = fK∗ for all i ≥ i1 , and the inf is attained, that is, (6.4) has an optimal solution y. Moreover, by running Algorithm 4.2, one extracts s (:= rank Mi (y)) points of K, all global minimizers. In addition, if the ideal J is radical, then the dual SDPs of (6.4) also have finite convergence. More precisely, let v := ds/2e. Theorem 6.1. Let f ∈ R[x] and let K be compact and defined as in (6.3). Let Assumption (2.1) hold (considering an equality constraint gj (x) = 0 as two reverse inequality constraints), and consider the semidefinite relaxations defined in (6.4). Then: (a) ρi ↑ fK∗ and ρ∗i ↑ fK∗ as i→∞. (b) If the ideal J (= hg1 , . . . , gm i) is such that VR (J) is finite, then there exists i1 such that (6.4) has an optimal solution and ρi = fK∗ for all i ≥ i1 . Moreover, if y is an optimal solution of (6.4) for i = i1 , then rank Ms−v (y) = rank Ms (y) =: r,
(6.5)
for some d ≤ s ≤ i1 , and one may extract r global minimizers in K by Algorithm 4.2. (c) If J = hg1 , . . . , gm i is zero-dimensional and radical then there exists i2 such that the dual SDP of (6.4) is also solvable with value ρ ∗i = fK∗ for all i ≥ i2 .
6 Systems of Polynomial Equations
150
Theorem 6.1(a) is a particular case of Theorem 4.1 whereas to prove (b) and (c) one invokes Theorem 6.20 in Laurent (2008) and Theorem 2.26 respectively. Notice that for (c) the radicality assumption of the ideal J cannot be removed as seen in the following trivial one-dimensional example. Example 6.1. fK∗ = inf {x ∈ R : x2 = 0 }, x
for which we have ρ1 = inf { y1 : y0 = 1; y2 = 0; y
y0 y1 y1 y2
0 },
and so, evidently, fK∗ = 0 = ρ1 with optimal solution y0 = 1, y1 = y2 = 0. On the other hand, the polynomial x 7→ x − fK∗ = x cannot be written as q0 + x2 q1 for some q1 ∈ R[x] and some s.o.s. polynomial q0 ∈ Σ[x]. Therefore, the dual of the semidefinite relaxation (6.4) has no optimal solution. However, there is no duality gap because for every > 0, √ 1 1 x 7→ x − (fK∗ − ) = x + = ( + √ x)2 − x2 , 2 4 and so ρ∗1 = fK∗√= 0. Of course, the ideal J = hx2 i is zero-dimensional but not radical as J = hxi. Numerical experiment. The experiments were carried out by using the software GloptiPoly. In fact, instead of choosing an explicit criterion f ∈ R[x] to minimize, we have decided to minimize the trace of the moment matrix in the semidefinite relaxations (6.4). Alternative criteria (such as e.g. minimum coordinate or minimum Euclidean-norm solution) are of course possible. When the rank test (6.5) is passed, then running Algorithm 4.2 returns r solutions in K. Tables 6.1 and 6.2 give a short description of the examples in the sample. Numerical results are reported in Tables 6.3 and 6.4. Column sol indicates the number of solutions successfully extracted by GloptiPoly. In the last
6.2 Finding a Real Solution to Systems of Polynomial Equations Table 6.1 problem boon bifur brown butcher camera1s caprasse cassou chemequ cohn2 cohn3 comb3000 conform1 conform2 conform3 conform4 cpdm5 d1 des18 3 des22 24 discret3 eco5 eco6 eco7 eco8 fourbar geneig heart i1 ipp katsura5 kinema kin1 ku10 lorentz manocha noon3 noon4 noon5 proddeco puma quadfor2 quadgrid
151
Short descriptions of systems of polynomial equations. Part 1.
short description neurophysiology problem non-linear system bifurcation Brown’s 5-dimensional almost linear system Butcher’s system from PoSSo test suite displacement of camera between two positions Caprasse’s system from PoSSo test suite Cassou-Nogues’s system from PoSSo test suite chemical equilibrium of hydrocarbon combustion Cohn’s modular equations for special algebraic number fields Cohn’s modular equations for special algebraic number fields combustion chemistry example for a temperature of 3000 degrees Emiris’ conformal analysis of cyclic molecules (b11 = −9) √ Emiris’ conformal analysis of cyclic molecules (b11 = − 3/2) Emiris’ conformal analysis of cyclic molecules (b11 = −310) Emiris’ conformal analysis of cyclic molecules (b11 = −13) 5-dimensional system of Caprasse and Demaret sparse system by Hong and Stahl dessin d’enfant dessin d’enfant from PoSSo test suite 5-dimensional economics problem 6-dimensional economics problem 7-dimensional economics problem 8-dimensional economics problem four-bar mechanical design problem generalized eigenvalue problem heart dipole problem interval arithmetic benchmark six-revolute-joint problem of mechanics problem of magnetism in physics robot kinematics problem inverse kinematics of an elbow manipulator 10-dimensional system of Ku equilibrium points of 4-dimensional Lorentz attractor intersection of high-degree polynomial curves neural network modeled by adaptive Lotka-Volterra system neural network modeled by adaptive Lotka-Volterra system neural network modeled by adaptive Lotka-Volterra system system with product-decomposition structure hand position and orientation of PUMA robot Gaussian quadrature formula with 2 knots and 2 weights over [-1,+1] interpolating quadrature formula for function defined on a grid
6 Systems of Polynomial Equations
152
Table 6.2 Part 2. problem rabmo rbpl redeco5 redeco6 redeco7 redeco8 rediff3 reimer5 rose s9 1 sendra solotarev stewart1 stewart2 trinks virasoro wood wright
Short descriptions of systems of polynomial equations. short description optimal multi-dimensional quadrature formulas generic positions of parallel robot reduced 5-dimensional economics problem reduced 6-dimensional economics problem reduced 7-dimensional economics problem reduced 8-dimensional economics problem 3-dimensional reaction-diffusion problem 5-dimensional system of Reimer general economic equilibrium problem small system from constructive Galois theory from PoSSo test suite from PoSSo test suite direct kinematic problem of parallel robot direct kinematic problem of parallel robot from PoSSo test suite construction of Virasoro algebras system derived from optimizing the Wood function Wright’s system
column, mem means that the error message “out of memory” was issued by SeDuMi. GloptiPoly successfully solved about 90% of the problems.
6.3
Finding All Complex and/or All Real Solutions: A Unified Treatment
We here consider the ideal J := hg1 , . . . , gn i ⊂ R[x] generated by some polynomials gj ∈ R[x], j = 1, . . . , n, and its associated varieties VL (J) defined in (6.1) and (6.2) with L = C or R. One uses the vector space R[x]t ⊂ R[x] of polynomials of degree at most t, and certain subsets of its dual space (R[x]t )∗ , the space of linear functionals on R[x]t . More precisely, for an integer t ≥ d := maxj deg gj , let Ht := { xα gj : j = 1, . . . , m, and α ∈ Nn with |α| + deg gj ≤ t}. (6.6) In the algorithm for computing VC (J) we consider the set Kt := { y ∈ (R[x]t )∗ :
Ly (p) = 0 ∀p ∈ Ht },
(6.7)
6.3 Finding All Complex and/or All Real Solutions: A Unified Treatment
153
Table 6.3 Systems of polynomial equations. CPU times and semidefinite relaxation orders required to reach global optimum. Part 1. problem boon bifur brown butcher camera1s caprasse cassou chemequ chemequs cohn2 cohn3 comb3000 conform1 conform2 conform3 conform4 cpdm5 d1 des18 3 des22 24 discret3
var 6 3 5 7 6 4 4 5 5 4 4 10 3 3 3 3 5 12 8 10 8
cstr 6 3 5 7 6 4 4 5 5 4 4 10 3 3 3 3 5 12 8 10 8
deg 4 9 5 4 2 4 8 3 3 6 6 3 4 4 4 4 3 3 3 2 2
CPU 1220 8.20 6.27 1.33 0.58 9.48 6.73 0.48 0.55 24.6 0.22 0.19 3.89 12.2 0.24 77.2 0.31
order 4 6 3 4 2 3 8 3 2 3 3 2 3 3 5 6 2 3 4 1 1
sol 8 2 1 mem 2 2 mem 1 1 1 1 1 2 2 4 2 1 dim mem 1 1
whereas for computing VR (J), we rather consider the smaller set Kt, := { y ∈ Kt :
Mbt/2c (y) 0 }.
(6.8)
So the only difference between the two cases is the presence or absence of the semidefiniteness constraint Mt (y) 0 on the moment matrix Mt (y). Notice that K2t := { y ∈ (R[x]2t )∗ : Mt (gj y) = 0 , j = 1, . . . , m}, K2t, := { y ∈ (R[x]2t )∗ : Mt (y) 0;
Mt (gj y) = 0, j = 1, . . . , m },
and with K := VR (J) (i.e. no inequality constraints in (6.3)), the semidefinite relaxation (6.4) reads ρi = inf { Ly (f ) : y ∈ K2t, ; y0 = 1 }. y
(6.9)
6 Systems of Polynomial Equations
154
Table 6.4 Systems of polynomial equations. CPU times and semidefinite relaxation order required to reach global optimum. Part 2. problem eco5 eco6 eco7 eco8 fourbar geneig heart i1 ipp katsura5 kinema kin1 ku10 lorentz manocha noon3 noon4 noon5 proddeco puma quadfor2 quadgrid rabmo rbpl redeco5 redeco6 redeco7 redeco8 rediff3 reimer5 rose s9 1 sendra solotarev stewart1 stewart2 trinks virasoro wood wright
var 5 6 7 8 4 6 8 10 8 6 9 12 10 4 2 3 4 5 4 8 4 5 9 6 5 6 7 8 3 5 3 8 2 4 9 12 6 8 4 5
cstr 5 6 7 8 4 6 8 10 8 6 9 12 10 4 2 3 4 5 4 8 4 5 9 6 5 6 7 8 3 5 3 8 2 4 9 10 6 8 3 5
deg 3 3 3 3 4 3 4 3 2 2 2 3 2 2 8 3 3 3 4 2 4 5 5 3 2 2 2 2 2 6 9 2 7 3 2 2 3 2 2 2
CPU 5.98 57.4 256 1310 0.16 33.2 1532 44.1 6.42 0.74 26.4 72.5 0.64 1.27 0.22 0.65 4.48 0.11 1136 0.75 10.52 36.9 0.16 0.13 0.14 0.13 0.09 79.5 5.45 0.34 0.24 20.4 372 0.78 0.16 0.20 0.17
order 3 3 3 3 2 3 3 2 2 2 2 3 2 2 6 3 3 3 2 3 3 3 3 3 1 1 1 1 1 6 7 2 5 2 2 2 2 1 2 1
sol 1 1 1 1 1 1 2 1 1 1 1 dim 1 2 1 1 1 1 1 4 2 1 mem 1 1 1 1 1 1 mem 2 1 1 1 2 1 1 1 1 1
6.3 Finding All Complex and/or All Real Solutions: A Unified Treatment
6.3.1
155
Basic underlying idea
Notice that if K = VR (J) (assumed to be finite) and f = 1, then every x ∈ VR (J) is an optimal solution. Denote δx the Dirac measure at x ∈ Rn . The moment sequence y = (yα ) of a probability measure µ :=
X
x∈VR (J)
cx δx ,
cx ≥ 0;
X
cx = 1
x∈VR (J)
on VR (J), is a feasible solution of the semidefinite relaxations (6.9). In addition, among all such probability measures, those with cx > 0 for all x ∈ VR (J), give maximum rank to the moment matrix Mi (y), and for all sufficiently large i, rank Mi (y) = |VR (J)|. Moreover, as VR (J) is finite, then by Theorem 6.1(b), there is an index i1 such that the semidefinite relaxation (6.9) is exact and y is the moment sequence of some probability measure on VR (J). Now, a generic element of K2t, (i.e., generated randomly according to a uniform probability distribution on K2t, ) gives maximun rank to the moment matrix Mt (y). The idea is to impose in the semidefinite relaxation (6.9) the additional constraint that y ∈ Kt, be generic (so as to give maximum rank to Mbt/2c (y)), and so, when (6.5) is satisfied, y is the moment sequence of a probability measure supported on all points of VR (J). Finally, it turns out that a generic element of Kt, lies in the relative interior of Kt, and can be found with appropriate interior-point algorithm for semidefinite programming algorithms. Some of SDP solvers used in GloptiPoly (e.g. the SeDuMi solver) implement such interior point methods. So, any optimal solution y of (6.9) (obtained by running GloptiPoly with e.g. the SeDuMi solver of Sturm (1999)) is generic and provides the desired result when y satisfies (6.5). At last but not least, for all t sufficiently large, every element y of Kt will also satisfy the rank condition (6.5), but Mt (y) 0 does not necessarily hold and so in general, y ∈ Kt is not anymore the moment sequence of a probability measure. However, again, a generic element (i.e. in the relative interior of K2t ) gives maximum rank to Mt (y) and along with the property (6.5), permits to extract all points of VC (J) whenever VC (J) is finite. 6.3.2
The moment-matrix algorithm
Before presenting the moment-matrix algorithm to compute the variety VL (J) of an ideal J ⊂ R[x] with L = C or R, we briefly explain how
156
6 Systems of Polynomial Equations
computing the points of VC (J) reduces to an eigenvalue problem when J is 0-dimensional. Recall that for an arbitrary variety V ⊂ Cn , the vanishing ideal I(V ) is the set of polynomials g ∈ R[x] that vanish on V , and by the Hilbertnull√ stellenstaz, for an arbitrary ideal J ⊆ R[x], its associated radical ideal J is the vanishing √ ideal I(VC (J)). Similarly, by the Realnullstellensatz, the real radical R J is the vanishing ideal I(VR (J)); see Section A.2 The quotient algebra R[x]/J. Given an ideal J ⊆ R[x], the elements of the quotient space R[x]/J are the cosets [f ] = f + J : {f + q : q ∈ J} and if J is 0-dimensional, then R[x]/J it is a finite-dimensional R-vector space with addition [f ] + [g] = [f + g] and scalar multiplication λ[f ] = [λf ] for λ ∈ R, and f, g ∈ R[x].√ Its dimension is larger than |VC (J)|, with equality if J is radical (i.e. J = J). It is also an algebra with multiplication [f ] [g] = [f g], and given an arbitrary h ∈ R[x], one may define the linear multiplication operator mh : R[x]/J→R[x]/J that gives [f ] 7→ [f h] for all f ∈ R[x]. If BJ denotes a basis of R[x]/J then the matrix Mh associated with mh in that basis, plays a fundamental role in obtaining the points of VC (J). Indeed: Theorem 6.2. Let J ⊂ R[x] be a zero-dimensional ideal and let BJ := {b1 , . . . , bN } be a basis of R[x]/J. For every v ∈ VC (J) let uv := (bi (v))1≤i≤N ∈ Cn . Then the set {h(v) : v ∈ VC (J)} is the set of eigenvalues of Mh and Mh uv = h(v) uv for all v ∈ VC (J). In particular, by applying Theorem 6.2 with the polynomial x 7→ h(x) := xi , one obtains the coordinates vi ∈ C of all points v ∈ VC (J), as the eigenvalues of the multiplication matrix Mxi . We next present a few results that are relevant to understand the moment-matrix algorithm to compute VL (J) with L = C or L = R. Recall that Nnt = {α ∈ Nn : |α| ≤ t} and for a polynomial g ∈ R[x]t , n g ∈ RNt denotes its vector of coefficients in the usual canonical basis (xα ). n For a sequence y = (yα ) ∈ RN2t , let hKer Mt (y)i ⊂ R[x] be the ideal generated by the polynomials g ∈ R[x]t such that Mt (y)g = 0. In other words, hKer Mt (y)i is the ideal generated by the polynomials g of degree at most t and whose vector of coefficients g lies in the kernel of the moment matrix Mt (y).
6.3 Finding All Complex and/or All Real Solutions: A Unified Treatment
157
Let d := maxj ddeg gj /2e, and let L = C or R. Let Kt := Kt when L = C and Kt := Kt when L = R. The basic idea of the algorithm is to obtain an index s and a generic sequence y ∈ Kt such that the ideal J0 := hKer Ms (y)i satisfies J ⊆ J0 ⊆ I(VL (J))
(6.10)
and the sequence y satisfies rank Ms−d (y) = rank Ms (y).
(6.11)
The first condition (6.10) will ensure that VL (J0 ) = VL (J) while the second condition (6.11) will permit to build up multiplications matrices Mxi for extracting all points of VL (J). It turns out that finding such a couple (s, y) ∈ N × Kt is possible thanks to very nice properties of the moment matrix’s kernel Ker Ms (y) when y ∈ Kt . More precisely, to get the first inclusion in (6.10) we use Lemma 6.3. Let d ≤ s ≤ bt/2c and y ∈ Kt . rank Ms (y) then J ⊆ hKer Ms (y)i,
If rank Ms−d (y) =
while to get the second inclusion we use Theorem 6.4. Let 1 ≤ s ≤ bt/2c and y ∈ Kt be such that rank Ms (y) = max { rank Ms (z) : z ∈ Kt }.
(6.12)
Then hKer Ms (y)i ⊆ I(VL (J)). Finally:
Theorem 6.5. Let y ∈ Kt and d ≤ s ≤ bt/2c. Assume that rank Ms (y) is maximum (i.e. (6.12) holds), and let J0 := hKer Ms (y)i. If rank Ms−d (y) = rank Ms (y) then √ J ⊆ J0 ⊆ I(VC (J)) (= J), case L = C (6.13) √ R J0 = I(VR (J)) (= J), case L = R, (6.14) and any basis of the column space of Ms−1 (y) is a basis of R[x]/J0 . Hence one can construct the multiplication matrices in R[x]/J0 from the matrix Ms (y) and find the variety VL (J0 ) = VL (J) using the eigenvalue method.
158
6 Systems of Polynomial Equations
In fact, (extraction) Algorithm 4.2 applied to the moment matrix Ms (y) of Theorem 6.5 is exactly the eigenvalue method applied to the ideal hKer Ms (y)i. The moment-matrix algorithm. Assume that L = R or C and VL (J)| < ∞. Algorithm 6.1. The moment-matrix algorithm for VL (J) Input: gj , j = 1, . . . , m; t ≥ d. If L = C set Kt := Kt else if L = R set Kt := Kt, . Output: A basis BJ ⊆ R[x]s−1 of R[x]/hKer Ms (y)i (which enables the computation of VL (J)). 1 Find a generic element y ∈ Kt (i.e., such that (6.12) holds). 2 Check if rank Ms−d (y) = rank Ms (y) for some d ≤ s ≤ bt/2c. 3 If yes then 4 return a basis BJ ⊆ R[x]s−1 of the column space of Ms−1 (y), and extract VL (J) by applying Algorithm 4.2 to the moment matrix Ms (y). 5 else 6 iterate (go to 1) replacing t with t + 1 7 end if Algorithm 6.1 has finite convergence because:
Theorem 6.6. Let L = C or R, and let J := hg1 , . . . , gm i ⊂ R[x] satisfy 1 ≤ |VL (J)| < +∞. Then there exists t1 ≥ d, t2 ∈ N, such that for every bt/2c ≥ t1 + t2 : rank Mt1 (y) = rank Mt1 −d (y)
∀ y ∈ Kt .
And so, Theorem 6.6 combined with Theorem 6.5 yields the finite convergence of Algorithm 6.1. It is worth noticing that when L = R, then Algorithm 6.1 reduces to solving the optimization problem (5.2) with objective (polynomial) function f = 1, using the hierarchy of semidefinite relaxations (5.14), provided that one uses a primal-dual interior point algorithm, as proposed in the
6.3 Finding All Complex and/or All Real Solutions: A Unified Treatment
159
GloptiPoly software. This is because with f = 1, all points of VR (J) are optimal solutions, and an optimal solution y of (5.14) will lie in the relative interior of Kt , the desired property needed in Algorithm 6.1 to get all real solutions. Example 6.2. Consider the following example in R3 h1 = x21 − 2x1 x3 + 5 h2 = x1 x22 + x2 x3 + 1 h3 = 3x22 − 8x1 x3 which has 8 complex solutions among which only 2 are real. Table 6.5 Rank of Ms (y) for generic y ∈ Kt in Example 6.2. s= s= s= s= s=
0 1 2 3 4
t=2 1 4
t=4 1 4 8
t=6 1 4 8 9
t=8 1 4 8 8 9
For the algorithm with L = C, Table 6.5 shows the ranks of the matrices Ms (y) for generic y ∈ Kt = Kt , as a function of s and t. The rank condition rank Ms−d (y) = rank Ms (y) for some d ≤ s ≤ bt/2c holds for t = 8 and s = 3 as we have: rank M3 (y) = rank M2 (y) with y ∈ K8 . One obtains the following 8 complex solutions: v1 = −1.101, −2.878, −2.821 , v2 = 0.07665 + 2.243i, 0.461 + 0.497i, 0.0764 + 0.00834i , v3 = 0.07665 − 2.243i, 0.461 − 0.497i, 0.0764 − 0.00834i , v4 = −0.081502 − 0.93107i, 2.350 + 0.0431i, −0.274 + 2.199i , v5 = −0.081502 + 0.93107i, 2.350 − 0.0431i, −0.274 − 2.199i , v6 = 0.0725 + 2.237i, −0.466 − 0.464i, 0.0724 + 0.00210i , v7 = 0.0725 − 2.237i, −0.466 + 0.464i, 0.0724 − 0.00210i , v8 = 0.966, −2.813, 3.072 with maximum error of := maxi≤8,j≤3 |hj (vi )| ≤ 3 · 10−10 . For the algorithm with L = R and the same example, Table 6.6 displays the ranks of the matrices Ms (y) for generic y ∈ Kt, ; now the rank condition is satisfied at s = 2 and t = 6; that is, rank M2 (y) = rank M1 (y) with
6 Systems of Polynomial Equations
160
Table 6.6 Rank of Ms (y) for generic y ∈ Kt, in Example 6.2.
s= s= s= s=
0 1 2 3
t=2 1 4
t=4 1 4 8
t=6 1 2 2 3
y ∈ K6, . The real roots extracted are v1 = (−1.101, −2.878, −2.821);
v2 = (0.966, −2.813, 3.072)
with a maximum error of ≤ 9 · 10−11 . 6.4
Summary
In this chapter we have considered two applications of the method of moments for solving and analyzing systems of polynomial equations. In the first case we use the machinery developed in Chapter 5 and propose a hierarchy of semidefinite relaxations to obtain (at least) one solution. Experimental results show that in many cases a solution is obtained at a relaxation of low order. In the second case we have provided an algorithm to compute all complex or all real solutions of a system of polynomial equations when the ideal generated by the polynomials defining the equations has finitely many complex or real roots. To do so we use ideal-like properties of the kernel of moment matrices and the resulting algorithm uses only numerical linear algebra in the complex case and numerical linear algebra plus semidefinite programming for extracting only the real roots. Remarkably: - all information needed to compute the roots is contained in the moment matrix and its kernel. - the only difference between the real and complex cases is the presence or absence of a semidefiniteness constraint on the moment matrix. In the former (real) case, Algorithm 6.1 is the same as solving the optimization problem (5.2) with objective criterion f = 1, using the hierarchy of semidefinite relaxations (5.14), provided that one uses a primal-dual interior point algorithm, as proposed in the Gloptipoly software. The resulting algorithm for the real case is numerical in nature as it uses semidefinite programming and could provide an alternative to algebraic
6.6 Notes and Sources
161
approaches based on Gr¨ obner bases with exact arithmetic. The latter are more ambitious as they provide all solutions with infinite precision but at a high computational cost.
6.5
Exercises
Exercise 6.1. Consider the ideal I := hxi (x2i + 1)i ⊂ R[x] generated by the polynomials x 7→ gi (x) = xi (x2i + 1), i = 1, . . . , n. • Compute VC (I) and VR (I); compare their cardinality. • Compute K4, . What happens? • When does Algorithm 6.1 with L = R stop? Exercise 6.2. With n = 2, consider the ideal I := hx21 + x22 i ⊂ R[x] generated by the polynomial x → 7 g(x) = x21 + x22 . • Compute VC (I) and VR (I); compare their cardinality. • Compute K4, . What happens? • When does Algorithm 6.1 with L = R stop?
6.6
Notes and Sources
There exist various methods for solving problem (I) which range from numerical continuation methods see e.g. Sommese and Wampler (2005)), to exact symbolic methods (e.g. Rouillier (1999)), or more general symbolic/numeric methods (e.g. Mourrain and Tr´ebuchet (2005) or Zhi and Reid (2004), see also the monograph Stetter (2004)). For instance, Verschelde (1999) proposes a numerical algorithm via homotopy continuation methods (cf. also Sommese and Wampler (2005)) whereas Rouillier (1999) solves a zero-dimensional system of polynomials symbolically by giving a rational univariate representation (RUR) for its solutions, of the form 1 (t) n (t) f (t) = 0, x1 = gg(t) , . . ., xn = gg(t) , where f, g, g1 , . . . , gn ∈ R[t] are univariate polynomials. The computation of the RUR relies in an essential way on the multiplication matrices in the quotient algebra R[x]/I which thus requires knowledge of a corresponding linear basis of the quotient space. The literature concerned with solving problem (II), is by far not as broad as for problem (I). With the exception of Lasserre et al. (2008b), most algorithms are based on real-root counting algorithms using e.g. Hermite’s
162
6 Systems of Polynomial Equations
Quadratic forms or variants of Sturm sequences; see e.g. Basu et al. (2003) or Rouillier and Zimmermann (2003) for a discussion. The moment matrix algorithm was first derived for computing VR (I) in Lasserre et al. (2008b) and then extended to cover both cases VC (I) and VR (I) in Lasserre et al. (2008c). Finally, one may also adapt to the real case some existing methods tailored to the complex case. One still uses positive semidefiniteness of moment matrices but now combined with a stopping criterion different from the rank test of Theorem 6.6. For instance, in Lasserre et al. (2009) one uses a stopping criterion borrowed from Zhi and Reid (2004) for the complex case.
Chapter 7
Applications in Probability
This chapter covers some applications in probability. We first consider the problem of computing an upper bound on µ(S) over all measures µ that satisfy certain moment conditions. We then consider the problem of computing (or at least approximating) the volume of a compact basic semi-algebraic set. We end up with the mass-tranfer (or Monge-Kantorovich) problem.
With S ⊂ Rn , the problem of computing bounds on µ(S) over all measures µ that satisfy certain moment conditions is motivated by providing bounds on the probability that a Rn -valued random variable belongs to S, given the only information that some of its moments are known. This latter problem is old as it dates back to famous mathematicians and probabilists like e.g. Markov, Chebyshev and Chernoff whose names are now associated with some celebrated bounds in probability. In this chapter we apply results of earlier chapters to provide a general optimization based methodology to address this problem. We also consider the important problem of measuring a basic closed semi-algebraic set K ⊂ Rn , i.e. computing µ(K) for a given measure µ on Rn when all moments of µ are known. In particular if µ is the normalized Lebesgue measure on a box that contains K, then µ(K) is just the (Lebesgue) R volume of K. Finally, we consider the problem of bounding the integral K f dµ over all Q measures µ on the cartesian product K := pj=1 Kj , with given marginals νj on Kj , j = 1, . . . , p. For p = 2 this is just the celebrated mass transfer (or Monge-Kantorovich) problem. 7.1
Upper Bounds on Measures with Moment Conditions
Let γα , α ∈ Γ ⊂ Nn , be a sequence of moments and let S = {x ∈ Rn : pj (x) ≥ 0, 163
j = 1, . . . , t}
(7.1)
7 Applications in Probability
164
be a given basic semi-algebraic set. Then, given a Rn -valued random variable X, the problem of finding an optimal bound on Prob (X ∈ S), given some of its moments γα , can be formulated as the problem: of computing:
ρmom =
sup µ∈M (Rn )+
s.t.
Z Z
1S dµ Rn
Rn
(7.2) xα dµ = γα , α ∈ Γ,
which is (4.2) with K = Rn , f = 1S and x 7→ hα (x) := xα , α ∈ Γ. Note that γ0 = 1. As we do not want to deal with indicator functions, without loss of generality we decompose µ ∈ M (Rn )+ into a sum φ + ν, with φ, ν ∈ M (Rn )+ , φ supported on S and ν supported on Rn . And so we now consider the generalized moment problem: Z ρmom = sup dφ ν∈M (Rn )+ ,φ∈M (S)+ S (7.3) Z Z s.t. xα dφ + xα dν = γα , α ∈ Γ. Rn
S
The multi-measures moment problem (7.3) is equivalent to (7.2). Indeed, consider an arbitrary feasible solution φ, ν such that ν(S) > 0 and let ν1 , ν2 be the restrictions of ν to S and Rn \ S respectively. Then (φ0 , ν 0 ) with φ0 := φ + ν1 and ν 0 := ν2 is another feasible solution with value φ0 (S) = φ(S) + ν1 (S) ≥ φ(S). Therefore as we maximize µ(S) there is no need to impose that ν is supported on Rn \ S. The multi-measures moment problem has been studied in Chapter 4. So introduce the moment variables y = (yα ) and z = (zα ), Z Z yα = xα dφ, zα = xα dν, ∀ α ∈ Nn . S
Rn
Let 2vj or 2vj − 1 be the degree of pj ∈ RR[x], j = 1, . . . , t. Using the decomposition of µ, the moment condition xα dµ = γα reads: yα + z α = γ α ,
∀ α ∈ Γ.
Since the measure φ has support on the basic semi-algebraic set S, then for
7.1 Upper Bounds on Measures with Moment Conditions
165
i ≥ v := maxj=1,...,t vj , one imposes the semidefinite constraints: Mi (y) 0,
Mi−vj (pj y) 0, j = 1, . . . , t.
On the other hand, since ν is supported on Rn we only require that Mi (z) 0. Therefore introduce the sequence of semidefinite optimization problems: ρi = sup y0 y,z
s.t. yα + zα = γα , Mi (y) 0,
α∈Γ
(7.4)
Mi−vj (pj y) 0, j = 1, . . . , t
Mi (z) 0
which are the semidefinite relaxations analogues of (4.26) for problem (7.3). But we cannot apply Theorem 4.5 because ν is supported on Rn , a non compact set. However, we obtain the following sequence of upper bounds. (Recall that v := maxj=1,...,t vj .)
Theorem 7.1. Let ρi be the optimal value of the semidefinite program (7.4). Then: (a) For every i ≥ v, ρi ≥ ρmom and moreover, ρi ↓ ρ∗ ≥ ρmom as i→∞. (b) If ρi is attained at an optimal solution (y, z) which satisfies rank Mi (y) = rank Mi−v (y) (7.5) rank Mi (z) = rank Mi−1 (z) then ρi = ρmom .
If Condition (7.5) is satisfied, we can apply Algorithm 4.2 to extract a finitely supported measure that achieves the bound ρmom . If in problem (7.3) the sup is now over measures µ with support on a basic semi-algebraic set K ⊂ Rn as defined in (4.1), we need to slightly modify the approach by requiring that the measure ν has support on K, i.e., with wj = ddeg gj /2e, one includes the additional semidefinite constraints Mi−wj (gj z) 0, j = 1, . . . , m, in (7.4); see Exercise 7.2.
166
7 Applications in Probability
If both K and S are compact and satisfy Assumption 2.1 then we can directly apply Theorem 4.5 to the generalized moment problem (7.3) and so in Theorem 7.1(a), the equality ρ∗ = ρmom now holds. Let us illustrate the approach in this section with some examples. Example 7.1. Let n = 2 and (1, γ10 , γ01 , γ20 , γ11 , γ02 ) = (1, 20, 20, 500, 390, 500)
(7.6)
be the vector of first and second order moments and let S = {x2 + y 2 ≤ 1}. The first relaxation yields ρ1 ≈ 0.1079 and Algorithm √ 4.2 constructs an √ optimal measure µ to be a Dirac measure at the point ( 2/2, 2/2) on S with mass 0.1079. Thus, the first relaxation is exact. If in addition, to this set of moments we add the moments γ40 = γ04 = 251000, we have v2 = 2, and thus √ v =√2. We find that ρ2 ≈ 0.003264 and a Dirac measure at the point ( 2/2, 2/2) on S with mass 0.003264. Note that the uper bound has decreased significantly with only two fourth order moment conditions. To illustrate the dependence of the bound on the set S, let S := {x2 /2+ 2 y ≤ 1}, and use the moment conditions (7.6) to obtain ρ1 ≈ 0.10944. Algorithm 4.2 constructs an optimal measure µ to be a Dirac measure at the point (1.1454, 0.5865) on S with mass 0.10944. Example 7.2. This example shows that the approach applies for nonconvex and disconnected sets S. Let n = 2, (1, γ10 , γ01 , γ20 , γ11 , γ02 , γ40 , γ04 ) = (1, 0, 0, 20, 0, 20, 500, 500), and let S := {x2 /2 + y 2 ≤ 1, x2 + y 2 /2 ≥ 1}. The set S being an intersection of an ellipsoid with a complement of another ellipsoid, consists of two disconnected nonconvex sets. The two ellipsoids intersect in four points p p 4 (x(i))i=1 := (± 2/3, ± 2/3). The second relaxation yields ρ2 ≈ 0.2111 and Algorithm 4.2 yields an optimal measure " 4 # 1X µ = 0.2111 δx , 4 i=1 i p p where (x(i))4i=1 = (± 2/3, ± 2/3), are the intersection points of the two ellipsoids defining the set S.
7.1 Upper Bounds on Measures with Moment Conditions
167
Upper bounds in the univariate case We here consider the univariate case, that is, upper bounds on a probability measure µ on R. Introduce the matrices y 0 y 1 y 2 . . . yi y1 y2 . . . . . . yi+1 Mi (y) := . . .. .. .. , .. .. . . . yi yi+1 . . . y2i−1 y2i
Bi (y) :=
y1 y2 .. . yi+1
y2 . . . yi+1 y3 . . . yi+2 .. , .. .. . . . yi+2 . . . y2i+1
Ci (y) :=
y2 y3 .. . yi+1
y3 . . . yi+1 y4 . . . yi+2 .. . .. .. . . . yi+2 . . . y2i
While the Hankel matrix Mi (y) is the moment matrix associated with the sequence y, in Bi (y) (resp. in Ci (y)) one recognizes the localizing matrix Mi (g y) (resp. Mi−1 (h y)) associated with y and the polynomial x 7→ g(x) := x (resp. x 7→ h(x) := x2 ). With S = [a, b] and (γα ), α ∈ Γ ⊂ N, being a finite collection of moments, consider the semidefinite optimization problem: ρi = sup y0 y,z
s.t. yα + zα = γα , Mi (y) 0,
α∈Γ
(7.7)
(a + b)Bi−1 (y) ab Mi−1 (y) + Ci (y) Mi (z) 0.
Theorem 7.2. Let γ = (γα ) be such that α ≤ 2i for all α ∈ Γ. (a) Let S = [a, b] R⊂ R. Then ρi = supµ∈M (R)+ {µ(S) : µ ∼ γ }, where µ ∼ γ means that xα dµ = γα for all α ∈ Γ, that is, ρi is a tight upper bound. (b) Let S = [a, b] ⊂ R+ and include the additional constraint Bi−1 (z) 0 in the semidefinite program (7.7). Then ρi = supµ∈M (R+ )+ {µ(S) : µ ∼ γ }, that is, ρi is a tight upper bound.
7 Applications in Probability
168
Note that the additional constraint Bi−1 (z) 0 ensures that we restrict ourselves to measures µ with support on R+ . Example 7.3. Let S = [1, 3] and (1, γ1 , γ2 ) = (1, 2, 10). In this case ρ1 = supµ µ(S) = 1. With γ3 = 15 and γ4 = 150, one obtains ρ2 = supµ ≈ 0.8815 and µ = φ + ν with φ = 0.8815 (λδ{1} + (1 − λ)δ{3} ) and λ = 0.1102. Lower bounds Suppose one now wishes to find a lower bound on µ(S). A natural idea is to replace “sup” by “inf” in the semidefinite optimization problem (7.4). But if we decompose µ = φ + ν with the support of φ contained in S, we now must impose ν to have support in Rn \ S (which is not closed if S is), otherwise we would obtain that ρi = 0, with φ = 0 as long as 0 ∈ S. Alternatively, one may wish to find an upper bound on 1−µ(S) = µ(Rn \S) and apply the above methodology if Rn \ S is a basic closed semi-algebraic set. Let us illustrate the approach by an example. Example 7.4. Let S = {x ∈ R2 : x21 /100 + x22 < 1} and γ = (1, 0, 0, 0.1, 0.0.1, 0.5, 0.5) = (1, γ10 , γ01 , γ20 , γ11 , γ02 , γ40 , γ04 ). In this case Sc (= R2 \ S) = {x2 /100 + y 2 ≥ 1}. The optimal value ρ2 = 0.1010 leads to the upper bound µ(Sc ) ≤ 0.101, which in turn leads to the lower bound µ(S) ≥ 0.899. 7.2
Measuring Basic Semi-algebraic Sets
Given a measure µ on Rn with all its moments known (or which can be computed), and a compact basic semi-algebraic set S ⊂ Rn , computing (or approximating) µ(S) is an important problem with applications not only in probability but also in operations research and applied Mathematics. For instance, with µ being the Lebesgue measure on a box K := [a, b]n ⊂ n R with mass y0 and with S ⊂ K, then µ(S) provides the volume of S (scaled by y0 /(b − a)n ). Computing the volume and/or integrating on a subset K ⊂ Rn is a challenging problem with potentially many important applications. One possibility is to use a basic Monte Carlo technique that generates points uniformly in a box containing K and then compute the proportion of points falling into K. To the best of our knowledge, most of all other approximate (deterministic or randomized) or exact techniques
7.2 Measuring Basic Semi-algebraic Sets
169
deal with polytopes or convex bodies only. In fact, it is well-known that even approximating the volume is hard (and even for convex polytopes). For numerical integration against a weight function on simple sets (like e.g. simplex, box) powerful cubature formulas are available, but not for abritrary basic semi-algebraic sets. As we next see, the moment approach developed earlier is well-suited to address this problem. So with K ⊂ Rn being a basic closed semi-algebraic set as described in (4.1) and S being the basic semi-algebraic set described in (7.1), consider the following optimization problem:
ρ=
sup ν∈M (K)+
s.t.
Z Z
1S dν K
(7.8) n
α
∀α ∈ N ,
x dν = γα , K
where γ = (γα ) is the (known) moment vector of µ. Hence (7.8) is the generalized moment problem (4.2) with f = 1S and with countable set Γ = Nn . Recall that measures on compact sets are determinate, that is, completely determined by their moments. Therefore, if K is compact, problem (7.8) is a fake optimization problem since because of the moment constraints, ν is uniquely determined. Therefore, necessarily ν = µ and ρ = µ(S). Again, we do not want to deal with indicator functions. So, as we did for problem (7.2), we consider the multi-measures generalized moment problem:
ρmom =
sup ν∈M (K)+ ,φ∈M (S)+
s.t.
Z Z
dφ S α
x dφ + S
Z
(7.9) α
K
n
x dν = γα , ∀ α ∈ N ,
which is exactly (7.3) with Γ = Nn and with K instead of Rn . Let (φ, ν) be an arbitrary feasible solution of (7.9). As both φ and ν have support contained in K, the constraints of (7.9) imply φ + ν = µ because measures with compact support are determinate. Hence φ ≤ µ which in turn yields φ(S) ≤ µ(S), and so ρmom ≤ µ(S). Therefore, a unique optimal
7 Applications in Probability
170
solution of (7.9) is the couple (φ∗ , ν ∗ ) where φ∗ (resp. ν ∗ ) is the restriction of µ to S (resp. to K \ S). Then indeed, ρmom = φ∗ (S) = µ(S) = ρ. Next, the analogues of semidefinite relaxations (4.26) for problem (7.9) read: ρi = sup y0 y,z
s.t. yα + zα = γα , Mi (y) 0, Mi (z) 0. Mi−vj (pj y) 0,
Mi−wj (gj z) 0,
α ∈ Nn2i
(7.10)
j = 1, . . . , t j = 1, . . . , m
where wj = d(deg gj )/2e, j = 1, . . . , m and Nn2i := {α ∈ Nn : |α| ≤ 2i}. Of course one has ρi ≥ ρmom because (7.10) is a relaxation of (7.9). So we have a multi-measures moment problem with countably many moment constraints. The following result is a slight extension of Theorem 4.5 to handle the countably many moment constraints of (7.9).
Theorem 7.3. Let Assumption 2.1 hold for K ⊂ Rn in (4.1) and for S ⊂ Rn in (7.1). Let ρmom and ρi be as in (7.9) and (7.10) respectively. Then as i→∞, ρi ↓ ρmom = µ(S). Proof. We briefly sketch the proof which is very similar to that of Theorem 4.5. Let (yi , zi ) be a nearly optimal solution of (7.10), e.g. any feasible solution with value ρi − 1/i ≤ y0 ≤ ρi . Then (yik , zik ) converges pointwise to some (y, z) for some subsequence (ik ). Fix α ∈ Nn , arbitrary. The pointwise convergence yields yα + zα = γα and as α is arbitrary, (y, z) satisfies the moment constraints. Again, because of the pointwise convergence, for every r = 1, . . ., we also have Mr (y), Mr (pj y) 0, j = 1, . . . , t, and Mr (z), Mr (gj z) 0, j = 1, . . . , m. Therefore, by Theorem 3.8, y (resp. z) is the moment vector of some measure φ (res. ν) supported on S (resp. on K), and so (φ, ν) is feasible for (7.9) with value φ(S) = limk→∞ y0ik = limk→∞ ρik ≥ ρmom . And so (φ, ν) is an optimal solution of (7.9). As ρi is monotone we deduce the desired result limi→∞ ρi = ρmom . On the other hand we have already seen that ρmom = µ(S). Remark 7.1. One may always rescale the problem so as to make the set
7.2 Measuring Basic Semi-algebraic Sets
171
K contained in the box [−1, 1]n . In this case, in (7.10) one may ignore the constraint Mi−wj (gj z) 0, to obtain the slightly simpler semidefinite realaxation: ρi = sup y0 y,z
s.t. yα + zα = γα , Mi (y) 0, Mi (z) 0. Mi−vj (pj y) 0,
α ∈ Nn2i
(7.11)
j = 1, . . . , t,
and Theorem 7.3 remains valid with (7.11) instead of (7.10). i i Indeed, in the proof of Theorem 7.3, from yα + zα = γα , for every n i α ∈ N2i , one obtains 0 Mi (z ) Mi (γ) for every i. In particular, i z2β ≤ γ2β for every β ∈ Nni . Next, as µ is supported on K ⊆ [−1, 1]n , i one has |γα | ≤ 1 for every α ∈ Nn , and so by Proposition 3.6, |zα |≤1 n for all α ∈ N2i . This implies that the limiting sequence z in the proof of Theorem 7.3 satisfies |zα | ≤ 1 for all α ∈ Nn , which in turn by Theorem 3.5(b), implies that z has a representing measure ν supported on [−1, 1]n . But by y + z = γ and the fact that measures with compact support are determinate, we have ν ≤ µ. Therefore since µ is supported on K then so is ν. The rest of the proof is the same. The dual problem R As problem (7.8) is the same as ρ = sup { S dν : ν = µ ; ν ∈ M (K)+ }, consider its dual Z ρ∗ = inf { f dµ : f ≥ 1S ; f ∈ C(K) }, (7.12) f
S
where C(K) is the Banach space of continuous functions on K. Of course, instead of C(K) one could have chosen the space B(K) of bounded measurable functions on K, in which case a trivial optimal solution is f := 1S . But one next shows that by using the much smaller class of functions C(K), one also obtains the same optimal value. And as we will see, the dual of the semidefinite relaxation (7.11) is strongly related to (7.12). Let x 7→ d(x, S) be the euclidean distance to the set S, and for > 0 let D := {x ∈ K : d(x, S) ≥ }. Lemma 7.4. Assume that S and K \ S have non empty interior, and let ρ and ρ∗ be as in (7.8) and (7.12), respectively.
7 Applications in Probability
172
Then ρ = ρ∗ = µ(S). Moreover, there is a minimizing sequence in R[x], i.e. a sequence of polynomials R (p ) ⊂ R[x] that satisfy the constraints of (7.12) and such that lim→0 S p dµ = ρ∗ . Proof. As S and K\S have non empty interior, there is some 0 > 0 such that for every < 0 , the closed set D is non empty and disjoint from S. As K is a metric space (hence a normal space), by Urysohn’s Lemma1 there is a continuous function f : K→[0, 1] such that f = 1 on S and f = 0 on D . Therefore, Z µ(S) ≤ f dµ ≤ µ({x ∈ K : d(x, S) < }). K
R
and so K f dµ→µ(S) as →0. Let (k )k ⊂ R+ be a sequence with k > 0 for every k, k →0 as k→∞, and let fk be as above. As K is compact, by the Stone-Weierstrass theorem, for every k ∈ N, there is a polynomial pk ∈ R[x] such that sup {|fk (x) − pk (x)| : x ∈ K} ≤ k . Thefore the Rsequence (pk + k )k ⊂ R[x] R is such that pk + k ≥ 1S on K and limk→∞ K (pk + k )dµ = limk→∞ K fk dµ = ρ∗ . R Hence in solving (7.12), one wishes to minimize K f dµ over all f in C(K) minorized by 1S . And in fact, there is a minimizing sequence in R[x] (⊂ C(K)) with the desired limiting value ρ∗ . On the other hand, with K ⊆ [−1, 1]n , the dual of the semidefinite relaxation (7.11) reads: X ρ∗i = inf λα γ α λ,σj ,ψ0
s.t.
α
X α P
λα x α − 1 = σ 0 +
t X
σj p j ,
j=1
(σj )tj=0 ⊂ Σ[x],
α ψ0 ∈ Σ[x], α λα x = ψ 0 , deg ψ0 , deg σ0 , deg σj pj ≤ 2i,
or, equivalently:
1 Urysohn’s Lemma states that for any two disjoint closed sets A, B of a normal Hausdorff space X, there is a continuous function X→ [0, 1] such that f = 1 on A and f = 0 on B; see Ash (1972)[A4.2, p. 379]. A metric space is normal.
7.2 Measuring Basic Semi-algebraic Sets
ρ∗i = inf
f,σj
Z
173
f dµ K
s.t. f − 1 = σ0 +
t X j=1
σj p j ,
f, σ0 , . . . , σt ∈ Σ[x]
(7.13)
deg f, deg σ0 , deg σj pj ≤ 2i. That is, among the s.o.s. polynomials f such that f ≥ 1S , one searches R for the one that minimizes K f dµ. Equivalently, with this criterion, one searches for the best polynomial approximation from above of the function 1S . So obviously ρ∗i ≥ ρ∗ . Proposition 7.5. Let K ⊆ [−1, 1]n and let both S and K \ S have a non empty interior. Then for every i ≥ maxj=1,...,t vj , the semidefinite relaxations (7.11) and (7.13) have same optimal value, i.e., ρ i = ρ∗i . And so ρ∗i →ρmom = µ(S). Proof. Let y (resp. z) be the moment sequence of the restriction of µ to S (resp. to K \ S). As both S and K \ S have a non empty interior, then necessarily, for every i one has Mi (z) 0, Mi (y) 0, and Mi (pj y) 0 for every j = 1, . . . , t. Indeed, for every 0 6= h ∈ R[x]i and 0 6= g ∈ R[x]i−vj , with respective vectors of coefficients h and g, Z Z hh, Mi (y)hi = h2 dµ 6= 0 , hh, Mi (z)hi = h2 dµ 6= 0, K\S
S
and hg, Mi (pj y)gi =
Z
S
g 2 pj dµ 6= 0, because neither h nor g can vanish
on an open set. Moreover, by definition, yα + zα = γα for every α ∈ Nn . Therefore (y, z) is a strictly feasible solution of (7.11), which by a standard result in semidefinite optimization, yields absence of a duality gap, i.e., the desired result ρi = ρ∗i ; see Section C.2. Example 7.5. Consider the two-dimensional example with K := [−1, 1]2 and S := {x ∈ R2 : −(x21 + x22 )3 + 4x21 x22 ≥ 0 } which is the 2-dimensional folium displayed in figure 7.1. Notice that S is not convex and its interior is formed with several disjoint connected components. Figure 7.2 displays a polynomial approximation g ∈ R[x]20
July 24, 2009
15:8
World Scientific Book - 9in x 6in
7 Applications in Probability
174
Fig. 7.1
Fig. 7.2
The 2d-Folium surface S.
Degree-20 polynomial approximation of 1S .
of 1S , from above, obtained by solving the semidefinite relaxation (7.11) (and its dual (7.13)), a semidefinite program with 231 variables y, i.e., with moments variables (yα ) up to order 20. The obtained relative error (ρ20 − µ(S))/µ(S) is 1.2%. For visualization purposes, max(5/4, g) rather than g is displayed in Figure 7.2. Even if typical oscillations occur near the boundary regions, we can recognize the shape of Figure 7.1.
moments
7.3 Measures with Given Marginals
175
Integration against a weight function In fact, from the proof of Theorem 7.3, it follows that for any given finite subset Nnd = {α ∈ Nn : |α| ≤ d}, one also has: Z Z i α lim yα = x dφ = xα dµ, ∀α ∈ Nnd , (7.14) i→∞
S
S
where yi is any optimal (or nearly optimal) solution of (7.10). Hence let K be a box or a simplex of Rn that contains the basic semi-algebraic set S, and suppose that one wishes to approximate the integral Z J := f (x) W (x) dx S
where W : Rn →R is a nonnegative measurable function, and f ∈ R[x] has degree at most d. One may then proceed as follows: (1) Fix some integer i with d i, and compute or approximate the collection of moments γ = (γα ), α ∈ Nn2i , of the measure dµ = W dx on K, e.g. via cubature formulas. As K is a box or a simplex, powerful techniques are available for such a task; see e.g. Gautschi (1981, 1997). (2) Compute ρi in (7.10) and obtain an optimal (or nearly optimal solution) yi . X i Then from (7.14), one obtains J ≈ fα yα provided that i is sufficiently α∈Nn d
large.
7.3
Measures with Given Marginals
Let K := K1 × K2 with K1 , K2 being compact Borel subsets of Rn1 and Rn2 respectively, and let πi : M (K)+ →M (Ki )+ , i = 1, 2, be the projection mappings. That is, for every measure µ in M (K)+ , (π1 µ)(B) := µ(B × K2 )
and
(π2 µ)(B) := µ(K1 × B)
for every Borel subset B of K1 and K2 , respectively. With f : K→R being a measurable function, and ν1 ∈ M (K1 )+ , ν2 ∈ M (K2 )+ being two given probability measures on K1 and K2 , respectively, consider the linear program:
7 Applications in Probability
176
MT : ρmom =
sup µ∈M (K)+
Z
f dµ
(7.15)
K
s.t. πi µ = νi for i = 1, 2. R That is, one searches for the supremum of K f dµ over all probability measures µ on K whose marginal on K1 (resp. on K2 ) matches ν1 (resp. ν2 ). Of course one may generalize to more than two sets and consider a finite family (Ki ), i = 1, . . . , p. MT (i.e. when p = 2) is also the celebrated transportation (or masstransfer) problem, sometimes also called the Monge-Kantorovich problem. To see that (7.15) is a particular instance of the generalized moment problem (1.1), it suffices to notice that if Ki is compact for i = 1, 2, every finite measure on Ki is moment determinate, i.e. is completely determined by its moments; see Definition 3.2. One may thus replace the constraint πi µ = νi with the countably many linear equality constraints Z
α
x dµ = K
Z
xα dνi , Ki
∀ α ∈ N ni ;
i = 1, 2.
Therefore (1.1) with Γ := Nn1 ∪ Nn2 , hα = xα and R α(7.15) is also the GMP ni γα = x dνi whenever α ∈ N , i = 1, 2. Proposition 7.6. Let f be bounded and upper-semicontinuous. (7.15) has an optimal solution and there is no duality gap.
Then
Proof. Observe that the product probability measure µ := ν1 ⊗ ν2 on K is a feasible solution of (7.15). Also the assumptions of Theorem 1.3 are satisfied (h0 = 1) and so Corollary 1.4 applies. Next, let f ∈ R[x] and let Kt := {x ∈ Rnt : gtj (x) ≥ 0, with vtj
j = 1, . . . , mt },
:= d(deg gtj )/2e, j = 1, . . . , mt , t = 1, 2.
t = 1, 2, For 2i ≥
7.5 Exercises
177
max[deg f, maxt,j deg gtj ], the semidefinite programs ρi = sup Ly (f ) y
s.t. Mi (y) 0,
(7.16)
Mi−vtj (gtj y) 0, j = 1, . . . , mt ; t = 1, 2 yα = γ α α ∈ Γ; |α| ≤ 2i,
are the semidefinite relaxations (4.22) adapted to the generalized moment problem (7.15). And so the following convergence result is a direct consequence of Theorem 4.3.
Proposition 7.7. Let f ∈ R[x] and let ρmom (resp. ρi ) be as in (7.15) (resp. as in (7.16)). Then ρi ↓ ρmom as i→∞.
7.4
Summary
In this chapter, we have defined a sequence of semidefinite relaxations that provide increasingly stronger bounds on µ(S) whenever S is a basic closed semi-algebraic set, and µ satisfies some moment constraints. When µ is restricted to be supported on a compact basic semi-algebraic set K ⊃ S then the associated sequence of upper bounds converge to a tight upper bound on µ(S). As a consequence, we can approximate the volume of a basic semi-algebraic set to any desired accuracy by solving an appropriate hierarchy of semidefinite relaxations. R Similarly, one may also approximate as closely as desired the integral S W dx for any nonnegative measurable weight function W provided one is able to approximate the moments of the measure dµ = W dx on a box (or simplex) K ⊃ S, e.g. by cubature formulas. Finally, we has also proposed a hierarchy of semidefinite relaxations to solve (or at least approximate) the mass-transfer problem. 7.5
Exercises
Exercise 7.1. With S as in (7.1) what happens if in (7.2) one replaces ”sup” with ”inf”. If we now assume that S ⊂ K is open (with K still compact), what can be said? (Hint: K \ S is closed hence compact and inf µ µ(S) = 1−supµ µ(K\S). What about the semidefinite relaxations? Can we always describe the set K \ S easily? What is the conclusion?
7 Applications in Probability
178
Exercise 7.2. Let K = { x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m } and S = { x ∈ Rn : pk (x) ≥ 0, k = 1, . . . , t } ⊆ K, with gj , pk ∈ R[x]. Suppose that both sets K and S satisfy Assumption 2.1. Let γα be a finite collection of moments. We are interested in solving µ(S) Z s.t. xα dµ = γα , α ∈ Γ.
sup
µ∈M (K)+
(7.17)
K
(a) By modifying Problem (7.4), propose a sequence of relaxations that provide increasingly better upper bounds on Problem (7.17). Does convergence ρi ↓ ρmom hold? (b) Let n = 2, K = {x : 0 ≤ x1 , x2 ≤ 1}, S = {x : x21 + x22 ≤ 1} and (1, γ10 , γ01 , γ20 , γ11 , γ02 ) = (1, 20, 20, 500, 390, 500). Solve Problem (7.17). Exercise 7.3. Prove Theorem 7.2. 7.6
Notes and Sources
7.1. Part of this chapter is inspired from Lasserre (2002a). A detailed discussion on the complexity and results on finding bounds under moment constraints can be found in Bertsimas and Popescu (2005) and in particular, the following historical notes. Research on bound problems (also called generalized Chebyshev inequalities) dates back to the work of Gauss, de la Vall´ee Poussin, Cauchy, Chebyshev, Markov, Stieltjes. The univariate case was first proposed and formulated in Chebyshev (1874) and later resolved by his student Markov (1884) in his PhD thesis, using continued fractions techniques. For surveys on Chebyshev systems see e.g. Shohat and Tamarkin (1943) Godwin (1955, 1964), and Karlin and Studden (1966). Citing (Shohat and Tamarkin, 1943, p. 10), “the problem of moments lay dormant for more than 20 years.” It surfaced with the book of Tong (1980) and later in Landau (1987b) with the volume Moments in Mathematics which includes the survey Landau (1987a), as well as several relevant papers among which Kemperman (1987) and Diaconis (1987). More recent contributions are from e.g. Lasserre (2002a) and Bertsimas and Popescu (2005).
7.6 Notes and Sources
179
The use of optimization and duality to address moment inequalities in probability first appeared in 1960, independently and simultaneously in Isii (1960) and in Karlin and Studden (1966). Isii (1963) extended these duality results for random vectors on complete regular spaces, whereas in Smith (1995) one finds new interesting applications in decision analysis, dynamic programming, statistics and finance. On the other hand, Shapiro (2001) provided a rigorous and detailed discussion on conditions for strong duality to hold. A detailed analysis of strong duality and sensitivity analysis for semi-infinite programming problems can be found in Bonnans and Shapiro (2000). 7.2. Computing or even approximating the volume of a convex body is hard theoretically and in pracice as well. Even for a convex polytope Ω ⊂ Rn , exact computation of its volume vol(Ω) or integrating over Ω is difficult. Its computational complexity is discussed in e.g. Bollob´ as (1997) and Dyer and Frieze (1988). Any deterministic algorithm with polynomial time complexity that would compute upper and lower bounds v and v on vol(Ω), cannot yield an upper bound g(n) on v/v better than polynomial in the dimension n. Methods for exact volume computation use either triangulations or simplicial decompositions depending on whether Ω has a half-space description or a vertex description. See e.g. Cohen and Hickey (1979), Lasserre (1983), Lawrence (1991) and see B¨ ueler et al. (2000) for a comparison. Another set of methods which use generating functions are described in e.g. Barvinok (1993) and Lasserre and Zeron (2001). Concerning integration on simple sets (e.g. simplex, box) via cubature formulas, the interested reader is referred to e.g. Gautschi (1981, 1997). In contrast with these negative results, and if one accepts randomized algorithms that fail with small probability, then the situation is much better. Indeed, the celebrated probabilistic approximation algorithm of Dyer et al. (1991) computes the volume to fixed arbitrary relative precision , in time polynomial in −1 . The latter algorithm uses approximations schemes based on rapidly mixing Markov chains and isoperimetric inequalities. See also hit-and-run algorithms for sampling points according to a given distribution, described in e.g. Belisle et al. (1993). Theorem 7.3 which permits to approximate as closely as desired the volume of a compact basic semialgebraic set, is from Henrion et al. (2009b). In this latter reference, numerical issues associated with solving the semidefinite programs are discussed, as well as the use of other bases for R[x]i than the usual canonical basis of monomials (xα ). In particular, in some examples, much better results (in
180
7 Applications in Probability
terms of numerical precision and stability) are obtained with the basis of Chebyshev polynomials. 7.3. The mass transfer problem is a very hold engineering problem stated by the french geometer Monge in the 18th century for military applications, and is a special case of moment problems involving measures with given marginals. For instance, with K1 = K2 and a specific distance function f , its optimal value also measures the distance between two probability measures ν1 and ν2 on K1 , and so induces a metric on the space of probability measures on K1 . For the interested reader, a nice discussion on moment problems involving measures with given marginals can be found in Kemperman (1987) and the many references therein.
Chapter 8
Markov Chains Applications
This chapter is about Markov chains and invariant probabilities. We first address the problem of computing an upper bound on µ(S) over all invariant probability measures µ of a given Markov chain on Rn . We then consider the problem of approximating the value of an ergodic criterion, as an alternative to simulation which only provides a random estimate.
Consider a discrete-time Markov chain (MC) Φ = (Φ0 , Φ1 , . . .) on a measurable space (X, B) with transition probability function (or stochastic kernel) P : X × B → R, that is: • For every x ∈ X, P (x, ·) is a probability measure on X. • For every B ∈ B, x 7→ P (x, B) is a measurable function. When X is finite then P is just a stochastic matrix, the matrix of transition probabilities with all rows summing up to one. One may consider P from two dual viewpoints. As a linear operator acting on B(X) (the Banach space of bounded measurable functions on X, equipped with the sup-norm) by: Z f 7−→ P f (x) := P (x, dy)f (y), ∀ x ∈ X, X
and as a linear operator acting on M (X) (the Banach space of finite signed measures on X) by: Z µ 7−→ µ P (B) := P (x, B) µ(dx), ∀ B ∈ B. X
With X ⊆ Rn , let C(X) denote the Banach space of bounded continuous functions on X, equipped with the sup-norm. 181
8 Markov Chains Applications
182
A Markov chain Φ on X ⊆ Rn , with associated stochastic kernel P , is said to be weak-Feller if P maps C(X) into C(X). A probability measure µ ∈ M (X)+ is called invariant if it satisfies µP = µ ;
µ(X) = 1.
(8.1)
The spaces B(X) and M (X) form a dual pair1 of vector spaces with duality bracket Z hf, µi = f dµ, ∀ f ∈ B(X), µ ∈ M (X). (8.2) X
Therefore (8.1) is equivalent to: µ(X) = 1 ;
hf, µi = hf, µP i (= hP f, µi ),
∀f ∈ B(X).
When P is the operator acting on M (X), one sometimes write P ∗ µ instead of µP , as it is the adjoint operator of P : B(X)→B(X). Similarly, the spaces C(X) and M (X) also form a dual pair with same duality bracket as in (8.2). So if P is weak-Feller, (8.1) is equivalent to µ(X) = 1 ;
hf, µi = hP f, µi,
∀f ∈ C(X).
(8.3)
If X := K ⊂ Rn is compact, then C(K) is separable and the space of polynomials (restricted to K) is a dense subset of C(K). Therefore, the canonical basis of monomials (xα )α∈Nn is a countable dense subset of C(K), and (8.3) is equivalent to hP xα , µi = hxα , µi ;
∀ α ∈ Nn ;
µ(K) = 1.
(8.4)
The following is a basic result for Markov chains on a compact space. Proposition 8.1. Let X ⊂ Rn be compact and let P be weak-Feller. Then P has an invariant probability measure. As seen in the following example, the weak-Feller assumption is crucial and cannot be removed. Example 8.1. Let X ⊂ R be the compact space [0, 1], and let Φ be the Markov chain with transitions defined by: Φt /2 if Φt 6= 0 Φt+1 := 1 if Φt = 0. 1 See
Section C.3.
8.1 Bounds on Invariant Measures
183
The weak-Feller property fails because P f is not continuous a x = 0 whenever f is continuous. And P has no invariant probability.
8.1 8.1.1
Bounds on Invariant Measures The compact case
Let X := K ⊆ Rn and S ⊂ K be given compacts sets, and consider the problem of computing an upper bound on µ(S) over all invariant probability measures µ of P . That is, evaluate or approximate j ∗ :=
sup µ∈M (K)+
{ µ(S) : µP = µ ;
µ(K) = 1}.
(8.5)
Let Γ := Nn and consider the problem: ρmom :=
sup µ∈M (K)+
h1S , µi
s.t. hP xα − xα , µi = 0, ∀ α ∈ Γ µ(K) = 1,
(8.6)
where 1S is the indicator function of S. In (8.6) one recognizes an instance of the generalized moment problem (1.1) with countably many moment constraints.
Lemma 8.2. Assume that K ⊂ Rn and S ⊂ K are compact. If the transition probability function P is weak-Feller then in (8.6) the sup is attained and there is no duality gap. Moreover, j ∗ = ρmom .
Proof. By Proposition 8.1, as K is compact and P is weak-Feller, P admits an invariant probability measure µ; hence (8.6) has a feasible solution. As S is compact, its indicator function 1S is upper-semicontinuous. By Corollary 1.4, the sup is attained and there is no duality gap. Finally, that j ∗ = ρmom follows from the equivalence of (8.4), (8.3), and (8.1) when P is weak-Feller. Observe that if in addition to be weak-Feller, P maps R[x] into R[x], then the constraints of (8.6) state linear constraints on the moments of µ. However, Problem (8.6) is still untractable numerically because K and
8 Markov Chains Applications
184
S are still too general and in addition, the indicator function 1S is not a polynomial. However, observe that (8.6) is equivalent to ρmom := sup h1, µi µ,ν
s.t. hP xα − xα , µ + νi = 0, ∀ α ∈ Γ h1, µ + νi = 1 µ ∈ M (S)+ , ν ∈ M (K)+ .
(8.7)
Indeed, as we have already seen before, if (µ, ν) is a feasible solution of (8.6) with ν(S > 0) then one may obtain another feasible solution (µ0 , ν 0 ) with µ0 (S) > µ(S). It suffices to take ν 0 (B) := ν(B ∩ (K \ S)) and µ0 (B) := µ(B) + ν(B ∩ K) for every B ∈ B. Now, if K and S are compact basic semi-algebraic sets and P maps R[x] into R[x], then (8.7) is an instance of the generalized moment problem with polynomial data and countably many moment constraints. And so, one may invoke results of Chapter 4 and build up convergent semidefinite relaxations for (8.7). So let K ⊂ Rn be as in (4.1) for some polynomials (gj )m j=1 ⊂ R[x], and let S := { x ∈ Rn : hk (x) ≥ 0,
k = 1, . . . , p },
(8.8)
with hk ∈ R[x] for all k = 1, . . . , p. Let vj := ddeg gj /2e and wk := ddeg hk /2e for all j = 1, . . . , m and k = 1, . . . , p. Assume that P maps R[x] into R[x] and consider the following semidefinite program: ρi := sup y0 y,z
s.t. Ly+z (P xα − xα ) = 0, ∀ α; deg P xα ≤ 2i y0 + z 0 = 1 Mi (y), Mi (z) 0 Mi−vj (z gj ) 0, j = 1, . . . , m Mi−wk (y hk ) 0, k = 1, . . . , p.
(8.9)
Theorem 8.3. Let K be as in (4.1), S ⊂ K as in (8.8), and let Assumption 2.1 hold for both K and S. Assume that P is weak-Feller and maps R[x] into R[x]. Let ρmom , ρi be the respective optimal values of (8.6) and (8.9). Then ρi ↓ ρmom = j ∗ as i→∞.
8.1 Bounds on Invariant Measures
185
Sketch of the proof. Let (yi , zi ) be a feasible solution of (8.9) such that y0i ≥ ρi − 1/i. First, one shows that there exist y, z and a subsequence (ij ) such that pointwise convergence holds, i.e., as j→∞, i
yαj → yα ;
i
zαj → zα ,
∀ α ∈ Nn .
Next, this pointwise convergence implies that for every i ∈ N, Mi (y) 0,
Mi (z) 0,
Mi (gj y) 0,
Mi (hk z) 0,
for all j = 1, . . . , m and all k = 1, . . . , p. Invoking Theorem 3.8, y and z are the respective moment sequences of two measures, µ on S and ν on K. For fixed α ∈ Nn , pointwise convergence implies Ly+z (P xα − xα ) = 0, and so i (µ, ν) is feasible for (8.7). But we also have ρmom ≤ y0j →y0 = µ(S), which imples that (µ, ν) is an optimal solution of (8.7), and so µ(S) = ρmom . Finally, as the converging subsequence (ij ) was arbitrary and the sequence (ρi ) is monotone, we obtain the desired result y0i ↓ ρmom . 8.1.2
The non compact case
We now consider the non compact case where X = Rn . We still assume that P is weak-Feller and maps R[x] into R[x]. Let δ, M > 0 be fixed. We are now less ambitious and consider the problem ρmom :=
sup δ µ∈M (X)M +
{ µ(S) : µP = µ ;
µ(X) = 1}
δ where M (X)M + ⊂ M (X)+ is the set defined by: Z δ M (X)M := { µ ∈ M (X) : (eδ xi + e−δ xi ) dµ < M, + + X
(8.10)
∀i = 1, . . . , n}.
(8.11) That is, we only consider invariant measures µ whose all maginals µi have a tail with exponential decay. Let S ⊂ Rn be as in (8.8) and consider the following semidefinite program: ρi := sup y0 y,z
s.t. Ly+z (P xα − xα ) = 0, ∀ α; deg P xα ≤ 2i y0 + z 0 = 1 Mi (y), Mi (z) 0 Mi−wk (y hk ) 0, k = 1, . . . , p, Pi 2t k = 1, . . . , n. t=1 Lz (xk )/(2t!) ≤ M,
(8.12)
8 Markov Chains Applications
186
Theorem 8.4. Let S ⊂ Rn as in (8.8), and let Assumption 2.1 hold for S. Assume that P maps R[x] into R[x] and P has an invariant measure δ in M (X)M + . Let ρmom and ρi be the respective optimal values of (8.10) and (8.12). Then ρi ↓ ρmom as i→∞. Sketch of the proof. The proof mimics that of Theorem 8.3 except that to i prove the pointwise convergence zαj →zα , one now invokes a property of the P ij moment matrix Mij (z) coupled with the constraint t=1 Lz (x2t k )/(2t!) ≤ ij M , k = 1, . . . , n, which implies that supj |zα | ≤ τs whenever |α| ≤ s, for some scalars τs , s = 1, . . . P∞ 2t Next, the limit sequence z satisfies t=1 Lz (xk )/(2t!) ≤ M , k = 1, . . . , n, from which we deduce that the generalized Carleman condition (3.12) holds. Therefore, by Proposition 3.5, z is the moment sequence of a determinate measure ν on Rn , and y is the moment sequence of a measure µ on S (hence also determinate). For fixed α ∈ Nn , pointwise convergence implies Ly+z (P xα − xα ) = 0; this and the fact that both µ and ν are determinate imply that (µ + ν)P = µ + ν, and so µ + ν is feasible for (8.10). The rest of the proof is similar. Example 8.2. On the real line R, and with r ≤ 4, consider the deterministic dynamical system: Φt+1 = r Φt (1 − Φt ),
t = 0, 1, . . .
which is called the logistic map. Depending on r, the above system has interesting properties. In particular, when r > r0 (for some r0 ) it provides an example of a very simple dynamical system with chaotic behavior. With r ≤ 4 observe that if Φ0 ∈ [0, 1] then all iterates Φt also remain in [0, 1]. Furthermore, the transition probability function P of the associated Markov chain is obsviously weak-Feller and the moments y = (yj ) of an invariant probability measure µ of P on [0, 1] satisfy the countably many linear constraints j
j
Ly (x ) = yj = Ly ((rx(1 − x)) ) = r
j
j X
k=0
j (−1) yj+k , k k
j = 0, 1, . . .
So let x 7→ g(x) = x(1−x) and x 7→ h(x) = (x−a)(b−x), with 0 ≤ a ≤ b ≤ 1. With K := [0, 1] and S the closed interval [a, b] ⊂ [0, 1], the semidefinite
8.2 Evaluation of Ergodic Criteria
187
relaxations (8.9), which read ρi := sup y0
y,z P s.t. yj + zj − rj jk=0 (−1)k kj (yj+k + zj+k ) = 0, j = 0, . . . i y0 + z 0 = 1 Mi (y), Mi (z) 0 Mi−1 (y g) 0, j = 1, . . . , m Mi−1 (z h) 0, k = 1, . . . , p,
provide a monotone sequence of upper bounds that converges to maxµ µ(S) where the max is over all invariant probability measures of P . Example 8.3. Iterated Functions Systems (IFS). For every k = 1, . . . , N , let fk ∈ R[x] and consider the Markov chain on Rn defined by Φt+1 = fξt (Φt ),
t = 0, 1, . . .
where (ξt ) are i.i.d. random variables2 with values in ∆ = {1, . . . , N } and associated probabilities p = (pj ) ∈ RN . The transition probability function P of the Markov chain Φ is weak-Feller, because if h : Rn →R is bounded and continuous then so is the function X x 7−→ P h(x) = pj h(fj (x)), x ∈ Rn . j∈∆
If for some compact basic semi-algebraic set K ⊂ Rn , fk maps K into K for every k, then the semidefinite relaxations (8.9) converge to ρmom . Otherwise one may implement the semidefinite relaxations (8.12).
8.2
Evaluation of Ergodic Criteria
Evaluation of ergodic criteria is an important particular (and simpler) case of the generalized moment problem considered in the previous section. Given a Markov chain Φ = (Φ0 , Φ1 , . . .) on X ⊂ Rn , with associated stochastic kernel P , one is often interested in its expected long-run average behavior. More precisely, with f : Rn →R being a given measurable function, one is interested in computing or approximating the long-run expected average cost
2 i.i.d.
stands for independent and identically distributed.
8 Markov Chains Applications
188
J(x) := lim sup E x N →∞
"
N −1 1 X f (Φt ) N t=0
#
(8.13)
where Ex is the expectation operator3 associated with Px . A Borel set B ∈ B is called invariant if P 1B ≥ 1B , and a probability measure µ is called ergodic if µ(B) ∈ {0, 1} for every invariant set B ∈ B. Let µ be an invariant probability measure of the transition probability function P associated with Φ, and let f ∈ L1 (µ). If µ is ergodic or µ is the unique invariant probability measure of P , then as a consequence of Birkhoff’s Individal Ergodic Theorem, for µ-almost all x: # " N −1 Z 1 X f (Φt ) = f dµ. (8.14) J(x) := lim sup E x N t=0 N →∞ X " N −1 # Z 1 X 0 J (x) := lim sup f (Φt ) = f dµ, Px − a.s. N t=0 N →∞ X See e.g. Theorem 2.3.4 and Proposition 2.4.2 in Hern´ andez-Lerma and Lasserre (2003). Therefore, with Γ = Nn and K ⊂ Rn , consider the following infinite-dimensional linear programs: ρ1mom (ρ2mom ) := sup (inf ) hf, µi µ
µ
s.t. hP xα − xα , µi = 0, h1, µi = 1 µ ∈ M (K)+ ,
∀α ∈ Γ
(8.15)
each of which is an instance of the generalized moment problem (1.1) with countably many moment constraints. Solving (8.15) provides an ergodic probability measure µ1 (resp. µ2 ) that maximizes (resp. minimizes) (8.14) over all invariant probability measures with support contained in K. If P admits a unique invariant probability measure on K then ρ1mom = ρ2mom . If K as in (4.1) is compact, Assumption 2.1 holds, and P maps R[x] into R[x], then the semidefinite relaxations
3 Let
(Ω, F ) be the measurable space consisting of the canonical sample path Ω = := X × X × · · · and the corresponding product σ-algebra F . Then every initial probability distribution φ determines a probability measure on Ω, denoted Pφ , and the notation Px corresponds to the initial distribution φ := δx with all mass concentrated on x ∈ X. X∞
8.3 Summary
189
ρ1i (ρ2i ) := sup (inf ) Ly (f ) y
y
s.t. Ly (P xα − xα ) = 0, ∀ α; deg P xα ≤ 2i (8.16) Mi (y), Mi−vj (y gj ) 0, j = 1, . . . , m y0 = 1, provide upper bounds ρ1i ≥ ρ1mom (resp. lower bounds ρ2i ≤ ρ2mom ) and in addition, for j = 1, 2, ρji →ρjmom as i→∞. So if P admits a unique invariant probability measure on K, the exact value J ∗ = ρ1mom = ρ2mom satisfies ρ2i ≤ J ∗ ≤ ρ1i ,
∀i
and ρ2i ↑ J ∗ ,
ρ1i ↓ J ∗
as i→∞.
(8.17)
In the non compact case X = Rn , and like in Section 8.1, one may consider the subspace of invariant probability measures µ of P , that have a tail with exponential decay, that is, such that µ ∈ M M δ (X)+ with M M δ (X)+ as in (8.11). Then one consider the moment problem (8.15) with µ ∈ M M δ (X)+ instead of µ ∈ M (K)+ . If P admits an invariant probability measure µ ∈ M M δ (X)+ then the semidefinite relaxations ρ1i (ρ2i ) := sup (inf ) Ly (f ) y
y
s.t. Ly (P xα − xα ) = 0, ∀ α; Mi (y) 0, j = 1, . . . , m y0 = 1 P i 2t t=1 Ly (xk )/(2t!) ≤ M,
deg P xα ≤ 2i k = 1, . . . , n,
provide upper bounds ρ1i ≥ ρ1mom (resp. lower bounds ρ2i ≤ ρ2mom ) and in addition, ρji →ρjmom as i→∞, for all j = 1, 2. So, again, if P admits a unique invariant probability measure µ ∈ M M δ (X)+ then (8.17) also holds for every i. 8.3
Summary
In this chapter we have considered the moment approach to provide tight bounds on j ∗ = supµ µ(S) where the ”sup” is over all invariant measures of a given Markov chain Φ whose transition probability function P maps polynomials into polynomials. The upper bounds provided by appropriate semidefinite relaxations converge to the exact value j ∗ . This
8 Markov Chains Applications
190
approach also permits to approximate the exact value of the ergodic funcPN −1 tional J(x) = lim supN →∞ Ex [ (1/N ) k=0 f (Φk )] for some given polynomial f ∈ R[x]. This deterministic approach which provides an arbitrarily close approximation of j ∗ (and of J(x) = J ∗ when the invariant probability measure is unique) is to be contrasted with simulation techniques that provide random estimates of j ∗ or J ∗ . In particular, if µ is the unique invariant probability measure of P , then (8.17) holds and so one provides smaller and smaller intervals in which J ∗ lies, something that cannot be done with random estimators.
8.4
Exercises
Exercise 8.1. In (8.5) replace ”sup” with ”inf”. Do we have an analogue of Lemma 8.2? What happens? If we now assume that S ⊂ K is open (with K still compact), what can be said? (Hint: K \ S is closed hence compact and inf µ µ(S) = 1 − supµ µ(K \ S). What about the semidefinite relaxations? Can we always describe the set K \ S easily? What is the conclusion?) Exercise 8.2. Consider again the IFS like in Example 8.3 but now suppose that the (ξt ) are not i.i.d. any longer but ξ = (ξ0 , . . .) is itself a Markov chain on ∆ with associated transition probability function Q = (Qij ), i.e., Prob [ξt+1 = j | ξt = i] = Qij , for every i, j ∈ ∆, and every t = 0, 1, . . .. Is (Φt , ξt ) a Markov chain on Rn × ∆? Describe its transition probability function P and write explicitly P h(Φ, k) for an arbitrary polynomial h ∈ R[x]. Is P weak-Feller? Write the semidefinite relaxations (8.9) for approximating supµ µ(S) over all invariant probability measures µ of P . Exercise 8.3. Suppose that for some i, the semidefinite relaxation (8.12) has no feasible solution. What can be concluded? Exercise 8.4. Consider Example 8.2 with r := 4, and let f ∈ R[x] be given. Write the corresponding semidefinite relaxations (8.15). What can be said? is there a unique invariant probability measure for P (Hint: try to compute fixed points (i.e. cycles of length 1) of the mapping x 7→ 4x(1 − x), then cycles of length 2). What can be concluded?
8.5 Notes and Sources
8.5
191
Notes and Sources
Proposition 8.1 is from Krylov and Bogolioubov (1937). For a numerical approximation of j ∗ and J ∗ , the most popular methods are Monte Carlo schemes that simulate the Markov chain Φ and provide random estimates of j ∗ and J ∗ . For basics on the the Monte Carlo approach, the interested reader is referred to Niederreiter (1992). On the other hand there exist general numerical schemes to approximate j ∗ directly by (a) discretization of the space X into a grid and (b) looking for a measure that is a convex combinations of Dirac measures on that grid, and such that |hf, µP −µi| ≤ for all f ∈ F , where > 0 is fixed and F is a finite set of test functions. This results in a linear program that computes the optimal weights of the convex combination of Dirac measures. One then decreases and enlarge F to provide a sequence of bounds that converges to j ∗ . See e.g. Hern´ andezLerma and Lasserre (1998b,a).
Chapter 9
Application in Mathematical Finance
This chapter considers an application in mathematical finance, namely the pricing of some exotic options. First under a no-arbitrage assumption and knowledge of some moments of the distribution of the underlying asset price. Then when one assumes that the asset price obeys some Ito’s stochastic differential equation.
An important problem in mathematical finance is to evaluate the price of a derivative security given information of the underlying asset. For instance, an European Call Option with strike K gives its holder the option (but no obligation) to buy the underlying security at time T at price K. If the price xT at time T is above K, then the holder will exercise the option to make a profit of xT − K, whereas if xT ≤ K he will not exercise the option. Therefore, the payoff max(xT − K, 0) of this option being nonnegative, it has some value. Under no arbitrage,1 this value is given by E[(x − K)+ ] where E is the expectation operator with respect to the distribution of the price of the underlying asset x, and the notation x+ stands for max[0, x]. In this chapter we investigate this issue in the case where one has some moment information of the distribution and in the case where the price of the underlying asset obeys a stochastic differential equation. 9.1
Option Pricing with Moment Information
Finding an optimal upper bound on the price of an European Call Option with strike K, given the first p + 1 moments {γj }pj=0 of the price of the underlying asset, reduces to solving:
1 For
the concept of arbitrage, see the discussion in Section 1.1.
193
9 Application to Mathematical Finance
194
ρmom :=
sup µ∈M (R+ )+
(Z
+
R+
(x − K) dµ :
Z
j
x dµ = γj , R+
j∈Γ
)
(9.1)
with Γ := {0, . . . , p}. Obviously, (9.1) is an instance of the generalized moment problem (1.1) with K = R+ , x 7→ hj (x) = xj for all j ∈ Γ, and x 7→ f (x) = (x − K)+ . Problem (9.1) is similar to problem (7.2) in the univariate case. Observe that f is not a polynomial, and as in Chapter 7 we replace µ with the sum of two measures φ and ν with support on S := [K, +∞) and K = R+ respectively. That is, we again want to solve a variant of the generalized moment problem with two unknown measures φ and ν; namely : Z ρmom = sup (x − K) dφ φ,ν ZS Z (9.2) s.t. xj dφ + xj dν = γj j ∈ Γ R+
S
φ ∈ M (S)+ , ν ∈ M (R+ )+ .
Again we should have imposed that ν(S) = 0. However, since we maximize, if a feasible solution (φ, ν) of (9.2) is such that ν(S) > 0 then one easily constructs a feasible solution (φ0 , ν 0 ) with ν 0 (S) = 0 and with value at least as good. Write ν as the sum ν1 + ν2 of two mutually singular measures ν1 , ν2 , with ν1 (B) = ν(B ∩ S) and ν2 (B) = ν(B ∩ (R+ \ S)) for all Borel 0 0 sets B of R+ . Then R for (9.2) with value R (φ , ν ) := (φ R+ ν1 , ν2 ) is feasible R 0 (x − K)dν ≥ (x − K)dφ + (x − K)dφ = 1 S (x − K)dφ because S S S (x − K) ≥ 0 on S. The dual of (9.1) reads X infp+1 λj γ j λ∈R
s.t.
j∈Γ X
j∈Γ X j∈Γ
λj xj − (x − K) ≥ 0 on S . λ j xj ≥ 0
(9.3)
on R+
Then (9.3) has a complete description as a semidefinite program because one has an appropriate description of polynomials nonnegative on an interval,
9.1 Option Pricing with Moment Information
195
in terms of a weighted sum of squares. Indeed by Theorem 2.7, a nonnegative univariate polynomial f ∈ R[x] nonnegative on [a, +∞) can be written as q0 + (x − a)q1 for two s.o.s. polynomials q0 , q1 ∈ Σ[x] such that deg q0 , deg (x − a)q1 ≤ deg f . Therefore solving (9.3) reduces to solving the single semidefinite program: X ρ∗ = inf λj γ j λ∈Rp+1
s.t.
j∈Γ X
j∈Γ X
λj xj − (x − K) = q0 + (x − K) q1 λ j xj = q 2 + x q 3
(9.4)
j∈Γ
qj ∈ Σ[x], j = 0, . . . , 3 deg (q0 , (x − K)q1 , q2 , xq3 ) ≤ p. Let p˜ := dp/2e. The semidefinite program (9.4) is the dual of ρ = sup Ly (x − K) (= y1 − Ky0 ) y,z
s.t. Mp˜(y), Mp˜(z) 0 Mp−1 ˜ ((x − K) y) 0 Mp−1 ˜ (x z) 0 yj + zj = γj , j ∈ Γ,
(9.5)
where Mp−1 ˜ ((x−K) y) (resp. Mp−1 ˜ ((x−K) y)) is the localizing matrix associated with y and the polynomial x 7→ x − K (resp. z and the polynomial x 7→ x). Of course one has ρmom ≤ ρ ≤ ρ∗ . Theorem 9.1. Asume that there exists a probability measure µ on R+ which has a density f > 0 with respect to the Lebesgue measure on R + , and such that Z ∞ Z j j x dµ = x f (x) dx = γj , j ∈ Γ. R+
0
Then: (a) ρmom = ρ = ρ∗ . (b) Let y be an optimal solution of (9.5). If rank Mp˜(y) = rank Mp−1 ˜ (y) rank Mp˜(z) = rank Mp−1 ˜ (z) there is a rank Mp˜(y)-atomic measure φ on S and a rank Mp˜(z)-atomic measure ν on R+ \ S such that µ := φ + ν is an optimal solution of (9.1).
9 Application to Mathematical Finance
196
Proof. (a) As there exists dµ = f dx which satisfies the moment constraints, it follows that the moment vector γ lies in interior of the moment space. Therefore, there is no duality gap between (9.1) and its dual (9.2), i.e. ρmom = ρ∗ . But this also implies the desired result ρmom = ρ = ρ∗ . (b) As in addition Mp−1 ˜ ((x − K) y) 0, then by Theorem 3.11, y is the (truncated) moment vector of a measure φ on S and for same reasons z is the (truncated) moment vector of a measure ν on R+ . Therefore γj = y j + z j =
Z
xj dφ + S
Z
xj dν = R+
Z
xj dµ R+
for every j ∈ Γ. Moreover ν has its support in R+ \ S otherwise if some atom a of ν (with weight say λa > 0) is in S then taking φ0 := φ + λa δa and ν 0 := ν − λa δa would yield a strictly better feasible solution for (9.5) (obtained from the moment vector (y0 , z0 ) of (φ0 , ν 0 )), in contradiction with the optimality of y. Hence ν(S) = 0 which in turn yields ρmom = Ly (x − K) =
Z
S
(x − K)dφ =
Z
R+
since ν(S) = 0.
9.2
(x − K)+ dµ
Option Pricing with a Dynamic Model
In the previous section, there was no model for the dynamics of the price of the underlying asset. The only available information was knowledge of some moments of its distribution at the time to exercise the option. In this section we assume that the price of the underlying asset obeys some Ito’s stochastic differential equation. In this context, the basic idea of the moment approach is to associate the asset price of a European style option with suitably defined expected occupation measures. One then exploits the martingale property of certain associated stochastic integrals to derive an infinite system of linear equations satisfied by the moments of the measures considered. Then one relates the value of an option with the solution of an infinite-dimensional linear programming problem, an instance of the generalized moment problem (1.1) with countably many moment constraints.
9.2 Option Pricing with a Dynamic Model
9.2.1
197
Notation and definitions
Fix a filtered probability space (Ω, F , Ft , P) satisfying the usual conditions and supporting a standard, one-dimensional (Ft )-Brownian motion W . We consider a number of derivatives, the underlying asset price process of which satisfies a stochastic differential equation (SDE) of the form dXt = b(Xt ) dt + σ(Xt ) dWt ,
X0 = x 0 ∈ I ,
(9.6)
Here, I is either (0, +∞) or R, and b, σ : I → I are given functions such that (9.6) has a unique strong solution with values in I , for all t ≥ 0, P-a.s.. In particular, we assume that the underlying asset price process X is given by one of the following three models: Model 1 b(x) := bx, σ(x) := σx and I = (0, +∞), for some constants b, σ ∈ R. This is the familiar geometric Brownian motion underlying the Black and Scholes model. Model 2 b(x) := γ(θ − x), σ(x) := σ and I = R, for some constants γ, θ, σ ∈ R. This mean-reverting diffusion is an Ornstein-Uhlenbeck process, which for instance, appears in Vasicek’s interest rate model. √ Model 3 b(x) := γ(θ − x), σ(x) := σ x and I = (0, +∞), for some constants γ, θ, σ ∈ R such that γθ > σ 2 /2. This diffusion models the short rate dynamics assumed in the Cox-IngersollRoss interest rate model. Note that the inequality γθ > σ 2 /2 is necessary and sufficient for the solution of (9.6) to be non-explosive, in particular, for the hitting time of 0 to be equal to +∞ with probability one. Payoff. Several types of derivative payoff structures may be considered. But here for illustration we consider barrier call options. The price of a typical down-and-out barrier call option written on the underlying process X, is given by: J(x0 ) := e−ρT E (XT − K)+ 1{T } (τ ) , (9.7) where T > 0 is the option’s maturity time, K is the option’s strike price, ρ is a constant discounting factor, x0 is the initial underlying asset price,
9 Application to Mathematical Finance
198
and τ is the (Ft )-stopping time defined by τ := inf {t ≥ 0 : Xt ≤ H} ∧ T,
(9.8)
and H < x0 is the knockout barrier. So, starting with initial state x0 > H, the process stops either when asset price hits the barrier H for the first time at time τ ≤ T or stops at τ = T if the asset price sample path was always above H. 9.2.2
The martingale approach
The extended infinitesimal generator A of the price-time process (Xt , t) associated with the diffusion (9.6) is defined by f 7→ (A f )(x, t) :=
1 ∂f ∂f σ(x)2 ∇2x f (x, t) + b(x) (x, t) + (x, t) 2 ∂x ∂t
for all f in the domain D(A ) (which contains the set Cc2 (R × R+ ) of all twice-continuously differentiable functions f : R × R+ 7→ R with compact support). Assumption 9.1. b, σ 2 ∈ R[x] so that A maps polynomials into polynomials. Moreover, sup
n X
t∈[0,T ] j=1
h i E |Xtj |k < ∞,
for all T > 0,
for all k ∈ N.
Assumption 9.1 implies in particular that every polynomial f ∈ R[x, t] belongs to D(A ), and the process M f defined by Mtf
:= f (Xt , t) − f (x0 , 0) −
Z
t
(A f ) (Xs , s) ds 0
is a square-integrable martingale. Moreover, fix any (Ft )-stopping time τ that is bounded by a constant T > 0, P-a.s.. With regard to Doob’s optional sampling theorem, under Assumption 9.1, if f ∈ R[x, t] then E [f (Xτ , τ )] − f (x0 , 0) − E
Z
τ
(A f )(Xs , s) ds = 0. 0
(9.9)
9.2 Option Pricing with a Dynamic Model
199
Expected occupation measures Consider: • the expected occupation measure µ(·) = µ(· ; x0 ) of the price-time process (Xt , t) up to time τ , supported on Q := [H, +∞) × [0, T ] and defined by:
µ(B × C) := E
"Z
#
1B (Xs ) ds , [0,τ ]∩C
B ∈ B(Rn ), C ∈ B([0, T ]) (9.10)
where B(R) (resp. B([0, T ])) is the Borel σ-algebra on Rn (resp. on [0, T ]), and • the exit location probability measure ν(·) = ν(· ; x0 ), supported on S := [0, T ] × {H} ∪ {T } × [H, +∞), and defined by: ν(B×C) := P ((Xτ , τ ) ∈ B × C) ,
B ∈ B(Rn ), C ∈ B([0, T ]), (9.11)
i.e., ν is the joint probability distribution of (Xτ , τ ).
The generalized moment problem From the definitions of µ and ν, one may rewrite the martingale property (9.9) as Z Z f dν − f (x0 , 0) − (A f ) dµ = 0, (9.12) S
Q
which is called the basic adjoint equation and which characterizes the measures µ and ν associated with the generator A . Wih no loss of generality and to simplify the exposition, we set the discount factor to zero so that the payoff (9.7) reads Z J(x0 ) := E (XT − K)+ 1{T } (τ ) = (x − K)+ dν, (9.13) S
and we consider the moment problems:
9 Application to Mathematical Finance
200
sup (inf ) µ,ν
µ,ν
s.t.
Z
ZS S
(x − K)+ dν Z f dν − (A f ) dµ = f (x0 , 0), Q
∀f ∈ R[x, t]
µ ∈ M (Q)+ , ν ∈ M (S)+ .
Equivalently, as ν is supported on [0, T ] × {H} ∪ {T } × [H, +∞), decompose ν into the sum of three mutally singular measures, namely ν = ν 1 + ν2 + ν3 with ν1 supported on S1 := [0, T ] × {H}, ν2 supported on S2 := {T } × (H, K) and ν3 supported on S3 := {T } × [K, +∞). Therefore J(x0 ) =
Z
+
S
(x − K) dν =
Z
S3
(x − K) dν3
and so we consider the two moment problems ρ1mom = sup µ,νi
s.t.
Z
(x − K) dν3 Z f dνi − (A f ) dµ = f (x0 , 0),
S3 3 Z X i=1
Si
Q
∀f ∈ R[x, t]
(9.14)
∀f ∈ R[x, t]
(9.15)
µ ∈ M (Q)+ , νi ∈ M (Si )+ , i = 1, 2, 3,
ρ2mom
= inf
µ,νi
s.t.
Z
(x − K) dν3 Z f dνi − (A f ) dµ = f (x0 , 0),
S3 3 Z X i=1
Si
Q
µ ∈ M (Q)+ , νi ∈ M (Si )+ , i = 1, 2, 3.
9.2.3
Semidefinite relaxations
Let y1 = (y1k ), y2 = (y2k ) and y3 = (y3k ) be the moment sequences of the measures ν1 , ν2 and ν3 respectively, whereas z = (zij ) denotes that of µ.
9.2 Option Pricing with a Dynamic Model
Observe that with f (x, t) := xk tj the integral j zk(j−1) + k b zkj + j zk(j−1) + kγ θ z(k−1)j − k γzkj + j zk(j−1) + k γ θ z(k−1)j − k γzkj +
Z
201
(A f ) dµ reads: Q
k(k−1)σ 2 z(k−2)j 2 k(k−1)σ 2 z(k−2)j 2 k(k−1)σ 2 z(k−1)j 2
in Model 1 in Model 2 in Model 3
(9.16)
Let t 7→ g1 (t) = t(T − t), x 7→ g2 (x) = (H − x)(K − x), x 7→ g3 (x) := (x − K), x 7→ g4 (x) := (x − H), and consider the semidefinite programs: ρ1i = sup y31 − Ky30 yi ,z
s.t. Mi (z), Mi (yk ) 0, k = 1, 2, 3 Mi−1 (g1 y1 ), Mi−1 (g2 y2 ), Mi−1 (g3 y3 ) 0 Mi−1 (g1 z), Mi−1 (g4 z) 0 H k y1j + T j y2k + T j y3k − (9.16) = 0j xk0 , k + j ≤ i
(9.17)
and ρ2i = inf y31 − Ky30 yi ,z
s.t. Mi (z), Mi (yk ) 0, k = 1, 2, 3 Mi−1 (g1 y1 ), Mi−1 (g2 y2 ), Mi−1 (g3 y3 ) 0 Mi−1 (g1 z), Mi−1 (g4 z) 0 H k y1j + T j y2k + T j y3k − (9.16) = 0j xk0 , k + j ≤ i. Theorem 9.2. Suppose that the following conditions hold: (i) The measure ν3 is moment-determinate. (ii) The infinite system Mi (z), Mi (yk ) 0, k = 1, 2, 3 Mi (g1 y1 ), Mi (g2 y2 ), Mi (g3 y3 ) 0 Mi (g1 z), Mi (g4 z) 0 k H y1j + T j y2k + T j y3k − (9.16) = 0j xk0 ,
(9.18)
(9.19)
∀ k + j ≤ i,
with i = 0, 1, . . ., uniquely determines the moment sequence y3 . Then as i→∞, ρ1i ↓ J(x0 )
and
ρ2i ↑ J(x0 ).
9 Application to Mathematical Finance
202
In view of the structure of the problem considered, Assumption (i) in the statement of Theorem 9.2 is satisfied if E (exp{c |XT |}) < +∞ for some c > 0, which is true for Models 2 and 3, but not for Model 1. Numerical experiments For barrier options, relative errors of the moment approach are displayed in Table 9.1 for several values of the volatility σ. The other parameters read: Model 1: x0 = 1, K = 1, H = 0.8, T = 2, b = 0 Models 2 and 3: x0 = 1, K = 1, H = 0.8, T = 2, γ = 1, θ = 0.95. With regard to Model 1, the SDP solver ran into numerical problems for i = 8 and the value σ = 0.25. The relative error in Table 9.1 gives the accuracy of the i = 8, 9 approximation. These results indicate that the bounds provided by the moment approach are very tight. However, our experimentation was limited in size because of the numerical problems encountered with the semidefinite solver SeDuMi. As a result, we could not go beyond i = 8 or i = 9 for barrier payoff structures. Table 9.1
Barrier options. Model 1
i=8
σ = 0.10
σ = 0.15
σ = 0.20
σ = 0.25
2.63%
3.91%
0.52%
1.07%
Model 2
i=9
σ = 0.10
σ = 0.15
σ = 0.20
σ = 0.25
1.97%
2.19%
1.36%
2.8%
Model 3
i=9
9.3
σ = 0.10
σ = 0.15
σ = 0.20
σ = 0.25
6.3%
2.85%
1.47%
0.83%
Summary
In Section 9.2 we have described the moment approach for approximating the prices of several exotic options of European type. For concreteness,
9.4 Notes and Sources
203
we have focused on fixed-strike, down-and-out barrier call option payoff structures. However, it can easily be modified to account for other payoff structures including double-barrier knockout and Parisian call options, or their put counterparts. An distinguishing feature of the approach is its ability to handle various dynamic models of the asset price, other than the geometric Brownian motion model (GBM). For instance, it can handle underlying price dynamics modelled by processes such as an OrnsteinUhlenbeck process as in Vasicek’s model, a standard square-root process with mean-reversion as in the Cox-Ingersoll-Ross (CIR) model, and others (provided that the parameters of the infinitesimal generator are polynomials) which are of particular interest in the fixed-income and the commodity markets.
9.4
Notes and Sources
9.1 The material in this section is from Lasserre (2008b). Bounding the price of an option without assuming any model for the price dynamics and only under the no-arbitrage assumption has a long history in the economic literature. For a detailed account as well as complexity results and exact formulas in specific cases, the interested reader is referred to Bertsimas and Popescu (2002). 9.2 The material of this section is from Lasserre et al. (2006). The study of exotic options has been a major research area in mathematical finance, and the literature is abundant in results such as exact formulas or numerical approximation techniques when the underlying asset price dynamics are modelled by a geometric Brownian motion (GBM). If we assume that the underlying asset follows a GBM, then there exists a closed form analytic expression for the price of a down-and-out barrier call option (e.g., in (Musiela and Rutkowski, 1997, Section 9.6)). On the other hand, this is not the case for arithmetic-average Asian options. Approaches to the approximate pricing of Asian options when the underlying is modelled by a GBM include quasi-analytic techniques based on Edgeworth and Taylor expansions, and the like (e.g., Turnbull and Wakeman (1991)), methods derived by means of probabilistic techniques as in Curran (1992), Rogers and Shi (1995)), the numerical solution of appropriate PDE’s in Rogers and Shi (1995)) and Monte Carlo simulations in e.g., Glasserman et al. (1999). Most of these approximation techniques can be adapted to account for the
204
9 Application to Mathematical Finance
pricing of Asian options when the underlying follows other diffusion processes such as a mean-reverting process. However, despite their practical importance, such extensions are still at their early stages. The moment approach developed in this chapter derives from the methodology of moments introduced by Dawson (1980) for the analysis of geostochastic systems modelled as solutions, called stochastic measure diffusions, of measure-valued martingale problems with a view to a range of applications in areas such as statistical physics, population, ecology and epidemic modelling, and others. Modeling some problems associated with diffusions as an infinite-dimensional linear program on a space of appropriate measures, and approximating the optimal value using only finitely moments was already proposed in Helmes et al. (2001); for instance to evaluate the moments of certain exit time distribution and also for changepoint detection problems. Also, Schwerer (2001) considered the evaluation of moments of the steady-state distribution of a reflected Brownian motion. In both the above works, the authors use Hausdorff type moment conditions of Theorem 3.10 which translate into linear constraints (and so LP relaxations), whereas we use semidefinite relaxations based on the moment conditions described in Theorem 3.8. For a comparison between LP and SDP relaxations in the general case, see Chapter 5 and in the context of some diffusions see Lasserre and Pri´eto-Rumeau (2004). For moment problems associated with SDE see also Stoyanov (2001). Finally, the infinite-dimensional LP approach has also been used for some continuoustime stochastic control problems (e.g., modeled by diffusions) and optimal stopping problems in Jung and Stockbridge (2002), Helmes and Stockbridge (2000), Kurtz and Stockbridge (1998), and also Fleming and Vermes (1989). In some of these works, the moment approach is also used to approximate the resulting infinite-dimensional LP.
Chapter 10
Application in Control
This chapter considers an application in control. We apply the moment approach to the so-called weak formulation of optimal control problems in which the initial problem is viewed as an infinite linear-programming model over suitable occupation measures, an instance of the generalized moment problem.
10.1
Introduction
Consider the following problem:
J(0, x0 ) := inf
u∈U
s.t.
Z
T
h(s, x(s), u(s)) ds + H(x(T )) 0
d dt x(s)
= f (s, x(s), u(s)),
(x(s), u(s)) ∈ X × U,
a.e. on [0, T ]
(10.1)
a.e. on [0, T ]
x(T ) ∈ K, with initial condition x(0) = x0 ∈ X, which models a deterministic system where x(t) (resp. u(t)) is the state (resp. the control) of the system at time t, and x(t) obeys the ordinary differential equation in (10.1) and where the goal is to find a control trajectory u : [0, T ]→U such that the (controlstate) trajectory liesR in X × U, the final state x(T ) lies in K and such that T the cost functional 0 hds + H(x(T )) is minimized. The above problem (10.1) is a so-called deterministic optimal control problem (OCP in short). In principle, solving an OCP is a difficult 205
206
10 Application in Control
challenge, despite powerful theoretical tools are available, e.g. the celebrated Maximum Principle and Hamilton-Jacobi-Bellman (HJB) optimality equation. The problem is even more difficult in the presence of state and/or control constraints as is the case in (10.1) and in most practical applications. State constraints are particularly difficult to handle. On the other hand, there is a so-called weak formulation of the OCP which consists of an infinite-dimensional linear program (LP) over two spaces of measures. The two unknown measures are the state-action occupation measure up to the final time T , and the state occupation measure at time T . The optimal value of the resulting LP always provides a lower bound on the optimal value of the OCP, and under some convexity assumptions, both values coincide. The dual of this infinite dimensional LP has an interpretation in terms of subsolutions of related HJB-like optimality conditions, as for the unconstrained case. Another interesting feature of this LP approach with occupation measures is that state constraints, as well as state and/or action constraints, are all easy to handle; indeed they simply translate into constraints on the supports of the unknown occupation measures. Although this LP approach is valid for any OCP, in general solving the corresponding (infinite-dimensional) LP is difficult. However, it turns out that this LP can be formulated as a particular instance of the generalized moment problem (1.1) and so, as we shall see, the general methodology of Chapter 4 can be easily adapted when the data of the OCP (10.1) are polynomials and basic semi-algebraic sets.
10.2
Weak Formulation of Optimal Control Problems
Consider the optimal control problem OCP (10.1) with initial condition x(0) = x0 ∈ X and where: - X, K ⊂ Rn and U ⊂ Rm are compact basic semi-algebraic sets. - U is the space of measurable functions u : [0, T ]→U. - h ∈ R[t, x, u], H ∈ R[x] - f : R × Rn × Rm →Rn is a polynomial map, i.e. fk ∈ R[t, x, u] for all k = 1, . . . , n. Before rewriting the OCP as a particular instance of the generalized moment problem, we need introduce some additional notation and definitions. For a compact Hausdorff space X ⊂ Rn let C(X ) denote the
10.2 Weak Formulation of Optimal Control Problems
207
Banach space of continuous functions on X , equipped with the sup-norm, so that its topological dual C(X )∗ is isometrically isomorphic to M (X ), the Banach space of finite signed measures on X , equipped with the total variation norm (denoted C(X )∗ ' M (X )). The notation M (X )+ denotes the positive cone of M (X ), i.e. the space of bounded measures on X. Let Σ := [0, T ] × X, S := Σ × U, and let C1 (Σ) be the Banach space of continuously differentiable functions w ∈ C(Σ), endowed with the sup-norm kwk + k∇wk. Next, with u ∈ U, let A : C1 (Σ)→ C(S) be the mapping w 7→ A w(t, x, u) :=
∂w (t, x) + hf (t, x, u), ∇x w(t, x)i, ∂t
(10.2)
and let L : C1 (Σ)→ C(S) × C(K) be the mapping w 7→ L w := (−A w, wT ), where wT (x) := w(T, x), for all x ∈ K. Occupation measures Notice that for a feasible trajectory (s, x(s), u(s)), and w ∈ C1 (Σ), one has wT (x(T )) = w(0, x0 ) +
Z
T
A w (s, x(s), u(s)) ds.
(10.3)
0
Let (µ, ν) ∈ M (S)+ × M (K)+ be defined as µ(A × B × C) :=
ν(D) = 1D (x(T ));
Z
1B×C ((x(s), u(s))) ds [0,T ]∩A
for all Borel sets D ⊂ K, A ⊂ [0, T ], B ⊂ X, and C ⊂ U (and where x 7→ 1• (x) stands for the indicator function of the set •). The measure µ is called the state-action occupation measure up to time T , whereas ν is the state occupation measure at time T , for the trajectory (s, x(s), u(s)). Then the time integration (10.3) is the same as the spatial integration Z
wT dν = w(0, x0 ) + K
Z
A w dµ. S
(10.4)
10 Application in Control
208
Similarly, the cost of this trajectory can be expressed via µ and ν by J(0, x0 , u) =
Z
T
h(s, x(s), u(s)) ds + H(x(T )) = 0
Z
h dµ + S
Z
H dν. K
(10.5)
And so, for an arbitrary feasible trajectory (s, x(s), u(s)), one has hL w, (µ, ν)i = hw, δ(0,x0 ) i
∀ w ∈ C1 (Σ).
(10.6)
Notice that we have translated the dynamics of the system, i.e., the ordinary differential equation in (10.1), in terms of the system of linear equations (10.4) on the occupation measures µ and ν, induced by a family of test functions w ∈ C1 . Similarly, the cost functional is now the linear criterion (10.5) in µ, ν. This formulation in terms of occupation measures is the deterministic analogue (with control) of the martingale approach that we have described in Section 9.2.2 for stochastic differential equations. Indeed, (9.12) and (9.13) are the stochastic analogues of (10.4) and (10.5), respectively. The only difference is that for the former, the infinitesimal generator A contains second-order partial derivatives because of the presence of the Brownian. Hence (10.4)-(10.5) will be the basis of the semidefinite relaxations for the OCP (10.1). The generalized moment problem Let L ∗ : M (S) × M (K)→ C1 (Σ)∗ be the adjoint mapping of L , defined by h(µ, ν), L wi = hL ∗ (µ, ν), wi, for all ((µ, ν), w) ∈ M (S) × M (K) × C1 (Σ). A function g : [0, T ] × Rn → R, is a solution of the Hamilton-JacobiBellman optimality equation if inf { A g (s, x, u) + h(s, x, u) } = 0,
u∈U
∀ (s, x) ∈ int Σ
(10.7)
with boundary condition gT (x) (= g(T, x)) = H(x), for all x ∈ K. On the other hand, a function w ∈ C1 (Σ) is said to be a smooth subsolution of the Hamilton-Jacobi-Bellman equation (10.7) if Aw+h ≥ 0
on S,
and w(T, x) ≤ H(x),
∀ x ∈ K.
10.2 Weak Formulation of Optimal Control Problems
209
Then, consider the infinite-dimensional linear program: ρmom =
inf
(µ,ν)∈∆
{h(µ, ν), (h, H)i :
L ∗ (µ, ν) = δ(0,x0 ) }
(10.8)
(where ∆ := M (S)+ × M (K)+ ). Its dual reads: ρpop =
sup w∈C1 (Σ)
{hδ(0,x0 ) , wi :
L w ≤ (h, H)}.
(10.9)
Note that the feasible solutions w of (10.9) are precisely smooth subsolutions of (10.7). Under compactness-continuity conditions, there is no duality gap between the infinite dimensional linear programs (10.8) and (10.9). Theorem 10.1. Let X, U, K be all compact, and let h, H be continuous. Moreover, assume that for every (t, x) ∈ Σ, ( the set f (t, x, U) ⊂ Rn is convex, and the function (10.10) v 7→ gt,x (v) := inf { h(t, x, u) : v = f (t, x, u)} is convex, u∈U
then the OCP (10.1) has an optimal solution and ρpop = ρmom = J(0, x0 ).
(10.11)
To see that (10.8) is an instance of the generalized moment problem with polynomial data, it suffices to notice that since Σ is compact, the constraint L ∗ (µ, ν) = δ(0,x0 ) is equivalent to (10.6) for all w in R[t, x], and so is equivalent to the countably many equality constraints hL (xα tk ), (µ, ν)i = hxα tk , δ(0,x0 ) i
∀ (α, k) ∈ Nn × N,
(10.12)
which in turn reads α Z Z k−1 α x0 if k = 0 k α k α T kt x + t h∇x x , f i dµ = x dν − , (10.13) 0 otherwise K S for all α ∈ Nn and k ∈ N. Equivalently (10.13) reads: Z Z 1 2 fαk dν − fαk dµ = bαk , (α, k) ∈ Nn × N, K
S
1 2 with fαk = T k xα ∈ R[x], fαk = ktk−1 xα + tk h∇x xα , f i ∈ R[t, x, u], and α bαk = x0 if k = 0 and 0 otherwise.
10 Application in Control
210
Hence, under (10.10) the OCP (10.1) has same optimal value as the infinite dimensional linear program ρmom =
min
µ∈M (S)+ ,ν∈M (K)+
{
Z
h dµ + S
Z
H dν : K
(10.12) holds }, (10.14)
a variant of the generalized moment problem (1.1) with two unknown measures µ, ν and countably many constraints.
10.3
Semidefinite Relaxations for the OCP
Let y = {yα }, α ∈ Nn , and z = {zβ }, β ∈ N × Nn × Nm , be the moment variables associated with ν and µ respectively. For every (α, k) ∈ Nn × N, let d(α, k) := max[|α|, |β|] of yα and zβ in the constraint T k yα − Lz ktk−1 xα + tk h∇x xα , f i =
xα 0 if k = 0 . 0 otherwise
(10.15)
The sets K and S = Σ × U are compact basic semi-algebraic sets, with K defined as in (2.10) and S defined by: S := {(t, x, u) ∈ R × Rn × Rm
hk (t, x, u) ≥ 0 ∀ k = 1, . . . , p} (10.16) for some polynomials {hk }pk=1 ⊂ R[t, x, u]. Depending on parity, let deg gj = 2rj or 2rj − 1 and similarly, deg hk = 2lk or 2lk − 1. For every 2i ≥ i0 := max[deg h, deg H, maxj,k [rj , lk ]], the semidefinite relaxation of the GPM (10.14) reads: :
ρi = inf Ly (H) + Lz (h) y,z
s.t Mr (y), Mr (z) 0 Mi−rj (gj y) 0, j = 1, . . . , m Mi−lk (hk z) 0, k = 1, . . . , p (10.15), (α, k) ∈ Nn × N, d(α, k) ≤ 2i.
(10.17)
Notice that with (α, k) = (0, 0) (resp. (α, k) = (0, 1)) in (10.13), one obtains ν(K) = 1 (resp. µ(S) = T ), so that y0 = 1 and z0 = T .
10.3 Semidefinite Relaxations for the OCP
211
Theorem 10.2. Let K ⊂ Rn , S ⊂ R × Rn × Rm be the compact basic semi-algebraic sets defined in (2.10) and (10.16) respectively. Let Assumption 2.1 holds for K and for S as well. Let ρmom be as in (10.14) and let ρi be as in (10.17). Then ρi ↑ ρmom as i→∞, and if (10.10) holds, ρi ↑ ρmom = J(0, x0 ).
The dual relaxation The dual of the semidefinite program (10.17), which reads: ρ∗i =
sup λkα ,σj ,ψk
w(0, x0 )
s.t w(t, x) = {
X k,α
λkα tk xα : d(α, k) ≤ 2i }
A w + h = ψ0 + H − w T = σ0 +
p X
k=1 m X
ψ k hk
(10.18)
σj g j
j=1
deg σ0 , ψ0 , σj gj , ψk hk ≤ 2i,
∀j ≤ m, ∀k ≤ p
is a strengthening of (10.9) because w ∈ C1 (Σ) is now replaced with a polynomial of R[t, x]. Moreover, the constraint h+A w ≥ 0 on S (resp. H − wT ≤ 0 on K) is replaced with the stronger requirement that h + A w (resp. H −wT ) belong to the quadratic module generated by the polynomials (hk ) (resp. (gj )), with a degree bound on the weights (ψk ) ⊂ R[t, x, u] (resp. (σj ) ⊂ R[x]). That is, in (10.18) one searches for a polynomial w ∈ R[t, x] (with a degree bound that increases with i), which is a subsolution of the HamiltonJacobi equation (10.7), and which maximizes w(0, x0 ). Optimal solutions w ∗ ∈ R[t, x] of the dual relaxation (10.18) can provide some useful information to compute an approximate control feedback for the OCP. Recall that h + A w ∗ ≥ 0 on S. Therefore, with (t, x) ∈ [0, T ] × X fixed, consider a control u∗ that minimizes over the compact set U, the polynomial u 7→ h(t, x, u) + A w ∗ (t, x, u), which is nonnegative on U. If the optimal value is close to zero, i.e., if h(t, x, u∗ ) + A w∗ (t, x, u∗ ) ≈ 0, then such a global minimizer u∗ ∈ U is likely a good candidate for feedback
10 Application in Control
212
control a time t and in state x ∈ X. Indeed, when using u∗ , the HamiltonJacobi-Bellman equation (10.7) is almost satisfied at the point (t, x) ∈ Σ. OCP with Free terminal time. The above moment approach is also valid for free terminal time OCPs, that is, OCP where the final time T is itself unknown. For instance, think about computing the optimal trajectory for sending a satellite to some given orbit in minimum time. For the OCP with free terminal time T ≤ T0 (for some T0 fixed), we need adapt the notation because now T is also a variable. And so in this context, Σ := [0, T0 ] × X and S = Σ × U. The measure ν in the infinite dimensional linear program (10.8) is now supported in [0, T0 ] × K instead of K previously. Hence, the sequence y associated with ν is now indexed in the canonical basis {tk xα } of R[t, x] instead of {xα } previously. The corresponding semidefinite relaxation is exactly the same as in (10.17) with appropriate definitions of the polynomials gj ∈ R[t, x] (resp. hk ∈ R[t, x, u]) that define the set [0, T0 ) × K (resp. S). The particular case of minimal time problem is obtained with h = 1, H = 0. 10.3.1
Examples
We have tested the above methodology on two minimum time OCPs, namely the double and Brockett integrators, because the associated optimal value T (x0 ) to steer an initial state x0 to the origin 0, can be calculated exactly. The double integrator. Consider the double integrator system in R2 x˙ 1 (t) = x2 (t), x˙ 2 (t) = u(t),
(10.19)
where x = (x1 , x2 ) is the state and the control u(t) ∈ U , satisfies the constraint |u(t)| ≤ 1, for all t ≥ 0. In addition, the state x(t) is constrained to satisfy x2 (t) ≥ −1, for all t. Therefore X = {x ∈ R2 : x2 ≥ −1}, K = {(0, 0)}, and U = [−1, 1]. For this very simple system (10.19), one is able to compute exactly the optimal minimum time T (x) to steer an initial state x to the origin.
10.3 Semidefinite Relaxations for the OCP
213
If x1 ≥ −(x22 − 2)/2 then T (x) = t1 + t2 + t3 with: t2 = −(t1 + 1)/2 + t1 x2 + x1 ;
t1 = 1 + x 2 ;
else if x1 ≥ (−x22 /2) sign x2 then
q T (x) = 2 x1 + x22 /2 + x2 ,
t3 = 1,
q else T (x) = 2 x1 + x22 /2 − x2 .
Observe that X is not compact and so the convergence result of Theorem 10.2 may not hold. In fact, one may impose the additional constraint kx(t)k∞ ≤ M for some large M (and modify X accordingly), because for initial states x0 with kx0 k∞ relatively small when compared to M , the optimal trajectory remains in X. However, in the numerical experiments of Table 10.19, one has maintained the original constraint x2 ≥ −1. With ρi (x0 ) the optimal value of the semidefinite relaxation (10.17) with initial state x0 ∈ X, Table 10.1 displays the values of x0 ∈ X while Table 10.2 displays the the values of the ratii ρ5 (x0 )/T (x0 ). One may see that with relatively few moments, the approximation is quite close (i.e. the ratio is close to 1), especially for values of the initial state corresponding to the right upper-triangle of Table 10.2. Table 10.1 x01 x02
0.0 -1.0
0.2 -0.8
Double integrator: data initial state x0 = (x01 , x02 ). 0.4 -0.6
Table 10.2
0.6 -0.4
0.8 -0.2
1.0 0.0
1.2 0.2
1.4 0.4
1.6 0.6
1.8 0.8
2.0 1.0
Double integrator: ratio ρ5 (x0 )/T (x0 ). fifth semidefinite relaxation
0.7550 0.6799 0.6062 0.5368 0.4713 0.0000 0.4742 0.5410 0.6106 0.6864 0.7462
0.5539 0.4354 0.9805 0.8422 0.6417 0.4184 0.5068 0.6003 0.6826 0.7330 0.8032
0.3928 0.9828 0.9314 0.8550 0.7334 0.5962 0.6224 0.6988 0.7416 0.7979 0.8564
0.9995 0.9794 0.9462 0.8911 0.8186 0.7144 0.7239 0.7585 0.8125 0.8588 0.9138
0.9995 0.9896 0.9706 0.9394 0.8622 0.8053 0.7988 0.8236 0.8725 0.9183 0.9394
0.9995 0.9923 0.9836 0.9599 0.9154 0.8825 0.8726 0.8860 0.9241 0.9473 0.9610
0.9994 0.9917 0.9853 0.9684 0.9448 0.9044 0.8860 0.9128 0.9305 0.9481 0.9678
0.9992 0.9919 0.9847 0.9741 0.9501 0.9210 0.9097 0.9257 0.9375 0.9480 0.9678
0.9988 0.9923 0.9848 0.9727 0.9505 0.9320 0.9263 0.9358 0.9507 0.9559 0.9696
0.9985 0.9923 0.9862 0.9793 0.9665 0.9544 0.9475 0.9452 0.9567 0.9634 0.9755
0.9984 0.9938 0.9871 0.9776 0.9637 0.9534 0.9580 0.9528 0.9604 0.9733 0.9764
10 Application in Control
214
The Brockett integrator Consider the so-called Brockett system in R3 x˙ 1 (t) = u1 (t), x˙ 2 (t) = u2 (t), x˙ 3 (t) = u1 (t)x2 (t) − u2 (t)x1 (t),
(10.20)
where x = (x1 , x2 , x3 ), and the control u(t) = (u1 (t), u2 (t)) ∈ U satisfies the constraint u1 (t)2 + u2 (t)2 ≤ 1 for all t ≥ 0. The goal is again to steer an initial state x0 ∈ X to the origin and therefore X = R3 , K = {(0, 0, 0)} and U is the closed unit ball of R2 , centered at the origin. Let T (x0 ) be the minimum time needed to steer an initial state x0 ∈ X to the origin which is in fact equal to the minimum time to reach x0 from 0. Proposition 10.3. The minimum time T (x) needed to steer the origin to a point x = (x1 , x2 , x3 ) ∈ R3 is given by p θ x21 + x22 + 2|x3 | , (10.21) T (x) = p θ + sin2 θ − sin θ cos θ
where θ = θ(x) is the unique solution in [0, π) of
θ − sin θ cos θ 2 (x1 + x22 ) = 2|x3 |. sin2 θ
(10.22)
Moreover, the function T is continuous on R3 , and is analytic outside the line x1 = x2 = 0. Again recall that the convergence result of Theorem 10.2 is guaranteed for X compact only whereas in the present case, X = R3 is not compact. One possibility is to take for X a large ball of R3 centered at the origin because for initial states x0 with norm kx0 k relatively small, the optimal trajectory remains in X. However, in the numerical experiments presented below, we have chosen to maintain X = R3 . In Table 10.3 we have displayed the optimal values ρi (x0 ) for 16 different values of the initial state x0 , in fact, all 16 combinations of x1 = 0, x2 = 0, 1, 2, 3, and x3 = 0, 1, 2, 3. So, the entry (2, 3) of Table 10.3 for the second semidefinite relaxation is ρ2 for the initial condition x0 = (0, 1, 2). At some (few) places in the table, the ∗ indicates that the SDP-solver encountered some numerical problems, which explains why one finds a lower bound
10.4 Summary
215
Table 10.3 Brockett integrator: semidefinite relaxations: ρi (x0 ). First semidefinite relaxation i = 1 0.0000 0.9999 1.9999 2.9999 0.0140 1.0017 2.0010 3.0006 0.0243 1.0032 2.0017 3.0024 0.0295 1.0101 2.0034 3.0040 Second semidefinite relaxation i = 2 0.0000 0.2012 0.3738 0.4946
0.9998 1.1199 1.2003 1.3467
1.9997∗ 2.0762 2.1631 2.2417
2.9994∗ 3.0453 3.1304 3.1943
Third semidefinite relaxation i = 3 0.0000 0.7665 1.0826 1.3804
0.9995 1.3350 1.7574 2.0398
1.9987∗ 2.1563 2.4172 2.6797
2.9984∗ 3.0530 3.2036 3.4077
Fourth semidefinite relaxation i = 4 0.0000 1.2554 1.9962 2.7006
0.9992 1.5925 2.1871 2.7390
1.9977 2.1699 2.5601 2.9894
2.9952 3.0478 3.1977 3.4254
Optimal time T (x0 ) 0.0000 2.5066 3.5449 4.3416
1.0000 1.7841 2.6831 3.4328
2.0000 2.1735 2.5819 3.0708
3.0000 3.0547 3.2088 3.4392
ρi−1 (x0 ) slightly higher than ρi , when practically equal to the exact optimal value T (x0 ). Again, good approximations are obtained with few moments for values of the initial state x0 corresponding to the right upper-triangle of Table 10.3.
10.4
Summary
We have presented the moment approach to OCP problems with state and/or control constraints. When the data of the OCP (dynamics, instantaneous costs, and constraints) are described by polynomials then one may devise a numerical scheme for approximating the global optimal value J(0, x0 ) from below. It consists in a hierarchy of semidefinite relaxations of a weak formulation of the original OCP. Under some convexity conditions,
216
10 Application in Control
one obtains convergence to the global optimum J(0, x0 ). Preliminary results on two minimum time examples with several values of the initial state x0 ∈ X, are very encouraging as one often gets very close to the optimal value T (x) with relatively few moments. Of course, even if not trivial, the two above examples are low-dimensional examples, and in view of the present status of semidefinite solvers, so far the appproach is limited to small problems. In addition, and under appropriate convexity conditions like (10.10), one only gets convergence to the optimal value J(0, x0 ) of the OCP. Therefore, this semidefinite programming approach should be viewed as a complement to existing methods for solving the OCP; in particular, as it may provide good approximations of the optimal value, it could be used to evaluate the efficiency of other approaches that compute also feasible controls u ∈ U . Using the semidefinite relaxations to obtain approximations of an optimal control u∗ ∈ U is an important topic of further research. 10.5
Notes and Sources
Most of the material of this chapter is from Lasserre et al. (2008a) and Theorem 10.1 is from Vinter (1993) (proved in the more general framework ˙ of differential inclusions x(t) ∈ F (t, x(t))). Proposition 10.3 is from Prieur and Tr´elat (2005). In Henrion et al. (2008) is described a first attempt on how to use optimal solutions w ∗ ∈ R[t, x] of the dual relaxation (10.18), for computing an approximate control feedback for the OCP. Preliminary results on a simple academic problem are encouraging. For a detailed account of HJB theory in the case of state constraints, the interested reader is referred to Capuzzo-Dolcetta and Lions (1990) and Soner (1986). There exist many numerical methods to compute the solution of a given optimal control problem; for instance, multiple shooting techniques which solve two-point boundary value problems as described, e.g., in Pesch (1994), Stoer and Bulirsch (2002), or direct methods, as, e.g., in von Stryk and Bulirsch (1992), Fletcher (1980), Gill et al. (1981), which use, among others, descent or gradient-like algorithms. To deal with optimal control problems with state constraints, some adapted versions of the maximum principle have been developed (see Jacobson et al. (1971), Maurer (1977), and see Hartl et al. (1995) for a survey of this theory), but happen to be very hard to implement in general. The LP approach for deterministic optimal control problems is de-
10.5 Notes and Sources
217
scribed in e.g. Hernandez-Hernandez et al. (1996) whereas Vinter (1993) considers the even more general context of differential inclusions. General LP-approximation schemes based on grids have been proposed in e.g. Hern´ andez-Lerma and Lasserre (1998a), and more recently in Gaitsgory and Rossomakhine (2006) in the control context. This LP approach with occupation measures has also been used in the context of discrete-time Markov control processes, and is dual to Bellman’s optimality principle. For more details the interested reader is referred to Borkar (2002), Hern´ andez-Lerma and Lasserre (1996, 1999, 2003) and the many references therein. For some continuous-time stochastic control problems (e.g., modeled by diffusions) and optimal stopping problems, the LP approach has also been used with success to prove existence of stationary optimal policies; see for instance Jung and Stockbridge (2002), Helmes and Stockbridge (2000), Helmes et al. (2001), Kurtz and Stockbridge (1998), and also Fleming and Vermes (1989). In some of these works, the moment approach is also used to approximate the resulting infinite-dimensional LP. Finally, let us mention the work of Prajna and Rantzer (2007) where the notion of occupation measures associated with trajectories of an autonomous dynamical system is used to certify some temporal properties.
Chapter 11
Convex Envelope and Representation of Convex Sets
This chapter considers the following problem: Given a rational function f on a basic semi-algebraic set K, evaluate fb(x) at a particular point x in the domain of the convex envelope fb of f . We then consider the semidefinite representation of the convex hull co(K) of K. That is, finding a set defined by linear matrix inequalities in a lifted space, such that co(K) is a projection of that set.
11.1
The Convex Envelope of a Rational Function
Recall that the convex envelope fb of a real-valued function f : Rn →R is the largest convex function majorized by f . Computing the convex envelope fb of f is a difficult problem, and so far, there is no efficient algorithm that approximates fb by convex functions (except for the simpler univariate case). In this section, we consider the class of rational fractions f on a compact basic semi-algebraic set K ⊂ Rn (and f = +∞ outside K). We view the problem as a particular instance of the generalized moment problem (1.1), and we provide an algorithm for computing convex and uniform approximations of its convex envelope fb. More precisely, with D := co(K) being the convex hull of K:
(a) We provide a sequence of convex functions (fi )i that converges to fb uniformly on any compact subset of D where fb is continuous, as i increases. (b) At each point x ∈ Rn , computing fi (x) reduces to solving a semidefinite program Qix . (c) For every x ∈ int D, the semidefinite dual Q∗ix has an optimal solution, and any optimal solution provides an element of the subgradient ∂fi (x) at the point x ∈ int D. 219
220
11.1.1
11 Convex Envelope and Representation of Convex Sets
Convex envelope and the generalized moment problem
Let K ⊂ Rn be compact, and denote by D its convex hull. Hence, by a theorem of Caratheodory, D is convex and compact; see Rockafellar (1970). Also, recall that C(K) is the Banach space of real-valued continuous functions on K (hence bounded), equipped with the sup-norm kf k := supx∈K |f (x)|, whereas the Banach space M (K) of finite signed Borel measures on K, equipped with the norm of total variation, is identified with C(K)∗ , the topological dual of C(K). Its positive cone M (K)+ is the set of finite Borel measures on K. Finally, let P(K) ⊂ M (K)+ be the set of Borel probability measures on K. The spaces M (K) and C(K) form a dual pair of vector spaces, with duality bracket hµ, f i :=
Z
µ ∈ M (K), f ∈ C(K),
f dµ, K
and the associated weak ? topology on M (K) is the coarsest topology for which µ → hµ, f i is continuous for every function f in C(K). e the natural extension to Rn of f ∈ C(K), that is Denote by f, x 7→ fe(x) :=
f (x) on K +∞ on Rn \ K.
(11.1)
Note that fe is lower-semicontinuous (l.s.c.), admits a minimum and its effective domain K is non-empty and compact (denoted by dom f˜ in the sequel). With f ∈ C(K), the convex envelope fb of fe is the largest convex e function majorized by f. With f ∈ C(K), and x ∈ D fixed, arbitrary, consider the infinitedimensional linear program: ρ1mom (x) =
inf
µ∈M (K)+
hµ, f i
s.t. hµ, zi i = xi , hµ, 1i = 1
i = 1, . . . , n
(11.2)
R (where recall that hµ, zi i = K zi dµ). Notice that (11.2) is a particular instance of the generalized moment problem (1.1).
11.1 The Convex Envelope of a Rational Function
221
Lemma 11.1. Let D = co(K), f ∈ C(K) and fe be as in (11.1). Then the convex envelope fb of fe is given by: 1 ρmom (x), if x ∈ D, b f(x) = (11.3) +∞, otherwise, and so, D = dom fb.
Proof. With x ∈ D fixed, let ∆x (K) ⊂ P(K) be the set of probability measures µ on K centered at x (that is hµ, zi i = xi for i = 1, . . . , n), and let ∆∗x (K) ⊂ ∆x (K) be its subset of probability measures that have a finite support. Then: b f(x) =
inf
µ∈∆∗ x (K)
hµ, f i,
∀x ∈ D.
(See Choquet (1969).) Next, since ∆∗x (K) is dense in ∆x (K) with respect to the weak ? topology, and P(K) is metrizable and compact with respect to the same topology (see Choquet (1969)), fb(x) =
min hµ, f i = ρ1mom (x),
µ∈∆x (K)
for every x ∈ D. If x ∈ / D, there is no probability measure on K, with finite b support, and centered in x; therefore f(x) = +∞. Next, let p, q ∈ R[x], with q > 0 on K, and let f ∈ C(K) be defined as x 7→ f (x) = p(x)/q(x),
x ∈ K.
(11.4)
For every x ∈ D fixed, consider the infinite-dimensional LP, ρmom (x) =
inf
µ∈M (K)+
hµ, pi
s.t. hµ, zi qi = xi , hµ, qi = 1.
i = 1, . . . , n
(11.5)
The dual of (11.5) is the infinite-dimensional LP ρpop (x) =
sup γ∈R,λ∈Rn
{γ + hλ, xi : p(y) − q(y)hλ, yi ≥ γq(y),
∀ y ∈ K}, (11.6)
11 Convex Envelope and Representation of Convex Sets
222
Equivalently, as q > 0 everywhere on K, and f = p/q on K, ρpop (x) =
sup γ∈R,λ∈Rm
{γ + hλ, xi :
f (y) − hλ, yi ≥ γ,
∀ y ∈ K}. (11.7)
In view of the definition of f˜, notice that f (y) − hλ, yi ≥ γ,
∀y ∈ K
⇔
e − hλ, yi ≥ γ, f(y)
∀y ∈ Rn .
Hence (11.7) is just the dual of (11.2), for every x ∈ D, and so ρpop (x) ≤ ρ1mom (x), for every x ∈ D. In fact we have the following well known result (that holds of course in a more general framework).
Theorem 11.2. Let p, q ∈ R[x] with q > 0 on K, and let f be as in (11.4). Let x ∈ D (= co(K)) be fixed, arbitrary, and let ρmom (x) and ρpop (x) be as in (11.5) and (11.7), respectively. Then both (11.5) and (11.7) have an optimal solution and there is no duality gap, i.e., b ρpop (x) = ρmom (x) = ρ1mom (x) = f(x),
∀ x ∈ D.
(11.8)
In fact, the above result can be completed. Let f ∗ : Rn → R be the e i.e., Legendre-Fenchel conjugate of f, λ 7→ f ∗ (λ) := sup : {hλ, yi − fe(y)}. y∈Rn
b Let ∂ f(x) denote the subdifferential of fb at the point x. From Theorem b 23.4 in Rockafellar (1970), ∂ f(x) 6= ∅ at least for every x in the relative interior of D. Corollary 11.3. Let x ∈ D be fixed, arbitrary, and let ρpop (x) be as in (11.7). (a) (11.7) is solvable if and only if ∂ fb(x) 6= ∅, in which case any optimal solution (λ∗ , γ ∗ ) satisfies: b λ∗ ∈ ∂ f(x),
and
γ ∗ = −f ∗ (λ∗ ).
(11.9)
b (b) If f is as in (11.4), then ∂ f(x) 6= ∅ for every x in D, and so, (11.7) is solvable and (a) holds for every x in D.
11.1 The Convex Envelope of a Rational Function
11.1.2
223
Semidefinite relaxations
So let K ⊂ Rn be the basic closed semi-algebraic set defined in (4.1). The infinite dimensional LP (11.5) is an obvious instance of the generalized moment problem (1.1). Therefore, in the present context, the semidefinite relaxations (4.5) associated with (11.5) read: ρi (x) = inf Ly (p) y
s.t. Ly (zk q) = xk ,
k = 1, . . . , n (11.10)
Mi (y) 0,
Mi−vj (gj y) 0, j = 1, . . . , m Ly (q) = 1,
while its dual reads: ρ∗i (x) =
sup γ + hλ, xi
γ,λ,(σj )
s.t. p(z) − (hλ, zi + γ)q(z) = σ0 (z) + σj ∈ Σ[z], j = 0, . . . , m deg σ0 , deg σj gj ≤ 2i.
m X
σj (z) gj (z),
j=1
∀z
(11.11)
Theorem 11.4. Let K be as in (4.1), and let Assumption 2.1 hold. Let f be as in (11.4) with p, q ∈ R[x], and with q > 0 on K. Let fb be as in (11.3), and with x ∈ D (= co(K)) fixed, consider the semidefinite relaxations (11.10)-(11.11). Then: (a) The function fi : Rn → R ∪ {+∞} defined by x 7→ fi (x) := ρi (x),
x ∈ Rn ,
(11.12)
is convex, and as i → ∞, fi (x) ↑ fb(x) pointwise, for all x ∈ Rn . (b) If D has a nonempty interior int D, then (11.11) is solvable and: ρ∗i (x) = ρi (x) = fi (x),
x ∈ int D,
and for every optimal solution (λ∗i , γi∗ ) of (11.11), fi (z) − fi (x) ≥ hλ∗i , z − xi, that is, λ∗i ∈ ∂fi (x).
∀ z ∈ Rn ,
(11.13)
11 Convex Envelope and Representation of Convex Sets
224
So, Theorem 11.13 not only provides a convex approximation fi of fb with monotone pointwise convergence fi (x) ↑ fb(x) as i→∞, but it also provides a subgradient λ ∈ ∂fi (x) at every point x ∈ D. We even get the stronger uniform convergence: Corollary 11.5. Let K be as in (4.1), and let Assumption 2.1 hold. Let f and fb be as in (11.4) and (11.3) respectively, and let fi : D → R, be as in Theorem 11.13. Then fi is lower-semicontinuous. Moreover, for every compact Ω ⊂ D on which fb is continuous, b − fi (x)| = 0, lim sup |f(x)
(11.14)
i→∞ x∈Ω
b uniformly that is, the monotone nondecreasing sequence {fi } converges to f, on every compact on which fb is continuous. Example 11.1. Consider the bivariate rational function f : [−1, 1]2 → R: x 7→ f (x) :=
x1 x2 , 1 + x21 + x22
x ∈ [−1, 1]2,
on [−1, 1]2, displayed in Figure 11.1, with f3 as well. In Figure 11.2 we
0.
0.
0.3
0.3
0.2
0.2
0.1
0.1
0
0
−0.1
−0.1
−0.2
−0.2
−0.3
−0.3
−0. 1
−0. 1 05
1 0.5
0
0.5
1 05
0
0 −0 5
−0.5 −1
0 −0.5
−0.5 −1
−1
Fig. 11.1
−1
Example 11.1, f and f3 on [−1, 1]2 .
have displayed (f3 − f2 ) which is of the order 10−9 (which explains why for a few values of x ∈ [−1, 1]2 one may sometimes have f3 (x) ≤ f2 (x) as we are at the limit of machine precision). It also means that again f2 provides a very good approximation of the convex envelope fb, that is, a very good approximation is already obtained at the first relaxation (i.e., here with i = 2) !
11.2 Semidefinite Representation of Convex Sets
225
da a1
data1
−9
−9
x 10
x 10
8
7
6
6 5
2 3 0
2
−2
1
− 1
0 1 1
05
1
0.5
05
0
0.5
0
0 −0.5
−0.5 −1
−0.5 −1
−1
Fig. 11.2
11.2
0 −0 5 −1
Example 11.1, f3 − f2 and (f3 − f2 )+ on [−1, 1]2 .
Semidefinite Representation of Convex Sets
Semidefinite programming is a powerful optimization tool with a growing number of various applications. Therefore, an important issue is to characterize convex sets that have a semidefinite representation (SDr), and called SDr sets.
Definition 11.1. A convex set Ω ⊂ Rn is SDr if there exist integers m, p and real p × p symmetric matrices (Ai )ni=0 , (Bj )m j=1 such that: Ω = {x ∈ Rn : ∃ y ∈ Rm s.t. A0 +
n X
A i xi +
i=1
m X j=1
Bj yj 0 }. (11.15)
In other words, Ω is the linear projection on Rn of the convex set Ω0 := {(x, y) ∈ Rn × Rm : A0 +
n X i=1
A i xi +
m X j=1
Bj y j 0 }
of the lifted space Rn+m , and Ω0 is called a semidefinite representation (SDr) of Ω. The set Ω0 is also called a lifted SDr or a lifted LMI (Linear Matrix Inequality) because one sometimes needs additional variables y ∈ Rm to obtain a description of Ω via appropriate LMIs. Here are examples of SDr sets:
11 Convex Envelope and Representation of Convex Sets
226
• The intersection of half-spaces, i.e., a polyhedron {x ∈ Rn : Ax ≤ b}, is a trivial example of convex sets whose SDr is readily available without lifting. Indeed Ax ≤ b is an LMI with diagonal matrices Ai in (11.15). • The intersection of ellipsoids Ω := {x ∈ Rn : x0 Qj x + b0 x + cj ≥ 0, j = 1, . . . , m } (where −Qj 0 for all j =, . . . , m) is a SDr set with lifted LMI representation in R(n+1)(n+2)/2−1 : 1 x0 0 xY 0 Ω = (x, Y) : trace (Qj Y) + b0 x + cj ≥ 0, j = 1, . . . , m.
• The epigraph of a univariate convex polynomial is a SDr set. • Convex sets of R2 described from genus-zero plane curves are SDr sets; see Parrilo (2007). • Hyperbolic cones obtained from 3-variables hyperbolic homogeneous polynomials are SDr sets; see the proof of the Lax conjecture in Lewis et al. (2005).
In this section, we consider semidefinite representations for the convex hull co(K) of compact basic semi-algebraic sets K ⊂ Rn as described in (4.1). We show that co(K) has a SDr if the defining polynomials (gj ) ⊂ R[x] of K satisfy what we call the Putinar’s Bounded Degree Representation (P-BDR) of affine polynomials. However, for general basic semi-algebraic sets K, one cannot expect that the P-BDR property holds (if it ever holds) for nice values of the order r. Indeed otherwise one could minimize any affine polynomial on K efficiently. So from a practical point of view, the most interesting case is essentially the convex case, that is, when K is convex, and even more, when the defining polynomials gj in (2.10) are concave, in which case one may hope that the P-BDR property holds for nice values of r. 11.2.1
Semidefinite representation of co(K)
Let K ⊂ Rn be the basic closed semi-algebraic set defined in (4.1) for some Pm polynomials gj ∈ R[x], j = 1, . . . , m. Recall that Q(g) = { j=0 σj gj : σj ∈ Σ[x]} is the quadratic module generated by the gj ’s (with the convention
11.2 Semidefinite Representation of Convex Sets
227
g0 = 1), and given r ∈ N, define Qr (g) ⊂ Q(g) to be the set Qr (g) := {
m X j=0
σj g j :
σj ∈ Σ[x], deg σj + deg gj ≤ 2r}.
For an affine polynomial x 7→ f0 + vector of coefficients.
Pn
i=1
(11.16)
fi xi , let (f0 , f ) ∈ R × Rn be its
Definition 11.2 (P-BDR property). Given a compact set K ⊂ Rn defined as in (4.1), the Putinar’s Bounded Degree Representation (PBDR) of affine polynomials holds for K if there exists r ∈ N such that [ f affine and positive on K ]
⇒
f ∈ Qr (g),
(11.17)
except perhaps for a set of vectors f in Rn with Lebesgue measure zero. Call r its order. Recall that if Assumption 2.1 holds for the gj ’s that define K then by Theorem 2.14, if f ∈ R[x] is positive on K then f ∈ Qr (g) for some r = rf that depends on f . In Definition 11.2, the P-BDR property requires that rf < r for almost all affine f ∈ R[x], positive on K. n Theorem 11.6. Let K ⊂ R be compact and defined as in (4.1) and let n+2r s(2r) := n .
If the P-BDR property holds for K with order r, then co(K) is a SDr set and the convex set Mr−rj (gj y) 0, j = 0, 1, . . . , m Ωr := (x, y) ∈ Rn × Rs(2r) : Ly (zi ) = xi , i = 1, . . . , n y0 =1 (11.18) is a semidefinite representation of co(K). Notice that in Theorem 11.6, the SDr (11.18) of co(K) is given explicitly in terms of the data (gj ) that define K. We have already seen that the intersection K of half-spaces and/or ellipsoids is a SDr set. It is relatively easy to show that in both cases the P-BDR
11 Convex Envelope and Representation of Convex Sets
228
property holds for K with r = 1. We next consider two other nontrivial examples. Example 11.2. Let n = 2 with gj concave and deg gj = 2 or 4, for all j = 1, . . . , m, so that K is convex. It is known that in general K is not representable by a LMI in the variables x1 and x2 only. For instance take m = 1 and g1 (x) = 1 − x41 − x42 . The rigid convexity condition of Helton and Vinnikov (2007) is violated, and so there is no SDr of K involving only x. But on the other hand, K is known to be SDr. Assume K is compact, Slater’s condition holds1 and let f ∈ R[x] be affine and nonnegative on K with global minimum f ∗ ≥ 0 on K. Convexity along with Slater’s condition implies that the KKT optimality conditions (5.25) hold at any global minimizer x∗ ∈ K; see Section C.1. And so there exist nonnegative Lagrange multipliers λ ∈ Rm + such that the (convex) P Lagrangian Lf := f − f ∗ − m λ g has optimal value 0 and, in addition, j j j=1 x∗ ∈ K is a global minimizer of Lf on R2 . Therefore, the polynomial Lf being nonnegative on R2 and being quadratic or quartic in 2 variables, is s.o.s. That is, Lf = σ for some σ ∈ Σ[x] and deg σ ≤ 4. But then f = f ∗ + Lf +
m X j=1
λj gj ∈ Q2 (g)
because as f ∗ ≥ 0, f ∗ + Lf ∈ Σ[x]. That is, the P-BDR property holds for K with order r = 2. Hence, K has the SDr (11.18). Example 11.3. Let m = 2 with gi (x) = x0 Ai x + ci ,
i = 1, 2,
(11.19)
for some real symmetric matrices Ai , and vector c = (c1 , c2 ) ∈ R2 . Given a linear polynomial f ∈ R[x] with coefficient vector f = (fi )ni=1 ∈ n R , consider the semidefinite program: ρ = min { Ly (f ) : M1 (y) 0; Ly (gi ) ≥ 0; i = 1, 2; y0 = 1 }. y
(11.20)
Proposition 11.7. Let K ⊂ Rn be defined as in (4.1) with m = 2 and gj as in (11.19), and let Q be as in (11.20). Assume that K is compact with nonempty interior and λ1 A 1 + λ 2 A 2 ≺ 0
(11.21)
1 Recall that Slater’s condition holds if there is some x ∈ K such that g (x ) > 0 0 0 j for all j = 1, . . . , m.
11.2 Semidefinite Representation of Convex Sets
229
for some λ = (λ1 , λ2 ) ≥ 0. Then ρ = f ∗ for almost all f ∈ Rn , and so the P-BDR property holds for co(K) with order r = 1. That is, co(K) has the semidefinite representation (11.18). Outer approximation of co(K) by an SDr set Let kxk denote the Euclidean norm of x ∈ Rn . With B0 := { x ∈ Rn : kxk ≤ 1} and given a compact set Ω ⊂ Rn and ρ > 0, Ω + ρ B0 = { x ∈ Rn : inf kx − yk ≤ ρ }. y∈Ω
Theorem 11.8. Let K ⊂ Rn be a compact set as defined in (4.1), and let Assumption 2.1 hold. For every fixed > 0 there is an integer r ∈ N such that co(K) ⊆ K ⊂ co(K) + B0 , where K is the SDr convex set Mr −rj (gj y) 0, j = 0, . . . , m x ∈ Rn : ∃ y ∈ Rs(2r ) s.t. Ly (zi ) = xi , i = 1, . . . , n y0 = 1. (11.22) In addition, bounds on r are available. Hence Theorem 11.8 shows that even if co(K) is not SDr, an arbitrarily close approximation K of co(K) is SDr. Moreover, the semidefinite representation (11.22) is explicit in terms of the polynomials gj that define K. 11.2.2
Semidefinite representation of convex basic semialgebraic sets
As already mentioned, for a general non convex basic semi-algebraic set K ⊂ Rn , one cannot expect the P-BDR property to hold (if it ever holds) for nice values of the order r. Indeed otherwise one could minimize any affine polynomial on K efficiently. So from a practical point of view, the most interesting case is essentially when K is convex, and even more, when the defining polynomials (gj ) in (2.10) are concave.
230
11 Convex Envelope and Representation of Convex Sets
Therefore in this section, K ⊂ Rn defined in (4.1) is compact and convex. We will use the Lagrangian Lf (see (5.26) and (C.9)) associated with the polynomials (gj ) and a linear f ∈ R[x], already encountered in Example 11.2. Lemma 11.9. Let K ⊂ Rn in (4.1) be compact, and assume that the gj ’s that define K are all concave and Slater’s condition holds. Given f ∈ R[x], let f ∗ := minx∈K f (x). For every linear f ∈ R[x] with kf k = 1, there exists λf ∈ Rm + such that the Lagrangian x 7→ Lf (x) := f (x) − f ∗ −
m X
λfj gj (x)
(11.23)
j=1
is nonnegative on Rn . In addition |f ∗ | ≤ τK ;
λfj ≤ MK ,
∀ j = 1, . . . , m,
(11.24)
where τK and MK are independent of f .
Theorem 11.10. Let K ⊂ Rn be compact and defined as in (4.1). Assume that the (gj ) that define K are all concave and Slater’s condition holds. Given a linear polynomial f ∈ R[x], let Lf be the Lagrangian defined in (11.23). If Lf is s.o.s. for every linear f ∈ R[x], then the P-BDR property holds for K with order r = max ddeg gj /2e, and K is a SDr set. In j=1,...,m
addition, the convex set Mr (y) Ly (gj ) (x, y) ∈ Rn × Rs(2r) : Ly (zi ) y0 is a SDr of K.
0 ≥ 0, j = 1, . . . , m = xi , i = 1, . . . , n =1
(11.25)
The semidefinite representation (11.25) of the convex set K has two important advantages over the semidefinite representation (11.18) of co(K) in the non convex case: • The index r in Theorem 11.10 is known in advance.
11.2 Semidefinite Representation of Convex Sets
231
• Moreover, the semidefinite constraint 0 is only concerned with the moment matrix Mr (y), that is, it does not depend on the data (gj ) that define the set K! Therefore, in view of Theorem 11.10, an important issue to find sufficient conditions to ensure that the Lagrangian Lf is s.o.s., and if possible, conditions that can be checked directly from the data gj . For instance, we have seen in Section 2.1. a sufficient conditions on the coefficients of a polynomial to ensure it is s.o.s. Another condition is concerned with the Hesssian of Lf . Recall from Definition 2.3 that a polynomial f ∈ R[x] is s.o.s.-convex if its Hessian ∇2 f is s.o.s., i.e., ∇2 f = W W0 for some possibily non square matrix polynomial W ∈ R[x]n×k . Theorem 11.11. Let K ⊂ Rn be as in (4.1) and let Assumption 2.1 hold. Assume that the gj ’s that define K are all concave and Slater’s condition holds. (a) If −gj is s.o.s.-convex for every j = 1, . . . , m, then K is SDr with semidefinite representation (11.25). (b) If every −gj is either s.o.s.-convex or satisfies −∇2 gj 0 on K∩{x : gj (x) = 0}, then there exists r ∈ N such that the set Ωr in (11.18) is a semidefinite representation of K.
Proof. Let f ∈ R[x] be linear and as K is compact, let x∗ be a minimizer of f on K. Then from the definition (11.23) of Lf , it follows that Lf (x∗ ) = 0 and ∇Lf (x∗ ) = 0. (a) As every −gj is s.o.s.-convex then so is Lf . Therefore, by Lemma 2.30, Lf ∈ Σ[x], and the conclusion follows from Theorem 11.10. (b) As Lf (x∗ ) = 0 and ∇Lf (x∗ ) = 0, ∗ 0
Lf (x) = (x − x )
Observe that ∇2 Lf = −
Z |
1 0
Z
t 0
∇2 Lf (x∗ + s(x − x∗ ))dsdt (x − x∗ ). {z } F(x,x∗ )
P
j∈J
λfj ∇2 gj where J := {j ∈ {1, . . . , m} : λfj >
11 Convex Envelope and Representation of Convex Sets
232
0} 6= ∅. If −∇2 gj (u) 0 on K∩{u : gj (u) = 0} then the matrix polynomial Z
(x, u) 7→ Fj (x, u) :=
1 0
Z
t 0
−∇2 gj (u + s(x − u)) dsdt
is positive definite on ∆j := {(x, u) ∈ K × K : gj (u) = 0}. Indeed, it is obviously positive semidefinite on ∆j because −∇2 gj 0 and K is convex. Next, with (x, u) ∈ ∆j fixed, suppose that 0
ξ Fj (x, u)ξ :=
Z
1 0
Z
t 0
−ξ 0 ∇2 gj (u + s(x − u))ξ dsdt = 0
for some ξ ∈ Rn . Then necessarily −ξ 0 ∇2 gj (u+s(x−u))ξ = 0 for almost all s ∈ [0, 1], in contradiction with −∇2 gj (u) 0. Hence Fj (x, u) 0 on ∆j . As the smallest eigenvalue of Fj is continuous in (x, u) and ∆j is compact, its minimum on ∆j is positive, and so Fj (x, u) δj In for all (x, u) ∈ ∆j and some δj > 0 (where In is the n × n identity matrix). In particular, Fj (x, x∗ ) δj In for all j ∈ J and all x ∈ K because (x, x∗ ) ∈ ∆j . Hence by Theorem 2.22, for every j ∈ J, x 7→ Fj (x, x∗ ) = Fj0 (x) +
m X
Fjk (x) gk (x),
(11.26)
k=1
for some s.o.s. matrix polynomials (Fjk )m k=0 . Similarly, if −gj is s.o.s.convex then by Lemma 2.29, x 7→ Fj (x, x∗ ) is s.o.s. and so can be written as in (11.26) (take Fjk = 0, k ≥ 1). Hence (with g0 = 1) X f x 7→ Lf (x) = (x − x∗ )0 λj Fj (x, x∗ ) (x − x∗ ) =
m X
k=0
=
m X
gk (x)
j∈J
X
λfj (x
j∈J
− x ) Fjk (x)(x − x ) ∗ 0
∗
σj (x) gj (x),
k=0
for some s.o.s. polynomials (σj ) ∈ Σ[x]. Importantly, notice that the degrees of the s.o.s. matrix polynomial weights (Fjk ) do not depend on f . They only depend on δj , deg gj , and kFj k on the compact set ∆j ; see Theorem 2.22. Hence the degree of the s.o.s. polynomials (σj ) ⊂ Σ[x] is bounded uniformly in f by say 2r ∈ N. Therefore, for every linear
11.2 Semidefinite Representation of Convex Sets
polynomial f ∈ R[x], f (x) − f ∗ = Lf (x) +
X
λfj gj (x) =
j∈J
m X
233
σj0 (x) gj (x),
k=0
(σj0 )
for some s.o.s. ⊂ Σ[x] whose degree is bounded uniformly in f by 2r + 2. This implies that the P-BDR property holds for K with order r + 1, and so, by Theorem 11.6, Ωr+1 in (11.18) is a SDr of K. Example 11.4. Consider the class of convex sets K ⊂ Rn where gj ∈ R[x] is concave and separable for every j = 1, . . . , m. That is, for every Pn j = 1, . . . , m, gj is concave and gj = k=1 gjk (xk ) for some univariate polynomials gjk ∈ R[xk ], k = 1, . . . , n. We have seen in Example 2.5 that −gj is s.o.s.-convex. Therefore by Theorem 11.11, K has the SDr (11.25). P 1/2d In particular, letting kxkd := ( ni=1 x2d , the d-Euclidean ball K := i ) n 2d {x ∈ R : kxkd ≤ 1} is SDr for all d ≥ 1. Outer approximation of K by an SDr set In case where we cannot prove that K is SDr, we next provide a convex SDr set K which is an outer approximation of K, arbitrarily close to K. As K is compact, let τK ∈ R be such that: x∈K
⇒
kxk ≤ τK .
(11.27)
Theorem 11.12. Let K ⊂ Rn as in (4.1) be compact, with τK as in (11.27). Assume that the polynomials (gj ) that define K are all concave and Slater’s condition holds. With r ∈ N, r ≥ ddeg gj /2e, j = 1, . . . , m, let x 7→ Pn 2r Θr (x) := i=1 (xi /τK ) . Then for every > 0, there exists r ∈ N such that K ⊆ Kr ⊆ K + B0 , where Kr is the SDr convex set x ∈ Rn : ∃ y ∈ Rs(2r) s.t.
Mr (y) Ly (gj ) Ly (Θr ) Ly (zi ) y0
0 ≥ 0, j = 1, . . . , m ≤ 1 = xi , i = 1, . . . , n = 1.
(11.28)
(11.29)
Notice that like (11.25), the semidefinite representation (11.29) is explicit in terms of the gj ’s and moreover, the semidefinite constraint 0 is
234
11 Convex Envelope and Representation of Convex Sets
only concerned with the moment matrix Mr (y), that is, it does not depend on the data (gj ) that define K. 11.3
Algebraic Certificates of Convexity
In this final section we consider the problem of detecting whether some given basic semi-algebraic set K is convex. By detecting, we mean that if K is convex then one may obtain a certificate (or a proof) of convexity by some algorithm. The geometric characterization of convexity λx + (1 − λ)y ∈ K,
∀ x, y ∈ K, λ ∈ (0, 1)
(11.30)
is not a certificate because when this property holds, it cannot be checked by an algorithm. b ⊂ Rn × Given the basic semi-algebraic set K defined in (2.10), let K n R × R be the associated basic semi-algebraic set: where:
b := {(x, y, λ) : gˆj (x, y, λ) ≥ 0, j = 1, . . . , 2m + 1 } K (x, y, λ) 7→ gˆj (x, y, λ) := gj (x),
(11.31)
j = 1, . . . , m
(x, y, λ) 7→ gˆj (x, y, λ) := gj−m (y),
(x, y, λ) 7→ gˆj (x, y, λ) := λ(1 − λ),
j = m + 1, . . . , 2m j = 2m + 1,
and let P (ˆ g ) ⊂ R[x, y, λ] be the preordering associated with the polynomials b in (11.31), i.e., gˆj that define K ! Y X P (ˆ g) := σJ gˆk : σJ ∈ Σ[x, y, λ] . (11.32) J⊆{1,...,2m+1}
k∈J
Our algebraic certificate of convexity is a follows.
Theorem 11.13. Let K ⊂ Rn be the basic semi-algebraic set defined in (2.10). Then K is convex if and only if for every j = 1, . . . , m and all (x, y, λ) ∈ Rn × Rn × R: θj (x, y, λ) gj (λ x+(1−λ) y) = gj (λ x+(1−λ) y)2pj +hj (x, y, λ) (11.33) for some polynomials θj , hj ∈ P (ˆ g ) and some integer pj ∈ N.
11.3 Algebraic Certificates of Convexity
235
Proof. The set K is convex if and only if (11.30) holds, that is, if and only if for every j = 1, . . . , m, gj (λx + (1 − λ)y) ≥ 0,
∀x, y ∈ K, λ ∈ [0, 1],
or, equivalently, if and only if for every j = 1, . . . , m, gj (λx + (1 − λ)y) ≥ 0,
b ∀(x, y, λ) ∈ K.
(11.34)
But then (11.33) is just an application of Stengle’s Nichtnegativstellensatz, i.e., Theorem 2.12(a), applied to (11.34). The polynomials θj , hj ∈ P (ˆ g), j = 1, . . . , m, obtained in (11.33) indeed provide an obvious algebraic certificate of convexity for K. This is because if (11.33) holds then for every x, y ∈ K and every λ ∈ [0, 1] one has θj (x, y, λ) ≥ 0 and hj (x, y, λ) ≥ 0 because θj , hj ∈ P (ˆ g); and so θj (x, y, λ)gj (λx + (1 − λ)y) ≥ 0. Therefore if θj (x, y, λ) > 0 then gj (λx+(1−λ)y) ≥ 0 whereas if θj (x, y, λ) = 0 then gj (λx+(1−λ)y)2pj = 0 which in turn implies gj (λx + (1 − λ)y) = 0. Hence for every j = 1, . . . , m, gj (λx+(1−λ)y) ≥ 0 for every x, y ∈ K and every λ ∈ [0, 1], that is, (11.30) holds and so K is convex. In principle, the algebraic certificate can be obtained numerically. Indeed, there is a bound on pj ∈ N, and the degrees of the s.o.s. weights σJ in the representation (11.32) of the polynomial certificates θj , hj ∈ P (ˆ g) in (11.33); see the discussion just after Theorem 2.12. Therefore, checking whether (11.33) holds reduces to checking whether some semidefinite program has a feasible solution. However the bound is so large that for practical implementation one should proceed as follows. Fix an a priori bound M on pj ∈ N and on the degrees of the s.o.s. polynomial weights σJ that define hj , θj ∈ P (ˆ g); then check whether (11.33) holds true by solving the associated semidefinite program. If K is convex and the degrees of the certificates are small, then by increasing M one eventually finds a feasible solution. However, in practice such a certificate can be obtained only up to some machine precision, because of numerical inaccuracies inherent to semidefinite programming solvers. Notice that the certificate in Theorem 11.13 involves polynomials in R[x, y, λ]. We next provide another algebraic certificate that does not involve the variable λ, but for the class of basic semi-algebraic sets K whose
236
11 Convex Envelope and Representation of Convex Sets
defining polynomials (gj ) satisfy the following non degeneracy assumption on the boundary ∂K. Assumption 11.1 (Non-degeneracy). For a basic semi-algebraic set K ⊂ Rn as in (2.10), the polynomials (gj ) ⊂ R[x] satisfy the nondegeneracy property if for every j = 1, . . . , m, ∇gj (x) 6= 0 whenever x ∈ K and gj (x) = 0. We first have the following characterization of convexity: Lemma 11.14. Let K be as in (2.10) and let both Slater’s condition and Assumption 11.1 hold. Then K is convex if and only if for every j = 1, . . . , m, h∇gj (y), x − yi ≥ 0,
∀ x, y ∈ K with gj (y) = 0.
(11.35)
Proof. The only if part is obvious. Indeed if h∇gj (y), x − yi < 0 for some x ∈ K and y ∈ K with gj (y) = 0, then there is some t > 0 such that gj (y + t(x − y)) < 0 for all t ∈ (0, t) and so the point x0 := tx + (1 − t)y does not belong to K, which in turn implies that K is not convex. For the if part, (11.35) implies that at every point of the boundary, there exists a supporting hyperplane for K. As K is closed with nonempty interior, the result follows from (Schneider, 1994, Theor. 1.3.3). As a consequence we obtain the following algebraic certificate of convexity. Corollary 11.15 (Algebraic certificate of convexity). Let K be as in (2.10), and let both Slater’s condition and Assumption 11.1 hold. Then K is convex if and only if for every j = 1, . . . , m, hj h∇gj (y), x − yi = h∇gj (y), x − yi2l + θj + ϕj gj
(11.36)
for some integer l ∈ N, some polynomial ϕj ∈ R[x, y] and some polynomials hj , θj in the preordering of R[x, y] generated by the family of polynomials (gk (x), gp (y)), k, p ∈ {1, . . . , m}, p 6= j. Proof. By Lemma 11.14, K is convex if and only if for every j = 1, . . . , m, the polynomial (x, y) 7→ h∇gj (y), x−yi is nonnegative on the set Ωj defined by: Ωj := {(x, y) ∈ K × K : gj (y) = 0 }.
(11.37)
11.3 Algebraic Certificates of Convexity
237
Then (11.36) follows from Theorem 2.12(a).
So if K is convex then (l, hj , θj , ϕj ) provides us with the desired certificate of convexity, which in principle, can be obtained numerically as for the algebraic certificate (11.33). However, in practice such a certificate can be obtained only up to some machine precision, because of numerical inaccuracies inherent to semidefinite programming solvers. See the discussion after Theorem 11.13. Observe that in Corollary 11.15, K is not necessarily compact. For compact basic semi-algebraic sets K that satisfy Assumption 2.1, one provides the following certificate. Assumption 11.2 (Certificate of convexity). Both Slater’s condition and Assumption 11.1 hold for the set K ⊂ Rn in (2.10). In addition, for every j = 1, . . . , m, the polynomial gj satisfies (x, y) 7→ h ∇gj (y), x − y i =
m X
σjk gk (x)
(11.38)
k=0
+
m X
ψjk gk (y) + ψj gj (y),
k=0,k6=j
for some s.o.s. polynomials (σjk ), (ψjk )k6=j ⊂ Σ[x, y], and some ψj ∈ R[x, y]. If Assumption 11.2 hold then K is convex because obviously (11.38) implies (11.35), which in turn, by Lemma 11.14, implies that K is convex. By fixing an a priori bound 2dj on the polynomial (σjk gk , ψjk gk , ψj gj ), checking whether (11.38) holds reduces to solving a semidefinite program, simpler than for checking whether (11.36) holds. For instance, for every j = 1, . . . , m, it suffices to solve the semidefinite program (recall that rk = d(deg gk )/2e, k = 1 . . . , m) ρj := min Lz (h∇gj (y), x − yi) z s.t. Mdj (z) 0 Mdj −rk (gk (x) z) 0, k = 1, . . . , m (11.39) Mdj −rk (gk (y) z) 0, k = 1, . . . , m; k 6= j Mdj −rj (gj (y) z) = 0 y0 = 1. If ρj = 0 for every j = 1, . . . , m, then Assumption 11.2 holds. However, again because of the numerical inaccuracies inherent to the SDP solvers,
238
11 Convex Envelope and Representation of Convex Sets
one would only get ρj ≈ 0; and so, this certificate of convexity is valid only up to machine precision. Example 11.5. Consider the following simple illustrative example in R2 : K := { x ∈ R2 : x1 x2 −1/4 ≥ 0; 0.5−(x1 −0.5)2 −(x2 −0.5)2 ≥ 0 } (11.40) Obviously K is convex but its defining polynomial x 7→ g1 (x) := x1 x2 − 1/4 is not concave whereas x 7→ g2 (x) = 0.5 − (x1 − 0.5)2 − (x2 − 0.5)2 is. With d1 = 3, solving (11.39) using GloptiPoly 3 yields the optimal value ρ1 ≈ −4.58.10−11 which, in view of the machine precision for the SDP solvers used in GloptiPoly, could be considered to be zero, but of course with no guarantee. For j = 2 there is no test to perform because −g2 being quadratic and convex yields h∇g2 (y), x − yi = g2 (x) − g2 (y) + (x − y)0 (−∇2 g2 (y))(x − y) (11.41) {z } | s.o.s.
which is in the form (11.38) with d2 = 1.
The next result states that under Assumption 11.2, the convex set K has a semidefinite representation. Let dj ∈ N be such that 2dj is larger than the maximum degree of the polynomials σjk gk , ψjk gk , ψj gj ∈ R[x, y] in (11.38), j = 1, . . . , m.
Theorem 11.16. Let K be as in (2.10) and let both Assumption 2.1 and Assumption 11.2 hold. Then K is convex and with r := maxj dj , the set Ωr in (11.18) is a semidefinite representation of K.
Example 11.6. Consider the convex set K of Example 11.5 for which the defining polynomial g1 of K is not concave. We have seen that Assumption 11.2 holds (up to ρ1 ≈ 10−11 , close to machine precision) and max[d1 , d2 ] = 3. By Theorem 11.16, if ρ1 would be exactly 0, the set M3 (y) 0 M2 (gj y) ≥ 0, j = 1, 2 Ω := (x, y) ∈ Rn × Rs(6) : (11.42) L (z ) = xi , i = 1, 2 y i y0 =1 would be a semidefinite representation of K.
11.5 Exercises
239
At least in practice, for every linear polynomial f ∈ R[x], minimizing Ly (f ) over Ω yields the desired optimal value f ∗ := min {f (x) : x ∈ K}, up to precision M ρ1 ≈ −M 10−11 for some constant M . On the other hand, if K is convex there is no guarantee that (11.38) holds true and so, one may wonder how restrictive is Assumption 11.2. The following result provides some insight. Corollary 11.17. Let K in (2.10) be convex and let Assumption 2.1 and Slater’s condition both hold. Assume that for every j = 1, . . . , m, either −gj is s.o.s.-convex or −gj is convex on K and −∇2 gj 0 on K ∩ {x : gj (x) = 0}. Then Assumption 11.2 holds and so Theorem 11.16 applies. Hence from Corollary 11.17, it follows that Theorem 11.11 is a special case of Theorem 11.16. 11.4
Summary
b In this chapter we have considered the problem of evaluating f(x) at a b point x for the convex envelope f of a given rational function f : K→R on a compact basic semi-algebraic set K. It turns out that for every x in the domain D of fb, a hierarchy of semidefinite relaxations provide a monotone b sequence fi (x) ↑ f(x) (in fact the convergence is even uniform on every compact subset of K where fb is continuous). Then we have addressed the problem of finding an explicit semidefinite representation for co(K) (or K itself if convex) and provided several sufficient conditions for this to happen. Central is the so-called P-BDR property of the polynomials that define the set K. Finally, we have also provided an algebraic certificate of convexity for arbitrary convex basic semi-algebraic sets, as well as two other certificates for convex basic semi-algebraic sets that satisfy a nondegeneracy property on their boundary. In principle they all can be obtained numerically via semidefinite programming, and so only up to machine precision, because of numerical inaccuracies inherent to SDP solvers. 11.5
Exercises
Exercise 11.1. Let (sji ) ⊂ N, (gji ) ⊂ R+ , and let K ⊂ Rn be as in (2.10) with
240
11 Convex Envelope and Representation of Convex Sets
x 7→ gj (x) := −
n X
2sji
gji xi
+ hj (x),
j = 1, . . . , m,
i=1
where hj ∈ R[x] is a an affine polynomial, for every j = 1, . . . , m. If K is compact, show that K has the semidefinite representation (11.25). Exercise 11.2. Let Ω be as in (11.42) of Example 11.6. Show that for every linear polynomial f ∈ R[x], minimizing Ly (f ) over Ω yields the desired optimal value f ∗ := min {f (x) : x ∈ K}, up to ρ1 ≈ −M 10−11 for some constant M . (Hint: use a bound on the Lagrange KKT multipliers.)
11.6
Notes and Sources
11.1. The material in this section is from Laraki and Lasserre (2008a). Determining an index i ∈ N such that fi approximates fb up to some prescribed error ε > 0 is an open problem. For computing the convex envelope of a function f on a bounded domain, Brighi and Chipot (1994) propose triangulation methods and provide piecewise degree-1 polynomial approximations fh ≥ fb, and derive estimates of fh − fb (where h measures the size of the mesh). Another possibility is to view the problem as a particular instance of the general moment problem (like we do here), and use geometrical approaches as described in e.g. Anastassiou (1993) or Kemperman (1987); but, as acknowledged in the latter references, this approach is only practical for say, the univariate or bivariate cases. 11.2. Most of the material of that section is from Lasserre (2009b) and Lasserre (2009c). Theorem 11.11 is from Helton and Nie (2010). Theorem 11.13 is from Lasserre (2009a) where algebraic certificates of convexity are discussed. The issue of characterizing SDr convex sets was raised in e.g. Ben-Tal and Nemirovski (2001) and Helton and Vinnikov (2007) who first show that rigid convexity is a necessary condition, also sufficient in dimension 2; this latter property is related to the Lax conjecture proved in Lewis et al. (2005). Parrilo (2007) exhibited specific semidefinite representations for genus-0 plane curves. In Helton and Nie (2010) and Helton and Nie (2009a) one may find several sufficient conditions (typically second order positive curvature conditions) for existence of semidefinite representations for basic semi-algebraic sets K (as well as for co(K) when K is not convex).
11.6 Notes and Sources
241
They also provide a nice technique to construct an explicit semidefinite representation for the convex hull co(∪pi=1 Wi ) where each Wi already has a semidefinite representation. The authors even conjecture that every basic semi-algebraic convex set K ⊂ Rn has a semidefinite representation. Finally, Chua and Tuncel (2008) consider even more general lifted conic representations of convex sets, called lifted G-representations (SDr being a special case) and discuss various geometric properties of convex sets admitting such lifted G-representations, as well as measures of ”goodness” for such representations.
Chapter 12
Multivariate Integration
This chapter considers the problem of approximating multivariate integrals of rational functions or exponentials of a polynomial, on a compact basic semialgebraic set. We also describe the moment approach as a tool for evaluating gradients or Hesssians in the maximum entropy approach for estimating an unknown density based on the knowledge of its moments.
Multivariate integrals arise in statistics, physics, and engineering and finance applications among other areas. For instance, such integrals are needed to compute probabilities over compact sets for multivariate normal random variables, and therefore, it is important to be able to approximate such integrals. In this chapter we are interested in the following two problems: • Given a basic semi-algebraic set K ⊂ Rn , a finite Borel measure µ on K R with all moments available, compute (or approximate) the integral K f dµ where f is a rational function p/q, with p, q ∈ R[x] and q > 0 on K. • Given a polynomial p ∈ R[x], compute (or approximate) the integral R exp p(x) dx where B ⊂ Rn is a simple set like e.g., the box [−1, 1]n B or a simplex of Rn . As an application, we then apply the moment approach described for the latter problem to compute gradient and Hessian data needed in maximum entropy estimation. 12.1
Integration of a Rational Function
Let K ⊂ Rn be a compact basic semi-algebraic set as defined in (4.1), and let µ be a given finite Borel measure on K. As µ is supported on a compact 243
12 Multivariate Integration
244
set, it is moment determinate,1 and so we assume that we are are given all its moments z = (zα )α∈Nn . Typically, µ is a convex combination of standard distributions (uniform, (truncated) exponential, ...) on K, so that in principle its moments can be obtained relatively easily. With f := p/q where p, q ∈RR[x] and q > 0 on K, one goal is to compute (or approximate) the integral K f dµ, and more generally, to approximate finitely many moments of the new measure dν := q −1 dµ. In the univariate case, a classical related problem is the computation of orthogonal polynomials2 of dν which is called the modified measure. Lemma 12.1. Let K ⊂ Rn be compact, and let q ∈ R[x] with q > 0 on K. Let µ be a given finite Borel measure with support contained in K. Then the only finite Borel measure ϕ with support in K, solving the system of linear equations Z Z xα q dϕ = xα dµ, ∀ α ∈ Nn , (12.1) K
K
−1
is dϕ = q dµ. In particular, q ϕ with respect to µ.
−1
is the Radon-Nikodym derivative [ dϕ dµ ] of
Proof. The linear system (12.1) has the particular solution dϕ = q −1 dµ which has all its moments well defined because q > 0 on K and µ is finite with compact support. Moreover, Z Z Z xα q dϕ = xα qq −1 dµ = xα dµ, ∀α ∈ Nn , K
K
K
and so (12.1) holds. We next prove that dϕ = q −1 dµ is the only finite Borel measure solution of (12.1). So let (12.1) hold for some finite Borel measure ϕ with support in K, and define ψ to be the Borel measure qdϕ (recall that q > 0 on K), with all moments finite because K is compact. We thus have Z Z xα dψ = xα dµ, ∀α ∈ Nn . K
K
Recall that Borel measures with compact support are moment determinate because by Weierstrass theorem, the space of polynomials is dense (for the 1 See
Definition 3.2. family of univariateZpolynomials (p0 , .Z. . , pd ) ⊂ R[x] is orthogonalZ with respect to some measure ν on R if pk pt dν = δk=t , xk pt dν = 0 if k < t, and xt pt dν > 0, 2A
for all 0 ≤ k, t ≤ d. This definition extends to the multivariate case; see Helton et al. (2008).
12.1 Integration of a Rational Function
245
sup-norm) in the space of continuous functions on K; therefore ψ = µ. As ψ has compact support, let us define dν := q −1 dψ, also well-defined because q −1 > 0 on K, and ψ has its suppport contained in K. Therefore, on the one hand, Z Z Z Z xα dν = xα q −1 dψ = xα q −1 q dϕ = xα dϕ, ∀ α ∈ Nn K
K
K
K
(so that ν = ϕ), whereas on the other hand, from ψ = µ, Z Z Z xα dν = xα q −1 dψ = xα q −1 dµ, ∀ α ∈ Nn . K
K
K
Again, as Borel measures with compact support are moment determinate, it follows that dϕ = q −1 dµ, the desired result. We first consider the multivariable case n > 1 and then specialize to the one-dimensional case n = 1. 12.1.1
The multivariable case
Recalling that z = (zα ) is the vector of moments of µ, consider the infinitedimensional linear programs: Z ρ1mom := sup p dϕ ϕ∈M (K)+ ZK (12.2) s.t. q xα dϕ = zα , ∀ α ∈ N, K
and ρ2mom :=
inf
ϕ∈M (K)+
s.t.
Z
ZK K
p dϕ (12.3) q xα dϕ = zα ,
∀ α ∈ N.
By Lemma 12.1, both linear programs (12.2) and (12.3) have the unique solution dϕ = q −1 dµ and so, Z p ρ1mom = ρ2mom = dµ. K q Depending on parity, let deg q = 2v0 or 2v0 −1, and with K ⊂ Rn as in (4.1) let deg gj = 2vj or 2vj − 1, for all j = 1, . . . , m. Consider the semidefinite
12 Multivariate Integration
246
programs: ρ1i := sup Ly (p) y
s.t. Mi (y) 0 Mi−vj (gj y) 0, j = 1, . . . , m Ly (q xα ) = zα , |α| ≤ 2i − deg q,
(12.4)
and ρ2i := inf Ly (p) y
s.t. Mi (y) 0 Mi−vj (gj y) 0, j = 1, . . . , m Ly (q xα ) = zα , |α| ≤ 2i − deg q.
(12.5)
The sequence (ρ1i ) (resp. (ρ2i )) is monotone non increasing (resp. monotone non decreasing).
Theorem 12.2. Let K ⊂ Rn be as in (4.1) and Assumption 2.1 hold. Consider the semidefinite programs (12.4) and (12.5). Then, as i→∞, Z Z p p 1 2 ρi ↓ dµ and ρi ↑ dµ. (12.6) q K K q
Proof. The semidefinite program (12.4) (resp. (12.5)) is the semidefinite relaxation (4.22) associated with the generalized moment problem (12.2) (resp. (12.3)), with countably many moment constraints. The assumptions of Theorem 4.3 are satisfied, and so (12.6) holds. 12.1.2
The univariate case
We now consider the case n = 1. Let s be the degree of the polynomial q ∈ R[x], and let I ⊂ R[x] be the ideal generated by q. So, for every k = 0, 1, . . ., write x 7→ xk = rk (x) + hk (x)q(x),
k = 0, 1, . . .
(12.7)
for some hk ∈ R[x] and rk ∈ R[x]/I. Recall that R[x]/I is a R-vector space of dimension s, with basis 1, x, . . . , xs−1 . Therefore, rk has degree less than
12.1 Integration of a Rational Function
247
or equal to s − 1, and reads x 7→ rk (x) =
s−1 X
rkj xj ,
j=0
for some vector of coefficients rk ∈ Rs . Next, let µ be a finite mesure on a compact basic semi-algebraic set K ⊂ R, with sequence of moments z, and consider the sequence y = (yk ) defined as follows: yk = Ly (rk ) + Ly (hk q) =
s−1 X
rkj yj + Lz (hk ),
k = 0, 1, . . . , (12.8)
j=0
with rk , hk ∈ R[x] as in (12.7) and where we use the constraint Ly (hk q) = Lz (hk ). Hence the semidefinite relaxations (12.4) and (12.5) simplify because they involve only s variables, namely the unkown moments y0 , y1 , . . . , ys−1 . Indeed, the other moments yk with k ≥ s, are obtained via (12.8) as a linear combination of the first s moments y0 , . . . , ys−1 , plus the P constant term Lz (hk ) = j hkj zj . With K ⊂ R being the interval [a, b] (with a, b ∈ R), let g1 , g2 ∈ R[x] be the polynomials x 7→ g1 (x) := b−x and x 7→ g2 (x) := x−a. With i ≥ deg p, consider the univariate semidefinite relaxation analogues of (12.4)-(12.5): ρ1i :=
sup y0 ,...,ys−1
Ly (p)
s.t. Mi (y) 0, Mi−1 (g1 y), Mi−1 (g2 y) 0
(12.9)
and ρ2i :=
inf
y0 ,...,ys−1
Ly (p)
s.t. Mi (y) 0, Mi−1 (g1 y), Mi−1 (g2 y) 0,
(12.10)
where in the Hankel (moment) matrix Mi (y) and localizing matrix Mi−1 (gj y), j = 1, 2, each entry yk with k ≥ s, is replaced with the expression (12.8), affine in y0 , . . . , ys−1 . We then get the more specialized result:
12 Multivariate Integration
248
Theorem 12.3. Let K ⊂ R be the interval [a, b], and let f := p/q with p, q ∈ R[x], and q > 0 on K. Let µ be a finite Borel measure on K, with sequence of moments z, and consider the semidefinite relaxations (12.9) and (12.10). Then as i→∞, Z b Z b 1 2 ρi ↓ f dµ and ρi ↑ f dµ. a
a
It is important to emphasize that in the generic case where q −1 has only simple poles, one may decompose q −1 into a sum of elementary real fractions with denominators of degree at most 2. In this generic quadratic case, the semidefinite relaxations defined in (12.9) and (12.10) have only two variables, namely y0 and y1 ! Example 12.1. Let K ⊂ R be the interval [a, b], with a, b > 0, and let p = 1 and q = x, so that f = 1/x. Hence in (12.7) we have xk = 0 + xk−1 x,
k = 1, . . . ,
and so rk = 0, and hk = xk−1 , for all k = 1, . . .. Let µ be the uniform distribution on [a, b], and so Z b yk = xk dx = (bk+1 − ak+1 )/(k + 1), k = 0, 1, . . . a
R Rb We want to approximate f dµ = Ly (1) = a x−1 dx = ln b/a. The equation Ly (qxk ) = Lz (xk ) yields yk+1 = (bk+1 − ak+1 )/(k + 1),
k = 0, 1, . . .
and so, the semidefinite relaxations (12.9) and (12.10) contain the single variable y0 only! For instance: y0 b−a M1 (y) = ; b − a (b2 − a2 )/2 y0 b−a (b2 − a2 )/2 M2 (y) = b − a (b2 − a2 )/2 (b3 − a3 )/3 . 2 2 (b − a )/2 (b3 − a3 )/3 (b4 − a4 )/4
With a = 1, b = 2, we observe a very fast convergence of both upper and lower bounds to the exact value Ly (1) = ln 2 ≈ 0.693147; see Table 12.1.
12.1 Integration of a Rational Function Table 12.1 Example 12.1 : bounds ρ1i , ρ2i .
249
Upper and lower
i
1
2
3
4
ρ1i
0.7000
0.693333
0.693152
0.693147
ρ2i
0.6666
0.692307
0.693122
0.693146
Example 12.2.RHere p = 1 and q R= 1+x2 , so that f = 1/(1+x2 ). We want b to approximate f dµ = Ly (1) = a (1 + x2 )−1 dx = arctan(b) − arctan(a). Using (12.8), the equation Ly (qxk ) = Lz (xk ) yields yk+2 = Ly (rk ) + Lz (hk ),
k = 0, 1, . . .
In particular, r2k (x) = (−1)k and r2k+1 (x) = (−1)k x for all k = 1, . . ., and so, for instance, the moment matrix M2 (y) reads y0 y1 −y0 + (b − a) . M2 (y) = y1 −y0 + (b − a) −y1 + (b2 − a2 )/2 2 2 3 3 −y0 + (b − a) −y1 + (b − a )/2 y0 + (b − a )/3 + (a − b) R2 With, a = 1, b = 2 one has 1 f dx = Ly (1) = arctan(2) − arctan(1) ≈ 0.3217505, and we obtain again a very good approximation with few moments; see Table 12.2. Table 12.2 Example 12.2 : Upper and lower bounds ρ1i , ρ2i . i
1
2
3
4
ρ1i
1.000
0.325
0.321785
0.321757
ρ2i
0.000
0.300
0.321260
0.321746
Example 12.3. Consider the Chebyshev measure dµ = (1 − x2 )R−1/2 dx 1 on the interval [−1, 1], and suppose that we want to approximate −1 (1 + x2 )−1 dµ. Results are displayed in Table 12.3. Notice that we already obtain very good bound with i = 6, i.e., with only 12 moments. The largest
12 Multivariate Integration
250
Table 12.3 ρ1i , ρ2i .
Example 12.3 : Upper and lower bounds
i
6
7
8
9
ρ1i
2.222102
2.221461
2.221461
2.221442
ρ2i
2.221328
2.221328
2.221438
2.221438
Table 12.4 formula.
Example 12.3 : Gauss-Kronrod quadrature
n=2
n=3
n=4
n=5
2.22529479
2.22155480
2.22144480
2.22144156
semidefinite relaxation (12.9) with i = 9 has 2 variables and an LMI matrix size of 9×9. On the other hand, we have also computed another estimate via the Gauss-Kronrod quadrature formula which has 2n + 1 nodes (including ±1) and is exact for polynomials up to degree 4n − 1; see (Gautschi, 1997, p. 167). The estimate with n = 5 gives the value 2.22144156 which, as expected, is between our upper and lower bounds.
12.2
Integration of Exponentials of Polynomials
Consider the following class of multivariate exponential integrals: ρmom =
Z
g(x) eh(x) dx,
(12.11)
K
where x ∈ Rn , g, h ∈ R[x], and K ⊂ Rn is a simple set like e.g. the Qn box i=1 [ai , bi ], or a simplex of Rn . For clarity of exposition, we will only describe the approach for simple two-dimensional integrals on a box [a, b] × [c, d] ⊂ R2 . The multivariate case n ≥ 3 essentially uses the same machinery but with more complicated and tedious notation.
12.2 Integration of Exponentials of Polynomials
12.2.1
251
The moment approach
Suppose that one wants to approximate: ρmom =
Z
g(x, y) eh(x,y) dy dx
(12.12)
K
where g, h ∈ R[x, y], and K = [a, b] × [c, d] ⊂ R2 . Consider the measure µ on R2 defined by µ(B) =
Z
eh(x,y) dy dx
∀ B ∈ B(R2 ),
(12.13)
xα y β eh(x,y) dy dx
(12.14)
K∩B
and its sequence of moments z = (zαβ ): zαβ =
Z
xα y β dµ(x, y) = K
Z
b a
Z
d c
for all (α, β) ∈ N2 . Clearly, ρmom = Lz (g), where Lz : R[x, y]→R is the usual linear functional f (=
X
fαβ xα y β )
7→
α,β
Lz (f ) :=
X
fαβ zαβ .
α,β
Therefore we can compute ρmom once we have all necessary moments z = (zαβ ). Integration by parts yields: zαβ =
1 β+1 −
1 β+1
If one writes h(x, y) = zαβ =
Z
P
b a
Z
h iy=d xα y β+1 eh(x,y) dx b
a
γ,δ
Z
y=c
d
xα y β+1 c
∂h(x, y) h(x,y) e dy dx. ∂y
hγδ xγ y δ then
dβ+1 cβ+1 vα − wα − β+1 β+1
X
(γ,δ)∈N2
δhγδ z(α+γ)(β+δ) β+1
(12.15)
where v = (vα ) and w = (wα ) are the moments of the measures dν = eh(x,d)dx and dξ := eh(x,c) dx on [a, b], respectively.
12 Multivariate Integration
252
Let k, l ∈ R[x] be the univariate polynomials x 7→ k(x) := h(x, d) and x 7→ l(x) := h(x, c). Integration by parts for vα yields: Z b 1 h α+1 k(x) ix=b vα = xα eh(x,d)dx = x e α+1 x=a a Z b 1 − xα+1 k 0 (x) ek(x) dx ∀α ∈ N, α+1 a or, equivalently, vα =
bα+1 ek(b) aα+1 ek(a) X t kt − − vα+t α+1 α+1 α+1 t∈N
∀α ∈ N,
(12.16)
where (kt ) is the coefficient vector of the polynomial k ∈ R[x] of degree kx . Similarly: wα =
bα+1 el(b) aα+1 el(a) X t lt − − wα+t α+1 α+1 α+1 t∈N
∀α ∈ N,
(12.17)
where (lt ) is the coefficient vector of the polynomial l ∈ R[x] of degree lx . In view of (12.16) and (12.17), all moments vα and wα are affine functions of v0 , . . . , vkx −1 , and w0 , . . . , wlx −1 , respectively. The moment approach Notice that in (12.12), the quantity ρmom to approximate, is a linear combination Lz (g) of moments of µ. Then, to compute upper and lower bounds on ρmom , it suffices to build up two hierarchies of semidefinite programs Qui and Qli , as follows. Consider the vectors of moments v, w and z, up to order 2i, i ∈ N. • The linear constraints of both Qui and Qli are obtained from the equations (12.15), (12.16), and (12.17), that contain only moments of order up to 2i. • The linear matrix inequality constraints of both Qui and Qli state necessary conditions on v, w and z, to be moments of some measures supported on [a, b], [a, b] and [a, b] × [c, d], respectively. Then the semidefinite program Qui (resp. Qli ) maximizes (resp. minimizes) the linear criterion Lz (g) under the above constraints. Both are semidefinite relaxations of the original problem (12.12), and so their respective optimal values ρ1i and ρ2i provide upper and lower bounds on ρmom . In addition, the
12.2 Integration of Exponentials of Polynomials
253
quality of both bounds increases with i because more and more constraints (12.15), (12.16), and (12.17), are taken into account.
12.2.2
Semidefinite relaxations
The support of the measure µ is the semiagebraic set {(x, y) ∈ R2 : θi (x, y) ≥ 0, i = 1, 2}, where θ1 , θ2 ∈ R[x, y] are the polynomials (x, y) 7→
θ1 (x, y) := (b − x)(x − a) . θ2 (x, y) := (d − y)(y − c)
As θ1 , θ2 are both quadratic, the necessary conditions for moment and localizing matrices associated with z read: Mi (z) 0,
Mi−1 (θk z) 0,
k = 1, 2.
(12.18)
Next, both measures ν and ξ are supported on the set {x ∈ R : θ3 (x) ≥ 0}, with x 7→ θ3 (x) := (b − x)(x − a), and so, analogues of (12.18) can be derived for v and w. In addition, obvious bounds are available for z0 , v0 and w0 . Namely: v0 ≤ M1 := (b − a) supx∈[a,b] e|h(x,d)| w0 ≤ M2 := (b − a) supx∈[a,b] e|h(x,c)| z0 ≤ M3 := (b − a)(d − c) sup
(x,y)∈[a,b]×[c,d]
e|h(x,y)|
.
(12.19)
Combining these necessary conditions and the linear relations for z, v, and w in (12.15), (12.16), and (12.17), one obtains an upper bound ρ1i and a lower bound ρ2i for ρmom by solving the following semidefinite programs: ρ1i (resp. ρ2i ) = sup (resp. inf ) Lz (g) z,v,w
s.t.
z,v,w
v 0 ≤ M 1 ; w0 ≤ M 2 ; z0 ≤ M 3
Mi (z) 0, Mi−1 (θj z) 0, j = 1, 2 Mi (v) 0, Mi−1 (θ3 v) 0, Mi (w) 0, Mi−1 (θ3 w) 0,
12 Multivariate Integration
254
zαβ =
dβ+1 β+1 vα
−
cβ+1 β+1 wα
−
X
(γ,δ)∈N2
δhγδ z , β + 1 (α+γ)(β+δ)
∀(α, β) ∈ N2 : α + γ + β + δ ≤ 2i. vα =
bα+1 ek(b) α+1
−
aα+1 ek(a) α+1
−
∀α ∈ N : α + kx ≤ 2i. wα =
bα+1 el(b) α+1
X t kt vα+t , α+1 t∈N
aα+1 el(a) α+1
−
∀α ∈ N : α + lx ≤ 2i.
−
X t lt wα+t , α+1 t∈N
(12.20)
The sequence (ρ1i ) (resp. (ρ2i )) is monotone non increasing (resp. monotone non decreasing).
Theorem 12.4. Let ρ1i and ρ2i be as in (12.20) with Mj as in (12.19), j = 1, 2, 3. Then as i→∞, Z bZ d Z bZ d ρ1i ↓ g(x, y) eh(x,y)dy dx and ρ2i ↑ g(x, y) eh(x,y)dy dx. a
c
a
c
Theorem 12.4 states that one can obtain an arbitrarily close approximation of ρmom by solving two hierarchies of semidefinite programs. 12.2.3
The univariate case
In this section we want to approximate the one-dimensional integral ρmom =
Z
1
g(x) eh(x) dx,
(12.21)
0
Pd j where x ∈ R, g, h ∈ R[x]. Write x 7→ h(x) = j=0 hj x (with hd 6= 0), and let y = (yk ) be the sequence of moments of the measure µ on [0, 1] with density eh(x) with respect to the Lebesgue measure. Integration by parts yields: Z
1
k
0
x h (x) e 0
h(x)
h
k
dx = x e
h(x)
i1 0
−k
Z
xk−1 eh(x) dx,
(12.22)
12.2 Integration of Exponentials of Polynomials
255
for all k = 0, 1, . . .. Therefore, d hd yd−1 = e
h(1)
−e
h(0)
−
d−1 X
jhj yj−1
(12.23)
j=1
and for every k = 0, . . . d hd yd+k = eh(1) − (k + 1)yk −
d−1 X
jhj yj+k ,
(12.24)
j=1
which shows that every moment yk with k ≥ d − 1 is a linear combination of y(d) := (y0 , . . . , yd−2 ), i.e., yk = hvk , y(d)i,
k ≥ d − 1,
(12.25)
for some vector vk ∈ Rd−1 , obtained from (12.23)-(12.24). Moreover, y0 ≤ M := sup { e|h(x)| : x ∈ [0, 1] }.
(12.26)
With i ≥ max[deg g, d], consider the semidefinite program: ρ1i (resp. ρ2i ) = sup (resp. inf ) Ly (g) y(d)
y(d)
s.t. y0 ≤ M Mi (y) 0 Mi−1 (x(1 − x) y) 0 yk = hvk , y(d)i, k = d − 1, . . . , 2i.
(12.27)
Corollary 12.5. Let ρ1i and ρ2i be as in (12.27) with M as in (12.26). (a) Then as i→∞, Z 1 Z 1 g(x) eh(x) dx and ρ2i ↑ g(x) eh(x) dx. ρ1i ↓ 0
0
1
2
(b) Let y (i) (resp. y (i)) be an arbitrary optimal solution of the semidefinite programs (12.27) (garanteed to exist). Then for every k ∈ N, and as i→∞, Z 1 Z 1 1 k h(x) 2 yk (i) → x e dx. and yk (i) → xk eh(x) dx. 0
0
12 Multivariate Integration
256
Proof. We briefly sketch the proof. Let y be a feasible solution of (12.27). Because of y0 ≤ M and Mi (y), Mi−1 (x(1−x)y) 0, one has |yk | ≤ y0 ≤ M for every k = 0, . . . , 2i. Hence, the feasible set is compact (as bounded and closed) and so, there is an optimal solution that we denote by y 1 (i) in the ”sup” case and y2 (i) in the ”inf” case. By completing with zeros, one may consider yj (i) as an element of l∞ , j = 1, 2, and even as an element of the M -ball of l∞ , which is compact in the weak ? topology σ(l∞ , l1 ). Therefore, there are subsequences (ip ) ⊂ N, (iq ) ⊂ N, and vectors y1 , y2 ∈ l∞ such that, as p and q→∞, yj1 (ip ) → yj1
and yj2 (iq ) → yj2 ,
∀j = 0, 1, . . .
(12.28)
From the above convergence it follows that for every i ∈ N, Mi (yk ) 0 and Mi−1 (x(1 − x)yk ) 0, k = 1, 2, which in turn implies that y1 (resp. y2 ) is the moment sequence of a measure µ1 (resp. µ2 ) on [0, 1]. In addition, k with yk (d) := (y0k , . . . , yd−2 ) ∈ Rd−1 , k = 1, 2, and from the convergence (12.28), on obtains: yj1 = hvj , y1 (d)i;
yj2 = hvj , y2 (d)i,
∀j ∈ N,
that is, (12.25) holds for y1 and y2 . However, one may show that dµ = eh(x) dx is the only measure on [0, 1] whose moments y satisfy (12.25). Hence µ1 = µ2 = µ, which shows that in fact the whole sequences y 1 (i) and y2 (i) both converge to the same limit y1 = y2 = y, and so (b) is proved. Finally (a) is an immediate consequence of (b). Notice that in the univariate case, one solves two hierarchies of semidefinite relaxations which contain d − 1 variables only, as opposed to the multivariate case where the number of variables increases in the hierarchy. 12.3
Maximum Entropy Estimation
As a particular application of the above methodology, consider the maximum entropy estimation which is concerned with the following problem: Let f ∈ L1 ([0, 1])3 be a nonnegative function only known via the first 2d + 1 moments of its associated measure dµ = f dx on [0, 1]. From that partial knowledge one wishes (a) to provide an estimate fd of f such that the first 2d + 1 moments of fd dx match those of f dx, and (b) analyze 3 L ([0, 1]) denote the Banach space of integrable functions on the interval [0, 1] of 1 R the real line, equipped with the norm kf k = 01 |f (x)| dx.
12.3 Maximum Entropy Estimation
257
the asymptotic behavior of fd when d→∞. This problem has important applications in various areas of physics, engineering, and signal processing in particular. An elegant methodology is to search for fd in a (finitely) parametrized family {fd (λ, x)} of functions, and optimize over the unknown parameters λ via a suitable criterion. For instance, one may wish to select an estimate fd that maximizes some appropriate entropy. Several choices of entropy functional are possible as long as one obtains a convex optimization problem in the finitely many coefficients λi ’s. In this section, one chooses the Boltzmann-Shannon entropy, in which case the optimal estimate fd∗ is known to be the exponential of some polynomial of degree 2d, whose coefficient vector λ ∈ R2d+1 is then an optimal solution of a convex optimization problem. Therefore, results of the previous section can be used in any maximizing entropy algorithm to provide a numerical method to evaluate both gradient and Hessian of the function R1 λ 7→ 0 fd (λ, x)dx, at each current iterate λ. Its distinguishing feature is to avoid computing orthonormal polynomials of the measure fd (λ; x)dx. 12.3.1
The entropy approach
Consider the problem of estimating an unknown density f : [0, 1]→ R+ , based on the knowledge of its first 2d + 1 moments, y = (y0 , . . . , y2d ) only. That is, yj =
Z
1
xj f (x) dx,
j = 0, . . . , 2d,
0
where in general y0 = 1 (as f is a density of some probability measure µ on [0, 1]). The entropy approach is to compute an estimate fd that maximizes some appropriate entropy, e.g., the usual Boltzmann-Shannon entropy H : L1 ([0, 1])→R ∪ {−∞}: g 7→ H [g] := −
Z
1
g(x) ln g(x) dx, 0
a strictly concave functional. Therefore, the problem reduces to: sup g
H [g] :
Z
1
j
x g(x) dx = yj , 0
j = 0, . . . , 2d
.
(12.29)
12 Multivariate Integration
258
The structure of this infinite-dimensional convex optimization problem permits to search for an optimal solution g = fd∗ of the form x 7→ fd∗ (x) = exp
2d X
λ∗j xj ,
j=0
where λ∗ ∈ R2d+1 is unique when fd∗ has to satisfy the constraints of (12.29). This is because the Legendre-Fenchel conjugate4 ψ ∗ of the function x ln x if x > 0 x 7→ ψ(x) := 0 if x = 0 +∞ if x < 0 is z 7→ ψ ∗ (z) := exp (z − 1). And so, the conjugate functional H is Z 1 g 7→ H ∗ (g) = exp (g(x) − 1)dx.
∗
of −H
0
Therefore, given λ ∈ R
2d+1
, define fd (λ, ·) : R+ → R to be:
x 7→ fd (λ, x) := exp
2d X
λ j xj ,
j=0
x ∈ R.
(12.30)
Problem (12.29) reduces to the concave finite-dimensional optimization problem: Z 1 P: sup hy, λi − fd (λ, x) dx . λ∈R2d+1
0
gd∗
Notice that solving P is just evaluating at the point y, where gd∗ is the Legendre-Fenchel transform of the function gd : R2d+1 → R, Z 1 λ 7→ gd (λ) := fd (λ, x) dx, (12.31) 0
i.e., the mass of the measure dµd := fd (λ, x)dx on [0, 1]. Lemma 12.6. Let fd , gd be defined as in (12.30) and (12.31) respectively, and let z(λ) = (zk (λ)) ⊂ R be the sequence: 4 Let f : Rn → R ∪ {+∞} with f 6≡ +∞ be such that there is an affine function minorizing f on Rn . Then the function λ 7→ f ∗ (λ) := sup {λ0 x − f (x) : x ∈ Rn } is called the Legendre-Fenchel conjugate of f . See e.g. (Hiriart-Urruty and Lemarechal, 1993, p. 37).
12.3 Maximum Entropy Estimation
zk (λ) :=
Z
1
xk fd (λ, x) dx,
259
k = 0, 1, . . .
0
Then gd is convex with gradient ∇gd = (∂gd /∂λk )k given by λ 7→
Z
∂gd (λ) = ∂λk
1
xk fd (λ, x) dx = zk (λ),
0
∀ k = 0, . . . , 2d,
(12.32)
2
gd )j,k given by and Hessian ∇2 gd = ( ∂λ∂j ∂λ k
λ 7→
12.3.2
∂ 2 gd (λ) = ∂λj ∂λk
Z
1
xj+k fd (λ, x) dx = zj+k (λ),
0
∀ j, k = 0, . . . , 2d. (12.33)
Gradient and Hessian computation
Notice that computing the first 2d + 1 moments z(λ) of the measure dµd = fd (λ, x)dx yields the gradient ∇gd (λ), which permits to implement a first-order minimization algorithm. Computing an additional 2d moments provides us with the Hessian ∇2 gd (λ) as well. For instance, to solve P one may wish to implement Newton’s method, which in view of (12.32)-(12.33), yields the iterates λ(k+1) = λ(k) + [∇2 gd (λ(k) ) ]−1 ˆ z(λ(k) ),
k = 0, 1, . . .
(12.34)
where zˆ(λ(k) ) = y − z(λ(k) ). In fact, one only needs compute the vector zd (λ) = (z0 (λ), . . . , z2d−2 (λ)) of first 2d−1 moments because any other moment zk (λ) with k ≥ 2d−1, can be expressed from zd (λ) at no additional cost. This follows from (12.23)(12.24) in substituting eh(x) with fd (λ, x). One possibility is to run 2d−1 semidefinite relaxations (12.27) with x 7→ g(x) := xk to get an approximation of zk (λ) for every k = 0, . . . , 2d−2. But it is better to run a single semidefinite relaxation (12.27) (which maximizes some criterion Lz (g)) for some index i sufficiently large. Indeed by Corollary 12.5(b), for every k = 0, 1, . . ., zk1 (i)→
Z
1
k
x e 0
h(x)
dx =
Z
1 0
xk fd (λ, x) dx = zk (λ),
12 Multivariate Integration
260
Table 12.5 k
Example 12.4 : f (x) = (1 + x)−1 and d = 1. coefficients λ(k) ∈ R3
moments
0 1 2 8 9 10
111.5 41.28 15.48 .7023 .6935 .6931
98.24 36.25 13.45 .3149 .3071 .3068
88.69 32.69 12.10 .2005 .1934 .1931
?
.6931
.3069
.1931
1.0000 0.3323 0.0569 0.0125 -0.0049 -0.0055
1.000 0.2306 -1.4160 -1.0867 -0.9257 -0.9205
5.000 5.4424 6.3745 0.4837 0.2461 0.2383
as i→∞ (where z1 (i) is an optimal solution of the semidefinite relaxation (12.27)). Example 12.4. Let x 7→ f (x) := (1 + x)−1 be the unknown density. With d = 1 (i.e. λ ∈ R3 ) and i = 5 in (12.27), i.e., with at most 10 moments, results are displayed in Table 12.5. The last line with a ”?” displays the exact moments. Very good results are obtained after 10 iterations only of Newton’s method (12.34). Gradient components are about O(10−5 ) or O(10−6 ), and the first seven moments of the measure dµ = fd (λ, x)dx read: (0.6931, 0.3068, 0.1931, 0.1402, 0.1099, 0.0903, 0.0766), a fairly good approximation. Finally, results with d = 2 (i.e. with λ ∈ R5 ), are displayed in Table 12.6, again with very good results. However, we had to set i = 7 (i.e. 14 moments) for the semidefinite relaxations (12.27). In case the Hessian would be ill-conditioned when close to an optimal solution, more sophisticated second-order methods, or even firstorder methods might be preferable for larger r. At iteration 8 the gradient is O(10−10 ). Figure 12.1 displays fd − f on [0, 1] because both curves of fd and f are almost indistinguishable. Indeed, the scale in Figure 12.1 is 10−4 .
12.4
Summary
In this chapter, we have considered the moment approachR to solve (or at least approximate) two types of multivariate integrals J = K f dµ: Namely, • When K ⊂ Rn is a basic compact semi-algebraic set, f is a rational
12.4 Summary
Table 12.6 k 0 1 2 4 8
261
Example 12.4 : f (x) = (1 + x)−1 and d = 2. coefficients λ(k) ∈ R5
moments 8.2861 3.3332 1.5477 0.7504 0.6931
6.1806 2.3940 1.0112 0.3588 0.3069
5.1359 1.9635 0.8010 0.2408 0.1931
4.4543 1.6918 0.6781 0.1842 0.1402
3.9579 1.4974 0.5939 0.1507 0.1098
0.7500 0.2198 0.0171 0.0195 -0.0001
0.7500 -0.0129 -0.6777 -1.5589 -0.9958
0.7500 0.7500 0.7500 0.9905 0.7967 0.7684 0.3247 1.2359 0.8959 4.0336 -8.0020 5.5820 0.4658 -0.2191 0.0562
? 0.6931 0.3069 0.1931 0.1402 0.1098
x 10
exp(−1/10000−9973/10000 x+949/2000 x2−589/2500 x3+331/5000 x4)−1/(1+x)
0
0.1
−4
1.5
1
0.5
0
−0.5
−1 0.2
0.3
Fig. 12.1
0.4
0.5 x
0.6
0.7
0.8
0.9
1
Example 12.4, d = 2: fd − f on [0, 1].
function p/q with q > 0 on K, and µ is a given measure on K whose sequence of moments is known and available. • When K is a simple set like e.g. the unit box [−1, 1]n ⊂ Rn , µ is the Lebesgue measure, and f is the exponential of some given polynomial. In both cases we have defined a hierarchy of semidefinite relaxations that provide upper and lower bounds converging to the exact value J. Finally, we have also considered the maximum entropy approach to estimate an unknown density f ∈ L1 (K) from the knowledge of its moments.
12 Multivariate Integration
262
When K is a simple set, then the approximation scheme developed earlier permits to evaluate gradient and Hessian data needed in an optimization algorithm for maximum entropy.
12.5
Exercises
Exercise 12.1. Let n = 1, a > 0, and with the Gaussian density R a x 7→ 2 f (x) ::= exp(−x /2), one wishes to approximate the integral 0 f (x)dx. Apply the semidefinite relaxations (12.27) and compare with approximations already available in some Tables.
12.6
Notes and Sources
12.1 Most of the material inR this section is from Lasserre (2009d). For a numerical approximation of f dµ, the most popular methods are Monte Carlo schemes and (at least in the one-dimensional case n = 1) generalizations of Gauss quadrature formulas, i.e., formulas which integrate exactly in space of Laurent polynomials or more general rational functions. The former approach is described in Niederreiter (1992) and the latter procedure is standard. One has first to compute the recurrence coefficients for orthogonal polynomials associated with µ, and then compute the nodes and weights from these coefficients. However, as noted in Deun et al. (2005) even for the one-dimensional case, if the (recurrence) coefficients are not known explicitly then problems arise in their computation. This is also true even for polynomial quadrature formulae with arbitrary measures; see (Deun et al., 2005, p. 1). An exception is precisely in (Deun et al., 2005, §3) where explicit recurrence coefficients are provided for the case of so-called Chebyshev weight functions. On the other hand, if one uses standard Gaussian quadrature (or cubature if n > 1) formulas exact for polynomials, then one has only an estimate and some error occurs. In the one-dimensional case, another possibility is to use modification algorithms for the previously mentioned modified measure f dµ, get orthogonal polynomials, and then the first moment of f dµ. 12.2 Most of the material of this section and in particular, Theorem 12.4 and Corollary 12.5, are from Bertsimas et al. (2008) where in fact, K ⊂ R n
12.6 Notes and Sources
263
can also be the more general basic semi-algebraic set: {x ∈ Rn : bl1 ≤ x1 ≤ bu1 , bli (x[i − 1]) ≤ xi ≤ bui (x[i − 1]),
∀i = 2, . . . , n},
where x[i] ∈ Ri is the vector of first i elements of x for all i = 1, . . . , n, bli , bui ∈ R[x[i − 1]] for all i = 2, . . . , n, and bl1 , bu1 ∈ R. 12.3 Most of the material is from Lasserre (2007a). For more details on the maximum entropy approach, the interested reader is referred to e.g. Byrnes and Lindquist (2006), Borwein and Lewis (1991b,a, 1992), Georgiou (2006), Mead and Papanicolaou (1984), and Tagliani (2002a,b). As early as in Mead and Papanicolaou (1984), it was recognized that such entropy methods may outperform classical Pad´e-like approximations. Except for the homotopy approach developed in Georgiou (2006), optimization algorithms using first or second order methods to search for λ, need evaluate the gradient ∇λ gd and Hessian ∇2λ gd at the current iterate λ. Of course, this can be done via some quadrature formula and a typical example of this approach is the Newton method described in Mead and Papanicolaou (1984). In principle, the quadrature formula should be with respect to the weight function and so, requires to repeatedly compute orthogonal polynomials with respect to the measure fd (λ, x)dx on [0, 1], not an easy task in general. This is why one rather uses some Gaussian quadrature formula with respect to dx on [0, 1] and incorporate fd (λ, •) in the integrand to evaluate. In doing so, one estimates both gradient and Hessian with some unknown error. However, very good numerical results may be obtained because Gaussian quadrature formula seem to perform well for integrating exponentials of polynomials, at least in the univariate case on [0, 1].
Chapter 13
Min-Max Problems and Nash Equilibria
This chapter considers some min-max problems, as well as the minimization of the supremum of finitely many rational functions on a compact basic semialgebraic set. Then this is used to compute (or approximate) the value of Nash equilibria for N-player finite games. We end up with applying the moment approach to polynomial games.
13.1
Robust Polynomial Optimization
Let K ⊂ Rn be a basic semi-algebraic set and consider the following minmax optimization problem:
f ∗ := inf
sup f (x, y)
x∈K y∈Ω(x)
(13.1)
where f ∈ R[x, y], and for each x ∈ K, the set Ω(x) ⊂ Rp is a convex polytope. Problem (13.1) is the robust version of the global polynomial optimization problem of Chapter 5. Indeed, one may interpret (13.1) as a game against nature ”y”, where one wishes to take an optimal robust decision x, where ”robust” means against the worst possible outcome of the opponent y ∈ Ω(x). We consider the case where when x is fixed: • The polynomial y 7→ fx (y) (:= f (x, y)) ∈ R[y] is affine, and • for every x ∈ K, the set Ω(x) is the convex polytope: Ω(x) := {y ∈ Rp : A(x) y = b(x) ; y ≥ 0 } 265
(13.2)
13 Min-Max Problems and Nash Equilibria
266
where for some s ∈ N, the mapping x 7→ A(x) ∈ Rs×p defines a matrix polynomial and the mapping x 7→ b(x) defines a vector polynomial, that is, A ∈ R[x]s×p and b ∈ R[x]p . Therefore, writing (x, y) 7→
f (x, y) = f0 (x) +
p X
fj (x) yj ,
j=1
Problem (13.1) reads: f ∗ = inf sup {f0 (x) + x∈K
y
p X j=1
fj (x) yj : A(x) y = b(x); y ≥ 0}.
Let Aj (x) denote the jth column of A(x). Standard LP-duality in the inner maximization problem, yields:
f ∗ = inf f0 (x) + u0 b(x) x,u
s.t. Aj (x)0 u − fj (x) ≥ 0, x ∈ K, u ∈ Rs .
j = 1, . . . , p.
(13.3)
Observe that for every j = 1, . . . , p, the function (x, u) 7−→ hj (x, y) := Aj (x)0 u − fj (x), is a polynomial in x and u. And so is the function (x, u) 7→ u0 b(x). In b ⊂ Rn × Rs defined by addition, the set K b := {(x, u) ∈ Rn × Rs : hj (x, u) ≥ 0, j = 1, . . . , p; x ∈ K } K
is a basic semi-algebraic set. Hence the min-max problem (13.1) reduces to solving the global polynomial optimization problem (13.3) in Rn × Rs . Therefore, the semidefinite relaxations defined in Chapter 5 can be applied. b is compact and Assumption 2.1 holds, their convergence In particular, if K is guaranteed by Theorem 5.6. Therefore, solving the robust optimization problem (13.1) reduces to solving a polynomial optimization problem in Rn+p , as opposed to Rn in the non robust setting. In what precedes, the decision x ∈ Rn lies in a basic semi-algebraic set K ⊂ Rn , whereas the ”perturbation” y lies in a nice convex set Ω(x). We next see the reverse situation.
13.1 Robust Polynomial Optimization
13.1.1
267
Robust Linear Programming
Consider now the following linear program: sup { c(x)0 y :
A y = b;
y
y ≥ 0 },
(13.4)
where y ∈ Rp is a decision vector while x ∈ Rn is a perturbation (taking values in a basic semi-algebraic set K ⊂ Rn ), unknown at the time where the decision y should be taken. Hence, now the decision y lies in a nice convex set Ω ⊂ Rp , whereas the perturbation x lies in a compact basic semi-algebraic set K ⊂ Rn . One further assumes that the reward mapping x 7→ c(x) is polynomial, that is, c ∈ R[x]p . Therefore, in this context, the robust LP problem reads:
ρ∗ = sup inf c(x)0 y,
(13.5)
y∈Ω x∈K
where Ω := {y ∈ Rp : A y = b; y ≥ 0}. Equivalently, ρ∗ = sup { λ : y ∈ Ω , λ,y
x 7→ c(x)0 y − λ > 0 on K }.
We next define appropriate semidefinite relaxations for problem (13.5). Let K ⊂ Rn be the basic semi-algebraic set in (4.1), and for 2i ≥ i0 := max[deg c, deg gj ], consider the hierarchy of semidefinite programs: ρi := sup λ λ,y,σj
s.t. A y = b ;
y≥0
c(x)0 y − λ = σ0 (x) +
m X
σj (x)gj (x),
j=1
(σj )m j=0 ∈ Σ(x) deg σ0 , deg σj + deg gj ≤ 2i,
∀ x ∈ Rn
(13.6)
j = 1, . . . , m.
That (13.6) is a semidefinite program is because writing x 7→ ck (x) = P α α ckα x for every k = 1, . . . , p, and recalling how the semidefinite relaxations introduced in Chapters 4 and 5 are formulated, the constraint 0
c(x) y − λ = σ0 (x) +
m X j=1
σj (x)gj (x),
∀ x ∈ Rn
268
13 Min-Max Problems and Nash Equilibria
is the same as stating (with Nn2i = {α ∈ N : |α| ≤ 2i}) p X
k=1
ckα yk − λ1{0} (α) = hBα , Xi +
m X j=1
hCjα , Zj i,
∀α ∈ Nn2i ; X, Zj 0,
for some appropriate real symmetric matrices (Bα , Cjα ), and where the unknowns are λ ∈ R, y ∈ Rp and the real symmetric matrices X, Zj 0. The sequence (ρi ) is obviously monotone nondecreasing and in addition, ρi ≤ ρ∗ for every i ≥ i0 . Theorem 13.1. Let K be as in (4.1) and satisfy Assumption (2.1). Let ρ∗ be as in (13.5) and assume that Ω = {y ∈ Rp : Ay = b, y ≥ 0} is compact. Consider the hierarchy of semidefinite programs defined in (13.6). Then ρi ↑ ρ∗ as i→∞. Proof. For every y ∈ Ω, let y 7→ ρ(y) := inf {c(x)0 y : x ∈ K} so that ρ∗ = sup {ρ(y) : y ∈ Ω}. Observe that necessarily, λ ≤ ρ(y) ≤ ρ∗ for every feasible solution (y, λ, (σj )) of (13.6). On the other hand, as K is compact and c ∈ R[x]p , ρ(y) is continuous; hence ρ∗ = ρ(y∗ ) for some y∗ ∈ Ω because Ω is compact. So with > 0 fixed, arbitrary, by Theorem 2.14, there exists s.o.s. polynomials (σj ) ⊂ Σ[x] such that c(x)0 y∗ − ρ∗ + = σ0 (x) +
m X j=1
σj (x)gj (x),
x ∈ Rn .
Hence for sufficiently large i, (y∗ , λ(:= ρ∗ − ), (σj )) is a feasible solution of (13.6), and so ρi ≥ ρ∗ − . Therefore, as > 0 was arbitrary, letting →0 yields the desired result ρi ↑ ρ∗ . Observe that the semidefinite relaxations (13.6) are basically the same as those defined for a polynomial optimization problem on K, with the additional variables y ∈ Rp and linear constraints Ay = b (that do not change with the index of the relaxation). If Ω and K are compact then ρ∗ = c(x∗ )0 y∗ for some y∗ ∈ Ω and some x∗ ∈ K. Indeed let (yi )i ⊂ Ω be a maximizing sequence in (13.5). For every yi ∈ Ω there exists xi ∈ K that minimizes x 7→ c(x)0 yi on K with value ρi := c(xi )0 yi →ρ∗ as i→∞. For some subsequence (ik ) one gets
13.1 Robust Polynomial Optimization
269
(yik , xik )→(y∗ , x∗ ) ∈ Ω × K, and by continuity, ρ∗ = lim ρik = lim c(xik )0 yik = c(x∗ )0 y∗ . k→∞
k→∞
Corollary 13.2. Let K be as in (4.1) and satisfy Assumption (2.1). Let ρ∗ be as in (13.5) and assume that Ω = {y ∈ Rp : Ay = b, y ≥ 0} is compact. Let (y∗ , x∗ ) ∈ Ω × K be an optimal solution of (13.5), and consider the hierarchy of semidefinite programs defined in (13.6). If the polynomial x 7→ c(x)0 y∗ − c(x∗ )0 y∗ which is nonnegative on K, has Putinar’s representation (2.13), i.e., if x 7→ c(x)0 y∗ − ρ∗ = σ0∗ +
m X
σj∗ gj ,
(13.7)
j=1
for some s.o.s. polynomials (σj∗ ) ⊂ Σ[x], then ρ∗ = ρi∗ for some index i∗ , that is, finite convergence occurs. Proof. Let 2i∗ := max[2i0 , maxj deg σj∗ gj ] with (σj ) as in (13.7). Then (y∗ , ρ∗ , (σj∗ )) is a feasible solution of the semidefinite program (13.6), with value ρi∗ = ρ∗ . So in this case, by solving the semidefinite program (13.6) for some index i∗ , one obtains an optimal robust decision y∗ ∈ Ω and one also identifies an associated worst ”move” x∗ ∈ K by nature. 13.1.2
Robust Semidefinite Programming
Consider now the following semidefinite program: sup { c(x)0 y : y
p X j=1
Aj y j A 0 }
(13.8)
where (Aj ) ⊂ Rt×t are real symmetric matrices and y is a decision vector of Rp . As before, x ∈ Rn is a perturbation taking values in a basic semialgebraic set K ⊂ Rn , unknown at the time where the decision y should be taken, and again, one assumes that the reward mapping x 7→ c(x) is polynomial, i.e., c ∈ R[x]p . Therefore, the robust semidefinite problem reads: ρ∗ = sup inf c(x)0 y, y∈Ω x∈K
13 Min-Max Problems and Nash Equilibria
270
Pp which is the same as (13.5) but now with Ω := {y ∈ Rp : j=1 Aj yj A0 }, still a convex set (but not a polytope). Therefore, with K ⊂ Rn be the basic semi-algebraic set defined in (4.1), and for 2i ≥ max[deg c, deg gj ], consider the hierarchy of semidefinite programs: ρi := sup λ s.t.
λ,y,σj p X j=1
Aj y j A 0 0
c(x) y − λ = σ0 (x) +
m X
(13.9) σj (x)gj (x),
j=1
∀x ∈ R
(σj )m j=0 ∈ Σ(x), deg σj + deg gj ≤ 2i,
n
j = 1, . . . , m.
The sequence (ρi ) is obviously monotone non decreasing.
Theorem 13.3. Let K be as in (4.1) and satisfy Assumption (2.1). Let Pp ρ∗ be as in (13.5) and assume that Ω = {y ∈ Rp : j=1 Aj yj A0 } is compact. Consider the hierarchy of semidefinite programs defined in (13.9). Then ρi ↑ ρ∗ as i→∞. The proof is a verbatim copy of that of Theorem 13.1 and is omitted. Finally, an obvious analogue of Corollary 13.2 holds with Ω = {y ∈ Rp : Pp j=1 Aj yj A0 }. 13.2
Minimizing the Sup of Finitely Many Rational Functions
In this section we consider the global optimization problem:
ρ∗ := inf
x∈K
f0 (x) + max fj (x) j=1,...,p
where: • K is a basic semi-algebraic set as in (4.1) and,
(13.10)
13.2 Minimizing the Sup of Finitely Many Rational Functions
271
• x 7→ fj (x) := pj (x)/qj (x) with pj , qj ∈ R[x] and qj > 0 on K, for all j = 0, 1, . . . , p. b ⊂ Rn+1 be the basic semi-algebraic set Let K
{ (x, z) ∈ Rn × R : x ∈ K;
z qj (x) − pj (x) ≥ 0, j = 1, . . . , p }, (13.11)
and consider the new optimization problem Z Z ρmom = inf { (p0 + z q0 ) dµ : q0 dµ = 1 }, b + µ∈M (K)
b K
b K
(13.12)
b + is the set of finite Borel measures on K). b In Prob(where recall that M (K) lem (13.12) one recognizes an instance of the generalized moment problem (1.1). Proposition 13.4. Let ρ∗ and ρmom be as in (13.10) and (13.12) respectively. If ρ∗ > −∞ then ρ∗ = ρmom Proof. For > 0 fixed, arbitrary, let x ∈ K be such that f0 (x) + b maxi=1,...,p fi (x) ≤ ρ∗ + . Let z := maxi=1,...,p fi (x) so that (x, z) ∈ K because x ∈ K and qi > 0 on K for every i = 1, . . . , p. With δ(x,z) being b let µ be the measure q0 (x)−1 δ(x,z) . Then the Dirac measure at (x, z) ∈ K, R R b µ ∈ M (K)+ with q0 dµ = 1, and (p0 + zq0 )dµ = p0 (x)/q0 (x) + z ≤ ρ + . As > 0 was arbitrary, it follows that ρmom ≤ ρ∗ . R b + be such that q0 dµ = 1. As On the other hand, let µ ∈ M (K) f0 (x) + maxi=1,...,p fi (x) ≥ ρ∗ for all x ∈ K, it follows that p0 (x) + zq0 (x) ≥ b (as q0 > 0 on K). Integrating with respect to µ ρ∗ q0 (x)R for all (x, z) ∈ K R yields (p0 + zq0 )dµ ≥ ρ∗ q0 dµ = ρ∗ , which proves that ρmom ≥ ρ∗ , and so, ρmom = ρ∗ , the desired result. Semidefinite relaxations If K is compact, and under Assumption 2.1, let max{|pi (x)| : x ∈ K} M1 := max , i=1,...,p min{qi (x) : x ∈ K}
(13.13)
and M2 :=
min
i=1,...,p
min{pi (x) : x ∈ K} max{qi (x) : x ∈ K}
.
(13.14)
13 Min-Max Problems and Nash Equilibria
272
b to be Redefine the set K
b := {(x, z) ∈ Rn × R : hj (x, z) ≥ 0, K
j = 1, . . . p + m + 2}
(13.15)
with
j = 1, . . . , p (x, z) 7→ hj (x, z) := gj (x) (x, z) 7→ hj (x, z) := z qj−p (x) − pj−p (x) j = p + 1, . . . , p + m . (x, z) 7→ hj (x, z) := (M1 − z)(z − M2 ) j = m + p + 1 (13.16) Lemma 13.5. Let K ⊂ Rn be compact and let Assumption 2.1 hold. Then b ⊂ Rn+1 defined in (13.15) also satisfies Assumption 2.1. the set K
Proof. As Assumption 2.1 holds for K, equivalently, the quadratic polynomial x 7→ M − kxk2 can be written in the form (2.13); see Theorem 2.15. So consider the quadratic polynomial w ∈ R[x, z] defined by: (x, z) 7→ w(x, z) = M − kxk2 + (M1 − z)(z − M2 ). Obviously, its level set {(x, z) : w(x, z) ≥ 0} ⊂ Rn+1 is compact and moreover, w can be written in the form w(x, z) = σ0 (x) +
p X
σj (x) gj (x) + hm+p+1 (x, z)
j=1
for some s.o.s. polynomials (σj ) ⊂ R[x, z]. Therefore Assumption 2.1 holds b the desired result. for K,
We are now in position to define a hierarchy of semidefinite relaxations for solving (13.10). Let y = (yα ) be a real sequence indexed in the monomial basis (xβ z k ) of R[x, z] (with α = (β, k) ∈ Nn × N). Let h0 (x, z) := p0 (x) + zq0 (x), and let vj := d(deg hj )/2e for every j = 0, . . . , m + p + 1. For i ≥ i0 := max vj , introduce the hierarchy j=0,...,p+m+1
of semidefinite programs:
ρi := inf Ly (h0 ) y
s.t. Mi (y) 0 Mi−vj (hj y) 0, Ly (q0 ) = 1.
j = 1, . . . , m + p + 1
(13.17)
13.3 Application to Nash Equilibria
273
Theorem 13.6. Let K ⊂ Rn be compact and as in (4.1). Let Assumption 2.1 hold and assume that ρ∗ > −∞. Consider the semidefinite program (13.17) with (hj ) ⊂ R[x, z] and M1 , M2 defined in (13.16) and (13.13)-(13.14) respectively. Then: (a) ρi ↑ ρ∗ = ρmom as i→∞. (b) Let yi be an optimal solution of (13.17). If rank Mi (yi ) = rank Mi−i0 (yi ) =: t
(13.18)
then one may extract t points (x∗ (k))tk=1 ⊂ K, all global minimizers of problem (13.10). Theorem 13.6 is just Theorem 4.1 applied to the generalized moment problem (13.12). 13.3
Application to Nash Equilibria
Nash equilibrium, a central concept in game theory, is a profile of mixed strategies (a strategy for each player) such that each player is bestresponding to the strategies of the opponents. In this section we are concerned with algorithms to effectively compute (or at least approximate) Nash equilibria in the context of N-players non zero-sum finite games, and 2-players zero-sum polynomial games. 13.3.1
N-player games
A finite game is a tuple (N, (S i )i=1,...,N , (g i )i=1,...,N ) where N ∈ N is the set of players, S i is the finite set of pure strategies1 and g i : S → R is the payoff function of player i, where S := S 1 × · · · × S N . The set X ∆i = pi = pi (xi ) x ∈S i : pi (xi ) ≥ 0, pi (xi ) = 1 i i xj ∈S
of probability distributions over S i is called the set of mixed strategies of i player i. Notice that ∆i is a simplex of R|S | , hence a compact basic semi1 When
using a pure strategy, Player i chooses a single action xi ∈ S i deterministically, whereas when using a mixed strategy he randomly selects an action x i ∈ S i , according to some probability distribution on S i .
274
13 Min-Max Problems and Nash Equilibria
algebraic set. If each player some mixed strategy pi (·), the vector i chooses 1 1 N ∈ ∆ = ∆ × ... × ∆N is called the profile, and the denoted p = p , ..., p expected payoff of player i is gi (p) =
X
x∈S
p1 (x1 ) · · · pN (xN )g i (x).
For a player i, and a profile p, let p−i be the profile of the other players except i, that is, p−i = (p1 , ..., pi−1 , pi+1 , ..., pN ), and let S −i = S 1 × · · · × S i−1 × S i+1 · · · × S N . Given a profile p and an action xi ∈ S i , define g i (xi , p−i ) :=
X
x−i ∈S −i
p1 (x1 ) · · · pi−1 (xi−1 )pi+1 (xi+1 ) · · · pN (xN )g i (x),
where x−i ∈ S −i denotes the vector (x1 , . . . , xi−1 , xi+1 , . . . , xN ). The value of the game is given by: i ρ∗ := inf sup g (xi , p−i ) − gi (p) , p∈∆ i∈N,xi ∈S i
(13.19)
and a profile p0 is a Nash equilibrium (in mixed strategies) if and only if, for all i ∈ N and all xi ∈ S i , g i (p0 ) ≥ g i (xi , p−i 0 ), or equivalently if i p0 ∈ arg min max g (xi , p−i ) − gi (p) . (13.20) p∈∆ i∈N,xi ∈S i
As each S i is finite, the min-max problem (13.19) is a particular instance of problem (13.10). Therefore, one can use the hierarchy of semidefinite relaxations (13.17) to approximate ρ∗ as closely as desired. In addition, if (13.18) is satisfied at some step in the hierarchy, then one obtains an optimal strategy. Moreover, since the optimal value is zero, one knows when the algorithm should stop and if it does not stop, one has a bound on payoffs so that one knows which -equilibrium is reached.
Example 13.1. Consider the simple illustrative example of a 2 × 2 game with data x12 x22 x11 (a, c) (0, 0) x21 (0, 0) (b, d) for some scalars (a, b, c, d). For instance, when Player 1 selects action x11 and Player 2 selects action x12 , the reward of Player 1 (resp. Player 2) is a
13.3 Application to Nash Equilibria
275
(resp. c). Let p := (p1 , p2 ) where p1 ∈ [0, 1] is the probability that player 1 plays x11 and p2 ∈ [0, 1] is the probability that player 2 plays x12 . Then one has to solve: 1 a p − a p1 p2 − b (1 − p1 )(1 − p2 ) b (1 − p2 ) − a p1 p2 − b (1 − p1 )(1 − p2 ) inf 2 sup c p1 − c p1 p2 − d (1 − p1 )(1 − p2 ) p∈[0,1] d (1 − p1 ) − c p1 p2 − d (1 − p1 )(1 − p2 ), that is, one has to minimize the supremum of four bivariate bilinear polynomials on the box [0, 1]2 . The first relaxation in the hierarchy of semidefinite programs (13.17) reads: ρ1 = inf y001 y
s.t. M1 (y) 0 −y002 + (M1 + M2 )y001 − M1 M2 y000 ≥ 0 y100 − y200 ≥ 0, y010 − y020 ≥ 0 y001 − ay100 + ay110 + b(y0 − y100 − y010 + y110 ) ≥ 0 y001 − by0 + by010 + ay110 + b(y0 − y100 − y010 + y110 ) ≥ 0 y001 − cy100 + cy110 + d(y0 − y100 − y010 + y110 ) ≥ 0 y001 − dy0 + dy100 + cy110 + d(y0 − y100 − y010 + y110 ) ≥ 0 y000 = 1 e.g. with M1 = max[a, b, c, d] and M2 = − max[a + b, c + d]. Solving (13.17) with the GloptiPoly software (see Section D) and with (a, b, c, d) = (0.05, 0.82, 0.56, 0.76), one obtains ρ3 ≈ 3.93.10−11 and the three optimal solutions p ∈ {(0, 0), (1, 1), (0.57575, 0.94253)}. With randomly generated a, b, c, d ∈ [0, 1] we also obtained a very good approximation of the global optimum 0 and 3 optimal solutions in most cases with i = 3 (i.e. with moments or order 6 only) and sometimes i = 4. We have also considered 2-player non-zero sum p × q games with randomly generated reward matrices A, B ∈ Rp×q and p, q ≤ 5. We could solve 5 × 2 and 4 × q (with q ≤ 3) games exactly with the 4th (sometimes 3rd) semidefinite relaxation, i.e. ρ4 = O(10−10 ) ≈ 0 and one could extract an optimal solution.2 However, the size is relatively large and one is close to the limit of present public semidefinite solvers like SeDuMi. Indeed, for 2 In
fact GloptiPoly 3 extracts all solutions because most semidefinite solvers that one may call in GloptiPoly 3 (e.g. SeDuMi) use primal-dual interior points methods which find an optimal solution in the relative interior of the feasible set. In the present context of (13.17) this means that at an optimal solution y ∗ , the moment matrix Mi (y∗ ) has maximum rank and its rank corresponds to the number of solutions.
13 Min-Max Problems and Nash Equilibria
276
a 2-player 5 × 2 or 4 × 3 game, the third semidefinite relaxation has 923 variables and M3 (y) ∈ R84×84 , whereas the fourth relaxation has 3002 variables and M4 (y) ∈ R210×210 . For a 4 × 4 game the third relaxation with 1715 variables and M3 (y) ∈ R120×120 is still solvable, whereas the fourth relaxation has 6434 variables and M4 (y) ∈ R330×330 . 13.3.2
Two-player zero-sum polynomial games
Consider now the min-max optimization problem:
ρ∗ :=
inf
sup
µ∈P(K1 ) ν∈P(K2 )
Z
K2
Z
p(x, z) dµ(x) dν(z)
(13.21)
K1
where K1 ⊂ Rn1 , K2 ⊂ Rn2 are basic semi-algebraic set as defined in (4.1), and P(K1 ) (resp. P(K2 )) denotes the set of probability measures on K1 (resp. K2 ). Problem (13.21) models a two-player zero-sum polynomial game. A probability measure µ ∈ P(K1 ) (resp. ν ∈ P(K2 )) corresponds to a mixed strategy of player 1 (resp. player 2). In such a game, player 1 chooses an action x ∈ K1 according to the probability distribution µ whereas player 2 chooses an action z ∈ K2 according to the probability distribution ν. The payoff of player 1 is −p(x, z) and p(x, z) for player 2. If K1 and K2 are compact, there exists an optimal solution (µ∗ , ν ∗ ) in which µ∗ (resp. ν ∗ ) corresponds to an optimal mixed strategy for player 1 (resp. player 2). In addition Z Z ρ∗ := sup inf p(x, z) dν(x) dµ(z) (13.22) ν∈P(K2 ) µ∈P(K1 )
K1
K2
We next show how to determine (or approximate) ρ∗ and an optimal pair of strategies (µ∗ , ν ∗ ). With p ∈ R[x, z] as in (13.21), write X p(x, z) = pα (x) zα
with
α∈Nn2
pα (x) =
X
β∈Nn1
pαβ xβ ,
|α| ≤ dz ,
(13.23)
13.3 Application to Nash Equilibria
277
where dz is the total degree of p when seen as a polynomial in R[z]. So, let pαβ := 0 for every β ∈ Nn1 whenever |α| > dz . Given a sequence y = (yα ) indexed in the canonical basis (xα ) of R[x], let py ∈ R[z] be the polynomial defined by: X X z 7→ py (z) := pαβ yβ zα . (13.24) α∈Nn2
β∈Nn1
Let K1 ⊂ Rn1 (resp. K2 ⊂ Rn2 ) be the basic semi-algebraic set defined in (4.1) with m1 polynomials (gj ) ⊂ R[x] (resp. m2 polynomials (hk ) ⊂ R[z]). Let rj := ddeg gj /2e, for every j = 1, . . . , m1 , and with h0 = 1, consider the following hierarchy of semidefinite programs: ρi := inf
y,λ,σk
s.t.
λ
λ − py =
m2 X
σk h k
k=0
Mi (y) 0 Mi−rj (gj y) 0, j = 1, . . . , m1 y0 = 1 σk ∈ Σ[z]; deg σk + deg hk ≤ 2i, k = 0, 1, . . . , m2 .
(13.25)
Observe that with py as in (13.24), for any admissible solution (y, λ) of (13.25), λ ≥ sup { py (z) : z ∈ K2 }.
(13.26)
z
Similarly, with p as in (13.21), write X p(x, z) = pˆα (z) xα
with
(13.27)
α∈Nn1
pˆα (z) =
X
pˆαβ zβ ,
β∈Nn2
|α| ≤ dx ,
where dx is the total degree of p when seen as a polynomial in R[x]. So, let pˆαβ := 0 for every β ∈ Nn2 whenever |α| > dx . Given a sequence y = (yα ) indexed in the canonical basis (zα ) of R[z], let X X x 7→ pˆy (x) := pˆαβ yβ xα . (13.28) α∈Nn1
β∈Nn2
13 Min-Max Problems and Nash Equilibria
278
Let lk := ddeg hk /2e, for every k = 1, . . . , m2 , and consider the following hierarchy of semidefinite programs (with g0 = 1): ρi := sup γ y,γ,σj
s.t.
pˆy − γ =
m1 X
σj g j
j=0
Mi (y) 0 Mi−lk (hk y) 0, k = 1, . . . , m2 y0 = 1 σj ∈ Σ[x]; deg σj + deg gj ≤ 2i, j = 0, 1, . . . , m1 .
(13.29)
Observe that with pˆy as in (13.28), for any admissible solution (y, γ) of (13.29), γ ≤ inf { pˆy (x) : x ∈ K1 }. x
Let r1 := maxj=1,...,m1 ddeg gj /2e, and let r2 := maxk=1,...,m2 ddeg hk /2e. Theorem 13.7. Let K1 ⊂ Rn1 and K2 ⊂ Rn2 both satisfy Assumption 2.1 and let ρ∗ be the value of the game (13.21). Let ρi and ρi be the respective optimal values of the semidefinite programs (13.25) and (13.29). Then: (a) limi→∞ ρi = limi→∞ ρi = ρ∗ . (b) Let y1 be part of an optimal solution of (13.25) with optimal value ρi , and let y2 be part of an optimal solution of (13.29) with optimal value ρk . If ρi ≤ ρk and if rank Mi (y1 ) = rank Mi−r1 (y1 ) (=: s1 ) rank Mk (y2 ) = rank Mk−r2 (y2 ) (=: s2 ) then ρi = ρk = ρ∗ . In addition, there is an optimal strategy µ∗ ∈ P(K1 ) for player 1 which is supported on s1 points of K1 and an optimal strategy ν ∗ ∈ P(K2 ) for player 2 which is supported on s2 points of K2 . Proof. (a) Let µ∗ ∈ P(K1 ), ν ∗ ∈ P(K2 ) be optimal strategies of player 1 and player 2 respectively, and let y∗ = (yα∗ ) be the sequence of moments
13.3 Application to Nash Equilibria
279
of µ∗ (well-defined because K1 is compact). Then ρ∗ =
sup ν∈P(K2 )
=
sup ν∈P(K2 )
=
sup ν∈P(K2 )
=
sup ν∈P(K2 )
Z
Z
Z
Z
K2
Z
p(x, z)dµ∗ (x) K1
X
K2 α∈Nn2
X
K2 α∈Nn2
K2
X
pαβ
β∈Nn1
X
β∈Nn1
Z
dν(z)
K1
xβ dµ∗ (x) zα dν(z)
pαβ yβ∗ zα dν(z)
py∗ (z) dν(z) = sup { py∗ (z) : z ∈ K2 } z
= inf { λ : λ − py∗ = σ0 + λ,σk
where recall that z 7→ py∗ (z) = arbitrary. Then
m2 X
σk h k ;
k=1
P
α∈Nn
P
ρ∗ − py∗ + = σ0 +
2 (σj )m j=0 ⊂ Σ[z] },
α∈Nn
m2 X
pαβ yβ∗
σk hk ,
zα . Fix > 0
(13.30)
k=1
for some polynomials (σk ) ⊂ Σ[z] of degree at most d1 . Hence (y ∗ , ρ∗ +, σk ) is an admissible solution for the semidefinite program (13.25) whenever 2i ≥ d := d1 + supk deg hk , and so ρi ≤ ρ∗ + ,
∀ 2i ≥ d .
(13.31)
Now, let (yi , λi ) be an admissible solution of the semidefinite program (13.25) with value λi ≤ ρi + 1/i. By an argument already used several ˆ ∈ R∞ and a subsequence (ik ) such that times in Chapter 4, there exists y ik the pointwise convergence y →ˆ y holds, that is, ik ˆα lim yα = y
k→∞
∀α ∈ Nn1 .
(13.32)
But then, invoking (13.32) yields Mr (ˆ y) 0
ˆ ) 0, and Mr (gj y
∀j = 1, . . . , m1 ;
r = 0, 1, . . .
13 Min-Max Problems and Nash Equilibria
280
By Theorem 2.14, there exists µ ˆ ∈ P(K1 ) such that Z yˆα = xα dˆ µ, ∀α ∈ Nn1 . K1
On the other hand, Z Z ρ∗ ≤ sup ν∈P(K2 )
K2
p(x, z)dˆ µ(x) K1
= inf { λ : λ − pyˆ = σ0 + P
m2 X
dν(z) = sup { pyˆ (z) : z ∈ K2 }
σk h k ;
k=1
z
2 (σj )m j=0 ⊂ Σ[z] },
P with z 7→ pyˆ (z) = α∈Nn2 ˆβ zα . Let ρ := supz {pyˆ (z) : β∈Nn1 pαβ y z ∈ K2 } (hence ρ ≥ ρ∗ ), and consider the polynomial X X z 7→ pyi (z) := pαβ yβi zα . α∈Nn2
β∈Nn1
It has same degree as pyˆ , and by (13.32), kpyˆ − pyik k→0 as k→∞ (for some norm on the (finite dimensional vector) of coefficients). Hence, supz {pyik (z) : z ∈ K2 }→ ρ ≥ ρ∗ as k→∞. Moreover, by (13.26) one obtains λik ≥ supz {pyik (z) : z ∈ K2 }. Therefore, λik ≥ ρ − ≥ ρ∗ − for all sufficiently large k (say for all ik ≥ d2 ). This combined with λik ≤ ρ∗ + + 1/ik for all 2ik ≥ d1 , yields the desired result that limk→∞ ρik = limk→∞ λik = ρ∗ , because > 0 (fixed) was arbitrary. Finally, as the converging subsequence (ik ) was arbitrary, we get that the entire sequence (ρi ) converges to ρ∗ . Similar arguments are used to prove ρi →ρ∗ . (b) Let y1 , y2 be as in Theorem 13.7. If the two rank conditions hold, by Theorem 3.11, y1 (resp. y2 ) is the moment sequence of some probability measure µ∗ ∈ P(K1 ) supported on s1 points of K1 (resp. ν ∗ ∈ P(K2 ) supported on s2 points of K2 ). From the proof of (a) it follows immediately that ρi ≥ supz {py1 (z) : z ∈ K2 } ≥ ρ∗ and similarly ρk ≤ inf x {ˆ py2 (x) : x ∈ K1 } ≤ ρ∗ . As ρi ≤ ρk it follows that ρi = ρk = ρ∗ . Hence, (µ∗ , ν ∗ ) is an optimal pair of strategies for the game (13.21). 13.3.3
The univariate case
Let K1 , K2 ⊂ R be intervals of the real line, say K1 = [a, b] and K2 = [c, d]. Then the optimal value ρ∗ of the game (13.7) is obtained by solving a single semidefnite program.
13.4 Exercises
281
Namely, let d1 be the maximum degree of the univariate polynomial x 7→ pα (x) defined in (13.23), when |α| ≤ dz . Similarly, let d2 be the maximum degree of the univariate polynomial z 7→ pˆα (z) defined in (13.27), when |α| ≤ dx . Consider the semidefinite programs: inf λ
y,λ,σk
s.t.
λ − py = σ0 + σ1 (d − z)(z − c) Mdd1 /2e (y) 0 Mdd1 /2e−1 ((b − x)(x − a), y) 0 y0 = 1 σ0 , σ1 ∈ Σ[z]; deg σj ≤ dz , j = 0, 1,
(13.33)
and sup λ y,λ,σk
s.t.
pˆy − λ = σ0 + σ1 (b − x)(x − a) Mdd2 /2e (y) 0 Mdd2 /2e−1 ((d − z)(z − c), y) 0 y0 = 1 σ0 , σ1 ∈ Σ[x]; deg σj ≤ dx , j = 0, 1.
(13.34)
Corollary 13.8. If K1 and K2 are intervals of the real line R, then the value ρ∗ of the game (13.21) is the optimal value of the semidefinite programs (13.33) and (13.34).
13.4
Exercises
Exercise 13.1. Prove the following lemma: Lemma 13.9. Let K ⊂ Rn be compact and let p, q : Rn →R be continuous with q > 0 on K. Let P(K) ⊂ M (K)+ be the set of Borel probability measures on K. Then
13 Min-Max Problems and Nash Equilibria
282
inf
µ∈P(K)
Z
p dµ
ZK
= q dµ
inf
ϕ∈M (K)+
K
=
13.5
inf
µ∈P(K)
{
Z
Z
K
p dϕ : K
Z
q dϕ = 1} K
p(x) p dµ = inf . x∈K q(x) q
Notes and Sources
13.1. Most of this section is from Lasserre (2006d) and for a survey on algorithms and complexity of robust optimization the interested reader is referred to e.g. Ben-Tal et al. (2000). For recent various applications see Ben-Tal et al. (2006) and all papers in the same issue of the journal. 13.2 and 13.3. The material of these sections is from Laraki and Lasserre (2008b). Corollary 13.8 was first proved in Parrilo (2006). It is well known that any 2-player zero-sum finite game is reducible to a linear program and hence could be computed in polynomial time; see e.g. Dantzig (1963). Lemke and Howson (1964) provided a famous algorithm that computes a Nash equilibrium of any 2-player non zero-sum finite game. The algorithm has been extended to n-player finite games in Rosenm¨ uller (1971), Wilson (1971) and Govindan and Wilson (2003). An alternative to the Lemke-Howson algorithm for 2-player games is provided in van den Elzen and Talman (1991) and has been extended to nplayer games in Herings and van den Elzen (2002). As shown in the recent survey Herings and Peeters (2009), all these algorithms are homotopy-based and converge under some regularity condition. Recently, Savani and von Stengel (2006) proved that the Lemke-Howson algorithm for 2-player games may be exponential. One may expect that this result extends to all known homotopy methods. Daskalakis et al. (2006) proved that solving numerically 3-player finite games is hard.3 The result has been extended to 2-player finite games by Chen and Deng (2006). For a recent survey on the complexity of computing equilibria on game theory, see Roughgarden (2009). A different approach to solve the problem is to view the set of Nash 3 More
precisely, it is complete in the PPAD class of all search problems that are guaranteed to exist by means of a direct graph argument. This class was introduced in Papadimitriou (1994) and is between P and N P .
13.5 Notes and Sources
283
equilibria as the set of real nonnegative solutions to a system of polynomial equations. Methods of computational algebra (e.g. using Gr¨ obner bases) can be applied as suggested and studied in e.g. Datta (2009), Lipton and Markakis (2004) and Sturmfels (2002). However, in this approach, one first computes all complex solutions to sort out all real nonnegative solutions afterwards. Interestingly, polynomial equations can also be solved via homotopy-based methods (see e.g. Verschelde (1999)). Notice that if one adopts the viewpoint of Nash equilibria as real solutions of polynomial equations, then the machinery of Chapter 6 can be applied so as to avoid computing complex solutions. These above algorithms are concerned with finite games. On the other hand, in Polynomial games introduced by Dresher et al. (1950), the set of pure strategies Si of Player i is a product of compact intervals. When the game is zero-sum and Si = [0, 1] for each player i, Parrilo (2006) showed that finding an optimal solution is equivalent to solving a single semidefinite program; see Corollary 13.8. Shah and Parrilo (2007) extended the methodology to discounted zero-sum stochastic games in which the transition is controlled by one player only. Finally, it is worth noticing recent algorithms designed to solve some specific classes of infinite games; for instance see G¨ urkan and Pang (2009).
Chapter 14
Bounds on Linear PDE
This chapter applies the moment approach to provide bounds on functionals of solutions of linear partial differential equations with boundary conditions and polynomial coefficients.
14.1
Linear Partial Differential Equations
In general, it is impossible to obtain analytical solutions of partial differential equations (PDEs) and so one has to invoke numerical methods. A typical approach is to first discretize the domain and then obtain an approximate solution by solving the resulting equations, and matching boundary values and initial conditions. As an immediate consequence, the computational complexity is exponential in the dimension because with O(1/) points in each dimension, one needs to solve a system of size O((1/)n ) for n-dimensional PDEs and a desired accuracy of O(). In this chapter we provide a methodology to compute upper and lower bounds on some functional of the solution of a PDE with boundary conditions. Let L be the following partial differential operator with polynomial coefficients : L=
X
Lα
α∈Nn
∂α , ∂xα
Lα ∈ R[x]
∀ α ∈ Nn ,
(14.1)
where for every α ∈ Nn , ∂ α u(x) ∂ |α| u(x) = . ∂xα ∂ α 1 x1 · · · ∂ α n xn Let Ω ⊂ Rn with boundary ∂Ω, f : Rn →R, and consider the linear PDE: 285
286
14 Bounds on Linear PDE
Lu = f
on Ω,
(14.2)
with some additional appropriate boundary conditions on ∂Ω. The goal is to evaluate (or provide upper and lower bounds on) the functional Z J := (G u) dx, (14.3) Ω
for some linear operator G := solution of the PDE (14.2).
P
α
α
∂ Gα ∂x α as in (14.1), and where u is a
The moment approach Equation (14.2) should be understood in a weak sense, namely Z Z Lu = f ⇐⇒ (Lu)φ dx = f φ dx, ∀φ ∈ D, Ω
Ω
where D is some subset of the space of smooth functions C ∞ . One also assumes that D has a countable dense subset generated by F = (φ1 , φ2 , . . .), so that: Z Z Lu = f ⇐⇒ (Lu)φ dx = f φ dx, ∀φ ∈ D, ZΩ ZΩ ⇐⇒ (Lu)φi dx = f φi dx, ∀φi ∈ F . (14.4) Ω
Ω
Assume that a solution u is bounded from below by u0 . Then the basic underlying idea is to consider the function v := u − u0 as a nonnegative element of L1 (Ω), i.e. the density of a measure dµ = vdx on Ω, absolutely continuous with respect to the Lebesgue measure. Integration by parts of (14.4) yields countably many linear conditions on the moments of µ and additional variables (moments of appropriate measures on the boundary ∂Ω). Similarly, J in (14.3) becomes a linear expression in the same variables. Then one computes a lower (resp. upper) bound by minimizing (resp. maximizing) the latter linear expression under: - finitely many linear moment constraints obtained from (14.4), and - appropriate moment conditions for support constraints of the involved measures.
14.1 Linear Partial Differential Equations
287
By increasing the number of moment constraints, one obtains tighter lower and upper bounds. As the form of the moment constraints is problem dependent, the approach is better illustrated on the following simple one-dimensional example. Let the PDE (14.2) be the ordinary differential equation (ODE): u00 + au0 + bu = f
or, equivalently, v 00 + av 0 + bv = f˜
(with f˜ := f − bu0 ) on the interval Ω = [0, 1]. Let the boundary conditions on ∂Ω be u0 (0) = c0 and u0 (1) = c1 . With φ ∈ F , integration by parts of (14.4) yields Z 1 Z 1 φf˜ dx = v 0 φ |10 − vφ0 |10 + avφ |10 + (φ00 − aφ0 + bφ) vdx 0
0
that is, Z
1 0
φf˜ dx = φ(1)c1 − φ(0)c0 +v(1)(aφ(1) − φ0 (1)) + v(0)(φ0 (0) − aφ(0)) Z 1 + (φ00 − aφ0 + bφ) vdx. 0
Let y = (yk ) be the vector of moments of dµ = vdx, and let γ = (γj ) with Z 1 ˜ dx, γj := xj f(x) j = 0, 1, . . . 0
j
With x 7→ φj (x) := x , one obtains the moment conditions γj = c1 − c0 δj=0
+v(1)(a − j) + v(0)(δj=1 − aδj=0 ) +j(j − 1) yj−2 − aj yj−1 + b yj ,
(14.5)
with the Kronecker symbol δj=· and the convention yj = 0 whenever j < 0. Next, suppose that G(v) := xk v + xp v 0 + xq v” for some k, p, q ∈ N, so that Z 1 (Gv) dx = yk + v(1) − v(0)δp=0 − p yp−1 + c1 − c0 δq=0 0
−q(v(1) − v(0)δq=1 ) + q(q − 1) yq−2
(14.6)
again with the convention yj = 0 if j < 0. With x 7→ g(x) := x(1 − x), and for i ≥ i0 := max[k, p − 1, q − 2], consider the semidefinite programs:
14 Bounds on Linear PDE
288
ρ1i =
sup
(14.6)
y,v(1),v(0)
s.t. Mi (y) 0, Mi−1 (g y) 0 (14.5),
(14.7)
0 ≤ j ≤ 2i,
and ρ2i =
inf
y,v(1),v(0)
(14.6)
s.t. Mi (y) 0, Mi−1 (g y) 0 (14.5),
(14.8)
0 ≤ j ≤ 2i.
Of course ρ2i ≤ J ≤ ρ1i for every i ≥ i0 and so, as i→∞, ρ2i ↑ ρ ≤ J
14.2
and ρ1i ↓ ρ ≥ J.
(14.9)
Notes and Sources
This chapter is inspired from Bertsimas and Caramanis (2006). The moment approach to PDE was already described in Dawson (1980) but with no effective computation as general multivariate moment conditions were not available at that time. The first effective application of the momentapproach to solving PDEs seems to date back to the ”eigenvalue moment method” initiated in Handy and Bessis (1985) in the context of onedimensional quantum systems with rational fraction potential functions. The motivation was to provide an alternative to eigenvalue methods for computing bounds on the quantum ground-state energy. Using a monomial basis of test functions, integration by parts of the associated Schr¨ odinger equation yields linear constraints on the moments of dµ := u dx where u is the solution of the PDE. At that time, semidefinite programming was not a widely spread out technique and so the authors used Hankel-Hadamard determinant constraints (from the Hamburger moment problem) as moment conditions. The latter can be expressed in terms of finitely many moment variables (the so-called missing moments) after elimination of the others via the linear moment equations from the Schr¨ odinger PDE. As observed in several examples, the convergence of upper and lower bounds to the
14.2 Notes and Sources
289
ground-state energy is fast. As the method becomes untractable for more than 3 missing moments, Handy et al. (1988) provided an improvement by using cutting planes to approximate the convex feasible region defined by the positive semidefiniteness of the Hankel-moment matrix; see also Handy et al. (1989) and Handy (2001) for refinements and extensions. A modern treatment would simply use a hierarchy of semidefinite relaxations in the spirit of (14.7)-(14.8). More recently, Bertsimas and Caramanis (2006) applied this technique to Bessel and Helmholtz equations as well as on a queueing application where some queue length is approximated via a reflected Brownian motion. They also reported excellent numerical results.
Final Remarks
We hope to have convinced the reader that the moment approach developed in this book can be useful to help solve or approximate the Generalized Moment Problem and some of its various applications. As already mentioned, and since most of the addressed problems are difficult, the reader will have understood that this methodology has practical limitations mainly due to the size of the initial problem to solve. Indeed so far, and in view of the present status of the semidefinite programming solvers, the approach is limited to problems of modest size. Fortunately, many problems of larger size exhibit some sparsity patterns or symmetries that can be exploited as we have also indicated. In the book, we have not touched upon the (new) theory of positive polynomials in the non commutative case, i.e., polynomials in non commuting variables. Several beautiful results have already been obtained in Non Commutative Real Algebraic Geometry, a new and promising domain of Mathematics. Among these results, one finds some non commutative Positivstellensatze analogues of the Positivstellenstaze presented in Chapter 2 of this book. For an introduction to these concepts the interested reader is referred to the survey Helton and Putinar (2007) as well as Schm¨ udgen (2008) and the many references therein. Potential applications arise in e.g. control as well as in quantum information. For control applications the interested reader is referred to e.g. de Oliveira et al. (2008). Also, a recent non commutative version of the moment approach for polynomial optimization presented in Chapter 5 of this book, has been defined in Pironio et al. (2009) and tested successfully in a quantum information application described in Navascu´es et al. (2008).
291
This page intentionally left blank
Appendix A
Background from Algebraic Geometry
A main concern in algebraic geometry is the analysis of the set of solutions for systems of polynomial equations. In this section, we review some of the basic ingredients of algebraic geometry with emphasis on the strong interplay between geometric notions (the set of solution “points“), and their algebraic representation (the polynomials equations). A.1
Fields and Cones
We denote by x + y and x · y (= xy) the addition and multiplication respectively of two elements x, y of a field. If both F, F1 are fields and F ⊆ F1 , then F1 is a field extension of F , denoted F1 /F . A field extension F1 /F is called algebraic if every element of F1 is a root of some nonzero polynomial with coefficients in F . A field extension which is not algebraic is called transcendental. For instance C/R is an algebraic field extension, whereas R/Q is transcendental. If F1 is regarded as a vector space over F , its dimension is called the degree of the extension. According to this degree, an algebraic field extension is further classified as finite or infinite. All finite field extensions are algebraic, but the converse is not true. A field F is algebraically closed if every univariate polynomial of degree at least 1, and with coefficients in F , has a root in F . Every field F is contained in a field F which is algebraically closed and such that every element of F is the root of a nonzero univariate polynomial with coefficients in F . This field (unique up to isomorphism) is called the algebraic closure of F . For instance, the field C is the algebraic closure of the field R. We begin with the definitions of an ordered field, a cone and a real field. 293
294
A Background from Algebraic Geometry
Definition A.1. An ordered field (F, ≤) is a field F , equipped with an ordering ≤a total order relation ≤ such that for any x, y ∈ F : (a) x ≤ y ⇒ x + z ≤ y + z for all z ∈ F . (b) If 0 ≤ x and 0 ≤ y then xy ≥ 0. The ordering ≤ on F is called Archimedean and F is called an Archimedean ordered field if for each x ∈ F there is n ∈ N such that x ≤ n. For example, with their usual ordering, both Q and R are Archimedean. Definition A.2. A cone of a field F is a subset P ⊂ F such that (a) If x, y ∈ P then x + y ∈ P and xy ∈ P . (b) x2 ∈ P for every x ∈ F . A cone P with the property that −1 6∈ P is called proper. The set {x ∈ F : x ≥ 0} is called the positive cone of (F, ≤), and the set ΣF 2 ⊂ F of finite sums of squares of elements of F , is a cone contained in every cone of F .
Theorem A.1. Let F be a field. Then the following are equivalent: (a) F can be ordered. (b) F has a proper cone. (c) −1 6∈ ΣF 2 . (d) For every x1 , . . . , xn in F n X i=1
x2i = 0
⇒ x1 = · · · = xn = 0.
A field that satifies the above four properties in Theorem A.1 is called a real field. A real closed field F is a field that has no nontrivial real algebraic extension F1 ⊃ F , F1 6= F . For instance, the field Q of rationals is a real field and its extension field Ralg of real algebraic numbers (which is its algebraic closure), is a real closed field. Of course, R is also a real closed field.
A.2 Ideals
A.2
295
Ideals
Let k be an arbitrary field and k[x] = k[x1 , . . . , xn ] be the (commutative) ring of polynomials in n variables with coefficients in k. Definition A.3. An ideal I ⊂ k[x] is a subspace of k[x] that satisfies ∀ f ∈ k[x],
g ∈ I ⇒ f g ∈ I.
Given a finite family F = {f1 , . . . , fm } ⊂ k[x], we denote by I = hf1 , . . . , fn i the ideal generated by F , that is, hf1 , . . . , fn i :=
(m X
gi fi
i=1
:
gi ∈ k[x], i = 1, . . . , m
)
.
The following theorem is a key result on ideals.
Theorem A.2. (The Hilbert Basis Theorem) Every ideal I of k[x] is finitely generated, that is, there exists a finite family F = {f1 , . . . , fs } ⊂ k[x] such that f ∈I ⇒ f =
s X i=1
hi f i ,
hi ∈ k[x], i = 1, . . . , s.
As every ideal of k[x] is finitely generated, one may wonder whether some representations are better than others. Reduced Gr¨ obner bases are precisely representations that are interesting in the sense that if a finite subset F ⊂ k[x] is a reduced Gr¨ obner basis of an ideal I ⊂ k[x] then the ideal membership problem of detecting whether or not a given polynomial f ∈ k[x] is in I, is solvable by Buchberger’s algorithm (the multi-dimensional analogue of the Euclidean algorithm in the univariate case). An important notion is the quotient ring k[x]/I associated with an ideal I of k[x]. Its elements, of the form f + I, are called cosets of I. Namely, k[x]/I consists of equivalence classes modulo I of polynomials f ∈ k[x]. Equivalently, given two polynomials f, g ∈ k[x], f is congruent to g modulo I if and only if f −g ∈ I. It follows that k[x]/I is a commutative ring with the usual addition and multiplication inherited from k[x].
A Background from Algebraic Geometry
296
Definition A.4. The radical ideal defined as follows √ I := {f ∈ k[x] :
√ I ⊂ k[x] of an ideal I ⊂ k[x] is
there exists e ∈ N such that f e ∈ I}.
For instance, with n = 1 and a ∈ R, let I = h(x − a)2 i √ ⊂ R[x] be the ideal of R[x] generated by the polynomial x 7→ x − a. Then, I = hx − ai. A.3
Varieties
An important geometric notion is that of a variety. Definition A.5. Given a subset S ⊂ k[x], and a field K ⊇ k, the set VK (S) ⊂ K n defined by VK (S) := {x ∈ K n : f (x) = 0, ∀f ∈ S} is called a variety of K n . In particular, given an ideal I ⊂ k[x], VK (I) denotes the variety of K n associated with I, and if I = hf1 , . . . , fm i ⊂ k[x], VK (I) = {x ∈ K n : fi (x) = 0, i = 1, . . . , m}. Note that the variety is in K n , whereas the ideal is in k[x]. Similarly, given a subset V ⊂ K n , we define the ideal I(V ) in k[x], by I(V ) := {f ∈ k[x] : f (x) = 0, ∀x ∈ V }. Ideals and varieties illustrate an interplay between algebra and geometry. With ideals of k[x] (algebraic objects) are associated varieties in K n (geometric objects), and conversely, with subsets of K n are associated ideals of k[x]. Example A.1. Let I1 = hf1 i ⊂ R[x] with f1 (x) = x21 + x22 , and let I2 = hf2 , f3 i ⊂ R[x] with f2 (x) = x1 + x2 , and f3 (x) = x2 . We have VR (I1 ) = VR (I2 ) = {0, 0}. On the other hand, VC (I2 ) = {(0, 0)}, whereas VC (I1 ) is the union of the lines x2 = ±ix1 . The example illustrates that I1 and I2 are really different ideals, but the set of solutions in k = R of the corresponding
A.3 Varieties
297
system of equations has too few solutions to distinguish them. Enlarging k = R to C shows that the corresponding varieties are indeed different. A fundamental result in algebraic geometry is the so-called Hilbert Nullstellensatz. Theorem A.3 (Weak Hilbert Nullstellensatz). Let I ⊂ k[x] be an ideal whose associated algebraic variety n o n Vk (I) := x ∈ k : g(x) = 0, ∀g ∈ I , is empty. Then I = k[x].
In particular, consider a family {fi } ⊂ C[x], and the associated set of polynomials equations fi (x) = 0,
i = 1, . . . , m.
(A.1)
Then, the above system has no solution x ∈ Cn if and only if there exist m polynomials gi ∈ C[x] such that 1 =
m X
fi gi .
(A.2)
i=1
In other words, the gi ’s provide a certificate of infeasibility of the system of equations (A.1). Notice that Theorem A.3 also includes the fundamental theorem of algebra. Indeed, if I consists of a single univariate (non constant) polynomial f ∈ C[x], then VC (I) is never empty because we would have that 1 ≡ gf for some g ∈ C[x], which is clearly impossible. Example A.2. Let f1 (x) = x21 + x22 + 1 f2 (x) = x1 + x2 , f3 (x) = x2 and let I = hf1 , f2 , f3 i. Then we have VC (I) = ∅, which is confirmed by the identity 1 = f1 − x1 f2 + (x1 − x2 )f3 . Theorem A.4 (Strong Hilbert √ Nullstellensatz). For every ideal I ⊂ k[x], one has I(Vk (I)) = I.
A Background from Algebraic Geometry
298
√ Theorem A.4 says that if I is radical, i.e., if I = I, then I is in a sense a minimal algebraic description, without redundances (like squares). Definition A.6. An ideal I of a commutative ring A is said to be real if, for every sequence a1 , . . . , ap of elements of A, one has p X i=1
a2i ∈ I
⇒
ai ∈ I,
i = 1, . . . , p.
We now state the Real-Nullstellensatz, the analogue of Theorem A.4 for real closed fields. Theorem A.5 (Real Nullstellensatz). Let k be a real closed field and I an ideal of k[x]. Then I = I(Vk (I)) if and only if I is real. For instance, let k = R and g ∈ k[x] with g(x) = x. Then, the ideal I = hg 2 i is not real, because g 2 ∈ I, whereas g 6∈ I. In fact, we have I(Vk (I)) = hgi 6= I. Given an ideal I ⊂ k[x], the ideal denoted √ k I := {f ∈ k[x] : ∃ p ∈ N, (fi )m i=1 ⊂ k[x]
√ k
I and defined by
s.t.
f 2p +
m X j=1
fi2 ∈ I }
is called the real radical of I. And one has another form of the Real Nullstellensatz: Theorem A.6 (Real Nullstellensatz). Let k be a real closed field √ and I an ideal of k[x]. Then k I = I(Vk (I)).
A.4
Preordering
The following definitions present concepts that play an essential role in real algebraic geometry.
A.4 Preordering
299
Definition A.7. A subset T of a commutative ring A, with 1 ∈ T , is called: (a) a preprime of A if 0 ∈ T,
T + T ⊆ T,
T · T ⊆ T,
and
− 1 6∈ T.
(b) a quadratic module on A if T + T ⊆ T,
A2 · T ⊆ T,
− 1 6∈ T.
and
(c) a preordering of A if T + T ⊆ T,
T · T ⊆ T,
A2 ⊆ T,
and
− 1 6∈ T.
where A2 stands for the set of squared elements of A. Clearly, every preordering is a preprime and a quadratic module, and since every quadratic module contains ΣA2 , the preordering ΣA2 is the smallest quadratic module on A, as well as being the smallest preordering of A. A preordering enjoys the basic properties of positive elements. The maximal quadratic modules S on A also satisfy S ∪ −S = A
and S ∩ −S is a prime ideal.
Thus, if A is a field, S ∩ −S = {0}. In this latter case, the ordering a ≤ b
⇔
b − a ∈ S,
a, b ∈ A,
linearly orders A, so that for all a, b, c ∈ A, a ≤ b ⇒ a + c ≤ b + c,
0 ≤ a ⇒ 0 ≤ a b2 .
Those linear orderings of fields are called semiorderings. Example A.3. We present three examples of preorderings. With A := R, let T := R+ , and with A := C (X , R), the ring of continuous functions f : X →R from a nonnempty topological space X to R, let T := {f ∈ C (X , R) | f ≥ 0}.
300
A Background from Algebraic Geometry
Another example of a preordering, extensively used in the sequel, is obtained as follows. Let R[x] be the ring of real polynomial in the n variables x1 , . . . , xn (also an R-algebra), and let Σ[x] ⊂ R[x] denotes the sums of squares of elements of R[x]. Given a family F := {f1 , . . . , fm } ⊂ R[x], and Q a set J ⊂ {1, . . . , m} we let by fJ (x) = j∈J fj (x), with the convention that f∅ ≡ 1. Then, the set P (f1 , . . . , fm ) :=
X
qJ fJ
:
J⊆{1,...,m}
qJ ∈ Σ[x]
is a preordering, called the preordering generated by F . Finally, with the constant polynomial f0 = 1, the set
Q(f0 , f1 , . . . , fm ) :=
m X
j=0
qj fj
:
qj ∈ Σ[x]
(A.3)
(A.4)
is called the quadratic module generated by the family F . Observe that both P (f1 , . . . , fm ) and Q(f0 , . . . , fm ) are also convex cones of R[x].
A.5
Algebraic and Semi-algebraic Sets over a Real Closed Field
In this section we introduce algebraic and semi-algebraic sets.
A.5 Algebraic and Semi-algebraic Sets over a Real Closed Field
301
Definition A.8. Let k be a real closed field. (a) An algebraic set A of k n is the set of zeros of some subset B ⊂ k[x], that is, A = V (B) = {x ∈ k n : f (x) = 0, ∀f ∈ B}, for some B ⊂ k[x]. (b) A semi-algebraic set A ⊂ k n is a set of the form A =
ri s \ [
i=1 j=1
{x ∈ k n : fij (x) 0},
for some finite family (fij ) ⊂ k(x], and where “” is either “ 0,
j = 1, . . . , m },
for some family (fj )m j=1 ⊂ k[x]. (d) A basic closed semi-algebraic set A ⊂ k n is a semi-algebraic set of the form A = { x ∈ k n : fj (x) ≥ 0,
j = 1, . . . , m },
for some family (fj )m j=1 ⊂ k[x]. From their definition, semi-algebraic sets are invariant under finite union, finite intersection and complementation. They are also invariant under projection, and in fact, every semi-algebraic set of k n is the projection of an algebraic set of k n+1 . If k is a real closed field and if the coefficients of the polynomials that define a semi-algebraic set belong to a subring D ⊂ k, the semi-algebraic set is said to be defined over D. The following result is the important projection theorem.
Theorem A.7. Let k be a real closed field and D ⊂ k be a subring of k. Given a semi-algebraic set of k n+1 defined over D, its projection to k n is a semi-algebraic set defined over D.
302
A Background from Algebraic Geometry
Notice that by Theorem A.7, the projection of a basic semi-algebraic set is semi-algebraic. However, in general it is not a basic semi-algebraic set. Every semi-algebraic set A ⊂ k n can be written as a finite union of semi-algebraic sets of the form {x ∈ k n : fi (x) = 0, gj (x) > 0, i = 1, . . . , m, j = 1, . . . , p}, for some families {fi }, {gj } ⊂ k[x]. We equip k n with the usual Euclidean topology coming from the ordering of k, so that the open ball B(x, r) is the usual subset {y ∈ k n : ky − xk < r}. So, the closure S of a semi-algebraic subset S ⊂ k n is S = {x ∈ k n : ∀t ∈ k, ∃y ∈ S s.t. ky − xk < t2 or t = 0}. The closure and the interior of a semi-algebraic set are semi-algebraic. Warning: Notice that in general, the closure S of S ⊂ k n is not obtained from S by just relaxing the strict inequalities in the definition of S. Example A.4. Let S := {x ∈ R2 : x31 − x21 − x22 > 0}. Relaxing > to ≥ would imply x = 0 ∈ S which is not true because x ∈ S implies x1 ≥ 1. Similarly, in general, the interior int(S) of the semi-algebraic set S = {x ∈ Rn : gj (x) ≥ 0, j ∈ I} is not obtained from S by replacing the inequalities gj ≥ 0 with the strict inequalities gj > 0. Example A.5. With n = 1, let S := {x ∈ R : g(x) ≥ 0}, with g(x) = (1 − x)x2 (1 + x), so that S = [−1, 1] and 0 ∈ int(S) = (−1, 1) . But the set {x ∈ R : g(x) > 0} does not contain 0. Note also that closed and bounded semi-algebraic sets are not necessarily compact for real closed fields other than R. A.6
Notes and Sources
Sources for the introductory material are from Bochnak et al. (1998), Basu et al. (2003), and Adams and Loustaunau (1994). Specifically, Example
A.6 Notes and Sources
303
A.1 if from Adams and Loustaunau (1994). The real Nullstellensatz in the form of Theorem A.5 is due to Risler (1970).
This page intentionally left blank
Appendix B
Measures, Weak Convergence and Marginals
B.1
Weak Convergence of Measures
Let X be a metric space with its usual associated Borel σ-algebra B. Let M (X) (resp. C(X)) be the space of finite signed Borel measures (resp. bounded continuous functions) on X and let M (X)+ be its positive cone, the space of finite Borel measures on X. We next introduce the weak convergence of measures, the most commonly used in probability.
Definition B.9. Let M (X) be equipped with the weak topology σ(M (X), C(X)). A sequence (µn ) ⊂ M (X)+ converges weakly to µ ∈ M (X)+ , denoted µn ⇒ µ, if and only if, as n→∞, Z Z f dµn → f dµ ∀ f ∈ C(X). (B.1) X
X
Definition B.10. Let Π ⊂ M (X)+ be a set of probability measures on X. Then Π is said to be: (a) tight if for every > 0 there is a compact set K ⊂ X such that µ(K ) ≥ 1 − for every µ ∈ Π. (b) relatively compact if every sequence in Π contains a weakly convergent subsequence, that is, for every (µn ) ⊂ Π there exists µ ∈ M (X)+ and (µnk ) ⊂ (µn ) such that µnk ⇒ µ as k→∞. Tightness and relatively compactness are related via the following result: 305
306
B Measures, Weak Convergence and Marginals
Theorem B.8 (Prohorov). Let X be a metric space and let Π ⊂ M (X)+ be a set of probability measures on X. Then: (a) If Π is tight then Π is relatively compact. (b) If X is separable and complete and Π is relatively compact, then Π is tight. Thus if X is a Polish (i.e., separable and complete metric) space, then Prohorov’s Theorem yields that Π is tight if and only if it is relatively compact. This is also true if X is a locally compact separable metric space in which case it can be given a metric under which it is also complete. Therefore, in view of the above, it is important to detect when a set of probability measures is relatively compact. Definition B.11. A function f : X→R+ is said to be: (a) a moment if there exists a sequence of compact sets Kn ↑ X such that lim inf {f (x) : x ∈ X \ Kn } = +∞.
n→∞
(b) inf-compact if the level set Ar := {x ∈ X : f (x) ≤ r} is compact for every r ∈ N. Of course, f is a moment if it is inf-compact. Conversely, if f : X→R+ is a moment, then the closure of Ar is compact for every r ∈ N. Moment functions are useful as we have: Theorem B.9. Let Π ⊂ M (X)+ be a set of probability measures and let f : X→R+ be a moment function. (a) Π is tight if Z sup f dµ < +∞. (B.2) µ∈Π
X
(b) Conversely, if X is a locally compact separable metric space and Π is tight, then there exists a moment function f : X→R+ that satisfies (B.2). Recall that X is a metric space. A function f : X→R is lowersemicontinuous (l.s.c.) and bounded below if there exists a nondereasing sequence (vn ) ⊂ C(X) such that vn (x) ↑ f (x) for every x ∈ X. Let L(X) denote the space of functions on X that are l.s.c. and bounded from below. The space L(X) is useful to characterize weak convergence of measures.
B.1 Weak Convergence of Measures
307
Proposition B.10. Let (µn ) ⊂ M (X)+ be a bounded sequence. Then µn ⇒ µ if and only if Z Z lim inf f dµn ≥ f dµ, ∀f ∈ L(X). (B.3) n→∞
X
X
Finally we end up this section with the following two important results. Recall that the support of a Borel measure µ on Rn is a closed Borel set, the complement of the largest open set O ⊂ Rn with µ(O) = 0. Theorem B.11. Let f1 , . . . , fm : X→R be Borel measurable on a measurable space X and let µ be a probability measure on X such that f i is integrable with respect to µ for each i = 1, . . . , m. Then there exists a probability measure ϕ with finite support on X, such that: Z Z fi dϕ = fi dµ, ∀ i = 1, . . . , m. X
X
Moreover, the support of ϕ may consist of at most m + 1 points. A cubature formula of degree d for a Borel measure µ on Rn , is given by an integer m ≥ 1, points {xi }m i=1 in the support of µ and positive scalars {λi }m such that i=1 Z m X f (x) dµ = λi f (xi ), Rn
i=1
for all polynomials f ∈ R[x] of degree at most d. The next result is the basis of existence of cubature formulas for numerical integration and is an extension of the celebrated Tchakaloff’s theorem.
Theorem B.12. Let µ be a finite Borel measure with support K ⊂ Rn and let d ≥ 1 be a fixed positive integer. Then there exist m points n+d {xi }m ⊂ K (with m ≤ ) and positive weights λi , i = 1, . . . , m, i=1 n such that Z m X f dµ = λi f (xi ), K
i=1
for every polynomial f ∈ R[x] of degree at most d.
B Measures, Weak Convergence and Marginals
308
Theorem B.12 states that for an arbitrary finite Borel measure µ supported on K ⊂ Rn , one may always find a Borel measure ϕ finitely supported on K, such that all moments of ϕ up to degree m, match exactly those of µ. It extends to arbitrary measures supported on K, Tchakaloff’s theorem proved for measures supported on a compact set K and absolutely continuous with respect to the Lebesgue measure on Rn (see Tchakaloff (1957)).
B.2
Measures with Given Marginals
In this section we address the following issue: Let I = {1, . . . , n} = ∪pi=1 Ik with |Ik | =: nk , and let Ijk := Ij ∩ Ik (whenever IJ ∩ Ik 6= ∅), with |Ijk | =: njk , j 6= k. In general, the Ik ’s do P not form a partition of I as one may have k nk > n. Let x = (x1 , . . . , xn ) and for any J ⊆ I denote by x(J) the vector (xi : i ∈ J). For every i = 1, . . . , p, let µi be a given probability measure on Rni , supported on a compact set Ki ⊂ Rni . Whenever Ij ∩ Ik 6= ∅, denote by πjk µj the projection of µj on Rnjk (its marginal on the variables x(Ijk )). Let K ⊂ Rn be the set K = { x ∈ Rn : x(Ii ) ∈ Ki ,
i = 1, . . . , p },
(B.4)
and assume that πjk µj = πkj µk ,
∀j, k = 1, . . . , p; Ij ∩ Ik 6= ∅.
(B.5)
Can we construct a probability measure µ on Rn and supported on K, such that its projection πj µ on Rnj (i.e. its marginal on the variables x(Ij )) is equal to µj , for every j = 1, . . . , p? Lemma B.13. Let I = {1, . . . , n} = ∪pi=1 Ik with |Ik | =: nk . Let Ki ⊂ Rni be given compact sets, and let K ⊂ Rn be the compact set defined in (B.4). Let µi be given probability measures supported on Ki , i = 1, . . . , p, and such that (B.5) holds. If the collection of sets {Ik } satisfies the running intersection property, i.e., \ k−1 [ Ik Ij ⊆ Is for some s ≤ k, k = 2, . . . , p, (B.6) j=1
B.2 Measures with Given Marginals
309
then there exists a probability measure µ supported on K, such that πj µ = µ j ,
j = 1, . . . , p,
that is, µj is the marginal of µ on Rnj (i.e., with respect to the variables x(Ij ), j = 1, . . . , p). Pm Proof. For every 1 ≤ m ≤ p, let n(m) := j=1 nj . Let B(Rn ) denote the Borel σ-algebra associated with Rn . We construct µ by induction on k = 1, . . . , p. Induction property (H): With 1 ≤ m < p, there is a probability measure νm supported on K(m) := {x ∈ Rn(m) : x(Ij ) ∈ Kj ,
j = 1, . . . , m}
such that πk νm = µk , i.e., the projection of νm on Rnk (on the variables x(Ik )) is µk , for every k = 1, . . . , m. (H) is true for m = 1 (just take ν1 := µ1 ). We next prove that it is true for m + 1. Define a measure νm+1 on R(m+1) as follows: Let I m := Im+1 ∩ (∪m k=1 Ik ) with cardinal τ . (a) If I m = ∅ then νm+1 (A × B) := νm (A) µm+1 (B),
∀A ∈ B(Rn(m) ), B ∈ B(Rnm+1 )
And νm+1 satisfies property (H). (b) If I m 6= ∅ then by (B.6) I m+1 ⊆ Is for some s ≤ m. Let dϕ(x(I m )) be the projection of µm+1 on Rτ (on the variables x(I m )). By (B.5), ϕ is also the projection of νm on Rτ (with respect to x(I m )). Next, the probability measure νm can be disintegrated into the product of its associated conditional probability νm (• | x(I m )) and its marginal ϕ on Rτ (with respect to the variables x(I m )). That is Z νm (A × B) = νm (A | x(I m )) dϕ(x(I m )) B
for all hyper-rectangles A × B with A ∈ B(Rn(m)−τ ) and B ∈ B(Rτ ). Similarly, µm+1 can also be disintegrated into the product of its associated conditional probability νm+1 (• | x(I m )) and its marginal ϕ, that is: Z µm+1 (A × B) = µm+1 (A | x(I m )) dϕ(x(I m )) B
for every hyper-rectangle A × B with A ∈ B(Rnm+1 −τ ), and B ∈ B(Rτ ).
310
B Measures, Weak Convergence and Marginals
Therefore, define νm+1 on Rn(m+1) as: Z νm+1 (A × B × C) := νm (A | x(I m+1 )) µm+1 (C | x(I m )) dϕ(x(I m )), B
for every hyper-rectangle A × B × C, with A ∈ B(Rn(m)−τ ), B ∈ B(Rτ ), and C ∈ B(Rnm+1 −τ ). Then, with A := Rn(m)−τ , νm (Rn(m)−τ |x(I m )) = 1 so that Z νm+1 (A × B × C) := µm+1 (C | x(I m+1 )) dϕ(x(I m )) = µm+1 (B × C), B
which shows that µm+1 = πm+1 νm+1 , the marginal of νm+1 on Rnm+1 (on the variables x(Im+1 )). Similarly, with C := Rτ , µm+1 (Rτ |x(I m )) = 1 so that Z νm+1 (A × B × C) := νm (A | x(I m+1 )) dϕ(x(I m )) = νm (A × B), B
which shows that νm is the marginal of νm+1 on Rn(m) (on the variables x(∪m i=1 Ik )). But by (H), πk νm = µk , for every k = 1, . . . , m and so πk νm+1 = µk , for all k = 1, . . . , m. Hence πk νm+1 = µk ,
k = 1, . . . , m + 1.
As νm is supported on K(m) and µm+1 on Km , it follows that νm+1 is supported on K(m + 1) := {x ∈ Rn(m+1) : x(Ij ) ∈ Kj ,
j = 1, . . . , m + 1}.
Hence (H) propagates to m+1. Finally, as K(p) ≡ K, we obtain the desired result that µ (:= νp ) is a probability measure on K, with marginal µi on Ki for every i = 1, . . . , p.
B.3
Notes and Sources
Most of Section B.1 is from (Hern´ andez-Lerma and Lasserre, 2003, Chap. 1) and (Ash, 1972, A6). Theorem B.11 can be found in e.g. Mulholland and Rogers (1958); see also (Anastassiou, 1993, Theor. 2.1.1, p. 39), and Theorem B.12 is from Bayer and Teichmann (2006) who refined a result of Putinar (2000). Section B.2 is from Lasserre (2006a).
Appendix C
Some Basic Results in Optimization
C.1
Non Linear Programming
Let O ⊂ Rn be an open subset and consider the optimization problem P defined by:
P:
inf { f (x) : x ∈ K }
(C.1)
x
with K := { x ∈ Rn : gj (x) ≥ 0, j = 1, . . . , m;
hk (x) = 0, k = 1, . . . , p } (C.2)
where f : O→ R is differentiable and gj , hk : Rn → R are continuously differentiable functions for every j = 1, . . . , m, and k = 1, . . . , p. Recall that x∗ ∈ O ∩ K is a local minimizer of f on K if there exists an open neighborhood C of x∗ such that f (x) ≥ f (x∗ ) for all x ∈ C ∩ K. Theorem C.14 (Fritz-John optimality conditions). If x∗ ∈ O ∩K is a local minimum of f on K, there exist λ ∈ Rm and ψ ∈ Rp+1 such that: ∗
(i) ψ0 ∇f (x ) = (ii)
λj ≥ 0,
m X j=1
∗
λj ∇gj (x ) +
j = 1, . . . , m
∗
(iii) λj gj (x ) = 0, j = 1, . . . , m.
311
p X k=1
ψk ∇hk (x∗ )
(C.3) (C.4) (C.5)
C Some Basic Results in Optimization
312
The variables (λ, ψ) in Theorem C.14 are called the dual multipliers. Notice that if ψ0 = 0, the above conditions are not very informative about a local minimizer and so, one would like to find a condition that ensures ψ0 6= 0 in (C.3). Therefore, given x ∈ K, consider the so-called constraint qualification condition: n ∃u0 ∈ R : h∇hk (x), u0 i = 0, ∀k = 1, . . . , p. QC(x) the vectors{∇hk (x)}pk=1 are linearly independent h∇gj (x), u0 i > 0 whenever gj (x) = 0.
Under some additional condition one may obtain the more informative optimality conditions due to Karush, Kuhn and Tucker (and called the KKT optimality conditions) where the dual multiplier ψ0 in (C.3) can be taken to be 1. Theorem C.15 (KKT optimality conditions). If x∗ ∈ O ∩ K is a local minimum of f on K, and if the constraint qualification QC(x∗ ) holds, there exist λ ∈ Rm and ψ ∈ Rp such that: (i) (ii)
∇f (x∗ ) = λj ≥ 0, ∗
m X j=1
λj ∇gj (x∗ ) +
j = 1, . . . , m
(iii) λj gj (x ) = 0, j = 1, . . . , m
p X k=1
ψk ∇hk (x∗ )
(C.6) (C.7) (C.8)
The dual variables (λ, ψ) ∈ Rm × Rp are called the Lagrange-KKT multipliers. Another constraint qualification, stronger than QC(x), states that all vectors {∇hk (x)} and and {∇gj (x)}j:gj (x∗ )=0 are linearly independent. Convexity: In the presence of convexity the KKT-conditions become sufficient, that is,
Theorem C.16. Let f be convex on the open convex O ⊂ Rn , hk : Rn → R be affine for every k = 1, . . . , p, and gj : Rn →R be concave differentiable for every j = 1, . . . , m. If x∗ ∈ O ∩ K satisfies (C.6)-(C.8) then x∗ is a local minimizer of f sur O ∩ K. In addition, because of convexity, every local minimizer x∗ ∈ O ∩ K is a global minimizer of f on O ∩ K.
C.2 Semidefinite Programming
313
Finally, under the convexity conditions of Theorem C.16, write the affine function hk as x 7→ hk (x) := a0k x + bk fort some ak ∈ Rn , bk ∈ R, for every k = 1, . . . , p. The following so-called Slater’s condition ensures that QC(x) holds at every x ∈ K. Slater:
The vectors {ak }pk=1 are linearly independent and ∃x0 ∈ K : gj (x0 ) > 0, j = 1, . . . , m; hk (x0 ) = 0, k = 1, . . . , p.
Hence, under the conditions of Theorem C.16 and Slater’s condition, the conditions (C.6)-(C.8) are necessary and sufficient for x∗ ∈ O ∩ K to be a global minimizer of f on O ∩ K. Next, the Lagrangian Lf : O→R associated with x∗ ∈ O ∩ K and (λ, ψ) ∈ Rm × Rp , is defined as:
x 7→ Lf (x) := f (x) − f (x∗ ) −
m X j=1
λj gj (x) −
p X
ψk hk (x).
(C.9)
k=1
If x∗ ∈ O ∩ K and (λ, ψ) ∈ Rm × Rp satisfy the KKT-optimality conditions (C.6)-(C.8) then x∗ is a stationary point of Lf . Moreover, in the convex case, i.e., under the conditions of Theorem C.16, the Lagrangian Lf is convex and nonnegative on O with (global) minimum Lf (x∗ ) = 0 on O attained at x∗ ∈ K. In particular, if O = Rn then Lf is convex nonnegative on the whole space Rn with minimum 0 attained at x∗ ∈ K. Most Nonlinear Programming algorithms try to find a pair ∗ (x , (λ∗ , ψ ∗ )) ∈ Rn × Rm × Rp that satisfy the KKT-optimaliy conditions (C.6)-(C.8).
C.2
Semidefinite Programming
Conic programming is a subarea of convex optimization that refers to linear optimization problems over general convex cones. And semidefinite programming (in short, SDP) is a particular case of conic programming when one considers the convex cone of positive semidefinite matrices, whereas linear programming considers the positive orthant of Rn , a polyhedral convex cone.
C Some Basic Results in Optimization
314
Let Sp ⊂ Rp×p be the space of real p × p symmetric matrices. Whenever A, B ∈ Sp , the notation A B (resp. A B) stands for A − B is positive semidefinite (resp. positive definite). Also, the notation hA, Bi stands for trace (AB) (= trace (BA)). In canonical form, a semidefinite program reads:
P:
inf {c0 x : F0 + x
n X i=1
F i xi 0 }
(C.10)
where c ∈ Rn , and {Fi }ni=0 ⊂ Sp for some p ∈ N. Denote by inf P the optimal value of P (possibly −∞ if unbounded or +∞ if P has no feasible solution x ∈ Rn ). If the optimal value is attained at some x∗ ∈ Rn then write inf P = min P. Pn The semidefinite constraint F0 + i=1 Fi xi 0 is also called a Linear Matrix Inequality (LMI). That P in (C.10) is a convex optimization problem can be seen as follows: Consider the mapping F : Rn →Sp defined by: x 7→ F(x) := F0 +
n X
F i xi ,
i=1
x ∈ Rn .
The constraint F(x) 0 is the same as λmin (F(x)) ≥ 0 where the function x 7→ λmin (F(x)) maps x ∈ Rn to the smallest eigenvalue of the real symmetric matrix F(x) ∈ Sp . But the smallest eigenvalue of a real symmetric matrix is a concave function of its entries, and the entries of F(x) are affine functions of x. Hence the set {x : F(x) 0} is convex and therefore, P consists of minimizing a linear functional on a convex set, i.e., a convex optimization problem. Observe that if the matrices (Fi ) are diagonal then P is just a finite dimensional linear programming problem. Conversely, consider the linear programming problem inf { c0 x : a0j x ≥ bj , j = 1, . . . , p;
x∈Rn
x ≥ 0 },
and let F(x) ∈ Sp be the diagonal matrix defined by F(x)jj = a0j x − bj for every j = 1, . . . , p. Then the above linear program is also the SDP inf x {c0 x : F(x) 0 }.
C.2 Semidefinite Programming
315
Semidefinite programming is a non trivial extension of linear programming. Indeed, while the latter considers the positive orthant Rn+ (a polyhedral convex cone), the former considers the non polyhedral convex cone of positive semidefinite matrices. Duality The dual problem associated with P is the convex optimization problem:
P∗ :
sup {−hF0 , Zi : hFi , Zi = ci ,
Z∈Sp
i = 1, . . . , n; Z 0 }
(C.11)
with optimal value denoted sup P∗ (possibly +∞ if unbounded or −∞ is there is no feasible solution Z 0). If the optimal value is attained at some Z∗ 0 then write sup P∗ = max P∗ . In fact P∗ is also a semidefinite program. Indeed, the set S := {Z ∈ Sp : hFi , Zi = ci ,
i = 1, . . . , n }
can be put in the form S = {G(z) = G0 +
s X j=1
Gj z j : z ∈ R s }
n for some appropriate matrices (Gj )s+1 j=0 ⊂ Sp . (For instance, if the (Fi )i=1 are linearly independent then s = p(p + 1)/2 − m.) Next, setting d = (dj ) ∈ Rs with dj := hF0 , Gj i for every j = 0, . . . , s, P∗ reads: 0
h−F0 , G0 i + max { −d z : G0 + z
s X j=1
Gj zj 0 },
which is of the same form as P. Weak duality states that inf P ≥ sup P∗ and holds without any assumption. Indeed, let x ∈ Rn and 0 Z ∈ Sp be any two feasible solutions of P and P∗ respectively. Then: * n + n X X 0 cx = hFi , Zi xi = Fi xi , Z ≥ h−F0 , Zi, i=1
i=1
316
C Some Basic Results in Optimization
where we have used that if Z 0 and A B then hA, Zi ≥ hB, Zi. And so inf P ≥ sup P∗ . The nonnegative quantity inf P − sup P∗ is called the duality gap. The absence of a duality gap (i.e., inf P = sup P∗ ) does not hold in general. However, it happens under some strict feasibility conditions. Theorem C.17 (Strong duality). Let P and P∗ be as in (C.10) and (C.11) respectively. (i) If there exists x ∈ Rn such that F(x) 0 then inf P = sup P∗ and inf P = max P∗ if the optimal value is finite. (ii) If there exists Z 0 feasible for P∗ then inf P = sup P∗ and min P = sup P∗ if the optimal value is finite. (iii) If there exists Z 0 feasible for P∗ and x ∈ Rn such that F(x) 0, then min P = max P∗ . The strict feasiblity condition in Theorem C.17(i) (resp. Theorem C.17(ii)) is a specialization of Slater’s condition in convex programming, when applied to the convex problem P (resp. applied to the convex problem P∗ ). Computational complexity What makes semidefinite programming a powerful technique is its computational complexity when using algorithms based on interior point methods. Indeed one may find an approximate solution within prescribed accuracy > 0, in a time that is polynomial in the input size of the problem. For instance, in using path following algorithms based on the Nesterov-Todd √ search directions, one only needs O( n ln(1/)) iterations to find such an approximate solution. See e.g. Tuncel (2000). C.3
Infinite-dimensional Linear Programming
Let X , Y be two arbitrary real vector spaces and let h·, ·i be a bilinear form on X × Y . The couple (X , Y ) is called a dual pair of vector spaces if it satisfies: • For each 0 6= x ∈ X there exists y ∈ Y such that hx, yi 6= 0. • For each 0 6= y ∈ Y there exists x ∈ X such that hx, yi 6= 0. Given a dual pair (X , Y ), the weak topology on X (denoted σ(X , Y ))
C.3 Infinite-dimensional Linear Programming
317
is the coarsest topology under which all the elements of Y are continuous when regarded as linear forms h·, yi on X . Equivalently, in this topology, the base of neighborhoods of the origin is given by {x ∈ X : |hx, yi| ≤ 1, ∀y ∈ I}, where I is a finite subset of Y . An important particular case is when Y is a normed vector space and X is the strong dual of Y (denoted Y ∗ ), i.e., X is the space of linear forms on Y , continuous for the topology induced by the norm. Then the weak topology σ(X , Y ) (= σ(Y ∗ , Y )) is called the weak ? topology on X and has the following fundamental property. Theorem C.18 (Banach-Alaoglu Theorem). The unit ball B := {x ∈ Y ∗ : kxk ≤ 1} of Y ∗ is compact in the weak ? topology. In addition, if Y is separable then B is weak ? sequentially compact. An infinite-dimensional linear program requires the following components: • two dual pairs of vector spaces (X , Y ), (Z , W ); • the induced associated weak topologies on each vector space; • a weakly continuous linear map G : X →Z , with adjoint G∗ : W →Y ; • a positive convex cone C in X , with dual cone C ∗ in Y ; and • vectors b ∈ Z and c ∈ Y . Then the primal linear program is P:
minimize hx, ci subject to: Gx = b,
x ∈ C.
(C.12)
The corresponding dual linear program is P∗ :
maximize hb, wi subject to: c − G∗ w ∈ C ∗ ,
w∈W.
(C.13)
Notice that there is a lot of freedom for choosing the vector spaces X , Y , W , Z . For instance, when X and Z are normed linear spaces, in general Y is not the strong dual X ∗ of X , and similarly, W is not the strong dual Z ∗ of Z . This is because one usually wants to work with spaces whose elements have a simple physical interpretation. An element x ∈ X is called feasible for P if x ∈ C and Gx = b; P is said to be consistent, if it has a feasible solution. If P is consistent, then
C Some Basic Results in Optimization
318
its value is defined as inf P := inf {hx, ci :
x is feasible for P};
otherwise, inf P = +∞. The linear program P is solvable if there is some feasible solution x∗ ∈ X that achieves the value inf P; then x∗ is an optimal solution of P, and one then writes inf P = min P. The same definitions apply for the dual linear program P∗ . The next weak duality result can be proved as in elementary (finitedimensional) linear programming. Proposition C.19. If P and P∗ are both consistent, then their values are finite and satisfy sup P∗ ≤ inf P. There is no duality gap if sup P∗ = inf P, and strong duality holds if max P∗ = min P, i.e., if there is no duality gap, and both P∗ and P are solvable.
Theorem C.20. Let D, H be the sets in Z × R, defined as D := {(Gx, hx, ci) :
x ∈ C}
H := {(Gx, hx, ci + r) :
x ∈ C, r ∈ R+ }
(C.14) (C.15) (C.16)
If P is consistent with finite value, and D or H is weakly closed (i.e., closed in the weak topology σ(Z × R, W × R)), then P is solvable and there is no duality gap, i.e., sup P∗ = min P.
C.4
Proof of Theorem 1.3
(a) To simplify notation we prove Theorem 1.3 in the case where all constraints are equality constraints. Let B(K) be the space of bounded measurable functions on K. The pairs of vector spaces: (Z , W ) := (R|Γ| , R|Γ| ), R form two dual pairs with duality bracket hµ, gi := K g dµ, and the usual scalar product in R|Γ| . Then (1.1) is the infinite-dimensional linear (X , Y ) := (M(K) , B(K));
C.5 Notes and Sources
319
optimization problem (C.12) with C := M(K)+ , and where the linear map G : X →Z is given by Z µ 7→ Gµ = hj dµ ∈ R|Γ| , K
P
with adjoint G∗ : λ 7→ G∗ λ := j λj hj . The map G is weakly continuous because G∗ (R|Γ| ) ⊂ B(K) (as the hj ’s are bounded on K). Therefore, as we maximize, it suffices to prove that the set H in (C.15), i.e., H := {(Gµ, hµ, f i − r) :
µ ∈ C, r ∈ R+ },
|Γ|+1 is weakly closed (i.e., here, closed in the usual topology ). R of R Let (µn , rn ) ∈ C × R+ be such that Gµn →b and K f dµn − rn →a for some R(b, a) ∈ R|Γ| × R. As K ⊂ Rn is compact and hk > 0 for some k ∈ Γ, from K hk dµn →bk , one deduces that the sequence (µn (K))n is bounded and therefore, there is a subsequence {nj } and a measure µ ∈ C such that µnj ⇒ µ (see Definition B.1 in Appendix B). Hence, R as hi is continuous for every i ∈ Γ, we obtain b = limj→∞ Gµnj = ( K hj dµ). Next, recall that f is upper semicontinuous R (and bounded on K). Therefore by RProposition B.10, R lim supj K f dµnj ≤ K f dµ. Moreover, from a = limj→∞ K f dµnj − rnj we infer that rnj ⊂ R+ is bounded above and so for another subsequence of {nj } (still denoted {nj } for convenience) we have rnj →r ∈ R+ . But then Z Z a = lim sup f dµnj − rnj ≤ f dµ − r j→∞
K
K
R
0 0 so that a = K R f dµ − Rr for some0 r ≤ r ∈ R+ .0 Hence we have proved that (b, a) = ( K hj dµ, K f dµ − r ) for some (µ, r ) ∈ C × R+ , that is, the set H is weakly closed. From Theorem C.20, the moment problem (1.1) is solvable and there is no duality gap, i.e., ρmom = ρpop . (b) Let µ be an optimal solution of (1.1). As f and the hj ’s are all bounded on K, then by Theorem B.11, there onR finitely R exists a Rmeasure ν supported R many points of K and such that K f dν = K f dµ, and K hj dν = K hj dµ, for all j ∈ Γ.
C.5
Notes and Sources
Most of Section C.1 is from Hiriart-Urruty (1998) while Section C.2 is inspired from Vandenberghe and Boyd (1996) and most of Section C.3 is
320
C Some Basic Results in Optimization
taken from Anderson and Nash (1987); in particular, see (Anderson and Nash, 1987, Theorems 3.10 and 3.22.).
Appendix D
The GloptiPoly Software
D.1
Presentation
In this chapter we describe and report on the GloptiPoly software that implements the methodology described in earlier chapters. GloptiPoly is a Matlab1 freeware to solve (or at least approximate) the Generalized Moment Problem (GMP) in (1.1) with polynomial data, that is, when the set K is a basic semi-algebraic set and f, (hj ) are polynomials. It can also solve the several extensions of the GMP described in Chapter 4. GloptiPoly 3 is an extension of of the former version 2 of GloptiPoly described in Henrion and Lasserre (2003) dedicated so solving the global optimization problem (5.2) of Chapter 5. The software automatically generates and solves the hierarchy of semidefinite relaxations (4.5). It is a user-friendly package that researchers and students can experiment easily. It can also be used as a tutorial material to illustrate the use of semidefinite relaxations for solving the GMP. So far, GloptiPoly is aimed at solving small- and medium-scale problems. D.2
Installation
GloptiPoly 3 is a freeware subject to the General Public Licence (GPL) policy. It is available for Matlab 7.2 (Release 2006) and later versions. It can be downloaded at www.laas.fr/∼henrion/software/gloptipoly3
1 Matlab
is a trademark of The MathWorks, Inc.
321
D The GloptiPoly Software
322
The package, available as a compressed archive, consists of several mfiles and subdirectories, and it contains no binaries. Extracted files are placed in a gloptipoly3 directory that should be declared in the Matlab working path, e.g. using the Matlab command >> addpath gloptipoly3 To solve the semidefinite relaxation (4.5) GloptiPoly 3 uses by default the semidefinite programming solver SeDuMi of Sturm (1999); so this package should be properly installed. Other semidefinite solvers can also be used provided they are installed and interfaced through YALMIP of L¨ ofberg (2004). D.3
Getting started
Please type the command >> gloptipolydemo to run interactively the basic example that follows. Consider the classical problem of minimizing globally the twodimensional six-hump camel back function Henrion and Lasserre (2003) 1 min2 g0 (x) = 4x21 + x1 x2 − 4x22 − 2.1x41 + 4x42 + x61 . 3 x∈R The function has six local minima, two of them being global minima. Using GloptiPoly 3, this optimization problem can be modeled as an instance of the moment problem as follows: >> mpol x1 x2 >> g0 = 4*x1ˆ2+x1*x2-4*x2ˆ2-2.1*x1ˆ4+4*x2ˆ4+x1ˆ6/3 Scalar polynomial 4x1ˆ2+x1x2-4x2ˆ2-2.1x1ˆ4+4x2ˆ4+0.33333x1ˆ6 >> P = msdp(min(g0)); GloptiPoly 3.0 Define moment SDP problem ... (GloptiPoly output suppressed) ... Generate moment SDP problem >> P
D.4 Description
Moment SDP problem Measure label Relaxation order Decision variables Semidefinite inequalities
= = = =
323
1 3 27 10x10
Once the moment problem is modeled, a semidefinite solver (here SeDuMi) can be used to solve it numerically. >> [status,obj] = msol(P) GloptiPoly 3.0 Solve moment SDP problem ***************************************************** Calling SeDuMi SeDuMi 1.1R3 by AdvOL, 2006 and Jos F. Sturm, 1998-2003. ... (SeDuMi output suppressed) ... 2 globally optimal solutions extracted >> status status = 1 >> obj obj = -1.0316 >> x = double([x1 x2]); x(:,:,1) = 0.0898 -0.7127 x(:,:,2) = -0.0898 0.7127 The flag status = 1 means that the moment problem is solved successfully and that GloptiPoly 3 can extract two globally optimal solutions reaching the objective function obj = -1.0316. D.4
Description
GloptiPoly 3 uses advanced Matlab features for object-oriented programming and overloaded operators. The user should be familiar with the following basic objects.
D The GloptiPoly Software
324
D.4.1
Multivariate polynomials (mpol)
A multivariate polynomial is an affine combination of monomials, each monomial depending on a set of variables. Variables can be declared in the Matlab working space as follows: >> clear >> mpol x >> x Scalar polynomial x >> mpol y 2 >> y 2-by-1 polynomial vector (1,1):y(1) (2,1):y(2) >> mpol z 3 2 >> z 3-by-2 polynomial matrix (1,1):z(1,1) (2,1):z(2,1) (3,1):z(3,1) (1,2):z(1,2) (2,2):z(2,2) (3,2):z(3,2) Variables, monomials and polynomials are defined as objects of class mpol. All standard Matlab operators have been uploaded for mpol objects: >> y*y’-z’*z+xˆ3 2-by-2 polynomial matrix (1,1):y(1)ˆ2-z(1,1)ˆ2-z(2,1)ˆ2-z(3,1)ˆ2+xˆ3 (2,1):y(1)y(2)-z(1,1)z(1,2)-z(2,1)z(2,2)-z(3,1)z(3,2)+xˆ3 (1,2):y(1)y(2)-z(1,1)z(1,2)-z(2,1)z(2,2)-z(3,1)z(3,2)+xˆ3 (2,2):y(2)ˆ2-z(1,2)ˆ2-z(2,2)ˆ2-z(3,2)ˆ2+xˆ3 Use the instruction >> mset clear to delete all existing GloptiPoly variables from the Matlab working space.
D.4 Description
D.4.2
325
Measures (meas)
Variables can be associated with real-valued measures, and one variable is associated with only one measure. For GloptiPoly 3, measures are identified with a label, a positive integer. When starting a GloptiPoly session, the default measure has label 1. By default, all created variables are associated with the current measure. Measures can be handled with the class meas as follows: >> mset clear >> mpol x >> mpol y 2 >> meas Measure 1 on 3 variables: x,y(1),y(2) >> meas(y) % create new measure Measure 2 on 2 variables: y(1),y(2) >> m = meas 1-by-2 vector of measures 1:Measure 1 on 1 variable: x 2:Measure 2 on 2 variables: y(1),y(2) >> m(1) Measure number 1 on 1 variable: x The above script creates a measure µ1 (dx) on R and a measure µ2 (dy) on R2 . Use the instruction >> mset clearmeas to delete all existing GloptiPoly measures from the working space. Note that this does not delete existing GloptiPoly variables. D.4.3
Moments (mom)
Linear combinations of moments of a given measure can be manipulated with the mom class as follows: >> mom(1+2*x+3*xˆ2) Scalar moment I[1+2x+3xˆ2]d[1] >> mom(y*y’) 2-by-2 moment matrix (1,1):I[y(1)ˆ2]d[2]
326
D The GloptiPoly Software
(2,1):I[y(1)y(2)]d[2] (1,2):I[y(1)y(2)]d[2] (2,2):I[y(2)ˆ2]d[2] R The notation I[p]d[k] stands for p dµk where p is a polynomial of the variables associated with the measure µk , and k is the measure label. Note that it makes no sense to define moments over several measures, or nonlinear moment expressions: >> mom(x*y(1)) ??? Error using ==> mom.mom Invalid partitioning of measures in moments >> mom(x)*mom(y(1)) ??? Error using ==> mom.times Invalid moment product Note also the distinction between a constant term and the mass of a measure: >> 1+mom(x) Scalar moment 1+I[x]d[1] >> mom(1+x) Scalar moment I[1+x]d[1] >> mass(x) Scalar moment I[1]d[1] Finally, let us mention three equivalent notations to refer to the mass of a measure: >> mass(meas(y)) Scalar moment I[1]d[2] >> mass(y) Scalar moment I[1]d[2] >> mass(2) Scalar moment I[1]d[2]
D.4 Description
327
The first command refers explicitly to the measure, the second command is a handy short-cut to refer to a measure via its variables, and the third command refers to GloptiPoly’s labeling of measures. D.4.4
Support constraints (supcon)
By default, a measure on n variables is defined on the whole Rn . We can restrict the support of a mesure to a given semialgebraic set as follows: >> 2*xˆ2+xˆ3 == 2+x Scalar measure support equality 2xˆ2+xˆ3 == 2+x >> disk = (y’*y x+y(1) supcon.supcon Invalid reference to several measures D.4.5
Moment constraints (momcon)
We can constrain linearly the moments of several measures: >> mom(xˆ2+2) == 1+mom(y(1)ˆ3*y(2)) Scalar moment equality constraint I[2+xˆ2]d[1] == 1+I[y(1)ˆ3y(2)]d[2] >> mass(x)+mass(y) min(mom(xˆ2+2)) Scalar moment objective function min I[2+xˆ2]d[1] >> max(xˆ2+2) Scalar moment objective function max I[2+xˆ2]d[1] The latter syntax is a handy short-cut which directly converts an mpol object into an momcon object. D.4.6
Floating point numbers (double)
Variables in a measure can be assigned numerical values: >> m1 = assign(x,2) Measure 1 on 1 variable: x supported on 1 point which is equivalent to enforcing a discrete support for the measure. Here µ1 is set to the Dirac at the point 2. The double operator converts a measure or its variables into a floating point number: >> double(x) ans = 2 >> double(m1) ans = 2 Polynomials can be evaluated in a similar fashion: >>double(1-2*x+3*xˆ2) ans = 9 Discrete measure supports consisting of several points can be specified in an array: >> m2 = assign(y,[-1 2 0;1/3 1/4 -2]) Measure 2 on 2 variables: y(1),y(2) supported on 3 points >> double(m2)
D.5 Solving Moment Problems (msdp)
329
ans(:,:,1) = -1.0000 0.3333 ans(:,:,2) = 2.0000 0.2500 ans(:,:,3) = 0 -2
D.5
Solving Moment Problems (msdp)
Once a moment problem is defined, it can be solved numerically with the instruction msol. In the sequel we give several examples of GPMs handled with GloptiPoly 3. D.5.1
Unconstrained minimization
In Section D.3 we already encountered an example of an unconstrained polynomial optimization solved with GloptiPoly 3. Let us revisit this example: >> mset clear >> mpol x1 x2 >> g0 = 4*x1ˆ2+x1*x2-4*x2ˆ2-2.1*x1ˆ4+4*x2ˆ4+x1ˆ6/3 Scalar polynomial 4x1ˆ2+x1x2-4x2ˆ2-2.1x1ˆ4+4x2ˆ4+0.33333x1ˆ6 >> P = msdp(min(g0)); ... >> msol(P) ... 2 globally optimal solutions extracted Global optimality certified numerically This indicates that the global minimum is attained with a discrete measure supported on two points. The measure can be constructed from the knowledge of its first moments of degree up to 6: >> meas
330
D The GloptiPoly Software
Measure 1 on 2 variables: x1,x2 with moments of degree up to 6, supported on 2 points >> double(meas) ans(:,:,1) = 0.0898 -0.7127 ans(:,:,2) = -0.0898 0.7127 >> double(g0) ans(:,:,1) = -1.0316 ans(:,:,2) = -1.0316 When converting to floating point numbers with the operator double, it is essential to make the distinction between mpol and mom objects: >> v = mmon([x1 x2],2)’ 1-by-6 polynomial vector (1,1):1 (1,2):x1 (1,3):x2 (1,4):x1ˆ2 (1,5):x1x2 (1,6):x2ˆ2 >> double(v) ans(:,:,1) = 1.0000 0.0898 -0.7127 ans(:,:,2) = 1.0000 -0.0898 0.7127 >> double(mom(v)) ans = 1.0000 0.0000 -0.0000
0.0081
-0.0640
0.5079
0.0081
-0.0640
0.5079
0.0081
-0.0640
0.5079
The first instruction mmon generates a vector of monomials v of class mpol, so the command double(v) calls the convertor @mpol/double which evaluates a polynomial expression on the discrete support of a measure (here two points). The last command double(mom(v)) calls the convertor @mom/double which returns the value of the moments obtained after solving the moment problem.
D.5 Solving Moment Problems (msdp)
331
Note that when inputing moment problems on a unique measure whose mass is not constrained, GloptiPoly assumes by default that the measure has mass one, i.e. that we are seeking a probability measure. Therefore, if g0 is the polynomial defined previously, the two instructions >> P = msdp(min(g0)); and >> P = msdp(min(g0), mass(meas(g0))==1); are equivalent. D.5.2
Constrained minimization
Consider the non-convex quadratic problem min −2x1 + x2 − x3 s.t. 24 − 20x1 + 9x2 − 13x3 + 4x21 − 4x1 x2 + 4x1 x3 + 2x22 − 2x2 x3 + 2x23 ≥ 0 x1 + x2 + x3 ≤ 4, 3x2 + x3 ≤ 6 0 ≤ x1 ≤ 2, 0 ≤ x2 , 0 ≤ x3 ≤ 3
Each constraint in this problem is interpreted by GloptiPoly 3 as a support constraint on the measure associated with x. >> mpol x 3 >> x(1)+x(2)+x(3) mpol x 3 >> g0 = -2*x(1)+x(2)-x(3); >> K = [24-20*x(1)+9*x(2)-13*x(3)+4*x(1)ˆ2-4*x(1)*x(2) ... +4*x(1)*x(3)+2*x(2)ˆ2-2*x(2)*x(3)+2*x(3)ˆ2 >= 0, ... x(1)+x(2)+x(3) > mv = mvec(mu) 10-by-1 moment vector (1,1):I[1]d[1] (2,1):I[x(1)]d[1] (3,1):I[x(2)]d[1] (4,1):I[x(3)]d[1] (5,1):I[x(1)ˆ2]d[1] (6,1):I[x(1)x(2)]d[1] (7,1):I[x(1)x(3)]d[1] (8,1):I[x(2)ˆ2]d[1] (9,1):I[x(2)x(3)]d[1] (10,1):I[x(3)ˆ2]d[1] These moments are the decision variables of the SDP problem solved with the above msol command. Their numerical values can be retrieved as
D.5 Solving Moment Problems (msdp)
333
follows: >> double(mv) ans = 1.0000 2.0000 -0.0000 2.0000 7.6106 1.4671 2.3363 4.8335 0.5008 8.7247 The numerical moment matrix can be obtained using the following commands: >> double(mmat(mu)) ans = 1.0000 2.0000 2.0000 7.6106 -0.0000 1.4671 2.0000 2.3363
-0.0000 1.4671 4.8335 0.5008
2.0000 2.3363 0.5008 8.7247
As explained in Chapter 4, one can build up a hierarchy or relaxations, whose associated monotone sequence optimal values converges to the global optimum, under mild technical assumptions. By default the command msdp builds the relaxation of lowest order, equal to half the degree of the highest degree monomial in the polynomial data. An additional input argument can be specified to build higher order relaxations: >> P = msdp(min(g0), K, 2) ... Moment SDP problem Measure label Relaxation order Decision variables Semidefinite inequalities >> [status,obj] = msol(P) ...
= = = =
1 2 34 10x10+8x(4x4)
334
D The GloptiPoly Software
Global optimality cannot be status = 0 obj = -5.6922 >> P = msdp(min(g0), K, 3) ... Moment SDP problem Measure label Relaxation order Decision variables Semidefinite inequalities >> [status,obj] = msol(P) ... Global optimality cannot be status = 0 obj = -4.0684
ensured
= = = =
1 3 83 20x20+8x(10x10)
ensured
Observe that the semidefinite programming problems involve an increasing number of variables and constraints. They generate a monotone non decreasing sequence of lower bounds on the global optimum, which is eventually reached numerically at the fourth relaxation: >> P = msdp(min(g0), K, 4) ... Moment SDP problem Measure label = 1 Relaxation order = 4 Decision variables = 164 Semidefinite inequalities = 35x35+8x(20x20) >> [status,obj] = msol(P) ... 2 globally optimal solutions extracted Global optimality certified numerically status = 1 obj = -4.0000
D.5 Solving Moment Problems (msdp)
335
>> double(x) ans(:,:,1) = 2.0000 0.0000 0.0000 ans(:,:,2) = 0.5000 0.0000 3.0000 >> double(g0) ans(:,:,1) = -4.0000 ans(:,:,2) = -4.0000 D.5.3
Several measures
GloptiPoly 3 can handle several measures whose moments are linearly related. For example, consider the GMP arising when solving polynomial optimal control problems as detailed in Chapter 10. We are seeking two occupation measures µ(dx, du) and ν(dx) of a state vector x(t) and control vector u(t) which satisfy the ODE dx(t) = f (x, u), x(0) = x0 , dt with f (x, u) a given polynomial mapping and x0 a given initial condition. The measure µ is supported on a given semialgebraic set K1 corresponding to constraints on x and u. The measure ν is supported on a given semialgebraic set K2 corresponding to performance requirements. For example K2 = 0 indicates that state x must reach the origin. Given a polynomial test function g(x) we can relax the dynamics constraint with the moment constraint Z Z g(x)dν(x) − g(x0 ) = h∇g(x), f (x, u)idµ(x, u) K2
K1
linking linearly moments of µ and ν. A lower bound on the minimum time achievable by any feedback control law u(x) is then obtained by minimizing the mass of µ over all possible measures µ, ν satisfying the support and moment constraints. The gap between the lower bound and the exact minimum time is narrowed by enlarging the class of test functions g.
336
D The GloptiPoly Software
In the following script we solve this moment problem in the case of a double integrator with state and input constraints: % bounds on minimal achievable time for optimal control of % double integrator with state and input constraints x0 = [1; 1]; u0 = 0; % initial conditions d = 6; % maximum degree of test function
% analytic minimum time if x0(1) >= -(x0(2)ˆ2-2)/2 tmin = 1+x0(1)+x0(2)+x0(2)ˆ2/2; elseif x0(1) >= -x0(2)ˆ2/2*sign(x0(2)) tmin = 2*sqrt(x0(1)+x0(2)ˆ2/2)+x0(2); else tmin = 2*sqrt(-x0(1)+x0(2)ˆ2/2)-x0(2); end % occupation measure for constraints mpol x1 2 mpol u1 m1 = meas([x1;u1]); % occupation measure for performance mpol x2 2 m2 = meas(x2); % dynamics scaling = tmin; % time scaling f = scaling*[x1(2);u1]; % test function g1 = mmon(x1,d); g2 = mmon(x2,d); % initial condition assign([x1;u1],[x0;u0]); g0 = double(g1); % moment problem P = msdp(min(mass(m1)),... u1ˆ2 = -1,... % state constraint x2’*x2