LOGIC-BASED DECISION SUPPORT: Mixed Integer Model Formulation
ANNALS OF DISCRETE MATHEMATICS
General Editor: Peter L. HAMMER, Rutgers University, New Brunswick, NJ, U.S.A.
Advisory Editors:
C. BERGE, Université de Paris, France
M. A. HARRISON, University of California, Berkeley, CA, U.S.A.
V. KLEE, University of Washington, Seattle, WA, U.S.A.
J.-H. VAN LINT, California Institute of Technology, Pasadena, CA, U.S.A.
G.-C. ROTA, Massachusetts Institute of Technology, Cambridge, MA, U.S.A.
This volume is based on lectures delivered at the First Advanced Research Institute on Discrete Applied Mathematics, supported by the Air Force Office of Scientific Research and held at RUTCOR, Rutgers Center for Operations Research, in May 1985.
40
LOGIC-BASED DECISION SUPPORT Mixed Integer Model Formulation
Robert G. JEROSLOW †
1989
NORTH-HOLLAND
AMSTERDAM · NEW YORK · OXFORD · TOKYO
© Elsevier Science Publishers B.V., 1989
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publishers, Elsevier Science Publishers B.V. (Physical Sciences and Engineering Division), P.O. Box 103, 1000 AC Amsterdam, The Netherlands. Special regulations for readers in the USA: This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owner, Elsevier Science Publishers B.V., unless otherwise specified. No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.
ISBN: 0 444 87119 5 Published by:
ELSEVIER SCIENCE PUBLISHERS B.V P.O. Box 103 1000 AC Amsterdam The Netherlands Sole distributors for the U.S.A. and Canada.
ELSEVIER SCIENCE PUBLISHING COMPANY, INC., 52 Vanderbilt Avenue
New York, N.Y. 10017, U.S.A.
Library of Congress Cataloging-in-Publication Data
Jeroslow, Robert G., 1942-1988.
Logic-based decision support.
(Annals of discrete mathematics ; 40)
Bibliography: p.
1. Decision support systems. 2. Decision-making--Mathematical models. 3. Logic, Symbolic and mathematical. I. Title. II. Series.
T58.62.J47 1989    ISBN 0-444-87119-5
658.4'03
PRINTED IN THE NETHERLANDS
88-31027
Robert G. Jeroslow 1942-1988
ROBERT G. JEROSLOW 1942-1988

On August 31 this year, towards the middle of the Mathematical Programming Symposium in Tokyo, Bob Jeroslow had a fatal heart attack. His early, sudden and completely unexpected death at the age of 46 came as lightning from a clear sky. It was a terrible shock to his friends and colleagues and in a way made all of us newly aware of our vulnerability and the frailty of the human condition. Our profession suffered a heavy loss indeed. Bob started his graduate work in Operations Research (at Columbia first, then at Cornell), but soon switched to Mathematics and wrote his dissertation in Logic, under Anil Nerode. He was a fresh Ph.D. in the summer of ’69 on his way to a job in the Math. Department at the University of Minnesota, when he visited Pittsburgh and we met for the first time. We had several conversations that I found intellectually stimulating, and I showed him some of the problems in Discrete Optimization that I was working on and that I felt could benefit from his background in Logic. We continued our contacts through telephone and correspondence, and the upshot of our interaction was the paper “On the Structure of the Hypercube” (Management Science Research Report No. 198, CMU, August-December 1969). This joint piece of work, published later as “Canonical Cuts on the Unit Hypercube” in the SIAM Journal on Applied Mathematics, 23, 1972, seems to have played a role in luring Bob back to Operations Research, and so in 1972 he joined our group at CMU’s Graduate School of Industrial Administration as Assistant Professor. At CMU Bob flung himself wholeheartedly into the ongoing effort of developing a cutting plane theory for integer and nonconvex programming based on Convex Analysis, that would use different tools and capture different aspects of the problem than the group theoretic approach that was prevalent at the time. Among the people outside CMU involved in this effort, Bob was in contact mainly with Fred Glover.
The result of our joint work in this direction became known as disjunctive programming, or the disjunctive method. It is essentially a theory of optimization over unions of convex polyhedra. Bob wrote several papers on the subject, some by himself, some with me and some with his former doctoral student, Charlie Blair. The topics of these papers range from the basic principles of disjunctive programming (Annals of Discrete Mathematics, 1, 1977; Journal of Optimization Theory and Its Applications, 30, 1980), to structural properties like the sequential convexifiability of facial disjunctive sets (SIAM Journal on Control and Optimization, 18, 1980; Discrete Applied Mathematics, 9, 1984), to methodological contributions like the monoidal cut strengthening procedure for mixed-integer programs that combines the disjunctive method with the algebraic approach (European Journal of Operational Research, 4, 1980). While at CMU, Bob also made some interesting contributions to complexity theory. In one of them (Discrete Mathematics, 4, 1973) he extended the Klee-Minty result about the simplex method requiring exponentially many steps on certain problem classes to a non-standard variant of the simplex method which uses the pivot column choice rule of maximizing the improvement of the objective function value. In 1978, already a full professor, Bob moved to Georgia Tech. During the late seventies and early eighties he wrote a number of important papers with Charlie Blair on the value function of an integer program (Discrete Mathematics, 19, 1977 and 25, 1979; Mathematical Programming, 23, 1982; Discrete Mathematics, 9, 1984 and 10, 1985). These papers deal with issues like subadditive duality and sensitivity analysis in integer programming. Starting around 1982, Bob got interested in problems of integer programming representability (Mathematical Programming Study 22, 1984; Discrete Applied Mathematics, 17, 1987; European Journal of Operational Research, 12, 1988). Here he drew on earlier work by Bob Meyer, as well as on my 1974 technical report on the properties of the convex hull of feasible points of a disjunctive set. That report contained a linear representation of the convex hull of a union of polyhedra in a higher dimensional space. At the time, this representation did not seem important because its dimension grows exponentially with the number of polyhedra in the union. However, in 1982, Bob recognized the crucial fact that if applied selectively to a subset of the constraints instead of the full set, this representation provides one of the chief vehicles towards obtaining formulations whose linear programming relaxations are tight. Bob coined the term “sharp” for representations in which the inequalities by themselves define the integer hull, and obtained several important results concerning such representations. He became convinced that introducing appropriately chosen new variables is in many situations a more efficient way of sharpening a formulation than generating cutting planes in the original variables. We had many lively discussions on this matter and were planning on writing a joint paper on the subject.
Other people with whom he interacted on this topic include Charlie Blair, Kipp Martin, Ron Rardin, and his student Jim Lowe. After a while, around 1985, Bob’s preoccupation with representability focused on the integer programming representation of logical inference and, more generally, on the application of mathematical programming techniques to artificial intelligence, expert systems, etc. (Computers and Operations Research, 19, 1986; Decision Support Systems, 1988; Annals of Discrete Mathematics, 1988). Here was finally an area upon which Bob could bring to bear the full arsenal of his training in Logic, combined with his knowledge of the polyhedral method. His pathbreaking work in this new and exciting area, much of which he presented in his Lecture Notes for the first ARIDAM at RUTCOR, published in the present volume, may ultimately prove to be the most influential part of his entire professional legacy. Besides being an outstanding mathematician, Bob had exceptional pedagogical skills: his students used to rave about him. He was a very earnest person, scrupulously conscientious about his commitments and obligations, generous with his time for students and colleagues alike. He would sometimes worry without a good reason and get excited, or become suspicious; on those occasions he needed somebody, a friend, to calm him down. But if he needed friendship, he also offered it: he was loyal and reliable. Beyond personal relations, Bob was a warmhearted, sensitive human being, who cared about issues of fairness and justice, and was never indifferent to the plight of people he knew about. We will all badly miss him.

Egon Balas
for Richard J. Duffin, an applied mathematician in the grand style, a gentle man
"...Science as well as technology, will in the near and in the farther future increasingly turn from problems of intensity, substance, and energy, to problems of structure, organization, information, and control..."
J. von Neumann, 1949, in his attribution of the views of N. Wiener
Contents

Introduction

I MIXED-INTEGER MODEL FORMULATION

Lecture 1: Disjunctive Representations
  1.1 Introduction
  1.2 Some definitions and a basic result
  1.3 Some small examples
    1.3.1 Simple Fixed Charge
    1.3.2 Simple Fixed Charge for Use in Equality Constraints
    1.3.3 Simple Unbounded Fixed Charge
    1.3.4 Simple Fixed Benefit
    1.3.5 Simple Fixed Benefit With Minimum Usage Level
    1.3.6 "Or" Logical Connective: Epigraph
  1.4 Related Work
  1.5 Exercises

Lecture 2: Further Illustrations
  2.1 Some further examples
    2.1.1 Graph of "or" logical connective
    2.1.2 Graph of "exclusive or" logical connective
    2.1.3 Separable programming with fixed charges and convex sections (epigraph)
    2.1.4 Interactive fixed charges (epigraph)
    2.1.5 Clique constraints in node packing
    2.1.6 Distribution system design
  2.2 A simplification in the disjunctive representation for some multiple rhs instances
    2.2.1 Unions of nonempty rectangles
    2.2.2 Translation of polyhedra
    2.2.3 "Primitive" either/or constraints
  2.3 'Separate' vs. 'joint' formulations
  2.4 Exercises

Lecture 3: Constructions which Parallel Set Operations
  3.1 Definitions and basic constructions
  3.2 The union construction
  3.3 Some other constructions
  3.4 Some technical properties of the basic constructions
  3.5 Composite constructions and 'structure' in MIP
  3.6 Two central technical results
  3.7 Hereditary sharpness

Lecture 4: Topics in Representability
  4.1 Reformulation via distributive laws
  4.2 Convex union representability
  4.3 Using combinatorial principles in representability
  4.4 Some experimental results
    4.4.1 Either/or constraints
    4.4.2 Multiple fixed charges

II LOGIC-BASED APPROACHES TO DECISION SUPPORT

Lecture 5: Propositional Logic and Mixed Integer Programming
  5.1 Introduction
  5.2 A "natural deduction" system for propositional logic
  5.3 Propositional logic as done by integer programming
  5.4 Clausal chaining: a subroutine
  5.5 Some properties of frequently-used algorithms of expert systems
  5.6 The Davis-Putnam Algorithm in Two Forms
  5.7 Some recent developments (December 1987)
  5.8 Exercises

Lecture 6: A Primer on Predicate Logic
  6.1 Introduction
  6.2 Predicate logic: basic concepts, notation
  6.3 Applications for problem-solving

Lecture 7: Computational Complexity above NP: A Retrospective Overview
  7.1 Introduction
  7.2 The fundamental distinction: conceptions vs. their instances
  7.3 Two fundamental results
  7.4 What if we increase expressability "a little bit"?
  7.5 The Polynomial Hierarchy, Probabilistic Models, and Games

Lecture 8: Theorem-Proving Techniques which Utilise Discrete Programming
  8.1 Reduction of Predicate Logic to a Structured Propositional Logic
  8.2 Preliminary discussion
  8.3 The algorithm framework
  8.4 Illustrations and comments
  8.5 A generalisation: predicate logic together with linear constraints

Lecture 9: Spatial Embeddings for Linear and Logic Structures
  9.1 Definition of an Embedding
  9.2 Illustrations of embeddings
  9.3 Results for predicate logic embeddings
  9.4 Logic and pre-processing routines for MIP: an example via the DP/DPL algorithm

Lecture 10: Tasks Ahead
  10.1 Three "top-down" Views of Mathematical Programming
    10.1.1 The Intellectual Heritage
    10.1.2 Academic settings for Mathematical Programming
    10.1.3 Users' Perspectives
    10.1.4 Some conclusions
  10.2 Some research challenges related to these lectures
    10.2.1 Research on MIP representability
    10.2.2 Research on the AI/OR Interface
  10.3 Some other research programs in the AI/OR Interface
  10.4 Some programs and courses in the AI/OR Interface
    10.4.1 Purdue University, MIRC
    10.4.2 University of Texas at Austin, Ph.D. Programs in MIS and OR
    10.4.3 Carnegie-Mellon University, GSIA and SUPA
    10.4.4 University of Iowa, Management Sciences
    10.4.5 University of Colorado at Boulder, MIS and OR
    10.4.6 Northwestern University, MIS
    10.4.7 Duke University, the Fuqua School
    10.4.8 Massachusetts Institute of Technology, the Sloan School
    10.4.9 Georgia Institute of Technology, Management Science
  10.5 Guessing Ahead

Illustrative Examples

Solutions to Examples

Bibliography
List of Figures
1 Naive Paradigm of DSS
2 Simple Fixed Charge
3 Equality Fixed Charge
4 Unbounded Fixed Charge
5 Simple Fixed Benefit
6 Fixed Benefit with Minimum Usage
7 Separable Programming
8 Interactive Fixed Charges
9 Rectangular Domain
10 Clique Constraints
11 Need for the Hypotheses
12 Intersection Loses Sharpness
13 H is F + G
14 H is max{F, G}
15 A Lattice of Representations
16 Network on an Index Set
17 Complex Fixed Charges
18 An And/or Tree
19 "Or" Tree
20 "Or/and" Tree
21 A Complexity Chain
22 Algorithm Outline
23 Intellectual Heritage
24 Current Influences
25 Academic Settings
26 Piecewise Linear Function
27 Two Convex Sections
28 Concave Sum
29 Linear Sum
30 Minimum of Functions
31 Truth Valuation Search Tree
List of Tables
I MULTI-DIVISION PROBLEMS: R.H.S. MULTIPLIER (1.1)
II MULTI-DIVISION PROBLEMS: R.H.S. MULTIPLIER (1.3)
III MULTI-DIVISION PROBLEMS: R.H.S. MULTIPLIER (1.9)
IV
V
VI Seven problems with P = 0.3, NX1 = 5
VII Six problems with P = 0.3, NX1 = 6
VIII A harder problem at NX1 = 6, P = 0.3
I SATISFIABILITY TESTS USING BANDBX
II SATISFIABILITY TESTS USING LAND AND POWELL'S CODE
III TESTS USING APEX IV
IV CREATING A HARD PROBLEM
Introduction

This monograph is directly based on a series of ten lectures, of the same title, which I gave at Rutgers University as part of the Advanced Research Institute on Discrete Applied Mathematics (ARIDAM). The lectures divide naturally into two parts. In Lectures 1 through 4, we cover the theory of representations for those problems which are solvable by mixed-integer programming (MIP), with emphasis on disjunctive formulation techniques. In Lectures 5 through 9, we discuss logic-based approaches to decision support which help to create more "intelligent" systems. We try to show the huge potential for MIP techniques to assist in these approaches and, conversely, the potential for results in applied logic to be relevant in MIP research. Lecture 10 raises broader philosophical issues for speculation and discussion, and it attempts to put the work treated in the previous lectures in a broader perspective. Those readers interested primarily in mixed-integer programming model formulation may wish to read only the first four lectures. However, in my view, some of the most interesting research challenges for MIP in the future will derive from tie-ins to other subject areas, notably those relating to decision support and "intelligent" software systems. The disjunctive formulation techniques are specifically designed to bridge the tie-in to logic-based methods for decision support. Seen more broadly, the technical issue of MIP problem formulation is part of knowledge representation [Win 1984]. Moreover, the distinction between MIP heuristic and branch-and-bound algorithms on the one hand and MIP formulation on the other exactly parallels that between artificial intelligence search techniques [Nil 1971], [Pea 1984] and knowledge representation techniques.
Recent years have seen significant practical successes for applications of certain artificial intelligence techniques, notably "expert systems" [Hay Wat Len 1983], as well as for certain mixed-integer programming techniques [Hof Pad 1984]. From the user's perspective, both use modeling conceptions and computer-based techniques to solve problems. My main thesis in these lectures is that, going beyond the simple "juxtaposition" of both approaches, there is also substantial practical benefit to be obtained from pursuing the intellectual connections between the approaches. In this development, against the backdrop of easily-accessed distributed processing, there is potential to more fully realize the aspiration of the 1940's and 1950's: having widespread use of sophisticated modeling methodologies to aid in decision-making for large organizations and complex modern societies.
The lectures are intended for an audience familiar with mathematical programming. Accordingly, we include introductory and expository material on the propositional calculus and predicate calculus from logic, although similar material is available at greater length in a number of excellent introductory logic texts, e.g. [Men 1964], [Shoen 1967]. With the earlier lectures, we also include some exercises with solutions. I am grateful to Peter Hammer for inviting me to give these lectures and for encouraging their development, and to Rutgers University and the Air Force Office of Scientific Research for their sponsorship of the ARIDAM. I am particularly thankful to Professor Bernhard Korte, and the Alexander von Humboldt Foundation, for their support on a leave of absence in 1983 during which I could begin in-depth study of some of the new developments in Artificial Intelligence. Most of the ideas in these lectures were sketched out during that period, although it has taken several years of further research and writing to provide precise technical underpinnings for what we conjectured then. During this process, I have greatly benefited from the continued support of the National Science Foundation, the Air Force Office of Scientific Research, and the Georgia Tech Foundation. I wish to thank former Dean Charles Gearing for making me aware of developments in computer science and computer technology and their relevance to business problems. Both Anil Nerode and Richard Platek were very helpful in suggesting readings and literature. My secretary Tawanna Tookes has always provided invaluable assistance in producing the manuscript against demanding deadlines. I am grateful for her efforts. I also wish to extend my appreciation to Dr. Endre Boros, for his fine work in preparing camera-ready copy for this volume.
During the lectures, I very much appreciated comments from Egon Balas, Peter Hammer, Giorgio Gallo, Harvey Greenberg, Michel Minoux, Bruno Simeone and other ARIDAM participants. I believe they will find several of their suggestions reflected here.
Atlanta, Georgia July 1986
Part I
MIXED-INTEGER MODEL FORMULATION
LECTURE 1
DISJUNCTIVE REPRESENTATIONS: A FUNDAMENTAL RESULT AND SOME ILLUSTRATIONS

Summary: The basic formulation result; the essential need for auxiliary variables, and obtaining faces and facets by projection; modular focus and linkage issues; some illustrations.
1.1 Introduction
Starting in the late 1960's, when experience with solving mixed-integer programs (MIPs) began to accumulate, it was empirically observed that different algebraic representations of the same MIP constraint condition could behave very differently in computation. In one algebraic formulation, a given MIP could be intractable, while the same MIP might be easily solvable with another formulation. In addition, the easily-solved formulation might involve many more variables and constraints than the intractable one. This latter fact was not consistent with experience from linear programming, and suggested that some new features of MIP formulations could override representation size as a key to the computational tractability of a MIP. To our knowledge, the earliest systematic studies of this nature are in [Geo Gra 1974] and [Wil 1974], although the phenomenon was part of the "folklore" of MIP earlier. Moreover, it is in [Geo Gra 1974, Section 5] that the size of the linear relaxation is specifically cited as a key feature of a MIP formulation, with a smaller linear relaxation being better. By the "linear relaxation" (LR) of a MIP formulation, we mean the linear program (LP) which results when every variable declaration "$x_j$ is binary" is replaced by "$0 \le x_j \le 1$." While [Geo Gra 1974] is generally cited as the first successful large-scale test of Benders' decomposition [Bend 1962], it is equally notable for its insights on MIP formulation, which contributed crucially to this success.

Let us put this phenomenon in a broader decision support perspective. A naive paradigm for the solution of problems via models is in Figure 1 below. Realistic paradigms are given in [Bon Hol Whi 1982] and [Nan Bal 1983], but even the crude one in Figure 1 will allow us to make several points.
"REAL WORLD" PROBLEM → REPRESENTATION → ALGORITHM/INFERENCE ENGINE → USER'S "SOLUTION"

Figure 1: Naive Paradigm of DSS
In Figure 1, the "representation" includes:
• Choice of language for representation
• Determination of representability
• Choice of a "best" or "undominated" representation
• Data structures to implement the representation

Also, the "inference engine" includes:
• Pre-, during, and post-processing
• Precompilation of frequently used routines
• Algorithms for special structure/substructures
• General-purpose algorithms/algorithm frameworks
In examining Figure 1, each arrow is somehow mysterious. How do real-world problems achieve a formulation? How does a computer printout either lead to or guide a solution, which is often to be implemented by a large organization with multiple participants having different orientations, abilities and preferences? In these lectures, we will not address these two latter questions, but we do focus on the arrow from representation to computation. Clearly, the algorithm ought to be matched to the representation, at least at run time. In branch-and-bound (BB), which solves a MIP by examining a sequence of linear programs which are LRs of various partial solutions (see e.g. [Gar Nem 1972] for a discussion of branch-and-bound), the LRs should be as representative of the problem as possible. For constraints, one has more information on the MIP constraint set when its LR is smaller. The theoretically smallest LR would be the convex hull of the MIP constraint set (see [Roc 1970] or [Stoe Wit 1970] for convexity terminology and results). A formulation with this convex hull property we call sharp. It follows that, if we wish to solve MIPs by branch-and-bound (BB), we should use formulations with a small linear relaxation (LR), ideally a sharp formulation. There are, however, trade-offs between the size of the LR and the size of the MIP formulation, which we will discuss more fully later on. This dictum, of looking for a MIP formulation with a small LR, still holds if Lagrangean relaxation is embedded in the BB scheme, or if Benders decomposition is used, or if an algorithm other than the Simplex Method is used to solve the LR, etc. At run time the algorithm should be matched to the representation; i.e., formulation and solution are not independent subjects. There are some rather good representation techniques for substantial parts of human knowledge, for example the first-order predicate logic to be discussed in Part Two. However, when computation is needed, the knowledge may have to be temporarily converted to a different form for efficient solution by specific algorithms.

Despite the early realization of the importance of formulation techniques for MIP, very little was done in this area for nearly a decade following [Geo Gra 1974]. The formulation work which was undertaken in the 1970's was focused on cutting-planes, which is a more restrictive conception than current approaches. When interest returned in the early 1980's to the solution of large-scale MIPs, the issue of problem representation was again addressed. As attention has gradually become focused on this topic, researchers have become aware of the relatively limited number of techniques for obtaining good MIP representations.
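As a small illustration of a sharp versus a non-sharp formulation (a hypothetical two-variable example, not one taken from the lectures), the sketch below builds two formulations with identical binary solutions. It then tests fractional points against each linear relaxation, formed as defined above by replacing each binary declaration with $0 \le x_j \le 1$. The names `WEAK`, `SHARP` and the helper functions are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example (not from the lectures): two formulations of the
# same binary set {x in {0,1}^2 : at most one x_j equals 1}.
# Each constraint (a, b) means a[0]*x1 + a[1]*x2 <= b.
WEAK = [((2, 2), 3)]   # 2*x1 + 2*x2 <= 3
SHARP = [((1, 1), 1)]  # x1 + x2 <= 1; its LR is the convex hull


def satisfies(constraints, x):
    """True if x meets every inequality a.x <= b."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= b for a, b in constraints)


def integer_points(constraints, n=2):
    """Binary points feasible for the MIP formulation."""
    return {x for x in product((0, 1), repeat=n) if satisfies(constraints, x)}


def in_linear_relaxation(constraints, x):
    """LR membership: each declaration 'x_j is binary' becomes 0 <= x_j <= 1."""
    return all(0 <= xi <= 1 for xi in x) and satisfies(constraints, x)


# Both formulations admit exactly the same binary solutions ...
assert integer_points(WEAK) == integer_points(SHARP) == {(0, 0), (1, 0), (0, 1)}

# ... but the weak formulation has a strictly larger LR.
p = (Fraction(3, 4), Fraction(3, 4))
print(in_linear_relaxation(WEAK, p), in_linear_relaxation(SHARP, p))  # True False
```

The LR of the sharp formulation is the triangle with vertices (0, 0), (1, 0) and (0, 1), so every vertex an LP solver may return is already integral. The weak LR, by contrast, has fractional vertices such as (1, 1/2).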
There is even the more fundamental question as to what kinds of constraints yield to MIP formulations at all. Aside from R. R. Meyer and T. Ibaraki, for many years few researchers regarded this latter question as of interest. In many respects, the field of discrete programming had not progressed beyond Dantzig's early model formulations [Dan 1957] that first motivated the importance of binary and integer variables in a program. We do have some knowledge of MIP formulation techniques, however, and our knowledge is growing very rapidly now. We begin to review some of that knowledge in this lecture, building on work of R. R. Meyer and on the disjunctive methods of cutting-plane theory.

We begin by illustrating that, already in linear programming, there are representation issues which have not been systematically addressed. Consider, for example, the convex constraint

$$\sum_{j=1}^{n} |x_j| \le 1. \tag{1.1}$$

This can be represented by $2^n$ linear constraints, in which all combinations of signs occur for the terms $\pm x_j$. For instance, with $n = 2$, $|x_1| + |x_2| \le 1$ becomes:

$$x_1 + x_2 \le 1, \qquad x_1 - x_2 \le 1, \qquad -x_1 + x_2 \le 1, \qquad -x_1 - x_2 \le 1. \tag{1.2}$$
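The sign-pattern construction just described can be sketched as follows. This is illustrative only: the function names are hypothetical, and the membership check is brute force, not how a solver would proceed. The sketch lists the coefficient rows of the $n = 2$ system (1.2), shows how the count $2^n$ grows, and tests two points against all the inequalities.

```python
from itertools import product


def sign_vectors(n):
    """Iterate over every sign vector s in {+1, -1}^n (there are 2**n)."""
    return product((1, -1), repeat=n)


def in_l1_ball_by_facets(x):
    """Test sum_j |x_j| <= 1 in the original variables only, by checking
    s.x <= 1 for each of the 2**n sign vectors s."""
    return all(sum(s * xj for s, xj in zip(signs, x)) <= 1
               for signs in sign_vectors(len(x)))


# For n = 2 the sign vectors are the coefficient rows of the four inequalities.
print(list(sign_vectors(2)))  # [(1, 1), (1, -1), (-1, 1), (-1, -1)]

# The number of inequalities doubles with each added variable.
for n in (2, 4, 8, 16):
    print(n, sum(1 for _ in sign_vectors(n)))

print(in_l1_ball_by_facets((0.5, -0.5)), in_l1_ball_by_facets((0.6, -0.6)))  # True False
```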
Moreover, each of the inequalities in (1.2) is needed, since it is a facet (see e.g. [Roc 1970] or [Stoe Wit 1970] for this terminology). Generally, all $2^n$ inequalities are needed if we use a linear system in only the "original" variables $x_j$. Obviously, such a linear system is too large to use, except possibly if the inequalities are added as they are needed. However, this entire issue disappears if we are willing to use "auxiliary variables". We note that the points defined by (1.1) are the convex span of the positive and negative unit vectors $\pm E_j$, and we can represent $\sum_j |x_j| \le 1$ by:

$$x = \sum_{j=1}^{n} (y_j^+ - y_j^-) E_j, \qquad \sum_{j=1}^{n} (y_j^+ + y_j^-) = 1, \tag{1.3}$$

using auxiliary variables $y_j^+, y_j^- \ge 0$. Yet a different representation with auxiliary variables $y_j$ is:

$$-y_j \le x_j \le y_j \quad (j = 1, \dots, n), \qquad \sum_{j=1}^{n} y_j \le 1. \tag{1.4}$$
Both of the systems (1.3) and (1.4) are small in size. Under what circumstances does a polyhedron (i.e. the solution set to a system of linear inequalities), which may require a large number of inequalities to define, nevertheless have a compact formulation, possibly with auxiliary variables? This basic question has not been investigated, to our knowledge.
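As a small illustration of this size trade-off, the Python sketch below (our own; the function names are hypothetical) contrasts two descriptions of (1.1). It lists the $2^n$ sign inequalities needed in the original variables, and, for the auxiliary-variable description $x_j = y_j^+ - y_j^-$, $\sum_j (y_j^+ + y_j^-) = 1$, $y^+, y^- \ge 0$, it constructs an explicit witness $(y^+, y^-)$ for any point satisfying (1.1).

```python
from itertools import product


def sign_inequalities(n):
    """All 2**n sign vectors s; each one gives the inequality sum_j s_j x_j <= 1."""
    return list(product((1, -1), repeat=n))


def in_set_by_facets(x, tol=1e-12):
    """Membership in (1.1) using only the original variables x_j."""
    return all(sum(s * xj for s, xj in zip(signs, x)) <= 1 + tol
               for signs in sign_inequalities(len(x)))


def aux_witness(x, tol=1e-12):
    """Return (y_plus, y_minus) >= 0 with x_j = y_plus[j] - y_minus[j] and
    sum(y_plus) + sum(y_minus) == 1, or None if x violates (1.1)."""
    y_plus = [max(xj, 0.0) for xj in x]
    y_minus = [max(-xj, 0.0) for xj in x]
    slack = 1.0 - sum(y_plus) - sum(y_minus)
    if slack < -tol:
        return None
    # Put the slack equally on +E_1 and -E_1; the two halves cancel in x_1.
    y_plus[0] += slack / 2
    y_minus[0] += slack / 2
    return y_plus, y_minus
```

With n = 10 the first description already needs 1024 inequalities, while the second uses 2n = 20 auxiliary variables and a single equation.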
LECTURE 1
1.2 Some definitions and a basic result
We return to R. R. Meyer's concern for a precise definition of those sets representable by MIP constraints. We adapt his approach in the following definition.
Definition: Let a set $S \subseteq \mathbb{R}^n$ be given. We say that S is bounded-MIP-representable (b-MIP.r) if there are matrices A, B and a vector h, and a subset K of the indices of auxiliary variables y, with
$$x \in S \iff \text{for some } y \text{ with } y_j \in \{0,1\} \text{ for } j \in K \text{ we have } Ax + By \ge h \qquad (1.5)$$
$\mathcal{S} = (A, B, h, K)$ is called a representation of S, and its relaxation is:
$$\mathrm{rel}(\mathcal{S}) = \{x \mid \text{for some } y \text{ with } 0 \le y_j \le 1 \text{ for } j \in K, \text{ we have } Ax + By \ge h\}$$
We note that:
• $\mathrm{rel}(\mathcal{S})$ depends on $\mathcal{S}$, not just on S.
• Always $\mathrm{rel}(\mathcal{S}) \supseteq \mathrm{conv}(S)$.
• Auxiliary variables are an intrinsic feature.
The $y_j$ for $j \in K$ are called control variables, as they "control" the part of S in which x lies. The set S may be unbounded; the term "bounded" refers to the $y_j$, $j \in K$, which could (equivalently) be given any bounds.
In his work, Meyer has been concerned with several notions of representability, which are distinct, and only one of which is essentially the one above. The concept above is the only representability we shall use here. It is adequate to cover all cases in which sets have been represented using linear inequalities and bounded integer variables. It is thus adequate for practical applications to date. However, it is more restrictive than representability by general integer variables (e.g. [Mey 1975], [Jer Low 1984]) or other concepts of representability involving unions rather than integer variables (e.g. [Mey Tha Mal 1980]).
Meyer's original focus was on the representability of functions, while here our focus is on sets. The functions f representable in Meyer's sense are those whose epigraphs epi(f) are representable in the sense above. The concept of function representability used by Ibaraki is essentially that of the graph gph(f) of f. In some contexts, representation of the hypograph hypo(f) is also important. Here we have the definitions:
$$\mathrm{epi}(f) = \{(z, x) \mid z \ge f(x)\}, \qquad \mathrm{hypo}(f) = \{(z, x) \mid z \le f(x)\}, \qquad \mathrm{gph}(f) = \{(z, x) \mid z = f(x)\}$$
Lemma: gph(f) is b-MIP.r iff both epi(f) and hypo(f) are b-MIP.r.
Discontinuous functions cannot have a representable graph; hence the fixed-charge function (see section 1.3 below) does not. However, the epigraph of the fixed-charge function is representable. Thus, the concepts of epigraph, hypograph, and graph representability are all distinct. By the lemma, graph representability is the most restrictive of these concepts for functions. The different concepts of function representability are relevant in different modeling situations. For example, suppose that f appears only positively in the minimizing objective and/or in $\le$ constraints:
$$\begin{array}{ll} \min & \cdots + f(x) + \cdots \\ \text{s.t.} & \cdots + f(x) + \cdots \le \cdots \end{array} \qquad (1.8)$$
Then represent epi(f). This procedure works because occurrences of "f(x)" can be replaced by z, and representations of "$z \ge f(x)$" can be added to the constraints. These modifications do not change the set of solutions x. A similar rule holds for the use of "hypo(f)" in monotone situations. In general, a representation of gph(f) is needed. [Jer 1984a] gives some fairly broad conditions under which representability of epi(f) or hypo(f) suffices, but this issue has not been systematically investigated.
We now discuss disjunctive formulations, which are a particular kind of representation. First, we need to define the concept of a "starred recession cone". For a polyhedron P described using auxiliary variables
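$$P = \{x \mid \text{for some } y,\ Ax + By \ge h\} \qquad (1.9)$$
define
$$\mathrm{rec}^*(\mathcal{P}) = \{x \mid \text{for some } y,\ Ax + By \ge 0\} \qquad (1.10)$$
Lemma: If $P \ne \emptyset$, $\mathrm{rec}^*(\mathcal{P})$ is independent of the representation $\mathcal{P}$ of P, and in fact:
$$\mathrm{rec}^*(\mathcal{P}) = \{x \mid \text{for some } x' \in P,\ x' + mx \in P \text{ for all } m \ge 0\} = \{x \mid \text{for all } x' \in P,\ x' + mx \in P \text{ for all } m \ge 0\} = \mathrm{rec}(P) \qquad (1.11)$$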
In the lemma, rec(P) is the ordinary recession cone of the polyhedron as in convexity theory (e.g. [Roc 1970] or [Stoe Wit 1970]). It is the cone of directions such that, starting from some point in P, one can "recede" in that direction indefinitely without leaving P. For P bounded, $\mathrm{rec}(P) = \{0\}$. For P empty, by definition $\mathrm{rec}(P) = \mathbb{R}^n$. When P is empty, $\mathrm{rec}^*(\mathcal{P})$ and rec(P) can be different. Indeed, if $\mathcal{P}$ is the representation $P = \{x \mid x = 5,\ 1 \le y \le -1\}$, then $P = \emptyset$, but using this representation, $\mathrm{rec}^*(\mathcal{P}) = \{0\}$. As it turns out, the starred recession cone is more convenient to use in representation work, even though it depends on the representation $\mathcal{P}$ of P when $P = \emptyset$. For starred recession cones, see [Jer 1984c] and [Jer 1986b]. Our next result is from [Jer 1984c], and it describes a representation which is obtained by adapting ideas from Balas' union construction [Bal 1974] and our co-propositions in [Jer 1974].
Disjunctive Representation Lemma: Suppose that $P_1, \dots, P_t$ are polyhedra with representations $\mathcal{P}_1, \dots, \mathcal{P}_t$. Then the following is a representation $\mathcal{S}$ of the smallest representable set S containing $P_1 \cup \dots \cup P_t$:
$$x \in S \iff \text{there are } x^{(1)}, \dots, x^{(t)},\ y^{(1)}, \dots, y^{(t)} \text{ and binary scalars } m_1, \dots, m_t \text{ with}$$
$$\begin{aligned} A^{(i)} x^{(i)} + B^{(i)} y^{(i)} &\ge h^{(i)} m_i, \quad i = 1, \dots, t \\ m_1 + \dots + m_t &= 1 \\ x &= x^{(1)} + \dots + x^{(t)} \end{aligned} \qquad (1.12)$$
When $\mathrm{rec}^*(\mathcal{P}_i)$ is independent of $i = 1, \dots, t$, then $S = P_1 \cup \dots \cup P_t$.
In the lemma, note that the final representation (1.12) depends on the given ones $\mathcal{P}_i$. We cannot therefore speak of "the" disjunctive representation of a set S, but rather of "a" disjunctive representation of S. The representation (1.12) appears to be restricted to use on "either/or constraints" of the form: $x \in P_1$ or $x \in P_2$ or ... or $x \in P_t$. However, one consequence of our next result is that the only sets which can be represented are of the "either/or" form, so that (1.12) is entirely general. The result is essentially from [Jer Low 1984].
Theorem: $S \subseteq \mathbb{R}^n$ is b-MIP.r iff $S = P_1 \cup \dots \cup P_t$ is a (finite) union of polyhedra $P_i$ with $\mathrm{rec}(P_i)$ independent of $i = 1, \dots, t$.
When S is b-MIP.r, let $\mathcal{P}_1, \dots, \mathcal{P}_t$ be representations of $P_1, \dots, P_t$ with $\mathrm{rec}^*(\mathcal{P}_i)$ independent of $i = 1, \dots, t$ (as can always be assumed). Then the disjunctive formulation $\mathcal{S}$ represents S and furthermore is sharp, i.e. $\mathrm{rel}(\mathcal{S}) = \mathrm{conv}(S)$.
Corollary: Any representable set has a sharp formulation.
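The assembly of (1.12) is mechanical, and a short Python sketch may help. The nested-list encoding, the variable ordering, and the names `disjunctive_formulation` and `satisfies` are our own assumptions for illustration. Each block $A^{(i)}x^{(i)} + B^{(i)}y^{(i)} \ge h^{(i)}m_i$ is stored as $A^{(i)}x^{(i)} + B^{(i)}y^{(i)} - h^{(i)}m_i \ge 0$, and the linking conditions $\sum_i m_i = 1$ and $x = \sum_i x^{(i)}$ are stored as equations.

```python
def disjunctive_formulation(pieces, n):
    """Assemble the rows of (1.12).

    pieces: list of (A, B, h) describing P_i = {x : A x + B y >= h for some y};
            A has n columns, B has k_i columns, and B may be [] when k_i = 0.
    Variable order (our choice): x, then x^(i), y^(i) for each i, then m_1..m_t.
    Returns (G, g, E, e, binaries): G v >= g, E v == e, v[j] binary for j in binaries.
    """
    t = len(pieces)
    widths = [n + (len(B[0]) if B and B[0] else 0) for _, B, _ in pieces]
    m_col = n + sum(widths)                     # column of m_1
    total = m_col + t
    G, g, E, e = [], [], [], []
    offset = n
    for i, (A, B, h) in enumerate(pieces):
        for r in range(len(A)):
            row = [0] * total
            for j in range(n):
                row[offset + j] = A[r][j]       # A^(i) x^(i)
            for j, b in enumerate(B[r] if B else []):
                row[offset + n + j] = b         # B^(i) y^(i)
            row[m_col + i] = -h[r]              # - h^(i) m_i, so the row reads ... >= 0
            G.append(row)
            g.append(0)
        offset += widths[i]
    E.append([0] * m_col + [1] * t)             # m_1 + ... + m_t = 1
    e.append(1)
    for j in range(n):                          # x_j - sum_i x^(i)_j = 0
        row = [0] * total
        row[j] = 1
        start = n
        for w in widths:
            row[start + j] = -1
            start += w
        E.append(row)
        e.append(0)
    return G, g, E, e, list(range(m_col, total))


def satisfies(G, g, E, e, v, tol=1e-9):
    """Check a candidate vector v against G v >= g and E v == e."""
    def dot(row):
        return sum(a * b for a, b in zip(row, v))
    return (all(dot(r) >= rhs - tol for r, rhs in zip(G, g))
            and all(abs(dot(r) - rhs) <= tol for r, rhs in zip(E, e)))


# Simple fixed charge of section 1.3.1 with f = 10, M = 5, and x = (z, x1).
A = [[1, 0], [0, 1], [0, -1]]
P1 = (A, [], [0, 0, 0])      # z >= 0, x1 = 0
P2 = (A, [], [10, 0, -5])    # z >= 10, 0 <= x1 <= 5
G, g, E, e, binaries = disjunctive_formulation([P1, P2], 2)
# v = (z, x1, z^(1), x1^(1), z^(2), x1^(2), m1, m2)
print(satisfies(G, g, E, e, [5, 2.5, 0, 0, 5, 2.5, 0.5, 0.5]))   # True: feasible in the LR
```

The fractional point $m = (1/2, 1/2)$, $(z, x_1) = (5, 2.5)$ is the midpoint of (0, 0) and (10, 5), so it lies in conv(epi(F)) but not in epi(F), in line with the sharpness statement above.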
To my knowledge, the only two completely general formulation techniques are the disjunctive technique and the netforms of Glover, Klingman, and McMillan (see e.g. [Glo Kli McM 1977], [Glo Hul Kli Stu 1978]). Of the two, only the disjunctive technique always gives sharp formulations. The netform technique reduces a MIP to a network with logical conditions. While the latter formulation is not always sharp, it is very much in the spirit of meshing representation to the algorithm, where the algorithm used in netforms is one for networks. While uniform representability with sharpness is a desirable aspect of the disjunctive formulation (1.12), its disadvantage is its size. For example, when the number of logical alternatives (i.e. the number t of sets $P_i$) is large, the representation can be huge. If simplifications do not occur, then the representation may be huge even with a smaller number of logical alternatives. As we work exercises with disjunctive formulations in section 1.3, we will see that, frequently, substantial simplifications occur which greatly reduce the size. However, this happens on an ad hoc basis, and sometimes the simplifications are not substantial. Later on in these lectures, we will cite a theoretical principle which sometimes allows great simplifications, and we will cite a modification of the disjunctive construction which, when it applies, tends to reduce representation size dramatically. Work of this nature is current research. In order to get a "feel" for what the disjunctive formulations can do, when they simplify, etc., one must work numerous examples.
We now mention the issue of modularity of representations. For a "complex" set $S \subseteq \mathbb{R}^n$, it is usually too difficult to get a sharp (i.e. $\mathrm{rel}(\mathcal{S}) = \mathrm{conv}(S)$) representation. So one works with subsets ("modules") within the constraints. This simple point is worth emphasizing, for it is crucial to understanding research in representation techniques. Given that one works with modules and not the entire constraint set $S \subseteq \mathbb{R}^n$, the question arises as to how the modules are to be "linked" together. We will discuss this issue in Lectures 3 and 4. Several other advantages of disjunctive approaches are that they:
• Allow modularization of representations, with focus on linkage
• Mesh with logic-based approaches to artificial intelligence
• Mesh with polyhedral combinatorics to improve specialized representations
A general conceptual approach to MIP modeling due to R. K. Martin is "variable redefinition", which has led to formulations for lot-sizing problems of various types, using auxiliary variables. It has a broad domain of applicability. It is a conceptual framework, rather than a technique, and sharpness of the representation is essentially a hypothesis of the framework. Conceptually it includes the disjunctive methods. We will discuss it further in Lecture 4. Other formulation techniques use problem-specific ingenuity, and tend to focus on the travelling salesman problem and, more generally, networks with fixed charges. For certain lot-sizing problems, Wolsey and Barany achieved a sharp formulation, and this has been recently extended in [Wol 1986]. More typically, the linear relaxation is tightened, without necessarily obtaining sharpness. Results of Wolsey, Barany, van Roy, Padberg, Claus, Hochbaum, Magnanti and Wong, Rardin and Choe, Leung, and others fall in this category. We mention also Kipp Martin's recent cubic-size sharp formulation of (the incidence vectors of) spanning trees in graphs. Although the problem-specific techniques are for specialized settings, they have produced a number of new ideas which may prove useful more broadly, and are already of value in significant applications.
We end this section with a brief reference to the technique of projection from a representation, which can be used in contexts where a characterization of facets is needed, or a sharp linear system in "original" variables is desired. The technique was first used by Balas and Pulleyblank, who successfully obtained such a linear system in the context of bipartite matching. This technique applies when a representation is available which is sharp, so that
$$x \in \mathrm{conv}(S) \iff \text{there is } y \text{ with } Ax + By \ge h \qquad (1.13)$$
Here we include $0 \le y_j \le 1$, $j \in K$, among the representation constraints for $\mathcal{S} = (A, B, h, K)$. We observe that all facets of conv(S) occur as $(pA)x \ge ph$ as p varies among all basic feasible solutions to:
(1.14)
For any $x^* \in \mathbb{R}^n$, one can minimize $p(Ax^* - h)$ subject to (1.14). A negative value is found iff $x^* \notin \mathrm{conv}(S)$. If $x^* \notin \mathrm{conv}(S)$, a basic feasible solution $p^*$ is found which "cuts off" $x^*$: $p^*Ax^* < p^*h$, although $(p^*A)x \ge p^*h$ is valid for conv(S). One can use the projection method to generate original-variable cuts "as needed" from a sharp representation. When used in the setting of a theoretical characterization, as done by Balas and Pulleyblank, its success requires "spotting the pattern" of all basic solutions p to (1.14). Here computational work can help to develop insights.
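A brute-force Python sketch of this separation step follows. It assumes that (1.14) consists of $p \ge 0$, $pB = 0$ together with the normalization $\sum_i p_i = 1$, which is one common way to make the projection cone bounded; this normalization, the full-row-rank assumption that lets every vertex be found as a basic solution on $k + 1$ rows, and all function names are ours. Instead of an LP solver it enumerates basic solutions in exact arithmetic, which is only sensible for tiny instances. The data is a sharp representation of the simple fixed charge of section 1.3.1 with f = 10 and M = 5, in variables $x = (z, x_1)$ with one control variable y.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Solve M p = rhs exactly by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    aug = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def projection_vertices(B):
    """Basic feasible solutions of {p >= 0 : p B = 0, sum(p) = 1} (assumed normalization)."""
    m, k = len(B), len(B[0])
    found = set()
    for rows in combinations(range(m), k + 1):
        # One equation per column of B, plus the normalization sum(p) = 1.
        M = [[B[r][c] for r in rows] for c in range(k)] + [[1] * (k + 1)]
        sol = solve_square(M, [0] * k + [1])
        if sol is not None and all(v >= 0 for v in sol):
            p = [Fraction(0)] * m
            for r, v in zip(rows, sol):
                p[r] = v
            found.add(tuple(p))
    return sorted(found)


def separate(A, h, vertices, x_star):
    """Minimize p(A x* - h) over the vertices; a negative value means x* is cut off."""
    slack = [sum(Fraction(a) * Fraction(xv) for a, xv in zip(row, x_star)) - hi
             for row, hi in zip(A, h)]

    def value(p):
        return sum(pi * s for pi, s in zip(p, slack))

    best = min(vertices, key=value)
    return value(best), best


# z - 10y >= 0,  x1 >= 0,  -x1 + 5y >= 0,  y >= 0,  -y >= -1
A = [[1, 0], [0, 1], [0, -1], [0, 0], [0, 0]]
B = [[-10], [0], [5], [1], [-1]]
h = [0, 0, 0, 0, -1]
V = projection_vertices(B)
val, p = separate(A, h, V, (1, Fraction(5, 2)))
print(val, [str(c) for c in p])   # -4/3 ['1/3', '0', '2/3', '0', '0']
```

The minimizing p gives $pA = (1/3, -2/3)$ and $ph = 0$, i.e. the cut $z \ge 2x_1 = (f/M)x_1$, the nontrivial facet of conv(epi(F)); for $x^* = (6, 2.5)$ the minimum is 1/3, which is nonnegative, so that point lies in conv(S).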
1.3 Some small examples
In this section, we begin to develop experience with disjunctive formulations, and with the theorem characterizing representability.
1.3.1 Simple Fixed Charge
$$F(x_1) = \begin{cases} 0, & x_1 = 0 \\ f, & 0 < x_1 \le M \end{cases}$$
Figure 2: Simple Fixed Charge
Here we have $\mathrm{epi}(F) = P_1 \cup P_2$, where
$$P_1 = \{(z, x_1) \mid x_1 = 0,\ z \ge 0\}, \qquad P_2 = \{(z, x_1) \mid 0 \le x_1 \le M,\ z \ge f\}$$
Here $\mathrm{rec}^*(\mathcal{P}_1) = \mathrm{rec}^*(\mathcal{P}_2)$, so epi(F) is representable. A disjunctive formulation is:
$$\begin{aligned} x_1^{(1)} &= 0\,m_1, & 0\,m_2 &\le x_1^{(2)} \le m_2 M, \\ z^{(1)} &\ge 0\,m_1, & z^{(2)} &\ge m_2 f, \\ m_1 + m_2 &= 1, & m_1, m_2 &\ \text{binary}, \\ x_1 &= x_1^{(1)} + x_1^{(2)}\ (= 0 + x_1^{(2)}), & z &= z^{(1)} + z^{(2)}\ (\ge 0 + m_2 f = m_2 f) \end{aligned}$$
This simplifies to (use $m_2 = y$):
$$0 \le x_1 \le yM, \quad y \text{ binary}$$
where the term "+yf" is put into the objective (or used for "+F(x1)" in constraints). This is the usual representation for a simple fixed charge. We note that the variable y of the usual representation is an auxiliary variable of the disjunctive formulation. This raises the issue: what does "original" variable mean? If the epigraph is taken as "given", the variable y is "auxiliary." If the usual algebraic representation is taken as "given", the variable y is "original." What do we view as being a "given" of the problem? I would suggest that the real world problem is the given, and the rest are concepts about it (including the concept of a fixed charge function). Since what is an "original" variable and what is an "auxiliary" variable depends on
the conceptualization, these are relative, and hence somewhat artificial, categories. In some cases, the problem is already "given" in abstract terms, as e.g. representing the incidence vectors of members of a matroid. For these problems, there can be a meaningful distinction between "original" and "auxiliary" variables. Let us now consider the use of the function above in constraints. gph(F) is not closed, hence cannot be a (finite) union of polyhedra. Therefore, gph(F) is not representable.
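A quick numerical check that the usual representation of 1.3.1 is sharp may be useful. Writing the objective term +yf as a constraint $z \ge yf$ and letting y range over [0, 1], the least z for a given $x_1$ is $f x_1 / M$, i.e. the chord from (0, 0) to (M, f) in the $(x_1, z)$-plane, which is the lower boundary of conv(epi(F)). The Python sketch below (function names are ours) compares this with F itself.

```python
def fixed_charge(x1, f, M):
    """F(x1) of section 1.3.1: 0 at x1 = 0 and f on (0, M]."""
    if not 0 <= x1 <= M:
        raise ValueError("x1 must lie in [0, M]")
    return 0 if x1 == 0 else f


def lr_min_cost(x1, f, M):
    """Least z in the LR of 0 <= x1 <= y*M, z >= y*f, 0 <= y <= 1.

    z >= y*f is smallest for the smallest feasible y, namely y = x1 / M.
    """
    if not 0 <= x1 <= M:
        raise ValueError("x1 must lie in [0, M]")
    return f * x1 / M


for x1 in (0, 1.25, 2.5, 5):
    print(x1, fixed_charge(x1, 10, 5), lr_min_cost(x1, 10, 5))
```

The gap between the two columns is largest in the middle of [0, M], which is the usual source of weak bounds for fixed charges even when, as here, the formulation is sharp.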
1.3.2 Simple Fixed Charge for Use in Equality Constraints
For use in cash balance equations, the following modification of the function in 1.3.1 can be useful:
$$F(x_1) = \begin{cases} 0, & x_1 = 0 \\ f, & L \le x_1 \le M \end{cases}$$
(here $L > 0$ is a "minimum usage level").
Figure 3: Equality Fixed Charge
gph(F) is $P_1 \cup P_2$, where
$$P_1 = \{(z, x_1) \mid z = 0,\ x_1 = 0\}, \qquad P_2 = \{(z, x_1) \mid z = f,\ L \le x_1 \le M\}$$
A disjunctive formulation is:
$$Ly \le x_1 \le My, \quad z = yf, \quad y \text{ binary}$$
Instead of using the equation $z = yf$, one may rather put "yf" at each occurrence of $F(x_1)$. In the "real world", both minimum usage levels $L > 0$ and maximum levels M typically occur. However, what one uses to achieve a representation depends on the representation task. Why have we used a maximum level M above? Our next example shows that it is needed (see [Mey 1975]).
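The following Python sketch checks, on a few sample points, that the system $Ly \le x_1 \le My$, $z = yf$, y binary (which we take here as the simplified disjunctive formulation) admits exactly the value $z = F(x_1)$ wherever F is defined, and nothing elsewhere. The names and the sample data f = 7, L = 1, M = 4 are illustrative assumptions.

```python
def equality_fixed_charge(x1, f, L, M):
    """F(x1) of section 1.3.2, or None where F is undefined (0 < x1 < L or x1 > M)."""
    if x1 == 0:
        return 0
    if L <= x1 <= M:
        return f
    return None


def allowed_z(x1, f, L, M):
    """All z admitted by L*y <= x1 <= M*y, z = y*f with y in {0, 1}."""
    return {y * f for y in (0, 1) if L * y <= x1 <= M * y}


for x1 in (0, 0.5, 1, 2.5, 4, 4.5):
    print(x1, equality_fixed_charge(x1, 7, 1, 4), allowed_z(x1, 7, 1, 4))
```

The points 0.5 and 4.5 fall in the gaps of the domain, and the formulation correctly admits no z there.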
1.3.3 Simple Unbounded Fixed Charge
$$F(x_1) = \begin{cases} 0, & \text{if } x_1 = 0 \\ f, & \text{if } 0 < x_1 \end{cases}$$
Figure 4: Unbounded Fixed Charge
Here we cannot get the recession condition to hold. Hence the epigraph is not representable. For example, we might try $\mathrm{epi}(F) = P_1 \cup P_2$, where
$$P_1 = \{(z, x_1) \mid x_1 = 0,\ z \ge 0\}, \qquad P_2 = \{(z, x_1) \mid x_1 \ge 0,\ z \ge f\}$$
Then $\mathrm{rec}(P_1) = \{(z, 0) \mid z \ge 0\}$, but $\mathrm{rec}(P_2) = \{(z, x_1) \mid z \ge 0,\ x_1 \ge 0\}$, so $\mathrm{rec}(P_1) \ne \mathrm{rec}(P_2)$. If we try the disjunctive construction anyway, we will get the representable hull (here it is the epigraph of the identically zero function). This outcome is typical of common (sophisticated) errors when a representation is attempted for a nonrepresentable set. In this example, we note that epi(F) is still not representable if a minimum usage level is added, i.e. $F(x_1) = f$ if $L \le x_1$. Again the recession condition fails. In our next example, we suppose that a benefit is obtained (rather than a cost incurred) from utilization of a resource as measured by $x_1$. The discontinuity of a fixed charge at the origin caused no difficulties for epigraph representation. However, here we shall see that the matter is different.
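For descriptions without auxiliary variables, $\mathrm{rec}(P) = \{d \mid Ad \ge 0\}$ whenever $P = \{x \mid Ax \ge h\}$ is nonempty, so the recession condition can be probed direction by direction. The Python sketch below (our own helper, applied to the sets above and to those of 1.3.1) cannot prove two cones equal by sampling, but a single direction that lies in one cone and not the other does prove that they differ.

```python
def in_rec(A, d):
    """True if A d >= 0, i.e. d is in rec(P) for a nonempty P = {x : A x >= h}.

    Only valid for descriptions without auxiliary variables.
    """
    return all(sum(a * di for a, di in zip(row, d)) >= 0 for row in A)


# Rows act on (z, x1); right-hand sides do not matter for the recession cone.
P1 = [[1, 0], [0, 1], [0, -1]]            # z >= 0, x1 = 0
P2_bounded = [[1, 0], [0, 1], [0, -1]]    # z >= f, 0 <= x1 <= M   (section 1.3.1)
P2_unbounded = [[1, 0], [0, 1]]           # z >= f, x1 >= 0        (section 1.3.3)

up, right = (1, 0), (0, 1)                # directions in (z, x1)
print(in_rec(P1, right), in_rec(P2_bounded, right), in_rec(P2_unbounded, right))
# prints: False False True
```

The direction "increase $x_1$" separates the cones in 1.3.3, while in 1.3.1 the bound M removes it from $P_2$ as well.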
1.3.4 Simple Fixed Benefit
(here $B > 0$)
Figure 5: Simple Fixed Benefit
Here epi(F) is not closed, since the points of the line segment from (0, 0) to (0, -B) in the $(x_1, z)$-plane, other than (0, 0), are limits of points of epi(F) but are not in epi(F). Hence epi(F) is not a union of polyhedra, and so it is not representable.
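1.3.5 Simple Fixed Benefit With Minimum Usage Level
$$F(x_1) = \begin{cases} 0, & \text{if } 0 \le x_1 < L \\ -B, & \text{if } L \le x_1 \end{cases}$$
Figure 6: Fixed Benefit with Minimum Usage
Here $\mathrm{epi}(F) = P_1 \cup P_2$, where
$$P_1 = \{(z, x_1) \mid z \ge 0,\ x_1 \ge 0\}, \qquad P_2 = \{(z, x_1) \mid z \ge -B,\ x_1 \ge L\}$$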
The disjunctive formulation simplifies to (exercise):
$$x_1 \ge Ly, \quad y \text{ binary}$$
with "-yB" in the criterion for $F(x_1)$. In the fixed benefit context, the unboundedness of $x_1$ did not matter; it was the minimum usage level which was crucial. The subtleties of modeling, which we have illustrated in the first five examples of functions of a single variable, give an indication of the complexities of functions of two or three variables. We will see some of these issues in the next lecture. We conclude these examples by representing the epigraph of a logical connective, in which the value zero is viewed as "false", while the value one is viewed as "true".
1.3.6 "Or" Logical Connective - Epigraph
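First, let us represent a set which is simpler than the logical connective. Specifically, consider $S = P_1 \cup P_2$ where:
$$P_1 = \{(z, x_1, \dots, x_n) \mid z \ge 0,\ x_i = 0 \text{ all } i = 1, \dots, n\}$$
$$P_2 = \{(z, x_1, \dots, x_n) \mid z \ge 1,\ 0 \le x_i \le 1 \text{ all } i = 1, \dots, n\}$$
A disjunctive formulation is:
$$\begin{aligned} z^{(1)} &\ge 0\,m_1, & z^{(2)} &\ge 1\,m_2, \\ x_i^{(1)} &= 0\,m_1 \text{ all } i, & 0 \le x_i^{(2)} &\le 1\,m_2 \text{ all } i, \\ m_1 + m_2 &= 1, & m_1, m_2 &\ \text{binary}, \\ z &= z^{(1)} + z^{(2)}\ (\ge 0 + m_2 = m_2), & x_i &= x_i^{(1)} + x_i^{(2)}\ (= x_i^{(2)}) \end{aligned}$$
This simplifies to:
$$0 \le x_i \le y \le z \text{ all } i, \quad y \text{ binary}$$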
In linear relaxation (LR), 'y' can be omitted. We thus obtain $\mathrm{conv}(P_1 \cup P_2)$ in LR. But $P_2 = \mathrm{conv}(F)$, where $F = \{(z, x_1, \dots, x_n) \mid z \ge 1, \text{ all } x_i \text{ binary}\}$. The set we actually wish to represent is $S' = P_1 \cup F$. We obtain $\mathrm{conv}(P_1 \cup F)$ in the LR, i.e. the convex span of the epigraph of the "or" connective. To simultaneously obtain a representation, make all variables binary. This last example has illustrated a very important principle, which we will discuss at more length in Lecture 3, when we treat "canonical constructions." The principle is as follows:
Suppose that:
1) A very large (possibly exponential) number of logical alternatives can be viewed as the union of a relatively "small" number of "groupings" of the alternatives,
and 2) A relatively "small" sharp "partial representation" is available for each of these "groupings" (via any technique).
Then disjunctive techniques can be used to obtain a fairly "small" sharp representation of the entire set. With algebraic simplifications in addition, the representation can be made even more compact.
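The LR claim for the "or" example can be checked with a few lines of Python (the function name is ours). For the simplified system $0 \le x_i \le y \le z$, the least z equals the least admissible y; with y binary this reproduces the "or" of binary $x_i$, while with $0 \le y \le 1$ it is $\max_i x_i$, the lower boundary of $\mathrm{conv}(P_1 \cup F)$.

```python
from itertools import product


def or_min_z(x, integral=True):
    """Least z with 0 <= x_i <= y <= z for all i.

    y is binary when integral=True, otherwise 0 <= y <= 1 (the LR).
    Returns None if some x_i lies outside [0, 1].
    """
    if any(xi < 0 or xi > 1 for xi in x):
        return None
    y = max(x, default=0)          # y must dominate every x_i
    if integral and y > 0:
        y = 1                      # smallest binary value >= max_i x_i
    return y                       # z >= y, so the least z equals the least y


# On binary points the formulation gives the "or" of the x_i ...
assert all(or_min_z(x) == int(any(x)) for x in product((0, 1), repeat=3))
# ... and in the LR the least z is max_i x_i.
print(or_min_z((0.3, 0.7, 0.0), integral=False))   # 0.7
```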
1.4 Related Work
As is evident from the previous sections, there is a close connection between the theory of cutting-planes and the theory of representations. In fact, from [Bal 1974] and [Jer 1974], these subjects are essentially in a duality relation (see particularly displays (107) and (108) of [Jer 1974]). Moreover, the polyhedral annexation approach of Glover [Glo 1975b] is closely related to the disjunctive methods, in that both approaches generate the same family of cutting planes (see [Jer 1977]), and both are systematic developments building on an observation of Owen [Owe 1973]. I have surveyed cutting-plane theory earlier in [Jer 1977] and [Jer 1978a] and consequently do not discuss it here. In retrospect, a relatively neglected result of cutting-plane theory is Balas' elegant characterization of the convex span of facial constraints [Bal 1974], [Bal 1979], which is related to Blair's result [Bla 1977] and has given rise to several consequences including [Bla Jer 1984], [Jer 1978b]. The boolean methods of discrete programming (e.g. [Ham Rud 1970], [Ham 1974], [Ham 1979]) utilize concepts and ideas from the propositional logic, as do the disjunctive methods which are our focus here. Boolean approaches often proceed via reductions of an integer program to logical inequalities (e.g. [Gran Ham 1974], [Ham Joh Pel 1974], [Ham Ngu 1979]) and are most effective in the difficult context of nonlinear formulations (e.g. [Ham Pel 1972], [Ham Han Sim 1984], [Ham Sim 1987a], [Ham Sim 1987b]). In contrast, disjunctive approaches seek propositional logic structures in the manner in which a large linear constraint set is naturally decomposed into unions, intersections, etc. (as will be clearer after Lectures 3 and 4). These two approaches use boolean logic in different ways.
1.5 Exercises
Problems 1-4 of the "Illustrative Examples" can be worked.
LECTURE 2
FURTHER ILLUSTRATIONS OF DISJUNCTIVE REPRESENTATIONS
Summary: The purpose of this lecture is to further familiarize the reader with disjunctive representations. To this end, it contains applications to two-dimensional fixed charges, to either/or constraints, and to other common constraints. One example accounts for the "disaggregation" phenomenon of Graves and Geoffrion [Geo Gra 1974]. A result on a simplification for certain instances of multiple right-hand-side (r.h.s.) constraints is given, along with a practical rule-of-thumb for representing cost or revenue functions that are given by combining components of cost or revenue. Some of the examples are drawn from [Jer 1984b] and some others from an earlier version of [Jer Low 1984]. The result on function representability is from [Jer 1984a] and the simplification lemma is from [Jer 1986a].
2.1 Some further examples
We continue the list of examples from Lecture 1.
2.1.1 Graph of "or" logical connective
In 1.3.6 we treated the epigraph of the ”or” connective. Here we treat the graph. Let
We have $S = P_0 \cup F_1 \cup \dots \cup F_n$. We represent S disjunctively:
$$m_0 + m_1 + \dots + m_n = 1, \ \text{all } m_i \text{ binary}, \qquad z = \sum_{i=0}^{n} z^{(i)} \ \Big(= \sum_{i=1}^{n} m_i\Big), \qquad x_j = \sum_{i=0}^{n} x_j^{(i)}$$
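In what follows, we shall need some definitions.
Def: The construction $\underline{\mathrm{Op}}(\cdot)$ parallels the set operation $\mathrm{Op}(\cdot)$ on a domain of tuples $(\mathcal{S}_1, \dots, \mathcal{S}_t)$ if:
$$\mathrm{Ev}(\underline{\mathrm{Op}}(\mathcal{S}_1, \dots, \mathcal{S}_t)) = \mathrm{Op}(\mathrm{Ev}(\mathcal{S}_1), \dots, \mathrm{Ev}(\mathcal{S}_t))$$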
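Example: Let the construction $\mathcal{S}_1 \wedge \dots \wedge \mathcal{S}_t$ consist of juxtaposing the representations $\mathcal{S}_1, \dots, \mathcal{S}_t$ while making the auxiliary variables of different $\mathcal{S}_i$ disjoint. Then $\wedge$ parallels $\cap$ (intersection):
$$\mathrm{Ev}(\mathcal{S}_1 \wedge \dots \wedge \mathcal{S}_t) = S_1 \cap \dots \cap S_t$$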
LECTURE 3
This holds for all representations, so we say "there is no domain restriction" for $\wedge$.
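The juxtaposition that defines $\wedge$ is simple to spell out. The Python sketch below assumes a hypothetical nested-list encoding of a representation (A, B, h, K), with one (possibly empty) row of B per row of A; it stacks the rows, keeps x shared, and renumbers the auxiliary variables of the second representation so that the two sets of y's are disjoint.

```python
def intersect(rep1, rep2):
    """S1 ∧ S2 for representations (A, B, h, K) over the same x.

    Rows are stacked, x stays shared, and the auxiliary variables of rep2 are
    renumbered after those of rep1 so the two sets of y's are disjoint.
    B must have one (possibly empty) row per row of A.
    """
    A1, B1, h1, K1 = rep1
    A2, B2, h2, K2 = rep2
    k1 = len(B1[0]) if B1 else 0
    k2 = len(B2[0]) if B2 else 0
    A = [list(r) for r in A1] + [list(r) for r in A2]
    B = [list(r) + [0] * k2 for r in B1] + [[0] * k1 + list(r) for r in B2]
    h = list(h1) + list(h2)
    K = list(K1) + [k1 + j for j in K2]
    return A, B, h, K


# Fixed charge of 1.3.1 (f = 10, M = 5) over x = (z, x1), and a budget row z <= 8.
fc = ([[1, 0], [0, 1], [0, -1]], [[-10], [0], [5]], [0, 0, 0], [0])
budget = ([[-1, 0]], [[]], [-8], [])
A, B, h, K = intersect(fc, budget)
print(A, B, h, K)
```

Intersecting with the budget row adds one row and no auxiliary variable; intersecting `fc` with a copy of itself would instead give two disjoint control variables.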
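We note that spatial relaxation Rel(·) is related to syntactic relaxation RL(·) by: $\mathrm{Ev}(\mathrm{RL}(\mathcal{S})) = \mathrm{Rel}(\mathcal{S})$. We also note:
$$\mathrm{Rel}(\mathcal{S}_1 \wedge \dots \wedge \mathcal{S}_t) = \mathrm{Rel}(\mathcal{S}_1) \cap \dots \cap \mathrm{Rel}(\mathcal{S}_t) \supseteq \mathrm{conv}(S_1) \cap \dots \cap \mathrm{conv}(S_t) \supseteq \mathrm{conv}(S_1 \cap \dots \cap S_t)$$
In the above, the first inclusion $\supseteq$ is equality for sharp representations. The second inclusion $\supseteq$ is typically strict.
3.2 The union construction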
We need some preliminary results, and we utilize this definition of the "starred recession cone" of a representation $\mathcal{S} = (A, B, h, K)$ of a set S:
$$\mathrm{rec}^*(\mathcal{S}) = \{x \mid \text{for some } y \text{ with } y_j = 0,\ j \in K,\ Ax + By \ge 0\}$$
For polyhedral P, if $P \ne \emptyset$, $\mathrm{rec}^*(\mathcal{P}) = \mathrm{rec}(P)$.
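Proposition ([Jer 1984c, 1986b]): For representable $S \ne \emptyset$ with representation $\mathcal{S} = (A, B, h, K)$:
$$\mathrm{rec}^*(\mathcal{S}) = \{x \mid \text{for all } x' \in S \text{ and } m \ge 0,\ x' + mx \in S\} = \{x \mid \text{for some } x' \in S \text{ and all } m \ge 0,\ x' + mx \in S\}$$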
Suppose we are given representations $\mathcal{S}_i = (A^{(i)}, B^{(i)}, h^{(i)}, K^{(i)})$ for sets $S_i$:
...
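Theorem ([Jer 1984c, 1986b]): $\mathcal{S}_1 \vee \dots \vee \mathcal{S}_t$ represents the smallest representable set containing $S_1 \cup \dots \cup S_t$. If $\mathrm{rec}^*(\mathcal{S}_i)$ is independent of $i = 1, \dots, t$, then it represents $S_1 \cup \dots \cup S_t$. Furthermore:
$$\mathrm{Rel}(\mathcal{S}_1 \vee \dots \vee \mathcal{S}_t) = \mathrm{conv}(\mathrm{Rel}(\mathcal{S}_1) \cup \dots \cup \mathrm{Rel}(\mathcal{S}_t))$$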
Note: If $\mathrm{rec}^*(\mathcal{S}_i)$ is independent of i, then $\mathrm{Ev}(\mathcal{S}_1 \vee \dots \vee \mathcal{S}_t) = S_1 \cup \dots \cup S_t$, so $\vee$ parallels $\cup$ on this restricted domain of $(\mathcal{S}_1, \dots, \mathcal{S}_t)$.
In our notation, we use the logical 'or' symbol '$\vee$' for the construction which parallels the union operation, just as we use the logical 'and' symbol '$\wedge$' for the construction which parallels the intersection operation. For other operations, we may use the same symbol for the paralleling construction, or sometimes we underline the symbol. The domain of the union construction '$\vee$' is necessarily restricted. Even for a union of polyhedra, without the recession condition the union is typically not representable, so there cannot be a parallel construction for the union in general. The union construction very closely follows the disjunctive construction for the union of polyhedra; it simply adds the constraints $y_j^{(i)} \le m_i$ for all $j \in K^{(i)}$, which are necessary. Note that if the $y_j^{(i)}$ occur in a set partitioning (SOS1) constraint $\sum_{j \in K^{(i)}} y_j^{(i)} = 1$ (which occurs e.g. when a disjunctive representation is used for $S_i$), that constraint is homogenized to become $\sum_{j \in K^{(i)}} y_j^{(i)} = m_i$. Therefore explicit constraints $y_j^{(i)} \le m_i$ would not be needed in such a context. The formula above for $\mathrm{Rel}(\mathcal{S}_1 \vee \dots \vee \mathcal{S}_t)$ corresponds to what we earlier termed the sharpness of disjunctive constructions. Indeed, given the relaxations $\mathrm{Rel}(\mathcal{S}_i)$ for the individual $S_i$, and using these representations $\mathcal{S}_i$ for $S_i$ to represent $S = S_1 \cup \dots \cup S_t$, one cannot expect to have a better relaxation than $\mathrm{conv}(\mathrm{Rel}(\mathcal{S}_1) \cup \dots \cup \mathrm{Rel}(\mathcal{S}_t))$.
3.3 Some other constructions
We use the notation of 3.1 above and give constructions for some other commonly occurring set operations. Our list of operations is not meant to be exhaustive in any way, but simply useful in common contexts.
$\mathcal{S} = \mathcal{S}_1 \times \dots \times \mathcal{S}_t$ represents a set S:
---
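Clearly $S = S_1 \times \dots \times S_t$, so $\times$ parallels $\times$ with no domain restriction. Note that the variables $x^{(i)}$ are made disjoint here. Also note: $\mathrm{Rel}(\mathcal{S}) = \mathrm{Rel}(\mathcal{S}_1) \times \dots \times \mathrm{Rel}(\mathcal{S}_t)$.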
We compare the formula above with a well-known formula for the convex span of the Cartesian product of sets (see e.g. [Roc 1970]): $\mathrm{conv}(S_1 \times \dots \times S_t) = \mathrm{conv}(S_1) \times \dots \times \mathrm{conv}(S_t)$.
The Cartesian product operation occurs in the settings of decentralization of operations (as in decomposition [Dan 1963]) and in the setting of hierarchical delegation of efforts. Another important set operation arises from linear affine transformations $T(x) = L(x) + v$, where L is a linear transformation. The most common instances of such linear affine maps are sum and projection. The following is easily verified.
Proposition: $T(\mathrm{conv}(S)) = \mathrm{conv}(T(S))$.
Suppose we have a representation $\mathcal{S} = (A, B, h, K)$ of a set S. This is then a representation $T(\mathcal{S})$ of $T(S)$:
$$x \in T(S) \iff \text{there exist } w, y \text{ with } y_j \in \{0,1\} \text{ for all } j \in K,\ Aw + By \ge h,\ x = L(w) + v$$
so the construction $T(\cdot)$ parallels the operation $T(\cdot)$. Also note $\mathrm{Rel}(T(\mathcal{S})) = T(\mathrm{Rel}(\mathcal{S}))$. We note that, of all the basic constructions (i.e. union, intersection, Cartesian product, linear affine transformation), only the union construction introduces new control variables.
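The representation of T(S) can be generated mechanically. In this Python sketch (a hypothetical nested-list encoding of (A, B, h, K); names ours) the new auxiliary vector is (w, y), the control indices K are shifted past w, and $x = L(w) + v$ is imposed as a pair of inequalities; no new control variable appears.

```python
def affine_image(rep, L, v):
    """Representation of T(S) = {L w + v : w in S} from rep = (A, B, h, K) of S.

    The new auxiliary vector is (w, y); x = L w + v becomes two inequalities.
    B must have one (possibly empty) row per row of A.
    """
    A, B, h, K = rep
    n_out, n_in = len(L), len(L[0])
    k = len(B[0]) if B else 0
    A_new, B_new, h_new = [], [], []
    for a_row, b_row, h_i in zip(A, B, h):           # A w + B y >= h
        A_new.append([0] * n_out)
        B_new.append(list(a_row) + list(b_row))
        h_new.append(h_i)
    for i in range(n_out):                            # x_i - (L w)_i = v_i
        unit = [1 if j == i else 0 for j in range(n_out)]
        A_new.append(unit)                            #  x_i - (L w)_i >=  v_i
        B_new.append([-c for c in L[i]] + [0] * k)
        h_new.append(v[i])
        A_new.append([-u for u in unit])              # -x_i + (L w)_i >= -v_i
        B_new.append(list(L[i]) + [0] * k)
        h_new.append(-v[i])
    K_new = [n_in + j for j in K]                     # y_j now sits after w
    return A_new, B_new, h_new, K_new


# Project the fixed charge of 1.3.1 (f = 10, M = 5), over (z, x1), onto z.
fc = ([[1, 0], [0, 1], [0, -1]], [[-10], [0], [5]], [0, 0, 0], [0])
A2, B2, h2, K2 = affine_image(fc, [[1, 0]], [0])
print(len(A2), K2)
```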
3.4 Some technical properties of the basic constructions
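All the basic (or "canonical") constructions Op(·) are (relaxation) commutative on their domain, i.e.
$$\mathrm{Rel}(\mathrm{Op}(\mathcal{S}_1, \dots, \mathcal{S}_t)) = \mathrm{conv}(\mathrm{Op}(\mathrm{Rel}(\mathcal{S}_1), \dots, \mathrm{Rel}(\mathcal{S}_t)))$$
This can be verified by checking the formulas for Rel for each of the basic constructions. Commutativity is not an easy concept to motivate, but it is an essential technical concept for later work.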
Def: Op(·) is elementary on its domain if
All basic constructions are elementary, as one easily checks on a case-by-case basis. Proposition: If Op(·) is elementary then it is sub-commutative, i.e.
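Def: Op(·) is sharp if, whenever $\mathcal{S}_1, \dots, \mathcal{S}_t$ are sharp representations of $S_1, \dots, S_t$ which are in the domain of Op(·), then $\mathrm{Op}(\mathcal{S}_1, \dots, \mathcal{S}_t)$ is a sharp representation of $\mathrm{Op}(S_1, \dots, S_t)$, i.e.:
$$\mathrm{Rel}(\mathrm{Op}(\mathcal{S}_1, \dots, \mathcal{S}_t)) = \mathrm{conv}(\mathrm{Op}(S_1, \dots, S_t))$$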
We note that the intersection construction is not sharp since, even for sharp $\mathcal{S}_i$,
$$\mathrm{Rel}(\mathcal{S}_1 \wedge \dots \wedge \mathcal{S}_t) = \mathrm{conv}(S_1) \cap \dots \cap \mathrm{conv}(S_t) \supseteq \mathrm{conv}(S_1 \cap \dots \cap S_t)$$
and typically the inclusion $\supseteq$ is strict. Hence "commutative" does not imply "sharp". The other basic constructions are sharp, and in this way we have a precise sense in which intersection is the issue of MIP!
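A one-dimensional Python illustration of this strict inclusion (our own example; the helper names are hypothetical): take $S_1 = \{0, 1\}$ with the sharp representation $x = y$, y binary, and $S_2 = \{1/2\}$ with the sharp representation $x = 1/2$. Each representation is sharp, yet the relaxation of their intersection contains the point 1/2, while the intersection itself is empty.

```python
def hull(points):
    """conv of a finite set of reals, as a closed interval; None if the set is empty."""
    return (min(points), max(points)) if points else None


def meet(i1, i2):
    """Intersection of two closed intervals; None stands for the empty set."""
    if i1 is None or i2 is None:
        return None
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None


S1 = {0.0, 1.0}    # sharp representation: x = y, y binary
S2 = {0.5}         # sharp representation: x = 0.5
print(meet(hull(S1), hull(S2)))   # relaxation of S1 ∧ S2: (0.5, 0.5)
print(hull(S1 & S2))              # conv(S1 ∩ S2): None, i.e. empty
```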
3.5 Composite constructions and 'structure' in MIP
The composite constructions arise from composing (iterating) the basic constructions. These are needed to describe the structure of MIP constraint sets, and the degree of composition (i.e. maximum nesting of iteration) is often in the two to six range for practical MIPs. The degree depends on how the constraint sets are viewed; more specifically, on which subsets of the constraints are viewed as the "modules" having representations that are not further decomposed into basic operations. To illustrate this concept of structure, we consider a hypothetical scenario.
Your firm has two divisions, which are completely separate except for some company-wide budget restrictions and some flow of goods to division dist (distributor) from division fact (a factory). These company-wide constraints form a set S; $x \in S$ is required of a feasible operating plan. These constraints may include e.g. overall budget and global resource constraints. Here $x = (x^{(1)}, x^{(2)})$ and, except for the company-wide constraints, the variables $x^{(1)}$ are those of dist. Dist distributes goods $x^{(1)}$ to a number of customers. It obtains these from three basic sources, but the first source has a choice of three possible subsources. The fact that $x^{(1)}$ can be distributed through a network to meet all customer demand is represented as "$x^{(1)} \in S_1$". Thus the constraints on dist, other than $x \in S$, can be represented:
$$\mathcal{S}_1 \wedge ((\mathcal{S}_2 \vee \mathcal{S}_3 \vee \mathcal{S}_4) + \mathcal{S}_5 + \mathcal{S}_6)$$
where $S_2 \cup S_3 \cup S_4$ is the first source, $S_5$ the second source, and $S_6$ the third source. Fact has variables $x^{(2)}$ representing two products which it manufactures. The first product can be manufactured in two ways, and the other in three ways. Thus the constraints on fact can be represented:
$$(\mathcal{S}_7 \vee \mathcal{S}_8) \times (\mathcal{S}_9 \vee \mathcal{S}_{10} \vee \mathcal{S}_{11})$$
The firm's situation can therefore be represented as the MIP:
The degree of this composite construction is four. In addition, if our firm is actually part of a larger corporation, the degree of the composite construction would be larger. This scenario provides a typical example of how composite constructions can be used as a means of structuring many large-scale MIPs, through a focus on the manner in which parts of the model are linked together. Even a general MIP can, if one wishes, be viewed from the perspective of composite constructions. For example
can be viewed as
where S1 = ((vl,sr~)lsr~ I 10, YO I 8) and Sa = ((sr1,yl)l for some ~ 3 ~ 2 2 04 we have 11 = 23, yl = 523 - 721. However, this may or may not be an insightful reformulation. In general, the most appropriate composite construction corresponds to the intuitive perceptions of the model's users as to how the subparts are structured. Composite constructions provide, first of all, a language for precisely describing the structure of a MIP problem. In addition, a study of their mathematical properties assists in the solution of the problem. Via a study of the relaxations of composite constructions ("Relaxation Calculus") we can obtain means of improving the linear relaxation (LR). For example,
$$\mathcal{S} \wedge \big((\mathcal{S}_1 \wedge ((\mathcal{S}_2 \vee \mathcal{S}_3 \vee \mathcal{S}_4) + \mathcal{S}_5 + \mathcal{S}_6)) \times (\mathcal{S}_7 \vee \mathcal{S}_8) \times (\mathcal{S}_9 \vee \mathcal{S}_{10} \vee \mathcal{S}_{11})\big) \qquad (1)$$
is inferior in LR generally to
$$\mathcal{S} \wedge \big(((\mathcal{S}_1 \wedge (\mathcal{S}_2 + \mathcal{S}_5 + \mathcal{S}_6)) \vee (\mathcal{S}_1 \wedge (\mathcal{S}_3 + \mathcal{S}_5 + \mathcal{S}_6)) \vee (\mathcal{S}_1 \wedge (\mathcal{S}_4 + \mathcal{S}_5 + \mathcal{S}_6))) \times (\mathcal{S}_7 \vee \mathcal{S}_8) \times (\mathcal{S}_9 \vee \mathcal{S}_{10} \vee \mathcal{S}_{11})\big) \qquad (2)$$
even though it is equivalent to (1) (i.e. describes the same set). In turn (2) is inferior in LR to
$$\bigvee_{i=2}^{4} \Big(\mathcal{S} \wedge \big((\mathcal{S}_1 \wedge (\mathcal{S}_i + \mathcal{S}_5 + \mathcal{S}_6)) \times (\mathcal{S}_7 \vee \mathcal{S}_8) \times (\mathcal{S}_9 \vee \mathcal{S}_{10} \vee \mathcal{S}_{11})\big)\Big) \qquad (3)$$
although equivalent. However, even the best formulation (3) of these three is not sharp in general. We will prove the relative LR dominances among (1), (2) and (3) when we discuss the distributive laws in Lecture 4. Here we content ourselves with a summary of the four main goals of our concept of structure via composite constructions:
1. To reflect the structural features of MIPs, in terms of the way the parts are linked together, and to provide a language for discussing representations.
2. To develop guidelines for choosing among alternative representations of the same MIP constraints.
3. To modularize the work of representability. E.g., any desirable representation (not necessarily disjunctive) can be inserted for use for any given part of the MIP.
4. To relate MIP to logic-based decision support, firstly to propositional logic but also to fragments of predicate logic.
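The passage from (1) to (2) rests only on the set identities $(A \cup B) + C = (A + C) \cup (B + C)$ and $S \cap (A \cup B) = (S \cap A) \cup (S \cap B)$. Below is a symbolic Python sketch of that rewriting on the dist part of the model, using a hypothetical tuple encoding of composite constructions; it manipulates expressions only and says nothing about relaxations, which are treated via the distributive laws in Lecture 4.

```python
SYMBOL = {"and": " ∧ ", "or": " ∨ ", "sum": " + ", "prod": " × "}


def leaf(name):
    return ("leaf", name)


def node(kind, *args):
    return (kind, list(args))


def show(e):
    """Render an expression, parenthesizing every non-leaf operand."""
    if e[0] == "leaf":
        return e[1]
    return SYMBOL[e[0]].join(show(a) if a[0] == "leaf" else "(" + show(a) + ")"
                             for a in e[1])


def distribute(e):
    """Pull 'or' outward through 'sum' and 'and', bottom-up."""
    if e[0] == "leaf":
        return e
    kind, args = e[0], [distribute(a) for a in e[1]]
    if kind in ("sum", "and"):
        for i, a in enumerate(args):
            if a[0] == "or":
                return ("or", [distribute((kind, args[:i] + [b] + args[i + 1:]))
                               for b in a[1]])
    return (kind, args)


S = {i: leaf(f"S{i}") for i in range(1, 7)}
dist = node("and", S[1], node("sum", node("or", S[2], S[3], S[4]), S[5], S[6]))
print(show(dist))               # S1 ∧ ((S2 ∨ S3 ∨ S4) + S5 + S6)
print(show(distribute(dist)))
# (S1 ∧ (S2 + S5 + S6)) ∨ (S1 ∧ (S3 + S5 + S6)) ∨ (S1 ∧ (S4 + S5 + S6))
```

The rewritten dist part is exactly the first factor of (2).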
The logic-based approaches are one of the main directions in "artificial intelligence". There are several other tie-ins between MIP and AI as well. We will discuss these issues in Lectures 5, 6, and 10.
3.6 Two central technical results
In this section, we lay some groundwork for Lecture 4 by citing some technical results on the concepts of commutativity and sharpness. These are hard results to motivate by themselves; their importance is in their later applications. We also indicate the key ideas in the proofs of these technical results. Full details for most of these proofs are in [Jer 1984c]; however, that reference does not treat linear affine transformations in generality, although it does handle addition and projection. For further details on linear affine transformations, see [Jer 1986b]. The two main technical theorems are:
Theorem on Commutativity: If the composite construction Op(·) does not contain occurrences of both union and intersection, then it is relaxation commutative.
Theorem on Sharpness: If the composite construction Op(.) does not contain occurrences of intersection, then it is sharp.
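As a small numerical illustration of the Theorem on Sharpness (our own sketch, not taken from the lectures), consider a union of closed intervals on the line, each with its natural sharp description $a_i \le x \le b_i$. All function names are hypothetical. `disjunctive_lr_range` returns the projection onto $x$ of the LP relaxation of the union (disjunctive) construction, which is exactly the convex hull $[\min a_i, \max b_i]$; its optional `fixed_zero` argument fixes some control variables $\lambda_i$ to 0, which also illustrates the hereditary sharpness discussed in 3.7. For contrast, `big_m_lr_range` samples the LP relaxation of a two-piece "big-M" model, which we assume here as a typical non-sharp alternative; its relaxation reaches well outside the convex hull.

```python
from fractions import Fraction


def disjunctive_lr_range(intervals, fixed_zero=()):
    """Projection onto x of the LP relaxation of the union construction
        x = sum_i x_i,  a_i*lam_i <= x_i <= b_i*lam_i,  sum_i lam_i = 1,  lam_i >= 0
    for closed intervals [a_i, b_i].  Indices in fixed_zero have lam_i fixed to 0
    (a partial assignment of control variables).  The interval
    [sum a_i lam_i, sum b_i lam_i] sweeps out exactly [min a_i, max b_i]."""
    free = [iv for i, iv in enumerate(intervals) if i not in fixed_zero]
    if not free:
        return None  # every control variable fixed to 0: the relaxation is empty
    return (min(a for a, _ in free), max(b for _, b in free))


def big_m_lr_range(s1, s2, M, steps=100):
    """Sample the LP relaxation of a big-M model of 'x in S1 or x in S2':
        a1 - M(1-d) <= x <= b1 + M(1-d),   a2 - M d <= x <= b2 + M d,   0 <= d <= 1
    (d = 1 selects S1, d = 0 selects S2).  Returns the extreme x values found
    over a grid of d values."""
    (a1, b1), (a2, b2) = s1, s2
    lo = hi = None
    for k in range(steps + 1):
        d = Fraction(k, steps)
        lower = max(a1 - M * (1 - d), a2 - M * d)
        upper = min(b1 + M * (1 - d), b2 + M * d)
        if lower <= upper:
            lo = lower if lo is None else min(lo, lower)
            hi = upper if hi is None else max(hi, upper)
    return lo, hi


if __name__ == "__main__":
    S1, S2, S3 = (0, 1), (3, 4), (6, 9)
    print(disjunctive_lr_range([S1, S2]))                      # conv(S1 u S2)
    print(disjunctive_lr_range([S1, S2, S3], fixed_zero={2}))  # lam_3 fixed to 0
    print(big_m_lr_range(S1, S2, M=10))                        # wider than [0, 4]
```

With $M = 10$ the big-M relaxation admits $x$ from $-3.5$ to $7.5$, while the disjunctive relaxation stays at $[0, 4]$, and fixing $\lambda_3 = 0$ leaves exactly the hull of the remaining pieces.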
Here are the key lemmata used in the proofs:
Lemma 1: Each basic construction (union, intersection, linear affine transformation, Cartesian product) is relaxation commutative.
Lemma 2: If $\operatorname{Op}(\cdot)$ has no occurrences of union and $K = (K_1, \ldots, K_t)$ is a vector of convex sets, then $\operatorname{Op}(K)$ is convex.
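Lemma 3: For $T$ linear affine:
$$\operatorname{conv}(T(S)) = T(\operatorname{conv}(S)), \qquad \operatorname{Rel}(T(S)) = T(\operatorname{Rel}(S))$$
(In the first equation, $T$ is an operation, while it is a construction in the second equation.)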
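Lemma 4:
$$\operatorname{Rel}(S_1 \times \cdots \times S_t) = \operatorname{Rel}(S_1) \times \cdots \times \operatorname{Rel}(S_t), \qquad \operatorname{conv}(S_1 \times \cdots \times S_t) = \operatorname{conv}(S_1) \times \cdots \times \operatorname{conv}(S_t)$$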
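Idea of proof for commutativity: By induction on the formation of $\operatorname{Op}$. Let
$$\operatorname{Op}(S) = \operatorname{Op}'(\operatorname{Op}_1(S), \ldots, \operatorname{Op}_r(S)),$$
where $S = (S_1, \ldots, S_t)$, $\operatorname{Op}'$ is a basic construction, and the $\operatorname{Op}_i$ are composite constructions or the identity.
Case: $\operatorname{Op}'$ is intersection. We have:
$$\begin{aligned}
\operatorname{Rel}(\operatorname{Op}(S)) &= \operatorname{Rel}\Big(\bigwedge_i \operatorname{Op}_i(S)\Big) = \operatorname{conv}\Big(\bigcap_i \operatorname{Rel}(\operatorname{Op}_i(S))\Big) && \text{(Lemma 1)}\\
&= \bigcap_i \operatorname{Rel}(\operatorname{Op}_i(S)) && \text{(Rel is convex)}\\
&= \bigcap_i \operatorname{conv}(\operatorname{Op}_i(\operatorname{Rel}(S))) && \text{(induction)}\\
&= \bigcap_i \operatorname{Op}_i(\operatorname{Rel}(S)) && \text{(Lemma 2)}\\
&= \operatorname{Op}(\operatorname{Rel}(S)) = \operatorname{conv}(\operatorname{Op}(\operatorname{Rel}(S))) && \text{(Lemma 2)}
\end{aligned}$$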
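Case: $\operatorname{Op}'$ is $T(\cdot)$, linear affine:
$$\begin{aligned}
\operatorname{Rel}(\operatorname{Op}(S)) &= \operatorname{Rel}(T(\operatorname{Op}_1(S))) = T(\operatorname{Rel}(\operatorname{Op}_1(S))) && \text{(Lemma 3)}\\
&= T(\operatorname{conv}(\operatorname{Op}_1(\operatorname{Rel}(S)))) && \text{(induction)}\\
&= \operatorname{conv}(T(\operatorname{Op}_1(\operatorname{Rel}(S)))) && \text{(Lemma 3)}\\
&= \operatorname{conv}(\operatorname{Op}(\operatorname{Rel}(S)))
\end{aligned}$$
Etc. for other cases.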
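Idea of proof for sharpness: Also by induction.
Case: $\operatorname{Op}'$ is union:
$$\begin{aligned}
\operatorname{Rel}(\operatorname{Op}(S)) &= \operatorname{Rel}\Big(\bigvee_i \operatorname{Op}_i(S)\Big) = \operatorname{conv}\Big(\bigcup_i \operatorname{Rel}(\operatorname{Op}_i(S))\Big) && \text{(Comm.)}\\
&= \operatorname{conv}\Big(\bigcup_i \operatorname{conv}(\operatorname{Op}_i(S))\Big) && \text{(Ind. Hyp.)}\\
&= \operatorname{conv}\Big(\bigcup_i \operatorname{Op}_i(S)\Big) = \operatorname{conv}(\operatorname{Op}(S))
\end{aligned}$$
Case: $\operatorname{Op}'$ is linear affine $T$:
$$\begin{aligned}
\operatorname{Rel}(\operatorname{Op}(S)) &= T(\operatorname{Rel}(\operatorname{Op}_1(S))) && \text{(Lemma 3)}\\
&= T(\operatorname{conv}(\operatorname{Op}_1(S))) && \text{(Ind. Hyp.)}\\
&= \operatorname{conv}(T(\operatorname{Op}_1(S))) = \operatorname{conv}(\operatorname{Op}(S))
\end{aligned}$$
Etc. for Cartesian product.
We illustrate the use of these technical theorems. Suppose: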
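Since $(S_1 \times S_2 + S_3) \vee S_4$ does not contain intersection, by the Theorem on Commutativity we have:
$$\operatorname{Rel}((S_1 \times S_2 + S_3) \vee S_4) = \operatorname{conv}\big((\operatorname{Rel}(S_1) \times \operatorname{Rel}(S_2) + \operatorname{Rel}(S_3)) \cup \operatorname{Rel}(S_4)\big)$$
In addition, if the $S_i$ are sharp for $i = 1, 2, 3, 4$, we have:
$$\operatorname{Rel}((S_1 \times S_2 + S_3) \vee S_4) = \operatorname{conv}((S_1 \times S_2 + S_3) \cup S_4)$$
Since $S_3 + S_5 \wedge S_4$ does not contain union, we have:

3.7
Hereditary sharpness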
Def: A representation $S$ is hereditarily sharp if, for every partial assignment $W$ of control variables to binary values, with resulting representation $S(W)$, we have $\operatorname{Rel}(S(W)) = \operatorname{conv}(\operatorname{Ev}(S(W)))$. Thus, a hereditarily sharp representation is sharp (take $W = \emptyset$). A hereditarily sharp representation does not require reformulation at lower nodes of the search tree solely in order to retain sharpness.
Theorem on Hereditary Sharpness: If the composite construction $\operatorname{Op}(\cdot)$ does not contain occurrences of intersection, it is hereditarily sharp (i.e., it takes hereditarily sharp representations to hereditarily sharp representations). See [Jer 1984c] for more details on hereditary sharpness. Disjunctive representations are hereditarily sharp. In [Jer Low 1985] we give a sharp representation which is not hereditarily sharp and which arises naturally.
LECTURE 4 TOPICS IN REPRESENTABILITY
Summary: We conclude our survey of MIP representability with a brief discussion of these four topics:
4.1 Reformulation of MIP via distributive laws, which will serve to justify some claims made in Lecture 3, and will be used again in Lecture 9 in connection with logic;
4.2 A discussion of the regularity condition which arises in nonlinear (specifically, convex union) representability, and its relation to some classical results;
4.3 An example of the kind of result which allows a significant reduction in the size of disjunctive representations when a "combinatorial index set" is present; this result simultaneously extends common uses of Martin's variable redefinition and the union construction of disjunctive methods;
4.4 Some experimental results which illustrate the value of disjunctive formulations.
4.1
Reformulation via distributive laws
The material in this section occurs in [Jer 1986b]. It is well-known that:
More generally we have:
Proposition: If $\operatorname{Op}(U, W)$ has one occurrence of the set $U$, and $W$ is a vector of sets, then:
$$\operatorname{Op}\Big(\bigcup_i S_i, W\Big) = \bigcup_i \operatorname{Op}(S_i, W)$$
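Theorem: For a composite construction $\operatorname{Op}(U, W)$ with one occurrence of the representation $U$, and $W$ a vector of representations:
$$(*)\qquad \operatorname{Rel}\Big(\operatorname{Op}\Big(\bigvee_i S_i, W\Big)\Big) \supseteq \operatorname{Rel}\Big(\bigvee_i \operatorname{Op}(S_i, W)\Big)$$
with $=$ in place of $\supseteq$ when $\operatorname{Op}$ has no occurrence of intersection.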
The content of the theorem is that the LR is improved (made smaller) by moving union outward. The reader can verify that this is what we did in Lecture 3.3 to derive (2) from (1) and to derive (3) from (2). Note that this improvement in the LR generally increases the size of the representation, unless there are simplifications which occur.
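A one-dimensional sketch of this theorem (our own illustration; the helper names are hypothetical). Each $S_i$ is a closed interval with its natural, sharp description, so the LR of a union is the convex hull of the pieces and the LR of an intersection is the intersection of the LRs. Moving the union outward in $S_1 \wedge (S_2 \vee S_3)$ can shrink the LR considerably; in the instance below it even reveals that the set is empty.

```python
def hull(pieces):
    """Convex hull of a union of closed intervals; None denotes the empty set."""
    pieces = [p for p in pieces if p is not None]
    if not pieces:
        return None
    return (min(a for a, _ in pieces), max(b for _, b in pieces))


def meet(p, q):
    """Intersection of two closed intervals; None denotes the empty set."""
    if p is None or q is None:
        return None
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None


def lr_union_inside(s1, s2, s3):
    # Rel(S1 ∧ (S2 ∨ S3)) = Rel(S1) ∩ conv(Rel(S2) ∪ Rel(S3))
    return meet(s1, hull([s2, s3]))


def lr_union_outside(s1, s2, s3):
    # Rel((S1 ∧ S2) ∨ (S1 ∧ S3)) = conv(Rel(S1 ∧ S2) ∪ Rel(S1 ∧ S3))
    return hull([meet(s1, s2), meet(s1, s3)])


if __name__ == "__main__":
    S1, S2, S3 = (4, 6), (0, 1), (9, 10)
    print(lr_union_inside(S1, S2, S3))   # (4, 6): the relaxation misses the emptiness
    print(lr_union_outside(S1, S2, S3))  # None: the LR is already empty
```

The price, as noted above, is that $S_1$ now appears twice in the representation.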
Idea of proof: An induction on the formation of $\operatorname{Op}$:
where $\operatorname{Op}'$ is a basic construction and we allow that any $\operatorname{Op}_i(W)$ may simply be a representation in $W$, and similarly for $\operatorname{Op}_1(U, W)$. We have by commutativity
Also note:
Therefore we have that $\operatorname{Rel}(\operatorname{Op}(\bigvee_i S_i, W))$ contains:
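A slightly more involved argument treats the $=$ case. It is well known that
$$S \cup \Big(\bigcap_i S_i\Big) = \bigcap_i (S \cup S_i)$$
More generally, we have
Proposition: If $\operatorname{Op}(U, W)$ has one occurrence of the set $U$ and $W$ is a vector of sets, then
$$\operatorname{Op}\Big(\bigcap_i S_i, W\Big) \subseteq \bigcap_i \operatorname{Op}(S_i, W)$$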
Moreover, if every linear affine $T$ in $\operatorname{Op}$ is 1-1, then
$$\operatorname{Op}\Big(\bigcap_i S_i, W\Big) = \bigcap_i \operatorname{Op}(S_i, W)$$
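Theorem: For a canonical construction $\operatorname{Op}(U, W)$ with one occurrence of the representation $U$, and $W$ a vector of representations:
$$\operatorname{Rel}\Big(\operatorname{Op}\Big(\bigwedge_i S_i, W\Big)\Big) \subseteq \operatorname{Rel}\Big(\bigwedge_i \operatorname{Op}(S_i, W)\Big)$$
Note: This is opposite to the $\supseteq$ for $\vee$. Here a generally smaller representation has the better LR.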
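The opposite behaviour for intersection shows up in the same hypothetical one-dimensional setting (again our own sketch, with intervals as sharp pieces). Taking $\operatorname{Op}(U, W) = W \vee U$, the form $S \vee (S_1 \wedge S_2)$ uses one copy of $S$ and its LR is contained in that of the distributed form $(S \vee S_1) \wedge (S \vee S_2)$.

```python
def hull(pieces):
    """Convex hull of a union of closed intervals; None denotes the empty set."""
    pieces = [p for p in pieces if p is not None]
    if not pieces:
        return None
    return (min(a for a, _ in pieces), max(b for _, b in pieces))


def meet(p, q):
    """Intersection of two closed intervals; None denotes the empty set."""
    if p is None or q is None:
        return None
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None


def lr_intersection_inside(s, s1, s2):
    # Rel(S ∨ (S1 ∧ S2)) = conv(Rel(S) ∪ (Rel(S1) ∩ Rel(S2)))
    return hull([s, meet(s1, s2)])


def lr_intersection_outside(s, s1, s2):
    # Rel((S ∨ S1) ∧ (S ∨ S2)) = conv(Rel(S) ∪ Rel(S1)) ∩ conv(Rel(S) ∪ Rel(S2))
    return meet(hull([s, s1]), hull([s, s2]))


if __name__ == "__main__":
    S, S1, S2 = (0, 1), (2, 3), (7, 8)
    print(lr_intersection_inside(S, S1, S2))   # (0, 1)
    print(lr_intersection_outside(S, S1, S2))  # (0, 3)
```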
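Monotonicity Principle: Let $W = (W_1, \ldots, W_t)$ and $W' = (W'_1, \ldots, W'_t)$ be vectors of sets with representations $W_i$, $W'_i$, and let $\operatorname{Op}$ be a composite construction of $t$ arguments. Let $\operatorname{Rel}(W) \supseteq \operatorname{Rel}(W')$ abbreviate $\operatorname{Rel}(W_i) \supseteq \operatorname{Rel}(W'_i)$ for $i = 1, \ldots, t$. Then
$$\operatorname{Rel}(W) \supseteq \operatorname{Rel}(W') \implies \operatorname{Rel}(\operatorname{Op}(W)) \supseteq \operatorname{Rel}(\operatorname{Op}(W'))$$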
By using the two distributive laws and the monotonicity principle, we obtain lattices of reformulations of the same MIP constraint set, in which we can predict changes in the LR. There are, at any given point, many options for which distributive laws will be used to obtain a reformulation. The example below illustrates this, with $D = \bigwedge_k D_k$. The arrows point to an improved LR, while "Inc" written on an arrow indicates that the size of the problem representation would increase, unless simplifications occur. Similarly, "Dec" indicates a decreased size of the representation:
Figure 15: A Lattice of Representations
Here are some reformulation principles, to aid in tightening the LR:
1. Always begin by pushing intersections inward (via distributive laws) wherever this is valid (e.g. all linear affine $T$ are 1-1).
2. Gradually move unions outward. When a union reaches the outermost part of the construction, consider the (generally preferred) alternative of branching on it.
3. Always check reformulations for possible simplifications which may reduce size.
4. In choosing which union to move outward, consult the user for information to rank which 'or' subproblems appear to be most crucial. Pretesting may be used to validate user perceptions.
4.2
Convex union representability
In this section, we consider the representation of certain nonlinear sets, specifically, convex constraints with some variables binary. This material is drawn from [Jer 1984b]. Let $F(x; y)$ be a vector of positively homogeneous closed convex functions, with no $-\infty$ values. The following technical regularity condition plays an essential role:
$$(*)\qquad F(0; y) \le 0 \implies y = 0$$
Def: A set $S$ is bounded convex representable (b.c.r.) when there is a vector $b$, an $F(x; y)$ as above, satisfying (*), and a subset $K$ of the indices of auxiliary variables $y$, with
$$x \in S \iff \text{there is } y \text{ with } y_j \in \{0,1\} \text{ for } j \in K \text{ and } F(x; y) \le b.$$
Theorem: A set $S$ is b.c.r. iff $S = S_1 \cup \cdots \cup S_t$ is a finite union of closed, convex sets $S_i$ with $\operatorname{rec}(S_i)$ independent of $i$. When $S$ is b.c.r., it has a sharp representation, i.e. an $\bar{S}$ with $\operatorname{Ev}(\bar{S}) = S$ and
$$\operatorname{Rel}(\bar{S}) = \operatorname{conv}(S).$$
We remark that the disjunctive construction can be adapted to produce the sharp representation $\bar{S}$. Also, the canonical constructions go over, with new conditions on their domains (see below). Moreover, the sets representable with $K$ empty are exactly the closed, convex sets. In the regularity condition (*), the constraints $0 \le y_k \le 1$ for $k \in K$ can be viewed as included in the vector of functions $F$, without changing its homogeneous nature, etc. Thus (*) is actually equivalent to the apparently weaker requirement:
$$F(0; y) \le 0 \text{ and } y_k = 0 \text{ for } k \in K \implies y = 0$$
We now check what the regularity condition becomes in the case of a linear affine operation. Here: If $S$ is represented by $F$, then:
$$x \in T(S) \iff \text{there are } y, w \text{ with } y_j \in \{0,1\} \text{ for } j \in K,\ F(w; y) \le b,\ x = T(w).$$
Writing $T(w) = A(w) + w_0$, with $A$ linear, the regularity condition becomes:
$$F(w; y) \le 0,\ 0 = A(w) \implies y = 0,\ w = 0.$$
It need not be true. Let us now specifically focus on the two common applications of linear affine operations, namely, sums and projections. Let $T(S_1, \ldots, S_t) = S_1 + \cdots + S_t$, and let each $S_i$ be described by $F_i(x; y^{(i)}) \le b^{(i)}$:
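$$x \in S_1 + \cdots + S_t \iff \text{there exist } y^{(i)}, w^{(i)} \text{ with } y^{(i)}_j \in \{0,1\} \text{ for all } j \in K(i) \text{ and all } i,\ 0 \le y^{(i)}_j \le 1 \text{ for all } j \in K(i) \text{ and all } i,\ F_i(w^{(i)}; y^{(i)}) \le b^{(i)} \text{ all } i,\ x = \sum_i w^{(i)}.$$
The regularity condition is: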
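Lemma [Jer 1984b]: $\operatorname{rec}(S_i) = \{\, w \mid F_i(w; y^{(i)}) \le 0 \text{ for some } y^{(i)} \text{ with } y^{(i)}_j = 0 \text{ for } j \in K(i) \,\}$, provided $S_i \neq \emptyset$. Hence to meet the regularity condition, we need only assume:
$$w^{(i)} \in \operatorname{rec}(S_i) \text{ for all } i \text{ and } \sum_i w^{(i)} = 0 \implies \text{all } w^{(i)} = 0.$$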
This is a classical requirement for $S_1 + \cdots + S_t$ to be closed and convex. Similarly for projection of $x = (x^{(1)}, x^{(2)})$ onto $x^{(1)}$:
$$x^{(1)} \in \operatorname{proj}(S) \iff \text{there exists } x^{(2)}, \text{ with } y_j \in \{0,1\},\ j \in K,\ 0 \le \ldots$$
and let $D_t$ be the (known) demand in period $t$. It is known [Vei 1969] that the extreme points of the set of solutions have the property that, in any period of positive production, the entering inventory is zero. For example, in a $T = 5$ period problem where every three adjacent components of the 15-component vector represent a triple $(x_t, I_t, y_t)$ in ascending order of the period $t$, the following vector represents one extreme point solution:
$$(D_1 + D_2,\ D_2,\ 1,\ 0, 0, 0,\ D_3 + D_4 + D_5,\ D_4 + D_5,\ 1,\ 0,\ D_5,\ 0,\ 0, 0, 0)$$
In this solution, we manufacture in period 1 for the first two periods and in period 3 for the last three periods, using inventory to satisfy demand in periods 2, 4 and 5, where we do no production. Note that the vector above can be viewed as the sum of the vectors
$$(D_1 + D_2,\ D_2,\ 1,\ 0, 0, 0,\ 0, 0, 0,\ 0, 0, 0,\ 0, 0, 0)$$
and
$$(0, 0, 0,\ 0, 0, 0,\ D_3 + D_4 + D_5,\ D_4 + D_5,\ 1,\ 0,\ D_5,\ 0,\ 0, 0, 0).$$
These two vectors can, in turn, be viewed as labels for certain arcs in the following graph; specifically, the first vector labels the arc from node 1 to node 3, while the second labels the arc from node 3 to node 5.

Figure 16: Network on an Index Set

In a similar manner, all extreme point vectors are obtained by choosing a path from node 1 to node 5 in the network of Figure 16 and adding up the associated arc vectors. In general, there are exponentially many paths through such "lot sizing networks"; nevertheless, a compact representation of the extreme points will be possible, using the network structure as an "index set" for the vectors. In addition, the linear relaxation of the representation of extreme points will provide a representation of all solutions, i.e. will be sharp. The sharpness will be a consequence of a sharp representation for the index set, i.e. for the set of all paths in a network. Indeed, by placing unit capacities on all arcs in the network, a unit flow into the "source" node 1 and a unit flow out of the "sink" node 5, from the work of Ford and Fulkerson [For Ful 1962], the extreme point solutions to the standard conservation equations at nodes will be the indicator vectors of the paths, where these latter vectors have a component for each arc in the network. In this manner, we have an example of a set of vectors which is not itself a combinatorial set, and so not the indicator vectors of a combinatorial problem, but for which there is a "hidden" combinatorial structure that "indexes" the set and can be used to achieve sharpness for a compact representation. Let us now abstract this example somewhat.
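The path structure can be checked by brute force on a small instance. The sketch below is our own, with hypothetical function names and made-up demands $D = (2, 3, 5, 7, 11)$. It assumes one labeling convention that is not necessarily that of Figure 16: nodes $1, \ldots, T+1$, where arc $(i, j)$, $i < j$, means "produce in period $i$ for periods $i, \ldots, j-1$" and carries the vector with $x_i = D_i + \cdots + D_{j-1}$, $y_i = 1$ and end-of-period inventories $I_t = D_{t+1} + \cdots + D_{j-1}$ for $i \le t < j$. Under this convention the extreme point shown above (with $y_3 = 1$, since production occurs in period 3) is the sum along the path $1 \to 3 \to 6$. Enumerating all $2^{T-1}$ paths is, of course, exactly what the compact representation is meant to avoid.

```python
from itertools import combinations


def arc_vector(i, j, D):
    """Vector of triples (x_t, I_t, y_t), t = 1..T, for arc (i, j), 1 <= i < j <= T+1.
    D[t-1] is the demand of period t; I_t is the inventory at the end of period t."""
    T = len(D)
    v = [0] * (3 * T)
    v[3 * (i - 1)] = sum(D[i - 1:j - 1])      # x_i covers periods i..j-1
    v[3 * (i - 1) + 2] = 1                     # y_i = 1: setup in period i
    for t in range(i, j):                      # stock carried for later periods
        v[3 * (t - 1) + 1] = sum(D[t:j - 1])   # I_t = D_{t+1} + ... + D_{j-1}
    return v


def path_extreme_points(D):
    """Map each path 1 -> ... -> T+1 to the sum of its arc vectors."""
    T = len(D)
    points = {}
    for k in range(T):                                   # k intermediate nodes
        for middle in combinations(range(2, T + 1), k):
            nodes = (1, *middle, T + 1)
            total = [0] * (3 * T)
            for i, j in zip(nodes, nodes[1:]):
                total = [a + b for a, b in zip(total, arc_vector(i, j, D))]
            points[nodes] = total
    return points


if __name__ == "__main__":
    D = [2, 3, 5, 7, 11]
    pts = path_extreme_points(D)
    print(len(pts))          # 2**(T-1) paths
    print(pts[(1, 3, 6)])    # produce in periods 1 and 3
```

Every generated vector satisfies inventory balance $I_{t-1} + x_t - I_t = D_t$ and has zero entering inventory whenever $x_t > 0$, which is the extreme point property cited from [Vei 1969].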
In place of the arc vectors, we consider general representable (b-MIP.r) sets $T_j$, rather than simply singleton sets. The natural representation of singleton sets is sharp, and in general, we will require a sharp representation for each $T_j$. In place of the network, we consider a general combinatorial structure $F$ of ordered pairs $(i, j)$, where in our application $(i, j) \in F$ abbreviates "arc $j$ is on path $i$." We shall require a sharp representation for the incidence vectors of the collection of sets $\{j \mid (i, j) \in F\}$ as $i$ (the paths) varies. In this broad setting, we are now prepared to state the representation result which will account for lot sizing as one application.
Theorem: Let $S = \bigcup_i S_i$, each $S_i = \sum_{j \mid (i,j) \in F} T_j$, with each $T_j \neq \emptyset$ and $\operatorname{rec}(T_j)$ independent of $j$. Let $f_j(x; y^{(j)}) \le b^{(j)}$, $y^{(j)}_p \in \{0,1\}$ for $p \in K_j$, be a sharp representation of $T_j$. Let $g(u; w) \le b$, $w_p \in \{0,1\}$ for $p \in K$, be a sharp representation of
$$V = \{\, u \text{ binary} \mid \text{for some } i,\ [u_j = 1 \text{ iff } (i, j) \in F] \text{ for all } j \,\}.$$
Then the following is a sharp representation of $S$:
$$\begin{aligned}
x \in S \iff{}& \text{for some } x^{(j)}, y^{(j)}, u, w \text{ with } w_p \in \{0,1\} \text{ for } p \in K,\ y^{(j)}_p \in \{0,1\} \text{ for } p \in K_j \text{ all } j,\\
& 0 \le y^{(j)}_p \le u_j \text{ for } p \in K_j \text{ all } j, \text{ we have}\\
& f_j(x^{(j)}; y^{(j)}) \le b^{(j)} u_j \text{ all } j,\quad g(u; w) \le b \quad\text{and}\quad x = \textstyle\sum_j x^{(j)}.
\end{aligned}$$
We shall provide a proof of the theorem above in a forthcoming paper.¹ The description $S = \bigvee_i \big(\sum_{j \mid (i,j) \in F} T_j\big)$ is already sharp, since it has no occurrence of intersection. Only the size of the representation is at issue in this result. When the $T_j$ are general b-MIP.r sets, but $F$ is the diagonal, i.e. $F = \{(j, j) \mid \text{all } j\}$ with sharp description $\sum_j u_j = 1$, this result becomes the union construction of the disjunctive methods. Several variations of simple lot sizing are treated by the same approach, so long as the problem reduces to the set of extreme points described earlier (or a very similar set). The versions in the literature differ in this respect from what we described above: only the implication from $x_t > 0$ to $y_t = 1$ is required in the literature. To summarize, polyhedral combinatorics can often be used to obtain sharp representations of the base sets (the "modules") of a composite construction. It is equally useful as a means of entirely restructuring the representation of a composite set, when relevant combinatorial objects are used merely to index such a construction. The results obtainable from polyhedral combinatorics are, on the one hand, fairly specialized. However, they are among the most powerful principles for limiting enumeration and making representations compact.
¹Added in proof: See our technical report, "Two Mixed Integer Programming Formulations Arising in Manufacturing Management."
These new formulations further extend the concept of "special structures."
4.4
Some experimental results
Here we summarize two computer experiments reported in [Jer Low 1985] and described in further detail in an earlier technical report. All problems were randomly generated with specific structure as described below.
4.4.1
Either/or constraints
The scenario is a multi-division firm in which each division has a choice of technologies. The composite structure is of depth 3:
Here $P$ represents the common constraints on corporate resources, and $P_{ij}$ is the $j$-th technology available for division $i$. In all problems run, $P$ has three constraints, and all $P_{ij}$ are 3 by 3. "N1-N2" below means N1 divisions, N2 technologies per division. In the following tables:
MIPP = formulation via composite construction, with disjunctive methods for the either/or constraints of technology choice;
MIPS = "standard" formulation for either/or constraints.
The right-hand-side (r.h.s.) multiplier is a measure of the degree to which the common constraints dominate the problem. At the setting 1.1, these are moderately tight, and so diminish the advantage of the sharp formulation for either/or constraints. At the setting 1.9, the either/or constraints dominate. In all three tables, the feature which stands out is the LP/Discrete ratio, which gives the ratio of the value of the linear programming relaxation to the value of the integer program. The fact that these two values are so close for MIPP accounts for its favorable results, and the ratio is an algorithm-independent measure (our problems were run on APEX IV). In general, the size of our "sharp" formulation MIPP was at least twice that of the "standard" formulation MIPS.
I TIME
MIPP
I
I
I
I
MI S
NODES
MIPP
MIPS
7
1.0005
1.084
0.92
19
1.0047
1.098
5
29.08
65
1.0007
1.174
3.92
12
6.67
13
1.0048
1.088
6.93
3.75
7
24.25
57
1.00275
1.102
8.85
33.5
5.16
10
69.25
165
1.0008
1.1336
9
21.43
484'
4.11
9
140,
1.00168
1.1466**
4
29.1
399'
5
6
1.00046
1.144.'
#
MIPP
3-2
12
.63
0.39
1.83
3
3.75
5-2
12
1.86
2.36
2.5
6
8-2
12
3.58
3-3
12
1.64
0.91
5-3
12
3.15
8-3
12
12-3 15-3
Problem
MIPS
Avg
Max
Avg
Max
-
11.52 2.17
382
-
* only one sample
** the ratio is the LP over the best solution found
+ 0.83
0.69
2.2
Table II: MULTI-DIVISION PROBLEMS: R.H.S. MULTIPLIER (1.3)
I
I
I
I
I
?P
Problem
#
-
-
I
I
I
NODES
'
I
'
MIPS
1
RATIO LP/DIS( RETE
-
MIPP
MIPS
Avg
MIPP
MIPS
7
1.0066
1.142
Avg
Max
4
4.9
MsI
-
3-2
18
5-2
18
1.92
2.72
2.3
4
9.0
29
1.0009
1.118
8-2
18
5.89
18.60
2.7
7
39.6
91
1.0013
1.174
3-3
18
1.69
1.52
3.4
10
9.7
17
1.0032
1.133
5-3
18
4.20
11.52
3.9
10
38.2
119
1.0011
1.174
8-3
18
10.99
123.5
4.5
18
194.7
597
1.0017
1.116
12-3
3
19.4
-
6.3
9
-
1.00008
9
-
1.00008
15-3
3
-
30.5
-
4
I
I
Table III: MULTI-DIVISION PROBLEMS: R.H.S. MULTIPLIER (1.9)

                 TIME                NODES                           RATIO LP/DISCRETE
                                     MIPP            MIPS
Problem   #      MIPP     MIPS       Avg     Max     Avg      Max    MIPP       MIPS
3-2       8      0.97     1.44       2.0     3       8.75     14     1.0026     1.186
5-2       8      2.32     10.42      2.0     3       32.75    55     1.0006     1.2383
8-2       8      6.60     64.92      3.0     6       115.4    278    1.0034     1.1926
3-3       8      1.59     4.13       2.38    3       21.1     31     1.0088     1.1971
5-3       8      5.57     41.77      4.12    9       107.7    337    1.0041     1.2393
8-3       8      12.22    311.68     3.63    7       407.4    762    1.0013     1.2283
12-3
3
25.0
1.0004
1.1408
15-3
-
4
35.15
1.0008
1.315
325* 3.66 306*
7
4
218'
14
270'
-
* only one sample ** the ratio is the LP over the best discrete found when stopped
4.4.2
Multiple fixed charges
Fixed charge problems with multiple charges and increasing returns to scale were modelled. The graph of a typical function of this type is drawn below.
Figure 17: Complex Fixed Charges
In this scenario, the $x_i$ were the numbers of components of various types which could be used to assemble several different types of end products, with known demand for sets of "equivalent" end products. We explored several formulations. In the "sharp" formulation, the cost functions $F_i$ were represented by the disjunctive formulation. In the "linked sharp" formulation, the cost functions $F_i$ were each additively decomposed into a sum of fixed charges plus an "economies of scale" function. Each component was modeled sharply via a disjunctive formulation, and these were then added. In addition, formulations I and II are similar to those appearing in the literature and were used as two "standard" formulations. In our data, the parameter $P$ is a measure of the (minimum) percent of cost tied up in fixed charges; these were relatively high on the average. Also, NXi is the number of component variables $x_i$.

Table IV

MODEL           # CONSTRAINTS     TOTAL # VARIABLES     # INTEGER VARIABLES
STANDARD I      10*NXi + NC       11*NXi + NYt          6*NXi
STANDARD II     13*NXi + NC       14*NXi + NYt          6*NXi
SHARP           6*NXi + NC        10*NXi + NYt          3*NXi
LINKED-SHARP    11*NXi + NC       19*NXi + NYt          6*NXi

As can be seen in Table IV, the size of the sharp representation is the smallest of all representations. Here is an instance where hereditary sharpness is achieved with an improvement in representation size, due to problem-dependent simplifications. As can be seen from the data in Tables V to VIII, these problems were difficult for all methods of representation. This occurs because the linear relaxation for even a sharp representation of fixed costs is a poor estimate of the actual cost in most of the range of the variable, and here the fixed costs were dominant in the data. Nevertheless, the sharp representation was better, and its relative advantage increases with problem size, as in 4.4.1.
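Why even a sharp representation relaxes fixed costs poorly can be seen in the simplest case. The sketch below is our own simplification, with a single fixed charge rather than the multiple charges of Figure 17: cost $F + c\,x$ for $0 < x \le U$ and 0 at $x = 0$. A sharp representation has the convex hull of the graph as its LR, so the smallest cost the LR can assign at $x$ is the chord $(F/U + c)\,x$. The numbers $F = 100$, $c = 1$, $U = 10$ are hypothetical, chosen so that the fixed charge dominates, as in the experiments; function names are likewise hypothetical.

```python
from fractions import Fraction


def actual_cost(x, F, c, U):
    """Single fixed charge: 0 at x = 0, F + c*x for 0 < x <= U."""
    assert 0 <= x <= U
    return 0 if x == 0 else F + c * x


def sharp_lr_cost(x, F, c, U):
    """Convex envelope of actual_cost on [0, U]: the chord from (0, 0) to (U, F + c*U),
    i.e. the cost a sharp (convex-hull) relaxation can charge at x."""
    assert 0 <= x <= U
    return (Fraction(F, U) + c) * x


if __name__ == "__main__":
    F, c, U = 100, 1, 10
    for x in (1, 2, 5, 8, 10):
        ratio = Fraction(actual_cost(x, F, c, U)) / sharp_lr_cost(x, F, c, U)
        print(x, float(ratio))   # actual cost / LR cost
```

The ratio equals 1 only at the ends of the range and is close to 2 at its midpoint, which matches the remark that the relaxation is poor over most of the range.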
t+
SAMPLES
P = 0.5
DISCRETE/ LPRATIO
SHARP
1.36
I
1.86
34.04
11
1.36
I #
# SAMPLES
I
P = 0.1 1 TIME NODES ~
SHARP I
II
, r 83138
DISCRETE/ LP RATIO 1.22
25.01
65.61
138.25
1.48
106.29
131.38
1.22
I
Table VI: Seven problems with P = 0.3, NXi = 5

Formulation     Avg. time    Avg. time to     Total    Number      Discrete/LP
                to LP        find optimum     time     of nodes
Sharp           1.3 sec      14.3             18.7     63.3        1.26
I               2.0          8.7              41.3     84.4        1.50
Linked-sharp    3.8          30.0             60.9     79.3        1.26
Table VII: Six problems with P = 0.3, NXi = 6

Formulation     Avg. time    Avg. time to     Total    Number      Discrete/LP
                to LP        find optimum     time     of nodes
Sharp           2.0 sec      22.6             36.5     94.8        1.32
I               3.0          40.3             120.2    176.6       1.56
Linked-sharp    5.5          25.2             129.2    119.2       1.32
Table VIII: A harder problem at NXi = 6, P = 0.3
Formulation
1
Time to LP
2.0 sec.
Linked-sharp
I
I Total time
of nodes
57.4
62.9
137
67.7
2133
2900
113.4
2400 unknown
Time to OPT
Number
1 :!i!
Discrete/LP
Part II
LOGIC-BASED APPROACHES TO DECISION SUPPORT
LECTURE 5 PROPOSITIONAL LOGIC AND MIXED INTEGER PROGRAMMING
Summary: We begin our discussion of the logic-based approaches to systematizing human intelligence with an exposition of the propositional logic and its relation to mixed-integer programming. This is a natural starting point for mathematical programmers, since propositional logic can be viewed as a special kind of integer programming constraint set. In addition, many of the successful current practical uses of logic in decision support do not go far beyond propositional logic, although this situation may soon change. We defer until the next lecture a fuller discussion of the logic-based approaches, and a treatment of a more complex logic, the predicate logic. Predicate logic is the theoretical basis of the theorem-proving framework of PROLOG [Clo Mel 1984] and of related technological efforts, including the Japanese "Fifth Generation Project" [Feig McCor 1982].
5.1
Introduction
The propositional logic concerns assertions such as "John is tall," "Mary went to the store," etc., with a definite meaning, such that these assertions are either true or false. The focus of this logic is not on the meaning of the assertions, nor even necessarily on whether they are true or false. Rather, the focus concerns how the "unanalyzed" basic assertions are combined by means of the logical connectives 'and', 'or', 'not', 'implies', etc., and on the laws governing such combinations. The unanalyzed assertions are represented by a numbered sequence of letters $P_1, P_2, P_3, \ldots$, which, in informal discussions, are written $P, Q, R$, etc. A literal is a letter ($P_j$) or its negation ($\neg P_j$). More complex propositions are built up from $P_1, P_2, P_3, \ldots$ by means of the connectives. These propositions are denoted $A, B, C$, etc. The meaning of the connectives is as follows:
$A \wedge B$: Asserts that both $A$ and $B$ are true. '$\wedge$' is read 'and' (conjunction).
$A \vee B$: Asserts that at least one of $A$, $B$ is true (possibly both are true). '$\vee$' is read 'or' (disjunction).
$\neg A$: Asserts that $A$ is false. '$\neg$' is read 'not' (negation).
$A \supset B$: Abbreviates $\neg A \vee B$. '$\supset$' is read 'implies' (implication).
We remark that $A \supset B$ does not assert that $A$ causes $B$, only that either $A$ is false or $B$ is true. Other notations used elsewhere for $A \supset B$ are $A \rightarrow B$ and (in logic programming) $B \leftarrow A$ (read: $B$ if $A$). Let $+P_j$ abbreviate $P_j$ and let $-P_j$ abbreviate $\neg P_j$. A disjunctive clause is a proposition of the form $\bigvee_{j \in K} \pm P_j$ for some finite index set $K \subseteq \{1, 2, 3, \ldots\}$.
E.g. $\neg P_2 \vee P_1 \vee P_{20} \vee \neg P_{17}$ is a disjunctive clause.
A conjunctive normal form (CNF) is a conjunction of disjunctive clauses:
$$(\neg P_2 \vee P_1 \vee P_{20} \vee \neg P_{17}) \wedge (\neg P_1 \vee P_{16}) \wedge (P_2 \vee \neg P_{20} \vee P_{17})$$
is a CNF.
Sometimes a CNF is given as a list of disjunctive clauses:
$\neg P_2 \vee P_1 \vee P_{20} \vee \neg P_{17}$
$\neg P_1 \vee P_{16}$
$P_2 \vee \neg P_{20} \vee P_{17}$
It is tacitly understood that all clauses are asserted as true.
The propositional logic has greatly influenced the disjunctive methods of mixed-integer programming. In fact, these methods concern simply the negationless propositional logic in which unanalyzed propositions have been replaced by systems of linear inequalities. We observe that a binary mixed integer program constraint set
$$(Ax \ge b) \wedge (x_1 = 0 \vee x_1 = 1) \wedge \cdots \wedge (x_n = 0 \vee x_n = 1)$$
is a CNF in which the propositional letters are systems of linear inequalities. In MIP, negation does not occur. In fact, the spatial negation of a linear inequality system (e.g. $x_1 = 0$) is typically not closed (e.g. $x_1 \neq 0$), hence not representable.
In many settings, relative complement (relative to the b-MIP.r set forming the 'universe') can serve as a negation for at least some of the 'letters.' E.g., relative to the binary MIP constraint set above as universe,
$$\neg(x_1 = 0) = (Ax \ge b) \wedge (x_1 = 1) \wedge (x_2 = 0 \vee x_2 = 1) \wedge \cdots \wedge (x_n = 0 \vee x_n = 1), \qquad \neg(Ax \ge b) = \emptyset$$
(but typically there is no negation for individual inequalities inside $Ax \ge b$). Let us return to the propositional logic proper. We note some obvious basic laws, such as:
Symmetry: $A \wedge B = B \wedge A$, $A \vee B = B \vee A$
Associativity: $A \wedge (B \wedge C) = (A \wedge B) \wedge C$, $A \vee (B \vee C) = (A \vee B) \vee C$
De Morgan Laws: $\neg(A \wedge B) = \neg A \vee \neg B$, $\neg(A \vee B) = \neg A \wedge \neg B$
In these laws, the meaning of the equality is that the left-hand-side (l.h.s.) proposition always has the same truth value (true or false) as does the right-hand-side (r.h.s.), and this holds regardless of the truth values of the unanalyzed propositions $P_j$. These laws are used to 'drive negations inward' until they are against letters ($\neg P_j$) or disappear (as $\neg\neg P_j = P_j$). This subroutine is efficient; in fact, it is linear time. Two other important laws are the:
Distributive laws: (1) $A \wedge (B \vee C) = (A \wedge B) \vee (A \wedge C)$ (2) $A \vee (B \wedge C) = (A \vee B) \wedge (A \vee C)$
After using the De Morgan Laws to put any negations against letters, (2) can be repeatedly employed to reach a conjunctive normal form. We illustrate this general fact by a simple example:
$$\begin{aligned}
P \vee (Q \wedge R) \vee (X \wedge Y) &= [(P \vee Q) \wedge (P \vee R)] \vee (X \wedge Y)\\
&= [(X \wedge Y) \vee (P \vee Q)] \wedge [(X \wedge Y) \vee (P \vee R)]\\
&= [(X \vee P \vee Q) \wedge (Y \vee P \vee Q)] \wedge [(X \vee P \vee R) \wedge (Y \vee P \vee R)]
\end{aligned}$$
This use of distributive laws can, in the 'worst case', require exponential space. To see this, apply it to e.g. $(P_1 \wedge P_2) \vee (P_3 \wedge P_4) \vee \cdots \vee (P_{2n-1} \wedge P_{2n})$, whose CNF obtained in this way has $2^n$ clauses. A corollary of the above use of distributive laws is:
Corollary: Any proposition has an equivalent proposition in conjunctive normal form. The concept of a disjunctive normal form is developed by analogy with the CNF, by interchanging the roles of 'and' and 'or'. Specifically, a conjunctive clause is a conjunction of literals $\bigwedge_{j \in K} \pm P_j$. A disjunctive normal form (DNF) is a disjunction of conjunctions of literals (as e.g. above in $P \vee (Q \wedge R) \vee (X \wedge Y)$). Using the De Morgan Laws and distributivity we have:
Corollary: Every proposition has an equivalent proposition in disjunctive normal form. With this perspective, some of the computational issues in connection with the disjunctive methods can be explained as follows. The unanalyzed MIP is in CNF, while the disjunctive methods require a DNF, and the natural conversion of CNF to DNF requires exponential time and space in the worst case. For this reason, there is a need to analyze substructures where either the conversion is simple or simplified, or where the DNF is the natural formulation. For the same reason, there is a need to be able to use formulations intermediate between the CNF and the DNF, and to provide means of moving stepwise from CNF to DNF in a way which guarantees step-by-step improvements (thus our distributive laws for relaxations in Lecture 4). Glover's polyhedral annexation approach is notable in this perspective, as it allows derivation of cutting-planes directly from a CNF formulation [Glo 1975b], and this can be advantageous.
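The two manipulations of this section, driving negations inward and then distributing $\vee$ over $\wedge$, can be sketched in a few lines. The nested-tuple encoding `("var", name)`, `("not", A)`, `("and", A, B)`, `("or", A, B)` and all function names are our own hypothetical choices. No simplification or subsumption is attempted, so the output shows the raw effect of distributive law (2), including the $2^n$ clauses produced from $(P_1 \wedge P_2) \vee \cdots \vee (P_{2n-1} \wedge P_{2n})$.

```python
from functools import reduce

FLIP = {"and": "or", "or": "and"}


def nnf(f, negate=False):
    """Drive negations inward (De Morgan laws, double negation) in one pass.
    Returns a formula built from ('lit', name, positive), 'and', 'or'."""
    op = f[0]
    if op == "var":
        return ("lit", f[1], not negate)
    if op == "not":
        return nnf(f[1], not negate)
    if op in FLIP:
        new_op = FLIP[op] if negate else op
        return (new_op, nnf(f[1], negate), nnf(f[2], negate))
    raise ValueError(f"unknown connective {op!r}")


def cnf_clauses(f):
    """Clauses (frozensets of (name, positive)) of a CNF equivalent to an NNF
    formula, obtained with distributive law (2): A v (B ^ C) = (A v B) ^ (A v C)."""
    op = f[0]
    if op == "lit":
        return [frozenset([(f[1], f[2])])]
    if op == "and":
        return cnf_clauses(f[1]) + cnf_clauses(f[2])
    # op == "or": pair every clause of the left side with every clause of the right side
    return [c | d for c in cnf_clauses(f[1]) for d in cnf_clauses(f[2])]


def var(name):
    return ("var", name)


def pairs_formula(n):
    """(P1 ^ P2) v (P3 ^ P4) v ... v (P_{2n-1} ^ P_{2n})."""
    terms = [("and", var(f"P{2 * k - 1}"), var(f"P{2 * k}")) for k in range(1, n + 1)]
    return reduce(lambda a, b: ("or", a, b), terms)


if __name__ == "__main__":
    print(nnf(("not", ("and", var("A"), ("not", var("B"))))))
    for n in range(1, 7):
        print(n, len(cnf_clauses(nnf(pairs_formula(n)))))   # grows as 2**n
```

Applied to $P \vee (Q \wedge R) \vee (X \wedge Y)$, `cnf_clauses` returns exactly the four clauses of the worked example above.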
5.2
A ”natural deduction” system for propositional logic
In the propositional logic, the tautologies play a special role: these are the composite propositions $A$ which are true regardless of the truth values assigned to the letters $P_j$. For example, $P_1 \vee \neg P_1$ is a tautology, and so are $A \wedge B \supset B \wedge A$ (by the symmetry laws) and $A \wedge (B \vee C) \supset (A \wedge B) \vee (A \wedge C)$ (by the distributive laws).
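The mechanical tautology test discussed in the text (try every truth valuation of the letters) can be sketched as follows, using a hypothetical nested-tuple encoding with an added `("implies", A, B)` node for $\supset$. It confirms the three examples just given, rejects $A \supset B$, and its running time grows as $2^n$ for $n$ letters.

```python
from itertools import product


def letters(f):
    """Set of propositional letters occurring in the formula f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(letters(g) for g in f[1:]))


def value(f, v):
    """Truth value of f under the valuation v (dict: letter -> bool)."""
    op = f[0]
    if op == "var":
        return v[f[1]]
    if op == "not":
        return not value(f[1], v)
    if op == "and":
        return value(f[1], v) and value(f[2], v)
    if op == "or":
        return value(f[1], v) or value(f[2], v)
    if op == "implies":  # A ⊃ B abbreviates ¬A ∨ B
        return (not value(f[1], v)) or value(f[2], v)
    raise ValueError(f"unknown connective {op!r}")


def is_tautology(f):
    """True iff f is true under all 2**n valuations of its n letters."""
    names = sorted(letters(f))
    return all(value(f, dict(zip(names, bits)))
               for bits in product((False, True), repeat=len(names)))


A, B, C, P = (("var", s) for s in "ABCP")
examples = {
    "P v ~P": ("or", P, ("not", P)),
    "A^B > B^A": ("implies", ("and", A, B), ("and", B, A)),
    "A^(BvC) > (A^B)v(A^C)": ("implies", ("and", A, ("or", B, C)),
                              ("or", ("and", A, B), ("and", A, C))),
    "A > B": ("implies", A, B),
}

if __name__ == "__main__":
    for name, f in examples.items():
        print(name, is_tautology(f))
```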
A mechanical way of testing a proposition $A$ to decide if it is a tautology is to try out all (exponentially many) possible truth values for the letters in $A$, and to see if all of these make $A$ true. This method can be wasteful of computation, as we see from $A \wedge B \supset B \wedge A$, where the form of the proposition makes it a tautology. Many methods have been devised to speed up tautology testing, none of which is known to be faster than exponential time in the worst case. The classical approach of logic to generating (as opposed to testing) exactly the set of tautologies is through various systems of deduction. We favor the natural deduction systems, as they seem closest to human reasoning. Natural deduction is due to Gentzen (see e.g. [Gen 1969]), and the specific system we next present is the propositional logic part of Prawitz' system [Pra 1965]. Prawitz's monograph is now hard to obtain, and the majority of logic texts present other systems, which focus more on characterizing the tautologies by logical axioms and rules than on the "naturality" of the system. Good texts are [Men 1964] and [Shoen 1967]. In the natural deduction systems, there is a special symbol '$\bot$' which stands for 'absurdity.' $\neg A$ abbreviates $A \supset \bot$. (If 'absurdity' is true, then everything is true.) Each propositional connective, except negation, has an introduction rule and an elimination rule.
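$$(\wedge I)\ \ \frac{A \qquad B}{A \wedge B} \qquad\qquad (\wedge E)\ \ \frac{A \wedge B}{A} \quad \text{and} \quad \frac{A \wedge B}{B}$$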
$(A)$ and $(B)$ in the $(\vee E)$ rule, $(A)$ in the $(\supset I)$ rule, as well as $(\neg A)$ in the $(\bot_c)$ rule, above the premiss of a rule, can be discharged by the rule (i.e. no longer count as an 'assumption'). Deductions are in tree form. Natural deduction systems have no axioms. Their theorems are those propositions having proofs with no assumptions (i.e. all top formulae of the proof are discharged somewhere in the proof). We next give some illustrations of proofs in this natural deduction system.
Example 5.2.1:
$$\dfrac{\dfrac{\dfrac{A \wedge B^{1}}{B} \qquad \dfrac{A \wedge B^{1}}{A}}{B \wedge A}}{A \wedge B \supset B \wedge A^{1}}$$
Since all assumptions (i.e. top formulas) are discharged by the bottom line, the bottom line is proven. Let us read this proof line by line. From the leftmost top formula $A \wedge B$ we deduce $B$ by $(\wedge E)$; from the rightmost we deduce $A$, also by $(\wedge E)$. Then we deduce $B \wedge A$ using $(\wedge I)$. Now we note that $B \wedge A$ has been proven from $A \wedge B$; so we deduce $A \wedge B \supset B \wedge A$ using $(\supset I)$, and discharge both top formulae as assumptions. Since $A \wedge B \supset B \wedge A$ has no assumptions, it is a tautology. The proof above certainly does follow humanlike reasoning, if perhaps a little slowly. We next give three more proofs of tautologies, which will illustrate that some "less natural" proofs can also be implemented in natural deduction, and provide some practice with this system. The reader should justify every step in each proof. We use superscripts to mark places in the proofs where assumptions are discharged, with the same number occurring at the discharged formulae.
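Example 5.2.2 (excluded middle):
$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{B^{1}}{B \vee \neg B} \qquad \neg(B \vee \neg B)^{3}}{\bot}}{\neg B^{1}}}{B \vee \neg B} \qquad \neg(B \vee \neg B)^{3}}{\bot}}{B \vee \neg B^{3}}$$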
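Example 5.2.3 (half a De Morgan Law):
$$\dfrac{\dfrac{\neg A \vee \neg B^{4} \qquad \dfrac{\dfrac{\neg A^{3} \qquad \dfrac{A \wedge B^{1}}{A}}{\bot}}{\neg(A \wedge B)^{1}} \qquad \dfrac{\dfrac{\neg B^{3} \qquad \dfrac{A \wedge B^{2}}{B}}{\bot}}{\neg(A \wedge B)^{2}}}{\neg(A \wedge B)^{3}}}{\neg A \vee \neg B \supset \neg(A \wedge B)^{4}}$$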
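Example 5.2.4 (half a Distributive Law):
$$\dfrac{\dfrac{\dfrac{A \wedge (B \vee C)^{2}}{B \vee C} \qquad \dfrac{\dfrac{\dfrac{A \wedge (B \vee C)^{2}}{A} \qquad B^{1}}{A \wedge B}}{(A \wedge B) \vee (A \wedge C)} \qquad \dfrac{\dfrac{\dfrac{A \wedge (B \vee C)^{2}}{A} \qquad C^{1}}{A \wedge C}}{(A \wedge B) \vee (A \wedge C)}}{(A \wedge B) \vee (A \wedge C)^{1}}}{A \wedge (B \vee C) \supset (A \wedge B) \vee (A \wedge C)^{2}}$$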
While the system above possesses a naturality, it can seem cumbersome to use, due to the need to keep track of the tree structure and discharging. While compact notations alleviate this for machine implementation, many authors prefer simpler proof systems which involve some axioms, a rule of deduction, and linear proofs with no discharging. All the different logical systems for propositional logic must be justified by a completeness theorem, i.e. a theorem that they prove exactly the tautologies. Such a result holds in this case as well.
Theorem: The above system proves $B$ iff $B$ is a tautology.
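By the completeness theorem, the provable formulas are exactly the tautologies, so the conclusions of Examples 5.2.1 to 5.2.4 can also be checked semantically by brute force over all truth valuations. The Python sketch below does this check. The function names are ours, not the text's, and the sketch checks truth tables only; it does not search for natural deduction proofs.

```python
from itertools import product


def implies(p, q):
    """Material implication p ⊃ q on Python booleans."""
    return (not p) or q


def is_tautology(formula, n_letters):
    """True iff `formula` (a function of n_letters booleans) is true under
    every one of the 2**n_letters truth valuations."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_letters))


# The conclusions of Examples 5.2.1 to 5.2.4, encoded as Python predicates.
examples = {
    "5.2.1": (lambda a, b: implies(a and b, b and a), 2),
    "5.2.2": (lambda b: b or not b, 1),
    "5.2.3": (lambda a, b: implies((not a) or (not b), not (a and b)), 2),
    "5.2.4": (lambda a, b, c: implies(a and (b or c), (a and b) or (a and c)), 3),
}

for name, (formula, n) in examples.items():
    print(name, is_tautology(formula, n))   # each line ends in True

# For contrast, a formula that is not a tautology: A ∨ B ⊃ A ∧ B.
print(is_tautology(lambda a, b: implies(a or b, a and b), 2))   # False
```

Truth tables grow as $2^n$ in the number of letters, which is one reason the proof systems and the integer programming approach below are of interest.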
5.3
Propositional logic as done by integer programming - 1
A propositional form $B$ is satisfiable iff it is true for at least one truth valuation (equivalently, $\neg B$ is not a tautology, so that $B$ can be true). The satisfiability problem is:
Given: a proposition $B$.
To determine: is $B$ satisfiable?
R. JEROSLOW
Integer programming is oriented more toward satisfiability testing than toward tautology testing, although these are equivalent tasks. There is a 'standard' way of imbedding propositional logic into Euclidean space, so as to deal with it by integer programming techniques. By 'standard', we mean it is found frequently in the technical literature and in textbooks. (In Lecture 9, we shall briefly cite alternative imbeddings, of which there are many with advantages over the standard imbedding.) The standard representation of a disjunctive clause $\bigvee_{j \in K} \pm P_j$ is
$$\sum_{j \in K} x(\pm P_j) \ge 1, \qquad x(P_j) \in \{0, 1\},$$
where $x(\neg P_j)$ is $1 - x(P_j)$. For example, the standard representation of $\neg P_1 \vee P_2 \vee P_3$ is
$$(1 - x(P_1)) + x(P_2) + x(P_3) \ge 1.$$
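The standard representation can be checked mechanically. In the Python sketch below, a literal is encoded as a nonzero integer: +j stands for $P_j$ and -j for $\neg P_j$. This encoding and the helper names are assumptions of the sketch, not notation from the text. The sketch confirms, over all eight 0-1 valuations, that the inequality for $\neg P_1 \vee P_2 \vee P_3$ holds exactly when the clause is true.

```python
from itertools import product


def lhs(clause, x):
    """Left-hand side of the standard representation: x(P_j) for a literal +j,
    1 - x(P_j) for a literal -j."""
    return sum(x[abs(l)] if l > 0 else 1 - x[abs(l)] for l in clause)


def clause_true(clause, x):
    """The clause is true iff at least one of its literals is true under x."""
    return any((x[abs(l)] == 1) == (l > 0) for l in clause)


clause = [-1, 2, 3]            # ¬P1 ∨ P2 ∨ P3
valuations = [dict(zip([1, 2, 3], bits)) for bits in product([0, 1], repeat=3)]
agree = all((lhs(clause, x) >= 1) == clause_true(clause, x) for x in valuations)
print(agree)                   # True
```

The only valuation violating the inequality is $x(P_1) = 1$, $x(P_2) = x(P_3) = 0$, which is also the only valuation falsifying the clause.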
Here the value '1' stands for 'true' and '0' for 'false'. This linear inequality holds exactly when the clause is true. In this manner, a list of disjunctive clauses (i.e. a CNF) becomes a list of (pure) integer programming constraints of equivalent satisfiability. How hard are satisfiability problems to solve when done by integer programming? Our experience with satisfiability problems is confined to randomly generated problems. We have made a fairly extensive search for "real-world" (non-Horn) satisfiability problems, only to find that few of these occur. Theoretical results about randomly generated problems appear to depend very much on the probability distribution chosen for problem generation and on the solution method used. For example, one version of satisfiability problems is very easy to solve by simple heuristics, when satisfiable [Fra Ho 1986], at least with a diminishing probability of error, while other versions are intractable by a moderately skillful exact method (private communication from V. Chvatal). In our experience, randomly generated satisfiability problems are easy to solve by a standard MIP code (APEX IV) with no adaptations or special features. Our random generation methods are described in detail in [Low 1984]. Briefly summarized, for pure satisfiability problems, after fixing a clause length and a number of clauses and letters, each clause is filled by drawing equiprobably from the letters without replacement. The sign of each letter is then chosen either at random in each occurrence (first method), or to be opposite to the sign of its previous occurrence (second method), with the sign of the first occurrence chosen at random.
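The two generation schemes can be sketched as follows. The function name and its parameters are hypothetical. We also assume that "previous occurrence" means the previous occurrence of the same letter; [Low 1984] remains the authoritative description. Letters within a clause are drawn without replacement using `random.sample`.

```python
import random


def random_cnf(n_letters, n_clauses, clause_len, method="first", seed=None):
    """Hypothetical generator following the description above.
    Each clause draws clause_len distinct letters equiprobably (without
    replacement). method="first": every occurrence gets a random sign.
    method="second": an occurrence gets the sign opposite to the previous
    occurrence of the same letter; a letter's first occurrence is random.
    Literals are ints: +j for P_j, -j for ¬P_j."""
    if method not in ("first", "second"):
        raise ValueError("method must be 'first' or 'second'")
    rng = random.Random(seed)
    last_sign = {}                     # letter -> sign of its latest occurrence
    clauses = []
    for _ in range(n_clauses):
        clause = []
        for j in rng.sample(range(1, n_letters + 1), clause_len):
            if method == "second" and j in last_sign:
                sign = -last_sign[j]
            else:
                sign = rng.choice((1, -1))
            last_sign[j] = sign
            clause.append(sign * j)
        clauses.append(clause)
    return clauses


instance = random_cnf(100, 400, 3, method="second", seed=1)
print(len(instance), instance[0])
```

Under the second method, each letter alternates in sign across its occurrences, so every letter that occurs at least twice appears both positively and negatively.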
In some problems, very fast heuristics were used to prescreen the problems and to leave a part which was "hard for humans." These problems are the ones in which either the number of clauses or the number of letters fails to be a multiple of five. In other problems, the prescreening was not done. It seemed not to make any difference to the computer, as was also the case with the "boundary" between NP-complete and polynomial-time cases (i.e. clause size s = 3 versus s = 2). We used three MIP codes: BANDBX supplied by Clarence H. Martin (actually an IP code), a code from the book by Land and Powell, and APEX IV. We used APEX IV after it arrived at our campus. We summarize some of our results in the next three tables. In Tables I and II, L is the number of letters, C the number of clauses, and s the number of literals per clause.

PROBLEM SIZE        SATISFIABLE?   NODES   TIME (CPU secs)
                                           LP       TOTAL
L=31, C=44, s=2     NO             2       4.0      6.5
L=35, C=45, s=2     NO             3       4.7      9.8
L=37, C=45, s=2     YES            4       4.7      10.8
L=36, C=52, s=2     NO             3       6.0      14.4
L=46, C=63, s=2     NO             3       8.7      21.3
L=53, C=68, s=2     YES            3       10.1     21.5
L=36, C=39, s=3     YES            1       3.9      4.1
L=38, C=45, s=3     YES            2       5.5      6.5
L=43, C=45, s=4     YES            1       5.2      5.4
L=40, C=45, s=4     YES            2       5.3      7.0
L=25, C=35, s=5     YES            1       3.5      3.6
Table I: SATISFIABILITY TESTS USING BANDBX
PROBLEM SIZE        SATISFIABLE?   NODES   TIME (CPU secs)
                                           LP       TOTAL
L=38, C=40, s=2     YES            1       3.4      3.5
L=42, C=40, s=2     YES            1       3.6      3.7
L=44, C=45, s=2     YES            2       4.8      6.5
L=45, C=45, s=2     YES            5       4.7      13.0
L=44, C=52, s=2     YES            1       5.5      5.6
L=43, C=55, s=3     NO             3       6.3      13.7
L=45, C=60, s=3     NO             3       7.6      13.4
L=45, C=60, s=4     NO             3       7.3      15.5
Table II: SATISFIABILITY TESTS USING LAND AND POWELL'S CODE
CLAUSES   LITERALS   CONSISTENT?   NODES   TIME (APEX)
                                           LP       TOTAL
300       160        YES           1       1.4      1.8
300       120        YES           1       1.0      1.4
300       100        YES           1       1.3      1.7
400       100        YES           1       0.9      1.3
400       60         YES           1       1.1      1.4
500       60         YES           2       1.8      2.9
500       50         YES           1       1.2      1.6
600       60         YES           5       4.2      35.3
Table III: TESTS USING APEX IV
We were able to create a hard problem by selecting certain letters in a satisfiability problem and fixing these to truth values. When "too many" were fixed, the problem was inconsistent. As we gradually reduced the number of letters fixed, the problem moved toward LP feasibility, and the highest run times occurred just at the point where feasibility began. The original satisfiability problem which we modified had 400 clauses, 100 letters, and three literals per clause. The data are in Table IV. We believe that the "incumbent finding" feature of branch-and-bound, which is present in MIP approaches to satisfiability but not in traditional logic approaches, was crucial to our favorable run times on consistent problems. It remains open whether incumbent finding by linear programming can be replaced by a faster routine based on list processing. I would conjecture that our run times can be improved by two orders of magnitude via specialized codes.
TOTAL FIXED   CONSISTENT?   NODES   TIME (APEX UNITS)
                                    LP        TOTAL
38            NO            1       INFEAS    11.3
20            NO            1       INFEAS    40.9
18            NO            1       INFEAS    71.0
16            NO            1       INFEAS    62.4
14            NO            1       INFEAS    84.1
12            NO            13      88.7      266.8
10            NO            16      77.0      299.8
8             YES           13      75.6      170.2
Table IV: CREATING A HARD PROBLEM
5.4
Clausal chaining: a subroutine
We shall be studying one of the most effective logic-based algorithms for satisfiability testing, the algorithm of Davis and Putnam [Dav Put 1960] in the form treated in [Lov 1978], which we call DPL. DPL is very closely related to MIP algorithms, at least when these utilize the standard representation of disjunctive clauses. To learn DPL, we first learn its most important subroutine, which we call "clausal chaining" (CC) and which is also called "unit resolution" [Lov 1978]. Here is a description of clausal chaining:
Given: A list of disjunctive clauses.
First: Delete any clause which contains both a $P_j$ and $\neg P_j$. Go to Repeat.
Repeat: Look for unit clauses (i.e. one-literal clauses). If there are none, stop. If there is a unit clause $\pm P_j$ which has been made false, declare the problem inconsistent and stop; similarly if both $P_j$ and $\neg P_j$ are unit clauses. Otherwise, make $\pm P_j$ true; delete the clauses in which $\pm P_j$ occurs; delete $\mp P_j$ from any clause in which it occurs. Go to Repeat.
Diagnosis: If the list of clauses is empty when clausal chaining stops, the original list was consistent. If CC declares inconsistency, it is correct. If neither case holds, we don't know.
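The procedure and its diagnosis can be sketched in Python. Literals are encoded as +j for $P_j$ and -j for $\neg P_j$, and the function name is ours. The sketch starts with no letters pre-set, so the "made false" test reduces to the check for complementary unit clauses. It also rescans the list on every pass, so it runs in quadratic rather than linear time.

```python
def clausal_chaining(clauses):
    """Clausal chaining (unit resolution) as described above.
    Literals are nonzero ints: +j stands for P_j and -j for ¬P_j.
    Returns (diagnosis, settings, remaining_clauses)."""
    # First: delete every clause containing both P_j and ¬P_j.
    work = [set(c) for c in clauses if not any(-l in c for l in c)]
    settings = {}                                  # letter j -> True / False
    while True:                                    # Repeat:
        units = [next(iter(c)) for c in work if len(c) == 1]
        if not units:
            break                                  # no unit clauses: stop
        lit = units[0]
        if -lit in units:                          # both P_j and ¬P_j are units
            return "inconsistent", settings, work
        settings[abs(lit)] = lit > 0               # make the literal true
        # drop clauses containing lit; delete the opposite literal elsewhere
        work = [c - {-lit} for c in work if lit not in c]
    diagnosis = "consistent" if not work else "unknown"
    return diagnosis, settings, work


print(clausal_chaining([[1], [-1, 2], [-2, 3]]))              # consistent; P1, P2, P3 true
print(clausal_chaining([[1], [-1, 2], [-2]]))                 # inconsistent
print(clausal_chaining([[1, 2], [-1, 2], [1, -2], [-1, -2]])) # unknown: no unit clauses
```

The third call shows why CC alone is not a satisfiability tester. That list is inconsistent, yet CC stops at once with the diagnosis "unknown".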
Clausal chaining is not a satisfiability tester, since it can stop with only non-unit clauses left even though the problem is inconsistent. However, for certain distributions of problem instances, it is very powerful when combined with some trivial tests (see [Fra Ho 1986]). Clausal chaining can be implemented in linear time by using the proper data structures, adapted from [Dow Gal 1984]. We next give an example of CC. Note that the first step is to remove the fifth clause, where both $P_4$ and $\neg P_4$ occur.
Example
[Figure: clausal chaining trace ending in the empty list (satisfiable), with settings P1 → F, P5 → F, P2 → T, P3 → T]
Clausal chaining is a special instance of resolution, in which one of the clauses is a unit.
Resolution is the following rule of logic:
$$\dfrac{A \vee P_j \qquad \neg P_j \vee B}{A \vee B}$$
In the above, $A$ and $B$ denote the remainders of the two disjunctive clauses. For CC, either $A$ or $B$ is empty.
Lemma: All truth settings and inconsistencies which are discovered by CC are also discovered by the linear relaxation (LR) of the standard formulation.
Proof: If both $P_j$ and $\neg P_j$ occur in a clause, the standard formulation has
$$\cdots + x(P_j) + \cdots + (1 - x(P_j)) + \cdots \ge 1.$$
Due to cancellation, this constraint is always satisfied in the LR. Suppose there is a unit clause $P_j$. Then the standard formulation contains
$$x(P_j) \ge 1.$$
So if we have already set $x(P_j) = 0$, the LR is inconsistent. If $\neg P_j$ is also a unit clause, we have
$$1 - x(P_j) \ge 1, \quad \text{i.e.} \quad x(P_j) \le 0,$$
so again the LR is inconsistent. Otherwise the LR sets $x(P_j) = 1$, so all clauses in which $P_j$ occurs are satisfied in the LR. Any clause in which $\neg P_j$ occurs, e.g. $\neg P_j \vee B$, gives
$$(1 - x(P_j)) + x(B) \ge 1,$$
where $x(B)$ denotes the sum of the terms for the literals of $B$. In the LR it is therefore equivalent to $x(B) \ge 1$, the standard representation of the shortened clause $B$. This analysis is then applied inductively on the number of steps in CC.
Q.E.D. Conclusion: [Bla Jer Low 1985] The truth settings and inconsistency diagnoses of clausal chaining are exactly those of the linear relaxation of the standard formulation.
Proof: We need consider only the case in which CC terminates with a nonempty list and no inconsistency diagnosis.
In this case, all clauses left have two or more literals. Set $x(P_j) = \tfrac{1}{2}$ for all unset $P_j$. Each remaining clause then has a left-hand side of at least $2 \cdot \tfrac{1}{2} = 1$, so the LR is satisfied. Thus there are no more truth settings to find and no inconsistency in the LR.
Q.E.D. At first the conclusion above seems to state that linear programming and clausal chaining are equivalent. However, that equivalence is restricted to inconsistency diagnoses and to variables fixed at "true" or "false." By chance, the linear program may find a satisfying truth valuation involving many variables which are not fixed in value (incumbent finding). This "chance" event happens more frequently than one expects, particularly when linear programs are solved repeatedly in branch-and-bound. Resolution with non-unit clauses can go beyond the linear relaxation of the standard formulation. For example:
$$\dfrac{P_2 \vee \neg P_1 \qquad P_1 \vee P_2}{P_2}$$
In the LR, however, the two clauses give $x(P_2) + (1 - x(P_1)) \ge 1$ and $x(P_1) + x(P_2) \ge 1$. Their sum is $2x(P_2) \ge 1$, i.e. $x(P_2) \ge \tfrac{1}{2}$, so $P_2$ is only "half true".
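The gap can be seen numerically. The sketch below resolves the two clauses. It then checks that the fractional point $x(P_1) = x(P_2) = \tfrac{1}{2}$ satisfies both LR constraints but not the resolvent's constraint $x(P_2) \ge 1$. The +j/-j encoding and the helper names are our assumptions. Exact arithmetic via `fractions.Fraction` avoids floating-point doubts.

```python
from fractions import Fraction


def lhs(clause, x):
    """Left-hand side of the standard representation (x(P_j) or 1 - x(P_j))."""
    return sum(x[abs(l)] if l > 0 else 1 - x[abs(l)] for l in clause)


def resolve(c1, c2, j):
    """Resolvent of c1 (containing P_j) and c2 (containing ¬P_j) on letter j."""
    if j not in c1 or -j not in c2:
        raise ValueError("c1 must contain P_j and c2 must contain ¬P_j")
    return sorted((set(c1) - {j}) | (set(c2) - {-j}))


half = Fraction(1, 2)
x = {1: half, 2: half}               # a fractional point of the LR
c_a, c_b = [2, -1], [1, 2]           # P2 ∨ ¬P1 and P1 ∨ P2
r = resolve(c_b, c_a, 1)             # resolve on P1: the unit clause P2

print(r)                                    # [2]
print(lhs(c_a, x) >= 1, lhs(c_b, x) >= 1)   # True True: x satisfies both LR rows
print(lhs(r, x) >= 1)                       # False: the LR does not force x(P2) = 1
```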
We now turn to the "Horn clauses", which are a restricted form of disjunctive clauses, specifically, those with zero or one positive literals. For predicate logic, these have played a central role both in expert systems and in the PROLOG language. We present Horn clauses in their forms as implications. An implication is Horn if all its hypotheses are (unnegated) letters and it has zero or one conclusion, which is also an (unnegated) letter:
$$H_1, \ldots, H_k \supset H, \quad \text{i.e.} \quad \neg H_1 \vee \cdots \vee \neg H_k \vee H, \qquad \text{or} \qquad H_1, \ldots, H_k \supset, \quad \text{i.e.} \quad \neg H_1 \vee \cdots \vee \neg H_k.$$
This is a restricted format for representing knowledge. Here is one difference in scope between some important artificial intelligence methods and MIP models. The Horn clause format requires that some definite single conclusion follow from a consistent set of positive facts. In MIP models, by contrast, statistical data may justify locating a warehouse in a certain metropolitan area while still leaving alternatives as to its nature and size. Much is gained algorithmically by restricting attention to Horn clauses, as we see in the next results. Note first that a unit resolution done on Horn clauses results in a (shorter) Horn clause.
Conclusion: [Bla Jer Low 1985] A set of Horn clauses is inconsistent iff clausal chaining finds an inconsistency iff the linear relaxation of the standard formulation is empty.
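Proof: We need consider only the case that CC terminates with no inconsistency and a nonempty list of Horn clauses of size two or more. Just make all unset letters false. Each remaining clause has at least two literals and at most one positive literal, so it contains a negative literal, and these remaining clauses are satisfied.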
Q.E.D. From the above and Khachian's result [Kha 1979] that linear programming is polynomial time, we see that Horn clause consistency testing is polynomial time. Actually, by the result of Dowling and Gallier, it is linear time. The above results do not generalize to non-Horn clauses. For example, the following clauses are Horn except for the first, and the list is inconsistent:
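$$P_1 \vee P_2, \qquad \neg P_1 \vee P_2, \qquad P_1 \vee \neg P_2, \qquad \neg P_1 \vee \neg P_2$$
Clausal chaining takes no action (no unit clauses occur), so it does not detect inconsistency. Resolution does detect inconsistency:
$$\dfrac{\dfrac{\neg P_1 \vee P_2 \qquad P_1 \vee P_2}{P_2} \qquad \dfrac{P_1 \vee \neg P_2 \qquad \neg P_1 \vee \neg P_2}{\neg P_2}}{\Lambda}$$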
Similarly, branching on $x(P_1)$ in IP would detect inconsistency in both branches. A Horn+ clause is a Horn clause
with a nonempty conclusion H. Note that a set of Horn+ clauses is always consistent (make all letters "true").
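Horn consistency can be tested by propagation that handles each clause a bounded number of times. The Python sketch below keeps a counter of not-yet-true hypotheses for each clause. It is written in the spirit of the linear-time method of Dowling and Gallier, but it is our own illustration, not a transcription of their algorithm. Each clause is a pair (hypotheses, conclusion), with conclusion `None` for $H_1, \ldots, H_k \supset$. Letters not returned are set false, which matches the "make all unset letters false" step in the proof above.

```python
from collections import deque


def horn_sat(clauses):
    """Consistency test for Horn clauses given as (hyps, concl) pairs, with
    concl None for H1,...,Hk ⊃ . Returns the set of letters that must be
    true (all other letters are set false), or None if inconsistent."""
    waiting = []                          # per clause: hypotheses not yet true
    watchers = {}                         # letter -> clauses having it as a hypothesis
    for i, (hyps, _) in enumerate(clauses):
        hs = set(hyps)
        waiting.append(len(hs))
        for h in hs:
            watchers.setdefault(h, []).append(i)
    true = set()
    ready = deque(i for i, n in enumerate(waiting) if n == 0)
    while ready:
        concl = clauses[ready.popleft()][1]
        if concl is None:                 # every hypothesis of H1,...,Hk ⊃ is true
            return None
        if concl in true:
            continue
        true.add(concl)                   # each letter is added at most once,
        for k in watchers.get(concl, []): # so each watcher list is scanned once
            waiting[k] -= 1
            if waiting[k] == 0:
                ready.append(k)
    return true


print(sorted(horn_sat([([], "A"), (["A"], "B"), (["B", "C"], "D")])))   # ['A', 'B']
print(horn_sat([([], "A"), (["A"], "B"), (["B"], None)]))               # None
```

If every clause has a conclusion (Horn+), the test can never return `None`, in line with the remark that a set of Horn+ clauses is always consistent.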
5.5
Some properties of frequently-used algorithms of expert systems
Two algorithms which are frequently used in expert systems are "forward chaining" and "backward chaining". Sometimes they are used together, and with other algorithms. Here we shall define them, illustrate them, and relate them conceptually to clausal chaining. Forward chaining consists of evaluating all positive unit clauses $P_j$ as "true", all negative unit clauses $\neg P_j$ as "false", and then, inductively:
If $H_1, \ldots, H_k$ have been set "true", then
if $H_1, \ldots, H_k \supset H$ is among the rules, set $H$ "true";
if $H_1, \ldots, H_k \supset$ is among the rules, declare inconsistency and stop.
This process of using a rule is called "firing" the rule, and the overall process of forward chaining is sometimes called "the chase." Forward chaining is implied by clausal chaining, as we illustrate:
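$$\dfrac{H_1 \qquad \neg H_1 \vee \neg H_2 \vee \cdots \vee \neg H_k \vee H}{\neg H_2 \vee \cdots \vee \neg H_k \vee H} \qquad \cdots \qquad \dfrac{H_k \qquad \neg H_k \vee H}{H}$$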
Remark: For Horn+ clauses, the truth settings achieved by FC and CC are the same. Proof: For Horn+ clauses, initially all unit clauses $P_j$ are set "true." Inductively, the only truth settings possible are "true." All rules fired by FC are true for the valuation found by FC. Consider a rule not fired by FC:
$$H_1, \ldots, H_k \supset H.$$
At least one of the $H_i$ has not been given a truth value. (Possibly $H$ has the truth value "true".) All unvalued letters can be set "true", or all can be set "false", and either way all rules become "true." Thus there are no more settings to find, so CC also has no more settings.
Q.E.D. Such a result does not hold for Horn clauses, e.g.:
$$\neg H, \ \text{i.e.}\ H \supset; \qquad \neg G \vee H, \ \text{i.e.}\ G \supset H.$$
FC puts no value on $G$, but CC sets $G$ "false." Thus, CC is more powerful than FC. However, the following holds:
Remark: For Horn clauses, the consistency diagnoses of FC and CC agree. Proof: Without loss of generality, FC finds no inconsistency. Consider an unfired rule: $H_1, \ldots, H_k \supset H$ or $H_1, \ldots, H_k \supset$.
At least one $H_i$ has not been set "true." (It may have been set "false".) Any such $H_i$ that has not already been set "false" can be set "false", and then the rule is "true." Hence the list of Horn clauses is consistent.
Q.E.D.
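The difference between FC and CC on the example $\neg H$, $G \supset H$ can be reproduced with a small Python sketch of forward chaining. The rule encoding and the function name are our assumptions. A positive unit clause $P$ is written `([], "P")` and a negative unit clause $\neg P$ is written `(["P"], None)`. A brute-force enumeration then confirms that every model makes $G$ false, which is the setting CC finds.

```python
from itertools import product


def forward_chaining(rules):
    """Forward chaining as described above. Each rule is (hyps, concl);
    concl None encodes H1,...,Hk ⊃ . A positive unit clause P is ([], "P")
    and a negative unit clause ¬P is (["P"], None)."""
    value = {}
    for hyps, concl in rules:              # evaluate the unit clauses first
        if not hyps and concl is not None:
            value[concl] = True
        elif len(hyps) == 1 and concl is None:
            value.setdefault(hyps[0], False)
    fired, changed = set(), True
    while changed:
        changed = False
        for i, (hyps, concl) in enumerate(rules):
            if i in fired or not all(value.get(h) is True for h in hyps):
                continue
            fired.add(i)                   # "fire" the rule
            if concl is None:
                return "inconsistent", value
            if value.get(concl) is not True:
                value[concl] = True
                changed = True
    return "no inconsistency found", value


rules = [(["H"], None), (["G"], "H")]      # ¬H (i.e. H ⊃) and G ⊃ H
print(forward_chaining(rules))             # ('no inconsistency found', {'H': False})

# Every model of the two clauses makes G false, so CC's setting is forced.
models = [(g, h) for g, h in product([False, True], repeat=2)
          if (not h) and ((not g) or h)]
print(models)                              # [(False, False)]
```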
On non-Horn clauses, FC is much weaker than CC, e.g.:
CC will find this list of clauses inconsistent, but FC has no provision to handle such non-Horn clauses (recall that only Horn clauses are permitted in expert systems). We now turn to a discussion of backward chaining. This algorithm is based on the humanlike procedure of reducing a task to its subtasks and trying to accomplish these subtasks, either directly or by a further reduction. As there may be several ways of accomplishing the task, alternative lists of subtasks can arise, which lead to a complex structure to represent the overall reduction process. These structures are the and/or trees, and they can require significant computation. We describe backward chaining largely by illustration. Backward chaining (BC) works as follows. Given a "goal" or "subgoal" of finding $H$ true, when $H$ is not known to be true, all rules concluding $H$ are retrieved from the rule base, e.g.:
H I , H7, HQ 3 H
rule 17
H3,Hs 3 H
rule23
Ha,H7,Hla 3 H
rule 108
At least one set of premises must be found true, which sets up subgoals in an and/or tree (see Figure 18). The double arcs in Figure 18 indicate "and." The process is then iterated with each subgoal viewed as a goal. The number of subgoal nodes of the tree cannot exceed the total number of conditions in all rules (note the two occurrences of $H_7$ above). Loops can occur, e.g. caused by a rule such as $H \supset H_i$ with $H_i$ one of the premises above; such loops can be ignored during BC. Backward chaining has a serious deficiency: it can fail to diagnose an inconsistency on Horn clauses, e.g.:
Goal: $G$
Figure 18: An And/or Tree

Rules concluding the goal: several (lots of computation!), which eventually are found to imply $G$.
Some other rules: $G \supset H$, $H \supset$.
These are not activated by BC, but $\neg G$ is implied. To avoid running into such difficulties, BC should be restricted in use to Horn+ clauses. Given the problematic nature of both FC and BC, as well as the significant computation time of BC, I do not see reasons to use these algorithms when an excellent linear-time algorithm for Horn clauses can be adapted to linear-time unit resolution. These linear-time algorithms also come in "forward" and "backward" versions, but not problematic ones. Beyond being linear time, they are exceptionally efficient. For those who are concerned with the study of cognitive processing, a heuristic method used by human problem solvers is of interest in its own right. However, toward the goal of decision support, the humanlike nature of a heuristic is not a sufficient recommendation. Humans are not always optimally efficient or even free of error. However, consideration of humanlike heuristics can be a useful starting point for an exact analysis and for experimental evaluation. In point of fact, the linear-time algorithms of Dowling and Gallier can be viewed as an approach to making forward and backward chaining both efficient
and complete for Horn clauses. Humanlike heuristics ought therefore to be a subject for further analysis. When the "rule base" (i.e. set of rules) of an expert system naturally partitions into parts, with different parts solely relevant to different queries, then there is a basis for a potentially less-than-linear-time procedure. In general, one cannot hope for less execution time than that needed simply to read all the rules. Without very special structural features, one cannot hope for a heuristic to quickly find just those few rules which pertain to answering a query; typically, there are more than a few.
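The and/or search and its blind spot can be sketched in a few lines of Python. The rule encoding and the function name are our assumptions. The example has the same shape as the "Goal: $G$" illustration above, with a single hypothetical fact $A$ and rule $A \supset G$ standing in for the "several" rules that eventually imply $G$.

```python
def backward_chain(goal, rules, facts, path=frozenset()):
    """Backward chaining over rules (hyps, concl): the goal holds if it is a
    known fact, or if some rule concluding it (OR) has all of its premises
    (AND) established recursively. A goal already on the current path is
    treated as failed, so loops are ignored. There is no memoisation, so a
    repeated subgoal is re-examined each time it appears."""
    if goal in facts:
        return True
    if goal in path:
        return False
    path = path | {goal}
    return any(
        concl == goal and all(backward_chain(h, rules, facts, path) for h in hyps)
        for hyps, concl in rules
    )


# BC establishes G from the fact A ...
rules = [(["A"], "G"), (["G"], "H"), (["H"], None)]   # A ⊃ G, G ⊃ H, H ⊃
print(backward_chain("G", rules, {"A"}))               # True
# ... but never consults G ⊃ H and H ⊃, which together imply ¬G, so the rules
# plus the fact A are inconsistent (forward chaining would set H true and
# then fire H ⊃).

# A loop G ⊃ X, X ⊃ G with no facts terminates and fails.
print(backward_chain("G", [(["X"], "G"), (["G"], "X")], set()))   # False
```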
5.6
The Davis-Putnam Algorithm in Two Forms
To complete our presentation of DPL, we need to give its two remaining subroutines: monotone variable fixing (MVF) and splitting. MVF is used after CC stops without having determined the consistency of the list given. Any $P_j$ appearing only as $P_j$ is set true. Any $P_j$ appearing only as $\neg P_j$ is set false. MVF cannot change the consistency of the list. It is not done by math programming algorithms, although it can be validly added as a subroutine for consistency testing. MVF is not valid in general if there is a nonzero objective function. Splitting can be done by either of two subroutines: resolution (the original method) or branching (the second method). Splitting is done after MVF, when the consistency of the list has still not been determined. Any remaining $P_j$ must occur both as $P_j$ and as $\neg P_j$. The situation is:
clauses containing $P_j$: $P_j \vee R_1, \ldots, P_j \vee R_a$
clauses containing $\neg P_j$: $\neg P_j \vee S_1, \ldots, \neg P_j \vee S_b$
no occurrences of $P_j$: $T_1, \ldots, T_c$
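Splitting via resolution
Replace the clauses above by
$$R_i \vee S_k \ (1 \le i \le a,\ 1 \le k \le b), \qquad T_1, \ldots, T_c.$$
This list of clauses is actually equivalent (for satisfiability) to the given list. To see this, the reader should first prove the following lemma.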
Lemma: Any truth valuation making all $R_i \vee S_k$ true either makes all $R_i$ true or makes all $S_k$ true.
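Resolution splitting eliminates one letter at a time. The Python sketch below builds the clause list $R_i \vee S_k$, $T_1, \ldots, T_c$. The encoding and the function name are ours, and dropping tautologous resolvents is a choice of the sketch. Applied twice to the inconsistent four-clause example given earlier for Horn clauses, it ends with the empty clause.

```python
def eliminate(clauses, j):
    """Davis-Putnam splitting via resolution on letter j. Clauses are
    iterables of nonzero ints (+j for P_j, -j for ¬P_j); no clause is assumed
    to contain both j and -j. Returns the T's together with every resolvent
    R_i ∨ S_k; resolvents containing a complementary pair are always true
    and are dropped. An empty clause in the result signals inconsistency."""
    R = [set(c) - {j} for c in clauses if j in c]
    S = [set(c) - {-j} for c in clauses if -j in c]
    T = [set(c) for c in clauses if j not in c and -j not in c]
    resolvents = []
    for r in R:
        for s in S:
            rs = r | s
            if not any(-l in rs for l in rs):
                resolvents.append(rs)
    return T + resolvents


step1 = eliminate([[1, 2], [-1, 2], [1, -2], [-1, -2]], 1)
print(step1)                       # [{2}, {-2}]
print(eliminate(step1, 2))         # [set()]: the empty clause, so inconsistent
```

Since there can be $a \cdot b$ resolvents, the list may grow with each elimination. This growth is one reason branching is usually preferred.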
Splitting via branching
Create two subproblems:
"$P_j$ false": $R_1, \ldots, R_a, T_1, \ldots, T_c$
"$P_j$ true": $S_1, \ldots, S_b, T_1, \ldots, T_c$
The original problem is consistent iff at least one of the two subproblems is consistent. In Lecture 9, we shall prove a general result which implies that branching is superior to resolution in terms of the linear relaxation, and hence also in terms of unit resolution (it is also usually superior in terms of total size). We summarize the relations between DPL and branch-and-bound (for the standard formulation) with these intuitive equations:
DPL = CC + MVF + Branching
Since CC = LP - Incumbent Finding, we have:
DPL = BB (standard formulation of clauses) + MVF - Incumbent Finding
As noted earlier, MVF can be added to BB. We return to DPL in Lecture 9, where we use it to assist in MIP, just as here we have used MIP to assist in satisfiability testing.
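Putting the pieces together, a compact recursive DPL (CC, then MVF, then branching) can be sketched as below. The encoding and the function names are our assumptions. Splitting is always on the first letter of the first remaining clause, which is also a choice of the sketch. MVF is included because the sketch only tests consistency, with no objective function.

```python
def assign(clauses, lit):
    """Make lit true: drop the clauses it satisfies, delete -lit elsewhere."""
    return [c - {-lit} for c in clauses if lit not in c]


def dpl(clauses, settings=None):
    """DPL sketch: clausal chaining, then monotone variable fixing, then
    branching. Literals are nonzero ints (+j for P_j, -j for ¬P_j).
    Returns truth settings satisfying every clause (unlisted letters are
    free) or None when the list is inconsistent."""
    settings = dict(settings or {})
    clauses = [set(c) for c in clauses if not any(-l in c for l in c)]
    while True:
        if any(not c for c in clauses):
            return None                          # some clause has been falsified
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if units:                                # CC: satisfy a unit clause
            lit = units[0]
        else:                                    # MVF: a letter of one sign only
            lits = {l for c in clauses for l in c}
            pure = [l for l in lits if -l not in lits]
            if not pure:
                break
            lit = pure[0]
        settings[abs(lit)] = lit > 0
        clauses = assign(clauses, lit)
    if not clauses:
        return settings
    j = abs(next(iter(clauses[0])))              # split on some remaining letter
    for lit in (j, -j):                          # subproblems "P_j true", "P_j false"
        result = dpl(assign(clauses, lit), {**settings, j: lit > 0})
        if result is not None:
            return result
    return None


print(dpl([[1, 2], [-1, 2], [1, -2], [-1, -2]]))   # None: inconsistent
print(dpl([[1, 2, 3], [-1, -2], [-2, -3], [2, -3]]))
```

On the first list, CC and MVF take no action, and both branches on $P_1$ end in a falsified clause, just as branching in IP does.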
5.7
Some recent developments (December 1987)
When we performed the earlier testing reported in Section 5.3, we made a few informal attempts to contact other researchers who had experience with satisfiability algorithms. We were told that most algorithms would simply not be expected to work (in any practical amount of time) on problems of the size routinely solved by APEX.
On the other hand, our belief was that the main contribution of linear programming, for pure satisfiability problems in the standard formulation, was its incumbent-finding feature. If an effective list-processing procedure could be devised to perform incumbent finding, we felt that the comparison could favor DPL. After all, it is very inefficient to carry around bases when all one is doing is list processing. In joint research with Jinchang Wang, we have confirmed our earlier conjecture. By enhancing DPL with a linear-time version of CC and with incumbent finding, we have experienced run times roughly ten times faster than APEX on "easier" problems, and over a hundred times faster on "harder" problems. We report our methods and results in our paper, "Solving Propositional Satisfiability Problems." However, we do not feel that these results are the final story for a comparison of logic-based versus discrete programming-based approaches. There remain issues of alternate imbeddings of logic, and enhancements of logic to include other commonly-occurring constraints. We will discuss these issues in Lecture 9. We feel it is not a question of one approach versus the other, but of utilizing ideas from logic and discrete programming together. In another recent development, Mr. Wang and I have discovered an interesting property of essentially the standard imbedding of logic. We work with the negation variables $\bar{x}(P_j) = 1 - x(P_j)$ and nonnegative criterion functions $\sum_{j=1}^{n} b_j \bar{x}(P_j)$ (with $b = (b_1, \ldots, b_n) \ge 0$). In this setting, one obtains a dual whose solution can be interpreted as describing the structure of Horn clause proofs. The precise interpretation varies with the vector $b \ge 0$ chosen. For $b = e_j$, the $j$-th unit vector, dual optimal solutions can be interpreted as proofs of $P_j$, when $P_j$ is provable.
Dual optimal solutions give near proofs of $P_j$ when $P_j$ is not provable. Here a near proof is a proof structure which would be entirely valid if only exactly one non-versed hypothesis were given as a "fact". (Trivially, $P_j$ is always a near proof of itself, but the interest lies in other alternate near proofs.) If several propositions are viewed as "targets" of proof, it would be appropriate to use as $b \ge 0$ a vector with ones in the coordinates of these targets, and zeros otherwise. An interesting aspect of this approach to Horn clause logic via linear programming is the new features it allows. Changes of the given facts of a situation correspond to changes only in the objective function of the dual program, and the same is true of changes in the rules of reasoning. Thus linear programming postoptimality can be used, and a problem can be restarted from the previously optimal basis. There is no need to rerun it from scratch (or, even worse, to recode the algorithm to add or delete rules of reasoning). The results are reported in our joint paper, "Dynamic Programming, Integral Polyhedra, and Horn Clause Knowledge Bases," where connections to classical topics in Operations Research are also made. The principles we develop and apply for Horn clause logic also apply to obtaining compact and sharp MIP formulations of the problems described in Lecture 4, with regard to variable redefinition, and to formulations of more difficult problems.
5.8
Exercises
All remaining problems 12-15 can now be worked.
LECTURE 6
A PRIMER ON PREDICATE LOGIC
Summary: We introduce and discuss the logic of predicates from an intuitive point of view, with either [Men 1964] or [Shoen 1967] as references which go into detail. We seek here to tie it into its potential uses in problem solving, and to indicate some of the potential obstacles from a theoretical perspective.
6.1
Introduction
The predicate logic concerns models, i.e. relations (predicates) on domains of individual objects. The propositional logic treats isolated assertions like "John is tall." The predicate logic treats assertions such as "John is the father of Susan," where the latter is viewed as Father(John, Susan), i.e. as an instance of the relation Father(x, y) of fatherhood on the domain of all persons. The specific assertion Father(John, Susan) is called a complete instantiation (instance) of Father(x, y), obtained by instantiating "John" for the variable x and "Susan" for the variable y. The predicate logic allows us to concisely state general principles, such as these two:
$$\mathrm{Father}(x, y) \supset \mathrm{Anc}(x, y), \qquad \mathrm{Mother}(x, y) \supset \mathrm{Anc}(x, y).$$
These Horn clauses state that both fathers and mothers are ancestors. Together with the Horn clause
$$\mathrm{Anc}(x, z) \wedge \mathrm{Anc}(z, y) \supset \mathrm{Anc}(x, y),$$
which states that an ancestor of an ancestor is an ancestor, we have completely captured the ancestor relation on the domain of all persons.
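To see what complete instantiation involves, the sketch below works on a tiny, made-up domain of four persons; the names and facts are hypothetical. It fires every ground instance of the three clauses until no new Anc fact appears, which amounts to forward chaining on the instantiated propositional Horn clauses. With $n$ persons, the transitivity clause alone has $n^3$ instances.

```python
from itertools import product


def ancestors(persons, father, mother):
    """Least Anc relation closed under the three Horn clauses, obtained by
    firing their complete instantiations until nothing new is derived."""
    anc = set(father) | set(mother)     # Father(x,y) ⊃ Anc(x,y); Mother(x,y) ⊃ Anc(x,y)
    changed = True
    while changed:
        changed = False
        for x, z, y in product(persons, repeat=3):   # all n**3 ground instances
            if (x, z) in anc and (z, y) in anc and (x, y) not in anc:
                anc.add((x, y))         # Anc(x,z) ∧ Anc(z,y) ⊃ Anc(x,y) fires
                changed = True
    return anc


persons = ["Ann", "Bob", "Cal", "Dee"]          # hypothetical domain
father = {("Bob", "Cal")}
mother = {("Ann", "Bob"), ("Cal", "Dee")}
anc = ancestors(persons, father, mother)
print(sorted(anc))           # 6 pairs, including ('Ann', 'Dee')
print(len(persons) ** 3)     # 64 ground instances of the transitivity clause alone
```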
We can replace the predicate logic knowledge representation above by complete instantiation, as (x, y, z) runs over all triples of persons. However, this turns three Horn clauses into billions of Horn clauses. It is a reduction of predicate to propositional logic, but its value is limited. The use of predicate logic as a language thus significantly increases our ability to express knowledge and general principles. However, by a result of Plaisted cited in [Den Lew 1983], this expressibility is purchased at the cost of a significant jump in computational complexity. Specifically, predicate logic just for Horn clauses is complete for exponential time. Since propositional logic for Horn clauses is linear time, the great increase in time is entirely accounted for by the generally exponential number of complete instantiations. Exponential-time complexity is a new barrier, substantially more serious than NP-completeness. Even if P ≠ NP, NP may be "only slightly nonpolynomial." In contrast, "exponential time" cannot be reduced. The worst-case complexity need not dominate Horn clause predicate logic. Still, it is a "flag" not to be ignored, and it focuses attention on the use of special features of a predicate logic rule base. It marks the deduction problem as probably harder than MIP. Indeed, consider the successful expert system applications for which large segments of the rule base have been published: the bulk of such rule bases involves a good deal of instantiation. These rule bases are close to being Horn-clause propositional logic in which the use of predicates serves to structure the database. The predicate logic is a very useful language for expressing constraints between relations over a set of objects. As a mathematical subject, it comes together with a number of different techniques for carrying out deductions from these constraints and for answering queries regarding objects in the set.
Predicate logic is used as the primary or sole approach in automated theorem proving and in the "logic based" approaches to artificial intelligence. It also finds application as a language within many of the other AI approaches, and therefore is an essential subject for those interested in machine intelligence. As a language, it "meshes" well with modern database methods, particularly relational databases [Ull 1982], [Codd 1972]. However, for use in query processing, further research is needed on streamlined means of answering commonly-occurring queries on large databases. Predicate logic was developed in the period 1910-1940 as a language formalization of logical deduction. Löwenheim, Skolem, Bernays, Hilbert, Gödel, Herbrand and Gentzen all made significant contributions. The subarea of mathematical logic concerned with deduction in formal systems is called proof theory.
6.2
Predicate logic: basic concepts, notation
In the predicate logic, for each integer $n \ge 0$, there is an infinite supply of "predicate symbols" which intuitively represent $n$-ary relations on a set:
$$P^n_1, \; P^n_2, \; P^n_3, \; \ldots$$
For example, with $n = 2$, $P^2_1(x, y)$ may represent the "parent" relation on the set of persons ("x is a parent of y"), and $P^1_1(x)$ may represent the fact that "x is a parent" in a given model. Constants represent individual objects in the set:
$$c_1, \; c_2, \; c_3, \; \ldots$$
In addition to the propositional connectives for constructing more complex relations, there are quantifiers:
$\forall$: universal quantifier, read "for every"
$\exists$: existential quantifier, read "there exists"
Example:
$$(\forall x_1)\,[(\exists x_2)\, P^2_1(c_1, x_2) \supset P^1_1(x_1)]$$
is read: "For every individual $x_1$ in the set, if there is an individual $x_2$ for which $(c_1, x_2)$ is in the relation denoted by $P^2_1$, then $x_1$ is in the set denoted by $P^1_1$." However, this rigorously correct reading is too complex! Instead one would read: "For every $x_1$, if $P^2_1(c_1, x_2)$ for some $x_2$, then also $P^1_1(x_1)$ is true."
Consider the model where $P^2_1(x, y)$ is the parent relation and $P^1_1(x)$ asserts that "x has black hair." There, this formula of the predicate logic asserts: "If $c_1$ has any children, then everyone has black hair." It can be true in the usual model of all persons only if $c_1$ is childless; otherwise, it is certainly false. A sentence like the one above
is either true or false in a given model. A valid sentence is one that is true in all models. It is the predicate logic analogue of a tautology. A satisfiable sentence is one that is true in some model. Above we have described the pure (first order) predicate logic with individual constants. In the applied (first order) predicate logic there are also function symbols. In the second order logic we have quantifiers $\exists T^n$ and $\forall T^n$, with $T$ ranging over $n$-ary relations on $S$. Unless otherwise stated, our discussion concerns only pure (first order) logic with individual constants. As regards validity, the sentence above is not valid: put $S = \{c_1\}$, $P^2_1 = S \times S$, $P^1_1 = \emptyset$, and it is false. It is satisfiable: put $S = \{c_1\}$, $P^2_1 = \emptyset$, $P^1_1 = \emptyset$. The occurrence of the variable $x_1$ in $P^1_1(x_1)$ is called bound by the quantifier occurrence $(\forall x_1)$, which has $x_1$ in its scope, i.e. in
$$(\ast) \qquad (\exists x_2)\, P^2_1(c_1, x_2) \supset P^1_1(x_1)$$
Similarly, the occurrence of $x_2$ in $P^2_1(c_1, x_2)$ is bound by the $(\exists x_2)$. The opposite of bound is free. Thus $x_1$ is free in $(\ast)$. Some occurrences of a variable can be bound, others free. For example, in
$$(\exists x_1)\, P^2_1(c_1, x_1) \wedge P^1_1(x_1)$$
the first occurrence of $x_1$, in $P^2_1$, is bound, while the occurrence in $P^1_1$ is free. Also, in
$$(\exists x_1)\,(P^2_1(c_1, x_1) \wedge P^1_1(x_1))$$
both occurrences of $x_1$ are bound. A sentence is a formula with no free variables. Parameters $a_1, a_2, \ldots$ can be inserted for occurrences of variables. Since we never quantify over parameters, they always occur free. We might view parameters as generic individuals. We shall need the following symbolism:
$F^u_v$ = result of substituting $u$ for $v$ in all its free occurrences in $F$.
To obtain a system for predicate logic, we add these new rules to the earlier ones for propositional logic (see [Pra 1965]):
$$(\forall I)\quad \dfrac{F}{(\forall x)\, F^x_a} \qquad\qquad (\forall E)\quad \dfrac{(\forall x)\, F}{F^t_x} \quad t \text{ a constant or a parameter}$$
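The notions of free occurrence and of the substitution $F^u_v$ are easy to mechanize. The Python sketch below uses a nested-tuple encoding of formulas. It also adopts a naming convention: terms beginning with "x" are variables, "c" constants, and "a" parameters. Both the encoding and the convention are our assumptions, not the text's notation. The sketch reproduces the free/bound analysis of $(\ast)$ given earlier and carries out one instance of $(\forall I)$. It does not check that rule's restriction on assumptions.

```python
def free_vars(f):
    """Free variables of a formula encoded as nested tuples:
    ("atom", name, terms), ("not", g), (op, g, h) with op in
    {"and", "or", "implies"}, and (q, var, g) with q in {"forall", "exists"}.
    By the convention of this sketch, terms starting with "x" are variables,
    "c" constants and "a" parameters."""
    tag = f[0]
    if tag == "atom":
        return {t for t in f[2] if t.startswith("x")}
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or", "implies"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("forall", "exists"):
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown connective {tag!r}")


def substitute(f, u, v):
    """F^u_v: replace the free occurrences of v (a variable or parameter) in f
    by u. Bound variables are not renamed, so u must not be captured by a
    quantifier inside f."""
    tag = f[0]
    if tag == "atom":
        return ("atom", f[1], tuple(u if t == v else t for t in f[2]))
    if tag == "not":
        return ("not", substitute(f[1], u, v))
    if tag in ("and", "or", "implies"):
        return (tag, substitute(f[1], u, v), substitute(f[2], u, v))
    if tag in ("forall", "exists"):
        return f if f[1] == v else (tag, f[1], substitute(f[2], u, v))
    raise ValueError(f"unknown connective {tag!r}")


# (*): (∃x2) P^2_1(c1, x2) ⊃ P^1_1(x1)
star = ("implies", ("exists", "x2", ("atom", "P2_1", ("c1", "x2"))),
        ("atom", "P1_1", ("x1",)))
print(free_vars(star))                        # {'x1'}
print(free_vars(("forall", "x1", star)))      # set(): a sentence

# (∀I): from F = P^1_1(a1) infer (∀x1) F^{x1}_{a1}
F = ("atom", "P1_1", ("a1",))
print(("forall", "x1", substitute(F, "x1", "a1")))
```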
In the predicate logic, there are special restrictions on the use of these rules:
$(\forall I)$: 'a' does not occur in any assumption (i.e. undischarged top formula) on which $F$ depends.
$(\exists E)$: 'a' does not occur in $(\exists x) F$ or in $G$, and 'a' does not occur in any assumption on which (the top occurrence of) $G$ depends, save only for occurrences of $F^a_x$.
The reason for the restriction on $(\forall I)$ is to avoid erroneous "proofs," such as:
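$$\dfrac{\dfrac{[P^1_1(a)]^{1}}{(\forall x_1)\, P^1_1(x_1)}\ \text{(erroneous use of } (\forall I)\text{)}}{P^1_1(a) \supset (\forall x_1)\, P^1_1(x_1)}\,{}^{1}$$
The sole top formula has been discharged by a $(\supset I)$, but the bottom formula is not valid. The bottom formula asserts that, if there is any individual $x_2$ such that $P^1_1(x_2)$ holds, then $P^1_1(x_1)$ holds for all individuals $x_1$. In some models, this is not true.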
The restriction on $(\forall I)$ ensures that no special assumptions have been made regarding 'a', so that a general claim $(\forall x) F^x_a$ can follow from $F$. The restriction on $(\exists E)$ is similarly motivated.
For this logic, Prawitz has established a completeness theorem (by combining his results with those of [Godel 1930] and [Gen 1969]).
Completeness theorem (following Gödel, Gentzen, Prawitz)
The natural deduction system above proves a sentence F if and only if it is valid.
A similar result holds for applied predicate logic.
This completeness theorem is remarkable, since it states that the sentences true in every model, finite or infinite, are exactly those proven in the natural deduction system described. In contrast to propositional logic, however, where one can test for tautologies, there is no algorithm whatever for testing validity. If a sentence is valid, a systematic search will eventually turn up a proof. If not, none will ever show up. However, we cannot tell when to "stop looking."
We use a simple example from [Pra 1965] to illustrate a proof in predicate logic.
Example 6.2.1:
Some other surprising properties of the predicate logic are cited in the next results and example. A set Σ is consistent if absurdity (⊥) cannot be deduced from Σ in predicate logic.
Theorem: (Henkin) Any consistent set of formulas has a model.
Corollary: If Σ is a set of formulas such that every finite subset has a model, then Σ has a model.
Proof: Σ is consistent, since a derivation of ⊥ from Σ would necessarily use only a finite subset of Σ, and that finite subset has a model, so ⊥ cannot be derived from it. Q.E.D.
Example 6.2.2: Σ = axioms for equality + all true sentences of arithmetic on nonnegative integers + ¬(c = 0), ¬(c = 1), ¬(c = 2), etc.
Every finite subset of Σ has a model: the usual one, with 'c' larger than any integer named in the subset of formulas. By the corollary, Σ has a model,
but clearly it is not the "standard" one. Skolem was probably the first to understand how to create nonstandard models (using different techniques than those here). We next turn to the Prenex Laws, used for moving quantifiers within formulas. These will be very useful in Lectures 7 and 8, as well as immediately below. The Prenex Laws are easily verified as valid, and they are:
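Prenex Laws
$(\exists x)P \supset Q \equiv (\forall x)(P \supset Q)$,  $(\forall x)P \supset Q \equiv (\exists x)(P \supset Q)$,  x not free in Q
$P \supset (\exists x)Q \equiv (\exists x)(P \supset Q)$,  $P \supset (\forall x)Q \equiv (\forall x)(P \supset Q)$,  x not free in P
$(\exists x)P \vee Q \equiv (\exists x)(P \vee Q)$,  $(\forall x)P \vee Q \equiv (\forall x)(P \vee Q)$,  x not free in Q
$\neg(\exists x)P \equiv (\forall x)\neg P$,  $\neg(\forall x)P \equiv (\exists x)\neg P$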
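As a sanity check (not a proof of validity, which must cover all models), the laws above can be tested by brute force on small finite models. In this sketch, an illustration rather than anything from the text, P is a unary letter in x and Q a sentence letter (so x is not free in Q); for the laws of the form $P \supset (\ldots)Q$ the roles are swapped. Both sides of each law are compared in every interpretation over domains of size 1 to 4.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def prenex_laws_hold(min_size: int = 1, max_size: int = 4) -> bool:
    """Compare both sides of each Prenex Law in every interpretation of a
    unary letter P(x) and a sentence letter Q over domains {0, ..., n-1}."""
    for n in range(min_size, max_size + 1):
        for P in product([False, True], repeat=n):  # P[d] = truth of P at d
            for Q in (False, True):
                laws = [
                    # (∃x)P ⊃ Q  ≡  (∀x)(P ⊃ Q)
                    (implies(any(P), Q), all(implies(p, Q) for p in P)),
                    # (∀x)P ⊃ Q  ≡  (∃x)(P ⊃ Q)
                    (implies(all(P), Q), any(implies(p, Q) for p in P)),
                    # Q ⊃ (∃x)P  ≡  (∃x)(Q ⊃ P)
                    (implies(Q, any(P)), any(implies(Q, p) for p in P)),
                    # Q ⊃ (∀x)P  ≡  (∀x)(Q ⊃ P)
                    (implies(Q, all(P)), all(implies(Q, p) for p in P)),
                    # (∃x)P ∨ Q  ≡  (∃x)(P ∨ Q)
                    (any(P) or Q, any(p or Q for p in P)),
                    # (∀x)P ∨ Q  ≡  (∀x)(P ∨ Q)
                    (all(P) or Q, all(p or Q for p in P)),
                    # ¬(∃x)P ≡ (∀x)¬P   and   ¬(∀x)P ≡ (∃x)¬P
                    (not any(P), all(not p for p in P)),
                    (not all(P), any(not p for p in P)),
                ]
                if any(lhs != rhs for lhs, rhs in laws):
                    return False
    return True


print(prenex_laws_hold())            # True
print(prenex_laws_hold(min_size=0))  # False: the empty domain breaks some laws
```

Domain size 0 is excluded by default because models of predicate logic have nonempty domains; allowing it makes, for example, $Q \supset (\exists x)P \equiv (\exists x)(Q \supset P)$ fail.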
Using these laws, quantifiers can always be "moved to the front" to obtain an equivalent formula in Prenex Normal Form, with a propositional matrix. We next turn to Skolem's reduction, which from our perspective gives an efficient method that takes a predicate logic sentence into another sentence with these properties:
1. The first sentence is satisfiable exactly if the second is;
2. The second sentence is in Prenex Normal Form, with all occurrences of ∀ preceding all occurrences of ∃.
This result is stated in synopsis below, and we only illustrate it. However, our method in the illustration is general.
Theorem: The satisfiability problem for sentences of pure predicate logic is equivalent to the satisfiability of just the ∀∃ fragment.
Example 6.2.3: Without loss of generality, we can assume a Prenex Form with a propositional matrix:
$(\forall x_1)(\exists x_2)(\forall x_3)(\exists x_4)\, M(x_1, x_2, x_3, x_4)$
This formula is satisfiable exactly if there is a model and a function F (called a "Skolem function") on it with
$(\forall x_1)(\forall x_3)(\exists x_4)\, M(x_1, F(x_1), x_3, x_4)$,
i.e. exactly if there is a model and functions F, G on it with
$(\forall x_1)(\forall x_3)\, M(x_1, F(x_1), x_3, G(x_1, x_3))$.
Let us use F', respectively G', for the graphs of F, resp. G. Then the original formula is satisfiable exactly if there is a model for the corresponding sentences about F' and G', i.e. a model for a ∀∃ sentence.
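One way to write out the sentences about F' and G', and their ∀∃ prenex form, is sketched below. This is a reconstruction under stated assumptions: F' and G' are taken as new binary and ternary predicate letters, and $y_1, y_2$ are fresh variables.

```latex
% F', G' are total, and M holds along them
(\forall x_1)(\exists y_1)\,F'(x_1,y_1)
  \;\wedge\; (\forall x_1)(\forall x_3)(\exists y_2)\,G'(x_1,x_3,y_2)
  \;\wedge\; (\forall x_1)(\forall x_2)(\forall x_3)(\forall x_4)
     \big[F'(x_1,x_2) \wedge G'(x_1,x_3,x_4) \supset M(x_1,x_2,x_3,x_4)\big]

% Prenex Normal Form, with every \forall preceding every \exists
(\forall x_1)(\forall x_2)(\forall x_3)(\forall x_4)(\exists y_1)(\exists y_2)
  \big[F'(x_1,y_1) \wedge G'(x_1,x_3,y_2)
       \wedge (F'(x_1,x_2) \wedge G'(x_1,x_3,x_4) \supset M(x_1,x_2,x_3,x_4))\big]
```

Given a model of this sentence, choosing for each $x_1$ some $y_1$ with $F'(x_1, y_1)$, and for each $x_1, x_3$ some $y_2$ with $G'(x_1, x_3, y_2)$, recovers Skolem functions F and G; conversely, the graphs of F and G satisfy it. F' and G' need not be single-valued for the argument to go through.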
6.3
Applications for problem-solving
Kowalski [Kow 1979] has pioneered the application of predicate logic in problem-solving. In this section, we use illustrations drawn from his book. The methods for predicate logic problem solution are drawn from the resolution-based approaches to theorem proving, as developed initially by Robinson [Rob 1965, 1968], which have led to a large literature (see e.g. [Ble Lov 1983], [Lov 1978]). There are approaches to theorem proving which do not
use resolution (see e.g. [Ble 1974], [Nev 1974]). The resolution-based approach for Horn clause predicate logic is embodied in the PROLOG interpreter first developed by Roussel [Rou 1975]. The resolution-based approach requires CNF ("clausal form"), although some other approaches do not.
Example 6.3.1: We are given
Mother(x, y) ⊃ Parent(x, y),   i.e.   ¬Mother(x, y) ∨ Parent(x, y)
Father(x, y) ⊃ Parent(x, y),   i.e.   ¬Father(x, y) ∨ Parent(x, y)
Parent(x, z) ∧ Parent(z, y) ⊃ Grandparent(x, y),   i.e.   ¬Parent(x, z) ∨ ¬Parent(z, y) ∨ Grandparent(x, y)
plus perhaps other rules. These rules are stated with free variables, as they involve general principles. In addition, we are given a database which contains:
Father(Zeus, Aries)
Father(Aries, Harmonia)
etc., plus a wealth of other data.
Question: Is Zeus a grandparent of Harmonia?
Solution method: Add ¬Grandparent(Zeus, Harmonia) and either obtain a contradiction (if Zeus is a grandparent of Harmonia) or show there is no contradiction. If we were to proceed by complete instantiation, we would substitute constants for free variables in every possible way. If there is a contradiction, this will uncover it, and it reduces the problem to propositional logic. Upon "instantiating" in every possible way, we obtain in particular:
Father(Zeus, Aries) ⊃ Parent(Zeus, Aries)
Father(Aries, Harmonia) ⊃ Parent(Aries, Harmonia)
Parent(Zeus, Aries) ∧ Parent(Aries, Harmonia) ⊃ Grandparent(Zeus, Harmonia)
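Complete instantiation is easy to state as code, which also makes its blow-up visible. In this sketch the encoding of rules as tuples (lowercase arguments are variables) is an assumption made for illustration; with only three constants, the three rules already yield 45 ground instances, of which the refutation needs three.

```python
from itertools import product

# Example 6.3.1's rules as (body atoms, head atom); an atom is a tuple
# (predicate, arg1, arg2, ...), and lowercase arguments are variables.
RULES = [
    ([("Mother", "x", "y")], ("Parent", "x", "y")),
    ([("Father", "x", "y")], ("Parent", "x", "y")),
    ([("Parent", "x", "z"), ("Parent", "z", "y")], ("Grandparent", "x", "y")),
]
CONSTANTS = ["Zeus", "Aries", "Harmonia"]


def is_var(term: str) -> bool:
    return term[0].islower()


def apply(theta: dict, atom: tuple) -> tuple:
    """Replace each variable in atom by its value under the binding theta."""
    return (atom[0],) + tuple(theta.get(t, t) for t in atom[1:])


def complete_instantiation(rules, constants):
    """Substitute constants for the variables of each rule in every possible way."""
    instances = []
    for body, head in rules:
        variables = sorted({t for a in body + [head] for t in a[1:] if is_var(t)})
        for values in product(constants, repeat=len(variables)):
            theta = dict(zip(variables, values))
            instances.append(([apply(theta, a) for a in body], apply(theta, head)))
    return instances


ground = complete_instantiation(RULES, CONSTANTS)
print(len(ground))  # 45 = 3**2 + 3**2 + 3**3
needed = ([("Parent", "Zeus", "Aries"), ("Parent", "Aries", "Harmonia")],
          ("Grandparent", "Zeus", "Harmonia"))
print(needed in ground)  # True
```

The count grows as (number of constants) raised to the number of variables in each rule, so even a moderate database makes this approach impractical.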
Clearly, with these instances ("instantiations") of the rules, we can obtain Grandparent(Zeus, Harmonia). In detail, the resolutions are:
Father(Zeus, Aries) and ¬Father(Zeus, Aries) ∨ Parent(Zeus, Aries) yield Parent(Zeus, Aries)
Father(Aries, Harmonia) and ¬Father(Aries, Harmonia) ∨ Parent(Aries, Harmonia) yield Parent(Aries, Harmonia)
Parent(Zeus, Aries), Parent(Aries, Harmonia) and ¬Parent(Zeus, Aries) ∨ ¬Parent(Aries, Harmonia) ∨ Grandparent(Zeus, Harmonia) yield, in two steps, Grandparent(Zeus, Harmonia)
The last line contradicts ¬Grandparent(Zeus, Harmonia), and we are done. However, complete instantiation also produces many useless instantiations, e.g.
Mother(Zeus, Aries) ⊃ Parent(Zeus, Aries)
Father(Harmonia, Zeus) ⊃ Parent(Harmonia, Zeus)
etc.
In even a moderate-size database, vast numbers of useless instantiations are created by this process. Furthermore, to complicate our task, it is not always clear which are useless. PROLOG does not proceed by complete instantiation. Instead, it seeks to do resolution in the most general setting possible. It fixes a value of a variable (binding the variable) only where that is needed to let resolution proceed. This technique of "late binding" tries to keep variables free as long as possible. For example, we have in the database
Father(Zeus, Aries)
Father(Aries, Harmonia)
and the rule
Father(x, y) ⊃ Parent(x, y)
No resolution is possible at this point. However, the bindings (x, y) = (Zeus, Aries) and (x, y) = (Aries, Harmonia) allow resolutions, and we have
Parent(Zeus, Aries)
Parent(Aries, Harmonia)
We also have in our database
Parent(x, z) ∧ Parent(z, y) ⊃ Grandparent(x, y)
Under the binding (x, z) = (Zeus, Aries), by resolution we have:
Parent(Aries, y) ⊃ Grandparent(Zeus, y)
With the binding y = Harmonia, resolution obtains:
Grandparent(Zeus, Harmonia)
Here we used forward chaining, also called "bottom up" inference. This result can also be obtained by backward chaining, called "top down" inference. In general, any complete implementation of clausal chaining (via all bindings which would continue the chaining) will be adequate in the Horn clause setting. While "late binding" techniques are better than complete instantiation, they also face combinatorial growth: out of all possible bindings that allow resolution to continue, which should be done now? This is the issue of a "control strategy", and more sophisticated versions of PROLOG permit complex heuristics for choosing the "best" resolution and binding.
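The bottom-up run just described can be sketched in a few lines. This is a simplification, not how a PROLOG interpreter works: it chains forward rather than backward, and because the database facts are ground it needs only one-sided matching rather than full unification. As in the text, a variable is bound only when matching a body atom against a fact requires it. The tuple encoding of atoms is an assumption.

```python
# Example 6.3.1: an atom is (predicate, arg, ...); lowercase args are variables.
RULES = [
    ([("Mother", "x", "y")], ("Parent", "x", "y")),
    ([("Father", "x", "y")], ("Parent", "x", "y")),
    ([("Parent", "x", "z"), ("Parent", "z", "y")], ("Grandparent", "x", "y")),
]
FACTS = {("Father", "Zeus", "Aries"), ("Father", "Aries", "Harmonia")}


def is_var(term):
    return term[0].islower()


def match(atom, fact, theta):
    """Extend the binding theta so that atom becomes the ground fact, or
    return None. Variables get values only here, when a match needs them."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    theta = dict(theta)  # never mutate the caller's binding
    for term, const in zip(atom[1:], fact[1:]):
        if is_var(term):
            if theta.setdefault(term, const) != const:
                return None  # already bound to a different constant
        elif term != const:
            return None
    return theta


def forward_chain(rules, facts):
    """Bottom-up inference with Horn rules until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            bindings = [{}]
            for atom in body:  # bind variables one body atom at a time
                extended = []
                for theta in bindings:
                    for fact in facts:
                        ext = match(atom, fact, theta)
                        if ext is not None:
                            extended.append(ext)
                bindings = extended
            for theta in bindings:
                derived = (head[0],) + tuple(theta.get(t, t) for t in head[1:])
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


print(("Grandparent", "Zeus", "Harmonia") in forward_chain(RULES, FACTS))  # True
```

Unlike complete instantiation, this never generates instances such as Father(Harmonia, Zeus) ⊃ Parent(Harmonia, Zeus), because no fact matches their bodies.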
For non-Horn clauses, the "partial instantiation" of late binding plus resolution needs to be supplemented by factoring, which unifies variables within a clause (see [Lov 1978]).
Example 6.3.2: Here we give a database application. Suppose that a certain relational database has three relations, with fields as indicated:
SR(Supplier No (key), Name, Status, City)
Part(Part No (key), Name, Color, Weight)
SY(Supplier No, Part No, Quantity)
and a relation "generated as needed":
Lt(x, y): "x is less than y"
(Computable functions can also be entered in this "as needed" manner as a graph, thus avoiding function symbols.) We show how possible queries (questions regarding the database) can be formulated in predicate logic, so that theorem-proving devices can be used to retrieve answers to the queries. In the queries below, we restrict ourselves to queries answered by lists. The predicate formula in the right-hand column is to be refuted; e.g. for the first query, one is to obtain a contradiction from the database plus the clause ¬Part(x, bolt, y, z) ∨ ¬SY(u, x, w) ∨ ¬SR(u, a, v, t). A listing of all the values of the parameter 'a' is desired.
Possible queries | Translation for logical deduction
What are the names of the suppliers of bolts? | Part(x, bolt, y, z) ∧ SY(u, x, w) ∧ SR(u, a, v, t) ⊃
What are the names of the parts supplied by Apex? |
What are the names of suppliers located in London who supply nuts weighing more than one ounce? | SR(x, a, y, London) ∧ SY(x, u, v) ∧ Part(u, nut, w, t) ∧ Lt(1, t) ⊃
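When the database consists only of ground facts, the settings of the variables that refute a query clause are exactly the rows of a join, and the answer list is the projection of those rows onto the parameter a. A sketch of the first and third queries in Python; all the data below are invented for illustration, and weights are assumed to be in ounces.

```python
# Hypothetical toy data for SR(SupplierNo, Name, Status, City),
# Part(PartNo, Name, Color, Weight) and SY(SupplierNo, PartNo, Quantity).
SR = {(1, "Apex", 20, "London"), (2, "Brix", 10, "Paris"), (3, "Cobb", 30, "London")}
PART = {(10, "nut", "red", 2), (11, "bolt", "green", 1), (12, "nut", "blue", 1)}
SY = {(1, 10, 300), (2, 11, 200), (3, 12, 400), (3, 11, 100)}


def lt(x, y):
    """The relation Lt, computed as needed rather than stored."""
    return x < y


def suppliers_of_bolts():
    # Part(x, bolt, y, z) ∧ SY(u, x, w) ∧ SR(u, a, v, t): project onto a.
    return {a
            for (x, pname, y, z) in PART if pname == "bolt"
            for (u, x2, w) in SY if x2 == x
            for (u2, a, v, t) in SR if u2 == u}


def london_suppliers_of_heavy_nuts():
    # SR(x, a, y, London) ∧ SY(x, u, v) ∧ Part(u, nut, w, t) ∧ Lt(1, t)
    return {a
            for (x, a, y, city) in SR if city == "London"
            for (x2, u, v) in SY if x2 == x
            for (u2, pname, w, t) in PART
            if u2 == u and pname == "nut" and lt(1, t)}


print(sorted(suppliers_of_bolts()))              # ['Brix', 'Cobb']
print(sorted(london_suppliers_of_heavy_nuts()))  # ['Apex']
```

Each nested loop plays the role of one conjunct of the query formula; the set comprehension performs the final projection.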
In all three instances, logical deduction is to be used to determine all possible settings of variables and parameters which lead to a contradiction, and then to print out the parameters only (projection). As an approach to improving response at run time, the forms of the most commonly asked queries can be anticipated, and all possible deductions precompiled. The original query is thus reduced to a union of "easier" queries which require only database lookup. In this manner, deduction can be largely avoided at run time, and if there is not a large number of alternative "lookups" for a given query, this approach can be efficient. The precompilation approach is due to Reiter [Rei 1978] and Henschen [Hen Nag]. In Reiter's approach to interfacing logical inference and databases, the purely existential validity fragment of predicate logic plays an important role. We illustrate some of these ideas in the next example and the results following.
Example 6.3.3: Given a database of "facts" D (no variables), and given principles which govern the database domain, with quantifier-free matrices $C_i$, Horn or not:
$(\forall \vec{x}^{(i)})\, C_i(\vec{x}^{(i)}), \quad i = 1, \ldots, t$
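We have a quantifier-free matrix $Q(a, \vec{y})$, and we wish to know all a such that in that domain there necessarily are $\vec{y}$ with $Q(a, \vec{y})$ true. We note that the validity of

$D \wedge \bigwedge_{i=1}^{t} (\forall \vec{x}^{(i)})\, C_i(\vec{x}^{(i)}) \supset (\exists \vec{y})\, Q(a, \vec{y})$

is equivalent, by the Prenex Laws, to the validity of

$(\exists \vec{x}^{(1)}) \cdots (\exists \vec{x}^{(t)})(\exists \vec{y}) \Big[ D \wedge \bigwedge_{i=1}^{t} C_i(\vec{x}^{(i)}) \supset Q(a, \vec{y}) \Big]$

i.e. it is of the purely existential form $(\exists \vec{z})\, Q'(a, \vec{z})$. The following well-known result is thus helpful (see e.g. [Den Lew 1983]).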
Theorem: $(\exists \vec{z})\, B(\vec{z})$, with B quantifier-free, is valid (i.e. true in all models) iff $\bigvee_{\vec{v} \in V} B(\vec{v})$ is a tautology, where V is the set of all vectors of constants drawn from B (if there are none in B, add one).
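For small cases the theorem yields a direct decision procedure: form every ground instance of B over its constants and test the disjunction for tautology by enumerating truth assignments to the ground atoms. The sketch below assumes B is supplied as a Python function of a truth assignment and a vector of constants, and that its predicate letters and their arities are listed; it is exponential in the number of ground atoms, as one would expect.

```python
from itertools import product


def existential_valid(B, constants, n_vars, predicates):
    """Decide validity of (∃z_1)...(∃z_n) B(z_1, ..., z_n), B quantifier-free.

    B(val, v) evaluates B at the vector of constants v, where val maps each
    ground atom (predicate, args) to a truth value. `predicates` maps each
    predicate letter of B to its arity; `constants` are those occurring in B."""
    consts = list(constants) or ["c0"]  # if B has no constants, add one
    vectors = list(product(consts, repeat=n_vars))
    atoms = [(p, args) for p, k in predicates.items()
             for args in product(consts, repeat=k)]
    # The disjunction of all B(v) is a tautology iff no assignment falsifies it.
    for bits in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, bits))
        if not any(B(val, v) for v in vectors):
            return False
    return True


# (∃z)(P(c) ⊃ P(z)) is valid: the instance z = c is a tautology.
print(existential_valid(lambda val, z: (not val[("P", ("c",))]) or val[("P", (z[0],))],
                        ["c"], 1, {"P": 1}))  # True
# (∃z)P(z) is not valid.
print(existential_valid(lambda val, z: val[("P", (z[0],))], [], 1, {"P": 1}))  # False
```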
Corollary: In every domain where D is true and where $(\forall \vec{x}^{(i)})\, C_i(\vec{x}^{(i)})$ is true for all i = 1, ..., t, there necessarily are domain elements $\vec{y}$ with $Q(a, \vec{y})$ true, for a among the domain elements, if and only if

$D \wedge \bigwedge_{i=1}^{t} (\forall \vec{x}^{(i)})\, C_i(\vec{x}^{(i)}) \supset (\exists \vec{y})\, Q(a, \vec{y})$

is a theorem of predicate logic.

Reiter has shown that, in addition to the axioms $(\forall \vec{x}^{(i)})\, C_i(\vec{x}^{(i)})$, one can add axioms for equality, together with the graph of the equality relation, plus an axiom ("domain closure") that all objects occur in the database, and this will not change the answers to existential queries. The result is significant because equality is not easy for resolution-based procedures to handle. In Lecture 8, we give an approach to theorem proving which focuses on the ∃ validity fragment of predicate logic, or, equivalently, its ∀ satisfiability fragment. Indeed, $(\exists \vec{z})\, Q(\vec{z})$ is valid exactly if $(\forall \vec{z})\, \neg Q(\vec{z})$ is not satisfiable. From Skolem's reduction, satisfiability in predicate logic is reducible to ∀∃ satisfiability, so that the latter is not testable. However, we shall see that ∃∀ satisfiability is reducible to ∀ satisfiability, which, by the theorem above, reduces to propositional logic and so is testable.
In the applied predicate logic, the ∀∃ fragment reduces (by use of function symbols) to the ∀ fragment, and hence all of predicate logic reduces to the ∀ fragment. Hence the ∀ fragment of applied predicate logic is not testable, and the theorem above is restricted to the pure predicate logic with constant symbols.
LECTURE 7
COMPUTATIONAL COMPLEXITY ABOVE NP: A RETROSPECTIVE OVERVIEW
Summary: We survey some of the complexity results on problems which are harder than NP, and interject our own perspective. This lecture is a digression and is not needed to understand the subsequent lectures. However, it will be useful for the reader to have a broader framework for algorithms for the predicate logic, which appears to have a complexity above the NP class widely known to those in Operations Research. Moreover, we will relate some of these higher complexity classes to problems which naturally occur in Operations Research.
7.1
Introduction
Researchers have long sought general measures by which they could discern various degrees of "difficulty" in different practical problems. Such measures would guide the modelling of a practical situation, favoring models of lower difficulty, and would help to set expectations for the performance of algorithms. In the 1970's the precise concept of "computational complexity" was viewed as meeting this need for a measure of difficulty, to the extent that polynomial-time algorithms (at least those of low degree) were viewed as "tractable" while others were viewed as "intractable". However, this view is very much in dispute today, due to the fact that the Simplex Algorithm can be exponential time for certain problems [Kee Min 1971], [Jer 1973b], while some nonpolynomial problems (such as knapsack problems and satisfiability problems) are typically solved satisfactorily for practical needs. Moreover, the entire thrust of current research in MIP is to develop efficient means for solving problems which, in the terminology of complexity theory, are "intractable". The fact is that this research is meeting with success. At this point in time, most
applications-oriented applied scientists, including those in Computer Science, simply ignore the intractability recommendations of complexity theory. However, computational complexity retains a central role for setting expectations as to algorithm frameworks for problem solution. For instance, upon learning that a problem is NP-complete, one often is led to consider a branch-and-bound approach (or dynamic programming or cutting planes) as the general solution method, within which many other partial solution methods may be embedded. In addition, computational complexity can be used to set expectations on the worst-case performance of proposed algorithms. To some extent, complexity can be used to suggest or motivate algorithms by means of the conceptual schemes for computation that are associated with complexity results. This is a valuable contribution. In terms of practical measures of computational difficulty, the current choice is among the "wind tunnel" method, sophisticated analyses of the performance of specific algorithms on randomly generated data, and sophisticated analyses of probabilistic algorithms in the worst case. There have been a number of critiques of the use of randomly generated data and their match to real-world problems, which we do not wish to repeat here. Moreover, results in this area are often exceptionally hard to obtain, even for "rudimentary" algorithms which lack realism. The "wind tunnel" method consists of trying algorithms experimentally against real-world data. The folklore ascribes to Gene Lawler the idea of a "computer Olympics". Here "contestants" would try out their algorithms against a library of real-world problems on which many other algorithms have been tested. A variant of this idea is a centralized algorithm testing facility, which could be a long-term, sustained activity with the capability of aiding many industries via quantitative modelling techniques.
A major function of such a facility would be to create and maintain an extensive library of actual or modified problems from various industrial settings. Of course, such a facility would be open to all researchers who wished to try out their algorithms and approaches. While computer usage and run-time certification would be on a fee basis, access to the problem library would be inexpensive and easy. To avoid conflict of interest, such a facility would neither engage in nor contract out any algorithm development, etc. It would be overseen by a board of respected scientists representing a variety of interests and approaches. In my view, a centralized testing facility is clearly a very appropriate way of linking applied decision sciences to applications. It dominates the problem-at-a-time, client-at-a-time approach which is currently the dominant practice, where often even the difficulty of solved problems has not been ascertained by
alternative algorithms. Occasionally, the problem datasets are not available for testing and verification by other experimenters. In terms of computational complexity, which remains an essential part of applied science and which is necessary for any sophisticated perspective in our area, a theoretical consideration is the need to address trial-and-error procedures (see e.g. [Jer 1975], [Gol 1965], [Put 1965]). In these procedures, while no one algorithm may be efficient, there can nevertheless be a sequence of algorithms between which users switch, over time, to achieve solution of larger and larger problems. This issue has not been addressed in the current complexity theory. A more recent development for computational complexity derives from database theory, where, in very large databases, even a quadratic-time algorithm may be far too slow. Here the emphasis is often on algorithms with worst-case time complexity that is linear, or even less than linear, in the size of the database. In the following sections, we assume a familiarity with the elements of the theory of NP complexity, and we review more than we introduce concepts. Background material is the first three chapters of [Gar Joh 1979].
7.2
The fundamental distinction: conceptions vs. their instances
Many of the results on computational complexity derive from the huge gap between what humans can conceive of in principle, on a theoretical level, and what they can actually implement. Most people have little sophistication concerning the process of obtaining concrete, usable outcomes from the powerful human imagination. We can have absolutely clear and concise conceptions of the basic mechanical functioning of a computing device (or of axioms) and yet have no understanding of what will be the output of a long calculation (or of a long deduction). It is these clear conceptions which lead to succinct formulations in a logic. It is our lack of understanding of output which leads to "hard" problems. The result is that tasks which are seemingly simple to describe are hard, and predicate logic may prove "impossible" without taking advantage of special processing which is possible for some structured problems. A second phenomenon is as follows. If we can give an absolutely clear conception of a class of tasks, we obtain from this a clear conception of a harder task. This is the "diagonalization principle," which we shall shortly illustrate. As a consequence, at least some "complexity hierarchies" do not "collapse" to lower levels, so that complexity takes on a graded structure.
We now illustrate the diagonalization process. Fix a function, e.g. $F(n) = 2^n$. Consider all programs which, given an input x of length $|x| \le n$, output a "yes" or "no" in time $\le F(n)$. Call this class $P_F$ of programs p. Now consider this program $p_0$: Given $x = 1^n = 1\ldots1$ (n ones), apply the n-th program to x. If it does not stop in time $\le F(n)$, output a "yes" or "no". If it stops in time $\le F(n)$, output the opposite of its answer: "no" if it output "yes", and "yes" otherwise.
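The construction of $p_0$ can be simulated on a toy scale. In this sketch a "program" is a Python generator function in which every `yield` counts as one time step, and the enumeration of programs is just a finite list; both are assumptions made for illustration. $p_0$ reruns the n-th program under the budget F(n) and reverses any answer it obtains in time.

```python
def run_with_budget(program, x, budget):
    """Run a generator-based program on x for at most `budget` steps.
    Each `yield` counts as one step, and the program's answer is its return
    value. Returns None if it has not stopped within the budget."""
    gen = program(x)
    try:
        for _ in range(budget):
            next(gen)
    except StopIteration as stop:
        return stop.value
    return None


def make_p0(programs, F):
    """The diagonal program p0, on inputs of the form 1^n."""
    def p0(x):
        n = len(x)  # x is assumed to be 1...1 (n ones)
        answer = run_with_budget(programs[n], x, F(n))
        if answer is None:
            return "yes"  # no answer within F(n): any fixed output will do
        return "no" if answer == "yes" else "yes"  # reverse the answer
    return p0


def always_yes(x):
    return "yes"
    yield  # unreachable; only makes this a generator function


def parity(x):
    for _ in x:  # one step per input symbol
        yield
    return "yes" if len(x) % 2 == 0 else "no"


def loop_forever(x):
    while True:
        yield


p0 = make_p0([always_yes, parity, loop_forever], lambda n: 2 ** n)
print([p0("1" * n) for n in range(3)])  # ['no', 'yes', 'yes']
```

Whenever the n-th program answers within its budget on $1^n$, $p_0$ disagrees with it there, so $p_0$ itself cannot be any program of the class.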
Fact: $p_0$ takes time "a little longer" than $F(n)$, and it is not equivalent to a program in $P_F$. For if $p_0$ had number $n_0$ and stopped in time $\le 2^{n_0}$, it would have to answer both "yes" and "no" to input $1^{n_0}$. Diagonalization is an ancient phenomenon, as e.g. in the Cretan Liar Paradox, "this sentence is false". It is reflected in the "paradoxes" of informal set theory (e.g. "the set of all those sets which are not members of themselves"). With great technical skill, Gödel showed it to be a means of proving complexity results and incompleteness results [Godel 1931, 1934]. It is a widely used technique of complexity theory. Here the ingenuity of the proof lies in finding means to express a "paradox-like" condition in a logic or computation which does not a priori appear to be sufficiently expressive. The surprising expressiveness of simple fragments of logic in turn derives from the power of even simple conceptions. Further progress in complexity theory has been hindered by a lack of new insights beyond the two phenomena of "gap" and "diagonalization". There is a consequent inability to determine central interrelations, e.g., whether polynomial time is the same as nondeterministic polynomial time (P =? NP).
7.3
Two fundamental results
We now give two examples, both central to complexity theory, of clear, concise ideas having consequences which are hard to determine. We assume that the reader is familiar with the concept of a "Turing machine," a general-purpose programmable computer which is user-unfriendly, having a primitive machine language and no compiler. Only persistent students of logic have actually programmed these machines. Conceptually, they could hardly be simpler:
• each works on a tape divided into "squares"
• there is one symbol per square
• there is one read/write head over one square
• the next move is 0 or ±1 squares
This is certainly a clear, concise conception. It is, in principle, adequate for computing anything we can compute on the most modern digital computers, although the modern machines of course run much faster. These primitive, theoretical computing devices will allow us to see the surprising expressiveness of the propositional calculus. We will see that any polynomial-time computation with these machines can be expressed as a polynomial-size proposition, such that an accepting computation corresponds exactly to a satisfiable proposition in conjunctive normal form. Furthermore, a polynomial-time computation even "with guesses" (i.e. a nondeterministic polynomial-time computation) yields to the same treatment. The sole difference between a deterministic and a nondeterministic polynomial-time computation will be that the former yields a proposition with Horn clauses only. As a consequence of this fact, clausal chaining (unit resolution) or linear programming is proven complete for polynomial time [Dob Lip Rei 1979] (see also [Jon Laa 1974], [Sky Val 1985], and [Val 1982]). The result for linear programming of course relies on Khachian's algorithm [Kha 1979] to show that linear programming is in polynomial time. It is "hard" to anticipate what a Turing machine will do, with guesses, in polynomial time. As a consequence, it will be hard to determine satisfiability in conjunctive normal form. We follow essentially the original construction of Cook [Coo 1971], and proceed as follows:
• We introduce a propositional letter for each possible tape location, at each time, for each alphabet letter and each possible machine state. This leads to a polynomial number of propositional letters.
• Then, in order to say that only permissible moves can be made in going from time t to time (t + 1), we write implication clauses like:
letter σ1 alone in square (i − 1) at time t ∧ letter σ2 in square i with read head and in state q at time t ∧ letter σ3 alone in square (i + 1) at time t ⊃ allowable transition .......
We use other implications similarly to state that other squares are unchanged, that we start with a specified input, etc. Now note that nondeterminism comes "for free": replace "allowable transition ..." by "allowable trans1 ∨ allowable trans2 ∨ ...", at the price of clauses that are no longer Horn.
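Because a deterministic computation yields Horn clauses only, satisfiability of the resulting proposition can be settled by clausal chaining (unit resolution) in polynomial time. A minimal sketch, assuming each Horn clause is given as a pair (body, head) read as body ⊃ head, with head None for a clause having no positive literal; it is not the Cook encoding itself, only the chaining step that decides such clause sets.

```python
def horn_satisfiable(clauses):
    """Clausal chaining on Horn clauses.

    Each clause is (body, head): a set of letters and a letter (or None),
    read as (conjunction of body) ⊃ head, with head None meaning ⊃ false.
    A letter is set true only when some clause forces it. Returns
    (True, forced) if satisfiable, where `forced` is the set of letters true
    in the least satisfying assignment; otherwise (False, forced)."""
    forced = set()
    changed = True
    while changed:  # each pass forces a new letter or ends the loop
        changed = False
        for body, head in clauses:
            if body <= forced:
                if head is None:
                    return False, forced  # an all-negative clause is violated
                if head not in forced:
                    forced.add(head)
                    changed = True
    return True, forced


clauses = [(set(), "a"), ({"a"}, "b"), ({"a", "b"}, "c")]
ok, forced = horn_satisfiable(clauses)
print(ok, sorted(forced))  # True ['a', 'b', 'c']
ok, forced = horn_satisfiable(clauses + [({"c"}, None)])
print(ok)  # False
```

At most one pass more than the number of letters is needed, and each pass scans every clause once, so the running time is polynomial in the size of the clause set.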
"Polynomial time" is of interest for theoretical reasons, e.g. closure under multiplication (as above) or addition, dominance of a maximum of polynomials by a polynomial, and other properties. Practical interest lies in the low-order polynomials, e.g. $O(n)$, $O(n^2)$, $O(n^3)$, and marginally $O(n^4)$, $O(n^5)$. Further interest in nondeterministic polynomial time NP was motivated by the discovery of a huge number of Operations Research problems which are in NP time (= NP), i.e. polynomial time given the correct "guesses". In fact, there are hundreds of instances cited in the book by Garey and Johnson. The connections between Operations Research problems and NP computational complexity were first developed by Karp [Kar 1972]. Propositional satisfiability is a "special case" of many of these, while from Cook's construction, it is also the fundamental problem of the NP-completeness paradigm. As we noted, Turing machines are relatively inefficient and cumbersome. However, changes in the model of computation do not usually change the tasks computable in polynomial time, although such changes can speed up tasks by constant factors or even orders of magnitude. As long as the minute operations of a model of computation are changes of bits, its computation will be expressible in propositional logic. Only when more global "minute" operations are used (as e.g. multiplication of integers in constant time, independent of size) can there be changes in what is computable in polynomial time. We now turn to our second fundamental result. How much logical expressibility do we need to concisely describe nondeterministic exponential time computation? The answer is due to H. Lewis, and it is: ∀ satisfiability in predicate logic, i.e., the complement of what R. Reiter identifies as the fragment of predicate logic most relevant to database queries.
Lewis' main technical idea is that, to discuss an exponential-length tape during an exponential-time computation, we need to be able to use base-m numbers of length m (where $m^m = c^n$, and $c^n$ is the computation time). In this manner, a short symbol indicates an exponentially large time.
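A small calculation shows how short these numerals are. The sketch illustrates only the arithmetic: it takes the least m with $m^m \ge c^n$ (the equation $m^m = c^n$ rarely has an integer solution) and writes a time step as an m-digit base-m numeral.

```python
def lewis_base(c: int, n: int) -> int:
    """Least m >= 2 with m**m >= c**n, so that m-digit base-m numerals
    can name every time step 0, ..., c**n - 1."""
    m = 2
    while m ** m < c ** n:
        m += 1
    return m


def to_base_m(t: int, m: int) -> list:
    """The m-digit base-m numeral for 0 <= t < m**m, most significant digit first."""
    digits = []
    for _ in range(m):
        t, d = divmod(t, m)
        digits.append(d)
    return digits[::-1]


print(lewis_base(2, 10))   # 5, since 4**4 = 256 < 1024 <= 5**5 = 3125
print(lewis_base(2, 100))  # 23
print(to_base_m(1023, 5))  # [1, 3, 0, 4, 3]
```

Thus a computation of $2^{100}$ steps has every time step named by a string of only 23 digits, each drawn from 23 constants.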
We shall use the constants $d_0, d_1, \ldots, d_{m-1}$ of predicate logic to denote $0, 1, \ldots, m-1$ to base m. The base-m successor relation can be developed using universally quantified principles, e.g. for all $i = 0, \ldots, m-2$ an axiom:

$(\forall x_0)\cdots(\forall x_{m-2})\; SC(x_0, x_1, \ldots, x_{m-2}, d_i;\; x_0, x_1, \ldots, x_{m-2}, d_{i+1})$
plus other axioms to precisely specify successor. We then define ≤ in terms of successor, etc. We next develop a computation predicate:
"At time $x_0 \ldots x_{m-1}$, in tape square $y_0 \ldots y_{m-1}$, the symbol is σ and the read head is/is not located there (with internal state q)."
These predicates can be introduced and axiomatized in a manner similar to Cook's approach for propositional logic, but using the universal quantifier to replace what would otherwise be an exponential number of propositions. The details are in [Lew 1980]. As in the propositional logic, nondeterminism comes "for free" in Lewis' construction. Lewis' construction takes a nondeterministic exponential-time computation and makes it correspond to a small set of universal sentences of pure predicate logic with constant symbols, such that the computation ends in acceptance exactly if the sentences are satisfiable. As we saw in Lecture 6, this ∀ fragment of predicate logic is, in turn, decidable in nondeterministic exponential time, since complete instantiation results in an exponential-size proposition. Thus, nondeterministic exponential time can be identified with this fragment of logic. Plaisted's related result is that, for deterministic exponential time, only ∀ Horn clauses are needed, and so that time class can be identified with that logic class. The reader may consult the useful figure in [Den Lew 1983] for the placement of the complexity of other fragments of predicate logic. In that reference, the pure predicate calculus (without constants) is used, so initial existential prefixes need to be removed to compare with our way of stating results here.
7.4
What if we increase expressibility "a little bit"?
We've seen how expressibility of universal quantification leads to nondeterministic exponential time computation. Suppose we add function symbols for successor and plus, and we add in the equality relation, but we ignore the other symbols of predicate logic. In this language, let Presburger arithmetic consist of all true statements of arithmetic. Now the time complexity will jump above $2^n$, and even above the upper bound for nondeterministic exponential time, according to our next two results.
Theorem: [Coop 1972] There is a $2^{2^{2^{cn}}}$-time algorithm (for some constant c) for determining if a sentence is in Presburger arithmetic.
(Presburger proved decidability in [Pres 1929].)
Theorem: [Fis Rab 1974] Any nondeterministic algorithm for deciding Presburger arithmetic requires at least $2^{2^{cn}}$ time (for some constant c > 0) on some inputs of length n. Next suppose we add a three-place letter to represent multiplication, and we let first order arithmetic (FOA) be all true statements of arithmetic in this language. A subset of FOA is Peano's arithmetic (FPA), which consists of all statements in this language provable from Peano's axioms. The Peano axioms are a small list of basic facts about successor, the recursion relating successor to addition, the recursion relating addition to multiplication, and then the axiom schema of induction; see [Men 1964] or [Shoen 1967] for details. The following well-known result is a direct consequence of [Godel 1931].
Theorem: [Godel 1931] There is no algorithm for determining which statements are in FPA. Gödel's proof essentially consists of an ingenious way of representing any computation of any length in FPA, so that all halting Turing machine computations can be proven from Peano's axioms. A diagonalization then completes the proof. Subsequently, Robinson [Tar Mos Rob 1953] produced a small, finite theory Q which also is adequate to establish all halting computations. Hence Q also has no algorithm for deciding theorems, and neither does predicate logic (since the theorems A of Q exactly correspond to the theorems $\hat{Q} \supset A$ of predicate logic, where $\hat{Q}$ is the conjunction of the finitely many axioms of Q). In addition to such "negative" results as the non-existence of an algorithm, Gödel's techniques can also be used to obtain some algorithms. Specifically, from a mechanical enumeration of some of the nontheorems of FPA, we automatically can find a nontheorem this procedure does not enumerate. We illustrate this with a second diagonalization. Indeed, from such an enumeration one obtains a program $n_0$ which stops on exactly the inputs in {n | "program n stops on input n" is enumerated}. We ask: does program $n_0$ stop on input $n_0$? If it did, "program $n_0$ stops on input $n_0$" would be a theorem of FPA, hence not in the enumeration, so $n_0$ would not stop on input $n_0$, by definition. This is a contradiction. Hence, program $n_0$ does not stop on input $n_0$. Is "program $n_0$ stops on input $n_0$" enumerated? No, for if it were, program $n_0$ would stop on input $n_0$, by definition. However, we know it does not stop. Hence "program $n_0$ stops on input $n_0$" is a nontheorem, yet not enumerated! How expressive is FOA? Suppose we added an "oracle" to decide all halting computations instantly? If that isn't enough, add an oracle to the oracle, etc.
Theorem: (Due to E. Post and cited in [Rog 1967].) No finite-level oracle can determine the theorems of FOA. From Post's theorem above, even among problems which are unsolvable, we can use "oracles" to distinguish degrees of difficulty, and in fact a whole hierarchy of unsolvable problems. Let's reverse direction and cut back on expressiveness. When are things "only as complex" as in Operations Research?
Theorem: [Opp 1968]
The existential statements of Presburger arithmetic form an NP-complete set.
Theorem: (Lewis)
The existential satisfiable sentences of pure predicate logic form an NP-complete set.
Theorem: (Bledsoe and Shostak)
There is a 2"-time procedure for the universal statements of Presburger arithmetic.
R. JEROSLOW
128
As we have seen from our excursion into more complex, and hence more expressive, logical theories, things get "curiouser and curiouser". In the next section, we turn to the relatively "lower levels" of complexity which still lie above NP, but which are not known to be nonpolynomial. In this lecture, we have not touched on some very significant higher complexity results (e.g. [Mey Sto 1972], [Sto Mey 1973], [Har Stea 1965], [Har Lew Stea 1965]), as these are of an automata-theoretic nature and do not directly intersect with our topic in these lectures. Also see [Jer 1973] for an Operations Research problem which is unsolvable (namely, integer programming with quadratic constraints), as a consequence of the results in [Mat 1970] and [Dav Put Rob 1961].
7.b
The Polynomial Hierarchy, Probabilistic Models, and Games
The Polynomial Hierarchy of Meyer, Stockmeyer and Karp uses the concept of an "oracle," and is built in analogy with Post's hierarchy.
NP = NPtime or Σ1 are those sets of the form:
$$\{x \mid (\exists y,\ |y| \le p(|x|))\ P(x,y)\}.$$
Here p is a polynomial and P is a polynomial-time computable predicate. Thus Σ1 are those problems decidable in polynomial time, given correct "guesses". (We use |x| to denote the length of x.) Suppose there is an oracle for any given Σ1 set which answers membership questions instantly. An NP computation using such an oracle for e.g. $\{w \mid (\exists u,\ |u| \le p(|w|))\ R(w,u)\}$ would have the form:
Here R' is the predicate for lists of words in R. Using Prenex Laws, this converts to:
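$(\forall u',\ |u'| \le p(|w'|))$
This is of the form
$$\{x \mid (\exists y,\ |y| \le p(|x|))(\forall z,\ |z| \le p(|x|))\ P(x,y,z)\},$$
which is called Σ2-form and involves an alternation of quantifiers from ∃ to ∀. Conversely, any set defined by a Σ2 form is NPtime in an oracle for a Σ1 (i.e. NP) set. Now suppose we had oracles for all Σ2 sets. The analogy continues, i.e. by an NPtime computation we arrive at Σ3 sets, which have two quantifier alternations:
$$\{x \mid (\exists y)(\forall z)(\exists v)\ P(x,y,z,v)\},$$
with all quantified words of length at most p(|x|).
We proceed similarly for all finite levels Σn, n ≥ 0. Let Σ0 be Ptime (polynomial time) by definition. We have the hierarchy:
$$\Sigma_0 \subseteq \Sigma_1 \subseteq \Sigma_2 \subseteq \cdots \subseteq \Sigma_n \subseteq \cdots$$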
It is unknown whether the hierarchy is one of strict inclusions. As presented in this way, the polynomial hierarchy is very abstract. However, even at its lower levels, it relates to Operations Research problems, specifically to parametric mixed integer programming. Suppose that an MIP description is given of a plant and warehouse location problem, and we need to know: "Is it the case that, no matter where I locate plants 5-10 on the allowable sites, there will be a way of meeting all customer demand with our client Apex Corp. requiring no more than $1 million in transportation costs this year?" Our query has the form:
(∀ sites for plants 5-10)(∃ other locations, distributions)(meets demand and Apex transportation ≤ $1 Mil)
The question is in co-Σ2 (the complement of Σ2). Suppose that instead of "a way" we want "an optimal way". Then our question has the form:
(∀ sites for plants 5-10)(∃ other locations, distributions)(∀ possibly different locations, distributions)
[The other locations, distributions meet demand, and if the possibly different locations, distributions meet demand, they have at least as high a cost, and Apex transportation costs do not exceed $1 million.]
This is co-Σ3. See [Bla Jer 1983] for some parametric programming problems which are in P or NP. Different levels of the hierarchy suggest different types of algorithms. Σ1 suggests the use of "or" trees: if any choice succeeds, accept x. "OR" trees are like branch-and-bound, except that in an "OR" tree there is no communication between alternative paths (unlike BB, where there is). Σ2 suggests the use of "or/and" trees. We note that all Σn involve binary trees with branches of polynomial length. So membership in Σn can be tested in polynomial space (and exponential time, via backtracking). Letting PSPACE denote polynomial space, we have:
$$\Sigma_0 \subseteq \Sigma_1 \subseteq \cdots \subseteq \Sigma_n \subseteq \cdots \subseteq \mathrm{PSPACE}$$
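The remark that membership in Σn can be tested in polynomial space by backtracking through an "or/and" tree can be sketched as follows. This is a minimal illustration, not the construction of the text: it assumes the guessed words are fixed-length 0/1 vectors, models the polynomial-time predicate P as an ordinary Python function of all guessed bits, and the names `words` and `evaluate` are hypothetical.

```python
from typing import Callable, Iterator, Sequence, Tuple

# Hypothetical encoding: a quantifier prefix is a list of (quantifier, block_length)
# pairs, e.g. [("E", 2), ("A", 1)] stands for (∃ y1 ∈ {0,1}^2)(∀ y2 ∈ {0,1}^1).
Prefix = Sequence[Tuple[str, int]]
# The polynomial-time predicate P is modelled as a Python function of all guessed bits.
Predicate = Callable[[Tuple[int, ...]], bool]


def words(length: int) -> Iterator[Tuple[int, ...]]:
    """Yield every 0/1 word of the given length, i.e. every branch of one tree node."""
    for code in range(2 ** length):
        yield tuple((code >> i) & 1 for i in range(length))


def evaluate(prefix: Prefix, predicate: Predicate, path: Tuple[int, ...] = ()) -> bool:
    """Evaluate a Sigma_n-style statement as an OR/AND tree, depth first.

    An existential block is an OR node (succeed if some branch succeeds) and a
    universal block is an AND node (succeed only if every branch succeeds).
    any()/all() stop at the first decisive branch, which is the backtracking step.
    Only the current path is kept, so memory grows with the total number of
    guessed bits, while the running time can be exponential in it.
    """
    if not prefix:
        return bool(predicate(path))
    (quantifier, length), rest = prefix[0], prefix[1:]
    branches = (evaluate(rest, predicate, path + w) for w in words(length))
    return any(branches) if quantifier == "E" else all(branches)


# (∃a)(∀b): (a or b) and (a or not b) is true, witnessed by a = 1.
print(evaluate([("E", 1), ("A", 1)], lambda v: (v[0] or v[1]) and (v[0] or not v[1])))
# Quantifier order matters: (∃a)(∀b) a != b is false, but (∀b)(∃a) a != b is true.
print(evaluate([("E", 1), ("A", 1)], lambda v: v[0] != v[1]))
print(evaluate([("A", 1), ("E", 1)], lambda v: v[0] != v[1]))
```

Depth-first recursion is chosen deliberately: a breadth-first search of the same tree would store exponentially many nodes, which would lose the polynomial-space bound.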
The following result "caps" the polynomial hierarchy: Theorem: [Sto 1977]
$$\bigcup_{n \ge 0} \Sigma_n \subseteq \mathrm{PSPACE}.$$
Moreover, if P = NP then the entire hierarchy collapses to P, and the converse implication is clear; whether the hierarchy exhausts PSPACE is not known. In polynomial space at most an exponential number of tape configurations, combined with read head positions and states, is possible. Thus, PSPACE is contained in exponential time (EXPTIME), so the entire polynomial hierarchy lies below the Horn (satisfiable) fragment of predicate logic. Here is another reason why predicate logic is probably very difficult. Just as boolean expressions, in the form of propositional logic, play a central role in NP or Σ1, they also provide complete sets at the various levels of the polynomial hierarchy.
[Figure 19: "Or" Tree, recast as a binary tree]
Theorem: [Sto 1977] The set of all true statements of the following form is complete for Σn (under Ptime functions):
$$(\exists \vec{y}_1)(\forall \vec{y}_2)(\exists \vec{y}_3)\cdots(Q_n \vec{y}_n)\ B(\vec{y}_1, \ldots, \vec{y}_n),$$
in which B is a propositional form, the $\vec{y}_i$ are vectors of propositional variables, $Q_n$ is ∃ or ∀ according as n is odd or even, and there are (n - 1) alternations of quantifiers. Papadimitriou gives a variant construction which involves random variables (continuing work of Gill, Valiant, and others). This construction exactly follows that for Σn, except that the universal quantifiers are replaced by random choices of a vector, and the final condition is to be attained more than half the time (for Σn, it is "all the time").
[Figure 20: "Or/and" Tree. Such trees can be recast as binary-branching. To succeed, we need some $y^{(1)}$ such that all $y^{(2)}$ from it succeed.]
So PΣ3 is:
$$\{x \mid (\exists y^{(1)},\ |y^{(1)}| \le p(|x|))\ (\text{given a realization of a random } y^{(2)},\ |y^{(2)}| \le p(|x|))\ (\exists y^{(3)},\ |y^{(3)}| \le p(|x|))\ P(x, y^{(1)}, y^{(2)}, y^{(3)}) \text{ is true more than half the time}\}$$
Similarly for PΣn, and by definition PPSPACE (probabilistic PSPACE) is the union $\bigcup_{n \ge 0} P\Sigma_n$. Theorem: [Pap] PPSPACE = PSPACE
PPSPACE is essentially equivalent to finite horizon, dynamic non-Markovian decision processes with terminal state rewards (i.e. non-additive), in which one gets to see the realization of a random event before having to make the next decision (which will be followed by another random event). Although there is an exponential number of possible policies, there are only polynomially many states over all stages. However, transition probabilities depend on the entire past history of decisions. Call this FHDNM. Any policy is evaluated by its expected reward. Given a description of an FHDNM process and a quantity B, we consider this form of question: "Is there a policy of cost ≤ B?"
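A minimal sketch of why such processes can be evaluated in polynomial space, assuming a hypothetical encoding in which a history alternates decisions and observed random events, transition probabilities are supplied as a function of the whole history, and the threshold question is read as "is the best expected reward at least B?". The names `best_value`, `moves` and `coin` are illustrative only. With a 0/1 reward, the same recursion answers the PΣ-style question "is the final condition attained more than half the time?".

```python
from fractions import Fraction
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical encoding: a history alternates decisions and observed random events,
# (a1, e1, a2, e2, ...). Transition probabilities may depend on the whole history.
History = Tuple[Hashable, ...]
Actions = Callable[[History], Sequence[Hashable]]
Outcomes = Callable[[History], Sequence[Tuple[Hashable, Fraction]]]
Reward = Callable[[History], Fraction]


def best_value(horizon: int, actions: Actions, outcomes: Outcomes,
               reward: Reward, history: History = ()) -> Fraction:
    """Largest expected terminal reward achievable by any (history-dependent) policy.

    Decision nodes take a maximum and chance nodes an expectation. The recursion
    is depth first, so only one history is stored at a time, even though the
    number of policies grows exponentially with the horizon.
    """
    if len(history) == 2 * horizon:
        return Fraction(reward(history))
    values = []
    for a in actions(history):
        after = history + (a,)
        values.append(sum((p * best_value(horizon, actions, outcomes, reward, after + (e,))
                           for e, p in outcomes(after)), Fraction(0)))
    return max(values)


# Toy instance: two stages with moves 0/1. After the first move a1, a coin lands
# on a1 with probability 3/4 (history dependence); the second event is a dummy.
def moves(h: History) -> Sequence[int]:
    return (0, 1)


def coin(h: History) -> Sequence[Tuple[Hashable, Fraction]]:
    if len(h) == 1:
        return ((h[0], Fraction(3, 4)), (1 - h[0], Fraction(1, 4)))
    return ((None, Fraction(1)),)


guess_first = best_value(2, moves, coin, lambda h: int(h[0] == h[1]))
copy_after = best_value(2, moves, coin, lambda h: int(h[2] == h[1]))
print(guess_first, copy_after)
print(guess_first > Fraction(1, 2))  # a PSigma-style "more than half the time" test
```

In the toy instance, guessing the coin before seeing it is worth 3/4, while copying it after seeing the realization is worth 1, which illustrates why observing each random event before the next decision matters.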
Theorem: [Pap]
FHDNM is PSPACE-complete. The "surprise feature" of this result is not that FHDNM is "hard" (it seems impossible!) but that it places "so low". We now have this picture:
[Figure 21: A Complexity Chain, running from IP, PARA MIP and PPSPACE through NEXPTIME and beyond, up to problems for which there is no algorithm]
The polynomial hierarchy can also be related to various generalizations of linear programming, generalizations motivated by considerations of public policy and of delegation of authority (agent problems). Bi-level programs (games) are LPs in which two players have control over disjoint sets of the variables. They move in a definite order. "Policy maker" sets variables first, then "citizen" reacts. "Policy maker" and "citizen" each have their own linear criterion function involving both sets of variables. All data is known to both players. In what follows, we assume the LP gives a polytope (bounded). We ask the following:
Question: What move should policy maker choose, to maximize his/her benefits, with clear knowledge that citizen is also a maximizer?
A variant of our question (for use in complexity) is: can the value of the bi-level program be ≥ B? Bi-level programs occurred in a policy setting at the World Bank (Candler and Townsley), and more general sequenced-move games date much earlier (Von Stackelberg). Since all variables are continuous and all constraints linear, they are very simplified programs and do not consider e.g. possible controls over price (criterion functions), taxation policies, discrete alternatives, etc. Therefore complexity results for bi-level programs are very serious statements about potential barriers to the efficient solution of actual policy questions. Two results here are: Theorem: (Falk, see [Bard Fal 1982]) The optimum value of a bi-level program occurs at an extreme point. Theorem: [Jer 1985] Bi-level programming is NP-complete. p-level linear programs are a direct generalization of p = 2 levels. Again, the game is sequenced-move with complete information, and each player must leave feasible moves for those yet to go. Variables are continuous. A practical example for p = 3 occurs when a CEO (Chief Executive Officer) specifies possibilities to a divisional president, who further specifies possible actions to a divisional executive.
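Falk's extreme-point theorem suggests a brute-force (exponential in general) way to solve very small bi-level programs: enumerate the extreme points of the joint polytope and keep only those at which the citizen's variable is actually the citizen's optimal reply. The sketch below assumes one leader variable x, one follower variable y, a bounded polygon, and a follower objective d*y with d ≠ 0 so that the reply is unique; all names (`vertices`, `follower_reply`, `solve_bilevel`) and the data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: each row (a, b, c) is the constraint a*x + b*y <= c, where
# x is the leader's ("policy maker's") variable and y the follower's ("citizen's").
Row = Tuple[Fraction, Fraction, Fraction]
Point = Tuple[Fraction, Fraction]


def vertices(rows: List[Row]) -> Set[Point]:
    """Extreme points of the bounded polygon {(x, y): a*x + b*y <= c for every row}."""
    found = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a point
        x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c for a, b, c in rows):
            found.add((x, y))
    return found


def follower_reply(rows: List[Row], x: Fraction, d: Fraction) -> Fraction:
    """Follower maximizes d*y with x fixed; d != 0 makes the reply unique.
    Assumes some row bounds y from below (b < 0) and some from above (b > 0)."""
    lower = max((c - a * x) / b for a, b, c in rows if b < 0)
    upper = min((c - a * x) / b for a, b, c in rows if b > 0)
    return upper if d > 0 else lower


def solve_bilevel(rows: List[Row], leader: Tuple[Fraction, Fraction],
                  d: Fraction) -> Optional[Tuple[Fraction, Point]]:
    """Best leader value over the extreme points that agree with the follower's reply."""
    best = None
    for x, y in vertices(rows):
        if y != follower_reply(rows, x, d):
            continue  # the follower would not stay at this point
        value = leader[0] * x + leader[1] * y
        if best is None or value > best[0]:
            best = (value, (x, y))
    return best


F = Fraction
# 0 <= x <= 3, y >= 0, 2 <= x + y <= 4; leader maximizes x + 2y, follower minimizes y.
rows = [(F(-1), F(0), F(0)), (F(1), F(0), F(3)), (F(0), F(-1), F(0)),
        (F(-1), F(-1), F(-2)), (F(1), F(1), F(4))]
print(solve_bilevel(rows, (F(1), F(2)), F(-1)))     # leader value 4 at (0, 2)
print(max(x + 2 * y for x, y in vertices(rows)))    # 8 if the leader also chose y
```

In this toy instance, treating the problem as a single LP (as if the follower shared the leader's goal) overstates the leader's attainable value by a factor of two. Exact `Fraction` arithmetic is used so that the test `y != follower_reply(...)` is not disturbed by rounding.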
Theorem: [Jer 1985] From the ability to solve (p + 1)-level linear programs up to 50% of the optimal value to the first-moving player, one can decide membership in sets at level Σp of the polynomial hierarchy.
(Upper bounds are not known.) We ask this question, in view of the above complexity results: How can a game-theoretic solution concept be "normative" when it cannot be computed? How can we recommend as a "solution" what we ourselves cannot implement? For a use of complexity theory to challenge a cooperative model interpretation in game theory, see [Chv 1978]. As a practical matter, Candler reports on a policy question solved to optimality as a linear program, under the assumption that the "citizens" share the goals of the "policy maker," versus the same question solved to within 1% as a bi-level program. The LP solution was wrong by a factor of two. It would be very useful to have better algorithms for bi-level programs. Artificial intelligence methods are viewed as particularly suited to ill-structured situations, in which a clear problem definition or goal statement may be lacking, and in which an implementable normative framework of a traditional type is lacking (see [Sim 1973]). It follows from Chvátal's work and the work cited above that, even in highly structured situations with clear normative measures, computational complexity alone can obscure the implementability of these measures. Potentially this provides another role for nontraditional (e.g. satisficing) approaches; this matter needs to be made precise. Conversely, in those instances of structured problems with usable normative measures, all approaches to problem-solving can be gauged by these measures.
LECTURE 8
THEOREM-PROVING TECHNIQUES WHICH UTILIZE DISCRETE PROGRAMMING
Summary: We show how ideas from discrete programming can be used in conjunction with theorem-proving techniques, with the potential to improve the efficiency of formal deduction. From discrete programming, we utilize the emphasis on propositional logic and, specifically, the use of incumbent-finding techniques for satisfying valuations. Incumbent-finding can also be performed efficiently by suitable list processing routines. From theorem-proving techniques, we utilize the simple form of unification for pure predicate logic with constant symbols. We focus on decidable fragments of this predicate logic, notably the G satisfiability fragment, although in principle all of predicate logic can be treated by the methods discussed here. For the decidable fragments, we develop finite algorithms with worst-case time bounds equal to the theoretical ones from complexity theory, provided that nondeterminism is replaced by exponentiation. This property ensures that our algorithm does not waste time unnecessarily. However, the intrinsic complexity of these fragments of logic is exceptionally high, and we expect that, while practical methods may utilize the gamut of known devices, exploiting problem structure will be essential. Our theorem-proving philosophy is influenced by Nevins' view [Nev 1974] on the value of doing efficient logic subroutines first, prior to utilizing routines which can cause explosive growth in space or time requirements. Our algorithm was described in [Jer 1985d].
8.1
Reduction of Predicate Logic to a Structured Propositional Logic
We begin with some general results on reducing pure predicate logic, with constant symbols, to propositional logic. The specific reductions here are not typically useful directly, but they will serve to guide our algorithm development. We proved the theorems below in [Jer 1985d], only to confirm later that much of this material was in the logical folklore (see e.g. [Den Lew 1983]); at the time we did not know a reference in the literature. We include a sketch of our proofs. Theorem: Let A = A(x, y) respectively B = B(x) be quantifier free, let x be of length n, and let C be the set of constants occurring in A resp. B. (If no constants occur in A resp. B, put C = {c} where c is a constant.) Let $\bar{C}$ denote the set of all vectors of constants of C of length n, and put t = |C|. Let d be a vector of constants of length n, none of which is drawn from C. Then:
(A) $(\exists x)B(x)$ is satisfiable iff $B(d)$ is satisfiable iff $(\exists x)B(x)$ has a model of size $\le n + t$.
(B) $(\forall x)B(x)$ is satisfiable iff $\bigwedge_{c \in \bar{C}} B(c)$ is satisfiable iff $(\forall x)B(x)$ has a model of size $\le t$.
(C) $(\exists x)(\forall y)A(x,y)$ is satisfiable iff $(\forall y)A(d,y)$ is satisfiable iff $(\exists x)(\forall y)A(x,y)$ has a model of size $\le n + t$.
Corollary: ∃-SAT is NP-complete.
Sketch of proof:
(A) If $(\exists x)B(x)$ has a model, let those elements which satisfy $B(x)$ be denoted by a vector of new constants d. From the model we obtain a truth valuation making $B(d)$ true. Conversely, from such a truth valuation we obtain a model making $B(d)$, and so $(\exists x)B(x)$, true.
(B) If $(\forall x)B(x)$ is satisfied in a model, that model provides a truth valuation making $\bigwedge_{c \in \bar{C}} B(c)$ true, since for each c ∈ C the model contains an element denoted by c. Conversely, from such a truth valuation, we obtain a model whose elements are exactly C in which $\bigwedge_{c \in \bar{C}} B(c)$, i.e. $(\forall x)B(x)$, is true.
(C) Combines ideas of (A) and (B).
Q.E.D.
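Part (B) of the theorem amounts to an algorithm: instantiate the universal variables with every vector of constants from C and test the resulting propositional formula for satisfiability. The sketch below is a minimal, exponential-time illustration under a hypothetical nested-tuple encoding of quantifier-free formulas; the names `atom`, `substitute`, `atoms`, `holds` and `universal_satisfiable` are illustrative, and brute-force enumeration stands in for the incumbent-finding techniques discussed in this lecture.

```python
from itertools import product

# Hypothetical encoding of quantifier-free formulas as nested tuples:
#   ("atom", predicate, args) | ("not", f) | ("and", f, g) | ("or", f, g)
# where each argument is either a variable name or a constant name.


def atom(pred, *args):
    return ("atom", pred, args)


def substitute(f, sub):
    """Replace variables by constants according to the dict `sub`."""
    if f[0] == "atom":
        return ("atom", f[1], tuple(sub.get(t, t) for t in f[2]))
    return (f[0],) + tuple(substitute(g, sub) for g in f[1:])


def atoms(f, acc):
    """Collect the ground atoms (predicate, args) of f into the set acc."""
    if f[0] == "atom":
        acc.add((f[1], f[2]))
    else:
        for g in f[1:]:
            atoms(g, acc)
    return acc


def holds(f, val):
    """Truth value of a ground formula under the valuation val: atom -> bool."""
    if f[0] == "atom":
        return val[(f[1], f[2])]
    if f[0] == "not":
        return not holds(f[1], val)
    if f[0] == "and":
        return holds(f[1], val) and holds(f[2], val)
    return holds(f[1], val) or holds(f[2], val)


def universal_satisfiable(matrix, variables):
    """Decide satisfiability of (∀ variables) matrix as in part (B): conjoin the
    instances over all vectors of the constants in the matrix, then try every
    truth valuation of the resulting propositional letters (exponential time)."""
    terms = {t for _, args in atoms(matrix, set()) for t in args}
    constants = sorted(terms - set(variables)) or ["c"]  # C = {c} if none occur
    instances = [substitute(matrix, dict(zip(variables, vec)))
                 for vec in product(constants, repeat=len(variables))]
    letters = sorted(set().union(*(atoms(g, set()) for g in instances)))
    for bits in product([False, True], repeat=len(letters)):
        val = dict(zip(letters, bits))
        if all(holds(g, val) for g in instances):
            return True
    return False


# (∀x)[(P(x) -> Q(x)) and P(a) and not Q(b)]: satisfiable (e.g. P(b) false).
m1 = ("and", ("or", ("not", atom("P", "x")), atom("Q", "x")),
      ("and", atom("P", "a"), ("not", atom("Q", "b"))))
# (∀x)[(P(x) -> Q(x)) and P(a) and not Q(a)]: the instance x = a is contradictory.
m2 = ("and", ("or", ("not", atom("P", "x")), atom("Q", "x")),
      ("and", atom("P", "a"), ("not", atom("Q", "a"))))
print(universal_satisfiable(m1, ["x"]), universal_satisfiable(m2, ["x"]))
```

The number of instances is $|C|^n$, which is where the exponential cost of this fragment shows up before any propositional search begins.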
$- b^{(i)}$ fails$\}$ for some $k$, $C_k^{(i)} y + D_k^{(i)} x < b_k^{(i)}\}$
where $C_k^{(i)}$ = k-th row of $C^{(i)}$, $D_k^{(i)}$ = k-th row of $D^{(i)}$, and $b_k^{(i)}$ = k-th element of $b^{(i)}$.
Let g be a function from {1, ..., t} such that g(i) is a row index of $C^{(i)}$ for all i. Then:
Now note that
iff for every basic feasible solution to
xi
+ xi
w i d8(i) ' ) z < ub wib('). 8 ( 3 if w we have ub 5 0 if w = 0 and we have Thus i f P # 0 (so UC = 0, u 1 0 implies ub 5 0) we have (Vg E P)S =
for some basic feasible solution to u c ZiWiC$) = 0,u 1 0,w 2 0 with w # 0 we have
+
xi wiD,(,)z (4
2 ub
# 0.
Z ; w ; + C;Uj = 1
+ xi ~ b $ ) }
We obtain an inner approximation as more b.f.s. are generated. In this context, if $x^*$ is the current solution to the "master problem", we add a new alternative inequality if, for some function g, we find a b.f.s. $(u^+, w^+)$ with a nonredundant inequality of the form $\sum_i w_i^+ D_{g(i)}^{(i)} x \ge u^+ b + \sum_i w_i^+ b_{g(i)}^{(i)}$. If we find no such basic feasible solution, we may stop with the current $x^*$. Here, in the universal (spatial) quantifier case, the search is more extensive, and can be computationally very demanding. Of course, this is due to the nature of this quantifier.
8.4
Logic as pre-processing routines for MIP: an example via the DP/DPL algorithm
In this section, we address this issue: to what extent do the DP and DPL algorithms generalize to MIP processing? During our discussion, we will also uncover an interesting relationship between branching and resolution. For simplicity, assume that each proposition Pj is replaced by a polyhedron. (Actually, Pj can be a general b-MIP.r. set; see our fuller discussion in [Jer 1986b].) In general, the variables in each Pj are shared. We also consider the Cartesian Product (CP) case, in which the variables $x^{(j)}$ occurring in Pj do not occur in Pi for i ≠ j. (The CP case generalizes truth valuations.) In this context, there also are polyhedral Pj and ¬Pj with
$P_j \cap \neg P_j = \emptyset$; there is a domain constraint
$$x \in P_j \cup \neg P_j \quad \text{all } j,$$
and Pj and ¬Pj are to have the same recession cone. This essentially is an instance of an embedding with D = ∅, and we have:
$$W = \bigcap_j (P_j \cup \neg P_j), \qquad \neg P_j = W \setminus P_j.$$
Suppose first that we are given a list of clauses in P1, ..., Pt. We ask if clausal chaining goes over. In fact, it does. We now explain why. If both Pj and ¬Pj occur in a clause, then x ∈ Pj ∪ ¬Pj by the domain constraint, so the clause can be deleted. If unit clauses Pj and ¬Pj both occur, the problem is inconsistent by Pj ∩ ¬Pj = ∅. If Pj occurs alone as a unit clause, we add "x ∈ Pj" as a side constraint; and we can delete any clause containing Pj, and remove ¬Pj from other clauses (by Pj ∩ ¬Pj = ∅). We next see that, in general, monotone variable fixing does not go over to MIP. E.g. if the clauses are P1 ∨ P2, P2 ∨ P3, we cannot assume "x ∈ P1." If P2, P3 ⊆ ¬P1, actually all solutions have x ∈ ¬P1 (if x ∈ P2 then x ∈ ¬P1; otherwise x ∈ P1 ∩ P3 ⊆ P1 ∩ ¬P1 = ∅, which is impossible). However, monotone fixing is valid in the Cartesian product case (as e.g. in propositional logic), for then the setting "x^(1) ∈ P1" does not affect x^(2) or x^(3). We next ask if splitting goes over to MIP. Here we are given three sets of clauses:
Pj ∨ R1, ..., Pj ∨ Ra
¬Pj ∨ S1, ..., ¬Pj ∨ Sb
T1, ..., Tc
In the resolution form splitting is
$$\bigwedge_{i,k} (R_i \vee S_k) \wedge \bigwedge_p T_p.$$
It is always implied (as x ∈ Pj ∪ ¬Pj). However, in general it need not be equivalent. We may have all R_i = P_i and all S_k = ¬P_i, so the problem is inconsistent, but the resolved form need not be. In the Cartesian product case, resolution splitting does give an equivalent problem (exercise). Splitting in the branching form is always equivalent, provided the relevant condition "x ∈ Pj" or "x ∈ ¬Pj" is added as a side constraint. This completes our discussion of the three basic subroutines of DP/DPL. Note that, as we proceed, we accumulate a list of side constraints, all of which are polyhedral. Thus all side constraints together define a polyhedron. Prior to splitting, we attempt to fathom the problem by solving the linear relaxation (LR) of the current list (using representations P'_i of the Pi):
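$$x \in \mathrm{Rel}\big((P_j \vee R_1) \wedge \cdots \wedge (P_j \vee R_a) \wedge (\neg P_j \vee S_1) \wedge \cdots \wedge (\neg P_j \vee S_b) \wedge T_1 \wedge \cdots \wedge T_c\big)$$
$$x \in \text{side constraints}$$
$$x \in \mathrm{Rel}(P_j \vee \neg P_j) \quad \text{all } j$$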
If we prove inconsistency, or obtain a solution, we may terminate calculation. We now ask: which version of splitting has the superior LR? Branching does, in all cases (consult Lecture 4 for the distributive laws):
Of course, $\mathrm{Rel}\big(\bigwedge_{i,k}(R_i \vee S_k) \wedge \bigwedge_p T_p\big)$ describes the linear relaxation of the problem resulting from resolution, while $\mathrm{Rel}\big(\bigwedge_i R_i \wedge \bigwedge_p T_p\big)$ respectively $\mathrm{Rel}\big(\bigwedge_k S_k \wedge \bigwedge_p T_p\big)$ is the linear relaxation of the problem on the first resp. second branch due to branching. In this MIP setting, any side constraints obtained in clausal chaining are already implied by the linear relaxation; hence in both branches more unit resolution will be possible than in the completely resolved form. This comparison of LRs is further strengthened if we take into account the additional side constraints from branching. Moreover, in the case of pure propositional logic, the above analysis shows that clausal chaining is at least as powerful in generating new side constraints in either branch as it is in the resolution form. We proved this earlier in [Bla Jer Low 1985]. The relaxation dominance of branching is clear from the above. Moreover, in most cases the sum of the sizes of both branches is smaller than that of the resolved form, although this statement is not a rigorous result. This kind of advantage of branching over resolution continues in the predicate logic treatment of Lecture 8. In the search tree process of locating a truth valuation, branching on a propositional letter will subsume doing all possible combinations of resolutions on that letter (prior to unification). Thus, branching on a letter resulting from unification subsumes all possible resolutions arising from that specific unification. In this section, we have seen how ideas and algorithms from logic can be used to aid in MIP solution, here in the form of pre-processing devices. This complements our earlier work in Lecture 5 and our remarks above on branching versus resolution, which generally show how to aid logic processing by devices from mathematical programming. Indeed, the two subjects of discrete programming and applied logic are very much interrelated.
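For the purely propositional case, the three DP/DPL subroutines discussed in this section (clausal chaining, monotone variable fixing, and splitting in branching form) can be sketched as below. This is a minimal illustration, not the polyhedral MIP version: it assumes a DIMACS-style encoding in which a literal is a nonzero integer and -k negates letter k, and the names `assume`, `dpl` and `satisfies` are hypothetical. In the MIP setting, the pure-literal step would be valid only in the Cartesian product case, as noted above.

```python
from typing import FrozenSet, Iterable, List, Optional, Set

# Hypothetical encoding (DIMACS-style): a literal is a nonzero int, -k is the
# negation of letter k, and a clause is a set of literals.
Clause = FrozenSet[int]


def assume(clauses: List[Clause], lit: int) -> List[Clause]:
    """Make lit true: delete clauses containing it, remove its negation elsewhere."""
    return [c - {-lit} for c in clauses if lit not in c]


def dpl(clauses: Iterable[Iterable[int]], chosen: Optional[Set[int]] = None) -> Optional[Set[int]]:
    """Return a set of literals satisfying every clause, or None if inconsistent."""
    chosen = set() if chosen is None else set(chosen)
    work = [frozenset(c) for c in clauses]
    while True:
        if any(not c for c in work):
            return None  # an empty clause: inconsistent on this branch
        units = [next(iter(c)) for c in work if len(c) == 1]
        literals = {lit for c in work for lit in c}
        pure = [lit for lit in literals if -lit not in literals]
        if units:
            lit = units[0]  # clausal chaining (unit resolution)
        elif pure:
            lit = pure[0]  # monotone variable fixing (pure literal)
        else:
            break
        chosen.add(lit)
        work = assume(work, lit)
    if not work:
        return chosen
    lit = next(iter(work[0]))  # splitting, in branching form
    for side in (lit, -lit):  # each branch records its choice as a side constraint
        result = dpl(assume(work, side), chosen | {side})
        if result is not None:
            return result
    return None


def satisfies(assignment: Set[int], clauses) -> bool:
    return all(any(lit in assignment for lit in c) for c in clauses)


print(dpl([[1, 2], [-1, 2], [-2, 3], [-3]]))  # None: unit resolution refutes it
example = [[1, 2], [-1, 3], [-2, 3]]
print(satisfies(dpl(example), example))
```

The split is done in branching form rather than by resolving the letter away, matching the relaxation-dominance argument above: each branch keeps its choice as an explicit side constraint instead of producing the a·b resolvents $R_i \vee S_k$.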
LECTURE 10
TASKS AHEAD
Summary: In this discursive lecture, we present some of our own views on trends and possible futures for Mathematical Programming, and we cite some general research projects which appear to be worthwhile. We mention the many research activities that connect parts of Artificial Intelligence to Operations Research, which are quite different from the connections covered in our lectures here. We conclude with some information on Artificial Intelligence instruction at a few business schools which we contacted. In trying to obtain a strategic picture of Mathematical Programming, we have been very much influenced by the ideas of Michael Porter in both [Por 1980] and [Por 1985], although the frameworks there do not directly apply to academic research areas. The main difficulty I found, in trying to anticipate future developments, has been the speed with which possibilities become realities in the very fast-moving areas of computer technology. Some of the developments I "guess" at here are already underway.
10.1
Three "top-down" Views of Mathematical Programming
We begin by giving three ways to view Mathematical Programming: intellectual history, academic settings for instructional programs, and end uses and users of our methods. All three of these views are at such a high level that virtually none of us will recognize much in common with our day-to-day activities. Nevertheless, such a "stratospheric" analysis is appropriate in order to see trends and opportunities, and to understand how our aggregate activities may be viewed by others outside our field. The section concludes with a prognosis and recommendations from my own perspective.
10.1.1
The Intellectual Heritage
Operations Research is an area, born during the Second World War out of the military needs of that period, which represented a fusion of several academic areas, notably those in Economics and Mathematics. We might draw the influences of the 1940's and 1950's in this manner:
[Figure 23: Intellectual Heritage (1940's and 50's), with influences including Economics and Graph theory and combinatorics]
In this diagram, the influence from linear algebra is represented by linear programming algorithms, that from real analysis by the nonlinear programming theory and algorithms of that period, and the influence from graph theory and combinatorics began, of course, with network flow theory. The "theory of the firm" to which we refer is largely microeconomic profit-maximization models, and "competition" is reflected in the early work in noncooperative game theory, which first entered Operations Research via zero-sum two-person games. Each link above can be identified with the research of one or a small number of individuals at this early stage of our field; let us not attempt to praise famous men by citing them!
In the mid 1980's, the picture looks more like this:
[Figure 24: Current Influences (1980's), with labels including Mathematics, numerical analysis, Electrical Engineering, Complexity, Computer Science, Data structures and algorithms, Database, Artificial Intelligence, and Psychology]
Here the influence of general equilibrium has entered Mathematical Programming through the development of pivoting methods and associated results in Mathematics and Game Theory. The effects of advances in these areas were widely felt in the 1960's and 1970's. Much of the influence of economic modeling in Operations Research attained national prominence in the work of W. Hogan and in later work by H. Greenberg on the PIES model and other energy models. Related work is continuing and we shall cite it in 10.3 below. The most recent change in the diagram is the emergence of a new field, Computer Science, created at the juncture of applied logic and electrical engineering. The engineers supply the hardware, plus studies in pattern recognition and signal processing; the logicians supply most of the theory for hardware and software, particularly the latter. Computer Science breaks down in turn into many subfields, as it has become increasingly specialized and tends to define itself more autonomously from its origins. Computational complexity was the first to influence Mathematical Programming, via the efforts of R. M. Karp, who built on J. Edmonds' earlier concept of a "good" algorithm and S. A. Cook's work in automata theory. The influence from data structures (for use in algorithms) originates in the work of F. Glover and D. Klingman on network algorithms. Initially their research proceeded from practical experience with Mathematical Programming algorithms, quite independently of Computer Science. Tarjan's monograph [Tarj 1983] has continued the intellectual emphasis in this important and growing area. In the 1980's diagram, I have drawn two dotted lines for connections which are now beginning to develop. More and more mathematical programmers are learning and occasionally teaching database query languages and related database topics, including database design via programming models. A.
Geoffrion's development in "structured modelling" [Geo 1985] reflects this connection as well, although it has a far broader scope, which I will cite below. While database is slowly moving into programming, at least outside computer science departments, information systems design and systems policy have moved into Management Information Systems, rather than Operations Research. However, a recent paper by D. Klingman, N. Phillips, and R. Padman, which details an extremely successful implementation of OR techniques along with techniques from AI and IS, describes how the starting point for applications work can lie in the information system. Our effort in these lectures has been to illustrate a part of the linkage that I have drawn between Mathematical Programming and Artificial Intelligence. I will overview other links briefly later in this lecture. What is significantly different about Artificial Intelligence is that, unlike
the other areas of Computer Science, it has been influenced by cognitive processing models drawn from Psychology. This influence has, in turn, led to the very ambitious goal of mimicking human intelligence. This goal may not be feasible, and we should not expect to see it achieved during our lifetimes, although some new computer architectures now being discussed may change things. What we can expect, and in fact do see now, is far more intelligent software which further assists in the decision-making processes of organizations. This latter decision support function is the primary role to which Mathematical Programming makes valuable contributions. Hence the connection to systematized human intelligence is important for us, and in that respect the role of Psychology is and will remain central. The potential for creating decision support systems with intelligent capabilities is a major theme of [Bon Hol Whi 1982], which strongly influenced my own efforts. A. Whinston is probably the first to have articulated this development, and to have arrived at this perspective from a background of Economics and Operations Research. His work and that of his collaborators and students are reflected in the research and projects of the Management Information Systems group at the Krannert School at Purdue, as well as in commercial product offerings for microcomputers, of which GURU is the most recent. In his approach, intelligent modelling is interpreted by and evaluated by traditional utility measures [Moo Whi 1986]. The link from Mathematical Programming to Economics has traditionally been weak and almost pro forma. The primary activity in this linkage has been algorithms for computing solutions, or results on structural forms of (conditions for) solutions. Relatively little work has been done on the interpretation of solutions or on solution concepts (e.g.
interpretive applied microeconomics), even though this was a focus of research initially [Koop 1951]; Wolsey's paper [Wol 1981] is a happy exception. Our failure to develop this link further is probably a missed opportunity. As a social entity, basic Operations Research has kept its values and reference points in Applied Mathematics, where most of the founders of the field were trained. Via the use of competing automata based on concrete AI paradigms, the link to economics may be strengthened.
10.1.2
Academic settings for Mathematical Programming
As we saw in the preceding lectures, Mathematical Programming has been historically defined as a collection of certain areas of Mathematics and, to a
lesser extent, of Economics, which are useful in many contexts of decision-making and resource allocation. The justification for a special identity for these topic areas is to allow a focus for the further development of these areas, and of the associated algorithms, modeling techniques and applications. The possibility for such an identity also arises for these reasons, via the support of user communities in many other academic areas, and in industries. The outstanding success of the Simplex Algorithm greatly aided this process. Operations Research is, in this regard, entirely similar to Statistics and theoretical aspects of Computer Science, Electrical Engineering, and other areas of Applicable Mathematics. We must keep in mind that, until fairly recently, Mathematics communities proper had little interest in applications, and little understanding of the role of computer experimentation in applied mathematics. Even within Applied Mathematics, there was substantial competition between "traditional" Applied Mathematics as used e.g. in physics, and the newer Applied Mathematics exemplified by Statistics, Computer Science, and Operations Research. Attitudes within the Mathematics communities have changed and are continuing to change (see e.g. [Renew 1984]). These productive developments will curtail unnecessary fissions and bring new vitality to Mathematics, as they remove the earlier "push" felt by applied mathematicians. However, the "pull" from user communities is, if anything, stronger than ever, so more and more applied mathematicians will be working in settings outside of Mathematics proper. The various "pushes" and "pulls" which earlier defined Operations Research represent forces which change over time. Thus its continued viability needs to be periodically re-examined, particularly when very new trends become evident.
To see the current state of affairs, we can begin by taking note of academic settings in which Operations Research and Mathematical Programming are represented. Here we take the perspective that Operations Research consists of Mathematical Programming plus several areas of Applied Stochastic Methods (including applied probability and statistics, simulation, stochastic processes and stochastic optimization). These two major divisions of Operations Research, which do overlap at many points (e.g. dynamic programming, stochastic programming, stochastic analysis of algorithms, etc.), are not always housed in the same academic unit. Mathematical Programming and Operations Research have been located in Colleges of Engineering, of Science, and sometimes of Liberal Arts. This diversity reflects in part the diverse academic settings for Mathematics and Applied Mathematics. In addition, Operations Research has been located in
Colleges of Management, which is a professional school setting, and in some instances Operations Research is a free-standing department. We can draw the different academic settings as follows:
Figure 25: Academic Settings (diagram; legible labels include Education, Management, and OR Dept)
The most common academic settings are in Departments of Mathematics, Statistics, Industrial and Systems Engineering, Computer Science, or Management. Operations Research is also carried out in some Departments of Electrical or Mechanical Engineering and of Economics; no doubt other settings have also arisen. In each case, there is usually significant adaptation at the level of the instructional program. For example, in the introductory linear programming course in a Management setting, one stresses recognition of real-world problems as linear programs, the use and interpretation of computer output, "what if" questions, and some decision-support issues. In Industrial Engineering, the focus for the same topic would be on the algorithms and software for computer solution. In Mathematics, issues of both numerical analysis and extensions to ordered fields arise, although today's students, with weak Mathematics backgrounds, can often explore these topics only at the graduate level. Thus the "same" course actually becomes three very different courses in three different academic settings. The diversity of settings, and the degree of adaptation to teaching programs, is very unusual. Equally surprising, at the intellectual level of research, is the cohesion of the field and the shared identity of its members as Operations Researchers. This derives from professional associations, meetings, and journals, which continue the values of the founders. As a consequence of the diversity of settings, changes in the content of Mathematical Programming will be driven by the needs of instructional programs as much as by intellectual thrusts.
R. JEROSLOW
10.1.3 Users' Perspectives
For a moment, let us adopt the view of users of Operations Research. They may see things in this way:
• Both Operations Research and Artificial Intelligence construct models to represent situations and have techniques to assist in achieving satisfactory outcomes.
• OR is stronger on numerical calculations.
• AI is stronger on calculations with symbols.
The users have problems which typically involve both numbers and symbols. Therefore, users will seek one source to go to for both kinds of calculations. Currently, there is no such source. This is not simply a matter of juxtaposing related disciplines. Fundamental approaches and issues in both fields are deeply intertwined. Substantial intellectual cross-fertilization is to be expected. In more specific detail, historically the primary user areas of Operations Research have included Production/Operations Management, Finance, and Marketing. In Production/Operations Management, sophisticated schematic techniques from Artificial Intelligence are being used, in conjunction with some Operations Research methods, to represent and to solve scheduling problems.
The relatively more developed application approach of expert systems is beginning to be used in Finance, Marketing, and Accounting. Retention of these application areas seems crucial, yet unsophisticated user communities can perceive that the choice of a technique from one area (Operations Research or Artificial Intelligence) precludes the use of techniques from the other area. Operations Research has historically placed a greater and longer-standing emphasis on the real-time performance of its algorithms on large-scale, real-world applications. This relative advantage, however, cannot be indefinitely guaranteed. For instance, for those problems which are accessible to solution by Horn clause expert systems, we saw earlier that linear-time algorithms exist to carry out the computation. On the other hand, some practitioners of Artificial Intelligence are gradually becoming more aware of ways in which their algorithms can be improved by use of quantitative algorithms. This presents a good opportunity to combine methods to mutual advantage. Well-developed AI technologies, such as expert systems, can be expected to move directly to the user communities as concrete applications. Other, more formative, AI symbolic technologies can profitably be combined with OR quantitative technologies, and even expert systems can be combined with other OR techniques and software modules in configuring large decision support systems. The opportunity to benefit users by combining traditional quantitative techniques with newer symbolic techniques is by no means restricted to Operations Research. It constitutes a "gap" of huge potential for many subject areas of application, for Computer Science, and for Applied Mathematics generally. The issue one must consider is the proper long-range positioning of Operations Research, to better exploit that part of these opportunities which is accessible to its methods.
Within business school environments, instruction in expert systems is emerging in Management Information Systems groups more readily than in Operations Research groups. This increases the likelihood that MIS will absorb more of AI concepts and technologies. There is also a trend for Operations Researchers to make commitments to the MIS area. While situations vary at different universities, overall it would be fruitful for MIS and OR to develop cooperative efforts, and for more OR academics to deepen their interest in MIS. It would be beneficial to do so for database, modelling, decision support, and implementation reasons, as well as for exploiting opportunities in AI.
Such a development would be, after all, only a continuation of the association between OR and computer theory and technology, which began when the Simplex Algorithm was first coded up. Had computers existed in the last century, very likely OR would have started then. Advances in computer technology have always been to the benefit of OR.
10.1.4 Some conclusions
Now about midway through this tenth lecture, it can hardly be surprising to the reader that I recommend the growth of the linkage between Operations Research and Artificial Intelligence. However, I see this linkage developing as a consequence of wider changes which are needed in the orientation and graduate training of Operations Research. I believe that, after nearly forty years, the base provided us by the founders of our field is no longer adequate, by itself, for the continued growth of the field. Operations Research remains viable on this base, but only for a decreased number of applied scientists. A contraction of the field, provided it also resulted in higher quality research, would not be bad per se; but the increasing isolation from user communities is a very negative development. To some extent, a contraction in the size of the Operations Research community can be offset by growth from abroad, as more developing nations seek to acquire the skills of our field. Export, however, is not an attractive path to take. Our efforts should also benefit our own nations. Moreover, export is a temporary solution. If we conceive of Operations Research today as being composed half of Applied Stochastic Methods and half of Mathematical Programming, I would propose that both these areas be reduced to one-third each, and that the remaining third be taken up primarily by Computer Science and Applied Logic, with contributions from Economics, Organizational Behavior and Psychology. Within Computer Science, Artificial Intelligence will naturally emerge as crucial; it does not need any special emphasis per se. The linkage to Applied Logic is equally crucial; a thorough one-semester course in logic should be in all OR curricula at the doctoral level, in my view. A second logic course is advisable. There is sufficient common interest between Operations Research and Applied Logic to act on it directly, rather than waiting for the "trickle down" through Computer Science.
The emphasis on Computer Science will help to take Operations Research further "downstream" toward the users. In addition, we need means of bypassing, to some extent, the many obstacles in moving from improved methodology to its use in practical contexts. We should not remain entirely dependent on the "trickle through" of our techniques to the functional areas of business, or on individual consulting efforts, even the best of which are necessarily of limited duration and scope. To this end, the establishment of a library of datasets of applied problems from industry is the crucial link to rapid, and rapidly used, advances in methodology. In terms specifically of the relation of OR to AI, I would not discount either the efforts or the risks involved in making the connections to AI. Such work is primarily for the more entrepreneurial among us. Nevertheless, as I shall discuss in the next section, several OR academics are already far along in this direction. A good place to begin is with an understanding of expert systems (e.g. [Har King 1985], [Hay Wat Len 1983]), including "hands-on" experience with the technology, and a reading of that part of the OR literature I will cite in 10.3 that seems most interesting to the reader. The shift to a weight of one-third on Computer Science and Applied Logic, as I have just suggested, does indeed imply that the two-thirds in our inheritance retains its value. After forty years in a much changed world, some adjustment would be expected. It remains important to retain our roots in Mathematics, and in axiomatic studies. I would expect that, following the influence from Applied Logic, much progress will come to depend on new uses of Mathematical Analysis.
10.2 Some research challenges related to these lectures
In this section, we sketch briefly several research projects related to our work, which we believe can be fruitfully implemented now. We divide these into two groups, according to whether they concern primarily MIP representability or primarily the interface of AI and OR.
10.2.1 Research on MIP representability
We list five potential projects:
10.2.1.1 Further integration of polyhedral combinatorics with disjunctive methods, including a detailing of potential applications.
10.2.1.2 Empirical studies of formulations which arise in practice; useful taxonomies; exploration of "automatic formulation" for a user description of the MIP; detailing of frequently occurring simplifications.
10.2.1.3 Non-lattice-theoretic treatments to obtain "fairly sharp" formulations, for problems which have neither efficient disjunctive formulations nor efficient combinatorial ones.
10.2.1.4 Delineation of cases where the linear relaxation (LR) "almost" solves the problem, with practical asymptotic results (perhaps related to the Shapley-Folkman-Starr theorem; see Cassels' proof).
10.2.1.5 Adaptation of search algorithms to best exploit good LRs, including dynamic reformulation at search nodes and postoptimality information.
Of these potential projects, 10.2.1.2 is the most needed and, as an empirical effort, it would require a small team of researchers.
10.2.2 Research on the AI/OR Interface
These five projects are all worthwhile:
10.2.2.1 Improvements in the representation and handling of propositional logic.
10.2.2.2 Use of theorem schema, saving of search trees, precompilation (to save effort at run time; this is a form of "learning").
10.2.2.3 Development of techniques for utilizing the Horn clause fragment of a problem which is "almost Horn".
10.2.2.4 Integration of logic techniques into frame-based reasoning; use of "generalized pointers".
10.2.2.5 Development of techniques for combining list processing routines of logic with OR "number crunching," for problems where both logic and linear structure occur.
Of these potential projects, work is underway at several OR locations on 10.2.2.1; some AI researchers are beginning to address 10.2.2.3; and some software systems currently offered claim capabilities similar to those mentioned in 10.2.2.4.
10.3 Some other research programs in the AI/OR Interface
Here we list those methodological efforts in the AI/OR interface of which we are aware, and which are not related to MIP representability. We would be happy to learn of other efforts.
10.3.1 Work on heuristics, borrowing from AI and OR (Glover, Pearl);
10.3.2 Integration with databases (Geoffrion; Klingman, Phillips, and Pad-);
10.3.3 Data structures (Tarjan, Glover and Klingman);
10.3.4 Model management (Greenberg, Vance), including automated explanations;
10.3.5 Techniques for configuring decision support systems from rule bases, databases, and algorithms (Whinston);
10.3.6 Uses of parallelism, particularly to aid search algorithms of MIP (Meyer, Phillips and Rosen, Balm);
10.3.7 Natural language interfaces for OR software (Greenberg);
10.3.8 Uses of intelligent systems to select among Mathematical Programming models and to guide search strategies (Schittkowski; Greenberg; Murphy and Stab; Minoux).
10.4 Some programs and courses in the AI/OR Interface
We did a nonsystematic survey in April and May 1986, by calling up some colleagues and asking them about AI activity in their group or department. In some cases, programs were under revision and may be very different now. We summarize the information briefly, in close to the order it came in. What emerges (except for Purdue) is a picture of AI as "starting up" in business schools, with the initial course being one in expert systems, given by the MIS group.
10.4.1 Purdue University, MIRC
The Management Information Research Center (MIRC) of the Krannert School has 800 undergraduates and 8-12 Ph.D. students in residence, with four faculty. Its main intellectual interests are: decision support systems; accounting information systems; and automated manufacturing. It is heavily into Artificial Intelligence and has been so for several years. Source: Andrew Whinston
10.4.2 University of Texas at Austin, Ph.D. Programs in MIS and OR
The MIS program has six required courses and eighteen courses available. Among the required courses are database, decision support, and database administration. Other courses include expert systems and artificial intelligence. The Management Science program requires five courses and an elective, and thirteen courses are available. Software design is required, and students can select two minor areas of three courses each. Source: Darwin Klingman
10.4.3 Carnegie-Mellon University, GSIA and SUPA
The course offerings in Artificial Intelligence are under major revision at the Graduate School of Industrial Administration (GSIA) and the School of Urban and Public Affairs (SUPA). Currently there is a general introduction to AI in two half-semester courses, followed by an expert systems course. A LISP workshop and a Texas Instruments Personal Consultant workshop are offered. A significant broadening of AI instruction is expected, including expansion to full-semester courses and a differentiation between Ph.D. (technical) and MBA (managerially-oriented) course offerings. Source: Peng Si Ow
10.4.4 University of Iowa, Management Sciences
An AI course including Prolog programming, practice with an expert system shell and exposure to automated AI planning has been taught twice and is expected to become a required course for Ph.D. students with an MIS emphasis. A workshop on management-oriented AI applications will also be offered frequently. Source: Colin Bell
10.4.5 University of Colorado at Boulder, MIS and OR
The Center for Applied Artificial Intelligence is a research center which includes OR and MIS faculty. About four courses have been or will be taught, including experimental courses. An expert systems course is a regular offering, using Texas Instruments Personal Consultant Plus. In 1986-87 an AI programming course will also be taught on an experimental basis.
Sources: Claude McMillan and David Monarchi
10.4.6 Northwestern University, MIS
Courses are offered in Decision Support Systems in Planning, Information Systems Analysis and Design, Database Management Systems, and Artificial Intelligence and Expert Systems. Source: Eitan Zemel and Benjamin Mittman
10.4.7 Duke University, the Fuqua School
A course on "Expert Systems in Management" has been offered twice on an experimental basis by Marketing faculty, and is likely to become a permanent offering. There is no MIS group and interest in AI is restricted to proven technologies. Software support has been through use of M.1, with current plans to switch to GURU. Sources: Joe Mazzola and John McCann.
10.4.8 Massachusetts Institute of Technology, the Sloan School
There is an expert systems course which is jointly listed with Computer Science. Other AI topics, including frames, are taught in other MIS courses. Students learn LOTUS, and learn LISP as the only programming language which is required in the introductory MIS course.
10.4.9 Georgia Institute of Technology, Management Science
A special topics course has been taught on decision support systems, which includes expert systems as a decision support tool and, potentially, as a form of assistance in working with a toolbench of integrated software packages. Readings have been used from the books by Harmon and King, Sprague and Watson, and Bonczek, Holsapple and Whinston. The course included exercises in PROLOG, and a project which required the construction of a simple expert system in TURBOPROLOG. A system shell was demonstrated (TI Personal Consultant Easy).
10.5 Guessing Ahead
The "shocks" to our industrial and geopolitical environments, which emanate from computer technology, are likely to increase in severity in the next five to ten years. This is due to the fact that ideas for drastic changes are numerous, as are well-funded experiments to test these ideas. Several of them will "pan out." Competitive pressures dictate continued experimentation. A stable technology is a distant dream. Having followed all these developments from the 1920-21 period of Emil Post's notebook, through Turing's 1930's conceptualization of a general-purpose computer, into the modern age, I am, personally, amazed at the effect of useful ideas on the practical world. What was needed for realization of the ideas was advances in materials, the commitment to proceed, and good development work. The challenge now is to manage the outcome of this process, from basic science to everyday use, and see what principles we can learn as the pace of the process accelerates. I am going to conclude the lecture series with a few guesses at likely developments which will affect Operations Research. They are mostly extrapolations of current trends or themes which seem to have near-term potential. I hope I will not be too embarrassed if I re-read this list in three or four years. I insert it here more as a "fun" way to conclude than as a serious exercise.
10.5.1 Probably
Exploratory programming environments will drop further in price to the "high price" end of personal computers, and will become available in non-LISP languages with high numerical computation capabilities, greatly accelerating the pace and diversity of software development. Comment: This was written in May 1986. Now in August 1986 I am aware of two product offerings which claim these capabilities, one priced at $5000 per copy, with a quantity discount.
10.5.2 Probably
Effective means of utilizing parallel and "non-von" architectures will become available, and will give a strong boost to artificial intelligence by allowing substantially more real-time processing of complex tasks. Comment: This is a very risky guess, since it is not at all clear how to exploit parallelism. I am betting based on the number and quality of scientists engaged in this work and the availability of new architectures for experimentation. There will be some surprises.
10.5.3 Probably
Linear programming will be put on a chip (once the algorithm framework stabilizes!), which will improve real-time response and search methods which rely on the linear relaxation. Comment: This idea was related to me in the 1960's. While technically feasible now, the proliferation of new LP algorithms has made it riskier to invest heavily in hard-wiring one algorithm. Thus the likelihood of 10.5.3 is smaller than three years ago. A compromise solution is the hard-wiring of certain common subroutines, e.g. Gaussian elimination. Let us hope for fewer surprises in the future!
10.5.4 Probably
Computers will become available which utilize analog means of processing 3-dimensional spatial scenes. When combined with new methods for spatial representation of symbolic structures, this will allow powerful techniques for pattern matching, reasoning by analogy, and several kinds of learning. Comments: This guess reflects my bias that most of human intelligence is based on combining a vast data store and an intricate pointer system in it (for cross-referencing) with a very powerful pattern matcher, probably adapted from vision centers. Logical deduction, which is often associated with human uniqueness, is probably done in a "recent" and small set of circuits, primarily as a check on the matching activity. I do not believe that people do logic particularly well. That is why they are so aware of it, as when one consciously places one's feet when beginning to walk. Even the construction of formal proofs is probably supported by a similar symbol matcher. The pointer structure in the memory store can provide research problems for logicians, as the pattern matcher can for analysts.
Of course, all these guesses may be wrong.
Atlanta, Georgia
August 1986
Revised November 1986, March 1987 and 1988.
ILLUSTRATIVE EXAMPLES
1. Give a representation for the condition: $x \in [L_1, U_1]$ or $x \in [L_2, U_2]$ or $\ldots$ or $x \in [L_t, U_t]$. Simplify as far as possible (as always).
2. Give a representation for the graph of a piecewise linear function of one variable, with the two line segments:
Figure 26: Piecewise Linear Function
R. JEROSLOW
184
3. Give a representation for the graph of this function, where $0 < u_1$:
if $x_1 = 0$
4. Give a representation for the epigraph of the following function, which uses only one binary variable (and no other integer variable). (In general, $t$ convex sections require $t - 1$ binary variables.)
Figure 27: Two Convex Sections
EXAMPLES
In each of the three problems to follow, decide whether (from the point of view of the linear relaxation) it is better to represent the function on the left directly, or as the sum or minimum of the two functions on the right. (Use graph representability and a disjunctive representation.) (HINT: first obtain $\min(u, v)$ for $0 \le u, v \le 2$.)
5. Figure 28: Concave Sum (axis label $x_2$; labelled point $(0, -1)$)
6. Figure 29: Linear Sum (axis label $x_2$)
7. Figure 30: Minimum of Functions
8. Suppose that, for at least one $i = 1, \ldots, m$, the following constraint system holds: $\sum_{j=1}^{n} a_{ij} x_j \le b_i$, all $x_j \ge 0$. Develop a disjunctive characterization when all $a_{ij} > 0$ and all $b_i \ge 0$.
9. Give a representation of the graph of the AND function, i.e. $x_1, x_2, x_3$ are binary variables where $x_3 = 1$ iff $x_1 = x_2 = 1$.
10. What occurs if the disjunctive construction is used on the following condition? The variables $x_1, x_2, x_3$ are continuous and nonnegative and:
either $x_1 + 2x_2 + x_3 \le 10$ or $2x_1 + x_2 \le 5$.
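For Example 9, a frequently used linear description of the AND graph is $x_3 \le x_1$, $x_3 \le x_2$, $x_3 \ge x_1 + x_2 - 1$ with $0 \le x_j \le 1$. It is offered here only as a point of comparison, not as the formulation the disjunctive construction in the Solutions produces. The sketch below (helper names are our own) enumerates the 0-1 points to confirm that these inequalities cut out exactly the four points of the graph.

```python
from itertools import product


def and_lr(x1, x2, x3) -> bool:
    """Hypothetical helper: the common linear description of x3 = x1 AND x2,
    namely x3 <= x1, x3 <= x2, x3 >= x1 + x2 - 1, every variable in [0, 1]."""
    in_box = all(0 <= v <= 1 for v in (x1, x2, x3))
    return in_box and x3 <= x1 and x3 <= x2 and x3 >= x1 + x2 - 1


def and_graph():
    """The four points (x1, x2, x1 AND x2) of the graph."""
    return {(a, b, a & b) for a, b in product((0, 1), repeat=2)}


# Binary points accepted by the inequalities are exactly the graph points.
binary_ok = {p for p in product((0, 1), repeat=3) if and_lr(*p)}
print(sorted(binary_ok))  # prints [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```

Fractional points such as $(\tfrac12, \tfrac12, \tfrac12)$ also satisfy the inequalities; that one is the midpoint of $(0,0,0)$ and $(1,1,1)$, so it lies in the convex span of the graph.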
11. Provide a disjunctive formulation of the following condition, which has the convex span as its linear relaxation. Simplify this formulation as far as possible.
The condition is: "Exactly two of the nonnegative variables $x_1, x_2, x_3$ and $x_4$ are positive, and those which are positive lie in the interval $[1, 10]$. Moreover, if $x_1$ is positive, then either $x_2$ or $x_3$ is positive." After giving your formulation, determine whether this "standard" formulation has the same linear relaxation:
$z_j \le x_j \le 10 z_j$,
$z_1 + z_2 + z_3 + z_4 = 2$,
$z_1 \le z_2 + z_3$,
$z_j \in \{0, 1\}$ for $j = 1, 2, 3, 4$.
(HINT: The standard formulation has $x_1 = 5$, $x_2 = \tfrac{1}{2}$, $x_3 = 0$, $x_4 = 10$ possible in its LR.)
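The hint to Example 11 can be checked directly with exact fractions. This is a sketch under stated assumptions: the value $x_2 = \tfrac12$ is our reading of the hint, and the binary values $z = (\tfrac12, \tfrac12, 0, 1)$ completing the point are our own choice. The second helper lists which pairs of variables may be the two positive ones, which is the reason no point of the convex span has $x_1 = 5$ together with $x_4 = 10$.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)
# Hint point (x2 = 1/2 is our reading) and binaries of our own choosing.
x = (Fraction(5), HALF, Fraction(0), Fraction(10))
z = (HALF, HALF, Fraction(0), Fraction(1))


def in_standard_lr(x, z) -> bool:
    """LR of the standard formulation: z_j <= x_j <= 10 z_j, 0 <= z_j <= 1,
    z1 + z2 + z3 + z4 = 2 and z1 <= z2 + z3 (0-based indices in the code)."""
    return (all(zj <= xj <= 10 * zj for xj, zj in zip(x, z))
            and all(0 <= zj <= 1 for zj in z)
            and sum(z) == 2
            and z[0] <= z[1] + z[2])


def allowed_supports():
    """0-based index pairs that may be the two positive variables: if x1
    (index 0) is positive, then x2 or x3 (index 1 or 2) must be positive too."""
    return [s for s in combinations(range(4), 2)
            if 0 not in s or 1 in s or 2 in s]


# Every feasible point has x4 <= 10, so a convex combination reaches x4 = 10
# only if each point in it has x4 = 10 > 0.  Such points have x2 or x3 as
# the other positive variable, hence x1 = 0: so x1 = 5 with x4 = 10 lies
# outside the convex span, although it satisfies the standard LR.
print(in_standard_lr(x, z), [s for s in allowed_supports() if 3 in s])
```

The printed pairs are the supports containing $x_4$; neither contains $x_1$.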
12. (This example is useful after Lecture 5, and after reading the paper by Blair, Jeroslow and Lowe on MIP approaches to propositional logic. It is concerned with Tseitin's efficient algorithm for obtaining an equivalent conjunctive normal form (CNF).) Use the linear-time CNF procedure to obtain a CNF (possibly in additional letters) for the following condition of propositional logic. Compare the linear relaxation of this formulation to that obtained from the usual CNF. The condition is:
$(P_1 \wedge P_2 \wedge P_3) \vee (P_1 \wedge P_5 \wedge P_6) \vee \neg(P_1 \rightarrow P_7 \vee P_8)$
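Tseitin's construction, which Example 12 refers to, names every compound subformula with a new letter. The sketch below assumes a nested-tuple encoding of formulas that is our own: each and/or node gets a fresh letter $y$ with clauses for $y \leftrightarrow$ node, an implication $A \rightarrow B$ is rewritten as $\neg A \vee B$, and negation only flips a sign. This two-sided encoding is one common variant; the linear-time procedure intended in the exercise may keep only one direction of each equivalence. `evaluate` is included so the output can be checked against the original formula.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical formula encoding: ("var", "P1"), ("not", f),
# ("and", f, g, ...), ("or", f, g, ...), ("imp", f, g).


def tseitin(formula) -> Tuple[List[List[int]], Dict[str, int]]:
    """Return CNF clauses (lists of signed ints) and the integer code of each
    original letter.  Each and/or node gets a fresh letter y plus clauses for
    y <-> node, so the output grows linearly with the formula."""
    clauses: List[List[int]] = []
    letters: Dict[str, int] = {}
    counter = itertools.count(1)

    def encode(f) -> int:
        op = f[0]
        if op == "var":
            if f[1] not in letters:
                letters[f[1]] = next(counter)
            return letters[f[1]]
        if op == "not":
            return -encode(f[1])  # negation needs no new letter
        if op == "imp":
            return encode(("or", ("not", f[1]), f[2]))
        args = [encode(g) for g in f[1:]]
        y = next(counter)
        if op == "and":
            clauses.extend([-y, a] for a in args)     # y implies every argument
            clauses.append([y] + [-a for a in args])  # all arguments imply y
        elif op == "or":
            clauses.append([-y] + args)               # y implies some argument
            clauses.extend([y, -a] for a in args)     # any argument implies y
        else:
            raise ValueError(f"unknown connective {op!r}")
        return y

    clauses.append([encode(formula)])  # finally assert the whole formula
    return clauses, letters


def evaluate(f, val: Dict[str, bool]) -> bool:
    """Truth value of formula f under an assignment to its letters."""
    op = f[0]
    if op == "var":
        return val[f[1]]
    if op == "not":
        return not evaluate(f[1], val)
    if op == "and":
        return all(evaluate(g, val) for g in f[1:])
    if op == "or":
        return any(evaluate(g, val) for g in f[1:])
    return (not evaluate(f[1], val)) or evaluate(f[2], val)  # "imp"
```

For a formula with $k$ and/or nodes this emits at most one clause per node argument plus one per node, plus the final unit clause, which is where the linear bound comes from.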
13. Apply the Davis-Putnam procedure, in branching form, to determine the satisfiability of the following condition of propositional logic. Use $P_1, P_2, P_3$ as the order of choosing branching variables.
...
The condition is:
Pi
V-P,
-Pi
ViP3 VP4
P3
V7P4
Pa
VP3
VP4
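Example 13 asks for the branching form of the Davis-Putnam procedure. A minimal sketch, assuming clauses are frozensets of signed integers ($+k$ for $P_k$, $-k$ for its negation): it applies unit propagation, then branches on the first letter of the given order that still occurs, trying true before false. The pure-literal rule is omitted, and the helper names and the test clauses are hypothetical rather than those of the exercise.

```python
from typing import Dict, FrozenSet, List, Optional

# A literal is a nonzero int: +k stands for P_k and -k for "not P_k".
Clause = FrozenSet[int]


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses and delete the opposite literal
    from the rest.  Returns None if some clause becomes empty."""
    result: List[Clause] = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = clause - {-lit}
        if not reduced:
            return None
        result.append(reduced)
    return result


def dpll(clauses: List[Clause], order: List[int],
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Branching Davis-Putnam search with unit propagation.  Returns a
    (possibly partial) satisfying assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    if any(not c for c in clauses):
        return None
    while True:  # unit propagation: a one-literal clause forces that literal
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
        if clauses is None:
            return None
    if not clauses:
        return assignment
    # Branch on the first letter of `order` still present (else the smallest).
    present = {abs(l) for c in clauses for l in c}
    var = next((v for v in order if v in present), min(present))
    for lit in (var, -var):  # try P_var true first, then false
        reduced = simplify(clauses, lit)
        if reduced is not None:
            found = dpll(reduced, order, {**assignment, var: lit > 0})
            if found is not None:
                return found
    return None
```

Letters missing from the returned dictionary may take either value, since every clause is already satisfied without them.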
14. Determine whether the following set of Horn clauses is satisfiable and, if so, give a satisfying valuation:
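Questions like Example 14 can be settled mechanically. The sketch below is a counter-based forward-chaining procedure in the spirit of the linear-time Horn methods mentioned in Lecture 10: each clause tracks how many of its body letters are not yet known true, and its head is forced when that count reaches zero. The clause encoding (a body list and a head, with `None` for an all-negative clause) and the name `horn_sat` are our own assumptions, and the test clauses are hypothetical. When a set is returned, making those letters true and all others false is a satisfying valuation (in fact the minimal one).

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

# Hypothetical encoding: (body, head) means "b1 and ... and bk imply head";
# head None encodes the all-negative clause "not b1 or ... or not bk".
HornClause = Tuple[List[str], Optional[str]]


def horn_sat(clauses: List[HornClause]) -> Optional[Set[str]]:
    """Return the minimal model (letters set true) or None if unsatisfiable.

    Each letter is queued at most once and each body occurrence is counted
    down at most once, so the work is linear in the total clause length."""
    remaining = [len(set(body)) for body, _ in clauses]
    watchers: Dict[str, List[int]] = {}
    for idx, (body, _) in enumerate(clauses):
        for letter in set(body):
            watchers.setdefault(letter, []).append(idx)

    true_letters: Set[str] = set()
    queue: Deque[str] = deque()

    def fire(idx: int) -> bool:
        """Body of clause idx is all true: force its head (False on conflict)."""
        head = clauses[idx][1]
        if head is None:
            return False
        if head not in true_letters:
            true_letters.add(head)
            queue.append(head)
        return True

    for idx, count in enumerate(remaining):
        if count == 0 and not fire(idx):  # facts and empty negative clauses
            return None
    while queue:
        letter = queue.popleft()
        for idx in watchers.get(letter, []):
            remaining[idx] -= 1
            if remaining[idx] == 0 and not fire(idx):
                return None
    return true_letters
```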
15. For the following logic problem, give an alternate formulation of your own which has a better relaxation (LR) than the "standard formulation." Show that your LR is better. (HINT: Look at the alternative settings and use these in a disjunctive formulation, possibly introducing new variables. A convex listing is too long, so "condition" your analysis on the four possible truth values for $P_1$ and $P_2$. To show that your LR is stronger, it may help to note that (3, i, i, 1, 1, i), or some related point, solves the given LR but perhaps not the stronger one.)
SOLUTIONS TO EXAMPLES
1. $P_i = \{x \mid L_i \le x \le U_i\}$, $\mathrm{rec}^*(P_i) = \{0\}$. A disjunctive formulation is:
Actually, from the simplification given in Lecture 2, as long as $L_i \le U_i$ for $i = 1, \ldots, t$, the following simplification is also sharp:
A second disjunctive representation uses a different description of the same polyhedra:
$P_i = \{x \mid \text{for some } y,\ x = yL_i + (1 - y)U_i,\ 0 \le y \le 1\}$,
$\mathrm{rec}^*(P_i) = \{x \mid \text{for some } y,\ x = y(L_i - U_i),\ y = 0\} = \{0\}$.
The disjunctive formulation is:
This simplifies upon substituting out for each di).
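For the union of intervals in Example 1, a common form of the simplified sharp formulation is $\sum_i L_i\lambda_i \le x \le \sum_i U_i\lambda_i$, $\sum_i \lambda_i = 1$, $\lambda_i \in \{0, 1\}$; we assume that form here. The sketch below checks two things with exact arithmetic: a unit vector $\lambda = e_i$ reproduces $[L_i, U_i]$, and every $x$ in $[\min_i L_i, \max_i U_i]$ (the convex span of the union) is reached by some fractional $\lambda$, so the linear relaxation is exactly the convex span. The names `bounds` and `lr_weights` are our own.

```python
from fractions import Fraction
from typing import List, Tuple


def bounds(L: List[Fraction], U: List[Fraction],
           lam: List[Fraction]) -> Tuple[Fraction, Fraction]:
    """Range for x allowed by sum_i L_i*lam_i <= x <= sum_i U_i*lam_i."""
    lo = sum((l * w for l, w in zip(L, lam)), Fraction(0))
    hi = sum((u * w for u, w in zip(U, lam)), Fraction(0))
    return lo, hi


def lr_weights(L: List[Fraction], U: List[Fraction], x: Fraction) -> List[Fraction]:
    """Fractional lam >= 0 with sum(lam) = 1 placing x inside bounds(L, U, lam),
    for any x with min(L) <= x <= max(U) (assumes L_i <= U_i for every i)."""
    a = min(range(len(L)), key=lambda i: L[i])
    b = max(range(len(U)), key=lambda i: U[i])
    if not L[a] <= x <= U[b]:
        raise ValueError("x lies outside [min L, max U]")
    lam = [Fraction(0)] * len(L)
    if x <= U[a]:
        lam[a] = Fraction(1)  # x already lies in interval a
    else:
        # Here U[a] < x <= U[b], so a != b; choose s so the upper bound is x.
        # The lower bound is then at most x because each L_i <= U_i.
        s = (x - U[a]) / (U[b] - U[a])
        lam[a], lam[b] = 1 - s, s
    return lam
```

For example, with intervals $[0, 2]$, $[5, 6]$, $[9, 10]$ the point $x = 3$ lies in a gap of the union, yet `lr_weights` returns $\lambda = (\tfrac78, 0, \tfrac18)$, which places it in the relaxation; this is expected, since the relaxation of a sharp formulation is the convex span, not the union.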
2.
Simplifies to (use $\lambda = \lambda_1$):
3.
SOLUTIONS
Simplification gives (using X = ~
2 ,=
y(')):
and similarly $\mathrm{rec}^*(P_2) = \{(x_1, x_2) \mid x_1 = 0,\ x_2 \ge 0\}$. The disjunctive formulation simplifies to (after substituting for
(z)
(l)
21 9 2 1 922
=PI):
To obtain only one binary variable $\lambda$ ($= \lambda_2$), use e.g. $\lambda_1 = 1 - \lambda$ in the above and delete "$\lambda_1 + \lambda_2 = 1$."
In Examples 5 to 7, the left-hand-side (LHS) function is called $f$, and the right-hand-side (RHS) functions are called $g$ and $h$. LR = linear relaxation.
5. (3, Q) is in conv(grph(g)) (i.e. the convex span of the graph of $g$) and also (3, 3) ∈ conv(grph(h)). So (3, y) is in the LR of the RHS formulation, but not in the LR of the LHS formulation. The LHS formulation is superior.
6. (1, i) ∈ conv(grph(g)) and (1, 0) ∈ conv(grph(h)), so (1, i) is in the LR of the RHS formulation, but not of the LHS formulation.
Again the LHS formulation is superior.
7. When the $\min(u, v)$ function is formulated disjunctively, its LR (linear relaxation) is the convex span of $(t, u, v) = (0, 0, 0), (0, 2, 0), (0, 0, 2)$ and $(2, 2, 2)$. Therefore $(z, x_1, 1 - x_1)$ is in the LR of these four points when the RHS (i.e. separate) formulation is used. Thus for suitable $\theta_1, \theta_2, \theta_3, \theta_4$ we have:
(1) $z = 2\theta_4$, $x_1 = 2\theta_2 + 2\theta_4$, $1 - x_1 = 2\theta_3 + 2\theta_4$, $\theta_1 + \theta_2 + \theta_3 + \theta_4 = 1$, all $\theta_k \ge 0$.
In fact, (1) is simply long-hand for $(z, x_1, 1 - x_1) = \theta_1(0,0,0) + \theta_2(0,2,0) + \theta_3(0,0,2) + \theta_4(2,2,2)$. Note that (1) implies
(2) $2\theta_2 + 2\theta_3 + 4\theta_4 = 1$
by adding the rows for $x_1$ and $1 - x_1$. We will use (2) later.
When the LHS (i.e. pooled) representation is used, $(z, x_1)$ is in the LR iff it is in the convex span of $(0, 0)$, $(1, 1)$ and $(0, 2)$, i.e. iff there are $\delta_1, \delta_2, \delta_3$ with $\delta_1 + \delta_2 + \delta_3 = 1$ and all $\delta_k \ge 0$ such that
(3) $(z, x_1) = \delta_1(0, 0) + \delta_2(1, 1) + \delta_3(0, 2)$, i.e. $z = \delta_2$, $x_1 = \delta_2 + 2\delta_3$.
It suffices to show that a solution to (1) gives a solution to (3). (We know that the converse is true, since the pooled formulation is best possible.) Given a solution to (1), put $\delta_2 = 2\theta_4$, $\delta_3 = \theta_2$. Then from (1) we at once have $z = \delta_2$ and $x_1 = 2\delta_3 + \delta_2$. Also, using (2), we get $\delta_2 + \delta_3 = \theta_2 + 2\theta_4 \le 2\theta_2 + 2\theta_3 + 4\theta_4 = 1$, so if we set $\delta_1 = 1 - \delta_2 - \delta_3$ we get $\delta_1 \ge 0$. So we do obtain a solution to (3).
We conclude: as far as the linear relaxations are concerned, here the pooled and separate formulations are equally good. This outcome is atypical, as commonly the pooled formulation is strictly superior.
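The substitution in Solution 7 is easy to check numerically. A sketch under stated assumptions: it uses exact rational arithmetic, helper names of our own, and the coordinates $(z, x_1, 1 - x_1)$ of the text. It draws random multipliers $\theta$ satisfying (1) and (2), maps them by $\delta_2 = 2\theta_4$, $\delta_3 = \theta_2$, $\delta_1 = 1 - \delta_2 - \delta_3$, and confirms that (3) holds.

```python
import random
from fractions import Fraction


def random_theta(rng: random.Random):
    """Exact multipliers theta1..theta4 >= 0 with sum 1 that also satisfy
    (2): 2*theta2 + 2*theta3 + 4*theta4 = 1."""
    a, b, c = (Fraction(rng.randint(0, 20)) for _ in range(3))
    if a + b + c == 0:
        a = Fraction(1)
    scale = 2 * a + 2 * b + 4 * c
    t2, t3, t4 = a / scale, b / scale, c / scale
    return (1 - t2 - t3 - t4, t2, t3, t4)  # theta1 >= 1/2 follows from (2)


def separate_to_pooled(theta):
    """The substitution of the text: delta2 = 2*theta4, delta3 = theta2,
    delta1 = 1 - delta2 - delta3."""
    _, t2, _, t4 = theta
    d2, d3 = 2 * t4, t2
    return (1 - d2 - d3, d2, d3)


def check(theta) -> bool:
    """Verify that the point given by (1) satisfies (3) under the substitution."""
    _, t2, t3, t4 = theta
    z, x1 = 2 * t4, 2 * t2 + 2 * t4      # rows of (1) for z and x1
    row_ok = 1 - x1 == 2 * t3 + 2 * t4   # row of (1) for 1 - x1
    d1, d2, d3 = separate_to_pooled(theta)
    return (row_ok and min(d1, d2, d3) >= 0 and d1 + d2 + d3 == 1
            and z == d2 and x1 == d2 + 2 * d3)
```

Exact fractions are used so that equalities such as $\delta_1 + \delta_2 + \delta_3 = 1$ are tested without rounding error.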
Then PI = {%I for some 01,
...,O n > 0 with xy Oj = 1,
we have z = xy[Oj-$]ej} where ej is the j-th unit vector. We have that the desired set S =
PI u ...u P,.
After simplification, we obtain:
9.
One way of representing it is from the four polyhedra:
The disjunctive construction gives:
This simplifies to:
Xl+Xz+X3