Praise for Stochastic Local Search: Foundations and Applications

'Hoos and Stützle, two major players in the field, provide us with an excellent overview of stochastic local search. If you are looking for a book that covers all the major metaheuristics, gives you insight into their working and guides you in their application to a wide set of combinatorial optimization problems, this is the book.'
Marco Dorigo, Université Libre de Bruxelles

'Stochastic Local Search: Foundations and Applications provides an original and synthetic presentation of a large class of algorithms more commonly known as metaheuristics. Over the last 20 years, these methods have become extremely popular, often representing the only practical approach for tackling so many of the hard combinatorial problems that are encountered in real-life applications. Hoos and Stützle's treatment of the topic is comprehensive and covers a variety of techniques, including simulated annealing, tabu search, genetic algorithms and ant colony optimization, but a main feature of the book is its proposal of a most welcome unifying framework for describing and analyzing the various methods.'
Michel Gendreau, Université de Montréal

'Local search algorithms are often the most practical approach to solving constraint satisfaction and optimization problems that admit no fast deterministic solution. This book is full of information and insights that would be invaluable for both researchers and practitioners.'
Henry Kautz, University of Washington

'This extensive book provides an authoritative and detailed exposition for novices and experts alike who need to tackle difficult decision or combinatorial optimization problems. The chapters span fundamental theoretical questions such as, "When and why do heuristics work well?" but also more applied aspects involving, for instance, the comparison of very different algorithms. The authors are university faculty members and leading players in their research fields; our communities will enjoy in particular their book's valuable teaching material and a "complete" bibliography of the state of the art for the field.'
Olivier Martin, Université Paris-Sud, Orsay

'The authors provide a lucid and comprehensive introduction to the large body of work on stochastic local search methods for solving combinatorial problems. The text also covers a series of carefully executed empirical studies that provide significant further insights into the performance of such methods and show the value of an empirical scientific methodology in the study of algorithms. An excellent overview of the wide range of applications of stochastic local search methods is included.'
Bart Selman, Cornell University

'Stochastic local search is a powerful search technique for solving a wide range of combinatorial problems. If you only want to read one book on this important topic, you should read Hoos and Stützle's. It is a comprehensive and informative survey of the field that will equip you with the tools and understanding to use stochastic local search to solve the problems you come across.'
Toby Walsh, Cork Constraint Computation Centre, University College Cork

'This book provides remarkable coverage and synthesis of the recent explosion of work on randomized local search algorithms. It will serve as a good textbook for classes on heuristic search and metaheuristics as well as a central reference for researchers. The book provides a unification of a broad spectrum of methods that enables concise, highly readable descriptions of theoretical and experimental results.'
David L. Woodruff, University of California, Davis
Stochastic Local Search: Foundations and Applications
Holger H. Hoos Department of Computer Science University of British Columbia Canada Thomas Stützle Department of Computer Science Darmstadt University of Technology Germany
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
Senior Editor: Denise Penrose
Publishing Services Manager: Simon E. M. Crump
Editorial Coordinator: Emilia Thiuri
Editorial Assistant: Valerie Witte
Cover Design: Gary Ragaglia and Holger H. Hoos
Cover Image: © Thomas Morse/Chuck Place Photo, "Antelope Canyon"
Text Design: Rebecca Evans and Associates
Composition: Kolam USA
Technical Illustration: Dartmouth Publishing, Inc.
Copyeditor: Lori Newhouse
Proofreader: Calum Ross
Indexer: Robert Swanson
Interior printer: The Maple-Vail Book Manufacturing Group
Cover printer: Phoenix Color
Morgan Kaufmann Publishers is an imprint of Elsevier. 500 Sansome Street, Suite 400, San Francisco, CA 94111 This book is printed on acid-free paper. © 2005 by Elsevier Inc. All rights reserved. Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.” Library of Congress Cataloging-in-Publication Data Application submitted ISBN: 1-55860-872-9 For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com. Printed in the United States of America 04 05 06 07 08 5 4 3 2 1
[Science] ... is not a steady march from ignorance to knowledge. It's more like a mountaineering expedition. On the way up an unscaled peak, climbers will gain some altitude on one route, then find it's a dead end. They'll spot a better one, backtrack a little and move on. The fact that they sometimes have to take a step backward for every two steps forward doesn't mean they are wasting their time. It means that inching up an uncharted mountain is tough work. When you step back, though, and take a look at the overall picture — a long view from the upper slopes of the mountain — it turns out in hindsight that the path was clear. (Michael D. Lemonick, Science Writer)
This book is dedicated to our parents Dieses Buch ist unseren Eltern gewidmet
Eva-Marie Hoos & Hans-Helmut Hoos Berta Stützle & Günther Stützle
About the Authors Holger H. Hoos is an Assistant Professor at the Computer Science Department of the University of British Columbia (Canada). His Ph.D. thesis on stochastic local search algorithms for computationally hard problems in artificial intelligence, completed in 1998 at Darmstadt University of Technology (Germany), received the ‘Best Dissertation Award 1999’ of the German Informatics Society. He has been working on the design and empirical analysis of stochastic local search algorithms since 1994, and his research in this area has been published in book chapters, journal articles and at major conferences in AI and OR. Holger’s research interests are currently focused on topics in artificial intelligence, bioinformatics, empirical algorithmics and computer music. At the University of British Columbia, he is a founding member of the Bioinformatics, Empirical & Theoretical Algorithmics Laboratory (BETA-Lab), a member of the Laboratory for Computational Intelligence (LCI), and a faculty associate of the Peter Wall Institute for Advanced Studies. Thomas Stützle is an Assistant Professor at the Computer Science Department of Darmstadt University of Technology (Germany). He received an M.Sc. degree in Industrial Engineering and Management Science at the University of Karlsruhe and a Ph.D. from the Computer Science Department of Darmstadt University of Technology. He was a postgraduate fellow at the Department of Statistics and Operations Research, Universidad Complutense de Madrid and a Marie Curie Fellow at IRIDIA, Université Libre de Bruxelles. Thomas has been involved in several EU funded projects on the study of stochastic local search techniques and his research is published in various journals, book chapters and conferences in OR and AI. His current research focuses on the further development of SLS methods, search space analysis, the automatisation of the design and the tuning of SLS algorithms, and new hybridisation schemes for the effective solution of hard combinatorial problems.
Contents

Prologue

Part I: Foundations

1 Introduction
    1.1 Combinatorial Problems
    1.2 Two Prototypical Combinatorial Problems
    1.3 Computational Complexity
        In Depth: Some Advanced Concepts in Computational Complexity
    1.4 Search Paradigms
    1.5 Stochastic Local Search
        In Depth: Randomness and Probabilistic Computation
    1.6 Further Readings and Related Work
    1.7 Summary
    Exercises

2 SLS Methods
    2.1 Iterative Improvement (Revisited)
    2.2 'Simple' SLS Methods
    2.3 Hybrid SLS Methods
    2.4 Population-Based SLS Methods
    2.5 Further Readings and Related Work
    2.6 Summary
    Exercises

3 Generalised Local Search Machines
    3.1 The Basic GLSM Model
        In Depth: Formal Definition of GLSM Semantics
    3.2 State, Transition and Machine Types
    3.3 Modelling SLS Methods Using GLSMs
    3.4 Extensions of the Basic GLSM Model
    3.5 Further Readings and Related Work
    3.6 Summary
    Exercises

4 Empirical Analysis of SLS Algorithms
    4.1 Las Vegas Algorithms
        In Depth: Probabilistic Approximate Completeness and 'Convergence'
    4.2 Run-Time Distributions
    4.3 RTD-Based Analysis of LVA Behaviour
        In Depth: Benchmark Sets
    4.4 Characterising and Improving LVA Behaviour
    4.5 Further Readings and Related Work
    4.6 Summary
    Exercises

5 Search Space Structure and SLS Performance
    5.1 Fundamental Search Space Properties
    5.2 Search Landscapes and Local Minima
    5.3 Fitness-Distance Correlation
    5.4 Ruggedness
        In Depth: NK-Landscapes
    5.5 Plateaus
    5.6 Barriers and Basins
    5.7 Further Readings and Related Work
    5.8 Summary
    Exercises

Part II: Applications

6 Propositional Satisfiability and Constraint Satisfaction
    6.1 The Satisfiability Problem
    6.2 The GSAT Architecture
        In Depth: Efficiently Implementing GSAT
    6.3 The WalkSAT Architecture
    6.4 Dynamic Local Search Algorithms for SAT
    6.5 Constraint Satisfaction Problems
    6.6 SLS Algorithms for CSPs
    6.7 Further Readings and Related Work
    6.8 Summary
    Exercises

7 MAX-SAT and MAX-CSP
    7.1 The MAX-SAT Problem
    7.2 SLS Algorithms for MAX-SAT
        In Depth: Efficient Evaluation of k-Flip Neighbourhoods for MAX-SAT
    7.3 SLS Algorithms for MAX-CSP
    7.4 Further Readings and Related Work
    7.5 Summary
    Exercises

8 Travelling Salesman Problems
    8.1 TSP Applications and Benchmark Instances
    8.2 'Simple' SLS Algorithms for the TSP
        In Depth: Efficiently Implementing SLS Algorithms for the TSP
    8.3 Iterated Local Search Algorithms for the TSP
    8.4 Population-Based SLS Algorithms for the TSP
    8.5 Further Readings and Related Work
    8.6 Summary
    Exercises

9 Scheduling Problems
    9.1 Models and General Considerations
    9.2 Single Machine Scheduling
        In Depth: Details of Dynasearch for the SMTWTP
    9.3 Flow Shop Scheduling
        In Depth: Neighbourhood Restrictions in TS-NS-PFSP
    9.4 Group Shop Problems
    9.5 Further Readings and Related Work
    9.6 Summary
    Exercises

10 Other Combinatorial Problems
    10.1 Graph Colouring
    10.2 The Quadratic Assignment Problem
    10.3 Set Covering
    10.4 Combinatorial Auctions
    10.5 DNA Code Design
    10.6 Further Readings and Related Work
    10.7 Summary
    Exercises

Epilogue
Glossary
Bibliography
Index
Iterative Improvement (II):
    determine initial candidate solution s
    While s is not a local optimum:
        choose a neighbour s' of s such that g(s') < g(s)
        s := s'
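To make the scheme above concrete, here is a minimal runnable rendering in Python (ours, not from the book); the 1-flip neighbourhood, the evaluation function g and the toy bit-string problem are assumptions chosen purely for illustration.

    import random

    def iterative_improvement(s, neighbours, g):
        # Repeatedly move to a randomly chosen improving neighbour;
        # stop as soon as s has no improving neighbour, i.e., s is
        # a local optimum of g.
        while True:
            better = [t for t in neighbours(s) if g(t) < g(s)]
            if not better:
                return s
            s = random.choice(better)

    # Toy instance: minimise the number of 1-bits in a bit string
    # under the 1-flip neighbourhood (flip exactly one position).
    def neighbours(s):
        return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]

    def g(s):
        return sum(s)

    s0 = tuple(random.randint(0, 1) for _ in range(8))
    print(iterative_improvement(s0, neighbours, g))  # reaches (0, ..., 0)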
Randomised Iterative Improvement (RII):
    determine initial candidate solution s
    While termination condition not satisfied:
        With probability wp:
            choose a neighbour s' of s uniformly at random
        Otherwise:
            choose a neighbour s' of s such that g(s') < g(s)
            or, if no such s' exists, choose s' such that g(s') is minimal
        s := s'
Variable Neighbourhood Descent (VND):
    determine initial candidate solution s
    i := 1
    Repeat:
        choose a most improving neighbour s' of s in N_i
        If g(s') < g(s):
            s := s'
            i := 1
        Else:
            i := i + 1
    Until i > imax

Note: N_1, ..., N_imax is a set of neighbourhood relations, typically ordered according to increasing size of the respective local neighbourhoods.
Variable Depth Search (VDS):
    determine initial candidate solution s
    t̂ := s
    While s is not locally optimal:
        Repeat:
            select best feasible neighbour t
            If g(t) < g(t̂): t̂ := t
        Until construction of complex step completed
        s := t̂
Simulated Annealing (SA):
    determine initial candidate solution s
    set initial temperature T according to annealing schedule
    While termination condition not satisfied:
        probabilistically choose a neighbour s' of s
        If s' satisfies probabilistic acceptance criterion (depending on T):
            s := s'
        update T according to annealing schedule

Note: The annealing schedule may keep T constant for a number of search steps.
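The probabilistic acceptance criterion is left abstract above; one very common instantiation is the Metropolis criterion combined with a geometric cooling schedule, sketched below in Python. This is our illustration, not the book's specification; the parameter values and the cooling scheme are assumptions. It can be applied to the same toy neighbours/g functions defined in the Iterative Improvement sketch above.

    import math
    import random

    def metropolis_accept(delta, T):
        # Accept improving or equal-quality moves always; accept a
        # worsening move with probability exp(-delta / T).
        return delta <= 0 or random.random() < math.exp(-delta / T)

    def simulated_annealing(s, neighbours, g, T0=10.0, alpha=0.95, steps=10000):
        T = T0
        best = s
        for step in range(steps):
            t = random.choice(neighbours(s))        # proposal step
            if metropolis_accept(g(t) - g(s), T):   # probabilistic acceptance
                s = t
            if g(s) < g(best):
                best = s
            if step % 100 == 99:                    # geometric cooling: every
                T *= alpha                          # 100 steps, T := alpha * T
        return best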
Tabu Search (TS):
    determine initial candidate solution s
    While termination criterion is not satisfied:
        determine set N' of non-tabu neighbours of s
        choose a best improving solution s' in N'
        update tabu attributes based on s'
        s := s'

Note: Tabu attributes are associated with solution components.
Dynamic Local Search (DLS):
    determine initial candidate solution s
    initialise penalties
    While termination criterion is not satisfied:
        compute modified evaluation function g' from g based on penalties
        perform subsidiary local search on s using evaluation function g'
        update penalties based on s

Note: Penalties are associated with solution components; the subsidiary local search ends in a local minimum of g'.
Iterated Local Search (ILS):
    determine initial candidate solution s
    perform subsidiary local search on s
    While termination criterion is not satisfied:
        r := s
        perform perturbation on s
        perform subsidiary local search on s
        based on acceptance criterion, keep s or revert to s := r

Note: The search history may additionally influence the perturbation phase and the acceptance criterion.
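A minimal Python skeleton of this scheme (ours, not the book's), using a better-only acceptance criterion as one simple instantiation; local_search and perturb are problem-specific components, for example the iterative_improvement function sketched earlier together with a random multi-bit flip.

    def iterated_local_search(s, local_search, perturb, g, iterations=100):
        s = local_search(s)                  # initial subsidiary local search
        best = s
        for _ in range(iterations):
            r = s                            # remember incumbent
            s = local_search(perturb(s))     # perturbation + local search
            if g(s) >= g(r):                 # acceptance criterion:
                s = r                        #   keep only strict improvements
            if g(s) < g(best):
                best = s
        return best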
Greedy Randomised 'Adaptive' Search Procedure (GRASP):
    While termination criterion is not satisfied:
        generate candidate solution s using subsidiary greedy randomised constructive search
        perform subsidiary local search on s
Adaptive Iterated Construction Search (AICS):
    initialise weights
    While termination criterion is not satisfied:
        generate candidate solution s using subsidiary randomised constructive search
        perform subsidiary local search on s
        adapt weights based on s

Note: The subsidiary constructive search is based on weights and heuristic information.
Ant Colony Optimisation (ACO):
    initialise weights
    While termination criterion is not satisfied:
        generate population sp of candidate solutions using subsidiary randomised constructive search
        perform subsidiary local search on sp
        adapt weights based on sp

Note: The subsidiary constructive search uses weights (pheromone trails) and heuristic information.
Memetic Algorithm (MA):
    determine initial population sp
    perform subsidiary local search on sp
    While termination criterion is not satisfied:
        generate set spr of new candidate solutions by recombination
        perform subsidiary local search on spr
        generate set spm of new candidate solutions from sp and spr by mutation
        perform subsidiary local search on spm
        select new population sp from candidate solutions in sp, spr and spm
We can only see a short distance ahead, but we can see plenty there that needs to be done. —Alan Turing, Mathematician
Prologue

Imagine you visit a friend in the beautiful city of Augsburg in southern Germany. It is summer and you set yourselves the challenge to visit all 127 'Biergärten' (beer gardens) on a single day. (If you don't like beer, or if you don't have friends in Augsburg, consider visiting all coffee shops in Vancouver, Canada.) Can this be done? If so, which route should you take? Clearly, your chances of reaching your goal may depend on finding a short round trip that takes you to all 127 places. As you arrive at Biergarten No. 42, your friend gives you the following puzzle, offering to pay for all your drinks if you can solve it before the night is over: 'Last week my friends Anne, Carl, Eva, Gustaf and I went out for dinner every night, Monday to Friday. I missed the meal on Friday because I was visiting my sister and her family. But otherwise, every one of us had selected a restaurant for a particular night and served as a host for that dinner. Overall, the following restaurants were selected: a French bistro, a sushi bar, a pizzeria, a Greek restaurant, and the Brauhaus. Eva took us out on Wednesday. The Friday dinner was at the Brauhaus. Carl, who doesn't eat sushi, was the first host. Gustaf had selected the bistro for the night before one of the friends took everyone to the pizzeria. Tell me, who selected which restaurant for which night?' There are various approaches for solving these problems. Given the huge number of possible round trips through the Biergärten, or assignments of weekdays, hosts, and restaurants, systematic enumeration (i.e., trying out all possibilities) is probably not a realistic option. Some people would take a more sophisticated approach and eliminate certain assignments or partial tours through careful reasoning, while systematically searching over the remaining alternatives.
But most of us would probably take a rather different approach in practice: starting with a rough and somewhat arbitrary first guess, small changes are repeatedly performed on a given tour or assignment, with the goal of improving its quality or of getting closer to a feasible solution. This latter type of approach is known as stochastic local search (SLS) and plays a very important role in solving combinatorial problems like the ones illustrated above. (It may be noted that the logical puzzle and the shortest round trip problem can be seen as instances of the Propositional Satisfiability and Travelling Salesman Problems, which will be more formally introduced in Chapter 1 and used throughout this book.)
Why Stochastic Local Search?

There are many reasons for studying stochastic local search (SLS) methods. As illustrated above, SLS is closely related to a very natural approach in human problem solving. Many SLS methods are surprisingly simple, and the respective algorithms are rather easy to understand, communicate and implement. Yet, these algorithms can often solve computationally hard problems very effectively and robustly. SLS methods are also typically quite general and flexible. The same SLS methods have been found to work well for a broad range of different combinatorial problems, and existing algorithms can often be modified quite naturally and easily to solve variants of a given problem. This makes SLS methods particularly attractive for solving real-world problems, which are often not completely or correctly specified at the beginning of a project and may consequently undergo numerous revisions or modifications before all relevant aspects of the given application situation are captured. Another reason for the popularity of SLS lies in the fact that this computational approach to problem solving facilitates an explorative approach to algorithm design. Furthermore, as we will discuss in more detail in Chapter 2, many prominent and successful SLS methods are inspired by natural phenomena, which gives them an additional intellectual appeal. For these (and many other) reasons, SLS methods are among the most prominent and successful techniques for solving computationally hard problems in many areas of computer science (specifically artificial intelligence) and operations research; they are also widely used for solving combinatorial problems in other disciplines, including engineering, physics, management science and bioinformatics. The academic interest in SLS methods can be traced back to the beginnings of modern computing. In operations research, local search algorithms were developed and described in the 1950s, and in artificial intelligence, SLS methods have been studied since the early days of the field, in the 1960s. To date, the study
of SLS algorithms falls into the intersection of algorithmics, statistics, artificial intelligence, operations research, and numerous application areas. At the same time, SLS methods play a prominent role in these fields and are rapidly becoming part of the respective mainstream academic curricula.
About this Book

'Stochastic Local Search: Foundations and Applications' was primarily written for researchers, students and practitioners with an interest in efficient heuristic methods for solving hard combinatorial problems. In particular, it is geared towards academic and industry researchers in computer science, artificial intelligence, operations research, and engineering, as an introduction to and overview of the field or as a reference text; towards graduate students in computer science, operations research, mathematics, or engineering, as well as towards senior undergraduate students with some background in computer science and mathematics, as primary or supplementary text for a course, or for self-study; and towards practitioners, who need to solve combinatorial problems for practical applications, as a reference text or as an introduction to and overview of the field.

The main goal of this book is to provide its readers with important components of a scientific approach to the design and application of SLS methods, and to present them with a broad, yet detailed view on the general concepts and specific instances of SLS methods, including aspects of their development, analysis, and application. More specifically, we aim to give our readers access to detailed knowledge on the most prominent and successful SLS techniques; to facilitate an understanding of the relationships, the characteristic similarities and differences between existing methods; to introduce and discuss basic and advanced aspects of the empirical analysis of SLS algorithms; and to give hands-on knowledge on the application of some of the most widely used SLS methods to a variety of combinatorial problems.

Stochastic search algorithms are being studied by a large number of researchers from different communities, many of which have quite different views on the topic or specific aspects of it. While striving for a balanced and objective presentation, this book provides a view on stochastic local search that is based on our background and experience. This is reflected, for instance, in the specific choice of our formal definition of stochastic local search (Chapter 1), in the GLSM model for hybrid SLS methods (Chapter 3), the extensive and in-depth coverage of empirical analysis and search space structure (Chapters 4 and 5), as well as in the selection of algorithms and problems we cover in varying degree of detail (particularly in Chapters 9 and 10). There are rational reasons for most
– if not all – of these choices; nevertheless, in many cases, equally defensible alternative decisions could have been made. Clearly, some topics would benefit from broader and deeper coverage. However, even relatively large book projects are subject to certain resource limitations in both time and space, and it is our hope that our choices of the material and its presentation will make this book useful for the previously stated purposes.
Structure and Supplementary Materials

The main body of this book consists of two parts. Part 1, which comprises Chapters 1 to 5, covers the foundations of the study of stochastic local search algorithms, including:

• fundamental concepts, definitions, and terminology (Chapter 1),
• an introduction to a broad range of important SLS methods and their most relevant variants (Chapter 2),
• a conceptual and formal model that facilitates the development and understanding of hybrid SLS methods (Chapter 3),
• a methodical approach for the empirical analysis of SLS methods and other randomised algorithms (Chapter 4), and
• features and properties of the spaces searched by SLS algorithms and their impact on SLS behaviour (Chapter 5).

The material from the first two chapters provides the basis for all other aspects of SLS algorithms covered in this book; Chapters 1 and 2 should therefore be read before any other chapters and in their natural sequence. Chapters 3, 4, and 5 are quite independent from each other and expand the foundations of SLS in different directions. Chapter 3 complements Chapter 2; since it discusses some of the more complex SLS methods in a different light, it can be very useful for reviewing and deepening the understanding of these practically very relevant methods. The scope of Chapter 4 extends substantially beyond the empirical analysis of SLS algorithms; although most of the material covered in the subsequent chapters does not directly depend on the concepts and methods from Chapter 4, we strongly believe that anyone involved in the design and application of SLS algorithms should be familiar at least with the basic issues and approaches discussed there. Chapter 5 in some sense covers the most advanced material presented in this book; it should be useful to readers interested in a deeper knowledge of the
factors and reasons underlying SLS behaviour and performance, but reading it is not a prerequisite to understanding any of the material covered in the other chapters.

Part 2 comprises Chapters 6 to 10, which present, in varying degree of scope and detail, SLS algorithms for a number of well-known and widely studied combinatorial problems. Except for Chapter 7, which should be read after Chapter 6 since it builds on much of the material covered there, all chapters of this second part are basically independent of each other and can be studied in any combination and order. Chapters 6 to 8 provide a reasonable coverage of the most prominent and successful SLS methods for the respective problems and discuss the respective algorithms in a relatively detailed way. Chapters 9 and 10 are of a more introductory nature; their focus lies on a small number of SLS algorithms for the respective combinatorial problems that have been selected primarily based on their performance and general interest. In particular, the five main sections of Chapter 10 are independent of each other and can be studied in any combination and order.

'In Depth' Sections. Additional, clearly marked 'In Depth' sections are included in various chapters. These provide additional material that expands or complements the main body of the respective chapter, often at a more technical or detailed level. These sections are generally not required for understanding the main text, but in many cases they should be helpful for obtaining a deeper understanding of important concepts and issues.

Further Readings.
Towards the end of each chapter, a ‘Further Readings and Related Work’ section provides additional references and pointers to literature on related topics. In the case of subjects for which there is a large body of literature, these represent only a small selection of references deemed especially relevant and/or accessible by the authors. These references should provide good starting points for the reader interested in a broader and deeper knowledge of the respective topic.
Chapter Summaries.
Each chapter closes with a summary section that briefly reviews the most relevant concepts and ideas covered in the respective chapter. The purpose of this summary is to provide the reader with a high-level overview of the material presented in the chapter, and to point out connections (and differences) between the respective concepts and approaches. Together with the chapter introductions and exercises, these summaries facilitate rapid reviewing of previously studied or known material.
Exercises.
Each chapter is accompanied by a collection of exercises, classified according to their degree of difficulty as ‘easy’, ‘medium’ and ‘hard’. This classification is only approximate and does not necessarily reflect the anticipated amount of time needed for producing a solution; although an exercise marked as ‘easy’ may be relatively straightforward to solve, it may still require a substantial amount of time until the details of the solution are worked out and written down. The exercises cover the material presented in the respective chapter and are intended to facilitate a deeper understanding of the subject matter. They include theoretical questions as well as hands-on implementation and experimentation exercises.
References and Bibliography. References to the technical and research literature are provided throughout the book, particularly in the previously mentioned 'Further Readings and Related Work' sections. These give rise to an extensive bibliography that covers much of the most relevant literature on SLS algorithms and related topics, with a particular emphasis on recent publications.

Glossary and Index. The glossary contains brief explanations of important technical terms useful throughout the book. In conjunction with the extensive and thoroughly compiled index, the glossary particularly facilitates using this book as a reference book or for self-study.

Webpage and Supplementary Materials.
Supplementary materials are provided from the book webpage at www.sls-book.net. These include slide sets that may be useful in the context of courses that use the book as a primary or supplementary text (see also Section ‘Suggested Uses’ below), as well as reference implementations of some of the SLS algorithms discussed in this book (needed for some of the hands-on exercises and useful for further practical experience) and some educational tools, for example, for the empirical analysis of SLS behaviour.
Suggested Uses

This book was designed for various types of uses. As a whole, it is intended to be used as a reference book for researchers and practitioners or as the primary text for a specialised graduate or upper-level undergraduate course on stochastic search algorithms; furthermore, parts of it can be used as primary reading or supplementary material for modules of more general courses in artificial intelligence, algorithms, operations research, combinatorial problem solving, empirical methods in computer science, etc. The following specific suggestions reflect our own experience, including the use of parts of this book by students, researchers, and
course instructors at the University of British Columbia (Vancouver, Canada) and Darmstadt University of Technology (Darmstadt, Germany).

General introduction to SLS methods, particularly for self-study. Chapters 1 and 2; Sections 3.1 to 3.3 and 3.6; Sections 4.1 to 4.3 and 4.6; Section 5.8; any one or two sections from Chapter 10. For more advanced self-study, the remaining materials can be added as desired; particularly the remaining sections of Chapter 4 as well as Chapters 6 and 8 are highly recommended.

Graduate Course on SLS methods/stochastic search. Chapters 1 and 2; Sections 3.1 to 3.3 and 3.6; Chapter 4; Sections 5.1 to 5.3 and 5.8; Chapters 6 and 7 without the sections on CSP and MAX-CSP; Chapter 8; and any two sections from Chapter 10. Depending on the precise format, focus and level of the course, this selection may be expanded in various ways, for example, by additionally covering Section 9.1 and any one other section from Chapter 9. For a general course on stochastic search methods, an additional module on randomised systematic search algorithms should be included (a sample set of slides for such a module is available from www.sls-book.net).

SLS Module(s) in a general AI course. Parts of Chapters 1 and 2; Sections 3.1 to 3.3 and 3.6; Sections 4.1 to 4.3 and 4.6; parts of Chapter 6; and possibly parts of Chapters 8, 9, or 10. The selections from Chapters 1, 2, 6 and 8 to 10 will naturally be based on the prerequisite knowledge of the students as well as the format, level and other modules of the course. A minimal subset for a module of about two lectures in an undergraduate course would mainly take parts of Chapters 1 and 2 and illustrate the working principles of SLS methods using example applications described in Part 2.

SLS Module(s) in a general algorithms course. Parts of Chapters 1 and 2; Sections 3.1 to 3.3 and 3.6; Sections 4.1 to 4.3 and 4.6; Sections 5.1 to 5.3 and 5.8; parts of Chapters 6 and 8; and possibly one or more sections from Chapter 10. The precise balance between these components will naturally depend on the exact nature of the course, particularly on its focus on theoretical or practical aspects of problem solving. In the context of strongly practically oriented algorithms courses, the in-depth sections in Chapters 4, 6 and 8 may be of particular interest.
SLS Module(s) in a discrete optimisation course. Parts of Chapters 1 and 2; Sections 3.1 to 3.3 and 3.6; Chapter 4; parts of Chapter 8 and 9; and any one or two sections from Chapter 10. Additional material, particularly from Chapters 6 and 7, can be used to further expand and complement this selection.
Parts of this book can also be used as primary or supplementary material for specialised graduate courses on SAT, CSP, TSP, scheduling and empirical methods in computing.
The Making of SLS:FA

The process of creating this book is in many ways related to the subject material discussed therein. Not unlike the fundamental approach of local search, it involved navigating a huge space of possibilities in an iterative manner. This process was initiated in 1998, when both H. H. and T. S. were finishing their Ph.D. theses at the Computer Science Department of Darmstadt University of Technology, and the idea of combining materials from both theses into a comprehensive book on Stochastic Local Search first arose. Five years and about 650 pages later, we reached the end of this search trajectory. The result of a myriad of construction, perturbation and evaluation steps is this book. Interestingly and perhaps not too surprisingly, both the writing process and its end result turned out to be very different from what we had originally imagined. Although it would be hard to precisely define the objective(s) being optimised through the writing process, it took us through many situations that closely resemble those of a stochastic local search algorithm trying to solve a challenging instance of a hard combinatorial problem. There were phases of rapid progress and stagnation; we encountered (and overcame) numerous local minima; and along the way, we had to make many decisions based on very limited local information, various forms of heuristic guidance, and some degree of experience. Random, or at least completely unforeseen and unpredictable, factors played a large role in this local search process. Rather trivial sources of randomness, such as hardware and software glitches, were complemented by more fundamental stochastic influences, such as the random thoughts and ideas that on warm summer nights seem to preferably lurk around the Biergärten, always looking for a receptive mind, or the random person sticking their head into the office door, causing the more organised ideas to fly apart in a hurry. Without these random influences, and the circumstances conducive to them, this book could not have been created in its present form. At the same time, this book has been shaped by many other factors and influences. These include the places and circumstances under which part of the work was done. (Some of the more interesting places where parts of the book have been written include a log cabin on Sechelt Inlet, the beautiful and tranquil Nitobe Garden, a grassy spot near the top of Whyte Islet in Howe Sound, and the wild and remote inlets of the Pacific Northwest, onboard the Nautilus Explorer.) More importantly, they include a huge and diverse amount of interaction with
friends and family, mentors, colleagues, students and our publishers, who provided crucial guidance, diversification, evaluation and general support. Finally, especially during the final phase of the process, our work on this book was largely driven by Hofstadter’s Law: ‘It always takes longer than you expect, even when you take into account Hofstadter’s Law.’ [Hofstadter, 1979], the significance and effects of which can hardly be overestimated. As a consequence, it would be foolish to believe that our stochastic local search process has led us into a global optimum. However, we feel that, largely thanks to the previously mentioned factors and influences, in the process of creating this book we managed to avoid and escape from many low-quality local optima, and achieved an end result that we hope will be useful to those who study it. In this context, we are deeply grateful towards those who contributed directly and indirectly to this work, and who provided us with guidance and support in our local — and global — search. High-level guidance is of central importance in any effective search process; in our case, there are several people who played a key role in shaping our approach to scientific research and who provided crucial support during various stages of our academic careers. First and foremost, we thank Wolfgang Bibel, our former advisor and ‘Doktorvater’, for providing a highly supportive and stimulating academic environment in which we could freely pursue our research interests, and whose encouragement and substantial support was highly significant in getting this project underway. Furthermore, H. H. gratefully acknowledges the ongoing and invaluable support from his academic mentors and colleagues, Alan Mackworth and Anne Condon, who also played an important role during the early stages of writing this book. T. S. would especially like to thank Marco Dorigo for the pleasure of joint research and for his support in many senses. On the other side, we have received more specific guidance on the contents of this book from a number of colleagues, students and fellow SLS researchers. Their detailed comments led to improvements in various parts of this book and helped to significantly reduce the number of errors. (Obviously, the responsibility for those errors that we managed to hide well enough to escape their vigilance rests solely with us.) In this context, we especially thank (in alphabetical order) Markus Aderhold, Christian Blum, Marco Chiarandini, Anne Condon, Irina Dumitrescu, Frank Hutter, David Johnson, Olivier Martin, Luis Paquete, Marco Pranzo, Tommaso Schiavinotto, Kevin Smyth, Dan Tulpan and Maxwell Young. We also acknowledge helpful comments by Craig Boutilier, Rina Dechter, JinKao Hao, Keld Helsgaun, Kalev Kask, Henry Kautz, Janek Klawe, Lucas Lessing, Elena Marchiori, David Poole, Rubén Ruiz García, Alena Shmygelska and Dave Tompkins. Special thanks go to David Woodruff, Toby Walsh, Celso Ribeiro and Peter Merz, whose detailed comments provided valuable guidance in improving the presentation of our work.
In addition, we gratefully acknowledge the interesting and stimulating discussions on the topics of this book that we shared with many of our co-authors, colleagues, students and fellow researchers at TUD and UBC, as well as at conferences, workshops, tutorials and seminars. It is their encouragement, enthusiasm and continuing interest that provided much of the background and motivation for this work. The staff at Morgan Kaufmann, Elsevier, Kolam and Dartmouth Publishing have been instrumental in the realisation of this book in many ways; we deeply appreciate their expertise and friendly support throughout the various stages of this project. We are particularly grateful to Denise Penrose, Senior Editor at Morgan Kaufmann, whose enthusiasm for this project and patience in dealing with the adverse effects of Hofstadter’s Law (as well as with her authors’ more peculiar wishes and ideas) played a key role in creating this book. Simon Crump, Publishing Services Manager at Elsevier, and Jamey Stegmaier, Project Manager at Kolam USA, have been similarly instrumental during the production stages, and we gratefully acknowledge their help and support. We also thank Jessica Meehan and her team at Dartmouth Publishing, who produced many of the figures, as well as Lori Newhouse and Calum Ross for copyediting and proofreading the book, and Robert Swanson for creating the index. Many thanks also to Emilia Thiuri and Valerie Witte, for their help during the draft stages, throughout the reviewing process and during production, and to Brian Grimm, marketing manager at Morgan Kaufmann, for substantially increasing the visibility of our work. H. H. also wishes to thank Valerie McRae for her help with proofreading the manuscript in various draft stages, and for much appreciated moral and administrative support. Finally, we thank our families who provided the stable and stimulating environment that formed the starting point of our personal and intellectual development, and who shape and accompany the trajectories of our lives in a unique and special way. H. H. expresses his deepest gratitude to Sonia and Jehannine for being his partners in adventure, joy, and sorrow, and his parents, siblings and extended family for their affection and diversifying influence. T. S. especially thanks his wife Maria José for sharing her life with him, Alexander for all his curiosity and love, and his parents for their continuous care and support. This book has been shaped by many factors and influences, but first and foremost it is the product of our joint research interests and activities, which co-evolved over the past seven years into an immensely fruitful and satisfying collaboration and, more importantly, into a close friendship.
Part I: Foundations
The machine does not isolate us from the great problems of life but plunges us more deeply into them. —Antoine de Saint-Exupéry, Pilot & Writer
1 Introduction
This introductory chapter provides the background and motivation for studying stochastic local search algorithms for combinatorial problems. We start with an introduction to combinatorial problems and present SAT, the satisfiability problem in propositional logic, as well as TSP, the travelling salesman problem, as the central problems used for illustrative purposes throughout the first part of this book. This is followed by a short introduction to computational complexity. Next, we discuss and compare various fundamental search paradigms, including the concepts of systematic and local search, after which we formally define and discuss the notion of stochastic local search, one of the practically most important and successful approaches for solving hard combinatorial problems.
1.1 Combinatorial Problems

Combinatorial problems arise in many areas of computer science and other disciplines in which computational methods are applied, such as artificial intelligence, operations research, bioinformatics and electronic commerce. Prominent examples are tasks such as finding shortest or cheapest round trips in graphs, finding models of propositional formulae or determining the 3D-structure of proteins. Other well-known combinatorial problems are encountered in planning, scheduling, time-tabling, resource allocation, code design, hardware design and genome sequencing. These problems typically involve finding groupings, orderings or assignments of a discrete, finite set of objects that satisfy certain conditions or constraints. Combinations of these solution components form the potential solutions of a combinatorial problem. A scheduling problem, for instance, can be
seen as an assignment problem in which the solution components are the events to be scheduled, and the values assigned to events correspond to the time at which these occur. This way, typically a huge number of candidate solutions can be obtained; for most combinatorial optimisation problems, the space of potential solutions for a given problem instance is at least exponential in the size of that instance.
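To get a sense of scale (our example, using the round-trip problem from the Prologue): with 127 locations and symmetric distances, there are (127 − 1)!/2 ≈ 10^211 distinct round trips, a number that rules out exhaustive enumeration by any conceivable means.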
Problems and Solutions

At this point, it is useful to clarify the distinction between problems and problem instances. In this book, by 'problem', we mean abstract problems (sometimes also called problem classes), such as 'for any given set of points in the Euclidean plane, find the shortest round trip connecting these points'. In this example, an instance of the problem would be to find the shortest round trip for a specific set of points in the plane. The solution of such a problem instance would be a specific shortest round trip connecting the given set of points. The solution of the abstract problem, however, is an algorithm that, given a problem instance, determines a solution for that instance. Generally, problems can be defined as sets of problem instances, where each instance is a pair of input data and solution data. This is an elegant mathematical formalisation; however, in this book we will define problems using a slightly less formal, but more intuitive (yet precise), representation.

For instances of combinatorial problems, we draw an important distinction between candidate solutions and solutions. Candidate solutions are potential solutions that may possibly be encountered during an attempt to solve the given problem instance; but unlike solutions, they may not satisfy all the conditions from the problem definition. For our shortest round trip example, typically any valid round trip connecting the given set of points, regardless of length, would be a candidate solution, while only those candidate round trips with minimal length would qualify as solutions. It should be noted that while the definition of any combinatorial problem states clearly what is considered a solution for an instance of this problem, the notion of candidate solution is not always uniquely determined by the problem definition, but can already reflect a particular approach for solving the problem. As an example, consider the variant of the shortest round trip problem in which we are only interested in trips that visit each given point exactly once. In this case, candidate solutions could be either arbitrary round trips which do not necessarily respect this additional condition, or the notion of candidate solution could be restricted to round trips that visit no point more than once.
Decision Problems

Many combinatorial problems can be naturally characterised as decision problems: for these, the solutions of a given instance are specified by a set of logical conditions. As an example of a combinatorial decision problem, consider the Graph Colouring Problem: given a graph G and a number of colours, find an assignment of colours to the vertices of G such that two vertices that are connected by an edge are never assigned the same colour. Other prominent combinatorial decision problems include finding satisfying truth assignments for a given propositional formula (the Propositional Satisfiability Problem, SAT, which we revisit in more detail in Section 1.2) or scheduling a series of events such that a given set of precedence constraints is satisfied.

For any decision problem, we distinguish two variants:

• the search variant, where, given a problem instance, the objective is to find a solution (or to determine that no solution exists);
• the decision variant, in which for a given problem instance, one wants to answer the question whether or not a solution exists.

These variants are closely related because algorithms solving the search variant can always be used to solve the decision variant. Interestingly, for many combinatorial decision problems, the converse also holds: algorithms for the decision variant of a problem can be used for finding actual solutions.
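As an illustration of the latter point for SAT (our sketch, not from the book): given any decision procedure is_satisfiable, a satisfying assignment can be constructed by fixing one variable at a time and re-invoking the decision procedure, a technique known as self-reducibility. The CNF clause-list representation and the naive enumeration oracle below are assumptions made purely for illustration.

    from itertools import product

    def is_satisfiable(clauses, variables):
        # Decision oracle; here naive enumeration, purely for illustration.
        # Literals are integers: v means "variable v is true", -v "false".
        for bits in product([False, True], repeat=len(variables)):
            a = dict(zip(variables, bits))
            if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
                return True
        return False

    def fix(clauses, var, value):
        # Fix var := value: drop satisfied clauses, shorten the others.
        lit = var if value else -var
        return [[l for l in c if l != -lit] for c in clauses if lit not in c]

    def find_assignment(clauses, variables):
        # Solve the search variant using only the decision oracle.
        if not is_satisfiable(clauses, variables):
            return None
        assignment, remaining = {}, list(variables)
        for v in variables:
            remaining.remove(v)
            for value in (True, False):
                if is_satisfiable(fix(clauses, v, value), remaining):
                    assignment[v], clauses = value, fix(clauses, v, value)
                    break
        return assignment

    # (x1 or not x2) and (x2 or x3):
    print(find_assignment([[1, -2], [2, 3]], [1, 2, 3]))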
Optimisation Problems

Many practically relevant combinatorial problems are optimisation problems rather than decision problems. Optimisation problems can be seen as generalisations of decision problems, where the solutions are additionally evaluated by an objective function and the goal is to find solutions with optimal objective function values. The objective function is often defined on candidate solutions as well as on solutions; the objective function value of a given candidate solution (or solution) is also called its solution quality. For the Graph Colouring Problem mentioned previously, a natural optimisation variant exists, where a variable number of colours is used and the goal is, given a graph, to find a colouring of its vertices, using only a minimal (rather than a fixed) number of colours.

Any combinatorial optimisation problem can be stated as a minimisation problem or as a maximisation problem, depending on whether the given objective function is to be minimised or maximised. Often, one of the two formulations is more natural, but algorithmically, minimisation and maximisation problems are treated equivalently. In this book, for uniformity and formal convenience, we generally formulate optimisation problems as minimisation problems.

For each combinatorial optimisation problem, we distinguish two variants:

• the search variant: given a problem instance, find a solution with minimal (or maximal, respectively) objective function value;
• the evaluation variant: given a problem instance, find the optimal objective function value (i.e., the solution quality of an optimal solution).

Clearly, the search variant is the more general of these, since with the knowledge of an optimal solution, the evaluation variant can be solved trivially. Additionally, for each optimisation problem, we can define:

• associated decision problems: given a problem instance and a fixed solution quality bound b, find a solution with an objective function value smaller than or equal to b (for minimisation problems; greater than or equal to b for maximisation problems) or determine that no such solution exists.

Many combinatorial optimisation problems are defined based on an objective function as well as on logical conditions. In this case, candidate solutions satisfying the logical conditions are called feasible or valid, and among those, optimal solutions can be distinguished based on their objective function value. While the use of logical conditions in addition to an objective function often leads to more natural formulations of a combinatorial optimisation problem, it should be noted that the logical conditions can always be integrated into the objective function in such a way that the feasible candidate solutions correspond to the solutions of an associated decision problem (i.e., to candidate solutions with bounded solution quality).

As we will see throughout this book, many algorithms for decision problems can be extended to related optimisation problems in a rather natural way. However, such simple extensions of algorithms that work well on certain decision problems are not always effective for finding optimal or near-optimal solutions of the corresponding optimisation problems, and consequently, different algorithmic methods need to be considered for this task.
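As a small concrete illustration of this integration (our example, not the book's): for graph colouring with a fixed number of colours, the logical condition 'no edge joins two equally coloured vertices' can be folded into an objective that counts violated edges; the feasible candidate solutions are then exactly those with objective function value 0, and an SLS algorithm can simply minimise this count.

    def conflicts(colouring, edges):
        # Objective: number of edges whose endpoints share a colour;
        # a value of 0 certifies a proper (feasible) colouring.
        return sum(1 for (u, v) in edges if colouring[u] == colouring[v])

    edges = [(0, 1), (1, 2), (0, 2)]                             # a triangle
    print(conflicts({0: 'red', 1: 'green', 2: 'red'}, edges))    # 1 violation
    print(conflicts({0: 'red', 1: 'green', 2: 'blue'}, edges))   # 0: feasible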
1.2 Two Prototypical Combinatorial Problems In the following, we introduce two well-known combinatorial problems which will be used throughout the first part of this book for illustrating algorithmic
techniques and approaches. These are the Propositional Satisfiability Problem (SAT), a prominent combinatorial decision problem which plays a central role in several areas of computer science, and the Travelling Salesman Problem (TSP), one of the most extensively studied combinatorial optimisation problems. Besides their prominence and well-established role in algorithm development, both problems have the advantage of being conceptually simple, which facilitates the development, analysis and presentation of algorithms and algorithmic ideas. Both will be discussed in more detail in Part 2 of this book (see Chapters 6 and 8).
The Propositional Satisfiability Problem (SAT) Roughly speaking, the Propositional Satisfiability Problem is, given a formula in propositional logic, to decide whether there is an assignment of truth values to the propositional variables appearing in this formula under which the formula evaluates to ‘true’. In the following, we present a formal definition of SAT. While the details of this definition may not be crucial for comprehending the restricted forms of the problem used in the remainder of this book, they are important for a deeper understanding of the nature and properties of the general SAT problem. Propositional logic is based on a formal language over an alphabet comprising propositional variables, truth values and logical operators. Using logical operators, propositional variables and truth values are combined into propositional formulae which represent propositional statements. Formally, the syntax of propositional logic can be defined in the following way:
Definition 1.1 Syntax of Propositional Logic
S := V ∪ C ∪ O ∪ {(, )} is the alphabet of propositional logic, with V := {xi | i ∈ N} denoting the countably infinite set of propositional variables, C := {⊤, ⊥} the set of truth values (or propositional constants) true and false, and O := {¬, ∧, ∨} the set of propositional operators negation (‘not’), conjunction (‘and’) and disjunction (‘or’). The set of propositional formulae is characterised by the following inductive definition:
• the truth values ⊤ and ⊥ are propositional formulae;
• each propositional variable xi ∈ V is a propositional formula;
• if F is a propositional formula, then ¬F is also a propositional formula;
• if F1 and F2 are propositional formulae, then (F1 ∧ F2 ) and (F1 ∨ F2 ) are also propositional formulae. Only strings obtained by a finite number of applications of these rules are propositional formulae.
Remark: Often, additional binary operators, such as ‘→’ (implication) and ‘↔’ (equivalence), are used in propositional formulae. These can be defined based on the operators from Definition 1.1; hence, including them into our propositional language does not increase its expressiveness.
Assignments are mappings from propositional variables to truth values. Using the standard interpretations of the logical operators on truth values, assignments can be used to evaluate propositional formulae. Hence, the semantics of propositional logic can be defined as follows:
Definition 1.2 Semantics of Propositional Logic
The variable set Var(F) of formula F is defined as the set of all variables appearing in F. A variable assignment of formula F is a mapping a : Var(F) → {⊤, ⊥} of the variable set of F to the truth values. The set of all possible variable assignments of F is denoted by Assign(F). The value Val(F, a) of formula F under assignment a is defined inductively based on the syntactic structure of F:
• Val(⊤, a) := ⊤
• Val(⊥, a) := ⊥
• Val(xi, a) := a(xi)
• Val(¬F1, a) := ¬Val(F1, a)
• Val(F1 ∧ F2, a) := Val(F1, a) ∧ Val(F2, a)
• Val(F1 ∨ F2, a) := Val(F1, a) ∨ Val(F2, a)
The truth values ‘⊤’ and ‘⊥’ represent logical truth and falsehood, respectively; the operators ‘¬’ (negation), ‘∧’ (conjunction) and ‘∨’ (disjunction) are defined by the following truth tables:
F1   | ¬F1            F1   F2   | F1 ∧ F2 | F1 ∨ F2
⊤    | ⊥              ⊤    ⊤    | ⊤       | ⊤
⊥    | ⊤              ⊤    ⊥    | ⊥       | ⊤
                      ⊥    ⊤    | ⊥       | ⊤
                      ⊥    ⊥    | ⊥       | ⊥
Remark: There are many different notations for the truth values ‘⊤’ and ‘⊥’, including ‘0’ and ‘1’, ‘−1’ and ‘+1’, ‘T’ and ‘F’, as well as ‘TRUE’ and ‘FALSE’. Likewise, the propositional operators ‘¬’, ‘∧’ and ‘∨’ are often denoted ‘–’, ‘∗’ and ‘+’, or ‘NOT’, ‘AND’ and ‘OR’.
Because the variable set of a propositional formula is always finite, the complete set of assignments for a given formula is also finite. More precisely, for a formula containing n variables there are exactly 2^n different variable assignments. Considering the values of a formula under all possible assignments, the fundamental notion of satisfiability can be defined in the following way:
Definition 1.3 Satisfiability
A variable assignment a is a model of formula F if, and only if, Val(F, a) = ⊤; in this case we say that a satisfies F. A formula F is called satisfiable if, and only if, there exists at least one model of F. Based on the notion of satisfiability, we can now formally define the SAT problem.
Definition 1.4 The Propositional Satisfiability Problem
Given a propositional formula F , the Propositional Satisfiability Problem (SAT) is to decide whether or not F is satisfiable. Obviously, SAT can be seen as a combinatorial decision problem, where variable assignments represent candidate solutions and models represent solutions. As for any combinatorial decision problem, we can distinguish a decision variant and a search variant: in the former, only a yes/no decision regarding the satisfiability of the given formula is required; in the latter, also called the model-finding variant, in case the given formula is satisfiable, a model has to be found. Often, logical problems like SAT are studied for syntactically restricted classes of formulae. Imposing syntactic restrictions usually facilitates theoretical studies and can also be very useful for simplifying the design and analysis of
algorithms. Normal forms are types of syntactically restricted formulae such that for an arbitrary formula F there is always at least one semantically equivalent formula F′ in normal form. Thus, each normal form induces a subclass of propositional formulae which is as expressively powerful as full propositional logic. The two most commonly used normal forms, CNF and DNF, are introduced in the following definition.
Definition 1.5 Normal Forms
A literal is a propositional variable (called a positive literal) or its negation (called a negative literal). Formulae of the syntactic form c1 ∧ c2 ∧ . . . ∧ cm are called conjunctions, while formulae of the form d1 ∨ d2 ∨ . . . ∨ dm are called disjunctions. A propositional formula F is in conjunctive normal form (CNF) if, and only if, it is a conjunction over disjunctions of literals. In this context, the disjunctions are called clauses. A CNF formula F is in k-CNF if, and only if, all clauses of F contain exactly k literals. A propositional formula F is in disjunctive normal form (DNF) if, and only if, it is a disjunction over conjunctions of literals. In this case, the conjunctions are called clauses. A DNF formula F is in k-DNF if, and only if, all clauses of F contain exactly k literals.
Example 1.1 A Simple SAT Instance
Let us consider the following propositional formula in CNF:
F := (¬x1 ∨ x2 ) ∧ (¬x2 ∨ x1 ) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3 ) ∧ (x1 ∨ x2 ) ∧ (¬x4 ∨ x3 ) ∧ (¬x5 ∨ x3 )
For this formula, we obtain the variable set Var(F) = {x1, x2, x3, x4, x5}; consequently, there are 2^5 = 32 different variable assignments. Exactly one of these, x1 = x2 = ⊤, x3 = x4 = x5 = ⊥, is a model, rendering F satisfiable.
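The model of Example 1.1 can be verified mechanically. The sketch below uses a common clause-list encoding (each clause a list of signed variable indices, positive for xi and negative for ¬xi, in the spirit of the DIMACS format — a representational choice made here for illustration) and enumerates all 2^5 assignments:

from itertools import product

# F from Example 1.1: each inner list is a clause of signed variable indices.
F = [[-1, 2], [-2, 1], [-1, -2, -3], [1, 2], [-4, 3], [-5, 3]]
n = 5

def satisfies(assignment, formula):
    # assignment maps variable index i to True/False; a CNF formula is
    # satisfied iff every clause contains at least one true literal.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)

models = []
for values in product([False, True], repeat=n):
    a = {i + 1: values[i] for i in range(n)}
    if satisfies(a, F):
        models.append(a)

print(len(models))  # 1: the unique model x1 = x2 = True, x3 = x4 = x5 = False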
The Travelling Salesman Problem (TSP) The motivation behind the Travelling Salesman Problem (also known as Travelling Salesperson Problem) is the problem faced by a salesperson who needs to
visit a number of customers located in different cities and tries to find the shortest round trip accomplishing this task. In a more general and abstract formulation, the TSP is, given a directed, edge-weighted graph, to find a shortest cyclic path that visits every node in this graph exactly once. In order to define this problem formally, we first introduce the notion of a Hamiltonian cycle:
Definition 1.6 Path, Hamiltonian Cycle
Let G := (V, E, w ) be an edge-weighted, directed graph where V := {v1 , v2 , . . . , vn } is the set of n = #V vertices, E ⊆ V × V the set of (directed) edges, and w : E → R+ a function assigning each edge e ∈ E a weight w(e). A path in G is a list (u1 , u2 , . . . , uk ) of vertices ui ∈ V (i = 1, . . . , k ), such that any pair (ui , ui+1 ), i = 1, . . . , k − 1, is an edge in G. A cyclic path in G is a path for which the first and the last vertex coincide, i.e., u1 = uk in the above notation. A Hamiltonian cycle in G is a cyclic path p in G that visits every vertex of G (except for its starting point) exactly once, i.e., p = (u1 , u2 , . . . , un , u1 ) is a Hamiltonian cycle in G if, and only if, n = #V , and {u1 , u2 , . . . , un } = V . The weight of a path p can be calculated by adding up the weights of the edges in p:
Definition 1.7 Path Weight
For a given edge-weighted, directed graph G and a path p := (u1, . . . , uk) in G, the path weight w(p) is defined as w(p) := ∑_{i=1}^{k−1} w((ui, ui+1)). Now, the TSP can be formally defined in the following way:
Definition 1.8 The Travelling Salesman Problem
Given an edge-weighted, directed graph G, the Travelling Salesman Problem (TSP) is to find a Hamiltonian cycle with minimal path weight in G.
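Definitions 1.7 and 1.8 translate directly into code. Assuming, for illustration, that edge weights are given as a matrix w[i][j], a minimal sketch for evaluating a candidate round trip is:

def tour_weight(w, tour):
    # Weight of the Hamiltonian cycle (tour[0], ..., tour[n-1], tour[0]):
    # sum the weights of consecutive edges, including the closing edge.
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))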
Often, the TSP is defined in such a way that the underlying graphs are always complete graphs, that is, any pair of vertices is connected by an edge, because for any TSP instance with an underlying graph G that is not complete, one can always construct a complete graph G′ such that the TSP for G′ has exactly the same solutions as the one for G. (This is done by choosing the edge weights for
edges missing in G high enough that these edges can never occur in an optimal solution.) In the remainder of this book we will always assume that TSP instances are specified as complete graphs. Under this assumption, the Hamiltonian cycles in a given graph correspond exactly to the cyclic permutations of the underlying vertex set. Interesting subclasses of the TSP arise when the edge weighting function w has specific properties. The following definition covers some commonly used cases: Definition 1.9 Asymmetric, Symmetric and Euclidean TSP Instances
A TSP instance is called symmetric if, and only if, the weight function w of the underlying graph is symmetric, that is, if for all v, v′ ∈ V, w((v, v′)) = w((v′, v)); if w is not symmetric, the instance is called asymmetric. The Travelling Salesman Problem for asymmetric instances is also called the Asymmetric TSP (ATSP). A symmetric TSP instance satisfies the triangle inequality if, and only if, w((u1, u3)) ≤ w((u1, u2)) + w((u2, u3)) for any triple of different vertices u1, u2 and u3. A TSP instance is metric if, and only if, the vertices in the given graph correspond to points in a metric space such that the edge weight between any two vertices corresponds to the metric distance between the respective points. A TSP instance is called Euclidean if, and only if, the vertices correspond to points in a Euclidean space and if the weight function w is a Euclidean distance metric. Finally, TSP instances for which the vertices are points on a sphere and the weight function w represents geographical (great circle) distance are called geographic.
Example 1.2 A Sample (Geographic) TSP Instance
Figure 1.1 shows a geographic TSP instance with 16 vertices. The vertices of the underlying graph correspond to 16 locations Ulysses is reported to have visited on his odyssey, and the edge weights represent the geographic distances between these locations. The figure also shows the optimal solution, that is, the shortest round trip (length 6 859 km). This tour is calculated based on direct air distances and can only be travelled by a ‘modern Ulysses’ using an aircraft. This TSP instance was first described by Grötschel and Padberg [1993]; it can be found as ulysses16.tsp in the TSPLIB Benchmark Library [Reinelt, 2003].
Figure 1.1 A graphic representation of the geographic TSP instance ‘ulysses16’ and its optimal solution (dashed line); the solid line and arrows indicate the sequence in which Homer’s Ulysses supposedly visited the 16 locations. See Example 1.2 for details.
1.3 Computational Complexity A natural way for solving most combinatorial decision and optimisation problems is, given a problem instance, to search for solutions in the space of its candidate solutions. For that reason, these problems are sometimes also characterised as search problems. However, for a given instance of a combinatorial problem, the set of candidate solutions is very large, typically at least exponential in the size of that instance. For example, given a SAT instance with 100 variables, typically all 2^100 different truth assignments are considered candidate solutions. This raises the following question: ‘Is it possible to search such vast spaces efficiently?’ More precisely, we are interested in the time required for solving an instance of a combinatorial problem as a function of the size of this instance. Questions like this lie at the core of computational complexity theory, a well-established field of computer science with considerable impact on other areas. In the context of this book, complexity theory plays a role because the primary field
of application of stochastic local search algorithms is a class of computationally very hard combinatorial problems, for which no efficient algorithms are known (where efficient means polynomial run-time w.r.t. instance size). Moreover, to date, a majority of experts in complexity theory believe that, for fundamental reasons, efficient algorithms for these problems cannot exist.
Complexity of Algorithms and Problems The complexity of an algorithm is defined on the basis of formal machine models. Usually, these are idealised, yet universal models, designed in a way that facilitates formal reasoning about their behaviour. One of the first, and still perhaps the most prominent, of these models is the Turing machine. For Turing machines and other formal machine or programming models, computational complexity is defined in terms of the space and time requirements of computations. Complexity theory usually deals with problem classes (generally countable sets of problem instances) instead of single instances. For a given algorithm or machine model, the complexity of a computation is characterised by the functional dependency between the size of an instance and the time and space required to solve this instance. Here, instance size is defined as the length of a reasonably concise description; hence, for a SAT instance, its size corresponds to the length of the propositional formula (written in linear form), while the size of a TSP instance is typically proportional to the size of the underlying graph. For reasons of analytical tractability, many problems are formulated as decision problems, and time and space complexity are analysed in terms of the worst-case asymptotic behaviour. Given a suitable definition of the computational complexity of an algorithm for a specific problem, the complexity of the problem itself can be defined as the complexity of the best algorithm for this problem. Because generally time complexity is the more restrictive factor, problems are often categorised into complexity classes with respect to their asymptotic worst-case time complexity.
N P-hard and N P-complete Problems Two particularly interesting complexity classes are P, the class of problems that can be solved by a deterministic machine in polynomial time, and N P, the class of problems that can be solved by a nondeterministic machine in polynomial time. (Note that nondeterministic machines are not equivalent to machines that make random choices; they are hypothetical machines which can be thought of
as having the ability to make correct guesses for certain decisions.) Of course, every problem in P is also contained in N P, because deterministic calculations can be emulated on a nondeterministic machine. However, the question whether also N P ⊆ P, and consequently P = N P, is one of the most prominent open problems in computer science. Since many problems of great practical relevance are in N P, but possibly not in P (i.e., no polynomial-time deterministic algorithm is known), this so-called P vs N P Problem is not only of theoretical interest. For these computationally hard problems, the best algorithms known so far have exponential time complexity. Therefore, for growing problem size, the problem instances quickly become intractable, and even tremendous advances in hardware design have little effect on the size of the problem instances solvable with state-of-the-art technology in reasonable time. Many of these hard problems from N P are closely related and can be translated into each other in polynomial deterministic time (these translations are also called polynomial reductions). A problem that is at least as hard as any other problem in N P (in the sense that each problem in N P can be polynomially reduced to it) is called N P-hard. Thus, N P-hard problems in some sense can be regarded as at least as hard as every problem in N P. But they do not necessarily have to belong to the class N P themselves, as their complexity may actually be higher. N P-hard problems that are contained in N P are called N P-complete; in a certain sense, these problems are the hardest problems in N P. The SAT problem, introduced in Section 1.2, is the prototypical N P-complete problem. Historically, it was the first problem for which N P-completeness was established [Cook, 1971]. N P-completeness of SAT can be proven directly by encoding the calculations of a Turing machine M for an N P problem into a propositional formula whose models correspond to the accepting computations of M. Furthermore, it is quite easy to show that SAT remains N P-complete when restricted to CNF or even 3-CNF formulae (see, e.g., Garey and Johnson [1979]). On the other hand, SAT is decidable in linear time for DNF, for 2-CNF [Cook, 1971] and for Horn formulae [Dowling and Gallier, 1984]. Our second example problem, the TSP, is known to be N P-hard [Garey and Johnson, 1979]. The same holds for many special cases, such as Euclidean TSPs and even TSPs in which all edge weights are either one or two. In all of these cases, the associated decision problem for optimal solution quality is N P-complete. However, there exist a number of polynomially solvable special cases of the TSP, such as fractal TSP instances that are generated by so-called Lindenmayer Systems [Moscato and Norman, 1998] or specially structured Euclidean instances where, for example, all vertices lie on a circle; for an extensive overview of polynomially solvable special cases of the TSP we refer to Burkard et al. [1998b] and Gilmore et al. [1985].
Besides SAT and TSP, many other well-known combinatorial problems are N P-hard or N P-complete, including the Graph Colouring Problem, the Knapsack Problem, as well as many scheduling and timetabling problems, to name just a few [Garey and Johnson, 1979]. It should be noted that for N P-complete combinatorial decision problems, the search and decision variants are equally hard in the sense that if one could be solved deterministically in polynomial time, the same would apply to the other. This is the case because any algorithm for the search variant also solves the decision variant; and furthermore, given a decision algorithm and a specific problem instance, a solution (if existent) can be constructed by iteratively fixing solution components and deciding solubility of the resulting, modified instance (this approach requires only a polynomial number of calls to the decision algorithm). In the same sense, for N P-hard optimisation problems, the search and evaluation variants are equally hard. Furthermore, if either of these variants could be solved efficiently (i.e., in polynomial time on a deterministic machine), all decision variants could be solved efficiently as well; and if all decision variants could be solved efficiently, the same would hold for the search and evaluation variant. One fundamental result of complexity theory states that it suffices to find a polynomial time deterministic algorithm for one single N P-complete problem to prove that N P = P. This is a consequence of the fact that all N P-complete problems can be encoded into each other in polynomial time. Today, most experts believe that P ≠ N P; however, so far all efforts of finding a proof for this inequality have been unsuccessful, and there has been some speculation that today’s mathematical methods might be too weak to solve this fundamental problem.
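The construction of a solution from a decision algorithm, sketched in the parenthesis above, is easy to make explicit for SAT. In the sketch below, decide is a hypothetical decision procedure reporting whether a formula is satisfiable under a given partial assignment; the construction uses only a linear number of calls to it:

def find_model(formula, variables, decide):
    # If the formula is satisfiable, construct a model by fixing one
    # variable at a time and asking the decision procedure whether the
    # extended partial assignment can still be completed to a model.
    if not decide(formula, {}):
        return None                      # no solution exists
    assignment = {}
    for x in variables:
        assignment[x] = True
        if not decide(formula, assignment):
            assignment[x] = False        # satisfiable with x = False instead
    return assignment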
Not All Combinatorial Problems are Hard Although many combinatorial problems are N P-hard, it should be noted that not every computational task that can be formulated as a combinatorial problem is inherently difficult. A well-known example for a problem that, at first glance, might seem to require searching an exponentially large space of candidate solutions is the Shortest Path Problem: given an edge-weighted graph G (where all edge weights are positive) and two vertices u, v in G, find the shortest route from u to v, that is, the path with minimal total edge weight. Fortunately, this problem can be solved efficiently; in particular, a simple scheme for iteratively calculating the distances between u and all other vertices in the given graph, known as Dijkstra’s algorithm [Dijkstra, 1959], finds shortest paths in quadratic time w.r.t. the number of vertices in the given graph. In general, there are many other combinatorial problems
that can be solved by polynomial-time algorithms. In many cases, these efficient algorithms are based on a general method called dynamic programming (cf. [Bertsekas, 1995]).
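For illustration, the following is a minimal heap-based sketch of Dijkstra’s algorithm; this priority-queue variant is asymptotically faster than the quadratic array-based scheme mentioned above, and the adjacency-list input format is an assumption made for this example:

import heapq

def dijkstra(adj, source):
    # adj[u] is a list of (v, weight) pairs with positive weights;
    # returns a map from each reachable vertex to its shortest distance.
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float('inf')):
            continue                      # skip stale queue entries
        for v, weight in adj.get(u, []):
            if d + weight < dist.get(v, float('inf')):
                dist[v] = d + weight
                heapq.heappush(queue, (dist[v], v))
    return dist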
Practically Solving Hard Combinatorial Problems Nevertheless, many practically relevant combinatorial problems, such as scheduling and planning problems, are N P-hard and therefore generally not efficiently solvable to date (and possibly, if N P ≠ P, not efficiently solvable at all). However, N P-completeness or N P-hardness does not mean that all instances of a problem are impossible to solve efficiently in practice. Practically, there are at least three ways of dealing with these problems: • find an application-relevant subclass of the problem that can be solved efficiently; • use efficient approximation algorithms; • use stochastic approaches. Regarding the first strategy, we have to keep in mind that N P-hardness is a property of an entire problem class Π, whereas in practice, often only instances from a certain subclass Π′ ⊆ Π occur. In general, Π′ need not be N P-hard, that is, while for Π an efficient algorithm might not exist, it may still be possible to find an efficient algorithm for the subclass Π′; as an example, consider the SAT problem for 2-CNF formulae, which is polynomially solvable. Furthermore, N P-hardness results characterise the worst-case complexity of a problem, and typical problem instances may be much easier to solve. Formally, this can be captured by the notion of average-case complexity; although average-case complexity results are typically significantly harder to prove and are hence much rarer than worst-case results, empirical studies suggest that for many N P-hard problems, typical or average-case instances can be solved reasonably efficiently. The same applies to the time complexity of concrete algorithms for combinatorial problems; a well-known example is the Simplex Algorithm for linear optimisation, which has worst-case exponential time complexity [Klee and Minty, 1972], but has been empirically shown to achieve polynomial run-times (w.r.t. problem size) in the average case. In the case of an N P-hard optimisation problem that cannot be narrowed down to an efficiently solvable subclass, another option is to accept suboptimal solutions. Formally, the degree of suboptimality of a solution quality q is typically expressed in the form of the approximation ratio, defined as q/q∗ for a
minimisation problem, and q ∗/q for a maximisation problem, where q ∗ is the optimal solution quality for the given problem instance. For a given optimisation problem we can then consider associated approximation problems, in which the objective is to find solutions with an approximation ratio bounded from above by a given constant r > 1. Often, as r is increased, the computational complexity of these approximation problems decreases to the point where they become practically solvable. In some cases, allowing a relatively small margin from the optimal solution quality renders the problem deterministically solvable in polynomial time. In other cases, the approximation problem remains N P-hard, while for practically occurring problem instances, suboptimal solutions of acceptable quality can be found in reasonable time. For example, it is well known that the general TSP for instances with arbitrary edge weights is not efficiently approximable to any constant factor, that is, there is no deterministic algorithm that is guaranteed to find solutions of quality within a constant factor of the optimum for a given problem instance in polynomial time [Sahni and Gonzalez, 1976]. Yet, for instances satisfying the triangle inequality, Christofides’ polynomial-time construction algorithm guarantees an approximation ratio of at most 1.5 [Christofides, 1976]. Furthermore, in the case of Euclidean TSP instances, a polynomial time approximation scheme exists, that is, there are algorithms that find solutions for arbitrary approximation ratios larger than one in polynomial time w.r.t. instance size [Arora, 1998]. Sometimes, however, even reasonably efficient approximation methods cannot be devised, or the problem is a decision problem, to which the notion of approximation cannot be applied at all. In these cases, one further option is to focus on probabilistic rather than deterministic algorithms. At first glance, this idea seems to be appealing. After all, according to the definition of the complexity class N P, at least N P-complete problems can be efficiently solved by (hypothetical) nondeterministic machines. But this, of course, is of little practical use, since it is unlikely that such idealised machines can be built; and for an actual probabilistic algorithm there is typically merely a small chance that it can solve the given problem in polynomial time. In practice, the success probability of such an algorithm can be arbitrarily small. Nevertheless, in numerous cases, probabilistic algorithms have been found to be considerably more efficient on N P-complete or N P-hard problems than the best deterministic methods available. In other cases, probabilistic methods and deterministic methods complement each other in the sense that for certain types of problem instances one or the other has been found to be superior. SAT and TSP, the two combinatorial problems introduced previously, are amongst the most fundamental and best-known problems in this category. Finally, it should be noted that even truly exponential scaling of run-time with instance size does not necessarily rule out solving practically relevant problem instances. For theoretical purposes, complexity analysis typically focuses on
Figure 1.2 Run-time (on a logarithmic scale) as a function of instance size n; the left panel compares 10 · n^4 and 10^−6 · 2^(n/25), the right panel 10^−6 · 2^(n/25) and 10^−6 · 2^n. Left: Polynomial-time algorithms are not always better than exponential-time algorithms; in this example, for problem instances of size smaller than 1 187, the exponential-time algorithm performs better than the polynomial-time algorithm. Right: Performance differences between two exponential-time algorithms; in this example, one algorithm can solve instances up to size n = 500 in the same time required by the other algorithm for instances of size 20.
asymptotic behaviour, and for exponential scaling, constants (such as the base of the exponential) are mostly not considered. In practice, however, these constants are obviously extremely important, especially when the size of problem instances that need to be solved has reasonable upper bounds. Consider, for example, an algorithm A with time complexity 10^−6 · 2^(n/25) (where n is the problem size), and another algorithm B with time complexity 10 · n^4 (see also Figure 1.2). Of course, for big problem instances, here about n > 1 187, A quickly becomes dramatically more costly than B. However, for n ≤ 1 100, A is much more efficient than B (for n = 100, the performance ratio is larger than 10^8 in favour of the exponential-time algorithm). It is important to keep in mind that exponential complexity should be avoided whenever possible, and does eventually, as instance size grows, render the application of an algorithm infeasible. However, for many problems where exponential time complexity is unavoidable (unless P = N P), some algorithms, though exponential in time complexity, or even incomplete, can still be dramatically more efficient than others and hence make it feasible to solve the problem for practically interesting instance sizes (see Figure 1.2, right side). This is where heuristic guidance, combined with randomisation and probabilistic decisions (all of which are central issues of this book), can make the difference.
In Depth: Some Advanced Concepts in Computational Complexity
Besides the fundamental notions of N P-hardness and N P-completeness discussed above, there are a number of other concepts from complexity theory that are of interest in the context of combinatorial problems. This in-depth section briefly covers some of
these advanced concepts which, although not essential for understanding the material presented in the main text, are particularly relevant to the topics discussed later in this book. The instances of many combinatorial problems contain numbers, such as the edge weights in the TSP. For some of these problems, there are algorithms whose run-time is bounded from above by a polynomial function over |π|, the size of a given problem instance π, and Max(π), the maximum absolute value of any number occurring in π; such algorithms are called pseudo-polynomial. Note that the run-time of a pseudo-polynomial algorithm may be exponential in the size of π, and hence not have polynomial time complexity. (The reason for this is that the maximum absolute value of any number in π can be exponential in |π|.) N P-complete decision problems for which a pseudo-polynomial algorithm is known are called weakly N P-complete or N P-complete in the weak sense, while those that remain N P-complete even when Max(π) is bounded from above by a polynomial in |π| are called strongly N P-complete or N P-complete in the strong sense. A prominent example of an N P-complete problem for which a pseudo-polynomial algorithm is known is the Knapsack Problem; examples for strongly N P-complete problems include TSP and the Set Covering Problem (see Chapter 10, Section 10.3) [Garey and Johnson, 1979]. The notion of N P-hardness applies to decision and optimisation problems alike. Slightly more technically than above, N P-hardness can be defined as follows: A problem Π is N P-hard if the existence of a polynomial time algorithm for Π implies the existence of a polynomial time algorithm for all N P-complete problems (and hence, for all problems in N P). Conversely, a problem Π is N P-easy if the existence of a polynomial-time algorithm for any N P-complete problem implies the existence of a polynomial time algorithm for Π. While strictly speaking, the notion of N P-completeness only applies to decision problems, the relationship between the complexity of many hard optimisation problems and their associated decision problems can be formally captured by the concept of N P-equivalence: an optimisation problem is N P-equivalent if, and only if, it is N P-hard and N P-easy. It is well-known and relatively easy to show that the TSP is N P-equivalent, while its associated decision problem is N P-complete [Garey and Johnson, 1979]. As discussed previously, one of the strategies for solving hard combinatorial optimisation problems more efficiently is to settle for suboptimal solution qualities. The following concepts capture the computational complexity of such approximations more precisely. A polynomial-time approximation scheme (PTAS) for a combinatorial optimisation problem is an algorithm that is guaranteed to achieve an approximation ratio of r = 1 + ε for any given ε > 0 in time bounded from above by a polynomial function of the size of the given problem instance π. A polynomial-time approximation scheme whose run-time is also at most polynomial in 1/ε is called a fully polynomial-time approximation scheme (FPTAS). The classes of problems that have polynomial-time approximation schemes and fully polynomial-time approximation schemes are called PTAS and FPTAS, respectively. An example of a problem in PTAS is the Euclidean TSP [Arora, 1998], while the Knapsack Problem is known to be in FPTAS [Ibarra and Kim, 1975].
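To make pseudo-polynomiality concrete, the classical dynamic program for the Knapsack Problem runs in time O(n · C) for n items and capacity C — polynomial in the numeric value of C, but potentially exponential in the length of C’s binary encoding. A minimal sketch:

def knapsack(values, weights, capacity):
    # best[c] = maximum total value achievable with total weight <= c.
    # Run-time O(n * capacity): pseudo-polynomial, since capacity can be
    # exponential in the size of its encoding.
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):   # downwards: each item used once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]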
The complexity class APX comprises all optimisation problems for which there exists a polynomial-time algorithm that is guaranteed to find a solution within a constant factor of the optimal solution quality of any given instance. Note that APX contains PTAS. A problem Π in APX is APX-complete if the existence of a polynomial-time approximation scheme for Π implies the existence of polynomial-time approximation schemes for all problems in APX. Hence, APX-complete problems are the hardest APX problems, as
N P-complete problems are the hardest problems in N P. Prominent examples for APX-complete problems are metric TSP and MAX-SAT, a well-known optimisation variant of SAT (see also Chapter 7) [Ausiello et al., 1999].
1.4 Search Paradigms Basically all computational approaches for solving hard combinatorial problems can be characterised as search algorithms. The fundamental idea behind the search approach is to iteratively generate and evaluate candidate solutions; in the case of combinatorial decision problems, evaluating a candidate solution means to decide whether it is an actual solution, while in the case of an optimisation problem, it typically involves determining the respective value of the objective function. Although for N P-hard combinatorial problems the time complexity of finding solutions can grow exponentially with instance size, evaluating candidate solutions can often be done much more efficiently, that is, in polynomial time. For example, for a given TSP instance, a candidate solution would correspond to a round trip visiting each vertex of the given graph exactly once, and its objective function value can be computed easily by summing up the weights associated with all the edges used for that round trip. Generally, the evaluation of candidate solutions much depends on the given problem, and is often rather straightforward to implement. The fundamental differences between search algorithms are in the way in which candidate solutions are generated, which can have a very significant impact on the algorithms’ theoretical properties and practical performance. In this context, general mechanisms can be defined that are applicable to a broad range of search problems. Consequently, in the remainder of this section, we discuss various search paradigms based on their underlying approaches to generating candidate solutions.
Perturbative vs Constructive Search Candidate solutions for instances of combinatorial problems are composed of solution components, such as the assignments of truth values to individual propositional variables (atomic assignments) in the case of SAT. Hence, given candidate solutions can easily be changed into new candidate solutions by modifying one or more of the corresponding solution components. This can be characterised as perturbing a given candidate solution, and hence we classify search algorithms that rely on this mechanism for generating the candidate solutions to be tested as perturbative search methods. Applied to SAT, perturbative search would start
with one or more complete truth assignments and then at each step generate other truth assignments by changing the truth values of a number of variables in each such assignment. While for perturbative approaches, the search typically takes place directly in the space of candidate solutions, it can sometimes be useful to also include partial candidate solutions in the search space, that is, candidate solutions in which one or more solution components are missing. Examples for such partial candidate solutions are partial truth assignments for a SAT instance which leave the truth values of some propositional variables unspecified, and partial round trips for a TSP instance, which correspond to paths in the corresponding graph that visit a subset of the vertices and can be extended into Hamiltonian cycles by adding additional edges. The task of generating (complete) candidate solutions by iteratively extending partial candidate solutions can be formulated as a search problem in which typically the goal is to obtain a ‘good’ candidate solution, where for optimisation problems, the goodness corresponds to the value of the objective function. Algorithms for solving this type of problem are called constructive search methods (or construction heuristics). As a simple example, consider the following method for generating solution candidates for a given TSP instance. Start at a randomly chosen vertex in the graph, and then iteratively follow an edge with minimal weight connecting the current vertex to one of the vertices that has not yet been visited. This method generates a path that, by adding the starting vertex as a final element to the corresponding list, can be easily extended into a Hamiltonian cycle in the given graph, that is, a candidate solution for the TSP instance. This simple construction heuristic for the TSP is called the Nearest Neighbour Heuristic; on its own, it typically does not generate candidate solutions with close-to-optimal objective function values, but it is commonly and successfully used in combination with perturbative search methods (this will be discussed in more detail in Chapter 8).
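A minimal sketch of the Nearest Neighbour Heuristic, again assuming a complete graph with a weight matrix w[i][j]:

import random

def nearest_neighbour_tour(w):
    # Start at a random vertex; repeatedly follow a minimum-weight edge
    # to an unvisited vertex. The closing edge (tour[-1], tour[0])
    # completes the Hamiltonian cycle.
    n = len(w)
    start = random.randrange(n)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda v: w[current][v])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour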
Systematic vs Local Search A different, and more common, classification of search approaches is based on the distinction between systematic and local search: Systematic search algorithms traverse the search space of a problem instance in a systematic manner which guarantees that eventually either an (optimal) solution is found, or, if no solution exists, this fact is determined with certainty. This typical property of algorithms based on systematic search is called completeness. Local search algorithms, on the other hand, start at some location of the given search space and subsequently
move from the present location to a neighbouring location in the search space, where each location has only a relatively small number of neighbours, and each of the moves is determined by a decision based on local knowledge only. Typically, local search algorithms are incomplete, that is, there is no guarantee that an existing solution is eventually found, and the fact that no solution exists can never be determined with certainty. Furthermore, local search methods can visit the same location within the search space more than once. In fact, many local search algorithms are prone to getting stuck in some part of the search space which they cannot escape from without using special mechanisms, such as restarting the search process or performing some type of diversification steps. As an example for a simple local search method for SAT, consider the following algorithm: given a propositional formula F in CNF over n propositional variables, randomly pick a variable assignment as a starting point. Then, in each step, check whether the current variable assignment satisfies F. If not, randomly select a variable, and change its truth value from ⊥ to ⊤ or vice versa. Terminate the search when a model is found, or after a specified number of search steps have been performed unsuccessfully. This algorithm is called Uninformed Random Walk and will be revisited in Section 1.5. To obtain a simple systematic search algorithm for SAT, we modify this local search method in the following way. Given an ordering of the n propositional variables, with each variable assignment a we uniquely associate a number k between 0 and 2^n − 1 such that digit i of the binary representation of k is 1 if, and only if, assignment a assigns ⊤ to propositional variable i. Our systematic search algorithm starts with the variable assignment setting all propositional variables to ⊥, which corresponds to the number 0. Then, in each step we move to the variable assignment obtained by incrementing the numerical value associated with the current assignment by one. The procedure terminates when the current assignment satisfies F or after 2^n − 1 of these steps. Obviously, this procedure searches the space of all variable assignments in a systematic way and will either return a model of F or terminate unsuccessfully after 2^n − 1 steps, in which case we can be certain that F is unsatisfiable.
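Both procedures are easily rendered in code; the sketch below uses the clause-list representation of the Example 1.1 sketch (an illustrative choice, not the book’s own pseudo-code):

import random

def satisfies(a, formula):
    # CNF satisfaction check, as in the Example 1.1 sketch.
    return all(any(a[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)

def uninformed_random_walk(formula, n, max_steps):
    # Incomplete local search: flip a uniformly chosen variable until a
    # model is found or the step limit is exhausted.
    a = {i: random.choice([True, False]) for i in range(1, n + 1)}
    for _ in range(max_steps):
        if satisfies(a, formula):
            return a
        i = random.randrange(1, n + 1)
        a[i] = not a[i]                  # perturbative local search step
    return None                          # may miss an existing model

def systematic_enumeration(formula, n):
    # Complete search: interpret k = 0, ..., 2**n - 1 as assignments,
    # with bit i - 1 of k giving the truth value of variable i.
    for k in range(2 ** n):
        a = {i: bool((k >> (i - 1)) & 1) for i in range(1, n + 1)}
        if satisfies(a, formula):
            return a
    return None                          # certain: formula is unsatisfiable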
Local Search = Perturbative Search? Local search methods are often, but not always, based on perturbative search. The Uninformed Random Walk algorithm for SAT introduced previously is a typical example of a perturbative local search algorithm, because in each search step we change the truth value assigned to one variable, which corresponds to a perturbation of a candidate solution. However, local search can also be used for constructive search processes. This is exemplified by the Nearest Neighbour
Heuristic for the TSP introduced earlier in this section, where vertices are iteratively added to a given partial tour based on the weight of the edges leading to vertices adjacent to the last vertex on that tour. Clearly, this process corresponds to a constructive local search on the given graph. Generally, construction heuristics can be interpreted as constructive local search methods, and as we will see in Chapter 2, there are some prominent examples of SLS methods based on constructive local search. In many cases, constructive local search can be combined with perturbative local search. A typical example is the use of the Nearest Neighbour Heuristic for generating the starting points for a perturbative local search algorithm for the TSP. Another interesting example is Ant Colony Optimisation [Dorigo and Di Caro, 1999], which can be seen as a perturbative search method where in each step one or more constructive local searches are performed. (See also Chapter 2, Section 2.4.) Interestingly, perturbative search, although naturally associated with local search methods, can also provide the basis for systematic search algorithms. As an example, let us consider the systematic variant of the Uninformed Random Walk algorithm for SAT presented earlier in this section. The steps of this search algorithm correspond to perturbations of complete variable assignments; consequently, the algorithm can be considered a perturbative systematic search method. As this example shows, perturbative search methods can be complete. It should be noted, however, that we are presently not aware of any perturbative systematic search methods that achieve competitive performance on any hard combinatorial problem.
Constructive Search + Backtracking = Systematic Search Another interesting relationship can be established between constructive search methods and systematic search algorithms. Let us once more consider our prototypical example for constructive search, the Nearest Neighbour Heuristic for the TSP. If we modify this algorithm such that in each step of the construction process the given partial tour can be extended with arbitrary neighbours of its last vertex, it is clear that the constructive search method thus obtained can in principle find the optimal solution to any given TSP instance. Hence, an algorithm which could systematically enumerate all such constructions would obviously be guaranteed to solve arbitrary TSP instances optimally (given sufficient time), that is, it would be complete. Such a complete algorithm for the TSP can be obtained easily by combining the Nearest Neighbour Heuristic with backtracking. At each choice point of the construction algorithm (including the initial vertex), a list of all alternative
choices is kept. Once a complete tour has been generated, the search process ‘backtracks’ to the most recent choice point at which unexplored alternatives exist, and the constructive search is resumed there using an alternate vertex at this point. This backtracking process first tries alternate choices for recent decisions (which are deep in the corresponding search tree), and once all alternatives are explored for a given choice point, revisits earlier choices. In this latter case, all subsequent choice points are newly generated, that is, in our example, from that point on, we first use the Nearest Neighbour Heuristic to generate another complete tour, and then recursively continue to revise the choices made in this process. Visiting all solutions by means of a backtrack search method leads to an algorithm with at least exponential time complexity, which rapidly becomes infeasible even for relatively small problem instances. Fortunately, in many situations it is possible to prune large parts of the corresponding search tree which can be shown to not contain any solutions. For example, in the case of the TSP, the search on a given branch can be terminated if the length of the current partial tour plus a lower bound on the length of the completion of the tour exceeds the length of the shortest tour found in the search so far. This type of algorithm is called branch & bound or A∗ search in the operations research and artificial intelligence communities, respectively. For SAT, one can easily devise a backtrack algorithm that searches a binary search tree in which each node corresponds to assigning a truth value to one variable, which is then fixed for the subtree beneath that node. This tree can be pruned considerably by using unit propagation, a technique that propagates the logical consequences of particular atomic variable assignments down the search tree and effectively eliminates subtrees from the search that cannot contain a model of the given formula. Unit propagation is one of the key techniques used in all state-of-the-art systematic search algorithms for SAT. In general, systematic backtracking is a recursive mechanism which can be used to build a complete search algorithm on top of a constructive search method. This approach can be applied to basically any constructive search algorithm. Moreover, many prominent and successful systematic search algorithms can be decomposed into a constructive search method and some form of backtracking. It should be noted that the construction methods used in this context need not be as ‘greedy’ as the Nearest Neighbour Heuristic. Furthermore, although many well-known systematic search algorithms are deterministic, it is possible to combine randomised construction heuristics with backtracking in order to obtain stochastic systematic search algorithms (see, e.g., Gomes et al. [1998]). There is also some flexibility in the backtracking mechanisms, which do not have to revisit choices in the simple recursive manner indicated above; in fact, as long as there is a reasonably compact representation of all unexplored
candidate solutions, essentially any strategy that guarantees to eventually evaluate all of these leads to a complete search algorithm. In particular, this allows the order in which decisions are revisited to be randomised or dynamically changed based on search progress — approaches which provide the basis for some of the best known systematic search algorithms for combinatorial problems such as SAT.
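For concreteness, the branch & bound scheme described above for the TSP can be rendered as a small sketch; it prunes with the weight of the partial tour alone, a weak but admissible lower bound chosen here for brevity (practical solvers use far stronger bounds):

def branch_and_bound_tsp(w):
    # Systematically extend partial tours, pruning any branch whose
    # partial weight already reaches the best complete tour found so far.
    n = len(w)
    best = [float('inf'), None]          # best weight and tour found so far

    def extend(tour, cost):
        if cost >= best[0]:              # bound: this branch cannot improve
            return
        if len(tour) == n:
            total = cost + w[tour[-1]][tour[0]]   # close the cycle
            if total < best[0]:
                best[0], best[1] = total, tour[:]
            return
        for v in range(n):
            if v not in tour:            # branch: try each unvisited vertex
                tour.append(v)
                extend(tour, cost + w[tour[-2]][v])
                tour.pop()

    extend([0], 0)                       # fixing the start vertex loses nothing
    return best[0], best[1]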
Advantages and Disadvantages of Local Search It might appear that due to their incompleteness, local search algorithms are generally inferior to systematic methods. But as will be shown later, this is not the case. Firstly, many problems are of a constructive nature and their instances are known to be solvable. In this situation, the goal of any search algorithm is to generate a solution rather than just to decide whether one exists. This holds in particular for optimisation problems, such as the Travelling Salesman Problem (TSP), where the actual problem is to find a solution of sufficiently high quality, but also for underconstrained decision problems, which are not uncommon in practice. Obviously, the main advantage of a complete algorithm — its ability to detect that a given problem instance has no solution — is not relevant for finding solutions of solvable instances. Secondly, in a typical application scenario the time to find a solution is often limited. Examples for such real-time problems can be found in virtually all application domains. Actually one might argue that almost every real-world problem involving interaction with the physical world, including humans, has real-time constraints. Common examples are real-time production scheduling, robot motion planning and decision making, most game playing situations, and speech recognition for natural language interfaces. In these situations, systematic algorithms often have to be aborted after the given time has been exhausted, which, of course, renders them incomplete. This is particularly problematic for certain types of systematic optimisation algorithms that search through spaces of partial solutions without computing complete solutions early in the search (this is the case for many dynamic programming algorithms); if such a systematic algorithm is aborted prematurely, usually no solution candidate is available, while in the same situation local search algorithms typically return the best solution found so far. Ideally, algorithms for real-time problems should be able to deliver reasonably good solutions at any point during their execution. For optimisation problems this typically means that run-time and solution quality should be positively correlated; for decision problems one could guess a solution when a time-out
occurs, where the accuracy of the guess should increase with the run-time of the algorithm. This so-called any-time property of algorithms is usually difficult to achieve, but in many situations the local search paradigm is naturally suited for devising any-time algorithms. Generally, systematic and local search algorithms are somewhat complementary in their applications. An example for this can be found in Kautz and Selman’s work on solving SAT-encoded planning problems, where a fast local search algorithm is used for finding solutions whose optimality is proven by means of a systematic search algorithm [Kautz and Selman, 1996]. As we will discuss later in more detail, local search algorithms are often advantageous in certain situations, particularly if reasonably good solutions are required within a short time, if parallel processing is used and if the knowledge about the problem domain is rather limited. In other cases, particularly when provably optimal solutions are required, time constraints are less important and some knowledge about the problem domain can be exploited, systematic search may be the better choice. There is also some evidence that for certain problems, different types of instances are more effectively solved using local or systematic search methods, respectively. Unfortunately, to date the general question of when to prefer local search over systematic methods and vice versa remains mostly unanswered.
1.5 Stochastic Local Search Many widely known and high-performance local search algorithms make use of randomised choices in generating or selecting candidate solutions for a given combinatorial problem instance. These algorithms are called stochastic local search (SLS) algorithms, and they constitute one of the most successful and widely used approaches for solving hard combinatorial problems. SLS algorithms have been used for many years in the context of combinatorial optimisation problems. Among the most prominent algorithms of this kind we find the Lin-Kernighan Algorithm for the Travelling Salesman Problem [Lin and Kernighan, 1973], as well as general methods such as Evolutionary Algorithms (see, e.g., Bäck [1996]) and Simulated Annealing [Kirkpatrick et al., 1983] (these SLS methods will be presented and discussed in Chapter 2). More recently, it has become evident that stochastic local search algorithms can also be very successfully applied to the solution of N P-complete decision problems such as the Graph Colouring Problem (GCP) [Hertz and de Werra, 1987; Minton et al., 1992] or the Satisfiability Problem in propositional logic (SAT) [Selman et al., 1992; Gu, 1992; Selman et al., 1994].
A General Definition of Stochastic Local Search As outlined in the previous section, local search algorithms generally work in the following way. For a given instance of a combinatorial problem, the search for solutions takes place in the space of candidate solutions. Note that this search space may include partial candidate solutions, as required in the context of constructive search algorithms. The local search process is started by selecting an initial candidate solution, and then proceeds by iteratively moving from one candidate solution to a neighbouring candidate solution, where the decision on each search step is based on a limited amount of local information only. (See also Figure 1.3.) In stochastic local search algorithms, these decisions as well as the search initialisation can be randomised. Furthermore, the search process may use additional memory, for example, for storing a limited number of recently visited candidate solutions. Formally, a stochastic local search algorithm can be defined in the following way: Definition 1.10 Stochastic Local Search Algorithm
Given a (combinatorial) problem Π, a stochastic local search algorithm for solving an arbitrary problem instance π ∈ Π is defined by the following components:
Figure 1.3 Illustration of stochastic local search. Left: ‘Bird’s-eye view’ of a search space region; s marks a solution and c the current search position; neighbouring candidate solutions are connected by lines. Right: In each step, the search process moves to a neighbouring search position that is chosen based on local information only; here, the elevation of search positions indicates a heuristic value that is used for selecting the search steps to be performed.
• the search space S(π) of instance π, which is a finite set of candidate solutions s ∈ S (also called search positions, locations, configurations or states);
• a set of (feasible) solutions S′(π) ⊆ S(π);
• a neighbourhood relation on S(π), N(π) ⊆ S(π) × S(π);
• a finite set of memory states M(π), which, in the case of SLS algorithms that do not use memory, may consist of a single state only;
• an initialisation function init(π) : ∅ → D(S(π) × M(π)), which specifies a probability distribution over initial search positions and memory states;
• a step function step(π) : S(π) × M(π) → D(S(π) × M(π)) mapping each search position and memory state onto a probability distribution over its neighbouring search positions and memory states;
• a termination predicate terminate(π) : S(π) × M(π) → D({⊤, ⊥}) mapping each search position and memory state to a probability distribution over truth values (⊤ = true, ⊥ = false), which indicates the probability with which the search is to be terminated upon reaching a specific point in the search space and memory state.
In the above, D(S) denotes the set of probability distributions over a given set S, where formally, a probability distribution D ∈ D(S) is a function D : S → R+0 that maps elements of S to their respective probabilities.
Remark: In this definition, all components depend on the given problem instance π. Formally, they could be defined as (higher-order) functions mapping the given problem instance onto the corresponding search space, solution set, etc. While this is a straightforward extension of the definition given above, for increased readability, we specify the components as instantiated for a given problem instance; furthermore, we will often omit the formal reference to the problem instance by writing S instead of S(π), etc.
Any neighbourhood relation N(π) can equivalently be specified in the form of a function N: S(π) → 2^S(π) that maps candidate solutions s ∈ S to the sets of their respective direct neighbours N(s) := {s′ ∈ S | N(s, s′)} ⊆ S; the set N(s) is called the neighbourhood set, or just the neighbourhood, of s.
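To illustrate this correspondence, the following Python sketch converts a neighbourhood relation, given as a set of pairs, into the equivalent neighbourhood function; the explicit set-based representation is chosen purely for illustration and is only practical for very small search spaces.

def neighbourhood_function(S, N_rel):
    """Map each candidate solution s to its neighbourhood set N(s)."""
    return {s: {t for (u, t) in N_rel if u == s} for s in S}

# Toy search space with a symmetric neighbourhood relation:
S = {0, 1, 2}
N_rel = {(0, 1), (1, 0), (1, 2), (2, 1)}
N = neighbourhood_function(S, N_rel)
assert N[1] == {0, 2}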
The combination of search position and memory state forms the state of the SLS algorithm, or search state. In the simplest case, search states solely consist of the respective candidate solution, and no additional memory is used; this is formally captured by M(π) := {m0}, where m0 is an arbitrary constant. If additional memory is used, the memory state can consist of multiple independent attributes, that is, M(π) := M1 × M2 × . . . × Ml(π) for some instance-dependent constant l(π). Although M(π) can, in principle, represent a number of memory states that is exponential in the size of the given problem instance, it typically has a compact (i.e., polynomially bounded) representation. The memory state can be used to represent information that the algorithm uses to control the search process, such as the temperature parameter in Simulated Annealing or the tabu status of solution components in Tabu Search (cf. Chapter 2), but also simple bookkeeping mechanisms, such as an iteration counter that can be used, for instance, in the context of a restart mechanism.

As an alternative to the initialisation and step functions, one can also specify initialisation and step procedures that draw an element from the probability distributions init(π)() and step(π)(s, m) for a given search position s and memory state m. The same holds for the termination predicate. (The notation step(π)(s, m) and init(π)() reflects the fact that these components are formally defined as higher-order functions. For example, step is instantiated through its first argument π into an instance-specific step function that has s and m as its two arguments.) In the remainder of this book, we will use both types of definitions interchangeably, where init(π), step(π, s, m) and terminate(π, s, m), when used in algorithm outlines, represent the procedures realising the probabilistic selection from the corresponding probability distributions. In cases where no additional memory is used, that is, #M(π) = 1, we will often write step(π, s) and terminate(π, s) instead of step(π, s, m) and terminate(π, s, m).

Based on the components of the definition, the algorithm outlines in Figures 1.4 and 1.5 specify the semantics of stochastic local search algorithms for the search variants of decision and optimisation problems, respectively. The only major difference between the two versions is that for optimisation problems, the best candidate solution found so far, the so-called incumbent solution, is memorised and returned upon termination of the algorithm (if it is a feasible solution); in this context, the objective function f for the given problem is used to determine the quality of candidate solutions. Furthermore, for decision problems, the termination condition is typically satisfied as soon as a solution is found, that is, s ∈ S′. In the case of optimisation problems, however, finding a feasible solution s ∈ S′ is typically not a sufficient termination criterion; in fact, many SLS algorithms for optimisation problems search through spaces containing feasible solutions only, that is, S′ = S. It may be noted that any SLS algorithm realises a Markov process; in particular, the behaviour of an SLS algorithm from a given search state (s, m) does not depend on any aspects of the search history that led to that state, except for the information captured in s and m.
procedure SLS-Decision(π)
  input: problem instance π ∈ Π
  output: solution s ∈ S′(π) or ∅
  (s, m) := init(π);
  while not terminate(π, s, m) do
    (s, m) := step(π, s, m);
  end
  if s ∈ S′(π) then
    return s
  else
    return ∅
  end
end SLS-Decision

Figure 1.4 General outline of a stochastic local search algorithm for a decision problem Π.
procedure SLS-Minimisation(π′)
  input: problem instance π′ ∈ Π′
  output: solution s ∈ S′(π′) or ∅
  (s, m) := init(π′);
  ŝ := s;
  while not terminate(π′, s, m) do
    (s, m) := step(π′, s, m);
    if f(π′, s) < f(π′, ŝ) then
      ŝ := s;
    end
  end
  if ŝ ∈ S′(π′) then
    return ŝ
  else
    return ∅
  end
end SLS-Minimisation

Figure 1.5 General outline of a stochastic local search algorithm for a minimisation problem Π′ with objective function f; ŝ is the incumbent solution, that is, the best candidate solution found at any time during the search so far.
Example 1.3 A Simple SLS Algorithm for SAT
For a given SAT instance, that is, a CNF formula F , we define the search space as Assign(F ), the set of all possible variable assignments of F . Obviously, the
set of solutions is then given by the set of all models (satisfying assignments) of F. A frequently used neighbourhood relation is the so-called one-flip neighbourhood, which defines two variable assignments to be direct neighbours if, and only if, they differ in the truth value of exactly one variable, while agreeing on the assignment of the remaining variables. Formally, this can be written in the following way: for all a, a′ ∈ Assign(F), N(a, a′) if, and only if, there exists v′ ∈ Var(F) such that Val(v′, a) ≠ Val(v′, a′), and for all v ∈ Var(F) − {v′}, Val(v, a) = Val(v, a′).

The search mechanism we will specify in the following does not use any memory, and hence we define M := {0}. As an initialisation function, let us consider an 'uninformed' random selection realised by a uniform distribution over the entire search space. This initialisation function randomly selects any assignment of F with equal probability. Formally, it can be written as init()(a′, m) := init()(a′) := 1/#S = 1/2^n, where a′ ∈ S is an arbitrary variable assignment of F and n is the number of variables appearing in F. (Note that formally, init() = init(F)() is a probability distribution and init()(a′) denotes the probability of a′ under the distribution init(). According to our earlier convention, we omit the problem instance from the notation of init and step when it is clear from the context.)

Analogously, we can define a step function that maps any variable assignment a to the uniform distribution over all its neighbouring assignments. Formally, if N(a) := {a′ ∈ S | N(a, a′)} is the set of all assignments neighbouring to a, the step function can be defined as step(a, m)(a′, m) := step(a)(a′) := 1/#N(a) = 1/n. This SLS algorithm is called uninformed random walk; as one might imagine, it is quite ineffective, since it does not provide any mechanism for steering the search towards solutions of the given problem instance.
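As a concrete illustration, the following Python sketch realises the initialisation and step procedures of this uninformed random walk; representing assignments as lists of truth values (and the dummy memory state as the constant 0) is one possible choice, not prescribed by the definition.

import random

def init_uniform(n):
    """Uniform random assignment: each of the 2^n assignments of the n
    variables is selected with probability 1/2^n."""
    return [random.choice([True, False]) for _ in range(n)], 0

def step_random_walk(a, m):
    """One-flip neighbourhood: flip a uniformly chosen variable, i.e.,
    move to each of the n neighbouring assignments with probability 1/n."""
    a = a[:]                          # neighbours differ in exactly one variable
    v = random.randrange(len(a))
    a[v] = not a[v]
    return a, m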
Neighbourhoods and Neighbourhood Graphs

Generally, the choice of an appropriate neighbourhood relation is crucial for the performance of an SLS algorithm, and often this choice needs to be made in a problem-specific way. Nevertheless, there are standard types of neighbourhood relations that form the basis for many successful applications of stochastic local search. Among the most widely used are the so-called k-exchange neighbourhoods, in which two candidate solutions are neighbours if, and only if, they differ in at most k solution components. The neighbourhood used in the simple SAT algorithm from Example 1.3 (as well as in most state-of-the-art SLS algorithms for SAT) is a 1-exchange neighbourhood. For the TSP, one could define a k-exchange neighbourhood such that
from a given candidate round trip, all its direct neighbours can be reached by changing the positions of at most k vertices in the corresponding permutation. However, this neighbourhood relation was found to be inferior to a different type of k-exchange neighbourhood, in which the edges of the given graph are viewed as the solution components, and two candidate round trips are k-exchange neighbours if, and only if, one can be obtained from the other by removing at most k edges and rewiring the resulting partial tours [Reinelt, 1994]. Figure 1.6 illustrates two tours that are neighbours under this latter 2-exchange neighbourhood.

Figure 1.6 Schematic view of a single SLS step based on the standard 2-exchange neighbourhood relation for the TSP.

Any neighbourhood relation N induces a (directed) graph on the underlying search space S; in this neighbourhood graph G_N, two vertices s, s′ are connected by an edge (s, s′) if, and only if, (s, s′) ∈ N. Formally, for a given problem instance π, the neighbourhood graph induced by search space S(π) and neighbourhood relation N(π) is defined as G_N(π) := (S(π), N(π)). Many important properties of the neighbourhood relation are reflected directly in the neighbourhood graph. For instance, most standard neighbourhood relations (such as the k-exchange neighbourhoods introduced above) are symmetric, that is, ∀s, s′ ∈ S: (N(s, s′) ⇔ N(s′, s)); this means that the neighbourhood graph is symmetric in its edges and essentially corresponds to an undirected graph. This property is relevant in practice, because it is a necessary precondition for an SLS algorithm's ability to directly reverse search steps. The degree of each vertex in the neighbourhood graph corresponds to the size of its neighbourhood. In many cases, in particular for k-exchange neighbourhoods, all vertices of G_N have the same degree (i.e., the underlying neighbourhood graph is regular). Another important property of the neighbourhood graph is its diameter, diam(G_N), which gives a worst-case lower bound on the number of search steps required for reaching (optimal) solutions from arbitrary points in the search space. Neighbourhood graphs and their properties are further discussed in Chapter 5.
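For concreteness, the following Python sketch enumerates the 2-exchange neighbours of a candidate round trip represented as a permutation of vertices; the index bounds are chosen so that every neighbour differs from the given tour in exactly two edges.

def two_exchange(tour, i, j):
    """Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]), then
    rewire the two partial tours, which reverses the enclosed segment."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def two_exchange_neighbours(tour):
    """Enumerate all 2-exchange neighbours of a tour (O(n^2) of them)."""
    n = len(tour)
    for i in range(n - 2):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # reversing the whole remainder yields the same round trip
            yield two_exchange(tour, i, j)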
Search Strategies, Steps and Trajectories

Typically, the first three components of our definition of an SLS algorithm, the search space, solution set and neighbourhood relation, depend very much on the problem being solved. Together, these components provide the basis for solving a given problem using stochastic local search. But based on a given definition of a search space, solution set and neighbourhood relation, a wide range of search strategies, specified by the definition of initialisation and step functions, can be devised. To some extent, such search strategies can be independent of the underlying search space, solution set and neighbourhood, and consequently can be studied and presented separately from these. In this context, the following concepts are often useful:

Definition 1.11 Search Steps and Search Trajectories
Let Π be a (combinatorial) problem, and let π ∈ Π be an arbitrary instance of Π. Given an SLS algorithm A for Π according to Definition 1.10, a search step (also called move) is a pair (s, s′) ∈ S × S of neighbouring search positions such that the probability of A moving from s to s′ is greater than zero, that is, N(s, s′) and step(s)(s′) > 0. A search trajectory is a finite sequence (s0, s1, . . . , sk) of search positions si (i = 0, . . . , k) such that for all i ∈ {1, . . . , k}, (si−1, si) is a search step and the probability of initialising the search at s0 is greater than zero, that is, init()(s0, m) > 0 for some m ∈ M.
For the simple SLS algorithm for SAT introduced in Example 1.3, each search step is an arbitrary pair of neighbouring variable assignments, and a search trajectory is a sequence of variable assignments in which each pair of successive elements is neighbouring; obviously such a trajectory corresponds to a sequence of search steps. In general, any search trajectory corresponds to a walk in the neighbourhood graph.
Uninformed SLS: Random Picking and Random Walk

The two (arguably) simplest SLS strategies are Uninformed Random Picking and Uninformed Random Walk. Neither uses memory, and both are based on an initialisation function that returns the uniform distribution over the entire search space. SLS algorithms based on this initialisation function randomly select any element of the search space S with equal probability as a starting point for the search process.
For Uninformed Random Picking, a complete neighbourhood relation is used (i.e., N = S × S), and the step function maps each point in S to a uniform distribution over all its neighbours, that is, over every point in S. Effectively, this strategy randomly samples the search space, drawing a new candidate solution in every step. Uninformed Random Walk uses the same initialisation function, but for a given, arbitrary neighbourhood relation N ⊆ S × S, its step function returns the uniform distribution over the set of neighbours of the given candidate solution, which implements a uniform random selection from that neighbourhood in each step. Obviously, for the complete neighbourhood relation, this coincides with Uninformed Random Picking; for more restricted neighbourhoods, it leads to a strategy that more closely resembles the intuitive notion of local search. As one might imagine, both of these uninformed SLS strategies are quite ineffective, since they do not provide any mechanism for steering the search towards solutions. Nevertheless, as we will see later, in combination with more directed search strategies, both Uninformed Random Picking and variants of Uninformed Random Walk play a role in preventing or overcoming premature search stagnation in more complex and much more effective SLS algorithms.
Evaluation Functions

To improve on the simple uninformed SLS strategies discussed above, a mechanism is needed for guiding the search towards solutions. For a given instance π of a decision problem, this can be achieved using an evaluation function g(π): S(π) → ℝ that maps each search position onto a real number in such a way that the global optima of g(π) correspond to the solutions of π. In the following, we will use the notation g(π, s) instead of g(π)(s) in the context of algorithm outlines, and we will often write g(s) when the problem instance π is clear from the context; analogously, we use f(π, s) and f(s) instead of f(π)(s) to denote objective function values. The evaluation function is used for assessing or ranking candidate solutions in the neighbourhood of the current search position. The efficacy of the guidance thus provided depends on properties of the evaluation function and its integration into the search mechanism being used. Typically, the evaluation function is problem-specific, and its choice is to some degree dependent on the search space, solution set and neighbourhood underlying the SLS approach under consideration. In the case of SLS algorithms for combinatorial optimisation problems, the objective function characterising the problem is often used as an evaluation function, such that the values of the evaluation function correspond directly to the
quantity to be optimised. However, sometimes different evaluation functions can provide more effective guidance towards high-quality or optimal solutions. For example, in the case of unweighted MAX-SAT, an optimisation variant of SAT in which the objective is to maximise the number of satisfied clauses, local search algorithms with better theoretical approximation guarantees can be obtained by using a specific evaluation function different from the number of clauses satisfied by a given assignment [Khanna et al., 1994] (see also Chapter 7). For combinatorial decision problems, evaluation functions are sometimes naturally suggested by the objective functions of optimisation variants, but often there is more than one obvious choice of an evaluation function. In the case of SLS algorithms for SAT, the following evaluation function g is often used: given a formula F in CNF and an arbitrary variable assignment a of F, g(F, a) is defined as the number of clauses of F that are unsatisfied under a. Obviously, the models of F correspond to the global minima of g and are characterised by g(F, a) = 0. It may be noted that this evaluation function corresponds to the objective function of the previously mentioned unweighted MAX-SAT problem.
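As an illustration, this evaluation function can be implemented in a few lines of Python; the representation of CNF formulae as lists of clauses over signed integer literals is an assumption made for this sketch only.

def g(F, a):
    """Number of clauses of F that are unsatisfied under assignment a;
    a clause is a list of literals, where literal v > 0 stands for
    variable v and v < 0 for its negation; a maps variables to truth values."""
    def satisfied(clause):
        return any(a[abs(l)] if l > 0 else not a[abs(l)] for l in clause)
    return sum(1 for clause in F if not satisfied(clause))

# Example: (x1 ∨ ¬x2) ∧ (x2) ∧ (¬x1 ∨ x2); a is a model iff g(F, a) == 0.
F = [[1, -2], [2], [-1, 2]]
assert g(F, {1: True, 2: True}) == 0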
Remark: In the literature, often no distinction is made between an objective function and an evaluation function. To minimise potential confusion between the definition of the problem to be solved (which, in the case of an optimisation problem, includes an objective function) and the definition of an SLS algorithm for solving this problem (which might make use of an evaluation function different from the problem's objective function), we systematically distinguish between the two concepts in this book.

Generally, through the use of an evaluation function whose global optima correspond to the (optimal) solutions, decision problems and optimisation problems can be treated analogously. However, for a decision problem, the result of the SLS algorithm is generally useless unless it is a global optimum of the evaluation function and hence corresponds to a solution. For optimisation problems, suboptimal solutions (usually local minima) can be very useful — in which case the respective evaluation function should guide the algorithm to high-quality solutions as effectively as possible (which might complicate, or conflict with, providing effective guidance towards optimal solutions).

Remark: In the literature, the evaluation function is often treated as an integral part of the definition of an SLS algorithm. Although it is technically possible to define SLS algorithms using the concept of an evaluation function instead of that of a step function, the resulting definitions would capture the concept of stochastic local search less naturally and would lead to unnecessarily complex or imprecise representations of certain SLS algorithms.
These difficulties specifically arise for SLS algorithms that use multiple or dynamically changing evaluation functions (such techniques are prominent and successful in various domains). Using our definition, in many cases the concept of an evaluation function still provides a useful and convenient means for structuring the definition of step functions.
Iterative Improvement

One of the most basic SLS algorithms using an evaluation function is Iterative Improvement. Given a search space S, solution set S′, neighbourhood relation N and evaluation function g, Iterative Improvement starts from a randomly selected point in the search space and then tries to improve the current candidate solution w.r.t. g. The initialisation function is typically the same as in Uninformed Random Picking, that is, for arbitrary s ∈ S, init()(s) := 1/#S. Furthermore, if for a given candidate solution s, I(s) is the set of all neighbouring candidate solutions s′ ∈ N(s) for which g(s′) < g(s), then the step function can be formally defined as:

step(s)(s′) := 1/#I(s) if s′ ∈ I(s), and step(s)(s′) := 0 otherwise.
This SLS strategy is also known as iterative descent or hill-climbing, where the latter name is motivated by the application of Iterative Improvement to maximisation problems. Note that in the case where none of the neighbours of a candidate solution s realises an improvement w.r.t. the evaluation function, step(s) is not a probability distribution. Hence, when using this step function, the search process is terminated as soon as this case is encountered — an obviously unsatisfactory mechanism, which we will revisit shortly.

Example 1.4 Iterative Improvement for SAT
Using the same definition for the search space, solution set, neighbourhood relation and set of memory states as in Example 1.3 (page 41f.), we consider the evaluation function g that maps each variable assignment a to the number of clauses of the given formula F that are unsatisfied under a. Iterative Improvement then starts the search at a randomly selected variable assignment (like Uninformed Random Walk, see Example 1.3), and in each step, it randomly selects one of the neighbouring assignments that leave fewer clauses unsatisfied than the current candidate solution. Since, according to the definition of the neighbourhood relation, each search step corresponds to flipping the truth value associated with one of the variables appearing
in F, Iterative Improvement can be seen as always performing variable flips that increase the overall number of satisfied clauses.

To implement iterative improvement algorithms efficiently, evaluation function values are typically maintained using incremental updates (also called delta evaluations) after each search step. This is done by calculating the effects of the differences between the current candidate solution s and a neighbouring candidate solution s′ on the evaluation function value. Since in many cases the evaluation function value of a candidate solution consists of independent contributions of its individual solution components (or of small subsets of solution components), this can often be achieved by solely considering the contributions of those solution components that are not common to s and s′. For example, in the case of the TSP, where the solution components correspond to the edges of the given graph, when using the standard 2-exchange neighbourhood, neighbouring round trips p and p′ differ in two edges. Given w(p), the weight of p, the weight w(p′) can be incrementally computed by subtracting the weights of the edges contained in p but not in p′ and adding the weights of the edges contained in p′ but not in p. Note how in this example, using incremental updating, the computation of w(p′) requires at most four arithmetic operations, regardless of the number n of vertices in the given graph, compared to n arithmetic operations if w(p′) is computed from scratch.
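The following Python sketch illustrates this delta evaluation for the symmetric TSP under the 2-exchange neighbourhood; the edge weight function w is assumed given, and the indices correspond to the move that removes the edges following positions i and j of the tour.

def delta_two_exchange(w, tour, i, j):
    """Weight change of the 2-exchange removing edges (a, b) and (c, d)
    and adding edges (a, c) and (b, d): only four edge weights are
    inspected, independent of the number of vertices n."""
    n = len(tour)
    a, b = tour[i], tour[(i + 1) % n]
    c, d = tour[j], tour[(j + 1) % n]
    return (w(a, c) + w(b, d)) - (w(a, b) + w(c, d))

# w(p') = w(p) + delta_two_exchange(w, p, i, j), computed in constant time.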
Local Minima

In our definition of Iterative Improvement, the step function is not well-defined for candidate solutions that do not have any improving neighbours. A candidate solution with this property corresponds to a local minimum of the evaluation function g. Formally, this is captured in the following definition:

Definition 1.12 Local Minimum, Strict Local Minimum
Given a search space S, a solution set S′ ⊆ S, a neighbourhood relation N ⊆ S × S and an evaluation function g: S → ℝ, a local minimum is a candidate solution s ∈ S such that for all s′ ∈ N(s), g(s) ≤ g(s′). We call a local minimum s a strict local minimum if for all s′ ∈ N(s), g(s) < g(s′). (Local maxima and strict local maxima can be defined analogously.)
Note that under this definition, global minima of the evaluation function are also considered to be local minima. Intuitively, local minima, and even more so,
strict local minima, are positions in the search space from which no single search step can achieve an improvement w.r.t. the evaluation function. In cases where an SLS algorithm guided by an evaluation function encounters a local minimum that does not correspond to a solution, this algorithm can ‘get stuck’. This happens, for example, when an Iterative Improvement algorithm is defined in such a way that it terminates (or just stays at the same candidate solution) when a local optimum is encountered. There are no general (non-trivial) theoretical bounds on the solution quality of local optima for arbitrary combinatorial optimisation problems. While such bounds have been proven for specific problems (e.g., the Euclidean TSP [Chandra et al., 1999]), general guarantees can only be given for complete neighbourhood relations, in which case any local minimum is also a global minimum. Yet, the size of such complete neighbourhoods is typically exponential w.r.t. instance size, and therefore they cannot be searched reasonably efficiently in practice. However, typical instances of combinatorial optimisation problems can be empirically shown to have high-quality local optima which often can be found reasonably efficiently by high-performance SLS algorithms.
Computational Complexity of Local Search

While empirically, local minima of basically any instance of a combinatorial optimisation problem can be found reasonably fast, theoretically, in most cases the number of steps needed by an iterative improvement algorithm to find a local optimum cannot be bounded by a polynomial. However, any local search algorithm should at the very least be able to execute individual local search steps efficiently. This idea gives rise to the complexity class PLS [Johnson et al., 1988]. Intuitively, PLS is the class of problems for which a local search algorithm exists in which initial positions and search steps, as well as the evaluation function values of search positions, can always be computed in polynomial time (w.r.t. instance size) on a deterministic machine. This means that local optimality can be verified efficiently or, in case a candidate solution is not locally optimal, a neighbouring solution of better quality can be generated in polynomial time. Note that this theoretical concept does not include any statement on the number of local search steps required for reaching a local optimum. Analogously to the notion of NP-completeness, the class of PLS-complete problems is defined in such a way that it captures the hardest problems in PLS. If for any of these problems local optima can be found in polynomial time, the same would hold for all problems in PLS. It is conjectured that the class of problems whose local optima can be found in polynomial time is a strict subset of PLS; hence, in the worst case, superpolynomial run-time may be required by any algorithm to find local minima of a PLS-complete problem. The first well-known combinatorial optimisation
problem that was shown to be PLS-complete is the partitioning of weighted graphs under the Kernighan-Lin neighbourhood [Kernighan and Lin, 1970]. The TSP under the neighbourhood induced by a variant of the Lin-Kernighan Algorithm [Lin and Kernighan, 1973], one of the most efficient local search algorithms for the TSP, has also been shown to be PLS-complete [Papadimitriou, 1992]. Furthermore, PLS-completeness has been shown for Iterative Improvement algorithms for the TSP that are based on the standard k-exchange neighbourhood with sufficiently large k > 3 [Krentel, 1989], while the question of PLS-completeness when using 2- or 3-exchange neighbourhoods remains open [Johnson and McGeoch, 1997; Yannakakis, 1997].
Escape Strategies

In many cases, local minima are quite common (this will be further discussed in Chapter 5), and for optimisation problems, the corresponding candidate solutions are typically not of sufficiently high quality. Consequently, techniques for avoiding or escaping from local minima are of central importance in SLS algorithm design, and a large number of such mechanisms have been proposed and evaluated in the literature. Many of these are discussed in detail or mentioned in passing in the following chapters; specifically, the next chapter introduces some of the most prominent and successful approaches for avoiding search stagnation due to local minima. Therefore, we restrict the present discussion to two very simple methods.

One straightforward way of modifying Iterative Improvement such that local minima are dealt with more reasonably is to simply reinitialise the search process whenever a local minimum is encountered. While this simple restart strategy can work reasonably well when the number of local minima is rather small or restarting the algorithm is not very costly (in terms of the overhead for initialising data structures, etc.), in many cases it is rather ineffective. Alternatively, one can relax the improvement criterion and, when a local minimum is encountered, perform a randomly chosen non-improving step. This can be realised as a uniform random selection among all neighbours of the current search position (which corresponds to an Uninformed Random Walk step), or by randomly selecting one of the neighbours that result in the lowest increase in evaluation function value (this corresponds to a 'mildest ascent' step and is closely related to a variant of Iterative Improvement that will be discussed in more detail in Chapter 2). For neither of these two mechanisms is there any guarantee that the search algorithm effectively escapes from arbitrary local minima, because the nature of a local minimum can be such that after any such 'escape step', the only improving step available leads directly back into the same local minimum.
Furthermore, in the case of non-strict local minima, minimally worsening steps will lead to walks in so-called plateaus — regions of neighbouring candidate solutions with identical evaluation function values. Such plateaus can be very extensive (cf. Chapter 5), and it can be difficult to decide whether the search process is trapped in a plateau region that does not allow any further improvement without an effective escape mechanism.
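The following Python sketch illustrates a single search step of an iterative improvement algorithm extended with the 'mildest ascent' escape mechanism described above; note that at a non-strict local minimum, the selected neighbour may have the same evaluation function value as the current position, which yields exactly the plateau walks just discussed.

import random

def improvement_or_mildest_ascent_step(s, neighbours, g):
    """Perform an ordinary improvement step if possible; otherwise,
    randomly select one of the least-worsening neighbours."""
    N = list(neighbours(s))
    improving = [t for t in N if g(t) < g(s)]
    if improving:
        return random.choice(improving)        # regular iterative improvement step
    best = min(g(t) for t in N)                # local minimum: mildest ascent
    return random.choice([t for t in N if g(t) == best])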
Intensification vs Diversification

As we will show in more detail in later chapters, the strong randomisation of local search algorithms, that is, the use of stochastic choice as an integral part of the search process, can lead to significant increases in performance and robustness. However, with this potential comes the need to balance randomised and goal-directed components of the search strategy, a trade-off that is often characterised as 'diversification vs intensification'. Intensification refers to search strategies that aim to greedily improve solution quality or the chances of finding a solution in the near future, for instance by exploiting the guidance given by the evaluation function. Diversification strategies try to prevent search stagnation by making sure that the search process achieves a reasonable coverage when exploring the search space and does not get trapped in relatively confined regions that do not contain (sufficiently high-quality) solutions. In this sense, Iterative Improvement is an intensification strategy, while Uninformed Random Walk is a diversification strategy; as we will see in the next chapter, both strategies can be combined into an SLS approach called Randomised Iterative Improvement, which typically shows improved performance over both pure search methods.

A large variety of techniques for combining and balancing intensification and diversification strategies has been proposed, and to some extent these will be presented and discussed in the remainder of this book. While the resulting SLS algorithms often perform very well in practice, their behaviour is typically not well understood. The successful application of these algorithms is often based on intuition and experience rather than on theoretically or empirically derived principles and insights, particularly when it comes to the trade-off between diversification and intensification. While in this context problem-specific knowledge is often (if not typically) crucial for achieving peak performance and robustness, a solid understanding of the various types of SLS methods, combined with detailed knowledge of their properties and characteristics, is of at least equal importance. The latter is especially relevant in cases where one of the reasons for applying SLS algorithms is a lack of sufficient specific knowledge about the problem to be solved; in this situation, where specialised algorithms are typically not available, SLS algorithms are attractive because they often allow solving the problem
reasonably efficiently using fairly generic and easily implementable algorithms. More importantly, for many hard combinatorial problems, such generic SLS methods can also quite naturally be extended with, or adapted based on, problem-specific knowledge as it becomes available. The specialised SLS algorithms thus obtained are often amongst the best-known techniques for solving these problems, and specifically for large instances of optimisation problems or under tight constraints on time and other computational resources, in many cases they represent the only known methods for finding (high-quality) solutions in practice.

In the following chapters, we will introduce and discuss a broad range of SLS algorithms, covering many state-of-the-art generic SLS methods. Our discussion will focus on underlying general properties and design principles, such as the combination of search strategies and methods for balancing intensification and diversification aspects of search. Later, we will show in detail how these general methods are applied and adapted to specific combinatorial problems, yielding high-performance or state-of-the-art algorithms for solving these problems.
In Depth: Randomness and Probabilistic Computation

Implementations of randomised algorithms almost always realise all random choices and decisions by means of a pseudo-random number generator (PRNG) [Knuth, 1997]. PRNGs are provided by basically all modern programming environments; they are based on deterministically generated cycles of integers, from which floating point numbers can be obtained by appropriate scaling. PRNGs should satisfy the following conditions:

• The generated sequence of numbers should be serially uncorrelated, that is, n-tuples from the sequence should be independent of one another.
• The generator should have a long period; while ideally it should not cycle at all, in practice, the repetition should occur only after a very large set of numbers has been generated.
• The sequence should be uniform and unbiased, that is, equal fractions of generated numbers should fall into equal intervals.
• The algorithm underlying the PRNG and its implementation should be as efficient as possible.
To date, most PRNGs are based on three types of methods: linear congruential generators (LCGs), lagged Fibonacci generators (LFGs) and the Mersenne Twister (MT) [Matsumoto and Nishimura, 1998]. Especially the Mersenne Twister presents an interesting alternative to standard PRNGs, which are typically based on linear congruential generators. The MT19937 version of the Mersenne Twister has a period of 2^19937 − 1 and has been shown to generate sequences of excellent quality. It is also very efficient (according to empirical measurements, it is up to four times faster than the standard rand() function in C/C++); furthermore, implementations in many programming languages are freely and publicly available [Matsumoto, 2003].
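As a concrete point of reference, the random module of the Python standard library is itself based on the Mersenne Twister; the following sketch shows how seeding the PRNG makes randomised runs perfectly reproducible (a point taken up below in the context of derandomisation) and how uniform PRNG output suffices for biased discrete choices (the walk probability of 0.05 is an arbitrary illustrative value).

import random

# Two identically seeded Mersenne Twister generators produce identical streams,
# which makes any run of a randomised algorithm exactly reproducible.
rng1 = random.Random(42)
rng2 = random.Random(42)
assert [rng1.random() for _ in range(5)] == [rng2.random() for _ in range(5)]

# Uniform PRNG output also realises biased discrete choices:
rng = random.Random(42)
step_type = 'walk' if rng.random() < 0.05 else 'improve'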
Basically all PRNGs produce uniformly distributed numbers. In most SLS algorithms, random decisions involve uniform or biased choices from a finite set; these can easily be implemented using a uniform random number generator. Sometimes, however, it is desirable to sample from a different type of distribution, such as a normal or exponential distribution. This can be achieved by appropriately chosen transformations of the output of a uniform random number generator, such as the well-known Box-Muller transformation, which generates a pair of normally distributed random values from a pair of uniformly distributed values [Box and Muller, 1958].

As an alternative to using pseudo-random numbers generated by PRNGs, true random numbers can be used; these can be obtained in various ways, all of which are based on sampling and processing a source of entropy, such as the decay of radioactive material or atmospheric noise from a radio. (Of course, whether these physical phenomena are truly random is ultimately unclear; however, for all practical purposes, they appear to be random according to the commonly used criteria.) Random numbers from such sources are freely and publicly available from several websites (cf. Haahr [2003]; Walker [2003]); furthermore, there are various commercial hardware devices that can be plugged into a standard PC (empirical studies suggest that not all of these consistently produce high-quality random number sequences). Compared to efficient PRNGs, these true random number generators are very slow, which can be a serious drawback in the context of heavily randomised algorithms, such as many SLS algorithms or Monte Carlo simulations, which may require millions of random numbers per second.

It is well known that for implementing certain types of probabilistic algorithms, especially Monte Carlo simulations of physical systems, it is of crucial importance to use a high-quality (pseudo-)random number generator. In the case of SLS algorithms, the issue is less clear. There is no empirical evidence that true random number sequences offer any advantages over the sequences obtained from state-of-the-art pseudo-random number generators, and given the general availability and efficiency of the latter, there is no reason to use lower-quality PRNGs or true random number sources.

It may be noted that by implementing a probabilistic algorithm using a PRNG, what is implemented is effectively a derandomised, entirely deterministic version of the algorithm. As previously mentioned, with current hardware this has the advantage of higher efficiency; it also greatly facilitates debugging, since any run of the algorithm can be perfectly reproduced by initialising the PRNG with the same seed. This raises the question of whether this derandomisation can result in a loss of computational power or efficiency. From a theoretical point of view, SLS algorithms, like all algorithms that use randomised decisions, are based on a probabilistic model of computation, such as the probabilistic Turing machine [de Leeuw et al., 1955; Gill, 1977; Rabin, 1976], which can be seen as a variant of a conventional deterministic Turing machine that has access to an arbitrary number of random bits, each of which is zero or one with probability 1/2, and which can be used at arbitrary points in the computation. Note that there is an important difference between nondeterministic and probabilistic machine models.
Nondeterministic models — which are used, for example, in the definition of the complexity class NP — can intuitively be seen as having the ability to make nondeterministic guesses such that computation time is minimised. (Alternatively, nondeterministic machines can be viewed as pursuing all possible paths of computation simultaneously, such that only the shortest of these potential computations determines the run-time.) Probabilistic machine models, on the other hand, can be seen as making actual randomised choices; consequently, each possible path of computation corresponds
to a set of such mutually independent random choices and has a probability associated with it. This gives rise to a probability distribution over computation paths and, consequently, over run-times.

Probabilistic models of computation and randomised algorithms are of substantial interest in complexity theory. For decision problems, one can distinguish between three types of randomised algorithms: depending on the probability of giving incorrect 'yes' and 'no' answers, respectively, there are algorithms with zero, one-sided and two-sided error. SLS algorithms, as considered in this book, are generally of the first type, which is also known as the class of Las Vegas algorithms. (These are formally defined and further discussed in Chapter 4.) The class of problems that can be solved by probabilistic algorithms with zero error probability in expected run-time that is polynomial in the size of the given input is known as ZPP. Another prominent probabilistic complexity class is BPP, the class of problems that can be solved in polynomial time (w.r.t. the size of the given input) in the worst case by a probabilistic algorithm with two-sided error probability bounded from above by 1/2 − ε for some ε > 0 [Papadimitriou, 1994].

While it is known that ZPP ⊆ BPP, many questions regarding the relationships between these probabilistic complexity classes and other complexity classes, in particular P and NP, remain open. It is rather easy to see that P ⊆ ZPP (and hence also P ⊆ BPP); furthermore, it is known that ZPP ⊆ NP. Assuming P ≠ NP, it is not known whether ZPP is a proper superset of P or a proper subset of NP. Furthermore, the relationship between BPP and NP is unknown. Interestingly, it is strongly believed that P = BPP (see, for example, Kabanets [2002]), and hence also P = ZPP, which suggests that, from a theoretical point of view, the use of true randomisation may not substantially improve our ability to solve hard combinatorial problems. The fact that, empirically, the derandomisation of probabilistic algorithms by use of high-quality PRNGs typically does not appear to result in performance losses is consistent with this belief and, in fact, is commonly seen as additional support for it. There is little doubt, however, that the typical properties of true and pseudo-random number sequences stated above are crucial for the excellent performance of many SLS methods and other probabilistic algorithms.
1.6 Further Readings and Related Work

Due to the introductory nature of this chapter, there is a huge body of literature related to the concepts presented here. Introductions to combinatorial problems and search methods can be found in many modern or classic textbooks on combinatorial optimisation, operations research or artificial intelligence (such as Aarts and Lenstra [1997], Lawler [1976], Nemhauser and Wolsey [1988], Papadimitriou and Steiglitz [1982], Poole et al. [1998], Reeves [1993b], Rayward-Smith et al. [1996], Russell and Norvig [2003], etc.); details on heuristic search can also be found in Pearl [1984]. For a slightly different definition of combinatorial optimisation problems, we refer to the classical text by Papadimitriou and Steiglitz [1982]. A detailed discussion of complexity theory, NP-completeness and
NP-hard problems can be found in Garey and Johnson [1979], Papadimitriou [1994] or Reischuk [1990]. For a general reference to recent research on the Propositional Satisfiability Problem, we refer the interested reader to the book edited by van Maaren, Gent and Walsh [2000] and to the overview article by Gu et al. [1997]. For details on, and a large number of variants of, the TSP, we refer to the now classical book edited by Lawler et al. [1985] or the monograph by Reinelt [1994]. For a detailed account of the state of the art in TSP solving with SLS algorithms up to 1997, the book chapter by Johnson and McGeoch [1997] is the best reference; results for more recent variants are collected in a book chapter by Johnson and McGeoch [2002] and on the web pages for the 8th DIMACS Challenge on the TSP [Johnson et al., 2003a]. Regarding stochastic local search methods for SAT, early work includes the studies by Selman et al. [1992] and Gu [1992], while some of the better-performing algorithms have been presented by McAllester et al. [1997]. For an overview and comparison of the best-performing SLS algorithms for SAT up to the year 2000, we refer to Hoos and Stützle [2000a]. Further details on state-of-the-art algorithms for SAT and the TSP, as well as further references for these problems, can be found in Chapters 6 and 8 of this book.
1.7 Summary

This chapter started with a brief introduction to combinatorial problems and distinguished between two main types of problems, decision and optimisation problems. We introduced the Propositional Satisfiability Problem (SAT) and the Travelling Salesman Problem (TSP) as two prototypical combinatorial problems. Both problems are conceptually simple and easy to state, which facilitates the design and analysis of algorithms. At the same time, they are computationally hard and appear at the core of many real-world applications; hence, these problems pose a constant challenge for the development of new algorithmic techniques for solving hard combinatorial problems. Many combinatorial problems, including SAT and TSP, are NP-hard; consequently, there is little hope for finding algorithms with better than exponential worst-case behaviour. However, this does not imply that all instances of these problems are intrinsically hard. Interesting or application-relevant subclasses of hard combinatorial problems can be efficiently solvable. For many optimisation problems, there are efficient approximation algorithms that can find good solutions reasonably efficiently. Additionally, stochastic algorithms can help in solving combinatorial problems more robustly and efficiently in practice.
Next, we discussed various search paradigms and highlighted their relations and properties. We distinguished perturbative local search methods, which operate on fully instantiated candidate solutions, from constructive search algorithms, which iteratively extend partial candidate solutions. Combinations of constructive search algorithms with backtracking lead to complete, systematic search methods that are traditionally known as tree search or refinement search techniques. Local search algorithms, which move between candidate solutions based on local information only, have the advantage of being easily applicable to a broad range of combinatorial problems, for many of which they have been shown to be the most effective solution methods. Furthermore, they are typically rather easy to implement and often have attractive any-time properties. But these advantages come at a price: local search algorithms are typically incomplete and, particularly in the case of stochastic local search methods, they are generally difficult to analyse – an issue that will be addressed in more detail in Chapters 4 and 5. Finally, we gave a general definition of stochastic local search algorithms that covers both perturbative and constructive methods within a unified framework. Based on this definition, we introduced and discussed a number of simple SLS strategies, such as Iterative Improvement, which forms the basis of many of the more complex SLS methods presented in the next chapter.
Exercises

1.1 [Easy] Consider the following Graph Colouring Problem: Given a graph G := (V, E) with vertex set V and edge relation E, assign a minimal number of colours c1, c2, . . . , ck to the vertices such that two vertices that are connected by an edge in E are never assigned the same colour. Show how this problem fits the (informal) definition of a combinatorial problem from Section 1.1, and state the different decision and optimisation variants.
1.2 [Medium] Recall the puzzle from the prologue: Last week my friends Anne, Carl, Eva, Gustaf and I went out for dinner every night, Monday to Friday. I missed the meal on Friday because I was visiting my sister and her family. But otherwise, every one of us had selected a restaurant for a particular night and served as a host for that dinner. Overall, the following restaurants were selected: a French bistro, a sushi bar, a pizzeria, a Greek restaurant, and the Brauhaus. Eva took us out on Wednesday. The Friday dinner was at the Brauhaus. Carl, who doesn't eat sushi, was the first host. Gustaf had selected the
bistro for the night before one of the friends took everyone to the pizzeria. Tell me, who selected which restaurant for which night? Formalise this puzzle as a SAT instance.

1.3 [Hard] Consider the problem of finding a Hamiltonian cycle in a given (undirected) graph (cf. Definition 1.6, page 21). Show how the known result that the Hamiltonian Cycle Problem is NP-hard implies the NP-hardness of the TSP for graphs in which all edge weights are equal to one or two. (Hint: You need to show that any polynomial-time deterministic TSP algorithm could be used for solving (suitably encoded) instances of the Hamiltonian Cycle Problem.)
1.4 [Hard] Consider the following argument. For the Euclidean TSP, given an arbitrary approximation ratio r > 1, there exists a deterministic algorithm that achieves that ratio in polynomial run-time w.r.t. the number of vertices, n. Hence, the associated decision problems for arbitrary solution quality bounds can be solved by a deterministic algorithm with run-time polynomial in n, which implies that the search variant of the Euclidean TSP is also efficiently solvable. (Note that this conclusion is in direct contradiction with the known result that the Euclidean TSP is NP-hard.) Why is this argument flawed? (Hint: Think carefully about the nature of the solution quality bounds.)
1.5 [Easy] Given an arbitrary TSP instance G, does the Nearest Neighbour Heuristic (see Section 1.4, page 31ff.) always return the same solution, that is, does G have a uniquely defined nearest neighbour tour? (Justify your answer.)
1.6 [Easy] Consider the following recursive algorithm for SAT:

procedure DP-SAT(F, A)
  input: propositional formula F, partial truth assignment A
  output: true or false
  if A satisfies F then
    return true
  end
  if ∃ unassigned variable in A then
    randomly select variable x that is unassigned in A;
    A′ := A extended by x := ⊤;
    A′′ := A extended by x := ⊥;
  end
  if DP-SAT(F, A′) = true or DP-SAT(F, A′′) = true then
    return true
  else
    return false
  end
end DP-SAT
Which search paradigm does this algorithm implement and which of the properties discussed in Section 1.4 does it possess?

1.7 [Medium] Design a simple complete stochastic local search algorithm for SAT and show how it fits Definition 1.10 (page 38f.). Show that your algorithm is complete and discuss the practical importance of this completeness result.
1.8 [Medium] Consider the following alternative definition of a stochastic local search algorithm. Given a (combinatorial) problem Π, a stochastic local search algorithm for solving an arbitrary problem instance π ∈ Π is defined by the following components:

• a (directed) search graph G(π) := (V, E), where the elements of V are the candidate solutions of π and the arcs in E connect any candidate solution to those candidate solutions that can be reached in one search step;
• an evaluation function fπ, which assigns a numerical value fπ(s) to each candidate solution s and whose global maxima correspond to the (optimal) solutions of π;
• an initialisation procedure init(π), which determines a candidate solution at which the search process is started;
• an iteration procedure iter(π), which for any candidate solution s selects a candidate solution s′ such that (s, s′) ∈ E;
• a termination function terminate(π), which for a given candidate solution determines whether the search is to be terminated (this function can make use of a random number generator and a limited amount of memory on earlier events in the search process).

Is this definition equivalent to Definition 1.10 (page 38f.), that is, does it cover the same class of algorithms? Discuss the differences between the definitions and try to decide which one is better.
1.9 [Medium] Consider the decision variant of the Graph Colouring Problem as described in Exercise 1.1. Design an iterative improvement algorithm for this problem.
1.10 [Easy] Consider the following Conflict-Directed Random Walk algorithm for SAT:

procedure CDRW-SAT(F)
  input: CNF formula F
  output: model of F or ‘no solution found’
  a := randomly chosen assignment of the variables in formula F;
  while not (a is a model of F) do
    c := randomly chosen clause in F that is unsatisfied under a;
    v := randomly chosen variable from c;
    a := a with v flipped;
  end
  return ‘no solution found’;
end CDRW-SAT

Let N be the neighbourhood relation under which assignments a and a′ are direct neighbours if, and only if, a′ can be reached from a in a single search step according to this algorithm. Is N symmetric? (Justify your answer.)
Perfection has been attained not when nothing remains to be added but when nothing remains to be taken away.
—Antoine de Saint-Exupéry, Pilot & Writer
2 SLS Methods

Stochastic Local Search (SLS) is a widely used approach to solving hard combinatorial optimisation problems. Underlying most, if not all, specific SLS algorithms are general SLS methods that can be applied to many different problems. In this chapter, we present some of the most prominent SLS methods and illustrate their application to hard combinatorial problems, using SAT and the TSP as example domains. The techniques covered here range from simple iterative improvement algorithms to complex SLS methods, such as Ant Colony Optimisation and Evolutionary Algorithms. For each of these SLS methods, we motivate and describe the basic technique and discuss important variants. Furthermore, we identify and discuss important characteristics and features of the individual methods and highlight relationships between them.
2.1 Iterative Improvement (Revisited)

In Chapter 1, Section 1.5, we introduced Iterative Improvement as one of the simplest, yet reasonably effective SLS methods. We have pointed out that one of the main limitations of Iterative Improvement is the fact that it can, and often does, get stuck in local minima of the underlying evaluation function. Here, we discuss how using larger neighbourhoods can help to alleviate this problem without rendering the exploration of local neighbourhoods prohibitively expensive.
Large Neighbourhoods

As pointed out before, the performance of any stochastic local search algorithm depends significantly on the underlying neighbourhood relation and, in particular, on the size of the neighbourhood. Consider the standard k-exchange neighbourhoods introduced in Chapter 1, Section 1.5. It is easy to see that for growing k, the size of the neighbourhood (i.e., the number of direct neighbours of each given candidate solution) also increases. More precisely, for a k-exchange neighbourhood, the size of the neighbourhood is in O(n^k), that is, the neighbourhood size increases exponentially with k. Generally, larger neighbourhoods contain more and potentially better candidate solutions, and hence they typically offer better chances for finding locally improving search steps. They also lead to neighbourhood graphs with smaller diameters, which means that an SLS trajectory can potentially explore different regions of the underlying search space more easily. In a sense, the ideal case would be a neighbourhood relation for which any locally optimal candidate solution is guaranteed to be globally optimal. Neighbourhoods that satisfy this property are called exact; unfortunately, in most cases exact neighbourhoods are exponentially large with respect to the size of the given problem instance, and searching for an improving neighbouring candidate solution may take exponential time in the worst case. (Efficiently searchable exact neighbourhoods exist in a few cases; for example, the Simplex Algorithm in linear programming is an iterative improvement algorithm that uses a polynomially searchable, exact neighbourhood and is hence guaranteed to find a globally optimal solution.)

This situation illustrates a general trade-off: using larger neighbourhoods might increase the chance of finding (high-quality) solutions of a given problem in fewer local search steps when using SLS algorithms in general and Iterative Improvement in particular; but at the same time, the time complexity of determining improving search steps is much higher in larger neighbourhoods. Typically, the time complexity of an individual local search step needs to be polynomial w.r.t. the size of the given problem instance. However, depending on problem size, even quadratic or cubic time per search step might already be prohibitively high if the instance is very large.
Neighbourhood Pruning

Given the trade-off between the benefits of using large neighbourhoods and the associated time complexity of performing search steps, one attractive idea for improving the performance of Iterative Improvement and other SLS algorithms is to use large neighbourhoods but to reduce their size by never examining neighbours
that are unlikely to (or that provably cannot) yield any improvements in evaluation function value. While in many cases the use of large neighbourhoods is only practically feasible in combination with such pruning, the same pruning techniques can also be applied to relatively small neighbourhoods, where they can lead to substantial improvements in SLS performance.

For the TSP, one such pruning technique that has been shown to be useful in practice is the use of candidate lists, which contain, for each vertex in the given graph, a limited number of its closest direct neighbours, ordered according to increasing edge weight. The search steps performed by an SLS algorithm are then limited to consider only edges connecting a vertex i to one of the vertices in i's candidate list. The use of such candidate lists is based on the intuition that high-quality solutions are likely to include short edges between neighbouring vertices (cf. Figure 1.1, page 23). In the case of the TSP, pruning techniques have been shown to have a significant impact on local search performance not only for large neighbourhoods, but also for rather small neighbourhoods, such as the standard 2-exchange neighbourhood. Other neighbourhood pruning techniques identify neighbours that provably cannot lead to improvements in the evaluation function, based on insights into the properties of a given problem. An example of such a pruning technique is described by Nowicki and Smutnicki [1996a] in their tabu search approach to the Job Shop Problem, which will be described in Chapter 9.
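The following Python sketch constructs such candidate lists for a TSP instance; the edge weight function w is assumed given, and the list length k = 10 is an arbitrary illustrative choice.

def candidate_lists(vertices, w, k=10):
    """For each vertex, the k closest other vertices, ordered according
    to increasing edge weight; search steps can then be restricted to
    edges leading into these lists."""
    return {u: sorted((v for v in vertices if v != u),
                      key=lambda v: w(u, v))[:k]
            for u in vertices}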
Best Improvement vs First Improvement

Another method for speeding up local search is to select the next search step more efficiently. In the context of iterative improvement algorithms, the search step selection mechanism that implements the step function from Definition 1.10 (page 38f.) is also called the pivoting rule [Yannakakis, 1990]; the most widely used pivoting rules are the so-called best improvement and first improvement strategies described in the following.

Iterative Best Improvement is based on the idea of randomly selecting in each search step one of the neighbouring candidate solutions that achieve a maximal improvement in the evaluation function. Formally, the corresponding step function can be defined as follows: given a search position s, let g* := min{g(s') | s' ∈ N(s)} be the best evaluation function value in the neighbourhood of s. Then I*(s) := {s' ∈ N(s) | g(s') = g*} is the set of maximally improving neighbours of s, and we define step(s)(s') := 1/#I*(s) if s' ∈ I*(s), and 0 otherwise. Best Improvement is also called greedy hill-climbing or discrete gradient descent. Note that Best Improvement requires a complete evaluation of all neighbours in each search step.
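To make the definition concrete, the following minimal Python sketch implements a single best improvement step. The functions neighbours and g, which enumerate N(s) and evaluate candidate solutions, are assumed to be supplied by the problem-specific part of an implementation; they are not part of the book's formalism.

import random

def step_best_improvement(s, neighbours, g):
    # Evaluate the complete neighbourhood N(s), determine the best
    # evaluation function value g*, and select uniformly at random
    # from the set I*(s) of maximally improving neighbours.
    candidates = list(neighbours(s))
    g_star = min(g(t) for t in candidates)
    best = [t for t in candidates if g(t) == g_star]
    return random.choice(best)

Note that, exactly as stated above, this step requires evaluating every neighbour; for pure iterative improvement, the caller would terminate as soon as the returned neighbour is no better than s.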
The First Improvement neighbour selection strategy tries to avoid the time complexity of evaluating all neighbours by performing the first improving step encountered during the inspection of the neighbourhood. Formally, Iterative First Improvement is best defined by means of a step procedure rather than a step function. At each search position s, the First Improvement step procedure evaluates the neighbouring candidate solutions s' ∈ N(s) in a particular fixed order, and the first s' for which g(s') < g(s), that is, the first improving neighbour encountered, is selected. Obviously, the order in which the neighbours are evaluated can have a significant influence on the efficiency of this strategy. Instead of using a fixed ordering for evaluating the neighbours of a given search position, random orderings can also be used. For fixed evaluation orderings, repeated runs of Iterative First Improvement starting from the same initial solution will end in the same local optimum, while with random orderings, many different local optima can be reached. In this sense, random-order First Improvement inherently leads to a certain diversification of the search process. The following example illustrates the variability of the candidate solutions reached by random-order First Improvement.
Example 2.1 Random-Order First Improvement for the TSP
In this example, we empirically study a random-order first improvement algorithm for the TSP that is based on the 2-exchange neighbourhood. This algorithm always starts from the same initial tour, which visits the vertices of the given graph in their canonical order (i.e., in the order v1, v2, ..., vn, v1). Furthermore, when initialising the search, a random permutation of the integers from 1 to n is generated, which determines the order in which the neighbourhood is scanned in each search step. (This permutation remains unchanged throughout the search process.) As usual for simple iterative improvement methods, the search is terminated when a local minimum of the given evaluation function (here: the weight of the candidate tour) is encountered. This algorithm was run 1 000 times on pcb3038, a TSP instance with 3 038 vertices available from the TSPLIB benchmark library. For each of these runs, the length of the final, locally optimal tour was recorded. Figure 2.1 shows the cumulative distribution of the percentage deviations of these solution quality values from the known optimal solution. (The cumulative distribution function specifies, for each relative solution quality value q on the x-axis, the relative frequency with which a solution quality smaller than or equal to q is obtained.) Clearly, there is a large degree of variation in the qualities of the 1 000 tours produced by our random-order iterative first improvement algorithm. The average tour length is 8.6% above the known optimum.
[Figure 2.1: Cumulative distribution of the solution quality returned by a random-order first improvement 2-exchange algorithm for the TSP on TSPLIB instance pcb3038, based on 1 000 runs of the algorithm. (x-axis: relative solution quality [%]; y-axis: cumulative frequency)]
The 0.05- and 0.95-quantiles of this solution quality distribution are 7.75% and 9.45% above the optimum, respectively. Based on the shape of this empirical distribution, it can be conjectured that the solution quality data follow a normal distribution. This hypothesis can be tested using the Shapiro-Wilk test [Shapiro and Wilk, 1965], a statistical goodness-of-fit test that specifically checks whether given sample data are normally distributed. In this example, the test accepts the hypothesis that the solution quality data follow a normal distribution with mean 8.6 and standard deviation 0.51 at a p-value of 0.2836. (We refer to Chapter 4 for more details on statistical tests.) Normally distributed solution qualities occur rather frequently in the context of SLS algorithms for hard combinatorial problems, such as the TSP.
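For readers who wish to reproduce this kind of analysis, the Shapiro-Wilk test is available in standard statistics packages; the following sketch uses scipy.stats.shapiro, with synthetic stand-in data taking the place of the measured solution qualities from the 1 000 runs.

import numpy as np
from scipy import stats

# Stand-in for the 1000 measured relative solution qualities; in a real
# analysis these values would come from the experiment, not from sampling.
rng = np.random.default_rng(0)
qualities = rng.normal(loc=8.6, scale=0.51, size=1000)

W, p_value = stats.shapiro(qualities)
print(f"W = {W:.4f}, p = {p_value:.4f}")
# A p-value above the chosen significance level (e.g., 0.05) means the
# normality hypothesis cannot be rejected.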
As for large neighbourhoods, in the context of pivoting rules there is a tradeoff between the number of search steps required for finding a local optimum and the computation time for each search step. Search steps in first improvement algorithms can often be computed more efficiently than in best improvement algorithms, since in the former case, typically only a small part of the local neighbourhood is evaluated, especially as long as there are multiple improving search steps from the current candidate solution. However, the improvement obtained by each step of First Improvement is typically smaller than for Best Improvement and therefore, more search steps have to be performed in order to reach a local optimum. Additionally, Best Improvement benefits more than First Improvement from the use of caching and updating mechanisms for evaluating neighbours efficiently.
Remark: Besides First Improvement and Best Improvement, iterative improvement algorithms can use a variety of other pivoting rules. One example is Random Improvement, which randomly selects a candidate solution from the set I(s) := {s' ∈ N(s) | g(s') < g(s)}; this selection strategy can be implemented as First Improvement where a new random evaluation ordering is used in each search step. Another example is the least improvement rule, which selects an element of I(s) that minimally improves the current candidate solution.
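The first improvement strategy and its ordering-based variants can be sketched in a few lines of Python; the evaluation ordering is passed in explicitly, so a fixed permutation generated at initialisation (as in Example 2.1) or a fresh random permutation per step (Random Improvement) can be realised with the same code. As before, neighbours and g are assumed to be provided elsewhere.

import random

def step_first_improvement(s, neighbours, g, order=None):
    # Scan the neighbourhood in the given evaluation order and return the
    # first strictly improving neighbour; None signals a local optimum.
    candidates = list(neighbours(s))
    if order is not None:
        candidates = [candidates[i] for i in order]  # fixed ordering
    else:
        random.shuffle(candidates)                   # fresh random ordering
    g_s = g(s)
    for t in candidates:
        if g(t) < g_s:
            return t
    return None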
Variable Neighbourhood Descent

Another way to benefit from the advantages of large neighbourhoods without incurring a high time complexity of the search steps is based on the idea of using standard, small neighbourhoods until a local optimum is encountered, at which point the search process switches to a different (typically larger) neighbourhood, which might allow further search progress. This approach exploits the fact that the notion of a local optimum is defined relative to a neighbourhood relation: a candidate solution s that is locally optimal w.r.t. a neighbourhood relation N1 need not be a local optimum w.r.t. a different neighbourhood relation N2.

The general idea of changing the neighbourhood during the search has been systematised by the Variable Neighbourhood Search (VNS) framework [Mladenović and Hansen, 1997; Hansen and Mladenović, 1999]. VNS comprises a number of algorithmic approaches, including Variable Neighbourhood Descent (VND), an iterative improvement algorithm that realises the general idea behind VNS in a very straightforward way. In VND, k neighbourhood relations N1, N2, ..., Nk are used, which are typically ordered according to increasing size. The algorithm starts with neighbourhood N1 and performs iterative improvement steps until a local optimum is reached. Whenever no further improving step is found for a neighbourhood Ni and i + 1 ≤ k, VND continues the search in neighbourhood Ni+1; if an improvement is obtained in Ni, the search process switches back to N1, from where the search is continued as previously described. An algorithm outline for VND is shown in Figure 2.2. In general, there are variants of this basic VND method that switch between neighbourhoods in different ways.

It has been shown that Variable Neighbourhood Descent can considerably improve the performance of iterative improvement algorithms, both w.r.t. the solution quality of the local optima reached and w.r.t. the time required for finding (high-quality) solutions, compared to using standard Iterative Improvement in large neighbourhoods [Hansen and Mladenović, 1999]. It may be noted that apart from VND, there are several other variants of the general idea underlying Variable Neighbourhood Search.
procedure VND(π, N1, N2, ..., Nk)
  input: problem instance π ∈ Π, neighbourhood relations N1, N2, ..., Nk
  output: solution ŝ ∈ S'(π) or ∅

  s := init(π);
  ŝ := s;
  i := 1;
  repeat
    find best candidate solution s' in neighbourhood Ni(s);
    if g(s') < g(s) then
      s := s';
      if f(s) < f(ŝ) then
        ŝ := s;
      end
      i := 1;
    else
      i := i + 1;
    end
  until i > k;
  if ŝ ∈ S' then
    return ŝ
  else
    return ∅
  end
end VND

Figure 2.2 Algorithm outline for Variable Neighbourhood Descent for optimisation problems; note that the evaluation function g is used for checking whether the search has reached a local minimum, while the objective function of the given problem instance, f, is used for detecting improvements in the incumbent candidate solution. For further details, see text.
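A compact Python rendering of the outline in Figure 2.2 might look as follows; for simplicity, this sketch uses a single function g both as evaluation and as objective function, whereas Figure 2.2 distinguishes g from f.

def vnd(s, neighbourhoods, g):
    # neighbourhoods: list of functions, each mapping a candidate solution
    # to an iterable of its neighbours, ordered by increasing size.
    i = 0
    while i < len(neighbourhoods):
        best = min(neighbourhoods[i](s), key=g, default=None)
        if best is not None and g(best) < g(s):
            s = best      # improving step found: switch back to N_1
            i = 0
        else:
            i += 1        # no improvement in N_i: try the next neighbourhood
    return s              # locally optimal w.r.t. all k neighbourhoods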
Some of these — in particular, Basic VNS and Skewed VNS [Hansen and Mladenović, 2002] — are conceptually closely related to Iterated Local Search, a hybrid SLS method that will be discussed later in this chapter (cf. Section 2.3).
Variable Depth Search

A different approach to selecting search steps from large neighbourhoods reasonably efficiently is to compose more complex steps from a number of steps in small, simple neighbourhoods. This idea is the basis of Variable Depth Search
(VDS), an SLS method first introduced by Kernighan and Lin for the Graph Partitioning Problem [1970] and the TSP [1973]. Generally, VDS can be seen as an iterative improvement method in which the local search steps are variable-length sequences of simpler search steps in a small neighbourhood. Constraints on the feasible sequences of simple steps help to keep the time complexity of selecting complex steps reasonably low. (For an algorithm outline of VDS, see Figure 2.3.) As an example of a VDS algorithm, consider the Lin-Kernighan (LK) Algorithm for the TSP, which performs iterative improvement using complex search steps, each of which corresponds to a sequence of 2-exchange steps.
procedure VDS(π)
  input: problem instance π ∈ Π
  output: solution ŝ ∈ S'(π) or ∅

  s := init(π);
  ŝ := s;
  while not terminate(π, s) do
    t := s;
    t̂ := t;
    repeat
      t := selectBestFeasibleNeighbour(π, t);
      if f(t) < f(t̂) then
        t̂ := t;
      end
    until terminateConstruction(π, t, t̂);
    s := t̂;
    if f(s) < f(ŝ) then
      ŝ := s;
    end
  end
  if ŝ ∈ S' then
    return ŝ
  else
    return ∅
  end
end VDS
Figure 2.3 Algorithm outline for Variable Depth Search for optimisation problems; for details, see text.
[Figure 2.4: Schematic view of a Lin–Kernighan exchange step: (a) shows a Hamiltonian path, (b) a possible δ-path, (c) the next Hamiltonian path (which is closed by introducing the left dashed edge) and (d) indicates a next possible δ-path.]
The mechanism underlying the construction of a complex search step is best understood by considering a sequence of Hamiltonian paths, that is, paths that contain each vertex in the given graph G exactly once. Figure 2.4a shows an example in which a Hamiltonian path between vertices u and v is obtained from a valid round trip by removing the edge (u, v). Let us fix one of the endpoints of this path, say u; the other endpoint is kept variable. We can now introduce a cycle into this Hamiltonian path by adding an edge (v, w) (see Figure 2.4b). The resulting subgraph can also be viewed as a spanning tree of G with one additional edge; it is called a δ-path. The cycle in this δ-path can be broken by removing a uniquely defined edge (w, v') incident to w, such that the result is a new Hamiltonian path that can be extended to a Hamiltonian cycle (and hence a candidate solution for the TSP) by adding an edge between v' and the fixed endpoint u (this is the dashed edge (v', u) in Figure 2.4c). Alternatively, a different edge can be added, leading to a new δ-path, as indicated in Figure 2.4d.

Based on this fundamental mechanism, the LK algorithm computes complex search steps as follows: Starting with the current candidate solution (a Hamiltonian cycle) s, a δ-path p of minimal path weight is determined by replacing one edge as described above. If the Hamiltonian cycle t obtained from p by adding a (uniquely defined) edge has smaller weight than s, then t (and its weight) is memorised. The same operation is now performed with p as a starting point, and iterated until no δ-path can be obtained with weight smaller than that of the best Hamiltonian cycle found so far. Finally, the minimal-weight Hamiltonian cycle found in this iterative process provides the endpoint of a complex search step.
Note that this process can be interpreted as a sequence of 1-exchange steps that alternate between δ-paths and Hamiltonian cycles. In order to limit the time complexity of constructing complex search steps, VDS algorithms use two types of restrictions, cost restrictions and tabu restrictions, on the selection of the constituting simple search steps. In the case of the LK algorithm, any edge that has been added cannot be removed, and any edge that has been removed cannot be introduced again. This tabu restriction has the effect that a candidate sequence for a complex step is never longer than n, the number of vertices in the given graph. The original LK algorithm also uses a number of additional mechanisms, including a limited form of backtracking, for controlling the generation of complex search steps; as a consequence, the final tour returned by the algorithm is guaranteed to be optimal w.r.t. the standard 3-exchange neighbourhood. Along with other details of the LK algorithm, these mechanisms are described in Chapter 8, Section 8.2. VDS algorithms have been used with considerable success for solving a number of problems other than the TSP, including the Graph Partitioning Problem [Kernighan and Lin, 1970], the Unconstrained Binary Quadratic Programming Problem [Merz and Freisleben, 2002] and the Generalised Assignment Problem [Yagiura et al., 1999].
Dynasearch

Like VDS, Dynasearch is an iterative improvement algorithm that tries to build a complex search step from a combination of simple search steps [Potts and van de Velde, 1995; Congram et al., 2002; Congram, 2000]. However, differently from VDS, Dynasearch requires that the individual search steps composing a complex step be mutually independent. Here, independence means that the individual search steps do not interfere with each other with respect to their effect on the evaluation function value and the feasibility of candidate solutions. In particular, any dynasearch step from a feasible candidate solution s is guaranteed to result in another feasible candidate solution, and the overall improvement in evaluation function value achieved by the dynasearch step can be obtained by summing up the effects of applying each individual search step to s.

As an example of this independence condition, consider a TSP instance and a specific Hamiltonian cycle t = (u1, ..., un, un+1), where un+1 = u1. A 2-exchange step involves removing two edges (ui, ui+1) and (uj, uj+1) from t (without loss of generality, we assume that 1 ≤ i and i + 1 < j ≤ n). Two 2-exchange steps that remove edges (ui, ui+1), (uj, uj+1) and (uk, uk+1), (ul, ul+1), respectively, are independent if, and only if, either j < k or l < i. An example of a pair of independent 2-exchange steps is given in Figure 2.5.
[Figure 2.5: Example of a pair of independent 2-exchange steps, removing edges (ui, ui+1), (uj, uj+1) and (uk, uk+1), (ul, ul+1) of a Hamiltonian cycle (u1, ..., un, un+1), that can potentially form a dynasearch step.]
Any set of independent steps can be executed in parallel, leading to an overall improvement equal to the sum of the improvements achieved by the simple component steps, and to a feasible candidate solution, here: another Hamiltonian cycle in the given graph. The neighbourhood explored by Dynasearch consists of the set of all possible complex search steps; it can be shown that, in general, this neighbourhood can be of exponential size w.r.t. the size of the underlying simple neighbourhoods. However, through the use of a dynamic programming algorithm, it is possible to find the best possible complex search step in polynomial time. (Roughly speaking, the key principle of Dynamic Programming is to iteratively solve a sequence of increasingly large subproblems that leads to a solution of the given problem, exploiting independence assumptions such as the one described previously [Bertsekas, 1995].) Only in the worst case does a complex dynasearch step consist of a single simple step. Although Dynasearch is a very recent local search technique, it has already shown very promising performance on several combinatorial optimisation problems, such as the Single Machine Total Weighted Tardiness Problem [Congram et al., 2002] (we discuss a dynasearch algorithm for this well-known NP-hard scheduling problem in Chapter 9, Section 9.2), the TSP and the Linear Ordering Problem [Congram, 2000].
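The dynamic programming idea can be illustrated for the 2-exchange case: if each candidate simple step is described by its index pair (i, j) and the tour length reduction it achieves, the independence condition stated above turns the search for the best complex step into a weighted-interval-scheduling problem, which a textbook DP solves exactly. This sketch illustrates the principle only; actual dynasearch implementations are organised differently and are described in Chapter 9.

def best_dynasearch_step(moves):
    # moves: list of (i, j, gain) with i < j and gain > 0, where (i, j, gain)
    # denotes a 2-exchange step removing edges (u_i, u_i+1) and (u_j, u_j+1).
    # Two moves are independent iff their index intervals do not overlap.
    moves = sorted(moves, key=lambda m: m[1])       # sort by right endpoint
    best_gain = [0.0] * (len(moves) + 1)
    chosen = [[] for _ in range(len(moves) + 1)]
    for t, (i, j, gain) in enumerate(moves, start=1):
        p = t - 1                                   # latest compatible move
        while p > 0 and moves[p - 1][1] >= i:
            p -= 1
        if best_gain[p] + gain > best_gain[t - 1]:
            best_gain[t] = best_gain[p] + gain      # include move t
            chosen[t] = chosen[p] + [(i, j, gain)]
        else:
            best_gain[t] = best_gain[t - 1]         # skip move t
            chosen[t] = chosen[t - 1]
    return chosen[-1], best_gain[-1]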
2.2 ‘Simple’ SLS Methods

In the previous section, we introduced several ways of extending simple exchange neighbourhoods that can significantly enhance the performance of Iterative Improvement and prevent this algorithm from getting stuck in very low-quality local optima. Another way of addressing the same problem is to modify the step function such that, for a fixed and fairly simple neighbourhood, the search process can
perform worsening steps that help it to escape from local optima. As mentioned in Chapter 1, Section 1.5, the simplest technique for achieving this is to use randomised variants of Iterative Improvement or a restart strategy that re-initialises the search process whenever it gets stuck in a local optimum. In this section, we discuss a number of different methods that often achieve the same effect in a more efficient and robust way. These methods are ‘simple’ in the sense that they essentially perform only one type of search step; later in this chapter, we will discuss hybrid SLS algorithms, which combine various different types of search steps, as well as population-based SLS methods.
Randomised Iterative Improvement

One of the simplest ways of extending iterative improvement algorithms such that worsening steps can be performed is to sometimes select a neighbour at random, rather than an improving neighbour, within the individual search steps. Such uninformed random walk steps may be performed with a fixed frequency, such that the alternation between improvement steps and random walk steps follows a deterministic pattern. Yet, depending on the improvement strategy used, this may easily lead to a situation in which the effects of the random walk steps are immediately undone in subsequent improvement steps, leading to cycling behaviour and preventing an escape from a given local optimum. Therefore, it is preferable to probabilistically determine in each search step whether to apply an improvement step or a random walk step. Typically, this is done by introducing a parameter wp ∈ [0, 1], called the walk probability or noise parameter, that corresponds to the probability of performing a random walk step instead of an improvement step.

The resulting algorithm is called Randomised Iterative Improvement (RII). Like Iterative Improvement, it typically uses a random initialisation of the search, as described in Chapter 1, Section 1.5. Its step function can be written as

stepRII(s)(s') := wp · stepURW(s)(s') + (1 − wp) · stepII(s)(s'),

where stepURW(s)(s') is the step function for uninformed random walk, and stepII(s)(s') is a variant of the step function of the Iterative Improvement Algorithm (see Section 1.5) that differs only in that a minimally worsening neighbour is selected if the set I(s) of strictly improving neighbours is empty. As shown in Figure 2.6, the RII step function is typically implemented as a two-level choice, where first a probabilistic decision is made on which of the two types of search steps is to be applied, and then the corresponding search step is performed. Obviously, there is no need to terminate this SLS algorithm as soon as a local optimum is encountered. Instead, the termination predicate can be realised in various ways. One possibility is to stop the search after a limit on the CPU time or the number of search steps has been reached; alternatively, the search may be terminated when a given number of search steps has been performed without achieving any improvement.
procedure step-RII(π, s, wp)
  input: problem instance π, candidate solution s, walk probability wp
  output: candidate solution s'

  u := random([0, 1]);
  if u ≤ wp then
    s' := stepURW(π, s);
  else
    s' := stepII(π, s);
  end
  return s'
end step-RII

Figure 2.6 Standard implementation of the step function for Randomised Iterative Improvement; random([0, 1]) returns a random number between zero and one using a uniform probability distribution.
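In Python, the step function of Figure 2.6 can be sketched directly; stepURW and stepII are realised inline as a uniform random neighbour selection and a (least worsening) best improvement step, respectively, with neighbours and g again assumed to be supplied by the problem-specific code.

import random

def step_rii(s, neighbours, g, wp):
    candidates = list(neighbours(s))
    if random.random() <= wp:
        return random.choice(candidates)   # uninformed random walk step
    # best improvement step; if no strictly improving neighbour exists,
    # this automatically yields a minimally worsening one
    g_star = min(g(t) for t in candidates)
    return random.choice([t for t in candidates if g(t) == g_star])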
A beneficial consequence of using a probabilistic decision on the type of search step performed is the fact that arbitrarily long sequences of random walk steps (or improvement steps, respectively) can occur, where the probability of performing r consecutive random walk steps is wp^r. Hence, there is always a chance to escape even from a local optimum that has a large ‘basin of attraction’, in the sense that many worsening steps may be required to ensure that subsequent improvement steps have a chance of leading into different local optima. In fact, for RII it can be proven that, when the search process is run long enough, an (optimal) solution to any given problem instance is eventually found with arbitrarily high probability. (More details on this proof can be found in the in-depth section on page 155ff.)

Example 2.2 Randomised Iterative Improvement for SAT
RII can be very easily applied to SAT by combining the uninformed random walk algorithm presented in Example 1.3 (page 41f.) and an iterative improvement algorithm like that of Example 1.4 (page 47f.), using the same search space, solution set, neighbourhood relation and initialisation function as defined there. The only difference is that here, we apply a best improvement local search algorithm instead of the simple descent method from Example 1.4: in each step, the best improvement algorithm flips a variable that leads to a maximal improvement in the evaluation function. Note that such a best improvement algorithm need not terminate at a local optimum, because in this situation the maximally improving variable flip is a perfectly valid
worsening step (more precisely: a least worsening step). The step function for RII is composed of the two step functions for this greedy improvement algorithm and for uninformed random walk, as described previously: with probability wp, a random neighbouring solution is returned; otherwise, with probability 1 − wp, a best improvement step is applied. We call the resulting algorithm GUWSAT.
Interestingly, a slight variation of the GUWSAT algorithm from Example 2.2, called GSAT with Random Walk (GWSAT), has proven rather successful (see also Chapter 6, page 269f.). The only difference between GUWSAT and GWSAT lies in the random walk step. Instead of uninformed random walk steps, GWSAT uses ‘informed’ random walk steps by restricting the random neighbour selection to variables occurring in currently unsatisfied clauses; among these variables, one is chosen according to a uniform distribution. When GWSAT was first proposed, it was among the best-performing SLS algorithms for SAT. Yet, apart from this success, Randomised Iterative Improvement is rather rarely applied. This might be partly due to the fact that it is such a simple extension of Iterative Improvement, and more complex SLS algorithms often achieve better performance. Nevertheless, RII certainly deserves attention as a simple and generic extension of Iterative Improvement that can easily be generalised to more complex SLS methods.
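The difference between the two walk steps is easy to see in code. The following naive sketch represents a CNF formula as a list of clauses over signed integer literals (DIMACS style) and an assignment as a dict from variable index to truth value; real implementations avoid re-evaluating all clauses in every step by incremental bookkeeping, as discussed in Chapter 6.

import random

def unsatisfied(clauses, assign):
    # a clause (list of non-zero ints) is satisfied iff some literal is true
    return [c for c in clauses
            if not any((lit > 0) == assign[abs(lit)] for lit in c)]

def step_gwsat(clauses, assign, wp, n):
    # assumes assign is not yet a model, i.e. unsat is non-empty
    unsat = unsatisfied(clauses, assign)
    if random.random() <= wp:
        # 'informed' random walk: flip a variable from an unsatisfied clause
        var = abs(random.choice(random.choice(unsat)))
    else:
        # GSAT-style best improvement step: flip a variable minimising the
        # number of unsatisfied clauses; ties are broken uniformly at random
        def score(v):
            trial = dict(assign)
            trial[v] = not trial[v]
            return len(unsatisfied(clauses, trial))
        scores = {v: score(v) for v in range(1, n + 1)}
        best = min(scores.values())
        var = random.choice([v for v, sc in scores.items() if sc == best])
    assign[var] = not assign[var]
    return assign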
Probabilistic Iterative Improvement

An interesting alternative to the mechanism for allowing worsening search steps underlying Randomised Iterative Improvement is based on the idea that the probability of accepting a worsening step should depend on the respective deterioration in evaluation function value, such that the worse a step is, the less likely it is to be performed. This idea leads to a family of SLS algorithms called Probabilistic Iterative Improvement (PII), which is closely related to Simulated Annealing, a widely used SLS method discussed directly after PII. In each search step, PII selects a neighbour of the current candidate solution according to a given function p(g, s), which determines a probability distribution over the neighbouring candidate solutions of s based on their respective evaluation function values; formally, the corresponding step function can be written as step(s)(s') := p(g, s)(s'). Obviously, the choice of the function p(g, s) is of crucial importance to the behaviour and performance of PII. Note that both Iterative Improvement, as defined in Chapter 1, Section 1.5, and Randomised Iterative Improvement can
be seen as special cases of PII that are obtained for particular choices of p(g, s). Generally, PII algorithms for which p(g, s) assigns positive probability to all neighbours of s have properties similar to RII, in that arbitrarily long sequences of worsening moves can be performed and (optimal) solutions can be found with arbitrarily high probability as run-time approaches infinity.
Example 2.3 PII / Constant Temperature Annealing for the TSP
The following simple application of PII to the TSP illustrates the underlying approach and will also serve as a convenient basis for introducing the more general SLS method of Simulated Annealing. Given a TSP instance represented by a complete, edge-weighted graph G, we use the set of all vertex permutations as the search space S, and the same set as our set of feasible candidate solutions S'. (This simply means that we consider each Hamiltonian cycle in G as a valid solution.) As the neighbourhood relation N, we use a reflexive variant of the 2-exchange neighbourhood defined in Chapter 1, Section 1.5, which for each candidate solution s contains s itself as well as all Hamiltonian cycles that can be obtained by replacing two edges in s. The search process uses a simple randomised initialisation function that picks a Hamiltonian cycle uniformly at random from S'. The step function is implemented as a two-stage process, in which first a neighbour s' ∈ N(s) is selected uniformly at random, which is then accepted according to the following probability function:
paccept(T, s, s') :=  1,                        if f(s') ≤ f(s);
                      exp((f(s) − f(s'))/T),    otherwise.          (2.1)
This acceptance criterion is known as the Metropolis condition. The parameter T, which is also called temperature, determines how likely it is to perform worsening search steps: at low temperature values, the probability of accepting a worsening search step is low, while at high temperature values, the algorithm accepts even drastically worsening steps with a relatively high probability. As for RII, various termination predicates can be used for determining when to end the search process. This algorithm corresponds to a simulated annealing algorithm in which the temperature is kept constant at T. In fact, there exists some evidence suggesting that, compared to more general simulated annealing approaches, this algorithm performs quite well; but in general, the determination of a good value for T may be difficult [Fielding, 2000].
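The Metropolis condition of Equation 2.1 translates directly into code; the following sketch accepts or rejects a proposed neighbour given the two objective function values and the (here constant) temperature T.

import math, random

def accept_metropolis(f_s, f_s1, T):
    # always accept non-worsening steps; accept a worsening step
    # with probability exp((f(s) - f(s')) / T)
    if f_s1 <= f_s:
        return True
    return random.random() < math.exp((f_s - f_s1) / T)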
Simulated Annealing

Considering the example PII algorithm for the TSP, in which a temperature parameter T controls the probability of accepting worsening search steps, one rather obvious generalisation is to allow T to vary over the course of the search process. Conceptually, this leads to a family of SLS algorithms known as Simulated Annealing (SA), which was proposed independently by Kirkpatrick, Gelatt and Vecchi [1983] and Černý [1985]. SA was originally motivated by the annealing of solids, a physical process in which a solid is melted and then cooled down slowly in order to obtain perfect crystal structures, which can be modelled as a state of minimum energy (also called the ground state). To avoid defects (i.e., irregularities) in the crystal, which correspond to meta-stable states in the model, the cooling needs to be done very slowly.

The idea underlying SA is to solve combinatorial optimisation problems by a process analogous to the physical annealing process. In this analogy, the candidate solutions of the given problem instance correspond to the states of the physical system, the evaluation function models the thermodynamic energy of the solid, and the globally optimal solutions correspond to the ground states of the physical system.

Like PII, Simulated Annealing typically starts from a random initial solution. It then performs the same general type of PII steps as defined in Example 2.3, where in each step first a neighbour s' of s is randomly chosen (proposal mechanism), and then an acceptance criterion parameterised by the temperature parameter T is used to decide whether the search accepts s' or whether it stays at s (see Figure 2.7). One standard choice for this acceptance criterion is a probabilistic choice according to the Metropolis condition (see Equation 2.1, page 75), which was also used in an early article on the simulation of the physical annealing process [Metropolis et al., 1953], where the parameter T corresponded to the actual physical temperature.

procedure step-SA(π, s, T)
  input: problem instance π, candidate solution s, temperature T
  output: candidate solution s''

  s' := proposal(π, s);
  s'' := accept(π, s, s', T);
  return s''
end step-SA

Figure 2.7 Standard step function for Simulated Annealing; proposal randomly selects a neighbour of s, and accept chooses probabilistically between s and s', depending on the temperature T.
Throughout the search process, the temperature is adjusted according to a given annealing schedule (often also called a cooling schedule). Formally, an annealing schedule is a function that for each run-time t (typically measured in terms of the number of search steps since initialisation) determines a temperature value T(t). Annealing schedules are commonly specified by an initial temperature T0, a temperature update scheme, a number of search steps to be performed at each temperature and a termination condition. In many cases, the initial temperature T0 is determined based on properties of the given problem instance, such as the estimated cost difference between neighbouring candidate solutions [Johnson et al., 1989; van Laarhoven and Aarts, 1987]. Simple geometric cooling schedules, in which the temperature is updated as T := α · T, have been shown to be quite efficient in many cases [Kirkpatrick et al., 1983; Johnson et al., 1989]. The number of steps performed at each temperature setting is often chosen as a multiple of the neighbourhood size. Simulated Annealing can use a variety of termination predicates; a specific termination condition often used for SA is based on the acceptance ratio, that is, the ratio of accepted steps to proposed steps. In this case, the search process is terminated when the acceptance ratio falls below a certain threshold or when no improving candidate solution has been found for a given number of search steps.

Example 2.4 Simulated Annealing for the TSP
The PII algorithm for the TSP specified in Example 2.3 (page 75) can easily be extended into a Simulated Annealing algorithm (see also Johnson and McGeoch [1997]). The search space, solution set and neighbourhood relation are defined as in Example 2.3. We also use the same initialisation and step functions, where proposal(π, s) randomly selects a neighbour of s, and accept(π, s, s', T) probabilistically accepts s' depending on T, using the Metropolis condition. The temperature T is initialised such that only 3% of the proposed steps are not accepted, and it is updated according to a geometric cooling schedule with α := 0.95; for each temperature value, n · (n − 1) search steps are performed, where n is the size (i.e., the number of vertices) of the given problem instance. The search is terminated when no improvement of the evaluation function has been obtained for five consecutive temperature values and the acceptance rate of new solutions has fallen below 2%.

Compared to standard iterative improvement algorithms, including 3-opt local search (an iterative improvement method based on the 3-exchange neighbourhood on edges) and the Lin-Kernighan Algorithm, the SA algorithm presented in Example 2.4 performs rather poorly. By using additional techniques,
including neighbourhood pruning (cf. Section 2.1), greedy initialisation, low-temperature starts and look-up tables for the acceptance probabilities, significantly improved results can be obtained that are competitive with those achieved by the Lin-Kernighan Algorithm. Greedy initialisation methods, such as starting with a nearest-neighbour tour, help SA to find high-quality candidate solutions more rapidly. To avoid the beneficial effect of a good initial candidate solution being destroyed by accepting too many worsening moves, the initial temperature is set to a low value. The use of look-up tables deserves particular attention. Obviously, calculating the exponential function in Equation 2.1 (page 75) is computationally expensive compared to the evaluation of one neighbouring solution obtained by a 2-exchange step. By using a precomputed table of values of the function exp(∆/T) for a range of argument values ∆/T and by looking up the acceptance probabilities exp((f(s) − f(s'))/T) from that table, a very significant speedup (in our example, about 40%) can be achieved [Johnson and McGeoch, 1997].

A feature of Simulated Annealing that is often noted as particularly appealing is the fact that, under certain conditions, the convergence of the algorithm, in the sense that any arbitrarily long trajectory is guaranteed to end in an optimal solution, can be proven [Geman and Geman, 1984; Hajek, 1988; Lundy and Mees, 1986; Romeo and Sangiovanni-Vincentelli, 1991]. However, the practical usefulness of these results is very limited, since they require an extremely slow cooling that is typically not feasible in practice.
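Putting the pieces together, a bare-bones SA procedure with a geometric cooling schedule can be sketched as follows, reusing accept_metropolis from above; the parameter defaults and the simple temperature-floor termination criterion are illustrative placeholders rather than the tuned choices from Example 2.4.

import random

def simulated_annealing(s, neighbours, f, T0, alpha=0.95,
                        steps_per_T=1000, T_min=1e-4):
    best, T = s, T0
    while T > T_min:
        for _ in range(steps_per_T):
            s1 = random.choice(list(neighbours(s)))    # proposal mechanism
            if accept_metropolis(f(s), f(s1), T):      # acceptance criterion
                s = s1
                if f(s) < f(best):
                    best = s
        T *= alpha                                     # geometric cooling
    return best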
Tabu Search

A fundamentally different approach to escaping from local minima is to use aspects of the search history rather than random or probabilistic techniques for accepting worsening search steps. Tabu Search (TS) is a general SLS method that systematically utilises memory for guiding the search process [Glover, 1986; 1989; 1990; Hansen and Jaumard, 1990]. The simplest and most widely applied version of TS, which is also called Simple Tabu Search, consists of an iterative improvement algorithm enhanced with a form of short-term memory that enables it to escape from local optima.

Tabu Search typically uses a best improvement strategy to select the best neighbour of the current candidate solution in each search step, which in a local optimum can lead to a worsening or plateau step (plateau steps are local search steps that do not lead to a change in the evaluation function value). To prevent the local search from immediately returning to a previously visited candidate solution, and to avoid cycling, TS forbids steps to recently visited search positions. This can be implemented by explicitly memorising previously visited candidate solutions and ruling out any step that would lead back to those.
procedure step-TS(π, s, tt)
  input: problem instance π, candidate solution s, tabu tenure tt
  output: candidate solution s'

  N' := admissibleNeighbours(π, s, tt);
  s' := selectBest(N');
  return s'
end step-TS

Figure 2.8 Standard step function for Tabu Search; admissibleNeighbours(π, s, tt) returns the set of admissible neighbours of s given the tabu tenure tt, and selectBest(N') randomly chooses one of the elements of N' with the best evaluation function value.
More commonly, reversing recent search steps is prevented by forbidding the re-introduction of solution components (such as edges in the case of the TSP) that have just been removed from the current candidate solution. A parameter tt, called the tabu tenure, determines the duration (in search steps) for which these restrictions apply. Forbidding possible moves using a tabu mechanism has the same effect as dynamically restricting the neighbourhood N(s) of the current candidate solution s to a subset N' ⊆ N(s) of admissible neighbours. Thus, Tabu Search can also be viewed as a dynamic neighbourhood search technique [Hertz et al., 1997].

This tabu mechanism can also forbid search steps leading to attractive, unvisited candidate solutions. Therefore, many tabu search algorithms make use of a so-called aspiration criterion, which specifies conditions under which the tabu status of candidate solutions or solution components is overridden. One of the most commonly used aspiration criteria overrides the tabu status of steps that lead to an improvement in the incumbent candidate solution.

Figure 2.8 shows the step function that forms the core of Tabu Search. It uses a function admissibleNeighbours to determine the neighbours of the current candidate solution that are not tabu, or that are tabu but satisfy the aspiration criterion. In a second stage, a maximally improving step is randomly selected from this set of admissible neighbours.

Example 2.5 Tabu Search for SAT
Using the same definitions of the search space, solution set and neighbourhood relation as in Example 1.3 (page 41f.), and the same evaluation function as in Example 1.4 (page 47f.), Tabu Search can be applied to SAT in a straightforward way. The search starts with a randomly chosen variable assignment. Each search step corresponds to a single variable flip that is selected according to the associated change in the number of unsatisfied clauses and its tabu status. More precisely, in each search step, all variables are considered
admissible that either have not been flipped during the last tt steps or that, when flipped, lead to a lower number of unsatisfied clauses than the best assignment found so far (this latter condition defines the aspiration criterion). From the set of admissible variables, a variable that, when flipped, yields a maximal decrease (or, equivalently, a minimal increase) in the number of unsatisfied clauses is selected uniformly at random. The algorithm terminates unsuccessfully if after a specified number of flips no model of the given formula has been found. This algorithm is known as GSAT/Tabu; it has been shown empirically to achieve very good performance on a broad range of SAT problems (see also Chapter 6).

When implementing GSAT/Tabu, it is crucial to keep the time complexity of the individual search steps minimal, which can be achieved by using special data structures and a dynamic caching and incremental updating technique for the evaluation function (this will be discussed in more detail in Chapter 6, in the in-depth section on page 271ff.; Chapter 6 also provides a detailed overview of state-of-the-art SLS algorithms for SAT). It is also very important to determine the tabu status of the propositional variables efficiently. This is done by storing with each variable x the search step number it_x at which it was last flipped, and comparing the difference between the current iteration number it and it_x to the tabu tenure parameter tt: variable x is tabu if, and only if, it − it_x is smaller than tt.
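The constant-time tabu test just described, together with the aspiration criterion from Example 2.5, can be sketched as follows; last_flip[x] holds the iteration at which variable x was last flipped, and flip_score[x] is assumed to hold the number of unsatisfied clauses after flipping x, a quantity an efficient implementation would maintain incrementally.

def is_tabu(x, it, last_flip, tt):
    # x is tabu iff it was flipped within the last tt search steps
    return it - last_flip[x] < tt

def admissible_variables(variables, it, last_flip, tt,
                         flip_score, best_so_far):
    # admissible: not tabu, or tabu but satisfying the aspiration criterion
    return [x for x in variables
            if not is_tabu(x, it, last_flip, tt)
            or flip_score[x] < best_so_far]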
In general, the performance of Tabu Search crucially depends on the setting of the tabu tenure parameter tt. If tt is chosen too small, search stagnation may occur; if it is chosen too large, the search path is too restricted, and high-quality solutions may be missed. A good parameter setting for tt can typically only be found empirically and often requires considerable fine-tuning. Therefore, several approaches have been introduced to make particular settings of tt more robust or to adjust tt dynamically during the run of the algorithm. Robust Tabu Search [Taillard, 1991] achieves an increased robustness of performance w.r.t. the tabu tenure by repeatedly choosing tt at random from an interval [tt_min, tt_max]. Additionally, Robust Tabu Search forces specific local search moves if these have not been applied for a large number of iterations. In the case of SAT, for example, this corresponds to forcing a specific variable to be flipped if it has not been flipped in the last k · n search steps, where k > 1 is a parameter and n is the number of variables in the given formula. (Note that it does not make sense to set k to a value smaller than or equal to one in this case.) A variant of Robust Tabu Search is currently amongst the best known algorithms for MAX-SAT, the optimisation variant of SAT (see also Chapter 7).
Reactive Tabu Search [Battiti and Tecchiolli, 1994] uses the search history to adjust the tabu tenure tt dynamically during the search. In particular, if candidate solutions are repeatedly encountered, this is interpreted as evidence that search stagnation has occurred, and the tabu tenure is increased. If, on the contrary, no repetitions are found during a sufficiently long period of time, the tabu tenure is gradually decreased. Additionally, an escape mechanism based on a series of random changes is used to prevent the search process from getting trapped in a specific region of the search space. In Section 10.2 (page 482ff.), we present in detail a reactive tabu search algorithm for the Quadratic Assignment Problem.

Generally, the efficiency of Tabu Search can be further increased by using techniques that exploit a form of intermediate-term or long-term memory to achieve additional intensification or diversification of the search process. Intensification strategies correspond to efforts to revisit promising regions of the search space, for example, by recovering elite candidate solutions, that is, candidate solutions that are amongst the best found in the search process so far. When recovering an elite candidate solution, all tabu restrictions associated with it can be cleared, in which case the search may follow a different search path. (For an example, we refer to the tabu search algorithm presented in Section 9.3, page 446ff.) Another possibility is to freeze certain solution components and to keep them fixed during the search. In the TSP case, this amounts to forcing certain edges to be kept in the candidate solutions seen over a number of iterations. Diversification can be achieved by generating new combinations of solution components, which can help to explore regions of the search space that have not been visited yet. One way of achieving this is to introduce a few rarely used solution components into the candidate solutions. An example of such a mechanism is the forced execution of search steps, as in Robust Tabu Search. Another possibility is to bias the local search by adding to the evaluation function a contribution for specific search steps based on the frequency with which these have been applied. For a detailed discussion of diversification and intensification techniques that exploit intermediate- and long-term memory, we refer to Glover and Laguna [1997].

Overall, tabu search algorithms have been successfully applied to a wide range of combinatorial problems, and for many problems they are among the best known algorithms w.r.t. the tradeoff between solution quality and computation time [Battiti and Protasi, 2001; Galinier and Hao, 1997; Nowicki and Smutnicki, 1996b; Vaessens et al., 1996]. We will discuss several tabu search algorithms in the second part of this book. Crucial for these successful applications of Tabu Search is often a carefully chosen neighbourhood relation, as well as the use of efficient caching and incremental updating schemes for the evaluation of candidate solutions.
Dynamic Local Search

So far, the various techniques for escaping from local optima discussed in this chapter have all been based on allowing worsening steps during the search process. A different approach for preventing iterative improvement methods from getting stuck in local optima is to modify the evaluation function whenever a local optimum is encountered, in such a way that further improvement steps become possible. This can be achieved by associating penalty weights with individual solution components, which determine the impact of these components on the evaluation function value. Whenever the iterative improvement process gets trapped in a local optimum, the penalties of some solution components are increased. This leads to a degradation in the current candidate solution's evaluation function value until it is higher than the evaluation function values of some of its neighbours (which are not affected in the same way by the penalty modifications), at which point improving moves become available. This general approach provides the basis for a number of SLS algorithms that we collectively refer to as Dynamic Local Search (DLS) methods.

Figure 2.9 shows an algorithm outline of DLS. As motivated above, the underlying idea is to find local optima of a dynamically changing evaluation function g' using a simple local search algorithm localSearch, which typically performs iterative improvement until a local minimum in g' is found. The modified evaluation function g' is obtained by adding penalties penalty(i) for the solution components used in a candidate solution s to the original evaluation function value g(π, s):
g'(π, s) := g(π, s) + Σ_{i ∈ SC(π, s)} penalty(i),
where SC(π, s) is the set of solution components of π used in candidate solution s. The penalties penalty(i) are initially set to zero and are subsequently updated after each subsidiary local search. Typically, updatePenalties increases the penalties of some or all of the solution components used by the locally optimal candidate solution s' obtained from localSearch(π, g', s). Particular DLS algorithms differ in how this update is performed. One main difference is whether the penalty modifications are done in an additive or in a multiplicative way. In both cases, the penalty modification is typically parameterised by some constant λ, which also takes into account the range of evaluation function values for the particular instance being solved. Additionally, some DLS techniques occasionally decrease the penalties of solution components not used in s' [Schuurmans and Southey, 2000; Schuurmans et al., 2001].
procedure DLS(π)
  input: problem instance π ∈ Π
  output: solution ŝ ∈ S'(π) or ∅

  s := init(π);
  s := localSearch(π, s);
  ŝ := s;
  while not terminate(π, s) do
    g' := g + Σ_{i ∈ SC(π, s)} penalty(i);
    s := localSearch(π, g', s);
    if f(s) < f(ŝ) then
      ŝ := s;
    end
    updatePenalties(π, s);
  end
  if ŝ ∈ S' then
    return ŝ
  else
    return ∅
  end
end DLS

Figure 2.9 Algorithm outline of Dynamic Local Search for optimisation problems; penalty(i) is the penalty associated with solution component i, SC(π, s) is the set of solution components used in candidate solution s, localSearch(π, g', s) is a subsidiary local search procedure using evaluation function g', and updatePenalties is a procedure for updating the solution component penalties. (Further details are given in the text.)
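The outline of Figure 2.9 maps naturally onto a higher-order function; in this sketch, everything problem-specific (initialisation, subsidiary local search, solution components, penalty update rule, termination) is passed in as a parameter.

def dynamic_local_search(pi, init, local_search, components,
                         update_penalties, g, f, terminate):
    penalty = {}                                    # all penalties start at 0

    def g_prime(s):                                 # penalty-augmented g'
        return g(s) + sum(penalty.get(i, 0) for i in components(s))

    s = local_search(pi, g, init(pi))
    best = s
    while not terminate(pi, s):
        s = local_search(pi, g_prime, s)            # local optimum w.r.t. g'
        if f(s) < f(best):
            best = s
        update_penalties(penalty, s)                # e.g. additive increase
    return best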
Penalising all solution components of a locally optimal candidate solution can cause difficulties if certain solution components that are required for any optimal solution are also present in many other local optima. In this case, it can be useful to only increase the penalties of solution components that are least likely to occur in globally optimal solutions. One specific mechanism that implements this idea uses the solution quality contribution of a solution component i in candidate solution s', f_i(π, s'), to estimate the utility of increasing penalty(i):
util(s', i) := f_i(π, s') / (1 + penalty(i))          (2.2)
Using this estimate of utility, updatePenalties then only increases the penalties of solution components with maximal utility values. Note that dividing the solution quality contribution by 1 + penalty(i) avoids overly frequent penalisation
of specific solution components by reducing their utility. (This mechanism is used in a particular DLS algorithm called Guided Local Search [Voudouris and Tsang, 1995].)

It is worth noting that in many cases, the solution quality contribution of a solution component does not depend on the current candidate solution. In the case of the TSP, for example, the solution components are typically the edges of the given graph, and their solution quality contributions are given by their respective weights. There are cases, however, in which the solution quality contributions of individual solution components depend on the current candidate solution s', or, more precisely, on all solution components of s'. This is the case, for example, for the Quadratic Assignment Problem (see Section 10.2, page 477ff.), where DLS algorithms typically use approximations of the actual solution cost contribution [Voudouris and Tsang, 1995; Mills et al., 2003].

Example 2.6 Dynamic Local Search for the TSP
This example follows the first application of DLS to the TSP, as presented by Voudouris and Tsang [1995; 1999], and describes a particular DLS algorithm called Guided Local Search (GLS). Given a TSP instance in the form of an edge-weighted graph G, the same search space, solution set and 2-exchange neighbourhood are used as in Example 2.1 (page 64f.). The solution components are the edges of G, and the cost contribution of each edge e is given by its weight, w(e). The subsidiary local search procedure localSearch performs first improvement steps in the underlying 2-exchange neighbourhood and can be enhanced by using standard speed-up techniques, which are described in detail in Chapter 8, Section 8.2. In GLS, the procedure updatePenalties(π, s) increments the penalties of all edges of maximal utility contained in candidate solution s by a factor λ, which is chosen depending on the average length of good tours; in particular, a setting of
λ := 0.3 · f(s_2-opt) / n,

where f(s_2-opt) is the objective function value of a 2-optimal tour and n is the number of vertices in G, has been shown to yield very good results on a set of standard TSP benchmark instances [Voudouris and Tsang, 1999].
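A sketch of the utility-based penalty update for the TSP case: the edges of the locally optimal tour are scored according to Equation 2.2, and only maximal-utility edges are penalised. Here the tour is a list of vertices and w a dict of edge weights keyed by sorted vertex pairs; how λ then enters the penalised evaluation function is left to the surrounding GLS implementation.

def update_penalties_gls(penalty, tour, w):
    n = len(tour)
    edges = [tuple(sorted((tour[k], tour[(k + 1) % n]))) for k in range(n)]
    # utility of penalising edge e: util(e) = w(e) / (1 + penalty(e))
    util = {e: w[e] / (1 + penalty.get(e, 0)) for e in edges}
    u_max = max(util.values())
    for e in edges:
        if util[e] == u_max:                # penalise maximal-utility edges
            penalty[e] = penalty.get(e, 0) + 1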
The fundamental idea underlying DLS of adaptively modifying the evaluation function during a local search process has been used as the basis for a number of SLS algorithms for various combinatorial problems. Among the earliest DLS algorithms is the Breakout Method, in which penalties are added to solution
components of locally optimal solutions [Morris, 1993]. GENET [Davenport et al., 1994], an algorithm that adaptively modifies the weight of constraints to be satisfied, has directly inspired Guided Local Search, one of the most widely applied DLS methods [Voudouris and Tsang, 1995; 2002]. Closely related SLS algorithms, which can be seen as instances of the general DLS method presented here, have been developed for constraint satisfaction and SAT, where penalties are typically associated with the clauses of a given CNF formula [Selman and Kautz, 1993; Cha and Iwama, 1996; Frank, 1997]; this particular approach is also known as clause weighting. Some of the best-performing SLS algorithms for SAT and MAX-SAT (an optimisation variant of SAT) are based on clause weighting schemes inspired by Lagrangean relaxation techniques [Hutter et al., 2002; Schuurmans et al., 2001; Wu and Wah, 2000].
2.3 Hybrid SLS Methods

As we have seen earlier in this chapter, the behaviour and performance of ‘simple’ SLS techniques can often be improved significantly by combining them with other SLS strategies. We have already presented some very simple examples of such hybrid SLS methods. Randomised Iterative Improvement, for example, can be seen as a hybrid SLS algorithm, obtained by probabilistically combining standard Iterative Improvement and Uninformed Random Walk (cf. Section 2.2, page 72ff.). Similarly, many SLS implementations make use of a random restart mechanism that terminates and restarts the search process from a randomly chosen initial position based on standard termination conditions; this can be seen as a hybrid combination of the underlying SLS algorithm and Uninformed Random Picking. In this section, we present a number of well-known and very successful SLS methods that can be seen as hybrid combinations of simpler SLS techniques.
Iterated Local Search

In the previous sections, we have discussed various mechanisms for preventing iterative improvement techniques from getting stuck in local optima of the evaluation function. Arguably one of the simplest and most intuitive ideas for addressing this fundamental issue is to use two types of SLS steps: one for reaching local optima as efficiently as possible, and the other for effectively escaping from local optima. This is the key idea underlying Iterated Local Search (ILS) [Lourenço et al., 2002], an SLS method that essentially uses these two types of search steps alternatingly to perform a walk in the space of local optima w.r.t. the given evaluation function.
procedure ILS(π)
  input: problem instance π ∈ Π
  output: solution ŝ ∈ S'(π) or ∅

  s := init(π);
  s := localSearch(π, s);
  ŝ := s;
  while not terminate(π, s) do
    s' := perturb(π, s);
    s'' := localSearch(π, s');
    if f(s'') < f(ŝ) then
      ŝ := s'';
    end
    s := accept(π, s, s'');
  end
  if ŝ ∈ S' then
    return ŝ
  else
    return ∅
  end
end ILS

Figure 2.10 Algorithm outline of Iterated Local Search (ILS) for optimisation problems. (For details, see text.)
Figure 2.10 shows an algorithm outline for ILS. As usual, the search process can be initialised in various ways, for example, by starting from a randomly selected element of the search space. From the initial candidate solution, a locally optimal solution is obtained by applying a subsidiary local search procedure localSearch. Then, each iteration of the algorithm consists of three major stages: first, a perturbation is applied to the current candidate solution s; this yields a modified candidate solution s', from which in the next stage a subsidiary local search is performed until a local optimum s'' is obtained. In the last stage, an acceptance criterion accept is used to decide from which of the two local optima, s or s'', the search process is continued. Both functions, perturb and accept, can use aspects of the search history; for example, when the same local optima are repeatedly encountered, stronger perturbation steps may be applied. As in the case of most other SLS algorithms, a variety of termination predicates terminate can be used for deciding when the search process ends. The three procedures localSearch, perturb and accept form the core of any ILS algorithm.
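As with DLS, the ILS outline of Figure 2.10 can be expressed as a short higher-order function; init, local_search, perturb, accept, f and terminate are the problem- and algorithm-specific ingredients assumed to be supplied by the caller.

def iterated_local_search(pi, init, local_search, perturb, accept,
                          f, terminate):
    s = local_search(pi, init(pi))
    best = s
    while not terminate(pi, s):
        s1 = perturb(pi, s)                 # escape step
        s2 = local_search(pi, s1)           # return to a local optimum
        if f(s2) < f(best):
            best = s2
        s = accept(pi, s, s2)               # e.g. the better of s and s2
    return best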
The specific choice of these procedures has a crucial impact on the performance of the resulting algorithm. As we will discuss in the following, these components need to complement each other for achieving a good tradeoff between intensification and diversification of the search process, which is critical for obtaining good performance when solving hard combinatorial problems.

It is rather obvious that the subsidiary local search procedure, localSearch, has a considerable influence on the performance of any ILS algorithm. In general, more effective local search methods lead to better-performing ILS algorithms. For example, when applying ILS to the Travelling Salesman Problem, using 3-opt local search (i.e., an iterative improvement algorithm based on the 3-exchange neighbourhood relation) typically leads to better performance than using 2-opt local search, while even better results are obtained when using the Lin-Kernighan Algorithm as the subsidiary local search procedure. While iterative improvement methods are often used for the subsidiary local search within ILS, it is perfectly possible to use more sophisticated SLS algorithms, such as SA, TS or DLS, instead.

The role of perturb is to modify the current candidate solution in a way that will not be immediately undone by the subsequent local search phase. This helps the search process to effectively escape from local optima, and the subsequent local search phase has a chance to discover different local optima. In the simplest case, a random walk step in a larger neighbourhood than the one used by localSearch may be sufficient for achieving this goal. There are also ILS algorithms that use perturbations consisting of a number of simple steps (e.g., sequences of random walk steps in a 1-exchange neighbourhood). Typically, the strength of the perturbation has a strong influence on the length of the subsequent local search phase; weak perturbations usually lead to shorter local search phases than strong perturbations, because the local search procedure requires fewer steps to reach a local optimum. If the perturbation is too weak, however, the local search will often fall back into the local optimum just visited, which leads to search stagnation. At the same time, if the perturbation is too strong, its effect can be similar to a random restart of the search process, which usually results in a low probability of finding better solutions in the subsequent local search phase. To address these issues, both the strength and the nature of the perturbation steps may be changed adaptively during the search. Furthermore, there are rather complex perturbation techniques, such as the one used by Lourenço [1995], which is based on finding optimal solutions for parts of the given problem instance.

The acceptance criterion, accept, also has a strong influence on the behaviour and performance of ILS. A strong intensification of the search is obtained if the better of the two solutions s and s'' is always accepted. ILS algorithms using this acceptance criterion effectively perform iterative improvement in the space of local optima reached by the subsidiary local search procedure. Conversely, if
the new local optimum, s″, is always accepted regardless of its solution quality, the behaviour of the resulting ILS algorithm corresponds to a random walk in the space of the local optima of the given evaluation function. Between these extremes, many intermediate choices exist; for example, the Metropolis acceptance criterion known from Simulated Annealing has been used in an early class of ILS algorithms called Large Step Markov Chains [Martin et al., 1991]. While all these acceptance criteria are Markovian, that is, they depend only on s and s″, it has been shown that acceptance criteria that take into account aspects of the search history, such as the number of search steps since the last improvement of the incumbent candidate solution, often help to enhance ILS performance [Stützle, 1998c].
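To make the interplay of these three components concrete, the following Python sketch (an illustration we add here, not part of the original algorithm outline) shows a generic ILS procedure using the better-acceptance criterion; local_search, perturb and the evaluation function f are supplied as problem-specific functions:

def iterated_local_search(s0, local_search, perturb, f, max_iters):
    # obtain an initial local optimum from the starting solution
    s = local_search(s0)
    best = s
    for _ in range(max_iters):            # simple termination predicate
        s_new = local_search(perturb(s))  # perturbation + subsidiary local search
        if f(s_new) <= f(s):              # acceptance: keep the better local optimum
            s = s_new
        if f(s) < f(best):                # track the incumbent candidate solution
            best = s
    return best

History-based acceptance criteria, as discussed above, would replace the simple comparison of f-values by a function of both local optima and the search history.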
Example 2.7 Iterated Local Search for the TSP

In this example we describe the Iterated Lin-Kernighan (ILK) Algorithm, an ILS algorithm that is currently amongst the best performing incomplete algorithms for the Travelling Salesman Problem. ILK is based on the same search space and solution set as used in Example 2.3 (page 75). The subsidiary local search procedure localSearch is the Lin-Kernighan variable depth search algorithm (LK) described in Section 2.1 (page 68ff.). Like almost all ILS algorithms for the Travelling Salesman Problem, ILK uses a particular 4-exchange step, called a double-bridge move, as a perturbation step. This double-bridge move is illustrated in Figure 2.11; it has the desirable property that it cannot be directly reversed by a sequence of 2-exchange moves as performed by the LK algorithm. Furthermore, it was found in empirical studies that this perturbation is effective independently of problem size. Finally, an acceptance criterion is used that always returns the better of the two candidate solutions s and s″.
Figure 2.11 Schematic representation of the double-bridge move used in ILK. The four dashed edges on the left are removed and the remaining paths A, B, C, D are reconnected as shown on the right side.
An efficient implementation of this structurally rather simple algorithm has been shown to achieve excellent performance [Johnson and McGeoch, 1997]. (Details on this and other ILS algorithms for the TSP are presented in Section 8.3, page 384ff.)
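The double-bridge perturbation itself requires only a few lines of code. The following Python sketch (our own illustration; it assumes that tours are represented as lists of at least four vertices) cuts the tour into four segments A, B, C, D and reconnects them in the order A, C, B, D:

import random

def double_bridge(tour):
    # choose three cut points that split the tour into segments A, B, C, D
    n = len(tour)
    p1 = random.randint(1, n - 3)
    p2 = random.randint(p1 + 1, n - 2)
    p3 = random.randint(p2 + 1, n - 1)
    a, b, c, d = tour[:p1], tour[p1:p2], tour[p2:p3], tour[p3:]
    # reconnecting as A, C, B, D yields a 4-exchange move that cannot be
    # directly reversed by a sequence of 2-exchange steps
    return a + c + b + d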
Generally, ILS can be seen as a straightforward, yet powerful technique for extending 'simple' SLS algorithms such as Iterative Improvement. The conceptual simplicity of the underlying idea led to frequent re-discoveries and many variants, most of which are known under various names, such as Large Step Markov Chains [Martin et al., 1991] and Chained Local Search [Martin and Otto, 1996], as well as, when applied to particular algorithms, specific techniques such as Iterated Lin-Kernighan algorithms [Johnson and McGeoch, 1997]. Despite the fact that the underlying ideas are quite different, there is also a close conceptual relationship between ILS and certain variants of Variable Neighbourhood Search (VNS), such as Basic VNS and Skewed VNS [Hansen and Mladenović, 2002]. ILS algorithms are also attractive because they are typically easy to implement: in many cases, existing SLS implementations can be extended into ILS algorithms by adding just a few lines of code. At the same time, ILS algorithms are currently among the best-performing incomplete search methods for many combinatorial problems, the most prominent application being the Travelling Salesman Problem [Johnson and McGeoch, 1997; Martin and Otto, 1996]. For an overview of various issues arising in the design and implementation of ILS algorithms we refer to Lourenço et al. [2002].
Greedy Randomised Adaptive Search Procedures

A standard approach for quickly finding high-quality solutions for a given combinatorial optimisation problem is to apply a greedy construction search method (see also Chapter 1, Section 1.4) that, starting from an empty candidate solution, in each construction step adds the solution component ranked best by a heuristic selection function, and to subsequently use a perturbative local search algorithm to improve the candidate solution thus obtained. In practice, this type of hybrid search method often yields much better solution quality than simple SLS methods initialised at candidate solutions obtained by Uninformed Random Picking (see Chapter 1, Section 1.5). Additionally, when starting from a greedily constructed candidate solution, the subsequent perturbative local search process typically requires far fewer improvement steps to reach a local optimum. By iterating this process of greedy construction and perturbative local search, even higher-quality solutions can be obtained. Unfortunately, greedy construction search methods can typically only generate a very limited number of different candidate solutions. Greedy Randomised
procedure GRASP(π)
    input: problem instance π ∈ Π
    output: solution ŝ ∈ S(π) or ∅
    s := ∅; ŝ := s; f(ŝ) := ∞;
    while not terminate(π, s) do
        s := construct(π);
        s′ := localSearch(π, s);
        if f(s′) < f(ŝ) then
            ŝ := s′;
        end
    end
    if ŝ ∈ S then
        return ŝ
    else
        return ∅
    end
end GRASP

Figure 2.12 Algorithm outline of GRASP for optimisation problems. (For details, see text.)
Adaptive Search Procedures (GRASP) [Feo and Resende, 1989; 1995] try to avoid this disadvantage by randomising the construction method, such that it can generate a large number of different good starting points for a perturbative local search method. Figure 2.12 shows an algorithm outline for GRASP. In each iteration of the algorithm, first a candidate solution s is generated using a randomised constructive search procedure, construct. Then, a local search procedure, localSearch, is applied to s, yielding an improved (typically, locally optimal) candidate solution s′. This two-phase process is iterated until a termination condition is satisfied. In contrast to standard greedy constructive search methods, the constructive search algorithm used in GRASP does not necessarily add a solution component with maximal heuristic value in each construction step, but rather selects randomly from a set of highly ranked solution components. This is done by defining in each construction step a restricted candidate list (RCL) and then selecting one of the solution components in the RCL randomly according to a uniform distribution. In GRASP, there are two different mechanisms for defining the RCL: by cardinality restriction or by value restriction. In the case of a cardinality restriction, only the k best-ranked solution components are included in the RCL.
Value restriction allows the number of elements in the RCL to vary. More specifically, let g(l) be the greedy heuristic value of a solution component l, let L be the set of feasible solution components, and let gmin := min{g(l) | l ∈ L} and gmax := max{g(l) | l ∈ L} be the best and worst heuristic values among the feasible components, respectively. Then a component l is inserted into the RCL if, and only if, g(l) ≤ gmin + α · (gmax − gmin). Clearly, the smaller k or α, the greedier is the selection of the next solution component. The constructive search process performed within GRASP is 'adaptive' in the sense that the heuristic value for each solution component typically depends on the components that are already present in the current partial candidate solution. This takes more computation time than using static heuristic values that do not change during the construction process, but this overhead is typically amortised by the higher-quality solutions obtained when using the 'adaptive' search method. Note that it is entirely feasible to perform GRASP without a perturbative local search phase; such restricted variants of GRASP are also known as semi-greedy heuristics [Hart and Shogan, 1987]. In general, however, the candidate solutions obtained from the randomised constructive search process are not guaranteed to be locally optimal with respect to some simple neighbourhood; hence, even the additional use of a simple iterative improvement algorithm typically yields higher-quality solutions with rather small computational overhead. Indeed, for a large number of combinatorial problems, empirical results indicate that the additional local search phase improves the performance of the algorithm considerably.
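Both mechanisms for defining the RCL are easily expressed in code. The following Python sketch (our own illustration; the function name and the representation of greedy values as a dictionary are assumptions) performs a single construction step under either cardinality restriction (parameter k) or value restriction (parameter alpha), assuming that smaller heuristic values are better:

import random

def rcl_construction_step(greedy_values, k=None, alpha=None):
    # greedy_values maps each feasible solution component l to g(l)
    components = sorted(greedy_values, key=lambda l: greedy_values[l])
    if k is not None:
        rcl = components[:k]                     # cardinality restriction
    else:
        g_min = greedy_values[components[0]]
        g_max = greedy_values[components[-1]]
        bound = g_min + alpha * (g_max - g_min)  # value restriction
        rcl = [l for l in components if greedy_values[l] <= bound]
    return random.choice(rcl)                    # uniform choice from the RCL

Note that alpha = 0 yields a purely greedy construction step, while alpha = 1 admits every feasible component and thus corresponds to a uniform random step.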
Example 2.8 GRASP for SAT
GRASP can be applied to SAT in a rather straightforward way [Resende and Feo, 1996]. The constructive search procedure starts from an empty variable assignment and adds an atomic assignment (i.e., an assignment of a truth value to an individual propositional variable of the given CNF formula) in each construction step. The heuristic function used for guiding this construction process is defined by the number of clauses that become satisfied as a consequence of adding a particular atomic assignment to the current partial assignment. Let h(i, v) be the number of (previously unsatisfied) clauses that become satisfied as a consequence of the atomic assignment xi := v, where v ∈ {⊤, ⊥}. In each construction step, an RCL is built by cardinality restriction; this RCL contains the k variable assignments with the largest heuristic value h(i, v). In the simplest case, the current partial assignment is extended by an atomic variable assignment that is selected from the RCL uniformly at random. In Resende and Feo [1996], a slightly more complex assignment
strategy is followed. If an unsatisfied clause c exists in which only one variable is unassigned under the current partial assignment, this variable is assigned the value that renders c satisfied. (This mimics unit propagation, a well-known simplification strategy for SAT that is widely used in complete SAT algorithms.) Only if no such clause exists is a random element of the RCL selected instead. After a complete assignment has been generated, the respective candidate solution is improved using a best improvement variant of the iterative improvement algorithm for SAT from Example 1.4 (page 47f.). The search process is terminated when a solution has been found or after a given number of iterations has been exceeded. This GRASP algorithm, together with other variants of construct, was implemented and tested on a large number of satisfiable SAT instances from the DIMACS benchmark suite [Resende and Feo, 1996]. While the results were reasonably good at the time the algorithm was first presented, it is now outperformed by more recent SLS algorithms for SAT (see Chapter 6).
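For concreteness, the heuristic function h(i, v) used in this example can be sketched as follows (our own Python rendering, not taken from the original implementation; clauses are encoded as tuples of signed integers, where literal +j stands for xj and -j for its negation):

def h(formula, assignment, i, v):
    # formula: list of clauses, each a tuple of signed integers;
    # assignment: dict mapping assigned variable indices to True/False
    count = 0
    for clause in formula:
        satisfied = any(abs(lit) in assignment and
                        (lit > 0) == assignment[abs(lit)]
                        for lit in clause)
        if satisfied:
            continue  # only previously unsatisfied clauses are counted
        if (v and i in clause) or (not v and -i in clause):
            count += 1
    return count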
GRASP has been applied to a large number of combinatorial problems, including MAX-SAT, Quadratic Assignment and various scheduling problems; we refer to Festa and Resende [2001] for an overview of GRASP applications. There are also a number of recent improvements and extensions of the basic GRASP algorithm; some of these include reactive GRASP variants in which, for example, the parameter α used in value-restricted RCLs is dynamically adapted [Prais and Ribeiro, 2000], and combinations with tabu search or path relinking algorithms [Laguna and Martí, 1999; Lourenço and Serra, 2002]. For a detailed introduction to GRASP and a discussion of various extensions of the basic GRASP algorithm as presented here, we refer to Resende and Ribeiro [2002].
Adaptive Iterated Construction Search

Considering algorithms based on repeated constructive search processes, such as GRASP, the idea of exploiting experience gained from past iterations for guiding further solution constructions is appealing. One way of implementing this idea is to use weights associated with the possible decisions that are made during the construction process. These weights are adapted over multiple iterations of the search process to reflect the experience from previous iterations. This leads to a family of SLS algorithms we call Adaptive Iterated Construction Search (AICS). An algorithm outline of AICS is shown in Figure 2.13. At the beginning of the search process, all weights are initialised to some small value τ0. Each
procedure AICS(π)
    input: problem instance π ∈ Π
    output: solution ŝ ∈ S(π) or ∅
    s := ∅; ŝ := s; f(ŝ) := ∞;
    w := initWeights(π);
    while not terminate(π, s) do
        s := construct(π, w, h);
        s′ := localSearch(π, s);
        if f(s′) < f(ŝ) then
            ŝ := s′;
        end
        w := adaptWeights(π, s′, w);
    end
    if ŝ ∈ S then
        return ŝ
    else
        return ∅
    end
end AICS

Figure 2.13 Algorithm outline of Adaptive Iterated Construction Search for optimisation problems. (For details, see text.)
iteration of AICS consists of three phases. First, a constructive search process is used to generate a candidate solution s. Next, an additional perturbative local search phase is performed on s, yielding a locally optimal solution s′. Finally, the weights are adapted based on the solution components used in s′ and the solution quality of s′. As usual, various termination conditions can be used to determine when the search process is ended. The constructive search process uses the weights as well as a heuristic function h on the solution components to probabilistically select components for extending the current partial candidate solution. Generally, h can be chosen to be a standard heuristic function, as used for greedy methods or in the context of tree search algorithms; alternatively, h can be based on lower bounds on the solution quality of s, such as the bounds used in branch & bound algorithms. For AICS, it can be advantageous to implement the solution component selection in such a way that at all points of the construction process, with a small probability,
any solution component can be added to the current partial candidate solution, irrespective of its weight and heuristic value. As in GRASP, the perturbative local search phase typically improves the quality of the candidate solution generated by the construction process, leading to an overall increase in performance. In the simplest case, iterative improvement algorithms can be used in this context; however, it is perfectly possible and potentially beneficial to use more powerful SLS methods that can escape from local optima of the evaluation function. Typically, there is a tradeoff between the computation time used by the local search phase vs the construction phase, which can only be optimised empirically and depends on the given problem domain. The adjustment of the weights, as implemented in the procedure adaptWeights, is typically done by increasing the weights that correspond to the solution components contained in s′. In this context, it is also possible to use aspects of the search history; for example, by using the incumbent candidate solution as the basis for the weight update, the sampling performed by the construction and perturbative search phases can be focused more directly on promising regions of the search space.

Example 2.9 A Simple AICS Algorithm for the TSP
The AICS algorithm presented in this example is a simplified version of Ant System for the TSP by Dorigo, Maniezzo and Colorni [1991; 1996], enhanced by an additional perturbative search phase, which in practice improves the performance of the original algorithm. (Ant System is a particular instance of Ant Colony Optimisation, an SLS method discussed in the following section.) It uses the same search space and solution set as used in Example 2.3 (page 75). Weights τij ∈ ℝ⁺₀ are associated with each edge (i, j) of the given graph G, and heuristic values ηij := 1/w((i, j)) are used, where w((i, j)) is the weight of edge (i, j). At the beginning of the search process, all edge weights are initialised to a small value, τ0. The function construct iteratively constructs vertex permutations (corresponding to Hamiltonian cycles in G). The construction process starts with a randomly chosen vertex and then extends the partial permutation φ by probabilistically selecting a vertex not contained in φ according to the following distribution:
    pij := ([τij]^α · [ηij]^β) / (Σ_{l ∈ N(i)} [τil]^α · [ηil]^β)   if j ∈ N(i)   (2.3)

and pij := 0 otherwise, where N(i) is the feasible neighbourhood of vertex i, that is, the set of all neighbours of i that are not contained in the current partial permutation φ, and α and β are parameters that control the relative impact of the weights vs the heuristic values.
Upon the completion of each construction process, an iterative improvement search using the 2-exchange neighbourhood is performed until a vertex permutation corresponding to a Hamiltonian cycle with locally minimal path weight is reached. The adaptation of the weights τij is done by first decreasing all τij by a constant factor and then increasing the weights of the edges used in s′ in inverse proportion to the path weight f(s′) of the Hamiltonian cycle represented by s′; that is, for all edges (i, j), the following update is performed:

    τij := (1 − ρ) · τij + ∆(i, j, s′)   (2.4)

where 0 < ρ ≤ 1 is a parameter of the algorithm, and ∆(i, j, s′) is defined as 1/f(s′) if edge (i, j) is contained in the cycle represented by s′ and as zero otherwise. The decay mechanism controlled by the parameter ρ helps to avoid unlimited increase of the weights τij and lets the algorithm 'forget' the past experience reflected in the weights. The specific definition of ∆(i, j, s′) reflects the idea that edges contained in good candidate solutions should be used with higher probability in subsequent constructions. The search process is terminated after a fixed number of iterations.
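Equations 2.3 and 2.4 translate directly into code. The following Python sketch (an illustration we add here; tau and eta are assumed to be n × n matrices given as nested lists, and the underlying TSP instance is assumed to be symmetric) implements the probabilistic vertex selection and the weight update:

import random

def select_next_vertex(i, feasible, tau, eta, alpha, beta):
    # probabilistic vertex selection according to Equation 2.3
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in feasible]
    return random.choices(feasible, weights=weights)[0]

def adapt_weights(tau, tour, tour_weight, rho):
    # Equation 2.4: uniform decay of all weights, followed by
    # reinforcement of the edges of the tour s' by 1/f(s')
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    delta = 1.0 / tour_weight
    for a, b in zip(tour, tour[1:] + tour[:1]):  # edges of the closed tour
        tau[a][b] += delta
        tau[b][a] += delta  # symmetric TSP assumed
    return tau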
Different from most of the other SLS methods presented in this chapter, AICS has not (yet) been widely used as a general SLS technique. It is very useful, however, as a general framework that helps to understand a number of recent variants of constructive search algorithms. In particular, various incomplete tree search algorithms can be seen as instances of AICS, including the stochastic tree search algorithm by Bresina [1996], the Squeaky-Wheel Optimisation algorithm by Joslin and Clements [1999], and the Adaptive Probing algorithm by Ruml [2001]. Furthermore, AICS can be viewed as a special case of Ant Colony Optimisation, a prominent SLS method based on an adaptive iterated construction process involving populations of candidate solutions.
2.4 Population-Based SLS Methods

All SLS methods we have discussed so far manipulate only one single candidate solution of the given problem instance in each search step. A straightforward extension is to consider algorithms where several individual candidate solutions are simultaneously maintained; this idea leads to the population-based SLS methods discussed in this section. Although in principle, one could consider population-based search methods in which the population size may vary throughout the
search process, the population-based SLS methods considered here typically use constant size populations. Note that population-based SLS algorithms fit into the formal definition of an SLS algorithm (Definition 1.10, page 38f.) by considering search positions that are sets of individual candidate solutions. Though interesting in some ways, this view is somewhat unintuitive, and for most practical purposes, it is preferable to think of the search process as operating on sets of candidate solutions for the given problem instance. For example, a population-based SLS algorithm for SAT intuitively operates on a set of variable assignments. In the following, unless explicitly stated otherwise, we will use the term ‘candidate solution’ in this intuitive sense, rather than to refer to entire populations. The use of populations offers several conceptual advantages in the context of SLS methods. For instance, a population of candidate solutions provides a straightforward means for achieving search diversification and hence for increasing the exploration capabilities of the search process. Furthermore, it facilitates the use of search mechanisms that are based on the combination of promising features from a number of individual candidate solutions.
Ant Colony Optimisation

Ant Colony Optimisation (ACO) is a population-based SLS method inspired by aspects of the pheromone-based trail-following behaviour of real ants; it was first introduced by Dorigo, Maniezzo and Colorni [1991] as a metaphor for solving hard combinatorial problems, such as the TSP. ACO can be seen as a population-based extension of AICS, based on a population of agents (ants) that indirectly communicate via distributed, dynamically changing information, the so-called (artificial) pheromone trails. These pheromone trails reflect the collective search experience and are exploited by the ants in their attempts to solve a given problem instance. The pheromone trail levels used in ACO correspond exactly to the weights in AICS. Here, we use the term 'pheromone trail level' instead of 'weight' to be consistent with the literature on Ant Colony Optimisation. An algorithm outline of ACO for optimisation problems is shown in Figure 2.14. Conceptually, the algorithm is usually thought of as being executed by k ants, each of which creates and manipulates one candidate solution. The search process is started by initialising the pheromone trail levels; typically, this is done by setting all pheromone trail levels to the same value, τ0. In each iteration of ACO, first a population sp of k candidate solutions is generated by a constructive search procedure construct. As in AICS, in this construction process each ant starts with an empty candidate solution and iteratively extends the current partial candidate solution with solution components that are selected probabilistically according to the pheromone trail levels and a heuristic function, h.
procedure ACO(π)
    input: problem instance π ∈ Π
    output: solution ŝ ∈ S(π) or ∅
    sp := {∅}; ŝ := ∅; f(ŝ) := ∞;
    τ := initTrails(π);
    while not terminate(π, sp) do
        sp := construct(π, τ, h);
        sp′ := localSearch(π, sp);
        if f(best(π, sp′)) < f(ŝ) then
            ŝ := best(π, sp′);
        end
        τ := updateTrails(π, sp′, τ);
    end
    if ŝ ∈ S then
        return ŝ
    else
        return ∅
    end
end ACO

Figure 2.14 Algorithm outline of Ant Colony Optimisation for optimisation problems; best(π, sp′) denotes the individual from population sp′ with the best objective function value. The use of the procedure localSearch(π, sp) is optional. (For details, see text.)
Next, a perturbative local search procedure localSearch may be applied to each candidate solution in sp; typically, an iterative improvement method is used in this context, resulting in a population sp′ of locally optimal candidate solutions. If the best of the candidate solutions in sp′, best(π, sp′), improves on the overall best solution obtained so far, this candidate solution becomes the new incumbent candidate solution. As in GRASP and AICS, this perturbative local search phase is optional, but typically leads to significantly improved performance of the algorithm. Finally, the pheromone trail levels are updated based on the candidate solutions in sp′ and their respective solution qualities. The precise pheromone update mechanism differs between various ACO algorithms. A typical mechanism first uniformly decreases all pheromone trail levels by a constant factor (intuitively, this corresponds to the physical process of pheromone evaporation), after which a subset of the pheromone trail levels is increased; this subset and the
amount of the increase are determined from the quality of the candidate solutions in sp′ and ŝ, and from the solution components contained in these. As usual, a number of different termination predicates can be used to determine when to end the search process. As an alternative to standard termination criteria based on CPU time or the number of iterations, these can include conditions on the make-up of the current population, sp′, such as the variation in solution quality across the elements of sp′ or their average distance from each other.

Example 2.10 A Simple ACO Algorithm for the TSP
In this example, we present a variant of Ant System for the TSP, a simple ACO algorithm which played an important role as the first application of the ant colony metaphor to solving combinatorial optimisation problems [Dorigo et al., 1991; Dorigo, 1992; Dorigo et al., 1996]. This algorithm can be seen as a slight extension of the AICS algorithm from Example 2.9 (page 94f.). The initialisation of the pheromone trail levels is performed exactly like the weight initialisation in the AICS example. The functions construct and localSearch are straightforward extensions of the ones from Example 2.9 that perform the respective construction and perturbative local search processes for each individual candidate solution independently. The pheromone trail update procedure, updateTrails, is also quite similar to the adaptWeights procedure from the AICS example; in fact, it is based on the same update as specified in Equation 2.4 (page 95), but instead of ∆(i, j, s′), now a value ∆(i, j, sp′) is used, which is based on contributions from all candidate solutions in the current population sp′ according to the following definition:

    ∆(i, j, sp′) := Σ_{s′ ∈ sp′} ∆(i, j, s′)   (2.5)
where ∆(i, j, s′) is defined as 1/f(s′) if edge (i, j) is contained in the Hamiltonian cycle represented by candidate solution s′ and as zero otherwise. According to this definition, the pheromone trail levels associated with edges which belong to the highest-quality candidate solutions (i.e., low-weight Hamiltonian cycles) and which have been used by the most ants are increased the most. This reflects the idea that, heuristically, these edges are most likely to be contained in even better (and potentially optimal) candidate solutions and should therefore be selected with higher probability during future construction phases. The search process is terminated after a fixed number of iterations. Note how, in terms of the biological metaphor, the phases of this algorithm can be interpreted loosely as the actions of ants that walk the edges of the given graph to construct tours (using memory to ensure that
only Hamiltonian cycles are generated as candidate solutions) and deposit pheromones to reinforce the edges of their tours. The algorithm from Example 2.10 differs from the original Ant System (AS) only in that AS did not include a perturbative local search phase. For many (static) combinatorial problems and a variety of ACO algorithms, it has been shown, however, that the use of a perturbative local search phase leads to significant performance improvements [Dorigo and Gambardella, 1997; Maniezzo et al., 1994; Stützle and Hoos, 1996; 1997]. ACO, as introduced here, is typically applied to static problems, that is, to problems whose instances (i) are completely specified before the search process is started and (ii) do not change while the problem is being solved. (All combinatorial problems covered in this book are static in this sense.) In this case, the construction of candidate solutions, the perturbative local search phase and the pheromone updates are typically performed in a parallel and fully synchronised manner by all ants. There are, however, different approaches, such as Ant Colony System, an ACO method in which ants modify the pheromone trails during the construction phase [Dorigo and Gambardella, 1997]. When applying ACO to dynamic optimisation problems, that is, optimisation problems where parts of the problem instances (such as the objective function) change over time, the distinction between synchronous and asynchronous, decentralised phases of the algorithm becomes very important. This is reflected in the ACO metaheuristic [Dorigo et al., 1999; Dorigo and Di Caro, 1999; Dorigo and Stützle, 2004], which provides a general framework for ACO applications to both static and dynamic combinatorial problems. ACO algorithms have been applied to a wide range of combinatorial problems. The first ACO algorithm, Ant System, was applied to the TSP and several other combinatorial problems. It has been shown to be capable of solving some non-trivial instances of these problems, but its performance falls substantially short of that of state-of-the-art algorithms. Nevertheless, Ant System can be seen as a proof of concept that the ideas underlying ACO can be used to solve combinatorial optimisation problems. Following Ant System, many other Ant Colony Optimisation algorithms have been developed, including Ant Colony System [Dorigo and Gambardella, 1997], MAX–MIN Ant System [Stützle and Hoos, 1997; 2000] and the ANTS Algorithm [Maniezzo, 1999]. These algorithms differ in important aspects of their search control and introduce advanced features, such as the use of look-ahead or pheromone trail level updates during the construction phase, or diversification mechanisms such as bounds on the range of possible pheromone trail levels. Some of the most prominent ACO applications are to dynamic optimisation problems, such as routing in telecommunications networks, in which traffic patterns are subject to significant changes over time [Di Caro and Dorigo, 1998]. We refer to the book by
Dorigo and Stützle [2004] for a detailed account of the ACO metaheuristic, different ACO algorithms, theoretical results and ACO applications.
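The main algorithmic difference to the AICS variant of Example 2.9 lies in the population-based trail update of Equation 2.5. The following Python sketch (our own illustration, using the same conventions as the AICS sketch above) accumulates the update contributions of all tours in the current population:

def update_trails(tau, population, f, rho):
    # pheromone evaporation, followed by reinforcement summed over
    # all candidate solutions in the population (Equation 2.5)
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    for tour in population:
        delta = 1.0 / f(tour)  # Delta(i, j, s') = 1/f(s') for edges of s'
        for a, b in zip(tour, tour[1:] + tour[:1]):
            tau[a][b] += delta
            tau[b][a] += delta  # symmetric TSP assumed
    return tau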
Evolutionary Algorithms

With Ant Colony Optimisation, we saw an example of a population-based SLS method in which the only interaction between the individual elements of the population of candidate solutions is of a rather indirect nature, namely through the modification of a common memory (the pheromone trails). Perhaps the most prominent example of a type of population-based SLS algorithm based on much more direct interaction within a population of candidate solutions is the class of Evolutionary Algorithms (EAs). In a broad sense, Evolutionary Algorithms are a large and diverse class of algorithms inspired by models of the natural evolution of biological species [Bäck, 1996; Mitchell, 1996]. They transfer the principle of evolution through mutation, recombination and selection of the fittest, which leads to the development of species that are better adapted for survival in a given environment, to solving computationally hard problems. Evolutionary algorithms are generally iterative, population-based approaches: starting with a set of candidate solutions (the initial population), they repeatedly apply a series of three genetic operators: selection, mutation and recombination. Using these operators, in each iteration of an evolutionary algorithm, the current population is (completely or partially) replaced by a new set of candidate solutions; in analogy with the biological inspiration, the populations encountered in the individual iterations of the algorithm are often called generations. The selection operator implements a (generally probabilistic) choice of individual candidate solutions, either for the next generation or for the subsequent application of the mutation and recombination operators; it typically has the property that fitter individuals have a higher probability of being selected. Mutation is based on a unary operation on individuals that introduces small, often random modifications. Recombination is based on an operation that generates one or more new individuals (the offspring) by combining information from two or more individuals (the parents). The most commonly used type of recombination mechanism is called crossover; it was originally inspired by a fundamental mechanism of the same name in biological evolution, and essentially assembles pieces from a linear representation of the parents into a new individual. One major challenge in designing evolutionary algorithms is the design of recombination operators that combine parents in such a way that the resulting offspring are likely to inherit desirable properties from their parents, while improving on their parents' solution quality.
Note how Evolutionary Algorithms fit into our general definition of SLS algorithms, when the notion of a candidate solution as used in an SLS algorithm is applied to populations of candidate solutions of the given problem instance, as used in an EA. The concepts of search space, solution set and neighbourhood, as well as the generic functions init, step and terminate, can be easily applied to this population-based concept of a candidate solution. Nevertheless, to keep this description conceptually simple, in this section we continue to present evolutionary algorithms in the traditional way, where the notion of candidate solution refers to an individual of the population comprising the search state. Intuitively, by using a population of candidate solutions instead of a single candidate solution, a higher search diversification can be achieved, particularly if the initial population is randomly selected. The primary goal of Evolutionary Algorithms for combinatorial problems is to evolve the population such that good coverage of promising regions of the search space is achieved, resulting in high-quality solutions of a given optimisation problem instance. However, pure evolutionary algorithms often seem to lack the capability of sufficient search intensification, that is, the ability to reach high-quality candidate solutions efficiently when a good starting position is given, for example, as the result of recombination or mutation. Hence, in many cases, the performance of evolutionary algorithms for combinatorial problems can be significantly improved by adding a local search phase after applying mutation and recombination [Brady, 1985; Suh and Gucht, 1987; Mühlenbein et al., 1988; Ulder et al., 1991; Merz and Freisleben, 1997; 2000b] or by incorporating a local search process into the recombination operator [Nagata and Kobayashi, 1997]. The class of Evolutionary Algorithms thus obtained is usually called Memetic Algorithms (MAs) [Moscato, 1989; Moscato and Norman, 1992; Moscato, 1999; Merz, 2000] or Genetic Local Search [Ulder et al., 1991; Kolen and Pesch, 1994; Merz and Freisleben, 1997]. In Figure 2.15 we show the outline of a generic memetic algorithm. At the beginning of the search process, an initial population is generated using function init. In the simplest (and rather common) case, this is done by randomly and independently picking a number of elements of the underlying search space; however, it is equally possible to use, for example, a randomised construction search method instead of random picking. In each iteration of the algorithm, recombination, mutation, perturbative local search and selection are applied to obtain the next generation of candidate solutions. As usual, a number of termination criteria can be used for determining when to end the search process. The recombination function, recomb(π , sp), typically generates a number of offspring solutions by repeatedly selecting a set of parents and applying a recombination operator to obtain one or more offspring from these. As mentioned
procedure MA(π)
    input: problem instance π ∈ Π
    output: solution ŝ ∈ S(π) or ∅
    sp := init(π);
    sp := localSearch1(π, sp);
    ŝ := best(π, sp);
    while not terminate(π, sp) do
        sp′ := recomb(π, sp);
        sp′ := localSearch2(π, sp′);
        sp″ := mutate(π, sp ∪ sp′);
        sp″ := localSearch3(π, sp″);
        if f(best(π, sp′ ∪ sp″)) < f(ŝ) then
            ŝ := best(π, sp′ ∪ sp″);
        end
        sp := select(π, sp, sp′, sp″);
    end
    if ŝ ∈ S then
        return ŝ
    else
        return ∅
    end
end MA

Figure 2.15 Algorithm outline of a memetic algorithm for optimisation problems; best(π, sp) denotes the individual from a population sp with the best objective function value. (For details, see text.)
before, this operation is generally based on a linear representation of the candidate solutions, and pieces together the offspring from fragments of the parents; this type of mechanism creates offspring that inherit certain subsets of solution components from their parents. One of the most commonly used recombination mechanisms is the one-point binary crossover operator, which works as follows. Given two parent candidate solutions represented by strings x1 x2 . . . xn and y1 y2 . . . yn, first a cut point i is randomly chosen according to a uniform distribution over the index set {2, . . . , n}. Two offspring candidate solutions are then defined as x1 x2 . . . xi−1 yi yi+1 . . . yn and y1 y2 . . . yi−1 xi xi+1 . . . xn (see also Figure 2.16). One challenge when designing recombination mechanisms stems from the fact that often, simple crossover operators do not produce valid solution candidates. Consider, for example, a formulation of the TSP in which the candidate solutions are represented by permutations of the vertex set, written as vectors
Figure 2.16 Schematic representation of the one-point binary crossover operator.
(u1, u2, . . . , un). Using a simple one-point binary crossover operation as the basis for recombination obviously leads to vectors that do not correspond to Hamiltonian cycles of the given graph. In cases like this, either a repair mechanism has to be applied to transform the results of a standard crossover into a valid candidate solution, or special crossover operators have to be used that are guaranteed to produce only valid candidate solutions. In Chapter 8 we give two examples of high-performance memetic algorithms for the TSP that illustrate both possibilities. An overview of different specialised crossover operators for the TSP can be found in Merz and Freisleben [2001]. The role of the function mutate(π, sp ∪ sp′) is to introduce relatively small perturbations in the individuals in sp ∪ sp′. Typically, these perturbations are of a stochastic nature, and they are performed independently for each individual in sp ∪ sp′, where the amount of perturbation applied is controlled by a parameter called the mutation rate. It should be noted that mutation need not be applied to all individuals of sp ∪ sp′; instead, a subsidiary selection function can be used to determine which candidate solutions are to be mutated. (Until rather recently, the role of mutation compared to recombination for the performance of one of the most prominent types of evolutionary algorithms, Genetic Algorithms, has been widely underestimated [Bäck, 1996].)
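For binary representations, the two operators just discussed can be sketched in a few lines of Python (an illustration we add here; candidate solutions are represented as lists of 0/1 values, and mu denotes the number of bit flips performed by mutation, as in Example 2.11 below):

import random

def one_point_crossover(x, y):
    # cut point from {2, ..., n} (1-based), so both fragments are non-empty
    i = random.randint(1, len(x) - 1)
    return x[:i] + y[i:], y[:i] + x[i:]

def mutate(s, mu):
    # flip mu distinct, randomly chosen bits of the 0/1 list s
    s = list(s)
    for pos in random.sample(range(len(s)), mu):
        s[pos] = 1 - s[pos]
    return s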
As in ACO and AICS, perturbative local search is often useful and necessary for obtaining high-quality candidate solutions. It typically consists of selecting some or all individuals in the current population and then applying an iterative improvement procedure to each element of this set independently. Finally, the selection function used for determining the individuals that form the next generation sp of candidate solutions typically considers elements of the original population as well as the newly obtained candidate solutions, and selects from these based on their respective evaluation function values (which, in this context, are usually referred to as fitness values). Generally, the selection is done in such a way that candidate solutions with better evaluation function values have a higher chance of 'surviving' the selection process. Many selection schemes involve probabilistic choices; however, it is often beneficial to use elitist strategies, which ensure that the best candidate solutions are always selected. Generally, the goal of selection is to obtain a population with good evaluation function values, but at the same time, to ensure a certain diversity of the population.

Example 2.11 A Memetic Algorithm for SAT
As in the case of previous examples of SLS algorithms for SAT, given a propositional CNF formula F with n variables, we define the search space as the set of all variable assignments of F, the solution set as the set of all models of F, and a basic neighbourhood relation under which two variable assignments are neighbours if, and only if, they differ exactly in the truth value assigned to one variable (1-flip neighbourhood). As an evaluation function, we use the number of clauses in F unsatisfied under a given assignment. Note that the variable assignments for a formula with n variables can be easily represented as binary strings of length n by using an arbitrary ordering of the variables and representing the truth values ⊤ and ⊥ by 1 and 0, respectively. We keep the population size fixed at k assignments. To obtain an initial population, we use k (independent) iterations of Uninformed Random Picking from the search space, resulting in an initial population of k randomly selected variable assignments. The recombination procedure performs n/2 one-point binary crossovers (as defined above) on pairs of randomly selected assignments from sp, resulting in a set sp′ of n offspring assignments. The function mutate(F, sp ∪ sp′) simply flips µ randomly chosen bits of each assignment in sp ∪ sp′, where µ ∈ {1, . . . , n} is a parameter of the algorithm; this corresponds to performing µ steps of Uninformed Random Walk independently for all s ∈ sp ∪ sp′ (see also Section 1.5). For perturbative local search, we use the same iterative best improvement algorithm as in Example 1.4 (page 47f.), which is run until a locally minimal assignment is obtained. The function localSearch3(F, sp″) returns the set of assignments obtained by applying this procedure to each element of sp″; the same holds for localSearch1 and localSearch2. Finally, select(F, sp, sp′, sp″) applies a simple elitist selection scheme, in which the k best assignments in sp ∪ sp′ ∪ sp″ are selected to form the next generation (using random tie-breaking, if necessary). Note that this selection scheme ensures that the best assignment found so far is always included in the new population. The search process is terminated when a model of F is found or when a given number of iterations has been performed without finding a model.
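The elitist selection scheme used in this example can be rendered as follows (our own Python sketch; f returns the number of clauses unsatisfied under an assignment, and random tie-breaking is realised by shuffling before a stable sort):

import random

def elitist_select(candidates, f, k):
    # keep the k candidates with the lowest evaluation function value;
    # shuffling first yields random tie-breaking, since Python's sort
    # is stable
    pool = list(candidates)
    random.shuffle(pool)
    pool.sort(key=f)
    return pool[:k]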
So far, we are not aware of any Memetic Algorithm or Evolutionary Algorithm for SAT that achieves a performance comparable to state-of-the-art SAT algorithms. However, even when just following the general approach illustrated in this example, there are many alternate choices for the recombination, mutation, perturbative local search and selection procedures, few of which appear to have been implemented and studied so far.
The most prominent type of Evolutionary Algorithms for combinatorial problem solving has been the class of Genetic Algorithms (GAs) [Holland, 1975; Goldberg, 1989]. In early GA applications, individual candidate solutions were typically represented as bit strings of fixed length. Using this approach, interesting theoretical properties of certain Genetic Algorithms can be proven, such as the well-known Schema Theorem [Holland, 1975]. Yet, this type of representation has been shown to be disadvantageous in practice for solving certain types of combinatorial problems [Michalewicz, 1994]; in particular, this is the case for permutation problems such as the TSP, which are represented more naturally using different encodings. Besides Genetic Algorithms, there are two other major approaches based on the same metaphor of Evolutionary Computation: Evolution Strategies [Rechenberg, 1973; Schwefel, 1981] and Evolutionary Programming [Fogel et al., 1966]. All three approaches have been developed independently and, although all of them originated in the 1960s and 1970s, only in the beginning of the 1990s did researchers become fully aware of the common underlying principles [Bäck, 1996]. These three types of Evolutionary Algorithms tend to be primarily applied to different types of problems: While Genetic Algorithms are typically used for solving discrete combinatorial problems, Evolution Strategies and Evolutionary Programming were originally developed for solving (continuous) numerical optimisation problems. For a detailed discussion of the similarities and differences between these different types of Evolutionary Algorithms and their applications, we refer to Bäck [1996].
2.5 Further Readings and Related Work

There exists a huge amount of literature on the various SLS methods discussed in this chapter. Since it would be impossible to give a reasonably complete list of references, we refer the interested reader to some of the most relevant and accessible literature, and point out books as well as conference and workshop proceedings that will provide additional material and further references.
There are relatively few books that provide a general introduction to and overview of different SLS techniques. One of these is the book on 'modern heuristics' by Michalewicz and Fogel [2000], which is rather focused on Evolutionary Algorithms but also discusses other SLS methods; another one is the book by Sait and Youssef [1999], which includes the discussion of two lesser-known SLS techniques: Simulated Evolution and Stochastic Evolution. For a tutorial-like introduction to some of the SLS techniques covered in this chapter, such as SA, TS or GAs, we refer to the book edited by Reeves [1993b]. More advanced material is provided in the book on local search edited by Aarts and Lenstra [1997], which contains expert introductions to individual SLS techniques as well as overviews on the state-of-the-art of applying SLS methods to various combinatorial problems. The Handbook of Metaheuristics [Glover and Kochenberger, 2002] includes reviews of different SLS methods and additional related topics by leading experts on the respective subjects. There is a large number of books dedicated to individual SLS techniques. This is particularly true for Evolutionary Algorithms, one of the oldest and most developed SLS methods. Currently, the classics in this field are certainly the early books describing these techniques [Holland, 1975; Goldberg, 1989; Schwefel, 1981; Fogel et al., 1966]; the book by Mitchell [1996] offers a good introduction to Genetic Algorithms. Similarly, there exist a number of books dedicated to Simulated Annealing, including Aarts and Korst [1989] or van Laarhoven and Aarts [1987]. For an overview of the literature on SA as of 1988 we refer to Collins et al. [1988]. A tutorial-style overview of SA is given in Dowsland [1993], and a summary of theoretical results and statistical annealing schedules can be found in Aarts et al. [1997]. For a general overview of Tabu Search and detailed discussions of its features, we refer to the book by Glover and Laguna [1997]. This book also covers in detail various more advanced strategies, such as Strategic Oscillation and Path Relinking, as well as some lesser-known tabu search methods. Ant Colony Optimisation is covered in detail in the book by Dorigo and Stützle [2004]. For virtually all of the SLS methods covered in this chapter, large numbers of research articles have been published in a broad range of journals and conference proceedings. Research on some of the most prominent SLS methods is presented at dedicated conferences or workshop series. Again, Evolutionary Algorithms are particularly well represented, with conference series like GECCO (Genetic and Evolutionary Computation Conference), CEC (Congress on Evolutionary Computation) or PPSN (Parallel Problem Solving from Nature) as well as some smaller conferences and workshops dedicated to specific subjects and issues in the general context of Evolutionary Algorithms. Similarly, the ANTS series of workshops (From Ant Colonies to Artificial Ants: A Series of International Workshops on Ant Algorithms) provides a specialised forum for research on Ant Colony Optimisation algorithms and closely related topics. Many of the
most recent developments and results in these areas can be found in the respective proceedings. The Metaheuristics International Conference (MIC) series, initiated in 1995, has a broader scope, including many of the SLS methods described in Sections 2.2 and 2.3 of this chapter. The corresponding post-conference collections of articles [Osman and Kelly, 1996; Voß et al., 1999; Hansen and Ribeiro, 2001; Resende and de Sousa, 2003] are a good reference for recent developments in this general area. An extensive, commented bibliography on various SLS methods can be found in Osman and Laporte [1996]. In the operations research community, papers on SLS algorithms now appear frequently in journals such as the INFORMS Journal on Computing, Operations Research, the European Journal of Operational Research and Computers & Operations Research. There even exists one journal, the Journal of Heuristics, that is dedicated to research related to SLS methods. Since the early 1990s, SLS algorithms have also been very prominent in the artificial intelligence community, particularly in the context of applications to SAT, constraint satisfaction, planning and scheduling. The proceedings of major AI conferences, such as IJCAI (International Joint Conference on Artificial Intelligence), AAAI (AAAI National Conference on Artificial Intelligence) and ECAI (European Conference on Artificial Intelligence), as well as the proceedings of the CP (Principles and Practice of Constraint Programming) conferences and leading journals in AI, including Artificial Intelligence and the Journal on AI Research (JAIR), contain a large number of articles on SLS algorithms and their application to AI problems (we will provide many of these references in Part II of this book). There are a number of SLS methods that we did not present in this chapter, some of which are closely related to the approaches we discussed. First, let us mention that there exists a number of other algorithms that make use of large neighbourhoods. Prominent examples are Ejection Chains [Glover and Laguna, 1997] and Cyclic Exchange Neighbourhoods [Thompson and Orlin, 1989; Thompson and Psaraftis, 1993]. For an overview of SLS methods based on very large scale neighbourhoods we refer to Ahuja et al. [2002]. Other SLS methods include: Threshold Accepting, a variant of Simulated Annealing that uses a deterministic acceptance criterion [Dueck and Scheuer, 1990]; Extremal Optimisation, which in each step tries to eliminate 'defects' of the current candidate solution and accepts every new candidate solution independent of its solution quality [Boettcher and Percus, 2000; Boettcher, 2000]; and Variable Neighbourhood Search (VNS), which is based on the fundamental idea of changing the neighbourhood relation during the search process. The VNS framework comprises Variable Neighbourhood Descent (see Section 2.1) as well as various methods that can be seen as special cases of Iterated Local Search (see Section 2.3); other VNS algorithms, however, such as Variable Neighbourhood Decomposition
Search (VNDS) [Hansen and Mladenović, 2001b], differ significantly from the SLS methods discussed in this chapter. Ant Colony Optimisation is only the most successful example of a class of algorithms that are often referred to as swarm intelligence methods [Bonabeau et al., 1999; Kennedy et al., 2001]. A class of techniques inspired by Evolutionary Algorithms is that of Estimation of Distribution Algorithms (EDAs) [Baluja and Caruana, 1995; Larrañaga and Lozano, 2001; Mühlenbein and Paaß, 1996]; these algorithms build and iteratively update a probabilistic model of good candidate solutions that is used to generate populations of candidate solutions. Another population-based SLS method is Scatter Search [Glover, 1977; Glover et al., 2002; Laguna and Martí, 2003], which is similar to Memetic Algorithms, but typically uses a more general notion of recombination and differs in some other details of how the population is handled. Several of these and other SLS methods are described in the Handbook of Metaheuristics [Glover and Kochenberger, 2002] and in the book New Ideas in Optimisation [Corne et al., 1999].
2.6 Summary

At the beginning of this chapter we discussed important details and refinements of Iterative Improvement, one of the most fundamental SLS methods. Large neighbourhoods can be used to improve the performance of iterative improvement algorithms, but they are typically very costly to search; in this situation, as well as in general, neighbourhood pruning techniques and pivoting rules, such as first-improvement neighbour selection, can help to increase the efficiency of the search process. More advanced SLS methods, such as Variable Neighbourhood Descent (VND), Variable Depth Search (VDS) and Dynasearch, use dynamically changing or complex neighbourhoods to achieve improved performance over simple iterative improvement algorithms. Although these strategies yield significantly better performance for a variety of combinatorial problems, they are also typically more difficult to implement than simple iterative improvement algorithms and often require advanced data structures to realise their full benefit. Generally, the main problem with simple iterative improvement algorithms is the fact that they get easily stuck in local optima of the underlying evaluation function. By using large or complex neighbourhoods, some poor-quality local optima can be eliminated; but at the same time, these extended neighbourhoods are typically more costly or more difficult to search. Therefore, in this
chapter we introduced and discussed various other approaches for dealing with the problem of local optima as encountered by simple iterative improvement algorithms: allowing worsening search steps, that is, search steps which achieve no improvement in the given evaluation or objective function, such as in Simulated Annealing (SA), Tabu Search (TS) and many Iterated Local Search (ILS) algorithms and Evolutionary Algorithms (EAs); dynamically modifying the evaluation function, as exemplified in Dynamic Local Search (DLS); and using adaptive constructive search methods for providing better initial candidate solutions for perturbative search methods, as seen in GRASP, Adaptive Iterated Construction Search (AICS) and Ant Colony Optimisation (ACO). Each of these approaches has certain drawbacks. Allowing worsening search steps introduces the need to balance the ability to quickly reach good candidate solutions (as realised by a greedy search strategy) vs the ability to effectively escape from local optima and plateaus. Dynamic modifications of the evaluation function can eliminate local optima, but at the same time typically introduce new local optima; in addition, as we will see in Chapter 6, it can be difficult to amortise the overhead cost introduced by the dynamically changing evaluation function by a reduction in the number of search steps required for finding (high-quality) solutions. The use of adaptive constructive search methods for obtaining good initial solutions for subsequent perturbative SLS methods raises a very similar issue; here, the added cost of the construction method needs to be amortised.
parameters, such as a subsidiary SLS procedure, to be specified in order to be applied to a given problem. It is interesting to note that some of the hybrid algorithms discussed here, including ACO and EAs, originally did not include the use of perturbative local search for improving individual candidate solutions. However, adding such perturbative local search mechanisms has been found to significantly improve the performance of the algorithm in many applications to combinatorial problems. Two of the SLS methods discussed here, ACO and EAs, can be characterised as population-based search methods; these maintain a population of candidate solutions that is manipulated and evaluated during the search process. Most state-of-the-art population-based SLS approaches integrate features from the individual elements of the population in order to guide the search process. In ACO, this integration is realised by the pheromone trails which provide the basis for the probabilistic construction process, while in EAs, it is mainly achieved through recombination. In contrast, all the ‘simple’ SLS algorithms discussed in Section 2.2 as well as ILS, GRASP and AICS manipulate only a single candidate solution in each search step. In many of these cases, such as ILS, various population-based extensions are easily conceivable [Hong et al., 1997; Stützle, 1998c]. Integrating features of populations of candidate solutions can be seen as one (rather indirect) mechanism that uses memory for guiding the search process towards promising regions of the search space. The weights used in AICS serve exactly the same purpose. A similarly indirect form of memory is represented by the penalties used by DLS; only here, the purpose of the memory is at least as much to guide the search away from the current, locally optimal search position, as to guide it towards better candidate solutions. The prototypical example of an SLS method that strongly exploits an explicit form of memory for directing the search process is Tabu Search. Many SLS methods were originally inspired by natural phenomena; examples of such methods are SA, ACO and EAs, along with several other SLS methods that we did not cover in detail. Because of the original inspiration, the terminology used for describing these SLS methods is often heavily based on jargon from the corresponding natural phenomenon. Yet, closer study of these computational methods, and in particular of the high-performance algorithms derived from them, often reveals that their performance has little or nothing to do with the aspects and features that are important in the context of the corresponding natural processes. In fact, it can be argued that the most successful nature-inspired SLS methods for combinatorial problem solving are those that have been liberated to a large extent from the context of the phenomenon that originally motivated them, and use the new mechanisms and concepts derived from that original context to effectively guide the search process.
Finally, it should be pointed out that in virtually all of the local search methods discussed in this chapter, the use of random or probabilistic decisions results in significantly improved performance and robustness of these algorithms when solving combinatorial problems in practice. One of the reasons for this lies in the diversification achieved by stochastic methods, which is often crucial for effectively avoiding or overcoming stagnation of the search process. In principle, it would certainly be preferable to obviate altogether the need for diversification by using strategies that guide the search towards (high-quality) solutions in an efficient and reliable way. But given the inherent hardness of the problems to which SLS methods are typically applied, it is hardly surprising that in practice such strategies are typically not available, leaving stochastic local search as one of the most attractive solution approaches.
Exercises

2.1 [Easy] What is the role of 2-exchange steps in the Lin-Kernighan Algorithm?

2.2 [Medium] Design and describe a variable depth search algorithm for SAT.

2.3 [Easy] Show that the condition for the independence of a pair of 2-exchange steps to be considered for a complex dynasearch move is necessary to guarantee feasibility of the tour obtained after executing a pair of 2-exchange moves. To do so, consider what happens if we have $i < k < j < l$ for the indices of the 2-exchange moves that delete edges $(u_i, u_{i+1})$, $(u_j, u_{j+1})$ and $(u_k, u_{k+1})$, $(u_l, u_{l+1})$, respectively.

2.4 [Easy] Show that Iterative Improvement and Randomised Iterative Improvement can be seen as special cases of Probabilistic Iterative Improvement.

2.5 [Easy] Explain the impact of the value of the tabu tenure parameter in Simple Tabu Search on the diversification vs intensification of the search process.

2.6 [Medium] Which tabu attributes would you choose when applying Simple Tabu Search to the TSP? Are there different possibilities for deciding when a move is tabu? Characterise the memory requirements for efficiently checking the tabu status of solution components.

2.7 [Medium] Why is it preferable in Dynamic Local Search to associate penalties with solution components rather than with candidate solutions?

2.8 [Medium; Implementation] Implement the Guided Local Search (GLS) algorithm for the TSP described in Example 2.6 (page 84). (You can make use of the 2-opt implementation available at www.sls-book.net; if you do so, think carefully about how to best integrate the edge penalties into the local search procedure.) For the search initialisation, use a tour returned by Uninformed Random Picking. Run this implementation of GLS on TSPLIB instance pcb3038 (available from TSPLIB [2003]). Perform 100 independent runs of the algorithm with n search steps each, where n = 3 038 is the number of vertices in the given TSP instance. Record the best solution quality reached in each run and report the distribution of these solution quality values (cf. Example 2.1, page 64f.). For comparison, modify your implementation such that instead of Guided Local Search, it realises a variant of the randomised first improvement algorithm for the TSP described in Example 2.1 (page 64f.) that initialises the search by Uninformed Random Picking. Measure the distribution of solution qualities obtained from 100 independent runs of this algorithm on TSPLIB instance pcb3038, where each run is terminated when a local minimum is encountered. Compare the solution quality distribution thus obtained with that for GLS: what do you observe? What can you say about the run-time required by both algorithms, using the same termination criteria as in the previous experiments?

2.9 [Medium] Show precisely how Memetic Algorithms fit the definition of an SLS algorithm from Chapter 1, Section 1.5.

2.10 [Medium] The various SLS methods described in this chapter can be classified according to different criteria, including: (1) the use of a population of solutions, (2) the explicit use of memory (other than just for storing control parameters), (3) the number of different neighbourhood relations used in the search, (4) the modification of the evaluation function during the search and (5) the inspiring source of an algorithm (e.g., by natural phenomena). Classify the SLS methods discussed in this chapter according to these criteria.
The purpose of models is not to fit the data but to sharpen the questions. —Samuel Karlin, Mathematician & Bioinformatician
Generalised Local Search Machines

In this chapter, we introduce Generalised Local Search Machines (GLSMs), a formal framework for stochastic local search methods. The underlying idea is that most efficient SLS algorithms are obtained by combining simple (pure) search strategies using a control mechanism; in the GLSM model, the control mechanism is essentially realised by a non-deterministic finite state machine. GLSMs provide a uniform framework capable of representing most modern SLS methods in an adequate way; they facilitate representations that clearly separate between search and search control.

After defining the basic GLSM model, we establish the relation between our definition of stochastic local search algorithms and the GLSM model. Next, we discuss several aspects of the model, such as state types, transition types and structural GLSM types; we also show how various well-known SLS methods can be represented in the GLSM framework. Finally, we address extensions of the basic GLSM model, such as co-operative, learning and evolutionary GLSMs.
3.1 The Basic GLSM Model

Many high-performance SLS algorithms are based on a combination of several simple search strategies, such as Iterative Best Improvement and Random Walk, or the subsidiary local search and perturbation procedures in Iterated Local Search. Such algorithms can be seen as operating on two levels: at a lower level, the underlying simple search strategies are executed, while activation of and transitions between different strategies is controlled at a higher level. The main idea underlying the concept of a Generalised Local Search Machine (GLSM) is to explicitly represent the higher-level search control mechanism in the form of a finite state machine.

Finite state machines (FSMs) are one of the most prominent formal models in the computing sciences [Hopcroft et al., 2001; Sipser, 1997]. They can be seen as abstractions of systems characterised by a finite number of states. Starting in a specific state, the current state of an FSM can change as a response to certain events, for example, a signal received from its environment; these changes in system state are called state transitions. As one of the simplest control paradigms, FSMs are widely used to model systems, processes and algorithms in many domains, such as hardware design or state-of-the-art computer games.

Intuitively, a Generalised Local Search Machine (GLSM) for a given problem Π is an FSM in which each state corresponds to a simple SLS method for Π. The machine starts in an initial state z0; it then executes one step of the SLS method associated with the current state and selects a new state according to a transition relation ∆ in a nondeterministic manner. This is iterated until a termination condition is satisfied; this termination condition typically depends on the search state (e.g., the evaluation or objective function value of the current candidate solution), the search history (e.g., the number of local search steps or state transitions performed) or resource bounds (e.g., the total CPU time consumed).

For most SLS methods, the termination predicate used typically depends more on the specific application context than on the underlying search strategy. Therefore, for simplicity's sake, a termination condition is not explicitly included in our GLSM model; instead, we consider it as part of the run-time environment. (Note, however, that analogously to standard FSM models, termination conditions could easily be included in the GLSM model in the form of absorbing final states and appropriate state transitions, such that when a final state is reached, the machine halts and the search process is terminated.) Like SLS algorithms, GLSMs can make use of additional memory, for example, to model parameters of the underlying search strategies or to memorise previously encountered candidate solutions.
Definition 3.1 Generalised Local Search Machine
A Generalised Local Search Machine (GLSM) is formally defined as a tuple

$$\mathcal{M} := (Z, z_0, M, m_0, \Delta, \sigma_Z, \sigma_\Delta, \tau_Z, \tau_\Delta)$$

where $Z$ is a set of states and $z_0 \in Z$ the initial state. As in Definition 1.10 (page 38f.), $M$ is a set of memory states and $m_0 \in M$ is the initial memory state. $\Delta \subseteq Z \times Z$ is the transition relation for $\mathcal{M}$; $\sigma_Z$ and $\sigma_\Delta$ are sets of state types and transition types, respectively, while $\tau_Z : Z \to \sigma_Z$ and $\tau_\Delta : \Delta \to \sigma_\Delta$ associate states and transitions with their corresponding types. We call $\tau_Z(z)$ the type of state $z$ and $\tau_\Delta((z_1, z_2))$ the type of transition $(z_1, z_2)$, respectively.
Note that in this definition, state types are used to formally represent the (typically simple) SLS methods associated with the GLSM states, and transition types capture the various strategies that are used to switch between the GLSM states. (Transition types are further discussed in the next section, and practically relevant examples of state types are given in Section 3.3.)

It is often useful to assume that $\sigma_Z$ and $\sigma_\Delta$ do not contain any types that are not associated with at least one state or transition of the given machine (i.e., $\tau_Z$ and $\tau_\Delta$ are surjective). In this case, we define the type of machine $\mathcal{M}$ as $\tau(\mathcal{M}) := (\sigma_Z, \sigma_\Delta)$. We allow for several states of $\mathcal{M}$ having the same type (i.e., $\tau_Z$ need not be injective). However, in many cases there is precisely one state of each state type; in the following, when this is the case, we will use the same symbols for denoting states and their respective types, as long as their meaning is clear from the context. Furthermore, for simplicity, we assume that different state types always represent different search strategies.

Note that we do not require that each of the states in $Z$ can actually be reached when starting in state $z_0$; as we will see shortly, it is generally not trivial to decide this form of reachability. Nevertheless, it is desirable to ensure, whenever possible, that a given GLSM does not contain unreachable states.
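For concreteness, the structural components of Definition 3.1 can be rendered as a small Python data type. The following is a minimal sketch of our own, not part of the original formalism: the memory space M and the type sets σZ and σ∆ are left implicit in the values actually used, and the semantics of state and transition types are supplied separately by the run-time environment.

from dataclasses import dataclass
from typing import Dict, Hashable, Tuple

State = Hashable     # elements of the state set Z
Memory = Hashable    # elements of the memory space M

@dataclass
class GLSM:
    # Structural part of a GLSM as in Definition 3.1; the semantics of the
    # state and transition types are supplied by the run-time environment.
    states: frozenset                        # Z
    z0: State                                # initial state z0
    m0: Memory                               # initial memory state m0
    delta: Dict[Tuple[State, State], tuple]  # Delta, folded with tau_Delta:
                                             # (z1, z2) -> transition type
    tau_Z: Dict[State, str]                  # tau_Z: state -> state type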
Example 3.1 A Simple 3-State GLSM

The following GLSM models a hybrid SLS strategy which, after initialising the search (state $z_0$), probabilistically alternates between two search strategies (states $z_1$ and $z_2$):

$$\mathcal{M} := (\{z_0, z_1, z_2\}, z_0, \{m_0\}, m_0, \Delta, \sigma_Z, \sigma_\Delta, \tau_Z, \tau_\Delta)$$

where

$$\Delta := \{(z_0, z_1), (z_1, z_2), (z_2, z_1), (z_1, z_1), (z_2, z_2)\}$$
$$\sigma_Z := \{z_0, z_1, z_2\}$$
$$\sigma_\Delta := \{\mathrm{PROB}(p) \mid p \in \{1, p_1, p_2, 1-p_1, 1-p_2\}\}$$
$$\tau_Z(z_i) := z_i, \quad i \in \{0, 1, 2\}$$
$$\tau_\Delta((z_0, z_1)) := \mathrm{PROB}(1)$$
$$\tau_\Delta((z_1, z_2)) := \mathrm{PROB}(p_1)$$
$$\tau_\Delta((z_2, z_1)) := \mathrm{PROB}(p_2)$$
$$\tau_\Delta((z_1, z_1)) := \mathrm{PROB}(1-p_1)$$
$$\tau_\Delta((z_2, z_2)) := \mathrm{PROB}(1-p_2)$$
Figure 3.1 Simple 3-state GLSM, representing a hybrid SLS method that, after initialising the search process, probabilistically alternates between two search strategies.
Intuitively, transitions of type PROB(p) from the current state will be executed with probability p. The generic transition type PROB(p) formally corresponds to unconditional, probabilistic transitions with an associated transition probability p; it will be presented in more detail in Section 3.2. Note that there is only one memory state, m0, which indicates that the search process does not make use of memory other than the GLSM state. This type of GLSM can be used, for example, to model a variant of Randomised Iterative Improvement, in which case the state types z0, z1, z2 may represent the simple SLS methods Uninformed Random Picking, Best Improvement and Uninformed Random Walk (see also Section 3.3).
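Using the GLSM class sketched after Definition 3.1, the machine of Example 3.1 could be written down as follows; the numeric values of p1 and p2 and the RP/BI/RW typing are illustrative choices of our own, in line with the Randomised Iterative Improvement reading just mentioned.

p1, p2 = 0.3, 0.1   # arbitrary example values for the switching probabilities

example_3_1 = GLSM(
    states=frozenset({"z0", "z1", "z2"}),
    z0="z0",
    m0=None,   # a single memory state m0: no additional memory is used
    delta={
        ("z0", "z1"): ("PROB", 1.0),
        ("z1", "z2"): ("PROB", p1),
        ("z1", "z1"): ("PROB", 1 - p1),
        ("z2", "z1"): ("PROB", p2),
        ("z2", "z2"): ("PROB", 1 - p2),
    },
    tau_Z={"z0": "RP", "z1": "BI", "z2": "RW"},
)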
We will usually specify GLSMs more intuitively using a standard graphic representation for the finite state machine part. In this graphic notation, GLSM states are represented by circles labelled with the respective state types, and state transitions are represented by arrows, labelled with the respective transition types; the initial state is marked by an incoming arrow that does not originate from another state. That arrow may be annotated by the initial memory state, m0 , or a procedure that generates m0 , if the given GLSM uses additional memory, that is, it has a memory space M with more than one element. The graphic representation for the GLSM from Example 3.1 is shown in Figure 3.1.
GLSM Semantics

Having formally defined the structure and components of a Generalised Local Search Machine, we now need to specify the semantics of this formal model, that is, the way in which it works when applied to a specific instance π of a combinatorial problem Π.
Obviously, the operation of any given GLSM M crucially depends on the search strategies represented by its states and on the nature of the transitions between these states. These, together with the problem instance to be solved as well as the corresponding search space, the solution set, the neighbourhood relations for the SLS methods associated with each state type and the termination predicate of the stochastic local search process to be modelled, form the run-time environment of M. The separation between the components included in the GLSM definition and those included in the run-time environment is mainly motivated by the goal to have the GLSM model capture the higher-level search control mechanism, independent of the subsidiary SLS methods associated with the states and independent of the given problem instance.

Remark: The memory space M is the only component of a GLSM that may depend on the given problem instance. However, since in many cases memory states play an important part in the context of the high-level control strategy of a hybrid SLS algorithm, such as Dynamic Local Search or Reactive Tabu Search, it makes sense to include it in the definition of a GLSM rather than in that of the run-time environment. The same argument applies, to a lesser extent, to the search space, since features of the current candidate solution can also be used for search control purposes. In the context of the formal definition of GLSM semantics, the separation of the components into two mathematical objects, the GLSM and its run-time environment, is not important.

For now, we will focus on the part of the search process performed by a GLSM that is independent of the run-time environment, and in particular of the nature of the state and transition types. (Transition types are further discussed in the next section, and practically relevant examples of state types are given in Section 3.3.)

Generally, the operation of a GLSM M in a given run-time environment can be described as follows. First, the state of M is set to the initial state z0, the memory state is set to m0, and an initial candidate solution is determined. Rather than specifying the initial candidate solution as a part of the GLSM model, we generate it using the search method associated with the initial state z0, which in the simplest case is Uninformed Random Picking. After initialisation, M performs GLSM steps, each of which consists of a search step according to the current state of M, followed by a state transition, which may depend on the new candidate solution and memory state of M, and may result in a new memory state. These GLSM steps are iterated until the termination predicate specified in the run-time environment is satisfied.
Example 3.2 Semantics of a Simple 3-State GLSM
The operation of the simple 3-state GLSM from Example 3.1 (page 115f.) can be described intuitively as follows. For a given problem instance π, the local search process is initialised by setting the machine to its initial state z0. The memory state is set to m0, but since it never changes, we ignore it in the following (m0 is the only memory state). Then, one local search step is performed according to state type z0 (this step is designed to initialise the subsequent local search process, for example, by randomly generating or selecting a candidate solution), and with a probability of one, the machine switches to state z1. Now, one search step is performed according to the search strategy associated with state z1, and with probability p1, the machine switches to state z2; otherwise, it remains in z1. The behaviour in state z2 is very similar: first, a z2 step is performed, and then with probability p2, the machine switches back to state z1; otherwise, it remains in z2. This results in a local search process which repeatedly and nondeterministically switches between phases of z1 and z2 steps. However, a z0 step is performed only once in each run of the machine, namely at the very beginning of the local search process. As previously discussed, a number of different termination criteria can be used for this type of search process.

Note that the operation of M is uniquely characterised by the state of M, the candidate solution of the search process realised by M, and the memory state of M at any given point in time, where time is measured in GLSM steps. This information can be captured in the form of two functions, the actual state trajectory and the actual search trajectory of M, which specify the machine state and the candidate solution of M as well as the memory state over time.

However, due to the inherent stochasticity of GLSMs and the SLS algorithms they model, the outcome of each GLSM step is generally not a single candidate solution, machine state and memory state, but rather a set of probability distributions over machine states, candidate solutions and memory states. Therefore, to completely characterise the behaviour of a given GLSM, two functions are used that define the probability distributions over candidate solutions as well as over machine states and memory states over time: the probabilistic search trajectory and the probabilistic state trajectory.

Technically, when additional memory is used, we allow the memory state to be affected by state transitions (this will be further discussed in Section 3.2, and examples are given in Section 3.3), but not by search steps, and consequently we consider it to be part of the state trajectory. This reflects our view of the memory state as a part of the higher-level search control mechanism represented by the GLSM.
In Depth: Formal Definition of GLSM Semantics

In the following, we show how the semantics of a GLSM, in terms of its probabilistic and actual search and state transition functions, can be formally defined. For simplicity's sake, we ignore the termination predicate that is part of the given run-time environment. (Note, however, that in principle the termination predicate can be integrated into the following definitions in a rather straightforward way.)

We assume that the semantics of each state type $\tau$, which are formally part of the run-time environment, are defined in the form of search transition functions $\gamma_\tau : S \times M \to \mathcal{D}(S)$, where $S$ denotes the set of candidate solutions in the search space induced by the given problem instance, $M$ is the given set of memory states and $\mathcal{D}(S)$ represents the set of probability distributions over $S$. Intuitively, $\gamma_\tau$ determines for each candidate solution in $S$ and memory state in $M$ the resulting candidate solution after one $\tau$-step has been performed; it corresponds to the step function of the search strategy associated with $\tau$ and can be defined functionally or procedurally (see also Chapter 1).

Based on the subsidiary search transition functions $\gamma_\tau$, we now define the direct search transition function $\gamma_S : S \times M \times Z \to \mathcal{D}(S)$ which, for a given search position, memory state and GLSM state, determines the distribution over search positions after one step of the GLSM:

$$\gamma_S(s, m, z)(s') := \gamma_\tau(s, m)(s')$$

where $\tau := \tau_Z(z)$ is the type of state $z$, and $\gamma_\tau(s, m)$ is the distribution over candidate solutions reached from $s$ under memory state $m$ after one $\tau$-step. (Recall that, according to our earlier convention, $\gamma_\tau(s, m)(s')$ denotes the probability of $s'$ under that distribution, and $\gamma_S(s, m, z)(s')$ denotes the probability of $s'$ under the distribution $\gamma_S(s, m, z)$.)

Furthermore, we assume that the semantics of the state transitions in the given GLSM are specified in the form of a function $\gamma_Z : S \times M \times Z \to \mathcal{D}(Z \times M)$ that models the direct transitions between states of the given GLSM. (Note that this allows for state transitions to depend on and to affect memory states, as will be discussed in detail later, in Section 3.2.) Formally, this function can be defined on the basis of the transitions from a given state $z$ and the semantics of the respective transition types; for example, given an unconditional, probabilistic transition between two states $z_i, z_k$, that is, $\tau_\Delta((z_i, z_k)) = \mathrm{PROB}(p)$, applying the direct state transition function $\gamma_Z$ to a search position $s$, memory state $m$ and GLSM state $z_i$ gives a distribution $D$ over GLSM states and memory states such that $D((z_k, m)) = p$, where here and in the following, $D(e)$ denotes the probability of $e$ under distribution $D$. Note that in this example, the transition does not affect the memory state $m$.

The direct search transition function $\gamma_S$ and the state transition function $\gamma_Z$ can be generalised to model the effects of a single GLSM step on a given distribution of candidate solutions, memory states and GLSM states. In the case of $\gamma_S$, this is modelled by a function $\Gamma_S$ defined as:

$$\Gamma_S(D_S, D_{ZM})(s') := \sum_{s \in S,\, m \in M,\, z \in Z} \gamma_S(s, m, z)(s') \cdot D_S(s) \cdot D_{ZM}(z, m)$$

Note that the probability of candidate solution $s'$ after the step is obtained from the probabilities of going in state $z$ from candidate solution $s$ and memory state $m$ to candidate solution $s'$, weighted by the probabilities of $s$, $m$ and $z$ under the given distributions $D_S$ and $D_{ZM}$. Analogously, we generalise $\gamma_Z$ to the function $\Gamma_Z$, which models the effects of one GLSM step on a given distribution of GLSM states and memory states:

$$\Gamma_Z(D_S, D_{ZM})(z', m') := \sum_{s \in S,\, m \in M,\, z \in Z} \gamma_Z(s, m, z)(z', m') \cdot D_S(s) \cdot D_{ZM}(z, m)$$

Note that the memory state may be affected only when the GLSM state is updated, but not as a side-effect of a search step. This reflects the view that GLSM states and memory states are conceptually closely related, since both model aspects of the search control mechanism.

Based on $\Gamma_S$ and $\Gamma_Z$, we can now give the following inductive definition of the probabilistic search and state trajectory functions $\gamma^*_S : \mathbb{N} \to \mathcal{D}(S)$ and $\gamma^*_Z : \mathbb{N} \to \mathcal{D}(Z \times M)$:

$$\gamma^*_S(0)(s) := 1/\#S$$
$$\gamma^*_S(t+1) := \Gamma_S(D_S, D_{ZM}), \quad \text{where } D_S := \gamma^*_S(t) \text{ and } D_{ZM} := \gamma^*_Z(t)$$

$$\gamma^*_Z(0)(z, m) := \begin{cases} 1 & \text{if } z = z_0 \text{ and } m = m_0 \\ 0 & \text{otherwise} \end{cases}$$
$$\gamma^*_Z(t+1) := \Gamma_Z(D_S, D_{ZM}), \quad \text{where } D_S := \gamma^*_S(t+1) \text{ and } D_{ZM} := \gamma^*_Z(t)$$

The interlocked inductive definitions of $\gamma^*_S$ and $\gamma^*_Z$ reflect the intended operation of a GLSM, where in each step, first a new candidate solution $s'$ is determined based on the current candidate solution $s$, memory state $m$ and GLSM state $z$, and then a new GLSM state $z'$ and memory state $m'$ are determined based on $s'$, $m$ and $z$. Note that the choice of the initial distribution of candidate solutions is somewhat arbitrary, since the true starting point of the search trajectory is typically determined by means of the first search step in state $z_0$ (with initial memory state $m_0$) in a way that is effectively independent of the current candidate solution.

The actual search and state trajectories are formally defined in a similar manner, in the form of two functions $\delta^*_S : \mathbb{N} \to S$ and $\delta^*_Z : \mathbb{N} \to Z \times M$:

$$\delta^*_S(0) := \mathit{draw}(\gamma^*_S(0))$$
$$\delta^*_S(t+1) := \mathit{draw}(\Gamma_S(D_S, D_{ZM})), \quad \text{where } D_S := \gamma^*_S(t) \text{ and } D_{ZM} := \gamma^*_Z(t)$$

$$\delta^*_Z(0) := (z_0, m_0)$$
$$\delta^*_Z(t+1) := \mathit{draw}(\Gamma_Z(D_S, D_{ZM})), \quad \text{where } D_S := \gamma^*_S(t+1) \text{ and } D_{ZM} := \gamma^*_Z(t)$$

In these definitions, the function $\mathit{draw}(D)$ randomly selects an element from the domain of a given probability distribution $D$, such that element $e$ is chosen with probability $D(e)$. Note the similarity between these definitions and those of the probabilistic search and state trajectories $\gamma^*_S$ and $\gamma^*_Z$; the only difference is that in the case of the actual trajectories, in each GLSM step a single candidate solution and a single GLSM state and memory state are randomly chosen according to the respective distributions.
GLSMs as Factored Representations of SLS Strategies

Technically, a GLSM represents the higher-level search control of an SLS strategy, that is, the way in which the initialisation and step function of the SLS method are composed from the respective functions of subsidiary component SLS methods. In this sense, a GLSM is a factored representation of an SLS strategy.

Note that the memory used by many SLS strategies, such as Tabu Search or Ant Colony Optimisation, is treated as an explicit part of the high-level search control mechanism that is modelled by a GLSM. In particular, as previously mentioned, in a GLSM formalisation, all modifications of the memory state are performed by means of GLSM state transitions and not in combination with the actual search steps of the underlying component search strategies, which may depend on the memory state, but only affect the current candidate solution. (Technically, the memory modification performed along with a state transition may depend on the current search position and memory state; memory updates that depend on the precise nature of the last search step, such as required in Tabu Search, can be implemented by keeping the previous candidate solution in memory.) The other components of an SLS algorithm, namely the search space, solution set and neighbourhood relation (all of which are induced by the given problem instance), as well as the termination predicate, form part of the run-time environment of a GLSM representation of that algorithm.

When modelling an existing or novel hybrid SLS algorithm by a GLSM, the component search strategies associated with the GLSM states are often derived from existing simple SLS algorithms for the given problem. Theoretically, these component SLS strategies can be complex or hybrid strategies; but according to the primary motivation behind the GLSM model, they should be as pure and simple as possible, in order to achieve a clean separation between simple search strategies and search control.

Typically, the component search strategies have the same search space and solution set, which is part of the GLSM's run-time environment. The respective initialisation functions are either modelled by the initialisation state of the GLSM, or not needed at all, since the respective component search strategy is applied to the result of another component search strategy, in which case the respective initial probability distribution over candidate solutions is implicitly given by the context in which the corresponding GLSM state is activated. The termination predicates of these subsidiary SLS methods, particularly when they are based on aspects of the current candidate solution or memory state, are often reflected in conditional transitions leaving the respective GLSM state.

The simple SLS algorithms modelled by the states of a given GLSM M can be based on different neighbourhood relations, which become part of the run-time environment of M. It is always possible to define a unified relation that contains the neighbourhood relation for each state type as a subset. However, in the case of the subsidiary SLS methods associated with the states of a given GLSM, this can be problematic, since formally, the presence of the initial state often implies that under the unified neighbourhood relation any two candidate solutions are direct neighbours. This is true, for example, for Uninformed Random Picking, which uses a complete neighbourhood (see Chapter 1, Section 1.5). But if the search initialisation is considered separately, as in the formal definition of an SLS algorithm (Definition 1.10, page 38f.), unifying the neighbourhood relations of the remaining component search strategies can be useful.

Various component search strategies of a given GLSM may also use different memory spaces. These can always be combined into a single, unified memory space, as required by the formal definition of a GLSM. It may be noted that, technically, memory states are not needed in the GLSM model, since the memory states used in any SLS algorithm can always be folded into GLSM states. But this often leads to unnecessarily complex and cumbersome GLSM representations, in which aspects of high-level and low-level search control are not cleanly separated. Although there are cases where the distinction between high-level and low-level search control is somewhat debatable, for most hybrid SLS algorithms the decision which aspects of the search control mechanism should be modelled by GLSM states and state transitions, as opposed to memory states, is fairly obvious.
3.2 State, Transition and Machine Types

In order to completely specify a GLSM, definitions for the search methods associated with each state type need to be given. Formally, this can be done in the form of search transition functions $\gamma_\tau : S \times M \to \mathcal{D}(S)$ for each state type $\tau$ (see also the in-depth section on page 119ff.). But as in the case of SLS algorithms in general, it is often clearer and more convenient to define the semantics of GLSM state types in a procedural way, usually in the form of pseudo-code. However, in some cases, more adequate descriptions of complex state types can be obtained by using other formalisms. This is particularly the case for simple SLS strategies whose search steps are based on a multi-stage selection process; these are often amenable to concise decision tree representations. (Examples of such SLS methods include the WalkSAT algorithm family for SAT, cf. Chapter 6, Section 6.3.)

While concrete examples for various state types will be given in Section 3.3 and in subsequent chapters, it is worth discussing some fundamental distinctions between certain state types. One of these concerns the role of the respective states within the general definition of stochastic local search (Definition 1.10, page 38f.). Although we are modelling search initialisation and local search steps using the same mechanism, namely GLSM states, there is a clear distinction between the states that realise these two components of an SLS algorithm. An initialising state is usually different from a search step state in that it is left after one corresponding step has been performed. Also, while search step states correspond to moves in a restricted local neighbourhood (like flipping one variable in SAT), a single initialisation step can typically lead to arbitrary candidate solutions. (As an example, consider Uninformed Random Picking.) Formally, we define an initialising state type as a state type $\tau$ for which the local search position after one $\tau$-step is independent of the local search position before the step; states of an initialising type $\tau$ are called initialising states. Generally, each GLSM will have at least one initialising state, which is also its initial state. A GLSM can, however, have more than one initialising state and use these states to implement certain forms of restart strategies.

Furthermore, state types may be distinguished based on whether or not their semantics depend on the given memory state. The former are called parametric state types, and states of such types are called parametric states. An example of a parametric state can be found in the GLSM representation of Simulated Annealing, where the behaviour of the underlying SLS method depends on the temperature parameter (cf. Section 3.3).

Finally, there are many combinatorial problems with a particular, natural neighbourhood relation N. In these cases, it is often useful to distinguish between single-step states and multi-step states with respect to that neighbourhood: given a neighbourhood N and a current search position s, one search step in a single-step state always leads to a direct neighbour of s under N, while one search step in a multi-step state may lead to a candidate solution whose distance to s in the neighbourhood graph induced by N is greater than one. For example, most SLS algorithms for SAT use a 1-exchange neighbourhood relation, under which two variable assignments are direct neighbours if, and only if, they differ in exactly one variable's value. In this context, a single-step state would flip one variable's value in each step, whereas a multi-step state could flip several variables per local search step. Consequently, initialising states are an extreme case of multi-step states, since they can affect the values of all variables at the same time.
Transition Types

The search control mechanism of a GLSM is realised by its state transitions. While the possible transitions between states are specified in the form of a transition relation, the precise conditions under which a transition (z, z') is executed are captured in its type, τ∆((z, z')); consequently, the definition of transition types forms an important part of GLSM semantics. In the following, we introduce the transition types that provide the basis for the GLSM models of most practically relevant SLS methods. These can be conveniently presented as a hierarchy of increasingly complex and expressive transition types, ranging from simple deterministic transitions to conditional probabilistic transitions.

DET—Unconditional deterministic transitions. This is the most basic transition type. When a GLSM is in a state z with an outgoing transition (z, z') of type DET, it will invariably switch to state z' after a single GLSM step. This means that unless z = z', only one step of the search strategy corresponding to z is performed before switching to a different state. Furthermore, for each GLSM state there can be at most one outgoing transition of this type, which severely restricts the class of GLSM structures (and hence search control mechanisms) that can be realised when only DET transitions are used. Although the use of unconditional deterministic transitions is somewhat limited, they frequently occur in the GLSM models of practically relevant SLS algorithms, where they are mostly utilised for leaving initialising states.

PROB(p)—Unconditional probabilistic transitions. A transition of type PROB(p) from a GLSM state z to another state z' takes a GLSM that is in state z directly into state z' with probability p. Clearly, DET transitions are equivalent to a special case of this transition type, namely to PROB(1), and hence do not have to be considered separately in the following. If the set of transitions leaving a state z is given as {t1, ..., tn}, where the type of transition tj is PROB(pj) for every j, the semantics of this transition type require $\sum_{j=1}^{n} p_j = 1$ in order to ensure that the selection of the transition from z is based on a proper probability distribution. Note that without loss of generality, by using PROB(0) transitions we can restrict our attention to fully connected GLSMs, where for each pair of states (zi, zk), a transition of type PROB(pik) is defined. The uniform representation thus obtained facilitates theoretical investigations as well as practical implementations. It is also interesting to note that for a given GLSM M, any state z that can be reached from the initial state of M by following transitions of type PROB(p) with p > 0 will eventually be reached with arbitrarily high probability in any sufficiently long run of M. Furthermore, in any state z with a PROB(p) self-transition (z, z), the number of consecutive GLSM steps spent in z, and hence the number of consecutive search steps performed according to the search strategy associated with state z, is geometrically distributed with mean 1/(1 − p).
number of consecutive search steps performed according to the search strategy associated with state z , is distributed geometrically with mean and variance 1/p. CPROB(C, p) and CDET(C)—Conditional transitions. While until now we have focused on transitions whose execution only depends on the actual state, the following generalisation from PROB(p) introduces context-dependent transitions. A CPROB(C, p) transition from state z to state z is executed with a probability proportional to p only when a condition predicate C is satisfied. If C is not satisfied, all transitions CPROB(C, p) from the current state are blocked, that is, they cannot be executed. Note that we do not (and generally cannot) require that the p values of the outgoing CPROB(C ) transitions of a given GLSM state sum to one; consequently, these values do not directly correspond to transition probabilities, but only determine the ratios between probabilities within a set of unblocked transitions from the same state. In particular, if in a given situation there is only one unblocked CPROB(C, p) transition with p > 0 from the current GLSM state, that transition will be taken with probability one, regardless of the value of p. Obviously, PROB(p) transitions are equivalent to conditional probabilistic transitions CPROB(, p), where is the predicate that is always true. Without loss of generality, we can therefore restrict our attention to GLSMs in which all transitions are of type CPROB(C, p). An important special case of conditional transitions is conditional deterministic transitions, in particular, transitions of the type CDET(C ) = CPROB(C, 1). Conditional deterministic conditions also arise when for a given GLSM state z all but one of its outgoing transitions are blocked at any given time. Note that a deterministic GLSM M is obtained if all condition predicates for the transitions leaving each state of M are mutually exclusive. Generally, depending on the nature of the condition predicates used, the decision whether a conditional transition is deterministic or not can be rather difficult. For the same reasons it can be difficult to decide for a given GLSM with conditional probabilistic transitions whether a particular state is reachable from the initial state. For practical uses of GLSMs with conditional transitions, it is important to ensure that all condition predicates can be evaluated in a sufficiently efficient way (compared to the cost of executing local search steps); ideally, this evaluation should require at most linear (better: constant) time w.r.t. the size of the given problem instance. We distinguish between two kinds of condition predicates. The first of these captures properties of the current candidate solution and its local neighbourhood; the second kind is based on search control aspects, such as the time that has been spent in the current GLSM state, the overall run-time or the current memory state. Naturally, these two kinds of conditions can also be combined.
⊤: always true
count(k): total number of GLSM steps ≥ k
countm(k): total number of GLSM steps modulo k = 0
scount(k): number of GLSM steps in current state ≥ k
scountm(k): number of GLSM steps in current state modulo k = 0
lmin: current candidate solution is a local minimum w.r.t. the given neighbourhood relation
evalf(y): current evaluation function value ≤ y
noimpr(k): incumbent candidate solution has not been improved within the last k steps

Table 3.1 Commonly used simple condition predicates.
Some concrete examples for condition predicates are listed in Table 3.1. Note that all these predicates are based on local information only and can thus be efficiently evaluated during the search process. Usually, for each condition predicate, a positive as well as a negative (negated) form will be defined. By using propositional connectives, such as ‘∧’ or ‘∨’, these simple predicates can be combined into compound predicates. However, it is not difficult to see that allowing compound transition predicates does not increase the expressive power of the GLSM model, since every GLSM using compound condition predicates can be reduced to an equivalent GLSM using only simple predicates by introducing additional states and/or transitions.
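For illustration, three of the predicates from Table 3.1 might be implemented as follows, assuming a hypothetical record hist of the search history; the attribute names are our own, not part of the original formalism:

def count(k):
    # total number of GLSM steps >= k
    return lambda hist: hist.total_steps >= k

def lmin(N, g):
    # current candidate solution is a local minimum w.r.t. the given
    # neighbourhood relation N and evaluation function g
    return lambda hist: all(g(s2) >= g(hist.position)
                            for s2 in N(hist.position))

def noimpr(k):
    # incumbent candidate solution not improved within the last k steps
    return lambda hist: hist.steps_since_improvement >= k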
Transition Actions

None of the transition types introduced above have any effect on the memory state of the given GLSM. While in principle, the condition predicates used in conditional probabilistic (or deterministic) transitions may depend on the memory state, in the absence of a means for effecting changes in memory, GLSMs cannot make any use of memory states. Hence, in order to model SLS algorithms that use additional memory, such as Simulated Annealing, Iterated Local Search or Tabu Search, we introduce the concept of transition actions.

Transition actions are associated with individual transitions and are executed whenever the GLSM executes the corresponding transition. Generally, transition actions can be added to each of the transition types defined above, and the semantics of the transition in terms of its effect on the immediate successor state of the GLSM are not affected. If T is a transition type, we let T : A denote the same transition type with associated action A. Formally, a transition action can be seen as a function mapping search positions and memory states to memory states. This accurately reflects the fact that the effect of any transition action is limited to the memory state of the given GLSM; in particular, transition actions cannot be used to modify the current GLSM state or candidate solution. However, transition actions have access to the current search position, and hence they can be used, for example, to memorise the current candidate solution.

In practice, the memory used in a given GLSM is often factored into a number of separate attributes or data structures, each of which can be manipulated independently of the others. Hence, when specifying transition actions procedurally, it is often advantageous to represent a single transition action by multiple assignments or procedure calls. It may also be noted that by assigning special roles to parts of a structured memory space, transition actions can be used for realising input/output functionality in actual GLSM implementations or for communicating information between individual machines in co-operative GLSM models (these are further discussed in Section 3.4). Also, by using a special action NOP ('no operation') that has no effect on the memory state (formally modelled as an identity function on the given memory space), we can obtain uniform GLSMs in which all transitions have associated actions.
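As a sketch (again our own illustration), a transition action is then simply a function from the current search position and memory state to a new memory state; here, the memory is factored into named attributes, as suggested above:

def memorise_position(s, m):
    # transition action t := pos, as used e.g. by the ILS model in
    # Section 3.3: store the current candidate solution under attribute 't'
    m2 = dict(m)
    m2["t"] = s
    return m2

def NOP(s, m):
    # the special action NOP: the identity function on memory states
    return m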
Machine Types

The types of the states and transitions form an important part of the complete specification of a given GLSM and determine crucial aspects of the behaviour of the underlying search algorithm. However, it can sometimes be useful to abstract from these types and to focus on the structure of the search control mechanism, as reflected in the states and the transition relation of a GLSM. For example, one may be interested in the difference between a GLSM with five states that are connected sequentially (such that they are visited one after the other in a fixed sequence), and a GLSM with three states that, in principle, allows arbitrary transitions between its states. This motivates the following categorisation of GLSMs into structural classes or machine types.

1-state machines: This is the minimal form of a GLSM. Since every GLSM needs an initialising state in order to generate the initial candidate solution for the local search process, 1-state machines essentially realise iterated sampling processes, such as Uninformed Random Picking. Such extremely simple search algorithms can be useful for analytical purposes, for example, as a reference model when evaluating other types of GLSMs. Nevertheless, their overall practical relevance is rather limited.
1-state+init machines: These machines have one state for search initialisation and one working state. Machines of this type can be further classified as sequential 1-state+init machines, which visit the initialisation state z0 only once, and alternating 1-state+init machines, which may visit z0 multiple times in the course of the search process. The structure of these machine models is shown in Figure 3.2. Many simple SLS methods naturally correspond to sequential 1-state+init GLSMs, while alternating 1-state+init machines are good models for simple SLS algorithms that use a restart strategy.

Figure 3.2 Sequential (left) and alternating (right) 1-state+init GLSM.

2-state+init sequential machines: This machine type has three states, one of which is an initialisation state that is only visited once, while the other two are working states. However, once the machine has switched from the first state to the second, it will never switch back (see Figure 3.3, left side); analogously, the GLSM switches from the second to the third state only once. Thus, each search trajectory of such a machine can be partitioned into three phases: one initialisation step, a number of steps in the first working state and a number of steps in the second working state.

2-state+init alternating machines: Like a 2-state+init sequential machine, this machine type has one initialisation state and two working states. Here, however, arbitrary transitions between all states are possible (see Figure 3.3, right side). An interesting special case arises when the initial state can only be visited once, while the machine might arbitrarily switch between the two working states. Another case that might be distinguished is a uni-directional cyclic machine model, which allows the three states to be visited only in one fixed cyclic order.

Figure 3.3 Sequential (left) and alternating (right) 2-state+init GLSM.

Obviously, the categorisation can easily be continued in this manner by successively increasing the number of working states. However, as we will see later, to describe state-of-the-art stochastic local search algorithms, machines with up to three states are often sufficient. We conclude our categorisation with a brief look at two potentially interesting cases of the k-state+init machine types:

k-state+init sequential machines: As a straightforward generalisation of sequential 2-state+init machines, in this machine type we have k + 1 states that are visited in a linear order. Consequently, after a machine state has been left, it will never be visited again (see Figure 3.4, top). After initialisation, each search trajectory of this type of GLSM can be partitioned into up to k contiguous segments, each of which consists of a sequence of search steps performed in the same GLSM state; a constructive sketch of this machine type is given at the end of this section.

k-state+init alternating machines: These machines allow arbitrary transitions between the k + 1 states and may therefore re-initialise the search process and switch between strategies as often as desired (see Figure 3.4, bottom). Two interesting special cases are the uni- and bi-directional cyclic machine models, which can switch between states in a cyclic manner. In the former case, the cyclic structure can be traversed only in one direction, while in the latter case the machine can switch from any state to both of its neighbouring states (see Figure 3.5).

This categorisation of GLSMs according to their structure provides a very high-level view of the respective search control mechanism, which can be refined in many ways. Nevertheless, as we will see in the following, the abstraction of machine types presented here can be very useful for capturing fundamental differences between various stochastic local search methods.
Figure 3.4 Sequential (top) and alternating (bottom) k-state+init GLSM.
Figure 3.5 Uni-directional (left) and bi-directional (right) cyclic k-state+init GLSM.
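To make the structural classes concrete, the following hypothetical helper (our own sketch, using the GLSM class from Section 3.1 and the CPROB triple encoding sketched above) builds a k-state+init sequential machine, cf. Figure 3.4, top, from a list of state types and leave-conditions: each state has a self-transition while its leave-condition is unsatisfied and a CDET transition, i.e. CPROB(C, 1), to its successor.

def sequential_machine(stages):
    # stages: list of (state type, leave-condition C) pairs; the first
    # entry is the initialising state, whose condition holds after one step
    states = [f"z{i}" for i in range(len(stages))]
    delta, tau_Z = {}, {}
    for i, (stype, C) in enumerate(stages):
        tau_Z[states[i]] = stype
        if i + 1 < len(states):
            # CDET(C) to the successor state ...
            delta[(states[i], states[i + 1])] = ("CPROB", C, 1.0)
            # ... and a self-transition while C is not yet satisfied
            delta[(states[i], states[i])] = \
                ("CPROB", lambda s, m, C=C: not C(s, m), 1.0)
        else:
            # the last state is never left
            delta[(states[i], states[i])] = ("CPROB", lambda s, m: True, 1.0)
    return GLSM(frozenset(states), states[0], None, delta, tau_Z)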
3.3 Modelling SLS Methods Using GLSMs

Up to this point, we have introduced the GLSM model and discussed various types of GLSM states, transitions and structures. We now demonstrate applications of the model by specifying and discussing GLSM representations for many of the well-known SLS methods described in Chapter 2; this way, important similarities and differences between SLS methods are highlighted. GLSM representations of other SLS methods and algorithms are covered in later chapters and exercises.

Uninformed Random Picking and Random Walk

The simplest possible SLS algorithm is Uninformed Random Picking, as introduced in Chapter 1, Section 1.5. When cast into our definition of a stochastic local search algorithm, the init and step functions of Uninformed Random Picking are identical and perform a random uniform selection of a candidate solution from the underlying search space. The corresponding GLSM is shown in Figure 3.6. It has only one state, of type RP, whose search transition function is formally defined by $\gamma_{RP}(s, m)(s') := 1/\#S$ for all $s, s' \in S$ and all memory states $m$.

Figure 3.6 GLSM for Uninformed Random Picking.

Since functional state type definitions of more advanced SLS strategies can get rather complex and difficult to understand, it is often preferable to define the step functions for GLSM state types procedurally. Such a definition for the random picking state type is shown in Figure 3.7. Note that, as in previous chapters, in these procedural descriptions we generally only mention those parts of the memory state (if any) that are actually used in the corresponding search mechanism.

The Uninformed Random Walk algorithm (cf. Chapter 1, Section 1.5) requires an additional state type, RW, whose semantics are defined in Figure 3.8. In its simplest form, the search is initialised by random picking, followed by a series of uninformed random walk steps. The corresponding GLSM is shown in Figure 3.9.
procedure step-RP(π, s)
    input: problem instance π ∈ Π, candidate solution s ∈ S(π)
    output: candidate solution s′ ∈ S(π)
    s′ := random(S);
    return s′
end step-RP

Figure 3.7 Procedural specification of GLSM state RP; the function random(S) returns an element of S selected randomly according to a uniform distribution over S.
procedure step-RW(π, s)
    input: problem instance π ∈ Π, candidate solution s ∈ S(π)
    output: candidate solution s′ ∈ S(π)
    s′ := random(N(s));
    return s′
end step-RW

Figure 3.8 Procedural specification of GLSM state RW.
Figure 3.9 GLSM for Uninformed Random Walk.
In practice, many SLS algorithms are extended by a restart mechanism, by which, in the simplest case, after every k search steps (where k is a parameter of the algorithm), the search process is reinitialised. Generally, other conditions can be used for determining when a restart should occur. Figure 3.10 shows the GLSM for Uninformed Random Walk with Random Restart; it is obtained from the GLSM for the basic Uninformed Random Walk algorithm without restart by a simple modification of the state transitions. Note how the GLSM representations for both algorithms indicate the fact that Uninformed Random Walk can already be seen as a (albeit very simple) hybrid SLS algorithm, using two types of search steps, Uninformed Random Picking and Uninformed Random Walk. While using a restart mechanism for Uninformed Random Walk does not appear to be useful other than for illustrative purposes, the analogous extension of Iterative Improvement, covered in the following, leads to significant performance improvements.
Figure 3.10 GLSM for Uninformed Random Walk with Random Restart; R is the restart predicate, for example, countm(k).
procedure step-BI(π, s)
    input: problem instance π ∈ Π, candidate solution s ∈ S(π)
    output: candidate solution s′ ∈ S(π)
    g∗ := min{g(s′) | s′ ∈ N(s)};
    s′ := random({s′ ∈ N(s) | g(s′) = g∗});
    return s′
end step-BI

Figure 3.11 Procedural specification of GLSM state BI.
Figure 3.12 GLSM for Iterative Best Improvement with Random Restart; R is the restart predicate, for example, lmin.
Iterative Improvement

The GLSM model for Iterative Improvement (cf. Chapter 1, Section 1.5) is similar to that for Uninformed Random Walk. Again, we use an RP state to model the search initialisation by random picking, but the second state now captures the semantics of iterative improvement search steps. A procedural specification of a GLSM state BI that models best improvement search steps is given in Figure 3.11; Figure 3.12 shows the GLSM for Iterative Best Improvement Search with Random Restart. Note that the random restart mechanism will enable the algorithm to escape from local minima of the evaluation function and can hence be expected to improve its performance. Notice that the only difference between this GLSM and the one for Uninformed Random Walk with Random Restart shown in Figure 3.10 lies in the type of one state (BI vs RW). This reflects the common structure of the search control mechanism underlying both of these simple SLS algorithms. Similarly, the GLSM models for other variants of iterative improvement search, such as First Improvement or Random Improvement, are obtained by replacing the BI state by a state of an appropriately defined type that reflects the semantics of these different kinds of iterative improvement steps.

Using the RP, RW and BI state types, it is easy to construct a GLSM model for Randomised Iterative Improvement, one of the simplest SLS algorithms (cf. Chapter 2, Section 2.2). The 2-state+init GLSM shown in Figure 3.13 represents the hybrid search mechanism in an explicit and intuitive way. Since the addition of the Uninformed Random Walk state enables the algorithm in principle to escape from local minima, the restart mechanism included in this GLSM is practically not as important as in the previous case of pure Iterative Best Improvement. It may be noted that the same SLS algorithm could be modelled by a 1-state+init GLSM, using a single state for Randomised Iterative Improvement steps. This representation, however, would be substantially inferior to the 2-state+init GLSM introduced above, since the structure of the respective GLSM model does not adequately capture the search control strategy.

Figure 3.13 GLSM for Randomised Iterative Best Improvement with Random Restart; R is the restart predicate, for example, countm(m).
Simulated Annealing

To model Simulated Annealing by a GLSM, we use a parameterised state type SA(T) to represent the probabilistic iterative improvement strategy that forms the core of Simulated Annealing. This state type can be specified procedurally by the step function introduced in Chapter 2 (cf. Figure 2.7 on page 76). The initialisation and modifications of the temperature parameter T prescribed by the annealing schedule are realised by transition actions. This leads to the GLSM model shown in Figure 3.14. Note how this representation separates the basic search process (which corresponds to the changes in search position) from modifications of the temperature T, which is a search control parameter.

Figure 3.14 GLSM for Simulated Annealing; the initial temperature T0 and temperature update function update implement the annealing schedule.

Many variants of Simulated Annealing, including Constant Temperature Simulated Annealing and more complex, hybrid algorithms that combine Simulated Annealing steps with other types of search steps, can be easily represented by similar GLSMs. Furthermore, transition actions can be used in a similar way for modelling Tabu Search algorithms (cf. Chapter 2, Section 2.2); in this case, a state type representing basic tabu search steps is used along with transition actions that update the tabu status of solution components.
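A minimal sketch of the parametric state type SA(T) and the associated transition action follows (our illustration; the neighbourhood function N, the evaluation function g and the cooling factor alpha are assumed parameters, not from the text): the step function reads the current temperature from the memory state, and the transition action updates it according to a geometric annealing schedule.

import math, random

def make_step_SA(N, g):
    # returns the step function of the parametric state type SA(T) for a
    # minimisation problem with neighbourhood function N and evaluation
    # function g; the current temperature is read from the memory state
    def step(s, m):
        s2 = random.choice(N(s))  # propose a uniformly chosen neighbour
        if g(s2) <= g(s) or \
                random.random() < math.exp((g(s) - g(s2)) / m["T"]):
            return s2             # accept the proposed step
        return s                  # reject it and keep the current position
    return step

def update_T(s, m, alpha=0.95):
    # transition action T := update(T) on the self-transition of the SA
    # state; here, a simple geometric cooling schedule
    m2 = dict(m)
    m2["T"] = alpha * m2["T"]
    return m2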
Iterated Local Search

As a hybrid SLS method, Iterated Local Search performs two basic types of search steps in its subsidiary local search and perturbation phases. Obviously, these, as well as the search initialisation, are modelled by separate GLSM states LS, PS and RP. A slight complication is introduced by the acceptance criterion, which is used in ILS to determine whether or not the search continues from the new candidate solution obtained from the last perturbation and subsequent local search phase. There are various ways of modelling this acceptance mechanism in a GLSM.
procedure step-AC(π, s, t)
    input: problem instance π ∈ Π, candidate solutions s, t ∈ S(π)
    output: candidate solution ∈ S(π)
    if C(π, s, t) then
        return s
    else
        return t
    end
end step-AC

Figure 3.15 Procedural specification of GLSM state AC; this state type uses a candidate solution t, stored earlier in the search process, and a selection predicate C(π, s, t), which returns ⊤ if s is to be selected as the new search position, and ⊥ otherwise. A selection predicate that is often used in the context of ILS is better(π, s, t) := (g(s) < g(t)), where g is the evaluation function for the given problem instance π.
Figure 3.16 GLSM representation of Iterated Local Search; CP and CL are condition predicates that determine the end of the perturbation phase and the local search phase, respectively, and pos denotes the current search position. For many ILS algorithms, CL := lmin. [Diagram: states RP, LS, PS and AC(t), connected by DET and CDET transitions over the predicates CP and CL; the transitions into AC(t) carry the transition action t := pos.]
Figure 3.16 shows a GLSM representation of ILS in which the application of the acceptance criterion is modelled as a separate state, AC (for a procedural definition of AC, see Figure 3.15). Note the use of transition actions for memorising the current candidate solution, which is needed when applying the acceptance criterion in state AC. Furthermore, it may be noted that our GLSM model allows for several perturbation steps to be performed in a row; perturbation mechanisms of this type can be found in various ILS algorithms (for an example, see Chapter 7, page 331).
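The interplay of the LS, PS and AC(t) states and the t := pos transition actions can be summarised in the following sketch; problem.local_search and problem.perturb abstract the LS and PS phases run until their respective condition predicates hold, and all interface names are assumptions for this example.

def glsm_iterated_local_search(problem, accept, max_cycles):
    pos = problem.random_candidate()       # state RP
    pos = problem.local_search(pos)        # state LS, iterated until CL holds
    t = pos                                # transition action: t := pos
    for _ in range(max_cycles):
        pos = problem.perturb(pos)         # state PS, iterated until CP holds
        pos = problem.local_search(pos)    # state LS, iterated until CL holds
        # State AC(t), cf. step-AC in Figure 3.15: keep pos if C(pi, pos, t) holds.
        pos = pos if accept(problem, pos, t) else t
        t = pos                            # memorise for the next application of AC
    return t

# The selection predicate 'better' from Figure 3.15.
better = lambda problem, s, t: problem.evaluation(s) < problem.evaluation(t)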
Figure 3.17 GLSM representation of an ACO algorithm, such as Ant System; the condition predicates CC and CL determine the end of the construction and local search phases, respectively. For many ACO algorithms, CL := lmin. Note that some ACO algorithms, such as Ant Colony System, remove pheromone while constructing solutions. This type of pheromone update can be represented by an additional transition action associated with transition (CS, CS). [Diagram: states CI, CS and LS; the transition actions initTrails and updateTrails initialise and update the pheromone trails.]
Ant Colony Optimisation

There are different approaches to representing population-based SLS algorithms as GLSMs, corresponding to different views of the underlying stochastic local search process itself. As briefly discussed in Chapter 2, Section 2.3, one can view populations of individual candidate solutions for the given problem instance π as search positions; under this view, the search space of a population-based SLS algorithm consists of sets of candidate solutions of π. Ant Colony Optimisation, for instance, can then be represented by the GLSM shown in Figure 3.17. State CI initialises the construction search, and state CS performs a single construction step for all ants (cf. Example 2.10, page 98f.). The LS state performs a single step of the subsidiary local search procedure for the entire population of ants. For a typical iterative improvement local search, this means that iterative improvement steps are performed for all ants that have not reached a locally minimal candidate solution of the given problem instance; in this case, usually a condition predicate CL is used that is satisfied when all ants have obtained a locally minimal candidate solution. Initialisation and update of the pheromone trails are modelled using transition actions.
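Under the population-as-search-position view just described, one cycle of the GLSM in Figure 3.17 can be sketched as follows; the Problem interface for construction steps, local search steps and trail updates is assumed for illustration.

def glsm_aco(problem, num_ants, num_cycles):
    trails = problem.init_trails()                    # transition action initTrails
    best = None
    for _ in range(num_cycles):
        ants = [problem.empty_solution() for _ in range(num_ants)]  # state CI
        while not all(problem.is_complete(a) for a in ants):        # predicate CC
            # State CS: one construction step for every ant.
            ants = [a if problem.is_complete(a)
                    else problem.construction_step(a, trails) for a in ants]
        while not all(problem.is_local_min(a) for a in ants):       # predicate CL
            # State LS: one local search step for every ant.
            ants = [a if problem.is_local_min(a)
                    else problem.improvement_step(a) for a in ants]
        trails = problem.update_trails(trails, ants)  # transition action updateTrails
        cycle_best = min(ants, key=problem.evaluation)
        if best is None or problem.evaluation(cycle_best) < problem.evaluation(best):
            best = cycle_best
    return best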
An alternative GLSM model for ACO is based on a view under which the search space consists of probability distributions of candidate solutions for the given problem instance. Note that the probabilistic construction process carried out by each ant induces a probability distribution in which, ideally, higher-quality candidate solutions have higher probability of being constructed. The subsequent local search phase then biases this probability distribution further towards better candidate solutions. Finally, updating the pheromone values modifies the probability distribution underlying the next construction phase. This representation has two disadvantages. Firstly, it does not reflect the fact that ACO effectively samples the probability distribution central to this model in each cycle in order to obtain a set of candidate solutions for the given problem instance; secondly, it does not capture the compact implicit representation of the probability distributions given by the pheromone values. However, this view on ACO is interesting for theoretical reasons, for example, in the context of analysing important theoretical properties of ACO, such as the probability of obtaining a specific solution quality within a given time bound. Also note that the GLSM model corresponding to this view does not require transition actions for manipulating the pheromone trails because these are now essential components of the search position. Another general approach for modelling population-based SLS algorithms as GLSMs is to represent each member of the population by a separate GLSM. The resulting co-operative GLSM models will be discussed in more detail in the next section.
3.4 Extensions of the Basic GLSM Model

In this section, we discuss various extensions of the basic GLSM model. One of the strengths of the GLSM model lies in the fact that these extensions arise quite naturally and can be easily realised within the basic framework. Some extended GLSM models, in particular co-operative GLSMs, have immediate applications in the form of existing SLS algorithms, while others have not yet been studied in detail.
Co-operative GLSM Models

A natural extension of the basic GLSM model, particularly well suited for representing population-based SLS algorithms, is to apply several GLSMs simultaneously to the same problem instance. We call such extensions co-operative GLSM models, since they capture the idea of solving a given problem instance through the co-operative effort of an ensemble of agents.
In the simplest case, such an ensemble consists of a number of identical GLSMs without any communication between the individual machines. The semantics of this homogeneous co-operative GLSM model are conceptually equivalent to executing multiple independent runs of an individual GLSM. This model is particularly attractive for parallelisation: it is very easy to implement, it involves virtually no communication overhead (other than making the given problem instance available to all agents and possibly terminating the runs of all individual GLSMs when a solution has been found), and in principle it scales almost arbitrarily.

The restrictions of this model can be relaxed in two directions. One is to allow ensembles of different GLSMs. This heterogeneous co-operative GLSM model is particularly useful for modelling algorithm portfolios, that is, robust combinations of various SLS algorithms, each of which is likely to show superior performance on certain types of instances; such portfolios are attractive when the features of the given problem instances are not known a priori [Huberman et al., 1997; Gomes and Selman, 1997b]. Generally, the heterogeneous co-operative model has very similar advantages to its homogeneous variant; it is easy to implement and almost free of communication overhead.

Another generalisation is to allow communication between the individual GLSMs of a co-operative model. This is required for explicitly modelling population-based SLS algorithms in which the individual search trajectories are not independent. As an example, consider variants of Ant Colony Optimisation that allow only the ants that obtained the best solution quality in a given iteration to update the pheromone trails (iteration-best pheromone update) [Stützle and Hoos, 2000]; in this case, communication between the ants is required in order to determine the best candidate solution. In principle, co-operative GLSM models can be extended with various communication schemes, including blackboard mechanisms, synchronous broadcasting and one-to-one message passing in a fixed network topology. There are various ways of formally realising these techniques within the GLSM framework. One approach is to allow transition conditions and transition actions to access a shared memory state, that is, information that is shared between the individual GLSMs. Another option is to use special transition actions for communication (e.g., send and receive).

Many population-based SLS algorithms can be naturally represented as homogeneous co-operative GLSMs. Most ACO algorithms, for example Ant System [Dorigo et al., 1991; 1996], can be easily modelled in the following way. The basic GLSMs corresponding to the individual ants have the same structure as the GLSM model in Figure 3.17 (page 137); only now, the GLSM states represent the construction and local search steps performed by an individual ant, and the transition action updateTrails performs a synchronised pheromone trail update for all individual ants.
Note that in this case, the pheromone values are shared information between the individual ants’ GLSMs.

Co-operative GLSMs with communication are more difficult to design and to implement than those without communication, since issues such as preventing and detecting deadlocks and starvation situations generally have to be considered. Furthermore, the communication between individual GLSMs usually involves a certain amount of overhead. This overhead has to be amortised by the performance gains that may be realised in terms of speedup when applied to a specific class of problem instances and/or in terms of increased robustness over different problem classes.

Generally, one way of using communication to improve the performance of co-operative GLSMs is to propagate candidate solutions with low evaluation function values (or other attractive properties) within the ensemble, such that individual GLSMs that detect a stagnation of their search can pick up these ‘hints’ and continue their local search from there. (This approach is very similar to the co-operative model described by Clearwater et al. [1991; 1992], which uses a ‘blackboard’ for communicating hints between agents executing a rather simple search strategy.) This type of co-operative search method can be easily modelled by a homogeneous co-operative GLSM with communication. In such a model, the search effort will be more focused on exploring promising parts of the search space than in a co-operative model without communication.

Another general scheme uses two types of GLSMs, analysts and solvers. Analysts do not attempt to find solutions, but rather analyse features of the search space; the solvers try to use this information to improve their search strategy. This architecture is an instance of the heterogeneous co-operative GLSM model with communication. It can be extended in a straightforward way to allow for different types of analysts and solvers, or several independent sub-ensembles of analysts and solvers.
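For the homogeneous co-operative model without communication, the ‘multiple independent runs’ semantics translates directly into a simple parallelisation pattern; the following sketch assumes a run function that returns a solution or None, and notes its own limitation regarding termination of the remaining runs.

import concurrent.futures

def cooperative_ensemble(problem, run_glsm, num_machines):
    # Homogeneous co-operative GLSM without communication: identical,
    # independent runs on the same instance; stop when one finds a solution.
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_machines) as pool:
        futures = [pool.submit(run_glsm, problem) for _ in range(num_machines)]
        for fut in concurrent.futures.as_completed(futures):
            result = fut.result()
            if result is not None:
                # Note: cancel() only affects runs that have not yet started;
                # a full implementation would use a shared termination flag.
                for other in futures:
                    other.cancel()
                return result
    return None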
Learning via Dynamic Transition Probabilities

One of the features of the basic GLSM model with probabilistic transitions is the fact that the transition probabilities are static, that is, they are fixed when designing the GLSM. An obvious generalisation, along the lines of learning automata theory [Narendra and Thathachar, 1989], is to let the transition probabilities evolve over time as the GLSM is running. The search control in this model corresponds to a variable-structure learning automaton. The environment in which such a dynamic GLSM is operating is given by the evaluation function induced by an individual problem instance or by a class of evaluation functions induced by a class of instances. In the first case (single-instance learning), the idea is to optimise the control strategy on one instance during the local search process.
The second case (multi-instance learning) is based on the assumption that for a given problem domain (or sub-domain), all instances share certain features to which the search control strategy can be adapted.

The modification of the transition probabilities can either be realised by an external mechanism (external adaption control) or within the GLSM framework by means of specialised transition actions (internal adaption control). In both cases, suitable criteria for transition probability updates have to be defined. Two classes of such criteria are those based on search trajectory information and those based on GLSM statistics. The latter category includes state occupancies and transition frequencies, while the former primarily comprises basic descriptive statistics of the evaluation or objective function value along the search trajectory, possibly in conjunction with discounting of past observations.

The approach as outlined here captures only a specific form of parameter learning for a given parameterised class of GLSMs. Conceptually, this can be further extended to allow for dynamic changes of transition types (which is equivalent to parameter learning for a more general transition model, such as conditional probabilistic transitions). In principle, concepts and methods from learning automata theory can be used for analysing and characterising dynamic GLSMs; basic properties, such as expedience or optimality, can easily be defined [Narendra and Thathachar, 1989]. We conjecture, however, that theoretically proving such properties will be extremely difficult, as the theoretical analysis of standard SLS behaviour is already complex and rather limited in its results. Nevertheless, we believe that empirical methodology can provide a sufficient basis for developing and analysing interesting and useful dynamic GLSM models.
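As a minimal sketch of internal adaption control, the following function adapts the outgoing transition probabilities of a GLSM state with a linear reward-inaction scheme from learning automata theory; the choice of reward signal (an improvement of the incumbent along the search trajectory) is an assumption about the update criterion.

def reward_inaction_update(probs, chosen, improved, alpha=0.1):
    # Linear reward-inaction scheme: on 'reward' (improvement observed),
    # shift probability mass towards the transition that was taken;
    # on 'penalty', leave the probabilities unchanged (inaction).
    if not improved:
        return probs
    return [p + alpha * (1.0 - p) if i == chosen else p * (1.0 - alpha)
            for i, p in enumerate(probs)]

The update preserves the probability simplex: the chosen entry gains alpha · (1 − p), while all other entries shrink by the factor (1 − alpha).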
Evolutionary GLSM Models

For co-operative GLSMs, another form of learning can be realised by letting the number or type of the individual GLSMs vary over time. The population dynamics of these evolutionary GLSM models can be interpreted as a learning mechanism. As for the learning GLSMs described above, we can distinguish between single-instance and multi-instance learning and base the process for dynamically adapting the population on similar criteria.

In the conceptually simplest case, the evolutionary process only affects the composition of the co-operative ensemble: machines that are performing well spawn numerous offspring that replace individuals showing inferior performance. This mechanism can be applied to both homogeneous and heterogeneous models for single-instance learning. In the former case, the selection is based on trajectory information of the individual machines and achieves a similar effect as described above for certain types of homogeneous co-operative GLSMs with communication: the search is concentrated on exploring promising parts of the search space.
When applied to heterogeneous models, this approach allows the realisation of self-optimising algorithm portfolios, which can be useful for single-instance as well as multi-instance learning. This concept can be further extended by introducing mutation and possibly recombination operators as known from Evolutionary Algorithms. It is also easily conceivable to combine evolutionary and individual learning, for example, by evolving ensembles of dynamic GLSMs. Finally, one could consider models that additionally allow communication within the ensemble.

By combining different extensions we can arrive at very complex and potentially powerful GLSM models; while these are very expressive, in general they will also be extremely difficult to analyse. Nevertheless, their implementation is rather straightforward, and an empirical approach for analysing and optimising their behaviour appears to be viable. We believe that such complex models, which allow for a very flexible and fine-grained search control, will likely be most effective when applied to problem classes with varied and salient structural features (see also Chapter 5). There is little doubt that, to some extent, this is the case for most real-world problem domains.
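A single generation of the simplest evolutionary GLSM scheme described above might look as follows; the performance measure (e.g., the incumbent evaluation value of each machine) and the spawn operation (copying, possibly with mutation) are assumptions for this sketch.

def evolve_ensemble(machines, performance, spawn, fraction=0.25):
    # Rank machines by observed performance (lower is better here).
    ranked = sorted(machines, key=performance)
    k = max(1, int(fraction * len(ranked)))
    # The best performers spawn offspring that replace the worst performers.
    offspring = [spawn(parent) for parent in ranked[:k]]
    return ranked[:-k] + offspring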
Continuous GLSM Models

The basic GLSM model and the various extensions discussed up to this point model local search algorithms for solving discrete combinatorial problems. Yet, by using continuous instead of discrete local search strategies for the GLSM state types, the model can be easily and naturally extended to continuous optimisation approaches. Although SLS methods for continuous optimisation problems are beyond the scope of this book, it should be noted that the GLSM model’s main feature, the clear distinction between simple search strategies and search control, is as useful an architectural and conceptual principle for continuous optimisation algorithms as it is in the discrete case. Furthermore, the GLSM model is particularly well-suited for modelling algorithms for hybrid problems that involve discrete and continuous solution components, as well as for modelling approaches that combine phases of discrete and continuous search in order to solve continuous or hybrid optimisation problems.
3.5 Further Readings and Related Work

The main idea underlying the GLSM model, namely to adequately represent complex algorithms as a combination of several simple strategies, is one of the fundamental concepts in the computing sciences.
Here, we have applied this general idea to SLS algorithms for combinatorial decision and optimisation problems, using suitably extended finite state machines for representing search control mechanisms. The GLSM model is partly inspired by Pnueli’s work on hybrid systems (see, e.g., Maler et al. [1992]) and Henzinger’s work on hybrid automata; the latter is conceptually related to the GLSM model in that it uses finite state machines to model systems with continuous and discrete components and dynamics [Alur et al., 1993; Henzinger, 1996]. The GLSM definition and semantics are heavily based on well-known concepts from automata theory (for general references, see Harrison [1978] or Rozenberg and Salomaa [1997]). However, when using conditional transitions or transition actions, the GLSM model extends the conventional model of a finite state machine. In its most general form, the GLSM model bears close resemblance to a restricted form of Petri nets that uses only one token [Krishnamurthy, 1989].

While in principle any local search algorithm can be represented as a GLSM, this representation does not always offer substantial advantages, particularly when the underlying search control mechanisms are very simple. Note, however, that some of the most successful local search algorithms for various problem classes (such as Novelty+ for SAT [Hoos, 1999a], H-RTS for MAX-SAT [Battiti and Protasi, 1997a] and iterated local search algorithms for the TSP [Martin et al., 1991; Johnson, 1990; Johnson and McGeoch, 1997]) rely on rather complex search control mechanisms that are adequately captured by the GLSM model. It is also worth noting that many existing algorithmic frameworks for local search, such as GenSAT [Gent and Walsh, 1993], can be easily and adequately represented using the GLSM model. These frameworks are generally more specific than the GLSM model and emphasise more details of the respective algorithm families; however, they can be easily realised as generic GLSMs without losing any detail of description. This is achieved by using structured generic state types to capture the more specific aspects of the framework to be modelled.

There exist various languages and programming environments that are specifically designed to facilitate the formulation and implementation of local search algorithms (see, for example, di Gaspero and Schaerf [2002], Fink and Voß [2002], Laburthe and Caseau [1998], Michel and van Hentenryck [1999; 2001] or van Hentenryck and Michel [2003]). Some of these support abstract representations of various aspects of the search process, such as the composition of different neighbourhoods [van Hentenryck and Michel, 2003]. For an overview of several such approaches we refer to Voß and Woodruff [2002]. However, the main focus of these languages and environments is usually on simplifying the implementation of SLS algorithms by providing programming language concepts and constructs that directly support the natural and efficient implementation of key components of the respective search mechanism [Michel and van Hentenryck, 2000; 2001]. To a large degree, these approaches complement conceptual frameworks such as the GLSM model, which are primarily designed to represent higher-level aspects of the search control mechanisms underlying complex, hybrid SLS methods.
The various extensions of the basic GLSM model discussed in this chapter are closely related to established work on learning automata [Narendra and Thathachar, 1989], parallel algorithm architectures [JáJá, 1992] and Evolutionary Algorithms [Bäck, 1996]. While most of the proposed extensions have not yet been implemented and empirically evaluated, they appear to be promising, especially when considering existing work on multiple independent tries parallelisation [Shonkwiler, 1993; Gomes et al., 1998], algorithm portfolios [Gomes and Selman, 1997b] and learning local search approaches for solving hard combinatorial problems [Boyan and Moore, 1998; Minton, 1996].
3.6 Summary

Based on the intuition that high-performance SLS algorithms are usually obtained by combining several simple search strategies, in this chapter we introduced the model of Generalised Local Search Machines (GLSMs). This conceptual framework formalises the search control mechanisms underlying most hybrid SLS methods using a finite state machine (FSM) model that associates simple component search strategies with the FSM states. FSMs are among the most basic and yet most fruitful concepts in computer science; using them to model search control mechanisms offers a number of advantages over other formalisms, such as pushdown automata or rule-based systems. Firstly, FSM-based models are conceptually simple; consequently, they can be implemented easily and efficiently. Secondly, the formalism is expressive enough to allow for the adequate representation of a broad range of SLS algorithms. Finally, there is a huge body of work on FSMs; many results and techniques are in principle directly applicable to GLSMs, which may be of interest in the context of analysing and optimising SLS algorithms. Of course, formalisms equivalent to the FSM model, such as rule-based descriptions, could be chosen instead. While this might be advantageous in certain contexts (such as reasoning about properties of a given GLSM), we find that the automaton model provides a more intuitive and accessible framework for designing and implementing SLS algorithms whose nature is primarily procedural.

In our experience, the GLSM model facilitates the development and design of new, hybrid SLS algorithms. In this context, both conceptual and implementational aspects play a role: due to the conceptual simplicity of the GLSM model and its clear representational distinction between search strategies and search control, hybrid combinations of existing SLS methods can be easily formalised and explored. Using a generic GLSM simulator, which is not difficult to implement, new, hybrid SLS methods can be realised and evaluated in a very efficient way.
Based on the formal definition of the GLSM model, the semantics of a GLSM can be specified in a rather straightforward way. Furthermore, there is a close relationship between the GLSM model and the standard definition of an SLS algorithm; in particular, GLSMs provide a factored representation of the step functions underlying complex, hybrid SLS methods that conceptually separates the underlying simple search strategies from the higher-level search control mechanism. Categorisations of state, transition and machine types provide a basis for systematically studying the search control mechanisms underlying many high-performance SLS algorithms.

It may be noted that most SLS methods can be represented by rather simple GLSMs. But the fact that many high-performance SLS algorithms tend to be based on search control mechanisms that correspond to structurally slightly more complex GLSMs suggests that further performance improvements may be achieved through the development and systematic study of complex combinations of simple search strategies, as facilitated by the GLSM model.

As we have shown, the basic GLSM model can be easily extended in various ways. Co-operative GLSM models comprise ensembles of GLSMs and capture a wide range of multi-agent approaches to combinatorial problem solving in an adequate way, including simple population-based approaches, algorithm portfolios and co-operative search methods. Various forms of learning can be modelled by GLSMs with dynamically changing transition probabilities or evolutionary GLSM models. Finally, GLSM models can be easily generalised to SLS methods for continuous and hybrid optimisation problems. These extensions are very natural generalisations that not only demonstrate the scope of the general idea but also suggest numerous avenues for further research. Overall, we believe that the GLSM model is very useful as a unifying framework that facilitates the understanding, development and analysis of stochastic local search algorithms.
Exercises

3.1
[Easy] Consider the GLSM from Example 3.1 (page 115f.). What is the probability that, directly after entering state z2, a sequence of three successive z2 steps is performed?
3.2
[Easy] Specify the SLS method realised by the GLSM shown below in the form of pseudo-code.
[Diagram for Exercise 3.2: a GLSM with states I1, S1, I2 and S2; transitions of types DET, CDET(Imin), CDET(not Imin), CPROB(R, r), CPROB(R, 1−r), CPROB(not R, q) and CPROB(not R, 1−q).]
3.3
[Medium; Implementation] Implement a simple GLSM simulator that supports only transitions of type PROB and no transition actions; in each step, instead of performing an actual search step, simply generate a line of output that indicates the type of the respective search step. Use your simulator to perform two independent runs of the GLSM from Example 3.1 (page 115f.) with 20 steps each. Show the output of your simulator along with the probability for observing the respective state trajectories.
3.4
[Medium] Explain how any SLS algorithm can be formalised as a 1-state+init GLSM. Why is this formalisation not desirable?
3.5
[Easy] Decide for each of the condition predicates from Table 3.1 (page 126), with the exception of ⊤, to which of the following categories it belongs:
• A: solely based on properties of the current candidate solution
• B: solely based on search control aspects, including memory state
3.6
[Medium] Give a good GLSM representation for the following hybrid SLS algorithm for TSP:
1. Start from a nearest neighbour tour for the given TSP instance.
2. Perform 10 000 iterations of Simulated Annealing, using a 3-exchange neighbourhood and a geometric annealing schedule with a starting temperature of 10 and a temperature reduction by a factor of 0.8 every 500 search steps.
3. Perform best improvement steps in the 3-exchange neighbourhood until a local minimum is found.
4. If the given target solution quality has not been reached, perform a single, random 4-exchange step and go to Step 3.
It is sufficient to specify the semantics for each state type and transition action in the form of a precise and concise natural language description.

3.7
[Medium] Show how a Memetic Algorithm (see Chapter 2, Section 2.4) can be modelled as a GLSM.
3.8
[Medium]
(a) Specify a GLSM model for ILS that does not model the acceptance criterion by a separate state.
(b) Discuss advantages and disadvantages of the model from part (a) compared to the representation in Figure 3.16 (page 136).
3.9
[Hard]
(a) Give a formal definition of a co-operative GLSM model with a fixed number of individual machines that can communicate with each other via shared memory, building on Definition 3.1 (page 114f.).
(b) Describe the semantics of this extended model in an informal, but precise way (cf. Section 3.1).
(c) Use the extended model to give an adequate representation of ACO.
3.10 [Hard] Formally define the semantics of a variant of the basic GLSM model in which the memory state can solely be changed during search steps (and not during state transitions). Discuss the advantages or disadvantages of this variant compared to the model specified in Section 3.1.
There is no higher or lower knowledge, but one only, flowing out of experimentation. —Leonardo da Vinci, Inventor & Artist
Empirical Analysis of SLS Algorithms

In this chapter, we discuss methods for empirically analysing the performance and behaviour of stochastic local search algorithms. Most of our general considerations and all empirical methods covered in this chapter apply to the broader class of (generalised) Las Vegas algorithms, which contains SLS algorithms as a subclass. After motivating the need for a more adequate empirical methodology and providing some general background on Las Vegas algorithms, we introduce the concept of run-time distributions (RTDs), which forms the basis of the empirical methodology presented in the following. Generally, this RTD-based analysis technique facilitates the evaluation, comparison and improvement of SLS algorithms for decision and optimisation problems; specifically, it can be used for obtaining optimal parameterisations and parallelisations.
4.1 Las Vegas Algorithms

Stochastic Local Search algorithms are typically incomplete when applied to a given instance of a combinatorial decision or optimisation problem; there is no guarantee that an (optimal) solution will eventually be found. However, in the case of a decision problem, if a solution is returned, it is guaranteed to be correct. The same holds for the decision variants of optimisation problems. Another important property of SLS algorithms is the fact that, given a problem instance, the time required for finding a solution (in case a solution is found) is a random variable. These two properties, correctness of the solution computed and run-times
characterised by a random variable, define the class of (generalised) Las Vegas algorithms (LVAs).
Definition 4.1 Las Vegas Algorithm
An algorithm A for a problem class Π is a (generalised) Las Vegas algorithm (LVA) if, and only if, it has the following properties:
(1) If for a given problem instance π ∈ Π, algorithm A terminates returning a solution s, then s is guaranteed to be a correct solution of π.
(2) For each given instance π ∈ Π, the run-time of A applied to π is a random variable RTA,π.
Remark: The concept of Las Vegas algorithms was introduced by Babai for
the (theoretical) study of randomised algorithms [Babai, 1979]. In theoretical computer science, usually two (equivalent) definitions of a Las Vegas algorithm are used. Both of these definitions assume that the algorithm terminates in finite time; they mainly differ in that the first definition always requires the algorithm to return a solution, while the second definition requires that the probability of returning a correct solution is greater than or equal to 0.5 [Hromkovič, 2003]. Note that our definition allows for the possibility that, with an arbitrary probability, the algorithm does not return any solution; in this sense, it slightly generalises the established concept of a Las Vegas algorithm. In the following, we will use the term Las Vegas algorithm to refer to this slightly more general notion of an LVA.

Here and in the following, we treat the case in which an SLS algorithm does not return a solution as equivalent to the case where the algorithm does not terminate. Under this assumption, any SLS algorithm for a decision problem is a Las Vegas algorithm, as long as the validity of any solution returned by the algorithm is checked. Typically, checking for the correctness of solutions is very efficient compared to the overall run-time of an SLS algorithm, and most SLS algorithms perform such a check before returning any result. (Note that for problems in NP, the correctness of a solution can always be verified in polynomial time.) Based on this argument, in the following we assume that SLS algorithms for decision problems always check correctness before returning a solution. As an example, it is easy to see that Uninformed Random Picking (as introduced in Section 1.5) is a Las Vegas algorithm. Because a solution is generally never returned without verifying it first (as explained above), condition (1) of the definition is trivially satisfied, and because of the randomised selection process in each search step and/or in the initialisation, the time required for finding a solution is obviously a random variable.
In the case of SLS algorithms for optimisation problems, at first glance, the situation seems to be less clear. Intuitively and practically, unless the optimal value of the objective function is known, it is typically impossible to efficiently verify the optimality of a given candidate solution. However, as noted in Chapter 1, Section 1.1, many optimisation problems include logical conditions that restrict the set of valid solutions. The validity of a solution can be checked efficiently for combinatorial optimisation problems whose associated decision problems are in NP, and SLS algorithms for solving such optimisation problems generally perform such a test before returning a solution. Hence, if only valid solutions are considered correct, SLS algorithms for optimisation problems fit the formal definition of Las Vegas algorithms. However, SLS algorithms for optimisation problems have the additional property that for fixed run-time, the solution quality achieved by the algorithm, that is, the objective function value of the incumbent candidate solution, is also a random variable.
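To connect Definition 4.1 with the Uninformed Random Picking example, here is a minimal sketch of a (generalised) Las Vegas algorithm for SAT; the clause encoding (non-zero integers, negative for negated literals) is an assumption made for this illustration.

import random

def random_picking_sat(clauses, num_vars, max_steps):
    for _ in range(max_steps):
        assignment = [random.choice([False, True]) for _ in range(num_vars)]
        # Verification before returning guarantees property (1) of Definition 4.1:
        # every clause must contain at least one satisfied literal.
        if all(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return assignment
    return None  # may return no solution, as allowed for generalised LVAs

Since each trial succeeds with the same fixed probability, the run-time until a satisfying assignment is returned is geometrically distributed, illustrating property (2).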
Definition 4.2 Optimisation Las Vegas Algorithm
An algorithm A for an optimisation problem Π is a (generalised) optimisation Las Vegas algorithm (OLVA) if, and only if, it is a (generalised) Las Vegas algorithm, and for each problem instance π ∈ Π the solution quality achieved after any run-time t is a random variable SQ(t).
Note that for OLVAs, the solution quality achieved within a bounded run-time is a random variable, and the same holds for the run-time required for achieving or exceeding a given solution quality.

Las Vegas algorithms are prominent in various areas of computer science and operations research. A significant part of this impact is due to the successful application of SLS algorithms for solving NP-hard combinatorial problems. However, there are other very successful Las Vegas algorithms that are not based on stochastic local search. In particular, a number of systematic search methods, including some fairly recent variants of the Davis-Putnam algorithm for SAT (see also Chapter 6), make use of non-deterministic decisions such as randomised tie-breaking rules and fall into the category of generalised Las Vegas algorithms.

It should be noted that Las Vegas algorithms can be seen as a special case of the larger, and also very prominent, class of Monte Carlo algorithms. Like LVAs,
Monte Carlo algorithms are randomised algorithms with randomly distributed run-times. However, a Monte Carlo algorithm can sometimes return an incorrect answer; in other words, it can generate false positive results (incorrect solutions to the given problem instance) as well as false negative results (missed correct solutions), while for (generalised) Las Vegas algorithms, only false negatives are allowed.
Empirical vs Theoretical Analysis

As a result of their inherently non-deterministic nature, the behaviour of Las Vegas algorithms is usually difficult to analyse. For most practically relevant LVAs, in particular for SLS algorithms that perform well in practice, theoretical results are typically hard to obtain, and even in the cases where theoretical results do exist, their practical applicability is often very limited. The latter situation can arise for different reasons. Firstly, sometimes the theoretical results are obtained under idealised assumptions that do not hold in practical situations. This is, for example, the case for Simulated Annealing, which has been proven to converge towards an optimal solution under certain conditions, one of which is infinitesimally slow cooling in the limit [Hajek, 1988], which obviously is not practical. Secondly, most complexity results apply to worst-case behaviour, and in the relatively few cases where theoretical average-case results are available, these are often based on instance distributions that are unlikely to be encountered in practice. Finally, theoretical bounds on the run-times of SLS algorithms are typically asymptotic and do not reflect the actual behaviour accurately enough.

Given this situation, in most cases the analysis of the run-time behaviour of Las Vegas algorithms is based on empirical methodology. Despite dealing with algorithms that are completely known and easily understood on a step-by-step execution basis, computer scientists are thus in a sense in the same situation as, for instance, an experimental physicist studying some nondeterministic quantum effect or a microbiologist investigating bacterial growth behaviour. In either case, a complex phenomenon of interest cannot be easily derived from known underlying principles solely based on theoretical means; instead, the classical scientific cycle of observation, hypothesis, prediction, experiment is employed in order to obtain a model that explains the phenomenon. It should be noted that in all empirical sciences, in particular in physics, chemistry and biology, it is largely a collection of these models that constitutes theoretical frameworks, whereas in computer science, theory is almost exclusively derived from mathematical foundations. Historical reasons aside, this difference is largely due
to the fact that algorithms are completely specified and mathematically defined at the lowest level. However, in the case of SLS algorithms (and many other complex algorithms or systems), this knowledge is often insufficient to theoretically derive all relevant aspects of their behaviour. In this situation, empirical approaches, based on computational experiments, are often not only the sole way of assessing a given algorithm, but also have the potential to provide insights into practically relevant aspects of algorithmic behaviour that appear to be well beyond the reach of theoretical analysis.
Norms of LVA Behaviour

By definition, Las Vegas algorithms are always correct, while they are not necessarily complete; that is, even if a given problem instance has a solution, a Las Vegas algorithm is generally not guaranteed to find it. Completeness is not only an important theoretical concept for the study of algorithms, but it is often also relevant in practical applications. In the following, we distinguish not only between complete and incomplete Las Vegas algorithms, but also introduce a third category, the so-called probabilistically approximately complete LVAs. Intuitively, an LVA is complete if it can be guaranteed to solve any soluble problem instance in bounded time; it is probabilistically approximately complete (PAC) if it will solve each soluble problem instance with arbitrarily high probability when allowed to run long enough; and it is essentially incomplete if even arbitrarily long runs cannot be guaranteed to find existing solutions. These concepts can be formalised as follows:

Definition 4.3 Asymptotic Behaviour of LVAs
Consider a Las Vegas algorithm A for a problem class Π, and let Ps(RTA,π ≤ t) denote the probability that A finds a solution for a soluble instance π ∈ Π in time less than or equal to t. A is called
• complete if, and only if, for each soluble instance π ∈ Π there exists some tmax such that Ps(RTA,π ≤ tmax) = 1;
• probabilistically approximately complete (PAC) if, and only if, for each soluble instance π ∈ Π, limt→∞ Ps(RTA,π ≤ t) = 1;
• essentially incomplete if, and only if, it is not PAC, that is, if there exists a soluble instance π ∈ Π for which limt→∞ Ps(RTA,π ≤ t) < 1.
Probabilistic approximate completeness is also referred to as the PAC property, and we will often use the term ‘approximately complete’ to characterise algorithms that are PAC. Furthermore, we will use the terms completeness, probabilistic approximate completeness and essential incompleteness also with respect to single problem instances or subsets of a problem Π, if the respective properties hold for the corresponding sets of instances instead of Π.

Examples of complete Las Vegas algorithms are randomised systematic search procedures, such as Satz-Rand [Gomes et al., 1998]. Many stochastic local search methods, such as Randomised Iterative Improvement and variants of Simulated Annealing, are PAC, while others, such as basic Iterative Improvement, many variants of Iterated Local Search and most tabu search algorithms, are essentially incomplete (see also the in-depth section on page 155ff.).

Theoretical completeness can be achieved for any SLS algorithm by using a restart mechanism that systematically re-initialises the search such that eventually the entire search space has been visited. However, the time limits for which solutions are guaranteed to be found using this approach are typically far too large to be of practical interest. A similar situation arises in many practical situations for search algorithms whose completeness is achieved by different means, such as systematic backtracking.

Essential incompleteness of an SLS algorithm is usually caused by the algorithm’s inability to escape from attractive local minima regions of the search space. Any mechanism that guarantees that a search process can eventually escape from arbitrary regions of the search space, given sufficient time, can be used to make an SLS algorithm probabilistically approximately complete. Examples of such mechanisms include random restart, random walk and probabilistic tabu lists; however, as we will discuss in more detail later (see Section 4.4), not all such mechanisms necessarily lead to performance improvements relevant to practical applications.

For optimisation LVAs, the concepts of completeness, probabilistic approximate completeness and essential incompleteness can be applied to the associated decision problems in a straightforward way, using the following generalisations:
Definition 4.4 Asymptotic Behaviour of OLVAs
Consider an optimisation Las Vegas algorithm A′ for a problem Π′, and let Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ r · q∗(π′)) denote the probability that A′ finds a solution of quality ≤ r · q∗(π′) for a soluble instance π′ ∈ Π′ in time ≤ t, where q∗(π′) is the optimal solution quality for instance π′.
A′ is called
• r-complete if, and only if, for each soluble instance π′ ∈ Π′ there exists some tmax such that Ps(RTA′,π′ ≤ tmax, SQA′,π′ ≤ r · q∗(π′)) = 1;
• probabilistically approximately r-complete (r-PAC) if, and only if, for each soluble instance π′ ∈ Π′, limt→∞ Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ r · q∗(π′)) = 1;
• essentially r-incomplete if, and only if, it is not approximately r-complete, that is, if there exists a soluble problem instance π′ ∈ Π′ for which limt→∞ Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ r · q∗(π′)) < 1.
With respect to finding optimal solutions, we use the terms complete, approximately complete and essentially incomplete synonymously for 1-complete, approximately 1-complete and essentially 1-incomplete, where q∗(π′) is the optimal solution quality for the given problem instance.
In Depth: Probabilistic Approximate Completeness and ‘Convergence’

The PAC property states that by running an algorithm sufficiently long, the probability of not finding an (optimal) solution can be made arbitrarily small. Hence, an increase of the run-time typically pays off in the sense that it also increases the chance that the algorithm finds a solution. As previously stated, several extremely simple SLS algorithms have the PAC property. For example, it is rather straightforward to show that Uninformed Random Picking is PAC. Under some simple conditions, several more complex SLS algorithms can also be proven to have the PAC property.

One condition that is sufficient for guaranteeing the PAC property for an SLS algorithm A is the following: there exists ε > 0 such that in each search step, the distance to an arbitrary, but fixed (optimal) solution s∗ is reduced with probability greater than or equal to ε, where distance is measured in terms of a minimum length search trajectory of A from its current position to s∗. To see why this condition implies that A is PAC, consider a situation where the distance between the current candidate solution and s∗ is equal to l. In that case, we can compute a lower bound on the probability of reaching s∗ in exactly l steps as ε^l. Since the diameter ∆ of the given neighbourhood graph is an upper bound for l, we can give a worst-case estimate for the probability of reaching s∗ from an arbitrary candidate solution within ∆ steps as ε^∆. Any search trajectory of length t > ∆ can be partitioned into segments of length ∆, for each of which there is an independent probability of at least ε^∆ of reaching solution s∗; consequently, the probability that A does not reach s∗ within a trajectory of length t can be bounded by

(1 − ε^∆)^⌊t/∆⌋.
Since ε^∆ > 0, by choosing t sufficiently large, this failure probability can be made arbitrarily small, and consequently, the success probability of A converges to 1 as the run-time approaches infinity, that is, A is PAC. A proof along these lines can easily be applied to SLS methods such as Randomised Iterative Improvement (see Chapter 2, page 72ff.) and, with some additional assumptions on the maximum difference between the evaluation function values of neighbouring candidate solutions, Probabilistic Iterative Improvement (see Chapter 2, page 74f.). The PAC property has also been proven for a number of other algorithms, including Simulated Annealing [Geman and Geman, 1984; Hajek, 1988; Lundy and Mees, 1986], specific Ant Colony Optimisation algorithms [Gutjahr, 2002; Stützle and Dorigo, 2002], Probabilistic Tabu Search [Faigle and Kern, 1992], deterministic variants of Tabu Search [Glover and Hanafi, 2002; Hanafi, 2001] and Evolutionary Algorithms [Rudolph, 1994].

For many SLS algorithms, properties that are stronger than the PAC property have been proven. (In some sense, the strength of the result depends on the notion of probabilistic convergence proven; see Rohatgi [1976] for the different notions of probabilistic convergence.) In particular, in some cases it can be proven that if s_k is the candidate solution at step k, then lim_{k→∞} P(s_k ∈ S) = 1 (i.e., the probability that the current search position s_k is an (optimal) solution tends to one as the number of iterations approaches infinity). In other words, if run sufficiently long, the probability for the algorithm to visit any non-solution position becomes arbitrarily small. This is exactly the type of result that has been proven, for example, for Simulated Annealing [Hajek, 1988]. This type of convergence can be nicely contrasted with the definition of PAC, which directly implies that lim_{k→∞} P(ŝ_k ∈ S) = 1, where ŝ_k is the incumbent solution after step k. Clearly, the former type of convergence result is stronger in that it implies the PAC property, but not vice versa.

However, for practical purposes, this stronger sense of convergence is irrelevant. This is because for decision problems, once a solution is found that satisfies all logical conditions, the search is terminated, and for optimisation problems, the best candidate solution encountered so far is memorised and can be accessed at any time throughout the search. Additionally, these stronger convergence proofs are often based on particular parameter settings of the algorithm that result in an extremely slow convergence of the success probability, which is not useful for practically solving problems; this is the case for practically all ‘strong’ convergence proofs for Simulated Annealing.

The significance of a PAC result is that the respective algorithm is guaranteed to not get permanently trapped in a non-solution area of the search space. What is of real interest in practice, however, is the rate at which the success probability approaches one. While PAC proofs typically give, as a side effect, a lower bound on that rate, this bound is typically rather poor, since it does not adequately capture the heuristic guidance utilised by the algorithm. In fact, in proofs of the PAC property, one typically has to assume a worst-case scenario in which the search heuristic is maximally misleading.
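Solving the failure bound above for the run-time needed to push the failure probability below a given δ > 0 makes this worst-case rate explicit; the following is a direct algebraic consequence of the bound, not an additional result from the text:

(1 − ε^∆)^⌊t/∆⌋ ≤ δ  ⟺  ⌊t/∆⌋ ≥ ln δ / ln(1 − ε^∆),

so any run-time t ≥ ∆ · ln δ / ln(1 − ε^∆) suffices. Since ε is typically tiny and the diameter ∆ grows with instance size, this guaranteed run-time is astronomically large for interesting instances, which illustrates why such worst-case bounds say little about practical performance.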
However, empirical results indicate that in many cases variants of SLS methods that are PAC perform significantly better in practice than non-PAC variants [Hoos, 1999a; Hoos and Stützle, 2000a; Stützle and Dorigo, 2002], which gives a strong indication that in practice the convergence rate of PAC algorithms is much higher than the theoretical analyses would suggest. In general, proving better bounds on the convergence rate of
state-of-the-art SLS algorithms appears to be very challenging, but is doubtlessly an interesting direction of theoretical work on SLS algorithms.
Application Scenarios and Evaluation Criteria

For the empirical analysis of any algorithm it is crucial to use appropriate evaluation criteria. In the case of Las Vegas algorithms, depending on the characteristics of the application context, different evaluation criteria are appropriate. Let us start by considering Las Vegas algorithms for decision problems and classify possible application scenarios in the following way:

Type 1: There are no time limits, that is, we can afford to run the algorithm as long as it needs to find a solution. Basically, this scenario is given whenever the computations are done off-line or in a non-realtime environment where it does not really matter how long it takes to find a solution. In this situation we are interested in the expected time required for finding a solution; this can be estimated easily from a number of test runs.

Type 2: There is a hard time limit for finding the solution, such that the algorithm has to provide a solution after some given time tmax; solutions that are found later are of no use. In real-time applications, such as robotic control or dynamic task scheduling, tmax can be very small. In this situation we are not so much interested in the expected time for finding a solution, but in the probability that after the hard deadline tmax a solution has been found.

Type 3: The usefulness or utility of a solution depends on the time that was needed to find it. Formally, if utilities are represented as values in [0, 1], we can characterise these scenarios by specifying a utility function U : R+ → [0, 1], where U(t) is the utility of finding a solution at time t.

As can be easily seen, application types 1 and 2 are special cases of type 3, which can be characterised by utility functions that are either constant (type 1) or step functions with U(t) := 1 for t ≤ tmax and U(t) := 0 for t > tmax (type 2). While in the case of no time limits being given (type 1), the mean run-time of a Las Vegas algorithm might suffice to roughly characterise its run-time behaviour, in real-time situations (type 2) this measure is basically meaningless.

Type 3 is not only the most general class of application scenario, but these scenarios are also the most realistic. The reason for this is the fact that real-world problem solving usually involves time constraints that are less strict than the hard deadline given in type 2 scenarios. Instead, at least within a certain interval, the value of a solution gradually decreases over time. In particular, this situation is given when taking into account the costs (in particular, CPU time) of finding a solution.
As an example, consider a situation where hard combinatorial problems have to be solved on-line using expensive hardware in a time-sharing mode. Even if the immediate benefit of finding a solution is invariant over time, the costs for performing the computations will diminish the final payoff. Two common ways of modelling this effect are constant or proportional discounting, which use utility functions of the form U(t) := max{u0 − c · t, 0} and U(t) := e^(−λ·t), respectively (see, e.g., [Poole et al., 1998]). Based on the utility function, the weighted solution probability U(t) · Ps(RT ≤ t) can be used as a performance criterion. If U(t) and Ps(RT ≤ t) are known, optimal cutoff times t∗ that maximise the weighted solution probability can be determined, as well as the expected utility for a given time t′. These evaluations and calculations require detailed knowledge of the solution probabilities Ps(RT ≤ t), potentially for arbitrary run-times t.

In the case of optimisation Las Vegas algorithms, solution quality has to be considered as an additional factor. One might imagine application contexts in which the run-time is basically unconstrained, such as in the type 1 scenarios discussed above, but a certain solution quality needs to be obtained, or situations in which a hard time limit is given, during which the best possible solution is to be found. Typically, however, one can expect to find more complex trade-offs between run-time and solution quality. Therefore, the most realistic application scenario for optimisation Las Vegas algorithms is a generalisation of type 3, where the utility of a solution depends on its quality as well as on the time needed to find it. This is modelled by utility functions U(t, q) : R+ × R+ → [0, 1], where U(t, q) is the utility of a solution of quality q found at time t. Analogous to the case of decision LVAs, the probability Ps(RT ≤ t, SQ ≤ q) for obtaining a certain solution quality q within a given time t, weighted by the utility U(t, q), can be used as a performance criterion.
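Given an empirically estimated RTD and a utility function, the weighted solution probability and an optimal cutoff time can be computed directly; the following sketch assumes a sample of observed run-times as the RTD estimate and a discount parameter chosen purely for illustration.

import math

def weighted_solution_probability(run_times, utility, t):
    # Empirical estimate of U(t) * Ps(RT <= t) from observed run-times.
    p_solve = sum(rt <= t for rt in run_times) / len(run_times)
    return utility(t) * p_solve

def optimal_cutoff(run_times, utility):
    # Scan the observed run-times as candidate cutoffs t*; for a step-shaped
    # empirical RTD and a non-increasing utility, the maximum of the product
    # is attained at one of the sample points.
    return max(run_times,
               key=lambda t: weighted_solution_probability(run_times, utility, t))

# Proportional discounting U(t) := e^(-lambda * t), with lambda = 0.01 assumed.
u_proportional = lambda t: math.exp(-0.01 * t)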
4.2 Run-Time Distributions

As we have argued in the previous section, it is generally not sufficient to evaluate LVAs based on the expected time required for solving a problem instance or for achieving a given solution quality, or on the probability of solving a given instance within a given time. Instead, application scenarios are often characterised by complex utility functions, or Las Vegas algorithms are evaluated without a priori knowledge of the application scenario, such that the utility function is unknown, but cannot be assumed to correspond to one of the special cases characterising type 1 or 2 application scenarios. Therefore, LVA evaluations should be based on a detailed knowledge and analysis of the solution probabilities Ps(RT ≤ t) for decision problems and Ps(RT ≤ t, SQ ≤ q) for optimisation problems, respectively. Obviously, these probabilities can be determined from the probability
distributions of the random variables characterising the run-time and solution quality of a given LVA.
Definition 4.5 Run-Time Distribution
Consider a Las Vegas algorithm A for a decision problem class Π, and let Ps(RTA,π ≤ t) denote the probability that A finds a solution for a soluble instance π ∈ Π in time less than or equal to t. The run-time distribution (RTD) of A on π is the probability distribution of the random variable RTA,π, which is characterised by the run-time distribution function rtd : R+ → [0, 1] defined as rtd(t) = Ps(RTA,π ≤ t).

Similarly, given an optimisation Las Vegas algorithm A′ for an optimisation problem Π′ and a soluble problem instance π′ ∈ Π′, let Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ q) denote the probability that A′ applied to π′ finds a solution of quality less than or equal to q in time less than or equal to t. The run-time distribution (RTD) of A′ on π′ is the probability distribution of the bivariate random variable (RTA′,π′, SQA′,π′), which is characterised by the run-time distribution function rtd : R+ × R+ → [0, 1] defined as rtd(t, q) = Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ q).
Since RTDs are completely and uniquely characterised by their distribution functions, we will often use the term ‘run-time distribution’ or ‘RTD’ to refer to the corresponding run-time distribution functions.
Example 4.1 RTDs for Decision and Optimisation LVAs
Figure 4.1 (left) shows a typical run-time distribution for an SLS algorithm applied to an instance of a hard combinatorial decision problem. The RTD is represented by a cumulative probability distribution curve (t, P̂s(RT ≤ t)) that has been empirically determined from 1 000 runs of WalkSAT, one of the most prominent SLS algorithms for SAT, on a hard Random 3-SAT instance with 100 variables and 430 clauses (for details on the algorithm and problem class, see Chapter 6); P̂s(RT ≤ t) represents an empirical estimate for the success probability Ps(RT ≤ t). Figure 4.1 (right) shows the bivariate RTD for an SLS optimisation algorithm applied to an instance of a hard combinatorial optimisation problem. The plotted surface corresponds to the cumulative probability distribution of an empirically measured RTD, in this case determined from 1 000 runs of an iterated local search algorithm applied to instance pcb442 with 442 vertices from TSPLIB, a benchmark library for the TSP (details on SLS algorithms and benchmark problems for the TSP will be discussed in Chapter 8).
Figure 4.1 Typical run-time distributions for SLS algorithms applied to hard combinatorial decision (left) and optimisation (right) problems; for details, see text.
Note how the contours of the three-dimensional bivariate RTD surface, projected into the run-time/solution quality plane, reflect the trade-off between run-time and solution quality: for a given probability level, better solution qualities require longer runs, while vice versa, shorter runs yield lower-quality solutions.
The behaviour of a Las Vegas algorithm applied to a given problem instance is completely and uniquely characterised by the corresponding RTD. Given an RTD, other performance measures or evaluation criteria can be easily computed. For decision LVAs, measures such as the mean run-time for finding a solution, its standard deviation, median, quantiles or success probabilities for arbitrary time limits are often used in empirical studies. For optimisation LVAs, popular evaluation criteria include the mean or standard deviation of the solution quality for a given run-time (cutoff time), as well as basic descriptive statistics of the run-time required for obtaining a given solution quality. Unlike these measures, however, knowledge of the full RTD allows the evaluation of Las Vegas algorithms for problems and application scenarios that involve more complex trade-offs. Some of these can be directly represented by a utility function, while others might concern preferences on properties of the RTDs. As an example of the latter case, consider a situation where, for a given time limit t′, one SLS algorithm gives a high mean solution quality but a relatively large standard deviation, while another algorithm produces slightly inferior
solutions in a more consistent way. RTDs provide a basis for addressing such trade-offs quantitatively and in detail.
Qualified Run-Time Distributions

Multivariate probability distributions, such as the RTDs of optimisation LVAs, are often more difficult to handle than univariate distributions. Therefore, when analysing and characterising the behaviour of optimisation LVAs, instead of working directly with bivariate RTDs, it is often preferable to focus on the (univariate) distributions of the run-time required for reaching a given solution quality threshold.
Definition 4.6 Qualified Run-Time Distribution
Let A′ be an optimisation Las Vegas algorithm for an optimisation problem Π′ and let π′ ∈ Π′ be a soluble problem instance. If rtd(t, q) is the RTD of A′ on π′, then for any solution quality q′, the qualified run-time distribution (QRTD) of A′ on π′ for q′ is defined by the distribution function qrtdq′(t) := rtd(t, q′) = Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ q′).
The qualified RTDs thus defined are marginal distributions of the bivariate RTD; intuitively, they correspond to cross-sections of the two-dimensional RTD graph for fixed solution quality values. Qualified RTDs are useful for characterising the ability of an SLS algorithm for an optimisation problem to solve the associated decision problems (cf. Chapter 1). In practice, they are commonly used for studying an algorithm's ability to find optimal or close-to-optimal solutions (if the optimal solution quality is known) or feasible solutions (in cases where hard constraints are given). Analysing series of qualified RTDs for increasingly tight solution quality thresholds can give a detailed picture of the behaviour of an optimisation LVA. An important question arises with respect to the solution quality bounds used when measuring or analysing qualified RTDs. For some problems, benchmark instances with known optimal solutions are available. In this case, bounds expressed as relative deviations from the optimal solution quality are often used; the relative deviation of solution quality q from optimal solution quality q∗ is calculated as q/q∗ − 1; the relative solution qualities thus obtained are often expressed in percent. (In cases where q∗ = 0, the solution quality is sometimes normalised by dividing it by the maximal possible objective function value.) If optimal solutions are not known, one possibility is to evaluate the SLS algorithms w.r.t. the best
Figure 4.2 Left: Qualified RTDs for the bivariate RTD from Figure 4.1, for solution quality bounds ranging from 0.8% above optimal down to optimal. Right: SQDs for the same RTD, for run-times between 0.1 and 10 CPU seconds.
known solutions. This method, however, has the potential disadvantage that best known solutions may change. Therefore, it is sometimes preferable to use lower bounds on the optimal solution quality, especially if these are known to be close to the optimum, as is the case for the TSP [Held and Karp, 1970; Johnson and McGeoch, 1997]. Alternatively, there are statistical methods for estimating optimal solution qualities in cases where tight lower bounds are not available [Dannenbring, 1977; Golden and Stewart, 1985].

Example 4.2 Qualified Run-Time Distributions
Figure 4.2 (left) shows a set of qualified RTDs which correspond to marginal distributions of the bivariate empirical RTD from Example 4.1 (page 159f.). Note that when the solution quality bound is tightened, the qualified RTDs are shifted to the right and appear somewhat steeper in the semi-log plot. This indicates that not only does the run-time required for finding higher-quality solutions increase, but so does the relative variability of the run-time (as reflected, for example, in the variation coefficient, that is, the standard deviation of the RTD divided by its mean). The latter observation reflects a rather typical property of SLS algorithms for hard optimisation problems.
Solution Quality Distributions

An orthogonal view of an optimisation LVA's behaviour is given by the distribution of the solution quality for fixed run-time limits.
Definition 4.7 Solution Quality Distribution
Let A′ be an optimisation Las Vegas algorithm for an optimisation problem Π′ and let π′ ∈ Π′ be a soluble problem instance. If rtd(t, q) is the RTD of A′ on π′, then for any run-time t′, the solution quality distribution (SQD) of A′ on π′ for t′ is defined by the distribution function sqdt′(q) := rtd(t′, q) = Ps(RTA′,π′ ≤ t′, SQA′,π′ ≤ q).
Like qualified RTDs, solution quality distributions are marginal distributions of a bivariate RTD. They correspond to cross-sections of the two-dimensional RTD graph for fixed run-times; in this sense, they are orthogonal to qualified RTDs. SQDs are particularly useful in situations where fixed cutoff times are given (such as in type 2 application scenarios). Furthermore, they facilitate detailed quantitative analyses of the trade-offs between the chance of finding a good solution fast and the risk of obtaining only low-quality solutions. Unlike run-time, solution quality is inherently bounded from below by the quality of the optimal solution of the given problem instance. This constrains the SQDs of typical SLS algorithms, such that for sufficiently long run-times, an increase in mean solution quality is often accompanied by a decrease in solution quality variability. In particular, for a probabilistically approximately complete algorithm, the SQDs for increasingly large time-limits t approach a degenerate probability distribution that has all probability mass concentrated on the optimal solution quality.
Example 4.3 Solution Quality Distributions
Figure 4.2 (right) shows a set of SQDs, that is, marginal distributions of the bivariate empirical RTD from Example 4.1 (page 159f.), which offer an orthogonal view to the qualified RTDs from Example 4.2. The SQDs show clearly that for increasing run-time, the entire probability mass is shifted towards higher-quality solutions, while the variability in solution quality decreases. It is also interesting to note that the SQDs for large run-times are multimodal, as can be seen from the fact that they have multiple steep segments which correspond to the peaks in probability density (modes).
An interesting special case arises for iterative improvement algorithms that cannot escape from local minima regions of the given evaluation function. Once they have encountered such a local minima region, these essentially incomplete
algorithms are unable to obtain any further improvements in solution quality. Consequently, as the run-time is increased towards infinity, the respective SQDs approach a non-degenerate probability distribution. For simple iterative improvement methods that only allow strictly improving steps and always perform such steps when they are possible, this asymptotic SQD is reached after finite run-time on any given problem instance. Moreover, in this case asymptotic SQDs can be easily sampled empirically by simply performing multiple runs of the algorithm and recording the quality of the incumbent solution upon termination of each of these runs. Asymptotic SQDs are useful for characterising the performance of simple iterative improvement algorithms (cf. Example 2.1, page 64f.). The information provided by asymptotic SQDs is typically well complemented by the (univariate) run-time distribution that captures the time spent by the algorithm before terminating, independent of the final solution quality reached in this run. To distinguish this type of run-time distribution from the complete bivariate run-time distribution that characterises the behaviour of any optimisation LVA and from the notion of a qualified run-time distribution discussed above, we refer to it as termination-time distribution (TTD). Note that unlike qualified RTDs, TTDs are not marginal distributions of the underlying bivariate RTD. Asymptotic SQDs are also very useful for characterising the performance of purely constructive search algorithms, such as the Nearest Neighbour Heuristic for the TSP (cf. Chapter 1, Section 1.4), which terminate as soon as a complete candidate solution has been obtained. Unlike (perturbative) iterative improvement algorithms, constructive search algorithms typically terminate after a fixed, instance-dependent number of search steps. Consequently, they typically show much less variability in run-time (or no variability at all), which simplifies comparative performance analyses.
Time-Dependent Summary Statistics

Instead of dealing with a set of SQDs for a series of time limits, researchers (and practitioners) often just look at the development of certain solution quality statistics over time (SQTs). A common example of such an SQT is the function SQ(t), which characterises the time-dependent development of the mean solution quality achieved by a given algorithm. It is often preferable to use SQTs that reflect the development of quantiles (e.g., the median) of the underlying SQDs over time, since quantiles are typically statistically more stable than means. Furthermore, SQTs based on SQD quantiles offer the advantage that they can be seen as horizontal sections or contour lines of the underlying bivariate RTD surfaces. Combinations of such SQTs can be very useful for summarising certain aspects of
Figure 4.3 Left: Development of median solution quality, 0.75 and 0.9 SQD quantiles over time for the same TSP algorithm and problem instance as used in Figure 4.1 (page 160). Right: RTQ for the same algorithm and problem instance.
a full SQD series, and hence a complete bivariate RTD; they are particularly well suited for explicitly illustrating trade-offs between run-time and solution quality. Individual SQTs, however, offer a fairly limited view of an optimisation Las Vegas algorithm's run-time behaviour, in which important details can easily be missed.
Example 4.4 Solution Quality Statistics Over Time
Figure 4.3 (left) shows the development of median solution quality and its variability over time, obtained from the same empirical data underlying the bivariate RTD from Example 4.1 (page 159f.). From this type of evaluation, which is often used in the literature, we can easily see that in the given example the algorithm behaves in a very desirable way: with increasing run-time, the median solution quality as well as the higher SQD quantiles improve substantially and consistently; in this particular example, we can also see that there is a large and rapid improvement in solution quality after 4–20 CPU seconds. The gradual decrease in solution quality variability during the first and final phases of the search is rather typical of the behaviour of high-performance SLS algorithms for hard combinatorial optimisation problems; it indicates that for longer runs the algorithm tends to find better solutions in a more consistent way. Note, however, that interesting properties, such as the fact that in our example the SQDs for large run-times are multimodal, or that the variation in run-time increases when higher-quality solutions need to be obtained, cannot be observed from the SQT data shown here.
It is interesting to note that, while SQTs are commonly used in the literature for evaluating and analysing the behaviour of SLS algorithms for optimisation problems, the orthogonal concept of qualified RTD statistics as a function of solution quality (RTQs) does not appear to be used at all. Possibly the reason for this is that SQTs relate more intuitively to the run-time behaviour of an optimisation LVA, and that empirical SQTs can be measured more easily (the latter issue will be discussed in more detail in the next section). Nevertheless, RTQs can be useful, for instance, in cases where trade-offs between the mean and the standard deviation of the time required for reaching a certain solution quality q have to be examined as a function of q, but where the details offered by a series of qualified RTDs (or the full bivariate RTD) are not of interest.
Example 4.5 Run-Time Statistics Depending on Solution Quality
Figure 4.3 (right) shows several quantiles of the qualified RTDs from Figure 4.2 (page 162) as a function of the relative solution quality bound q. Note the difference to the SQT plots in Figure 4.3 (left), which show SQD statistics as a function of run-time.
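SQT curves of this kind can be computed directly from the solution quality traces whose measurement is described below in this section. The following is a minimal sketch, assuming a hypothetical trace format in which each run is represented by a time-sorted list of (time, quality) pairs recorded at every improvement of the incumbent solution; all function names are ours. RTQ curves are obtained analogously, by fixing a quality bound and taking quantiles of the qualified run-times.

```python
import numpy as np

def best_quality_at(trace, t):
    """Quality of the incumbent solution of one run at time t.

    trace -- time-sorted list of (time, quality) improvement events.
    Returns inf if no solution has been found by time t.
    """
    q = float('inf')
    for time, quality in trace:
        if time > t:
            break
        q = quality
    return q

def sqt_quantile(traces, time_grid, p):
    """p-quantile of the SQD at each point of a time grid (one SQT curve)."""
    return [np.quantile([best_quality_at(tr, t) for tr in traces], p)
            for t in time_grid]

# Illustrative traces from three runs:
traces = [[(0.1, 5.0), (1.0, 2.0), (10.0, 1.0)],
          [(0.2, 6.0), (2.0, 3.0)],
          [(0.1, 4.0), (5.0, 1.5)]]
print(sqt_quantile(traces, [0.5, 1.0, 5.0, 10.0], 0.5))  # median SQT
```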
Empirically Measuring RTDs

Except for very simple algorithms, such as Uninformed Random Picking, it is typically not possible to analytically determine RTDs for a given Las Vegas algorithm. Hence, the true RTDs characterising a Las Vegas algorithm's behaviour are typically approximated by empirical RTDs. For a given instance π of a decision problem, the empirical RTD of an LVA A can be easily determined by performing k independent runs of A on π and recording for each successful run the time required to find a solution. The empirical run-time distribution is given by the cumulative distribution function associated with these observations. Each run corresponds to drawing a sample from the true RTD of A on π, and clearly, the more runs are performed, the better the empirical RTD obtained from these samples will approximate the true underlying RTD. For algorithms that are known to be either complete or probabilistically approximately complete (PAC), it is often desirable (although not always practical) to terminate each run only after a solution has been found; this way, a complete empirical approximation of A's RTD on π can be obtained. In cases where not all runs are successful, either because the algorithm is essentially incomplete or because some runs were terminated before a solution could be found, a truncated approximation of the true RTD
can be obtained from the successful runs. In practice, a cutoff time is almost always used as the criterion for terminating unsuccessful runs. More formally, let k be the total number of runs performed with a cutoff time t′, and let k′ ≤ k be the number of successful runs, that is, runs during which a solution was found. Furthermore, let rt(j) denote the run-time for the jth entry in a list of all successful runs, ordered according to increasing run-times. The cumulative empirical RTD is then defined by P̂s(RT ≤ t) := #{j | rt(j) ≤ t}/k. The ratio sr := k′/k is called the success ratio of A on π with cutoff t′. For algorithms that are known or suspected to be essentially incomplete, the success ratio converges to the asymptotic maximal success probability of A on the given problem instance π, which is formally defined as p∗s := limt→∞ Ps(RTA,π ≤ t). For sufficiently high cutoff times, the empirically determined success ratio can give useful approximations of p∗s. Unfortunately, in the absence of theoretical knowledge on the success probability or the speed of convergence of the success ratio, the decision whether a given cutoff time is high enough to obtain a reasonable estimate of the success probability needs to be based on educated guessing. In practice, the following criterion is often useful in situations where a reasonably high number of runs (typically between 100 and 10 000) can be performed: when increasing a given cutoff t′ by a factor of τ (where τ is typically between 10 and 100) does not result in an increased success ratio, it is assumed that the asymptotic behaviour of the algorithm is being observed and that the observed success ratio is a reasonably good approximation of the asymptotic success probability. Note that in these situations, as well as in cases where success ratios equal to one cannot be achieved for practical reasons (e.g., due to limited computing resources), certain RTD statistics, in particular all quantiles lower than sr, are still available. Other RTD statistics, particularly the mean time for finding a solution, can be estimated using the following approach: if for cutoff time t′, k′ out of k runs were successful, the probability for any individual run with cutoff t′ to succeed can be estimated by the success ratio sr := k′/k. Consequently, for n successive (or parallel) independent runs with cutoff t′, the probability that at least one of these runs is successful is 1 − (1 − sr)^n. Using this result, quantiles higher than sr can be estimated for the variant of the respective algorithm that re-initialises the search after each time interval of length t′ (static restart). Furthermore, the expected time for finding a solution can be estimated from the mean time over the successful runs by taking into account the expected number of runs required to find a solution as well as the mean run-time of the failed runs (see also Parkes and Walser [1996]):

Ê(RT) = Ê(RTs) + (1/sr − 1) · Ê(RTf),    (4.1)
Figure 4.4 Run-time data for WalkSAT/SKC, a prominent SLS algorithm for SAT, applied to a hard Random 3-SAT instance using an approx. optimal noise setting, based on 1 000 tries. Left: bar diagram of the run-times rt(j); right: the corresponding RTD.
where Ê(RTs) := 1/k′ · ∑_{j=1}^{k′} rt(j) is the average run-time of a successful run, and Ê(RTf) is the average run-time of a failed run. Using a static restart mechanism with a fixed cutoff time t′ results in Ê(RTf) := t′. Note that in this case, Ê(RT) depends on the cutoff time t′; in fact, RTD information can be used for determining values of t′ that lead to optimal performance in the sense of minimal expected solution time Ê(RT) (cf. Section 4.4).
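The following sketch combines these estimators: it computes an empirical RTD from the run-times of the successful runs, the success ratio sr, and the Equation 4.1 estimate of Ê(RT) under a static restart strategy with cutoff t′. Function names and the sample data are illustrative only.

```python
import numpy as np

def empirical_rtd(successful_runtimes, k_total):
    """Empirical RTD: t -> estimated Ps(RT <= t).

    successful_runtimes -- run-times of the k' successful runs
    k_total             -- total number of runs k (incl. failed ones)
    """
    rts = np.sort(np.asarray(successful_runtimes, dtype=float))
    return lambda t: np.searchsorted(rts, t, side='right') / k_total

def expected_runtime_static_restart(successful_runtimes, k_total, cutoff):
    """Estimate E(RT) for static restarts with cutoff t' (Equation 4.1).

    Assumes every failed run consumed the full cutoff time, i.e.
    E(RT_f) := t'. Note: for n independent runs with cutoff t', the
    probability that at least one succeeds is 1 - (1 - sr)**n.
    """
    sr = len(successful_runtimes) / k_total    # success ratio
    e_rt_s = float(np.mean(successful_runtimes))
    return e_rt_s + (1.0 / sr - 1.0) * cutoff

# Illustrative data: 8 of 10 runs succeeded before a cutoff of 100 time units.
succ = [3.2, 7.5, 11.0, 15.4, 22.9, 30.1, 48.6, 77.3]
rtd = empirical_rtd(succ, k_total=10)
print(rtd(20.0))                                         # estimate of Ps(RT <= 20)
print(expected_runtime_static_restart(succ, 10, 100.0))  # estimate of E(RT)
```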
Example 4.6 Raw Run-Time Data vs Empirical RTDs
Figure 4.4 (left) shows the raw data from running WalkSAT/SKC, a prominent SLS algorithm for SAT, on a hard problem instance with 100 variables and 430 clauses; each vertical line represents one run of the algorithm, and the height of the line indicates the CPU time needed for finding a solution. The right side of the same figure shows the corresponding RTD as a cumulative probability distribution curve (t, P̂s(RT ≤ t)). Note that the run-time is extremely variable, which is typical for SLS algorithms applied to hard combinatorial problems. Clearly, the RTD representation gives a much more informative picture of the run-time behaviour of the algorithm than simple descriptive statistics summarising the data shown on the left side of Figure 4.4, and, as we will see later in this chapter, it also provides the basis for more sophisticated analyses of algorithmic behaviour. (The graphs shown in Figure 4.4 are based on the same data used in Example 4.1 on page 159f.)
For empirically approximating the bivariate RTD of an optimisation LVA A′ on a given problem instance π′, a slightly different approach is used. During each run of A′, whenever the incumbent solution (i.e., the best candidate solution found during this run) is improved, the quality of the improved incumbent solution and the time at which the improvement was achieved are recorded in a solution quality trace. The empirical RTD is derived from the solution quality traces obtained over multiple independent runs of A′ on π′. Formally, let k be the number of runs performed and let sq(t, j) denote the quality of the best solution found in run j up to time t. Then the cumulative empirical run-time distribution of A′ on π′ is defined by P̂s(RT ≤ t′, SQ ≤ q′) := #{j | sq(t′, j) ≤ q′}/k. Qualified RTDs and SQDs as well as SQT and RTQ data and, where appropriate, asymptotic SQDs and TTDs can also be easily derived from the solution quality traces. With regard to the use of cutoff times and their impact on the completeness of the empirical RTDs, considerations very similar to those discussed for the case of decision problems apply.
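A minimal sketch of this procedure, using the same hypothetical trace format as in the earlier SQT sketch: each trace lists the (time, quality) pairs at which the incumbent solution improved, and the bivariate RTD is estimated by counting the runs whose incumbent reached quality q′ by time t′. All names are illustrative.

```python
def empirical_bivariate_rtd(traces, t, q):
    """Estimate Ps(RT <= t, SQ <= q) from k solution quality traces.

    traces -- one trace per run: a time-sorted list of (time, quality)
              pairs, recorded whenever the incumbent improved.
    """
    hits = sum(1 for trace in traces
               if any(time <= t and quality <= q for time, quality in trace))
    return hits / len(traces)

def qualified_rtd(traces, time_grid, q):
    """Qualified RTD for quality bound q: a cross-section over time."""
    return [empirical_bivariate_rtd(traces, t, q) for t in time_grid]

def sqd(traces, t, quality_grid):
    """SQD for time bound t: the orthogonal cross-section over quality."""
    return [empirical_bivariate_rtd(traces, t, q) for q in quality_grid]
```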
CPU Time vs Operation Counts

Up to this point, and consistent with a large part of the empirical analyses of algorithmic performance in the literature, we have used CPU time for measuring and reporting the run-time of algorithms. Obviously, a CPU time measurement is always based on a concrete implementation and run-time environment (i.e., machine and operating system). However, it is often more appropriate, especially in the context of comparative studies of algorithmic performance, to measure run-time in a way that abstracts from these factors and that facilitates comparisons of empirical results across various platforms. This can be done using operation counts, which reflect the number of operations that are considered to contribute significantly to an algorithm's performance, and cost models, which relate the cost (typically in terms of run-time per execution) of these operations to each other, or in absolute terms to CPU time for a given implementation and run-time environment [Ahuja and Orlin, 1996]. Generally, using operation counts and an associated cost model rather than CPU time measurements as the basis for empirical studies often gives a clearer and more detailed picture of algorithmic performance. This approach is especially useful for comparative studies involving various algorithms or different variants of one algorithm. Furthermore, it allows one to explicitly address trade-offs in the design of SLS algorithms, such as complexity vs efficacy of different types of local search steps. To make a clear distinction between run-time measurements corresponding to actual CPU times and abstract run-times measured in operation counts, we refer to the latter as run-lengths. Similarly, we refer to RTDs obtained
Figure 4.5 RTDs (left) and RLDs (right) for WalkSAT/SKC, a prominent SLS algorithm for SAT, applied to three Uniform Random 3-SAT instances of varying difficulty, based on 1 000 runs per instance (using an approx. optimal noise parameter setting).
from run-times measured in terms of operation counts as run-length distributions or RLDs. For SLS algorithms, a commonly used operation count is the number of local search steps. In the case of pure SLS methods, such as Iterative Improvement, there is only one type of local search step, and while the cost or time complexity of such a step typically depends on the size and other properties of the given problem instance, in many cases it is constant or close to constant within and between runs of the algorithm on the same instance. In this situation, measuring run-time in terms of local search steps as elementary operations is often the method of choice; furthermore, run-times measured in terms of CPU time and run-lengths based on local search steps as basic operations are then related to each other by scaling with a constant factor.

Example 4.7 RTDs vs RLDs
Figure 4.5 shows RTD and RLD data for the same experiments (solving three Uniform Random 3-SAT instances with 100 variables and 430 clauses each using WalkSAT/SKC, a prominent SLS algorithm for SAT). The operations counted for obtaining RLDs are local search steps; in the case of WalkSAT/SKC, each local search step corresponds to flipping the truth value assigned to one propositional variable. Note that, when comparing the RTDs and the corresponding RLDs in a semi-log plot, both distributions always have the same shape. This reflects the fact that the CPU time per step is roughly constant. However, closer examination of the RTD and RLD data reveals that the CPU time per step differs between the three instances; the reason for this is the fact that the hard problem was solved on a faster machine
than the medium and easy instances. In this example, the CPU time per search step is 0.027ms for the hard instance, and 0.035ms for the medium and easy instances; the time required for search initialisation is 0.8ms for the hard instance and 1ms for the medium and easy instances. These differences result solely from the difference in CPU speed between the two machines used for running the respective experiments.
In the case of hybrid SLS algorithms characterised by GLSM models with multiple frequently used states, such as Iterated Local Search (cf. Chapter 2, Section 2.3 and Chapter 3, Section 3.3), the search steps for each state of the GLSM model may have significantly different execution costs (i.e., run-time per step) and, consequently, they should be counted separately. By weighting these different operation counts relative to each other, using an appropriate cost model, it is typically possible to aggregate them into run-lengths or RLDs. Alternatively, or in situations where the cost of local search steps can vary significantly within a run of the algorithm or between runs on the same instance, it may be necessary to use finer-grained elementary operations, such as the number of evaluations of the underlying objective function, or the number of updates of internal data structures used for implementing the algorithm’s step function.
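As a simple illustration of such a cost model, the sketch below aggregates counts of different operation types into a single weighted run-length; the operation names and weights are invented and would have to be calibrated for a concrete implementation and run-time environment.

```python
# Hypothetical cost model: cost of each operation type relative to one
# basic local search step (the reference operation).
cost_model = {
    'ls_step': 1.0,         # basic local search step
    'perturbation': 3.5,    # e.g., a more expensive ILS perturbation step
    'objective_eval': 0.2,  # finer-grained alternative counter
}

def run_length(op_counts, cost_model):
    """Aggregate per-operation counts into a single weighted run-length."""
    return sum(cost_model[op] * n for op, n in op_counts.items())

print(run_length({'ls_step': 120_000, 'perturbation': 400}, cost_model))
```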
4.3 RTD-Based Analysis of LVA Behaviour

Having introduced RTDs and related concepts in the previous section, we now show how these can be used for analysing and characterising the behaviour and relative performance of Las Vegas algorithms. We will start with the quantitative analysis of LVA behaviour based on single RTDs; next, we will show how this technique can be generalised to cover sets and distributions of problem instances. We will then explain how RTDs can be used for the comparative analysis of several algorithms, before returning to individual algorithms, for which we discuss advanced analysis techniques, including the empirical analysis of asymptotic behaviour and stagnation.
Basic Quantitative Analysis Based on Single RTDs

When analysing or comparing the behaviour of Las Vegas algorithms, the empirical RTD (or RLD) data can be used in different ways. In many cases, graphical representations of empirical RTDs provide a good starting point. As an example,
Figures 4.6 and 4.7 show the RTD for the hard problem instance from Figure 4.5 (page 170) in three different views. Compared to standard representations, semi-log plots (as shown on the right side of Figure 4.6) give a better view of the distribution over its entire range; this is especially relevant for RTDs of SLS algorithms, which often show an extreme variability in run-time. Also, when using semi-log plots to compare RTDs, uniform performance differences characterised by a constant factor can be easily detected, as they correspond to simple shifts along the horizontal axis (for an example, see Figure 4.5, page 170). On the other hand, log-log plots of an RTD or its associated failure rate decay function,
Figure 4.6 Left: RLD for WalkSAT/SKC, a prominent SLS algorithm for SAT, on a hard Random 3-SAT instance using an approx. optimal noise parameter setting. Right: Semi-log plot of the same RLD.
Figure 4.7 Log-log plot of the same RLD as in Figure 4.6 (left) and log-log plot of the corresponding failure probability over time (right).
1 − rtd(t), are often very useful for examining the behaviour of a given Las Vegas algorithm for extremely short or extremely long run-times (cf. Figure 4.7). While graphical representations of RTDs are well suited for investigating and describing the qualitative behaviour of Las Vegas algorithms, quantitative analyses are usually based on summarising the RTD data with basic descriptive statistics. For our example, some of the most common standard descriptive statistics, such as the empirical mean, standard deviation, minimum, maximum and some quantiles, are reported in Table 4.1. Note again the huge variability of the data, as indicated by the large standard deviation and quantile ratios. The latter, like the variation coefficient vc := stddev/mean, have the advantage of being invariant under multiplication of the data by a constant, which, as we will see later, is often advantageous when comparing RTDs. In the case of optimisation LVAs, analogous considerations apply to graphical representations and standard descriptive statistics of qualified RTDs for various solution quality bounds. Similarly, different graphical representations and summary statistics can be used for analysing and characterising empirical SQDs for various run-time bounds, or time-dependent statistics of solution quality; this approach is more commonly followed in the literature, but is not always preferable to studying qualified RTDs. Generally, it should be noted that in order to directly obtain sufficiently stable estimates of summary statistics, the same number of test runs has to be performed as for measuring reasonably accurate empirical RTDs. Thus, measuring RTDs does not cause a computational overhead in data acquisition compared to measuring only a few simple summary statistics, such as averages and empirical standard deviations. At the same time, arbitrary quantiles and other descriptive statistics can be easily calculated from the RTD data. Furthermore, in the case of optimisation LVAs, bivariate RTDs, qualified RTDs, SQDs and SQTs can all be easily determined from the same solution quality traces without significant overhead in computation time.
mean      57 606.23        median           38 911
min       107              q0.25; q0.1      16 762; 5 332
max       443 496          q0.75; q0.9      80 709; 137 863
stddev    58 953.60        q0.75/q0.25      4.81
vc        1.02             q0.9/q0.1        25.86
Table 4.1 Basic descriptive statistics for the RLD shown in Figures 4.6 and 4.7; qx denotes the x-quantile; the variation coefficient vc := stddev/mean and the quantile ratios qx /q1−x are measures for the relative variability of the run-length data.
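All of the statistics reported in Table 4.1 can be computed directly from the raw sample of run-lengths; the following minimal sketch (with synthetic stand-in data and illustrative names) shows how, including the scale-invariant variation coefficient and quantile ratios.

```python
import numpy as np

def rtd_summary(run_lengths):
    """Descriptive statistics of the kind reported in Table 4.1."""
    x = np.asarray(run_lengths, dtype=float)
    q = lambda p: np.quantile(x, p)
    return {
        'mean': x.mean(), 'stddev': x.std(ddof=1),
        'min': x.min(), 'max': x.max(), 'median': q(0.5),
        'vc': x.std(ddof=1) / x.mean(),  # variation coefficient
        'q75/q25': q(0.75) / q(0.25),    # quantile ratios are invariant
        'q90/q10': q(0.90) / q(0.10),    # under scaling of the data
    }

# Synthetic stand-in for a measured run-length sample:
rng = np.random.default_rng(0)
print(rtd_summary(rng.exponential(scale=50_000, size=1000)))
```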
Because qualified RTDs, SQDs and SQTs merely present different views on the same underlying bivariate RTD, and since similar considerations apply to all of these, in the following discussion of empirical methodology we will often just explicitly mention RTDs. Because of the high variability in run-time over multiple runs on the same problem instance that is typical of many SLS algorithms, empirical estimates of mean run-time can be rather unstable, even when obtained from relatively large numbers of successful runs. This potential problem can be alleviated by using quantiles and quantile ratios instead of means and standard deviations when summarising RTD data with simple descriptive statistics.
Basic Quantitative Analysis for Ensembles of Instances

In many applications, the behaviour of a given algorithm needs to be tested on a set of problem instances. In principle, the same method as described above for single instances can be applied: RTDs are measured for each instance, and the corresponding sets of graphs and/or associated descriptive statistics are reported. Often, LVA behaviour is analysed for a set of fairly similar instances (such as instances of the same type but different size, or instances from the same random instance distribution). In this case, the RTDs will often have similar shapes (particularly as seen in a semi-log plot) or share prominent qualitative properties, such as being uni- or bi-modal, or having a very prominent right tail. A simple example can be seen in Figure 4.8 (left side), where very similarly shaped RTDs are obtained when applying the same SLS algorithm for SAT (WalkSAT/SKC) to three randomly generated instances from the same instance
Figure 4.8 Left: RLDs for WalkSAT/SKC (using an approx. optimal noise parameter setting), a prominent SLS algorithm for SAT, applied to three hard Random 3-SAT instances. Right: Distribution of median local search cost for the same algorithm across a set of 1 000 Uniform Random 3-SAT instances.
distribution (Uniform Random 3-SAT with 100 variables and 430 clauses). In such cases, a representative or typical instance can be selected for presentation or further analysis, while the analogous data for the other instances are only briefly summarised. It is very important, however, not to naïvely assume properties of, or similarities between, RTDs based on a few selected examples only, but to carefully test such assumptions by manual or automated analysis of all or sufficiently many RTDs. In Section 4.4, we will demonstrate how, in certain cases, the latter can be done in an elegant and informative way by using functional approximations of RTDs and statistical goodness-of-fit tests. For larger sets of instances, such as the sets obtained from sampling random distributions of problem instances, it becomes important to characterise the performance of a given algorithm on individual instances as well as across the entire ensemble. Often (but not always!) when analysing the behaviour of reasonably optimised, probabilistically approximately complete SLS algorithms in such situations, there is a fairly simple scaling relationship between the RTDs for individual problem instances: given two instances and a desired probability of finding a solution, the ratio of the run-times required for achieving this solution probability on the two instances is roughly constant. This is equivalent to the observation that in a semi-log plot, the two corresponding RTDs essentially differ only by a shift along the time axis. If this is the case, the performance of the given algorithm across the ensemble can be summarised by one RTD for an arbitrarily chosen instance from the ensemble and the distribution of the mean (or any quantile) of the individual RTDs across the ensemble. The latter type of distribution intuitively captures the cost of solving instances across the set; in the past it has often been referred to as a 'hardness distribution'. It should be noted, however, that without further knowledge, the underlying notion of hardness is entirely relative to the algorithm used rather than intrinsic to the problem instance, and hence this type of distribution is technically more appropriately termed a search cost distribution (SCD). An example of such an SCD, here for an SLS algorithm for SAT (WalkSAT/SKC) applied to a set of 1 000 Uniform Random 3-SAT instances with 100 variables and 430 clauses each, is shown in Figure 4.8 (right side). In reality, the simple multiplicative scaling relationship between any two instances of a given ensemble will hardly ever hold exactly. Hence, depending on the degree and nature of variation between the RTDs for the given ensemble, it is often reasonable and appropriate to report cost distributions along with a small set of RTDs that have been carefully selected from the ensemble such that they representatively illustrate the variation of the RTDs across the set. Sometimes, distributions (or statistics) of other basic descriptive RTD statistics across the ensemble of instances, for example, distributions of variation coefficients or quantile ratios, can be useful for obtaining a more detailed picture of the algorithm's behaviour on the given ensemble. It can also be very informative
to investigate the correlation between various features of the RTDs across the ensemble; specifically, the correlation between the median (or mean) and some measure of variation can be very interesting for understanding LVA behaviour. Finally, it should be mentioned that when dealing with sets of instances that have been obtained by systematically varying some parameter, such as problem size, it is natural to study characteristics and properties of the corresponding RTDs (or cost distributions) as a function of this parameter. Otherwise, considerations similar to those discussed above for ensembles of instances apply. Again, choosing an appropriate graphical representation, such as a semi-log plot for the functional dependence of mean run-time on problem size, is often the key to easily detecting interesting behaviour (e.g., exponential scaling).
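A search cost distribution of the kind shown in Figure 4.8 (right side) can be computed by reducing each instance's RTD to a single summary statistic and collecting these values across the ensemble. The following is a minimal sketch; the function names and the synthetic ensemble data are purely illustrative.

```python
import numpy as np

def search_cost_distribution(per_instance_runtimes, statistic=np.median):
    """Search cost distribution (SCD): one summary statistic per instance.

    per_instance_runtimes -- one array of measured run-times (one entry
                             per run) for each instance in the ensemble
    Returns the sorted per-instance costs, i.e. the empirical SCD.
    """
    costs = np.array([statistic(rts) for rts in per_instance_runtimes])
    return np.sort(costs)

# Synthetic ensemble: 100 instances, 25 runs each.
rng = np.random.default_rng(1)
ensemble = [rng.lognormal(mean=rng.normal(8, 1), sigma=1.0, size=25)
            for _ in range(100)]
scd = search_cost_distribution(ensemble)
# Plotting (scd[i], (i+1)/len(scd)) yields the cumulative SCD curve,
# analogous to Figure 4.8 (right side).
```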
In Depth: Benchmark Sets

The selection of benchmark instances is an important factor in the empirical analysis of an algorithm's behaviour, and the use of inadequate benchmark sets can lead to questionable results and misleading conclusions. The criteria for benchmark selection depend significantly on the problem domain under consideration, on the hypotheses and goals of the empirical study, and on the algorithms being analysed. There are, however, some general issues and principles, which will be discussed in the following. Typically, benchmark sets should mainly consist of problem instances that are intrinsically hard or difficult to solve for a broad range of algorithms. While easy instances can sometimes be useful for illustrating or investigating properties of specific algorithms (for example, polynomially solvable instances that are hard for certain, otherwise high-performing algorithms), they should not be used as general benchmark problems, as this can easily lead to heavily biased evaluations and assessments of the usefulness of specific algorithms. Similar considerations apply to problem size; small problem instances can sometimes lead to atypical SLS behaviour that does not generalise to larger problem sizes. To avoid such problems and to facilitate studies on the scaling of SLS performance, it is generally advisable to include problem instances of different sizes in benchmark sets. Furthermore, benchmark sets should contain a diverse collection of problem instances. An algorithm's behaviour can substantially depend on specific features of problem instances, and in many cases at least some of these features are not known a priori. Using a benchmark set comprising a diverse range of problem instances reduces the risk of incorrectly generalising from behaviour or performance results that only apply to a very limited class of problem instances. We distinguish three types of benchmark instances: instances obtained from real-world applications, artificially crafted problem instances and randomly generated instances. Some combinatorial problems have no real-world applications; where real-world problem instances are available, however, they often provide the most realistic test-bed for algorithms of potential practical interest. Artificially crafted problem instances can be
very useful for studying specific properties or features of an algorithm; they are also often used in situations where real-world instances are not available or unsuitable for a specific study (e.g., because they are too large, too difficult to solve, or only very few real-world instances are available). Random problem instance generators have been developed and widely used in many domains, including SAT and TSP. These generators effectively sample from distributions of problem instances with controlled syntactic properties, such as instance size or expected number of solutions. They offer the advantage that large test-sets can be generated easily, which facilitates the application of statistical tests. However, basing the evaluation of an algorithm on randomly generated problem instances only carries the risk of obtaining results that are misleading or meaningless w.r.t. practical applications. Ideally, benchmark sets used for empirical studies should comprise instances of all three types. In some cases, it can also be beneficial to additionally use suitably encoded problem instances from other domains. The performance of SAT algorithms, for example, is often evaluated on SAT-encoded instances from domains such as graph colouring, planning or circuit verification (see, e.g., Hoos and Stützle [2000a]). In these cases, it is often important to ensure that the respective encoding schemes do not produce undesirable features that, for instance, may render the resulting instances abnormally difficult for the algorithm(s) under consideration. In principle, artificially crafted and randomly generated problem instances can offer the advantage of carefully controlled properties; in reality, however, the behaviour of SLS algorithms is often affected by problem features that are not well understood or difficult to control. (This issue will be further discussed in Chapter 5.) Randomly generated instance sets often show a large variation w.r.t. their non-controlled features, leading to the kind of diversity in benchmark sets that we have advocated above. On the other hand, this variation often also causes extreme differences in difficulty between instances within the same sample of problem instances (see, e.g., Hoos [1998], Hoos and Stützle [1999]). This can easily lead to substantial differences in difficulty (as well as in other properties) between test-sets sampled from the same instance distribution. As a consequence, comparative analyses should always evaluate all algorithms on identical test-sets. To facilitate the reproducibility of empirical analyses and the comparability of results between studies, it is important to use established benchmark sets and to make newly created test-sets available to other researchers. In this context, public benchmark libraries play an important role. Such libraries exist for many domains; widely known examples include TSPLIB (containing a variety of TSP and TSP-related instances), SATLIB (which includes a collection of benchmark instances for SAT), ORLIB (comprising test instances for a variety of problems from Operations Research), TPTP (a collection of problem instances for theorem provers) and CSPLIB (a benchmark library for constraints). Good benchmark libraries are regularly updated with new, challenging problems. Using severely outdated or static benchmark libraries for empirical studies gives rise to various well-known pitfalls [Hooker, 1994; 1996] and should therefore be avoided as much as possible.
Furthermore, good benchmark libraries will provide descriptions and explanations of all problem instances offered, ideally accompanied by references to the relevant literature. Generally, a good understanding of all benchmark instances used in the context of an empirical study, regardless of their source, is often crucial for interpreting the results correctly and conclusively.
Comparing Algorithms Based on RTDs

Empirical investigations of algorithmic behaviour are frequently performed in the context of comparative studies, often with the explicit or implicit goal of establishing the superiority of a new algorithm over existing techniques. In this situation, given two Las Vegas algorithms for a decision problem, one would empirically show that one of them consistently gives a higher solution probability than the other. Likewise, for an optimisation problem, the same applies for a specific (e.g., the optimal) solution quality or for a range of solution qualities. Formally, this can be captured by the concept of probabilistic domination, defined in the following way:

Definition 4.8 Probabilistic Domination
Let π ∈ Π be an instance of a decision problem Π, and let A and B be two Las Vegas algorithms for Π. A probabilistically dominates B on π if, and only if, ∀t: Ps(RTA,π ≤ t) ≥ Ps(RTB,π ≤ t) and ∃t: Ps(RTA,π ≤ t) > Ps(RTB,π ≤ t). Similarly, for an instance π′ ∈ Π′ of an optimisation problem Π′ and optimisation LVAs A′ and B′ for Π′, A′ probabilistically dominates B′ on π′ for solution quality less than or equal to q if, and only if, ∀t: Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ q) ≥ Ps(RTB′,π′ ≤ t, SQB′,π′ ≤ q) and ∃t: Ps(RTA′,π′ ≤ t, SQA′,π′ ≤ q) > Ps(RTB′,π′ ≤ t, SQB′,π′ ≤ q). A′ probabilistically dominates B′ on π′ if, and only if, A′ probabilistically dominates B′ on π′ for arbitrary solution quality bounds q.
Remark: A probabilistic domination relation holds between two Las Vegas algorithms on a given problem instance if, and only if, their respective (qualified) RTDs do not cross each other. This provides a simple method for graphically checking probabilistic domination between two LVAs on individual problem instances.
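This check translates directly into a simple computation on empirical RTDs: evaluate both empirical CDFs on a common time grid and test whether one weakly dominates the other everywhere, with strict inequality somewhere. The following is a minimal sketch with illustrative names; note that sampling noise can make empirical RTDs cross even when the true RTDs do not, so the outcome should be complemented by statistical testing.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of a run-time sample, evaluated on a time grid."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side='right') / len(s)

def probabilistic_domination(rts_a, rts_b):
    """Return 'A', 'B', or None if the empirical RTDs cross each other."""
    # Step functions only change at observed run-times, so comparing the
    # two CDFs at all observed points suffices.
    grid = np.unique(np.concatenate([rts_a, rts_b]))
    fa, fb = ecdf(rts_a, grid), ecdf(rts_b, grid)
    if np.all(fa >= fb) and np.any(fa > fb):
        return 'A'
    if np.all(fb >= fa) and np.any(fb > fa):
        return 'B'
    return None  # the RTDs cross: no probabilistic domination
```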
In practice, performance comparisons between Las Vegas algorithms are complicated by the fact that even for a single problem instance, a probabilistic domination does not always hold. This situation is characterised by the occurrence of cross-overs between the corresponding RTDs, indicating that which of the two algorithms performs better, that is, obtains higher solution probabilities (for a given solution quality bound), depends on the time the algorithm is allowed to run.
Statistical tests can be used to assess the significance of performance differences. In the simplest case, the Mann-Whitney U-test (or, equivalently, the Wilcoxon rank sum test) can be applied [Sheskin, 2000]; this test determines whether the medians of two samples are equal, and hence a rejection indicates significant performance differences. This test can also be used to determine whether the median solution qualities achieved by two SLS optimisation algorithms are identical. (The widely used t-test generally fulfils a similar purpose, but requires the assumption that the given samples are normally distributed with identical variance; since this assumption is often violated in the context of the empirical analysis of SLS behaviour, in many cases the t-test is not applicable.) The more specific hypothesis that the theoretical RTDs (or SQDs) of two algorithms are identical can be tested using the Kolmogorov-Smirnov test for two independent samples [Sheskin, 2000]. One important question when assessing the statistical significance of performance differences observed between algorithms is that of sample size: how many runs should be performed for measuring the respective empirical RTDs? Generally, the precision of statistical tests, that is, their ability to correctly distinguish situations in which the given null hypothesis is correct from those where it is incorrect, crucially depends on sample size. Table 4.2 shows the performance differences between two given RTDs that can be detected by the Mann-Whitney U-test for standard significance levels and power values, depending on sample size. (Note that the significance level and power value indicate the maximum probabilities that the test incorrectly rejects or accepts the null hypothesis that the medians of the given RTDs are equal, respectively.)

sign. level 0.05, power 0.95         sign. level 0.01, power 0.99
sample size     m1/m2                sample size     m1/m2
3 010           1.1                  5 565           1.1
1 000           1.18                 1 000           1.24
122             1.5                  225             1.5
100             1.6                  100             1.8
32              2                    58              2
10              3                    10              3.9
Table 4.2 Performance differences detectable by the Mann-Whitney U-test for various sample sizes (runs per RTD); m1/m2 denotes the ratio between the medians of the two given RTDs. (The values in this table have been obtained using a standard procedure based on adjusting the statistical power of the two-sample t-test to the Mann-Whitney U-test using a worst-case Pitman asymptotic relative efficiency (ARE) value of 0.864.)
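Both tests are available in standard statistics software; as a minimal sketch using SciPy (the run-time samples below are placeholders, not the measurements reported in this chapter):

```python
from scipy import stats

# Placeholder run-time samples for two algorithms on the same instance:
rts_a = [12.1, 15.3, 9.8, 22.0, 13.4, 17.9, 11.2, 19.5]
rts_b = [14.0, 25.7, 21.3, 30.2, 18.8, 27.5, 24.1, 16.6]

# Mann-Whitney U-test: null hypothesis that the medians are equal.
u_stat, p_mw = stats.mannwhitneyu(rts_a, rts_b, alternative='two-sided')

# Kolmogorov-Smirnov test: null hypothesis that both samples stem from
# the same underlying distribution.
ks_stat, p_ks = stats.ks_2samp(rts_a, rts_b)

print(f"Mann-Whitney U: p = {p_mw:.4f}; Kolmogorov-Smirnov: p = {p_ks:.4f}")
```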
Figure 4.9 Qualified RTDs for two SLS algorithms for the TSP (ILS and MMAS; see Example 4.8) applied to a standard benchmark instance, under the requirement of finding a solution of optimal quality. The two RTDs cross over between 20 and 30 CPU seconds.
Example 4.8 Comparative RTD Analysis
Figure 4.9 shows the qualified RTDs for two SLS algorithms for the TSP, MAX–MIN Ant System (MMAS) and Iterated Local Search (ILS), under the requirement of finding a solution of optimal quality for TSPLIB instance lin318 with 318 vertices; each RTD is based on 1 000 runs of the respective algorithm. Although the Mann-Whitney U-test rejects the null hypothesis that the medians of the two RTDs are equal at a significance level α = 0.05 (the p-value is 6.4 · 10−5), taking into consideration the sample size of 1 000 runs per RTD, the difference between the medians is slightly too small to be considered significant at a power of 0.8. On the other hand, the significance of the obvious differences between the two distributions is confirmed by the Kolmogorov-Smirnov test, which rejects the null hypothesis that the observed run-times for the two algorithms stem from the same distribution at a significance level of α = 0.05 (the p-value is ≤ 2.2 · 10−16). Clearly, there is no probabilistic domination between the two algorithms. The qualified RTD curves cross over at one specific point between 20 and 30 CPU seconds; ILS gives a higher solution probability than MMAS for shorter runs, whereas MMAS is more effective for longer runs. Both algorithms eventually find optimal solutions in all runs and hence do not show any evidence of essentially incomplete behaviour on this problem instance. Interestingly, it appears that MMAS has practically no chance of finding an optimal solution in less than 10 CPU seconds, while ILS finds optimal solutions with a small probability after only 0.2 CPU seconds. (This salient difference in performance is partly explained by the fact that population-based
algorithms such as MMAS typically incur a certain overhead from maintaining multiple candidate solutions.)
Comparative Analysis for Ensembles of Instances

As previously mentioned, empirical analyses of LVA behaviour are mostly performed on ensembles of problem instances. For comparative analyses, in principle this can be done by comparing the respective RTDs on each individual problem instance. Ideally, when dealing with two algorithms A and B, one would hope to observe probabilistic domination of A by B (or vice versa) on every instance of the ensemble. In practice, probabilistic domination does not always hold for all instances, and even where it holds, it may not be consistent across a given set of instances. Hence, an instance-based analysis of probabilistic domination (based on RTDs) can be used to partition a given problem ensemble into three subsets: (i) those on which A probabilistically dominates B, (ii) those on which B probabilistically dominates A, and (iii) those for which probabilistic domination is not observed, that is, for which A's and B's RTDs cross each other. The relative sizes of these partitions give a rather realistic and detailed picture of the algorithms' relative performance on the given set of instances.
Chapter 4 Empirical Analysis of SLS Algorithms
by one point in the plot, whose coordinates correspond to the performance measure for A and B applied to that instance. Quantitatively, the correlation can be summarised using the empirical correlation coefficient. When the nature of an observed performance correlation seems to be regular (e.g., a roughly linear trend in the scatter plot), a simple regression analysis can be used to model the corresponding relationship in the algorithms' performance. To test whether the correlation between the performance of two algorithms is significant, non-parametric tests like Spearman's rank order test or Kendall's tau test can be employed [Sheskin, 2000]. These tests determine whether there is a significant monotonic relationship in the performance data. They are preferable to tests based on the Pearson product-moment correlation coefficient, which require the assumption that the two random variables underlying the performance data stem from a bivariate normal distribution.
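For this kind of ensemble-level analysis, the paired test and the rank correlation test can likewise be computed with SciPy; a minimal sketch with synthetic per-instance median run-times (all names and data are illustrative):

```python
import numpy as np
from scipy import stats

# Synthetic per-instance median run-times for two algorithms A and B
# on the same 100-instance test-set:
rng = np.random.default_rng(2)
med_a = rng.lognormal(mean=3.0, sigma=1.0, size=100)
med_b = med_a * rng.lognormal(mean=0.3, sigma=0.5, size=100)  # correlated

# Wilcoxon matched-pairs signed-rank test on the paired differences:
w_stat, p_w = stats.wilcoxon(med_a, med_b)

# Spearman's rank order test for a monotonic performance correlation:
rho, p_rho = stats.spearmanr(med_a, med_b)

print(f"Wilcoxon: p = {p_w:.3g}; Spearman: rho = {rho:.2f}, p = {p_rho:.3g}")
```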
Example 4.9 Comparative Analysis on Instance Ensembles
Figure 4.10 shows the correlation between the performance of an ILS algorithm and an ACO algorithm for the TSP, applied to a set of 100 randomly generated Euclidean TSP instances (the algorithms and the problem class are described in Chapter 8). The ILS algorithm has a lower median run-time than the ACO algorithm on 66 of the 100 problem instances; this performance difference is statistically significant, since the Wilcoxon matched-pairs signed-rank test rejects the null hypothesis that the performance of the two algorithms is equal at a significance level of α = 0.05 (the p-value is 7 · 10⁻⁵). (It may be noted that, based on the sample size of 10 runs per instance used for the RTDs underlying each median value, performance differences of less than a factor of three cannot be assumed to be statistically significant; this follows from a power analysis of the Mann-Whitney U-test used for assessing such performance differences, with α = 0.05 and a power of 0.95.) The median run-times required for finding optimal solutions show a significant correlation (the correlation coefficient is 0.39, and Spearman's rank order test rejects the null hypothesis that the performance of the two algorithms is uncorrelated at significance level α = 0.05; the p-value is 9 · 10⁻¹¹), which indicates that instances that are difficult for one algorithm tend to also be difficult for the other. This suggests that similar features are responsible for rendering instances from this class difficult for both SLS algorithms, a hypothesis that can be investigated further through additional empirical analysis (cf. Chapter 5).

Figure 4.10 Correlation between the median run-times required by MMAS vs ILS for finding optimal solutions to instances of a set comprising 100 TSP instances with 300 vertices each (log-log scale; axes: median run-time MMAS [CPU sec] vs median run-time ILS [CPU sec]); each median was measured from 10 runs per algorithm. The band between the two dashed grey lines indicates performance differences that, based on the sample size of the underlying RTDs, cannot be assumed to be statistically significant.
Peak Performance vs Robustness

Most state-of-the-art SLS algorithms have parameters (such as the noise parameter in Randomised Iterative Improvement, or the mutation and crossover rates in Evolutionary Algorithms) that need to be set manually; often, these parameter settings have a very significant impact on the respective algorithm's performance. The existence of such parameters complicates the empirical investigation of LVA behaviour considerably. This is particularly the case for comparative studies, where 'unfair parameter tuning', that is, the use of unevenly optimised parameter settings, can produce extremely misleading results.

Many comparative empirical studies of algorithms in the literature use peak performance w.r.t. parameter settings as the measure for comparing parameterised algorithms. This can be justified by viewing peak performance as a measure of potential performance; more formally, it can be seen as a tight upper bound on performance over all parameterisations of an algorithm. For peak performance analyses, it is important to determine optimal or close-to-optimal parameterisations of the respective algorithms. Since differently parameterised versions of the same algorithm can be viewed as distinct algorithms, the RTD-based approach described above can be applied. For continuous parameters, such as the noise parameter mentioned before, a series of such experiments can be used to obtain approximations of optimal values. Peak performance analysis can be very complex, especially when multiple parameters are involved whose effects are typically not independent of each other, or when dealing with complex parameters, such as the temperature schedule in Simulated Annealing, for which the domain of possible settings is extremely large and complex. In such cases, it can be infeasible to obtain reasonable approximations of optimal parameter settings; in the context of comparative studies, this situation should be clearly acknowledged, and approximately the same effort should be spent on tuning the parameter settings of every algorithm participating in a direct comparison. An alternative to hand-tuning is the use of automated parameter tuning approaches based on techniques from experimental design [Xu et al., 1998; Coy et al., 2001; Birattari et al., 2002].

In practice, optimal parameter settings are often not known a priori; furthermore, optimal parameter settings for a given algorithm can differ considerably between problem instances or instance classes. Therefore, the robustness of an SLS algorithm w.r.t. suboptimal parameter settings is an important issue. This notion of robustness can be defined as the variation in an algorithm's RTD (or some of its basic descriptive statistics) caused by specific deviations from an optimal parameter setting. It should be noted that such robustness measures can typically be derived easily from the same data that have been collected for determining optimal parameter settings, as sketched below.

A more general notion of robustness of an LVA's behaviour additionally covers other types of performance variation, such as the variation in run-time for a fixed problem instance and a given algorithm (which is captured in the corresponding RTD), as well as performance variations over different problem instances or domains. In all these cases, using RTDs rather than just basic descriptive statistics often gives a much clearer picture of more complex dependencies and effects, such as qualitative changes in algorithmic behaviour that are reflected in the shape of the RTDs. More advanced empirical studies should attempt to relate variation in LVA behaviour over different problem instances or domains to specific features of these instances or domains; such features can be of an entirely syntactic nature (e.g., instance size), or they can reflect deeper, semantic properties. In this context, features of the corresponding search spaces, such as the density and distribution of solutions, are particularly relevant for SLS algorithms and often studied; this approach will be further discussed in Chapter 5.
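To illustrate the equal-effort tuning protocol suggested above, the following sketch evaluates each candidate setting of a single continuous parameter with the same number of runs; run_algorithm is a hypothetical stand-in for performing one run and returning its run-time, and the worst-to-best ratio of medians across the grid is merely one crude robustness indicator, chosen here for illustration:

```python
import statistics

def tune_noise(run_algorithm, instance, noise_values, runs_per_setting=25):
    """Equal-budget grid evaluation of a continuous noise parameter; returns
    the peak-performance setting, its median run-time, and the worst-to-best
    ratio of medians across the grid (a crude robustness proxy)."""
    medians = {noise: statistics.median(
                   run_algorithm(instance, noise)
                   for _ in range(runs_per_setting))
               for noise in noise_values}
    best = min(medians, key=medians.get)
    return best, medians[best], max(medians.values()) / min(medians.values())
```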
4.4 Characterising and Improving LVA Behaviour

Up to this point, our discussion of the RTD-based empirical methodology has been focused on analysing specific quantitative and qualitative aspects of algorithmic behaviour as reflected in RTDs. In this section, we first discuss more advanced aspects of empirical RTD analysis. This includes the analysis of asymptotic and stagnation behaviour, as well as the use of functional approximations for mathematically characterising entire RTDs. Then, we discuss how a more detailed and sophisticated analysis of RTDs can facilitate improvements in the performance and run-time behaviour of a given Las Vegas algorithm.
Asymptotic Behaviour and Stagnation

In Section 4.1, we defined various norms of LVA behaviour. It is easy to see that all three norms of behaviour (completeness, probabilistic approximate completeness, i.e., the PAC property, and essential incompleteness) correspond to properties of the given algorithm's theoretical RTDs. For complete algorithms, the theoretical cumulative RTDs will reach one after a bounded time (where the bound depends on instance size). Empirically, for a given time bound, this property can be falsified by finding a problem instance on which at least one run of the algorithm did not produce a solution within the respective time bound. However, it should be clear that a completeness hypothesis can never be verified experimentally, since the instances for which a given bound does not hold might be very rare, and the probability of producing longer runs might be extremely small.

SLS algorithms for combinatorial problems are often incomplete, or, in the case of complete SLS algorithms, the time bounds are typically too high to be of any practical relevance. In many cases, however, there are empirically observable and practically significant differences between essentially incomplete and PAC algorithms [Hoos, 1999a]. Interestingly, neither property can be empirically verified or falsified. For an essentially incomplete algorithm, there exists a problem instance for which the probability of not finding a solution in an arbitrarily long run is greater than zero. Since only finite runs can be observed in practice, arbitrarily long unsuccessful runs could hypothetically always become successful beyond the horizon of observation. On the other hand, even if unsuccessful runs are never observed, there is always a possibility that the failure probability is just too small compared to the number of runs performed, or that the instances on which true failure can occur are not represented in the ensemble of instances tested. However, empirical run-time distributions can provide evidence for (rather than proof of) essential incompleteness or PAC behaviour and hence provide the basis for hypotheses which, in some cases, can then be proven by theoretical analyses. Such evidence primarily takes the form of an apparent limiting success probability that is asymptotically approached by a given empirical RTD.
Figure 4.11 Qualified RTDs for two SLS algorithms for the TSP that are required to find an optimal solution of a well-known benchmark instance (axes: run-time [CPU sec] vs P(solve); legend: MMAS, MMAS∗); MMAS is provably PAC, whereas MMAS∗ is an essentially incomplete variant of the same algorithm (see text for details). Each RTD is based on 1 000 independent runs of the respective algorithm.
Example 4.10 Asymptotic Behaviour in Empirical RTDs
Figure 4.11 shows the qualified RTDs for two variants of an ACO algorithm required to find an optimal solution for TSPLIB instance lin318 with 318 vertices. The RTD for MMAS∗ shows severe stagnation behaviour; after 26 CPU seconds, the probability for finding a solution does not increase any further, and up to 10 000 CPU seconds, not a single additional solution is found. This provides strong evidence (but no proof) that MMAS∗ is essentially incomplete. Conversely, all 1 000 runs of MMAS were successful and the underlying RTD appears to asymptotically approach one, suggesting that MMAS is probabilistically approximately complete. In fact, MMAS, a slight extension of MMAS∗ , is provably PAC, while MMAS∗ is essentially incomplete. The two algorithms differ only in the key feature that renders MMAS PAC [Stützle and Dorigo, 2002] (details on MMAS can be found in Chapter 8, Section 8.4).
In practice, true asymptotic behaviour (such as probabilistic approximate completeness) is less relevant than the rate at which the failure probability of a given LVA decreases over time. Intuitively, a drop in this rate indicates stagnation in the algorithm's progress towards finding solutions of the given problem instance. Here, we adopt a slightly different view of stagnation, which turns out to be consistent with this intuition. It is based on the fact that, in many cases, the probability of obtaining a solution of a given problem instance by means of a particular Las Vegas algorithm can be increased by restarting the algorithm after a fixed amount of time (the so-called cutoff time) rather than letting it run longer and longer. Whether or not such a static restart strategy yields the desired improvement depends entirely on the respective RTD, and it is easy to see that only for RTDs that are identical to exponential distributions (up to discretisation effects) does a static restart result in neither a performance loss nor an improvement [Hoos and Stützle, 1999].

Exponential RTDs are characterised by a constant rate of decay in their right tail; this tail corresponds to the failure probability, that is, the probability that the given algorithm fails to find an (existing) solution of a given problem instance within a given amount of time. When augmenting any LVA with a static restart mechanism, the resulting algorithm will show RTDs with exponentially decaying right tails. Based on this observation, efficiency and stagnation can be measured by comparing the decay rate of the failure probability at time t, denoted λ(t), with the tail decay rate obtained when using static restarts with cutoff t, denoted λ∗(t). This leads to the following definition:
Definition 4.9 LVA Efficiency and Stagnation
Let A be a Las Vegas algorithm for a given combinatorial problem Π, and let $rtd_{A,\pi}(t)$ be the cumulative run-time distribution function of A applied to a problem instance π ∈ Π. Then we define

$$\lambda_{A,\pi}(t) := -\frac{d}{dt}\bigl[\ln(1 - rtd_{A,\pi})\bigr](t) = \frac{1}{1 - rtd_{A,\pi}(t)} \cdot \frac{d}{dt}\bigl[rtd_{A,\pi}\bigr](t),$$

where $\frac{d}{dt}[f]$ denotes the first derivative of a function f in t. Furthermore, we define

$$\lambda^{*}_{A,\pi}(t) := -\ln(1 - rtd_{A,\pi}(t))/t.$$

The efficiency of A on π at time t is then defined as $\mathit{eff}_{A,\pi}(t) := \lambda_{A,\pi}(t)/\lambda^{*}_{A,\pi}(t)$. Similarly, the stagnation ratio of A on π at time t is defined as $\mathit{stagr}_{A,\pi}(t) := 1/\mathit{eff}_{A,\pi}(t)$, and the stagnation of A on π at time t is given by $\mathit{stag}_{A,\pi}(t) := \ln(\mathit{stagr}_{A,\pi}(t))$.

Finally, we define the minimal efficiency of A on π as $\mathit{eff}_{A,\pi} := \inf\{\mathit{eff}_{A,\pi}(t) \mid t > 0\}$ and the minimal efficiency of A on a problem class Π as $\mathit{eff}_{A,\Pi} := \inf\{\mathit{eff}_{A,\pi} \mid \pi \in \Pi\}$. The maximum stagnation ratio and maximum stagnation on problem instances and problem classes are defined analogously.
Remark: For empirical RTDs, the decay rates $\lambda_{A,\pi}(t)$ are approximated using standard techniques for the numerical differentiation of discrete data, such that artifacts due to discretisation effects are avoided as much as possible.

It is easy to see that, according to this definition, for any essentially incomplete algorithm A there are problem instances on which the minimal efficiency of A is zero. A constant minimal efficiency of one is observed if, and only if, the corresponding RTD is an exponential distribution. LVA efficiency greater than one indicates that restarting the algorithm rather than letting it run longer would result in a performance loss; this situation is often encountered for SLS algorithms during the initial search phase.

It should be clear that our measure of LVA efficiency is a relative measure; hence, the fact that a given algorithm has high minimal efficiency does not imply that this algorithm cannot be further improved. As a simple example, consider Uninformed Random Picking as introduced in Chapter 1, Section 1.5; this primitive search algorithm has efficiency one for arbitrary problem instances and run-times, yet there are many other SLS algorithms that perform significantly better than Uninformed Random Picking, some of which have a smaller minimal efficiency. Hence, LVA efficiency as defined above cannot be used to determine the optimality of a given Las Vegas algorithm's behaviour in an absolute way. Instead, it provides a quantitative measure for relative changes in the efficiency of a given LVA over the course of its run-time. (However, the definition can easily be extended such that an absolute performance measure is obtained; this is done by using the restart decay rate λ∗ over a set of algorithms instead of $\lambda^{*}_{A,\pi}$ in the definition of LVA efficiency.)
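As a sketch of how these quantities can be estimated in practice, the following function approximates $\mathit{eff}_{A,\pi}(t)$ from an empirical RTD given as arrays of run-times and success probabilities; the numerical differentiation uses np.gradient, and the smoothing needed to suppress discretisation artifacts (see the remark above) is deliberately omitted:

```python
import numpy as np

def efficiency(times, psolve):
    """Approximate eff(t) = lambda(t) / lambda*(t) from an empirical RTD,
    given as sorted run-times and the associated success probabilities.
    Points with psolve in {0, 1}, where the logarithm degenerates, are dropped."""
    t, F = np.asarray(times, float), np.asarray(psolve, float)
    ok = (F > 0.0) & (F < 1.0)
    t, F = t[ok], F[ok]
    log_tail = np.log(1.0 - F)          # ln(1 - rtd(t))
    lam = -np.gradient(log_tail, t)     # decay rate of the failure probability
    lam_star = -log_tail / t            # decay rate under restarts with cutoff t
    return t, lam / lam_star
```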
Functional Characterisation of LVA Behaviour

Obviously, any empirical RTD, as obtained by running a Las Vegas algorithm on a given problem instance, can be completely characterised by a step function that can be derived from the empirical RTD data in a straightforward way. Typically, if an empirical RTD is a reasonably precise approximation of the true RTD (i.e., if the number of runs underlying the empirical RTD is sufficiently high), this step function is rather regular and can be approximated well by much simpler mathematical functions. Such approximations are useful for summarising the observed algorithmic behaviour as reflected in the raw empirical RTD data. More importantly, they can provide the basis for modelling the observed behaviour mathematically, which is often a key step in gaining deeper insights into an algorithm's behaviour. It should be noted that this general approach is commonly used in other empirical disciplines and can be considered one of the fundamental techniques of science.

In the case of empirical RTDs, approximations with parameterised families of continuous probability functions known from statistics, such as exponential or normal distributions, are particularly useful. Given an empirical RTD and a parameterised family of cumulative probability functions, good approximations can be found using standard model-fitting techniques, such as the Marquardt-Levenberg algorithm [Marquardt, 1963] or the expectation maximisation (EM) algorithm [Dempster et al., 1977]. The quality of the approximation thus obtained can be assessed using standard statistical goodness-of-fit tests, such as the well-known χ²-test or the Kolmogorov-Smirnov test [Sheskin, 2000]. Both of these tests are used to decide whether a sample comes from a population with a specific distribution. While the Kolmogorov-Smirnov test is restricted to continuous distributions, the χ² goodness-of-fit test can also be applied to discrete distributions.

Example 4.11 Functional Approximation of Empirical RTDs
Looking at the empirical RLD of WalkSAT/SKC applied to a hard Uniform Random 3-SAT instance, shown in Figure 4.6 (page 172), one might notice that the RLD graph resembles that of an exponential distribution. This leads to the hypothesis that, on the given problem instance, the algorithm's behaviour can be characterised by an exponential RLD. To test this hypothesis, we first fit the RLD data with a cumulative exponential distribution function of the form $ed[m](x) := 1 - e^{-x/m}$, using the Marquardt-Levenberg algorithm (as realised in C. Gramme's Gnufit software) to determine the optimal value of the parameter m. This approximation is shown in Figure 4.12 (left side).
Figure 4.12 Left: Best-fit approximation of the RLD from Figure 4.6 (page 172) by an exponential distribution (legend: empirical RLD, ed[61081.5]; axes: run-time [search steps] vs P(solve)); this approximation passes the χ² goodness-of-fit test at significance level α = 0.05. Right: Correlation between median run-length and χ² values from testing the RLDs of individual instances against a best-fit exponential distribution, for a test-set of 1 000 hard Random 3-SAT instances (axes: median run-time [search steps] vs χ² value); the horizontal lines indicate the acceptance thresholds for the 0.01 and 0.05 acceptance levels of the χ²-test.
Then, we applied the χ² goodness-of-fit test to examine the hypothesis that the resulting exponential distribution is identical to the theoretical RTD underlying the empirically observed run-lengths. In the given example, the resulting χ² value of 26.24 indicates that our distribution hypothesis passed the test at the standard significance level α = 0.05.
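A present-day analogue of this procedure can be sketched as follows. SciPy's curve_fit uses the Levenberg-Marquardt method by default, and the Kolmogorov-Smirnov test is substituted here for the χ² test used above; note that fitting and testing on the same sample makes the resulting p-value optimistic, so this is an illustration rather than a faithful reproduction of the original analysis:

```python
import numpy as np
from scipy import optimize, stats

def fit_exponential_rld(run_lengths):
    """Fit ed[m](x) = 1 - exp(-x/m) to an empirical RLD and test the fit."""
    x = np.sort(np.asarray(run_lengths, float))
    y = np.arange(1, len(x) + 1) / len(x)        # empirical P(solve)
    ed = lambda x, m: 1.0 - np.exp(-x / m)
    (m,), _ = optimize.curve_fit(ed, x, y, p0=[np.median(x)])
    # goodness of fit against the fitted exponential distribution
    ks_stat, p_value = stats.kstest(x, stats.expon(scale=m).cdf)
    return m, ks_stat, p_value
```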
It is worth noting that, since Las Vegas algorithms (like all algorithms) are of an inherently discrete nature, their true (theoretical) RTDs are always step functions. Nevertheless, there are good reasons for using continuous probability functions for approximation. With increasing problem size, these step functions become arbitrarily fine-grained; especially for computationally hard problems, such as SAT or TSP, this effect becomes relevant even for relatively modest and certainly realistically solvable problem sizes. Furthermore, abstracting from the discrete nature of RTDs often facilitates a more uniform characterisation that is mathematically easier to handle. However, for 'very easy' problem instances, that is, instances that can be solved by a given algorithm in tens or hundreds of basic operations or CPU cycles, the discrete nature of the respective true RTDs can manifest itself; this effect needs to be taken into account when fitting parameterised functions to such data and when testing the statistical significance of the resulting approximations.
Functional Characterisation for Instance Ensembles

Like the previous RTD-based analytical approaches, the functional characterisation of LVA behaviour can be extended from single problem instances to ensembles of instances in a rather straightforward way. For small instance sets, it is generally feasible to perform the approximation and goodness-of-fit test for each instance as described above; for larger ensembles, it becomes necessary to automate this procedure and to analyse and summarise its results in an appropriate way. Overall, similar considerations apply as described in the previous section. Using this approach, hypotheses on the behaviour of a given LVA on classes or distributions of problem instances can be tested. Hypotheses on an LVA's behaviour on infinite or extremely large sets of instances, such as the set of all SAT instances with a given number of clauses and variables, cannot be proven by this method; however, it allows one to falsify such hypotheses or to collect arbitrary amounts of evidence for their validity.
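For larger ensembles, the per-instance procedure is easily automated. The following loop assumes the fit_exponential_rld sketch from above and a dictionary mapping instance names to run-length samples; for each instance, it records the median run-length and whether the distribution hypothesis passes at a given significance level:

```python
import numpy as np

def characterise_ensemble(rlds_by_instance, alpha=0.05):
    """Fit an exponential to each instance's RLD and record the test outcome."""
    results = []
    for name, rld in rlds_by_instance.items():
        m, _, p = fit_exponential_rld(rld)
        results.append((name, float(np.median(rld)), p, p >= alpha))
    return results  # (instance, median run-length, p-value, hypothesis passed?)
```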
Example 4.12 Functional RTD Approximation for Instance Ensembles
A simple generalisation of the result presented in the previous example leads to the hypothesis that the behaviour of WalkSAT/SKC on an entire class of SAT instances can be characterised by exponential run-time distributions. Here, we test this hypothesis for a set of 1 000 Uniform Random 3-SAT instances with 100 variables and 430 clauses. By fitting the RLD data for the individual instances with exponential distributions and calculating the χ² values as outlined above, we obtained the result shown on the right side of Figure 4.12 (page 189), in which the median values of the RLDs are plotted against the corresponding χ² values. Although the distribution hypothesis is rejected for most instances, we observe a clear correlation between the solution cost of the instances and the χ² values, and for almost all of the hardest instances, the distribution hypothesis passes the test. Thus, although our original generalised hypothesis could not be confirmed, the results suggest an interesting modification of this hypothesis. (Further analysis of the easier instances, for which the RLDs could not be well approximated by exponential distributions, shows that there is a systematic deviation in the left tail of the RLDs, while the right tail matches that of an exponential distribution; details on this result can be found in Hoos and Stützle [1999; 2000a].)
This functional characterisation approach can also be used for analysing and modelling the dependency of LVA behaviour on algorithmic parameters or on properties of problem instances (in particular, problem size). Furthermore, it facilitates comparative studies of the behaviour of two or more LVAs. In all of these cases, reasonably simple, parameterised models of the algorithms' run-time behaviour provide a better basis for the respective analysis than the basic properties and statistics of RTDs discussed before. For example, when studying the scaling of an algorithm's run-time behaviour with problem size, knowledge of good parameterised functional approximations of the RTDs reduces the investigation to an analysis of the impact of problem size on the model parameters (e.g., the median of an exponential distribution). As we will see in the following, such characterisations can also have direct consequences for important issues such as the parallelisation or optimal parameterisation of Las Vegas algorithms. At the same time, they can suggest novel interpretations of LVA behaviour and thus facilitate an improved understanding of these algorithms.
Optimal Cutoff Times for Static Restarts

A detailed analysis of an algorithm's RTDs, particularly with respect to asymptotic behaviour and stagnation, can often suggest ways of improving the performance of the algorithm. Arguably the simplest way to overcome stagnation of an SLS algorithm is to restart the search after a fixed amount of time (the cutoff time). Generally, based on our definition of search efficiency and stagnation, it is easy to decide whether such a static restart strategy can improve the performance of a Las Vegas algorithm A for a given problem instance π. If for all run-times t the efficiency of A on π at time t, $\mathit{eff}_{A,\pi}(t)$, is larger than one, restart with any cutoff time t will lead to a performance loss. Intuitively, this is the case when, with increasing t, the probability of finding a solution within a given time interval increases, which is reflected in a cumulative RTD graph that is steeper at t than the exponential distribution ed[m] for which $ed[m](t) = rtd_{A,\pi}(t)$ (an example of such an RTD is shown in Figure 4.13). Furthermore, if, and only if, $\mathit{eff}_{A,\pi}(t) = 1$ for all t, restart at any time t will not change the success probability for any time t′; as mentioned in Section 4.3, this condition is satisfied if, and only if, the RTD of A on π is an exponential distribution. Finally, if there exists a run-time t′ such that $\mathit{eff}_{A,\pi}(t) \le 1$ for all t > t′, then restarting the algorithm at time t′ will lead to an increased solution probability for some run-time t > t′. This is equivalent to the condition that, from t′ on, the cumulative RTD graph of A on π is less steep at any time t > t′ than the exponential distribution ed[m] for which $ed[m](t') = rtd_{A,\pi}(t')$.

Figure 4.13 Qualified RTD of an ACO algorithm for the TSP (MMAS) on TSPLIB instance lin318 with 318 vertices, based on 1 000 independent runs, together with an exponential distribution with identical median (legend: MMAS, ed[14]; axes: run-time [CPU sec] vs P(solve)). The fact that this RTD is consistently steeper than an exponential indicates that restart with any fixed cutoff time will lead to a performance loss.

In the case where random restart is effective for some cutoff time t′, an optimal cutoff time $t_{opt}$ can intuitively be identified by finding the 'left-most' exponential distribution, $ed[m^{*}]$, that touches the RTD graph of A on π, and the minimal t for which $ed[m^{*}](t) = rtd_{A,\pi}(t)$. Formally, this is achieved using the following definitions:
$$m^{*} := \min\{\, m \mid \exists t > 0 : ed[m](t) = rtd_{A,\pi}(t) \,\} \qquad (4.2)$$

$$t_{opt} := \min\{\, t \mid t > 0 \;\wedge\; ed[m^{*}](t) = rtd_{A,\pi}(t) \,\} \qquad (4.3)$$
where $rtd_{A,\pi}(t)$ is the theoretical run-time distribution of A on π, and A is incomplete, that is, $P_s(RT \le t) < 1$ for any finite run-time t (note that A may still be probabilistically approximately complete).

Generally, there are two special cases to be considered when solving these two equations. Firstly, we might not be able to determine m∗, because the set over which we minimise in the first equation has no minimum. In this case, if the infimum of the set is zero, it can be shown that the optimal cutoff time is either equal to zero or equal to +∞ (depending on the behaviour of $t_{opt}$ as m∗ approaches zero). Secondly, if the m∗ defined by the first equation exists, it might still not be possible to determine $t_{opt}$, because the set in the second equation does not have a minimum. In this case, there are arbitrarily small times t for which $ed[m^{*}](t) = rtd_{A,\pi}(t)$, that is, the two curves are identical on some interval [0, t′], and the optimal cutoff time is equal to zero. In practice, optimal cutoff times of zero will hardly ever occur, since they could only arise if A solved π with probability greater than zero within infinitesimally small run-times.

Equations 4.2 and 4.3 apply to theoretical as well as to empirical RTDs. In the latter case, however, it is sufficient to consider only those run-times t in Equations 4.2 and 4.3 that have been observed in one of the runs underlying the empirical RTD. There is one caveat with this method: cases in which the optimal cutoff time determined from Equation 4.3 is equal to one of the longest run-times underlying the given empirical RTD should be treated with caution. The reason for this lies in the fact that the high quantiles of empirical RTDs, which correspond to the longest runs, are often statistically rather unstable. Still, using cutoffs based on such extreme run-times may be justified if there is evidence that the algorithm shows stagnation behaviour.

In the case of SLS algorithms for optimisation problems, optimal cutoff times are determined from qualified RTDs. Clearly, such optimal cutoff times depend on the solution quality bound. In many cases, tighter solution quality bounds (i.e., bounds that are closer to the optimal solution quality) lead to higher optimal cutoff times; for weak solution quality bounds, on the other hand, restart with any cutoff time typically leads to a performance loss.
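For an empirical RTD in which every recorded run was successful, Equations 4.2 and 4.3 reduce to a minimisation over the observed run-times, since $ed[m](t) = F(t)$ has the unique solution $m = -t/\ln(1 - F(t))$. A minimal sketch (the caveat regarding the highest quantiles applies here as well):

```python
import numpy as np

def optimal_cutoff(run_times):
    """Estimate t_opt and m* (Equations 4.2 and 4.3) from an empirical RTD,
    considering only observed run-times; assumes all recorded runs succeeded."""
    t = np.sort(np.asarray(run_times, float))
    F = np.arange(1, len(t) + 1) / len(t)   # empirical P(solve)
    ok = F < 1.0                            # ln(1 - F) diverges at F = 1
    m = -t[ok] / np.log(1.0 - F[ok])        # m such that ed[m](t) = F(t)
    i = int(np.argmin(m))                   # 'left-most' touching exponential
    return t[ok][i], m[i]                   # (t_opt, m*)
```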
Example 4.13 Determining Optimal Cutoff Times for Static Restarts
Figure 4.14 shows the empirical qualified RTD of a simple ILS algorithm for the TSP, required for finding optimal solutions to TSPLIB instance pcb442 with n = 442 vertices. The algorithm was run 1 000 times on a Pentium 700MHz machine with 512MB RAM, and unsuccessful runs were terminated after 1 000 CPU seconds. This qualified RTD shows strong stagnation behaviour; note that this behaviour could not have been observed if the maximal run-time of the algorithm had been limited to less than 5 CPU seconds. Figure 4.14 (left) also shows the optimal cutoff time for static restarts, $t_{opt}$, and the corresponding exponential distribution $ed[m^{*}]$, determined according to Equations 4.2 and 4.3. The same exponential distribution characterises the shape of the RTD of the algorithm using static restarts with cutoff time $t_{opt}$.

Figure 4.14 Qualified RTD for an ILS algorithm required to find optimal solutions for TSPLIB instance pcb442; note the stagnation behaviour apparent from the RTD graph. Left: Optimal cutoff time for static restarts, $t_{opt}$, and the corresponding exponential distribution $ed[m^{*}]$ (legend: ed[18], ILS; axes: run-time [CPU sec] vs P(solve)). Right: Effect of a dynamic restart strategy (legend: ILS + dynamic restart, ILS). (Details are given in the text.)
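The effect of a static restart strategy can also be computed directly from an empirical RTD: with cutoff $t_c$, the failure probability over k complete segments is $(1 - F(t_c))^k$, which yields the exponentially decaying right tail discussed earlier. A minimal sketch, which ignores the setup time incurred by each restart:

```python
import numpy as np

def restart_success_prob(run_times, cutoff, total_time):
    """P(solve) within total_time under static restarts every `cutoff` seconds,
    derived from the empirical RTD of the algorithm without restarts."""
    t = np.sort(np.asarray(run_times, float))
    F = lambda x: np.searchsorted(t, x, side='right') / len(t)
    k, r = divmod(total_time, cutoff)       # complete segments and remainder
    return 1.0 - (1.0 - F(cutoff)) ** k * (1.0 - F(r))
```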
Dynamic Restarts and Other Diversification Strategies

One drawback of using a static restart strategy lies in the fact that optimal cutoff times typically vary considerably between problem instances. Therefore, it would be preferable to re-initialise the search process not after a fixed cutoff time, but depending on search progress. A simple example of such a dynamic restart strategy is based on the time that has passed since the current incumbent candidate solution was found; if this time interval exceeds a threshold θ, a restart is performed. (In this scheme, incumbent candidate solutions are not carried over restarts of the search.) The time threshold θ is typically measured in search steps; it corresponds to the minimal time interval between restarts and is often defined depending on syntactic properties of the given problem instance, in particular, instance size.

Example 4.14 Improving SLS Behaviour Using Dynamic Restarts
Figure 4.14 (right) shows the effect of the simple dynamic restart strategy described above on the ILS algorithm and TSP instance from Example 4.13. Here, for a TSP instance with n vertices, θ := n is used as the minimal time interval between restarts. Interestingly, the RTD of ILS with this dynamic restart mechanism is essentially identical to the RTD of ILS with static restarts for the optimal cutoff time determined in the previous example. This indicates that the particular dynamic restart mechanism used here is very effective in overcoming the stagnation behaviour of the ILS algorithm without restart.
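In schematic form, the dynamic restart strategy used in this example can be sketched as follows; init, step and cost are hypothetical stand-ins for the components of the underlying SLS algorithm, not the actual ILS implementation:

```python
def sls_with_dynamic_restart(init, step, cost, theta, max_steps):
    """Restart from a fresh initial solution whenever more than theta search
    steps have passed without an improvement of the incumbent; incumbents
    are not carried over restarts."""
    s = init()
    best, last_improvement = cost(s), 0
    for k in range(1, max_steps + 1):
        if k - last_improvement > theta:    # stagnation detected
            s = init()                      # dynamic restart
            best, last_improvement = cost(s), k
        s = step(s)
        if cost(s) < best:
            best, last_improvement = cost(s), k
    return s
```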
Restarting an SLS algorithm from a new initial solution is typically a rather time-consuming operation. Firstly, a certain setup time is required for generating a new candidate solution from which the search is started and for initialising the data structures used by the search process accordingly. This setup time is often substantially higher than the time required for performing a search step. Secondly, after initialising the search process, SLS algorithms almost always require a certain number of search steps to reach regions of the underlying search space in which there is a non-negligible chance of finding a solution. These effects are reflected in extremely low success probabilities in the extreme left tail of the respective RTDs. Furthermore, they typically increase strongly with instance size, rendering search restarts a costly operation.

These disadvantages can be avoided by using diversification techniques that are less drastic than restarts in order to overcome stagnation behaviour. One such technique, called fitness-distance diversification, has been used to enhance the ILS algorithm for the TSP mentioned in Example 4.13; the resulting algorithm shows substantially better performance than the variant using dynamic restarts from Example 4.14. (Details on this enhanced ILS algorithm can be found in Chapter 8, page 396.) Another diversification technique, which also has the theoretical advantage of rendering the respective SLS algorithm probabilistically approximately complete (PAC), is the so-called random walk extension [Hoos, 1999a]. In terms of the GLSM models of the respective SLS algorithms, the random walk extension consists of adding a random walk state in such a way that, throughout the search, arbitrarily long sequences of random walk steps can be performed with some (small) probability. This technique has been used to obtain state-of-the-art SLS algorithms for SAT, such as Novelty+ (for details, see Chapter 6). Generally, effective techniques for overcoming search stagnation are important components of advanced SLS methods, and improvements in these techniques can be expected to play a major role in the design of future generations of SLS algorithms.
Multiple Independent Runs Parallelisation

Las Vegas algorithms lend themselves to a straightforward parallelisation approach: performing independent runs of the same algorithm in parallel. From the discussion in the previous sections, we know that such a strategy is particularly effective if an SLS algorithm has an exponentially distributed RTD. Based on a well-known result from the statistical literature [Rohatgi, 1976], if the time required by a given algorithm for finding a solution is exponentially distributed with median m, then the time required for finding a solution in at least one of p independent runs is exponentially distributed with median m/p. Consequently, if we run such an algorithm once for time t, we obtain exactly the same success probability as when running the algorithm p times for time t/p. By executing these p independent runs in parallel on p processors, an optimal parallelisation speedup $S_p := RT_1/RT_p = p$ is achieved, where $RT_1 = t$ is the sequential run-time and $RT_p = t/p$ is the parallel computation time using p processors. This theoretical result holds for arbitrary numbers of processors.

In practice, SLS algorithms do not have perfectly exponential RTDs; as explained previously, there are typical deviations in the left tail, which reflect the setup time and the initial search phase. Therefore, when the number of processors is high enough that each of the parallel runs becomes very short, the parallelisation speedup will generally be less than optimal. Given an empirical RTD, the parallelisation speedup $S_p$ for reaching a certain success probability $p_s$ can be calculated as follows. $RT_1$, the sequential run-time required for reaching solution probability $p_s$, can be determined directly from the given RTD; technically, $RT_1 := \min\{t' \mid \widehat{P}_s(RT \le t') \ge p_s\}$. Then the parallel time required for reaching the same solution probability by performing multiple independent runs on p processors is given by

$$RT_p := \min\{\, t' \mid \widehat{P}_s(RT \le t') \ge 1 - (1 - p_s)^{1/p} \,\} \qquad (4.4)$$
Using this equation, the minimal number of processors required for achieving the desired success probability within a maximal accumulated parallel run-time $t_{max}$ can easily be determined. (The accumulated parallel run-time is the total run-time over all processors.) It is interesting to note that for higher success probabilities, the maximal number of processors for which optimal parallelisation can be achieved is typically also higher.
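Given an empirical RTD, Equation 4.4 can be evaluated directly. The following sketch assumes that the target success probability $p_s$ is actually reached within the empirical RTD (otherwise the array lookup fails):

```python
import numpy as np

def parallel_speedup(run_times, p, ps=0.95):
    """S_p = RT_1 / RT_p for target success probability ps, computed from an
    empirical RTD via Equation 4.4."""
    t = np.sort(np.asarray(run_times, float))
    F = np.arange(1, len(t) + 1) / len(t)
    rt = lambda q: t[np.searchsorted(F, q)]   # smallest t with F(t) >= q
    return rt(ps) / rt(1.0 - (1.0 - ps) ** (1.0 / p))
```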
Figure 4.15 Speedup achieved by multiple independent runs parallelisation of a high-performing SLS algorithm for SAT, applied to two SAT-encoded instances of a hard planning problem (axes: number of processors vs parallelisation speedup; legend: bw_large.c (hard), bw_large.b (easier)). The diagonal line indicates optimal parallelisation speedup. Note that for the easier instance, the parallelisation speedup is increasingly suboptimal for more than 10 processors. (For details, see text.)
Example 4.15 Speedup Through Independent Parallel Runs
Figure 4.15 shows the parallelisation speedup $S_p$ as a function of the number of processors (computed using Equation 4.4) for a high-performance SLS algorithm for SAT (Novelty) applied to two well-known benchmark instances for SAT, the SAT-encoded planning problems bw_large.b and bw_large.c. The underlying empirical RTDs (determined using instance-specific optimal noise parameter settings of Novelty) are based on 250 successful runs each, and all points of the speedup curves are based on no fewer than ten runs. A desired success probability of $p_s = 0.95$ was used for determining the sequential and parallel run-times. Instance bw_large.c is much harder than bw_large.b and allows approximately optimal speedup for more than 70 processors; the underlying RTD is almost perfectly approximated by an exponential distribution. For the easier instance, the parallelisation speedup becomes suboptimal for more than 10 processors; this is due to the larger relative impact of the setup time and initial search phase on the overall run-time.
Generally, using multiple independent runs is an attractive model of parallel processing, since it involves essentially no communication overhead and can easily be implemented on almost any parallel hardware and programming environment, from networks of standard workstations to specialised multiple instruction / multiple data (MIMD) machines with thousands of processors. The resulting parallel SLS algorithms are precisely captured by the homogeneous co-operative GLSM model without communication introduced in Chapter 3. They are of particular interest in the context of SLS applications to time-critical tasks (such as robot control or on-line scheduling), as well as to the distributed solving of very large and hard problem instances.
4.5 Further Readings and Related Work

The term Las Vegas algorithm was originally introduced by Babai [1979]. Although the concept is widely known, the literature on Las Vegas algorithms is relatively sparse. Luby, Sinclair and Zuckerman have studied optimal strategies for selecting cutoff times [Luby et al., 1993]; closely related theoretical work on the parallelisation of Las Vegas algorithms has been published by Luby and Ertel [1994]. The application scenarios for Las Vegas algorithms and the norms of LVA behaviour covered here were introduced by Hoos and Stützle [1998].

Run-time distributions have occasionally been observed in the literature for a number of years [Taillard, 1991; Battiti and Tecchiolli, 1992; Taillard, 1994; ten Eikelder et al., 1996]. Their use, however, has typically been restricted to purely descriptive purposes or to obtaining hints on the speedup achievable by performing independent parallel runs of a given sequential algorithm [Battiti and Tecchiolli, 1992; Taillard, 1991]. Taillard specifies general conditions under which super-optimal speedups can be achieved through multiple independent tries parallelisation [Taillard, 1994]. The use of RTDs at the core of an empirical methodology for studying SLS algorithms was first proposed by Hoos and Stützle [1998]. Since then, RTD-based methods have been used for the empirical study of a broad range of SLS algorithms for numerous combinatorial problems [Aiex et al., 2002; Hoos and Stützle, 2000a; Hoos and Boutilier, 2000; Stützle and Hoos, 2001; Stützle, 1999; Tulpan et al., 2003].

There is some related work on the use of search cost distributions over instance ensembles for the empirical analysis of complete search algorithms. Kwan showed that for different types of random CSP instances, the search cost distributions of several complete algorithms cannot be characterised by normal distributions [Kwan, 1996]. Frost, Rish and Vila use continuous probability distributions for approximating the run-time behaviour of complete algorithms applied to randomly generated Random 3-SAT and binary CSP instances from the phase transition region [Frost et al., 1997]. In Rish and Frost [1997], this approach is extended to search cost distributions for unsolvable problems from the over-constrained region. Gomes and Selman studied run-time distributions of backtracking algorithms based on the Brelaz heuristic for solving instances of the Quasigroup Completion
Problem, a special type of CSP, in the context of algorithm portfolio design [Gomes and Selman, 1997a]. Interestingly, the corresponding RTDs for the randomised systematic search algorithms they studied can (at least in some cases) be approximated by 'heavy-tailed' distributions, a fact that can be exploited for improving the performance of these algorithms by using a static restart mechanism [Gomes et al., 1997]. Similar results have been obtained for randomised complete algorithms for SAT; at the time, the resulting algorithms showed state-of-the-art performance on many types of SAT instances [Gomes et al., 1998]. Interestingly, the RTDs of some of the most widely known and best-performing SLS algorithms for SAT appear to be well approximated by exponential distributions [Hoos, 1998; Hoos and Stützle, 1999; Hoos, 1999a] or mixtures of exponentials [Hoos, 2002b]. To the best of our knowledge, heavy-tailed RTDs have generally not been observed for any SLS algorithm.

A number of specific techniques have proven useful in the context of certain types of experimental analyses. Estimates of optimal solution qualities for combinatorial optimisation problems can be obtained using techniques based on insights from mathematical statistics [Dannenbring, 1977; Golden and Steward, 1985]. Using solution quality distributions, interesting results have been obtained regarding the behaviour of SLS algorithms as instance size increases [Schreiber and Martin, 1999]. Techniques from experimental design have been shown to be helpful in deriving automated (or semi-automated) procedures for tuning algorithmic parameters [Xu et al., 1998; Coy et al., 2001; Birattari et al., 2002].

Various general aspects of empirical algorithms research are covered in a number of publications. There have been several early attempts to provide guidelines for the experimental investigation of algorithms for combinatorial optimisation problems and to establish reporting procedures that improve the reproducibility of empirical results [Crowder et al., 1979; Jackson et al., 1990]. Guidelines on how to report results that are more specific to heuristic methods, including SLS algorithms, are given in Barr et al. [1995]. Hooker advocates a scientific approach to experimental studies in operations research and artificial intelligence, based on the formulation and careful experimental investigation of hypotheses about algorithm properties and behaviour [Hooker, 1994]. General guidelines for the experimental analysis of algorithms are also given by McGeoch and Moret [McGeoch, 1996; McGeoch and Moret, 1999; Moret, 2002]. A recent article by Johnson provides an extensive collection of guidelines and potential pitfalls in experimental algorithms research, including some very practical advice on the topic [Johnson, 2002]. Gent et al. give a similar, but more limited, overview of potential problems in the experimental analysis of algorithms [Gent et al., 1997].

Statistical methods are at the core of any empirical approach for investigating the behaviour and performance of SLS algorithms. Cohen's book on empirical methods in artificial intelligence is becoming a standard text and reference book for the presentation and application of statistical methods, not only in AI but also in other fields of computer science [Cohen, 1995]. For an additional introduction to statistical methods, we also recommend the book by Papoulis [1991]. The handbook by Sheskin [2000] is an excellent guide to statistical tests and their proper application; more specialised introductions to non-parametric statistics can be found in Conover [1999] and Siegel et al. [1988]. Furthermore, for general techniques of experimental design and the analysis of experimental data, we refer to the work of Dean and Voss [2000] and Montgomery [2000].
4.6 Summary

Empirical methods play a crucial role in analysing the performance and behaviour of SLS algorithms, and appropriate techniques are required for conducting empirical analyses competently and correctly. In this chapter, we motivated why run-time distributions (RTDs) provide a good basis for empirically analysing the behaviour of SLS algorithms and, more generally, members of the broader class of (generalised) Las Vegas algorithms. We discussed the asymptotic behaviour of Las Vegas algorithms and introduced three application scenarios with different requirements for empirical performance analyses. We then formally introduced the concepts of run-time distributions (RTDs), qualified run-time distributions (QRTDs) and solution quality distributions (SQDs), as well as time-dependent solution quality statistics (SQTs) and solution quality dependent run-time statistics (RTQs). Empirical RTDs can be easily obtained from the same data required for stable estimates of mean run-times or time-dependent solution quality.

We presented and discussed RTD-based methods for the empirical analysis of individual LVAs as well as for the comparative analysis of LVAs, on single problem instances and on instance ensembles. We also contrasted peak-performance and robustness analyses and argued that the latter are important for capturing the dependencies of an algorithm's performance on parameter settings, problem instances or instance size. The measures of efficiency and stagnation are derived from a given RTD and characterise an algorithm's performance over time; intuitively, these measures indicate how much an algorithm's performance can be improved by a static restart mechanism.

Functional approximations of RTDs with known probability distributions can be used to summarise and mathematically model the behaviour of Las Vegas algorithms. The regularities of LVA behaviour captured by such functional characterisations can facilitate performance analysis, for example, by suggesting simplified experimental designs in which only the parameter values of a functionally characterised family of RTDs are analysed instead of the entire distributions. Applied to SLS algorithms, this approach can also reveal fundamental properties of an algorithm and provide deeper insights into its behaviour.

Results from the empirical analysis of an SLS algorithm can provide significant leverage for further improvements of its performance. We gave an overview of various approaches for achieving such improvements, including static and dynamic restart mechanisms, adaptive diversification, the random walk extension and parallelisation based on multiple independent tries. Overall, the importance of empirical analyses in the development and application of SLS algorithms can hardly be overestimated. We believe that the methods and techniques presented in this chapter provide a solid basis for sound and thorough empirical studies of SLS algorithms and thus facilitate the development of better algorithms and an improved understanding of their characteristics and behaviour.
Exercises

4.1 [Easy] Give three examples of (generalised) Las Vegas algorithms and identify all stochastic elements in them.
4.2 [Easy] Describe a concrete application domain where the utility of a solution to a given problem instance changes over time.
4.3 [Medium] Prove that Uninformed Random Picking (see Chapter 1, Section 1.5) has the PAC property.
4.4 [Easy] Explain the difference between a run-time distribution (RTD), a solution quality distribution (SQD) and a search cost distribution (SCD).
4.5 [Medium] In order to investigate the behaviour of an SLS algorithm for a combinatorial optimisation problem on a given problem instance, solution quality traces over m independent runs are recorded. In each of these runs, the known optimal solution quality for the given instance is reached. Explain how qualified run-time distributions (QRTDs) for various solution quality bounds and solution quality distributions (SQDs) for various run-time bounds can be obtained from these solution quality traces.
4.6 [Easy; Hands-On] Study the behaviour of a simple iterated local search algorithm for the TSP (available from www.sls-book.net) on TSPLIB instance lin318 (available from TSPLIB [Reinelt, 2003]). In particular, report and compare the solution quality distributions (SQDs) for increasingly high run-time bounds. (The provably optimal solution quality for this instance is 42 029.) Describe how the SQDs change with the run-time bounds and explain the reasons underlying this phenomenon.
4.7 [Medium] You are comparing the performance of two SLS algorithms A and B for a combinatorial decision problem. Applied to a well-known benchmark instance, these algorithms were found to exhibit the RTDs shown below (log-log plot; axes: run-time [search steps] vs P(solve); legend: algorithm A, algorithm B, ed[744]).
What do you learn from these RTDs? Which further experiments do you suggest in order to decide which algorithm is superior?

4.8 [Medium] Explain why it is desirable to mathematically model observed RTDs using functional approximations. Do the approximations have to be perfect to be useful?
4.9 [Medium] What happens if the equations used for determining optimal restart times according to Equations 4.2 and 4.3 (page 193) are applied to a complete Las Vegas algorithm?
4.10 [Hard] Outline an empirical approach to run and decide a competition on the best SLS algorithm for the TSP. Discuss all relevant aspects of the competition (selection of problem instances, performance measures, experimental protocol) and justify your approach.
[...] to the traveler, a mountain outline varies with every step, and it has an infinite number of profiles, though absolutely but one form. Even when cleft or bored through it is not comprehended in its entireness. —Henry David Thoreau, Writer & Philosopher
Search Space Structure and SLS Performance

The performance of SLS algorithms crucially depends on structural aspects of the spaces being searched. Studying the nature of this dependency can significantly improve our understanding of SLS behaviour and facilitate the further improvement and successful application of SLS methods. In this chapter, we introduce various aspects of search space structure and discuss their impact on SLS performance. These include fundamental properties of a given search space and neighbourhood graph, such as size, connectivity, diameter and solution density, as well as global and local properties of the search landscapes encountered by SLS algorithms, such as the number and distribution of local minima, fitness-distance correlation, measures of ruggedness, and detailed information on the plateau and basin structure of the given space. Some of these search space features can be determined analytically, but most have to be measured empirically, often involving rather complex search methods. We exemplify the type of results obtainable from such analyses of search space features and their impact on SLS performance for our standard example problems, SAT and TSP.
5.1 Fundamental Search Space Properties

The search process carried out by any SLS algorithm when applied to a given problem instance π can be seen as a walk on the neighbourhood graph associated with π, $G_N(\pi)$. Recall from Chapter 1, Section 1.5 that $G_N(\pi) := (S(\pi), N(\pi))$,
where $S(\pi)$ is the search space of π, that is, the set of all candidate solutions, and $N(\pi)$ is the given neighbourhood relation. Obviously, the properties of the search space and the corresponding neighbourhood graph have an impact on the behaviour and performance of SLS algorithms.
Remark: For simplicity and generality, in this chapter we mainly use the term search position (or short: position) to refer to candidate solutions; other equivalent terms that are often found in the literature are state or configuration.
Search Space Size and Diameter

It is intuitively clear that the order of the neighbourhood graph, that is, the size of the search space in terms of the number of candidate solutions it comprises, plays an important role: Generally, finding any of a fixed number of (optimal) solutions becomes harder as the size of the search space increases. For example, we would expect that a SAT instance with 20 variables is substantially easier to solve than one with 200, considering that in the former case, the search space comprises only 2²⁰ = 1 048 576 variable assignments, compared to 2²⁰⁰ ≈ 1.61 · 10⁶⁰ in the latter case. This correlation between search space size and search cost certainly exists for the simplest SLS methods, Uninformed Random Picking and Uninformed Random Walk, and it typically also holds for more powerful SLS strategies. However, search complexity is not always correlated with search space size, as other factors, which will be discussed in the following, can have an impact on SLS behaviour that is substantial enough to completely dominate the effect of search space size. This has been shown, for example, for certain types of SAT instances and algorithms [Hoos, 1999b].

Another property of the neighbourhood graph that is intuitively related to search cost is the diameter, $diam(G_N(\pi))$, defined as the maximal distance between any two vertices s, s′ in $G_N(\pi)$ in terms of the number of edges that have to be traversed to reach s′ from s, or vice versa. For example, in the case of a SAT instance with n variables under the 1-flip neighbourhood, the diameter of the neighbourhood graph is n. The neighbourhood graphs underlying typical SLS algorithms are usually connected, that is, there is a path between any two vertices; consequently, the diameter of such neighbourhood graphs is finite. All else being equal, graphs with larger diameters are intuitively harder to search than compact graphs characterised by a small diameter. For Uninformed Random Walk, this is clearly the case; since in almost all SLS algorithms the search steps predominantly correspond to traversing single edges in the underlying neighbourhood graph, a similar correlation can be expected.

There is a relationship between the size of the search space, the diameter of the neighbourhood graph and the local neighbourhood size (where the latter corresponds to the vertex degree of the neighbourhood graph): Intuitively, for fixed search space size, the larger the neighbourhood size, the smaller the diameter of $G_N(\pi)$. Neighbourhood graphs are typically regular, that is, each candidate solution has the same number of direct neighbours. In this case, a bound on this relationship can be formally derived from the well-known Moore bound as $diam(G_N(\pi)) \ge \log_{d-1}(m - 2 \cdot m/d)$, where $d := |N(s)|$ is the neighbourhood size and $m := |S(\pi)|$ denotes the size of the given search space (see, e.g., Biggs [1993]). However, in most cases this bound is rather weak, and a more precise determination of the diameter is desirable.
Example 5.1 Fundamental Search Space Properties for the TSP
For the symmetric TSP, when candidate solutions are represented as permutations of the vertices, a problem instance with n vertices has a search space of size (n − 1)!/2. (Note that for the symmetric TSP, the direction in which a tour is traversed does not matter.) It is easy to see that the neighbourhood size is n · (n − 1)/2 for the 2-exchange neighbourhood and n · (n − 1) · (n − 2)/6 for the 3-exchange neighbourhood. Unfortunately, the exact diameter of the neighbourhood graphs is unknown. For the 2-exchange neighbourhood, an upper bound of n − 1 can be given [Stadler and Schnabl, 1992], while an immediate lower bound is n/2. For the 3-exchange neighbourhood, the same upper bound applies, but the lower bound is now n/3. In both cases, the true diameter is more likely to be close to the respective lower bounds.
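The quantities discussed in this example are easily computed; the following sketch (our own illustration, with a hypothetical function name) prints the search space size, the neighbourhood sizes and the 2-exchange diameter bounds for a small instance:

    from math import factorial, comb

    def tsp_quantities(n):
        # Fundamental search space properties of a symmetric TSP instance
        # with n vertices under the 2- and 3-exchange neighbourhoods.
        return {
            "search space size": factorial(n - 1) // 2,
            "2-exchange neighbourhood size": comb(n, 2),    # n*(n-1)/2
            "3-exchange neighbourhood size": comb(n, 3),    # n*(n-1)*(n-2)/6
            "2-exchange diameter bounds": (n // 2, n - 1),  # (lower, upper)
        }

    print(tsp_quantities(10))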
As illustrated in the example, in some cases the diameter of the search space cannot be determined exactly. A similar problem often arises in the computation of the distance between candidate solutions s and s′, which is usually defined to be the length of the shortest path between s and s′ in the given neighbourhood graph. If the exact distance cannot be computed easily, surrogate distance measures are often used that give an approximation of the true distance. For example, in the case of the TSP, the distance d(s, s′) between candidate solutions s and s′ is usually measured in terms of the number of edges that are part of s but not of s′; more precisely, d(s, s′) := n − k, where n is the
number of vertices in the given graph and k is the number of edges contained in both s and s′ (this distance metric is called the bond distance). The bond distance is tightly correlated with the true distance in the standard 2- and 3-exchange neighbourhoods and is used in practically all studies of the search landscape for the TSP [Boese, 1996; Mühlenbein, 1991; Merz and Freisleben, 2001; Stützle and Hoos, 2000]. In general, when using surrogate distance measures in the context of analysing SLS behaviour, it is important to consider properties of candidate solutions and search steps of the given algorithm in an appropriate way. For example, the bond distance for the TSP takes into account that the crucial features of a candidate tour are based on the relative order of its vertices, rather than their absolute position in a list representation of the respective cyclic path.
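As a concrete illustration of this surrogate distance measure, here is a minimal Python sketch of the bond distance (our own; tours are assumed to be given as vertex permutations):

    def tour_edges(tour):
        # Undirected edge set of a cyclic tour given as a vertex permutation.
        n = len(tour)
        return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

    def bond_distance(s, t):
        # d(s, t) := n - k, where k is the number of edges shared by s and t.
        return len(tour_edges(s)) - len(tour_edges(s) & tour_edges(t))

    s = [0, 1, 2, 3, 4, 5]
    t = [0, 1, 2, 4, 3, 5]      # s after one 2-exchange step
    print(bond_distance(s, t))  # -> 2: a 2-exchange step replaces two edges

Representing edges as unordered pairs reflects the fact that, for the symmetric TSP, only the relative order of the vertices matters, not the direction of traversal.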
Number and Density of Solutions

Another factor that has a rather obvious impact on search cost is the number of (optimal) solutions of a given problem instance: For fixed search space size, the more (optimal) solutions there are, the shorter the expected time for finding one of these using Uninformed Random Picking; again, it is reasonable to expect a qualitatively similar negative correlation between the number of (optimal) solutions and search cost for more complex SLS algorithms. In the case of Uninformed Random Picking, it is easy to verify that for a problem instance π with search space S(π) and k (optimal) solutions, the expected number of search steps required for finding an (optimal) solution is #S(π)/k. In this simple example, the expected search cost is inversely proportional to k/#S(π), the density of (optimal) solutions. Unlike the number of solutions, the solution density can be meaningfully compared between problem instances that differ in size. Because solution density values are typically very small, it is often more convenient to report them as −log10(k/#S(π)). As illustrated in the following example, for SLS algorithms more powerful than Uninformed Random Picking, the search cost can be strongly correlated with solution density, but the relationship is usually not precisely an inverse proportionality.
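For concreteness, the expected cost of Uninformed Random Picking and the corresponding solution density measure can be computed as follows (a small sketch of ours, using hypothetical instance parameters):

    import math

    space_size, num_solutions = 2 ** 100, 10 ** 5   # hypothetical instance

    # Expected number of Uninformed Random Picking steps: #S(pi)/k, since
    # the number of steps until the first success is geometrically distributed.
    expected_cost = space_size / num_solutions

    # Solution density, reported as -log10(k / #S(pi)) as in the text.
    neg_log_density = -math.log10(num_solutions / space_size)

    print(expected_cost, neg_log_density)   # ~1.27e25 steps, ~25.1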
Example 5.2 Solution Density vs Search Cost for GWSAT
The impact of solution density on search cost has been demonstrated clearly for Uniform Random 3-SAT, a well-known class of benchmark instances for SAT (see also Chapter 6, Section 6.1) [Clark et al., 1996; Hoos, 1999b].
Figure 5.1 Left: Distribution of solution density over a set of hard Uniform Random 3-SAT instances with 100 variables and 430 clauses, shown as a cumulative distribution function over −log10(solution density), together with the corresponding empirical probability density function (histogram, scaled by a factor of 10), which is well approximated by a log-normal distribution logN[26.5, 1.26] (dashed grey curve); the arrow indicates the solution density value of an instance with a single solution. Right: Correlation between −log10(solution density) and the search cost for GWSAT (mean number of search steps, shown on a logarithmic scale). (For details, see text.)
The left side of Figure 5.1 shows the distribution of the solution density over a set of hard Uniform Random 3-SAT instances with 100 variables and 430 clauses (as determined by a suitably modified systematic search algorithm). Clearly, there is a large variability in the solution density over this test-set. Furthermore, as shown in the figure, the distribution of the solution density, and hence the distribution of the number of solutions, is well approximated by a log-normal distribution. This is an effect of the random process used for generating these problem instances, during which each point in the search space (i.e., each variable assignment) is uniformly affected by each independently generated clause. (It may be noted that the empirical distribution of the logarithms of the solution density values does not pass the Shapiro-Wilk normality test at the 0.05 significance level; however, as can be easily seen from a quantile-quantile plot, the reason for this solely lies in a discretisation effect for extremely small solution densities.) As can be seen from the right side of Figure 5.1, there is a strong correlation between the solution density and the search cost of GWSAT, a well-known randomised best-improvement algorithm for SAT (see also Section 6.2, page 269f.). Here, the search cost for a given instance represents the mean number of search steps required for finding a solution,
using GWSAT with an approximately optimal noise parameter setting and no restart. A more detailed analysis reveals that there is a strong linear dependency between the negative logarithm of the solution density, − log10 (sd), and the logarithm of the search cost, log10 (sc) (the Pearson correlation coefficient is 0.83); this indicates a polynomial relationship between sd and sc. Notably, there is a substantial variation in search cost, especially for small numbers of solutions, which indicates that factors other than solution density have a significant impact on search cost. Analogous results have been obtained for sets of randomly generated instances of graph colouring and constraint satisfaction problems [Clark et al., 1996; Hoos, 1999b].
For typical combinatorial problems, the density of (optimal) solutions is very low. For example, while it has been shown for critically constrained Uniform Random 3-SAT that the average number of solutions scales exponentially with the number of variables, n [Monasson and Zecchina, 1996; 1997], closer examination reveals that the solution density drops exponentially with n. In many cases, structured instances of combinatorial decision problems have unique or very few solutions (e.g., this is the case for a number of well-known SAT-encoded instances of blocksworld planning problems first used by Kautz and Selman [1996]); the same applies to many types of combinatorial optimisation problems. For very small problem instances, solution density can be determined by exhaustive enumeration or estimated using simple sampling techniques. In some cases, in particular for SAT, the number of solutions for moderately large instances can be determined using suitably modified variants of systematic search algorithms (see, e.g., Birnbaum and Lozinskii [1999]; Bayardo Jr. and Pehoushek [2000]). Furthermore, there are situations in which (expected) solution densities can be analytically derived; this is the case, for instance, for the previously mentioned class of Uniform Random 3-SAT [Monasson and Zecchina, 1996; 1997]. In most cases, however, determining or estimating solution densities is a difficult problem, for which no general solution methods are available. One method that is fairly regularly used in practice for sampling the set of (optimal) solutions of a given problem instance is based on performing multiple runs of a high-performance SLS algorithm. Typically, this method leads to very biased samples, and it is often infeasible to perform sufficiently many runs to obtain a reasonably good estimate of the solution density. Additionally, in the case of optimisation problems, it requires knowledge of the optimal solution quality, which in many cases is not available.
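For very small instances, the exhaustive enumeration mentioned above is straightforward to implement; the following sketch (our own; the clause encoding is an assumption) counts the models of a small CNF formula:

    from itertools import product

    def count_solutions(n, clauses):
        # Count satisfying assignments by exhaustive enumeration; feasible
        # only for very small n. A literal v > 0 denotes variable v, and
        # -v denotes its negation (variables are numbered 1..n).
        count = 0
        for bits in product([False, True], repeat=n):
            if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
                   for clause in clauses):
                count += 1
        return count

    clauses = [[1, 2], [-2, 3]]          # (x1 v x2) and (not x2 v x3)
    k = count_solutions(3, clauses)
    print(k, k / 2 ** 3)                 # 4 solutions, density 0.5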
Distribution of Solutions

For problem instances with multiple (optimal) solutions, and particularly in the case of non-vanishing solution densities, the distribution of the solutions across the search space can have an impact on search cost. This can be illustrated by considering the two extreme cases of evenly distributed and tightly clustered solutions. If solutions are evenly distributed within the search space, the expected number of steps for reaching a solution using Uninformed Random Walk is very similar across the entire search space. If all solutions are tightly clustered in a relatively small region of the space, the search cost varies considerably depending on the starting point of the walk. Intuitively, more efficient SLS algorithms should exhibit similar behaviour. One method for measuring how evenly solutions are distributed within a given search space is based on the pairwise distances within a sample Ŝ of solutions that has been obtained, for example, by running a high-performance stochastic algorithm multiple times. (Ideally, the entire set of solutions should be used; but as discussed above, this is typically infeasible except in the case of small problem instances.) We now consider the distributions D(s) of the distances d(s, s′) between any solution s ∈ Ŝ and all other solutions s′ ∈ Ŝ, and note that large variations between the D(s) indicate an uneven solution distribution. For this approach to work well, it is important to use an unbiased sample of the solution set; unfortunately, obtaining such unbiased samples is often hard or impossible. In many cases, decision problems such as SAT have a large number of solutions that occur in the form of tightly connected regions. These solution plateaus will be discussed in more detail in Section 5.5.
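The distance-distribution method just described can be sketched in a few lines of Python (our own illustration; solutions are assumed to be given as 0/1 tuples and distances measured as Hamming distances):

    from statistics import mean, stdev

    def hamming(s, t):
        return sum(a != b for a, b in zip(s, t))

    def mean_distance_profile(sample):
        # For each solution s in the sample, summarise the distribution D(s)
        # of distances to all other sampled solutions by its mean; a large
        # spread of these means suggests an uneven solution distribution.
        return [mean(hamming(s, t) for t in sample if t is not s)
                for s in sample]

    sample = [(0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1)]
    profile = mean_distance_profile(sample)
    print(mean(profile), stdev(profile))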
5.2 Search Landscapes and Local Minima

In all but the simplest SLS algorithms, the search process is guided by an evaluation function g(π) which, in the case of optimisation problems, is often identical with the given objective function f(π). To capture the impact of the evaluation function in combination with search space properties and the neighbourhood relation, we introduce the well-known concept of a search landscape.

Definition 5.1 Search Landscape
Given an SLS algorithm A and a problem instance π with associated search space S (π ) and neighbourhood relation N (π ), as well as an evaluation
function g(π) : S(π) → R, the search landscape (or landscape, for short) of π, L(π), is defined as L(π) := (S(π), N(π), g(π)).
For a given search landscape L = (S, N, g ), we will refer to the evaluation function value g (s) of a given position s also as the level of s. Furthermore, for convenience, we will sometimes refer to the fundamental properties of S and GN = (S, N ) discussed in Section 5.1 as properties of L; and following common usage, we will occasionally use the term search space structure to refer to the structure of L.
Remark: Formal definitions that differ from the one given here can be found
in the literature; the differences are, however, superficial only and do not affect the common underlying concept. Frequently, the term fitness landscape is used to refer to the same concept. This term dates back to some of the earliest studies of search landscapes, which were performed in the context of evolutionary theory [Wright, 1932], and has since been used in the study of the factors underlying the behaviour of Evolutionary Algorithms [Kallel et al., 2001].
Basic Landscape Properties

Both global and local properties of search landscapes can have an impact on the behaviour and performance of SLS algorithms. The following properties provide the basis for a useful classification of landscapes:
Definition 5.2 Invertible, Locally Invertible and Non-Neutral Landscapes
Let L := (S, N, g) be a search landscape. L is:

• invertible (or non-degenerate), if no two positions in S have the same level, that is, ∀s, s′ ∈ S : [s ≠ s′ ⇒ g(s) ≠ g(s′)]. Landscapes that do not have this property are also called degenerate.

• locally invertible, if any local neighbourhood in L is invertible, that is, ∀r ∈ S : [∀s, s′ ∈ N(r) ∪ {r} : [s ≠ s′ ⇒ g(s) ≠ g(s′)]].

• non-neutral, if neighbouring positions in L always have different levels, that is, ∀s ∈ S : [∀s′ ∈ N(s) : [s ≠ s′ ⇒ g(s) ≠ g(s′)]]. Landscapes in
which neighbouring positions may have the same level are also called neutral.
Obviously, every invertible landscape is locally invertible, and every locally invertible landscape is non-neutral; however, there are non-neutral landscapes that are not locally invertible and locally invertible landscapes that are not invertible [Flamm et al., 2002]. Although exceptions exist, the search landscapes encountered by SLS algorithms for combinatorial problems are often degenerate. For example, when using the number of violated clauses as an evaluation function in the case of SAT, the landscapes are typically degenerate and neutral; in fact, as we will discuss in more detail in Section 5.5, these landscapes are characterised by large and numerous plateaus. On the other hand, there are many classes of combinatorial optimisation problems that give rise to locally invertible landscapes; for example, the landscapes for Euclidean TSP instances obtained from a set of randomly placed vertices under the standard 2-edge-exchange neighbourhood can be expected to be locally invertible. Note that problem instances with invertible landscapes always have unique (optimal) solutions. The converse, however, does not hold: single solution instances can have degenerate landscapes. Locally invertible landscapes have the interesting property that when performing an iterative best improvement search, the end point of the respective search trajectory is uniquely determined by its starting point [Flamm et al., 2002].
Position Types and Position Type Distributions

While basic properties, such as invertibility or neutrality, can be useful for characterising landscapes at a global level, the detailed analysis of search landscapes is often based on local features. The following definition provides a natural classification of search positions according to the topology of their local neighbourhood.

Definition 5.3 Position Types
Let L := (S, N, g ) be a search landscape. For a position s ∈ S , we define the following functions, which determine the number of upwards, sidewards and downwards steps from s to its direct neighbours:
upw(s) := #{s′ ∈ N(s) | g(s′) > g(s)}
sidew(s) := #{s′ ∈ N(s) | g(s′) = g(s)}
downw(s) := #{s′ ∈ N(s) | g(s′) < g(s)}
Figure 5.2 Examples for the various types of search positions (illustrating SLMIN, LMIN, IPLAT, LEDGE, SLOPE, LMAX and SLMAX positions).
Based on these functions, we define the following position types:

SLMIN(s) :⇔ downw(s) = sidew(s) = 0
LMIN(s)  :⇔ downw(s) = 0 ∧ sidew(s) > 0 ∧ upw(s) > 0
IPLAT(s) :⇔ downw(s) = upw(s) = 0
LEDGE(s) :⇔ downw(s) > 0 ∧ sidew(s) > 0 ∧ upw(s) > 0
SLOPE(s) :⇔ downw(s) > 0 ∧ sidew(s) = 0 ∧ upw(s) > 0
LMAX(s)  :⇔ downw(s) > 0 ∧ sidew(s) > 0 ∧ upw(s) = 0
SLMAX(s) :⇔ sidew(s) = upw(s) = 0
The positions defined by these predicates are called strict local minima (SLMIN), local minima (LMIN), interior plateau (IPLAT), ledge (LEDGE), slope (SLOPE), local maxima (LMAX) and strict local maxima (SLMAX) positions. For an illustration of the various position types, see Figure 5.2.
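Definition 5.3 translates directly into a classification procedure; the following Python sketch (ours, not from the original text) determines the type of a position from its level and the levels of its direct neighbours:

    def position_type(level, neighbour_levels):
        # Classify a search position according to Definition 5.3.
        up = sum(g > level for g in neighbour_levels)
        side = sum(g == level for g in neighbour_levels)
        down = sum(g < level for g in neighbour_levels)
        if down == 0 and side == 0:
            return "SLMIN"
        if down == 0 and up == 0:
            return "IPLAT"
        if up == 0 and side == 0:
            return "SLMAX"
        if down == 0:
            return "LMIN"    # side > 0 and up > 0
        if up == 0:
            return "LMAX"    # down > 0 and side > 0
        if side == 0:
            return "SLOPE"   # down > 0 and up > 0
        return "LEDGE"       # down > 0, side > 0 and up > 0

    print(position_type(3, [4, 5, 4]))  # SLMIN: all neighbours lie above
    print(position_type(3, [3, 5, 4]))  # LMIN
    print(position_type(3, [2, 3, 4]))  # LEDGE

Because the predicates form a complete partition (see below), exactly one branch applies to every position.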
Note that for any landscape L, the classes of search positions induced by these predicates form a complete partition of S , that is, every search space position falls into exactly one of these types. Note also that these types can be weakly ordered according to the restrictiveness of their defining predicates when assuming that defining equalities are more restrictive than inequalities; in this respect, SLMIN, SLMAX and IPLAT are most restricted, followed by LMIN, LMAX and SLOPE, while LEDGE is least restricted. This ordering can be further refined based on the observation that equalities on the number of sideward steps are less restrictive than those on the number of upward or downward steps; consequently, type SLOPE is less constrained than LMIN and LMAX, while IPLAT is more restrictive than SLMIN and SLMAX. For random landscapes, we would therefore expect a distribution of the position types according to this ordering, that is, the more constrained a position type is, the more rarely it should occur. The relative abundance of position types encountered in a given search landscape can give interesting insights into SLS behaviour; it can be summarised in
the form of position type distributions that specify for each position type T the fraction of type T positions within the given search space. For small search spaces, position type distributions can be precisely determined by exhaustive enumeration. In many cases, however, the search spaces to be analysed are too large for this approach, and sampling methods have to be applied. Unbiased random sampling often suffers from the problem that positions of a type that are particularly relevant for SLS behaviour, such as LMIN, SLMIN or IPLAT positions, can be extremely rare. (Note that most effective SLS algorithms mainly search a small subspace of high-quality candidate solutions, which can be structurally very different from the rest of the search space.) One way of overcoming this problem is to sample position types along SLS trajectories [Hoos, 1998]. The position type distributions thus obtained characterise the regions of the search space seen by the respective SLS algorithm and are hence likely to contain position type information related to the algorithm’s behaviour and performance. The following example illustrates the methods used for determining position type distributions as well as the results obtained from this type of search space analysis.
Example 5.3 Position Type Distributions for SAT
In the context of an analysis of the search space features of various types of SAT instances, the following results on position type distributions have been obtained [Hoos, 1998]: Table 5.1 shows the complete distributions of position types for three small Uniform Random 3-SAT instances from the solubility phase transition region (see also Chapter 6, Section 6.1); the fractions of positions for each type were obtained by exhaustive enumeration of the entire search space.
Instance          avg sc     SLMIN     LMIN     IPLAT   SLOPE    LEDGE     LMAX     SLMAX

uf20-91/easy       13.05     0%        0.11%    0%      0.59%    99.27%    0.04%    < 0.01%
uf20-91/medium     83.25     < 0.01%   0.13%    0%      0.31%    99.40%    0.06%    < 0.01%
uf20-91/hard      563.94     < 0.01%   0.16%    0%      0.56%    99.23%    0.05%    < 0.01%
Table 5.1 Distribution of position types for critically constrained Uniform Random 3-SAT instances with low, intermediate and high search cost for GWSAT, a high-performance SLS algorithm for SAT; the distributions are based on exhaustive enumeration of the search space. (See text for details.)
The instances were selected based on the search cost for GWSAT, a high-performance randomised iterative improvement algorithm for SAT (see also Chapter 6, Section 6.2), measured as the mean number of search steps required by GWSAT to find a solution, using close to optimal parameter settings and estimated from 100 successful runs per instance; the search cost values are indicated as ‘avg sc’ in the table. Entries reading ‘< 0.01%’ indicate that the corresponding values are in the open interval (0%, 0.01%). The results are consistent with the ordering based on the restrictiveness of the position types discussed above: LEDGE positions are predominant, followed by SLOPE, LMIN and LMAX positions; SLMIN and SLMAX positions occur very rarely, and no IPLAT positions were found for any of the instances analysed here or later. This suggests that the search spaces of Uniform Random 3-SAT instances show some structural properties similar to entirely random search spaces; but while random search spaces can be expected to contain equal numbers of LMIN and LMAX as well as of SLMIN and SLMAX positions, this is apparently not the case for the search spaces of Uniform Random 3-SAT instances, where LMIN positions occur more frequently than LMAX positions. This is most probably an effect of the CNF generation mechanism for Uniform Random 3-SAT instances; since each added three-literal CNF clause ‘lifts’ the evaluation function for one eighth of the search positions by one, while the remaining positions remain unaffected, local maxima are more likely to be eliminated as more and more clauses are added. Our results also suggest that for Uniform Random 3-SAT instances from the solubility phase transition, the hardness of instances for SLS algorithms such as GWSAT might be correlated with the number of LMIN positions, which is consistent with the intuition that local minima impede local search (cf. Yokoo [1997]). It is interesting to note that there are no IPLAT positions; consequently, randomised iterative improvement algorithms, such as GWSAT, can always perform either an improving or a worsening step, but are never forced to search the interior of large plateaus. Table 5.2 shows position type distributions for Uniform Random 3-SAT instances that are too large for exhaustive enumeration; since for these instances, random sampling yields almost exclusively LEDGE (and, very seldom, SLOPE) positions, we sampled the position distributions along trajectories of GWSAT. For obtaining these samples, we used 100 runs of the algorithm with a maximum of 1 000 steps each.
Instance             avg sc       SLMIN   LMIN     IPLAT   SLOPE     LEDGE    LMAX   SLMAX

uf50-218/medium        615.25     0%      47.29%   0%      < 0.01%   52.71%   0%     0%
uf100-430/medium     3 410.45     0%      43.89%   0%      0%        56.11%   0%     0%
uf150-645/medium    10 231.89     0%      41.95%   0%      0%        58.05%   0%     0%

Table 5.2 Distribution of position types for critically constrained Uniform Random 3-SAT instances of different size, sampled along GWSAT trajectories. (See text for details.)
Runs during which a solution was found were terminated at that point in order to prevent solution positions from being over-represented in the sample. (GWSAT will, once it has found a solution, return to that solution over and over again with a very high probability, unless the search is restarted.) Therefore, the actual sample sizes vary; however, we made sure that each sample contained at least 50 000 search positions. It may be noted that the differences in search cost between the instances shown in Table 5.2 can be entirely attributed to differences in solution density. The results of this analysis clearly reflect the fact that GWSAT is strongly attracted to local minima positions. They also suggest that neither SLMIN, IPLAT nor SLOPE positions play a significant role in SLS behaviour on Uniform Random 3-SAT instances, which is consistent with our earlier observation for small instances that these position types are extremely rare or do not occur at all. Similar results have been obtained for other types of SAT instances (cf. Hoos [1998]); IPLAT positions were never encountered, and SLOPE as well as SLMIN positions occur only occasionally. Instead, particularly when sampling along GWSAT trajectories, the position type distributions are dominated by LEDGE positions, which mainly represent the gradient descent and plateau escape behaviour of GWSAT, and LMIN positions, which characterise the plateau search phases of GWSAT’s search. Somewhat surprisingly, our results regarding the ratio of LEDGE and LMIN positions along GWSAT’s trajectories, when applied to instances from various problem domains, do not give clear evidence for the plateau search phase having
substantially larger or smaller impact on SLS performance than the gradient descent and plateau escape phases.
Clearly, in non-neutral landscapes, all local minima are strict [Flamm et al., 2002]. Hence, the results described in Example 5.3 illustrate the previously mentioned fact that SAT instances typically have neutral search landscapes; furthermore, the observed position type distributions show that this neutrality is manifested for the vast majority of all search positions. This latter observation exemplifies how, in general, position type distributions can be used to elaborate and quantify basic landscape properties such as degeneracy, local non-invertibility and neutrality.
Number, Density and Distribution of Local Minima

Local minima are amongst the most relevant landscape features in terms of their impact on SLS behaviour. (As always, we assume that any given SLS algorithm attempts to minimise its respective evaluation function.) Clearly, in the case of landscapes that have no local minima other than the global minima (which are, technically speaking, also local minima), even simple iterative improvement algorithms should find (optimal) solutions relatively easily. The hardness of solving typical combinatorial optimisation problems can therefore be attributed to the fact that the respective search landscapes typically contain a large number of local minima that are not global optima. These observations suggest that the number of local minima should be positively correlated with the hardness of the respective problem instance for SLS algorithms. For degenerate landscapes, we define the number of local minima as the total number of LMIN and SLMIN positions; this reflects the intuition that positions of these types have a detrimental effect on SLS behaviour, since they do not allow improving search steps. (In Sections 5.5 and 5.6, we will discuss the larger plateau and basin structures that are not captured by this definition.) Analogous to the case of the number of solutions, rather than the number of local minima, m, the respective local minima density, defined as m/#S, should be considered; again, given the typically very small values of this measure, it is often more convenient to use −log10(m/#S). In Example 5.3 (Table 5.1), we saw some evidence supporting the correlation between the local minima density and search cost (all three SAT instances shown there have the same search space size). Local minima density typically cannot be measured directly (except for extremely small problem instances), and no general estimation methods are
available. However, in some cases, local minima density can be analytically determined or estimated. For example, in completely random landscapes over a neighbourhood graph with #S vertices and vertex degree D, the expected number of local minima is known to be #S/(D + 1), corresponding to a local minima density of 1/(D + 1) [Kauffmann et al., 1988; Stadler, 1995]. In some cases, local minima density can be estimated based on the autocorrelation length, a measure of landscape ruggedness that will be discussed in Section 5.4 [Stadler and Schnabl, 1992; Stadler, 1995]; based on this method, it has been shown for a number of combinatorial problems and neighbourhood relations, including the TSP with the 2-vertex-exchange neighbourhood, that the number of local minima increases exponentially with problem size n (for the TSP, n is the number of vertices in the given graph); at the same time, in most cases, the local minima density decreases exponentially with n (see [Stadler, 1995]). It is intuitively clear that for a given local minima density, the distribution of local minima within a given landscape may have an impact on SLS behaviour: While for a uniform distribution, the localised density of local minima would be the same across the entire landscape, highly non-uniform distributions are associated with a large variability in localised local minima density. A commonly used method for studying the local minima distribution is based on the pairwise distances between a set of local minima sampled by means of SLS algorithms. For sampling the local minima, typically relatively simple SLS methods, such as iterative improvement algorithms, are used, although in principle, more complex and effective methods can be equally well applied. It may be noted that the SLS method used in this context will typically introduce a bias into the sampling process; in order to obtain a meaningful sample, it is important to ensure that the sampling algorithm (i) is well diversified and (ii) reaches reasonably high-quality local minima. The first of these criteria is typically achieved by means of randomisation, while the second calls for the application of an SLS algorithm with reasonably high performance. Given a sample of local minima, one calculates either the empirical distribution of the pairwise distances between the respective positions in the underlying search landscape, or the empirical distribution of the distances between the local minima and the closest (optimal) solution. In either case, the range of observed distances and the modes of the distance distributions reflect important properties of the distribution of local minima across the landscape. For example, a multi-modal distance distribution indicates the concentration of local minima in a number of clusters, where the lower modes correspond to the intra-cluster distances, while the higher modes represent the inter-cluster distances. A more detailed analysis of the clustering of local minima can be performed using standard statistical techniques for cluster analysis [Everitt et al., 2001]. As illustrated in the following example, this method can be used for showing that for many
Euclidean TSP instances, all local minima occur in a relatively small region of the search space.
Example 5.4 Distribution of Local Minima for the TSP
In order to analyse the distribution of local minima within the search landscapes for various TSP instances (all taken from TSPLIB), a set M of distinct local minima was obtained from 1 000 runs of a 3-opt first-improvement local search algorithm, where each run started from a different random initial solution. The 3-opt algorithm uses standard speed-up techniques based on fixed-radius nearest neighbour searches with candidate lists of the 40 nearest neighbours (see Chapter 8, Section 8.2 for details on these speed-up techniques). The pairwise distances d(s, s′) between any two locally minimal candidate solutions s and s′ in M were measured in terms of the bond distance, that is, d(s, s′) is equal to the number of edges that occur in s but not in s′. Furthermore, for all elements of M, we determined the minimal bond distance to a set of optimal tours; for any given problem instance, this set was obtained by selecting the best candidate solutions generated in 100 independent runs of a high-performance SLS algorithm for the TSP. (It may be noted that using this method, for all instances considered here, except for pr1002, multiple optimal solutions were found; in all cases, the provably optimal solution quality is known.) The same experiment was performed for an ILS algorithm that uses the previously described 3-opt algorithm as its subsidiary local search procedure and is capable of reaching substantially higher-quality local minima. This ILS algorithm uses a single random double-bridge move for perturbation and an acceptance criterion that always selects the better of two given candidate solutions. In this experiment, it was run 200 times on each given TSP instance in order to obtain a set of sample local minima; each run was terminated after n iterations, where n is the number of vertices in the given graph. The results of these experiments are shown in Table 5.3 and Figure 5.3. Generally, the average distance between the local minima is very small compared to the maximal possible bond distance of n between any two candidate solutions. This indicates that all local minima are concentrated in a relatively small region of the respective search landscape. Furthermore, the average and maximum distances between local minima and between locally minimal and globally optimal tours are rather similar; however, the average distance between local minima is slightly larger than the average distance from a local minimum to its (purportedly) closest global optimum. These observations suggest that optimal solutions are located centrally within the region of high local minima density.
Instance     n       avg sq [%]   avg dlmin   min dopt   avg dopt   max dopt

Results for 3-opt

rat783       783     3.45         197.8       147        185.90     231
pr1002       1 002   3.58         242.0       141        208.6      273
pcb1173      1 173   4.81         274.6       198        246.0      293

Results for Iterated Local Search

rat783       783     0.92         142.2        67        123.1      166
pr1002       1 002   0.85         177.2        67        143.2      207
pcb1173      1 173   1.05         177.4        58        151.8      208
Table 5.3 Experimental results on the distribution and solution quality of local minima for three well-known TSP instances from TSPLIB. The data on local minima were collected from 1 000 runs of a simple 3-opt algorithm (upper part of the table) and from 200 runs of a more powerful ILS algorithm (lower part of the table); n denotes the number of vertices in the given graph, avg sq is the average relative solution quality of the local minima (measured as percentage deviation from the optimal solution quality of the given instance), avg dlmin is the average distance between the local minima, and min dopt, avg dopt and max dopt denote the minimum, average and maximum distance between a local minimum and the (purportedly) closest globally optimal solution. (For details, see text.)

Figure 5.3 Cumulative distributions of the bond distance between local minima and the respective (purportedly) closest global optimum (left curve, d(opt)) and of the average bond distance between local minima (right curve, d(lmin)); the underlying set of local minima has been obtained from 1 000 runs of a simple 3-opt algorithm on TSPLIB instance pr1002. (For further details, see text.)
Further evidence for this hypothesis comes from the fact that the higher-quality local minima found by the ILS algorithm tend to be significantly closer to each other and to the closest (purportedly) global optimum than observed for the lower-quality local minima found by 3-opt. This indicates that higher-quality local minima tend to be concentrated in smaller regions of the given search space. As a result, in the case of the TSP, local minima are clearly not uniformly distributed within the search landscape. (We will see more evidence for this hypothesis in the next section, in the context of fitness-distance analysis for the TSP.)
The results illustrated in this example, that is, concentration of the local minima in a small region of the landscape and central position of optimal solutions within that region, have been shown to hold for many types of TSP instances, including randomly generated and structured instances, as well as for other combinatorial optimisation problems, such as the Graph Bi-Partitioning Problem [Boese et al., 1994; Merz and Freisleben, 2000b]. However, for other problems, including certain types of instances of the Quadratic Assignment Problem (see also Chapter 10, Section 10.2), the local minima for commonly used neighbourhood relations are distributed across the entire landscape, and the average distance between local minima is close to the diameter of the underlying search graph [Merz and Freisleben, 2000a; Stützle, 1999]. In certain types of highly neutral landscapes, such as the ones typically encountered for SAT instances, local minima positions are clustered in the form of large plateau regions. While this phenomenon is clearly an important aspect of the distribution of local minima within the given landscape, depending on the size and distribution of these plateaus within the given landscape, it may be hard to detect from an empirically sampled local minima distance distribution. (The properties of such plateaus and their impact on SLS behaviour will be discussed in more detail in Section 5.5.)
5.3 Fitness-Distance Correlation

The evaluation function value is the primary guidance mechanism that is used by SLS algorithms in their search for a (globally optimal) solution to a given problem instance. For this guidance to be effective, ideally, the better the candidate solutions are rated, the closer they should be to an optimal solution. Fitness-distance
analysis (FDA) aims to evaluate the nature of the relationship between the solution quality and the distance between solutions within a given search landscape. This relationship can be summarised using a simple measure of the correlation between the quality of solutions in terms of their evaluation function value and their distance to the closest globally optimal solution [Jones and Forrest, 1995].
Definition 5.4 Fitness-Distance Correlation Coefficient
Given a candidate solution s ∈ S , let g (s) be the evaluation function value of s, and let d(s) be the distance of s to the closest global optimum. Given fitness-distance pairs (g (s), d(s)) for all s ∈ S , the fitness-distance correlation coefficient (FDC coefficient) is defined as
\[
\rho_{\mathrm{fdc}}(g,d) := \frac{\mathrm{Cov}(g,d)}{\sigma(g)\cdot\sigma(d)}
= \frac{\overline{g(s)\cdot d(s)} - \overline{g(s)}\cdot\overline{d(s)}}
{\sqrt{\overline{g^2(s)} - \overline{g(s)}^2}\cdot\sqrt{\overline{d^2(s)} - \overline{d(s)}^2}},
\tag{5.1}
\]
where Cov(g, d) denotes the covariance of the fitness-distance pairs (g(s), d(s)) over all s ∈ S; σ(g) and σ(d) are the respective standard deviations of the evaluation function and distance values over all s ∈ S; and the overlined terms in Equation 5.1 denote the averages of g(s), g²(s), d(s), d²(s) and g(s)·d(s), respectively, over all candidate solutions s ∈ S.
Remark: The term ‘fitness-distance correlation’ was originally introduced in
the literature on evolutionary algorithms, where — motivated by the notion of evolutionary fitness of an organism — the term ‘fitness’ is often used to refer to evaluation or objective function values. In the context of minimisation problems, the use of the term ‘fitness’ is somewhat counterintuitive; but given the common usage of the terms ‘fitness’ and ‘fitness-distance correlation’ in the literature, we will use the same terminology in this section.
By definition, we have that −1 ≤ ρfdc (g, d) ≤ 1. The extreme values are taken in case of a perfect linear correlation between fitness and distance. For minimisation problems, a large positive value of ρfdc indicates that the lower the evaluation function value, the closer the respective positions are, on average, to a globally optimal solution. A value close to zero indicates that the evaluation function does not provide much guidance towards globally optimal solutions, while for negative correlations, the evaluation function is actually misleading.
Empirical Evaluation of the Fitness-Distance Relationship

The computation of the exact FDC coefficient would require computing averages over all solutions in the search space. Obviously, this is infeasible even for relatively small instances of combinatorial problems. Therefore, in practice, the FDC coefficient is computed based on a sample of candidate solutions. Given a sample of m candidate solutions {s1, …, sm} with an associated set of fitness-distance pairs {(g1, d1), …, (gm, dm)}, the estimate r_fdc of ρ_fdc is computed as
\[
r_{\mathrm{fdc}} := \frac{\widehat{\mathrm{Cov}}(g,d)}{\hat{\sigma}(g)\cdot\hat{\sigma}(d)},
\tag{5.2}
\]
where
\[
\widehat{\mathrm{Cov}}(g,d) := \frac{1}{m-1}\sum_{i=1}^{m}(g_i - \bar{g})(d_i - \bar{d}),
\tag{5.3}
\]
\[
\hat{\sigma}(g) := \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}(g_i - \bar{g})^2}, \qquad
\hat{\sigma}(d) := \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}(d_i - \bar{d})^2},
\tag{5.4}
\]
and ḡ, d̄ are the averages over the sets G := {g1, …, gm} and D := {d1, …, dm}, respectively; Ĉov(g, d) is the sample covariance of the pairs (gi, di), while σ̂(g) and σ̂(d) are the sample standard deviations of G and D, respectively. The FDC coefficient can be estimated using random samples of candidate solutions [Jones and Forrest, 1995]. However, in the context of powerful SLS algorithms, it is typically more interesting to focus the analysis on locally optimal solutions. The main reason for this lies in the fact that efficient SLS methods are highly biased towards sampling good candidate solutions, and several high-performance SLS methods, such as Iterated Local Search or Memetic Algorithms, can be seen as searching the space of local optima. Consequently, in many studies, fitness-distance analysis and in particular the computation of the FDC coefficient is based on samples of local optima [Boese, 1996; Merz and Freisleben, 1999; 2000b; Reeves, 1999; Stützle and Hoos, 2000]. Such samples are typically obtained from multiple runs of an iterative improvement algorithm that uses Uninformed Random Picking as its initialisation procedure. Occasionally, fitness-distance analysis is also based on candidate solutions obtained from short runs of a high-performance SLS algorithm, which allows the sampling of higher-quality candidate solutions than those typically obtained from iterative improvement algorithms [Boese, 1996; Stützle, 1999].
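A minimal implementation of Equations 5.2 to 5.4 (our own sketch, using NumPy; the function name is hypothetical) might look as follows:

    import numpy as np

    def fdc_coefficient(g, d):
        # Empirical FDC coefficient r_fdc (Equations 5.2-5.4), computed
        # from sampled evaluation function values g and the corresponding
        # distances d to the closest (optimal) solution.
        g, d = np.asarray(g, float), np.asarray(d, float)
        cov = ((g - g.mean()) * (d - d.mean())).sum() / (len(g) - 1)
        return cov / (g.std(ddof=1) * d.std(ddof=1))

    # hypothetical sample: lower evaluation values at smaller distances
    g = [10, 12, 15, 18, 22, 25]
    d = [2, 3, 5, 6, 9, 10]
    print(fdc_coefficient(g, d))   # close to +1: strong correlation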
The same data that are used for estimating FDC coefficients can be graphically displayed in the form of fitness-distance plots. These are scatter plots in which every fitness-distance pair (g, d) is represented by a point with x-coordinate d and y -coordinate g ; evaluation function values g are sometimes measured as percentage deviations from a (purportedly) global optimum. Because the FDC coefficient only measures the linear correlation between distances and evaluation function values, nonlinear relationships that are easily visible in an FDC plot may sometimes be overlooked in an analysis that is solely based on the FDC coefficient. The following example illustrates the use of fitness-distance plots.
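Producing such a plot from sampled fitness-distance pairs is straightforward; here is a minimal sketch of ours, assuming matplotlib is available:

    import matplotlib.pyplot as plt

    def fitness_distance_plot(g, d, fname="fdc_plot.png"):
        # Scatter plot of fitness-distance pairs: x-coordinate d (distance
        # to the closest global optimum), y-coordinate g (evaluation value).
        plt.scatter(d, g, s=10)
        plt.xlabel("distance to closest global optimum")
        plt.ylabel("evaluation function value")
        plt.savefig(fname)

    fitness_distance_plot([10, 12, 15, 18, 22, 25], [2, 3, 5, 6, 9, 10])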
Example 5.5 Fitness-Distance Analysis for the TSP
Using the same 3-opt algorithm as described in Example 5.4 (page 218ff.), we sampled 2 500 local optima for TSPLIB instance rat783. The fitness-distance plot for the respective local minima is shown on the left side of Figure 5.4. Clearly, there is a significant fitness-distance correlation; the corresponding FDC coefficient of 0.68 confirms this observation. Note that the maximum possible distance between candidate solutions for this TSP instance is n = 783. Hence, the distances between local optima and the respective closest global optima are generally relatively small, which confirms the result regarding the distribution of local minima illustrated in Example 5.4.
Figure 5.4 Fitness-distance plots for TSP instance rat783, based on 2 500 local optima obtained from a 3-opt algorithm (left side; distance to global optimum vs percentage deviation from optimum), and for QAP instance tai60a, based on 1 000 local optima obtained from a 2-opt algorithm for the QAP (right side; distance to best known solution vs percentage deviation from best quality). The maximum possible distance is 783 for the TSP instance and 60 for the QAP instance.
However, not all combinatorial problems give rise to search landscapes with significant fitness-distance correlations. The fitness-distance plot shown on the right side of Figure 5.4 was obtained from 1 000 runs of a 2-opt algorithm for the Quadratic Assignment Problem (QAP), applied to tai60a, a well-known benchmark instance from QAPLIB. (The QAP will be discussed in more detail in Chapter 10, Section 10.2.) Clearly, there is no significant fitness-distance correlation (rfdc = 0.03); furthermore, it may be noted that here, different from our previous observations for TSP instances, the distances between the local optima and the (unique) best known solution are generally close to the maximum possible distance of 60. (Since provably optimal solutions for this instance are not known, the best known solution has been used instead.) This indicates that for QAP instances such as the one used in this experiment, the local minima are much more uniformly distributed across the search space than for the TSP. These observations suggest that SLS algorithms for the QAP may require much stronger diversification mechanisms than SLS algorithms for the TSP — a hypothesis that is well supported by more direct evidence.
Applications and Limitations of Fitness-Distance Analysis

The existence of a strong correlation between the evaluation function value and the distance to the closest global optimum, as observed for many types of TSP instances, is often referred to as a big valley structure of the underlying landscape [Boese, 1996]; intuitively, in this structure an (optimal) solution is surrounded by a large number of local minima with evaluation function values that deteriorate with increasing distance from the (optimal) solution. In the case of maximisation problems, the analogous phenomenon is typically referred to as the massif central. Further evidence for a big valley structure can be obtained from results on the correlation between the solution quality and the average distance between one element and any other element of a given set of locally optimal candidate solutions [Mühlenbein et al., 1988; Mühlenbein, 1991; Kauffman, 1993; Boese et al., 1994]. An example of such a correlation is illustrated graphically in Figure 5.5; the correlation between solution quality and average distance values indicates that high-quality local minima tend to be located centrally within the region containing all local minima represented by this sample. The presence of a big valley structure and high FDC has an impact on the design and behaviour of SLS algorithms. In the case of search landscapes with a marked big valley structure, the use of strong intensification mechanisms usually leads to better performing SLS algorithms.
Figure 5.5 Average distance among a sample of 2 500 local minima sampled using a 3-opt algorithm versus the percentage deviation from the optimal solution quality for TSP instance rat783.
For example, in Iterated Local Search this can be achieved by a greedy acceptance criterion that always accepts the better of two given candidate solutions and thus helps to focus the search around the best candidate solution found so far. Also, for landscapes with high FDC, the use of restart strategies can be rather detrimental, and better performance can typically be achieved by using weaker forms of search diversification or perturbation. Fitness-distance analysis can also be used to compare the effectiveness of neighbourhood relations used in different SLS algorithms. Neighbourhoods that lead to landscapes with higher FDC coefficients typically provide the basis for more efficient SLS algorithms. This has been shown in the case of the TSP [Boese, 1996; Merz and Freisleben, 2001]; similar results have been obtained for the Graph Bi-Partitioning Problem [Merz and Freisleben, 2000b]. FDC is widely used to analyse the difficulty of problems and problem instances based on the idea that, all else being equal, high fitness-distance correlations should render problem instances easier to solve for SLS algorithms, since the respective search landscapes provide more global guidance. However, fitness-distance analysis (FDA) has a number of shortcomings and limitations. Firstly, FDA is based on the knowledge of (optimal) solutions, and it can typically only be used as an a posteriori method for assessing the hardness of problem instances or the performance of SLS algorithms. However, this is not a strong drawback if FDA results for individual problem instances can be generalised to larger instance classes; current experience suggests that this is often the case. Furthermore, even if used as an a posteriori method, FDA can play an important role in the context of explaining and understanding the behaviour of SLS
algorithms. In many cases, the observations and insights gained from FDA can be exploited to develop new and better performing algorithms [Boese et al., 1994; Stützle, 1998c; Finger et al., 2002]. Secondly, for many instances of optimisation problems, optimal solutions cannot be readily computed. In these cases, sometimes the best known solutions are used as the basis of a fitness-distance analysis (cf. Example 5.5). This appears to be reasonable, since in many cases, the goal of an SLS application is to reach high-quality, but not necessarily optimal, solutions; however, proceeding in this way may lead to erroneous conclusions if these best known solutions are rather distant from the true global optima. Finally, it has been shown in the literature that fitness-distance analysis can lead to counterintuitive results if the FDC coefficient is used as a basis for classifying problem instances according to their hardness for SLS algorithms. For example, Altenberg [1997a] constructed a function that, based on FDC results, should be difficult for a given evolutionary algorithm, but experimentally turns out to be very easy. However, closer examination revealed that when using a distance measure that reflects the effective neighbourhood used by the evolutionary algorithm rather than plain Hamming distance, a high FDC coefficient is obtained, which correctly characterises this problem instance as easy for the given algorithm. A similar case arises for the so-called ridge function R(s), which is defined as

\[
R(s) := \begin{cases}
n + 1 + \#_1 s, & \text{if } \exists i \in \{0, 1, \ldots, n\}: s = 1^i 0^{n-i};\\
n - \#_1 s, & \text{otherwise},
\end{cases}
\tag{5.5}
\]

where any candidate solution s is a string of zeros and ones, n is the length of s, and #_1 s denotes the number of ones in s [Quick et al., 1998]. While global optima of this function are easily found by an iterative improvement algorithm, the FDC averaged over all 2^n candidate solutions is very small, which erroneously suggests that the function would be difficult to optimise. For further criticism of fitness-distance analysis we refer to Naudts and Kallel [2000]. However, despite its known shortcomings, fitness-distance analysis has proven to be a valuable tool for the analysis of search landscapes and has yielded significant insights, particularly into the behaviour of hybrid SLS algorithms.
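The ridge function and the behaviour described above are easy to reproduce; the following sketch (ours, not from the original text) implements R(s) together with a simple 1-flip best-improvement search that maximises it:

    def ridge(s):
        # Ridge function R(s) (Equation 5.5) for a 0/1 list s.
        n, ones = len(s), sum(s)
        on_ridge = all(b == 1 for b in s[:ones])   # s = 1^i 0^(n-i)?
        return n + 1 + ones if on_ridge else n - ones

    def best_improvement_maximise(s, f):
        # Iterative 1-flip best-improvement search (maximisation).
        while True:
            nbrs = [s[:i] + [1 - s[i]] + s[i + 1:] for i in range(len(s))]
            best = max(nbrs, key=f)
            if f(best) <= f(s):
                return s
            s = best

    print(best_improvement_maximise([0] * 8, ridge))  # -> [1]*8, the optimum

Starting from any position, the search first descends to the all-zeros end of the ridge and then climbs it to the global optimum, which is why the instance is easy for iterative improvement despite its low FDC value.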
5.4 Ruggedness

One of the features that strongly influences the performance of SLS algorithms is the ruggedness of the search landscape. Ruggedness describes the degree of variability between the levels of neighbouring positions.
Figure 5.6 Example of a smooth (left side) and a rugged (right side) search landscape.
Intuitively, landscape ruggedness is related to the number of local optima: landscapes with a high density of distinct local optima are typically very rugged, while smooth landscapes can be expected to have fewer local optima (see Figure 5.6). Consequently, more rugged landscapes are intuitively harder to search for SLS algorithms. Conversely, smoother landscapes allow iterative improvement algorithms to perform a larger number of steps before hitting a local optimum, and hence to explore larger parts of the search space, which typically results in higher-quality candidate solutions.
Landscape Correlation Functions

There have been several attempts to formally define the concept of landscape ruggedness. One possibility is to measure the correlation between the levels of positions at a fixed distance i in the given landscape [Weinberger, 1990].
Definition 5.5 Search Landscape Correlation Function
Given a search landscape L := (S, N, g ), the search landscape correlation function of L, ρ(i), is defined as
\[
\rho(i) := \frac{\overline{g(s)\cdot g(s')}_{\,d(s,s')=i} - \overline{g(s)}^2}{\overline{g^2(s)} - \overline{g(s)}^2},
\tag{5.6}
\]
where the first two overlined terms denote the averages of g(s) and g²(s), respectively, measured over all positions in the given space S, and the subscripted term is the average of g(s) · g(s′) over all pairs of positions s, s′ ∈ S for which d(s, s′) = i.
The correlation structure of a given landscape is characterised by the values
ρ(i) for different distances i. The most important correlation is the first-order
correlation, ρ(1), which captures the statistical dependency between the level of a position and its direct neighbours. Values of ρ(1) that are close to one indicate that neighbouring positions tend to have very similar levels; intuitively, this corresponds to a smooth landscape. (Note that a high correlation value ρ(1) implies that positions whose level is far below the average over the entire landscape can be expected to have direct neighbours at similarly low levels.) In contrast, a ρ(1) value close to zero indicates that there is basically no correlation between the levels of neighbouring positions. In this case, the evaluation function value of any candidate solution is essentially independent of the evaluation function values of its neighbours, which gives rise to very rugged landscapes and renders the problem very hard for local search. Intuitively, a high correlation value ρ(1) occurs, for example, in the TSP: A neighbouring solution under the standard 2-exchange neighbourhood (see Chapter 1, page 42f.) differs in only two of the n possible edges; hence, n − 2 edge weights remain the same, and we expect the cost of two neighbouring tours to be rather similar. Zero correlation results, for example, if each candidate solution is assigned an independently generated random value, that is, in the case of completely random landscapes. In principle, when given a particular instance of some combinatorial problem, the landscape ruggedness could be computed exactly by measuring the evaluation function values of all candidate solutions and the distances between every pair of solutions. In practice, this is not feasible because of the enormous size of the given search spaces. However, for some problems, the quantities that are used in the definition of the landscape correlation function can be determined analytically from the instance data [Angel and Zissimopoulos, 2000; 2001]. In cases where the landscape correlation function cannot be determined analytically, estimation methods have to be used. One fairly common approach is to measure the correlations between neighbouring solutions by means of an (uninformed) random walk (cf. Chapter 1, page 45) [Weinberger, 1990]. Starting from a randomly selected initial candidate solution, a random walk of m steps can be used to generate a sequence of evaluation function values (g1, …, gm), based on which the (empirical) autocorrelation function of the walk can be determined as:
\[
r(i) := \frac{\frac{1}{m-i}\cdot\sum_{k=1}^{m-i}(g_k - \bar{g})\cdot(g_{k+i} - \bar{g})}{\frac{1}{m}\cdot\sum_{k=1}^{m}(g_k - \bar{g})^2},
\tag{5.7}
\]
where ḡ := (1/m) · ∑_{k=1}^{m} g_k is the average of the observed evaluation function values. This autocorrelation function gives statistical information about the strength of the correlation between two points that are i steps apart in the random walk. Note that if the empirical autocorrelation function of a random walk is to correctly summarise the information on the correlation structure of the search
landscape, one necessary assumption is that the landscape is isotropic; this basically means that the starting point of the random walk has no influence on the statistical information obtained from its trajectory. In this situation, any random walk is representative of the entire search landscape. For many problems, the empirical autocorrelation functions of random walks show an exponential decay of the form r(i) = e^{−i/l}, where the parameter l is called the (empirical) correlation length of the search landscape. Based on this observation, as long as r(1) ≠ 0, we can define the (empirical) correlation length as
l :=
1 ln(|r (1)|)
.
(5.8)
Like r(1), the correlation length summarises the ruggedness of the given landscape: the larger the correlation length, the smoother the search landscape. The correlation length typically depends on instance size [Stadler, 1995]; therefore, it is often normalised by the diameter of the neighbourhood graph, yielding the scaled correlation length l′ := l/diam(G_N(π)). A measure similar to the correlation length, called the autocorrelation coefficient, was introduced by Angel and Zissimopoulos [1998]. It is defined as ξ := 1/(1 − ρ(1)); intuitively, large values of ξ correspond to smooth landscapes.
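The following Python sketch illustrates how these quantities can be estimated in practice; it is a minimal illustration rather than production code, and the functions random_neighbour and evaluate are hypothetical placeholders for a problem-specific neighbourhood sampler and evaluation function.

import math

def empirical_autocorrelation(g_values, i):
    """Empirical autocorrelation r(i) of a series of evaluation function
    values sampled along a random walk (Equation 5.7)."""
    m = len(g_values)
    g_bar = sum(g_values) / m
    num = sum((g_values[k] - g_bar) * (g_values[k + i] - g_bar)
              for k in range(m - i)) / (m - i)
    den = sum((g - g_bar) ** 2 for g in g_values) / m
    return num / den

def estimate_correlation_length(initial, random_neighbour, evaluate, m=10000):
    """Perform an m-step uninformed random walk from `initial` and return
    the empirical correlation length l := -1 / ln(|r(1)|) (Equation 5.8);
    this requires r(1) != 0."""
    s, gs = initial, []
    for _ in range(m):
        s = random_neighbour(s)      # uniformly chosen direct neighbour
        gs.append(evaluate(s))
    r1 = empirical_autocorrelation(gs, 1)
    return -1.0 / math.log(abs(r1))

Note that the reliability of the resulting estimate depends on the walk length m; smooth landscapes with long correlation lengths require correspondingly long walks.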
Random Landscapes

Landscape correlation functions can also be computed for random distributions of problem instances. For example, some well-known random instance distributions for symmetric TSP instances are obtained by assigning random weights to the edges of a given graph, while others are defined by randomly placing points in the Euclidean plane (see Chapter 8, Section 8.1 for more details). In such cases, the evaluation function value of each position can be seen as a random variable, and instead of the quantities of Equation 5.6 (page 227), which refer to the landscape of one particular problem instance, one can compute the expected values of the corresponding random variables. For an introduction to the resulting theory, we refer to Stadler [1996]. Examples of how this can be done have been given by Stadler and Schnabl [1992] for the TSP and by Angel and Zissimopoulos [1998], as well as Stadler and Happel [1992], for the Graph Bi-Partitioning Problem. In this context, a particularly important role is played by the so-called AR(1) landscapes, in which all correlations are completely determined by the correlations between directly neighbouring positions. Formally, a landscape is an AR(1) landscape if, and only if, it is isotropic, Gaussian and Markovian. A random landscape
is called isotropic if all candidate solutions have the same expected evaluation function value and if for any pairs of candidate solutions (s, t) and (u, v) with d(s, t) = d(u, v), the expectations E[g(s) · g(t)] and E[g(u) · g(v)] are equal. A random landscape is Gaussian if the evaluation function value of each individual search position follows a Gaussian (normal) distribution. For example, it can be argued that the evaluation function values of random TSP instances are normally distributed. The reason for this is that the length of a tour can be seen as the sum of n random variables, corresponding to the distances between neighbouring vertices in the tour, drawn from some (possibly unknown) distribution with a fixed mean µ and a fixed variance σ²; by the central limit theorem (see, e.g., Rohatgi [1976]), for reasonably large instance sizes n, the resulting tour lengths will be normally distributed. Analogous arguments can be applied to many other combinatorial problems. Finally, a random landscape is Markovian if the evaluation function value of any position depends statistically only on the evaluation function values of its neighbours. It is easy to see that, at least in the limit, a random walk in an AR(1) landscape is an AR(1) (first-order autoregressive) stochastic process [Box and Jenkins, 1970], which leads to an autocorrelation function of the form
ρ(i) = (ρ(1))^i = e^{-i/λ}    (5.9)
where λ is the correlation length. Among the AR(1) landscapes we find random instance distributions of the TSP, the Graph Bi-Partitioning Problem and several other combinatorial problems [Stadler, 1995]. Prototypical examples of random AR(1) landscapes are the NK-landscapes (for details, see the in-depth section on page 231ff.). The NK-landscape model is important in the context of epistasis, a concept originating from genetics (where it refers to the masking of the effects of a set of genes by another set of genes), which is related to landscape ruggedness. In the literature on evolutionary algorithms, the term epistasis has been used to refer to the interaction between the solution components of a given combinatorial problem in terms of their effect on the evaluation function. Intuitively, in cases where the effects of the solution components on the evaluation function are mutually independent, the corresponding optimisation problem is easy to solve, since suitable values for each solution component can be determined independently of the others. But with increasing interactions among solution components, the problem of finding optimal values of the given evaluation function becomes more difficult. Several measures for epistasis have been proposed and used in the literature [Davidor, 1991; Fonlupt et al., 1998]. However, some concerns have been raised regarding these epistasis measures [Naudts and Kallel, 2000; Rochet et al., 1997],
and their use is largely limited to research on the behaviour of evolutionary algorithms.
In Depth: NK-Landscapes

NK-landscapes are a statistical model of search landscapes [Kauffman, 1993]. Given n binary variables s_1, ..., s_n, the evaluation function value of a candidate solution s := (s_1, ..., s_n) is defined as

g(s) := \frac{1}{n} \cdot \sum_{i=1}^{n} g_i(s_i, s_{i_1}, \ldots, s_{i_k}),    (5.10)
that is, g(s) is the average over the contributions of the n individual solution components. The contribution of an individual solution component s_i depends on the value of s_i as well as on the values of k other variables. It is specified in the form of a function g_i : {0, 1}^{k+1} → [0, 1], whose value for each possible assignment of the k + 1 input variables is a random number, typically chosen according to a uniform distribution over the interval [0, 1]. In implementations of NK-landscape generators, the possible values for g_i can be determined efficiently by using a lookup table that contains the function values for all 2^{k+1} possible inputs (see Altenberg [1997b] for a discussion of how to avoid the explicit storage of all function values). Note that for k > 0, changing the value of a single variable s_i will generally affect the evaluation function contributions of all other variables that depend on s_i. The k variables that form the context of the evaluation function contribution of a solution component s_i can be chosen according to different models. The two models originally investigated by Kauffman are the random neighbourhood model, where the k variables are chosen randomly according to a uniform distribution among the n − 1 variables other than s_i, and the adjacent neighbourhood model, in which the k variables closest to s_i in a total ordering s_1, s_2, ..., s_n are chosen. (The adjacent neighbourhood model uses periodic boundaries, that is, variable s_1 is a direct neighbour of variable s_n.) While no significant differences between the two models were found in terms of global properties of the respective families of search landscapes, such as the mean number of local optima or the autocorrelation length [Weinberger, 1991; Kauffman, 1993], they differ substantially with respect to computational complexity. While the optimisation problem for general NK-landscapes is NP-hard, in the case of the adjacent neighbourhood model, it can be solved by a dynamic programming algorithm in time O(n · 2^k) [Weinberger, 1996; Wright et al., 2000]. For the random neighbourhood model, on the other hand, only the special case of k = 1 is polynomially solvable [Wright et al., 2000]. NK-landscapes were designed as a prototype of tunably rugged search landscapes. The properties of the resulting landscapes are influenced by the two parameters n and k. The parameter k corresponds to the order of the interaction between variables; low values of k indicate weak interaction, while high values of k indicate strong interaction between the variables. At the extremes, k = 0 corresponds to landscapes in which the evaluation function contribution of each variable is independent of all other variables; for k = n − 1, on the other hand, the contribution of every variable depends on the values of all other variables, which leads to completely random landscapes, in which
the evaluation function value of any candidate solution is statistically independent of those of its neighbours. When considering the 1-exchange neighbourhood typically used by SLS algorithms for NK-landscapes, under which the value of exactly one variable is flipped in each search step, interesting differences arise for the two extreme cases k = 0 and k = n − 1. For k = 0, all variables can be optimised independently of each other. Consequently, any search position that is not globally optimal has a direct neighbour with a better evaluation function value, and every local optimum is also a global optimum. Note that as a result of using random numbers as the values of the component contributions g_i, these NK-landscapes are expected to have a unique optimal solution. When performing Iterative Improvement on such a landscape, the Hamming distance to the optimal solution is reduced by one in each step; consequently, when initialising the search at a randomly chosen position, the global optimum is reached in n/2 search steps on average. In the case of k = n − 1, the landscapes are completely random. In this case, it can be shown that the probability of an arbitrarily chosen candidate solution being a local optimum is 1/(n + 1), which leads to an expected number of 2^n/(n + 1) local optima. Furthermore, the expected number of iterative improvement steps required to reach a local optimum scales approximately as ln(n − 1). By varying the parameter k, interpolations between these two extreme cases can be obtained. Estimates for the number of local optima and the length of iterative improvement trajectories for large k have been derived by Weinberger [1991]. The empirical autocorrelation function and the correlation length can be approximated by r(i) ≈ (1 − (k + 1)/n)^i and l ≈ n/(k + 1), respectively [Fontana et al., 1993]. Furthermore, the autocorrelation functions ρ(i) for various types of NK-landscapes have been derived analytically [Fontana et al., 1993; Weinberger, 1996]. It has also been shown experimentally that while for low k, NK-landscapes have significant fitness-distance correlations [Kauffman, 1993], with increasing k, the FDC coefficient quickly approaches zero [Merz, 2000].
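As an illustration of the preceding description, the following Python sketch implements an NK-landscape generator based on the random neighbourhood model and explicit lookup tables; the class and variable names are illustrative, and, as discussed above, explicit tables are only practical for small k.

import random

class NKLandscape:
    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        self.n, self.k = n, k
        # For each variable, choose k distinct other variables as its context
        # (random neighbourhood model).
        self.context = [rng.sample([j for j in range(n) if j != i], k)
                        for i in range(n)]
        # Lookup table: 2^(k+1) uniform random contributions per variable;
        # explicit storage is only feasible for small k.
        self.table = [[rng.random() for _ in range(2 ** (k + 1))]
                      for _ in range(n)]

    def evaluate(self, s):
        """g(s): the average of the n component contributions g_i
        (Equation 5.10); s is a list of n bits (0/1 integers)."""
        total = 0.0
        for i in range(self.n):
            bits = [s[i]] + [s[j] for j in self.context[i]]
            index = int(''.join(map(str, bits)), 2)
            total += self.table[i][index]
        return total / self.n

For example, NKLandscape(20, 3).evaluate(s) evaluates a candidate solution s represented as a list of 20 bits.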
Landscape Ruggedness and Local Optima

As stated at the beginning of this section, ruggedness is closely related to the number of local minima in a given landscape; in fact, it has been proposed to call a family of landscapes rugged if the number of local optima increases exponentially with search space size [Palmer, 1991]. This suggests that the number or density of local minima may be a good measure of landscape ruggedness. Since it is typically infeasible to determine the exact number of local minima for realistically sized problem instances, estimation methods have to be used instead. In the context of the random field approach to the analysis of search landscapes, the relationship between landscape ruggedness and the number of local optima is made more precise by the correlation length conjecture. This conjecture establishes a tight relationship between the number of local minima and the correlation length of a given isotropic elementary landscape [Stadler and Schnabl,
1992]. (For the definition of an elementary landscape, see, for example, Stadler [2002a].) More precisely, the correlation length conjecture can be stated as follows. Given an isotropic elementary landscape L, let λ be the correlation length of L (as determined from a random walk on L), and let r(λ) denote the average distance in L that can be reached from a given initial position by a random walk of length λ. Furthermore, let B(r(λ)) be the number of search positions in a region of radius r(λ). Then, it is conjectured that L has approximately #S/B(r(λ)) local optima. The conjecture is based on the observation that for typical elementary landscapes, the correlation length λ directly determines the expected size of correlated regions in L, and in the absence of other distinctive features, each correlated region can be expected to contain only a very small number of local optima (see also García-Pelayo and Stadler [1997]). The correlation length conjecture has been tested on several combinatorial optimisation problems, including the TSP, the Graph Bi-Partitioning Problem [Krakhofer and Stadler, 1996] and p-spin models [Stadler and Krakhofer, 1996]; these studies have yielded empirical evidence in favour of the conjecture. However, the correlation length conjecture cannot be expected to hold when the underlying isotropy assumption is not satisfied. For example, a study of discrete XY-Hamiltonian spin glasses has shown that, as the degree of anisotropy is raised, the predictions obtained from the correlation length conjecture become increasingly inaccurate [García-Pelayo and Stadler, 1997].
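The following Python fragment gives a rough numerical illustration of the conjectured estimate #S/B(r(λ)) for the Boolean hypercube {0,1}^n under the 1-flip neighbourhood; the correlation length value used in the example is hypothetical (chosen according to the NK-landscape approximation λ ≈ n/(k + 1) discussed earlier).

import math
import random

def estimate_local_optima(n, corr_length, walks=1000):
    """Estimate #S / B(r(lambda)) on {0,1}^n under the 1-flip
    neighbourhood, where r(lambda) is the mean Hamming distance reached
    by a random walk of lambda uniform 1-flip steps."""
    lam = max(1, round(corr_length))
    total = 0
    for _ in range(walks):
        s = [0] * n
        for _ in range(lam):
            s[random.randrange(n)] ^= 1   # flip a uniformly chosen bit
        total += sum(s)                   # Hamming distance from the start
    r = round(total / walks)
    # B(r): number of positions within Hamming radius r of a given position.
    ball = sum(math.comb(n, j) for j in range(r + 1))
    return 2 ** n / ball

# Hypothetical example: n = 20, k = 3, lambda ~ n / (k + 1).
print(estimate_local_optima(20, 20 / (3 + 1)))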
Landscape Ruggedness and Algorithm Behaviour

Intuitively, landscape ruggedness has an impact on the behaviour of specific SLS algorithms and on the general effectiveness of stochastic local search for solving a given problem instance. Therefore, at first glance, there should be some hope that measures of ruggedness, such as autocorrelation or scaled correlation length, could help to explain the variation in search cost between different problem instances. However, for several combinatorial problems, including the TSP and the Graph Bi-Partitioning Problem, it has been shown (usually with the help of random landscape theory) that expected autocorrelation and correlation length values depend only on instance size, but not on particular instance features [Stadler and Happel, 1992; Stadler and Schnabl, 1992; Angel and Zissimopoulos, 1998]. Considering the substantial variation in difficulty between instances of the same size, even when the instances stem from the same random instance distribution, this casts some doubt on the suitability of ruggedness measures for explaining or predicting the hardness of individual problem instances.
These doubts are further confirmed by empirical results. For example, it has been shown that in the case of SAT, the correlation length is independent of the ratio of clauses to variables, which is known to critically determine average instance hardness [Rana, 1999]. In a similar vein, Kinnear Jr. [1994] reported in a study of genetic programming algorithms that he was unable to observe any correlation between landscape autocorrelation and the difficulty of problem instances, where difficulty was measured in terms of a difficulty index derived from run-length distributions. (In the same study, the average number of steps required by an iterative improvement algorithm to reach a local optimum was found to be a better predictor of difficulty.) While measures of landscape ruggedness are often insufficient for distinguishing between the hardness of individual problem instances, they can occasionally be useful for analysing differences between various neighbourhood relations for a given problem, for studying the impact of parameter settings of an SLS algorithm on its behaviour, or for classifying the relative difficulty of combinatorial problems at an abstract level. In the following, we briefly exemplify each of these applications.

Most properties of a search landscape are strongly determined by the choice of the underlying neighbourhood relation. In particular, varying the neighbourhood relation of a given search landscape can have substantial effects on its ruggedness. This gives rise to the general idea of using measures of ruggedness for selecting one of several possible neighbourhood relations; intuitively, a neighbourhood relation that minimises landscape ruggedness should provide a better basis for SLS methods that can solve the given problem effectively.

Example 5.6 Neighbourhoods and Ruggedness for the Symmetric TSP
Stadler and Schnabl computed the correlation length of several neighbourhoods for the symmetric and the asymmetric TSP [Stadler and Schnabl, 1992]. In particular, they focused on the transposition neighbourhood, under which two tours are direct neighbours if, and only if, they differ in the positions of two vertices, and the standard 2-exchange neighbourhood (see Figure 1.6, page 44). They showed that the transposition neighbourhood leads to a correlation length of n/4, while the 2-exchange neighbourhood leads to a correlation length twice as large, namely n/2. This analysis indicates that the 2-exchange neighbourhood results in smoother landscapes, which should be beneficial for stochastic local search. This is confirmed by the empirical observation that SLS algorithms based on the 2-exchange neighbourhood usually perform significantly better than algorithms based on the transposition neighbourhood [Reinelt, 1994].
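The following Python sketch performs a rough empirical version of this comparison on a random Euclidean TSP instance, using the random-walk estimator of Equations 5.7 and 5.8; the instance size and walk length are arbitrary choices, and the resulting estimates should approach n/4 and n/2 only roughly.

import math
import random

def tour_length(tour, d):
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

def random_step(tour, kind):
    i, j = sorted(random.sample(range(len(tour)), 2))
    t = tour[:]
    if kind == 'transposition':       # swap the cities at positions i and j
        t[i], t[j] = t[j], t[i]
    else:                             # 2-exchange: reverse the segment i..j
        t[i:j + 1] = reversed(t[i:j + 1])
    return t

n, m = 200, 20000
pts = [(random.random(), random.random()) for _ in range(n)]
d = [[math.dist(p, q) for q in pts] for p in pts]
for kind in ('transposition', '2-exchange'):
    tour, gs = list(range(n)), []
    for _ in range(m):                # uninformed random walk of m steps
        tour = random_step(tour, kind)
        gs.append(tour_length(tour, d))
    g_bar = sum(gs) / m
    r1 = (sum((gs[k] - g_bar) * (gs[k + 1] - g_bar) for k in range(m - 1))
          / (m - 1)) / (sum((g - g_bar) ** 2 for g in gs) / m)
    print(kind, 'estimated correlation length:', -1 / math.log(abs(r1)))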
Care has to be taken when comparing neighbourhoods of different size. For example, a well-known result by Stadler and Schnabl predicts that the landscape ruggedness obtained for the k-exchange neighbourhood for the symmetric TSP is well approximated by an AR(1) process with a correlation length of λ = n/k + O(1) [Stadler and Schnabl, 1992]. In fact, this estimate is rather independent of the particular distance matrix, and it was shown to hold for Euclidean TSP instances (independent of the dimensionality of the space), as well as for TSP instances with random distance matrices. This result suggests that for larger k, smaller correlation lengths (indicative of more rugged landscapes) are obtained. However, the increased ruggedness needs to be contrasted with the smaller diameter of the search space and the empirical results on the solution quality achieved by iterative improvement algorithms based on the respective neighbourhoods.

A second use of ruggedness measures arises in the context of finding good settings for parameters of an SLS algorithm that can influence the shape of the search landscape. Probably the most noteworthy example of this approach was given by Angel and Zissimopoulos [1998], who examined the landscape correlation coefficient for the Graph Bi-Partitioning Problem as a function of a parameter α that determines the penalisation of infeasible candidate solutions in a simulated annealing algorithm for this problem [Johnson et al., 1989]. Interestingly, they found that the landscape correlation coefficient depends on α, and that when using a value α∗ that maximises the correlation coefficient, the simulated annealing algorithm they studied achieved superior performance [Angel and Zissimopoulos, 1998].

Finally, measures of landscape ruggedness have been used as the basis for high-level classifications of the difficulty of combinatorial optimisation problems. One such result led to a ranking of combinatorial problems, including the Travelling Salesman Problem (under different neighbourhood structures), the Quadratic Assignment Problem (see also Chapter 10, Section 10.2), the Weighted Independent Set Problem, the Graph Colouring Problem (see also Chapter 10, Section 10.1), the Maximum Cut Problem and the Low Autocorrelation Bit String Problem, based on the observation of a rough correspondence between the known autocorrelation coefficients and the general difficulty of these problems as reported in the literature [Angel and Zissimopoulos, 2000].
5.5 Plateaus

So far, our discussion of search space features has focused on global features of search landscapes, such as local minima density or the fitness-distance
correlation coefficient, and on features of local neighbourhoods, such as position types. Now, we shift our attention to landscape structures and features that are encountered at an intermediate scale, in particular the plateau regions that are characteristic of the neutral landscapes obtained for many types of combinatorial problems, including SAT.
Plateaus and Their Basic Properties

Before discussing plateau properties, we define the concept of a region and its border. Intuitively, a region is a connected set of search positions, and its border is formed by those positions that have neighbours outside the region. Formally, these concepts can be defined as follows.
Definition 5.6 Region, Border
Let L := (S, N, g) be a search landscape and G_N := (S, N) the corresponding neighbourhood graph. A region in L is a set R ⊆ S of search positions that induces a connected subgraph of G_N, that is, for each pair of positions s, s′ ∈ R, there exists a connecting path s = s_0, s_1, ..., s_k = s′ such that all s_i are in R and all s_i, s_{i+1} are direct neighbours w.r.t. N. The border of a region R is defined as the set of positions within R that have at least one neighbour outside of R, that is, border(R) := {s ∈ R | ∃s′ ∈ S − R : N(s, s′)}.
Plateau regions are simply regions of positions that are all at the same level; plateaus are plateau regions that cannot be further extended by expanding their border. This is captured by the following definition.
Definition 5.7 Plateau Region, Plateau
Let L := (S, N, g) be a search landscape. A region R in L is a plateau region if, and only if, all positions s ∈ R have the same evaluation function value, that is, there exists a level l such that ∀s ∈ R : g(s) = l; in this case, level(R) := l is called the level of plateau region R. A plateau in L is a maximal plateau region, that is, a plateau region P for which no position in border(P) has any neighbour s′ ∈ S − P with g(s′) = level(P).
Figure 5.7 Plateaus in a simple search landscape; note the solution plateau P1 and the various degenerate (single-position) plateaus, such as P2 and P3.2.
Finally, we refer to plateaus that consist entirely of solutions as solution plateaus, while all other plateaus are called non-solution plateaus.
According to this definition, every search position is contained in exactly one plateau; hence, the set of plateaus forms a partition of the search space (see Figure 5.7). It may be noted that in non-neutral landscapes, all plateaus are degenerate in the sense that they consist of a single search position only. Plateaus of size one also occur in most neutral landscapes, such as the landscapes encountered for SAT instances. Two important properties of plateaus are size and diameter. The size of a plateau P is simply the number of positions in P, that is, size(P) := #P. The diameter of P is defined as the diameter of the corresponding subgraph of G_N (the neighbourhood graph of the given landscape), that is, the maximal distance between any two positions s, s′ ∈ P (see also Section 5.1, page 204). It should be noted that the diameter of a plateau P in a given landscape L can be larger than the diameter of the search graph of L, G_N, because, intuitively, P can be 'folded' into G_N. Another important property of a plateau P is its width, defined as the minimal length of a path between any position in P and any position not in P. Plateau width has an impact on the efficacy of plateau escape mechanisms: Intuitively, plateaus of lower width are easier to escape from. Interestingly, for many classes of SAT instances, as previously indicated by the lack of internal plateau positions (see Example 5.3, page 213ff.), all plateaus appear to be of width one (see also Hoos [1998]). Finally, the plateau branching factor of a position s is defined as the fraction of direct neighbours of s that are in the same plateau as s. (When restricted to LMIN positions, the plateau branching factor is also called the local minima
branching factor [Hoos, 1998; 1999b].) Average plateau branching factors close to one indicate highly branched plateaus, while low branching factors are characteristic of weakly branched plateaus. Note that the extreme cases of branching factors of zero and one correspond to degenerate plateaus that consist of a single SLMIN, SLMAX or SLOPE position, and to internal plateau positions, respectively. The effectiveness of exploration and escape mechanisms can be affected by plateau branching; in particular, simple escape mechanisms, such as Uninformed Random Walk, can be expected to be less effective for escaping from more highly branched plateaus. For small problem instances, basic plateau properties can be determined by exhaustive plateau exploration. For this purpose, various standard search algorithms can be used. Depth-first search (see, e.g., Russell and Norvig [2003]) has the advantage that for a given plateau P, it requires only O(diam(P)) space for storing the search frontier. Depending on the plateau topology and the starting point, breadth-first search (see, e.g., Russell and Norvig [2003]) may require substantially more space, but can be adapted more easily for partial plateau exploration by discarding certain search positions from the frontier. More advanced techniques for partial plateau exploration include hybrid search techniques that alternate between phases of breadth-first search with bounded depth and random walk, as well as other types of stochastic sampling methods. Partial plateau exploration can yield useful lower bounds on basic plateau characteristics, such as size and diameter. Typically, rather than measuring the properties of a single plateau, one is interested in the plateau properties that are characteristic of a given landscape or family of landscapes. One approach to investigating such characteristic plateau properties is to determine a sample of search positions, for example, from the endpoints of SLS trajectories or by using dedicated statistical sampling techniques, and to start complete or partial plateau exploration from these positions.
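A minimal Python sketch of exhaustive plateau exploration based on breadth-first search is shown below; neighbours and evaluate are hypothetical problem-specific functions, and search positions are assumed to be hashable.

from collections import deque

def explore_plateau(start, neighbours, evaluate):
    """Return the set of all positions in the plateau containing `start`,
    found by breadth-first search over equal-level neighbours."""
    level = evaluate(start)
    plateau = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for t in neighbours(s):
            if t not in plateau and evaluate(t) == level:
                plateau.add(t)
                frontier.append(t)
    return plateau

From the returned set, the plateau size is obtained directly; the diameter can be determined by additional shortest-path computations within the plateau subgraph.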
Exits and Exit Distributions

Even extensive plateaus do not necessarily impede search progress. As an example of this, imagine a plateau in which every position has a neighbour at a lower level (such a position is called an exit); clearly, even a simple iterative improvement algorithm would not be adversely affected by such a plateau. On the other hand, plateaus that have few or no exits may cause substantial problems for SLS algorithms. The concept of exits and the related notion of open vs. closed plateaus are captured in the following definition.
Definition 5.8 Exits, Open and Closed Plateaus
Let L := (S, N, g) be a search landscape and P a plateau in L. An exit of P is a position s on the border of P that has a neighbour at a lower level than P; that is, s is an exit of P if, and only if, s ∈ border(P) and ∃s′ ∈ S − P : [N(s, s′) ∧ g(s′) < level(P)]. In this context, such a position s′ is called a target of exit s, and (s, s′) is called an exit-target pair. We use Ex(P) to denote the set of exits of plateau P. A plateau P is called open if, and only if, it has exits (i.e., Ex(P) ≠ ∅), and closed otherwise (i.e., Ex(P) = ∅).
Note that the definition of a closed plateau includes strict local minima as a special case. Simple iterative improvement methods can escape from open, but not from closed plateaus if only non-worsening search steps are allowed. Clearly, for open plateaus, the number, or more precisely, the density of exits and their distribution within the plateau have an impact on SLS behaviour: Intuitively, plateaus with a high exit density and a uniform exit distribution should be less detrimental to SLS efficiency than plateaus with a small number of exits that are all clustered together. The exit distance distribution, that is, the distribution of pairwise distances between exits on a given plateau, captures both exit density and exit distribution; in particular, the average exit distance provides a measure of the expected number of steps required for finding an exit from any point of a given plateau. In cases where plateaus can be explored exhaustively, the number of exits and their location within the plateau can be recorded in a straightforward way. From this information, the exit density is directly calculated as #Ex(P)/#P. The exit distance distribution can be determined using any algorithm for finding shortest paths in graphs, such as Dijkstra's algorithm [Dijkstra, 1959]. In fact, this shortest path computation can be built into the search algorithm that is used for exploring the plateau. In principle, similar techniques can be used for estimating exit densities and exit distance distributions (or their characteristics, such as the mean exit distance) of plateaus that are too large for exhaustive exploration; in this case, the information underlying the estimate is obtained from a partial plateau search or sampling method.
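Building on the plateau exploration sketch shown earlier, the following hypothetical Python fragment measures exit density and the mean pairwise exit distance of a fully explored plateau; since all edges have unit weight, breadth-first search can take the place of Dijkstra's algorithm here.

from collections import deque

def exit_statistics(plateau, neighbours, evaluate, level):
    """Exit density and mean pairwise exit distance of a plateau, with
    distances measured within the plateau subgraph."""
    exits = {s for s in plateau
             if any(evaluate(t) < level for t in neighbours(s))}
    density = len(exits) / len(plateau)

    def bfs_dist(src):       # shortest paths restricted to the plateau
        dist = {src: 0}
        q = deque([src])
        while q:
            s = q.popleft()
            for t in neighbours(s):
                if t in plateau and t not in dist:
                    dist[t] = dist[s] + 1
                    q.append(t)
        return dist

    pair_dists = []
    for e in exits:
        d = bfs_dist(e)
        pair_dists.extend(d[f] for f in exits if f != e)
    mean_exit_distance = (sum(pair_dists) / len(pair_dists)
                          if pair_dists else None)
    return density, mean_exit_distance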
Plateau Connectivity and Plateau Connection Graphs

The notion of plateaus and their connections via exits provides a convenient basis for obtaining an abstract, yet relatively detailed view of a neutral landscape.
Figure 5.8 Plateau connection graph for the simple search landscape from Figure 5.7 (page 237); the edge-weights are not explicitly indicated and are all equal to one.
Intuitively, this is achieved by collapsing all positions that belong to the same plateau into a single 'macro position' (or 'macro state'). These form the vertices of a so-called plateau connection graph (PCG), whose edges indicate the existence of exits that directly connect the respective plateaus. (For a simple example, see Figure 5.8.)

Definition 5.9 Plateau Connection Graph
The plateau connection graph (PCG) of a landscape L is a directed graph G := (V, E) whose vertices are the plateaus in L and in which there is an edge e := (P, P′) ∈ E if, and only if, P has an exit with a target in P′, that is, ∃s ∈ Ex(P), s′ ∈ P′ : [N(s, s′) ∧ g(s′) < g(s)].
It is often useful to refine the notion of a PCG by assigning weights to the edges of the graph that indicate how strongly two given plateaus are connected or how likely a particular SLS algorithm is to reach one from the other. One way of defining such a weighted plateau connection graph is to define the weight of an edge between plateaus P and P′ as w((P, P′)) := #ETP(P, P′)/#ETP(P), where ETP(P) is the set of all exit-target pairs (s, s′) for which s ∈ Ex(P), and ETP(P, P′) := {(s, s′) ∈ ETP(P) | s′ ∈ P′}. In this case, w((P, P′)) intuitively corresponds to the probability of reaching P′ from P under the simplifying assumption that all exit-target pairs (s, s′) are used with equal probability.
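The following Python sketch shows one way of computing these edge weights from a complete enumeration of the plateaus of a landscape; plateau_of is a hypothetical mapping from a position to an identifier of its plateau.

from collections import defaultdict

def weighted_pcg(plateaus, neighbours, evaluate, plateau_of):
    """Edge weights w((P, P')) := #ETP(P, P') / #ETP(P) of the weighted
    plateau connection graph, given an enumeration of all plateaus."""
    etp = defaultdict(int)        # exit-target pair counts per plateau pair
    etp_total = defaultdict(int)  # exit-target pair counts per source plateau
    for P in plateaus:
        for s in P:
            for t in neighbours(s):
                if evaluate(t) < evaluate(s):   # (s, t) is an exit-target pair
                    etp[(plateau_of(s), plateau_of(t))] += 1
                    etp_total[plateau_of(s)] += 1
    return {(P, Q): c / etp_total[P] for (P, Q), c in etp.items()}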
Determining complete plateau connection graphs typically requires exhaustive enumeration of the entire search space. In cases where this is feasible, the PCG can be constructed during the enumeration process based on information on the sets ETP(P, P′). Note that by exploring the plateaus one at a time, the PCG can be built vertex by vertex, and the space requirement of the overall algorithm is dominated by the space needed for storing the frontier of the search process used for plateau exploration. When exhaustive enumeration of the search space is infeasible, it is often useful to construct certain subgraphs of the full PCG by exhaustively searching a subset of the plateaus, for example, plateaus below a certain level or plateaus that a given SLS algorithm is likely to visit. The latter can be determined by exploring plateaus starting from positions that have been sampled from sets of SLS trajectories. In cases where any exhaustive plateau exploration is infeasible, parts of a given PCG can still be approximated based on partial plateau exploration techniques; this approach typically does not yield proper subgraphs of the PCG, since some of the exits between two partially explored plateaus may not have been found during the partial plateau search.

Example 5.7 Plateau Connection Graphs for SAT
Figures 5.9 and 5.10 show the partial weighted plateau connection graphs for two critically constrained Uniform Random 3-SAT instances; the first of these is relatively easy for high-performance SLS algorithms for SAT, such as GWSAT (see Chapter 6, Section 6.2), while the other is relatively hard.
Figure 5.9 Partial plateau connection graph of an easy satisfiable Uniform Random 3-SAT instance with 20 variables and 91 clauses. (For further explanations, see text.)
Figure 5.10 Partial plateau connection graph of a hard satisfiable Uniform Random 3-SAT instance with 20 variables and 91 clauses. (For further explanations, see text.)
In these illustrations, open plateaus are shown as elliptical nodes, closed non-solution plateaus as rectangular nodes and solution plateaus as triangular nodes. The y-coordinates of the nodes as well as the node labels indicate the respective plateau level; more precisely, a node label of the form l.i is used for plateau number i at level l. The edge-weights of the PCG, which have been determined as explained previously (cf. page 240), are indicated by edge labels and line styles; in particular, dashed lines indicate edges of weight < 0.05, and dotted lines indicate some of the edges of weight < 0.01. For both instances, there is only a single plateau per level with exit density < 1 at level 5 and above. (Plateaus with exit density 1 are essentially irrelevant in the context of SLS behaviour, because they pose no challenges for the search process.) These partial PCGs have been obtained by complete exploration of all plateaus and their respective exits. As can be seen from their respective (partial) PCGs, both the easy and the hard instance have a very similar number of closed plateaus and a single solution plateau. However, for the easy instance, the plateaus are connected
in such a way that following the maximum weight path leads directly to the solution plateau, while for the hard instance, the maximum weight path leads to the closed non-solution plateau 1.1. Intuitively, the easy instance has a very attractive solution that can be reached easily from almost everywhere in the search space, while the hard instance has a very attractive non-solution region, a 'trap', from which it is quite hard to reach a solution. There is some indication that such differences in plateau connectivity typically underlie the observed differences in hardness (for SLS algorithms) between otherwise very similar SAT instances [Hoos, 2002b].
As illustrated in Example 5.7, certain features of the plateau connection graph have a significant impact on SLS behaviour; for example, the occurrence of closed non-solution plateaus, especially when they are located at the bottom of a system of plateaus that all feed predominantly into that same ‘sink’, can be expected to impede search progress and may cause search stagnation. Structures such as these ‘deep basins’ or ‘traps’ can occur in non-neutral as well as in neutral landscapes. As illustrated in the example, they are important for understanding SLS behaviour.
5.6 Barriers and Basins

Not all strict local minima or closed plateaus are equally hard to escape from. One factor that is intuitively connected with the difficulty of achieving an improvement in the given evaluation function g from a strict local minimum or closed plateau is the difference in g that needs to be overcome in order to reach a position at a lower level. The following definition formalises this idea for arbitrary positions in a landscape (cf. Hajek [1988], Flamm et al. [2002]).
Definition 5.10 Mutual Accessibility, Barrier Level, Depth
Given a landscape L := (S, N, g), two positions s, s′ ∈ S are mutually accessible at level l if, and only if, there is a path in the neighbourhood graph G_N := (S, N) that connects s and s′ and visits only positions t with g(t) ≤ l. The barrier level between positions s and s′, denoted bl(s, s′), is the lowest level l at which s and s′ are mutually accessible. The barrier height between s and s′, denoted bh(s, s′), is defined as bh(s, s′) := bl(s, s′) − g(s).
Figure 5.11 Some advanced landscape features. (See text for discussion.)
Finally, we define the depth of a position s ∈ S as the minimal barrier height bh(s, s′), where s′ is any position with g(s′) < g(s); the depth of s is defined as ∞ if s is a global minimum of L.
Example 5.8 Mutual Accessibility, Barrier Level and Depth
Consider the simple search landscape in Figure 5.11. Positions r and t are mutually accessible at level 3, but not at level 2; level 3 is also the barrier level between r and t. The barrier height between r and t is 1, while the barrier height between r and q is 2. Note that the barrier level is symmetric, while the barrier height is not: bl(u, v) = bl(v, u) = 3, but bh(u, v) = 1 ≠ 2 = bh(v, u). The depth of q, u and v is 1, 0 and 2, respectively.
For many types of SLS algorithms that can escape from strict local minima or closed plateaus, such as Probabilistic Iterative Improvement, the probability of moving from one search position to another within a given number of search steps is negatively correlated with the barrier height between the two positions. The concepts of barrier height and depth of local minima play an important role in the theory of such algorithms; this applies particularly to Simulated Annealing, where local minima depth has been analytically linked to the convergence properties of certain types of cooling schedules [Hajek, 1988].
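Computing the barrier level between a specific pair of positions can be cast as a minimax-path problem; the following Python sketch uses a Dijkstra-like search for this purpose. It is a simple illustration intended for small search spaces, with hypothetical neighbours and evaluate functions.

import heapq
from itertools import count

def barrier_level(s, t, neighbours, evaluate):
    """Lowest level l at which s and t are mutually accessible
    (Definition 5.10); a minimax-path variant of Dijkstra's algorithm."""
    tie = count()                  # tie-breaker for non-comparable positions
    best = {s: evaluate(s)}
    heap = [(best[s], next(tie), s)]
    while heap:
        l, _, u = heapq.heappop(heap)
        if u == t:
            return l               # = bl(s, t); bh(s, t) = l - evaluate(s)
        if l > best.get(u, float('inf')):
            continue               # stale heap entry
        for v in neighbours(u):
            lv = max(l, evaluate(v))   # highest level along the path so far
            if lv < best.get(v, float('inf')):
                best[v] = lv
                heapq.heappush(heap, (lv, next(tie), v))
    return float('inf')            # s and t are not connected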
Basins and Saddles

Closely related to the notions of local minima depth and barrier height is the notion of a basin, which intuitively describes a region of positions at or below a given level. Formally, basins can be defined as follows:
Definition 5.11 Basin
Given a landscape L := (S, N, g) and a position s ∈ S, the basin of s at level l is the set of all search positions s′ such that g(s′) ≤ l and s, s′ are mutually accessible at level l; the basin of s at level g(s) is also referred to simply as the basin of s. A basin of s below level l is a maximal, connected subset of the basin of s at level l that consists solely of positions s′ for which g(s′) < l, or, equivalently, B is a basin of s below level l if, and only if, there is some t in the basin of s at level l such that B = {s′ ∈ S | bh(s, s′) = 0 ∧ bl(s′, t) < l}. The basin of s below level g(s) is referred to as the basin below s.

For example, in Figure 5.11, the basin of u at level 2 contains t and the two other positions at level 0, along with u itself; the basin of u at level 3, however, contains 12 positions, including r, s, t and v. Similarly, the basin of r contains just r and its two neighbours, while the basin of r at level 3 is identical to the basin of u at level 3, as well as to the basin of s. Note that, depending on the given landscape and the given level l, there can be more than one basin at level l. To move between two such basins, a search trajectory intuitively has to reach their barrier level and cross a saddle, that is, a position or plateau at the barrier level. The following definition captures this notion of a saddle (see also Flamm et al. [2002]):

Definition 5.12 Saddle Point, Saddle Plateau
Given a landscape L := (S, N, g) and positions r, s, t ∈ S, s is a saddle point between r and t if, and only if, there is a walk w from r to t in G_N := (S, N) such that:

(i) w visits s;
(ii) g(s) > max{g(r), g(t)};
(iii) g(s) is the maximum level reached by any position on w;
(iv) g(s) is the barrier level between r and t;
(v) w never enters the same basin below s more than once, that is, for any position u visited by w, the positions in any basin below u visited by w form a connected subpath of w.

A plateau P in L is called a saddle plateau between r and t if, and only if, it consists entirely of saddle points between r and t.
For example, in Figure 5.11, position s is a saddle point between r and t, and P is a saddle plateau between q and r. Note that according to our definition of saddle points, if one position in a given plateau P is a saddle point, all other positions in P are also saddle points, and hence P is a saddle plateau. Condition (ii) of this definition ensures that in neutral landscapes, saddles (i.e., saddle points or saddle plateaus) represent only connections between true basins and not, for example, the direct connections between two plateaus via an exit from one to the other. Condition (v) rules out walks that climb to the saddle level g(s) of a basin, but then drop below that level again without leaving the basin, before reaching and crossing the true saddle; in cases where multiple basins are connected by saddles at the same level, it also prevents each of these saddles from being considered a saddle between two arbitrary basins. Note that according to our definition, saddle points can have any position type except SLMIN. Furthermore, all saddle plateaus are open plateaus with exits to at least two different (lower-level) plateaus that are not connected at any lower level of the plateau connection graph of the given landscape. While the concepts of basins and saddles as defined here can be very useful for characterising search landscapes and understanding SLS behaviour, they do not always precisely capture the constraints on the trajectories of a given SLS algorithm. In particular, there is no guarantee that all positions in the basin below a position s can actually be visited by a given SLS algorithm starting from s; as an example, consider Iterative Best Improvement in a non-degenerate landscape in the case where the basin below s contains more than one strict local minimum. Similarly, even if two basins are separated only by a relatively low barrier, a given SLS algorithm may be unable or unlikely to cross the respective saddle.

Remark: In other literature, basins are sometimes referred to as 'cups' [Hajek, 1988] or 'cycles' [Flamm et al., 2002]. A related, but slightly different definition of saddle points has been proposed by Flamm et al. [2002].
Basin Trees and Basin Partition Trees

It is relatively easy to see that for any search landscape L := (S, N, g) and any two positions s, s′ ∈ S, the basins of s and s′, B and B′, are either disjoint, or one contains the other (the latter includes the case B = B′). Hence, the basins in L form a hierarchy. In this context, the basins that reach just below any adjacent saddle play a special role; these so-called barrier-level basins give rise to the notion of a basin tree. A basin tree represents the hierarchical relationship between the barrier-level basins of L; its leaves correspond to the closed plateaus
of L (including strict local minima), while the internal nodes are closely related to the barriers between them. The following definition formalises these concepts (see also the work of Flamm et al. [2000; 2002]).
Definition 5.13 Barrier-Level Basin, Basin Tree
Let L := (S, N, g) be a landscape. For each position s ∈ S, consider the set BL(s) of all barrier levels between s and any other position s′ ∈ S for which bl(s, s′) > max{g(s), g(s′)}. Let BLB(s) denote the set of all basins of s below any level in BL(s); the elements of BLB(s) are referred to as barrier-level basins of s. For barrier-level basins B, B′ ∈ BLB(s), we say that B′ is directly contained in B if, and only if, B′ ⊂ B and there is no B′′ ∈ BLB(s) with B′ ⊂ B′′ ⊂ B. The basin tree (BT) of L is an edge-weighted tree BT := (V, E, w) whose vertices represent the barrier-level basins of the positions in S, that is, V := ⋃{BLB(s) | s ∈ S} ∪ {S}; the edges in E connect any B ∈ V with the barrier-level basins that are directly contained in B (the children of B in BT), that is, E := {(B, B′) | B, B′ ∈ V and B′ is directly contained in B}; and edge-weights are defined as w((B, B′)) := l − l′, where l and l′ are the minimal levels of any point in B and B′ that occurs in none of the children of B and B′, respectively.
An example of a basin tree for a very simple landscape is shown in Figure 5.12. Notice that many positions of a given landscape L will be represented by more than one vertex of the respective basin tree. However, there is an easy way of obtaining a closely related (isomorphic) tree structure called the basin partition tree, in which every position is represented by exactly one vertex.
Figure 5.12 Barrier-level basins and basin tree for the simple search landscape from Figure 5.7 (page 237); the edge-weights are not indicated and correspond to the vertical distances between the nodes in the tree.
Definition 5.14 Basin Partition Tree
The basin partition tree (BPT) of L, T′ := (V′, E′, w′), is obtained from the basin tree of L, T := (V, E, w), by replacing each vertex B ∈ V that has children B_1, B_2, ..., B_k by a new vertex B′ that represents only those positions of B that are not contained in any of its children, that is, B′ := B − (B_1 ∪ B_2 ∪ ... ∪ B_k), and V′ := {B′ | B ∈ V}.
The notion of a basin partition tree can be illustrated intuitively with the following metaphor [Flamm et al., 2000]. Imagine the search landscape being flooded with water such that at some point in time, all positions at a given level l and below are submerged. Clearly, at any time, there is a distinct number of separate, water-filled basins (this number may be one for sufficiently high water levels). Now, we consider all critical water levels at which the 'land bridges' between two or more basins just become submerged. Below any such level l, there are two or more distinct basins that, as the water rises above l, are merged into one bigger basin. Under this view, the vertices in the basin partition tree T′ of the given landscape consist of all positions in a basin between one such critical level and the next higher critical level. Furthermore, any two vertices v′, v′′ of T′ have the same parent v if, and only if, there is a critical level l at which, as the water reaches l, the distinct basins corresponding to v′ and v′′ become connected. For degenerate landscapes, this can involve the simultaneous flooding of multiple land bridges (i.e., saddles) between v′ and v′′. Note that, as desired, the position sets represented by the vertices of a basin partition tree form a complete partition of the respective landscape L, that is, every position in L is represented by exactly one vertex of the tree. (See also Figure 5.13.)
Figure 5.13 Basin partition tree for the landscape from Figure 5.7 (page 237); the edge-weights are not indicated and correspond to the vertical distances between the nodes in the tree.
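The flooding metaphor also suggests a simple algorithmic sketch for identifying the critical levels at which basins merge: process positions in order of increasing level and track basin membership with a union-find structure. The following hypothetical Python fragment illustrates this idea; for realistic instances it would, of course, require enumerating the entire search space.

def flood(positions, neighbours):
    """Process positions in order of increasing level and record the
    critical levels at which separate basins merge; `positions` maps each
    search position to its evaluation function value, and `neighbours` is
    a hypothetical problem-specific function."""
    parent = {}
    def find(x):                   # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    merges = []                    # (level, root_a, root_b) merge events
    for s in sorted(positions, key=positions.get):
        parent[s] = s
        for t in neighbours(s):
            if t in parent:        # t is already 'submerged'
                rs, rt = find(s), find(t)
                if rs != rt:       # two distinct basins merge at level(s)
                    merges.append((positions[s], rs, rt))
                    parent[rt] = rs
    return merges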
The same property holds for plateau connection graphs, and it is not hard to see that the vertices of the BPT of a given landscape L can be obtained by merging sets of vertices in the PCG of L. Hence, basin partition trees can be seen as abstractions of plateau connection graphs that summarise the connectivity between closed plateaus.
5.7 Further Readings and Related Work

The topic of search space analysis has received a considerable amount of general interest in recent years, as witnessed by a steadily growing number of studies in this area. Search landscapes of combinatorial problems are studied in such diverse fields as theoretical biology [Wright, 1932; Stadler, 2002a], chemistry [Mezey, 1987], physics [Frauenfelder et al., 1997; Kirkpatrick and Toulouse, 1985], evolutionary computation [Merz and Freisleben, 1999; Kallel et al., 2001; Reeves and Rowe, 2003], operations research [Reeves, 1999; Boese et al., 1994] and artificial intelligence [Yokoo, 1997; Hoos, 1998; Watson et al., 2003]. Some of the simplest approaches to search space analysis focus on the number of solutions and its link to search cost [Clark et al., 1996; Hoos, 1999b]. However, typically the number of solutions cannot be determined exactly for large instance sizes; furthermore, it is often insufficient to explain even fairly drastic differences in SLS behaviour. Somewhat more detailed analyses consider the relative frequency of occurrence of different types of search positions [Frank et al., 1997; Hoos, 1999b] or the distribution of local minima over a given search landscape [Kirkpatrick and Toulouse, 1985; Mühlenbein et al., 1988; Kauffman, 1993; Boese, 1996]. Typically, these analyses are based on samples of the given search space. While the number of local minima is usually also difficult to determine exactly, some estimation methods exist [Bray and Moore, 1980; Garnier and Kallel, 2002; Stadler and Schnabl, 1992; Tanaka and Edwards, 1980]. Fitness-distance analysis has received a significant amount of attention in the literature. While precursors of the concept had been proposed and used before [Kirkpatrick and Toulouse, 1985; Mühlenbein et al., 1988; Boese et al., 1994], the fitness-distance correlation coefficient was first defined by Jones and Forrest [1995]. Since then, a number of problems have been analysed using this technique, including the Travelling Salesman Problem [Stützle and Hoos, 2000; Merz and Freisleben, 2001], the Flow Shop Scheduling Problem [Reeves, 1999; Watson et al., 2003], the Quadratic Assignment Problem [Merz and Freisleben, 2000a; Stützle, 1999; Stützle and Hoos, 2000], the Graph Bi-Partitioning Problem [Merz and Freisleben, 2000b], the Set Covering Problem [Finger et al., 2002], the Linear Ordering Problem [Schiavinotto and Stützle, 2003] and many more. However, as illustrated in the work of Naudts and Kallel [2000], there are some pitfalls in relying solely on this measure.
Similarly widely used is the analysis of landscape ruggedness. The ruggedness of search landscapes has been empirically measured for a number of problems, using random walks [Weinberger, 1990] or trajectories of iterative improvement algorithms [Kinnear Jr., 1994]. In addition, there is a substantial amount of theoretical work, much of which is based on the theory of random landscapes [Stadler, 1995]. In particular, there exist strong theoretical results linking spectral landscape theory [Stadler, 2002b] and ruggedness measures such as autocorrelation functions [Stadler, 1996]. For more general overviews of mainly theoretical developments, we refer to the work of Reidys and Stadler [2002] and Stadler [2002a]. There has been some work on more detailed aspects of search space structure. Plateaus and neutrality in search landscapes have been studied in the context of problems from artificial intelligence [Frank et al., 1997; Hoos, 1998] as well as from theoretical biology and chemistry [Huynen et al., 1996; Reidys and Stadler, 2001]. Similarly, there are several theoretical and empirical studies of barriers, basins and related concepts [Ferreira et al., 2000; Flamm et al., 2000; 2002; Hordijk et al., 2003; Stadler and Flamm, 2003]. Plateau connection graphs and basin partition trees have been developed in the context of recent work by Hoos, and are described here for the first time. So far, while these more advanced approaches to landscape analysis hold much promise in the context of understanding the behaviour and performance of SLS algorithms, they are still largely unexplored.
5.8 Summary

Search space features and properties have an important impact on the behaviour and performance of SLS algorithms. In this chapter, we introduced and discussed a wide range of measures and techniques that can be used for analysing various aspects of search space structure. We covered a number of fundamental properties, such as search space size and neighbourhood size, the diameter of the neighbourhood graph, and the number, density and distribution of (optimal) solutions. The concept of a search landscape captures the set of candidate solutions, the neighbourhood relation and the evaluation function used in an SLS algorithm, but abstracts from details of the actual search process. Search landscapes can be classified into various landscape types, which have important implications for the behaviour of certain SLS algorithms. To capture local features, search positions (i.e., candidate solutions) can also be classified into different position types according to their local neighbourhood in the given landscape. As we have
illustrated, the analysis of position type distributions can yield important information about a given search landscape. Among the different types of positions, local minima are particularly relevant, since they tend to have a detrimental effect on SLS performance. Landscape features such as the number, density and distribution of local minima positions play an important role in analysing the hardness of problem instances for given SLS methods and for understanding SLS behaviour. Fitness-distance analysis (FDA) is an important and widely used method for analysing and characterising search landscapes. FDA captures the correlation between the evaluation function value of search positions and their distance to the closest (optimal) solution. This correlation can be summarised in the fitness distance correlation (FDC) coefficient or studied in more detail by using fitness-distance plots. We introduced methods for empirically determining FDC coefficients and discussed various applications and limitations of fitness-distance analysis. Another important property that is intuitively related to problem hardness and SLS behaviour is landscape ruggedness. Intuitively, for rugged landscapes, the evaluation function value of a search position is only weakly correlated with its direct neighbours. This intuition is captured by the concept of landscape correlation functions. In practice, correlation functions are often approximated using the empirical autocorrelation functions of random walks, which can be summarised by means of correlation length, a widely used measure for landscape ruggedness. The theory of random landscapes provides a mathematical framework for the analysis of landscape ruggedness. In fact, many random search landscapes can be shown to be AR(1) landscapes, in which case the correlation structure is fully defined by the correlation between neighbouring candidate solutions or, analogously, by the correlation length. There is an interesting and intuitive relationship between ruggedness and local minima density. In particular, under certain circumstances, the number and density of local minima can be estimated based on the correlation length of a given landscape; the latter can sometimes be determined analytically or estimated empirically with relatively low computational cost. Measures of landscape ruggedness have been widely used for analysing or predicting the hardness of problem instances. They can also be useful for assessing the relative merits of different neighbourhood relations as the basis for SLS algorithms. However, as in the case of FDC, the usefulness of typical measures of ruggedness for these applications is limited in various ways. Finally, we described various approaches for a more detailed analysis of search landscapes. In many cases, the search landscapes encountered by SLS algorithms for combinatorial decision or optimisation problems contain large plateaus. Features such as the size and diameter of plateaus can have a substantial impact on the behaviour of SLS methods. We distinguished between two
fundamentally different types of plateaus, open and closed plateaus, depending on the existence of exits to lower levels. The density and distribution of exits for the plateaus of a given landscape can have substantial effects on SLS behaviour and performance. Plateau connection graphs capture the way in which plateaus are connected within a given landscape and are often extremely useful for understanding the hardness of problem instances and the behaviour of SLS algorithms. The concepts of basins, barrier levels and saddles, as well as the related concept of local minimum depth provide further means for the detailed analysis and characterisation of search landscapes. They form the basis for the notions of basin partition trees, which can be seen as abstractions of plateau connection graphs; like those, they provide high-level, yet detailed characterisations of landscape structure. The analysis of the spaces and landscapes searched by SLS algorithms is crucial for understanding SLS behaviour and performance, and in many cases provides key insights that can be used for improving existing SLS algorithms. Many relatively well established types of search space analyses are computationally expensive and suffer from various limitations; they can also be rather difficult to perform and care needs to be taken to correctly interpret the results. Nevertheless, techniques such as fitness-distance or autocorrelation analysis can yield useful insights. More advanced methods, such as the ones based on measuring plateau connection graphs or basin partition trees are computationally very expensive, since they require enumeration of large parts of the search space. But at the same time, this type of search space analysis facilitates a much deeper understanding of search space structure and is hence likely to become increasingly important and prominent in the context of analysing and explaining the behaviour and performance of SLS algorithms.
Exercises

5.1 [Easy] Explain why it is possible that for a family of instances of a given combinatorial problem, the number of solutions increases exponentially, while the solution density decreases exponentially, as instance size increases.

5.2 [Easy] Prove that the expected number of search steps required by Uninformed Random Picking for finding an (optimal) solution for a problem instance π with search space S(π) and k (optimal) solutions is #S(π)/k.

5.3 [Medium] Prove that the neighbourhood graph of a SAT instance under the 2-flip neighbourhood, in which neighbouring assignments differ in the truth values of exactly two variables, is disconnected. Which conclusion can you draw from this fact?

5.4 [Easy] Give a (simple) example of a landscape that is non-neutral, but not locally invertible.

5.5 [Easy] Give a simple argument that intuitively explains why SAT landscapes based on the standard evaluation function, which measures the number of clauses violated under a given assignment, are usually degenerate.

5.6 [Medium] Are there non-neutral search landscapes in which a gradient walk (i.e., a trajectory of Iterative Best Improvement) from a given point is not uniquely defined? Give an example of such a landscape or prove that no such landscape exists.

5.7 [Easy] Give an example of a landscape that has no local minimum other than the global optimum and is yet very hard to search for any standard SLS method.

5.8 [Medium] (a) Show a (fictitious) fitness-distance plot that indicates an FDC close to zero. (b) Explain why in this situation random restarts can still be detrimental to the performance of a given SLS algorithm.

5.9 [Medium; Hands-On] Perform a fitness-distance analysis for Novelty+, a high-performance SLS algorithm for SAT (available from www.sls-book.net), on SATLIB instance bw_large.a (available from SATLIB [Hoos and Stützle, 2003a]; this formula has exactly one model) based on the best candidate solutions from 1 000 runs, each of which is terminated after n steps, where n is the number of variables in the given problem instance. Measure and report the FDC coefficient and show a fitness-distance plot; interpret the results of your analysis.

5.10 [Medium] What can you say about the plateau connection graphs of non-neutral landscapes?
Part II: Applications
In the mountains of truth you never climb in vain: either you reach new heights today or you practice your strength so you can climb higher tomorrow. —Friedrich Nietzsche, Philosopher
Chapter 6
Propositional Satisfiability and Constraint Satisfaction

The Satisfiability Problem in Propositional Logic (SAT) is a conceptually simple combinatorial decision problem that plays a prominent role in complexity theory and artificial intelligence. To date, stochastic local search methods are among the most powerful and successful methods for solving large and hard instances of SAT. In this chapter, we first give a general introduction to SAT and motivate its relevance to various areas and applications. Next, we give an overview of some of the most prominent and best-performing classes of SLS algorithms for SAT, covering algorithms of the GSAT and WalkSAT architectures as well as dynamic local search algorithms. We discuss important properties of these algorithms, such as the PAC property, and outline their empirical performance and behaviour.

Constraint Satisfaction Problems (CSPs) can be seen as a generalisation of SAT; they form an important class of combinatorial problems in artificial intelligence. In the second part of this chapter, we introduce various types of CSPs and give an overview of prominent SLS approaches to solving these problems. These approaches include encoding CSP instances into SAT and solving the encoded instances using SAT algorithms, various generalisations of SLS algorithms for SAT, and native CSP algorithms.
6.1 The Satisfiability Problem

As motivated and formally defined in Chapter 1 (page 17ff.), the Satisfiability Problem in Propositional Logic (SAT) is to decide for a given propositional
formula F , whether there exists an assignment of truth values to the variables in F under which F evaluates to true; such satisfying assignments are called models of F and form the solutions of the respective instance of SAT. When applying SLS algorithms to SAT, we are typically more interested in solving the search variant of SAT (i.e., in finding models of a given formula) rather than the decision variant. It should be noted that typical SLS algorithms for SAT (including all SAT algorithms covered in this chapter) are incomplete and hence cannot determine with certainty that a given formula is unsatisfiable (i.e., that it has no models).
CNF Representations and Transformations

Most algorithms for SAT, including all state-of-the-art SLS algorithms, are restricted to formulae in conjunctive normal form (CNF), that is, to formulae that are conjunctions over disjunctions of literals. Since any propositional formula can be transformed into a logically equivalent CNF formula, in principle this restriction does not limit the class of SAT instances that can be solved by such algorithms. The naïve method of transforming a non-CNF formula into CNF (using the distributive laws of propositional logic to resolve nestings of ‘∧’ and ‘∨’ that are not allowed in CNF) can lead to an exponential growth in the length of the formula. There is, however, an alternative CNF transformation, which avoids this effect at the cost of introducing a number of additional propositional variables that scales linearly with the size of the given formula in the worst case [Poole, 1984]. When representing problems from other domains as SAT instances, in many cases relatively natural and concise CNF formulations can be found directly without using general CNF transformation methods. In particular, this is the case for many classes of CSPs, and we will discuss approaches for encoding CSP instances as SAT in Section 6.5.
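Since all algorithms discussed in this chapter operate on CNF formulae, it is convenient to fix a concrete representation for the examples that follow. The Python sketch below is purely illustrative (it is not taken from any of the reference implementations discussed in this chapter); it represents a CNF formula as a list of clauses in DIMACS-style integer notation, where the clause [1, -2] stands for x1 ∨ ¬x2, and checks whether a given assignment is a model.

def is_model(formula, assignment):
    """Check whether assignment (dict: variable index -> bool) satisfies
    every clause of formula (a list of lists of non-zero integers)."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal_true(lit) for lit in clause) for clause in formula)

# F = (x1 or x2) and (not x1 or x2) and (not x2 or x3)
F = [[1, 2], [-1, 2], [-2, 3]]
print(is_model(F, {1: True, 2: True, 3: True}))    # True: a model of F
print(is_model(F, {1: True, 2: False, 3: False}))  # False: second clause unsatisfied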
Alternative Formulations of SAT

Alternative representations of SAT for CNF formulae are used in various contexts, specifically, when techniques for solving more general problems are applied to SAT. As we will discuss in some more detail in Section 6.5, SAT can be seen as a special case of the more general finite discrete CSP. Another prominent representation encodes the truth values ⊥ (false) and ⊤ (true) as integers 0 and 1, and propositional variables as integer variables with domain {0, 1}. Negated literals ¬x are then encoded as I(¬x) := 1 − x, while positive literals remain unchanged, that is, I(x) := x. Finally, the encoding of a CNF clause ci = l1 ∨ l2 ∨ l3 ∨ . . . ∨ lk(i) is given by I(ci) := I(l1) + I(l2) + . . . + I(lk(i)),
and an entire CNF formula F = c1 ∧ c2 ∧ . . . ∧ cm is encoded as I(F) := I(c1) · I(c2) · . . . · I(cm). Then, a truth assignment a satisfies ci if, and only if, the corresponding 0-1 assignment satisfies the inequality I(ci) ≥ 1, and the CNF formula F is satisfied under a if, and only if, I(F) ≥ 1. Based on this representation, SAT can be seen as a special case of a discrete constrained optimisation problem: Let ui(F, a) := 1 if clause ci is unsatisfied under assignment a and ui(F, a) := 0 otherwise; furthermore, let U(F, a) := ∑_{i=1}^{m} ui(F, a). Then any model of F corresponds to a solution a∗ ∈ argmin_{a ∈ {0,1}^n} U(F, a) subject to ∀i ∈ {1, 2, . . . , m} : ui(F, a) = 0. This type of constrained optimisation problem is a particular case of the 0-1 Integer Linear Programming (ILP) or Boolean Programming Problem. Using these representations, SAT instances can, in principle, be solved using more general CSP or ILP algorithms. In practice, however, this approach has not been able to achieve sufficiently high performance to provide a viable alternative to native SAT solvers, such as the SLS algorithms presented in this chapter (see, e.g., Mitchell and Levesque [1996], Battiti and Protasi [1998], Schuurmans et al. [2001]). However, a number of SAT algorithms, particularly some of the dynamic local search methods presented in Section 6.4, are inspired by more general CSP or constrained optimisation solving techniques. Furthermore, successful SLS algorithms for SAT have been extended to more general classes of CSPs and ILPs, resulting in competitive solvers for these problems (some of these generalised SLS algorithms will be discussed in Section 6.6). Finally, it may be noted that the ILP formulation of SAT can be easily generalised to weighted MAX-SAT, a closely related optimisation problem, for which in some cases more general ILP methods perform much better than for SAT [Resende et al., 1997].
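The 0-1 formulation just described is easy to express in code. The following illustrative sketch (using the integer-literal representation introduced earlier in this section) computes I(F) and the number of unsatisfied clauses U(F, a):

def I_literal(lit, a):           # I(x) = x and I(not x) = 1 - x
    return a[abs(lit)] if lit > 0 else 1 - a[abs(lit)]

def I_clause(clause, a):         # I(c) = I(l1) + ... + I(lk)
    return sum(I_literal(lit, a) for lit in clause)

def I_formula(formula, a):       # I(F) = I(c1) * ... * I(cm)
    result = 1
    for clause in formula:
        result *= I_clause(clause, a)
    return result

def U(formula, a):               # number of clauses unsatisfied under a
    return sum(1 for clause in formula if I_clause(clause, a) == 0)

F = [[1, 2], [-1, 2], [-2, 3]]
a = {1: 1, 2: 1, 3: 1}           # 0-1 assignment: variable index -> 0 or 1
print(I_formula(F, a) >= 1, U(F, a))  # True 0: a corresponds to a model of F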
Polynomial Simplification of CNF Formulae

One of the advantages of the native, logical formulation of SAT is that propositional formulae in general, and CNF formulae in particular, can often be substantially simplified using computationally cheap reduction techniques. Such reductions have been shown to be crucial in solving various types of SAT instances more effectively; as preprocessing techniques, they can be used for simplifying the input to any SAT algorithm for CNF formulae. One of the simplest reductions is the elimination of duplicate literals and clauses from a given CNF formula. Obviously, this can be performed in time O(n), where n is the size of the formula, and results in a logically equivalent CNF formula. Similarly, all clauses that contain a variable and its negation, and are hence trivially satisfied (tautological clauses), can be detected and eliminated in
linear time. A slightly more interesting reduction is the elimination of subsumed clauses. A clause c = l1 ∨ l2 ∨ . . . ∨ lk is subsumed by another clause c′ = l′1 ∨ l′2 ∨ . . . ∨ l′j if, and only if, every literal in c′ also occurs in c, that is, {l′1, l′2, . . . , l′j} ⊆ {l1, l2, . . . , lk}. Detection and elimination of all subsumed clauses can be performed efficiently and leads to a logically equivalent formula. Another linear time reduction is the elimination of clauses containing pure literals, that is, variables that occur either only negated or only unnegated in the given formula. Setting such a variable to false or true, respectively, does not change the satisfiability of the formula; hence, all clauses containing such variables can be removed. One of the most important reduction techniques is based on the unit resolution method: If a CNF formula contains a unit clause, that is, a clause consisting of only a single literal, this clause and all clauses containing the same literal can be removed (this is a special case of the subsumption reduction), and all remaining occurrences of the corresponding variable (i.e., the complementary literal) can be removed; this can be seen as a special case of the general resolution rule (see, e.g., Russell and Norvig [2003]). Performing unit resolution for all unit clauses in the original CNF formula leads to a logically equivalent CNF formula; we also refer to this transformation as a single pass of unit propagation. It may be noted that unit resolution can lead to empty clauses, rendering the resulting formula trivially unsatisfiable, or eliminate all clauses, leaving an empty CNF formula, which is trivially satisfiable. Furthermore, unit resolution can produce new unit clauses and hence make further unit resolution steps possible. Repeated application of unit resolution eventually leads to a formula without any unit clauses. We refer to this reduction as complete unit propagation; it can be performed in time O(n) and forms a crucial component of basically any systematic search algorithm for SAT. Unit propagation alone is sufficient for deciding the satisfiability of Horn formulae, that is, CNF formulae in which every clause contains at most one unnegated variable [Dowling and Gallier, 1984], in linear time w.r.t. the size of the given formula. It also forms the basis of a linear-time algorithm for solving SAT for 2-CNF formulae [del Val, 2000]. Unit propagation provides the basis for two other efficient and practically useful simplification techniques, unary and binary failed literal reduction. The key idea behind unary failed literal reduction is the following: If setting a variable x occurring in the given formula F to true makes F unsatisfiable, then adding the unit clause c := ¬x to F yields a logically equivalent formula F′. Since F′ contains at least one unit clause, c, it can be simplified using unit propagation, which can result in a substantially smaller formula. Whether setting x to true renders F unsatisfiable is determined by adding a unit clause c′ := x to F, and by checking whether subsequent application of unit propagation produces an empty clause. Complete unary failed literal reduction consists of performing this operation for each variable occurring in the given formula and has
complexity O(n²). Binary failed literal reduction works analogously, but checks whether simultaneously adding any two unit clauses, c1 := x and c2 := y, and applying unit propagation leads to a trivially unsatisfiable formula. If this is the case, the binary clause c := ¬x ∨ ¬y is added to F, which potentially leads to further simplifications. Binary failed literal reduction has time complexity O(n³); it is a fairly expensive operation, but sometimes leads to substantial reductions in the overall time required for solving a given SAT instance (see, e.g., Brafman and Hoos [1999]).
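The following sketch illustrates complete unit propagation on the integer-literal representation used above; it is a naive illustration rather than the linear-time implementation referred to in the text. Unary failed literal reduction can be built on top of it by adding the unit clause [x] and checking whether propagation produces an empty clause.

def unit_propagate(formula):
    """Return (simplified formula, implied literals); an empty clause in
    the result means the simplified formula is trivially unsatisfiable."""
    formula = [list(c) for c in formula]
    implied = []
    while True:
        units = [c[0] for c in formula if len(c) == 1]
        if not units:
            return formula, implied
        l = units[0]
        implied.append(l)
        new_formula = []
        for c in formula:
            if l in c:
                continue                         # clause satisfied: remove it
            new_formula.append([x for x in c if x != -l])  # drop complement
        formula = new_formula

F = [[1], [-1, 2], [-2, 3], [-3, -4]]
print(unit_propagate(F))   # -> ([], [1, 2, 3, -4]): all clauses eliminated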
Randomly Generated SAT Instances

Many empirical studies of SAT algorithms have made use of randomly generated CNF formulae. Various such classes of SAT instances have been proposed and studied in the literature; in most cases, they are obtained by means of a random instance generator that samples SAT instances from an underlying probability distribution over CNF formulae. The probabilistic generation process is typically controlled by various parameters, which mostly determine syntactic properties of the generated formulae, such as the number of variables and clauses, in a deterministic or probabilistic way. One of the earliest and most widely studied classes of randomly generated SAT instances is based on the random clause length model (also called fixed density model): Given a number of variables, n, and clauses, m, the clauses are constructed independently from each other by including each of the 2n literals with fixed probability p [Franco and Paull, 1983]. A variant of this model was used in Goldberg's empirical study on the average case time complexity of the Davis-Putnam algorithm [Goldberg, 1979]. Theoretical and empirical results show that this family of instance distributions is mostly easy to solve on average using rather simple deterministic algorithms [Cook and Mitchell, 1997; Franco and Swaminathan, 1997]. As a consequence, the random clause length model is no longer widely used for evaluating the performance of SAT algorithms. Similar considerations apply to other distributions of SAT instances, such as the instances obtained from the AIM instance generator [Asahiro et al., 1996], which can be solved in polynomial time by binary failed literal reduction [Hoos and Stützle, 2000a]. To date, the most prominent class of randomly generated SAT instances that is used extensively for evaluating the performance of SAT algorithms is based on the so-called fixed clause length model and known as Uniform Random k-SAT [Franco and Paull, 1983; Mitchell et al., 1992]. For a given number of variables, n, number of clauses, m, and clause length k, Uniform Random k-SAT instances are obtained as follows. To generate a clause, k literals are chosen independently
and uniformly at random from the set of 2·n possible literals (the n propositional variables and their negations). Clauses are not included in the problem instance if they contain multiple copies of the same literal, or if they are tautological, that is, they contain a variable and its negation. Using this mechanism, clauses are generated and added to the formula until it contains m clauses overall.
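The generation mechanism just described can be sketched as follows; this is an illustration, not the reference generator used for the benchmark sets discussed in this book.

import random

def uniform_random_ksat(n, m, k, rng=random):
    """Generate a Uniform Random k-SAT instance: m clauses of length k
    over n variables, in the integer-literal notation used earlier."""
    formula = []
    while len(formula) < m:
        # draw k literals independently and uniformly from the 2n literals
        clause = [rng.choice([1, -1]) * rng.randint(1, n) for _ in range(k)]
        # rejecting clauses with repeated variables excludes both duplicate
        # literals and tautological clauses, as required
        if len({abs(lit) for lit in clause}) == k:
            formula.append(clause)
    return formula

random.seed(0)
print(uniform_random_ksat(n=5, m=4, k=3))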
Random k-SAT Hardness and Solubility Phase Transition
One particularly interesting property of Uniform Random k-SAT is the occurrence of a phase transition phenomenon, that is, a rapid change in solubility that can be observed when systematically increasing (or decreasing) the number m of clauses for a fixed number of variables n [Mitchell et al., 1992; Kirkpatrick and Selman, 1994]. More precisely, for small m, almost all formulae are underconstrained and therefore satisfiable; when crossing some critical value m∗, the probability of generating a satisfiable instance drops sharply to almost zero. Beyond m∗, almost all instances are overconstrained and thus unsatisfiable. (For an illustration, see Figure 6.1.)

[Figure 6.1: The Uniform Random 3-SAT solubility phase transition, illustrated for formulae with n = 200 variables. Left: empirically measured probability of obtaining satisfiable vs unsatisfiable instances, and mean search cost (measured in terms of CPU time required for solving a given instance) for kcnfs, a state-of-the-art systematic search algorithm for this problem class. Right: mean search cost (sc) for kcnfs on unsatisfiable vs satisfiable instances for the same test-sets, and mean search cost of Novelty+, a high-performance SLS algorithm. Both panels plot P(sat) and P(unsat) as well as log mean search cost (CPU sec; all logarithms are base 10) against the clauses/variables ratio #cl/#var.]

For Uniform Random 3-SAT, it has been empirically shown that this phase transition occurs approximately at m∗ = 4.26n for large n; for smaller n,
the critical clauses/variable ratio m∗/n is slightly higher [Mitchell et al., 1992; Crawford and Auton, 1996]. For fixed k, the transition becomes increasingly sharp as n grows; furthermore, the critical value m∗ increases with k [Kirkpatrick and Selman, 1994]. Empirical analyses have shown that problem instances from the phase transition region of Uniform Random 3-SAT tend to be particularly hard for both systematic SAT solvers [Cheeseman et al., 1991; Crawford and Auton, 1996] and SLS algorithms [Yokoo, 1997]. Figure 6.1 illustrates this effect for kcnfs [Dubois and Dequen, 2001], a state-of-the-art systematic search algorithm for this problem class, and Novelty+ [Hoos, 1999a], a high-performance SLS algorithm for SAT (see also Section 6.3, page 276ff.). Striving to evaluate their algorithms on hard problem instances, many researchers use test-sets sampled from the phase transition region of Uniform Random 3-SAT. Particularly in the context of empirical studies including incomplete SAT algorithms, these test-sets are separated into satisfiable and unsatisfiable instances using state-of-the-art complete SAT solvers [Hoos and Stützle, 2000b]. Although similar results hold for Uniform Random k-SAT with k > 3, test-sets from these instance distributions are rarely used.
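For very small formulae, the solubility phase transition can be observed directly by exhaustive search. The following self-contained and purely illustrative experiment (with deliberately tiny, hypothetical parameters, n = 10 and 50 samples per ratio) estimates P(sat) for Uniform Random 3-SAT at several clauses/variables ratios; the sharp threshold near m/n = 4.26 emerges clearly only for much larger n.

import random
from itertools import product

def random_3sat(n, m, rng):
    formula = []
    while len(formula) < m:
        clause = [rng.choice([1, -1]) * rng.randint(1, n) for _ in range(3)]
        if len({abs(lit) for lit in clause}) == 3:   # reject repeated variables
            formula.append(clause)
    return formula

def satisfiable(formula, n):
    """Exhaustive satisfiability test; only feasible for tiny n."""
    for bits in product([False, True], repeat=n):
        a = {v + 1: bits[v] for v in range(n)}
        if all(any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)
               for c in formula):
            return True
    return False

rng = random.Random(0)
n, samples = 10, 50
for ratio in (3.0, 4.0, 4.26, 5.0, 6.0):
    m = round(ratio * n)
    sat = sum(satisfiable(random_3sat(n, m, rng), n) for _ in range(samples))
    print(f"m/n = {ratio:4.2f}: estimated P(sat) = {sat / samples:.2f}")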
SAT-Encodings of Other Combinatorial Problems

Since SAT is an NP-complete problem, any other problem in NP can be encoded into SAT in polynomial time and space. SAT-encoded instances of various combinatorial problems play an important role in evaluating and characterising the performance of SAT algorithms; these combinatorial problems stem from various domains, including mathematical logic, artificial intelligence and VLSI engineering. Finite, discrete constraint satisfaction problems (CSPs) can be seen as a generalisation of SAT in which variables can have domains other than truth values, and constraints between the values assigned to individual variables can be different from the ones captured by CNF clauses. CSPs are also often a natural intermediate stage in encoding other combinatorial problems into SAT. CSP instances can be encoded into SAT in various ways; CSPs and their encodings into SAT will be further discussed in Section 6.5. It has been shown that certain types of randomly generated CSPs can be solved similarly efficiently by applying current SAT algorithms to SAT-encoded instances as by using state-of-the-art CSP algorithms [Hoos, 1999c; 1999b] (see also Section 6.6). Other prominent examples of SAT-encoded instances of combinatorial problems include graph colouring, various types of planning and scheduling problems, Boolean function learning, inductive inference, cryptographic key search, and
n-Queens [Gu et al., 1997; Hoos and Stützle, 2000c]. For some of these, particularly in the case of SAT-encoded STRIPS planning problems from the ‘blocks world’ and ‘logistics’ domains, applying SAT solvers and reduction techniques to suitably encoded problem instances has been shown to achieve performance levels that are competitive with state-of-the-art algorithms for the respective original problem [Kautz and Selman, 1996]. Key factors underlying such results are the conceptual simplicity of SAT, which facilitates the design and efficient implementation of algorithms, and the large amount of knowledge on techniques for solving SAT and their specific properties. Furthermore, using suitable SAT encodings and reduction techniques is of crucial importance for solving the resulting SAT problems efficiently. Interestingly, the size of the SAT encodings is not always indicative of the difficulty of solving them. In particular, it has been shown for various problem domains that compact SAT encodings that result in instances with small search spaces can be much more difficult to solve than sparser encodings that produce instances with substantially larger search spaces [Ernst et al., 1997; Hoos, 1998; 1999b].
Some Practical Applications of SAT

Despite its conceptual simplicity and abstract nature, the SAT problem has various practical applications. Some of the most prominent industrially relevant SAT applications stem from hardware design and verification, in particular, from the verification of reactive systems, such as microprocessor components. In an approach called bounded model checking (BMC), a system and a specification of its formal properties can be encoded into a propositional formula, whose models correspond to bugs, that is, situations in which the behaviour of the system violates its specifications [Biere et al., 1999a; 1999b]. Similar to SAT encodings of planning problems that require the plan length to be bounded, in BMC, the size of the bug, that is, the number of states of the system involved in the bug, is limited by a constant. It may be noted that for proving that a given system does not have any bugs below a certain size, a complete SAT solver is required. Incomplete SAT solvers, such as the SLS algorithms for SAT covered in the following sections, can be used, however, to find bugs efficiently. Symbolic model checking methods, such as BMC, are increasingly gaining industrial acceptance, because compared to traditional, simulation-based validation techniques, they detect a wider range of bugs, including subtle error conditions. Many traditional formal verification techniques use binary decision diagrams (BDDs) [Bryant, 1986] for representing propositional formulae. By
using CNF encodings and standard SAT algorithms in a BMC approach, it is often possible to find bugs faster, and to find bugs of minimal size; the latter is important, since small bugs are typically easier to understand for a human system tester or designer. Furthermore, BDD-based approaches often require extremely large amounts of memory as well as specialised techniques for finding models of the given propositional formula, while the CNF representations are typically more concise and can be solved using standard SAT algorithms [Biere et al., 1999a]. (It should be noted, however, that BDD representations facilitate solving problems beyond SAT, such as finding all solutions of a given formula.) Another application area in which SAT encodings and solvers have been successfully used for solving real-world problems is asynchronous circuit design [Vanbekbergen et al., 1992; Gu and Puri, 1995]. In one prominent approach to asynchronous circuit synthesis, the circuits are specified using signal transition graphs (STGs). One of the core problems is then to assign a distinguishable binary code to every circuit state. This Complete State Coding (CSC) Problem can be modelled as a SAT problem, but the size and hardness of the formulae thus obtained limit the practical applicability of SAT algorithms for solving CSC instances. However, by partitioning the STG into smaller components and using a SAT algorithm to solve the corresponding CSC subproblems, substantial performance improvements can be obtained for industrial asynchronous circuit design benchmarks [Gu and Puri, 1995]. Finally, SAT algorithms have recently been used for solving real-world sports scheduling problems [Zhang, 2002]. Specifically, the problem of finding fair schedules for college conference basketball tournaments can be encoded into SAT. This encoding is based on a decomposition of the problem into three phases, each of which deals with different constraints of the overall scheduling problem. Using a standard SAT algorithm for solving the SAT instances for the three phases, real-world college conference basketball scheduling problems were solved substantially more efficiently than by previous, specialised techniques, and more balanced schedules were obtained than the ones currently used for these tournaments [Zhang, 2002].
Generalisations and Related Problems

Many generalisations of the Propositional Satisfiability Problem have been proposed and studied in the literature. As mentioned above, the Constraint Satisfaction Problem (CSP) can be seen as a generalisation of SAT. Multi-Valued SAT [Béjar and Manyà, 1999; Frisch and Peugniez, 2001] and Pseudo-Boolean CSP [Abramson et al., 1996; Connolly, 1992; Walser, 1997; Løkketangen, 2002] are
two special cases of the CSP that are closely related to SAT. Multi-Valued SAT (MV-SAT) allows variables whose domains are arbitrary finite sets of values and uses logical constraints similar to CNF clauses. Pseudo-Boolean CSPs use binary variables with domain {0, 1}, but allow more general constraints. Both MV-SAT and Pseudo-Boolean CSP, as well as general finite discrete CSPs, will be further discussed in Section 6.5. The optimisation variant of SAT, in which the objective is to maximise the number of satisfied clauses of a given CNF formula, rather than completely satisfying every clause, is called (unweighted) MAX-SAT. In a further generalisation called weighted MAX-SAT, weights (usually positive integer or real numbers) are associated with the clauses of a given CNF formula, and the objective is to find a variable assignment that maximises the total weight of the satisfied clauses. As one of the conceptually simplest combinatorial optimisation problems, and because of its close relation to SAT, MAX-SAT plays an important role in the development and evaluation of search algorithms for hard combinatorial problems. In general, the best known methods for solving MAX-SAT problems are SLS algorithms. MAX-SAT problems and SLS algorithms for MAX-SAT will be discussed in more detail in Chapter 7. Another interesting generalisation of SAT is Dynamic SAT (DynSAT) [Hoos and O’Neill, 2000]; intuitively, in DynSAT, a given CNF formula changes over time, and a solution consists of a sequence of models such that at any time, the current CNF formula is satisfied by the current model. Equivalently, DynSAT can be defined in such a way that each problem instance consists of a conventional CNF formula, some of whose variables are fixed to specific truth values at certain times. SLS algorithms for SAT can be generalised to DynSAT in a straightforward way and appear to be well suited for solving these problems. Let us mention three other prominent problems that are closely related to SAT. In the Propositional Validity Problem (VAL), the objective is to decide whether a given propositional formula F is valid, that is, whether F is satisfied under all of its variable assignments [Russell and Norvig, 2003]. VAL and SAT are dual problems in the sense that any formula F is valid if, and only if, ¬F is unsatisfiable. Hence, any complete algorithm for SAT can be used for deciding VAL and vice versa. VAL is an important problem in theorem proving and has applications in artificial intelligence and other areas of computer science. The Satisfiability Problem for Quantified Boolean Formulae (QSAT) can be seen as a generalisation of both SAT and VAL. A quantified Boolean formula (QBF) is a propositional formula in which all variables are quantified existentially (∃) or universally (∀). A QBF of the form ∃x : F is satisfiable if, and only if, either
assigning x := ⊤ or x := ⊥ makes F satisfiable, and a QBF of the form ∀x : F is satisfiable if, and only if, both x := ⊤ and x := ⊥ render F satisfiable (see, e.g., Cadoli et al. [2002] or Rintanen [1999b]). Many important problems in artificial intelligence can be mapped directly into QSAT, including conditional planning, abduction and non-monotonic reasoning [Rintanen, 1999a; 1999b]. QSAT also plays a prominent role in complexity theory, where it is prototypical and complete for the problems in the polynomial hierarchy. Finally, #SAT is a variant of SAT in which, given a propositional formula F, the objective is to determine the number of models of F (counting variant) or to decide whether F has at least a given number of models (decision variant) [Roth, 1996; Bailey et al., 2001]. This problem has important applications to approximate reasoning problems in artificial intelligence; it is also of substantial theoretical interest, as the counting variant of #SAT is the prototypical complete problem for the complexity class #P, and the decision variant is a prototypical complete problem for the probabilistic complexity class PP [Papadimitriou, 1994].
6.2 The GSAT Architecture

The GSAT algorithm [Selman et al., 1992] was one of the first SLS algorithms for SAT; it had a very significant impact on the development of a broad range of SAT solvers, including most of the current state-of-the-art SLS algorithms for SAT. Like all SAT algorithms covered in this chapter, GSAT is based on a 1-exchange neighbourhood in the space of all complete truth value assignments of the given formula; under this ‘one-flip neighbourhood’, two variable assignments are neighbours if, and only if, they differ in the truth assignment of exactly one variable. Furthermore, GSAT uses an evaluation function g(F, a) that maps each variable assignment a to the number of clauses of the given formula F unsatisfied under a. Note that the models of F are exactly the assignments with evaluation function value zero. GSAT and most of its variants are iterative improvement methods that flip the truth value of one variable in each search step. The selection of the variable to be flipped is typically based on the score of a variable x under the current assignment a, which is defined as g(F, a) − g(F, a′), where a′ is the assignment obtained from a by flipping the truth value of x. Algorithms of the GSAT architecture differ primarily in their underlying variable selection method. In the following, we describe some of the most widely known and best-performing GSAT algorithms.
procedure GSAT (F, maxTries, maxSteps)
    input: CNF formula F, positive integers maxTries and maxSteps
    output: model of F or ‘no solution found’
    for try := 1 to maxTries do
        a := randomly chosen assignment of the variables in formula F;
        for step := 1 to maxSteps do
            if a satisfies F then
                return a
            end
            x := randomly selected variable whose flipping minimises the number of unsatisfied clauses;
            a := a with x flipped;
        end
    end
    return ‘no solution found’
end GSAT

Figure 6.2 The basic GSAT algorithm; all random selections are according to a uniform probability distribution over the underlying sets.
Basic GSAT

The core of the basic GSAT algorithm [Selman et al., 1992] consists of a simple best-improvement search procedure: Starting from a randomly chosen variable assignment, in each local search step, one of the variables with maximal score, that is, a variable that results in a maximal decrease in the number of unsatisfied clauses, is flipped. If there are several variables with maximal score, one of them is randomly selected according to a uniform distribution. The iterative best-improvement search underlying GSAT easily gets stuck in local minima of the evaluation function. Therefore, GSAT uses a simple static restart mechanism that re-initialises the search at a randomly chosen assignment every maxSteps flips. The search is terminated when a model of the given formula F has been found, or after maxTries sequences (also called ‘tries’) of maxSteps variable flips each have been performed without finding a model of F (see Figure 6.2). Straightforward implementations of GSAT are rather inefficient, since in each step the scores of all variables have to be calculated from scratch. The key to efficiently implementing GSAT is to compute the complete set of scores only once at the beginning of each try, and then after each flip to update only the scores of those variables that were possibly affected by the flipped variable. Details on these implementation issues for GSAT and related algorithms are discussed in the in-depth section on page 271. For any fixed number of restarts, GSAT is essentially incomplete [Hoos, 1998; 1999a], and severe stagnation behaviour is observed on most SAT instances.
Still, when it was introduced, GSAT outperformed the best systematic search algorithms for SAT available at that time. To date, basic GSAT’s performance is substantially weaker than that of any of the other algorithms described in the following, and the algorithm is mainly of historical interest.
GSAT with Random Walk (GWSAT)

Basic GSAT can be significantly improved by extending the underlying search strategy into a randomised best-improvement method (see Chapter 2, page 72ff.). This is achieved by introducing an additional type of local search step, so-called conflict-directed random walk steps. In this type of random walk step, first a currently unsatisfied clause c is selected uniformly at random. Then, one of the variables appearing in c is randomly selected and flipped, thus effectively forcing c to become satisfied. A simple SLS algorithm that initialises the search by randomly picking an assignment (like basic GSAT) and then performs a sequence of these conflict-directed random walk steps has been proven to solve 2-SAT in quadratic expected time [Papadimitriou, 1991]; this result inspired the use of this type of random walk to extend basic GSAT. The basic idea of GWSAT is to decide at each local search step with a fixed probability wp (called walk probability or noise setting) whether to do a standard GSAT step or a variant of a conflict-directed random walk step, in which a variable is flipped that has been selected uniformly at random from the set of all variables occurring in currently unsatisfied clauses. Note that the variables that can be flipped in this latter type of random walk step are exactly the same as for the conflict-directed random walk steps described above; only the probabilistic bias may differ, depending on the number and length of clauses in which a given variable appears. For any wp > 0, this algorithm allows arbitrarily long sequences of random walk steps; this implies that from arbitrary assignments, a model (if existent) can be reached with a positive, bounded probability [Hoos, 1999a]. In particular, this allows the algorithm to escape from any local minima region of the underlying search space. Hence, the probability that GWSAT (without random restart) applied to a satisfiable formula finds a solution converges to one as the run-time approaches infinity; that is, GWSAT is probabilistically approximately complete (PAC). Like all GSAT algorithms, GWSAT uses the same static restart mechanism as basic GSAT. Generally, GWSAT achieves substantially better performance than basic GSAT. It has been shown that when using sufficiently high noise settings (the precise value varies between problem instances), GWSAT does not suffer from stagnation behaviour. Furthermore, for hard SAT instances, it typically shows exponential RTDs [Hoos, 1998; Hoos and Stützle, 1999]; hence, static restart
strategies are ineffective, and optimal speedup can be obtained by multiple independent runs parallelisation (see Chapter 4, Section 4.4). For low noise settings, stagnation behaviour is frequently observed; recently, there has been evidence that the corresponding RTDs can be characterised by mixtures of exponential distributions [Hoos, 2002b].
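GWSAT's step selection can be sketched as follows. The sketch is illustrative and recomputes the evaluation function naively; efficient implementations cache and incrementally update variable scores, as discussed in the in-depth section on page 271. It assumes the integer-literal representation introduced in Section 6.1 and that the current assignment a is not yet a model.

import random

def lit_true(lit, a):
    return a[abs(lit)] if lit > 0 else not a[abs(lit)]

def num_unsat(formula, a):       # g(F, a): number of unsatisfied clauses
    return sum(1 for c in formula if not any(lit_true(l, a) for l in c))

def gwsat_step(formula, a, wp, rng=random):
    if rng.random() < wp:
        # random walk step: uniform choice among all variables occurring
        # in currently unsatisfied clauses
        unsat = [c for c in formula if not any(lit_true(l, a) for l in c)]
        x = rng.choice(sorted({abs(l) for c in unsat for l in c}))
    else:
        # standard GSAT step: best improvement with random tie-breaking
        def g_after_flip(v):
            a[v] = not a[v]
            g = num_unsat(formula, a)
            a[v] = not a[v]
            return g
        g_values = {v: g_after_flip(v) for v in a}
        best = min(g_values.values())
        x = rng.choice([v for v in a if g_values[v] == best])
    a[x] = not a[x]              # flip the selected variable in place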
GSAT with Tabu Search (GSAT/Tabu)

The best-improvement search underlying basic GSAT can be easily extended into a simple tabu search strategy. GSAT/Tabu is obtained from basic GSAT by associating a tabu status with the propositional variables of the given formula [Mazure et al., 1997; Steinmann et al., 1997]. In GSAT/Tabu, after a variable x has been flipped, it cannot be flipped back within the next tt steps, where the tabu tenure, tt, is a parameter of the algorithm. In each search step, the variable to be flipped is selected as in basic GSAT, except that the choice is restricted to variables that are currently not tabu. Upon search initialisation, the tabu status of all variables is cleared. Efficient implementations of GSAT/Tabu store for each variable x the search step number tx when it was last flipped. When initialising the search, all the tx are set to −tt; subsequently, every time a variable x is flipped, tx is set to the current search step number t, counted since the last initialisation of the search process. A variable x is tabu if, and only if, t − tx ≤ tt. Unlike in the case of GWSAT, it is not clear whether GSAT/Tabu with fixed cutoff parameter maxSteps has the PAC property. Intuitively, for low tt, the algorithm may not be able to escape from extensive local minima regions without using restart, while for high tt settings, all the routes to a solution may be cut off, because too many variables are tabu. In practice, for very short tabu tenure, GSAT/Tabu often shows severe stagnation behaviour (the tt value for which this occurs depends on the given problem instance). For sufficiently high tabu tenure settings, GSAT/Tabu does not suffer from stagnation behaviour, and for hard problem instances, it shows exponential RTDs. As with GWSAT's noise parameter, very high settings of tt, although not causing stagnation behaviour, uniformly decrease GSAT/Tabu's performance. Using instance-specific optimised tabu tenure settings for GSAT/Tabu and similarly optimised noise settings for GWSAT, GSAT/Tabu typically performs significantly better than GWSAT, particularly when applied to large and structured SAT instances [Hoos and Stützle, 2000a]. (There are, however, a few exceptional cases where GSAT/Tabu performs substantially worse than GWSAT, including well-known SAT-encoded instances of logistics planning problems.) Analogous to basic GSAT, GSAT/Tabu can be extended with a random walk mechanism; limited experimentation suggests that typically this hybrid algorithm
does not perform better than GSAT/Tabu [Steinmann et al., 1997]. Overall, besides the dynamic local search algorithms covered in Section 6.4, GSAT/Tabu is one of the best-performing variants of GSAT known to date (see Example 6.1 on page 280).
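The tabu bookkeeping described above amounts to only a few lines of code. The following illustrative sketch (with search steps counted from 1, as in the description above) shows the mechanism in isolation:

class TabuStatus:
    """Tabu bookkeeping as described above: variable x is tabu in step t
    iff t - t_x <= tt, with all t_x initialised to -tt."""
    def __init__(self, variables, tt):
        self.tt = tt
        self.t_x = {x: -tt for x in variables}

    def is_tabu(self, x, t):
        return t - self.t_x[x] <= self.tt

    def record_flip(self, x, t):
        self.t_x[x] = t

tabu = TabuStatus(variables=[1, 2, 3], tt=2)
print(tabu.is_tabu(1, t=1))   # False: nothing is tabu right after initialisation
tabu.record_flip(1, t=1)
print(tabu.is_tabu(1, t=3))   # True: flipped 2 steps ago, still within tt
print(tabu.is_tabu(1, t=4))   # False: tabu tenure has expired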
HSAT and HWSAT

The intuition behind HSAT [Gent and Walsh, 1993b] is based on the observation that in basic GSAT, some variables might never get flipped although they are frequently eligible to be chosen. This can cause stagnation behaviour, since one of these variables may have to be flipped to allow the search to make further progress. Therefore, when in a search step there are several variables with identical score, HSAT always selects the least recently flipped variable, that is, the variable that was flipped longest ago. Only shortly after search initialisation, when there are still variables that have not been flipped, HSAT performs the same random tie-breaking between variables with identical score as plain GSAT. Apart from this difference in the variable selection mechanism, HSAT is identical to basic GSAT. Although HSAT was found to show superior performance over basic GSAT [Gent and Walsh, 1993b], it is clear that it is even more likely to get stuck in local minima from which it cannot escape, since the history-based tie-breaking rule effectively restricts the search trajectories when compared to GSAT. To counteract this problem, HSAT can be extended with the same random walk mechanism as used in GWSAT. The resulting variant is called HWSAT [Gent and Walsh, 1995]; like GWSAT, HWSAT has the PAC property. Generally, HWSAT shows improved peak performance over GWSAT. Compared to GSAT/Tabu, HWSAT's performance appears to be somewhat better on hard Uniform Random 3-SAT instances and certain types of structured SAT problems, and significantly worse in many other cases [Hoos and Stützle, 2000a].
In Depth: Efficiently Implementing GSAT

The key to implementing GSAT algorithms efficiently lies in caching and updating the variable scores that form the basis for selecting the variable to be flipped in each search step. Typically, not all variable scores change after each search step; this suggests that rather than recomputing all variable scores in each step, it should be more efficient to compute all scores when the search is initialised, but to subsequently only update the scores affected by a variable that has been flipped. The following definition will help to explain the precise mechanism for incrementally updating the scores and to analyse its time complexity.
Definition 6.1 Variable and Clause Dependencies

Given a CNF formula F and two variables x, x′ appearing in F, x′ is dependent on x (and vice versa) if, and only if, there is a clause in which both x and x′ appear. Furthermore, we define the set of variables dependent on x as
Vdep(F, x) := {x′ ∈ Var(F) | x′ is dependent on x}.

A clause c of F is dependent on x if, and only if, x appears in c, and the set of clauses dependent on x is defined as

Cdep(F, x) := {c | c is a clause of F and c is dependent on x}.

A clause c is critically satisfied by a variable x under assignment a if, and only if, x appears in c, c is satisfied under a, and flipping the value of x makes c unsatisfied. Finally, a variable x′ is critically dependent on a variable x under assignment a if, and only if, there is a clause c that is dependent on x and x′, and flipping x results in c changing its satisfaction status from (i) satisfied to unsatisfied or vice versa, or (ii) satisfied to critically satisfied (by x′) or vice versa.
After flipping a variable x, only clauses dependent on x can change their satisfaction status; hence, in order to update the evaluation function value (i.e., the number of unsatisfied clauses), only the clauses in Cdep(F, x) need to be considered. According to the definition of a variable's score, the score of x just changes its sign as a consequence of flipping x. For all other variables x′ ≠ x, the score of x′ remains unchanged if x′ is not dependent on x, that is, if x′ ∉ Vdep(F, x). Hence, after flipping x, only the scores of the variables in Vdep(F, x) need to be updated. In fact, among those, only the scores of variables that critically depend on x can actually change.

For a given formula F with n variables, m clauses and a clause length (number of literals per clause) bounded from above by CL(n), the time complexity of computing all variable scores is O(m · CL(n)). This is achieved by going through all clauses, checking their satisfaction status, and increasing or decreasing the scores of the variables appearing in a clause c, depending on whether c is currently unsatisfied, or whether it is critically satisfied by a given variable. At the end of this process, the evaluation function value, a list of all unsatisfied clauses and all variable scores have been computed.

After each search step, all variable scores that are affected by the respective flip can be updated in time O(CD(n) · CL(n)), where CD(n) is an upper bound on the cardinality of the sets Cdep(F, x). This is achieved by going through all clauses that are dependent on the flipped variable, x, and updating the scores of the variables occurring in these, depending on the (critical) satisfaction status of the respective clause before and after the flip of x. In order to perform this operation efficiently, for each variable x, a list is kept of the clauses that are dependent on x; these lists are built when parsing the input formula. For each variable, we furthermore store its current truth value and score, and for each clause, we store its (critical) satisfaction status under the current assignment.
For Uniform Random k-SAT formulae with constant clauses/variable ratio, the average number of dependent clauses for each variable is constant. Therefore, independent of instance size, this implementation of GSAT achieves a time complexity of Θ(1) for each search step, compared to Θ(n²) for a naïve implementation in which all variable scores are computed before every variable flip. For SAT-encoded instances of other combinatorial problems, there are typically more extensive variable dependencies, leading to a somewhat reduced, but still substantial performance advantage of the efficient implementation described above. The efficient mechanism for caching and updating variable scores described here is also used in Selman and Kautz's publicly available reference implementation of GSAT. Very similar techniques can be used for efficiently implementing other SLS algorithms, such as Galinier and Hao's tabu search algorithm for the CSP, which is outlined in Section 6.6. Interestingly, for the WalkSAT algorithms described in the following, a more straightforward implementation, which does not use the previously described caching and incremental updating scheme, achieves slightly better performance.
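To make the one-pass score computation described above concrete, the following illustrative sketch (not the reference implementation) computes all variable scores and the evaluation function value for a formula in the integer-literal representation used earlier; positive scores indicate a decrease in the number of unsatisfied clauses.

def compute_scores(formula, a):
    score = {x: 0 for x in a}
    num_unsat = 0
    for clause in formula:
        true_lits = [l for l in clause
                     if (a[abs(l)] if l > 0 else not a[abs(l)])]
        if not true_lits:                    # clause unsatisfied:
            num_unsat += 1
            for l in clause:                 # flipping any of its variables
                score[abs(l)] += 1           # would satisfy it
        elif len(true_lits) == 1:            # clause critically satisfied:
            score[abs(true_lits[0])] -= 1    # flipping that variable breaks it
    return score, num_unsat

F = [[1, 2], [1, 3], [-2, -3]]
print(compute_scores(F, {1: False, 2: False, 3: False}))
# -> ({1: 2, 2: 1, 3: 1}, 2): flipping x1 satisfies both unsatisfied clauses

Applying the same per-clause bookkeeping only to the clauses in Cdep(F, x) after a flip of x yields the incremental update discussed above.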
6.3 The WalkSAT Architecture

The WalkSAT architecture is based on ideas first published by Selman, Kautz and Cohen [1994] and was later formally defined as an algorithmic framework by McAllester, Selman and Kautz [1997]. WalkSAT can be seen as an extension of the conflict-directed random walk method that is also used in Papadimitriou's algorithm [1991] and GWSAT. It is based on a 2-stage variable selection process focused on the variables occurring in currently unsatisfied clauses. For each local search step, in a first stage, a clause c that is unsatisfied under the current assignment is selected uniformly at random. In a second stage, one of the variables appearing in c is then flipped to obtain the new assignment. Thus, while the GSAT architecture is characterised by a static neighbourhood relation between assignments with Hamming distance one, using this two-stage procedure, WalkSAT algorithms are effectively based on a dynamically determined subset of the GSAT neighbourhood relation. As a consequence of this substantially reduced effective neighbourhood size, WalkSAT algorithms can be implemented efficiently without caching and incrementally updating variable scores and still achieve substantially lower CPU times per search step than efficient GSAT implementations [Hoos, 1998; Hoos and Stützle, 2000a]. All WalkSAT algorithms considered here use the same random search initialisation and static random restart as GSAT. A pseudo-code representation of the WalkSAT architecture is shown in Figure 6.3.
procedure WalkSAT (F, maxTries, maxSteps, slc)
    input: CNF formula F, positive integers maxTries and maxSteps, heuristic function slc
    output: model of F or ‘no solution found’
    for try := 1 to maxTries do
        a := randomly chosen assignment of the variables in formula F;
        for step := 1 to maxSteps do
            if a satisfies F then
                return a
            end
            c := randomly selected clause unsatisfied under a;
            x := variable selected from c according to heuristic function slc;
            a := a with x flipped;
        end
    end
    return ‘no solution found’
end WalkSAT

Figure 6.3 The WalkSAT algorithm family. All random selections are according to a uniform probability distribution over the underlying sets; WalkSAT algorithms differ in the variable selection heuristic slc.
WalkSAT/SKC

The first WalkSAT algorithm, WalkSAT/SKC, originally introduced in a paper by Selman, Kautz and Cohen [1994], differs in one important aspect from most of the other SLS algorithms for SAT: the scoring function scoreb(x) used by WalkSAT/SKC counts the number of currently satisfied clauses that will be broken, that is, become unsatisfied, by flipping a given variable x. Using this scoring function, the following variable selection scheme is applied: If there is a variable with scoreb(x) = 0 in the clause selected in stage 1, that is, if c can be satisfied without breaking another clause, this variable is flipped (zero damage step). If more than one such variable exists in c, one of them is selected uniformly at random and flipped. If no such variable exists, with a certain probability 1 − p, the variable with minimal scoreb value is selected (greedy step; ties are broken uniformly at random); in the remaining cases, that is, with probability p (the so-called noise setting), one of the variables from c is selected uniformly at random (random walk step). Conceptually as well as historically, WalkSAT/SKC is closely related to GWSAT. However, there are a number of significant differences between both algorithms, which in combination account for the generally superior performance of WalkSAT/SKC. Both algorithms use closely related types of random walk
steps; but WalkSAT/SKC applies them only under the condition that there is no variable with scoreb(x) = 0. In GWSAT, on the other hand, random walk steps are performed in an unconditional probabilistic way. From this point of view, WalkSAT/SKC is greedier, since random walk steps, which usually increase the number of unsatisfied clauses, are only performed when every variable occurring in the selected clause would break some clauses when flipped. Yet, in a greedy step, due to its two-stage variable selection scheme, WalkSAT/SKC chooses from a significantly reduced set of neighbours and can therefore be considered less greedy than GWSAT. Finally, because of the different scoring function, in some sense, GWSAT shows a greedier behaviour than WalkSAT/SKC: In a best-improvement step, GWSAT may prefer a variable that breaks some clauses, but compensates for this by fixing other clauses, whilst in the same situation, WalkSAT/SKC would select a variable that may lead to a smaller reduction in the total number of unsatisfied clauses, but breaks fewer currently satisfied clauses. It has been proven that WalkSAT/SKC with fixed maxTries parameter has the PAC property when applied to 2-SAT [Culberson et al., 2000], but it is not known whether the algorithm is PAC in the general case. Note that, differently from GWSAT, it is not clear whether WalkSAT/SKC can perform arbitrarily long sequences of random walk steps, since random walk steps are only possible when the selected clause does not allow any zero damage steps. In practice, however, WalkSAT/SKC does not appear to suffer from any stagnation behaviour when using sufficiently high (instance-specific) noise settings, in which case its run-time behaviour is characterised by exponential RTDs [Hoos, 1998; Hoos and Stützle, 1999; 2000a]. As in the case of GWSAT, stagnation behaviour is frequently observed for low noise settings, and there is some evidence that the corresponding RTDs can be characterised by mixtures of exponential distributions [Hoos, 2002b]. Generally, when using (instance-specific) optimised noise settings, WalkSAT/SKC probabilistically dominates GWSAT in terms of the number of variable flips required for finding a model of a given formula, but it does not always reach the performance of HWSAT or GSAT/Tabu. When comparing CPU time, however, WalkSAT/SKC typically outperforms all GSAT variants presented in Section 6.2 (see also Example 6.1 on page 280).
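The two-stage step of WalkSAT/SKC described at the beginning of this section can be sketched as follows; this is an illustration with a naive computation of scoreb, and it assumes that the current assignment is not yet a model.

import random

def lit_true(lit, a):
    return a[abs(lit)] if lit > 0 else not a[abs(lit)]

def break_count(formula, a, x):
    """score_b(x): number of clauses critically satisfied by x, i.e.
    clauses that would be broken by flipping x."""
    return sum(1 for c in formula
               if sum(lit_true(l, a) for l in c) == 1
               and any(abs(l) == x and lit_true(l, a) for l in c))

def walksat_skc_step(formula, a, p, rng=random):
    # stage 1: select an unsatisfied clause uniformly at random
    unsat = [c for c in formula if not any(lit_true(l, a) for l in c)]
    c = rng.choice(unsat)
    # stage 2: select a variable from that clause
    vars_c = sorted({abs(l) for l in c})
    scores = {x: break_count(formula, a, x) for x in vars_c}
    zero = [x for x in vars_c if scores[x] == 0]
    if zero:
        x = rng.choice(zero)                        # zero damage step
    elif rng.random() < p:
        x = rng.choice(vars_c)                      # random walk step
    else:
        best = min(scores.values())                 # greedy step
        x = rng.choice([v for v in vars_c if scores[v] == best])
    a[x] = not a[x]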
WalkSAT with Tabu Search (WalkSAT/Tabu)

Analogously to GSAT/Tabu, there is also an extension to WalkSAT/SKC that uses a simple tabu search mechanism. WalkSAT/Tabu [McAllester et al., 1997] uses the same two-stage selection mechanism and the same scoring function scoreb
as WalkSAT/SKC and additionally enforces a tabu tenure of tt steps for each flipped variable. (To implement this tabu mechanism efficiently, the same approach is used as described in Section 6.2 for GSAT/Tabu.) If the selected clause c does not allow a zero damage step, of all the variables occurring in c that are not tabu, WalkSAT/Tabu picks the one with the highest scoreb value; when there are several variables with the same maximal score, one of them is selected uniformly at random. It may happen, however, that all variables appearing in c are tabu, in which case no variable is flipped (a so-called null-flip). WalkSAT/Tabu with fixed maxTries parameter has been shown to be essentially incomplete [Hoos, 1998; 1999a]. Although this is mainly caused by null-flips, it is not clear whether replacing null-flips by random walk steps, for instance, would be sufficient for obtaining the PAC property. In practice, when using sufficiently high (instance-specific) tabu tenure settings, WalkSAT/Tabu’s run-time behaviour is characterised by exponential RTDs; but there are cases (particularly for structured SAT instances) in which extreme stagnation behaviour is observed. Typically, however, WalkSAT/Tabu performs significantly better than WalkSAT/SKC, and there are structured SAT instances (e.g., large SAT-encoded blocks world planning problems), for which WalkSAT/Tabu appears to achieve better performance than any other SLS algorithm currently known [Hoos and Stützle, 2000a].
Novelty and Novelty+

Novelty [McAllester et al., 1997] is a WalkSAT algorithm that uses a history-based variable selection mechanism similar to HSAT. In Novelty, the number of local search steps that have been performed since a variable was last flipped is taken into consideration; this value is called the variable's age. An important difference of Novelty compared to WalkSAT/SKC and WalkSAT/Tabu is that it uses the same scoring function as GSAT. In Novelty, after an unsatisfied clause has been chosen, the variable to be flipped is selected as follows. If the variable with the highest score does not have minimal age among the variables within the same clause, it is always selected. Otherwise, it is only selected with a probability of 1 − p, where p is a parameter called the noise setting. In the remaining cases, the variable with the next lower score is selected (see also Figure 6.4). When sorting the variables according to their scores, ties are broken according to decreasing age. (If there are several variables with identical score and age, the reference implementation by Kautz and Selman always chooses the one appearing first in the selected clause.)

[Figure 6.4: Decision tree representation of Novelty's mechanism for selecting a variable to be flipped within a given clause. If the best variable does not have minimal age, it is selected deterministically; if it does have minimal age, it is selected with probability 1 − p, and the second-best variable is selected with probability p.]

Note that for p > 0, the age-based variable selection of Novelty probabilistically prevents flipping the same variable over and over again; at the same time,
flips can be immediately reversed with a certain probability if no better choice is available. Generally, the Novelty algorithm is significantly greedier than WalkSAT/SKC, since always one of the two most improving variables from a clause is selected, where WalkSAT/SKC may select any variable if no improvement can be achieved without breaking other clauses. Also, Novelty is more deterministic than WalkSAT/SKC and GWSAT, since its probabilistic decisions are more limited in their scope and take place under more restrictive conditions. For example, different from WalkSAT/SKC, the Novelty strategy for variable selection within a clause is deterministic for both p = 0 and p = 1. On the one hand, this typically leads to a significantly improved performance of Novelty when compared to WalkSAT/SKC. On the other hand, because of this property, it can be shown that, for fixed maxTries setting, Novelty is essentially incomplete [Hoos, 1998], because selecting only among the best two variables in a given clause can lead to situations where the algorithm gets stuck in local minima of the objective function. This situation has been observed for a number of commonly used benchmark instances, where it severely compromises Novelty’s performance [Hoos and Stützle, 2000a]. By extending Novelty with conflict-directed random walk analogously to GWSAT, the essential incompleteness as well as the empirically observed stagnation behaviour can be overcome. The Novelty+ algorithm [Hoos, 1998; 1999a] selects the variable to be flipped according to the standard Novelty mechanism with probability 1 − wp, and performs a random walk step, as defined above for GWSAT, in the remaining cases. A GLSM model of the resulting algorithm is shown in Figure 6.5.
Figure 6.5 GLSM models for Novelty (left side) and Novelty+ (right side); the restart predicate R is equal to countm(m), GLSM state RP initialises the search at a randomly selected variable assignment, NV(p) performs a Novelty step (with noise setting p), and RW performs a random walk step (see text for details).
Novelty+ is provably PAC for wp > 0 and shows exponential RTDs for sufficiently high (instance-specific) settings of the primary noise parameter, p. In practice, small walk probabilities, wp, are generally sufficient to prevent the extreme stagnation behaviour that is occasionally observed for Novelty and to achieve substantially superior performance compared to Novelty. In fact, a setting of wp := 0.01 seems to result in uniformly good performance [Hoos, 1999a], and the algorithm's performance appears to be much more robust w.r.t. the wp parameter than w.r.t. the primary noise setting, p. In cases where Novelty does not suffer from stagnation behaviour, Novelty+'s performance for wp := 0.01 is typically almost identical to Novelty's. Overall, Novelty+ is one of the best-performing WalkSAT algorithms currently known and one of the best SLS algorithms for SAT available to date [Hoos and Stützle, 2000a; Hutter et al., 2002].
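The Novelty+ selection mechanism can be summarised in a few lines of Python; in this sketch (again not the reference implementation), score[x] denotes the GSAT score of flipping x and age[x] the number of steps since x was last flipped, both assumed to be maintained by the surrounding search procedure.

import random

def novelty_plus_select(clause_vars, score, age, p, wp):
    # with probability wp, perform a random walk step
    # (this is what distinguishes Novelty+ from Novelty)
    if random.random() < wp:
        return random.choice(clause_vars)
    # rank variables by decreasing score, breaking ties by decreasing age
    ranked = sorted(clause_vars, key=lambda x: (score[x], age[x]), reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else ranked[0]
    if age[best] != min(age[x] for x in clause_vars):
        return best                # best variable is not the most recently flipped
    return second if random.random() < p else best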
R-Novelty and R-Novelty+

R-Novelty [McAllester et al., 1997] is a variant of Novelty that is based on the intuition that, when deciding between the best and second best variable (using the same scoring function as for Novelty), the actual difference of the respective scores should be taken into account. The exact mechanism for choosing a variable
from the selected clause can be seen from the decision tree representation shown in Figure 6.6. Note that the R-Novelty heuristic is quite complex — as reported by McAllester et al. [1997], it was discovered by systematically testing a large number of WalkSAT variants. R-Novelty’s variable selection strategy is even more deterministic than Novelty’s; in particular, it is completely deterministic for any p ∈ {0, 0.5, 1}. Since the pure R-Novelty algorithm gets too easily stuck in local minima, an extremely simple diversification mechanism is used: Every 100 steps, a variable is randomly chosen from the selected clause and flipped. As shown in [Hoos, 1998; 1999a], this loop breaking strategy is generally not sufficient for effectively escaping from local minima and leaves R-Novelty essentially incomplete (for fixed maxTries); as in the case of Novelty, severe stagnation behaviour is observed in practice for some SAT instances [Hoos and Stützle, 2000a]. R-Novelty’s performance is often, but not always, superior to Novelty’s. Replacing the original diversification mechanism in R-Novelty with a random walk mechanism exactly analogous to the one used in Novelty+ leads to the R-Novelty+ algorithm [Hoos, 1998; 1999a]. Like Novelty+ , R-Novelty+ is provably PAC for wp > 0 and shows exponential RTDs for sufficiently high (instance-specific) noise settings. Again, a small walk probability of wp := 0.01 appears to be generally sufficient for avoiding stagnation behaviour and for robustly achieving good performance in practice. R-Novelty+ ’s performance for instances on which R-Novelty does not suffer from stagnation behaviour is very similar to R-Novelty’s. There is some indication that R-Novelty and R-Novelty+ do not reach the performance of Novelty+ on several classes of structured SAT
instances, including SAT-encoded hard graph colouring and planning problems [Hoos and Stützle, 2000a].

Figure 6.6 Decision tree representation of the mechanism used by R-Novelty for selecting a variable to be flipped in a given clause c; 'score difference' refers to the difference in score between the best and the second best variable in c; p1 := min{2 − 2p, 1}, p2 := max{1 − 2p, 0}.
Example 6.1 Performance Results for SLS Algorithms for SAT
To illustrate the performance differences between various GSAT and WalkSAT algorithms, we empirically analysed their performance on a number of well-known benchmark instances for SAT. All performance results reported in the following are based on at least 100 runs per problem instance, conducted on a PC with a 2 GHz Xeon CPU, 512 KB cache and 4 GB RAM, running Red Hat Linux 2.4.20-18.9. All algorithms were run with optimised parameters (noise and tabu tenure) and without restart. The left side of Figure 6.7 shows the run-time distributions of GSAT with Random Walk (GWSAT), GSAT/Tabu, WalkSAT/SKC and Novelty+, determined from 500 runs of the algorithm on a SAT-encoded, hard graph colouring instance with 100 vertices and 239 edges from SATLIB [Hoos and Stützle, 2000a]. The observed performance differences, which are consistent across all percentiles of the RTDs, are typical for many types of SAT instances. Furthermore, all four RTDs can be well approximated by exponential distributions, which again is characteristic for these algorithms when using sufficiently high noise and tabu tenure settings [Hoos and Stützle, 2000a].
Figure 6.7 Left: run-time distributions for various GSAT and WalkSAT algorithms on a SAT-encoded graph colouring instance. Right: correlation of median search cost between GSAT with Random Walk (GWSAT) and Novelty+ on a set of randomly generated, SAT-encoded graph colouring problems; the horizontal and vertical lines indicate the median, q0.1 and q0.9 of the search cost for the respective algorithm across the test-set; the diagonal lines indicate equal, 1/10th and 1/100th CPU time of Novelty+ compared to GWSAT. (For further details, see text.)
It may be noted that basic GSAT, when run on the same problem instance, could not find a solution in 500 runs of 10 CPU seconds each.
The right side of Figure 6.7 illustrates the correlation of search cost between GWSAT and Novelty+ across a set of 100 instances from the same distribution of randomly generated, hard graph colouring instances as the previously studied instance. Each data point in the graph represents the median CPU time required by GWSAT vs Novelty+ on a single problem instance, determined from an RTD based on 100 runs. Horizontal and vertical lines indicate the median as well as the q0.1 and q0.9 percentiles of the distribution of search cost for the two algorithms, respectively, across the entire test-set. As can be clearly seen from this correlation plot, Novelty+ performs substantially better than GWSAT across the entire test-set. Furthermore, the hardness of the problem instances for both algorithms is highly correlated, indicating that both algorithms are affected by the same features of the respective instances (in this case, the solution density, which varies substantially across the test-set; see also Chapter 5, Section 5.1). Similar results hold for all GSAT and WalkSAT algorithms discussed in this chapter.
Table 6.1 summarises performance results for several GSAT and WalkSAT algorithms on a test-set comprising a hard instance from the solubility phase transition of Uniform Random 3-SAT, as well as SAT-encoded instances of graph colouring, Boolean function learning, and planning problems. These SAT instances range in size from 75 variables and 298 clauses (par8-5-c) to 3 016 variables and 50 457 clauses (bw_large.c). The results illustrate the excellent performance of Novelty+ and WalkSAT/Tabu compared to other GSAT and WalkSAT algorithms. Note also that GSAT/Tabu often performs better than the WalkSAT algorithms in terms of search steps required for solving a given problem instance; yet, as previously explained, this rarely results in faster run-times, since search steps of WalkSAT algorithms can be implemented more efficiently than those of GSAT algorithms. More detailed results on the performance and behaviour of GSAT and WalkSAT algorithms can be found in Hoos and Stützle [2000a].

Problem Instance    GSAT/Tabu            WalkSAT/SKC          WalkSAT/Tabu         Novelty+
uf200/hard          45.4 (9.30 · 10^6)   1.04 (0.85 · 10^6)   4.61 (3.70 · 10^6)   0.82 (0.61 · 10^6)
flat100/hard        0.39 (73 594)        0.15 (192 788)       0.18 (229 496)       0.06 (80 315)
par8-5-c            0.22 (45 027)        0.010 (13 345)       0.006 (8 388)        0.003 (3 341)
logistics.d         —                    0.58 (398 277)       0.49 (332 494)       0.21 (113 664)
bw_large.a          0.09 (6 977)         0.02 (13 505)        0.01 (7 563)         0.01 (6 780)
bw_large.c          11.9 (1.01 · 10^6)   23.6 (9.76 · 10^6)   5.6 (2.00 · 10^6)    11.4 (4.36 · 10^6)

Table 6.1 Performance of various GSAT and WalkSAT algorithms on selected benchmark instances for SAT; the table entries are median run-times obtained from RTDs based on 100 or more runs per instance, reported in CPU seconds (with the corresponding numbers of search steps in parentheses). All algorithms solved any given problem instance in every run, with the exception of GSAT/Tabu, which did not solve logistics.d in 10 runs of 10 CPU seconds each for any of a number of tabu tenure settings tested, and which solved bw_large.c in only 247 of 250 runs.
WalkSAT with Adaptive Noise

The noise parameter, p, which is common to all WalkSAT algorithms discussed here with the exception of WalkSAT/Tabu (where the tabu tenure tt plays a similar role), has a major impact on the performance and run-time behaviour of the respective algorithm. For low noise settings, stagnation behaviour is typically observed, and as a consequence, using an appropriate maxSteps setting for the static restart mechanism becomes crucial for obtaining good performance [Hoos and Stützle, 2000a]. For sufficiently high noise settings, however, the maxSteps setting typically has little or no impact on the behaviour of the algorithm [Parkes and Walser, 1996; Hoos and Stützle, 1999], since the corresponding RTDs are closely approximated by exponential distributions. (There are exceptions to this general observation, including instances on which essentially incomplete WalkSAT variants show extreme stagnation behaviour, as well as the irregular instances recently described by Hoos [2002b].) Fortunately, for many of the most prominent and best-performing WalkSAT algorithms, including WalkSAT/SKC, WalkSAT/Tabu, Novelty+ and R-Novelty+, the noise settings required for reaching peak performance are generally high enough that the cutoff parameter, maxSteps, does not affect performance unless it is chosen too low, in which case performance is degraded. This leaves the noise setting, p, to be optimised in order to achieve maximal performance of these WalkSAT algorithms. Unfortunately, finding the optimal noise setting is typically rather difficult. Because optimal noise settings appear to differ considerably depending on the given problem instance, this task often requires experience and substantial experimentation with various noise values [Hoos and Stützle, 2000a]. It has been shown that even relatively minor deviations from the optimal noise setting can lead to a substantial increase in the expected time for solving a given instance; and to make matters worse, the sensitivity of WalkSAT's performance w.r.t. the noise setting seems to increase with the size and hardness of the problem instance
to be solved [Hoos, 2002a]. This complicates the use of WalkSAT for solving SAT instances as well as the evaluation, and hence the development, of new WalkSAT algorithms. The key idea behind Adaptive WalkSAT [Hoos, 2002a] is to use high noise values only when they are needed to escape from stagnation situations in which the search procedure appears to make no further progress towards finding a solution. This idea is closely related to the motivation behind Reactive Tabu Search [Battiti and Tecchiolli, 1994]. More precisely, Adaptive WalkSAT dynamically adjusts the noise setting p, and hence the probability for performing greedy steps, based on search progress, as reflected in the time elapsed since the last improvement in the evaluation function has been achieved. At the beginning of the search process, the search is maximally greedy (p := 0). This will typically lead to a series of rapid improvements in the evaluation function value, followed by stagnation (unless a solution to the given problem instance is found). In this situation, the noise value is increased. If this increase is not sufficient to escape from the stagnation situation, that is, if it does not lead to an improvement in evaluation function value within a certain number of steps, the noise setting is further increased. Eventually, the noise setting should be high enough for the search process to overcome the stagnation situation, at which point the noise can be gradually decreased until the next stagnation situation is detected or a solution to the given problem instance is found. As an indicator for search stagnation, Adaptive WalkSAT uses a predicate that is true if, and only if, no improvement in evaluation function value has been observed over the last θ · m search steps, where m is the number of clauses of the given problem instance, and θ is a parameter. Every increase in the noise setting is realised as p := p + (1 − p) · φ. The decrements are defined as p := p − p · φ/2, where p is the noise level, and φ is an additional parameter. The asymmetry between increases and decreases in the noise setting is motivated by the fact that detecting search stagnation is computationally more expensive than detecting search progress, and by the observation that it is advantageous to approximate optimal noise levels from above rather than from below [Hoos, 2002a]. After the noise setting has been increased or decreased, the current evaluation function value is stored and becomes the basis for measuring improvement, and hence for detecting search stagnation. As a consequence, between increases in noise level, there is always a phase during which the trajectory is monitored for search progress without further increasing the noise. No such delay is enforced between successive decreases in noise level. It may be noted that the behaviour of the adaptive noise mechanism is controlled by two parameters, θ and φ. While one might assume that this merely replaces the problem of tuning one parameter, p, with the potentially more difficult problem of tuning these new parameters, it appears that the performance
of Adaptive WalkSAT is much more robust w.r.t. the settings of θ and φ than WalkSAT is w.r.t. the noise setting. Using fixed settings of θ := 1/6 and φ := 0.2 for Adaptive Novelty+ generally seems to result in performance similar to that observed for Novelty+ with approximately optimal, instance-specific noise settings; in some cases, Adaptive Novelty+ achieves significantly better performance than Novelty+ with approximately optimal static noise [Hoos, 2002a], which makes Adaptive Novelty+ one of the best-performing and most robust SLS algorithms for SAT currently available.
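A simplified rendering of this adaptive noise mechanism is sketched below in Python; the bookkeeping of the original implementation differs in details, and the class and method names are ours.

class AdaptiveNoise:
    def __init__(self, m, theta=1/6, phi=0.2):
        self.p = 0.0          # start maximally greedy
        self.m = m            # number of clauses of the given instance
        self.theta = theta
        self.phi = phi
        self.steps_since_improvement = 0

    def update(self, improved):
        # improved: True if the evaluation function value improved on the
        # value stored at the last noise adaptation
        if improved:
            # decrease the noise; no delay between successive decreases
            self.p -= self.p * self.phi / 2
            self.steps_since_improvement = 0
        else:
            self.steps_since_improvement += 1
            if self.steps_since_improvement > self.theta * self.m:
                # stagnation detected: increase the noise
                self.p += (1 - self.p) * self.phi
                self.steps_since_improvement = 0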
6.4 Dynamic Local Search Algorithms for SAT

The first application of Dynamic Local Search to SAT was proposed around the same time as GWSAT. Since then, a number of DLS algorithms for SAT have been developed, the most recent of which achieve better performance than the best GSAT and WalkSAT variants for many types of SAT instances and can therefore be seen as the best-performing SLS algorithms for SAT currently known. Most DLS algorithms for SAT are based on variants of GSAT as their underlying local search procedure. The solution components that are being selectively penalised are the clauses of the given formula; in the following, we denote the penalty associated with clause c by clp(c). (Here and in the following we assume — without loss of generality — that all clauses of a given CNF formula are pairwise different.) Consistent with the general outline for DLS algorithms from Section 2.2 (page 82ff.), typically a modified evaluation function of the form

g'(F, a) := g(F, a) + Σ_{c ∈ CU(F, a)} clp(c)
is used within the local search procedure, where CU(F, a) is the set of all clauses in F that are unsatisfied under assignment a. Many DLS algorithms for SAT use the notion of clause weights clw(c) instead of clause penalties, where clw(c) := clp(c) + 1 and g'(F, a) := Σ_{c ∈ CU(F, a)} clw(c).
For g(F, a) := #CU(F, a), the standard evaluation function used by most SLS algorithms for SAT, both definitions of g'(F, a) are equivalent. The major
differences between DLS algorithms for SAT are in the details of the local search procedure and in the scheme used for updating the clause penalties or weights. Most DLS algorithms for SAT perform excellently in terms of the number of variable flips required for finding a model of a given formula. However, the time complexity and frequency of the weight updates is typically rather high, which makes it difficult for DLS algorithms to reach or exceed the time performance of the best-performing WalkSAT variants. Unfortunately, the run-time behaviour of DLS algorithms for SAT has not been as thoroughly investigated as that of GSAT and WalkSAT algorithms. In particular, little is known about these algorithms in terms of their asymptotic run-time behaviour, search stagnation and RTD characterisations.
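For illustration, the modified evaluation function g'(F, a) can be computed as follows; this is a sketch using the same clause-list representation as in the earlier examples, with penalties given as a list holding one entry per clause.

def g_prime(clauses, penalties, a):
    def sat(clause):
        return any(a[abs(lit)] == (lit > 0) for lit in clause)
    unsat = [i for i, c in enumerate(clauses) if not sat(c)]
    # number of unsatisfied clauses plus their penalties; equivalent to the
    # total weight of unsatisfied clauses for clw(c) = clp(c) + 1
    return len(unsat) + sum(penalties[i] for i in unsat)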
GSAT with Clause Weights

This early DLS algorithm for SAT is based on the observation that, when applied to certain types of structured SAT instances, basic GSAT often finds the same set of clauses unsatisfied at the end of a run [Selman and Kautz, 1993]. In this GSAT variant, weights are associated with each clause. These weights are initially set to one; before each restart, the weights of all currently unsatisfied clauses are increased by δ := 1. The underlying local search procedure is a variant of basic GSAT that uses the modified evaluation function g'(F, a) introduced above. It may be noted that for sufficiently high maxSteps settings, this local search procedure will terminate in or very close to a local minima region of the underlying search space. Different from the other DLS methods discussed in this section, GSAT with Clause Weights begins each local search phase from a randomly selected variable assignment. (A further extension, called ‘Averaging In’, uses a modified search initialisation that introduces a bias towards the best candidate solutions reached in previous local search phases [Selman and Kautz, 1993].) GSAT with Clause Weights performs substantially better than basic GSAT on various classes of structured SAT instances, including SAT-encoded graph colouring problems; there is also some indication that by using the same clause weighting mechanism with GWSAT, further performance improvements can be achieved [Selman and Kautz, 1993]. Today, since its performance is not competitive with any of the more recent DLS algorithms for SAT presented in the following, GSAT with Clause Weights is mainly of historical interest. Several variants of GSAT with Clause Weights have been studied by Cha and Iwama [1995]. In particular, they proposed and tested a variant that — like the Breakout Method [Morris, 1993], an earlier DLS algorithm for the CSP — performs weight updates whenever a local minimum of the modified evaluation function is encountered and, in its basic form, does not perform restarts. This
algorithm appears to perform substantially better than GSAT and GWSAT when applied to a class of randomly generated SAT instances that have only a single model [Asahiro et al., 1996]. (These instances, however, are not intrinsically hard, because they can be solved by polynomial simplifications, and hence they are only of limited use as benchmark problems [Hoos and Stützle, 2000a].) There is no evidence that this variant performs better than the original GSAT with Clause Weights algorithm. Cha and Iwama also investigated slight variations of the weight update scheme, as well as combinations of their basic algorithm with static restarts and a simple tabu search strategy that, different from GSAT/Tabu or WalkSAT/Tabu, associates a tabu status with the most recently visited variable assignments, rather than with recently flipped variables [Cha and Iwama, 1995]. From their limited empirical results it appears that none of these variations achieves significant performance improvements over their previously described, basic variant of GSAT with Clause Weights.
Methods Using Rapid Weight Adjustments

Frank introduced several variants of GSAT with Clause Weights that perform weight updates after each local search step [Frank, 1996; 1997]. The underlying idea is that GSAT should benefit from discovering which clauses are most difficult to satisfy relative to recent assignments. The most basic of these variants, called WGSAT, uses the same weight initialisation and update procedure as GSAT with Clause Weights, but performs only a single GSAT step before updating the clause weights. On hard Random 3-SAT instances, WGSAT achieves significantly improved performance over HSAT (and hence, basic GSAT) when measuring run-time in terms of the variable flips required for finding a solution [Frank, 1996; 1997]. When comparing CPU times, however, it appears that due to the computational overhead caused by the frequent weight updates, WGSAT's performance cannot reach that of HSAT or GWSAT. A modification of this algorithm, called UGSAT, uses a best-improvement local search strategy, but restricts the neighbourhood considered in each search step to the set of variables appearing in currently unsatisfied clauses [Frank, 1996]. (Note that this is the same effective neighbourhood as used in the random walk steps of GWSAT.) While this leads to considerable speedups for naïve implementations of the underlying local search procedure, the difference for efficient implementations is likely to be insufficient to render UGSAT competitive with HSAT or GWSAT. Another variant of WGSAT implements a uniform decay of clause weights over time. The underlying idea is that the relative importance of clauses w.r.t.
their satisfaction status can change during the search, and hence a mechanism is needed that focuses the weighted search on the most recently unsatisfied clauses. In WGSAT with Decay, this idea is implemented by uniformly decaying all clause weights in each weight update phase before the weights of the currently unsatisfied clauses are increased; this decay is performed according to the formula clw(c) := ρ · clw(c), where the decay rate ρ (with 0 < ρ < 1) is a parameter of the algorithm [Frank, 1997]. Empirical results suggest that on larger instances from the phase transition region of Uniform Random 3-SAT, using this decay mechanism slightly improves the performance of WGSAT when measured in terms of variable flips required for finding a model; this improvement, however, appears to be insufficient to amortise the added time complexity of the frequent weight update steps. Nevertheless, as we will see later in this section, similar mechanisms for focusing the search on recently unsatisfied clauses play a crucial role in state-of-the-art DLS algorithms for SAT.
Guided Local Search (GLS)

This DLS algorithm has been applied to a number of combinatorial problems [Voudouris, 1997; Voudouris and Tsang, 1999]. GLS for SAT (GLSSAT) [Mills and Tsang, 1999a; 2000] is based on a local search algorithm that, similar to HSAT, Novelty and R-Novelty, implements a bias towards flipping variables whose respective values have not been changed recently. More precisely, in each local search step, from the set of all variables that, when flipped, would lead to a strict decrease in the total penalty of unsatisfied clauses, the one whose last flip has occurred least recently is flipped. If no such strictly improving variable exists, the same selection is made from the set of all variables that, when flipped, do not cause an increase in the evaluation function value. The subsidiary local search procedure terminates when a satisfying assignment is found, or after a fixed number smax of consecutive non-improving flips has been made. Before the actual search begins, GLSSAT performs a complete pass of unit propagation in order to simplify the given formula. Then, all clause penalties are initialised to zero, and the search starts from a variable assignment that is chosen uniformly at random. After each local search phase, the penalties of all clauses with maximal utility are incremented by δ := 1, where the utility of a clause c under assignment a is defined as util(a, c) := 1/(1 + clp(c)) if clause c is unsatisfied under a, and zero otherwise. Note that this corresponds to incrementing the smallest clause penalties occurring in currently unsatisfied clauses. An important extension of GLSSAT uses an additional mechanism for bounding the range of the clause penalties: if, after updating the clause penalties, the maximum penalty exceeds
a given threshold, pmax, all clause penalties are uniformly decayed by multiplying them with a factor pdecay. This clause penalty decay mechanism has a substantial impact on the performance of GLSSAT and significantly improves the algorithm’s efficacy in solving large and hard structured instances. A similar modification of GLSSAT, called GLSSAT2, was used in another study [Mills and Tsang, 2000]; in this variant, all clause penalties are multiplied by a factor pdecay := 0.8 after every 200 penalty updates. GLSSAT achieves better performance than WalkSAT/SKC on some widely used benchmark instances when measuring run-time in terms of variable flips, but in many cases WalkSAT/SKC is superior in terms of CPU time [Mills and Tsang, 2000]. There are some hard structured SAT instances, however, for which GLSSAT2 appears to perform significantly better than WalkSAT/SKC. Indirect evidence suggests that GLSSAT is generally outperformed by the most recent DLS algorithms for SAT, such as ESG and SAPS (these are described later in this section).
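The GLSSAT penalty update described above can be sketched as follows; since util(a, c) = 1/(1 + clp(c)), incrementing the penalties of all unsatisfied clauses with maximal utility amounts to incrementing the minimal penalties among the unsatisfied clauses. The parameter names pmax and pdecay are taken from the text; the rest of the representation is ours.

def glssat_update_penalties(clauses, clp, a, pmax, pdecay):
    def sat(clause):
        return any(a[abs(lit)] == (lit > 0) for lit in clause)
    unsat = [i for i, c in enumerate(clauses) if not sat(c)]
    if not unsat:
        return
    min_pen = min(clp[i] for i in unsat)   # minimal penalty = maximal utility
    for i in unsat:
        if clp[i] == min_pen:
            clp[i] += 1
    if max(clp) > pmax:                    # bound the penalty range by decaying
        for i in range(len(clp)):
            clp[i] *= pdecay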
The Discrete Lagrangian Method (DLM)

The basic DLM algorithm for SAT [Shang and Wah, 1998] is motivated by the theory of Lagrange multipliers for continuous optimisation. Basic DLM is a DLS algorithm based on GSAT/Tabu with clause weights as its underlying local search procedure; in each search step, it flips a non-tabu variable that maximises the decrease in the total weight of all unsatisfied clauses. This subsidiary local search is terminated when an assignment is reached for which the number of neighbouring assignments with larger or equal evaluation function value exceeds a given threshold θ1. After each local search phase, the penalties for all unsatisfied clauses are increased by δ+ := 1; additionally, in order to bound the range of the clause penalties, all penalties are reduced by δ− := 1 after every θ2 local search phases. Before the actual search begins, DLM simplifies the given formula by performing a complete pass of unit propagation. As usual, all clause penalties are initialised to zero, and the search process starts from a variable assignment that is chosen uniformly at random. This basic DLM algorithm has been extended in various ways. DLM-99-SAT [Wu and Wah, 1999] uses an additional mechanism for escaping more effectively from local minima of the evaluation function. The idea behind this mechanism is to identify clauses that are frequently unsatisfied in local minima, and to additionally increase their penalties. This is achieved by means of temporary clause penalties ti, which are initialised at zero and increased by δw := 1 for all unsatisfied clauses whenever a local minimum is encountered. After each regular clause penalty update, if the ratio between the maximal ti and the average ti over all
clauses exceeds a threshold θ3, the regular penalty of the clause with the largest ti is increased by δs := 1. (In another variant, only the ti of currently unsatisfied clauses are considered when computing the ratio and determining the clause penalty that receives the additional increase.) A different extension of DLM, called DLM-2000-SAT, uses a long-term memory mechanism for preventing the search process from getting stuck repeatedly in certain attractive non-solution areas of the search space. This is implemented by using a list of previously visited assignments and by adding an additional distance penalty to the evaluation function for assignments that are close to the elements of this list. More precisely, during the search process, every ws variable flips, the current variable assignment is added to a fixed-length queue. Using the assignments aj in this queue, a distance term for a given variable assignment a is computed as d := Σ_j min{θt, hd(a, aj)}, where hd(a, aj) is the Hamming distance (i.e., the number of variables assigned different values) between assignments a and aj. The evaluation function used in the subsidiary local search procedure is then extended to g'(F, a) := g(F, a) + Σ_{c ∈ CU(F, a)} clw(c) − d, where CU(F, a) denotes the set of clauses in F unsatisfied under a. Note that by using a bound θt ≪ n on the distance contribution from each assignment aj, the impact of this mechanism on the search process is fairly localised. DLM-99-SAT shows substantially better performance than the basic DLM algorithm, particularly on large and structured SAT instances. DLM-2000-SAT, the most recent DLM variant, typically seems to perform better than DLM-99-SAT as well as WalkSAT/SKC. For a considerable time, this dynamic local search algorithm was one of the best known SLS algorithms for SAT.
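The distance term used by DLM-2000-SAT is easily expressed in code; in this sketch, assignments are dictionaries over the same variable set and visited is the fixed-length queue of previously visited assignments (the function name is ours).

def distance_term(a, visited, theta_t):
    def hd(a1, a2):
        # Hamming distance: number of variables assigned different values
        return sum(1 for x in a1 if a1[x] != a2[x])
    # each stored assignment contributes at most theta_t, which keeps the
    # effect of the mechanism localised
    return sum(min(theta_t, hd(a, aj)) for aj in visited)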
The Exponentiated Subgradient Algorithm (ESG)

The Exponentiated Subgradient (ESG) algorithm [Schuurmans et al., 2001] is motivated by subgradient optimisation, a well-known method for minimising Lagrangian functions, which is often used for generating good lower bounds in branch and bound techniques or as a heuristic in local search algorithms. ESG starts its search from a randomly selected variable assignment after initialising all clause weights to one. As its underlying local search procedure, ESG uses a best-improvement search method that can be seen as a simple variant of GSAT. In each local search step, the variable to be flipped is selected uniformly at random from the set of all variables that appear in currently unsatisfied clauses and whose flip leads to a maximal reduction in the total weight of unsatisfied clauses. When reaching a local minimum position (i.e., an assignment in which flipping any variable that appears in an unsatisfied clause would not lead to a decrease in the total weight of unsatisfied clauses), with probability η, the search
is continued by flipping a variable that is uniformly chosen at random from the set of all variables appearing in unsatisfied clauses; otherwise, the local search phase is terminated. After each local search phase, the clause weights are updated. This involves two stages: first, the weights of all clauses are multiplied by a factor depending on their satisfaction status; weights of satisfied clauses are multiplied by αsat, weights of unsatisfied clauses by αunsat (scaling stage). Then, all clause weights are smoothed using the formula clw(c) := clw(c) · ρ + (1 − ρ) · w̄ (smoothing stage), where w̄ is the average of all clause weights after scaling, and the parameter ρ has a fixed value between zero and one. The algorithm terminates when a satisfying assignment for F has been found or when a user-specified maximal number of iterations has been completed. In a straight-forward implementation of ESG, the weight update steps are computationally much more expensive than the weighted search steps, whose cost is determined by the underlying basic local search procedure. Each weight update step requires accessing all clause weights, while a weighted search step only needs to access the weights of the critical clauses, that is, clauses that can change their satisfaction status when a variable appearing in a currently unsatisfied clause is flipped. (The complexity of all other operations is dominated by these operations.) Typically, for the major part of the search, only few clauses are unsatisfied; hence, only a small subset of the clauses is critical, rendering the weighted search steps computationally much cheaper than weight updates. If weight updates occurred only very infrequently compared to weighted search steps, the relatively high complexity of the weight update steps might not have a significant effect on the performance of the algorithm. However, experimental evidence indicates that the fraction of weighting steps performed by ESG is quite high; it ranges from around 7% for SAT encodings of large flat graph colouring problems to more than 40% for SAT-encoded all-interval-series problems [Hutter et al., 2002]. Efficient implementations of ESG therefore critically depend on additional techniques in order to achieve the competitive performance results reported by Schuurmans et al. [2001]. The most recent publicly available ESG-SAT implementation by Southey and Schuurmans (Version 1.4), for instance, uses αsat := 1 (which avoids the effort of scaling satisfied clauses), replaces w̄ by 1 in the smoothing step, and utilises a ‘lazy’ weight update technique which updates clause weights only when they are needed. When measuring run-time in terms of search steps, ESG typically performs substantially better than state-of-the-art WalkSAT variants, such as Novelty+. In terms of CPU-time, however, even the optimised ESG-SAT implementation by Southey and Schuurmans does not always reach the performance of Novelty+. Compared to DLM-2000-SAT, ESG-SAT typically requires fewer steps for finding
a model of a given formula, but in terms of CPU-time, both algorithms show very similar performance [Schuurmans et al., 2001; Hutter et al., 2002].
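The two-stage ESG weight update described above corresponds to the following sketch (not the optimised ESG-SAT implementation), again using the clause-list representation of the earlier examples:

def esg_update_weights(clauses, clw, a, alpha_sat, alpha_unsat, rho):
    def sat(clause):
        return any(a[abs(lit)] == (lit > 0) for lit in clause)
    # scaling stage: multiply each weight according to satisfaction status
    for i, c in enumerate(clauses):
        clw[i] *= alpha_sat if sat(c) else alpha_unsat
    # smoothing stage: pull all weights towards their average
    w_bar = sum(clw) / len(clw)
    for i in range(len(clw)):
        clw[i] = clw[i] * rho + (1 - rho) * w_bar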
Scaling and Probabilistic Smoothing (SAPS)

The SAPS algorithm [Hutter et al., 2002] can be seen as a variant of ESG that uses a modified weight update scheme, in which the scaling stage is restricted to the weights of currently unsatisfied clauses, and smoothing is only performed with a certain probability psmooth. Note that restricting the scaling operation to the weights of unsatisfied clauses (αsat := 1) does not affect the variable selection in the weighted search phase, since rescaling all clause weights by a constant factor leaves the selection mechanism unchanged. (Southey and Schuurmans' efficient ESG implementation also makes use of this fact.) This reduces the complexity of the scaling step from Θ(#C(F)) to Θ(#CU(F, a)), where C(F) is the set of clauses in the given CNF formula F and CU(F, a) is the set of clauses in F that are unsatisfied under assignment a. After a short initial search phase, typically only a few clauses remain unsatisfied, such that #CU(F, a) becomes rather small compared to #C(F); this effect seems to be more pronounced for larger SAT instances with many clauses. The smoothing step, however, has complexity Θ(#C(F)) and now dominates the complexity of the weight update. Therefore, by applying the expensive smoothing operation only occasionally, the time complexity of the weight update procedure can be substantially reduced. It has been shown experimentally that this does not have a detrimental effect on the performance of the algorithm in terms of the number of weighted search steps required for solving a given instance [Hutter et al., 2002]. By having the weight update procedure perform smoothing of all clause weights (using the same formula as shown in the description of ESG above) only with a probability psmooth ≪ 1, the time complexity of a weight update is reduced to Θ(psmooth · #C(F) + #CU(F, a)), compared to Θ(#C(F) + #CU(F, a)) for ESG. As a result, the discounted cost of smoothing no longer dominates the algorithm's run-time. Performing the smoothing probabilistically, rather than deterministically after a fixed number of steps (like the occasional clause weight reduction in DLM), also has the theoretical advantage of preventing the algorithm from getting trapped in the same kind of cyclic behaviour that renders R-Novelty essentially incomplete. (In practice, SAPS has been found to consistently perform well for small psmooth values of about 0.05.) The SAPS algorithm as described here does not require additional implementation tricks other than the standard mechanism for efficiently accessing critical clauses, which is used in all efficient implementations of SLS algorithms for SAT.
procedure UpdateClauseWeights(F, a; α, ρ, psmooth)
    input: propositional formula F, variable assignment a;
           scaling factor α, smoothing factor ρ, smoothing probability psmooth
    C := {c | c is a clause of F};
    U := {c ∈ C | c is unsatisfied under a};
    for each c ∈ U do
        clw(c) := clw(c) · α;
    end
    with probability psmooth do
        for each c ∈ C do
            clw(c) := clw(c) · ρ + (1 − ρ) · w̄;
        end
    end
end UpdateClauseWeights

Figure 6.8 The SAPS weight update procedure; w̄ is the average over all clause weights.
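A direct Python rendering of this procedure, under the same clause-list representation as in the earlier sketches, looks as follows:

import random

def saps_update_weights(clauses, clw, a, alpha, rho, p_smooth):
    def sat(clause):
        return any(a[abs(lit)] == (lit > 0) for lit in clause)
    # scaling is restricted to the currently unsatisfied clauses
    for i, c in enumerate(clauses):
        if not sat(c):
            clw[i] *= alpha
    # smoothing is performed only with probability p_smooth
    if random.random() < p_smooth:
        w_bar = sum(clw) / len(clw)
        for i in range(len(clw)):
            clw[i] = clw[i] * rho + (1 - rho) * w_bar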
Compared to ESG, SAPS typically requires a similar number of variable flips for finding a model of a given formula, but in terms of time performance it is significantly superior to ESG, DLM-2000-SAT, and the best known WalkSAT variants [Hutter et al., 2002]. However, there are some cases (in particular, hard and large SAT-encoded graph colouring instances) for which SAPS does not reach the performance of Novelty+. A reactive variant of SAPS, RSAPS, automatically adjusts the smoothing probability psmooth during the search, using a mechanism that is very similar to the one underlying Adaptive WalkSAT. RSAPS sometimes achieves significantly better performance than SAPS [Hutter et al., 2002]; however, different from Adaptive WalkSAT, RSAPS still has other parameters, in particular ρ, that need to be tuned manually in order to achieve optimal performance.
6.5 Constraint Satisfaction Problems

An instance of the Constraint Satisfaction Problem (CSP) is defined by a set of variables, a set of possible values (or domain) for each variable, and a set of constraining conditions (constraints) involving one or more of the variables. The Constraint Satisfaction Problem is to decide for a given CSP instance whether all variables can be assigned values from their respective domains such that all constraints are simultaneously satisfied. Depending on whether the variable domains are discrete or continuous, finite or infinite, different types of CSP
instances and respective subclasses of the CSP can be distinguished. Here, we restrict our attention to the finite discrete CSP, a widely studied type of constraint satisfaction problem with many practical applications.

Definition 6.2 Finite Discrete CSP
A CSP instance is a triple P := (V, D, C), where V := {x1, . . . , xn} is a finite set of n variables, D is a function that maps each variable xi to the set Di of possible values it can take (Di is called the domain of xi), and C := {C1, . . . , Cm} is a finite set of constraints. Each constraint Cj is a relation over an ordered set Var(Cj) of variables from V, that is, for Var(Cj) := (y1, . . . , yk), Cj ⊆ D(y1) × · · · × D(yk). The elements of the set Cj are referred to as satisfying tuples of Cj, and k is called the arity of the constraint Cj. A CSP instance P is called n-ary if, and only if, all constraints in P have arity at most n; in particular, binary CSP instances have only constraints of arity at most two. P is a finite discrete CSP instance if, and only if, all variables in P have discrete and finite domains. A variable assignment of P is a mapping a : V → ⋃_{i=1}^{n} Di that assigns to each variable x ∈ V a value from its domain D(x). Let Assign(P) denote the set of all possible variable assignments for P; then a variable assignment a ∈ Assign(P) is a solution of P if, and only if, it simultaneously satisfies all constraints in C, that is, for all Cj ∈ C with, say, Var(Cj) = (y1, . . . , yk), the assignment a maps y1, . . . , yk to values v1, . . . , vk such that (v1, . . . , vk) ∈ Cj. CSP instances for which at least one solution exists are also called consistent, while instances that do not have any solutions are called inconsistent. The finite discrete CSP is the problem of deciding whether a given finite discrete CSP instance P is consistent.
Remark: In many cases, the constraint relations involved in CSP instances can be represented more compactly by using standard mathematical relations, such as =, ≠, <, ≤, >, ≥. In other cases, a more compact representation of a given constraint Cj is obtained by explicitly listing the complement of the set of satisfying tuples, that is, the set of unsatisfying tuples of Cj.
Example 6.2 The Canadian Flag Problem
Let us consider the problem of colouring the Canadian flag by assigning colours red (r ) and white (w) to the four fields L, C, R, M in such a way that
Figure 6.9 A simple CSP instance: the problem of colouring the Canadian flag (see text for details).
any two neighbouring fields are coloured differently (see Figure 6.9). This problem can be formulated as a binary CSP instance as follows:
V := {L, C, R, M } D(L) := D(C ) := D(R) := D(M ) := {r, w} C := {C1 , C2 , C3 } with
Var(C1 ) := (L, C ) Var(C2 ) := (C, M ) Var(C3 ) := (C, R)
and
C1 := C2 := C3 := {(r, w), (w, r)}
There are two solutions to this CSP instance; one assigns red to M, L, R and white to C , while the other assigns white to M, L, R and red to C . By adding a fourth, unary constraint to C that forces M to be coloured red, the instance can be modified such that only the solution corresponding to the correct colouring of the Canadian flag remains. This simple CSP instance is an example of a Map Colouring Problem, which in turn can be seen as a special case of the Graph Colouring Problem (GCP). GCP is an important subclass of the CSP in which the objective is to colour the vertices of a given graph in such a way that two vertices connected by an edge are never assigned the same colour. The GCP is covered in more detail in Chapter 10, Section 10.1.
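This instance is small enough to solve by brute-force enumeration, which makes it a convenient sanity check; the following Python snippet (representation ours) recovers exactly the two solutions described above.

from itertools import product

variables = ['L', 'C', 'R', 'M']
domain = ['r', 'w']
# the three binary inequality constraints: neighbouring fields differ
constraints = [('L', 'C'), ('C', 'M'), ('C', 'R')]

solutions = []
for values in product(domain, repeat=len(variables)):
    a = dict(zip(variables, values))
    if all(a[x] != a[y] for x, y in constraints):
        solutions.append(a)
print(solutions)   # two solutions: M, L, R red / C white, and the inverse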
Like SAT, the finite discrete CSP is an NP-complete combinatorial problem. This can be proven rather easily based on the following close relationship between SAT and finite discrete CSP. Any instance of SAT (for CNF formulae) can be seen as a finite discrete CSP instance where all the domains contain only the truth
values ⊤, ⊥ and each constraint contains exactly all the satisfying assignments of one particular clause of the given CNF formula F. Vice versa, as we will show in the next section, any finite discrete CSP instance can be directly transformed into a SAT instance rather efficiently.
Encoding CSP Instances into SAT

CSP instances can be encoded into SAT in a rather straight-forward way. The basic idea is to use propositional variables to represent the assignment of values to single CSP variables and clauses to express the constraint relations [de Kleer, 1989]. For the sake of simplicity, we assume in the following, without loss of generality, that the domains of all variables are equal to Zk := {0, 1, . . . , k − 1}, where k is an arbitrary positive integer. Furthermore, we use σ(Cj) to denote the arity of a constraint Cj, that is, the number of variables involved in Cj. Given a finite discrete CSP instance P := (V, D, C) with V := {x1, . . . , xn}, D(x) := Zk for all x ∈ V, and C := {C1, . . . , Cm}, a very natural SAT encoding is based on propositional variables ci,v that, if assigned the value ⊤, represent the assignment xi := v, where v ∈ D(xi). P can then be represented by a CNF formula comprising the following sets of clauses:

(1) ¬ci,v1 ∨ ¬ci,v2 (for 1 ≤ i ≤ n; v1, v2 ∈ Zk; v1 < v2)
(2) ci,0 ∨ ci,1 ∨ . . . ∨ ci,k−1 (for 1 ≤ i ≤ n)
(3) ¬ci1,v1 ∨ ¬ci2,v2 ∨ . . . ∨ ¬cis,vs (for every assignment (xi1 := v1; xi2 := v2; . . . ; xis := vs) that violates some constraint Cj ∈ C with σ(Cj) = s)
Intuitively, these clause sets ensure that each constraint variable is assigned exactly one value from its domain (sets 1 and 2) and any solution is compatible with all constraints (set 3). The number of propositional variables required for encoding a given CSP instance is linear in the number of constraint variables and their domain sizes, while the number of clauses is at least linear in the number of constraint variables and depends critically on the domain sizes and the arity of the constraints. This encoding is frequently used in the context of translations of combinatorial problems into SAT that use CSP as an intermediate domain (cf. Section 6.1). It is known as the sparse encoding, because it generates relatively large SAT instances whose models have only a small fraction of the propositional variables set to ⊤. (In the literature, this encoding has also been referred to as the unary transform or direct encoding.)
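The three clause sets of the sparse encoding are straightforward to generate; in the following sketch (representation ours), the propositional variable c_{i,v} is encoded as the integer (i − 1) · k + v + 1, negative integers denote negated literals, and violating_tuples enumerates, for all constraints, the partial assignments ruled out by clause set (3).

def sparse_encode(n, k, violating_tuples):
    def c(i, v):
        return (i - 1) * k + v + 1
    clauses = []
    for i in range(1, n + 1):
        # set (1): each CSP variable takes at most one value
        for v1 in range(k):
            for v2 in range(v1 + 1, k):
                clauses.append([-c(i, v1), -c(i, v2)])
        # set (2): each CSP variable takes at least one value
        clauses.append([c(i, v) for v in range(k)])
    # set (3): rule out every tuple that violates some constraint;
    # each tuple is a list of (i, v) pairs for the variables involved
    for t in violating_tuples:
        clauses.append([-c(i, v) for i, v in t])
    return clauses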
By using an alternative SAT encoding of CSP instances, the number of propositional variables required for encoding a given CSP instance can be significantly reduced compared to the sparse encoding. The compact encoding (in the literature also referred to as the binary transform or log encoding) is based on the idea of representing the value v assigned to any constraint variable xi by a group of ⌈log2 k⌉ propositional variables ci,j, using a binary encoding of v [Iwama and Miyazaki, 1994; Hoos, 1999c; 1999b]. This leads to a CNF formula with n · ⌈log2 k⌉ propositional variables; particularly for large domain sizes k, this can be a substantial reduction compared to the n · k propositional variables required by the sparse encoding. The number of clauses, however, is similar for both encodings, since in either case the same number of clauses is needed for representing the constraint relations (these clauses typically dominate the overall number of clauses). Although the SAT instances generated by the compact encoding have search spaces that are substantially smaller than those obtained from the sparse encoding, and consequently, substantially higher solution densities, they often appear to be much harder to solve using standard SLS algorithms for SAT [Hoos, 1999c; 1999b; Frisch and Peugniez, 2001]. (There is, however, some evidence that for relatively small, structured CSP instances, the SAT instances obtained from the compact encoding can sometimes be solved as efficiently as those obtained from the sparse encoding [Prestwich, 2003].) There are several other methods for encoding CSP instances into SAT. One of these is the multivalued encoding, a variant of the sparse encoding that does not include the binary clauses preventing two values being simultaneously assigned to the same CSP variable (set 1 above); it produces SAT instances that have higher solution densities than those obtained by the sparse encoding and appear to be easier to solve for high-performance SLS algorithms for SAT [Prestwich, 2003]. Even higher solution densities can be achieved by weakened encodings, which can be seen as a generalisation of the multivalued encoding. Although recent empirical results suggest that at least one weakened encoding, the so-called reduced encoding, can lead to excellent SLS performance, further studies are required to clarify the benefits of weakened encodings [Prestwich, 2003]. Finally, the support encoding is similar to the sparse encoding, but rather than ruling out the unsatisfying tuples of the given constraints (set 3 above), it directly captures the satisfying tuples in the form of so-called support clauses [Kasif, 1990; Gent, 2002].
CSP Simplification and Local Consistency Techniques

Similar to the case of SAT, native CSP instances can often be substantially reduced in size and complexity by applying polynomial-time simplification methods. Also known as local consistency techniques, these methods are transformations that are applied to (local) subsets of a given CSP instance P [Mackworth, 1977;
Debruyne and Bessière, 2001]. Local consistency techniques can reduce the effective domains of CSP variables by eliminating values that cannot occur in any solution. One of the most prominent simplification techniques for the CSP is the enforcement of arc consistency. A given CSP instance P is made arc consistent w.r.t. one of its constraints, C, by removing any value v from the domain of any variable x involved in C if, and only if, there exists no assignment that satisfies C, that is, no tuple t ∈ C, in which x has value v. A CSP instance P is arc consistent if, and only if, it is arc consistent w.r.t. all of its constraints. For binary CSP instances with e constraints and maximal domain size k, the best known algorithms for enforcing arc consistency have a time complexity of O(ek²) and a space complexity of O(ek) [Bessière et al., 1999]. (It may be noted that enforcing arc consistency on a given CSP instance is equivalent to applying unit propagation to its support encoding [Gent, 2002].) A number of further local consistency techniques have been described by Debruyne and Bessière [2001]. Combined with backtracking mechanisms, simplification methods, such as forward checking or enforcing arc consistency, play a crucial role in systematic search algorithms for the CSP [Haralick and Elliot, 1980; Grant and Smith, 1996]. They can also be used as preprocessing techniques before applying SLS-based, incomplete CSP solvers. The high computational cost of enforcing higher levels of local consistency, such as path consistency, is often not amortised by the reduced run-times of CSP solvers that are subsequently applied to the resulting CSP instances. One method for improving this situation is to apply the corresponding local consistency methods to heuristically selected parts of a given CSP instance only [Kask and Dechter, 1995].
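The core operation when enforcing arc consistency on a binary CSP is the revision of one variable's domain against one constraint. The following sketch (an AC-3-style revise step, not one of the optimal O(ek²) algorithms cited above) removes unsupported values; repeating it over all constraint arcs until no domain changes yields an arc consistent instance.

def revise(domains, allowed_pairs, x, y):
    # domains: dict mapping variables to sets of values;
    # allowed_pairs: the satisfying tuples of a binary constraint over (x, y)
    removed = False
    for vx in list(domains[x]):
        if not any((vx, vy) in allowed_pairs for vy in domains[y]):
            domains[x].discard(vx)   # vx has no support in the domain of y
            removed = True
    return removed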
Prominent Benchmark Instances for the CSP

There are numerous types of CSP instances that have been commonly used in the literature on the CSP. Many studies have focused on a particular class of randomly generated CSP instances with binary constraint relations, which we call Uniform Random Binary CSP. Besides the number of CSP variables and the size of the variable domains, this problem distribution is characterised by two parameters, the constraint graph density α and the constraint tightness β; α specifies the probability that a constraint relation exists between an arbitrary pair of CSP variables, and β is the expected fraction of value pairs that satisfy a given constraint relation. For this class of CSP instances, a solubility phase transition phenomenon with an associated peak in hardness, similar to the one for Uniform Random 3-SAT, has been observed [Smith, 1994], and test-sets of hard instances can be obtained for specific combinations of α and β values.
Another widely used class of CSP instances stems from the Graph Colouring Problem (see Example 6.2), which can be seen as a special case of the finite discrete CSP in which all variables have the same domain, and all constraint relations are binary, allowing a pair of values (x, y) if, and only if, x ≠ y (inequality constraint). The Graph Colouring Problem and specific instance classes are discussed in more detail in Chapter 10, Section 10.1. Graph colouring instances with three colours are amongst the most commonly used benchmark instances for the CSP. A prominent special case of the Graph Colouring Problem is the Quasigroup Completion Problem (QCP), which is derived from the following Quasigroup Problem or Latin Square Problem: given an n × n quadratic grid and n colours, the objective is to assign a colour to each grid cell in such a way that every row and column contains all n colours. In the QCP, the objective is to decide whether a partial solution of the Quasigroup Problem, that is, an incomplete assignment of colours to the given grid such that no two cells in the same row or column have the same colour, can be extended into a complete solution by assigning colours to the remaining cells. In the CSP formulation, the pre-assigned cells can be easily represented by unary constraints. The QCP is known to be NP-complete [Colbourn, 1984], and a phase-transition phenomenon with an associated peak in hardness has been observed [Gomes and Selman, 1997b]. The QCP has a variety of important applications in areas such as dynamic wavelength routing in fibre optic networks and the design of statistical experiments [Kumar et al., 1999; Laywine and Mullen, 1998]. The n-Queens Problem is another prominent CSP; it can be seen as a generalisation of the problem of placing eight queens on a chessboard such that no queen is threatened by any of the other queens. This is achieved by distributing the queens in such a way that no row, column, or diagonal has more than a single queen on it. The 8-Queens Problem can be represented as a CSP instance with 8 variables and 28 binary constraints. In the n-Queens Problem, the objective is to place n queens on an n × n board subject to analogous constraints. Most of the work on the CSP has focused on binary CSP. One of the reasons for this is that any non-binary CSP instance can be transformed into a binary CSP instance in a rather straight-forward way [Dechter and Pearl, 1989; Rossi et al., 1990; Bacchus et al., 2002]. Another reason lies in the fact that algorithms restricted to binary CSP instances are typically easier to implement than general CSP solvers. There are numerous other classes of CSP instances, including CSP encodings of the real-world problems mentioned in Section 6.1. Some application-relevant problems include frequency assignment in radio networks, scheduling problems and vehicle routing. A description of many of the different types of constraint satisfaction problems can be found at CSPLIB, a benchmark library for constraints [Gent et al., 2003].
6.6 SLS Algorithms for CSPs
Because of the close relationship between CSP and SAT, the respective SLS algorithms for solving these problems are quite similar; historically, there has been significant cross-fertilisation between the two domains in terms of SLS algorithm design and development. We distinguish three types of SLS techniques for solving CSPs: SLS algorithms for SAT applied to SAT-encoded CSP instances; generalisations of SLS algorithms for SAT; and native SLS algorithms for CSPs. In the following, we will discuss each of these approaches in more detail and present some of the most prominent and best performing SLS algorithms for CSPs.
The ‘Encode and Solve as SAT’ Approach
In principle, any CSP instance P can be solved by encoding it into SAT and subsequently applying standard SAT solvers to determine the satisfiability of the resulting CNF formula F. If P is soluble, its solutions can be determined from the models of F. By using any of the SAT encodings of CSPs discussed in Section 6.5, encoding CSP instances as well as decoding their solutions are efficient processes, and the resulting SAT instances are typically not prohibitively large compared to the original CSP instances. The main advantage of this approach lies in the fact that it allows the use of highly optimised and efficiently implemented ‘off-the-shelf’ SAT solvers. Besides the SLS algorithms described earlier in this chapter, this includes high-performance systematic SAT solvers and other state-of-the-art SAT algorithms (see Section 6.7 for references). Furthermore, standard polynomial preprocessing techniques for SAT can be used to simplify SAT-encoded CSP instances prior to applying a SAT solver. CSP preprocessing techniques, such as efficiently computable forms of k-consistency, can be applied before encoding a CSP instance into SAT. However, one potentially major disadvantage of the ‘encode and solve as SAT’ approach may arise from the inability of standard SAT algorithms to exploit the structure present in given CSP instances. There is some indication that by using the sparse encoding and high-performing SAT algorithms, such as Novelty or Novelty+, competitive performance compared to state-of-the-art SLS algorithms for the CSP, such as Galinier and Hao’s Tabu Search algorithm (which will be described later in this section), can be obtained for Uniform Random Binary CSP instances [Hoos, 1998; 1999b]. Similar results appear to hold for graph colouring instances, but there is some evidence that native CSP algorithms might achieve superior performance
for random instances with large variable domains [Frisch and Peugniez, 2001]. Interestingly, when using the compact encoding, SLS algorithms for SAT show substantially weaker performance; this performance difference appears to be caused by aspects of the search space structure induced by the respective encodings; in particular, it has been shown that applied to the same CSP instances, the compact encoding generates search spaces with substantially higher local minima branching factors (see Chapter 5) than the sparse encoding [Hoos, 1998; 1999b]. Clearly, the ‘encode and solve as SAT’ approach is not limited to the CSP, but can in principle be applied to any N P-complete problem. For the CSP, this approach is particularly attractive, because as a result of the close relationship between SAT and CSP, the encoding of CSP instances into SAT is conceptually simple and very efficiently computable in practice. Whether or not it can achieve competitive performance compared to the best native CSP algorithms, particularly when applied to structured CSP instances, is somewhat unclear at the present time and needs to be further investigated.
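The following sketch illustrates the sparse encoding in its usual form (one propositional variable per variable/value pair, with at-least-one, at-most-one and conflict clauses); it is a plausible rendering under these assumptions rather than a transcription of the encoding from Section 6.5.

```python
# Sparse encoding of a binary CSP into CNF; clauses are lists of signed
# integers in DIMACS style, and constraints maps variable pairs (i, j)
# to their sets of allowed value pairs (as in the earlier generator sketch).
def sparse_encode(n, d, constraints):
    x = lambda i, v: i * d + v + 1   # Boolean variable: CSP variable i takes value v
    clauses = []
    for i in range(n):
        clauses.append([x(i, v) for v in range(d)])      # at least one value
        for v in range(d):
            for u in range(v + 1, d):
                clauses.append([-x(i, v), -x(i, u)])     # at most one value
    for (i, j), allowed in constraints.items():
        for v in range(d):
            for u in range(d):
                if (v, u) not in allowed:                # forbidden value pair
                    clauses.append([-x(i, v), -x(j, u)])
    return clauses
```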
Pseudo-Boolean CSP and WSAT(PB)
An alternative to the ‘encode and solve as SAT’ approach discussed in the previous section is to extend high performing SLS algorithms for SAT to more general subclasses of the CSP. One such generalisation of SAT is obtained by maintaining the restriction to Boolean variables, while allowing constraints that are more expressive than CNF clauses. In the Pseudo-Boolean CSP, also known as (Linear) Pseudo-Boolean Programming, all variables have Boolean values represented by integers zero and one, and the constraints between the variables x1, . . . , xn are of the form

  a1j · x1 + a2j · x2 + . . . + anj · xn ≥ bj,
where the aij as well as bj are rational numbers. Note that analogous constraints that use any of the relations ‘≤’, ‘<’, ‘>’, ‘=’ and ‘≠’ instead of ‘≥’ can be represented using ‘≥’ constraints only. Pseudo-Boolean constraints are more expressive than CNF clauses because any clause can be expressed by a single pseudo-Boolean constraint, but there are pseudo-Boolean constraints that cannot be captured by a single CNF clause. From an operations research point of view, Pseudo-Boolean CSP can be seen as a special case of 0-1 Integer Linear Programming [Nemhauser and Wolsey, 1988].
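A small sketch of this constraint format and of the normalisation to ‘≥’ form; the representation choices are ours.

```python
# A pseudo-Boolean constraint sum_i a[i]*x[i] >= b over 0/1 variables.
def pb_satisfied(a, b, x):
    return sum(ai * xi for ai, xi in zip(a, x)) >= b

def leq_to_geq(a, b):
    """sum a_i*x_i <= b is equivalent to sum (-a_i)*x_i >= -b."""
    return [-ai for ai in a], -b

# The clause x1 v x2 v -x3 as the constraint y1 + y2 - y3 >= 0
# (cf. Example 6.3 below):
print(pb_satisfied([1, 1, -1], 0, [0, 0, 0]))  # True  (literal -x3 is true)
print(pb_satisfied([1, 1, -1], 0, [0, 0, 1]))  # False (all three literals false)
```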
Example 6.3 Pseudo-Boolean Constraints
As an example of a Pseudo-Boolean constraint between three variables y1, y2, y3 with domain {0, 1}, consider the inequality y1 + y2 − y3 ≥ 0. This constraint is equivalent to y1 + y2 + (1 − y3) ≥ 1, and hence to the CNF clause x1 ∨ x2 ∨ ¬x3, where the propositional variable xi is assigned the value ⊤ if, and only if, yi = 1. The following constraint limits the number of variables that are assigned the value one to a maximum of k:

  (−y1) + . . . + (−yn) ≥ −k
Note that in order to express this constraint by a CNF formula, one clause of size k + 1 is required for each of the (n over k + 1) possible subsets of k + 1 variables; these clauses encode the condition that in every such subset, at least one variable needs to be assigned the value ⊥.

There are several SLS algorithms for Pseudo-Boolean CSP [Abramson et al., 1996; Connolly, 1992; Walser, 1997; Løkketangen, 2002]. Among these, Walser’s WSAT(PB) algorithm is of particular interest, since it is based on a direct generalisation of the WalkSAT architecture to Pseudo-Boolean CSP. The WSAT(PB) algorithm follows the WalkSAT framework as presented in Section 6.3, but uses a generalised evaluation function and variable selection mechanism. The evaluation function is based on the notion of the net integer distance of a constraint from being satisfied. More precisely, for each constraint C, let d(a, C) denote the integer difference between the right-hand side of the inequality C and the value of the left-hand side under assignment a if C is unsatisfied, or zero otherwise; the evaluation function value of assignment a is then defined as the sum of the d(a, C) values over all constraints unsatisfied in a. As in the SAT case, an assignment that satisfies all constraints has an evaluation function value of zero. Based on this evaluation function, WSAT(PB) uses a modified version of the WalkSAT variable selection strategy to determine the variable to be flipped in each search step. First, a constraint C is uniformly selected at random from the set of all currently unsatisfied constraints. Then, a variable involved in C is selected according to the following criteria: If flipping any of the variables involved in C leads to a decrease in the evaluation function, the variable that leads to the largest such decrease is selected; if there are several such variables, the one that was flipped least recently is chosen. Otherwise, with a small probability wp, the variable that has been flipped least recently is selected; in the remaining cases, the variable whose flip would cause a minimal increase in the evaluation function is chosen; again, ties are broken in favour of the least recently flipped variable. (At the beginning of the search, ties may arise between variables that have not been flipped yet; such ties are broken uniformly at random.) Additionally, WSAT(PB)
uses a simple tabu mechanism, which excludes all variables that have been flipped within the previous tt search steps from the selection process. Different from conventional WalkSAT, WSAT(PB) supports a biased random initialisation of the search process, in which each variable is independently set to zero with probability pz and to one otherwise; however, experimental results suggest that using a biased initialisation (i.e., pz ≠ 0.5) generally does not lead to performance improvements [Walser, 1997]. Furthermore, the WSAT(PB) algorithm, as presented by Walser [1997], can also be used to solve an optimisation variant of Pseudo-Boolean CSP in which a subset of the constraints is considered to be ‘soft’ constraints, and the objective is to find variable assignments that satisfy all conventional (‘hard’) constraints, while minimising the number of unsatisfied soft constraints. This problem can be seen as a special case of MAX-CSP, the optimisation variant of CSP, which is discussed in more detail in Chapter 7, where we also describe the mechanism used by WSAT(PB) to handle soft constraints (see Chapter 7, page 348). Many practically relevant problems can be formulated quite easily and naturally as Pseudo-Boolean CSP instances. Applied to encodings of the Progressive Party Problem [Smith et al., 1996] and radar surveillance problems (the latter include soft constraints), WSAT(PB) has been shown to achieve significantly improved performance over state-of-the-art commercial integer programming and constraint programming packages as well as compared to results from the literature [Walser, 1997].
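The evaluation function described above can be sketched as follows; the constraint representation matches the earlier pseudo-Boolean sketch and is an assumption of ours, not Walser’s actual data structures.

```python
# Net integer distance of a constraint sum_i a[i]*x[i] >= b from satisfaction,
# and the resulting WSAT(PB)-style evaluation function (zero iff all satisfied).
def net_distance(a, b, x):
    lhs = sum(ai * xi for ai, xi in zip(a, x))
    return b - lhs if lhs < b else 0    # d(a, C) as defined in the text

def evaluation(constraints, x):
    """constraints: list of (coefficients, bound) pairs."""
    return sum(net_distance(a, b, x) for a, b in constraints)
```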
WalkSAT Algorithms for Many-Valued SAT
Another interesting subclass of CSPs is the class of non-Boolean or many-valued satisfiability problems [Béjar and Manyà, 1999; Frisch and Peugniez, 2001]. In non-Boolean SAT (NB-SAT), each variable can take values from some finite domain D, which may contain more than two values [Frisch and Peugniez, 2001]. Formally, a non-Boolean literal is of the form z/v or ¬z/v, where z is a variable and v a value from the domain of z. The value of z/v under the (non-Boolean) variable assignment a is true if, and only if, z is set to v in a, and false otherwise; the value of ¬z/v under a is obtained by negating the value of z/v under a. Analogously to conventional SAT, non-Boolean SAT is the problem of deciding for a given non-Boolean CNF formula, that is, for a conjunction over disjunctions of non-Boolean literals, whether or not it has a satisfying (non-Boolean) assignment. Obviously, any conventional CNF formula can be represented by a non-Boolean CNF formula with the same number of clauses and variables. When encoding NB-SAT instances into SAT, however, a significantly higher number of variables and CNF clauses than used in the non-Boolean formula may be required.
Because NB-SAT instances have the same clause structure as conventional SAT instances, SAT algorithms such as WalkSAT can be generalised to non-Boolean SAT in a rather straightforward way; the only major difference lies in the fact that in NB-SAT, the concept of a variable flip needs to be redefined. In NB-WalkSAT, the non-Boolean variant of WalkSAT by Frisch and Peugniez [2001], a variable flip corresponds to assigning a different value to a non-Boolean variable such that the literal selected in the corresponding search step, and hence the clause in which it appears, becomes satisfied. (It may be noted that this constitutes an important difference to WSAT(PB), in which search steps do not always guarantee the satisfaction of any previously unsatisfied constraints.) Otherwise, NB-WalkSAT is identical to WalkSAT/SKC, although other WalkSAT variants can easily be extended to NB-SAT in an analogous way. Béjar and Manyà have introduced a similar extension of WalkSAT, called MV-WalkSAT, which solves a variant of many-valued SAT that is slightly richer than the non-Boolean CNF formulae underlying NB-SAT [Béjar and Manyà, 1999]. Both NB-WalkSAT and MV-WalkSAT were applied to many-valued SAT encodings of various combinatorial decision problems, such as graph colouring, where they showed excellent performance. However, to date, the question of whether these and other SLS algorithms for many-valued SAT can substantially outperform state-of-the-art SLS algorithms for SAT applied to suitably encoded instances has not been answered conclusively.
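To make the semantics of non-Boolean literals concrete, here is a minimal evaluation sketch; the (variable, value, sign) encoding of literals is our own convention and not taken from the papers cited above.

```python
# A non-Boolean literal z/v is true iff variable z is set to value v;
# (z, v, False) encodes its negation. A clause is a disjunction of literals.
def nb_literal_true(lit, a):
    z, v, positive = lit
    return (a[z] == v) == positive

def nb_clause_satisfied(clause, a):
    return any(nb_literal_true(lit, a) for lit in clause)

a = {'z1': 1, 'z2': 0}
print(nb_clause_satisfied([('z1', 2, True), ('z2', 0, False)], a))  # False
print(nb_clause_satisfied([('z1', 1, True), ('z2', 0, False)], a))  # True
```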
The Min Conflicts Heuristic and Variants
There are a number of SLS algorithms for the general finite discrete CSP, although in many cases, the implementations are restricted to binary constraints. Among the most widely known of these are the Min Conflicts Heuristic (MCH) [Minton et al., 1990; 1992] and its variants. MCH iteratively modifies the assignment of a single variable in order to minimise the number of violated constraints, which is achieved in the following way: Given a CSP instance P, the search process is initialised by assigning each variable in P a value that is chosen uniformly at random from its domain. Then, in each local search step, first a CSP variable xi is selected uniformly at random from the conflict set K(a), which is the set of all variables that appear in a constraint that is unsatisfied under the current assignment a. A new value v is then chosen from the domain of xi, such that by assigning v to xi, the number of unsatisfied constraints (conflicts) is minimised. If there is more than one value with that property, one of the minimising values is chosen uniformly at random. In many ways, MCH is analogous to the SLS algorithms for SAT described earlier in this chapter. Like all SAT algorithms covered here, MCH is based on a
1-exchange neighbourhood. Considering that CNF clauses in SAT play the same role as constraint relations in CSP, the evaluation function underlying MCH, defined as the number of constraints violated under a given assignment, is essentially the same as the one used by GSAT or Novelty. The way in which MCH selects the variable and the value for this variable in each local search step is similar to the two-stage variable selection process underlying the WalkSAT architecture. Like most iterative improvement methods, MCH is essentially incomplete, since it can get stuck in local minima of the evaluation function. The simplest way to overcome this problem is to use a static restart mechanism analogous to the one found in GSAT. Not surprisingly, however, there are other, substantially more effective solutions. These are mainly derived from mechanisms used in the better performing GSAT and WalkSAT algorithms, which is somewhat surprising, considering that MCH itself predates all of the SLS algorithms for SAT discussed above, including basic GSAT and WalkSAT/SKC. WMCH is a variant of MCH that uses a random walk mechanism analogous to GWSAT [Wallace and Freuder, 1995]. In each WMCH step, first a variable xi is chosen uniformly at random from the conflict set (as in MCH). Then, with probability wp > 0, xi is assigned a value from its domain Di that has been chosen uniformly at random; this kind of search step is called a random walk step. In the remaining cases, that is, with probability 1 − wp, a conflict-minimising value is chosen and assigned, as in a conventional MCH step. As in the case of GWSAT, this random walk mechanism renders WMCH probabilistically approximately complete for wp > 0. Furthermore, WMCH has been empirically observed to perform substantially better than MCH with random restart [Stützle, 1998c]. Note that different from the random walk steps used in SLS algorithms for SAT, such as GWSAT, random walk steps in WMCH do not necessarily have the effect of satisfying a previously unsatisfied constraint. WMCH can be varied slightly such that in each random walk step, after choosing a variable xi involved in a currently violated constraint C, xi is assigned a value v such that C becomes satisfied; if no such v exists, a value is chosen at random. This variant, however, performs only marginally better than the random walk mechanism used in WMCH [Stützle, 1998c]. Analogous to GSAT and WalkSAT, MCH can be extended with a tabu search mechanism [Stützle, 1998c; Steinmann et al., 1997]. In TMCH, after each search step, that is, after the value of variable xi is changed from v to v′, the variable/value pair (xi, v) is declared tabu for the next tt steps. While (xi, v) is tabu, value v is excluded from the selection of values for xi, except if assigning v to xi leads to an improvement over the incumbent assignment (aspiration criterion). TMCH appears to generally perform better than WMCH. Interestingly, a tabu tenure setting of tt := 2 was found to consistently result in good performance for CSP instances of different types and sizes [Stützle, 1998c].
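A compact sketch of one WMCH search step as described above (with wp = 0 it reduces to a plain MCH step); the instance representation follows the earlier Uniform Random Binary CSP sketch, with the assignment a given as a dict.

```python
import random

def violated(constraints, a):
    return [(i, j) for (i, j), allowed in constraints.items()
            if (a[i], a[j]) not in allowed]

def wmch_step(constraints, domains, a, wp, rng=random):
    conflicts = violated(constraints, a)
    if not conflicts:
        return a                                  # a solution has been found
    K = sorted({x for pair in conflicts for x in pair})
    xi = rng.choice(K)                            # variable from the conflict set
    if rng.random() < wp:                         # random walk step (WMCH only)
        a[xi] = rng.choice(domains[xi])
    else:                                         # conflict-minimising MCH step
        scores = {v: len(violated(constraints, {**a, xi: v}))
                  for v in domains[xi]}
        best = min(scores.values())
        a[xi] = rng.choice([v for v, s in scores.items() if s == best])
    return a
```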
A Tabu Search Algorithm for CSPs
The tabu search algorithm by Galinier and Hao [1997], TS-GH, is currently one of the best performing SLS algorithms for the CSP. TS-GH is based on the same neighbourhood and evaluation function as MCH, but uses a different mechanism for selecting the variable/value pair involved in each search step: Amongst all pairs (x, v) such that variable x appears in a currently violated constraint and v is any value from the domain of x, TS-GH chooses the one that leads to a maximal decrease in the number of violated constraints. If multiple such pairs exist, one of them is selected uniformly at random. As in MCH, the actual search step is then performed by assigning v to x. This best-improvement strategy is augmented with the same tabu mechanism used in TMCH: After changing the assignment of x from v to v′, the variable/value pair (x, v) is declared tabu for tt search steps. As in TMCH, an aspiration criterion is used to override the tabu status of variable/value pairs corresponding to search steps that lead to improvements over the incumbent assignment. In order to achieve competitive performance of TS-GH, it is crucial to avoid computing evaluation function values for every variable/value pair that may potentially be involved in a search step. Instead, in order to implement TS-GH efficiently, a caching and incremental updating technique analogous to the one underlying efficient implementations of GSAT (see in-depth section on page 271) is used [Galinier and Hao, 1997]: After initialising the search, the effects on the evaluation function of changing the assignment of any variable x to any value d from its domain are computed and stored in a two-dimensional table of size n × k, where n is the number of variables, and k is the size of the largest domain in the given CSP instance. Based on the entries in this table, the (non-tabu) variable/value pair that results in the maximal improvement in the evaluation function value can be selected in time O(n · k) in the worst case. After each search step, only the entries in the table that are affected by the corresponding change in the current assignment need to be updated. For CSP instances with binary constraint relations, initialising the table takes time O(n^2 · k) in the worst case; the update after a search step can be performed in time O(n · k) in the worst case, but is substantially faster for CSP instances with sparse constraint graphs. Using this technique and an efficient implementation of the tabu mechanism, as described for GSAT/Tabu, the search steps of TS-GH are as efficient as those of MCH. It may be noted that TS-GH was originally introduced as an algorithm for MAX-CSP, the optimisation variant of CSP, in which the objective is to find a variable assignment that satisfies a maximal number of constraints. (SLS algorithms for MAX-CSP will be further discussed in Chapter 7, Section 7.3.) Empirical studies suggest that when applied to the conventional CSP, TS-GH generally achieves better performance than any of the MCH variants,
including TMCH, rendering it one of the best known SLS algorithms for the CSP [Stützle, 1998c]. Unlike in the case of TMCH, for TS-GH, the optimal setting of the tabu tenure parameter, tt, increases with instance size, which makes it harder to solve previously unknown CSP instances with peak efficiency [Stützle, 1998c].
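A hedged sketch of the score table just described (see also Exercise 6.9): delta[x][v] caches the change in the number of violated constraints incurred by setting x to v. After a search step, only the rows of variables that share a constraint with the flipped variable need to be recomputed; this initialisation is illustrative, not Galinier and Hao’s actual data structure.

```python
# Initialise the n-by-k table of evaluation function deltas for a binary CSP;
# instance representation as in the earlier sketches.
def init_delta(constraints, domains, a):
    delta = {x: {} for x in domains}
    for x, dom in domains.items():
        for v in dom:
            change = 0
            for (i, j), allowed in constraints.items():
                if x not in (i, j):
                    continue                        # constraint unaffected by x
                old = (a[i], a[j]) not in allowed
                vi = v if i == x else a[i]
                vj = v if j == x else a[j]
                new = (vi, vj) not in allowed
                change += int(new) - int(old)
            delta[x][v] = change                    # evaluation delta for x := v
    return delta
```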
6.7 Further Readings and Related Work
SAT and CSP have been extensively studied for several decades, and there is an extensive body of literature on these problems and on algorithms for solving them. The SLS algorithms presented in this chapter have been selected primarily based on their performance and historical significance; however, there are many other SLS algorithms for SAT and CSP that are interesting and fairly widely known. One of the earliest applications of SLS techniques to SAT is found in Gu’s SAT1 algorithm family [Gu, 1992]. Developed independently and published around the same time as the basic GSAT algorithm, the first SAT1 algorithms are based on a simple iterative improvement method augmented with various techniques for overcoming search stagnation. Subsequently, these early SAT1 algorithms have given rise to numerous variants and extensions, including parallel SLS algorithms for SAT, complete SAT algorithms obtained from combining SLS techniques and backtracking algorithms, and special cases of Iterated Local Search. Many of these algorithms have been applied successfully to SAT-encodings of real-world VLSI circuit testing and synthesis problems as well as scheduling problems. A good overview of this line of work can be found in Gu et al. [1997]. Both basic GSAT and the earliest SAT1 algorithms are predated by the Steepest Ascent Mildest Descent (SAMD) algorithm for MAX-SAT [Hansen and Jaumard, 1990], which will be covered in some more detail in Chapter 7 (page 329). Since the early 1990s, a large number of SLS algorithms for SAT have been introduced and studied in the literature. These include methods based on Simulated Annealing [Spears, 1993; Beringer et al., 1994; Selman et al., 1994], Evolutionary Algorithms [Gottlieb et al., 2002] and Greedy Randomised Adaptive Search Procedures (GRASP) [Resende and Feo, 1996]. While several of these algorithms have been directly compared to some of the SAT algorithms presented in this chapter, there is no evidence that any of them might generally perform better than the best WalkSAT or dynamic local search algorithms. SLS algorithms also play an important role in the theoretical complexity analysis of SAT. Using a variant of Papadimitriou’s Random Walk algorithm [Papadimitriou, 1991] that restarts the search from a randomly chosen assignment after 3n variable flips, Schöning [1999; 2002] proved that any k-CNF formula
with n variables can be solved in time poly(n) · (2(k − 1)/k)^n, where poly(n) is a polynomial function over n (a sketch of this algorithm is given at the end of this section). By using the same algorithm with a modified search initialisation, which exploits sets of mutually independent clauses, the currently best known upper bound on the time complexity of SAT for 3-CNF formulae of poly(n) · 1.3303^n was obtained [Schuler et al., 2001]. For k-CNF with k > 3, the currently best upper bounds on the time complexity of SAT were obtained by Paturi et al. [1997; 1998], based on an algorithm that first calculates the closure of the given formula F under bounded-length resolution, and then performs a simple stochastic iterated construction search in order to find models of the resulting CNF formula. This algorithm forms the basis of another recent SAT solver, UnitWalk [Hirsch and Kojevnikov, 2001], which has been empirically shown to reach the performance of state-of-the-art SLS algorithms for SAT for various classes of benchmark instances and is provably probabilistically approximately complete. The survey paper by Gu et al. [1997] provides an excellent overview of the SAT problem, including an interesting classification of SAT algorithms, complexity results, various types of benchmark instances, and a large number of practical applications. It also presents a number of SLS algorithms for SAT, which, however, is somewhat incomplete and now rather outdated, as well as a comprehensive list of references. A more recent study by Hoos and Stützle [2000a] presents a fairly complete and up-to-date overview of GSAT and WalkSAT algorithms, including detailed results on the run-time behaviour and performance of these algorithms. The GSAT architecture can be generalised to the CSP in a rather straightforward way; a GSAT variant that includes various additional SLS mechanisms, such as random walk, clause weighting, and a dynamic restart strategy, was described by Kask and Dechter [1995], who used it in an empirical study on the efficacy of preprocessing techniques for SAT and CSP. An interesting extension that combines GSAT with a tree search mechanism based on cycle-cutsets, called GSAT+CC, has been applied to Uniform Random Binary CSP [Kask and Dechter, 1996]. Preliminary empirical results suggest that for a limited class of CSP instances (those with small cutsets), using the additional tree search mechanism results in substantially improved performance, while on other subclasses of the CSP, GSAT+CC does not reach the performance of the previously mentioned GSAT variant. The Breakout Method [Morris, 1993] is an early and relatively widely known dynamic local search method for the CSP. The original Breakout Algorithm used a deterministic first improvement algorithm as its underlying local search procedure. It served as the inspiration for several other DLS algorithms for the CSP, including the previously mentioned GSAT+CC as well as GENET [Davenport et al., 1994], one of the first extensions of MCH and a precursor of the Guided Local Search algorithm by Voudouris and Tsang [Voudouris, 1997;
Voudouris and Tsang, 1999] (see also Chapter 2, Section 2.2). A recent study has produced empirical evidence suggesting that a version of the Breakout Method based on the same type of neighbourhood relation as Galinier and Hao’s Tabu Search algorithm performs significantly better than random walk extensions of MCH [Williams Jr. and Dozier, 2001]. This indicates that dynamic local search is a promising approach for future CSP algorithms. Binary CSP instances are commonly used for the evaluation of Evolutionary Algorithms, where they serve as a benchmark for investigating algorithm behaviour for constrained problems [Eiben, 2001; Marchiori and Steenbeek, 2000b; Craenen et al., 2000; Dozier et al., 1998]. From the published results, however, it is unclear how these algorithms compare to state-of-the-art SLS algorithms for the CSP in terms of performance; given the experience for SAT, it is doubtful that the proposed EAs can reach state-of-the-art performance. Recently, Solnon developed an ant colony optimisation algorithm for the CSP, using a local search procedure based on MCH [Solnon, 2002a; 2002b]. This algorithm was successfully applied to Uniform Random Binary CSP and graph colouring instances; for hard Uniform Random Binary CSP instances from the solubility phase transition region, the ACO algorithm was found to perform better than WMCH. Furthermore, ACO algorithms have been successfully applied to subclasses of the CSP, such as the car sequencing problem [Solnon, 2000]. Encodings of CSP instances into SAT and vice versa have been the subject of a number of studies. Recent work has focused on the impact of different encodings on the performance of SLS algorithms [Hoos, 1999b; Frisch and Peugniez, 2001; Gent, 2002; Prestwich, 2003], as well as on the effects of polynomial preprocessing techniques on the resulting SAT and CSP instances [Walsh, 2000; Gent, 2002]. As general references for the CSP, the interested reader is referred to the book by Tsang [1993] (in parts now somewhat outdated) as well as to the more recent book by Dechter [2003].
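A minimal sketch of the restarting random walk algorithm underlying Schöning’s bound, as described at the beginning of this section; the signed-integer clause representation is our convention.

```python
import random

def schoening(clauses, n, max_tries, rng=random):
    """Random assignment, then up to 3n conflict-directed random walk steps
    per try; returns a model as a[1..n], or None if none was found."""
    for _ in range(max_tries):
        a = [rng.random() < 0.5 for _ in range(n + 1)]   # a[0] is unused
        for _ in range(3 * n):
            unsat = [c for c in clauses
                     if not any(a[abs(l)] == (l > 0) for l in c)]
            if not unsat:
                return a
            lit = rng.choice(rng.choice(unsat))  # literal from a random unsatisfied clause
            a[abs(lit)] = not a[abs(lit)]        # flip its variable
    return None
```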
6.8 Summary
In this chapter, we presented and discussed SLS algorithms for two important and prominent combinatorial decision problems, the Propositional Satisfiability Problem (SAT) and the Constraint Satisfaction Problem (CSP). Both problems are of substantial theoretical interest and have a range of real-world applications. SAT is one of the most prominent and widely studied N P-complete decision problems. Most SAT algorithms operate on propositional formulae in
conjunctive normal form (CNF); because any formula can be transformed into CNF, this is not a serious limitation. Moreover, instances of other combinatorial problems can often be easily encoded into SAT, using reasonably compact and natural CNF representations. While SAT can be formulated as a special case of CSP as well as of 0-1 Integer Linear Programming, the conventional logical formulation appears to provide a much better basis for solving SAT instances efficiently. Polynomial time simplification techniques, such as unit propagation, play a crucial role for preprocessing SAT instances before applying a general SAT solver, as well as within systematic search algorithms for SAT; on their own, they can be used for solving several interesting subclasses of SAT efficiently. We discussed various types of SAT instances, including Uniform Random k-SAT, one of the most prominent classes of randomly generated SAT instances, and the solubility phase transition phenomenon observed for this subclass of SAT; SAT-encodings of other combinatorial problems; and instances from several practical applications of SAT, such as circuit verification and design. We briefly mentioned a number of generalisations of SAT, including Many-Valued SAT, MAX-SAT, and the Satisfiability Problem for Quantified Boolean Formulae (QSAT), as well as problems related to SAT, such as the Propositional Validity Problem (VAL). We presented three classes of SLS algorithms for SAT: the GSAT architecture, the WalkSAT architecture and dynamic local search algorithms for SAT. While GSAT and related algorithms played a pivotal role in the early development of SLS algorithms for SAT, recent WalkSAT and dynamic local search algorithms, such as Novelty+ and SAPS, are amongst the best SAT solvers currently known. The Constraint Satisfaction Problem (CSP) can be seen as a generalisation of SAT in which the variable domains can be different from the set {⊤, ⊥} and the constraining conditions that have to be simultaneously satisfied by any solution can be arbitrary relations between a subset of CSP variables. Our discussion was focused on the finite discrete CSP, an N P-complete subproblem in which all variable domains are finite and discrete sets. We gave a brief overview of various widely used classes of benchmark instances for the CSP, including Uniform Random Binary CSP, as well as instances of the Graph Colouring and Quasigroup Completion problem. We discussed three SLS approaches for solving the CSP: (1) Encoding CSP instances into SAT and solving them using SLS algorithms for SAT (or any other type of SAT solver); (2) using direct generalisations of SAT algorithms for solving CSP instances; and (3) applying native SLS algorithms for the CSP. It is presently not clear whether any of these approaches consistently achieves substantially better performance than the others.
For the first approach, different SAT encodings of the CSP can be used. Between the two encodings discussed here, the sparse encoding and the compact encoding, the former produces SAT instances that appear to be consistently easier to solve for standard SLS algorithms for SAT. In the context of the second approach, we discussed direct generalisations of WalkSAT for two interesting subclasses of the general CSP, Pseudo-Boolean CSP (also known as Pseudo-Boolean Programming) and Many-Valued SAT. Our discussion of the third approach was focused on the Min-Conflicts Heuristic (MCH) and the tabu search algorithm by Galinier and Hao (TS-GH); while the former played a pivotal role in the development of SLS algorithms for SAT and CSP, the latter achieves substantially better performance than MCH and its more recent variants. Overall, SAT is (and continues to be) an ideal problem for developing and evaluating algorithmic ideas, including SLS techniques, because of its conceptual simplicity as well as its theoretical and practical significance. While many problems are more naturally encoded into CSP than into SAT, it is presently not clear that native CSP algorithms can substantially outperform highly optimised SAT algorithms on suitably chosen encodings. Generally, the development and understanding of SLS algorithms is significantly further advanced for SAT than for CSP. One of the major reasons for this lies in the fact that SAT (for CNF formulae) — because of its conceptual simplicity compared to the more general finite discrete CSP — facilitates to a larger extent the development, analysis and efficient implementation of SLS algorithms. While there appears to be substantial room for further improvements in native SLS algorithms for the CSP, one particularly promising approach for solving practically relevant types of CSP instances is to use generalisations of high-performance SLS algorithms for SAT augmented with specific methods for handling certain types of complex constraints. Finally, it may be noted that for both SAT and CSP, the potential of many advanced SLS methods, such as Iterated Local Search, Variable Depth Search or Ant Colony Optimisation, has not been fully explored, and it is quite likely that by using such advanced techniques, further significant improvements in our ability to solve these problems can be achieved.
Exercises

6.1
[Easy] Consider the problem G of colouring the vertices of the graph shown below with four colours such that no two vertices connected by an edge have the same colour.
[Graph with vertices N, B, L, D and F; its edges define the colouring constraints.]
(a) Formulate this problem as a CSP instance.
(b) Formulate this problem as a SAT instance, that is, give a CNF formula F such that any model of F corresponds to a solution of G.

6.2
[Medium] When allowing an arbitrary number of tries, GSAT is probabilistically approximately complete. Explain why nevertheless other mechanisms for achieving the PAC property, such as conflict-directed random walk, are preferable over the simple static restart mechanism.
6.3
[Easy] Describe how you can use a WalkSAT algorithm to solve the Propositional Validity Problem (VAL) for a given formula in disjunctive normal form (DNF).
6.4
[Medium] Discuss the statistical significance of the performance differences shown for various GSAT and WalkSAT algorithms in the two plots from Figure 6.7 (page 280) based on the sample sizes used for these experiments and your knowledge of appropriate statistical tests.
6.5
[Medium; Hands-on] (a) Compare the performance of SAPS and Novelty+ on instances g125.18, ais12, and logistics.c. (b) Characterise the behaviour of SAPS on logistics.c for varying ρ and α settings. (These SAT instances are part of the DIMACS/GCP, AIS, and Planning/logistics benchmark sets available from SATLIB [Hoos and Stützle, 2003]; efficient
implementations of SAPS and Novelty+ are available from www.sls-book.net.)

6.6
[Easy] Formally specify the 6-Queens Problem in the form of a finite discrete CSP instance.
6.7
[Easy] Specify the CSP variable domains as well as the arity of the constraints that arise in the context of encoding a given 3-SAT instance into a CSP instance.
6.8
[Easy] How many clauses and variables are required in the worst case for encoding an NB-SAT instance with n variables and m clauses into a semantically equivalent SAT instance using a sparse encoding?
6.9
[Medium] Develop the details of a caching and incremental updating scheme for the evaluation function values in the TS-GH algorithm.
6.10
[Hard] Prove that, when applied to Uniform Random 3-SAT instances, the efficient caching and updating mechanism for variable scores in GSAT, as described in the in-depth section on page 271, has time complexity O(1) in each search step.
It is the mark of an educated mind to rest satisfied with the degree of precision which the nature of the subject admits and not to seek exactness where only an approximation is possible. —Aristotle, Philosopher
MAX-SAT and MAX-CSP
MAX-SAT and MAX-CSP are the optimisation variants of SAT and CSP. These problems are theoretically and practically interesting, because they are among the conceptually simplest combinatorial optimisation problems, yet instances of optimisation problems from many application domains can be represented as MAX-SAT or MAX-CSP instances in an easy and natural way. SLS algorithms are among the most powerful and successful methods for solving large and hard MAX-SAT and MAX-CSP instances. In this chapter, we first introduce MAX-SAT. Next, we present some of the best-performing SLS algorithms for various types of MAX-SAT instances and give an overview of results on their behaviour and relative performance. In the second part of this chapter, we introduce MAX-CSP and discuss SLS methods for solving the general problem as well as the closely related overconstrained pseudo-Boolean and integer optimisation problems.
7.1 The MAX-SAT Problem
MAX-SAT can be seen as a generalisation of SAT for propositional formulae in conjunctive normal form in which, instead of satisfying all clauses of a given CNF formula F with n variables and m clauses (and hence F as a whole), the objective is to satisfy as many clauses of F as possible. A solution to an instance of this problem is a variable assignment (i.e., a mapping of variables in F to truth values) that satisfies a maximal number of clauses in F.
Definition 7.1 Unweighted MAX-SAT
Given a CNF formula F := ⋀_{i=1..m} ⋁_{j=1..ki} lij, let f(F, a) be the number of clauses in F that are unsatisfied under variable assignment a. The (Unweighted) Maximum Satisfiability Problem (MAX-SAT) is to find a variable assignment a∗ ∈ argmin_{a ∈ Assign(F)} f(F, a) or, equivalently, a∗ ∈ argmax_{a ∈ Assign(F)} (m − f(F, a)), that is, an assignment a∗ that maximises the number of the satisfied clauses in F.
Remark: Maximising the number of satisfied clauses in F is equivalent to minimising the number of unsatisfied clauses. Although MAX-SAT is intuitively defined as a maximisation problem, it is often formally more convenient to consider the equivalent minimisation problem; this is the reason for using the objective function f (F, a), whose value is to be minimised, in our definition of MAX-SAT. In the following, we will consider MAX-SAT as a minimisation problem.
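Definition 7.1 translates directly into code; the signed-integer clause representation in this sketch is an assumption of ours.

```python
# Number of clauses of F unsatisfied under assignment a (a[1..n] Booleans).
def f(clauses, a):
    return sum(1 for c in clauses
               if not any(a[abs(l)] == (l > 0) for l in c))

# The formula of Example 7.1 below; all-false leaves exactly one clause unsatisfied:
F = [[-1], [-2, 1], [-1, -2, -3], [1, 2], [-4, 3], [-5, 3]]
print(f(F, [None, False, False, False, False, False]))  # 1
```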
This definition captures the search variant of MAX-SAT; the evaluation variant and associated decision problems can be defined in a similar way: Given a CNF formula F, in the evaluation variant, the objective is to determine the minimum number of clauses unsatisfied under any assignment; the associated decision problem for a given solution quality bound b is to determine whether there is an assignment that leaves at most b clauses in F unsatisfied. Note that SAT is equivalent to the decision variant of unweighted MAX-SAT with solution quality bound b := 0, that is, to deciding whether for a given CNF formula F there exists an assignment a, such that the number of clauses in F unsatisfied under a is equal to zero.

Example 7.1 A Simple MAX-SAT Instance
Let us consider the following propositional formula in CNF:
F := (¬x1) ∧ (¬x2 ∨ x1) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ x2) ∧ (¬x4 ∨ x3) ∧ (¬x5 ∨ x3)

The minimum number of clauses in F that are unsatisfied under any assignment, f(F, a∗), is one; two of the (many) assignments that achieve this
optimal solution quality are x1 := x2 := x3 := x4 := x5 := ⊥ and x1 := ⊥, x2 := x3 := x4 := x5 := ⊤.
It may be noted that while the SAT problem is defined for arbitrary propositional formulae, the definition of MAX-SAT is restricted to formulae in CNF. Furthermore, different from SAT, MAX-SAT is not invariant under certain logical equivalence transformations, that is, there exist MAX-SAT instances whose underlying CNF formulae are logically equivalent but whose optimal solutions are different. In particular, the solutions of a MAX-SAT instance can change when introducing multiple copies of clauses in the given CNF formula; in unweighted MAX-SAT, the number of copies of a clause can be used to express its importance relative to other clauses. As a consequence, standard simplification techniques for SAT, including unit propagation and pure literal reduction, are not directly applicable to MAX-SAT.
Weighted MAX-SAT
In many applications of MAX-SAT, the constraints represented by the CNF clauses are not all equally important. These differences can be represented explicitly and compactly by using weights associated with each clause of a CNF formula.
Definition 7.2 Weighted CNF Formula
A weighted CNF formula is a pair (F, w), where F is a CNF formula F := ⋀_{i=1..m} ci with ci := ⋁_{j=1..ki} lij, and w : {c1, . . . , cm} → R+ is a function that assigns a positive real value to each clause of F; w(ci) is called the weight of clause ci. (Without loss of generality, we assume that all clauses in F are pairwise different.)
Intuitively, the clause weights in a weighted CNF formula reflect the relative importance of satisfying the respective clauses; in particular, appropriately chosen clause weights can indicate the fact that satisfying a certain clause is considered more important than satisfying several other clauses. Weighted MAX-SAT is a straightforward generalisation of unweighted MAX-SAT, in which the objective is to minimise the total weight of the unsatisfied clauses, rather than just their total number.
Definition 7.3 Weighted MAX-SAT
Given a weighted CNF formula F′ := (F, w), let f(F′, a) be the total weight of the clauses of F that are unsatisfied under assignment a, that is, f(F′, a) := Σ_{c ∈ CU(F,a)} w(c), where CU(F, a) is the set of all clauses of F unsatisfied under a. The Weighted Maximum Satisfiability Problem (Weighted MAX-SAT) is to find a variable assignment a∗ that maximises the total weight of the satisfied clauses in F, that is, a∗ ∈ argmin_{a ∈ Assign(F)} f(F′, a) or, equivalently, a∗ ∈ argmax_{a ∈ Assign(F)} (f̃ − f(F′, a)), where f̃ := Σ_{i=1..m} w(ci) is the total weight of all clauses in F.
Although the definition allows for real-valued clause weights, it is easy to show that integer clause weights are sufficient for expressing arbitrary relative importance relations between clauses. Primarily for historically motivated efficiency reasons, many implementations of MAX-SAT algorithms support only integer clause weights. (Many older types of microprocessors performed integer operations substantially faster than floating point operations; this is not the case for modern CPUs.) However, because in most programming languages the ranges of integer data types are very limited compared to floating point data types, such implementations can sometimes not handle certain types of MAX-SAT instances. Many combinatorial optimisation problems contain logical conditions that have to be satisfied for any feasible candidate solution; these conditions are often called hard constraints, while constraints whose violation does not preclude feasibility are referred to as soft constraints. When representing such problems as weighted MAX-SAT instances, the hard constraints can be captured by choosing the weights of the corresponding CNF clauses high enough that no combination of soft constraint clauses can outweigh a single hard constraint clause. The decision problem with solution quality bound b associated with such a weighted MAX-SAT instance, where b is lower than the weight of a single hard constraint clause, but at least as high as the combined weight of any set of soft constraint clauses, then accurately represents the given problem; in particular, any solution to such a weighted MAX-SAT instance corresponds to a feasible candidate solution of the underlying combinatorial optimisation problem.
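The weighted objective function and the resulting hard/soft feasibility test can be sketched as follows, reusing the clause conventions from the earlier sketch.

```python
# Total weight of the clauses unsatisfied under a; weighted_clauses is a
# list of (clause, weight) pairs with clauses as lists of signed integers.
def f_weighted(weighted_clauses, a):
    return sum(w for c, w in weighted_clauses
               if not any(a[abs(l)] == (l > 0) for l in c))

def feasible(weighted_clauses, a, b):
    """With b chosen below every hard-clause weight and at least as high as
    the total soft-clause weight, this holds iff a satisfies all hard
    constraints."""
    return f_weighted(weighted_clauses, a) <= b
```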
Example 7.2 A Simple Weighted MAX-SAT Instance
Consider the formula F from Example 7.1 with the following clause weights:
w(c1) := w(¬x1) := 2
w(c2) := w(¬x2 ∨ x1) := 1
w(c3) := w(¬x1 ∨ ¬x2 ∨ ¬x3) := 3
w(c4) := w(x1 ∨ x2) := 7
w(c5) := w(¬x4 ∨ x3) := 7
w(c6) := w(¬x5 ∨ x3) := 7
The total weight of the clauses unsatisfied under assignment x1 := ⊥, x2 := x3 := x4 := x5 := ⊤ is 1, which is the optimal solution quality for the weighted MAX-SAT instance (F, w). Furthermore, when considering this weighted MAX-SAT instance with solution quality bound 6, clauses c4, c5 and c6 can be seen as hard constraints, while all other clauses represent soft constraints. The assignment x1 := x2 := x3 := x4 := x5 := ⊥, which was optimal for the unweighted MAX-SAT instance F, has objective function value 7 and is hence not a feasible candidate solution in this context.
Complexity and Approximability Results
MAX-SAT (unweighted as well as weighted) is an N P-hard optimisation problem, since SAT can be reduced to MAX-SAT in a straightforward way. Interestingly, while 2-SAT, the restriction of SAT to CNF formulae with clauses of length 2, can be solved in polynomial time, MAX-2-SAT, the corresponding restriction of MAX-SAT, is known to be N P-hard, as is MAX-3-SAT (i.e., MAX-SAT for CNF formulae with clause length 3). However, there are polynomial-time algorithms for MAX-SAT that are guaranteed to find solutions within a certain range of the optimum for arbitrary MAX-SAT instances. The first such approximation algorithm was a relatively simple greedy construction method that has been shown to solve any weighted MAX-SAT instance within a factor (approximation ratio) of at most 2 from the respective maximum total weight of the clauses satisfied under any variable assignment [Johnson, 1974]. (More recently, it has been shown that Johnson’s algorithm guarantees an approximation ratio of 1.5 [Chen et al., 1997].) Since 1994, a series of polynomial-time algorithms with substantially improved approximation ratios has been introduced [Yannakakis, 1994; Goemans and Williamson, 1994; 1995; Feige and Goemans, 1995; Mahajan and Ramesh, 1995; Asano, 1997; Asano and Williamson, 2000]; the most recent of these guarantees an approximation ratio of 1.275 [Asano and Williamson, 2000]. (Assuming the correctness of a conjecture by Zwick [1999], which is supported by numerical evidence, this latter result can be improved to 1.201 [Asano and Williamson, 2000].) For the special cases MAX-3-SAT and MAX-2-SAT, the best
approximation algorithms guarantee solutions within 8/7 ≈ 1.1429 [Karloff and Zwick, 1997] and 1.075 [Feige and Goemans, 1995; Mahajan and Ramesh, 1995], respectively. It is interesting to note that a simple iterative improvement algorithm with a non-oblivious evaluation function (see Section 7.2) has been proven to achieve a worst-case approximation ratio of 2^k/(2^k − 1) for MAX-k-SAT [Khanna et al., 1994]. There are limitations on the theoretical performance guarantees that can be obtained from polynomial-time algorithms for MAX-SAT: If P ≠ N P, there exists no polynomial-time approximation algorithm for MAX-3-SAT, and hence for MAX-SAT, with a (worst-case) approximation ratio lower than 8/7 ≈ 1.1429; for MAX-2-SAT, an analogous result rules out approximation ratios lower than 1.0472 [Håstad, 1997; 2001]. Arbitrarily improved approximation ratios α can be obtained at the cost of run-times that are exponential in instance size and depend on the desired value of α [Dantsin et al., 1998]. It is worth noting that approximation algorithms for MAX-SAT, such as the ones mentioned here, can be empirically shown to achieve much better solution qualities for many types of MAX-SAT instances; however, their performance is usually substantially inferior to that of high-performance SLS algorithms for MAX-SAT (see, e.g., Hansen and Jaumard [1990]).
Randomly Generated MAX-SAT Instances
As in the case of SAT, empirical studies play a prominent role in the analysis of the performance and behaviour of MAX-SAT algorithms. In this context, various classes of randomly generated MAX-SAT instances are commonly used, in particular Uniform Random 3-SAT instances, which are typically sampled from the overconstrained region of the respective solubility phase transition, that is, the clauses/variable ratio is larger than the critical value of approximately 4.3, and the instances are unsatisfiable with very high probability (see also Chapter 6, page 262f.). A number of empirical studies have used test-sets obtained from the random clause length model, in which each of the possible 2 · n literals over n variables is included with a fixed probability in any clause (see Chapter 6, page 261). A well-known set of such instances is part of the DIMACS collection of SAT benchmark instances; these jnh instances have 100 variables and between 800 and 900 clauses each, including satisfiable and unsatisfiable instances. Sets of weighted MAX-SAT instances have been derived from test-sets of random clause length formulae, including the jnh instances, by determining for each clause an integer weight between 1 and 1 000 uniformly at random [Resende et al., 1997; Yagiura and Ibaraki, 1998; 2001].
[Figure 7.1: Truncated discretised Gaussian distributions used for generating clause weights for weighted Uniform Random 3-SAT test-sets; cumulative distribution functions NTD[µ, σ, δ] for µ = 500 and various values of σ (with uniform distribution over [1, . . . , 1 000] for comparison) (left) and δ (right).]
MAX-SAT Encodings of Other Combinatorial Problems Many N P-hard combinatorial optimisation problems can be quite easily and naturally encoded into MAX-SAT. A good example for this is the following Minimum-Cost Graph Colouring Problem (Min-Cost GCP): Given an (undirected) edge-weighted graph G := (V, E, w ) and an integer k , determine a
minimum cost k-colouring of G, where a k-colouring of G is a mapping a that assigns an integer from the set {1, . . . , k} to each vertex in V, and the cost of a colouring a is the sum of all edge-weights w(e) for which e is an edge whose two incident vertices are assigned the same colour under a. Any instance G of this problem can be transformed into a weighted MAX-SAT instance F(G) as follows: For each edge e = (v, v′) and colour l, we create a clause ce := ¬xv,l ∨ ¬xv′,l with weight w(e) (the weight of edge e in G). Furthermore, for each vertex v in G, we create a clause cv := xv,1 ∨ . . . ∨ xv,k with weight ŵ := max{ŵv | v ∈ V} + 1, where ŵv := Σ_{e ∈ E(v)} w(e), and E(v) is the set of all edges in E that are incident with v; intuitively, ŵ is defined in such a way that it just exceeds the maximum total weight of all edges incident to any particular vertex in G. Finally, for each vertex v in G and for each pair of different colours l, l′, we create a clause cv,l,l′ := ¬xv,l ∨ ¬xv,l′ with weight ŵ. It is easy to see that the optimal solutions of the weighted MAX-SAT instance F(G) thus obtained correspond exactly to the optimal solutions of the given Min-Cost GCP instance G. Furthermore, under the 1-flip neighbourhood, the locally optimal candidate solutions of F(G) correspond exactly to the k-colourings of G. MAX-SAT-encoded Min-Cost GCP instances with integer weights have been used in several studies on SLS algorithms for MAX-SAT [Yagiura and Ibaraki, 1998; 2001; Hoos et al., 2002] (a sketch of this encoding is given below). Another problem that can be easily encoded into weighted MAX-SAT is the Weighted Set Covering Problem (Weighted SCP), in which, given a set A, a collection F := {A1, . . . , Am} of subsets Aj ⊆ A and a weight function w : F → R+, the objective is to find a minimal weight set cover of A, where a set cover of A is a subset C ⊆ F such that the sets in C cover all elements of A, that is, ⋃_{A′ ∈ C} A′ = A, and the weight of C is the total weight of its elements, that is, Σ_{A′ ∈ C} w(A′). This problem is N P-hard and has applications, for example, in Boolean circuit optimisation. MAX-SAT encodings of Weighted SCP instances from the ORLIB benchmark library [Beasley, 2003] have been used for evaluating the performance of MAX-SAT algorithms [Yagiura and Ibaraki, 1998; 2001; Smyth et al., 2003]. Set covering problems are discussed in more detail in Chapter 10, Section 10.3. Other hard combinatorial optimisation problems that have been encoded into MAX-SAT and used in the context of various studies on MAX-SAT algorithms include time-tabling problems [Yagiura and Ibaraki, 1998; 2001; Hoos et al., 2002], the problem of finding most probable explanations in Bayesian networks (MPE) [Park, 2002], and the problem of minimising the number of crossings that arise when embedding level-graphs into a plane (LGCMP) [Randerath et al., 2001; Smyth et al., 2003]. In almost all cases, these problems contain hard and soft constraints, which are captured by appropriately chosen weights of the respective CNF clauses. Furthermore, all of these problems have real-world applications in diverse areas, such as system diagnosis and database design.
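The Min-Cost GCP encoding described above, as a sketch; literals are (vertex, colour, sign) triples and all naming is illustrative.

```python
# Encode a Min-Cost GCP instance (V, E, w, k) as weighted MAX-SAT clauses;
# the Boolean variable (v, l) states that vertex v is assigned colour l.
def mincost_gcp_to_maxsat(V, E, w, k):
    # w_hat just exceeds the maximum total weight of edges at any vertex
    w_hat = max(sum(w[e] for e in E if v in e) for v in V) + 1
    clauses = []
    for e in E:
        v, u = e
        for l in range(k):   # clause c_e: not both endpoints get colour l
            clauses.append(([(v, l, False), (u, l, False)], w[e]))
    for v in V:
        clauses.append(([(v, l, True) for l in range(k)], w_hat))  # some colour
        for l in range(k):
            for l2 in range(l + 1, k):   # at most one colour per vertex
                clauses.append(([(v, l, False), (v, l2, False)], w_hat))
    return clauses
```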
7.2 SLS Algorithms for MAX-SAT
Many SLS methods have been applied to MAX-SAT, resulting in a large number of algorithms for unweighted and weighted MAX-SAT. In this section, we present some of the most prominent and best-performing algorithms, including straightforward applications of SLS algorithms for SAT to unweighted MAX-SAT, variants of WalkSAT, Dynamic Local Search and Tabu Search, as well as Iterated Local Search algorithms. Additionally, we discuss some SLS algorithms that are based on larger neighbourhoods and non-oblivious evaluation functions; these approaches are rather specific to MAX-SAT. MAX-SAT algorithms based on other SLS methods, such as Simulated Annealing, GRASP or Ant Colony Optimisation, will be briefly mentioned in Section 7.4.
Solving MAX-SAT Using SLS Algorithms for SAT
Any SLS algorithm for SAT can be applied to unweighted MAX-SAT in a straightforward way. The only modification required in this context is the addition of a simple mechanism that keeps track of the incumbent candidate solution and returns it at the end of the search process, provided its solution quality meets a given bound, if such a bound has been specified as an input to the algorithm (a sketch of this wrapper is given at the end of this subsection). Hence, in principle any of the SLS algorithms for SAT described in Chapter 6 can be used for solving unweighted MAX-SAT. It is not clear that SLS algorithms that are known to perform well on SAT can be expected to show equally strong performance on unweighted MAX-SAT. There is some empirical evidence that for long run-times, GWSAT obtains consistently higher solution qualities than a number of earlier SLS algorithms for MAX-SAT, including algorithms based on Simulated Annealing and Tabu Search, when applied to Uniform Random 3-SAT instances of varying constrainedness [Hansen and Jaumard, 1990; Selman et al., 1994]; however, the different termination criteria used in these comparative studies render these results somewhat inconclusive (see also Battiti and Protasi [1997b]). Similar results have been obtained for GSAT/Tabu [Battiti and Protasi, 1997a]; these will be discussed in more detail later. More recent results show that Novelty+, one of the best-performing SLS algorithms for SAT known to date (cf. Chapter 6, page 276ff.), typically does not reach the performance of state-of-the-art SLS algorithms for MAX-SAT on Uniform Random 3-SAT instances; this is particularly the case for highly constrained instances [Hoos et al., 2003]. Intuitively, WalkSAT algorithms such as Novelty+ have difficulties in selecting effective search steps in situations where a relatively large number of clauses is unsatisfied. In each search step, they select the variable
to be flipped from an unsatisfied clause that is uniformly chosen at random. However, with many unsatisfied clauses, only a few of which contain variables whose flip leads to improved candidate solutions, selecting an unsatisfied clause from which such a variable can be chosen is rather unlikely. This is particularly the case for highly constrained instances, in which all candidate solutions — including optimal quality solutions — have a high number of unsatisfied clauses. GSAT algorithms, in contrast, do not suffer from this problem, since they are able to choose the variable whose flip achieves the maximal improvement in solution quality with a probability that is independent of the number of unsatisfied clauses and of instance constrainedness. There are very few results on the performance obtained by applying dynamic local search algorithms for SAT to unweighted MAX-SAT; recent empirical results suggest that SAPS, a state-of-the-art SAT algorithm (cf. Chapter 6, page 291f.), outperforms GLS [Mills and Tsang, 2000] in terms of the CPU time required for finding quasi-optimal (i.e., best known) solutions for overconstrained Uniform Random 3-SAT instances, but does not reach the performance of IRoTS (a state-of-the-art Iterated Local Search algorithm for MAX-SAT described later in this section) on these instances [Tompkins and Hoos, 2003]. Interestingly, based on RTD analyses it seems that GLS frequently suffers from search stagnation, whereas this does not appear to be the case for SAPS, which typically shows regular exponential RTDs.
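Concretely, turning a SAT procedure into a MAX-SAT solver only requires wrapping it with incumbent tracking, as in the following sketch; the step and evaluation functions are assumed to be supplied by the caller, and the names are illustrative.

    def sls_for_maxsat(formula, init, step, weight_unsat, max_steps,
                       quality_bound=None):
        # Run an arbitrary SAT SLS procedure while keeping track of the
        # incumbent (best-so-far) candidate solution.
        a = init(formula)
        incumbent, best_q = dict(a), weight_unsat(formula, a)
        for _ in range(max_steps):
            a = step(formula, a)
            q = weight_unsat(formula, a)
            if q < best_q:
                incumbent, best_q = dict(a), q
        # return the incumbent only if it meets the optional quality bound
        if quality_bound is None or best_q <= quality_bound:
            return incumbent, best_q
        return None, best_q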
WalkSAT Algorithms for Weighted MAX-SAT

GSAT and WalkSAT algorithms can be generalised to weighted MAX-SAT by using the objective function for weighted MAX-SAT — that is, the total weight of the clauses unsatisfied under a given assignment — as the evaluation function based on which the variable to be flipped in each search step is selected. A WalkSAT variant for weighted MAX-SAT with explicit hard and soft constraints, WalkSAT-JKS, has been proposed by Jiang, Kautz and Selman [1995]. Applied to standard weighted MAX-SAT, this algorithm closely resembles WalkSAT/SKC, but differs in that it allows random walk steps even in situations where 'zero damage' flips are available (cf. Chapter 6, page 274). When hard constraints are explicitly identified (via a lower bound on the weights of CNF clauses that are to be treated as hard constraints), this WalkSAT algorithm restricts the clause selection in the first stage of the variable selection mechanism to unsatisfied hard constraint clauses, unless all hard constraints are satisfied by the current candidate assignment. The WalkSAT-JKS algorithm for weighted MAX-SAT has
been shown to achieve impressive results on various sets of MAX-SAT-encoded Steiner tree problems [Jiang et al., 1995]; it should be noted, however, that these results crucially rely on a particularly effective encoding of the original Steiner tree problems into MAX-SAT.

In principle, the 2-stage variable selection mechanism underlying all WalkSAT algorithms can be extended to MAX-SAT in two different ways: (i) by using the objective function for weighted MAX-SAT in the second stage, as in WalkSAT-JKS ('we' mechanism), and (ii) by considering clause weights in the selection of an unsatisfied clause in the first stage ('wcs' mechanism) [Hoos et al., 2002; 2003]. The motivation behind the latter mechanism is based on the following observation: in situations where many clauses are unsatisfied, the probability of selecting the best clause, that is, the unsatisfied clause that contains one of the variables whose flip leads to a maximal improvement in the objective function value, can be very small when basing this selection on a uniform distribution, as used in standard WalkSAT. By selecting an unsatisfied clause c with a probability proportional to the weight of c, the WalkSAT search process becomes more focused on satisfying clauses with high weights. (This probabilistic clause selection method is analogous to the well-known roulette wheel selection used in many Evolutionary Algorithms.) The we and wcs mechanisms can be used individually or in combination, which leads to three weighted MAX-SAT variants of any WalkSAT algorithm for SAT.

A recent empirical study indicates that these variants of WalkSAT/SKC are typically outperformed by the respective Novelty+ variants (note that an analogous situation holds for the SAT versions of these WalkSAT algorithms). Furthermore, Novelty+/wcs+we typically performs better than the two other variants and standard Novelty+, except for satisfiable weighted MAX-SAT instances (i.e., instances (F, w) where F is a satisfiable CNF formula), for which standard Novelty+ tends to outperform the wcs and we variants. Novelty+/wcs+we tends to find optimal solutions to the wjnh instances faster (both in terms of CPU time and search steps) than other state-of-the-art algorithms for weighted MAX-SAT, including the GLS and IRoTS algorithms described later (see also Example 7.3). On other types of weighted MAX-SAT instances, in particular on heavily overconstrained weighted Uniform Random 3-SAT instances, none of the Novelty+ variants appears to reach state-of-the-art performance. However, for various types of MAX-SAT-encoded instances of other problems, including well-known classes of minimum-cost graph colouring and set covering instances, Novelty+/wcs+we appears to find quasi-optimal (i.e., best known) solutions in significantly less CPU time than other high-performance algorithms for MAX-SAT, such as IRoTS or GLS, and appears to be the best-performing MAX-SAT algorithm known to date [Hoos et al., 2003].
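The roulette-wheel clause selection underlying the wcs mechanism can be sketched as follows; this is a generic illustration of weight-proportional selection, not code from the cited implementations.

    import random

    def select_unsat_clause_wcs(unsat_clauses, weight):
        # 'wcs': pick an unsatisfied clause with probability proportional
        # to its weight, instead of uniformly at random as in standard
        # WalkSAT.
        total = sum(weight[c] for c in unsat_clauses)
        r = random.uniform(0.0, total)
        acc = 0.0
        for c in unsat_clauses:
            acc += weight[c]
            if r <= acc:
                return c
        return unsat_clauses[-1]  # guard against floating-point round-off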
Dynamic Local Search Algorithms for Weighted MAX-SAT

Generalising DLS algorithms for SAT to weighted MAX-SAT raises an interesting issue: How should the dynamically changing clause penalties used within DLS interact with the fixed clause weights that are part of any weighted MAX-SAT instance? The first algorithm for weighted MAX-SAT based on the 'discrete Lagrangian method', DLM-SW, proposed by Shang and Wah [1997], uses an evaluation function of the form g(a) := Σ_{c ∈ CU(a)} (clp(c) + w(c)), where CU(a) is the set of clauses unsatisfied under assignment a, clp(c) is the penalty associated with clause c, which is dynamically adjusted during the search process, as in the basic DLM algorithm for SAT (cf. Chapter 6, page 288f.), and w(c) is the weight of clause c as specified in the given weighted MAX-SAT instance (F, w). Different from basic DLM for SAT, the local search procedure underlying this DLM algorithm for weighted MAX-SAT is an iterative first improvement algorithm (based on the standard 1-flip neighbourhood relation). There is some evidence that DLM-SW performs better than WalkSAT-JKS, but does not reach the performance of either Novelty+/wcs or Novelty+/wcs+we on the wjnh instances w.r.t. the solution quality reached after a fixed number of search steps [Mills and Tsang, 1999b].

Another approach for integrating clause penalties and clause weights has been followed in a straightforward generalisation of DLM-99-SAT (cf. Chapter 6, page 288f.) to weighted MAX-SAT [Wu and Wah, 1999]; this variant of DLM differs from the SAT version only in the initialisation of the clause penalties and in the parameter settings δ+, δ− and δs. The weighted MAX-SAT variant initialises the clause penalties to the weights w(c) of the respective clauses, and chooses the parameters δ+, δ− and δs, which control the modification of the clause penalties during the search, individually for each clause c proportional to its weight w(c). This approach for handling clause weights in the context of a dynamic local search algorithm differs notably from the one followed in the earlier DLM-SW algorithm. When applied to the wjnh instances, DLM-99-SAT for weighted MAX-SAT appears to perform better than DLM-SW [Wu and Wah, 1999], but there is empirical evidence that it typically does not reach the performance of Novelty+/wcs and Novelty+/wcs+we [Hoos et al., 2003].

Like DLM, GLSSAT, another high-performance dynamic local search algorithm for SAT, has been extended to weighted MAX-SAT [Mills and Tsang, 1999b; 2000]. The resulting GLSSAT variant considers the clause weights of the given weighted MAX-SAT instance only in the utility value of a clause c, defined as util(a, c) := w(c)/(1 + clp(c)) if clause c is unsatisfied under assignment a and zero otherwise. Otherwise, the algorithm is identical to GLSSAT (see also Chapter 6, page 287f.). It is worth noting that this approach for handling clause
weights is conceptually similar to the one underlying Novelty+/wcs. In both cases, the clause weights are not reflected directly in the evaluation function underlying the search process, but influence the search trajectory in a different way. In GLS for MAX-SAT, only the penalty values of clauses with maximal utility are increased after each local search phase; hence, clauses with high weights will typically receive high penalties, which biases the subsidiary local search algorithm towards preferentially satisfying them. On the wjnh instances, this GLS variant performs substantially better than the previously discussed DLM and WalkSAT algorithms in terms of solution quality reached after a fixed number of iterations [Mills and Tsang, 1999b; 2000]. However, when comparing the CPU time required for finding optimal solutions, both Novelty+/wcs and Novelty+/wcs+we typically show better performance [Hoos et al., 2003]. For weighted Uniform Random 3-SAT instances, GLS for MAX-SAT generally outperforms Novelty+/wcs+we in terms of search steps required for finding quasi-optimal solutions; but in many cases, this performance advantage is insufficient to amortise the substantially higher time complexity of search steps in GLS. For certain types of weighted MAX-SAT instances, such as Uniform Random 3-SAT instances with low-variance clause weight distributions, GLS appears to be the best-performing MAX-SAT algorithm known to date [Hoos et al., 2003]. However, GLS for MAX-SAT does not reach the state-of-the-art performance of Novelty+/wcs+we on various types of MAX-SAT-encoded instances of other problems, such as minimum-cost graph colouring or weighted set covering [Hoos et al., 2002]. Furthermore, limited RTD analyses indicate that, different from other state-of-the-art MAX-SAT algorithms, such as Novelty+/wcs+we and IRoTS (described below), GLS for MAX-SAT tends to suffer from stagnation behaviour, which often compromises the robustness of its performance; this appears to be the case (although to a lesser extent) even when all penalty values are regularly decayed, as in GLSSAT2 [Hoos et al., 2003].
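The way GLS for MAX-SAT lets clause weights enter the search only through the penalty update can be made concrete with a small sketch; the data structures are hypothetical, and the update shown reflects only the utility-based increase described above, not the full GLSSAT machinery.

    def gls_penalty_update(unsat_clauses, weight, penalty):
        # After each local search phase, increase the penalties of the
        # unsatisfied clauses with maximal utility
        # util(a, c) = w(c) / (1 + clp(c)).
        util = {c: weight[c] / (1.0 + penalty[c]) for c in unsat_clauses}
        max_util = max(util.values())
        for c in unsat_clauses:
            if util[c] == max_util:
                penalty[c] += 1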
Example 7.3 Performance Results for SLS Algorithms for MAX-SAT

This example illustrates the performance differences between Novelty+/wcs+we and GLSSAT2, two of the best-performing MAX-SAT algorithms known to date. All CPU times reported in this example have been measured on PCs with dual 2.4GHz Intel Xeon processors, 512KB cache, and 1GB RAM running Red Hat Linux, Version 2.4smp. As can be seen from the left side of Figure 7.2, there is typically a clear probabilistic domination relationship between the two algorithms: the qualified RTDs for reaching optimal quality solutions, as determined using a complete MAX-SAT algorithm, are very similar in shape and do not intersect.
[Figure 7.2 appears here. Left panel: P(solve) vs run-time [CPU sec], with curves for Novelty+/wcs+we and GLSSAT2. Right panel: median run-time of Novelty+/wcs+we vs median run-time of GLSSAT2 [CPU sec], with separate point sets for sat and unsat instances.]
Figure 7.2 Left: Qualified RTDs for GLSSAT2 and Novelty+/wcs+we for reaching optimal solution quality on a typical unsatisfiable weighted MAX-SAT instance (wjnh304) from the wjnh benchmark set. Right: Correlation of search cost required by GLSSAT2 vs Novelty+/wcs+we for reaching the optimal solution qualities on the wjnh benchmark set; the search cost for each instance is measured as median run-time over 100 runs. The horizontal and vertical lines indicate the median as well as the q0.1 and q0.9 quantiles of the search cost for the respective algorithm across the test-set; the diagonal lines indicate 10 times, equal, and 1/10th CPU time of Novelty+/wcs+we compared to GLSSAT2.
When evaluated across the entire test-set wjnh, a well-known and widely used set of randomly generated weighted MAX-SAT instances (cf. Section 7.1), Novelty+/wcs+we does not always perform better than GLSSAT2. However, especially for unsatisfiable instances, Novelty+/wcs+we tends to find optimal quality solutions up to more than 20 times faster than GLSSAT2. Table 7.1 summarises the results of a performance comparison between GLSSAT2 and Novelty+/wcs+we across a number of well-known test-sets, including the previously studied wjnh set and several sets of weighted Uniform Random 3-SAT instances (rndn-m/wµ-σ, where n and m denote the number of variables and clauses, respectively, and µ and σ are the parameters of the truncated discretised Gaussian weight distribution NDT[µ, σ, 1]). The wjnh test-set comprises 14 satisfiable and 30 unsatisfiable instances, and each of the rndn-∗ test-sets contains 100 unsatisfiable instances. The instances from these test-sets are much harder for state-of-the-art complete MAX-SAT solvers, such as wmaxsat-lb2-moms [Alsinet et al., 2003], than for the SLS algorithms studied here. For example, the run-time required by wmaxsat-lb2-moms for solving instance wjnh304 is more than ten times higher than the median run-time of Novelty+/wcs+we (0.29 CPU seconds vs 0.023 CPU seconds; interestingly, more than 85% of the run-time of wmaxsat-lb2-moms is needed to find an optimal solution rather than for
Test-set                 | GLSSAT2           | Novelty+/wcs+we    | fd
wjnh                     | 0.0242 (2 977)    | 0.0131 (9 261)     | 0.02 / 0.68
rnd200-1000/w1000-200    | 0.4589 (69 523)   | 0.9995 (747 376)   | 0.49 / 0.07
rnd200-1000/w1000-1000   | 1.3637 (217 961)  | 1.2041 (1 033 582) | 0.38 / 0.35
rnd200-1400/w1400-1400   | 3.1840 (408 888)  | –                  | 1 / 0
Table 7.1 Performance of GLSSAT2 vs Novelty+ /wcs+we on selected sets of randomly
generated weighted MAX-SAT instances; the table entries are median search cost values over the respective test-sets, where the search cost for a given problem instance is defined as the median run-time required for finding a (quasi-)optimal solution and is reported in CPU seconds (search steps); ‘–’ indicates that no such solutions could be found within more than 1 000 CPU seconds. The search cost values for each algorithm were determined from 100 runs per instance, and algorithms were always run (without restart) until a (quasi-) optimal solution was found. The two values in the fd column indicate the fraction of instances from the respective test-set on which GLSSAT2 probabilistically dominated Novelty+ /wcs+we (first value) and vice versa (second value).
proving its optimality). Most of the other instances considered here cannot be solved by complete MAX-SAT algorithms within reasonable amounts of CPU time. Consequently, we used multiple, very long runs of state-of-the-art SLS algorithms for MAX-SAT, including GLS and Novelty+/wcs+we, to determine the best possible solution qualities. More precisely, we made sure that in each run, after the final solution quality was reached at some run-time t∗, the respective algorithm continued searching for at least time 10·t∗ without finding another improvement. For all instances where provably optimal solution qualities are known, these were shown to be correctly determined by this protocol. (See also Smyth et al. [2003].)

The results shown in Table 7.1 indicate that often, but not always, GLSSAT2 tends to perform better than Novelty+/wcs+we; this is particularly pronounced for the highly overconstrained rnd200-1400/w1400-1400 instances, which Novelty+/wcs+we fails to solve within 1 000 CPU seconds. Note that GLSSAT2 typically requires a substantially lower number of search steps; however, although optimised implementations of both algorithms were used, the time complexity of individual search steps is substantially higher for GLSSAT2 than for Novelty+/wcs+we, which is due to the inherent differences between the two underlying SLS methods (see also Chapter 6, Sections 6.3 and 6.4).

Figure 7.3 (left side) shows the solution quality distributions (SQDs) obtained for GLSSAT2 and Novelty+/wcs+we and run-times of 1 and 10 CPU seconds on a MAX-SAT-encoded instance of the Level Graph Crossing
[Figure 7.3 appears here. Left panel: cumulative probability P vs relative solution quality for GLSSAT2 and Novelty+/wcs+we at run-times of 1 and 10 CPU sec. Right panel: relative solution quality vs run-time [CPU sec] for Novelty+/wcs+we and GLSSAT2.]
Figure 7.3 Left: Solution quality distributions (SQDs) for GLSSAT2 vs Novelty+ /wcs+we
on MAX-SAT-encoded LGCMP instance lgcmp75-1 for run-times of 1 and 10 CPU sec. Right: Development of solution quality over time for the same instance; the curves for each algorithm correspond to the median values of the underlying SQDs, while the error bars indicate the respective q0.1 and q0.9 quantiles. The data underlying these graphs were obtained from 100 runs of each algorithm.
Minimisation Problem (LGCMP). This weighted MAX-SAT instance has 7 500 variables and 128 306 clauses, and all clause weights are either 1 (for clauses derived from the optimisation objective) or 13 307 (for clauses that correspond to hard constraints). The SQD graphs show relative solution quality values defined as sq/sq∗ − 1, where sq∗ is the best known solution quality for the given weighted MAX-SAT instance. Each SQD is based on 100 runs of the respective algorithm. As can be clearly seen, for short run-times, the performance of Novelty+/wcs+we clearly dominates that of GLSSAT2 on the given instance, while for longer run-times, GLSSAT2 tends to find higher-quality solutions. Not surprisingly, allowing more run-time leads to substantial improvements in the solution qualities reached by both algorithms; furthermore, the variation in solution quality obtained from different runs decreases. (This behaviour is rather typical for SLS algorithms for optimisation problems.) These observations are further confirmed by the SQT curves shown on the right side of Figure 7.3, which show that for run-times up to 1 CPU sec, the solution qualities reached by Novelty+/wcs+we are significantly higher than those obtained by GLSSAT2, while for longer run-times, GLSSAT2 finds higher-quality solutions. As can be seen in Table 7.2, slightly different performance results are obtained for other sets of MAX-SAT-encoded set covering and graph colouring instances (scp4 and gcp-yi), as well as for smaller LGCMP instances
             | GLSSAT2                    | Novelty+/wcs+we
Test-set     | t = 0.1s  t = 1s  t = 10s  | t = 0.1s  t = 1s  t = 10s
scp4         | 1.33      0.02    0.01     | 0.02      0.01    0
gcp-yi       | 11.24     0.22    0.15     | 0.12      0.05    0.01
lgcmp75      | 8.1·10^6  49.3    0.20     | 3.27      0.09    0
lgcmp100     | 1.1·10^6  248.82  0.25     | 10.02     1.39    0.51
Table 7.2 Performance of GLSSAT2 vs Novelty+ /wcs+we on selected sets of MAX-
SAT-encoded instances of other problems; each set contains 10 instances. The table entries are median relative solution quality values for various run-times; relative solution quality is defined as sq/sq∗ − 1, where sq is an absolute solution quality value for an instance with optimal (or best known) solution quality sq∗. For scp4, the sq∗ values are the known optimal solution qualities, while for the other test-sets, they are the best solution qualities ever observed by any of the algorithms studied here within 100 runs of 100 CPU seconds each. The medians are taken from the distributions of median relative solution quality over the respective test-set; the underlying solution quality distributions for each problem instance are based on 100 runs of the respective algorithm.
(lgcmp75); for these, Novelty+/wcs+we tends to find significantly higher-quality solutions than GLSSAT2 for a wide range of run-times.
Tabu Search Algorithms for MAX-SAT

Hansen and Jaumard's Steepest Ascent Mildest Descent (SAMD) algorithm for unweighted MAX-SAT can be seen as one of the earliest applications of Tabu Search to MAX-SAT or SAT [Hansen and Jaumard, 1990]. (The name of the algorithm is derived from a formulation of MAX-SAT as a maximisation problem.) SAMD is essentially a variant of GSAT/Tabu that imposes a tabu tenure of tt steps only on variables flipped in non-improving steps; variables flipped in improving steps are not declared tabu. Furthermore, SAMD terminates if after a fixed number of search steps no improvement in the objective function value has been achieved. SAMD has been shown to outperform a standard Simulated Annealing algorithm for MAX-SAT as well as various approximation algorithms with theoretical performance guarantees (see also Section 7.1) on a number of Uniform Random k-SAT instances with k ∈ {2, 3, 4} and varying constrainedness [Hansen and Jaumard, 1990]. Although GWSAT has been reported to achieve better solution qualities than SAMD [Selman et al., 1994], the
differences in the underlying termination criteria and run-times make a meaningful comparison very difficult [Hansen and Jaumard, 1990; Battiti and Protasi, 1997b]. A tabu search algorithm that is equivalent to GSAT/Tabu without random restart has been applied to unweighted MAX-SAT; experimental results on Uniform Random 3-SAT instances suggest that this variant performs slightly better than SAMD and possibly exceeds the performance of GWSAT [Battiti and Protasi, 1997a]. There is also some indication that a variant of this tabu search algorithm achieves further slight performance improvements; this variant uses an aspiration criterion (which allows a search step to be performed regardless of the tabu status of the corresponding variable if it achieves an improvement in the incumbent candidate solution) and a slightly modified tie-breaking rule for choosing one of several search steps that lead to an identical improvement in objective function value.

A further variant of tabu search for MAX-SAT, TS-YI, is based on a first improvement search strategy [Yagiura and Ibaraki, 1998; 2001]. Like all SLS algorithms for MAX-SAT discussed so far, it is based on the 1-flip neighbourhood relation and uses the objective function for evaluating search steps. The search is started from a randomly chosen assignment, and initially none of the variables are tabu. Then, in each step, the neighbourhood of the current variable assignment is scanned in random order, and the first variable flip that leads to an improving neighbouring variable assignment is executed. If no improving search step is possible, a minimally worsening step (w.r.t. the standard evaluation function) is performed. Any variable that is flipped is declared tabu for a fixed number tt of subsequent search steps. The search process is terminated after a fixed amount of CPU time or a fixed number of search steps.

TS-YI has been applied to various types of unweighted and weighted MAX-SAT instances. There is some empirical evidence that for unweighted instances generated according to the random clause length model, this tabu search algorithm appears to perform better than WalkSAT/SKC and substantially better than basic GSAT. However, for various test-sets of weighted MAX-SAT instances, particularly for MAX-SAT-encoded minimum-cost graph colouring, weighted set covering, and time-tabling problems, its performance appears to be worse than that of WalkSAT/SKC, but substantially better than that of basic GSAT [Yagiura and Ibaraki, 2001]. While it is not clear how the performance of TS-YI compares to that of the previously discussed tabu search algorithms for MAX-SAT, there is no evidence that it generally reaches or exceeds the performance of GLS or of the wcs variants of Novelty+.

Finally, Robust Tabu Search (RoTS; see also Chapter 2, page 80) has recently been applied to MAX-SAT [Smyth et al., 2003]. The RoTS algorithm for MAX-SAT is closely related to GSAT/Tabu for weighted MAX-SAT. In each search step, one of the non-tabu variables that achieves a maximal improvement in the
total weight of the unsatisfied clauses is flipped and declared tabu for the next tt steps. Different from GSAT/Tabu, RoTS uses an aspiration criterion that allows a variable to be flipped regardless of its tabu status if this leads to an improvement in the incumbent candidate solution. Additionally, RoTS forces any variable whose value has not been changed over the last 10 · n search steps to be flipped (where n is the number of variables appearing in the given MAX-SAT instance). This diversification mechanism helps to avoid stagnation of the search process. Finally, instead of using a fixed tabu tenure, every n search steps RoTS randomly chooses the tabu tenure tt from [ttmin, ..., ttmax] according to a uniform distribution. The tabu status of variables is determined by comparing the number of search steps that have been performed since the most recent flip of a given variable with the current tabu tenure; hence, changes in tt immediately affect the tabu status and tenure of all variables. An outline of RoTS for MAX-SAT is given in Figure 7.4. Note that if several variables give the same best improvement in the evaluation function, one of these variables is randomly chosen.

Limited empirical results indicate that on the wjnh instances, RoTS generally requires more search steps but in many cases less CPU time for finding optimal solutions than the weighted MAX-SAT version of GLS; however, it does not reach the performance of the wcs variants of Novelty+ on these instances. On weighted Uniform Random 3-SAT instances, RoTS typically shows significantly better performance than Novelty+/wcs+we, both in terms of search steps and CPU time required for finding (quasi-)optimal solutions. In terms of CPU time, it typically also exceeds the performance of GLS for MAX-SAT for both weighted and unweighted Uniform Random 3-SAT instances; this performance advantage appears to be particularly pronounced for highly constrained instances [Hoos et al., 2002; 2003].
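For illustration, the following Python sketch follows the outline given in Figure 7.4; the evaluation helpers g and flip_delta are assumed to be provided by the surrounding implementation (an efficient version would cache and update them incrementally), and tie-breaking and boundary details are simplified.

    import random

    def rots_maxsat(n_vars, tt_min, tt_max, max_no_impr, g, flip_delta):
        # Robust Tabu Search for weighted MAX-SAT (sketch).
        a = [random.choice([False, True]) for _ in range(n_vars)]
        best_a, best_g = a[:], g(a)
        last_flip = [-10**9] * n_vars   # step of most recent flip per variable
        tt = random.randint(tt_min, tt_max)
        k, last_impr = 0, 0
        while k - last_impr < max_no_impr:
            if k % n_vars == 0:         # re-sample the tabu tenure every n steps
                tt = random.randint(tt_min, tt_max)
            deltas = [flip_delta(a, v) for v in range(n_vars)]
            best_delta = min(deltas)
            candidates = [v for v in range(n_vars) if deltas[v] == best_delta]
            if g(a) + best_delta < best_g:
                v = random.choice(candidates)        # aspiration criterion
            else:
                stale = [v for v in range(n_vars)
                         if k - last_flip[v] >= 10 * n_vars]
                if stale:
                    v = stale[0]                     # forced diversification flip
                else:
                    non_tabu = [v for v in range(n_vars)
                                if k - last_flip[v] > tt] or candidates
                    d = min(deltas[v] for v in non_tabu)
                    v = random.choice([v for v in non_tabu if deltas[v] == d])
            a[v] = not a[v]
            last_flip[v] = k
            if g(a) < best_g:
                best_a, best_g, last_impr = a[:], g(a), k
            k += 1
        return best_a, best_g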
Iterated Local Search for MAX-SAT

Yagiura and Ibaraki [1998; 2001] proposed and studied a simple ILS algorithm for MAX-SAT, ILS-YI, which initialises the search at a randomly chosen assignment and uses a subsidiary iterative first improvement search procedure based on the 1-flip neighbourhood, as well as a perturbation phase that consists of a fixed number of (uninformed) random walk steps; the acceptance criterion always selects the better of the two given candidate solutions. While ILS-YI generally appears to perform better than GSAT in terms of solution quality reached after a fixed amount of CPU time, for various sets of benchmark instances, including MAX-SAT-encoded minimum-cost graph colouring problems, its performance is weaker than that of WalkSAT/SKC or TS-YI. However, there are some cases, in particular a large MAX-SAT-encoded real-world time-tabling instance, for which
procedure RoTS(F, ttmin, ttmax, maxNoImpr)
    input: weighted CNF formula F, positive integers ttmin, ttmax, maxNoImpr
    output: variable assignment â

    n := number of variables in F;
    a := randomly chosen assignment of the variables in F;
    â := a; k := 0;
    repeat
        if (k mod n = 0) then
            tt := random([ttmin, ..., ttmax]);
        end
        v := randomly selected variable whose flip results in a maximal decrease in g(a);
        if g(a with v flipped) < g(â) then
            a := a with v flipped;
        else if ∃ variable v′ that has not been flipped for ≥ 10 · n steps then
            a := a with v′ flipped;
        else
            v := randomly selected non-tabu variable whose flip results in a maximal decrease in g(a);
            a := a with v flipped;
        end
        if g(a) < g(â) then
            â := a;
        end
        k := k + 1;
    until no improvement in â for maxNoImpr steps
    return â
end RoTS

Figure 7.4 Algorithm outline of Robust Tabu Search for MAX-SAT; g(a) denotes the total weight of the clauses in the given formula that are unsatisfied under a; a variable is tabu if, and only if, it has been flipped during the last tt search steps.
ILS-YI appears to perform better than TS-YI and WalkSAT/SKC [Yagiura and Ibaraki, 2001]. Another ILS algorithm for MAX-SAT has recently been proposed by Smyth, Hoos and Stützle [2003]. This algorithm, IRoTS, uses the same random initialisation as ILS-YI. Its subsidiary local search and perturbation phases are both based on the previously described RoTS algorithm. Each local search phase executes RoTS steps until no improvement in the incumbent solution has been achieved
for a given number of steps. The perturbation phase consists of a fixed number of RoTS search steps with tabu tenure values that are substantially higher than the ones used in the local search phase. At the beginning of each local search and perturbation phase, all variables are declared non-tabu, irrespective of their previous tabu status. If applying perturbation and subsequent local search to a candidate solution s results in a candidate solution s′ that is better than the incumbent candidate solution, the search is continued from s′. If s and s′ have the same solution quality, one of them is chosen uniformly at random. In all other cases, the worse of the two candidate solutions s and s′ is chosen with probability 0.9, and the better one otherwise.

Empirical results show that when comparing the CPU time required for finding optimal or quasi-optimal solutions, IRoTS typically performs significantly better than GLS and Novelty+/wcs+we on weighted and unweighted Uniform Random 3-SAT instances; the performance advantage of IRoTS is particularly pronounced for highly constrained instances with low-variance clause weight distributions. Overall, IRoTS appears to be the best-performing MAX-SAT algorithm for these types of instances. On the wjnh instances, IRoTS does not reach the performance of the wcs variants of Novelty+. Furthermore, while IRoTS finds optimal solutions for a significant fraction of the unsatisfiable instances faster (in terms of CPU time) than GLS, it does not reach the performance of GLS on many of the satisfiable wjnh instances. Similarly, for several classes of MAX-SAT-encoded instances of other combinatorial optimisation problems, such as minimum-cost graph colouring and weighted set covering, IRoTS performs significantly worse than GLS for weighted MAX-SAT in terms of finding solutions of optimal or best known quality [Smyth et al., 2003]. Limited experimentation suggests that using a perturbation phase consisting of a sequence of random walk steps instead of the previously described robust tabu search procedure typically results in a decrease in performance.
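The acceptance criterion of IRoTS described above can be stated compactly; the following sketch is a direct transcription of that rule, with a hypothetical quality function that returns the (to-be-minimised) objective value of a candidate solution.

    import random

    def irots_accept(s, s_prime, incumbent, quality):
        # Continue from s' if it improves on the incumbent; break quality
        # ties between s and s' uniformly at random; otherwise take the
        # *worse* of s and s' with probability 0.9 (a diversifying choice).
        q_s, q_sp = quality(s), quality(s_prime)
        if q_sp < quality(incumbent):
            return s_prime
        if q_s == q_sp:
            return random.choice([s, s_prime])
        worse, better = (s, s_prime) if q_s > q_sp else (s_prime, s)
        return worse if random.random() < 0.9 else better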
Example 7.4 Performance Results for SLS Algorithms for MAX-SAT (2)

The following experiments illustrate the performance differences between IRoTS and GLSSAT2 on weighted and unweighted Uniform Random 3-SAT instances. All CPU times reported in this example have been measured on PCs with dual 2.4GHz Intel Xeon processors, 512KB cache, and 1GB RAM running Red Hat Linux, Version 2.4smp. First, in order to assess the differences in the solution qualities achieved by both algorithms, solution quality distributions (SQDs) were measured for IRoTS and GLSSAT2 applied to an unweighted Uniform Random 3-SAT instance with 500 variables and 5 000 clauses over a range of run-times. The left side of Figure 7.5 shows the development of solution quality over time (SQT curves) obtained from the SQD data for both algorithms. Clearly,
[Figure 7.5 appears here. Left panel: relative solution quality vs run-time [CPU sec] for GLSSAT2 and IRoTS. Right panel: median run-time of IRoTS vs median run-time of GLSSAT2 [CPU sec].]
Figure 7.5 Left: Development of the relative solution quality over time for IRoTS vs
GLSSAT2 on a hard unweighted Random 3-SAT instance with 500 variables and 5 000 clauses; the curves for each algorithm correspond to the median values of the underlying SQDs, while the error bars indicate the respective q0.1 and q0.9 quantiles. The data underlying these graphs were obtained from 100 runs of each algorithm. Right: Correlation of the search cost required by IRoTS vs GLSSAT2 for reaching (quasi-)optimal solution qualities on the rnd200-2000/u benchmark set; the search cost for each instance is measured as median run-time over 100 runs. The horizontal and vertical lines indicate the median, q0.1 , and q0.9 of the search cost for the respective algorithm across the test-set; the diagonal lines indicate equal, 1/10th and 1/100th CPU time of IRoTS compared to GLSSAT2.
IRoTS tends to reach solutions of better quality than GLSSAT2 for any given run-time; this is particularly the case for short runs. In the next step, the performance of the two algorithms was compared across the test-set rnd200-2000/u, comprising 100 unweighted Uniform Random 3-SAT instances with 200 variables and 2 000 clauses each, all of which are unsatisfiable. Since state-of-the-art systematic search algorithms for MAX-SAT were found to be unable to find provably optimal solutions for the MAX-SAT instances used in this example within a reasonable amount of CPU time, quasi-optimal solutions were determined using the same method as described in Example 7.3. For each of the 100 instances from the test-set, we measured qualified RTDs for finding a quasi-optimal solution over 100 runs of IRoTS and GLSSAT2, respectively. (Both algorithms were always run, without restart, until the desired solution quality was reached.) The right side of Figure 7.5 shows the correlation between the median run-times of IRoTS and GLSSAT2 over the test-set. Clearly, IRoTS performs substantially better than GLSSAT2 across the entire test-set and generally tends to find quasi-optimal solutions up to 80 times faster. As can be seen from Table 7.3, similar results are obtained for other unweighted and weighted Uniform Random 3-SAT test-sets. As in Example 7.3,
Test-set                 | IRoTS             | GLSSAT2           | fd
rnd200-1000/u            | 0.0132 (6 655)    | 0.0384 (5 665)    | 0.99 / 0
rnd200-1400/u            | 0.0141 (6 584)    | 0.2525 (32 473)   | 0.98 / 0
rnd200-2000/u            | 0.0171 (6 927)    | 0.3487 (31 510)   | 0.99 / 0
rnd200-1000/w1000-200    | 0.0729 (47 523)   | 0.4589 (69 523)   | 0.57 / 0
rnd200-1000/w1000-1000   | 0.5242 (318 897)  | 1.3637 (217 961)  | 0.35 / 0.11
rnd200-1400/w1400-1400   | 0.1021 (57 905)   | 3.1840 (408 888)  | 0.97 / 0
Table 7.3 Performance of IRoTS vs GLSSAT2 on selected benchmark instances for
unweighted and weighted MAX-SAT; the table entries are median search cost values over the respective test-sets, where the search cost for a given problem instance is defined as the median run-time required for finding a quasi-optimal solution and reported as CPU seconds (search steps). The search cost values for each algorithm were determined from 100 runs per instance, and algorithms were always run (without restart) until a quasioptimal solution was found. The values in the fd column indicate the fraction of instances from the respective test-set on which IRoTS probabilistically dominated GLSSAT2 (first value) and vice versa (second value).
rndn-m/wµ-σ denotes a set of 100 Uniform Random 3-SAT instances with n variables and m clauses, with clause weights drawn from a truncated discretised Gaussian weight distribution NDT[µ, σ, 1]. (It may be noted that the only difference between the instances in test-sets rndn-m/u and rndn-m/w* is the clause weights.) Interestingly, as the constrainedness of the instances (i.e., the number of clauses per variable) is increased, the performance of GLSSAT2 deteriorates, while the performance of IRoTS remains relatively unaffected. This effect is more pronounced for the weighted instances, which also tend to be harder for both algorithms.
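Clause weights for such test-sets can be drawn as in the following sketch of a truncated discretised Gaussian NDT[µ, σ, 1]; the exact rounding and truncation protocol used to generate the original benchmark sets is an assumption here.

    import random

    def sample_weight_ndt(mu, sigma, w_min=1):
        # Draw from a normal distribution, discretise by rounding to the
        # nearest integer, and reject values below the truncation point.
        while True:
            w = round(random.gauss(mu, sigma))
            if w >= w_min:
                return w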
MAX-SAT Algorithms Based on Larger Neighbourhoods

While all prominent and high-performance SLS algorithms for SAT are based on the 1-flip neighbourhood, there are very successful SLS algorithms for MAX-SAT that are based on larger neighbourhoods. Yagiura and Ibaraki [1998; 1999; 2001] studied various such algorithms, ranging from simple iterative first improvement to iterated local search methods. The key to the success of these algorithms is a combination of a clever reduction of the 2- and 3-flip neighbourhoods with an efficient caching scheme for evaluating moves in these larger neighbourhoods. This reduction is done in such a way that no possible improving neighbour is lost,
that is, local optimality remains invariant under the neighbourhood reduction. Furthermore, under realistic assumptions, each local search step requires time O(n + m) for the 2-flip neighbourhood and time O(m + t²·n) for the 3-flip neighbourhood in the average case, given an input formula with n variables, m clauses, and no more than t occurrences of each variable (see also the in-depth section below); this result has been empirically confirmed for a range of weighted Uniform Random 3-SAT test-sets [Yagiura and Ibaraki, 1998; 1999]. Empirical results for variants of TS-YI and ILS-YI that use the reduced 2- and 3-flip neighbourhoods indicate that on various test-sets of weighted MAX-SAT instances, these larger neighbourhoods lead to significant performance improvements in terms of the solution quality reached after a fixed amount of CPU time. Particularly for MAX-SAT-encoded minimum-cost graph colouring and weighted set covering instances, as well as for a large MAX-SAT-encoded real-world time-tabling instance, the 2-flip variant of ILS-YI performs better than the other versions of ILS-YI and any of the TS-YI variants. It is presently not clear whether other state-of-the-art MAX-SAT algorithms can reach or exceed the performance of ILS-YI (or TS-YI) on these types of instances; however, there is some preliminary evidence that GLS and Novelty+/wcs+we may perform better in various cases. It is also unclear whether the use of larger neighbourhoods might lead to performance improvements in state-of-the-art SLS algorithms for MAX-SAT, such as Novelty+/wcs+we, GLS or IRoTS.
In Depth Efficient Evaluation of k-Flip Neighbourhoods for MAX-SAT The key to efficiently implementing algorithms for SAT or MAX-SAT that are based on performing iterative improvement steps in a multiflip neighbourhood (i.e., a k -flip neighbourhood with k > 1) lies in a combination of two techniques. Both of these make use of the fact that the effect of a multiflip on the evaluation function can be decomposed into the effects of single-variable flips and correction terms. In the following, we explain these techniques primarily for the special case of the 2-flip neighbourhood, and mention generalisations for k > 2 only briefly (for details, see Yagiura and Ibaraki [1999]). Given a weighted CNF formula with clauses c1 , . . . , cm , consider a search step in the 2-flip neighbourhood, in which the truth values of variables x and y are flipped. Let ∆g(a, {x, y}) be the change in evaluation function value caused by the 2-flip of x and y in a. (Recall that we defined MAX-SAT as a minimisation problem; hence, an improving search step corresponds to ∆g(a, {x, y}) < 0.) This value can be computed as
∆g(a, {x, y}) := Σ_{i=1}^{m} ∆g_i(a, {x, y}) = Σ_{i=1}^{m} ( ∆g_i(a, {x}) + ∆g_i(a, {y}) − h_i(a, {x, y}) ),
where ∆gi (a, {z}) captures the effect of flipping a single variable z on the satisfaction of clause ci , and hi (a, {x, y}) is an adjustment term that captures the possible interference of flipping x and y simultaneously.
Clearly, if a clause ci does not contain occurrences of both x and y , we have hi (a, {x, y}) := 0. Otherwise, for a clause ci that contains x as well as y , the following cases can be distinguished:
• Clause ci is unsatisfied under a. In this case, we use the adjustment term hi (a, {x, y}) := w(ci ); since the flip of either variable in {x, y} renders the clause satisfied, this adjustment is needed to prevent double-counting the satisfaction of the clause as a result of the 2-flip.
• Clause ci is critically satisfied under a, that is, it contains exactly one satisfied literal l. In this case, if l is not equal to x or y or either of their negations, we use hi(a, {x, y}) := 0. Otherwise, we need to use hi(a, {x, y}) := −w(ci) to account for the fact that the 2-flip leaves the satisfaction status of the clause unchanged.

• Clause ci contains exactly two satisfied literals under a. If those two literals contain both variables x and y, we use hi(a, {x, y}) := −w(ci), since flipping both variables would render ci unsatisfied; otherwise, hi(a, {x, y}) := 0.

• Clause ci contains more than two satisfied literals under a. In this case, neither any single flip nor the 2-flip can render ci unsatisfied, and we can use hi(a, {x, y}) := 0.
• Clause ci contains exactly two satisfied literals under a. If those two literals contain both variables x and y , we use hi (a, {x, y}) := −w(ci ) since flipping both variables would render ci unsatisfied otherwise, hi (a, {x, y}) := 0. • Clause ci contains more than two satisfied literals under a. In this case, neither any single flip nor the 2-flip can render ci unsatisfied, and we can use hi (a, {x, y}) := 0. The values ∆g(a, {x, y}) that provide the basis for comparing neighbouring candidate solutions and for incrementally updating the evaluation function value after each search m step can be easily determined from the values h(a, {x, y}) := i=1 hi (a, {x, y}) and m ∆g(a, {z}) := i=1 ∆gi (a, {z}). These values can be efficiently cached and updated after each search step using a simple extension of the mechanism described in the in-depth section on the efficient implementation of GSAT (page 271). It is important to note that only those values h(a, {x, y}) that are not equal to zero need to be memorised. Based on this observation, it can be shown that for a formula with n variables, m clauses, and maximal clause length l, in which no variable has more than t occurrences, this mechanism requires O(n + min{m · l2 , n2 }) memory in the worst case; the worst-case time complexity for a single search step is O(n2 + t · l2 ), under the (practically realistic) assumption that each value h(a, {x, y}) can be stored and retrieved in constant time. This mechanism can be generalised to k -flip neighbourhoods with k > 2; in that case, the worst-case memory requirement is O(n + min{m · lk , nk }) and the worst-case time complexity for a single search step is O(2k · nk + k · t · lk ). Under some additional assumptions, it can be shown that the expected memory requirement is only O(n + m) and the expected time complexity of a search step is O(2k · nk + t) for any constant k . A second technique, which achieves further improvements in efficiency when using the 2-flip neighbourhood, is based on the following observation. If both, m ∆g(a, {x}) := m i=1 gi (a, {x}) and ∆g(a, {y}) := i=1 gi (a, {y}), are larger or equal to zero, then flipping x and y can only result in an improvement in evaluation function m value if h(a, {x, y}) := i=1 hi (a, {x, y}) < 0. Hence, any search for improving neighbours in the 2-flip neighbourhood can be restricted to all those pairs of variables x, y for which h(a, {x, y}) < 0. It can be shown that this restriction reduces the size of the neighbourhood to O(n + m · l) in the worst case; an average-case analysis indicates that, under certain conditions, the expected size of this restricted neighbourhood is at most n + 3/4 · m. This leads to a further reduction of the worst-case time complexity to O(n + m · l + t · l2 ) for a single search step.
A similar approach leads to a reduced 3-flip neighbourhood of size O(m·l³ + n·t²·l²) in the worst case and of O(m + n·t²) in the average case, with a worst-case time complexity of O(m·l³ + n·l²·t²) for a single search step.
Non-Oblivious SLS Algorithms for MAX-SAT

All SLS algorithms for SAT and MAX-SAT discussed so far use evaluation functions that are oblivious in the sense that they are not affected by the degree of satisfaction of any given clause c, that is, by the number of literals that are satisfied in c under a given assignment. Non-oblivious evaluation functions, in contrast, reflect the degree of satisfaction of the clauses satisfied by a given variable assignment. Theoretical analyses have shown that iterative improvement local search achieves better worst-case approximation ratios for unweighted MAX-SAT when using non-oblivious evaluation functions instead of the standard, oblivious evaluation function that counts the number of clauses unsatisfied under a given assignment [Alimonti, 1994; Khanna et al., 1994]. In particular, using the non-oblivious evaluation functions g2(a) := 3/2 · w(S1(a)) + 2 · w(S2(a)) and g3(a) := w(S1(a)) + 9/7 · w(S2(a)) + 10/7 · w(S3(a)), where w(Si(a)) is the total weight of the set of all clauses satisfied by exactly i literals under assignment a, in conjunction with iterative improvement algorithms leads to worst-case approximation ratios of 4/3 and 8/7 for MAX-2-SAT and MAX-3-SAT, respectively. (Similar non-oblivious evaluation functions and respective approximation results exist for MAX-k-SAT with k > 3.)

Battiti and Protasi [1997a; 1997b] proposed and studied a number of SLS algorithms for MAX-SAT that make use of these non-oblivious evaluation functions. The simplest of these is an iterative best improvement algorithm; it can be seen as a variant of basic GSAT that terminates as soon as a local minimum state is reached. For this algorithm (applied to MAX-3-SAT), using the non-oblivious evaluation function g3 instead of the standard GSAT evaluation function leads to improved solution qualities; however, both of these algorithms perform significantly worse than GWSAT and SAMD, except when applied to weakly constrained Uniform Random 3-SAT instances. Furthermore, GSAT, GWSAT and GSAT/Tabu perform significantly worse when using a non-oblivious evaluation function [Battiti and Protasi, 1997a]. These observations suggest that the theoretical advantage of using non-oblivious evaluation functions manifests itself only in the worst case, or (more likely) that it does not apply to SLS methods more powerful than iterative improvement.
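As a concrete example, the non-oblivious evaluation function g3 for MAX-3-SAT can be computed as follows; the clause representation is an assumption, and in this sketch g3 is the quantity to be maximised by the iterative improvement procedure.

    def g3(clauses, weights, a):
        # g3(a) = w(S1(a)) + 9/7 * w(S2(a)) + 10/7 * w(S3(a)), where Si(a)
        # is the set of clauses with exactly i satisfied literals under a.
        totals = {1: 0.0, 2: 0.0, 3: 0.0}
        for clause, w in zip(clauses, weights):
            n_sat = sum(1 for v, pos in clause if a[v] == pos)
            if n_sat in totals:
                totals[n_sat] += w
        return totals[1] + 9.0 / 7.0 * totals[2] + 10.0 / 7.0 * totals[3]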
Note that the non-oblivious and oblivious evaluation functions discussed above have different local minima. Based on this observation, Battiti and Protasi designed a hybrid SLS algorithm that first performs non-oblivious iterative best improvement until a local minimum w.r.t. the non-oblivious evaluation function is reached, followed by an oblivious iterative best improvement phase that is continued beyond its first local minimum. This hybrid SLS algorithm reaches better solution qualities than SAMD for various Uniform Random 3-SAT test-sets, but its performance is inferior to GWSAT for long run-times [Battiti and Protasi, 1997b].

Better performance is achieved by H-RTS, a complex hybrid SLS algorithm that combines non-oblivious and oblivious iterative best improvement with an oblivious reactive tabu search procedure [Battiti and Protasi, 1997a]. H-RTS starts the search from a randomly chosen variable assignment; next, non-oblivious iterative best improvement (BIN) steps are performed until a local minimum (w.r.t. the non-oblivious evaluation function) is reached. Then, phases of oblivious iterative best improvement (BI) search and tabu search (TS) are alternated until the total number of variable flips performed since initialising the search reaches 10 · n, where n is the number of variables in the given MAX-SAT instance, at which point the search is re-initialised (see Figure 7.6). Each BI search phase ends when a local minimum w.r.t. the standard oblivious evaluation function is reached. The subsequent TS phase performs 2·(tt+1) steps of oblivious iterative best improvement tabu search with fixed tabu tenure tt.
[Figure 7.6 appears here: a GLSM with states RP, BIN, BI and TS(tt); the transition labels include DET: tt := ttinit, CDET(lmin(g)), CDET(not lmin(g)), CDET(lmin(g′)), CDET(not lmin(g′)), CDET(R), CDET(L): adjust(tt), and CDET(not (R or L)).]
Figure 7.6 GLSM representation of the H-RTS algorithm; R := mcount(10 · n + 1),
L := scount(2 · (tt + 1)), and the transition action adjust(tt) adjusts the tabu tenure setting. (For details, see text.)
When the search is initialised (or restarted), tt is set to a fixed value ttinit. After each TS phase, tt is adjusted based on the Hamming distance covered within that search phase (i.e., the number of variables that are assigned different truth values immediately before and after the 2·(tt+1) TS steps): if that distance is small, the tabu tenure is increased in order to diversify the search; if the distance is large, the tabu tenure is decreased to keep the search process focused on promising regions of the search space. Additionally, an upper and a lower bound on the tabu tenure are imposed (for details, see Battiti and Protasi [1997a]).

H-RTS has been applied to various sets of unweighted Uniform Random 3-SAT and Uniform Random 4-SAT instances. In terms of solution quality achieved after a fixed number of search steps (variable flips), H-RTS performs significantly better than basic GSAT, GWSAT and GSAT/Tabu, especially for large, highly constrained problem instances [Battiti and Protasi, 1997a]. Furthermore, H-RTS shows substantially more robust performance w.r.t. the initial tabu tenure setting ttinit than GSAT/Tabu w.r.t. its tabu tenure parameter tt. When it was first proposed, H-RTS was one of the best-performing algorithms for unweighted MAX-SAT; however, it typically seems unable to reach the performance of some more recent SLS algorithms for MAX-SAT, such as IRoTS. Interestingly, there is some evidence that the performance of H-RTS does not significantly depend on the initial non-oblivious local search phase, but is rather due to the (oblivious) reactive tabu search procedure.
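The reactive tenure adjustment can be sketched as follows; the thresholds and the unit increments are illustrative assumptions, not the original H-RTS rule, which is given in detail by Battiti and Protasi [1997a].

    def adjust_tabu_tenure(tt, hamming_dist, phase_len, tt_min, tt_max,
                           low=0.25, high=0.75):
        # Increase tt when the TS phase covered little ground (diversify);
        # decrease it when the phase covered much ground (intensify).
        coverage = hamming_dist / max(1, phase_len)
        if coverage < low:
            tt += 1
        elif coverage > high:
            tt -= 1
        return max(tt_min, min(tt_max, tt))  # respect the imposed bounds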
7.3 SLS Algorithms for MAX-CSP

MAX-CSP generalises CSP analogously to the way in which MAX-SAT generalises SAT: given a CSP instance, the objective is to satisfy as many constraints as possible. The importance of MAX-CSP resides in the fact that it is one of the simplest extensions of CSP to constraint optimisation problems; as such, it is typically used as a first step for extending algorithmic techniques for CSP solving to optimisation problems. As in Chapter 6, we will focus on finite discrete MAX-CSP, where the domains of all CSP variables are finite and discrete.
The MAX-CSP Problem

The simplest case of MAX-CSP gives all constraints the same importance, and the goal is to maximise the number of satisfied constraints.
Definition 7.4 Unweighted MAX-CSP
Given a CSP instance P := (V, D, C) as in Definition 6.2 (page 293), let f(a) be the number of constraints violated under variable assignment a, and let m be the number of constraints in P. The (Unweighted) Maximum Constraint Satisfaction Problem (MAX-CSP) is to find a∗ ∈ argmin_{a ∈ Assign(P)} f(a) or, equivalently, a∗ ∈ argmax_{a ∈ Assign(P)} (m − f(a)), that is, a variable assignment a∗ that maximises the number of satisfied constraints in P.
As in MAX-SAT, maximising the number of satisfied constraints is equivalent to minimising the number of unsatisfied constraints; in the following we consider MAX-CSP as a minimisation problem. Note that CSP is the decision variant of MAX-CSP in which the objective is to determine whether there is a CSP variable assignment that simultaneously satisfies all constraints. The evaluation variant and the associated decision problems are defined as in the case of MAX-SAT. The MAX-CSP problem arises in the context of overconstrained CSP instances, in which it is impossible to satisfy all given constraints simultaneously. In such cases, the assumption underlying unweighted MAX-CSP that all constraints are equally important is often not appropriate. To address this issue, the MAX-CSP formalism can be extended, similarly to MAX-SAT, to include constraint weights that explicitly represent the importance of satisfying specific constraints of a given CSP instance.

Definition 7.5 Weighted CSP Instance
A weighted CSP instance is a pair (P, w), where P is a CSP instance and w : {C1 , . . . , Cm } → R+ is a function that assigns a positive real value to each constraint Ci of P ; w(Ci ) is called the weight of constraint Ci . (Without loss of generality, we assume that all constraints in P are pairwise different.)
The objective in weighted MAX-CSP is to find a CSP variable assignment for a given weighted CSP instance that minimises the total weight of the unsatisfied constraints.
Definition 7.6 Weighted MAX-CSP
Given a weighted CSP instance P′ := (P, w), let f(a) be the total weight of the constraints of P violated under CSP variable assignment a, that is,
f(a) := Σ_{Ci ∈ CU(a)} w(Ci), where CU(a) is the set of all constraints of P violated under assignment a. The Weighted Maximum Constraint Satisfaction Problem (Weighted MAX-CSP) is to find a variable assignment a∗ that maximises the total weight of the satisfied constraints in P, that is, a∗ ∈ argmin_{a ∈ Assign(P)} f(a) or, equivalently, a∗ ∈ argmax_{a ∈ Assign(P)} (f̃ − f(a)), where f̃ := Σ_{i=1}^{m} w(Ci) is the total weight of all constraints in P.
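A direct implementation of this evaluation function is straightforward; in the following sketch, each constraint is assumed to be given extensionally as a scope together with the set of allowed value tuples.

    def weighted_maxcsp_eval(constraints, weights, assignment):
        # f(a): total weight of the constraints violated under `assignment`.
        total = 0.0
        for (scope, allowed), w in zip(constraints, weights):
            if tuple(assignment[v] for v in scope) not in allowed:
                total += w
        return total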
The constraint weights reflect the different priorities in satisfying the respective constraints. They can be used to encode problems that involve hard constraints, which must be satisfied in any feasible solution, as well as soft constraints, which represent an optimisation goal; problems of this type occur in many real-world applications.

MAX-CSP is an NP-hard problem, because it is a generalisation of CSP, which itself is NP-complete (cf. Chapter 6). As might be expected, even finding high-quality suboptimal solutions for MAX-CSP is difficult in the worst case: for k-ary MAX-CSP with domains of size d, achieving an approximation ratio of d^{k−2√(k+1)+1} − ε is NP-hard for any constant ε > 0 [Engebretsen, 2000]. The best theoretical worst-case performance guarantees known to date have been shown for an algorithm based on linear programming and randomised rounding, which achieves an approximation ratio of d^{k−1} [Serna et al., 1998].
Randomly Generated and Structured MAX-CSP Instances

Algorithms for MAX-CSP have mostly been evaluated on instances that are randomly generated according to the Uniform Random Binary CSP model described in Chapter 6, Section 6.5. This generative model has four parameters: the number of CSP variables, n; the domain size for each CSP variable, k; the constraint graph density, α; and the constraint tightness, β. In the context of MAX-CSP, these parameters are typically chosen in such a way that the resulting instances are unsatisfiable [Wallace, 1996a; Galinier and Hao, 1997]. Random weighted MAX-CSP instances are obtained by assigning randomly chosen weights to the constraints; these are typically sampled from a uniform distribution over a given range of integers [Lau, 2002].

Other combinatorial optimisation problems from a wide range of application areas can be encoded into MAX-CSP in a straightforward way. One example of such a problem is university examination timetabling: given a set of examinations, a set of time-slots, a set of students, and for each student, a set of examinations to be taken, the objective is to assign a set of examinations to a set of time slots such that certain hard constraints are satisfied and additional
criteria are optimised. (For simplicity's sake, this version of the problem does not capture room assignments.) A typical hard constraint is to forbid any temporal overlaps between the examinations taken by the same student; a typical example of a soft constraint is to maintain a minimum temporal distance between any pair of examinations for the same student (for an extensive list of possible constraints found in real-life examination timetabling problems, see Burke [1996]). A set of benchmark instances that has commonly been used to evaluate algorithms for examination timetabling with exactly these two types of constraints has been defined by Carter et al. [1996]. In particular, the soft constraints penalise timetables in which the temporal distance ∆t between two exams taken by the same student is less than six time slots; the penalty is 6 − ∆t if ∆t < 6 and zero otherwise. In the weighted MAX-CSP formulation, this penalty is represented by five constraints for every pair of examinations taken by the same student; each of these is violated if the temporal distance between the two examinations is equal to ∆t time slots, where 0 < ∆t < 6, and has a weight of 6 − ∆t. The hard constraints, which forbid overlapping time slots for exams taken by the same student, are assigned a weight larger than the sum of the weights of all soft constraints.
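This soft-constraint construction can be generated mechanically, as in the following sketch; the tuple representation of constraints is an assumption, and checking a constraint against a timetable (i.e., against slot assignments for the exams) is left to the surrounding CSP model.

    from itertools import combinations

    def exam_soft_constraints(exams_of_student):
        # For each pair of exams taken by the same student and each
        # distance dt in 1..5, emit a constraint of weight 6 - dt that is
        # violated when the two exams are exactly dt time slots apart;
        # dt = 0 is handled by the (heavily weighted) hard constraint.
        constraints = []
        for e1, e2 in combinations(exams_of_student, 2):
            for dt in range(1, 6):
                constraints.append((e1, e2, dt, 6 - dt))
        return constraints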
Example 7.5 The Radio Link Frequency Assignment Problem
A widely used set of FAP benchmark instances stems from the Radio Link Frequency Assignment Problem (RLFAP), a particular FAP variant that has been extensively studied in the EUCLID CALMA project (see Eisenblätter and Koster [2003]). In this project, 11 RLFAP instances stemming from a military communications application were provided by CELAR (Centre d'Électronique de l'Armement, France); these instances are all based on simplified data from a real network of field phones. Each of the CELAR instances specifies a set of radio links that need to be established between pairs of sites, where each site has an associated list of possible frequencies it may use. The goal is to assign frequencies to the sites
such that interference is avoided or, if that is impossible, minimised. For this purpose, for each pair of links (i, j), a separation constraint |freq(i) − freq(j)| ≥ dij is given, where dij is the minimum distance required between the frequencies freq(i) and freq(j) assigned to links i and j in order to avoid interference; in reality, these distance values depend on the position of the sites as well as on the physical environment. (Note that the separation constraints can be seen as an extension of the binary inequality constraints found in graph colouring; see also Chapter 10, page 477 for a discussion of such extensions.) In the CELAR instances, there are additional binary constraints representing the fact that each communication connection between two sites requires two radio links, one for each direction of communication between the given sites, which are separated by a fixed difference in frequency.

Depending on the nature of specific CELAR instances, three different optimisation criteria are considered. These amount to finding a frequency assignment that

1. causes no interference and uses a minimal number of different frequencies (and hence, maximises the number of unused frequencies that may later be utilised for additional links); or

2. causes no interference and minimises the maximum frequency used (and hence, maximises the unused portion of the frequency spectrum); or

3. minimises the weighted sum of the violated constraints.

For the third objective, which is used for overconstrained, unsatisfiable instances, each constraint is assigned one of four weights, according to the priority of the respective links. The quality of a frequency assignment is measured based on respective violations of the interference constraints by using the objective function
f(a) := w1 · nc1(a) + w2 · nc2(a) + w3 · nc3(a) + w4 · nc4(a),

where w1, . . . , w4 are the four different constraint weights, and nci(a) is the number of constraints with weight wi that are violated under assignment a. For some instances, additional mobility constraints are considered, which model the situation that some connections have pre-assigned frequencies whose modification is either costly or impossible. The size of the CELAR instances ranges from 200 connections and 1 235 constraints to 916 connections and 5 744 constraints. The structure of each instance can be described by an interference graph G := (V, E), whose vertices correspond to the given links and whose edges represent the non-trivial
Figure 7.7 Interference graph for the RLFAP instance CELAR06 (for details, see text).
separation constraints between the respective links. (A separation constraint is trivial if, and only if, it is satisfied by all possible combinations of frequencies available for the respective links.) Figure 7.7 shows the interference graph for instance CELAR06, an overconstrained instance with 200 links and 1 332 separation constraints, in which the objective is to minimise the weighted sum of violated constraints. Through a series of research efforts over several years, most of the CELAR instances have been optimally solved; in this context, a number of SLS methods have been used (for details, see Eisenblätter and Koster [2003]).
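To make this weighted-violation objective concrete, the following minimal sketch evaluates f(a) for a given frequency assignment. The tuple-based constraint representation and all names are illustrative choices, not the CELAR data format.

```python
def rlfap_cost(freq, constraints, weights):
    """Weighted sum of violated separation constraints, f(a) in the text.

    freq: dict mapping each link to its assigned frequency
    constraints: iterable of (i, j, d_ij, prio) tuples, each encoding a
                 separation constraint |freq(i) - freq(j)| >= d_ij with
                 priority class prio in {1, 2, 3, 4}
    weights: dict mapping each priority class to its weight w_i
    """
    return sum(weights[prio]
               for i, j, d_ij, prio in constraints
               if abs(freq[i] - freq[j]) < d_ij)
```

For example, `rlfap_cost({1: 10, 2: 12}, [(1, 2, 5, 1)], {1: 4})` returns 4, since the two frequencies are only 2 apart while a separation of at least 5 is required.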
SLS Algorithms for Unweighted MAX-CSP

Because of the way SLS algorithms for CSP evaluate and minimise constraint violations in order to find solutions to a given CSP instance, these algorithms can generally be applied directly to unweighted MAX-CSP instances. Variants of the Min-Conflicts Heuristic (MCH; see Chapter 6, Section 6.6) were amongst the first SLS algorithms applied to unweighted MAX-CSP. Empirical results on a set of randomly generated MAX-CSP instances show that WMCH performs better than basic MCH and basic MCH with random restart [Wallace and Freuder, 1995]. Interestingly, a parametric study of WMCH's performance indicates that the performance-optimising setting of wp (the probability for executing random walk steps rather than basic MCH steps) depends on the number of constraints violated in the optimal solutions to the MAX-CSP: higher optimal solution quality values require smaller wp settings.

In a further experimental study, the performance of the same three MCH variants was compared with that of three other CSP algorithms:

• the Breakout Method, an early and relatively widely known dynamic local search method for the CSP [Morris, 1993];

• EFLOP, a hybrid SLS algorithm that combines iterative improvement with value propagation techniques [Yugami et al., 1994];

• weak commitment search, a complete method that uses the min-conflicts heuristic as a value ordering heuristic when extending a partial candidate solution, and that uses a restart mechanism if a partial candidate solution cannot be extended further without violating constraints [Yokoo, 1994]. (In the optimisation case, a partial candidate solution is typically abandoned if it has a higher weight than the incumbent candidate solution.)

While on a set of small randomly generated MAX-CSP instances (with 30 variables and domain size 5) WMCH did not perform significantly better than these three methods, it did achieve better performance on larger instances [Wallace, 1996a].

The best results for randomly generated MAX-CSP instances obtained so far were reported for the tabu search algorithm by Galinier and Hao (TS-GH) [Galinier and Hao, 1997]. Different from MCH variants, which in each step choose a variable involved in a conflict and then consider changing the value of this variable, TS-GH determines each search step by considering the set of all variable-value pairs (v, y) for which v occurs in a currently violated constraint (cf. Chapter 6, page 305; it may be noted that TS-GH had been applied to MAX-CSP before it was evaluated on soluble CSP instances).
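To make the first of these two step mechanisms concrete, the following minimal sketch performs a single WMCH step in place; the helper functions, the dict-based representation and the tie-breaking are illustrative assumptions rather than details of Wallace and Freuder's implementation.

```python
import random

def wmch_step(assign, domains, num_conflicts, conflict_vars, wp):
    """Perform one WMCH step in place.

    assign: dict mapping each CSP variable to its current value
    domains: dict mapping each variable to its list of possible values
    num_conflicts(a): number of constraints violated under assignment a
    conflict_vars(a): variables occurring in currently violated constraints
    wp: probability of a random walk step instead of a basic MCH step
    """
    var = random.choice(list(conflict_vars(assign)))
    if random.random() < wp:
        # random walk step: assign a value chosen uniformly at random
        assign[var] = random.choice(domains[var])
    else:
        # basic MCH step: choose a value for var that minimises the
        # number of violated constraints, breaking ties at random
        scores = {v: num_conflicts({**assign, var: v}) for v in domains[var]}
        best = min(scores.values())
        assign[var] = random.choice([v for v, s in scores.items() if s == best])
```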
On randomly generated MAX-CSP instances with up to 500 CSP variables and domain size 30, TS-GH has been shown to outperform WMCH: TS-GH reached the same solution quality as WMCH in about three to four times fewer search steps and found better quality solutions when allowed the same run-time (in terms of search steps). Due to the speed-up techniques used in efficient implementations, the search steps of TS-GH are only slightly more expensive than those of WMCH; the difference in CPU time per search step has been measured at about 15% [Galinier and Hao, 1997]. One may conjecture that the better performance of TS-GH compared to WMCH is a result of the larger neighbourhood searched by TS-GH in each single step. However, limited empirical results indicate that a variant of TS-GH that uses random walk instead of tabu search for escaping from local optima performs significantly worse than WMCH [Galinier and Hao, 1997]. On the other hand, it is known that the restriction of the neighbourhood to variables involved in conflicts is important for obtaining TS-GH’s high performance. There is some empirical evidence suggesting that the performance of TS-GH drops significantly if all variable–value pairs are considered (including those containing variables not involved in currently violated constraints) [Hao and Pannier, 1998].
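The following sketch outlines a single TS-GH step over this restricted neighbourhood. For clarity, it re-evaluates the objective for every candidate pair; the speed-up techniques mentioned above instead maintain incremental evaluation tables. The tabu bookkeeping and the aspiration handling shown here are plausible but simplified assumptions.

```python
def tsgh_step(assign, domains, num_conflicts, conflict_vars,
              tabu, it, tenure, best_so_far):
    """Perform one TS-GH step in place and return the new evaluation value.

    tabu maps (variable, value) pairs to the iteration until which
    reassigning that value is forbidden; tenure is the tabu tenure.
    A tabu pair is still admissible if it would improve on best_so_far
    (aspiration criterion).
    """
    best_pair, best_val = None, float('inf')
    for var in conflict_vars(assign):
        old = assign[var]
        for val in domains[var]:
            if val == old:
                continue
            cost = num_conflicts({**assign, var: val})
            admissible = tabu.get((var, val), -1) < it or cost < best_so_far
            if admissible and cost < best_val:
                best_pair, best_val = (var, val), cost
    if best_pair is not None:
        var, val = best_pair
        tabu[(var, assign[var])] = it + tenure  # forbid undoing this move
        assign[var] = val
    return best_val
```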
SLS Approaches to Weighted MAX-CSP

The previously described algorithms for unweighted MAX-CSP can be easily extended to weighted MAX-CSP. Somewhat surprisingly, so far this approach has remained largely unexplored. An exception is the work of Lau and Watanabe [1996] and Lau [2002], who developed an approximation algorithm for weighted MAX-CSP based on semidefinite programming and randomised rounding; for domain sizes two and three, this algorithm achieves an approximation ratio of 2.451. A variant of this algorithm that applies iterative improvement to the solution obtained from the approximation algorithm (APII) has been empirically compared to an SLS algorithm that consists of a greedy construction heuristic followed by an iterative improvement procedure (GII), as well as to an extension of MCH to weighted MAX-CSP. Applied to 'forced' instances, which are randomly generated in a way that guarantees their solubility, APII achieved substantially better solution qualities than GII and MCH. In these experiments, MCH and APII were allotted approximately the same run-time, while GII terminated in a local minimum within roughly 5% of this time. Furthermore, on the forced instances the approximation algorithm without the subsequent local search phase performed better than GII. However, on randomly generated instances that were not soluble by construction, the performance advantages observed for APII were less pronounced [Lau, 2002], which suggests that the
excellent performance of APII on forced instances may be an artifact induced by the instance generation process.
Overconstrained Pseudo-Boolean CSP

Pseudo-Boolean CSP can be seen as a restriction of CSP in which all variables have domains {0, 1}, but more expressive constraints are supported than the CNF clauses used in SAT or MAX-SAT (see also Chapter 6, page 300ff.). The Pseudo-Boolean CSP formalism can be extended to consider optimisation objectives in addition to the conventional, hard constraints. In the resulting Overconstrained Pseudo-Boolean CSP (OPB-CSP), optimisation goals are encoded as competing soft constraints. As in Pseudo-Boolean CSP, the hard constraints in OPB-CSP are of the form

$\sum_{i=1}^{n} a_{ij} \cdot a(x_i) \ge b_j$,
where the aij as well as the bj are rational numbers and a(xi) is the value of constraint variable xi under assignment a. (Note that analogous constraints that use any of the relations '≤', '<', '>' or '=' instead of '≥' can be represented using '≥' constraints only.) However, now we additionally consider a set of soft constraints of the same form, but with constants cij and dj instead of aij and bj. Given such an OPB-CSP instance P with m hard constraints and m′ soft constraints, the goal is to determine a variable assignment a that satisfies all hard constraints and minimises the total degree of soft-constraint violation, that is, the objective function $f(a) := \sum_{j=1}^{m'} \max\{0,\, d_j - \sum_{i=1}^{n} c_{ij} \cdot a(x_i)\}$, instead of the number of violated soft constraints [Walser, 1999; Walser et al., 1998]. Note that the definition of the objective function f relies on the algebraic structure of the constraints in conjunction with the use of numeric variables, and hence cannot be applied to weighted MAX-CSP, which allows arbitrary constraint relations. It may be noted that OPB-CSP with m hard constraints and m′ soft constraints is equivalent to the following formulation as an integer programming problem:
Minimise $f(x_1, \ldots, x_n) := \sum_{j=1}^{m'} \max\{0,\, d_j - \sum_{i=1}^{n} c_{ij} \cdot x_i\}$

subject to $\sum_{i=1}^{n} a_{ij} \cdot x_i \ge b_j \quad (j = 1, \ldots, m)$

$x_i \in \{0, 1\} \quad (i = 1, \ldots, n)$ (7.1)
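A direct transcription of this formulation may be helpful; the sketch below evaluates the soft-constraint objective and checks hard-constraint feasibility for a 0-1 assignment, with each constraint stored as a coefficient list (an illustrative representation).

```python
def soft_violation(x, soft):
    """Total degree of soft-constraint violation,
    f(x) = sum_j max{0, d_j - sum_i c_ij * x_i}."""
    return sum(max(0, d - sum(ci * xi for ci, xi in zip(c, x)))
               for c, d in soft)

def satisfies_hard(x, hard):
    """True iff x satisfies every hard constraint sum_i a_ij * x_i >= b_j."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) >= b for a, b in hard)
```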
Using this formulation, it can be relatively easily shown that every OPB-CSP instance can be converted into an equivalent integer linear program, and hence that OPB-CSP can be seen as a special case of integer linear programming [Walser, 1999]. The most straightforward way of applying SLS methods to this problem is to use an evaluation function that captures the violation of hard as well as soft constraints. The basic version of WSAT(PB), a well-known SLS algorithm for Pseudo-Boolean CSP (cf. Chapter 6, page 301), can be easily extended to OPB-CSP by using the evaluation function
$f(x_1, \ldots, x_n) := \sum_{j=1}^{m} \left|\sum_{i=1}^{n} a_{ij} \cdot x_i - b_j\right| + \sum_{j=1}^{m'} d\left(\sum_{i=1}^{n} c_{ij} \cdot x_i,\, d_j\right) \cdot w_j$, (7.4)
where d(x, y) := max{0, y − x} and the constraint weights wj are positive real numbers that can be used to bias the search process towards satisfying certain hard constraints [Walser, 1999; Walser et al., 1998]. (Using such weights has been shown to be important for achieving good performance on certain types of OPB-CSP instances.) To handle hard constraints efficiently, the WSAT(PB) variable selection strategy is extended by first randomly selecting an unsatisfied hard constraint with probability wph, while a violated soft constraint is chosen with probability 1 − wph, and then selecting from this constraint the variable to be flipped, according to the strategy described in Chapter 6 (page 301f.).

OPB-CSP can be extended by allowing ranges of integers instead of the variable domains {0, 1}. The resulting overconstrained integer programming (OIP) instances can be solved using WSAT(OIP), a generalisation of the WSAT(PB) algorithm that can handle integer variables. (Like OPB-CSP, OIP can be seen as a special case of integer programming.) Different from WSAT(PB), WSAT(OIP) allows modifications of the current value v of a given integer variable to values v′ with |v − v′| ≤ 2. An implementation of WSAT(OIP) is available from Walser's WSAT(OIP) webpage [Walser, 2003]; this supersedes the earlier implementation of WSAT(PB), which can be seen as a restricted variant of WSAT(OIP).

A large number of practically relevant problems can be formulated easily and naturally within the OPB-CSP and OIP frameworks. WSAT(OIP) has been tested on a variety of problems that can be encoded using Boolean variables. These problems include radar surveillance problems (which include soft constraints) and the Progressive Party Problem [Smith et al., 1996]. For both problems, WSAT(OIP) showed significantly improved performance over a state-of-the-art commercial integer programming package (CPLEX) and other methods for solving these problems. WSAT(OIP) also achieved excellent performance
on capacitated production planning and AI planning problems, which were represented using non-Boolean integer variables [Walser et al., 1998; Kautz and Walser, 1999].
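The following minimal sketch shows the two-stage constraint selection just described, together with an evaluation function in the form of Equation 7.4; the constraint representation and function names are assumptions for illustration, not Walser's actual implementation.

```python
import random

def evaluate(x, hard, soft):
    """Evaluation function in the form of Equation 7.4: summed deviations
    of hard constraints plus weighted violations of soft constraints."""
    h = sum(abs(sum(ai * xi for ai, xi in zip(a, x)) - b) for a, b in hard)
    s = sum(w * max(0, d - sum(ci * xi for ci, xi in zip(c, x)))
            for c, d, w in soft)
    return h + s

def select_constraint(x, hard, soft, wp_h):
    """Pick an unsatisfied hard constraint with probability wp_h, otherwise
    a violated soft constraint; the variable to flip would then be chosen
    from the selected constraint (not shown here)."""
    viol_hard = [c for c in hard
                 if sum(ai * xi for ai, xi in zip(c[0], x)) < c[1]]
    viol_soft = [c for c in soft
                 if sum(ci * xi for ci, xi in zip(c[0], x)) < c[1]]
    if viol_hard and (not viol_soft or random.random() < wp_h):
        return 'hard', random.choice(viol_hard)
    if viol_soft:
        return 'soft', random.choice(viol_soft)
    return None  # all constraints satisfied
```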
7.4 Further Readings and Related Work

MAX-SAT is one of the most widely studied simple combinatorial optimisation problems, and a wide range of SLS algorithms for MAX-SAT has been proposed and evaluated in the literature, including algorithms based on Simulated Annealing, GRASP, Variable Neighbourhood Search, Ant Colony Optimisation and Evolutionary Algorithms.

Hansen and Jaumard studied a Simulated Annealing algorithm that uses a Metropolis acceptance criterion and a standard geometric annealing schedule [Hansen and Jaumard, 1990]. This algorithm was found to perform worse than SAMD on various sets of unweighted Uniform Random k-SAT instances; however, in some cases, it reaches better quality solutions than SAMD, albeit at substantially higher run-times (both algorithms are terminated when no improvement in the incumbent solution has been observed for a specified number of search steps).

GRASP was one of the first SLS algorithms for weighted MAX-SAT [Resende et al., 1997]. It was originally evaluated on the wjnh instances described in Section 7.1, but was later found to be substantially outperformed by the first DLM algorithm for weighted MAX-SAT [Shang and Wah, 1997] and other state-of-the-art algorithms.

Recently, Variable Neighbourhood Search (VNS) has been applied to weighted MAX-SAT [Hansen and Mladenović, 1999; Hansen et al., 2000]. A variant called Skewed VNS, which accepts worse candidate solutions depending on the amount of deterioration in solution quality and the Hamming distance from the incumbent solution, was shown to perform much better than a basic version of VNS and a basic Tabu Search algorithm. However, it is not clear how Skewed VNS performs compared to state-of-the-art algorithms for weighted MAX-SAT, such as GLS or IRoTS.

Roli et al. have studied various Ant Colony Optimisation algorithms for CSP and MAX-CSP [Roli et al., 2001]. They mainly investigated different ways of using pheromones and presented limited computational results for their algorithms on a small set of MAX-SAT instances. These results indicate that their ACO algorithms (without using local search) perform substantially worse than state-of-the-art algorithms for MAX-SAT. Evolutionary Algorithms can be easily applied to MAX-SAT, because the candidate solutions can naturally be represented as binary strings, and standard crossover and mutation operators can be
applied in a straightforward way. Some insights into the behaviour of genetic algorithms for MAX-SAT have been obtained, but pure evolutionary algorithms (without local search) were found to perform relatively poorly [Rana, 1999; Rana and Whitley, 1998; Bertoni et al., 2000].

There are relatively few complete algorithms for MAX-SAT. Most of these are based either on branch & bound extensions of backtracking algorithms derived from the Davis-Logemann-Loveland (DLL) procedure [Davis et al., 1962] or on branch & cut approaches. A comparison of a DLL-based MAX-SAT solver by Borchers and Furman [1999b], which uses short runs of GWSAT for obtaining upper bounds on the solution quality, and a branch & cut algorithm by Joy, Mitchell and Borchers [1997] showed that the DLL-based approach performed significantly better than the branch & cut algorithm on MAX-3-SAT, while the branch & cut algorithm was found to be superior on MAX-2-SAT instances and MAX-SAT-encoded Steiner tree problems. The DLL algorithm of Borchers and Furman has recently been significantly improved by including more powerful lower-bounding techniques and variable selection heuristics [Alsinet et al., 2003]. It may be noted that the wjnh instances as well as some weighted MAX-SAT instances with up to 500 variables that were used to evaluate VNS [Hansen et al., 2000] were solved to optimality with CPLEX, a well-known general-purpose integer programming software. However, all of these methods appear to be substantially less efficient in finding high-quality solutions for large and hard MAX-SAT instances than state-of-the-art SLS algorithms [Resende et al., 1997; Hoos et al., 2003].

MAX-CSP has received considerable attention from the constraint programming community as a straightforward extension of CSP to optimisation problems. MAX-CSP is a special case of the Partial Constraint Satisfaction Problem, which involves finding values for a subset of variables satisfying only a subset of the constraints [Freuder, 1989; Freuder and Wallace, 1992]. More recently, two general frameworks for constraint satisfaction and optimisation have been introduced, Semi-Ring Based CSP [Bistarelli et al., 1997] and Valued CSP [Schiex et al., 1995]. So far, most research has concentrated on establishing formal comparisons of these frameworks or on adapting propagation techniques and complete algorithms to solve problems formulated within these frameworks; we are not aware of SLS algorithms for the latter two frameworks.

For MAX-CSP, significant research efforts have been directed towards the development of efficient complete algorithms. Since the first branch & bound algorithm for unweighted MAX-CSP [Freuder and Wallace, 1992], the lower-bounding techniques in particular have been significantly refined, leading to much better performing branch & bound algorithms [Wallace, 1994; 1996b; Larrosa et al., 1999; Kask and Dechter, 2001; Larrosa and Dechter, 2002; Larrosa and Meseguer, 2002].
Few results are available for SLS algorithms for MAX-CSP other than the ones described in this chapter. Kask compared the performance of an algorithm based on the Breakout Method with that of a state-of-the-art branch & bound algorithm and found that the former outperforms the latter on Random MAX-CSP with dense constraint graphs, while for sparse constraint graphs, the branch & bound algorithm is slightly faster than the breakout algorithm [Kask, 2000]. Hao and Pannier compared TS-GH to a Simulated Annealing algorithm for MAX-CSP; their computational results suggest that Simulated Annealing is clearly inferior to TS-GH [Hao and Pannier, 1998]. Battiti and Protasi extended H-RTS to the Maximum k-Conjunctive Constraint Satisfaction problem (MAX-k-CCSP), a special case of MAX-CSP with Boolean variables, in which each constraint corresponds to a conjunction of k literals [Battiti and Protasi, 1999].

As previously stated, the Overconstrained Integer Programming Problem (OIP) introduced by Walser is a special case of integer linear programming (ILP). There exist several SLS algorithms for 0–1 ILP (i.e., pseudo-Boolean optimisation) and general ILP. Computational results by Walser suggest that the 'general-purpose' Simulated Annealing strategy (GPSIMAN) by Connolly [1992] is outperformed by WSAT(OIP) on a variety of problems [Walser, 1999]. Extensions of GPSIMAN were later applied by Abramson to set partitioning problems [Abramson et al., 1996]. Abramson and Randall applied Simulated Annealing to encodings of optimisation problems into general ILP problems and later introduced a modelling environment based on dynamic list structures [Abramson and Randall, 1999]. Furthermore, there are adaptations of evolutionary algorithms and GRASP for integer linear programming [Pedroso, 1999; Neto and Pedroso, 2001].

Several SLS algorithms have been developed for solving the more general Mixed Integer Linear Programming Problem (MILP), which allows {0, 1} variable domains as well as continuous domains in the form of intervals over real numbers. Obviously, these algorithms can also be applied to pure ILP problems, which can be seen as a special case of MILP in which no continuous variables occur. For an overview of SLS algorithms for MILP we refer to a recent article by Løkketangen [2002].
7.5 Summary

The Maximum Satisfiability Problem (MAX-SAT) is the optimisation variant of SAT in which the goal is to find a variable assignment that maximises the number or total weight of satisfied clauses. As one of the conceptually simplest hard combinatorial optimisation problems, MAX-SAT is of considerable theoretical
interest. Furthermore, a diverse range of hard combinatorial optimisation problems, many of which have direct real-world applications, can be efficiently and naturally encoded into MAX-SAT. By using appropriately chosen clause weights and solution quality bounds, combinatorial optimisation problems with hard and soft constraints can be represented by weighted MAX-SAT instances.

Considerable effort has been spent in designing efficient (i.e., polynomial-time) approximation algorithms for MAX-SAT that have certain worst-case performance guarantees. For widely used types of benchmark instances, including test-sets of randomly generated MAX-SAT instances as well as MAX-SAT encodings of other combinatorial optimisation problems (such as set covering and timetabling), these approximation algorithms do not reach the performance of even relatively simple SLS algorithms. Furthermore, different from the situation for SAT, systematic search algorithms for MAX-SAT are substantially less efficient than SLS algorithms in finding high-quality solutions to typical MAX-SAT instances.

The most successful SLS algorithms for MAX-SAT fall into four categories: (i) tabu search algorithms, in particular Robust Tabu Search (RoTS) and Reactive Tabu Search (H-RTS); (ii) dynamic local search (DLS) algorithms, particularly Guided Local Search (GLS); (iii) iterated local search (ILS) algorithms, particularly Iterated Robust Tabu Search (IRoTS); and (iv) generalisations of high-performance SAT algorithms, in particular Novelty+ with weighted clause selection. Some of these algorithms achieve state-of-the-art performance on mildly overconstrained instances whose optimal solutions leave relatively few clauses unsatisfied (GLS as well as Novelty+ and its variants for weighted MAX-SAT seem to fall into this category), while others, such as IRoTS, appear to be state-of-the-art for highly overconstrained instances. All of these algorithms make use of information on the search history, mainly in the form of a tabu list or dynamically adjusted clause penalties.

There is some evidence that by using large neighbourhoods, such as reduced versions of the 2-flip and 3-flip neighbourhoods, high-performance ILS and Tabu Search algorithms for MAX-SAT can be further improved; these improvements, however, critically rely on efficient mechanisms for searching these larger neighbourhoods. On the other hand, although the use of non-oblivious evaluation functions, that is, evaluation functions that are not indifferent w.r.t. the number of literals that are simultaneously satisfied in a given clause, leads to theoretical and practical improvements in the performance of simple iterative improvement methods for unweighted MAX-SAT, there is little evidence that non-oblivious evaluation functions are instrumental in reaching state-of-the-art SLS performance on MAX-SAT instances of any type.

Although a wide range of other SLS methods has been applied to MAX-SAT, including Simulated Annealing, GRASP, Ant Colony Optimisation and
Evolutionary Algorithms, there is currently no evidence that any of these can achieve state-of-the-art performance.

The Maximum Constraint Satisfaction Problem (MAX-CSP) can be seen as a generalisation of CSP where the objective is to find a CSP variable assignment that maximises the number or total weight of satisfied constraints. Current empirical results suggest that the best performing SLS algorithms for CSP are also best for MAX-CSP; in particular, the tabu search algorithm by Galinier and Hao, TS-GH, appears to be the most efficient algorithm for unweighted MAX-CSP known to date. However, most existing experimental studies on SLS algorithms for MAX-CSP are limited to particular classes of randomly generated MAX-CSP instances. Furthermore, the potential of many advanced SLS methods, such as Dynamic Local Search or Iterated Local Search, in the context of MAX-CSP is largely unexplored. Overall, considerably more research is necessary to obtain a more complete picture of the relative performance and behaviour of SLS algorithms for MAX-CSP.

On the other hand, generalisations of WalkSAT to overconstrained pseudo-Boolean CSP and integer programming problems, which can be seen as special cases of MAX-CSP, have been used successfully to solve various application problems, and in many cases, they have been shown to achieve substantially better performance than specialised algorithms and state-of-the-art commercial optimisation tools. However, compared to state-of-the-art complete integer or constraint programming algorithms, SLS methods for these problems have been much less explored, very likely leaving considerable room for further improvement.
Exercises

7.1
[Easy] How can an implementation of a standard SLS algorithm for SAT, such as GSAT, be used (without modifications) for solving weighted MAX-SAT instances with integer clause weights? Discuss potential drawbacks of this approach to solving weighted MAX-SAT instances.
7.2
[Easy] Give an example that illustrates how duplication of clauses in a CNF formula can affect the optimal solutions of the corresponding MAX-SAT instance.
7.3
[Medium; Hands-On] Implement the SAMD algorithm and the TS-YI algorithm described in Section 7.2 (page 329f.) for weighted MAX-SAT (you can use the UBCSAT code available from www.sls-book.net as a convenient
implementation framework). Analyse the performance of the two algorithms on the test-sets rnd100-500/w500-100, rnd100-500/w500-200 and rnd100-500/w500-500 from www.sls-book.net, using appropriate computational experiments and empirical evaluation methods. Based on your observations, formulate a hypothesis on the dependency of the algorithms' performance on the variability of the clause weight distributions; describe further experiments that could be conducted in order to test your hypothesis.

7.4
[Easy] Consider the instance of the Weighted Set Covering Problem specified by the following diagram
[Diagram: five overlapping sets, labelled A1 to A5, covering a collection of elements.]
and let w(Ai) := wi (i ∈ {1, . . . , 5}) be arbitrary weights. Represent this problem (a) as a weighted (discrete finite) MAX-CSP instance; (b) as a weighted MAX-SAT instance.

7.5
[Medium] Given a Min-Cost GCP instance G, prove that (i) the optimal solutions of the MAX-SAT instance F(G), obtained from the encoding described in Section 7.1 (page 319f.), correspond exactly to the optimal solutions of G and that (ii) under the 1-flip neighbourhood, the locally optimal candidate solutions of F(G) correspond exactly to the k-colourings of G.
7.6
[Medium] Is it the case that any weighted MAX-CSP instance can be encoded into an equivalent, reasonably compact weighted MAX-SAT instance? If so, how big is the difference between the size of the original MAX-CSP instance and the MAX-SAT encoding?
7.7
[Medium] Give an extended definition of weighted MAX-SAT in which weights can be attached to arbitrary subformulae of a propositional formula. Discuss potential advantages and disadvantages of such a generalised version of weighted MAX-SAT, particularly w.r.t. solving such problems with SLS algorithms.
7.8
[Easy] Explain how an examination timetabling problem can be represented as an instance of weighted MAX-CSP. Exemplify your encoding by applying it to a small examination timetabling instance with four exams, six students and four time slots.
7.9
[Medium] Extend the definition of weighted MAX-CSP such that penalties can be specified for assignments of particular values to CSP variables. Does this increase the representational power of weighted MAX-CSP?
7.10 [Medium] Formulate the Frequency Assignment Problem (introduced in Section 7.3, page 343ff.) (a) as a pseudo-Boolean optimisation problem; (b) as a weighted (discrete finite) MAX-CSP instance.
Traveller, there is no path, paths are made by walking. —Antonio Machado, Poet
Travelling Salesman Problems

The Travelling Salesman Problem (TSP) is probably the most widely studied combinatorial optimisation problem and has attracted a large number of researchers over the last five decades. Work on the TSP has been a driving force for the emergence and advancement of many important research areas, such as stochastic local search or integer programming, as well as for the development of complexity theory. Apart from its practical importance, the TSP has also become a standard testbed for new algorithmic ideas. In this chapter we first give a general overview of TSP applications and benchmark instances, followed by an introduction to the most basic local search algorithms for the TSP. Based on these algorithms, several SLS algorithms have been developed that have greatly improved the ability to find high-quality solutions for large instances. We give a detailed overview of iterated local search algorithms, which are currently among the most successful SLS algorithms for large TSP instances, and present several prominent, high-performance TSP algorithms that are based on population-based SLS methods. While most of this chapter focuses on symmetric TSPs, we also discuss aspects that arise in the context of solving asymmetric TSPs.
8.1 TSP Applications and Benchmark Instances

Given an edge-weighted, completely connected, directed graph G := (V, E, w), where V is the set of n := #V vertices, E the set of (directed) edges, and w : E → R+ a function assigning each edge e ∈ E a weight w(e), the Travelling Salesman Problem (TSP) is to find a minimum weight Hamiltonian cycle in G, that is, a cyclic path that contains each vertex exactly once and has minimal total weight (a formal definition was given in Chapter 1, page 20ff.). Following one of the most intuitive applications of the TSP, namely, finding optimal round trips through a number of geographical locations, the vertices of a TSP instance are often called 'cities', the paths in G are called 'tours' and the edge weights are referred to as 'distances'. In this chapter we focus mainly on the symmetric TSP, that is, the class of TSP instances in which for each pair of edges (vi, vj) and (vj, vi) we have w((vi, vj)) = w((vj, vi)). We will also highlight some of the issues that arise when dealing with the asymmetric TSP (ATSP), where for at least one pair of vertices the directed edges (vi, vj) and (vj, vi) have different weights.
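Throughout this chapter, a candidate solution can be represented as a permutation of the vertices; the following small sketch fixes this representation and computes the weight of a tour. The function-based weight accessor is an illustrative choice.

```python
def tour_weight(tour, w):
    """Total weight of a tour given as a vertex sequence; the closing
    edge from the last vertex back to the first is included."""
    n = len(tour)
    return sum(w(tour[i], tour[(i + 1) % n]) for i in range(n))
```

For a symmetric TSP instance, w(u, v) == w(v, u) for all vertex pairs, so a tour and its reversal have the same weight; for an ATSP instance this no longer holds.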
TSP as a Central Problem in Combinatorial Optimisation

The TSP plays a prominent role in research as well as in a number of application areas. The design of increasingly efficient TSP algorithms has provided a constant intellectual challenge, and many of the most important techniques for solving combinatorial optimisation problems were developed using the TSP as an example application. This includes cutting planes in integer programming [Dantzig et al., 1954], which led to the modern, high-performing branch & cut methods [Grötschel and Holland, 1991; Padberg and Rinaldi, 1991; Applegate et al., 1998; 2003a], polyhedral approaches [Grötschel and Padberg, 1985; Padberg and Grötschel, 1985], branch & bound algorithms [Little et al., 1963; Held and Karp, 1971], as well as early local search algorithms [Croes, 1958; Flood, 1956; Lin, 1965; Lin and Kernighan, 1973]. Additionally, many of the general SLS methods presented in Chapter 2, such as Simulated Annealing or Ant Colony Optimisation, were first tested on the TSP. The TSP also played an important role in the development of computational complexity theory [Garey and Johnson, 1979]. In fact, several books are entirely devoted to the TSP [Gutin and Punnen, 2002; Lawler et al., 1985; Reinelt, 1994], and an enormous number of research articles cover the various aspects of TSP solving. For details on the history of TSP solving, we refer to Schrijver's overview paper on the history of combinatorial optimisation [Schrijver, 2003], the book chapter by Hoffman and Wolfe [1985] and the web page by Applegate et al. [2003b].

There are various reasons for this central role of the TSP in combinatorial optimisation. Firstly, it is a conceptually simple problem, which is easily explained and understood, but as an NP-hard problem, it is difficult to solve [Garey and Johnson, 1979]. Secondly, the design and analysis of algorithms for the TSP are not obscured by technicalities that arise from dealing with side constraints, which
are often difficult to handle in practice. Thirdly, the TSP is now established as a standard testbed for new algorithmic ideas, which are often assessed based on their performance on the TSP. Fourthly, given the significant amount of interest in the research community, new contributions to TSP solving or insights into the problem structure are likely to have a large impact. Finally, the TSP arises in a variety of applications and is therefore of significant practical relevance.
Benchmark Instances

Extensive computational experiments have always played an important role in the history of the TSP. These experiments involve several types of TSP instances. In many cases, these are predominantly metric TSP instances, that is, instances in which the vertices correspond to points in a metric space and the edge weights correspond to metric distances between pairs of points. Metric TSP instances for which the distances are based on the standard Euclidean metric are also called Euclidean. Regardless of whether they are metric or not, almost all available TSP benchmark instances use integer distances. One main reason is that on older computers, integer computations were much faster than computations using floating point numbers; high precision approximations of the true distances in metric spaces can be achieved by multiplying all floating point distances in a given TSP instance with a constant factor and subsequent rounding.

A well-known and widely used collection of TSP instances is available through TSPLIB, a benchmark library for the TSP [Reinelt, 2003]. TSPLIB comprises more than 100 instances with up to 85 900 cities. For all except four TSPLIB instances, optimal solutions have been determined; as of September 2003, the largest instance solved provably to optimality has 15 112 cities. Most of the TSPLIB instances stem from influential studies on the TSP; many of them originate from practical applications, such as minimising drill paths in printed circuit board manufacturing, positioning detectors in X-ray crystallography or finding the shortest round trip through all the Biergärten in Augsburg, Germany.* Many of the remaining TSPLIB instances are of geographical nature, where the intervertex distances are derived from the distances between cities and towns with given coordinates. Two examples of TSPLIB instances are shown in the upper part of Figure 8.1. A set of TSP instances derived from problems in VLSI design, ranging from 131 to 744 710 vertices, is available from the web page by Applegate et al. [2003b].

* Augsburg is close to the home town of one of the authors (T. S.); however, T. S. never managed to visit all Biergärten in one night. The reason for this failure may well be that at the time he lived there, he was not yet aware of the shortest tour.
Figure 8.1 Four (Euclidean) TSP benchmark instances. The two instances in the top row stem from an application in which drill paths in manufacturing printed circuit boards are to be minimised in length (left side: TSPLIB instance pcb1173 with 1 173 vertices; right side: fl1577 with 1 577 vertices; the latter instance shows a pathological clustering of vertices). The bottom row shows a Random Uniform Euclidean instance (left side) and a Randomly Clustered Euclidean instance (right side); both instances have 3 162 vertices each.
From the same web page, a TSP instance of potential interest for globetrotters is available; this World TSP instance comprises all 1 904 711 populated cities or towns registered in the National Imagery and Mapping Agency database and the Geographic Names Information System; several additional TSP instances comprising the towns and cities of individual countries are available from the same site. An overview of further practical applications of the TSP can be found on the web page by Applegate et al. [2003b] or in the book by Reinelt [1994]. A large part of the experimental (and also theoretical) research on the TSP has used randomly generated instances with the most widely used classes being Random Euclidean (RE) instances and Random Distance Matrix (RDM) instances. In RE instances, the vertices correspond to randomly placed points in an l-dimensional hypercube, and the edge weights are given by the pairwise Euclidean distances between these points. (The Euclidean distance between two points
x := (x1, . . . , xl) and y := (y1, . . . , yl) is defined as $d(x, y) := \sqrt{\sum_{i=1}^{l} (x_i - y_i)^2}$. Commonly, real-valued distances are scaled by a constant factor α and subsequently rounded or truncated to obtain integer values.) Most experimental studies involving RE instances have focused on two-dimensional instances (i.e., l = 2), in which the points are uniformly distributed in a square; we refer to these as Random Uniform Euclidean (RUE) instances. An example of an RUE instance is shown in Figure 8.1 (bottom left plot). The class of RUE instances has the interesting property that, as the instance size approaches infinity, the ratio of the optimal tour length to √n (where n is the number of vertices) converges towards a constant γ [Beardwood et al., 1959]; for squares with sides of length one, the value of γ is approximately 0.721 [Johnson et al., 1996; Percus and Martin, 1996].

Another type of two-dimensional RE instances, which have been used in the 8th DIMACS Implementation Challenge on the TSP [Johnson et al., 2003a], places the points in clusters within a square area. More precisely, these Random Clustered Euclidean (RCE) instances are obtained by first distributing the cluster centres uniformly at random; then, each actual point is placed by choosing a cluster centre uniformly at random, and then adding to each coordinate a displacement sampled from a normal distribution. An example of an RCE instance is shown in Figure 8.1 (bottom right plot). RCE instances are interesting because it is known that various local search algorithms are negatively affected by the clustering of the vertices.

RDM instances are symmetric, non-Euclidean instances in which the edge weights are randomly chosen integers from a given interval. This distribution of TSP instances is known to pose a considerable challenge for many SLS algorithms [Johnson and McGeoch, 1997]. However, since they are conceptually and structurally far removed from any application, these instances are mainly of theoretical interest.
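The two random instance classes are easy to generate; the sketch below follows the descriptions above, with square size and displacement parameters chosen for illustration only (they are not the values used by the DIMACS generators).

```python
import random

def rue_instance(n, size=1_000_000):
    """Random Uniform Euclidean instance: n points drawn uniformly
    at random from a square with side length `size`."""
    return [(random.uniform(0, size), random.uniform(0, size))
            for _ in range(n)]

def rce_instance(n, n_clusters, size=1_000_000, sigma=10_000):
    """Random Clustered Euclidean instance: cluster centres placed
    uniformly at random; each point displaces a randomly chosen centre
    by normally distributed offsets in both coordinates."""
    centres = rue_instance(n_clusters, size)
    points = []
    for _ in range(n):
        cx, cy = random.choice(centres)
        points.append((random.gauss(cx, sigma), random.gauss(cy, sigma)))
    return points

def edge_weight(p, q):
    """Rounded Euclidean distance, yielding the integer edge weights
    used by most benchmark instances."""
    return round(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5)
```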
Lower Bounds on the Optimal Solution Quality

A large amount of research effort on the TSP has been dedicated to finding good lower bounds on the optimal solution quality for given instances. Lower bounds are used in complete search methods, such as branch & bound algorithms, to estimate the minimum cost incurred for completing partial solutions and to prune search trees if the cost estimate is larger than or equal to the quality of the best solution encountered earlier in the search process. In this context, it is important to have estimates that are close to the real costs, because better estimates facilitate more extensive pruning of the search tree. In general, lower bounds are also
useful for assessing the quality of solutions obtained from incomplete algorithms. This is particularly the case for lower bounds obtained by deterministic methods, which are often used for the evaluation of SLS algorithms in cases where optimal solution qualities are unknown.

A general approach for obtaining lower bounds is to solve a relaxation of the original problem, which is typically obtained by removing some problem constraints. Feasible solutions to the original problem then correspond to a subset of the solutions to the relaxed problem, and an optimal solution to the relaxed problem is therefore always a true lower bound for the solution quality of the original problem. Efficient lower-bounding techniques are based on relaxations resulting in problems that can be solved quickly but at the same time have an optimal solution quality close to that of the original problem.

One of the simplest lower bounds for a TSP instance G is based on the following observation: By removing a single edge from an optimal tour s∗ with weight w(s∗), a spanning tree t of the graph G is obtained. (Recall that a spanning tree in a weighted graph is a subgraph that contains all vertices and has no cycles; the weight of a spanning tree is defined to be the sum of the weights of the edges it contains.) Clearly, a minimum weight spanning tree t∗ has a total edge weight w(t∗) ≤ w(t), and hence w(t∗) is a lower bound for w(s∗). This lower bound is also quick to compute. In fact, the well-known algorithms of Kruskal and Prim run in time O(m log n), where m is the number of edges; by using Fibonacci heaps, the time-complexity of Prim's algorithm can be reduced to O(m + n log n). (For the best known bounds on the complexity of computing minimum spanning trees, see Chazelle [2000].) However, this spanning tree bound can still be relatively far from the optimal solution value.

Tighter bounds than the minimum spanning tree lower bound can be obtained as follows: Let G \ {v1} be the graph obtained from G by deleting vertex v1 and all the edges incident to v1. A one-tree is a spanning tree on the vertices v2, v3, . . . , vn plus two edges incident to vertex v1 (see Figure 8.2 for an example of a one-tree). We get a minimum weight one-tree for G by computing a minimum spanning tree of G \ {v1} and adding the two minimum weight edges incident to v1. The weight of the resulting one-tree is a lower bound for w(s∗), because every minimum weight tour s∗ of G is a one-tree. This lower bound could be improved by choosing several or all vertices to play the role of v1 and then taking the maximum weight of the corresponding one-trees as a lower bound; however, this does not result in significant gains and is quite time-consuming [Reinelt, 1994; Cook et al., 1997].

Luckily, there exist other techniques to improve upon the one-tree bound. These are based on the following observation: We can assign a value pi to each vertex vi and add these values to the weights of all edges incident to a vertex vi ∈ V, resulting in a graph G′ := (V, E, w′) with edge weights
Figure 8.2 Example of a one-tree in a graph of ten vertices. The subtree on vertices v2 to v10 forms a minimum spanning tree.
w′((vi, vj)) := w((vi, vj)) + pi + pj (recall that the edges are not oriented). This has the effect of increasing the weight of each tour in G by a constant amount of $2 \cdot \sum_{i=1}^{n} p_i$ in G′. Clearly, this transformation preserves the optimality of tours. It may, however, result in different optimal one-trees [Held and Karp, 1971; Cook et al., 1997]; then, by subtracting $2 \cdot \sum_{i=1}^{n} p_i$ from the weight of an optimal one-tree of G′, a lower bound on the minimum weight tour in G is obtained. The quality of this lower bound depends on the values of the vertex penalties p1, . . . , pn. The Held-Karp (HK) lower bound is obtained from the set of vertex penalties p̂1, . . . , p̂n that maximises the value of the resulting lower bound.

The exact Held-Karp bound can be computed by algorithms based on linear programming. As an alternative, Held and Karp proposed an algorithm that iteratively modifies the penalty assignments p1, . . . , pn [Held and Karp, 1971]. Roughly speaking, this algorithm iteratively decreases the penalties for vertices with degree one in the current optimal one-tree (making their incident edges more attractive), and increases the penalties for vertices with degree greater than two; the search is terminated when a minimum weight one-tree is obtained in which all vertices have degree two (this corresponds to a feasible solution of G), or when a maximum number of iterations has been performed. The lower-bounding technique of Held and Karp is an example of the more general method called Lagrangian Relaxation, and the iterative penalty adjustment algorithm is an instance of a subgradient optimisation method [Fisher, 1981; 1985; Held et al., 1974].

Experimental results suggest that for many types of TSP instances, the Held-Karp bounds are very tight. For RUE instances, the HK lower bound is typically within less than one percent of the actual optimal solution value [Johnson et al., 1996]. For TSPLIB instances, the gap between the HK bound and the respective optimum solution quality is often slightly larger, but for almost all instances the HK bound is still within two percent of the optimal solution quality.
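To illustrate these bounds, the sketch below computes a minimum one-tree (Prim's algorithm on G \ {v0} plus the two lightest edges at v0) and iterates penalty updates in the spirit of Held and Karp's scheme. The constant step size is a deliberate simplification; practical subgradient implementations decrease the step size over the course of the computation.

```python
def min_one_tree(n, w):
    """Minimum one-tree for vertices 0..n-1, with vertex 0 in the special
    role of v1: an MST of the remaining vertices (Prim) plus the two
    lightest edges incident to vertex 0.  Returns (weight, degrees)."""
    deg = [0] * n
    cost = 0.0
    # cheapest known connection of each outside vertex to the tree {1}
    best = {v: (w(1, v), 1) for v in range(2, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        c, parent = best.pop(v)
        cost += c
        deg[v] += 1
        deg[parent] += 1
        for u in best:
            if w(v, u) < best[u][0]:
                best[u] = (w(v, u), v)
    e1, e2 = sorted(range(1, n), key=lambda v: w(0, v))[:2]
    deg[0] = 2
    deg[e1] += 1
    deg[e2] += 1
    return cost + w(0, e1) + w(0, e2), deg

def held_karp_bound(n, w, iters=100, step=1.0):
    """Iterated one-tree bound with subgradient-style penalty updates."""
    p = [0.0] * n
    lb = float('-inf')
    for _ in range(iters):
        cost, deg = min_one_tree(n, lambda u, v: w(u, v) + p[u] + p[v])
        lb = max(lb, cost - 2 * sum(p))
        if all(d == 2 for d in deg):   # the one-tree is a tour: optimal
            break
        for i in range(n):
            # decrease penalties at degree-1 vertices, increase them
            # where the degree exceeds two
            p[i] += step * (deg[i] - 2)
    return lb
```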
State-of-the-Art Methods for TSP Solving

The algorithmic techniques for TSP solving have reached a very high level of sophistication. This is true for complete algorithms as well as for SLS algorithms. Currently, the best performing complete algorithms for the TSP are branch & cut methods, which are based on solving a series of linear programming relaxations of an integer programming problem [Mitchell, 2002]. These relaxations are typically obtained by allowing the binary variables typically used in integer programming formulations of the TSP [Nemhauser and Wolsey, 1988; Reinelt, 1994] to take arbitrary values from the interval [0, 1], instead of constraining them to integer values from the set {0, 1}. Cutting plane methods are used to make the relaxation more closely approximate the optimum solution of the original integer programming problem. This is done by finding linear inequalities that are satisfied by all integer feasible solutions but not by the non-integer optimal solution to the current relaxation. These so-called cuts are then added to obtain the next linear optimisation problem, which again is solved to optimality. This process is iterated until finding 'good' cuts becomes hard. Then, it may become preferable to branch by splitting the current problem into two subproblems; this is done by forcing one edge to be part of any solution in one subproblem and not to appear in any solution of the other subproblem [Grötschel and Holland, 1991; Padberg and Rinaldi, 1991]. For a detailed description of state-of-the-art branch & cut algorithms for the TSP, we refer to Applegate et al. [2003a].

Within modest computation times, efficient implementations of state-of-the-art branch & cut algorithms for the TSP can routinely solve to optimality small to medium-size symmetric TSP instances, ranging from a few hundred to around 1 000 to 3 000 vertices. Given substantially longer run-times, these algorithms can also solve much larger instances; as previously mentioned, the largest TSPLIB instance that has been solved (provably) optimally (as of September 2003), instance d15112, has 15 112 vertices. (Partly motivated by the availability of such powerful complete algorithms, many studies on symmetric TSP algorithms now focus on solving large instances with thousands of vertices.)

Despite these impressive successes, complete algorithms suffer from some limitations. Firstly, the computation times quickly become prohibitively large with increasing instance size. For example, finding an optimal solution and proving its optimality for instance d15112 required a total estimated computation time of 22.6 CPU years on a Compaq EV6 Alpha processor running at 500 MHz. (The actual computation was performed on a network of up to 110 workstations [Applegate et al., 2003b].) Secondly, the computation times of complete methods vary strongly among instances: while TSPLIB instance pr2392 (with 2 392 vertices) was solved within 116 CPU seconds on a 500 MHz Compaq XP1000 workstation, solving TSPLIB instance d2103 with 2 103 vertices required a total
run-time of about 129 CPU days. (In the latter case, the actual computations were performed on a network of 55 Alpha 21164 processors running at 400 and 500 MHz [Applegate et al., 2003b].)

Given the limitations of complete algorithms, there is considerable interest in SLS algorithms for the TSP. If reasonably high-quality solutions are required very quickly, heuristic construction search algorithms are very useful. Construction methods that give a reasonable trade-off between computation time and solution quality, such as the Savings Heuristic or Farthest Insertion (both are described in Section 8.2), can find tours for RUE instances with a few thousand vertices within about 11 to 16 percent of the Held-Karp lower bounds (which, as explained above, are known to be close to the optimal solution quality) in fractions of a CPU second on a 500 MHz Alpha processor [Johnson and McGeoch, 2002]. When allowing run-times of a few CPU seconds, the same relative solution quality can be obtained for instances with several hundred thousand vertices [Johnson and McGeoch, 2002]. Constructive search methods are also important in the context of generating good initial solutions for iterative improvement algorithms. The performance of iterative improvement algorithms for the TSP also depends crucially on the underlying neighbourhood relation and on various details of the search process. In Section 8.2, we give an overview of commonly used iterative improvement algorithms and report some illustrative performance results.

Optimal or close to optimal solutions can typically be obtained by using hybrid SLS algorithms at the cost of higher computation times. State-of-the-art SLS algorithms for the TSP can find optimal solutions for symmetric instances with thousands of cities within seconds or minutes of CPU time on modern workstations (as of 2003) [Cook and Seymour, 2003; Helsgaun, 2000]; significantly larger instances can typically be solved optimally or almost optimally within CPU hours. For example, the best performing SLS algorithm identified in a recent extensive experimental study, ILK-H (which is presented in Section 8.3), found a candidate solution of TSPLIB instance d15112 whose quality is only 0.0186 percent away from the known optimum in about seven hours of CPU time on a 500 MHz Alpha processor [Johnson and McGeoch, 2002]. For the same instance, other SLS algorithms obtained solution qualities within one percent of the optimum in less than seven CPU seconds. The impressive performance of SLS algorithms when applied to very large TSP instances is exemplified by the results obtained for an RUE instance comprising 25 million cities, for which, after 8 CPU days on an IBM RS6000 machine, Model 43-P 260, a solution quality within 0.3 percent of the estimated optimal solution quality was reached by a high-performance iterated local search algorithm [Applegate et al., 2003c]. (The performance of various SLS algorithms for the TSP is further illustrated by the results of the 8th DIMACS Implementation Challenge on the TSP [Johnson et al., 2003a].)
Asymmetric TSPs

Empirical results indicate that asymmetric TSP (ATSP) instances, in which a given graph has at least one pair of vertices for which w((v, v′)) ≠ w((v′, v)), are typically harder to solve than symmetric TSP instances of comparable size [Johnson et al., 2002]. TSPLIB includes 27 ATSP instances ranging from 17 to 443 vertices. A large number of additional instances has been recently generated in the context of empirical studies of ATSP algorithms [Cirasella et al., 2001; Johnson et al., 2002]; these include several classes of randomly generated ATSP instances that model real-world problems, such as moving drills along a tilted surface, scheduling read operations on computer disks, collecting coins from pay phones or finding shortest common super-strings for a set of genomic DNA sequences (a problem that arises in genome reconstruction). There are also some individual instances directly taken from practical applications of the ATSP, such as stacker crane problems, vehicle routing [Fischetti et al., 1994], robot motion planning, scheduling read operations on a tape drive or code optimisation [Young et al., 1997]. These instances and random instance generators are available online at the web site of the 8th DIMACS Implementation Challenge [Johnson et al., 2003a].

ATSP instances can be solved by means of a native ATSP algorithm or, alternatively, by a transformation into symmetric TSP instances, which are then solved using a high-performance algorithm for the symmetric TSP. One such transformation works as follows [Jonker and Volgenant, 1983]. Given a directed graph G := (V, E, w) with vertex set V := {v1, . . . , vn}, edge set E, and weight function w, we define an undirected graph G′ := (V′, E′, w′) with V′ := V ∪ {vn+1, vn+2, . . . , vn+n}, E′ := V′ × V′, and w′ defined as
w′((vi, vn+j)) := w′((vn+j, vi)) := w((vi, vj))   for i, j ∈ {1, . . . , n} and (vi, vj) ∈ E
w′((vn+i, vi)) := w′((vi, vn+i)) := −M            for i ∈ {1, . . . , n}
w′((vi, vj)) := M                                  otherwise,

where M is a sufficiently large number, for example, $M := \sum_{v,v' \in V} w((v, v'))$. An example of this transformation is shown in Figure 8.3. It is easy to see that for each Hamiltonian cycle with weight wa of an asymmetric TSP instance there is a Hamiltonian cycle in the symmetric instance with weight wa − n · M. Once a solution for the symmetric TSP instance G′ is obtained, it can easily be transformed into a solution for the ATSP instance G. There exist other transformations that replace each vertex in G by three vertices, but avoid negative edge weights in the resulting symmetric TSP instance (negative weights may cause problems with some existing implementations of TSP algorithms).
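On a weight matrix, the transformation is straightforward to implement; below is a sketch following the construction above, using 0-based indices with vertex i duplicated as vertex n+i. The choice of M follows the example in the text.

```python
def atsp_to_tsp(w):
    """Build the symmetric 2n x 2n weight matrix w' from an asymmetric
    n x n matrix w: w'(i, n+j) = w(i, j) for i != j, w'(i, n+i) = -M,
    and M everywhere else."""
    n = len(w)
    M = sum(sum(row) for row in w)              # sufficiently large
    ww = [[M] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ww[i][n + j] = ww[n + j][i] = w[i][j]
        ww[i][n + i] = ww[n + i][i] = -M        # pair vertex i with its copy
    return ww
```

A tour of weight w in the transformed instance then corresponds to an ATSP tour of weight w + n · M, since every optimal symmetric tour must use all n of the −M edges.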
Figure 8.3 Transformation of an ATSP instance (left side) into a symmetric TSP instance (right side). (For details, see text.)
Although symmetric TSP instances obtained from such transformations have twice or thrice as many vertices as the respective original ATSP instances, solving these using algorithms for the symmetric TSP is often more effective than solving the original ATSP instances using native ATSP algorithms [Johnson et al., 2002]. Furthermore, empirical results indicate that the Held-Karp (HK) lower bounds on the optimal solution quality for the symmetric TSP instances obtained from such transformations are often tighter than the widely used Assignment Problem (AP) lower bounds for the respective ATSP instances [Johnson et al., 2002]. Recent empirical results indicate that ATSP instances with a relatively large gap between the AP and the HK bound are most efficiently solved by transforming them into symmetric TSP instances and solving these using state-of-the-art symmetric TSP algorithms, such as Helsgaun’s Lin-Kernighan variant [Helsgaun, 2000]. However, ATSP algorithms that are guided by information from the AP bound, such as Zhang’s heuristic [Zhang, 1993] (a truncated branch & bound algorithm which uses the AP lower bound), tend to show better performance for ATSP instances for which both bounds are relatively close to each other [Johnson et al., 2002].
8.2 ‘Simple’ SLS Algorithms for the TSP

Much of the early research on incomplete algorithms for the TSP has focused on construction heuristics and iterative improvement algorithms.
These techniques are important, because they are at the core of many more advanced SLS algorithms. They range from extremely fast constructive search algorithms, such as the Nearest Neighbour Heuristic, to complex variable depth search methods, in particular, variants of the Lin-Kernighan Algorithm, which make extensive use of a number of speedup techniques.
Nearest Neighbour and Insertion Construction Heuristics

There is a large number of constructive search algorithms for the TSP, ranging from extremely fast methods for metric TSP instances, whose run-time is only slightly larger than the time required for just reading the instance data from the hard disk (see, for example, Platzman and Bartholdi III [1989]), to more sophisticated algorithms with non-trivial bounds on the solution quality achieved in the worst case. In the context of SLS algorithms, construction heuristics are often used for initialising the search; iterative improvement algorithms for the TSP typically require fewer steps to reach a local optimum when started from higher-quality tours obtained from a good construction heuristic.

One particularly intuitive and well-known constructive search algorithm has already been discussed in Chapter 1, Section 1.4: The Nearest Neighbour Heuristic (NNH) starts tour construction from some randomly chosen vertex u1 in the given graph and then iteratively extends the current partial tour p = (u1, . . . , uk) with an unvisited vertex uk+1 that is connected to uk by a minimum weight edge (uk+1 is called a nearest neighbour of uk); when all vertices have been visited, a complete tour is obtained by extending p with the initial vertex, u1. The tours constructed by the NNH are called nearest neighbour tours. For TSP instances that satisfy the triangle inequality, nearest neighbour tours are guaranteed to be at most a factor of 1/2 · (log2(n) + 1) worse than optimal tours in terms of solution quality [Rosenkrantz et al., 1977]. In the general case, however, there are TSP instances for which the Nearest Neighbour Heuristic returns tours that are a factor of 1/3 · (log2(n + 1) + 4/3) worse than optimal, and hence, the approximation ratio of the NNH for the general TSP cannot be bounded by any constant [Rosenkrantz et al., 1977]. In practice, the NNH typically yields much better tours than these worst-case results may suggest; for metric and TSPLIB instances, nearest neighbour tours are typically only 20–35% worse than optimal. In most cases, nearest neighbour tours are locally similar to optimal solutions, but they include some very long edges that are added towards the end of the construction process in order to complete the tour (two examples are shown in Figure 8.4). This effect is avoided to some extent by a variant of the NNH that penalises insertions of such long edges [Reinelt, 1994].
Figure 8.4 Two examples of nearest neighbour tours for TSPLIB instances. Left: pcb1173 with 1 173 vertices; right: fl1577 with 1 577 vertices. Note the long edges contained in both tours.
Compared to the standard NNH, this variant requires only slightly more computation time, but when applied to TSPLIB instances, it finds tours that are about 5% closer to optimal.

Insertion heuristics construct tours in a way that differs from that underlying the NNH: in each step, they extend the current partial tour p by inserting a heuristically chosen vertex at a position that typically leads to a minimal length increase. Several variants of these heuristics exist, including:
(i) nearest insertion, where the next vertex to be inserted is a vertex ui with minimum distance to any vertex uj in p;
(ii) cheapest insertion, which inserts a vertex that leads to the minimum increase of the weight of p over all vertices not yet in p;
(iii) farthest insertion, where the next vertex to be inserted is a vertex ui for which the minimum distance to a vertex in p is maximal;
(iv) random insertion, where the next vertex to be inserted is chosen randomly.

For TSP instances that satisfy the triangle inequality, the tours constructed by nearest and cheapest insertion are provably at most twice as long as an optimal tour [Rosenkrantz et al., 1977], while for random and farthest insertion, the solution quality is only guaranteed to be within a factor O(log n) of the optimum [Johnson and McGeoch, 2002]. In practice, however, the farthest and random insertion heuristics perform much better than nearest and cheapest insertion, yielding tours that, in the case of TSPLIB and RUE instances, are on average between 13% and 15% worse than optimal [Johnson and McGeoch, 2002; Reinelt, 1994].
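As a concrete illustration of tour construction, here is a minimal sketch of the Nearest Neighbour Heuristic (our own code, assuming a symmetric weight matrix w; a real implementation would use neighbour lists instead of the O(n²) scan per step):

import random

def nearest_neighbour_tour(w):
    n = len(w)
    start = random.randrange(n)          # construction starts at a random vertex
    tour, visited = [start], {start}
    while len(tour) < n:
        u = tour[-1]
        # extend the partial tour with an unvisited nearest neighbour of u
        v = min((x for x in range(n) if x not in visited), key=lambda x: w[u][x])
        tour.append(v)
        visited.add(v)
    return tour                          # the closing edge (tour[-1], start) is implicit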
The Greedy, Quick-Borůvka and Savings Heuristics

The construction heuristics discussed so far build a complete tour by iteratively extending a connected partial tour. An alternative approach is to iteratively build several tour fragments that are ultimately patched together into a complete tour. One example of a construction heuristic of this type is the Greedy Heuristic, which works as follows. First, all edges in the given graph G are sorted according to increasing weight. Then, this list is scanned in linear order, starting from the minimum weight edge. An edge e is added to the current partial candidate solution p if inserting it into G′, the graph that contains all vertices of G and the edges in p, results neither in a vertex of degree greater than two nor in a cycle of length less than n edges.

There exist several variants of the Greedy Heuristic that use different criteria for choosing the edge to be added in each construction step. One of these is the Quick-Borůvka Heuristic [Applegate et al., 1999], which is inspired by the minimum spanning tree algorithm of Borůvka [1926]. First, the vertices in G are sorted arbitrarily (e.g., for metric TSP instances, the vertices can be sorted according to their first coordinate values). Then, the vertices are processed in the given order. For each vertex ui of degree less than two in G′, all edges incident to ui that appear in G but not in G′ are considered. Of these, the minimum weight edge that results neither in a cycle of length less than n nor in a vertex of degree larger than two is added to G′. Note that at most two scans of the vertices have to be performed to generate a tour.

Another construction heuristic that is based on building multiple partial tours is the Savings Heuristic, which was initially proposed for a vehicle routing problem [Clarke and Wright, 1964]. It works by first choosing a base vertex ub and n − 1 cyclic paths (ub, ui, ub) that consist of two vertices each. As long as more than one cyclic path is left, in each construction step two cyclic paths p1 and p2 are combined by removing one edge incident to ub in both p1 and p2, and connecting the two resulting paths into a new cyclic path p12. The edges to be removed in this operation are selected such that a maximal reduction in the cost of p12, compared to the total combined cost of p1 and p2, is achieved.

Regarding worst-case performance, it can be shown that greedy tours are at most (1 + log n)/2 times longer than an optimal tour, while the length of a savings tour is at most a factor of (1 + log n) above the optimum [Ong and Moore, 1984]; no worst-case bounds on solution quality are known for Quick-Borůvka tours. Empirically, the Savings Heuristic produces better tours than both Greedy and Quick-Borůvka; for example, for large RUE instances, the length of savings tours is on average around 12% above the Held-Karp lower bounds, while Greedy and Quick-Borůvka find solutions around 14% and 16% above these lower bounds, respectively [Johnson and McGeoch, 2002].
Computation times are modest, ranging for RUE instances with one million vertices from 22 seconds (for Quick-Borůvka) to around 100 seconds (for Greedy and Savings) on a 500 MHz Alpha CPU.
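The fragment-merging logic of the Greedy Heuristic can be sketched as follows (our own illustration; a union-find structure over tour fragments detects when an edge would close a cycle prematurely):

def greedy_tour_edges(w):
    n = len(w)
    parent = list(range(n))                  # union-find over tour fragments
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x
    degree, chosen = [0] * n, []
    for wt, i, j in sorted((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)):
        if degree[i] == 2 or degree[j] == 2:
            continue                         # would create a vertex of degree > 2
        if find(i) == find(j) and len(chosen) < n - 1:
            continue                         # would close a cycle on fewer than n vertices
        parent[find(i)] = find(j)
        degree[i] += 1; degree[j] += 1
        chosen.append((i, j))
        if len(chosen) == n:                 # the n-th edge closes the Hamiltonian cycle
            break
    return chosen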
Construction Heuristics Based on Minimum Spanning Trees

Yet another class of construction heuristics builds tours based on minimum-weight spanning trees (MSTs). In the simplest case, such an algorithm consists of the following four steps: First, an MST t for the given graph G is computed; then, by doubling each edge in t, a new graph G′ is obtained. In the third step, a Eulerian tour p of G′, that is, a cyclic path that uses each edge in G′ exactly once, is generated; a Eulerian tour can be found in time O(e), where e is the number of edges in the graph [Cormen et al., 2001]. Finally, p is converted into a Hamiltonian cycle in G by iteratively short-cutting subpaths of p (see Chapter 6 in Reinelt [1994] for an algorithm for this step). If the given TSP instance satisfies the triangle inequality, this last step does not increase the weight of the tour; hence, in this case, the final tour is at most twice as long as an optimal tour. Empirically, however, this construction heuristic performs rather poorly, with solution qualities that are on average around 40% above the optimal tour lengths for TSPLIB and RUE instances [Reinelt, 1994; Johnson and McGeoch, 2002].

Much better performance is obtained by the Christofides Heuristic [Christofides, 1976]. The central idea behind this heuristic is to compute a minimum weight perfect matching of the odd-degree vertices of the MST (there must be an even number of such vertices), which can be done in time O(k³), where k is the number of odd-degree vertices. (A perfect matching of a vertex set is a set of edges such that each vertex is incident to exactly one of these edges; the weight of the matching is the sum of the weights of its edges.) Adding the matching edges is sufficient for converting the MST into an Eulerian graph, that is, a graph containing an Eulerian tour. As described previously, in a final step, this Eulerian tour is converted into a Hamiltonian cycle. For TSP instances that satisfy the triangle inequality, the resulting tours are guaranteed to be at most a factor of 1.5 above the optimum solution quality. While the standard version of the Christofides Heuristic appears to perform worse than both the Savings and Greedy Heuristics [Reinelt, 1994; Johnson and McGeoch, 2002], its performance can be substantially improved by additionally using greedy heuristics in the conversion of the Eulerian tour into a Hamiltonian cycle. The resulting variant of the Christofides Heuristic appears to be the best-performing construction heuristic for the TSP in terms of the solution quality achieved; however, its run-time is higher than that of the Savings Heuristic by a factor that increases with instance size, from about 3.2 for RUE instances with 1 000 vertices to about 8 for RUE instances with 3.16 million vertices [Johnson et al., 2003a].
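For the simple MST-based heuristic, short-cutting the Eulerian tour of the doubled tree amounts to a preorder traversal of the MST; the following sketch (our own, using Prim’s algorithm on a weight matrix) makes this concrete:

def double_tree_tour(w):
    n = len(w)
    in_tree, par = [False] * n, [0] * n
    dist = [float("inf")] * n
    dist[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):                       # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        in_tree[u] = True
        if u != 0:
            children[par[u]].append(u)
        for v in range(n):
            if not in_tree[v] and w[u][v] < dist[v]:
                dist[v], par[v] = w[u][v], u
    tour, stack = [], [0]                    # preorder walk = Eulerian tour with short-cuts
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour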
k-Exchange Iterative Improvement Methods

Most iterative improvement algorithms for the TSP are based on the k-exchange neighbourhood relation, in which candidate solutions s and s′ are direct neighbours if, and only if, s′ can be obtained from s by deleting a set of k edges and rewiring the resulting fragments into a complete tour by inserting a different set of k edges. For iterative improvement algorithms that use a fixed k-exchange neighbourhood relation, k = 2 and k = 3 are the most common choices; current knowledge suggests that the slight improvement in solution quality obtained by increasing k to four and beyond is not amortised by the substantial increase in computation time [Lin, 1965].

The most straightforward implementation of a k-exchange iterative improvement algorithm considers in each step all possible combinations for the k edges to be deleted and replaced. After deleting k edges from a given candidate solution s, the number of ways in which the resulting fragments can be reconnected into a candidate solution different from s depends on k; for k = 2, after deleting two edges (ui, uj) and (uk, ul), the only way to rewire the two partial tours into a different complete tour is by introducing the edges (ui, uk) and (ul, uj). Note that after a 2-exchange move, one of the two partial tours is reversed. (For an illustration, see Figure 1.6, page 44.) For k = 3, there are several ways of reconnecting the three tour fragments obtained after deleting three edges, and in an iterative improvement algorithm based on this neighbourhood, all of these need to be checked for possible improvements. Figure 8.5 shows two of the four ways of completing a 3-exchange move after removing a given set of three edges. Furthermore, 2-exchange moves can be seen as special cases of 3-exchange moves in which the sets of edges deleted from and subsequently added to the given candidate tour have one element in common. Allowing an overlap between these two sets has the advantage that any tour that is locally optimal w.r.t. a k-exchange neighbourhood is also locally optimal w.r.t. all k′-exchange neighbourhoods with k′ < k.

Based on the 2-exchange and 3-exchange neighbourhood relations, various iterative improvement algorithms for the TSP can be defined in a straightforward way; these are generally known as 2-opt and 3-opt algorithms, because they produce tours that are locally optimal w.r.t. the 2-exchange and 3-exchange neighbourhoods, respectively. In particular, different pivoting rules can be used (these determine the mechanism for selecting an improving neighbouring candidate solution; see also Chapter 2, Section 2.1).
Figure 8.5 Two possible ways of reconnecting partial tours in a 3-exchange move after edges (u1, u2), (u3, u4) and (u5, u6) have been removed from a complete tour. Note that in the left result, the relative direction of all three tour fragments is preserved.
In general, first-improvement algorithms for the TSP can be implemented in such a way that the time complexity of each search step is substantially lower than for best-improvement algorithms. But even first-improvement 2-opt and 3-opt algorithms need to examine up to O(n²) and O(n³) neighbouring candidate solutions in each step, respectively, which leads to a significant amount of CPU time per search step when applied to TSP instances with several hundreds or thousands of vertices. Fortunately, there exist a number of speedup techniques that result in significant improvements in the time complexity of local search steps [Bentley, 1992; Johnson and McGeoch, 1997; Martin et al., 1991; Reinelt, 1994].
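Before turning to these speedup techniques, here is a sketch of the elementary 2-exchange move itself, on the simple array representation of a tour t (our own illustration; i < j are positions in t):

def two_opt_gain(w, t, i, j):
    # gain of removing edges (t[i], t[i+1]) and (t[j], t[j+1])
    # and adding (t[i], t[j]) and (t[i+1], t[j+1]); positive = improving
    n = len(t)
    a, b = t[i], t[(i + 1) % n]
    c, d = t[j], t[(j + 1) % n]
    return (w[a][b] + w[c][d]) - (w[a][c] + w[b][d])

def apply_two_opt(t, i, j):
    # executing the move reverses the tour segment t[i+1..j]
    t[i + 1:j + 1] = t[i + 1:j + 1][::-1]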
Fixed Radius Search

For any improving 2-exchange move from a tour s to a neighbouring tour s′, there is at least one vertex that is incident to an edge e in s that is replaced by a different edge e′ with lower weight than e. This observation can be exploited for speeding up the search for an improving 2-exchange move from a given tour s. For a vertex ui, two searches are performed that consider each of the two tour neighbours of ui as a vertex uj, respectively. For a given uj, a search around ui is performed for vertices uk that are closer to ui than w((ui, uj)), the radius of the search. For each vertex uk found in this fixed radius near neighbour search, removing one of its two incident edges in s leads to a feasible 2-exchange move. The first such 2-exchange move that results in an improvement in solution quality is applied to s, and the iterative improvement search is continued from the resulting tour s′ by performing a fixed radius near neighbour search for another vertex. If the fixed radius near neighbour searches for all vertices do not yield any improving 2-exchange move, the current tour is 2-optimal.

The idea of fixed radius search can be extended to 3-opt [Bentley, 1992]. In this case, each search step requires two fixed radius near neighbour searches: one for a vertex ui as in the case of 2-opt (see above), resulting in a vertex uk, and the other for the tour neighbour ul of uk, with radius w((ui, uj)) + w((uk, ul)) − w((ui, uk)).
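The following sketch shows how a fixed radius near neighbour search for improving 2-exchange moves might be organised (our own illustration; neighbours(u, r) is an assumed helper that enumerates vertices within distance r of u in order of increasing edge weight, e.g., by walking a sorted candidate list):

def improving_move_around(w, tour, pos, ui, neighbours):
    n = len(tour)
    for d in (+1, -1):                           # consider both tour neighbours of ui
        uj = tour[(pos[ui] + d) % n]
        radius = w[ui][uj]                       # the radius of the search
        for uk in neighbours(ui, radius):        # only uk with w((ui, uk)) < radius
            ul = tour[(pos[uk] + d) % n]         # matching tour neighbour of uk
            gain = (w[ui][uj] + w[uk][ul]) - (w[ui][uk] + w[uj][ul])
            if gain > 0:                         # first improving move is returned
                return (ui, uj, uk, ul)
    return None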
Candidate Lists

In the context of identifying candidates for k-exchange moves, it is useful to be able to efficiently access the vertices in the given graph G that are connected to a given vertex ui by edges with low weight, for example, in the form of a list of neighbouring vertices uk that is sorted according to edge weight w((ui, uk)) in ascending order. By using such candidate lists for all vertices in G, fixed radius near neighbour searches can be performed very efficiently; this is illustrated by the empirical results reported in Example 8.1 on page 376f. Interestingly, the use of candidate lists within iterative first-improvement algorithms, such as 2-opt, often leads to improvements in the quality of the local optima found by these algorithms. This suggests that the highly localised search steps that are evaluated first when using candidate lists are more effective than other k-exchange steps.

Full candidate lists comprising all n − 1 other vertices require O(n²) memory and take O(n² log n) time to construct. Therefore, especially to reduce memory requirements, it is often preferable to use bounded-length candidate lists; in this case, a fixed radius near neighbour search for a given vertex ui is aborted when the candidate list for ui has been completely examined, unless the radius criterion stops the search earlier. As a consequence, the tours obtained from an iterative improvement algorithm based on this mechanism are no longer guaranteed to be locally optimal, because some improving moves may be missed. Typically, candidate lists of length 10 to 40 are used, although shorter lengths are sometimes chosen. Simply using short candidate lists that consist of the vertices connected by the k lowest weight edges incident to a given vertex can be problematic, especially for clustered instances like those shown on the right side of Figure 8.1 (page 360). For metric TSP instances, alternative approaches to constructing bounded-length candidate lists include so-called quadrant-nearest neighbour lists [Pekny and Miller, 1994; Johnson and McGeoch, 1997] and candidate lists based on Delaunay triangulations [Reinelt, 1994].

Helsgaun proposed a more complex mechanism for constructing candidate lists that is based on an approximation to the Held-Karp lower bounds (see Section 8.1) [Helsgaun, 2000]. This mechanism works as follows: Based on the modified edge weights w′((ui, uj)) obtained from an approximation to the Held-Karp lower bounds, so-called α-values are computed for each edge (ui, uj) as α((ui, uj)) := w′(t+(ui, uj)) − w′(t), where w′(t) is the weight of a minimum weight one-tree t and w′(t+(ui, uj)) is the weight of a minimum weight one-tree t+(ui, uj) that is forced to contain the edge (ui, uj). For each edge, α((ui, uj)) ≥ 0, and α((ui, uj)) = 0 if the edge (ui, uj) is contained in some minimum weight one-tree. A candidate list for a vertex ui can now be obtained by sorting the edges incident to ui according to their α-values in ascending order and bounding the length of the list to a fixed value k, or by accepting only edges with α-values below some given threshold. The vertices contained in these candidate lists are called α-nearest neighbours. Empirically, it was shown that, compared to candidate lists obtained by the other methods mentioned above, candidate lists based on α-values can be much smaller and still cover all edges contained in an optimal solution. For example, for TSPLIB instance att532, candidate lists consisting of 5 α-nearest neighbours cover an optimal solution, while a list length of 22 is required when using standard candidate lists based on the given edge weights [Helsgaun, 2000].
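Standard candidate lists are straightforward to construct, as the following sketch shows (our own code; for α-nearness one would sort by the α-values defined above instead of the raw edge weights):

def build_candidate_lists(w, k=20):
    n = len(w)
    # for each vertex, the k other vertices sorted by ascending edge weight
    return [sorted((v for v in range(n) if v != u), key=lambda v: w[u][v])[:k]
            for u in range(n)]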
Don’t Look Bits

Another widely used mechanism for speeding up iterative improvement search for the TSP is based on the following observation: If in a given search step no improving k-exchange move can be found for a given vertex ui (e.g., in a fixed radius near neighbour search), it is unlikely that an improving move involving ui will be found in future search steps, unless at least one of the edges incident to ui in the current tour has changed. This can be exploited for speeding up the search process by associating a single don’t look bit (DLB) with each vertex. At the start of the iterative improvement search, all DLBs are turned off (i.e., set to zero). If in a search step no improving move can be found for a given vertex, the respective DLB is turned on (i.e., set to one). After each local search step, the DLBs of all vertices incident to edges that were modified in this step (i.e., deleted from or added to the current tour) are turned off again. The search for improving moves is started only at vertices whose DLB is turned off. In practice, the DLB mechanism significantly reduces the time complexity of first-improvement search, since after a few neighbourhood scans, most of the DLBs will be turned on. The speedup that can be achieved by using DLBs is illustrated by the empirical results for various variants of 2-opt shown in Example 8.1.

The DLB mechanism can be easily integrated into more complex SLS methods, such as Iterated Local Search or Memetic Algorithms. One possibility is to reset only the DLBs of those vertices that are incident to edges deleted by the application of a tour perturbation or a recombination operator; this approach is followed in various algorithms described in Sections 8.3 and 8.4 and typically leads to a further substantial reduction of computation time compared to resetting all DLBs to zero. Furthermore, DLBs can be used to speed up first-improvement local search algorithms for combinatorial problems other than the TSP.
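The bookkeeping around don’t look bits can be sketched as follows (our own illustration; improve_from(u) is an assumed helper that searches for and applies one improving move starting at vertex u, returning the vertices incident to changed edges, or None if no improving move exists):

def scan_with_dont_look_bits(n, improve_from):
    dont_look = [False] * n                  # all DLBs are initially turned off
    queue = list(range(n))
    while queue:
        u = queue.pop()
        if dont_look[u]:
            continue
        changed = improve_from(u)
        if changed is None:
            dont_look[u] = True              # no improving move found at u
        else:
            for v in changed:                # reset DLBs at endpoints of changed edges
                dont_look[v] = False
                queue.append(v)
            queue.append(u)                  # u itself is reconsidered as well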
Example 8.1 Effects of Speedup Techniques for 2-opt
To illustrate the effectiveness of the previously discussed speedup techniques, we empirically evaluated three variants of 2-opt: a straightforward implementation that in each search step evaluates every possible 2-exchange move (2-opt-std); a fixed radius near neighbour search that uses candidate lists of unbounded length (2-opt-fr+cl); and a fixed radius near neighbour search that uses candidate lists of unbounded length as well as DLBs (2-opt-fr+cl+dlb). For all variants, the search process was initialised at a random permutation of the vertices, and it was terminated as soon as a local minimum was encountered. These algorithms were run 1 000 times on several benchmark instances from TSPLIB, using an Athlon 1.2 GHz MP CPU with 1 GB RAM running Suse Linux 7.3. (The 2-opt implementation used for these experiments is available from www.sls-book.net.)

The results reported in Table 8.1 show that the speedup techniques achieve substantial decreases in run-time over a standard implementation, an effect that increases strongly with instance size. The most significant speedup seems to be due to the combination of fixed radius near neighbour search with candidate lists, while the additional use of DLBs can sometimes reduce the computation times by another factor of two. In addition, the bias in the local search towards first examining the most promising moves that is introduced by the use of candidate lists results in a significant improvement in the solution quality obtained by 2-opt; the use of DLBs diminishes this effect only slightly.
              2-opt-std           2-opt-fr+cl       2-opt-fr+cl+dlb    3-opt-fr+cl
Instance      ∆avg      tavg      ∆avg     tavg     ∆avg     tavg      ∆avg     tavg
rat783        13.0       93.2      3.9      3.9      8.0      3.3       3.7      34.6
pcb1173       14.5      250.2      8.5     10.8      9.3      7.1       4.6      66.5
d1291         16.8      315.6     10.1     13.0     11.1      7.4       4.9      76.4
fl1577        13.6      528.2      7.9     21.1      9.0     11.1      22.4      93.4
pr2392        15.0    1 421.2      8.8     47.9     10.1     24.9       4.5     188.7
pcb3038       14.7    3 862.4      8.2     73.0      9.4     40.2       4.4     277.7
fnl4461       12.9   19 175.0      6.9    162.2      8.0     87.4       3.7     811.6
pla7397       13.6   80 682.0      7.1    406.7      8.6    194.8       6.0   2 260.6
rl11849       16.2  360 386.0      8.0  1 544.1      9.9    606.6       4.6   8 628.6
usa13509       —          —        7.4  1 560.1      9.0    787.6       4.4   7 807.5
Table 8.1 Computational results for different variants of 2-opt and 3-opt. ∆avg denotes the average percentage deviation from the optimal solution quality over 1 000 runs per instance, and tavg is the average run-time for 1 000 runs of the respective algorithm, measured in CPU milliseconds on an Athlon 1.2 GHz CPU with 1GB of RAM. (For further details, see text.)
When bounded-length candidate lists are used, very similar results were obtained for most instances (not shown here); only on the pathologically clustered instance fl1577 does the solution quality decrease to an average of almost 60% above the optimum, while the computation time is reduced by about 10% (these observations were made for a length bound of 40). 3-opt achieves better-quality solutions than the previously mentioned 2-opt variants at the cost of substantially higher computation times; this is illustrated by the results for 3-opt with fixed radius search using candidate lists of length limited to 40, shown in the last column of Table 8.1. Interestingly, using unbounded candidate lists for 3-opt leads to computation times that can be substantially higher than the ones reported in Table 8.1. This illustrates that bounding the length of candidate lists becomes increasingly important in the context of local search algorithms based on larger neighbourhoods.
The Lin-Kernighan (LK) Algorithm

Empirical evidence suggests that iterative improvement algorithms based on k-exchange neighbourhoods with k > 3 return better tours, but the computation times required for searching these large neighbourhoods render this approach ineffective. Variable depth search algorithms overcome this problem by partially exploring larger neighbourhoods (see also Chapter 2, page 67ff.). The best-known variable depth search method for the TSP is the Lin-Kernighan (LK) Algorithm, which was described from a high-level perspective in Chapter 2 (page 68ff.); it is an iterative improvement method that uses complex search steps obtained by iteratively concatenating a variable number of elementary 1-exchange moves. In each complex step, which we also call an LK step, a set of edges X := {x1, . . . , xr} is deleted from the current tour, and another set of edges Y := {y1, . . . , yr} is added to it. The number of edges that are exchanged, r, is determined dynamically and can vary for each complex search step. (This is explained in more detail below.) For an overview of the general idea underlying the construction of LK steps, we refer to Figure 2.4 (page 69) and the text description given there.

The two sets X and Y are constructed iteratively, element by element, such that edges xi and yi as well as yi and xi+1 must share an endpoint, respectively; a complex step that satisfies this criterion is called sequential. Based on this criterion, the edges in X and Y can be represented as xi = (u2i−1, u2i) and yi = (u2i, u2i+1), respectively. Furthermore, at any point during the iterative construction of a complex step, that is, for any X = {x1, . . . , xi} and Y = {y1, . . . , yi−1}, there needs to be an alternative edge y′i such that the complex step defined by X and Y′ := {y1, . . . , yi−1, y′i} applied to the current tour yields a valid tour (i.e., a Hamiltonian cycle in the given graph G); there is only one exception to this rule for the case i = 2, which is treated in a special way [Lin and Kernighan, 1973].

The Lin-Kernighan Algorithm initialises the search process at a randomly chosen Hamiltonian cycle (i.e., a vertex permutation) of the given graph G. The search for each improving (complex) LK step starts with selecting a vertex u1; next, an edge x1 := (u1, u2) is selected for removal, then an edge y1 := (u2, u3) is chosen to be added, etc. At each stage of this construction process, the length w(pi) of the tour pi obtained by applying the complex step determined by X := {x1, . . . , xi} and Y := {y1, . . . , yi} (as defined above) is computed, as well as the total gain gi := ∑_{j=1}^{i} (w(xj) − w(yj)) for X = {x1, . . . , xi} and Y = {y1, . . . , yi}. The construction process is terminated whenever the total gain gi is smaller than w(p) − w(pi∗), where p is the current tour and pi∗ is the best tour encountered during the construction, that is, i∗ := argmin_i w(pi). At this point, if the complex step corresponding to X := {x1, . . . , xi∗} and Y := {y1, . . . , yi∗} leads to an improvement in solution quality, this step is executed and pi∗ becomes the current tour.
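The gain bookkeeping behind this termination criterion can be sketched as follows (an illustrative sketch in our own notation, not Lin and Kernighan’s implementation; edge weights of the removed and added edges are passed as lists):

def cumulative_gains(x_weights, y_weights):
    # g_i := sum_{j=1..i} (w(x_j) - w(y_j)) for the edges removed (x) and added (y) so far
    gains, g = [], 0.0
    for wx, wy in zip(x_weights, y_weights):
        g += wx - wy
        gains.append(g)
    return gains

def construction_should_stop(gains):
    # stop once the current total gain falls below the best gain seen so far,
    # which corresponds to w(p) - w(p_{i*}) for the best tour p_{i*} encountered
    return gains[-1] < max(gains)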
To bound the length of the search for an improving complex step, in the original Lin-Kernighan Algorithm the sets X and Y are required to be disjoint; this means that an edge that has been removed cannot be added back later in the same complex search step, and vice versa. A limited amount of backtracking is allowed if a sequence of elementary moves does not yield an improved tour. In LK, backtracking is triggered when no improving complex step has been found during the construction of an LK step; it is applied only at the first two levels, that is, for the choices of x1, y1, x2 and y2. During backtracking, alternatives for edge y2 are considered in order of increasing weight w(y2). If this is unsuccessful, the alternative choice for x2 is considered; since this leads to a temporary violation of the sequentiality criterion (see above), it needs to be handled in a special way. If none of these alternatives for x2 and y2 can be extended into an improving complex step, backtracking is applied to the choice of y1. When all alternatives for y1 are exhausted without finding an improving complex step, the other edge incident to the starting vertex u1 is considered as a choice for x1. Only after all these attempts at finding an improving step by a search centred at vertex u1 have failed is an alternative choice for u1 considered. This backtracking mechanism ensures that all 2- and 3-exchange moves are checked when searching for improving search steps; consequently, the tours obtained by LK (when run to completion) are locally optimal w.r.t. the 2- and 3-exchange neighbourhoods.

In addition to the complex LK steps, Lin and Kernighan also proposed to consider certain specially structured, non-sequential 4-exchange moves as candidates for improving search steps. (An example of a non-sequential 4-exchange move is the double-bridge move illustrated in Figure 2.11 on page 88.) However, Lin and Kernighan noted that the improvement obtained by additionally checking these moves depends strongly on the given TSP instance.

The Lin-Kernighan Algorithm uses several techniques for pruning the search. Firstly, the search for edges (v, v′) to be added to Y is limited to the five shortest edges incident to vertex v. Secondly, for i ≥ 4, no edge in the current tour can be removed if it is contained in a collection of previously found high-quality tours. Furthermore, several mechanisms are provided for guiding the search. These include a rule that the edges to be added to Y are chosen such that w(xi+1) − w(yi) is maximised (a limited form of look-ahead), and a preference for the longer of two alternative edges in the context of choosing edge x4, one of the edges that is removed from the current tour.

Lin and Kernighan applied LK to various TSP instances ranging from 20 to 110 vertices. For all of these instances, LK found optimal solutions; however, the success probability (i.e., the probability that one run of LK finds an optimal solution) dropped from 1 for small instances with about 20 cities to approximately 0.25 for instances with about 100 cities.
Variants of the LK Algorithm

The details of the original LK Algorithm can be varied in many ways, and the design choices made by Lin and Kernighan do not necessarily lead to optimal performance. These design choices include the depth and the width of backtracking, the rules used for guiding the search (e.g., look-ahead), the use of a bound on the length of complex LK steps and the choice of 2-exchange moves as elementary search steps. Additional room for variation exists w.r.t. algorithmic details that are not specific to LK, such as the type and length of neighbourhood lists or the search initialisation procedure. Some alternatives for these design choices are realised in the four well-known LK variants by Johnson and McGeoch [1997; 2002], Applegate et al. [1999], Neto [1999] and Helsgaun [2000]. For a detailed discussion of these LK algorithms and their performance, we refer to the original papers.

A particularly noteworthy LK variant is Helsgaun’s LK (LK-H), which differs from the original Lin-Kernighan Algorithm in several key features and typically performs substantially better. In LK-H, the complex moves correspond to sequences of sequential 5-exchange moves; these are iteratively built using candidate lists based on α-values (see page 375). If at any point during the construction of a complex step a tour improvement can be achieved, the corresponding search step is executed immediately. In some sense, this corresponds to a first-improvement search mechanism within the construction sequence for a single complex search step, while the original LK algorithm uses a best-improvement strategy in this context. Finally, LK-H uses backtracking only on the choice of x1, the first edge to be removed from the current tour.

Example 8.2 Performance of LK Algorithms
In this example, we illustrate the performance obtained by current LK algorithms on a number of TSPLIB instances. On the left side of Figure 8.6, we show the asymptotic solution quality distributions (asymptotic SQDs; for details, see Section 4.2, page 162ff.) for two prominent, publicly available LK implementations, the LK-H algorithm (version 1.2) and the LK algorithm of Applegate, Bixby, Chvátal and Cook (LK-ABCC, 99.12.15 release). In addition, we present the asymptotic SQD for the 3-opt algorithm used for generating the results in Table 8.1 (page 377). Each asymptotic SQD is based on 1 000 runs starting from random initial tours. On the right side of Figure 8.6, we show the distribution of the computation times required for reaching a local optimum. Additional summary results for the two LK variants applied to the TSPLIB instances from Example 8.1 are shown in Table 8.2, together with the results for the 3-opt algorithm from Example 8.1.
Figure 8.6 Left side: Asymptotic solution quality distributions of LK-H, LK-ABCC and the 3-opt algorithm from Example 8.1 on TSPLIB instance pcb3038. Right side: Distribution of the computation times required to run a local search from a random initial solution for the three algorithms on instance pcb3038.
              LK-ABCC             LK-H                 3-opt-fr+cl
Instance      ∆avg      tavg      ∆avg       tavg      ∆avg      tavg
rat783        1.85       21.0     0.04        61.8     3.7        34.6
pcb1173       2.25       45.3     0.24       238.3     4.6        66.5
d1291         5.11       63.0     0.62       444.4     4.9        76.4
fl1577        9.95      114.1     5.30     1 513.6    22.4        93.4
pr2392        2.39       84.9     0.19     1 080.7     4.5       188.7
pcb3038       2.14      134.3     0.19     1 437.9     4.4       277.7
fnl4461       1.74      239.3     0.09     1 442.2     3.7       811.6
pla7397       4.05      625.8     0.40     8 468.6     6.0     2 260.6
rl11849       6.00    1 072.3     0.38     9 681.9     4.6     8 628.6
usa13509      3.23    1 299.5     0.19    13 041.9     4.4     7 807.5

Table 8.2 Computational results for LK-ABCC, LK-H and the 3-opt algorithm from Example 8.1 on various TSPLIB instances. ∆avg denotes the average percentage deviation from the optimum solution quality, and tavg is the average run-time over 1 000 runs of the respective algorithm, measured in CPU milliseconds on an Athlon 1.2 GHz CPU with 1 GB of RAM. (For further details, see text.)
(All experiments were performed on an Athlon 1.2 GHz MP CPU with 1 GB RAM running Suse Linux 7.3.) As can be observed from these results, LK-H finds significantly better tours than LK-ABCC and 3-opt. However, it does so at the cost of significantly higher computation times. In fact, the run-times shown in Figure 8.6 and Table 8.2 do not include the preprocessing times required by LK-H for generating the candidate sets (LK-H uses the α-nearest neighbours described on page 375, which are time-intensive to compute). In the case of instance pcb3038, this preprocessing requires 4.41 CPU seconds, and it increases to about 131 CPU seconds for instance usa13509. The preprocessing times for the two other algorithms are much smaller; for example, LK-ABCC requires a preprocessing time of 0.53 CPU seconds for instance usa13509. This example shows that different LK algorithms can vary substantially w.r.t. their performance, as measured by solution quality or computation time.
In Depth: Efficiently Implementing SLS Algorithms for the TSP

In order to obtain the performance results reported in most empirical studies of SLS algorithms for the TSP, efficient implementations are required that make use of fairly sophisticated data structures. This is particularly true for state-of-the-art LK variants. In general, data structures used within SLS algorithms for the TSP need to support the following operations: (i) determine where a given vertex is located within a tour; (ii) determine the successor and the predecessor of a vertex within a given tour; (iii) check whether a vertex uk is visited between vertices ui and uj for a given tour and orientation; and (iv) execute a k-exchange move, which includes swaps and inversions of tour segments.

For TSP instances with up to around 1 000 vertices, the standard array representation for tours appears to be most efficient [Fredman et al., 1995]. In this representation, a cyclic path (uφ(1), . . . , uφ(n), uφ(1)) in the given graph G is stored in two arrays, which hold the permutation of vertex indices φ := (φ(1), . . . , φ(n)) and its inverse, ψ := (ψ(1), . . . , ψ(n)), where ψ(i) is the position of vertex index i in φ. Clearly, the predecessor, successor and ‘between’ queries can be answered in constant time. The time complexity of the move operation, however, has been empirically determined as O(n^0.7) [Bentley, 1992]; this operation is therefore a bottleneck for large instances, and more advanced data structures are required to reduce its time complexity.

One widely used alternative to the array representation is based on two-level trees [Chrobak et al., 1990; Fredman et al., 1995]. In this representation, a tour is divided into roughly √n segments of length between 1/2 · √n and 2 · √n each; these segments are represented by vertices at the first level of a tree whose root corresponds to the entire tour, while the leaves are the vertices of G. (For details on the implementation of this data structure, we refer to Fredman et al. [1995] and Applegate et al. [1999].) When using two-level trees, the successor and predecessor of a vertex can be determined in constant time, and the same holds for answering ‘between’ queries; the respective constants, however, are slightly larger than for the array representation. The worst-case complexity of the move operation, on the other hand, is only O(√n). Based on extensive computational experiments, Fredman et al. [1995] recommend the use of the two-level tree representation when solving TSP instances with up to around one million vertices. For larger instances, they recommend a tour representation based on splay trees [Sleator and Tarjan, 1985], which allows each operation to be performed in O(log n) time in the worst case.

It should be noted that LK algorithms are not easy to implement efficiently. Neto estimated that the development of a high-performance LK implementation that uses most of the techniques described here requires around eight man-months [Neto, 1999]; this estimate has been confirmed by other researchers [Merz, 2002]. Fortunately, at least three very efficient implementations of LK variants are publicly available: the LK implementation by Applegate, Bixby, Chvátal and Cook, which is part of the Concorde library [Applegate et al., 2003b], Helsgaun’s LK variant [Helsgaun, 2003] and Neto’s LK implementation [Neto, 2003].
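A sketch of the array representation and its constant-time queries (our own code; phi and psi correspond to φ and ψ above):

class ArrayTour:
    def __init__(self, tour):
        self.phi = list(tour)                # phi: position -> vertex
        self.psi = [0] * len(tour)           # psi: vertex -> position (inverse of phi)
        for pos, v in enumerate(tour):
            self.psi[v] = pos

    def successor(self, v):                  # O(1)
        return self.phi[(self.psi[v] + 1) % len(self.phi)]

    def predecessor(self, v):                # O(1)
        return self.phi[(self.psi[v] - 1) % len(self.phi)]

    def between(self, ui, uk, uj):           # O(1): is uk visited between ui and uj?
        i, k, j = self.psi[ui], self.psi[uk], self.psi[uj]
        return (i < k < j) or (j < i < k) or (k < j < i)

The missing move operation, a segment reversal that must also update psi, is precisely the empirically O(n^0.7) bottleneck discussed above.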
Local Search for the Asymmetric TSP

ATSP algorithms are generally much less studied than algorithms for symmetric TSP instances; in particular, this is true for construction heuristics as well as for ‘simple’ local search methods. Although most construction heuristics are directly applicable to the ATSP, few computational results are available [Johnson et al., 2002]. Empirical results show that for the ATSP, in contrast to what is observed on symmetric TSP instances, the Nearest Neighbour Heuristic typically performs much better than the Greedy Heuristic. However, for many classes of ATSP instances, even better results (in terms of solution quality) are obtained by construction heuristics that are based on the assignment problem lower bound for the ATSP. These constructive search methods generate a tour by iteratively merging a set of vertex-disjoint simple directed cycles that covers all vertices of the given graph G at minimum total cost; such sets are obtained as a side product of the computation of the assignment problem lower bound. An early heuristic for the merging step was developed by Karp [1979]; a variant of this approach was recently proposed by Glover et al. [2001].

When applying iterative improvement algorithms to the ATSP, a slight complication arises from the fact that sub-tour reversals lead to changes in solution quality. While 2-exchange moves always involve sub-tour reversals, there is a specific 3-exchange move that preserves the direction of all partial tours. The iterative improvement methods based on this type of move are called reduced 3-opt algorithms; these are amongst the simplest iterative improvement algorithms for the ATSP. The speedup techniques described above for symmetric TSP algorithms can be directly applied to reduced 3-opt.
A variable depth search algorithm for the ATSP has been developed by Kanellakis and Papadimitriou [1980]. This KP algorithm can be seen as an adaptation of the Lin-Kernighan Algorithm to the ATSP; it makes use of double-bridge moves, a special type of non-sequential 4-exchange moves. An implementation of this method by Cirasella et al. [2001] has been shown to yield significantly better solution qualities than reduced 3-opt, albeit at the cost of substantially increased run-times [Johnson et al., 2002].
8.3 Iterated Local Search Algorithms for the TSP

Iterated Local Search (ILS), as introduced in Chapter 2, Section 2.3, offers a straightforward, yet flexible way of extending simple local search algorithms (see also the algorithm outline on page 86 and the GLSM model on page 136). ILS applications to the TSP have a long history, and some of the hybrid SLS algorithms thus obtained are amongst the best-performing TSP algorithms currently known.
Iterated Descent

Historically, Iterated Descent by Baum [1986a; 1986b] was the first ILS method for the TSP. Variants of Iterated Descent use different first-improvement methods as their subsidiary local search procedure, including 2-opt, a limited form of 3-opt that examines only part of the 3-exchange neighbourhood, and a first-improvement algorithm based on a 2-exchange neighbourhood on vertices, under which two candidate solutions are direct neighbours if, and only if, the corresponding vertex permutations differ in exactly two positions. The perturbation phase of Iterated Descent consists of a random 2-exchange step, and its acceptance criterion always selects the candidate solution with the better solution quality. Although Iterated Descent performs better than pure 2-opt and 3-opt local search, from today’s perspective its performance is not impressive. Most likely, the most substantial weakness of Iterated Descent is its perturbation mechanism. It is now also known that the 2-opt and 3-opt local search procedures used in Iterated Descent perform poorly on the RDM instances used in Baum’s empirical evaluation.
Large-Step Markov Chains (LSMC)

The Large-Step Markov Chains (LSMC) algorithm by Martin, Otto and Felten [1991; 1992] is the first high-performance ILS algorithm for the TSP. The name of this approach reflects the fact that the behaviour of LSMC (like that of many other ILS algorithms) can be modelled as a Markov chain (see, e.g., Papoulis [1991]) on the locally minimal candidate solutions obtained at the end of each local search phase, where the segment of the search trajectory between any two subsequent local minima corresponds to a ‘large step’.

One important contribution of the LSMC approach is the exploitation of a particular 4-exchange step, the so-called double-bridge move. A double-bridge move first removes four edges from the tour, resulting in a decomposition into four segments A, B, C, D. Then, these segments are reconnected in the order A, D, C, B by adding four new edges (for a graphical illustration, see Figure 2.11 on page 88). The double-bridge move was originally introduced by Lin and Kernighan [1973] in their LK algorithm, where this type of search step is applied within the iterative improvement process. However, the double-bridge move is typically not used in current variants of LK, with the notable exception of LK-H. In LSMC, a random double-bridge move is used to perturb a locally optimal tour. LSMC considers only double-bridge moves for which the combined weight of the four new edges is lower than a constant k times the average edge weight in the current locally optimal candidate solution. Originally, a value of k := 10 was used; but experimental results suggest that the performance of the algorithm is not very sensitive to the value of k, as long as it is not too small [Martin et al., 1992].

As its subsidiary local search procedure, the first LSMC algorithm initially used a 3-opt first-improvement search, which was later replaced by a more powerful LK algorithm. The local search procedure exploits three speedup techniques: (i) a type of fixed radius search that uses the minimum and the maximum weight edge in the current candidate solution for pruning the search for improving 3-exchange steps; (ii) a so-called change list, a concept that is equivalent to the use of don’t look bits; and (iii) a hash table for storing 3-opt candidate solutions, which is consulted for checking whether a tour has previously been identified to be locally optimal.

The acceptance criterion in LSMC is taken from Simulated Annealing: Given two candidate tours s and s′, where s′ has been obtained from s by perturbation and subsequent local search, s′ is always accepted if its solution quality is better than that of s; otherwise, s′ is accepted with probability exp((f(s) − f(s′))/T), where T is a parameter called temperature. However, in a later variant, an acceptance criterion was used that always accepts the better of the two candidate solutions. The resulting zero-temperature LSMC algorithm is also known as Chained Local Optimisation (CLO) [Martin and Otto, 1996].

LSMC with a subsidiary 3-opt local search procedure has been shown to solve small random Euclidean TSP instances with up to 200 cities in less than one CPU hour on a SUN SPARC 1 workstation (a very slow computer compared to current PCs). Relatively good performance was also observed on several TSPLIB instances. For example, LSMC found an optimal solution of instance lin318 in about four CPU hours on the SUN SPARC 1; by using an LK algorithm as the subsidiary local search procedure, the time required for solving this instance optimally was reduced by a factor of about four. In this particular case, however, it was shown to be essential to use non-zero temperatures in the LSMC acceptance criterion. LSMC with the LK subsidiary local search procedure also found optimal solutions to several larger TSPLIB instances, including att532 and rat783 [Martin et al., 1991].
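The double-bridge perturbation at the heart of LSMC and most later ILS algorithms for the TSP takes only a few lines of code (a sketch, ours; note that none of the four segments is reversed):

import random

def double_bridge(tour):
    n = len(tour)
    i, j, k = sorted(random.sample(range(1, n), 3))   # three cut points -> segments A|B|C|D
    a, b, c, d = tour[:i], tour[i:j], tour[j:k], tour[k:]
    return a + d + c + b                              # reconnect in the order A, D, C, B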
Iterated Lin-Kernighan

An early variation on LSMC is the Iterated Lin-Kernighan (ILK) algorithm developed by Johnson and McGeoch [Johnson, 1990; Johnson and McGeoch, 1997]. There are several key differences between the LSMC algorithm and Johnson’s ILK. Firstly, the acceptance criterion used in ILK always selects the better of the two locally optimal candidate solutions. Secondly, the perturbation phase does not make use of the limiting condition on the edge weights of a double-bridge move imposed in the LSMC approach, but applies random double-bridge moves instead (i.e., the four cut points are chosen uniformly at random). Thirdly, the local search is initialised with a randomised version of the Greedy Heuristic, which, instead of deterministically selecting the minimum weight feasible edge, chooses between the two shortest feasible edges, selecting the shorter one with probability 2/3 and the other one in the remaining cases. Finally, ILK uses a substantially more efficient implementation of LK, and consequently ILK performs better than LSMC, especially when considering computation time.

Early results for ILK (as of 1990) were quite promising: Applied to TSPLIB instances with 318 to 2 392 vertices, optimal solutions were obtained (for the 2 392 vertex instance, this required about 55 CPU hours on a Sequent computer [Johnson, 1990], an extremely slow machine compared to current PCs). The ILK algorithm was further fine-tuned and extensively tested for an overview article on the state of the art in incomplete TSP algorithms [Johnson and McGeoch, 1997]. The main differences between the 1997 variant and the earlier ILK algorithm appear to be the exploitation of don’t look bits after the double-bridge move and the use of a bound on the depth of the LK search. This ‘production-mode ILK’ was shown to achieve optimal or close to optimal solutions on a variety of random Euclidean instances and TSPLIB instances. Difficulties in finding solutions within less than one percent of the optimum solution quality were reported only for TSPLIB instance fl3795, which shows a pathological clustering similar to that of instance fl1577 depicted in Figure 8.1 on page 360. Running times were rather modest; for example, on 10 000 vertex RUE instances, n iterations of ILK took approximately 1 570 seconds on an SGI Challenge 196 MHz machine [Johnson and McGeoch, 1997]. The ILK algorithm was for a considerable time the state-of-the-art SLS algorithm for the TSP.
Chained Lin-Kernighan

Like ILK, the Chained Lin-Kernighan algorithm (CLK-ABCC) by Applegate et al. [1999] uses the LK algorithm as its subsidiary local search procedure. CLK-ABCC differs from ILK in various details of the LK local search, including its use of smaller candidate sets (by default, it uses quadrant nearest neighbour sets of size 12); it also uses a different perturbation mechanism that affects only a locally restricted part of a candidate solution, and it initialises the search using the Quick-Borůvka construction heuristic (see Section 8.2). For further details on the LK local search procedure used in CLK-ABCC, we refer to Applegate et al. [1999] and to the original CLK-ABCC code, which is available from the web page of Applegate et al. [2003b]. In the following, we focus on some of the other algorithmic features of CLK-ABCC, particularly the perturbation mechanism.

The standard CLK-ABCC algorithm uses so-called geometric double-bridge moves as perturbation steps; these are based on the following method for selecting the four edges to be removed from the current candidate tour s. For convenience, a direction is imposed on s, and we denote the arc between vertex u and its direct tour successor succ(u) as (u, succ(u)). In a first step, a set U of min{0.001 · n, 10} vertices is randomly sampled from the given graph G := (V, E, w). Then, among the edges (u, succ(u)) contained in s with u ∈ U, the one with the maximal difference w((u, succ(u))) − w((u, u∗)), where u∗ is the nearest neighbour of u (that is, the vertex in V that minimises w((u, u∗))), is removed from s. In a second step, three vertices are chosen uniformly at random from the k nearest neighbours of vertex u. Let u1, u2 and u3 denote these vertices; then, the edges (ui, succ(ui)), i = 1, 2, 3, are removed. The four edges chosen in this process determine the double-bridge move used for perturbation. The value of k controls the locality of the perturbation: for small k, a geometric double-bridge move results in a localised perturbation that only affects edges close to one specific vertex, while for large k, less localised perturbations are obtained.

An interesting detail of the CLK-ABCC algorithm is the resetting strategy for the don’t look bits (DLBs) that is applied after a perturbation. Many ILS algorithms for the TSP, including LSMC and ILK, reset only the DLBs of vertices that are incident to edges changed by the perturbation; then, only these vertices are considered as starting points for the search for an improving move. CLK-ABCC additionally resets the DLBs of all vertices that are at most ten edges away from the endpoints of the modified edges in the current tour, as well as the DLBs of the vertices in the neighbour sets of these endpoints.
388
Chapter 8 Travelling Salesman Problems
Experimental results suggest that among several alternative DLB resetting strategies, this mechanism leads to the best overall performance of CLK-ABCC [Applegate et al., 1999]. (Similar observations have been made independently by Stützle [1998c] in the context of Iterated 2-opt and 3-opt for the TSP.)

The Chained Lin-Kernighan algorithm by Applegate, Cook and Rohe (CLK-ACR) is a variant of CLK-ABCC that differs from this earlier algorithm in two main aspects [Applegate et al., 2003c]. Firstly, CLK-ACR uses a different mechanism for selecting the double-bridge move in the perturbation step. The first edge to be deleted, (u, succ(u)), is chosen as described for CLK-ABCC. The three other edges to be removed are of the form (ui, succ(ui)), i = 1, 2, 3, where each vertex ui is obtained as the endpoint of a random walk of length l in the neighbourhood graph, starting from vertex u. The neighbourhood graph is defined as (V, E′), where V is the set of vertices of the given instance and E′ is the set of all edges of the form (u, v) such that v is in the candidate list of u in the LK heuristic. (As in CLK-ABCC, by default, the candidate lists consist of the 12 quadrant nearest neighbours.) The locality of the perturbation can be controlled by varying the length l of the random walk; it was found that for long runs, better results are obtained when using higher values of l [Applegate et al., 2003c]. Secondly, CLK-ACR is based on a subsidiary LK search procedure that differs from the one used in CLK-ABCC w.r.t. the depth and the breadth of the backtracking mechanism.

A performance comparison of CLK-ABCC and CLK-ACR in the context of the 8th DIMACS Implementation Challenge on the TSP revealed that CLK-ACR is slightly superior on the largest instances tested (see Johnson and McGeoch [2002] as well as the DIMACS challenge web pages by Johnson et al. [2003a]). It is somewhat unclear how the performance of these algorithms compares to that of ILK. When running both algorithms for n iterations, ILK finds better-quality tours than CLK-ACR or CLK-ABCC on most instances; however, the run-times required by ILK are several times larger than those of CLK-ACR and CLK-ABCC. (The difference corresponds to a factor of two to five for RUE instances and is substantially larger for TSPLIB and RCE instances.) The CLK-ACR code is capable of handling extremely large TSP instances and has been applied to instances with up to 25 million vertices, where it reached a solution quality within 1% of the estimated optimum in 24 CPU hours on an IBM RS6000 Model 43-P 260 workstation with 4 GB RAM; given a total run-time of 8 CPU days, a solution quality within 0.3% of the estimated optimum was obtained.
Iterated Helsgaun (ILK-H)
procedure constructionILK-H(G, ŝ)
    input: weighted graph G := (V, E, w), incumbent candidate solution ŝ
    output: candidate solution s ∈ S(π)

    p := empty tour;
    ui := selectVertexRandomly(V);
    append vertex ui to partial tour p;
    while p is not a complete tour do
        C := {uj | (ui, uj) is a candidate edge ∧ α((ui, uj)) = 0 ∧ (ui, uj) ∈ ŝ};
        if C = ∅ then
            C := {uj | (ui, uj) is a candidate edge};
        end
        if C = ∅ then
            C := {uj | uj not chosen yet};
        end
        uj := choosePseudoRandomVertex(C);
        append vertex uj to partial tour p;
        ui := uj;
    end
    return p
end constructionILK-H

Figure 8.7 The construction procedure used in the perturbation phase of the ILK-H algorithm. (For details, see text.)
This leads to the Iterated Helsgaun algorithm (ILK-H) [Helsgaun, 2000], which is one of the best-performing SLS algorithms for the TSP currently known in terms of the solution quality reached [Helsgaun, 2000; Johnson and McGeoch, 2002]. Since the subsidiary local search procedure of LK-H may itself use double-bridge moves, a perturbation mechanism based on this type of move can be expected to be insufficient. Instead, the perturbation mechanism used in ILK-H is based on a construction heuristic that is strongly biased by the incumbent candidate solution. This constructive search procedure, shown in Figure 8.7, iteratively builds a candidate solution for a given TSP instance in a manner similar to the Nearest Neighbour Heuristic. Starting from a randomly selected vertex, in each step the partial tour p is extended with a vertex uj that is not contained in the current partial tour. Let ui be the current end point of p. Then, vertex uj is chosen for extending the partial tour if the edge (ui, uj) is contained in the incumbent candidate solution, (ui, uj) is contained in the candidate list for vertex ui, and α((ui, uj)) = 0 (see Section 8.2, page 375 for the definition of α-values). If at any stage of the search process no such vertex exists, a vertex uj contained in
ui's candidate list is chosen, if possible; in particular, the vertex with the smallest index in ui's candidate list that is not contained in the current partial tour p is chosen. If no such vertex can be found, a list of all vertices is traversed until a vertex uj is found that is not contained in p. (A Python sketch of this construction procedure is given below.) The acceptance criterion used in ILK-H only accepts tours that lead to an improvement in the incumbent candidate solution. In addition to standard speed-up techniques, such as don't look bits, ILK-H uses hashing techniques (originally described by Lin and Kernighan [1973]) to efficiently check whether a candidate solution has previously been identified as a local optimum. Further details of ILK-H can be found in its source code [Helsgaun, 2003].

ILK-H finds optimal solutions for many TSPLIB instances with up to several thousand vertices within relatively short run-times (CPU minutes on a high-performance PC). Longer runs of ILK-H resulted in improvements over the best known solutions for the largest unsolved TSPLIB instances, for many of the instances from VLSI design, and for the World TSP instance mentioned in Section 8.1; in the latter case, lower bound computations have shown that the solution found by ILK-H deviates by at most 0.098% from the optimal solution quality (see Applegate et al. [2003b]).
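A direct transliteration of the construction procedure from Figure 8.7 into Python might look as follows; the representations of the candidate lists, α-values and incumbent tour, as well as the tie-breaking rule, are assumptions made for illustration.

    import random

    def construction_ilk_h(vertices, candidate_lists, alpha, incumbent_edges):
        # vertices:        list of all vertices of the instance
        # candidate_lists: dict mapping each vertex to its candidate list
        # alpha:           function alpha(u, v) giving the alpha-value of (u, v)
        # incumbent_edges: set of frozenset({u, v}) edges of the incumbent tour
        unvisited = set(vertices)
        u = random.choice(vertices)
        tour = [u]
        unvisited.remove(u)
        while unvisited:
            # first preference: unvisited candidate neighbours v of u with
            # alpha-value 0 whose edge (u, v) occurs in the incumbent tour
            C = [v for v in candidate_lists[u]
                 if v in unvisited and alpha(u, v) == 0
                 and frozenset((u, v)) in incumbent_edges]
            if not C:
                # fallback 1: any unvisited candidate neighbour of u
                C = [v for v in candidate_lists[u] if v in unvisited]
            if not C:
                # fallback 2: any vertex not yet contained in the partial tour
                C = list(unvisited)
            # stand-in for choosePseudoRandomVertex; the text describes a
            # deterministic smallest-index rule for the fallback cases
            v = random.choice(C)
            tour.append(v)
            unvisited.remove(v)
            u = v
        return tour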
Example 8.3 Performance of ILS Algorithms for the TSP

Applied to large TSP instances, state-of-the-art ILS algorithms can find solutions whose quality is within fractions of a percent of the optimum within reasonable computation times. Figure 8.8 shows qualified run-time distributions (QRTDs) for CLK-ABCC and ILK-H on TSPLIB instance pcb3038 with 3 038 vertices, based on 100 independent runs of each of the two algorithms on an Athlon 1.2 GHz MP CPU with 1 GB RAM running SuSE Linux 7.3; each run was terminated as soon as either the given target solution quality had been reached or a cutoff time of 10 000 CPU seconds had elapsed. The parameters of both algorithms were set to the respective default values in the publicly available codes. (For the experiments reported here, the 99.12.15 release of CLK-ABCC and version 1.2 of ILK-H were used.) These QRTDs show that both algorithms very quickly reach solution qualities that are at most 0.5% above the optimum (the median and maximum computation times to find solutions of this quality were measured as 1.4 and 15.7 CPU seconds for CLK-ABCC and 1.5 and 2.5 CPU seconds for ILK-H, respectively). Furthermore, CLK-ABCC can reach higher solution qualities (e.g., within 0.1% of the optimum) with a reasonably high probability, but it starts showing severe stagnation behaviour, as can be seen from the exponential distribution on the left side of Figure 8.8, which indicates the expected run-time behaviour when using an optimal static restart strategy.
[Figure: two QRTD plots with run-time [CPU sec] on the x-axis (logarithmic scale, 0.1 to 10 000) and P(solve) on the y-axis. Left panel: curves CLK-ABCC 0.5, CLK-ABCC 0.25, CLK-ABCC 0.1 and ed[161]. Right panel: curves ILK-H 0.5, ILK-H 0.25, ILK-H 0.1, ILK-H opt and ed[133].]

Figure 8.8 Left: Qualified run-time distributions for CLK-ABCC on TSPLIB instance pcb3038, using various target solution quality values (shown as percentage deviations from the optimum). Right: Analogous results for ILK-H. The exponential distributions ed[m] are shown to illustrate stagnation behaviour of the respective algorithm. (For further details, see text.)
(We refer to Chapter 4, Section 4.4 for details on how exponential distributions can be used for detecting stagnation behaviour.) CLK-ABCC has difficulties in reaching extremely high-quality solutions, as witnessed by the fact that in our experiment, it found an optimal solution of pcb3038 in only one of 100 runs of 10 000 CPU seconds each. In contrast, within the same time, ILK-H solves instance pcb3038 optimally with a probability of more than 0.9. However, ILK-H is also affected by stagnation behaviour, as can be seen from the comparison with the exponential distribution shown on the right side of Figure 8.8.

ILS algorithms can reach close-to-optimal solution qualities for much larger instances. In Table 8.3, we give some indicative performance results for an iterated-3-opt algorithm (an implementation of which is available from www.sls-book.net), CLK-ABCC and ILK-H on several TSPLIB instances ranging from 4 461 to 13 509 vertices. Here, each algorithm was run 10 times with a time limit of 600 CPU seconds; the table shows basic statistics of the solution quality distributions obtained for this run-time. On the smallest of these instances, ILK-H found the known optimal solution in 6 of 10 runs; none of the three algorithms found optimal solutions for the three other instances within the given computation time limit. High-performance ILS algorithms for the TSP have been shown to be able to find high-quality solutions for even larger problem instances. For example, when running CLK-ABCC for 600 seconds on the largest TSPLIB instance, pla85900, the average solution quality measured across 10 runs was 0.31% above the best known lower bound (as of September 2003).
               Iterated-3-opt            CLK-ABCC                  ILK-H
Instance     ∆min   ∆avg   ∆max      ∆min    ∆avg   ∆max      ∆min    ∆avg    ∆max

fnl4461      0.20   0.28   0.35      0.03    0.08   0.11      0.0     0.0016  0.0027
pla7397      0.22   0.35   0.48      0.040   0.19   0.33      0.011   0.029   0.054
rl11849      0.44   0.66   0.76      0.16    0.25   0.39      0.044   0.062   0.095
usa13509     0.39   0.51   0.58      0.097   0.13   0.17      0.029   0.038   0.055

Table 8.3 Solution quality statistics for an iterated-3-opt algorithm, CLK-ABCC and ILK-H on several large TSPLIB instances. ∆min, ∆avg and ∆max denote the minimum, average and maximum percentage deviation from the known optimal solution qualities over 10 runs of 600 CPU seconds each on an Athlon 1.2 GHz MP CPU with 1 GB of RAM.
Other Perturbation Mechanisms

Generally, the perturbation mechanism and its relation to the subsidiary local search procedure can have a significant impact on the performance of an ILS algorithm; consequently, a wide range of perturbation mechanisms has been proposed and studied in the context of ILS algorithms for the TSP. Hong et al. [1997] have studied ILS algorithms based on 2-opt, 3-opt and LK local search that use single random k-exchange steps with fixed values of k between 2 and 50 for perturbation. The resulting algorithms have been empirically evaluated on TSPLIB instances lin318 and att532, as well as on an RDM instance with 800 vertices. Although there is some indication that on TSPLIB instances, perturbations with k > 4 result in better solution qualities after a fixed number of local search steps, it is unclear whether these results also hold when using a termination criterion based on a bound on CPU time, or whether these observations generalise to other TSP instances.

Perturbations can be more complex than simple (random) k-exchange steps. One example of a complex perturbation is the mechanism proposed by Codenotti et al. [1996], which involves modifications of the instance data. Their perturbation procedure works as follows. First, the given metric TSP instance G is slightly modified into an instance G′ by introducing small perturbations of the edge weights. (For Euclidean TSP instances, this is achieved by changing the coordinates of some vertices.) Note that as a result of this modification, a previously locally optimal tour s may no longer be a local minimum. Then, the subsidiary local search procedure is run on G′ until a local minimum s′ is found (see Figure 8.9 for an illustration of this procedure). At this point, the modified instance G′ is discarded, and s′ is returned as the overall result of the perturbation; it provides the starting point of the subsequent local search phase for the original instance G. (A sketch of this mechanism is given below.)
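The following Python sketch illustrates the overall structure of such a data perturbation for Euclidean instances; the number of displaced vertices, the displacement scale and the interfaces of the helper functions are illustrative assumptions, not details of the implementation by Codenotti et al.

    import random

    def data_perturbation(coords, tour, local_search,
                          num_moved=3, scale=0.01):
        # coords:       dict mapping each vertex to its (x, y) coordinates
        # tour:         locally optimal tour for the original instance
        # local_search: function (coords, tour) -> locally optimal tour
        # step 1: slightly displace the coordinates of a few randomly
        # chosen vertices, yielding a modified instance G'
        perturbed = dict(coords)
        for v in random.sample(list(coords), num_moved):
            x, y = perturbed[v]
            perturbed[v] = (x + scale * random.uniform(-1.0, 1.0),
                            y + scale * random.uniform(-1.0, 1.0))
        # step 2: run the subsidiary local search on G'; the resulting
        # tour s' is the perturbation result, and G' is then discarded
        return local_search(perturbed, tour)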
[Figure: two tour drawings over vertices v1 to v8, connected by the steps '1. coordinate perturbation' and '2. local search'.]

Figure 8.9 Example of a perturbation based on instance data modifications for a small Euclidean TSP instance. Here, the data perturbation corresponds to a modification of the coordinates of vertices v1, v2 and v5. Left: locally optimal tour s for the original coordinates; right: locally optimal tour s′ for the perturbed coordinates.
There is some indication that in a standard ILS algorithm, using this perturbation mechanism (despite its relatively high time complexity) can result in slightly better performance than using simple double-bridge perturbation [Codenotti et al., 1996]. However, state-of-the-art ILS algorithms for the TSP, such as CLK-ABCC, typically achieve much better performance [Applegate et al., 2003c]. (It should be noted that this general perturbation approach had already been proposed and successfully applied in the context of a very early ILS algorithm for a location problem [Baxter, 1981].)

Another interesting perturbation mechanism is the genetic transformation (GT) of Katayama and Narihisa [1999], which introduces ideas from Evolutionary Algorithms into ILS. The GT procedure is based on the intuition that sub-tours that are common between the current incumbent tour, ŝ, and the current locally optimal tour, t, should be preserved. It works as follows. First, all common sub-tours of ŝ and t are determined; this can be achieved in time O(n), where n is the number of vertices in the given TSP instance. Then, the perturbation result is obtained by connecting these sub-tours using a procedure that works analogously to the Nearest Neighbour Heuristic. The starting vertex ui is chosen at random from the set of vertices that have zero or one incident edges; in the former case, the sub-tour consists of a single vertex, while in the latter case, ui is the end point of a sub-tour with at least two vertices. Then, in each construction step, ui is connected to an eligible vertex uj for which w((ui, uj)) is minimal. The construction is continued from uj if that vertex has degree one, or otherwise from the free end point of the sub-tour starting at uj.
procedure GILS(G)
    input: weighted graph G
    output: candidate tour ŝ

    s := init(G);
    s := localSearch(G, s);
    t := init(G);
    t := localSearch(G, t);
    if f(s) < f(t) then
        ŝ := s;
    else
        ŝ := t;
    end
    while not terminate(G, ŝ) do
        t′ := GT(G, ŝ, t);
        t′ := localSearch(G, t′);
        if f(t′) < f(ŝ) then
            ŝ := t′;
        end
        t := t′;
    end
    return ŝ
end GILS

Figure 8.10 Algorithm outline of Genetic Iterated Local Search (GILS) for the TSP; the function GT implements the GT perturbation mechanism. (For further details, see text.)
(This construction process is similar to the mechanism underlying the DPX recombination operator, which has been used in an earlier memetic algorithm [Freisleben and Merz, 1996; Merz and Freisleben, 1997].) The GT perturbation mechanism is embedded into the Genetic Iterated Local Search (GILS) algorithm for the TSP, outlined in Figure 8.10. Empirical comparisons between this algorithm and a variant that uses the standard double-bridge move for perturbation have shown that using the GT perturbation can result in significant improvements in solution quality [Katayama and Narihisa, 1999].
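The first phase of the GT procedure, determining the sub-tours common to ŝ and t, can be sketched in Python as follows; tours are assumed to be given as vertex sequences over the same vertex set, edges are compared as unordered pairs, and the bookkeeping of the original O(n) implementation is simplified for clarity. (The second phase, the nearest-neighbour-style reconnection of the fragments, is omitted.)

    def common_fragments(tour_a, tour_b):
        # split tour_a into the maximal paths whose edges also occur in tour_b
        n = len(tour_a)
        edges_b = {frozenset((tour_b[i], tour_b[(i + 1) % n]))
                   for i in range(n)}
        fragments, current = [], [tour_a[0]]
        for i in range(n):
            u, v = tour_a[i], tour_a[(i + 1) % n]
            if frozenset((u, v)) in edges_b:
                current.append(v)          # shared edge: extend the fragment
            else:
                fragments.append(current)  # fragment ends at u
                current = [v]              # next fragment starts at v
        # the final fragment wraps around to the start of tour_a; drop the
        # duplicated first vertex and merge it with the first fragment
        if fragments:
            fragments[0] = current[:-1] + fragments[0]
        else:
            fragments = [current[:-1]]     # the two tours are identical
        return fragments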
Other Acceptance Criteria

While various choices for the subsidiary local search and perturbation procedures have been studied in the literature, much less attention has been paid
to the acceptance criterion, although it can have a strong impact on the balance between diversification and intensification of the search process. In fact, most ILS implementations for the TSP accept only improving tours. However, other acceptance criteria are occasionally used. For example, the previously mentioned LSMC algorithms by Martin, Otto and Felten use a Metropolis acceptance criterion (see page 384ff. for details on LSMC), for which non-zero temperatures have been shown to improve the performance of their algorithms on some instances. Simulated-annealing-type acceptance criteria were also examined by Rohe [1997], who found that for one TSPLIB instance, d18512, slight improvements in solution quality over a standard ILK algorithm could be obtained when using a carefully tuned annealing schedule and long computation times.

Hong, Kahng and Moon have studied a variant of ILS they call hierarchical LSMC, which is based on the same Metropolis acceptance criterion as LSMC and uses the following mechanism for controlling the temperature parameter. By default, the temperature is set to zero, that is, only improving candidate solutions are accepted. However, when search stagnation is detected, the temperature is set to f(si)/200 for the following 100 iterations; consequently, a deterioration of the tour length by 0.5 percent is accepted with a probability of 1/e. After 100 iterations, the temperature is reset to zero. The criterion used for detecting search stagnation is satisfied if, and only if, no improvement in the incumbent tour has been obtained for ir iterations. The value of ir is chosen depending on the subsidiary local search procedure: for 2-opt or 3-opt local search, ir := 2 · n is used, while in hierarchical LSMC with LK local search, ir is set to 100. Limited empirical results suggest that on some TSPLIB instances, hierarchical LSMC may perform slightly better than other ILS algorithms for the TSP, but it is unclear whether this performance advantage is statistically significant.

A more detailed study of different acceptance criteria has been undertaken by Stützle and Hoos [Stützle, 1998c; Stützle and Hoos, 2001], who performed an empirical analysis of the run-time distributions of ILS algorithms that accept only improving solutions. (This type of analysis is illustrated in Example 8.3, page 390f.) Motivated by the severe stagnation behaviour that was observed in this study for very high target solution qualities, two acceptance criteria were introduced that are designed to increase search diversification. The first of these is a simple dynamic restart criterion (see also Section 4.4), which restarts ILS from a new initial solution if no improved solution has been found for ir iterations. This dynamic restart acceptance criterion can be defined as
acceptSR(G, s, s′) :=
    ⎧ s′         if w(s′) < w(s)
    ⎨ s          if w(s′) ≥ w(s) and i − î < ir
    ⎩ init(G)    otherwise
where i denotes the current iteration number, î is the iteration number of the most recent improvement in the incumbent candidate solution, and init(G) is a newly generated initial tour for the given TSP instance G.

Note that every time the search is restarted from a new initial candidate solution, some initial, instance-dependent time tinit has to be spent before there is a reasonable chance of encountering high-quality candidate solutions. To avoid this disadvantage, a less radical and more directed fitness-distance diversification mechanism can be used. This mechanism is based on a variant of the dynamic restart criterion acceptSR(G, s, s′) in which the restart function init(G) is replaced by fdd(ŝ, G), a function that attempts to find a high-quality candidate solution beyond a certain minimum distance from the incumbent tour ŝ. In the following, we use d(s, s′) to denote the bond distance between two tours s and s′, that is, the number of edges contained in s but not in s′, or vice versa. The function fdd(ŝ, G) works as follows:

1. Generate a set P comprising p copies of ŝ.
2. Apply one perturbation step followed by a subsidiary local search phase to each candidate solution in P.
3. Let Q be the set of the q highest-quality candidate solutions from P (where 1 ≤ q ≤ p).
4. Let s̃ be a candidate solution from Q with maximal distance to ŝ. If d(s̃, ŝ) ≤ dmin and a given maximal number of iterations has not been exceeded, then go to step 2; otherwise, return s̃.

Note that step 3 ensures that high-quality candidate solutions are obtained, while the goal of step 4 is increased search diversification; a Python sketch of this procedure is shown below. In Stützle and Hoos [2001], the parameter dmin is determined by first computing the average distance davg between a number of local optima; each time fdd is called, dmin is alternately set to 0.25 · davg and 0.5 · davg. Several variations of this fitness-distance diversification mechanism have also been studied, but none of them was found to achieve significantly better performance.
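A compact Python sketch of the dynamic restart criterion acceptSR and of the fdd procedure follows; the parameter defaults and the interfaces of the perturbation, local search, weight and distance functions are illustrative assumptions.

    def accept_sr(G, s, s_new, i, i_best, i_r, w, init):
        # i:      current iteration number
        # i_best: iteration of the most recent incumbent improvement
        # i_r:    restart threshold; w: tour weight; init: fresh initial tour
        if w(s_new) < w(s):
            return s_new        # accept improving candidate solutions
        if i - i_best < i_r:
            return s            # reject, continue from the current tour
        return init(G)          # stagnation: restart from a new tour

    def fdd(s_hat, G, perturb, local_search, w, d,
            p=10, q=3, d_min=25.0, max_rounds=20):
        # steps 1-4 of the fitness-distance diversification procedure
        P = [s_hat] * p                                        # step 1
        for _ in range(max_rounds):
            P = [local_search(G, perturb(G, s)) for s in P]    # step 2
            Q = sorted(P, key=w)[:q]                           # step 3
            s_tilde = max(Q, key=lambda s: d(s, s_hat))        # step 4
            if d(s_tilde, s_hat) > d_min:
                break
        return s_tilde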
Example 8.4 Effectiveness of Acceptance Criteria

The performance results in Table 8.4 illustrate the influence of various acceptance criteria on the performance of an iterated 3-opt algorithm. (Similar results for ILS algorithms based on 2-opt and LK local search procedures can be found in Stützle and Hoos [2001].) The perturbation procedure of this ILS algorithm is a slight variation of the geometric double-bridge perturbation used in CLK-ABCC. ILS-Descent, ILS-Restart and ILS-FDD denote three variants of the algorithm that differ solely in their acceptance criterion; while ILS-Descent accepts only improving candidate solutions, ILS-Restart and
               ILS-Descent               ILS-Restart               ILS-FDD
Instance     fopt   ∆avg    tavg      fopt   ∆avg    tavg      fopt   ∆avg    tavg

rat783       0.71   0.029   238.8     0.51   0.018   384.9     1.0    0       159.5
pcb1173      0      0.26    461.4     0      0.040   680.2     0.56   0.011   652.9
d1291        0.08   0.29    191.2     0.68   0.012   410.8     1.0    0       245.4
fl1577       0.12   0.52    494.6     0.92