PARALLEL METAHEURISTICS A New Class of Algorithms
Edited by
Enrique Alba
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
WILEY SERIES ON PARALLEL AND DISTRIBUTED COMPUTING
Series Editor: Albert Y. Zomaya

Parallel and Distributed Simulation Systems / Richard Fujimoto
Mobile Processing in Distributed and Open Environments / Peter Sapaty
Introduction to Parallel Algorithms / C. Xavier and S. S. Iyengar
Solutions to Parallel and Distributed Computing Problems: Lessons from Biological Sciences / Albert Y. Zomaya, Fikret Ercal, and Stephan Olariu (Editors)
Parallel and Distributed Computing: A Survey of Models, Paradigms, and Approaches / Claudia Leopold
Fundamentals of Distributed Object Systems: A CORBA Perspective / Zahir Tari and Omran Bukhres
Pipelined Processor Farms: Structured Design for Embedded Parallel Systems / Martin Fleury and Andrew Downton
Handbook of Wireless Networks and Mobile Computing / Ivan Stojmenovic (Editor)
Internet-Based Workflow Management: Toward a Semantic Web / Dan C. Marinescu
Parallel Computing on Heterogeneous Networks / Alexey L. Lastovetsky
Performance Evaluation and Characterization of Parallel and Distributed Computing Tools / Salim Hariri and Manish Parashar
Distributed Computing: Fundamentals, Simulations and Advanced Topics, Second Edition / Hagit Attiya and Jennifer Welch
Smart Environments: Technology, Protocols, and Applications / Diane Cook and Sajal Das
Fundamentals of Computer Organization and Architecture / Mostafa Abd-El-Barr and Hesham El-Rewini
Advanced Computer Architecture and Parallel Processing / Hesham El-Rewini and Mostafa Abd-El-Barr
UPC: Distributed Shared Memory Programming / Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick
Handbook of Sensor Networks: Algorithms and Architectures / Ivan Stojmenovic (Editor)
Parallel Metaheuristics: A New Class of Algorithms / Enrique Alba (Editor)
Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Parallel metaheuristics : a new class of algorithms / edited by Enrique Alba.
  p. cm.
  ISBN-13 978-0-471-67806-9
  ISBN-10 0-471-67806-6 (cloth)
  1. Mathematical optimization. 2. Parallel algorithms. 3. Operations research. I. Alba, Enrique.
  T57.P37 2005
  519.6-dc22          2005001251

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents

Foreword
Preface
Contributors

Part I  INTRODUCTION TO METAHEURISTICS AND PARALLELISM

1  An Introduction to Metaheuristic Techniques
   Christian Blum, Andrea Roli, Enrique Alba
   1.1 Introduction
   1.2 Trajectory Methods
   1.3 Population-Based Methods
   1.4 Decentralized Metaheuristics
   1.5 Hybridization of Metaheuristics
   1.6 Conclusions
   References

2  Measuring the Performance of Parallel Metaheuristics
   Enrique Alba, Gabriel Luque
   2.1 Introduction
   2.2 Parallel Performance Measures
   2.3 How to Report Results
   2.4 Illustrating the Influence of Measures
   2.5 Conclusions
   References

3  New Technologies in Parallelism
   Enrique Alba, Antonio J. Nebro
   3.1 Introduction
   3.2 Parallel Computer Architectures: An Overview
   3.3 Shared-Memory and Distributed-Memory Programming
   3.4 Shared-Memory Tools
   3.5 Distributed-Memory Tools
   3.6 Which of Them?
   3.7 Summary
   References

4  Metaheuristics and Parallelism
   Enrique Alba, El-Ghazali Talbi, Gabriel Luque, Nouredine Melab
   4.1 Introduction
   4.2 Parallel LSMs
   4.3 Case Studies of Parallel LSMs
   4.4 Parallel Evolutionary Algorithms
   4.5 Case Studies of Parallel EAs
   4.6 Other Models
   4.7 Conclusions
   References

Part II  PARALLEL METAHEURISTIC MODELS

5  Parallel Genetic Algorithms
   Gabriel Luque, Enrique Alba, Bernabé Dorronsoro
   5.1 Introduction
   5.2 Panmictic Genetic Algorithms
   5.3 Structured Genetic Algorithms
   5.4 Parallel Genetic Algorithms
   5.5 Experimental Results
   5.6 Summary
   References

6  Parallel Genetic Programming
   F. Fernández, G. Spezzano, M. Tomassini, L. Vanneschi
   6.1 Introduction to GP
   6.2 Models of Parallel and Distributed GP
   6.3 Problems
   6.4 Real-Life Applications
   6.5 Placement and Routing in FPGA
   6.6 Data Classification Using Cellular Genetic Programming
   6.7 Concluding Discussion
   References

7  Parallel Evolution Strategies
   Günter Rudolph
   7.1 Introduction
   7.2 Deployment Scenarios of Parallel Evolutionary Algorithms
   7.3 Sequential Evolutionary Algorithms
   7.4 Parallel Evolutionary Algorithms
   7.5 Conclusions
   References

8  Parallel Ant Colony Algorithms
   Stefan Janson, Daniel Merkle, Martin Middendorf
   8.1 Introduction
   8.2 Ant Colony Optimization
   8.3 Parallel ACO
   8.4 Hardware Parallelization of ACO
   8.5 Other Ant Colony Approaches
   References

9  Parallel Estimation of Distribution Algorithms
   Julio Madera, Enrique Alba, Alberto Ochoa
   9.1 Introduction
   9.2 Levels of Parallelism in EDA
   9.3 Parallel Models for EDAs
   9.4 A Classification of Parallel EDAs
   9.5 Conclusions
   References

10  Parallel Scatter Search
   F. García, M. García, B. Melián, J. A. Moreno-Pérez, J. M. Moreno-Vega
   10.1 Introduction
   10.2 Scatter Search
   10.3 Parallel Scatter Search
   10.4 Application of Scatter Search to the p-Median Problem
   10.5 Application of Scatter Search to Feature Subset Selection
   10.6 Computational Experiments
   10.7 Conclusions
   References

11  Parallel Variable Neighborhood Search
   José A. Moreno-Pérez, Pierre Hansen, Nenad Mladenović
   11.1 Introduction
   11.2 The VNS Metaheuristic
   11.3 The Parallelizations
   11.4 Application of VNS for the p-Median
   11.5 Computational Experiments
   11.6 Conclusions
   References

12  Parallel Simulated Annealing
   M. Emin Aydin, Vecihi Yiğit
   12.1 Introduction
   12.2 Simulated Annealing
   12.3 Parallel Simulated Annealing
   12.4 A Case Study
   12.5 Summary
   References

13  Parallel Tabu Search
   Teodor Gabriel Crainic, Michel Gendreau, Jean-Yves Potvin
   13.1 Introduction
   13.2 Tabu Search
   13.3 Parallelization Strategies for Tabu Search
   13.4 Literature Review
   13.5 Two Parallel Tabu Search Heuristics for Real-Time Fleet Management
   13.6 Perspectives and Research Directions
   References

14  Parallel Greedy Randomized Adaptive Search Procedures
   Mauricio G. C. Resende, Celso C. Ribeiro
   14.1 Introduction
   14.2 Multiple-Walk Independent-Thread Strategies
   14.3 Multiple-Walk Cooperative-Thread Strategies
   14.4 Some Parallel GRASP Implementations
   14.5 Conclusion
   References

15  Parallel Hybrid Metaheuristics
   Carlos Cotta, El-Ghazali Talbi, Enrique Alba
   15.1 Introduction
   15.2 Historical Notes on Hybrid Metaheuristics
   15.3 Classifying Hybrid Metaheuristics
   15.4 Implementing Parallel Hybrid Metaheuristics
   15.5 Applications of Parallel Hybrid Metaheuristics
   15.6 Conclusions
   References

16  Parallel Multiobjective Optimization
   Antonio J. Nebro, Francisco Luna, El-Ghazali Talbi, Enrique Alba
   16.1 Introduction
   16.2 Parallel Metaheuristics for Multiobjective Optimization
   16.3 Two Parallel Multiobjective Metaheuristics
   16.4 Experimentation
   16.5 Conclusions and Future Work
   References

17  Parallel Heterogeneous Metaheuristics
   Francisco Luna, Enrique Alba, Antonio J. Nebro
   17.1 Introduction
   17.2 Heterogeneous Metaheuristics Survey
   17.3 Taxonomy of Parallel Heterogeneous Metaheuristics
   17.4 Frameworks for Heterogeneous Metaheuristics
   17.5 Concluding Remarks
   17.6 Annotated Bibliography
   References

Part III  THEORY AND APPLICATIONS

18  Theory of Parallel Genetic Algorithms
   Erick Cantú-Paz
   18.1 Introduction
   18.2 Master-Slave Parallel GAs
   18.3 Multipopulation Parallel GAs
   18.4 Cellular Parallel GAs
   18.5 Conclusions
   References

19  Parallel Metaheuristics Applications
   Teodor Gabriel Crainic, Nourredine Hail
   19.1 Introduction
   19.2 Parallel Metaheuristics
   19.3 Graph Coloring
   19.4 Graph Partitioning
   19.5 Steiner Tree Problem
   19.6 Set Partitioning and Covering
   19.7 Satisfiability Problems
   19.8 Quadratic Assignment
   19.9 Location Problems
   19.10 Network Design
   19.11 The Traveling Salesman Problem
   19.12 Vehicle Routing Problems
   19.13 Summary
   References

20  Parallel Metaheuristics in Telecommunications
   Sergio Nesmachnow, Héctor Cancela, Enrique Alba, Francisco Chicano
   20.1 Introduction
   20.2 Network Design
   20.3 Network Routing
   20.4 Network Assignment and Dimensioning
   20.5 Conclusions
   References

21  Bioinformatics and Parallel Metaheuristics
   Oswaldo Trelles, Andrés Rodríguez
   21.1 Introduction
   21.2 Bioinformatics at a Glance
   21.3 Parallel Computers
   21.4 Bioinformatic Applications
   21.5 Parallel Metaheuristics in Bioinformatics
   21.6 Conclusions
   References

Index
Foreword
Metaheuristics are powerful classes of optimization techniques that have gained a lot of popularity in recent years. These techniques can provide useful and practical solutions for a wide range of problems and application domains. The power of metaheuristics lies in their capability of dealing with complex problems with no or little knowledge of the search space, and thus they are particularly well suited to deal with a wide range of computationally intractable optimization and decision-making applications.

Rather simplistically, one can view metaheuristics as algorithms that perform directed random searches of possible solutions, optimal or near optimal, to a problem, until a particular termination condition is met or a predefined number of iterations has elapsed. At first sight this can be seen as a drawback, because the search for a solution may take so much time that the solution becomes impractical. Fortunately, many classes of metaheuristics are inherently parallelizable, and this has led researchers to develop parallelization techniques and efficient implementations. Of course, in some metaheuristics parallelization is much easier to achieve than in others, and with that come issues of implementation on actual parallel platforms. In earlier implementations the master-slave paradigm was the preferred model used to run metaheuristics, and it still is a valid approach for many classes of these algorithms. However, due to the great variety of computer architectures (shared memory processors, clusters, grids, etc.), other approaches have been developed, and more concerted work is needed in this direction. Moreover, another important issue is the development of parallelization tools and environments that ease the use of metaheuristics and extend their applicability range.

Professor Alba's new book, Parallel Metaheuristics, is a well-timed and worthy effort that provides a comprehensive and balanced blend of topics, implementations, and case studies. This volume will prove to be a very valuable resource for researchers and practitioners interested in using metaheuristics to solve problems in their respective disciplines. The book also serves as a repository of significant reference material, as the list of references that each chapter provides will serve as a useful source of further study.
Professor Albert Y. Zomaya
CISCO Systems Chair, Professor of Internetworking
The University of Sydney, Australia
May 2005
Preface

The present book is the result of an ambitious project to bring together the various visions of researchers in both the parallelism and metaheuristic fields, with a main focus on optimization. In recent years, devising parallel models of algorithms has been a healthy field for developing more efficient optimization procedures. What most people using these algorithms usually miss is the important idea that parallel models that run on multiple computers are substantially modified versions of the sequential solvers they have in mind. This of course means not only that the resulting algorithm is faster in wall clock time, but also that the underlying algorithm performing the actual search is a new one. These new techniques have their own dynamics and properties, many of them coming from the kind of separate decentralized search that they perform, while many others come from their parallel execution.

Creating parallel metaheuristics is just one way of improving an algorithm. Other approaches include designing hybrid algorithms (merging ideas from existing techniques), creating specialized operations for the problem at hand, and a plethora of fruitful research lines in the international arena. However, designing parallel metaheuristics carries an additional load of complexity, since doing it appropriately implies that the researcher must have background knowledge from the two combined fields: parallelism and metaheuristics. Clearly, this is difficult, since specialization is a must nowadays, and these two fields are naturally populated by often separate groups of people. Thus, many researchers in mathematics, engineering, business, physics, and pure computer science deal quite appropriately with the algorithms, but have no skills in parallelism. Complementarily, many researchers in the field of parallelism are quite skilled with parallel software tools, distributed systems, parallel languages, parallel hardware, and many other issues of high importance in complex applications; but the problem arises because these researchers often do not have deep knowledge of metaheuristics. In addition, there are also researchers who are application-driven in their daily work; they only want to apply the techniques efficiently, and do not have the time or resources (nor maybe the interest) in the algorithms themselves or in parallelism, just in the application.

This book is intended to serve all of them, and this is why I initially said that it tries to fulfill an ambitious goal. The reader will have to judge to what extent this goal is met in the contents provided in the different chapters. Most chapters contain a methodological first part dealing with the technique, in order to settle its expected behavior and the main lines that could lead to its parallelization. In a second part, chapters discuss how parallel models can be derived for the technique to become
more efficient and what the implications are for the resulting algorithms. Finally, some experimental analysis is included in each chapter in order to help understand the advantages and limits of each proposal from a practical point of view. In this way, researchers whose specialties lie in either domain can profit from the contents of each chapter. This is the way in which the central part of the book, entitled Parallel Metaheuristic Models (Chapters 5 to 17), was conceived.

There are of course some exceptions to this general chapter structure to make the book more complete. I added four initial chapters introducing the two fields (Chapters 1 to 4) and four trailing chapters dealing with theory and applications (Chapters 18 to 21). The resulting structure has three building blocks that offer the reader an opportunity to select the parts or chapters he/she is more interested in. The four initial chapters are targeted to a broad sector of readers who want to learn, in a short time, the most important topics and issues in metaheuristics and in parallelism, dealt with together or separately. The third part also includes an invited chapter on theoretical issues for Parallel Genetic Algorithms (a widely used metaheuristic) and three more chapters dealing with applications of these algorithms. Since the spectrum of potential applications is daunting, I decided to devote a chapter to complex applications in general to reach a large audience, plus two additional ones on interesting, influential, and internationally funded research lines, namely telecommunications and bioinformatics.

The whole work is targeted to a wide set of readers, ranging from specialists in parallelism, optimization, and application-driven research, to graduate courses and beginners with some curiosity about the advances and latest techniques in parallel metaheuristics. Since it is an edited volume, I was able to profit from well-known international researchers as well as from new research lines on related topics started recently; this is an important added value that a non-edited book could not show.

I would like to end this introduction with my profound acknowledgment to all authors contributing a chapter to this book, since any merit this work could deserve must be credited to them. Also, I thank the research group in my university in Málaga for all their effort and help in this project. I also appreciate the support received from Wiley during the whole editing process, as well as the decisive endorsement of Professor A. Zomaya to make this idea come true. To all of them, thank you very much.

My final words are of course for my family: my wife, Ana, and my children, Enrique and Ana, the three lights that are always guiding my life, anytime, anywhere.

Málaga, Spain
May 2005
ENRIQUE ALBA
Contributors

E. ALBA, University of Málaga, Spain
M. E. AYDIN, London South Bank University, UK
C. BLUM, Polytechnic University of Catalonia, Spain
H. CANCELA, University of La República, Uruguay
E. CANTÚ-PAZ, Lawrence Livermore National Laboratory, USA
F. CHICANO, University of Málaga, Spain
C. COTTA, University of Málaga, Spain
T. CRAINIC, Transport Research Center and University of Quebec at Montreal, Canada
B. DORRONSORO, University of Málaga, Spain
F. FERNÁNDEZ, University of Extremadura, Spain
F. GARCÍA, University of La Laguna, Spain
M. GARCÍA, University of La Laguna, Spain
M. GENDREAU, Transport Research Center and University of Montreal, Canada
N. HAIL, Transport Research Center and University of Montreal, Canada
P. HANSEN, GERAD and HEC Montréal, Canada
S. JANSON, University of Leipzig, Germany
F. LUNA, University of Málaga, Spain
G. LUQUE, University of Málaga, Spain
J. MADERA, University of Camagüey, Cuba
N. MELAB, University of Lille, France
B. MELIÁN, University of La Laguna, Spain
D. MERKLE, University of Leipzig, Germany
M. MIDDENDORF, University of Leipzig, Germany
N. MLADENOVIĆ, Mathematical Institute (SANU), Belgrade, Serbia and Montenegro
J. A. MORENO-PÉREZ, University of La Laguna, Spain
J. M. MORENO-VEGA, University of La Laguna, Spain
A. J. NEBRO, University of Málaga, Spain
S. NESMACHNOW, University of La República, Uruguay
A. OCHOA, ICIMAF (Institute of Cybernetics, Mathematics and Physics), Cuba
J.-Y. POTVIN, Transport Research Center and University of Montreal, Canada
M. G. C. RESENDE, AT&T Labs Research, Shannon Laboratory, USA
C. RIBEIRO, Universidade Federal Fluminense, Brazil
A. RODRÍGUEZ, University of Málaga, Spain
A. ROLI, University "G. D'Annunzio", Italy
G. RUDOLPH, Parsytec GmbH, Germany
G. SPEZZANO, University of Calabria, Italy
E.-G. TALBI, University of Lille, France
M. TOMASSINI, University of Lausanne, Switzerland
O. TRELLES, University of Málaga, Spain
L. VANNESCHI, University of Milano-Bicocca, Italy
V. YİĞİT, Atatürk University, Turkey
Part I Introduction to Metaheuristics and Parallelism
1
An Introduction to Metaheuristic Techniques

CHRISTIAN BLUM¹, ANDREA ROLI², ENRIQUE ALBA³
¹Universitat Politècnica de Catalunya, Spain
²Università degli Studi "G. D'Annunzio", Italy
³Universidad de Málaga, Spain
1.1 INTRODUCTION
In optimization we generally deal with finding, among many alternatives, a best (or good enough) solution to a given problem. Optimization problems occur everywhere in our daily life. Each one of us is constantly solving optimization problems, such as finding the shortest way from home to the workplace subject to traffic constraints, or organizing our agenda. (Most) human brains are pretty good at efficiently finding solutions to these daily problems. The reason is that they are still tractable, which means that their dimension is small enough to process them. However, these types of problems also arise at much bigger scales, such as, for example, making the most beneficial use of the airplane fleet of an airline with the aim of saving fuel and parking costs. These kinds of problems are usually so high-dimensional and complex that computer algorithms are needed for tackling them.

Optimization problems can be modelled by means of a set of decision variables with their domains and constraints concerning the variable settings. They naturally divide into three categories: (i) the ones with exclusively discrete variables (i.e., the domain of each variable consists of a finite set of discrete values), (ii) the ones with exclusively continuous variables (i.e., continuous variable domains), and (iii) the ones with discrete as well as continuous variables. As metaheuristics were originally developed for optimization problems from class (i), we restrict ourselves in this introduction to this class of problems, which is also called the class of combinatorial optimization problems, or CO problems. However, much can be said and extended to continuous and other similar domains.

According to Papadimitriou and Steiglitz [114], a CO problem P = (S, f) is an optimization problem in which we are given a finite set of objects S and an objective function f : S → R⁺ that assigns a positive cost value to each of the objects s ∈ S.
The goal is to find an object of minimal cost value.¹ The objects are typically integer numbers, subsets of a set of items, permutations of a set of items, or graph structures. An example is the well-known travelling salesman problem (TSP [92]). Other examples of CO problems are assignment, timetabling, and scheduling problems.

Due to the practical importance of CO problems, many algorithms to tackle them have been developed. These algorithms can be classified as either complete or approximate algorithms. Complete algorithms are guaranteed to find, for every finite-size instance of a CO problem, an optimal solution in bounded time (see [114, 111]). Yet, for CO problems that are NP-hard [63], no polynomial-time algorithm exists, assuming that P ≠ NP. Therefore, complete methods need exponential computation time in the worst case. This often leads to computation times too high for practical purposes. Thus, the use of approximate methods to solve CO problems has received more and more attention in the last 30 years. In approximate methods we sacrifice the guarantee of finding optimal solutions for the sake of getting good solutions in a significantly reduced amount of time. Among the basic approximate methods we usually distinguish between constructive heuristics and local search methods.

1.1.1 Constructive Heuristics
Constructive heuristics are typically the fastest approximate methods. They generate solutions from scratch by adding opportunely defined solution components to an initially empty partial solution. This is done until a solution is complete or other stopping criteria are satisfied. For the sake of simplicity, we henceforth assume that a solution construction stops in case the current (partial) solution cannot be further extended. This happens when no completion exists such that the completed solution is feasible, i.e., it satisfies the problem constraints. In the context of constructive heuristics, solutions and partial solutions are sequences (c₁, ..., cₖ) composed of solution components cⱼ from a finite set of solution components C (where |C| = n). This kind of solution is throughout the chapter denoted by s, respectively s^p in the case of partial solutions. Constructive heuristics first have to specify the set of possible extensions for each feasible (partial) solution s^p. This set, henceforth denoted by 𝔑(s^p), is a subset of C \ {c | c ∈ s^p}.² At each construction step one of the possible extensions is chosen, until 𝔑(s^p) = ∅, which means either that s^p is a solution or that s^p is a partial solution that cannot be extended to a feasible solution. The algorithmic framework of a constructive heuristic is shown in Algorithm 1. A notable example of a constructive heuristic is a greedy heuristic, which implements the procedure ChooseFrom(𝔑(s^p)) by applying a weighting function.
¹Note that minimizing over an objective function f is the same as maximizing over −f. Therefore, every CO problem can be described as a minimization problem.
²Note that constructive heuristics exist that may add several solution components at the same time to a partial solution. However, for the sake of simplicity, we restrict our description of constructive heuristics to the ones that add exactly one solution component at a time.
Algorithm 1 Constructive heuristic
  s^p = ()
  Determine 𝔑(s^p)
  while 𝔑(s^p) ≠ ∅ do
    c ← ChooseFrom(𝔑(s^p))
    s^p ← extend s^p by appending solution component c
    Determine 𝔑(s^p)
  end while
  output: constructed solution
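As a concrete illustration of this construction scheme, the following minimal Python sketch instantiates Algorithm 1 for the TSP, implementing ChooseFrom with the nearest-neighbor rule discussed next; the distance matrix and optional starting city are assumed inputs and are not fixed by the text.

```python
import random

def nearest_neighbor_tour(dist, start=None):
    """Greedy constructive heuristic (Algorithm 1) for the TSP.

    dist: n x n matrix of pairwise city distances (assumed symmetric).
    Returns a tour as a list of city indices.
    """
    n = len(dist)
    start = random.randrange(n) if start is None else start
    partial = [start]                      # s^p, the partial solution
    remaining = set(range(n)) - {start}    # the extension set N(s^p)
    while remaining:                       # while N(s^p) is not empty
        last = partial[-1]
        # greedy ChooseFrom: pick the extension with the best heuristic value,
        # here the city closest to the last city added
        c = min(remaining, key=lambda city: dist[last][city])
        partial.append(c)
        remaining.remove(c)
    return partial                         # a complete solution
```

Replacing the deterministic min by a random choice among the best-ranked candidates turns the same skeleton into the randomized construction used later by GRASP.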
A weighting function is a function that, sometimes depending on the current (partial) solution, assigns at each construction step a heuristic value η(c) to each solution component c ∈ 𝔑(s^p). Greedy heuristics choose at each step one of the extensions with the highest value. For example, a greedy heuristic for the TSP is the Nearest Neighbor Heuristic. The set of solution components is the set of nodes (cities) in G = (V, E). The algorithm starts by selecting a city i at random. Then, the current partial solution s^p is extended at each of n − 1 construction steps by adding the closest city j ∈ 𝔑(s^p) = V \ s^p. Note that in the case of the Nearest Neighbor Heuristic the heuristic values, which are chosen as the inverse of the distances between the cities, do not depend on the current partial solution. Therefore, the weighting function that assigns the heuristic values is called static. In cases in which the heuristic values depend on the current partial solution, the weighting function is called dynamic.

1.1.2 Local Search Methods

As mentioned above, constructive heuristics are often very fast, yet they often return solutions of inferior quality when compared to local search algorithms. Local search algorithms start from some initial solution and iteratively try to replace the current solution by a better one in an appropriately defined neighborhood of the current solution, where the neighborhood is formally defined as follows:
Definition 1 A neighborhood structure is a function N : S → 2^S that assigns to every s ∈ S a set of neighbors N(s) ⊆ S. N(s) is called the neighborhood of s.

Often, neighborhood structures are implicitly defined by specifying the changes that must be applied to a solution s in order to generate all its neighbors. The application of such an operator that produces a neighbor s′ ∈ N(s) of a solution s is commonly called a move. A neighborhood structure together with a problem instance defines the topology of a so-called search (or fitness) landscape [134, 84, 61, 123]. A search landscape can be visualized as a labelled graph in which the nodes are solutions (labels indicate their objective function value) and arcs represent the neighborhood relation between solutions.

A solution s* ∈ S is called a globally minimal solution (or global minimum) if for all s ∈ S it holds that f(s*) ≤ f(s).
Algorithm 2 Iterative improvement local search
  s ← GenerateInitialSolution()
  while ∃ s′ ∈ N(s) such that f(s′) < f(s) do
    s ← ChooseImprovingNeighbor(N(s))
  end while
  output: s
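To make Definition 1 and Algorithm 2 concrete, here is a small Python sketch; the 2-swap neighborhood and the objective f are illustrative choices and are not fixed by the text.

```python
def two_swap_neighborhood(s):
    """All solutions obtained from permutation s by exchanging two positions
    (one concrete choice of neighborhood structure N(s), e.g. for TSP tours)."""
    neighbors = []
    for i in range(len(s) - 1):
        for j in range(i + 1, len(s)):
            t = list(s)
            t[i], t[j] = t[j], t[i]          # the "move": swap positions i and j
            neighbors.append(t)
    return neighbors

def best_improvement_local_search(s, f, neighborhood=two_swap_neighborhood):
    """Iterative improvement (Algorithm 2), best-improvement variant:
    move to the best neighbor while it improves, then stop in a local minimum."""
    while True:
        best = min(neighborhood(s), key=f)   # exhaustive exploration of N(s)
        if f(best) < f(s):
            s = best                          # accept the improving move
        else:
            return s                          # no improving neighbor: local minimum
```

A first-improvement variant would instead return the first neighbor found with f(neighbor) < f(s), which is often cheaper per iteration.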
The set of all globally minimal solutions is henceforth denoted by S*. The introduction of a neighborhood structure enables us to additionally define the concept of locally minimal solutions.

Definition 2 A locally minimal solution (or local minimum) with respect to a neighborhood structure N is a solution ŝ such that ∀ s ∈ N(ŝ) : f(ŝ) ≤ f(s). We call ŝ a strict locally minimal solution if ∀ s ∈ N(ŝ) : f(ŝ) < f(s).

The most basic local search method is usually called iterative improvement local search, since each move is only performed if the resulting solution is better than the current solution. The algorithm stops as soon as it finds a local minimum. The high-level algorithm is sketched in Algorithm 2. There are two major ways of implementing the function ChooseImprovingNeighbor(N(s)). The first way is called first-improvement. A first-improvement function scans the neighborhood N(s) and returns the first solution that is better than s. In contrast, a best-improvement function exhaustively explores the neighborhood and returns one of the solutions with the lowest objective function value. An iterative improvement procedure that uses a first-improvement function is called first-improvement local search, respectively best-improvement local search (or steepest descent local search) in the case of a best-improvement function. Both methods stop at local minima. Therefore, their performance strongly depends on the definition of the neighborhood structure N.

1.1.3 Metaheuristics

In the 1970s, a new kind of approximate algorithm emerged which basically tries to combine basic heuristic methods in higher-level frameworks aimed at efficiently and effectively exploring a search space. These methods are nowadays commonly called metaheuristics. The term metaheuristic, first introduced in [66], derives from the composition of two Greek words. Heuristic derives from the verb heuriskein (ευρισκειν), which means "to find", while the suffix meta means "beyond, in an upper level". Before this term was widely adopted, metaheuristics were often called modern heuristics [122]. The class of metaheuristic algorithms includes³ (but is not restricted to) ant colony optimization (ACO), evolutionary computation (EC) including genetic algorithms (GAs), iterated local search (ILS), simulated annealing (SA), and tabu search (TS). For books and surveys on metaheuristics see [19, 69, 148].

³In alphabetical order.
The different descriptions of metaheuristics found in the literature allow us to extract some fundamental properties by which metaheuristics are characterized: 0
Metaheuristics are strategies that “guide” the search process. The goal is to efficiently explore the search space in order to find (near-) optimal solutions. Techniques which constitute metaheuristic algorithms range from simple local search procedures to complex learning processes. Metaheuristic algorithms are approximate and usually non-deterministic. They may incorporate mechanisms to avoid getting trapped in confined areas of the search space. The basic concepts of metaheuristics can be described on an abstract level (i.e., not tied to a specific problem) Metaheuristics are not problem-specific. Metaheuristics may make use of domain-specific knowledge in the form of heuristics that are controlled by the upper level strategy. Todays more advanced metaheuristics use search experience (embodied in some form of memory) to guide the search.
In short we may characterize metaheuristics as high level strategies for exploring search spaces by using different methods. Of great importance hereby is that a dynamic balance is given between diversfication and intensijication. The term diversification generally refers to the exploration of the search space, whereas the term intensification refers to the exploitation of the accumulated search experience. These terms stem from the tabu search field [70] and it is important to clarify that the terms exploration and exploitation are sometimes used instead, for example in the evolutionary computation field [5 11. The balance between diversification and intensification is important, on one side to quickly identify regions in the search space with high quality solutions and on the other side not to waste too much time in regions of the search space which either are already explored or do not provide high quality solutions. Blum and Roli elaborated on the importance of the two concepts in their recent survey on metaheuristics [ 191. The search strategies of different metaheuristics are highly dependent on the philosophy of the metaheuristic itself. There are several different philosophies apparent in the existing metaheuristics. Some of them can be seen as “intelligent” extensions of local search algorithms. The goal of this kind of metaheuristic is to escape from local minima in order to proceed in the exploration of the search space and to move on to find other hopefully better local minima. This is for example the case in tabu search, iterated local search, variable neighborhood search and simulated annealing. These metaheuristics (also called trajectory methods) work on one or several neighborhood structure(s) imposed on the search space. We can find a different philosophy in algorithms such as ant colony optimization and evolutionary computation. They incorporate a learning component in the sense that they implicitly or explicitly try to learn correlations between decision variables to identify high quality areas in the search space. This kind of metaheuristic performs, in a sense, a biased sampling
8
A N INTRODUCTION TO METAHEURISTIC TECHNIQUES
of the search space. For instance, in evolutionary computation this is achieved by recombination of solutions and in ant colony optimization by sampling the search space at each iteration according to a probability distribution. There are different ways to classify and describe metaheuristic algorithms. Depending on the characteristics selected to differentiate among them, several classifications are possible, each of them being the result of a specific viewpoint (see for example, [ 1361). The classification into nature-inspired vs. non nature-inspired metaheuristics, into memory-based vs. memory-less methods, or into methods that either use a dynamic or a static objective function, is possible. In this chapter we describe the most important metaheuristics according to the single-point vs. population-based search classification, which divides metaheuristics into trajectory methods and population-based methods. This choice is motivated by the fact that this categorization permits a clearer description of the algorithms. Moreover, a successful hybridization is obtained by the integration of single-point search algorithms in population-based ones. As mentioned at the beginning of this section, metaheuristic algorithms were originally developed for solving CO problems. However, in the meanwhile they are also successfully applied to continuous optimization problems. Examples are simulated annealing algorithms such as [128] or differential evolution [135] and [4, 25, 271 from the evolutionary computation field. Tabu search algorithms such as [ 13, 261 were among the first metaheuristic algorithms to be applied to continuous problems. Among the most recent metaheuristic approaches are ant colony optimization algorithms such as [46,99, 1311. Some of the above mentioned algorithms are based on the well-known Nelder-Mead simplex algorithm for continuous optimization [ 1lo], while others are developed after new ideas on real parameter management coming from the mathematical programming field. However, for the rest of this introduction we will focus on metaheuristic approaches for CO problems, since including in each section discussion on real optimization could end in a chapter of quite difficult organization and reading.
The structure of this chapter is as follows. Section 1.2 and Section 1.3 are devoted to a description of nowadays most important metaheuristics. Section 1.2 describes the most relevant trajectory methods and in Section 1.3 we outline population-based methods. In Section 1.4 we give an overview over the different decentralizedmethods, which are metaheuristics without a central control, and we conclude in Section 1.5 with an overview on metaheuristic hybridizations. 1.2 TRAJECTORY METHODS In this section we outline metaheuristics referred to as trajectory methods. The term trajectory methods is used because the search process performed by these methods is characterizedby a trajectory in the search space. Most of these methods are extensions
They incorporate techniques that enable the algorithm to escape from local minima. This implies the necessity of termination criteria other than simply reaching a local minimum. Commonly used termination criteria are a maximum CPU time, a maximum number of iterations, a solution s of sufficient quality, or reaching the maximum number of iterations without improvement.

1.2.1 Simulated Annealing

Simulated Annealing (SA) is commonly said to be the oldest among the metaheuristics and surely one of the first algorithms that had an explicit strategy to escape from local minima. The origins of the algorithm are in statistical mechanics (see the Metropolis algorithm [101]). The idea of SA was inspired by the annealing process of metal and glass, which assume a low-energy configuration when cooled with an appropriate cooling schedule. SA was first presented as a search algorithm for CO problems in [87] and [23]. In order to avoid getting trapped in local minima, the fundamental idea is to allow moves to solutions with objective function values that are worse than the objective function value of the current solution. Such a move is often called an uphill move. At each iteration a solution s′ ∈ N(s) is randomly chosen. If s′ is better than s (i.e., has a lower objective function value), then s′ is accepted as the new current solution. Otherwise, s′ is accepted with a probability which is a function of a temperature parameter T_k and f(s′) − f(s). Usually this probability is computed following the Boltzmann distribution:

    p(s′ | T_k, s) = exp( −(f(s′) − f(s)) / T_k ).          (1.1)
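For a rough numerical feel of Eq. (1.1): an uphill move with f(s′) − f(s) = 2 is accepted with probability exp(−2/T_k), that is, about 0.82 at a high temperature T_k = 10 but only about 0.14 at T_k = 1. This is exactly the gradual shift from diversification toward intensification that the cooling schedule below is meant to control.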
The dynamic process described by SA is a Markov chain [52], as it follows a trajectory in the state space in which the successor state is chosen depending only on the incumbent one. This means that basic SA is memory-less. However, the use of memory can be beneficial for SA approaches (see, for example, [24]). The algorithmic framework of SA is described in Algorithm 3. Its components are explained in more detail in the following.
GenerateInitialSolution(): The algorithm starts by generating an initial solution that may be randomly or heuristically constructed.

SetInitialTemperature(): The initial temperature is chosen such that the probability for an uphill move is quite high at the start of the algorithm.

AdaptTemperature(T_k): The temperature T_k is adapted at each iteration according to a cooling schedule (or cooling scheme). The cooling schedule defines the value of T_k at each iteration k. The choice of an appropriate cooling schedule is crucial for the performance of the algorithm. At the beginning of the search the probability of accepting uphill moves should be high. Then, this probability should be gradually decreased during the search.
Algorithm 3 Simulated Annealing (SA)
  s ← GenerateInitialSolution()
  k ← 0
  T_k ← SetInitialTemperature()
  while termination conditions not met do
    s′ ← PickNeighborAtRandom(N(s))
    if f(s′) < f(s) then
      s ← s′   {s′ replaces s}
    else
      Accept s′ as new solution with probability p(s′ | T_k, s)   (Eq. (1.1))
    end if
    AdaptTemperature(T_k)
    k ← k + 1
  end while
  output: best solution found
Note that this is not necessarily done in a monotonic fashion. Theoretical results on non-homogeneous Markov chains [1] state that under particular conditions on the cooling schedule, the algorithm converges in probability to a global minimum for k → ∞. More precisely:
    ∃ Γ ∈ R⁺ s.t.   lim_{k→∞} p(global minimum found after k steps) = 1   ⟺   Σ_{k=1}^{∞} exp(−Γ / T_k) = ∞
A particular cooling schedule that fulfills the hypothesis for the convergence is one that follows a logarithmic law. Hereby, T_k is determined as T_k = c / log(k + 1) (where c is a constant). Unfortunately, cooling schedules which guarantee the convergence to a global optimum are not feasible in applications, because they are too slow for practical purposes. Therefore, faster cooling schedules are adopted in applications. One of the most popular ones follows a geometric law: T_k = α · T_{k−1}, where α ∈ (0, 1), which corresponds to an exponential decay of the temperature.

The cooling schedule can be used for balancing between diversification and intensification. For example, at the beginning of the search, T_k might be constant or linearly decreasing in order to sample the search space; then, T_k might follow a rule such as the geometric one in order to make the algorithm converge to a local minimum at the end of the search. More successful variants are non-monotonic cooling schedules (e.g., see [94, 113]). Non-monotonic cooling schedules are characterized by alternating phases of cooling and reheating, thus providing an oscillating balance between diversification and intensification.

The cooling schedule and the initial temperature should be adapted to the particular problem instance considered, since the cost of escaping from local minima depends on the structure of the search landscape.
A simple way of empirically determining the starting temperature T₀ is to initially sample the search space with a random walk to roughly evaluate the average and the variance of the objective function values. Based on the samples, the starting temperature can be determined such that uphill moves have a high probability. But also more elaborate schemes can be implemented [82].
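The following Python sketch puts these pieces together: random neighbor selection, the Boltzmann acceptance rule of Eq. (1.1), and a geometric cooling schedule. The neighbor() and f functions are assumed to be supplied by the user, and the parameter values are only illustrative defaults.

```python
import math
import random

def simulated_annealing(s, f, neighbor, t0=10.0, alpha=0.95, max_iters=10_000):
    """Minimal SA loop (Algorithm 3) with geometric cooling T_k = alpha * T_{k-1}."""
    best, t = s, t0
    for _ in range(max_iters):
        s_new = neighbor(s)                      # pick a random neighbor of s
        delta = f(s_new) - f(s)
        # always accept improving moves; accept uphill moves with Boltzmann probability
        if delta < 0 or random.random() < math.exp(-delta / t):
            s = s_new
            if f(s) < f(best):
                best = s
        t *= alpha                               # cooling schedule
    return best
```

A non-monotonic schedule would simply replace the single line t *= alpha with a rule that occasionally reheats the system.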
SA has been applied to many CO problems, such as the quadratic assignment problem (QAP) [30] and the job shop scheduling (JSS) problem [144]. References to other applications can be found in [2, 55, 82]. SA is nowadays used more as a component in metaheuristics than as a stand-alone search algorithm. Variants of SA called Threshold Accepting and the Great Deluge Algorithm were presented in [48] and [47], respectively.

1.2.2 Tabu Search

Tabu Search (TS) is one of the most successful metaheuristics for the application to CO problems. The basic ideas of TS were introduced in [66], based on earlier ideas formulated in [65].⁴ A description of the method and its concepts can be found in [70]. The basic idea of TS is the explicit use of search history, both to escape from local minima and to implement an explorative strategy. A simple TS algorithm is based on a best-improvement local search (see Section 1.1.2) and uses a short-term memory to escape from local minima and to avoid cycles.⁵ The short-term memory is implemented as a tabu list TL that keeps track of the most recently visited solutions and excludes them from the neighborhood of the current solution. At each iteration, the best solution among the allowed ones is chosen as the new current solution. Furthermore, this solution is added to the tabu list.

The implementation of short-term memory in terms of a list that contains complete solutions is not practical, because managing a list of complete solutions is highly inefficient. Therefore, instead of the solutions themselves, the solution components that are involved in moves are stored in the tabu lists. Since different kinds of moves that work on different types of solution components can be considered, a tabu list is usually introduced for each type of solution component. The different types of solution components and the corresponding tabu lists define the tabu conditions which are used to filter the neighborhood of a solution and generate the allowed set N_a(s). Storing solution components instead of complete solutions is much more efficient, but it introduces a loss of information, as forbidding, for example, the introduction of a certain solution component in a solution means assigning the tabu status to probably more than one solution. Thus, it is possible that unvisited solutions of high quality are excluded from the allowed set. To overcome this problem, aspiration criteria are defined which allow a solution to be included in the allowed set even if it is forbidden by tabu conditions.
⁴Related ideas were labelled the steepest ascent/mildest descent method in [76].
⁵A cycle is a sequence of moves that constantly repeats itself.
Algorithm 4 Tabu Search (TS)
  s ← GenerateInitialSolution()
  InitializeTabuLists(TL₁, ..., TLᵣ)
  while termination conditions not met do
    N_a(s) ← {s′ ∈ N(s) | s′ does not violate a tabu condition, or it satisfies at least one aspiration condition}
    s′ ← argmin{f(s″) | s″ ∈ N_a(s)}
    UpdateTabuLists(TL₁, ..., TLᵣ, s, s′)
    s ← s′   {i.e., s′ replaces s}
  end while
  output: best solution found
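A small Python sketch of this scheme for move-based tabu lists follows. The moves(s) interface (returning (move, resulting solution) pairs), the fixed tenure, and the aspiration rule (accept a tabu move if it beats the best solution found so far) are illustrative assumptions, not prescribed by the text.

```python
from collections import deque

def tabu_search(s, f, moves, tenure=7, max_iters=1000):
    """Best-improvement tabu search with a fixed-tenure tabu list of move attributes."""
    best = s
    tabu = deque(maxlen=tenure)               # short-term memory (tabu tenure = maxlen)
    for _ in range(max_iters):
        candidates = []
        for move, s_new in moves(s):
            is_tabu = move in tabu
            aspired = f(s_new) < f(best)      # aspiration criterion
            if not is_tabu or aspired:
                candidates.append((f(s_new), move, s_new))
        if not candidates:
            break
        _, move, s = min(candidates, key=lambda c: c[0])  # best allowed neighbor (even uphill)
        tabu.append(move)                     # forbid repeating/reversing this move for a while
        if f(s) < f(best):
            best = s
    return best
```

Varying tenure during the run, as described below, is a straightforward extension of this fixed-length deque.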
TRAJECTORY METHODS
13
Algorithm 5 Greedy Randomized Adaptive Search Procedure (GRASP) while termination conditions not met do {see Algorithm 6) s t ConstructGreedyRandomizedSolution() ApplyLocalSearch(s) end while output: best solution found
to identify solution components that contribute to good solutions. This information can be usefully integrated in solution constructions or in the evaluation of moves. Other metaheuristics (e.g., ant colony optimization) explicitly use this principle to learn about good combinations of solution components. Finally, influence concerns certain choices that were made during the search process. Sometimes it can be beneficial to know which choices were the most critical ones. In general, the TS field is a rich source of ideas. Many of these ideas and strategies have been and are currently adopted by other metaheuristics. TS has been applied to most CO problems; examples of successhl applications are the Robust Tabu Search to the QAF' [ 1391, the Reactive Tabu Search to the maximum satisfiability (MAXSAT) problem [12, 1301, and to assignment problems [34]. TS approaches dominate the job shop scheduling (JSS) problem area (see, for example, [ 1121) and the vehicle routing (VR) area [64]. Further references of applications can be found in [70]. 1.2.3 Explorative Local Search Methods
In this section we present more recently proposed trajectory methods. These are the greedy randomized adaptive search procedure (GRASP), variable neighborhood search (VNS), guided local search (GLS), and iterated local search (ILS). 1.2.3.I Greedy Randomized Adaptive Search Procedure. The greedy randomized adaptive search procedure (GRASP), see [53, 1171, is a simple metaheuristic that combines constructive heuristics and local search. Its structure is sketched in Algorithm 5 . GRASP is an iterative procedure, composed of two phases: solution construction and solution improvement. The best found solution is returned upon termination of the search process. The solution construction mechanism (see Algorithm 6) is a randomized constructive heuristic. As outlined in Section 1.1.1, a constructive heuristic generates a solution step-by-step by adding one new solution component from a finite set C (where JCl = n) of solution components to the current partial solution 57'. The solution component that is added at each step is chosen at random from a list that is called the restricted candidate list. This list is a subset of %(sP), the set of allowed solution components, and is denoted by RCL. In order to generate this list, the solu-
Algorithm 6 Greedy Randomized Solution Construction {Remember that s^p denotes a partial solution}
s^p ← ()
α ← DetermineLengthOfRestrictedCandidateList()
while N(s^p) ≠ ∅ do
  RCL ← GenerateRestrictedCandidateList(α, N(s^p))
  c ← PickAtRandom(RCL)
  s^p ← extend s^p by adding solution component c
end while

Then, RCL is composed of the α highest ranked solution components. The length α of the restricted candidate list determines the strength of the heuristic bias that is introduced by the weighting function. In the extreme case of α = 1, the highest weighted solution component is added deterministically; thus the construction would be equivalent to a deterministic greedy heuristic. In contrast, the setting of α = |N(s^p)| at each construction step leads to the construction of a random solution. Therefore, α is a critical parameter which influences the sampling of the search space. In [117] the most important schemes to define α are listed. The simplest scheme is, trivially, to keep α constant; alternatively it can also be changed at each iteration, either randomly or by means of an adaptive scheme. The second phase of the algorithm is a local search method, which may be a basic local search algorithm such as iterative improvement or a more advanced technique such as SA or TS. GRASP can be effective if two conditions are satisfied:

• the solution construction mechanism samples the most promising regions of the search space;

• the solutions constructed by the constructive heuristic enable local search to reach different local minima.
The first condition can be met by the choice of an effective constructive heuristic and an appropriate length of the candidate list, whereas the second condition can be met by choosing the constructive heuristic and the local search in a way such that they fit well. The description of GRASP as given above indicates that a basic GRASP does not use the history of the search process.⁷ The only memory requirement is for storing the problem instance and for keeping the best-so-far solution (i.e., the best solution found since the start of the algorithm). This is one of the reasons why GRASP is often outperformed by other metaheuristics. However, due to its simplicity, it is generally very fast and it is able to produce quite good solutions in a very short amount of computation time. Furthermore, it can be easily integrated into other search techniques.

7 However, some extensions in this direction are cited in [117], and an example of a metaheuristic method using an adaptive greedy procedure depending on search history is Squeaky Wheel Optimization (SWO) [51].
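As an illustration of the construction phase of Algorithm 6, the following minimal Python sketch builds a solution component by component from a restricted candidate list; the functions allowed and weight stand for the problem-specific feasibility test and weighting function and are assumptions made only for the example.

import random

def greedy_randomized_construction(allowed, weight, alpha):
    """Sketch of the GRASP construction phase (Algorithm 6).

    `allowed(partial)` returns the feasible components for the current
    partial solution, `weight(c, partial)` is the greedy weighting
    function, and `alpha` is the length of the restricted candidate list.
    """
    partial = []
    candidates = allowed(partial)
    while candidates:
        # rank the candidates by the weighting function and keep the alpha best
        ranked = sorted(candidates, key=lambda c: weight(c, partial), reverse=True)
        rcl = ranked[:alpha]
        partial.append(random.choice(rcl))   # random pick from the RCL
        candidates = allowed(partial)
    return partial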
Algorithm 7 Variable Neighborhood Search (VNS)
Select a set of neighborhood structures N_k, k = 1, ..., k_max
s ← GenerateInitialSolution()
while termination conditions not met do
  k ← 1
  while k < k_max do
    s' ← PickAtRandom(N_k(s)) {also called the shaking phase}
    s'' ← LocalSearch(s')
    if f(s'') < f(s) then
      s ← s''
      k ← 1
    else
      k ← k + 1
    end if
  end while
end while
output: best solution found
Among the applications of GRASP we mention the JSS problem [16], the graph planarization problem [125], and assignment problems [118]. A detailed and annotated bibliography references many more applications [54].
1.2.3.2 Variable Neighborhood Search. Variable Neighborhood Search (VNS) is a metaheuristic proposed in [78], which explicitly applies strategies for swapping between different neighborhood structures from a predefined finite set. The algorithm is very general and many degrees of freedom exist for designing variants and particular instantiations. The algorithmic framework of the VNS algorithm is shown in Algorithm 7. At the initialization of the algorithm, a set of neighborhood structures has to be defined. These neighborhood structures can be arbitrarily chosen. Then, an initial solution is generated, the neighborhood index is initialized, and the algorithm iterates until a termination condition is met. Each iteration consists of three phases: shaking, local search, and move. In the shaking phase a solution s' in the k-th neighborhood of the current solution s is randomly selected. Solution s' is then used as the starting point for a local search procedure, which may use any neighborhood structure and is not restricted to the set of neighborhood structures N_k, k = 1, ..., k_max. At the termination of the local search process, the new solution s'' is compared with s and, if it is better, it replaces s and the algorithm proceeds with k = 1. Otherwise, k is incremented and a new shaking phase starts using a different neighborhood. The objective of the shaking phase is to select a solution from some neighborhood of the current local minimum that is a good starting point for the local search. The starting point should enable local search to reach a different local minimum than the current one, but should not be "too far" from s, otherwise the algorithm would degenerate into a simple multi-start local search with random starting solutions.
Therefore, choosing s' in the neighborhood of the current solution is likely to produce a solution that maintains some features of the current one. The process of changing neighborhoods in case of no improvements corresponds to a diversification of the search. The effectiveness of this dynamic strategy for swapping between neighborhood structures can be explained by the fact that a "bad" place on the search landscape given by a certain neighborhood structure could be a "good" place on the search landscape given by another neighborhood structure.⁸ Moreover, a solution that is locally optimal with respect to a neighborhood is probably not locally optimal with respect to another neighborhood.
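A minimal Python rendering of the VNS scheme of Algorithm 7 may help to fix ideas; the list of neighborhood functions, the local search, and the objective are problem-specific placeholders assumed only for the example.

import random

def vns(initial, neighborhoods, local_search, f, max_iters=100):
    """Sketch of Variable Neighborhood Search (Algorithm 7).

    `neighborhoods` is a list of functions; neighborhoods[k](s) returns
    the k-th neighborhood of s as a (non-empty) list of solutions.
    `local_search` and `f` are the local search procedure and the
    objective function to be minimized.
    """
    s = initial
    for _ in range(max_iters):
        k = 0
        while k < len(neighborhoods):
            s_shake = random.choice(neighborhoods[k](s))   # shaking phase
            s_new = local_search(s_shake)                  # local search
            if f(s_new) < f(s):                            # move, or try the next neighborhood
                s, k = s_new, 0
            else:
                k += 1
    return s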
VNS and its variants have been successfully applied to graph-based CO problems such as the p-median problem [77], the degree constrained minimum spanning tree problem [126], the Steiner tree problem [147], and the k-cardinality tree (KCT) problem [105, 143]. References to more applications can be found in [78].

1.2.3.3 Guided Local Search. Guided Local Search (GLS) [146] applies a strategy for escaping from local minima that is very different from the strategies employed by tabu search or variable neighborhood search. This strategy consists in dynamically changing the objective function, which results in a change of the search landscape. The aim is to make the current local minimum gradually "less desirable" over time. The dynamic change of the objective function in GLS is based on a set of m solution features sf_i, i = 1, ..., m. A solution feature may be any kind of property or characteristic that can be used to discriminate between solutions. For example, solution features in the TSP could be the edges between the cities. To each solution feature sf_i is assigned a cost value c_i, which gives a measure of the contribution of solution feature sf_i to the objective function f(·). In the TSP example, the cost of a solution feature could be the length of the corresponding edge. An indicator function I(i, s) indicates whether the solution feature sf_i is present in a solution s:
I(i, s) = 1  if feature sf_i is present in solution s,
I(i, s) = 0  otherwise.
During the whole run of the algorithm, the original objective function f(·) is replaced by a new objective function f'(·) that is obtained by adding to f(·) a term that depends on the m solution features:

f'(s) = f(s) + λ · Σ_{i=1}^{m} p_i · I(i, s),
where p_i, i = 1, ..., m, are the penalty values and λ > 0 is a constant that determines the influence of the penalty term. The penalty values are weights of the solution features: the higher p_i, the higher is the cost of having that feature in a solution.

8 A "good" place in the search space is an area from which a good local minimum can be reached.
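Before turning to the overall procedure, a small sketch of the resulting augmented objective may be useful; it simply evaluates f'(s) = f(s) + λ Σ_i p_i I(i, s) for a given penalty vector, and all names are illustrative assumptions.

def augmented_objective(s, f, indicator, penalties, lam):
    """Modified objective f'(s) = f(s) + lam * sum_i p_i * I(i, s).

    `indicator(i, s)` returns 1 if solution feature i is present in s and
    0 otherwise, `penalties[i]` is the current penalty p_i, and `lam` is
    the constant weighting the penalty term.
    """
    penalty_term = sum(penalties[i] * indicator(i, s)
                       for i in range(len(penalties)))
    return f(s) + lam * penalty_term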
Algorithm 8 Guided Local Search (GLS)
s ← GenerateInitialSolution()
p ← (0, ..., 0)
while termination conditions not met do
  ŝ ← LocalSearch(s, f')
  UpdatePenaltyVector(p, ŝ)
  s ← ŝ
end while
output: best solution found
The algorithm (see Algorithm 8) works as follows. First, an initial solution s is generated. Then, at each iteration a local search procedure is applied to the current solution until a local minimum ŝ is reached. Note that this local search procedure uses the modified objective function. The function UpdatePenaltyVector(p, ŝ) modifies the penalty vector p = (p_1, ..., p_m) depending on ŝ. First, the so-called utility Util(i, ŝ) of each solution feature is determined:
Util(i, ŝ) = I(i, ŝ) · c_i / (1 + p_i).
This equation shows that the higher the cost, the higher the utility of a feature. Nevertheless, the cost is scaled by the penalty value to prevent the algorithm from being totally biased toward the cost and to make it sensitive to the search history. Then the penalty values of the solution features with maximum utility are updated as follows:

p_i ← p_i + 1.    (1.4)

The penalty value update procedure can be supplemented by a multiplicative rule of the form p_i ← p_i · α, where α ∈ (0, 1). Such an update rule is generally applied with a lower frequency than the one of Equation 1.4 (e.g., every few hundred iterations). The aim of this update is the smoothing of the weights of penalized features so as to prevent the search landscape from becoming too rugged. It is important to note that the penalty value update rules are often very sensitive to the problem instance under consideration. GLS has been successfully applied to the weighted MAXSAT [103], the VR problem [86], the TSP, and the QAP [146].

1.2.3.4 Iterated Local Search. Iterated Local Search (ILS) [136, 93, 98] is a metaheuristic that is based on a simple but powerful concept. At each iteration the current solution (which is a local minimum) is perturbed and a local search method is applied to the perturbed solution. Then, the local minimum that is obtained by applying the local search method is either accepted as the new current solution or not. Intuitively, ILS performs a trajectory along local minima ŝ_1, ŝ_2, ..., ŝ_t without explicitly introducing a neighborhood structure on S, by applying the scheme that is shown in Algorithm 9.
Algorithm 9 Iterated Local Search (ILS)
s ← GenerateInitialSolution()
ŝ ← LocalSearch(s)
while termination conditions not met do
  s' ← Perturbation(ŝ, history)
  ŝ' ← LocalSearch(s')
  ŝ ← ApplyAcceptanceCriterion(ŝ', ŝ, history)
end while
output: best solution found
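A minimal Python sketch of this scheme could look as follows; the local search, perturbation, and acceptance functions are placeholders for the problem-specific components discussed below.

def iterated_local_search(initial, local_search, perturb, accept, f, max_iters=100):
    """Sketch of Iterated Local Search (Algorithm 9).

    `perturb(s, history)` returns a perturbed copy of s, `accept(s_new,
    s_cur, history)` decides which local minimum to keep, and `history`
    collects whatever search information those two components may need.
    """
    s_cur = best = local_search(initial)
    history = []
    for _ in range(max_iters):
        s_new = local_search(perturb(s_cur, history))
        history.append(s_new)
        if f(s_new) < f(best):               # keep track of the best solution found
            best = s_new
        s_cur = accept(s_new, s_cur, history)
    return best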
The importance of the perturbation is obvious: too small a perturbation might not enable the system to escape from the local minimum just found. On the other hand, too strong a perturbation would make the algorithm similar to a random restart local search. Therefore, the requirement on the perturbation method is to produce a starting point for local search such that a local minimum different from the current solution is reached. However, this new local minimum should be closer to the current solution than a local minimum produced by the application of the local search to a randomly generated solution. The acceptance criterion acts as a counterbalance, as it filters the accepted solutions depending on the search history and the characteristics of the new local minimum. The design of ILS algorithms has several degrees of freedom in the generation of the initial solution, the choice of the perturbation method, and the acceptance criterion. Furthermore, the history of the search process can be exploited in the form of both short and long term memory. In the following we describe the three main algorithmic components of ILS.
GenerateInitialSolution(): The construction of initial solutions should be fast (computationally not expensive), and initial solutions should be a good starting point for local search. Any kind of solution construction procedure can be used.
Perturbation(ŝ, history): The perturbation is usually non-deterministic in order to avoid cycling. Its most important characteristic is the strength, roughly defined as the amount of change inflicted on the current solution. The strength can be either fixed or variable. In the first case, the distance between ŝ and s' is kept constant, independently of the problem size. However, a variable strength is in general more effective, since it has been experimentally found that, in most problems, the bigger the problem instance size, the larger should be the strength. A more sophisticated mechanism consists of adaptively changing the strength. For example, the strength might be increased when more diversification is needed or decreased when intensification seems preferable. A second choice is the mechanism to perform perturbations: random or semi-deterministic.
ApplyAcceptanceCriterion(ŝ', ŝ, history): The third important component is the acceptance criterion. Two extreme examples are (1) accepting the new local minimum only in case of improvement and (2) always accepting the new solution. In between, there are several possibilities. For example, it is possible to adopt an acceptance criterion that is similar to the one of simulated annealing. Non-monotonic cooling schedules might be particularly effective if they exploit the history of the search process. For example, when the recent history of the search process indicates that intensification seems no longer effective, a diversification phase is needed and the temperature is increased. Examples of successful applications of ILS are to the TSP [83, 97], to the QAP [93], and to the single-machine total weighted tardiness (SMTWT) problem [35]. References to other applications can be found in [93].

1.3 POPULATION-BASED METHODS

Population-based methods deal in every iteration of the algorithm with a set (i.e., a population) of solutions rather than with a single solution.⁹ In this way, population-based algorithms provide a natural, intrinsic way for the exploration of the search space. Yet, the final performance strongly depends on the way the population is manipulated. The most studied population-based methods in combinatorial optimization are evolutionary computation (EC) and ant colony optimization (ACO). In EC algorithms, a population of individuals is modified by recombination and mutation operators, and in ACO a colony of artificial ants is used to construct solutions guided by the pheromone trails and heuristic information.

1.3.1 Evolutionary Computation
Evolutionary Computation (EC) algorithms are inspired by nature's capability to evolve living beings well adapted to their environment. EC algorithms can be characterized as computational models of evolutionary processes. At each iteration a number of operators are applied to the individuals of the current population to generate the individuals of the population of the next generation (iteration). Usually, EC algorithms use operators called recombination or crossover to recombine two or more individuals to produce new individuals. They also use mutation or modification operators which cause a self-adaptation of individuals. The driving force in evolutionary algorithms is the selection of individuals based on their fitness (which can be based on the objective function, the result of a simulation experiment, or some other kind of quality measure). Individuals with a higher fitness have a higher probability to be chosen as members of the population of the next iteration (or as parents for the generation of new individuals). This corresponds to the principle of survival of the fittest in natural evolution.
9 In general, especially in EC algorithms, we talk about a population of individuals rather than solutions.
Algorithm 10 Evolutionary Computation (EC)
P ← GenerateInitialPopulation()
Evaluate(P)
while termination conditions not met do
  P' ← Recombine(P)
  P'' ← Mutate(P')
  Evaluate(P'')
  P ← Select(P, P'')
end while
output: best solution found
It is the capability of nature to adapt itself to a changing environment which gave the inspiration for EC algorithms.
There has been a variety of slightly different EC algorithms proposed over the years. Basically they fall into three different categories which have been developed independently of each other. These are Evolutionary Programming (EP) as introduced by Fogel in [59] and Fogel et al. in [60], Evolution Strategies (ES) proposed by Rechenberg in [121], and Genetic Algorithms initiated by Holland in [81] (see [73], [104], [124], and [145] for further references). EP arose from the desire to generate machine intelligence. While EP originally was proposed to operate on discrete representations of finite state machines, most of the present variants are used for continuous optimization problems. The latter also holds for most present variants of ES, whereas GAs are mainly applied to solve combinatorial optimization problems. Over the years there have been quite a few overviews and surveys about EC methods. Among those are the ones by Bäck [8], by Fogel [57], by Hertz and Kobler [79], by Spears et al. [133], and by Michalewicz and Michalewicz [102]. In [20] a taxonomy of EC algorithms is proposed. Algorithm 10 contains the basic structure of EC algorithms. In this algorithm, P denotes the population of individuals. At each iteration a set of offspring individuals P' is generated by the application of the function Recombine(P), whose members may then be mutated in function Mutate(P'), producing a set of mutated offspring individuals P''. The individuals for the next population are then selected in function Select(P, P'') from the union of the old population P and the set of mutated offspring individuals P''. Individuals of EC algorithms are not necessarily solutions to the considered problem. They may be partial solutions, or sets of solutions, or any object which can be transformed into one or more solutions in a structured way. Most commonly used in combinatorial optimization is the representation of solutions as bit-strings or as permutations of n integer numbers. Tree-structures or other complex structures are also possible. In the context of genetic algorithms, individuals are called genotypes, whereas the solutions that are encoded by individuals are called phenotypes. This is to differentiate between the representation of solutions and solutions themselves. The choice of an appropriate representation is crucial for the success of an EC algorithm.
Holland's schema analysis [81] and Radcliffe's generalization to formae [120] are examples of how theory can help to guide representation choices. In the following the components of Algorithm 10 are outlined in more detail.
GenerateInitialPopulation(): The initial population may be a population of randomly generated individuals, or individuals obtained from other sources such as constructive heuristics.
Recombine(P): The most common recombination operator is two-parent crossover, but there are also recombination operators that operate on more than two individuals to create a new individual (multi-parent crossover); see [15, 49]. More recent developments even use population statistics for generating the individuals of the next population. Examples are the recombination operators called Gene Pool Recombination [109] and Bit-Simulated Crossover [138], which make use of a probability distribution over the search space given by the current population to generate the next population.¹⁰ The question of which individuals can be recombined can be expressed in the form of a neighborhood function N_EC: I → 2^I, which assigns to each individual i ∈ I a set of individuals N_EC(i) ⊆ I whose members are permitted to act as recombination partners for i to create offspring. If an individual can be recombined with any other individual (as, for example, in the simple GA [145]), we talk about unstructured populations; otherwise we talk about structured populations. An example of an EC algorithm that works on structured populations is the Parallel Genetic Algorithm (PGA) proposed by Mühlenbein [107]. Like in this case, many structured algorithms are run in parallel, but many others are not. To go deeper into this distinction the interested reader can consult [7] or the more recent and complete study found in [5].

Mutate(P'): The most basic form of a mutation operator applies small random changes to some of the offspring individuals in P'. This is done in order to introduce some noise in the search process for avoiding premature convergence. Instead of using random mutations, in many applications it proved to be quite beneficial to use improvement mechanisms to increase the fitness of individuals. EC algorithms that apply a local search algorithm to each individual of a population are often called Memetic Algorithms [106]. While the use of a population ensures an exploration of the search space, the use of local search techniques helps to quickly identify "good" areas in the search space.

Select(P, P''): At each iteration it has to be decided which individuals will enter the population of the next iteration. This is done by a selection scheme. To choose the individuals for the next population exclusively from the offspring is called generational replacement. In some schemes, such as elitist strategies, successive

10 Both methods can be regarded as members of the class of Estimation of Distribution Algorithms (EDAs); see Section 1.3.1.2.
generations overlap to some degree, i.e., some portion of the previous generation is retained in the new population. The fraction of new individuals at each generation is called the generational gap [32]. In a steady state selection, only a few individuals are replaced in each iteration: usually a small number of the least fit individuals are replaced by offspring. Most EC algorithms deal with populations of constant size. However, it is also possible to have a variable population size. In the case of a continuously shrinking population size, the situation in which only one individual is left in the population (or no crossover partners can be found for any member of the population) might be one of the stopping conditions of the algorithm.

One of the major difficulties of EC algorithms (especially when applying local search) is premature convergence toward sub-optimal solutions. The simplest mechanism to diversify the search process is the use of a random mutation operator. In order to avoid premature convergence there are also a number of other ways of maintaining the population diversity. Probably the oldest strategies are crowding [32] and its close relative, preselection [22]. Newer strategies are fitness sharing [74] and niching [95], in which the reproductive fitness allocated to an individual in a population is reduced proportionally to the number of other individuals that share the same region of the search space.

An important characteristic of an EC algorithm is the way it deals with infeasible individuals which might be produced by the genetic operators. There are basically three different ways to handle such a situation. The simplest action is to reject infeasible individuals. Nevertheless, for some highly constrained problems (e.g., for timetabling problems) it might be very difficult to find feasible individuals. Therefore, the strategy of penalizing infeasible individuals in the function that measures the quality of an individual is sometimes more appropriate (or even unavoidable); see, for example, [29]. The third possibility consists in trying to repair an infeasible solution (see [50] for an example). Finally, we must point out that all the mentioned methods are usually developed within some research line driving the creation of such operators. Two such important philosophies to deal with the intensification/diversification trade-off are hybridization of algorithms [140] and parallel and structured algorithms [5].

This concludes our description of EC algorithms. EC algorithms have been applied to most CO problems and optimization problems in general. Recent successes were obtained in the rapidly growing bioinformatics area (see, for example, [58]), but also in multiobjective optimization [28] and evolvable hardware [129]. For an extensive collection of references to EC applications we refer to [9].
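As a simple illustration of the generic scheme of Algorithm 10, the following Python sketch implements one possible instantiation with bit-string individuals, tournament selection, one-point crossover, bit-flip mutation, and generational replacement; all of these concrete choices and parameter values are assumptions made only for the example.

import random

def evolutionary_algorithm(fitness, n_bits, pop_size=50, generations=100,
                           p_mut=0.01, p_cx=0.9):
    """Minimal generational EC sketch (fitness is to be maximized)."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament(population, k=2):
        # binary tournament selection
        return max(random.sample(population, k), key=fitness)

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if random.random() < p_cx and n_bits > 1:      # one-point crossover
                cut = random.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [b ^ 1 if random.random() < p_mut else b
                     for b in child]                        # bit-flip mutation
            offspring.append(child)
        pop = offspring                                     # generational replacement
    return max(pop, key=fitness)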
In the following two subsections we are going to introduce two other population-based methods which are sometimes also considered as EC algorithms.

1.3.1.1 Scatter Search and Path Relinking. Scatter Search (SS) and its generalized form called Path Relinking (PR) [68, 71] differ from EC algorithms mainly by providing unifying principles for joining (or recombining) solutions based on generalized path constructions in Euclidean or neighborhood spaces. These principles are based on strategies originally proposed for combining decision rules and constraints in the context of integer programming.
Algorithm 11 Scatter Search (SS) and Path Relinking (PR)
S_seed ← SeedGeneration()
S_div ← DiversificationGenerator(S_seed)
S_ref ← ChooseReferenceSet(S_div)
while termination conditions not met do
  while stopping conditions for inner loop not met do
    S_sub ← SubsetGeneration(S_ref)
    S_trial ← SolutionCombination(S_sub)
    S_disp ← Improvement(S_trial)
    S_ref ← ReferenceSetUpdate(S_ref, S_disp)
  end while
  S_elite ← ChooseBestFrom(S_ref)
  S_div ← DiversificationGenerator(S_elite)
  S_ref ← ChooseReferenceSet(S_div)
end while
output: best solution found
The template for scatter search and path relinking is shown in Algorithm 11. Scatter search and path relinking are search strategies that operate on a set of reference solutions (denoted by S_ref in Algorithm 11) that are feasible solutions to the problem under consideration. The set of reference solutions corresponds to the population of individuals in EC algorithms. The algorithm starts by generating a set S_seed of so-called seed solutions. This is done by means of some heuristic method. Then, in function DiversificationGenerator(S_seed), a method is applied that iteratively chooses one of the seed solutions and generates a new solution with the aim of creating a solution as different as possible from the existing seed solutions. The newly generated solutions are added to the set of seed solutions if they do not already exist in there. From S_div, the first set of reference solutions is then chosen such that it contains high quality as well as diverse solutions. Then the main loop of the algorithm starts. At each iteration the following cycle is repeated a number of times (which is a parameter of the algorithm). First, a subset of the reference solutions S_sub is chosen in function SubsetGeneration(S_ref). Second, the solutions from S_sub are recombined in function SolutionCombination(S_sub) to yield one or more trial solutions S_trial. These trial solutions may be infeasible solutions and are therefore usually modified by means of a repair procedure that transforms them into feasible solutions. An improvement mechanism Improvement(S_trial) is then applied in order to try to improve the set of trial solutions (usually this improvement procedure is a local search). These improved solutions form the set of dispersed solutions, denoted by S_disp. Finally, the set of reference solutions is updated with the solutions from S_disp in function ReferenceSetUpdate(S_ref, S_disp), again with respect to criteria such as quality and diversity. After a number of these cycles, a set of elite solutions S_elite is chosen from the set of reference solutions, the diversification generator is
applied, and the new set of reference solutions is chosen from the resulting set of solutions.
SolutionCombination(S_sub): In scatter search, which was introduced for solutions encoded as points in the Euclidean space, new solutions are created by building linear combinations of reference solutions using both positive and negative weights. This means that trial solutions can be both inside and outside the convex region spanned by the reference solutions. In path relinking the concept of combining solutions by making linear combinations of reference points is generalized to neighborhood spaces. Linear combinations of points in the Euclidean space can be re-interpreted as paths between and beyond solutions in a neighborhood space. To generate the desired paths, it is only necessary to select moves that satisfy the following condition: upon starting from an initiating solution chosen from S_sub, the moves must progressively introduce attributes contributed by a guiding solution that is equally chosen from S_sub. Scatter search has enjoyed increasing interest in recent years. Among other problems it has been applied to multiobjective assignment problems [88] and to the linear ordering problem (LOP) [21]. For further references we refer to [72]. Path relinking is often used as a component in metaheuristics such as TS [90] and GRASP [3, 89].
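A rough sketch of the path relinking idea for solutions encoded as bit strings is given below: starting from an initiating solution, attributes of the guiding solution are introduced one at a time, and the best intermediate solution on the path is kept. The binary encoding, the minimized objective f, and the greedy choice of the next attribute are assumptions made only for illustration.

def path_relinking(initiating, guiding, f):
    """Walk from `initiating` toward `guiding` (both lists of 0/1 values),
    at each step introducing the attribute of the guiding solution whose
    change yields the best objective value, and return the best solution
    seen along the path."""
    current = list(initiating)
    best = list(current)
    differing = [i for i in range(len(current)) if current[i] != guiding[i]]
    while differing:
        def value_after_move(i):
            trial = list(current)
            trial[i] = guiding[i]
            return f(trial)
        i_best = min(differing, key=value_after_move)   # greedy attribute choice
        current[i_best] = guiding[i_best]
        differing.remove(i_best)
        if f(current) < f(best):
            best = list(current)
    return best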
1.3.1.2 Estimation of Distribution Algorithms. In the last decade more and more researchers have tried to overcome the drawbacks of the usual recombination operators of EC algorithms, which are likely to break good building blocks.¹¹ With this aim, a number of algorithms that are sometimes called estimation of distribution algorithms (EDAs) [108] have been developed (see Algorithm 12 for the algorithmic framework). These algorithms, which have a theoretical foundation in probability theory, are based on populations that evolve as the search progresses, like EC algorithms. They work as follows. First, an initial population P of solutions is randomly or heuristically generated. Then the following cycle is repeated until the termination criteria are met. A fraction of the best solutions of the current population (denoted by P_sel) is selected in function ChooseFrom(P). Then from the solutions in P_sel a probability distribution over the search space is derived in function EstimateProbabilityDistribution(P_sel). This probability distribution is then sampled in function SampleProbabilityDistribution(p(x)) to produce the population of the next iteration. For a survey of EDAs we refer the interested reader to [116]. One of the first EDAs that was proposed for application to CO problems is called Population-Based Incremental Learning (PBIL) [11]. The method works on a real-valued probability vector (i.e., the probability distribution over the search space) where each position corresponds to a binary decision variable. The objective is to change this probability vector over time such that high quality solutions are generated from it with a high probability.
’’
Roughly speaking, a good building block is a subset of the set of solution components which result in a high average quality of all the solutions that contain this subset.
Algorithm 12 Estimation of Distribution Algorithm (EDA)
P ← GenerateInitialPopulation()
while termination conditions not met do
  P_sel ← ChooseFrom(P) {P_sel ⊆ P}
  p(x) = p(x | P_sel) ← EstimateProbabilityDistribution(P_sel)
  P ← SampleProbabilityDistribution(p(x))
end while
output: best solution found
In contrast to PBIL, which estimates a distribution of promising solutions assuming that the decision variables are independent, various other approaches try to estimate distributions taking into account dependencies between decision variables. An example of EDAs regarding such pairwise dependencies is MIMIC [31], while an example covering multivariate dependencies is the Bayesian Optimization Algorithm (BOA) [115]. The field of EDAs is still quite young and much of the research effort is focused on methodology rather than high performance applications. Applications to the knapsack problem, the job shop scheduling (JSS) problem, and other CO problems can be found in [91].
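The following compact Python sketch illustrates the PBIL idea described above: a probability vector over binary variables is sampled to create a population and is then shifted toward the best sampled solutions. The parameter names and the learning rate are illustrative assumptions.

import random

def pbil(fitness, n_bits, pop_size=50, iterations=100, n_best=5, lr=0.1):
    """Population-Based Incremental Learning sketch: the probability
    vector p is the (independence-based) distribution over the search
    space; fitness is to be maximized."""
    p = [0.5] * n_bits
    best = None
    for _ in range(iterations):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if best is None or fitness(pop[0]) > fitness(best):
            best = pop[0]
        for i in range(n_bits):
            # shift p toward the mean of the n_best best samples
            mean_i = sum(ind[i] for ind in pop[:n_best]) / n_best
            p[i] = (1 - lr) * p[i] + lr * mean_i
    return best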
1.3.2 Ant Colony Optimization

Ant colony optimization (ACO) [42, 40, 45] is a metaheuristic approach that was inspired by the foraging behavior of real ants. This behavior, as described by Deneubourg et al. in [36], enables ants to find shortest paths between food sources and their nest. Initially, ants explore the area surrounding their nest in a random manner. As soon as an ant finds a source of food, it carries some of this food to the nest. While walking, the ant deposits a chemical pheromone trail on the ground. The quantity of pheromone deposited, which may depend on the quantity and quality of the food, will guide other ants to the food source. The indirect communication between the ants via the pheromone trails enables them to find shortest paths between their nest and food sources. This functionality of real ant colonies is exploited in artificial ant colonies in order to solve CO problems. In ACO algorithms the chemical pheromone trails are simulated via a parametrized probabilistic model that is called the pheromone model. The pheromone model consists of a set of model parameters whose values are called the pheromone values. The basic ingredient of ACO algorithms is a constructive heuristic that is used for probabilistically constructing solutions using the pheromone values. In general, the ACO approach attempts to solve a CO problem by iterating the following two steps:

• Solutions are constructed using a pheromone model, that is, a parametrized probability distribution over the solution space.
Algorithm 13 Ant Colony Optimization (ACO)
while termination conditions not met do
  ScheduleActivities
    AntBasedSolutionConstruction()
    PheromoneUpdate()
    DaemonActions() {optional}
  end ScheduleActivities
end while
output: best solution found
• The constructed solutions, and possibly solutions that were constructed in earlier iterations, are used to modify the pheromone values in a way that is deemed to bias future sampling toward high quality solutions.
The ACO metaheuristic framework is shown in Algorithm 13. It consists of three algorithmic components that are gathered in the ScheduleActivities construct. The ScheduleActivities construct does not specify how these three activities are scheduled and synchronized. This is up to the algorithm designer. In the following we explain these three algorithm components in more detail.
AntBasedSolutionConstruction(): As mentioned above, the basic ingredient of ACO algorithms is a constructive heuristic for probabilistically constructing solutions. As outlined in Section 1.1.1, a constructive heuristic assembles solutions as sequences of solution components taken from a finite set of solution components C = {c_1, ..., c_n}. A solution construction starts with an empty partial solution s^p = (). Then, at each construction step the current partial solution s^p is extended by adding a feasible solution component from the set N(s^p) ⊆ C \ s^p, which is defined by the solution construction mechanism. The process of constructing solutions can be regarded as a walk (or a path) on the so-called construction graph G_C = (C, L), whose vertices are the solution components C and whose set L contains the connections. The allowed walks on G_C are hereby implicitly defined by the solution construction mechanism that defines the set N(s^p) with respect to a partial solution s^p. At each construction step, the choice of a solution component from N(s^p) is done probabilistically with respect to the pheromone model, which consists of pheromone trail parameters T_i that are associated with components c_i ∈ C.¹² The set of all pheromone trail parameters is denoted by T. The values of these parameters, the pheromone values, are denoted by τ_i. In most ACO algorithms the probabilities for choosing the next solution component, also called the transition probabilities, are defined

12 Note that the description of the ACO metaheuristic as given, for example, in [40] also allows connections of the construction graph to be associated with a pheromone trail parameter. However, for the purpose of this introduction it is sufficient to assume pheromone trail parameters associated with components.
as follows:

p(c_l | s^p) = ( [τ_l]^α · [η(c_l)]^β ) / ( Σ_{c_k ∈ N(s^p)} [τ_k]^α · [η(c_k)]^β ),  for c_l ∈ N(s^p),
where η is a weighting function, that is, a function that, sometimes depending on the current partial solution, assigns at each construction step a heuristic value η(c_l) to each feasible solution component c_l ∈ N(s^p). The values that are given by the weighting function are commonly called the heuristic information. Furthermore, α and β are positive parameters whose values determine the relation between pheromone information and heuristic information.

PheromoneUpdate(): In ACO algorithms we can find different types of pheromone updates. First, we outline a pheromone update that is used by most ACO algorithms. This pheromone update consists of two parts. First, a pheromone evaporation, which uniformly decreases all the pheromone values, is performed. From a practical point of view, pheromone evaporation is needed to avoid a too rapid convergence toward a sub-optimal region. It implements a useful form of forgetting, favoring the exploration of new areas in the search space. Then, one or more solutions from the current and/or from earlier iterations are used to increase the values of the pheromone trail parameters on solution components that are part of these solutions. As a prominent example, we outline in the following the pheromone update rule that was used in Ant System (AS) [39, 42], which was the first ACO algorithm proposed. This update rule, which we henceforth call AS-update, is defined by

τ_i ← (1 − ρ) · τ_i + Σ_{s ∈ S_iter | c_i ∈ s} F(s)
for i = 1, ..., n, where S_iter is the set of solutions that were generated in the current iteration. Furthermore, ρ ∈ (0, 1] is a parameter called the evaporation rate, and F: S → R+ is a function such that f(s) < f(s') ⇒ F(s) ≥ F(s'), ∀ s ≠ s' ∈ S. F(·) is commonly called the quality function. Other types of pheromone update are rather optional and mostly aim at the intensification or the diversification of the search process. An example is a pheromone update in which, during the solution construction, when adding a solution component c_i to the current partial solution s^p, the pheromone value τ_i is immediately decreased. This kind of pheromone update aims at a diversification of the search process.
DaemonActions(): Daemon actions can be used to implement centralized actions which cannot be performed by single ants. Examples are the application of local search methods to the constructed solutions or the collection of global information that can be used to decide whether it is useful or not to deposit additional pheromone to bias the search process from a non-local perspective. As a practical example, the daemon may decide to deposit extra pheromone on the solution components that belong to the best solution found so far.
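To illustrate the two core activities, the Python sketch below combines a probabilistic solution construction based on pheromone and heuristic values with the AS-update described above; the component set, heuristic values, feasibility function, and quality function are placeholders for problem-specific definitions.

import random

def construct_solution(tau, eta, alpha, beta, allowed):
    """Probabilistic construction: at each step a feasible component c is
    chosen with probability proportional to tau[c]**alpha * eta[c]**beta.
    `allowed(partial)` returns the feasible components for the current
    partial solution; `tau` and `eta` are dictionaries over components."""
    partial = []
    feasible = allowed(partial)
    while feasible:
        weights = [tau[c] ** alpha * eta[c] ** beta for c in feasible]
        partial.append(random.choices(feasible, weights=weights)[0])
        feasible = allowed(partial)
    return partial

def as_update(tau, iteration_solutions, quality, rho):
    """AS-update: evaporation followed by reinforcement of the components
    contained in the solutions generated in the current iteration."""
    for c in tau:
        tau[c] *= (1.0 - rho)              # evaporation
    for s in iteration_solutions:
        for c in s:
            tau[c] += quality(s)           # deposit proportional to solution quality
    return tau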
In general, different versions of ACO algorithms differ in the way they update the pheromone values. This also holds for the currently best-performing ACO variants in practice, which are Ant Colony System (ACS) [41], MAX-MIN Ant System (MMAS) [137], and ACO algorithms that are implemented in the hyper-cube framework (HCF) [18]. Successful applications of ACO include the application to routing in communication networks [38], to the sequential ordering problem (SOP) [62], to resource constrained project scheduling (RCPS) [100], and to the open shop scheduling (OSS) problem [17]. Further references to applications of ACO can be found in [44, 45].

1.4 DECENTRALIZED METAHEURISTICS
There exists a long tradition of using structured populations in EC, especially associated with parallel implementations. Among the most widely known types of structured EAs, distributed (dEA) [141] and cellular (cEA) [96] algorithms are very popular optimization procedures [7]. Decentralizing a single population can be achieved by partitioning it into several subpopulations, where island EAs are run performing sparse exchanges of individuals (distributed EAs), or in the form of neighborhoods (cellular EAs). In distributed EAs, additional parameters controlling when migration occurs and how migrants are selected/incorporated from/to the source/target islands are needed [14, 141]. In cellular EAs, the existence of overlapped small neighborhoods helps in exploring the search space [10]. These two kinds of EAs seem to provide a better sampling of the search space and improve the numerical and run time behavior of the basic algorithm in many cases [6, 75]. The main difference of a cEA with respect to a panmictic EA is its decentralized selection and variation. In cEAs, the reproductive loop is performed inside every one of the numerous individual pools. In a cEA, one given individual has its own pool of potential mates defined by neighboring individuals; at the same time, one individual belongs to many pools. This 1D or 2D structure with overlapped neighborhoods is used to provide a smooth diffusion of good solutions across the grid. A distributed EA is a multi-population (island) model performing sparse exchanges of individuals among the elementary populations. This model can be readily implemented in distributed memory MIMD computers, which provides one main reason for its popularity. A migration policy controls the kind of distributed EA being used. The migration policy must define the island topology, when migration occurs, which individuals are being exchanged, the synchronization among the subpopulations, and the kind of integration of exchanged individuals within the target subpopulations. The advantage of a distributed model (whether running on separate processors or not) is that it is usually faster than a panmictic EA. The reason for this is that the run time and the number of evaluations are potentially reduced due to its separate search in several regions of the problem space. A high diversity and species formation are two of their well-reported features.
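A minimal Python sketch of the island (distributed) model may clarify the migration mechanism; the inner evolutionary step is left abstract, a ring topology is assumed, and all names and parameter values are illustrative.

def distributed_ea(islands, evolve, fitness, migration_interval=10, generations=100):
    """`islands` is a list of subpopulations (lists of individuals) and
    `evolve(pop)` performs one generation on a single subpopulation and
    returns the resulting population. Every `migration_interval`
    generations the best individual of each island replaces the worst
    individual of its neighbor in a ring topology."""
    for gen in range(1, generations + 1):
        islands = [evolve(pop) for pop in islands]
        if gen % migration_interval == 0:
            for i, pop in enumerate(islands):
                migrant = max(pop, key=fitness)
                target = islands[(i + 1) % len(islands)]
                worst = min(range(len(target)), key=lambda j: fitness(target[j]))
                target[worst] = migrant
    # return the best individual over all islands
    return max((ind for pop in islands for ind in pop), key=fitness)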
As a summary, while a distributed EA has a large subpopulation, usually much larger than one individual, a cEA has typically one single individual in every subpopulation. In a dEA, the subpopulations are loosely coupled, while for a cEA they are tightly coupled. Additionally, in a dEA, there exist only a few subpopulations, while in a cEA there is a large number of them. To end this subsection, we must point out that there exists a large number of structured algorithms lying in between the distributed and the cellular classes, and much can be said on heterogeneity and synchronicity of the cooperating algorithms. The present book deals in depth with these issues in the forthcoming chapters.
1.5 HYBRIDIZATION OF METAHEURISTICS

We conclude this introduction by discussing a very promising research direction, namely the hybridization of metaheuristics. In fact, many of the successful applications that we have cited in previous sections are hybridizations. In the following we distinguish different forms of hybridization. The first one consists of including components from one metaheuristic into another one. The second form concerns systems that are sometimes labelled as cooperative search; they consist of various algorithms exchanging information in some way. The third form is the integration of metaheuristics with more conventional artificial intelligence (AI) and operations research (OR) methods. For a taxonomy of hybrid metaheuristics see [140].

1.5.1 Component Exchange Among Metaheuristics

One of the most popular ways of hybridization concerns the use of trajectory methods in population-based methods. Most of the successful applications of EC and ACO make use of local search procedures. The power of population-based methods is certainly based on the concept of recombining solutions to obtain new ones. In EC algorithms and scatter search, explicit recombinations are performed by one or more recombination operators. In ACO and EDAs recombination is implicit, because new solutions are generated by using a probability distribution over the search space which is a function of earlier populations. This enables the algorithm to make guided steps in the search space which are usually "larger" than the steps done by trajectory methods. In contrast, the strength of trajectory methods is rather to be found in the way in which they explore a promising region of the search space. As in those methods local search is the driving component, a promising area in the search space is searched in a more structured way than in population-based methods. In this way the danger of being close to good solutions but "missing" them is not as high as in population-based methods. Thus, metaheuristic hybrids that in some way manage to combine the advantages of population-based methods with the strengths of trajectory methods are often very successful.
1.5.2 Cooperative Search

A loose form of hybridization is provided by cooperative search [37, 80, 132, 142], which consists of a search performed by possibly different algorithms that exchange information about states, models, entire sub-problems, solutions, or search space characteristics. Typically, cooperative search algorithms consist of the parallel execution of search algorithms with a varying level of communication. The algorithms can be different or they can be instances of the same algorithm working on different models or running with different parameter settings. The algorithms composing a cooperative search system can be all approximate, all complete, or a mix of approximate and complete approaches.

1.5.3 Integration with Tree Search Methods and Constraint Programming
One of the most promising recent research directions is the integration of metaheuristics with more classical artificial intelligence and operations research methods, such as constraint programming (CP) and branch & bound or other tree search techniques. In the following we outline some of the possible ways of integration. Metaheuristics and tree search methods can be applied sequentially, or they can be interleaved. For instance, a tree search method can be applied to generate a partial solution which will then be completed by a metaheuristic approach. Alternatively, metaheuristics can be applied to improve a solution generated by a complete method. CP techniques can be used to reduce the search space of the problem under consideration (see, for example, [56]). In CP, CO problems are modelled by means of variables, domains,¹³ and constraints, which can be mathematical (as, for example, in linear programming) or symbolic. Constraints encapsulate well-defined parts of the problem into sub-problems. Every constraint is associated with a filtering algorithm that deletes those values from a variable domain that do not contribute to feasible solutions. Metaheuristics (especially trajectory methods) may use CP to efficiently explore the neighborhood of the current solution, instead of simply enumerating the neighbors or randomly sampling the neighborhood. A prominent example of such an integration is Large Neighborhood Search [127] and related techniques. These approaches are effective mainly when the neighborhood to explore is very large or when problems (such as many real-world problems) have additional constraints (called side constraints). Another possible combination consists of introducing concepts or strategies from either class of algorithms into the other. For example, the concepts of tabu list and aspiration criteria, known from tabu search, can be used to manage the list of open nodes (i.e., the ones whose child nodes are not yet explored) in a tree search algorithm. Examples of these approaches can be found in [33, 119].
13 We restrict the discussion to finite domains.
1.6 CONCLUSIONS

This chapter has offered a detailed description of the different kinds of metaheuristics and has been organized with a clear structure in order to be accessible for readers interested only in parts of it. The main goal is to make this book self-contained for readers not familiar with some of the techniques later parallelized in the forthcoming chapters. We have tried to strike a trade-off between a detailed description of the working principles of the metaheuristics and a fast survey of techniques. Besides the pure canonical techniques we have also pointed out some promising lines of research for improving their behavior, such as hybridization, as well as some lines leading to a direct parallelization of the algorithms, such as that of decentralized algorithms.
Acknowledgments

The first and third authors acknowledge funding from the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project). The first author also acknowledges a Juan de la Cierva post-doctoral fellowship from the Spanish Ministry of Science and Technology.
REFERENCES

1. E. H. L. Aarts, J. H. M. Korst, and P. J. M. van Laarhoven. Simulated annealing. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 91-120. John Wiley & Sons, Chichester, UK, 1997.
2. E. H. L. Aarts and J. K. Lenstra, editors. Local Search in Combinatorial Optimization. John Wiley & Sons, Chichester, UK, 1997.
3. R. M. Aiex, S. Binato, and M. G. C. Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 29(4):393-430, 2003.
4. E. Alba, F. Luna, A. J. Nebro, and J. M. Troya. Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. Parallel Computing, 30(5-6):699-719, 2004.
5. E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, October 2002.
6. E. Alba and J. M. Troya. Improving flexibility and efficiency by adding parallelism to genetic algorithms. Statistics and Computing, 12(2):91-114, 2002.
7. E. Alba and J. M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.
8. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
9. T. Bäck, D. B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. Institute of Physics Publishing Ltd, Bristol, UK, 1997.
10. S. Baluja. Structure and performance of fine-grain parallelism in genetic search. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 155-162. Morgan Kaufmann, 1993.
11. S. Baluja and R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. In A. Prieditis and S. Russell, editors, The International Conference on Machine Learning 1995, pages 38-46, San Mateo, California, 1995. Morgan Kaufmann Publishers.
12. R. Battiti and M. Protasi. Reactive Search, a history-sensitive heuristic for MAXSAT. ACM Journal of Experimental Algorithmics, 2:Article 2, 1997.
13. R. Battiti and G. Tecchiolli. The Reactive Tabu Search. ORSA Journal on Computing, 6(2):126-140, 1994.
14. T. C. Belding. The distributed genetic algorithm revisited. In L. J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 114-121. Morgan Kaufmann, 1995.
15. H. Bersini and G. Seront. In search of a good evolution-optimization crossover. In R. Manner and B. Manderick, editors, Proceedings of PPSN-II, Second International Conference on Parallel Problem Solving from Nature, pages 479-488. Elsevier, Amsterdam, The Netherlands, 1992.
16. S. Binato, W. J. Hery, D. Loewenstern, and M. G. C. Resende. A greedy randomized adaptive search procedure for job shop scheduling. In P. Hansen and C. C. Ribeiro, editors, Essays and Surveys on Metaheuristics. Kluwer Academic Publishers, 2001.
17. C. Blum. Beam-ACO - Hybridizing ant colony optimization with beam search: An application to open shop scheduling. Computers & Operations Research, 32(6):1565-1591, 2005.
18. C. Blum and M. Dorigo. The hyper-cube framework for ant colony optimization. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 34(2):1161-1172, 2004.
19. C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268-308, 2003.
20. P. Calégari, G. Coray, A. Hertz, D. Kobler, and P. Kuonen. A taxonomy of evolutionary algorithms in combinatorial optimization. Journal of Heuristics, 5:145-158, 1999.
21. V. Campos, F. Glover, M. Laguna, and R. Marti. An Experimental Evaluation of a Scatter Search for the Linear Ordering Problem. Journal of Global Optimization, 21:397-414, 2001.
22. D. J. Cavicchio. Adaptive search using simulated evolution. PhD thesis, University of Michigan, Ann Arbor, MI, 1970.
23. V. Cerny. A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45:41-51, 1985.
24. P. Chardaire, J. L. Lutton, and A. Sutter. Thermostatistical persistency: A powerful improving concept for simulated annealing algorithms. European Journal of Operational Research, 86:565-579, 1995.
25. R. Chelouah and P. Siarry. A continuous genetic algorithm designed for the global optimization of multimodal functions. Journal of Heuristics, 6:191-213, 2000.
26. R. Chelouah and P. Siarry. Tabu search applied to global optimization. European Journal of Operational Research, 123:256-270, 2000.
27. R. Chelouah and P. Siarry. Genetic and Nelder-Mead algorithms hybridized for a more accurate global optimization of continuous multiminima functions. European Journal of Operational Research, 148:335-348, 2003.
28. C. A. Coello Coello. An Updated Survey of GA-Based Multiobjective Optimization Techniques. ACM Computing Surveys, 32(2):109-143, 2000.
29. A. Colorni, M. Dorigo, and V. Maniezzo. Metaheuristics for high school timetabling. Computational Optimization and Applications, 9(3):275-298, 1998.
30. D. T. Connolly. An improved annealing scheme for the QAP. European Journal of Operational Research, 46:93-100, 1990.
31. J. S. de Bonet, C. L. Isbell Jr., and P. Viola. MIMIC: Finding optima by estimating probability densities. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 7 (NIPS7), pages 424-431. MIT Press, Cambridge, MA, 1997.
32. K. A. DeJong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, Ann Arbor, MI, 1975. Dissertation Abstracts International 36(10), 5140B, University Microfilms Number 76-9381.
33. F. Della Croce and V. T'kindt. A Recovering Beam Search algorithm for the one machine dynamic total completion time scheduling problem. Journal of the Operational Research Society, 53(11):1275-1280, 2002.
34. M. Dell'Amico, A. Lodi, and F. Maffioli. Solution of the Cumulative Assignment Problem with a well-structured Tabu Search method. Journal of Heuristics, 5:123-143, 1999.
35. M. L. den Besten, T. Stützle, and M. Dorigo. Design of iterated local search algorithms: An example application to the single machine total weighted tardiness problem. In E. J. W. Boers, J. Gottlieb, P. L. Lanzi, R. E. Smith, S. Cagnoni, E. Hart, G. R. Raidl, and H. Tijink, editors, Applications of Evolutionary Computing: Proceedings of Evo Workshops 2001, volume 2037 of Lecture Notes in Computer Science, pages 441-452. Springer Verlag, Berlin, Germany, 2001.
36. J.-L. Deneubourg, S. Aron, S. Goss, and J.-M. Pasteels. The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behaviour, 3:159-168, 1990.
37. J. Denzinger and T. Offerman. On cooperation between evolutionary algorithms and other search paradigms. In Proceedings of Congress on Evolutionary Computation - CEC'1999, pages 2317-2324, 1999.
38. G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9:317-365, 1998.
39. M. Dorigo. Optimization, Learning and Natural Algorithms (in Italian). PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1992.
40. M. Dorigo, G. Di Caro, and L. M. Gambardella. Ant algorithms for discrete optimization. Artificial Life, 5(2):137-172, 1999.
41. M. Dorigo and L. M. Gambardella. Ant Colony System: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53-66, 1997.
42. M. Dorigo, V. Maniezzo, and A. Colorni. Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 26(1):29-41, 1996.
43. M. Dorigo and T. Stützle. http://www.metaheuristics.net/, 2000. Visited in January 2003.
44. M. Dorigo and T. Stützle. The ant colony optimization metaheuristic: Algorithms, applications and advances. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science, pages 251-285. Kluwer Academic Publishers, Norwell, MA, 2002.
45. M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, Cambridge, MA, 2004.
46. J. Dréo and P. Siarry. A new ant colony algorithm using the heterarchical concept aimed at optimization of multiminima continuous functions. In M. Dorigo, G. Di Caro, and M. Sampels, editors, Proceedings of ANTS 2002 - From Ant Colonies to Artificial Ants: Third International Workshop on Ant Algorithms, volume 2463
REFERENCES
35
of Lecture Notes in Computer Science, pages 2 16-221. Springer Verlag, Berlin, Germany, 2002. 47. G. Dueck. New Optimization Heuristics. Journal of Computational Physics, 104:86-92, 1993. 48. G. Dueck and T. Scheuer. Threshold Accepting: A General Purpose Optimization Algorithm Appearing Superior to Simulated Annealing. Journal of Computational Physics, 90:161-175, 1990. 49. A. E. Eiben, P.-E. RauC, and Z. Ruttkay. Genetic algorithms with multi-parent recombination. In Y.Davidor, H.-P. Schwefel, and R. Manner, editors, Proceedings of the 3rd Conference on Parallel Problem Solvingfrom Nature, volume 866 of Lecture Notes in Computer Science, pages 78-87, Berlin, 1994. Springer. 50. A. E. Eiben and Z. Ruttkay. Constraint satisfaction problems. In T. Back, D. Fogel, and M. Michalewicz, editors, Handbook ofEvolutionary Computation. Institute of Physics Publishing Ltd, Bristol, UK, 1997. 5 1. A. E. Eiben and C. A. Schippers. On evolutionary exploration and exploitation. Fundamenta Informaticae, 35: 1-16, 1998. 52. W. Feller. An Introduction to Probability Theory and its Applications. John Whiley, 1968. 53. T. A. Feo and M.G. C. Resende. Greedy randomized adaptive search procedures. Journal of Global optimization, 6~109-133, 1995. 54. P. Festa and M. G. C. Resende. GRASP: An annotated bibliography. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys on Metaheuristics, pages 325-367. Kluwer Academic Publishers, 2002. 55. M. Fleischer. Simulated Annealing: past, present and future. In C. Alexopoulos, K. Kang, W. R. Lilegdon, and G . Goldsman, editors, Proceedings ofthe 1995 Winter Simulation Conference, pages 155-161, 1995. 56. F. Focacci, F. Laburthe, and A. Lodi. Local Search and Constraint Programming. In F. Glover and G. Kochenberger, editors, Handbook OfMetaheuristics, volume 57 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Norwell, MA, 2002. 57. D. B. Fogel. An introduction to simulated evolutionary optimization. ZEEE Transactions on Neural Networks, 5( 1):3-14, January 1994. 58. G. B. Fogel, V. W. Porto, L).G. Weekes, D. B. Fogel, R. H. Griffey, J. A. McNeil, E. Lesnik, D. J. Ecker, and R. Sampath. Discovery of RNA structural elements using evolutionary computation. Nucleic Acids Research, 30(23):53 10-53 17, 2002.
36
AN INTRODUCTION T O METAHEURISTIC TECHNIQUES
59. L. J. Fogel. Toward inductive inference automata. In Proceedings of the International Federationfor Information Processing Congress, pages 395-399, Munich, 1962. 60. L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. Wiley, 1966. 61. C. Fonlupt, D. Robilliard, P. Preux, and E. G. Talbi. Fitness landscapes and performance of meta-heuristics. In S. Vol3, S. Martello, 1. Osman, and C. Roucairol, editors, Meta-heuristics: udvunces and trends in local search paradigms for optimization. Kluwer Academic, 1999. 62. L. M. Gambardella and M. Dorigo. Ant Colony System hybridized with a new local search for the sequential ordering problem. INFORMS Journal on Computing, 12(3):237-255,2000, 63. M. R. Garey and D. S. Johnson. Computers and intractability; u guide to the theory of NP-completeness. W. H. Freeman, 1979. 64. M. Gendreau, G. Laporte, and J.-Y. Potvin. Metaheuristics for the capacitated VRP. In I? Toth and D. Vigo, editors, The Vehicle Routing Problem, volume 9 of SIAM Monographs on Discrete Mathematics and Applications, pages 129-1 54. SIAM, Philadelphia, 2002. 65. F. Glover. Heuristics for Integer Programming Using Surrogate Constraints. Decision Sciences, 8: 156166, 1977. 66. F. Glover. Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 131533-549,1986. 67. F. Glover. Tabu Search Part 11. ORSA Journal on Computing, 2(1):4-32, 1990. 68. F. Glover. Scatter search and path relinking. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, Advanced topics in computer science series. McGraw-Hill, 1999. 69. F. Glover and G. Kochenberger, editors. Handbook of Metaheuristics. Kluwer Academic Publishers, Nonvell, MA, 2002. 70. F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, 1997. 7 1. F. Glover, M. Laguna, and R. Marti. Fundamentals of scatter search and path relinking. Control and Cybernetics, 29(3):653-684,2000. 72. F. Glover, M. Laguna, and R. Marti. Scatter Search and Path Relinking: Advances and Applications. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Nonvell, MA, 2002. 73. D. E. Goldberg. Genetic algorithms in search, optimization and machine learning. Addison Wesley, Reading, MA, 1989.
REFERENCES
37
74. D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette, editor, Genetic Algorithms and their Applications, pages 4 1-139. Lawrence Erlbaum Associates, Hillsdale, NJ. 1987. 75. V. S. Gordon and D. Whitley. Serial and parallel genetic algorithms as function optimizers. In S. Forrest, editor, Proceedings of the F$h International Conference on Genetic Algorithms, pages 177-1 83. Morgan Kaufmann, 1993. 76. P. Hansen. The steepest ascent mildest descent heuristic for combinatorial programming. In Congress on Numerical Methods in Combinatorial Optimization, Capri, Italy, 1986. 77. P. Hansen and N. MladenoviC. Variable Neighborhood Search for the pMedian. Location Science, 5:207-226, 1997. 78. P. Hansen and N. Mladenovid. Variable neighborhood search: Principles and applications. European Journal of Operational Research, 130:449467,200 1. 79. A. Hertz and D. Kobler. A framework for the description of evolutionary algorithms. European Journal of Operational Research, 126:1-12,2000.
80. T. Hogg and C. P. Williams. Solving the really hard problems with cooperative search. In Proceedings ofAAAI93, pages 213-235. AAAI Press, 1993. 81. J. H. Holland. Adaption in natural and artijicial systems. The University of Michigan Press, Ann Harbor, MI, 1975. 82. L. Ingber. Adaptive simulated annealing (ASA): Lessons learned. Control and Cybernetics - Special Issue on Simulated Annealing Applied to Combinatorial Optimization, 25( 1):33-54, 1996. 83. D. S. Johnson and L. A. McGeoch. The traveling salesman problem: a case study. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 21 5-3 10. John Wiley & Sons, Chichester, UK, 1997. 84. T. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, Univ. of New Mexico, Albuquerque, NM, 1995. 85. D. E. Joslin and D. P. Clements. "Squeaky Wheel" Optimization. Journal qf Art!iicial Intelligence Research, 10:353-373, 1999. 86. P. Kilby, P. Prosser, and P. Shaw. Guided Local Search for the Vehicle Routing Problem with time windows. In S. Vo13, S. Martello, I. Osman, and C. Roucairol, editors, Meta-heuristics: advances and trends in local search paradigms for optimization, pages 473486. Kluwer Academic, 1999. 87. S. Kirkpatrick, C . D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.
38
AN INTRODUCTION TO METAHEURISTIC TECHNIQUES
88. M. Laguna, H. R. LourenGo, and R. Marti. Assigning Proctors to Exams with Scatter Search. In M. Laguna and J. L. Gonzalez-Velarde, editors, Computing Toolsfor Modeling, Optimization and Simulation: Interfaces in Computer Science and Operations Research, pages 2 15-227. Kluwer Academic Publishers, Boston, MA, 2000. 89. M. Laguna and R. Marti. GRASP and Path Relinking for 2-Layer Straight Line Crossing Minimization. INFORMS Journal on Computing, 11(1):44-52, 1999.
90. M. Laguna, R. Marti, and V. Campos. Intensification and Diversification with Elite Tabu Search Solutions for the Linear Ordering Problem. Computers and Operations Research, 26:1217-1230, 1999. 9 1. P. Larraiiaga and J. A. Lozano, editors. Estimation OfDistribution Algorithms: A New, Too1,forEvolutionary Cornputation. Kluwer Academic Publishers, Boston, MA, 2002. 92. E. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. The Travelling Salesman Problem. John Wiley & Sons, New York, NY, 1985. 93. H. R. Lourenqo, 0. Martin, and T. Stiitzle. Iterated local search. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science, pages 32 1-353. Kluwer Academic Publishers, Nonvell, MA, 2002. 94. M. Lundy and A. Mees. Convergence of an annealing algorithm. Mathematical Programming, 34(1):111-124, 1986. 95. S.W. Mahfoud. NichingMethods.for Genetic Algorithms. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, IL, 1995. 96. B. Manderick and P. Spiessens. Fine-grained parallel genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 428433. Morgan Kaufmann, 1989. 97. 0. Martin and S. W. Otto. Combining Simulated Annealing with Local Search Heuristics. Annals of Operations Research, 63:57-75, 1996. 98. 0. Martin, S. W. Otto, and E. W. Felten. Large-step markov chains for the traveling salesman problem. Complex Systems, 5(3):299-326, 1991. 99. M. Mathur, S. B. Karale, S. Priye, V. K. Jyaraman, and B. D. Kulkami. Ant colony approach to continuous hnction optimization. Industrial & Engineering Chemistry Research, 39:38 14-3822, 2000. 100. D. Merkle, M. Middendorf, and H. Schmeck. Ant Colony Optimization for Resource-Constrained Project Scheduling. IEEE Trunsuctions on Evolutionary Computation, 6(4):333-346,2002.
REFERENCES
39
10 1. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21: 1087-1092, 1953. 102. Z. Michalewicz and M. Michalewicz. Evolutionary computation techniques and their applications. In Proceedings of the IEEE International Conference on Intelligent Processing Systems, pages 14-24, Beijing, China, 1997. Institute of Electrical & Electronics Engineers, Incorporated. 103. P. Mills and E. Tsang. Guided Local Search for solving SAT and weighted MAX-SAT Problems. In Ian Gent, Hans van Maaren, and Toby Walsh, editors, SAT2000, pages 89-106. 10s Press, 2000. 104. M. Mitchell. A n introduction to genetic algorithms. MIT press, Cambridge, MA, 1998. 105. N. Mladenovid and D. UroSevic. Variable Neighborhood Search for the kCardinality Tree. In Proceedings of the Fourth Metaheuristics International Conference, volume 2, pages 743-747,2001. 106. P. Moscato. Memetic algorithms: A Short Introduction. In F. Glover, D. Corne and M. Dorigo, editors, New Ideas in Optimization. McGraw-Hill, 1999.
107. H. Miihlenbein. Evolution in time and space - the parallel genetic algorithm. In G. J. E. Rawlins, editor, Foundations of Genetic Algorithms. Morgan Kaufmann, San Mateo, USA, 1991. 108. H. Miihlenbein and G. Paal3. From recombination of genes to the estimation of distributions. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, editors, Proceedings of the 4th Conference on Parallel Problem Solving .from Nature - PPSN IV, volume 1411 of Lecture Notes in Computer Science, pages 178-187, Berlin, 1996. Springer. 109. H. Miihlenbein and H.-M. Voigt. Gene Pool Recombination in Genetic Algorithms. In I. H. Osman and J. P. Kelly, editors, Proc. ofthe Metaheuristics Co&nnce, Nonvell, USA, 1995. Kluwer Academic Publishers. 110. J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-3 13, 1965. 111. G. L. Nemhauser and A. L. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, New York, 1988. 112. E. Nowicki and C. Smutnicki. A fast taboo search algorithm for the job-shop problem. Management Science, 42(2):797-8 13, 1996. 113. I. H. Osman. Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem. Annals of Operations Research, 41 :421-45 1, 1993. 114. C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization -Algorithms and Complexity. Dover Publications, Inc., New York, 1982.
40
AN INTRODUCTION TO METAHEURISTIC TECHNIQUES
115. M. Pelikan, D. E. Goldberg, and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings ofthe Genetic and Evolutionmy Computation Conference (GECCO-1999), volume I, pages 525-532. Morgan Kaufmann Publishers, San Fransisco, CA, 1999. 116. M. Pelikan, D. E. Goldberg, and F. Lobo. A survey of optimization by building and using probabilistic models. Technical Report No. 99018, IlIiGAL, University of Illinois, 1999. 117. L. S. Pitsoulis and M. G. C. Resende. Greedy Randomized Adaptive Search procedure. In P. M. Pardalos and M. G. C. Resende, editors, HandbookofApplied Optimization, pages 168-183. Oxford University Press, 2002. 118. M. Prais and C. C. Ribeiro. Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing, 12:164-176, 2000. 119. S. Prestwich. Combining the Scalability of Local Search with the Pruning Techniques of Systematic Search. Annals of Operations Research, 1 15:51-72, 2002. 120. N. J. Radcliffe. Forma Analysis and Random Respectful Recombination. In Proceedings of the Fourth International Conference on Genetic Algorithms, ICGA 1991, pages 222-229. Morgan Kaufmann Publishers, San Mateo, California, 1991. 121. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nnch Prinzipien der biologischen Evolution. Frommann-Holzboog, 1973. 122. C. R. Reeves, editor. Modern Heuristic Techniquesfor Combinatorial Problems. Blackwell Scientific Publishing, Oxford, England, 1993. 123. C. R. Reeves. Landscapes, operators and heuristic search. Annals of Operations Research, 86:473-490, 1999. 124. C. R. Reeves and J. E. Rowe. Genetic Algorithms: Principles and Perspectives. A Guide to GA Theory. Kluwer Academic Publishers, Boston (USA), 2002. 125. M. G. C. Resende and C. C. Ribeiro. A GRASP for graph planarization. Networks, 29:173-189,1997. 126. C. C. Ribeiro and M. C. Souza. Variable neighborhood search for the degree constrained minimum spanning tree problem. Discrete Applied Mathematics, 118~43-54,2002. 127. P. Shaw. Using Constraint Programming and Local Search Methods to Solve Vehicle Routing Problems. In M. Maher and J.-F. Puget, editors, Principle and Practice o f Constraint Programming - CP98, volume 1520 of Lecture Notes in Computer Science. Springer, 1998.
REFERENCES
41
128. P. Siany, G. Berthiau, F. Durbin, and J. Haussy. Enhanced simulated annealing for globally minimizing functions of many-continuous variables. ACM Transactions on Mathematical Sofmare, 23(2):209-228, 1997. 129. M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Perez-Uribe, and A. Stauffer. A Phylogenetic, Ontogenetic, and Epigenetic View of Bio-Inspired Hardware Systems. IEEE Transactions on Evolutionary Computation, 1(1):83-97, 1997. 130. K. Smyth, H. H. Hoos, and T. Stiitzle. Iterated robust tabu search for MAX-SAT. In Proc. of the 16th Canadian Conference on Artijicial Intelligence (AI’2003), volume 2671 of Lecture Notes in Computer Science, pages 129-144. Springer Verlag, 2003. 131. K. Socha. Extended ACO for continuous and mixed-variable optimization. In M. Dorigo, M. Birattari, C. Blum, L. M. Gambardella, F. Mondada, and T. Stiitzle, editors, Proceedings of ANTS 2004 - Fourth International Workshop on Ant Algorithms and Swarm Intelligence, Lecture Notes in Computer Science 3 172, pages 25-36. Springer Verlag, Berlin, Germany, 2004. 132. L. Sondergeld and S. Vo13. Cooperativeintelligent search using adaptive memory techniques. In S. Vo13, S. Martello, I. Osman, and C. Roucairol, editors, Metaheuristics: advances and trends in local search paradigms .for optimization, chapter 21, pages 297-3 12. Kluwer Academic Publishers, 1999. 133. W. M. Spears, K. A. De Jong, T. Back, D. B. Fogel, and H. de Garis. An overview of evolutionary computation. In P. B. Brazdil, editor, Proceedings of the European Conference on Machine Learning (ECML-93), volume 667, pages 442-459, Vienna, Austria, 1993. Springer Verlag. 134. P. F. Stadler. Landscapes and their correlation functions. Journal of Mathematical Chemistry, 20:145, 1996. Also available as SFI preprint 95-07-067. 135. R. Storn and K. Price. Differential evolution -A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341-359, 1997. 136. T. Stiitzle. Local Search Algorithms for Combinatorid Problems - Analysis, Algorithms and New Applications. DISK1 - Dissertationen zur Kiinstliken Intelligenz. infix, Sankt Augustin, Germany, 1999. 137. T. Stiitzle and H. H. Hoos. M A X - M Z N Ant System. Future Generation Computer Systems, 16(8):889-914,2000. 138. G. Syswerda. Simulated Crossover in Genetic Algorithms. In L. D. Whitley, editor, Proceedings of the second workshop on Foundations of Genetic Algorithms, pages 239-255, San Mateo, California, 1993. Morgan Kaufmann Publishers. 139. E. D. Taillard. Robust Taboo Search for the Quadratic Assignment Problem. Parallel Computing, 17:443-455, 1991.
42
AN INTRODUCTION T O METAHEURISTIC TECHNIQUES
140. E-G. Talbi. A Taxonomy of Hybrid Metaheuristics. Journal of Heuristics, 8( 5):54 1-564,2002. 141. R. Tanese. Distributed genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 434439. Morgan Kaufmann, 1989. 142. M. Toulouse, T. G. Crainic, and B. Sa nd. An experimental study of the systemic behavior of cooperative search algorithms. In S. VoR, S. Martello, I. Osman, and C. Roucairol, editors, Meta-heuristics: adimnces and trends in local search paradigms for optimization, chapter 26, pages 373-392. Kluwer Academic Publishers. 1999. 143. D. UroSeviC, J. Brimberg, and N. MladenoviC. Variable neighborhood decomposition search for the edge weighted k-cardinality tree problem. Computers & Operations Research, 3 l(8): 1205-1213,2004. 144. P. J. M. Van Laarhoven, E. H. L. Aarts, and J. K. Lenstra. Job Shop Scheduling by Simulated Annealing. Operations Research, 40: 113-125, 1992. 145. M. D. Vose. The simple genetic algorithm: foundations and theory. MIT Press, Cambridge, MA, 1999. 146. C . Voudouris and E. Tsang. Guided Local Search. European Journal of Operational Research, 113(2):469-499, 1999. 147. A. S. Wade and V. J. Rayward-Smith. Effective local search for the steiner tree problem. Studies in Locational Analysis, 11:219-24 1, 1997. Also in Advances in Steiner Trees, ed. by Ding-Zhu Du, J. M.Smith and J. H. Rubinstein, Kluwer, 2000. 148. M. Yagiura and T. Ibaraki. On metaheuristic algorithms for combinatorial optimization problems. Systems and Computers in Japan, 32(3):33-55,2001.
2 Measuring the Performance of Parallel Metaheuristics

ENRIQUE ALBA, GABRIEL LUQUE
Universidad de Malaga, Spain
2.1 INTRODUCTION
Most optimization tasks found in real-world applications impose several constraints that usually do not allow the utilization of exact methods. The complexity of these problems (they are often NP-hard [13]) or the limited resources available to solve them (time, memory) have made the development of metaheuristics a major field in operations research. In these cases, metaheuristics provide optimal or suboptimal feasible solutions in a reasonable time. Although the use of metaheuristics allows the search time to be reduced significantly, the high dimension of many tasks will always pose problems and result in time-consuming scenarios for industrial problems. Therefore, parallelism is an approach not only to reduce the resolution time but also to improve the quality of the provided solutions. The latter holds since parallel algorithms usually run a different search model with respect to sequential ones [4]. Unlike exact methods, where time efficiency is the main measure of success, there are two chief issues in evaluating parallel metaheuristics: how fast solutions can be obtained, and how far they are from the optimum. We can distinguish between two different approaches for analyzing metaheuristics: a theoretical analysis (worst-case analysis, average-case analysis, ...) or an experimental analysis. Several authors [16, 20] have developed theoretical analyses of some importance for a number of heuristics and problems. But their difficulty, which makes it hard to obtain results for most realistic problems and algorithms, severely limits their range of application. As a consequence, most metaheuristics are evaluated empirically in an ad hoc manner. An experimental analysis usually consists of applying the developed algorithms to a collection of problem instances and comparatively reporting the observed solution quality and consumed computational resources (usually time). Other researchers [5, 26] have tried to offer a kind of methodological framework to deal with the
experimental evaluations of heuristics, which mainly motivates this chapter. Important aspects of an evaluation are the experimental design, finding good sources of test instances, measuring the algorithmic performance in a meaningful way, sound analysis, and clear presentation of results. Due to the great difficulty of doing all this correctly, the actual main issues of the experimental evaluation are simplified here to just highlighting some guidelines for designing experiments and reporting on the experimentation results. An excellent algorithmic survey about simulations and statistical analysis is given in [24]. In that paper, McGeoch includes an extensive set of basic references on statistical methods and a general guide for designing experiments. In this chapter, we focus on how the experiments should be performed and how the results must be reported in order to make fair comparisons between parallel metaheuristics. In particular, we are interested in revising, proposing, and applying parallel performance metrics and statistical analysis guidelines to ensure that our conclusions are correct. This chapter is organized as follows. The next section briefly summarizes some parallel metrics such as speedup and related performance measures. Section 2.3 discusses how to report results on parallel metaheuristics. Then, in the next section, we perform several practical experiments to illustrate the importance of the metric in the conclusions. Finally, some concluding remarks are outlined in Section 2.5.
2.2 PARALLEL PERFORMANCE MEASURES

There are different metrics to measure the performance of parallel algorithms. In the first subsection we discuss in detail the most common measure, i.e., the speedup, and address its meaningful utilization in parallel metaheuristics. Later, in a second subsection, we summarize some other metrics also found in the literature.
2.2.1 Speedup
The most important measure of a parallel algorithm is the speedup. This metric compares two times: it is the ratio between the sequential and the parallel time. Therefore, the definition of time is the first aspect that we must face. In a uni-processor system, a common performance measure is the CPU time to solve the problem; this is the time the processor spends executing algorithm instructions, typically excluding the time for input of problem data, output of results, and system overhead activities. In the parallel case, time is neither a sum of CPU times on each processor nor the largest among them. Since the objective of parallelism is the reduction of the real time, time should definitely include any overhead activity time because it is the price of using a parallel algorithm. Hence the most prudent choice for measuring the performance of a parallel code is the wall-clock time to solve the problem at hand. This means using the time between starting and finishing the whole algorithm. The speedup compares the serial time against the parallel time to solve a particular problem. If we denote by T_m the execution time for an algorithm using m processors,
the speedup is the ratio between the sequential execution time on a uni-processor, T_1, and the execution time on m processors, T_m:

    s_m = T_1 / T_m    (2.1)

For non-deterministic algorithms we cannot use this metric directly. For this kind of method, the speedup should instead compare the mean serial execution time against the mean parallel execution time:

    s_m = E[T_1] / E[T_m]    (2.2)
With this definition we can distinguish among sublinear speedup (s_m < m), linear speedup (s_m = m), and superlinear speedup (s_m > m). The main difficulty with this measure is that researchers do not agree on the meaning of T_1 and T_m. In his study, Alba [1] distinguishes between several definitions of speedup depending on the meaning of these values (see Table 2.1).

Table 2.1 Taxonomy of speedup measures proposed by Alba [1]
I. Strong speedup
II. Weak speedup
  A. Speedup with solution stop
    1. Versus panmixia
    2. Orthodox
  B. Speedup with predefined effort

Strong speedup (type I) compares the parallel run time against the best-so-far sequential algorithm. This is the most exact definition of speedup, but due to the difficulty of finding the current most efficient algorithm, most designers of parallel algorithms do not use it. Weak speedup (type II) compares the parallel algorithm developed by a researcher against his/her own serial version. In this case, two stopping criteria for the algorithms exist: solution quality or maximum effort. The author discards the latter because it is against the aim of speedup to compare algorithms not yielding results of equal accuracy. He proposes two variants of the weak speedup with solution stop: to compare the parallel algorithm against the canonical sequential version (type II.A.1) or to compare the run time of the parallel algorithm on one processor against the run time of the same algorithm on m processors (type II.A.2). In the first case we are comparing two clearly different algorithms. Barr and Hickman [6] presented a different taxonomy: Speedup, Relative speedup, and Absolute speedup. The Speedup measures the ratio between the time of the fastest serial code on a parallel machine and the time of the parallel code using m processors on the same machine. The Relative speedup is the ratio of the serial execution time of the parallel code on one processor with respect to the execution time of that code on m processors. This definition is similar to the type II.A.2 shown above.
The Absolute speedup compares the fastest serial time on any computer with the parallel time on m processors. This metric is the same as the strong speedup defined in [1]. As a conclusion, it is clear that a parallel metaheuristic should reach a similar accuracy to the sequential one; only in this case are we allowed to compare times. The times used are mean times: the parallel code on one machine versus the parallel code on m machines. All this defines a sound way of making comparisons, both practical (no best algorithm needed) and orthodox (same codes, same accuracy).
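To make this orthodox comparison concrete, the following sketch (a hypothetical helper, not taken from any particular library) computes the type II.A.2 weak speedup from two samples of wall-clock times collected at the same target accuracy:

    from statistics import mean

    def weak_speedup(times_1proc, times_mproc):
        """Orthodox weak speedup: mean wall-clock time of the parallel code run
        on one processor divided by its mean time on m processors. Both samples
        must come from runs stopped at the same solution quality."""
        return mean(times_1proc) / mean(times_mproc)

    # Hypothetical wall-clock times (in seconds) of the same parallel algorithm
    t1 = [19.8, 20.4, 19.1, 21.0]   # runs on 1 processor
    t8 = [2.6, 2.4, 2.7, 2.5]       # runs on 8 processors
    s8 = weak_speedup(t1, t8)
    print(f"s_8 = {s8:.2f}")        # a value above 8 would indicate superlinear speedup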
2.2.1.1 Superlinear Speedup. Although several authors have reported superlinear speedup [8, 23], its existence is still controversial. Anyway, based on these experiences, we can expect to get superlinear speedup sometimes. In fact, we can point out several sources behind superlinear speedup:

- Implementation Source. The algorithm being run on one processor is "inefficient" in some way. For example, if the algorithm uses lists of data, the parallel one can be faster because it manages shorter lists. On the other hand, parallelism can simplify several operations of the algorithm.

- Numerical Source. Since the search space is usually very large, the sequential program may have to search a large portion of it before finding the required solution. On the other hand, the parallel version may find the solution more quickly due to the change in the order in which the space is searched.

- Physical Source. When moving from a sequential to a parallel machine, it is often the case that one gets more than an increase in CPU power (see Figure 2.1). Other resources, such as memory, cache, etc., may also increase linearly with the number of processors. A parallel metaheuristic may achieve superlinear speedup by taking advantage of these additional resources.
We therefore conclude that superlinear speedup is possible both theoretically and, as a result of empirical tests, in practice, both for homogeneous [31] and heterogeneous [3, 11] computing networks.

Fig. 2.1 Physical source for superlinear speedup. The population does not fit into a single cache, but when run in parallel, the resulting chunks do fit, providing superlinear values of speedup.

2.2.2 Other Parallel Metrics
Although the speedup is a widely used metric, there exist other measures of the performance of a parallel metaheuristic. The efficiency (Equation 2.3) is a normalization of the speedup and allows the comparison of different algorithms (e_m = 1 means linear speedup):

    e_m = s_m / m    (2.3)

There exist several variants of the efficiency metric. For example, the incremental efficiency (Equation 2.4) shows the fraction of time improvement obtained by adding another processor, and it is also often used when the uni-processor times are unknown. This metric has been later generalized (Equation 2.5) to measure the improvement attained by increasing the number of processors from n to m.
The previous metrics indicate the improvement coming from using additional processing elements, but they do not measure the utilization of the available memory. The scaled speedup (Equation 2.6) addresses this issue and allows us to measure the full utilization of the machine resources:
    ss_m = (estimated time to solve a problem of size nm on 1 processor) / (actual time to solve a problem of size nm on m processors)    (2.6)
where n is the size of the largest problem which may be stored in the memory associated with one processor. Its major disadvantage is that performing an accurate estimation of the serial time is difficult, making it impractical for many problems. Closely related to the scaled speedup is the scaleup, but it is not based on an estimation of the uni-processor time:
    su_{n,m} = (time to solve k problems on m processors) / (time to solve nk problems on nm processors)    (2.7)
This metric measures the ability of the algorithm to solve an n-times larger job on an n-times larger system in the same time as the original system. Therefore, linear speedup occurs when su_{n,m} = 1. Finally, Karp and Flatt [19] have devised an interesting metric for measuring the performance of any parallel algorithm that can help us to identify much more subtle effects than using the speedup alone. They call it the serial fraction of the algorithm (Equation 2.8):

    f_m = (1/s_m - 1/m) / (1 - 1/m)    (2.8)
Ideally, the serial fraction should stay constant for an algorithm. If a speedup value is small because the loss of efficiency is due to the limited parallelism of the program, we can still say that the result is good if f_m remains constant for different values of m. On the other hand, a smoothly increasing f_m is a warning that the granularity of the parallel tasks is too fine. A third scenario is possible in which a significant reduction in f_m occurs, indicating something akin to superlinear speedup; if superlinear speedup occurs, then f_m takes a negative value.
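As an illustration, the short sketch below (with hypothetical speedup values, not taken from the experiments reported later in this chapter) derives the efficiency of Equation 2.3 and the serial fraction of Equation 2.8 for several processor counts:

    def efficiency(speedup, m):
        # e_m = s_m / m  (Equation 2.3)
        return speedup / m

    def serial_fraction(speedup, m):
        # Karp-Flatt metric, f_m = (1/s_m - 1/m) / (1 - 1/m)  (Equation 2.8)
        return (1.0 / speedup - 1.0 / m) / (1.0 - 1.0 / m)

    # Hypothetical measured speedups for m = 2, 4, 8, 16 processors
    for m, s in [(2, 1.9), (4, 3.6), (8, 6.9), (16, 12.5)]:
        print(f"m={m:2d}  s_m={s:5.2f}  e_m={efficiency(s, m):.3f}  f_m={serial_fraction(s, m):.3f}")

A roughly constant f_m across the rows would point to a well-parallelized algorithm, while a steadily growing f_m would reveal the granularity problem discussed above.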
2.3 HOW TO REPORT RESULTS

In general, the goal of a publication is to present a new approach or algorithm that works better, in some sense, than existing algorithms. This requires experimental tests to compare the new algorithm with respect to the rest. It is, in general, hard to make fair comparisons between algorithms. The reason is that we can infer different conclusions from the same results depending on the metrics we use. This is especially important for non-deterministic methods. In this section we address the main issues of experimental testing for reporting numerical effort results and the statistical analysis that must be performed to ensure that the conclusions are meaningful. The main steps are shown in Figure 2.2.
Fig. 2.2 Main steps for an experimental design: Design (define goals, choose instances, select factors), Measure, Report.
2.3.1 Experimentation

The first choice that a researcher must make is the problem domain and the problem instances used to test his/her algorithm. That decision depends on the goals of the experimentation. We can distinguish between two clearly different objectives: (1) optimization and (2) understanding of the algorithms. Optimizing is a commonly practiced sport consisting in designing a metaheuristic that beats others on a given problem or set of problems. This kind of experimental research finishes by establishing the superiority of a given heuristic over others. Researchers are not limited to establishing that one metaheuristic is better than another in some way, but should also investigate why. A very good study of this latter subject can be found, for instance, in [18].

One important decision is the set of instances used. The set of instances must be complex enough to obtain interesting results and must cover a sufficient variety of scenarios to allow the generalization of the conclusions. Problem generators [10] are especially good for a varied and wide analysis. In the next paragraphs we present the main classes of instances (a more comprehensive classification can be found in [12, 26]).

Real-World Instances. The instances taken from real applications represent a hard testbed for algorithms. Sadly, it is rarely possible to obtain more than a few real data sets for any computational experiment due to proprietary considerations. An alternative is to use random variants of real instances, i.e., the structure of the problem class is preserved, but details are randomly changed to produce new instances. Another approach is to use natural instances [12], which emerge from a specific real-life situation, such as timetabling in a school. This class of instances has the advantage of being freely available. In particular, academic instances must be checked against the existing literature so as not to reinvent the wheel and to avoid using straightforward benchmarks [33].

Standard Instances. This class includes the instances, benchmarks, and problem instance generators that, due to their wide use in experimentation, have become standard in the specialized literature. For example, Reinelt [27] offers the TSPLIB, a library of travelling salesman problem test instances, and Uzsoy et al. [32] offer something similar for job scheduling problems. Such libraries allow us to test specific issues of algorithms and also to compare our results against other methods. The OR-library [7] is a final excellent example of results from academics and companies for a large set of problem classes.

Random Instances. Finally, when none of the mentioned sources provides an adequate supply of tests, the remaining alternative is pure random generation. This method is the quickest way to obtain a diverse group of test instances but is also the most controversial.
After having selected a problem or a group of instances, we must design the computational experiments. Generally, the design starts by analyzing the effects of several factors on the algorithm performance. These factors include problem factors, such as problem size, number of constraints, etc., plus algorithmic factors, such as the parameters or components that are used. If the cost of the computer experiments is low, we can do a full factorial design, but in general this is not possible due to the large number of experiments, and we usually need to reduce the number of factors. There is a wide literature on fractional factorial design in statistics, which seeks to assess the same effects without running all the combinations (see, for example, [25]). The next steps in an experimental project are to execute the experiments, choose the measure of performance, and analyze the data. These steps are addressed in the next sections.
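As a tiny illustration of the size problem behind a full factorial design (the factor names and levels below are hypothetical, chosen only for the example), enumerating every combination of factor levels grows multiplicatively:

    from itertools import product

    # Hypothetical factors and levels for tuning a parallel GA
    factors = {
        "population_size": [400, 800],
        "mutation_prob": [0.1, 0.2],
        "processors": [1, 4, 8, 16],
    }
    full_design = list(product(*factors.values()))
    print(len(full_design), "configurations")   # 2 * 2 * 4 = 16 runs per instance

With a few more factors, or more levels per factor, the number of configurations quickly becomes too large, which is precisely why fractional factorial designs are used.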
2.3.2 Measuring Performance

Once we have chosen the instances that we are going to use and the factors that we are going to analyze, we must select the appropriate measures for the goal of our study. The objective of a metaheuristic is to find a good solution in a reasonable time. Therefore, the choice of performance measures for experiments with heuristics necessarily involves both solution quality and computational effort. Because of the stochastic nature of metaheuristics, a number of independent experiments need to be conducted to gain sufficient experimental data. The performance measures for these heuristics are based on some kind of statistics.

2.3.3 Quality of the Solutions

This is one of the most important issues when evaluating the performance of an algorithm. For instances where the optimal solution is known, one can easily define a measure: the success rate or number of hits. This measure can be defined as the percentage of runs terminating with success (% hits). But this metric cannot be used in all cases. For example, there are problems where the optimal solution is not known at all and a lower/upper bound is also unavailable. In other cases, although the optimum is known, its calculation takes too long, and the researcher settles for finding a good approximation in a reasonable time. It is a common practice in metaheuristics for the experiments to have a specific bound on the computational effort (a given number of search space points visited or a maximum execution time). In these cases, when the optimum is not known or not located, statistical metrics are also used. The most popular metrics include the mean and the median of the fitness (a measure of the quality of the solution) of the best solutions found over all executions. These values can be calculated for any problem. For each run of a given metaheuristic the best fitness can be defined as the fitness of the best solution at termination. For
parallel metaheuristics it is defined as the best solution found by the set of cooperating algorithms. In a problem where the optimum is known, nothing prevents us from using both the % hits and the median/mean of the final quality (or of the effort). Furthermore, all combinations of low/high values can occur for these measures. We can obtain a low number of hits and a high mean/median accuracy; this, for example, indicates a robust method that seldom achieves the optimal solution. The opposite combination is also possible but it is not common: in that case the algorithm achieves the optimum in several runs but the rest of the runs end with very bad fitness values. In practice, a simple comparison between two averages or medians might not give the same result as a comparison between two statistical distributions. In general, it is necessary to offer additional statistical values such as the variance and to perform a global statistical analysis to ensure that the conclusions are meaningful and not just random noise. We discuss this issue in Section 2.3.5.
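A minimal sketch of these quality measures (the helper name and the run data are hypothetical, assuming a maximization problem) could look as follows:

    from statistics import mean, median

    def quality_summary(best_fitness_per_run, optimum=None):
        """% hits (when the optimum is known) plus mean and median of the
        best fitness found in each independent run (maximization assumed)."""
        summary = {
            "mean_best": mean(best_fitness_per_run),
            "median_best": median(best_fitness_per_run),
        }
        if optimum is not None:
            hits = sum(1 for f in best_fitness_per_run if f >= optimum)
            summary["pct_hits"] = 100.0 * hits / len(best_fitness_per_run)
        return summary

    # Hypothetical best fitness of 10 independent runs, known optimum = 430
    runs = [430, 428, 430, 427, 430, 429, 430, 426, 430, 428]
    print(quality_summary(runs, optimum=430))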
2.3.4 Computational Effort

While heuristics that produce superior solutions are important, the speed of computation is a key factor. Within metaheuristics, the computational effort is typically measured by the number of evaluations and/or the execution time. In general, the number of evaluations is defined in terms of the number of points of the search space visited. Many researchers prefer the number of evaluations as a way to measure the computational effort since it eliminates the effects of particular implementations, software, and hardware, thus making comparisons independent from such details. But this measure can be misleading in several cases in the field of parallel methods. For example, if some evaluations take longer than others (as happens, for example, in parallel genetic programming [22]) or if an evaluation can be done very fast, then the number of evaluations does not reflect the algorithm's speed correctly. Also, the traditional goal of parallelism is not the reduction of the number of evaluations but the reduction of time. Therefore, a researcher must usually use both metrics to measure the computational effort. It is very typical to use the average evaluations/time to a solution, defined over those runs that end in a solution (with a predefined quality, maybe different from the optimal one). Sometimes the average evaluations/time to termination is used instead of the average evaluations/time to a solution of a given accuracy. This practice has clear disadvantages: for runs finding solutions of different accuracy, using the total execution time/effort to compare algorithms becomes hard to interpret from the point of view of parallelism. On the contrary, imposing a predefined time/effort and then comparing the solution quality of the algorithms represents an interesting and correct metric; what is incorrect is to also use the run times to compare algorithms, i.e., to measure speedup or efficiency (although works using this kind of metric can be found in the literature).
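The sketch below (hypothetical run records, not data from this chapter) computes the average evaluations and time to a solution of a given accuracy, restricted to the runs that actually reached it:

    from statistics import mean

    def effort_to_solution(runs, target):
        """Average evaluations and wall-clock time needed to reach a target
        fitness, computed only over the runs that reached it."""
        ok = [r for r in runs if r["best"] >= target]
        if not ok:
            return None  # no run reached the target; the effort is undefined
        return {
            "success_rate": len(ok) / len(runs),
            "avg_evals": mean(r["evals"] for r in ok),
            "avg_time": mean(r["time"] for r in ok),
        }

    # Hypothetical runs: best fitness reached, evaluations and seconds spent
    runs = [
        {"best": 430, "evals": 61000, "time": 2.1},
        {"best": 424, "evals": 90000, "time": 3.0},   # missed the target
        {"best": 430, "evals": 58500, "time": 1.9},
    ]
    print(effort_to_solution(runs, target=430))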
2.3.5 Statistical Analysis

In most papers, the objective is to prove that a particular heuristic outperforms another one. But, as we said before, the comparison between two average values might be different from the comparison between two distributions. Therefore, statistical methods should be employed wherever possible to indicate the strength of the relations between the different factors and the performance measures [17]. Usually, researchers use t-tests or an analysis of variance (ANOVA) to ensure the statistical significance of the results, i.e., to determine whether an observed effect is likely to be due to sampling errors or not. Several statistical methods and the conditions to apply them are shown in Figure 2.3.

Fig. 2.3 Application scheme of statistical methods. Normal variables: analysis of variance, Levene test for equality of variances, and post hoc mean comparison tests (Student-Newman-Keuls (SNK), Bonferroni, or Tamhane tests). Non-normal variables: median comparison with non-parametric tests (Mann-Whitney test for 2 independent data sets, Wilcoxon or sign tests for 2 dependent data sets, Friedman test for more than 2 dependent data sets).

The first step, in theory, should be to decide between non-parametric and parametric tests: when the data set is non-normally distributed and the number of experiments is below 30, we should use non-parametric methods; otherwise, parametric tests can be used. In fact, most researchers go for parametric tests from the beginning, which in most cases is a good idea if the number of independent experiments is high. Applying a Kolmogorov-Smirnov test is a powerful, accurate, and low cost method to check data normality. However, we must also say that most studies assume normality for data sets of more than 30 or 50 values (an assumption that is formally grounded). The Student t-test is widely used to compare the means of normal data. This method can only be used when there exist two populations or results (e.g., two sets of independent runs of two metaheuristics). In the case of many data sets we must use an ANOVA test, plus a later analysis to compare and sort the means. For a non-normal data set, a wide range of methods exists (see Figure 2.3). All the mentioned methods assume several hypotheses to obtain a correct conclusion, a linear relation between causes and effects being the most common one.

The t-test is based on Student's t distribution. It allows us to calculate the statistical significance of two sampled populations with a given confidence level, typically between 95% (p-value < 0.05) and 99% (p-value < 0.01). The underlying notion of ANOVA is to assume that every non-random variation in the experimental observations is due to differences in mean performance at alternative levels of the experimental factors. ANOVA proceeds by estimating each of the various means and partitioning the total sum of squares, or squared deviation from the sample global mean, into separate parts due to each experimental factor and to error. As general advice, you should include either one or both kinds of tests (if possible) in your scientific communications, since the whole metaheuristic community is slowly heading towards asking for the assessment of the claims made in the experimental phase of your work.

The two kinds of analyses, t-test and ANOVA, can only be applied if the source distribution is normal. In metaheuristics, the resulting distribution might not be normal. For this case, there is a theorem that can help us. The Central Limit Theorem states that the sum of many identically distributed random variables tends to a Gaussian, so the mean of any set of samples tends to a normal distribution. But in several cases the Central Limit Theorem is not useful. In these cases, there is a host of nonparametric techniques (for example, the sign test) that can and should be employed to sustain the author's arguments, even if the results show no statistical difference between the quality of the solutions produced by the metaheuristics [15].
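As an illustration of this decision scheme, the following sketch (assuming SciPy and NumPy are available; the function names are hypothetical) checks normality and then picks between a t-test and the Mann-Whitney test for two sets of independent runs:

    import numpy as np
    from scipy import stats

    def looks_normal(x, alpha=0.05):
        """Kolmogorov-Smirnov test of the standardized sample against N(0, 1)."""
        z = (x - x.mean()) / x.std(ddof=1)
        return stats.kstest(z, "norm").pvalue > alpha

    def compare_two_algorithms(sample_a, sample_b, alpha=0.05):
        """Follow the spirit of Figure 2.3: t-test if both samples look normal,
        non-parametric Mann-Whitney test otherwise."""
        a = np.asarray(sample_a, dtype=float)
        b = np.asarray(sample_b, dtype=float)
        if looks_normal(a, alpha) and looks_normal(b, alpha):
            name, result = "t-test", stats.ttest_ind(a, b)
        else:
            name, result = "Mann-Whitney", stats.mannwhitneyu(a, b)
        return name, result.pvalue, bool(result.pvalue < alpha)

For more than two data sets, the same scheme would call an ANOVA (scipy.stats.f_oneway) or the Friedman test instead of the two-sample tests.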
2.3.6 Reporting Results

The final step in an experimental design is to document the experimental details and findings and to communicate them to the international community. In the next paragraphs, we show the most important points that should be met.

Reproducibility. One necessary part of every presentation should be background on how the experiment was conducted. Reproducibility is an essential part of scientific research, and experimental results that cannot be independently verified are given little credence in the scientific community. Hence, the algorithm and its implementation should be described in sufficient detail to allow replication, including any parameters (probabilities, constants, ...), the problem encoding, the pseudo-random number generation, etc. The source and characteristics of the problem instances should also be documented. Besides, many computing environment factors that can influence the empirical performance of a method should also be documented: number, types, and
speeds of processors, size and configuration of memories, communication network, operating system, etc.

Presenting Results. A final important detail is the presentation of the results. The best way to support your conclusion is to display your data in such a way as to highlight the trends they exhibit, the distinctions made, and so forth. There are many good display techniques depending on the types of points one wants to make (for example, see [9] or [30]). Tables by themselves are usually a very inefficient way of showing results. Hence, if there is any graphical way to summarize the data and reveal a message or lesson, it is almost always to be preferred to a table alone. On the other hand, although pictures can often tell your story more quickly, they are usually a poor way of presenting the details of your results. Therefore, a scientific paper should contain both pictures and tables.
2.4 ILLUSTRATING THE INFLUENCE OF MEASURES
In this section we perform several experimental tests to show the importance of the reported performance measures in the conclusions. We use several parallel genetic algorithms and one parallel simulated annealing to solve the well-known MAXSAT problem. Before beginning with the examples, we give a brief description of the algorithms, the problem, and the configuration.

The Algorithms. Genetic algorithms (GAs) [14] make use of a randomly generated population of solutions. The initial population is iteratively enhanced through a natural evolution process. At each generation of this process, the whole population or a part of it is replaced by newly generated individuals (often the best ones). In the experiments, we use three different parallel models of GA: independent runs (IR), distributed GA (dGA), and cellular GA (cGA) (see Figure 2.4). In the first model, a pool of processors is used to speed up the execution of separate copies of a sequential algorithm, just because independent runs can be made more rapidly by using several processors than by using a single one. In dGAs [29], the population is structured into smaller subpopulations relatively isolated from the others. The key feature of this kind of algorithm is that individuals within a particular subpopulation (or island) can occasionally migrate to another one. The parallel cGA [28] paradigm normally deals with a single conceptual population, where each processor holds just a few individuals. The main characteristic of this model is the structuring of the population into neighborhood structures, and individuals may only interact with their neighbors. Also, we consider a local search method, simulated annealing (SA). SA [21] is a stochastic technique that can be seen as a hill-climber with an internal mechanism to escape from local optima. For this, moves that increase the energy function being minimized are accepted with a decreasing probability. In our parallel SA there exist multiple asynchronous component SAs. Each component SA
periodically exchanges the best solution found (cooperation phase) with its neighbor SA in the ring.

Fig. 2.4 GA models: (a) a cellular GA and (b) a distributed GA.
The Problem. The satisfiability (SAT) problem is commonly recognized as a fundamental problem in artificial intelligence applications, automated reasoning, mathematical logic, and related fields. The MAXSAT [13] is a variant of this general problem. Formally, this problem can be formulated as follows: given a formula f of the propositional calculus in conjunctive normal form (CNF) with m clauses and n variables, the goal is to determine whether or not there exists an assignment t of truth values to the variables such that all clauses are satisfied. In the experiments we use several instances generated by De Jong et al. [10]. These instances are composed of 100 variables and 430 clauses (f*(optimum) = 430).

Configuration. No special analysis has been made for determining the optimum parameter values for each algorithm. We use a simple representation for this problem: a binary string of length n (the number of variables) where each digit corresponds to a variable. A value of 1 means that its corresponding variable is true, and 0 defines the corresponding variable as false. In our GA methods, the whole population is composed of 800 individuals and each processor has a population of 800/m individuals, where m is the number of processors. All the GAs use the one-point crossover operator (with probability 0.7) and the bit-flip mutation operator (with probability 0.2). In the distributed GAs, the migration occurs in a unidirectional ring manner, sending one single randomly chosen individual to the neighbor subpopulation. The target population incorporates this individual only if it is better than its current worst solution. The migration step is performed every 20 iterations in every island in an asynchronous way. For the SA method, we use a proportional update of the temperature, and the cooling factor is set to 0.9. The cooperation phase is performed every 10,000 evaluations. All experiments are performed on Pentium 4 processors at 2.8 GHz linked by a Fast Ethernet communication network. We performed 100 independent runs of each experiment to ensure statistical significance.
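To make the encoding concrete, the following sketch (a generic MAXSAT evaluator, not the code used in the actual experiments, applied to a hypothetical toy instance rather than the De Jong et al. instances) counts the clauses satisfied by a binary string:

    import random

    def maxsat_fitness(assignment, clauses):
        """Number of satisfied clauses. Each clause is a list of literals:
        a positive integer i requires variable i to be true, and -i requires
        it to be false. Variables are 1-indexed; the assignment is a list of
        0/1 values, matching the binary-string representation described above."""
        satisfied = 0
        for clause in clauses:
            if any((assignment[abs(lit) - 1] == 1) == (lit > 0) for lit in clause):
                satisfied += 1
        return satisfied

    # Tiny hypothetical instance with 3 variables and 4 clauses (optimum = 4)
    clauses = [[1, -2], [2, 3], [-1, 3], [-3, 2]]
    assignment = [random.randint(0, 1) for _ in range(3)]
    print(assignment, "->", maxsat_fitness(assignment, clauses))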
All the algorithms have been implemented using the MALLBA Library [2]. This library is publicly available at http://neo.lcc.uma.es/mallba/easy-mallba/index.html.
In the next section we present several examples of utilization of the performance measures.

2.4.1 Example 1: On the Absence of Information
We begin our tests by showing the results of a SA with different numbers of processors when solving an instance of MAXSAT. The results can be seen in Table 2.2. The values shown are the number of executions that found the optimal value (% hit column), the fitness of the best solution found (best column), the average fitness (avg column), the number of evaluations (# evals column), and the running time (time column).
Table 2.2 Results of Example 1

Algorithm   %hit   best   avg     # evals   time
SA8         0%     426    418.3   -         -
SA16        0%     428    416.1   -         -
In this example, the algorithms did not find an optimal solution in any execution. Hence we cannot use the percentage of hits to compare them: we must turn to another metric to compare the quality of the solutions. We could use the best solution found, but that single value does not represent the actual behavior of the algorithms. In this case the best measure to compare the quality of results is the mean fitness. We could then conclude that the SA with 8 processors is better than the same one with 16 processors. But before stating such a conclusion we need to perform a statistical test to ensure the significance of this claim. To assess the statistical significance of the results we performed 100 independent runs (30 independent runs is usually considered a minimum in heuristics). Also, we computed a Student t-test analysis so that we would be able to distinguish meaningful differences in the average values. A significant p-value is assumed to be 0.05, indicating a 95% confidence level in the results. In this case, the resulting p-value is 0.143, i.e., there is no significant difference between the results. This comes as no surprise, since they are the same algorithm, and the behavior should be similar, while only the time should be affected by the change in the number of processors. Thus, had we elaborated on the superiority of one of them, we would have been mistaken. In this example, it is not fair to measure the computational effort, since the algorithms do not achieve an optimum. In the next example, we will use a different stopping criterion to allow us to compare the computational effort.
2.4.2 Example 2: Relaxing the Optimum Helps

Again, we show in Table 2.3 the results of the same SA as in the previous section, but for this test we consider as an optimum any solution with f(x) > 420 (the global optimum has a fitness value of 430).

Table 2.3 Results of Example 2

Algorithm   %hit   best   avg     # evals   time
SA8         60%    426    418.3   60154     2.01
SA16        58%    428    416.1   67123     1.06
For this example, we do not compare the quality of the solutions, since there is no statistical difference, and therefore we focus on the computational effort. The algorithm with 8 processors, SA8, performs a slightly smaller number of evaluations than SA16, but the difference is not significant (the p-value is larger than 0.05). On the other hand, the reduction in the execution time is significant (p-value = 1.4e-5). Thus, at first glance we could have stated that SA8 is numerically more efficient than SA16, but statistics tell us that no significant improvement can be claimed. However, we can state that SA16 is better than SA8 from a time efficiency point of view. These results are somewhat expected: the behavior (quality of solutions and number of evaluations) of both methods is similar, but the execution time is reduced when the number of processors is increased. An erroneous conclusion about SA8's numerical superiority can thus be avoided thanks to the utilization of such statistical tests.
2.4.3 Example 3: Clear Conclusions Do Exist

Now, let us compare two different algorithms: a parallel GA using the independent run model and the SA of the previous examples. Both the GA and the SA are distributed over 16 machines. As in the first example, neither of the methods achieves the optimum solution in any independent run. Therefore, we consider the same relaxed optimum as in the second example.

Table 2.4 Results of Example 3

Algorithm   %hit   best   avg     # evals   time
IR16        37%    424    408.3   85724     1.53
SA16        58%    428    416.1   67123     1.06
Table 2.4 shows a summary of the results for this experiment. From this table we can infer that SA16 is better in all aspects (solution quality, number of evaluations, and time) than IR16. And, this time, these conclusions are all supported by statistical tests, i.e., their p-values are all smaller than 0.05. Although SA16 is better than IR16, neither of them is adequate for this problem, since they are both quite far from the optimum.
2.4.4 Example 4: Meaningfulness Does Not Mean Clear Superiority

Now, we compare the results obtained with the same parallel GA (independent runs model) using two, four, and eight processors. The overall results of this example are shown in Table 2.5.
Table 2.5 Results of Example 4

Algorithm   %hit   best   avg     # evals   time    speedup
Seq.        60%    430    419.8   97671     19.12   -
IR2         41%    430    417.7   92133     9.46    1.98
IR4         20%    430    412.2   89730     5.17    3.43
IR8         7%     430    410.5   91264     2.49    7.61
The statistical tests are always positive, i.e., all results are significantly different from one another. We can then conclude that the IR paradigm allows us to reduce the search time and obtains a very good, nearly linear speedup, but its results are worse than those of the serial algorithm, since its percentage of hits is lower. This might be surprising, since the algorithm is the same in all cases and the expected behavior should be similar. The reason is that, as we increase the number of processors, the population size per processor decreases and the algorithm is not able to maintain enough diversity to find the global solution.
2.4.5 Example 5: Speedup, Do Not Compare Apples Against Oranges

In this case, we show an example of speedup. In Table 2.6 we show the results for a sequential GA against a distributed cellular GA (cGA) with different numbers of processors. In this example we focus on the time column. The ANOVA test for this column is always significant (p-value = 0.0092).

Table 2.6  Results of Example 5

Algorithm   %hit   best   avg     # evals   time
Seq.        60%    430    421.4   97671     19.12
cGA2        85%    430    427.4   92286     10.40
cGA4        83%    430    426.7   94187     5.79
cGA8        83%    430    427.1   92488     2.94
cGA16       84%    430    427.0   91280     1.64
As we do not know the best algorithm for this MAXSAT instance, we cannot use the strong speedup (see Table 2.1). Hence, we must use the weak definition of speedup. On the data of Table 2.6, we can measure the speedup with respect to the canonical serial version (panmixia columns of Table 2.7). But it is not fair to compute speedup against a sequential GA, since we would be comparing different algorithms (the parallel code is that of a cGA). Hence, we turn to comparing the same algorithm (the cGA) both in sequential and in parallel (cGAn on 1 versus n processors). This speedup is known
as orthodox speedup. The speedup, the efficiency, and the serial fraction using the orthodox definition are shown in the orthodox columns of Table 2.7. The orthodox values are slightly better than the panmictic ones, but the trend in both cases is similar (in other cases the trend could even differ); the speedup is quite high in this case, but it is always sublinear, and it slowly moves away from linear speedup as the number of CPUs increases. That is, when we increase the number of CPUs we have a moderate loss of efficiency. The serial fraction is quite stable, as one can expect in a well-parallelized algorithm, although we can notice a slight reduction of this value as the number of CPUs increases, indicating that the granularity of the parallel tasks is quite fine and that the loss of efficiency is mainly due to the limited parallelism of the program itself.
Table 2.7  Speedup and Efficiency

                      panmixia                                orthodox
Algorithm   speedup   efficiency   serial fract.   speedup   efficiency   serial fract.
cGA2        1.83      0.915        0.093           1.91      0.955        0.047
cGA4        3.30      0.825        0.070           3.43      0.857        0.055
cGA8        6.50      0.812        0.032           6.77      0.846        0.026
cGA16       11.65     0.728        0.025           12.01     0.751        0.022
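All three metrics in Table 2.7 follow directly from the measured run times. The sketch below reproduces the panmixia columns (up to rounding) from the times reported in Table 2.6, using the weak speedup S = T1/Tp, the efficiency E = S/p, and the serial fraction f = (1/S - 1/p)/(1 - 1/p) as defined by Karp and Flatt; it is only an illustration of the arithmetic, not part of any experimental framework.

#include <stdio.h>

int main(void) {
    /* Run times taken from Table 2.6: sequential GA and cGA on 2, 4, 8, 16 CPUs. */
    double t_seq   = 19.12;
    double t_par[] = {10.40, 5.79, 2.94, 1.64};
    int    cpus[]  = {2, 4, 8, 16};

    printf("CPUs  speedup  efficiency  serial fraction\n");
    for (int i = 0; i < 4; i++) {
        int p    = cpus[i];
        double s = t_seq / t_par[i];              /* weak speedup S = T1 / Tp       */
        double e = s / p;                         /* efficiency  E = S / p          */
        double f = (1.0 / s - 1.0 / p)
                   / (1.0 - 1.0 / p);             /* Karp-Flatt serial fraction     */
        printf("%4d  %7.2f  %10.3f  %15.3f\n", p, s, e, f);
    }
    return 0;
}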
2.4.6 Example 6: Predefined Effort Could Hinder Clear Conclusions
In the previous examples the stopping criterion was based on the quality of the final solution. In this experiment, the termination condition is based on a predefined effort (60,000 evaluations). Previously, we used the fitness of the best solution and the average fitness to measure the quality of the solutions. We now add further metrics: the median of the best fitness and the average of the final population mean fitness of each independent run (mm). In Table 2.8 we list all these metrics for a sequential GA and a distributed GA using four processors (dGA4).

Table 2.8  Results of Example 6

Algorithm   %hit   best   avg     median   mm
Seq.        0%     418    406.4   401      385.8
dGA4        0%     410    402.3   405      379.1
Using a predefined effort as a stopping criterion is not always a good idea in parallel metaheuristics if one wishes to measure speedup: in this case, for example, the algorithms could not find an optimal solution in any execution. If we analyze the best solution found or the two averages (average of the best fitness, avg column, and average of the mean fitness, mm column), we can conclude that the sequential version is more accurate than the parallel GA. But the median value of the dGA is larger than that of the serial algorithm, indicating that the sequential algorithm obtained several very good solutions while the rest were of moderate quality, whereas the parallel GA showed a more stable behavior. With this stopping criterion, it is hard to obtain a
clear conclusion if the algorithm is not stable. In fact, a normal distribution of the resulting fitness is hardly ever found in many practical applications, a disadvantage for simplistic statistical claims. Also, we can notice that the avg data are always better than the mm values. This is common sense, since the final best fitness is always larger than the final mean fitness (or equal, when all the individuals converge). Finally, a statistical analysis verified the significance of these data.

2.5 CONCLUSIONS

This chapter has considered the issue of reporting experimental research with parallel metaheuristics. Since this is a difficult task, the main issues of an experimental design have been highlighted. We do not enter the complex and deep field of pure statistics in this chapter (for space reasons), but just present some important ideas to guide researchers in their work. As could be expected, we have focused on parallel performance metrics that allow comparing parallel approaches against other techniques in the literature. Besides, we have shown the importance of statistical analysis to support our conclusions, also in the parallel metaheuristics field. Finally, we have performed several experimental tests to illustrate the influence and use of the many metrics described in the chapter.

Acknowledgments

The authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
3

New Technologies in Parallelism

ENRIQUE ALBA, ANTONIO J. NEBRO
Universidad de Málaga, Spain
3.1 INTRODUCTION

Parallel computing is a continuously evolving discipline, and this evolution involves both hardware and software. In this chapter, we provide a review of software issues, focusing mainly on tools, languages, and systems, in order to find out how they can match the requirements of implementing parallel optimization heuristics. Nevertheless, it is first necessary to understand some hardware concepts related to parallel computer architectures (Section 3.2). Then, in Section 3.3, we discuss issues related to parallel programming in shared-memory and distributed-memory systems. An analysis of tools for programming both kinds of systems is included in Sections 3.4 and 3.5. Finally, Section 3.6 discusses the global features of the presented tools, and Section 3.7 summarizes the contents of the chapter.
3.2 PARALLEL COMPUTER ARCHITECTURES: AN OVERVIEW

Various classification schemes for parallel computers have been defined over the years. The most commonly used taxonomy is the proposal of Flynn made in 1972, but nowadays it is not accurate enough to describe all the possible parallel architectures. Consequently, other classifications have been presented, although many of them are extensions of Flynn's classification. The model of Flynn is based on the notion of instruction and data streams. There are four possible combinations, conventionally called SISD (Single Instruction, Single Data stream), SIMD (Single Instruction, Multiple Data stream), MISD (Multiple Instruction, Single Data stream), and MIMD (Multiple Instruction, Multiple Data stream). This scheme is illustrated in Figure 3.1. We describe each model next:
• The SISD architecture corresponds to the classical mono-processor personal computer or workstation.
Fig. 3.1  Flynn's taxonomy (single/multiple instruction streams versus single/multiple data streams: SISD, SIMD, MISD, MIMD).
• In the SIMD architecture the same instruction is executed by all the processors, at each computing step or clock cycle, over different data. A typical SIMD computer is composed of hundreds or even thousands of simple processors, each with a small local memory. This kind of machine exploits the spatial parallelism that may be present in a given problem which uses large and regular data structures. If the problem domain is spatially or temporally irregular, many processors must remain idle at a given time step, thus producing a loss in the amount of parallelism that can be exploited. This architecture was promising at the beginning of the 1990s, but the complexity and often inflexibility of SIMD machines, strongly dependent on synchronization requirements, have restricted their use mostly to special-purpose applications.

• In the MISD class, the machines execute multiple instructions on the same piece of data. Computers of this type are hard to find in practice, although some people regard pipelined machines as MISD.

• In the MIMD class, different data and programs can be loaded into different processors, and each processor can execute different instructions at any given point of time. This class is in general the most useful one, and most parallel computers belong to it.
Although Flynn's taxonomy has become a standard model, it is a coarse-grain classification. For example, most of today's processors are parallel in the way in which they execute instructions, but they are considered as SISD. What is more important, this taxonomy does not consider, within the MIMD class, whether the system memory spans a single address space or is distributed among several modules. A more exhaustive taxonomy that extends Flynn's classification is depicted in Figure 3.2. Here, MIMD systems are subdivided into multiprocessors, in which all the processors have direct access to all the memory, and multicomputers (also called distributed systems), where each processor has its own local memory module and accessing remote memory modules requires the use of a message-passing mechanism. Multiprocessors are classified depending on whether the access time to every memory address is constant (uniform memory access, or UMA) or not (non-uniform
Fig. 3.2  Extension of Flynn's taxonomy (MIMD systems divided into shared-memory multiprocessors and message-passing multicomputers).
memory access, or NUMA). In the former, the interconnection among processors can be a bus or a switch; in the latter, a distinction is made depending on whether the caches are kept coherent (coherent-cache, or CC-NUMA) or not (NC-NUMA). Although multiprocessors are widely used, they have the drawback of being limited in the maximum number of processors they can contain, and their price tends to increase exponentially with the number of processors. Distributed systems are composed of collections of interconnected computers, each having its own processor, memory, and network adaptor. Compared to multiprocessors, they have a number of significant advantages, namely: they are easier to build and extend, they have a better price/performance ratio, more scalability, and more flexibility, and they are the only choice for executing inherently distributed applications [25]. A typical distributed system is a cluster of workstations (COW), composed of PCs or workstations interconnected by a communication network such as Fast Ethernet (general purpose, low cost) or Myrinet (high performance, medium cost). The number of computers in a COW is limited to a few hundred because of the limits imposed by the network technology. The systems belonging to the MPP (Massively Parallel Processor) model are composed of thousands of processors. If the MPP system is tightly coupled (i.e., it is a single computer), then we have MIMD systems based on topologies such as the hypercube, fat tree, or torus. On the other hand, an MPP can be composed of machines belonging to multiple organizations and administrative domains, leading to the so-called grid systems [4, 9], also known as metacomputers, which are built around the infrastructure provided by the Internet.
3.3 SHARED-MEMORY AND DISTRIBUTED-MEMORY PROGRAMMING
Independently of the kind of parallel computer we consider, all of them share a common feature: they are difficult to program. Compared to sequential programming, which is based on a single process with a unique flow of control, the programs of parallel
computers adhere to concurrent programming. A concurrent program contains two or more processes that work together to perform a task by communicating and synchronizing among them [2], so the programmer has to deal with well-known issues such as mutual exclusion, condition synchronization, or deadlock. Although concurrent programming includes both the programming of multiprocessors and of distributed systems, the former adheres to shared-memory programming, while the latter is known as distributed programming.

Shared-memory programming is based on the fact that the whole memory is directly accessible by all the processes, which use read and write operations to access it; thus, this model is a natural extension of sequential programming. Furthermore, its foundations were established in the 1960s and 1970s, so they are well known. Distributed programming is based on message-passing mechanisms, which introduce a number of difficulties not encountered in shared-memory programming. To start with, there exist several ways to exchange messages between processes (synchronous/asynchronous, buffered/unbuffered, reliable/not reliable) [25], each of them with distinct semantics; the programmer also has to solve issues related to heterogeneity (the sender and receiver machines may have different architectures or operating systems), load balancing (to keep the processors as busy as possible), security, etc.

Because of the advantages that shared-memory programming offers compared to distributed programming, in the last decade a large amount of research was carried out to offer a shared-memory abstraction over distributed systems, leading to the so-called distributed shared-memory (DSM) systems. A survey can be found in [14]. In fact, some multiprocessors belonging to the NUMA category (Figure 3.2) are distributed systems implementing DSM in hardware. DSM can also be implemented at the software level, including extensions of operating systems, libraries to be used by sequential languages, new languages, and extensions of existing languages.

Given that this chapter is devoted to new technologies, we summarize next a representative set of tools, languages, and systems that can be considered for implementing parallel heuristic algorithms. However, for completeness, we also discuss tools that are not strictly new but can still be considered for our applications. We analyze some of these tools in greater detail in Sections 3.4 and 3.5.

The easiest way to deal with parallelism in the shared-memory model is to use a parallelizing compiler that automatically converts a sequential program into a parallel one. However, most of these compilers are oriented to programs written in Fortran, although there are some tools for C and C++, such as SUIF and the Portland Group compilers (www.pgroup.com). The second approach is to use operating system resources such as processes, threads, semaphores, or even files. A third group is composed of parallel libraries that can be used from sequential languages, such as OpenMP (www.openmp.org), which provides bindings for Fortran, C, and C++. Finally, we can use parallel languages. Here we can find a plethora of programming models (shared variables, coordination, data parallelism, functional) and languages (Ada, Cilk, HPF, NESL), although it is worth mentioning that modern, general-purpose languages such as Java and C# are prepared to deal with shared-
memory (and distributed-memory) parallelism, normally based on the use of thread libraries. Concerning distributed-memory programming, we can also consider the use of operating system resources (sockets), parallel libraries (PVM, MPI), and parallel languages (again Java and C#, but also Ada, Linda, or Orca). However, while that classification is enough to characterize shared-memory programming tools, it is insufficient to properly identify all the possibilities we can find in distributed programming. A more accurate taxonomy is described next:
• Message-passing libraries. These libraries are mainly indicated for developing parallel applications in COWs. We can include here the sockets offered by the operating systems (Unix and Windows) and the sockets provided by Java and Microsoft's .NET platform (which supports the languages C#, Visual Basic, or C++). Nevertheless, the socket API is a low level interface, while libraries such as PVM and MPI offer a rich set of primitives for interprocess communication and synchronization.

• Object-based systems. If our application requirements involve coping with heterogeneous hardware and/or software, then object-based systems are to be considered. They are an evolution of remote procedure call (RPC) systems, and they are based on the idea of having remote objects that can be accessed by clients by invoking the methods they define in their interfaces. Examples of these systems are CORBA and Java RMI. To manage heterogeneity, object-based systems are structured in a three-layer scheme, where between the clients and objects (the higher level) and the operating systems and hardware (the lower level) there is an intermediate level called middleware, which hides all the details of the lower level from the applications.

• Grid computing systems. The Internet provides an infrastructure that allows interconnecting computers around the world. Thus, it is possible to envision a metacomputer composed of thousands of machines belonging to organizations residing in different countries. As a result, a new discipline known as grid computing has emerged in recent years, and there is currently a large number of projects and systems focused on this area [4, 9]. The computing power that can be obtained by a grid computing system allows attacking problems that could not be solved with COWs [3], but the development of grid applications is difficult. Among the reasons, we can mention [7] large scalability, heterogeneity at multiple levels, unpredictable structure of the system (which is constructed dynamically from available resources), dynamic and unpredictable behavior (concerning network failures, as well as high latency and low bandwidth of communications), and multiple administrative domains (each one having its own administrative policies, which must be preserved by the grid system). The de facto standard grid system is Globus [7], but there are many other systems, such as Condor, Sun Grid Engine (SGE), Legion, NetSolve, etc. On the other hand, some authors argue that the features of Java make it a candidate language for grid computing [11].
• Web-based computing. Like grid computing, another approach based on the Internet has appeared: Web services. These provide a method for applications to communicate with each other over the Internet. However, compared to grid computing systems, Web services are built on existing Web protocols and open XML standards [6] regulated by the W3C (www.w3.org). Thus, communication uses the Simple Object Access Protocol (SOAP), Web services are described with the Web Services Description Language (WSDL), and the Universal Description, Discovery, and Integration (UDDI) directory allows registering Web service descriptions. Another interesting aspect of Web services is that the hosting environments and runtime systems do not depend on a specific platform, such as Windows 2000/XP, some flavor of Unix, Java 2 Enterprise Edition (J2EE), or Microsoft .NET. Currently there is a trend to combine Web services and the Grid. An example is the Open Grid Services Architecture (OGSA) [10], which is presented as an evolution of Globus towards a Grid system architecture based on an integration of Grid and Web services concepts and technologies.
3.4 SHARED-MEMORY TOOLS

In this section we describe a set of tools that are worth taking into account when facing the construction of parallel algorithms on shared-memory systems. The tools are summarized in Table 3.1.

Table 3.1  Tools for shared-memory programming

System         Category                    Language Bindings
Pthreads       Operating system resource   C
Java threads   Programming language        Java
OpenMP         Compiler directives         Fortran, C, C++
3.4.1 Pthreads
The concept of a thread as an independent flow of control inside a process has been adopted by virtually all modern operating systems since the 1990s, leading to the so-called multithreaded processes and a discipline known as multithreaded programming. Although it has always been possible to write parallel programs using processes and other resources provided by the operating system (semaphores, shared memory, files), multithreaded processes are themselves concurrent programs, which brings a number of advantages over multiple processes: faster context switching between threads, lower resource usage, simultaneous computation/communication, and the fact that some parallel applications fit well in the thread model.
In the last decade several Unix-based operating systems began to include their own proprietary thread libraries (e.g., Solaris), leading to nonportable multithreaded code. In this context, a standardized library for multithreaded programming, known as Pthreads (or POSIX threads), was defined in the mid-1990s as an effort to provide a unified set of C library routines in order to make multithreaded programs portable. The Pthreads library offers functions for thread management (thread creation, scheduling, and destruction) and synchronization (mutexes, synchronization variables, semaphores, and read-write locks), and it is available mainly on various variants of the UNIX operating system. The operating systems belonging to Microsoft's Windows family, Windows 2000 and Windows XP, also provide multithreaded processes. Although the thread functions are syntactically different from the Pthreads interface, the functionality regarding thread management and synchronization is equivalent to some extent.
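As a small, hedged illustration of the Pthreads functions just mentioned, the sketch below creates a few worker threads that evaluate disjoint slices of a population and protect a shared counter with a mutex; the evaluate() function, the population size, and the number of threads are placeholders chosen only for the example.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define POP_SIZE    400

static double fitness[POP_SIZE];
static int evaluated = 0;                          /* shared counter          */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder objective function. */
static double evaluate(int i) { return (double)(i % 7); }

static void *worker(void *arg) {
    int id = *(int *)arg;
    int chunk = POP_SIZE / NUM_THREADS;
    for (int i = id * chunk; i < (id + 1) * chunk; i++) {
        fitness[i] = evaluate(i);
        pthread_mutex_lock(&lock);                 /* mutual exclusion        */
        evaluated++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    int ids[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++) {
        ids[t] = t;
        pthread_create(&tid[t], NULL, worker, &ids[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);                /* wait for all workers    */
    printf("evaluated %d individuals\n", evaluated);
    return 0;
}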
3.4.2 Java Threads

The benefits of multithreaded programming have influenced not only the classical view of single-threaded processes in operating systems but also modern programming languages. An example of such a language is Java (java.sun.com). Threads are programmed in Java by extending the Thread class or by implementing the Runnable interface. Thread creation involves creating an instance of these classes and invoking on the new object a method called start(). Synchronization is carried out by providing synchronized methods that ensure mutual exclusion, and condition synchronization is supported by means of a number of methods (wait, notify, and notifyAll). Compared to Pthreads, Java threads offer the advantages of the portability inherent in Java programs and a multithreaded programming model adapted to the object-oriented features of Java.
3.4.3 OpenMP
OpenMP is a set of compiler directives and library routines that are used to express shared-memory parallelism (www.openmp.org). The OpenMP Application Program Interface (API) was developed by a group representing the major vendors of high-performance computing hardware and software. Fortran and C/C++ interfaces have been designed, with some efforts to standardize them. The majority of the OpenMP interface is a set of compiler directives. The programmer adds these to a sequential program to tell the compiler what parts of the program must be executed concurrently and to specify synchronization points. The directives can be added incrementally, so OpenMP provides a path for parallelizing existing software.
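The incremental style described above can be illustrated with a single directive added to an otherwise sequential evaluation loop. This is only a sketch: evaluate() and the sizes are placeholders, and the max reduction used here requires a compiler supporting OpenMP 3.1 or later.

#include <omp.h>
#include <stdio.h>

#define POP_SIZE 400

/* Placeholder objective function. */
static double evaluate(int i) { return (double)(i % 7); }

int main(void) {
    double fitness[POP_SIZE];
    double best = -1.0;

    /* The directive below is the only change with respect to the sequential loop:
       iterations are split among threads, and the reduction clause combines the
       per-thread maxima into a single value. */
    #pragma omp parallel for reduction(max:best)
    for (int i = 0; i < POP_SIZE; i++) {
        fitness[i] = evaluate(i);
        if (fitness[i] > best) best = fitness[i];
    }

    printf("best fitness = %f (threads available: %d)\n", best, omp_get_max_threads());
    return 0;
}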
3.5 DISTRIBUTED-MEMORY TOOLS

As in the previous section, we describe here a set of languages and systems of interest for the construction of parallel algorithms on distributed-memory systems. These systems are summarized in Table 3.2.

Table 3.2  Tools for distributed-memory programming

Message-Passing Library   Object-Based System   Internet Computing System
Sockets                   Java RMI              Globus
PVM                       CORBA                 Condor
MPI
3.5.1 Sockets

The BSD socket interface (see, e.g., [5]) is a de facto standard message-passing system. A set of data structures and C functions allows the programmer to establish full-duplex channels between two computers for implementing general-purpose distributed applications. If the chosen underlying protocol is TCP, sockets offer a connection-oriented service, ensuring reliable communications and guaranteeing that messages are received in the order they were issued. Also, a connectionless service over UDP is available for applications not needing the facilities of TCP. Parallel programs can be developed with the socket API, with the added benefits of wide applicability, high standardization, and complete control over the communication primitives. Despite these advantages, programming with sockets has many drawbacks for applications involving a large number of computers, with different operating systems, and belonging to networks with different owners. First, programming with sockets is error-prone and requires understanding low level characteristics of the network. Also, it does not include any process management, fault tolerance, task migration, security options, or other attributes usually requested in modern parallel applications. As in the case of threads, modern languages such as Java incorporate a socket library. Thus, portability is enhanced, and the socket functions are simplified compared to the C socket interface. Furthermore, Java allows sending objects via sockets by using a mechanism known as serialization. This is a powerful mechanism that can be used to send complex data structures, such as lists and trees of objects. The price to pay is an overhead that may not be acceptable for communication-intensive programs.
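The following sketch gives a feel for the low level nature of the socket API: a minimal TCP client, such as a worker process reporting its best solution to a farmer. The address, port, and message format are hypothetical, and error handling is reduced to the bare minimum.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    /* Address and port of a hypothetical farmer process (placeholders). */
    const char *host = "127.0.0.1";
    int port = 5000;

    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* TCP: connection-oriented */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Send a best-solution report and wait for an acknowledgment. */
    const char *msg = "BEST 428";
    send(fd, msg, strlen(msg), 0);

    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);
    if (n > 0) { buf[n] = '\0'; printf("reply: %s\n", buf); }

    close(fd);
    return 0;
}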
3.5.2 PVM
The Parallel Virtual Machine (PVM) [23] is a software system that permits the utilization of a heterogeneous network of parallel and serial computers as a unified, general, and flexible concurrent computational resource. The PVM system supports the message-passing paradigm, with implementations for distributed-memory, shared-memory, and hybrid computers. These features allow applications to use the most appropriate computing model for the entire application or for individual subalgorithms. The PVM system is composed of a suite of user interface primitives and supporting software that together enable concurrent computing on loosely coupled networks of processing elements. PVM may be implemented on heterogeneous architectures and networks. These computing elements are accessed by applications via a standard interface that supports common concurrent processing paradigms in the form of well-defined primitives embedded in procedural languages such as C and Fortran. The advantages of PVM are its wide acceptance and its heterogeneous computing facilities, including fault tolerance and interoperability [24]. Despite these advantages, the PVM standard has recently ceased to be maintained (no further releases); also, many PVM users are shifting to MPI.
3.5.3 MPI
The Message-Passing Interface (MPI) is a library of message-passing routines [20]. When MPI is used, the processes in a distributed program are written in a sequential language (C, Fortran), and they communicate and synchronize by calling functions in the MPI library. The MPI API was defined in the mid-1990s by a large group of people from academia, government, and industry. The interface reflects people's experiences with earlier message-passing libraries, such as PVM. The goal of the group was to develop a single library that could be implemented efficiently on a variety of multiprocessor machines. MPI has now become a de facto standard, and several implementations exist, such as MPICH (www.mcs.anl.gov/mpi/mpich) and LAM/MPI (www.mpi.nd.edu/lam). The motivation for developing MPI was that each massively parallel processor (MPP) vendor was creating its own proprietary message-passing API. In this scenario it was not possible to write a portable parallel application. MPI is intended to be a standard message-passing specification that each MPP vendor would implement on its system. The MPP vendors need to be able to deliver high performance, and this became the focus of the MPI design. Given this design focus, MPI is expected to always be faster than PVM on MPP hosts [24]. MPI programs follow an SPMD style (single program, multiple data); that is, every processor executes a copy of the same program. Each instance of the program can determine its own identity and hence take different actions. The instances interact by calling MPI library functions. MPI provides a rich set of 128 functions for
process-to-process communication, group communication, setting up and managing communication groups, and interacting with the environment. The first standard, named MPI-1, had the inconvenience that applications were not portable across a network of workstations because there was no standard method to start MPI tasks on separate hosts. Different MPI implementations used different methods. In 1995 the MPI committee began meeting to design the MPI-2 specification to correct this problem and to add additional communication functions to MPI, including language bindings for C++. The MPI-2 specification was finished in June 1997. The MPI-2 document adds 200 functions to the 128 original functions specified in MPI-1. All the mentioned advantages have made MPI the standard for future applications using message-passing services. The drawbacks related to dynamic process creation and interoperability are being successfully solved although, to date, full implementations of MPI-2 are not widely available. It is worth mentioning the many nonstandard extensions of MPI that have been developed to allow running MPI programs in grid computing systems. An example is MPICH-G2, a complete implementation of the MPI-1 standard that extends the MPICH implementation of MPI for grid execution on Globus [15].
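A brief sketch of the SPMD style mentioned above: every process runs the same program, computes on its own slice of candidate solutions, and a collective reduction gathers the global best at the root. The evaluate() function and the workload split are illustrative assumptions, not part of the MPI standard.

#include <mpi.h>
#include <stdio.h>

#define CANDIDATES 1000

/* Placeholder objective function. */
static double evaluate(int i) { return (double)(i % 97); }

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* identity of this instance   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);        /* number of running instances */

    /* Each process evaluates a disjoint slice of the candidate solutions. */
    double local_best = -1.0;
    for (int i = rank; i < CANDIDATES; i += size) {
        double f = evaluate(i);
        if (f > local_best) local_best = f;
    }

    /* Collective operation: combine the local maxima into a global maximum. */
    double global_best;
    MPI_Reduce(&local_best, &global_best, 1, MPI_DOUBLE,
               MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global best fitness = %f\n", global_best);

    MPI_Finalize();
    return 0;
}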
3.5.4 Java RMI
The implementation of remote procedure calls (RPC) in Java is called Java RMI. Remote Method Invocation allows an application running in one Java virtual machine (JVM) to invoke methods of objects residing in different JVMs, with the added advantages of being object-oriented, platform independent, and providing distributed garbage collection. The features of Java RMI come at a cost: one remote method invocation can take several milliseconds, depending on the number and types of the arguments [11]. This latency is too high if we are dealing with fine-grained, communication-intensive applications running on COWs. However, some projects are trying to improve the performance of RMI. For example, in the Manta project [19], Java code is compiled to native code and a runtime system written in C is used, leading to a significant reduction of the communication time.
3.5.5 CORBA

Distributed systems are typically heterogeneous, and this heterogeneity refers to computer architectures, operating systems, and programming languages. The origin of the heterogeneity can be found in several issues, including the continuous advances in both hardware and software (the new technologies must coexist with the old ones), applications that are inherently heterogeneous, and the need to use legacy systems. In this context, at the beginning of the 1990s CORBA (Common Object Request Broker Architecture) was defined by the Object Management Group (www.omg.org), with the goals of defining models and abstractions that
were platform independent and of hiding as much as possible the complexity of the underlying systems while trying to keep high performance. CORBA is a middleware, providing a distributed-object-based platform to develop distributed applications in which components can be written in different languages and can be executed on different machines. Bindings are defined for several languages, including C, C++, Java, Smalltalk, Cobol, and Ada. The architecture offers a rich set of services to applications, including naming, notification, concurrency, security, and transactions. The first version of CORBA, appearing in 1991, did not permit interoperability among different implementations; this was achieved with the definition of the Internet Inter-ORB Protocol (IIOP) in CORBA 2, in 1996. Currently there are several CORBA implementations, including both proprietary and free systems. Examples of the former are Orbix and Visibroker (offering bindings to C++ and Java), while TAO (C++) and JacORB (Java) are examples of the latter.
3.5.6 Globus
The Globus Toolkit is a community-based, open-architecture, open-source set of services and software libraries that support grids and grid applications [8]. Globus has become a de facto standard, and it provides support for security, information discovery, resource management, data management, communication, fault detection, and portability. It is constructed as a layered architecture, in which high level global services are built upon essential low level core local services. The components of Globus can be used either independently or together to develop applications. For each component of the toolkit, both protocols and application programming interfaces (APIs) are defined. Furthermore, it provides open-source reference implementations in C (for the client-side APIs). Some of these components are the following:
• Grid Security Infrastructure (GSI).

• GridFTP.

• Globus Resource Allocation Manager (GRAM).

• Metacomputing Directory Service (MDS-2).

• Global Access to Secondary Storage (GASS).

• Data catalogue and replica management.

• Advanced Resource Reservation and Allocation (GARA).
The latest version of the Globus Toolkit (GT3) is based on a core infrastructure component compliant with the Open Grid Services Architecture (OGSA). This architecture defines the concept of a Grid service as a Web service that provides a set of well-defined interfaces and that follows specific conventions.
3.5.7 Condor

Condor (www.cs.wisc.edu/condor) is a resource management system (RMS) for grid computing. An RMS is responsible for detecting available processors, matching job requests to available processors, executing jobs on available machines, and determining when a processor leaves the computation. Compared to other RMSs [16], Condor is an open-source project based on three features: remote system calls, classified advertisements, and checkpointing [17]. These features are implemented without modifying the underlying UNIX kernel. With Condor, a set of machines can be grouped into a Condor pool, which is managed according to an opportunistic computing strategy. For example, Condor can be configured to run jobs on workstations when they are idle, thus using CPU time which otherwise would be wasted. If a job is running on a machine when its owner returns, the job can be stopped and resumed later, or it can be migrated to another available idle machine. Several Condor pools can be merged by using two mechanisms, flocking and Condor-G. The latter is based on combining Condor with Globus, and it allows utilizing large collections of resources that span multiple domains. Condor is easy to install and manage, and to take advantage of Condor's features we do not need to modify our C, C++, or Fortran programs; only relinking is required. Furthermore, once we have a binary for each architecture type in our Condor pool, Condor automatically transfers the executables to the target machines. If our processes need to communicate, Condor includes versions of PVM and MPI, although using shared files is also possible.
3.6 WHICH OF THEM?

We have made a somewhat detailed description of many systems, with the aim of giving an overview of those that can be considered the most interesting for implementing parallel heuristic algorithms. Choosing one of them depends basically on our programming skills and preferences and on the application requirements. In the context of shared-memory programming, a system programmer will be comfortable using C or C++ with Pthreads (or even processes instead), a Java programmer will choose Java threads, and a Fortran programmer will probably prefer using OpenMP. On the other hand, if we require that our application be portable to both UNIX and Windows operating systems, then Java is an option to be considered. However, if we prefer using C or C++ instead of Java, there are libraries that offer wrappers to Pthreads and Windows threads, thus ensuring portability. An example of such libraries is ACE (Adaptive Communication Environment, www.cs.wustl.edu/~schmidt/ACE.html). Besides, using OpenMP is adequate if our application follows a data-parallel model, but it can be difficult to use in task-parallel applications.
We analyze now the tools for distributed programming. If we intend to develop a distributed algorithm to be executed in a cluster of workstations, sockets can be an adequate tool if we have experience using them and the complexity of the communications is low; otherwise, using PVM or MPI can be preferable. If the machines in the cluster are heterogeneous, we can avoid some problems of sockets by again using libraries such as ACE or by using Java sockets. If our favorite language is Java and the latency of communications is not critical, Java RMI offers a high level distributed object-oriented programming model, but we must be aware that this model does not fit well with one-to-many communication or barrier synchronization; in this case, we should look again at MPI. The programming model of CORBA is similar to that of Java RMI, but its learning curve is steeper. CORBA is the choice if we need to combine program modules written in different programming languages. A common usage of CORBA is to implement computation modules in C++ (more efficiency) while the graphical user interface is written in Java.

Finally, we discuss the adequacy of grid computing systems for developing parallel heuristic algorithms. The two analyzed systems, Condor and Globus, are suited if we need to use hundreds or thousands of machines, but they present different features that have to be taken into account. Condor is simple to install (a full installation can take a few minutes) and manage, and several Condor pools can be merged easily if they belong to the same administrative domain. Furthermore, its ability to use idle processor cycles, the use of checkpointing and process migration (thus achieving fault tolerance), and the remote system-call facility make Condor very attractive. However, our programs should use shared files if interprocess communication is required; otherwise, Condor includes versions of PVM and MPI, but then some of the above features are not available. Although it is possible to use Java in Condor, its full power can only be obtained by programs written in C, C++, and Fortran. Compared to Condor, Globus is significantly more complex to install and administer, and it presents advantages for building grid systems spread among different administrative organizations. As a security mechanism, Globus uses X.509 certificates, and its C API provides full access to the Globus services, which must be used explicitly by the programmer. Furthermore, Globus does not incorporate any resource management system (RMS), which should be implemented by hand or, alternatively, an existing RMS can be used, for example, Condor. A choice for MPI programmers is to use MPICH-G2, the implementation of MPICH included with Globus.
3.7 SUMMARY
Parallel computing involves both hardware and software issues. The former are directly related to parallel architectures, while the latter have to do with parallel programming models. Nowadays there are two main kinds of parallel computers, multiprocessors and distributed systems, and the programs running on them require, respectively, shared-memory programming and distributed programming. This chapter has been dedicated to new technologies that can be used to implement parallel heuristic algorithms, although we have also included some tools that are not
strictly new, for the sake of completeness. Considering shared-memory programming, recent advances include parallel libraries such as OpenMP and the generalized use of threads, either through the thread API offered by the operating system or with modern languages that already incorporate them, such as Java. Distributed systems are continuously evolving and, after the popularization of COWs, grid computing is gaining interest as a discipline which permits using thousands of machines as a single parallel computer. However, programming grid computing systems presents a number of difficulties not found when programming COWs, so the popular message-passing tools (sockets, PVM, or MPI) used in COWs are not powerful enough in a grid environment. We have described Globus, the de facto standard grid system, and Condor, although this is an open research area and there are many ongoing projects that may fit our requirements.

Acknowledgments

The authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. E. Alba, A. J. Nebro, and J. M. Troya. Heterogeneous computing and parallel genetic algorithms. Journal of Parallel and Distributed Computing, 62(9):1362-1385, 2002.
2. G. R. Andrews. Foundations of Multithreaded, Parallel and Distributed Programming. Addison Wesley, 2000. 3. K. Anstreicher, N. Brixius, J.-P. Goux, and J. Linderoth. Solving Large Quadratic Assignment Problems on Computational Grids. Mathematical Programming, 91:563-588, 2002.
4. F. Berman, G. C. Fox, and A. J. G. Hey. Grid Computing: Making the Global Infrastructure a Reality. Wiley, 2003.

5. D. E. Comer and D. L. Stevens. Internetworking with TCP/IP, volume III. Prentice Hall, 1993.

6. F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the Web services Web: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6(2):86-93, March-April 2002.
7. I. Foster and C. Kesselman. Globus: a metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115-128, 1997.

8. I. Foster and C. Kesselman. Globus: a toolkit-based grid architecture. In I. Foster and C. Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, pages 259-278. Morgan Kaufmann, 1999.
REFERENCES
77
9. I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.

10. I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke. Grid services for distributed system integration. IEEE Computer, pages 37-46, June 2002.

11. V. Getov, G. von Laszewski, M. Philippsen, and I. Foster. Multiparadigm communications in Java for grid computing. Communications of the ACM, 44(10):118-125, 2001.

12. A. Globus, E. Langhirt, M. Livny, R. Ramamurthy, M. Solomon, and S. Traugott. JavaGenes and Condor: cycle-scavenging genetic algorithms. In ACM Java Grande 2000 Conference, San Francisco, CA, June 2000.

13. E. D. Goodman. An introduction to GALOPPS - the Genetic Algorithm Optimized for Portability and Parallelism System, release 3.2. Technical Report 96-07-01, Intelligent Systems Laboratory and Case Center for Computer-Aided Engineering and Manufacturing, Michigan State University, 1996.

14. J. Protić, M. Tomašević, and V. Milutinović. Distributed shared memory: concepts and systems. IEEE Parallel and Distributed Technology, 4(2):63-79, 1996.
15. N. T. Karonis, B. Toonen, and I. Foster. MPICH-G2: a grid-enabled implementation of the Message Passing Interface. Journal of Parallel and Distributed Computing, 63:551-563, 2003.

16. K. Krauter, R. Buyya, and M. Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing. Software - Practice and Experience, 32:135-164, 2001.

17. M. Livny, J. Basney, R. Raman, and T. Tannenbaum. Mechanisms for high throughput computing. SPEEDUP Journal, 11(1):36-40, June 1997.

18. N. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.

19. J. Maassen, R. van Nieuwpoort, R. Veldema, H. E. Bal, and A. Plaat. An efficient implementation of Java's remote method invocation. In 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 173-182, Atlanta, GA, May 1999.

20. Message Passing Interface Forum. MPI: a message-passing interface standard. International Journal of Supercomputer Applications, 8(3/4):165-414, 1994.

21. M. Raynal. Distributed Algorithms and Protocols. John Wiley, 1988.

22. K. C. Sarma and H. Adeli. Bilevel parallel genetic algorithms for optimization of structures. Computer-Aided Civil and Infrastructure Engineering, 16:296-304, 2001.
23. V. S. Sunderam. PVM: a framework for parallel distributed computing. Concurrency: Practice and Experience, 2(4):315-339, 1990.

24. V. S. Sunderam and G. A. Geist. Heterogeneous parallel and distributed computing. Parallel Computing, 25:1699-1721, 1999.

25. A. S. Tanenbaum. Distributed Operating Systems. Prentice-Hall, 1995.

26. Y. Tanimura, T. Hiroyasu, M. Miki, and K. Aoi. The system for evolutionary computing on the computational grid. In 14th International Conference on Parallel and Distributed Computing and Systems, pages 39-44, November 2002.

27. M. Tomassini, L. Vanneschi, L. Bucher, and F. Fernández. An MPI-based tool for distributed genetic programming. In Proceedings of the IEEE International Conference on Cluster Computing 2000, pages 209-216, Chemnitz, Germany, November-December 2000.
4
Metaheuristics and Parallelism

E. ALBA¹, E.-G. TALBI², G. LUQUE¹, N. MELAB²
¹Universidad de Málaga, Spain
²Laboratoire d'Informatique Fondamentale de Lille, France
4.1 INTRODUCTION
In practice, optimization problems are often NP-hard, complex, and CPU time-consuming. Two major approaches are traditionally used to tackle these problems: exact methods and metaheuristics. Exact methods allow finding optimal solutions, but they are often impractical as they are extremely time-consuming. Conversely, metaheuristics provide suboptimal solutions in a reasonable time; they allow meeting the resolution deadlines often imposed in the industrial field.

Metaheuristics fall into two categories: local search metaheuristics (LSMs) and evolutionary algorithms (EAs). A local search starts with a single initial solution. At each step of the search, the current solution is replaced by another (often the best) solution found in its neighborhood. Very often, LSMs can only find a locally optimal solution, and so they are called exploitation-oriented methods. On the other hand, EAs make use of a randomly generated population of solutions. The initial population is enhanced through a natural evolution process. At each generation of the process, the whole population, or a part of it, is replaced by newly generated individuals (often the best ones). EAs are often called exploration-oriented methods.

Although the use of metaheuristics significantly reduces the temporal complexity of the search process, the latter remains time-consuming for industrial problems. Therefore, parallelism is necessary not only to reduce the resolution time but also to improve the quality of the provided solutions. For each of the two families of metaheuristics, different parallel models have been proposed in the literature. Each of them illustrates an alternative approach to handle and deploy the parallelization. According to their granularity, parallel LSMs follow three major models (from coarse grained to fine grained): parallel multistart, parallel moves, and move acceleration. As EAs are population-based algorithms, the following two approaches are commonly used: parallelization of computation and parallelization of the population. In the first model, the operations commonly applied to each of the individuals are performed in parallel. In the other model, the population is split into different subpopulations that
can be simply exchanged or evolve separately and be joined later. In this chapter, we propose a study of these models through the major solution methods: LSMs (Simulated Annealing [49], Tabu Search [39], Greedy Randomized Adaptive Search Procedure [30], Variable Neighborhood Search [65]) and EAs (Genetic Algorithms [13], Evolutionary Strategies [13], Genetic Programming [13], Ant Colonies [26], Estimation of Distribution Algorithms [66], Scatter Search [38]). Other models are also sketched for parallel heterogeneous metaheuristics and parallel multiobjective optimization. The rest of the chapter is organized as follows. In Section 4.2, we highlight some principles of LSMs and their major parallel models. In Section 4.3, we show how these models are instantiated and applied to the LSM case studies cited above. In Section 4.4, we give some principles of EAs and present their main parallel models. In Section 4.5, we study some of the EA cases quoted previously. In Section 4.6, we analyze other models, such as parallel heterogeneous metaheuristics and parallel multiobjective optimization. In Section 4.7, a conclusion is drawn and some future directions are proposed.
4.2 PARALLEL LSMs
4.2.1 Principles of LSMs

Metaheuristics for solving optimization problems can be viewed as "walks through neighborhoods", meaning search trajectories through the solution domains of the problems at hand [23].

Algorithm 1. LSM skeleton pseudocode
Generate(s(0)); t := 0;
while not Termination_Criterion(s(t)) do
  s'(t) := SelectMove(s(t));
  if AcceptableMove(s'(t)) then s(t+1) := ApplyMove(s'(t));
  else s(t+1) := s(t);
  t := t+1;
endwhile

The walks are performed by iterative procedures that allow moving from one solution to another in the solution space (see Algorithm 1). In particular, LSMs perform the moves within the neighborhood of the current solution. The walks start from a solution randomly generated or obtained from another optimization algorithm. At each iteration, the current solution is replaced by another one selected from the set of its neighboring candidates. The search process is stopped when a given condition is satisfied (stopping criterion). A powerful way to achieve high performance with
LSMs is the use of parallelism. Different parallel models have been proposed, and these are summarized in the next section.

4.2.2 Parallel Models of LSMs

Three parallel models are commonly used in the literature: the parallel multistart model, the parallel exploration and evaluation of the neighborhood (or parallel moves) model, and the parallel evaluation of a single solution (or move acceleration) model.
0
0
Parallel multistart model. It consists in simultaneously launching several LSMs for computing better and robust solutions. They may be heterogeneous or homogeneous, independent or cooperative, start from the same or different solution(s), configured with the same or different parameters. Parallel moves model. It is a low level farmer-workermodel that does not alter the behavior of the heuristic. A sequential search computes the same results slower. At the beginning of each iteration, the farmer duplicates the current solution between distributed nodes. Each one manages some candidates and the results are returned to the farmer. Move acceleration model. The quality of each move is evaluated in a parallel centralized way. That model is particularly interesting when the evaluation function can be itself parallelized as it is CPU time-consuming and/or inputout (10) intensive. In that case, the function can be viewed as an aggregation of a certain number of partial functions.
4.3 CASE STUDIES OF PARALLEL LSMs
4.3.1 Parallel Simulated Annealing
Simulated Annealing (SA) [49] is a stochastic search method in which, at each step, the current solution is replaced by another one, randomly selected from the neighborhood, that improves the objective function. SA uses a control parameter, called temperature, to determine the probability of accepting nonimproving solutions. The objective is to escape from local optima and so to delay the convergence. The temperature is gradually decreased according to a cooling schedule such that few nonimproving solutions are accepted at the end of the search. To our knowledge, SA was the first optimization LSM to be parallelized. Several parallel implementations were proposed between 1980 and 1990, most of them focusing on cell placement problems in VLSI layout [43, 50, 52, 64, 77]. The majority of parallelization approaches can be classified into two categories: move acceleration and parallel moves [43]. Table 4.1 shows some representative parallel SAs. In [17], these models are implemented as skeletons and can be instantiated by the user. The fine-granularity nature of the move acceleration model makes it unsuitable
for distributed-memory systems. Indeed, its implementation is often restricted to shared-memory machines [50].

Table 4.1 A quick survey of several parallel SAs

Article    Parallel Model
(1987)     Move acceleration
(1987)     Noninteracting parallel moves
(1990)     Noninteracting parallel moves
(1996)     Parallel multistart (synchronous and asynchronous)
(2000)     Noninteracting and interacting parallel moves
(2000)     Parallel multistart (synchronous and asynchronous)
(2000)     Parallel asynchronous multistart
(2004)     A general framework for parallel algorithms (e.g., parallel SA)
The parallel moves model is more widely investigated [19, 77]. In this model, different moves are evaluated in a concurrent way. Each processor generates and evaluates moves independently. The model suffers from inconsistency: due to the moves made by other processors, the cost function computations may be incorrect. Two major approaches are usually used to manage the inconsistency: (1) The evaluation of moves is performed in parallel and only noninteracting moves are accepted [43, 77]. This can be viewed as a domain decomposition approach. It preserves the convergence property of the sequential algorithm and permits good speedups [50, 43]. The difficulty of the approach is how to determine the noninteracting moves. (2) The second approach consists in evaluating and accepting multiple interacting moves in parallel. Some errors in the calculation of the cost functions are allowed. Errors are corrected after a certain number of moves (after each temperature in [43]) by synchronization between processors. However, this affects the convergence of the parallel algorithm compared to the sequential algorithm. In addition, due to the synchronization cost, negative speedups may be obtained, as reported in [43]. While most parallel implementations of SA are based on the parallel moves model, several other parallelizations follow the parallel multistart model. These parallelizations use multiple Markov chains [43, 52, 64], and many of them are applied to the cell placement problem. Each chain performs cell moves on the whole set of cells rather than only on a subset of cells. This approach allows overcoming the performance problems of the parallel moves strategy caused by the use of restricted moves and tolerated errors in the cost function evaluation. In the parallel multiple Markov chains approach, each processor carries out SA on a local copy of the whole problem data. The processors dynamically combine their solutions by exchanging their best ones in a synchronous or asynchronous way.
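The multiple-Markov-chain idea can be sketched as follows. This is a toy illustration under our own assumptions (sum-of-squares objective, geometric cooling, synchronous exchange), not code from any of the cited VLSI placement systems: several chains run independently for a while and the farmer keeps and redistributes the overall best solution.

import math, random
from multiprocessing import Pool

def cost(sol):                           # toy objective: sum of squares
    return sum(x * x for x in sol)

def sa_chain(args):
    sol, temp, steps, seed = args
    rng = random.Random(seed)
    best = list(sol)
    for _ in range(steps):
        cand = list(sol)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-1, 1))
        delta = cost(cand) - cost(sol)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            sol = cand
            if cost(sol) < cost(best):
                best = list(sol)
        temp *= 0.99                      # simple geometric cooling
    return best

def parallel_multistart_sa(start, chains=4, rounds=5, steps=200):
    best = list(start)
    with Pool(chains) as pool:
        for r in range(rounds):           # synchronous exchange every `steps` moves
            args = [(best, 10.0, steps, 1000 * r + s) for s in range(chains)]
            results = pool.map(sa_chain, args)
            best = min(results, key=cost)
    return best

if __name__ == "__main__":
    print(parallel_multistart_sa([7, -4, 9]))

An asynchronous variant would let each chain push and pull the incumbent best whenever it likes instead of at fixed synchronization points.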
4.3.2 Parallel Tabu Search
Tabu Search (TS) [39] manages a memory of solutions or moves recently applied, called the tabu list. When a local optimum is reached, the search carries on by
selecting a candidate worse than the current solution. To prevent the previous solution from being chosen again, and so to avoid cycles, TS discards the neighboring candidates that have been previously applied. Several parallel implementations of TS are briefly summarized in Table 4.2. Most of them are based on the multistart model and/or a neighborhood decomposition and follow the parallel moves model [32, 72]. In [9, 14, 17], the different models are implemented as general skeletons and can be instantiated by the user.

Table 4.2 A quick survey of several parallel TS

Article        Parallel Model
[32] (1994)    Parallel moves, large TSP
[72] (1995)    Parallel moves, task scheduling
[89] (1996)    Parallel independent multistart with adaptive load balancing, QAP
[4] (1998)     Parallel cooperative multistart, circuit partitioning
[14] (2001)    Parallel skeletons for Tabu search: independent parallel multistart, master-slave with neighborhood partition
[21] (2002)    Cooperative parallel multistart, capacitated network design
[9] (2002)     Parallel skeletons for Tabu search: independent parallel multistart (with search strategies), master-slave (with neighborhood partition)
[17] (2004)    Parallel skeletons for Tabu search (all models)
In [32], it is claimed that, due to its heavy synchronization, such a model is worth applying to problems in which the calculations required at each iteration are time-consuming. In [72], a parallel implementation of TS based on the parallel moves model is applied to such a problem, and linear speedups can be attained for large problems. A wide range of parallel implementations based on the parallel multistart model have also been proposed [4, 21, 89]. In most of them, a sequential TS performs its search in each processor. The different TS algorithms may use different initial solutions and different parameter values. These TS algorithms may be completely independent [89]. They can also cooperate through a central pool of elite solutions held by a dedicated master processor [4]. An elite solution is a local optimum that improves the best solution already visited locally. The cooperation may be performed as a post-optimization intensification procedure based on path-relinking. A parallel cooperative multistart approach has also been developed in [21]. The results show that it outperforms both the sequential algorithm and the independent parallel multistart approach.
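As an illustration only (this is not the code of [4] or [21]), the cooperative scheme just described — several searches sharing a central pool of elite solutions — can be sketched with threads and a lock-protected pool; the search itself is reduced to a trivial accept-if-not-worse walk so that the sketch stays self-contained. A real TS would of course maintain a tabu list.

import random, threading

class ElitePool:
    def __init__(self):
        self.best, self.lock = None, threading.Lock()
    def report(self, sol, c):                    # workers push improvements
        with self.lock:
            if self.best is None or c < self.best[1]:
                self.best = (sol, c)
    def get(self):                               # workers pull the elite solution
        with self.lock:
            return self.best

def cost(sol):
    return sum(x * x for x in sol)

def worker(pool, seed, iters=500):
    rng = random.Random(seed)
    sol = [rng.randint(-10, 10) for _ in range(5)]
    for it in range(iters):
        if it % 100 == 0:
            elite = pool.get()
            if elite is not None:
                sol = list(elite[0])             # intensify around the shared elite
        cand = list(sol)
        cand[rng.randrange(len(cand))] += rng.choice((-1, 1))
        if cost(cand) <= cost(sol):              # placeholder for a tabu-guided move
            sol = cand
            pool.report(list(sol), cost(sol))

if __name__ == "__main__":
    pool = ElitePool()
    threads = [threading.Thread(target=worker, args=(pool, s)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(pool.get())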
4.3.3 Parallel GRASP

The Greedy Randomized Adaptive Search Procedure (GRASP) [30] is a multistart algorithm. Each iteration of the algorithm is composed of two phases: a construction phase and a local search phase. The construction phase consists in iteratively generating a feasible solution by a greedy randomized procedure. The local search
phase provides a local optimum in the neighborhood of the constructed solution. The resulting solution of the problem at hand is the best solution over all iterations. Table 4.3 shows that most of the parallel implementations [53, 57, 70] of GRASP are based on the parallel multistart model. Many of these implementations have been proposed by Resende and his collaborators. Parallelism consists in distributing the iterations over the processors. Each processor receives a copy of the sequential algorithm and a copy of the problem data. Since the iterations are independent and very little information is exchanged between processors, linear speedups are often obtained. For instance, in [53] almost linear speedups are reported for an implementation of a parallel GRASP applied to the quadratic assignment problem, and in particular a speedup of 62 is obtained on 64 processors. In more recent research works on parallel GRASP, the parallel iterations are followed by a path-relinking intensification process to improve the quality of the obtained solutions [3, 6]. In [5], a methodology is proposed for the analysis of parallel GRASP approaches.

Table 4.3 A quick survey of several parallel GRASP

Article        Parallel Model
[53] (1994)    Parallel multistart, QAP
[70] (1995)    Parallel multistart, QAP
[57] (1998)    Cycle stealing, parallel multistart with adaptive load balancing, Steiner problem in graphs
[6] (2000)     Parallel independent multistart with path-relinking, tree index assignment problem
[3] (2003)     Parallel independent multistart with path-relinking, job shop scheduling problem
[5] (2003)     A methodology for the analysis of parallel GRASP
Load balancing may be easily achieved by evenly distributing the iterations among the processors. However, in a heterogeneous multiuser execution environment a static distribution may be less efficient. In [57], a dynamic adaptive distribution approach is presented. The approach is based on the farmer-worker cycle stealing strategy. Each worker processor is initially allocated a small number of iterations. Once it has performed its iterations, it requests additional iterations from the farmer processor. All the workers are stopped once the final result is returned. Faster and less loaded processors perform more iterations than the others. This approach reduces the execution time compared to the static one.
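The cycle-stealing idea can be sketched as follows; this is a self-contained toy of our own (a trivial stand-in for GRASP's construction and local search phases), not the implementation of [57]. Workers repeatedly take small batches of iterations from a shared queue, so faster processors end up doing more of the work.

import random
from multiprocessing import Process, Queue

def grasp_iteration(rng):
    # Toy stand-in for construction + local search: a random point,
    # then one greedy rounding step toward zero.
    sol = [rng.randint(-10, 10) for _ in range(5)]
    sol = [x - (1 if x > 0 else -1 if x < 0 else 0) for x in sol]
    return sum(x * x for x in sol), sol

def worker(tasks, results, seed):
    rng, best = random.Random(seed), None
    while True:
        batch = tasks.get()
        if batch is None:                        # poison pill: no more work
            break
        for _ in range(batch):
            cand = grasp_iteration(rng)
            if best is None or cand < best:
                best = cand
    results.put(best)

if __name__ == "__main__":
    n_workers, total_iters, batch = 4, 1000, 10
    tasks, results = Queue(), Queue()
    for _ in range(total_iters // batch):
        tasks.put(batch)
    for _ in range(n_workers):
        tasks.put(None)
    procs = [Process(target=worker, args=(tasks, results, s)) for s in range(n_workers)]
    for p in procs: p.start()
    bests = [results.get() for _ in range(n_workers)]
    print(min(b for b in bests if b is not None))
    for p in procs: p.join()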
4.3.4 Parallel Variable Neighborhood Search

The basic idea of Variable Neighborhood Search (VNS) [65] is to successively explore a set of predefined neighborhoods to provide a better solution. It uses a descent method to reach the local minimum, and then explores, either at random or systematically, the set of neighborhoods. At each step, an initial solution is shaken
from the current neighborhood. The current solution is replaced by a new one if and only if a better solution has been found. The exploration is thus restarted from that solution in the first neighborhood. If no better solution is found, the algorithm moves to the next neighborhood, randomly generates a new solution, and attempts to improve it. Since VNS is a relatively new metaheuristic, it has not yet been investigated much from a parallelization point of view. The two major research works reported in the literature on the parallelization of VNS are [37] and [22]. In [37], three approaches have been proposed and compared: the first one follows a low-level parallel model and attempts to speed up the execution by parallelizing the local search phase; the second one is based on the parallel independent multistart model; and the third strategy implements the parallel synchronous cooperative multistart model. They were tested on TSPLIB problem instances with 1400 customers. The reported results show that the multistart model obtained the best solutions.

Table 4.4 A quick survey of parallel VNS

Article        Parallel Model
[37] (2002)    Parallel local search, TSPLIB
[37] (2002)    Parallel independent multistart, TSPLIB
[37] (2002)    Parallel cooperative synchronous multistart, TSPLIB
[22] (2004)    Parallel cooperative asynchronous multistart, p-median problem
In [22], an asynchronous cooperative variant of the parallel multistart model, called Cooperative Neighborhood VNS (or CNVNS), is proposed. The farmer keeps, updates, and communicates the current overall best solution. It also initiates and terminates the procedure. Unlike the parallel cooperative synchronous variant of the multistart model previously presented, the communications are initiated by the workers in an asynchronous way. When a worker can no longer improve its solution, it communicates it to the farmer if it is better than the one sent at the last communication. The overall best solution is requested from the farmer and serves as the initial solution from which the search is started in the current neighborhood. The approach was tested on p-median problem instances of up to 1,000 medians and 11,948 customers. The results show that the strategy reduces the computation time without losing solution quality compared to the sequential VNS. It also finds better solutions when given more time.
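A rough sketch of the worker-initiated asynchronous cooperation follows. It is our own toy (integer shaking on a sum-of-squares objective, a shared record instead of a dedicated farmer process), not the CNVNS code of [22]; each worker runs its own shake-and-descend loop, pushes improvements into a shared record, and adopts the overall best when it falls behind.

import random
from multiprocessing import Manager, Process

def cost(sol):
    return sum(x * x for x in sol)

def worker(shared, lock, seed, iters=300):
    rng = random.Random(seed)
    sol = [rng.randint(-20, 20) for _ in range(5)]
    k = 1                                         # current neighborhood size
    for _ in range(iters):
        cand = [x + rng.randint(-k, k) for x in sol]   # shake in neighborhood k
        if cost(cand) < cost(sol):
            sol, k = cand, 1
            with lock:                            # asynchronous report/request
                if cost(sol) < shared["cost"]:
                    shared["sol"], shared["cost"] = list(sol), cost(sol)
                elif shared["cost"] < cost(sol):
                    sol = list(shared["sol"])     # adopt the overall best
        else:
            k = min(k + 1, 5)                     # move to the next neighborhood

if __name__ == "__main__":
    with Manager() as mgr:
        shared = mgr.dict(sol=None, cost=float("inf"))
        lock = mgr.Lock()
        procs = [Process(target=worker, args=(shared, lock, s)) for s in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()
        print(shared["cost"], shared["sol"])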
4.4 PARALLEL EVOLUTIONARY ALGORITHMS

4.4.1 Principles of EAs

Evolutionary Algorithms (broadly called EAs) are stochastic search techniques that have been successfully applied to many real and complex applications (epistatic, multimodal, multiobjective, and highly constrained problems). Their success in
solving difficult optimization tasks has promoted research in the field known as evolutionary computing (EC) [13]. An EA is an iterative technique that applies stochastic operators to a pool of individuals (the population) (see Algorithm 2). Every individual in the population is the encoded version of a tentative solution. Initially, this population is generated randomly. An evaluation function associates a fitness value to every individual, indicating its suitability to the problem.

Algorithm 2. EA pseudocode
Generate(P(0));
t := 0;
while not Termination-Criterion(P(t)) do
  Evaluate(P(t));
  P'(t) := Selection(P(t));
  P'(t) := ApplyReproduction-Ops(P'(t));
  P(t + 1) := Replace(P(t), P'(t));
  t := t + 1;
endwhile
The above pseudocode shows the genetic components of any EA. There exist several well-accepted subclasses of EAs depending on the representation of the individuals or on the applied evolution step. The main subclasses of EAs are the Genetic Algorithm (GA), Evolutionary Programming (EP), the Evolution Strategy (ES), and some others not shown here.
4.4.2 Parallel Models of EAs
For nontrivial problems, executing the reproductive cycle of a simple EA on long individuals and/or large populations requires high computational resources. In general, evaluating a fitness function for every individual is frequently the most costly operation of the EA. Consequently, a variety of algorithmic issues are being studied to design efficient EAs. These issues usually consist of defining new operators, hybrid algorithms, parallel models, and so on. We now analyze the parallel models used in the field of EC. Parallelism arises naturally when dealing with a population, since each of the individuals belonging to it is an independent unit. Due to this, the performance of population-based algorithms is especially improved when running in parallel. Two parallelizing strategies are especially suited to population-based algorithms: (1) parallelization of computation, in which the operations commonly applied to each of the individuals are performed in parallel, and (2) parallelization of population, in which the population is split into different parts that can be simply exchanged or evolve separately and be joined later. In the beginning of the parallelization of these algorithms, the well-known master-slave (also known as global parallelization) method was used. In this way, a central
processor performs the selection operations while the associated slave processors perform the recombination, mutation, and evaluation of the fitness function. This algorithm is the same as the sequential one, although it is faster, especially for time-consuming objective functions. Also, many researchers use a pool of processors to speed up the execution of a sequential algorithm, just because independent runs can be made more rapidly by using several processors than by using a single one. In this case, no interaction at all exists between the independent runs. However, most parallel EAs (PEAs) found in the literature actually utilize some kind of spatial disposition for the individuals and then parallelize the resulting chunks in a pool of processors. Among the most widely known types of structured EAs, the distributed (dEA) (or coarse-grain) and cellular (cEA) (or fine-grain) algorithms are very popular optimization procedures [10]. In the case of distributed EAs, the population is partitioned into a set of islands in which isolated EAs are executed. Sparse individual exchanges are performed among these islands with the goal of introducing some diversity into the subpopulations, thus preventing them from falling into local optima. In the case of a cellular EA, the concept of neighborhood is introduced, so that an individual may only interact with its nearby neighbors in the breeding loop. The overlapped small neighborhoods in cEAs help in exploring the search space because a slow diffusion of solutions through the population provides a kind of exploration, while exploitation takes place inside each neighborhood. Also, hybrid models have been proposed in which a two-level approach of parallelization is undertaken. In general, the higher level of parallelization is a coarse-grain implementation and the basic island performs a cEA, a master-slave method, or even another distributed one.

4.5 CASE STUDIES OF PARALLEL EAs
4.5.1 Parallel Genetic Algorithms
Genetic Algorithms (GAs) [13] are a very popular class of EAs. Traditionally, GAs are associated with the use of a binary representation, but nowadays you can find GAs that use other types of representations. A GA usually applies a recombination operator on two solutions, plus a mutation operator that randomly modifies the individual contents to promote diversity. In Table 4.5 we show some of the most important and representative works on parallel GAs. The distributed model is the most common parallelization in PGAs, since it can be implemented on distributed-memory MIMD computers. Some coarse-grain algorithms like dGA [90], DGENESIS [58], and GALOPPS [40] are relatively close to the general distributed model of migration islands. They often include many features to improve efficiency. Some other coarse-grain models like GDGA [44] have been designed for specific goals, such as providing explicit exploration/exploitation by applying different operators on each island. Some other PGAs execute nonorthodox models of coarse-grain evaluation, such as GENITOR II [94], which is based on a steady-state reproduction.
Table 4.5 A quick survey of several parallel GAs

Algorithm     Article        Parallel Model
ASPARAGOS     [41] (1989)    Fine-grain. Applies hill-climbing if no improvement
dGA           [90] (1989)    Distributed populations
GENITOR II    [94] (1990)    Coarse grain
ECO-GA        [95] (1991)    Fine-grain
EnGENEer      [76] (1992)    Global parallelization
GAME          [82] (1993)    Object-oriented set of general programming tools
DGENESIS      [58] (1994)    Coarse grain with migration among subpopulations
GALOPPS       [40] (1996)    Coarse grain
GDGA          [44] (2000)    Coarse grain. Hypercube topology
ParadisEO     [17] (2004)    A general framework for parallel algorithms
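To make the distributed (island) model concrete, the following minimal sketch — our own illustration on a toy one-max problem, not code from any of the systems listed above — evolves one subpopulation per island and migrates the best individual around a ring every few generations. The islands run serially here, but each could be mapped to its own processor.

import random

def fitness(ind):                      # toy "one-max": count the 1-bits
    return sum(ind)

def evolve(pop, rng, gens=10, pmut=0.05):
    for _ in range(gens):
        new = []
        for _ in range(len(pop)):
            a, b = rng.sample(pop, 2)  # binary tournament selection
            parent = max(a, b, key=fitness)
            new.append([bit ^ (rng.random() < pmut) for bit in parent])
        pop = sorted(new + pop, key=fitness, reverse=True)[:len(pop)]
    return pop

def island_ga(n_islands=4, pop_size=20, length=30, epochs=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        islands = [evolve(p, rng) for p in islands]   # isolated evolution
        # Sparse exchange: each island sends its best to the next one (ring).
        migrants = [max(p, key=fitness) for p in islands]
        for i, p in enumerate(islands):
            p[-1] = list(migrants[(i - 1) % n_islands])
    return max((ind for p in islands for ind in p), key=fitness)

if __name__ == "__main__":
    best = island_ga()
    print(fitness(best), best)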
On the other hand, parallel implementations of the cellular model have been strongly associated with the machines on which they run: ASPARAGOS [41] and ECO-GA [95]. As to the master-slave model, some implementations, such as EnGENEer [76], are available. Finally, some efforts to construct general frameworks for PGAs are GAME [82] or ParadisEO [17]. The mentioned systems are endowed with "general" programming structures intended to ease the implementation of any model of PGA.
4.5.2 Parallel Evolution Strategies
Evolution Strategies (ESs) [13] are another subclass of EAs, like GAs or GP. This algorithm is suited for continuous optimization, usually with an elitist selection and a specific mutation (crossover is used rarely). In ES, the individual is composed of the objective float variables plus some other parameters guiding the search. Thus, an ES facilitates a kind of self-adaptation by evolving the problem variables as well as the strategy parameters at the same time. Hence, the parameterization of an ES is highly customizable.
Table 4.6 A quick survey of several parallel ESs

Article        Parallel Model
[79] (1991)    Distributed
[24] (1993)    Distributed
[83] (1994)    Cellular
[81] (1996)    Distributed and cellular
[42] (1999)    Cellular
[93] (2004)    Cellular with dynamic neighborhood structures
Table 4.6 shows some representative parallel implementations of ESs. Several of these works [81, 83, 93] follow the cellular approach, where the individuals are structured and may only interact with their nearby neighbors. They show that a cellular model applied to complex problems can have a higher convergence probability than panmictic GAs. Other studies have analyzed the convergence properties of this model, such as [42]. The classic distributed model has also been extensively used to implement parallel versions of ES [24, 79, 81], obtaining very competitive results on optimization and continuous problems.

4.5.3 Parallel Genetic Programming

Genetic Programming (GP) [13] is a more recent EA which extends the generic model of learning to the space of programs. Its major variation with respect to other evolutionary families is that the evolving individuals are themselves programs instead of fixed-length strings from a finite alphabet of symbols. GP is a form of program induction that allows automatically discovering programs that solve or approximately solve a given task. See Table 4.7 for a summary of parallel implementations of GP.
Table 4.7 A quick survey of parallel GP

Article        Parallel Model
[48] (1996)    Fine-grain
[11] (1996)    Distributed
[27] (1996)    Master-slave
[12] (1996)    Distributed
[73] (1998)    Distributed
[31] (2000)    Distributed
[33] (2001)    Fine-grain
GP is not in general suitable for massively cellular implementations since individuals may vary widely in size and complexity. This makes cellular implementations of GP difficult both because of the amount of local memory needed to store individuals and for efficiency reasons. Despite these difficulties, several fine-grain parallel GPs have been implemented, such as [48]. Actually, several implementations of this cellular model on distributed-memory computers can be found in the literature, such as [33], where the authors show that their parallel cellular GP has a nearly linear speedup and a good scale-up behavior. For coarse-grain, island-based parallel genetic programming the situation is somewhat controversial. Several works [11, 12, 31] reported excellent results, but another one [73] found that the multiple-population approach did not help in solving some problems. Other implementations can also be found, such as [27], where the authors used a master-slave approach.
4.5.4 Parallel Ant Colony Optimization
The ant colony optimization technique (ACO) [26] is a new metaheuristic for hard combinatorial optimization problems. Ant algorithms have been inspired by colonies of real ants, which deposit a chemical substance (called pheromone) on the ground. This substance influences the choices they make: the larger the amount of pheromone on a particular path, the larger the probability that the ants select the path. Artificial ants are stochastic construction procedures that probabilistically build a solution by iteratively adding solution components to partial ones, taking into account (1) heuristic information on the problem and (2) pheromone trails, which change dynamically at runtime to reflect the acquired search experience. Ant algorithms are good candidates for parallelization, but not much research has been done on parallel ant algorithms so far. In this section we briefly describe the most important parallel implementations and parallel models of ant colony algorithms reported in the literature.
Table 4.8 A quick survey of several parallel ACO

Article    Parallel Model
(1993)     Fine-grained and a coarser-grained variant
(1997)     Information exchange every k generations
(1998)     Independent runs
(1998)     Distributed
(1999)     Master-slave
(2001)     Distributed
(2002)     Distributed
(2002)     Master-slave and independent runs
(2002)     A very coupled master-slave
(2004)     Master-slave
In Table 4.8, we can observe that many parallel algorithms follow a master-slave model [25, 74, 75, 88]. In this model, a master process sends some solutions to each slave processor; after each generation these colonies exchange information and the master calculates the pheromone matrix. The distributed or island model has also been used in multiple implementations of parallel versions of ACO [59, 62, 63]. Different methods for information exchange in multicolony ant algorithms have been studied for this last model, and these studies conclude that it is better to exchange the local best solution only with the neighbor in a directed ring and not too often (loose coupling). Stützle [87] studied the easiest way to parallelize the algorithm (run several independent executions and return the best solution over all executions) and obtained more accurate results than the serial version in many cases.
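The master-slave organization just described can be sketched as follows on a tiny random TSP instance. All problem details, parameter values, and names are our own toy assumptions, not taken from [25, 74, 75, 88]: slaves build tours in parallel against a shared pheromone matrix, and the master evaporates and deposits pheromone using the best tour found so far.

import math, random
from multiprocessing import Pool

N = 8                                            # toy TSP with N random cities
random.seed(0)
CITIES = [(random.random(), random.random()) for _ in range(N)]

def dist(i, j):
    return math.dist(CITIES[i], CITIES[j])

def build_tour(args):
    pheromone, seed = args                       # slave: construct one tour
    rng = random.Random(seed)
    tour, left = [0], set(range(1, N))
    while left:
        i = tour[-1]
        cand = list(left)
        weights = [pheromone[i][j] / (dist(i, j) + 1e-9) for j in cand]
        j = rng.choices(cand, weights=weights)[0]
        tour.append(j); left.remove(j)
    length = sum(dist(tour[k], tour[(k + 1) % N]) for k in range(N))
    return length, tour

def master(ants=8, iters=20, rho=0.5):
    pheromone = [[1.0] * N for _ in range(N)]
    best = None
    with Pool(4) as pool:
        for it in range(iters):
            results = pool.map(build_tour, [(pheromone, 100 * it + a) for a in range(ants)])
            best = min(results + ([best] if best else []))
            # master: evaporate, then deposit pheromone along the best tour
            pheromone = [[(1 - rho) * p for p in row] for row in pheromone]
            length, tour = best
            for k in range(N):
                a, b = tour[k], tour[(k + 1) % N]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best

if __name__ == "__main__":
    print(master())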
4.5.5 Parallel Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) are a recent type of optimization and learning technique based on the concept of using a population of tentative solutions to improve the best-so-far optimum for a problem [66]. The general EDA can be sketched as follows:
Algorithm 3. EDA pseudocode
Set t ← 1;
Generate N >> 0 points randomly;
while termination criteria are not met do
  Select M ≤ N points according to a selection method;
  Estimate the distribution p*(x, t) of the selected set;
  Generate N new points according to the distribution p*(x, t);
  Set t ← t + 1;
endwhile
The chief step in this algorithm is to estimate p*(x, t) and to generate new points according to this distribution. This represents a clear difference with respect to other EAs, which use recombination and/or mutation operators to compute a new population of tentative solutions. This algorithm requires considerable CPU and memory utilization. Therefore, it is important to find techniques that improve its execution, such as parallelism. There are several possible levels at which an EDA can be parallelized: (1) the estimation of the probability distribution level (or learning level), (2) the sampling of new individuals level (or simulation level), (3) the population level, (4) the fitness evaluation level, and (5) any combination of the mentioned levels.

Table 4.9 A quick survey of several parallel EDAs

Article    Parallel Model
(2000)     Hybrid, learning level and simulation
(2001)     Learning level
(2001)     Hybrid, learning level and simulation
(2003)     Population level
(2003)     Learning level
(2003)     Learning level
(2003)     Learning level
(2004)     Learning level
(2004)     Hybrid, learning level and simulation
Table 4.9 shows a survey of the most important works on parallel EDAs. Notice that in most cases the algorithms try to reduce the time required to learn the probability distribution [55, 56, 60, 68]. In general, these algorithms for learning the probability
distribution use a score+search procedure, mainly defining a metric that measures the goodness of every candidate Bayesian network with respect to a database of cases. Also, several approaches use the parallelization of the sampling of new individuals in order to improve the behavior of the algorithms [61, 68, 69]. Only a few works use the other levels of parallelism, such as [2], in which the authors propose a skeleton for the construction of distributed EDAs simulating the migration through vectors of probabilities.
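As a toy illustration of the learning and simulation levels (a univariate, UMDA-style sketch of our own, far simpler than the Bayesian-network models used in the works above), the marginal frequencies of the selected individuals can be estimated in parallel by splitting the variables among worker processes, and new individuals can be sampled in parallel in the same way.

import random
from multiprocessing import Pool

LENGTH, POP, SELECT = 40, 100, 50

def fitness(ind):                          # toy problem: one-max
    return sum(ind)

def estimate_block(columns):
    # Learning level: each worker estimates marginal probabilities
    # for its block of variables from the selected individuals.
    return [sum(col) / len(col) for col in columns]

def sample_block(args):
    probs, n, seed = args                  # simulation level: sample a block of individuals
    rng = random.Random(seed)
    return [[int(rng.random() < p) for p in probs] for _ in range(n)]

def umda(generations=30, workers=4, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
    with Pool(workers) as pool:
        for _ in range(generations):
            selected = sorted(pop, key=fitness, reverse=True)[:SELECT]
            cols = list(zip(*selected))    # one column per variable
            blocks = [cols[i::workers] for i in range(workers)]
            learned = pool.map(estimate_block, blocks)
            probs = [0.0] * LENGTH
            for w in range(workers):       # undo the round-robin split
                for k, p in enumerate(learned[w]):
                    probs[w + k * workers] = p
            parts = pool.map(sample_block,
                             [(probs, POP // workers, rng.randrange(10 ** 6))
                              for _ in range(workers)])
            pop = [ind for part in parts for ind in part]
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(fitness(umda()))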
4.5.6 Parallel Scatter Search
Scatter Search (SS) [38] is a population-based metaheuristic that combines solutions selected from a reference set to build others. The method starts by generating an initial population of disperse and good solutions. The reference set is then constructed by selecting good representative solutions from the population. The selected solutions are combined to provide starting solutions to an improvement procedure. According to the result of such a procedure, the reference set and even the population of solutions can be updated. The process is iterated until a stopping criterion is satisfied. The SS approach involves different procedures for generating the initial population, building and updating the reference set, combining the solutions of such a set, improving the constructed solutions, etc. The major parallel implementations of SS are summarized in Table 4.10. Parallelism can be used at three levels of the SS process: the improvement procedure level, the combination level, and the whole process level. The first level is a low-level one and consists in parallelizing the improvement procedure by using the different parallel models of LSMs quoted above. In [36], the coarse-grain parallelism of the local searches (parallel moves model) is exploited and called Synchronous Parallel Scatter Search or SPSS. The model is applied to the p-median problem, and the experimental results demonstrate its efficiency in terms of the quality of the provided solutions. Furthermore, it properly reduces the computational time.
Table 4.10 A quick survey of the major parallel SS

Article        Parallel Model
[36] (2003)    Parallel local search with parallel moves (SPSS)
[36] (2003)    Master-slave parallel combinations (RCSS)
[36] (2003)    Independent runs (RPSS), p-median problem
[38] (2004)    RCSS with different combination methods and different parameter settings, feature selection in data mining
The second level of parallelism is obtained by parallelizing the combinations of solutions. The set of possible combinations is divided among the available processors and solved in parallel. Such a model is presented in [36] and called Replicated Combination Scatter Search or RCSS. The experiments on its application to the
p-median problem show that the model obtains the best known results in a reduced computational time. Another variant of the model with different combination methods and different parameter settings has been proposed in [38] and tested in the data mining area. The objective of the approach is to improve the precision of the SS metaheuristic without increasing the computational time. At the third level of parallelism, the whole SS process is parallelized, i.e., each processor runs a SS procedure. The model is multistart and its objective is to increase the diversification of the solutions. The intensification can also be increased by sharing the best found solution. In [38], the model is called Replicated Parallel Scatter Search or RPSS and has been applied to the p-median problem. The reported experimental results show that the model finds the best objective values.
4.6 OTHER MODELS
In this section we include a set of models whose classification is unclear because they could belong to some of the existing types of algorithms.
4.6.1 Parallel Heterogeneous Metaheuristics

A heterogeneous algorithm is a method whose components either are executed over different computing platforms or have different search features. The first class, hardware heterogeneity, is an influential issue because of the current relevance of Internet and grid computing. Several works have found that this kind of heterogeneous platform improves performance [8]. As to the other class, software/search heterogeneity, we can define additional levels of heterogeneity regarding the kind of search that the components are making. At this software level, we can distinguish various sublevels according to the source of the heterogeneity: (1) Parameter level: We use the same metaheuristic in each component but vary the parameter configuration. (2) Operator level: At this level the heterogeneity is introduced by using different search space mechanisms (for example, different genetic operators in GAs, or different neighborhood definitions in SAs). (3) Solution level: At this level each component stores locally encoded solutions represented with different encoding schemata. (4) Algorithm level: This is the most general class of software heterogeneity, in which each component can potentially be a different algorithm. In Table 4.11 we show several representative algorithms of each level of the software heterogeneous class. We can observe that in most cases the heterogeneity is introduced by using a different configuration in each component that composes the heterogeneous algorithm [1, 7, 21, 45, 91, 92]. In general, these algorithms use the distributed parallel model, and each island uses its own configuration (different rates of mutation or crossover), although each implementation has its own features that distinguish it from the rest. For example, in PGA-PA [91] the parameters are also migrated and evolved, or in Hy4 the 16 subpopulations are arranged in a hypercube
Table 4.11 A quick survey of several parallel heterogeneous metaheuristics

Algorithm      Article    Heterogeneity level
iiGA           (1994)     Solution level (distributed model)
GCPSA          (1996)     Algorithm level with heuristic relay (master-slave model)
CoPDEB         (1996)     Operator and parameter levels (distributed model)
CGA            (1996)     Parameter level (master-slave model)
MACS-VRPTW     (1999)     Operator level (distributed model)
DGNnnr         (1999)     Parameter level (distributed model)
PGA-PA         (2002)     Parameter level (distributed model)
CPTS           (2002)     Parameter level (distributed model)
CPM-VRPTW      (2004)     Algorithm level with heuristic teamwork
Hy4            (2004)     Operator and parameter levels (distributed model)
topology of four dimensions. Another common way to introduce heterogeneity is to make different algorithms work together; for example, CPM-VRPTW [51] is composed of two TSs, two EAs, and a local search method, while GCPSA [46] is composed of several SAs and one GA.
4.6.2 Parallel Multiobjective Optimization

Optimization problems for real applications often have to consider many objectives, and we thus have a multiobjective (MO) problem. A trade-off between the objectives exists, and we never have a situation in which all the objectives can be satisfied in a best possible way simultaneously. MO optimization provides the information on all the alternative solutions we can have for a given set of objectives. By analyzing the spectrum of solutions we have to decide which of these solutions is the most appropriate. The two steps are to solve the MO problem and to decide what the optimal solution is. Parallelization may be especially productive in certain MO applications, due primarily to the fact that identifying a set of solutions, perhaps a very large one, is often the primary goal driving the search. Table 4.12 summarizes some of the most important works about parallel MO algorithms. The fitness evaluation in MO problems is the most time-consuming process of the algorithm. Therefore, several algorithms try to reduce this time by parallelizing the calculation of the fitness evaluation [84, 85, 86]. The other most used model is the distributed model; several algorithms follow it [47, 71, 80]. Other models are also implemented; for example, Nebro et al. [67] proposed an exhaustive algorithm that uses a grid computing technique to parallelize the search of the algorithm, and Conti et al. [20] use an SA with parallel exploration of the neighborhood.
Table 4.12 A quick survey of several parallel MO algorithms

Article    Parallel Model
(1994)     SA with parallel exploration of the neighborhood
(1995)     Master-slave model with slaves computing fitness evaluation
(1995)     Master-slave model with slaves computing fitness evaluation
(1996)     Cellular model
(1997)     Heterogeneous distributed algorithm
(2000)     Two master-slave algorithms
(2002)     Distributed hybrid algorithm
(2004)     Enumerative deterministic algorithm
(2004)     Distributed particle swarm optimization
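The pattern that recurs in the table above — a master-slave scheme in which slaves compute the expensive fitness evaluations — can be sketched as follows; the two-objective toy function and all names are illustrative assumptions of ours, not taken from the cited works.

import random
from multiprocessing import Pool

def evaluate(ind):
    # Toy bi-objective fitness (stand-in for an expensive evaluation):
    # f1 = sum of squares, f2 = squared distance to the all-two vector.
    return (sum(x * x for x in ind), sum((x - 2) ** 2 for x in ind))

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def master_slave_mo_step(population, workers=4):
    with Pool(workers) as pool:                 # slaves evaluate in parallel
        objectives = pool.map(evaluate, population)
    # Master keeps the nondominated solutions (the current Pareto front).
    front = [ind for ind, fo in zip(population, objectives)
             if not any(dominates(go, fo) for go in objectives if go != fo)]
    return front, objectives

if __name__ == "__main__":
    rng = random.Random(3)
    pop = [[rng.uniform(0, 2) for _ in range(3)] for _ in range(30)]
    front, _ = master_slave_mo_step(pop)
    print(len(front), "nondominated solutions")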
4.7 CONCLUSIONS

Parallelism is a powerful and necessary way to reduce the computation time of metaheuristics and/or improve the quality of the provided solutions. Different models have been proposed to exploit the parallelism of metaheuristics. These models have been and are still being largely experimented on a wide range of metaheuristics and applied to a large variety of problems in different areas. The results reported in the literature demonstrate that the efficiency of these models depends both on the kind of metaheuristic at hand and on the characteristics of the problem being tackled. The survey presented in this chapter shows that: (1) The parallel multistart model of LSMs is straightforward to use in its independent variant and in this case improves the robustness of the execution. The cooperative variant is more complex but often provides better solutions. Synchronous information exchange guarantees more reliability but is often less efficient than asynchronous exchange. (2) This statement is also true for the parallel moves model. The efficiency of this model may be greater if the evaluation of each move is time-consuming and/or there are a great deal of candidate neighbors to evaluate. Another parameter that influences the performance of such a model is the interaction between moves. If only noninteracting moves are accepted, the convergence property of the sequential algorithm can be preserved and good speedups can be obtained. However, determining noninteracting moves is not an obvious task. On the other hand, acceptance of interacting moves affects the convergence of the algorithm. In addition, negative speedups may be obtained due to the synchronization cost. (3) The move acceleration model may be particularly interesting if the evaluation function can itself be parallelized because it is CPU time-consuming and/or I/O intensive. In general, due to its fine-grained nature it is less exploited than the other models. The master-slave model has been and is still very popular in the area of parallel EAs. It speeds up the execution, especially for time-consuming objective functions. The model is easy to use as it supposes independent runs without any interaction between them. Nevertheless, most parallel EAs are nowadays based on the distributed coarse-grained model and the cellular fine-grained model. The former
introduces some diversity and provides better solutions. Its efficiency depends on some parameters such as the exchange topology, the exchange mode (synchronous/asynchronous), etc. The other model allows at the same time better exploration and exploitation of the search space. Therefore, it provides better and more diverse solutions. In the last decade, grid computing [34] and Peer-to-Peer (P2P) computing [28] have become real alternatives to traditional supercomputing for the development of parallel applications that harness massive computational resources. In the future, the focus in the area of parallel and distributed metaheuristics will be on the gridification of the parallel models presented in this chapter. This is a great challenge, as nowadays grid and P2P-enabled frameworks for metaheuristics are just emerging [18, 29, 67].

Acknowledgments
The first and third authors acknowledge funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. P. Adamidis and V. Petridis. Co-operating Populations with Different Evolution Behaviors. In Proc. of the Third IEEE Conf. on Evolutionary Computation, pages 188-191, New York, 1996. IEEE Press.
2. C.W. Ahn, D.E. Goldberg, and R.S. Ramakrishna. Multiple-deme parallel estimation of distribution algorithms: Basic framework and application. Technical Report 2003016, University of Illinois, 2003.
3. R.M. Aiex, S. Binato, and M.G.C. Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 29:393-430, 2003.
4. R.M. Aiex, S.L. Martins, C.C. Ribeiro, and N.R. Rodriguez. Cooperative multithread parallel tabu search with an application to circuit partitioning. LNCS 1457, pages 310-331, 1998.
5. R.M. Aiex and M.G.C. Resende. A methodology for the analysis of parallel GRASP strategies. AT&T Labs Research TR, Apr. 2003.
6. R.M. Aiex, M.G.C. Resende, P.M. Pardalos, and G. Toraldo. Grasp with path relinking for the tree-index assignment problem. TR, AT&T Labs Research, Florham Park, NJ 07932, USA, 2000.
7. E. Alba, F. Luna, A.J. Nebro, and J.M. Troya. Parallel heterogeneous GAs for continuous optimization. Parallel Computing, 30:699-719, 2004.
8. E. Alba, A.J. Nebro, and J.M. Troya. Heterogeneous Computing and Parallel Genetic Algorithms. J. of Parallel and Distributed Computing, 62:1362-1385, 2002.
9. E. Alba and the MALLBA Group. MALLBA: A library of skeletons for combinatorial optimization. LNCS 2400, pages 927-932, 2002.
10. E. Alba and M. Tomassini. Parallelism and Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, 2002.
11. D. Andre and J.R. Koza. Parallel genetic programming: a scalable implementation using the transputer network architecture. In Advances in Genetic Programming: Volume 2, pages 317-337. MIT Press, 1996.
12. D. Andre and J.R. Koza. A parallel implementation of genetic programming that achieves super-linear performance. In H.R. Arabnia, editor, Proceedings of the International Conf. on Parallel and Distributed Processing Techniques and Applications, volume III, pages 1163-1174, 1996.
13. T. Bäck, D.B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. Oxford University Press, 1997.
14. M.J. Blesa, Ll. Hernandez, and F. Xhafa. Parallel Skeletons for Tabu Search Method. In the 8th Intl. Conf. on Parallel and Distributed Systems, Korea, IEEE Computer Society Press, pages 23-28, 2001.
15. M. Bolondi and M. Bondaza. Parallelizzazione di un Algoritmo per la Risoluzione del Problema del Commesso Viaggiatore. Master's thesis, Politecnico di Milano, Dipartimento di Elettronica e Informazione, 1993.
16. B. Bullnheimer, G. Kotsis, and C. Strauß. Parallelization Strategies for the Ant System. Technical Report 8, University of Vienna, October 1997.
17. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, 10(3):357-380, May 2004.
18. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO on Condor-MW for optimization on computational grids. http://www.lifl.fr/~cahon/cmw/index.html, 2004.
19. J.A. Chandy and P. Banerjee. Parallel Simulated Annealing Strategies for VLSI Cell Placement. Proc. of the 1996 Intl. Conf. on VLSI Design, Bangalore, India, page 37, Jan. 1996.
20. M. Conti, S. Orcioni, and C. Turchetti. Parametric Yield Optimisation of MOS VLSI Circuits Based on SA and its Parallel Implementation. IEEE Proc.-Circuits Devices Syst., 141(5):387-398, 1994.
21. T.G. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8:601-627, 2002.
22. T.G. Crainic, M. Gendreau, P. Hansen, and N. Mladenovic. Cooperative Parallel Variable Neighborhood Search for the p-Median. Journal of Heuristics, 10(3), 2004.
23. T.G. Crainic and M. Toulouse. Parallel Strategies for Meta-heuristics. In F. Glover and G. Kochenberger, eds., Handbook of Metaheuristics, Kluwer Academic Publishers, Norwell, MA, pages 475-514, 2003.
24. I. de Falco, R. del Balio, and E. Tarantino. Testing parallel evolution strategies on the quadratic assignment problem. In Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, volume 5, pages 254-259, 1993.
25. K.F. Doerner, R.F. Hartl, G. Kiechle, M. Lucka, and M. Reimann. Parallel Ant Systems for the Capacitated VRP. In J. Gottlieb and G.R. Raidl, editors, EvoCOP'04, pages 72-83. Springer-Verlag, 2004.
26. M. Dorigo. Optimization, Learning and Natural Algorithms. PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, 1992.
27. D.C. Dracopoulos and S. Kent. Bulk synchronous parallelisation of genetic programming. Technical report, Brunel University, 1996.
28. A. Oram (Ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, 2001.
29. M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuss, and M. Schoenauer. A framework for distributed evolutionary algorithms. Proceedings of PPSN VII, Granada, Spain, LNCS 2439, pages 665-675, 2002.
30. T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109-133, 1995.
31. F. Fernandez, M. Tomassini, W.F. Punch III, and J.M. Sánchez-Pérez. Experimental study of multipopulation parallel genetic programming. In Proc. of the European Conf. on GP, pages 283-293. Springer, 2000.
32. C.N. Fiechter. A parallel tabu search algorithm for large traveling salesman problems. Discrete Applied Mathematics, 51:243-267, 1994.
33. G. Folino, C. Pizzuti, and G. Spezzano. CAGE: A tool for parallel genetic programming applications. In J.F. Miller et al., editor, Proceedings of EuroGP'2001, LNCS 2038, pages 64-73, Italy, 2001. Springer-Verlag.
34. I. Foster and C. Kesselman (eds.). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, 1999.
35. L.M. Gambardella, E.D. Taillard, and G. Agazzi. MACS-VRPTW: A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 63-76. McGraw-Hill, 1999.
36. F. Garcia-Lopez, B. Melián-Batista, J. Moreno-Perez, and J.M. Moreno-Vega. Parallelization of the Scatter Search. Parallel Computing, 29:575-589, 2003.
37. F. Garcia-Lopez, B. Melián-Batista, J.A. Moreno-Perez, and J.M. Moreno-Vega. The Parallel Variable Neighborhood Search for the p-Median Problem. Journal of Heuristics, 8(3):375-388, 2002.
38. F. Garcia-Lopez, M. Garcia Torres, B. Melián-Batista, J. Moreno-Perez, and J.M. Moreno-Vega. Solving Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, 2006. To appear.
39. F. Glover. Tabu Search, part I. ORSA Journal on Computing, 1:190-206, 1989.
40. E.D. Goodman. An Introduction to GALOPPS v3.2. Technical Report 96-07-01, GARAGE, I.S. Lab., Dpt. of C.S. and C.C.C.A.E.M., Michigan State Univ., East Lansing, MI, 1996.
41. M. Gorges-Schleuter. ASPARAGOS: An asynchronous parallel genetic optimization strategy. In Proceedings of the Third International Conference on Genetic Algorithms, pages 422-427. Morgan Kaufmann, 1989.
42. M. Gorges-Schleuter. An Analysis of Local Selection in Evolution Strategies. In W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, and R.E. Smith, editors, Proc. of the Genetic and Evolutionary Computation Conference, volume 1, pages 847-854, 1999.
43. A.M. Haldar, A. Nayak, A. Choudhary, and P. Banerjee. Parallel Algorithms for FPGA Placement. Proc. of the Great Lakes Symposium on VLSI (GVLSI 2000), Chicago, IL, 2000.
44. F. Herrera and M. Lozano. Gradual distributed real-coded genetic algorithm. IEEE Transactions on Evolutionary Computation, 4:43-63, 2000.
45. T. Hiroyasu, M. Miki, and M. Negami. Distributed Genetic Algorithms with Randomized Migration Rate. In Proc. of the IEEE Conf. on Systems, Man and Cybernetics, volume 1, pages 689-694. IEEE Press, 1999.
46. D. Janaki Ram, T.H. Sreenivas, and K.G. Subramaniam. Parallel Simulated Annealing Algorithms. Journal of Parallel and Distributed Computing, 37:207-212, 1996.
47. N. Jozefowiez, F. Semet, and E.-G. Talbi. Parallel and Hybrid Models for Multi-Objective Optimization: Application to the VRP. In Parallel Problem Solving from Nature VII, pages 271-280, 2002.
48. H. Juille and J.B. Pollack. Massively parallel genetic programming. In Peter J. Angeline and K.E. Kinnear, Jr., editors, Advances in Genetic Programming 2, pages 339-358. MIT Press, Cambridge, MA, USA, 1996.
49. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by Simulated Annealing. Science, 220(4598):671-680, 1983.
50. S.A. Kravitz and R.A. Rutenbar. Placement by simulated annealing on a multiprocessor. IEEE Trans. in Computer Aided Design, 6:534-549, 1987.
51. A. Le Bouthillier and T.G. Crainic. Co-Operative Parallel Method for Vehicle Routing Problems with Time Windows. Computers & Operations Research, 32(7):1685-1708, 2005.
52. S.Y. Lee and K.G. Lee. Synchronous and asynchronous parallel simulated annealing with multiple Markov chains. IEEE Transactions on Parallel and Distributed Systems, 7:993-1008, 1996.
53. Y. Li, P.M. Pardalos, and M.G.C. Resende. A greedy randomized adaptive search procedure for QAP. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 16:237-261, 1994.
54. S.-L. Lin, W.F. Punch, and E.D. Goodman. Coarse-Grain Parallel Genetic Algorithms: Categorization and New Approach. In Sixth IEEE Symp. on Parallel and Distributed Processing, pages 28-37, 1994.
55. F.G. Lobo, C.F. Lima, and H. Martires. An architecture for massively parallelization of the compact genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004), LNCS 3103, pages 412-413. Springer-Verlag, 2004.
56. J.A. Lozano, R. Sagarna, and P. Larrañaga. Parallel Estimation of Distribution Algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, pages 129-145, 2001.
57. S.L. Martins, C.C. Ribeiro, and M.C. Souza. A parallel GRASP for the Steiner problem in graphs. LNCS 1457, pages 285-297, 1998.
58. M. Mejia-Olvera and E. Cantú-Paz. DGENESIS-software for the execution of distributed genetic algorithms. In Proceedings XX Conf. Latinoamericana de Informática, pages 935-946, 1994.
59. R. Mendes, J.R. Pereira, and J. Neves. A Parallel Architecture for Solving Constraint Satisfaction Problems. In Proceedings of Metaheuristics Int. Conf. 2001, volume 2, pages 109-114, Porto, Portugal, 2001.
60. A. Mendiburu, J.A. Lozano, and J. Miguel-Alonso. Parallel estimation of distribution algorithms: New approaches. Technical Report EHU-KAT-IK-1-3, Department of Computer Architecture and Technology, The University of the Basque Country, 2003.
61. A. Mendiburu, J. Miguel-Alonso, and J.A. Lozano. Implementation and performance evaluation of a parallelization of estimation of Bayesian networks algorithms. Technical Report EHU-KAT-IK-XX-04, Computer Architecture and Technology, 2004. Submitted to Parallel Computing.
62. R. Michel and M. Middendorf. An Island Model Based Ant System with Lookahead for the Shortest Supersequence Problem. In A.E. Eiben et al., editor, Fifth Int. Conf. on Parallel Problem Solving from Nature, LNCS 1498, pages 692-701. Springer-Verlag, 1998.
63. M. Middendorf, F. Reischle, and H. Schmeck. Multi Colony Ant Algorithms. Journal of Heuristics, 8:305-320, 2002.
64. M. Miki, T. Hiroyasu, and M. Kasai. Application of the temperature parallel simulated annealing to continuous optimization problems. IPSJ Transactions, 41:1607-1616, 2000.
65. N. Mladenovic and P. Hansen. Variable neighborhood search. Computers and Operations Research, 24:1097-1100, 1997.
66. H. Mühlenbein, T. Mahnig, and A. Ochoa. Schemata, distributions and graphical models in evolutionary optimization. Journal of Heuristics, 5(2):215-247, 1999.
67. A.J. Nebro, E. Alba, and F. Luna. Multi-Objective Optimization Using Grid Computing. Soft Computing Journal, 2005. To appear.
68. J. Ocenasek and J. Schwarz. The parallel Bayesian optimization algorithm. In European Symp. on Comp. Intelligence, pages 61-67, 2000.
69. J. Ocenasek and J. Schwarz. The distributed Bayesian optimization algorithm for combinatorial optimization. In Evolutionary Methods for Design, Optimisation and Control, pages 115-120, 2001.
70. P. Pardalos, L. Pitsoulis, and M.G. Resende. A parallel GRASP implementation for the quadratic assignment problem. Parallel Algorithms for Irregular Problems: State of the Art, A. Ferreira and J. Rolim, eds., Kluwer, pages 115-133, 1995.
71. K.E. Parsopoulos, D.K. Tasoulis, N.G. Pavlidis, V.P. Plagianakos, and M.N. Vrahatis. Vector Evaluated Differential Evolution for Multiobjective Optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 204-211, 2004.
72. S.C. Porto and C.C. Ribeiro. Parallel tabu search message-passing synchronous strategies for task scheduling under precedence constraints. Journal of Heuristics, 1:207-223, 1995.
73. W. Punch. How effective are multiple populations in genetic programming. In J.R. Koza et al., editor, Genetic Programming 1998: Proc. of the Third Annual Conference, pages 308-313. Morgan Kaufmann, 1998.
74. M. Rahoual, R. Hadji, and V. Bachelet. Parallel Ant System for the Set Covering Problem. In M. Dorigo et al., editor, 3rd Intl. Workshop on Ant Algorithms, LNCS 2463, pages 262-267. Springer-Verlag, 2002.
75. M. Randall and A. Lewis. A Parallel Implementation of Ant Colony Optimization. Journal of Parallel and Distributed Computing, 62(9):1421-1432, 2002.
76. G. Robbins. EnGENEer - The evolution of solutions. In Proc. of the Fifth Annual Seminar on Neural Networks and Genetic Algorithms, 1992.
77. P. Roussel-Ragot and G. Dreyfus. A problem-independent parallel implementation of simulated annealing: Models and experiments. IEEE Transactions on Computer-Aided Design, 9:827-835, 1990.
78. J. Rowe, K. Vinsen, and N. Marvin. Parallel GAs for Multiobjective Functions. In Proc. of the 2nd Nordic Workshop on Genetic Algorithms and Their Applications (2NWGA), pages 61-70, 1996.
79. G. Rudolph. Global optimization by means of distributed evolution strategies. In H.P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, volume 496, pages 209-213, 1991.
80. V. Schnecke and O. Vornberger. Hybrid Genetic Algorithms for Constrained Placement Problems. IEEE Transactions on Evolutionary Computation, 1(4):266-277, 1997.
81. M. Schütz and J. Sprave. Application of Parallel Mixed-Integer Evolution Strategies with Mutation Rate Pooling. In L.J. Fogel, P.J. Angeline, and T. Bäck, editors, Proc. Fifth Annual Conf. on Evolutionary Programming (EP'96), pages 345-354. The MIT Press, 1996.
82. J. Stender, editor. Parallel Genetic Algorithms: Theory and Applications. IOS Press, Amsterdam, The Netherlands, 1993.
83. J. Sprave. Linear Neighborhood Evolution Strategies. In A.V. Sebald and L.J. Fogel, editors, Proc. Third Annual Conf. on Evolutionary Programming (EP'94), pages 42-51. World Scientific, Singapore, 1994.
84. N. Srinivas and K. Deb. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221-248, 1995.
85. T.J. Stanley and T. Mudge. A Parallel Genetic Algorithm for Multiobjective Microprocessor Design. In Proc. of the Sixth Int. Conf. on Genetic Algorithms, pages 597-604, 1995.
86. R. Szmit and A. Barak. Evolution strategies for a parallel multi-objective genetic algorithm. In D. Whitley et al., editor, GECCO'00, pages 227-234. Morgan Kaufmann, 2000.
87. T. Stützle. Parallelization Strategies for Ant Colony Optimization. In R. De Leone, A. Murli, P. Pardalos, and G. Toraldo, editors, High Performance Algorithms and Software in Nonlinear Optimization, volume 24 of Applied Optimization, pages 87-100. Kluwer, 1998.
88. E.-G. Talbi, O. Roux, C. Fonlupt, and D. Robillard. Parallel Ant Colonies for Combinatorial Optimization Problems. In Feitelson & Rudolph (Eds.), Job
Scheduling Strategies for Parallel Processing: IPPS'95 Workshop, Springer LNCS 949, volume 11, 1999.
89. E.G. Talbi, Z. Hafidi, and J.M. Geib. A parallel adaptive tabu search approach. Parallel Computing, 24:2003-2019, 1996.
90. R. Tanese. Distributed genetic algorithms. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 434-439. Morgan Kaufmann, 1989.
91. S. Tongchim and P. Chongstitvatana. Parallel Genetic Algorithm with Parameter Adaptation. Information Processing Letters, 82(1):47-54, 2002.
92. R. Venkateswaran, Z. Obradović, and C.S. Raghavendra. Cooperative Genetic Algorithm for Optimization Problems in Distributed Computer Systems. In Proc. of the Second Online Workshop on Evolutionary Computation, pages 49-52, 1996.
93. K. Weinert, J. Mehnen, and G. Rudolph. Dynamic neighborhood structures in parallel evolution strategies. Complex Systems, 13(3):227-244, 2002.
94. D. Whitley and T. Starkweather. GENITOR II: A distributed genetic algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 2:189-214, 1990.
95. Y. Davidor. A naturally occurring niche and species phenomenon: The model and first results. In R.K. Belew and L.B. Booker, editors, Proc. of the Fourth Intl. Conf. on Genetic Algorithms, pages 257-263, 1991.
Part II
Parallel Metaheuristic Models
5
Parallel Genetic Algorithms
GABRIEL LUQUE, ENRIQUE ALBA, BERNABÉ DORRONSORO
Universidad de Málaga, Spain
5.1 INTRODUCTION

Genetic Algorithms (GAs) [22, 28] are a subfamily of Evolutionary Algorithms (EAs) [11], which are stochastic search methods designed for exploring complex problem spaces in order to find optimal solutions using minimal information on the problem to guide the search. Unlike other optimization techniques, GAs are characterized by using a population of multiple structures (individuals) to perform the search through many different areas of the problem space at the same time. The individuals encode tentative solutions, which are manipulated competitively by applying to them some stochastic operators to find a satisfactory, if not globally optimal, solution. An outline of a classical GA is described in Algorithm 1.

Algorithm 1. Pseudocode of a canonical GA
Generate(P(0));
Evaluate(P(0));
t := 0;
while not Termination-Criterion(P(t)) do
    P'(t) := Selection(P(t));
    P''(t) := Recombination(P'(t));
    P'''(t) := Mutation(P''(t));
    Evaluate(P'''(t));
    P(t + 1) := Replace(P(t), P'''(t));
    t := t + 1;
endwhile
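To complement Algorithm 1, the following is a minimal, hedged sketch of a canonical GA written in Python. The bit-string encoding, binary tournament selection, one-point crossover, bit-flip mutation, and all parameter values are illustrative assumptions rather than details prescribed by the pseudocode above.

import random

def canonical_ga(fitness, n_bits, pop_size=50, p_cross=0.8, p_mut=0.01, max_gens=100):
    # Generate and evaluate the initial population P(0)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(max_gens):                       # termination criterion
        def tournament():
            # Selection: binary tournament (an illustrative choice)
            a, b = random.randrange(pop_size), random.randrange(pop_size)
            return pop[a] if fit[a] >= fit[b] else pop[b]
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            # Recombination: one-point crossover with probability p_cross
            if random.random() < p_cross and n_bits > 1:
                cut = random.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Mutation: independent bit flips with probability p_mut
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < p_mut:
                        child[i] = 1 - child[i]
                offspring.append(child)
        # Evaluation and purely generational replacement of P(t) by the offspring
        pop = offspring[:pop_size]
        fit = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Example usage: maximize the number of ones in a 30-bit string (OneMax)
# best_individual, best_fitness = canonical_ga(sum, 30)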
A GA proceeds in an iterative way by successively generating a new population P(t) of individuals from a population P(t - 1) (t = 1, 2, 3, ...). The initial population P(0) is generated randomly. A fitness function associates a value to every
individual, which is meaningful of its suitability to the problem in hand. The canonical algorithm applies stochastic operators such as selection, crossover, and mutation on a population in order to compute a whole generation of new individuals. In a general formulation, we apply variation operators to create a temporary population P'(t), whose individuals are evaluated; then, a new population P(t + 1) is obtained by using P'(t) and, optionally, P(t). The stopping criterion is usually set as reaching a preprogrammed number of iterations of the algorithm and/or finding an individual with a given error if the optimum, or an approximation to it, is known beforehand.

For nontrivial problems, the execution of the reproductive cycle of a simple GA may require high computational resources (e.g., large memory and very long search times), and thus a variety of algorithmic issues have been studied to design efficient GAs. For this goal, numerous advances are continuously being achieved by designing new operators, hybrid algorithms, termination criteria, and so on [11]. In this chapter, we address one such improvement, consisting in adding parallelism to GAs.

Parallel GAs (PGAs) are becoming very popular [16], and there exists a large number of implementations and algorithms. The reasons for this success have to do, first, with the fact that GAs are naturally prone to parallelism, since most variation operators can be easily undertaken in parallel. However, the truly interesting observation is that the use of a structured population, that is, a spatial distribution of individuals, in the form of either a set of islands [46] or a diffusion grid [41], is responsible for such benefits. As a consequence, many authors do not use a parallel machine at all to run PGAs, and still get better results than with serial traditional GAs.

The main goal of this chapter is to provide a survey of the different models and implementations concerning PGAs. In addition, in order to illustrate the working principles of PGAs, we test the behavior of some of the most important proposed models to solve the same problem; therefore, we intend to provide what we hope to be a useful comparison among the main models for parallelizing GAs existing in the literature.

This chapter is organized as follows. First, a description of the standard model of GA, in which the whole population is considered as a single pool of individuals, is given. In the next section, we address the structured models, in which the population is decentralized somehow. Next, some different implementations of PGAs are presented, and a PGA classification is given. In Section 5.5, we test and compare the behavior of several parallel models when solving an instance of the well-known MAXSAT problem. Finally, we summarize the most important conclusions.
5.2 PANMICTIC GENETIC ALGORITHMS
In the field of GAs, it is customary to find algorithms in which the population structure is panmictic. Thus, selection takes place globally and any individual can potentially mate with any other one. The same holds for the replacement operator, where any individual can potentially be removed from the pool and replaced by a new one. In contrast, there exists a different (decentralized) selection model, in which individuals
are arranged spatially, therefore giving place to structured GAs (see Section 5.3). Most other operators, such as recombination or mutation, can be readily applied to these two models (i.e., to panmictic and structured populations).

There exist two popular classes of panmictic GAs, having different granularity at the reproductive step [43]. In the first one, called a "generational" model, a whole new population of λ individuals replaces the old one (right part of Fig. 5.1, where μ is the population size). The second type is called "steady state" since usually one (λ = 1) or two (λ = 2) new individuals are created at every step of the algorithm and then they are inserted back into the population, consequently coexisting with their parents (left part of Fig. 5.1). We relate these two kinds of panmictic GAs in terms of the number of new individuals being inserted into the population of the next generation in Fig. 5.1. As can be seen in the figure, in the region in between (where 1 < λ < μ), there exists a plethora of selection models generically termed "generation gap" algorithms, in which a given number of individuals (the λ value) are replaced with the new ones. Clearly, generational and steady state selection are two special subclasses of generation gap algorithms.
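To make the role of λ concrete, the short Python sketch below performs a single replacement step parameterized by λ (the generation gap): λ = 1 yields the steady-state scheme and λ = μ the generational one. The replace-worst policy and the maximization setting are illustrative assumptions, not the only possible choices.

def generation_gap_step(pop, fitness, make_offspring, lam):
    # One step of a generation gap GA: create lam offspring and insert them
    # in place of the lam worst individuals of the population (maximization).
    # lam = 1 -> steady state; lam = len(pop) -> generational replacement.
    offspring = [make_offspring(pop) for _ in range(lam)]
    ranked = sorted(range(len(pop)), key=lambda i: fitness(pop[i]))  # worst first
    for idx, child in zip(ranked[:lam], offspring):
        pop[idx] = child
    return pop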
1200). However, one of the disadvantages of the approach is the large amount of communication required in maintaining and updating the pheromone matrix.
19.12 VEHICLE ROUTING PROBLEMS

The vehicle routing problem (VRP) is one of the central problems in operations research and combinatorial optimization, with numerous applications in transportation, telecommunications, production planning, etc. The VRP may be briefly described as follows. Given one or more depots, a fleet of vehicles, homogeneous or not, and a set of customers with known or forecast demands, find a set of closed routes, originating and ending at one of the depots, to service all customers at minimum cost, while satisfying vehicle and depot capacity constraints. Other constraints may be added to this core problem, e.g., time restrictions, yielding a rich set of problem variants. Most VRP problems are NP-hard and exact solution methods address limited-size problem instances only.
19.12.1 Parallel Metaheuristics for the VRP

Rego and Roucairol [121] proposed a TS approach for the VRP based on ejection chains and an independent multithread parallel version where each thread used a different set of parameter settings but started from the same solution. The method was implemented in a master-slave setting, where each slave executed a complete sequential TS. The master gathered the solutions found by the threads, selected the
overall best, and reinitialized the threads for a new search. Low level parallelism was used to accelerate the move evaluations of the individual searches as well as in a postoptimization phase. Experiments showed the method to be competitive on a set of standard VRP instances (Christofides, Mingozzi, and Toth [27]).

Ochi et al. [110] (see also Drummond, Ochi, and Vianna [52, 53]) proposed a pC/C/MPSS coarse-grained PGA based on the island model for the vehicle routing problem with heterogeneous fleet. A petal decomposition procedure was used to build the initial population. The population was then divided into several disjoint subpopulations. Each GA thread evolved a subpopulation and triggered migration when subpopulation renewal was necessary. An island in this case would broadcast its need and receive the best individual of every other island. The incoming individuals would replace the worst individuals of the receiving population. Computational tests show encouraging results in terms of solution quality and computing effort.

Alba and Dorronsoro [1] addressed the VRP in which the routes have to be limited by a predefined travel time and proposed a fine-grained, cellular PGA. The population was arranged in a two-dimensional toroidal grid, each individual having 4 neighbors. Binary tournament selection was applied when selecting the mate for the first parent. Crossover was applied to these parents, then mutation and local search to the offspring. Two local search procedures were tested, 2-opt and 2-opt+λ-Interchange, with λ ∈ {1, 2}. Elitist replacement was used. The authors compared their algorithm to classical heuristics, the TS of Rochat and Taillard [128], the GAs of Prins and Taillard [117] and Berger and Barkaoui [15], and the ant algorithms of Bullnheimer, Hartl, and Strauß [18] and Reimann, Doerner, and Hartl [122]. Computational results on benchmark problem instances showed high performance quality for both local search versions. Best performance (solution quality and rapidity) was observed for 2-opt+1-Interchange.

19.12.2 Vehicle Routing with Time Constraints

Also known as the Vehicle Routing Problem with Time Windows (VRPTW), this problem specifies that service at customer sites must take place within given time intervals. Most time constraints specify that service cannot begin before a certain moment (but vehicles may wait "outside", in most cases) and must be over by a given deadline. In soft-constrained versions, the time limits may be violated at a penalty.

Czech and Czarnas [46] proposed a pC/KS/MPSS cooperative multithread PSA implemented on a master-slave platform. The master sent the initial solution to the slaves. It was also in charge of controlling the annealing procedure temperature schedule, collecting the best local solution from each slave after n² iterations for each temperature level (n was the number of customers), and updating the global best solution. Each slave ran a SA algorithm with the same parameters. Each slave j cooperated with slaves j - 1 and j + 1 (slave 1 cooperated with slave 2 only) by exchanging best solutions. Cooperation was triggered every n iterations. Computational tests with few (five) processes showed good performance, in terms of solution quality, compared to the best-known solutions of the Solomon benchmarks.
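As a rough illustration of the cellular PGA structure of Alba and Dorronsoro described above (a two-dimensional toroidal grid in which each individual interacts only with its four neighbors and mates are chosen by binary tournament), here is a hedged Python sketch; the grid representation and all identifiers are assumptions made for illustration only.

import random

def von_neumann_neighbors(x, y, width, height):
    # North, south, east, and west cells on a toroidal (wrap-around) grid
    return [((x - 1) % width, y), ((x + 1) % width, y),
            (x, (y - 1) % height), (x, (y + 1) % height)]

def select_mate(grid, fitness, x, y, width, height):
    # grid is assumed to map (x, y) cells to individuals; the mate is chosen
    # by binary tournament restricted to the 4-neighborhood of cell (x, y)
    a, b = random.sample(von_neumann_neighbors(x, y, width, height), 2)
    return grid[a] if fitness(grid[a]) >= fitness(grid[b]) else grid[b]

# In a cellular GA, each cell (x, y) recombines its own individual with
# select_mate(...), applies mutation and local search to the offspring, and
# keeps the offspring only if it is not worse than the current occupant
# (elitist replacement), as in the algorithm described above.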
Berger and Barkaoui [14] presented a low level parallel hybrid GA that used two populations. The first one aimed to minimize the total traveled distance, while the second aimed to minimize the violation of the time window constraints. A different fitness function was associated with each population. A master-slave platform was applied, where the master controlled the execution of the algorithm and coordinated the genetic operations. The slaves concurrently executed the reproduction and mutation operators. Computational tests were conducted on a cluster of heterogeneous machines (19 computers). The authors compared their algorithm to the best-known methods in the literature for Solomon's benchmark. Their results showed that the proposed technique was competitive.

Taillard [143] proposed a pC/KS/MPSS parallel TS based on domain decomposition. The domain was partitioned and vehicles were allocated to the resulting regions. Once the initial partition was performed, each subproblem was solved by an independent TS. All processors stopped after a number of iterations that varied according to the total number of iterations already performed. The partition was then modified by an information exchange phase, during which tours, undelivered cities, and empty vehicles were exchanged between adjacent processors (corresponding to neighboring regions). This approach successfully addressed a number of problem instances. The synchronization inherent in the design of the strategy hindered its performance, however.

Rochat and Taillard [128] proposed what may be considered as the first fully developed adaptive memory-based approach for the VRPTW. The adaptive memory contained tours of good solutions identified by the TS threads. The tours were ranked according to attribute values, including the objective values of their respective solutions. Each TS process then probabilistically selected tours in the memory, constructed an initial solution, improved it, and returned the corresponding tours to the adaptive memory. Despite the fact that it used a rather simple TS, this method produced many new best results at publication time. Taillard et al. [145] and Badeau et al. [5] refined this method by enriching the neighborhood and the intensification phase and by adding a post-optimization procedure. The authors reported 14 new best solutions for the standard Solomon data set [138].

Gehring and Homberger [64] (see also Homberger and Gehring [75]) proposed a pC/C/MPSS cooperative parallel strategy where concurrent searches were performed with differently configured two-phase metaheuristics. The first phase tried to minimize the number of vehicles by using an evolutionary metaheuristic, while the second phase aimed to minimize the total traveled distance by means of a TS. The parallel metaheuristic was initiated on different threads with different starting points and values for the search time available for the first and second search phases. Threads cooperated by exchanging solutions asynchronously through a master process. For now, this approach has produced, on average, the best solutions for the Solomon problems with respect to the number of vehicles and the total distance. Results were also presented on larger instances, generated similarly to the original Solomon problems but varying in size from 200 to 1000 customers. It is worth mentioning, however, that this method is rather time-consuming compared to other metaheuristics, TS in particular.
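The adaptive memory mechanism of Rochat and Taillard described above can be illustrated, in hedged form, as follows: tours from good solutions are stored, ranked, and probabilistically re-combined into new starting solutions. The pool capacity, the linear rank-biased sampling rule, and all names below are assumptions made for illustration only and are not taken from the original paper.

import random

class AdaptiveMemory:
    # Pool of tours taken from good solutions, ranked by the objective value
    # of the solution they came from (lower cost = better, minimization)
    def __init__(self, capacity=200):
        self.tours = []                 # list of (cost_of_parent_solution, tour)
        self.capacity = capacity

    def add_solution(self, cost, routes):
        self.tours.extend((cost, tuple(r)) for r in routes)
        self.tours.sort(key=lambda t: t[0])          # best-ranked tours first
        del self.tours[self.capacity:]

    def sample_partial_solution(self, customers):
        # Repeatedly draw tours with a bias towards better ranks, keeping only
        # those compatible with the customers not yet covered; leftover
        # customers are handled later by a constructive heuristic.
        remaining, chosen = set(customers), []
        candidates = list(self.tours)
        while candidates:
            n = len(candidates)
            weights = [n - i for i in range(n)]      # linear rank bias
            i = random.choices(range(n), weights=weights, k=1)[0]
            _, tour = candidates.pop(i)
            if set(tour) <= remaining:               # tour uses only free customers
                chosen.append(list(tour))
                remaining -= set(tour)
        return chosen, remaining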
Le Bouthillier and Crainic [85] proposed a central memory pC/C/MPDS parallel metaheuristic where several TS and GA threads cooperate. In this model, the central memory constituted the population common to all genetic threads. Each GA had its own parent selection and crossover operators. The offspring were returned to the pool to be enhanced by two TS procedures. The central memory followed the same rules as in the work of Crainic and Gendreau [35]. Experimental results show that, without any particular calibration, the parallel metaheuristic obtained solutions whose quality is comparable to that of the best metaheuristics available, with almost linear speedups.

19.12.3 Dynamic Problems
Gendreau, Laporte, and Semet [66] addressed the deployment problem for a fleet of emergency vehicles and proposed a parallel TS based on domain decomposition. A master-slave implementation was performed where each slave addressed a subproblem associated with a vehicle. Computational tests showed high solution quality as indicated by territory coverage measures.

Attanasio et al. [3] addressed the multi-vehicle dial-a-ride problem and proposed two parallel strategies based on a multi-thread tabu search, a pC/C/SPDS and a pC/C/MPSS strategy. In the pC/C/SPDS approach, each processor ran a different tabu search strategy from the same initial solution. Once a processor found a new best solution, it broadcast it. Reinitialization searches were then launched. Every K iterations, a diversification procedure was applied to the first half of the processors, while an intensification was run on the remaining ones. The pC/C/MPSS strategy consisted in running various tabu search algorithms from different starting points. Each processor ran the same tabu search algorithm with the best known parameter settings. Moreover, every η iterations, processors exchanged information in order to perform a diversification procedure. According to the computational results, both the pC/C/SPDS and pC/C/MPSS strategies outperformed the sequential tabu search of Cordeau and Laporte [32].

Gendreau et al. [65] proposed a cooperative multithread parallel TS method for real-time routing and vehicle dispatching problems. The authors followed an adaptive memory approach. In an interesting development, the authors also exploited parallelism within each search thread by decomposing the set of routes along the same principles proposed in Taillard's work [143]. Very good results were obtained.
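The cooperative multithread patterns described in this subsection (several searches started from different points that periodically consult a shared best solution) can be sketched schematically as below. The Python threading layout, the exchange period, and the restart-from-best rule are illustrative assumptions, not details of the cited algorithms.

import threading

class SharedBest:
    # Central memory holding the best solution found by any thread (minimization)
    def __init__(self):
        self.lock = threading.Lock()
        self.cost, self.solution = float("inf"), None

    def update(self, cost, solution):
        with self.lock:
            if cost < self.cost:
                self.cost, self.solution = cost, solution

    def get(self):
        with self.lock:
            return self.cost, self.solution

def search_thread(local_search_step, initial, shared, iters=1000, exchange=50):
    cost, sol = initial
    for it in range(1, iters + 1):
        cost, sol = local_search_step(cost, sol)   # one iteration of the search
        shared.update(cost, sol)
        if it % exchange == 0:                     # periodic cooperation phase
            best_cost, best_sol = shared.get()
            if best_cost < cost:
                cost, sol = best_cost, best_sol    # restart from the shared best

# shared = SharedBest()
# threads = [threading.Thread(target=search_thread, args=(step_fn, start, shared))
#            for start in starting_points]
# for t in threads: t.start()
# for t in threads: t.join()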
19.13 SUMMARY

We have presented a survey of parallel metaheuristic methods applied to a rather broad set of problems: graph coloring and partitioning, Steiner tree problems, set covering and set partitioning, satisfiability and MAX-SAT problems, quadratic assignment, location and network design, traveling salesman, and vehicle routing problems. This survey is certainly not comprehensive. Important topics could not be covered and not all published contributions in the topics covered could be surveyed. The scope
of the chapter is sufficiently broad, however, to allow us to draw some conclusions and share a number of thoughts on the subject of parallel metaheuristic applications.

The survey illustrates the richness of contributions to the development of parallel metaheuristics as well as that of their applications to many important problems in science and practice. It also illustrates the fact that, this richness notwithstanding, one finds only a somewhat limited number of fundamental principles regarding how to design parallel metaheuristic procedures. We summarized these principles in the taxonomy section presented at the beginning of the chapter. To sum up, it appears that asynchronous cooperation enhances the performance of parallel metaheuristics independently of the methodology used in the initial sequential method. This conclusion is strongly supported by the results obtained by multithread cooperative strategies.

The survey also illustrates that not all application fields have been studied with comparable fervor. Indeed, many important topics have seen only a few contributions. Even for topics for which the number of contributions is larger, these are not evenly distributed among metaheuristic classes. Without trying to completely explain the phenomenon, one may observe correlations between the methodologies selected and the scientific field of most of the researchers that have addressed it. Interesting research avenues and promising developments may thus go unexplored, and appropriate tools may be missing in some areas. It should be a challenge of the profession to explore as comprehensively as possible as many problem types as possible. While taking up this challenge, one should make sure that methods are compared across methodological approaches and that such comparisons are performed fairly, that is, with all algorithmic developments at the same level of sophistication.

To conclude, parallel metaheuristics offer versatile and powerful tools to address large and complex problems. Many fascinating research avenues are open. Some address issues related to the design of parallel metaheuristics. Others concern the application of these designs to specific problems and the selection of the most appropriate one. We hope that this chapter has contributed to illustrate these opportunities and challenges.
Acknowledgments

Funding for this project has been provided by the Natural Sciences and Engineering Research Council of Canada and by the Fonds FQRNT of the Province of Québec.
REFERENCES
1. Alba, E. and Dorronsoro, B. Solving the Vehicle Routing Problem by Using Cellular Genetic Algorithms. In EvoCOP, pages 11-20, 2004.
2. Allwright, J.R.A. and Carpenter, D.B. A Distributed Implementation of Simulated Annealing for the Travelling Salesman Problem. Parallel Computing, 10:335-338, 1989.
3. Attanasio, A., Cordeau, J.F., Ghiani, G., and Laporte, G. Parallel tabu search heuristics for the dynamic multivehicle dial-a-ride problem. Parallel Computing, 30:377-387, 2004.

4. Azencott, R. Simulated Annealing Parallelization Techniques. John Wiley & Sons, New York, NY, 1992.
5. Badeau, P., Guertin, F., Gendreau, M., Potvin, J.-Y., and Taillard, E.D. A Parallel Tabu Search Heuristic for the Vehicle Routing Problem with Time Windows. Transportation Research C: Emerging Technologies, 5(2):109-122, 1997.

6. Banos, R., Gil, C., Ortega, J., and Montoya, F.G. A Parallel Multilevel Metaheuristic for Graph Partitioning. Journal of Heuristics, 10(4):315-336, 2004.

7. Banos, R., Gil, C., Ortega, J., and Montoya, F.G. Parallel Heuristic Search in Multilevel Graph Partitioning. In Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 88-95, 2004.

8. Baraglia, R., Hidalgo, J.I., and Perego, R. A Parallel Hybrid Heuristic for the TSP. In Boers, E.J.W., Cagnoni, S., Gottlieb, J., Hart, E., Lanzi, P.L., Gunther, R., Smith, R., and Tijink, H., editors, Applications of Evolutionary Computing. Proceedings of EvoCOP, EvoFlight, EvoIASP, EvoLearn, and EvoSTIM, volume 2037 of Lecture Notes in Computer Science, pages 193-202. Springer-Verlag, Heidelberg, 2001.
9. Barr, R.S. and Hickman, B.L. Reporting Computational Experiments with Parallel Algorithms: Issues, Measures, and Experts' Opinions. ORSA Journal on Computing, 5(1):2-18, 1993.

10. Bastos, M.P. and Ribeiro, C.C. Reactive Tabu Search with Path-Relinking for the Steiner Problem in Graphs. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 31-36. Kluwer Academic Publishers, Norwell, MA, 1999.

11. Battiti, R. and Tecchiolli, G. Parallel Biased Search for Combinatorial Optimization: Genetic Algorithms and TABU. Microprocessors and Microsystems, 16(7):351-367, 1992.
12. Battiti, R. and Tecchiolli, G. The Reactive Tabu Search. ORSA Journal on Computing, 6(2):126-140, 1994.

13. Beasley, J.E. Randomized Heuristic Schemes for the Set Covering Problem. Naval Research Logistics, 37:151-164, 1990.

14. J. Berger and M. Barkaoui. A Parallel Hybrid Genetic Algorithm for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 31(12):2037-2053, 2004.
15. Berger, J. and Barkaoui, M. A Hybrid Genetic Algorithm for the Capacitated Vehicle Routing Problem. In E. Cantú-Paz, editor, GECCO'03, pages 646-656. Springer-Verlag, 2003.

16. Bevilacqua, A. A Methodological Approach to Parallel Simulated Annealing. Journal of Parallel and Distributed Computing, 62:1548-1570, 2002.

17. Boissin, N. and Lutton, J.L. A Parallel Simulated Annealing Algorithm. Parallel Computing, 19(8):859-872, 1993.

18. Bullnheimer, B., Hartl, R., and Strauß, C. An Improved Ant System Algorithm for the Vehicle Routing Problem. Annals of Operations Research, 89:319-328, 1999.

19. Bullnheimer, B., Kotsis, G., and Strauß, C. Parallelization Strategies for the Ant System. Applied Optimization, 24:87-100, 1998.

20. Calegari, P., Guidec, F., Kuonen, P., and Kuonen, D. Parallel Island-Based Genetic Algorithm for Radio Network Design. Journal of Parallel and Distributed Computing, 47(1):86-90, 1997.

21. Cantú-Paz, E. A Survey of Parallel Genetic Algorithms. Calculateurs Parallèles, Réseaux et Systèmes Répartis, 10(2):141-170, 1998.

22. Chakrapani, J. and Skorin-Kapov, J. A Connectionist Approach to the Quadratic Assignment Problem. Computers & Operations Research, 19(3/4):287-295, 1992.

23. Chakrapani, J. and Skorin-Kapov, J. Connection Machine Implementation of a Tabu Search Algorithm for the Traveling Salesman Problem. Journal of Computing and Information Technology, 1(1):29-36, 1993.

24. Chakrapani, J. and Skorin-Kapov, J. Massively Parallel Tabu Search for the Quadratic Assignment Problem. Annals of Operations Research, 41:327-341, 1993.

25. Chakrapani, J. and Skorin-Kapov, J. Mapping Tasks to Processors to Minimize Communication Time in a Multiprocessor System. In The Impact of Emerging Technologies of Computer Science and Operations Research, pages 45-64. Kluwer Academic Publishers, Norwell, MA, 1995.

26. Chen, H., Flann, N.S., and Watson, D.W. Parallel Genetic Simulated Annealing: A Massively Parallel SIMD Algorithm. IEEE Transactions on Parallel and Distributed Systems, 9(2):126-136, 1998.

27. Christofides, N., Mingozzi, A., and Toth, P. The Vehicle Routing Problem. In N. Christofides, A. Mingozzi, P. Toth, and C. Sandi, editors, Combinatorial Optimization, pages 315-338. John Wiley, New York, 1979.

28. Cohoon, J., Hedge, S., Martin, W., and Richards, D. Punctuated Equilibria: A Parallel Genetic Algorithm. In J.J. Grefenstette, editor, Proceedings of the
Second International Conference on Genetic Algorithms and their Applications, pages 148-154. Lawrence Erlbaum Associates, Hillsdale, NJ, 1987.

29. Cohoon, J., Martin, W., and Richards, D. Genetic Algorithm and Punctuated Equilibria in VLSI. In Schwefel, H.-P. and Männer, R., editors, Parallel Problem Solving from Nature, volume 496 of Lecture Notes in Computer Science, pages 134-144. Springer-Verlag, Berlin, 1991a.

30. Cohoon, J., Martin, W., and Richards, D. A Multipopulation Genetic Algorithm for Solving the k-Partition Problem on Hyper-Cubes. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 134-144. Morgan Kaufmann, San Mateo, CA, 1991b.

31. Collins, R.J. and Jefferson, D.R. Selection in Massively Parallel Genetic Algorithms. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 249-256. Morgan Kaufmann, San Mateo, CA, 1991.

32. Cordeau, J.-F. and Laporte, G. A Tabu Search Heuristic for the Static Multivehicle Dial-a-Ride Problem. Transportation Research Part B, pages 579-594, 2003.

33. Crainic, T.G. Parallel Computation, Cooperation, Tabu Search. In C. Rego and B. Alidaee, editors, Adaptive Memory and Evolution: Tabu Search and Scatter Search, pages 283-302. Kluwer Academic Publishers, Norwell, MA, 2005.

34. Crainic, T.G. and Gendreau, M. Towards an Evolutionary Method - Cooperating Multithread Parallel Tabu Search Hybrid. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 331-344. Kluwer Academic Publishers, Norwell, MA, 1999.

35. Crainic, T.G. and Gendreau, M. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8(6):601-627, 2002.

36. Crainic, T.G., Gendreau, M., Hansen, P., and Mladenović, N. Cooperative Parallel Variable Neighborhood Search for the p-Median. Journal of Heuristics, 10(3):293-314, 2004.

37. Crainic, T.G. and Toulouse, M. Parallel Metaheuristics. In T.G. Crainic and G. Laporte, editors, Fleet Management and Logistics, pages 205-251. Kluwer Academic Publishers, Norwell, MA, 1998.

38. Crainic, T.G. and Toulouse, M. Parallel Strategies for Metaheuristics. In F. Glover and G. Kochenberger, editors, Handbook in Metaheuristics, pages 475-513. Kluwer Academic Publishers, Norwell, MA, 2003.

39. Crainic, T.G., Toulouse, M., and Gendreau, M. Parallel Asynchronous Tabu Search for Multicommodity Location-Allocation with Balancing Requirements. Annals of Operations Research, 63:277-299, 1995.
40. Crainic, T.G., Toulouse, M., and Gendreau, M. Synchronous Tabu Search Parallelization Strategies for Multicommodity Location-Allocation with Balancing Requirements. OR Spektrum, 17(2/3):113-123, 1995.

41. Crainic, T.G., Toulouse, M., and Gendreau, M. Towards a Taxonomy of Parallel Tabu Search Algorithms. INFORMS Journal on Computing, 9(1):61-72, 1997.

42. Crainic, T.G., Toulouse, M., and Li, Y. A Simple Cooperative Multilevel Algorithm for the Capacitated Multicommodity Network Design. Computers & O.R., 2005.

43. Cung, V.-D., Martins, S.L., Ribeiro, C.C., and Roucairol, C. Strategies for the Parallel Implementations of Metaheuristics. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308. Kluwer Academic Publishers, Norwell, MA, 2002.

44. Cung, V.-D., Mautor, T., Michelon, P., and Tavares, A. A Scatter Search Based Approach for the Quadratic Assignment Problem on Evolutionary Computation and Evolutionary Programming. In T. Baeck, Z. Michalewicz, and X. Yao, editors, Proceedings of the IEEE International Conference on Evolutionary Computation, pages 165-170. IEEE Press, 1997.

45. Czech, Z.J. A Parallel Genetic Algorithm for the Set Partitioning Problem. In 8th Euromicro Workshop on Parallel and Distributed Processing, pages 343-350, 2000.

46. Czech, Z.J. and Czarnas, P. Parallel simulated annealing for the vehicle routing problem with time windows. In 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, pages 376-383, 2002.

47. De Falco, I., Del Balio, R., Tarantino, E., and Vaccaro, R. Improving Search by Incorporating Evolution Principles in Parallel Tabu Search. In Proceedings International Conference on Machine Learning, pages 823-828, 1994.

48. R. Diekmann, R. Lüling, and J. Simon. Problem Independent Distributed Simulated Annealing and its Applications. In R.V.V. Vidal, editor, Lecture Notes in Economics and Mathematical Systems, volume 396, pages 17-44. Springer-Verlag, Berlin, 1993.

49. Diekmann, R., Lüling, R., Monien, B., and Sprüner, C. Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem. International Journal of Parallel Programming, 8:61-84, 1996.

50. Drias, H. and Ibri, A. Parallel ACS for Weighted MAX-SAT. In Mira, J. and Alvarez, J., editors, Artificial Neural Nets Problem Solving Methods - Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks, volume 2686 of Lecture Notes in Computer Science, pages 414-421. Springer-Verlag, Heidelberg, 2003.
51. Drias, H. and Khabzaoui, M. Scatter Search with Random Walk Strategy for SAT and Max-SAT Problems. In L. Monostori, Vancza, J., and A. Moonis, editors, Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2001, pages 35-44. Springer-Verlag, 2001.

52. Drummond, L.M.A., Ochi, L.S., and Vianna, D.S. A Parallel Hybrid Evolutionary Metaheuristic for the Period Vehicle Routing Problem. volume 1586 of Lecture Notes in Computer Science, pages 183-191, 1999.

53. Drummond, L.M.A., Ochi, L.S., and Vianna, D.S. An Asynchronous Parallel Metaheuristic for the Period Vehicle Routing Problem. Future Generation Computer Systems, 17:379-386, 2001.

54. Durand, M.D. Parallel Simulated Annealing: Accuracy vs. Speed in Placement. IEEE Design & Test of Computers, 6(3):8-34, 1989.
55. Felten, E., Karlin, S., and Otto, S.W. The Traveling Salesman Problem on a Hypercube, MIMD Computer. In Proceedings of the 1985 Int. Conf. on Parallel Processing, pages 6-10, 1985.

56. Fiechter, C.-N. A Parallel Tabu Search Algorithm for Large Travelling Salesman Problems. Discrete Applied Mathematics, 51(3):243-267, 1994.

57. Fiorenzo Catalano, M.S. and Malucelli, F. Randomized Heuristic Schemes for the Set Covering Problem. In M. Paprzyky, L. Tarricone, and T. Yang, editors, Practical Applications of Parallel Computing, pages 23-38. Nova Science, 2003.

58. Flores, S.D., Cegla, B.B., and Caceres, D.B. Telecommunication Network Design with Parallel Multiobjective Evolutionary Algorithms. In IFIP/ACM Latin America Networking Conference 2003, pages 1-11, 2003.

59. Folino, G., Pizzuti, C., and Spezzano, G. Combining Cellular Genetic Algorithms and Local Search for Solving Satisfiability Problems. In Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence, pages 192-198. IEEE Computer Society Press, 1998.

60. Folino, G., Pizzuti, C., and Spezzano, G. Solving the Satisfiability Problem by a Parallel Cellular Genetic Algorithm. In Proceedings of the 24th EUROMICRO Conference, pages 715-722. IEEE Computer Society Press, 1998.

61. Folino, G., Pizzuti, C., and Spezzano, G. Parallel Hybrid Method for SAT that Couples Genetic Algorithms and Local Search. IEEE Transactions on Evolutionary Computation, 5(4):323-334, 2001.

62. García-López, F., Melián-Batista, B., Moreno-Pérez, J.A., and Moreno-Vega, J.M. The Parallel Variable Neighborhood Search for the p-Median Problem. Journal of Heuristics, 8(3):375-388, 2002.
63. García-López, F., Melián-Batista, B., Moreno-Pérez, J.A., and Moreno-Vega, J.M. Parallelization of the Scatter Search for the p-Median Problem. Parallel Computing, 29:575-589, 2003.

64. Gehring, H. and Homberger, J. A Parallel Two-Phase Metaheuristic for Routing Problems with Time Windows. Asia-Pacific Journal of Operational Research, 18(1):35-47, 2001.
65. Gendreau, M., Guertin, F., Potvin, J.-Y., and Taillard, E.D. Tabu Search for Real-Time Vehicle Routing and Dispatching. Transportation Science, 33(4):381-390, 1999.

66. Gendreau, M., Laporte, G., and Semet, F. A Dynamic Model and Parallel Tabu Search Heuristic for Real-Time Ambulance Relocation. Parallel Computing, 27(12):1641-1653, 2001.

67. Gendron, B., Potvin, J.-Y., and Soriano, P. A Parallel Hybrid Heuristic for the Multicommodity Capacitated Location Problem with Balancing Requirements. Parallel Computing, 29:591-606, 2003.

68. Ghamlouche, I., Crainic, T.G., and Gendreau, M. Cycle-based Neighborhoods for Fixed-Charge Capacitated Multicommodity Network Design. Operations Research, 51(4):655-667, 2003.

69. Glover, F. and Laguna, M. Tabu Search. Kluwer Academic Publishers, Norwell, MA, 1997.

70. Greening, D.R. Asynchronous Parallel Simulated Annealing. Lectures in Complex Systems, 3:497-505, 1990.

71. Greening, D.R. Parallel Simulated Annealing Techniques. Physica D, 42:293-306, 1990.

72. Gunes, M., Sorges, U., and Bouazizi, I. ARA - The Ant Colony Based Routing Algorithm for MANETs. In Proceedings of the International Conference on Parallel Processing, pages 79-85, 2002.

73. Hidalgo, J.I., Prieto, M., Lanchares, J., Baraglia, R., Tirado, F., and Garnica, O. Hybrid Parallelization of a Compact Genetic Algorithm. In Proceedings of the 11th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 449-455, 2003.

74. Holmqvist, K., Migdalas, A., and Pardalos, P.M. Parallelized Heuristics for Combinatorial Search. In A. Migdalas, P.M. Pardalos, and S. Storoy, editors, Parallel Computing in Optimization, pages 269-294. Kluwer Academic Publishers, Norwell, MA, 1997.

75. Homberger, J. and Gehring, H. Two Evolutionary Metaheuristics for the Vehicle Routing Problem with Time Windows. INFOR, 37:297-318, 1999.
76. Jeong, C.-S. and Kim, M.-H. Parallel Algorithm for the TSP on SIMD Machines Using Simulated Annealing. In Proceedings of the International Conference on Application Specific Array Processors, pages 712-721, 1990.

77. Jeong, C.-S. and Kim, M.-H. Fast Parallel Simulated Annealing Algorithm for TSP on SIMD Machines with Linear Interconnections. Parallel Computing, 17:221-228, 1991.

78. Katayama, K., Hirabayashi, H., and Narihisa, H. Performance Analysis for Crossover Operators of Genetic Algorithm. Systems and Computers in Japan, 30:20-30, 1999.

79. Katayama, K., Hirabayashi, H., and Narihisa, H. Analysis of Crossovers and Selections in a Coarse-grained Parallel Genetic Algorithm. Mathematical and Computer Modelling, 38:1275-1282, 2003.

80. Knight, R.L. and Wainwright, R.L. HYPERGEN - A Distributed Genetic Algorithm on a Hypercube. In Proceedings of the 1992 IEEE Scalable High Performance Computing Conference, pages 232-235. IEEE Computer Society Press, Los Alamitos, CA, 1992.

81. Kohlmorgen, U., Schmeck, H., and Haase, K. Experiences with Fine-grained Parallel Genetic Algorithms. Annals of Operations Research, 90:203-219, 1999.

82. Kokosinski, Z., Kolodziej, M., and Kwarciany, K. Parallel Genetic Algorithm for Graph Coloring Problem. In Bubak, M., van Albada, G.D., and Sloot, P.M.A., editors, International Conference on Computational Science, volume 3036 of Lecture Notes in Computer Science, pages 215-222. Springer-Verlag, Heidelberg, 2004.

83. Laursen, P.S. Problem-Independent Parallel Simulated Annealing Using Selection and Migration. In Davidor, Y., Schwefel, H.-P., and Männer, R., editors, Parallel Problem Solving from Nature III, Lecture Notes in Computer Science 866, pages 408-417. Springer-Verlag, Berlin, 1994.

84. Laursen, P.S. Parallel Heuristic Search - Introductions and a New Approach. In A. Ferreira and P.M. Pardalos, editors, Solving Combinatorial Optimization Problems in Parallel, Lecture Notes in Computer Science 1054, pages 248-274. Springer-Verlag, Berlin, 1996.

85. Le Bouthillier, A. and Crainic, T.G. A Cooperative Parallel Meta-Heuristic for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 32(7):1685-1708, 2005.

86. Lee, K.-G. and Lee, S.-Y. Efficient Parallelization of Simulated Annealing Using Multiple Markov Chains: An Application to Graph Partitioning. In Mudge, T.N., editor, Proceedings of the International Conference on Parallel Processing, volume III: Algorithms and Applications, pages 177-180. CRC Press, 1992a.
87. Lee, K.-G. and Lee, S.-Y. Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains. volume 1027 of Lecture Notes in Computer Science, pages 396-408. Springer-Verlag, Berlin, 1995.

88. Lee, S.-Y. and Lee, K.-G. Asynchronous Communication of Multiple Markov Chains in Parallel Simulated Annealing. In Mudge, T.N., editor, Proceedings of the International Conference on Parallel Processing, volume III: Algorithms and Applications, pages 169-176. CRC Press, Boca Raton, FL, 1992b.

89. Levine, D. A Parallel Genetic Algorithm for the Set Partitioning Problem. In I.H. Osman and J.P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 23-35. Kluwer Academic Publishers, Norwell, MA, 1996.

90. Li, Y., Pardalos, P.M., and Resende, M.G.C. A Greedy Randomized Adaptive Search Procedure for the Quadratic Assignment Problem. In DIMACS Implementation Challenge, DIMACS Series on Discrete Mathematics and Theoretical Computer Science, volume 16, pages 237-261. American Mathematical Society, 1994.

91. Lin, S.-C., Punch, W., and Goodman, E. Coarse-Grain Parallel Genetic Algorithms: Categorization and New Approach. In Sixth IEEE Symposium on Parallel and Distributed Processing, pages 28-37. IEEE Computer Society Press, 1994.

92. Logar, A.M., Corwin, E.M., and English, T.M. Implementation of Massively Parallel Genetic Algorithms on the MasPar MP-1. In Proceedings of the 1992 IEEE ACM/SIGAPP Symposium on Applied Computing: Technological Challenges of the 1990's, pages 1015-1020. ACM Press, Kansas City, Missouri, 1992.

93. Malek, M., Guruswamy, M., Pandya, M., and Owens, H. Serial and Parallel Simulated Annealing and Tabu Search Algorithms for the Traveling Salesman Problem. Annals of Operations Research, 21:59-84, 1989.

94. Marchiori, E. and Rossi, C. A Flipping Genetic Algorithm for Hard 3-SAT Problems. In Banzhaf, W., Daida, J., Eiben, A.E., Garzon, M.H., Honavar, V., Jakiela, M., and Smith, R.E., editors, Proceedings of the Genetic and Evolutionary Computation Conference, pages 393-400. Morgan Kaufmann, San Mateo, CA, 1999.

95. Martins, S.L., Resende, M.G.C., Ribeiro, C.C., and Pardalos, P.M. A Parallel GRASP for the Steiner Tree Problem in Graphs Using a Hybrid Local Search Strategy. Journal of Global Optimization, 17:267-283, 2000.

96. Martins, S.L., Ribeiro, C.C., and Souza, M.C. A Parallel GRASP for the Steiner Problem in Graphs. In A. Ferreira and J. Rolim, editors, Proceedings of IRREGULAR'98 - 5th International Symposium on Solving Irregularly Structured Problems in Parallel, volume 1457 of Lecture Notes in Computer Science, pages 285-297. Springer-Verlag, 1998.

97. T. Maruyama, T. Hirose, and A. Konagaya. A Fine-Grained Parallel Genetic Algorithm for Distributed Parallel Systems. In S. Forrest, editor, Proceedings
of the Fifth International Conference on Genetic Algorithms, pages 184-190. Morgan Kaufmann, San Mateo, CA, 1993.

98. Middendorf, M., Reischle, F., and Schmeck, H. Information Exchange in Multi Colony Ant Algorithms. volume 1800 of Lecture Notes in Computer Science, pages 645-652. Springer-Verlag, Heidelberg, 2000.

99. Miki, M., Hiroyasu, T., Wako, J., and Yoshida, T. Adaptive Temperature Schedule Determined by Genetic Algorithm for Parallel Simulated Annealing. In CEC'03 - The 2003 Congress on Evolutionary Computation, volume 1, pages 459-466, 2003.

100. Mitchell, D., Selman, B., and Levesque, H. Hard and Easy Distribution of SAT Problems. In Rosenbloom, P. and Szolovits, P., editors, Proceedings of the Tenth National Conference on Artificial Intelligence, pages 459-465. AAAI Press, Menlo Park, CA, 1992.

101. H. Mühlenbein. Parallel Genetic Algorithms, Population Genetics and Combinatorial Optimization. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 416-421. Morgan Kaufmann, San Mateo, CA, 1989.

102. Mühlenbein, H. Evolution in Time and Space - The Parallel Genetic Algorithm. In G.J.E. Rawlins, editor, Foundations of Genetic Algorithms & Classifier Systems, pages 316-338. Morgan Kaufmann, San Mateo, CA, 1991.

103. Mühlenbein, H. Asynchronous Parallel Search by the Parallel Genetic Algorithm. In V. Ramachandran, editor, Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, pages 526-533. IEEE Computer Society Press, Los Alamitos, CA, 1991c.

104. Mühlenbein, H. Parallel Genetic Algorithms in Combinatorial Optimization. In O. Balci, R. Sharda, and S. Zenios, editors, Computer Science and Operations Research: New Developments in their Interface, pages 441-456. Pergamon Press, New York, NY, 1992.

105. Mühlenbein, H. How Genetic Algorithms Really Work: Mutation and Hill-Climbing. In R. Männer and B. Manderick, editors, Parallel Problem Solving from Nature, 2, pages 15-26. North-Holland, Amsterdam, 1992a.

106. Mühlenbein, H., Gorges-Schleuter, M., and Krämer, O. New Solutions to the Mapping Problem of Parallel Systems - the Evolution Approach. Parallel Computing, 6:269-279, 1987.

107. Mühlenbein, H., Gorges-Schleuter, M., and Krämer, O. Evolution Algorithms in Combinatorial Optimization. Parallel Computing, 7(1):65-85, 1988.

108. Mutalik, P.P., Knight, L.R., Blanton, J.L., and Wainwright, R.L. Solving Combinatorial Optimization Problems Using Parallel Simulated Annealing and Parallel Genetic Algorithms. In Proceedings of the 1992 IEEE ACM/SIGAPP
Symposium on Applied Computing: Technological Challenges of the 1990's, pages 1031-1038. ACM Press, Kansas City, Missouri, 1992.

109. Nabhan, T.M. and Zomaya, A.Y. A Parallel Simulated Annealing Algorithm with Low Communication Overhead. IEEE Transactions on Parallel and Distributed Systems, 6(12):1226-1233, 1995.

110. Ochi, L.S., Vianna, D.S., Drummond, L.M.A., and Victor, A.O. A Parallel Evolutionary Algorithm for the Vehicle Routing Problem with Heterogeneous Fleet. Future Generation Computer Systems, 14(3):285-292, 1998.

111. Ouyang, M., Toulouse, M., Thulasiraman, K., Glover, F., and Deogun, J.S. Multilevel Cooperative Search: Application to the Netlist/Hypergraph Partitioning Problem. In Proceedings of International Symposium on Physical Design, pages 192-198. ACM Press, 2000.

112. Ouyang, M., Toulouse, M., Thulasiraman, K., Glover, F., and Deogun, J.S. Multilevel Cooperative Search for the Circuit/Hypergraph Partitioning Problem. IEEE Transactions on Computer-Aided Design, 21(6):685-693, 2002.

113. Pardalos, P.M., Li, Y., and Murthy, K.A. Computational Experience with Parallel Algorithms for Solving the Quadratic Assignment Problem. In O. Balci, R. Sharda, and S. Zenios, editors, Computer Science and Operations Research: New Developments in their Interface, pages 267-278. Pergamon Press, New York, NY, 1992.

114. Pardalos, P.M., L. Pitsoulis, T. Mavridou, and Resende, M.G.C. Parallel Search for Combinatorial Optimization: Genetic Algorithms, Simulated Annealing, Tabu Search and GRASP. In A. Ferreira and J. Rolim, editors, Proceedings of the Workshop on Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, volume 980, pages 317-331. Springer-Verlag, Berlin, 1995.

115. Pardalos, P.M., Pitsoulis, L., and Resende, M.G.C. A Parallel GRASP Implementation for the Quadratic Assignment Problem. In A. Ferreira and J. Rolim, editors, Solving Irregular Problems in Parallel: State of the Art, pages 115-130. Kluwer Academic Publishers, Norwell, MA, 1995.

116. L. Pitsoulis, Pardalos, P.M., and Resende, M.G.C. A Parallel GRASP for MAX-SAT. In Wasniewski, J., Dongarra, J., Madsen, K., and Olesen, D., editors, Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization, volume 1180 of Lecture Notes in Computer Science, pages 575-585. Springer-Verlag, Berlin, 1996.
117. Prins, C. and Taillard, E.D. A Simple and Effective Evolutionary Algorithm for the Vehicle Routing Problem. Computers & Operations Research, 31(12):1985-2002, 2004.
118. Rahoual, M., Hadji, R., and Bachelet, V. Parallel Ant System for the Set Covering Problem. In Proceedings of the Third International Workshop on Ant Algorithms, pages 262-267. Springer-Verlag, London, UK, 2002.
119. Ram, D.J., Sreenivas, T.H., and Subramaniam, K.G. Parallel Simulated Annealing Algorithms. Journal of Parallel and Distributed Computing, 37:207-212, 1996.

120. Randall, M. and Lewis, A. A Parallel Implementation of Ant Colony Optimisation. Journal of Parallel and Distributed Computing, 62:1421-1432, 2002.

121. Rego, C. and Roucairol, C. A Parallel Tabu Search Algorithm Using Ejection Chains for the VRP. In I.H. Osman and J.P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 253-295. Kluwer Academic Publishers, Norwell, MA, 1996.

122. Reimann, M., Doerner, K., and Hartl, R. D-ants: Savings Based Ants Divide and Conquer the Vehicle Routing Problem. Computers & Operations Research, 31:563-591, 2004.

123. Resende, M.G.C. and Feo, T.A. A GRASP for Satisfiability. In Trick, M.A. and Johnson, D.S., editors, The Second DIMACS Implementation Challenge, DIMACS Series on Discrete Mathematics and Theoretical Computer Science, volume 26, pages 499-520. American Mathematical Society, 1996.

124. Resende, M.G.C. and Feo, T.A. Approximative Solution of Weighted MAX-SAT Problems Using GRASP. Discrete Mathematics and Theoretical Computer Science, 35:393-405, 1997.

125. Ribeiro, C.C. and Rosseti, I. A Parallel GRASP Heuristic for the 2-path Network Design Problem. 4 journee ROADEF, Paris, February 20-22, 2002.

126. Ribeiro, C.C. and Rosseti, I. A Parallel GRASP Heuristic for the 2-path Network Design Problem. Third Meeting of the PAREO Euro Working Group, Guadeloupe (France), May, 2002.

127. Ribeiro, C.C. and Rosseti, I. Parallel GRASP with Path-Relinking Heuristic for the 2-Path Network Design Problem. AIRO'2002, L'Aquila, Italy, September, 2002.

128. Rochat, Y. and Taillard, E.D. Probabilistic Diversification and Intensification in Local Search for Vehicle Routing. Journal of Heuristics, 1(1):147-167, 1995.

129. Sanvicente-Sanchez, H. and Frausto-Solis, J. MPSA: A Methodology to Parallelize Simulated Annealing and its Application to the Traveling Salesman Problem. volume 2313 of Lecture Notes in Computer Science, pages 89-97. Springer-Verlag, Heidelberg, 2002.

130. Selman, B., Kautz, H.A., and Cohen, B. Noise Strategies for Improving Local Search. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 337-343, 1994.
131. Selman, B., Levesque, H., and Mitchell, D. A New Method for Solving Hard Satisfiability Problems. In Rosenbloom, P. and Szolovits, P., editors, Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440-446. AAAI Press, Menlo Park, CA, 1992.

132. Sena, G.A., Megherbi, D., and Isern, G. Implementation of a Parallel Genetic Algorithm on a Cluster of Workstations: Traveling Salesman Problem, a Case Study. Future Generation Computer Systems, 17:477-488, 2001.

133. Shonkwiler, R. Parallel Genetic Algorithms. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 199-205. Morgan Kaufmann, San Mateo, CA, 1993.

134. Sleem, A., Ahmed, M., Kumar, A., and Kamel, K. Comparative Study of Parallel vs. Distributed Genetic Algorithm Implementation for ATM Networking Environment. In Fifth IEEE Symposium on Computers and Communications, pages 152-157, 2000.

135. Sohn, A. Parallel Satisfiability Test with Synchronous Simulated Annealing on Distributed Memory Multiprocessor. Journal of Parallel and Distributed Computing, 36:195-204, 1996.

136. Sohn, A. and Biswas, R. Satisfiability Tests with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor. In Proceedings of the International Conference on Supercomputing, pages 213-220, 1996.

137. Solar, M., Parada, V., and Urrutia, R. A Parallel Genetic Algorithm to Solve the Set-Covering Problem. Computers & Operations Research, 29(9):1221-1235, 2002.

138. Solomon, M.M. Time Window Constrained Routing and Scheduling Problems. Operations Research, 35:254-265, 1987.

139. Spears, W.M. Simulated Annealing for Hard Satisfiability Problems. In Johnson, D.S. and Trick, M.A., editors, Cliques, Coloring, and Satisfiability, volume 26, pages 533-558. American Mathematical Society, 1996.

140. Stützle, T. Parallelization Strategies for Ant Colony Optimization. In Eiben, A.E., Bäck, T., Schoenauer, M., and Schwefel, H.-P., editors, Proceedings of Parallel Problem Solving from Nature V, volume 1498 of Lecture Notes in Computer Science, pages 722-731. Springer-Verlag, Heidelberg, 1998.

141. Stützle, T. and Hoos, H. Improvements on the Ant System: Introducing the MAX-MIN Ant System. In Smith, G.D., Steele, N.C., and Albrecht, R.F., editors, Proceedings of Artificial Neural Nets and Genetic Algorithms, Lecture Notes in Computer Science, pages 245-249. Springer-Verlag, Heidelberg, 1997.

142. Taillard, E.D. Robust Taboo Search for the Quadratic Assignment Problem. Parallel Computing, 17:443-455, 1991.
143. Taillard, E.D. Parallel Iterative Search Methods for Vehicle Routing Problems. Networks, 23:661-673, 1993.

144. Taillard, E.D. Recherches itératives dirigées parallèles. PhD thesis, École Polytechnique Fédérale de Lausanne, 1993.
145. Taillard, E.D., Badeau, P., Gendreau, M., Guertin, F., and Potvin, J.-Y. A Tabu Search Heuristic for the Vehicle Routing Problem with Soft Time Windows. Transportation Science, 31(2):170-186, 1997.

146. Talbi, E.-G. and Bessière, P. A Parallel Genetic Algorithm for the Graph Partitioning Problem. In Proceedings of the ACM International Conference on Supercomputing ICS91, pages 312-320, 1991a.

147. Talbi, E.-G., Hafidi, Z., and Geib, J.-M. Parallel Adaptive Tabu Search Approach. Parallel Computing, 24:2003-2019, 1998.

148. Talbi, E.-G., Hafidi, Z., and Geib, J.-M. Parallel Tabu Search for Large Optimization Problems. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 345-358. Kluwer Academic Publishers, Norwell, MA, 1999.

149. Talbi, E.-G., Hafidi, Z., Kebbal, D., and Geib, J.-M. A Fault-Tolerant Parallel Heuristic for Assignment Problems. Future Generation Computer Systems, 14:425-438, 1998.
150. Talbi, E.-G., Roux, O., Fonlupt, C., and Robillard, D. Parallel Ant Colonies for Combinatorial Optimization Problems. In J.D.P. Rolim et al., editor, 11th IPPS/SPDP'99 Workshops, volume 1586 of Lecture Notes in Computer Science, pages 239-247, 1999.

151. Talbi, E.-G., Roux, O., Fonlupt, C., and Robillard, D. Parallel Ant Colonies for the Quadratic Assignment Problem. Future Generation Computer Systems, 17:441-449, 2001.

152. Tongcheng, G. and Chundi, M. Radio Network Design Using Coarse-Grained Parallel Genetic Algorithms with Different Neighbor Topology. In Proceedings of the 4th World Congress on Intelligent Control and Automation, volume 3, pages 1840-1843, 2002.

153. Toulouse, M., Crainic, T.G., and Sansó, B. An Experimental Study of Systemic Behavior of Cooperative Search Algorithms. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 373-392. Kluwer Academic Publishers, Norwell, MA, 1999.

154. Toulouse, M., Crainic, T.G., and Sansó, B. Systemic Behavior of Cooperative Search Algorithms. Parallel Computing, 21(1):57-79, 2004.
155. Toulouse, M., Crainic, T.G., Sansó, B., and Thulasiraman, K. Self-organization in Cooperative Search Algorithms. In Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, pages 2379-2385. Omnipress, Madison, Wisconsin, 1998.

156. Toulouse, M., Crainic, T.G., and Thulasiraman, K. Global Optimization Properties of Parallel Cooperative Search Algorithms: A Simulation Study. Parallel Computing, 26(1):91-112, 2000.

157. Toulouse, M., Glover, F., and Thulasiraman, K. A Multiscale Cooperative Search with an Application to Graph Partitioning. Report, School of Computer Science, University of Oklahoma, Norman, OK, 1998.

158. Toulouse, M., Thulasiraman, K., and Glover, F. Multilevel Cooperative Search: A New Paradigm for Combinatorial Optimization and an Application to Graph Partitioning. In P. Amestoy, P. Berger, M. Daydé, I. Duff, V. Frayssé, L. Giraud, and D. Ruiz, editors, 5th International Euro-Par Parallel Processing Conference, volume 1685 of Lecture Notes in Computer Science, pages 533-542. Springer-Verlag, Heidelberg, 1999.

159. Towhidul Islam, M., Thulasiraman, P., and Thulasiram, R.K. A Parallel Ant Colony Optimization Algorithm for All-Pair Routing in MANETs. In Proceedings of the International Parallel and Distributed Processing Symposium, IEEE, page 259, 2003.

160. Verhoeven, M.G.A. and Aarts, E.H.L. Parallel Local Search. Journal of Heuristics, 1(1):43-65, 1995.

161. Verhoeven, M.G.A. and Severens, M.M.M. Parallel Local Search for Steiner Trees in Graphs. Annals of Operations Research, 90:185-202, 1999.

162. Voß, S. Tabu Search: Applications and Prospects. In D.-Z. Du and P.M. Pardalos, editors, Network Optimization Problems, pages 333-353. World Scientific Publishing Co., Singapore, 1993.

163. Wilkerson, R. and Nemer-Preece, N. Parallel Genetic Algorithm to Solve the Satisfiability Problem. In Proceedings of the 1998 ACM Symposium on Applied Computing, pages 23-28. ACM Press, 1998.

164. Witte, E.E., Chamberlain, R.D., and Franklin, M.A. Parallel Simulated Annealing using Speculative Computation. In Proceedings of the 19th International Conference on Parallel Processing, pages 286-290, 1990.

165. Witte, E.E., Chamberlain, R.D., and Franklin, M.A. Parallel Simulated Annealing using Speculative Computation. IEEE Transactions on Parallel & Distributed Systems, 2(4):483-494, 1991.
20
Parallel Metaheuristics in Telecommunications SERGIO NESMACHNOW~, HECTOR CANCELA~, ENRIQUE ALBA~,FRANCISCO CHICANO^ ‘Universidad de la Republica, Uruguay 2Universidad de Malaga, Spain
20.1 INTRODUCTION
The fast development of network infrastructures, software, and Internet services has been driven by the growing demand for data communications over the last 20 years. At the present time, emergent new technologies like cellular mobile radio systems, optical fibers, and high speed networks, which allow fast data communications and new services and applications, are in widespread use around the globe. In this situation, there is renewed interest in related technology and communication network problems, such as optimal allocation of antennas, frequency assignment to cellular phones, and structural design problems relating to routing information through the net. Since the size of the existing networks is continuously enlarging, the underlying instances of related optimization problems frequently pose a challenge to existing algorithms. In consequence, the research community has been searching for new algorithms that are able to replace and improve the traditional exact ones, whose low efficiency often makes them useless for solving real-life problems of large size in reasonable time. In this context, metaheuristic algorithms have been frequently applied to telecommunication problems in the last 15 years.

Parallel implementations became popular in the last decade as an effort to make metaheuristics more efficient. As this volume eloquently demonstrates, there exists a wide range of choices to design efficient parallel implementations for metaheuristic algorithms, but the common idea consists in splitting the amount of work into several processing elements. In this way, parallel metaheuristic algorithms allow us to reach high quality results in a reasonable execution time even for hard-to-solve underlying optimization problems related to telecommunication. In addition, parallel implementations usually provide a pattern for the search space exploration that is
different from the sequential one and has often been shown to be useful for obtaining superior results [8, 24, 47].

This chapter provides a summary of articles related to the application of parallel metaheuristics to telecommunication problems. The survey focuses mainly on application areas, considering only those problems inherently related to the telecommunication field and therefore disregarding a whole class of optimization problems not directly connected with the area. For example, limiting the problem domain to applications unique to telecommunication prevents us from analyzing electronic circuit design problems and VLSI channel and switchbox routing problems, though there exist several parallel metaheuristic proposals for these kinds of problems and their approaches are usually related to telecommunication problems.

Our main classification divides the applications in three categories: network design problems, network routing problems, and network assignment and dimensioning problems. Under the network design category, we include those problems related with finding a network topology that satisfies certain properties associated with reliability, Quality of Service (QoS), and other important features for source-to-destination communication. Besides pure topology design problems, we also include node and transmitter positioning problems and construction of trees, typical combinatorial optimization problems strongly related to network topology design. The network routing group comprises all those applications concerning the transmission of information, routing protocol problems, and their solution using parallel metaheuristic approaches. Finally, the network assignment and dimensioning collection involves those problems related to assigning available resources to a given network, such as frequency assignment and wavelength allocation problems. We also include dynamical dimensioning problems, considering that network planning usually must satisfy expected levels of demand for new services, upgrading, and improvements on existing designs.
20.2 NETWORK DESIGN

Network design is the focus of numerous papers whose authors propose several parallel and distributed metaheuristic techniques to solve it. Table 20.1 summarizes these papers. To organize the section we have grouped the works according to the aspect of network design they tackle. We identify five groups: reliability, the Steiner tree problem, antennae placement, topological design, and other network design problems.
20.2.1 Reliability and Connectivity Problems

Reliability refers to the ability of the network to keep working when some of the nodes or links fail. The evaluation of reliability metrics is a difficult problem in itself; an alternative is to impose connectivity constraints on the network topology.
Table 20.1  Parallel metaheuristics applied to network design problems

Author(s)           | Year | Related optimization problem                         | Metaheuristic
Huang et al.        | 1997 | 2-connectivity problem with diameter constraints     | Genetic Algorithm
Martins et al.      | 1998 | Steiner tree problem                                 | GRASP
Baran and Laufer    | 1999 | Reliable network design                              | Asynchronous Team
Martins et al.      | 2000 | Steiner tree problem                                 | Hybrid: GRASP + Local Search
Cruz et al.         | 2000 | Topological design, dimensioning, facility location  | Genetic Algorithm
Meunier and Talbi   | 2000 | Position and configuration of mobile base stations   | Multiobjective Evolutionary Algorithm
Calegari et al.     | 2001 | Antenna placement (hitting set problem)              | Genetic Algorithm
Canuto et al.       | 2001 | Prize-collecting Steiner tree                        | Multi-start Local Search
Duarte and Baran    | 2001 | Reliable network design                              | Multiobjective Evolutionary Algorithm
Watanabe et al.     | 2001 | Antenna arrangement problem                          | Multiobjective Evolutionary Algorithm
Ribeiro and Rosetti | 2002 | 2-path problem                                       | GRASP
Cruz and Mateus     | 2003 | Topological design, dimensioning, facility location  | Genetic Algorithm
Di Fatta et al.     | 2003 | Steiner tree problem                                 | Genetic Algorithm
Duarte et al.       | 2003 | Reliable network design                              | Multiobjective Evolutionary Algorithm
Nesmachnow et al.   | 2004 | Generalized Steiner Problem                          | GA, CHC, SA, Hybrid GA + SA
Lo Re et al.        | 2004 | Steiner tree problem                                 | Hybrid: GA + Local Search
Alba and Chicano    | 2005 | Antenna placement                                    | Genetic Algorithm
One usual objective is to ensure that in the presence of a single node or link failure the data flows may be re-routed to reach the destination. Thus, to ensure that the data can arrive at the final node, two disjoint paths between any pair of nodes are necessary. This is the so-called 2-connectivity constraint. Huang et al. [26] presented a Parallel Genetic Algorithm (PGA) for solving the 2-connectivity network design problem with diameter constraints. The algorithm uses a node-based codification, encoding the diameter and 2-connectivity constraints in the chromosome and thus avoiding nonfeasible solutions. The authors apply two parallel approaches: a domain decomposition based on partitioning the connectivity requirements and a distributed PGA model. They analyze the influence of several virtual topologies for both strategies and conclude that the double-ring topology gives the best performance for the PGA at the level of partitioning requirements, while the torus topology is the most suitable for the PGA at the level of dividing the population. For this last model, they also verified that the best results are obtained with the most frequent exchange of solutions with neighbors, although the communication overhead increases significantly. Setting the communication interval to its most frequent value (one generation) and limiting the interactions to only one neighbor produces an appropriate balance between the quality of the results and the computational effort required.
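The role of the migration interval and the neighborhood size in such distributed PGA models can be illustrated with a small sketch. The following Python fragment is a minimal island-model GA with unidirectional ring migration; the bit-string encoding, the toy fitness function, and parameters such as `migration_interval` are illustrative placeholders and do not reproduce the operators or representation of Huang et al.

```python
import random

# Minimal island-model GA sketch (illustrative only): each island evolves a
# population of bit strings; every `migration_interval` generations the best
# individual of each island is sent to one neighbor on a unidirectional ring.

def fitness(bits):                      # placeholder objective: maximize number of ones
    return sum(bits)

def evolve(pop):                        # one generation: binary tournament + bit-flip mutation
    new_pop = []
    for _ in pop:
        a, b = random.sample(pop, 2)
        child = list(max(a, b, key=fitness))
        i = random.randrange(len(child))
        child[i] ^= 1
        new_pop.append(child)
    return new_pop

def island_ga(num_islands=4, pop_size=20, length=32,
              generations=100, migration_interval=1):
    islands = [[[random.randint(0, 1) for _ in range(length)]
                for _ in range(pop_size)] for _ in range(num_islands)]
    for gen in range(generations):
        islands = [evolve(pop) for pop in islands]
        if gen % migration_interval == 0:
            # unidirectional ring: island i receives the best of island i-1
            migrants = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(pop_size), key=lambda j: fitness(pop[j]))
                pop[worst] = migrants[i - 1]
    return max((ind for pop in islands for ind in pop), key=fitness)

print(fitness(island_ga()))
```

A shorter migration interval increases the exchange of solutions (and the communication cost in a real distributed run), which is precisely the trade-off reported by Huang et al.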
Another approach to achieve reliability in a telecommunication network consists in limiting the number of edges in a path between two nodes; that is, there is a distinguished set of nodes D and the paths between them have at most k edges (the k-path network design problem). Ribeiro and Rosetti [44] developed a parallel GRASP algorithm applied to the 2-path network design problem. The GRASP construction phase uses an iterated shortest 2-path algorithm on random source-destination pairs of nodes until each pair is considered, while the local search phase tries to improve the solutions by tentatively eliminating 2-paths and recalculating the paths using modified edge weights. The local search is complemented with a path-relinking mechanism applied to pairs of solutions. The parallel approach uses a multiple-walk independent-thread strategy, distributing the iterations over the available processors. The parallel algorithm obtains linear speedup and allows reaching high quality solutions, even though they deteriorate as the number of threads (and processors) increases. Baran and Laufer [4] use a model of the network that assigns a reliability value to each link and propose a parallel implementation of an Asynchronous Team Algorithm (A-Team) to solve the problem of finding a reliable communication network. The A-Team is a hybrid technique which combines distinct algorithms interacting in the solution of the same global problem. In Baran and Laufer's proposal the A-Team combines a PGA with different reliability calculation approaches for the topological optimization of telecommunication networks subject to reliability constraints. The proposed PGA corresponds to a distributed island model with broadcast migration. It uses a bit-string codification and employs specialized initialization, crossover, and mutation operators. A repair mechanism is included to keep the solutions within the 2-connectivity constraint. Two approaches are used to estimate network reliability: an upper bound is efficiently calculated for all the candidates included in the population, and after that, a Monte Carlo simulation is used to obtain good approximations of the all-terminal reliability. The empirical results show good values for medium-size networks and sublinear speedup. In [17] a multiobjective version of the previous problem is addressed by Duarte and Baran. The authors designed a parallel asynchronous version of the SPEA multiobjective evolutionary algorithm [57] to find optimal topologies for a network. The algorithm presented is made up of two kinds of processes: several parallel SPEA processes, which perform the actual optimization work, and one organizer process, which creates the workers, collects the results, and applies the Pareto dominance test over them. The results of the parallel version outperform the sequential ones, considering standard metrics in the multiobjective algorithms domain. In addition, the parallel version is fully scalable, showing almost linear speedup values and the ability to obtain better solutions when the number of processors increases. Later, Duarte, Baran, and Benitez [18] published a comparison of parallel versions of several multiobjective EAs for solving the same reliable network design problem. The authors present experimental results for asynchronous parallel versions of SPEA and NSGA [46] using external populations. The experiments confirm the previous findings, indicating that the quality of results improves with more processors for all the implemented algorithms. They also show that SPEA is able to obtain better results than NSGA using smaller execution times.
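The Monte Carlo step used by the A-Team of Baran and Laufer to approximate the all-terminal reliability can be sketched generically as follows. The per-link reliabilities, the sample count, and the BFS-based connectivity test are assumptions for illustration, not the actual implementation of [4].

```python
import random

def all_terminal_reliability(nodes, links, samples=10000):
    """Monte Carlo estimate of the probability that the network stays connected.

    `links` maps each undirected edge (u, v) to its reliability, i.e., the
    probability that the link is operational.
    """
    connected_count = 0
    for _ in range(samples):
        # sample a network state: each link survives with its own probability
        alive = [(u, v) for (u, v), p in links.items() if random.random() < p]
        # check connectivity of the sampled state with a simple breadth-first search
        adjacency = {n: [] for n in nodes}
        for u, v in alive:
            adjacency[u].append(v)
            adjacency[v].append(u)
        seen, stack = {nodes[0]}, [nodes[0]]
        while stack:
            for w in adjacency[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        connected_count += len(seen) == len(nodes)
    return connected_count / samples

# toy example: a 4-node ring where every link works with probability 0.9
nodes = [0, 1, 2, 3]
links = {(0, 1): 0.9, (1, 2): 0.9, (2, 3): 0.9, (3, 0): 0.9}
print(all_terminal_reliability(nodes, links))
```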
20.2.2 Steiner Tree Problem

A problem that frequently appears in telecommunication network design is the Steiner Tree Problem (STP). The STP consists in finding a minimum-weight subtree of a given graph spanning a set of distinguished terminal nodes. Martins,
Ribeiro, and Souza [35] developed a parallel GRASP heuristic for this problem. The proposed parallelization scheme consists in a master-slave model, distributing the GRASP iterations among several slave processes running on different processors. The GRASP construction phase is based on a randomized version of Kruskal's algorithm for finding the minimum spanning tree of a given graph, while the local search phase uses a node-based neighborhood built with a nonterminal node insertion or deletion procedure. Results obtained on a set of experiments over series C, D, and E of the OR-Library STP instances present the approach as a useful way to solve the problem. The parallel algorithm allows tackling high dimension problems. Later, the same authors, working with Resende and Pardalos, presented an improved version of the former algorithm exploring a hybrid local search strategy [36]. This approach incorporates a local search using a path-based neighborhood, replacing paths between terminal nodes. A sublinear speedup behavior is observed on the three classes of problems solved. Lo Re and Lo Presti studied the application of PGAs to the problem. In a first article with Di Fatta [16], these researchers developed a master-slave PGA obtaining promising speedup values when solving Beasley's OR-Library standard test problems. Recently, the same authors, working with Storniolo and Urso, extended their proposal [34], presenting a parallel hybrid method that combines a distributed GA and a local search strategy using a specific STP heuristic. The computational efficiency analysis shows that the distributed model achieves significantly better speedup values than the master-slave approach, since it employs few synchronization points, and thus it can be executed over a wide-area grid-computing environment. These results encourage the authors to face high dimension problems, with sizes ranging from 1000 to 2000 nodes: 400 randomly created problems and 50 subnetworks with real Internet data extracted from the description produced by the Mercator project. The grid PGA is able to obtain the best-known solutions on about 70% of the instances. Canuto, Resende, and Ribeiro [9] proposed a parallel multistart local search algorithm for solving the prize-collecting Steiner tree problem, which has important applications in telecommunication LAN design. The authors put forward a method based on the generation of initial solutions by a primal-dual algorithm with perturbations. They use path-relinking to improve the solutions obtained by the local search and variable neighborhood search (VNS) as a postoptimization procedure. Nesmachnow, Cancela, and Alba [39] tackled the Generalized Steiner Problem (GSP). The objective is to find a minimum cost topology such that for each pair of nodes (i, j) there exist at least r_ij disjoint (or edge-disjoint) paths. The authors present a comparative study of sequential and parallel versions of different metaheuristics applied to a number of medium-sized test cases. The heuristics were implemented over the MALLBA library [1], and comprise a standard Genetic Algorithm (GA), a Simulated Annealing (SA) method, two GA+SA hybrid algorithms, and another evolutionary method called CHC (Cross generational elitist selection, Heterogeneous recombination, and Cataclysmic mutation). All problems used the same binary codification, where the presence or absence of each edge was mapped to a different bit.
Standard mutation and recombination operators were applied; the resulting individuals were accepted only when they corresponded to feasible solutions.
For the parallel versions of the evolutionary methods, the population was split into 8 demes, applying a migration operator working on a unidirectional ring topology. The results for the sequential methods showed CHC as the best alternative in terms of solution quality. For the parallel methods, the experiments over an 8-machine cluster showed that both the standard GA and one of the GA+SA hybrids obtained the best performances in solution quality and speedup.
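A minimal sketch of the binary edge codification and the feasibility-filtered variation just described is given below; the candidate edge list and terminal set are hypothetical, and the feasibility test is reduced to a plain connectivity check instead of the r_ij disjoint-path requirements of the GSP.

```python
import random

# Illustrative sketch of a binary edge codification: bit k indicates whether
# candidate edge k is built. Offspring are kept only if they remain feasible;
# feasibility is simplified here to "all terminals are connected".

candidate_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # hypothetical instance
terminals = [0, 3]

def decode(bits):
    return [e for e, b in zip(candidate_edges, bits) if b]

def feasible(bits):
    edges = decode(bits)
    reach, frontier = {terminals[0]}, [terminals[0]]
    while frontier:
        n = frontier.pop()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a == n and b not in reach:
                    reach.add(b)
                    frontier.append(b)
    return all(t in reach for t in terminals)

def mutate(bits, rate=0.2):
    # standard bit-flip mutation; the child is rejected if it becomes infeasible
    child = [b ^ (random.random() < rate) for b in bits]
    return child if feasible(child) else bits

solution = [1, 1, 0, 1, 0]
print(feasible(solution), decode(mutate(solution)))
```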
20.2.3 Antennae Placement and Configuration

The location and parameters of the antennae in a radio network influence the quality and cost of the service. This problem is especially important in cellular networks where, in addition to cost and quality requirements, we find coverage and handover constraints. Watanabe, Hiroyasu, and Miki [50] worked out a parallel evolutionary multiobjective approach for deciding the antennae placement and configuration in cellular networks. The authors presented two parallel models of multiobjective GAs applied to the problem: the Master-Slave with Local Cultivation Genetic Algorithm (MSLC) and the Divided Range Multi-Objective Genetic Algorithm (DRMOGA). The MSLC algorithm is based on the standard master-slave approach, but the evolutionary operators are carried out on the slaves using a two-individual population and the evolution follows the minimal generation gap model. DRMOGA is a standard distributed island model that uses domain decomposition. The empirical analysis compares both proposed models with MOGA [21] and a standard distributed GA. It shows that MSLC gets the best results in terms of Pareto front covering and nondominated individuals, while establishing that DRMOGA results are affected by the number of subpopulations: the number of nondominated individuals decreases when the number of subpopulations grows. In the same line of work, Meunier, Talbi, and Reininger [38] presented a parallel implementation of a GA with a multilevel encoding deciding the activation of sites, the number and type of antennae, and the parameters of each base station. Two modified versions of the classical genetic operators, named geographical crossover and multilevel mutation, are introduced. The fitness evaluation uses a ranking function, similar to Fonseca and Fleming's MOGA algorithm [21], and a sharing technique is employed to preserve diversity among solutions. In addition, a linear penalization model is used to handle the constraint considered (a minimal value for the covered area). A master-slave parallel implementation is presented for solving high dimension problems in reasonable times, with each slave processing a part of the geographical working area. The algorithm is evaluated with a large and realistic highway area generated by France Telecom. The authors analyze the convenience of using the proposed sharing strategy instead of concentrating on a small part of the Pareto front, showing that a better Pareto front sampling is obtained in the first case. Calegari et al. [7] developed a distributed GA to find the optimal placement of antennae. The authors compare a greedy technique, a Darwinian algorithm, and a PGA. The PGA uses a bit-string representation for codifying the whole set of possible antenna locations and a parametric fitness function evaluating the covered area as a function of a parameter that can be tuned in order to obtain acceptable
service ratio values. Experiments were performed on two real-life cases: Vosges (a rural scenario) and Geneva (an urban scenario). On average, the PGA and the greedy technique show the same solution quality, but when an optimal solution is known, it can be found using the PGA whereas the greedy approach usually falls into poor, attractive local optima. Alba and Chicano [2] tackled the same problem with sequential and parallel GAs over an artificial instance. They performed an in-depth study of the parallel approach, evaluating the influence of the number of possible locations, the number of processors, and the migration rates. They found a sublinear speedup and concluded that the isolation of the subpopulations is beneficial for the search.

20.2.4 Other Network Design Problems

The works of Cruz et al. [14, 15] study the multilevel network design problem, which arises in many industrial contexts, including telecommunications. This problem integrates several optimization features such as topological design, dimensioning, and facility location in several hierarchical levels. The multilevel network optimization problem generalizes some specific telecommunication-related problems, such as tree construction problems or uncapacitated location problems. The authors focused on several master-slave parallel implementations of the classical branch & bound algorithm, suitable for execution on MIMD parallel computer systems, like the nowadays popular clusters and networks of workstations. They proposed a parallel-centralized version and a parallel-distributed version using different load balancing policies. The evaluation of the algorithms was performed with both OR-Library instances and randomly generated test problems. The results obtained show promising computational efficiency behavior, achieving an improvement over sequential execution times. The centralized version attains almost linear speedup values, while the distributed approach shows sublinear speedup behavior. The results are similar for all the load-balancing strategies employed. A network design benchmark built using data from France Telecom has been studied by Le Pape, Perron, and other researchers [5, 10, 32, 41, 42]. The benchmark consists in dimensioning the arcs of a telecommunication network so that a number of commodities can be simultaneously routed at minimum cost. It includes networks of several sizes and different constraints to take into account real-life aspects, such as security demands, installing multiple optical fibers on a single link, symmetric routing, number of hop/port constraints, and total node traffic constraints. The problems are presented in detail in [5, 32]; there are 21 base problems that are translated into 1344 different problem instances when taking into account the different combinations of active constraints. Different methods, such as constraint programming, standard mixed-integer programming, column generation approaches, and a GA are compared. The main objective of these works is to study how to design robust, industry-quality methods, able to find a good quality solution for any problem in the benchmark in a small time (10 minutes on either a 1- or 4-processor computer). The publications focus on the first three approaches, which were identified as the most promising at an early stage [32]. In [42], a Large Neighborhood Search (LNS) schema is employed to complement the constraint programming approach.
The main idea is to "freeze" a large part of the solution and re-optimize the unfrozen part using constraint programming. This schema is iterated repeatedly with a new neighborhood each time. Different alternatives for introducing parallelism were compared: parallelism within the constraint programming algorithm, parallelism using a portfolio of algorithms running in parallel, and parallelism at the Large Neighborhood Search level (multipoint LNS), where different algorithms optimize different randomly chosen parts of the problem. The results showed that this last method obtained the best quality results. At the efficiency level, the experiments on a 4-processor machine showed a quasi-constant load over 90%, in contrast with the two previous alternatives, which used under 60% of the processing power available.
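A generic freeze-and-reoptimize loop of this kind can be sketched as follows; the inner re-optimization is a trivial random-sampling placeholder rather than the constraint programming solver used in the cited works, and the variable domain and cost function are arbitrary.

```python
import random

# Generic Large Neighborhood Search loop in the spirit of the schema above:
# freeze most of the solution, re-optimize the unfrozen part, and iterate.

def lns(initial, cost, iterations=1000, unfrozen_fraction=0.2):
    current = list(initial)
    n = len(current)
    for _ in range(iterations):
        free = random.sample(range(n), max(1, int(unfrozen_fraction * n)))
        candidate = list(current)
        for i in free:                        # re-optimize only the unfrozen variables
            candidate[i] = random.randint(0, 3)
        if cost(candidate) <= cost(current):  # keep the candidate if it does not worsen
            current = candidate
    return current

# toy usage: minimize the sum of 20 integer capacity variables
cost = sum
print(cost(lns([3] * 20, cost)))
```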
20.3 NETWORK ROUTING

Several researchers have proposed parallel and distributed metaheuristics for solving different variants of network routing problems. We focus our review on the explicitly parallel and multiagent distributed approaches proposed. Agent-based approaches frequently solve the routing problem using implicit parallelism or distributed agent-based search without proposing a clear parallel schema. However, there are some agent-based proposals with an explicit distribution or parallelism. One example is presented by Islam, Thulasiraman, and Thulasiram [28, 49]. The paper presents a parallel version of the Ant Colony Optimization (ACO) algorithm for solving the all-pair routing problem in Mobile Ad Hoc Networks (MANETs). This kind of network operates by building connections on the fly when they are needed, using a noncontrolled, distributed administrated communication. The network topology changes dynamically and this nondeterministic behavior implies difficulties for the routing procedure. The parallel ACO is employed to solve the problem of finding the shortest path from a given source to the destination using an exploration technique complemented with a search-ahead mechanism based on the pheromone concentration. The authors present a distributed-memory implementation with MPI. Based on the domain decomposition technique, the best configuration of the algorithm runs each ant on a different processor for determining the best route for all pairs of nodes in its subgraph. The results show a sublinear speedup with 10 processors. Even though the communication cost is much larger than the computation done by each ant, the algorithm shows a promising scalability factor as the problem dimension increases. Continuing with multiagent approaches, Sim and Sun [45] have proposed a multiple ant colony scheme for solving network routing problems. The approach adds a new aspect to the implicit distribution induced by the network itself: each node of the network employs multiple ant colonies in parallel. The claim is that this feature may help to mitigate stagnation or premature convergence of the ant colony schemes. Xingwei [53] presented a parallel algorithm combining an evolutionary approach and SA for solving routing and wavelength assignment for multicast in Dense Wavelength-Division Multiplexing (DWDM) networks. The hybrid approach consists in a synchronous distributed island PGA, incorporating an SA technique to decide
whether or not to accept offspring and mutated individuals produced by the evolutionary operators when their fitness values are worse than their parents' values. The PGA-SA method is employed to solve the routing problem (construction of a multicast tree), while the wavelength assignment is done via a deterministic procedure based on Dijkstra's shortest path algorithm. The PGA uses a node-based binary coding to represent networks and a fitness function that considers the concept of user QoS satisfaction degree. The reported results show that the algorithm is able to improve the QoS values of the multicast tree. Eren and Ersoy [20] study the static establishment of virtual paths in ATM networks, where the network topology, link capacities, and traffic requirements (in the form of a list of demands from source to terminal nodes) are given. The problem consists in assigning a virtual path to each demand, while minimizing the maximum utilization among all links. The solution proposed is a hybrid PGA, with an annealing mechanism in the selection stage (PAGA, Parallel Annealed Genetic Algorithm). The parallelism is implemented by means of an island model, where each processor runs the same algorithm but with different mutation and crossover parameters, and some selected individuals are migrated among the processors. The migration occurs synchronously, which can explain in part the seemingly poor running times. The method is compared with both sequential SA and GA over four different networks having between 26 and 50 nodes and three different traffic requirement patterns. Substantial improvements in quality were obtained at the cost of much longer running times (up to 12 times the running time of the GA). The total running times (about 30 seconds) make the method suitable for static problems but not for real-time optimization if the demands are varying. The algorithm was more robust than the GA and the SA algorithms: the dispersion of the quality of the results over 100 runs was much smaller. Zappala [54] presented a distributed local search algorithm for building multicast routing trees for alternate path computation in large networks. In the alternate path routing approach, the network nodes mainly communicate using shortest paths, but alternative longer paths are available in case the shortest paths are overloaded. In addition to topics related to the routing architecture and protocol, Zappala evaluates a distributed local path searching heuristic that uses only partial topology information (each receiver computes its own alternate paths) to find feasible alternate paths for the whole network. In Zappala's fully distributed approach, each receiver conducts a limited search by collecting paths from routers, developing a partial map of the network which is used to compute alternate paths. All communications occur between the receiver and individual routers, explicitly avoiding any message passing between routers, with the goal of providing a simple, purely local path computation algorithm. The author shows the efficacy of the distributed heuristic approach over a wide range of topologies and loads, proving that the local search algorithm can approximate the effectiveness of a global routing protocol with much lower overhead. The approach scales to large networks, where an exhaustive search is not viable, and its performance improves as the multicast group grows in size.
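The annealed acceptance rule shared by the PGA-SA hybrid and the PAGA selection stage described above corresponds to a standard Metropolis test, sketched below; the cooling schedule and fitness values are illustrative assumptions, not the settings of the cited works.

```python
import math, random

def accept_offspring(parent_fitness, child_fitness, temperature):
    """Annealed acceptance rule (minimization): always accept improvements,
    accept worse offspring with a Metropolis probability that decreases as
    the temperature drops."""
    if child_fitness <= parent_fitness:
        return True
    return random.random() < math.exp(-(child_fitness - parent_fitness) / temperature)

# toy usage with a geometric cooling schedule
temperature = 10.0
for generation in range(5):
    print(accept_offspring(parent_fitness=4.0, child_fitness=4.5,
                           temperature=temperature))
    temperature *= 0.9
```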
Table 20.2 summarizes the articles that have proposed parallel metaheuristics applied to network routing problems.
Table 20.2  Parallel metaheuristics applied to network routing problems

Author(s)      | Year | Related optimization problem                                                                        | Metaheuristic
Eren and Ersoy | 2001 | Virtual path routing                                                                                | Annealed Genetic Algorithm
Sim and Sun    | 2002 | Network routing                                                                                     | Ant Colony Optimization
Islam et al.   | 2003 | All-pair routing problem in mobile ad hoc networks                                                  | Ant Colony Optimization
Xingwei et al. | 2003 | Routing and wavelength assignment for multicast in Dense Wavelength Division Multiplexing networks | Hybrid: Evolutionary Algorithm + Simulated Annealing
Zappala        | 2004 | Multicast trees                                                                                     | Local Search
20.4 NETWORK ASSIGNMENT AND DIMENSIONING

In this section, we summarize the works proposing parallel and distributed metaheuristics for facing problems related to assigning resources to a given network, dynamic dimensioning problems, and other miscellaneous applications. We have grouped the works into four categories: radio link frequency assignment, cellular networks, multicommodity network design, and other works. Table 20.3 summarizes the articles included in this section.
Table 20.3  Parallel metaheuristics applied to network assignment and dimensioning problems

Author(s)             | Year       | Related optimization problem                                | Metaheuristic
Eckstein              | 1994       | Multicommodity network design                               | Branch & Bound
Hurley et al.         | 1994       | Frequency assignment problem                                | Genetic Algorithm
Hurley et al.         | 1996       | Radio link frequency assignment                             | Genetic Algorithm
Bienstock and Gunluk  | 1996       | Capacitated network design                                  | Branch & Bound
Gunluk                | 1997       | Capacitated network design                                  | Branch & Bound
Gendron et al.        | 1997       | Uncapacitated network design                                | Branch & Bound
Kwok                  | 1999, 2000 | Dynamic channel assignment in mobile networks               | Genetic Algorithm
Zhou et al.           | 2000, 2000 | Media mapping in video-on-demand server network             | Simulated Annealing
Lee and Kang          | 2000       | Cell planning with capacity expansion in wireless networks  | Genetic Algorithm
Weinberg et al.       | 2000, 2001 | Frequency assignment                                        | Hybrid: Genetic Algorithm + Tabu Search + Random Walk
Crainic and Gendreau  | 2002       | Fixed charge multicommodity network design                  | Tabu Search
Quintero and Pierre   | 2003       | Assigning cells to switches in mobile networks              | Memetic Algorithm
Gendron et al.        | 2003       | Multicommodity capacitated location problem                 | Hybrid: Variable Neighbourhood Descent + Slope Scaling
Thompson and Anwar    | 2003       | Lightwave assignment in WDM networks                        | Parallel Recombinative Simulated Annealing
Oliveira and Pardalos | 2004       | Power control in wireless ad hoc networks                   | Variable Neighbourhood Search
20.4.1 Radio Link Frequency Assignment

The Radio Link Frequency Assignment Problem (RLFAP) consists in assigning frequencies to a number of radio links in order to simultaneously satisfy a large number of constraints and minimize the number of different frequencies employed. This problem appears in radio networks and is known to be NP-hard. In an earlier proposal, Hurley, Crompton, and Stephens [13] solved the problem with a PGA. The authors compare the results obtained using two different chromosome representations: a "simple representation" codifying legal frequency values for each node in the network and an "alternative representation" grouping together those sites with the same frequency assigned. The fitness function is formulated to minimize several parameters related to the electromagnetic interference due to the use of similar frequency values for nearby transmitters. Computational results obtained when solving the problem in a simulated but realistic military scenario showed that the improved ordered representation proposed yields superior numerical values in terms of fewer constraint violations. Based on the previous work, Hurley, Thiel, and Smith [27] presented a comparative study of several metaheuristics applied to the RLFAP. The authors studied the assignment problem subject to several management constraints and proposed an SA, a distributed island PGA, and a TS procedure to solve it. The experimental results show that SA obtains the assignments with the lowest number of constraint violations. In addition, the SA and TS algorithms perform a more efficient search, taking advantage of specialized neighborhood search operators and generating more assignments than the GA, which employs non-specialized operators and codification. The works of Weinberg, Bachelet, and Talbi [51, 52] propose to solve the problem with the COSEARCH parallel hybrid metaheuristic [3], which is based on the cooperation of three complementary agents, balancing the exploration of the search space and the exploitation of good solutions previously found. An adaptive memory acts as coordinator for exchanging information related to the search procedure among the searching agent (which implements a simple TS algorithm), the diversifying agent (a GA), and the intensifying agent (using a random walk). The parallel implementation follows the master-slave paradigm, where the master process manages the workers and hosts the adaptive memory, the GA, and the intensifying agent. The slaves consist in several sequential TS algorithms. The authors tested the COSEARCH metaheuristic on several benchmark problems provided by France Telecom. Using a parallel hybrid algorithm, Kwok [30, 31] tackled the Dynamic Channel Assignment (DCA) problem. This is a variant of the Frequency Assignment Problem where channels must be allocated to cells dynamically, depending on the traffic demands. Aiming to take advantage of both static and dynamic assignment models, Kwok proposed a quasi-static dynamic approach, combining two modules: an off-line module that employs a PGA to generate a set of allocation patterns and an on-line module using a parallel local search method based on table-lookup and reassignment strategies. The hybrid parallel model is executed on a Linux-based cluster of PCs and reports better results than other DCA algorithms, in terms of both solution quality and efficiency.
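A generic fitness component for this family of problems simply counts violated interference constraints together with the number of distinct frequencies used. The sketch below is illustrative only; the constraint format and the two-component score are assumptions and do not reproduce the exact interference models of the works above.

```python
def interference_violations(assignment, constraints):
    """Count violated interference constraints for a frequency assignment.

    `assignment` maps each radio link to its frequency; `constraints` is a list
    of (link_a, link_b, min_separation) triples stating that the two links must
    use frequencies at least `min_separation` apart.
    """
    return sum(abs(assignment[a] - assignment[b]) < sep
               for a, b, sep in constraints)

def rlfap_fitness(assignment, constraints):
    # lexicographic-style score: first minimize violations,
    # then the number of distinct frequencies employed
    return (interference_violations(assignment, constraints),
            len(set(assignment.values())))

assignment = {"link1": 10, "link2": 12, "link3": 30}
constraints = [("link1", "link2", 5), ("link2", "link3", 5)]
print(rlfap_fitness(assignment, constraints))   # (1, 3): one violation, 3 frequencies
```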
20.4.2 Cellular Networks
With the growth in the number of cellular phones, interest in cellular networks has increased. Many aspects related to this kind of network are not new in the telecommunications domain, e.g., the radio link frequency assignment problem tackled in the previous section. However, some other aspects appear only in this context, such as assigning cells to switches and cell planning. Quintero and Pierre [43] proposed a parallel multipopulation memetic algorithm applied to the problem of assigning cells to switches in mobile networks. The objective of this problem is to provide a cell assignment that minimizes the number and cost of required facilities when a mobile user changes the switch serving as relay and the mobile network needs to transfer a call in progress from one cell to another. The authors proposed a memetic algorithm combining the evolutionary search of GAs with local refinement strategies based on TS and SA. The parallel algorithm follows a distributed island model with subpopulations arranged in a fully meshed topology. It employs a nonbinary representation codifying the cells and the switch to which each is assigned, and the local search operator (TS or SA) is applied after the recombination and mutation stages. The experiments, performed on a network of 10 Pentium workstations at 500 MHz connected by a 100 Mbps LAN, show that the two local search strategies yield better result quality, improving by 30% over the best sequential GA results and by 20% over the best PGA results. The memetic version that incorporates TS has the best computational performance, between 10 and 20 times faster than the SA version. The sequential GA is the slowest method. The authors also compare their memetic algorithms with a specific heuristic from Merchant and Sengupta [37] and a pure TS algorithm, showing that, although the memetic approaches are slower, they yield slight improvements in the cost function, which represent important fund savings over a 10-year period. Lee and Kang [33] studied the cell planning problem with capacity expansion in wireless communications. This problem consists in finding optimal locations and capacities for new base stations in order to cover the expanded and increased traffic demand of mobile phones, minimizing the installation cost of new base stations. They propose a TS algorithm and compare it with a Grouping PGA, whose main difference with respect to the standard PGA is the use of group-oriented operators, suitable for grouping problems, including set covering. The PGA gives near-optimal solutions in Advanced Mobile Phone Service (AMPS) problems with up to 100 Time Division Access (TDA), but the quality of results degrades as the problem size increases, reporting gap values from the optimal solution of 20% in problems with 900 TDAs. Similar results are obtained when solving Code Division Multiple Access (CDMA) problems, reaching gap values between 25% and 30% for 2500 TDAs, while failing to meet the desired coverage factor. In both cases, the PGA did not achieve accurate results for large-size problems and proved not competitive with the TS approach. The authors argued that this is due to the inaccurate penalty method used to handle the problem constraints.
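The memetic template used in the cell-to-switch assignment work, where a local refinement step follows recombination and mutation, can be sketched generically as follows. The representation (one switch index per cell), the toy objective, and the simple hill-climbing refiner are placeholders and do not reproduce the TS/SA operators of [43].

```python
import random

# Generic memetic-algorithm step: evolutionary variation followed by local refinement.

NUM_CELLS, NUM_SWITCHES = 12, 3

def local_search(individual, cost, steps=20):
    best = list(individual)
    for _ in range(steps):
        neighbor = list(best)
        cell = random.randrange(len(neighbor))
        neighbor[cell] = random.randrange(NUM_SWITCHES)   # reassign one cell
        if cost(neighbor) < cost(best):
            best = neighbor
    return best

def memetic_step(population, cost):
    parents = random.sample(population, 2)
    cut = random.randrange(1, len(parents[0]))
    child = parents[0][:cut] + parents[1][cut:]           # one-point crossover
    child[random.randrange(len(child))] = random.randrange(NUM_SWITCHES)  # mutation
    return local_search(child, cost)                      # refinement after variation

cost = lambda ind: len(set(ind))          # toy objective: use as few switches as possible
population = [[random.randrange(NUM_SWITCHES) for _ in range(NUM_CELLS)]
              for _ in range(10)]
print(cost(memetic_step(population, cost)))
```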
20.4.3 Multicommodity Network Design

Several authors have presented parallel approaches for solving different variants of the multicommodity network design problem. The problem consists in deciding the transportation of several commodities from sources to destinations over the links of a network with limited capacities. There are fixed construction costs related to the use of a link plus variable transportation costs related to the flow volumes. These kinds of problems are frequent in vehicle routing and logistics applications, but they also have several applications in telecommunications when addressing planning and operations management issues. A well-known problem of this kind looks for an optimal dimensioning of reserved capacities on the links of an existing telecommunication network to face "catastrophic" link failures. In a pioneering article, Eckstein [19] proposed a parallel branch & bound algorithm for solving several mixed-integer programming problems, among which he faced optical fiber network design and multiperiod facility location problems. Eckstein studied several centralized and distributed parallelization approaches and load-balancing strategies, presenting comparative results on a set of 16 problems. The parallel branch & bound algorithm shows almost linear speedup values (with efficiency values near 0.8) even for the simplest parallel approaches. Experiments performed with up to 64 processors reveal that fully decentralized schemes scale correctly and are able to achieve overall performance similar to centralized approaches, while promising good expectations for larger configurations where the centralized schemes might suffer bottleneck problems. Extending the previous approach, Bienstock and Gunluk [6], and Gunluk [25], analyzed parallel branch & bound algorithms for solving mixed-integer programming problems arising in capacitated network design. Considering a capacitated network and point-to-point traffic demands, these problems propose to install more capacity on the edges of the network and route traffic simultaneously, minimizing the overall cost. The authors presented a branch & cut algorithm and studied alternative parallel implementations to confront large-size real-life problem instances. In a first proposal, Gendron and Crainic [22] introduced a parallel branch & bound algorithm for solving the uncapacitated version of the problem when balancing requirements are involved. The algorithm consists of a synchronous initialization phase and an asynchronous exploration phase. It utilizes bounding procedures based on Lagrangian relaxation and known nondifferentiable optimization techniques. The parallel approach follows a master-slave model that mainly accelerates the bounding procedure, performing operations on several subproblems simultaneously without changing the tree search procedure. The parallel asynchronous exploration shows sublinear speedup on a set of 10 representative test problems. However, it achieves significant speedup values on larger instances, making it possible to solve a huge real-life planning problem in acceptable times. Later, Gendron, Potvin, and Soriano [23] designed a parallel hybrid heuristic for the multicommodity capacitated location problem with balancing requirements, combining a Variable Neighborhood Descent (VND) and a Slope Scaling (SS) method. The parallel implementation is based on a coarse-grained master-slave approach,
where the master process manages the memories while the slave processes perform the computations. In addition, it employs adaptive memories to provide new starting points for both the SS and VND methods. Experiments illustrate that adding a large number of processors leads to a diversification of the search procedure, improving the quality of the results. Crainic and Gendreau [11] studied in detail different alternatives for designing a cooperative parallel TS method for solving the Fixed-charge Capacitated Multicommodity Network Design problem with linear costs (PCMND). The parallel TS method is based on several TS threads which communicate by means of a central memory or pool of solutions. Each individual thread works like the sequential TS by Crainic, Gendreau, and Farvolden [12], with a different set of parameter values (chosen from the pool of good parameters found for the sequential TS). The authors discuss in detail five different pool data selection strategies (regulating which solution from the central pool will be sent to a search thread that asks for a solution) and five external solution import strategies (deciding when search threads will ask for a pool solution). These strategies are compared on 10 problem instances, using 4, 8, and 16 search threads. The results indicate that, independently of the strategies, the parallel algorithm improves the quality of the results of the sequential algorithm (from the literature). Other conclusions are that the cooperative parallel search obtains better results than the independent parallel search, and that the performance of the selection strategies depends on the number of processors. Regarding the solution import strategies, the basic communication criterion (importing a solution before a diversification operation and accepting it if the imported solution is better than the current best) outperforms the alternative, more sophisticated criterion. Based on these results, the authors fix the strategies and perform more experiments. The results indicate that the parallel implementations consistently reduce the optimality gap over the entire range of problems considered. They require longer times than the sequential one but find good quality solutions faster.
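The central memory coordinating the cooperative TS threads can be sketched as a small thread-safe pool of solutions; the selection strategy shown (always hand back the best stored solution) is only one of the alternatives compared in [11], and the class interface below is hypothetical.

```python
import threading

class SolutionPool:
    """Central memory shared by cooperating search threads: each thread posts
    improved solutions and may request one before diversifying."""

    def __init__(self):
        self._lock = threading.Lock()
        self._solutions = []          # list of (cost, solution) pairs

    def post(self, cost, solution):
        with self._lock:
            self._solutions.append((cost, solution))

    def request(self):
        with self._lock:
            if not self._solutions:
                return None
            return min(self._solutions)[1]   # best-cost selection strategy

pool = SolutionPool()
pool.post(120.0, ["arc_design_A"])
pool.post(95.5, ["arc_design_B"])
print(pool.request())                        # -> ['arc_design_B']
```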
20.4.4 Other Works

The articles by Zhou, Lueling, and Xie [55, 56] describe a media-mapping problem in a video-on-demand server network. Given a server network architecture (a binary tree with ring), the problem consists in deciding which media assets (actually television recordings) to store in each server, the encoding quality of these assets, and the routing of the client requests to the servers. The overall objective is the maximization of the QoS level, measured in terms of provided user coverage and media bit rates. The solution must satisfy constraints on server storage capacity and link bandwidth. This problem can be seen as an extension of the File Allocation Problem with the special property that the same video can be stored in files of different sizes, depending on the encoding quality chosen. The authors use the parSA library [29] to develop a parallel SA method for solving this problem. The initial solution is constructed in two phases: first a feasible solution is built and then a greedy strategy is applied to improve the QoS objective. The algorithm uses different neighborhood structures based on asset migration plus re-routing of demands (which
in one of the neighborhoods is optimized using backtracking). The articles report experimental results on 10 benchmark instances designed "ad hoc" with between 7 and 63 servers, 256 and 1024 different media assets, and different storage capacities, communication bandwidths, and levels of access patterns. The proposed method obtained good quality solutions: less than 2% average gap from the upper bound for the best neighborhood structure. Recently, Oliveira and Pardalos [40] faced the problem of determining the optimal operation of mobile clients, such as cellular phones and personal digital assistants, over wireless ad hoc networks. The problem proposes to minimize resource consumption (such as battery) by employing algorithmic techniques, and it is directly related to the definition of optimal strategies for data routing. The authors present mathematical programming models to compute the amount of power required by network users at a specific time period. Since those models are difficult to solve using exact algorithms, they suggest a distributed VNS to find accurate solutions in reasonable execution times. The distributed algorithm is executed over the mobile agents, distributing the computational effort among the different nodes in the wireless network, which cooperate to find a solution that determines the power level of each one. The algorithm divides the whole network topology into small parts and assigns them to different mobiles in the network. Each one computes the solution for its part of the network and communicates with its neighbors. The results obtained on a single-machine environment show that the distributed VNS algorithm is able to find accurate solutions similar to, or even better than, those of the serial VNS and the relaxed integer programming method. In addition, the distributed algorithm shows very good computational time values. Thompson and Anwar [48] study the static lightwave assignment problem in a Wavelength Division Multiplexing (WDM) network equipped only with selective cross-connects (i.e., cross-connects which cannot apply wavelength conversions). The problem consists in assigning wavelengths to the different source-destination pairs in such a way that on each link all routed flows have different wavelengths (there is no conflict). To solve this problem, Thompson and Anwar employ a hybrid technique called Parallel Recombinative Simulated Annealing (PRSA), with features of both the GA and the SA. Like a GA, it is a population-based method with crossover and mutation operators, and like SA, it employs the Metropolis criterion for deciding whether to select the new individuals generated. Parallelism is implemented following an island model with asynchronous migration. In the case study, the islands are organized into a ring topology, receiving individuals from one neighbor and sending them to another. A small case study (25 nodes, 600 traffic parcels) was generated with a random network generator. The final conclusions claim that lower levels of interaction between the different populations (i.e., fewer migrants) lead to a better convergence rate and solution quality.
20.5 CONCLUSIONS

In this chapter we have presented a summary of works dealing with the application of parallel metaheuristics in the field of telecommunications. We grouped the works into three categories: network design, network routing, and network assignment and dimensioning. As observed above, there are many problems in the telecommunication domain that are computationally intractable with classic exact techniques. For these problems, metaheuristic algorithms have been applied in the literature not only in sequential form but also as parallel methods, as this chapter clearly demonstrates. The advantages of the parallel metaheuristic techniques come from their computational efficiency as well as from the different exploration schemes used by the parallel search methods.
Acknowledgments

The third and fourth authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES 1. E. Alba, F. Almeida, M. Blesa, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, J. Gonzalez, C. Leon, L. Moreno, J. Petit, J. Roda, A. Rojas, and F. Xhafa. MALLBA: A Library of Skeletons for Combinatorial Optimisation. In Proceedings of the Euro-Par, pages 927-932,2002. 2. E. Alba and F. Chicano. On the Behavior of Parallel Genetic Algorithms for Optimal Placement of Antennae in Telecommunications. International Journal of Foundations of Computer Science, 16(2):343-359,2005. 3. V. Bachelet and E-G. Talbi. COSEARCH: a Co-evolutionary Metaheuristic. In Proceedings of Congress on Evolutionary Computation (CEC ’2000), pages 1550-1557, San Diego, USA, 2000. 4. B. Baran and F. Laufer. Topological Optimization of Reliable Networks Using
A-Teams. In Proceedings of World Multiconference on Systemics, Cybernetics and Informatics - SCI’99. IEEE Computer Society, 1999.
5. R. Bemhard, J. Chambon, C. Le Pape, L. Perron, and J-C. Regin. Resolution d’un Probleme de Conception de Reseau avec Parallel Solver. In Proceeding of JFPLC, page 151,2002. (text in French). 6. D. Bienstock and 0. Gunluk. A Parallel Branch & Cut Algorithm for Network Design Problems. In INFORMS, Atlanta, USA, November 1996.
REFERENCES
511
7. P. Calegari, F. Guidec, P. Kuonen, and F. Nielsen. Combinatorial Optimization Algorithms for Radio Network Planning. Theoretical Computer Science, 263( 12):235-265,2001. 8. E. Canhi and M. Mejia. Experimental results in distributed genetic algorithms. In Proceedings of the Second International Symposium on Applied Corporate Computing, Monterrey, MCxico, pages 99- 108, 1994. 9. S. Canuto, M. Resende, and C. Ribeiro. Local Search with Perturbations for the Prize Collecting Steiner Tree Problem in Graphs. Networks, 38:50-58,2001. 10. A. Chabrier, E. Danna, C. Le Pape, and L. Perron. Solving a Network Design Problem. Annals of Operations Research, 130:217-239,2004. 11. T. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8(6):601-627,2002. 12. T.G. Crainic, M. Gendreau, and J.M. Farvolden. A Simplex-based Tabu Search Method for Capacitated Network Design. INFORMS Journal on Computing, 12(3):223-236,2000. 13. S. Hurley, W. Crompton, and N.M. Stephens. A Parallel Genetic Algorithm for Frequency Assignment Problems. In Proceedings IMACS/IEEE Int. Svmp. on Signal Processing, Robotics and Neural Networks, pages 8 1-84, Lille, France, 1994. 14. F. Cruz and G.R. Mateus. Parallel Algorithms for a Multi-level Network Optimization Problem. Parallel Algorithms and Applications, 18(3):121-137, 2003. 15. F. Cruz, G.R. Mateus, and J.M. Smith. Randomized Load-balancing for Parallel Branch-and-bound Algorithms for Multi-level Network Design. In Proceedings of 12th Symposium on ComputerArchitecture and High Performance Computing, pages 83-90, Sao Pedro, Brazil, 2000. 16. G. Di Fatta, G. Lo Presti, and G. Lo Re. A Parallel Genetic Algorithm for the Steiner Problem in Networks. In The 15th IASTED International Conference on Parallel and Distributed Computing and Systems, PDCS 2003, Marina del Rey, CA, USA, pages 569-573,2003. 17. S. Duarte and B. B a r b . Multiobjective Network Design Optimisation Using Parallel Evolutionary Algorithms. In X W I Z Conferencia Latinoamericana de Informatica (CLEI’2001), Mtrida, Venezuela, 2001. (text in Spanish).
18. S. Duarte, B. Barhn, and D. Benitez. Telecommunication Network Design with Parallel Multiobjective Evolutionary Algorithms. In Proceedings of IFIP/ACM Latin America Networking Conference,pages 1-1 1,2003. 19. J. Eckstein. Parallel Branch-and-bound for Mixed Integer Programming. SIAM NOVS, 27:12-15, 1994.
512
PARALLEL METAHEURISTICS IN TELECOMMUNICATIONS
20. M. Eren and C. Ersoy. Optimal Virtual Path Routing Using a Parallel Annealed
Genetic Algorithm. In Proceedings of the IEEE International Conference on Telecommunications, volume 1, pages 336-341, Bucarest, June 2001. 21. C.M. Fonseca and P.J. Fleming. Genetic Algorithms for Multiobjective Opti-
mization: Formulation, Discussion and Generalization. In Genetic Algorithms: Proceedings of the Fifth International Conference, pages 4 1 M 2 3 . Morgan Kaufmann, 1993. 22. B. Gendron and T.G. Crainic. A Parallel Branch-and-bound Algorithm for Multicommodity Location with Balancing Requirements. Computers & Operations Reseach, 24(9): 829-847, 1997. 23. B. Gendron, J-Y. Potvin, and P. Soriano. A Parallel Hybrid Heuristic for the
Multicommodity Capacitated Location Problem with Balancing Requirements. Parallel Computing, 29(5):591-606,2003. 24. S. Gordon and D. Whitley. Serial and Parallel Genetic Algorithms as Function
Optimizers. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 177-183. Morgan Kaufmann, 1993. 25. 0. Gunluk. Parallel Branch-and-cut: A Comparative Study In International Symposium on Mathematical Programming, Laussanne, Switzerland, 1997. 26. R. Huang, J. Ma, T.L. Kunii, and E. Tsuboi. Parallel Genetic Algorithms for
Communication Network Design. In Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms/Architecture Synthesis (PAS ’97),pages 370377. IEEE Computer Society, 1997. 27. S. Hurley, S.U. Thiel, and D.H. Smith. A Comparison of Local Search Algo-
rithms for Radio Link Frequency Assignment Problems. In Oficialprogram of the 1996 ACM symposium on Applied Computing, pages 251-257. ACM Press, 1996. 28. M.T. Islam, P. Thulasiraman, and R.K. Thulasiram. A Parallel Ant Colony
Optimization Algorithm for All-pair Routing in MANETs. In IEEE Computer Sociery Fourth IPDPS workshop on Parallel and Distributed Scientijic and Engineering Computing with Applications (PDSECA-2003), Nice, France, p. 259, April 2003. 29. G. Kliewer and S. Tschoke. A General Parallel Simulated Annealing Library
and its Application in Airline Industry. In Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), pages 55-6 1, Cancun, Mexico, 2000. 30. Y.K. Kwok. A Quasi-Static Cluster-Computing Approach for Dynamic Channel
Assignment in Cellular Mobile Communication Systems. In Proceedings ofthe 1999 IEEE Vehicular Technolgy Conference (VTC’99-Fall), volume 4, pages 2343-2347, Amsterdam, Netherlands, 1999. IEEE Press.
REFERENCES
513
31. Y.K. Kwok. Quasi-Static Dynamic Channel Assignment Using a Linux PC Cluster. In Proceedings of the 4th IEEE International Conference on High Perfonnance Computing in the Asia-Pacific Region - Volume I , pages 170-1 75, Beijing, China, May 14-17,2000. 32. C. Le Pape, L. Perron, J-C. RCgin, and P. Shaw. Robust and Parallel Solving of a Network Design Problem. In Pascal Van Hentenryck, editor, Proceedings of CP 2002, pages 633-648, Ithaca, NY,USA, September 2002. 33. C.Y. Lee and H.G. Kang. Cell Planning with Capacity Expansion in Mobile Communications: A Tabu Search Approach. IEEE Transactions on Vehicular Technology,49(5): 1678-1690,2000. 34. G. Lo Re, G. Lo Presti, P. Storniolo, and A. Urso. A Grid Enabled Parallel Hybrid Genetic Algorithm for SPN. In M. Bubak, P.M.A. Slot, G.D. Van Albada, and J. Dongarra, editors, Proc. of The 2004 International Conference on Computational Science, ICCS’O4, Lecture Notes in Computer Science, Vol. 3039, pages 156-163, Krakow, Poland, June 6-9,2004. Springer. 35. S. L. Martins, C. C. Ribeiro, and M. C. Souza. A Parallel GRASP for the Steiner Problem in Graphs. In Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 285-297, 1998. 36. S.L. Martins, M.G.C. Resende, C.C. Ribeiro, and P.M. Pardalos. A Parallel GRASP for the Steiner Tree Problem in Graphs Using a Hybrid Local Search Strategy. Journal of Global Optimization, 17:267-283,2000. 37. A. Merchant andB. Sengupta. Assignment ofcells to Switches inPCS Networks. IEEE/ACM Transactions on Networking, 3(5):521-526, October 1995. 38. H. Meunier, E-G. Talbi, and P. Reininger. A Multiobjective Genetic Algorithm for Radio Network Optimization. In Proceedings of the 2000 Congress on Evolutionary Computation CECOO, pages 3 17-324, California, USA, 2000. IEEE Press. 39. S. Nesmachnow, H. Cancela, and E. Alba. Evolutive Techniques Applied to Reliable Communication Network Design. In Tercer congreso espaiiol de Metaheuristicas, Algoritmos Evolutivos y Bioinspirados (MAEB’O4),pages 388-395, Cordoba, Spain, 2004. (text in Spanish). 40. C.A.S. Oliveira and P.M. Pardalos. A Distributed Optimization Algorithm for Power Control in Wireless Ad Hoc Networks. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’O4),page 177,
Santa Fe, New Mexico, April 26-30,2004. 4 1. L. Perron. Parallel and Random Solving of a Network Design Problem. In AAAI 2002 Workshop on Probabilistic Approaches in Search, pages 35-39,2002. 42. L. Perron. Fast Restart Policies and Large Neighborhood Search. In Proceedings of CPAIOR’O3, Montreal, Canada, May 8-10,2003.
514
PARALLEL METAHEURISTICS IN TELECOMMUNICATIONS
43. A. Quintero and S. Pierre. Sequential and Multi-population Memetic Algorithms for Assigning Cells to Switches in Mobile Networks. Computer Networks: The International Journal of Computer and Telecommunications Networking, 43(3):247-261, October 2003. 44. C.C. Ribeiro and I. Rosseti. A Parallel GRASP for the 2-path Network Design Problem. In Burkhard Monien and Rainer Feldmann, editors, Parallel Processing 8th International Euro-Par Conference, Lecture Notes in Computer Science, Vol. 2400, pages 922-926, Paderborn, Germany, 2002.
45.K.M. Sim and W.H. Sun. Multiple Ant-Colony Optimization for Network Rout-
ing. In Proceedings of the First International Symposium on Cyber Worlds, pages 277-28 1,2002.
46. N. Srinivas and K. Deb. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221-248, 1994. 47. R. Tanese. Distributed Genetic Algorithms. In Proceedings of the Third International Conference on Genetic Algorithms, pages 434-439. Morgan Kaufmann, 1989. 48. D.R. Thompson and M.T. Anwar. Parallel Recombinative Simulated Annealing for Wavelength Division Multiplexing. In Proceedings of the 2003 International Conference on Communications in Computing, pages 2 12-2 17, Las Vegas, NV, June 23-26,2003. 49. P. Thulasiraman, R.K. Thulasiram, and M.T. Islam. An Ant Colony Optimization Based Routing Algorithm in Mobile Ad Hoc Networks and its Parallel Implementation. In Laurence Tianruo Yang and Yi Pan, editors, High Performance Scientific and Engineering Computing: Hardware/Software Support, Probability: Pure and Applied, chapter 18, pages 267-284. Kluwer Academic Publishers, 2004. 50. S. Watanabe, T. Hiroyasu, and M. Mihand. Parallel Evolutionary Multi-criterion Optimization for Mobile Telecommunication Networks Optimization. In Proceedings of the EUROGEN2001 Conference, pages 167-1 72, Athens, Greece, September 19-21, 2001. 51. B. Weinberg, V. Bachelet, and E-G. Talbi. A Coevolutionary Metaheuristic for the Frequency Assignment Problem. In Frequency Assignment Workshop, London, England, July 2000. 52. B. Weinberg, V. Bachelet, and E-G. Talbi. A Co-evolutionist Meta-heuristic for the Assignment of the Frequencies in Cellular Networks. In First European Workshop on Evolutionary Computation in Combinatorial Optimization EvoCOP’2001, Lecture Notes in Computer Science Vol. 2037, pages 140-149, Lake Come, Italy, April 200 1. 53. W. Xingwei. A Multi-population-p~allel-genetic-simulated-a~ealing-based QoS Routing and Wavelength Assignment Integration Algorithm for Multicast
in DWDM Networks. In 16th APAN Meetings / Advanced Network Conference, Busan, August 2003.
54. D. Zappala. Alternate Path Routing for Multicast. IEEE/ACM Transactions on Networking, 12(1):30-43, 2004.
55. X. Zhou, R. Lueling, and L. Xie. Heuristic Solutions for a Mapping Problem in a TV-Anytime Server Network. In Workshops of the 14th IEEE International Parallel and Distributed Processing Symposium (IPDPS'00), Lecture Notes in Computer Science, Vol. 1800, pages 210-217, Cancun, Mexico, 2000. Springer-Verlag.
56. X. Zhou, R. Lueling, and L. Xie. Solving a Media Mapping Problem in a Hierarchical Server Network with Parallel Simulated Annealing. In Proceedings of the 29th International Conference on Parallel Processing (ICPP'00), pages 115-124, Toronto, 2000. IEEE Computer Society.
57. E. Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology Zurich, 1999.
21 Bioinformatics and Parallel Metaheuristics
OSWALDO TRELLES, ANDRÉS RODRÍGUEZ
Universidad de Málaga, Spain
21.1 INTRODUCTION
Many bioinformatics applications involve hard combinatorial searches over a large solution space. This scenario seems to be a natural application domain for parallel metaheuristics. This chapter surveys the computational strategies followed to parallelize the most used software in the bioinformatics arena, placing special emphasis on metaheuristic approaches. It closely follows the document "On the Parallelization of Bioinformatics Applications" [71], but extends both the applications, to include metaheuristic methods, and the computational details, with deeper comments on efficiency and implementation issues and a special focus on information technology. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches, and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared-, distributed-, and shared-distributed-memory architectures.
21.1.1 Inter-Disciplinary Work
Born almost at the same time 50 years ago, molecular biology and computer science have grown explosively as separate disciplines. However, just as two complementary DNA strands bind together in a double helix to better transmit genetic information, an evolving convergence has created an interrelationship between these two branches of science. In several areas, the presence of one without the other is unthinkable. Not only has traditional sequential Von Neumann-based computing been fertilized through this interchange of programs, sequences, and structures, but the biology
field has also challenged high performance computing with a broad spectrum of demanding applications (for CPU, main memory, storage capacity, and I/O response time). Strategies using parallel computers are driving new solutions that seemed unaffordable only a few years ago.

21.1.2 Information Overload

With the growth of the information culture, efficient digital searches are needed to extract and abstract useful information from massive data. In the biological and biomedical fields, massive data take the form of bio-sequence flat files, 3D structures, motifs, 3D microscopic image files, and, more recently, videos, movies, animations, etc. However, while genome projects and DNA array technology are constantly and exponentially increasing the amount of data available (for statistics see http://www3.ebi.ac.uk/Services/DBStats), our ability to absorb and process this information remains nearly constant. It was only a few years ago that we were confident that the evolution of computer processing speed, increasing exponentially like some areas of knowledge in molecular biology, could handle the growing demand posed by bioinformatic applications. Processing power has jumped from the once-impressive 4.77 MHz of the early Intel 8088 to more than 3.5 GHz in the current Pentium IV and AMD Athlon 64 gallery. Most probably, commercial processors with up to 5 GHz will be available in the course of this decade; moreover, Intel estimates that 10 GHz CPUs will be available by 2010. This exponential growth rate can also be observed in the development of practically every computer component, such as the number of CPU transistors, memory access time, cache size, etc. However, contemporary genome projects have delivered a blow to this early confidence. Since the completion of the first whole organism's genome (Saccharomyces, mid-1998), the growth rates for biological data have outstripped the processing capability of sequential computing. At this point, sequential (one-processor) computing can allow only a small part of the massive, multidimensional biological information to be processed. Under this scenario, comprehension of the data and understanding of the data-described biological processes could remain incomplete, causing us to lose vast quantities of valuable information because CPU power and time constraints could fail to follow critical events and trends.

21.1.3 Computational Resources
From a computational point of view, there are several ways to address the lack of hard computing power for bioinformatics. The first is by developing new, faster heuristic algorithms that reduce the computational space of the most time-consuming tasks [3][54]. The second is incorporating these algorithms into the ROM of a specialized chip (i.e., the bio-accelerator at the Weizmann Institute, http://sgbcd//weizma.ac.iV). The third and most promising consideration, however, is parallel computing. Two or more microprocessors can be used simultaneously, in parallel processing, to divide
and conquer tasks that would overwhelm a single, sequential processor. However promising, parallel computing still requires new paradigms in order to harness the additional processing power for bioinformatics. Before this document embarks on a detailed overview of the parallel computing software currently available to biologists, it is useful to explore a few general concepts about the biological concerns and about computer architectures, as well as the parallel programming approaches that have been used for addressing bioinformatic applications.
21.2 BIOINFORMATICS AT A GLANCE

One of the major challenges for computer scientists who wish to work in the domain of computational biology is becoming fluent with the basics of biological knowledge and its large technical vocabulary. Of course, we do not pretend to fully introduce the reader to such specialized vocabulary (see, for example, [1, 42]). Nothing could be further from a real objective for this short document, but a minimal understanding of the relationships and internals of biological data is important to design coherent and useful bioinformatic software. Helping to answer questions that have been pursued by nearly all cultures is by itself of profound interest, and computers are now on the front line of that pursuit. All living organisms are endowed with genetic material that carries the instructions to build all the other constituents of the cell. This information is mainly stored in long strands of DNA grouped into X-shaped structures called chromosomes. DNA instructions are written in a four-letter alphabet represented by A, C, G, T, which correspond to the Adenine, Cytosine, Guanine, and Thymine nucleotides. All of the genetic information of an organism is referred to as its genome. The genome size varies from a few hundred thousand nucleotides in some bacteria to more than 10^11 nucleotides in the salamander (e.g., the human genome is approximately 3,100 million nucleotides long). But not all DNA in eukaryotes codes for proteins. Just around 5% of the human genome is formed by coding regions called exons, separated by long strands of noncoding regions named introns. Moreover, the instructions for producing a particular protein (called genes) are normally composed of several exons separated by introns inserted into them. These introns are spliced out before the sequence is translated into amino acids (the constituents of proteins). The cell machinery is able to interpret the gene instructions following the rules of the genetic code. Each non-overlapping triplet of nucleotides, called a codon, codes for one of the 20 different amino acids. Observe that four nucleotides can code 4^3 = 64 possible triplets, which is more than the 20 needed to code for each amino acid. Three of these codons designate the end of a protein sequence (stop codons). That means that most amino acids are encoded by more than one codon, which is explained as the degeneracy of the code. Many technological breakthroughs have made it possible to obtain the DNA sequence of whole organisms. A genomic project is the effort to obtain the sequence of the
DNA of a given organism. Being able to divide the genome into moderate-sized chunks is a prerequisite to determining its sequence. Sequencing is performed at a resolution of a few thousand base pairs at a time. Thus, in order to determine the sequence of large pieces of DNA, many different overlapping chunks must be sequenced, and then these sequences must be assembled. A first draft of the human genome was obtained at the end of the last century, and nearly 100 genomes of higher eukaryotes are now available (see http://www.ebi.ac.uk/genomes). Although knowing the genome composition of a given organism is an important achievement, it is only the first step in understanding the biological processes that underlie life. The second step is to identify the genes hidden in the mountains of DNA information. But the process is not as easy as it seems. The basic process of synthesizing proteins maps from a sequence of codons to a sequence of amino acids. However, there are some important complications: since codons come in triplets, there are three possible places to start parsing a segment of DNA; it is also possible to read off either strand of the double helix; and finally, there are well-known examples of DNA sequences that code for proteins in both directions with several overlapping reading frames. The central feature of living organisms is their ability to reproduce and become different through an accumulative process named evolution. In order to evolve, there must be a source of variation, such as random changes or mutations (inserting a new nucleotide, deleting an existing one, or changing one nucleotide into another), sexual recombination, and various other kinds of genetic rearrangements. These changes modify the genetic information passed from parent to offspring. The analysis of such variations allows us to determine the way in which organisms have diverged (phylogenetic analysis). This is mostly performed by similarity comparison of molecular sequences. The similarities and differences among molecules that are closely related provide important information about the structure and function of those molecules. Once a gene has been identified, and thus the protein it encodes, the next step is to disclose the role of the protein in the organism. The sequence of amino acid residues that make up a protein is called the primary structure of the protein, but the function the protein holds is more related to its three-dimensional conformation, and one of the major unsolved problems in molecular biology is to be able to predict the structure and function of a protein from its amino acid sequence. In raw terms, the folding problem involves finding the mapping from the primary sequence (a sequence of from dozens to several thousand symbols, drawn from a 20-letter alphabet) to the real-numbered locations of the thousands of constituent atoms in 3D space. An approach to the protein folding problem starts with the prediction of the secondary structure of the protein, which refers to local arrangements of a few to a few dozen amino acid residues that take on particular conformations that are seen repeatedly in many different proteins: corkscrew-shaped conformations called α-helices, long flat sheets called β-strands, and a variety of small structures that link other structures: turns.
The next problem is to determine the position of the atoms in a folded protein - known as its tertiary structure - and finally, some proteins only become functional when assembled with other molecules, termed the quaternary structure of the protein.
Although every cell has the same DNA, at any particular time a given cell is producing only a small fraction of the proteins coded for in its DNA. The amount of each protein is precisely regulated for the cell to function properly in a given environment. Thus, the cell machinery modifies the level of proteins as a response to changes in the environmental conditions or other changes. Although the amount of protein produced is also important, genes are generally said to be expressed or inhibited. Recent advances in gene-monitoring microarray technology [58] have enabled the simultaneous analysis of thousands of gene transcriptions in different developmental stages, tissue types, clinical conditions, organisms, etc. The availability of such expression data affords insight into the functions of genes as well as their interactions, assisting in the diagnosis of disease conditions and monitoring the effects of medical treatments. The revolution in biology comes from the knowledge of the basic transformations of intermediary metabolism that can involve dozens or hundreds of catalyzed reactions. These combinations of reactions, which accomplish tasks like turning foods into usable energy or compounds, are called metabolic pathways. Because of the many steps in these pathways and the widespread presence of direct and indirect feedback loops, they can exhibit much counterintuitive behavior. When Doolittle et al. (1983) [14] used the nascent genetic sequence database to prove that a cancer-causing gene was a close relative of a normal growth factor, molecular biology labs all over the world began installing computers or linking up to networks to do database searches. Since then, a bewildering variety of computational resources for biology have arisen. Technological breakthroughs such as high-throughput sequencing and gene-expression monitoring technology have nurtured the "omics" revolution, enabling the massive production of data. Unfortunately, this valuable information is often dumped in proprietary data models, and specific services are developed for data access and analysis without forethought to the potential external exploitation and integration of such data. Dealing with the exponential growth rates of biological data was a simple problem when compared with the problem posed by the diversity, heterogeneity, and dispersion of data [55]. Nowadays, the accumulated biological knowledge needed to produce a more complete view of any biological process is disseminated around the world in the form of molecular sequence and structure databases, frequently as flat files, as well as image/scheme-based libraries, web-based information with particular and specific query systems, etc. Under these conditions, parallel computers and grid technology are a clear alternative to help in exploiting this plethora of interrelated information, pointing to the integration of these information sources as a clear and important technological priority.
21.3 PARALLEL COMPUTERS

21.3.1 Parallel Computer Architectures: Taxonomy

A parallel computer uses a set of processors that are able to cooperate in solving computational problems [22]. This cooperation is made possible, first, by splitting the computational load of the problem (tasks or data) into parts and, second, by reconnecting the partial computations in order to create an accurate outcome. The way in which load distribution and reconnection (communications) are managed is heavily influenced by the system that will support the execution of a parallel application program. Parallel computer systems are broadly classified into two main models based on Flynn's (1972) [21] specifications: single-instruction multiple-data (SIMD) machines and multiple-instruction multiple-data (MIMD) machines. SIMD machines are the dinosaurs of the parallel computing world, once powerful, but now facing extinction. A typical SIMD machine consists of many simple processors (hundreds or even thousands), each with a small local memory. Every processor must execute, at each computing or "clock" cycle, the same instruction over different data. When a processor needs data stored on another processor, an explicit communication must pass between them to bring it to local memory. The complexity and often the inflexibility of SIMD machines, strongly dependent on the synchronization requirements, have restricted their use mostly to special-purpose applications. MIMD machines are more amenable to bioinformatics. In MIMD machines, each computational process executes at its own rhythm in an asynchronous fashion with complete independence of the other computational processes [34]. Memory architecture has a strong influence on the global architecture of MIMD machines, becoming a key issue for parallel execution, and frequently determines the optimal programming model. It is really not difficult to distinguish between shared and distributed memory. A system is said to have shared-memory architecture if any process, running on any processor, has direct access to any local or remote memory in the whole system. Otherwise, the system has distributed-memory architecture. Shared-memory architecture brings several advantages to bioinformatic applications. For instance, a single address map simplifies the design of parallel programs. In addition, there is no "time penalty" for communication between processes, because every byte of memory is accessible in the same amount of time from any CPU (uniform memory access: UMA architecture). However, nothing is perfect, and shared memory does not scale well as the number of processors in the computer increases. Distributed-memory systems scale very well, on the other hand, but the lack of a single physical address map for memory incurs a time penalty for interprocess communication (nonuniform memory access: NUMA architecture). Current trends in multiprocessor design try to achieve the best of both memory architectures. A certain amount of memory physically attaches to each node (distributed architecture), but the hardware creates the image of a single memory for the
whole system (shared architecture). In this way, the memory installed in any node can be accessed from any other node as if all memory were local, with only a slight time penalty. A few years ago, two technological breakthroughs made possible another exciting approach to parallel computing. The availability of very fast processors in workstations, together with the widespread utilization of networks, led to the notion of a "virtual parallel computer" that connected several fast microcomputers by means of a Local Area Network (LAN). This distributed-memory system was called multicomputer architecture. Multicomputer configurations are constructed mainly with clusters of workstations (COWs), although one emerging multicomputer architecture is Beowulf clusters (http://www.beowulf.org), which are composed of ordinary hardware components (like any PC) together with public domain software (like Linux, PVM, or MPI). A server node controls the whole cluster, serving files to the client nodes.
Fig. 21.1 Summarized parallel computer architecture taxonomy and memory models. Many forms of parallelism exist today. Some architectures bring together a relatively small number of very tightly coupled processors. In other designs, the coupling of processors is relatively loose, but the number of processors can scale up to the thousands. A diagram of the parallel architecture taxonomy is presented on the left (SIMD machines; MIMD machines subdivided into shared memory (SM), distributed memory (DM: massively parallel processors (MPP), clusters of workstations (COW), and Beowulf clusters (BC)), and shared-distributed memory (SDM), joined through an interconnection network). On the right, we show the most used memory models available for these architectural designs.
Multicomputers bring several advantages to parallel computing: cost (on average, one order of magnitude cheaper for the same computational power), maintenance (replacing faulty nodes), scalability (adding new nodes), and code portability. Some drawbacks also exist, such as the lack of available software that enables management of the cluster as one integrated machine. In addition to this, current network technology has high latency and insufficient bandwidth to handle fast parallel processing.
These factors limit the effectiveness of this architecture at the present time, although it looks promising given the expected capabilities of future technologies.

21.3.2 Parallel Programming Models

In simple terms, parallel software enables a massive computational task to be divided into several separate processes that execute concurrently through different processors to solve a common task. The method used to divide tasks and rejoin the end result can be used as a point of reference to compare different alternative models for parallel programs. In particular, two key features can be used to compare models:

1. Granularity: the relative size of the units of computation that execute in parallel (coarseness or fineness of task division), and
2. Communication: the way that separate units of computation exchange data and synchronize their activity.

Most of today's advanced single-microprocessor architectures are based on the superscalar and multiple-issue paradigms (MIPS R10000, Power-PC, Ultra-Sparc, Alpha 21264, Pentium III, etc.). These paradigms have been developed to exploit Instruction Level Parallelism (ILP): the hardware level of granularity. The finest level of software granularity is intended to run individual statements over different subsets of a whole data structure. This concept is called data-parallelism, and it is mainly achieved through the use of compiler directives that generate library calls to create lightweight processes called threads and distribute loop iterations among them. A second level of granularity can be formulated as a "block of instructions". At this level, the programmer (or an automatic analyzer) identifies sections of the program that can safely be executed in parallel and inserts the directives that begin to separate tasks. When the parallel program starts, the runtime support creates a pool of threads which are unblocked by the runtime library as soon as the parallel section is reached. At the end of the parallel section, all extra processes are suspended and the original process continues to execute. Ideally, if we have n processors, the run time should also be n times faster with respect to the wall clock time. In real implementations, however, the performance of a parallel program is decreased by synchronization between processes, interaction (information interchanges), and load imbalance (idle processors while others are busy). Coordination between processes represents a source of overhead, in the sense that it requires some time added to the pure computational workload. Much of the effort that goes into parallel programming involves increasing efficiency. The first attempt to reduce parallelization penalties is to minimize the interactions between parallel processes. The simplest way, when possible, is to reduce the number of task divisions; in other words, to create coarsely grained applications.
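As a minimal illustration of the directive-based, data-parallel style described above (a sketch only: OpenMP is used here as one common incarnation of the compiler-directive model, and the array size and per-element work are arbitrary), a single directive is enough to create the thread pool and distribute the loop iterations:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double score[N];
        double total = 0.0;
        int i;

        /* The directive creates a pool of threads and splits the loop iterations
           among them: the finest (data-parallel) level of software granularity. */
        #pragma omp parallel for reduction(+:total)
        for (i = 0; i < N; i++) {
            score[i] = 0.5 * (double)i;     /* stand-in for real per-element work */
            total += score[i];
        }

        printf("max threads available: %d, total = %f\n",
               omp_get_max_threads(), total);
        return 0;
    }

With an OpenMP-capable compiler, the iterations are shared among the threads and the reduction clause takes care of the synchronization on the accumulated value; the source code itself remains essentially sequential.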
Once the granularity has been decided, a crucial question arises: how will the parallel processes interact to coordinate each other's behavior? Communications are needed to enforce correct behavior and create an accurate outcome.

21.3.3 Communications

When shared memory is available, interprocess communication is usually performed through shared variables. When several processes are working over the same logical address space, locks, semaphores, or critical sections (blocks of code that only one process can execute at a time) are required for safe access to shared variables. When the processors use distributed memory, all interprocess communication must be performed by sending messages over the network. With this message-passing paradigm, the programmer needs to keep in mind where the data are, what to communicate, and when to communicate to whom. Library subroutines are available to facilitate message-passing constructions: PVM [63], MPI (http://www.mpi-forum.org/index.html), etc. As one might imagine, writing parallel code for a disjoint memory address space is a difficult task, especially for applications with irregular data access patterns. To facilitate this programming task, software distributed shared memory provides the illusion of shared memory on top of the underlying message-passing system (i.e., TreadMarks, http://www.cs.rice.edu/~willy/TreadMarks/overview.html).
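The shared-variable side of this picture can be made concrete with a small sketch (plain POSIX threads; the worker count, block size, and workload size are arbitrary illustrative values): a shared counter acts as a pointer into the workload, and a mutex turns its update into a guarded region.

    #include <pthread.h>
    #include <stdio.h>

    #define TOTAL_ITEMS 100000
    #define BLOCK_SIZE  512
    #define NUM_WORKERS 4

    static long next_item = 0;                    /* shared pointer into the workload */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long start, end;
        (void)arg;
        for (;;) {
            /* guarded region: only one thread moves the shared counter at a time */
            pthread_mutex_lock(&lock);
            start = next_item;
            next_item += BLOCK_SIZE;
            pthread_mutex_unlock(&lock);

            if (start >= TOTAL_ITEMS)
                break;
            end = (start + BLOCK_SIZE < TOTAL_ITEMS) ? start + BLOCK_SIZE : TOTAL_ITEMS;
            /* ... process items in [start, end) here ... */
            (void)end;
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_WORKERS];
        int i;
        for (i = 0; i < NUM_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NUM_WORKERS; i++)
            pthread_join(tid[i], NULL);
        printf("all blocks dispatched\n");
        return 0;
    }

The same guarded-counter idea reappears later in this chapter for database searching on shared-memory machines.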
21.3.4 Task Scheduling Strategies

Common knowledge gained from working on parallel applications suggests that a good distribution of both data and computations is fundamental to obtaining an efficient parallel implementation. In general, any parallel strategy represents a trade-off between reducing communication time and improving the computational load balance. The simplest task scheduling strategy is based on a master/slave approach. In essence, one of the processors acts as a master, scheduling and dispatching blocks of tasks (e.g., pairwise sequence alignments) to the slaves, which, in turn, perform the typical calculations specified by the algorithm. When a slave completes one block, the master schedules a new block of tasks and repeats this process until all tasks have been computed. Efficiency can be improved by slaves prefetching tasks from the master so as to overlap computations and communications. Efficiency is further improved by caching problems in slaves, so that slaves communicate with the master only when no problems are available locally. As the number of slaves scales upward, slaves can be divided into sets, each with a submaster, in a hierarchical fashion. Finally, in a fully decentralized model, each processor manages its own pool of tasks, and idle slave processors request tasks from other processors. One can easily see how bioinformatics applications, with their massive data calculation loads, would be amenable to parallel processing.
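The master/slave scheme with on-demand dispatching can be sketched in message-passing form as follows (a simplified MPI skeleton, not taken from any particular package; the block identifiers, message tags, and the do_block() routine are placeholders for real blocks of work such as pairwise alignments):

    #include <mpi.h>
    #include <stdio.h>

    #define NUM_BLOCKS 100
    #define TAG_WORK   1
    #define TAG_STOP   2

    /* placeholder for the real computation on one block of tasks */
    static double do_block(int block) { return (double)block; }

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                 /* master: dispatch blocks on demand */
            int next = 0, active = size - 1, dest;
            double result;
            MPI_Status st;
            for (dest = 1; dest < size; dest++) {        /* initial distribution */
                if (next < NUM_BLOCKS) {
                    MPI_Send(&next, 1, MPI_INT, dest, TAG_WORK, MPI_COMM_WORLD);
                    next++;
                } else {
                    MPI_Send(&next, 1, MPI_INT, dest, TAG_STOP, MPI_COMM_WORLD);
                    active--;
                }
            }
            while (active > 0) {                         /* refill on demand */
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (next < NUM_BLOCKS) {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                    next++;
                } else {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                    active--;
                }
            }
        } else {                         /* slave: work arrives by message */
            int block;
            MPI_Status st;
            for (;;) {
                MPI_Recv(&block, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                double result = do_block(block);
                MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

Prefetching (sending more than one block per slave at start-up) and hierarchical submasters, as described above, are straightforward extensions of this loop.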
At this point, a very schematic and abbreviated description of parallel architectures has been presented for easier comprehension. A more academic, up-to-date, and detailed description can be found, for example, in Tanenbaum (1999) ([64], Chapter 8: Parallel Computer Architectures).
21.4 BIOINFORMATIC APPLICATIONS

In this section, different and routinely used algorithms will be presented to describe the strategies followed to parallelize bioinformatic software. The discussion has been organized by the task-level computational pattern observed in such algorithms, from regular to irregularly structured [56]. Traditionally, a regular-irregular classification, also named synchronous/asynchronous (with their respective semi-regular and loosely synchronous levels), has been used in such a way that it was closely related to whether computations were performed over dense or sparse matrices. However, when working with non-numerical applications, as is the case for most bioinformatic applications, the rate of dependency-free tasks, the data access pattern, and the task homogeneity are appropriate indices for classifying applications.

21.4.1 Regular Computational Pattern: Database Searching

Database searching (DBsrch) is the most heavily used bioinformatic application. It is also one of the most familiar applications with which to begin a discussion about parallelization in bioinformatics: DBsrch has a very simple form as far as data flow is concerned, and a broad range of strategies have been proposed to apply parallel computing. The primary influx of information for bioinformatics applications is in the form of raw DNA and protein sequences. Therefore, one of the first steps toward obtaining information from a new biological sequence is to compare it with the set of known sequences contained in the sequence databases. Results often suggest functional, structural, or evolutionary analogies between the sequences. Two main sets of algorithms are used for pairwise comparison (the individual task in a DBsrch application):

1. Exhaustive algorithms based on dynamic programming methodology [48][61].
2. Heuristic (faster and most used) approaches such as the FASTA [73][43][54] and BLAST [2][3] families.

DBsrch applications allow two different granularity alternatives to be considered: fine- and coarse-grained parallelism. Early approaches focused on data-parallelism over SIMD machines (notably the ICL-DAP massively parallel computer), starting with the pioneering work of Coulson et al. (1987) [10]. Deshpande et al. (1991) [13] and Jones (1992) [36] presented work on hypercubes and CM-2 computers. Soon after, Sturrock and Collins (1993) [62] implemented the exhaustive dynamic programming algorithm of Smith and Waterman (1981) [61] in the MasPar family of parallel
machines (from the minimum 1024-processor configuration of MP-1 systems up to a 16,384-processor MP-2 system). They roughed out one of the first remote servers over parallel machines (the BLITZ server at the EMBL, http://www.embl-heidelberg.de) that is still active at the EBI (http://www.ebi.ac.uk/MPsrch/). Simple and elegant dynamic programming-based algorithms compute an S(N,M) matrix (N and M being the sequence lengths). The S(i,j) cell is defined by the expression

    S(i,j) = max [ S(i-1,j-1) + w(x_i, y_j),   S(i-g,j) + α_g,   S(i,j-g) + α_g ],
where w represents a scoring scheme for every pair of residues x_i, y_j, and α_g is a negative value representing the penalty for introducing or extending a gap of length g. To compute the S(i,j) cell, data dependencies exist with the value of the previous cell on the same diagonal, with the best value to its left in the same row, and with the best value above it in the same column.
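A direct sequential rendering of the recurrence, simplified here to a fixed per-gap penalty rather than the length-dependent α_g (the scoring function and penalty value are illustrative placeholders), makes the cell-by-cell data dependencies explicit:

    #include <string.h>

    #define GAP_PENALTY (-2)

    /* illustrative scoring scheme: match/mismatch only */
    static int w(char a, char b) { return (a == b) ? 2 : -1; }

    static int max3(int a, int b, int c)
    {
        int m = (a > b) ? a : b;
        return (m > c) ? m : c;
    }

    /* Fill the (N+1) x (M+1) similarity matrix S for sequences x and y.
       S must point to a buffer of (N+1)*(M+1) ints; S[i*(M+1)+j] = S(i,j). */
    void fill_matrix(const char *x, const char *y, int *S)
    {
        int N = (int)strlen(x), M = (int)strlen(y);
        int i, j;
        for (j = 0; j <= M; j++) S[j] = j * GAP_PENALTY;           /* first row */
        for (i = 1; i <= N; i++) {
            S[i * (M + 1)] = i * GAP_PENALTY;                      /* first column */
            for (j = 1; j <= M; j++)
                S[i * (M + 1) + j] =
                    max3(S[(i - 1) * (M + 1) + (j - 1)] + w(x[i - 1], y[j - 1]),
                         S[(i - 1) * (M + 1) + j] + GAP_PENALTY,   /* gap in y */
                         S[i * (M + 1) + (j - 1)] + GAP_PENALTY);  /* gap in x */
        }
    }

Each cell needs its diagonal, left, and upper neighbors, which is precisely the dependency pattern that the fine-grained diagonal-sweep and row-sweep strategies described next are designed to work around.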
Fig. 21.2 Diagonal-sweep fine-grained workload distribution for SIMD machines to avoid data dependencies. Rows are distributed among processors (residue x_i of the query sequence is assigned to processor P_i), and processor P_i starts its computations with a delay of i columns. There will be P x (P - 1) idle processors at the beginning and at the end of the computations.
Fine-grain means, in this case, that processors will work together in computing the S matrix, cell by cell. Edmiston and Wagner (1987) [16] and Lander et al. (1988) [41] organized the CM-2 machine as an array of processors to compute the matrix S in diagonal-sweep fashion (see Figure 21.2). An advantage is that this strategy only requires local communications (in each step, P_i sends S(i,j) to P_{i+1} to allow it to compute S(i+1,j) in the next step, while P_i computes S(i,j+1)). The query sequence length determines the maximum number of processors able to be assigned, and processors remain idle at the beginning and end steps. Both inconveniences are important due to the high number of processors usually present in SIMD architectures. Around this time, Collins et al. (1987) [9] proposed a row-sweep workload distribution, splitting the sequence database into groups of 4096 residues to be assigned to a 64 x 64 array of processors. This modification was significant, because solving both problems (number of processors greater than sequence length and idle processors) addresses an important problem: data dependencies. In fact, in step j, P_i computes
only partially the cell S(i,j) (S(i-1,j-1) is received in a message from P_{i-1} in step j-1, and the best column value is held in the same processor). At this point, the best horizontal value is needed to complete the final cell value. To broadcast this value, only log_2 P messages are used when processor P sends a message in iteration i to processor P + 2^i (with i = 1 ... log_2 P). Note that a given processor needs to send a message only when it changes its best row value (a very unlikely event); thus, in practical terms, the number of messages is much lower. It might seem unnecessary that the last two paragraphs have been used to discuss parallel strategies for computers that, to use a colloquial expression, are in danger of extinction. However, apart from its historical interest, there are other good reasons. As we will see right away, a coarse-grained approach is the best for a great number of tasks (such as most of today's parallel bioinformatic problems). However, several other applications exist for which there are not enough independent tasks to be solved concurrently. It is still possible to learn from early approaches and obtain fruitful conclusions that improve new parallel solutions. There are several proposed strategies for achieving coarse-grained parallelism in DBsrch applications. Most of them can be explained on the basis of the general pseudocode:
Algorithm 1. DBsrch Algorithm
    get parameters;
    get query-sequence;
    perform initializations();
    for each sequence ∈ {Database}
    {
        score = Algorithm(query-sequence, sequence, parameters);
        maintain a trace of best results(sequence, score);
    }
    results optimization();
    report best results();
In this general sequential pseudocode for a DBsrch application, the first step sets the initial stage of the algorithm, and the loop runs until the set of database sequences is exhausted. Inside the loop, the next sequence is compared against the query sequence. The resulting value is often used to rank the best results, and finally, after the main loop, specific implementations can incorporate a last optimization step (i.e., assessing the statistical significance of results) and report the results. As should be noted, the algorithm has a very simple form as far as data flow is concerned. The database sequence corresponds to the data set to be searched, which, we need to keep in mind, is a set of sequences of different lengths. In essence, in a typical coarse-grained parallel implementation, one of the processors acts as a "master", dispatching blocks of sequences to the "slaves" which, in turn, perform the
algorithm calculations. When the slaves report results for one block, the master sends a new block. This strategy is possible because results from the comparison between two sequences (query and database sequences) are independent of the previous results deriving from the comparison of the query with other sequences. However, the time required in the processing of any given sequence depends not only on the length of the sequence, but also on its composition. Therefore, the use of a dynamic load balancing strategy is necessary. The simplest way is to modify the way in which the master processor distributes the load, on demand from the slaves. Obviously, sending one-sequence messages introduces additional expensive time overhead due to the high number of messages interchanged. Thus, rather than distributing messages sequence-by-sequence, better results are achieved by dispatching blocks of sequences [13]. Additional improvements are obtained by applying buffering strategies that reduce or eliminate slave inactivity while waiting for a new message (server data starvation). The master processor can send, at the outset, more than one block of sequences to each slave, so that a slave has a new block at the ready to continue working as soon as each block is completed [67]. Several methods have been used to determine the size of the block of sequences to be distributed. The simplest way is to divide the database in n chunks (n being the number of slave processes) and obviously assign one chunk to each slave [44]. The data chunks can even reside in local disk storage. To minimize load unbalancing, sequences are ordered by size and are assigned in round-robin fashion to chunks. The strategy is simple, inexpensive, and effective. Unfortunately, it also presents at least two difficult problems:
1. To perform the distribution it is necessary to know in advance the number of processors (n).

2. When working in heterogeneous environments, such as multicomputer clusters of workstations, the CPU time needed to process each chunk can be quite different, depending on the CPU power and the CPU availability in each node.

A direct solution divides the database in m blocks of sequences (m >> n) of fixed length (with block size around 4 to 16 Kbytes, aiming to maximize the network bandwidth) and assigns blocks to slaves on demand. In this way, the maximum imbalance at the end of the computations is proportional to the block size, and the scheduling cost (including message-passing) is proportional to m. The major scheduling-distribution cost is normally shadowed by using buffering strategies, as explained above. An additional specialization can be obtained by using blocks of variable size [68]. This last approach allows a pattern of growing-size/decreasing-size messages with a minimal scheduling cost. It is especially suitable for clusters of workstations because it avoids server data starvation due to scheduling latencies. If the first blocks are short, the first servers may finish computing before a new block of data is available to them. If the first blocks are large, the last slaves must wait a substantial amount of time for their first block of data to be dispatched. Moreover, large blocks in the last
steps of the data distribution may increase overall processing time due to poor load balancing. For distributed-memory parallel machines, the blocks of sequences arrive at the slaves via message passing from a master that deals with the file system. It is also possible that the master sends to the slaves only a pointer into the database, and the slaves load the sequences by themselves through the NFS (Network File System) or another particular element, i.e., the CFS (Concurrent File System). When shared memory is available, a counter variable, which serves as a pointer into the database, manages the workload distribution. Since the counter is located in shared memory, each processor can access it in a guarded region, obtain the value, and move the pointer to the next block. This type of system has been implemented for the Cray Y-MP [37]. Two simple notes complete this epigraph:

1. The Achilles heel of message passing is the relatively limited data transmission bandwidth in the communication pathway. In these architectures, the communication/computation ratio must be low to efficiently port algorithms. It will always be harder to parallelize FASTA or BLAST than a dynamic programming algorithm.
2. When there are several query sequences for database searching (i.e., in the case of a DBsrch server), a process-level granularity can be applied (in fact, this approach is used at the NCBI, http://www.ncbi.nlm.nih.gov/BLAST).

However, there is a more important thing to be learned at this point. When more tasks than processors are available, the simplest and most effective strategy is coarse-grained parallelization. This is so fundamental that presenting a new algorithm with this feature goes together with its parallel coarse-grained implementation. Some good examples are:
• Structural biology (electron microscopy). Determines viral assembly mechanisms and identifies individual proteins. The computationally intensive task in this algorithm is associated with imaging the 3D structure of viruses from electron micrographs (2D projections). The number of tasks is related to the set of candidate orientations for each particle, such calculations, at the different orientations, being completely independent of each other.
• Protein structure prediction. This task involves searching through a large number of possible structures representing different energy states. One of the most computationally intensive tasks calculates the solvent-accessible surface area that can be measured on individual atoms if the location of neighboring atoms is known.
• Searching 3D structure databases. As the number of protein structures known in atomic detail increases, the demand for searching by similar structures also grows. A new generation of computer algorithms has been developed for searching by:
  1. extending dynamic programming algorithms [52];
  2. importing strategies from computer vision areas [20];
  3. using intra-molecular geometrical information, such as distances, to describe protein structures [32][33]; and
  4. finding potential alignments based on octameric C-alpha structure fragments and determining the best path between these fragments using a final dynamic programming step followed by least-squares superposition [59].
• Linkage analysis. Genetic linkage analysis is a statistical technique used for rapid and largely automated construction of genetic maps from gene linkage data. One key application of linkage analysis aims to map human genes and locate disease genes. The basic computational goal in genetic linkage analysis is to compute the probability that a recombination occurs between two loci L1 and L2. The most frequently used programs estimate this recombination function by using a maximum likelihood approach [53].
All of the previous examples fit perfectly into coarse-grained parallel applications, due to the large number of independent tasks and the regular computational pattern they exhibit, together with the low communication/computation rate they present. All these features make them suitable for parallelism with high efficiency rates. However, several other interesting examples have non-regular computational patterns, and they need particular strategies to better exploit parallelism. Let's take a deeper look into the last example. In the parallelization of LINKMAP, Miller et al. (1992) [46] first used a machine-independent parallel programming language known as Linda. It was compared to the use of machine-specific calls in a study on a hypercube computer and a network of workstations, concluding that machine-independent code could be developed using that tool with only a modest sacrifice in efficiency. One particular hypothesis says there are many pedigrees and/or many candidate θ-vectors, treating each likelihood evaluation for one pedigree as a separate task. If there are enough tasks, good load balancing can be obtained. Godia et al. (1992) [26] use a similar strategy for the MENDEL program. However, Gupta et al. (1995) [29] observe that typical optimization problems have a dimension of only two or three; thus, there is no need for a large number of processors. In conclusion, it is important to integrate parallelization strategies for individual function evaluation (coarse-grained) with a strategy to parallelize the gradient estimation (fine-grained).

21.4.2 Semi-Regular Computational Patterns

A similar problem arises in the parallelization of hierarchical multiple sequence alignments, MSA [11][28][47][66]. The first steps for solving an MSA include calculating a cross similarity matrix between each pair of sequences, followed by determining the alignment topology and finally solving the alignment of sequences, or clusters themselves.
Pairwise calculation provides a natural target for parallelization because all elements of the distance matrix are independent (for a set of n sequences, n(n-1)/2 pairwise comparisons are required). Computing the topology of the alignment (the order in which the sequences will be grouped) is a relatively inexpensive task, but solving the clustering (guided by the topology) is not that amenable to parallelism. This is due to the fact that, at this stage, many tasks are to be solved (for a set of n sequences it is necessary to solve n - 1 alignments). However, only those tasks corresponding to the external nodes of the topology can be solved concurrently. Certainly, parallel strategies for the cross-matrix calculation have been proposed [27][12][60], all of them in a coarse-grained approach. In addition, when the MSA is embedded in a more general clustering procedure [69], combining a dynamic planning strategy with the assignment of priorities to the different types of active tasks using the principles of data locality has allowed us both to exploit the inherent parallelism of the complete applications and to obtain performances that are very close to optimal. However, at present and strictly speaking, the last step in MSA remains unsolved for parallel machines. When the work is carried out following a coarse-grained parallelization scheme for distributed-memory architectures, it is then necessary to exchange the sequences and/or clusters that are being modified due to the insertion of gaps during their alignment, which is extremely expensive. For this, we should look back and learn from the earliest fine-grained parallel solutions applied to sequence comparison. Today, when mixed shared/distributed memory architectures are available, this could be an excellent exercise that, it should be stressed, is far from being an academic one. A full solution should probably combine a coarse-grained solution when computing the cross similarity matrix with a fine-grained solution for solving the topology. Many challenges are yet to be overcome.
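As a side note on the first of these steps, the cross similarity matrix is the textbook coarse-grained target: all n(n-1)/2 comparisons are independent, so a sketch as simple as the following (OpenMP directives over a dummy pair_score() placeholder; dynamic scheduling compensates for pairs of very different cost) already captures the usual approach:

    /* stand-in for the real pairwise comparison (e.g., an alignment score) */
    static double pair_score(int i, int j) { return (double)(i + j); }

    /* Fill the upper triangle of the n x n cross similarity matrix D.
       All n(n-1)/2 tasks are independent; dynamic scheduling balances
       rows of very different cost across the available threads.        */
    void cross_matrix(int n, double *D)
    {
        int i, j;
        #pragma omp parallel for private(j) schedule(dynamic)
        for (i = 0; i < n; i++)
            for (j = i + 1; j < n; j++)
                D[i * n + j] = pair_score(i, j);
    }

The hard part, as explained above, is not this matrix but the gap-propagating alignment step that follows it.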
21.4.3 Irregular Computational Patterns

Applications with irregular computational patterns are the hardest to deal with in the parallel arena. In numeric computation, irregularity is mostly related to sparse computational spaces, which introduce hard problems for data-parallel distributions (fine-grained approaches) and data dependencies. The latter reduce the number of independent tasks, which affords little chance to develop efficient coarse-grained parallel implementations. A good example of this comes from another routine task in biological sequence analysis, that of building phylogenetic trees [25][7][17][18]. Earlier approaches to applying maximum likelihood methods to very large sets of sequences have been centered on the development of new, simpler algorithms, such as fastDNAml [51], which has been ported to parallel architectures using the P4 package [6]. A current parallel version of fastDNAml, implemented in C with communications under MPI, is available at http://www.santafe.edu/btWscience-paper/bette.html and at the Pasteur Institute under TreadMarks (http://www.cs.rice.edu/~willy/TreadMarks/overview.html). Even these simplified approaches have been known to be quite computationally intensive.
In fact, they were reported to have consumed most of the CPU time of the first IBM SP1 installation at the Argonne National Laboratory (1993) [4]. Let's center our attention on the original Felsenstein version of the method (implemented in the PHYLIP package, available at evolution.genetics.washington.edu). In very simple terms, the maximum likelihood method searches for a tree and a branch length that have the greatest probability of being produced from the current sequences that form that tree. The algorithm proceeds by adding sequences into a given tree topology in such a way that maximizes the likelihood of the topology (suitable for coarse-grained parallelism). Once the new sequence is inserted, a local-optimization step is performed to look for minor rearrangements that could lead to a higher likelihood. These rearrangements can move any subtree to a neighboring branch. Given a current tree T_k with likelihood L_k, one of its k nodes is removed and rearranged in its two neighbor nodes, which produces two new trees, T_k1 and T_k2, with likelihoods L_k1 and L_k2, respectively. The tree with the greatest likelihood value (including L_k) is chosen as the new best tree and it replaces the current tree. This procedure is performed until the set of nodes to rearrange is exhausted, as can be observed in the next pseudocode:

Algorithm 2. Local Optimization in the DNAml Algorithm
    Current-best-tree T_k (L_k);                  // from insertion step
    for i = 1 to n-tasks
    {
        Remove sub-tree i from T_k and produce T_k1 and T_k2;
        Likelihood evaluation for T_k1 and T_k2 (L_k1 and L_k2);
        Current-best-tree T_k = tree with greatest likelihood (T_k, T_k1, T_k2);
    }
Strictly speaking, only those nodes without leaves and with a depth of at least 2 (not hanging from the root node) can be reorganized, which represents 2k - 6 tasks (k being the number of species). For a large number of sequences, this could be addressed in a coarse-grained parallel solution by distributing the n tasks among different processors. Unfortunately, the reorganization task of one node is dependent on the reorganization of the previous nodes, due to the replacement of the current best tree. In fact, each new optimization task must be performed over the last best tree found, and not over the initial topology. This leaves only two tasks (likelihood evaluation for the T_k1 and T_k2 topologies) that can be solved in parallel. In other words, the maximum theoretical speedup of this step will be limited to this value (2), independent of the number of processors used in the computation. There is no generic procedure to address this type of irregular problem; hence, a good initial approach includes a detailed analysis of the computational behavior of the algorithm. In this specific case, a careful runtime analysis of the algorithm shows that the number of times a tree with a likelihood better than the current likelihood is obtained is extremely low (see Figure 21.3). From this behavior, it is possible
to conclude that the probability of a new current-best-tree event is rather low; or conversely, in most cases there is a high probability that a best tree will not be produced. The most important implication of this runtime observation is that, having evaluated the probability of rearranging a node, the next likelihood evaluation can be started with the same tree used to evaluate the previous one. In this way, the task dependencies in the local optimization step are avoided [8][70].
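A toy, self-contained sketch of this speculative scheme is shown below (it is not the code of [8][70]; the candidate evaluation is a dummy stand-in for the costly tree-likelihood computation). Because every candidate rearrangement is evaluated against the same current tree, the loop carries no dependency, and the rare improvement is committed through a guarded region:

    #include <omp.h>
    #include <stdio.h>

    #define N_TASKS 44          /* 2k - 6 rearrangements for k = 25 species (illustrative) */

    /* stand-in for the costly tree-likelihood evaluation of one rearrangement */
    static double evaluate_candidate(int task)
    {
        return (task == 17) ? 0.9 : 0.1 + 0.001 * task;   /* rarely beats the current tree */
    }

    int main(void)
    {
        double best_lik = 0.5;  /* likelihood of the current best tree */
        int best_task = -1, task;

        /* Speculative scheme: all candidates are evaluated against the SAME current
           tree, so the loop is free of the replace-the-best-tree dependency and can
           run in parallel; only the (rare) improvement is updated in a guarded region. */
        #pragma omp parallel for schedule(dynamic)
        for (task = 0; task < N_TASKS; task++) {
            double lik = evaluate_candidate(task);
            #pragma omp critical
            if (lik > best_lik) { best_lik = lik; best_task = task; }
        }

        if (best_task >= 0)
            printf("rearrangement %d improved the tree (likelihood %.3f)\n",
                   best_task, best_lik);
        return 0;
    }

The price, as Figure 21.3 illustrates, is a small number of extra (wasted) evaluations whenever a better tree does appear in the middle of a batch, but in practice this penalty is very low.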
Fig. 21.3 Example of the runtime behavior of the DNAml algorithm in the optimization step (4) using 50 sequences. The number of circles in each horizontal line represents the number
of optimization tasks successively performed as a function of the number of sequences already incorporated into the topology (on the left). Filled circles show those points in which a new maximum value was detected. At the top right-hand corner of the figure, the total number of tree likelihood evaluations performed by the Ceron et al. algorithm is presented, together with the number of extra evaluations (parallel penalty) incurred by the algorithm and the very low penalty evaluation percentage.
21.5 PARALLEL METAHEURISTICS IN BIOINFORMATICS

As is known, metaheuristics are generic methods for the non-exact solution of difficult (NP-hard) combinatorial problems [45]; on the other hand, many bioinformatics applications involve hard combinatorial searches over a large solution space. Therefore,
it seems reasonable to think that bioinformatics problems could benefit from appropriate metaheuristic approaches. In fact, some of these bioinformatic problems are so hard, and the amount of raw data to be processed so vast, that a parallel exploration of the search space is a must if one wants to find a reasonably "good" solution. With the advent of powerful distributed or parallel computers, new bioinformatics algorithms making use of metaheuristics will hopefully be able to produce quality results within a reasonable amount of time. There is a growing number of works in the literature applying metaheuristics and parallel or cooperative metaheuristics to classical but still unsolved challenges in the automation of bioinformatics data analysis [24][65]. Most of these analyses can be regarded as difficult combinatorial optimization problems such as, for example, efficient learning of properties from data, classification of complex sets of information, extraction of grammatical structure from sequences, etc. For example, multiple sequence alignment (MSA), a semi-regular bioinformatics application discussed in the previous section, is a combinatorial optimization problem whose aim is to find the optimal alignment of a group of nucleotide or protein sequences. This task plays a key role in a wide range of applications that include finding the characteristic motifs among biological sequences, backtracking the evolutionary paths through sequence similarity, identifying the consensus sequence, predicting the secondary and tertiary structures, and sequence clustering. Throughout the previous sections of this chapter we have surveyed a variety of parallel implementations to depict general solutions in the bioinformatics arena. Now we are going to introduce the parallelization of a particular heuristic approach for the clustering of very large sequence data sets, in which the multiple alignment of the input sequence data set is initially an unaffordable prerequisite. Current classification and clustering methods for biological sequences are mostly based on pre-aligned sequences (using MSA), which is CPU-time bounded and becomes unaffordable for large sets of sequences. Numerous techniques have been used to align multiple sequences [50], and heuristic approaches have many instances among them. Simulated annealing [38], genetic algorithms [49][77], and tabu search [65] are heuristic iterative optimization techniques that have been applied to align multiple sequences. There are also parallel solutions using simulated annealing [35] and hierarchical cooperative implementations of evolutionary algorithms [24] for the MSA resolution. This is still an open problem and, as we stated in the previous section devoted to applications with semi-regular computational patterns, at present the last step in MSA remains with no feasible solution for parallel machines. Thus, in this section we will describe a two-step strategy for the hierarchical classification of very large data sets. The power of our heuristic solution is due to the multiresolution decomposition of a self-organizing algorithm. We use a first coarse classification step to break the input data set into affordable independent subsets, which allows scaling down the complexity of MSA and at the same time enables the parallel computation of these MSA independent tasks.
The first step of this approach is aimed at identifying coarse but homogeneous groups based on the dipeptide composition of the sequences and the second step uses each coarse group as input to the original SOTA classification algorithm, enabling us to deal with large groups of sequences.
21.5.1 The Problem
There have been various attempts at grouping sequences systematically with different objectives, e.g., UniGene builds a non-redundant set of gene-oriented clusters from GenBank [72]; ClusTr aims to produce clusters of protein families [40]; and iProClass [76] establishes comprehensive family relationships and structural/functional features of proteins. In all cases, the strategy for clustering involves some form of "all-against-all" comparison, which is computationally expensive: O(N^2) at least, N being the number of sequences in the comparison, which is certainly a major concern in view of the spectacular growth in the number of biological sequences in the databases. In addition, each comparison step involves sequence alignments whose complexity is O(L^2), L being the sequence length. Both the high computational cost and the nonlinear behavior of the algorithmic complexity with respect to the number of sequences are behind the interest in developing high performance computing approaches to this problem. Unsupervised neural networks, and in particular self-organizing maps (SOMs) [39], provide a robust and accurate approach to the clustering of large amounts of data that replaces the "all-against-all" strategy with "all-against-the-nodes", which can be performed in virtually linear run times [30]. In fact, SOMs have been used for classifying large datasets of proteins [19]. Nevertheless, since the SOM is a topology-preserving neural network [23] and the number of clusters is arbitrarily fixed from the beginning, it tends to be strongly influenced by the number of items. Thus, if some particular family of proteins is overrepresented, SOMs will produce an output in which this type of data populates the vast majority of clusters. The Self-Organizing Tree Algorithm (SOTA) [15], a divisive clustering method, was proposed to organize prealigned sequences [30], showing good recovery of the natural cluster structure of the data set. However, the requirement of prealigned sequences demands all-against-all multiple alignments, which is unfeasible for large datasets. To solve this problem a two-step approach can be devised: preidentifying coarse sequence groups prior to the use of SOTA. Thus, a coarse classification will divide the complete set into well-defined groups, and once the coarse groups are determined, a multiple sequence alignment of each group is performed and used as input to the original SOTA algorithm, enabling large groups of sequences to be dealt with. Thus, for a test case of 1000 sequences, using the original SOTA procedure, almost 0.5 x 10^6 sequence alignments need to be computed; assuming that ten coarse groups composed of one hundred sequences on average are formed, the computational cost will only be around 0.5 x 10^5 pairwise sequence alignments, representing a one order of magnitude reduction on the original case. Moreover, when working over a whole database, such as SWISS-PROT with 80,000 sequences (version 38), and assuming around 2000 groups/families with an average of 40 sequences per group, the original case would require more than 3 x 10^9 sequence comparisons, whereas with a preselection of coarse groups, only 1.5 x 10^6 comparisons are needed. It is worth observing that, while the number of new sequences grows exponentially, the number of new families is expected to grow linearly. This means that the proposed
However, to be effective, the initial step must necessarily be based on a fast method of estimating distances between sequences, such as dipeptide-frequency-based distances [74][75][31]. Additionally, the point at which the coarse classification should be stopped to give way to the second, fine-classification procedure was estimated from the distribution of random dipeptide distances, as the threshold that produces homogeneous clusters of sequences, i.e., groups of sequences with similar dipeptide distribution.
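As an illustration of how such a fast distance can be computed, the sketch below builds the 400-component dipeptide-frequency vector of a protein sequence and compares two vectors with a plain Euclidean distance. The actual method uses a modification of the Euclidean distance (see Figure 21.4) whose exact form is not reproduced here, so this should be read only as a minimal sketch.

```python
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries

def dipeptide_frequencies(seq):
    """Return the 400-component dipeptide-frequency vector of a sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:              # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]

def euclidean_distance(u, v):
    """Plain Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Example with two short (artificial) sequences
d = euclidean_distance(dipeptide_frequencies("MKTAYIAKQR"),
                       dipeptide_frequencies("MKTAYLAKQR"))
```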
21.5.2 The Procedure
In summary, the procedure is composed of three steps (see Figure 21.4): first, a pre-selection grouping strategy, named SOTAdp, based on dipeptide coding, with a cluster-homogeneity criterion used for the definition of groups; second, the coarse groups of sequences obtained in the first step are aligned; and third, the aligned groups are used as input to the original SOTA algorithm.
Fig. 21.4 General strategy. In step [A] sequences are represented by their dipeptide frequencies, and a modification of the Euclidean distance is used as a similarity measure, without any previous treatment of the sequences. In step [B] MSA is applied over reduced sets of sequences, producing an important computational space reduction. Finally, in step [C] the classic SOTA algorithm is applied to refine the classification.
SOTA is based on both the SOM [39] and the growing cell structures by Fritzke (1994) [23]. SOTA is an unsupervised neural network with a hierarchical topology.
The final result is a map in the shape of a binary tree where sequences are mapped to terminal nodes that contain their average values. The initial system is composed of two external elements, denoted as cells, connected by an internal element that we will call a node. Each node is a vector with the same size as the data used (400 in the case of dipeptide frequencies, or the length of the aligned sequences, depending on the case). The algorithm proceeds by expanding the binary tree topology, starting from the terminal node with the most heterogeneous population of associated input data. Two new descendants are generated from this heterogeneous cell, which then changes its state to that of an internal node. The series of steps performed until a terminal node generates two descendants is called a cycle. During a cycle, nodes are repeatedly adapted to the input data. This process of successive cycles generating descendant nodes can be stopped at the desired level of heterogeneity, thus producing a classification of the data down to a given hierarchical level (a schematic sketch of this growth cycle is given below). If no stop criterion is used, a complete hierarchical classification of the whole data set is obtained.

SOTA is used in both the initial and the final steps. In the first case, SOTAdp uses k-peptide frequencies (k = 2, k being the number of consecutive residues) for encoding the sequence information [74]. In the last step, each sequence position is related to the probability of finding a given residue in that position. Homogeneity is used in both cases as the stopping criterion. For SOTAdp, a cluster is homogeneous when it is formed of sequences whose dipeptide distances are below the homogeneity threshold; in the third step, homogeneity is evaluated through the silhouette index [57]. In the second step, an MSA is required in order to produce a set of sequences of the same length (by introducing gaps in the sequences) while maximizing the similarity between residues in the vertical composition of the group. As described previously, the MSA follows a three-step approach:

1. Determination of the cross-similarity matrix, which demands n(n + 1)/2 calculations of pairwise similarity values.

2. Calculation of the alignment topology.

3. Solution of the n - 1 sequence and/or cluster alignments, following the order given by the topology.
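A highly simplified sketch of the SOTA growth cycle described above is given next. It only illustrates the control flow (repeatedly split the most heterogeneous terminal node until every terminal node satisfies a homogeneity threshold) and deliberately omits the neural adaptation of the node vectors; the class, function names, heterogeneity measure, and toy split rule are all ours.

```python
import statistics

class Node:
    def __init__(self, items):
        self.items = items            # data vectors mapped to this node
        self.children = []            # empty for terminal nodes (cells)

def heterogeneity(node):
    """Mean distance of the node's items to their centroid (toy measure)."""
    dim = len(node.items[0])
    centroid = [statistics.mean(v[i] for v in node.items) for i in range(dim)]
    return statistics.mean(
        sum((a - b) ** 2 for a, b in zip(v, centroid)) ** 0.5 for v in node.items
    )

def split(node):
    """Toy split: distribute the items between two new descendant cells."""
    half = max(1, len(node.items) // 2)
    node.children = [Node(node.items[:half]), Node(node.items[half:])]

def sota_like(data, threshold):
    if not data:
        return []
    root = Node(list(data))
    terminals = [root]
    while True:
        # pick the terminal node with the most heterogeneous population
        worst = max(terminals, key=heterogeneity)
        if heterogeneity(worst) <= threshold or len(worst.items) < 2:
            break                     # desired level of homogeneity reached
        split(worst)                  # one "cycle": the cell becomes an internal node
        terminals.remove(worst)
        terminals.extend(worst.children)
    return terminals                  # terminal nodes play the role of clusters

clusters = sota_like([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]], threshold=0.2)
```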
21.5.3 The Parallel Solution

Although the identification of coarse groups of sequences avoids an MSA of the whole set, the number of sequences involved in this type of study still demands computational resources high enough to justify a high performance computing approach to the problem. A careful runtime analysis of the three-step approach is shown in Figure 21.5. From this analysis it is concluded that most of the CPU time is demanded by the MSA step, with SOTA a distant second and only a small requirement for SOTAdp. This suggests focusing parallelization efforts on the MSA step. Moreover, SOTAdp runs only once, whereas MSA and SOTA run for each of the groups produced by SOTAdp.
It is also noticeable that an MSA is performed for each of the coarse but homogeneous groups formed by SOTAdp. Of the three steps in which the MSA proceeds (producing the similarity matrix, computing the alignment topology, and solving the alignments), the intermediate step, the topology, has no appreciable effect on the computing demand. The natural parallel solution considers each coarse cluster (produced by SOTAdp) as a distribution unit, or task (see Figure 21.6). Thus, the first step, SOTAdp, will run sequentially, and the parallel processing will start after the coarse groups have been formed. Each branch of the topology is fully assigned to a given processor, in which the MSA and then SOTA are applied, thus amounting to a coarse-grained parallel solution.
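A minimal sketch of this coarse-grained scheme is shown below. The msa() and sota() routines stand in for the real ones and are reduced here to trivial placeholders, and the "largest clusters first" ordering anticipates the priority rule discussed next.

```python
from concurrent.futures import ProcessPoolExecutor

def msa(cluster):           # placeholder for the real multiple sequence alignment
    return cluster

def sota(aligned):          # placeholder for the SOTA refinement of one group
    return aligned

def solve_cluster(cluster):
    """Depth-first treatment of one coarse cluster: MSA, then SOTA."""
    return sota(msa(cluster))

def parallel_coarse_grained(clusters, workers=4):
    # Submit the largest (most expensive) clusters first to limit load imbalance.
    ordered = sorted(clusters, key=lambda c: sum(len(s) for s in c), reverse=True)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_cluster, ordered))

if __name__ == "__main__":
    groups = [["MKTAYI", "MKTAYL"], ["GGHHKL", "GGHHKI", "GGHHRI"]]
    print(parallel_coarse_grained(groups, workers=2))
```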
Fig. 21.5 Application profiling. On the left, we have drawn the general procedure. SOTAdp produces coarse groups. For each one of these groups an MSA is performed in three steps (computing the per-sequence similarity matrix, determining the alignment topology, and performing the alignment). Finally, the classical SOTA is applied over each set of prealigned sequences to refine the classification. The computational cost (right-hand side) arises from MSA and includes the computation of the similarity matrix.
Because a high number of sequences is expected, enough tasks are available for distribution in this coarse-grained parallel strategy. However, the computational cost of solving a coarse cluster is a function of the number of sequences belonging to that cluster, which can differ significantly between clusters. For this reason a priority level is assigned to each task to ensure that large tasks are not left until last for launching in the parallel system; the priority is derived from the number of sequences in the cluster and their average length, so that larger clusters are launched first. The main drawback of the simple initial solution proposed in the above paragraphs is the large size of the distribution unit (the parallel task). This makes the strategy quite sensitive to the number of tasks and can produce an unbalanced load distribution with very poor results. As an effective way of reducing this problem, the MSA subtask can be divided into two minor tasks: computing the similarity matrix and
solving the sequence alignments (the latter includes computing the topology). At the same time, the SOTA applied over each branch of aligned sequences (produced by the SOTAdp topology) is also a distribution unit. Thus, for this second approach, we may differentiate up to four types of tasks: [T1] SOTAdp, to gather coarse clusters; [T2] completion of the similarity matrix of a given cluster; [T3] solution of the sequence alignments, including determination of the alignment topology; and [T4] SOTA using a set of aligned sequences as input. It is worth observing that in the first, coarse-grained solution each cluster produced by SOTAdp was completely assigned to one processor (i.e., solved depth-first). In the second approach, the different subtasks needed to solve a cluster can be computed on different processors (i.e., solved breadth-first). The parallel processing proceeds as follows. First, [T1] SOTAdp runs to obtain a given number m of clusters, which produce, from the parallel perspective, m new tasks of type T2 for inclusion in the queue of unsolved parallel tasks. The parallel threads pick tasks from this queue. When a T2 task is completed, a new T3 task is produced for inclusion in the queue, and when a T3 task ends, a T4 task is produced. The pending-tasks queue is also managed with a priority scheme that assigns higher priority to longer tasks. However, to avoid long coarse groups monopolizing processors, we introduce an additional priority criterion, the task type, which has the effect of launching first those tasks that will produce new tasks to be solved, thus increasing the number of tasks available for distribution, avoiding processor inactivity and ensuring better performance.
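The queue-driven scheme just described can be sketched as follows. This is a sequential simulation (in the real system several threads pick tasks from the shared queue concurrently); the task types, the priority rule, and the way a finished T2 task spawns a T3 task and T3 a T4 follow the text above, while the three worker routines are reduced to placeholders.

```python
import heapq

# Lower tuples are served first: task types that spawn further work (T2, T3)
# take precedence over T4, and within a type larger clusters go first.
TYPE_RANK = {"T2": 0, "T3": 1, "T4": 2}

def priority(kind, n_sequences):
    return (TYPE_RANK[kind], -n_sequences)

def similarity_matrix(cluster):    # placeholder for the real T2 computation
    return [[0.0] * len(cluster) for _ in cluster]

def align(cluster, matrix):        # placeholder for topology + alignments (T3)
    return cluster

def sota(aligned):                 # placeholder for the final SOTA step (T4)
    return aligned

def run(coarse_clusters):
    queue, results = [], {}
    for cid, cluster in enumerate(coarse_clusters):   # clusters come from T1 (SOTAdp)
        heapq.heappush(queue, (priority("T2", len(cluster)), cid, "T2", cluster))
    while queue:
        _, cid, kind, payload = heapq.heappop(queue)
        cluster = coarse_clusters[cid]
        if kind == "T2":           # similarity matrix of one cluster
            heapq.heappush(queue, (priority("T3", len(cluster)), cid, "T3",
                                   similarity_matrix(payload)))
        elif kind == "T3":         # alignment topology and alignments
            heapq.heappush(queue, (priority("T4", len(cluster)), cid, "T4",
                                   align(cluster, payload)))
        else:                      # T4: SOTA over the aligned sequences
            results[cid] = sota(payload)
    return results
```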
Fig. 21.6 Alternative parallel approaches (coarse-, medium-, and fine-grained). On the left, the initial approach distributes a complete branch of the topology to processors (distribution units are identified by filled rectangles). In the middle, three different tasks are proposed for each branch, and finally, on the right, small distribution tasks are proposed in a fine-grained solution with good parallel results.
SOTA and SOTAdp have strong data dependencies that prevent their parallelization. However, tasks T2 and T3, accounting for between 80% and 90% of the computational load, are still expensive as individual tasks, and they can be reduced in size. Task T2 can be divided into subtasks that partially calculate the similarity matrix (a row-block distribution is used). Task T3 poses a more serious challenge: the interdependence of its tasks, which means that despite there being g - 1 single-alignment tasks in each cluster of g sequences, only a small number of them may be solved simultaneously because of the data dependencies. To solve this problem we use a distribution governed by a task graph (see Figure 21.6), which delays the launch of the alignment of a given node until the previous nodes have been solved.
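The task-graph idea for T3 can be sketched as follows: the alignment topology is a binary tree whose internal nodes each represent one alignment task, and a node becomes ready only when both of its children have been solved; all ready nodes may be launched in parallel. The tree representation and the trivial align_pair() placeholder are ours, so this is only a sketch of the dependency handling, not of the alignment itself.

```python
from concurrent.futures import ThreadPoolExecutor

class GuideNode:
    """A node of the alignment topology: a leaf holds one sequence,
    an internal node must wait until both of its children are solved."""
    def __init__(self, seq=None, left=None, right=None):
        self.seq, self.left, self.right = seq, left, right

def align_pair(a, b):               # placeholder for the real group/group alignment
    return a + b                    # here: simply concatenate the two groups

def solve_topology(root, workers=4):
    done = {}                       # node -> (pseudo-)aligned group of sequences
    pending = []                    # internal nodes still to be solved

    def collect(node):
        if node.seq is not None:
            done[node] = [node.seq]        # leaves are trivially "solved"
        else:
            pending.append(node)
            collect(node.left)
            collect(node.right)
    collect(root)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # A node is ready only when both children have already been solved.
            ready = [n for n in pending if n.left in done and n.right in done]
            futures = {n: pool.submit(align_pair, done[n.left], done[n.right])
                       for n in ready}
            for n, f in futures.items():   # wait for this wave before the next
                done[n] = f.result()
                pending.remove(n)
    return done[root]

topology = GuideNode(left=GuideNode(seq="MKTAYI"),
                     right=GuideNode(left=GuideNode(seq="MKTAYL"),
                                     right=GuideNode(seq="MKTSYL")))
print(solve_topology(topology))     # ['MKTAYI', 'MKTAYL', 'MKTSYL']
```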
The strategy was first evaluated for its ability to reproduce well-accepted biological knowledge, using a short data set composed of 48 protein sequences belonging to five different protein families. These groups, Catalase, Citrate, Globin, Histone, and G-proteins, have typical dipeptide distributions, which makes it difficult to separate the groups. It is interesting to study the behavior of the algorithm under these conditions, since they represent one of the less favorable cases for a coarse classification based on dipeptide distances. Figure 21.7 (left) depicts the results of running SOTAdp over this dataset using different thresholds for the first step. As expected, the number of clusters and the cluster size depend on the threshold stringency. As we increase the coverage value, more (and smaller) clusters are formed, because a smaller distance is required for sequences to belong to the same cluster. Because the method looks for coarse clusters, good results are obtained for coverage values of 90% and even of 70%, with a correct separation of the sequences.

Second, a massive test over the SwissProt database [5] was used to contrast the efficiency and usefulness of the strategy, and to illustrate not only how the method works on large sets of sequences, but also its ability to produce results with affordable CPU resources and in a time that allows the routine use of the procedure.

Finally, a synthetic test set was used to evaluate the performance of the parallel strategy in terms of speedup. This test was formed by 10,000 synthetic sequences organized in 50 groups. These sequences were generated from a set of 50 random sequences of 400 residues in length, over which a point mutation was applied to produce closely related partner sequences. A number of different workload tests (formed by partial collections of 1000, 5000, and 10,000 sequences, named Test-S1, Test-S5, and Test-S10) were used to explore the behavior of the parallel strategy under different workloads: number of available tasks and computational cost of each task. The parallelization of the code was achieved by using a thread-based approach that results in very portable code. In fact, throughout this work three specific and quite different parallel platforms were used: an SGI Origin 2000 (http://www.scai.uma.es) with up to 16 processors, an IBM RS/6000 SP, and a LAN-based cluster of PCs. Figure 21.7 (right) shows the speedup achieved for the three synthetic subsets in this test (Test-S1, Test-S5, and Test-S10).

The most important question to ask with this test is whether the strategy retains good performance levels when additional processors are added. This is positively confirmed by the results shown below.
It is evident that reducing the number of sequences to very low levels (e.g., Test-S1 on 8 PEs) substantially affects the results, with a speedup of only 4.6. On the other hand, increasing the number of sequences for the same number of PEs always has a positive effect: using 8 PEs, the efficiency rises to 72.5% with 5000 sequences and to 80% with 10,000 sequences (recall that efficiency is the speedup divided by the number of processors, so these figures correspond to speedups of about 5.8 and 6.4, against 4.6/8, roughly 57%, for the smallest set).

In this example we have described both the use of metaheuristics to solve a classical bioinformatics problem and the strategy for its parallel implementation. As in the previously discussed cases, a detailed runtime analysis and profiling of the application has allowed an efficient parallel solution to be implemented. Efficiency has been obtained by combining advanced dynamic scheduling algorithms, priority-based load distribution, and task-graph organization for resolving data dependencies. The result is a parallel implementation that is very efficient when applied to large sets of sequences and that scales well when the number of processors increases.
Fig. 21.7 On the left, coarse cluster composition for the set of 48 protein sequences. Three different thresholds have been used: on the left a restrictive value with k = 0.7, in the middle the normal case with k = 0.9, and on the right a permissive k = 0.99, k being a factor multiplied by the threshold surface. As can be observed, the number of coarse clusters and the number of sequences in each cluster (the cluster size) depend on the value of k. For normal values, a correct separation of sequences is obtained. On the right, parallel speedup curves are shown for the three data sets formed by 1000, 5000, and 10,000 synthetic sequences running on the SGI Origin 2000 (similar results were obtained with the IBM RS/6000 SP described in the text). The shortest set maintains a reasonable efficiency up to 8 processors; its coarse groups have on average only 20 sequences, which are easily managed by the pool. Better results are observed for the larger sets, and when necessary a more powerful machine can be used to its full capacity.
21.6 CONCLUSIONS

21.6.1 Reusing Available Software

Parallel computing has shown itself to be an effective way to deal with some of the hardest problems in bioinformatics. The use of parallel computing schemes allows the computational resources to be scaled to the size of the problem being tackled, and there is already a broad gallery of parallel examples from which we can learn and import strategies, allowing the development of new approaches to challenges awaiting solution without the need to 're-invent the wheel'. Today, it should be natural to "think in parallel" when writing software, and to exploit the implicit parallelism of most applications when more than one processor is available. In most bioinformatics applications, owing to the high number of independent tasks, the simplest approaches are often the most effective. They scale better in parallel, are the least expensive, and are the most portable across different parallel architectures.
21.6.2 New Challenges

However, several other problems in bioinformatics remain unsolved as far as parallel computing is concerned. Parallel metaheuristic approaches appear to be a promising alternative for addressing hard computational problems in the field, and they represent attractive challenges for biologists and computer scientists in the years ahead.
Acknowledgments

We would like to thank Dr. Jacek Bardowsky from the IBB-PAN in Warsaw, Poland, for valuable comments on the biological aspects of this document, and Dr. Eladio Gutierrez from the Computer Architecture Department of the University of Malaga for sharing his LaTeX know-how. This work has been partially supported by project GNV-5, Integrated Bioinformatics, UMA, from Genoma España.
REFERENCES

1. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J. (1989). "The Molecular Biology of the Cell" (2nd ed.). New York, NY: Garland Publishing.

2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). "Basic local alignment search tool", J. Mol. Biol. 215:403-410.
3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research 25(17):3389-3402.

4. Argonne National Laboratory (1993), "Early experiences with the IBM SP1 and the High Performance Switch" (Internal report ANL-93141).

5. Bairoch, A. and Apweiler, R. (2000). "The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000", Nucleic Acids Res. 28(1):45-48.

6. Butler, R. and Lusk, E. (1992), "User's guide to the P4 programming system" (Argonne National Laboratory Technical report TM-ANL-92/17).

7. Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967), "Phylogenetic analysis: models and estimation procedures", Am. J. Hum. Genet. 19:233-257.

8. Ceron, C., Dopazo, J., Zapata, E.L., Carazo, J.M. and Trelles, O. (1998), "Parallel implementation for DNAml program on message-passing architectures", Parallel Computing and Applications, 24(5-6), 701-716.

9. Collins, J.F. and Coulson, A.F.W. (1987), "Nucleic Acid and Protein Sequence Analysis: A Practical Approach", IRL Press, Oxford, 327-358.

10. Coulson, A.F.W., Collins, J.F. and Lyall, A. (1987), "Protein and nucleic acid sequence database searching: a suitable case for parallel processing", Computer J., (39), 420-424.

11. Corpet, F. (1988), "Multiple sequence alignments with hierarchical clustering", Nucleic Acids Research, (16), 10881-10890.

12. Date, S., Kulkarni, R., Kulkarni, B., Kulkarni-Kale, U. and Kolaskar, A. (1993), "Multiple alignment of sequences on parallel computers", CABIOS 9(4), 397-402.

13. Deshpande, A.S., Richards, D.S. and Pearson, W.R. (1991), "A platform for biological sequence comparison on parallel computers", CABIOS (7), 237-247.

14. Doolittle, R.F., Hunkapiller, M.W., Hood, L.E., Devare, S.G., Robbins, K.C., Aaronson, S.A. and Antoniades, H.N. (1983), "Simian sarcoma onc gene, v-sis, is derived from the gene (or genes) encoding platelet derived growth factor", Science, 221, 275-277.

15. Dopazo, J. and Carazo, J.M. (1997), "Phylogenetic reconstruction using a growing neural network that adopts the topology of a phylogenetic tree", J. Mol. Evol. 44:226-233.

16. Edmiston, E. and Wagner, R.A. (1987), "Parallelization of the dynamic programming algorithm for comparison of sequences", Proc. of 1987 International Conference on Parallel Processing, pp. 78-80.
17. Felsenstein, J. (1973), "Maximum-likelihood estimation of evolutionary trees from continuous characters", Society of Human Genetics 25:471-492.

18. Felsenstein, J. (1988), "Phylogenies from molecular sequences: inference and reliability", Annu. Rev. Genet. 22:521-565.

19. Ferran, E.A. and Ferrara, P. (1992), "Clustering proteins into families using artificial neural networks", Comput. Appl. Biosci. 8:39-44.

20. Fisher, D., Bachar, O., Nussinov, R. and Wolfson, H. (1992), "An efficient automated computer vision based technique for detection of three-dimensional structural motifs in proteins", J. Biomol. Struct. Dyn. 9:769-789.

21. Flynn, M.J. (1972), "Some computer organizations and their effectiveness", IEEE Transactions on Computers, vol. C-21, 948-960.

22. Foster, I. (1994), "Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering", Addison-Wesley Publishing Company, Inc. (On-line version: http://wotug.ukc.ac.uk/parallel/books/addison-wesley/dbpp/).

23. Fritzke, B. (1994), "Growing cell structures - a self-organizing network for unsupervised and supervised learning", Neural Networks 7:1141-1160.

24. Gras, R., Hernandez, D., Hernandez, P., Zangger, N., Mescam, Y., Frey, J., Martin, O., Nicolas, J. and Appel, R.D. (2003), "Cooperative metaheuristics for exploring proteomic data", Artificial Intelligence Review, 20:95-120.

25. Gribskov, M. and Devereux, J. (1991), "Sequence Analysis Primer", UWBC Biotechnical Resource Series.

26. Godia, T.M., Lange, K., Miller, P.L. and Nadkarni, P.M. (1992), "Fast computation of genetic likelihoods on human pedigree data", Human Heredity, 42:42-62.

27. Gonnet, G.H., Cohen, M.A. and Benner, S.A. (1992), "Exhaustive matching of the entire protein sequence database", Science, (256), 1443-1445.

28. Gotoh, O. (1993), "Optimal alignment between groups of sequences and its application to multiple sequence alignment", CABIOS 9(2), 361-370.

29. Gupta, S.K., Schaffer, A.A., Cox, A.L., Dwarkadas, S. and Zwaenepoel, W. (1995), "Integrating parallelization strategies for linkage analysis", Computers and Biomedical Research, (28), 116-139.

30. Herrero, J., Valencia, A. and Dopazo, J. (2001), "A hierarchical unsupervised growing neural network for clustering gene expression patterns", Bioinformatics, 17:126-136.

31. Hobohm, U. and Sander, C. (1995), "A sequence property approach to searching protein databases", J. Mol. Biol. 251:390-399.
32. Holm, L. and Sander, C. (1993), "Protein structure comparison by alignment of distance matrices", J. Mol. Biol. 233:123-138.

33. Holm, L. and Sander, C. (1994), "Searching protein structure databases has come of age", Proteins 19:165-173.

34. Hwang, K. and Xu, Z. (1998), "Scalable Parallel Computing: Technology, Architecture, Programming", McGraw-Hill Series in Computer Engineering.

35. Ishikawa, M., Toya, T., Hoshida, M., Nitta, K., Ogiwara, A. and Kanehisa, M. (1993), "Multiple sequence alignment by parallel simulated annealing", Comput. Appl. Biosci., 9(3):267-273.

36. Jones, R. (1992), "Sequence pattern matching on a massively parallel computer", CABIOS (8), 377-383.

37. Jülich, A. (1995), "Implementations of BLAST for parallel computers", CABIOS 11(1), 3-6.

38. Kim, J., Pramanik, S. and Chung, M.J. (1994), "Multiple sequence alignment using simulated annealing", Comput. Appl. Biosci., 10(4):419-426.

39. Kohonen, T. (1997), "The Self-Organizing Maps", Berlin, Springer.

40. Kriventseva, E.V., Fleischmann, W., Zdobnov, E.M. and Apweiler, R. (2001), "CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins", Nucleic Acids Res. 29:33-36.

41. Lander, E., Mesirov, J.P. and Taylor, W. (1988), "Protein sequence comparison on a data parallel computer", Proc. of 1988 International Conference on Parallel Processing, pp. 257-263.

42. Li, W.-H. and Graur, D. (1991), "Fundamentals of Molecular Evolution", Sunderland, MA: Sinauer Associates, Inc.

43. Lipman, D.J. and Pearson, W.R. (1985), "Rapid and sensitive protein similarity searches", Science, 227, 1435-1441.

44. Martino, R.L., Johnson, C.A., Suh, E.B., Trus, B.L. and Yap, T.K. (1994), "Parallel computing in biomedical research", Science (256), 902-908.

45. Michalewicz, Z. and Fogel, D. (2000), "How to Solve It: Modern Heuristics", Springer-Verlag.

46. Miller, P.L., Nadkarni, P.M. and Bercovitz, P.A. (1992), "Harnessing networked workstations as a powerful parallel computer: a general paradigm illustrated using three programs for genetic linkage analysis", Comput. Applic. Bioscience, (8), 141-147.

47. Miller, W. (1993), "Building multiple alignments from pairwise alignments", CABIOS 9(2), 169-176.
48. Needleman, S.B. and Wunsch, C.D. (1970), "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J. Mol. Biol., 48, 443-453.

49. Notredame, C. and Higgins, D.G. (1996), "SAGA: sequence alignment by genetic algorithm", Nucleic Acids Res., 24(8):1515-1524.

50. Notredame, C. (2002), "Recent progress in multiple sequence alignment: a survey", Pharmacogenomics, 3(1):131-144.

51. Olsen, G.J., Matsuda, H., Hagstrom, R. and Overbeek, R. (1994), "fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood", CABIOS 10:41-48.

52. Orengo, C.A., Brown, N.P. and Taylor, W.T. (1992), "Fast structure alignment for protein databank searching", Proteins, 14:139-167.

53. Ott, J. (1991), "Analysis of Human Genetic Linkage", The Johns Hopkins University Press, Baltimore and London (Revised Edition).

54. Pearson, W.R. and Lipman, D.J. (1988), "Improved tools for biological sequence comparison", Proc. Natl. Acad. Sci. USA (85), 2444-2448.

55. Rechenmann, F. (2000), "From data to knowledge", Bioinformatics, 16(5), 411.

56. Rodriguez, A., Fraga, L.G. de la, Zapata, E.L., Carazo, J.M. and Trelles, O. (1998), "Biological sequence analysis on distributed-shared memory multiprocessors", 6th Euromicro Workshop on Parallel and Distributed Processing, Madrid, Spain.

57. Rousseeuw, P.J. (1987), "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis", J. of Computational and Applied Mathematics, 20:53-65.

58. Schena, M., Shalon, D., Davis, R.W. and Brown, P.O. (1995), "Quantitative monitoring of gene expression patterns with a complementary DNA microarray", Science, 270, 467-470.

59. Shindyalov, I.N. and Bourne, P.E. (1998), "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path", Protein Engineering 11(9), 739-747.

60. SGI (1999), "SGI Bioinformatics performance report", at http://www.sgi.com/chembio.

61. Smith, T.F. and Waterman, M.S. (1981), "Identification of common molecular subsequences", J. Mol. Biol., 147, 195-197.

62. Sturrock, S.S. and Collins, J. (1993), "MPsrch version 1.3", BioComputing Research Unit, University of Edinburgh, UK.
63. Sunderam, V., Manchek, R., Dongarra, J., Geist, A., Beguelin, A. and Jiang, W. (1993), "PVM 3.0 User's Guide and Reference Manual", Oak Ridge National Laboratory.

64. Tanenbaum, A. (1999), "Structured Computer Organization", Prentice-Hall, Fourth Edition.

65. Tariq, R., Yi, W. and Li, K.B. (2004), "Multiple sequence alignment using tabu search", Proceedings of the Second Asia-Pacific Bioinformatics Conference, Dunedin, New Zealand.

66. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994), "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", Nucleic Acids Research 22:4673-4680.

67. Trelles, O., Zapata, E.L. and Carazo, J.M. (1994a), "Mapping strategies for sequential sequence comparison algorithms on LAN-based message passing architectures", in Lecture Notes in Computer Science, vol. 796: High Performance Computing and Networking, Springer-Verlag, Berlin, 197-202.

68. Trelles, O., Zapata, E.L. and Carazo, J.M. (1994b), "On an efficient parallelization of exhaustive sequence comparison algorithms on message passing architectures", CABIOS 10(5), 509-511.

69. Trelles, O., Andrade, M.A., Valencia, A., Zapata, E.L. and Carazo, J.M. (1998a), "Computational space reduction and parallelization of a new clustering approach for large groups of sequences", Bioinformatics, 14(5), 439-451.

70. Trelles, O., Ceron, C., Wang, H.C., Dopazo, J. and Carazo, J.M. (1998b), "New phylogenetic venues opened by a novel implementation of the DNAml algorithm", Bioinformatics, 14(6), 544-545.

71. Trelles, O. (2001), "On the parallelization of bioinformatics applications", Briefings in Bioinformatics, 2(2), 181-194.

72. Wheeler, D.L., Church, D.M., Lash, A.E., Leipe, D.D., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Tatusova, T.A., Wagner, L. and Rapp, B.A. (2002), "Database resources of the National Center for Biotechnology Information: update", Nucleic Acids Res. 30(1):13-16.

73. Wilbur, W.J. and Lipman, D.J. (1983), "Rapid similarity searches in nucleic acid and protein databanks", Proc. Natl. Acad. Sci. USA, 80, 726-730.

74. Wu, C., Berry, M., Shivakumar, S. and McLarty, J. (1995), "Neural networks for full-scale sequence classification: sequence encoding with singular value decomposition", Machine Learning, 21, 177-193.
75. Wu, C. (1997), "Artificial neural networks for molecular sequence analysis", Computers & Chemistry, 21(4), 237-256.

76. Wu, C., Xiao, C., Hou, Z., Huang, H. and Barker, W.C. (2001), "iProClass: an integrated, comprehensive and annotated protein classification database", Nucleic Acids Res. 29:52-54.

77. Zhang, C. and Wong, A.K. (1997), "A genetic algorithm for multiple molecular sequence alignment", Comput. Appl. Biosci., 13(6):565-581.
Index
A
2-path network design problem, 337 3-index assignment problem, 3 19 ACO for the Reconfigurable Mesh, 190 ACS, 398 Adaptive memory, 298,304 AGA, 398 Algorithmic Parallelism in PSA, 269 Aminoacids, 5 I9 ANOVA, 52 Ant Colony Optimization, 25 ANTabu. 398 Antennae Placement and Configuration, 500 APGA, 398 Applications of Parallel Hybrid Metaheuristics, 358 Artificial Ant Problem and PGP, 134 ASPARAGOS, 116 ATeam, 398 B
Bioinfonnatic Applications, 526 Bioinformaticsat a Glance, 5 I9
C CAC, 398 CEDA at Work, 214 Cellular Estimation of Distribution Algorithms, 21 1 Cellular Model in PGP, 132 Cellular Networks, 506 Central memory, 293,298-30 I Centralized strategy. 326 CGA, 398 CGP-Based Classifier, 146 Circuit Encoding Using Trees, 141 Classification of Parallel EDAs, 216 Classificationwith Cellular GP, 143 Classifying Hybrid Metaheuristics, 350 CMPTS, 398 Coarse-grained.304 CommunicationTopology in PGP, 132
Component Exchange Among Metaheuristics. 29 Computational Effort of PGP, 135 Condor, 67,74,375 Condor-G, 74 Constructive Heuristics, 4 Control cardinality, 292 Cooperative-threadparallel implementation. 338 Cooperative Search, 30 COP,405 CoPDEB, 116,398 CORBA. 72 COSEARCH, 398 CPM-VRPTW, 398 CPTS, 398 Crossover Operator in GP, 129 CS, 398
D Data Mining and Cellular GP, 143 DBsrch, 526 Decision Trees in PGP, 145 DGA, 116,398 DGENESIS, 116 Distributed Estimation of Distribution Algorithms, 207 Distributed Resource Machine (DRM), 276 Distributed strategy, 325 Diversification. 290.297,299,30 1 DNA, 519 DNA arrays, 5 18 DNA strands, 5 I7 DPM, 398 DREAM, 116,276,405 island model, 276 Distributed Resource Machine, 276 Dynamic load balancing, 3 19 Dynamic Problems, 479
E ECO-GA. 116 Efficiency, 46 Elite solutions, 299.301, 323 Empirical distribution, 320 Empirical distribution plot, 32 1
551
552
INDEX
EnGENEer, 116 Evaluating the Computational Effort, 5 1 Even Parity Problems and PGP, 134 Evolutionary Computation, 19 Evolutionary Parallelism, 275 Exons, 5 19 Explorative Local Search Methods. I3 Exponential distribution, 320
F FGA, 398 Fitness-Level Parallelism in GP, I3 I Flynn’s Taxonomy, 64 FPGA Circuit Encoding, 14 1 Fractional Factorial Design, 50 Frameworks, 305 Frameworks for Heterogeneous Metaheuristics, 404 FTPH, 398 Full Factorial Design. 50 Function Set in GP, 128
c
GAACO. 398 GALOPPS, I16 GAMAS, 116,398 GAME, 116 GDGA. 116 Genetic algorithms, 280 Genetic Programming, 127 GENITOR 11, 116 Globus, 67,73 GP, 128 GP Sets, 142 Graph Coloring, 451 Graph partitioning, 300-30 1 Graph partitioning, 452 Graph planarization, 321 GRASP, 3 15 Greedy Randomized Adaptive Search Procedure, 3 15 construction for 2-path network design, 337 construction for 3-index assignment. 328 construction for job shop scheduling, 332 construction phase, 3 15 greedy function, 3 15 local search for 2-path network design, 337 local search for 3-index assignment. 328 local search for job shop scheduling, 333 local search phase, 3 15 multiple-walk cooperative-thread with path-relinking, 325 restricted candidate list, 3 I5 with path-relinking, 325 Grid, 375 H
HDGA, 398 Heterogeneous Metaheuristics Survey, 397 HFC-EA, 398 HM4C. 398 Hy3.398 Hy4, 116,398 Hybrid GRASP with path-relinking, 325 Hybrid parallelism, 270,275 Hybridization with GRASP, 325 GRASP with path-relinking, 323
I IiGA. 1 16,398GAindependent-thread parallel implementation Independent-thread parallel implementation, 338 Independent search, 296 Information exchange, 293,298 Integration with Tree Search Methods and Constraint Programming, 30 Intensification, 290,301 Irregular Computational Patterns in Bioinformatics, 532 Island Models of PGP. 131 J Java RMI, 72 Java Threads, 69 Job shop scheduling. 277 Job shop scheduling problem, 3 19. 321, 33 1
K Kendall Square Research, 3 17
L Learning the Probability Distribution in Parallel EDAs, 207 Levels of Parallelism in EDA, 204 Local search, 3 16 Local Search Methods, 5 Location Problems, 464 LSM, 80
M MACS-VRPTW. 398 MAGMA, 406 MALLBA, 56, 116,499 MARS, 1I6 MAS-DGA, 406 Massive parallelization, 274 Master-slave, 294295.304 MastedSlave in EDA, 206 MAX-SAT, 3 17 Maximum covering, 32 1 Maximum independent set. 321 Maximum weighted satisfiability, 321 MAXSAT Problem. I 19 MCAA, 398
INDEX
Message-Passing Interface, 3 17,327 Metaheuristics, 6 Metropolis Algorithm, 268 Microarray, 521 Middleware, 67 Migration Model of PES. 160 Migration Parameters in PGP, 132 Migration Policy in a dEDA, 208 MIMD, 63 MISD, 63 Mobile Network Design, 470 MOP, 371 MPI, 71,317, 327 Message Passing Toolkit, 327 MSA, 538 Multi-level cooperative search, 300-301 Multi-Population Models of PGP, 13I Multicommodity Network Design. 469, 507 Multiple-walk cooperative-thread, 316,323,334 Multiple-walk independent-thread, 316-317, 329 Multiple-walk independent-thread GRASP with path-relinking, 333 Multiple independent runs, 269,273 Multiple Independent Runs in PSA, 272 Multithread search, 297,301,304 Mutation Operator in GP, 129
N Neighborhood function, 277,279 Neighborhood operator, 290 Nested Populations of PES, 164 Network Assignment and Dimensioning, 504 Network Design, 300,468,496 Network Routing, 502 New Challenges in Bioinfonnatics. 543 New Trends in PGAs, 117 NGA (DAGA2), 398 No-Free-Lunch Theorem, 348 Non-generational Models of PES, 163 OpenMP, 69 OR-Library, 3 17 OR Library, 278 ORLib, 333
P P-ACO for FPGAs, 192 PAES, 372 ParadisEO, 116,405 PARAGENESIS, 116 Parallel Ant Colony Optimization, 90 Parallel Computer Architectures: Taxonomy, 522 Parallel Estimated Distribution Algorithms, 91 Parallel Evolution Strategies, 88 Parallel Evolutionary Algorithms, 159 Parallel Fitness Evaluation in EDAs, 206
553
Parallel Genetic Algorithms. 87, 112 Parallel Genetic Programming, 89 Parallel GRASP, 83 Parallel GRASP, 317 Parallel Heterogeneous Metaheuristic, 396 Parallel Heterogeneous Metaheuristics, 93 Parallel Hybrid Metaheuristics, 355 Parallel Metaheuristics for the VRP, 476 Parallel Metaheuristics in Bioinformatics, 534 Parallel Metrics, 46 Parallel Models for EDAs, 206 Parallel Models of EAs, 86 Parallel Models of LSMs, 81 Parallel moves, 270, 274 Parallel Moves in PSA, 273 Parallel Multiobjective Models, 379 Parallel Multiobjective Optimization, 94 Parallel Multiobjective Steady State GA, 378 Parallel Programming Models, 524 Parallel Scatter Search, 92,225 with Multiple Combinations, 241 with Single Combination, 239 Parallel Simulated Annealing, 8 I, 269-270 Parallel Tabu Search, 82 Parallel Virtual Machine, 3 17 Parallel VNS, 84,251 Parallelism by data, 270 Parallelism by Data in PSA, 272 Parallelization test, 322 Parallelizing PAES, 377 Pareto front, 371 Pareto optimum, 371 Path-relinking, 323 for 2-path network design, 337 for 3-index assignment, 328 for job shop scheduling, 333 symmetric difference, 324 PATS, 398 PEGAsuS, 116 Performance Measures for Parallel Metaheuristics. 44 CPU time, 44 Speedup, 44 Wall-clock time, 44 Performance Metrics for Multiobjective Optimization, 38 I PGA-Cellular Model, 114 PGA-Distributed Model, 113 PGA-Independent Runs Model, 112 PGA-Master-Slave Model, 112 PGA, 116,398 PGA Models of Population Sizing, 43 1 PGAPack, I 1 6 PGP Benchmark Problems, 134 PHM, 396 PHMH. 398 PHYLIP package, 533
554
INDEX
Phylogenetic analysis, 520 Phylogenetic trees, 5 17 Physical Parallelism in PSA, 269 Placement and Routing in FPGAs. 138 PMSATS, 398 Pollination Model of PES, 161 Population Based ACO. 175 Principles of EAs, 85 Protein Folding, 520 Pthreads, 68 PTS, 398 PVM, 71,317
Q Q-Q plot, 32 1 quantile-quantile plot, 321 QAP. 3 17 QAPLIB, 3 17 quadratic assignment, 32 I quadratic assignment problem, 3 I7 Quadratic assignment, 294 Quadratic Assignment, 462
R Radio Link Frequency Assignment, 505 RCL, 315 Real Life Applications of PGP, 137 Regular Computational Pattern: Database Searching, 526 Reliability and Connectivity Problems, 496 Reporting Results in Parallel Metaheuristics, 53 Reusing Availat le Software in Bioinfonnatics, 543 RPL2, 1 I6 S
SAGA, 398 Satisfiability Problems, 459 Scalability, 282 Scaled Speedup. 47 Scaleup, 47 Scatter Search Components, 23 I , 235 Search control, 292 Search differentiation, 293 Search space decomposition, 295 Semi-Regular Computational Patterns, 53 I Sequential fan candidate list. 295 Serial Fraction, 48 Set Covering Applications, 458 Set Partitioning and Covering, 457 Set Partitioning Applications, 457 SGA-Cube, 116 SGI Challenge, 319, 327 Short Introduction to Parallel Metaheuristics, 448 SIMD. 63 Simulated Annealing, 9,267-268 Boltzmann Distribution, 269
distributed Evolutionary Simulated Annealing, 276 Evolutionary Simulated Annealing, 275,277 SISD, 63 Skeletons. 305 Sockets, 70 SOTA, 538 SOTAdp, 538 Speedup, 44 Statistical Analysis of Metaheuristics, 52 Steiner problem, 3 I7 Steiner Tree Problem, 456,498 Structured Genetic Algorithms, 110 SUN-SPARC 10,317 Survey of Hybrid Algorithms, 348 Survey of Parallel GAS, 116 Symbolic Regression Problem and PGP, 134 Synchronous, 295 Synchronous parallelization, 292 T
t-tests, 52 Tabu Search, 11,289 Task Scheduling Strategies, 525 Taxonomy, 291 Taxonomy of Parallel Heterogeneous Metaheuristics, 400 Taxonomy of Speedup Measures, 45 TECHS, 398 Telecommunication Network Design, 469 Terminal Set in GP, 128 The p-Median Problem, 229,258,467 The Feature Subset Selection Problem, 232 The Genetic Programming Algorithm, 128 The Traveling Salesman Problem, 471 The VNS Metaheuristic, 248 Theoretical distribution, 320 Theoretical Effects of Migration in PGAs. 434 Theory of Master-Slave Parallel GAS,428 Theory of Multipopulation Parallel GAS, 430 Theory on Cellular Parallel GAS, 437 Three-index assignment problem, 32 I , 327 TPSA, 398 Traffic assignment, 3 19 Trajectory versus population based methods, 8 Tree-Structured Individuals in GP, 128 Two-parameter exponential distribution, 320 Typical GP Problems, 134
U Uncapacited Facility Location Problem. 275,278 V
Various Applications on Telecoms, 508 VNS for the p-Median, 258 VRP, 294,298 vehicle routing, 294 .Jehicle routing probiems. 476 with time constraints, 477