FOURTH IFIP INTERNATIONAL CONFERENCE ON THEORETICAL COMPUTER SCIENCE - TCS 2006
IFIP - The International Federation for Information Processing

IFIP was founded in 1960 under the auspices of UNESCO, following the First World Computer Congress held in Paris the previous year. An umbrella organization for societies working in information processing, IFIP's aim is two-fold: to support information processing within its member countries and to encourage technology transfer to developing nations. As its mission statement clearly states, IFIP's mission is to be the leading, truly international, apolitical organization which encourages and assists in the development, exploitation and application of information technology for the benefit of all people.

IFIP is a non-profit-making organization, run almost solely by 2500 volunteers. It operates through a number of technical committees, which organize events and publications. IFIP's events range from an international congress to local seminars, but the most important are:

• The IFIP World Computer Congress, held every second year;
• Open conferences;
• Working conferences.

The flagship event is the IFIP World Computer Congress, at which both invited and contributed papers are presented. Contributed papers are rigorously refereed and the rejection rate is high. As with the Congress, participation in the open conferences is open to all and papers may be invited or submitted. Again, submitted papers are stringently refereed. The working conferences are structured differently. They are usually run by a working group and attendance is small and by invitation only. Their purpose is to create an atmosphere conducive to innovation and development. Refereeing is less rigorous and papers are subjected to extensive group discussion.

Publications arising from IFIP events vary. The papers presented at the IFIP World Computer Congress and at open conferences are published as conference proceedings, while the results of the working conferences are often published as collections of selected and edited papers.
Any national society whose primary activity is in information processing may apply to become a full member of IFIP, although full membership is restricted to one society per country. Full members are entitled to vote at the annual General Assembly. National societies preferring a less committed involvement may apply for associate or corresponding membership. Associate members enjoy the same benefits as full members, but without voting rights. Corresponding members are not represented in IFIP bodies. Affiliated membership is open to non-national societies, and individual and honorary membership schemes are also offered.
FOURTH IFIP INTERNATIONAL CONFERENCE ON THEORETICAL COMPUTER SCIENCE - TCS 2006

IFIP 19th World Computer Congress, TC-1, Foundations of Computer Science, August 23-24, 2006, Santiago, Chile
Edited by Gonzalo Navarro Universidad de Chile, Chile
Leopoldo Bertossi Carleton University, Canada
Yoshiharu Kohayakawa Universidade de Sao Paulo, Brazil
Springer
Library of Congress Control Number: 2006927819 Fourth IFIP International
Conference on Theoretical Computer Science - TCS 2006
Edited by G. Navarro, L. Bertossi, and Y. Kohayakawa
p. cm. (IFIP International Federation for Information Processing, a Springer Series in Computer Science)
ISSN: 1571-5736 / 1861-2288 (Internet) ISBN-10: 0-387-34633-3 ISBN-13: 978-0-387-34633-5 eISBN-10: 0-387-34735-6 Printed on acid-free paper
Copyright © 2006 by International Federation for Information Processing. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 9 8 7 6 5 4 3 2 1 springer.com
Preface
The papers contained in this volume were presented at the fourth edition of the IFIP International Conference on Theoretical Computer Science (IFIP TCS), held August 23-24, 2006 in Santiago, Chile. They were selected from 44 papers submitted from 17 countries in response to the call for papers. A total of 16 submissions were accepted as full papers, yielding an acceptance rate of about 36%. Papers solicited for IFIP TCS 2006 were meant to constitute original contributions in two general areas: Algorithms, Complexity and Models of Computation; and Logic, Semantics, Specification and Verification. The conference also included six invited presentations: Marcelo Arenas (Pontificia Universidad Catolica de Chile, Chile), Jozef Gruska (Masaryk University, Czech Republic), Claudio Gutierrez (Universidad de Chile, Chile), Marcos Kiwi (Universidad de Chile, Chile), Nicola Santoro (Carleton University, Canada), and Mihalis Yannakakis (Columbia University, USA). The abstracts of those presentations are included in this volume. In addition, Jozef Gruska and Nicola Santoro accepted our invitation to write full papers related to their talks. Those two surveys are included in the present volume as well. TCS is a biennial conference. The first edition was held in Sendai (Japan, 2000), followed by Montreal (Canada, 2002) and Toulouse (France, 2004). TCS is organized by IFIP TC1 (Technical Committee 1: Foundations of Computer Science). TCS 2006 was part of the 19th IFIP World Computer Congress (WCC 2006), constituting the TC1 Track of WCC 2006, and it was sponsored by TC1 and the Center for Web Research (CWR), at the Department of Computer Science of the University of Chile. We thank the local WCC organizers and TC1 for their support in the organization of IFIP TCS. We also thank the members of the Program Committee and the additional reviewers for providing timely and detailed reviews. Finally, we thank TC1 for inviting us to chair this edition of TCS.
Santiago, Chile
Gonzalo Navarro, TC1 Track Chair & PC Cochair Leopoldo Bertossi, PC Cochair Yoshiharu Kohayakawa, PC Cochair
TCS 2006 Organization
Technical Committee 1 (TC1) Chair Mike Hinchey
NASA, USA
WCC 2006 TC1 Track Chair Gonzalo Navarro
Center for Web Research Department of Computer Science Universidad de Chile, Chile
Program Committee Chairs Gonzalo Navarro
Center for Web Research Department of Computer Science Universidad de Chile, Chile
Leopoldo Bertossi
School of Computer Science Carleton University, Canada
Yoshiharu Kohayakawa
Department of Computer Science Institute of Mathematics and Statistics Universidade de Sao Paulo, Brazil
Program Committee Members

Amihood Amir, Bar-Ilan University (Israel)
Marcelo Arenas, Pontificia Universidad Catolica de Chile (Chile)
Diego Calvanese, Free University of Bolzano/Bozen (Italy)
Marsha Chechik, University of Toronto (Canada)
Jan Chomicki, University at Buffalo (USA)
Josep Diaz, Universitat Politecnica de Catalunya (Spain)
Volker Diekert, Universitat Stuttgart (Germany)
Thomas Eiter, Technische Universitat Wien (Austria)
David Fernandez-Baca, Iowa State University (USA)
Esteban Feuerstein, Universidad de Buenos Aires (Argentina)
Gianluigi Greco, Universita della Calabria (Italy)
Jozef Gruska, Masaryk University in Brno (Czech Republic)
Claudio Gutierrez, Universidad de Chile (Chile)
Joos Heintz, Universidad de Buenos Aires (Argentina)
Douglas Howe, Carleton University (Canada)
Klaus Jansen, Universitat Kiel (Germany)
Deepak Kapur, University of New Mexico (USA)
Michael Krivelevich, Tel Aviv University (Israel)
Ravi Kumar, Yahoo! Research (USA)
Leonid Libkin, University of Toronto (Canada)
Satyanarayana V. Lokam, Microsoft Research (USA)
Ernst Mayr, Technische Universitat München (Germany)
Daniel Panario, Carleton University (Canada)
Rene Peralta, NIST (USA)
Jean-Eric Pin, LIAFA (CNRS, Universite Paris 7, France)
Bruce Reed, McGill University (Canada)
Marie-France Sagot, INRIA (France)
Nicola Santoro, Carleton University (Canada)
Philip Scott, Ottawa University (Canada)
Torsten Schaub, Universitat Potsdam (Germany)
Angelika Steger, ETH Zurich (Switzerland)
Jayme Szwarcfiter, Universidade Federal do Rio de Janeiro (Brazil)
Wolfgang Thomas, RWTH Aachen (Germany)
Jorge Urrutia, Universidad Nacional Autonoma de Mexico (Mexico)
Alfredo Viola, Universidad de la Republica (Uruguay)
External Reviewers

Eugene Asarin, Inge Battenfeld, Marie-Pierre Beal, Flavia Bonomo, Liming Cai, Roberto Caldelli, Ivana Cerna, Luc Devroye, Florian Diedrich, Mitre Dourado, Olga Gerber, Mihaela Gheorghiu, Stefan Göller, Cristina Gomes Fernandes, Serge Grigorieff, Arie Gurfinkel, David Ilcinkas, Philippe Jorrand, Elham Kashefi, Werner Kuich, Markus Lohrey, Sylvain Lombardy, Anil Maheshwari, Arnaldo Mandel, Robert W. McGrail, Marc Moreno Maza, Michele Mosca, Thomas Noll, Pedro Ortega, Holger Petersen, Jose Miguel Piquer, Ivan Rapaport, Philipp Rohde, Mauro San Martin, Alan Schmitt, Stefan Schwoon, Peter Selinger, Ralf Thole, Imrich Vrto, Steven (Qiang) Wang
WCC 2006 Local Organization Mauricio Solar
Universidad de Santiago, Chile
Contents
Part I  Invited Talks

Locality of Queries and Transformations
Marcelo Arenas ........................................................ 3

From Informatics to Quantum Informatics
Jozef Gruska .......................................................... 5

RDF as a Data Model
Claudio Gutierrez ..................................................... 7

Adversarial Queueing Theory Revisited
Marcos Kiwi ........................................................... 9

Distributed Algorithms for Autonomous Mobile Robots
Nicola Santoro ........................................................ 11

Recursion and Probability
Mihalis Yannakakis .................................................... 13

Part II  Invited Papers

From Informatics to Quantum Informatics
Jozef Gruska .......................................................... 17

Distributed Algorithms for Autonomous Mobile Robots
Giuseppe Prencipe, Nicola Santoro ..................................... 47

Part III  Contributed Papers

The Unsplittable Stable Marriage Problem
Brian C. Dean, Michel X. Goemans, Nicole Immorlica .................... 65

Variations on an Ordering Theme with Constraints
Walter Guttmann, Markus Maucher ....................................... 77

BuST-Bundled Suffix Trees
Luca Bortolussi, Francesco Fabris, Alberto Policriti .................. 91

An O(1) Solution to the Prefix Sum Problem on a Specialized Memory Architecture
Andrej Brodnik, Johan Karlsson, J. Ian Munro, Andreas Nilsson ......... 103

An Algorithm to Reduce the Communication Traffic for Multi-Word Searches in a Distributed Hash Table
Yuichi Sei, Kazutaka Matsuzaki, Shinichi Honiden ...................... 115

Exploring an Unknown Graph to Locate a Black Hole Using Tokens
Stefan Dobrev, Paola Flocchini, Rastislav Královič, Nicola Santoro .... 131

Fast Cellular Automata with Restricted Inter-Cell Communication
Martin Kutrib, Andreas Malcher ........................................ 151

Asynchronous Distributed Components: Concurrency and Determinacy
Denis Caromel, Ludovic Henrio ......................................... 165

Decidable Properties for Regular Cellular Automata
Pietro Di Lena ........................................................ 185

Symbolic Determinisation of Extended Automata
Thierry Jéron, Hervé Marchand, Vlad Rusu .............................. 197

Regular Hedge Model Checking
Julien d'Orso, Tayssir Touili ......................................... 213

Completing Categorical Algebras
Stephen L. Bloom, Zoltán Ésik ......................................... 231

Reusing Optimal TSP Solutions for Locally Modified Input Instances
Hans-Joachim Böckenhauer, Luca Forlizzi, Juraj Hromkovič, Joachim Kneis, Joachim Kupke, Guido Proietti, Peter Widmayer ......... 251

Spectral Partitioning of Random Graphs with Given Expected Degrees
Amin Coja-Oghlan, Andreas Goerdt, André Lanka ......................... 271

A Connectivity Rating for Vertices in Networks
Marco Abraham, Rolf Kötter, Antje Krumnack, Egon Wanke ................ 283

On PTAS for Planar Graph Problems
Xiuzhen Huang, Jianer Chen ............................................ 299

Index ................................................................. 315
Part I
Invited Talks
Locality of Queries and Transformations (Invited Talk)

Marcelo Arenas*
Center for Web Research & Computer Science Department, Pontificia Universidad Católica de Chile, Escuela de Ingeniería - DCC143, Casilla 306, Santiago 22, Chile. marenas@ing.puc.cl
Abstract Locality notions in logic say that the truth value of a formula can be determined locally, by looking at the isomorphism type of a small neighborhood of its free variables. Such notions have proved to be useful in many applications, especially in computer science. They all, however, refer to isomorphism of neighborhoods, which most local logics cannot test. A more relaxed notion of locality says that the truth value of a formula is determined by what the logic itself can say about that small neighborhood. Or, since most logics are characterized by games, the truth value of a formula is determined by the type, with respect to a game, of that small neighborhood. Such game-based notions of locality can often be applied when traditional isomorphism-based locality cannot. In the first part of this talk, we show some recent results on game-based notions of locality. We look at two, progressively more complicated locality notions, and we show that the overall picture is much more complicated than in the case of isomorphism-based notions of locality. In the second part of this talk, we concentrate on the locality of transformations, rather than queries definable by formulas. In particular, we show how the game-based notions of locality can be used in data exchange settings to prove inexpressibility results.
Partially supported by FONDECYT grant 1050701 and the Millennium Nucleus Center for Web Research, Grant P04-067-F, Mideplan, Chile. Please use the following format when citing this chapter: Arenas, M., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), p. 3.
From Informatics to Quantum Informatics (Invited Talk)

Jozef Gruska*
Faculty of Informatics, Masaryk University, Brno, Czech Republic. gruska@fi.muni.cz
Abstract During recent years, the exploration of quantum information processing and communication science and technology has gained significant momentum, and it has turned out quite clearly that the paradigms, concepts, models, tools, methods and outcomes of informatics play a very important role in it. They not only help to solve problems that quantum information processing and communication encounter, but they bring into these investigations a new quality, to such an extent that one can now acknowledge the emergence of quantum informatics as an important area of fundamental science, with contributions not only to quantum physics but also to (classical) informatics. The main goal of the talk will be to demonstrate the emergence of quantum informatics as a very fundamental, deep and broad science, its outcomes, and especially its main new fascinating challenges, from the informatics and physics points of view: especially challenges in the search for new primitives and computation modes, a new quality concerning the efficiency and feasibility of computation and communication, a new quality concerning quantum cryptographic protocols in a broad sense, and also the very new and promising area of quantum formal systems for programming, semantics, reasoning and verification. The talk is targeted at informaticians who are pedestrians in the quantum world, but would like to see what the new driving forces in informatics are, and where and how they drive us.
* Support of the grants GACR 201/04/1153 and MSM0021622419 is acknowledged. Please use the following format when citing this chapter: Gruska, J., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), p. 5.
RDF as a Data Model (Invited Talk)

Claudio Gutierrez*
Center for Web Research, Computer Science Department, Universidad de Chile, Blanco Encalada 2120, 3er piso, Santiago, Chile. cgutierr@dcc.uchile.cl
Abstract The Resource Description Framework (RDF) is the W3C recommendation language for representing metadata about Web resources. It is the basic data layer of the Semantic Web. The original design was influenced by the Web, library, XML and knowledge representation communities. The driving idea was a language to represent information in a minimally constraining and flexible way. It turns out that the impact of the proposal goes far beyond the initial goal, particularly as a model for representing information with a graph-like structure. In the first half of the talk we will review RDF as a database model, that is, from a data management perspective. We will compare it with two data models developed by the database community which have strong similarities with RDF, namely, the semistructured and the graph data models. We will focus the comparison on data structures and query languages. In the second half of the talk, we will discuss some of the challenges posed by RDF to the Computer Science Theory community:

1. RDF as a data model: Database or knowledge base?
2. Abstract model for RDF: What is a good foundation?
3. Concrete (real-life) RDF data: What are the interesting fragments?
4. Theoretical novelties of the RDF data model: Are there any?
5. RDF query language: Can the database experience be of any help?
6. Infrastructure for large-scale evaluation of data management methodologies and tools for RDF: Waiting for something?
7. Storing, indexing, integrity constraints, visualization et al.: Theory is required.
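To make the graph-like nature of the model concrete, here is a minimal, purely illustrative sketch (not tied to any particular RDF library; all names are invented) of RDF data as a set of subject-predicate-object triples with a naive wildcard pattern query:

```python
# A tiny sketch of RDF's data model: a graph is a set of
# (subject, predicate, object) triples; None in a pattern is a wildcard.
# All identifiers below are hypothetical, for illustration only.

triples = {
    ("ex:TCS2006", "ex:heldIn", "ex:Santiago"),
    ("ex:TCS2006", "ex:partOf", "ex:WCC2006"),
    ("ex:Santiago", "ex:locatedIn", "ex:Chile"),
}

def match(pattern, graph):
    """Return all triples in the graph matching a (s, p, o) pattern,
    where None matches anything."""
    s, p, o = pattern
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything the graph says about the subject ex:TCS2006:
for t in sorted(match(("ex:TCS2006", None, None), triples)):
    print(t)
```

This triple-pattern matching is the basic operation underlying the RDF query languages compared in the talk; the semistructured and graph data models mentioned above support analogous path and pattern queries.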
The speaker acknowledges the support of Millennium Nucleus Center for Web Research, Grant P04-067-F, Mideplan, Chile. Please use the following format when citing this chapter: Gutierrez, C., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), p. 7.
Adversarial Queueing Theory Revisited (Invited Talk)

Marcos Kiwi*
Depto. Ing. Matemática & Ctr. Modelamiento Matemático UMI 2807, Universidad de Chile. Blanco Encalada 2120, piso 5, Santiago, Chile. www.dim.uchile.cl/~mkiwi
Abstract We survey over a decade of work on a classical Queueing Theory problem: the long-term equilibrium of routing networks. However, we do so from the perspective of Adversarial Queueing Theory, where no probabilistic assumptions about traffic patterns are made. Instead, one considers a scenario where an adversary controls service requests and tries to congest the network. Under mild restrictions on the adversary, one can often still guarantee the network's stability. We illustrate other applications of an adversarial perspective to standard algorithmic problems. We conclude with a discussion of new potential domains of applicability of such an adversarial view of common computational tasks.
Background In 1996 Borodin et al. [9] proposed a robust model of queueing theory in network traffic. The gist of their proposal is to replace stochastic assumptions about the packet traffic by restrictions on the packet arrival rate, which otherwise can be under the control of an adversary. Thus, they gave rise to what is currently termed Adversarial Queueing Theory (AQT). In it, the time-evolution of the routing network is viewed as a game between an adversary and a packet scheduling protocol. The AQT framework originally focussed on the issue of stability of queueing policies and network topologies. Characterizations and efficient algorithms were developed for deciding stability of a collection of networks for specific families of scheduling policies. Generalizations of the AQT framework were proposed. End-to-end packet delay issues were addressed. Time-dependent network topology variants were considered, etc. We survey a decade of results in AQT. We point to other work where a similar adversarial approach has been successfully developed. We conclude with a discussion of other computational domains where a similar adversarial approach might be fruitfully applied. The author gratefully acknowledges the support of CONICYT via FONDAP in Applied Mathematics and Anillo en Redes. Please use the following format when citing this chapter: Kiwi, M., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), pp. 9-10.
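The stability question at the heart of AQT can be made concrete with a toy simulation (an illustrative sketch, not taken from the survey): a single link served FIFO at one packet per step, against a greedy adversary whose injections are rate-bounded by a token bucket, so that at most r·t + b packets enter in any window of t steps. For r ≤ 1 the backlog stays bounded by the burst; for r > 1 it grows linearly.

```python
# Toy adversarial queueing sketch: one link, FIFO service (one packet per
# time step), and a rate-(r, b) adversary enforced via a token bucket, so
# at most r*t + b packets are injected in any window of t steps.

def simulate(r, b, steps):
    tokens = float(b)   # adversary's remaining injection budget
    queue = 0           # current backlog at the link
    peak = 0            # worst backlog seen
    for _ in range(steps):
        tokens += r
        inject = int(tokens)        # greedy adversary injects all it can
        tokens -= inject
        queue += inject
        if queue > 0:               # FIFO link serves one packet per step
            queue -= 1
        peak = max(peak, queue)
    return peak

print(simulate(r=0.9, b=5, steps=10_000))   # stable: peak stays near the burst b
print(simulate(r=1.2, b=5, steps=10_000))   # unstable: backlog grows with time
```

Of course, the interesting AQT results concern networks of links, where even injection rates below 1 can destabilize some topologies and scheduling policies (e.g., FIFO at arbitrarily low rates [8]); a single link is merely the simplest stable case.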
References

1. W. Aiello, E. Kushilevitz, R. Ostrovsky, and A. Rosen. Adaptive packet routing for bursty adversarial traffic. In Proc. of the ACM Symposium on Theory of Computing, 359-368, 1998.
2. C. Alvarez, M. Blesa, J. Diaz, A. Fernandez, and M. Serna. Adversarial models for priority based networks. In Proc. of the International Symposium on Mathematical Foundations of Computer Science, 142-151, Springer-Verlag, 2003.
3. C. Alvarez, M. Blesa, and M. Serna. A characterization of universal stability in the adversarial queueing model. SIAM J. Comput., 34(1):41-66, 2004.
4. M. Andrews, B. Awerbuch, A. Fernandez, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results and performance bounds for greedy contention resolution protocols. J. of the ACM, 48(1):39-69, 2001.
5. M. Andrews, A. Fernandez, A. Goel, and L. Zhang. Source route and scheduling in packet networks. In Proc. of the IEEE Symposium on Foundations of Computer Science, 2001.
6. E. Anshelevich, D. Kempe, and J. Kleinberg. Stability of load balancing algorithms in dynamic adversarial systems. In Proc. of the ACM Symposium on Theory of Computing, 399-406, 2002.
7. B. Awerbuch, P. Berenbrink, A. Brinkmann, and C. Scheideler. Simple routing strategies for adversarial systems. In Proc. of the IEEE Symposium on Foundations of Computer Science, 158-167, 2001.
8. R. Bhattacharjee, A. Goel, and Z. Lotker. Instability of FIFO at arbitrarily low rates in the adversarial queueing model. SIAM J. Comput., 34(2):318-332, 2005.
9. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. J. of the ACM, 48(1):13-38, 2001.
10. A. Borodin, R. Ostrovsky, and Y. Rabani. Stability preserving transformations: Packet routing networks with edge capacity and speed. In Proc. of the ACM-SIAM Symposium on Discrete Algorithms, 601-610, 2000.
11. A. Charny and J.-Y. Le Boudec. Delay bounds in a network with aggregate scheduling. In Proc. of the International Workshop on Quality of Future Internet Services, 1-13, Springer-Verlag, 2000.
12. I. Chlamtac, A. Farago, H. Zhang, and A. Fumagalli. A deterministic approach to the end-to-end analysis of packet flows in connection-oriented networks. IEEE/ACM T. Network., 6(4):422-431, 1998.
13. D. Gamarnik. Stability of adversarial queues via fluid model. In Proc. of the IEEE Symposium on Foundations of Computer Science, 60-70, 1998.
14. M. Kiwi and A. Russell. The Chilean highway problem. Theor. Comput. Sci., 326(1-3):329-342, 2004.
15. M. Kiwi, M. Soto, and C. Thraves. Adversarial queueing theory with setups. Technical report, Center for Mathematical Modelling, U. Chile, 2006.
16. P.R. Kumar and T.I. Seidman. Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems. IEEE Trans. on Automat. Contr., 35(3):289-298, 1990.
17. J.-Y. Le Boudec and G. Hebuterne. Comments on "A deterministic approach to the end-to-end analysis of packet flows in connection oriented network". IEEE/ACM T. Network., 8(1):121-124, 2000.
18. Z. Lotker, B. Patt-Shamir, and A. Rosen. New stability results for adversarial queuing. In Proc. of the ACM Symposium on Parallel Algorithms and Architectures, 192-199, 2002.
Distributed Algorithms for Autonomous Mobile Robots (Invited Talk)

Nicola Santoro
School of Computer Science, Carleton University. santoro@scs.carleton.ca
Abstract The distributed coordination and control of a team of autonomous mobile robots is a problem widely studied in a variety of fields, such as engineering, artificial intelligence, artificial life, and robotics. Generally, in these areas, the problem is studied mostly from an empirical point of view. Recently, a significant research effort has been, and continues to be, spent on understanding the fundamental algorithmic limitations on what a set of autonomous mobile robots can achieve. In particular, the focus is to identify the minimal robot capabilities (sensorial, motorial, computational) that allow a problem to be solvable and a task to be performed. In this talk we describe the current investigations on the interplay between robot capabilities, computability, and algorithmic solutions of coordination problems by autonomous mobile robots.
Please use the following format when citing this chapter: Santoro, N., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), p. 11.
Recursion and Probability (Invited Talk)

Mihalis Yannakakis*
Department of Computer Science, Columbia University, 455 Computer Science Building, 1214 Amsterdam Avenue, Mail Code 0401, New York, NY 10027.
[email protected]

Abstract We discuss recent work on the algorithmic analysis of systems involving recursion and probability. Recursive Markov chains extend ordinary finite-state Markov chains with the ability to invoke other Markov chains in a potentially recursive manner. They offer a natural abstract model of probabilistic programs with procedures, and generalize other classical well-studied stochastic models, e.g., Multi-type Branching Processes and Stochastic Context-free Grammars. Recursive Markov Decision Processes and Recursive Stochastic Games similarly extend ordinary finite Markov decision processes and stochastic games, and they are natural models for recursive systems involving both probabilistic and nonprobabilistic actions. In a series of recent papers with Kousha Etessami (U. of Edinburgh), we have introduced these models and studied central algorithmic problems regarding questions of termination, reachability, and analysis of the properties of their executions. In this talk we will present some of the basic theory and algorithms.
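A concrete instance of the termination question (an illustrative sketch, not taken from the talk): consider the simplest recursive probabilistic procedure, which returns immediately with probability p and otherwise calls itself twice. Its termination probability is the least non-negative solution of x = p + (1 - p)x², the classical extinction equation of a branching process with two offspring, and it can be computed by monotone fixed-point iteration starting from 0:

```python
def termination_probability(p, iterations=10_000):
    """Least fixed point of x = p + (1 - p) * x**2, iterated from 0.
    Models a procedure that halts with probability p and otherwise
    calls itself twice (a two-offspring branching process)."""
    x = 0.0
    for _ in range(iterations):
        x = p + (1 - p) * x * x
    return x

# Subcritical case: expected number of recursive calls 2*(1-p) < 1,
# so termination is certain (the least root is 1).
print(termination_probability(0.6))
# Supercritical case: the least root is p / (1 - p) < 1.
print(termination_probability(0.4))
```

This monotone iteration converges to the least fixed point from below; the work with Etessami mentioned above studies, among other things, how fast such iterations converge and the complexity of deciding whether the termination probability equals 1.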
Research partially supported by NSF Grant CCF-4-30946. Please use the following format when citing this chapter: Yannakakis, M., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), p. 13.
Part II
Invited Papers
From Informatics to Quantum Informatics

Jozef Gruska*
Faculty of Informatics, Masaryk University, Brno, Czech Republic. gruska@fi.muni.cz
Abstract. Quantum phenomena exhibit a variety of weird, counterintuitive, puzzling, mysterious and even entertaining effects. Quantum information processing tries to make effective use of these phenomena to design new quantum information processing and communication technology, and also to get a better understanding of the quantum and information processing worlds. During recent years, the exploration of quantum information processing and communication science and technology has gained significant momentum, and it has turned out quite clearly that the paradigms, concepts, models, tools, methods and outcomes of informatics play a very important role in it. They not only help to solve problems that quantum information processing and communication encounter, but they bring into these investigations a new quality, to such an extent that one can now acknowledge the emergence of quantum informatics as an important new area of fundamental science, with contributions not only to quantum physics but also to (classical) informatics itself. The main goal of this paper is to demonstrate the emergence of quantum informatics as a very fundamental, deep and broad science, its outcomes, and especially its main new fascinating challenges, from the informatics and physics points of view: especially challenges in the search for new primitives and computation modes, a new quality concerning the efficiency and feasibility of computation and communication, a new quality concerning quantum cryptographic protocols in a broad sense, and also the very new and promising area of quantum formal systems for programming, semantics, reasoning and verification. The paper is targeted towards informaticians who are pedestrians in the mysterious quantum world, but would like to see what the new driving forces in informatics are, and where, why and how they drive us.
In this paper, oriented towards a broad audience, the main mysteries, puzzles and specific features of the quantum world are dealt with, as well as basic models, laws, limitations, results and the state of the art of quantum information processing and communication.
1 Introduction

In quantum computing we witness a merge of arguably the two most important areas of science of the 20th century: quantum physics and informatics. It would

* Support of the grants GACR 201/04/1153 and MSM0021622419 is acknowledged. Please use the following format when citing this chapter: Gruska, J., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), pp. 17-46.
therefore be astonishing if such a merge did not shed new light on both of them and did not bring new great discoveries. This merge is surely bringing new aims, challenges and potentials for informatics, and also new approaches to explore the quantum world. In spite of the fact that it is hard to predict the particular impacts of quantum computing on computing in general, it is quite safe to expect that the merge will lead to important outcomes. Since the very beginning of quantum mechanics, various mysterious and counterintuitive phenomena have been discovered, but the science community did not pay much attention to them, because they looked like innocent features that largely exist due to our still imperfect mathematical model/understanding of the quantum world, or like phenomena whose investigation can be postponed. Randomness of quantum measurement and the resulting collapse of the quantum state being measured, and quantum entanglement and the non-locality in correlations exhibited due to it^, are perhaps the most puzzling ones. Quantum counterfactual effects, with their peculiar consequences^, are even more weird phenomena. In the meantime, the situation has radically changed.
Quantum entanglement has been shown to be useful to perform actions that are not possible in the classical world, such as quantum teleportation (Bennett et al., 1993); to achieve in computation an efficiency that seems to be impossible in the classical world, as in Shor's polynomial-time algorithms for factorization and discrete logarithms (Shor, 1994); to achieve a level of security not possible in the classical world (for example, for classical key generation (Ekert, 1991)); to increase exponentially the efficiency of communication protocols (Raz, 1999); to introduce new important capacities and to increase old capacities of quantum channels (see Gruska (1999-2005) and Nielsen and Chuang (2000) for an overview); and so on. All that is still only a small part of the success story of quantum entanglement, which has been experimentally demonstrated for distances of up to 50 km using fiber (Marcikic et al., 2004) and of up to 13 km over a noisy ground atmosphere (see Peng et al., 2004).

It is, for example, believed, and expected by some, that quantum entanglement will also have large practical impacts, for example, to increase the quality of measurements (see Childs et al., 1999).

To summarize, quantum entanglement is now considered as a new and very important resource for quantum information processing and communication, a resource that has, in addition, the following potentials (see also Gruska 1999-2005, 2003):

- To provide a new gold mine for science and technology;
- To give an edge to quantum versus classical information processing and communication;
- To help to understand better various important physical phenomena.

Surely, the most puzzling and powerful consequence of the existence of entangled quantum states is the non-locality their measurements exhibit. Namely, if a set of particles is in an entangled state and one of the particles is measured, then this measurement immediately influences/determines the results of subsequent measurements of the other particles. There are therefore non-local correlations between the results of the measurements of particles in an entangled state.

^ As formally defined later, entanglement of quantum states is defined using the Hilbert space formalism for quantum phenomena. However, the existence of non-local correlations is an experimentally observed phenomenon and therefore independent of the choice of formalism. At the moment, the only observed non-local correlations are those exhibited by entangled states. This, however, does not exclude that some other non-local correlations will be discovered.

^ The term counterfactual is usually used for things that might have happened, although they did not really happen. An important point is that while classical counterfactuals do not have physical consequences, quantum counterfactuals can have surprisingly big consequences, because the mere possibility that some quantum event might have happened can change the probabilities of obtaining various experimental outcomes. For example, it can be shown that a quantum computer can provide the result of a computation without performing the computation, provided it would provide the same result by really performing the computation (Mitchinson and Jozsa, 1999).
x = y implies a = b

Fig. 1. EPR-box

Quantum nonlocality, exhibited by the measurement of the so-called EPR-state (1/√2)(|00⟩ + |11⟩), can be modelled by the so-called EPR-box shown in Figure 1. There are two parties involved, A and B, widely separated in space, that do not communicate with each other, and an imaginary box with two input-output ports, one for each of the parties. If the party A puts an a into its input port, it immediately gets out an output x, and if the party B puts in an input b, it immediately gets out an output y. The key property of the EPR-box is that if a = b, then x = y, no matter in which order the parties put their inputs in and how much time passes between their entries. The no-communication (no-signaling) condition means that the output of Alice (Bob) does not depend on the input of Bob (Alice). The non-locality exhibited by the EPR-box can be manifested by the measurement of entangled states, namely of the EPR-state.

J. Gruska

However, the non-locality exhibited by the so-called PR-box, shown in Figure 2, where inputs and outputs are always in the relation x · y = a ⊕ b, seems to be beyond the possibilities of the physical world. Indeed, were there a physical system that allowed implementing the PR-box, then any multiparty communication task could be accomplished by transmitting only a single bit (van Dam, 2005), which can indeed be seen as impossible. Interestingly enough, neither of these non-localities allows instantaneous communication, and therefore they do not actually contradict the no-signaling condition of special relativity. The task of understanding nonlocality is one of the most important in current science. In this connection, the recent experiment (Scarani et al., 2000) is of importance, from which it follows that there are reasons to believe that either space-time is an illusion, or free will is an illusion, or, as their experiment confirms, there is a special "quantum information" that travels faster than light (but cannot be used directly to communicate classical information).

Footnote: The no-signaling condition actually says that a local choice of measurements may not lead to observable differences on the other end.

Footnote: The PR-box may seem an artificial construction, but it is not so, and it comes out very naturally when non-classical correlations and their limits are considered. Indeed, the basic scheme is that two parties separated in space, say A and B, that cannot communicate have access to a physical state and can use it to generate correlations. This can be seen as both parties performing one of two randomly chosen measurements; the outcomes of these measurements are then given by random variables, and one asks how much these outcomes can be correlated. Both classical physics and quantum mechanics put certain limits on the strength of such correlations. The limits that any classical theory (i.e., any local hidden variable theory) provides are known as Bell inequalities (Bell, 1964). There are many of them, and among them a special position is held by the so-called CHSH inequality

    ∑_{a,b ∈ {0,1}} Prob(x_a ⊕ y_b = a · b) ≤ 3,

where a and b denote the choices of the measurements of A and B, and x_a, y_b are the outcomes of the measurements. Quantum mechanics allows a violation of this inequality, but only up to the so-called Tsirelson bound 2 + √2. The PR-box captures the mathematically maximal possible violation of this bound. In spite of the fact that van Dam's result strongly indicates/proves the physical impossibility of PR-boxes, they keep being intensively studied. For example, it has also been shown (Short et al., 2005) that the availability of PR-boxes would allow an unconditionally secure oblivious transfer protocol, an important cryptographic primitive. Cerf et al. (2005) have also shown that a single PR-box could be used to simulate the EPR-box, and therefore a maximally entangled state (its measurements), though not an arbitrary two-qubit entangled state, and that the PR-box would be a strictly weaker resource than a bit of communication. The PR-box can also be used to show that the no-cloning theorem holds. PR-boxes have a variety of other surprising and also counterintuitive properties. They are surveyed nicely and referenced well by Scarani (2006). For example, two parties may need 2^n PR-boxes for some tasks that can be performed using n EPR states. In addition, for all natural measures of nonlocality, non-maximally entangled states exhibit more non-locality than maximally entangled states.
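The classical bound of 3 in the CHSH inequality above can be checked by brute force: a local hidden variable strategy amounts to a pair of deterministic functions from inputs to outputs. The following sketch (plain Python, illustrative only) enumerates all such strategies:

```python
from itertools import product

def chsh_score(alice, bob):
    # alice, bob: tuples (output for input 0, output for input 1).
    # Count the input pairs (a, b) on which x_a XOR y_b = a AND b holds.
    return sum(1 for a, b in product((0, 1), repeat=2)
               if alice[a] ^ bob[b] == a * b)

# Maximize over all 4 x 4 deterministic local strategies.
best = max(chsh_score(al, bo)
           for al in product((0, 1), repeat=2)
           for bo in product((0, 1), repeat=2))
print(best)  # 3: the classical (Bell/CHSH) bound

# A PR-box, by definition, would satisfy the condition for all four
# input pairs, i.e., score 4 - beyond both 3 and the Tsirelson bound.
```

Randomized local strategies are mixtures of deterministic ones, so they cannot exceed this maximum either.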
x · y = (a + b) mod 2

Fig. 2. PR-box
Quantum superposition, which stands for the fact that any quantum state is a weighted superposition (with complex numbers as weights - probability amplitudes specifying the probabilities of the transfer from a given state to particular states of the basis) of the states of a basis, is another very special quantum phenomenon. One of its implications is quantum parallelism, which allows, for example, on a single state of n quantum bits, to perform in a single step an action that corresponds, in some sense, to 2^n computation steps in the classical world. For example, one can get, in one step, into the amplitudes of a quantum n-qubit state all values of a function f : {0, ..., 2^n − 1} → {0, ..., 2^n − 1}. There is a certain catch in this result/fact, because there is no way to get all these values faithfully out of the resulting quantum state. However, in some important cases, as in Shor's algorithm for the factorization of an integer n, this does not really matter, because what one needs to compute is only a single value, the period of a properly chosen function f(x) = a^x mod n, and in such a case this massive quantum parallelism is indeed useful. A mysterious fact is why we do not observe superposition and entanglement between objects of the classical world if our world is actually fully quantum.

Footnote: In more technical detail, it works as follows: If f : {0, 1, ..., 2^n − 1} → {0, 1, ..., 2^n − 1}, then the mapping f' : (x, 0) ⇒ (x, f(x)) is one-to-one and therefore there is a unitary transformation U_f such that for any x ∈ {0, 1, ..., 2^n − 1}

    U_f(|x⟩|0⟩) = |x⟩|f(x)⟩.

The state |ψ⟩ = (1/√(2^n)) ∑_{i=0}^{2^n−1} |i⟩|0⟩ can be obtained in a single step, using the Hadamard transform, from the basis state |0^(2n)⟩, and with a single application of the mapping U_f on the state |ψ⟩ we get

    U_f(|ψ⟩) = (1/√(2^n)) ∑_{i=0}^{2^n−1} |i⟩|f(i)⟩.
Hence, in a single computation step, 2^n values of f are computed! We have therefore a really massive parallelism.

This strange situation was well demonstrated long ago by the famous Schrödinger's cat Gedanken experiment, with a cat that is in a superposition of the states |alive⟩ and |dead⟩ - though no one has ever seen a cat that would be both alive and dead. An important agenda of current experimental research is therefore to find some border lines, if they exist at all, between the world in which superposition exists and the one where no superposition can be detected. There have recently been surprising results in such investigations. For example, entanglement has been demonstrated for a group of 10^12 atoms (see Julsgaard et al., 2000) and quantum interference for large molecules (see Brezger et al., 2002). However, there is still a range of several orders of magnitude to explore to find where the border between the classical and quantum worlds lies. Concerning quantum measurement, there are also several mysterious and counterintuitive things. The first one is the fact that the results of quantum measurement are random. Einstein's position was expressed by his famous words God does not roll dice, but equally famous is Bohr's reply The true God does not allow anybody to prescribe what he has to do. Another puzzling fact about quantum measurement is that the theory does not say anything about how much a particular measurement really costs in terms of some physical resources. Because of that it is usually assumed, in efficiency calculations, that a measurement step requires a unit of time. However, this does not seem to be realistic, because sometimes we can look at a quantum measurement as Nature performing, in a "unit of time", a quite complicated computation, which is again against our common sense. Quantum measurement can therefore be seen as a special resource that, if properly used, can do miracles, from the quantum information processing point of view.

Footnote: Of interest in this context are two well-known citations: There is no quantum world. There is only an abstract quantum physical description. It is wrong to think that the task of physics is to find out how Nature is. Physics concerns what we can say about Nature, by N. Bohr, and There is no classical world - there is only quantum world, by D. Greenberger (see Arndt et al., 2005), who actually said: I believe there is no classical world. There is only quantum world. Classical physics is a collection of unrelated insights: Newton's laws, Hamilton's principle, etc. Only quantum theory brings out their connection. An analogy is the Hawaiian Islands, which look like a bunch of islands in the ocean. But if you could lower the water, you would see that they are the peaks of a chain of mountains. That is what quantum physics does to classical physics.

Footnote: In this context other views from Arndt et al. (2005) are of interest: The border between classical and quantum phenomena is just a question of money, by A. Zeilinger; The classical-quantum boundary is simply a matter of information control, by M. Aspelmeyer; and There is no border between classical and quantum phenomena - you just have to look closer, by R. Bertlmann.

Footnote: Experiments performed recently actually imply not only that God does play dice, but that God plays with non-local dice, because the measurement of an entangled state can produce shared randomness; see Gisin (2005).
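The randomness of quantum measurement discussed above can be illustrated by sampling standard-basis outcomes according to the Born rule; a minimal NumPy sketch (the seed and the chosen state are arbitrary illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def measure(state, shots):
    """Sample standard-basis outcomes of a qubit state by the Born rule."""
    probs = np.abs(state) ** 2          # probability = |amplitude|^2
    return rng.choice(len(state), size=shots, p=probs)

plus = np.array([1, 1]) / np.sqrt(2)    # (|0> + |1>)/sqrt(2)
outcomes = measure(plus, shots=10000)
print(outcomes.mean())                   # close to 0.5: outcomes 0 and 1 are equally likely
```

Each individual outcome is irreducibly random; only the frequencies are predicted by the theory.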
2 Basics of quantum information processing and communication

Quantum physics deals with the fundamental entities of physics - particles, like (a) protons, electrons and neutrons (from which matter is built); (b) photons (which carry electromagnetic radiation); and (c) various "elementary particles" which mediate the other interactions of physics. We call all of them particles, in spite of the fact that some of their properties are totally unlike the properties of what we call particles in our ordinary world. (Actually, it is not clear in which sense these "particles" can be said to have properties at all.) It is also clear that quantum physics is an elegant and conceptually simple theory that describes with surprising precision a large spectrum of the phenomena of Nature. Predictions made on the basis of quantum physics have been experimentally verified to 14 orders of precision. No conflict between predictions of the theory and experiments is known. Without quantum physics we cannot explain the properties of superfluids, the functioning of lasers, the color of stars, and so on. Quantum physics is of special interest for informatics for several reasons. One of them is the similarity, in a sense, and close relation between these two areas of science. Indeed, the goal of physics can be seen as being to study the elements, processes, laws and limitations of the physical world. The goal of informatics can then be seen as being to study the elements, processes, laws and limitations of the information world. It is therefore of large importance to explore which of these two worlds, the physical and the information world, is more basic, if either, and what the main relations are between the basic concepts, principles, laws and limitations of these two worlds. Quantum physics can also be seen as an excellent theory for predicting the probabilities of quantum events.
Such predictions are to a large extent based on three simple principles:

P1 To each transfer, from a quantum state φ to a state ψ, a complex number ⟨ψ|φ⟩ is associated, which is called the probability amplitude of the transfer; |⟨ψ|φ⟩|² is then the probability of such a transfer.

P2 If a transfer from a quantum state φ to a quantum state ψ can be decomposed into two subsequent transfers ψ ← φ' ← φ, then the resulting amplitude of the transfer is the product of the amplitudes of the sub-transfers.

P3 If a transfer from a quantum state φ to a quantum state ψ can be performed in several mutually exclusive ways, then the resulting amplitude is the sum of the amplitudes of all these ways.

A Hilbert space H_n is an n-dimensional complex vector space on which the scalar product

    ⟨ψ|φ⟩ = ∑_{i=1}^{n} ψ_i* φ_i

of any two vectors |φ⟩ = (φ_1, φ_2, ..., φ_n)^T and |ψ⟩ = (ψ_1, ψ_2, ..., ψ_n)^T
is defined, as well as the norm of a vector ||φ|| = √⟨φ|φ⟩ and the metric dist(φ, ψ) = ||φ − ψ||. This allows us to introduce on H a topology and such concepts as continuity. Two quantum states are called orthogonal if their scalar product is zero. This is a very important concept, because only orthogonal states are perfectly physically distinguishable. Dirac introduced the following handy notation, the so-called bra-ket notation, to deal with amplitudes, quantum states and linear functionals f : H → C. If ψ, φ ∈ H, then ⟨ψ|φ⟩ is the scalar product of ψ and φ (and an amplitude of going from φ to ψ); |φ⟩ is called a ket-vector - a column vector, an equivalent of φ; ⟨ψ| is a bra-vector - a row vector, a linear functional on H such that

    ⟨ψ|(|φ⟩) = ⟨ψ|φ⟩.

The evolution of a quantum system is described by the linear Schrödinger equation

    iℏ ∂ψ(t)/∂t = H(t)ψ(t),

where ℏ is the (reduced) Planck constant, ψ(t) is the state of the system at time t, and H(t) is a quantum analogue of the Hamiltonian of the classical system. In case H is constant, the Schrödinger equation has the solution ψ(t) = e^{−(i/ℏ)Ht} ψ(0), and from that it follows that a discretized evolution (computation) of any quantum system is performed by a unitary operator; a step of such an evolution can be seen as the multiplication of a unitary matrix A with a vector |ψ⟩, i.e., as A|ψ⟩.

Footnote: A matrix A is unitary if A · A† = A† · A = I, where A† is the matrix obtained from A by transposition followed by the replacement of each element by its complex conjugate.
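That evolution under a constant Hamiltonian is unitary can be checked numerically: for a Hermitian H, the matrix e^{−iHt} built from the spectral decomposition satisfies U · U† = I. A small NumPy sketch (with ℏ set to 1 and a randomly generated H, both illustrative assumptions):

```python
import numpy as np

# A randomly chosen Hermitian "Hamiltonian" (hbar set to 1 for simplicity).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2                # Hermitian by construction

# exp(-iHt) via the spectral decomposition H = V diag(w) V^dagger.
w, V = np.linalg.eigh(H)
t = 0.7
U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# U is unitary: U U^dagger = I, so norms (total probability) are preserved.
print(np.allclose(U @ U.conj().T, np.eye(4)))  # True
```

Unitarity is exactly the statement that the discretized evolution preserves the norm, and hence the probabilities, of quantum states.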
A quantum bit, usually called a qubit, is then a quantum state in H_2, |φ⟩ = α|0⟩ + β|1⟩, where α, β ∈ C are such that |α|² + |β|² = 1 ({|0⟩, |1⟩} is the standard basis of H_2). Important operations on one qubit include the Hadamard transform, represented by the matrix

    H = (1/√2) [ 1  1
                 1 −1 ].
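A qubit and the Hadamard transform are easy to simulate directly as a complex 2-vector and a 2×2 matrix; the following NumPy sketch is illustrative only:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Hadamard transform on one qubit.
H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)

qubit = H @ ket0                 # (|0> + |1>)/sqrt(2): an equal superposition
print(np.abs(qubit) ** 2)        # [0.5 0.5]: equal measurement probabilities

# H is its own inverse: applying it twice returns |0>.
print(np.allclose(H @ qubit, ket0))  # True
```

The normalization condition |α|² + |β|² = 1 is preserved automatically, because H is unitary.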
Now we can say that the essence of the difference between classical computers and quantum computers is in the way information is stored and processed. In classical computers, information is represented on a macroscopic level by bits, which can take one of two values, 0 or 1. In quantum computers, information is represented on a microscopic level using qubits, which can take any of uncountably many values α|0⟩ + β|1⟩, where α, β are arbitrary complex numbers such that |α|² + |β|² = 1. Very important is also the difference between the ways compound classical and compound quantum systems are created. In the classical case, any state of a composed system is composed of the states of the subsystems. This is not so in the quantum case. If a Hilbert space H (H') corresponds to a quantum system S (S'), and {α_i}_i ({β_j}_j) is a basis of H (H'), then the tensor product of H and H', notation H ⊗ H', corresponds to the quantum system composed of S and S', and this Hilbert space has a (standard) basis consisting of all tensor products of the states |α_i⟩ and |β_j⟩. For example, the Hilbert space H_4 can be seen as the tensor product of two one-qubit Hilbert spaces, H_2 ⊗ H_2, and therefore one of its (standard) bases consists of the states |0⟩ ⊗ |0⟩, |0⟩ ⊗ |1⟩, |1⟩ ⊗ |0⟩, |1⟩ ⊗ |1⟩. These states are usually denoted shortly as |00⟩, |01⟩, |10⟩, |11⟩. Another important orthogonal basis in H_4 consists of the following four so-called Bell states:

    |Φ⁺⟩ = (1/√2)(|00⟩ + |11⟩),
    |Φ⁻⟩ = (1/√2)(|00⟩ − |11⟩),
    |Ψ⁺⟩ = (1/√2)(|01⟩ + |10⟩),
    |Ψ⁻⟩ = (1/√2)(|01⟩ − |10⟩).
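The four Bell states can be built directly from tensor (Kronecker) products of the one-qubit basis states; the NumPy sketch below also checks that they form an orthonormal basis of H_4:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
kron = np.kron                       # tensor product of state vectors

# The four Bell states as vectors in the 4-dimensional Hilbert space H_4.
phi_plus  = (kron(ket0, ket0) + kron(ket1, ket1)) / np.sqrt(2)
phi_minus = (kron(ket0, ket0) - kron(ket1, ket1)) / np.sqrt(2)
psi_plus  = (kron(ket0, ket1) + kron(ket1, ket0)) / np.sqrt(2)
psi_minus = (kron(ket0, ket1) - kron(ket1, ket0)) / np.sqrt(2)

bell = np.column_stack([phi_plus, phi_minus, psi_plus, psi_minus])
# Orthonormal basis: the Gram matrix B^dagger B is the identity.
print(np.allclose(bell.conj().T @ bell, np.eye(4)))  # True
```

None of the four Bell states can be written as a tensor product of two one-qubit states; they are the standard examples of (maximally) entangled states.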
Similarly, the (standard) basis states of an n-qubit Hilbert space H_{2^n} are the states |i_1 i_2 ⋯ i_n⟩ = |i_1⟩|i_2⟩ ⋯ |i_n⟩, with i_j ∈ {0, 1}. For orthonormal states |α⟩ and |β⟩, and |γ⟩ = (1/√2)(|α⟩ + |β⟩), we have, for example,

    (1/√2)(|α⟩|α⟩ + |β⟩|β⟩) ≠ |γ⟩|γ⟩ = ½(|α⟩|α⟩ + |β⟩|β⟩ + |α⟩|β⟩ + |β⟩|α⟩).
We can now also say that important properties of classical information are: (a) transmission of information in time and space is very easy; (b) making an unlimited number of copies of information is very easy. On the other hand, important properties of quantum information are: (a) transmission of quantum information in time and space is very difficult; (b) there is no way to make faithful copies of unknown quantum information; (c) attempts to measure quantum information destroy it, in general.
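Property (b), the impossibility of faithful copying, follows already from linearity: a map that clones both basis states is forced, on a superposition, to output an entangled state rather than two copies. A minimal NumPy illustration of this standard no-cloning argument:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def clone_basis(state):
    """A linear map fixed only by C|0> = |00>, C|1> = |11>."""
    return state[0] * np.kron(ket0, ket0) + state[1] * np.kron(ket1, ket1)

plus = (ket0 + ket1) / np.sqrt(2)
linear_result = clone_basis(plus)        # (|00> + |11>)/sqrt(2): an EPR state
true_copy = np.kron(plus, plus)          # what a faithful cloner would output

# Linearity forces the wrong output on superpositions:
# no unitary (hence linear) operation can clone unknown states.
print(np.allclose(linear_result, true_copy))  # False
```

Since every quantum evolution is linear, this mismatch rules out a universal cloner for unknown quantum states.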
3 Outcomes and challenges of quantum computation

The quantum polynomial-time algorithms of Shor, in 1994, which could be used to break important classical cryptosystems, have so far been the main killer applications for quantum information processing. A natural quantum version of the Fourier transform was the main tool, and the quantum Fourier transform has also been used later to design various other quantum algorithms that are more efficient than the most efficient classical algorithms for the same algorithmic problems. The main generalized result is that there are quantum polynomial-time algorithms for the so-called Hidden Subgroup Problem for Abelian groups. Perhaps the most important open problem in the design of quantum algorithms is to determine whether the Hidden Subgroup Problem is also always solvable in polynomial time for non-Abelian groups. Were this true, it would imply, for example, that there is a quantum polynomial-time algorithm also for the graph isomorphism problem. Also of large impact on the design of efficient quantum algorithms was the discovery of Grover (1996), who showed that one can find, in an unordered database of N elements, a unique element satisfying a given condition P in O(√N) quantum steps. His idea was generalized and applied in numerous ways, and also resulted in the so-called amplitude amplification technique. Recently, quantum random walks have gained momentum as a way to design quantum algorithms (see Aharonov et al., 2001). Of interest are also non-traditional modes of quantum computation, such as adiabatic computation (see Farhi et al., 2000). Several ingenious techniques have also been developed to prove lower bounds: for example, the polynomial method (Beals et al., 1998), the quantum adversary method (Ambainis, 2000) and its various variants. They have been used to show a variety of impressive lower bound results (see Gruska, 1999-2005, for an overview).
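Grover's √N search is small enough to simulate directly with the state vector: repeat the oracle sign flip and the inversion about the mean about (π/4)√N times. A NumPy sketch (the database size and marked element are arbitrary illustrative choices):

```python
import numpy as np

N, target = 16, 11                       # unordered "database" of N elements
state = np.ones(N) / np.sqrt(N)          # uniform superposition over all indices

oracle = np.eye(N); oracle[target, target] = -1     # flip the sign of the target
diffusion = 2 * np.full((N, N), 1 / N) - np.eye(N)  # inversion about the mean

for _ in range(int(np.pi / 4 * np.sqrt(N))):        # about sqrt(N) iterations
    state = diffusion @ (oracle @ state)

print(np.argmax(np.abs(state) ** 2))     # 11: the target amplitude dominates
print(round(float(np.abs(state[target]) ** 2), 3))  # success probability > 0.9
```

A classical algorithm must examine about N/2 elements on average; Grover's iteration concentrates the amplitude on the marked element in only about √N steps.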
Footnote: Other quantum generalizations of transforms known from signal processing and applied mathematics have also turned out to be useful for the design of quantum algorithms.

Footnote: The Hidden Subgroup Problem is the following one: Given is an (efficiently computable) function f : G → R, where G is a finite group and R a finite set, and a promise that there exists a subgroup G_0 ≤ G such that f is constant on each left coset and distinct on different cosets of G_0. The task is to find a generating set for G_0 (in polynomial time (in lg |G|) in the number of calls to the oracle for f and in the overall polynomial time).
There are several, and some quite surprising, models of universal quantum computation. The most basic one is that of quantum circuits based on unitary operations, which are defined in a similar way as in the classical case, only the gates have to be quantum, representing quantum unitary operations. Given an algorithmic problem P, in order to solve it using a quantum circuit, one has first to find a unitary operation U_P that solves P and then to create a quantum circuit C_{U_P}, with quantum gates from some universal set of quantum gates, that implements U_P. A variety of special problems concerning quantum computation comes from the fact that quantum unitary operations have to be reversible, that is, such that one can uniquely determine the inputs from their outputs. This seems to be a very special and strong restriction, because of the most basic logical operations only NOT is reversible, and none of the basic arithmetical operations is. An important contribution to the understanding of the computational power of quantum phenomena was a surprising result of Bennett (1973) that says that if a function f is computable by a one-tape Turing machine in time t(n), then there is a 3-tape reversible Turing machine computing, with constant time overhead, the mapping a → (a, g(a), f(a)), where g(a) is so-called garbage that can be removed using a special technique. For classical reversible computations of Boolean functions, universal is the so-called Toffoli operation, or controlled-controlled-not operation, CCNOT(x, y, z) = (x, y, (x ∧ y) ⊕ z). Nature offers many ways - let us call them technologies - in which various quantum information processing primitives can be exhibited, realized and utilized. Since it appears to be very difficult to exploit the potential of Nature for QIP, it is of large importance to explore which quantum primitives form universal sets of primitives.
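That the Toffoli operation is reversible can be checked exhaustively: on the 8 classical states of three bits it acts as a permutation, and it is its own inverse. A small illustrative Python check:

```python
from itertools import product

def ccnot(x, y, z):
    # Toffoli: flip z exactly when x AND y; inputs x, y pass through unchanged.
    return (x, y, (x & y) ^ z)

triples = list(product((0, 1), repeat=3))
images = [ccnot(*t) for t in triples]

# Reversible: the map is a permutation of the 8 classical states,
# so inputs can be uniquely recovered from outputs.
print(sorted(images) == triples)          # True
print(ccnot(*ccnot(1, 1, 0)))             # (1, 1, 0): Toffoli is its own inverse
```

By contrast, AND maps four input pairs to only two outputs, so it destroys information and cannot be realized directly as a unitary operation without extra (garbage) bits.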
Also from the point of view of understanding the laws and limitations of QIP, and of quantum mechanics itself, the problems of finding rudimentary and universal QIP primitives are of large importance. Concerning universal sets of computation primitives, the very basic result says that the single two-qubit operation controlled-not, CNOT(|x⟩|y⟩) = |x⟩|x ⊕ y⟩, and all one-qubit gates form a universal set of gates that can be used to design, for any unitary operation and any given precision ε > 0, a quantum circuit approximating this operation with precision ε. (The catch is that it is very difficult to create the CNOT gate, because such a gate has to be able to transform two separable states into an entangled state.) Universal is also the set of the following three operations: CNOT, Hadamard and σ_z^{1/4}. For computational purposes with classical input and output, universal is also a set of only two simple gates: the Toffoli gate and the Hadamard gate (Shi, 2002). This actually means that in order to get universality for quantum computation, one has only to add the Hadamard gate to the Toffoli gate, which is universal for classical reversible computation. (The Hadamard gate can actually create a perfectly random bit.) It is also known that any n-qubit unitary operation can be implemented by a circuit consisting of O(4^n) CNOT gates and one-qubit gates (see Vartiainen et al., 2003). One of the recent surprising results in QIPC is that circuits with gates performing only measurements are also universal from the computational point of view, and that what is needed for that are measurement gates from only a very small set. Measurement gates can be specified by Hermitian operators, and the measurements then correspond to the orthogonal bases created by the orthogonal sets of eigenvectors of these Hermitian matrices. Actually, a set of only four different Hermitian operators (measurements) is universal (see Perdrix, 2004). Measurement-based computations are probabilistic, up to a Pauli matrix, but this is only a small handicap. Another surprising model of universal computation are the so-called one-way computers, in which the computation starts with a special entangled, so-called cluster state, but then only one-qubit measurements are performed (Raussendorf and Briegel, 2000). All these results indicate that the search for primitives in quantum computation is likely still to be full of surprises and options, which is actually not so strange, because Nature offers so many ways in which quantum information processing processes can be exhibited.

Footnote: There are many ways to create entangled states, for example using various special physical processes. Of importance for understanding the problems with the design of quantum processes is the fact that if CNOT is applied to two simple and separated one-qubit states, then CNOT may produce an entangled state. Indeed, CNOT((1/√2)(|0⟩ + |1⟩)|0⟩) = (1/√2)(|00⟩ + |11⟩). Another surprising way to create an entangled state of two separated particles is so-called entanglement swapping: if particles P_1 and P_2 are in the EPR-state and so are particles P_3 and P_4, then a Bell measurement of particles P_2 and P_3 makes particles P_1 and P_4, which have never interacted before, get into the maximally entangled EPR-state. In other words, the CNOT gate has to be able to entangle two particles that have never interacted before; see Figure 3.

Fig. 3. Entanglement swapping
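The footnoted fact that CNOT can turn two separable states into an entangled one is easy to reproduce numerically: apply Hadamard to the first qubit of |00⟩ and then CNOT, and check that the result is the EPR state. A minimal NumPy sketch (the rank test for entanglement of a pure two-qubit state is standard):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1, 0, 0, 0])
# Hadamard on the first qubit, then CNOT: two separable inputs in, ...
state = CNOT @ np.kron(H, np.eye(2)) @ ket00
print(state)   # approx. [0.707, 0, 0, 0.707]: the EPR state (|00> + |11>)/sqrt(2)

# ... an entangled output: reshaped as a 2x2 matrix of amplitudes it has
# rank 2, so it cannot be written as a tensor product of one-qubit states.
print(np.linalg.matrix_rank(state.reshape(2, 2)))  # 2
```

A product state would give a rank-1 amplitude matrix; rank 2 certifies entanglement for a pure two-qubit state.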
Two types of circuits are of special importance. Universal circuits, for a certain number k of qubits, can perform any unitary operation on k qubits if some classical parameters are fixed appropriately. Such universal circuits, with 3 CNOT gates and 15 elementary rotation gates for the case of two qubits, and with 40 CNOT gates and 98 elementary rotation gates in the case of three qubits, were derived by Vatan and Williams (2003, 2004); see also Gruska (2005). Programmable circuits (sometimes called programmable processors) are another type of circuits that are universal in some restricted sense and that are of theoretical and also of large application interest. The basic idea is similar to that of classical universal circuits: certain inputs form a so-called operation register and are used to specify, through a quantum state, an operation U that is to be performed on the state |φ⟩ given on the remaining inputs, the data register. There are several reasons why such circuits are of importance. They may be universal for a set of operations, and the operation to be performed can be the result of some previous computation. The idea of programmable circuits has limited use in case it is required that the outcome U(φ) is determined uniquely and perfectly, because in such a case, in order for a programmable circuit to be able to perform n unitary operations, the dimensionality of the program space has to be n, so that the circuit is able to uniquely distinguish the program given. More interesting and practical seem to be the cases where the outcomes should be correct only with some (sufficiently large) probability, or should only approximate the correct result, again with a given precision. Approximate programmable circuits also better reflect reality, because circuits with perfect outcomes are only an idealization.
For an overview of the subject and the latest results on approximate programmable circuits that can approximate a set of unitary operations, see Hillery et al. (2005). There are many interesting/important problems associated with such programmable circuits: for example, how to determine the input that makes the circuit/processor produce the best approximation of a given unitary. Of interest and importance are also investigations of what kinds of circuits can be simulated in polynomial time on classical computers. The almost "classical" result of Gottesman and Knill (see Nielsen and Chuang, 2000) says that circuits composed of the CNOT gate, the Hadamard gate and the standard basis measurement, so-called Clifford circuits, can be simulated on classical computers in polynomial time. Recently, Markov and Shi (2005) have shown that a quantum circuit with n gates whose underlying graph has tree-width d can be simulated classically in n^{O(1)} e^{O(d)} time, which is polynomial in n if d = O(lg n). This result has a variety of implications: for example, in classical polynomial time one can simulate any log-depth circuit whose gates apply to nearby qubits only. Another approach to the problem of simulation on classical computers was taken by Somma et al. (2006). They consider special Lie-algebraic models of computation and showed that these models can be efficiently simulated on classical computers in time polynomial in the dimension of the algebra. Their results generalize those on fermionic linear optics computations.
Another very basic model of quantum computation are quantum finite automata. Actually, there are several versions of them. Three very basic problems to explore for models of quantum automata are: (a) What is the class of languages accepted by a given model? (b) Which accepting probabilities can be achieved with a given model of automata? (c) How does the size of automata of the model (the number of states) compare to the size of equivalent minimal deterministic finite automata? Compared with classical finite automata, quantum finite automata have a special strength, due to the power of quantum superposition (parallelism), but also a special weakness, due to the requirement that they have to be reversible. (It is important to notice that the negative impacts of reversibility can be, to a large extent, compensated by a suitable distribution of suitable measurements.) For some models, quantum finite automata accept a smaller class of languages than the regular languages, and for some other models they accept exactly the class of regular languages. Of large importance is what kind of measurements are performed and which measurement policy is used - for example, whether a measurement is performed after each computation step or only at the end of the computation, two extreme options. It has also been shown that in some cases quantum finite automata can be exponentially more succinct than classical deterministic finite automata. However, in some cases the opposite situation occurs. The very basic models of quantum finite automata, so-called one-way (or real-time) quantum automata, are defined similarly to probabilistic automata, only instead of probabilities, probability amplitudes are used, and there is one additional requirement, namely that the overall evolution has to be unitary. More peculiar are quantum two-way automata. In the most basic model, they are a natural generalization of the classical two-way probabilistic finite automata.
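The succinctness of one-way quantum finite automata mentioned above can be illustrated with a toy measure-once automaton: a single rotation per input letter lets two quantum basis states "count" modulo p, where a deterministic automaton needs p states (a hypothetical illustrative construction, not a bounded-error recognizer):

```python
import numpy as np

p = 7                                     # accept unary words of length divisible by 7
theta = np.pi / p
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # one rotation per input letter

def accept_probability(n):
    """Measure-once run: start in |0>, apply U once per letter, accept on outcome 0."""
    state = np.array([1.0, 0.0])
    for _ in range(n):
        state = U @ state
    return state[0] ** 2

print(round(accept_probability(14), 3))   # 1.0: multiples of p are accepted surely
print(round(accept_probability(3), 3))    # < 1: non-multiples accepted with smaller probability
```

Two quantum basis states suffice here because the amplitude, unlike a classical state, can store a continuously rotating phase; the price is that non-multiples are rejected only probabilistically.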
Quantum two-way automata can accept, with high probability, even some non-regular or non-context-free languages. In another model, quantum two-way automata work almost as classical ones; they only have an additional quantum memory, and at each step they either perform a usual classical move and a unitary operation on the state of their quantum memory, or a measurement of the quantum memory is performed that then specifies, in a random way, the next move. Such automata have been shown to be much more powerful than classical probabilistic two-way finite automata (Ambainis and Watrous, 1999), even in the case where the quantum memory is restricted to one qubit (for an overview of concepts and results concerning quantum finite automata, see Gruska (2000)). The very basic model of quantum Turing machines, originally due to Deutsch (1985), is again a modification of that of a probabilistic Turing machine - probabilities are only replaced by probability amplitudes. However, a non-trivial additional requirement is that the overall evolution of a quantum Turing machine has to be unitary. A state of such a quantum Turing machine can be seen as a weighted superposition of many configurations of a classical Turing machine. This model has been used to define basic quantum complexity classes and to develop quantum structural complexity. Such a model has classical inputs and outputs; only its evolution is quantum. Two new quite different models
J. Gruska
of Turing machines are of interest and importance. Both of them have quantum inputs and outputs (as sequences of qubits). One model (Jorrand and Perdrix, 2004) works with one additional qubit as memory and only measurements as operations. Another model is that of quantum Turing machines with classical control and quantum operations (Jorrand and Perdrix, 2004a). The basic philosophy behind many such models is that measurement is the basic tool to make the quantum world perform the computations we need in the classical world. An important challenge concerning quantum computation is to develop a really good model of quantum cellular automata. There have been numerous attempts to do that, with a variety of interesting results, but one can say that the theory of quantum cellular automata is still not in good shape. At the same time, quantum cellular automata are of large importance for quantum physics, because interaction with neighbours is the very basic way Nature works. Those versions of quantum cellular automata that do work well are modifications of the partitioned or block-type classical cellular automata; see Schumacher and Werner (2004) for recent results. Quantum (structural) complexity theory is also being developed, and it is an important part of quantum information processing science. One of the goals of quantum complexity theory is to challenge our basic intuition about how the physical world behaves. One can also say that quantum complexity theory is of great interest because one of its goals is to understand two of the great mysteries of the 20th century: what is the nature of quantum mechanics, and what are the limits of computation. It would be astonishing if a merger of such important areas did not shed light on both of them and did not bring great new discoveries.
Taking a complexity theory perspective can lead us to ask better questions about quantum nature: nontrivial but answerable questions, which put old quantum mysteries in a new light even if they fall short of answering them (Aaronson, 2005). Quantum complexity theory has as its basic complexity class QP (a quantum variant of the class P) and the class BQP (a quantum variant of the class BPP). There are also two quantum versions of the class NP, namely the classes NQP and QMA, as well as many variants of the classes of relativized quantum computing. Unfortunately, the introduction of all these classes did not help to bring order to the zoo of more than 470 classical complexity classes; just the opposite happened, and the mess got larger. For an overview of recent results see Gruska (1999-2005). Among the recent surprising results in this area we mention that of Raz (2005), showing the enormous power of quantum advice: Raz has shown that a quantum interactive proof system in which the verifier gets quantum advice can solve any problem whatsoever. In connection with theoretical investigations concerning quantum information processing and communication, it is of large importance to find out whether we can really build powerful quantum computers and what is required for success. In this connection, one of the main goals of quantum informatics in general,
From Informatics to Quantum Informatics
and quantum complexity theory in particular, is to help to resolve this puzzle. Behind this is actually the question of whether our world is polynomial or exponential, as pointed out by Aaronson (2005). The fact that such a basic question is unresolved also makes it very important to study more elementary models, such as quantum circuits, quantum programmable circuits, and quantum finite automata. The main new challenges of quantum complexity theory can be seen as follows (see also Gruska (2005)): (a) to help to determine whether (and how) we can build powerful quantum computers; (b) to help to determine whether we can effectively factorize large integers using a quantum computer; (c) to use complexity theory paradigms to classify quantum states; (d) to use complexity theory (computational and communication) to study quantum entanglement and nonlocality; (e) to use complexity theory to determine the power of decoherence and to find ways to fight decoherence; (f) to use complexity theory to formulate laws and limitations of physics; (g) to study feasibility in physics on a more abstract level; (h) to study various quantum theory interpretations from a new and more abstract (complexity) point of view; (i) to develop a firmer basis for quantum mechanics; (j) to develop new tests of quantum mechanics.
4 Outcomes and challenges of quantum communication
there exists a constant δr > 0 such that if the destination point is closer than δr, r will reach it; otherwise, r will move towards it by at least δr. As no other assumptions on space exist, the distance traveled by a robot in a cycle is unpredictable. The second limiting assumption is on the length of a cycle. Assumption A2 (Finite Cycle). The amount of time required by a robot r to complete a computational cycle is not infinite. Furthermore, there exists a constant εr > 0 such that the cycle will require at least εr time. As no other assumption on time exists, the resulting system is fully asynchronous and the duration of each activity (or inactivity) is unpredictable. There are two important consequences:
Distributed Algorithms for Autonomous Mobile Robots
1. Since the time that passes after a robot starts observing the positions of all the others and before it starts moving is arbitrary but finite, the actual move of a robot may be based on a situation that was observed arbitrarily far in the past, and therefore it may be totally different from the current situation. 2. Since movements can take a finite but unpredictable amount of time, and different robots might be in different states of their cycles at a given time instant, it is possible that a robot can be seen while it is moving by other robots that are observing^. These consequences make the design of an algorithm to control and coordinate the robots difficult. For example, when a robot starts a Move, it is possible that the movement it performs is not "coherent" with the current configuration (i.e., the configuration it observed at the time of the Look and the configuration at the time of the Move can differ), since other robots can have moved during the Compute. Restricted Setting: Semi-synchronous Robots A computational setting that has been extensively investigated is one in which the cycles of all the robots are synchronized and their actions are atomic. In particular, there is a global clock tick reaching all robots simultaneously, and a robot's cycle is an instantaneous event that starts at a clock tick and ends by the next. The only unpredictability (hence the name semi-synchronous) is given by the fact that at each clock tick every robot is either active or inactive, and only active robots perform their cycle. The unpredictability is restricted by the fact that at least one robot is active at every time instant, and every robot becomes active at infinitely many unpredictable time instants. A very special case is when every robot is active at every clock tick; in this case the robots are fully synchronized. In this setting, at any given time, all active robots are executing the same cycle state; thus no robot will look while another is moving.
In other words, a robot observes other robots only when they are stationary. This implies that the computation is always performed based on accurate information about the current configuration. Furthermore, since no robot can be seen while it is moving, the movement can be considered instantaneous. An additional consequence of atomicity and synchronization is that, for them to hold, the maximum distance that a robot can move in one cycle is bounded. 2.3 Capabilities Different settings arise from the different assumptions that are made on the robots' capabilities, and on the amount of information that they share and use during the accomplishment of the assigned task. In particular, ^ Note that this does not mean that the observing robot can distinguish a moving robot from a non-moving one.
G. Prencipe and N. Santoro
- Visibility. The robots may be able to sense the complete plane or just a portion of it. We will refer to the first case as the Unlimited Visibility case. In contrast, if each robot can sense only up to a distance V > 0 from itself, we are in the Limited Visibility case. In the following, we will also say that the robots have unlimited/limited visibility. In addition, a robot cannot in general detect whether there is more than one fellow robot on any of the observed points, including the position where the observing robot itself is. We say it cannot detect multiplicity. - Agreement on Coordinate System. The robots do not necessarily share the same x-y coordinate system, and do not necessarily agree on the location of the origin (which we can assume, without loss of generality, to be placed in the current position of the robot), or on the unit distance. In general, there is no agreement among the robots on the chirality of the local coordinate systems (i.e., in general they do not share the same concept of where North, East, South, and West are). We will refer to this scenario as no agreement on the local coordinate systems. In the most favorable scenario, the robots agree on the direction and orientation of both axes; in this case, we will talk of total agreement on the local coordinate systems. Note that knowledge of the directions and orientations of both axes does not imply knowledge of the origin or of the unit of length. An intermediate scenario is when the robots agree only on the direction and orientation of one axis; we will talk of partial agreement. - Memory. The robots can access local memory to store different amounts of information regarding the positions in the plane of their fellows. In particular, if the robots can only store the robots' positions retrieved in the current observation, we have oblivious robots. In contrast, if the robots can store all the positions retrieved since the beginning of the computation, we have unbounded memory robots.
We will also refer to the algorithm the robots execute as oblivious or non-oblivious, depending on the assumption made. Note that the conditions under which the robots operate are by definition common knowledge among the robots. Let us stress that the only means for the robots to coordinate is the observation of the others' positions and of their change through time. For oblivious robots, even this form of communication is impossible, since there is no memory of previous positions.
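As a small illustration of the weakest setting above, the following sketch (the helper name and parameters are ours, not the survey's) shows how one global configuration looks in a robot's local coordinate system, with private origin, rotation, unit of length, and possibly opposite chirality:

```python
import math

# Hypothetical helper (not from the survey): what a robot "sees" under the
# weakest assumptions -- its own position is the origin, and its axes may be
# rotated, mirrored (no chirality agreement), and scaled (no agreement on
# the unit distance) relative to the global frame.
def local_view(global_positions, me, angle, unit=1.0, flip=False):
    view = []
    for (x, y) in global_positions:
        dx, dy = x - me[0], y - me[1]               # origin at the observer
        rx = math.cos(angle) * dx - math.sin(angle) * dy
        ry = math.sin(angle) * dx + math.cos(angle) * dy
        if flip:                                    # mirrored: opposite chirality
            ry = -ry
        view.append((rx / unit, ry / unit))         # private unit of length
    return view

# Two robots observing the same three points get different local views:
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(local_view(pts, me=(0.0, 0.0), angle=0.0))
print(local_view(pts, me=(1.0, 0.0), angle=math.pi / 2, flip=True))
```

This is exactly why algorithms in the "no agreement" scenario may rely only on properties of the configuration that are invariant under translation, rotation, reflection, and scaling.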
3 Problems and Limitations In the following, we survey the computational results obtained so far. They are mostly about geometric problems, like forming a certain pattern, following a trail, or deploying the robots so as to guarantee optimal coverage of a certain terrain. Observe that several classical problems in distributed computing (e.g.,
leader election) can be reformulated as geometric problems in our model (e.g., forming an asymmetric pattern). 3.1 Pattern formation The PATTERN FORMATION problem is one of the most important coordination problems and has been extensively investigated in the literature (e.g., see [8, 38, 39, 41]). The problem is practically important because, if the robots can form a given pattern, they can agree on their respective roles in a subsequent, coordinated action. The geometric pattern to be formed is a set of points (given by their Cartesian coordinates) in the plane, and it is initially known by all the robots in the system. The robots are said to form the pattern if, at the end of the computation, the positions of the robots coincide, in everybody's local view, with the points of the pattern. The formed pattern may be translated, rotated, scaled, and flipped into its mirror position with respect to the initial pattern. Initially the robots are in arbitrary positions, with the only requirements that no two robots be in the same position and that, of course, the number of points prescribed in the pattern and the number of robots be the same. The basic research questions are which patterns can be formed, and how they can be formed. Many proposed procedures do not terminate and never actually form the desired pattern: the robots just converge towards it; such procedures are said to converge. Arbitrary Pattern In this section, we review our results on the formation of an arbitrary pattern. The problem has been investigated by Flocchini et al. [21, 23] and Oasa et al. [33] in the general setting, and by Suzuki and Yamashita [39] in the semi-synchronous setting; both investigations consider robots with unlimited visibility. In the general setting with unlimited visibility: - With total agreement, oblivious robots can form any arbitrary given pattern [21]. - With partial agreement, oblivious robots can form any arbitrary given pattern if n is odd.
If n is even, oblivious robots can form only symmetric patterns that have at least one axis of symmetry not passing through any vertex of the pattern [23]. - With no agreement at all, oblivious robots cannot form an arbitrary given pattern [21]. In the semi-synchronous setting with unlimited visibility, let m be the size of the largest subset of robots having an equivalent initial view. - Robots with unbounded memory can form [39]: 1. any pattern, if m = 1; 2. only patterns whose vertices can be partitioned into n/m regular m-gons all having the same center, if m ≥ 2.
Circle Formation In the CIRCLE FORMATION problem, the robots want to place themselves on the plane so as to form a non-degenerate circle (i.e., one with finite radius greater than zero). First observe that, if the diameter of the circle is not fixed a priori, the problem can be solved in a rather straightforward way by oblivious robots even in the general setting: each robot computes the smallest circle enclosing all the robots' positions and moves onto the circumference of such a circle. The problem becomes more difficult when the diameter is prescribed. This problem was first studied by Sugihara and Suzuki [38], who presented a heuristic distributed protocol, executed independently by all the robots, that lets them form an approximation of a circle of given diameter D. Experiments have shown that sometimes the robots bring themselves into a configuration similar to a Reuleaux triangle rather than a circle (see Figure 1). Subsequently, the protocol was improved by Tanaka [40], who proposed a new solution that produces a better approximation of the circle.
Fig. 1. Reuleaux's triangle. It is obtained by drawing arcs arc(a,b), arc(b,c), and arc(c,a), with radii equal to D, from the vertices c, a, and b, respectively, of an equilateral triangle Δ(a,b,c) with sides equal to D.
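The diameter-free circle-formation strategy mentioned above (every robot moves radially onto the smallest circle enclosing all observed positions) can be sketched as follows. This is an illustrative, naive O(n^4) implementation; Welzl's algorithm would be the efficient choice, and all helper names are ours.

```python
import itertools, math

# Circumcircle of three non-collinear points, or None if they are collinear.
def circumcircle(a, b, c):
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)

# Naive smallest enclosing circle: try all diameter circles and circumcircles
# and keep the smallest one covering every point (needs at least 2 points).
def smallest_enclosing_circle(points):
    def covers(center, r):
        return all(math.dist(center, p) <= r + 1e-9 for p in points)
    best = None
    for a, b in itertools.combinations(points, 2):
        center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        r = math.dist(a, b) / 2
        if covers(center, r) and (best is None or r < best[1]):
            best = (center, r)
    for a, b, c in itertools.combinations(points, 3):
        cc = circumcircle(a, b, c)
        if cc and covers(*cc) and (best is None or cc[1] < best[1]):
            best = cc
    return best

# Oblivious rule: each robot moves radially onto the circumference.
def move_to_circumference(points):
    (cx, cy), r = smallest_enclosing_circle(points)
    out = []
    for (x, y) in points:
        d = math.dist((cx, cy), (x, y))
        if d < 1e-12:
            out.append((cx + r, cy))          # robot at the center: pick any ray
        else:
            out.append((cx + (x - cx) * r / d, cy + (y - cy) * r / d))
    return out

print(move_to_circumference([(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]))
# → [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
```

The rule is oblivious: it depends only on the currently observed snapshot, and robots already on the circumference do not move.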
A variant of this problem is the UNIFORM CIRCLE FORMATION problem: the n robots on the plane must be arranged at regular intervals on the boundary of a circle. Notice that this is the same as the problem of forming an n-gon. This problem has been studied in the semi-synchronous setting by Defago and Konagaya [16]; simulation results of these studies have been presented in [37]. The solution in [16] is, however, computationally expensive: in fact, it involves the use of Voronoi diagrams, necessary to avoid the very specific possibility in which at least two robots share the same position at some time and also have total agreement. Based on this observation, a new algorithm that avoids these expensive calculations is presented in [7].
- In the semi-synchronous setting with unlimited visibility: oblivious robots can converge towards an n-gon [16, 37, 7]. Line Formation Let us now consider another simple pattern for the robots: a line. That is, the robots are required to place themselves on a line, whose position is not prescribed in advance; we have just defined the LINE FORMATION problem. Note that, if n = 2, a line is always formed. Despite the simplicity of its formulation, this problem has some subtleties that make its solution not so easy. In fact, the solvability of this problem heavily depends on the amount of agreement the robots have on their local coordinate systems. Clearly, if the robots can rely on total agreement, then the problem is easily solved: after lexicographically ordering the robots' positions (e.g., left-right, top-down), the first and the last robot in the ordering define the line to be formed. Then, all robots move sequentially (in order to avoid collisions) to this line (see Figure 2.a). If the robots have only partial agreement, for instance on the direction and orientation of y, they cannot rely on a unique total ordering of the robots' positions. In this case the robots can place themselves on the axis that is median between the two vertical axes tangent to the observed configuration (see Figure 2.b). The robots on the tangent axes are the last to move.
Fig. 2. Line formation with (a) total and (b) partial agreement.
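The total-agreement strategy just described can be sketched as follows. This is a hedged sketch with helper names of our own choosing; for brevity all robots are projected onto the line simultaneously, whereas the text moves them one at a time to avoid collisions.

```python
# Line formation with total agreement (illustrative sketch): sort the
# positions lexicographically, let the first and last define the line, and
# project every other robot orthogonally onto it. A real execution would
# move the robots sequentially to avoid collisions.
def form_line(points):
    pts = sorted(points)                        # left-right, then bottom-up
    (ax, ay), (bx, by) = pts[0], pts[-1]
    dx, dy = bx - ax, by - ay
    L2 = dx * dx + dy * dy                      # squared segment length
    targets = []
    for (x, y) in pts:
        t = ((x - ax) * dx + (y - ay) * dy) / L2   # orthogonal projection
        targets.append((ax + t * dx, ay + t * dy))
    return targets

print(form_line([(0.0, 0.0), (1.0, 2.0), (4.0, 0.0)]))
```

With total agreement every robot computes the same lexicographic order, hence the same line, which is exactly what makes this case easy.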
In a recent study [15], the LINE FORMATION problem has been tackled by studying an apparently totally different problem: spreading. In this problem, the robots, which at the beginning are arbitrarily placed on the plane, are required to evenly spread within the perimeter of a given region. In their work, the authors focus on the one-dimensional case: here, the robots have to form a line and place themselves uniformly on it. A very interesting aspect of the study is that [15] addresses the issue of local algorithms: each robot decides where to move based only on the positions of its close neighbors. In particular, in the case of the line, the protocol, called Spread, is quite simple: each robot r observes its left and right neighbors. If r does not see any robot, it simply does not move; otherwise, it moves to the median point between its two neighbors. The authors prove its convergence in the semi-synchronous setting.
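The Spread rule just described admits a very short fully-synchronous simulation (an illustrative sketch; [15] analyzes the semi-synchronous case, and here the two extreme robots simply stay put):

```python
# One synchronous round of the one-dimensional Spread rule: a robot with
# neighbours on both sides moves to their midpoint; the two extreme robots
# (which see a neighbour on one side only) do not move in this sketch.
def spread_round(xs):
    xs = sorted(xs)
    new = list(xs)
    for i in range(1, len(xs) - 1):
        new[i] = (xs[i - 1] + xs[i + 1]) / 2   # midpoint of the neighbours
    return new

positions = [0.0, 0.1, 0.2, 5.0]
for _ in range(50):                            # iterate: the interior robots
    positions = spread_round(positions)        # approach equal spacing
print([round(x, 3) for x in positions])        # → [0.0, 1.667, 3.333, 5.0]
```

Each round is a local averaging step, so the configuration converges towards equally spaced positions between the two extremes, matching the convergence result quoted below.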
Semi-synchronous: Multiplicity Detection [39]; Infinite Time [2, 13, 14]
Asynchronous: Multiplicity Detection [10]; Compass [22]; Unbounded Memory [9]; Infinite Time [12]
Table 1. Summary of additional assumptions made by the existing solutions for the GATHERING problem.
- In the semi-synchronous setting, the robots executing Spread converge to a line configuration with equal distances. Furthermore, if each robot knows the exact number of robots on each of its sides, it is possible to achieve the spreading in one dimension in a finite number of cycles. - In the fully-synchronous model, n robots can spread in one dimension in n - 2 cycles. 3.2 Gathering In the GATHERING problem, the robots, initially placed in arbitrary positions, are required to gather at a single point. This problem is also called point formation, homing, or rendezvous. In spite of its apparent simplicity, it has recently been tackled by several studies; in fact, several factors make this problem difficult to solve, as shown by the following result: - In both the asynchronous and the semi-synchronous setting, there exists no deterministic oblivious algorithm that solves the GATHERING problem in a finite number of cycles, hence in finite time, for a set of n > 2 robots [35]. Some additional capabilities are thus needed to solve this problem (in Table 1 we report the existing results related to the GATHERING problem). Let us first consider the case of unlimited visibility. - In the semi-synchronous setting, n > 3 oblivious robots with multiplicity detection can gather in finite time [39]. This result has recently been improved; in fact, the same result can be achieved even in the general setting, extending the previous work of [11]: - In the general setting, n > 3 oblivious robots with multiplicity detection can gather in finite time [10].
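As a toy illustration of the gathering task, the following fully-synchronous sketch implements the center-of-gravity strategy discussed below ([13, 12]). In this idealized setting all robots meet in a single step; the cited works analyze the much harder semi-synchronous and asynchronous settings, where the same rule yields convergence only.

```python
# Toy, fully-synchronous center-of-gravity step (illustrative only):
# every robot computes the centroid of all observed positions and moves
# there. With simultaneous, complete moves, one step suffices to gather.
def cog_step(points):
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx, cy)] * len(points)        # everyone heads to the centroid

print(cog_step([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))  # all at (1.0, 1.0)
```

The difficulty in the real model is exactly what this sketch assumes away: robots may move at different times, partially, and based on outdated snapshots, so the centroid itself keeps shifting.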
The multiplicity detection assumption is crucial to prove the correctness of these algorithms. In fact, the main idea is first to create a unique point p on the plane with two robots on it, and then to move all the other robots to this point, taking care not to create other points with multiplicity greater than one while the robots move towards p. In contrast, multiplicity detection is not used in the solution described in [9]; however, it is assumed that the robots can rely on an unlimited amount of memory: the robots are said to be non-oblivious. In other words, the robots have the capability to store the results of all computations since the beginning, and can freely access these data and use them in future computations. - In the general setting, n > 3 robots with unbounded memory can gather in finite time [9]. Another study [13] has been devoted to the behavior of a particularly simple solution to the problem: the robots use the center of gravity as the gathering destination. The authors prove that this simple algorithm represents a convergence solution to the problem in the semi-synchronous setting. In [12] the same algorithm has been proven to be a convergence solution to the problem in the asynchronous setting as well. Let us now consider the case of limited visibility. With limited visibility, an obvious necessary condition for solving the problem is that, at the beginning of the computation, the visibility graph (having the robots as nodes and an edge (ri, rj) if ri and rj are within viewing distance) is connected [2, 22]. The protocol proposed in [2] works in the semi-synchronous setting; however, it is a convergence solution to the problem: the robots do not gather in finite time. In fact, the authors design a protocol that guarantees only that the robots converge towards the gathering point. In contrast, in [22], the authors present an algorithm that lets the robots gather in a finite number of cycles.
However, the robots must be able to rely on the presence of a common coordinate system: that is, they share a compass. - In the semi-synchronous setting, there exists an oblivious procedure that lets robots converge towards (but not necessarily reach) a point, for any n [2]. - In the general setting, oblivious robots with agreement on the coordinate system (e.g., with a compass) can gather in finite time [22, 24]. The GATHERING problem has also been investigated in the context of robot failures. In this context, the goal is for the non-faulty robots to gather regardless of the actions taken by the faulty ones. Two types of robot faults were investigated by Agmon and Peleg [1]: the crash failure, in which the robot stops any activity and will no longer execute any computational cycle; and the byzantine failure, in which the robot acts arbitrarily and possibly maliciously. - In the semi-synchronous setting, gathering with at most one crash failure is possible [1]. - In the semi-synchronous setting, gathering with at most one byzantine failure is impossible [1].
- In the fully synchronous setting, gathering in the presence of byzantine failures is possible, provided the number of faulty robots is sufficiently small with respect to n [1]. Finally, [14] analyzes the case of systems where the robots have inaccuracies in sensing the positions of the other robots, in computing the next destination point, and in moving towards the computed destination. The authors provide a set of limitations on the amount of inaccuracy allowing convergence, and present an algorithm for convergence under bounded measurement, movement, and calculation errors. 3.3 Following, flocking and capture In these problems there are two kinds of robots in the environment: the leader L and the followers. The leader acts independently from the others, and we can assume that it is driven by a human pilot. The followers are required to follow the leader wherever it goes (following), while keeping a formation they are given in input (flocking). In this context, a formation is simply a pattern described as a set of points in the plane, and all the robots have the same formation in input (see Figure 3). In [26], an algorithm solving this problem has been tested using computer simulation; the algorithm assumes no agreement. All the experiments demonstrated that the algorithm is well behaved: in all cases the followers were able to assume the desired formation and to maintain it while following the leader along its route. Moreover, the obliviousness of the algorithm contributes to this result, since the followers do not base their computation on the leader's past positions. Finally, if the leader is considered an "enemy" or "intruder", and the pattern surrounds it, the problem is known as capture. Also in this case, a procedure that assumes no agreement and solves the problem has been presented in [27]. The proposed algorithm exhibits remarkable robustness, and numeric simulations indicate that the intruder is efficiently captured in a relatively short time and kept surrounded after that, as desired.
Furthermore, the solution is self-stabilizing [17, 18]. In particular, any external intervention (e.g., if one or more of the cops are stopped, slowed down, knocked out, or simply faulty) does not prevent the completion of the task. - In the general setting there is a procedure for the flocking problem [26]. - In the general setting there is a procedure for the intruder problem [27].
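A hedged sketch of the flocking idea follows. This is not the algorithm of [26], which in particular copes with no agreement on the coordinate systems; here a shared global frame and fixed formation offsets relative to the leader are assumed, and all names are ours.

```python
# Illustrative flocking step (simplified, shared-frame assumption): each
# follower has a formation point given as an offset from the leader and, in
# each cycle, moves part of the way toward that point as computed from the
# leader's observed position.
def follower_target(leader_pos, offset):
    return (leader_pos[0] + offset[0], leader_pos[1] + offset[1])

def flock_step(leader_pos, followers, offsets, step=0.5):
    new = []
    for (x, y), off in zip(followers, offsets):
        tx, ty = follower_target(leader_pos, off)
        new.append((x + step * (tx - x), y + step * (ty - y)))  # move towards
    return new

# Four followers forming a wedge behind a leader at the origin:
wedge = [(-1.0, -1.0), (1.0, -1.0), (-2.0, -2.0), (2.0, -2.0)]
robots = [(5.0, 5.0), (-5.0, 5.0), (5.0, -5.0), (-5.0, -5.0)]
print(flock_step((0.0, 0.0), robots, wedge))
```

The rule is oblivious in the same sense as in the survey: each step depends only on the currently observed leader position, never on its past trajectory.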
4 Conclusion and Discussion In this paper, we surveyed a number of recent results on the interplay between robots' capabilities and solvability of problems. The goal of these studies is
Fig. 3. Trace of the vehicles while forming and keeping a wedge-shaped formation.
to gain a better understanding of the power of distributed control from an algorithmic point of view. The area offers many open problems. The operating capabilities of our robots are quite limited. It would be interesting to look at models where the robots have more complex capabilities, e.g.: the robots have some kind of direct communication capabilities; the robots are distinct and externally identifiable; etc. Little is known about the solvability of other problems like spreading and exploration (used to build maps of unknown terrains), about the physical aspects of the models (giving physical dimension to the robots, bumping, energy-saving issues, etc.), and about the relationships between geometric problems and classical distributed computations. In the area of reliability and fault-tolerance, lightly faulty snapshots, a limited range of visibility, obstacles that limit the visibility and that moving robots must avoid or push aside, as well as robots that appear and disappear from the scene are all topics that have not yet been studied. We believe that investigations in these areas will provide useful insights on the ability of weak robots to solve complex tasks. Acknowledgements The authors would like to thank Paola Flocchini and Peter Widmayer for their help and suggestions in the preparation of this survey. This research is supported in part by the Natural Sciences and Engineering Research Council of Canada.
References 1. N. Agmon and D. Peleg. Fault-tolerant Gathering Algorithms for Autonomous Mobile Robots. In Proc. of the 15th ACM-SIAM Symposium on Discrete Algorithms, pages 1070-1078, 2004. 2. H. Ando, Y. Oasa, I. Suzuki, and M. Yamashita. A Distributed Memoryless Point Convergence Algorithm for Mobile Robots with Limited Visibility. IEEE Trans. on Robotics and Automation, 15(5):818-828, 1999.
3. T. Balch and R. C. Arkin. Behavior-based Formation Control for Multi-robot Teams. IEEE Trans. on Robotics and Automation, 14(6), December 1998. 4. R. Beckers, O. E. Holland, and J. L. Deneubourg. From Local Actions To Global Tasks: Stigmergy And Collective Robotics. In Art. Life IV, 4th Int. Worksh. on the Synth. and Simul. of Living Sys., 1994. 5. G. Beni and S. Hackwood. Coherent Swarm Motion Under Distributed Control. In Proc. DARS'92, pages 39-52, 1992. 6. Y. U. Cao, A. S. Fukunaga, A. B. Kahng, and F. Meng. Cooperative Mobile Robotics: Antecedents and Directions. In Int. Conf. on Intel. Robots and Sys., pages 226-234, 1995. 7. I. Chatzigiannakis, M. Markou, and S. Nikoletseas. Distributed Circle Formation for Anonymous Oblivious Robots. In Experimental and Efficient Algorithms: Third International Workshop (WEA 2004), volume LNCS 3059, pages 159-174, 2004. 8. Q. Chen and J. Y. S. Luh. Coordination and Control of a Group of Small Mobile Robots. In Proc. IEEE Int. Conf. on Rob. and Aut., pages 2315-2320, 1994. 9. M. Cieliebak. Gathering Non-Oblivious Mobile Robots. In Proc. 6th Latin American Symposium on Theoretical Informatics, pages 577-588, 2004. 10. M. Cieliebak, P. Flocchini, G. Prencipe, and N. Santoro. Solving the Robots Gathering Problem. In Proc. 30th International Colloquium on Automata, Languages and Programming, pages 1181-1196, 2003. 11. M. Cieliebak and G. Prencipe. Gathering Autonomous Mobile Robots. In Proc. 9th Int. Colloquium on Structural Information and Communication Complexity, June 2002. 12. R. Cohen and D. Peleg. Convergence Properties of the Gravitational Algorithm in Asynchronous Robot Systems. In Proc. of the 12th European Symposium on Algorithms, pages 228-239, 2004. 13. R. Cohen and D. Peleg. Robot Convergence via Center-of-Gravity Algorithms. In Proc. of the 11th Int. Colloquium on Structural Information and Communication Complexity, pages 79-88, 2004. 14. R. Cohen and D. Peleg.
Convergence of Autonomous Mobile Robots with Inaccurate Sensors and Movements. In Proc. 23rd Annual Symposium on Theoretical Aspects of Computer Science (STACS '06), pages 549-560, 2006. 15. R. Cohen and D. Peleg. Local Algorithms for Autonomous Robot Systems. In Proc. of the 13th Colloquium on Structural Information and Communication Complexity, 2006. To appear. 16. X. Defago and A. Konagaya. Circle Formation for Oblivious Anonymous Mobile Robots with No Common Sense of Orientation. In Workshop on Principles of Mobile Computing, pages 97-104, 2002. 17. E. W. Dijkstra. Self-stabilizing Systems in Spite of Distributed Control. Comm. of the ACM, 17(11):643-644, 1974. 18. S. Dolev. Self-stabilization. The MIT Press, 2000. 19. B. R. Donald, J. Jennings, and D. Rus. Information Invariants for Distributed Manipulation. The Int. Journal of Robotics Research, 16(5), October 1997. 20. E. H. Durfee. Blissful Ignorance: Knowing Just Enough to Coordinate Well. In ICMAS, pages 406-413, 1995. 21. P. Flocchini, G. Prencipe, N. Santoro, and P. Widmayer. Hard Tasks for Weak Robots: The Role of Common Knowledge in Pattern Formation by Autonomous Mobile Robots. In Proc. 10th International Symposium on Algorithms and Computation, pages 93-102, 1999.
22. P. Flocchini, G. Prencipe, N. Santoro, and P. Widmayer. Gathering of Asynchronous Mobile Robots with Limited Visibility. In Proc. 18th International Symposium on Theoretical Aspects of Computer Science, volume LNCS 2010, pages 247-258, 2001. 23. P. Flocchini, G. Prencipe, N. Santoro, and P. Widmayer. Pattern Formation by Autonomous Robots Without Chirality. In Proc. 8th Int. Colloquium on Structural Information and Communication Complexity, pages 147-162, June 2001. 24. P. Flocchini, G. Prencipe, N. Santoro, and P. Widmayer. Gathering of Asynchronous Robots with Limited Visibility. Theoretical Computer Science, 337:147-168, 2005. 25. T. Fukuda, Y. Kawauchi, H. Asama, and M. Buss. Structure Decision Method for Self Organizing Robots Based on Cell Structures - CEBOT. In Proc. IEEE Int. Conf. on Robotics and Autom., volume 2, pages 695-700, 1989. 26. V. Gervasi and G. Prencipe. Coordination Without Communication: The Case of the Flocking Problem. Discrete Applied Mathematics, 143:203-223, 2003. 27. V. Gervasi and G. Prencipe. Robotic Cops: The Intruder Problem. In Proc. IEEE Conference on Systems, Man and Cybernetics, pages 2284-2289, 2003. 28. D. Jung, G. Cheng, and A. Zelinsky. Experiments in Realising Cooperation between Autonomous Mobile Robots. In ISER, 1997. 29. Y. Kawauchi, M. Inaba, and T. Fukuda. A Principle of Decision Making of Cellular Robotic System (CEBOT). In Proc. IEEE Conf. on Robotics and Autom., pages 833-838, 1993. 30. M. J. Mataric. Interaction and Intelligent Behavior. PhD thesis, MIT, May 1994. 31. S. Murata, H. Kurokawa, and S. Kokaji. Self-assembling Machine. In Proc. IEEE Conf. on Robotics and Autom., pages 441-448, 1994. 32. F. R. Noreils. Toward a Robot Architecture Integrating Cooperation between Mobile Robots: Application to Indoor Environment. The Int. J. of Robot. Res., pages 79-98, 1993. 33. Y. Oasa, I. Suzuki, and M. Yamashita. A Robust Distributed Convergence Algorithm for Autonomous Mobile Robots. In IEEE Int. Conf.
on Systems, Man and Cybernetics, pages 287-292, October 1997. 34. L. E. Parker. On the Design of Behavior-Based Multi-Robot Teams. Journal of Advanced Robotics, 10(6), 1996. 35. G. Prencipe. On The Feasibility of Gathering by Autonomous Mobile Robots. In Proc. 12th Int. Colloquium on Structural Information and Communication Complexity, pages 246-261, 2005. 36. G. Prencipe. The Effect of Synchronicity on the Behavior of Autonomous Mobile Robots. Theory of Computing Systems, 38:539-558, 2005. 37. S. Samia, X. Defago, and T. Katayama. Convergence Of a Uniform Circle Formation Algorithm for Distributed Autonomous Mobile Robots. In In Journes Scientifiques Francophones (JSF), Tokio, Japan, 2004. 38. K. Sugihara and I. Suzuki. Distributed Algorithms for Formation of Geometric Patterns with Many Mobile Robots. Journal of Robotics Systems, 13:127-139, 1996. 39. I. Suzuki and M. Yamashita. Distributed Anonymous Mobile Robots: Formation of Geometric Patterns. Siam J. Computing, 28(4):1347-1363, 1999. 40. O. Tanaka. Forming a Circle by Distributed Anonymous Mobile Robots. Technical report. Department of Electrical Engineering, Hiroshima University, Hiroshima, Japan, 1992.
62
G. Prencipe and N. Santoro
41. P. K. C. Wang. Navigation Strategies for Multiple Autonomous Mobile Robots Moving in Formation. Journal of Robotic Systems, 8(2):177-195, 1991.
Part III
Contributed Papers
The Unsplittable Stable Marriage Problem

Brian C. Dean¹, Michel X. Goemans², and Nicole Immorlica³

¹ Department of Computer Science, Clemson University. [email protected]
² Department of Mathematics, M.I.T. [email protected]
³ Microsoft Research. [email protected]

Abstract. The Gale-Shapley "propose/reject" algorithm is a well-known procedure for solving the classical stable marriage problem. In this paper we study this algorithm in the context of the many-to-many stable marriage problem, also known as the stable allocation or ordinal transportation problem. We present an integral variant of the Gale-Shapley algorithm that provides a direct analog, in the context of "ordinal" assignment problems, of a well-known bicriteria approximation algorithm of Shmoys and Tardos for scheduling on unrelated parallel machines with costs. If we are assigning, say, jobs to machines, our algorithm finds an unsplit (non-preemptive) stable assignment where every job is assigned at least as well as it could be in any fractional stable assignment, and where each machine is congested by at most the processing time of the largest job.
1 Introduction

In the United States, a medical school graduate is required to complete a residency program at a hospital before entering the workforce as a doctor. Since the 1950s, the medical field has turned to a centralized mechanism, called the National Residency Matching Program (NRMP), to aid this marketplace [10]. In this program, final-year medical students and hospitals each submit preferences over possible matches, and an algorithm determines which matches will take place. In order for the system to be successful, it is essential that the computed matches be stable. That is, there should be no (student, hospital) pair that both prefer each other to their assigned partners; such a pair would have an incentive to withdraw from the centralized matching system and to make its own plans on the side. Computing a stable matching is a classic problem in economics and computer science, and can be solved in polynomial time by the deferred acceptance algorithm of Gale and Shapley [3].¹

For many years the NRMP proved to be quite successful. However, in the late 1990s it was observed that many matches were being formed outside the NRMP [12]. The problem stemmed from the fact that many medical students were getting married to one another during medical school, and so had complicated preferences that were ignored by the NRMP.

¹ For a discussion of this problem and related questions, see the books by Gusfield and Irving [4] and Roth and Sotomayor [14], or the lecture notes by Knuth [8].

Please use the following format when citing this chapter: Dean, B.C., Goemans, M.X., Immorlica, N., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), pp. 65-75.

In particular, married
66
B. Dean, M. Goemans, and N. Immorlica
students had strong preferences for hospitals in similar geographical locations. The NRMP was redesigned to accommodate such preferences [13]; currently, the NRMP permits married students to submit a joint preference list over pairs of hospitals and guarantees that, if they are matched, they will be matched to a pair in their list. Unfortunately, in a matching market with couples like the NRMP, a stable matching might not exist [10], and determining whether one exists is computationally difficult, in fact NP-hard [9].

Motivated by the issue of couples in the NRMP, we study a marketplace in which agents on one side of the market have non-uniform demands and agents on the other side have non-uniform quotas, or capacities. Demanding agents have a preference list over capacitated agents and prefer to be satisfied by a lexicographically maximal set of these agents. This problem is known as the stable allocation or ordinal transportation problem, and is a many-to-many generalization of the classical stable marriage problem, introduced originally by Baiou and Balinski [1]. It surfaces naturally in scheduling or load balancing settings where only "ordinal" information (ranked preference lists) is known. When demands are all 1 or 2 and capacities are integral, as in the student/hospital setting, this restricted preference domain becomes a special case of the weakly responsive preferences studied by Klaus and Klijn [6]. In such cases, Klaus and Klijn [6] proved that a stable matching always exists. Instances of this problem with generalized demands/capacities include the assignment of teaching assistants (TAs) to courses in academic departments: TAs rank courses, course instructors rank TAs, each course requires a certain number of TA hours, and different TAs are responsible for working different numbers of hours. Another example is the assignment of load to servers in a network: clients prefer servers geographically nearby, and servers prefer clients with higher service types.
Baiou and Balinski [1] study these generalized settings and prove that even in this case a stable allocation always exists. For many settings, a stable allocation in which the demand of a single agent is satisfied fractionally is undesirable. Although a couple may prefer hospital a to b and thus a pair of placements (a, b) to a pair of placements (b, b), such an arrangement imposes strain on the matching. As often happens in labor markets with two-body problems, the couple may negotiate with hospital a to create an extra position, beyond the quota, for the extra member of the couple. In some sense, a fractional stable assignment is not stable. Thus, we seek a stable matching in which all the demand of a single agent is satisfied integrally. Clearly, such a matching may not exist, and so we relax our feasibility constraints and allow capacitated agents to be over-capacitated by at most the maximum demand. With a correspondingly appropriate modification of the definition of stability, we prove that a stable matching always exists, and give a modification of the Gale-Shapley algorithm to find it. Applied to the NRMP setting, our results compute a student-optimal (or hospital-optimal) stable matching where the number of students assigned to each hospital exceeds its quota by at most one position.
A close relative of the stable allocation problem is the well-studied transportation problem, where there are linear costs associated with every possible pairing and our objective is to compute a fractional assignment of minimum cost rather than a stable assignment. The stable allocation problem is also known as the ordinal transportation problem since it differs only in that we express the desirability of an assignment in an "ordinal" fashion using ranked preference lists. Unsplittable variants of the transportation problem have been previously considered in the literature, and a celebrated result of Shmoys and Tardos [15] states that from a fractional assignment (where all agents are fully assigned), we can construct an unsplit assignment of no greater cost where each agent is over-capacitated (or congested) by at most the maximum demand. Our results can be viewed as a direct analog of this result for the ordinal case.
2 The Model

Consider assigning a set [n] := {1, 2, …, n} of items to a set [m] of bins. To be somewhat more concrete, let us employ scheduling terminology and assume we are assigning "jobs" to "machines". Job i requires p_i units of processing time, machine j has a capacity of c_j units, and at most u_ij units of job i can be assigned to machine j. If u_ij = p_i for all (i, j), we follow the terminology of Baiou and Balinski [1] and say our problem is unconstrained. All problem data is assumed to be integral.

2.1 Fractional Assignment

We first define a fractional setting where a job may be processed on multiple machines. A fractional assignment x is feasible if it satisfies

    Σ_j x_ij ≤ p_i for all i ∈ [n],    Σ_i x_ij ≤ c_j for all j ∈ [m],    0 ≤ x_ij ≤ u_ij for all (i, j).    (1)

Each job i ∈ [n] has a strict, transitive, and complete preference relation π(i) over the set [m] ∪ {∅}, where ∅ indicates a preference for remaining unassigned. Job i prefers a fractional assignment x to another fractional assignment x' if x is lexicographically larger according to π(i); that is, if x_ij > x'_ij for the earliest machine j in π(i) such that x_ij ≠ x'_ij. In this case, we write x >_i x'. Similarly, each machine j ∈ [m] has a strict, transitive, and complete preference relation π(j) over the set [n] ∪ {∅}, where ∅ indicates a preference for being under-utilized. If π(j) = (i_1, …, i_{k−1}, ∅ = i_k, i_{k+1}, …, i_{n+1}), then j prefers to accept load from job i_a rather than i_b for any a < b < k, and is unwilling to process load from any job i_c with c > k. We write i >_j i' if machine j prefers job i to job i', and we write x >_j x' if machine j prefers assignment x to assignment x'; again, this means that x_ij > x'_ij for the first job i in π(j) where x_ij ≠ x'_ij. A blocking pair is a familiar feature that is forbidden in any stable assignment: it is a pair (i, j) where x_ij < u_ij and both i and j prefer each other to at least some of their current assignments. In this case, job i and machine j would be "unhappy" with the current assignment and would prefer to increase x_ij. That is:

Definition 1.
Job i and machine j form a blocking pair if there is some job i' and machine j' such that x_ij < u_ij, x_{ij'} > 0, x_{i'j} > 0, and we have i >_j i' and j >_i j'.

A job i is saturated if all its load is assigned. Similarly, a machine is saturated if all its capacity is utilized.

Definition 2. A job i is saturated if Σ_j x_ij ≥ p_i. A machine j is saturated if Σ_i x_ij ≥ c_j.
Finally, a job i is said to be popular in an assignment if there is some machine j to which i is not assigned, but where j prefers i to at least some of the jobs currently assigned to it. We define a popular machine similarly.

Definition 3. In an assignment x, we say job i is popular if there exists a machine j with j >_i ∅ and x_ij < u_ij such that i >_j i' for some job i' with x_{i'j} > 0. Likewise, we say machine j is popular if there exists a job i with i >_j ∅ and x_ij < u_ij such that j >_i j' for some machine j' with x_{ij'} > 0.

If job i is popular due to machine j and i is not saturated, then our assignment is not stable since both i and j would be more satisfied if x_ij were increased.

Definition 4. An assignment x is stable if (i) it admits no blocking pairs, and (ii) all popular jobs and machines are saturated.

A feasible stable assignment x is said to be job-optimal if every job prefers x to any other feasible stable assignment x', i.e. ∀ i ∈ [n], x ≥_i x' (a machine-optimal assignment is defined analogously). In a job-optimal assignment, each
job simultaneously receives at least as much of an allocation of its first-choice machine as it could in any feasible stable assignment, and it also receives at least as much of an allocation of its second-choice machine as it could in any feasible stable assignment with the same first-choice allocation, and so on. It is always possible to find a job-optimal feasible stable assignment for any problem instance using a strongly-polynomial algorithm of Baiou and Balinski [1].

2.2 Unsplit Assignment

We now consider the "unsplittable" unconstrained stable allocation problem where each job must be entirely assigned to a single machine. Thus the feasible assignments x are precisely the integral solutions to (1) where either x_ij = 0 or x_ij = p_i for all (i, j). As the following simple example shows, an integral stable assignment may not exist.

Example 1. Suppose there are two jobs i_1 and i_2 with demands 1 and 2 respectively, and two machines j_1 and j_2, both with capacity 2. Let π(i_1) = π(i_2) = (j_1, j_2) and π(j_1) = π(j_2) = (i_1, i_2). Then the only stable assignment is x_{i_1 j_1} = 1, x_{i_2 j_1} = 1, and x_{i_2 j_2} = 1, but this is not an unsplit assignment.

We therefore consider a relaxation that is directly analogous to a result of Shmoys and Tardos [15] for the bipartite assignment problem with costs. Assuming existence of a feasible fractional assignment of cost C with all jobs fully assigned, Shmoys and Tardos show how to round this solution in polynomial time to obtain an unsplit solution of cost no more than C where each machine is congested (filled beyond its capacity) by at most p_max = max_i p_i. Similar results have been achieved in the literature on unsplittable flows (see [7, 2, 16] for more background), where our goal is generally to take a fractional solution to a network flow problem and round it to an unsplit flow (where the flow for each commodity follows a single path) without significantly raising the cost of the flow, and without causing excessive congestion on edges.
Definition 5. An assignment x is minimally congested if for every machine j, removal of the least-preferred job (to j) currently assigned to j results in j being utilized at or below its capacity.

Note that in a minimally congested assignment, each machine is over-capacitated by at most p_max. We show how a modified version of the GS algorithm can find, in polynomial time, a stable unsplit assignment that is job-optimal among all minimally congested stable unsplit assignments. Suppose x is a job-optimal feasible stable fractional assignment. We prove that in a job-optimal unsplit assignment, each job is assigned to at least the best of its fractional assignments in x (our analog of the condition that cost does not increase). Our unsplit assignment is stable in that (i) it admits no blocking pairs and (ii) all popular machines are saturated. Note that one must take some care
here with the definition of condition (ii). We define machine j to be saturated with respect to its original capacity, c_j, and not the inflated capacity c_j + p_max according to which our unsplit solution is feasible; i.e., machine j is saturated if Σ_i x_ij ≥ c_j. Otherwise, it might be impossible to satisfy (ii) by ensuring popular machines are saturated (for example, if c_j is odd but all the p_i's are even). This definition makes intuitive sense because a machine beyond its capacity will not want any new jobs assigned to it.
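To make the stability conditions concrete, the following sketch checks a fractional assignment for a blocking pair in the sense of Definition 1. It is entirely our own illustration (the paper gives no code, and all names here are invented):

```python
# Hypothetical illustration: detect a blocking pair (Definition 1).
# x[i][j]: load of job i on machine j; u[i][j]: the caps u_ij;
# pref_job[i] / pref_mach[j]: preference lists, most preferred first.

def prefers(pref, a, b):
    """True if a precedes b on the preference list pref."""
    return pref.index(a) < pref.index(b)

def has_blocking_pair(x, u, pref_job, pref_mach):
    n, m = len(x), len(x[0])
    for i in range(n):
        for j in range(m):
            if x[i][j] >= u[i][j]:
                continue                  # (i, j) cannot absorb more load
            # some machine j' that i uses but likes less than j
            worse_mach = any(x[i][jp] > 0 and prefers(pref_job[i], j, jp)
                             for jp in range(m))
            # some job i' that j serves but likes less than i
            worse_job = any(x[ip][j] > 0 and prefers(pref_mach[j], i, ip)
                            for ip in range(n))
            if worse_mach and worse_job:
                return True               # i and j would both raise x_ij
    return False
```

Condition (ii) of Definition 4 would be checked separately, against the saturation conditions of Definitions 2 and 3.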
3 The Gale-Shapley Algorithm

Gale and Shapley [3] devised a simple intuitive algorithm, now quite well known, for solving the classical "one-to-one" stable marriage problem. The algorithm is usually described in terms of men being assigned to women, although we continue to use job/machine terminology since it is less awkward once we advance to many-to-many matchings. The Gale-Shapley (GS) algorithm has each job i issue "proposals" to machines in the order of i's preference list. Each machine j tentatively accepts the best proposal received so far. If machine j is tentatively matched with job i and receives a more favorable proposal, it tentatively accepts the new proposal and rejects i, which then continues to propose to machines further down on its preference list. Remarkably, it can be shown that regardless of the order in which jobs propose, the GS algorithm always terminates with a job-optimal and machine-pessimal stable matching. Each job receives the most preferred partner it could receive in any stable matching, and each machine receives the least preferred partner it could receive in any stable matching. By symmetry, the reverse is true if the machines do the proposing. Baiou and Balinski [1] mention that the GS algorithm can be generalized to solve the many-to-many stable allocation problem, although its running time in this case is only pseudo-polynomial. The generalized GS algorithm issues "aggregate" proposals: in each iteration a job i that is not fully assigned issues a proposal to the next machine j in its preference list and proposes all of its unassigned processing time (up to u_ij). Machine j accepts only as much as allowed by its capacity, current allocation, and preference list, possibly rejecting (fractionally) some of the jobs already assigned to it if they are less preferred than job i.
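Both loops share the same skeleton; a minimal sketch of the classical one-to-one job-proposing version follows. This is our own illustration (names, data layout, and the assumption of complete preference lists with equally many jobs and machines are ours, not the paper's):

```python
# Hypothetical sketch of the classical one-to-one job-proposing
# Gale-Shapley algorithm: jobs propose down their lists; each machine
# tentatively holds the best proposal seen so far.
from collections import deque

def gale_shapley(job_pref, mach_pref):
    """job_pref[i], mach_pref[j]: preference lists, most preferred first.
    Returns a dict mapping each job i to its matched machine."""
    n = len(job_pref)
    rank = [{i: r for r, i in enumerate(pref)} for pref in mach_pref]
    next_prop = [0] * n          # next machine on each job's list
    holds = {}                   # machine -> job tentatively accepted
    free = deque(range(n))
    while free:
        i = free.popleft()
        j = job_pref[i][next_prop[i]]
        next_prop[i] += 1
        cur = holds.get(j)
        if cur is None:
            holds[j] = i                  # first proposal to j
        elif rank[j][i] < rank[j][cur]:   # j prefers i: reject cur
            holds[j] = i
            free.append(cur)
        else:
            free.append(i)                # j rejects i's proposal
    return {i: j for j, i in holds.items()}
```

Regardless of the order in which `free` is drained, the output is the same job-optimal matching, mirroring the order-independence noted above.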
Whenever a job is "split" due to a fractional acceptance or rejection, it remains split into two "virtual jobs" for the remainder of the algorithm, each of which carries out an independent sequence of proposals. Just as with the classical unit stable matching problem, one can show that the order of proposals and rejections does not matter: we always obtain a job-optimal feasible stable assignment. A similarly defined algorithm with machine proposals always finds the machine-optimal assignment.

Theorem 1. For any order of proposals, the job-proposing GS algorithm computes the job-optimal fractional stable assignment.
This theorem follows immediately from the fact that we can interpret the extended GS algorithm for the many-to-many stable allocation problem as nothing more than the standard "one-to-one" GS algorithm applied to an expanded instance where each job i is replaced with p_i unit-sized jobs (each with the same preference list) and each machine j is replaced by c_j unit-sized machines (each with the same preference list). The many-to-many algorithm is sped up by issuing proposals in batches, but it inherits from the one-to-one algorithm the property that the final solution must be job-optimal irrespective of the order of proposals. As an interesting remark, if the problem data is irrational, then not only does this reduction to the one-to-one case fail, but it is also not known whether the GS algorithm terminates after a finite number of iterations. We comment on this issue further in the conclusion section.
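The expansion used in this argument is straightforward to write down. The sketch below is our own illustration (the function and its representation are assumptions, not from the paper); each unit copy inherits its original's preference list, with copies of the same original ranked in an arbitrary fixed order:

```python
# Hypothetical helper illustrating the reduction above: job i of size
# p[i] becomes p[i] unit jobs, machine j of capacity c[j] becomes c[j]
# unit machines; all copies share the original preference lists.
def expand_to_unit_instance(p, c, job_pref, mach_pref):
    unit_jobs = [(i, k) for i, pi in enumerate(p) for k in range(pi)]
    unit_machines = [(j, l) for j, cj in enumerate(c) for l in range(cj)]
    # unit job (i, k) ranks unit machines by i's original list,
    # copies of the same machine in a fixed (arbitrary) order
    unit_job_pref = {
        (i, k): [(j, l) for j in job_pref[i] for l in range(c[j])]
        for (i, k) in unit_jobs
    }
    unit_mach_pref = {
        (j, l): [(i, k) for i in mach_pref[j] for k in range(p[i])]
        for (j, l) in unit_machines
    }
    return unit_jobs, unit_machines, unit_job_pref, unit_mach_pref
```

Running the one-to-one GS algorithm on the expanded instance then simulates one aggregate proposal as a batch of unit proposals.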
4 Computing Unsplittable Stable Allocations

In this section we discuss our "ordinal" analog, for the stable allocation problem, of the result of Shmoys and Tardos for the minimum-cost bipartite assignment problem. Since the constraints x_ij ≤ u_ij do not make sense for an unsplittable stable allocation problem, we henceforth assume we are dealing with an unconstrained stable allocation problem. Let us modify the GS algorithm as follows. Jobs issue proposals in sequence according to their preference lists, and in each iteration an arbitrary unassigned job i issues a proposal to the next machine j on its preference list. In this case, however, all proposals and rejections are "integral" in that either an entire job is accepted or rejected. Machine j accepts i's proposal, but then proceeds to reject in sequence the least favored jobs assigned to it (possibly including i) until j is over-congested by at most the processing time of a single job; that is, until rejecting the next job would leave the machine utilized strictly below c_j units of load. Note that such an algorithm results in an assignment where each machine is congested by at most the maximum processing time of a job. If each machine stores its accepted jobs in a heap based on preference list ranking, this integral variant of the GS algorithm runs in O(mn log n) time. We now prove some desirable properties of the algorithm. First we show that the assignment output by our algorithm is stable and job-optimal. The proof of the following theorem is similar to the traditional proof for the correctness and optimality of the one-to-one GS algorithm.

Theorem 2. The integral job-proposing GS algorithm computes the job-optimal stable unsplit assignment among all minimally congested unsplit stable assignments.

Proof. Let x* be the solution output by the GS algorithm.
Clearly, x* is an unsplit assignment that congests each machine by at most p_max. Let x*(i) be the machine to which job i is assigned in x* and x*(j) be the set of jobs to
which machine j is assigned in x* (i.e. x*(j) = {i : x*_ij > 0}). We also extend the preference notation such that for a set S, S >_j i means i' >_j i for all i' ∈ S with i' ≠ i. We first show that x* is stable. Suppose not. First note that once a machine is saturated, it never again becomes unsaturated. Thus, every popular machine j must be saturated, since if j is popular due to i, then i must have proposed to j at some point and been rejected. This means that the instability in x* must be caused by a blocking pair. Let (i, j) be a blocking pair. There are two cases. If i never proposed to j, then, since jobs propose in decreasing order of their preference list, x*(i) >_i j, which contradicts the assumption that (i, j) is a blocking pair. On the other hand, if i proposed to j and was rejected, then x*(j) >_j i since machines only ever improve the set of jobs assigned to them. We now show that x* is job-optimal. Suppose not, and let i be the first job rejected by one of its stable machines (i.e. a machine assigned to i in some minimally congested stable unsplit assignment), and let j be the first stable machine to reject i. Call the minimally congested unsplit stable assignment in which i and j are matched x. When j rejected i, in the current tentative assignment x', x'(j) >_j i and Σ_{i' ∈ x'(j)} p_{i'} ≥ c_j. We also know that there must be some i' ∈ x'(j) \ x(j); if this were not the case and x'(j) ⊆ x(j), then x could not have been minimally congested (removal of job i and all other jobs j prefers less than i would still leave machine j saturated). Since i' has not yet been rejected by a stable machine, and since jobs propose in decreasing order of their preference list, j >_{i'} x(i'). But then (i', j) form a blocking pair in x, and so j could not have been a stable machine for i.

We now observe that the solution computed by the integral variant of the GS algorithm assigns each job to at least the best of its fractional assignments in the job-optimal fractional assignment.
Thus, the jobs weakly prefer the solution output by the integral variant to the solution output by the fractional variant, i.e. the solution is both integral and lexicographically larger. Our proof uses the fact that the order of proposals does not affect the outcome of the GS algorithm. Thus, we can run the fractional variant of the GS algorithm using the order of proposals induced by the integral variant. During this process, we observe that jobs are assigned to the same machines in both variants. However, the fractional variant may have additional proposals to make after the integral variant completes. As jobs always propose to machines in decreasing order of their preference list, and as the fractional (integral) variant computes the job-optimal fractional (unsplit) stable solution, this coupling of the two algorithms shows that the unsplit solution must be preferred to the fractional solution by each job. Let x(i) be the set of machines to which i is partially assigned in assignment x, i.e. x(i) = {j : x_ij > 0}.

Theorem 3. Consider any feasible fractional stable assignment x_frac and the job-optimal minimally congested unsplit stable assignment x_int. Then for all jobs i, x_int(i) ≥_i x_frac(i).
Proof. The proof follows from Theorem 1 and the fact that jobs propose in decreasing order of their preference list (and so, as the algorithm runs, the jobs' situations worsen). More formally, consider the sequence of proposals defined by the integral GS algorithm. Call this sequence (i_1, i_2, …, i_l) (note this list includes repetitions and l may be greater than n). Run the fractional GS algorithm with the same order of proposals. We prove by induction that after the proposal of i_k, the current assignment x in the integral variant and x' in the fractional variant satisfy x(j) = x'(j) for all j, and a machine is saturated in x if and only if it is in x'. This is clearly true after the proposal of i_1. Assume this is the case after the proposal of i_{k−1} and let j be the machine to which i_k proposes. By the inductive assumption, j must be the same machine in both the integral and fractional variants of the algorithm. If j rejects i_k in the integral variant, then it must be that x(j) >_j i_k and Σ_{i ∈ x(j)} p_i ≥ c_j. Thus, in the fractional variant, Σ_{i ∈ x'(j)} x'_ij = c_j and x'(j) >_j i_k, so all of i_k's load is rejected. A similar argument holds if j rejects i_k in the fractional variant, and so the inductive hypothesis holds. Therefore, after the l-th proposal in the integral variant, the final solution x_int of the integral variant is at least as preferable as the current solution x' of the fractional variant for each job. Furthermore, as jobs propose in decreasing order of their preference list, the final solution x_frac of the fractional variant cannot be preferred to the current solution x' by any job. This completes the proof.

We remark that all the theorems in this paper hold if we instead seek the machine-optimal solution. We merely need to run the Gale-Shapley algorithm with machine proposals: a machine proposes to the next job on its preference list if it is currently under-utilized (its load is currently less than its capacity).
A job (fractionally) accepts a proposal if it is (fractionally) unassigned or if it prefers the proposing machine to (some of) its current machine(s), in which case it rejects (some of) its current machine(s).
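Putting the pieces of this section together, the integral job-proposing variant can be sketched in a few lines. This is our own reading of the algorithm (all names are invented, and we keep each machine's least-preferred accepted job at the root of a heap, in line with the O(mn log n) bound stated earlier); the sketch assumes complete preference lists so that every job eventually finds a machine:

```python
# Hypothetical sketch of the integral GS variant: whole jobs are
# proposed; after accepting, machine j sheds least-preferred jobs while
# it would still carry at least c_j units of load without them.
import heapq

def integral_gs(p, c, job_pref, mach_pref):
    n = len(p)
    rank = [{i: r for r, i in enumerate(pref)} for pref in mach_pref]
    nxt = [0] * n                 # next machine on each job's list
    load = [0] * len(c)
    heaps = [[] for _ in c]       # root = least-preferred accepted job
    assign = [None] * n
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        j = job_pref[i][nxt[i]]
        nxt[i] += 1
        heapq.heappush(heaps[j], (-rank[j][i], i))   # accept i
        load[j] += p[i]
        assign[i] = j
        # reject while the machine would remain at or above capacity
        while heaps[j] and load[j] - p[heaps[j][0][1]] >= c[j]:
            _, worst = heapq.heappop(heaps[j])
            load[j] -= p[worst]
            assign[worst] = None
            unassigned.append(worst)
    return assign
```

On Example 1 (demands 1 and 2, both capacities 2, identical preferences), this sketch places both jobs on the first machine, which ends up congested by one unit, within the c_j + p_max bound.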
5 Conclusion

In this paper, we studied a natural integral variant of the stable allocation problem in which every job is unsplittably assigned and no machine is excessively congested. Our results have implications for many economic settings where varying sized agents must be matched to each other. Our work leaves open a number of interesting questions:

Rural hospitals: It is well known that in one-to-one matching, the set of singles remains the same in every stable matching. Roth [11] extended this theorem and showed that in one-to-many matching, an agent not fully utilized in a stable
matching always receives the exact same assignment in every stable matching.² It seems likely that similar statements might hold in a many-to-many matching as well. It would be interesting to learn whether the same machines are congested in every stable unsplit matching, and if so, whether these machines are congested by the same amount in every stable unsplit matching, and/or whether the uncongested machines have the same assignment in every stable unsplit matching.

Incentives: Centralized matching algorithms like the one proposed in this paper are often used in economic settings where agents are self-interested and might alter their submitted preference lists in order to improve their match. It is known that no stable mechanism can be incentive-compatible for both jobs and machines. In a job-optimal mechanism, for example, machines have an incentive to lie. However, Immorlica and Mahdian [5] showed that, in a one-to-many matching, if preference lists of jobs are short and preferences are drawn randomly according to a particular class of distributions, then each agent has a unique stable partner with high probability, and thus has no incentive to lie. It would be interesting to prove a similar statement in the many-to-many setting studied here.
References

1. M. Baiou and M. Balinski. Erratum: The stable allocation (or ordinal transportation) problem. Mathematics of Operations Research, 27(4):662-680, 2002.
2. Y. Dinitz, N. Garg, and M.X. Goemans. On the single-source unsplittable flow problem. Combinatorica, 19:17-41, 1999.
3. D. Gale and L.S. Shapley. College admissions and the stability of marriage. American Mathematical Monthly, 69(1):9-14, 1962.
4. D. Gusfield and R. Irving. The Stable Marriage Problem: Structure and Algorithms. MIT Press, 1989.
5. N. Immorlica and M. Mahdian. Marriage, honesty, and stability. In Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms, pages 53-62, 2005.
6. B. Klaus and F. Klijn. Stable matchings and preferences of couples. Journal of Economic Theory, 121:75-106, 2005.
7. J.M. Kleinberg. Approximation algorithms for disjoint paths problems. PhD thesis, M.I.T., 1996.
8. D.E. Knuth. Stable marriage and its relation to other combinatorial problems. In CRM Proceedings and Lecture Notes, vol. 10, American Mathematical Society, Providence, RI (English translation of Mariages Stables, Les Presses de l'Université de Montréal, 1976), 1997.
9. E. Ronn. NP-complete stable matching problems. Journal of Algorithms, 11:285-304, 1990.
10. A.E. Roth. The evolution of the labor market for medical interns and residents: a case study in game theory. Journal of Political Economy, 92:991-1016, 1984.

² This is known as the rural hospital theorem, as it explains why rural hospitals, typically unpopular among students in the NRMP, always receive the same assignment in every stable matching.
11. A.E. Roth. On the allocation of residents to rural hospitals: a general property of two-sided matching markets. Econometrica, 54:425-427, 1986.
12. A.E. Roth. The national residency matching program as a labor market. Journal of the American Medical Association, 275(13):1054-1056, 1996.
13. A.E. Roth and E. Peranson. The redesign of the matching market for American physicians: Some engineering aspects of economic design. American Economic Review, 89:748-780, 1999.
14. A.E. Roth and M. Sotomayor. Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Cambridge University Press, 1990.
15. D.B. Shmoys and E. Tardos. Scheduling unrelated machines with costs. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 448-454, 1993.
16. M. Skutella. Approximating the single source unsplittable min-cost flow problem. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 136-145, 2000.
Variations on an Ordering Theme with Constraints

Walter Guttmann and Markus Maucher

Fakultät für Informatik, Universität Ulm, 89069 Ulm, Germany
[email protected] · [email protected]

Abstract. We investigate the problem of finding a total order of a finite set that satisfies various local ordering constraints. Depending on the admitted constraints, we provide an efficient algorithm or prove NP-completeness. We discuss several generalisations and systematically classify the problems.

Key words: total ordering, NP-completeness, computational complexity, betweenness, cyclic ordering, topological sorting
1 Introduction An instance of the betweenness problem is given by a finite set A and a collection C of triples from A, with the task to decide if there is a total order < of ^4 such that for each {a,b,c) € C, either a N such that g{clj) =
g(t_{k,1}) = D + k, g(t_{k,2}) = 2D + k, g(t_{k,3}) = 3D + k,
g(c_{i,j}) = 3i + j or g(c_{i,j}) = lD + 3i + j for l ∈ {4, …, 10}, depending on the case,
g(t_{k,1}), g(t_{k,2}), g(t_{k,3}) = 11D + k, 12D + k, 13D + k in the remaining cases,
for 1 ≤ i ≤ m and 1 ≤ j ≤ 3, with the respective case conditions on c_{i,j}.
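To make the betweenness constraint concrete, a candidate total order can be checked against a collection of triples as follows (an illustrative helper, not part of the paper):

```python
def satisfies_betweenness(order, triples):
    """Check that for every (a, b, c) either a < b < c or c < b < a
    under the total order given as a list (earlier = smaller)."""
    pos = {x: i for i, x in enumerate(order)}
    return all(
        pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
        for (a, b, c) in triples
    )

# The order 1 < 2 < 3 < 4 satisfies (1, 2, 3) and (4, 3, 2) but not (2, 1, 3).
print(satisfies_betweenness([1, 2, 3, 4], [(1, 2, 3), (4, 3, 2)]))  # True
print(satisfies_betweenness([1, 2, 3, 4], [(2, 1, 3)]))             # False
```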
We obtain the missing fact from the proof of Theorem 2 in Sect. 3 as a consequence of Lemma 1.

Corollary 1. The problem {(123), (231)} from the family 2 is NP-complete.

Proof. Let A', B', and C' characterise an instance of intermezzo. Construct the instance of the problem {(123), (231)} from the family 2 where A = A' ∪ {n} for some n ∉ A', and C = C' ∪ {(n, a_1, a_2) | (a_1, a_2) ∈ B'}. This instance has a solution if and only if the corresponding instance of intermezzo has one. □

Continuing the main objective of this section, we reduce intermezzo to the problem {(123), (321)} from the current family. Note that the existing NP-completeness proof for betweenness does not apply here because it uses non-disjoint triples [2].

Lemma 2. The problem {(123), (321)} from the family 4 is NP-complete.

Proof. Let A', B', and C' characterise an instance of intermezzo. Construct the instance of betweenness where A extends A' by three new elements a'_1, a'_2, a'_3 for each (a_1, a_2, a_3) ∈ C'. Note that there are 3|C'| distinct new elements since the triples in C' are pairwise disjoint. Moreover, C consists of two triples (a_1, a'_3, a_3), (a'_1, a'_2, a_2) for each (a_1, a_2, a_3) ∈ C'. Finally, for each (a_1, a_2, a_3) ∈ C', B extends B' by inserting three new pairs (a'_1, a_1), (a'_3, a_2), (a'_2, a_3) and, for each pair (a, a_1), one new pair (a, a'_1). Intuitively, an element a_1 is split into two elements a_1 and a'_1 such that a'_1 immediately precedes a_1. Assume there is a total order

… > δ(p). So we are guaranteed that, in order to find the maximal surprising strings, we need to compute the index only for the nodes of the BuST. In addition, the algorithm runs in time proportional to the size of the BuST itself, which is subquadratic on average. Note also that the number of maximal surprising strings (modulo the approximations introduced by the relation) is of the same order as the size of the BuST, so we are computing the z-score in optimal size and time.
6 Conclusions
We presented BuST, a new index structure for strings, which extends suffix trees to an alphabet enriched with a non-transitive relation encapsulating some form of approximate information. This is the case, for instance, of the relation induced by the Hamming distance on an alphabet composed of macro-characters over a base alphabet. We showed that the average size of the tree is subquadratic, despite a quadratic worst-case dimension, and we provided a construction algorithm linear in the size of the structure. In the final section, we discussed how BuST can be used to compute, in an efficient way, a class of measures of statistical approximate over-representation of substrings of a text a. We also have an implementation of the (naive) construction of the data structure in C, which we used to perform some tests on the size of BuST, showing that the bound given in Section 3 is rather pessimistic (cf. again [4]). BuST allow one to extract approximate information from a string a in a simple way, essentially in the same way exact information can be extracted from suffix trees.
L. Bortolussi, F. Fabris, and A. Policriti
In addition, they are defined orthogonally w.r.t. the relation and the alphabet used, hence they can be adapted to different contexts with minor effort. Their main drawback is that the usage of a relation on the alphabet permits encoding only a localized version of approximate information, like global Hamming distance distributed evenly along strings. Future directions include the exploration of other application domains, like using the information contained in BuST to build heuristics for the difficult consensus substring problem (cf. [7]).
References
1. A. Apostolico, M. E. Bock, and S. Lonardi. Monotony of surprise and large-scale quest for unusual words. Journal of Computational Biology, 10(3-4):283-313, 2003.
2. A. Apostolico, M. E. Bock, S. Lonardi, and X. Xu. Efficient detection of unusual words. Journal of Computational Biology, 7(1-2):71-94, 2000.
3. A. Apostolico and C. Pizzi. Monotone scoring of patterns with mismatches. In Proceedings of WABI 2004, 2004.
4. L. Bortolussi, F. Fabris, and A. Policriti. Bundled suffix trees. Technical report, Dept. of Maths and Informatics, University of Udine, 2006. http://www.dimi.uniud.it/bortolus/techrep.htm.
5. R. Cole, L. Gottlieb, and M. Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proceedings of STOC 2004, pages 91-100, 2004.
6. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, London, 1997.
7. L. Marsan and M. F. Sagot. Extracting structured motifs using a suffix tree algorithm and application to promoter consensus identification. In Proceedings of RECOMB 2000, pages 210-219, 2000.
8. G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31-88, 2001.
9. G. Navarro, R. Baeza-Yates, E. Sutinen, and J. Tarhio. Indexing methods for approximate string matching. IEEE Data Engineering Bulletin, 24(4):19-27, 2001.
10. W. Szpankowski. A generalized suffix tree and its (un)expected asymptotic behaviors. SIAM J. Computing, 22:1176-1198, 1993.
11. W. Szpankowski, P. Jacquet, and B. McVey. Compact suffix trees resemble patricia tries: Limiting distribution of depth. Journal of the Iranian Statistical Society, 3:139-148, 2004.
An O(1) Solution to the Prefix Sum Problem on a Specialized Memory Architecture
Andrej Brodnik^{1,2}, Johan Karlsson^1, J. Ian Munro^3, and Andreas Nilsson^1

^1 Luleå University of Technology, Dept. of Computer Science and Electrical Engineering, S-971 87 Luleå, Sweden. {johan.karlsson,andreas.nilsson}@csee.ltu.se
^2 University of Primorska, Faculty of Education, Cankarjeva 5, 6000 Koper, Slovenia. andrej.brodnik@pef.upr.si
^3 Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1. imunro@uwaterloo.ca
Abstract. In this paper we study the Prefix Sum problem introduced by Fredman. We show that it is possible to perform both update and retrieval in O(1) time simultaneously under a memory model in which individual bits may be shared by several words. We also show that two variants (generalizations) of the problem can be solved optimally in O(lg N) time under the comparison based model of computation.
1 Introduction
Models of computation play a fundamental role in theoretical Computer Science, and indeed, in the subject as a whole. Even in modeling a standard computer, the random access machine (RAM) model has been subject to refinements which more realistically model cost or, as in this paper, suggest feasible extensions to the model that permit more efficient computation, at least for some problems. Work taking into account a memory hierarchy, either when memory and page sizes are known (cf. [2]) or not (cf. [11]), is an example of the former. Taking into account parallelism, as in the PRAM model (cf. [17,26]), is an obvious example of the latter. More subtle examples include the recent result that the operations of an arbitrary finite Abelian group can be carried out in constant time (we assume a word of memory is adequate to hold the size of the group) provided one can reverse the bits of a word in constant time [8]. This argues for a more robust set of operations. Here we deal with the way a single level memory is

Please use the following format when citing this chapter: Brodnik, A., Karlsson, J., Munro, J.I., Nilsson, A., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science - TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakwa, Y., (Boston: Springer), pp. 103-114.
organized and demonstrate that the power of a machine can be increased if we permit individual bits to occur in several words simultaneously. This Random Access Machine with Byte Overlap (RAMBO) was first suggested by Fredman and Saks [10] and subsequently used by Brodnik et al. [6] and Brodnik and Iacono [7]. Indeed, it is shown in the latter two papers that a priority queue of word-sized objects can be maintained in constant time under a particular form of the RAMBO model, whereas Beame and Fich [3] and Brodnik and Iacono [7] have both shown lower bounds on the problem under various forms of the RAM model. Here we discuss solutions to variants of the Prefix Sum problem (i.e. finding the sum of the first j elements in an array and also updating these values), which was introduced by Fredman [9]. Various lower bounds have been proven for the problem. We, however, focus on the problem under a nonstandard, though very feasible, model to achieve a constant time solution. Fredman and Saks actually suggested the RAMBO model in connection with the Prefix Sum problem. They claim, with no hint of how it may be done, that Prefix Sum mod 2 can be solved in constant time under the model. We show how this can be done not only for Prefix Sum mod 2 but for Prefix Sum modulo an arbitrary universe size M ≤ 2^⌊b/n⌋, where b is the word size, n = ⌈lg N⌉, and N is the size of the array. The RAMBO model, besides the usual RAM operations (cf. [27]), also has a part of memory where a bit may occur in several registers or in several positions in one register. The way the bits occur in this part of the memory has to be specified as part of the model. One example of such a memory variant is a square of bits with b rows and b columns. A b-bit word can be fetched either as a row or a column. In such a memory each bit can be accessed either by the row word or the column word. The form of RAMBO used by Brodnik et al.
[6] to solve the priority queue problem in O(1) worst case time makes use of words corresponding to the leaves of a balanced binary tree. Each node of the tree contains a flag bit and each such word contains the flags along the root to leaf path, so, for example, the flag at the root is in all of these words. The specific architecture was called Yggdrasil after the giant ash tree linking the worlds in Norse mythology. That variant has been implemented in hardware [18] and the actual rerouting of the bits on a word fetch is not difficult. In this paper we modify the Yggdrasil variant slightly and solve the Prefix Sum problem. This gives further evidence of the value of such an architecture, at least for a special purpose processor. Now let us formally define the Prefix Sum problem:

Definition 1 The Prefix Sum problem is to maintain an array, A, of size N, and to support the following operations:
Update(j, Δ): A(j) := A(j) + Δ
Retrieve(j): return Σ_{i=0}^{j} A(i)
where 0 ≤ j < N.
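As a baseline, Definition 1 can be realised directly with an explicit array; the sketch below (hypothetical class name, Python used for illustration) makes the intended semantics concrete, with O(1) update but O(N) retrieval:

```python
class PrefixSum:
    """Naive reference implementation of Definition 1."""
    def __init__(self, N):
        self.A = [0] * N

    def update(self, j, delta):
        self.A[j] += delta

    def retrieve(self, j):
        # Sum of A[0..j].
        return sum(self.A[:j + 1])

ps = PrefixSum(8)
ps.update(2, 5)
ps.update(4, 3)
print(ps.retrieve(3), ps.retrieve(7))  # 5 8
```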
Fredman showed that, under the comparison based model of computation, an O(lg N) solution exists for the Prefix Sum problem [9]. The problem can be generalized in several ways and we start by adding another parameter, k, to the Retrieve operation. This parameter is used to tell the starting point of the array interval to sum over. Hence, Retrieve(k, j) returns Σ_{i=k}^{j} A(i), where 0 ≤ k ≤ j < N. This variant is usually referred to as the Partial Sum or Range Sum problem. The Partial Sum problem can be solved using a solution to the Prefix Sum problem (Retrieve(k, j) = Retrieve(j) − Retrieve(k−1)). In fact, the two problems are often used interchangeably. Furthermore, there is no obvious reason to only allow addition in the Update and Retrieve operations. We can allow any binary function, ⊕, to be used. In fact we can allow the Update operation to use one function, ⊕_u, and the Retrieve operation to use another function, ⊕_r. We will refer to this variant of the problem as the General Prefix Sum problem. Moreover, one can allow array positions to be inserted at or deleted from arbitrary places. Hence, we can have sparse arrays, e.g. an array where only A(5) and A(500) are present. Positions which have not yet been added or have been deleted have the value 0. We refer to this variant as the Dynamic Prefix Sum problem. Brodnik and Nilsson [21, pp 65-80] describe a data structure they call a BinSeT tree which can be modified slightly to support all operations of the Dynamic Prefix Sum problem in O(lg N) time. The Searchable Partial Sum problem extends the set of operations with a select(j) operation which finds the smallest i such that Σ_{k=0}^{i} A(k) ≥ j [23]. Hon et al. consider the Dynamic version of the Searchable Partial Sum problem [16]. Another generalization is to use multidimensional arrays and this variant has been studied by the data base community [4,12,13,15,24,25].
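The reduction of Partial Sum to Prefix Sum mentioned above is just two retrievals; a minimal sketch (hypothetical helper name):

```python
def retrieve_range(retrieve, k, j):
    """Partial/Range Sum via a Prefix Sum oracle:
    Retrieve(k, j) = Retrieve(j) - Retrieve(k - 1)."""
    return retrieve(j) - (retrieve(k - 1) if k > 0 else 0)

A = [2, 7, 1, 8, 2, 8]
prefix = lambda j: sum(A[:j + 1])   # stand-in Prefix Sum oracle
print(retrieve_range(prefix, 2, 4))  # 1 + 8 + 2 = 11
```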
Several lower bounds have been presented for the Prefix Sum problem: Fredman showed an Ω(lg N) algebraic complexity lower bound and an Ω(lg N / lg lg N) information-theoretic lower bound [9]. Yao [29] has shown that Ω(lg N / lg lg N) is an inherent lower bound under the semi-group model of computation and this was improved by Hampapuram and Fredman to Ω(lg N) [14]. We side step these lower bounds by considering the RAMBO model of computation [5,10]. As with all RAM based models we need to restrict the size of a word which can be stored and operated on. We denote the word size with b and assume that b is an integer power of 2, which is true for most computers today. A bounded word size also implies a bounded universe of elements that we store in the array. We use M to denote the universe size. Hence all operations ⊕ have to be computed modulo M and we require that each of the operands and the result are stored in one word. We will use n and m to denote ⌈lg N⌉ and ⌈lg M⌉ respectively. Hence, N ≤ 2^n and M ≤ 2^m. Both n and m are less than or equal to b (n, m ≤ b). In one of the solutions we actually require that nm ≤ b. In Sect. 2 we show an O(1) solution to the Prefix Sum problem under the RAMBO model using a modified Yggdrasil variant. In Sect. 3 we discuss an
O(lg N) solution to the General and Dynamic Prefix Sum problems and finally conclude the paper with some open questions in Sect. 4.
2 An O(1) Solution to the Prefix Sum Problem
In our O(1) solution to the Prefix Sum problem we use a complete binary tree on top of the array (Fig. 1). We label the nodes in standard heap order, i.e., the root is node v_1 and the left and right children of a node v_i are v_{2i} and v_{2i+1} respectively. In each node we store m bits representing the sum of the leaves in the left subtree. Since we build a complete binary tree on top of the array we assume that N = 2^n (if this is not true we still build the complete tree and in the worst case waste space proportional to N/2 − 1). We do not store the original array A since its values are stored implicitly in the tree. The only value not stored in the tree (if N = 2^n only) is A(N − 1) and we store this value explicitly (vnl). Formally we define:

Definition 2 An N-m-tree is a complete binary tree with N leaves in which the internal nodes (v_i) store an m-bit value. In addition, an m-bit value is stored separately (vnl).

To update A(j) (Algorithm 1) in this structure we have to update all the nodes on the path from leaf j to the root in which j belongs to the left subtree. To Retrieve(j) (Algorithm 2) we need to sum the values of all the nodes on the path from leaf j + 1 to the root in which j + 1 belongs to the right subtree. Note that the path corresponding to array position j starts at node v_{N/2 + j div 2}.
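A quick sketch of the heap numbering used above (illustrative code, names are ours): leaf j is heap index N + j, and repeatedly halving the index enumerates the internal nodes up to the root, starting at v_{N/2 + j div 2}:

```python
def path_to_root(N, j):
    """Heap-order indices of the internal nodes on the path from
    leaf j (array position) to the root of a complete binary tree
    with N leaves.  Leaf j corresponds to heap index N + j."""
    path, i = [], N + j
    while i > 1:
        i //= 2
        path.append(i)
    return path

# N = 16: the path for array position 5 starts at v_{16/2 + 5 div 2} = v_10.
print(path_to_root(16, 5))  # [10, 5, 2, 1]
```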
Fig. 1. Complete binary tree on top of A (leaves labelled 0 to 15). Nodes are storing the sum of the values in the leaves covered by the left subtree.

The method described above implies an O(lg N) update and retrieval time in the RAM model. To achieve constant time update and retrieval we use a variant of the RAMBO model similar to the Yggdrasil variant. In the Yggdrasil variant, registers overlap as paths from leaf to root in a complete binary tree with one bit stored in each internal node [6]. We generalize the Yggdrasil variant and let it store m bits in each node and call this variant m-Yggdrasil. In any
An 0(1) Solution to the Prefix Sum Problem
107
update(j, Δ)
  if (j == N-1)
    vnl = vnl + Δ;
  else
    i = N + j;
    while (i > 1)
      next = i div 2;
      if (i mod 2 == 0)
        v_next = (v_next + Δ) mod M;
      i = next;

Alg 1: Updating of an N-m-tree in O(lg N) time.
retrieve(j)
  if (j == N-1)
    sum = vnl;
    i = N + j;
  else
    sum = 0;
    i = N + j + 1;
  while (i > 1)
    next = i div 2;
    if (i mod 2 == 1)
      sum = (sum + v_next) mod M;
    i = next;
  return sum;

Alg 2: Retrieve in an N-m-tree in O(lg N) time.
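Algorithms 1 and 2 can be transcribed into ordinary RAM code and checked against a brute-force sum; the Python sketch below (class and variable names are ours) stores the internal nodes v[1..N−1] in heap order and A(N−1) in vnl:

```python
class NmTree:
    """N-m-tree per Algorithms 1-2: node v[i] stores, modulo M, the
    sum of the leaves in its left subtree; vnl stores A[N-1]."""
    def __init__(self, N, M):
        self.N, self.M = N, M
        self.v = [0] * N      # entries v[1..N-1] are used
        self.vnl = 0

    def update(self, j, delta):
        if j == self.N - 1:
            self.vnl = (self.vnl + delta) % self.M
            return
        i = self.N + j
        while i > 1:
            nxt = i // 2
            if i % 2 == 0:           # j lies in the left subtree of nxt
                self.v[nxt] = (self.v[nxt] + delta) % self.M
            i = nxt

    def retrieve(self, j):           # sum of A[0..j] modulo M
        if j == self.N - 1:
            s, i = self.vnl, self.N + j
        else:
            s, i = 0, self.N + j + 1
        while i > 1:
            nxt = i // 2
            if i % 2 == 1:           # left subtree of nxt lies left of j
                s = (s + self.v[nxt]) % self.M
            i = nxt
        return s

t, A, M = NmTree(8, 100), [0] * 8, 100
for j, d in [(0, 5), (3, 7), (7, 9), (3, 4)]:
    t.update(j, d); A[j] += d
print([t.retrieve(j) == sum(A[:j + 1]) % M for j in range(8)])  # all True
```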
m-Yggdrasil, register reg[i] corresponds to the path from node v_{N/2+i} to the root of the tree. Each register consists of nm ≤ b bits. In total the m-Yggdrasil registers need (N − 1) · m bits. Now, we use the registers from m-Yggdrasil to store the nodes of our tree. The path corresponding to array position j is stored in reg[j/2] and hence all nodes along the path can be accessed at once. We let levels of the tree be counted from the internal nodes above the leaves, starting at 0 and ending with n − 1 at the root. If the ith bit of j is 1 then j is in the right subtree of the node on level i of the path, and in the left otherwise. Hence j can be used to determine which nodes along the path should be updated (nodes corresponding to bits of j that are 0) and which nodes should be used when retrieving a sum (nodes corresponding to bits of j that are 1). When updating the m-Yggdrasil registers (Algorithm 3), for all bits of j, if the ith bit of j is 0 we add Δ to the value of the ith node along the path from j to the root. To do this we shift Δ to the corresponding position (Δ << (im)) and add to reg[j/2]. Instead of checking whether the ith bit of j is 0 we can
mask the shifted Δ with a value based on NOT j. The value consists of, if the ith bit of NOT j is 1, m 1s shifted to the correct position, and m 0s otherwise.
update(j, Δ)
  if (j == N-1)
    vnl = vnl + Δ;
  else
    for (i=0; i < n; i++)
      if (((j >> i) AND 1) == 0)
        reg[j/2] = reg[j/2] + (Δ << (i*m));

Alg 3: Updating of an N-m-tree stored in m-Yggdrasil memory (O(lg N) time).
Actually, as long as the binary operation only affects the m bits that should be updated, we can use word-size parallelism (cf. [5]) and perform the update of all nodes in parallel. In Sect. 2.1 we show that addition modulo M can be implemented affecting only m bits. We use two functions (dist(i) and mask(i)) to simplify the description of the update and retrieve methods. The function dist(i), (0 ≤ i < 2^m), computes nm-bit values. The values are n copies of the m bits in i. For example, given m = 3, n = 4, dist(010) is 010010010010. The function mask(i), (0 ≤ i < 2^n), also computes nm-bit values. These values are computed as follows: bit j (0 ≤ j < n) of i is copied to bits jm..(j + 1)m − 1. For example, given m = 3, n = 4, mask(1001) is 111000000111. Both these functions can be implemented by using word-size parallelism [5]. We can update the tree in constant time using the procedure in Algorithm 4. First we make n copies of Δ and then mask out the copies we need. Then finally we add the value in reg[j/2] and the masked distributed Δ and store the result in reg[j/2]. For the case when j = N − 1 we simply add vnl and Δ and store it in vnl. This gives us the following lemma:

Lemma 1 The update operation of the Prefix Sum problem can be supported in O(1) when part of the N-m-tree is stored in an m-Yggdrasil memory.
update(j, Δ)
  if (j == N-1)
    vnl = vnl + Δ;
  else
    reg[j/2] = reg[j/2] + (dist(Δ) AND mask(NOT j));

Alg 4: Updating of an N-m-tree stored in m-Yggdrasil memory using word-size parallelism (O(1) time).
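The two helper functions and the one-step update of Algorithm 4 can be mimicked with unbounded Python integers standing in for the nm-bit m-Yggdrasil register (a sketch; reg is an ordinary integer here, and the function names follow the text):

```python
def dist(i, m, n):
    """n copies of the m-bit value i, concatenated."""
    return sum(i << (k * m) for k in range(n))

def mask(i, m, n):
    """Bit j of i expanded to m copies at field j."""
    r = 0
    for j in range(n):
        if (i >> j) & 1:
            r |= ((1 << m) - 1) << (j * m)
    return r

m, n = 3, 4
assert dist(0b010, m, n) == 0b010010010010   # example from the text
assert mask(0b1001, m, n) == 0b111000000111  # example from the text

def update_register(reg, j, delta, m, n):
    """Algorithm 4 in one step: add delta only to the fields whose
    corresponding bit of j is 0 (i.e. bits of NOT j that are 1)."""
    return reg + (dist(delta, m, n) & mask(~j & ((1 << n) - 1), m, n))
```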
To support the retrieve method in constant time we use a table SUM[i], (0 ≤ i < 2^{nm}), with m-bit values that are the sum modulo M of the n m-bit values in i. To retrieve the sum (Algorithm 5) we read the register reg corresponding to j and mask out the parts we need. Then we use the table SUM to calculate the sum. Finally, we add vnl to the sum if j = N − 1.
retrieve(j)
  if (j == N-1)
    v = reg[j/2] AND mask(j);
  else
    v = reg[(j+1)/2] AND mask(j+1);
  sum = SUM[v];
  if (j == N-1)
    sum = vnl + sum;
  return sum;

Alg 5: Retrieve in an N-m-tree stored in m-Yggdrasil memory using word-size parallelism (O(1) time).
The space needed by the table SUM is 2^{nm} · m = N^{lg M} · m = M^{lg N} · m bits, which is rather large. In order to reduce the space requirement we can reduce, by half, the number of bits used as index into the table. This gives us a space requirement of M^{lg N / 2} · m bits. We do this by shifting the top n/2 m-bit values from reg down and computing the sum modulo M of these values and the bottom n/2 values. Then this new (n/2)m-bit value is used as index into SUM instead. We can actually repeat this process until we get the single m-bit value we desire, and hence we do not need the table SUM (Algorithm 6). However, this does increase the time complexity to O(lg n) = O(lg lg N). This gives us a trade off between space and time. By allowing O(l) steps for the retrieve method we need M^{lg N / 2^l} · m bits for the table.

Lemma 2 The retrieve operation of the Prefix Sum problem can be supported in O(l + 1) time using O(M^{lg N / 2^l} · m + m) bits of memory in addition to the N-m-tree. Part of the N-m-tree is stored in m-Yggdrasil memory.

By adjusting l we can achieve the following result:

Corollary 1 The retrieve operation of the Prefix Sum problem can be supported in:
- O(1) time using O(M^{lg N / 2} · m) bits of memory in addition to the N-m-tree, with l = 1.
- O(lg lg N) time using O(m) bits of memory in addition to the N-m-tree, with l = ⌈lg lg N⌉.
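The identity 2^{nm} · m = N^{lg M} · m = M^{lg N} · m behind this space bound follows from n = lg N and m = lg M, and is easy to sanity-check numerically:

```python
from math import log2

N, M = 16, 8
n, m = int(log2(N)), int(log2(M))   # n = lg N = 4, m = lg M = 3
# 2^(nm) = N^(lg M) = M^(lg N)
print(2 ** (n * m), N ** m, M ** n)  # 4096 4096 4096
```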
retrieve(j)
  if (j == N-1)
    v = reg[j/2] AND mask(j);
  else
    v = reg[(j+1)/2] AND mask(j+1);
  l = ⌈lg n⌉;
  do
    l = l - 1;
    vnew = (v >> ((2^l)m)) + (v AND ((1 << ((2^l)m)) - 1));
    v = vnew;
  while (l > 0)
  sum = v;
  if (j == N-1)
    sum = vnl + sum;
  return sum;

Alg 6: Retrieve in an N-m-tree stored in m-Yggdrasil memory using no additional memory (O(lg lg N) time).
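The halving step of Algorithm 6 is easy to emulate with unbounded integers; this sketch applies the modulo once at the end, whereas the paper keeps every intermediate within m bits using the addition of Sect. 2.1:

```python
from math import ceil, log2

def fold_sum(v, m, n, M):
    """Sum the n m-bit fields packed in v, modulo M, by repeated halving:
    each step adds the top half of the fields onto the bottom half."""
    l = ceil(log2(n))
    while l > 0:
        l -= 1
        half = (1 << l) * m
        v = (v >> half) + (v & ((1 << half) - 1))
    return v % M

m, n, M = 4, 4, 13
fields = [3, 9, 12, 7]
v = sum(x << (k * m) for k, x in enumerate(fields))  # pack the fields
print(fold_sum(v, m, n, M) == sum(fields) % M)  # True
```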
2.1 Addition modulo M
Let us consider the two m-bit operands a and b, which are split into two pieces each (a_lo, a_hi, b_lo and b_hi). The two pieces a_lo and a_hi contain the m/2 least and most significant bits of a respectively (similarly for b_lo and b_hi). Note that a_lo and the other pieces are stored in m bits but only the m/2 least significant bits are used. We can now add the two operands:

c1_lo = a_lo + b_lo    (1)
c1_hi = a_hi + b_hi    (2)

However, both c1_lo and c1_hi might need m/2 + 1 bits for their result. The (m/2 + 1)st bit of c1_lo should be added to c1_hi, so we split c1_lo into two pieces (c1_{lo,lo} and c1_{lo,hi}) and add the most significant bits to c1_hi:

c_hi = c1_hi + c1_{lo,hi}    (3)
c_lo = c1_{lo,lo}    (4)
The result of a + b is now stored in c_lo and c_hi and we have not used more than m bits in any word. However, in total m + 1 bits might be needed for the value. To compute c mod M we can check whether or not c − M ≥ 0; if so, c mod M = c − M and otherwise c mod M = c. However, we do not want to produce a negative value since that would affect all the bits in the word. Instead we add an additional 2^m to the value and compare to 2^m, i.e. c + 2^m − M ≥ 2^m. Since 2^m − M ≥ 0 this will never produce a negative value. Note that c + 2^m − M ≤ M − 1 + M − 1 + 2^m − M = M + 2^m − 2 ≤ 2^{m+1} − 2, which
only needs m + 1 bits to be represented. Hence, if we calculate this value using the strategy above we will not use more than m bits of any word. Furthermore, a straightforward less-than comparison can not be performed using word-size parallelism since all bits of the words are considered. Instead we view the comparison as a check whether the (m + 1)st bit is set or not. If it is set, the value is larger than or equal to 2^m (cf. [19,22]). We can actually create a bit mask which consists of m 1s if the (m + 1)st bit is set and m 0s otherwise:

d = ((c + 2^m − M) AND 2^m) − (((c + 2^m − M) AND 2^m) >> m)    (5)
This bit mask d can then be used to calculate res = c mod M. Since res is equal to c − M if the (m + 1)st bit of c + 2^m − M is set, and c otherwise, we get

res = ((c − M) AND d) OR (c AND NOT d)    (6)
When computing c − M we must make sure that we do not produce a negative value. This is done by using a similar strategy as for addition above, but we also set the bits in c_{hi,hi} to 1 during the computation. If c − M is greater than 0 this will not affect the result and otherwise the result will not be used. We now have a procedure which can be used to compute (a + b) mod M without using more than m bits in any word. Hence, word-size parallelism can be used and we get our main result from this section:

Theorem 1 Using the N-m-tree together with the m-Yggdrasil memory we can support the operations of the Prefix Sum problem in O(l + 1) time using (N − 1)m bits of m-Yggdrasil memory and O(M^{lg N / 2^l} · m + m) bits of ordinary memory.
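For a single m-bit value, the branch-free reduction of Eqs. (5) and (6) can be written out directly (a sketch with plain integers; the m/2-piece splitting that keeps intermediates within m bits is omitted):

```python
def add_mod(a, b, m, M):
    """(a + b) mod M for 0 <= a, b < M <= 2**m, without comparisons."""
    c = a + b
    t = c + (1 << m) - M          # bit m of t is set iff c >= M
    hi = t & (1 << m)
    d = hi - (hi >> m)            # m ones if c >= M, else 0    (Eq. 5)
    return ((c - M) & d) | (c & ~d & ((1 << m) - 1))            # Eq. 6

m, M = 5, 29
print(all(add_mod(a, b, m, M) == (a + b) % M
          for a in range(M) for b in range(M)))  # True
```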
3 An O(lg N) Solution to the General and Dynamic Prefix Sum Problems
We can actually partially solve the General Prefix Sum problem using the N-m-tree data structure and the m-Yggdrasil variant of RAMBO. All binary operations such that all elements in the universe have a unique inverse element (i.e. binary operations which form a Group with the set of elements in the universe) and which only affect the m bits involved in the operation can be supported. This includes for example addition and subtraction but not the maximum function. To solve the General and Dynamic Prefix Sum problems for semi-group operations we modify the Binary Segment Tree (BinSeT) data structure suggested by Brodnik and Nilsson. It was designed to handle in-advance resource reservation [21, pp 65-80] and if it is slightly modified it can solve both the General and Dynamic Prefix Sum problems efficiently. The original BinSeT stores, in each internal node, μ, the maximum value over the interval, and δ, the change of the value over the interval. Further, it also stores τ, the time of the left most event in the right subtree. Instead of storing times as interval dividers we store array indices. To solve the Dynamic Prefix Sum problem with addition as operation we only need
to store δ. When solving the General Prefix Sum problem one needs to store information depending on the two binary operations ⊕_u and ⊕_r. When adding a new array position or deleting an array position the tree is rebalanced (cf. [1,20]) and hence the height is always O(lg N). When updating a value in an array position we start at the root and search for the proper leaf using the interval dividers. During the back tracking of the recursion we update the information stored in each affected node. At retrieval we process the information of the proper nodes when traversing the tree. Since the height of the tree is O(lg N), all the operations can be performed in O(lg N) time. This matches the lower bound by Hampapuram and Fredman [14]. BinSeT consists of O(N) nodes when we use it to solve the General Prefix Sum problem. Each node contains O(1) m-bit values and hence the total space requirement is O(Nm) bits.
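For intuition, the O(lg N) behaviour for a semigroup operation such as max — which the N-m-tree cannot support — can be sketched with a plain array-based segment tree (our own illustrative code, not the authors' BinSeT, which additionally handles insertion and deletion of positions):

```python
class SegTree:
    """O(lg N) update / prefix-query for an associative op (semigroup)."""
    def __init__(self, N, op, identity):
        self.N, self.op, self.id = N, op, identity
        self.t = [identity] * (2 * N)   # leaves at t[N..2N-1]

    def update(self, j, value):
        j += self.N
        self.t[j] = value
        while j > 1:                    # recompute summaries up to the root
            j //= 2
            self.t[j] = self.op(self.t[2 * j], self.t[2 * j + 1])

    def prefix(self, j):                # op over A[0..j]
        res, lo, hi = self.id, self.N, self.N + j + 1
        while lo < hi:
            if lo & 1:
                res = self.op(res, self.t[lo]); lo += 1
            if hi & 1:
                hi -= 1; res = self.op(res, self.t[hi])
            lo //= 2; hi //= 2
        return res

s = SegTree(8, max, float("-inf"))
for j, v in enumerate([3, 1, 4, 1, 5, 9, 2, 6]):
    s.update(j, v)
print(s.prefix(3), s.prefix(6))  # 4 9
```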
4 Conclusion
The Dynamic and General Prefix Sum problems can both be solved optimally in O(lg N) time using O(Nm) space under the comparison based model with semi-group operations. The Prefix Sum problem can be solved in O(1) time under the RAMBO model when we allow O(M^{⌈lg N⌉/2} · m) bits of ordinary memory and O(Nm) bits of m-Yggdrasil memory to be used. This is a huge amount of ordinary memory and if we restrict the space requirement to be subexponential in both N and M (O(m) bits of ordinary memory and O(Nm) bits of m-Yggdrasil memory) we need to use O(lg lg N) time. We know of no better lower bound under RAMBO than the trivial Ω(1) when only allowing O((N^{O(1)} + M^{O(1)})m) space. Further, it is currently unknown if one can achieve an O(1) solution to the Dynamic and General Prefix Sum problems using the RAMBO model. Another open question is whether or not it is possible to achieve an o(lg N) solution to the multidimensional variant.
Acknowledgment
We thank the anonymous reviewers for helpful comments and additional references.
References
1. G. M. Adelson-Velskii and E. M. Landis. An algorithm for the organization of information. In Soviet Math. Doklady 3, pages 1259-1263, 1962.
2. Alok Aggarwal and Ashok K. Chandra. Virtual memory algorithms (preliminary version). In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 173-185. ACM Press, May 2-4 1988.
3. P. Beame and F. E. Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences, 65(1):38-72, 2002.
4. Fredrik Bengtsson and Jingsen Chen. Space-efficient range-sum queries in OLAP. In Yahiko Kambayashi, Mukesh Mohania, and Wolfram Wöß, editors, Data Warehousing and Knowledge Discovery: 6th International Conference DaWaK, volume 3181 of Lecture Notes in Computer Science, pages 87-96. Springer, September 2004.
5. Andrej Brodnik. Searching in Constant Time and Minimum Space (Minimæ Res Magni Momenti Sunt). PhD thesis, University of Waterloo, Waterloo, Ontario, Canada, 1995. (Also published as technical report CS-95-41.)
6. Andrej Brodnik, Svante Carlsson, Michael L. Fredman, Johan Karlsson, and J. Ian Munro. Worst case constant time priority queue. Journal of Systems and Software, 78(3):249-256, December 2005.
7. Andrej Brodnik and John Iacono. Dynamic predecessor queries. Unpublished manuscript, 2006.
8. Arash Farzan and J. Ian Munro. Succinct representation of finite abelian groups. In Proceedings of the 2006 International Symposium on Symbolic and Algebraic Computation, Lecture Notes in Computer Science. Springer, 2006. To appear.
9. Michael L. Fredman. The complexity of maintaining an array and computing its partial sums. Journal of the ACM, 29(1):250-260, January 1982.
10. Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 345-354. ACM Press, May 14-17 1989.
11. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 285-297. IEEE Computer Society, October 17-19 1999.
12. Steven P. Geffner, Divyakant Agrawal, Amr El Abbadi, and T. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In Proceedings of the 15th International Conference on Data Engineering, pages 328-335, 1999.
13. Steven P. Geffner, Mirek Riedewald, Divyakant Agrawal, and Amr El Abbadi. Data cubes in dynamic environments. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, pages 31-40, 1999.
14. Haripriyan Hampapuram and Michael L. Fredman. Optimal biweighted binary trees and the complexity of maintaining partial sums. SIAM Journal on Computing, 28(1):1-9, 1998.
15. C. Ho, R. Agrawal, N. Megiddo, and R. Srikant. Range queries in OLAP data cubes. In Proceedings ACM SIGMOD International Conference on Management of Data, pages 73-88, 1997.
16. Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for searchable partial sums. In Toshihide Ibaraki, Naoki Katoh, and Hirotaka Ono, editors, Algorithms and Computation - ISAAC 2003, 14th International Symposium, volume 2906 of Lecture Notes in Computer Science, pages 505-516. Springer, December 2003.
17. Richard M. Karp and Vijaya Ramachandran. Parallel algorithms for shared-memory machines. In van Leeuwen [28], chapter 17, pages 869-941.
18. Roni Leben, Marijan Miletic, Marjan Spegel, Andrej Trost, Andrej Brodnik, and Johan Karlsson. Design of high performance memory module on PC100. In Proceedings Electrotechnical and Computer Science Conference, pages 75-78, Slovenia, 1999.
19. Kjell Lemström, Gonzalo Navarro, and Yoan Pinzon. Practical algorithms for transposition-invariant string-matching. Journal of Discrete Algorithms, 3(2-4):267-292, 2005.
20. Anany Levitin. Introduction to The Design & Analysis of Algorithms. Pearson Education Inc., Addison-Wesley, 2003.
21. Andreas Nilsson. Data Structures for Bandwidth Reservation and Quality of Service on the Internet. Lic. thesis, Department of Computer Science and Electrical Engineering, Luleå University of Technology, Luleå, Sweden, April 2004.
22. W. Paul and J. Simon. Decision trees and random access machines. In Proc. Int'l. Symp. on Logic and Algorithmic, pages 331-340, Zurich, 1980.
23. Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct dynamic data structures. In Algorithms and Data Structures, 7th International Workshop, volume 2125 of Lecture Notes in Computer Science, pages 426-437. Springer, 8-10 August 2001.
24. Mirek Riedewald, Divyakant Agrawal, and Amr El Abbadi. Flexible data cubes for online aggregation. In Database Theory - ICDT 2001, 8th International Conference, London, UK, January 4-6, 2001, Proceedings, volume 1973 of Lecture Notes in Computer Science, pages 159-173, 2001.
25. Mirek Riedewald, Divyakant Agrawal, Amr El Abbadi, and Renato Pajarola. Space-efficient data cubes for dynamic environments. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK), pages 24-33, 2000.
26. L. G. Valiant. General purpose parallel architectures. In van Leeuwen [28], chapter 18, pages 943-971.
27. Peter van Emde Boas. Machine models and simulations. In van Leeuwen [28], chapter 1, pages 3-66.
28. Jan van Leeuwen, editor. Handbook of Theoretical Computer Science, volume A: Algorithms and Complexity. Elsevier/MIT Press, Amsterdam, 1990.
29. Andrew C. Yao. On the complexity of maintaining partial sums. SIAM Journal on Computing, 14(2):277-288, May 1985.
An Algorithm to Reduce the Communication Traffic for Multi-Word Searches in a Distributed Hash Table

Yuichi Sei^1, Kazutaka Matsuzaki^2, and Shinichi Honiden^3

^1 The University of Tokyo, Information Science and Technology, Computer Science Department, Tokyo, Japan, sei@nii.ac.jp
^2 The University of Tokyo, Information Science and Technology, Computer Science Department, Tokyo, Japan, matsuzaki@nii.ac.jp
^3 National Institute of Informatics, Tokyo, Japan, honiden@nii.ac.jp
Abstract. In distributed hash tables, much communication traffic comes from multi-word searches. The aim of this work is to reduce this traffic by using a bloom filter, a space-efficient probabilistic data structure used to test whether or not an element is a member of a set. However, bloom filters have a limited role when several sets have different numbers of elements. In the proposed method, extra data storage is generated when contents' keys are registered in a distributed hash table system. Accordingly, we propose a "divided bloom filter" to solve the problem of the normal bloom filter. Using the divided bloom filter, we aim to reduce both the amount of communication traffic and the amount of data storage.
1 Introduction

Peer-to-peer systems are distributed networks that can share contents or services without the need for a central server. The first peer-to-peer systems, such as Napster [5] and Gnutella [1], lacked scalability. Distributed hash table (DHT) systems such as Chord [19], CAN [15], and Pastry [17] aim to overcome this challenge. The DHT provides storage and retrieval by using a hash function. When a node participates in the DHT system, it is given a range of hash values for which it is responsible. The node then finds the hash value of the key^1 of the content it has. It then sends [h(key), the content ID, its address] to any node participating in the DHT. The message is forwarded from node to node until it reaches the node responsible for h(key). Once this has been done, the contents can be found by any user; the user needs only to again hash a key to h(key) and ask any node to find the data corresponding to h(key). In full-text searching, each node stores the posting list for the word(s) it is responsible for.

^1 We call the hash value of x "h(x)".

Please use the following format when citing this chapter: Sei, Y., Matsuzaki, K., Honiden, S., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakwa, Y., (Boston: Springer), pp. 115-129.

A query involving multiple words requires that the postings for
one or more of the words be sent over the network. For simplicity, this discussion will assume a two-word query. Sending the smaller of the two posting lists to the node holding the larger posting list is cheaper; the latter node then performs the intersection and ranking and returns the few highest-ranking document identifiers. According to [13], analysis of 81,000 queries made to a search engine for mit.edu [4] shows that the average query would move 300,000 bytes of postings across the network. Of the queries analyzed, 40% involved just one word, 35% two, and 25% three or more. Google indexes more than 3 billion Web documents [2], and mit.edu has 1.7 million Web pages; scaling to the size of the Web (3 billion pages) suggests that the average query might require 530 MB. If the Internet bandwidth of users is 1 Gbps, and users want to get a reply to their query within 0.5 seconds, for example, the amount of traffic must be less than 0.5 Gb (12.1% of 530 MB). The normal process of searching for multi-word text in a DHT system is shown schematically in Figure 1 and Table 1-SA. We call this method the simple algorithm (SA). The example in the figure and table represents the case of searching for two words, "W1" and "W2". Usually, the transmission from a node to a destination node passes through other intermediary nodes; however, in this paper, we omit the intermediary nodes. In the case of SA, a huge amount of traffic occurs when the node responsible for h(W1) transmits content IDs to the node responsible for h(W2). To reduce this traffic, the related works we will introduce in Section 2 take two main types of measures: using a device for (1) registering contents' keys, or (2) transmitting content IDs. We suggest using a divided bloom filter (DBF), as well as using both devices (1) and (2). First, as regards measure (1), we reduce the amount of traffic in searching for multi-word contents by using a bloom filter ([8], [9]) when a node registers its contents' keys.
In addition, as regards measure (2), we reduce the amount of traffic by transmitting the DBF of content IDs in place of the content IDs themselves.
2 Related Work

The bloom filter is used in this paper and in related works with the aim of reducing the amount of traffic in searching for multi-word text in DHT systems. We describe this filter below.

2.1 Bloom filter
A bloom filter is a space-efficient probabilistic data structure used to test whether or not an element is a member of a set. A basic description of the bloom filter and its problem are given in this subsection.
[Figure 1 shows the message flow of the simple algorithm: 1. the user wants contents that contain "W1" and "W2"; 2. calculation of h(W1) and h(W2); 3. the user sends h(W1), h(W2), and the user address to the node responsible for h(W1); 4. that node sends all content IDs that include "W1", together with h(W2) and the user address, to the node responsible for h(W2); 5. that node extracts the intersection of the received IDs and its saved IDs; 6. all content IDs that include both "W1" and "W2" are returned to the user.]
Fig. 1. The process of the simple algorithm: normal searching for multi-word text (here, a user wants contents that contain the two words "W1" and "W2") on a DHT

Basic description of the Bloom Filter
Imagine there are a set A and a set B. To get A ∩ B in a simple manner, all the elements of set A are transmitted to the side of set B, and the elements existing in both set A and set B are extracted. At this time, the size of the traffic is the sum of the sizes of the elements of set A. In the method using the bloom filter, set A itself is not transmitted; the bloom filter created from set A is transmitted. The size of the bloom filter is less than the whole size of set A, so the amount of traffic is reduced. The side of set B that received the bloom filter can create a set S_B satisfying S_B ⊇ A ∩ B and S_B ⊆ B.

If each element of S_B is tested for membership in A ∩ B, some false positives (an element that is not a member of A ∩ B being returned) occur, but false negatives (an element that is a member of A ∩ B not being returned) cannot occur. The false positive rate declines exponentially as the size of the bloom filter is increased. The set S_B created by the side of set B is transmitted to the side of set A, and A ∩ B is obtained. The execution procedure for the bloom filter is as follows. The idea is to allocate a vector v of m bits, initially all set to 0, and then choose k independent hash functions h_1, h_2, ..., h_k, each with range 1, ..., m. For each element a ∈ A, the bits at positions h_1(a), h_2(a), ..., h_k(a) in v are set to 1. (A particular bit might be set to 1 multiple times.) Given a query for b, we check the bits at positions h_1(b), h_2(b), ..., h_k(b). If any one of them is 0, b is certainly not in set A. Otherwise, we conjecture that b is in the set, although there is a certain probability that this is incorrect. This is called a "false positive". Parameters k and m should be chosen such that the probability of a false positive (and hence a false hit) is acceptable.
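The m-bit-vector procedure above can be sketched in Python (a minimal illustration, not the paper's implementation; deriving the k hash functions by salting SHA-1 with an index is our assumption):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: an m-bit vector v with k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m          # vector v, initially all 0

    def _positions(self, element):
        # Derive k independent positions by salting SHA-1 with the index i
        # (an assumption; any k independent hash functions would do).
        for i in range(self.k):
            digest = hashlib.sha1(("%d:%s" % (i, element)).encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, element):
        # Set the bits at positions h_1(a), ..., h_k(a) to 1.
        for p in self._positions(element):
            self.bits[p] = 1

    def may_contain(self, element):
        # If any probed bit is 0, the element is certainly absent;
        # otherwise it is probably present (false positives are possible,
        # false negatives are not).
        return all(self.bits[p] for p in self._positions(element))

bf = BloomFilter(m=256, k=4)
bf.add("apple")
bf.add("banana")
assert bf.may_contain("apple")       # no false negatives
```

An empty filter (all bits 0) rejects every query, which is why false negatives cannot occur once an element has actually been added.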
Table 1 summarizes the searching sequence for each algorithm.

The contents data N(Wi) contains:
- SA: tuple of [h(Wi), content ID, node address]
- TfBFA: same as SA
- SfBFA: tuple of [h(Wi), content ID, node address, fBF]
- STDBFA: tuple of [h(Wi), content ID, node address, DBF]

Execution of UN:
- SA: calculation of the DHT hash values h(W1) and h(W2)
- TfBFA, SfBFA, STDBFA: same as SA

Transmission from UN to N(W1):
- SA: h(W1), h(W2), UN address
- TfBFA, SfBFA, STDBFA: same as SA

Execution of N(W1):
- TfBFA: creation of an fBF from the saved IDs
- SfBFA: extraction of IDs that have possibilities of containing W2 by using the saved fBFs; the rest is the same as SA
- STDBFA: extraction of IDs that have possibilities of containing W2 by using the saved DBFs, and creation of a DBF from the extracted IDs

Transmission from N(W1) to N(W2):
- SA: h(W2), saved IDs, UN address
- TfBFA: h(W2), the fBF
- STDBFA: h(W2), the DBF

Execution of N(W2):
- SA: extraction of the intersection of the received IDs and the saved IDs
- TfBFA: extraction of IDs that have possibilities of being constituent elements of the fBF N(W2) received, from the IDs registered with h(W2)
- STDBFA: extraction of IDs that have possibilities of being constituent elements of the DBF N(W2) received, from the IDs registered with h(W2)

Transmission from N(W2) to UN:
- SA: extracted IDs [finished]
- SfBFA: extracted IDs [finished]
- TfBFA, STDBFA: × (not performed)

Transmission from N(W2) to N(W1):
- TfBFA: extracted IDs
- STDBFA: the rest is the same as TfBFA

Execution of N(W1):
- TfBFA: extraction of the intersection of the received IDs and the saved IDs

Transmission from N(W1) to UN:
- TfBFA: extracted IDs [finished]

UN: user node; N(Wi): the node responsible for h(Wi)

Table 1. The sequence of searching for multi-word text (here, a user wants contents that contain the two words "W1" and "W2")
The false positive rate (FPR) is a function of k, m, and n, expressed as follows [9]:

FPR = (1 − (1 − 1/m)^kn)^k  (1)
    ≈ (1 − e^(−kn/m))^k.  (2)

When k = ln 2 × m/n, Equation (2) has its minimum value. At that time, the FPR is (1/2)^k. If the target FPR is set to FPR_target, then k = ⌊log_{1/2} FPR_target⌋. Thus,

m = ⌊⌊log_{1/2} FPR_target⌋ × n / ln 2⌋.  (3)
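The choice of k and Equation (3) can be checked with a short sketch (function and variable names are ours):

```python
import math

def bloom_parameters(n, fpr_target):
    # k = floor(log_{1/2} FPR_target); the FPR at the optimum k is (1/2)^k.
    k = math.floor(math.log(fpr_target, 0.5))
    # Equation (3): m = floor(floor(log_{1/2} FPR_target) * n / ln 2).
    m = math.floor(k * n / math.log(2))
    return k, m

# For n = 100 elements and FPR_target = 0.01: log_{1/2} 0.01 ~ 6.64,
# so k = 6 hash functions and m = floor(600 / ln 2) = 865 bits.
k, m = bloom_parameters(100, 0.01)
assert (k, m) == (6, 865)
```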
The salient feature of bloom filters is that there is a clear tradeoff between m and the FPR.

Problem with the Bloom Filter
If n (the number of elements of a set) and FPR_target are given, the filter bit size m can be minimized by setting parameter k to its optimum value. This value of m should be shared at the system level. This is because, if m differs between filters, the hash functions used for checking whether or not a given element is a constituent element of the filter also differ. It is thus necessary to re-calculate the hash value of each element per query. We call bloom filters whose sizes are all the same "fixed-size bloom filters" (fBFs), and we call bloom filters whose sizes differ "variable-size bloom filters" (vBFs). We should use fixed-size BFs in order to avoid calculating many hash values. However, if the sets have different numbers of elements, the problem is that the filter bit size of fBFs is bigger than that of vBFs on average [18]. This is because the FPR increases exponentially as the number of elements of the set increases while the filter bit size stays unchanged. In summary, if we use fixed-size BFs, the FPR is on average higher than that of variable-size BFs of the same size. If we use variable-size BFs, calculating hash values takes much time. This comparison is further described in 3.2.

2.2 Reducing the amount of traffic in searching for multi-word text in a DHT
Several studies have been done to reduce the communication traffic in searching for multi-word text in a DHT. Two main developments have come from this research. The first development is a device for registering content keys; the second is a device for transmitting content IDs. In the first approach, in [11], the set of keywords included in the content was also regarded as a DHT key. The authors created combinations of three words or less, and registered the combinations as well as each word in the DHT.
However, the number of combinations increases exponentially as the number of words increases. In [10], the target for search is a Resource Description Framework [7] (RDF). A system that saves "RDF triples" dispersed in DHT was developed. In this system, the RDF triple itself as well as each element of the RDF triple is registered. Because each RDF triple has only three elements, this method prevented much
extra data storage. However, the method cannot be applied to full-text searching because the contents have many elements and the extra amount of data storage becomes massive^. In [20], a summary of the content is registered as DHT keys. Because doing so reduces the number of keys, the amount of traffic in searching for multi-word text was reduced. However, the amount of information was also reduced by summarizing the content, so this approach cannot be applied to the full-text searching we are addressing. In the second approach, described in [21], [16], and [13], the fixed-size bloom filter is used for transmitting content IDs in searching for multi-word text. By doing so, the amount of traffic was reduced without generating any AndSearchData. From here on, we call this method a "transmission fixed-size bloom filter algorithm" (TfBFA). The specific process using TfBFA is shown in Table 1-TfBFA. In this process, node N(W2), which received the fixed-size bloom filter from node N(W1), transmits the content IDs it extracted back to node N(W1) so as to cut off content IDs accidentally included owing to false-positive results. The advantage of the method using a bloom filter for transmitting content IDs is that there is no AndSearchData; a disadvantage is that the reduction rate of the communication traffic is smaller than that of the first approach.
3 Proposed Technique

The related works used a bloom filter for transmitting content IDs, but we also use it for registering the keywords of content. In this section, the problem of the bloom filter and its solution are also described.

3.1 Saving fixed-size bloom filter algorithm (SfBFA)
We developed a device for registering contents' keys. When a node registers its content, it creates a fixed-size bloom filter from all words of the content. It then registers the filter as well as the hash value of the word to be registered, the content ID, and its address. The specific process for registering contents is as follows.
1. The node calculates the hash values of all words of the content except for the word "W1" to be registered.
2. The node creates a fixed-size bloom filter from all hash values it calculated in step (1).
3. The node registers the tuple of h(W1), the content ID, the node address, and the fixed-size bloom filter it created in (2) in the node assigned h(W1).
We call this method a "saving fixed-size bloom filter algorithm" (SfBFA). The process for searching for two words is shown in Table 1-SfBFA.
^ We call the extra data storage for reducing the amount of traffic "AndSearchData".
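The three registration steps can be sketched as follows (a hypothetical illustration: the fixed filter size M, hash count K, and the salted-SHA-1 filter construction are our assumptions, not values from the paper):

```python
import hashlib

M, K = 1024, 7   # system-wide filter size and hash count (illustrative values)

def make_fbf(words):
    """Steps 1-2: hash every word and set K bits per word in an M-bit
    fixed-size bloom filter, here packed into a single integer."""
    bits = 0
    for w in words:
        for i in range(K):
            h = int(hashlib.sha1(("%d:%s" % (i, w)).encode()).hexdigest(), 16)
            bits |= 1 << (h % M)
    return bits

def register(content_id, node_address, words):
    """Step 3: for each word w of the content, the tuple registered with
    the node assigned h(w) is [h(w), content ID, node address, fBF built
    from all the other words]."""
    tuples = []
    for w in words:
        fbf = make_fbf([x for x in words if x != w])   # step 1: all words but w
        h_w = hashlib.sha1(w.encode()).hexdigest()
        tuples.append((h_w, content_id, node_address, fbf))
    return tuples

tuples = register("content-1", "node-a", ["alpha", "beta", "gamma"])
# The filter stored under "alpha" covers the remaining words, e.g. "beta":
assert make_fbf(["beta"]) & tuples[0][3] == make_fbf(["beta"])
```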
Problem of SfBFA
As described in subsection 2.1, the optimum filter bit size depends on the number of elements in the set. In this study, the number of elements in the set is the number of words in the content. Because different contents have different numbers of words, setting the optimum filter bit size becomes a problem. The k hash functions used in creating the filter should be shared on the DHT system level, so the size of the filter should also be shared on the system level. The filter bit size can be set big enough, but the amount of AndSearchData and traffic will then be increased. On the contrary, if the filter bit size is set too small, the amount of traffic will also be increased because of the rise in FPR. We do not use variable-size bloom filters because doing so would mean taking too much time to calculate hash values.

3.2 Divided bloom filter (DBF)
We propose divided bloom filters to overcome the problem of bloom filters. Each filter bit size can be maintained by dividing the set into several sets that have the same number of elements and by creating filters from each set. We call the filters created by dividing the original set "divided bloom filters" (DBFs). According to Equation 3, m is proportional to n. For this reason, if the FPR of the bloom filter from the original set is a, the FPR of each filter of a DBF is also a. However, the following problem occurs. When an element b is checked as to whether or not it is a constituent element of the DBF, if it is checked through every divided filter and the number of divided filters is GN,

FPR = 1 − (1 − a)^GN.  (4)

If a is sufficiently small, terms in a of second and higher order can be ignored, so

FPR = GN × a.  (5)

According to this equation, the FPR increases as the number of divisions increases. The solution is to identify the only filter that can include element b. By this, the FPR is equal to a in total. The only filter that can include element b can be identified by using the DHT hash function without creating extra data storage. When the node divides the set of words in the content, it calculates the DHT hash value of each word of the content and divides the words into groups according to the DHT hash value. In doing so, the system determines the following parameters in advance.
- MN: average number of words each group can include
- Filter bit size and hash functions used to create filters
The specific process to divide the words of content C is as follows. The values that the DHT hash function can return are 1, 2, ..., DN − 1.
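The approximation in Equation (5) can be checked numerically (a quick sketch; the values of a and GN are arbitrary illustrations):

```python
# Exact FPR when probing all GN divided filters (Equation 4) versus the
# linear approximation GN * a (Equation 5).
a, gn = 0.001, 10                  # per-filter FPR and number of divisions
exact = 1 - (1 - a) ** gn          # Equation (4)
approx = gn * a                    # Equation (5)
assert abs(exact - approx) < 1e-4  # the two agree when a is small
```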
[Figure 2 plots the average FPR against FPR_target for three series: fixed-size BF, variable-size BF, and DBF.]
Fig. 2. Average FPR of 1,000 sets (number of elements is from 1 to 1,000)

1. The node calculates the number of groups GN = ⌊WN/MN + 0.5⌋ depending on WN, i.e., the number of words of content C.
2. The node gives each group Gi (i = 1, ..., GN) the assigned range of values R(Gi) = [(DN/GN) × (i − 1), (DN/GN) × i).
3. The node extracts a word of content C, considers it as w, and calculates the DHT hash value h(w).
4. If R(Gj) includes h(w), then w is grouped into Gj.
5. The node repeats steps (3) to (4) for all words of content C.
In this method, it is not guaranteed that each group has the same number of words. However, if the hash function is collision-free, it is assumed that each group has almost the same number of words. Whether a word b is a member of the words of content C is determined as follows.
1. The node that received the DBF calculates the assigned range of values R(Gi) of each group Gi according to the number of filters it received.
2. The node calculates the DHT hash value h(b) of word b.
3. The node determines the R(Gj) that includes h(b). At this time, b can be a member of only group Gj.
4. The node judges whether word b can be a constituent element of the filter created from group Gj.

Comparison of fBF, vBF, and DBF
Let us compare the following features of fixed-size BFs, variable-size BFs, and DBFs:
1. the average FPR in creating filters from several sets that have different numbers of elements, and
2. the time complexity when an element is checked as to whether it is a member of the filter.
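The division and membership procedures above can be sketched as follows (a minimal illustration: plain Python sets stand in for the per-group bloom filters, since the grouping and single-probe routing are the points being shown, and DN and MN are illustrative values):

```python
import hashlib

DN = 2 ** 32   # range of the DHT hash function (illustrative value)
MN = 100       # average number of words each group can include

def dht_hash(word):
    # Stand-in DHT hash based on SHA-1, mapped into [0, DN).
    return int(hashlib.sha1(word.encode()).hexdigest(), 16) % DN

def group_index(word, gn):
    # A word w belongs to the group Gj whose range R(Gj) contains h(w).
    return dht_hash(word) * gn // DN

def build_dbf(words):
    # Step 1: GN = floor(WN / MN + 0.5); steps 2-5: assign each word to
    # its group (sets stand in for the per-group bloom filters).
    gn = max(1, int(len(words) / MN + 0.5))
    groups = [set() for _ in range(gn)]
    for w in words:
        groups[group_index(w, gn)].add(w)
    return groups

def may_contain(dbf, word):
    # The membership check probes only the single filter whose range
    # covers h(word), so the overall FPR stays at a rather than GN * a.
    return word in dbf[group_index(word, len(dbf))]

words = ["w%d" % i for i in range(250)]
dbf = build_dbf(words)           # 250 words, MN = 100 -> GN = 3 groups
assert may_contain(dbf, "w42")
```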
[Figure 3 plots the required time against FPR_target for fixed-size BF, variable-size BF, and DBF.]
Fig. 3. Required time for checking whether an element is a member of each filter (number of filters is 1,000,000)

1: Figure 2 shows the average FPR of 1,000 sets for each filter method (fixed-size BF, variable-size BF, and DBF). The numbers of elements of the sets range from 1 to 1,000. The filter size was determined by FPR_target^. We changed FPR_target from 1/2 to 1/2^9. MN for the DBFs was set to 100. As FPR_target becomes small, we found, the actual FPR of fixed-size BFs becomes much larger than FPR_target, while that of the DBFs becomes only slightly larger than FPR_target.

2: Figure 3 shows the simulation result for the required time to check whether an element is a member of a set. We created 1,000,000 filters of each kind (fixed-size BF, variable-size BF, and DBF) where the number of elements is 100, and we set FPR_target = 0.1, 0.01, 0.001, and 0.0001. We created an element b randomly and measured the required time to determine whether b was a member of each filter. For fixed-size BFs and DBFs, according to Figure 3, the required times do not vary with changes in FPR_target. For variable-size BFs, we re-calculated the k hash values for each filter; hence, the required time was very long. For DBFs, the required time was much less than that of variable-size BFs and close to that of fixed-size BFs.

3.3 Saving divided bloom filter algorithm (SDBFA)
We call the method where the node registers a DBF as well as its content ID, its address, and the hash value of the key a "saving divided bloom filter algorithm" (SDBFA). If SDBFA is used, the approximate minimum length of the filter satisfying the target FPR can be obtained even if different contents have different numbers of words.
^ That is, we set the filter size to the size of a variable-size BF whose FPR is FPR_target.
3.4 Saving and transmission divided bloom filter algorithm (STDBFA)
SDBFA can also adopt the method of using a DBF for transmitting content IDs. Doing so realizes the same amount of AndSearchData while decreasing the FPR. We call the algorithm that synthesizes SDBFA and the method of using a DBF for transmitting content IDs a "saving and transmission divided bloom filter algorithm" (STDBFA). The process of searching for multi-word text is shown in Table 1-STDBFA.
4 Experiment and Evaluation

Experiments with the "simple algorithm" (SA), "transmission fixed-size bloom filter algorithm" (TfBFA), "saving fixed-size bloom filter algorithm" (SfBFA), "saving divided bloom filter algorithm" (SDBFA), and "saving and transmission divided bloom filter algorithm" (STDBFA) were performed. SA takes no measures for reducing the amount of traffic, TfBFA was used in previous work, and SfBFA, SDBFA, and STDBFA are the new methods proposed in this paper. We measured the average amount of traffic for the five algorithms mentioned above. In addition, we compared the amount of AndSearchData needed for each algorithm.

4.1 Experimental setup
As we described in Section 1, the aim of the experiment is to limit the amount of traffic in searching for multi-word text (i.e., to 12.1% of SA; 64 MB out of 530 MB). We prepared 10,000 published papers as contents for the experiment. When we extracted the words of the contents, we used the vocabulary database WordNet [6] and extracted the nouns, verbs, and adjectives included in each content. The virtual user selected two words and searched for contents containing the two words. The general hash function SHA-1 [12] was used as the DHT hash function. Because SHA-1 returns a 160-bit value, the content ID has 160 bits. The amount of traffic generated by TfBFA and STDBFA, which use a fixed-size bloom filter or a DBF in transmitting content IDs, is calculated as follows. In Table 1, in the case of TfBFA and STDBFA, the total amount of traffic is the sum of the amount of traffic node N(W1) transmits to node N(W2) and the amount of traffic node N(W2) transmits to node N(W1). On the other hand, in the case of SA, SfBFA, and SDBFA, the only traffic is the amount node N(W1) transmits to node N(W2). In this experiment, the average amount of traffic over 1,000 trials with the simple algorithm was 2.97 KB.
Fig. 4. Average amount of traffic using TfBFA compared with that using SA

4.2 Experimental results
The searches were repeated 1,000 times. From here on, we call the FPR_target of filters created from content IDs "FPR_ids" and the FPR_target of filters created from words included in the contents "FPR_words". In the experiment on TfBFA, we set FPR_ids to 0.4, 0.2, 0.1, 0.01, and 0.001. Figure 4 shows the result. The amounts of traffic for these values of FPR_ids were, respectively, 0.26, 0.17, 0.16, 0.19, and 0.24 compared with that of SA. If FPR_ids is small, the filter that node N(W1) transmits to node N(W2) in Table 1-TfBFA becomes bigger. On the contrary, if FPR_ids is large, the number of content IDs that node N(W2) transmits to node N(W1) becomes larger. For SfBFA, we set FPR_words to 0.4, 0.2, 0.1, 0.01, and 0.001 (the extreme right points of Figure 5-Left and Figure 5-Right). Figure 5-Left shows the amount of traffic involved in searching for multi-word text, and Figure 5-Right shows the amount of AndSearchData in registering one content to the nodes. For SDBFA, FPR_words was set to the same values as in the experiments with SfBFA, and MN was set to 10, 20, 50, and 100 (Figure 5, except for each extreme right point). In Figure 5-Left, SDBFA (which uses a DBF) can be seen to have reduced the amount of traffic more than SfBFA (which uses a normal bloom filter). As shown in Figure 5-Right, the amount of AndSearchData with the method using a normal bloom filter is not so different from that with the method using a DBF. When FPR_words = 0.1, the goal of 12.1% traffic compared with SA was realized by using a DBF. Figure 5-Right also shows that the average amount of AndSearchData per content with STDBFA was the same as that with SDBFA.

For STDBFA, FPR_words was set to 0.1 and MN to 10 for registering contents' keys, and FPR_ids was set to 0.1 and MN to 2, 5, 10, 20, and 50 for the transmission of content IDs (Figure 6). In Figure 6, the condition MN = 20 can be seen to have reduced the amount of traffic the most.
[Figure 5 (left) plots the amount of traffic and Figure 5 (right) the amount of AndSearchData against MN (10, 20, 50, 100 for SDBFA; "NotDivided" for SfBFA), with one series per FPR_words value (0.4, 0.2, 0.1, 0.01, 0.001).]
Fig. 5. Left: Average amounts of traffic using SfBFA and SDBFA; Right: Amounts of AndSearchData using SfBFA, SDBFA, and STDBFA
Fig. 6. Amount of traffic using STDBFA

We also examined the effect of changing the number of contents from 1,000 to 10,000 (Figure 7). For TfBFA, the amount of traffic changed significantly. For SDBFA and STDBFA, however, the change in traffic was stably small. Furthermore, the amount of AndSearchData was less than that of SfBFA. Table 2 is a compilation of the results for all algorithms. The values are the averages over the change in the number of contents from 1,000 to 10,000. In the case of searching for multi-word text, the TfBFA used in previous research needed 23.7% of the traffic of SA. However, SfBFA (which uses the method of registering fixed-size bloom filters created from all words of the content) reduces the amount of traffic more than TfBFA. In addition, compared to SfBFA, SDBFA and STDBFA (which use the proposed DBF rather than a normal bloom filter) reduce both the amount of traffic and the amount of data storage.
[Figure 7 plots the amount of traffic (relative to SA) against the number of contents for TfBFA, SfBFA (FPR_words = 0.001), SDBFA (FPR_words = 0.01), and STDBFA (FPR_words = 0.01).]

Fig. 7. Amount of traffic with change in the number of contents

Amount of traffic compared with SA / amount of data storage per content [KB]:
- TfBFA: 0.237 / 0 (no AndSearchData)
- SfBFA: 0.144 / 1095
- SDBFA: 0.072 / 730
- STDBFA: 0.059 / 730
- Desired value: 0.121

Table 2. Comparison of the results for all algorithms
like movies or music. In this case, the keys for the DHT are the texts inserted in multimedia contents by languages that describe metadata (like MPEG-7 [3]). If embedding metadata into multimedia contents could be done automatically, contents would have much metadata. If a DHT system for such multimedia contents were constructed, the amount of traffic generated in searching for multi-word text would grow larger. However, we believe that our proposed method would also be able to reduce the amount of traffic in such a system. Some DHT algorithms that take mobility and wireless environments into account have been developed (e.g., M-CAN [14] and Warp [22]). Compared to traditional P2P, the characteristics of MP2P include unreliable connections, limited bandwidth, and the constraints of mobile devices. Hence, we believe that our proposed method is especially applicable to these DHTs. Note that in the experiments in this work, the virtual user queried random words. In future work, we should perform experiments by creating a user model from real DHT systems or from the histories of real search engines.
Furthermore, we only evaluated two-word searches. Three-word searches would be conducted as follows. Let the three words be "W1", "W2", and "W3", and let the node responsible for h(W1) be N(W1). With SDBFA and STDBFA, node N(W1) extracts only the content IDs that can include W3 as well as W2; therefore, in these cases, we predict that the amount of traffic would be decreased compared to that of two-word searching.
5 Conclusion

We aimed to reduce the amount of traffic for multi-word searches in DHTs. First, as a device for registering contents' keys, we used a bloom filter created from all words of the content. In this method, some amount of extra data storage for reducing the amount of traffic occurred. We proposed a divided bloom filter (DBF) so as to overcome the limitations of the bloom filter when several sets have different numbers of elements. We used the DBF to reduce the amount of extra data storage as well as the amount of traffic. Second, as a device for transmitting the content IDs, a method by which a node transmits not the content IDs themselves but DBFs of them was effective in reducing the amount of traffic. With the saving divided bloom filter algorithm (SDBFA) and the saving and transmission divided bloom filter algorithm (STDBFA) proposed in this paper, we were able to obtain favorable results for the amount of traffic in searching for multi-word text as well as for data storage.
References
1. Gnutella. http://gnutella.wego.com/.
2. Google. http://google.com/.
3. ISO/IEC TR 15938-8:2002: Information technology, multimedia content description interface. Part 8: Extraction and use of MPEG-7 descriptions. ISO/IEC JTC 1/SC 29, 2002.
4. Massachusetts Institute of Technology. http://mit.edu/.
5. Napster. http://www.napster.com/.
6. WordNet. http://wordnet.princeton.edu/.
7. World Wide Web Consortium: Resource Description Framework. http://www.w3.org/rdf.
8. Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970.
9. A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. In Proceedings of the 40th Annual Allerton Conference on Communication, Control, and Computing, pages 636-646, 2002.
10. Min Cai and Martin Frank. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In WWW '04: Proceedings of the 13th International Conference on World Wide Web, pages 650-657, New York, NY, USA, 2004. ACM Press.
11. Austin T. Clements, Dan R. K. Ports, and David R. Karger. Arpeggio: Metadata searching and content sharing with Chord.
12. D. Eastlake 3rd and P. Jones. US Secure Hash Algorithm 1 (SHA1). RFC 3174, September 2001.
13. J. Li, B. Loo, J. Hellerstein, F. Kaashoek, D. Karger, and R. Morris. On the feasibility of peer-to-peer web indexing and search, 2003.
14. Gang Peng, Shanping Li, Hairong Jin, and Tianchi Ma. M-CAN: a lookup protocol for mobile peer-to-peer environment. In ISPAN, pages 544-550, 2004.
15. Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Schenker. A scalable content-addressable network. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 161-172, August 2001.
16. Patrick Reynolds and Amin Vahdat. Efficient peer-to-peer keyword searching. In Middleware, pages 21-40, 2003.
17. Antony I. T. Rowstron and Peter Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Symposium on Operating Systems Principles, pages 188-201, 2001.
18. Michael A. Shepherd, William J. Phillips, and C.-K. Chu. A fixed-size bloom filter for searching textual documents. Comput. J., 32(3):212-219, 1989.
19. Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 ACM SIGCOMM Conference, pages 149-160, 2001.
20. Chunqiang Tang, Zhichen Xu, and Sandhya Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In SIGCOMM, pages 175-186, 2003.
21. Jiangong Zhang and Torsten Suel. Efficient query evaluation on large textual collections in a peer-to-peer environment. In Peer-to-Peer Computing, pages 225-233, 2005.
22. Ben Y. Zhao, Ling Huang, Anthony D. Joseph, and John Kubiatowicz. Rapid mobility via type indirection. In IPTPS, pages 64-74, 2004.
Exploring an Unknown Graph to Locate a Black Hole Using Tokens

Stefan Dobrev¹, Paola Flocchini¹, Rastislav Kralovic²,³*, and Nicola Santoro³

¹ SITE, University of Ottawa, {sdobrev,flocchin}@site.uottawa.ca
² Dept. of Computer Science, Comenius University, kralovic@dcs.fmph.uniba.sk
³ School of Computer Science, Carleton University, santoro@scs.carleton.ca
Abstract. Consider a team of (one or more) mobile agents operating in a graph G. Unaware of the graph topology and starting from the same node, the team must explore the graph. This problem, known as graph exploration, was initially formulated by Shannon in 1951, and has been extensively studied since under a variety of conditions. The existing investigations have all assumed that the network is safe for the agents, and the solutions presented in the literature succeed in their task only under this assumption. Recently, the exploration problem has been examined also when the network is unsafe. The danger examined is the presence in the network of a black hole, a node that disposes of any incoming agent without leaving any observable trace of this destruction. The goal is for at least one agent to survive and for all surviving agents to construct a map of the network, indicating the edges leading to the black hole. This variant of the problem is also known as black hole search. This problem has been investigated assuming powerful inter-agent communication mechanisms: whiteboards at all nodes. Indeed, in this model, the black hole search problem can be solved with a minimal team size and a polynomial number of moves. In this paper, we consider a less powerful token model. We constructively prove that the black hole search problem can also be solved in this model; furthermore, this can be done using a minimal team size and performing a polynomial number of moves. Our algorithm works even if the agents are asynchronous and if both the agents and the nodes are anonymous.
1 Introduction

1.1 The Problem

The problem of exploring an unknown graph using a team of one or more mobile agents (or robots) is a classical fundamental problem that has been extensively studied since its initial formulation in 1951 by Shannon [19]. It requires the agents, starting from the same node, to visit within finite time all

* Partially supported by grant VEGA 1/3106/06.

Please use the following format when citing this chapter: Dobrev, S., Flocchini, P., Kralovic, R., Santoro, N., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science (TCS 2006), eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), pp. 131-150.
the sites of a graph whose topology is unknown to them. Different instances of the problem exist depending on whether or not the agents are required to eventually stop the exploration and, if so, whether or not they must construct an accurate map of the network. Further differences exist depending on a variety of factors, including the (a)synchrony of the agents, the presence of distinct agent identifiers, the amount of memory, the coordination and communication tools available to the agents, etc. (e.g., see [1, 2, 3, 4, 6, 7, 13, 14, 15, 18]). Notice that, except for trees, the exploration with stop of anonymous graphs is possible only if the agents are allowed to mark the nodes in some way; various methods of marking nodes have been used by different authors, ranging from the weak model of tokens to the most powerful model of whiteboards. The solutions proposed in the literature succeed in their task only assuming that the network is safe for the agents. This assumption unfortunately does not always hold in real systems and networks; for example, a node could contain a local program (virus) that harms the visiting agents, or the network could contain failed nodes that might damage incoming agents. In fact, protecting an agent from "host attacks" (i.e., harmful network sites) has become a pressing security concern (e.g., see [17, 20]). Recently the exploration problem has been examined also when the network is unsafe [5, 8, 9, 10, 11, 16]. The danger considered is the presence in the network of a black hole (BH), a node that disposes of any incoming agent without leaving any observable trace of this destruction. Note that such a dangerous presence is not uncommon; in fact, any undetectable crash failure of a site in an asynchronous network transforms that site into a black hole. In spite of this severe danger, the goal is for the team of agents to be able to explore the network and, within finite time, discover the location of the BH.
More precisely, at least one agent must survive, and any surviving agent must have constructed a map of the network indicating the edges leading to the BH. This version of the exploration problem is called black hole search (BHS). It is known that, for its solution, the number of nodes of the network must be known to the agents [9]; furthermore, if the graph is unknown, at least Δ + 1 agents are needed, where Δ is the maximum node degree in the graph [10]. In the case of asynchronous agents in an unknown network, termination with an exact complete map in finite time is actually impossible; in fact, regardless of the protocol, a surviving agent upon termination can be wrong on Δ − deg(BH) links, where deg(x) denotes the degree of node x [10]. Hence, in the case of asynchronous agents, BHS requires termination by the surviving agents within finite time and creation of a map with just that level of accuracy. The problem of asynchronous agents exploring a dangerous graph has been investigated assuming powerful inter-agent communication mechanisms: whiteboards at all nodes. In the whiteboard model, each node has available a local storage area (the whiteboard) accessible in fair mutual exclusion to all incoming agents; upon gaining access, an agent can write messages on the whiteboard and can read all previously written messages. This mechanism can be used by the agents to communicate and to mark nodes and/or edges, and has been employed e.g. in [6, 8, 9, 10, 11, 13, 14]. In the whiteboard model, the black hole search problem can be solved with a minimal team size and performing a polynomial number of moves (e.g., [8, 9, 10, 11]).

The problem of exploring a dangerous graph has never been investigated in the less powerful token model, which is instead commonly employed in the exploration of safe graphs. In the classical token model, each agent has available a token that can be carried, placed in the center of a node, or removed from it. All tokens are identical (i.e., indistinguishable) and no other form of marking or communication is available. In our variation (the enhanced token model) we allow tokens to be placed also on a node in correspondence to a port. Notice that the classical token model can be implemented with 1-bit whiteboards, while our variation is not as weak; in fact, it could be implemented by having a (log d)-bit whiteboard on a node of degree d. The principal question targeted by our research is the impact of the communication model on the solvability and complexity of the BHS problem: to what extent can the whiteboard model be weakened while still allowing a polynomial solution of BHS? With this goal in mind, we examine the problem of performing black hole search in the enhanced token model. Several immediate computational and complexity questions naturally arise. In particular, are the weaker communication and marking capabilities provided by enhanced tokens sufficient to solve the problem? If so, how can the problem be solved, and at what cost? In this paper we provide definitive answers to these questions.

1.2 Our Results

In this paper we present an algorithm that works in the token model and solves the BHS problem with the minimal number of agents and with a polynomial number of moves. Our algorithm works even if the agents are asynchronous, and if both the agents and the nodes are anonymous.
More precisely, we consider an unknown, arbitrary, anonymous network and a team of exploring agents, all executing the same algorithm and starting from the same node (the home-base). The agents are anonymous and move from node to neighboring node asynchronously (i.e., it takes a finite but unpredictable time to traverse a link). Each agent has available an indistinguishable token (or pebble) that can be placed on, or removed from, a node; on a node, the token can be placed either in the center or on an incident link. In our algorithm there are never two tokens placed on the same location (node center or port), nor does an agent ever carry more than one token. Using only this tool for marking nodes and communicating information, we show that with Δ + 1 agents the exploration can be successfully completed. In fact, we present an algorithm that allows at least one agent to survive and, within finite time, the surviving agents to know the location of the black hole with the allowed level of accuracy. The number of moves performed by the agents when executing the proposed protocol is shown to be polynomial. The proposed algorithm is rather complex.
This work is the first that addresses the problem of exploration of a dangerous unknown graph using tokens. Our results indicate that, perhaps contrary to expectation, our variation of the token model is computationally as powerful as the whiteboard one with regards to black hole search.

topology           | communication | # of agents | # of moves
-------------------|---------------|-------------|------------
arbitrary, unknown | whiteboard    | Δ + 1       | Θ(N²)
arbitrary, known   | whiteboard    | Δ + 1       | Θ(N log N)
arbitrary, unknown | tokens        | Δ + 1       | O(Δ²M²N⁷)

Fig. 1. Existing and new results for the BHS problem.
1.3 Related Work

The research on safe exploration of unknown graphs was started in 1951 by Shannon [19]. Most of the work since has concentrated on exploration by a single agent (e.g., [2, 7, 18]). Safe explorations by multiple agents were initially studied for a team of Turing machines; more recently, the investigations have focused on collaborative exploration. An exploration algorithm for directed graphs that employs two agents was given in [3], whereas algorithms for exploration by more agents were given by Frederickson et al. for arbitrary graphs [15], by Averbakh and Berman for weighted trees [1], and more recently by Fraigniaud et al. for trees [13]. To explore arbitrary anonymous graphs, various methods of marking nodes have been used by different authors. Bender et al. [2] proposed the method of dropping a token on a node to mark it, and showed that any strongly connected directed graph can be explored using just one token if the size of the graph is known, and using Θ(log log N) tokens otherwise. Dudek et al. [12] used a set of distinct markers to explore unlabeled undirected graphs. Yet another approach, used by Bender and Slonim [3], was to employ two cooperating agents, one of which would stand on a node, thus marking it, while the other explores new edges. In Fraigniaud et al. [13, 14], marking is achieved by accessing whiteboards located at nodes, and their strategy explores directed graphs and trees.

The explorations of unsafe graphs are quite recent and have focused mostly on asynchronous environments. The BHS problem has been studied when the network is an anonymous ring, characterizing the limits and determining optimal solutions [9]. When the network is an arbitrary graph, the problem has been investigated in [10], and several tight bounds have been established, depending on the level of topological knowledge available to the agents. For example, when the network is arbitrary, the topology unknown, and no form of consistent edge labelings is present, Δ + 1 agents are necessary and Ω(N²) moves are required in the worst case. Improved bounds on the number of moves have later been obtained in the case where the agents have a complete map of the network (but
not the location of the BH) [11]. In the case of specific graphs, including many important interconnection networks, the number of moves can be reduced to linear [8]. In all these investigations, the nodes of the network have available a whiteboard, i.e., a local storage area that the agents can use to communicate information. Access to the whiteboard is gained in mutual exclusion, and the capacity of the whiteboard is always assumed to be at least Ω(log N) bits. In synchronous environments, the investigations have produced optimal solutions for trees [5]; approximation results have been obtained for arbitrary graphs in [5, 16].
2 The Model

The network G = (V, E) is a simple undirected graph with node-connectivity two or higher; let N = |V| and M = |E| be the number of nodes and of edges of G, respectively, let d(x) denote the degree of x, and let Δ denote the maximum degree in G. If (x, y) ∈ E then x and y are said to be neighbors. The nodes of G are anonymous (i.e., without unique names). At each node x there is a distinct label (called port number) associated to each of its incident links (or ports). Without loss of generality, we assume that the labels at x ∈ V are the consecutive integers #1, #2, ..., #d(x). Operating in G is a team of Δ + 1 anonymous agents. The agents know the number of nodes of the network, can move from a node to a neighboring node in G, and have computing capabilities and a limited amount of memory (O(M log N) bits suffice for our algorithm). We also assume that the agents know Δ, the maximum degree of G. Each agent has a token that can be placed on a node and removed from it; tokens are identical, and their placement can be used to mark nodes and ports/links. More precisely, a node can be marked by a token in different modalities: in the center, or in correspondence of one of its incident ports. The agents obey the same set of behavioral rules (the "algorithm") and, initially, they are all located at the same node h, called the home-base. The agents can be seen as automata, where one computational step of an agent A in a node v is defined as follows. Based on the state (local memory) of A and on the presence of tokens at v and its incident links (examined atomically), the agent:

- changes its state (the local memory of A),
- removes (or places) at most one token from v or an incident link, and
- starts waiting (for a token to disappear) or leaves v via one of the incident links.

The computational steps are atomic and mutually exclusive, i.e., no more than one agent computes in the same node at the same time. The links satisfy the FIFO property, i.e.
the agents entering a link e = (u, v) at u will arrive at v and execute their computational steps in the same order in which they entered e. The agents are asynchronous in the sense that waiting (for a token to disappear) and traversing a link can take an unpredictable (but finite) amount of time.
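As a side remark on the power of the model, the marking state of a single node can be enumerated directly. The sketch below (a hypothetical encoding, not part of any protocol in the paper) confirms the earlier comparison with (log d)-bit whiteboards: a node of degree d has d + 2 distinguishable marking states.

```python
import math

def encode(state, d):
    """state is None (unmarked), 'center', or a port number 1..d."""
    if state is None:
        return 0
    if state == 'center':
        return 1
    assert 1 <= state <= d
    return 1 + state              # ports occupy codes 2 .. d + 1

def decode(code, d):
    if code == 0:
        return None
    if code == 1:
        return 'center'
    return code - 1

d = 5
bits = math.ceil(math.log2(d + 2))    # whiteboard bits needed per node
for s in [None, 'center', 1, 3, d]:
    assert decode(encode(s, d), d) == s
print(bits)  # 3
```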
The network contains a black hole (BH) that destroys any incoming agent without leaving any trace of that destruction. The goal of a black hole search algorithm is to identify the location of the BH; that is, within finite time, at least one agent must terminate, and all the surviving agents must construct a map of the entire graph in which the home-base, the current position of the agent, and the location of the black hole are indicated. Note that termination with an exact map in finite time is actually impossible. In fact, since an agent is destroyed upon arriving at the BH, no surviving agent can discover the port numbers of the black hole; hence, the map will have to miss this information. More importantly, the agents are asynchronous and do not know the actual degree d(BH) of the black hole (just that it is at most Δ). Hence, if an agent has a local map that contains N − 1 vertices and at most Δ unexplored edges, it cannot distinguish between the case when all unexplored ports lead to the black hole and the case when some of them are connected to each other; this ambiguity cannot be resolved in finite time without the agents being destroyed. In other words, if we require termination within finite time, an agent might incorrectly label some links as incident to the BH; however, the agents need to be wrong on at most Δ − d(BH) links. Hence, we require from a solution algorithm termination by the surviving agents within finite time and creation of a map with just that level of accuracy. The complexity measures of a solution protocol are: the number of agents used, called the size of the team, and the total number of moves performed by the agents during the execution, called the cost.
3 The Solution

3.1 Overview

In our algorithm, each agent constructs its own local map (quasi-)independently from the other agents until it either enters the BH or explores at least N − 1 vertices and M − Δ edges. In the beginning, the local map of each agent contains only the home-base. During the computation, the communication ports in the graph are classified by each agent as follows:

- unexplored port/edge: not in the local map; the port is not marked by a token
- dangerous port: not in the local map; the port is marked by a token
- safe edge: in the local map; connecting two already explored vertices
- quasi-safe edge: in the local map; connecting two already explored vertices, but could be wrong

Throughout the execution, whenever an agent leaves via a port that might lead to the BH, it leaves its token there, marking the port as dangerous. The
algorithm requires that no agent enters a dangerous port, ensuring in this way that at most Δ agents enter the black hole. We will thus say that a dangerous port blocks the (other) agents. Initially, all ports incident to the home-base are unexplored. The local map of an agent is constructed by adding edges in a sequential manner according to Algorithm 1.

Algorithm 1 The main loop
1: loop
2:   traverse the local map and look for an unexplored port p
3:   if unexplored port p found then
4:     EXPLORE(p)
5:     continue the main loop
6:   else
7:     if local map contains N − 1 vertices and there are at most Δ outgoing edges then
8:       TERMINATE
9:     else
10:      SUSPEND
11:    end if
12:  end if
13: end loop

The searching for an unexplored port is straightforward: any traversal of the explored part using only the edges identified as safe in the local map will do. In the execution of EXPLORE(p), the agent explores the edge incident to port p, determines whether it leads to a new node or to an already discovered one (p might lead to the BH as well, in which case the agent disappears there and does not continue the algorithm), and updates the local map. Due to the complex interaction of anonymity with asynchrony, in some cases the agent might be unsure whether an edge leads to a new node or to an already visited one. However, the agent is able to recognize this uncertainty, and will add such an edge to the local map as quasi-safe instead of safe. Eventually, no unexplored port is found. If N − 1 nodes have been visited, the remaining node is the BH and the algorithm can terminate. Otherwise, the access to the unexplored part of the graph is blocked by dangerous ports. Since G is two-connected, at least one of those ports does not lead to the BH, and the token will eventually be removed from it, making it unexplored again. In order to avoid live-lock, the agent that failed to find an unexplored port suspends itself using procedure SUSPEND until such progress has been made. The basic idea of SUSPEND is to go to the home-base, set a flag there (by using a token) indicating that an agent is waiting for wake-up, verify that no progress has been made before the flag was set up, and then wait to be woken
up. Complementarily, whenever an agent removes its token from an edge, it goes to the home-base and wakes up the agents waiting there (using procedure WAKE-UP). There are several technical issues to be dealt with (discussed in the detailed description): for example, several agents might be executing SUSPEND and WAKE-UP simultaneously, the flag can only be implemented using tokens, and there may be interference with the rest of the algorithm.

3.2 Detailed Description

In this section we give the full description of the algorithm. The following three rules clarify some terms used in the description:

R1 "cautious step" in a vertex v over a link l = put a token on link l, traverse the link, return to v, take the token, perform WAKE-UP, return to v and traverse l
R2 "put token in the home-base" = wait for all known safe links incident to the home-base to become unmarked, then put the token
R3 "put token on a link" (in vertex v) = wait for v to become empty, then put the token

The nodes at the other ends of links #1 and #2 from the home-base are called storerooms (SRs) and they play a special role in the algorithm (as we will see, they will be employed to allow communication among the agents while they are temporarily suspended looking for a new port to explore). Each agent starts the algorithm by exploring (using cautious steps) SR1 and SR2 from the home-base (in this order). Since the graph is simple, at least one of them is safe; if both of these links are dangerous, the agent will simply wait until one of the blocking tokens disappears. Eventually, each agent will know about one or two safe storerooms. The primary storeroom of an agent is defined as the storeroom known to be safe with the lower-numbered link leading to it. Note that if the BH is located in one of the storerooms, all surviving agents will choose the other SR as their primary SR.
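Rule R1 is the safety core of the algorithm: the token left on the outgoing port survives exactly when the agent does not. A toy simulation (all class and function names hypothetical; the WAKE-UP call of R1 is elided) illustrates this invariant:

```python
class Link:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def other_end(self, v):
        return self.b if v is self.a else self.a

class Node:
    def __init__(self, name):
        self.name = name
        self.port_tokens = set()   # links currently marked dangerous here

class Agent:
    def __init__(self):
        self.alive = True

def cautious_step(agent, v, link, black_hole):
    """Rule R1: mark the port, probe, and unmark only on safe return."""
    v.port_tokens.add(link)        # port becomes dangerous for others
    far = link.other_end(v)
    if far is black_hole:
        agent.alive = False        # the agent is destroyed...
        return None                # ...but its token stays as a warning
    v.port_tokens.discard(link)    # safe: retrieve token (WAKE-UP omitted)
    return far

u, w, bh = Node("u"), Node("w"), Node("BH")
safe, bad = Link(u, w), Link(u, bh)
a1, a2 = Agent(), Agent()
assert cautious_step(a1, u, safe, bh) is w and a1.alive
assert not u.port_tokens                       # mark cleared on safe return
assert cautious_step(a2, u, bad, bh) is None and not a2.alive
assert bad in u.port_tokens                    # dangerous mark persists
```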
However, if none of the SRs contains the BH, there might be agents with different primary SRs (some might find SR1 safe and choose it, some might find it temporarily dangerous and select SR2).
As this might lead to problems, the algorithm tries to remedy this situation by "updating" the primary storeroom of agents that had originally selected SR2 and later discover that SR1 has become safe in the meanwhile. The update rule is called R4 and will be described later.
Explore

The execution by agent A of procedure EXPLORE(p) enables A to traverse an unexplored edge e = (u, v) (starting at port p in u) and add it (possibly with the vertex v) to the local map. Agent A starts by executing a cautious step over the edge e and, if it survives, it proceeds to determine whether or not e leads to a new (not in the local map) vertex.
Notice that recognizing whether v is already in the local map would be an easy task if either the agents were able to recognize their own tokens, or they were able to recognize the home-base. In fact, if agents were able to recognize their tokens, then A could simply put its token at v and scan the explored subgraph: if it finds its token, v is already explored; otherwise it is a new node. If the agents were able to recognize the home-base, then A could determine whether v is a new node as follows. For each node w in the local map, A guesses that v = w and verifies whether that is really true: let σ be a sequence of port labels specifying a safe path (determined by looking in the local map) from w to the home-base. Starting from v, A follows² the port labels specified by σ. If A finishes in the home-base, then v = w; otherwise A makes another guess. If all guesses fail, v is a new node.

However, in our model the agents can recognize neither their tokens nor the home-base. Still, the basic structure is to guess, for all already explored nodes w, whether v = w and to verify the guess, although the verification is much more involved. Let β_w (we will use β when w is clear from the context) be a sequence of port labels starting with the label of the port from u to v and then following a path (using only edges marked as safe in the agent's map) from w through the primary SR and ending in u. Clearly, if v = w then β specifies a simple cycle in the graph (and therefore |β| ≤ N, even if actually v ≠ w). Agent A verifies whether v = w by following the labels specified by a cyclic repetition of β (we will call it β*) for up to N² edges, or until A finds a difference between what it sees in the current node and what it should see (according to its map) if v = w. The number of steps is chosen large enough so that following β* creates a cycle even if v ≠ w (as we will see later, using only β is not enough).
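The guess-verification walk can be sketched on a toy port-labelled graph. In this simplified rendering (hypothetical data layout; the real agent also checks degrees and markings against its map, and walks cautiously), a guess fails as soon as a required port label is missing, while N² failure-free steps certify a cycle:

```python
# Follow the port-label sequence beta cyclically for N*N steps on a
# graph given as {node: {port_label: neighbor}}. Returning None models
# "discrepancy found": the guess v = w is wrong.
def follow_beta_star(graph, start, beta, n):
    v = start
    for step in range(n * n):            # N^2 steps are enough to force a cycle
        label = beta[step % len(beta)]
        if label not in graph[v]:
            return None                  # discrepancy: abandon this guess
        v = graph[v][label]
    return v

# A 4-cycle with ports labelled 1 (clockwise) and 2 (counterclockwise):
ring = {i: {1: (i + 1) % 4, 2: (i - 1) % 4} for i in range(4)}
print(follow_beta_star(ring, 0, [1], 4))          # 0: back where we started
print(follow_beta_star({0: {1: 0}}, 0, [2], 1))   # None: label 2 never exists
```

The walk is bounded by N² because its future is determined by the pair (current node, position within β), and there are at most N·|β| ≤ N² such pairs; this is the sense in which the step count is "large enough" to create a cycle.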
This means (as will be proven later) that if no discrepancy has been found for N² steps, u and v indeed lie on a cycle C passing through the correct SR, with the labels specified by β*. Unfortunately, it is still possible that, although no discrepancy is found, v ≠ w: this could happen if |C| is a multiple of |β|. In this case the agent verifies whether v = w or not in the procedure VERIFY, which will be described later.

The N² steps along β* must be done in a cautious manner, not entering dangerous ports, since it may be the case that v ≠ w and β* leads to the BH. The cautious walk is complicated by the fact that a port to be taken (let its label be λ) from a node w' might be dangerous. If this happens, the agent cannot afford to wait in w' until the token is removed, because this edge might indeed lead to the BH. Instead, it wants to ensure that, if v = w, then the token will be removed, allowing A to continue its cautious walk through λ. To do so, A goes backwards for |β| steps, reaching a safe node through safe links; this node might indeed be w' (this happens if the guess v = w is correct), or it could be a different node w''. Agent A waits here until there is no token on the port labelled λ. Although not sure about the identity of the node, the agent knows that λ must lead to a safe node (A is now revisiting nodes it has visited earlier), thus the token will eventually be removed from there. After ensuring the removal of the token, agent A returns to w'. It can happen that the port λ is still dangerous. However, if v = w then this must be a newly placed token. Since (as we will see later) during the whole execution of the algorithm a token is placed on a given port less than 2ΔMN² times, if after 2ΔMN² cleaning tries A is still blocked, then v ≠ w. Algorithm 2 describes the procedure EXPLORE in full detail.

² A cautious walk needs to be used, as v might be different from w, and σ from v might lead to the BH.
Algorithm 2 Exploring an edge with label l1 by EXPLORE
1: do a cautious step over link l1; let l2 := label of the link upon which you arrived
2: for all w in local map do
3:   compute the sequence β
4:   for N² steps do
5:     while next port γ in β* is dangerous and this loop has been executed less than 2ΔMN² times do
6:       go back |β| steps
7:       wait until there is no token on the edge along which you arrived
8:       go forwards |β| steps
9:     end while
10:    if port γ is still dangerous then
11:      backtrack your steps to v and continue the outermost for cycle for the next w
12:    end if
13:    do a cautious step
14:    if what you see in the vertex you arrived at is not compatible with the local map assuming v = w then
15:      backtrack your steps to v and continue the outermost for cycle for the next w
16:    end if
17:  end for
18:  if VERIFY then  // after traversing N² edges there was no discrepancy, so I am in a cycle. Is it a short one?
19:    add edge to w to the local map as quasi-safe
20:    exit from EXPLORE
21:  end if
22: end for
23: add to the local map the new vertex and edge; the added edge is marked as safe
Notice that, during the actual exploration, tokens are placed in correspondence to links only. Thus, a token found on a link is a clear sign of danger. As we will soon discover, both in the verification process (described below) and in the suspension process (described later), tokens are instead placed in (and removed from) the home-base and the storerooms. In other words, the home-base and the storerooms are employed to accomplish different tasks, and this requires much care to avoid ambiguity and interference between different activities.
Verification

The test of a candidate vertex w in the procedure EXPLORE may end, after traversing the sequence β* for N² steps, in a situation where the agent knows that either β or a multiple of it forms a safe cycle connecting u and v. The procedure VERIFY is used to verify whether the cycle consists of just one repetition of β (in which case v = w).
Algorithm 3 VERIFY - let p be the SR, if the hypothesis about w is true
1: PosCount := 0; NegCount := 0
2: loop
3:   go to home-base, wait until it becomes empty, and go to the primary SR
4:   if the SR is empty then
5:     put token and exit loop
6:   else
7:     wait until the SR becomes empty
8:   end if
9: end loop
10: while PosCount < 2ΔMN² + ΔMN and NegCount < 2ΔMN² + ΔMN do
11:   if known, go to the other SR and wait until it becomes empty
12:   go to the home-base, wait until it becomes empty
13:   go to p
14:   if there is a token then
15:     PosCount := PosCount + 1
16:   else
17:     NegCount := NegCount + 1
18:   end if
19:   go to the primary SR; if it is empty, update the knowledge of storerooms using rule R4 and restart the algorithm
20: end while
21: take token
22: if PosCount ≥ 2ΔMN² + ΔMN then
23:   return TRUE
24: else
25:   return FALSE
26: end if
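The counting in Algorithm 3 is a threshold argument: since each kind of deception (a foreign token seen at w', or the agent's own token removed in the meanwhile) occurs fewer than K = 2ΔMN² + ΔMN times, whichever counter reaches K reflects the truth. A stripped-down sketch of just this decision (hypothetical names; K is tiny here for illustration):

```python
# Decide whether w' is the storeroom from a stream of boolean
# observations ("token seen at w'"), given that fewer than K of
# them can be deceptive in either direction.
def verify(observations, K):
    pos = neg = 0
    for seen in observations:
        if seen:
            pos += 1
        else:
            neg += 1
        if pos >= K:
            return True       # w' really is the storeroom
        if neg >= K:
            return False      # w' is some other node
    raise RuntimeError("not enough observations")

K = 10
# w' is the SR, but up to K - 1 observations miss the token:
obs = [False] * (K - 1) + [True] * K
print(verify(obs, K))  # True
```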
The idea of VERIFY is to use a token in the primary SR for breaking symmetry on the β*-cycle. An agent A performing a VERIFY first makes sure that it is not interfering with any other agent, by waiting until both the home-base and the SRs it knows to be safe are empty. It then puts its token in the primary SR and walks³ along the β*-cycle for |β| steps to a vertex w', and checks whether there is a token in w'. The idea is that if v = w then w' is the SR and contains the token; if v ≠ w then w' should be empty, as it is not the correct SR.

³ Note that it is not needed to use cautious steps, as the cycle identified by β* has already been traversed and is known to be safe.
142
S. Dobrev et al.
Notice that a straightforward check on whether there is a token in w' can fail for two reasons. (1) It may happen that w' is not a SR but, say, the home-base. As mentioned above, the home-base is also used by the procedures SUSPEND and WAKE-UP, which are employed when an agent has not found a suitable port to explore and is waiting for one to become available. If some other agent has started to perform a SUSPEND (which requires putting a token in the home-base) while A traveled to w', A is deceived, since it finds a token in w' that is not the token it left in the SR. (2) It may happen that w' is indeed a SR but some other agent took the token from the SR in the meanwhile (when finishing SUSPEND); again A is deceived, because it does not find its own token. Luckily, as will be shown later, each of these two cases occurs less than K times, where K = 2ΔMN^2 + ΔMN. Hence, if A saw a token in w' at least K times, then w' must be the SR; conversely, if A saw no token in w' at least K times, then w' is not the SR. One last complication comes from the fact that, at the beginning of each iteration of the while cycle, A has to make sure that the home-base and the SRs are empty. The problem is that agents cannot always agree on one primary SR. In fact (if the BH is not in a SR), there are three types of agents: some think that only SR1 is safe, others think that only SR2 is safe, while the third group knows that both SRs are safe. However, if an agent does not know that both SRs are safe, it cannot make sure that both of them are empty. In this case it may happen that the result of VERIFY is wrong. This is the reason why, when A decides that v = w, it marks the edge (u, v) as quasi-safe and never uses it for traversals. Note that if EXPLORE declares w to be a new vertex it never errs, so the spanning tree defined by the safe edges is always available for traversal.
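The symmetry-breaking role of the token can be illustrated with a small simulation (a sketch of ours, not the paper's formalism: the cycle is modeled as the integers modulo its length, and `token_test` is a hypothetical name): the agent drops a token, walks |β| steps along a cycle whose length is an unknown multiple m·|β| of |β|, and checks for the token. The token is found exactly when m = 1, i.e., when v = w.

```python
def token_test(beta_len, m):
    """Model the beta*-cycle as the integers mod m*beta_len.

    The agent drops a token at position 0, walks beta_len steps,
    and reports whether it sees the token again.  On a cycle of
    length m*beta_len this succeeds exactly when m == 1.
    """
    cycle_len = m * beta_len
    token_at = 0
    position = beta_len % cycle_len   # position after |beta| steps
    return position == token_at

# v = w: the cycle is a single copy of beta, so the token is found
assert token_test(beta_len=4, m=1) is True
# v != w: the cycle is a proper multiple of beta, so no token is seen
assert all(token_test(4, m) is False for m in range(2, 6))
```

The two deception cases discussed above correspond to this toy model failing when other agents add or remove tokens concurrently, which is exactly what the counting threshold K guards against.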
As we prove later, the only way for an agent A to find an empty SR on line 19 is if A does not know about (safe) SR1. This means that after seeing an empty SR, A can update its knowledge about the storerooms and reset the algorithm according to rule R4.
R4: When an agent first realizes that both SRs are safe, it performs the following actions:
- If you have no token and your old primary SR is SR2, execute GRAB-TOKEN starting from SR2; otherwise execute GRAB-TOKEN from the home-base.
- Update the knowledge about the SRs.
- If you came to the home-base to perform WAKE-UP but have not done so, do it now.
- Restart the whole algorithm.
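Rule R4's case analysis can be transcribed as follows (a sketch of ours; the record fields and the `apply_r4` name are hypothetical, and the procedure names merely stand in for the paper's GRAB-TOKEN and WAKE-UP):

```python
def apply_r4(agent):
    """Rule R4: executed when an agent first learns both SRs are safe.

    `agent` is a hypothetical record: has_token, primary_sr,
    pending_wake_up, and a log of actions standing in for the
    paper's procedures.
    """
    if not agent["has_token"] and agent["primary_sr"] == "SR2":
        agent["actions"].append("GRAB-TOKEN from SR2")
    else:
        agent["actions"].append("GRAB-TOKEN from home-base")
    agent["knows_both_srs_safe"] = True      # update SR knowledge
    if agent["pending_wake_up"]:             # deferred WAKE-UP, if any
        agent["actions"].append("WAKE-UP")
        agent["pending_wake_up"] = False
    agent["actions"].append("restart algorithm")
    return agent

a = apply_r4({"has_token": False, "primary_sr": "SR2",
              "pending_wake_up": False, "actions": []})
assert a["actions"][0] == "GRAB-TOKEN from SR2"
assert a["actions"][-1] == "restart algorithm"
```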
Grab-Token. The procedure GRAB-TOKEN is used by an agent A to pick up a token that it has previously put at the home-base or a SR. It might happen that some other agent B has meanwhile picked up A's token instead of its own. However, in such a case B's token must be somewhere around (in the home-base or in a
SR) and A will take it (or the token of yet another agent).
Algorithm 4 GRAB-TOKEN (starts in home-base)
1: if there is a token in the home-base, get it and exit GRAB-TOKEN
2: go to the primary SR; if there is a token there, get it and exit GRAB-TOKEN
3: go to the home-base; if there is a token there, get it and exit GRAB-TOKEN
4: go to the other SR and get the token
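A direct transcription of Algorithm 4 (a sketch of ours; the dictionary model of token placement and the function name are assumptions, not the paper's): the agent probes the home-base, the primary SR, the home-base again, and finally the other SR, taking the first token it finds. Lemma 1 below shows that under the algorithm's invariants the search never comes up empty.

```python
def grab_token(tokens, primary_sr, other_sr):
    """tokens maps a location name to the number of tokens lying there.

    Probes follow Algorithm 4: home-base, primary SR, home-base again,
    then the other SR.  Returns the location the token was taken from,
    or None if every probe failed (which Lemma 1 rules out during an
    actual run of the algorithm).
    """
    for place in ("home-base", primary_sr, "home-base", other_sr):
        if tokens.get(place, 0) > 0:
            tokens[place] -= 1
            return place
    return None

tokens = {"home-base": 0, "SR1": 1, "SR2": 0}
assert grab_token(tokens, primary_sr="SR1", other_sr="SR2") == "SR1"
assert tokens["SR1"] == 0
```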
Suspend & Wake-Up. Recall that an agent A performs SUSPEND when further exploration progress is blocked by dangerous links, but A knows that eventually at least one of those links will become unblocked. The basic idea is to put the token in the home-base to signal "I want to be woken up", check whether progress has been made before the token was put down (to prevent deadlock, as an agent performing WAKE-UP after removing its token from a dangerous edge might have arrived at the home-base before the token was put there) and, if not, then wait until the token disappears. An agent performing WAKE-UP simply moves a token from the home-base (if there is any) to its primary SR. The problems arise because several agents might be executing SUSPEND, WAKE-UP and VERIFY simultaneously, and because the agents do not necessarily agree on the correct SR. Dealing with that constitutes the most technical part of the algorithm. The basic idea is to wait until any ongoing activity (detected by a non-empty home-base or SR) appears to have finished and then restart SUSPEND. Still, there are many possible ways in which the agents can steal each other's tokens and/or misinterpret what is going on. The reasons behind the design of SUSPEND and WAKE-UP will become fully apparent only when reading the formal proofs in the next section. The idea of WAKE-UP is to wake up an agent suspended at the home-base by moving its token to a SR. In order to make GRAB-TOKEN work, the waking-up agent first places its token in the SR and then removes the token from the home-base. If the home-base is empty or the SR is full, WAKE-UP does nothing, because either there is nobody suspended, or the suspended agent has already been woken up and just has to pick up its token. When an agent suspended at the home-base sees that its token has disappeared, it will search around and find its token (using GRAB-TOKEN).
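The token handshake between SUSPEND and WAKE-UP can be sketched as follows (a toy model of ours with hypothetical function names; real interleavings of several concurrent agents are far subtler, which is what the proofs below address):

```python
def suspend(tokens):
    """The suspending agent parks its token in the home-base."""
    tokens["home-base"] += 1

def wake_up(tokens, sr):
    """The waking agent moves a parked token to the SR: it first puts
    its own token in the SR, then removes the one in the home-base,
    so that GRAB-TOKEN always has something to find.  If nobody is
    suspended, or the SR is already full, it does nothing."""
    if tokens["home-base"] == 0 or tokens[sr] > 0:
        return
    tokens[sr] += 1
    tokens["home-base"] -= 1

tokens = {"home-base": 0, "SR1": 0}
suspend(tokens)
wake_up(tokens, "SR1")
assert tokens == {"home-base": 0, "SR1": 1}   # token moved to the SR
```

Note that the two-step order in `wake_up` (put, then remove) mirrors the text: it preserves the invariant of Claim 2 that every empty-handed agent has a token waiting somewhere.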
4 Correctness and Complexity
Let us call an agent informed if its knowledge about which storerooms are safe is correct. If the BH is located in one of the storerooms, all agents (that have finished initialization) are informed; otherwise an informed agent knows that
Algorithm 5 SUSPEND
 1: go to home-base, wait until it is empty and put a token there
 2: scan all known SRs and return to home-base
 3: if SRs were empty then
 4:   traverse the local map
 5: else
 6:   if there is a token in home-base then
 7:     get token
 8:     go to the SR that contained a token, wait until it becomes empty and restart SUSPEND
 9:   else  // my token has been moved
10:     GRAB-TOKEN
11:     restart SUSPEND
12:   end if
13: end if
upon return from traversal:
 1: if traversal revealed progress then
 2:   GRAB-TOKEN
 3: else
 4:   wait until home-base becomes empty
 5:   GRAB-TOKEN
 6: end if

Algorithm 6 WAKE-UP
1: go to home-base and if empty, abort
2: go to "correct" SR
3: if SR full then
4:   abort
5: else
6:   put token
7:   go to home-base
8:   GRAB-TOKEN
9: end if

both storerooms are safe. However, the notion of an informed agent is for the purpose of the proof only. The agents themselves may not know whether they are informed or not. The overall structure of the correctness proof, which is quite complicated, is the following: we first prove that during the whole algorithm at most Δ agents enter the BH, and that all agents that are alive make progress by eventually exploring a new edge. Second, we prove that all informed agents maintain a correct local map, i.e., the local map of an informed agent is at any time isomorphic to some subgraph of the network (including port labels). The above arguments are formally carried out through a sequence of Claims and Lemmas, which lead to the main Theorem:
Theorem 1 (Main Theorem). At least one agent successfully terminates with a correct map.
Due to lack of space, we present only the key lemmas, omit some proofs, and only informally sketch some reasonings. Let us start with some basic observations. Since a token is put in a vertex only in SUSPEND, WAKE-UP or VERIFY, we get:
Claim 1. A token is in the vertex v only if v is a home-base or a SR.
The most technical part of the algorithm is the implementation of the communication between agents by means of tokens. We are specifically interested in agents who have put their token in the home-base or in the SR and are now without a token; we call them empty-handed to distinguish them from agents who do not have a token because they are performing a cautious step. From the definition of a cautious step, from Claim 1, and by construction we get:
Claim 2. There are as many empty-handed agents as tokens in the home-base and storerooms.
An agent performing procedure GRAB-TOKEN visits the home-base and possibly some SRs a constant number of times in a search for a token. For the correctness of the algorithm it is important to prove that a token is always found.
Lemma 1. An agent always gets a token in procedure
GRAB-TOKEN.
Proof. Consider, for the sake of contradiction, an agent A executing GRAB-TOKEN that has not found a token. Let t0 be the time when A sees that its primary SR x is empty and starts to travel back to the home-base. Let t1 > t0 be the time when A arrives at the home-base, finds it empty again, and starts to travel to SR y. By Claim 2, at time t0 there must be at least one token T in the home-base or in SR y. However, since A does not find T, T must have disappeared after t0, before A gets there. The only way for T to disappear is if it is taken by some empty-handed agent B. However, since B is empty-handed, there must be another token T' in some vertex (home-base or SR) at the time when B grabs T. The idea is to argue about T and T' and show that A would find one of them. In particular, we first prove that at some point in time after t0 both the home-base and SR y are full, and then prove that from this fact it follows that A finds a token. Let us focus on the time t' when B put down T' and thus became empty-handed. We distinguish three cases. First, consider t' > t1. B could not have removed T from the home-base before time t1, therefore at time t1 (and at t' as well, as it is B that removes it) T must be in SR y. Since A started traveling from the home-base to SR y at time t1 < t' and due to the FIFO property, B cannot get to SR y before A, and so A finds T in SR y - a contradiction.
Next, let t' < t0. This means that at time t0 both A and B are empty-handed and, moreover, SR x is empty. Hence, due to Claim 2, at time t0 both the home-base and SR y are full. Third, let t0 < t' < t1. There are two possibilities: (1) B (at time t') put T' in SR x. Since t0 < t', due to the FIFO property B cannot take T from the home-base before A does - a contradiction. (2) B (at time t') put T' somewhere else (home-base or SR y). In such a case, at time t' both the home-base and SR y are full, containing T and T': by assumption, B is the agent that takes T, therefore T did not move between t0 and t'. Hence, it must be the case that the home-base and SR y are full at some time t between t0 and t1. Since we suppose that A does not find a token, it must be that both tokens in the home-base and SR y disappear at some time after t. However, at time t0, SR x is empty, so at that time at most one agent other than A is empty-handed. Any agent that becomes empty-handed by putting a token in SR x after t0 cannot, due to FIFO, prevent A from grabbing a token. This means that only one of the tokens in the home-base and SR y can disappear after time t and before A arrives there, i.e., A will find a token - a contradiction.
Lemma 2. A token is removed from a given link less than 2ΔMN^2
times.
Proof. A token is put on and removed from a link only during a cautious step. Cautious steps are performed only on line 13 of EXPLORE, which is executed less than N^2 times (at most N iterations of the inner loop, for at most N - 1 candidate vertices). EXPLORE is called by each of the Δ agents at most M times. Finally, each agent might reset the algorithm once, applying rule R4.
We now aim at proving that at most Δ agents disappear in the BH. In order to do so, we need to show first that an agent can enter a BH only during a cautious step, i.e., that edges marked safe in the local map of an agent correspond to safe edges in the network. To do so, we use the following technical lemmas, whose proofs are omitted due to lack of space.
Lemma 3. Consider a situation where both SRs are full and agents A and B are the only empty-handed agents. Then, before A or B grabs a token from the home-base or some SR, no agent other than A or B grabs a token from a SR.
Lemma 4. No token placed in SR1 will be stolen. Moreover, let A be an agent knowing that SR1 is safe that puts a token in the home-base. Then A's token will not be kicked out to SR2.
Lemma 5. A token put in a SR x by an informed agent A executing VERIFY can be removed from x only by A.
We can now prove the following:
Lemma 6. When an agent A adds a vertex v to its local map as a new vertex, then the local map indeed did not contain v.
Proof. A vertex v is added as new only if the test w = v in EXPLORE failed for every candidate w. We show that if the test fails then indeed w ≠ v. The test for a given w can fail:
- By having the port γ still dangerous after executing the loop on lines 5..9 for 2ΔMN^2 times. However, if w = v, then between each iteration of that loop the port γ is cleared, which is a contradiction with Lemma 2. Hence, w ≠ v.
- By noticing (in line 14) a difference between what the map says should be seen if v = w and what really is visible. Clearly, in such a case v ≠ w.
- By having VERIFY return FALSE. VERIFY returns FALSE if the agent A has not found the token in the vertex p (which is equal to its correct SR x if v = w) for at least 2ΔMN^2 times. Note that A always leaves SR x with its token there. We distinguish two cases: (i) If x = SR1, the lemma follows from the second part of Lemma 4: no other agent steals A's token from SR1, so if v = w then A always sees a token in p and, subsequently, VERIFY never returns FALSE. (ii) Let x = SR2. Which agent could remove A's token from SR2? From Lemmas 4 and 5 we know that A's token was not removed from SR2 by an agent B executing VERIFY from SR1, because in that case B's token remains in SR1. It cannot be the case that A's token was removed by an agent B executing VERIFY from SR2, because that agent would have first placed its token in SR2. Therefore, A's token was removed by an agent B executing a GRAB-TOKEN as part of SUSPEND or WAKE-UP. However, for each removal of a token from a SR by an agent B executing SUSPEND there must have been a wake-up of some other agent that kicked a token out of the home-base to a SR (otherwise B would have picked up its token in the home-base). The only exceptions are the cases when an agent becomes informed and first takes a token from its old primary SR, which can happen at most Δ times.
The lemma follows from the fact that there are less than 2ΔMN^2 wake-ups.
Using the previous lemma, we can argue that an agent disappears in a BH only during a cautious step:
Lemma 7. If A enters the BH, the link e upon which it arrived is marked by its token.
Since no agent enters a link marked by a token and the degree of the BH is at most Δ, we get:
Theorem 2. At most Δ agents die.
The next lemmas are needed to show that no deadlock can occur, i.e., that every agent is always able to continue its algorithm after some finite time. First, we prove that no deadlock occurs when an agent is waiting for the disappearance of a token:
Lemma 8. A token from the home-base eventually disappears.
Proof. The only way a token can be put in the home-base is in SUSPEND. Consider, for the sake of contradiction, that an agent A puts a token in the home-base at time t0 and that this token never disappears. That means A went to its primary SR x, found it empty at time t1, then returned and went to rescan. We claim that if the token from the home-base does not disappear, then no token appears in SR x after time t1. An agent B executing VERIFY cannot place a token in SR x after t1: if B checked the home-base (line 3 of VERIFY) before t0, then A would have found its token in SR x; if it checked it after t0, it would wait in the home-base until it becomes empty. The only other possibility is that B is executing WAKE-UP and placed its token in SR x after t1. In such a case B would find A's token in the home-base (when executing GRAB-TOKEN) and take it. Contradiction.
Because A did not take its token after returning from the rescan, it has seen no progress and did not terminate. This means (by Theorem 2 and by the two-connectivity of G) that it cannot be the case that all blocked links lead to the BH. Therefore one of them will eventually be freed and some agent B will execute WAKE-UP. If x = 1 (i.e., A's primary SR is SR1), B will execute WAKE-UP using SR1 (either because SR1 was its primary SR, or because of rule R4 - the link to SR1 is free due to rules R2 and R3) and, since SR1 is empty after time t1, it will indeed remove A's token from the home-base. Contradiction. If x = 2, there are two cases. If B's primary SR is SR2, the same argument as above applies. Otherwise SR1 does not contain the BH, and the link leading to it will eventually become free and, due to rule R3, remain so. That means A will eventually notice that SR1 is safe and apply rule R4, executing GRAB-TOKEN starting from SR2. Since SR2 remains empty after t1, A will pick up its token from the home-base. Contradiction.
In a similar fashion, we can show the following lemma, which is, due to space constraints, presented without proof:
Lemma 9. A token from a SR eventually disappears.
From the construction and Lemmas 8 and 9 we get:
Theorem 3. An agent never deadlocks.
The next two lemmas are crucial for bounding the number of moves. Due to space restrictions we present them without proofs.
Lemma 10. An agent spends O(ΔMN^2) moves in one call to VERIFY.
Lemma 11. An agent spends O(ΔMN^2) steps executing one iteration of the outer loop of Algorithm 1.
The last property we need for the proof of Theorem 1 is:
Lemma 12. Each informed agent has a correct map.
Proof. It follows from Lemma 6 that if an agent A adds a new vertex v to its map, then indeed v has not been in A's local map before. So it remains to be proven that if an informed agent A adds an edge (u, w) between two visited vertices to its map, then there is an edge (u, w) in the graph. Adding an edge (u, w) requires that the hypothesis v = w tested in EXPLORE and VERIFY returns TRUE. We first prove that after successfully finishing N^2 iterations of the loop on line 4 in EXPLORE, the sequence β* defines a (not necessarily simple) cycle connecting v and u, whose length is a multiple of |β|. Let β = (β1, β2, ..., βk), where each βi specifies two port numbers: a consistent traversal must arrive via the first port and leave via the second. Since k < N, by traversing β* for N^2 steps it must happen that the agent visits a particular vertex q twice with the same position in the sequence β, say βj. Clearly, from then on the agent walks in a cycle. Let q be the first such vertex. However, since βj specifies also the arrival port number, the agent has both times arrived at q using the same port, i.e., it already started in the cycle. To conclude, we prove that if VERIFY returns TRUE for some informed agent, it must be that the cycle formed by β* has length |β| and hence v = w. If VERIFY returns TRUE, it means that A saw a token in p at least 2ΔMN^2 times and between every two successive visits of p there was a time when the home-base was free and, if there are two storerooms, also a time when SR2 was free. If p is not the correct SR, it must be that p is either the home-base or the other SR, and each of the 2ΔMN^2 times some agent put its token at p (which was removed before the next visit of A to p). We conclude the proof by showing that a token is put in p less than 2ΔMN^2 + ΔMN times.
There are two possible situations in which an agent B could put its token at p: either B performs a VERIFY in SR2 (there are at most ΔMN such cases: B must be a non-informed agent and it puts its token once per each call of VERIFY before getting informed), or B performs a SUSPEND/WAKE-UP pair. However, in the latter case there must be a cautious step that triggers this WAKE-UP, which, according to Lemma 2, accounts for another 2ΔMN^2 possibilities. By Lemmas 1-12, the main theorem (Theorem 1) follows.
Let us now consider the number of moves. By Lemmas 10 and 11, plus the fact that each of the Δ agents performs at most M iterations of the loop in Algorithm 1, we have:
Theorem 4. The BH can be located using O(Δ^2 M^2 N^2)
moves.
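The pigeonhole step in the proof of Lemma 12 can be checked directly with a small simulation (a sketch of ours; the successor function modeling the β*-walk and the function name are assumptions): the walk visits pairs (vertex, position in β), and since there are at most N·|β| ≤ N^2 such pairs, a pair must repeat within N^2 steps, after which the walk is periodic.

```python
def first_repeat(step, start, beta_len):
    """Follow the beta*-walk from `start`, where step(vertex, j)
    returns the vertex reached by applying beta_j.  Returns the
    number of steps until some (vertex, position-in-beta) pair
    repeats; the pigeonhole argument bounds this by n * beta_len.
    """
    seen = set()
    vertex, t = start, 0
    while (vertex, t % beta_len) not in seen:
        seen.add((vertex, t % beta_len))
        vertex = step(vertex, t % beta_len)
        t += 1
    return t

# Walk on a 6-vertex ring with |beta| = 2: every (vertex, position)
# pair occurs once before the first repeat, after 6 steps.
n, beta_len = 6, 2
steps = first_repeat(lambda v, j: (v + 1) % n, 0, beta_len)
assert steps == 6
assert steps <= n * beta_len <= n * n
```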
Fast Cellular Automata with Restricted Inter-Cell Communication: Computational Capacity
Martin Kutrib¹ and Andreas Malcher²
¹ Institut für Informatik, Universität Gießen, Arndtstr. 2, D-35392 Giessen, Germany. kutrib@informatik.uni-giessen.de
² Institut für Informatik, Johann Wolfgang Goethe-Universität, D-60054 Frankfurt am Main, Germany. a.malcher@em.uni-frankfurt.de
Abstract. A d-dimensional cellular automaton with sequential input mode is a d-dimensional grid of interconnected interacting finite automata. The distinguished automaton at the origin, the communication cell, is connected to the outside world and fetches the input sequentially. In the literature this model is often referred to as an iterative array. We investigate d-dimensional iterative arrays and one-dimensional cellular automata operating in real and linear time, whose inter-cell communication is restricted to some constant number of bits, independent of the number of states. It is known that even one-dimensional one-bit iterative arrays accept rather complicated languages such as {a^p | p prime} or {a^(2^n) | n ∈ N} [16]. We show that there is an infinite strict double dimension-bit hierarchy. The computational capacity of the one-dimensional devices in question is compared with the power of communication-restricted two-way cellular automata. It turns out that the relations are quite different from the relations in the unrestricted case. In passing, we obtain an infinite strict bit hierarchy for real-time two-way cellular automata and, moreover, a very dense time hierarchy for k-bit cellular automata for every k, i.e., just one more time step leads to a proper superfamily of accepted languages.
Keywords: Cellular automata; Iterative arrays; Restricted communication; Formal languages; Computational capacity; Parallel computing
(Please use the following format when citing this chapter: Kutrib, M., Malcher, A., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science - TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakwa, Y., (Boston: Springer), pp. 151-164.)

1 Introduction
Devices of homogeneous, interconnected, parallel acting automata have been investigated extensively from a computational capacity point of view. The specification of such a system includes the type and specification of the single automata (sometimes called cells), their interconnection scheme (which can imply a dimension of the system), a local and/or global transition function, and the input and output modes. Multidimensional devices with nearest neighbor connections whose cells are finite automata are commonly called cellular automata. If the input mode is sequential to a distinguished communication cell, they are called iterative arrays (IA).
In connection with formal language recognition, IAs have been introduced in [5], where it was shown that the language family accepted by real-time IAs forms a Boolean algebra not closed under concatenation and reversal. In [4] it is shown that for every context-free grammar a two-dimensional linear-time IA parser exists. In [6] a real-time acceptor for prime numbers has been constructed. A characterization of various types of IAs in terms of restricted Turing machines and several results, especially speed-up theorems, are given in [7, 8]. Several more results concerning formal languages can be found, e.g., in [12, 13].
In order to investigate the computational capacity of a device, there is a particular interest in infinite hierarchies of language families defined by bounding some resources. In [9] a dense IA time hierarchy beyond linear time has been proved. The gap between real time and linear time has been closed in [2]. Further hierarchies, depending on the amount of nondeterminism and the number of alternating transitions performed by the communication cell, are shown in [1, 3]. Descriptional complexity issues are studied in [10]. All these results concern iterative arrays where the states of the neighboring cells are communicated in one time step. That is, the number of bits exchanged is determined by the number of states. A natural and interesting restriction of IAs is to restrict the number of bits to some constant independent of the number of states. Iterative arrays with restricted inter-cell communication have been investigated in [15, 16], where algorithmic design techniques for sequence generation are shown.
In particular, several important infinite non-regular sequences, such as exponential or polynomial, Fibonacci, and prime sequences, can be generated in real time. Connectivity recognition problems are dealt with in [14], whereas in [17] the computational capacity of one-way cellular automata with restricted inter-cell communication is considered.
Here we investigate d-dimensional iterative arrays and one-dimensional cellular automata operating in real and linear time. The inter-cell communication of the array is restricted to some constant number of bits, in order to determine the power and nature of the communication bandwidth in massively parallel devices. The paper is organized as follows. In Section 2 we define the basic notions and the main model in question, i.e., d-dimensional iterative arrays with restricted inter-cell communication. Section 3 is devoted to dimension and bit hierarchies. We show that there is an infinite strict double hierarchy. That is, for every dimension, real-time (k+1)-bit restricted iterative arrays are strictly more powerful than real-time k-bit restricted iterative arrays, and for every k-bit restriction, real-time (d+1)-dimensional k-bit restricted iterative arrays are strictly more powerful than real-time d-dimensional k-bit restricted iterative arrays. In Section 4 we consider one-dimensional devices. The computational capacity of the devices in question is compared with the power of communication-restricted two-way cellular automata. It turns out that the relations are quite different from the relations in the unrestricted case. In passing, we obtain an infinite
strict bit hierarchy for real-time two-way cellular automata and, moreover, a very dense time hierarchy for k-bit cellular automata for every k, i.e., just one more time step leads to a proper superfamily of accepted languages.
2 Definitions and Preliminaries
We denote the rational numbers by Q, the integers by Z, the non-negative integers by N, and the positive integers {1, 2, ...} by N+. The empty word is denoted by λ, the reversal of a word w by w^R, and for the length of w we write |w|. The set of words over some alphabet A whose lengths are at most l ∈ N is denoted by A^(≤l). Set inclusion and strict set inclusion are denoted by ⊆ and ⊂, respectively.
A d-dimensional iterative array is a d-dimensional array (i.e., N^d) of finite automata, sometimes called cells, where each of them is connected to its nearest neighbors in every dimension. For convenience we identify the cells by their coordinates. Initially they are in the so-called quiescent state. The input is supplied sequentially to the distinguished communication cell at the origin. For this reason, we have different local transition functions. The state transition of all cells but the communication cell depends on the current state of the cell itself and the current states of its neighbors. The state transition of the communication cell additionally depends on the current input symbol (or, if the whole input has been consumed, on a special end-of-input symbol). In an iterative array with k-bit restricted inter-cell communication, during every time step each cell may communicate only k bits of information to its neighbors. These bits depend on the current state and are determined by so-called bit-functions. The finite automata work synchronously at discrete time steps.
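The informal description above can be made concrete with a tiny simulator (a sketch of ours, not the paper's formalism; the particular two-state-counting rule and function name are arbitrary assumptions chosen only to exercise the model): each cell holds a state, exposes only k = 1 bit of it per step via a bit-function, and the communication cell at index 0 additionally consumes one input symbol per step.

```python
def simulate_ia(word, steps, n_cells=16):
    """One-dimensional 1-bit iterative array, as a toy example.

    States are integers; the bit-function exposes only the parity of
    a state, so neighbors never see the full state.  Cell 0 is the
    communication cell and reads one input symbol per step ('#' once
    the input is exhausted).  The quiescent state 0 is preserved when
    all incoming bits are 0, as the definition requires.
    """
    bit = lambda s: s & 1                      # the 1-bit bit-function
    cells = [0] * n_cells                      # all cells quiescent
    for t in range(steps):
        a = word[t] if t < len(word) else '#'  # sequential input
        left = [bit(cells[i - 1]) if i > 0 else 0 for i in range(n_cells)]
        right = [bit(cells[i + 1]) if i + 1 < n_cells else 0
                 for i in range(n_cells)]
        nxt = []
        for i in range(n_cells):
            if i == 0:    # communication cell: sees the input symbol too
                nxt.append((cells[0] + right[0] + (a != '#')) % 4)
            else:         # ordinary cell: sees only two incoming bits
                nxt.append((cells[i] + left[i] + right[i]) % 4)
        cells = nxt
    return cells

cells = simulate_ia("aaab", steps=6)
assert len(cells) == 16
```

Note how the restriction shows up in the code: `delta` for each cell receives `bit(...)` values, never the neighbors' full states, which is the essence of the k-bit model defined formally below.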
Fig. 1. A two-dimensional iterative array (all cells start in the quiescent state s0; the communication cell at the origin reads the input a1 a2 a3 ... an #).
M. Kutrib and A. Malcher
Definition 1. A d-dimensional iterative array with k-bit restricted inter-cell communication (IA_k^d) is a system (S, A, F, s0, d, k, b_1, ..., b_{2d}, δ, δ0), where
(1) S is the finite, nonempty set of cell states,
(2) A is the finite, nonempty set of input symbols,
(3) F ⊆ S is the set of accepting states,
(4) s0 ∈ S is the quiescent state,
(5) d ∈ N+ is the dimension,
(6) k ∈ N+ is the number of bits which can be communicated to neighbor cells,
(7) b_i : S → {0,1}^k, for 1 ≤ i ≤ 2d, are the bit-functions satisfying b_i(s0) = (0, ..., 0),
(8) δ : S × ({0,1}^k)^{2d} → S is the local transition function for non-communication cells satisfying δ(s0, (0,...,0), ..., (0,...,0)) = s0,
(9) δ0 : S × (A ∪ {#}) × ({0,1}^k)^{2d} → S is the local transition function for the communication cell.

Let M be an IA_k^d. A configuration of M at some time t ≥ 0 is a description of its global state, which is a pair (w_t, c_t), where w_t ∈ A* is the remaining input sequence and c_t : N^d → S is a mapping that maps the single cells to their current states. For the sake of simpler notation in connection with cells at a face of N^d, we extend the mappings c_t to arguments from Z^d, and assume that all cells in Z^d \ N^d are permanently in the quiescent state sending zeroes. The configuration (w_0, c_0) at time 0 is defined by the input word w_0 and the mapping c_0(i_1, ..., i_d) = s0, for (i_1, ..., i_d) ∈ N^d, while subsequent configurations are chosen according to the global transition function Δ. Let (w_t, c_t), t ≥ 0, be a configuration; then its successor configuration (w_{t+1}, c_{t+1}) = Δ((w_t, c_t)) is as follows:

c_{t+1}(i_1, ..., i_d) = δ(c_t(i_1, ..., i_d), b_1(c_t(i_1 − 1, i_2, ..., i_d)), b_2(c_t(i_1 + 1, i_2, ..., i_d)), b_3(c_t(i_1, i_2 − 1, ..., i_d)), b_4(c_t(i_1, i_2 + 1, ..., i_d)), ..., b_{2d−1}(c_t(i_1, i_2, ..., i_d − 1)), b_{2d}(c_t(i_1, i_2, ..., i_d + 1)))

for all (i_1, ..., i_d) ∈ N^d \ {(0, ..., 0)}; the communication cell c_{t+1}(0, ..., 0) is updated by δ0 analogously, with the current input symbol (or # if the input is exhausted) as additional argument.

Definition 2. Let t : N → N, t(n) ≥ n + 1, be a mapping. If all w ∈ L(M) are accepted with at most t(|w|) time steps, then L is said to be of time complexity t. The family of all languages which can be accepted by an IA_k^d with time complexity t is denoted by ℒ_t(IA_k^d). If t equals the function n + 1, acceptance is said to be in real time and we write ℒ_rt(IA_k^d). The linear-time languages ℒ_lt(IA_k^d) are defined according to ℒ_lt(IA_k^d) = ∪_{q ∈ Q, q ≥ 1} ℒ_{q·n}(IA_k^d).

Definition 3. Let L ⊆ A* be a language over an alphabet A and l ∈ N+ be a constant.
(1) Two words w ∈ A* and w' ∈ A* are l-right-equivalent with respect to L if for all y ∈ A^{≤l}: wy ∈ L ⟺ w'y ∈ L.
(2) N_r(l, L) denotes the number of l-right-equivalence classes with respect to L.
(3) Two words w ∈ A^{≤l} and w' ∈ A^{≤l} are l-left-equivalent with respect to L if for all y ∈ A*: wy ∈ L ⟺ w'y ∈ L.
(4) N_ℓ(l, L) denotes the number of l-left-equivalence classes with respect to L.

Lemma 4. Let k, d ∈ N+ be constants. (1) If L ∈ ℒ_rt(IA_k^d), then there exists a constant p ∈ N such that N_r(l, L) ≤ p^{(l+1)^d}, for all l ∈ N,
and (2) if L ∈ ℒ_t(IA_k^d), then there exists a constant p ∈ N such that N_ℓ(l, L) ≤ p · 2^{kd·l}, for all l ∈ N.

Proof. Let M = (S, A, F, s0, d, k, b_1, ..., b_{2d}, δ, δ0) be a real-time IA_k^d that accepts L. In order to determine an upper bound for the number of l-right-equivalence classes, we consider the possible configurations of M after reading all but |y| ≤ l input symbols. The remaining computation depends on the last |y| input symbols, the current state of the communication cell, and the states of the cells which can send information that is received by the communication cell during the last |y| + 1 time steps. These are at most (|y| + 1)^d cells. So, in total there are at most |S|^{1+(|y|+1)^d} ≤ |S|^{2(l+1)^d} different possibilities. Setting p = |S|^2, we obtain N_r(l, L) ≤ p^{(l+1)^d}.

Now let M be an IA_k^d that accepts L with time complexity t. In order to determine an upper bound to the number of l-left-equivalence classes, we consider the possible configurations of M after reading prefixes w whose lengths are at most l. A computed configuration depends on the information which is sent to the array by the communication cell, and on the current state of the communication cell. So, there are at most (2^{kd})^{|w|−1} · |S| ≤ |S| · 2^{kd·l} different configurations. Setting p = |S|, we obtain N_ℓ(l, L) ≤ p · 2^{kd·l}. □

Choosing l sufficiently large, we obtain strictly fewer classes than necessary; from the contradiction we obtain L_dim(d+1) ∉ ℒ_rt(IA_k^d).

Now we turn to the construction of a real-time IA_k^{d+1} which accepts L_dim(d+1). First we observe that the structure of accepted words is regular. Therefore, the communication cell can check it and, moreover, can decode the checked input over {a, b} uniquely to a word from M(d+1). For convenience, we explain the acceptance also in terms of these words. Basically, the idea is to store the prefix u in such a way that the symbol u[x_{d+1}]···[x_1] is stored in cell (x_{d+1} − 1, x_d − 1, ..., x_1 − 1). While subsequently reading the suffix, the symbol u[x_{d+1}]···[x_1] is addressed and sent to the communication cell, where it is compared with v. Accordingly, we call the first phase the storage phase and the second the retrieval phase.

We name cells depending on their coordinates. A cell is said to be of level j if its last j coordinates are 0, i.e., (i_1, ..., i_{d+1−j}, 0, ..., 0). Note that a level j cell is also of level j' < j, and that the communication cell is the sole level d+1 cell. A cell with maximal level j activates its neighbors (i_1, ..., i_{d+1−j}, 0, ..., 0, 1), (i_1, ..., i_{d+1−j}, 0, ..., 1, 0), ..., (i_1, ..., i_{d+1−j}, 1, ..., 0, 0), and (i_1, ..., i_{d+1−j} + 1, 0, ..., 0), i.e., sends a non-zero signal for the first time. Therefore, each cell is uniquely activated by one of its neighbors and, moreover, can determine its maximal level by this neighbor. A cell with maximal level j ≤ d may activate at most j + 1 neighbors. Activation takes place during the storage phase, in which cells mark a path to the current storage position by state components. When the communication cell reads h(a) (resp. h(b)), it sends the two bits 10 (resp. 11) along the path until the position is reached. Now the corresponding cell (i_1, ..., i_{d+1}) stores symbol a (resp. b), activates its neighbor (i_1, ..., i_{d+1} + 1) to be the next storage position by sending the bits 01, and extends the current path to the newly activated neighbor. Whenever the communication cell reads h($), it sends the bits 01 along the path. In this situation the cells on the path count the number of at most d consecutive 01 signals, and possibly reroute the path as follows. A cell lets pass p − 1 signals, where p is the number of already activated neighbors. If there is another signal, it activates the next neighbor according to the above given ordering, and reroutes the path to it. Clearly, there cannot be more signals than the number of activated neighbors minus one, since the next predecessor cell of higher level does not let pass so many of them.

First we partition the input states {a_0, ..., a_{2^{2k}}} according to b_1, i.e., two states s_1 and s_2 are in the same class if and only if b_1(s_1) = b_1(s_2). Since there are 2^{2k} + 1 input states and the range of b_1 has 2^k elements, there is at least one class S_1 with at least 2^k + 1 states. Next, S_1 is partitioned according to b_1(δ(b_2(a), s, b_1(#))). Therefore, there is at least one subclass of S_1 that has at least two states, say a_i and a_j. For an accepting computation on input a_i 0^n a_i, for some n ∈ N+, we consider the relevant states of the cells n−1, n, n+1 at time steps 0, 1, 2. In particular, c_0(n−1) = a, c_0(n) = a_i, c_0(n+1) = #, c_1(n−1) = a', c_1(n) = a_i', c_2(n−1) = a''. Due to the real-time restriction, states c_1(n+1), c_2(n), and c_2(n+1) cannot affect the overall computation result. Since a_i and a_j are in the same class S_1, for input a_i 0^n a_j we obtain c_0(n−1) = a, c_0(n) = a_j, c_0(n+1) = #, c_1(n−1) = a', c_1(n) = a_j'. Since a_i and a_j are in the same subclass we obtain c_2(n−1) = a''. Therefore, input a_i 0^n a_j, not belonging to L_k, would be accepted. □

It is not hard to see that the language L_k is accepted by a real-time CA_{k+1} as well as by a CA_k in time n + 1.
So, we obtain a strict bit hierarchy for two-way real-time cellular automata.

Theorem 11. Let k ∈ N+ be a constant; then ℒ_rt(CA_k) ⊂ ℒ_rt(CA_{k+1}).

Moreover, by modification of the witness language, i.e., by increasing the underlying alphabet, we obtain a very dense strict time hierarchy. That is, if we allow just one more time step, we obtain a strictly more powerful device.

Theorem 12. Let k ∈ N+ and r ∈ N be constants; then ℒ_{rt+r}(CA_k) ⊂ ℒ_{rt+r+1}(CA_k).

Since, trivially, any regular language is accepted by some real-time IA_1, the next theorem completes the incomparability results.

Theorem 13. Let k ∈ N+ be a constant. There is a language belonging to the difference ℒ_rt(OCA_1) \ ℒ_lt(IA_k).
Proof. First we give a sketch of a construction of a one-bit real-time OCA that accepts the witness language L_k = {u_1 ··· u_m e^x v | m ∈ N+, u_i ∈ {a_0, ..., a_{2^k−1}}, 1 ≤ i ≤ m, v ∈ {e, a_0, ..., a_{2^k−1}}*, and x is greater than or equal to the number represented by the 2^k-ary interpretation of u_1 ··· u_m}.

Initially, all non-boundary states send bit 1 to the left. This identifies the rightmost cell uniquely. Next, all cells with input e send a 1 and all cells in a state u_i send a 0. This identifies cells in state u_i with an e-neighbor to the right, and vice versa. Now all cells e with right neighbor u_i or in boundary state send a 0-signal to the left. All other cells e send bits 1 to the left until they receive a 0-signal from the right. The cells in states u_i form a 2^k-ary counter. The cells in state u_m with e-neighbor start to decrease the counter by one in every time step until they receive a 0-signal. A counter cell accepts when it generates the first carryover to the left.

In order to show that L_{k+1} is not accepted by any IA_k, we adapt the proof of Theorem 8, and obtain at least 2^{(k+1)m} induced equivalence classes, but at most p · 2^{km} distinguishable equivalence classes N_ℓ(m, L_{k+1}). □
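The counting idea can be mirrored by a plain sequential membership test (an illustrative sketch of the language only, not of the cellular construction; the token encoding is hypothetical):

```python
def in_Lk(word, k):
    """Membership in L_k: a nonempty block of 2^k-ary digits u (encoded as
    ints in [0, 2^k)), then x copies of 'e', then a tail v over digits and
    'e'; accepted iff x >= the 2^k-ary value of u."""
    base = 2 ** k
    m = 0
    while m < len(word) and isinstance(word[m], int) and 0 <= word[m] < base:
        m += 1
    if m == 0:
        return False
    rest = word[m:]
    x = 0
    while x < len(rest) and rest[x] == 'e':
        x += 1
    # v may only contain digits and 'e'
    if not all(t == 'e' or (isinstance(t, int) and 0 <= t < base)
               for t in rest[x:]):
        return False
    value = 0
    for digit in word[:m]:          # most significant digit first
        value = value * base + digit
    return x >= value
```

The OCA performs the same comparison in parallel: the u-cells hold the counter and the e-cells supply one decrement per time step.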
Fig. 3. Relations between unrestricted and restricted language families, respectively. Solid lines are strict inclusions, dotted lines are inclusions. Families which are not connected by any path are incomparable.

Finally, we show the proper inclusions between language families that are related by inclusions for structural reasons.

Theorem 14. Let k ∈ N+ be a constant; then ℒ_rt(OCA_k) ⊂ ℒ_rt(CA_k).
Proof. It is well known that all unary languages belonging to ℒ_rt(OCA) are regular [11]. Therefore, it suffices to show that the non-regular language L = {a^{2^x + 2x} | x ∈ N+} belongs to ℒ_rt(CA_1). A corresponding CA_1 works as follows. It sets up a binary counter whose least significant bit is stored in the leftmost cell. We observe that the counter is extended by one digit (cell) to the right at time steps 2^x + x, for x ∈ N. In particular, at time steps 2^x − 1 all counter cells store bit 1. Subsequently, it
takes x + 1 time steps until the carryovers reach the new cell that extends the counter. In addition, at time step 1 the rightmost cell sends a signal 1 to the left. The input is to be accepted if and only if this signal appears in a cell exactly at a time step at which this cell becomes the new most significant bit of the counter, i.e., at time steps 2^x + x. In this case the signal 1 is passed through the counter in order to cause the leftmost cell to accept. Since the previous counter length was x, the total input length is 2^x + x + x. □

For the sake of completeness, the following theorem is presented without proof.

Theorem 15. Let k ∈ N+ be a constant; then ℒ_lt(IA_k) ⊂ ℒ_lt(CA_k).
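The timing argument in the proof of Theorem 14 reduces to simple arithmetic: a unary word a^n is in the witness language iff n = 2^x + 2x for some x ∈ N+. A quick sanity check (illustrative only):

```python
def in_witness(n):
    """True iff n = 2**x + 2*x for some positive integer x."""
    x = 1
    while 2 ** x + 2 * x <= n:
        if 2 ** x + 2 * x == n:
            return True
        x += 1
    return False

# Accepted lengths: 4, 8, 14, 24, 42, ...; the gaps between consecutive
# accepted lengths grow without bound, which is one way to see that the
# language cannot be regular.
```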
References
1. Buchholz T, Klein A, Kutrib M (1999) Iterative arrays with a wee bit alternation. In: Fundamentals of Computation Theory 1999, LNCS 1684, pp 173-184
2. Buchholz T, Klein A, Kutrib M (2000) Iterative arrays with small time bounds. In: Mathematical Foundations of Computer Science 2000, LNCS 1893, pp 243-252
3. Buchholz T, Klein A, Kutrib M (1999) Iterative arrays with limited nondeterministic communication cell. In: Words, Languages and Combinatorics III, pp 73-87
4. Chang JH, Ibarra OH, Palis MA (1987) Parallel parsing on a one-way array of finite-state machines. IEEE Trans Comput C-36:64-75
5. Cole SN (1969) Real-time computation by n-dimensional iterative arrays of finite-state machines. IEEE Trans Comput C-18:349-365
6. Fischer PC (1965) Generation of primes by a one-dimensional real-time iterative array. J ACM 12:388-394
7. Ibarra OH, Palis MA (1985) Some results concerning linear iterative (systolic) arrays. J Parallel Distributed Comput 2:182-218
8. Ibarra OH, Palis MA (1988) Two-dimensional iterative arrays: Characterizations and applications. Theoret Comput Sci 57:47-86
9. Iwamoto C, Hatsuyama T, Morita K, Imai K (1999) On time-constructible functions in one-dimensional cellular automata. In: Fundamentals of Computation Theory 1999, LNCS 1684, pp 317-326
10. Malcher A (2004) On the descriptional complexity of iterative arrays. IEICE Transactions on Information and Systems E87-D:721-725
11. Seidel SR (1979) Language recognition and the synchronization of cellular automata. Technical Report 79-02, Department of Computer Science, University of Iowa, Iowa City
12. Smith III AR (1972) Real-time language recognition by one-dimensional cellular automata. J Comput System Sci 6:233-253
13. Terrier V (1995) On real time one-way cellular array. Theoret Comput Sci 141:331-335
14. Umeo H (2001) Linear-time recognition of connectivity of binary images on 1-bit inter-cell communication cellular automaton. Parallel Comput 27:587-599
15. Umeo H, Kamikawa N (2002) A design of real-time non-regular sequence generation algorithms and their implementations on cellular automata with 1-bit inter-cell communications. Fund Inform 52:257-275
16. Umeo H, Kamikawa N (2003) Real-time generation of primes by a 1-bit-communication cellular automaton. Fund Inform 58:421-435
17. Worsch T (2000) Linear time language recognition on cellular automata with restricted communication. In: LATIN 2000: Theoretical Informatics, LNCS 1776, pp 417-426
Asynchronous Distributed Components: Concurrency and Determinacy
Denis Caromel and Ludovic Henrio
CNRS - I3S - Univ. Nice Sophia Antipolis - INRIA Sophia Antipolis
INRIA Sophia-Antipolis, 2004 route des Lucioles - B.P. 93, F-06902 Sophia-Antipolis Cedex
{caromel, henrio}@sophia.inria.fr
Abstract. Based on the impς-calculus, ASP (Asynchronous Sequential Processes) defines distributed applications behaving deterministically. This article extends ASP by building hierarchical and asynchronous distributed components. Components are hierarchical (a composite can be built from other components) and distributed (a composite can span over several machines). This article also shows how the asynchronous component model can be used to statically assert component determinism.
1 Introduction

The advent of components in programming technology raises the question of their formal ground, intrinsic semantics, and above all their compositional semantics. This is a real challenge, as practical component models are usually quite complex, featuring distribution over local or wide area networks. But few formal models for components have been proposed so far [4, 20, 3, 14].

Since the first ideas about software components, usually dated to 1968 [1], the design of a reusable piece of software has technically evolved. From the first off-the-shelf modules, a component has become a complex piece of parameterized code with attributes to be set. Its behavior can be adapted with various non-functional aspects (life-cycle, persistence, etc.). Finally, such a piece of code is to be deployed in a hosting infrastructure; sometimes it can also be retrieved for replacement with a new version.

In recent years, one crucial new aspect of components has been introduced: not only the offered interfaces are specified, but also the needed ones. A first key aspect of our work is to take this feature into account: the proposed model makes it possible to specify that a software component provides well-defined interfaces, and requires well-defined services or interfaces. A second and important contribution is to take into account components that are distributed over several machines. A given component can span as a unique entity over several hosts in the network. This work goes further than a distributed-component infrastructure that just allows two components to talk over the network. Finally, the components being proposed are hierarchical (allowing a compositional specification and verification of the behavior of large-scale systems), communicate with remote method invocations (versus raw messages), and are as much as possible decoupled (asynchronous, to scale over large area networks).

When building some kind of component calculus, one has the option to start from scratch or, on the contrary, to rely as much as possible on the syntax and semantics of a programming calculus. This paper clearly takes the latter approach, relying as much as possible on a long history of research on concurrent and distributed calculi. This is in accordance with the practical situation where a component infrastructure is usually added on top of a programming language. The main contributions of this paper are:
- a formalization of a component model featuring distribution, asynchrony, and hierarchical composition, with two translations defining the semantics;
- the usage of components as a convenient abstraction for statically ensuring determinism, which, to our knowledge, is a totally novel approach.

This article is first a direct formalization of the component model implemented in ProActive [5, 11]. More generally, our distributed component model is minimally characterized by asynchronous components, hierarchy, no shared memory, and a single-threaded lowest level of components; thus, it can be adapted to turn any object model into distributed decoupled components communicating by structured method calls. Taking advantage of ASP and its properties [10], summarized in Section 2, this article provides a formal syntax for the description of distributed components in Section 3. Then, Section 4 shows an example of a deterministic component. Two translational semantics are given in Section 5. Finally, components provide a suitable abstraction for statically identifying deterministic programs, as shown in Section 6.

Please use the following format when citing this chapter: Caromel, D., Henrio, L., 2006, in International Federation for Information Processing, Volume 209, Fourth IFIP International Conference on Theoretical Computer Science-TCS 2006, eds. Navarro, G., Bertossi, L., Kohayakawa, Y., (Boston: Springer), pp. 165-183.
2 Background

2.1 Some Related Works

ASP is based on the untyped imperative object calculus of Abadi and Cardelli [2], with a local semantics inspired from [15]. Futures [16, 13] are used to represent awaited results of remote calls; determinism is strongly related to process networks [17] and linear channels [18]. A comparison of ASP with other calculi can be found in [10, 9]. Components over Actors are presented in [4]; compared to our work, Actor components are neither hierarchical nor benefit from the notion of futures. Moreover, the communication and evaluation model of Actors cannot guarantee the causal ordering and determinism properties featured by ASP. [3] focuses on the definition of connections and interactions, and on the specification of the behavior. As connectors have their own activity, it is impossible to adapt our determinism properties to Wright.
Stefani et al. [6, 20] introduced the kell calculus that is able to model components and especially sub-components control. We rather demonstrate how to build distributed components that behave deterministically and for which the deterministic behavior is statically decidable. Moreover, the properties shown here rely on properties of communications and semantics of the calculus that are not ensured directly by the kell calculus, and its adaptation would be more complicated than the new calculus presented here. However, those two approaches being rather orthogonal, one could expect to benefit of both by adapting a kell calculus-like control of components with an (adaptation of) ASP as the underlying calculus. Bruneton, Coupaye and Stefani also proposed a hierarchical component model: Fractal [12], together with its reference implementation JuHa [7]. Our work can also be considered as a foundation for distributed Fractal components, focusing on the hierarchical aspect rather than on the component control. 2.2 ASP Calculus: Syntax and Informal Semantics The ASP calculus [10], is an extension of the imp^-calculus [2, 15] with two primitives {Serve and Active) to deal with distributed objects. The ASP calculus is implemented as a Java library (ProActive [11]). ASP strongly links the concepts of thread and of object, it is minimally characterized by: - Sequential activities: each object is manipulated by a single thread, - Communications are asynchronous method calls, and - Futures as first class objects representing awaited results.
a, b ∈ L ::= x                                              variable,
  | [l_i = b_i; m_j = ς(x_j, y_j)a_j]^{i∈1..n, j∈1..m}       object definition,
  | a.l_i                                                   field access,
  | a.l_i := b                                              field update,
  | a.m_j(b)                                                method call,
  | clone(a)                                                superficial copy,
  | Active(a, m_j)                                          activates a; m_j defines the service policy,
  | Serve(M)                                                serves a request among the set M of method labels, M = {m_1, ..., m_k}

Fig. 1. ASP syntax (l_i are field names, m_j are method names)
ASP is formalized as follows. An activity (denoted by α, β, γ, ...) is composed of a thread manipulating a set of objects put in a store. The primitive Active(a, m) creates a new activity containing the object a, which is said to be active; m is a method called upon the activity creation. Every request (method call) sent to an activity is actually sent to this master object. An activity also
contains the pending requests (requests that have been received and should be served later) and the computed results of the served requests (future values). AO(α) represents a reference to the remote active object of activity α. A parallel configuration (denoted by P, Q, ...) is a parallel composition of activities: P, Q ::= α[a_α; σ_α; ι_α; F_α; R_α; f_α] ∥ β[···] ∥ ··· where a_α is the term currently evaluated in α, σ_α is the store (association between locations ι_i and objects), ι_α is the location of the active object inside σ_α, F_α is the list of calculated futures, R_α is the request queue, and f_α is the future corresponding to a_α.

Futures are generalized references that can be manipulated as local ones; they can be transmitted to other activities, and future identifiers are unique for the whole configuration. But, upon a strict operation (field or method access, field update, clone) on a future, the local execution is stopped until the value of the future is updated. Calling a method on an active object atomically adds a new entry in a request queue, associates a future to the response, and deep copies the argument of the request into the store of the destination activity. Deep copy allows one to prevent distant references to passive objects; synchronous request delivery ensures causal order between requests. The primitive Serve(M) can appear at any point in the source code. Its execution stops the activity until a request on one of the methods of the set M is found in the request queue. The oldest such request is then removed from the request queue and executed (served). Once the response to a request is computed, the corresponding value (future value) becomes available and every activity can get it. The futures associated with the currently served requests are called the current futures. Returning the value associated to a future (also called "updating a future") consists in replacing the reference to a future by a deep copy of the future value.
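The interplay of asynchronous calls, sequential service, and blocking on strict operations can be loosely imitated in a mainstream language (a rough analogy only; it ignores deep copies, first-class request queues, and the Serve primitive):

```python
from concurrent.futures import ThreadPoolExecutor

class Activity:
    """Rough analogy of an ASP activity: a single service thread, so
    requests are served sequentially in arrival order; a call returns a
    future immediately, and only a strict use of the result blocks."""
    def __init__(self, obj):
        self._obj = obj
        self._service = ThreadPoolExecutor(max_workers=1)

    def call(self, method, *args):
        # Asynchronous method call: enqueue the request, return a future.
        return self._service.submit(getattr(self._obj, method), *args)

class Counter:
    def __init__(self):
        self.n = 0
    def inc(self, d):
        self.n += d
        return self.n

counter = Activity(Counter())
f1 = counter.call("inc", 1)   # non-blocking request
f2 = counter.call("inc", 2)   # queued behind f1
assert f1.result() == 1       # blocking on a strict use of the result
assert f2.result() == 3
```

The single worker thread is what makes the analogy apt: like an ASP activity, the object is never manipulated by two threads at once.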
We proved that the value of a future can be returned at any time without any consequence on the execution. An operational semantics for ASP has been detailed in [10] and is denoted by →. It is based on a classical local reduction (→_ς) on ς-calculus terms [2]. This reduction specifies a single reduction point inside each activity, which ensures local sequentiality. R[a] denotes a reduction context, where the reduction point is inside a; thus a_α = R[ι.m_j(ι')] means the next reduction of activity α will consist in performing a method call on the object referenced (locally) by ι; if moreover σ_α(ι) = AO(β), then this is a remote method call to activity β. →* denotes the reflexive transitive closure of →.

2.3 ASP Properties: Deterministic Object Networks

This section presents the properties of the ASP calculus; mainly, it recalls the definition of deterministic object networks, which identifies a set of ASP terms that behave deterministically. Though DON terms are based on an intuitive notion ("non-determinism can only originate from conflicting requests"), ASP is the first calculus to feature such a property for concurrent imperative objects.
In the following, α_P denotes the activity α of configuration P. Without any restriction, and to allow comparison based on activity identifiers, we suppose that the freshly allocated activity names are chosen deterministically: the first activity created by α will have the same identifier for all executions.

Potential Services. Let M_{α_P} be an approximation of the set of sets M that can appear in the Serve(M) instructions that the activity α may perform in the future. In other words, if an activity may perform a service on a set of method labels, then this set must belong to M_{α_P}:

∀Q, P →* Q ∧ a_{α_Q} = R[Serve(M)] ⇒ M ∈ M_{α_P}
This set can be specified by the programmer or statically inferred.

Interfering Requests. Two requests on methods m1 and m2 are said to be interfering in α in a program P if they both belong to the same potential service, that is to say, if they can appear in the same Serve(M) primitive: requests on m1 and m2 are interfering if {m1, m2} ⊆ M ∈ M_{α_P}.

Equivalence Modulo Replies. ≡_F, defined in [9], is an equivalence relation considering references to futures already calculated as equivalent to local references to the part of the store which is the (deep copy of the) future value. More precisely, ≡_F is an equivalence relation on parallel configurations modulo the renaming of locations and futures, and permutations of requests that cannot interfere. Moreover, a reference to a future already calculated (but not locally updated) is equivalent to a local reference to the (part of the store which is the) deep copy of the future value.

Deterministic Object Networks. If two interfering requests cannot be sent to the same destination (β below) at the same moment, then the program behaves deterministically. Of course, two such requests would originate from two different activities (α_Q). "There is at most one" is denoted by ∃¹.

Definition 1 (DON) A configuration P is a Deterministic Object Network (DON(P)) if it cannot be reduced to a configuration where two interfering requests can be sent concurrently to the same destination activity:

P →* Q ⇒ ∀β_Q ∈ Q, ∀M ∈ M_{β_Q}, ∃¹ α_Q ∈ Q, ∃m ∈ M, ∃ι, ι', a_{α_Q} = R[ι.m(ι')] ∧ σ_{α_Q}(ι) = AO(β_Q)
DON(P) ensures that, for all orders of request sending, we always serve the requests in the same order. Thus, provided no two requests can be sent at the same moment on the same potential service of a given destination, the considered program behaves deterministically. Section 6 will show how components can ensure this statically.
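Interference itself is directly checkable once the potential services are known (a hypothetical helper; potential services are given here explicitly as sets of method labels):

```python
def interfering(m1, m2, potential_services):
    """Requests on m1 and m2 interfere iff some potential service M
    contains both labels, i.e. they may compete in the same Serve(M)."""
    return any(m1 in M and m2 in M for M in potential_services)

# Example: an activity that may serve {get, set} together, but log separately.
services = [{"get", "set"}, {"log"}]
assert interfering("get", "set", services)       # may be served by the same Serve
assert not interfering("get", "log", services)   # never compete for a service
```

A static analysis in the spirit of Section 6 then only has to ensure that no two activities can concurrently send such interfering requests to the same destination.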
3 Distributed Components

This section demonstrates how to build hierarchical and distributed components upon ASP. The asynchronous components presented below interact with method calls in an object-oriented way. The component specification presented in this section can be viewed as an abstraction of a classical ADL (e.g., the Fractal ADL [12]).

Definition 2 (Primitive Component - Figure 2) A primitive component is characterized by a component name Name, together with names for a set of Server Interfaces (SI) and a set of Client Interfaces (CI). We denote by Exported(PC) the set {SI_i}^{i∈1..n} and by Imported(PC) the set {CI_j}^{j∈1..m}.

PC ::= Name⟨{SI_i}^{i∈1..n}; {CI_j}^{j∈1..m}⟩

Primitive Component Activity: To give functionalities to a PC, we attach to it: an ASP term, say a, corresponding to an object to be activated and its dependencies (passive objects); the service method of a, srv (the method to be triggered on activation of a); a mapping from SIs to subsets of the served methods; and a mapping from CIs to names of fields of the object a (these fields will store references to components). M ranges over the set of method labels, and C over the set of field labels of a.

PCAct ::= Name_Act⟨a, srv, φ_S, φ_C⟩

where φ_S : Exported(PC) → ℘(M) and φ_C : Imported(PC) → C are total functions.

This definition requires that a content PCAct is attached to each primitive component PC; this content consists of a single activity. Composite components can be built by interconnecting other components (either primitive or composite) and exporting some SIs and CIs. We suppose that for all components, every interface has a different name (but names could also be disambiguated by using qualified names).

Definition 3 (Composite Component) A composite component is a set of components exporting some server interfaces (ε_S), some client interfaces (ε_C), and connecting some client and server interfaces (defining a partial binding ψ); only interfaces of the direct sub-components can be used:
Fig. 2. A primitive component PC: requests sent to PC on methods of SI_i arrive on its server interfaces; requests sent by PC leave on its client interfaces CI_j.
CC ::= Name ≪ C_1, ..., C_m; ε_S; ψ; ε_C ≫

where each component C_i is either a primitive or a composite one: C ::= PC | CC, and each client interface CI inside CC can only be connected once, leading to the following definition:

ε_S : Exported(CC) → ∪_{sc ∈ C_1...C_m} Exported(sc)   is a total function,
ψ : ∪_{sc ∈ C_1...C_m} Imported(sc) → ∪_{sc ∈ C_1...C_m} Exported(sc)   is a partial function,
ε_C : ∪_{sc ∈ C_1...C_m} Imported(sc) → Imported(CC)   is a partial surjective function,

such that dom(ψ) ∩ dom(ε_C) = ∅. We define Exported(CC) = dom(ε_S) and Imported(CC) = codom(ε_C).

Defining ε_S as a function allows one to export a given internal server interface as several external ones, but imposes each incoming request to be communicated to a single destination (each imported interface is bound to a single server interface of an internal component). Similarly, a client interface is exported only once, so that communications have a single determinate destination: ε_C is a function (each client interface of an internal component is plugged at most once to an exported interface). ψ is a function so that internal communications are determinate too (each client interface of an internal component is plugged at most once to another internal server interface). And finally, also to ensure unicity of communication destinations, ε_C and ψ have disjoint domains, so that an internal client interface cannot be both bound internally and exported.

Correct Connections. Figure 3 sums up the possible bindings that are allowed according to Definition 3. The component shown in the figure is a valid CC but not a DCC (DCC will be defined in Section 6.2, Definition 8).

Incorrect Connections. Figure 4 shows the impossible bindings that correspond to the restrictions of Definition 3. The condition of Definition 3 that prevents the composition from being correct is written above each sub-figure.
Fig. 3. A composite component

Fig. 4. Incorrect bindings between components (the violated conditions are: ε_C is a function; ψ is a function; dom(ε_C) ∩ dom(ψ) = ∅; ε_S is a function)
To conclude this section, we present two useful definitions: closed components, which have no interface and form independent systems; and complete components, for which all interfaces are either bound internally or exported, so that every request sent on a client interface has a destination and every server interface can at some point receive requests.

Definition 4 (Closed Component) A component C is closed if it neither imports nor exports any interface:

Imported(C) = ∅ ∧ Exported(C) = ∅

Definition 5 (Complete Component) A primitive component is complete. A composite component Name ≪ C_1, ..., C_m; ε_S; ψ; ε_C ≫ is complete if it consists of complete components and all its internal interfaces are plugged or exported:

C_1, ..., C_m are complete components
∧ dom(ψ) ∪ dom(ε_C) = ∪_{sc ∈ C_1...C_m} Imported(sc)
∧ codom(ψ) ∪ codom(ε_S) = ∪_{sc ∈ C_1...C_m} Exported(sc)

Non-complete components contain unplugged interfaces: some of the CIs of the sub-components must not be used (requests without destination), or some of the SIs never receive any request (potential deadlock). As such, it is reasonable to forbid them.
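The side conditions of Definitions 3-5 are mechanically checkable. In the sketch below (an illustrative data model, not the paper's formal syntax), ψ, ε_C, and ε_S are plain dictionaries, and completeness of the sub-components themselves is assumed:

```python
def check_composite(imported, exported, psi, eps_c, eps_s):
    """imported/exported: client/server interface names of the direct
    sub-components; psi: internal bindings CI -> SI; eps_c: exported
    client interfaces CI -> external CI name; eps_s: external SI name
    -> internal SI.  Returns (valid, complete)."""
    valid = (
        set(psi) <= imported
        and set(psi.values()) <= exported
        and set(eps_c) <= imported
        and set(eps_s.values()) <= exported
        and not (set(psi) & set(eps_c))          # dom(psi) ∩ dom(eps_c) = ∅
    )
    complete = valid and (
        set(psi) | set(eps_c) == imported        # every CI bound or exported
        and set(psi.values()) | set(eps_s.values()) == exported
    )
    return valid, complete

# A CI bound internally and exported at the same time is rejected:
assert check_composite({"c1"}, {"s1"},
                       {"c1": "s1"}, {"c1": "C"}, {}) == (False, False)
# A fully wired composite is both valid and complete:
assert check_composite({"c1", "c2"}, {"s1", "s2"},
                       {"c1": "s1"}, {"c2": "C"}, {"S": "s2"}) == (True, True)
```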
4 Example: A Fibonacci Component

Consider the process network that computes the Fibonacci numbers in [19]. Let us write an equivalent composite component, as shown in Figure 5. Both Cons1
[Figure 5 sketch: the FIB composite component, offering the server interface ComputeFib(k) and a client interface emitting send(fib(1)) ... send(fib(k)).]
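The data flow that FIB reproduces can be emulated with a plain generator (an illustration of the process network's feedback loop only; the priming values are chosen so the stream starts 1, 1, 2, ..., and the component version replaces the loop by bound client/server interfaces):

```python
def fib_stream(k):
    """First k Fibonacci numbers, produced the way the process network
    does: an adder consumes two delayed copies of its own output."""
    n1, n2 = 1, 0           # the two delay ("Cons") cells priming the loop
    for _ in range(k):
        yield n1
        n1, n2 = n1 + n2, n1

assert list(fib_stream(7)) == [1, 1, 2, 3, 5, 8, 13]
```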
FIB AddAct = < [ n l = 0,n2 = O,out = Q; seri; = Consi Act =